A Description of Every Data Set in `{stevedata}`
Source: vignettes/stevedata-description.Rmd
Here is a description of every data set in stevedata.
The underlying code in the vignettes/
folder will show the
code that formats this table.
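As a rough sketch of how a per-object summary like the ones in this table could be produced, the snippet below uses base R only. The helper name `describe_data` is hypothetical and is not taken from the package's vignette code, and `mtcars` stands in for a stevedata object so the example runs without the package installed. Taking only the first class of each column keeps multi-class columns (date-times, for instance) to a single label.

```r
# Hypothetical helper (an assumption, not the vignette's own code):
# summarize a data frame as "Rows / Columns / Data types".
describe_data <- function(x) {
  # Keep only the first class of each column, e.g. "Date" or "numeric".
  col_classes <- vapply(x, function(col) class(col)[1], character(1))
  # Count how many columns fall under each class.
  counts <- table(col_classes)
  types <- paste0(names(counts), " (", as.integer(counts), ")",
                  collapse = "; ")
  paste0("Rows: ", nrow(x), " Columns: ", ncol(x),
         ". Data types: ", types)
}

describe_data(mtcars)
# "Rows: 32 Columns: 11. Data types: numeric (11)"
```

With the package installed, something like `describe_data(stevedata::af_crime93)` should give a comparable summary for the first entry below.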
Object Name | Description |
---|---|
af_crime93 |
These data are in Table 9.1 of the 3rd edition of
Agresti and Finlay’s Statistical Methods for the Social
Sciences. The data are from Statistical Abstract of the United
States and most variables were measured in 1993.
Rows: 51 Columns: 8 . Data types: character (1); numeric (7) |
aluminum_premiums |
A near daily data set on the price of aluminum premiums
(USD/MT) for LME in the U.S., Western Europe, East Asia, and Southeast
Asia. I like these data as illustrative of some of the shortsightedness
of the aluminum tariffs that Donald Trump announced in March 2018. The
tariffs had no discernible effect on manufacturing employment or
earnings, but they created a supply shock that made aluminum more
expensive. Rows: 3664 Columns: 3 . Data types: Date (1); factor (1); numeric (1) |
anes_partytherms |
A data set on thermometer ratings for the Democratic
party, Republican party, “both major parties”, and a major party
thermometer index from the American National Election Studies
(1978-2012). Rows: 33830 Columns: 19 . Data types: numeric (18); character (1) |
anes_prochoice |
A simple data set for in-class illustration about how
to estimate and interpret interactive relationships. The data here are
deliberately minimal for that end. Rows: 5914 Columns: 14 . Data types: character (1); numeric (13) |
anes_vote84 |
This is a simple data set for estimating a simple model
on voter turnout from the 1984 American National Election Studies (ANES)
time-series. Rows: 2257 Columns: 9 . Data types: integer (1); character (1); numeric (7) |
Arca |
Daily data on the NYSE Arca Steel Index. These data are
useful for me in teaching how Trump’s 2018 steel tariffs didn’t do much
good for the steel industry. Rows: 966 Columns: 5 . Data types: Date (1); numeric (4) |
arcticseaice |
This data set from Connolly et al. (2017) measures the
Arctic sea ice extent in 10^6 square kilometers. It includes lower
bounds and upper bounds on annual averages.
Rows: 115 Columns: 4 . Data types: numeric (4) |
arg_tariff |
Simple mean tariff rate for Argentina, starting in
1980. The goal is to keep these data current.
Rows: 39 Columns: 3 . Data types: character (1); numeric (2) |
asn_stats |
These are yearly counts on air accidents and
fatalities, including measures for corporate jet accidents and
hijackings. The hijackings are of particular interest to me, at least
from a historical terrorism perspective.
Rows: 78 Columns: 7 . Data types: integer (7) |
CFT15 |
This is the replication data for “Randomization
Inference in the Regression Discontinuity Design: An Application to
Party Advantages in the U.S. Senate”, published in 2015 in the Journal of
Causal Inference. I use these data to teach about regression
discontinuity designs. Rows: 1390 Columns: 9 . Data types: numeric (9) |
clemson_temps |
This data set contains daily temperatures (highs and
lows) for Clemson, South Carolina from Jan. 1, 1930 to the end of the
most recent calendar year. The goal is to update this periodically with
new data for as long as I live in this town.
Rows: 33148 Columns: 3 . Data types: Date (1); numeric (2) |
co2emissions |
This is a sample data set, cobbled from various
sources, about carbon dioxide emissions in the history of the planet
from 800,000 BCE to the most recently concluded calendar year. I use
this for a data visualization example for a lecture on climate change
and international politics. Data communicate yearly averages/estimates.
Rows: 3099 Columns: 2 . Data types: numeric (2) |
coffee_imports |
A simple panel on coffee imports for importing
countries. Rows: 4530 Columns: 4 . Data types: character (1); numeric (3) |
coffee_price |
This is primary commodity price data for coffee
(Arabica, Robustas) from 1980 to the present. I manually update these
data since FRED’s coverage since 2017 has been spotty.
Rows: 499 Columns: 3 . Data types: Date (1); numeric (2) |
commodity_prices |
A data set on select, monthly commodity prices made
available by the World Bank in its so-called “pink sheet.” These data
are potentially useful for applications on data gathering, inflation
adjustments, indexing, cointegration, general economic riff-raff, and
more. Rows: 756 Columns: 11 . Data types: Date (1); numeric (10) |
CP77 |
This is a simple data set provided by Chatterjee and
Price (1977, p. 108) that serves as a known example of
heteroscedasticity. Rows: 50 Columns: 6 . Data types: character (2); integer (4) |
DAPO |
A reduced form of a data set for reproducing an analysis
on the determinants of Arab public opinion in seven countries toward 13
different countries. Rows: 91 Columns: 11 . Data types: character (2); numeric (9) |
Datasaurus |
An illustrative exercise in never trusting the summary
statistics without also visualizing them.
Rows: 1846 Columns: 3 . Data types: character (1); numeric (2) |
DCE12 |
A data set on domestic conflict events in 2012 as
recorded by the Cross-National Time Series Database. Data exist for
teaching about count models. Rows: 198 Columns: 19 . Data types: character (2); numeric (17) |
Dee04 |
This should be a data set for a (partial?) replication
of Dee’s (2004) article on the purported civics returns to education. I
use these data for in-class illustration about instrumental variable
analyses. Rows: 9227 Columns: 8 . Data types: numeric (8) |
DJIA |
This data set contains the value of the Dow Jones
Industrial Average on daily close for all available dates (to the best
of my knowledge) from 1885 to the most recent update I feel like
including. Extensions shouldn’t be too difficult with existing packages.
Rows: 37931 Columns: 2 . Data types: Date (1); numeric (1) |
DST |
These are fatalities (and, in the case of terrorism,
casualties as well) for drunk-driving, suicide, and acts of terrorism in
the U.S. spanning 1970 to 2018. Only one of these is sufficiently
important to command public attention despite being the least severe
public bad. Do you want to guess which one?
Rows: 49 Columns: 5 . Data types: numeric (5) |
EBJ |
A data set on the apparent economic benefits of
post-conflict justice. Rows: 95 Columns: 12 . Data types: numeric (12) |
eight_schools |
You’ve all seen these before. These are the “eight
schools” that everyone gets when being introduced to Bayesian
programming. Here are the full data for your consideration, which you
can use instead of awkwardly searching where the data are and
copy-pasting them as a list. Every damn time, Steve.
Rows: 8 Columns: 6 . Data types: character (1); numeric (5) |
election_turnout |
A simple data set on education and state-level (+ DC)
turnout in the 2016 presidential election. This is inspired by what
Pollock (2012) does in his book. Rows: 51 Columns: 14 . Data types: numeric (8); character (3); integer (3) |
eq_passengercars |
Data from the International Monetary Fund for the
export quality and unit/trade value of passenger cars for all available
countries and years from 1963 to 2014.
Rows: 60424 Columns: 6 . Data types: character (1); numeric (3); factor (2) |
ESS10NO |
This is a simple data set to illustrate the use of
sampling weights from the European Social Survey.
Rows: 1411 Columns: 24 . Data types: character (3); numeric (19); POSIXct (2) |
ESS9GB |
This is a replication data set originally meant to accompany
a blog post and presentation to students at the University of Nottingham
in March 2020. However, COVID-19 led to the cancellation of the talk.
Rows: 1905 Columns: 19 . Data types: character (4); numeric (13); Date (2) |
ESSBE5 |
This is a sample data set cobbled from the fifth round
of European Social Survey data for Belgium. It offers a means to do a
basic replication of some of Chapter 5 of The SAGE Handbook of
Regression Analysis and Causal Inference.
Rows: 1704 Columns: 10 . Data types: numeric (8); character (2) |
eurostat_codes |
A data set taken from Eurostat’s glossary on codes and
country classifications. Rows: 56 Columns: 3 . Data types: character (3) |
eustates |
European Union membership by accession date
Rows: 28 Columns: 3 . Data types: Date (1); character (2) |
fakeAPI |
This is a hypothetical universe of schools in a given
territorial unit, patterned off the apipop data available in the survey
package. Rows: 10000 Columns: 11 . Data types: integer (1); character (3); numeric (7) |
fakeHappiness |
This is a toy (“fake”) data set I might use to
illustrate the so-called curvilinear effect of age on happiness.
Rows: 1000 Columns: 8 . Data types: numeric (8) |
fakeLogit |
This is a simple fake data set to illustrate a logistic
regression. Rows: 10000 Columns: 2 . Data types: numeric (1); integer (1) |
fakeTSCS |
This is a toy (i.e. “fake”) data set created by the
fabricatr package. There are 100 observations for each of 25 hypothetical
countries. The outcome y is a linear function of a baseline for each
hypothetical country, plus a yearly growth trend as well as varying
growth errors for each country. x1 is supposed to have a linear effect
of .5 on y, all things considered. x2 is supposed to have a linear
effect of 1 on y for each unit change in x2, all things considered.
Rows: 2500 Columns: 8 . Data types: integer (1); character (1); numeric (6) |
fakeTSD |
This is a toy (i.e. “fake”) data set created by the
fabricatr package. There are 100 observations. The outcome y is a linear
function of 20 + (.25 * year) + (.25 * x1) + (1 * x2) + e. This clearly
implies some autocorrelation in the data. I.e. it’s a time-series.
Rows: 100 Columns: 5 . Data types: integer (1); numeric (4) |
ghp100k |
This is the yearly rate of gun homicides per 100,000
people in the population, selecting on “Western” countries of interest.
Rows: 561 Columns: 3 . Data types: character (1); numeric (2) |
GHR04 |
This is a data set for replicating Ghobarah et
al. (2004), a reduced form of what they make available on Dataverse for
replication. Variables have been renamed for legibility.
Rows: 182 Columns: 15 . Data types: character (2); numeric (13) |
gss_abortion |
This is a toy data set derived from the General Social
Survey that I intend to use for several purposes. First, the battery of
abortion items can serve as toy data to illustrate mixed effects
modeling as equivalent to a one-parameter (Rasch) model. Second, I
include some covariates to also do some basic regressions. I think
abortion opinions are useful learning tools for statistical inference
for college students. Third, there’s a time-series component as well for
understanding how abortion attitudes have changed over time.
Rows: 64814 Columns: 18 . Data types: numeric (15); character (3) |
gss_spending |
This is a toy data set that collects attitudes
toward national spending for various things in the General Social Survey
for 2018. I use these data for in-class illustration about ordinal
variables and ordinal models. Rows: 2348 Columns: 33 . Data types: numeric (33) |
gss_wages |
Wage data from the General Social Survey (1974-2018) to
illustrate wage discrepancies by gender (while also considering
respondent occupation, age, and education).
Rows: 61697 Columns: 11 . Data types: numeric (6); character (5) |
Guber99 |
A data set for a canonical case of a Simpson’s paradox,
useful for in-class instruction on the topic.
Rows: 50 Columns: 8 . Data types: character (1); numeric (3); integer (4) |
illiteracy30 |
This is perhaps the canonical data set for illustrating
the ecological fallacy. Rows: 49 Columns: 11 . Data types: character (1); numeric (10) |
inglehart03 |
A data set based on summary information provided in
Inglehart’s (2003) article in PS: Political Science &
Politics. These data would be from the article itself and only
indirectly from the raw World or European Values Survey.
Rows: 77 Columns: 4 . Data types: character (1); numeric (3) |
Lipset59 |
A data set on democracy and economic development for 48
countries that Lipset (1959) first described.
Rows: 48 Columns: 11 . Data types: character (3); numeric (8) |
LOTI |
These data contain monthly mean temperature anomalies
expressed as deviations from the corresponding 1951-1980 means. They are
useful for showing how we can measure climate change.
Rows: 1716 Columns: 2 . Data types: Date (1); numeric (1) |
LTPT |
These data are a monthly time-series of changes in the
consumer price index relative to a Dec. 1997 starting date for
televisions, computers, and related items. I use this as an in-class
illustration that globalization has made consumer electronics cheaper
across the board for Americans. Rows: 1704 Columns: 3 . Data types: Date (1); character (1); numeric (1) |
LTWT |
“Let Them Watch TV”: These data contain price indices
for various items for the general urban consumer. Categories include
medical services, college tuition, college textbooks, child care,
housing, food and beverages, all items (i.e. general CPI), new vehicles,
apparel, and televisions. The base period in value was originally the
1982-4 average, but I converted the base period to January 2000. I use
these data for in-class discussion about how liberalized trade has made
consumer electronics (like TVs) fractions of their past prices. Yet,
young adults face mounting costs for college, child-raising, and health
care that government policy has failed to address.
Rows: 2377 Columns: 3 . Data types: Date (1); factor (1); numeric (1) |
min_wage |
A data set on the various federal minimum wage rates.
Rows: 23 Columns: 2 . Data types: Date (1); numeric (1) |
mm_mlda |
These are data you can use to replicate the regression
discontinuity design analyses throughout Chapter 4 of Mastering
’Metrics. Original analyses come from Carpenter and Dobkin (2009, 2011).
Rows: 50 Columns: 19 . Data types: numeric (19) |
mm_nhis |
These are data from the 2009 NHIS survey. People who
have read Mastering ’Metrics should recognize these data. They’re
featured prominently in that book and the authors’ discussion of random
assignment and experiments. Rows: 18790 Columns: 10 . Data types: numeric (10) |
mm_randhie |
These are data from the RAND Health Insurance
Experiment (HIE). People who have read Mastering ’Metrics should
recognize these data. They’re featured prominently in that book and the
authors’ discussion of random assignment and experiments.
Rows: Columns: . Data types: tbl_df (2); tbl (2); data.frame (2) |
mvprod |
Data, largely from Organisation Internationale des
Constructeurs d’Automobiles (OICA), on motor vehicle production in
various countries (and the world totals) from 1950 to 2019 at various
intervals. Tallies include production of passenger cars, light
commercial vehicles, minibuses, trucks, buses and coaches.
Rows: 1206 Columns: 3 . Data types: character (1); numeric (2) |
nesarc_drinkspd |
This toy data set is loosely modified from Wave I of
the NESARC data set. Here, my main interest is the number of drinks
consumed on a usual day drinking alcohol in the past 12 months,
according to respondents in the nationally representative survey of
43,093 Americans. Rows: 43093 Columns: 8 . Data types: numeric (8) |
Newhouse77 |
These are the data in Newhouse’s (1977) simple OLS
model. In his case, he’s trying to explain medical care
expenditures as a function of GDP per capita for these countries. It’s
probably the easiest OLS model I can find in print because Newhouse
helpfully provides all the data in one simple table.
Rows: 13 Columns: 5 . Data types: character (1); numeric (4) |
ODGI |
The NOAA Earth System Research Laboratory has an “ozone
depleting gas index” (ODGI) data set from 1992 to 2018. This data set
summarizes Table 1 and Table 2 from its website. The primary interest
here (for my purposes) is the ODGI indices (including the new 2012
measure). The data set includes constituent greenhouse gases/chlorines
as well in parts per trillion. The primary use here is for in-class
illustration. Rows: 62 Columns: 16 . Data types: numeric (15); character (1) |
OODTPT |
A data set for replicating an argument about the
relationship between democracy and tariffs/non-tariff trade barriers.
Rows: 75 Columns: 16 . Data types: character (2); numeric (14) |
PPGE |
A data set on government spending in select rich
countries as a function of trade/GDP, financial openness, and the
state-year-level engagement in trade unions. The data offer a means to
quasi-replicate Garrett’s (1998) argument about left-wing governments’
ability to stem the tide of globalization’s effect on decreased
government spending. Rows: 1020 Columns: 9 . Data types: character (2); numeric (7) |
PRDEG |
A data set for replicating David Leblang’s (1996)
analysis on property rights, democracy, and economic growth.
Rows: 147 Columns: 10 . Data types: numeric (9); character (1) |
Presidents |
This should be self-evident. Here are all U.S.
presidents who have completed their terms in office (i.e. excluding the
current one). Rows: 45 Columns: 3 . Data types: character (1); Date (2) |
pwt_sample |
These are some macroeconomic data for 21 select (rich)
countries. I’ve used these data before to discuss issues of grouping and
skew in cross-sectional data. Rows: 1540 Columns: 12 . Data types: character (2); integer (1); numeric (9) |
quartets |
These are four x-y data sets, combined into a long
format, which have the same traditional statistical properties (mean,
variance, correlation, regression line, etc.). However, they look quite
different. Rows: 44 Columns: 3 . Data types: numeric (2); character (1) |
recessions |
Data on U.S. recessions, past to present. Data include
information on contraction, expansion, and cycle.
Rows: 35 Columns: 8 . Data types: Date (2); numeric (6) |
SBCD |
A data set on banking, currency, debt, and
debt-restructuring crises from 1970 to 2017.
Rows: 547 Columns: 4 . Data types: character (2); numeric (1); integer (1) |
scb_regions |
This is a simple data set for matching region codes to
the names of territorial units in Sweden, at least as recorded/cataloged by
the Central Bureau of Statistics in Sweden.
Rows: 312 Columns: 2 . Data types: character (2) |
SCP16 |
County-level data on vote share and various
background/demographic information for the 2016 South Carolina
GOP/Democratic primaries. Rows: 46 Columns: 15 . Data types: character (1); numeric (14) |
sealevels |
These data describe how sea level has changed over
time, in both relative and absolute terms. Absolute sea level change
refers to the height of the ocean surface regardless of whether nearby
land is rising or falling. Rows: 136 Columns: 5 . Data types: integer (1); numeric (4) |
so2concentrations |
This data set contains yearly observations by the
Environmental Protection Agency on the concentration of sulfur dioxide
in parts per billion, based on 32 sites. I use this for in-class
illustration. Note that the national standard is 75 parts per billion.
Data are the national trend. Rows: 41 Columns: 4 . Data types: numeric (4) |
states_war |
A data set on state performance in inter-state wars.
These data are useful for evaluating Valentino et al.’s (2010) “Bear Any
Burden” analysis using more current data.
Rows: 284 Columns: 23 . Data types: numeric (21); character (2) |
steves_clothes |
I cobbled together this data set of the professional
clothes (polos, long-sleeve dress shirts, pants) in my closet, largely
for illustration on the origins of apparel in the U.S. for an intro
lecture on trade. Rows: 86 Columns: 4 . Data types: character (4) |
sugar_price |
This is primary commodity price data for sugar
globally, in the United States, and in Europe for every month from 1980
to (roughly) the present. Prices are nominal U.S. cents per pound and
are not seasonally adjusted (“NSA”). Rows: 1316 Columns: 3 . Data types: Date (1); character (1); numeric (1) |
sweden_counties |
A simple data set on Sweden’s counties.
Rows: 21 Columns: 6 . Data types: character (4); numeric (2) |
thatcher_approval |
A data set on satisfaction/dissatisfaction ratings
during Margaret Thatcher’s tenure as prime minister.
Rows: 125 Columns: 8 . Data types: character (1); Date (1); numeric (6) |
therms |
A data set on thermometer ratings for Donald Trump and
Barack Obama in 2020. I use these data for in-class illustration of
the central limit theorem. Basically: the sampling distribution of
sample means is approximately normal, even if the underlying population is
decidedly not. Rows: 3080 Columns: 2 . Data types: numeric (2) |
turnips |
A data set on turnip prices from my experience with
Animal Crossing (New Horizons). Rows: 1429 Columns: 3 . Data types: Date (1); character (1); numeric (1) |
TV16 |
These data come from the 2016 CCES and allow interested
students to model the individual correlates of the Trump vote in 2016.
Code/analysis heavily indebted to a 2017 analysis I did on my blog (see
references). Rows: 64600 Columns: 21 . Data types: integer (1); character (2); numeric (18) |
ukg_eeri |
This is a (near) daily data set on the effective
exchange rate index for the United Kingdom’s pound sterling from 1990
onward. The data are indexed, such that 100 equals the monthly average
in January 2005. This is useful for illustrating devaluations of the
pound after Black Wednesday, the financial crisis, and, more recently,
the UK’s separation from the European Union.
Rows: 8340 Columns: 2 . Data types: Date (1); numeric (1) |
uniondensity |
Cross-national data on relative size of the trade
unions and predictors in 20 countries. This is a data set of interest for
replicating Western and Jackman (1994), who themselves were addressing a
debate between Wallerstein and Stephens on which of two highly
correlated predictors explains trade union density.
Rows: 20 Columns: 5 . Data types: character (1); numeric (4) |
usa_chn_gdp_forecasts |
This is a toy data set to examine the time in which we
should expect China to overtake the United States in total gross
domestic product (GDP), given current trends. It includes an OECD
long-term GDP forecast from 2014, and forecasts from the forecast and
prophet packages in R. Rows: 182 Columns: 12 . Data types: character (1); numeric (11) |
usa_computers |
This is a simple and regrettably incomplete time-series
on the percentage of U.S. households with access to a computer, by year.
Rows: 19 Columns: 2 . Data types: numeric (2) |
usa_migration |
This data set contains counts/estimates for the number
of inbound migrants in the U.S. as well as outbound migrants of American
origin to other countries from 1990 to 2017.
Rows: 3535 Columns: 5 . Data types: numeric (2); character (3) |
usa_states |
A simple data set from state.abb, state.name,
state.region, and state.division (+ District of Columbia). I’d rather
just have all these in one place. Rows: 51 Columns: 4 . Data types: character (4) |
usa_tradegdp |
A yearly data set on U.S. trade and GDP from 1790 to
2018. Data also include a population variable to facilitate per capita
adjustments, if the user finds it useful.
Rows: 229 Columns: 5 . Data types: numeric (5) |
voteincome |
A data set on turnout and demographic data from the
2000 Current Population Survey. This is a basic port of the voteincome
data from the Zelig package. Rows: 1500 Columns: 7 . Data types: character (1); numeric (6) |
wbd_example |
A simple data set drawn from World Bank Open Data. I’ll
use it to illustrate some merge issues you might encounter in panel data.
Rows: 4537 Columns: 7 . Data types: character (3); integer (1); numeric (3) |
wvs_ccodes |
A simple data set that syncs World Values Survey
country codes (s003) with corresponding country codes from the
Correlates of War state system membership data.
Rows: 112 Columns: 3 . Data types: numeric (2); character (1) |
wvs_immig |
A data set on attitudes about immigration for all
observations in the third to sixth wave of the World Values Survey. I
use these data for in-class illustration.
Rows: 310388 Columns: 6 . Data types: numeric (4); character (1); integer (1) |
wvs_justifbribe |
A data set on attitudes about the justifiability of
bribe-taking for all observations in the third to sixth wave of the
World Values Survey. I use these data for in-class illustration about
seemingly interval-level, but information-poor measurements.
Rows: 348532 Columns: 6 . Data types: numeric (4); character (1); integer (1) |
wvs_usa_abortion |
A data set on attitudes about the justifiability of
abortion in the United States based on World Values Survey responses
recorded across six waves (from 1982 to 2011). I assembled this data
frame probably around 2014 and routinely use it for in-class
illustration about regression, post-estimation simulation, quantities of
interest, and how to think about modeling a dependent variable that is
on a 1-10 scale, but has curious heaping patterns.
Rows: 10387 Columns: 16 . Data types: integer (16) |
wvs_usa_educat |
This is a simple data set that summarizes what the
education codes are in the World Values Survey for the United States.
Rows: 42 Columns: 6 . Data types: numeric (3); character (3) |
wvs_usa_regions |
This is a simple data set that summarizes what the
region codes are in the World Values Survey for the United States.
Rows: 63 Columns: 6 . Data types: numeric (1); character (5) |
yugo_sales |
A data set on Yugo sales against two competing models
in the United States from 1985 to 1992.
Rows: 24 Columns: 3 . Data types: numeric (2); character (1) |
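
The central limit theorem point raised in the `therms` entry can be sketched with simulated data. This is only an illustration under stated assumptions: the bimodal `population` below is made up (it is not the `therms` data), and the seed, sample size, and number of draws are arbitrary choices.

```r
set.seed(8675309)  # arbitrary seed, only for reproducibility

# Hypothetical bimodal "thermometer" population on a 0-100 scale.
# Ratings pile up near the extremes, so it is decidedly not normal.
population <- c(rbinom(60000, 100, 0.05), rbinom(40000, 100, 0.90))

# Draw 1,000 samples of 100 ratings each and keep every sample mean.
sample_means <- replicate(1000, mean(sample(population, 100)))

# The sample means center on the population mean...
mean(population)
mean(sample_means)

# ...and their spread is close to sigma / sqrt(n).
sd(population) / sqrt(100)
sd(sample_means)

# A histogram of the sample means should look roughly bell-shaped.
hist(sample_means)
```

The same steps should carry over to the real data by swapping `population` for one of the thermometer columns in `therms`.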