Here is a description of every data set in stevedata. The source in the vignettes/ folder shows the code that formats this table.

Object Name Description
af_crime93 These data are in Table 9.1 of the 3rd edition of Agresti and Finlay’s Statistical Methods for the Social Sciences. The data are from the Statistical Abstract of the United States and most variables were measured in 1993.

Rows: 51 Columns: 8 .
Column types: character (1); numeric (7)
african_coups A data set on modeling coups in Africa using data from the period between 1960 and 1975 (1982). These data offer a partial replication of Jackman (1978).

Rows: 45 Columns: 11 .
Column types: character (2); numeric (9)
aluminum_premiums A near daily data set on the price of aluminum premiums (USD/MT) for LME in the U.S., Western Europe, East Asia, and Southeast Asia. I like these data as illustrative of some of the shortsightedness of the aluminum tariffs that Donald Trump announced in March 2018. The tariffs had no discernible effect on manufacturing employment or earnings, but they created a supply shock that made aluminum more expensive.

Rows: 3664 Columns: 3 .
Column types: Date (1); factor (1); numeric (1)
anes_partytherms A data set on thermometer ratings for the Democratic party, Republican party, “both major parties”, and a major party thermometer index from the American National Election Studies (1978-2012).

Rows: 33830 Columns: 19 .
Column types: character (1); numeric (18)
anes_prochoice A simple data set for in-class illustration about how to estimate and interpret interactive relationships. The data here are deliberately minimal for that end.

Rows: 5914 Columns: 14 .
Column types: character (1); numeric (13)
anes_vote84 This is a simple data set for estimating a simple model on voter turnout from the 1984 American National Election Studies (ANES) time series.

Rows: 2257 Columns: 9 .
Column types: character (1); integer (1); numeric (7)
Arca Daily data on the NYSE Arca Steel Index. These data are useful for me in teaching how Trump’s 2018 steel tariffs didn’t do much good for the steel industry.

Rows: 966 Columns: 5 .
Column types: Date (1); numeric (4)
arcticseaice This data set from Connolly et al. (2017) measures the Arctic sea ice extent in 10^6 square kilometers. It includes lower bounds and upper bounds on annual averages.

Rows: 115 Columns: 4 .
Column types: numeric (4)
arg_tariff Simple mean tariff rate for Argentina, starting in 1980. The goal is to keep these data current.

Rows: 39 Columns: 3 .
Column types: character (1); numeric (2)
asn_stats These are yearly counts on air accidents and fatalities, including measures for corporate jet accidents and hijackings. The hijackings are of particular interest to me, at least from a historical terrorism perspective.

Rows: 78 Columns: 7 .
Column types: integer (7)
CFT15 This is the replication data for “Randomization Inference in the Regression Discontinuity Design: An Application to Party Advantages in the U.S. Senate”, published in 2015 in the Journal of Causal Inference. I use these data to teach about regression discontinuity designs.

Rows: 1390 Columns: 9 .
Column types: numeric (9)
chile88 A data set on voting intentions in the 1988 Chilean plebiscite, which ultimately ended the military junta rule of Augusto Pinochet.

Rows: 2700 Columns: 8 .
Column types: character (3); numeric (5)
clemson_temps This data set contains daily temperatures (highs and lows) for Clemson, South Carolina from Jan. 1, 1930 to the end of the most recent calendar year. The goal is to update this periodically with new data for as long as I live in this town.

Rows: 33148 Columns: 3 .
Column types: Date (1); numeric (2)
co2emissions This is a sample data set, cobbled from various sources, about carbon dioxide emissions in the history of the planet from 800,000 BCE to the most recently concluded calendar year. I use this for a data visualization example for a lecture on climate change and international politics. Data communicate yearly averages/estimates.

Rows: 3099 Columns: 2 .
Column types: numeric (2)
coffee_imports A simple panel on coffee imports for importing countries.

Rows: 4530 Columns: 4 .
Column types: character (1); numeric (3)
coffee_price This is primary commodity price data for coffee (Arabica, Robusta) from 1980 to the present. I manually update these data since FRED’s coverage since 2017 has been spotty.

Rows: 499 Columns: 3 .
Column types: Date (1); numeric (2)
commodity_prices A data set on select, monthly commodity prices made available by the World Bank in its so-called “pink sheet.” These data are potentially useful for applications on data gathering, inflation adjustments, indexing, cointegration, general economic riff-raff, and more.

Rows: 756 Columns: 11 .
Column types: Date (1); numeric (10)
CP77 This is a simple data set provided by Chatterjee and Price (1977, p. 108) that serves as a known example of heteroscedasticity.

Rows: 50 Columns: 6 .
Column types: character (2); integer (4)
DAPO A reduced form of a data set for reproducing an analysis on the determinants of Arab public opinion in seven countries toward 13 different countries.

Rows: 91 Columns: 11 .
Column types: character (2); numeric (9)
Datasaurus An illustrative exercise in never trusting the summary statistics without also visualizing them.

Rows: 1846 Columns: 3 .
Column types: character (1); numeric (2)
DCE12 A data set on domestic conflict events in 2012 as recorded by the Cross-National Time Series Database. The data are included for teaching about count models.

Rows: 198 Columns: 19 .
Column types: character (2); numeric (17)
Dee04 This should be a data set for a (partial?) replication of Dee’s (2004) article on the purported civics returns to education. I use these data for in-class illustration about instrumental variable analyses.

Rows: 9227 Columns: 8 .
Column types: numeric (8)
DJIA This data set contains the value of the Dow Jones Industrial Average on daily close for all available dates (to the best of my knowledge) from 1885 to the most recent update I feel like including. Extensions shouldn’t be too difficult with existing packages.

Rows: 37931 Columns: 2 .
Column types: Date (1); numeric (1)
DST These are fatalities (and, in the case of terrorism, casualties as well) for drunk-driving, suicide, and acts of terrorism in the U.S. spanning 1970 to 2018. Only one of these is sufficiently important to command public attention despite being the least severe public bad. Do you want to guess which one?

Rows: 49 Columns: 5 .
Column types: numeric (5)
EBJ A data set on the apparent economic benefits of post-conflict justice.

Rows: 95 Columns: 12 .
Column types: numeric (12)
eight_schools You’ve all seen these before. These are the “eight schools” that everyone gets when being introduced to Bayesian programming. Here are the full data for your consideration, which you can use instead of awkwardly searching where the data are and copy-pasting them as a list. Every damn time, Steve.

Rows: 8 Columns: 6 .
Column types: character (1); numeric (5)
election_turnout A simple data set on education and state-level (+ DC) turnout in the 2016 presidential election. This is inspired by what Pollock (2012) does in his book.

Rows: 51 Columns: 14 .
Column types: character (3); integer (3); numeric (8)
eq_passengercars Data from the International Monetary Fund for the export quality and unit/trade value of passenger cars for all available countries and years from 1963 to 2014.

Rows: 60424 Columns: 6 .
Column types: character (1); factor (2); numeric (3)
ESS10NO This is a simple data set to illustrate the use of sampling weights from the European Social Survey.

Rows: 1411 Columns: 24 .
Column types: character (3); numeric (19); POSIXct-POSIXt (2)
ESS9GB This is a replication data set originally meant to accompany a blog post and presentation to students at the University of Nottingham in March 2020. However, COVID-19 led to the cancellation of the talk.

Rows: 1905 Columns: 19 .
Column types: character (4); Date (2); numeric (13)
ESSBE5 This is a sample data set cobbled from the fifth round of European Social Survey data for Belgium. It offers a means to do a basic replication of some of Chapter 5 of The SAGE Handbook of Regression Analysis and Causal Inference.

Rows: 1704 Columns: 10 .
Column types: character (2); numeric (8)
eu_ua_fta24 A data set on an April 2024 roll call vote to extend an emergency free trade agreement with Ukraine.

Rows: 705 Columns: 9 .
Column types: character (8); numeric (1)
eurostat_codes A data set taken from Eurostat’s glossary on codes and country classifications.

Rows: 56 Columns: 3 .
Column types: character (3)
eustates European Union membership by accession date

Rows: 28 Columns: 3 .
Column types: character (2); Date (1)
fakeAPI This is a hypothetical universe of schools in a given territorial unit, patterned off the apipop data available in the survey package.

Rows: 10000 Columns: 11 .
Column types: character (3); integer (1); numeric (7)
fakeHappiness This is a toy (“fake”) data set I might use to illustrate the so-called curvilinear effect of age on happiness.

Rows: 1000 Columns: 8 .
Column types: numeric (8)
fakeLogit This is a simple fake data set to illustrate a logistic regression.

Rows: 10000 Columns: 2 .
Column types: integer (1); numeric (1)
fakeTSCS This is a toy (i.e. “fake”) data set created by the fabricatr package. There are 100 observations for each of 25 hypothetical countries. The outcome y is a linear function of a baseline for each hypothetical country, plus a yearly growth trend as well as varying growth errors for each country. x1 is supposed to have a linear effect of .5 on y, all things considered. x2 is supposed to have a linear effect of 1 on y for each unit change in x2, all things considered.

Rows: 2500 Columns: 8 .
Column types: character (1); integer (1); numeric (6)
fakeTSD This is a toy (i.e. “fake”) data set created by the fabricatr package. There are 100 observations. The outcome y is a linear function of 20 + (.25 * year) + (.25 * x1) + (1 * x2) + e. This clearly implies some autocorrelation in the data, i.e. it’s a time series.

Rows: 100 Columns: 5 .
Column types: integer (1); numeric (4)
ghp100k This is the yearly rate of gun homicides per 100,000 people in the population, selecting on “Western” countries of interest.

Rows: 561 Columns: 3 .
Column types: character (1); numeric (2)
GHR04 This is a data set for replicating Ghobarah et al. (2004), a reduced form of what they make available on Dataverse for replication. Variables have been renamed for legibility.

Rows: 182 Columns: 15 .
Column types: character (2); numeric (13)
gss_abortion This is a toy data set derived from the General Social Survey that I intend to use for several purposes. First, the battery of abortion items can serve as toy data to illustrate mixed effects modeling as equivalent to a one-parameter (Rasch) model. Second, I include some covariates to also do some basic regressions. I think abortion opinions are useful learning tools for statistical inference for college students. Third, there’s a time-series component as well for understanding how abortion attitudes have changed over time.

Rows: 64814 Columns: 18 .
Column types: character (3); numeric (15)
gss_spending This is a toy data set that collects attitudes toward national spending for various things in the General Social Survey for 2018. I use these data for in-class illustration about ordinal variables and ordinal models.

Rows: 2348 Columns: 33 .
Column types: numeric (33)
gss_wages Wage data from the General Social Survey (1974-2018) to illustrate wage discrepancies by gender (while also considering respondent occupation, age, and education).

Rows: 61697 Columns: 11 .
Column types: character (5); numeric (6)
Guber99 A data set for a canonical case of a Simpson’s paradox, useful for in-class instruction on the topic.

Rows: 50 Columns: 8 .
Column types: character (1); integer (4); numeric (3)
illiteracy30 This is perhaps the canonical data set for illustrating the ecological fallacy.

Rows: 49 Columns: 11 .
Column types: character (1); numeric (10)
inglehart03 A data set based on summary information provided in Inglehart’s (2003) article in PS: Political Science & Politics. These data would be from the article itself and only indirectly from the raw World or European Values Survey.

Rows: 77 Columns: 4 .
Column types: character (1); numeric (3)
Lipset59 A data set on democracy and economic development for 48 countries that Lipset (1959) first described.

Rows: 48 Columns: 11 .
Column types: character (3); numeric (8)
LOTI These data contain monthly mean temperature anomalies expressed as deviations from the corresponding 1951-1980 means. They are useful for showing how we can measure climate change.

Rows: 1716 Columns: 2 .
Column types: Date (1); numeric (1)
LTPT These data are a monthly time series of changes in the consumer price index relative to a Dec. 1997 starting date for televisions, computers, and related items. I use this as an in-class illustration that globalization has made consumer electronics cheaper across the board for Americans.

Rows: 1704 Columns: 3 .
Column types: character (1); Date (1); numeric (1)
LTWT “Let Them Watch TV”: These data contain price indices for various items for the general urban consumer. Categories include medical services, college tuition, college textbooks, child care, housing, food and beverages, all items (i.e. general CPI), new vehicles, apparel, and televisions. The base period in value was originally the 1982-4 average, but I converted the base period to January 2000. I use these data for in-class discussion about how liberalized trade has made consumer electronics (like TVs) fractions of their past prices. Yet, young adults face mounting costs for college, child-raising, and health care that government policy has failed to address.

Rows: 2377 Columns: 3 .
Column types: Date (1); factor (1); numeric (1)
min_wage A data set on the various federal minimum wage rates.

Rows: 23 Columns: 2 .
Column types: Date (1); numeric (1)
mm_mlda These are data you can use to replicate the regression discontinuity design analyses throughout Chapter 4 of Mastering ’Metrics. Original analyses come from Carpenter and Dobkin (2009, 2011).

Rows: 50 Columns: 19 .
Column types: numeric (19)
mm_nhis These are data from the 2009 NHIS survey. People who have read Mastering ’Metrics should recognize these data. They’re featured prominently in that book and the authors’ discussion of random assignment and experiments.

Rows: 18790 Columns: 10 .
Column types: numeric (10)
mm_randhie These are data from the RAND Health Insurance Experiment (HIE). People who have read Mastering ’Metrics should recognize these data. They’re featured prominently in that book and the authors’ discussion of random assignment and experiments.

Rows: Columns: .
Column types: data.frame (2); tbl (2); tbl_df (2)
mvprod Data, largely from Organisation Internationale des Constructeurs d’Automobiles (OICA), on motor vehicle production in various countries (and the world totals) from 1950 to 2019 at various intervals. Tallies include production of passenger cars, light commercial vehicles, minibuses, trucks, buses and coaches.

Rows: 1206 Columns: 3 .
Column types: character (1); numeric (2)
nesarc_drinkspd This toy data set is loosely modified from Wave I of the NESARC data set. Here, my main interest is the number of drinks consumed on a usual day drinking alcohol in the past 12 months, according to respondents in the nationally representative survey of 43,093 Americans.

Rows: 43093 Columns: 8 .
Column types: numeric (8)
Newhouse77 These are the data in Newhouse’s (1977) simple OLS model. In his case, he’s trying to explain medical care expenditures as a function of GDP per capita for these countries. It’s probably the easiest OLS model I can find in print because Newhouse helpfully provides all the data in one simple table.

Rows: 13 Columns: 5 .
Column types: character (1); numeric (4)
ODGI The NOAA Earth System Research Laboratory has an “ozone depleting gas index” (ODGI) data set from 1992 to 2018. This dataset summarizes Table 1 and Table 2 from its website. The primary interest here (for my purposes) is the ODGI indices (including the new 2012 measure). The data set includes constituent greenhouse gases/chlorines as well in parts per trillion. The primary use here is for in-class illustration.

Rows: 62 Columns: 16 .
Column types: character (1); numeric (15)
OODTPT A data set for replicating an argument about the relationship between democracy and tariffs/non-tariff trade barriers.

Rows: 75 Columns: 16 .
Column types: character (2); numeric (14)
PPGE A data set on government spending in select rich countries as a function of trade/GDP, financial openness, and the state-year-level engagement in trade unions. The data offer a means to quasi-replicate Garrett’s (1998) argument about left-wing governments’ ability to stem the tide of globalization’s effect on decreased government spending.

Rows: 1020 Columns: 9 .
Column types: character (2); numeric (7)
PRDEG A data set for replicating David Leblang’s (1996) analysis on property rights, democracy, and economic growth.

Rows: 147 Columns: 10 .
Column types: character (1); numeric (9)
Presidents This should be self-evident. Here are all U.S. presidents who have completed their terms in office (i.e. excluding the current one).

Rows: 45 Columns: 3 .
Column types: character (1); Date (2)
pwt_sample These are some macroeconomic data for 21 select (rich) countries. I’ve used these data before to discuss issues of grouping and skew in cross-sectional data.

Rows: 1540 Columns: 12 .
Column types: character (2); integer (1); numeric (9)
quartets These are four x-y data sets, combined into a long format, which have the same traditional statistical properties (mean, variance, correlation, regression line, etc.). However, they look quite different.

Rows: 44 Columns: 3 .
Column types: character (1); numeric (2)
recessions Data on U.S. recessions, past to present. Data include information on contraction, expansion, and cycle.

Rows: 35 Columns: 8 .
Column types: Date (2); numeric (6)
SBCD A data set on banking, currency, debt, and debt-restructuring crises from 1970 to 2017.

Rows: 547 Columns: 4 .
Column types: character (2); integer (1); numeric (1)
scb_regions This is a simple data set for matching region codes to the names of territorial units in Sweden, at least as recorded/cataloged by the Central Bureau of Statistics in Sweden.

Rows: 312 Columns: 2 .
Column types: character (2)
SCP16 County-level data on vote share and various background/demographic information for the 2016 South Carolina GOP/Democratic primaries.

Rows: 46 Columns: 15 .
Column types: character (1); numeric (14)
sealevels These data describe how sea level has changed over time, in both relative and absolute terms. Absolute sea level change refers to the height of the ocean surface regardless of whether nearby land is rising or falling.

Rows: 136 Columns: 5 .
Column types: integer (1); numeric (4)
so2concentrations This data set contains yearly observations by the Environmental Protection Agency on the concentration of sulfur dioxide in parts per billion, based on 32 sites. I use this for in-class illustration. Note that the national standard is 75 parts per billion. Data are the national trend.

Rows: 41 Columns: 4 .
Column types: numeric (4)
states_war A data set on state performance in inter-state wars. These data are useful for evaluating Valentino et al.’s (2010) “Bear Any Burden” analysis using more current data.

Rows: 284 Columns: 23 .
Column types: character (2); numeric (21)
steves_clothes I cobbled together this data set of the professional clothes (polos, long-sleeve dress shirts, pants) in my closet, largely for illustration on the origins of apparel in the U.S. for an intro lecture on trade.

Rows: 86 Columns: 4 .
Column types: character (4)
sugar_price This is primary commodity price data for sugar globally, in the United States, and in Europe for every month from 1980 to (roughly) the present. Prices are nominal U.S. cents per pound and are not seasonally adjusted (“NSA”).

Rows: 1316 Columns: 3 .
Column types: character (1); Date (1); numeric (1)
sweden_counties A simple data set on Sweden’s counties.

Rows: 21 Columns: 6 .
Column types: character (4); numeric (2)
thatcher_approval A data set on satisfaction/dissatisfaction ratings during Margaret Thatcher’s tenure as prime minister.

Rows: 125 Columns: 8 .
Column types: character (1); Date (1); numeric (6)
therms A data set on thermometer ratings for Donald Trump and Barack Obama in 2020. I use these data for in-class illustration of the central limit theorem. Basically: the sampling distribution of the sample mean is approximately normal, even if the underlying population is decidedly not.

Rows: 3080 Columns: 2 .
Column types: numeric (2)
turnips A data set on turnip prices from my experience with Animal Crossing (New Horizons).

Rows: 1429 Columns: 3 .
Column types: character (1); Date (1); numeric (1)
TV16 These data come from the 2016 CCES and allow interested students to model the individual correlates of the Trump vote in 2016. Code/analysis heavily indebted to a 2017 analysis I did on my blog (see references).

Rows: 64600 Columns: 21 .
Column types: character (2); integer (1); numeric (18)
ukg_eeri This is a (near) daily data set on the effective exchange rate index for the United Kingdom’s pound sterling from 1990 onward. The data are indexed, such that 100 equals the monthly average in January 2005. This is useful for illustrating devaluations of the pound after Black Wednesday, the financial crisis, and, more recently, the UK’s separation from the European Union.

Rows: 8340 Columns: 2 .
Column types: Date (1); numeric (1)
uniondensity Cross-national data on the relative size of trade unions and its predictors in 20 countries. This data set is of interest for replicating Western and Jackman (1994), who themselves were addressing a debate between Wallerstein and Stephens on which of two highly correlated predictors explains trade union density.

Rows: 20 Columns: 5 .
Column types: character (1); numeric (4)
usa_chn_gdp_forecasts This is a toy data set to examine when we should expect China to overtake the United States in total gross domestic product (GDP), given current trends. It includes an OECD long-term GDP forecast from 2014, and forecasts from the forecast and prophet packages in R.

Rows: 182 Columns: 12 .
Column types: character (1); numeric (11)
usa_computers This is a simple and regrettably incomplete time-series on the percentage of U.S. households with access to a computer, by year.

Rows: 19 Columns: 2 .
Column types: numeric (2)
usa_migration This data set contains counts/estimates for the number of inbound migrants in the U.S. as well as outbound migrants of American origin to other countries from 1990 to 2017.

Rows: 3535 Columns: 5 .
Column types: character (3); numeric (2)
usa_states A simple data set from state.abb, state.name, state.region, and state.division (+ District of Columbia). I’d rather just have all these in one place.

Rows: 51 Columns: 4 .
Column types: character (4)
usa_tradegdp A yearly data set on U.S. trade and GDP from 1790 to 2018. Data also include a population variable to facilitate per capita adjustments, if the user finds it useful.

Rows: 229 Columns: 5 .
Column types: numeric (5)
USFAHR A data set on economic aid allocation by the United States for assorted years. These are useful for illustrative cross-sectional relationships between human rights and U.S. aid allocation at what amounts to midway points for various presidential administrations.

Rows: 1654 Columns: 18 .
Column types: character (2); numeric (16)
voteincome A data set on turnout and demographic data from the 2000 Current Population Survey. This is a basic port of the voteincome data from the Zelig package.

Rows: 1500 Columns: 7 .
Column types: character (1); numeric (6)
wbd_example A simple data set drawn from World Bank Open Data. I’ll use it to illustrate some merge issues you might encounter in panel data.

Rows: 4537 Columns: 7 .
Column types: character (3); integer (1); numeric (3)
wvs_ccodes A simple data set that syncs World Values Survey country codes (s003) with corresponding country codes from the Correlates of War state system membership data.

Rows: 112 Columns: 3 .
Column types: character (1); numeric (2)
wvs_immig A data set on attitudes about immigration for all observations in the third to sixth wave of the World Values Survey. I use these data for in-class illustration.

Rows: 310388 Columns: 6 .
Column types: character (1); integer (1); numeric (4)
wvs_justifbribe A data set on attitudes about the justifiability of bribe-taking for all observations in the third to sixth wave of the World Values Survey. I use these data for in-class illustration about seemingly interval-level, but information-poor measurements.

Rows: 348532 Columns: 6 .
Column types: character (1); integer (1); numeric (4)
wvs_usa_abortion A data set on attitudes about the justifiability of abortion in the United States based on World Values Survey responses recorded across six waves (from 1982 to 2011). I assembled this data frame probably around 2014 and routinely use it for in-class illustration about regression, post-estimation simulation, quantities of interest, and how to think about modeling a dependent variable that is on a 1-10 scale, but has curious heaping patterns.

Rows: 10387 Columns: 16 .
Column types: integer (16)
wvs_usa_educat This is a simple data set that summarizes what the education codes are in the World Values Survey for the United States.

Rows: 42 Columns: 6 .
Column types: character (3); numeric (3)
wvs_usa_regions This is a simple data set that summarizes what the region codes are in the World Values Survey for the United States.

Rows: 63 Columns: 6 .
Column types: character (5); numeric (1)
yugo_sales A data set on Yugo sales against two competing models in the United States from 1985 to 1992.

Rows: 24 Columns: 3 .
Column types: character (1); numeric (2)
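
The fakeTSD entry in the table gives its outcome as y = 20 + (.25 * year) + (.25 * x1) + (1 * x2) + e. The sketch below writes that data-generating process out in Python so the autocorrelation claim can be checked directly. It is only an illustration: the distributions of x1, x2, and e, the seed, and a year index running from 1 to 100 are all assumptions, not the settings used with fabricatr to build the packaged data.

```python
import random


def simulate_fake_tsd(n=100, seed=8675309):
    """Simulate n yearly observations from the formula in the fakeTSD entry.

    The distributions below are assumptions made for illustration only.
    """
    rng = random.Random(seed)
    rows = []
    for year in range(1, n + 1):
        x1 = rng.gauss(5, 2)      # assumed: continuous predictor
        x2 = rng.randint(0, 1)    # assumed: binary predictor
        e = rng.gauss(0, 1)       # assumed: standard normal error
        y = 20 + (0.25 * year) + (0.25 * x1) + (1 * x2) + e
        rows.append({"year": year, "x1": x1, "x2": x2, "e": e, "y": y})
    return rows


def lag1_autocorrelation(values):
    """Lag-1 sample autocorrelation of a sequence."""
    n = len(values)
    mean = sum(values) / n
    num = sum((values[t] - mean) * (values[t - 1] - mean) for t in range(1, n))
    den = sum((v - mean) ** 2 for v in values)
    return num / den


rows = simulate_fake_tsd()
rho = lag1_autocorrelation([r["y"] for r in rows])
print(f"lag-1 autocorrelation of y: {rho:.2f}")
```

Because y carries a deterministic yearly trend, adjacent values of y sit close together, which is why the lag-1 autocorrelation comes out strongly positive under these assumptions.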
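
The therms entry in the table is about the central limit theorem: means of repeated samples are approximately normally distributed even when the population is not. A rough Python sketch of that idea follows. The population here is made up (ratings piled near 0 and 100 on a thermometer-style scale) and is not the therms data; the sample size, number of samples, and seed are likewise arbitrary choices.

```python
import random
import statistics

rng = random.Random(2020)

# Hypothetical, decidedly non-normal population on a 0-100 scale:
# most ratings pile up at the extremes.
population = [rng.choice([0, 0, 0, 15, 50, 85, 100, 100]) for _ in range(50_000)]


def sample_means(pop, sample_size, n_samples, rng):
    """Draw n_samples random samples and return the mean of each one."""
    return [statistics.fmean(rng.sample(pop, sample_size)) for _ in range(n_samples)]


sample_size = 100
means = sample_means(population, sample_size=sample_size, n_samples=2_000, rng=rng)

pop_mean = statistics.fmean(population)
pop_sd = statistics.pstdev(population)
expected_se = pop_sd / sample_size ** 0.5   # standard error implied by the theorem
observed_se = statistics.stdev(means)       # spread of the simulated sample means

# Share of sample means within 1.96 standard errors of the population mean;
# this should be near 0.95 if the sample means are approximately normal.
share_within = sum(abs(m - pop_mean) <= 1.96 * expected_se for m in means) / len(means)

print(f"population mean: {pop_mean:.2f}, mean of sample means: {statistics.fmean(means):.2f}")
print(f"expected SE: {expected_se:.2f}, observed SE: {observed_se:.2f}")
print(f"share of sample means within 1.96 SE: {share_within:.3f}")
```

A histogram of `means` would make the same point visually; the numeric checks are used here only to keep the sketch free of plotting dependencies.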
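
Several entries in the table mention re-indexed series: LTWT converts a price index from its original base period to January 2000, and ukg_eeri sets 100 equal to the January 2005 average. The arithmetic is the same in each case: divide every value by the value at the new base period and multiply by 100. The Python sketch below shows this with made-up index values; the `rebase` helper and the numbers are hypothetical and are not taken from either data set.

```python
from datetime import date


def rebase(series, base_date, base_value=100.0):
    """Re-express an index so that it equals base_value at base_date."""
    anchor = series[base_date]
    return {d: base_value * v / anchor for d, v in series.items()}


# Hypothetical index values on some older base period (not real CPI figures).
old_index = {
    date(1990, 1, 1): 160.0,
    date(2000, 1, 1): 200.0,
    date(2010, 1, 1): 250.0,
}

rebased = rebase(old_index, base_date=date(2000, 1, 1))

for d, v in sorted(rebased.items()):
    print(d.isoformat(), round(v, 1))
```

Rebasing changes only the scale of the series. Ratios between any two periods, and therefore percentage changes, are the same before and after.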