Comparative Public Health: The Political Economy of Human Misery and Well-Being

A data set for replicating Ghobarah et al. (2004). It is a reduced form of the replication data the authors make available on Dataverse. Variables have been renamed for legibility.

Usage

GHR04

Format

A data frame with 182 observations on the following 15 variables.

country: a character vector denoting a country name
iso3c: a three-character ISO code for the country
pubhlthexppgdp: a numeric vector for public health expenditures as a percentage of GDP
totexphlth: a numeric vector for total expenditures on health
hale: a numeric vector for health-adjusted life expectancy (in years)
log_gdppc: a numeric vector for (log-transformed) GDP per capita
gini: a numeric vector for income inequality
log_educ: a numeric vector for (log-transformed) educational attainment
log_vanhanen: a numeric vector for (log-transformed) racial-linguistic-religious heterogeneity
rivalry: a dummy variable indicating the presence of an enduring international rivalry for the country
polity: a numeric vector communicating a Polity score, as a measure of the democratic nature of the country's regime
prvhlthexpgdp: a numeric vector for private spending on health as a percentage of GDP
urban_growth: a numeric vector for the pace of urbanization
cwdeaths: a numeric vector for civil war deaths
contig_cw: a dummy variable communicating whether there is a civil war in a geographically contiguous territory

Source

Ghobarah, Hazem Adam, Paul Huth, and Bruce Russett. 2004. "Comparative Public Health: The Political Economy of Human Misery and Well-Being." International Studies Quarterly 48: 73-94.

Details

The three-character ISO code is the only addition to the data. I add it because the country names in the authors' data are not neat and may lead users astray if they want to search for a specific observation. The ISO code for Yugoslavia (Serbia and Montenegro) around this time was "SCG".
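As an illustration of why the ISO code helps, here is a minimal sketch of looking up an observation by `iso3c` instead of by country name. The helper `find_by_iso()` and the toy data frame are hypothetical and only stand in for the real data; the country strings in the toy rows are made up for the example.

```r
# Hypothetical helper (not part of the data set): return the rows whose
# ISO code matches `code`, ignoring rows where the code is missing.
find_by_iso <- function(data, code) {
  data[!is.na(data$iso3c) & data$iso3c == code, , drop = FALSE]
}

# Toy stand-in for GHR04. The country strings are illustrative only.
toy <- data.frame(
  country = c("Yugoslavia (Serbia/Montenegro)", "United States of America"),
  iso3c = c("SCG", "USA"),
  stringsAsFactors = FALSE
)

# Searching on the code avoids guessing how the country name is spelled.
scg <- find_by_iso(toy, "SCG")
print(scg)
```

With the data set loaded, the same call would be `find_by_iso(GHR04, "SCG")`.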

The data the authors make available come with no .do file to indicate exactly which variables they used. Some forensic work based on the descriptive statistics they mention led to this reduced form of their data, which almost perfectly replicates their results. The differences are typically in the hundredths, and often in the thousandths, and should be considered "good enough" for replication purposes. The descriptive statistics correspond with what the authors report in their analyses for all variables except the Polity variable. I have no way of knowing how they got the median they report for it. It should be 6, not 7.
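The kind of check described above, comparing descriptive statistics against the published ones, can be sketched as follows. This is only an assumption about how one might do it, not the procedure actually used. The function `describe_numeric()` is hypothetical, and the toy values are invented so the example runs on its own.

```r
# Hypothetical helper: n, mean, median, and standard deviation for every
# numeric column, computed on the non-missing values of each column.
describe_numeric <- function(data) {
  numeric_cols <- data[vapply(data, is.numeric, logical(1))]
  stats <- vapply(
    numeric_cols,
    function(x) c(
      n      = sum(!is.na(x)),
      mean   = mean(x, na.rm = TRUE),
      median = median(x, na.rm = TRUE),
      sd     = sd(x, na.rm = TRUE)
    ),
    c(n = 0, mean = 0, median = 0, sd = 0)
  )
  t(stats)  # one row per variable, one column per statistic
}

# Toy stand-in for GHR04. These values are invented for illustration.
toy <- data.frame(
  country = c("A", "B", "C", "D", "E"),
  polity  = c(-7, 2, 10, NA, 8),
  hale    = c(52.3, 61.0, 70.1, 48.9, 66.4),
  stringsAsFactors = FALSE
)

out <- describe_numeric(toy)
print(out)
```

Running `describe_numeric(GHR04)` on the loaded data and reading off the `polity` row is one way to compare the median with the value the authors report.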

The only real confusion on my end is why I ended up with one more observation than they report in Tables 1 and 3, and two more observations than they report in Table 2. This suggests that one or more of the variables they use has an NA, but I have no way of knowing which it could be.
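For users who want to probe the observation counts themselves, a minimal sketch of a missingness check follows. The helper `na_report()` is hypothetical, the toy values are invented, and `vars` would be whichever columns a user believes enter a given table's model. Such a check can only reveal NAs in the variables included here, so it may not settle the discrepancy.

```r
# Hypothetical helper: for a chosen set of variables, count the rows with
# no missing values (what listwise deletion would keep) and the number of
# NAs in each variable.
na_report <- function(data, vars) {
  subset_data <- data[, vars, drop = FALSE]
  list(
    n_complete = sum(complete.cases(subset_data)),
    na_by_var  = colSums(is.na(subset_data))
  )
}

# Toy stand-in for GHR04. These values are invented for illustration.
toy <- data.frame(
  hale   = c(60.2, 55.1, NA, 48.7),
  gini   = c(0.31, NA, 0.40, 0.52),
  polity = c(10, -3, 6, 8)
)

report <- na_report(toy, c("hale", "gini", "polity"))
print(report)
```

Comparing `n_complete` for a candidate set of columns in `GHR04` against the N the authors report for a table shows whether that set reproduces their sample size.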