Add Simulated GDP, Population, and GDP per Capita Data
Source: R/add_sim_gdp_pop.R
add_sim_gdp_pop() allows you to add estimated gross domestic product (GDP), population, and GDP per capita data. The estimates come from recent updates by Anders, Fariss, and Markowitz (and now Barnum) to their original 2020 publication in International Studies Quarterly. The function leans on data available in isard, a spin-off package featuring data that have periodic updates.
Value
add_sim_gdp_pop() takes a dyad-year, leader-year, leader-dyad-year, or state-year data frame and adds information about the simulated GDP, population, and GDP per capita for that state (or pair of states) in a given year.
Details
You can read more about the data in the documentation for isard.
The function leans on attributes of the data that are provided by one of the "create" functions. Make sure a recognized function (or data created by that function) appears at the top of the proverbial pipe. Users will also want to note that the function accesses two different data sets. Which data set it uses depends on whatever peacesciencer understands to be the "master" data set (communicated in the attributes field for system type).
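The attribute-based dispatch described above can be sketched in base R. This is only an illustration, not the package's actual code: the attribute name `ps_system`, its values `"cow"` and `"gw"`, and the helper `pick_gdppop_source()` are assumptions introduced here, and the mapping to `cw_gdppop` and `gw_gdppop` is inferred from the data set names.

```r
# Hypothetical helper: report which isard data set would match the system
# type stored in the data's attributes. The attribute name "ps_system" and
# its values are assumptions for illustration.
pick_gdppop_source <- function(data) {
  system_type <- attr(data, "ps_system")
  if (is.null(system_type)) {
    stop("No system attribute found; start the pipe with a 'create' function.")
  }
  switch(system_type,
         cow = "cw_gdppop",
         gw  = "gw_gdppop",
         stop("Unrecognized system type: ", system_type))
}

# Toy state-year data frame tagged as Gleditsch-Ward
toy <- data.frame(gwcode = 2, year = 1816)
attr(toy, "ps_system") <- "gw"

pick_gdppop_source(toy)
```

A plain data frame with no such attribute would hit the first `stop()`, which is why the function needs to sit downstream of a "create" function.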
Users primarily working in the Correlates of War system will be a little disappointed that the simulations the authors provide are demarcated in the Gleditsch-Ward system. The overlap is substantial, but the data the authors provide are at the mercy of the Gleditsch-Ward system for describing the universe of cases that could have a GDP, a population, or a GDP per capita. There will be conspicuous missingness for Correlates of War data concerning Serbia (1916, 1917), Morocco (1905-1912), Egypt (1856-1882), Saudi Arabia (1927-1931), and Laos (1953). Interested users may want to explore some imputation procedures, potentially leveraging older versions of the data.
Fariss et al. (2022) provide multiple variations of GDP and GDP per capita in their simulations, but the data I provide follow their suggested defaults. GDP per capita is denominated in constant 2011 international dollars at purchasing power parity (PPP). GDP is expenditure-side real GDP in millions of 2017 international dollars (PPP). The simulated population estimate is in millions of people. The Maddison Project Database is the source of simulations for GDP per capita, while Penn World Table is the source of simulations for GDP and population. You can use the latter two metrics to create another version of GDP per capita if you like.
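As a minimal sketch of that last point, the alternative GDP per capita is just the ratio of the two Penn World Table columns. Because both are in millions, the result is in 2017 international dollars per person. The toy data frame below stands in for the function's output and uses the rounded United States values shown in the examples; the new column name `pwtrgdppc` is a hypothetical label, not something the package creates.

```r
# Toy stand-in for state-year output of add_sim_gdp_pop(), using the
# rounded United States values displayed in the examples.
toy <- data.frame(
  ccode   = 2,
  year    = 1816:1818,
  pwtrgdp = c(27951, 28557, 29150),  # millions of 2017 international dollars
  pwtpop  = c(9.16, 9.41, 9.69)      # millions of people
)

# Millions of dollars divided by millions of people gives dollars per person.
# "pwtrgdppc" is a hypothetical column name chosen for this sketch.
toy$pwtrgdppc <- toy$pwtrgdp / toy$pwtpop

toy
```

This version will not match `mrgdppc`, since the two series come from different sources and use different base years.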
The data in isard include simulated standard deviations around the estimate. It's understandable that users are interested in just the point estimate, but the uncertainty around the estimate is also important. You should consider incorporating it into your analyses. Be mindful that the data are fundamentally state-year and that extensions to leader-level data should be understood as approximations for leaders in a given state-year.
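One simple way to keep the uncertainty in view is to carry the standard deviation alongside the point estimate. The sketch below uses toy values copied from the example output; the derived column names are hypothetical, and the band of plus or minus one standard deviation is a descriptive summary under the assumption that a symmetric band is a reasonable first look, not a formal interval.

```r
# Toy stand-in for state-year output, using rounded United States values
# from the examples.
toy <- data.frame(
  year       = 1816:1818,
  pwtrgdp    = c(27951, 28557, 29150),
  sd_pwtrgdp = c(12015, 12256, 12349)
)

# Relative uncertainty: simulated standard deviation as a share of the estimate.
toy$cv_pwtrgdp <- toy$sd_pwtrgdp / toy$pwtrgdp

# Rough descriptive band of one simulated standard deviation on either side.
toy$pwtrgdp_lo <- toy$pwtrgdp - toy$sd_pwtrgdp
toy$pwtrgdp_hi <- toy$pwtrgdp + toy$sd_pwtrgdp

toy
```

Observations with a large relative uncertainty are candidates for sensitivity checks, for example re-estimating a model with the lower and upper values in place of the point estimate.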
The keep argument must include one or more of the estimates included in the cw_gdppop or gw_gdppop data sets in isard. Otherwise, the function will return an error that it cannot subset columns that do not exist.
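To avoid that error, you can check a candidate `keep` vector against the available column names before calling the function. The sketch below uses a one-row toy data frame as a stand-in for `cw_gdppop`, with column names taken from the example output; the deliberately wrong entry `"gdp"` and the object names are hypothetical.

```r
# Toy stand-in for isard's cw_gdppop, with column names as shown in the
# state-year example output.
toy_gdppop <- data.frame(
  ccode = 2, year = 1816,
  mrgdppc = 2668, sd_mrgdppc = 411,
  pwtrgdp = 27951, sd_pwtrgdp = 12015,
  pwtpop = 9.16, sd_pwtpop = 0.690
)

# Candidate columns to keep; "gdp" is deliberately not a real column.
keep <- c("mrgdppc", "pwtpop", "gdp")

# Which requested columns do not exist, and which are safe to pass along?
missing_cols <- setdiff(keep, names(toy_gdppop))
valid_keep   <- intersect(keep, names(toy_gdppop))

missing_cols
valid_keep

# With the real package, the corresponding call would be along the lines of:
# create_stateyears() %>% add_sim_gdp_pop(keep = valid_keep)
```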
References
Please cite Miller (2022) for peacesciencer. Beyond that, consult the documentation in isard for additional citations (contingent on which GDP, population, or GDP per capita estimate you are using).
Examples
# just call `library(tidyverse)` at the top of your script
library(magrittr)
cow_ddy %>% add_sim_gdp_pop()
#> # A tibble: 2,214,930 × 15
#>    ccode1 ccode2  year mrgdppc1 sd_mrgdppc1 pwtrgdp1 sd_pwtrgdp1 pwtpop1
#>     <dbl>  <dbl> <dbl>    <dbl>       <dbl>    <dbl>       <dbl>   <dbl>
#>  1      2     20  1920    9642.       1468. 1179542.     527271.    107.
#>  2      2     20  1921    9618.       1481. 1170100.     494793.    109.
#>  3      2     20  1922    9837.       1522. 1215550.     491351.    110.
#>  4      2     20  1923   10318.       1636. 1320502.     532126.    112.
#>  5      2     20  1924   10633.       1653. 1348635.     585636.    114.
#>  6      2     20  1925   10871.       1765. 1430219.     610983.    116.
#>  7      2     20  1926   11070.       1720. 1446257.     596361.    117.
#>  8      2     20  1927   11112.       1696. 1477319.     607553.    119.
#>  9      2     20  1928   11204.       1765. 1532243.     663724.    120.
#> 10      2     20  1929   11106.       1744. 1517361.     650671.    122.
#> # ℹ 2,214,920 more rows
#> # ℹ 7 more variables: sd_pwtpop1 <dbl>, mrgdppc2 <dbl>, sd_mrgdppc2 <dbl>,
#> #   pwtrgdp2 <dbl>, sd_pwtrgdp2 <dbl>, pwtpop2 <dbl>, sd_pwtpop2 <dbl>
create_stateyears() %>% add_sim_gdp_pop()
#> Joining with `by = join_by(ccode, year)`
#> # A tibble: 17,511 × 9
#>    ccode statenme    year mrgdppc sd_mrgdppc pwtrgdp sd_pwtrgdp pwtpop sd_pwtpop
#>    <dbl> <chr>      <dbl>   <dbl>      <dbl>   <dbl>      <dbl>  <dbl>     <dbl>
#>  1     2 United St…  1816   2668.       411.  27951.     12015.   9.16     0.690
#>  2     2 United St…  1817   2656.       421.  28557.     12256.   9.41     0.702
#>  3     2 United St…  1818   2644.       419.  29150.     12349.   9.69     0.711
#>  4     2 United St…  1819   2643.       422.  29640.     12462.   9.95     0.714
#>  5     2 United St…  1820   2657.       415.  30452.     13244.  10.2      0.711
#>  6     2 United St…  1821   2683.       420.  31981.     13850.  10.5      0.743
#>  7     2 United St…  1822   2714.       435.  32809.     14113.  10.8      0.760
#>  8     2 United St…  1823   2749.       423.  34079.     14294.  11.1      0.769
#>  9     2 United St…  1824   2785.       434.  35784.     16068.  11.4      0.798
#> 10     2 United St…  1825   2809.       447.  36887.     15932.  11.8      0.812
#> # ℹ 17,501 more rows
create_stateyears(system = "gw") %>% add_sim_gdp_pop()
#> Joining with `by = join_by(gwcode, year)`
#> # A tibble: 18,985 × 9
#>    gwcode statename  year mrgdppc sd_mrgdppc pwtrgdp sd_pwtrgdp pwtpop sd_pwtpop
#>     <dbl> <chr>     <int>   <dbl>      <dbl>   <dbl>      <dbl>  <dbl>     <dbl>
#>  1      2 United S…  1816   2668.       411.  27951.     12015.   9.16     0.690
#>  2      2 United S…  1817   2656.       421.  28557.     12256.   9.41     0.702
#>  3      2 United S…  1818   2644.       419.  29150.     12349.   9.69     0.711
#>  4      2 United S…  1819   2643.       422.  29640.     12462.   9.95     0.714
#>  5      2 United S…  1820   2657.       415.  30452.     13244.  10.2      0.711
#>  6      2 United S…  1821   2683.       420.  31981.     13850.  10.5      0.743
#>  7      2 United S…  1822   2714.       435.  32809.     14113.  10.8      0.760
#>  8      2 United S…  1823   2749.       423.  34079.     14294.  11.1      0.769
#>  9      2 United S…  1824   2785.       434.  35784.     16068.  11.4      0.798
#> 10      2 United S…  1825   2809.       447.  36887.     15932.  11.8      0.812
#> # ℹ 18,975 more rows