This is a running list of various parlor tricks that you can do with the data and functions in peacesciencer. Space and time considerations, along with some rigidity imposed by CRAN guidelines, preclude me from including these as outright functions or belaboring them in greater detail in the manuscript. Again, peacesciencer can do a lot, but it can’t do everything. Some of its functionality may also not be obvious from the manuscript or documentation files because it does not necessarily involve core functions. Thus “parlor trick” is an appropriate descriptor here.
Create a “New State” Variable
The manuscript includes a partial replication of a state-year civil conflict analysis analogous to Fearon and Laitin (2003) and Gibler and Miller (2014). Both of those analyses include a “new state” variable, arguing that states within the first two years of their existence are more likely to experience a civil war onset. The partial replication does not include this variable because the easiest way to create it is a group_by() followed by a mutate() based on the row number within the group, and group_by() has the unfortunate side effect of erasing any other attributes in the data (i.e. the ps_system and ps_data_type attributes). This would break the peacesciencer pipe. If you want this variable, I recommend creating and merging it after creating the bulk of the data.
Here’s how you’d do it.
# Hypothetical main data
create_stateyears(system = 'gw') %>%
filter(between(year, 1946, 2019)) %>%
add_ucdp_acd(type = "intrastate") %>%
add_peace_years() -> Data
#> Joining with `by = join_by(gwcode, year)`
#> Joining with `by = join_by(gwcode, year)`
# Add in new state variable after the fact
create_stateyears(system = 'gw') %>%
group_by(gwcode) %>%
mutate(newstate = ifelse(row_number() <= 2, 1, 0)) %>%
left_join(Data, .) %>%
select(gwcode:ucdponset, newstate, everything()) -> Data
#> Joining with `by = join_by(gwcode, statename, year)`
# Proof of concept: Here's India
Data %>% filter(gwcode == 750)
#> # A tibble: 73 × 9
#> gwcode statename year ucdpongoing ucdponset newstate maxintensity
#> <dbl> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 750 India 1947 0 0 1 NA
#> 2 750 India 1948 1 1 1 2
#> 3 750 India 1949 1 0 0 2
#> 4 750 India 1950 1 0 0 2
#> 5 750 India 1951 1 0 0 2
#> 6 750 India 1952 0 0 0 NA
#> 7 750 India 1953 0 0 0 NA
#> 8 750 India 1954 0 0 0 NA
#> 9 750 India 1955 0 0 0 NA
#> 10 750 India 1956 1 1 0 1
#> # ℹ 63 more rows
#> # ℹ 2 more variables: conflict_ids <chr>, ucdpspell <dbl>
# And here's Belize
Data %>% filter(gwcode == 80)
#> # A tibble: 39 × 9
#> gwcode statename year ucdpongoing ucdponset newstate maxintensity
#> <dbl> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 80 Belize 1981 0 0 1 NA
#> 2 80 Belize 1982 0 0 1 NA
#> 3 80 Belize 1983 0 0 0 NA
#> 4 80 Belize 1984 0 0 0 NA
#> 5 80 Belize 1985 0 0 0 NA
#> 6 80 Belize 1986 0 0 0 NA
#> 7 80 Belize 1987 0 0 0 NA
#> 8 80 Belize 1988 0 0 0 NA
#> 9 80 Belize 1989 0 0 0 NA
#> 10 80 Belize 1990 0 0 0 NA
#> # ℹ 29 more rows
#> # ℹ 2 more variables: conflict_ids <chr>, ucdpspell <dbl>
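The same row-number logic can be checked on a toy panel. This is only a minimal sketch: the codes and years are made up, and toy and toy_newstate are hypothetical names. It also illustrates why the flag is built from the full, unfiltered state-year frame: if the years were filtered first, a state that predates the filter would be marked as “new” in its first two retained years.

```r
library(dplyr)

# Toy state-year panel (made-up codes and years, not real data)
toy <- tibble(
  gwcode = c(1, 1, 1, 1, 2, 2, 2),
  year   = c(1950, 1951, 1952, 1953, 1981, 1982, 1983)
)

toy_newstate <- toy %>%
  arrange(gwcode, year) %>%
  group_by(gwcode) %>%
  # the first two rows within each state are its first two years in the panel
  mutate(newstate = ifelse(row_number() <= 2, 1, 0)) %>%
  ungroup()

toy_newstate

# Filtering years only after the flag exists keeps state 1 from looking "new"
toy_newstate %>% filter(year >= 1952)
```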
Code Capabilities/Development/Militarization as Bremer (1992) Did
The manuscript includes a replication of Bremer’s (1992) “dangerous dyads” design, albeit one that leverages newer/better data sources that were unavailable to Bremer at the time. For convenience’s sake, the replication used other approaches to estimating Bremer’s variables, including the “weak-link” mechanisms that Dixon (1994) introduced in his seminal work on democratic conflict resolution. If the user wanted to recreate some of the covariates as Bremer (1992) did it, here would be how to do it.
The covariates in question concern information grabbed from the Correlates of War national material capabilities data set. For example, the user guide recreates the “relative power” variable as a proportion of the lower composite index of national capabilities (CINC) variable over the higher one. Bremer opts for a different approach, defining a “relative power” variable as a three-part ordinal category where the more powerful side has a CINC score that is 1) more than 10 times higher than the less powerful side, 2) more than three but no more than 10 times higher than the other side, or 3) no more than three times higher than the other side. Here is the exact passage on p. 322.
Based on these CINC scores, I computed the larger-to-smaller capability ratios for all dyad-years and classified them into three groups. If the capability ratio was less than or equal to three, then the dyad was considered to constitute a case of small power difference. If the ratio was larger than 10, then the power difference was coded as large, whereas a ratio between 3 and 10 was coded as a medium power difference. If either of the CINC scores was missing (or equal to zero) for a ratio calculation, then the power difference score for that dyad was coded as missing also.
This is an easy case_when() function, but it would’ve consumed more space and words in the manuscript than the allocated journal space would allow. There’s added difficulty in making sure to identify which side in a non-directed dyad-year is more powerful.
cow_ddy %>% # built-in data set for convenience
filter(ccode2 > ccode1) %>% # make it non-directed
# add CINC scores
add_nmc() %>%
# select just what we want
select(ccode1:year, cinc1, cinc2) -> Bremer
Bremer %>%
# create a three-item ordinal relative power category with values 2, 1, and 0
mutate(relpow = case_when(
(cinc1 > cinc2) & (cinc1 > 10*cinc2) ~ 2,
# a ratio of exactly 10 is not "larger than 10", so it counts as medium
(cinc1 > cinc2) & ((cinc1 > 3*cinc2) & (cinc1 <= 10*cinc2)) ~ 1,
(cinc1 > cinc2) & (cinc1 <= 3*cinc2) ~ 0,
# copy-paste, re-arrange
(cinc2 > cinc1) & (cinc2 > 10*cinc1) ~ 2,
(cinc2 > cinc1) & ((cinc2 > 3*cinc1) & (cinc2 <= 10*cinc1)) ~ 1,
(cinc2 > cinc1) & (cinc2 <= 3*cinc1) ~ 0,
TRUE ~ NA_real_
)) -> relpow_example
# Let's inspect the output.
relpow_example %>% na.omit %>%
mutate(whichside = ifelse(cinc1 > cinc2, "ccode1 > ccode2",
"ccode2 >= ccode1")) %>%
group_split(whichside, relpow)
#> <list_of<
#> tbl_df<
#> ccode1 : double
#> ccode2 : double
#> year : double
#> cinc1 : double
#> cinc2 : double
#> relpow : double
#> whichside: character
#> >
#> >[6]>
#> [[1]]
#> # A tibble: 132,639 × 7
#> ccode1 ccode2 year cinc1 cinc2 relpow whichside
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr>
#> 1 2 200 1892 0.173 0.173 0 ccode1 > ccode2
#> 2 2 200 1897 0.169 0.166 0 ccode1 > ccode2
#> 3 2 200 1898 0.197 0.157 0 ccode1 > ccode2
#> 4 2 200 1899 0.185 0.169 0 ccode1 > ccode2
#> 5 2 200 1900 0.188 0.178 0 ccode1 > ccode2
#> 6 2 200 1901 0.203 0.174 0 ccode1 > ccode2
#> 7 2 200 1902 0.208 0.161 0 ccode1 > ccode2
#> 8 2 200 1903 0.210 0.143 0 ccode1 > ccode2
#> 9 2 200 1904 0.205 0.135 0 ccode1 > ccode2
#> 10 2 200 1905 0.214 0.121 0 ccode1 > ccode2
#> # ℹ 132,629 more rows
#>
#> [[2]]
#> # A tibble: 114,225 × 7
#> ccode1 ccode2 year cinc1 cinc2 relpow whichside
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr>
#> 1 2 70 1831 0.0420 0.00945 1 ccode1 > ccode2
#> 2 2 70 1832 0.0445 0.00963 1 ccode1 > ccode2
#> 3 2 70 1833 0.0481 0.00958 1 ccode1 > ccode2
#> 4 2 70 1834 0.0478 0.00971 1 ccode1 > ccode2
#> 5 2 70 1835 0.0485 0.00980 1 ccode1 > ccode2
#> 6 2 70 1836 0.0510 0.00941 1 ccode1 > ccode2
#> 7 2 70 1837 0.0535 0.00975 1 ccode1 > ccode2
#> 8 2 70 1838 0.0533 0.00966 1 ccode1 > ccode2
#> 9 2 70 1839 0.0508 0.00948 1 ccode1 > ccode2
#> 10 2 70 1840 0.0495 0.00898 1 ccode1 > ccode2
#> # ℹ 114,215 more rows
#>
#> [[3]]
#> # A tibble: 198,867 × 7
#> ccode1 ccode2 year cinc1 cinc2 relpow whichside
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr>
#> 1 2 20 1920 0.290 0.0101 2 ccode1 > ccode2
#> 2 2 20 1921 0.253 0.0105 2 ccode1 > ccode2
#> 3 2 20 1922 0.256 0.00841 2 ccode1 > ccode2
#> 4 2 20 1923 0.272 0.00986 2 ccode1 > ccode2
#> 5 2 20 1924 0.254 0.00889 2 ccode1 > ccode2
#> 6 2 20 1925 0.254 0.00870 2 ccode1 > ccode2
#> 7 2 20 1926 0.263 0.00924 2 ccode1 > ccode2
#> 8 2 20 1927 0.239 0.00937 2 ccode1 > ccode2
#> 9 2 20 1928 0.240 0.00970 2 ccode1 > ccode2
#> 10 2 20 1929 0.240 0.00980 2 ccode1 > ccode2
#> # ℹ 198,857 more rows
#>
#> [[4]]
#> # A tibble: 141,100 × 7
#> ccode1 ccode2 year cinc1 cinc2 relpow whichside
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr>
#> 1 2 200 1861 0.144 0.258 0 ccode2 >= ccode1
#> 2 2 200 1862 0.176 0.251 0 ccode2 >= ccode1
#> 3 2 200 1863 0.179 0.251 0 ccode2 >= ccode1
#> 4 2 200 1864 0.193 0.243 0 ccode2 >= ccode1
#> 5 2 200 1865 0.135 0.256 0 ccode2 >= ccode1
#> 6 2 200 1866 0.0982 0.248 0 ccode2 >= ccode1
#> 7 2 200 1867 0.114 0.253 0 ccode2 >= ccode1
#> 8 2 200 1868 0.107 0.253 0 ccode2 >= ccode1
#> 9 2 200 1869 0.108 0.246 0 ccode2 >= ccode1
#> 10 2 200 1870 0.0990 0.242 0 ccode2 >= ccode1
#> # ℹ 141,090 more rows
#>
#> [[5]]
#> # A tibble: 133,564 × 7
#> ccode1 ccode2 year cinc1 cinc2 relpow whichside
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr>
#> 1 2 200 1816 0.0397 0.337 1 ccode2 >= ccode1
#> 2 2 200 1817 0.0358 0.328 1 ccode2 >= ccode1
#> 3 2 200 1818 0.0361 0.329 1 ccode2 >= ccode1
#> 4 2 200 1819 0.0371 0.317 1 ccode2 >= ccode1
#> 5 2 200 1820 0.0371 0.317 1 ccode2 >= ccode1
#> 6 2 200 1821 0.0342 0.317 1 ccode2 >= ccode1
#> 7 2 200 1822 0.0329 0.311 1 ccode2 >= ccode1
#> 8 2 200 1823 0.0331 0.318 1 ccode2 >= ccode1
#> 9 2 200 1824 0.0330 0.330 1 ccode2 >= ccode1
#> 10 2 200 1825 0.0342 0.331 1 ccode2 >= ccode1
#> # ℹ 133,554 more rows
#>
#> [[6]]
#> # A tibble: 235,749 × 7
#> ccode1 ccode2 year cinc1 cinc2 relpow whichside
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr>
#> 1 20 200 1920 0.0101 0.128 2 ccode2 >= ccode1
#> 2 20 200 1922 0.00841 0.0945 2 ccode2 >= ccode1
#> 3 20 200 1923 0.00986 0.0990 2 ccode2 >= ccode1
#> 4 20 200 1924 0.00889 0.107 2 ccode2 >= ccode1
#> 5 20 200 1925 0.00870 0.0956 2 ccode2 >= ccode1
#> 6 20 200 1939 0.00909 0.0997 2 ccode2 >= ccode1
#> 7 20 255 1934 0.00891 0.0891 2 ccode2 >= ccode1
#> 8 20 255 1935 0.00874 0.103 2 ccode2 >= ccode1
#> 9 20 255 1936 0.00865 0.115 2 ccode2 >= ccode1
#> 10 20 255 1937 0.00893 0.118 2 ccode2 >= ccode1
#> # ℹ 235,739 more rows
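One way to sidestep identifying the more powerful side is to compute the larger-to-smaller ratio directly, which is how Bremer's passage describes it. What follows is a hedged sketch on made-up CINC values rather than the approach used above; toy_dyads, toy_relpow, and ratio are hypothetical names. It also codes the ordinal category as missing when either score is missing or zero, as the quoted passage specifies.

```r
library(dplyr)

# Made-up CINC scores for seven hypothetical dyad-years
toy_dyads <- tibble(
  cinc1 = c(0.20, 0.01, 0.05, 0.30, 0.10, 0,    NA),
  cinc2 = c(0.10, 0.05, 0.05, 0.02, 0.02, 0.10, 0.10)
)

toy_relpow <- toy_dyads %>%
  mutate(
    # larger-to-smaller capability ratio, regardless of which side is larger
    ratio = pmax(cinc1, cinc2) / pmin(cinc1, cinc2),
    relpow = case_when(
      # missing or zero CINC on either side: code as missing
      is.na(cinc1) | is.na(cinc2) | cinc1 == 0 | cinc2 == 0 ~ NA_real_,
      ratio > 10 ~ 2,  # large power difference
      ratio > 3  ~ 1,  # medium power difference
      TRUE       ~ 0   # small power difference (ratio of three or less)
    )
  )

toy_relpow
```

Because case_when() evaluates its conditions in order, the second and third branches do not need to restate the upper bounds.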
Next, the manuscript codes Bremer’s (1992) development/“advanced economies” measure as a weak link (the lower GDP per capita in the dyad), drawing on the simulations from Anders et al. (2020). In my defense, this is exactly the kind of data Bremer wished he had available to him. He says so himself in footnote 26 on page 324.
Under the most optimistic assumptions about data availability, I would estimate that the number of dyad-years for which the relevant data [GNP or GDP per capita] could be assembled would be less than 20% of the total dyad-years under consideration. A more realistic estimate might be as low as 10%. Clearly, our ability to test a generalization when 80% to 90% of the needed data are missing is very limited, and especially so in this case, because the missing data would be concentrated heavily in the pre-World War II era and less advanced states.
Given this limitation, Bremer uses this approach to coding the development/“advanced economies” measure.
A more economically advanced state should be characterized by possessing a share of system-wide economic capability that is greater than its share of system-wide demographic capability. Hence, in years when this was found to be true, I classified a state as more advanced; otherwise, less advanced. The next step involved examining each pair of states in each year and assigning it to one of three groups: both more advanced (7,160 dyad-years), one more advanced (61,823 dyad-years), and both less advanced (128,939 dyad-years).
Replicating this approach is going to require group-by summaries of the raw national material capabilities data, which is outside of peacesciencer’s core functionality. Bremer’s wording here is a little vague; he doesn’t explain what variable, or variables, comprise “economic capability” and “demographic capability.” Let’s assume that “demographic capability” is just the total population variable whereas the “economic capability” variables include iron and steel production and primary energy consumption. The variable would look something like this.
cow_nmc %>%
group_by(year) %>%
# calculate year proportions
mutate(prop_tpop = tpop/sum(tpop, na.rm=T),
prop_irst = irst/sum(irst, na.rm=T),
prop_pec = pec/sum(pec, na.rm=T)) %>%
ungroup() %>%
# standardize an "economic capability" measure
# then make an advanced dummy
mutate(econcap = (prop_irst + prop_pec)/2,
advanced = ifelse(econcap > prop_tpop, 1, 0)) %>%
select(ccode, year, prop_tpop:ncol(.)) -> Advanced
Advanced
#> # A tibble: 15,951 × 7
#> ccode year prop_tpop prop_irst prop_pec econcap advanced
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 2 1816 0.0398 0.0954 0.00966 0.0525 1
#> 2 2 1817 0.0404 0.0938 0.0103 0.0520 1
#> 3 2 1818 0.0411 0.102 0.0110 0.0564 1
#> 4 2 1819 0.0416 0.101 0.0104 0.0555 1
#> 5 2 1820 0.0422 0.113 0.0105 0.0617 1
#> 6 2 1821 0.0430 0.0927 0.0108 0.0518 1
#> 7 2 1822 0.0431 0.0950 0.0109 0.0530 1
#> 8 2 1823 0.0439 0.0933 0.0111 0.0522 1
#> 9 2 1824 0.0447 0.0861 0.0122 0.0491 1
#> 10 2 1825 0.0453 0.0891 0.0129 0.0510 1
#> # ℹ 15,941 more rows
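To make the arithmetic concrete, here is a small worked sketch of the same share calculation on a made-up three-state system in a single year. The numbers are invented for illustration, and toy_nmc and toy_advanced are hypothetical names; the column names simply mirror the ones used above.

```r
library(dplyr)

# Made-up raw values for three hypothetical states in one year
toy_nmc <- tibble(
  ccode = c(1, 2, 3),
  year  = 1900,
  tpop  = c(50, 30, 20),  # total population
  irst  = c(80, 15, 5),   # iron and steel production
  pec   = c(60, 30, 10)   # primary energy consumption
)

toy_advanced <- toy_nmc %>%
  group_by(year) %>%
  # each state's share of the system total in that year
  mutate(prop_tpop = tpop / sum(tpop, na.rm = TRUE),
         prop_irst = irst / sum(irst, na.rm = TRUE),
         prop_pec  = pec / sum(pec, na.rm = TRUE)) %>%
  ungroup() %>%
  # average the two economic shares, then compare with the population share
  mutate(econcap  = (prop_irst + prop_pec) / 2,
         advanced = ifelse(econcap > prop_tpop, 1, 0))

toy_advanced %>% select(ccode, prop_tpop, econcap, advanced)
```

State 1 holds half of the system's population but 70% of its averaged economic capability, so it alone is coded as advanced. States 2 and 3 have economic shares (22.5% and 7.5%) below their population shares (30% and 20%).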
Now, let’s merge this into the Bremer data frame we created. I’ll make this an ordinal variable as well with the same 2, 1, 0 ordering scheme.
Bremer %>%
left_join(., Advanced %>% select(ccode, year, advanced) %>%
rename(ccode1 = ccode, advanced1 = advanced)) %>%
left_join(., Advanced %>% select(ccode, year, advanced) %>%
rename(ccode2 = ccode, advanced2 = advanced)) %>%
mutate(advancedcat = case_when(
advanced1 == 1 & advanced2 == 1 ~ 2,
(advanced1 == 1 & advanced2 == 0) | (advanced1 == 0 & advanced2 == 1) ~ 1,
advanced1 == 0 & advanced2 == 0 ~ 0
)) -> Bremer
#> Joining with `by = join_by(ccode1, year)`
#> Joining with `by = join_by(ccode2, year)`
# Let's inspect the output
Bremer %>% na.omit %>%
group_split(advancedcat)
#> <list_of<
#> tbl_df<
#> ccode1 : double
#> ccode2 : double
#> year : double
#> cinc1 : double
#> cinc2 : double
#> advanced1 : double
#> advanced2 : double
#> advancedcat: double
#> >
#> >[3]>
#> [[1]]
#> # A tibble: 538,707 × 8
#> ccode1 ccode2 year cinc1 cinc2 advanced1 advanced2 advancedcat
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 31 40 1986 0.0000349 0.00326 0 0 0
#> 2 31 40 1987 0.0000349 0.00328 0 0 0
#> 3 31 40 1988 0.0000460 0.00334 0 0 0
#> 4 31 40 1989 0.0000584 0.00335 0 0 0
#> 5 31 40 1990 0.0000511 0.00325 0 0 0
#> 6 31 40 1991 0.0000432 0.00330 0 0 0
#> 7 31 40 1992 0.0000444 0.00271 0 0 0
#> 8 31 40 1993 0.0000479 0.00265 0 0 0
#> 9 31 40 1994 0.0000365 0.00198 0 0 0
#> 10 31 40 1995 0.0000355 0.00161 0 0 0
#> # ℹ 538,697 more rows
#>
#> [[2]]
#> # A tibble: 344,483 × 8
#> ccode1 ccode2 year cinc1 cinc2 advanced1 advanced2 advancedcat
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 2 31 1986 0.135 0.0000349 1 0 1
#> 2 2 31 1987 0.134 0.0000349 1 0 1
#> 3 2 31 1988 0.134 0.0000460 1 0 1
#> 4 2 31 1989 0.148 0.0000584 1 0 1
#> 5 2 31 1990 0.141 0.0000511 1 0 1
#> 6 2 31 1991 0.137 0.0000432 1 0 1
#> 7 2 31 1992 0.148 0.0000444 1 0 1
#> 8 2 31 1993 0.153 0.0000479 1 0 1
#> 9 2 31 1994 0.146 0.0000365 1 0 1
#> 10 2 31 1995 0.140 0.0000355 1 0 1
#> # ℹ 344,473 more rows
#>
#> [[3]]
#> # A tibble: 54,945 × 8
#> ccode1 ccode2 year cinc1 cinc2 advanced1 advanced2 advancedcat
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 2 20 1920 0.290 0.0101 1 1 2
#> 2 2 20 1921 0.253 0.0105 1 1 2
#> 3 2 20 1922 0.256 0.00841 1 1 2
#> 4 2 20 1923 0.272 0.00986 1 1 2
#> 5 2 20 1924 0.254 0.00889 1 1 2
#> 6 2 20 1925 0.254 0.00870 1 1 2
#> 7 2 20 1926 0.263 0.00924 1 1 2
#> 8 2 20 1927 0.239 0.00937 1 1 2
#> 9 2 20 1928 0.240 0.00970 1 1 2
#> 10 2 20 1929 0.240 0.00980 1 1 2
#> # ℹ 54,935 more rows
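Since both state-level dummies are coded 0/1, one hedged alternative to spelling out each combination is to add them together. The sketch below uses made-up values (toy_pairs and toy_cat are hypothetical names) to show that the sum reproduces the same 2, 1, 0 scheme and leaves the category missing whenever either side is missing.

```r
library(dplyr)

# Made-up state-level dummies for five hypothetical dyad-years
toy_pairs <- tibble(
  advanced1 = c(1, 1, 0, 0, NA),
  advanced2 = c(1, 0, 1, 0, 1)
)

toy_cat <- toy_pairs %>%
  # 2 = both advanced, 1 = one advanced, 0 = neither; NA propagates
  mutate(advancedcat = advanced1 + advanced2)

toy_cat
```

The same shortcut would apply to any pair of 0/1 indicators, and it leaves less room for a copy-paste slip across the branches of a case_when() call.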
Finally, the manuscript creates a militarization measure that is a weak-link measure based on the data on military personnel and total population. Bremer opts for an approach similar to the development indicator he uses.
Instead, I relied on the material capabilities data set discussed above, and classified a state as more militarized if its share of system-wide military capabilities was greater than its share of system-wide demographic capabilities. I classified it less militarized if this was not true. The classification of each dyad-year was then based on whether both, one, or neither of the two states making up the dyad were more militarized in that year.
This is my reading of what he’s doing, again assuming that he’s using just the total population variable to measure “demographic capability.”
cow_nmc %>%
group_by(year) %>%
# calculate year proportions
mutate(prop_tpop = tpop/sum(tpop, na.rm=T),
prop_milex = milex/sum(milex, na.rm=T),
prop_milper = milper/sum(milper, na.rm=T)) %>%
ungroup() %>%
# standardize a "military capability" measure
# then make a militarized dummy
mutate(militcap = (prop_milper + prop_milex)/2,
militarized = ifelse(militcap > prop_tpop, 1, 0)) %>%
select(ccode, year, prop_tpop:ncol(.)) -> Militarized
Militarized
#> # A tibble: 15,951 × 7
#> ccode year prop_tpop prop_milex prop_milper militcap militarized
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 2 1816 0.0398 0.0682 0.00859 0.0384 0
#> 2 2 1817 0.0404 0.0451 0.00827 0.0267 0
#> 3 2 1818 0.0411 0.0370 0.00832 0.0227 0
#> 4 2 1819 0.0416 0.0449 0.00709 0.0260 0
#> 5 2 1820 0.0422 0.0310 0.00733 0.0192 0
#> 6 2 1821 0.0430 0.0345 0.00486 0.0197 0
#> 7 2 1822 0.0431 0.0249 0.00417 0.0146 0
#> 8 2 1823 0.0439 0.0249 0.00534 0.0151 0
#> 9 2 1824 0.0447 0.0295 0.00474 0.0171 0
#> 10 2 1825 0.0453 0.0321 0.00511 0.0186 0
#> # ℹ 15,941 more rows
Let’s merge this into the Bremer data we created and inspect the output.
Bremer %>%
left_join(., Militarized %>% select(ccode, year, militarized) %>%
rename(ccode1 = ccode, militarized1 = militarized)) %>%
left_join(., Militarized %>% select(ccode, year, militarized) %>%
rename(ccode2 = ccode, militarized2 = militarized)) %>%
mutate(militcat = case_when(
militarized1 == 1 & militarized2 == 1 ~ 2,
(militarized1 == 1 & militarized2 == 0) |
(militarized1 == 0 & militarized2 == 1) ~ 1,
militarized1 == 0 & militarized2 == 0 ~ 0
)) -> Bremer
#> Joining with `by = join_by(ccode1, year)`
#> Joining with `by = join_by(ccode2, year)`
Bremer %>% select(ccode1:year, militarized1:ncol(.)) %>%
na.omit %>%
group_split(militcat)
#> <list_of<
#> tbl_df<
#> ccode1 : double
#> ccode2 : double
#> year : double
#> militarized1: double
#> militarized2: double
#> militcat : double
#> >
#> >[3]>
#> [[1]]
#> # A tibble: 303,368 × 6
#> ccode1 ccode2 year militarized1 militarized2 militcat
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 2 20 1923 0 0 0
#> 2 2 20 1925 0 0 0
#> 3 2 20 1926 0 0 0
#> 4 2 20 1927 0 0 0
#> 5 2 20 1928 0 0 0
#> 6 2 20 1929 0 0 0
#> 7 2 20 1930 0 0 0
#> 8 2 20 1931 0 0 0
#> 9 2 20 1932 0 0 0
#> 10 2 20 1933 0 0 0
#> # ℹ 303,358 more rows
#>
#> [[2]]
#> # A tibble: 340,196 × 6
#> ccode1 ccode2 year militarized1 militarized2 militcat
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 2 20 1920 1 0 1
#> 2 2 20 1921 1 0 1
#> 3 2 20 1922 1 0 1
#> 4 2 20 1924 1 0 1
#> 5 2 20 1947 1 0 1
#> 6 2 20 1948 1 0 1
#> 7 2 20 1949 1 0 1
#> 8 2 20 1950 1 0 1
#> 9 2 20 1971 1 0 1
#> 10 2 20 1973 1 0 1
#> # ℹ 340,186 more rows
#>
#> [[3]]
#> # A tibble: 112,758 × 6
#> ccode1 ccode2 year militarized1 militarized2 militcat
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 2 20 1942 1 1 2
#> 2 2 20 1943 1 1 2
#> 3 2 20 1944 1 1 2
#> 4 2 20 1945 1 1 2
#> 5 2 20 1946 1 1 2
#> 6 2 20 1951 1 1 2
#> 7 2 20 1952 1 1 2
#> 8 2 20 1953 1 1 2
#> 9 2 20 1954 1 1 2
#> 10 2 20 1955 1 1 2
#> # ℹ 112,748 more rows
If you wanted to recreate the data as closely as possible to how Bremer (1992) did it almost 30 years ago, this is how you’d do it in peacesciencer (albeit with newer data). Still, I think the data innovations that have followed Bremer (1992) merit the approach employed in the manuscript.
Get Multiple Peace Years in One Fell Swoop
add_peace_years() is designed to work generally, based on the other data/functions included in the package. For example, assume you wanted to do a dyad-year analysis comparing the Correlates of War (CoW) Militarized Interstate Dispute (MID) data with the Gibler-Miller-Little conflict data. Just add both in the pipe and ask for peace-years.
cow_ddy %>%
# non-directed, politically relevant, for convenience
filter(ccode2 > ccode1) %>%
filter_prd() %>%
add_cow_mids(keep = NULL) %>%
add_gml_mids(keep = NULL) %>%
add_peace_years() -> NDY
#> Joining with `by = join_by(ccode1, ccode2, year)`
#> Joining with `by = join_by(ccode1, ccode2, year)`
#> add_cow_mids() IMPORTANT MESSAGE: By default, this function whittles
#> dispute-year data into dyad-year data by first selecting on unique onsets.
#> Thereafter, where duplicates remain, it whittles dispute-year data into
#> dyad-year data in the following order: 1) retaining highest `fatality`, 2)
#> retaining highest `hostlev`, 3) retaining highest estimated `mindur`, 4)
#> retaining highest estimated `maxdur`, 5) retaining reciprocated over
#> non-reciprocated observations, 6) retaining the observation with the lowest
#> start month, and, where duplicates still remained (and they don't), 7) forcibly
#> dropping all duplicates for observations that are otherwise very similar. See:
#> http://svmiller.com/peacesciencer/articles/coerce-dispute-year-dyad-year.html
#> Dyadic data are non-directed and initiation variables make no sense in this
#> context.
#> Joining with `by = join_by(ccode1, ccode2, year)`
#> add_gml_mids() IMPORTANT MESSAGE: By default, this function whittles
#> dispute-year data into dyad-year data by first selecting on unique onsets.
#> Thereafter, where duplicates remain, it whittles dispute-year data into
#> dyad-year data in the following order: 1) retaining highest `fatality`, 2)
#> retaining highest `hostlev`, 3) retaining highest estimated `mindur`, 4)
#> retaining highest estimated `maxdur`, 5) retaining reciprocated over
#> non-reciprocated observations, 6) retaining the observation with the lowest
#> start month, and, where duplicates still remained (and they don't), 7) forcibly
#> dropping all duplicates for observations that are otherwise very similar. See:
#> http://svmiller.com/peacesciencer/articles/coerce-dispute-year-dyad-year.html
#> Joining with `by = join_by(year, dyad)`
#> Joining with `by = join_by(year, dyad)`
# Here's a snapshot of U.S.-Cuba from 1980 onward for illustration's sake (the rows printed cover 1980-89).
NDY %>%
filter(ccode1 == 2 & ccode2 == 40) %>%
select(ccode1:year, cowmidongoing, gmlmidongoing, cowmidspell:gmlmidspell) %>%
filter(year >= 1980)
#> # A tibble: 37 × 7
#> ccode1 ccode2 year cowmidongoing gmlmidongoing cowmidspell gmlmidspell
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 2 40 1980 0 0 0 0
#> 2 2 40 1981 1 1 1 1
#> 3 2 40 1982 0 0 0 0
#> 4 2 40 1983 1 1 1 1
#> 5 2 40 1984 0 0 0 0
#> 6 2 40 1985 0 0 1 1
#> 7 2 40 1986 0 1 2 2
#> 8 2 40 1987 1 1 3 0
#> 9 2 40 1988 0 0 0 0
#> 10 2 40 1989 0 0 1 1
#> # ℹ 27 more rows
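The counting convention visible in this snapshot can be sketched with a toy function. This is not the package's internal code, just a minimal base R illustration in which peace_spell is a hypothetical helper. It assumes the counter starts at 0 in the first observed year, which happens to match 1980 here; the real function works from the full history of the dyad.

```r
# Hypothetical helper: count consecutive peace years preceding each year.
# The counter resets to 0 in the year after an ongoing dispute.
peace_spell <- function(ongoing) {
  spell <- integer(length(ongoing))
  for (i in seq_along(ongoing)[-1]) {
    spell[i] <- if (ongoing[i - 1] == 1) 0L else spell[i - 1] + 1L
  }
  spell
}

# The two "ongoing" columns for U.S.-Cuba, 1980-89, copied from the output above
cowmidongoing <- c(0, 1, 0, 1, 0, 0, 0, 1, 0, 0)
gmlmidongoing <- c(0, 1, 0, 1, 0, 0, 1, 1, 0, 0)

peace_spell(cowmidongoing)
peace_spell(gmlmidongoing)
```

Under these assumptions, the two vectors returned line up with the cowmidspell and gmlmidspell columns printed above, including the divergence in 1987 that follows from the 1986 dispute year recorded only in the Gibler-Miller-Little data.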
You can do this with state-year data as well. For example, you can compare how CoW and UCDP code civil wars differently since 1946. Do note, however, that the nature of different state systems used in these data sets means we’ll treat one as a master and merge other codes into it.
create_stateyears(system = 'gw') %>%
filter(between(year, 1946, 2019)) %>%
add_ccode_to_gw() %>%
add_ucdp_acd(type = "intrastate", only_wars = TRUE) %>%
add_cow_wars(type = "intra") %>%
# select just a few things
select(gwcode, ccode, year, statename, ucdpongoing, ucdponset,
cowintraongoing, cowintraonset) %>%
add_peace_years() %>%
select(gwcode:statename, ucdpspell, cowintraspell, everything()) %>%
# India is illustrative of how the two differ.
# UCDP has an intra-state conflict to the level of war early
# into its existence. CoW does not.
filter(gwcode == 750)
#> Joining with `by = join_by(gwcode, year)`
#> Joining with `by = join_by(gwcode, year)`
#> Joining with `by = join_by(year, ccode)`
#> Joining with `by = join_by(gwcode, year)`
#> Joining with `by = join_by(ccode, year)`
#> # A tibble: 73 × 10
#> gwcode ccode year statename ucdpspell cowintraspell ucdpongoing ucdponset
#> <dbl> <dbl> <dbl> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 750 750 1947 India 0 0 0 0
#> 2 750 750 1948 India 1 1 1 1
#> 3 750 750 1949 India 0 2 1 0
#> 4 750 750 1950 India 0 3 1 0
#> 5 750 750 1951 India 0 4 1 0
#> 6 750 750 1952 India 0 5 0 0
#> 7 750 750 1953 India 1 6 0 0
#> 8 750 750 1954 India 2 7 0 0
#> 9 750 750 1955 India 3 8 0 0
#> 10 750 750 1956 India 4 9 0 0
#> # ℹ 63 more rows
#> # ℹ 2 more variables: cowintraongoing <dbl>, cowintraonset <dbl>
Measure Leader Tenure in Days
create_leaderyears(), by default, returns an estimate of leader tenure as a running count of the unique calendar years in which the leader held office. I think this is a reasonable thing to include, and benchmarking to years is doing some internal lifting elsewhere in the function that generates leader-year data from leader-day data in Archigos. However, it can lead to some peculiar observations that may not square with how we knee-jerk think about leader tenure.
I will illustrate what I mean by this with the case of Jimmy Carter from leader-year data standardized to Correlates of War state system membership.
leader_years <- create_leaderyears(standardize = 'cow')
#> Joining with `by = join_by(gwcode, year)`
#> Joining with `by = join_by(ccode, date)`
leader_years %>% filter(obsid == "USA-1977")
#> # A tibble: 5 × 7
#> obsid leader ccode gender leaderage year yrinoffice
#> <chr> <chr> <dbl> <chr> <dbl> <dbl> <dbl>
#> 1 USA-1977 Carter 2 M 53 1977 1
#> 2 USA-1977 Carter 2 M 54 1978 2
#> 3 USA-1977 Carter 2 M 55 1979 3
#> 4 USA-1977 Carter 2 M 56 1980 4
#> 5 USA-1977 Carter 2 M 57 1981 5
Jimmy Carter took office in January 1977 (year 1) and had a tenure through 1978 (year 2), 1979 (year 3), 1980 (year 4), and exited office in January 1981 (year 5). We know presidents in the American context have four-year terms. This output suggests five years.
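The gap between the two ways of counting is easy to verify with base R date arithmetic. The sketch below uses Carter's inauguration and exit dates (Jan. 20, 1977 and Jan. 20, 1981) and counts both days inclusively; the object names are hypothetical.

```r
took_office <- as.Date("1977-01-20")
left_office <- as.Date("1981-01-20")

# every calendar day on which Carter held office, first and last day included
days <- seq(took_office, left_office, by = "day")

days_in_office <- length(days)                            # elapsed days
calendar_years <- length(unique(format(days, "%Y")))      # years touched

days_in_office
calendar_years
days_in_office / 365.25   # roughly four years, not five
```

A single four-year term touches five calendar years, which is all the yrinoffice column is counting.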
If this is that problematic for the research design, especially one that may be interested in what happens to leader behavior after a certain amount of time in office, a user can do something like generate estimates of leader tenure in a given year to the day. Basically, once the core leader-year data are generated, the user can use the create_leaderdays() function and summarize leader tenure in the year as the minimum number of days the leader was in office in the year and the maximum number of days the leader was in office in the year.
# don't standardize the leader-days for this use, just to be safe.
create_leaderdays(standardize = 'none') %>%
# extract year from date
mutate(year = lubridate::year(date)) %>%
# group by leader
group_by(obsid) %>%
# count days in office, for leader tenure
mutate(daysinoffice = row_number()) %>%
# group-by leader and year
group_by(obsid, year) %>%
# how long was the minimum (maximum) days in office for the leader in the year?
summarize(min_daysoffice = min(daysinoffice),
max_dayoffice = max(daysinoffice)) %>%
#practice safe group-by, and assign to object
ungroup() -> leader_tenures
#> `summarise()` has grouped output by 'obsid'. You can override using the
#> `.groups` argument.
# add this information to our data
leader_years %>%
left_join(., leader_tenures) -> leader_years
#> Joining with `by = join_by(obsid, year)`
Here’s what this would look like in the case of Jimmy Carter.
leader_years %>% filter(obsid == "USA-1977")
#> # A tibble: 5 × 9
#> obsid leader ccode gender leaderage year yrinoffice min_daysoffice
#> <chr> <chr> <dbl> <chr> <dbl> <dbl> <dbl> <int>
#> 1 USA-1977 Carter 2 M 53 1977 1 1
#> 2 USA-1977 Carter 2 M 54 1978 2 347
#> 3 USA-1977 Carter 2 M 55 1979 3 712
#> 4 USA-1977 Carter 2 M 56 1980 4 1077
#> 5 USA-1977 Carter 2 M 57 1981 5 1443
#> # ℹ 1 more variable: max_dayoffice <int>
This measure might be more useful. Basically, Jimmy Carter was a new leader in 1977 (min_daysoffice = 1). By 1978, he had almost a year under his belt (i.e. Jan. 1, 1978 was his 347th day in office). By the time he left office in 1981, he had completed 1,462 days on the job.
create_leaderyears() elects to not create this information for the user. No matter, it does not take much effort for the user to create it if this is the kind of information they want.
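For a research design that cares about completed time in office, the day counts can be converted into years served at the start of each calendar year. This is a hedged sketch using Carter's min_daysoffice values from the output above. The new column names are hypothetical, and dividing by 365.25 is an approximation rather than anything the package does.

```r
library(dplyr)

# Carter's rows, with min_daysoffice copied from the output above
carter <- tibble(
  year           = 1977:1981,
  yrinoffice     = 1:5,
  min_daysoffice = c(1L, 347L, 712L, 1077L, 1443L)
)

carter <- carter %>%
  mutate(
    # days already served before the first day in office that year
    years_served_at_start = (min_daysoffice - 1) / 365.25,
    # completed full years, e.g. for "at least N years in office" cut-offs
    full_years_at_start   = floor(years_served_at_start)
  )

carter
```

By this count, Carter had completed three full years when 1981 began, even though yrinoffice reports a fifth year.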