Various Parlor Tricks in `{peacesciencer}`

library(tidyverse)
library(peacesciencer)

packageVersion("peacesciencer")
#> [1] '1.2.0'
packageVersion("isard") # a dependency, but not formally required.
#> [1] '0.1.0'
Sys.Date()
#> [1] "2025-07-17"

This is a running list of various parlor tricks that you can do with the data and functions in peacesciencer. Space and time considerations, along with some rigidity imposed by CRAN guidelines, preclude me from including these as outright functions or belaboring them in greater detail in the manuscript. Again, peacesciencer can do a lot, but it can’t do everything. Yet some of its functionality may also not be obvious from the manuscript or documentation files because it is not necessarily part of the core functions. Thus “parlor trick” is an appropriate descriptor here.

Create a “New State” Variable

The manuscript includes a partial replication of a state-year civil conflict analysis analogous to Fearon and Laitin (2003) and Gibler and Miller (2014). Both of those analyses include a “new state” variable, arguing that states within the first two years of their existence are more likely to experience a civil war onset. The partial replication does not include this variable because the easiest way to create it is through a group_by() and mutate() based on the row number within the group, but group_by() has the unfortunate side effect of erasing any other attributes in the data (i.e. the ps_system and ps_type attributes). This would break the peacesciencer pipe. If you want this variable, I recommend creating it and merging it in after creating the bulk of the data.

Here’s how you’d do it.

# Hypothetical main data
create_stateyears(system = 'gw') %>%
  filter(between(year, 1946, 2019)) %>%
  add_ucdp_acd(type = "intrastate") %>%
  add_peace_years() -> Data
#> Joining with `by = join_by(gwcode, year)`
#> Joining with `by = join_by(gwcode, year)`

# Add in new state variable after the fact
create_stateyears(system = 'gw') %>%
  group_by(gwcode) %>%
  mutate(newstate = ifelse(row_number() <= 2, 1, 0)) %>%
  left_join(Data, .) %>%
  select(gwcode:ucdponset, newstate, everything()) -> Data
#> Joining with `by = join_by(gwcode, gw_name, microstate, year)`

# Proof of concept: Here's India
Data %>% filter(gwcode == 750)
#> # A tibble: 73 × 10
#>    gwcode gw_name microstate  year ucdpongoing ucdponset newstate maxintensity
#>     <dbl> <chr>        <dbl> <dbl>       <dbl>     <dbl>    <dbl>        <dbl>
#>  1    750 India            0  1947           0         0        1           NA
#>  2    750 India            0  1948           1         1        1            2
#>  3    750 India            0  1949           1         0        0            2
#>  4    750 India            0  1950           1         0        0            2
#>  5    750 India            0  1951           1         0        0            2
#>  6    750 India            0  1952           0         0        0           NA
#>  7    750 India            0  1953           0         0        0           NA
#>  8    750 India            0  1954           0         0        0           NA
#>  9    750 India            0  1955           0         0        0           NA
#> 10    750 India            0  1956           1         1        0            1
#> # ℹ 63 more rows
#> # ℹ 2 more variables: conflict_ids <chr>, ucdpspell <dbl>

# And here's Belize
Data %>% filter(gwcode == 80)
#> # A tibble: 39 × 10
#>    gwcode gw_name microstate  year ucdpongoing ucdponset newstate maxintensity
#>     <dbl> <chr>        <dbl> <dbl>       <dbl>     <dbl>    <dbl>        <dbl>
#>  1     80 Belize           0  1981           0         0        1           NA
#>  2     80 Belize           0  1982           0         0        1           NA
#>  3     80 Belize           0  1983           0         0        0           NA
#>  4     80 Belize           0  1984           0         0        0           NA
#>  5     80 Belize           0  1985           0         0        0           NA
#>  6     80 Belize           0  1986           0         0        0           NA
#>  7     80 Belize           0  1987           0         0        0           NA
#>  8     80 Belize           0  1988           0         0        0           NA
#>  9     80 Belize           0  1989           0         0        0           NA
#> 10     80 Belize           0  1990           0         0        0           NA
#> # ℹ 29 more rows
#> # ℹ 2 more variables: conflict_ids <chr>, ucdpspell <dbl>
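
The logic of the new-state dummy can be checked on a toy panel. What follows is a minimal sketch, not anything built into peacesciencer: the codes and years in `toy_panel` are made up, and it assumes the rows are sorted by year within each state, since row_number() counts rows in the order they appear.

```r
library(dplyr)

# Toy state-year panel; the codes and years are hypothetical
toy_panel <- tibble(
  code = c(1, 1, 1, 1, 2, 2, 2),
  year = c(1950, 1951, 1952, 1953, 1960, 1961, 1962)
)

# Flag the first two observed years of each state
toy_panel %>%
  arrange(code, year) %>%   # row_number() depends on row order
  group_by(code) %>%
  mutate(newstate = ifelse(row_number() <= 2, 1, 0)) %>%
  ungroup() -> toy_newstate

# A left-truncated "main" data set, analogous to filtering on 1946-2019
toy_main <- toy_panel %>% filter(year >= 1952)

# Merge the dummy in after the fact
toy_main <- left_join(toy_main, toy_newstate, by = c("code", "year"))
toy_main
```

The dummy is built from the full panel and only then merged into the truncated data. Building it from the truncated data would flag the first two rows of every state that predates the truncation point as “new.”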

Code Capabilities/Development/Militarization as Bremer (1992) Did

The manuscript includes a replication of Bremer’s (1992) “dangerous dyads” design, albeit one that leverages newer/better data sources that were unavailable to Bremer at the time. For convenience’s sake, the replication used other approaches to estimating Bremer’s variables, including the “weak-link” mechanisms that Dixon (1994) introduced in his seminal work on democratic conflict resolution. If the user wants to recreate some of the covariates as Bremer (1992) did, here is how to do it.

The covariates in question concern information grabbed from the Correlates of War national material capabilities data set.¹ For example, the user guide recreates the “relative power” variable as a proportion of the lower composite index of national capabilities (CINC) variable over the higher one. Bremer opts for a different approach, defining a “relative power” variable as a three-part ordinal category where the more powerful side has a CINC score that is 1) 10 times higher than the less powerful side, 2) three times higher than the other side, or 3) less than three times higher than the other side. Here is the exact passage on p. 322.

Based on these CINC scores, I computed the larger-to-smaller capability ratios for all dyad-years and classified them into three groups. If the capability ratio was less than or equal to three, then the dyad was considered to constitute a case of small power difference. If the ratio was larger than 10, then the power difference was coded as large, whereas a ratio between 3 and 10 was coded as a medium power difference. If either of the CINC scores was missing (or equal to zero) for a ratio calculation, then the power difference score for that dyad was coded as missing also.

This is an easy case_when() function, but it also would’ve consumed more space and words in a manuscript than the allocated journal space would allow. There’s added difficulty in making sure to identify which side in a non-directed dyad-year is more powerful.

cow_ddy %>% # built-in data set for convenience
  filter(ccode2 > ccode1) %>% # make it non-directed
  # add CINC scores
  add_nmc() %>%
  # select just what we want
  select(ccode1:year, cinc1, cinc2) -> Bremer

Bremer %>% 
  # create a three-item ordinal relative power category with values 2, 1, and 0
  mutate(relpow = case_when(
    (cinc1 > cinc2) & (cinc1 > 10*cinc2) ~ 2,
    # medium: more than 3 and up to 10 times the other side
    (cinc1 > cinc2) & ((cinc1 > 3*cinc2) & (cinc1 <= 10*cinc2)) ~ 1,
    (cinc1 > cinc2) & (cinc1 <= 3*cinc2) ~ 0,
    # copy-paste, re-arrange
    (cinc2 > cinc1) & (cinc2 > 10*cinc1) ~ 2,
    (cinc2 > cinc1) & ((cinc2 > 3*cinc1) & (cinc2 <= 10*cinc1)) ~ 1,
    (cinc2 > cinc1) & (cinc2 <= 3*cinc1) ~ 0,
    TRUE ~ NA_real_
  )) -> relpow_example

# Let's inspect the output.
relpow_example %>% na.omit %>% 
  mutate(whichside = ifelse(cinc1 > cinc2, "ccode1 > ccode2", 
                            "ccode2 >= ccode1")) %>%
  group_split(whichside, relpow)
#> <list_of<
#>   tbl_df<
#>     ccode1   : double
#>     ccode2   : double
#>     year     : double
#>     cinc1    : double
#>     cinc2    : double
#>     relpow   : double
#>     whichside: character
#>   >
#> >[6]>
#> [[1]]
#> # A tibble: 132,639 × 7
#>    ccode1 ccode2  year cinc1 cinc2 relpow whichside      
#>     <dbl>  <dbl> <dbl> <dbl> <dbl>  <dbl> <chr>          
#>  1      2    200  1892 0.173 0.173      0 ccode1 > ccode2
#>  2      2    200  1897 0.169 0.166      0 ccode1 > ccode2
#>  3      2    200  1898 0.197 0.157      0 ccode1 > ccode2
#>  4      2    200  1899 0.185 0.169      0 ccode1 > ccode2
#>  5      2    200  1900 0.188 0.178      0 ccode1 > ccode2
#>  6      2    200  1901 0.203 0.174      0 ccode1 > ccode2
#>  7      2    200  1902 0.208 0.161      0 ccode1 > ccode2
#>  8      2    200  1903 0.210 0.143      0 ccode1 > ccode2
#>  9      2    200  1904 0.205 0.135      0 ccode1 > ccode2
#> 10      2    200  1905 0.214 0.121      0 ccode1 > ccode2
#> # ℹ 132,629 more rows
#> 
#> [[2]]
#> # A tibble: 114,225 × 7
#>    ccode1 ccode2  year  cinc1   cinc2 relpow whichside      
#>     <dbl>  <dbl> <dbl>  <dbl>   <dbl>  <dbl> <chr>          
#>  1      2     70  1831 0.0420 0.00945      1 ccode1 > ccode2
#>  2      2     70  1832 0.0445 0.00963      1 ccode1 > ccode2
#>  3      2     70  1833 0.0481 0.00958      1 ccode1 > ccode2
#>  4      2     70  1834 0.0478 0.00971      1 ccode1 > ccode2
#>  5      2     70  1835 0.0485 0.00980      1 ccode1 > ccode2
#>  6      2     70  1836 0.0510 0.00941      1 ccode1 > ccode2
#>  7      2     70  1837 0.0535 0.00975      1 ccode1 > ccode2
#>  8      2     70  1838 0.0533 0.00966      1 ccode1 > ccode2
#>  9      2     70  1839 0.0508 0.00948      1 ccode1 > ccode2
#> 10      2     70  1840 0.0495 0.00898      1 ccode1 > ccode2
#> # ℹ 114,215 more rows
#> 
#> [[3]]
#> # A tibble: 198,867 × 7
#>    ccode1 ccode2  year cinc1   cinc2 relpow whichside      
#>     <dbl>  <dbl> <dbl> <dbl>   <dbl>  <dbl> <chr>          
#>  1      2     20  1920 0.290 0.0101       2 ccode1 > ccode2
#>  2      2     20  1921 0.253 0.0105       2 ccode1 > ccode2
#>  3      2     20  1922 0.256 0.00841      2 ccode1 > ccode2
#>  4      2     20  1923 0.272 0.00986      2 ccode1 > ccode2
#>  5      2     20  1924 0.254 0.00889      2 ccode1 > ccode2
#>  6      2     20  1925 0.254 0.00870      2 ccode1 > ccode2
#>  7      2     20  1926 0.263 0.00924      2 ccode1 > ccode2
#>  8      2     20  1927 0.239 0.00937      2 ccode1 > ccode2
#>  9      2     20  1928 0.240 0.00970      2 ccode1 > ccode2
#> 10      2     20  1929 0.240 0.00980      2 ccode1 > ccode2
#> # ℹ 198,857 more rows
#> 
#> [[4]]
#> # A tibble: 141,100 × 7
#>    ccode1 ccode2  year  cinc1 cinc2 relpow whichside       
#>     <dbl>  <dbl> <dbl>  <dbl> <dbl>  <dbl> <chr>           
#>  1      2    200  1861 0.144  0.258      0 ccode2 >= ccode1
#>  2      2    200  1862 0.176  0.251      0 ccode2 >= ccode1
#>  3      2    200  1863 0.179  0.251      0 ccode2 >= ccode1
#>  4      2    200  1864 0.193  0.243      0 ccode2 >= ccode1
#>  5      2    200  1865 0.135  0.256      0 ccode2 >= ccode1
#>  6      2    200  1866 0.0982 0.248      0 ccode2 >= ccode1
#>  7      2    200  1867 0.114  0.253      0 ccode2 >= ccode1
#>  8      2    200  1868 0.107  0.253      0 ccode2 >= ccode1
#>  9      2    200  1869 0.108  0.246      0 ccode2 >= ccode1
#> 10      2    200  1870 0.0990 0.242      0 ccode2 >= ccode1
#> # ℹ 141,090 more rows
#> 
#> [[5]]
#> # A tibble: 133,564 × 7
#>    ccode1 ccode2  year  cinc1 cinc2 relpow whichside       
#>     <dbl>  <dbl> <dbl>  <dbl> <dbl>  <dbl> <chr>           
#>  1      2    200  1816 0.0397 0.337      1 ccode2 >= ccode1
#>  2      2    200  1817 0.0358 0.328      1 ccode2 >= ccode1
#>  3      2    200  1818 0.0361 0.329      1 ccode2 >= ccode1
#>  4      2    200  1819 0.0371 0.317      1 ccode2 >= ccode1
#>  5      2    200  1820 0.0371 0.317      1 ccode2 >= ccode1
#>  6      2    200  1821 0.0342 0.317      1 ccode2 >= ccode1
#>  7      2    200  1822 0.0329 0.311      1 ccode2 >= ccode1
#>  8      2    200  1823 0.0331 0.318      1 ccode2 >= ccode1
#>  9      2    200  1824 0.0330 0.330      1 ccode2 >= ccode1
#> 10      2    200  1825 0.0342 0.331      1 ccode2 >= ccode1
#> # ℹ 133,554 more rows
#> 
#> [[6]]
#> # A tibble: 235,749 × 7
#>    ccode1 ccode2  year   cinc1  cinc2 relpow whichside       
#>     <dbl>  <dbl> <dbl>   <dbl>  <dbl>  <dbl> <chr>           
#>  1     20    200  1920 0.0101  0.128       2 ccode2 >= ccode1
#>  2     20    200  1922 0.00841 0.0945      2 ccode2 >= ccode1
#>  3     20    200  1923 0.00986 0.0990      2 ccode2 >= ccode1
#>  4     20    200  1924 0.00889 0.107       2 ccode2 >= ccode1
#>  5     20    200  1925 0.00870 0.0956      2 ccode2 >= ccode1
#>  6     20    200  1939 0.00909 0.0997      2 ccode2 >= ccode1
#>  7     20    255  1934 0.00891 0.0891      2 ccode2 >= ccode1
#>  8     20    255  1935 0.00874 0.103       2 ccode2 >= ccode1
#>  9     20    255  1936 0.00865 0.115       2 ccode2 >= ccode1
#> 10     20    255  1937 0.00893 0.118       2 ccode2 >= ccode1
#> # ℹ 235,739 more rows
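
The three-way classification in Bremer’s passage can also be written in terms of the larger-to-smaller ratio directly, which sidesteps the question of which side is more powerful. This is a sketch under my reading of the passage quoted above. `code_relpow()` is a hypothetical helper, not a peacesciencer function, and the toy CINC values are made up.

```r
library(dplyr)

# Hypothetical helper: three-category relative power from two CINC scores
code_relpow <- function(cinc1, cinc2) {
  hi <- pmax(cinc1, cinc2)   # stronger side (NA if either score is NA)
  lo <- pmin(cinc1, cinc2)   # weaker side
  ratio <- hi / lo
  case_when(
    is.na(ratio) | lo == 0 ~ NA_real_,  # missing or zero CINC: missing
    ratio > 10 ~ 2,                     # large power difference
    ratio > 3 ~ 1,                      # medium power difference
    TRUE ~ 0                            # small: ratio of three or less
  )
}

# Toy dyads, including ratios of exactly 10 and exactly 3
toy_cinc <- tibble(
  cinc1 = c(0.30, 0.05, 0.02, 0.3125, 0.09375, 0, NA),
  cinc2 = c(0.02, 0.01, 0.02, 0.03125, 0.03125, 0.1, 0.1)
)

toy_cinc %>% mutate(relpow = code_relpow(cinc1, cinc2))
```

Because the ratio is always larger over smaller, there is no need to copy-paste and re-arrange the conditions for each side of the dyad.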

Next, the manuscript codes Bremer’s (1992) development/“advanced economies” measure using the weak-link of the lower GDP per capita in the dyad using the simulations from Anders et al. (2020). In my defense, this is exactly the kind of data Bremer wished he had available to him. He says so himself in footnote 26 on page 324.

Under the most optimistic assumptions about data availability, I would estimate that the number of dyad-years for which the relevant data [GNP or GDP per capita] could be assembled would be less than 20% of the total dyad-years under consideration. A more realistic estimate might be as low as 10%. Clearly, our ability to test a generalization when 80% to 90% of the needed data are missing is very limited, and especially so in this case, because the missing data would be concentrated heavily in the pre-World War II era and less advanced states.

Given this limitation, Bremer uses this approach to coding the development/“advanced economies” measure.

A more economically advanced state should be characterized by possessing a share of system-wide economic capability that is greater than its share of system-wide demographic capability. Hence, in years when this was found to be true, I classified a state as more advanced; otherwise, less advanced. The next step involved examining each pair of states in each year and assigning it to one of three groups: both more advanced (7,160 dyad-years), one more advanced (61,823 dyad-years), and both less advanced (128,939 dyad-years).

Replicating this approach is going to require group-by summaries of the raw national material capabilities data, which is outside of peacesciencer’s core functionality. Bremer’s wording here is a little vague; he doesn’t explain what variable, or variables, comprise “economic capability” and “demographic capability.” Let’s assume that “demographic capability” is just the total population variable whereas the “economic capability” variables include iron and steel production and primary energy consumption. The variable would look something like this.

cow_nmc %>%
  group_by(year) %>%
  # calculate year proportions
  mutate(prop_tpop = tpop/sum(tpop, na.rm=T),
         prop_irst = irst/sum(irst, na.rm=T),
         prop_pec = pec/sum(pec, na.rm=T)) %>%
  ungroup() %>%
  # standardize an "economic capability" measure
  # then make an advanced dummy
  mutate(econcap = (prop_irst + prop_pec)/2,
         advanced = ifelse(econcap > prop_tpop, 1, 0)) %>%
  select(ccode, year, prop_tpop:ncol(.)) -> Advanced

Advanced
#> # A tibble: 15,951 × 7
#>    ccode  year prop_tpop prop_irst prop_pec econcap advanced
#>    <dbl> <dbl>     <dbl>     <dbl>    <dbl>   <dbl>    <dbl>
#>  1     2  1816    0.0398    0.0954  0.00966  0.0525        1
#>  2     2  1817    0.0404    0.0938  0.0103   0.0520        1
#>  3     2  1818    0.0411    0.102   0.0110   0.0564        1
#>  4     2  1819    0.0416    0.101   0.0104   0.0555        1
#>  5     2  1820    0.0422    0.113   0.0105   0.0617        1
#>  6     2  1821    0.0430    0.0927  0.0108   0.0518        1
#>  7     2  1822    0.0431    0.0950  0.0109   0.0530        1
#>  8     2  1823    0.0439    0.0933  0.0111   0.0522        1
#>  9     2  1824    0.0447    0.0861  0.0122   0.0491        1
#> 10     2  1825    0.0453    0.0891  0.0129   0.0510        1
#> # ℹ 15,941 more rows
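
It is easy to check what the year-share calculation is doing on a toy example. This is a minimal sketch with made-up numbers. `toy_nmc` only borrows the column names from the capabilities data, and `system_share()` is a hypothetical helper rather than anything in peacesciencer.

```r
library(dplyr)

# Toy capabilities data; all values are hypothetical
toy_nmc <- tibble(
  ccode = c(1, 2, 3, 1, 2, 3),
  year  = c(1900, 1900, 1900, 1901, 1901, 1901),
  tpop  = c(50, 30, 20, 60, 20, 20),
  irst  = c(80, 10, 10, 10, 60, 30),
  pec   = c(60, 30, 10, 30, NA, 70)
)

# Hypothetical helper: a state's share of the yearly system total
system_share <- function(x) x / sum(x, na.rm = TRUE)

toy_nmc %>%
  group_by(year) %>%
  mutate(across(c(tpop, irst, pec), system_share, .names = "prop_{.col}")) %>%
  ungroup() %>%
  # average the two economic shares, then compare to the population share
  mutate(econcap = (prop_irst + prop_pec) / 2,
         advanced = ifelse(econcap > prop_tpop, 1, 0)) -> toy_advanced

toy_advanced
```

Note that `na.rm = TRUE` only protects the yearly total. A state with a missing component in a given year still gets a missing share, and thus a missing `advanced` dummy, for that year.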

Now, let’s merge this into the Bremer data frame we created. I’ll make this an ordinal variable as well with the same 2, 1, 0 ordering scheme.

Bremer %>%
  left_join(., Advanced %>% select(ccode, year, advanced) %>%
              rename(ccode1 = ccode, advanced1 = advanced)) %>%
  left_join(., Advanced %>% select(ccode, year, advanced) %>% 
              rename(ccode2 = ccode, advanced2 = advanced)) %>%
  mutate(advancedcat = case_when(
    advanced1 == 1 & advanced2 == 1 ~ 2,
    (advanced1 == 1 & advanced2 == 0) | (advanced1 == 0 & advanced2 == 1) ~ 1,
    advanced1 == 0 & advanced2 == 0 ~ 0
  )) -> Bremer
#> Joining with `by = join_by(ccode1, year)`
#> Joining with `by = join_by(ccode2, year)`

# Let's inspect the output
Bremer %>% na.omit %>%
  group_split(advancedcat) 
#> <list_of<
#>   tbl_df<
#>     ccode1     : double
#>     ccode2     : double
#>     year       : double
#>     cinc1      : double
#>     cinc2      : double
#>     advanced1  : double
#>     advanced2  : double
#>     advancedcat: double
#>   >
#> >[3]>
#> [[1]]
#> # A tibble: 538,707 × 8
#>    ccode1 ccode2  year     cinc1   cinc2 advanced1 advanced2 advancedcat
#>     <dbl>  <dbl> <dbl>     <dbl>   <dbl>     <dbl>     <dbl>       <dbl>
#>  1     31     40  1986 0.0000349 0.00326         0         0           0
#>  2     31     40  1987 0.0000349 0.00328         0         0           0
#>  3     31     40  1988 0.0000460 0.00334         0         0           0
#>  4     31     40  1989 0.0000584 0.00335         0         0           0
#>  5     31     40  1990 0.0000511 0.00325         0         0           0
#>  6     31     40  1991 0.0000432 0.00330         0         0           0
#>  7     31     40  1992 0.0000444 0.00271         0         0           0
#>  8     31     40  1993 0.0000479 0.00265         0         0           0
#>  9     31     40  1994 0.0000365 0.00198         0         0           0
#> 10     31     40  1995 0.0000355 0.00161         0         0           0
#> # ℹ 538,697 more rows
#> 
#> [[2]]
#> # A tibble: 344,483 × 8
#>    ccode1 ccode2  year cinc1     cinc2 advanced1 advanced2 advancedcat
#>     <dbl>  <dbl> <dbl> <dbl>     <dbl>     <dbl>     <dbl>       <dbl>
#>  1      2     31  1986 0.135 0.0000349         1         0           1
#>  2      2     31  1987 0.134 0.0000349         1         0           1
#>  3      2     31  1988 0.134 0.0000460         1         0           1
#>  4      2     31  1989 0.148 0.0000584         1         0           1
#>  5      2     31  1990 0.141 0.0000511         1         0           1
#>  6      2     31  1991 0.137 0.0000432         1         0           1
#>  7      2     31  1992 0.148 0.0000444         1         0           1
#>  8      2     31  1993 0.153 0.0000479         1         0           1
#>  9      2     31  1994 0.146 0.0000365         1         0           1
#> 10      2     31  1995 0.140 0.0000355         1         0           1
#> # ℹ 344,473 more rows
#> 
#> [[3]]
#> # A tibble: 54,945 × 8
#>    ccode1 ccode2  year cinc1   cinc2 advanced1 advanced2 advancedcat
#>     <dbl>  <dbl> <dbl> <dbl>   <dbl>     <dbl>     <dbl>       <dbl>
#>  1      2     20  1920 0.290 0.0101          1         1           2
#>  2      2     20  1921 0.253 0.0105          1         1           2
#>  3      2     20  1922 0.256 0.00841         1         1           2
#>  4      2     20  1923 0.272 0.00986         1         1           2
#>  5      2     20  1924 0.254 0.00889         1         1           2
#>  6      2     20  1925 0.254 0.00870         1         1           2
#>  7      2     20  1926 0.263 0.00924         1         1           2
#>  8      2     20  1927 0.239 0.00937         1         1           2
#>  9      2     20  1928 0.240 0.00970         1         1           2
#> 10      2     20  1929 0.240 0.00980         1         1           2
#> # ℹ 54,935 more rows
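
Since both state-level indicators are 0/1 dummies, the 2, 1, 0 ordering scheme is also just their sum. This is a sketch of that alternative on made-up values, not the approach used in the code above. It returns the same categories, and a missing dummy on either side yields a missing category.

```r
library(dplyr)

# Toy dyads with hypothetical state-level dummies
toy_dyads <- tibble(
  advanced1 = c(1, 1, 0, 0, NA),
  advanced2 = c(1, 0, 1, 0, 1)
)

# both = 2, one = 1, neither = 0, NA if either side is missing
toy_dyads <- toy_dyads %>%
  mutate(advancedcat = advanced1 + advanced2)

toy_dyads
```

The same shortcut would apply to any other pair of state-level dummies coded this way, and it leaves less room for copy-paste slips than spelling out each combination.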

Finally, the manuscript creates a militarization measure as a weak-link measure based on the data on military personnel and total population. Bremer opts for an approach similar to the development indicator he uses.

Instead, I relied on the material capabilities data set discussed above, and classified a state as more militarized if its share of system-wide military capabilities was greater than its share of system-wide demographic capabilities. I classified it less militarized if this was not true. The classification of each dyad-year was then based on whether both, one, or neither of the two states making up the dyad were more militarized in that year.

The following code is my reading of what he is doing, again assuming that he is using just the total population variable to measure “demographic capability.”

cow_nmc %>%
  group_by(year) %>%
  # calculate year proportions
  mutate(prop_tpop = tpop/sum(tpop, na.rm=T),
         prop_milex = milex/sum(milex, na.rm=T),
         prop_milper = milper/sum(milper, na.rm=T)) %>%
  ungroup() %>%
  # standardize a "military capability" measure
  # then make a militarized dummy
  mutate(militcap = (prop_milper + prop_milex)/2,
         militarized = ifelse(militcap > prop_tpop, 1, 0)) %>%
  select(ccode, year, prop_tpop:ncol(.)) -> Militarized

Militarized
#> # A tibble: 15,951 × 7
#>    ccode  year prop_tpop prop_milex prop_milper militcap militarized
#>    <dbl> <dbl>     <dbl>      <dbl>       <dbl>    <dbl>       <dbl>
#>  1     2  1816    0.0398     0.0682     0.00859   0.0384           0
#>  2     2  1817    0.0404     0.0451     0.00827   0.0267           0
#>  3     2  1818    0.0411     0.0370     0.00832   0.0227           0
#>  4     2  1819    0.0416     0.0449     0.00709   0.0260           0
#>  5     2  1820    0.0422     0.0310     0.00733   0.0192           0
#>  6     2  1821    0.0430     0.0345     0.00486   0.0197           0
#>  7     2  1822    0.0431     0.0249     0.00417   0.0146           0
#>  8     2  1823    0.0439     0.0249     0.00534   0.0151           0
#>  9     2  1824    0.0447     0.0295     0.00474   0.0171           0
#> 10     2  1825    0.0453     0.0321     0.00511   0.0186           0
#> # ℹ 15,941 more rows

Let’s merge this into the Bremer data we created and inspect the output.

Bremer %>%
  left_join(., Militarized %>% select(ccode, year, militarized) %>% 
              rename(ccode1 = ccode, militarized1 = militarized)) %>%
  left_join(., Militarized %>% select(ccode, year, militarized) %>% 
              rename(ccode2 = ccode, militarized2 = militarized)) %>%
  mutate(militcat = case_when(
    militarized1 == 1 & militarized2 == 1 ~ 2,
    # exactly one side is more militarized
    (militarized1 == 1 & militarized2 == 0) | 
      (militarized1 == 0 & militarized2 == 1) ~ 1,
    militarized1 == 0 & militarized2 == 0 ~ 0
  )) -> Bremer
#> Joining with `by = join_by(ccode1, year)`
#> Joining with `by = join_by(ccode2, year)`

Bremer %>% select(ccode1:year, militarized1:ncol(.)) %>% 
  na.omit %>%
  group_split(militcat) 
#> <list_of<
#>   tbl_df<
#>     ccode1      : double
#>     ccode2      : double
#>     year        : double
#>     militarized1: double
#>     militarized2: double
#>     militcat    : double
#>   >
#> >[3]>
#> [[1]]
#> # A tibble: 303,368 × 6
#>    ccode1 ccode2  year militarized1 militarized2 militcat
#>     <dbl>  <dbl> <dbl>        <dbl>        <dbl>    <dbl>
#>  1      2     20  1923            0            0        0
#>  2      2     20  1925            0            0        0
#>  3      2     20  1926            0            0        0
#>  4      2     20  1927            0            0        0
#>  5      2     20  1928            0            0        0
#>  6      2     20  1929            0            0        0
#>  7      2     20  1930            0            0        0
#>  8      2     20  1931            0            0        0
#>  9      2     20  1932            0            0        0
#> 10      2     20  1933            0            0        0
#> # ℹ 303,358 more rows
#> 
#> [[2]]
#> # A tibble: 340,196 × 6
#>    ccode1 ccode2  year militarized1 militarized2 militcat
#>     <dbl>  <dbl> <dbl>        <dbl>        <dbl>    <dbl>
#>  1      2     20  1920            1            0        1
#>  2      2     20  1921            1            0        1
#>  3      2     20  1922            1            0        1
#>  4      2     20  1924            1            0        1
#>  5      2     20  1947            1            0        1
#>  6      2     20  1948            1            0        1
#>  7      2     20  1949            1            0        1
#>  8      2     20  1950            1            0        1
#>  9      2     20  1971            1            0        1
#> 10      2     20  1973            1            0        1
#> # ℹ 340,186 more rows
#> 
#> [[3]]
#> # A tibble: 112,758 × 6
#>    ccode1 ccode2  year militarized1 militarized2 militcat
#>     <dbl>  <dbl> <dbl>        <dbl>        <dbl>    <dbl>
#>  1      2     20  1942            1            1        2
#>  2      2     20  1943            1            1        2
#>  3      2     20  1944            1            1        2
#>  4      2     20  1945            1            1        2
#>  5      2     20  1946            1            1        2
#>  6      2     20  1951            1            1        2
#>  7      2     20  1952            1            1        2
#>  8      2     20  1953            1            1        2
#>  9      2     20  1954            1            1        2
#> 10      2     20  1955            1            1        2
#> # ℹ 112,748 more rows

If we wanted to perfectly recreate the data as Bremer (1992) did it more than 30 years ago, this is how we’d do it in peacesciencer (albeit with newer data). Still, I think the data innovations that have followed Bremer (1992) merit the approach employed in the manuscript.

Get Multiple Peace Years in One Fell Swoop

add_peace_years() is designed to work generally, based on the other data/functions included in the package. For example, assume you wanted to do a dyad-year analysis comparing the Correlates of War (CoW) Militarized Interstate Dispute (MID) data with the Gibler-Miller-Little conflict data. Just add both in the pipe and ask for peace-years.

cow_ddy %>%
  # non-directed, politically relevant, for convenience
  filter(ccode2 > ccode1) %>%
  filter_prd() %>%
  add_cow_mids(keep = NULL) %>%
  add_gml_mids(keep = NULL) %>%
  add_peace_years() -> NDY
#> Joining with `by = join_by(ccode1, ccode2, year)`
#> add_cow_mids() IMPORTANT MESSAGE: By default, this function whittles
#> dispute-year data into dyad-year data by first selecting on unique onsets.
#> Thereafter, where duplicates remain, it whittles dispute-year data into
#> dyad-year data in the following order: 1) retaining highest `fatality`, 2)
#> retaining highest `hostlev`, 3) retaining highest estimated `mindur`, 4)
#> retaining highest estimated `maxdur`, 5) retaining reciprocated over
#> non-reciprocated observations, 6) retaining the observation with the lowest
#> start month, and, where duplicates still remained (and they don't), 7) forcibly
#> dropping all duplicates for observations that are otherwise very similar. See:
#> http://svmiller.com/peacesciencer/articles/coerce-dispute-year-dyad-year.html
#> Dyadic data are non-directed and initiation variables make no sense in this
#> context.
#> Joining with `by = join_by(ccode1, ccode2, year)`
#> add_gml_mids() IMPORTANT MESSAGE: By default, this function whittles
#> dispute-year data into dyad-year data by first selecting on unique onsets.
#> Thereafter, where duplicates remain, it whittles dispute-year data into
#> dyad-year data in the following order: 1) retaining highest `fatality`, 2)
#> retaining highest `hostlev`, 3) retaining highest estimated `mindur`, 4)
#> retaining highest estimated `maxdur`, 5) retaining reciprocated over
#> non-reciprocated observations, 6) retaining the observation with the lowest
#> start month, and, where duplicates still remained (and they don't), 7) forcibly
#> dropping all duplicates for observations that are otherwise very similar. See:
#> http://svmiller.com/peacesciencer/articles/coerce-dispute-year-dyad-year.html
#> Joining with `by = join_by(year, dyad)`
#> Joining with `by = join_by(year, dyad)`

# Here's a snapshot of U.S.-Cuba from 1980 onward for illustration's sake.
NDY %>%
  filter(ccode1 == 2 & ccode2 == 40) %>%
  select(ccode1:year, cowmidongoing, gmlmidongoing, cowmidspell:gmlmidspell) %>%
  filter(year >= 1980)
#> # A tibble: 37 × 7
#>    ccode1 ccode2  year cowmidongoing gmlmidongoing cowmidspell gmlmidspell
#>     <dbl>  <dbl> <dbl>         <dbl>         <dbl>       <dbl>       <dbl>
#>  1      2     40  1980             0             0           0           0
#>  2      2     40  1981             1             1           1           1
#>  3      2     40  1982             0             0           0           0
#>  4      2     40  1983             1             1           1           1
#>  5      2     40  1984             0             0           0           0
#>  6      2     40  1985             0             0           1           1
#>  7      2     40  1986             0             1           2           2
#>  8      2     40  1987             1             1           3           0
#>  9      2     40  1988             0             0           0           0
#> 10      2     40  1989             0             0           1           1
#> # ℹ 27 more rows
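
For reading the spell columns, it helps to see the counting pattern spelled out. This is a sketch of my reading of the printed rows above, not how add_peace_years() is implemented. `peace_spell()` is a hypothetical helper that starts the count at 0 for the first row it sees, resets to 0 in the year after an ongoing dispute, and otherwise adds one year of peace.

```r
# Hypothetical helper: running count of peace years from an "ongoing" dummy
peace_spell <- function(ongoing) {
  spell <- numeric(length(ongoing))
  for (i in seq_along(ongoing)[-1]) {
    # reset after a dispute-year, otherwise add one to the running count
    spell[i] <- if (ongoing[i - 1] == 1) 0 else spell[i - 1] + 1
  }
  spell
}

# The ongoing dummies from the ten printed U.S.-Cuba rows (1980-89)
cow_ongoing <- c(0, 1, 0, 1, 0, 0, 0, 1, 0, 0)
gml_ongoing <- c(0, 1, 0, 1, 0, 0, 1, 1, 0, 0)

peace_spell(cow_ongoing)
peace_spell(gml_ongoing)
```

Applied to these ten rows, the helper returns the cowmidspell and gmlmidspell values printed above. The two series part ways in 1987 because only the Gibler-Miller-Little data record an ongoing dispute in 1986.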

You can do this with state-year data as well. For example, you can compare how CoW and UCDP code civil wars differently since 1946. Do note, however, that the nature of different state systems used in these data sets means we’ll treat one as a master and merge other codes into it.

create_stateyears(system = 'gw') %>%
  filter(between(year, 1946, 2019)) %>%
  add_ccode_to_gw() %>%
  add_ucdp_acd(type = "intrastate", only_wars = TRUE) %>%
  add_cow_wars(type = "intra") %>%
  # select just a few things
  select(gwcode, ccode, year, gw_name, ucdpongoing, ucdponset,
         cowintraongoing, cowintraonset) %>%
  add_peace_years() %>%
  select(gwcode:gw_name, ucdpspell, cowintraspell, everything()) %>%
  # India is illustrative of how the two differ.
  # UCDP has an intra-state conflict to the level of war early 
  #  into its existence. CoW does not.
  filter(gwcode == 750)
#> Joining with `by = join_by(gwcode, year)`
#> Joining with `by = join_by(year, ccode)`
#> Joining with `by = join_by(gwcode, year)`
#> Joining with `by = join_by(ccode, year)`
#> # A tibble: 73 × 10
#>    gwcode ccode  year gw_name ucdpspell cowintraspell ucdpongoing ucdponset
#>     <dbl> <dbl> <dbl> <chr>       <dbl>         <dbl>       <dbl>     <dbl>
#>  1    750   750  1947 India           0             0           0         0
#>  2    750   750  1948 India           1             1           1         1
#>  3    750   750  1949 India           0             2           1         0
#>  4    750   750  1950 India           0             3           1         0
#>  5    750   750  1951 India           0             4           1         0
#>  6    750   750  1952 India           0             5           0         0
#>  7    750   750  1953 India           1             6           0         0
#>  8    750   750  1954 India           2             7           0         0
#>  9    750   750  1955 India           3             8           0         0
#> 10    750   750  1956 India           4             9           0         0
#> # ℹ 63 more rows
#> # ℹ 2 more variables: cowintraongoing <dbl>, cowintraonset <dbl>

Measure Leader Tenure in Days

create_leaderyears(), by default, returns an estimate of leader tenure as the unique calendar year for the leader. I think this is a reasonable thing to include, and benchmarking to years is doing some internal lifting elsewhere in the function that generates leader-year data from leader-day data in Archigos. However, it can lead to some peculiar observations that may not square with how we instinctively think about leader tenure.

I will illustrate what I mean by this with the case of Jimmy Carter from leader-year data standardized to Correlates of War state system membership.

leader_years <- create_leaderyears(standardize = 'cow')

leader_years %>% filter(obsid == "USA-1977")
#> # A tibble: 5 × 7
#>   obsid    leader ccode gender leaderage  year yrinoffice
#>   <chr>    <chr>  <dbl> <chr>      <dbl> <dbl>      <dbl>
#> 1 USA-1977 Carter     2 M             53  1977          1
#> 2 USA-1977 Carter     2 M             54  1978          2
#> 3 USA-1977 Carter     2 M             55  1979          3
#> 4 USA-1977 Carter     2 M             56  1980          4
#> 5 USA-1977 Carter     2 M             57  1981          5

Jimmy Carter took office in January 1977 (year 1) and had a tenure through 1978 (year 2), 1979 (year 3), 1980 (year 4), and exited office in January 1981 (year 5). We know presidents in the American context have four-year terms. This output suggests five years.

If this is that problematic for the research design, especially one that may be interested in what happens to leader behavior after a certain amount of time in office, a user can generate estimates of leader tenure in a given year to the day. Basically, once the core leader-year data are generated, the user can use the create_leaderdays() function and summarize leader tenure in the year as the minimum and maximum number of days the leader had been in office in that year.

# don't standardize the leader-days for this use, just to be safe.
create_leaderdays(standardize = 'none') %>% 
  # extract year from date
  mutate(year = lubridate::year(date)) %>%
  # group by leader
  group_by(obsid) %>%
  # running count of days in office, for leader tenure
  mutate(daysinoffice = row_number()) %>%
  # group-by leader and year
  group_by(obsid, year) %>%
  # how long was the minimum (maximum) days in office for the leader in the year?
  summarize(min_daysoffice = min(daysinoffice),
            max_dayoffice = max(daysinoffice)) %>%
  #practice safe group-by, and assign to object
  ungroup() -> leader_tenures
#> `summarise()` has grouped output by 'obsid'. You can override using the
#> `.groups` argument.

# add this information to our data
leader_years %>%
  left_join(., leader_tenures) -> leader_years
#> Joining with `by = join_by(obsid, year)`

Here’s what this would look like in the case of Jimmy Carter.

leader_years %>% filter(obsid == "USA-1977")
#> # A tibble: 5 × 9
#>   obsid    leader ccode gender leaderage  year yrinoffice min_daysoffice
#>   <chr>    <chr>  <dbl> <chr>      <dbl> <dbl>      <dbl>          <int>
#> 1 USA-1977 Carter     2 M             53  1977          1              1
#> 2 USA-1977 Carter     2 M             54  1978          2            347
#> 3 USA-1977 Carter     2 M             55  1979          3            712
#> 4 USA-1977 Carter     2 M             56  1980          4           1077
#> 5 USA-1977 Carter     2 M             57  1981          5           1443
#> # ℹ 1 more variable: max_dayoffice <int>

This measure might be more useful. Basically, Jimmy Carter was a new leader in 1977 (min_daysoffice = 1). By 1978, he had almost a year under his belt (i.e. Jan. 1, 1978 was his 347th day in office). By the time he left office in 1981, he had completed 1,462 days on the job.
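
The day counts for Carter can be checked with base R date arithmetic. This is a small sketch that assumes an entry date of January 20, 1977 and an exit date of January 20, 1981, counting the entry date as day 1. `day_in_office()` is a hypothetical helper for illustration.

```r
# Carter's first day in office (assumed: inauguration day)
entry <- as.Date("1977-01-20")

# Hypothetical helper: which day in office is a given date?
day_in_office <- function(date) {
  as.integer(as.Date(date) - entry) + 1L
}

# Jan. 1 of each later year in office, plus the exit date
carter_days <- day_in_office(c("1978-01-01", "1979-01-01", "1980-01-01",
                               "1981-01-01", "1981-01-20"))
carter_days
```

The first four values line up with min_daysoffice for 1978 through 1981, and the last is the 1,462 days mentioned above.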

create_leaderyears() elects to not create this information for the user. No matter; it does not take much effort for the user to create it if this is the kind of information they want.