library(tidyverse)
library(peacesciencer)
library(lubridate)

The peace science data ecosystem fundamentally revolves around two different classifications of what is a “state” and what comprises the state system since the Congress of Vienna. The first is the Correlates of War (CoW) system. The CoW system is, as far as I can tell, the first of its kind to devise numeric codes for states in the international system while also offering a temporal dimension. I am unaware of a definitive article describing the CoW system data—Sarkees and Wayman discuss it in their book—but the criteria for consideration of a state hinge on matters of territorial occupation (i.e. a state is a geopolitical entity), population size (>500,000, supposedly), diplomatic recognition (by the UK or France before 1919 and by the League of Nations/UN afterward), having an independent foreign policy (which is why “One China, two systems” Hong Kong won’t count), and having a sovereign political authority (i.e. to filter out puppet states).1 Peace scientists, certainly in the inter-state context, know these data well. They serve as the basis for the entire CoW data ecosystem.

The Gleditsch-Ward (G-W) system is a revision of this state system. Introduced in a 1999 article in International Interactions, G-W raise some conceptual problems (as they see it) with the CoW system. Their proposed changes amount to an alternative system that features prominently in some important data sets. Civil conflict researchers likely use this system more than the CoW system because the G-W system is the system of choice for the UCDP armed conflict data. The G-W system also feature prominently in some macroeconomic data and some spatial data sets.

This vignette will offer a discussion of these two systems. It will eschew the conceptual issues that motivate the divergences between the two; indeed, the differences are often overstated to belie the commonality between the two. Integration of the two in—spitballing here—more than 90% of applications will be unproblematic, even in most cases where both systems disagree on something. However, where the two do differ, they sometimes differ in ways that amount to a collision of one system into the other. peacesciencer treats these collisions as unavoidable. The package’s functions acknowledge this and, subtly, encourage the researcher to acknowledge this as well.

## Where the Systems Conflict With Each Other

By in large, a user might see the differences between the two systems and overstate the differences they see, at least for practical concerns. For example, CoW treats Canada as an independent state only starting in 1920 (coinciding with its founding membership in the League of Nations, one of its coding criteria) whereas G-W have Canada as an independent state starting in 1867 (coinciding with The British North America Act, 1867). However, the code for Canada (20) is identical in both systems and Canada never had a period where it disappeared from either system. These cases are simple; one temporal domain is a subset of the other system’s temporal domain for a particular state. There are other differences to note. G-W will have a few states that CoW doesn’t have (e.g. Transvaal and Orange Free State) and CoW will have a few states that G-W don’t have (e.g. Sao Tome and Principe and Seychelles). These cases are simple; there is no corresponding state code for the entity in the other system. The commonality of both is more apparent than the differences (at least I think), and a lot of the differences pose no real problem for integrating one into the other. Yet, it is also true that where they differ, they sometimes really differ. Trying to integrate one into the other amounts to a collision.

Here is an example where the two systems will collide with each other. Yemeni unification is one of several points of divergence between the two systems. Unlike, say, the dissolution of Yugoslavia, both CoW and G-W are in agreement on when the unification took place (22 May 1990). However, they disagree on what this means for data entry. I am unaware of either G-W or CoW discussing this exact case, but the difference in interpretation mirrors (likely) how they see the unification of Germany that same year. CoW seems to interpret Yemeni unification as the creation of an unseen, entirely new Yemeni state and not just a simple integration of one into the other. Thus, the newly formed Republic of Yemen (“Yemen”) gets a new state code.2 G-W seem to interpret that unification was less the case of the formation of a new state, but more the demise of the People’s Democratic Republic of Yemen (“South Yemen”) and its integration into the Yemen Arab Republic (“North Yemen”). There is no new code for this new entity, just the continuation of “Yemen (Arab Republic of Yemen)” and the demise of “Yemen, People’s Republic of.”

This might seem like it’s a distinction without much of a difference, but it will matter in your standard peace science data. Here, for example, is what would happen when we merge G-W codes into CoW state-year data for these cases.

create_stateyears() %>%
filter(ccode %in% c(678:680) & year %in% c(1988:1991)) %>%
#> Joining with by = join_by(ccode, year)
#> # A tibble: 8 × 4
#>   ccode statenme                 year gwcode
#>   <dbl> <chr>                   <dbl>  <dbl>
#> 1   678 Yemen Arab Republic      1988    678
#> 2   678 Yemen Arab Republic      1989    678
#> 3   678 Yemen Arab Republic      1990    678
#> 4   679 Yemen                    1990    678
#> 5   679 Yemen                    1991    678
#> 6   680 Yemen People's Republic  1988    680
#> 7   680 Yemen People's Republic  1989    680
#> 8   680 Yemen People's Republic  1990    680

The G-W code of 678 is going to be duplicated twice in these data. G-W code 678 in 1990 refers to both the Yemen Arab Republic before unification, and Yemen after unification. CoW sees two states where G-W sees one. For state-day data, this would not be a problem. For state-year data, this becomes a problem because the aggregation of time results in duplicate entries for the code being merged into the data. Worse yet, there is no easy way around this and it’s a unique issue that arises in trying to integrate two different state systems with each other.

Here is another case where both systems will collide with each other: Serbia and Yugoslavia. In this case, both state systems differ in major ways on both classifying entities and dates.

cow_states %>%
mutate(startdate = ymd(paste0(styear,"/",stmonth, "/", stday)),
enddate = ymd(paste0(endyear,"/",endmonth,"/",endday))) %>%
select(stateabb:statenme, startdate, enddate) %>%
mutate(data = "CoW") %>%
rename(statename = statenme) %>%
filter(ccode == 345) %>%
bind_rows(., gw_states %>%
filter(gwcode %in% c(340, 345)) %>%
mutate(data = "G-W")) %>%
select(data, stateabb, statename, ccode, gwcode, everything())
#> # A tibble: 5 × 7
#>   data  stateabb statename  ccode gwcode startdate  enddate
#>   <chr> <chr>    <chr>      <dbl>  <dbl> <date>     <date>
#> 1 CoW   YUG      Yugoslavia   345     NA 1878-07-13 1941-04-20
#> 2 CoW   YUG      Yugoslavia   345     NA 1944-10-20 2016-12-31
#> 3 G-W   SER      Serbia        NA    340 1878-07-13 1915-10-01
#> 4 G-W   SER      Serbia        NA    340 2006-06-05 2017-12-31
#> 5 G-W   YUG      Yugoslavia    NA    345 1918-12-01 2006-06-04

The main difference here is how should we interpret what Yugoslavia was. Unlike the case of Yemen, G-W discuss Yugoslavia a bit in their 1999 article. Here is one passage on page 397.

Yugoslavia appears in the COW-list continuously from 1878 to 1941. However,the Serbian government fled the German invasion in 1915. The new kingdom of Serbia, Croatia, and Montenegro was proclaimed in 1918 and did not become the Kingdom of Yugoslavia until 1929. Is it sensible to consider this a single polity?

They revisit this case again on page 401 when describing the major differences between their system and CoW’s system.

Unlike COW, we consider Serbia from 1878 to the Austro-Hungarian invasion in 1915 to be a different polity from the Kingdom of the Croats, Serbs, and Slovenes (renamed Yugoslavia in 1929), which is not established until 1918.

There is an interesting difference of interpretation about whether Serbia should disappear from the international system for a three-year period during World War I. CoW says “no” while G-W point to the government’s retreat through Albania and the Austro-Hungarian/Bulgarian occupations as suggestive of a “state” without territory (and, thus, not a state). The bigger difference of interpretation concerns how to interpret “Yugoslavia.” CoW seems to interpret a Serbian “center” to Yugoslavia, analogous to their interpretation (and G-W’s interpretation) of a Prussian core to the German Empire. For CoW, this means Serbia precedes and succeeds Yugoslavia and Yugoslavia is fundamentally a territorial expansion of Serbia as a result of World War I. For G-W, the 1915 retreat of the Serbian government and the 1918 creation of the State of Slovenes, Croats and Serbs amounts to the death of one state (Serbia) and the formation of a new state (Yugoslavia) few years later. Yugoslavia dies in 2006 when the last remnant of its creation, Montenegro, emerges as independent from Serbia. Thus, Serbia reappears as a state system entity for the first time since 1915.

The integration here will run inverse to the situation with Yemeni and German unification in 1990. In those cases, G-W see integration whereas CoW sees new state creation (in the case of Yemen) or old state restoration (in the case of Germany). In this case, CoW sees one continuous state breaking apart whereas G-W see a state death and old state restoration. Correlates of War state code 345 will appear twice for 2006, referring to both the G-W state of Serbia and the G-W state of Yugoslavia that same year.

create_stateyears(system = 'gw') %>%
filter(gwcode %in% c(340, 345) & year %in% c(2005:2008)) %>%
#> Joining with by = join_by(gwcode, year)
#> # A tibble: 5 × 4
#>   gwcode statename   year ccode
#>    <dbl> <chr>      <dbl> <dbl>
#> 1    340 Serbia      2006   345
#> 2    340 Serbia      2007   345
#> 3    340 Serbia      2008   345
#> 4    345 Yugoslavia  2005   345
#> 5    345 Yugoslavia  2006   345

## How {peacesciencer} Handles the Integration of CoW and G-W State System Data

peacesciencer has two functions for converting CoW codes into G-W codes (and vice-versa). add_ccode_to_gw() will take a data set where the ps_system attribute is “gw” and match the G-W codes to CoW codes. The data it uses for this is the gw_cow_years data frame in this package. You can see how it was created here (along with ample annotation about what I’m doing, where the two differ, and why I’m doing what I’m doing). The corollary to this is add_gwcode_to_cow(), which adds G-W codes to a data frame with a ps_system attribute of “cow”. This function uses the cow_gw_years data frame in this package. The code that generates these data are also amply annotated and available for public viewing. These are more about the implementation but the philosophy is more important to state here. I break this philosophy into the following main points.

First, these collisions between the G-W state system data and CoW state system data are unavoidable at the higher levels of temporal aggregation (e.g. state-years). If the user is trying to merge G-W codes into CoW state-year data, they will create duplicate G-W state codes in 1990 for Yemen Arab Republic/Yemen (and Germany/West Germany). This is because CoW sees two states merging into one new (Yemen) or previous (Germany) state while G-W see one folding into the other. A similar situation will happen trying to merge CoW codes into G-W state-year data regarding the final disintegration of Yugoslavia in 2006. CoW sees Serbia as preceding, dominating, and succeeding Yugoslavia where G-W see Yugoslavia as an entity entirely distinct from Serbia. Consider the implication here: a user may have Gleditsch-Ward state-year data for a civil conflict analysis and want to merge in CoW’s national material capabilities data into it. Matching CoW codes to G-W state codes beforehand will invariably create duplicate entries Serbia-2006 and Yugoslavia-2006. The user cannot avoid this. This will happen where the two state systems collide with each other.

Second, the functionality I build into peacesciencer comes from a philosophy that some classification system must be a “master” system. I preach this to my students as well. What the user elects to treat as the “master” system is to their discretion, but peacesciencer forces this on the user in an important way. Namely, the “create” family of functions assign a ps_system attribute to the data it creates. If the user starts their workflow with create_stateyears(system = 'gw'), they will get a state-year data frame in which the “master” system is G-W. If the user instead wants CoW to be the master system, they should run create_stateyears(system = 'cow') and not create_stateyears(system = 'gw'). This is ultimately a design choice by the user, but peacesciencer will force this in its own way. Something must be a “master” system.

It’s worth stating that there is no right or wrong answer here and that the user’s choice should be tailored to the research design. My recommendation is to take one of two tracks toward choosing the “master” system. One approach is to make the “master” system to be the one coinciding with the bulk of the data the author will use. CoW has a larger presence than G-W in the peace science data ecosystem, certainly for “right-hand side” variables (e.g. capabilities, trade) and for inter-state conflict. Whereas I am primarily an inter-state conflict researcher, this would account for why CoW is the default option for the data-creation functions. A more reasonable approach is to make the “master” system to be the one coinciding with the outcome variable. Think of it this way. The user is creating data in peacesciencer because they want to explain some outcome. Let’s say they are interested in explaining intrastate conflict at all levels of intensity in the UCDP armed conflict data. These conflicts use the G-W system for classification. The data they collect are fundamentally nested in the universe of G-W state-year data since 1946. It is more important, as a principle, to get that part right than to split hairs about Yemen, Germany, and Serbia/Yugoslavia. Under those conditions, the user should make the G-W system their master (e.g. through create_stateyears(system = 'gw')).

Finally, peacesciencer strives to make the integration as seamless as possible when it can. For example, add_minimum_distance() will look at the data the user feeds it to see what is the “master” system. If it’s CoW, add_minimum_distance() will merge in minimum distance data from the cow_mindist data frame in this package. If it’s the G-W system, add_minimum_distance() will merge in minimum distance data from the gw_mindist data frame. add_sdp_gdp() and add_democracy() also do this. Collisions between the CoW state system data and G-W state system data are unavoidable; for example, anything in the CoW ecosystem (e.g. alliances, IGOs, capabilities) is going to require CoW codes before merging. Where possible, peacesciencer tries to be inclusive and avoid elevating one over the other where it can.

The end result is a suite of functions in peacesciencer that work well and robustly, given the circumstances. If the CoW system is the master, merging in G-W codes will result in duplicate G-W codes given different interpretations of German and Yemeni unification. If the G-W system is the master, merging in CoW codes will result in duplicate CoW codes given the different interpretation of the disintegration of Yugoslavia. These are unavoidable, but the functionality of add_ccode_to_gw() and add_gwcode_to_cow() will importantly not duplicate the master codes. The underlying data used in these functions were pre-processed to make sure that did not happen. Other functions like add_minimum_distance(), add_sdp_gdp(), and add_democracy() come in both CoW and G-W flavors to allow for easier integration as well. No matter the commonality between both systems, they do differ in important ways that will create some unavoidable collisions when merging one into the other. The user should be aware of this even as peacesciencer works well to contain the collisions that do occur.