Abstract

This article introduces {peacesciencer}, an R package that contains a litany of tools for creating data of widespread interest to the peace science community. The package is cross-platform, assuming only a somewhat recent installation of the R programming language with some of the enhanced functionality of the broadly popular {tidvyerse} packages. Peace science researchers can use this package to greatly reduce the time needed to perfectly recreate common types of data from scratch and to merge in ubiquitous indicators included in almost every analysis (e.g. democracy data, contiguity data). The software is freely available on CRAN and maintains an active website documenting its features at http://svmiller.com/peacesciencer.

Introduction

This manuscript tackles a recurring problem for researchers in the peace science community. True research reproducibility is best achieved creating data from scratch, though no published guide exists that informs researchers how to do this on their own. Instead, researchers may end up reusing old code generated in past studies, leaving them to spend time and energy adjusting the sample of states, the temporal domain, and doing whatever additional troubleshooting may arise from this practice. Researchers may additionally spend too much time reproducing old code for standard information that goes into any dyadic or monadic analysis—like contiguity relationships and democracy—and have to do additional troubleshooting for how these various data sources treat missing data or treat state codes in a manner inconsistent with the more accessible Correlates of War or Gleditsch-Ward state codes. This is all compounded by changes in technology that treat the creation of the data and the analysis of data as a continuous process in which the contemporary quantitative political scientist is increasingly becoming a computer programmer as well. Graduate students and other beginners in the field face unique challenges associated with these developments. Students just learning peace science must learn how scholarship informs data in peace science and how data inform scholarship at the same time they are needing to learn quantitative methods in a chosen software package.

{peacesciencer} is designed to address these problems. Built around the free and open source R programming language, {peacesciencer} contains a suite of data and functions for creating data of interest to researchers. Researchers can use {peacesciencer} to create dyad-year, leader-year, leader-dyad-year, and state-year data (among some others) from scratch. Afterward, they can add a variety of standard information (e.g. contiguity, alliances, major power status, GDP per capita estimates, capability estimates, and more) to these data with a simple command. This is a considerable time-saver since, in the absence of it, researchers would have to more meticulously code and transform the raw data to conform to the kind of data they want. {peacesciencer} comes with some data innovations as well, including a comprehensive data set on democracy by year, an original data set on capitals and capital transitions, and a function to create peace years between ongoing conflicts. All are done with the maximum possible transparency. The project is available for public view on Github (https://github.com/svmiller/peacesciencer/). The data-raw directory on the project’s Github contains information and comments about how every data set was created. The function manuals (http://svmiller.com/peacesciencer/reference) contain additional comments about what each function returns and, in appropriate cases, why it is doing what it is doing. Thus, {peacesciencer} not only assists a peace scientist with their research, but it does so in a manner that best conforms to the Data Access and Research Transparency Initiative (DA-RT) initiative across all political science.

This data feature proceeds in the following fashion. The next section expands what need this package fills for peace scientists. Afterward, it provides an overview of what is included in {peacesciencer} to help researchers more quickly conduct the kind of quantitative research they want. Thereafter, it provides a tutorial on how to install and best use {peacesciencer} in the R programming language. A more comprehensive tutorial follows, showing how {peacesciencer} already has a suite of data and functions that can allow for effective replications of a “dangerous dyads” type analysis (Bremer, 1992), standard state-year analyses of civil conflict onset (e.g. Fearon & Laitin, 2003), and even leader-year analyses of inter-state dispute initiation (e.g. Horowitz & Stam, 2014). This feature concludes with a comparison of {peacesciencer} with other alternatives and a discussion of how {peacesciencer} can inform more reasoned design decisions for researchers in peace science.

Why {peacesciencer}?

{peacesciencer} is motivated by the following observations and ideals that led to its creation. For one, researchers invest too much time in the construction of a data set that faithfully captures the unit of analysis. Assume a researcher wants an original data set on all directed dyad-years for Correlates of War states for an analysis of inter-state conflict. How might one do that? The answer has never been immediately obvious. No published guide exists that shows a researcher how to create these data themselves from scratch, which is one reason why software bundles like EUGene and NewGene are attractive to researchers who primarily care about the substance of their research question. After all, EUGene’s main value—if not its primary impetus—was allowing researchers to create data sets for replication of previous studies, producing a host of data types (e.g. dyad-year, state-year, dispute-year) along the way that users can amend as they saw fit. {peacesciencer} is primarily born from this question about how to create these data from scratch. The underlying code that produces these data types is available online and {peacesciencer} converts these lines of code into simple functions for the ease of the researcher.

Second, researchers also invest too much time in retracing steps for peace science analyses for new projects. Assume a researcher finished a state-year analysis on the correlates of civil conflict onset a few years ago and wants to start a new project that analyzes the same outcome from a different angle (or perhaps using newer data). Under these conditions, a researcher will have to find where they stored that replication code and copy-paste it into a new directory for the new project. They may then have to change the name of some files, change some code to account for potentially new column names in the newer data, and troubleshoot instances where their old code does not perform as it once did. At its worst, this process may lead to some errors by the researcher. At best, this is tedium that spends the researcher’s time they would rather invest in analyzing the data. The lion’s share of {peacesciencer}’s functionality is both creating the units of analysis for the researcher and merging in different forms of data in wide use in the peace science community so that the researcher can spend less of their time on tedium.

Third, the creation of the data and the analysis of the data are increasingly becoming one continuous process. Not too long ago, it used to be the case that researchers had to download a data set, or create one from scratch (possibly in a spreadsheet or through a program like EUGene). After downloading or constructing the data, the researcher then opened a specialty program for statistical analysis (e.g. SAS, SPSS, Stata) to recode raw data into a form suitable for analysis before running a statistical model that regresses some outcome on a set of covariates. Current research practices still resemble this process, but the steps between them are no longer as large as they were in the past. Software options exist that allow the researcher to load data, create data, clean data, analyze data, and present the results of the analysis all within one program. {peacesciencer}, by itself, does not do all these things, but it seamlessly connects the beginning of the research process to the end of the research process without needing to leave the increasingly popular R programming language and RStudio (its free-for-use integrated development environment [IDE]).

Fourth, it is increasingly the case that as the steps between creating data and analyzing data decrease in size, the lines between them blur as well. In other words, to create data is to code data and the contemporary quantitative political scientist is increasingly becoming a computer programmer (c.f. Bowers & Voors, 2016). This is happening concurrent to innovations in programming languages for statistical analysis, especially the R programming language that {peacesciencer} uses. There have been significant advances in add-on packages that allow users to do things like get World Bank data from the internet (Arel-Bundock, 2021a) and even format results from a statistical model for presentation in a way the reduces the probability of transcription errors to almost zero (Arel-Bundock, 2021b). {peacesciencer} embraces this. This R package reduces the time required to create peace science data for analysis and also informs the user about the code required to create the kind of data the user wants.

Finally, the creation and presentation of data in peace science should be 100% robust and transparent, which {peacesciencer} takes seriously in the following ways. The website for {peacesciencer} has several vignettes that describe its processes in some detail. These include how it provides reasonable estimates of democracy that may not be available in the Polity data or the Varieties of Democracy data and how a researcher can whittle dyadic dispute-years into true dyad-years through reasonable case exclusions. {peacesciencer} subjects itself to a battery of tests before publishing updates, making sure new features do not create duplicate entries in the original data (which is the surest sign of a botched merge). The project’s Github contains a publicly available data-raw directory that shows how every data set included (and processed) in {peacesciencer} was created. The function manuals included {peacesciencer} contain ample documentation that clarify what each function is doing, what it returns to the user, and why it is doing it this way. Researchers can also use the project’s Github to point out bugs, ask for further clarification, and propose additions. {peacesciencer} takes seriously the Data Access and Research Transparency Initiative (DA-RT) initiative across all political science and endeavors for maximum transparency, leveraging open source and version control software to inform users of what data it uses and how it uses the data.

What is Included in {peacesciencer}

{peacesciencer} comes with a fully developed suite of built-in functions for generating some of the most widespread forms of peace science data and populating the data with important variables that recur in many quantitative analyses. The core functionality of {peacesciencer} reduces to two broad categories of functions. These categories are functions that create the base data of interest to a researcher and functions, called after the base data are created, that add variables of interest to the data frame or subset the base data to a handful of rows that the researcher deems appropriate for analysis.1 Table 1 and Table 2 list these core functions as of version 1.0, the version of this package slated for release alongside the manuscript.2

Table 1 lists functions that create base data frames for a researcher starting an original project. They serve as functions that communicate the units of analysis supported in this package and that the package is capable of generating for an interested user. For example, create_stateyears() will generate the full universe of state-years from the Correlates of War (Correlates of War, 2011: v. 2016) or Gleditsch-Ward (Gleditsch & Ward, 1999: v. 2017) system, encompassing all state-years to the most recently concluded calendar year, depending on the arguments supplied to the user in the function. create_leaderyears() will generate the full universe of leader-years from the Archigos leader data (Goemans, Gleditsch & Chiozza, 2009: v. 4.1), optionally standardizing leader-years to the Gleditsch-Ward or Correlates of War state system data. As of version 1.0, {peacesciencer} is capable of creating full dyad-year data, leader-day data, leader-dyad-year data, leader-year data, state-day data, and state-year data.3 A vignette on the package’s website shows how users can create other forms of data from these functions as well (e.g. dyadic-dispute-year, leader-months, state-quarters).

Table 1: Functions that Create Base Data
Function Description
create_dyadyears() Create dyad-years from state system membership data
create_leaderdays() Create leader-day data from Archigos
create_leaderdyadyears() Create leader-dyad-years data from Archigos
create_leaderyears() Create leader-years data from Archigos
create_statedays() Create state-days from state system membership data
create_stateyears() Create state-years from state system membership data

Table 2 lists the main functions in {peacesciencer} that add information or subset the number of rows of the data to just those of interest to the user, describing these functions and listing whether they are applicable to dyad-year (D), leader-year (L), leader-dyad-year (LD), state-year (S) data or specialty functions applicable to just the dyadic conflict data (C).4 All these functions use raw or pre-processed data included in the package. For example, add_gml_mids() uses a dyadic dispute-year version of the MID data offered by Gibler et al. (2016: v. 2.2.1) and merges in information about whether there was an ongoing MID or MID onset in a dyad-year, leader-year, leader-dyad-year, or state-year. {peacesciencer} also has some data innovations included in these functions. For example, add_capital_distance() calculates distance between state capitals in kilometers using the Vincenty method (i.e. “as the crow flies”) based on an original data set of state capitals that accounts for instances when capitals moved (e.g. Brazil in 1960, Burundi in 2018). add_democracy() does more than just add Polity data to a data set. The data underpinning the function feature an innovation in democracy data, providing reasonable estimates of democracy using the Marquez (2016) method of extending the Unified Democracy Scores (UDS) data (Pemstein, Meserve & Melton, 2010) in addition to Polity estimates (Marshall, Gurr & Jaggers, 2017: v. 2017) and V-Dem estimates (Coppedge et al., 2020: v. 10).5

Table 2: Functions that Add Data
Function Type Description
add_archigos() D,S Add Archigos political leader information to a data frame
add_atop_alliance() D, LD Add Alliance Treaty Obligations and Provisions (ATOP) alliance data to a data frame
add_capital_distance() D, L, LD, S Add capital-to-capital distance to a data frame
add_ccode_to_gw() D, L, LD, S Match CoW state codes to G-W state codes
add_contiguity() D, L, LD, S Add CoW direct contiguity information to a data frame
add_cow_alliance() D, LD Add CoW alliance data to a data frame
add_cow_majors() D, L, LD, S Add CoW major power information to a data frame
add_cow_mids() D Add CoW-MID data to a dyad-year data frame
add_cow_trade() D, L, LD, S Add CoW trade data to a data frame
add_cow_wars() D, S Add CoW war data to a dyad-year or state-year data frame
add_creg_fractionalization() D, L, LD, S Add fractionalization/polarization estimates from CREG to a data frame
add_democracy() D, L, LD, S Add democracy information to a data frame
add_fpsim() D, LD Add dyadic foreign policy similarity measures to a data frame
add_gml_mids() D, L, LD, S Add GML MID data to a data frame
add_gwcode_to_cow() D, L, LD, S Match G-W state codes to CoW state codes
add_igos() D, S Add CoW IGO data to a data frame
add_lead() L, LD Add leader experience/attributes (LEAD) to data
add_lwuf() L, LD Add leader willingness to use force estimates to data
add_minimum_distance() D, L, LD, S Add minimum distance data to data frame
add_nmc() D, L, LD, S Add Correlates of War National Military Capabilities data
add_peace_years() D, S Add ‘peace years’ to dyad-year/state-year conflict data
add_rugged_terrain() D, L, LD, S Add rugged terrain information to a data frame
add_sdp_gdp() D, L, LD, S Add (surplus, gross) domestic product data to a data frame
add_spells() D, L, LD, S Add duration ‘spells’ to a data frame
add_strategic_rivalries() D, S Add strategic rivalry information to a data frame
add_ucdp_acd() S Add UCDP Armed Conflict Data to state-year data frame
add_ucdp_onsets S Add UCDP onsets to state-year data
filter_prd() D, LD Filter dyad-year data to just politically relevant dyads
wc_duration() C Whittle Duplicate Conflict-Years by (Minimum or Maximum) Conflict Duration
wc_fatality() C Whittle Duplicate Conflict-Years by Highest Fatality
wc_hostility() C Whittle Duplicate Conflict-Years by Highest Hostility
wc_jds() C Whittle Duplicate Conflict-Years by Just Dropping Something
wc_onsets() C Whittle Uniqute Conflict Onset-Years from Conflict-Year Data
wc_recip() C Whittle Duplicate Conflict-Years by Conflict Reciprocation
wc_stmon() C Whittle Duplicate Conflict-Years by Lowest Start Month

{peacesciencer}’s coverage focuses mostly on data that are released as standalone data sets for download, especially those in the Correlates of War or Gleditsch-Ward ecosystem of data. Data that can be obtained from a stable advanced programming interface—like the World Bank, for example—can be obtained through those other means (e.g. Arel-Bundock, 2021a). Its coverage will assuredly expand with new additions of interest to the peace science community, though the package already offers a lot to meet researcher needs.6

How to Install {peacesciencer}

{peacesciencer} is a package for the R programming language. This assumes at least some familiarity with the R programming language. Users should have at least version 3.5 of R, which should not be an issue since the most recent version—as of writing—is 4.1.2. {peacesciencer} is designed to be as user-friendly as possible. Those proficient in R, those just learning R, and those with no experience in R should be able to pick up its use fairly quickly.

{peacesciencer}’s functions work out of the box, though users should find their experience augmented by two additional downloads. First, RStudio offers an IDE that serves as a user-friendly graphical user interface (GUI) over what is, at its core, a programming language with a command-line interface. RStudio’s design will make it much easier for users to experiment with {peacesciencer}’s functionality and read its documentation to assist them with the use of these functions. The second additional download is {tidyverse}, itself a suite of packages that share a common form and design (Wickham et al., 2019). {peacesciencer} functions make considerable use of the component packages of {tidyverse}, and {peacesciencer} can work without it, but installing and loading {tidyverse} will allow the researcher to make quicker use of {peacesciencer}’s functionality. A user can open an R session by way of RStudio and install both packages as follows.7

# Install the packages for use
install.packages(c("tidyverse", "peacesciencer"))

R packages once installed need to be loaded with every R session (i.e. every time the user opens RStudio). The user can load both with the library() function in R.

# Load the packages for use
library(tidyverse)
library(peacesciencer)

Thereafter, a researcher can start using {peacesciencer} to create the kind of data they need.

A Tutorial on How to Use {peacesciencer}

I encourage users who are using {peacesciencer} for the first time, especially those who are learning R for the first time because of their interest in this package, to approach {peacesciencer} with an idea of the kind of data they want to create for the sake of a project. The core functionality of {peacesciencer} begins with the creation of a data frame, which can then be populated with various indicators of interest. No matter, the suggested use of {peacesciencer} begins with the creation of a data frame. To start, assume a new user without much familiarity with the R programming language installed RStudio, {tidyverse}, and {peacesciencer} with the idea of using {peacesciencer} to help them start a new research project that seeks to explain civil conflict onset across state-years. Toward that end, they have identified their unit of analysis is state-year and can be created with the create_stateyears(). To get started, I encourage entering the following command in the console in RStudio.

# See documentation for create_stateyears() function.
?create_stateyears()

This will open a documentation file for this function, which is itself quite verbose and informative for the user about what the function is doing. In this case, the documentation file will show there are three arguments in this function, each with built-in defaults. This function will allow the user to choose a state system for which they want state-years (system, which accepts either “cow” or “gw” for Correlates of War [CoW] state system data or Gleditsch-Ward [G-W] state system data and defaults to “cow”), whether they want to extend the state system to the most recently concluded calendar year (mry, which defaults to TRUE), and whether they may want to additionally subset the years to a more narrow temporal domain they may already have in mind (subset_years, which defaults to no subset of the data, returning all possible state-years). If the user simply ran the function with no overrides, the function would return all Correlates of War state-years from 1816 to the most recent year (2021, as of writing).

# Create all CoW state-years from 1816 to most recently concluded calendar year.
create_stateyears()
#> # A tibble: 16,926 × 3
#>    ccode statenme                  year
#>    <dbl> <chr>                    <int>
#>  1     2 United States of America  1816
#>  2     2 United States of America  1817
#>  3     2 United States of America  1818
#>  4     2 United States of America  1819
#>  5     2 United States of America  1820
#>  6     2 United States of America  1821
#>  7     2 United States of America  1822
#>  8     2 United States of America  1823
#>  9     2 United States of America  1824
#> 10     2 United States of America  1825
#> # … with 16,916 more rows

A user who approaches {peacesciencer} with a project in mind will see they can better tailor this function to what they want. Their interest in a state-year analysis of civil conflict will likely gravitate them toward the Gleditsch-Ward state system data, since that is the state system that serves as the basis of the UCDP armed conflict data. They will also know they have no use for pre-1946 observations since civil conflict data typically have coverage only from 1946 forward. Thus, the user can supply some additional arguments to tailor the creation of data to just what they want (here: all Gleditsch-Ward state-years from 1946 to 2019).

# Create all G-W state years from 1946 to 2019.
create_stateyears(system = 'gw', subset_years = c(1946:2019))
#> # A tibble: 10,490 × 3
#>    gwcode statename                 year
#>     <dbl> <chr>                    <int>
#>  1      2 United States of America  1946
#>  2      2 United States of America  1947
#>  3      2 United States of America  1948
#>  4      2 United States of America  1949
#>  5      2 United States of America  1950
#>  6      2 United States of America  1951
#>  7      2 United States of America  1952
#>  8      2 United States of America  1953
#>  9      2 United States of America  1954
#> 10      2 United States of America  1955
#> # … with 10,480 more rows

Another reader may be interested in using {peacesciencer} for a research project using a leader-year unit of analysis. These data can be created with create_leaderyears(). Users can read more about what this function is doing by consulting the documentation file in R with the following command.

# See documentation for create_leaderyears() function.
?create_leaderyears()

This function has three arguments denoting the leader system to inform the creation of the leader-year data (i.e. Archigos) and two other arguments: standardize and subset_years. The standardize arguments, which when it defaults to “none”, returns all leader-years as presented in the raw Archigos data. Archigos’ leader data are nominally denominated in the Gleditsch-Ward state system data, if not necessarily Gleditsch-Ward state system dates (e.g. Archigos often has leader entries prior to state system entry in a few cases). Thus, the user can standardize leader-year data to Gleditsch-Ward state system dates of CoW state system dates. Finally, the user can subset the leader-year data to a more narrow temporal domain that may interest them with the subset_years function. If a user is interested in creating a data set of leader-years for an analysis of inter-state conflict initiation with the GML MID data, they can create the data that interests them with the following function. Notice how create_leaderyears() also returns information about the leader too, like the leader’s approximate age that year, their gender, and information about their tenure.

# Create all leader-years for CoW states from 1870 to 2010.
create_leaderyears(standardize = "cow", subset_years = c(1870:2010))
#> # A tibble: 15,074 × 7
#>    obsid    ccode leader gender  year yrinoffice leaderage
#>    <chr>    <dbl> <chr>  <chr>  <dbl>      <dbl>     <dbl>
#>  1 USA-1869     2 Grant  M       1870          2        48
#>  2 USA-1869     2 Grant  M       1871          3        49
#>  3 USA-1869     2 Grant  M       1872          4        50
#>  4 USA-1869     2 Grant  M       1873          5        51
#>  5 USA-1869     2 Grant  M       1874          6        52
#>  6 USA-1869     2 Grant  M       1875          7        53
#>  7 USA-1869     2 Grant  M       1876          8        54
#>  8 USA-1869     2 Grant  M       1877          9        55
#>  9 USA-1877     2 Hayes  M       1877          1        55
#> 10 USA-1877     2 Hayes  M       1878          2        56
#> # … with 15,064 more rows

Researchers using {peacesciencer} to create data for their research project should start with one of these “create” functions. Whether state-year, dyad-year, or a leader-level analysis, these functions will create the full universe of cases of interest to a researcher for the unit of analysis that serves as the core of their project.

Creating Dyad-Year Data and Adding to Dyad-Year Data in {peacesciencer}

After creating the base data of interest to their project, researchers can begin to add information they want with the suite of functions outlined in Table 2. For example, researcher interested in a dyad-year analysis of inter-state disputes can create a non-directed dyad-year data set from 1816 to 2010 in {peacesciencer} with the create_dyadyears() function, one of the aforementioned “create” function. The following would create the base data of interest to the user (i.e. all non-directed dyad-years from 1816 to 2010).

# Create all non-directed dyad-years from 1816 to 2010.
create_dyadyears(directed = FALSE, subset_years = c(1816:2010))
#> # A tibble: 842,655 × 3
#>    ccode1 ccode2  year
#>     <dbl>  <dbl> <int>
#>  1      2     20  1920
#>  2      2     20  1921
#>  3      2     20  1922
#>  4      2     20  1923
#>  5      2     20  1924
#>  6      2     20  1925
#>  7      2     20  1926
#>  8      2     20  1927
#>  9      2     20  1928
#> 10      2     20  1929
#> # … with 842,645 more rows

Adding information to this data frame is a simple matter of joining a series of functions together in a “pipe.” The “pipe”—represented as %>% in the code below—is an operator built into {tidyverse} that allows users to pass forward expressions or functions. These pipes are common in the programming world and, as the code below will show, have the benefit of changing code in a way that is more intuitive and easier to both read and write for the user. While these functions can be modified to work without {tidyverse} installed and loaded into the session, the user will find their experience with {peacesciencer} is only improved by this important package.

For example, assume the researcher wants just all politically relevant, non-directed dyad-years, where political relevance is traditionally understood as a dyadic relationship involving some form of a contiguity relationship or a major power (Lemke & Reed, 2001). create_dyadyears(directed = FALSE, subset_years = c(1816:2010)) created the full universe of non-directed dyad-years from 1816 to 2010, though this full universe includes “irrelevant” dyads like Nigeria-Mongolia and Estonia-Rwanda. Reducing the data to just politically relevant dyads is simple in {peacesciencer} and {tidyverse}. Users first create the data they want (here: create_dyadyears(directed = FALSE, subset_years = c(1816:2010))), follow it with the pipe operator (%>%), and then add another function from {peacesciencer} (here: filter_prd(), which also quietly executes add_contiguity() and add_cow_majors()).

# create base data, and pipe (%>%) to next function
create_dyadyears(directed = FALSE, subset_years = c(1816:2010)) %>%
  # subset data to politically relevant dyads (PRDs)
  filter_prd()
#> # A tibble: 112,282 × 7
#>    ccode1 ccode2  year conttype cowmaj1 cowmaj2   prd
#>     <dbl>  <dbl> <int>    <dbl>   <dbl>   <dbl> <dbl>
#>  1      2     20  1920        1       1       0     1
#>  2      2     20  1921        1       1       0     1
#>  3      2     20  1922        1       1       0     1
#>  4      2     20  1923        1       1       0     1
#>  5      2     20  1924        1       1       0     1
#>  6      2     20  1925        1       1       0     1
#>  7      2     20  1926        1       1       0     1
#>  8      2     20  1927        1       1       0     1
#>  9      2     20  1928        1       1       0     1
#> 10      2     20  1929        1       1       0     1
#> # … with 112,272 more rows

Here, the user has created all non-directed dyad-years from 1816 to 2010 and then subset the data to just those with a major power or with some kind of contiguity relationship.

Users will find that the ease of the “pipe” will allow them greater agency in creating the full data set they may want for an analysis. Indeed, the pipe has the effect of forming something analogous to a drop-down menu, in which the user can “select” additional data/commands they may want simply by specifying the function in {peacesciencer} that does what they want. For example, a researcher can follow filter_prd() with another pipe and communicate they want information about ongoing conflicts and conflict onsets from the Gibler-Miller-Little (GML) dispute data set (Gibler, Miller & Little, 2016). Following filter_prd() with add_gml_mids(keep = NULL) will add information about ongoing conflicts and onsets in a given dyad-year.8

# create base data, and pipe (%>%) to next function
create_dyadyears(directed = FALSE, subset_years = c(1816:2010)) %>%
  # subset data to politically relevant dyads (PRDs), pipe to next function
  filter_prd() %>%
  # add conflict information from GML-MID data,
  add_gml_mids(keep = NULL) 
#> # A tibble: 112,282 × 9
#>    ccode1 ccode2  year conttype cowmaj1 cowmaj2   prd gmlmidonset gmlmidongoing
#>     <dbl>  <dbl> <dbl>    <dbl>   <dbl>   <dbl> <dbl>       <dbl>         <dbl>
#>  1      2     20  1920        1       1       0     1           0             0
#>  2      2     20  1921        1       1       0     1           0             0
#>  3      2     20  1922        1       1       0     1           0             0
#>  4      2     20  1923        1       1       0     1           0             0
#>  5      2     20  1924        1       1       0     1           0             0
#>  6      2     20  1925        1       1       0     1           0             0
#>  7      2     20  1926        1       1       0     1           0             0
#>  8      2     20  1927        1       1       0     1           0             0
#>  9      2     20  1928        1       1       0     1           0             0
#> 10      2     20  1929        1       1       0     1           0             0
#> # … with 112,272 more rows

Users can also calculate peace-years for these conflicts with add_spells() by using the pipe to pass forward the data set and applying the add_spells() function to it.9

# create base data, and pipe (%>%) to next function
create_dyadyears(directed = FALSE, subset_years = c(1816:2010)) %>%
  # subset data to politically relevant dyads (PRDs), pipe to next function
  filter_prd() %>%
  # add conflict information from GML-MID data, pipe to next function
  add_gml_mids(keep = NULL) %>%
  # add peace years ("spells")
  add_spells()
#> # A tibble: 112,282 × 10
#>    ccode1 ccode2  year conttype cowmaj1 cowmaj2   prd gmlmidonset gmlmidongoing
#>     <dbl>  <dbl> <dbl>    <dbl>   <dbl>   <dbl> <dbl>       <dbl>         <dbl>
#>  1      2     20  1920        1       1       0     1           0             0
#>  2      2     20  1921        1       1       0     1           0             0
#>  3      2     20  1922        1       1       0     1           0             0
#>  4      2     20  1923        1       1       0     1           0             0
#>  5      2     20  1924        1       1       0     1           0             0
#>  6      2     20  1925        1       1       0     1           0             0
#>  7      2     20  1926        1       1       0     1           0             0
#>  8      2     20  1927        1       1       0     1           0             0
#>  9      2     20  1928        1       1       0     1           0             0
#> 10      2     20  1929        1       1       0     1           0             0
#> # … with 112,272 more rows, and 1 more variable: gmlmidspell <dbl>

Researchers should see that {peacesciencer}’s functionality can scale up nicely from there. For example, the following would round out the kind of information necessary to replicate Bremer’s (1992) famous “dangerous dyads” analysis by adding information about national material capabilities for both states in the dyad (add_nmc()), estimates of democracy for both states in the dyad (add_democracy()), information about alliance commitments in the dyad-year (add_cow_alliance(), by way of Gibler (2009)), and finishing with information about estimated population size and (surplus, gross) domestic product based on simulations reported by Anders, Fariss & Markowitz (2020). Whereas add_sdp_gdp() is the last command in the pipe-based workflow, the {peacesciencer} call ends by assigning to an object called Data. This type of assignment is done with the “right hand” assignment operator (i.e. ->).

# create base data, and pipe (%>%) to next function
create_dyadyears(directed = FALSE, subset_years = c(1816:2010)) %>%
  # subset data to politically relevant dyads (PRDs), pipe to next function
  filter_prd() %>%
  # add conflict information from GML-MID data, pipe to next function
  add_gml_mids(keep = NULL) %>%
  # add peace years ("spells"), pipe to next function
  add_spells() %>%
  # add capabilities data, pipe to next function
  add_nmc() %>%
  # add some estimates about democracy for each state, pipe to next function
  add_democracy() %>%
  # add information about alliance commitments in dyad-year
  add_cow_alliance() %>%
  # finish with information about population and GDP/SDP
  # and then assign to object, called, minimally, 'Data'
  add_sdp_gdp() -> Data

Data
#> # A tibble: 112,282 × 42
#>    ccode1 ccode2  year conttype cowmaj1 cowmaj2   prd gmlmidonset gmlmidongoing
#>     <dbl>  <dbl> <dbl>    <dbl>   <dbl>   <dbl> <dbl>       <dbl>         <dbl>
#>  1      2     20  1920        1       1       0     1           0             0
#>  2      2     20  1921        1       1       0     1           0             0
#>  3      2     20  1922        1       1       0     1           0             0
#>  4      2     20  1923        1       1       0     1           0             0
#>  5      2     20  1924        1       1       0     1           0             0
#>  6      2     20  1925        1       1       0     1           0             0
#>  7      2     20  1926        1       1       0     1           0             0
#>  8      2     20  1927        1       1       0     1           0             0
#>  9      2     20  1928        1       1       0     1           0             0
#> 10      2     20  1929        1       1       0     1           0             0
#> # … with 112,272 more rows, and 33 more variables: gmlmidspell <dbl>,
#> #   milex1 <dbl>, milper1 <dbl>, irst1 <dbl>, pec1 <dbl>, tpop1 <dbl>,
#> #   upop1 <dbl>, cinc1 <dbl>, milex2 <dbl>, milper2 <dbl>, irst2 <dbl>,
#> #   pec2 <dbl>, tpop2 <dbl>, upop2 <dbl>, cinc2 <dbl>, v2x_polyarchy1 <dbl>,
#> #   polity21 <dbl>, xm_qudsest1 <dbl>, v2x_polyarchy2 <dbl>, polity22 <dbl>,
#> #   xm_qudsest2 <dbl>, cow_defense <dbl>, cow_neutral <dbl>, cow_nonagg <dbl>,
#> #   cow_entente <dbl>, wbgdp2011est1 <dbl>, wbpopest1 <dbl>, sdpest1 <dbl>, …

If the user wants to move these data into Stata for analysis, they can save it to their current working directory with a command like haven::write_dta(Data, "my-data.dta") and import it into Stata when they are done. No matter, {peacesciencer} has pre-processed, cleaned, recoded, and merged the desired data that has greatly reduced the time and energy a researcher might otherwise spend doing something like hard-coding -9s in these data to be NA in the capabilities data. In this particular application, it has already created the main data required for a replication of Bremer (1992). There is only some slight data work to create the desired indicators for a statistical model of conflict onset, like a dummy variable for land-contiguity, the presence of a major power in the dyad, and some “weak-link” indicators of militarization, relative power in the dyad, level of democracy in the dyad (using the Marquez (2016) method for extending the Unified Democracy Scores data), and the GDP per capita in the dyad. Table 3 is a formatted version of the results of a logistic regression model of conflict onset using these “dangerous dyads” indicators. Users typically do not end their analysis here—often looking for new predictors of conflict onset with these covariates in mind—but {peacesciencer} greatly reduces the time and energy researchers must invest into cleaning and processing data for analysis.

Table 3: A “Dangerous Dyads” Analysis of Non-Directed Dyad-Years from {peacesciencer}
Model 1
Land Contiguity 1.062*
(0.057)
Dyadic CINC Proportion (Lower/Higher) 0.453*
(0.036)
CoW Major Power in Dyad 0.144*
(0.057)
Defense Pact −0.119*
(0.058)
Dyadic Democracy (Weak-Link) −0.493*
(0.052)
Dyadic GDP per Capita (Weak-Link) 0.293*
(0.051)
Dyadic Militarization (Minimum) 0.263*
(0.023)
t −0.146*
(0.005)
t^2 0.002*
(0.000)
t^3 0.000*
(0.000)
Intercept −3.045*
(0.063)
Num.Obs. 103919
+ p < 0.1, * p < 0.05

Creating State-Year Data and Adding to State-Year Data in {peacesciencer}

{peacesciencer} is capable of generating data for replications of analyses at multiple levels beyond dyad-year. Suppose a researcher wants to create a state-year data frame to conduct an analysis of civil conflict onset analogous to Fearon and Laitin’s (2003) well-cited analysis of civil conflict onset, but using UCDP conflict data and the Gleditsch-Ward state system for creating the appropriate universe of state-years. The pipe-based workflow will start with create_stateyears(system = 'gw', subset_years = c(1946:2019)), creating the full universe of Gleditsch-Ward state years and subsetting them to just 1946-2019 (because the UCDP data included in {peacesciencer} include just the observations in that time frame). Next, we can use the add_ucdp_acd() function to return information about ongoing UCDP conflicts and onsets for these states. add_ucdp_acd() takes three arguments: type, issue, and only_wars. type is an optional argument for the type of armed conflicts for which the researcher wants information. Options include “extrasystemic”, “interstate”, “intrastate”, and “II” (short for “internationalized intrastate”). If no type is specified, the function returns information about ongoing disputes and onsets for all states for all types of conflict. If the user wants information about multiple types of conflict–say: intra-state wars and internationalized intra-state wars—they can specify that as a character vector (e.g. type = c("intrastate", "II")). issue is another optional argument for what issue types of conflicts the user wants beyond the type of armed conflict. Options include “territory”, “government”, and “both”. If no issue is specified, the function returns information for all conflicts regardless of the particular issue. only_wars is an argument that subsets the data to just those with the intensity levels of “war” when only_wars = TRUE. The argument defaults to FALSE, returning information about conflicts with at least 25 deaths in addition to the conflicts with more than 1,000 deaths. In this application, add_ucdp_acd(type = "intrastate", only_wars = FALSE)) returns state-year information about ongoing intra-state conflicts over any issue and at either of UCDP’s severity thresholds.10

# Create a set of state-year data for a civil conflict analysis.
create_stateyears(system = 'gw', subset_years = c(1946:2019)) %>%
  add_ucdp_acd(type="intrastate", only_wars = FALSE) %>%
  add_spells() %>%
  add_democracy() %>%
  add_creg_fractionalization() %>%
  add_sdp_gdp() %>%
  add_rugged_terrain() -> Data

Finally, we can add some covariates of interest to these data. add_spells() calculates peace spells between ongoing conflicts in the data generated by add_ucdp_acd(). add_democracy() adds information about the level of democracy in the year using three prominent data sets on democracy (Polity, V-Dem, and Marquez’ (2016) extension of Pemstein et al’s (2010) Unified Democracy Scores). add_creg_fractionalization() adds information about the fractionalization and polarization of a state’s ethnic and religious groups from the Composition of Religious and Ethnic Groups (CREG) Project at the University of Illinois. add_sdp_gdp() will add information about a state’s estimated GDP, population, and GDP per capita from the Anders, Fariss & Markowitz (2020) simulations. Finally, add_rugged_terrain() provide two estimates of the ruggedness of a state’s terrain. The first is the terrain ruggedness index calculated by Nunn & Puga (2012) and the second is the Gibler & Miller (2014) extension of the natural logged percentage of the state that is mountainous (originally calculated by Fearon & Laitin (2003)). At the end of the pipe, the data returned by {peacesciencer} is assigned to an object minimally called Data.

{peacesciencer}’s tight integration with the {tidyverse} permits wide flexibility for the researcher. For example, assume the researcher wants to discern the estimated effect of the same set of covariates on intra-state conflicts at the threshold of war and those intra-state conflicts at or below the threshold of war. The first call included all conflicts with at least 25 deaths, per the UCDP’s inclusion rules, and the peace years were calculated for those as well. If the researcher wants a new set of conflicts with a new set of peace years, it would be a simple matter of repeating the pipe-based workflow, but altering the argument in add_ucdp_acd() to be only_wars = TRUE. {peacesciencer} would then calculate the peace years for those (add_peace_years()). To avoid confusion with the overlapping column names, the researcher can use some {tidyverse} verbs to rename all those conflict variables to have a distinct prefix of war_ (i.e. rename_at(vars(ucdpongoing:ucdpspell), ~paste0("war_", .))) before finally joining these data into the master data frame (i.e. left_join(Data, .) -> Data). Table 4 shows the fruits of the data {peacesciencer} generated after some post-processing and lagging important variables.

# Repeat the process, but for a new DV of just wars
create_stateyears(system = 'gw', subset_years = c(1946:2019)) %>%
  add_ucdp_acd(type="intrastate", only_wars = TRUE) %>%
  add_spells() %>%
  rename_at(vars(ucdpongoing:ucdpspell), ~paste0("war_", .)) %>%
  left_join(Data, .) -> Data
Table 4: A Civil Conflict Analysis of Gleditsch-Ward State-Years in {peacesciencer}
All UCDP Conflicts Wars Only
GDP per Capita (Lagged) −0.285* −0.343*
(0.110) (0.172)
Population Size (Lagged) 0.229* 0.272*
(0.067) (0.106)
Extended UDS Democracy Score (Lagged) 0.257 −0.085
(0.181) (0.270)
Extended UDS Democracy Score^2 (Lagged) −0.726* −0.761*
(0.211) (0.352)
% Mountainous Terrain (Logged) 0.055 0.342*
(0.067) (0.112)
Ethnic Fractionalization 0.442 0.333
(0.358) (0.554)
Religious Fractionalization −0.389 −0.281
(0.402) (0.593)
t −0.074+ −0.111*
(0.039) (0.056)
t^2 0.004* 0.005+
(0.002) (0.003)
t^3 0.000* 0.000+
(0.000) (0.000)
Intercept −5.098* −6.591*
(1.351) (2.084)
Num.Obs. 8192 8192
+ p < 0.1, * p < 0.05

Creating Leader-Year Data and Adding to Leader-Year Data in {peacesciencer}

{peacesciencer} also has support for newer levels of analysis in the peace science community, prominently leader-levels of analysis. There has been considerable emphasis in peace science research to emphasize that state leaders, not “states”, make foreign policy decisions that may lead to war. Understanding the attributes of leaders themselves are critical to the core research questions of the community (Goemans, Gleditsch & Chiozza, 2009; Horowitz & Stam, 2014; Ellis, Horowitz & Stam, 2015) and {peacesciencer} wants to help researchers toward that end.

Table 2 shows there are some dedicated functions for populating leader-level data with leader-specific information, in addition to adding state-year-level information (e.g. democracy, capabilities) to leader-level data. Table 1 shows support for creating leader-level data that are standardized to either the CoW or G-W state system data. Suppose a researcher wanted to create leader-year data, standardized to CoW system dates, for an analysis of inter-state dispute initiation analogous to what Horowitz & Stam (2014) do. The user would first start with creating the base data (create_leaderyears(standardize = "cow", subset_years=c(1875:2010))), which also generates some leader attributes (e.g. estimated leader age, year in office, and leader gender). From there, they would add information about leader conflict behavior in the year with add_gml_mids() and calculate peace-years with the add_spells() function.11 add_lead() will add a battery of leader experience and attributes variables from data created by Ellis, Horowitz & Stam (2015), including whether the leader had military experience, combat experience, was a rebel fighter, and many more. Finally, add_nmc() and add_democracy() will add state-year estimates of national capabilities and democracy for assisting in making state-to-state comparisons, even for leader-level analyses. Some light recoding of the data created in {peacesciencer} and some regression modeling reproduces the results presented in Table 5, itself an approximation of the kind of leader-year analysis exemplified in Horowitz & Stam (2014).

# Create a set of leader-year data for an analysis of conflict initiation.
create_leaderyears(standardize = "cow", subset_years=c(1875:2010)) %>%
  add_gml_mids() %>%
  add_spells() %>%
  add_lead() %>%
  add_nmc() %>%
  add_democracy() -> Data
Table 5: A Leader-Year Analysis of Inter-State Dispute Initiation in {peacesciencer}
Model 1
Male Leader −0.752*
(0.206)
Leader Age 0.010*
(0.002)
Leader Tenure (in Years) 0.027*
(0.004)
Military Service (No Combat) 0.332*
(0.084)
Military Serivce (with Combat) 0.132
(0.090)
Previous Rebel Experience 0.287*
(0.069)
Prior War Win 0.033
(0.107)
Prior War Loss 0.090
(0.106)
Prior Rebel Win −0.136
(0.102)
Prior Rebel Loss 0.328*
(0.142)
Extended UDS Democracy Score −0.180*
(0.037)
CINC 11.858*
(0.626)
t −0.247*
(0.028)
t^2 0.010*
(0.004)
t^3 0.000+
(0.000)
Intercept −1.816*
(0.243)
Num.Obs. 13839
+ p < 0.1, * p < 0.05

A Comparison with Other Approaches

{peacesciencer} is not the only software available to peace science researchers who want to reduce the time and energy required to faithfully recreate data from scratch. Alternatives exist, some more inaccessible than others. NewGene, for example, is a stand-alone software program for Microsoft Windows and Mac that can create various types of data of interest to international relations scholars (Bennett, Poast & Stam, 2019). NewGene is itself the evolution of EUGene, which served conflict researchers well for over a decade (Bennett & Stam, 2000). Finally, peace scientists well-versed in Structured Query Language (SQL) could use one of several “join” transformations to create dyad-year data from state system data, even if this might amount to a detour in the peace scientist’s research agenda for this particular task.12 A comparison of {peacesciencer} with EUGene and NewGene suggests the following benefits of the package while emphasizing areas where other options may have some advantages.

EUGene is the clear inspiration for this package. Though its original impetus was the generation of expected utility data for evaluating Bueno de Mesquita & Lalman (1992) (i.e. the “EU” in “EUGene”), the software became quite popular for peace science scholars in the early 2000s for helping them start new projects from scratch with important data already provided. {peacesciencer} covers all the same units of analysis that EUGene covers, as of version 3.2.13 EUGene has more explicit support for dyadic dispute-year data whereas {peacesciencer} treats dyadic dispute-year data as a derivation of dyad-year data, with respect to functions that populate base data with additional information.14 {peacesciencer} may reflect more current research interests in the peace science community. This is why there are leader-level data and support for the G-W (and UCDP) ecosystem of data that EUGene does not have, but there is no function yet for things like calculating expected utility values. Importantly, though, {peacesciencer} mimics the verbosity of EUGene’s user manual. EUGene’s user manual was amply informative about what it was calculating and why it was doing what it was doing for some defaults. Likewise, {peacesciencer}’s documentation strives to be as informative as possible as to what it is doing and why it is doing what it is doing for some defaults. A scholar who remembers EUGene well will ideally think of {peacesciencer} as the most faithful approximation of what that software did for the community at the time. It has the added benefit of being agnostic to the researcher’s operating system, having greater flexibility of data types supported, and better reflecting more current frontiers in the community. It does have the drawback of requiring at least some level of comfort with the R programming language.

NewGene is the latest evolution of EUGene, at least as a stand-alone executable program for creating the kind of data of interest to the peace science community. NewGene’s greatest strength, relative to EUGene and even {peacesciencer}, is its support for k-adic data (c.f. Poast, 2010). k-adic levels of analysis are not yet supported in {peacesciencer} and users interested in generating k-adic data should consider downloading NewGene. No matter, {peacesciencer} has several superlatives in relation to NewGene. It supports leader-year and leader-dyad-year data while NewGene does not. It offers support for the G-W (and UCDP) ecosystem of data whereas NewGene does not and is mostly aimed for researchers interested in inter-state conflict. NewGene deviates a bit from EUGene by only indirectly asking users what kind of data they want (e.g. state-year, dyad-year) and without providing too much detail about what it is doing and why it is doing what it is doing. For example, NewGene only indirectly states the unit of analysis of the data to be generated near the top of its interface by asking the user how many country columns they want. This is an indirect way of the user to get data that are state-year, dyad-year, triad-year (etc.), which are then expanded and populated with data at various levels. {peacesciencer}, much like EUGene, is more explicit, encouraging the user to be upfront about what their unit of analysis is and what are the data that can be plausibly plugged into the data the user is creating. {peacesciencer}, again, does expect at least some level of comfort with the R programming language, but even this comes with greater ease of interpreting what is the unit of analysis and what are the primary spatial and temporal units that serve as the basis of the data.

Conclusion

{peacesciencer} is already more than capable of creating the kind of data in high demand in peace science. It can create dyad-year, leader-year, leader-dyad-year, and state-year data (among others). It is also generalizable to the dispute data included in the package, allowing for merging into dispute-year data as well. This feature showed how it can effectively approximate three types of analyses in wide use in the peace science community. Surely researchers can and will add more information to these simple analyses after using {peacesciencer}, but the package already does a lot of the tedious work for researchers. It also does this in a maximally transparent way that conforms well to the DA-RT initiative across all political science. This is not to say {peacesciencer} does everything, but {peacesciencer} can only evolve and expand on what it already does well. This package can only evolve to meet new analytical demands for the peace science community. Users are free to request new features as “issues” on the project’s Github.

Finally, a skeptical reader should not think that making the process as simple as possible necessarily facilitates poor decision-making by the user. In cases where it is evident what the user wants (e.g. an estimate of the level of democracy in the state-year), {peacesciencer} does the necessary work to provide the user that information. However, the package makes sure to leave important decision-making to the researcher. For example, add_cow_alliance() returns information about various types of alliance pledges in the dyad-year—should one exist—but leaves it to the researcher to say whether they want to define the presence of an alliance to be just a defense pledge or any type of alliance pledge. add_contiguity() returns information about the type of contiguity relationship in the dyad-year, but leaves it to the researcher whether they want to code a contiguity variable as the presence of a mutual land border or some other type of contiguity threshold. The documentation included in this package, and on the website, is replete with caveats about the underlying data (e.g. the contiguity data are not ordinal and should not be treated as such), how and where data issues arise (e.g. how CoW state system data differ from Gleditsch-Ward data and how one is coerced into the other), and how researchers should consider optimally using its functionality (e.g. add_ucdp_acd() probably should not lump all forms of conflict together (c.f. Gibler & Miller, Forthcoming)). {peacesciencer} does not endeavor to make researchers lazy or sloppy, and it does not ultimately do this. Instead, {peacesciencer} encourages well-reasoned design decisions by the user up front and reduces the tedium associated with starting quantitative peace science research. It achieves this in a quick, robust, and transparent way.

References

Anders, Therese, Christopher J Fariss & Jonathan N Markowitz (2020) Bread before guns or butter: Introducing surplus domestic product (SDP). International Studies Quarterly 64(2): 392–405.
Arel-Bundock, Vincent (2021b) Modelsummary: Summary Tables and Plots for Statistical Models and Data: Beautiful, Customizable, and Publication-Ready (https://CRAN.R-project.org/package=modelsummary).
Arel-Bundock, Vincent (2021a) WDI: World Development Indicators and Other World Bank Data (https://CRAN.R-project.org/package=WDI).
Barbieri, Katherine, Omar MG Keshk & Brian Pollins (2009) Trading data: Evaluating our assumptions and coding rules. Conflict Management and Peace Science 26(5): 471–491.
Beck, Nathaniel, Jonathan N Katz & Richard Tucker (1998) Taking time seriously: Time-series-cross-section analysis with a binary dependent variable. American Journal of Political Science 42(4): 1260–1288.
Bennett, DScott, Paul Poast & Allan C Stam (2019) NewGene: An introduction for users. Journal of Conflict Resolution 63(6): 1579–1592.
Bennett, DScott & Allan Stam (2000) EUGene: A conceptual manual. International Interactions 26(2): 179–204.
Bowers, Jake & Maarten Voors (2016) How to improve your relationship with your future self. Revista de Ciencia Politica 36(3): 829–848.
Bremer, Stuart A (1992) Dangerous dyads: Conditions affecting the likelihood of interstate war, 1816-1965. Journal of Conflict Resolution 36(2): 309–341.
Bueno de Mesquita, Bruce & David Lalman (1992) War and Reason: Domestic and International Imperatives. Yale University Press.
Carter, David B & Curtis S Signorino (2010) Back to the future: Modeling time dependence in binary data. Political Analysis 18(3): 271–292.
Carter, Jeff & Charles E Smith (2020) A framework for measuring leaders’ willingness to use force. American Political Science Review 114(4): 1352–1358.
Coppedge, Michael, John Gerring, Carl Henrik Knutsen, Staffan I Lindberg, Jan Teorell, David Altman, Michael Bernhard, MSteven Fish, Adam Glynn, Allen Hicken, Anna Luhrmann, Kyle L Marquardt, Kelly McMann, Pamela Paxton, Daniel Pemstein, Brigitte Seim, Rachel Sigman, Svend-Erik Skaaning, Jeffrey Staton, Agnes Cornell, Lisa Gastaldi, Haakon Gjerløw, Valeriya Mechkova, Johannes von Römer, Aksel Sundtröm, Eitan Tzelgov, Luca Uberti, Yi-ting Wang, Tore Wig & Daniel Ziblatt (2020) V-dem codebook v10.
Correlates of War (2011) State system membership list, v2016 (http://correlatesofwar.org).
Ellis, Cali Mortenson, Michael C Horowitz & Allan C Stam (2015) Introducing the LEAD data set. International Interactions 41(4): 718–741.
Fearon, James D & David D Laitin (2003) Ethnicity, insurgency, and civil war. American Political Science Review 97(1): 75–90.
Gibler, Douglas M (2009) International Military Alliances, 1648-2008. Washington DC: CQ Press.
Gibler, Douglas M & Steven V Miller (2014) External territorial threat, state capacity, and civil war. Journal of Peace Research 51(5): 634–646.
Gibler, Douglas M & Steven V Miller (Forthcoming) An appraisal of Project Mars and the Divided Armies argument. International Studies Quarterly (Forthcoming).
Gibler, Douglas M, Steven V Miller & Erin K Little (2016) An analysis of the Militarized Interstate Dispute (MID) dataset, 1816-2001. International Studies Quarterly 60(4): 719–730.
Gleditsch, Kristian S & Michael D Ward (1999) A revised list of independent states since the Congress of Vienna. International Interactions 25(4): 393–413.
Gleditsch, Nils Petter, Peter Wallensteen, Mikael Eriksson, Margareta Sollenberg & Havard Strand (2002) Armed conflict 1946-2001: A new dataset. Journal of Peace Research 39(5): 615–637.
Goemans, Henk E, Kristian Skrede Gleditsch & Giacomo Chiozza (2009) Introducing Archigos: A dataset on political leaders. Journal of Peace Research 46(2): 269–83.
Horowitz, Michael C & Allan C Stam (2014) How prior military experience influences the future militarized behavior of leaders. International Organization 68(3): 527–559.
Leeds, Bretty Ashley, Jeffrey M Ritter, Sara McLaughlin Mitchell & Andrew G Long (2002) Alliance treaty obligations and provisions, 1815-1944. International Interactions 28: 237–260.
Lemke, Douglas & William Reed (2001) The relevance of politically relevant dyads. Journal of Conflict Resolution 45(1): 126–144.
Marquez, Xavier (2016) A quick method for extending the Unified Democracy Scores (http://dx.doi.org/10.2139/ssrn.2753830).
Marshall, Monty G, Ted Robert Gurr & Keith Jaggers (2017) Polity IV project: Political regime characteristics and transitions, 1800-2016.
Nunn, Nathan & Diego Puga (2012) Ruggedness: The blessing of bad geography in Africa. Review of Economics and Statistics 94(1): 20–36.
Pemstein, Daniel, Stephen A Meserve & James Melton (2010) Democratic compromise: A latent variable analysis of ten measures of regime type. Political Analysis 18(4): 426–449.
Poast, Paul (2010) (Mis)using dyadic data to analyze multilateral events. Political Analysis 18(4): 403–425.
Schvitz, Guy, Seraina Rüegger, Luc Girardin, Lars-Erik Cederman, Nils Weidmann & Kristian Skrede Gleditsch (2021) Mapping The International System, 1886-2017: The CShapes 2.0 Dataset. Journal of Conflict Resolution (https://journals.sagepub.com/doi/full/10.1177/00220027211013563).
Singer, David J (1987) Reconstructing the Correlates of War dataset on material capabilities of states, 1816-1985. International Interactions 14(1): 115–32.
Stinnett, Douglas M, Jaroslav Tir, Philip Schafer, Paul F Diehl & Charles S Gochman (2002) The Correlates of War Project Direct Contiguity Data, version 3. Conflict Management and Peace Science 19(2): 58–66.
Thompson, William R & David R Dreyer (2012) Handbook of International Rivalries, 1494-2010. Washington DC: CQ Press.
Wickham, Hadley, Mara Averick, Jennifer Bryan, Winston Chang, Lucy D’Agostino McGowan, Romain François, Garrett Grolemund, Alex Hayes, Lionel Henry, Jim Hester, Max Kuhn, Thomas Lin Pedersen, Evan Miller, Stephan Milton Bache, Kirill Müller, Jeroen Ooms, David Robinson, Dana Paige Seidel, Vitalie Spinu, Kohske Takahashi, Davis Vaughan, Claus Wilke, Kara Woo & Hiroaki Yutani (2019) Welcome to the tidyverse. Journal of Open Source Software 4(43): 1686.

  1. In both cases, SQL-like “joins”—especially the left (outer) join—are the primary workhorse in these functions. A vignette on the package website, included in the appendix, explains these various joins, why they are used, and what issues may arise from them.↩︎

  2. {peacesciencer} also has miscellaneous “helper” functions that are not belabored in this manuscript. These include functions that allow the user to use {peacesciencer} functions with data created outside the package (declare_attributes()), download external data for quicker use of other functions (download_extdata()), and functions that communicate suggested citations (ps_cite()), among a few other functions. See the package’s website for more information (http://svmiller.com/peacesciencer).↩︎

  3. Leader-dyad-year data are too time-consuming to calculate each time and too size-prohibitive to include in the package. These data can be downloaded through download_extdata() for much easier use.↩︎

  4. The “whittle” (wc_) class of functions is a suite of functions that whittle dyadic dispute-years to true dyad-year data. They are effectively used in add_cow_mids() and add_gml_mids() for dyad-year data, but are offered here to allow users to employ their own case exclusion rules to create their own dyad-year conflict data. A vignette on {peacesciencer}’s website explains these case exclusion rules in some detail.↩︎

  5. {peacesciencer} comes with a vignette that explains just how much unnecessary missingness pervades data on democracy in the absence of an innovation like this, and how this missingness might adversely affect inferences of inter-state or intra-state conflict.↩︎

  6. The version of {peacesciencer} slated for release includes the most up-to-date-possible versions of data sets referenced in this manuscript. One helper function, ps_version(), will inform the user about the versions of the data included in the installed package. {peacesciencer} promises prompt package updates as new versions become available.↩︎

  7. R users will recognize that pound signs (#) allow for commenting code. Users new to R should understand its use here is primarily for exposition of what the code snippets in this manuscript are doing. In subsequent code snippets, #> will indicate console output the user can expect from these functions.↩︎

  8. keep = NULL in this context will focus the information returned to just the information about ongoing conflicts and onsets, discarding potentially unwanted columns about things like Side A, hostility levels, and other information included in the data set. Type ?add_gml_mids() for more information.↩︎

  9. add_spells() will only calculate the peace years—titled, in this case, gmlmidspell—and will leave the temporal dependence adjustment to the taste of the researcher. Importantly, I do not recommend manually creating splines or square/cube terms because it creates more problems in adjusting for temporal dependence in model predictions. In a regression formula in R, you can specify the Carter & Signorino (2010) approach as ... + gmlmidspell + I(gmlmidspell^2) + I(gmlmidspell^3). The Beck, Katz & Tucker (1998) cubic splines approach is ... + splines::bs(gmlmidspell, 4). This function includes the spell and three splines (hence the 4 in the command). Either approach makes for easier model predictions, given R’s functionality.↩︎

  10. The function also returns some background information of interest to the researcher, including the maximum intensity observed and the IDs associated with all ongoing conflicts in the state that year.↩︎

  11. The documentation for add_gml_mids(), along with the data-raw directory on Github, provides ample explanation for how inter-state conflicts in the GML conflict data are connected to specific leaders. Since inter-state dispute data can have missing days, connecting leaders to dispute onsets involves some important archival research that is only possible because of the information collected by Gibler, Miller & Little (2016) over 10 years of scholarship. The documentation also considers the multiple ways in which users can more flexibly code the concept of “initiation” based on participant-level summaries. The default coding of initiation, based on this exhaustive archival work, is to code a leader initiating a dispute if they were on Side A or joined the conflict at any point after the onset of the dispute. Users can change this to focus on just Side A, or just originators on Side A. This is the init argument in add_gml_mids().↩︎

  12. That said, {peacesciencer} makes ample use of SQL-like “joins” for the construction of base data and functions that merge data into base data created in the package. A discussion of these joins appears as a vignette on the package’s website.↩︎

  13. The latest version, as of writing, is v. 3.212 released in 2017. However, this is apparently a bug fix on string variables and that the underlying functionality was still last updated in 2007 with the release of version 3.2.↩︎

  14. For example, the “whittle” class of functions (i.e. wc_ in Table 2) are designed for dyadic dispute-year data also included in the package while a helper function—declare_attributes()—can allow a user to treat other dyadic dispute-year data as dyad-year data for the sake of using any of the other supported functions in Table 2.↩︎