add_gml_mids() merges in GML's MID data to a (dyad-year, leader-year, leader-dyad-year, state-year) data frame. The current version of the GML MID data is 2.2.1.

## Usage

add_gml_mids(data, keep, init = "sidea-all-joiners")

## Arguments

data

a data frame with appropriate peacesciencer attributes

keep

an optional character vector, applicable only to dyad-year data, that is passed to the function in a select(one_of(.)) wrapper. It lets the user discard unwanted columns from the directed dispute data so that the output does not consume too much memory. Note: the Correlates of War state codes (ccode1, ccode2), the observation year (year), the presence or absence of an ongoing MID (gmlmidongoing), and the presence or absence of a unique MID onset (gmlmidonset) are always returned. It would be foolish and self-defeating to eliminate those columns. The user is free to keep or discard anything else as they see fit.

If keep is not specified, the output includes every column.

init

how should initiators be coded? Applicable only to state-year, leader-dyad-year, and leader-year data. This parameter accepts one of three values: "sidea-orig", "sidea-with-joiners", or "sidea-all-joiners". In each case, initiation appears as a summary return in the output.

"sidea-orig" = a state initiates a MID if it was on Side A at the onset of the dispute.

"sidea-with-joiners" = a state initiates a MID if it was on Side A at the onset of the dispute or if it joined the MID on Side A.

"sidea-all-joiners" = a state initiates a MID if it was on Side A at the onset of the dispute or if it joined the dispute at any point thereafter.

See the details section for more discussion. The default is "sidea-all-joiners".
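The three init options can be compared on a toy example. What follows is a minimal base R sketch, not the package's internal code: code_init() is a hypothetical helper, the participant records are invented, and it assumes that "joined at any point thereafter" covers joiners on either side. The column names sidea and orig mirror the Side A and originator indicators that appear in the dyad-year output.

```r
# Toy participant-level records; all values are invented for illustration.
# sidea: 1 if the state is on Side A, 0 otherwise.
# orig:  1 if the state was present at the onset, 0 if it joined later.
participants <- data.frame(
  state = c("A", "B", "C", "D"),
  sidea = c(1, 0, 1, 0),
  orig  = c(1, 1, 0, 0)
)

# Hypothetical helper (not part of peacesciencer) that applies one of the
# three initiator rules described for the init argument.
code_init <- function(sidea, orig, init = "sidea-all-joiners") {
  init <- match.arg(init, c("sidea-orig", "sidea-with-joiners", "sidea-all-joiners"))
  switch(init,
    # Side A at the onset of the dispute only
    "sidea-orig"         = as.integer(sidea == 1 & orig == 1),
    # Side A at the onset, or joined later on Side A
    "sidea-with-joiners" = as.integer(sidea == 1),
    # Side A at the onset, or joined later (assumed: on either side)
    "sidea-all-joiners"  = as.integer((sidea == 1 & orig == 1) | orig == 0)
  )
}

participants$init_orig    <- code_init(participants$sidea, participants$orig, "sidea-orig")
participants$init_with    <- code_init(participants$sidea, participants$orig, "sidea-with-joiners")
participants$init_all     <- code_init(participants$sidea, participants$orig, "sidea-all-joiners")
print(participants)
```

Under these assumptions, state C (a Side A joiner) counts as an initiator under the two joiner options, while state D (a Side B joiner) counts only under "sidea-all-joiners".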

## Value

add_gml_mids() takes a (dyad-year, leader-year, leader-dyad-year, state-year) data frame and adds dispute information from the GML MID data. If the data are dyad-year, the return is a laundry list of information about onsets, ongoing conflicts, and assorted participant- and dispute-level summaries. If the data are leader-dyad-year, these are carefully matched to leaders as well. If the data are state-year or leader-year, the function returns information about ongoing disputes (and onsets) and whether there were any ongoing disputes (and onsets) the state (or leader) initiated.
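For data other than dyad-years, a call might look like the sketch below. It assumes that the package is installed and that create_stateyears() is used to build the state-year frame; the init_choice variable is only there for illustration. The block is guarded so that it does nothing when the package is unavailable.

```r
# The initiator rule to apply; any of the three documented values would work.
init_choice <- "sidea-orig"

if (requireNamespace("peacesciencer", quietly = TRUE)) {
  library(peacesciencer)

  # Assumed workflow: build a state-year frame, then add the dispute summaries.
  state_years <- create_stateyears()
  with_mids   <- add_gml_mids(state_years, init = init_choice)

  # Inspect the first few rows rather than relying on specific column names.
  print(head(with_mids))
}
```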

## Details

Dyads are capable of having multiple disputes in a given year, which can create a problem for merging into a complete dyad-year data frame. Consider the case of France and Italy in 1860, which had three separate dispute onsets that year (MID#0112, MID#0113, MID#0306), as illustrative of the problem. This merging process employs several rules to whittle down these duplicate dyad-years for merging into a dyad-year data frame.

The function will also return a message to the user about the case-exclusion rules that went into this process. Users who are interested in implementing their own case-exclusion rules should look up the "whittle" class of functions also provided in this package.
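The ordering of the case-exclusion rules can be illustrated with a toy example. This is a simplified base R sketch rather than the package's own routine: it uses the three dispute numbers from the France-Italy 1860 case, but every other value is invented, and it skips both the initial selection on unique onsets and any handling of missing values. Sorting by each criterion in turn and keeping the first row per dyad-year approximates the sequential "retain the highest" rules.

```r
# Toy dispute-year records for a single dyad-year. Dispute numbers come from
# the France-Italy 1860 example; all other values are invented for illustration.
disputes <- data.frame(
  ccode1   = 220,
  ccode2   = 325,
  year     = 1860,
  dispnum  = c(112, 113, 306),
  fatality = c(0, 0, 0),
  hostlev  = c(4, 4, 3),
  mindur   = c(10, 30, 5),
  maxdur   = c(10, 30, 5),
  recip    = c(0, 1, 0),
  stmon    = c(9, 3, 5)
)

# Rank within each dyad-year: highest fatality, then highest hostlev, then
# highest mindur, then highest maxdur, then reciprocated first, then the
# lowest start month.
ranked <- disputes[with(disputes, order(
  ccode1, ccode2, year,
  -fatality, -hostlev, -mindur, -maxdur, -recip, stmon
)), ]

# Keep only the top-ranked dispute for each dyad-year.
whittled <- ranked[!duplicated(ranked[c("ccode1", "ccode2", "year")]), ]
print(whittled)
```

With these made-up values, fatality ties across all three disputes, hostlev narrows the set to two, and the estimated minimum duration breaks the remaining tie.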

Determining "initiation" for state-year summaries of inter-state disputes is possible since there is an implied directionality of "initiation." In about half of all cases, this is straightforward. You can use the participant summaries and determine that if the dispute was bilateral and did not escalate beyond an attack, the state on Side A initiated the dispute. For multilateral MIDs, these conditions still hold at least for originators. However, there is considerable difficulty for cases where 1) participant-level summaries suggest actions at the level of clash or higher, or 2) the participant was a joiner and not an originator. The effort required to flesh this out is enormous, though a fuller treatment is perhaps forthcoming in a future update.

add_gml_mids() allows you to make one of three judgment calls here (see the arguments section of the documentation). If it were my call to make, I would say you should probably use the option "sidea-all-joiners". My review of the MID data with Doug Gibler suggests most states that join a dispute are not roped into a conflict (i.e. targeted by some other state) after the first incident. They routinely initiate their entry into the conflict, which is what this concept of "initiation" is supposed to capture in the literature. There are no doubt cases where some third state is brought into the dispute by the actions of some other state, even as the original MID coding rules place a high barrier on coding that type of dispute entry. However, individually assessing whether a state initiated its entry into a MID in anything other than the simplest of cases (e.g. bilateral cases where the highest participant action fell short of a clash) would be too time-consuming. It would require an audit of almost half of all participant-level summaries in the data. In a forthcoming publication, Gibler and Miller offer excellent coverage here with a new data set on militarized events. However, that data set includes only confrontations after World War II.

## References

Gibler, Douglas M., Steven V. Miller, and Erin K. Little. 2016. “An Analysis of the Militarized Interstate Dispute (MID) Dataset, 1816-2001.” International Studies Quarterly 60(4): 719-730.

## Author

Steven V. Miller

## Examples


# \donttest{
# just call library(tidyverse) at the top of your script
library(magrittr)
library(peacesciencer)
cow_ddy %>% add_gml_mids()
#> Joining, by = c("ccode1", "ccode2", "year")
#> add_gml_mids() IMPORTANT MESSAGE: By default, this function whittles dispute-year data into dyad-year data by first selecting on unique onsets. Thereafter, where duplicates remain, it whittles dispute-year data into dyad-year data in the following order: 1) retaining highest fatality, 2) retaining highest hostlev, 3) retaining highest estimated mindur, 4) retaining highest estimated maxdur, 5) retaining reciprocated over non-reciprocated observations, 6) retaining the observation with the lowest start month, and, where duplicates still remained (and they don't), 7) forcibly dropping all duplicates for observations that are otherwise very similar.
#> # A tibble: 2,101,440 × 26
#>    ccode1 ccode2  year dispnum gmlmidongoing gmlmidonset sidea1 sidea2 fatality1
#>     <dbl>  <dbl> <dbl>   <dbl>         <dbl>       <dbl>  <dbl>  <dbl>     <dbl>
#>  1      2     20  1920      NA             0           0     NA     NA        NA
#>  2      2     20  1921      NA             0           0     NA     NA        NA
#>  3      2     20  1922      NA             0           0     NA     NA        NA
#>  4      2     20  1923      NA             0           0     NA     NA        NA
#>  5      2     20  1924      NA             0           0     NA     NA        NA
#>  6      2     20  1925      NA             0           0     NA     NA        NA
#>  7      2     20  1926      NA             0           0     NA     NA        NA
#>  8      2     20  1927      NA             0           0     NA     NA        NA
#>  9      2     20  1928      NA             0           0     NA     NA        NA
#> 10      2     20  1929      NA             0           0     NA     NA        NA
#> # … with 2,101,430 more rows, and 17 more variables: fatality2 <dbl>,
#> #   fatalpre1 <dbl>, fatalpre2 <dbl>, hiact1 <dbl>, hiact2 <dbl>,
#> #   hostlev1 <dbl>, hostlev2 <dbl>, orig1 <dbl>, orig2 <dbl>, fatality <dbl>,
#> #   hostlev <dbl>, recip <dbl>, mindur <dbl>, maxdur <dbl>, stmon <dbl>,
#> #   init1 <dbl>, init2 <dbl>

# keep just the dispute number and Side A/B identifiers
cow_ddy %>% add_gml_mids(keep = c("dispnum", "sidea1", "sidea2"))
#> Joining, by = c("ccode1", "ccode2", "year")
#> add_gml_mids() IMPORTANT MESSAGE: By default, this function whittles dispute-year data into dyad-year data by first selecting on unique onsets. Thereafter, where duplicates remain, it whittles dispute-year data into dyad-year data in the following order: 1) retaining highest fatality, 2) retaining highest hostlev, 3) retaining highest estimated mindur, 4) retaining highest estimated maxdur, 5) retaining reciprocated over non-reciprocated observations, 6) retaining the observation with the lowest start month, and, where duplicates still remained (and they don't), 7) forcibly dropping all duplicates for observations that are otherwise very similar.
#> # A tibble: 2,101,440 × 12
#>    ccode1 ccode2  year gmlmidonset gmlmidongoing init1 init2 sidea1 sidea2 orig1
#>     <dbl>  <dbl> <dbl>       <dbl>         <dbl> <dbl> <dbl>  <dbl>  <dbl> <dbl>
#>  1      2     20  1920           0             0    NA    NA     NA     NA    NA
#>  2      2     20  1921           0             0    NA    NA     NA     NA    NA
#>  3      2     20  1922           0             0    NA    NA     NA     NA    NA
#>  4      2     20  1923           0             0    NA    NA     NA     NA    NA
#>  5      2     20  1924           0             0    NA    NA     NA     NA    NA
#>  6      2     20  1925           0             0    NA    NA     NA     NA    NA
#>  7      2     20  1926           0             0    NA    NA     NA     NA    NA
#>  8      2     20  1927           0             0    NA    NA     NA     NA    NA
#>  9      2     20  1928           0             0    NA    NA     NA     NA    NA
#> 10      2     20  1929           0             0    NA    NA     NA     NA    NA
#> # … with 2,101,430 more rows, and 2 more variables: orig2 <dbl>, dispnum <dbl>
# }