How `{peacesciencer}` Coerces Dispute-Year Data into Dyad-Year Data

library(tidyverse)
#> ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
#> ✔ dplyr     1.1.4     ✔ readr     2.1.4
#> ✔ forcats   1.0.0     ✔ stringr   1.5.0
#> ✔ ggplot2   3.5.1     ✔ tibble    3.3.0
#> ✔ lubridate 1.9.4     ✔ tidyr     1.3.0
#> ✔ purrr     1.1.0     
#> ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
#> ✖ dplyr::filter() masks stats::filter()
#> ✖ dplyr::lag()    masks stats::lag()
#> ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(peacesciencer)
#> The legacy packages maptools, rgdal, and rgeos, underpinning the sp package,
#> which was just loaded, will retire in October 2023.
#> Please refer to R-spatial evolution reports for details, especially
#> https://r-spatial.org/r/2023/05/15/evolution4.html.
#> It may be desirable to make the sf package available;
#> package maintainers should consider adding sf to Suggests:.
#> The sp package is now running under evolution status 2
#>      (status 2 uses the sf package in place of rgdal)
#> {peacesciencer} includes additional remote data for separate download. Please type ?download_extdata() for more information.
#> This message disappears on load when these data are downloaded and in the package's `extdata` directory.
library(kableExtra)
#> 
#> Attaching package: 'kableExtra'
#> 
#> The following object is masked from 'package:dplyr':
#> 
#>     group_rows

Dyad-year models, whether directed or non-directed, seek to explain variation in conflict onset by reference to some covariates of interest. The unit of analysis in these models is the dyad-year (e.g. USA-Canada, 1920; USA-Canada, 1921) and not the dispute-year. A researcher who is not careful about the difference will end up with duplicate dyad-year observations for dyads in which there were multiple confrontations ongoing in a calendar year.

Here is my favorite case in point: the Italy-France dyad in 1860. This dyad not only had three unique disputes occurring in 1860, it also had three unique onsets that year. Two were even wars, though France was a passive participant in both. Heck, they all started at effectively the same time, but concerned different components of the wars of Italian unification happening at that time. A researcher should be mindful about this: their unit of analysis is supposed to be the dyad-year, not the dispute-year. Not knowing the difference means having three Italy-France observations for 1860 instead of just one. The researcher who has a dyad-year design wants the latter, not the former.

haven::read_dta("~/Koofr/data/cow/mid/5/MIDB 5.0.dta") %>%
  filter(dispnum %in% c(112, 113, 306)) %>%
  select(dispnum:sidea, fatality, hiact, hostlev)
#> # A tibble: 8 × 13
#>   dispnum stabb ccode stday stmon styear endday endmon endyear sidea fatality
#>     <dbl> <chr> <dbl> <dbl> <dbl>  <dbl>  <dbl>  <dbl>   <dbl> <dbl>    <dbl>
#> 1     112 ITA     325     7     9   1860     29      9    1860     1        4
#> 2     112 FRN     220    10     9   1860     29      9    1860     0        0
#> 3     112 PAP     327     7     9   1860     29      9    1860     0        5
#> 4     113 ITA     325    18     9   1860     13      2    1861     1        5
#> 5     113 FRN     220    13    11   1860     19      1    1861     0        0
#> 6     113 SIC     329    18     9   1860     13      2    1861     0        4
#> 7     306 ITA     325    17     9   1860     19      1    1861     0       -9
#> 8     306 FRN     220    17     9   1860     19      1    1861     1       -9
#> # ℹ 2 more variables: hiact <dbl>, hostlev <dbl>
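
To make the stakes concrete, here is a minimal sketch on made-up data. The `dyad_years` and `dispute_years` tibbles are illustrative stand-ins, not objects from this package, and only the dispute numbers are borrowed from the output above. Joining dyadic dispute-years onto a dyad-year frame quietly returns one row per dispute rather than one row per dyad-year.

```r
library(dplyr)

# Toy dyad-year frame: one row per dyad-year, as the design intends.
dyad_years <- tibble(ccode1 = 220, ccode2 = 325, year = c(1858, 1859, 1860))

# Toy dyadic dispute-years: three separate onsets for this dyad in 1860.
dispute_years <- tibble(ccode1 = 220, ccode2 = 325, year = 1860,
                        dispnum = c(112, 113, 306), disponset = 1)

merged <- left_join(dyad_years, dispute_years,
                    by = c("ccode1", "ccode2", "year"))

nrow(dyad_years)  # three dyad-years go in
nrow(merged)      # five rows come out, because 1860 now appears three times

# Count rows per dyad-year to find the offending dyad-years.
dups <- merged %>%
  count(ccode1, ccode2, year) %>%
  filter(n > 1)
dups
```

Nothing in the join warns about this, which is why the duplicates have to be resolved in the dispute-year data before the merge.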

This means a researcher must make careful design decisions about which cases to exclude from their data. There is no correct answer here, per se. There is good reason theoretically to employ certain case exclusion rules before others, which is what peacesciencer will do by default in the add_cow_mids() and add_gml_mids() functions. This vignette will explain what peacesciencer does by default. Users who want to employ their own case exclusion rules are free to use the “whittle” class of functions (e.g. whittle_conflicts_onsets(), whittle_conflicts_fatality()) on the dyadic dispute-year data included in this package. I will start with version 5.0 of the CoW-MID data and the dyadic dispute-year data I created from it.

Converting CoW-MID Dyadic Dispute-Year Data into Dyad-Year Data

First, let’s identify where there are dyad-year duplicates in the data.

cow_mid_dirdisps %>%
  # make it non-directed for ease of presentation
  filter(ccode2 > ccode1) %>%
  group_by(ccode1, ccode2, year) %>%
  summarize(n = n(),
            mids = paste0(dispnum, collapse = ", ")) %>% 
  arrange(-n) %>%
  filter(n > 1) %>%
  ungroup() 
#> `summarise()` has grouped output by 'ccode1', 'ccode2'. You can override using
#> the `.groups` argument.
#> # A tibble: 498 × 5
#>    ccode1 ccode2  year     n mids                            
#>     <dbl>  <dbl> <dbl> <int> <chr>                           
#>  1    200    365  1920     6 186, 197, 1133, 2363, 2364, 2604
#>  2      2    365  1958     5 125, 173, 608, 2215, 2216       
#>  3    651    666  1959     5 3375, 3405, 3419, 3421, 3430    
#>  4    651    666  1960     5 3375, 3405, 3419, 3422, 3430    
#>  5    652    666  1955     5 3404, 3405, 3416, 3417, 3418    
#>  6      2    365  1962     4 61, 1353, 2219, 3361            
#>  7      2    365  1967     4 345, 2930, 2931, 2934           
#>  8    200    365  1919     4 197, 2363, 2604, 2605           
#>  9    651    666  1958     4 3375, 3405, 3419, 3420          
#> 10    652    666  1954     4 3403, 3404, 3415, 3417          
#> # ℹ 488 more rows

The absolute most in the data is the United Kingdom-Soviet Union dyad, which had six conflicts ongoing and/or initiated in 1920. Next is a tie at five conflicts among the United States-Soviet Union dyad in 1958, the Egypt-Israel dyad (1959, 1960), and the Syria-Israel dyad (1955). All told, there are 498 non-directed dyad-years that duplicate in the dyadic dispute-year data. We need to whittle those down until there is no more than one observation per dyad-year in these data.

First: Select Unique Onsets

The primary aim is to preserve the unique onsets. The case of the United States-United Kingdom dyad in 1903 will illustrate what’s at stake here. The United States and United Kingdom had three MIDs ongoing in 1903. Two (MID#0002 and MID#0254) began in 1902. The third, MID#3301, is a new onset. In this case, we want to remove the observations for MID#0002 and MID#0254 and keep the observation for MID#3301.

cow_mid_dirdisps %>%
  filter(ccode1 == 2 & ccode2 == 200 & year == 1903) %>%
  select(dispnum:disponset) %>%
  kbl(., 
      caption = "United States-United Kingdom Dyadic Dispute-Years in 1903",
      booktabs = TRUE, longtable = TRUE) %>%
  kable_styling(position = "center", full_width = F,
                bootstrap_options = "striped")
#> Warning: 'xfun::attr()' is deprecated.
#> Use 'xfun::attr2()' instead.
#> See help("Deprecated")

United States-United Kingdom Dyadic Dispute-Years in 1903
dispnum	ccode1	ccode2	year	dispongoing	disponset
2	2	200	1903	1	0
254	2	200	1903	1	0
3301	2	200	1903	1	1

Here’s how peacesciencer does this first cut. Grouping by dyad-year (i.e. group_by(ccode1, ccode2, year)), it creates a new variable that equals 1 if the dyad-year has more than one row. Maintaining the same grouped structure, it calculates the standard deviation of the disponset variable. Cases where no standard deviation could be calculated are cases where the dyad-year does not duplicate, and these are assigned 0. Next, it creates a simple removeme column that equals 1 if 1) it’s a duplicated dyad-year, 2) it’s not a unique onset, and 3) the standard deviation is greater than 0 (i.e. there is at least one onset elsewhere in that dyad-year). It then removes cases where removeme == 1.

cow_mid_dirdisps %>%
  group_by(ccode1, ccode2, year)  %>%
  mutate(duplicated = ifelse(n() > 1, 1, 0)) %>%
  # Remove anything that's not a unique MID onset
  mutate(sd = sd(disponset),
         sd = ifelse(is.na(sd), 0, sd)) %>%
  mutate(removeme = ifelse(duplicated == 1 & disponset == 0 & sd > 0, 
                           1, 0)) %>% 
  filter(removeme != 1) %>% 
  # remove detritus
  select(-removeme, -sd) %>%
  # practice safe group_by()
  ungroup() -> hold_this

# ^ The `hold_this` naming convention is my favorite for intermediate objects.
# It's also a bad idea to overwrite data objects that come in this package.

Observe how it fixed that USA-United Kingdom observation in 1903.

hold_this %>% 
  filter(ccode1 == 2 & ccode2 == 200 & year == 1903) %>%
  select(dispnum:disponset)
#> # A tibble: 1 × 6
#>   dispnum ccode1 ccode2  year dispongoing disponset
#>     <dbl>  <dbl>  <dbl> <dbl>       <dbl>     <dbl>
#> 1    3301      2    200  1903           1         1
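
The standard-deviation check is easier to see on a toy example. This is a sketch on invented data: the 1903 rows copy the table above, while the 1904 and 1905 rows (and their 9001-9003 dispute numbers) are made up to show the two other situations the rule can meet.

```r
library(dplyr)

toy <- tibble(
  ccode1 = 2, ccode2 = 200,
  year = c(1903, 1903, 1903, 1904, 1904, 1905),
  dispnum = c(2, 254, 3301, 9001, 9002, 9003),
  # 1903 mixes ongoing disputes with one onset; 1904 has two ongoing
  # disputes and no onset; 1905 is an ordinary, unduplicated onset.
  disponset = c(0, 0, 1, 0, 0, 1)
)

flagged <- toy %>%
  group_by(ccode1, ccode2, year) %>%
  mutate(duplicated = ifelse(n() > 1, 1, 0),
         sd = sd(disponset),
         # sd() is NA for a single row; treat that as "no variation"
         sd = ifelse(is.na(sd), 0, sd),
         removeme = ifelse(duplicated == 1 & disponset == 0 & sd > 0, 1, 0)) %>%
  ungroup()

kept <- flagged %>% filter(removeme != 1)
kept$dispnum
```

Only the two carried-over disputes in 1903 are dropped. The invented 1904 dyad-year has no variation in disponset, so both of its rows survive this step and are left for the later rules to sort out.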

It did not fix the Italy-France problem from 1860, but that’s because all three dispute-years were onsets that year.

hold_this %>% 
  filter(ccode1 == 220 & ccode2 == 325 & year == 1860) %>%
  select(dispnum:disponset) %>%
  kbl(., caption = "France-Italy Dyadic Dispute-Years in 1860",
      booktabs = TRUE, longtable = TRUE)  %>%
  kable_styling(position = "center", full_width = F,
                bootstrap_options = "striped")
#> Warning: 'xfun::attr()' is deprecated.
#> Use 'xfun::attr2()' instead.
#> See help("Deprecated")

France-Italy Dyadic Dispute-Years in 1860
dispnum	ccode1	ccode2	year	dispongoing	disponset
112	220	325	1860	1	1
113	220	325	1860	1	1
306	220	325	1860	1	1

This just tells us we’re not done, but we knew we wouldn’t be. We need more exclusion rules to whittle down the data.

Second: Keep the Highest Dispute-Level Fatality

If presented with the opportunity to keep one dispute and drop another where two appear in a year, researchers will likely prefer the more “serious” one rather than the one that might have been a simple threat to use force or show of force. Consider this Russia-Ottoman Empire (Turkey) dyad-year in 1853. There are two unique onsets between the two that year. One (MID#0057) became the Crimean War, an important conflict! The other (MID#0126) was an apparent show of force with no fatalities. Under those conditions, it’s an easy call to keep the one with more fatalities.

hold_this %>%
  filter(ccode1 == 365 & ccode2 == 640 & year == 1853) %>%
  select(dispnum:disponset, fatality1:fatality2, hiact1, hiact2) %>%
  kbl(., 
      caption = "Russia-Ottoman Empire Dyadic Dispute-Years in 1853",
      booktabs = TRUE, longtable = TRUE)  %>%
  kable_styling(position = "center", full_width = F, 
                bootstrap_options = "striped")
#> Warning: 'xfun::attr()' is deprecated.
#> Use 'xfun::attr2()' instead.
#> See help("Deprecated")

Russia-Ottoman Empire Dyadic Dispute-Years in 1853
dispnum	ccode1	ccode2	year	dispongoing	disponset	fatality1	fatality2	hiact1	hiact2
57	365	640	1853	1	1	6	6	20	20
126	365	640	1853	1	1	0	0	7	0

There is one limitation with CoW-MID data toward this end. We obviously know CoW-MID only assigns fatalities at the end of the dispute to the participants, so we’d have no way of knowing a priori how many fatalities in that Russia-Turkey dyad were in 1853. We could have a situation like Belgium-Germany in 1939-1940. In that case, the highest action in which Belgium engaged against Germany in 1939 was a mobilization and the war that momentarily eliminated Belgium from the international system happened the next year. We also don’t know to what extent Turkey was responsible for Russia’s fatalities. The Crimean War was a multilateral war pitting the Russians against the United Kingdom, Austria-Hungary, Italy, Turkey, and France.

Thus, what follows is crude, but still useful. We’ll use the dispute-level fatality information as a stand-in here and keep the duplicate dyad-year observation with the highest fatality score. We’ll also need to decide how to handle the cases where fatality == -9. In a forthcoming data release, we find that more than half of the cases with missing fatalities in the CoW-MID data did have fatalities. Some were even wars! However, we’d have no way of knowing this from CoW-MID. We’ll be safe and recode -9 to be .5, indicating more than 0 fatalities but “less” than the fatality level of 1 (1-25 deaths), since CoW-MID can at least confidently say the latter happened.

hold_this %>%
  left_join(., cow_mid_disps %>% select(dispnum, fatality)) %>%
  mutate(fatality = ifelse(fatality == -9, .5, fatality)) %>%
  arrange(ccode1, ccode2, year) %>%
  group_by(ccode1, ccode2, year) %>%
  mutate(duplicated = ifelse(n() > 1, 1, 0)) %>%
  group_by(ccode1, ccode2, year, duplicated) %>% 
  # Keep the highest fatality
  filter(fatality == max(fatality)) %>% 
  mutate(fatality = ifelse(fatality == .5, -9, fatality)) %>% 
  arrange(ccode1, ccode2, year) %>%
  # practice safe group_by()
  ungroup() -> hold_this
#> Joining with `by = join_by(dispnum)`

This will fix the Russia-Turkey-1853 problem.

hold_this %>% filter(ccode1 == 365 & ccode2 == 640 & year == 1853)
#> # A tibble: 1 × 20
#>   dispnum ccode1 ccode2  year dispongoing disponset sidea1 sidea2 fatality1
#>     <dbl>  <dbl>  <dbl> <dbl>       <dbl>     <dbl>  <dbl>  <dbl>     <dbl>
#> 1      57    365    640  1853           1         1      1      0         6
#> # ℹ 11 more variables: fatality2 <dbl>, fatalpre1 <dbl>, fatalpre2 <dbl>,
#> #   hiact1 <dbl>, hiact2 <dbl>, hostlev1 <dbl>, hostlev2 <dbl>, orig1 <dbl>,
#> #   orig2 <dbl>, duplicated <dbl>, fatality <dbl>
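
Here is a small sketch of what the temporary -9 to .5 recode buys. The data are invented, and `dyad_year` is a hypothetical shorthand column standing in for the ccode1, ccode2, and year grouping used above.

```r
library(dplyr)

toy <- tibble(
  dyad_year = c("A", "A", "B", "B", "C", "C"),
  dispnum = c(1, 2, 3, 4, 5, 6),
  # A: none vs. missing; B: missing vs. 1-25 deaths; C: none vs. none
  fatality = c(0, -9, -9, 1, 0, 0)
)

kept <- toy %>%
  # Temporarily rank missing (-9) between 0 and 1
  mutate(fatality = ifelse(fatality == -9, .5, fatality)) %>%
  group_by(dyad_year) %>%
  filter(fatality == max(fatality)) %>%
  ungroup() %>%
  # Restore the original missing-value code
  mutate(fatality = ifelse(fatality == .5, -9, fatality))

kept
```

Without the recode, -9 would sort below 0 and the dispute with unknown fatalities in dyad-year A would lose to the dispute known to have none. With it, dispute 2 is kept in A, dispute 4 is kept in B, and the tie in C is untouched.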

It won’t fix cases where there were multiple disputes initiated in the same year in the dyad, but no one died. There are a lot of these. So, we’ll need more case exclusion rules.

Third: Keep the Highest Dispute-Level Hostility

The next case exclusion rule will want to continue isolating those serious MIDs from MIDs of lesser severity. Consider this case of India and Pakistan in 1963.

hold_this %>% 
  filter(ccode1 == 750 & ccode2 == 770 & year == 1963) %>%
  select(dispnum:year, disponset, fatality1, fatality2, hiact1, hiact2) %>%
  kbl(., caption = "India-Pakistan Dyadic Dispute-Years in 1963",
      booktabs = TRUE, longtable = TRUE)  %>%
  kable_styling(position = "center", full_width = F,
                bootstrap_options = "striped")
#> Warning: 'xfun::attr()' is deprecated.
#> Use 'xfun::attr2()' instead.
#> See help("Deprecated")

India-Pakistan Dyadic Dispute-Years in 1963
dispnum	ccode1	ccode2	year	disponset	fatality1	fatality2	hiact1	hiact2
1317	750	770	1963	1	0	0	0	14
2630	750	770	1963	1	0	0	0	1

These are two unique MID onsets in 1963 and neither was fatal, meaning this duplicate dyad-year is still here. However, MID#2630 was just a threat to use force whereas MID#1317 had an occupation of territory (by Pakistan against India). The former is a threat. The latter is a use of force. MID#1317 has the higher hostility level and that is the MID we’ll want to keep. The same caveat applies as it did with fatalities, so we’ll have to use the dispute-level hostility variable as a plug-in here.


hold_this %>%
  left_join(., cow_mid_disps %>% select(dispnum, hostlev)) %>%
  arrange(ccode1, ccode2, year) %>%
  group_by(ccode1, ccode2, year) %>%
  mutate(duplicated = ifelse(n() > 1, 1, 0)) %>%
  group_by(ccode1, ccode2, year, duplicated) %>%
  # Keep the highest hostlev
  filter(hostlev == max(hostlev)) %>%
  arrange(ccode1, ccode2, year) %>%
  # practice safe group_by()
  ungroup() -> hold_this
#> Joining with `by = join_by(dispnum)`

This will at least fix that India-Pakistan observation in 1963, and others like it.

hold_this %>% 
  filter(ccode1 == 750 & ccode2 == 770 & year == 1963) %>%
  select(dispnum:year, disponset, fatality1, fatality2, hiact1, hiact2)
#> # A tibble: 1 × 9
#>   dispnum ccode1 ccode2  year disponset fatality1 fatality2 hiact1 hiact2
#>     <dbl>  <dbl>  <dbl> <dbl>     <dbl>     <dbl>     <dbl>  <dbl>  <dbl>
#> 1    1317    750    770  1963         1         0         0      0     14
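
The fatality and hostility rules share the same shape: within a dyad-year, keep the rows at the maximum of some severity measure. A minimal sketch of that pattern follows. `keep_max_by_dyad_year()` is a hypothetical helper written for this illustration and is not a peacesciencer function, and the hostlev values in the toy data are my assumption based on the actions described above (4 for a use of force, 2 for a threat to use force).

```r
library(dplyr)

# Hypothetical helper: within each dyad-year, keep only the rows
# that have the largest value of `var`. Ties are all retained.
keep_max_by_dyad_year <- function(data, var) {
  data %>%
    group_by(ccode1, ccode2, year) %>%
    filter({{ var }} == max({{ var }})) %>%
    ungroup()
}

toy <- tibble(
  ccode1 = 750, ccode2 = 770, year = 1963,
  dispnum = c(1317, 2630),
  hostlev = c(4, 2)  # assumed values for illustration
)

res <- keep_max_by_dyad_year(toy, hostlev)
res
```

Because ties are retained rather than broken arbitrarily, each rule only removes rows it can actually distinguish, which is why the rules have to be chained.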

Fourth: Keep the Highest Dispute-Level (Minimum, Then Maximum) Duration

At this point, we still have duplicate dyad-years remaining in these data, but we’ve selected on cases that are fairly similar to each other (at least given the dispute- and participant-level data that are available). The duplicates that remain will be unique onsets with the same fatality levels and hostility levels. The next available measure that approximates dispute severity is duration. Consider this duplicate observation of Colombia-Peru in 1852 and the corresponding MIDs (MID#1506 and MID#1523).

haven::read_dta("~/Koofr/data/cow/mid/5/MIDB 5.0.dta") %>%
  filter(dispnum %in% c(1506, 1523)) %>%
  select(dispnum:sidea, fatality, hiact, hostlev)
#> # A tibble: 7 × 13
#>   dispnum stabb ccode stday stmon styear endday endmon endyear sidea fatality
#>     <dbl> <chr> <dbl> <dbl> <dbl>  <dbl>  <dbl>  <dbl>   <dbl> <dbl>    <dbl>
#> 1    1506 VEN     101    -9    10   1852     -9     11    1852     1        0
#> 2    1506 CHL     155    14     9   1852     -9     11    1852     0        0
#> 3    1506 PER     135    -9     8   1852     -9     11    1852     0        0
#> 4    1506 COL     100    -9     8   1852     -9     11    1852     1        0
#> 5    1523 PER     135    -9     3   1852     18      7    1852     0        0
#> 6    1523 CHL     155     2     6   1852      2      6    1852     1        0
#> 7    1523 COL     100    -9     3   1852     18      7    1852     1        0
#> # ℹ 2 more variables: hiact <dbl>, hostlev <dbl>

These MIDs look fairly similar. They both started the same year. They both have the same level of fatalities (none). They both have the same hostility level (a show of force). It would be tough to read tea leaves to argue that an alert (hiact: 8) is “greater” than a show of force (hiact: 7) even as 8 > 7 (i.e. CoW-MID action codes have never been truly ordinal). Further, they’re both multilateral MIDs. MID#1506 pitted Venezuela and Colombia against Chile and Peru whereas MID#1523 pitted Chile and Colombia against Peru. Both even unhelpfully have some unknown duration to them. There are -9s in start days in both.

However, MID#1523 has the highest minimum duration. It lasted at least 110 days (and as many as 140) whereas MID#1506 has a minimum duration of 63 days (and a maximum duration of 122 days). Under those conditions, we will keep the one with the highest minimum duration and then, where duplicates still remain, keep the one with the highest maximum duration.

hold_this %>%
  left_join(., cow_mid_disps %>% select(dispnum, mindur, maxdur)) %>%
  arrange(ccode1, ccode2, year) %>%
  group_by(ccode1, ccode2, year) %>%
  mutate(duplicated = ifelse(n() > 1, 1, 0)) %>%
  group_by(ccode1, ccode2, year, duplicated) %>%
  # Keep the highest mindur
  filter(mindur == max(mindur)) %>%
  arrange(ccode1, ccode2, year) %>%
  group_by(ccode1, ccode2, year) %>%
  mutate(duplicated = ifelse(n() > 1, 1, 0)) %>%
  group_by(ccode1, ccode2, year, duplicated) %>%
  # Keep the highest maxdur
  filter(maxdur == max(maxdur)) %>%
  # practice safe group_by()
  ungroup() -> hold_this
#> Joining with `by = join_by(dispnum)`

This will fix that Colombia-Peru problem in 1852.

hold_this %>% 
  filter(ccode1 == 135 & ccode2 == 100 & year == 1852) %>%
  select(dispnum:year, disponset, fatality1, fatality2, hiact1, hiact2)
#> # A tibble: 1 × 9
#>   dispnum ccode1 ccode2  year disponset fatality1 fatality2 hiact1 hiact2
#>     <dbl>  <dbl>  <dbl> <dbl>     <dbl>     <dbl>     <dbl>  <dbl>  <dbl>
#> 1    1523    135    100  1852         1         0         0      0      8
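
The minimum and maximum durations quoted earlier in this section can be reproduced from the start and end dates in the participant-level output. What follows is a sketch under two assumptions of mine, neither of which is a documented CoW-MID procedure: a missing day (-9) can fall anywhere in its month, and durations count both the first and last day. `duration_bounds()` is a hypothetical helper, not part of peacesciencer.

```r
# Hypothetical helper: bound a dispute's duration in days when the
# start day and/or end day is missing (-9).
duration_bounds <- function(stday, stmon, styear, endday, endmon, endyear) {
  first_of <- function(y, m) {
    as.Date(sprintf("%04d-%02d-01", as.integer(y), as.integer(m)))
  }
  # Last day of a month: first day of the next month, minus one day
  last_of <- function(y, m) {
    seq(first_of(y, m), by = "month", length.out = 2)[2] - 1
  }
  exact <- function(y, m, d) first_of(y, m) + (d - 1)

  # Earliest and latest possible start dates
  start_lo <- if (stday == -9) first_of(styear, stmon) else exact(styear, stmon, stday)
  start_hi <- if (stday == -9) last_of(styear, stmon) else exact(styear, stmon, stday)
  # Earliest and latest possible end dates
  end_lo <- if (endday == -9) first_of(endyear, endmon) else exact(endyear, endmon, endday)
  end_hi <- if (endday == -9) last_of(endyear, endmon) else exact(endyear, endmon, endday)

  c(mindur = as.integer(end_lo - start_hi) + 1L,
    maxdur = as.integer(end_hi - start_lo) + 1L)
}

# MID#1523: starts on an unknown day in March 1852, ends July 18, 1852
duration_bounds(-9, 3, 1852, 18, 7, 1852)
# MID#1506: starts on an unknown day in August 1852,
# ends on an unknown day in November 1852
duration_bounds(-9, 8, 1852, -9, 11, 1852)
```

The first call returns 110 and 140 and the second returns 63 and 122, matching the figures in the text. The inputs use the earliest start and latest end among each dispute's participants.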

Final Case Exclusions for the CoW-MID Data

We started with 498 duplicate non-directed dyad-years in the dyadic dispute-year data. We’re now down to just six (12 rows in the table below, or 24 rows in the directed data). A glance at these remaining observations suggests the substance here is very similar. For example, MID#4428 and MID#4430 are both one-day border fortifications between Kyrgyzstan and Uzbekistan in 2005. MID#2171 and MID#2172 are both one-day threats to use force between Cyprus and Turkey in 1965.

hold_this %>%
  group_by(ccode1, ccode2, year) %>%
  filter(n() > 1) %>% filter(ccode2 > ccode1) %>%
  select(dispnum:disponset, hiact1:hiact2, fatality:maxdur) %>%
  kbl(., caption = "Duplicate Non-Directed Dyad-Years Still Remaining",
      booktabs = TRUE, longtable = TRUE)  %>%
  kable_styling(position = "center", full_width = F,
                bootstrap_options = "striped")
#> Warning: 'xfun::attr()' is deprecated.
#> Use 'xfun::attr2()' instead.
#> See help("Deprecated")

Duplicate Non-Directed Dyad-Years Still Remaining
dispnum	ccode1	ccode2	year	dispongoing	disponset	hiact1	hiact2	hostlev	mindur	maxdur
2233	2	365	1986	1	1	7	0	3	1	1
3637	2	365	1986	1	1	0	7	3	1	1
2171	352	640	1965	1	1	1	1	2	1	1
2172	352	640	1965	1	1	0	1	2	1	1
4416	365	372	2003	1	1	7	0	3	1	1
4420	365	372	2003	1	1	7	0	3	1	1
2800	541	560	1987	1	1	15	12	4	1	1
2801	541	560	1987	1	1	0	16	4	1	1
4428	703	704	2005	1	1	11	11	3	1	1
4430	703	704	2005	1	1	11	0	3	1	1
4225	731	740	1999	1	1	12	7	3	4	4
4322	731	740	1999	1	1	0	8	3	4	4

The final case exclusion rules will bring us home. First, a few of these duplicate dyad-years feature a case where one dispute was reciprocated and the other was not. For example, MID#4428 was a mutual border fortification while MID#4430 was just one border fortification directed by Kyrgyzstan against Uzbekistan. Thus, we should keep the one that involved at least two codable incidents rather than the MID in which there was just one codable incident.

A reader may object here that reciprocation should feature higher in the proverbial chain, given its prominence in the audience cost literature. I caution that we should not do this. Gibler and Miller (also with Little) have driven home that the reciprocation variable is an information-poor variable. It only minimally tells you that Side B in a MID initiated a militarized incident or was involved in an attack in which there was no clear initiator. In our review of the conflict data, we find that more than half of the attacks or ambushes initiated by Side A are countered when they happen. Further, inferences made from the reciprocation variable are among the most sensitive to the errors we report in the CoW-MID data. For that reason, we discourage researchers from using this variable for their analyses and, for this application, it’s why peacesciencer uses the dispute-level reciprocation variable near the bottom of the ladder in its case exclusions.

Still, here’s how to do that.

hold_this %>%
  left_join(., cow_mid_disps %>% select(dispnum, recip)) %>%
  arrange(ccode1, ccode2, year) %>%
  group_by(ccode1, ccode2, year) %>%
  mutate(duplicated = ifelse(n() > 1, 1, 0)) %>%
  group_by(ccode1, ccode2, year, duplicated) %>%
  # Keep the reciprocated ones, where non-reciprocated ones exist
  filter(recip == max(recip)) %>%
  arrange(ccode1, ccode2, year) %>%
  # practice safe group_by()
  ungroup() -> hold_this
#> Joining with `by = join_by(dispnum)`

We’re down to just three duplicate dyad-years now. The only reason MID#4428 and MID#4430 are both still there is CoW-MID has MID#4428 as unreciprocated at the dispute-level while it also has a militarized incident for Side B in the dispute. This is a CoW-MID issue and not a peacesciencer issue.

hold_this %>%
  group_by(ccode1, ccode2, year) %>%
  filter(n() > 1) %>% filter(ccode2 > ccode1) %>%
  select(dispnum:disponset, hiact1:hiact2, fatality:maxdur) %>%
  kbl(., caption = "Duplicate Non-Directed Dyad-Years Still Remaining",
      booktabs = TRUE, longtable = TRUE)  %>%
  kable_styling(position = "center", full_width = F, 
                bootstrap_options = "striped")
#> Warning: 'xfun::attr()' is deprecated.
#> Use 'xfun::attr2()' instead.
#> See help("Deprecated")

Duplicate Non-Directed Dyad-Years Still Remaining
dispnum	ccode1	ccode2	year	dispongoing	disponset	hiact1	hiact2	hostlev	mindur	maxdur
2233	2	365	1986	1	1	7	0	3	1	1
3637	2	365	1986	1	1	0	7	3	1	1
4416	365	372	2003	1	1	7	0	3	1	1
4420	365	372	2003	1	1	7	0	3	1	1
4428	703	704	2005	1	1	11	11	3	1	1
4430	703	704	2005	1	1	11	0	3	1	1
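
A toy version of the reciprocation rule shows why a pair like MID#4428 and MID#4430 passes through it untouched. In this sketch, both are entered as unreciprocated at the dispute level, as described above. The second dyad-year, including its country codes and its 9001 and 9002 dispute numbers, is invented for contrast.

```r
library(dplyr)

toy <- tibble(
  ccode1 = c(703, 703, 900, 900),
  ccode2 = c(704, 704, 910, 910),
  year = c(2005, 2005, 2000, 2000),
  dispnum = c(4428, 4430, 9001, 9002),
  # Both real MIDs are coded 0; the invented pair differs on recip
  recip = c(0, 0, 1, 0)
)

kept <- toy %>%
  group_by(ccode1, ccode2, year) %>%
  filter(recip == max(recip)) %>%
  ungroup()

kept$dispnum
```

The invented pair is resolved in favor of the reciprocated dispute, but the rule cannot separate two disputes that share the same reciprocation code, so both 2005 rows remain.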

All three pairs are effectively identical MIDs. They start the same year. They have the same fatality level, hostility level, and duration, and both MIDs in each pair are either reciprocated or not reciprocated (that MID#4428/MID#4430 issue notwithstanding). Thus, we will select the one that has the lowest start month.

hold_this %>%
  left_join(., cow_mid_disps %>% select(dispnum, stmon)) %>%
  arrange(ccode1, ccode2, year) %>%
  group_by(ccode1, ccode2, year) %>%
  mutate(duplicated = ifelse(n() > 1, 1, 0)) %>%
  group_by(ccode1, ccode2, year, duplicated) %>%
  # Keep the one with the earliest start month
  filter(stmon == min(stmon)) %>%
  arrange(ccode1, ccode2, year) %>%
  # practice safe group_by()
  ungroup() -> hold_this
#> Joining with `by = join_by(dispnum)`
# And we're done

And this is enough to eliminate duplicate dyad-years.

hold_this %>%
  group_by(ccode1, ccode2, year) %>%
  filter(n() > 1) 
#> # A tibble: 0 × 25
#> # Groups:   ccode1, ccode2, year [0]
#> # ℹ 25 variables: dispnum <dbl>, ccode1 <dbl>, ccode2 <dbl>, year <dbl>,
#> #   dispongoing <dbl>, disponset <dbl>, sidea1 <dbl>, sidea2 <dbl>,
#> #   fatality1 <dbl>, fatality2 <dbl>, fatalpre1 <dbl>, fatalpre2 <dbl>,
#> #   hiact1 <dbl>, hiact2 <dbl>, hostlev1 <dbl>, hostlev2 <dbl>, orig1 <dbl>,
#> #   orig2 <dbl>, duplicated <dbl>, fatality <dbl>, hostlev <dbl>, mindur <dbl>,
#> #   maxdur <dbl>, recip <dbl>, stmon <dbl>
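
The whole sequence can be condensed into a chain of "keep the best within each dyad-year" steps. This is a sketch on invented data, not the code peacesciencer runs. `keep_best()` and `n_duplicate_dyad_years()` are hypothetical helpers, and the toy rows are built so that each rule resolves exactly one duplicate. For a 0/1 disponset, keeping the rows at the group maximum removes the same rows as the standard-deviation check used in the first step. The toy data contain no -9 fatalities, so the temporary recode is omitted here.

```r
library(dplyr)

# Hypothetical helper: within each dyad-year, keep the rows where `var`
# equals its group maximum (or minimum, via `pick`). Ties are retained.
keep_best <- function(data, var, pick = max) {
  data %>%
    group_by(ccode1, ccode2, year) %>%
    filter({{ var }} == pick({{ var }})) %>%
    ungroup()
}

# Hypothetical helper: how many dyad-years have more than one row?
n_duplicate_dyad_years <- function(data) {
  data %>% count(ccode1, ccode2, year) %>% filter(n > 1) %>% nrow()
}

# Invented data: five duplicated dyad-years, two rows each.
toy <- tibble(
  ccode1 = 100, ccode2 = 135,
  year = rep(2001:2005, each = 2),
  dispnum = 1:10,
  disponset = c(0, 1, 1, 1, 1, 1, 1, 1, 1, 1),
  fatality = c(0, 0, 0, 1, 0, 0, 0, 0, 0, 0),
  hostlev = c(3, 3, 3, 3, 2, 4, 3, 3, 3, 3),
  mindur = c(1, 1, 1, 1, 1, 1, 1, 10, 1, 1),
  maxdur = c(1, 1, 1, 1, 1, 1, 10, 10, 1, 1),
  recip = 0,
  stmon = c(1, 1, 1, 1, 1, 1, 1, 1, 3, 7)
)

whittled <- toy %>%
  keep_best(disponset) %>%        # 2001: keep the onset
  keep_best(fatality) %>%         # 2002: keep the fatal dispute
  keep_best(hostlev) %>%          # 2003: keep the higher hostility level
  keep_best(mindur) %>%           # 2004: keep the longer minimum duration
  keep_best(maxdur) %>%
  keep_best(recip) %>%
  keep_best(stmon, pick = min)    # 2005: keep the earlier start month

n_duplicate_dyad_years(toy)
n_duplicate_dyad_years(whittled)
whittled$dispnum
```

The order of the steps is the design decision. Each rule only acts on ties left by the rules before it, so moving a rule up the chain gives it more say over which dispute represents the dyad-year.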