Whittle Duplicate Conflict-Years by Conflict Duration
Source: R/whittle_conflicts_duration.R
whittle_conflicts_duration() is in a class of
do-it-yourself functions for coercing (i.e. "whittling") conflict-year
data with cross-sectional units to unique conflict-year data by
cross-sectional unit. The inspiration here is clearly the problem
of whittling dyadic dispute-year data into true dyad-year data (like in
the Gibler-Miller-Little conflict data). This particular function will
keep the observations with the highest estimated duration.
Arguments
- data
a data frame with a declared conflict attribute type.
- durtype
a duration on which to filter/whittle the data. Options include
"mindur" or "maxdur". The default is "mindur".
- ...
optional, only to make the shortcut work
Value
whittle_conflicts_duration() takes a dyad-year data frame
or leader-dyad-year data frame with a declared conflict attribute type
and, grouping by the dyad and year, returns just those observations that
have the highest estimated dispute-level duration (minimum or maximum,
depending on durtype). This will not eliminate
all duplicates, far from it, but it's a sensible cut later into the
procedure (after whittling onsets in whittle_conflicts_onsets(),
and maybe some other things) to the extent that dispute-level duration
is a heuristic for dispute-level severity/importance.
Details
Dyads are capable of having multiple disputes in a given year,
which can create a problem for merging into a complete dyad-year
data frame. Consider the case of France and Italy in 1860, which
had three separate dispute onsets that year (MID#0112, MID#0113, MID#0306),
as illustrative of the problem. The default process in peacesciencer
employs several rules to whittle down these duplicate dyad-years for
merging into a dyad-year data frame. These are available in
add_cow_mids() and add_gml_mids().
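To make the duplicate dyad-year problem concrete, here is a minimal sketch that counts disputes per dyad-year. The toy data frame and its values are invented for illustration; only the column names (dispnum, ccode1, ccode2, year) follow the example output on this page. It assumes the dplyr and tibble packages are installed.

```r
library(dplyr)

# Toy directed dispute-year data. All values are invented for illustration;
# they are not drawn from the GML or CoW-MID data.
toy <- tibble::tibble(
  dispnum = c(1, 2, 3, 4),
  ccode1  = c(100, 100, 100, 100),
  ccode2  = c(200, 200, 200, 300),
  year    = c(1900, 1900, 1900, 1900)
)

# Count disputes per dyad-year and keep only the dyad-years with duplicates.
dupes <- toy %>%
  count(ccode1, ccode2, year, name = "n_disputes") %>%
  filter(n_disputes > 1)

dupes
```

Any dyad-year that shows up in a check like this would appear more than once after a naive merge into a dyad-year data frame, which is what the whittle functions are meant to address.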
Some conflicts can be of an unknown length and often come with estimates
of a minimum duration and a maximum duration. This will concern the
durtype parameter in this function. In many/most conflicts,
certainly thinking of the inter-state dispute data, dates are known with
precision (to the day) and the estimate of minimum conflict duration is
equal to the estimate of maximum conflict duration. For some conflicts,
the estimates will vary. This does importantly imply that using this
particular whittle function with the default (mindur) will produce
different results than using this particular whittle function and asking
to retain the highest maximum duration (maxdur). Use the function
with that in mind.
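The difference between the two durtype options can be sketched on toy data. The helper below, whittle_by_duration(), is hypothetical and is not the package's implementation; it only assumes the logic described on this page (group by dyad and year, keep the rows with the highest value of the chosen duration column). All data values are invented, and the sketch assumes dplyr and tibble are installed.

```r
library(dplyr)

# Toy dyad-year data. All values are invented for illustration.
toy <- tibble::tibble(
  dispnum = c(1, 2, 3, 4),
  ccode1  = c(100, 100, 100, 100),
  ccode2  = c(200, 200, 200, 200),
  year    = c(1900, 1900, 1900, 1901),
  mindur  = c(10, 30, 30, 5),
  maxdur  = c(10, 30, 45, 5)
)

# Hypothetical helper (an assumption, not peacesciencer code): within each
# dyad-year, keep the rows with the highest value of the chosen duration column.
whittle_by_duration <- function(data, durtype = "mindur") {
  stopifnot(durtype %in% c("mindur", "maxdur"))
  data %>%
    group_by(ccode1, ccode2, year) %>%
    filter(.data[[durtype]] == max(.data[[durtype]])) %>%
    ungroup()
}

# Disputes 2 and 3 tie on mindur in 1900, so a duplicate dyad-year remains.
by_min <- whittle_by_duration(toy, "mindur")

# Dispute 3 has the highest maxdur in 1900, so only it survives that year.
by_max <- whittle_by_duration(toy, "maxdur")
```

In this toy case the default leaves a tie in place while maxdur breaks it, which illustrates both why the two options can differ and why whittling on duration alone does not guarantee unique dyad-years.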
wc_duration() is a simple, less wordy shortcut for the same function.
References
Miller, Steven V. 2021. "How peacesciencer Coerces Dispute-Year Data into Dyad-Year Data". URL: http://svmiller.com/peacesciencer/articles/coerce-dispute-year-dyad-year.html
Examples
# \donttest{
# just call `library(tidyverse)` at the top of your script
library(magrittr)
gml_dirdisp %>% whittle_conflicts_onsets() %>% whittle_conflicts_duration()
#> # A tibble: 9,308 × 39
#> dispnum ccode1 ccode2 year midongoing midonset sidea1 sidea2 revstate1
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 2968 2 20 1979 1 1 0 1 0
#> 2 3900 2 20 1989 1 1 0 1 0
#> 3 3972 2 20 1991 1 1 1 0 1
#> 4 4183 2 20 1997 1 1 0 1 0
#> 5 1665 2 40 1921 1 1 1 0 1
#> 6 1677 2 40 1933 1 1 1 0 1
#> 7 1677 2 40 1934 1 0 1 0 1
#> 8 246 2 40 1960 1 1 1 0 1
#> 9 246 2 40 1961 1 0 1 0 1
#> 10 61 2 40 1962 1 1 1 0 1
#> # ℹ 9,298 more rows
#> # ℹ 30 more variables: revstate2 <dbl>, revtype11 <dbl>, revtype12 <dbl>,
#> # revtype21 <dbl>, revtype22 <dbl>, fatality1 <dbl>, fatality2 <dbl>,
#> # fatalpre1 <dbl>, fatalpre2 <dbl>, hiact1 <dbl>, hiact2 <dbl>,
#> # hostlev1 <dbl>, hostlev2 <dbl>, orig1 <dbl>, orig2 <dbl>, hiact <dbl>,
#> # hostlev <dbl>, mindur <dbl>, maxdur <dbl>, outcome <dbl>, settle <dbl>,
#> # fatality <dbl>, fatalpre <dbl>, stmon <dbl>, endmon <dbl>, recip <dbl>, …
cow_mid_dirdisps %>% whittle_conflicts_onsets() %>% whittle_conflicts_duration()
#> Joining with `by = join_by(dispnum)`
#> # A tibble: 10,268 × 20
#> dispnum ccode1 ccode2 year dispongoing disponset sidea1 sidea2 fatality1
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 2968 2 20 1979 1 1 0 1 0
#> 2 3900 2 20 1989 1 1 0 1 0
#> 3 3972 2 20 1991 1 1 1 0 0
#> 4 4183 2 20 1997 1 1 0 1 0
#> 5 1665 2 40 1921 1 1 1 0 0
#> 6 1677 2 40 1933 1 1 1 0 0
#> 7 1677 2 40 1934 1 0 1 0 0
#> 8 246 2 40 1960 1 1 0 1 0
#> 9 246 2 40 1961 1 0 0 1 0
#> 10 61 2 40 1962 1 1 1 0 0
#> # ℹ 10,258 more rows
#> # ℹ 11 more variables: fatality2 <dbl>, fatalpre1 <dbl>, fatalpre2 <dbl>,
#> # hiact1 <dbl>, hiact2 <dbl>, hostlev1 <dbl>, hostlev2 <dbl>, orig1 <dbl>,
#> # orig2 <dbl>, mindur <dbl>, maxdur <dbl>
# }