Add minimum distance data to your data frame
Source: R/add_minimum_distance.R
add_minimum_distance.Rd
add_minimum_distance()
adds the minimum
distance (in kilometers) to dyad-year, state-year, leader-year, or
leader-dyad-year data. These estimates span the temporal domain of 1886 to
2019.
Arguments
- data
a data frame with appropriate peacesciencer attributes
- use_extdata
logical, defaults to TRUE. If TRUE, the function uses the augmented version of the minimum distance data made available by way of the
download_extdata()
function. If FALSE, the function uses either cow_mindist
or gw_mindist
in the package.
- slice
concerns data subset behavior when
use_extdata
is TRUE. Can be one of "first" (the default option), "jan1", "june30", "last", or "dec31". See the Details section for more.
- ...
optional, only to make the shortcut (
add_min_dist()
) work
Value
add_minimum_distance()
takes a (dyad-year, leader-year,
leader-dyad-year, state-year) data frame and adds the minimum distance
between the first state and the second state (in dyad-year or leader-dyad-year
data) or, for data that are state-year or leader-year, the smallest
minimum distance recorded for a given state in a given year.
Details
The function leans on attributes of the data that are provided by one of the
"create" functions in this package (e.g. create_dyadyears()
or
create_stateyears()
).
This function will add estimates to leader-level data (like the kind created by
create_leaderyears()
or create_leaderdyadyears()
), but the standard
caveat applies that the minimum distance data merged into these kinds of
data should be understood as approximations.
The function will create an on-the-fly directed version of the non-directed data prior to merging, even if your data are non-directed. It's just easier to do it that way and the concern for computation time is minimal.
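The directed-from-non-directed step can be sketched in base R as follows. This is a minimal illustration, not the package's internal code: the object names (`nondirected`, `mirrored`, `directed`, `my_data`) are hypothetical, and the distances other than the United States-Canada value of 0 are made up.

```r
# Toy non-directed dyad-year distances. Codes follow Correlates of War
# conventions (2 = United States, 20 = Canada, 200 = United Kingdom);
# the non-zero distances are hypothetical placeholders.
nondirected <- data.frame(
  ccode1  = c(2, 2, 20),
  ccode2  = c(20, 200, 200),
  year    = c(1921, 1921, 1921),
  mindist = c(0, 5000, 3000)
)

# Mirror every row by swapping the two state codes.
mirrored <- nondirected
mirrored$ccode1 <- nondirected$ccode2
mirrored$ccode2 <- nondirected$ccode1

# Stacking both halves gives a directed version: each pair now appears
# in both orderings, so a merge works no matter how the user ordered it.
directed <- rbind(nondirected, mirrored)

# Hypothetical user data with one "reversed" dyad (20, 2).
my_data <- data.frame(ccode1 = c(20, 2), ccode2 = c(2, 200), year = c(1921, 1921))
merged <- merge(my_data, directed, by = c("ccode1", "ccode2", "year"), all.x = TRUE)
merged
```

Doubling the rows costs little at this scale, which is the trade-off the paragraph above describes.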
Under the hood, a grouped summarize function that returns a minimum generates the value for state-year or leader-year data. If there is a year in which no minimum distance is recorded whatsoever, that minimum is infinity. The function quietly corrects this, but the summarize call that calculates the minimum still returns a warning about it.
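The infinity-and-warning behavior comes from how R's `min()` treats empty input, and it can be reproduced in base R. The sketch below uses `tapply()` as a stand-in for the grouped summarize and assumes the correction is a recode to `NA`; the source does not say what value the package actually substitutes. The toy data and object names are hypothetical.

```r
# Toy dyadic data for one state: nothing is recorded in 1920,
# two distances are recorded in 1921 (values are hypothetical).
dyadic <- data.frame(
  ccode1  = c(2, 2, 2, 2),
  ccode2  = c(20, 200, 20, 200),
  year    = c(1920, 1920, 1921, 1921),
  mindist = c(NA, NA, 0, 5000)
)

# Grouped minimum by year. For 1920, min() sees no non-missing values,
# so it returns Inf and signals a warning. The handler records that the
# warning happened and then muffles it.
warned <- FALSE
raw <- withCallingHandlers(
  tapply(dyadic$mindist, dyadic$year, function(x) min(x, na.rm = TRUE)),
  warning = function(w) {
    warned <<- TRUE
    invokeRestart("muffleWarning")
  }
)

state_year <- data.frame(year = as.numeric(names(raw)), mindist = as.vector(raw))

# Assumed correction step: recode the infinite minimum as missing.
state_year$mindist[is.infinite(state_year$mindist)] <- NA
state_year
```

Without the handler, the same call would print the warning even though the final data are already corrected, which matches the behavior described above.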
The use_extdata
argument checks for whether you have the "plus" version of
the data in the package's extdata directory. If you don't have it, the
function issues a stop suggesting that you should run download_extdata()
to
get a copy of these data or set use_extdata
to FALSE.
download_extdata()
has additional information about the data sets that
use_extdata
would incorporate into your data. Check for "minimum distance"
in the documentation there, and be mindful of which state system
peacesciencer is treating as your master system.
On the slice
Argument
The slice
argument is applicable only when use_extdata
is TRUE and
determines how the minimum distance data are sliced prior to merging into
your data set. The "plussed up" version of the minimum distance data that you
can retrieve from download_extdata()
and optionally use in this function
has every dyadic minimum distance from 1886 to 2019, by year, on Jan. 1,
June 30, Dec. 31, and at any point in a given year where the dyadic minimum
distance changed for one reason or another. A quick explanation follows.
"first": this is the default option. It returns the earliest observed minimum distance in a given dyad-year. In most cases, this is Jan. 1 of a given year, but it need not be. For example, the earliest minimum distance in the Correlates of War version of the data for the United States and Canada in 1920 is on Jan. 10, 1920.
"jan1": entering this as the value in the slice
argument returns the
minimum distance observed on Jan. 1 of the referent year. Using the above
case of Canada and the United States in 1920, this observation would be
missing for the year because the dyad did not exist on Jan. 1, 1920 in the
Correlates of War system. This is incidentally the only option available to you
if use_extdata
is set to FALSE. cow_mindist and gw_mindist are
benchmarked to Jan. 1 of a given year.
"june30": this is the recorded minimum distance, if one exists, for a dyad on June 30 of a given year. This is a basic midway point of a calendar year. Selecting this means there would be no minimum distance inserted for Germany and Austria in 1938 in the Correlates of War system, because Austria temporarily exits the system on March 13, 1938.
"dec31": this is the recorded minimum distance, if one exists, for a dyad on Dec. 31 of a given year. Selecting this means there would be no minimum distance between the Republic of Vietnam and China in 1975 in the Correlates of War system. The Republic of Vietnam was eliminated from the international system on April 30 of that year.
"last": this will return the last observed minimum distance in a given dyad-year. In most cases, this is Dec. 31 of a given year. However, it need not be. In the above cases concerning some manner of system exit, the last observed minimum distance would be used.
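The five options can be compared side by side on a toy set of dated records. This is a sketch of the slicing logic as described above, not the package's implementation: `slice_mindist()` is a hypothetical helper, and the dyad, dates, and distances are invented so that each option gives a different answer.

```r
# Hypothetical dated records for one dyad that enters the system on
# Jan. 10, 1920, changes distance on Sept. 1, 1920, and has its last
# record on March 13, 1921.
records <- data.frame(
  date    = as.Date(c("1920-01-10", "1920-06-30", "1920-09-01", "1920-12-31",
                      "1921-01-01", "1921-03-13")),
  mindist = c(12, 12, 8, 8, 8, 8)
)
records$year <- as.numeric(format(records$date, "%Y"))

# Hypothetical helper: return one minimum distance per year for a slice.
slice_mindist <- function(records, slice = c("first", "jan1", "june30", "dec31", "last")) {
  slice <- match.arg(slice)
  records <- records[order(records$date), ]
  years <- sort(unique(records$year))
  pick <- function(y) {
    rows <- records[records$year == y, ]
    if (slice == "first") return(rows$mindist[1])
    if (slice == "last") return(rows$mindist[nrow(rows)])
    # Fixed-date slices match on month and day; no match means missing.
    md <- c(jan1 = "01-01", june30 = "06-30", dec31 = "12-31")[[slice]]
    hit <- rows$mindist[format(rows$date, "%m-%d") == md]
    if (length(hit) == 0) NA_real_ else hit[1]
  }
  data.frame(year = years, mindist = vapply(years, pick, numeric(1)))
}

slice_mindist(records, "first")   # 1920 uses the Jan. 10 record
slice_mindist(records, "jan1")    # 1920 is missing: no Jan. 1 record
slice_mindist(records, "june30")  # 1921 is missing: last record is March 13
slice_mindist(records, "dec31")   # 1921 is missing for the same reason
slice_mindist(records, "last")    # 1921 uses the March 13 record
```

The fixed-date slices trade coverage for comparability across dyads, while "first" and "last" always return a value whenever any record exists in the year.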
References
Schvitz, Guy, Luc Girardin, Seraina Ruegger, Nils B. Weidmann, Lars-Erik
Cederman, and Kristian Skrede Gleditsch. 2022. "Mapping the International
System, 1886-2019: The CShapes
2.0 Dataset." Journal of Conflict
Resolution. 66(1): 144-161.
Weidmann, Nils B. and Kristian Skrede Gleditsch. 2010. "Mapping and Measuring
Country Shapes: The cshapes
Package." The R Journal 2(1): 18-24.
Examples
# \donttest{
# just call `library(tidyverse)` at the top of your script
library(magrittr)
cow_ddy %>% add_minimum_distance(use_extdata = FALSE)
#> # A tibble: 2,214,930 × 4
#> ccode1 ccode2 year mindist
#> <dbl> <dbl> <dbl> <dbl>
#> 1 2 20 1920 NA
#> 2 2 20 1921 0
#> 3 2 20 1922 0
#> 4 2 20 1923 0
#> 5 2 20 1924 0
#> 6 2 20 1925 0
#> 7 2 20 1926 0
#> 8 2 20 1927 0
#> 9 2 20 1928 0
#> 10 2 20 1929 0
#> # ℹ 2,214,920 more rows
# }