Show Duplicate Observations in Your Dyad-Year or State-Year Data Frame

show_duplicates() shows which observations are duplicated in data generated in peacesciencer. It is a useful diagnostic tool for users writing their own do-it-yourself functions with peacesciencer.

Usage

show_duplicates(data)

Arguments

data: a dyad-year data frame or a state-year data frame created in peacesciencer.

Value

show_duplicates() takes a dyad-year or state-year data frame generated in peacesciencer and returns the observations that are duplicated on the unique combination of dyad and year (or state and year), depending on which type of data frame was supplied.

Details

The function relies on attributes of the data that are set by the create_dyadyear() or create_stateyear() function. Make sure that function (or data created by that function) appears at the top of the proverbial pipe.

The returned data will also include a new column called duplicated. The function therefore assumes that the user does not already have a column with this name that matters to them; any such column will be overwritten.
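The flagging idea can be sketched in base R. This is a hypothetical illustration, not the package's actual implementation: the helper flag_dyadyear_duplicates() and the toy data are invented for this example, and the key columns (ccode1, ccode2, year) are assumed from the example output in the Examples section. It shows how every member of a duplicated dyad-year group can be flagged, and how writing a column named duplicated silently replaces any existing column of that name.

```r
# Hypothetical sketch: NOT the actual show_duplicates() source code.
# Assumes dyad-year data keyed on ccode1, ccode2, and year.
flag_dyadyear_duplicates <- function(data, keys = c("ccode1", "ccode2", "year")) {
  key_df <- data[, keys, drop = FALSE]
  # duplicated() flags the second and later copies of a key combination;
  # fromLast = TRUE flags the earlier copies, so OR-ing the two marks
  # every row that shares its key combination with another row
  is_dup <- duplicated(key_df) | duplicated(key_df, fromLast = TRUE)
  # any pre-existing column called "duplicated" is overwritten here
  data$duplicated <- as.numeric(is_dup)
  # keep only the rows involved in duplication
  data[is_dup, , drop = FALSE]
}

# Toy data (invented): two disputes share the same dyad-year
toy <- data.frame(
  dispnum = c(2981, 3058, 1000),
  ccode1  = c(2, 2, 2),
  ccode2  = c(40, 40, 20),
  year    = c(1983, 1983, 1990)
)

flag_dyadyear_duplicates(toy)
```

For state-year data the same idea would apply with keys = c("ccode", "year"), assuming the state code column is named ccode.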

Author

Steven V. Miller

Examples


library(peacesciencer)
# magrittr provides %>%; or just call `library(tidyverse)` at the top of your script
library(magrittr)

gml_dirdisp %>% show_duplicates()
#> # A tibble: 1,838 × 40
#>    dispnum ccode1 ccode2  year midongoing midonset sidea1 sidea2 revstate1
#>      <dbl>  <dbl>  <dbl> <dbl>      <dbl>    <dbl>  <dbl>  <dbl>     <dbl>
#>  1    2981      2     40  1983          1        1      1      0         1
#>  2    3058      2     40  1983          1        1      1      0         1
#>  3    1554      2     70  1836          1        1      0      1         0
#>  4    1555      2     70  1836          1        1      1      0         0
#>  5    1556      2     70  1836          1        0      1      0         0
#>  6    1548      2     70  1860          1        0      1      0         0
#>  7    1549      2     70  1860          1        1      1      0         1
#>  8    2347      2     93  1982          1        0      0      1         1
#>  9    2977      2     93  1982          1        1      1      0         1
#> 10    2741      2     95  1988          1        0      1      0         1
#> # ℹ 1,828 more rows
#> # ℹ 31 more variables: revstate2 <dbl>, revtype11 <dbl>, revtype12 <dbl>,
#> #   revtype21 <dbl>, revtype22 <dbl>, fatality1 <dbl>, fatality2 <dbl>,
#> #   fatalpre1 <dbl>, fatalpre2 <dbl>, hiact1 <dbl>, hiact2 <dbl>,
#> #   hostlev1 <dbl>, hostlev2 <dbl>, orig1 <dbl>, orig2 <dbl>, hiact <dbl>,
#> #   hostlev <dbl>, mindur <dbl>, maxdur <dbl>, outcome <dbl>, settle <dbl>,
#> #   fatality <dbl>, fatalpre <dbl>, stmon <dbl>, endmon <dbl>, recip <dbl>, …
cow_mid_dirdisps %>% show_duplicates()
#> # A tibble: 2,152 × 19
#>    dispnum ccode1 ccode2  year dispongoing disponset sidea1 sidea2 fatality1
#>      <dbl>  <dbl>  <dbl> <dbl>       <dbl>     <dbl>  <dbl>  <dbl>     <dbl>
#>  1    2981      2     40  1983           1         1      1      0         0
#>  2    3058      2     40  1983           1         1      1      0         1
#>  3      69      2     42  1916           1         0      1      0         0
#>  4     322      2     42  1916           1         1      1      0        -9
#>  5    1554      2     70  1836           1         1      0      1         0
#>  6    1555      2     70  1836           1         1      1      0         0
#>  7    1548      2     70  1860           1         0      1      0         0
#>  8    1549      2     70  1860           1         1      1      0        -9
#>  9       2      2    200  1902           1         1      1      0         0
#> 10     254      2    200  1902           1         1      0      1         0
#> # ℹ 2,142 more rows
#> # ℹ 10 more variables: fatality2 <dbl>, fatalpre1 <dbl>, fatalpre2 <dbl>,
#> #   hiact1 <dbl>, hiact2 <dbl>, hostlev1 <dbl>, hostlev2 <dbl>, orig1 <dbl>,
#> #   orig2 <dbl>, duplicated <dbl>