library(tidyverse)
#> ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
#> ✓ ggplot2 3.3.5 ✓ purrr 0.3.4
#> ✓ tibble 3.1.4 ✓ dplyr 1.0.7
#> ✓ tidyr 1.1.4 ✓ stringr 1.4.0
#> ✓ readr 2.0.2 ✓ forcats 0.5.1
#> ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
#> x dplyr::filter() masks stats::filter()
#> x dplyr::lag() masks stats::lag()
library(peacesciencer)
#> {peacesciencer} includes additional remote data for separate download. Please type ?download_extdata() for more information.
#> This message disappears on load when these data are downloaded and in the package's `extdata` directory.
library(kableExtra)
#>
#> Attaching package: 'kableExtra'
#> The following object is masked from 'package:dplyr':
#>
#> group_rows
create_bench <- readRDS("~/Dropbox/projects/peacesciencer/data-raw/times/create_bench.rds")
state_bench <- readRDS("~/Dropbox/projects/peacesciencer/data-raw/times/state_bench.rds")
dyad_bench <- readRDS("~/Dropbox/projects/peacesciencer/data-raw/times/dyad_bench.rds")
I had some time while watching the UEFA Euro 2020 action to evaluate the expected run times of functions in peacesciencer. The TV is in the living room, which meant I could have my laptop open while watching. My laptop is the more appropriate computer for this exercise because my desktop is comically overpowered and may give me unrealistic expectations about how other users might experience peacesciencer.1
A subdirectory on the project’s Github shows what I did here. I grouped the functions in peacesciencer into two types, with the second type having two subcomponents. The first type creates base data: create_statedays(), create_stateyears(), and create_dyadyears(). The second type broadly changes data, either adding to it or subtracting from it. For convenience’s sake, it’s good to think of this family of peacesciencer functions as applicable to either state-year or dyad-year data. With that in mind, I used the {microbenchmark} package to run each of the relevant functions 100 times, across those two types (and three overall groups), to see how long these functions can take for a user with a computer similar to mine. {microbenchmark} records times in nanoseconds (I report them here in seconds), and the benchmarking happened while I was also fiddling with other things, perusing the internet, and watching UEFA Euro 2020. Thus, what I offer here is illustrative, but still useful.
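The setup described above can be sketched as follows. This is a minimal illustration, not the exact benchmarking script (which lives in the project’s Github subdirectory); it assumes only the default arguments of the create_* functions.

```r
library(microbenchmark)
library(peacesciencer)

# Run each base data-creating function 100 times.
# microbenchmark() records timings in nanoseconds.
create_bench <- microbenchmark(
  `create_statedays()`  = create_statedays(),
  `create_stateyears()` = create_stateyears(),
  `create_dyadyears()`  = create_dyadyears(),
  times = 100
)

# Summarize the timings in seconds for readability.
summary(create_bench, unit = "s")
```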
A user can see the associated R Markdown file for this vignette for the processing/formatting code, so I want to focus on just the substance. Here is a summary table of the run times (in seconds) for the create_* functions from this exercise. Note that these functions were executed with their default options, so these are all Correlates of War state system data from 1816 to the most recently concluded calendar year.
Function | Average | Median | 95% Interval | Minimum | Maximum |
---|---|---|---|---|---|
create_statedays() | 0.963 | 0.948 | [0.43, 1.791] | 0.370 | 2.260 |
create_stateyears() | 0.039 | 0.035 | [0.027, 0.066] | 0.026 | 0.199 |
create_dyadyears() | 4.757 | 4.649 | [3.565, 6.46] | 3.437 | 7.167 |
The simulations show that creating dyad-years is by far the most time-intensive data-creating function in peacesciencer. This is not terribly surprising. The code for create_dyadyears() transforms the raw Correlates of War (or Gleditsch-Ward) state system data into all possible dyads, and the nature of this transformation is invariably going to take more time than state-year or even state-day summaries of the data. That said, about 4-5 seconds for creating these data is pretty damn good, all things considered.
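For scale, the call being benchmarked is just the default invocation. A quick sketch (the exact row count depends on the most recently concluded calendar year):

```r
library(peacesciencer)

# Transform the CoW state system data into dyad-year data.
# This is the roughly 4-5 second step benchmarked above.
ddy <- create_dyadyears()
dim(ddy)
```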
These are the run times (in seconds) for the functions that add to state-year data, ranked from most time-consuming to least time-consuming.
Function | Average | Median | 95% Interval | Minimum | Maximum |
---|---|---|---|---|---|
add_archigos() | 6.291 | 6.185 | [5.25, 7.726] | 4.948 | 8.195 |
add_creg_fractionalization() [G-W] | 3.389 | 3.313 | [2.93, 4.052] | 2.817 | 4.229 |
add_creg_fractionalization() [CoW] | 3.350 | 3.284 | [2.919, 4.137] | 2.845 | 4.561 |
add_capital_distance() | 2.310 | 2.256 | [1.975, 2.86] | 1.936 | 3.234 |
add_strategic_rivalries() | 2.254 | 2.199 | [2.01, 2.717] | 1.940 | 3.123 |
add_cow_wars(type="intra") | 0.615 | 0.581 | [0.514, 0.917] | 0.508 | 0.972 |
add_contiguity() | 0.391 | 0.373 | [0.334, 0.614] | 0.318 | 0.798 |
add_gwcode_to_cow() | 0.379 | 0.353 | [0.309, 0.633] | 0.303 | 0.691 |
add_ucdp_acd() | 0.153 | 0.145 | [0.129, 0.205] | 0.128 | 0.407 |
add_minimum_distance() [G-W] | 0.151 | 0.142 | [0.119, 0.214] | 0.113 | 0.572 |
add_minimum_distance() [CoW] | 0.142 | 0.136 | [0.117, 0.187] | 0.108 | 0.433 |
add_peace_years() [CoW Intra-State Wars] | 0.119 | 0.114 | [0.098, 0.147] | 0.097 | 0.398 |
add_peace_years() [(G-W) UCDP ACD] | 0.119 | 0.114 | [0.096, 0.181] | 0.092 | 0.250 |
add_cow_majors() | 0.027 | 0.025 | [0.021, 0.043] | 0.020 | 0.045 |
add_ccode_to_gw() | 0.010 | 0.009 | [0.008, 0.015] | 0.008 | 0.018 |
add_democracy() [G-W] | 0.010 | 0.007 | [0.006, 0.012] | 0.005 | 0.302 |
add_sdp_gdp() [G-W] | 0.009 | 0.008 | [0.006, 0.014] | 0.006 | 0.026 |
add_sdp_gdp() [CoW] | 0.008 | 0.007 | [0.006, 0.013] | 0.006 | 0.015 |
add_cow_trade() | 0.007 | 0.006 | [0.005, 0.011] | 0.005 | 0.012 |
add_democracy() [CoW] | 0.007 | 0.006 | [0.005, 0.012] | 0.005 | 0.019 |
add_igos() | 0.007 | 0.006 | [0.005, 0.01] | 0.005 | 0.020 |
add_nmc() | 0.007 | 0.007 | [0.005, 0.015] | 0.005 | 0.021 |
add_rugged_terrain() [G-W] | 0.007 | 0.006 | [0.005, 0.013] | 0.005 | 0.022 |
add_rugged_terrain() [CoW] | 0.006 | 0.006 | [0.005, 0.01] | 0.005 | 0.014 |
There are five functions for which the average execution time is over a second. Knowing what I know about how I wrote these functions, a few of them make some sense. add_archigos() takes the most time by far, an average of over six seconds, largely because it needs to rowwise-transform a subset of the raw data to extend dates into leader-days, then calculate the relevant variables as a group-by mutate, before doing the most time-consuming operation I sometimes bury into these functions: a group-by slice for eliminating duplicates. add_creg_fractionalization() has this same group-by slice largely because its state codes are not quite Correlates of War and not quite Gleditsch-Ward, and a group-by slice is one of my go-tos for eliminating grouped duplicates. add_capital_distance() is a bit time-consuming because it does on-the-fly “as the crow flies” distance estimates between state capitals using the provided latitude/longitude coordinates. add_strategic_rivalries() doesn’t have any of these, but it has a lot of buried if-elses for how a user may want to calculate the presence of a rivalry type at the state-year level.
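For reference, the slower state-year functions are typically chained together in a pipeline like the one below. This is only a sketch using default arguments; the per-function run times are what the table above reports.

```r
library(peacesciencer)
library(dplyr)

# Build state-year data, then add the leader, fractionalization,
# and capital-distance variables discussed above. add_archigos()
# dominates the run time of this chain.
state_years <- create_stateyears() %>%
  add_archigos() %>%
  add_creg_fractionalization() %>%
  add_capital_distance()
```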
The dyad-year run times (again in seconds) are a little more interesting and merit some further explanation. Most of these functions are a little time-consuming for the reasons mentioned above (e.g. the group-by slices in the add_creg_fractionalization() and add_archigos() cases). The peace-year calculations are also a little time-consuming, but ultimately have a straightforward explanation.
Function | Average | Median | 95% Interval | Minimum | Maximum |
---|---|---|---|---|---|
add_peace_years() [CoW-MID] | 12.913 | 12.566 | [11.574, 15.679] | 11.297 | 15.982 |
add_peace_years() [GML MID] | 12.319 | 11.967 | [11.057, 14.449] | 10.914 | 16.352 |
add_archigos() | 5.994 | 5.837 | [5.068, 7.524] | 4.904 | 7.815 |
add_creg_fractionalization() [CoW] | 3.588 | 3.430 | [3.143, 4.454] | 3.068 | 4.911 |
add_creg_fractionalization() [G-W] | 3.548 | 3.433 | [3.143, 4.447] | 3.043 | 4.582 |
filter_prd() [+ add_contiguity() + add_cow_majors()] | 2.868 | 2.840 | [2.471, 3.616] | 2.352 | 3.746 |
add_contiguity() | 2.071 | 1.974 | [1.778, 2.671] | 1.762 | 2.853 |
add_capital_distance() | 1.881 | 1.851 | [1.623, 2.322] | 1.549 | 2.484 |
add_cow_wars(type="inter") | 1.395 | 1.353 | [1.19, 1.79] | 1.188 | 2.119 |
add_cow_trade() | 1.025 | 0.954 | [0.854, 1.36] | 0.851 | 1.564 |
add_igos() | 1.002 | 0.961 | [0.874, 1.318] | 0.871 | 1.561 |
add_minimum_distance() [CoW] | 0.956 | 0.902 | [0.787, 1.457] | 0.772 | 1.620 |
add_minimum_distance() [G-W] | 0.929 | 0.894 | [0.795, 1.288] | 0.792 | 1.406 |
add_gwcode_to_cow() | 0.773 | 0.740 | [0.671, 1.03] | 0.648 | 1.135 |
add_atop_alliance() | 0.634 | 0.571 | [0.492, 0.933] | 0.488 | 0.985 |
add_cow_majors() | 0.623 | 0.583 | [0.514, 0.873] | 0.503 | 1.156 |
add_nmc() | 0.572 | 0.528 | [0.458, 0.83] | 0.451 | 0.961 |
add_sdp_gdp() [G-W] | 0.521 | 0.478 | [0.415, 0.853] | 0.414 | 0.896 |
add_cow_alliance() | 0.515 | 0.462 | [0.416, 0.787] | 0.412 | 0.864 |
add_sdp_gdp() [CoW] | 0.497 | 0.449 | [0.412, 0.819] | 0.402 | 0.997 |
add_democracy() [G-W] | 0.444 | 0.408 | [0.369, 0.737] | 0.358 | 0.874 |
add_gml_mids(keep=NULL) | 0.444 | 0.399 | [0.345, 0.755] | 0.336 | 0.908 |
add_democracy() [CoW] | 0.443 | 0.411 | [0.371, 0.638] | 0.361 | 0.805 |
add_cow_mids(keep=NULL) | 0.440 | 0.408 | [0.359, 0.713] | 0.346 | 0.863 |
add_strategic_rivalries() | 0.418 | 0.382 | [0.345, 0.739] | 0.340 | 0.915 |
add_ccode_to_gw() | 0.414 | 0.383 | [0.348, 0.653] | 0.342 | 0.762 |
add_rugged_terrain() [CoW] | 0.402 | 0.360 | [0.311, 0.736] | 0.308 | 0.840 |
add_rugged_terrain() [G-W] | 0.384 | 0.347 | [0.306, 0.718] | 0.301 | 0.734 |
Basically, add_peace_years() works generally with a variety of data types you feed it. It’s also implicitly a grouped function. For state-year data, that means you have about 217 “groups” (i.e. states) in the Correlates of War cases. If you want, as I do here, the full damn universe of Correlates of War dyads from 1816 to 2020, that means you’ll have 41,252 dyads for which add_peace_years() will calculate your peace spells. So yeah, that’s going to take some time. You can cut that roughly in half if you filter the data to just politically relevant dyads before calculating peace years.
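The politically-relevant-dyads shortcut looks like this in practice. Per the filter_prd() entry in the table above, the filter needs contiguity and major-power information first; this is a sketch with default arguments.

```r
library(peacesciencer)
library(dplyr)

# Restrict the full dyad-year universe to politically relevant dyads
# before the expensive grouped peace-years calculation.
prd_years <- create_dyadyears() %>%
  add_contiguity() %>%
  add_cow_majors() %>%
  filter_prd() %>%
  add_gml_mids(keep = NULL) %>%
  add_peace_years()
```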
Ultimately, the examples on the README show that you can do most things in peacesciencer in a matter of seconds. Unless you’re stress-testing the package’s ability to do everything on the full universe of dyad-year data, you can create the kind of data you want in well under a minute. Some functions take longer than others, mostly because of some hacks I built into these functions on the premise that I know they’ll work as I intend them to work (even if a more optimal alternative is possible).
My laptop is pretty good as far as performance laptops go. At the least, it has 16 GB of RAM. That is on the high end as far as most consumer laptops go, but dedicated professionals may have a laptop similar to what I have.↩︎