Calculate Signorino and Ritter's (1999) S for Similarity
srs() takes two vectors and returns Signorino and Ritter's S statistic, communicating broadly understood "similarity" of interests or ratings.
Arguments
- x1
a vector, assumed to be integer-valued
- x2
a vector, assumed to be integer-valued
- distances
the type of distances between ratings/attachments to estimate. Can be either "absolute" or "squared". Defaults to "absolute", but see the note in the Details section.
- weights
a vector of weights. Defaults to NULL, which produces an unweighted S statistic.
- range
an optional vector that forces the range to be a certain value. Defaults to NULL, in which case the function calculates the range from the maximum and minimum values observed across both x1 and x2. See the Details section for more.
Value
srs() takes two vectors and returns Signorino and Ritter's S statistic, communicating broadly understood "similarity" of interests or ratings.
Details
Be advised that Signorino and Ritter's (1999) treatment of the S statistic used absolute distances, whereas squared distances are more commonly used in the world of distance and association metrics.
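To make the two distance options concrete, here is a minimal sketch in R. It is not the package's source: s_sketch() is a hypothetical helper, and it assumes the usual unweighted form of S, namely 1 minus twice the summed distance divided by its maximum possible value, with the squared variant dividing by the squared range. The ratings are made up.

```r
# Hypothetical helper, not the package's implementation of srs().
# Assumes unweighted S = 1 - 2 * sum(d) / (n * d_max), where d is either
# |x1 - x2| or (x1 - x2)^2 and d_max is the range (or the range squared).
s_sketch <- function(x1, x2, distances = c("absolute", "squared")) {
  distances <- match.arg(distances)
  stopifnot(length(x1) == length(x2))
  rng <- max(c(x1, x2)) - min(c(x1, x2))  # observed range across both vectors
  d <- abs(x1 - x2)
  if (distances == "squared") {
    d <- d^2
    rng <- rng^2
  }
  1 - 2 * sum(d) / (length(d) * rng)
}

# Illustrative alliance-style ratings on a 0-3 scale (made-up data).
a <- c(3, 0, 1, 2, 0)
b <- c(3, 1, 0, 2, 3)

s_abs <- s_sketch(a, b, "absolute")
s_sq  <- s_sketch(a, b, "squared")
```

Under these assumptions the two options generally give different scores for the same pair of vectors, because squaring changes how much a large disagreement counts relative to a small one.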
There may be instances in which the full conceivable range of
ratings/attachments is not observed in your two vectors. In the case of
applications to alliance data, this is almost an impossibility. Every state,
by assumption, is maximally committed to defending itself. There will
assuredly be cases in which there is no commitment to another state in the
data (either for reasons of disinterest or enmity, though the former calls
into question what a 0 should communicate and the latter betrays the
interesting complexity of alliances). Thus, the minimum and maximum, one
assumes, will always be observed in the alliance data. Perhaps the same could
be said for UN voting data, though I couldn't rule out the possibility that
there is a dyad out there for which both states never voted "yes" or "no".
That would have implications for the range in the denominator of the formula.
You can override that by hard-setting the range in the range argument.
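A small sketch of why the override matters, assuming the unweighted absolute-distance form of S. s_range_sketch() and its forced_range parameter are hypothetical stand-ins, not the package's code, and the sketch treats the forced range as a single number (maximum minus minimum), which may differ from how the range argument is actually specified. The vote codes are made up.

```r
# Hypothetical helper, not the package's implementation of srs().
# forced_range = NULL means "use the observed range"; a number forces the
# denominator to that value instead.
s_range_sketch <- function(x1, x2, forced_range = NULL) {
  stopifnot(length(x1) == length(x2))
  rng <- forced_range
  if (is.null(rng)) {
    rng <- max(c(x1, x2)) - min(c(x1, x2))
  }
  1 - 2 * sum(abs(x1 - x2)) / (length(x1) * rng)
}

# Made-up UN-vote-style codes (1 = no, 2 = abstain, 3 = yes) for a dyad
# in which neither state ever voted "no".
v1 <- c(2, 3, 3, 2)
v2 <- c(3, 3, 2, 2)

s_observed <- s_range_sketch(v1, v2)                   # observed range: 3 - 2 = 1
s_forced   <- s_range_sketch(v1, v2, forced_range = 2) # conceivable range: 3 - 1 = 2
```

With the observed range the disagreements look maximal; with the conceivable range they count for half as much, so the score is higher.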
The function subsets to complete cases of the two vectors for which you want an S score. If weights are included, the function further subsets to complete cases including the weights as well.
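The subsetting described above can be sketched with base R's complete.cases(). The vectors below are made up for illustration, and this is not the package's own code.

```r
# Illustrative only; the variable names and data are made up.
x1 <- c(3, 0, NA, 2, 1)
x2 <- c(3, 1, 0, NA, 1)
w  <- c(0.4, NA, 0.2, 0.1, 0.3)

# Without weights: keep positions where both ratings are observed.
keep_xy <- complete.cases(x1, x2)

# With weights: additionally require a non-missing weight.
keep_xyw <- complete.cases(x1, x2, w)

x1[keep_xyw]  # ratings that would survive when weights are supplied
```

Here three positions survive without weights, and only two survive once the missing weight is taken into account.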
The function implicitly assumes that x1 and x2 are columns in a data frame. One indirect check for this looks at whether x1 and x2 are the same length. The function will stop if they're not.
Several Comments on Weighting
If it were my call to make, I'd caution against the IR standard of using the composite index of national capabilities (CINC) as a weight on the calculation of the S statistic. Conceptually, weighting by capabilities tries to capture some kind of "importance" quantity. Related to the familiar application of alliances, this would prioritize those states that could conceivably bring more to the battlefield. In practice, this adds one anachronism to another. Capabilities, as measured, are basically a nineteenth-century measurement in which estimates of energy consumption, iron and steel production, and urban population size are given the same weight in the composition of the measure as military expenditures and military size. Alliances themselves are somewhat antiquarian, certainly in what we want them to do for this measure. If the question is "why must alliances be measures of foreign policy similarity", the answer kind of reduces to "we have historical data on them." If you want estimates for the 19th century, you have this, but then are implicitly confessing your measure of foreign policy similarity is an anachronism.
There are other peculiarities too. The data on capabilities have always been skewed to the right. Very few states have proportionally that much weight. As the state system has expanded in size (i.e. as empires ended), the relative weight at the top has necessarily decreased. For example, the top three states in capabilities in 1816 (the United Kingdom, Russia, and France) combined for 61.8% of capabilities in a system of just 23 states. In 2016, the top three states (China, the United States, and India) combined for 45% of capabilities in a system of 195 states. New states are almost always small states that possess almost no capabilities. Eleven of 23 states in 1816 had less than 1% of capabilities. That's about 48% of the system. In 2016, 176 of 195 states had less than 1% of capabilities. That's over 90% of the system. If the idea is to identify the "important" foreign policy ties, I echo Haege's (2011) contention that this approach is a second-best solution. It's second-best to other metrics that better model chance-corrected agreement. It just discards too much information and gives too much weight to great powers and/or states that are conspicuously high on capabilities (e.g. India).
Faithfully calculating a weighted S statistic (by system capabilities) requires a weight that sums to 1. In the most literal sense of 1, there is no year in the National Material Capabilities data (v. 2016) in which system capabilities sum to 1. In almost 60% of years, the discrepancy doesn't look like a rounding error either. In 1860, all capabilities sum to over 1.07! In the context of applications with Correlates of War's CINC scores, you can still use the raw data because the function doesn't assume the weights sum to 1. You'll see how in the denominator of the formula.
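The point about the denominator can be illustrated with a short sketch. It assumes the weighted form S = 1 - 2 * sum(w * |x1 - x2| / range) / sum(w); s_weighted_sketch() is a hypothetical helper rather than the package's code, and the weights are invented to sum to 1.07.

```r
# Hypothetical helper, not the package's implementation of srs().
# Assumes weighted S = 1 - 2 * sum(w * |x1 - x2| / range) / sum(w).
s_weighted_sketch <- function(x1, x2, w) {
  stopifnot(length(x1) == length(x2), length(w) == length(x1))
  rng <- max(c(x1, x2)) - min(c(x1, x2))  # observed range across both vectors
  1 - 2 * sum(w * abs(x1 - x2) / rng) / sum(w)
}

a <- c(3, 0, 1, 2, 0)
b <- c(3, 1, 0, 2, 3)
w_raw  <- c(0.50, 0.30, 0.15, 0.10, 0.02)  # made-up weights summing to 1.07
w_unit <- w_raw / sum(w_raw)               # same weights rescaled to sum to 1

s_raw  <- s_weighted_sketch(a, b, w_raw)
s_unit <- s_weighted_sketch(a, b, w_unit)
```

Because the sum of the weights sits in the denominator, rescaling the weights leaves the score unchanged under this form, so raw weights that do not sum to 1 give the same answer as normalized ones.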
Weights are only applicable to absolute distances. If you specify a weight variable with distances = 'squared', the function will ignore your weights.
In applications to the Correlates of War system, as far as I am aware, there are no CoW states for which there isn't a CINC estimate. If, for some reason, a CINC score (or some other weight) is missing, the cases are dropped before weights are applied.
If weights are supplied, the weights must match the length of either x1 or x2. The function builds in an implicit assumption that the weights are a column in the data frame you're using.