Stargazing on Scale (ALMA Observatory, Chile)

I think we’ve all had things that we’ve written that we kind of cringe seeing now. That’s me and this guide I wrote over five years ago on how to include information about the random effects from a mixed effects (multilevel) model in a table made by the {stargazer} package. I mean, ew, Steve from five years ago. For a guide that purports to “quasi-automate” something, it involves a lot of manual steps and is convoluted as hell. Even the illustrative data set is unintuitive. Yet, it’s one of the most frequently visited pages on my website despite not being very helpful. Good work, Steve from five years ago.

There is a better way, though. There are multiple benefits to the approach that follows. It involves fewer steps. It’s generalizable to cases where {stargazer} can’t process the kind of statistical model. It also has a more intuitive example data set.

Here are the R packages we’ll be using in this guide.

library(tidyverse) # for all-things workflow
library(stevemisc) # for the r2sd() function and the data set
library(lme4) # the workhorse mixed effects models package in R
library(blme) # some Bayesian add-ons in an lme4 framework
library(stargazer) # the star of the show


## The Data and the Models

I discussed these data in a post from late March this year. In that post, which I was supposed to give to some students in the United Kingdom before the COVID-19 outbreak, I proposed a rudimentary statistical model of immigration sentiment in the United Kingdom to better teach students in the UK about quantitative methods. I make that data available in my {stevedata} package as the ESS9GB data.

Briefly, the immigsent is a 31-point scale of attitudes about immigration created as an additive index of three 10-point prompts about whether a respondent believes immigration is good for the UK’s economy, whether cultural life is enriched by immigrants, and whether the UK is made a better place to live because of immigrants. Higher values mean more pro-immigration sentiment. We’re going to propose an explanation of variation in the immigsent variable as a function of the respondent’s age (agea), whether the respondent is a woman (female) or unemployed (uempla), the respondent’s household income in deciles (hinctnta), and hte respondent’s ideology on an 11-point left-right scale (lrscale). Data come from the 9th round of ESS data for the United Kingdom and were already subset to just those respondents who were born in the United Kingdom.

We’ll propose three models. The first is a simple linear model, much like I did in the post in March. The second is a mixed effects model with a random effect for one of 12 UK regions in the data. These regions are East Midlands, East of England, London, North East, North West, Northern Ireland, Scotland, South East, South West, Wales, West Midlands, and Yorkshire and the Humber. The third model will include a random slope for ideology on the random effect of region. This model will also be estimated with the {blme} package. Of note: {stargazer} doesn’t support the {blme} package.

Before estimating the models, let’s scale everything that’s not binary by two standard deviations. Centering, at the least, is just good modeling practice. Mixed effects models, in particular, get very whiny in the absence of naturally occurring zeroes.1

ESS9GB %>%
mutate_at(vars("agea", "eduyrs", "hinctnta", "lrscale"), list(z = ~r2sd(.))) %>%
rename_at(vars(contains("_z")),
~paste("z", gsub("_z", "", .), sep = "_") ) -> Data


Now, let’s estimate the three models.

# Simple linear model
M1 <- lm(immigsent ~ z_agea + female + z_eduyrs + uempla + z_hinctnta  +
z_lrscale, data=Data)

# Simple Linear Mixed Effects Model
M2 <- lmer(immigsent ~ z_agea + female + z_eduyrs + uempla + z_hinctnta  +
z_lrscale + (1 | region), data=Data)

# Simple Bayesian Linear Mixed Effects Model (not supported by stargazer)
M3 <- blmer(immigsent ~ z_agea + female + z_eduyrs + uempla + z_hinctnta  +
z_lrscale + (1 + z_lrscale | region), data=Data)


## Creating a Stargazer Table

The approach I recommend starts with creating “tidied” data frames of the regression model. These will contain the fixed effects and their standard errors. Importantly, that’s all we want from these objects even as the {broom} package will also store information about the random effects. It’s useful information in a lot of applications, just not this particular one.

tidyM1 <- broom::tidy(M1)
tidyM2 <- broom::tidy(M2) %>% filter(effect == "fixed")
tidyM3 <- broom::tidy(M3) %>% filter(effect == "fixed")


Next, grab some information about the random effects and store them as vectors. Some reviewers will care more about certain aspects of a mixed effects model than others, but I think, at a minimum, a researcher estimating and presenting a mixed effects model must present 1) the number of unique group-level “clusters” in the random effect(s) (in our case: the 12 regions of the UK in the data) and 2) the standard deviation (or variance) of the random parameters (i.e. the random slopes and/or the random intercepts). This simple example has just one random intercept and, for M3, a random slope on top of that. However, this approach I recommend is generalizable to more random effects for more complicated models.

 # Number of unique regions for that one random effect
num_region <- as.numeric(sapply(ranef(M2),nrow)[1])

# SD for region in M2
sd_regionM2 <- round(as.numeric(attributes(VarCorr(M2)$"region")$stddev), 3)

# SD for region in M3. Note: we have a random slope on top of that too.
sd_regionM3 <- round(as.numeric(attributes(VarCorr(M3)$"region")$stddev)[1], 3)

# sd for the random slope in region for M3.
sd_regionM3lrscale <- round(as.numeric(attributes(VarCorr(M3)$"region")$stddev)[2], 3)


Now, let’s manually create a tibble that has all this information presented in the order we want. We’re also going to add the number of observations from the model as well. Do note you can choose to eschew the vectors and have the tibble do the calculations for you, but that would clutter up the workflow (I think). I’m sure there’s a way of getting escaping LaTeX characters in this if you’re thinking ahead to a LaTeX table, but this will do for now.

tribble(~stat, ~M1, ~M2, ~M3,
"Number of Regions", NA, num_region, num_region,
"sd(Region)", NA, sd_regionM2, sd_regionM3,
"sd(Region, Ideology)", NA, NA, sd_regionM3lrscale,
"", NA, NA, NA,
"N", nobs(M1), nobs(M2), nobs(M3)) -> mod_stats


Now, let’s create a regression table using the {stargazer} package. This will format to HTML (for this blog post), but changing the type to “latex” (for example) won’t materially change anything. I’ll walk through the important pieces after the table.

stargazer(M1, M2, M2, type="html",
# ^  Notice M2 is called twice. I'm going somewhere with this.
# Below: manually supply tidied coefficients and standard errors
coef = list(tidyM1$estimate, tidyM2$estimate, tidyM3$estimate), se = list(tidyM1$std.error, tidyM2$std.error, tidyM3$std.error),
# Omit model statistics by default...
omit.table.layout = "s",
# ...but supply your own that you created (with random effects)
add.lines = lapply(1:nrow(mod_stats), function(i) unlist(mod_stats[i, ])),
covariate.labels = c("Age","Female","Years of Education", "Unemployed", "Household Income (Deciles)", "Ideology (L to R)"),
notes="<small>Data: ESS, Round 9 (United Kingdom)</small>",
dep.var.labels="Pro-Immigration Sentiment",
model.names = FALSE,
column.labels = c("OLS", "Linear Mixed Effects", "Linear Mixed Effects (Bayesian)")
)

 Dependent variable: Pro-Immigration Sentiment OLS Linear Mixed Effects Linear Mixed Effects(Bayesian) (1) (2) (3) Age -0.068 -0.103 -0.021 (0.372) (0.372) (0.373) Female -0.248 -0.212 -0.225 (0.338) (0.335) (0.335) Years of Education 3.544*** 3.491*** 3.427*** (0.354) (0.354) (0.354) Unemployed -1.102 -1.005 -0.744 (1.204) (1.194) (1.203) Household Income (Deciles) 2.007*** 1.908*** 1.885*** (0.365) (0.366) (0.366) Ideology (L to R) -2.267*** -2.273*** -2.109 (0.343) (0.341) (1.764) Constant 17.269*** 17.133*** 17.150*** (0.243) (0.368) (0.369) Number of Regions 12 12 sd(Region) 0.941 0.946 sd(Region, Ideology) 5.971 N 1454 1454 1454 Note: *p<0.1; **p<0.05; ***p<0.01 Data: ESS, Round 9 (United Kingdom)

This approach is flexible to your own needs and particular set of models, but I want to highlight these important components. First, notice that Model 3 in the presentation was actually called as Model 2. Minimally, the function is stargazer(M1, M2, M2) because Model 3 (M3) is of class blmerMod. The overlap between blmerMod for Model 3 and lmerMod for Model 2 is obviously substantial, but {stargazer} will only process the latter and not the former. Thus, technically, Model 3 in the table was actually called as Model 2 again.

Second, and importantly, I manually supplied the coefficients and the standard errors as a list drawn from the tidied objects I created with the {broom} package. This allowed me to overwrite Model 2 (as Model 3) with the actual coefficients and standard errors from Model 3. Thus, you can see in Model 3 that the results of the Bayesian mixed effects model suggest that allowing a random of slope for ideology for the 12 regions in the UK implies there’s no overall effect of ideology in the United Kingdom after considering the region-by-region variation in the data. I note this humbly because, as I mentioned in the blog post from March, this is just a simple exercise aimed for instruction about quantitative methods.

More importantly than the inferential takeaway (for the sake of this guide), I think this offers a workaround for those of you working with statistical models that {stargazer} can’t process. If {stargazer} can’t process/summarize the particular statistical model, but the tidy() function in the {broom} package can, 1) estimate a simple linear model or generalized linear model that contains all the same variables (with the exact variable names), 2) have {stargazer} process that model and 3) overwrite the coefficients and standard errors with the information summarized by the tidy() function in {broom}.

Third, omit the default model statistics that {stargazer} wants to supply and add your own with the add.lines option. This will be the information that includes the number of groups/”clusters” in the random effect, the standard deviation of the random parameters, and the total number of observations in the data. The rest of the {stargazer} call is just for formatting.

My previous stab at this purported a means to “automate” the inclusion of information about random effects in a {stargazer} table summarizing mixed effects regressions. This is a better way of doing it.

1. The analyses from the blog post in March were done on unstandardized inputs. So, the coefficients and standard errors will look different but the underlying t-statistics will be the same for all but the constant, which incidentally becomes more precise and meaningful.