(Quasi)-automating the Inclusion of Random Effects in R's Stargazer Package
stargazer is a godsend for those of us who look for seamless ways to manage the execution and presentation of our statistical analyses. LaTeX, for all its strengths, inconveniences users who need to manually create tables. Doing this for peer-reviewed scholarship can be a perilous process, greatly increasing the probability of human error in the presentation of the results. The stargazer package, as all useful packages do, automates that and makes it the job of R to render the regression table. It also handles lmer
output, which is fantastic for those of us who estimate mixed effects models as models of choice.
However, one limitation of the stargazer package is that it ultimately processes lmer
output no different than it would handle a standard linear model or generalized linear model. It reports only the fixed effects of the model, even though the whole point of mixed effects models is that they contain important quantities of interest known as random effects. When these models are estimated, these parameters should be communicated in the regression table as well. In most standard models, following discussion in the likes of Andrew Gelman and Jennifer Hill’s book, these quantities of interest are typically just the number of unique observations in the random effect (i.e. “the number of groups”) and the standard deviation of the random effect. This second quantity of interest, which is standard output from an lmer
model, communicates the variation in the group intercepts left unexplained after the fixed effects of the model are estimated.
Normally, the analyst can just manually insert these into the stargazer
output. However, with some effort, this can be automated as well. This assumes that LaTeX is the output the researcher wants and the researcher is proficient in LaTeX tags.
I illustrate this process below using the cake
data that comes with the lme4
package. First, let’s understand the data. The lme4
reference manual describes the cake data set as follows.
Data on the breakage angle of chocolate cakes made with three different recipes and baked at six different temperatures. This is a split-plot design with the recipes being whole-units at the different temperatures being applied to sub-units (within replicates). The experimental notes suggest that the replicate numbering represents temporal ordering.
The substance of the experiment is beyond the expertise of the political scientist, though I heard “cake” and “data” and that’s all I needed to hear. We’ll be estimating the outcome variable (angle
) as a function of the temperature at which it was cooked. The first model will be the temperature as a continuous variable. The second model will break the temperature variable into ordered factors, leaving the (< 175) category as the baseline comparison.
We’ll also estimate two random effects for each model. The first random effect will be each unique combination of the replicate variable (a factor variable with 15 different values) and the recipe variable (which was three, described earlier). All told, there are 45 different values in the first random effect (i.e. 15*3 = 45). The second random effect will be just the replicate variable.
Let’s get started in R, loading the two required packages for this exercise and the data to be used.
Let’s also load in a manual function that will allow us to insert new rows at various points in a data frame.
This is a simple function and its use will be seen in clearer detail soon.
Next, let’s estimate and get summaries of our two models.
Next, let’s use the stargazer
command to create the broad template for our table. I keep it simple in this stargazer
call, just eliminating the dependent variable label and providing labels for the covariates. Do consult Jake Russ’ quite handy cheatsheet for stargazer if you want to know more about how to master this package.
Now, let’s convert this new object that we titled Tables
as a data.frame and coerce the one vector in it (also called Tables
) into a character.
Manually spit out the object (i.e. the Tables call) to see where you want to insert the information about the random effects. This is typically done above the model fit information and after the last row of standard errors for the fixed effects. In this case, I see I’m starting at row 25. Make a note of it as follows.
This will be placeholder information and it’s useful to keep a note of it. Next, create some standard rows you’ll want to add. You can do this in R.
Recall that characters in R that include backslashes should be preceded by an additional backslash. This is why you see double the normal backslashes for LaTeX tags.
Now, starting at row 25 in that Tables object, let’s add a horizontal line, a row partitioning the random effect information to follow, and another horizontal line.
Notice this is where we start using the insertrow
function we outlined earlier. This function identifies a table data.frame (here: Tables
). Then, it inserts a row of whatever you want (e.g. hline
, randomeffect
) at starting point r
(which we specified earlier as row 25). It pushes everything else from the row number identified down one spot.
The next step is to know the order of your random effect groupings. Generally, I think “last comes first” is how lme4
stores lmer
output. The first random effect specified in the lmer
call is the last random effect stored in the lmer
output. Therefore, the recipe:replicate
combination is the first random effect and the replicate
random effect comes second. Let’s get the number of unique values in each of the groupings.
If you were to just run those commands without storing the output, you’d see the integers are 45 and 15 respectively. They don’t vary (in our case) from the first model to the second. If they do, grab them from the second model and store them as another output in your R environment.
Next, let’s get the standard deviation of the random effect, which communicates the amount of variation among the different groupings left unexplained after the estimation of the fixed effects in the model. Look careful at how this information can be extracted from lmer
output.
Now, we’re going to use the paste
function to create a free-floating character row. Notice that anything in quotes will be passed as a character into the assigned output. You can “pause” this by closing a quote, adding a comma, and assigning an R object. Follow it with a comma and resume with characters in quotes.
Some important notes about what’s happening here. Since we have just two models we’re including, we need just two ampersands to separate the quantities of Model 1 from Model 2. Further, since this is LaTeX output we’re wanting, make sure you close with four backslashes. Observe.
Let’s see what this did.
This is glorified LaTeX output, stored as a character in R.
Now, let’s do the same for the information about the random effects for the second model.
Here’s the fun part. Let’s add them all together now with our insertrow
function.
Save as if it were a table, but give it a .tex extension. Use these options to make sure you don’t get row names, column names, or quotes around the characters.
Assuming you have a LaTeX/R workflow setup where you could immediately render your LaTeX document with your new table, you’ll get output like this.
If you’re proficient with LaTeX, this will look at lot more intuitive. It is a convoluted way to automate this process, though the payoff comes later in the analysis process when different tweaks are made and time is saved from having to manually re-enter random effects into the regression table. It would be fantastic if the stargazer
package could do this automatically. Until then, here is a way to quasi-automate the inclusion of random effects into stargazer
output.
Code for this exercise will be made available at my Github page.
Disqus is great for comments/feedback but I had no idea it came with these gaudy ads.