It's better than bad; it's good.

I’m writing this out of necessity so that it may go into a future grad-level course about what logarithmic transformations do to your OLS model. When we are first introduced to logarithmic transformations, we learn they have a nice effect of coercing normality into positive real variables that have some kind of unwelcome skew. They become a quick fix for cases where the OLS model is sensitive to skew on either the left- or right-hand side of the equation. However, we often lose sight of the fact that the introduction of logarithmic transformations on one or both sides of the regression equation result in a different interpretation of what the OLS model is telling you for the stuff you want to know. So, I’m writing this as a simple primer for future students so that we can avoid some uncomfortable interpretations of OLS model parameters in the presence of logarithmic transformations. The goal of this post isn’t to litigate whether logarithmic transformations make sense as a matter of principle. Sometimes they do; sometimes they don’t. The goal here is just to make sure my students understand how interpretation of the model output changes in the presence of logarithmic transformations of the underlying phenomenon being estimated.

First, here are the R packages I’ll be using in this post.

library(tidyverse)     # for most things
library(stevedata)     # for the data
library(modelsummary)  # for modelsummary()
library(kableExtra)    # for extra table formatting
library(modelr)        # for data grids


## An Aside on Logarithms (and Exponents), and Some Basic Rules

I want to start with a discussion of basic rules of (natural) logarithms and their exponents. The logarithm of a number $$x$$ to base $$b$$ is the exponent to which $$b$$ must be raised to make $$x$$. This is a bit easier to see when the base is 10, which is common in a lot of scientific and engineering examples (like the Richter scale). $$10^2$$ (or 10 raised to the power of 2) is 100. That means the logarithm of base 10 of 100 is 2, which effectively inverts or “undoes” the exponent.

Statisticians instead prefer a logarithm of base $$e$$ (i.e. Leonhard Euler’s constant, or about 2.718) because the derivative of this so-called “natural” log of $$x$$ is a simple $$1/x$$. Much of the same rules apply, even if the derivative of the log of base 10 is different. However, it’s why statisticians mean “log of base $$e$$” when they say “log” or “natural log.”

I’ll admit that it’s been a very long time since I’ve had to think about “proving” these so-called logarithmic laws or identities, but here are the operative ones you’ll need to remember. In the following notation, “log” is shorthand for the log of base $$e$$ and exponentiation (often represented as something like $$e^{x}$$) is spelled out a little more as $$exp$$ in juxtaposition to Leonhard Euler’s constant ($$e$$).

$log(e) = 1 \\ log(1) = 0 \\ exp(1) = e \\ log(a) = b \\ exp(b) = a \\ log(exp(a)) = a \\ exp(log(a)) = a \\ log(a^b) = b*(log(a)) \\ log(a*b) = log(a) + log(b) \\ log(a/b) = log(a) - log(b) \\ exp(a*b) = (exp(a))^b$

Here’s a basic proof of concept in R for some of these important identities. Let $$a$$ be 45 and let $$b$$ be 48. Pay careful attention to the quotient and product identities.

log(45^48); 48*(log(45))
#> [1] 182.7198
#> [1] 182.7198
log(exp(45))
#> [1] 45
exp(log(48))
#> [1] 48
log(45*48); log(45) + log(48)
#> [1] 7.677864
#> [1] 7.677864
log(45/48); log(45) - log(48)
#> [1] -0.06453852
#> [1] -0.06453852


I do want to talk about one derivation of the quotient rule (i.e. $$log(a/b) = log(a) - log(b)$$). This is the percentage change approximation rule (of thumb) in which log differences are understood as percent changes. Here’s a quick formulation of this that is worth proving because it’s going to matter a great deal to how we deal with log-transformed dependent variables. Let there be two values, $$y'$$ and $$y$$, where $$y$$ is observed at $$x$$ and $$y'$$ is observed at $$(x + 1)$$ (i.e. a one-unit change). This quotient rule in logarithms devolves accordingly.

$log(y') - log(y) = log(y'/y) = \beta(x + 1) - \beta x = \beta$

In other words, our regression coefficient is our estimated of the difference between $$log(y')$$ and $$log(y)$$ (alternatively: $$log(y'/y$$). Let’s start with the quotient, which we know has a logarithmic identity. We can exponentiate that logarithmic identity and the following equivalencies come out.

$exp(log(y'/y)) = exp(\beta) = y'/y = 1 + (\frac{y'-y}{y})$

We’d say from this that $$\frac{y'-y}{y}$$ is the relative change and that $$\frac{y'-y}{y}*100 = 100*(exp(\beta) -1)$$ is the percentage change. Part of that ($$100*(exp(\beta - 1)$$) comes by way of something that I admittedly have very little appetite to discuss: Taylor series and the Maclaurin series. I won’t pretend to have the most sophisticated treatment here, but the operative language we use is for “very small values close to 0”, the Maclaurin series shows $$exp(x) - 1 \approx x$$, and, alternatively, $$exp(x) - x \approx 1$$. Calculus was never my strong suit, but you’ll just have to commit some of that to memory.

## The Data and the Models

The data I’ll be using for this post are a classic data set in pedagogical instruction. These are the Chatterjee and Price (1977) education expenditure data. In this classic case, Chatterjee and Price (1977) are trying to model projected per capita public school expenditures (edexppc) as a function of the the number of residents (per thousand) living in urban areas (urbanpop), the state-level income per capita (incpc), and the number of residents (per thousand) under 18 years of age.1 These data are relatively well-known for statistical instruction because they are an often cited case of heteroskedasticity.2 You could further teach the jackknife and the bootstrap around them with minimal effort. We will not bother with those issues here, though it’s worth reiterating that the interpretation of log-transformed regression parameters should come with a caveat about other aspects of model fit and OLS’ assumptions.

The data look like this.

The Chatterjee and Price (1977) Education Expenditure Data Set
State Urban Population Income per Capita Under-18 Population Education Expenditure per Capita Forecasts
AK 831 5309 333 311
AL 584 3724 332 208
AR 500 3680 320 221
AZ 796 4504 340 332
CA 909 5438 307 332
CO 785 5046 324 304
CT 774 5889 307 317
DE 722 5540 328 344
FL 805 4647 287 243
GA 603 4243 339 250
HI 484 5613 386 546
IA 572 4869 318 232
ID 541 4323 344 268
IL 830 5753 320 308
IN 649 4908 329 264
KS 661 5057 304 337
KY 523 3967 325 216
LA 661 3825 355 244
MA 846 5233 305 261
MD 766 5331 323 330
ME 508 3944 325 235
MI 738 5439 337 379
MN 664 4921 330 378
MO 701 4672 309 231
MS 445 3448 358 215
MT 534 4418 335 302
NC 450 4120 321 245
ND 443 4782 333 246
NE 615 4827 318 268
NH 564 4578 323 231
NJ 889 5759 310 285
NM 698 3764 366 317
NV 809 5560 330 291
NY 856 5663 301 387
OH 753 5012 324 221
OK 680 4189 306 234
OR 671 4697 305 316
PA 715 4894 300 300
RI 871 4780 303 300
SC 476 3817 342 233
SD 446 4296 330 230
TN 588 3946 315 212
TX 797 4336 335 269
UT 804 4005 378 315
VA 631 4715 317 261
VT 322 4011 328 270
WA 726 4989 313 312
WI 659 4634 328 342
WV 390 3828 310 214
WY 605 4813 331 323

We are going to run four different regressions on these data, with an eye toward understanding forecasts of educated expenditure per capita as a function of all three of these other variables. In the first case, we’re going to use the raw, untransformed scale of all the variables (i.e. nothing is log-transformed). In the second model, we’re going to log-transform just the dependent variable. In the third model, we’re going to pick one of the independent variables to transform (the under-18 population variable) and we’re going to leave the dependent variable on its original scale. In the fourth model and final model, we’re going to log-transform the dependent variable in addition to this under-18 population variable.

Let’s perform the analyses with the code below. Vincent Arel-Bundock’s {modelsummary} magic is happening underneath the hood to format the regression table.

CP77 %>% mutate(ln_edexppc = log(edexppc),
ln_pop = log(pop)) -> CP77

M1 <- lm(edexppc ~ urbanpop + incpc + pop, CP77)
M2 <- lm(ln_edexppc ~ urbanpop + incpc + pop, CP77)
M3 <- lm(edexppc ~ urbanpop + incpc + ln_pop, CP77)
M4 <- lm(ln_edexppc ~ urbanpop + incpc + ln_pop, CP77)

Multiple Regressions of Education Expenditure per Capita from Chatterjee and Price (1977)
No Transformations  DV is Log-Transformed  Under-18 Population is Log-Transformed  DV and Under-18 Population are Both Log-Transformed
Under-18 Population 1.552*** 0.005*** 503.206*** 1.495***
(0.315) (0.001) (106.804) (0.341)
Urban Population −0.004 0.000 −0.003 0.000
(0.051) (0.000) (0.052) (0.000)
Income per Capita 0.072*** 0.000*** 0.072*** 0.000***
(0.012) (0.000) (0.012) (0.000)
Intercept −556.568*** 3.026*** −2961.222*** −4.128*
(123.195) (0.395) (632.959) (2.019)
Num.Obs. 50 50 50 50
R2 Adj. 0.565 0.568 0.551 0.559
+ p < 0.1, * p < 0.05, ** p < 0.01, *** p < 0.001

Normally, this type of regression strategy would just minimally note that both scales (raw and log-transformed) of the under-18 population variable (and the DV) would result in a statistically significant coefficient that is positive. Or: “the (hypothesized, positive) effect of the under-18 population variable is robust to different variable specifications/transformations.” However, what they communicate as a coefficient is actually kind of different. Let’s start with the first case, and the simplest case.

### Neither the IV or DV is Log-Transformed

This is the simplest case and I don’t want to belabor it too much because I’m assuming a rudimentary understanding of OLS regression. In this instance, the regression coefficient is communicating a basic takeaway: a one-unit change in the number of residents (per thousand) under 18 years of age coincides with an estimated change of about 1.552 in the projected per capita public school expenditures for 1975. This isn’t too difficult to understand; it’s the most ideal case. We can use basic model predictions to show just that.

CP77 %>%
data_grid(.model = M1,
pop = seq(min(pop), max(pop), by=1)) %>%
mutate(pred = predict(M1, .),
diff = pred - lag(pred, 1)) -> predM1

predM1
#> # A tibble: 100 × 5
#>      pop urbanpop incpc  pred  diff
#>    <dbl>    <int> <int> <dbl> <dbl>
#>  1   287      661  4697  226. NA
#>  2   288      661  4697  228.  1.55
#>  3   289      661  4697  229.  1.55
#>  4   290      661  4697  231.  1.55
#>  5   291      661  4697  232.  1.55
#>  6   292      661  4697  234.  1.55
#>  7   293      661  4697  235.  1.55
#>  8   294      661  4697  237.  1.55
#>  9   295      661  4697  238.  1.55
#> 10   296      661  4697  240.  1.55
#> # … with 90 more rows


We can summarize those model predictions we just gathered as well and compare them to the regression coefficient.

predM1 %>%
# this is basically a mean in name only, given R distinct-value weirdness
summarize(mean = mean(diff, na.rm=T)) %>% pull()
#> [1] 1.552054

coef(M1)[4]
#>      pop
#> 1.552054


So yeah, this wasn’t too hard. When nothing in the OLS model is log-transformed, a one-unit change in the value of the independent variable coincides with an estimated change of the regression coefficient in the value of the dependent variable. That’s easy, but it’s also worth repeating.

### The DV is Log-Transformed, but the IV Isn’t

This changes a little bit when the dependent variable is log-transformed as practitioners sometimes like to do to impose normality on positive, (typically?) right-skewed variables. After all, skewed dependent variables are typically the culprit of some OLS model diagnostic problems (e.g. heteroskedasticity, non-normal residuals). It’s tempting—and I suppose, not completely dishonest—to say that a one-unit change in the under-18 population variable coincides with an estimated change of about .005 in the log-transformed value of the dependent variable. After all, R may or may not know that this dependent variable is a logarithmic transformation of something else. At least, if it knows, it doesn’t care. The practitioner has to know something else is happening in this so-called “log-lin” model.

Let’s illustrate what’s happening here in a stylized form. First, let’s create a simple prediction grid from this model where the income per capita and urban population variables are both fixed at their central tendency. I think data_grid() does the median by default. We’re going to set the under-18 population variable to be two values. The first is the median and the second is the median, + 1. Our simple data look like this.

CP77 %>%
data_grid(.model = M2,
pop = c(median(pop), median(pop) + 1)) -> newdat

newdat %>% data.frame
#>     pop urbanpop incpc
#> 1 324.5      661  4697
#> 2 325.5      661  4697


Now, we’re going to get our estimated values of education expenditure per capita estimates, given the model. We’re going to compare that to the regression coefficient and find something not too dissimilar to what we found before: the regression coefficient.

newdat %>%
mutate(pred = predict(M2, newdata =.)) -> newdat

newdat %>% mutate(diff = pred - lag(pred, 1)) %>%
data.frame
#>     pop urbanpop incpc     pred        diff
#> 1 324.5      661  4697 5.630567          NA
#> 2 325.5      661  4697 5.635155 0.004587787

coef(M2)[4]
#>         pop
#> 0.004587787


Hold on, though! The dependent variable is a logarithm and logarithms have special rules, some of which we introduced above. In particular, the quotient rule is going to apply here. $$log(a) - log(b) = log(a/b)$$ In our context here, we have two different values of the dependent variable. One, when the under-18 population variable is at the median, is about 5.630567. The other, when the under-18 population variable increases by one unit, is about 5.635155. The difference between them is the regression coefficient, yes, but those two values are logarithms. Thus, this quotient rule applies and we can understand the OLS model as communicating the following things, rounded for clarity, and omitting the other regressors (which are themselves fixed). Let $$y$$ be the value of estimated education expenditures per capita when the under-18 population variable is at the median and let $$y'$$ communicate the value of estimated education expenditures per capita when the under-18 population variable changes in its level by 1.

$log(y) = 5.63057 \\ log(y') = 5.635155 \\ log(y') - log(y) = \beta_{pop} \\ log(y'/y) = \beta_{pop}$

A few interesting things are happening here. For one, recall that the dependent variable is log-transformed, meaning it has a special identity: the geometric mean. So yes, there is an arithmetic mean of the log-transformed variable but contained in this logarithmic transformation is another attribute, the geometric mean, from its exponential form. Second, the regression coefficient is something akin to a ratio because of this quotient rule and logarithmic transformations. Third, there’s another property lurking around here of interest to us: the percentage change approximation rule. With those in mind, here are the following ways you could unpack this quantity in relation to its untransformed (raw) scale, given some of the logarithmic identities introduced above. Our focus here is just on the under-18 population variable in order to maintain consistency throughout the post.

### Summarizing Our log(DV)~IV Model

1. $$exp(.004587787) = 1.004598$$, or: “a one-unit change in the residents (per thousand) under the age of 18 multiplies the expected per capita public school expenditures by about 1.004598. You can plug in any value here as well. For example, $$exp(.004587787*20) = 1.096097$$, or: a 20-unit change in the residents (per thousand) under the age of 18—which is incidentally about a standard deviation change across the range of this variable—multiplies the expected per capita public school expenditures by about 1.096.
2. The first quantity also contains the relative change and percentage change, though you’d have to subtract 1. For example: $$(exp(.004587787) - 1)*100 = .4598$$. Or: a one-unit change in the residents (per thousand) under the age of 18 coincides with an estimated .4598 percent increase in expected per capita public school expenditures. The relative increase can be obtained by dividing the percentage increase by 100, or by not multiply the exponentiated coefficient (-1) by 100.
3. The percentage change rule (of thumb) of logarithmic difference says a one-unit change in the residents (per thousand) under the age of 18 coincides with an approximated $$100*(.004587787) \approx .45$$ percent increase in expected per capita public school expenditures. This one is typically everyone’s go-to for cases where the DV is logged but the IV is not. Just understand the percentage change rule is always approximate, though it’s fair to use it because your treatment of the regression coefficients is always approximate too.

We can use a prediction grid to illustrate this. Here, let’s create a prediction grid with a sequence from the minimum to the maximum of the under-18 population variable, fixing the other regressors at a typical value. Then, we’ll create model predictions, based on the model with the logged dependent variable. We can exponentiate those model predictions and create a relative change variable communicating the relative change from its previous value (i.e. $$\frac{y'-y}{y}$$), and then multiplying that by 100 to get its percentage change. What emerges is consistent with what we did above, though we are relating the discussion back to our raw (untransformed) variable that we log-transformed for the regression.

CP77 %>%
data_grid(.model = M2,
pop = seq(min(pop), max(pop), by=1)) -> newdat

newdat %>%
mutate(pred = predict(M2, newdata=.),
exppred = exp(pred)) %>%
mutate(relchange = (exppred - lag(exppred))/lag(exppred, 1),
perchange = relchange*100) %>%
summarize_at(c("relchange", "perchange"), ~mean(.,na.rm=T)) %>%
data.frame
#>     relchange perchange
#> 1 0.004598327 0.4598327

# Let's compare to our coefficient from M2 now
(exp(coef(M2)[4])-1); (exp(coef(M2)[4])-1)*100
#>         pop
#> 0.004598327
#>       pop
#> 0.4598327

# Technically not it, but close
coef(M2)[4]*100
#>       pop
#> 0.4587787


### The DV isn’t Log-Transformed, but the IV is

This is the so-called “lin-log” model and I see this less often than I see the so-called “log-lin” model. In our case, a rudimentary explanation of the under-18 population coefficient would say “a one-unit change in the logged value of the under-18 population variable coincides with an estimated change of about 503 in the estimated per capita education expenditures.” Or, words to that effect. This is already an odd thing to say because the transformation of the independent variable will juice up the absolute value of the coefficient (i.e. it reduces the scale of the variable in the model), which might artificially make the effect look “big.” In our particular case, there is no possible increase of 1 on the log scale for this variable. The logged minimum is about 5.66 and the logged maximum is about 5.96. A one-unit change doesn’t exist on the log scale here. Perhaps that is reason to not have log-transformed this variable in the first place, though the point of this exercises ignores that question altogether (i.e. I’m more interested in explaining how to interpret the OLS model in the presence of log transformations).

That said, something else is happening here. It’s a comparison of what the dependent variable is estimated to be under two hypotheticals. The first $$log(pop)$$ and the second is $$log(pop) + 1$$. Whatever that change comes out to is our regression coefficient (of about 503 in this case). However, we need to break this down into little pieces. To start, 1 in the context of a logarithmic variable can be understood as the log of Leonhard Eueler’s constant ($$e$$). Thus, $$log(x) + 1$$ can also be restated as $$log(x) + log(e)$$, which we have available to us because this variable was log-transformed before plugging it into the model.·That could further be restated as $$log(xe)$$ given the logarithmic identities introduced above. Fundamentally, the regression coefficient is communicating a proportional change, saying what the dependent variable would look like if you were to multiply a value of the independent variable by Leonhard Euler’s constant.

There’s another way of looking at this, much like we did above with the percentage change of rule approximation. Comparing two values of the per capita education expenditures ($$y', y$$) for a one-unit change in the logged under-18 population value ($$x', x$$) can be written as $$y' - y = 503.206*(log(x') - log(x)) = 503.206*(\log(x'/x))$$. Our percentage rule of thumb will reappear in how we can summarize this relationship, though this time it’s on the right-hand side of the formula. Here are the many ways you could summarize the “lin-log” model we estimated.

### Summarizing Our DV~log(IV) Model

1. The estimated change in per capita expenditures when the under-18 population variable is multiplied by Leonhard Euler’s constant is 503.206.
2. The estimated change in per capita expenditures is 503.206 when the under-18 population variable changes by $$100*(e - 1) \approx 171.82\%$$
3. A 1% change in the under-18 population variable changes the estimated per capita education expenditures by $$503.206*log(1.01) = 5.007$$. A 10% change in the under-18 population variable changes the estimated per capita expenditures by $$503.206*log(1.10) = 47.96$$. A 20% change in the under-18 population variable changes the estimated per capita expenditures by $$503.206*log(1.20) = 91.74$$.
4. Alternatively, and this is the one I see most often used given the percentage change approximation rule of thumb: a 1% change in the under-18 population variable changes the estimated per capita expenditures by $$503.206/100 \approx 5.03206$$

I see the fourth approach taught to students more often than the third. While I would not object to a student doing this, I think the third is more honest. Consider the code below as proof of concept. In this case, I’m starting a vector of 20 with the minimum of the under-18 population variable, leaving the next 19 spots blank. Then, I’m going to loop through the vector and create a new observation in the vector that is just the previous one, increased by 1%. This will encompass the effective range of the variable. Then, I’m going to create a prediction grid of its logarithm and get model predictions, summarizing the first differences of those predictions. While the $$\beta/100$$ interpretation isn’t necessarily wrong for communicating the regression coefficient for a change of 1% (it’s an admitted approximation!), $$\beta*log(1.01)$$ is the more accurate summary.

x <- c(322, rep(NA, 19))

for (i in 2:20) {
x[i] <- x[i-1]*1.01
}

CP77 %>%
data_grid(.model = M3,
ln_pop = log(x)) -> newdat

newdat %>%
mutate(pred = predict(M3, newdata =.),
diff = pred - lag(pred, 1)) %>%
summarize(mean = mean(diff, na.rm=T)) %>% data.frame
#>       mean
#> 1 5.007064

# Let's compare to our coefficient from M3
log(1.01)*coef(M3)[4]
#>   ln_pop
#> 5.007064

# Technically not it, but close
coef(M3)[4]/100
#>   ln_pop
#> 5.032058


### Both the DV and the IV are Log-Transformed

Beyond the simple OLS case, I think this so-called “log-log” model is the most straightforward to understand. We may have struggled to internalize that the log-transformed dependent variable is now functionally a ratio because of the quotient rule, but learned that a one-unit change in the (not log-transformed) independent variable communicated some estimated percent change in the underlying dependent variable that was log-transformed. $$\beta*100$$ is a useful way of summarizing/approximating it. We may have found it odd to think about a log-transformed independent variable also having this property, but also learned that a one percentage change in the independent variable (underneath the logarithmic transformation) has a $$\beta*log(1.01) \approx \beta/100$$ change in the value of the dependent variable. Because the quotient rule applies to both the dependent variable and the independent variable in this case, we have both. There is an estimate percentage change in the dependent variable for some percentage change in the independent variable.

### Summarizing Our log(DV)~log(IV) Model

1. The estimated change in per capita expenditures when the under-18 population variable is multiplied by Leonhard Euler’s constant is about $$exp(1.495) \approx 4.459$$. This would be about a 4.459-fold increase when the under-18 population variable is multiplied by Leonhard Euler’s constant (i.e. increases by 1 on its logarithmic scale).
2. The estimated percentage change in per capita education expenditures for a 1% change in the under-18 population is $$1.01^{1.495} = 1.0149$$. The relative change in per capita education expenditures is about .0149 for a 1% increase in the under-18 population variable and the percentage change in per capita education expenditures is about 1.49% for a 1% increase in the under-18 population variable.
3. This estimated percentage change in per capita education expenditures for a 1% change in the under-18 population can be approximated with the regression coefficient: $$\beta = 1.495$$.

We can use an amalgam of the same code above (having already generated our 1% changes in the under-18 population variable) to illustrate what this looks like.

CP77 %>%
data_grid(.model = M4,
ln_pop = log(x)) -> newdat

newdat %>%
mutate(pred = predict(M4, newdata=.),
exppred = exp(pred)) %>%
mutate(relchange = (exppred - lag(exppred))/lag(exppred, 1),
perchange = relchange*100) %>%
summarize_at(c("relchange", "perchange"), ~mean(.,na.rm=T)) %>%
data.frame
#>    relchange perchange
#> 1 0.01498897  1.498897

# Let's compare to our coefficient from M4
(1.01^coef(M4)[4]-1);(1.01^coef(M4)[4]-1)*100
#>     ln_pop
#> 0.01498897
#>   ln_pop
#> 1.498897

# Technically not it, but close
coef(M4)[4]
#>   ln_pop
#> 1.495201


## Conclusion

It’s not completely dishonest to summarize an OLS model with logarithmic transformations in very general language. A researcher could summarize an OLS model in which only the dependent variable is log-transformed by saying “a one-unit change in the independent variable coincides with an estimated change of $$\beta$$ in the log-transformed dependent variable.” Perhaps the goal is just to identify statistical significance and direction, in which case the language can be even more general. However, it’s not too much effort to relay the log transformations back to their untransformed values. Different procedures apply for different conditions of log transformation (i.e. whether the DV or IV is log-transformed, or if both are), but think of it this way. The standard OLS model in which nothing is log-transformed communicates unit changes and the presence of log transformations communicates proportional changes on the untransformed scale. How you communicate those depends on what exactly is log-transformed, but be mindful that the presence of logarithmic transformations mean there are multiplicative/proportional effects being communicated in the model.

1. The urban population data are benchmarked to 1970. The income per capita variable is supposedly from 1973. The under-18 variable is supposedly from 1974. The per capita public school expenditures data are forecasts for the year 1975. The data are admittedly ancient, but they are useful for this purpose.

2. This post won’t address the issue of passing a Breusch-Pagan test for heteroskedastic errors, but it’s worth saying that the model with the log-transformed dependent variable is sufficient to pass this test. We often tell students that logarithmic transformations with positive real variables fix issues of skew, especially focusing on the dependent variable.