Log, Log, Log (i.e. What Logarithmic Transformations Do to Your Linear Model Summary)

It's better than bad; it's good.

When we are first introduced to logarithmic transformations, we learn they have the nice effect of coercing normality onto positive real variables that have some kind of unwelcome skew. They become a quick fix for cases where the linear model summary is sensitive to skew on either the left- or right-hand side of the equation. However, we often lose sight of the fact that introducing logarithmic transformations on one or both sides of the regression equation changes what the model is telling you about the stuff you want to know. So, I’m writing this as a simple primer for future students so that we can avoid some uncomfortable interpretations of model parameters in the presence of logarithmic transformations. The goal of this post isn’t to litigate whether logarithmic transformations make sense as a matter of principle. Sometimes they do; sometimes they don’t. The goal here is just to make sure my students understand how interpretation of the model output changes in the presence of logarithmic transformations of the underlying phenomenon being estimated.

First, here are the R packages I’ll be using in this post.

library(tidyverse)     # for most things
library(stevedata)     # for the data
library(modelsummary)  # for modelsummary()
library(kableExtra)    # for extra table formatting
library(modelr)        # for data grids

Here’s a table of contents.

An Aside on Logarithms (and Exponents), and Some Basic Rules
The Data and the Models
Conclusion

An Aside on Logarithms (and Exponents), and Some Basic Rules

I want to start with a discussion of basic rules of (natural) logarithms and their exponents. The logarithm of a number \(x\) to base \(b\) is the exponent to which \(b\) must be raised to make \(x\). This is a bit easier to see when the base is 10, which is common in a lot of scientific and engineering examples (like the Richter scale). \(10^2\) (or 10 raised to the power of 2) is 100. That means the base-10 logarithm of 100 is 2, which effectively inverts or “undoes” the exponent.
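A quick check of that base-10 example in base R. Both log10() and log() with its base argument are standard base R functions; nothing here depends on the packages loaded above.

```r
# 10 raised to the power of 2 is 100...
x <- 10^2
x
#> [1] 100

# ...so the base-10 logarithm of 100 is 2, which "undoes" the exponent
log10(x)
#> [1] 2
log(x, base = 10)
#> [1] 2
```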

Statisticians instead prefer a logarithm of base \(e\) (i.e. Leonhard Euler’s constant, or about 2.718) because the derivative of this so-called “natural” log of \(x\) is a simple \(1/x\). Most of the same rules apply to other bases, even if the derivative of the log of base 10 is different (it is \(1/(x*log(10))\)). That convenience is why statisticians mean “log of base \(e\)” when they say “log” or “natural log.”
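A small numerical sketch of the derivative claim, assuming a central finite difference is a good enough stand-in for the true derivative. The helper num_deriv() and the point x = 4 are hypothetical choices for illustration only. Note that R's log() is the natural log by default and exp(1) returns \(e\).

```r
# R's log() is base e by default, and exp(1) is e
e <- exp(1)
e
#> [1] 2.718282
log(e)
#> [1] 1

# Hypothetical helper: central finite-difference approximation of f'(x)
num_deriv <- function(f, x, h = 1e-6) (f(x + h) - f(x - h)) / (2 * h)

x <- 4  # hypothetical point, for illustration only
num_deriv(log, x)    # natural log: close to 1/x = 0.25
num_deriv(log10, x)  # base-10 log: close to 1/(x * log(10)), about 0.1086
```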

I’ll admit that it’s been a very long time since I’ve had to think about “proving” these so-called logarithmic laws or identities, but here are the operative ones you’ll need to remember. In the following notation, “log” is shorthand for the log of base \(e\), and exponentiation (often written as \(e^{x}\)) is spelled out as \(exp\) to keep it visually distinct from Leonhard Euler’s constant (\(e\)) itself.

\[\begin{gathered} log(e) = 1 \\ log(1) = 0 \\ exp(1) = e \\ log(a) = b \iff exp(b) = a \\ log(exp(a)) = a \\ exp(log(a)) = a \\ log(a^b) = b*log(a) \\ log(a*b) = log(a) + log(b) \\ log(a/b) = log(a) - log(b) \\ exp(a*b) = (exp(a))^b \end{gathered}\]

Here’s a basic proof of concept in R for some of these important identities. Let \(a\) be 45 and let \(b\) be 48. Pay careful attention to the quotient and product identities.

log(45^48); 48*(log(45))
#> [1] 182.7198
#> [1] 182.7198
log(exp(45))
#> [1] 45
exp(log(48))
#> [1] 48
log(45*48); log(45) + log(48)
#> [1] 7.677864
#> [1] 7.677864
log(45/48); log(45) - log(48)
#> [1] -0.06453852
#> [1] -0.06453852
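The check above skips a few identities from the list, so here is a similar base R sketch for them. One wrinkle: with \(a = 45\) and \(b = 48\), exp(45 * 48) is exp(2160), which overflows a double-precision number and returns Inf. The exponent identity therefore uses smaller, hypothetical values (a = 2, b = 3).

```r
# log(e) = 1 and log(1) = 0
log(exp(1))
#> [1] 1
log(1)
#> [1] 0

# exp(45 * 48) overflows a double, so the a = 45, b = 48 example won't work here
exp(45 * 48)
#> [1] Inf

# exp(a*b) = (exp(a))^b, with small hypothetical values
a <- 2; b <- 3
exp(a * b); exp(a)^b
#> [1] 403.4288
#> [1] 403.4288
```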

I do want to talk about one application of the quotient rule (i.e. \(log(a/b) = log(a) - log(b)\)). This is the percentage change approximation rule (of thumb) in which log differences are understood as percent changes. Here’s a quick formulation of this that is worth proving because it’s going to matter a great deal to how we deal with log-transformed dependent variables. Suppose a linear model of a logged outcome, \(log(y) = \alpha + \beta x\). Let there be two values, \(y'\) and \(y\), where \(y\) is observed at \(x\) and \(y'\) is observed at \((x + 1)\) (i.e. a one-unit change). The quotient rule then works out as follows, with the intercept canceling out.

\[log(y') - log(y) = log(y'/y) = [\alpha + \beta(x + 1)] - [\alpha + \beta x] = \beta\]

In other words, our regression coefficient is our estimate of the difference between \(log(y')\) and \(log(y)\) (alternatively: \(log(y'/y)\)). Let’s start with the quotient, which we know has a logarithmic identity. We can exponentiate that logarithmic identity and the following equivalences come out.

\[exp(log(y'/y)) = exp(\beta) = y'/y = 1 + (\frac{y'-y}{y})\]
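A minimal numerical sketch of that derivation, assuming a hypothetical log-lin relationship with intercept alpha = 3 and slope beta = 0.05 (values chosen only for illustration, not estimated from anything). The ratio \(y'/y\) for a one-unit change in \(x\) is the same wherever you start, and it equals \(exp(\beta)\).

```r
# Hypothetical log-lin model: log(y) = alpha + beta * x
alpha <- 3
beta  <- 0.05
y_at  <- function(x) exp(alpha + beta * x)

# The ratio y'/y for a one-unit change is identical at any starting x...
x     <- c(0, 10, 50)
ratio <- y_at(x + 1) / y_at(x)
ratio
#> [1] 1.051271 1.051271 1.051271

# ...and it equals exp(beta); the intercept has dropped out
exp(beta)
#> [1] 1.051271
```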

We’d say from this that \(\frac{y'-y}{y}\) is the relative change and that \(\frac{y'-y}{y}*100 = 100*(exp(\beta) - 1)\) is the percentage change. The shortcut of reading \(100*\beta\) directly as a percentage change comes by way of something that I admittedly have very little appetite to discuss: Taylor series and the Maclaurin series (a Taylor series centered at 0). I won’t pretend to have the most sophisticated treatment here, but the operative language we use is that, for very small values close to 0, the Maclaurin series \(exp(x) = 1 + x + x^2/2! + x^3/3! + \ldots\) shows \(exp(x) - 1 \approx x\) (alternatively, \(exp(x) - x \approx 1\)). Thus, \(100*(exp(\beta) - 1) \approx 100*\beta\) when \(\beta\) is small. Calculus was never my strong suit, but you’ll just have to commit some of that to memory.
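To see how quickly that approximation degrades, here is a sketch comparing the exact percentage change with the \(100*\beta\) rule of thumb across a range of hypothetical coefficients (the values are for illustration only).

```r
# Hypothetical coefficients, from small to large, for illustration only
beta <- c(0.001, 0.01, 0.05, 0.10, 0.25, 0.50)

exact  <- 100 * (exp(beta) - 1)  # exact percentage change
approx <- 100 * beta             # Maclaurin-based rule of thumb

# The gap is 100 times the x^2/2! + x^3/3! + ... terms that the rule of
# thumb drops: negligible near 0, but it grows quickly as beta gets bigger
data.frame(beta, exact = round(exact, 3), approx, gap = round(exact - approx, 3))
```

Base R also has expm1(), which computes \(exp(x) - 1\) directly and is more numerically accurate for very small \(x\).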

The Data and the Models

The data I’ll be using for this post are a classic data set in pedagogical instruction. These are the Chatterjee and Price (1977) education expenditure data. In this classic case, Chatterjee and Price (1977) are trying to model projected per capita public school expenditures (edexppc) as a function of the number of residents (per thousand) living in urban areas (urbanpop), the state-level income per capita (incpc), and the number of residents (per thousand) under 18 years of age (pop).¹ These data are relatively well-known for statistical instruction because they are an often-cited case of heteroskedasticity.² You could further teach the jackknife and the bootstrap around them with minimal effort. We will not bother with those issues here, though it’s worth reiterating that the interpretation of log-transformed regression parameters should come with a caveat about other aspects of model fit and OLS’ assumptions.

The data look like this.

The Chatterjee and Price (1977) Education Expenditure Data Set
State	Urban Population	Income per Capita	Under-18 Population	Education Expenditure per Capita Forecasts
AK	831	5309	333	311
AL	584	3724	332	208
AR	500	3680	320	221
AZ	796	4504	340	332
CA	909	5438	307	332
CO	785	5046	324	304
CT	774	5889	307	317
DE	722	5540	328	344
FL	805	4647	287	243
GA	603	4243	339	250
HI	484	5613	386	546
IA	572	4869	318	232
ID	541	4323	344	268
IL	830	5753	320	308
IN	649	4908	329	264
KS	661	5057	304	337
KY	523	3967	325	216
LA	661	3825	355	244
MA	846	5233	305	261
MD	766	5331	323	330
ME	508	3944	325	235
MI	738	5439	337	379
MN	664	4921	330	378
MO	701	4672	309	231
MS	445	3448	358	215
MT	534	4418	335	302
NC	450	4120	321	245
ND	443	4782	333	246
NE	615	4827	318	268
NH	564	4578	323	231
NJ	889	5759	310	285
NM	698	3764	366	317
NV	809	5560	330	291
NY	856	5663	301	387
OH	753	5012	324	221
OK	680	4189	306	234
OR	671	4697	305	316
PA	715	4894	300	300
RI	871	4780	303	300
SC	476	3817	342	233
SD	446	4296	330	230
TN	588	3946	315	212
TX	797	4336	335	269
UT	804	4005	378	315
VA	631	4715	317	261
VT	322	4011	328	270
WA	726	4989	313	312
WI	659	4634	328	342
WV	390	3828	310	214
WY	605	4813	331	323

We are going to run four different regressions on these data, with an eye toward understanding forecasts of education expenditure per capita as a function of all three of these other variables. In the first case, we’re going to use the raw, untransformed scale of all the variables (i.e. nothing is log-transformed). In the second model, we’re going to log-transform just the dependent variable. In the third model, we’re going to pick one of the independent variables to transform (the under-18 population variable) and we’re going to leave the dependent variable on its original scale. In the fourth and final model, we’re going to log-transform the dependent variable in addition to this under-18 population variable.

Let’s perform the analyses with the code below. Vincent Arel-Bundock’s {modelsummary} magic is happening under the hood to format the regression table.

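# Log-transform the DV and the under-18 population variable
CP77 %>% mutate(ln_edexppc = log(edexppc),
                ln_pop = log(pop)) -> CP77

M1 <- lm(edexppc ~ urbanpop + incpc + pop, CP77)        # nothing logged
M2 <- lm(ln_edexppc ~ urbanpop + incpc + pop, CP77)     # "log-lin": DV logged
M3 <- lm(edexppc ~ urbanpop + incpc + ln_pop, CP77)     # "lin-log": IV logged
M4 <- lm(ln_edexppc ~ urbanpop + incpc + ln_pop, CP77)  # "log-log": both logged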

Multiple Regressions of Education Expenditure per Capita from Chatterjee and Price (1977)
	No Transformations	DV is Log-Transformed	Under-18 Population is Log-Transformed	DV and Under-18 Population are Both Log-Transformed
Under-18 Population	1.552***	0.005***	503.206***	1.495***
	(0.315)	(0.001)	(106.804)	(0.341)
Urban Population	−0.004	0.000	−0.003	0.000
	(0.051)	(0.000)	(0.052)	(0.000)
Income per Capita	0.072***	0.000***	0.072***	0.000***
	(0.012)	(0.000)	(0.012)	(0.000)
Intercept	−556.568***	3.026***	−2961.222***	−4.128*
	(123.195)	(0.395)	(632.959)	(2.019)
Num.Obs.	50	50	50	50
R2 Adj.	0.565	0.568	0.551	0.559
+ p < 0.1, * p < 0.05, ** p < 0.01, *** p < 0.001

Normally, a write-up of this regression strategy would just note that the under-18 population variable has a positive, statistically significant coefficient whether it (and the DV) is on its raw scale or log-transformed. Or: “the (hypothesized, positive) effect of the under-18 population variable is robust to different variable specifications/transformations.” However, what each of these coefficients communicates is actually kind of different. Let’s start with the first, and simplest, case.

Neither the IV nor the DV is Log-Transformed

This is the simplest case and I don’t want to belabor it too much because I’m assuming a rudimentary understanding of OLS. In this instance, the regression coefficient is communicating a basic takeaway: a one-unit change in the number of residents (per thousand) under 18 years of age coincides with an estimated change of about 1.552 in the projected per capita public school expenditures for 1975. This isn’t too difficult to understand; it’s the ideal case. We can use basic model predictions to show just that.

CP77 %>%
  data_grid(.model = M1,
            pop = seq(min(pop), max(pop), by=1)) %>%
  mutate(pred = predict(M1, .),
         diff = pred - lag(pred, 1)) -> predM1

predM1
#> # A tibble: 100 × 5
#>      pop urbanpop incpc  pred  diff
#>    <dbl>    <int> <int> <dbl> <dbl>
#>  1   287      661  4697  226. NA   
#>  2   288      661  4697  228.  1.55
#>  3   289      661  4697  229.  1.55
#>  4   290      661  4697  231.  1.55
#>  5   291      661  4697  232.  1.55
#>  6   292      661  4697  234.  1.55
#>  7   293      661  4697  235.  1.55
#>  8   294      661  4697  237.  1.55
#>  9   295      661  4697  238.  1.55
#> 10   296      661  4697  240.  1.55
#> # ℹ 90 more rows

We can summarize those model predictions we just gathered as well and compare them to the regression coefficient.

predM1 %>% 
  # every first difference is identical up to floating-point noise,
  # so this "mean" just collapses them into a single value
  summarize(mean = mean(diff, na.rm = TRUE)) %>% pull()
#> [1] 1.552054

coef(M1)[4]
#>      pop 
#> 1.552054

So yeah, this wasn’t too hard. When nothing in the model is log-transformed, a one-unit change in the value of the independent variable coincides with an estimated change in the value of the dependent variable equal to the regression coefficient. That’s easy, but it’s also worth repeating.

The DV is Log-Transformed, but the IV Isn’t

This changes a little bit when the dependent variable is log-transformed as practitioners sometimes like to do to impose normality on positive, (typically?) right-skewed variables. After all, skewed dependent variables are typically the culprit of some OLS diagnostic problems (e.g. heteroskedasticity, non-normal residuals). It’s tempting—and I suppose, not completely dishonest—to say that a one-unit change in the under-18 population variable coincides with an estimated change of about .005 in the log-transformed value of the dependent variable. After all, R may or may not know that this dependent variable is a logarithmic transformation of something else. At least, if it knows, it doesn’t care. The practitioner has to know something else is happening in this so-called “log-lin” model.

Let’s illustrate what’s happening here in a stylized form. First, let’s create a simple prediction grid from this model where the income per capita and urban population variables are both fixed at their central tendency. When given a .model argument, data_grid() fills in “typical” values for the remaining regressors, which for numeric variables is the median. We’re going to set the under-18 population variable to two values: the median and the median plus 1. Our simple data look like this.

CP77 %>%
  data_grid(.model = M2,
            pop = c(median(pop), median(pop) + 1)) -> newdat

newdat %>% data.frame
#>     pop urbanpop incpc
#> 1 324.5      661  4697
#> 2 325.5      661  4697

Now, we’re going to get the model’s estimated (logged) education expenditure per capita for each of these two rows. Comparing the difference between them to the regression coefficient, we find what we found before: the difference is the regression coefficient.

newdat %>%
  mutate(pred = predict(M2, newdata =.)) -> newdat

newdat %>% mutate(diff = pred - lag(pred, 1)) %>%
  data.frame
#>     pop urbanpop incpc     pred        diff
#> 1 324.5      661  4697 5.630567          NA
#> 2 325.5      661  4697 5.635155 0.004587787

coef(M2)[4]
#>         pop 
#> 0.004587787

Hold on, though! The dependent variable is a logarithm and logarithms have special rules, some of which we introduced above. In particular, the quotient rule is going to apply here: \(log(a) - log(b) = log(a/b)\). In our context here, we have two different values of the dependent variable. One, when the under-18 population variable is at the median, is about 5.630567. The other, when the under-18 population variable increases by one unit, is about 5.635155. The difference between them is the regression coefficient, yes, but those two values are logarithms. Thus, this quotient rule applies and we can understand the linear model as communicating the following things, rounded for clarity, and omitting the other regressors (which are themselves fixed). Let \(y\) be the value of estimated education expenditures per capita when the under-18 population variable is at the median and let \(y'\) be the value of estimated education expenditures per capita when the under-18 population variable increases by 1.

\[\begin{gathered} log(y) = 5.630567 \\ log(y') = 5.635155 \\ log(y') - log(y) = \beta_{pop} \\ log(y'/y) = \beta_{pop} \end{gathered}\]
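Plugging the two rounded predictions from the output above into R shows the quotient rule at work. This is a sketch using those copied values rather than the model object: the difference of the logs matches the coefficient up to rounding, and exponentiating that difference gives the same ratio as dividing the exponentiated predictions.

```r
# Predicted log expenditures from M2 at median(pop) and median(pop) + 1,
# copied (rounded) from the output above
log_y       <- 5.630567
log_y_prime <- 5.635155

log_y_prime - log_y            # the pop coefficient, up to rounding
exp(log_y_prime) / exp(log_y)  # y'/y on the raw (dollar) scale
exp(log_y_prime - log_y)       # the same ratio, via the quotient rule
```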

A few interesting things are happening here. For one, recall that the dependent variable is log-transformed, and the mean of a logged variable has a special identity: exponentiating it returns the geometric mean of the raw variable. So yes, the model estimates an (arithmetic) mean of the log-transformed variable, but in its exponential form that quantity is a geometric mean, not an arithmetic mean, of the raw variable. Second, the regression coefficient is the logarithm of a ratio (\(log(y'/y)\)) because of this quotient rule. Third, there’s another property lurking around here of interest to us: the percentage change approximation rule. With those in mind, here are a few ways you could unpack this quantity in relation to its untransformed (raw) scale, given some of the logarithmic identities introduced above. Our focus here is just on the under-18 population variable in order to maintain consistency throughout the post.
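A minimal sketch of the geometric mean identity, using four per capita expenditure values from the table above (AL, AR, AZ, and HI), picked only for illustration. Exponentiating the mean of the logs gives the textbook geometric mean, which sits below the arithmetic mean whenever the values aren't all equal.

```r
# A few per capita expenditure values from the table above (AL, AR, AZ, HI)
y <- c(208, 221, 332, 546)

geo_via_logs <- exp(mean(log(y)))        # exponentiated mean of the logs
geo_direct   <- prod(y)^(1 / length(y))  # textbook geometric mean
arith        <- mean(y)                  # arithmetic mean

all.equal(geo_via_logs, geo_direct)
#> [1] TRUE
geo_via_logs < arith
#> [1] TRUE
```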

Summarizing Our log(DV)~IV Model

\(exp(.004587787) = 1.004598\), or: a one-unit change in the residents (per thousand) under the age of 18 multiplies the expected per capita public school expenditures by about 1.004598. You can plug in any value here as well. For example, \(exp(.004587787*20) = 1.096097\), or: a 20-unit change in the residents (per thousand) under the age of 18 (which is incidentally about a one-standard-deviation change in this variable) multiplies the expected per capita public school expenditures by about 1.096.
The first quantity also contains the relative change and percentage change, though you’d have to subtract 1. For example: \((exp(.004587787) - 1)*100 = .4598\). Or: a one-unit change in the residents (per thousand) under the age of 18 coincides with an estimated .4598 percent increase in expected per capita public school expenditures. The relative increase can be obtained by dividing the percentage increase by 100, or by simply not multiplying the exponentiated coefficient (minus 1) by 100.
The percentage change rule (of thumb) of logarithmic differences says a one-unit change in the residents (per thousand) under the age of 18 coincides with an approximately \(100*(.004587787) \approx .46\) percent increase in expected per capita public school expenditures. This one is typically everyone’s go-to for cases where the DV is logged but the IV is not. Just understand the percentage change rule is always approximate, though it’s fair to use it because your treatment of the regression coefficients is always approximate too.
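Here is a base R sketch of the "plug in any value" point, using the pop coefficient from M2 copied from the output above. It shows that the multiplier for a \(k\)-unit change, \(exp(\beta k)\), is just the one-unit multiplier compounded \(k\) times (the \(exp(a*b) = (exp(a))^b\) identity), and that the \(100*\beta*k\) rule of thumb drifts further from the exact percentage change as \(k\) grows. The choice of k values is arbitrary.

```r
beta <- 0.004587787  # coefficient on pop from M2, copied from the output above

# Multiplicative effect on expected expenditures of a k-unit change in pop
k    <- c(1, 10, 20)
mult <- exp(beta * k)

# Same as compounding the one-unit multiplier k times: exp(a*b) = (exp(a))^b
all.equal(mult, exp(beta)^k)
#> [1] TRUE

# Exact percentage changes versus the 100 * beta * k rule of thumb
round(100 * (mult - 1), 4)
round(100 * beta * k, 4)
```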

We can use a prediction grid to illustrate this. Here, let’s create a prediction grid with a sequence from the minimum to the maximum of the under-18 population variable, fixing the other regressors at a typical value. Then, we’ll create model predictions based on the model with the logged dependent variable. We can exponentiate those model predictions, create a variable for the relative change from the previous value (i.e. \(\frac{y'-y}{y}\)), and multiply that by 100 to get the percentage change. What emerges is consistent with what we did above, though we are relating the discussion back to our raw (untransformed) variable that we log-transformed for the regression.

CP77 %>%
  data_grid(.model = M2,
            pop = seq(min(pop), max(pop), by=1)) -> newdat

newdat %>% 
  mutate(pred = predict(M2, newdata=.),
         exppred = exp(pred)) %>% 
  mutate(relchange = (exppred - lag(exppred))/lag(exppred, 1),
         perchange = relchange*100) %>%
  summarize_at(c("relchange", "perchange"), ~mean(.,na.rm=T)) %>%
  data.frame
#>     relchange perchange
#> 1 0.004598327 0.4598327

# Let's compare to our coefficient from M2 now
(exp(coef(M2)[4])-1); (exp(coef(M2)[4])-1)*100 
#>         pop 
#> 0.004598327
#>       pop 
#> 0.4598327

# Technically not it, but close
coef(M2)[4]*100
#>       pop 
#> 0.4587787

The DV isn’t Log-Transformed, but the IV is

This is the so-called “lin-log” model and I see this less often than I see the so-called “log-lin” model. In our case, a rudimentary explanation of the under-18 population coefficient would say “a one-unit change in the logged value of the under-18 population variable coincides with an estimated change of about 503 in the estimated per capita education expenditures.” Or, words to that effect. This is already an odd thing to say because the transformation of the independent variable will juice up the absolute value of the coefficient (i.e. it reduces the scale of the variable in the model), which might artificially make the effect look “big.” In our particular case, there is no possible increase of 1 on the log scale for this variable. The logged minimum is about 5.66 and the logged maximum is about 5.96. A one-unit change doesn’t exist on the log scale here. Perhaps that is reason to not have log-transformed this variable in the first place, though the point of this exercise ignores that question altogether (i.e. I’m more interested in explaining how to interpret the model output in the presence of log transformations).

That said, something else is happening here. It’s a comparison of what the dependent variable is estimated to be under two hypotheticals. The first is \(log(pop)\) and the second is \(log(pop) + 1\). Whatever that change comes out to is our regression coefficient (of about 503 in this case). However, we need to break this down into little pieces. To start, 1 in the context of a logarithmic variable can be understood as the log of Leonhard Euler’s constant (\(e\)). Thus, \(log(x) + 1\) can also be restated as \(log(x) + log(e)\), which we have available to us because this variable was log-transformed before plugging it into the model. That could further be restated as \(log(xe)\), given the product identity introduced above. Fundamentally, the regression coefficient is communicating a proportional change: how much the dependent variable is estimated to change if you were to multiply a value of the independent variable by Leonhard Euler’s constant.
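A quick base R check of that argument, assuming the rounded M3 coefficient from the table (503.206) and the median of pop used earlier in the post (324.5). Adding 1 on the log scale is the same as multiplying by \(e\) on the raw scale, so the implied change in the DV from that multiplication is the coefficient itself.

```r
beta <- 503.206  # coefficient on ln_pop from M3, as reported in the table
x    <- 324.5    # median of pop, as used earlier in the post

# Adding 1 on the log scale is the same as multiplying by e on the raw scale
all.equal(log(x) + 1, log(x * exp(1)))
#> [1] TRUE

# So the implied change in the DV from multiplying pop by e is beta
beta * (log(x * exp(1)) - log(x))
#> [1] 503.206
```

Note that 324.5 times \(e\) is about 882, far beyond the observed maximum of 386, which is the same oddity raised above about a one-unit change on the log scale.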

There’s another way of looking at this, much like we did above with the percentage change approximation rule. Comparing two values of the per capita education expenditures (\(y', y\)) at any two values of the under-18 population variable (\(x', x\)) can be written as \(y' - y = 503.206*(log(x') - log(x)) = 503.206*log(x'/x)\). Our percentage rule of thumb will reappear in how we can summarize this relationship, though this time it’s on the right-hand side of the formula. Here are the many ways you could summarize the “lin-log” model we estimated.

Summarizing Our DV~log(IV) Model

The estimated change in per capita expenditures when the under-18 population variable is multiplied by Leonhard Euler’s constant is 503.206.
The estimated change in per capita expenditures is 503.206 when the under-18 population variable increases by \(100*(e - 1) \approx 171.83\%\).
A 1% change in the under-18 population variable changes the estimated per capita education expenditures by \(503.206*log(1.01) = 5.007\). A 10% change in the under-18 population variable changes the estimated per capita expenditures by \(503.206*log(1.10) = 47.96\). A 20% change in the under-18 population variable changes the estimated per capita expenditures by \(503.206*log(1.20) = 91.75\).
Alternatively, and this is the one I see most often used given the percentage change approximation rule of thumb: a 1% change in the under-18 population variable changes the estimated per capita expenditures by \(503.206/100 = 5.03206\).
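The list above can be sketched in a few lines of base R, assuming the rounded M3 coefficient from the table. It compares the change the model actually implies for a given percentage increase, \(\beta*log(1 + p)\), with the \(\beta/100\)-per-percent rule of thumb scaled to the same increase.

```r
beta <- 503.206              # coefficient on ln_pop from M3, as reported in the table
p    <- c(0.01, 0.10, 0.20)  # 1%, 10%, and 20% increases in pop

exact  <- beta * log(1 + p)  # change in the DV the model actually implies
approx <- beta * p           # rule of thumb: beta/100 per 1%, times the % increase

round(data.frame(pct = 100 * p, exact, approx), 2)
```

The two agree closely at 1% (about 5.01 versus 5.03), but by 20% the rule of thumb overstates the implied change by roughly 9 dollars, since \(log(1 + p) < p\) for any positive \(p\).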

I see the fourth approach taught to students more often than the third. While I would not object to a student doing this, I think the third is more honest. Consider the code below as proof of concept. In this case, I’m starting a vector of 20 at a value of 322, leaving the next 19 spots blank. Then, I’m going to loop through the vector and fill each spot with the previous value, increased by 1%. Nineteen compounding 1% increases take us from 322 to about 389, which covers the upper part of the variable’s observed range (287 to 386); the starting value doesn’t matter for the first differences, since every step is the same change on the log scale. Then, I’m going to create a prediction grid of its logarithm and get model predictions, summarizing the first differences of those predictions. While the \(\beta/100\) interpretation isn’t necessarily wrong for communicating the regression coefficient for a change of 1% (it’s an admitted approximation!), \(\beta*log(1.01)\) is the more accurate summary.

x <- c(322, rep(NA, 19))

for (i in 2:20) {
  x[i] <- x[i-1]*1.01
}

CP77 %>%
    data_grid(.model = M3,
              ln_pop = log(x)) -> newdat

newdat %>%
  mutate(pred = predict(M3, newdata =.),
         diff = pred - lag(pred, 1)) %>%
  summarize(mean = mean(diff, na.rm=T)) %>% data.frame
#>       mean
#> 1 5.007064

# Let's compare to our coefficient from M3
log(1.01)*coef(M3)[4]
#>   ln_pop 
#> 5.007064

# Technically not it, but close
coef(M3)[4]/100
#>   ln_pop 
#> 5.032058

Both the DV and the IV are Log-Transformed

Beyond the simple case where nothing is log-transformed, I think this so-called “log-log” model is the most straightforward to understand. We may have struggled to internalize that the log-transformed dependent variable now functionally communicates a ratio because of the quotient rule, but we learned that a one-unit change in the (not log-transformed) independent variable communicated some estimated percent change in the underlying dependent variable that was log-transformed. \(\beta*100\) is a useful way of summarizing/approximating it. We may have found it odd to think about a log-transformed independent variable also having this property, but we also learned that a 1% change in the independent variable (underneath the logarithmic transformation) has a \(\beta*log(1.01) \approx \beta/100\) change in the value of the dependent variable. Because the quotient rule applies to both the dependent variable and the independent variable in this case, we have both: an estimated percentage change in the dependent variable for some percentage change in the independent variable.

Summarizing Our log(DV)~log(IV) Model

The estimated multiplicative change in per capita expenditures when the under-18 population variable is multiplied by Leonhard Euler’s constant is about \(exp(1.495) \approx 4.459\). This would be about a 4.459-fold increase when the under-18 population variable is multiplied by Leonhard Euler’s constant (i.e. increases by 1 on its logarithmic scale).
The estimated multiplicative change in per capita education expenditures for a 1% change in the under-18 population is \(1.01^{1.495} \approx 1.0150\). The relative change in per capita education expenditures is about .0150 for a 1% increase in the under-18 population variable and the percentage change in per capita education expenditures is about 1.50% for a 1% increase in the under-18 population variable.
This estimated percentage change in per capita education expenditures for a 1% change in the under-18 population can be approximated with the regression coefficient itself: \(\beta = 1.495\), or about a 1.495% increase.
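Here is a sketch of how far that approximation stretches, assuming the M4 coefficient copied from the output below. It compares the exact percentage change for a \(p\) increase in the IV, \(100*((1 + p)^\beta - 1)\), with reading \(\beta\) as a "percent change per 1%" elasticity.

```r
beta <- 1.495201             # coefficient on ln_pop from M4, copied from the model output
p    <- c(0.01, 0.10, 0.20)  # 1%, 10%, and 20% increases in pop

exact_pct  <- 100 * ((1 + p)^beta - 1)  # exact percentage change in the DV
approx_pct <- 100 * beta * p            # reading beta as "% change per 1%"

round(data.frame(pct = 100 * p, exact_pct, approx_pct), 3)
```

Because \(\beta > 1\) here, \((1 + p)^\beta \geq 1 + \beta p\) (Bernoulli's inequality), so the rule of thumb slightly understates the exact change at 1% and falls further behind at larger increases.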

We can use an amalgam of the same code above (having already generated our 1% changes in the under-18 population variable) to illustrate what this looks like.

CP77 %>%
    data_grid(.model = M4,
              ln_pop = log(x)) -> newdat

newdat %>%
  mutate(pred = predict(M4, newdata=.),
         exppred = exp(pred)) %>% 
  mutate(relchange = (exppred - lag(exppred))/lag(exppred, 1),
         perchange = relchange*100) %>%
  summarize_at(c("relchange", "perchange"), ~mean(.,na.rm=T)) %>%
  data.frame
#>    relchange perchange
#> 1 0.01498897  1.498897

# Let's compare to our coefficient from M4
(1.01^coef(M4)[4]-1);(1.01^coef(M4)[4]-1)*100 
#>     ln_pop 
#> 0.01498897
#>   ln_pop 
#> 1.498897

# Technically not it, but close
coef(M4)[4]
#>   ln_pop 
#> 1.495201

Conclusion

It’s not completely dishonest to summarize a linear model with logarithmic transformations in general language. A researcher could summarize a model in which only the dependent variable is log-transformed by saying “a one-unit change in the independent variable coincides with an estimated change of \(\beta\) in the log-transformed dependent variable.” Perhaps the goal is just to identify statistical significance and direction, in which case the language can be even more general. However, it’s not too much effort to relay the log transformations back to their untransformed values. Different procedures apply for different conditions of log transformation (i.e. whether the DV or IV is log-transformed, or if both are), but think of it this way. The standard model in which nothing is log-transformed communicates unit changes and the presence of log transformations communicates proportional changes on the untransformed scale. How you communicate those depends on what exactly is log-transformed, but be mindful that the presence of logarithmic transformations means there are multiplicative/proportional effects you should unpack from the model output.
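As a compact recap, here is a hypothetical helper, implied_effect(), which is my own illustration and not a function from {modelsummary} or any other package. It maps a single coefficient to the effects discussed in each section, ignores uncertainty entirely, and assumes change is a unit change for an unlogged IV but a proportional change (0.01 = 1%) for a logged IV. The coefficients plugged in are copied from the model output above.

```r
# Hypothetical helper (not from any package): the change in y implied by a
# coefficient, given which sides of the model are log-transformed.
#   - unlogged IV: `change` is a change in units of x
#   - logged IV:   `change` is a proportional change in x (0.01 = 1%)
#   - unlogged DV: result is a change in units of y
#   - logged DV:   result is a relative change in y (times 100 for a percentage)
implied_effect <- function(beta, dv_logged = FALSE, iv_logged = FALSE, change = 1) {
  dx <- if (iv_logged) log(1 + change) else change
  if (dv_logged) exp(beta * dx) - 1 else beta * dx
}

implied_effect(1.552054)                                   # M1: units of y per unit of pop
implied_effect(0.004587787, dv_logged = TRUE)              # M2: relative change per unit of pop
implied_effect(503.206, iv_logged = TRUE, change = 0.01)   # M3: units of y per 1% in pop
implied_effect(1.495201, dv_logged = TRUE, iv_logged = TRUE,
               change = 0.01)                              # M4: relative change per 1% in pop
```

Setting change = exp(1) - 1 for a logged IV reproduces the “multiply by \(e\)” interpretation from the lin-log and log-log sections.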

¹ The urban population data are benchmarked to 1970. The income per capita variable is supposedly from 1973. The under-18 variable is supposedly from 1974. The per capita public school expenditures data are forecasts for the year 1975. The data are admittedly ancient, but they are useful for this purpose.
² This post won’t address the issue of passing a Breusch-Pagan test for heteroskedastic errors, but it’s worth saying that the model with the log-transformed dependent variable is sufficient to pass this test. We often tell students that logarithmic transformations with positive real variables fix issues of skew, especially focusing on the dependent variable.