
linloess_plot() provides a visual diagnostic of the linearity assumption of the OLS model. Given an OLS model fit by lm() in base R, the function extracts the model frame and creates a faceted scatterplot. For each facet, a linear smoother and a LOESS smoother are estimated over the points. Users can then assess how much the two smoothers diverge. The more they diverge, the more reason the user has to question whether the OLS model is a good fit as specified. The plot will also point to potential outliers that may need further consideration.
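
The comparison made in each facet can be approximated by hand for a single predictor. The sketch below uses only base R. It is not the function's own implementation, and both the choice of wt and the max_gap summary are illustrative assumptions.

```r
# Same model as in the Examples section.
M1 <- lm(mpg ~ ., data = mtcars)

# One facet's worth of data: the outcome against a single predictor (wt).
mf <- model.frame(M1)
d <- data.frame(x = mf$wt, y = mf$mpg)

# Linear and LOESS smoothers over the same points.
lin_fit <- lm(y ~ x, data = d)
lo_fit <- loess(y ~ x, data = d, span = 0.75)

# Largest absolute gap between the two fitted curves at the observed points.
# This is an ad hoc summary; the plot itself is meant to be read by eye.
max_gap <- max(abs(predict(lin_fit) - predict(lo_fit)))
max_gap
```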

Usage

linloess_plot(
  mod,
  resid = TRUE,
  smoother = "loess",
  se = TRUE,
  span = 0.75,
  ...
)

Arguments

mod

a fitted OLS model

resid

logical, defaults to TRUE. If FALSE, the y-axis on these plots is the raw values of the dependent variable. If TRUE, the y-axis is the model's residuals. Either works well for the matter at hand, provided you treat the output as illustrative or suggestive.
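
One thing worth knowing when reading plots of residuals: for a numeric predictor that enters the model as-is alongside an intercept, OLS residuals are orthogonal to that predictor, so a linear fit of the residuals on it is flat at zero by construction. Any curvature in the LOESS line is then the thing to look at. A base R sketch, with wt as an arbitrary choice of predictor:

```r
M1 <- lm(mpg ~ ., data = mtcars)

# Residuals against one predictor that is already in the model.
d <- data.frame(x = mtcars$wt, r = resid(M1))

# The linear fit has intercept and slope of zero, up to floating-point error.
lin_on_resid <- lm(r ~ x, data = d)
round(coef(lin_on_resid), 8)

# The LOESS fit is free to bend where the linear specification misses something.
lo_on_resid <- loess(r ~ x, data = d, span = 0.75)
range(predict(lo_on_resid))
```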

smoother

defaults to "loess", and is passed to the 'method' argument for the non-linear smoother.

se

logical, defaults to TRUE. If TRUE, gives standard error estimates with the assorted smoothers.

span

a numeric, defaults to 0.75. An adjustment to the smoother. Higher values produce smoother lines and might be warranted in the presence of sparse pockets of data.
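
To get a feel for what span does, the sketch below fits stats::loess() twice to one pair of variables and reports the equivalent number of parameters (the enp component of a loess fit), where lower values indicate a less flexible, smoother curve. That linloess_plot() forwards span to the smoother in exactly this way is an assumption here, and the variables and span values are arbitrary.

```r
# One outcome and one predictor from mtcars, chosen only for illustration.
d <- data.frame(x = mtcars$wt, y = mtcars$mpg)

# Two LOESS fits that differ only in span.
lo_narrow <- loess(y ~ x, data = d, span = 0.5)
lo_wide <- loess(y ~ x, data = d, span = 0.9)

# Compare the equivalent number of parameters of the two fits.
c(narrow = lo_narrow$enp, wide = lo_wide$enp)
```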

...

optional parameters, passed to the scatterplot (geom_point()) component of this function. Useful if you want to make the smoothers more legible against the points.

Value

linloess_plot() returns a faceted scatterplot as a ggplot2 object. The linear smoother is in solid blue (with blue standard error bands) and the LOESS smoother is a dashed black line (with gray/default standard error bands). You can add cosmetic features to it after the fact. The function may emit warnings related to the LOESS smoother, depending on your data. I think these are fine to the extent that this is really just a visual aid and an informal diagnostic for the linearity assumption.
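
Because the return value is a ggplot2 object, cosmetic layers can be added with `+`. A minimal sketch, assuming the function is available from the stevemisc package and that ggplot2 is installed; the theme and title are arbitrary choices:

```r
M1 <- lm(mpg ~ ., data = mtcars)

p <- NULL
# Only build the plot if both packages are available (assumed package names).
if (requireNamespace("stevemisc", quietly = TRUE) &&
    requireNamespace("ggplot2", quietly = TRUE)) {
  p <- stevemisc::linloess_plot(M1, color = "black", pch = 21) +
    ggplot2::theme_minimal() +
    ggplot2::labs(title = "Linear vs. LOESS smoothers, by predictor")
}

# Printing p draws the plot; any LOESS warnings appear at that point.
```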

Details

This function makes an implicit assumption that there is no variable in the regression formula with the name ".y" or ".resid".
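
A quick way to confirm that a model does not run afoul of this assumption is to look at the column names of its model frame before plotting. The check below is a hypothetical convenience, not part of the function:

```r
M1 <- lm(mpg ~ ., data = mtcars)

# Names the function is assumed to reserve internally, per the note above.
reserved <- c(".y", ".resid")

# Any overlap with the model frame's columns signals a potential clash.
clash <- intersect(names(model.frame(M1)), reserved)
if (length(clash) > 0) {
  warning("Rename these variables before plotting: ",
          paste(clash, collapse = ", "))
}
```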

It may be in your interest (for the sake of rudimentary diagnostic checks) to disable the standard error bands for particularly ill-fitting linear models.

Author

Steven V. Miller

Examples


M1 <- lm(mpg ~ ., data=mtcars)

linloess_plot(M1)
#> `geom_smooth()` using formula = 'y ~ x'
#> `geom_smooth()` using formula = 'y ~ x'
#> Warning: pseudoinverse used at -0.005
#> Warning: neighborhood radius 1.005
#> Warning: reciprocal condition number  0
#> Warning: There are other near singularities as well. 1.01
#> Warning: pseudoinverse used at -0.005
#> Warning: neighborhood radius 1.005
#> Warning: reciprocal condition number  0
#> Warning: There are other near singularities as well. 1.01
#> Warning: pseudoinverse used at 4
#> Warning: neighborhood radius 2
#> Warning: reciprocal condition number  1.8444e-17
#> Warning: pseudoinverse used at 4
#> Warning: neighborhood radius 2
#> Warning: reciprocal condition number  1.8444e-17
#> Warning: pseudoinverse used at 3.98
#> Warning: neighborhood radius 4.02
#> Warning: reciprocal condition number  6.1406e-17
#> Warning: There are other near singularities as well. 16.16
#> Warning: pseudoinverse used at 3.98
#> Warning: neighborhood radius 4.02
#> Warning: reciprocal condition number  6.1406e-17
#> Warning: There are other near singularities as well. 16.16
#> Warning: pseudoinverse used at 2.99
#> Warning: neighborhood radius 1.01
#> Warning: reciprocal condition number  0
#> Warning: There are other near singularities as well. 4.0401
#> Warning: pseudoinverse used at 2.99
#> Warning: neighborhood radius 1.01
#> Warning: reciprocal condition number  0
#> Warning: There are other near singularities as well. 4.0401
#> Warning: pseudoinverse used at -0.005
#> Warning: neighborhood radius 1.005
#> Warning: reciprocal condition number  0
#> Warning: There are other near singularities as well. 1.01
#> Warning: pseudoinverse used at -0.005
#> Warning: neighborhood radius 1.005
#> Warning: reciprocal condition number  0
#> Warning: There are other near singularities as well. 1.01

linloess_plot(M1, color="black", pch=21)
#> `geom_smooth()` using formula = 'y ~ x'
#> `geom_smooth()` using formula = 'y ~ x'
#> Warning: pseudoinverse used at -0.005
#> Warning: neighborhood radius 1.005
#> Warning: reciprocal condition number  0
#> Warning: There are other near singularities as well. 1.01
#> Warning: pseudoinverse used at -0.005
#> Warning: neighborhood radius 1.005
#> Warning: reciprocal condition number  0
#> Warning: There are other near singularities as well. 1.01
#> Warning: pseudoinverse used at 4
#> Warning: neighborhood radius 2
#> Warning: reciprocal condition number  1.8444e-17
#> Warning: pseudoinverse used at 4
#> Warning: neighborhood radius 2
#> Warning: reciprocal condition number  1.8444e-17
#> Warning: pseudoinverse used at 3.98
#> Warning: neighborhood radius 4.02
#> Warning: reciprocal condition number  6.1406e-17
#> Warning: There are other near singularities as well. 16.16
#> Warning: pseudoinverse used at 3.98
#> Warning: neighborhood radius 4.02
#> Warning: reciprocal condition number  6.1406e-17
#> Warning: There are other near singularities as well. 16.16
#> Warning: pseudoinverse used at 2.99
#> Warning: neighborhood radius 1.01
#> Warning: reciprocal condition number  0
#> Warning: There are other near singularities as well. 4.0401
#> Warning: pseudoinverse used at 2.99
#> Warning: neighborhood radius 1.01
#> Warning: reciprocal condition number  0
#> Warning: There are other near singularities as well. 4.0401
#> Warning: pseudoinverse used at -0.005
#> Warning: neighborhood radius 1.005
#> Warning: reciprocal condition number  0
#> Warning: There are other near singularities as well. 1.01
#> Warning: pseudoinverse used at -0.005
#> Warning: neighborhood radius 1.005
#> Warning: reciprocal condition number  0
#> Warning: There are other near singularities as well. 1.01