
sim_qi() is a function to simulate quantities of interest from a regression model

Usage

sim_qi(
  mod,
  nsim = 1000,
  newdata,
  original_scale = TRUE,
  return_newdata = FALSE
)

Arguments

mod

a model object

nsim

number of simulations to be run, defaults to 1,000

newdata

A data frame with a hypothetical prediction grid. If absent, defaults to the model frame.

original_scale

logical, defaults to TRUE. If TRUE, the simulations are returned on the model's original scale (e.g. a logit or a probit). If FALSE, the simulations are transformed to a more practical/intuitive quantity, which for now is the simulated probability. This argument is ignored for simulations from a linear model.

return_newdata

logical, defaults to FALSE. If TRUE, the output includes additional columns corresponding to the inputs provided to newdata. This may make later transformation easier and make it clearer what each simulation corresponds to.
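
As a hedged illustration of how newdata and return_newdata might be combined, the sketch below builds a small, hypothetical prediction grid for the linear model used in the Examples section. It assumes that sim_qi() is loaded from a package named simqi, and that the grid columns are attached to the output under their original names; neither is confirmed on this page. The mpg column is only a placeholder (see Details).

```r
library(simqi)  # assumption: sim_qi() is provided by the simqi package

set.seed(8675309)
M1 <- lm(mpg ~ cyl + wt, data = mtcars)

# Hypothetical prediction grid: four-cylinder cars at two weights.
# mpg is included only as a placeholder for the dependent variable.
newdat <- data.frame(cyl = 4, wt = c(2.5, 3.5), mpg = 0)

sims <- sim_qi(M1, nsim = 10, newdata = newdat, return_newdata = TRUE)

# Average the simulated values of y within each scenario of the grid.
aggregate(y ~ cyl + wt, data = sims, FUN = mean)
```

Grouping on the grid columns rather than on the placeholder keeps the summary tied to the scenarios you actually defined.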

Value

sim_qi() returns a data frame (as a tibble) with the quantities of interest and identifying information about the particular simulation number. For linear models, or simple generalized linear models where the dependent variable is either "there" or "not there", the quantity of interest returned is a single column (called y). For models where the underlying estimation of the dependent variable is, for lack of a better term, "multiple" (e.g. ordinal models with the basic proportional odds assumption), the number of columns returned corresponds to the number of distinct values of the outcome. For example, an ordinal model where there are five unique values of the dependent variable will return columns y1, y2, y3, y4, and y5.

Details

Specifying a variable in newdata with the exact same name as the dependent variable (e.g. mpg in the simple example provided in this documentation file) is necessary for matrix multiplication purposes. If you set return_newdata to TRUE, you should not interpret the column matching the name of the dependent variable as communicating the kind of information you want from this function. That particular column is just a simple placeholder you need for matrix multiplication. The information you want will always be in a column (or columns) named (or starting with) y.
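
To see why a placeholder column can be needed for matrix multiplication, here is a minimal base R sketch of the general simulation technique: draw coefficients from a multivariate normal distribution and multiply them by a design matrix. This is not necessarily how sim_qi() is implemented; the use of MASS::mvrnorm() and the object names betas, X, and y_sim are assumptions made for illustration.

```r
library(MASS)  # for mvrnorm()

set.seed(8675309)
M1 <- lm(mpg ~ cyl + wt, data = mtcars)

# Draw 10 coefficient vectors from a multivariate normal centered on the
# estimates, using the model's variance-covariance matrix.
betas <- mvrnorm(n = 10, mu = coef(M1), Sigma = vcov(M1))

# Hypothetical prediction grid with a placeholder for the dependent variable.
newdat <- data.frame(cyl = 4, wt = c(2.5, 3.5), mpg = 0)

# model.matrix() evaluates every variable in the formula, including mpg,
# so the placeholder column must exist even though its value is unused.
X <- model.matrix(formula(M1), data = newdat)

# Each row of y_sim is a grid scenario; each column is one simulation.
y_sim <- X %*% t(betas)
dim(y_sim)
```

Dropping mpg from newdat in this sketch makes model.matrix() fail because the variable cannot be found, which is the kind of problem the placeholder avoids.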

This function builds in an implicit assumption that your dependent variable in the regression model is not called y.

For ordinal models, I recommend setting original_scale to FALSE. Under the hood, the function actually calculates things at the level of the probability. It only transforms back to a logit or a probit if that is what you say you want.
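
The relationship between the two scales can be sketched in base R for a proportional odds model with a logit link. The linear predictor and thresholds below are hypothetical values chosen for illustration, not output from sim_qi(), and the names y1 through y5 simply mirror the columns described under Value.

```r
# Hypothetical values, for illustration only.
eta <- 0.4                      # linear predictor for one scenario
tau <- c(-1.2, -0.3, 0.8, 1.9)  # four thresholds for a five-category outcome

# Cumulative probabilities P(Y <= k) under a logit link.
cum_p <- plogis(tau - eta)

# Category probabilities are successive differences of the cumulative ones.
p <- diff(c(0, cum_p, 1))
names(p) <- paste0("y", 1:5)
p

# Mapping each probability back to the logit scale.
# For a probit link, pnorm() and qnorm() replace plogis() and qlogis().
qlogis(p)
```

The probabilities sum to one across the five categories, which is part of why they tend to be easier to interpret than the logit or probit values.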

Examples


set.seed(8675309)

M1 <- lm(mpg ~ cyl + wt, mtcars)

sim_qi(M1, 10)
#> # A tibble: 320 × 2
#>        y   sim
#>    <dbl> <int>
#>  1  21.5     1
#>  2  20.9     1
#>  3  25.9     1
#>  4  20.1     1
#>  5  15.8     1
#>  6  19.5     1
#>  7  15.5     1
#>  8  23.9     1
#>  9  24.0     1
#> 10  19.5     1
#> # ℹ 310 more rows