Simulate a (Augmented) Dickey-Fuller Test to Assess Unit Root in a Time Series

sadf_test() provides a simulation approach to assessing a unit root in a time series by way of the (Augmented) Dickey-Fuller test. It takes a vector, performs three (Augmented) Dickey-Fuller tests (no drift, no trend; drift, no trend; drift and trend), and calculates tau statistics as one normally would. Rather than interpolate or approximate a p-value, it simulates a user-specified number of (Augmented) Dickey-Fuller tests on either a known non-stationary time series or a known white-noise time series matching the length of the series the user provides. This allows the user to assess non-stationarity or stationarity by simulation rather than by approximation from critical values received from books or tables some years out of date.

Usage

sadf_test(x, n_lags = NULL, n_sims = 1000, sim_hyp = "nonstationary")

Arguments

x: a vector
n_lags: defaults to NULL; if supplied, it must be 0 or a positive integer. This argument determines the number of lagged first differences to include in the estimation procedure. Recall that the test statistic (tau) is still the t-statistic for the level value of the vector at t-1, whether or not the constant (drift) and time trend are included. If this value is 0, the procedure is the classic Dickey-Fuller test. If this value is greater than 0, this is the "augmented" Dickey-Fuller test, so-called because it is "augmented" by the number of lagged first differences to assess higher-order AR processes. If no argument is specified, the default lag is Schwert's suggested lower bound. The lag_suggests data provides more information about these suggested lags.
n_sims: the number of simulations for calculating an interval or distribution of test statistics for assessing stationarity or non-stationarity. Defaults to 1,000.
sim_hyp: can be either "stationary" or "nonstationary". If "stationary", the function runs (A)DF tests on simulated stationary (pure white noise) data. This allows the user to assess the compatibility/plausibility of their test statistics against a distribution of test statistics from series that are known to be pure white noise (in expectation). If "nonstationary" (default), the function generates three different data sets: a pure random walk, a random walk with a drift, and a random walk with a drift and trend. It then runs (A)DF tests on all of them. This allows the user to assess the compatibility/plausibility of their test statistics with data that are known to be nonstationary in some form.

Value

sadf_test() returns a list of length 3. The first element in the list is a matrix of tau statistics calculated by the test. The second element is a data frame of the simulated tau statistics of either a known white-noise time series or three different non-stationary time series (pure random walk, random walk with drift, random walk with drift and trend). The third element contains some attributes about the procedure for post-processing.
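
As a rough sketch of the post-processing this return value allows, the code below summarizes simulated tau statistics by test type. It does not call sadf_test(). The data frame sims is a hand-built stand-in that borrows the column names (tau, sim, cat) from the printed $sims element in the Examples section, with values rounded from the first two simulations shown there. The name q05 and the choice of the 5% quantile are assumptions made for illustration.

```r
# Hypothetical stand-in for the $sims element of the returned list.
# Column names follow the printed output; values are rounded from it.
sims <- data.frame(
  tau = c(-0.32, -0.09, -0.86, -0.29, -0.30, -1.71),
  sim = rep(1:2, each = 3),
  cat = rep(c("No Drift, No Trend", "Drift, No Trend", "Drift and Trend"),
            times = 2)
)

# Lower-tail (5%) quantile of the simulated tau statistics, by test type.
q05 <- tapply(sims$tau, sims$cat, quantile, probs = 0.05)
print(q05)
```

With a real result stored as, say, res, the same call should apply to res$sims, and the observed statistics in res$stats can then be compared against these quantiles.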

Details

The Dickey-Fuller test and its "augmented" counterpart are curious statistical procedures, even if the underlying concept is straightforward. I have seen various implementations of these procedures use slightly different terminology to describe them, though this particular implementation will impose nomenclature in which the classic Dickey-Fuller procedure that assumes just the AR(1) process is one in which n_lags is 0. The addition of lags (of first differences) is what ultimately makes the Dickey-Fuller procedure "augmented."

By default, the function uses the lower bound suggested by Schwert (1989) for the number of lagged first differences to include in this procedure. Schwert (1989) recommends taking the length of the series and dividing it by 100 before raising that number to the power of 1/4. Multiplying that by 4 and rounding down to the nearest integer gives the lower bound used here; multiplying by 12 instead gives Schwert's upper bound. For the 499 observations in the Examples section, the lower bound is 5, which matches the lags value reported in that output. There are other suggested defaults you can consider. adf.test in aTSA takes the length of the series, divides it by 100 and raises it to the power of 2/9. It multiplies that by 4 and floors the result. adf.test in tseries subtracts 1 from the length of the series before raising it to the power of 1/3 (flooring that result as well). Any of these can be computed by hand and supplied through n_lags.
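
The lag rules described above can be computed in base R. This is a minimal sketch for a series of length 499 (the length used in the Examples section). The object names are made up for illustration, and the formulas follow the verbal descriptions in this section rather than code copied from aTSA or tseries. Schwert's two bounds differ only in the multiplier (4 for the lower bound, 12 for the upper bound).

```r
# Length of the series (499 matches the example data after dropping the NA).
n <- 499

# Schwert's bounds: (n / 100)^(1/4), scaled by 4 (lower) or 12 (upper), floored.
schwert_lower <- floor(4 * (n / 100)^(1/4))
schwert_upper <- floor(12 * (n / 100)^(1/4))

# Rule described above for adf.test in aTSA.
atsa_default <- floor(4 * (n / 100)^(2/9))

# Rule described above for adf.test in tseries.
tseries_default <- floor((n - 1)^(1/3))

lags <- c(schwert_lower = schwert_lower, schwert_upper = schwert_upper,
          atsa_default = atsa_default, tseries_default = tseries_default)
print(lags)
```

Any one of these values can then be passed along, for example as sadf_test(y, n_lags = tseries_default).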

This function specifies three different types of tests: 1) no drift, no trend, 2) drift, no trend, and 3) drift and trend. In the language of the lm() function, and with n_lags set to 0, the first is lm(dy ~ ly - 1), where dy is the first difference of y and ly is its first-order lag. The second test is lm(dy ~ ly), intuitively suggesting the y-intercept in this equation is the "drift". The third is lm(dy ~ ly + t), with t being a simple integer that increases by 1 for each observation (i.e. a time trend). In each case, tau is the t-statistic on ly.
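
The three specifications can be illustrated with a simulated random walk and no lagged first differences (the classic Dickey-Fuller case). This is a sketch and not the function's internal code. It assumes the usual Dickey-Fuller setup in which the first difference is the outcome and tau is the t-statistic on the lagged level. The helper get_tau() and the object names are made up for the example.

```r
set.seed(8675309)

# A pure random walk, so a unit root is present by construction.
y <- cumsum(rnorm(200))

dy <- diff(y)          # first difference: y_t - y_{t-1}
ly <- y[-length(y)]    # lagged level: y_{t-1}
t  <- seq_along(dy)    # simple time trend: 1, 2, 3, ...

m1 <- lm(dy ~ ly - 1)  # no drift, no trend
m2 <- lm(dy ~ ly)      # drift, no trend
m3 <- lm(dy ~ ly + t)  # drift and trend

# tau is the t-statistic on the lagged level in each regression.
get_tau <- function(m) summary(m)$coefficients["ly", "t value"]

taus <- c(no_drift_no_trend = get_tau(m1),
          drift_no_trend    = get_tau(m2),
          drift_and_trend   = get_tau(m3))
print(taus)
```

An augmented version would add lagged values of dy to the right-hand side of each formula.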

None of this is meant to discourage the use of Fuller (1976) or its various reproductions for the sake of diagnosing stationarity or non-stationarity, and I will confess that the expertise behind those works outpaces mine. Consider the justification for this function to be largely philosophical and/or experimental. Why not simulate it? It's not like time or computing power are huge issues anymore.

This is always awkwardly stated, but it's a good reminder that the classic Dickey-Fuller statistics are mostly intended to come back negative. That's not always the case, to be clear, but it is the intended case. You assess the statistic by "how negative" it is. Stationary time series will produce test statistics more negative ("smaller") than those produced by non-stationary time series. In a way, this makes the hypotheses implicitly one-tailed (to use that language).
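
The one-tailed logic can be sketched with a small simulation in base R. The code below builds a reference distribution of tau statistics from pure random walks and then checks how a white-noise series compares against it. It is an illustration of the idea only, limited to the no-drift, no-trend case with no lagged differences. The helper df_tau() and the simulation settings are assumptions for the example, not the function's internals.

```r
set.seed(8675309)

# t-statistic on the lagged level in the no-drift, no-trend regression.
df_tau <- function(x) {
  dx <- diff(x)
  lx <- x[-length(x)]
  summary(lm(dx ~ lx - 1))$coefficients["lx", "t value"]
}

n <- 250
n_sims <- 500

# Reference distribution: tau statistics from known random walks of length n.
sim_taus <- replicate(n_sims, df_tau(cumsum(rnorm(n))))

# A white-noise (stationary) series standing in for the user's data.
x <- rnorm(n)
obs_tau <- df_tau(x)

# One-tailed assessment: is the observed tau "more negative" than the
# lower 5% of tau statistics produced by non-stationary series?
lower_5 <- quantile(sim_taus, 0.05)
share_below <- mean(sim_taus <= obs_tau)

print(c(obs_tau = obs_tau, lower_5 = unname(lower_5), share_below = share_below))
```

A stationary series should land far in the left tail of the simulated distribution, so share_below should be close to zero here.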

This function removes missing values from the vector before calculating test statistics.

References

Schwert, G. William. 1989. "Tests for Unit Roots: A Monte Carlo Investigation." Journal of Business & Economic Statistics 7(2): 147–159.

Author

Steven V. Miller

Examples


y <- na.omit(USDSEK[1:500,])$close # there is one missing value here. n = 499.


sadf_test(y, n_sims = 25) # Doing 25, just to make it quick
#> $stats
#>            [,1]
#> [1,] -0.3275852
#> [2,] -1.3004835
#> [3,] -1.6669445
#> 
#> $sims
#>             tau sim                cat
#> 1  -0.321658306   1 No Drift, No Trend
#> 2  -0.089483503   1    Drift, No Trend
#> 3  -0.863914420   1    Drift and Trend
#> 4  -0.290391997   2 No Drift, No Trend
#> 5  -0.298088303   2    Drift, No Trend
#> 6  -1.708435583   2    Drift and Trend
#> 7  -0.892769715   3 No Drift, No Trend
#> 8  -0.426609180   3    Drift, No Trend
#> 9  -1.774396136   3    Drift and Trend
#> 10  0.195715679   4 No Drift, No Trend
#> 11  0.765703222   4    Drift, No Trend
#> 12 -2.395183174   4    Drift and Trend
#> 13  0.387840100   5 No Drift, No Trend
#> 14 -1.445699247   5    Drift, No Trend
#> 15 -3.078797610   5    Drift and Trend
#> 16 -0.158347893   6 No Drift, No Trend
#> 17 -1.395757562   6    Drift, No Trend
#> 18 -1.892292343   6    Drift and Trend
#> 19 -0.857842065   7 No Drift, No Trend
#> 20  1.847528559   7    Drift, No Trend
#> 21 -1.999687400   7    Drift and Trend
#> 22  0.506968909   8 No Drift, No Trend
#> 23  0.439669353   8    Drift, No Trend
#> 24 -1.201905879   8    Drift and Trend
#> 25  0.997677267   9 No Drift, No Trend
#> 26 -0.684686459   9    Drift, No Trend
#> 27 -1.439471233   9    Drift and Trend
#> 28 -0.554027708  10 No Drift, No Trend
#> 29 -0.270181444  10    Drift, No Trend
#> 30 -1.758765433  10    Drift and Trend
#> 31  1.200162747  11 No Drift, No Trend
#> 32 -1.696407543  11    Drift, No Trend
#> 33 -0.570673036  11    Drift and Trend
#> 34  0.051175250  12 No Drift, No Trend
#> 35 -0.007280289  12    Drift, No Trend
#> 36 -1.912377639  12    Drift and Trend
#> 37 -1.404489053  13 No Drift, No Trend
#> 38 -0.079733049  13    Drift, No Trend
#> 39 -3.394792701  13    Drift and Trend
#> 40 -2.483068030  14 No Drift, No Trend
#> 41 -1.010555656  14    Drift, No Trend
#> 42 -2.979136062  14    Drift and Trend
#> 43 -0.119081733  15 No Drift, No Trend
#> 44  0.477788324  15    Drift, No Trend
#> 45 -2.972185787  15    Drift and Trend
#> 46 -1.050542108  16 No Drift, No Trend
#> 47  1.140244198  16    Drift, No Trend
#> 48 -2.109933395  16    Drift and Trend
#> 49 -1.193731137  17 No Drift, No Trend
#> 50 -0.768490588  17    Drift, No Trend
#> 51 -1.484854773  17    Drift and Trend
#> 52  0.114277390  18 No Drift, No Trend
#> 53  1.818232205  18    Drift, No Trend
#> 54 -1.902376509  18    Drift and Trend
#> 55 -2.949196929  19 No Drift, No Trend
#> 56 -0.422198473  19    Drift, No Trend
#> 57 -2.491775658  19    Drift and Trend
#> 58 -3.079267969  20 No Drift, No Trend
#> 59  0.186079938  20    Drift, No Trend
#> 60 -2.068793539  20    Drift and Trend
#> 61  0.878051547  21 No Drift, No Trend
#> 62  0.579578226  21    Drift, No Trend
#> 63 -3.852946280  21    Drift and Trend
#> 64 -0.388931056  22 No Drift, No Trend
#> 65 -0.147721581  22    Drift, No Trend
#> 66 -1.669302315  22    Drift and Trend
#> 67  0.995708715  23 No Drift, No Trend
#> 68  1.496345575  23    Drift, No Trend
#> 69 -1.473440802  23    Drift and Trend
#> 70  1.354644734  24 No Drift, No Trend
#> 71  0.423147888  24    Drift, No Trend
#> 72 -3.086403933  24    Drift and Trend
#> 73 -0.122032908  25 No Drift, No Trend
#> 74  0.345245148  25    Drift, No Trend
#> 75 -1.916681696  25    Drift and Trend
#> 
#> $attributes
#>   lags       sim_hyp n_sims   n test
#> 1    5 nonstationary     25 499  adf
#> 
#> attr(,"class")
#> [1] "sadf_test"