Simulate a (Augmented) Dickey-Fuller Test to Assess Unit Root in a Time Series
sadf_test() provides a simulation approach to assessing unit root in a time
series by way of the (Augmented) Dickey-Fuller test. It takes a vector and
performs three (Augmented) Dickey-Fuller tests (no drift, no trend; drift, no
trend; drift and trend) and calculates tau statistics as one normally would.
Rather than interpolate or approximate a p-value, it runs a user-specified
number of (Augmented) Dickey-Fuller tests on simulated series, either known
non-stationary series or known white-noise series, each matching the length
of the series the user provides. This allows the user to assess
non-stationarity or stationarity by simulation rather than by approximation
from critical values in books or tables that are some years out of date.
Arguments
- x
a vector
- n_lags
defaults to NULL, but must be 0 or a positive integer. This argument determines the number of lagged first differences to include in the estimation procedure. Recall that the test statistic (tau) is still the t-statistic for the level value of the vector at t-1, whether the constant (drift) and time trend are included or not. If this value is 0, the procedure is the classic Dickey-Fuller test. If this value is greater than 0, this is the "augmented" Dickey-Fuller test, so-called because it is "augmented" by the number of lagged first differences to assess higher-order AR processes. If no argument is specified, the default lag is Schwert's suggested lower bound. The lag_suggests data provides more information about these suggested lags.
- n_sims
the number of simulations for calculating an interval or distribution of test statistics for assessing stationarity or non-stationarity. Defaults to 1,000.
- sim_hyp
can be either "stationary" or "nonstationary". If "stationary", the function runs (A)DF tests on simulated stationary (pure white noise) data. This allows the user to assess the compatibility/plausibility of the test statistic against a distribution of test statistics from series that are known to be pure white noise (in expectation). If "nonstationary" (default), the function generates three different data sets: a pure random walk, a random walk with drift, and a random walk with drift and trend. It then runs (A)DF tests on all of those. This allows the user to assess the compatibility/plausibility of their test statistics with data that are known to be nonstationary in some form.
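The two simulation hypotheses can be illustrated with a minimal base R sketch. This is not the package's internal code: the series length, the drift of 0.5, and the trend slope of 0.01 are assumptions chosen only for illustration.

```r
set.seed(8675309)  # arbitrary seed, for reproducibility only
n <- 500           # illustrative series length (assumption)

e <- rnorm(n)      # white-noise shocks

# "stationary": the benchmark series is pure white noise
white_noise <- e

# "nonstationary": three series with a unit root
rw       <- cumsum(e)                            # pure random walk
rw_drift <- cumsum(0.5 + e)                      # random walk with drift (0.5 is assumed)
rw_trend <- cumsum(0.5 + 0.01 * seq_len(n) + e)  # drift and time trend (slope is assumed)

# First differences of the pure random walk recover the shocks
head(diff(rw))
```

Each simulated series has the same length as the (illustrative) input, which mirrors the description above of matching the simulated data to the user's series.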
Value
sadf_test() returns a list of length 3. The first element in the list (stats)
is a matrix of tau statistics calculated by the test. The second element
(sims) is a data frame of the simulated tau statistics of either a known
white-noise time series or three different non-stationary time series
(pure random walk, random walk with drift, random walk with drift and trend).
The third element (attributes) contains some attributes about the procedure
for post-processing.
Details
The Dickey-Fuller test and its "augmented" counterpart are curious statistical
procedures, even if the underlying concept is straightforward. I have seen
various implementations of these procedures use slightly different
terminology to describe them, though this particular implementation imposes
a nomenclature in which the classic Dickey-Fuller procedure, which assumes
just an AR(1) process, is the one in which n_lags is 0. The addition of lags
(of first differences) is what ultimately makes the Dickey-Fuller procedure
"augmented."
The function employs the default suggested by Schwert (1989) for the number
of lagged first differences to include in this procedure. Schwert (1989)
recommends taking the length of the series and dividing it by 100 before
raising that number to the power of 1/4. Thereafter, multiply it by 4 (the
lower bound, which this function uses by default) or by 12 (the upper bound)
and round down to the nearest integer. There are other suggested defaults
you can consider. adf.test in aTSA takes the length of the series, divides it
by 100 and raises it to the power of 2/9. It multiplies that by 4 and floors
the result. adf.test in tseries subtracts 1 from the length of the series
before raising it to the power of 1/3 (flooring that result as well). You can
supply any of these through the n_lags argument.
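The lag rules described above can be written out in a few lines of base R. This is a sketch with an assumed series length of 500; Schwert's rule is shown with both multipliers associated with it (4 and 12), and the variable names are illustrative only.

```r
n <- 500  # illustrative series length (assumption)

schwert_lower <- floor(4  * (n / 100)^(1/4))  # Schwert (1989), multiplier of 4
schwert_upper <- floor(12 * (n / 100)^(1/4))  # Schwert (1989), multiplier of 12
atsa_lag      <- floor(4  * (n / 100)^(2/9))  # rule described above for adf.test in aTSA
tseries_lag   <- floor((n - 1)^(1/3))         # rule described above for adf.test in tseries

c(schwert_lower = schwert_lower, schwert_upper = schwert_upper,
  atsa = atsa_lag, tseries = tseries_lag)
```

Any of these integers could then be passed along as n_lags, for example sadf_test(y, n_lags = tseries_lag).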
This function specifies three different types of tests: 1) no drift, no trend,
2) drift, no trend, and 3) drift and trend. In the language of the lm()
function, the first is lm(y ~ ly - 1) where y is the first difference of the
series and ly is the first-order lag of its level. The second test is
lm(y ~ ly), intuitively suggesting the y-intercept in this equation is the
"drift". The third would be lm(y ~ ly + t) with t being a simple integer that
increases by 1 for each observation (i.e. a time trend).
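The three specifications can be sketched in base R for the classic (zero-lag) case. This is an illustration rather than the package's internal code: it assumes the dependent variable is the first difference of the series, as in the standard Dickey-Fuller regression, and it uses a simulated random walk as stand-in data.

```r
set.seed(8675309)          # arbitrary seed, for reproducibility only
x <- cumsum(rnorm(200))    # illustrative random walk (assumption)

dat <- data.frame(
  y  = diff(x),                 # first difference of the series
  ly = x[-length(x)],           # level of the series at t - 1
  t  = seq_len(length(x) - 1)   # integer time trend
)

m1 <- lm(y ~ ly - 1, data = dat)  # no drift, no trend
m2 <- lm(y ~ ly,     data = dat)  # drift, no trend
m3 <- lm(y ~ ly + t, data = dat)  # drift and trend

# tau is the t-statistic on the lagged level in each regression
tau <- sapply(list(m1, m2, m3),
              function(m) summary(m)$coefficients["ly", "t value"])
names(tau) <- c("No Drift, No Trend", "Drift, No Trend", "Drift and Trend")
tau
```

For the augmented version, lagged first differences of the series would be added as further regressors on the right-hand side; tau would still be read off the ly row.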
None of this is meant to discourage the use of Fuller (1976) or its various reproductions for the sake of diagnosing stationarity or non-stationarity, and I will confess the expertise behind those sources outpaces mine. Consider the justification for this function to be largely philosophical and/or experimental. Why not simulate it? It's not like time or computing power are huge issues anymore.
This is always awkwardly stated, but it's a good reminder that the classic Dickey-Fuller statistics are mostly intended to come back negative. That's not always the case, to be clear, but it is the intended case. You assess the statistic by "how negative" it is. Stationary time series will produce test statistics more negative ("smaller") than those produced by non-stationary time series. In a way, this makes the hypotheses implicitly one-tailed (to use that language).
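The "how negative" comparison can be sketched with a small simulation in base R. The helper df_tau() is hypothetical (it is not part of the package) and covers only the no-drift, no-trend case with zero lags; the series length and number of simulations are assumptions for illustration.

```r
set.seed(8675309)  # arbitrary seed, for reproducibility only

# Hypothetical helper: tau from the no-drift, no-trend regression with zero lags
df_tau <- function(x) {
  y  <- diff(x)          # first difference
  ly <- x[-length(x)]    # lagged level
  summary(lm(y ~ ly - 1))$coefficients["ly", "t value"]
}

n <- 200                                             # illustrative series length
sim_tau <- replicate(200, df_tau(cumsum(rnorm(n))))  # taus from known random walks
obs_tau <- df_tau(rnorm(n))                          # tau from a white-noise series

quantile(sim_tau, 0.05)   # lower-tail reference point among the simulations
mean(sim_tau <= obs_tau)  # share of simulated taus at least as negative
```

A small share in the last line says the observed statistic is more negative than nearly everything a known random walk produced, which is the one-tailed reading described above.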
This function removes missing values from the vector before calculating test statistics.
References
Schwert, G. William. 1989. "Tests for Unit Roots: A Monte Carlo Investigation." Journal of Business & Economic Statistics 7(2): 147–159.
Examples
y <- na.omit(USDSEK[1:500,])$close
sadf_test(y, n_sims = 25) # Doing 25, just to make it quick
#> $stats
#> [,1]
#> [1,] -0.3275852
#> [2,] -1.3004835
#> [3,] -1.6669445
#>
#> $sims
#> tau sim cat
#> 1 -0.321658306 1 No Drift, No Trend
#> 2 -0.089483503 1 Drift, No Trend
#> 3 -0.863914420 1 Drift and Trend
#> 4 -0.290391997 2 No Drift, No Trend
#> 5 -0.298088303 2 Drift, No Trend
#> 6 -1.708435583 2 Drift and Trend
#> 7 -0.892769715 3 No Drift, No Trend
#> 8 -0.426609180 3 Drift, No Trend
#> 9 -1.774396136 3 Drift and Trend
#> 10 0.195715679 4 No Drift, No Trend
#> 11 0.765703222 4 Drift, No Trend
#> 12 -2.395183174 4 Drift and Trend
#> 13 0.387840100 5 No Drift, No Trend
#> 14 -1.445699247 5 Drift, No Trend
#> 15 -3.078797610 5 Drift and Trend
#> 16 -0.158347893 6 No Drift, No Trend
#> 17 -1.395757562 6 Drift, No Trend
#> 18 -1.892292343 6 Drift and Trend
#> 19 -0.857842065 7 No Drift, No Trend
#> 20 1.847528559 7 Drift, No Trend
#> 21 -1.999687400 7 Drift and Trend
#> 22 0.506968909 8 No Drift, No Trend
#> 23 0.439669353 8 Drift, No Trend
#> 24 -1.201905879 8 Drift and Trend
#> 25 0.997677267 9 No Drift, No Trend
#> 26 -0.684686459 9 Drift, No Trend
#> 27 -1.439471233 9 Drift and Trend
#> 28 -0.554027708 10 No Drift, No Trend
#> 29 -0.270181444 10 Drift, No Trend
#> 30 -1.758765433 10 Drift and Trend
#> 31 1.200162747 11 No Drift, No Trend
#> 32 -1.696407543 11 Drift, No Trend
#> 33 -0.570673036 11 Drift and Trend
#> 34 0.051175250 12 No Drift, No Trend
#> 35 -0.007280289 12 Drift, No Trend
#> 36 -1.912377639 12 Drift and Trend
#> 37 -1.404489053 13 No Drift, No Trend
#> 38 -0.079733049 13 Drift, No Trend
#> 39 -3.394792701 13 Drift and Trend
#> 40 -2.483068030 14 No Drift, No Trend
#> 41 -1.010555656 14 Drift, No Trend
#> 42 -2.979136062 14 Drift and Trend
#> 43 -0.119081733 15 No Drift, No Trend
#> 44 0.477788324 15 Drift, No Trend
#> 45 -2.972185787 15 Drift and Trend
#> 46 -1.050542108 16 No Drift, No Trend
#> 47 1.140244198 16 Drift, No Trend
#> 48 -2.109933395 16 Drift and Trend
#> 49 -1.193731137 17 No Drift, No Trend
#> 50 -0.768490588 17 Drift, No Trend
#> 51 -1.484854773 17 Drift and Trend
#> 52 0.114277390 18 No Drift, No Trend
#> 53 1.818232205 18 Drift, No Trend
#> 54 -1.902376509 18 Drift and Trend
#> 55 -2.949196929 19 No Drift, No Trend
#> 56 -0.422198473 19 Drift, No Trend
#> 57 -2.491775658 19 Drift and Trend
#> 58 -3.079267969 20 No Drift, No Trend
#> 59 0.186079938 20 Drift, No Trend
#> 60 -2.068793539 20 Drift and Trend
#> 61 0.878051547 21 No Drift, No Trend
#> 62 0.579578226 21 Drift, No Trend
#> 63 -3.852946280 21 Drift and Trend
#> 64 -0.388931056 22 No Drift, No Trend
#> 65 -0.147721581 22 Drift, No Trend
#> 66 -1.669302315 22 Drift and Trend
#> 67 0.995708715 23 No Drift, No Trend
#> 68 1.496345575 23 Drift, No Trend
#> 69 -1.473440802 23 Drift and Trend
#> 70 1.354644734 24 No Drift, No Trend
#> 71 0.423147888 24 Drift, No Trend
#> 72 -3.086403933 24 Drift and Trend
#> 73 -0.122032908 25 No Drift, No Trend
#> 74 0.345245148 25 Drift, No Trend
#> 75 -1.916681696 25 Drift and Trend
#>
#> $attributes
#> lags sim_hyp n_sims n test
#> 1 5 nonstationary 25 499 adf
#>
#> attr(,"class")
#> [1] "sadf_test"