Unit root tests and Box-Jenkins
Anton Parlow Lab session Econ710 UWM Econ Department
03/05/2010
Our plan
Introduction to time series
AR and MA-process
Box-Jenkins Method
Unit root tests
Short review of Stata
Finding the proper model
Unit root tests
Arima
Forecasting
Introduction
A time series is the outcome of a variable observed over time, e.g. annually, quarterly, monthly and so on. There are different ways to describe a series, e.g. does it have a trend or a drift, or is it a random walk?
Example: Quarterly real GDP from 1947 to 2008
We want to explain GDP today with past values of GDP, but we have to find the proper model first.
AR and MA-process
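If GDP ($y_t$) depends only on its own (=auto) past values (regressive), we have an autoregressive process:
$y_t = \alpha + \phi_1 y_{t-1} + \phi_2 y_{t-2} + \phi_3 y_{t-3} + \dots + \phi_p y_{t-p} + \epsilon_t$
where $\alpha$ is a constant, the $\phi_i$ are the AR coefficients and $\epsilon_t$ is the error term.
In general we call it an AR(p)-model, and if GDP depends only on one past realization (=lag), it is an AR(1)-process:
$y_t = \alpha + \phi_1 y_{t-1} + \epsilon_t$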
AR and MA-process continued
If a variable depends only on past realizations of its own error terms, we have a moving average process:
$y_t = \alpha + \epsilon_t + \theta_1 \epsilon_{t-1} + \theta_2 \epsilon_{t-2} + \theta_3 \epsilon_{t-3} + \dots + \theta_q \epsilon_{t-q}$
In general we call it an MA(q)-model, and if it depends only on one past error term, it is an MA(1)-process:
$y_t = \alpha + \epsilon_t + \theta_1 \epsilon_{t-1}$
The error term $\epsilon_t$ is sometimes called a white noise process: it is well-behaved ($E[\epsilon_t] = 0$, $Var(\epsilon_t) = \sigma^2$) and the errors are iid (=independently and identically distributed).
It is a bit hard to find examples for this, so let us focus on AR-processes today!
AR and MA-process continued
In general these two models are special cases of an ARMA(p,q)-model, where p = order of the AR-process and q = order of the MA-process.
Examples:
ARMA(1,0) = AR(1)-process: $y_t = \alpha + \phi_1 y_{t-1} + \epsilon_t$
ARMA(0,1) = MA(1)-process: $y_t = \alpha + \epsilon_t + \theta_1 \epsilon_{t-1}$
ARMA(1,1) = AR(1) and MA(1) in one model: $y_t = \alpha + \phi_1 y_{t-1} + \epsilon_t + \theta_1 \epsilon_{t-1}$
If you see an ARIMA(p,I,q)-model, the I stands for integrated, i.e. how often the series has to be differenced before it is stationary (see unit-root tests). If I=0, or I(0), the time series is already stationary. If I=1, or I(1), it is stationary after first differencing, and so on.
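To make the role of I concrete, the following writes out an ARIMA(1,1,0) model, i.e. an AR(1) fitted to the first difference of the series. The symbols $\alpha$, $\phi_1$ and $\epsilon_t$ are assumed notation for the constant, the AR coefficient and the error term; the last line simply moves $y_{t-1}$ to the right-hand side.

```latex
\begin{align*}
\Delta y_t &= y_t - y_{t-1} \\
\Delta y_t &= \alpha + \phi_1 \Delta y_{t-1} + \epsilon_t \\
y_t &= \alpha + (1 + \phi_1)\, y_{t-1} - \phi_1\, y_{t-2} + \epsilon_t
\end{align*}
```

Written in levels, the model looks like an AR(2) whose coefficients sum to one, which is another way of seeing the unit root.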
AR and MA-process continued
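Sometimes it is convenient to write these models in lag-operator notation, with L = one lag, $L^2$ = two lags and so on.
Example: $y_t = \alpha + \phi_1 y_{t-1} + \epsilon_t$ becomes $y_t = \alpha + \phi_1 L y_t + \epsilon_t$
so that $L y_t = y_{t-1}$, $L^2 y_t = y_{t-2}$, $L^3 y_t = y_{t-3}$ and so on.
Example ARMA(1,1) in L-notation (no constant; here the MA coefficient enters with a minus sign):
$y_t = \frac{[1 - \theta_1 L]}{[1 - \phi_1 L]} \epsilon_t$
$y_t [1 - \phi_1 L] = [1 - \theta_1 L] \epsilon_t$
open the brackets: $y_t - \phi_1 L y_t = \epsilon_t - \theta_1 L \epsilon_t$
$y_t = \phi_1 L y_t + \epsilon_t - \theta_1 L \epsilon_t$
finally: $y_t = \phi_1 y_{t-1} + \epsilon_t - \theta_1 \epsilon_{t-1}$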
AR and MA-process continued
How do we figure out the process describing a time series? Use the autocorrelation function ACF (= correlation between the series and its own past realizations) and the partial autocorrelation function PACF. See Hamilton chapter 3 for a very good step-by-step derivation of these.
Take a look at these and decide. Time-series modeling is often referred to as an art (actually empirical work in general), meaning that two economists can tell you different things when they look at these functions.
Remember that the ACF and PACF are pretty much opposite to each other when we talk about AR and MA-processes.
An AR-process has an (exponentially) declining ACF and spikes in the PACF.
An MA-process has spikes in the ACF and an (exponentially) declining PACF.
Confused? See some examples next.
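A minimal sketch of the ACF/PACF patterns, assuming simulated data rather than the lab's GDP series. The seed, the coefficient 0.8 and the variable names y_ar and y_ma are made up for this illustration.

```stata
* Simulated data; all names and parameter values are hypothetical
clear
set seed 710
set obs 500
gen t = _n
tsset t
gen e = rnormal()

* AR(1): y_t = 0.8*y_{t-1} + e_t, built up recursively
gen y_ar = e in 1
replace y_ar = 0.8*y_ar[_n-1] + e in 2/l

* MA(1): y_t = e_t + 0.8*e_{t-1} (first observation is missing)
gen y_ma = e + 0.8*L.e

* AR(1): ACF declines gradually, PACF has one spike at lag 1
corrgram y_ar, lags(10)
* MA(1): ACF has one spike at lag 1, PACF dies out gradually
corrgram y_ma in 2/l, lags(10)
```

Comparing the AC and PAC columns of the two tables side by side is usually quicker than reading the ac and pac graphs one at a time.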
AR and MA-process continued
Example AR(1):
Example AR(2):
AR and MA-process continued
Example MA(1):
Example MA(2):
AR and MA-process continued
It is much more fun if you have AR and MA-terms in your model.
ARMA(1,1):
Another way to find the underlying process is to use information criteria like AIC or BIC (also called SIC), which are part of the output in Eviews but not in Stata (calculating them by hand is a lot of fun), e.g. start with AR(0), then AR(1), AR(2), ... and calculate the information criteria for each model. A trick in Stata: use estat ic after the estimation.
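As a hedged illustration of the information-criteria approach, the sketch below simulates an AR(2) series, fits AR(1) to AR(4) with arima, and tabulates AIC and BIC for all four fits. The data, seed and names (y, ar1 to ar4) are assumptions for this example, not the lab data.

```stata
* Simulated AR(2) data; all names and values are hypothetical
clear
set seed 710
set obs 300
gen t = _n
tsset t
gen e = rnormal()
gen y = e in 1/2
replace y = 0.5*y[_n-1] + 0.3*y[_n-2] + e in 3/l

* Fit AR(1) ... AR(4) and store each set of estimates
forvalues p = 1/4 {
    quietly arima y, ar(1/`p')
    estimates store ar`p'
}

* One table with AIC and BIC for all stored models; lower is better
estimates stats ar1 ar2 ar3 ar4
```

estimates stats reports the same numbers as running estat ic after each model, just collected in one table.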
Box-Jenkins-method
This was the first systematic approach to time-series modeling, with 4 steps:
1. Model identification = test for stationarity, use the ACF and PACF or information criteria to find the right model
2. Model estimation = run the regressions, get the residuals
3. Model checking = use the residuals to check if they are white noise (graph, Q-tests and more)
(4. Forecasting - see appendix)
Unit root tests
If a time series is stationary, regression results are not spurious or screwed up. This means most of the time we want the series to be stationary (not needed if you do error-correction models).
The problem is that most macroeconomic time series like GDP, unemployment, trade and many more are non-stationary (= contain a unit root): they do not go back to their mean, and the variance is not constant (it actually increases over time).
More formally, a series is stationary when the errors satisfy:
1. $E(\epsilon_t) = 0$
2. $Var(\epsilon_t) = \sigma^2$, i.e. the variance is constant
3. $E(\epsilon_t \epsilon_{t-1}) = 0$, i.e. the error terms are not (serially) correlated
In other words: the errors are well-behaved or white noise. A non-stationary time series has the opposite properties!
Unit root tests continued
Or, if we use $y_t$ instead, a time series is stationary when:
1. $E(y_t) = \mu$, i.e. the mean is constant and does not depend on time
2. $E[(y_t - \mu)(y_{t-j} - \mu)] = \gamma_j$, i.e. the autocovariance depends only on the lag j and is independent of time too!
This means we have to test for non-stationarity, which is done using unit root tests; the most common one is the Dickey-Fuller test.
To make a non-stationary time series stationary, we can do the following:
1. take the first differences
2. or detrend the time series (we do not do this today)
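To see what first differencing does, here is a small sketch with a simulated random walk. The series, the start date and the graph names are assumptions for illustration only.

```stata
* Simulated random walk; all names and values are hypothetical
clear
set seed 710
set obs 200
gen time = tq(1947q1) + _n - 1
format time %tq
tsset time

* Random walk: running sum of iid shocks, so it contains a unit root
gen y = sum(rnormal())
* First difference recovers the iid shocks, which are stationary
gen dy = D.y

tsline y,  name(level, replace)
tsline dy, name(diff, replace)
```

The level graph wanders without returning to a mean, while the differenced series fluctuates around zero with roughly constant spread.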
Unit root tests continued
The Dickey-Fuller test (or augmented Dickey-Fuller test, if lagged differences are included) uses the following test regressions:
1. $\Delta y_t = \delta y_{t-1} + \epsilon_t$
note: $\Delta y_t = y_t - y_{t-1}$ and $\delta = (\rho - 1)$, where $\rho$ is the coefficient on $y_{t-1}$ in $y_t = \rho y_{t-1} + \epsilon_t$
if the time series is flat (no trend) and potentially slow-turning around zero
2. $\Delta y_t = \alpha + \delta y_{t-1} + \epsilon_t$
if the series is flat and potentially slow-turning around a non-zero value (or has a drift, intercept = $\alpha$)
3. $\Delta y_t = \alpha + \delta y_{t-1} + \beta T + \epsilon_t$
if the series has a trend T (up or down) and a drift (intercept), or is slow-turning around a trend line you would draw through the data
The DF-test has its own test statistics, and we want to reject $H_0: \delta = 0$ (unit root) in favor of stationarity. In other words, if we cannot reject $H_0$, the series is non-stationary and has to be first differenced.
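The second test regression can be reproduced by hand, which may help to see what dfuller does internally. This is a sketch on a simulated random walk; the data and names are assumptions.

```stata
* Simulated random walk; all names and values are hypothetical
clear
set seed 710
set obs 200
gen t = _n
tsset t
gen y = sum(rnormal())

* Test regression 2 by hand: Delta y_t on a constant and y_{t-1}
reg D.y L.y

* Same regression through dfuller (default: constant, no trend, 0 lags).
* The coefficient and t-statistic on L.y match the regression above,
* but dfuller compares the statistic with Dickey-Fuller critical values
* instead of the usual t distribution.
dfuller y, regress
```

The p-value printed by reg should not be used for the unit root decision, since the t-statistic does not follow a t distribution under the null.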
Unit root tests continued
How do we choose the lag length p for the DF-test? Schwert (1989) suggests the following rule of thumb:
$p_{max} = 12 \left( \frac{T}{100} \right)^{1/4}$
where T = number of periods, e.g. years or quarters (the result is rounded down to an integer).
Why should we care? If p is (1) too small, some serial correlation can remain in the errors and bias the test; (2) too large, the power of the test will suffer.
Another test for unit roots is suggested by Phillips and Perron (=PP), which corrects for serial correlation and heteroskedasticity in the errors.
Both ADF and PP-tests are not very helpful if the series is close to being stationary.
Kwiatkowski, Phillips, Schmidt and Shin (1992) suggest a test for stationarity, the so-called KPSS-test, with H0 = series is stationary.
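The rule of thumb is easy to evaluate in Stata. The sample size below, 248 quarters (for example 1947q1 to 2008q4), is an assumed value for illustration.

```stata
* Hypothetical sample size: 248 quarterly observations
scalar T = 248
* Schwert's rule of thumb, rounded down to an integer
scalar pmax = floor(12*(scalar(T)/100)^(1/4))
display "Schwert maximum lag length: " scalar(pmax)
```

For T = 248 this gives 15, which could then be passed to dfuller through the lags() option as a starting point.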
Unit root tests continued
There are more tests out there, but in general it is not enough to use the Dickey-Fuller test only. Usually you run a few more tests to be confident about the properties of your time series.
Short Stata review
Remember, a command in Stata has the following structure:
[command] variable, options
We used gen for generating new variables, e.g. gen lgdp=log(gdp) to generate the log of GDP.
Remember: if you want the residuals after a regression, use predict.
Finding the proper model - Step 1
We will work with quarterly GDP data first.
1. set mem 50m
2. load [Link]
3. Stata needs to know it is a time series.
3.1. generate a time variable: gen time=tq(1947q1)+_n-1
3.2. give it the right format: format time %tq
3.3. tell Stata about it: tsset time
4. graph the series: tsline gdp
5. generate the log: gen lgdp=log(gdp) and graph it again: tsline lgdp
Finding the proper model - Step 1
Let us play around with the ACF (=ac) and the PACF (=pac), where lgdp is the variable and the option sets the lag length:
1. ac lgdp, lags(10)
2. pac lgdp, lags(10)
or
3. corrgram lgdp, lags(10)
What do we see? Do it again for 20 lags.
Let us do the same for the first-difference version of lgdp. There are two ways:
1. generate a new variable: gen flgdp=[Link]
or
2. ac [Link]
Finding the proper model continued - Step 2
Assume an AR(1)-model is okay for the log of real GDP. We should run the following regression:
reg lgdp [Link]
Note:
Stata uses L for a lag, L2 for two lags, L3 for three lags
Stata uses D for taking the first difference
Stata uses F if you have to forward your series, sometimes called a lead
This is pretty convenient, because you can use these operators for generating new variables too.
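A tiny example makes the operators easy to check by eye. The six-observation data set and the variable names below are made up for this purpose.

```stata
* Toy data set; all names are hypothetical
clear
set obs 6
gen t = _n
tsset t
gen y = t^2                  // 1, 4, 9, 16, 25, 36

gen lag1  = L.y              // y one period ago
gen lag2  = L2.y             // y two periods ago
gen diff1 = D.y              // y minus its first lag
gen lead1 = F.y              // y one period ahead

list t y lag1 lag2 diff1 lead1
```

Lags and differences are missing at the start of the sample and leads are missing at the end, which is why regressions with these operators lose observations.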
Finding the proper model continued - Step 3
If the AR(1) model is the proper one, the errors should be white noise. There are a couple of ways to test for it:
1. graph the errors
2. do a Breusch-Godfrey test for serial correlation
3. do a Q-test, called the white-noise test (or portmanteau test)
Note: The Box-Pierce test is not very common anymore, due to its poor performance in small samples.
Finding the proper model continued - Step 3
1. Graphing the errors
To get the residuals after the regression: predict res, resid
Stata will save the errors in res.
There are two ways to graph them:
1.1. tsline res plots them against time; there should be no pattern over time
1.2. plot the residuals against past residuals; there should be no pattern again!
reg res [Link], beta
twoway (scatter res [Link])
Finding the proper model continued - Step 3
2. Breusch-Godfrey test
Again after the regression, do the following (no need to predict the errors):
estat bgodfrey, lags(10)
H0 = no serial correlation; if we reject it, the errors are correlated and not white noise!
3. White-noise test
Run the regression, predict the errors and do the following:
wntestq res, lags(10)
H0 = no serial correlation; if we reject it, the errors are correlated and not white noise!
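The three residual checks can be run in one go. The sketch below uses a simulated AR(1) series, so the data and the names y, res and res_lag are assumptions, not the lab's GDP data.

```stata
* Simulated AR(1) data; all names and values are hypothetical
clear
set seed 710
set obs 300
gen t = _n
tsset t
gen e = rnormal()
gen y = e in 1
replace y = 0.7*y[_n-1] + e in 2/l

* Estimate the AR(1) by OLS
reg y L.y
* Breusch-Godfrey test, run directly after regress
estat bgodfrey, lags(10)

* Save the residuals and look at them
predict res, residuals
tsline res, name(res_time, replace)
gen res_lag = L.res
scatter res res_lag, name(res_scatter, replace)

* Portmanteau (Q) test for white noise
wntestq res, lags(10)
```

Since the simulated series really is an AR(1), the tests will usually fail to reject H0 here, although any single test can still reject by chance. Fitting too small a model to an AR(2) series would be a simple way to see a rejection.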
Unit root tests
These are pretty straightforward in Stata: load the quarterly data for defense spending [Link] and generate the log of defense spending (ds), called lds.
1. (Augmented) Dickey-Fuller tests
[Link]: no constant, no trend term: dfuller lds, noconstant
[Link]: constant, no trend: dfuller lds
[Link]: constant, trend: dfuller lds, trend
Options:
4. include lags for the ADF: dfuller lds, lags(10) includes 10 lags
5. if you need the regression output: dfuller lds, regress
Unit root tests continued
2. Phillips-Perron test
If we do not specify a lag length, pperron picks a default number of Newey-West lags from the sample size, int(4(T/100)^(2/9)), rather than Schwert's rule of thumb. The options are similar to dfuller.
pperron lds
Remember: H0 = non-stationary
[Link]-test
kpss lds
Type help kpss into Stata; the options are a bit different.
Remember: H0 = stationary. If we reject the null, the series is non-stationary. Stata gives you the test values for different lag lengths.
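To see how the tests behave when the answer is known, the sketch below runs them on a simulated random walk and on its first difference. The data, the name y and the lag choice of 4 are assumptions for illustration.

```stata
* Simulated random walk; all names and values are hypothetical
clear
set seed 710
set obs 250
gen t = _n
tsset t
gen y = sum(rnormal())

* The three Dickey-Fuller variants, each with 4 lagged differences
dfuller y, noconstant lags(4)
dfuller y, lags(4)
dfuller y, trend lags(4)

* Phillips-Perron test with its default lag choice
pperron y

* The first difference should look stationary
dfuller D.y, lags(4)

* kpss is not built in; it is assumed here to be a community-contributed
* command that has to be installed separately before use
```

For the level series the tests will typically fail to reject the unit root, while for D.y they typically reject it.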
ARIMA in Stata
We focused on AR-processes using OLS so far, but the following command is more powerful: arima
Arima estimation is a maximum likelihood estimation. Remember the notation is in general ARIMA(p,I,q), where I = integration, e.g. I=0 means the series is already stationary, I=1 means you have to take the first differences first.
Examples:
arima ds, ar(1)
AR(1) for defense spending (ds)
arima ds, arima(1,0,0)
still AR(1), but already stationary without first-differencing
arima [Link], ar(1) = arima ds, arima(1,1,0)
first-difference version of AR(1) on ds
arima ds, ma(1) = arima ds, arima(0,0,1)
would be an MA(1)-process for ds
arima ds, ar(1) ma(1) = arima ds, arima(1,0,1)
would have an AR(1) and an MA(1) component
To get the AIC and BIC for the models, use the following command after the estimation: estat ic
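The equivalence between differencing by hand and letting arima do it can be checked directly. The following sketch simulates a series whose first difference is an AR(1); the data and the names dy and y are assumptions.

```stata
* Simulated ARIMA(1,1,0) data; all names and values are hypothetical
clear
set seed 710
set obs 300
gen t = _n
tsset t
gen e = rnormal()
gen dy = e in 1
replace dy = 0.5*dy[_n-1] + e in 2/l
gen y = sum(dy)              // integrate the AR(1) differences once

* AR(1) on the differenced series, using the D. operator
arima D.y, ar(1)
estat ic

* Same model, with the differencing requested through arima()
arima y, arima(1,1,0)
estat ic
```

Both calls should report the same coefficient estimates and information criteria, since they describe the same model.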
ARIMA in Stata continued
Residual test: to test the residuals for autocorrelation, proceed as before (but bgodfrey will not work), e.g. predict the residuals and graph them, do a white-noise test (wntestq res) or, if you like, compute a Durbin-Watson statistic (dwstat, which is meant to be run after regress), which should be around 2.
Forecasting
There are different types of forecasting after a regression. We can do an in-sample forecast (using the quarters given) or an out-of-sample forecast (adding quarters). I will do it for the arima command (OLS is a bit different).
Remember: To check the quality of your forecast, you need to calculate the root mean square error (RMSE). The RMSE uses the forecast error (actual observation minus the forecast), and the formula is the following:
$\text{RMSE} = \sqrt{\frac{\sum_t (Y_t - \text{forecast}_t)^2}{N}}$
Example AR(1)-model: arima fgdp, ar(1)
Do a one-step ahead forecast: predict fgdp1, y
Compare the actual value with the forecast: tsline fgdp fgdp1
Forecasting continued
Calculate the RMSE:
1. Generate the forecast error: gen ferr=fgdp-fgdp1
2. Generate the square of the forecast error: gen ferr2=ferr^2
3. Get the mean of the squared errors: sum ferr2 (here 0.0040)
4. Use it to compute the RMSE: display "rmse: " (0.0040)^.5
Note that there are more ways to measure forecast accuracy.
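Instead of copying the mean by hand, the value that summarize leaves behind in r(mean) can be reused. This is a sketch on simulated data; the series and the names y, yhat and ferr2 are assumptions.

```stata
* Simulated AR(1) data; all names and values are hypothetical
clear
set seed 710
set obs 200
gen t = _n
tsset t
gen e = rnormal()
gen y = e in 1
replace y = 0.7*y[_n-1] + e in 2/l

* Fit the model and get one-step-ahead forecasts
arima y, ar(1)
predict yhat, y

* Squared forecast errors, their mean, and the square root of the mean
gen ferr2 = (y - yhat)^2
quietly summarize ferr2
display "rmse: " sqrt(r(mean))
```

Using r(mean) avoids rounding the mean to four digits before taking the square root.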
Forecasting continued
A dynamic forecast could be done as follows: predict fgdpd, xb dynamic(.)
Plot the actual value and the forecast: tsline fgdp fgdpd
Out-of-sample forecast
Run the regression, but then you have to extend the time horizon first:
tsappend, add(24) adds 24 quarters to the quarterly data set we have.
Then use the predict command for one-step ahead or dynamic forecasts.
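An out-of-sample forecast might look like the sketch below. It assumes a simulated quarterly AR(1) series of 200 observations starting in 1947q1 (so the sample ends in 1996q4); the names y and y_dyn are hypothetical.

```stata
* Simulated quarterly AR(1) data; all names and values are hypothetical
clear
set seed 710
set obs 200
gen time = tq(1947q1) + _n - 1
format time %tq
tsset time
gen e = rnormal()
gen y = e in 1
replace y = 0.7*y[_n-1] + e in 2/l

arima y, ar(1)

* Extend the data set by 8 quarters (1997q1 to 1998q4)
tsappend, add(8)

* One-step-ahead in sample, dynamic from the first added quarter on
local start = tq(1997q1)
predict y_dyn, y dynamic(`start')

tsline y y_dyn if time >= tq(1990q1)
```

From 1997q1 on, the forecast feeds on its own earlier predictions and converges towards the estimated mean of the series.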
Forecasting continued
A simple linear OLS forecast (do not ask me about the dynamic one; the same command as above does not work. There should be a way to compute it manually in Stata):
reg fgdp [Link]
predict fgdp1 (Stata assumes the option xb anyway in this case)
tsline fgdp fgdp1
What else could be done? There is much more out there, e.g. rolling forecasts or comparing forecasts of different models, e.g. AR(1) with AR(2), and so on.
How to create the first difference of a series
The simplest way in Stata is: Let gdp be in levels and we want to create the first difference:
gen fgdp=[Link] (same as: $y_t - y_{t-1}$)
or D2 would be $(y_t - y_{t-1}) - (y_{t-1} - y_{t-2})$
As you have seen above, in a regression you can use D, F and L in front of a variable without generating a new variable first!
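The claim about D2 can be verified on a toy data set. The six observations and the names d2a and d2b are made up for this check.

```stata
* Toy data set; all names are hypothetical
clear
set obs 6
gen t = _n
tsset t
gen y = t^3

* Second difference via the operator and written out by hand
gen d2a = D2.y
gen d2b = (y - L.y) - (L.y - L2.y)

* Stops with an error if the two ever disagree
assert d2a == d2b if !missing(d2a)
list t y d2a d2b
```

Note that D2 is the difference of the difference, not the two-period change $y_t - y_{t-2}$.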
Setting the time
In our examples we had quarterly data. What if you have annual, monthly, weekly or daily data?
annual data
gen time=1947+_n-1
tsset time
monthly data
gen time=tm(1962m2)+_n-1
format time %tm
tsset time
weekly data
gen time=tw(1962w1)+_n-1
format time %tw
tsset time
daily data
gen time=td(1apr1962)+_n-1
format time %td
tsset time
Note: _n is the number of the current observation, so _n-1 adds 0 periods to the start date for the first observation, 1 for the second, and so on.
How to detrend a series? And how to choose the time horizon?
1. Detrending
Sometimes you want to detrend a series, e.g. because there is a trend present or because, compared to taking the first difference, you save one observation. Imagine you only have 20 years of annual observations.
Steps:
create a trend variable, i.e. a variable increasing with time: gen trend = _n+1
regress your variable of interest on a constant and the trend: reg lgdp trend
use the residuals for the fun stuff you want to do!
2. Choosing the time horizon
There are a couple of ways, e.g. use only observations starting with 1980 or so, but one neat command is the following: tin = time in
reg [Link] [Link] if tin(1947q1,1965q4)
so that the observations are from January 1947 (first quarter) to December 1965 (fourth quarter)
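Both steps can be tried on a simulated trending series. The data, the slope 0.02 and the names y, trend and y_detr are assumptions for this sketch.

```stata
* Simulated trending quarterly series; all names and values are hypothetical
clear
set seed 710
set obs 248
gen time = tq(1947q1) + _n - 1
format time %tq
tsset time
gen trend = _n
gen y = 1 + 0.02*trend + rnormal(0, 0.1)

* 1. Detrending: regress on a constant and the trend, keep the residuals
reg y trend
predict y_detr, residuals
tsline y_detr

* 2. Restricting the sample with tin(): 1947q1 through 1965q4 only
reg y L.y if tin(1947q1, 1965q4)
```

tin() needs the data to be tsset with a formatted time variable, otherwise Stata cannot interpret the date literals.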