
Unit root tests and Box-Jenkins

Anton Parlow Lab session Econ710 UWM Econ Department

03/05/2010


Our plan

Introduction to time series
AR and MA-process
Box-Jenkins Method
Unit root tests
Short review of Stata
Finding the proper model
Unit root tests
Arima
Forecasting


Introduction
A time series is the outcome of a variable observed over time, e.g. annually, quarterly, monthly and so on. There are different ways to describe a series, e.g. does it have a trend, a drift, or is it a random walk?

Example: Quarterly real GDP from 1947 to 2008

We want to explain GDP today with past values of GDP, but we have to find the proper model first.

AR and MA-process

If GDP ($y_t$) depends only on its own (= auto) past values (= regressive), we have an autoregressive process:

$y_t = \alpha + \phi_1 y_{t-1} + \phi_2 y_{t-2} + \phi_3 y_{t-3} + \dots + \phi_p y_{t-p} + \varepsilon_t$

In general we call it an AR(p)-model, and if GDP depends only on one past realization (= lag), it is an AR(1)-process:

$y_t = \alpha + \phi_1 y_{t-1} + \varepsilon_t$
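To get a feeling for what an AR(1)-process looks like, here is a minimal Python sketch (the lab itself uses Stata). The function name simulate_ar1, the seed and the parameter values are illustrative assumptions and not part of the original session.

```python
import random

def simulate_ar1(alpha, phi, n, sigma=1.0, y0=0.0, seed=710):
    """Simulate y_t = alpha + phi * y_{t-1} + e_t with e_t ~ N(0, sigma^2)."""
    rng = random.Random(seed)  # arbitrary seed, only for reproducibility
    y = [y0]
    for _ in range(n - 1):
        shock = rng.gauss(0.0, sigma)
        y.append(alpha + phi * y[-1] + shock)
    return y

# phi = 0.5 keeps pulling the series back to its mean,
# phi = 1.0 is a random walk (unit root) that can wander off.
stationary = simulate_ar1(alpha=0.0, phi=0.5, n=200)
random_walk = simulate_ar1(alpha=0.0, phi=1.0, n=200)
print(len(stationary), len(random_walk))
```

Plotting both series next to each other usually shows the difference between a mean-reverting series and a random walk quite clearly.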


AR and MA-process continued

If a variable depends only on past realizations of its own error term, we have a moving average process:

$y_t = \alpha + \varepsilon_t + \theta_1 \varepsilon_{t-1} + \theta_2 \varepsilon_{t-2} + \theta_3 \varepsilon_{t-3} + \dots + \theta_q \varepsilon_{t-q}$

In general we call it an MA(q)-model, and if it depends only on one past error term, it is an MA(1)-process:

$y_t = \alpha + \varepsilon_t + \theta_1 \varepsilon_{t-1}$

The error term is sometimes called a white noise process, or well-behaved: $E[\varepsilon_t] = 0$, $Var(\varepsilon_t) = \sigma^2$, and the errors are iid (= independently and identically distributed).

It is a bit hard to find examples for this, so let us focus on AR-processes today!


AR and MA-process continued

In general these two models are special cases of an ARMA(p,q)-model, where p = order of the AR-process and q = order of the MA-process.

Examples:
ARMA(1,0) = AR(1)-process: $y_t = \alpha + \phi_1 y_{t-1} + \varepsilon_t$
ARMA(0,1) = MA(1)-process: $y_t = \alpha + \varepsilon_t + \theta_1 \varepsilon_{t-1}$
ARMA(1,1) = AR(1) and MA(1) in one model: $y_t = \alpha + \phi_1 y_{t-1} + \varepsilon_t + \theta_1 \varepsilon_{t-1}$

If you see an ARIMA(p,I,q)-model, the I stands for integrated, i.e. how often the series has to be differenced before it is stationary (see unit-root tests). If I=0, or I(0), the time series is already stationary. If I=1, or I(1), it is stationary after first differencing, and so on.


AR and MA-process continued

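Sometimes it is convenient to write these models in lag-operator notation, with $L$ for one lag, $L^2$ for two lags and so on.

Example: $y_t = \alpha + \phi_1 y_{t-1} + \varepsilon_t$ becomes $y_t = \alpha + \phi_1 L y_t + \varepsilon_t$

Note that $L y_t = y_{t-1}$, $L^2 y_t = y_{t-2}$, $L^3 y_t = y_{t-3}$ and so on.

Example ARMA(1,1) in L-notation (constant omitted):

$[1 - \phi_1 L] y_t = [1 + \theta_1 L] \varepsilon_t$
open the brackets: $y_t - \phi_1 L y_t = \varepsilon_t + \theta_1 L \varepsilon_t$
rearrange: $y_t = \phi_1 L y_t + \varepsilon_t + \theta_1 L \varepsilon_t$
finally: $y_t = \phi_1 y_{t-1} + \varepsilon_t + \theta_1 \varepsilon_{t-1}$

Equivalently: $y_t = \frac{[1 + \theta_1 L]}{[1 - \phi_1 L]} \varepsilon_t$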


AR and MA-process continued

How do we figure out the process describing a time series? Use the autocorrelation function ACF (= correlation between $y_t$ and its past realizations) and the partial autocorrelation function PACF. See Hamilton chapter 3 for a very good step-by-step derivation of these.

Take a look at these and decide. Time-series modeling is often referred to as an art (actually empirical work in general), meaning two economists can tell you two different things when they look at the same functions.

Remember the ACF and PACF are pretty much opposite to each other when we talk about AR and MA-processes:
An AR-process has an (exponentially) declining ACF and spikes in the PACF.
An MA-process has spikes in the ACF and an (exponentially) declining PACF.

Confused? See some examples next.
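The declining ACF of an AR-process can be made concrete with a small Python sketch. It assumes the standard sample autocorrelation estimator and the textbook result that an AR(1) with coefficient phi has theoretical autocorrelation phi^k at lag k; the function names are made up for this illustration.

```python
def sample_acf(y, max_lag):
    """Sample autocorrelations r_1, ..., r_max_lag of the series y."""
    n = len(y)
    mean = sum(y) / n
    dev = [v - mean for v in y]
    c0 = sum(d * d for d in dev)  # lag-0 sum of squares
    return [
        sum(dev[t] * dev[t - k] for t in range(k, n)) / c0
        for k in range(1, max_lag + 1)
    ]

def ar1_theoretical_acf(phi, max_lag):
    """Theoretical ACF of an AR(1): rho_k = phi ** k (geometric decline)."""
    return [phi ** k for k in range(1, max_lag + 1)]

print(ar1_theoretical_acf(0.5, 4))
print(sample_acf([1.0, 2.0, 3.0, 4.0, 5.0], 2))
```

For an AR(1) the PACF, in contrast, should show a single spike at lag 1 and be close to zero afterwards.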


AR and MA-process continued


Example AR(1):

Example AR(2):


AR and MA-process continued


Example MA(1):

Example MA(2):


AR and MA-process continued

It is much more fun if you have AR and MA-terms in your model.

ARMA(1,1):

Another way to find the underlying process is to use information criteria like AIC and BIC (also called SIC). They are part of the standard output in Eviews but not in Stata (calculating them by hand is a lot of fun), e.g. start with AR(0), then AR(1), AR(2), ... and calculate the information criteria for each model. A possible trick: use estat ic after the estimation.
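If you do want to calculate the criteria by hand, the usual definitions are AIC = -2 lnL + 2k and BIC = -2 lnL + k ln(N), with lnL the log-likelihood, k the number of estimated parameters and N the number of observations. A minimal Python sketch follows; the log-likelihood values in it are made up purely for illustration.

```python
import math

def aic(loglik, k):
    """Akaike information criterion: -2 lnL + 2k."""
    return -2.0 * loglik + 2.0 * k

def bic(loglik, k, n):
    """Bayesian (Schwarz) information criterion: -2 lnL + k ln(N)."""
    return -2.0 * loglik + k * math.log(n)

# Hypothetical results: model name -> (log-likelihood, number of parameters)
candidates = {"AR(1)": (-100.0, 3), "AR(2)": (-99.5, 4)}
n_obs = 248  # assumed number of observations

scores = {name: aic(ll, k) for name, (ll, k) in candidates.items()}
best = min(scores, key=scores.get)  # smaller is better
print(scores, best)
```

The model with the smallest value is preferred; AIC and BIC can disagree because BIC punishes extra lags harder.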


Box-Jenkins-method

This was the first systematic approach to time-series modeling. It includes 4 steps:
1. Model identification = test for stationarity, use the ACF and PACF or information criteria to find the right model
2. Model estimation = run the regressions, get the residuals
3. Model checking = use the residuals to check if they are white noise (graph, Q-tests and more)
(4. Forecasting - see appendix)


Unit root tests

If a time series is stationary, regression results are not spurious. This means most of the time we want the series to be stationary (not needed if you do error-correction models). The problem is that most macroeconomic time series like GDP, unemployment, trade and many more are non-stationary (= contain a unit root): they do not go back to their mean, and the variance is not constant (it actually increases over time).

More formally, a series is stationary when the errors satisfy:
1. $E(\varepsilon_t) = 0$
2. $var(\varepsilon_t) = \sigma^2$, i.e. the variance is constant
3. $E(\varepsilon_t \varepsilon_{t-1}) = 0$, i.e. the error terms are not (serially) correlated

In other words: the errors are well-behaved or white noise. A non-stationary time series has the opposite properties!


Unit root tests continued

Or, if we use $y_t$ instead, a time series is stationary when:
1. $E(y_t) = \mu$, i.e. the mean is constant and does not depend on time
2. $E[(y_t - \mu)(y_{t-j} - \mu)] = \gamma_j$, i.e. the autocovariance is independent of time too!

This means we have to test for non-stationarity, which is done using unit root tests; the most common one is the Dickey-Fuller test.

To make a non-stationary time series stationary, we can do the following:
1. take the first differences
2. or detrend the time series (we will not do this today)


Unit root tests continued

The Dickey-Fuller test (or augmented Dickey-Fuller test, if lagged differences are included) uses the following test regressions:

1. $\Delta y_t = \delta y_{t-1} + \varepsilon_t$
note: $\Delta y_t = y_t - y_{t-1}$, $\delta = (\phi_1 - 1)$ with $\phi_1$ the AR(1) coefficient
if the time series is flat (no trend) and potentially slow-turning around zero

2. $\Delta y_t = \alpha + \delta y_{t-1} + \varepsilon_t$
if the series is flat and potentially slow-turning around a non-zero value (or has a drift, intercept = $\alpha$)

3. $\Delta y_t = \alpha + \delta y_{t-1} + \beta T + \varepsilon_t$
if the series has a trend T (up or down) and a drift (intercept), or is slow-turning around a trend line you would draw through the data

The DF-test has its own test statistics (critical values), and we want to reject $H_0: \delta = 0$ for stationarity. In other words, if we cannot reject $H_0$, the series is non-stationary and has to be first differenced.
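The first test regression (no constant, no trend) is just an OLS regression of the first difference on the lagged level. A minimal Python sketch of that regression is given below; the function name df_regression and the short data series are made up for illustration, and the sketch does not include the Dickey-Fuller critical values.

```python
import math

def df_regression(y):
    """OLS of (y_t - y_{t-1}) on y_{t-1} without a constant.

    Returns (delta_hat, t_stat) for the coefficient on y_{t-1}.
    """
    lagged = y[:-1]
    dy = [y[t] - y[t - 1] for t in range(1, len(y))]
    sxx = sum(x * x for x in lagged)
    delta = sum(x * d for x, d in zip(lagged, dy)) / sxx
    resid = [d - delta * x for x, d in zip(lagged, dy)]
    s2 = sum(e * e for e in resid) / (len(dy) - 1)  # one estimated parameter
    return delta, delta / math.sqrt(s2 / sxx)

series = [4.0, 2.0, 1.5, 0.5, 0.5, 0.0]  # made-up numbers
delta_hat, t_stat = df_regression(series)
print(delta_hat, t_stat)
```

The t-statistic has to be compared with the Dickey-Fuller critical values (as reported by Stata), not with the usual t-table.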


Unit root tests continued

How do we choose the lag-length p for the DF-test? Schwert (1989) suggests the following rule of thumb:

$p_{max} = 12 \left( \frac{T}{100} \right)^{1/4}$

where T = number of periods, e.g. years, quarters
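As a quick worked example of the rule of thumb, here is a small Python sketch. Taking the integer part of the result and the sample size of 248 quarters (1947q1 to 2008q4) are assumptions made for this illustration.

```python
def schwert_max_lag(T):
    """Schwert's rule of thumb: integer part of 12 * (T / 100) ** (1/4)."""
    return int(12 * (T / 100) ** 0.25)

# 62 years of quarterly data (assumed: 1947q1 to 2008q4)
print(schwert_max_lag(248))
# 100 observations give exactly 12 lags
print(schwert_max_lag(100))
```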

Why should we care? If p (1) is too small, some serial correlation can remain in the errors and bias the test; (2) is too large, the power of the test will suffer.

Another test for unit roots is suggested by Phillips and Perron (= PP); it corrects for serial correlation and heteroskedasticity in the errors. Both the ADF and the PP-test are not very helpful if the series is close to being stationary.

Kwiatkowski, Phillips, Schmidt and Shin (1992) suggest a test for stationarity, the so-called KPSS-test, with H0 = the series is stationary.


Unit root tests continued

There are more tests out there, but in general it is not enough to use the Dickey-Fuller test only. Usually you use some more tests to be confident about your time series.


Short Stata review

Remember, a command in Stata has the following structure:
[command] variable, options

We used gen for generating new variables, e.g. gen lgdp=log(gdp) to generate the log of GDP.

Remember: if you want to have the residuals after a regression, use predict.


Finding the proper model - Step 1

We will work with quarterly GDP data first.
1. set mem 50m
2. load gdp.dta (use gdp.dta)
3. Stata needs to know it is a time series.
3.1. generate a time variable: gen time=tq(1947q1)+_n-1
3.2. give it the right format: format time %tq
3.3. tell Stata about it: tsset time
4. graph the series: tsline gdp
5. generate the log: gen lgdp=log(gdp) and graph it again: tsline lgdp


Finding the proper model - Step 1

Let us play around with the ACF (= ac) and PACF (= pac); lgdp is the variable, and the option sets the lag-length:
1. ac lgdp, lags(10)
2. pac lgdp, lags(10)
or
3. corrgram lgdp, lags(10)
What do we see? Do it again for 20 lags.

Let us do the same for the first-difference version of lgdp. There are two ways:
1. generate a new variable: gen flgdp=D.lgdp
or
2. ac D.lgdp


Finding the proper model continued - Step 2

Assume an AR(1)-model is okay for the log of real GDP. We should run the following regression:
reg lgdp L.lgdp

note:
Stata uses L for a lag, L2 for two lags, L3 for three lags
Stata uses D for taking the first difference
Stata uses F if you have to forward your series, sometimes called a lead
This is pretty convenient, because you can use these for generating new variables too.


Finding the proper model continued - Step 3

If the AR(1) model is the proper one, the errors should be white noise. There are a couple of ways to test for it:
1. graph the errors
2. do a Breusch-Godfrey test for serial correlation
3. do a Q-test, called white-noise test (or portmanteau test)
Note: The Box-Pierce test is not very common anymore, due to its poor performance in small samples.


Finding the proper model continued - Step 3

1. Graphing the errors
To get the residuals after the regression: predict res, resid
Stata will save the errors in res. There are two ways to graph them:
1.1. tsline res plots them against time; there should be no pattern over time
1.2. plot the residuals against past residuals, and there should be no pattern again!
reg res L.res, beta
twoway (scatter res L.res)


Finding the proper model continued - Step 3

2. Breusch-Godfrey test
Again after the regression, do the following (no need for predicting errors): estat bgodfrey, lags(10)
H0 = no serial correlation; if we reject it, then the errors are correlated and not white noise!

3. White-noise test
Run the regression, predict the errors (saved in res above) and do the following: wntestq res, lags(10)
H0 = no serial correlation; if we reject it, then the errors are correlated and not white noise!
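To see what a Q-test adds up, here is a minimal Python sketch of the Ljung-Box form of the statistic, Q = n(n+2) * sum of r_k^2 / (n-k) over the lags k = 1, ..., m. That this is the exact variant Stata reports is an assumption on my side, and the function name ljung_box_q is made up.

```python
def ljung_box_q(resid, max_lag):
    """Ljung-Box Q statistic built from the first max_lag autocorrelations."""
    n = len(resid)
    mean = sum(resid) / n
    dev = [e - mean for e in resid]
    c0 = sum(d * d for d in dev)
    total = 0.0
    for k in range(1, max_lag + 1):
        r_k = sum(dev[t] * dev[t - k] for t in range(k, n)) / c0
        total += r_k ** 2 / (n - k)
    return n * (n + 2) * total

# Strongly alternating "residuals" are clearly not white noise,
# so Q comes out large relative to a trending-free random series.
print(ljung_box_q([1.0, -1.0, 1.0, -1.0, 1.0, -1.0], 1))
```

Under H0 the statistic is approximately chi-squared distributed with max_lag degrees of freedom, so a large Q speaks against white noise.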


Unit root tests


Unit root tests are pretty straightforward in Stata: load the quarterly data for defense spending (ds.dta) and generate the log of defense spending (ds): gen lds=log(ds)

1. (Augmented) Dickey-Fuller tests
Case 1: no constant, no trend term: dfuller lds, noconstant
Case 2: constant, no trend: dfuller lds
Case 3: constant, trend: dfuller lds, trend
Options:
4. include lags for the ADF: dfuller lds, lags(10) includes 10 lags
5. if you need the regression output: dfuller lds, regress


Unit root tests continued

2. Phillips-Perron test
If we do not specify a lag-length, the PP-test picks one from a rule of thumb based on the sample size (similar in spirit to Schwert's rule). Options are similar to dfuller.
pperron lds
Remember: H0 = non-stationary

3. KPSS-test
kpss lds
Type help kpss into Stata; the options are a bit different. (kpss is a user-written command; if Stata does not find it, ssc install kpss should fetch it.)
Remember: H0 = stationary. If we reject the null, then the series is non-stationary. Stata gives you the test values for different lag-lengths.


ARIMA in Stata
We focused on AR-processes using OLS so far, but a more powerful command is: arima
Arima estimation is a maximum likelihood estimation. Remember the notation is in general Arima(p,I,q), where I = order of integration, e.g. I=0: the series is already stationary; I=1: you have to take the first differences first.

Examples:
arima ds, ar(1)
AR(1) for defense spending (ds)
arima ds, arima(1,0,0)
still AR(1), treating ds as already stationary (no first-differencing)
arima D.ds, ar(1) = arima ds, arima(1,1,0)
first-difference version of AR(1) on ds
arima ds, ma(1) = arima ds, arima(0,0,1)
would be an MA(1)-process for ds
arima ds, ar(1) ma(1) = arima ds, arima(1,0,1)
would have an AR(1) and an MA(1) component

To get the AIC and BIC for the models, use the following command after the estimation: estat ic


ARIMA in Stata continued

Residual test: to test the residuals for autocorrelation, proceed as before (but bgodfrey will not work), e.g. predict the residuals and graph them, do a white-noise test (wntestq res) or, if you like, a Durbin-Watson statistic (dwstat res), which should be around 2.


Forecasting

There are different types of forecasting after a regression. We can do an in-sample forecast (using the quarters given) or an out-of-sample forecast (adding quarters). I will do it for the arima command (OLS is a bit different).

Remember: To check the quality of your forecast, you need to calculate the root mean square error (RMSE). The RMSE uses the forecast error (actual observation minus the forecast), and the formula is the following:

$RMSE = \sqrt{\frac{\sum_t (Y_t - forecast_t)^2}{N}}$

Example AR(1)-model: arima fgdp, ar(1)
Do a one-step ahead forecast: predict fgdp1, y
Compare the actual value with the forecast: tsline fgdp fgdp1


Forecasting continued

Calculate the RMSE:
1. Generate the forecast error: gen ferr=fgdp-fgdp1
2. Generate the square of the forecast error: gen ferr2=ferr^2
3. Get the mean of the squared errors: sum ferr2 (here 0.0040)
4. Use it to compute the RMSE: display "rmse: " (0.0040)^.5
Note: there are more ways to measure forecast accuracy.
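The four steps above boil down to one line of arithmetic. A minimal Python sketch, with a made-up function name rmse and made-up example numbers (only the 0.0040 is taken from the slide):

```python
import math

def rmse(actual, forecast):
    """Root mean square error: sqrt of the mean squared forecast error."""
    squared = [(a - f) ** 2 for a, f in zip(actual, forecast)]
    return math.sqrt(sum(squared) / len(squared))

# Made-up actual values and forecasts
print(rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))

# Step 4 from the slide: mean squared error of 0.0040
print(round(math.sqrt(0.0040), 4))
```

A mean squared error of 0.0040 therefore corresponds to an RMSE of roughly 0.063.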


Forecasting continued

A dynamic forecast could be done as follows: predict fgdpd, xb dynamic(.)
Plot the actual value and the forecast: tsline fgdp fgdpd

Out-of-sample forecast
Run the regression, but then you have to extend the time horizon first: tsappend, add(24) adds 24 quarters to the quarterly data set we have. Then use the predict command for one-step ahead or dynamic forecasts.
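The difference between a one-step ahead and a dynamic forecast is easiest to see for an AR(1): the one-step forecast always uses the last observed value, the dynamic forecast feeds its own forecasts back in. A minimal Python sketch with made-up coefficients and function names:

```python
def one_step_forecasts(y, alpha, phi):
    """Forecast each y_t from the OBSERVED previous value y_{t-1}."""
    return [alpha + phi * prev for prev in y[:-1]]

def dynamic_forecasts(last_value, alpha, phi, horizon):
    """Forecast recursively: each step uses the PREVIOUS FORECAST as input."""
    path = []
    prev = last_value
    for _ in range(horizon):
        prev = alpha + phi * prev
        path.append(prev)
    return path

# Assumed AR(1) with alpha = 0 and phi = 0.5
print(one_step_forecasts([8.0, 6.0, 2.0], 0.0, 0.5))
print(dynamic_forecasts(8.0, 0.0, 0.5, 3))
```

For a stationary AR(1) the dynamic forecast moves step by step towards the mean of the series, which is why long out-of-sample forecasts look flat.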


Forecasting continued

A simple linear OLS forecast (do not ask me about the dynamic one; the same command as above does not work. There should be a way to compute it manually in Stata):
reg fgdp L.fgdp
predict fgdp1 (Stata assumes the option xb anyway in this case)
tsline fgdp fgdp1

What else could be done? There is much more out there, e.g. rolling forecasts, or comparing forecasts of different models, e.g. AR(1) with AR(2), and so on.


How to create the first difference of a series

The simplest way in Stata: let gdp be in levels, and we want to create the first difference:
gen fgdp=D.gdp (same as: $y_t - y_{t-1}$)
D2 would be $(y_t - y_{t-1}) - (y_{t-1} - y_{t-2})$
As you have seen above, in a regression you can use D, F and L in front of a variable without generating a new variable first!
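A small Python sketch of the same arithmetic, using a made-up series. Note one difference: Stata keeps the first observation(s) as missing values, while the list in this sketch simply gets shorter.

```python
def diff(y):
    """First difference: y_t - y_{t-1} for every t after the first."""
    return [y[t] - y[t - 1] for t in range(1, len(y))]

y = [1.0, 4.0, 9.0, 16.0, 25.0]  # made-up series
d1 = diff(y)          # corresponds to D.y
d2 = diff(diff(y))    # corresponds to D2.y, the difference of the difference

# D2 written out: (y_t - y_{t-1}) - (y_{t-1} - y_{t-2})
d2_direct = [y[t] - 2 * y[t - 1] + y[t - 2] for t in range(2, len(y))]
print(d1, d2, d2_direct)
```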


Setting the time


In our examples we had quarterly data. What if you have annual, monthly, weekly or daily data?

annual data
gen time=1947+_n-1
tsset time

monthly data
gen time=tm(1962m2)+_n-1
format time %tm
tsset time

weekly data
gen time=tw(1962w1)+_n-1
format time %tw
tsset time

daily data
gen time=td(1apr1962)+_n-1
format time %td
tsset time

Note: _n is the observation number, so start+_n-1 equals the start date for the first observation and moves one period forward with each following observation.
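What happens behind tq(1947q1)+_n-1 can be sketched in Python. Stata stores a quarterly date as the number of quarters since 1960q1 (so 1960q1 = 0); the helper names tq and format_tq below only mimic that idea and the sample length of 248 is an assumption.

```python
def tq(year, quarter):
    """Quarters elapsed since 1960q1, mimicking Stata's quarterly date code."""
    return (year - 1960) * 4 + (quarter - 1)

def format_tq(code):
    """Turn a quarterly date code back into a label like 1947q1."""
    year_offset, q = divmod(code, 4)
    return f"{1960 + year_offset}q{q + 1}"

n_obs = 248  # assumed: quarterly observations from 1947q1 to 2008q4
# Same logic as gen time=tq(1947q1)+_n-1, with _n running from 1 to n_obs
time = [tq(1947, 1) + n - 1 for n in range(1, n_obs + 1)]
print(time[0], format_tq(time[0]), format_tq(time[-1]))
```

The format command in Stata only changes how these integers are displayed, not the stored values.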


How to detrend a series? And how to choose the time horizon?


1. Detrending
Sometimes you want to detrend a series, e.g. because there is a trend present, or because, compared to taking the first difference, you save one observation. Imagine you only have 20 years of annual observations.
Steps:
create a trend variable, i.e. a variable increasing with time: gen trend = _n+1
regress your variable of interest on a constant and the trend: reg lgdp trend
use the residuals for the fun stuff you want to do!

2. Choosing the time horizon
There are a couple of ways, e.g. use only the observations starting with 1980 or so, but one neat command is tin (= time in):
reg D.lgdp D2.lgdp if tin(1947q1,1965q4)
so that the observations run from January 1947 (first quarter) to December 1965 (fourth quarter).
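The detrending steps can be sketched in a few lines of Python: regress the series on a constant and a linear trend by OLS and keep the residuals. The function name detrend and the example series are made up for illustration.

```python
def detrend(y):
    """Residuals from an OLS regression of y on a constant and a linear trend."""
    n = len(y)
    trend = [i + 1 for i in range(1, n + 1)]  # same idea as gen trend = _n+1
    t_bar = sum(trend) / n
    y_bar = sum(y) / n
    slope = (
        sum((t - t_bar) * (v - y_bar) for t, v in zip(trend, y))
        / sum((t - t_bar) ** 2 for t in trend)
    )
    intercept = y_bar - slope * t_bar
    return [v - (intercept + slope * t) for t, v in zip(trend, y)]

series = [2.0, 2.6, 2.9, 3.7, 4.0, 4.4]  # made-up upward-trending data
residuals = detrend(series)
print(len(series), len(residuals))
```

Unlike first differencing, the detrended series has as many observations as the original one.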

