Time Series

WESS Time Series Lectures
Alexander Karalis Isaac
July, 2015
Alexander Karalis Isaac (Warwick) Time Series July, 2015 1 / 90

Econometrics so far
We have studied the regression equation
$y_i = \alpha + \beta x_i + \varepsilon_i, \quad i = 1, \dots, n$
where i indicates an individual in a sample of n observations.

We have been interested in
The effect of x on y: $dy/dx = \beta$
The predicted value of y, given x: $E[y_i \mid x_i] = \alpha + \beta x_i$
The fit of the model, e.g. the $R^2$ statistic
Can we do the same with a pair of time series?
$y_t = \alpha + \beta x_t + \varepsilon_t$
NO!
(Or rather, only under special circumstances, and such a regression is only
ever part of the answer!)

Two problems in time series (1)
With data $y_t, x_t$ the critical assumption $E[\varepsilon_t \mid x_t] = 0$ is difficult to
maintain. The y and x are often simultaneously determined in a system,
‘the economy’:
$y_t = \alpha_y + \beta_y x_t + \varepsilon_{yt}$
$x_t = \alpha_x + \beta_x y_t + \varepsilon_{xt}$
Consider the regression
$y_t = a + b x_t + e_t = a + b(\alpha_x + \beta_x y_t + \varepsilon_{xt}) + e_t$
The regression error $e_t$ is an estimate of the $y_t$ error $\varepsilon_{yt}$. It will be
correlated with the regressor $x_t$, because this regressor contains $y_t$ and
so contains $\varepsilon_{yt}$.

Look at the estimates from a regression $y_t = \alpha + \beta x_t + \varepsilon_t$
param   estimate   t-value
a       7.9        (139)
b       2.9        (15.6)
R²      0.75       -
Does this look like a good regression?


Here $y_t$ is U.S. output and $x_t$ is mean land-sea temperature.

Now what do you think about the regression?


This is not a regression about climate change!

It is called the spurious regression problem.
It is easy to do crap regressions with time series data!

Overview
In Part I our models will look like
$y_t = \alpha + \beta y_{t-1} + \varepsilon_t$ or $y_t = \alpha + \beta \varepsilon_{t-1} + \varepsilon_t$
so the explanatory variable is replaced by the previous value of the
dependent variable, or by previous errors.
Our primary interest will be prediction, $E[y_t \mid y_{t-1}] = \alpha + \beta y_{t-1}$.

In Part II we will learn how to estimate dynamic relationships
$y_t = \alpha_0 + \alpha_1 y_{t-1} + \beta_0 x_t + \beta_1 x_{t-1} + \varepsilon_t$
in a way that eliminates the two common problems with time series
regressions.

Time series data
Our sample data $\{y_t\}$, $\{x_t\}$ refer to observations on the same unit in
sequential time periods, $t = 1, \dots, T$
Periods may be years, quarters, months, weeks or days, depending on
interest and data availability. Typical frequencies by field:
growth    years, quarters
macro     years, quarters, months
finance   months, weeks, days

Time series data
There are three main types of time series data

Mean reverting (‘stationary’) series
Series with a trend (‘trend stationary’ series)
Series with permanent shocks (‘integrated’ series)
Part I of these notes deals with stationary series
Part II looks at testing for permanent shocks and dealing with Integrated
series

Mean reverting series

Trend stationary series

Integrated series

Know your data
The data file ‘macro vars.xls’ contains lots of U.S. data series
GDP is a good example
Plot U.S. GDP
What is the first thing to do to this data?

Take logs
The first thing we don’t like is the exponential shape.
Our regressions are linear regressions, so let’s transform the data to make it
more linear.
Generate ln(GDP) and plot it.

Logs
Look at your data. For data in levels, taking logs is often the first
step in applied work
Not if the data is already in % changes, or an interest rate!
Logarithms make percentage changes comparable by eye, which is
often more relevant.
Recall $g = (x_t - x_{t-1})/x_{t-1} \Rightarrow 1 + g = x_t/x_{t-1}$, so then
$\ln(x_t) - \ln(x_{t-1}) = \ln(1 + g) \approx g$ for small $g$.
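A quick numerical check of the approximation $\ln(1 + g) \approx g$. The growth rates below are arbitrary illustrations; the gap between g and the log difference is negligible at 1% and grows with g.

```python
import math


def log_growth(x_prev: float, x_now: float) -> float:
    """Log difference ln(x_now) - ln(x_prev), which equals ln(1 + g)."""
    return math.log(x_now) - math.log(x_prev)


# Compare the exact growth rate g with the log difference for a few values of g
for g in (0.01, 0.05, 0.20, 0.50):
    d = log_growth(100.0, 100.0 * (1 + g))
    print(f"g = {g:.2f}   log difference = {d:.4f}   gap = {g - d:.4f}")
```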

Difference
If the variable appears to be trending, as with $\ln GDP_t$, a safe thing to
do is take differences.
Generate the difference of ln(GDP) and plot it.

What is the main difference compared to previous plots?
This now looks mean reverting. Later we will test formally whether a
series is mean reverting or integrated, but don’t forget it’s always
sensible to start by looking at your data.
$\Delta y_t = y_t - y_{t-1}$ defines the difference operator $\Delta$.

Know your data: prices
Consider the price data. Look at

levels
log levels
difference of logs (inflation)
the difference of inflation
Which would you be happy to consider mean reverting?

Part I: Modelling stationary time series
So far I have talked about mean reverting series, but we can be more
precise
We will model variables which are covariance stationary (stationary
for short)
The mean exists and does not depend on time, $E[y_t] = \mu$ for all t
(quick notation $\forall t$).
The variance exists and is independent of time, $var(y_t) = \sigma_y^2$.
The autocovariance, $cov(y_t, y_{t-k}) = \sigma_k^2$, is independent of time: it
depends only on k and not on t

Modelling stationary time series: assumptions on errors
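Any model of a stationary series imposes two key assumptions on the
errors
A1: $E[\varepsilon_t] = E[E[\varepsilon_t \mid y_{t-1}]] = 0$
A2: $E[\varepsilon_t \varepsilon_{t-s}] = 0 \ \forall s > 0$
There are also some technical assumptions
A3: $E[\varepsilon_t^2] = var(\varepsilon_t) = \sigma_\varepsilon^2$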
A4: yt and yt−j become independent as j gets large
A5: Very large outliers are unlikely
These assumptions apply to the true model, and we have to replicate them
in our statistical model.

Discussion of assumptions
A1 This tells us that $\varepsilon_t$ is unpredictable given information about $y_{t-1}$,
available before period t begins. In the regression context, we require
that $\varepsilon_t$ is unpredictable given all our r.h.s. variables. We use
predetermined data $y_{t-1}, y_{t-2}, \varepsilon_{t-1}, \varepsilon_{t-2}, \dots$
A2 In cross-sections, this is a second order assumption determining the
standard error of b.
In time series it is a first order assumption, determining the
consistency of b. See exercise.
A3 Allows us to make calculations about variances, including confidence
intervals around parameter estimates and forecasts. It is implied by
stationarity

Discussion of assumptions
A4 This is a technical requirement to derive limiting behaviour of
estimators. It replaces the i.i.d. assumption in cross-sectional data
A5 This says that our models are not suitable for certain types of very
wild randomness. You should worry about this if you do
high-frequency finance, but it’s generally not a problem with
macroeconomic data.
In applied work, spurious regressions and models with wrong/insufficient
dynamics tend to violate A2, so checking it is key. Also, check A2 if you
are evaluating someone else’s work!

Stationary time series: AR(1) model
Our first time series model for stationary data
$y_t = \alpha + \beta y_{t-1} + \varepsilon_t \qquad (1)$
This replaces independent explanatory data with the past value of the
dependent variable.
Models the correlation between yt and its own past
Stationarity requires |β| < 1
Then the influence of past shocks dies away smoothly
Estimate the model by OLS

The AR(1) estimator
The AR(1) regression is like a standard OLS regression

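$$b = \frac{\sum_{t=2}^{T} (y_t - \bar{y})(y_{t-1} - \bar{y})}{\sum_{t=2}^{T} (y_{t-1} - \bar{y})^2} \approx \frac{cov(y_t, y_{t-1})}{var(y_t)}$$
$$a = \bar{y} - b\bar{y} \Rightarrow \bar{y} = \frac{a}{1 - b}$$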
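A minimal numpy sketch of the slide's formulas, applied to a simulated AR(1). The parameter values (α = 1, β = 0.6) and the use of simulated rather than course data are assumptions for illustration. Because the slide demeans both $y_t$ and $y_{t-1}$ with the full-sample mean $\bar{y}$, this is a close approximation to, not identical with, the regression on a constant and $y_{t-1}$.

```python
import numpy as np


def ar1_ols(y):
    """AR(1) estimates using the slide's formulas with the full-sample mean."""
    y = np.asarray(y, dtype=float)
    ybar = y.mean()
    dev_now = y[1:] - ybar     # y_t - ybar, t = 2..T
    dev_lag = y[:-1] - ybar    # y_{t-1} - ybar, t = 2..T
    b = np.sum(dev_now * dev_lag) / np.sum(dev_lag ** 2)
    a = ybar - b * ybar
    implied_mean = a / (1 - b)  # recovers ybar by construction
    return a, b, implied_mean


# Simulated AR(1) with alpha = 1, beta = 0.6 (illustrative values)
rng = np.random.default_rng(1)
T = 5000
y = np.empty(T)
y[0] = 1.0 / (1 - 0.6)          # start at the unconditional mean alpha / (1 - beta)
for t in range(1, T):
    y[t] = 1.0 + 0.6 * y[t - 1] + rng.standard_normal()

a_hat, b_hat, mu_hat = ar1_ols(y)
```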

The AR(1) estimator
Variance of b follows standard OLS theory

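$$var(b) = \hat{\sigma}_\varepsilon^2 \left( \sum_{t=2}^{T} (y_{t-1} - \bar{y})^2 \right)^{-1} \approx \frac{\hat{\sigma}_\varepsilon^2}{(T - 1)\,\widehat{var}(y_t)}$$
$$\text{where } \hat{\sigma}_\varepsilon^2 = \frac{1}{T - 1 - k} \sum_{t=2}^{T} \hat{\varepsilon}_t^2$$
and k is the number of estimated coefficients (k = 2 for the AR(1) with a constant).
Note we lose an extra DoF for every lag we include in the autoregression
Hypothesis testing as usual, given |b| < 1: compare
$\tau = (b - b_{H_0})/SE(b)$ with the critical value $t_{\alpha/2, DoF}$
Expect low $R^2$ compared to cross-sectional data.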
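A sketch of the standard error and t-test for b, assuming a simulated AR(1) with β = 0.5 (illustrative). It runs exact OLS on a constant and $y_{t-1}$, so the sum of squares in var(b) is taken around the mean of the lagged series rather than the slide's full-sample $\bar{y}$; the two differ only slightly in practice. Critical values still come from t tables, which the code does not compute.

```python
import numpy as np


def ar1_ols_inference(y, b_null=0.0):
    """OLS of y_t on a constant and y_{t-1}: coefficients, SE(b), t-statistic, R^2."""
    y = np.asarray(y, dtype=float)
    y_now, y_lag = y[1:], y[:-1]
    n = y_now.size                       # T - 1 usable observations
    X = np.column_stack([np.ones(n), y_lag])
    coef, *_ = np.linalg.lstsq(X, y_now, rcond=None)
    resid = y_now - X @ coef
    k = X.shape[1]                       # number of estimated coefficients (a and b)
    sigma2 = resid @ resid / (n - k)     # divisor T - 1 - k
    se_b = np.sqrt(sigma2 * np.linalg.inv(X.T @ X)[1, 1])
    tau = (coef[1] - b_null) / se_b      # compare with t critical values, given |b| < 1
    r2 = 1 - resid @ resid / np.sum((y_now - y_now.mean()) ** 2)
    return coef, se_b, tau, r2


# Illustrative simulated AR(1) with alpha = 0 and beta = 0.5
rng = np.random.default_rng(2)
y = np.zeros(400)
for t in range(1, 400):
    y[t] = 0.5 * y[t - 1] + rng.standard_normal()

coef, se_b, tau, r2 = ar1_ols_inference(y)
```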

Example
Look at the series for $\Delta \ln(GDP)$ and do an AR(1) estimation
param   estimate   t-value
a
b
R²                 -
Plot the residuals of the regression. Do you think they meet A1 - A3?
We will look at formal tests for these assumptions below.

General AR(p) model
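One lag of $y_t$ may not be enough: an omitted variable bias
This shows up as $E[\varepsilon_t \varepsilon_{t-s}] \neq 0$
We find a model with enough lags to ensure $E[\varepsilon_t \varepsilon_{t-s}] = 0 \ \forall s$
$y_t = \alpha + \beta_1 y_{t-1} + \beta_2 y_{t-2} + \dots + \beta_p y_{t-p} + \varepsilon_t$
Post-estimation approximate F-test, Breusch-Godfrey test
$\hat{\varepsilon}_t = b_1 \hat{\varepsilon}_{t-1} + \dots + b_q \hat{\varepsilon}_{t-q} + \nu_t$
$H_0: b_1 = b_2 = \dots = b_q = 0$
$H_A: b_i \neq 0$ for some i
$$\tau = \frac{(RSS_R - RSS_U)/q}{RSS_U/DoF} \sim F_{q, DoF}, \qquad nR^2 \sim \chi^2_q \text{ (asymptotically)}$$
Inference on $\hat{\beta}_i$ as in standard multivariate OLS models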
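A sketch of the $nR^2$ (LM) form of the Breusch-Godfrey test in numpy. It follows the common textbook version in which the auxiliary regression also includes the original regressors, and it pads the first missing lagged residuals with zeros; both are conventions assumed here. The example deliberately fits an AR(1) to data simulated from an AR(2), so A2 should fail.

```python
import numpy as np


def breusch_godfrey(resid, X, q):
    """LM form of the Breusch-Godfrey test: n * R^2 from an auxiliary regression.

    resid : residuals from the original OLS regression
    X     : regressor matrix of that regression (including the constant column)
    q     : number of lagged residuals tested
    Initial missing lags are set to zero, a common convention.
    """
    resid = np.asarray(resid, dtype=float)
    n = resid.size
    lagged = np.zeros((n, q))
    for j in range(1, q + 1):
        lagged[j:, j - 1] = resid[:-j]       # column j-1 holds resid_{t-j}
    Z = np.column_stack([X, lagged])
    coef, *_ = np.linalg.lstsq(Z, resid, rcond=None)
    u = resid - Z @ coef
    r2 = 1 - (u @ u) / np.sum((resid - resid.mean()) ** 2)
    return n * r2                             # approx chi-squared(q) under H0


CHI2_4_CRIT_5PCT = 9.488  # 5% critical value of chi-squared with 4 df

# Illustration: fit an AR(1) to data simulated from an AR(2)
rng = np.random.default_rng(3)
T = 1000
y = np.zeros(T)
for t in range(2, T):
    y[t] = 0.2 * y[t - 1] + 0.5 * y[t - 2] + rng.standard_normal()
X = np.column_stack([np.ones(T - 1), y[:-1]])
b, *_ = np.linalg.lstsq(X, y[1:], rcond=None)
stat = breusch_godfrey(y[1:] - X @ b, X, q=4)
print(stat > CHI2_4_CRIT_5PCT)   # True means reject H0 of no serial correlation
```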

Model selection strategy
Should we begin small and add lags until A2 holds?

NO!: Don’t base your model selection algorithm on starting from
models that don’t make any statistical sense
Start big and eliminate insignificant regressors, to find the smallest
model for which A2 still holds
Often start with p = f + 1, where f = number of observations per year.
Quarterly example
Begin with p = 5
Re-estimate excluding the insignificant longer lags
Check $E[\varepsilon_t \varepsilon_{t-s}] = 0$, s = 1, ..., 4
Repeat until the model contains only significant terms
D. Hendry’s ‘PcGets’ software automates this
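A simplified sketch of the "start big and test down" idea for the quarterly example (p = f + 1 = 5), using simulated AR(2) data as an assumption. It only drops lags from the end while the longest lag has |t| < 1.96, each AR(p) is fitted on its own slightly different sample, and it does not re-check A2; in applied work you would still run the Breusch-Godfrey check on the final model.

```python
import numpy as np


def fit_ar(y, p):
    """OLS fit of an AR(p) with a constant; returns coefficients and t-statistics."""
    y = np.asarray(y, dtype=float)
    Y = y[p:]
    lags = [y[p - j:-j] for j in range(1, p + 1)]      # y_{t-1}, ..., y_{t-p}
    X = np.column_stack([np.ones(Y.size)] + lags)
    coef, *_ = np.linalg.lstsq(X, Y, rcond=None)
    resid = Y - X @ coef
    sigma2 = resid @ resid / (Y.size - X.shape[1])
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
    return coef, coef / se


def general_to_specific(y, p_max, t_crit=1.96):
    """Drop the longest lag while it is insignificant, starting from AR(p_max)."""
    p = p_max
    while p > 1:
        _, tstats = fit_ar(y, p)
        if abs(tstats[-1]) >= t_crit:   # longest lag significant: stop here
            break
        p -= 1
    return p


# Quarterly example (f = 4), so start from p = f + 1 = 5; simulated AR(2) data
rng = np.random.default_rng(4)
y = np.zeros(2000)
for t in range(2, 2000):
    y[t] = 0.5 * y[t - 1] + 0.3 * y[t - 2] + rng.standard_normal()
p_hat = general_to_specific(y, p_max=5)
```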

Notes on examples
You do some examples: $\Delta GDP_t$, $\Delta Cons_t$, $\Delta Inv_t$, $\Delta Inf_t$

The MA(q) process
We noticed some series require very long AR models to capture all
the conditional correlation of the $y_t$ series.
This costs degrees of freedom, making estimates and forecasts less
accurate
Is there a smaller model which could capture the dependency that AR
models struggle with?

This is the moving average process

The MA(1) process
Equation:
$y_t = \alpha + \beta \varepsilon_{t-1} + \varepsilon_t$
Simple to analyse
Stationary for any β value
Harder to estimate: b determines $\{\hat{\varepsilon}_t\}_{t=1}^{T}$, but $\{\hat{\varepsilon}_t\}_{t=1}^{T}$ is the
regressor which determines b!
Solution: take an MLE approach (as in Probit)

MLE in the MA(1)
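$\varepsilon_t \mid y_{t-1} \sim N(0, \sigma_\varepsilon^2)$
$$f(y_t \mid y_{t-1}) = \frac{1}{\sqrt{2\pi}\,\sigma_\varepsilon} \exp\left(-\frac{(y_t - \alpha - \beta \varepsilon_{t-1})^2}{2\sigma_\varepsilon^2}\right)$$
$$l(\alpha, \beta, \sigma_\varepsilon^2) = \sum_t \ln f(y_t \mid y_{t-1})$$
Maximise $l(\cdot)$ w.r.t. $\alpha, \beta, \sigma_\varepsilon^2$
Technically this is also conditional on $\varepsilon_0$. A typical assumption is
$\varepsilon_0 = E[\varepsilon_t] = 0$, though there are other approaches.
Inference follows standard maximum likelihood procedure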
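A sketch of the conditional likelihood above, with $\varepsilon_0 = 0$. The shocks are recovered recursively, $\varepsilon_t = y_t - \alpha - \beta\varepsilon_{t-1}$, and $\sigma_\varepsilon^2$ is concentrated out at its MLE (the mean squared shock). A coarse grid search stands in for the numerical optimiser real software would use; the grid, the simulated data and the true values (α = 0.5, β = 0.6) are assumptions for illustration.

```python
import numpy as np


def ma1_conditional_loglik(y, alpha, beta):
    """Gaussian log-likelihood of an MA(1), conditional on eps_0 = 0.

    sigma^2 is concentrated out at its MLE, the mean squared shock.
    """
    eps_prev = 0.0
    sse = 0.0
    for y_t in y:
        eps_t = y_t - alpha - beta * eps_prev   # recover the shock recursively
        sse += eps_t ** 2
        eps_prev = eps_t
    T = len(y)
    sigma2 = sse / T
    loglik = -0.5 * T * (np.log(2 * np.pi) + np.log(sigma2) + 1)
    return loglik, sigma2


def ma1_mle_grid(y, alphas, betas):
    """Maximise the conditional log-likelihood over a grid of (alpha, beta)."""
    best = (-np.inf, None, None, None)
    for a in alphas:
        for b in betas:
            ll, s2 = ma1_conditional_loglik(y, a, b)
            if ll > best[0]:
                best = (ll, a, b, s2)
    return best  # (log-likelihood, alpha, beta, sigma^2)


# Simulated MA(1) with alpha = 0.5, beta = 0.6, sigma = 1 (illustrative values)
rng = np.random.default_rng(7)
T = 1000
eps = rng.standard_normal(T + 1)
y = 0.5 + 0.6 * eps[:-1] + eps[1:]

ll_hat, a_hat, b_hat, s2_hat = ma1_mle_grid(
    y, alphas=np.linspace(0.0, 1.0, 21), betas=np.linspace(-0.95, 0.95, 39)
)
```

Restricting the β grid to (−1, 1) keeps the search on invertible MA(1) models.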

Information criteria
The MLE approach suggests another tool for tackling model selection
Minimise the expected information loss across potential models
AIC: $-(2l(\hat{\theta}) - 2k)$, where k is the number of estimated parameters: choose the model with the lowest AIC
BIC: $-(2l(\hat{\theta}) - k\ln(T))$: choose the model with the lowest BIC
BIC generally chooses smaller models, unless you have small T (its penalty per parameter, ln(T), exceeds 2 once T > 7)
Combine insights from significance tests and info criteria to choose a
parsimonious model. Always check A2 holds!
Information criteria are also relevant for AR models, which can be
placed within MLE theory
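A sketch of AIC and BIC for AR(1) to AR(5), using the Gaussian log-likelihood of each OLS fit. Two assumptions: every model is fitted on the same sample (dropping the first p_max observations) so the criteria are comparable, and k counts the error variance as a parameter; some software omits it, which shifts every model by the same amount and leaves the ranking unchanged. Because the BIC penalty grows faster in k, the BIC choice can never be larger than the AIC choice.

```python
import numpy as np


def ar_information_criteria(y, p_max):
    """AIC and BIC for AR(1), ..., AR(p_max), all fitted on the same sample.

    Gaussian log-likelihood of the OLS fit, l = -n/2 (ln 2pi + ln(RSS/n) + 1),
    with k = p + 2 parameters (constant, p lag coefficients, error variance).
    """
    y = np.asarray(y, dtype=float)
    Y = y[p_max:]                      # common estimation sample for every p
    n = Y.size
    results = {}
    for p in range(1, p_max + 1):
        lags = [y[p_max - j:-j] for j in range(1, p + 1)]   # y_{t-1}, ..., y_{t-p}
        X = np.column_stack([np.ones(n)] + lags)
        coef, *_ = np.linalg.lstsq(X, Y, rcond=None)
        rss = np.sum((Y - X @ coef) ** 2)
        loglik = -0.5 * n * (np.log(2 * np.pi) + np.log(rss / n) + 1)
        k = p + 2
        results[p] = {
            "aic": -(2 * loglik - 2 * k),
            "bic": -(2 * loglik - k * np.log(n)),
        }
    return results


# Illustrative simulated AR(1) with beta = 0.6
rng = np.random.default_rng(8)
y = np.zeros(500)
for t in range(1, 500):
    y[t] = 0.6 * y[t - 1] + rng.standard_normal()

ic = ar_information_criteria(y, p_max=5)
best_aic = min(ic, key=lambda p: ic[p]["aic"])
best_bic = min(ic, key=lambda p: ic[p]["bic"])
```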

Notes on examples
You do some MA(q) examples: $\Delta GDP_t$, $\Delta Cons_t$, $\Delta Inv_t$, $\Delta Inf_t$

Model evaluation: forecast performance
If your job is forecasting, choose the model with the best forecasts!
- In-sample forecasts:
Estimation period is $1, \dots, T$; look at e.g. the 1-period ahead forecast
$E[y_{t+1} \mid y_t, \hat{\theta}_T]$
This is similar to in-sample fit, where we compare $\hat{y}_t$ with $y_t$, but now
we are doing it 1 period ahead.
- Out-of-sample forecasts:
Estimation period is $1, \dots, N$; look at the 1-period ahead forecasts
$E[y_{t+1} \mid y_t, \hat{\theta}_N]$ for $t = N, N+1, \dots, T-1$, i.e. up to the final data point $y_T$.
This is a tougher test, as none of the information in the forecast period
contributed to the parameter estimation.
A simple criterion: minimum mean squared error
$$MSE = \frac{1}{M} \sum_{i=1}^{M} (\hat{y}_i - y_i)^2$$
where $\hat{y}_i$ is the forecast, $y_i$ is the realisation and M is the number of forecasts.
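A sketch comparing in-sample and out-of-sample 1-step-ahead MSE for an AR(1), on simulated data (an assumption; swap in your own series). To make the two numbers comparable, both are evaluated over the same window $y_{N+1}, \dots, y_T$; the only difference is whether the parameters were estimated using that window.

```python
import numpy as np


def fit_ar1(y):
    """OLS of y_t on a constant and y_{t-1}; returns (a, b)."""
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones(y.size - 1), y[:-1]])
    coef, *_ = np.linalg.lstsq(X, y[1:], rcond=None)
    return coef[0], coef[1]


def one_step_mse(y, N):
    """MSE of 1-step-ahead AR(1) forecasts of y_{N+1}, ..., y_T.

    In-sample: parameters estimated on the full sample 1..T.
    Out-of-sample: parameters estimated on 1..N only.
    """
    y = np.asarray(y, dtype=float)
    target = y[N:]          # realisations y_{N+1}, ..., y_T (0-based slicing)
    origin = y[N - 1:-1]    # forecast origins y_N, ..., y_{T-1}
    a_T, b_T = fit_ar1(y)
    a_N, b_N = fit_ar1(y[:N])
    mse_in = np.mean((a_T + b_T * origin - target) ** 2)
    mse_out = np.mean((a_N + b_N * origin - target) ** 2)
    return mse_in, mse_out


# Illustrative simulated AR(1) with alpha = 0.5, beta = 0.7
rng = np.random.default_rng(9)
y = np.zeros(600)
for t in range(1, 600):
    y[t] = 0.5 + 0.7 * y[t - 1] + rng.standard_normal()
mse_in, mse_out = one_step_mse(y, N=400)
```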

Empirical Example: In-sample forecast comparisons
Compare 1-step ahead forecasts from 4-lag and preferred AR, MA models
Variable   Model   MSE   Model   MSE
∆GDP       AR(4)         MA(4)
           AR( )         MA( )
∆Cons      AR(4)         MA(4)
           AR( )         MA( )
∆Inv       AR(4)         MA(4)
           AR( )         MA( )
∆Inf       AR(4)         MA(4)
           AR( )         MA( )

ARMA(p,q) models
We can combine the forecasting power of AR and MA components
$y_t = \alpha + \beta_1 y_{t-1} + \dots + \beta_p y_{t-p} + \gamma_1 \varepsilon_{t-1} + \dots + \gamma_q \varepsilon_{t-q} + \varepsilon_t$
A1, A2, A3 apply for a well-specified model
Estimation is by maximum likelihood
Avoid large ARMAs: in practice ARMA(2,1) is often a good
approximation for macroeconomic time series.

Forecasts from an ARMA(2,1)
MSE by forecast horizon k
Variable   k=1   k=4   k=8
∆GDP
∆Cons
∆Inv
∆Inf
Estimate the model to 2005. From 2003q1, produce static 1-period ahead
forecasts up to 2005, then dynamic 4 and 8 period ahead forecasts also
from 2003q1
What happens to the MSE as the forecast horizon increases?

Out of sample forecast example
Now estimate the model to 2007 and repeat the process using dynamic
out of sample forecasting up to 2011
MSE by forecast horizon k
Variable   k=1   k=4   k=8
∆GDP
∆Cons
∆Inv
∆Inf
This is the problem the BoE had (with a more sophisticated model) during
the crisis
The Fed did less badly because its model updates the parameters, via the
Kalman filter, when it makes an error. Beyond the scope of this course!

More on forecast errors
We have used the MSFE to look at different models and the effect of
different time horizons
Out-of-sample forecast errors are larger than in-sample ones, because the
forecast error is really composed of two parts: the unpredictable shock and
the error in the estimated parameters
$$MSFE = E[(y_{T+1} - \hat{y}_{T+1|T})^2] = \sigma_\varepsilon^2 + var[(a - \alpha) + (b - \beta) y_T]$$
The out-of-sample forecasts involve re-estimating the model, so give
an estimate of the likely performance of the model in real time

Deeper into time series: preliminaries
What does the AR part actually measure?
What does the MA part actually measure?
Why is their combination sometimes more useful?
Think about the way the influence of past shocks, $\varepsilon_{t-s}$, decays over time
To go deeper into time series we need to brush up our maths!
We will look at deriving the conditional and unconditional
expectations, variances and autocovariances for simple time-series
models.

Conditional Expectations
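The conditional expectation $E[y_{t+1} \mid y_t]$ follows from the conditional mean
equation we write down in the AR(1) or MA(1) model
AR(1)
$$\begin{aligned}
E[y_{t+1} \mid y_t] &= E[\alpha + \beta y_t + \varepsilon_{t+1} \mid y_t] \\
&= \alpha + \beta E[y_t \mid y_t] + E[\varepsilon_{t+1} \mid y_t] \\
&= \alpha + \beta y_t
\end{aligned}$$
MA(1)
$$\begin{aligned}
E[y_{t+1} \mid y_t] &= E[\alpha + \beta \varepsilon_t + \varepsilon_{t+1} \mid y_t] \\
&= \alpha + \beta E[\varepsilon_t \mid y_t] \\
&= \alpha + \beta \varepsilon_t
\end{aligned}$$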

Looking further ahead: iterative forecasts
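AR(1)
$$\begin{aligned}
E[y_{t+2} \mid y_t] &= E[\alpha + \beta y_{t+1} + \varepsilon_{t+2} \mid y_t] \\
&= \alpha + \beta E[y_{t+1} \mid y_t] + E[\varepsilon_{t+2} \mid y_t] \\
&= \alpha + \beta(\alpha + \beta y_t) \\
&= \alpha + \beta\alpha + \beta^2 y_t
\end{aligned}$$
$$E[y_{t+k} \mid y_t] = \sum_{i=0}^{k-1} \beta^i \alpha + \beta^k y_t, \qquad \lim_{k \to \infty} E[y_{t+k} \mid y_t] = \frac{\alpha}{1 - \beta}$$
MA(1)
$E[y_{t+k} \mid y_t] = \alpha \ \forall k \geq 2$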
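A short check that iterating the AR(1) conditional mean one step at a time reproduces the closed form above, and that the forecast converges to $\alpha/(1-\beta)$. The values α = 1, β = 0.8 and $y_t = 10$ are arbitrary illustrations. (For the MA(1), no iteration is needed: every forecast beyond one step is simply α.)

```python
import numpy as np


def ar1_forecast_path(alpha, beta, y_t, k_max):
    """E[y_{t+k} | y_t] for k = 1..k_max, iterating the conditional mean equation."""
    path = []
    f = y_t
    for _ in range(k_max):
        f = alpha + beta * f          # E[y_{t+k}|y_t] = alpha + beta * E[y_{t+k-1}|y_t]
        path.append(f)
    return np.array(path)


def ar1_forecast_closed_form(alpha, beta, y_t, k):
    """Closed form: sum_{i=0}^{k-1} beta^i * alpha + beta^k * y_t."""
    return alpha * sum(beta ** i for i in range(k)) + beta ** k * y_t


path = ar1_forecast_path(alpha=1.0, beta=0.8, y_t=10.0, k_max=50)
```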

Unconditional Expectations
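If we know the process, but have no observations, what is our best
guess at a value $y_t$? Our best guess is the unconditional mean implied
by the process
AR(1), using stationarity ($E[y_{t-1}] = E[y_t]$):
$$E[y_t] = \alpha + \beta E[y_{t-1}] + E[\varepsilon_t] = \alpha + \beta E[y_t] \Rightarrow E[y_t] = \frac{\alpha}{1 - \beta}$$
MA(1)
$$E[y_t] = \alpha + \beta E[\varepsilon_{t-1}] + E[\varepsilon_t] = \alpha$$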

Uncertainty and variance AR(1)
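Conditional variance
$$\begin{aligned}
var(y_{t+1} \mid y_t) &= var(\alpha + \beta y_t + \varepsilon_{t+1} \mid y_t) = var(\varepsilon_{t+1} \mid y_t) = \sigma_\varepsilon^2 \\
var(y_{t+2} \mid y_t) &= var(\alpha + \beta y_{t+1} + \varepsilon_{t+2} \mid y_t) \\
&= \beta^2 var(y_{t+1} \mid y_t) + var(\varepsilon_{t+2} \mid y_t) = (1 + \beta^2)\sigma_\varepsilon^2 \\
var(y_{t+k} \mid y_t) &= \sum_{i=0}^{k-1} (\beta^2)^i \sigma_\varepsilon^2
\end{aligned}$$
$$\Rightarrow \lim_{k \to \infty} var(y_{t+k} \mid y_t) = \frac{\sigma_\varepsilon^2}{1 - \beta^2}$$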

Uncertainty and variance AR(1)
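Unconditional variance
$$\sigma_y^2 = var(y_t) = var(\alpha + \beta y_{t-1} + \varepsilon_t) = \beta^2 var(y_t) + \sigma_\varepsilon^2 \Rightarrow \sigma_y^2 = \frac{\sigma_\varepsilon^2}{1 - \beta^2}$$
Compare this to the limit of the conditional variance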

Uncertainty and Variance MA(1)
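Conditional variance
$$\begin{aligned}
var(y_{t+1} \mid y_t) &= var(\alpha + \beta \varepsilon_t + \varepsilon_{t+1} \mid y_t) = \sigma_\varepsilon^2 \\
var(y_{t+k} \mid y_t) &= var(\alpha + \beta \varepsilon_{t+k-1} + \varepsilon_{t+k} \mid y_t) = (1 + \beta^2)\sigma_\varepsilon^2 \quad \forall k \geq 2
\end{aligned}$$
Unconditional variance
$$\sigma_y^2 = var(\alpha + \beta \varepsilon_{t-1} + \varepsilon_t) = (1 + \beta^2)\sigma_\varepsilon^2$$
So the conditional variance of the MA(1) equals the unconditional
variance from a 2-period horizon onwards!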

Forecast error variance
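We should include confidence intervals in our forecasts
Assume $\varepsilon_t \sim N(0, \sigma_\varepsilon^2)$
Then the 95% forecast intervals for $y_{t+k}$, around the point forecast $y_{t+k|t} = E[y_{t+k} \mid y_t]$, are
$$\text{AR(1)}: \quad y_{t+k|t} \pm 1.96 \sqrt{\sum_{i=0}^{k-1} (\beta^2)^i \sigma_\varepsilon^2}$$
$$\text{MA(1)}: \quad y_{t+k|t} \pm 1.96 \sqrt{(1 + \beta^2)\sigma_\varepsilon^2} \quad \forall k \geq 2$$
In practice it is common to apply these formulas to forecasts
generated with estimates $a, b, \hat{\sigma}_\varepsilon^2$, ignoring the extra uncertainty
created by estimating parameters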
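A sketch of the AR(1) forecast intervals, using the geometric-sum forms of the two sums ($\sum_{i=0}^{k-1}\beta^i = (1-\beta^k)/(1-\beta)$, and likewise for $\beta^{2i}$). As on the slide, the parameters are treated as known, so parameter uncertainty is ignored; the function assumes a stationary model with |β| < 1.

```python
import numpy as np


def ar1_forecast_intervals(alpha, beta, sigma2, y_t, k_max, z=1.96):
    """Point forecasts and 95% intervals for y_{t+k}, k = 1..k_max (|beta| < 1).

    Treats alpha, beta and sigma2 as known, ignoring estimation uncertainty.
    """
    k = np.arange(1, k_max + 1)
    # sum_{i=0}^{k-1} beta^i * alpha + beta^k * y_t
    point = alpha * (1 - beta ** k) / (1 - beta) + beta ** k * y_t
    # var(y_{t+k}|y_t) = sigma2 * sum_{i=0}^{k-1} beta^(2i)
    var = sigma2 * (1 - beta ** (2 * k)) / (1 - beta ** 2)
    half_width = z * np.sqrt(var)
    return point, point - half_width, point + half_width


point, lo, hi = ar1_forecast_intervals(alpha=1.0, beta=0.5, sigma2=1.0, y_t=4.0, k_max=20)
```

The intervals widen with k and approach $\pm 1.96\,\sigma_y$, the unconditional standard deviation.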

Forecasts with confidence intervals
[Figure: forecasts with confidence intervals]

Deeper into time series: ACF
An important property is the correlation between $y_t$ and $y_{t-k}$
The autocovariance function is the set of numbers
$cov(y_t, y_{t-k}) := \sigma_k^2$
The sample estimator of the autocovariance function is
$$\hat{\sigma}_k^2 = \frac{1}{T - k - 1} \sum_{t=k+1}^{T} \tilde{y}_t \tilde{y}_{t-k}, \qquad \tilde{y}_t = y_t - \bar{y}$$
The autocovariance function is normalised by the variance of y to
give the autocorrelation function ACF(k):
$$\rho_k = \frac{cov(y_t, y_{t-k})}{var(y_t)}$$
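A sketch of the sample ACF using the slide's estimator, including its $T - k - 1$ divisor (many packages divide by T instead, which shrinks long-lag values slightly). The simulated AR(1) with β = 0.7 is an assumption for illustration; its ACF should be close to $0.7^k$. The ±1.96/√T band is the usual rough 95% band for a white-noise series.

```python
import numpy as np


def sample_acf(y, max_lag):
    """Sample autocorrelations rho_k, k = 0..max_lag, with divisor T - k - 1."""
    y = np.asarray(y, dtype=float)
    T = y.size
    dev = y - y.mean()                                  # y_tilde
    acov = np.array(
        [dev[k:] @ dev[:T - k] / (T - k - 1) for k in range(max_lag + 1)]
    )
    return acov / acov[0]                               # normalise by the variance


# Illustrative simulated AR(1) with beta = 0.7
rng = np.random.default_rng(10)
y = np.zeros(5000)
for t in range(1, 5000):
    y[t] = 0.7 * y[t - 1] + rng.standard_normal()

rho = sample_acf(y, max_lag=10)
band = 1.96 / np.sqrt(y.size)   # rough 95% band for rho_k under white noise
```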

ACF for various stationary models
[Figure: ACF for various stationary models]

ACF: discussion
The ACF shows us how long it takes for the influence of past shocks
to die away, by measuring the correlation between yt and its own past
values.
For stationary processes the ACF becomes statistically insignificant
after a finite number of periods.
Stationary processes have finite memory: the influence of a shock is
finite
[Figure: ACF of growth]

Deeper into time series: PACF
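Clearly autoregressions can capture correlation between $y_t$ and its
past, but how many lags do we need?
If $y_t = \alpha + \beta_1 y_{t-1} + \beta_2 y_{t-2} + \varepsilon_t$, then we know from regression
analysis that $\beta_2$ is a measure of the conditional correlation between $y_t$
and $y_{t-2}$ after accounting for the correlation explained by $y_{t-1}$
$$PACF(k) = \frac{cov(y_t, y_{t-k} \mid y_{t-1}, \dots, y_{t-k+1})}{\left[ var(y_t \mid y_{t-1}, \dots, y_{t-k+1}) \, var(y_{t-k} \mid y_{t-1}, \dots, y_{t-k+1}) \right]^{1/2}}$$
e.g. $$PACF(3) = \frac{cov(y_t, y_{t-3} \mid y_{t-1}, y_{t-2})}{\left[ var(y_t \mid y_{t-1}, y_{t-2}) \, var(y_{t-3} \mid y_{t-1}, y_{t-2}) \right]^{1/2}}$$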

PACFs for stationary processes
[Figure: PACFs for stationary processes]

Memory in AR(1)
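Let $y_t = \beta y_{t-1} + \varepsilon_t$, i.e. put $\alpha = 0 \Rightarrow \mu = 0$
$$\begin{aligned}
cov(y_t, y_{t-1}) &= E[(\beta y_{t-1} + \varepsilon_t) y_{t-1}] = \beta E[y_{t-1}^2] = \beta \sigma_y^2 \\
\Rightarrow corr(y_t, y_{t-1}) &= \beta \\
cov(y_t, y_{t-2}) &= E[(\beta y_{t-1} + \varepsilon_t) y_{t-2}] \\
&= E[(\beta(\beta y_{t-2} + \varepsilon_{t-1}) + \varepsilon_t) y_{t-2}] \\
&= E[\beta^2 y_{t-2}^2 + \beta \varepsilon_{t-1} y_{t-2} + \varepsilon_t y_{t-2}] \\
&= \beta^2 \sigma_y^2 \\
\Rightarrow corr(y_t, y_{t-2}) &= \beta^2 \\
corr(y_t, y_{t-k}) &= \beta^k
\end{aligned}$$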

ACF for different AR(1) models
[Figure: ACF for different AR(1) models]

PACF AR models
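$$y_t = \beta_1 y_{t-1} + \underbrace{\beta_2 y_{t-2}}_{\text{cond corr}} + \varepsilon_t$$
The coefficient in the underbraced term is
$$\frac{cov(y_t, y_{t-2} \mid y_{t-1})}{\sqrt{var(y_t \mid y_{t-1}) \, var(y_{t-2} \mid y_{t-1})}}$$
In an AR(p) model, $PACF(p) = \beta_p$ and $PACF(k) = 0$ for $k > p$
So the PACF drops sharply to 0 after the final lagged term in the
AR(p) model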
This is an alternative way to think about how many lags to include
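Following the regression interpretation above, a sketch that estimates PACF(k) as the last coefficient of an OLS AR(k) fit, for k = 1, 2, .... The simulated AR(2) (β₁ = 0.5, β₂ = 0.3) is an assumption for illustration: PACF(2) should be close to 0.3 and the PACF should be close to zero from lag 3 on.

```python
import numpy as np


def sample_pacf(y, max_lag):
    """PACF(k), k = 1..max_lag, as the last coefficient of an OLS AR(k) fit."""
    y = np.asarray(y, dtype=float)
    pacf = []
    for k in range(1, max_lag + 1):
        Y = y[k:]
        lags = [y[k - j:-j] for j in range(1, k + 1)]     # y_{t-1}, ..., y_{t-k}
        X = np.column_stack([np.ones(Y.size)] + lags)
        coef, *_ = np.linalg.lstsq(X, Y, rcond=None)
        pacf.append(coef[-1])                             # coefficient on y_{t-k}
    return np.array(pacf)


# Illustrative simulated AR(2) with beta_1 = 0.5, beta_2 = 0.3
rng = np.random.default_rng(11)
y = np.zeros(5000)
for t in range(2, 5000):
    y[t] = 0.5 * y[t - 1] + 0.3 * y[t - 2] + rng.standard_normal()

pacf = sample_pacf(y, max_lag=5)
```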

PACF various AR models
[Figure: PACF of various AR models]
What do you notice about ACF vs. PACF in AR models?

ACF MA(1)
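Consider the mean zero MA(1) $y_t = \beta \varepsilon_{t-1} + \varepsilon_t$
$$\begin{aligned}
cov(y_t, y_{t-1}) &= E[(\beta \varepsilon_{t-1} + \varepsilon_t)(\beta \varepsilon_{t-2} + \varepsilon_{t-1})] = \beta \sigma_\varepsilon^2 \\
\Rightarrow corr(y_t, y_{t-1}) &= \frac{\beta \sigma_\varepsilon^2}{(1 + \beta^2)\sigma_\varepsilon^2} = \frac{\beta}{1 + \beta^2} \\
cov(y_t, y_{t-2}) &= E[(\beta \varepsilon_{t-1} + \varepsilon_t)(\beta \varepsilon_{t-3} + \varepsilon_{t-2})] = 0
\end{aligned}$$
$ACF(k) = 0 \ \forall k \geq 2$
The ACF of an MA(q) process drops sharply to 0 after q lags: $ACF(k) = 0$ for all $k > q$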

PACF of MA(1)
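To calculate the PACF directly is hard. Here’s a neat trick
Assume $|\beta| < 1$, and notice $\varepsilon_t = y_t - \beta \varepsilon_{t-1}$
$$\begin{aligned}
y_t &= \beta(y_{t-1} - \beta \varepsilon_{t-2}) + \varepsilon_t \\
&= \beta(y_{t-1} - \beta(y_{t-2} - \beta \varepsilon_{t-3})) + \varepsilon_t \\
&= \beta y_{t-1} - \beta^2 y_{t-2} + \beta^3 (y_{t-3} - \beta \varepsilon_{t-4}) + \varepsilon_t \\
y_t &= \sum_{i=1}^{\infty} (-1)^{i+1} \beta^i y_{t-i} + \varepsilon_t
\end{aligned}$$
This is an AR(∞), which is well defined given $|\beta| < 1$.
Using the earlier result, the PACF will decay geometrically as $\beta^i$ declines
to zero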

Box Jenkins model building method
Two famous statisticians, Box and Jenkins, suggested the ACF/PACF as a way of building
time series regressions
        AR(p)                 MA(q)                 ARMA(p,q)
ACF     Decays smoothly       Chops off at q lags   Decays smoothly
PACF    Chops off at p lags   Decays smoothly       Decays smoothly
Inspection of the empirical ACF and PACF can help suggest a sensible starting
ARMA(p,q) model.
Then test down to a small model using significance and information
criteria.
Always check A2 holds for your residuals

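Empirical P/ACF
Genuine AR(1) process
[Figure: ACF and PACF of a genuine AR(1) process]
$\Delta \ln GDP$
[Figure: ACF and PACF of $\Delta \ln GDP$]

Empirical P/ACF
Genuine MA(1) process
[Figure: ACF and PACF of a genuine MA(1) process]
$\Delta Inf$
[Figure: ACF and PACF of $\Delta Inf$]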

Summary
We have dealt with finite memory processes where
- $ACF(k) \to 0$ as $k \to \infty$
- $PACF(k) \to 0$ as $k \to \infty$
- $E[y_t] = \mu \ \forall t$
- $var(y_t) = \sigma_y^2 \ \forall t$
- $cov(y_t, y_{t-k})$ depends only on k and not t
ARMA(p,q) models make decent forecasts for these series
But in economics, they are only approximate models
How do we deal with levels of series and model relationships between
dynamic economic variables?

PART II: Integrated processes
Processes with permanent shocks are called integrated processes
- A simple example shows our ideas of µ and $\sigma^2$ are not compatible with
permanent shocks
The first problem is to decide if a series is integrated
- Dickey-Fuller tests
We then have a choice
- Difference the series to make it stationary
- Look for cointegration between two or more integrated series

Permanent shocks
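Consider the random walk $y_t = y_{t-1} + \varepsilon_t$, $y_0 = 0$
$$\begin{aligned}
y_1 &= y_0 + \varepsilon_1 = \varepsilon_1 \\
y_2 &= y_1 + \varepsilon_2 = \varepsilon_1 + \varepsilon_2 \\
&\ \ \vdots \\
y_t &= \varepsilon_1 + \varepsilon_2 + \dots + \varepsilon_t \\
var(y_t) &= var\left(\sum_{i=0}^{t-1} \varepsilon_{t-i}\right) = t\sigma_\varepsilon^2 \to \infty \text{ as } t \to \infty
\end{aligned}$$
The variance of this process grows without bound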
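A Monte Carlo sketch of $var(y_t) = t\sigma_\varepsilon^2$: simulate many independent random walks starting from $y_0 = 0$ and compute the variance across paths at each date. The number of paths, the horizon and σ = 1 are illustrative choices.

```python
import numpy as np

# Monte Carlo check that var(y_t) = t * sigma^2 for a random walk with y_0 = 0
rng = np.random.default_rng(5)
n_paths, T, sigma = 20000, 100, 1.0
shocks = rng.normal(0.0, sigma, size=(n_paths, T))
paths = np.cumsum(shocks, axis=1)      # y_t = eps_1 + ... + eps_t along each row
var_by_t = paths.var(axis=0)           # cross-path variance at dates t = 1..T
```

Plotting `var_by_t` against t gives (approximately) a straight line with slope $\sigma_\varepsilon^2$, in contrast to the flat variance of a stationary series.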

Permanent shocks
What about the mean?
Think about the random walk with drift $y_t = \alpha + y_{t-1} + \varepsilon_t$
This is an AR(1) with $\beta = 1$
Thus $E[y_t] = \frac{\alpha}{1 - \beta}$ is undefined
The process has no unconditional mean
Conditional forecasts
$E[y_{t+k} \mid y_t] = k\alpha + y_t$
With an error variance that grows without bound
Regression analysis struggles with such data

Regressions with random walks
Regress the two uncorrelated random walks $y_t$, $x_t$ in the dataset on
each other
param   value   t-stat
α
β
R²              -
Breusch-Godfrey stat for serial correlation up to order 4:
This is typical of a spurious regression
A high $R^2$ combined with positive serial correlation in the residuals is a classic sign of
spurious regression
Now regress $\Delta y$ on $\Delta x$. Is there any relationship?
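A simulation sketch of the exercise, with simulated independent random walks standing in for the dataset's $y_t$, $x_t$ (an assumption). It repeats the levels regression and the differenced regression many times and records how often the slope looks "significant" at the 5% level. For the differences the rejection rate should be near 5%; for the levels it is far higher, even though the series are unrelated.

```python
import numpy as np


def ols_slope_t_r2(y, x):
    """OLS of y on a constant and x: slope, its t-statistic and R^2."""
    X = np.column_stack([np.ones(len(x)), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    sigma2 = resid @ resid / (len(y) - 2)
    se = np.sqrt(sigma2 * np.linalg.inv(X.T @ X)[1, 1])
    r2 = 1 - resid @ resid / np.sum((y - y.mean()) ** 2)
    return coef[1], coef[1] / se, r2


def rejection_rates(n_reps=500, T=200, seed=6):
    """Share of |t| > 1.96 for levels and for differences of independent random walks."""
    rng = np.random.default_rng(seed)
    rej_levels = rej_diffs = 0
    for _ in range(n_reps):
        y = np.cumsum(rng.standard_normal(T))
        x = np.cumsum(rng.standard_normal(T))
        rej_levels += abs(ols_slope_t_r2(y, x)[1]) > 1.96
        rej_diffs += abs(ols_slope_t_r2(np.diff(y), np.diff(x))[1]) > 1.96
    return rej_levels / n_reps, rej_diffs / n_reps


rej_levels, rej_diffs = rejection_rates()
```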

Testing for unit roots: Dickey-Fuller test
The best way to avoid spurious regressions is to do regressions with
stationary series
To determine stationarity, we need to test β = 1 in the process
$$\begin{aligned}
y_t &= \alpha + \beta y_{t-1} + \varepsilon_t \\
\Delta y_t &= \alpha + (\beta - 1) y_{t-1} + \varepsilon_t \\
&= \alpha + \rho y_{t-1} + \varepsilon_t
\end{aligned}$$
$H_0: \rho = 0 \Rightarrow$ there is a unit root
$H_A: \rho < 0 \Rightarrow$ no unit root
$$t_{DF} = \frac{\hat{\rho}}{SE(\hat{\rho})}$$
The test statistic $t_{DF}$ follows the Dickey-Fuller distribution, which gives much
more negative critical values than the standard normal

Dickey Fuller distribution
[Figure: Dickey-Fuller distribution]
The DF distribution is sensitive to the specification of the test
- Inclusion of an intercept
- Inclusion of a trend
- Number of lags
- Sample size

The Augmented Dickey Fuller test
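It is essential that there is no serial correlation in the DF regression
residuals
If necessary, add lagged differences of the dependent variable
$$\begin{aligned}
y_t &= \alpha + \beta_1 y_{t-1} + \beta_2 y_{t-2} + \varepsilon_t \\
&= \alpha + \beta_1 y_{t-1} + \beta_2 y_{t-1} - \beta_2 y_{t-1} + \beta_2 y_{t-2} + \varepsilon_t \\
&= \alpha + (\beta_1 + \beta_2) y_{t-1} - \beta_2 \Delta y_{t-1} + \varepsilon_t \\
\Delta y_t &= \alpha + (\beta_1 + \beta_2 - 1) y_{t-1} - \beta_2 \Delta y_{t-1} + \varepsilon_t \\
&= \alpha + \rho y_{t-1} - \beta_2 \Delta y_{t-1} + \varepsilon_t
\end{aligned}$$
Hypothesis, alternative and test statistic as on the previous slide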
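A numpy sketch of the (augmented) Dickey-Fuller regression and its $t_{DF}$ statistic. Assumptions to note: the optional trend is an ordinary unrestricted linear trend, not the ‘restricted trend’ option discussed next; the critical values −2.86 (constant) and −3.41 (constant and trend) are approximate asymptotic 5% values, so use proper Dickey-Fuller tables for your sample size; and the data are simulated.

```python
import numpy as np


def adf_tstat(y, lags=0, trend=False):
    """t-statistic on rho in the (augmented) Dickey-Fuller regression

        dy_t = alpha [+ gamma t] + rho y_{t-1} + sum_{i=1}^{lags} c_i dy_{t-i} + e_t
    """
    y = np.asarray(y, dtype=float)
    dy = np.diff(y)
    Y = dy[lags:]
    n = Y.size
    cols = [np.ones(n), y[lags:-1]]            # constant and y_{t-1}
    if trend:
        cols.append(np.arange(n, dtype=float))
    for i in range(1, lags + 1):
        cols.append(dy[lags - i:-i])           # lagged differences dy_{t-i}
    X = np.column_stack(cols)
    coef, *_ = np.linalg.lstsq(X, Y, rcond=None)
    resid = Y - X @ coef
    sigma2 = resid @ resid / (n - X.shape[1])
    se_rho = np.sqrt(sigma2 * np.linalg.inv(X.T @ X)[1, 1])
    return coef[1] / se_rho


DF_CRIT_5PCT_CONST = -2.86   # approximate asymptotic 5% value, constant, no trend
DF_CRIT_5PCT_TREND = -3.41   # approximate asymptotic 5% value, constant and trend

rng = np.random.default_rng(12)
stationary = np.zeros(500)
for t in range(1, 500):
    stationary[t] = 0.5 * stationary[t - 1] + rng.standard_normal()
random_walk = np.cumsum(rng.standard_normal(500))

t_stat_stationary = adf_tstat(stationary, lags=1)   # expect rejection of H0
t_stat_rw = adf_tstat(random_walk, lags=1)          # usually no rejection
```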

Dealing with trends
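Include trends using the ‘restricted trend’ option if available. With
$g = \gamma/(1 - \beta)$:
$$\begin{aligned}
y_t &= \alpha + \gamma t + \beta y_{t-1} + \varepsilon_t \\
\Delta y_t &= \alpha + (\beta - 1)(y_{t-1} - g t) + \varepsilon_t \\
&= \alpha + \rho (y_{t-1} - g t) + \varepsilon_t \\
\Rightarrow \Delta y_t &= \alpha + \varepsilon_t \quad \text{if } \rho = 0 \qquad (2) \\
\Rightarrow y_t &= \alpha + \gamma t + \beta y_{t-1} + \varepsilon_t \quad \text{if } \rho < 0 \qquad (3)
\end{aligned}$$
From (2), if the process has a unit root it is a random walk with drift
From (3), if the process does not have a unit root, it is trend stationary with |β| < 1.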

Dickey Fuller Tables

Notes on exercise
The order of integration, d, written $y_t \sim I(d)$, is the number of times
a series must be differenced in order to make the series $y_t$ stationary
Determine the order of integration of Output, Consumption,
Investment and Prices.
Do any series exhibit trend-stationary behaviour?

Cointegration: Random walks which Tango!
So far we have dealt with integrated series by differencing to make
them stationary and modelling their (univariate) stationary behaviour.
There is an important case when we can work with two (or more)
integrated series directly. This is when the series are cointegrated
- Economic behaviour creates long-run (equilibrium) relationships
between series, e.g. output and consumption, investment and output,
house prices and earnings (?), stock prices and profits (?)
- The ratio of such series (in logs, their difference) is a stationary series, even though the two
series are I(1)!
- Variables which cointegrate in this way adjust to dynamic shocks in
order to move back towards their equilibrium relationship

Output and Consumption
[Figure: plots of output and consumption]

Cointegration: formal definition
If a linear combination of I(1) series is I(0), then the series cointegrate
$x_t \sim I(1), \quad y_t \sim I(1)$
$y_t - \beta x_t \sim I(0)$
The ‘cointegrating vector’ is the pair of values $(1, -\beta)$ which
(working in logs) gives the stationary ratio between the series
Economic theory often suggests theoretical values for β, so it is
interesting to see if these are true in the data

Common stochastic trends
Cointegration occurs when two series share a common stochastic trend,
say $X_t$. Let $X_0 = 0$ and
$$X_t = X_{t-1} + \varepsilon_t \Rightarrow X_t = \sum_{s=1}^{t} \varepsilon_s$$
Let $\tilde{y}_t$ and $\tilde{x}_t$ be independent I(0) processes and let
$$y_t = \beta X_t + \tilde{y}_t, \qquad x_t = X_t + \tilde{x}_t$$
$$\Rightarrow y_t - \beta x_t = \beta X_t + \tilde{y}_t - \beta(X_t + \tilde{x}_t) = \tilde{y}_t - \beta \tilde{x}_t \sim I(0)$$
The common stochastic trend has been cancelled out. The pair $(1, -\beta)$ is
called the cointegrating vector, as it gives the stationary linear combination
of y and x

Output and Consumption
[Figure: plots of the ratio and of the residual from the superconsistent regression]

Cointegration: long and short-run relationships
If an economically meaningful equilibrium relationship exists:

There must be dynamic adjustment in the short run in order to return
the variables towards equilibrium levels when shocks push them apart
Thus the long-run relationship makes predictions about short-run
adjustment dynamics
The levels of the series this period help us predict changes in the
series next period
We can represent both the long-run and the short-run behaviour of
cointegrated series through the error correction model

Error correction model
We have seen an estimate of the cointegrating relationship between
$cons_t$ and $output_t$
$c_t = \beta y_t + \varepsilon_t$
Encouragingly, the residuals $\hat{\varepsilon}_t$ from this relationship were stationary
But look at the Breusch-Godfrey stat (XXX): the above model is not
dynamically well-specified; it does not meet A2.
A model with more general dynamics is
$c_t = \beta_1 y_t + \beta_2 y_{t-1} + \beta_3 c_{t-1} + \varepsilon_t \qquad (4)$
This allows for the response of $c_t$ to its own past, and to current and lagged
values of $y_t$

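Although (4) is a more general dynamic specification, it consists of
I(1) variables, yet the $\varepsilon_t$ series should be I(0).
With a bit of algebra we can rewrite the model entirely in terms of
I(0) variables
$$\begin{aligned}
c_t &= \beta_1 y_t + \beta_2 y_{t-1} + \beta_3 c_{t-1} + \varepsilon_t \\
&= \beta_1 y_t - \beta_1 y_{t-1} + \beta_1 y_{t-1} + \beta_2 y_{t-1} + \beta_3 c_{t-1} + \varepsilon_t \\
&= \beta_1 \Delta y_t + (\beta_1 + \beta_2) y_{t-1} + \beta_3 c_{t-1} + \varepsilon_t \\
\Delta c_t &= \beta_1 \Delta y_t + (\beta_1 + \beta_2) y_{t-1} + (\beta_3 - 1) c_{t-1} + \varepsilon_t \\
&= \beta_1 \Delta y_t + (\beta_3 - 1) \underbrace{\left( c_{t-1} - \frac{\beta_1 + \beta_2}{1 - \beta_3} y_{t-1} \right)}_{\text{E. C. term}} + \varepsilon_t
\end{aligned}$$
$\Delta y_t$ and $\Delta c_t$ are I(0); provided there is cointegration, so are the error
term and the equilibrium relationship in the large brackets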

We can rewrite the final line of the ECM as
$\Delta c_t = \alpha_1 \Delta y_t + \alpha_2 (c_{t-1} - \beta y_{t-1}) + \varepsilon_t \qquad (5)$
Cointegration imposes the restrictions $\alpha_1 = \beta_1$, $\alpha_2 = \beta_3 - 1$ and $\beta = \frac{\beta_1 + \beta_2}{1 - \beta_3}$
If there is a cointegrating relationship
- $c_{t-1} - \hat{\beta} y_{t-1} \sim I(0)$, and $\hat{\varepsilon}_t \sim I(0)$
- $\hat{\alpha}_2 < 0$
The $\hat{\alpha}_2 < 0$ requirement ensures that if $c_t$ was above its
long-run level in period t − 1, it falls in period t
To estimate such a model, we need an estimate of $c_{t-1} - \hat{\beta} y_{t-1}$

Estimation of the ECM
Engle and Granger (1987) propose a two-step procedure for estimating (5)
First we need an estimate of the cointegrating vector. Regress:
$c_t = \beta y_t + \nu_t \Rightarrow \hat{\nu}_t = c_t - \hat{\beta} y_t$
$\hat{\nu}_t$ is our estimate of deviations from the long-run equilibrium
relationship
Second, we estimate, by OLS
$\Delta c_t = \alpha_1 \Delta y_t + \alpha_2 \hat{\nu}_{t-1} + \varepsilon_t$
We can recover estimates of the parameters of the original dynamic
model (4) from the estimated ECM parameters $\hat{\alpha}_1$, $\hat{\alpha}_2$ and the
cointegrating coefficient $\hat{\beta}$
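A numpy sketch of the two Engle-Granger steps on simulated data. The data-generating process is an assumption for illustration: $y_t$ is a random walk and $c_t = 0.9 y_t + u_t$ with $u_t$ a stationary AR(1) with coefficient 0.5, which implies a true ECM with $\alpha_1 = 0.9$ and $\alpha_2 = -0.5$. The residual-based ADF test with MacKinnon critical values, which belongs between the two steps, is not implemented here.

```python
import numpy as np


def ols(Y, X):
    """OLS coefficients, residuals and conventional standard errors."""
    coef, *_ = np.linalg.lstsq(X, Y, rcond=None)
    resid = Y - X @ coef
    sigma2 = resid @ resid / (len(Y) - X.shape[1])
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
    return coef, resid, se


def engle_granger(c, y):
    """Two-step Engle-Granger estimation of the ECM (5).

    Step 1: c_t = beta y_t + nu_t (no constant, as on the slide).
    Step 2: dc_t = a1 dy_t + a2 nu_hat_{t-1} + e_t.
    """
    beta_hat = ols(c, y[:, None])[0][0]
    nu_hat = c - beta_hat * y                  # deviations from equilibrium
    dc, dy = np.diff(c), np.diff(y)
    X = np.column_stack([dy, nu_hat[:-1]])     # dy_t and nu_hat_{t-1}
    coef, _, se = ols(dc, X)
    return beta_hat, coef, coef[1] / se[1]     # beta, (a1, a2), t-stat on a2


rng = np.random.default_rng(13)
T = 1000
y = np.cumsum(rng.standard_normal(T))             # I(1) 'output'
u = np.zeros(T)
for t in range(1, T):
    u[t] = 0.5 * u[t - 1] + rng.standard_normal()  # stationary deviation
c = 0.9 * y + u                                   # cointegrated 'consumption'

beta_hat, (a1_hat, a2_hat), t_a2 = engle_granger(c, y)
```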

Testing for cointegration: EG procedure
The two-step estimation approach suggests a method for testing whether two
series are actually cointegrated
Estimate the cointegrating relationship
$c_t = \beta y_t + \nu_t$
Save the $\hat{\nu}_t$ series and perform an ADF test with no intercept
$$\Delta \hat{\nu}_t = \rho \hat{\nu}_{t-1} + \sum_{i=1}^{p-1} \gamma_i \Delta \hat{\nu}_{t-i} + u_t$$
$H_0: \rho = 0 \Rightarrow \nu_t$ is I(1) and there is no cointegration
$H_A: \rho < 0 \Rightarrow \nu_t$ is I(0) and there may be cointegration
Use MacKinnon’s critical values, which are more negative than the standard DF critical values
If we find $H_0$ is rejected ...

Testing for cointegration: EG procedure
...estimate the ECM
$\Delta c_t = \alpha_1 \Delta y_t + \alpha_2 \hat{\nu}_{t-1} + \varepsilon_t$
Test that there is a significant, negative change in $c_t$ whenever
$c_{t-1} > \hat{\beta} y_{t-1}$, in order to restore equilibrium
$H_0: \alpha_2 = 0 \Rightarrow$ no significant error correction
$H_A: \alpha_2 < 0 \Rightarrow$ error correction is significant
$\tau = \hat{\alpha}_2 / SE(\hat{\alpha}_2)$; reject $H_0$ if $\tau < -t_{0.05, DoF}$ (one-sided test)
If the estimates pass these two tests, there is significant cointegration
and the ECM can be used to estimate the dynamic model
If not, then work with differences, i.e. transform the two series to
make them stationary.

EG procedure: discussion
The Engle-Granger procedure works well with two variables, but there are
drawbacks
The initial regression is misspecified: $\hat{\nu}_t$ is usually serially correlated
This two-step approach introduces more variance than a
dynamically well-specified 1-step procedure
Results, esp. with more than two variables are sensitive to which
variable is taken as the left hand side variable
With more than two variables, there may be more than one
cointegrating relationship, and EG will estimate a linear combination
of these relationships, which has no real interpretation
These problems can be overcome by the Johansen procedure which is a
vector-based approach to estimating cointegrating equations

Empirical examples
Series        β̂     ν̂_t ∼ I(0)?   α̂_2   t-stat
(c_t, y_t)
(hp_t, w_t)
(SP_t, D_t)

Forecast comparisons
Estimate your error correction models on 1960-2000

Estimate your preferred ARIMA on 1960-2000
Produce 1-step and 4-step ahead out of sample forecasts with each
model for 2001-2006
Compare the MSPE from each model

Summary: Work stream for applied time series
Graph your data. Think about:
- Is the series trending over time?
- Is the trend exponential or linear?
- Is the series mean reverting?
- Would the series look mean reverting in most subsamples?
- Are there several variables that seem to exhibit the same random trend?
Take logs of exponentially increasing variables
Begin Dickey-Fuller tests
- Decide about appropriate inclusion of trends and constants based on
visual inspection and inspection of DF regression results
- Include f + 1 lags in the initial DF specification and remove insignificant
lags; check for serial correlation up to order f, ensure A2 is satisfied.
Using the preferred specification of DF tests, decide on the order of
integration of the series

Summary: With the transformed stationary series
Build univariate ARMA models for forecasting
- Inspect ACF, PACF, decide on candidate AR, MA, ARMA specification
- Start with AR(f+1), MA(f+1) or ARMA(f/2,f/2)(?) specification and
test down by eliminating insignificant lags, minimizing AIC/BIC; ensure
A2 is satisfied in preferred model
Inspect forecast predictions vs. actual outcomes
- Do the forecast error bounds include 95% of actual outcomes?
- Are the forecast errors close to uncorrelated?
Test robustness by performing an out-of-sample forecast exercise
- You will need to reserve part of your sample, so will lose some
information from the estimation
- But you might find a model that performs better in practice, or at
least understand more about how your model is likely to perform as new
data comes in

Summary: Modelling cointegrating series
Plot the ratio of interest
Engle-Granger Procedure Step I
- Estimate the cointegrating relationship with appropriate constant/trend
inclusion
- Save the residuals
- Perform a Dickey-Fuller test on the residuals
- No constant! MacKinnon p-values
- $H_0$: no cointegration. If reject $H_0$ go to...
Engle-Granger Procedure Step II
- Estimate the ECM with appropriate lagged differences so that A2 holds
- Test $\hat{\alpha}_2 < 0$ by a standard (one-sided) t-test
- $H_0$: no cointegration ($\alpha_2 = 0$). If reject $H_0$ ...
ECM is the correct model. Recover parameters of the restricted ARDL model
with appropriate transformations
Interpret the cointegrating relationship
Make dynamic forecasts
The End!
