
Massachusetts Institute of Technology                                        Guido Kuersteiner
Department of Economics
Time Series 14.384

Lecture Note 4: Prediction and Wold Decomposition
We consider a weakly stationary time series $x_t$ and are interested in obtaining a forecast of $x_{t+1}$ based on past observed values of $x_t$. It is common to consider forecasts $\hat{x}_{t+1}$ that minimize the mean squared forecast error,

$$ E\left(x_{t+1} - \hat{x}_{t+1}\right)^2 = \inf_{y \in M_t} E\left(x_{t+1} - y\right)^2, $$

where $M_t$ is the set of all measurable functions of $\{x_t, \ldots, x_1\}$ such that $y \in M_t$ iff $Ey^2 < \infty$, and includes the constant functions.¹
By the projection theorem

$$ \hat{x}_{t+1} = P_{M_t}(x_{t+1}) = E^{M_t} x_{t+1}, $$

where $E^{M_t} x_{t+1}$ is the conditional expectation defined by

$$ E\left(W\, E^{M_t} X\right) = E(WX) \qquad \forall\, W \in M_t, $$

where $X$ is any random variable defined on the same sample space as $x_t$.
It follows at once that $x_{t+1} - E^{M_t} x_{t+1} \perp M_t$, since for all $y \in M_t$ it has to hold from the definition of the conditional expectation that

$$ E\left(y\,(x_{t+1} - E^{M_t} x_{t+1})\right) = E(y\, x_{t+1}) - E\left(y\, E^{M_t} x_{t+1}\right) = 0. $$

By the projection theorem this establishes that the conditional expectation is a projection. This result is not very useful in practice because the conditional expectation cannot in general be computed. It is therefore useful to restrict attention to the class of best linear predictors. We denote the closed linear span of $\{1, x_t, \ldots, x_1\}$ by $\bar{M}_t$. Then the best linear predictor satisfies

$$ E\left(x_{t+1} - \hat{x}_{t+1}\right)^2 = \inf_{y \in \bar{M}_t} E\left(x_{t+1} - y\right)^2 $$
and by the projection theorem we have again that

$$ \hat{x}_{t+1} = P_{\bar{M}_t}(x_{t+1}) = \sum_{j=0}^{t-1} \alpha_{j,t}\, x_{t-j}, $$

such that

$$ \sum_{j=0}^{t-1} \alpha_{j,t}\, \mathrm{cov}(x_{t-j}, x_{t-i}) = \mathrm{cov}(x_{t+1}, x_{t-i}), \qquad i = 0, \ldots, t-1. $$
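These normal equations are a finite linear system in the coefficients $\alpha_{0,t}, \ldots, \alpha_{t-1,t}$ and can be solved directly once the autocovariances are available. As a minimal illustrative sketch (not part of the original notes), the following Python snippet builds the system for an assumed AR(1) autocovariance $\gamma_x(h) = \sigma^2 \phi^{|h|}/(1-\phi^2)$ and solves it numerically; all function and parameter names are hypothetical.

```python
import numpy as np

def best_linear_predictor_coeffs(gamma, t):
    """Solve the normal equations sum_j a_j * gamma(i - j) = gamma(i + 1),
    i = 0, ..., t-1, for the one-step best linear predictor coefficients."""
    # Gamma[i, j] = cov(x_{t-j}, x_{t-i}) = gamma(i - j); gamma must accept negative lags
    Gamma = np.array([[gamma(i - j) for j in range(t)] for i in range(t)])
    rhs = np.array([gamma(i + 1) for i in range(t)])      # cov(x_{t+1}, x_{t-i})
    return np.linalg.solve(Gamma, rhs)

# Example: AR(1) with phi = 0.6 and innovation variance 1 (assumed values)
phi, sigma2 = 0.6, 1.0
gamma = lambda h: sigma2 * phi ** abs(h) / (1 - phi ** 2)
print(best_linear_predictor_coeffs(gamma, t=5))
# approximately [0.6, 0, 0, 0, 0]: for an AR(1) only the most recent value matters
```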
Since $\mathrm{sp}\{1, x_t, \ldots, x_1\} \subset M_t$ it follows immediately that in general

$$ E\left(x_{t+1} - E^{M_t}(x_{t+1})\right)^2 \le E\left(x_{t+1} - P_{\bar{M}_t}(x_{t+1})\right)^2. $$

The only exception is the case where $x_t$ is a Gaussian process. Then it is the case that $E^{M_t}(x_{t+1}) = P_{\bar{M}_t}(x_{t+1})$.
In particular we write for the best linear predictor

$$ \hat{x}_{t+1} = P_{\bar{M}_t}\, x_{t+1} \qquad \text{for } t \ge 1. $$

Note that $\bar{M}_t = \mathrm{sp}\{x_t, \ldots, x_1\} = \mathrm{sp}\{x_t - \hat{x}_t, \ldots, x_1 - \hat{x}_1\}$. Therefore $\hat{x}_{t+1}$ can be found by projecting onto the past forecast errors $\{x_t - \hat{x}_t, \ldots, x_1 - \hat{x}_1\}$. We can define $\hat{x}_{t+1}$ recursively by setting

$$ \hat{x}_1 = 0 $$

such that

$$ \hat{x}_{t+1} = \sum_{j=0}^{t-1} \theta_{j,t}\left(x_{t-j} - \hat{x}_{t-j}\right), \qquad t \ge 1, $$

where

$$ \sum_{j=0}^{t-1} \theta_{j,t}\, \left\langle x_{t-j} - \hat{x}_{t-j},\; x_{t-i} - \hat{x}_{t-i} \right\rangle = \left\langle x_{t+1},\; x_{t-i} - \hat{x}_{t-i} \right\rangle \qquad \forall\, i. $$

¹ More formally, $M_t$ is the $\sigma$-field generated by $\{x_t, \ldots, x_1\}$, i.e. $M_t$ is the smallest sigma field of the sample space such that $x_t, \ldots, x_1$ are measurable functions.
Since $x_{t-j} - \hat{x}_{t-j} \perp \bar{M}_{t-j-1}$ by the projection theorem, the left-hand side reduces to

$$ \theta_{i,t}\, \left\| x_{t-i} - \hat{x}_{t-i} \right\|^2 = \left\langle x_{t+1},\; x_{t-i} - \hat{x}_{t-i} \right\rangle. \qquad (4.1) $$

Denoting $\left\| x_{t-i} - \hat{x}_{t-i} \right\|^2 = \nu_{t-i}^2$ and substituting for $\hat{x}_{t-i} = \sum_{j=0}^{t-2-i} \theta_{j,t-i-1}\,(x_{t-i-j-1} - \hat{x}_{t-i-j-1})$ leads to

$$ \theta_{i,t} = \nu_{t-i}^{-2} \left[ \gamma_x(i+1) - \sum_{j=0}^{t-2-i} \theta_{j,t-i-1}\, \left\langle x_{t+1},\; x_{t-i-j-1} - \hat{x}_{t-i-j-1} \right\rangle \right], \qquad (4.2) $$
where $\mathrm{cov}(x_t, x_{t+|h|}) = \gamma_x(h)$. Now, the last term in the sum, $\left\langle x_{t+1},\; x_{t-i-j-1} - \hat{x}_{t-i-j-1} \right\rangle$, which is equal to $\theta_{j+i+1,t}\, \nu_{t-i-j-1}^2$ from (4.1), can be substituted in (4.2) to give

$$ \theta_{i,t} = \nu_{t-i}^{-2} \left[ \gamma_x(i+1) - \sum_{j=0}^{t-2-i} \theta_{j,t-i-1}\, \theta_{j+i+1,t}\, \nu_{t-i-j-1}^2 \right]. $$
Also,

$$ \nu_t^2 = \left\| x_t - \hat{x}_t \right\|^2 = \left\| x_t \right\|^2 - \left\| \hat{x}_t \right\|^2 = \gamma_x(0) - \sum_{j=0}^{t-2} \theta_{j,t-1}^2\, \nu_{t-j-1}^2 $$

and $\nu_1^2 = \gamma_x(0)$. Note that $\theta_{t-1,t} = \nu_1^{-2}\, \gamma_x(t)$, assuming that $\gamma_x(h)$ is known or estimated. These equations show that all the coefficients can be calculated recursively.
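To make the recursion concrete, the following Python sketch (an illustrative addition, not part of the original notes) implements (4.1)–(4.2): given an autocovariance function $\gamma_x$ and data $x_1, \ldots, x_n$, it returns the one-step forecasts $\hat{x}_1, \ldots, \hat{x}_n$ (with $\hat{x}_1 = 0$), the coefficients $\theta_{j,t}$ and the mean squared errors $\nu_t^2$. The function and variable names are hypothetical.

```python
import numpy as np

def innovations_forecasts(x, gamma):
    """One-step forecasts of x = (x_1, ..., x_n) via the innovations
    recursion (4.1)-(4.2); gamma(h) is the autocovariance of the process."""
    n = len(x)
    theta = [None] * n            # theta[t] holds (theta_{0,t}, ..., theta_{t-1,t})
    nu2 = np.zeros(n + 1)         # nu2[s] = E(x_s - xhat_s)^2
    nu2[1] = gamma(0)
    xhat = np.zeros(n + 1)        # xhat[1] = 0 by convention
    for t in range(1, n):
        th = np.zeros(t)
        for i in range(t - 1, -1, -1):               # i = t-1 down to 0
            s = sum(theta[t - i - 1][j] * th[j + i + 1] * nu2[t - i - j - 1]
                    for j in range(t - 1 - i))       # empty sum when i = t-1
            th[i] = (gamma(i + 1) - s) / nu2[t - i]
        theta[t] = th
        nu2[t + 1] = gamma(0) - sum(th[j] ** 2 * nu2[t - j] for j in range(t))
        # one-step forecast of x_{t+1}; x is 0-indexed, so x[t-1-j] is x_{t-j}
        xhat[t + 1] = sum(th[j] * (x[t - 1 - j] - xhat[t - j]) for j in range(t))
    return xhat[1:], theta, nu2[1:]
```

For an invertible MA(1), for example, one can check numerically that the recursion gives $\theta_{0,t} \to \theta_1$ and $\nu_t^2 \to \sigma^2$ as $t$ grows, in line with the idea that the innovations eventually recover the underlying white noise.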
h-step ahead prediction

We now want to predict $x_{t+h}$ based on $x_t, \ldots, x_1$. The linear predictor is

$$ \hat{x}_{t+h} = P_{\bar{M}_t}(x_{t+h}) = \sum_{j=0}^{t-1} \theta_{j,t+h-1}\left(x_{t-j} - \hat{x}_{t-j}\right). $$
We want to apply these general results to forecast $x_{t+h}$ if $x_t$ is assumed to follow an ARMA(p, q) process of the form

$$ \phi(L)\, x_t = \theta(L)\, \varepsilon_t, \qquad \varepsilon_t \sim WN(0, \sigma^2), $$

where $\phi(L) = 1 - \phi_1 L - \ldots - \phi_p L^p$ and $\theta(L) = 1 + \theta_1 L + \ldots + \theta_q L^q$. Define

$$ w_t = \sigma^{-1}\, \phi(L)\, x_t \qquad \text{for } t > \max(p, q) $$

and let

$$ w_t = \sigma^{-1}\, x_t \qquad \text{for } t \le \max(p, q). $$
Note that $\mathrm{sp}\{x_t, \ldots, x_1\} = \mathrm{sp}\{w_t, \ldots, w_1\}$. We determine $\hat{w}_t$ recursively as before, i.e.

$$ \hat{w}_1 = 0, $$

$$ \hat{w}_{t+1} = \sum_{j=0}^{t-1} \theta_{j,t}\left(w_{t-j} - \hat{w}_{t-j}\right). $$

Denoting $r_t^2 = \left\| w_t - \hat{w}_t \right\|^2$ we can now determine the coefficients $\theta_{j,t}$ as

$$ \theta_{j,t}\, r_{t-j}^2 = \left\langle w_{t+1},\; w_{t-j} - \hat{w}_{t-j} \right\rangle. \qquad (4.3) $$
Now for $t > \max(p, q)$ and $j > q$ it follows that $w_{t-j} - \hat{w}_{t-j} \in \bar{M}_{t-j}$ and $w_{t+1} \perp \bar{M}_{t-j}$. Therefore $\theta_{j,t} = 0$ for $t > \max(p, q)$ and $j > q$. From (4.3) we have now for $t > \max(p, q)$ and $j < q$

$$ \theta_{j,t} = r_{t-j}^{-2} \left[ \left\langle w_{t+1}, w_{t-j} \right\rangle - \sum_{k=0}^{q-j-1} \theta_{k,t-j-1}\, \theta_{k+j+1,t}\, r_{t-k-j-1}^2 \right] $$

and $\theta_{q,t} = r_{t-q}^{-2}\, \left\langle w_{t+1}, w_{t-q} \right\rangle$.
Also,

$$ r_{t-j}^2 = \left\| w_{t-j} \right\|^2 - \sum_{k=0}^{q-1} \theta_{k,t-j-1}^2\, r_{t-j-1-k}^2, $$

and noting that $E w_i w_j = \gamma_w(i-j)$ with

$$ \gamma_w(i-j) = \begin{cases} \sigma^{-2}\, \gamma_x(i-j) & 1 \le i, j \le m \\ \sigma^{-2} \left[ \gamma_x(i-j) - \sum_{r=1}^{p} \phi_r\, \gamma_x(r - |i-j|) \right] & \min(i,j) \le m < \max(i,j) \le 2m \\ \sum_{r=0}^{q} \theta_r\, \theta_{r+|i-j|} & \min(i,j) > m \\ 0 & \text{otherwise}, \end{cases} $$

where $m = \max(p, q)$ and $\theta_0 = 1$.
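As an illustrative sketch (not part of the original notes), the case distinction above translates directly into code; here gamma_x is assumed to be the (symmetric) autocovariance function of $x_t$, phi and theta are the ARMA coefficient vectors, and all names are hypothetical.

```python
def gamma_w(i, j, gamma_x, phi, theta, sigma2):
    """Covariance E[w_i w_j] of the transformed process, with m = max(p, q)."""
    p, q = len(phi), len(theta)
    m = max(p, q)
    th = [1.0] + list(theta)                  # theta_0 = 1
    lag = abs(i - j)
    if 1 <= min(i, j) and max(i, j) <= m:
        return gamma_x(lag) / sigma2
    if min(i, j) <= m < max(i, j) <= 2 * m:
        # gamma_x is even, so gamma_x(r - lag) = gamma_x(|r - lag|)
        return (gamma_x(lag)
                - sum(phi[r - 1] * gamma_x(abs(r - lag)) for r in range(1, p + 1))) / sigma2
    if min(i, j) > m:
        return sum(th[r] * th[r + lag] for r in range(q + 1) if r + lag <= q)
    return 0.0
```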
These relationships allow for recursive estimation of $\theta_{j,t}$, $r_t^2$ and $\hat{w}_t$. For large $t$ and $\theta(L)$ invertible, $\hat{w}_t$ can be approximately determined by using the parameters $\theta_j$ of the lag polynomial $\theta(L)$ instead of the optimal projection parameters. The predictions for $x_t$ are now obtained from

$$ \hat{w}_t = \sigma^{-1}\, P_{\bar{M}_{t-1}}\, x_t = \sigma^{-1}\, \hat{x}_t \qquad t \le \max(p, q) $$

and

$$ \hat{w}_t = \sigma^{-1}\, P_{\bar{M}_{t-1}}\, \phi(L)\, x_t = \sigma^{-1}\left( \hat{x}_t - \phi_1 x_{t-1} - \ldots - \phi_p x_{t-p} \right) \qquad t > \max(p, q). $$
It follows that $\sigma\,(w_t - \hat{w}_t) = x_t - \hat{x}_t$. Therefore

$$ \hat{x}_{t+1} = \sum_{j=0}^{t-1} \theta_{j,t}\left(x_{t-j} - \hat{x}_{t-j}\right) \qquad t < \max(p, q) $$

$$ \hat{x}_{t+1} = \phi_1 x_t + \ldots + \phi_p x_{t-p+1} + \sum_{j=0}^{q} \theta_{j,t}\left(x_{t-j} - \hat{x}_{t-j}\right) \qquad t \ge \max(p, q). $$

It follows immediately that for $x_t \sim$ ARMA(p, 0)

$$ \hat{x}_{t+1} = \phi_1 x_t + \ldots + \phi_p x_{t-p+1}. $$
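As a concrete aside (an illustrative addition, not from the original notes), the AR(p) predictor is straightforward to compute in code, and iterating it, as described in the next paragraph, also yields h-step forecasts. A minimal Python sketch with assumed parameter values:

```python
def ar_forecast(x, phi, h):
    """Iterative h-step forecasts for an AR(p) model with coefficients phi,
    given observations x = (x_1, ..., x_t); returns [xhat_{t+1}, ..., xhat_{t+h}]."""
    p = len(phi)
    history = list(x)                  # observed values; treat xhat_s = x_s for s <= t
    forecasts = []
    for _ in range(h):
        recent = history[-p:][::-1]    # xhat_{t+k-1}, ..., xhat_{t+k-p}
        xhat = sum(c * z for c, z in zip(phi, recent))
        history.append(xhat)           # feed the forecast back into the recursion
        forecasts.append(xhat)
    return forecasts

# Example with assumed AR(2) coefficients phi = (0.5, 0.3)
print(ar_forecast([0.2, -0.1, 0.4, 0.3], phi=(0.5, 0.3), h=3))
```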
The h-step ahead predictor can be found iteratively,

$$ \hat{x}_{t+2} = P_{\bar{M}_t}\, x_{t+2} = \phi_1\, P_{\bar{M}_t}\, x_{t+1} + \phi_2 x_t + \ldots + \phi_p x_{t-p+2} + \sum_{j=2}^{q} \theta_{j,t+1}\left(x_{t+2-j} - \hat{x}_{t+2-j}\right), \qquad t \ge \max(p, q), $$

so in particular for the AR(p) model

$$ \hat{x}_{t+h} = \phi_1 \hat{x}_{t+h-1} + \phi_2 \hat{x}_{t+h-2} + \ldots + \phi_p \hat{x}_{t+h-p}, \qquad h > p - 1, $$

where forecasts of already observed values are replaced by the observations themselves, i.e. $\hat{x}_s = x_s$ for $s \le t$.
4.1. The Wold Decomposition

We show that a mean zero stationary process $x_t$ can be decomposed into a perfectly predictable component and an MA($\infty$) process with white noise innovations. Let $M_t = \mathrm{sp}\{x_s, s \le t\}$ and define the one-step mean square prediction error as

$$ \sigma^2 = E\left( x_t - P_{M_{t-1}}\, x_t \right)^2. \qquad (4.4) $$
Also let $M_{-\infty} = \bigcap_{t=-\infty}^{\infty} M_t$ such that $M_{-\infty}$ is a closed linear subspace of $M = \mathrm{sp}\{x_t, t \in \mathbb{Z}\}$. We call a process $x_t$ deterministic if $x_t \in M_{-\infty}$. For a deterministic process the forecast error variance is

$$ \sigma^2 = E\left( x_t - P_{M_{t-1}}\, x_t \right)^2 = E\left( x_t - x_t \right)^2 = 0 $$

since $x_t \in M_{-\infty} \subset M_{t-1}$. We prove the Wold decomposition theorem.
Theorem 4.1 (Wold Decomposition). If $x_t$ is weakly stationary and mean zero with $\sigma^2 > 0$ as defined in (4.4), then

$$ x_t = \sum_{j=0}^{\infty} \psi_j\, \varepsilon_{t-j} + v_t \qquad (4.5) $$

with
i) $\varepsilon_t \sim WN(0, \sigma^2)$,
ii) $E(\varepsilon_t v_s) = 0 \;\; \forall\, t, s$,
iii) $v_t \in M_{-\infty}$,
iv) $\sum_j \psi_j^2 < \infty$,
v) $v_t$ is deterministic.
Proof. Let

$$ \varepsilon_t = x_t - P_{M_{t-1}}\, x_t, \qquad \psi_j = \sigma^{-2}\left\langle x_t, \varepsilon_{t-j} \right\rangle, \qquad v_t = x_t - \sum_{j=0}^{\infty} \psi_j\, \varepsilon_{t-j}. $$

We have $\varepsilon_t \in M_t$ and $\varepsilon_t \perp M_{t-1}$ by the projection theorem. Therefore

$$ E(\varepsilon_t \varepsilon_s) = 0 \qquad \forall\, t \ne s. $$

Also $E\varepsilon_t = 0$ by linearity of $P_{M_{t-1}}\, x_t$ and stationarity of $x_t$. Again by linearity of $P_{M_{t-1}}\, x_t$ and weak stationarity of $x_t$ we have $E\varepsilon_t^2 = E\left(x_t - P_{M_{t-1}}\, x_t\right)^2 = \sigma^2$, independent of $t$. This shows that $\varepsilon_t$ is $WN(0, \sigma^2)$.
Also let $H_t = \mathrm{sp}\{\varepsilon_t, \varepsilon_{t-1}, \ldots\}$. Then $H_t$ has a countably infinite orthogonal basis $\{\varepsilon_{t-j}\}_{j \ge 0}$. The projection of $x_t$ onto $H_t$ is then given by

$$ P_{H_t}\, x_t = \sum_{j=0}^{\infty} \psi_j\, \varepsilon_{t-j}. $$
To show this let $y_t = P_{H_t}\, x_t$ such that by the definition of the projection operator $y_t \in H_t$. It now follows that for every $\delta > 0$ and some $k < \infty$

$$ \left\| y_t - \sum_{j=0}^{k} \sigma^{-2}\left\langle y_t, \varepsilon_{t-j} \right\rangle \varepsilon_{t-j} \right\|^2 = \sigma^{-2} \sum_{j=k+1}^{\infty} \left|\left\langle y_t, \varepsilon_{t-j} \right\rangle\right|^2 < \delta. $$

To see this first note that $\| y_t \|^2 \le \| x_t \|^2$ by the projection theorem. Then by Bessel's inequality

$$ \sigma^{-2} \sum_{j=0}^{k} \left|\left\langle y_t, \varepsilon_{t-j} \right\rangle\right|^2 \le \| y_t \|^2 \qquad \text{for all } k, $$

which proves the above inequality.
Next note that

$$ \left\langle y_t, \varepsilon_{t-j} \right\rangle = \left\langle y_t - x_t, \varepsilon_{t-j} \right\rangle + \left\langle x_t, \varepsilon_{t-j} \right\rangle = \left\langle x_t, \varepsilon_{t-j} \right\rangle = \sigma^2 \psi_j $$

since $y_t - x_t$ is orthogonal to $H_t$. We have therefore established that

$$ \left\| y_t - \sum_{j=0}^{k} \psi_j\, \varepsilon_{t-j} \right\|^2 < \delta $$

and that $\sum_{j=0}^{\infty} \psi_j^2 < \infty$. Since $\delta$ is arbitrary, this shows that $y_t = P_{H_t}\, x_t = \sum_{j=0}^{\infty} \psi_j\, \varepsilon_{t-j}$.
Then

$$ E v_t \varepsilon_s = E\left[\left( x_t - \sum_{j=0}^{\infty} \psi_j\, \varepsilon_{t-j} \right) \varepsilon_s\right] = E x_t \varepsilon_s - \psi_{t-s}\, E\varepsilon_s^2 = E x_t \varepsilon_s - \left( \sigma^{-2}\, E x_t \varepsilon_s \right) \sigma^2 = 0 $$

for $s \le t$, and for $s > t$ we have $\varepsilon_s \perp M_{s-1} \supset M_t$ but $v_t \in M_t$, so $E v_t \varepsilon_s = 0$ again. From $v_t \in M_t = M_{t-1} \oplus \mathrm{sp}\{\varepsilon_t\}$ and $E v_t \varepsilon_t = 0$ it follows that $v_t \in M_{t-1}$. Repeating the same argument and using $E v_t \varepsilon_{t-j} = 0$ leads to $v_t \in M_{t-j}$, thus $v_t \in \bigcap_{j=0}^{\infty} M_{t-j} \subset M_{-\infty}$. Then

$$ \mathrm{sp}\{v_j,\, j \le t\} \subset M_{-\infty}. $$
From $x_t = v_t + \sum_{j=0}^{\infty} \psi_j\, \varepsilon_{t-j}$ we have

$$ M_t = H_t \oplus \mathrm{sp}\{v_j,\, j \le t\}. \qquad (4.6) $$

Finally, if $z \in M_{-\infty}$ then $z \in M_{s-1}$ for all $s$, such that $\left\langle z, \varepsilon_s \right\rangle = 0$ for all $s$. But this shows that $z \perp H_t$, or $z \in \mathrm{sp}\{v_j,\, j \le t\}$ by (4.6), such that $M_{-\infty} \subset \mathrm{sp}\{v_j,\, j \le t\}$, implying that

$$ M_{-\infty} = \mathrm{sp}\{v_j,\, j \le t\} \qquad \text{for all } t. $$

This means that $v_j$ is deterministic, i.e. the prediction error variance is zero.
A process is said to be purely non-deterministic if $M_{-\infty} = \{0\}$. In this case the Wold decomposition is of the form

$$ x_t = \sum_{j=0}^{\infty} \psi_j\, \varepsilon_{t-j}, $$

where $\psi_j$ and $\varepsilon_{t-j}$ are as defined before. Processes in this class include the ARMA(p, q) model introduced before.
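For a causal ARMA model, for instance, the Wold coefficients are the MA($\infty$) weights of $\psi(z) = \theta(z)/\phi(z)$, which can be generated numerically by the standard recursion $\psi_j = \theta_j + \sum_k \phi_k \psi_{j-k}$. The Python sketch below is an illustrative addition (not part of the original notes) with assumed parameter names.

```python
def arma_psi_weights(phi, theta, n):
    """First n MA(infinity) weights psi_j of a causal ARMA(p, q):
    psi_0 = 1 and psi_j = theta_j + sum_k phi_k * psi_{j-k}."""
    p, q = len(phi), len(theta)
    psi = [1.0]
    for j in range(1, n):
        val = theta[j - 1] if j <= q else 0.0
        val += sum(phi[k] * psi[j - 1 - k] for k in range(min(p, j)))
        psi.append(val)
    return psi

print(arma_psi_weights(phi=[0.5], theta=[0.4], n=5))
# [1.0, 0.9, 0.45, 0.225, 0.1125], i.e. psi_j = (phi + theta) * phi**(j-1) for j >= 1
```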
The h-step ahead predictor of (4.5) is given by

$$ P_{M_t}\, x_{t+h} = \sum_{j=h}^{\infty} \psi_j\, \varepsilon_{t+h-j} + v_{t+h} $$

since $\varepsilon_{t+k} \perp M_t$
for $k > 0$. It also follows that the variance of the prediction error is given by

$$ \left\| x_{t+h} - P_{M_t}\, x_{t+h} \right\|^2 = \left\| \sum_{j=0}^{h-1} \psi_j\, \varepsilon_{t+h-j} \right\|^2 = \sigma^2 \sum_{j=0}^{h-1} \psi_j^2. $$

As $h \to \infty$ the variance of the prediction error tends to the variance of $x_t$.
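As a small numerical illustration (added here, not in the original notes), for a causal AR(1) with parameter $\phi$ the Wold coefficients are $\psi_j = \phi^j$, and the h-step prediction error variance $\sigma^2 \sum_{j=0}^{h-1} \psi_j^2$ indeed approaches the process variance $\sigma^2/(1-\phi^2)$ as $h$ grows.

```python
phi, sigma2 = 0.8, 1.0                     # assumed AR(1) parameters
var_x = sigma2 / (1 - phi ** 2)            # variance of x_t (about 2.7778 here)
for h in (1, 2, 5, 20, 100):
    mse_h = sigma2 * sum(phi ** (2 * j) for j in range(h))   # sigma^2 * sum of psi_j^2
    print(h, round(mse_h, 4), round(var_x, 4))
# the h-step mean squared error increases toward var_x as h grows
```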