
The Durbin-Levinson Algorithm

Consider the problem of estimating the parameters of an AR(p) model.


We already have the Yule-Walker equations, whose solution is

$$\boldsymbol{\phi}_p = \Gamma_p^{-1}\boldsymbol{\gamma}_p, \qquad \sigma^2_Z = \gamma_X(0) - \boldsymbol{\phi}_p'\boldsymbol{\gamma}_p.$$
The Durbin-Levinson algorithm provides an alternative that avoids the matrix inversion in
the Yule-Walker equations.
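For contrast, here is a minimal sketch (Python with NumPy; the function name and the assumption that the autocovariances are already in hand are mine, not part of these notes) of the direct Yule-Walker solve that the DL-algorithm will let us avoid:

```python
import numpy as np

def yule_walker_direct(gamma, p):
    """Solve the Yule-Walker equations by a direct linear solve.

    gamma : sequence of autocovariances gamma_X(0), ..., gamma_X(p)
    Returns (phi, sigma2): the AR(p) coefficients and the noise variance.
    """
    # Gamma_p is the p x p Toeplitz matrix with (i, j) entry gamma_X(|i - j|)
    Gamma_p = np.array([[gamma[abs(i - j)] for j in range(p)] for i in range(p)])
    gamma_p = np.asarray(gamma[1:p + 1])
    phi = np.linalg.solve(Gamma_p, gamma_p)   # phi_p = Gamma_p^{-1} gamma_p
    sigma2 = gamma[0] - phi @ gamma_p         # sigma_Z^2 = gamma_X(0) - phi_p' gamma_p
    return phi, sigma2
```

The point of what follows is that the recursion below gets the same information without ever forming or inverting $\Gamma_p$.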
It is actually a prediction algorithm. We will see that it can also be used for parameter estimation in the AR(p) model.
Nice side effects of using the DL-algorithm: we will automatically get partial autocorrelations and mean-squared errors associated with our predictions!
The down-side is that it will not help us in our upcoming dance war with the mathstats
class.
It may or may not run on squirrels. This is still an open problem.
The DL-algorithm is an example of a recursive prediction algorithm.
Suppose we predict $X_{n+1}$ from $X_1, X_2, \ldots, X_n$.

Suppose then that time goes by and we get to observe $X_{n+1}$, but now we want to predict $X_{n+2}$ from $X_1, X_2, \ldots, X_{n+1}$.

We could start from scratch, or we could use what we learned from predicting $X_{n+1}$ and update that somehow!
The setup for the DL-algorithm is a mean zero (otherwise subtract the mean, predict, and add it back), stationary process $\{X_t\}$ with covariance function $\gamma_X(h)$.
Notation:

The best linear predictor of $X_{n+1}$ given $X_1, X_2, \ldots, X_n$:

$$\hat{X}_{n+1} = b_{nn}X_1 + b_{n,n-1}X_2 + \cdots + b_{n1}X_n = \sum_{i=1}^{n} b_{ni}X_{n-i+1}.$$
The mean squared prediction error is

$$v_n = E\left[\left(X_{n+1} - \hat{X}_{n+1}\right)^2\right].$$
We want to recursively compute the best $b$'s and, at the same time, compute the $v$'s.
In the proof of the DL-algorithm, it becomes apparent why
$$b_{nn} = \alpha_X(n) = \text{the PACF at lag } n.$$
Without further ado...
The Durbin-Levinson Algorithm
Step Zero Set $b_{00} = 0$, $v_0 = \gamma_X(0)$, and $n = 1$.
Step One Compute
$$b_{nn} = \left[\gamma_X(n) - \sum_{i=1}^{n-1} b_{n-1,i}\,\gamma_X(n-i)\right] v_{n-1}^{-1}.$$
Step Two For $n \geq 2$, compute
$$\begin{pmatrix} b_{n1} \\ \vdots \\ b_{n,n-1} \end{pmatrix}
= \begin{pmatrix} b_{n-1,1} \\ \vdots \\ b_{n-1,n-1} \end{pmatrix}
- b_{nn}\begin{pmatrix} b_{n-1,n-1} \\ \vdots \\ b_{n-1,1} \end{pmatrix}.$$
Step Three Compute
$$v_n = v_{n-1}\left(1 - b_{nn}^2\right).$$
Set $n = n + 1$ and return to Step One.
(Note: The DL-algorithm requires that $\gamma_X(0) > 0$ and that $\gamma_X(n) \to 0$ as $n \to \infty$.)
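Written out as code, the recursion is a minimal Python sketch (not part of the original notes; the function name and the convention that `b[n][i-1]` holds $b_{n,i}$ are my own choices):

```python
def durbin_levinson(gamma):
    """Run the Durbin-Levinson recursion.

    gamma : autocovariances gamma_X(0), ..., gamma_X(N)
    Returns (b, v), where b[n] = [b_{n,1}, ..., b_{n,n}] are the coefficients
    for predicting X_{n+1} from X_1, ..., X_n, and v[n] is the MSE v_n.
    """
    N = len(gamma) - 1
    b = [[]]                       # Step Zero: b_00 = 0 (empty row)
    v = [gamma[0]]                 # v_0 = gamma_X(0)
    for n in range(1, N + 1):
        # Step One: b_nn = [gamma(n) - sum b_{n-1,i} gamma(n-i)] / v_{n-1}
        bnn = (gamma[n] - sum(b[n - 1][i - 1] * gamma[n - i]
                              for i in range(1, n))) / v[n - 1]
        # Step Two: b_{n,j} = b_{n-1,j} - b_nn * b_{n-1,n-j}, j = 1, ..., n-1
        row = [b[n - 1][i - 1] - bnn * b[n - 1][n - 1 - i] for i in range(1, n)]
        b.append(row + [bnn])
        # Step Three: v_n = v_{n-1} (1 - b_nn^2)
        v.append(v[n - 1] * (1 - bnn**2))
    return b, v
```

Note that no matrix is ever inverted: pass $n$ only needs $\gamma_X(n)$, the previous row of $b$'s, and $v_{n-1}$.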
Proof:
1. Set $A_1$ to be the span of $\{X_2, \ldots, X_n\}$. That is, let $A_1$ be the set of all random variables that can be formed from linear combinations of $X_2, \ldots, X_n$.
Let $A_2$ be the span of the single random variable $X_1 - L_{\{X_2,\ldots,X_n\}}(X_1)$. Here, $L_{\{X_2,\ldots,X_n\}}(X_1)$ is our usual notation for the best linear predictor of $X_1$ based on $X_2, \ldots, X_n$. (As mentioned in class, it is the projection of $X_1$ onto the subspace generated by $X_2, \ldots, X_n$ and is more commonly written as $P_{\mathrm{sp}\{X_2,\ldots,X_n\}}(X_1)$.)
Note: $\hat{X}_{n+1} = L_{\{X_1,\ldots,X_n\}}(X_{n+1})$ if and only if
- $\hat{X}_{n+1}$ is a linear combination of $X_1, \ldots, X_n$, and
- $E\left[\left(X_{n+1} - \hat{X}_{n+1}\right)X_i\right] = 0$ for $i = 1, 2, \ldots, n$.

(That second condition is from the derivative set equal to zero when minimizing the MSE of the best linear predictor.)
2. Claim: $A_1$ and $A_2$ are orthogonal in the sense that if $Y_1 \in A_1$ and $Y_2 \in A_2$ then $E[Y_1 Y_2] = 0$.
Proof of claim:

$Y_1 \in A_1$ implies that $Y_1$ has the form
$$Y_1 = a_2 X_2 + \cdots + a_n X_n.$$
$Y_2 \in A_2$ implies that $Y_2$ has the form
$$Y_2 = a\left(X_1 - L_{\{X_2,\ldots,X_n\}}(X_1)\right).$$
So,
$$E[Y_1 Y_2] = E\left[\left(\sum_{i=2}^{n} a_i X_i\right) a\left(X_1 - L_{\{X_2,\ldots,X_n\}}(X_1)\right)\right]
= a\sum_{i=2}^{n} a_i\, E\left[X_i\left(X_1 - L_{\{X_2,\ldots,X_n\}}(X_1)\right)\right] = 0$$
because that expectation is zero for each $i$ in the sum: the prediction error $X_1 - L_{\{X_2,\ldots,X_n\}}(X_1)$ is orthogonal to each predictor variable $X_2, \ldots, X_n$ by the note above.
3. Note that
$$\hat{X}_{n+1} = L_{\{X_1,\ldots,X_n\}}(X_{n+1}) = L_{A_1}(X_{n+1}) + L_{A_2}(X_{n+1})
= L_{\{X_2,\ldots,X_n\}}(X_{n+1}) + a\left(X_1 - L_{\{X_2,\ldots,X_n\}}(X_1)\right)$$
for some $a \in \mathbb{R}$. (The projection splits this way because $\mathrm{sp}\{X_1,\ldots,X_n\} = A_1 \oplus A_2$ and, by the claim in 2., $A_1 \perp A_2$.)
4. In general, if we want to find the best linear predictor of a random variable $Y$ based on a random variable $X$, $\hat{Y} = aX$, we minimize $E\left[(Y - aX)^2\right]$ with respect to $a$. It is easy to show that $a = E[XY]/E[X^2]$.
In our problem then, we have that
$$a = \frac{E\left[X_{n+1}\left(X_1 - L_{\{X_2,\ldots,X_n\}}(X_1)\right)\right]}{E\left[\left(X_1 - L_{\{X_2,\ldots,X_n\}}(X_1)\right)^2\right]}.$$
5. Note that $(X_1, \ldots, X_n)^t$, $(X_n, \ldots, X_1)^t$, and $(X_2, \ldots, X_{n+1})^t$, for example, all have the same variance-covariance matrix.

Since best linear prediction depends only on the variance-covariance matrix, the same coefficients $b_{n-1,i}$ appear in both $L_{\{X_2,\ldots,X_n\}}(X_{n+1})$ and $L_{\{X_2,\ldots,X_n\}}(X_1)$, just attached to the variables in reverse order, since the lag differences involved are the same.
In our notation,
$$L_{\{X_2,\ldots,X_n\}}(X_{n+1}) = b_{n-1,n-1}X_2 + b_{n-1,n-2}X_3 + \cdots + b_{n-1,1}X_n = \sum_{i=1}^{n-1} b_{n-1,i}\,X_{n+1-i}$$
and
$$L_{\{X_2,\ldots,X_n\}}(X_1) = b_{n-1,1}X_2 + b_{n-1,2}X_3 + \cdots + b_{n-1,n-1}X_n = \sum_{i=1}^{n-1} b_{n-1,i}\,X_{i+1}.$$
So,
$$E\left[\left(X_1 - L_{\{X_2,\ldots,X_n\}}(X_1)\right)^2\right]
= E\left[\left(X_{n+1} - L_{\{X_2,\ldots,X_n\}}(X_{n+1})\right)^2\right]
= E\left[\left(X_n - L_{\{X_1,\ldots,X_{n-1}\}}(X_n)\right)^2\right] = v_{n-1}.$$
6. Therefore,
$$a = \frac{\gamma_X(n) - E\left[X_{n+1}\sum_{i=1}^{n-1} b_{n-1,i}X_{i+1}\right]}{v_{n-1}}
= \left[\gamma_X(n) - \sum_{i=1}^{n-1} b_{n-1,i}\,\gamma_X(n-i)\right] v_{n-1}^{-1},$$
which is the formula given in Step One of the DL-algorithm!
7.
$$\hat{X}_{n+1} = L_{A_1}(X_{n+1}) + a\left(X_1 - L_{A_1}(X_1)\right)
= aX_1 + \sum_{i=1}^{n-1}\left(b_{n-1,i} - a\,b_{n-1,n-i}\right)X_{n+1-i}.$$

Hey! Wait! We know that
$$\hat{X}_{n+1} = \sum_{i=1}^{n} b_{ni}X_{n+1-i}.$$
8. Since $\Gamma_n$ is invertible (since we assume here that $\gamma_X(0) > 0$ and that $\gamma_X(n) \to 0$ as $n \to \infty$), the two solutions for $\hat{X}_{n+1}$ in (7.) are equal. Equating coefficients gives us
$$b_{nn} = a, \qquad b_{nj} = b_{n-1,j} - a\,b_{n-1,n-j}.$$
This is Step Two of the DL-algorithm.
9. Now
$$\begin{aligned}
v_n &= E\left[\left(X_{n+1} - \hat{X}_{n+1}\right)^2\right] \\
&= E\left[\left(X_{n+1} - L_{A_1}(X_{n+1}) - L_{A_2}(X_{n+1})\right)^2\right] \\
&= E\left[\left(X_{n+1} - L_{A_1}(X_{n+1})\right)^2\right]
- 2E\left[\left(X_{n+1} - L_{A_1}(X_{n+1})\right)L_{A_2}(X_{n+1})\right]
+ E\left[\left(L_{A_2}(X_{n+1})\right)^2\right] \\
&= v_{n-1} - 2E\left[X_{n+1}L_{A_2}(X_{n+1})\right]
+ E\left[a^2\left(X_1 - L_{\{X_2,\ldots,X_n\}}(X_1)\right)^2\right] \\
&= v_{n-1} + a^2 v_{n-1} - 2E\left[X_{n+1}\,a\left(X_1 - L_{\{X_2,\ldots,X_n\}}(X_1)\right)\right] \\
&= (1 + a^2)v_{n-1} - 2a\,E\left[X_{n+1}\left(X_1 - L_{\{X_2,\ldots,X_n\}}(X_1)\right)\right].
\end{aligned}$$
(In the fourth line, the first term is $v_{n-1}$ by step 5, and $L_{A_1}(X_{n+1})$ drops out of the cross term because $A_1 \perp A_2$.)

But,
$$a = b_{nn} = \frac{E\left[X_{n+1}\left(X_1 - L_{\{X_2,\ldots,X_n\}}(X_1)\right)\right]}{v_{n-1}}.$$

So,
$$v_n = (1 + a^2)v_{n-1} - 2a \cdot a\,v_{n-1} = (1 - a^2)v_{n-1} = \left(1 - b_{nn}^2\right)v_{n-1},$$
which is Step Three of the DL-algorithm!
The PACF Connection:

During the proof of the DL-algorithm, we saw that
$$b_{nn} = a = \frac{E\left[X_{n+1}\left(X_1 - L_{\{X_2,\ldots,X_n\}}(X_1)\right)\right]}{E\left[\left(X_1 - L_{\{X_2,\ldots,X_n\}}(X_1)\right)^2\right]},$$
and that this may be rewritten (see step 5 of the DL-proof) as
$$= \frac{E\left[\left(X_{n+1} - L_{\{X_2,\ldots,X_n\}}(X_{n+1})\right)\left(X_1 - L_{\{X_2,\ldots,X_n\}}(X_1)\right)\right]}
{\left(E\left[\left(X_1 - L_{\{X_2,\ldots,X_n\}}(X_1)\right)^2\right]\right)^{1/2}\left(E\left[\left(X_{n+1} - L_{\{X_2,\ldots,X_n\}}(X_{n+1})\right)^2\right]\right)^{1/2}}.$$
But this is the definition of
$$\mathrm{Corr}\left(X_{n+1} - L_{\{X_2,\ldots,X_n\}}(X_{n+1}),\; X_1 - L_{\{X_2,\ldots,X_n\}}(X_1)\right),$$
which is the definition of $\alpha_X(n)$, the PACF of $\{X_t\}$ at lag $n$.
Example: AR(2), $X_t = \phi_1 X_{t-1} + \phi_2 X_{t-2} + Z_t$ where $\{Z_t\} \sim WN(0, \sigma^2_Z)$, and $\phi_1$ and $\phi_2$ are known (we are doing prediction, not estimation of parameters) and are such that the process is causal.

We wish to recursively predict $\hat{X}_{n+1}$ for $n = 1, 2, \ldots$, based on previous values, and give the MSE of the predictions.

We will need $\gamma_X(0), \gamma_X(1), \gamma_X(2), \ldots$, but we can solve for them individually as needed, i.e.:
We set up the standard equations by multiplying the AR equation by $X_{t-k}$ and taking expectations:
$$\begin{aligned}
\gamma_X(0) - \phi_1\gamma_X(1) - \phi_2\gamma_X(2) &= \sigma^2_Z \\
\gamma_X(1) - \phi_1\gamma_X(0) - \phi_2\gamma_X(1) &= 0 \\
\gamma_X(2) - \phi_1\gamma_X(1) - \phi_2\gamma_X(0) &= 0
\end{aligned}$$
which give us
$$\gamma_X(0) = \frac{1 - \phi_2}{(1 + \phi_2)(1 - \phi_1 - \phi_2)(1 + \phi_1 - \phi_2)}\,\sigma^2_Z,$$
$$\gamma_X(1) = \frac{\phi_1}{\text{same denominator}}\,\sigma^2_Z,$$
and
$$\gamma_X(2) = \frac{\phi_1^2 + \phi_2 - \phi_2^2}{\text{same denominator}}\,\sigma^2_Z.$$
Since, for $k \geq 2$ we have
$$\gamma_X(k) = \phi_1\gamma_X(k-1) + \phi_2\gamma_X(k-2),$$
we can easily get additional $\gamma$'s as needed.
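As a quick numerical sketch (Python again, not from the original notes; the specific values $\phi_1 = 0.5$, $\phi_2 = 0.3$, $\sigma_Z^2 = 1$ are made up purely for illustration), the closed-form expressions above plus the recursion give us as many $\gamma$'s as we want:

```python
def ar2_autocov(phi1, phi2, sigma2, max_lag):
    """Autocovariances gamma_X(0), ..., gamma_X(max_lag) of a causal AR(2).

    Assumes max_lag >= 1 and causal (phi1, phi2).
    """
    denom = (1 + phi2) * (1 - phi1 - phi2) * (1 + phi1 - phi2)
    gamma = [0.0] * (max_lag + 1)
    gamma[0] = (1 - phi2) * sigma2 / denom
    gamma[1] = phi1 * sigma2 / denom
    # for k >= 2: gamma_X(k) = phi1 * gamma_X(k-1) + phi2 * gamma_X(k-2)
    for k in range(2, max_lag + 1):
        gamma[k] = phi1 * gamma[k - 1] + phi2 * gamma[k - 2]
    return gamma

gamma = ar2_autocov(phi1=0.5, phi2=0.3, sigma2=1.0, max_lag=5)
```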
The DL-algorithm:

n = 1

$$b_{00} = 0, \qquad v_0 = \gamma_X(0)$$
$$b_{11} = \left[\gamma_X(1)\right]v_0^{-1} = \frac{\gamma_X(1)}{\gamma_X(0)} = \rho_X(1) = \frac{\phi_1}{1 - \phi_2}$$
$$v_1 = v_0\left(1 - b_{11}^2\right) = \gamma_X(0)\left[1 - \frac{\phi_1^2}{(1 - \phi_2)^2}\right]$$

So, the best linear predictor of $X_2$ based on $X_1$ is
$$\hat{X}_2 = b_{11}X_1 = \frac{\phi_1}{1 - \phi_2}\,X_1$$
and the MSE of this predictor is $v_1$.
n = 2

$$b_{22} = \left[\gamma_X(2) - b_{11}\gamma_X(1)\right]v_1^{-1}
= \frac{\gamma_X(2) - \frac{\gamma_X(1)}{\gamma_X(0)}\gamma_X(1)}{\gamma_X(0)\left[1 - \left(\frac{\gamma_X(1)}{\gamma_X(0)}\right)^2\right]}
= \frac{\rho_X(2) - \rho_X^2(1)}{1 - \rho_X^2(1)} = \cdots = \phi_2$$
$$b_{21} = b_{11} - b_{22}b_{11} = \frac{\phi_1}{1 - \phi_2}\,(1 - \phi_2) = \phi_1$$
$$v_2 = v_1\left(1 - b_{22}^2\right) = \gamma_X(0)\left[1 - \frac{\phi_1^2}{(1 - \phi_2)^2}\right]\left(1 - \phi_2^2\right)$$

So, the best linear predictor of $X_3$ based on $X_1$ and $X_2$ is
$$\hat{X}_3 = b_{22}X_1 + b_{21}X_2 = \phi_2 X_1 + \phi_1 X_2$$
(Hmmm... is this surprising?) and the MSE associated with this prediction is $v_2$.
n = 3

Continuing, we get
$$b_{33} = \cdots = \frac{\rho_X(3) - \phi_1\rho_X(2) - \phi_2\rho_X(1)}{1 - \phi_1\rho_X(1) - \phi_2\rho_X(2)}$$
and now the reason for the transformation to this $\rho$-representation becomes apparent! That numerator is zero for this AR(2) model!
Now
$$\begin{pmatrix} b_{31} \\ b_{32} \end{pmatrix}
= \begin{pmatrix} b_{21} \\ b_{22} \end{pmatrix}
- b_{33}\begin{pmatrix} b_{22} \\ b_{21} \end{pmatrix}
= \begin{pmatrix} b_{21} \\ b_{22} \end{pmatrix}
= \begin{pmatrix} \phi_1 \\ \phi_2 \end{pmatrix}.$$
So, the best linear predictor of $X_4$ given $X_1$, $X_2$, and $X_3$ is
$$\hat{X}_4 = b_{33}X_1 + b_{32}X_2 + b_{31}X_3 = \phi_2 X_2 + \phi_1 X_3.$$
(Hmmm, also not surprising!)
In fact, for all future $n$, we will get $b_{nn} = 0$ and
$$\hat{X}_{n+1} = \phi_1 X_n + \phi_2 X_{n-1}.$$
Note that we also now know the PACF for this AR(2) model:
$$\alpha_X(1) = b_{11} = \frac{\phi_1}{1 - \phi_2}, \qquad \alpha_X(2) = b_{22} = \phi_2, \qquad \alpha_X(3) = b_{33} = 0,$$
and
$$\alpha_X(n) = b_{nn} = 0 \quad \text{for } n > 2.$$
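To close the loop on the example, here is a small check (Python, reusing the hypothetical `ar2_autocov` and `durbin_levinson` sketches above with the same made-up parameter values): the DL recursion applied to the AR(2) autocovariances should reproduce $b_{11} = \phi_1/(1-\phi_2)$, $b_{22} = \phi_2$, $b_{nn} = 0$ for $n > 2$, and $v_n = \sigma_Z^2$ for $n \geq 2$.

```python
phi1, phi2, sigma2 = 0.5, 0.3, 1.0
gamma = ar2_autocov(phi1, phi2, sigma2, max_lag=5)
b, v = durbin_levinson(gamma)

pacf = [b[n][-1] for n in range(1, 6)]   # b_nn for n = 1, ..., 5
print(pacf[0], phi1 / (1 - phi2))        # both should equal phi_1/(1 - phi_2)
print(pacf[1], phi2)                     # both should equal phi_2
print(pacf[2:])                          # zero (up to rounding) for n > 2
print(v[2], sigma2)                      # v_n = sigma_Z^2 once n >= 2
```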
