
State Space Models and Filtering

Jesús Fernández-Villaverde
University of Pennsylvania

State Space Form


What is a state space representation?
States versus observables.
Why is it useful?
Relation with filtering.
Relation with optimal control.
Linear versus nonlinear, Gaussian versus non-Gaussian.

State Space Representation

Consider the following system:

Transition equation:
x_{t+1} = F x_t + G ω_{t+1},  ω_{t+1} ~ N(0, Q)

Measurement equation:
z_t = H' x_t + υ_t,  υ_t ~ N(0, R)

where x_t are the states and z_t are the observables.

Assume we want to write the likelihood function of z^T = {z_t}_{t=1}^T.

The State Space Representation is Not Unique

Take the previous state space representation.

Let B be a non-singular square matrix conforming with F.

Then, if x*_t = B x_t, F* = B F B^{-1}, G* = B G, and H*' = H' B^{-1}, we can write a new, equivalent representation:

Transition equation:
x*_{t+1} = F* x*_t + G* ω_{t+1},  ω_{t+1} ~ N(0, Q)

Measurement equation:
z_t = H*' x*_t + υ_t,  υ_t ~ N(0, R)

Example I

Assume the following AR(2) process:

z_t = ρ_1 z_{t-1} + ρ_2 z_{t-2} + ε_t,  ε_t ~ N(0, σ²)

The model is apparently not Markovian.

Can we write this model in different state space forms?

Yes!

State Space Representation I

Transition equation:

x_t = [ρ_1, 1; ρ_2, 0] x_{t-1} + [1; 0] ε_t

where x_t = [y_t, ρ_2 y_{t-1}]'.

Measurement equation:

z_t = [1, 0] x_t

State Space Representation II

Transition equation:

x_t = [ρ_1, ρ_2; 1, 0] x_{t-1} + [1; 0] ε_t

where x_t = [y_t, y_{t-1}]'.

Measurement equation:

z_t = [1, 0] x_t

Try B = [1, 0; 0, ρ_2] on the second system to get the first system.
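A quick numeric check of the equivalence (a sketch, not from the slides; ρ_1 = 0.5 and ρ_2 = 0.3 are made-up values): simulating the AR(2) through both state space forms yields identical observables, and B = [1, 0; 0, ρ_2] maps the second system into the first.

```python
import numpy as np

rho1, rho2 = 0.5, 0.3
# Representation I: x_t = [z_t, rho2*z_{t-1}]'
F1 = np.array([[rho1, 1.0], [rho2, 0.0]])
# Representation II: x_t = [z_t, z_{t-1}]'
F2 = np.array([[rho1, rho2], [1.0, 0.0]])
G = np.array([1.0, 0.0])
H = np.array([1.0, 0.0])          # z_t = [1 0] x_t in both cases

rng = np.random.default_rng(0)
eps = rng.standard_normal(200)

def simulate(F):
    x = np.zeros(2)
    z = []
    for e in eps:
        x = F @ x + G * e          # transition equation
        z.append(H @ x)            # measurement equation
    return np.array(z)

z1, z2 = simulate(F1), simulate(F2)
print(np.allclose(z1, z2))         # both forms generate the same z_t

# The change of basis B maps representation II into representation I:
B = np.diag([1.0, rho2])
print(np.allclose(B @ F2 @ np.linalg.inv(B), F1))
```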

Example II

Assume the following MA(1) process:

z_t = ε_t + θ ε_{t-1},  ε_t ~ N(0, σ²), and E[ε_t ε_s] = 0 for s ≠ t.

Again, we have a more complicated structure than a simple Markovian process.

However, it will again be straightforward to write a state space representation.

State Space Representation I

Transition equation:

x_t = [0, 1; 0, 0] x_{t-1} + [1; θ] ε_t

where x_t = [y_t, θ ε_t]'.

Measurement equation:

z_t = [1, 0] x_t

State Space Representation II

Transition equation:

x_t = ε_{t-1}

where x_t = [ε_{t-1}]'.

Measurement equation:

z_t = θ x_t + ε_t

Again, both representations are equivalent!

Example III

Assume the following random walk plus drift process:

z_t = z_{t-1} + μ + ε_t,  ε_t ~ N(0, σ²)

This is even more interesting:
We have a unit root.
We have a constant parameter (the drift).

State Space Representation

Transition equation:

x_t = [1, 1; 0, 1] x_{t-1} + [1; 0] ε_t

where x_t = [y_t, μ]'.

Measurement equation:

z_t = [1, 0] x_t

Some Conditions on the State Space Representation

We only consider stable systems.

A system is stable if, for any initial state x_0, the vector of states, x_t, converges to some unique x̄.

A necessary and sufficient condition for the system to be stable is that:

|λ_i(F)| < 1

for all i, where λ_i(F) stands for the i-th eigenvalue of F.
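The eigenvalue condition is easy to check numerically; a minimal sketch (the example matrices are illustrative, and the second one is the random-walk-plus-drift system, which has unit roots):

```python
import numpy as np

def is_stable(F):
    # Stable iff every eigenvalue of F lies strictly inside the unit circle.
    return bool(np.all(np.abs(np.linalg.eigvals(F)) < 1.0))

F_ar2 = np.array([[0.5, 0.3], [1.0, 0.0]])   # stationary AR(2) companion matrix
F_rw  = np.array([[1.0, 1.0], [0.0, 1.0]])   # random walk with drift: unit roots
print(is_stable(F_ar2), is_stable(F_rw))     # True False
```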

Introducing the Kalman Filter

Developed by Kalman and Bucy.

Wide application in science.
Basic idea.
Prediction, smoothing, and control.
Why the name "filter"?

Some Definitions

Let x_{t|t-1} = E[x_t | z^{t-1}] be the best linear predictor of x_t given the history of observables until t-1, i.e. z^{t-1}.

Let z_{t|t-1} = E[z_t | z^{t-1}] = H' x_{t|t-1} be the best linear predictor of z_t given the history of observables until t-1, i.e. z^{t-1}.

Let x_{t|t} = E[x_t | z^t] be the best linear predictor of x_t given the history of observables until t, i.e. z^t.

What is the Kalman Filter trying to do?

Let us assume we have x_{t|t-1} and z_{t|t-1}.

We observe a new z_t.

We need to obtain x_{t|t}.

Note that x_{t+1|t} = F x_{t|t} and z_{t+1|t} = H' x_{t+1|t}, so we can go back to the first step and wait for z_{t+1}.

Therefore, the key question is how to obtain x_{t|t} from x_{t|t-1} and z_t.

A Minimization Approach to the Kalman Filter I

Assume we use the following equation to get x_{t|t} from z_t and x_{t|t-1}:

x_{t|t} = x_{t|t-1} + K_t (z_t − z_{t|t-1}) = x_{t|t-1} + K_t (z_t − H' x_{t|t-1})

This formula will have some probabilistic justification (to follow).

What is K_t?

A Minimization Approach to the Kalman Filter II

K_t is called the Kalman filter gain and it measures how much we update x_{t|t-1} as a function of our error in predicting z_t.

The question is how to find the optimal K_t.

The Kalman filter is about how to build K_t such that we optimally update x_{t|t} from x_{t|t-1} and z_t.

How do we find the optimal K_t?

Some Additional Definitions

Let Σ_{t|t-1} ≡ E[(x_t − x_{t|t-1})(x_t − x_{t|t-1})' | z^{t-1}] be the predicting error variance-covariance matrix of x_t given the history of observables until t-1, i.e. z^{t-1}.

Let Ω_{t|t-1} ≡ E[(z_t − z_{t|t-1})(z_t − z_{t|t-1})' | z^{t-1}] be the predicting error variance-covariance matrix of z_t given the history of observables until t-1, i.e. z^{t-1}.

Let Σ_{t|t} ≡ E[(x_t − x_{t|t})(x_t − x_{t|t})' | z^t] be the predicting error variance-covariance matrix of x_t given the history of observables until t, i.e. z^t.

Finding the optimal K_t

We want the K_t that minimizes Σ_{t|t}.

It can be shown that, if that is the case:

K_t = Σ_{t|t-1} H (H' Σ_{t|t-1} H + R)^{-1}

with the optimal update of x_{t|t} given z_t and x_{t|t-1} being:

x_{t|t} = x_{t|t-1} + K_t (z_t − H' x_{t|t-1})

We will provide some intuition later.

Example I

Assume the following model in state space form:

Transition equation:
x_t = μ + ε_t,  ε_t ~ N(0, σ_ε²)

Measurement equation:
z_t = x_t + ν_t,  ν_t ~ N(0, σ_ν²)

Let σ_ν² = q σ_ε².

Example II

Then, let Σ_{1|0} = σ_ε², which means that x_1 was drawn from the ergodic distribution of x_t.

We have:

K_1 = σ_ε² (σ_ε² + σ_ν²)^{-1} = 1/(1+q)

Therefore, the bigger σ_ν² relative to σ_ε² (the bigger q), the lower K_1 and the less we trust z_1.
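A numeric check of this example (a sketch with a made-up σ_ε²): the scalar version of the gain formula collapses to 1/(1+q).

```python
sigma_eps2 = 2.0                                  # assumed value of sigma_eps^2
for q in [0.1, 1.0, 10.0]:
    sigma_nu2 = q * sigma_eps2
    # scalar version of K_1 = Sigma H (H' Sigma H + R)^{-1} with H = 1:
    K1 = sigma_eps2 / (sigma_eps2 + sigma_nu2)
    assert abs(K1 - 1.0 / (1.0 + q)) < 1e-12
    print(q, K1)   # the larger q, the smaller K1: we trust z_1 less
```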

The Kalman Filter Algorithm I

Given Σ_{t|t-1}, z_t, and x_{t|t-1}, we can now set up the Kalman filter algorithm.

Given Σ_{t|t-1}, we compute:

Ω_{t|t-1} ≡ E[(z_t − z_{t|t-1})(z_t − z_{t|t-1})' | z^{t-1}]
= E[(H'(x_t − x_{t|t-1}) + υ_t)(H'(x_t − x_{t|t-1}) + υ_t)' | z^{t-1}]
= H' Σ_{t|t-1} H + R

where the cross terms H'(x_t − x_{t|t-1})υ_t' and υ_t(x_t − x_{t|t-1})'H vanish in expectation.

The Kalman Filter Algorithm II

Given Σ_{t|t-1}, we compute:

E[(z_t − z_{t|t-1})(x_t − x_{t|t-1})' | z^{t-1}]
= E[(H'(x_t − x_{t|t-1}) + υ_t)(x_t − x_{t|t-1})' | z^{t-1}] = H' Σ_{t|t-1}

Given Σ_{t|t-1}, we compute:

K_t = Σ_{t|t-1} H (H' Σ_{t|t-1} H + R)^{-1}

Given Σ_{t|t-1}, x_{t|t-1}, K_t, and z_t, we compute:

x_{t|t} = x_{t|t-1} + K_t (z_t − H' x_{t|t-1})

The Kalman Filter Algorithm III

Given Σ_{t|t-1}, x_{t|t-1}, K_t, and z_t, we compute:

Σ_{t|t} ≡ E[(x_t − x_{t|t})(x_t − x_{t|t})' | z^t]
= E[(x_t − x_{t|t-1} − K_t(z_t − H'x_{t|t-1}))(x_t − x_{t|t-1} − K_t(z_t − H'x_{t|t-1}))' | z^t]
= Σ_{t|t-1} − K_t H' Σ_{t|t-1}

where you have to notice that x_t − x_{t|t} = x_t − x_{t|t-1} − K_t (z_t − H' x_{t|t-1}).

The Kalman Filter Algorithm IV

Given Σ_{t|t} and x_{t|t}, we compute:

Σ_{t+1|t} = F Σ_{t|t} F' + G Q G'

1. x_{t+1|t} = F x_{t|t}
2. z_{t+1|t} = H' x_{t+1|t}

Therefore, from x_{t|t-1}, Σ_{t|t-1}, and z_t we compute x_{t|t} and Σ_{t|t}.

The Kalman Filter Algorithm V

We also compute z_{t|t-1} and Ω_{t|t-1}.

Why? To calculate the likelihood function of z^T = {z_t}_{t=1}^T (to follow).

The Kalman Filter Algorithm: A Review

We start with x_{t|t-1} and Σ_{t|t-1}.

Then, we observe z_t and compute:

Ω_{t|t-1} = H' Σ_{t|t-1} H + R
z_{t|t-1} = H' x_{t|t-1}
K_t = Σ_{t|t-1} H (H' Σ_{t|t-1} H + R)^{-1}
Σ_{t|t} = Σ_{t|t-1} − K_t H' Σ_{t|t-1}
x_{t|t} = x_{t|t-1} + K_t (z_t − H' x_{t|t-1})
Σ_{t+1|t} = F Σ_{t|t} F' + G Q G'
x_{t+1|t} = F x_{t|t}

We finish with x_{t+1|t} and Σ_{t+1|t}.
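The review above can be sketched as a small function (illustrative only; S stands for Σ, and the scalar AR(1) example at the bottom uses made-up numbers):

```python
import numpy as np

def kalman_filter(z, F, G, H, Q, R, x0, S0):
    """Run the recursion on data z of shape (T, k); returns the x_{t|t}."""
    x, S = x0, S0
    filtered = []
    for zt in z:
        Omega = H.T @ S @ H + R                 # Omega_{t|t-1}
        K = S @ H @ np.linalg.inv(Omega)        # Kalman gain K_t
        x = x + K @ (zt - H.T @ x)              # update: x_{t|t}
        S = S - K @ H.T @ S                     # update: Sigma_{t|t}
        filtered.append(x)
        x = F @ x                               # predict: x_{t+1|t}
        S = F @ S @ F.T + G @ Q @ G.T           # predict: Sigma_{t+1|t}
    return np.array(filtered)

# Scalar AR(1) state observed with noise (illustrative values):
F = np.array([[0.9]]); G = np.array([[1.0]]); H = np.array([[1.0]])
Q = np.array([[1.0]]); R = np.array([[0.5]])
```

With these matrices, `kalman_filter(z, F, G, H, Q, R, np.zeros(1), Sigma0)` produces filtered state estimates that are less noisy than the raw observations.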

Some Intuition about the optimal K_t

Remember: K_t = Σ_{t|t-1} H (H' Σ_{t|t-1} H + R)^{-1}

Notice that we can rewrite K_t in the following way:

K_t = Σ_{t|t-1} H Ω_{t|t-1}^{-1}

If we made a big mistake forecasting x_{t|t-1} using past information (Σ_{t|t-1} large), we give a lot of weight to the new information (K_t large).

If the new information is noise (R large), we give a lot of weight to the old prediction (K_t small).

A Probabilistic Approach to the Kalman Filter

Assume:

Z|w = [X'|w  Y'|w]' ~ N( [μ_x; μ_y], [Σ_xx, Σ_xy; Σ_yx, Σ_yy] )

Then:

X|y,w ~ N( μ_x + Σ_xy Σ_yy^{-1} (y − μ_y), Σ_xx − Σ_xy Σ_yy^{-1} Σ_yx )

Also x_{t|t-1} ≡ E[x_t | z^{t-1}] and:

Σ_{t|t-1} ≡ E[(x_t − x_{t|t-1})(x_t − x_{t|t-1})' | z^{t-1}]

Some Derivations I

If z_t|z^{t-1} is the random variable z_t (observable) conditional on z^{t-1}, then:

z_{t|t-1} ≡ E[z_t | z^{t-1}] = E[H'x_t + υ_t | z^{t-1}] = H' x_{t|t-1}

and:

Ω_{t|t-1} ≡ E[(z_t − z_{t|t-1})(z_t − z_{t|t-1})' | z^{t-1}]
= E[(H'(x_t − x_{t|t-1}) + υ_t)(H'(x_t − x_{t|t-1}) + υ_t)' | z^{t-1}]
= H' Σ_{t|t-1} H + R

Some Derivations II

Finally, let:

E[(z_t − z_{t|t-1})(x_t − x_{t|t-1})' | z^{t-1}]
= E[(H'(x_t − x_{t|t-1}) + υ_t)(x_t − x_{t|t-1})' | z^{t-1}]
= H' Σ_{t|t-1}

The Kalman Filter First Iteration I

Assume we know x_{1|0} and Σ_{1|0}. Then:

[x_1; z_1] | z^0 ~ N( [x_{1|0}; H'x_{1|0}], [Σ_{1|0}, Σ_{1|0}H; H'Σ_{1|0}, H'Σ_{1|0}H + R] )

Remember that:

X|y,w ~ N( μ_x + Σ_xy Σ_yy^{-1} (y − μ_y), Σ_xx − Σ_xy Σ_yy^{-1} Σ_yx )

The Kalman Filter First Iteration II

Then, we can write:

x_1 | z_1, z^0 = x_1 | z^1 ~ N( x_{1|1}, Σ_{1|1} )

where:

x_{1|1} = x_{1|0} + Σ_{1|0} H (H' Σ_{1|0} H + R)^{-1} (z_1 − H' x_{1|0})

and:

Σ_{1|1} = Σ_{1|0} − Σ_{1|0} H (H' Σ_{1|0} H + R)^{-1} H' Σ_{1|0}

Therefore, we have that:

z_{1|0} = H' x_{1|0}
Ω_{1|0} = H' Σ_{1|0} H + R
x_{1|1} = x_{1|0} + Σ_{1|0} H (H' Σ_{1|0} H + R)^{-1} (z_1 − H' x_{1|0})
Σ_{1|1} = Σ_{1|0} − Σ_{1|0} H (H' Σ_{1|0} H + R)^{-1} H' Σ_{1|0}

Also, since x_2 = F x_1 + G ω_2 and z_2 = H' x_2 + υ_2:

x_{2|1} = F x_{1|1}
Σ_{2|1} = F Σ_{1|1} F' + G Q G'

The Kalman Filter t-th Iteration I

Assume we know x_{t|t-1} and Σ_{t|t-1}. Then:

[x_t; z_t] | z^{t-1} ~ N( [x_{t|t-1}; H'x_{t|t-1}], [Σ_{t|t-1}, Σ_{t|t-1}H; H'Σ_{t|t-1}, H'Σ_{t|t-1}H + R] )

Remember that:

X|y,w ~ N( μ_x + Σ_xy Σ_yy^{-1} (y − μ_y), Σ_xx − Σ_xy Σ_yy^{-1} Σ_yx )

The Kalman Filter t-th Iteration II

Then, we can write:

x_t | z_t, z^{t-1} = x_t | z^t ~ N( x_{t|t}, Σ_{t|t} )

where:

x_{t|t} = x_{t|t-1} + Σ_{t|t-1} H (H' Σ_{t|t-1} H + R)^{-1} (z_t − H' x_{t|t-1})

and:

Σ_{t|t} = Σ_{t|t-1} − Σ_{t|t-1} H (H' Σ_{t|t-1} H + R)^{-1} H' Σ_{t|t-1}

The Kalman Filter Algorithm

Given x_{t|t-1}, Σ_{t|t-1}, and observation z_t:

Ω_{t|t-1} = H' Σ_{t|t-1} H + R
z_{t|t-1} = H' x_{t|t-1}
Σ_{t|t} = Σ_{t|t-1} − Σ_{t|t-1} H (H' Σ_{t|t-1} H + R)^{-1} H' Σ_{t|t-1}
x_{t|t} = x_{t|t-1} + Σ_{t|t-1} H (H' Σ_{t|t-1} H + R)^{-1} (z_t − H' x_{t|t-1})
Σ_{t+1|t} = F Σ_{t|t} F' + G Q G'
x_{t+1|t} = F x_{t|t}

Putting the Minimization and the Probabilistic Approaches Together

From the minimization approach we know that:

x_{t|t} = x_{t|t-1} + K_t (z_t − H' x_{t|t-1})

From the probabilistic approach we know that:

x_{t|t} = x_{t|t-1} + Σ_{t|t-1} H (H' Σ_{t|t-1} H + R)^{-1} (z_t − H' x_{t|t-1})

But since:

K_t = Σ_{t|t-1} H (H' Σ_{t|t-1} H + R)^{-1}

we can also write in the probabilistic approach:

x_{t|t} = x_{t|t-1} + Σ_{t|t-1} H (H' Σ_{t|t-1} H + R)^{-1} (z_t − H' x_{t|t-1})
       = x_{t|t-1} + K_t (z_t − H' x_{t|t-1})

Therefore, both approaches are equivalent.

Writing the Likelihood Function

We want to write the likelihood function of z^T = {z_t}_{t=1}^T:

log ℓ(z^T | F, G, H, Q, R) = Σ_{t=1}^T log ℓ(z_t | z^{t-1}; F, G, H, Q, R)
= − (TN/2) log 2π − (1/2) Σ_{t=1}^T log |Ω_{t|t-1}| − (1/2) Σ_{t=1}^T v_t' Ω_{t|t-1}^{-1} v_t

where N is the dimension of z_t and:

v_t = z_t − z_{t|t-1} = z_t − H' x_{t|t-1}
Ω_{t|t-1} = H' Σ_{t|t-1} H + R
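The likelihood can be accumulated inside the same recursion; a sketch (notation as in the slides, with the prediction step folded into the loop):

```python
import numpy as np

def kf_loglik(z, F, G, H, Q, R, x0, S0):
    """Gaussian log likelihood of z (shape (T, N)) via the Kalman filter."""
    x, S = x0, S0
    ll = 0.0
    N = z.shape[1]
    for zt in z:
        Omega = H.T @ S @ H + R                            # Omega_{t|t-1}
        v = zt - H.T @ x                                   # forecast error v_t
        ll += -0.5 * (N * np.log(2 * np.pi)
                      + np.log(np.linalg.det(Omega))
                      + v @ np.linalg.solve(Omega, v))
        K = S @ H @ np.linalg.inv(Omega)
        x = F @ (x + K @ v)                                # x_{t+1|t}
        S = F @ (S - K @ H.T @ S) @ F.T + G @ Q @ G.T      # Sigma_{t+1|t}
    return ll
```

Maximizing `kf_loglik` over the entries of (F, G, H, Q, R), or over the structural parameters that generate them, gives maximum likelihood estimates.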

Initial conditions for the Kalman Filter

An important step in the Kalman filter is to set the initial conditions:

1. x_{1|0}
2. Σ_{1|0}

Where do they come from?

Since we only consider stable systems, the standard approach is to set:

x_{1|0} = x̄
Σ_{1|0} = Σ̄

where x̄ solves x̄ = F x̄ and Σ̄ = F Σ̄ F' + G Q G'.

How do we find Σ̄?

vec(Σ̄) = [I − F ⊗ F]^{-1} vec(G Q G')
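The vec formula translates directly into code (a sketch; F, G, Q are made up but stable):

```python
import numpy as np

def stationary_cov(F, G, Q):
    """Solve Sigma = F Sigma F' + G Q G' via vec(Sigma) = (I - F kron F)^{-1} vec(GQG')."""
    n = F.shape[0]
    # column-major (Fortran-order) vec matches the identity vec(ABC) = (C' kron A) vec(B)
    vec_sig = np.linalg.solve(np.eye(n * n) - np.kron(F, F),
                              (G @ Q @ G.T).reshape(n * n, order="F"))
    return vec_sig.reshape(n, n, order="F")

F = np.array([[0.9, 0.1], [0.0, 0.5]])
G = np.eye(2)
Q = np.eye(2)
Sig = stationary_cov(F, G, Q)
print(np.allclose(Sig, F @ Sig @ F.T + G @ Q @ G.T))   # fixed point verified
```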

Initial conditions for the Kalman Filter II

Under the following conditions:

1. The system is stable, i.e. all eigenvalues of F are strictly less than one in absolute value.
2. G Q G' and R are p.s.d. symmetric.
3. Σ_{1|0} is p.s.d. symmetric.

Then Σ_{t+1|t} → Σ̄.
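A quick numeric illustration of the theorem (a scalar sketch with made-up values): iterating the Riccati recursion from two different p.s.d. starting points reaches the same limit.

```python
import numpy as np

def riccati_limit(F, G, H, Q, R, S0, iters=500):
    """Iterate Sigma_{t+1|t} = F(Sigma - K H' Sigma)F' + GQG' from S0."""
    S = S0
    for _ in range(iters):
        K = S @ H @ np.linalg.inv(H.T @ S @ H + R)
        S = F @ (S - K @ H.T @ S) @ F.T + G @ Q @ G.T
    return S

F = np.array([[0.8]]); G = np.eye(1); H = np.eye(1)
Q = np.eye(1); R = np.array([[0.5]])
lim_a = riccati_limit(F, G, H, Q, R, np.zeros((1, 1)))
lim_b = riccati_limit(F, G, H, Q, R, 100.0 * np.eye(1))
print(np.allclose(lim_a, lim_b))   # same limit from both starting points
```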

Remarks

1. There are more general theorems than the one just described.

2. Those theorems also cover non-stable systems.

3. Since we are going to work with stable systems, the former theorem is enough.

4. The last theorem gives us a way to find Σ̄ as the limit of Σ_{t+1|t} for any Σ_{1|0} we start with.

The Kalman Filter and DSGE models

Basic Real Business Cycle model:

max E_0 Σ_{t=0}^∞ β^t {θ log c_t + (1 − θ) log(1 − l_t)}

c_t + k_{t+1} = k_t^α (e^{z_t} l_t)^{1−α} + (1 − δ) k_t

z_t = ρ z_{t-1} + ε_t,  ε_t ~ N(0, σ)

Parameters: γ = {α, β, δ, θ, ρ, σ}

Equilibrium Conditions

1/c_t = β E_t [ (1/c_{t+1}) (1 + α e^{z_{t+1}} k_{t+1}^{α−1} l_{t+1}^{1−α} − δ) ]

(1 − θ)/(1 − l_t) = (θ/c_t) (1 − α) e^{z_t} k_t^α l_t^{−α}

c_t + k_{t+1} = e^{z_t} k_t^α l_t^{1−α} + (1 − δ) k_t

z_t = ρ z_{t-1} + ε_t

A Special Case

We set, unrealistically but rather usefully for our point, δ = 1.

In this case, the model has two important and useful features:

1. First, the income and the substitution effect from a productivity shock to labor supply exactly cancel each other. Consequently, l_t is constant and equal to:

l_t = l = θ(1 − α) / ((1 − θ)(1 − αβ) + θ(1 − α))

2. Second, the policy function for capital is k_{t+1} = αβ e^{z_t} k_t^α l^{1−α}.

A Special Case II

The definition of k_{t+1} implies that c_t = (1 − αβ) e^{z_t} k_t^α l^{1−α}.

Let us check whether the Euler equation holds:

1/c_t = β E_t [ (1/c_{t+1}) α e^{z_{t+1}} k_{t+1}^{α−1} l^{1−α} ]

1/((1 − αβ) e^{z_t} k_t^α l^{1−α}) = β E_t [ α e^{z_{t+1}} k_{t+1}^{α−1} l^{1−α} / ((1 − αβ) e^{z_{t+1}} k_{t+1}^α l^{1−α}) ]
= β E_t [ α / ((1 − αβ) k_{t+1}) ]
= αβ / ((1 − αβ) αβ e^{z_t} k_t^α l^{1−α})
= 1 / ((1 − αβ) e^{z_t} k_t^α l^{1−α})

so the Euler equation holds.

Let us check whether the intratemporal condition holds:

(1 − θ)/(1 − l) = (θ/c_t) (1 − α) e^{z_t} k_t^α l^{−α}
(1 − θ)/(1 − l) = θ (1 − α) e^{z_t} k_t^α l^{−α} / ((1 − αβ) e^{z_t} k_t^α l^{1−α})
(1 − θ)/(1 − l) = θ (1 − α) / ((1 − αβ) l)
(1 − θ)(1 − αβ) l = θ (1 − α)(1 − l)
((1 − θ)(1 − αβ) + θ (1 − α)) l = θ (1 − α)

which is exactly the constant l from the previous slide.

Finally, the budget constraint holds because of the definition of c_t.

Transition Equation

Since the policy function is linear in logs, we have the transition equation for the model:

[1; log k_{t+1}; z_t] = [1, 0, 0; log(αβ l^{1−α}), α, ρ; 0, 0, ρ] [1; log k_t; z_{t-1}] + [0; 1; 1] ε_t

Note the constant.

Alternative formulations.

Measurement Equation

As observables, we assume log y_t and log i_t, subject to a linearly additive measurement error V_t = (v_{1,t}, v_{2,t})'.

Let V_t ~ N(0, Λ), where Λ is a diagonal matrix with σ_1² and σ_2² as diagonal elements.

Why measurement error? Stochastic singularity.

Then, since with δ = 1 we have i_t = k_{t+1} and y_t = k_{t+1}/(αβ):

[log y_t; log i_t] = [−log(αβ), 1, 0; 0, 1, 0] [1; log k_{t+1}; z_t] + [v_{1,t}; v_{2,t}]

The Solution to the Model in State Space Form

x_t = [1; log k_t; z_{t-1}],  z_t = [log y_t; log i_t]

F = [1, 0, 0; log(αβ l^{1−α}), α, ρ; 0, 0, ρ],  G = [0; 1; 1]

H' = [−log(αβ), 1, 0; 0, 1, 0]

Q = σ²,  R = Λ
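As a sketch (using benchmark-style values α = 0.4, β = 0.99, ρ = 0.95, θ = 0.357 purely for illustration), the transition matrices can be assembled and checked against the exact policy function in logs:

```python
import numpy as np

alpha, beta, rho, theta = 0.4, 0.99, 0.95, 0.357
# constant labor from the delta = 1 special case
l = theta * (1 - alpha) / ((1 - theta) * (1 - alpha * beta) + theta * (1 - alpha))
mu = np.log(alpha * beta * l ** (1 - alpha))

F = np.array([[1.0,  0.0,   0.0],
              [mu,   alpha, rho],
              [0.0,  0.0,   rho]])
G = np.array([[0.0], [1.0], [1.0]])

# One transition step reproduces k_{t+1} = alpha*beta*e^{z_t} k_t^alpha l^{1-alpha} in logs:
logk, zlag, eps = np.log(2.0), 0.1, 0.02
x_next = F @ np.array([1.0, logk, zlag]) + G[:, 0] * eps
z = rho * zlag + eps
assert np.isclose(x_next[1],
                  np.log(alpha * beta) + z + alpha * logk + (1 - alpha) * np.log(l))
```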

The Solution to the Model in State Space Form III

Now, using z^T, F, G, H, Q, and R as defined in the last slide...

...we can use the Riccati equations to compute the likelihood function of the model:

log ℓ(z^T | F, G, H, Q, R)

Cross-equation restrictions implied by the equilibrium solution.

With the likelihood, we can do inference!

What do we Do if δ ≠ 1?

We have two options:

First, we could linearize or log-linearize the model and apply the Kalman filter.

Second, we could compute the likelihood function of the model using a non-linear filter (particle filter).

Advantages and disadvantages.

Fernández-Villaverde, Rubio-Ramírez, and Santos (2005).

The Kalman Filter and linearized DSGE Models

We linearize (or log-linearize) around the steady state.

We assume that we have data on log output (log y_t), log hours (log l_t), and log consumption (log c_t), subject to a linearly additive measurement error V_t = (v_{1,t}, v_{2,t}, v_{3,t})'.

We need to write the model in state space form. Remember that:

k̂_{t+1} = P k̂_t + Q z_t

and

l̂_t = R k̂_t + S z_t

Writing the Likelihood Function I

The transition equation:

[1; k̂_{t+1}; z_{t+1}] = [1, 0, 0; 0, P, Q; 0, 0, ρ] [1; k̂_t; z_t] + [0; 0; 1] ε_{t+1}

The measurement equation requires some care.

Writing the Likelihood Function II

Notice that ŷ_t = z_t + α k̂_t + (1 − α) l̂_t.

Therefore, using l̂_t = R k̂_t + S z_t:

ŷ_t = z_t + α k̂_t + (1 − α)(R k̂_t + S z_t) = (α + (1 − α)R) k̂_t + (1 + (1 − α)S) z_t

Also, since ĉ_t = z_t + α k̂_t − (α + l/(1 − l)) l̂_t, using again l̂_t = R k̂_t + S z_t:

ĉ_t = z_t + α k̂_t − (α + l/(1 − l))(R k̂_t + S z_t)
    = (α − (α + l/(1 − l))R) k̂_t + (1 − (α + l/(1 − l))S) z_t

Writing the Likelihood Function III

Therefore the measurement equation is:

[log y_t; log l_t; log c_t] =
[log y, α + (1 − α)R, 1 + (1 − α)S;
 log l, R, S;
 log c, α − (α + l/(1 − l))R, 1 − (α + l/(1 − l))S] [1; k̂_t; z_t] + [v_{1,t}; v_{2,t}; v_{3,t}]

The Likelihood Function of a General Dynamic Equilibrium Economy

Transition equation:
S_t = f(S_{t-1}, W_t; γ)

Measurement equation:
Y_t = g(S_t, V_t; γ)

Interpretation.

Some Assumptions

1. We can partition {W_t} into two independent sequences {W_{1,t}} and {W_{2,t}}, s.t. W_t = (W_{1,t}, W_{2,t}) and dim(W_{2,t}) + dim(V_t) ≥ dim(Y_t).

2. We can always evaluate the conditional densities p(y_t | W_1^t, y^{t-1}, S_0; γ). Lubik and Schorfheide (2003).

3. The model assigns positive probability to the data.

Our Goal: Likelihood Function

Evaluate the likelihood function of a sequence of realizations of the observable y^T at a particular parameter value γ:

p(y^T; γ)

We factorize it as:

p(y^T; γ) = Π_{t=1}^T p(y_t | y^{t-1}; γ)
= Π_{t=1}^T ∫∫ p(y_t | W_1^t, y^{t-1}, S_0; γ) p(W_1^t, S_0 | y^{t-1}; γ) dW_1^t dS_0

A Law of Large Numbers

If {{s_0^{t|t-1,i}, w_1^{t|t-1,i}}_{i=1}^N}_{t=1}^T are N i.i.d. draws from {p(W_1^t, S_0 | y^{t-1}; γ)}_{t=1}^T, then:

p(y^T; γ) ≃ Π_{t=1}^T (1/N) Σ_{i=1}^N p(y_t | w_1^{t|t-1,i}, y^{t-1}, s_0^{t|t-1,i}; γ)

...thus

The problem of evaluating the likelihood is equivalent to the problem of drawing from {p(W_1^t, S_0 | y^{t-1}; γ)}_{t=1}^T.

Introducing Particles

{s_0^{t-1,i}, w_1^{t-1,i}}_{i=1}^N are N i.i.d. draws from p(W_1^{t-1}, S_0 | y^{t-1}; γ).

Each (s_0^{t-1,i}, w_1^{t-1,i}) is a particle and {s_0^{t-1,i}, w_1^{t-1,i}}_{i=1}^N a swarm of particles.

{s_0^{t|t-1,i}, w_1^{t|t-1,i}}_{i=1}^N are N i.i.d. draws from p(W_1^t, S_0 | y^{t-1}; γ).

Each (s_0^{t|t-1,i}, w_1^{t|t-1,i}) is a proposed particle and {s_0^{t|t-1,i}, w_1^{t|t-1,i}}_{i=1}^N a swarm of proposed particles.

... and Weights

q_t^i = p(y_t | w_1^{t|t-1,i}, y^{t-1}, s_0^{t|t-1,i}; γ) / Σ_{i=1}^N p(y_t | w_1^{t|t-1,i}, y^{t-1}, s_0^{t|t-1,i}; γ)

A Proposition

Let {s̃_0^i, w̃_1^i}_{i=1}^N be a draw with replacement from {s_0^{t|t-1,i}, w_1^{t|t-1,i}}_{i=1}^N with probabilities q_t^i. Then {s̃_0^i, w̃_1^i}_{i=1}^N is a draw from p(W_1^t, S_0 | y^t; γ).

Importance of the Proposition

1. It shows how a draw {s_0^{t|t-1,i}, w_1^{t|t-1,i}}_{i=1}^N from p(W_1^t, S_0 | y^{t-1}; γ) can be used to draw {s_0^{t,i}, w_1^{t,i}}_{i=1}^N from p(W_1^t, S_0 | y^t; γ).

2. With a draw {s_0^{t,i}, w_1^{t,i}}_{i=1}^N from p(W_1^t, S_0 | y^t; γ) we can use p(W_{1,t+1}; γ) to get a draw {s_0^{t+1|t,i}, w_1^{t+1|t,i}}_{i=1}^N and iterate the procedure.

Sequential Monte Carlo I: Filtering

Step 0, Initialization: Set t = 1 and initialize p(W_1^{t-1}, S_0 | y^{t-1}; γ) = p(S_0; γ).

Step 1, Prediction: Sample N values {s_0^{t|t-1,i}, w_1^{t|t-1,i}}_{i=1}^N from the density p(W_1^t, S_0 | y^{t-1}; γ) = p(W_{1,t}; γ) p(W_1^{t-1}, S_0 | y^{t-1}; γ).

Step 2, Weighting: Assign to each draw (s_0^{t|t-1,i}, w_1^{t|t-1,i}) the weight q_t^i.

Step 3, Sampling: Draw {s_0^{t,i}, w_1^{t,i}}_{i=1}^N with replacement from {s_0^{t|t-1,i}, w_1^{t|t-1,i}}_{i=1}^N with probabilities {q_t^i}_{i=1}^N. If t < T, set t = t + 1 and go to step 1. Otherwise stop.
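The three steps can be sketched as a generic loop (illustrative; `transition` and `loglik_obs` are user-supplied model functions, and the loop also accumulates the likelihood estimate):

```python
import numpy as np

def particle_filter(y, transition, loglik_obs, init_particles, rng):
    """Bootstrap particle filter: prediction, weighting, resampling."""
    particles = init_particles.copy()
    N = len(particles)
    loglik = 0.0
    for yt in y:
        particles = transition(particles, rng)          # Step 1: prediction
        logw = loglik_obs(yt, particles)                # Step 2: weighting
        m = logw.max()                                  # log-sum-exp for stability
        w = np.exp(logw - m)
        loglik += m + np.log(w.mean())                  # contribution to p(y^T)
        idx = rng.choice(N, size=N, p=w / w.sum())      # Step 3: resampling
        particles = particles[idx]
    return loglik, particles
```

For a model with an i.i.d. Gaussian state and Gaussian measurement error, the likelihood estimate converges to the exact Gaussian likelihood as N grows.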

Sequential Monte Carlo II: Likelihood

Use {{s_0^{t|t-1,i}, w_1^{t|t-1,i}}_{i=1}^N}_{t=1}^T to compute:

p(y^T; γ) ≃ Π_{t=1}^T (1/N) Σ_{i=1}^N p(y_t | w_1^{t|t-1,i}, y^{t-1}, s_0^{t|t-1,i}; γ)

A Trivial Application

How do we evaluate the likelihood function p(y^T | α, β, σ) of the nonlinear, nonnormal process:

s_t = α + β s_{t-1}/(1 + s_{t-1}) + w_t
y_t = s_t + v_t

where w_t ~ N(0, σ) and v_t ~ t(2), given some observables y^T = {y_t}_{t=1}^T and s_0?

1. Let s_0^{0,i} = s_0 for all i.

2. Generate N i.i.d. draws {s_0^{1|0,i}, w^{1|0,i}}_{i=1}^N from N(0, σ).

3. Evaluate p(y_1 | w^{1|0,i}, y^0, s_0^{1|0,i}) = p_{t(2)}(y_1 − α − β s_0^{1|0,i}/(1 + s_0^{1|0,i}) − w^{1|0,i}).

4. Evaluate the relative weights:

q_1^i = p_{t(2)}(y_1 − α − β s_0^{1|0,i}/(1 + s_0^{1|0,i}) − w^{1|0,i}) / Σ_{i=1}^N p_{t(2)}(y_1 − α − β s_0^{1|0,i}/(1 + s_0^{1|0,i}) − w^{1|0,i})

5. Resample with replacement N values of {s_0^{1|0,i}, w^{1|0,i}}_{i=1}^N with relative weights q_1^i. Call those sampled values {s_0^{1,i}, w^{1,i}}_{i=1}^N.

6. Go to step 2 and iterate until the end of the sample.
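The six steps above, as code (a sketch; `t2_logpdf` is the Student-t density with 2 degrees of freedom written out by hand, and the parameter values in any call are illustrative):

```python
import numpy as np

def t2_logpdf(x):
    # log density of a Student-t with 2 degrees of freedom:
    # f(x) = (1 / (2*sqrt(2))) * (1 + x^2/2)^(-3/2)
    return np.log(1.0 / (2.0 * np.sqrt(2.0))) - 1.5 * np.log1p(x ** 2 / 2.0)

def toy_particle_filter(y, alpha, beta, sigma, s0, N, seed=0):
    rng = np.random.default_rng(seed)
    s = np.full(N, float(s0))                 # step 1: s_0^i = s_0 for all i
    loglik = 0.0
    for yt in y:
        w = sigma * rng.standard_normal(N)    # step 2: draw w_t^i from N(0, sigma)
        s = alpha + beta * s / (1.0 + s) + w  # propagate the state
        logq = t2_logpdf(yt - s)              # steps 3-4: evaluate the weights
        m = logq.max()
        q = np.exp(logq - m)
        loglik += m + np.log(q.mean())        # running likelihood estimate
        s = s[rng.choice(N, size=N, p=q / q.sum())]   # step 5: resample
    return loglik
```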

A Law of Large Numbers

A law of large numbers delivers:

p(y_1 | y^0, α, β, σ) ≃ (1/N) Σ_{i=1}^N p(y_1 | w^{1|0,i}, y^0, s_0^{1|0,i})

and consequently:

p(y^T | α, β, σ) ≃ Π_{t=1}^T (1/N) Σ_{i=1}^N p(y_t | w^{t|t-1,i}, y^{t-1}, s_0^{t|t-1,i})

Comparison with Alternative Schemes

Deterministic algorithms: Extended Kalman Filter and derivations (Jazwinski, 1973), Gaussian sum approximations (Alspach and Sorenson, 1972), grid-based filters (Bucy and Senne, 1974), Jacobian of the transform (Miranda and Rui, 1997). Tanizaki (1996).

Simulation algorithms: Kitagawa (1987), Gordon, Salmond and Smith (1993), Mariano and Tanizaki (1995) and Geweke and Tanizaki (1999).

A Real Application: the Stochastic Neoclassical Growth Model

Standard model.

Isn't the model nearly linear?

Yes, but:
1. Better to begin with something easy.
2. We will learn something nevertheless.

The Model

Representative agent with utility function U = E_0 Σ_{t=0}^∞ β^t (c_t^θ (1 − l_t)^{1−θ})^{1−τ} / (1 − τ).

One good produced according to y_t = e^{z_t} A k_t^α l_t^{1−α} with α ∈ (0, 1).

Productivity evolves z_t = ρ z_{t-1} + ε_t, |ρ| < 1 and ε_t ~ N(0, σ_ε).

Law of motion for capital: k_{t+1} = i_t + (1 − δ) k_t.

Resource constraint: c_t + i_t = y_t.

Solve for c(·, ·) and l(·, ·) given initial conditions.

Characterized by:

U_c(t) = β E_t [ U_c(t+1) (1 + α A e^{z_{t+1}} k_{t+1}^{α−1} l(k_{t+1}, z_{t+1})^{1−α} − δ) ]

(1 − θ)/θ · c(k_t, z_t)/(1 − l(k_t, z_t)) = (1 − α) e^{z_t} A k_t^α l(k_t, z_t)^{−α}

A system of functional equations with no known analytical solution.

Solving the Model

We need to use a numerical method to solve it.

Different nonlinear approximations: value function iteration, perturbation, projection methods.

We use a Finite Element Method. Why? Aruoba, Fernández-Villaverde and Rubio-Ramírez (2003):

1. Speed: sparse system.
2. Accuracy: flexible grid generation.
3. Scalable.

Building the Likelihood Function

Time series:
1. Quarterly real output, hours worked and investment.
2. Main series from the model and keep dimensionality low.

Measurement error. Why?

γ = (θ, ρ, τ, α, δ, β, σ_ε, σ_1, σ_2, σ_3)

State Space Representation

Let λ_t = tanh(z_t). Then, with S_t = (k_t, λ_t):

k_t = f_1(S_{t-1}, W_t; γ)
= e^{tanh^{-1}(λ_{t-1})} k_{t-1}^α l(k_{t-1}, tanh^{-1}(λ_{t-1}); γ)^{1−α} + (1 − δ) k_{t-1} − c(k_{t-1}, tanh^{-1}(λ_{t-1}); γ)

λ_t = f_2(S_{t-1}, W_t; γ) = tanh(ρ tanh^{-1}(λ_{t-1}) + ε_t)

gdp_t = g_1(S_t, V_t; γ) = e^{tanh^{-1}(λ_t)} k_t^α l(k_t, tanh^{-1}(λ_t); γ)^{1−α} + V_{1,t}

hours_t = g_2(S_t, V_t; γ) = l(k_t, tanh^{-1}(λ_t); γ) + V_{2,t}

inv_t = g_3(S_t, V_t; γ) = e^{tanh^{-1}(λ_t)} k_t^α l(k_t, tanh^{-1}(λ_t); γ)^{1−α} − c(k_t, tanh^{-1}(λ_t); γ) + V_{3,t}

Likelihood Function

Since our measurement equation implies that:

p(y_t | S_t; γ) = (2π)^{-3/2} |Λ|^{-1/2} e^{-ω(S_t; γ)/2}

where ω(S_t; γ) = (y_t − x(S_t; γ))' Λ^{-1} (y_t − x(S_t; γ)) for all t, we have:

p(y^T; γ) = (2π)^{-3T/2} |Λ|^{-T/2} Π_{t=1}^T ∫ e^{-ω(S_t; γ)/2} p(S_t | y^{t-1}, S_0; γ) dS_t
≃ (2π)^{-3T/2} |Λ|^{-T/2} Π_{t=1}^T (1/N) Σ_{i=1}^N e^{-ω(s_t^i; γ)/2}

Priors for the Parameters

Priors for the Parameters of the Model

Parameter   Distribution   Hyperparameters
θ           Uniform        0, 1
ρ           Uniform        0, 1
τ           Uniform        0, 100
α           Uniform        0, 1
δ           Uniform        0, 0.05
β           Uniform        0.75, 1
σ_ε         Uniform        0, 0.1
σ_1         Uniform        0, 0.1
σ_2         Uniform        0, 0.1
σ_3         Uniform        0, 0.1

Likelihood-Based Inference I: a Bayesian Perspective

Define priors over parameters: truncated uniforms.

Use a random-walk Metropolis-Hastings to draw from the posterior.

Find the marginal likelihood.

Likelihood-Based Inference II: a Maximum Likelihood Perspective

We only need to maximize the likelihood.

Difficulties in maximizing with Newton-type schemes.

Common problem in dynamic equilibrium economies.

We use a simulated annealing scheme.

An Exercise with Artificial Data

First simulate data with our model and use that data as sample.

Pick true parameter values: benchmark calibration values for the stochastic neoclassical growth model (Cooley and Prescott, 1995).

Calibrated Parameters
Parameter   θ       ρ      τ     α     δ      β      σ_ε     σ_1       σ_2      σ_3
Value       0.357   0.95   2.0   0.4   0.02   0.99   0.007   1.58e-4   0.0011   8.66e-4

Sensitivity: τ = 50 and σ_ε = 0.035.

Figure 5.1: Likelihood Function, Benchmark Calibration. [Figure: likelihood cuts at each parameter, nonlinear versus linear filter, with pseudotrue values marked.]

Figure 5.2: Posterior Distribution, Benchmark Calibration. [Figure: posterior histograms for each parameter.]

Figure 5.3: Likelihood Function, Extreme Calibration. [Figure: likelihood cuts at each parameter, nonlinear versus linear filter, with pseudotrue values marked.]

Figure 5.4: Posterior Distribution, Extreme Calibration. [Figure: posterior histograms for each parameter.]

Figure 5.5: Convergence of Posteriors, Extreme Calibration. [Figure: evolution of the posterior draws for each parameter along the chain.]

Figure 5.6: Posterior Distribution, Real Data. [Figure: posterior histograms for each parameter.]

Figure 6.1: Likelihood Function. [Figure: transversal cuts of the likelihood, comparing the exact evaluation with particle filter evaluations using 100, 1000, and 10000 particles.]

Figure 6.2: C.D.F., Benchmark Calibration. [Figure: empirical c.d.f.s of the likelihood estimates for 10000 through 60000 particles.]

Figure 6.3: C.D.F., Extreme Calibration. [Figure: empirical c.d.f.s of the likelihood estimates for 10000 through 60000 particles.]

Figure 6.4: C.D.F., Real Data. [Figure: empirical c.d.f.s of the likelihood estimates for 10000 through 60000 particles.]

Posterior Distributions, Benchmark Calibration

Parameter   Mean      s.d.
θ           0.357     6.72e-5
ρ           0.950     3.40e-4
τ           2.000     6.78e-4
α           0.400     8.60e-5
δ           0.020     1.34e-5
β           0.989     1.54e-5
σ_ε         0.007     9.29e-6
σ_1         1.58e-4   5.75e-8
σ_2         1.12e-3   6.44e-7
σ_3         8.64e-4   6.49e-7

Maximum Likelihood Estimates, Benchmark Calibration

Parameter   MLE       s.d.
θ           0.357     8.19e-6
ρ           0.950     0.001
τ           2.000     0.020
α           0.400     2.02e-6
δ           0.020     0.002
β           0.990     2.07e-5
σ_ε         0.007     1.00e-6
σ_1         1.58e-4   0.004
σ_2         1.12e-3   0.007
σ_3         8.63e-4   0.005

Posterior Distributions, Extreme Calibration

Parameter   Mean      s.d.
θ           0.357     7.19e-4
ρ           0.950     1.88e-4
τ           50.00     7.12e-3
α           0.400     4.80e-5
δ           0.020     3.52e-6
β           0.989     8.69e-6
σ_ε         0.035     4.47e-6
σ_1         1.58e-4   1.87e-8
σ_2         1.12e-3   2.14e-7
σ_3         8.65e-4   2.33e-7

Maximum Likelihood Estimates, Extreme Calibration

Parameter   MLE       s.d.
θ           0.357     2.42e-6
ρ           0.950     6.12e-3
τ           50.000    0.022
α           0.400     3.62e-7
δ           0.019     7.43e-6
β           0.990     1.00e-5
σ_ε         0.035     0.015
σ_1         1.58e-4   0.017
σ_2         1.12e-3   0.014
σ_3         8.66e-4   0.023

Convergence on Number of Particles

Convergence, Real Data
N       Mean       s.d.
10000   1014.558   0.3296
20000   1014.600   0.2595
30000   1014.653   0.1829
40000   1014.666   0.1604
50000   1014.688   0.1465
60000   1014.664   0.1347

Posterior Distributions, Real Data

Parameter   Mean    s.d.
θ           0.323   7.976e-4
ρ           0.969   0.008
τ           1.825   0.011
α           0.388   0.001
δ           0.006   3.557e-5
β           0.997   9.221e-5
σ_ε         0.023   2.702e-4
σ_1         0.039   5.346e-4
σ_2         0.018   4.723e-4
σ_3         0.034   6.300e-4

Maximum Likelihood Estimates, Real Data

Parameter   MLE     s.d.
θ           0.390   0.044
ρ           0.987   0.708
τ           1.781   1.398
α           0.324   0.019
δ           0.006   0.160
β           0.997   8.67e-3
σ_ε         0.023   0.224
σ_1         0.038   0.060
σ_2         0.016   0.061
σ_3         0.035   0.076

Log Marginal Likelihood Difference: Nonlinear minus Linear

p     Benchmark Calibration   Extreme Calibration   Real Data
0.1   73.631                  117.608               93.65
0.5   73.627                  117.592               93.55
0.9   73.603                  117.564               93.55

Nonlinear versus Linear Moments, Real Data

           Real Data        Nonlinear (SMC filter)   Linear (Kalman filter)
           Mean    s.d.     Mean    s.d.             Mean    s.d.
output     1.95    0.073    1.91    0.129            1.61    0.068
hours      0.36    0.014    0.36    0.023            0.34    0.004
inv        0.42    0.066    0.44    0.073            0.28    0.044

A Future Application: Good Luck or Good Policy?

The U.S. economy has become less volatile over the last 20 years (Stock and Watson, 2002).

Why?
1. Good luck: Sims (1999), Bernanke and Mihov (1998a and 1998b) and Stock and Watson (2002).
2. Good policy: Clarida, Gertler and Galí (2000), Cogley and Sargent (2001 and 2003), De Long (1997) and Romer and Romer (2002).
3. Long run trend: Blanchard and Simon (2001).

How Has the Literature Addressed this Question?

So far: mostly with reduced form models (usually VARs).

But:
1. Results difficult to interpret.
2. How to run counterfactuals?
3. Welfare analysis.

Why Not a Dynamic Equilibrium Model?

New generation equilibrium models: Christiano, Eichenbaum and Evans (2003) and Smets and Wouters (2003).

Linear and Normal.

But we can do it!

Environment

Discrete time t = 0, 1, ...

Stochastic process s_t ∈ S with history s^t = (s_0, ..., s_t) and probability μ(s^t).

The Final Good Producer

Perfectly competitive final good producer that solves:

max_{y_i(s^t)}  y(s^t) − ∫_0^1 p_i(s^t) y_i(s^t) di

where y(s^t) = (∫_0^1 y_i(s^t)^{1/(1+η)} di)^{1+η}.

Demand function for each input of the form:

y_i(s^t) = (p_i(s^t)/p(s^t))^{-(1+η)/η} y(s^t)

with price aggregator:

p(s^t) = (∫_0^1 p_i(s^t)^{-1/η} di)^{-η}

The Intermediate Good Producer

Continuum of intermediate good producers, each one behaving as a monopolistic competitor.

The producer of good i has access to the technology:

y_i(s^t) = max{ e^{z(s^t)} k_i(s^{t-1})^α l_i(s^t)^{1−α} − φ, 0 }

where φ is a fixed cost of production.

Productivity z(s^t) = ρ z(s^{t-1}) + ε_z(s^t).

Calvo pricing with indexing. Probability of changing prices (before observing current period shocks): 1 − θ_p.

Consumer's Problem

max E Σ_{t=0}^∞ Σ_{s^t} β^t [ (c(s^t) − d c(s^{t-1}))^{1−γ_c}/(1−γ_c) − ψ_l l(s^t)^{1+γ_l}/(1+γ_l) + ψ_m m(s^t)^{1−γ_m}/(1−γ_m) ]

subject to the budget constraint:

p(s^t)(c(s^t) + x(s^t)) + M(s^t) + ∫_{s^{t+1}} q(s^{t+1}|s^t) B(s^{t+1}) ds^{t+1}
= p(s^t)(w(s^t) l(s^t) + r(s^t) k(s^{t-1})) + M(s^{t-1}) + B(s^t) + Π(s^t) + T(s^t)

and the law of motion for capital with investment adjustment costs:

k(s^t) = (1 − δ) k(s^{t-1}) + (1 − V(x(s^t)/x(s^{t-1}))) x(s^t)

Government Policy

Monetary policy: Taylor rule

i(s^t) = r π_g(s^t) + a(s^t)(π(s^t) − π_g(s^t)) + b(s^t)(y(s^t) − y_g(s^t)) + ε_i(s^t)

π_g(s^t) = π_g(s^{t-1}) + ε_π(s^t)
a(s^t) = a(s^{t-1}) + ε_a(s^t)
b(s^t) = b(s^{t-1}) + ε_b(s^t)

Fiscal policy.

Stochastic Volatility I

We can stack all shocks in one vector:

ε(s^t) = (ε_z(s^t), ε_c(s^t), ε_l(s^t), ε_m(s^t), ε_i(s^t), ε_π(s^t), ε_a(s^t), ε_b(s^t))'

Stochastic volatility:

ε(s^t) = R(s^t)^{0.5} ϑ(s^t)

The matrix R(s^t) can be decomposed as:

R(s^t) = G(s^t)^{-1} H(s^t) G(s^t)^{-1}'

Stochastic Volatility II

H(s^t) (instantaneous shock variances) is diagonal with nonzero elements h_i(s^t) that evolve:

log h_i(s^t) = log h_i(s^{t-1}) + ς_i η_i(s^t)

G(s^t) (loading matrix) is lower triangular, with unit entries in the diagonal and entries ω_ij(s^t) that evolve:

ω_ij(s^t) = ω_ij(s^{t-1}) + ξ_ij η_ij(s^t)
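A minimal simulation of these two recursions (a sketch; every numeric value here is assumed for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
T, n = 200, 3
varsigma = 0.05                          # std of the volatility innovations (assumed)

# log h_i follows a driftless random walk:
log_h = np.zeros((T, n))
for t in range(1, T):
    log_h[t] = log_h[t - 1] + varsigma * rng.standard_normal(n)

# Build the period-T covariance R = G^{-1} H G^{-1}' with assumed loadings:
H = np.diag(np.exp(log_h[-1]))
Gm = np.eye(n)                           # unit lower triangular loading matrix
Gm[1, 0], Gm[2, 0], Gm[2, 1] = 0.2, -0.1, 0.3
Ginv = np.linalg.inv(Gm)
R = Ginv @ H @ Ginv.T
print(np.allclose(R, R.T))               # a valid symmetric covariance matrix
```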

Where Are We Now?

Solving the model: a problem with 45 state variables: physical capital, the aggregate price level, the 7 shocks, the 8 elements of the matrix H(s^t), and the 28 elements of the matrix G(s^t).

Perturbation.

We are making good progress.
