1.1
The term rational expectations is most closely associated with Nobel Laureate Robert Lucas of the University of Chicago, but the question of rationality
of expectations came into the place before Lucas investigated the issue (see
Muth [1960] or Muth [1961]). The most basic interpretation of rational expectations is usually summarized by the following statement:
Individuals do not make systematic errors in forming their expectations; expectations errors are corrected immediately, so that
on average expectations are correct.
But rational expectation is a bit more subtil concept that may be defined in
3 ways.
Definition 1 (Broad definition) Rational expectations are such that individuals formulate their expectations in an optimal way, which is actually com1
probability distributions of the shocks that hit the economy that is what is
needed to compute all the moments (average, standard deviations, covariances
. . . ) which are needed to compute expectations. In other words, and this is
precisely what makes rational expectations so attractive:
Expectations should be consistent with the model
= Solving the model is finding an expectation function.
Notation: Hereafter, we will essentially deal with markovian models, and will
work with the following notation:
Eti (xt ) = E(xt |ti )
where ti = {xk ; k = 0 . . . t i}.
The weak definition of rational expectations satisfies two vary important properties.
Proposition 1 Rational Expectations do not exhibit any bias: Let x
bt = xt xet
denote the expectation error:
Et1 (b
xt ) = 0
which essentially corresponds to the fact that individuals do not make systematic errors in forming their expectations.
Proposition 2 Expectation errors do not exhibit any serial correlation:
Covt1 (b
xt , x
bt1 ) = Et1 (b
xt x
bt1 ) Et1 (b
xt )Et1 (b
xt1 )
= Et1 (b
xt )b
xt1 Et1 (b
xt )b
xt1
= 0
Example 1 Lets consider the following AR(2) process
xt = 1 xt1 + 2 xt2 + t
such that the roots lies outside the unit circle and t is the innovation of the
process.
Beyond, the example reveals a very important property of rational expectations: a rational expectation model is not a model in which the individual knows everything. Everything depends on the information structure. Lets consider some simple examples.
Example 2 (signal extraction) In this example, we will deal with a situation where the agents know the model but do not perfectly observe the shocks
they face. Information is therefore incomplete because the agents do not know
perfectly the distribution of the true shocks.
Assume that a firm wants to predict the demand, d, it will be addressed, but
only observes a random variable x that is related to d as
x=d+
(1.1)
E(d 0 1 x|1) = 0
E(d 0 1 x|x) = 0
These are the two normal equation associated with an OLS estimate, hence we
have
1 =
2
Cov(x, d)
Cov(d + , d)
=
= 2 d 2
V(x)
V(d + )
d +
and
0 =
d2 + 2
i ti
i=0
Y +
X
i=0
!
i t+1i
Since Y is a deterministic constant, E Y = Y , such that
E(Yt+1 |) = Y +
i E(t+1i |)
i=0
k
X
i=0
i+1 ti
k
X
i=0
i+1 ti
pt
1.2
1.2.1
(1.2)
1
<1
1+r
(1.3)
e
where t+1
denotes expected inflation
e
t+1
Et (Pt+1 ) Pt
Pt
(1.4)
Taking logs lowercases will denote logged variables using the approximation log(1 + x) x and reorganizing, we end up with
pt = aEt (pt+1 ) + (1 a)mt where a =
Monopolistic competition
1+
demand
pt = yt Et yt+1
(1.5)
the term in yt accounts for the fact that the greater the greater the price
is, the lower the demand is. The term in Et yt+1 accounts for the fact that
greater expected sells tend to lower the price.1 The firm acts as a monopolist
maximizing its profit
max pt yt ct yt
yt
taking the demand (1.5) into account. ct is the marginal cost, which is assumed to follow an exogenous stochastic process. Note that we assume, for the
moment, that the firm adopts a purely static behavior. Profit maximization
taking (1.5) into account yields
2yt Et yt+1 ct = 0
which may be rewritten as
yt = aEt (pt+1 ) + bct + d where a =
1
,b=
and d =
2
2
2
If < 0, the model may be given an alternative interpretation. Greater expected sells
lead the firm to raise its price (you may think of goods such as tobacco, alcohol, . . . , each
good that may create addiction).
At this point we are left with the expectational difference equation (1.2),
which may either be solved forward or backward looking depending on
the value of a. When |a| < 1 the solution should be forward looking, as it
will become clear in a moment, conversely, when |a| > 1 the model should be
solved backward. The next section investigates this issue.
1.2.2
The problem that arises with the case |a| < 1 may be understood by looking
at figure 1.1, which reports the dynamics of equation
Et yt+1 =
b
1
yt xt
a
a
(1.6)
Solving this
10
45
yt
y y0
k
X
i=0
11
For the first term to converge, we need the expectation Et (xt+k ) not to increase
at a too fast pace. Then provided that |a| < 1, a sufficient condition for the
first term to converge is that the expectation explodes at a rate lower than
|1/a 1|.3 In the sequel we will assume that this condition holds.
Finally, since |a| < 1, imposing that lim |yt | < holds, we have
t
k+1
lim a
Et (yt+k+1 ) = 0
ai Et (xt+i )
(1.7)
i=0
In other words, yt is given by the discounted sum of all future expected values
of xt . In order to get further insight on the form of the solution, we may be
willing to specify a particular process for xt . We shall assume that it takes
the following AR(1) form:
xt = xt1 + (1 )x + t
where || < 1 for sake of stationarity and t is the innovation of the process.
Note that
Et xt+1 = xt + (1 )x
Et xt+2 = Et xt+1 + (1 )x = 2 xt + (1 )(1 + )x
Et xt+3 = Et xt+2 + (1 )x = 3 xt + (1 )(1 + + 2 )x
..
.
Et xt+i = i xt + (1 )(1 + + 2 + . . . + i )x = i xt + (1 i+1 )x
Therefore, the solution takes the form
X
ai (i xt + (1 i )x)
yt = b
= b
i=0
X
i=0
(a) (xt x) +
X
i=0
ax
xt x
x
+
= b
1 a 1 a
b
ab(1 )
=
x
xt +
1 a
(1 a)(1 a)
12
Stochastic Case
9
8
5.5
7
5
6
5
4.5
4
4
0
50
100
Time
150
200
3
0
50
100
Time
150
200
Note: This example was generated using a = 0.8, b = 1, = 0.95, = 0.1 and x = 1.
Matlab Code: Forward Solution
\simple
%
% Forward solution
%
lg = 100;
T
= [1:long];
a
= 0.8;
b
= 1;
rho = 0.95;
sx = 0.1;
xb = 1;
%
% Deterministic case
%
y=a*b*xb/(1-a);
%
% Stochastic case
%
%
% 1) Simulate the exogenous process
%
x
= zeros(lg,1);
randn(state,1234567890);
13
e
= randn(lg,1)*sx;
x(1) = xb;
for i=2:long;
x(i) = rho*x(i-1)+(1-rho)*xb+e(i);
end
%
% 2) Compute the solution
%
y
= b*x/(1-a*rho)+a*b*(1-rho)*xb/((1-a)*(1-a*rho));
Factorization
The method of factorization was introduced by Sargent [1979]. It amounts to
make use of the forward operator F , introduced in the first chapter.4 In a first
step, equation (1.2) is rewritten in terms of F
yt = aEt yt+1 +bxt Et (yt ) = aEt (yt+1 )+bEt (xt ) (1aF )Et yt = bEt xt
which rewrites as
E t yt = b
Et xt
1 aF
X
1
ai F i
=
1 aF
i=0
Therefore, we have
E t y t = yt = b
ai F i Et xt = b
ai Et xt+i
i=0
i=0
Note that although we get, obviously, the same solution, this method is not
as transparent as the previous one since the terminal condition (1.6) does not
appear explicitly.
Method of undetermined coefficients
This method proceeds by making an initial guess on the form of the solution.
An educated guess for the problem at hand would be
yt =
i Et xt+i
i=0
14
i Et xt+i = aEt
i=0
i Et+1 xt+1+i
i=0
+ bxt
ai Et xt+i
i=0
The problem with such an approach is the we need to make the right guess
from the very beginning. Assume for a while that we had specified the following guess
yt = xt
Then
xt = aEt xt+1 + bxt
Identifying term by terms we would have obtained = b or = 0, which is
obviously a mistake.
As a simple example, let us assume that the process for xt is given by the same
AR(1) process as before. We therefore have to solve the following dynamic
system
Since the system is linear and that xt exhibits a constant term, we guess a
solution of the form
yt = 0 + 1 xt
Plugging this guess in the expectational difference equation, we get
0 + 1 xt = aEt (0 + 1 xt+1 ) + bxt
15
b
1 a
ab(1 )
x
(1 a)(1 a)
1.2.3
Until now, we have only considered the case of a regular economy in which
|a| < 1, which provided we are ready to impose a nonexplosion condition
yields a unique solution that only involves fundamental shocks. In this
section we investigate what happens when we relax the condition |a| < 1
and consider the case |a| > 1. This fundamentally changes the nature of the
solution, as can be seen from figure 1.3. More precisely, any initial condition
y0 for y is admissible as any leads the economy back to its longrun solution
y. The equilibrium is then said to be indeterminate.
From a mathematical point of view, the sum involved in the forward solution
is unlikely to converge. Therefore, the solution should be computed in an
alternative way. Let us recall the expectational difference equation
yt = aEt yt+1 + bxt
5
Note that this is here that we make use of the assumptions on the process for the
exogenous shock.
16
45
y
y0
yt
b
1
yt + xt + t+1
a
a
Since |a| > 1 this equation is stable and the system is fundamentally backward
looking. Note that t+1 is serially uncorrelated, and not necessarily correlated
with the innovations of xt . In other words, this shock may not be a fundamental shock and is alike a sunspot. For example, I wake up in the morning,
look at the weather and decides to consume more. Why? I dont know! This
is purely extrinsic to the economy!
17
Figure 1.4 reports an example of such an economy. We have drawn the solution
to the model for different values of the volatility of the sunspot, using the
same draw. As can be seen, although each solution is perfectly admissible,
the properties of the economy are rather different depending on the volatility
of the sunspot variable. Besides, one may compute the volatility and the first
Figure 1.4: Backward Solution
=0.1
Without sunspot
2.5
2.5
1.5
1.5
0.5
0.5
0
0
50
100
Time
=0.5
150
200
0
0
50
50
100
Time
150
200
100
Time
150
200
0
0
100
Time
=1
150
200
2
0
50
Note: This example was generated using a = 1.8, b = 1, = 0.95, = 0.1 and x = 1.
order autocorrelation of yt :6
y2 =
y (1) =
a2
b2 ( + a)
2
+
2
x
(a2 1)(a )
a2 1
"
#
b2 (a2 1)x2
1
1+ 2
a
b (a + )x2 + a2 (a )2
18
1.2.4
Lets now go back to the forward looking solution. The ways we dealt with it
led us to eliminate any bubble that is we imposed condition (1.6) to bound
the sequence. By doing so, we restricted ourselves to a particular class of
19
solution, but there may exist a wider class of admissible solution that satisfy
(1.2) without being bounded.
Let us now assume that such an alternative solution of the form does exist
yet = yt + bt
where yt is the solution (1.7) and bt is a bubble. In order for yet to be a solution
to (1.2), we need to place some additional assumption on its behavior.
If yet = yt + bt it has to be the case that Et yet+1 = Et yt+1 + Et bt+1 , such that
solution to (1.2). Note that since |a| < 1 in the case of a forward solution,
d
r
20
Bubble
28
2.5
Bubble solution
Fundamental solution
27
2
26
1.5
25
24
10
Time
15
20
1
0
dividend/price
10
Time
15
20
0.039
0.35
0.0385
0.3
0.038
0.25
0.0375
0.2
0.037
0.0365
0
10
Time
15
20
0.15
0
10
Time
15
20
21
r
= 0.04;
%
% Fundamental solution p*
%
p_star = d_star/r;
%
% bubble
%
long
= 20;
T
= [0:long];
b
= (1+r).^T;
p
= p_star+b;
(a)1 bt + t+1
t+1
with probability
with probability 1
with Et t+1 = 0. So defined, the bubble keeps on inflating with probability and bursts with probability (1 ). Lets check that bt = aEt bt+1
bt =
=
=
=
22
Bubble
200
150
Bubble solution
Fundamental solution
150
100
100
50
50
0
0
50
100
Time
150
200
50
0
50
dividend/price
100
Time
150
200
0.06
50
0.05
0.04
0.03
0.02
50
0.01
0
0
50
100
Time
150
200
100
0
50
100
Time
150
200
23
Up to this point we have been dealing with very simple situations where the
problem is either backward looking or forward looking. Unfortunately, such
a case is rather scarce, and most of economic problems such as investment
decisions, pricing decisions . . . are both backward and forward looking. We
examine such situations in the next section.
1.3
(1.8)
24
We are now willing to solve this expectational equation. As before, there exist
many methods.
1.3.1
Let us recall that solving the equation using undetermined coefficients amounts
to formulate a guess for the solution and find some restrictions on the coefficients of the guess such that equation (1.8) is satisfied. An educated guess in
this case is given by
yt = yt1 +
i Et xt+i
i=0
Where does this guess come from? Experience! and this is precisely why
the method of undetermined coefficients, although it may appear particularly
practical in a number of (simple) problems, is not always appealing.
Plugging this guess in equation (1.8) yields
#
"
X
X
i Et+1 xt+1+i + byt1 + cxt
i Et xt+i = aEt yt +
yt1 +
i=0
i=0
= a yt1 +
i Et xt+i
i=0
+ aEt
"
+byt1 + cxt
= (a2 + b)yt1 + a
i Et xt+i + a
i=0
i Et+1 xt+1+i
i Et xt+1+i + cxt
i=0
i=0
(1.9)
0 = a0 + c
(1.10)
i = ai + ai1
i > 1
1
a
(1.11)
25
1. the two solutions lie outside the unit circle: the model is said to be a
source and only one particular point the steady state is a solution
to the equation.
2. One solution lie outside the unit circle and the other one inside: the
model exhibits the saddle path property.
3. The two solutions lie inside the unit circle: the model is said to be a sink
and there is indeterminacy.
Here, we will restrict ourselves to the situation where an extended version of
the condition |a| < 1 we were dealing with in the preceding section holds,
namely one root will be of modulus greater than one and the other less than
one. The model will therefore exhibit the socalled saddle point property, for
which we will provide a geometrical interpretation in a moment. To sum up,
we consider a situation where |1 | < 1 and |2 | > 1. Since we restrict ourselves
to the stationary solution, we necessarily have || < 1 so that = 1 .
Once has been obtained, we can solve for i , i = 0, . . .. 0 is obtained from
(1.10) and takes the value
0 =
c
1 a1
a
i1 =
1 a1
1
a
1
i1
1
X
c
i
2 Et xt+i
1 a1
i=0
Example 4 In the case of an AR(1) process for xt , the solution is straightforward, as all the process may be simplified. Indeed, let us consider the following
problem
26
with t ; N (0, ). An educated guess for the solution of this equation would
be
yt = yt1 + xt
Let us then compute the solution of the problem, that is let us find and
. Plugging the guess for the solution in the expectational difference equation
leads to
yt1 + xt = aEt (yt + xt+1 ) + byt1 + cxt
= a2 yt1 + axt + axt + byt1 + cxt
= (a2 + b)yt1 + (c + a( + ))xt
Therefore, we have to solve the system
= a2 + b
= c + a( + )
Like in the general case, we select the stable root of the first equation 1 , such
that |1 | < 1, and =
c
1a(1 +)
=
=
=
=
=
roots([a -1 b]);
min(mu);
mu(i);
max(mu);
mu(i);
alpha = b/(1-a*(mu1+rho));
%
% Simulation
%
27
6
4
0.5
2
0
2
0.5
0
50
100
Time
150
200
4
0
50
100
Time
150
200
150
200
3
2
0.5
1
0
1
0.5
0
50
100
Time
150
200
2
0
50
100
Time
28
lg
= 200;
randn(state,1234567890);
e
= randn(lg,1)*se;
x
= zeros(lg,1);
y
= zeros(lg,1);
x(1) = 0;
y(1) = alpha*x(1);
for i = 2:lg;
x(i) = rho*x(i-1)+e(i);
y(i) = mu1*y(i-1)+alpha*x(i);
end
Note that contrary to the simple case we considered in the previous section, the
solution does not only inherit the persistence of the shock, but also generates its
own persistence through 1 as can be seen from the first order autocorrelation
(1) =
1.3.2
1 +
1 + 1
Factorization
c
(F 1 )Et yt1 = (F 2 )1 Et xt
a
29
c
1
(1 1
2 F ) Et xt
a2
i
i
2 F
i=0
so that
(F 1 )Et yt1 =
c X i i
c X i
2 F Et xt =
2 Et xt+i
a2
a2
i=0
i=0
Now, applying the leading operator on the left hand side of the equation
and acknowledging that 2 = 1/a 1 , we have
X
c
i
yt = 1 yt1 +
2 Et xt+i
1 a1
i=0
1.3.3
A matricial approach
In this section, we would like to provide you with some geometrical intuition of
what is actually going on when the saddle path property applies in the model.
To do so, we will rely on a matricial approach. First of all, let us recall the
problem we have in hands:
yt = aEt yt+1 + byt1 + cxt
Introducing the technical variable zt defined as
zt+1 = yt
the model may be rewritten as7
1
Et yt+1
yt
c
ab
a
=
xt
zt+1
zt
1
1 0
Remember that Et yt+1 = yt+1 t+1 where t+1 is an iid process which represents the expectation error, therefore, the system rewrites
1
yt
c
1
yt+1
ab
a
xt
t+1
=
zt
1
0
zt+1
1 0
7
In the next section we will actually pool all the equations in a single system, but for
pedagogical purposes let us separate exogenous variables from the rest for a while.
30
In order to understand the saddle path property let us focus on the homogenous part of the equation
1
b
yt+1
yt
yt
a a
=
=W
zt+1
zt
zt
1 0
Provided b 6= 0 the matrix W can be diagonalized and may be rewritten as
W = P DP 1
where D contains the two eigenvalues of W and P the associated eigenvectors.
Figure 1.8 provides a way of thinking about eigenvectors and eigenvalues in
dynamical systems. The figure reports the two eigenvectors, P1 and P2 , associated with the two eigenvalues 1 and 2 of W . 1 is the stable root and
2 is the unstable root. As can be seen from the graph, displacements along
Figure 1.8: Geometrical interpretation of eigenvalues/eigenvectors
z 6
P2
P1
x2
6
x1 R
x2
x1
z
x3
x4
I
x4
x3 )
P1 are convergent, in the sense they shift either x1 or x4 toward the center of
the graph (x1 and x4 ), while displacements along P2 are divergent (shift of x2
and x3 to x2 and x3 ). In fact the eigenvector determines the direction along
31
which the system will evolve and the eigenvalue the speed at which the shift
will take place.
The characteristic equation that gives the eigenvalues, in the case we are studying, is given by
1
b
det(W I) = 0 2 + = 0
a
a
which exactly corresponds to the equations we were dealing with in the previous sections. We will not enter the formal resolution of the model right now,
as we will undertake an extensive treatment in the next section. However, we
will just try to understand what may be going on using a phase diagram like
approach to understand the dynamics. Figures 1.91.11 report the different
possible configuration we may encounter solving this type of model. The first
one is a source (figure 1.9), which is such that no matter the initial condition
we feed the system with except y0 = y , z0 = z the system will explode.
Both y and z will not be bounded. The second one is a sink (figure 1.10), all
trajectories converge back to the steady state of the economy, one is then free
to choose whatever trajectory it wants to go back to the steady state. The
equilibrium is therefore indeterminate.
Figure 1.9: A source
yt
P2
yt+1 = 0
zt+1 = 0
6
I
i
P1
y
R
zt
32
P1
y
P2
yt+1 = 0
zt+1 = 0
z z
-
y
i
i
zt
In the last situation (figure 1.11) this corresponds to the most commonly
encountered situation in economic theory the economy lies on a saddle: one
branch of the saddle converges to the steady state, the other one diverges. The
problem is then to select where to start from. It should be clear to you that in
t, zt is perfectly known as zt = yt1 which was selected in the earlier period.
zt is then said to be predetermined: the agents is endowed with its value when
she enters the period. This is part of the information set. Solving the system
therefore amounts to select a value for yt , given that for zt and the structure
of the model. How to proceed then? Let us assume for a while that at time
0, the economy is endowed with z0 , and assume that we impose the value y01
as a starting value for y. In such a case, the economy will explode: in other
words a solution including a bubble has been selected. If, alternatively, y02 is
selected, then the economy will converge to the steady state (z , y ) and all
the variables will be bounded. In other words, we have selected a trajectory
such that
lim |yt | <
P2
yt+1 = 0
P1
zt+1 = 0
y2
0
1
y0
y
?
z0
zt
1.4
1.4.1
(1.12)
(1.13)
34
tion that actually drives the dynamics of the economy under consideration:8 it
relates future values of states St+1 to current and expected values of variables
of interest, current state variables and shocks to fundamentals Et+1 . In other
words, (1.13) furnishes the transition from one state of the system to another
one. Our problem is then to solve this system.
As a first step, it would be great if we were able to eliminate all variables
defined by the measurement equation and restrict ourselves to a state equation,
as it would bring us back to our initial problem. To do so, we use (1.17) to
eliminate Yt .
1
Yt = Mcc
Mcs St
WE =
1 M
Mss0 Msc0 Mcc
cs
1
1
1 M
Mss1 Msc1 Mcc
cs
M se
We are then back to our expectational difference equation. But it needs additional work. Indeed, Farmer proposes a method that enables us to forget
about expectations when solving for the system. He proposes to replace the
expectation by the actual variable minus the expectation error
Et St+1 = St+1 Zt+1
where Et Zt+1 = 0. Then the system rewrites
St+1 = WS St + WE Et+1 + Zt+1
(1.14)
1.4.2
? have shown that the existence and uniqueness of a solution depends fundamentally on the position of the eigenvalues of WS relative to the unit circle.
Denoting by NB and NF the number of, respectively, predetermined and jump
variables, and by NI and NO the number of eigenvalues that lie inside and
outside the unit circle, we have the following proposition.
Proposition 4
(i) If NI = NB and NO = NF , then there exists a unique solution path for
the rational expectation model that converges to the steady state;
(ii) If NI > NB (and NO < NF ), then the system displays indeterminacy;
(iii) If NI > NB (and NO > NF ), then the system is a source.
Hereafter we will deal with the two first situations, the last one being never
studied in economics.
The diagonalization of WS leads to
WS = P D P 1
where D is the matrix that contains the eigenvalues of WS on its diagonal and
P is the matrix that contains the associated eigenvectors. For convenience,
we assume that both D and P are such that eigenvalues are sorted in the
ascending order. We shall then consider two cases
1. The model satisfies the saddle path property (NI = NB and NO = NF )
2. The model exhibit indeterminacy (NI > NB and NO < NF )
The saddle path
In this section, we consider the case were the model satisfies the saddle path
property (NI = NB and NO = NF ). For convenience, we consider the following
partitioning of the matrices
DB 0
PBB PBF
PBB PBF
1
D=
, P =
, P =
0 DF
PF B PF F
PF B PF F
36
This partition conforms the position of the eigenvalues relative to the unit
circle. For instance, a B stands for the set of eigenvalues that lie within the
unit circle, whereas B stands for the set of eigenvalues that lie out of it.
We then apply the following modification to the system in order to make it
diagonal:
Set = P 1 St
so that
SeB,t
SeF,t
SeB,t
SeF,t
!
!
RB.
RF.
Et+1 +
PB.
PF.
Zt+1
have
PF B SB,t + PF F SF,t = 0
This condition expresses the relationship that relates the jump variables to the
predetermined variables, and therefore defined the initial condition SF,t which
is compatible with (i) the initial conditions on the predetermined variables
and (ii) the stationarity of the solution:
SF,t = (PF F )1 PF B SB,t = SB,t
Plugging this result in the law of motion of backward variables we have
SB,t+1 = (WBB + WBF )SB,t + RB Et+1 + ZB,t+1
but by definition, no expectation error may be done when predicting a predetermined variable, such that ZBt+1 = 0. Hence, the solution of the problem is
given by
SB,t+1 = MSS SB,t + MSE Et+1
(1.15)
cc
(1.16)
Yt = SB,t
(1.17)
SF,t = SB,t
(1.18)
38
1.5
In this section we present a method to solve for multivariate rational expectations models, a because there are many of them (almost as many as authors
that deal with this problem).9 The one we present was introduced by Sims
[2000] and recently revisited by Lubik and Schorfheide [2003]. It has the advantage of being general and explicitly dealing with expectation errors. This
latter property makes it particularly suitable for solving sunspot equilibria.
1.5.1
ues from a system which is not invertible. One way to think of this approach is
to remember that when we compute the eigenvalues of a diagonalizable matrix
A, we want to find a number and an associated eigenvector V such that
(A I)V = 0
The generalized Schur decomposition of two matrices A and B attempts to
compute something similar, but rather than considering (AI), the problem
considers (A B). A more formal, and above all a more rigorous
statement of the Schur decomposition is given by the following definitions and
theorem.
Definition 4 Let P C Cnn be a matrixvalued function of a complex
variable (a matrix pencil). Then the set of its generalized eigenvalues (P ) is
defined as
(P ) = {z C : |P (z) = 0}
When P (z) writes as Az B, we denote this set as (A, B). Then there exists
a vector V such that BV = AV .
Definition 5 Let P (z) be a matrix pencil, P is said to be regular if there
exists z C such that |P (z)| =
6 0 i.e. if (P ) 6= C.
9
In the appendix we present an alternative method that enables you to solve for singular
systems.
39
for nonsquare matrices and is the most general form of diagonalization. Any
complex matrix A(n m) can be factored into the form
A = U DV
where U (n n), D(n m) and V (m m), with U and V unitary matrices
(U U = V V = I(nn) ). D is a diagonal matrix with positive values dii ,
i = 1 . . . r and 0 elsewhere. r is the rank of the matrix. dii are called the
singular values of A.
1.5.2
Representation
(1.19)
40
1
0
0
1
1
0
0
0 0 0
1 0 0 0
0 1 0 0
0 0 0
0 0 0 0
1 1
Y
=
t
0 0 0 0
0 0
0 0 1
0 0 0 0
0
0
1
0
0
0
0 t + 0
0
Y
+
t1
0
0
0
1
0
0
1
0
0
0
1.5.3
We now turn to the resolution of the system (1.19). Since, A0 is not necessarily
invertible, we will make full use of the generalized Schur decomposition of
(A0 , A1 ). There therefore exist matrices Q, Z, T and S such that
Q T Z = A0 , Q SZ = A1 , QQ = ZZ = Inn
10
y
t
41
and T and S are upper triangular. Let us then define Xt = Z Yt and pre
multiply (1.19) by Q to get
T11 T12
W1,t
S11 S12
W1,t1
Q1
=
+
(Bt + Ct )
0 T22
W2,t
0 S22
W2,t1
Q2
(1.20)
Let us assume, without loss of generality that the system is ordered and partitioned such that the m 1 vector W2,t is purely explosive. Accordingly, the
remaining n m 1vector W1,t is stable. Let us first focus on the explosive
part of the system
T22 W2,t = S22 W2,t1 + Q2 (Bt + Ct )
For this particular block, the diagonal elements of T22 can be null, while S22
is necessarily full rank, as its diagonal elements must be different from zero if
the model is not degenerate. Therefore, the model may be written
1
1
W2,t = M W2,t+1 S22
Q2 (Bt+1 + Ct+1 ) where M S22
T22
1
M s1 S22
Q2 (Bt+s + Ct+s )
s=1
1
M s1 S22
Q2 (Bt+s + Ct+s )
s=1
Note that by definition of the vector Yt which does not involve any variable
which do not belong to the information set available in t, we should have
Et W2,t = W2,t . But,
Et W2,t = Et
1
M s1 S22
Q2 (Bt+s + Ct+s ) = 0
s=1
(1.21)
42
Our problem is now to know whether we can pin down the vector of expectation errors uniquely from that set of restrictions. Indeed, the vector t may
not be uniquely determined. This is the case for instance when the number
of expectation errors k exceeds the number of explosive components m. In
this case, equation (1.21) does not provide enough restrictions to determine
uniquely the vector t . In other words, it is possible to introduce expectation
errors which are not related with fundamental uncertainty the socalled
sunspot variables.
Sims [2000] shows that a necessary and sufficient condition for a stable solution
to exist is that the column space of Q2 B be contained in the column space of
Q2 C:
span(Q2 B) span(Q2 C)
Otherwise stated, we can reexpress Q2 B as a linear function of Q2 C (Q2 B =
Q2 C), implying that k > m. This is actually a generalization of the socalled
Blanchard and Khan condition that states that the number of explosive eigenvalues should be equal to the number of jump variables in the system. Lubik
and Schorfheide [2003] complement this statement by the following lemma.
Lemma 1 Statements (i) and (ii) are equivalent
(i) For every t R , there exists an t Rk such that Q2 Bt + Q2 Ct = 0.
(ii) There exists a (real) k matrix such that Q2 B = Q2 C
Endowed with this lemma, we can compute the set of all solutions (fully determinate and indeterminate solutions), reported in the following proposition.
Proposition 5 (Lubik and Schorfheide [2003]) Let t be a p 1 vector
of sunspot shocks, satisfying Et1 t = 0. Suppose that condition (i) of lemma
1 is satisfied. The full set of solutions for the forecast errors in the linear
rational expectations model is
1
t = (V1 D11
U1 Q2 B + V2 M1 )t + V2 M2 t
U
|{z}
mm
0
0
D
|{z}
mk
V
|{z}
kk
V1
V2
= U1 D11 V1
(1.22)
(1.23)
e = V1
43
44
e , we get
Therefore, plugging this result in the determination of
e = D1 U Q2 B
11 1
e + V2 M1 , we finally get
Since = V1
1
= V1 D11
U1 Q2 B + V2 M1
This last result tells us how to solve the model and under which condition
the system is determined or not. Indeed, let us recall that k is the number
of expectation errors, while r is the number of linearly independent expectation errors. According to this proposition, if k = r, all expectation errors
are linearly independent, and the system is therefore totally determinate. M1
and M2 are identically zeros. Conversely, if k > r expectation errors are not
linearly independent, meaning that the system does not provide enough restrictions to uniquely pin down the expectation errors. We therefore have to
introduce extrinsic uncertainty in the system the socalled sunspot variables. We will deal first with the determinate case, before considering the case
of indeterminate system.
Determinacy
This case occurs when the number of expectation errors exactly matches the
number of explosive components (k = m), or otherwise stated in the case
45
while that of purely extrinsic expectation errors is nil. To get such an effect
.
in the first part of system (1.20), we shall premultiply by the matrix [I .. ]
1
where Q1 CV1 D11
U1 . Then, taking into account that W2t = 0, we have
W1,t
W2,t
S11 S12 S22
W1,t1
=
0
0
W2,t1
Q1 Q2
Bt
+
0
1
1
T11
T11
(T12 T22 )
0
I
we have
1
1
1
W1,t1
W1,t
T11 (Q1 Q2 )
T11 S11 T11
(S12 S22 )
Bt
+
=
W2,t1
W2,t
0
0
0
Now recall that Wt = Z Yt and that ZZ = I. Therefore, premultiplying the
last equation by Z, we end up with a solution of the form
Yt = My Yt1 + Me t
(1.24)
with
M =Z
1
1
T11
S11 T11
(S12 S22 )
0
0
Z and Me = Z
1
T11
(Q1 Q2 )
0
46
Indeterminacy
This case arises as soon as the number of expectation errors is greater than the
number of explosive components (k > m), which translates into the fact that
k > r. As shown in proposition 5, the expectation errors are then not only
linear combinations of fundamental disturbances for all t but also of purely
extrinsic disturbances called sunspot variables. Then, the expectation errors
are shown to be of the form
1
t = (V1 D11
U1 Q2 B + V2 M1 )t + V2 M2 t
where both M1 and M2 can be freely chosen. This actually raises several questions. The first one is how to select M1 and M2 ? They are totally arbitrary
the only restriction we have to impose is that M1 is a (k r) matrix and M2
is a (k r) p matrix. A second one is then how to interpret these sunspots?
In order to partially circumvent these difficulties, it is useful to introduce the
notion of beliefs. For instance, this amounts to introduce new shocks the
sunspots beside the standard expectation error. In such a case, a variable
yt will be determined by its expectation at time t 1, a shock on the beliefs
that leads to a revision of forecasts, and the expectation error
yt = Et1 yt + t + t
where t is the shock on the belief, that satisfies Et1 t = 0, and t is the
expectation error. t is a k 1 vector. Then the system 1.19 rewrites
A0 Yt = A1 Yt1 + Bt + C(t + t )
which can be restated in the form
A0 Yt = A1 Yt1 + B
t
t
+ C t
where B = [B C]. Implicit in this rewriting of the system is the fact that the
belief shock be treated like a fundamental shock, therefore condition (1.21)
rewrites
Q2 B
t
t
+ Q2 C t = 0
47
This shows that the expectation error is a function of both the fundamental
shocks and the beliefs.
If this latter formulation furnishes an economic interpretation to the sunspots,
it leaves unidentified the matrices M1 and M1 . From a practical point of view,
we can, arbitrarily, set these matrices to zeros and then proceed exactly as in
the determinate case, replacing B by B in the solution. This leads to
t
(1.25)
Yt = My Yt1 + Me
t
with
M =Z
1
1
(S12 S22 )
S11 T11
T11
0
0
Z and Me = Z
1
(Q1 Q2 )
T11
0
Note however, that even if we know the form of the solution, we know nothing
about the statistical properties of the t shocks. In particular, we do not know
their covariance matrix that can be set arbitrarily.
1.5.4
In this section, we will show you how the solution may be used to study the
dynamic properties of the model from a quantitative point of view. We will
basically address two issues
1. Impulse response functions
2. Computation of moments
Impulse response functions
As we have already seen in the preceding chapter, the impulse response function of a variable to a shock gives us the expected response of the variable to
a shock at different horizons in other words this corresponds to the best
linear predictor of the variable if the economic environment remains the same
in the future. For instance, and just to remind you what it is, let us consider
the case of an AR(1) process:
xt = xt1 + (1 )x + t
48
Assume for a while that no shocks occurred in the past, such that xt remained
steady at the level x from t = 0 to T . A unit positive shock of magnitude
occurs in T , xT is then given by
xT = x +
xt
Time
49
if i = k
otherwise
Since both t are innovations, they are orthogonal to Yt , such that the previous
equation reduces to
yy = My yy My + Me ee ME
Solving this equation for SS can be achieved remembering that vec(ABC) =
(A C )vec(B), hence
vec(yy ) = (I My My )1 vec(ee )
50
The computation of covariances at leads and lags proceeds the same way. For
). From
instance, assume we want to compute jSS = E(St Stj
Yt = My Yt1 + Me t
we know that
Yt =
Myj Ytj
+ Me
j
X
Myi ti
i=0
Therefore,
E(Yt Ytj
)
+ Me
j
X
i=0
Since are innovations, they are orthogonal to any past value of Y , such that
0
if i < j
E(ti Ytj ) =
ee Me if i = j
Then, the previous equation reduces to
E(Yt Ytj
) = Myj yy + Me Myj ee Me
1.6
Economic examples
This section intends to provide you with some economic applications of the set
of tools we have described up to now. We will consider three examples, two of
which may be thought of as micro examples. In the first one a firm decides on
its labor demand, the second one is a macro model and endogenous growth
model `
a la Romer [1986] which allows to show that even a nonlinear model
may be expressed in linear terms and therefore may be solved in a very simple
way. The last one deals with the socalled Lucas critique which has strong
implications on the econometric side.
1.6.1
51
Labor demand
We consider the case of a firm that has to decide on its level of employment.
The firm is infinitely lived and produces a good relying on a decreasing returns
to scale technology that essentially uses labor another way to think of it
would be to assume that physical capital is a fixedfactor. This technology is
represented by the production function
Yt = f0 nt
f1 2
n with f0 , f1 > 0.
2 t
f1 2
2
max Et
f0 nt+s nt+s wt+s nt+s (nt+s nt+s1 )
1+r
2
2
{n }
=0
s=0
none
none
s
1
f1 2
2
Et
f0 nt+s nt+s wt+s nt+s (nt+s nt+s1 )
1+r
2
2
s+1
1
(nt+s+1 nt+s )2
Et
1+r
2
none
52
mizing
s
1
1
f1 2
2
2
Et
(nt+s+1 nt+s )
f0 nt+s nt+s wt+s nt+s (nt+s nt+s1 )
1+r
2
2
1+r 2
which yields the following first order condition
s
1
1
Et
f0 f1 nt+s wt+s (nt+s nt+s1 ) +
(nt+s+1 nt+s ) = 0
1+r
1+r
since r is a constant this reduces to
1
(nt+s+1 nt+s ) = 0
Et f0 f1 nt+s wt+s (nt+s nt+s1 ) +
1+r
Now remark that this relationship holds whatever s, such that we may restrict
ourselves to the case s = 0 which then yields noting that nti , i > 0 belongs
to the information set
f0 f1 nt wt (nt nt1 ) +
(Et nt+1 nt ) = 0
1+r
rearranging terms
1+r
f1 (1 + r)
nt + (1 + r)nt1 +
(f0 wt ) = 0
Et nt+1 2 + r +
T +
P (F ) may be factorized as
P (F ) = (F 1 )(F 2 )
Let us compute the discriminant of this second order polynomial
f1 (1 + r) 2
f1
f1
2+r+
(1 + r) + 2(2 + r) > 0
4(1 + r) = (1 + r)
53
Hence, since > 0, we know that the two roots are real. Further
f1 (1 + r)
<0
f1
P (1) = (1 + r) + 2(2 + r) > 0
P (0) = 1 + r > 0
f1 (1 + r)
1
2+r+
>1
P (x) = 0 x =
2
P (1) =
P (0) being greater than 0 and since P(1) is negative, one root lies between
0 and 1, and the other one is therefore greater than 1 since lim P (x) = .
x
1 + r wt f 0
F 2
or
nt = 1 nt1 +
1 + r f0 wt
1 + r X i
2 Et (f0 wt+i )
=
n
+
1 t1
2 1 1
2
2 F
i=0
1 X i
nt = 1 nt1 +
2 Et (f0 wt+i )
i=0
f0 (1 + r)
1 X i
2 Et wt+i
nt =
+ 1 nt1
(2 1)
i=0
For practical purposes let us assume that wt follows an AR(1) process of the
form
wt = wt1 + (1 )w + t
we have
Et wt+i = i wt + (1 i )w
such that nt rewrites
nt =
f0 (1 + r)
1+r
(1 + r)(1 )
w + 1 nt1
wt
(2 1) (2 1)(2 )
( 2 )
54
i Et wt+i
i=0
X
X
i Et+1 wt+i+1
i Et wt+i +
Et 0 + 1 0 + 1 nt1 +
i=0
i=0
!
X
f1 (1 + r)
i Et wt+i
0 + 1 nt1 +
2+r+
i=0
1+r
+(1 + r)nt1 +
(f0 wt ) = 0
which rewrites
0 (1 + 1 ) +
12 nt1
+ 1
i Et wt+i +
i=0
f1 (1 + r)
0 + 1 nt1 +
2+r+
+(1 + r)nt1 +
1+r
(f0 wt ) = 0
i Et wt+i+1
i=0
i Et wt+i
i=0
f1 (1+r)
(1
+
2
+
r
+
0 + 1+r
0
1
f0 = 0
f
(1+r)
2 2 + r + 1
1 + (1 + r) = 0
1
f
(1+r)
1
0 1 2 + r +
1+r
=0
f
(1+r)
1
i 1 2 + r +
+ i1 = 0
The second equation of the system exactly corresponds to the second order
polynomial we solved in the factorization method. The system therefore exhibits the saddle path property so that 1 (0, 1) and 2 (1, ). Let us
recall that 1 + 2 = 2 + r + f1 (1 + r)/, such that the system for 0 and i
rewrites
0 (1 + 1 ) 2 + r +
0 2 1+r
=0
1
i = 2 i1
f1 (1+r)
0 +
1+r
f0
=0
55
Therefore, we have
0 =
1+r
1
=
2
and i = i
2 0 . Finally, we have
0 =
f0 (1 + r)
(2 1)
1 X i
f0 (1 + r)
2 Et wt+i
+ 1 nt1
nt =
(2 1)
i=0
n 2+r+
(f0 w) = 0 n =
n + (1 + r)n +
f1
Denoting n
bt = nt n and w
bt = wt w, and introducing the technical
variable zbt+1 = n
bt , the Labor demand reexpresses as
1+r
f1 (1 + r)
w
bt = 0
n
bt + (1 + r)b
zt
Et n
bt+1 2 + r +
Et1 n
bt + t , the system expresses as
1
0
0
1
1
1
0
2+r+
f1 (1+r)
0
0
1
0
0
0
1+r
zbt+1
n
bt
w
bt
Et n
bt+1
0
0
0
(1 + r)
0
0
1
0
0
0
0
0
0
0
t +
0
1
0
0
0
1
0
0
zbt
n
bt1
w
bt1
Et1 n
bt
We now provide you with an example of the type of dynamics this model
may generate. Figure 1.13 reports the impulse response function of labor to
56
f0
1
f1
0.2
0.001/1
w
0.6
0.95
a positive shock on the real wage (table 1.1 reports the parameterization).
As expected, labor demand shifts downward instantaneously, but depending
on the size of the adjustment cost, the magnitude of the impact effect differs. When adjustment costs are low, the firm drastically cuts employment,
which goes back steadily to its initial level as the effects of the shock vanish. Conversely, when adjustment costs are high, the firm does not respond as
Figure 1.13: Impulse Response to a Wage Shock
Small adjustment costs ( = 0.001)
Real Wage
Labor Demand
0.8
0.6
0.4
0.2
0
10
Time
15
5
0
20
10
Time
15
20
15
20
Labor Demand
1.5
2
0.8
2.5
0.6
3
0.4
0.2
0
3.5
5
10
Time
15
20
4
0
10
Time
much as before since it wants to avoid paying the cost. Nevertheless, it remains
optimal to cut employment, so in order to minimize the cost, the firm spreads
it intertemporally by smoothing the employment profile, therefore generating
a hump shaped response of employment.
57
58
1.6.2
We consider an economy that consists of a large number of dynastic households and a large number of firms. Firms are producing a homogeneous final
product that can be either consumed or invested by means of capital and labor
services. Firms own their capital stock and hire labor supplied by the households. Households own the firms. In each and every period three perfectly
competitive markets open the markets for consumption goods, labor services, and financial capital in the form of firms shares. Household preferences
are characterized by the lifetime utility function:
Et
X
s=0
h1+
log(ct+s ) t+s
1+
s
(1.26)
(1.27)
(1.28)
(1.29)
59
{ct+s ,kt+1+s }
s=0
Et
s log(ct+s )
s=0
s.t.
h1+
t+s
1+
kt+1 =yt = at kt h1
ct + (1 )kt
t
log(at ) = log(at1 ) + (1 ) log(a) + t
The set of conditions characterizing the equilibrium is given by
yt
h
t ct =(1 )
ht
yt =at kt h1
t
yt =ct + it
kt+1 =it + (1 )kt
yt+1
ct
+1
1 =Et
ct+1
kt+1
(1.30)
(1.31)
(1.32)
(1.33)
(1.34)
kt+1+s
=0
ct+s
The problem with this dynamic system is that it is fundamentally nonlinear
lim s
and therefore the methods we have developed so far are not designed to handle
it. The usual way to deal with this type of system is then to take a linear
or loglinear approximation of each equation about the deterministic steady
state. Therefore, the first step is to find the deterministic steady state.
Deterministic steady state
able, x, is the value x such that xt = x for all t. Therefore, the steady state
of the RBC model is characterized by the set of equations:
y
h c =(1 )
h
y =ak h 1
(1.35)
(1.36)
y t =c + i
(1.37)
k =i + (1 )k
y
1 =Et
+1
k
(1.38)
(1.39)
60
i
k
=
y
y
c
=
=
s
=
= 1 si
c
y
1 (1 )
y
si
h =
1
sc
1
1+
y =a
1 (1 )
h , c = sc y , i = y c .
Then, a restatement of the problem is in order, as we are to take an approximation with respect to log(x):
f (x) f (exp(log(x)))
which leads to the following first order Taylor expansion
f (x) f (x ) + f (exp(log(x )))exp(log(x ))b
x = f (x ) + f (x )x x
b
61
Applying this technic to the system (1.30)(1.34), we end up with the system
(1 + )b
ht + b
ct ybt
(1.40)
ybt (1 )b
ht b
ht b
at = 0
(1.41)
ybt sc b
ct sibit = 0
(1.42)
b
kt+1 bit (1 )b
kt = 0
(1.43)
Et b
ct+1 b
ct (1 (1 ))(Et ybt+1 Et b
kt+1 )
(1.44)
b
at b
at1 bt
(1.45)
Note that only the last three equations of the system involve dynamics, but
they depend on variables that are defined in the first three equations. Either
we solve the first three equations in terms of the state and costate variables,
or we adapt a little bit the method. We choose the second solution.
Let us define Yt = {b
kt+1 , b
at , Et b
ct+1 } and Xt = {b
yt , b
ct , bit , b
ht }. The system can
be rewritten as a set of two equations. The first one gathers static equations
x Xt = y Yt1 + t + t
1
0
0 1
1
0
0
0
x =
=
1 sc si
0 y 0
1 1
0
1
0
0
0
0
0
1
1
0
=
0 0
0
0
= 1
0
0
62
with
0
0 0 0
1
0 0
0
0 0 0
0
1 0 0x =
0y =
(1 (1 )) 0 0 0
1 (1 ) 0 1
0 0 0
1 0 0
0
1x = 0 0 0 0
1y = 0
0
0 0
0 1 0 0
0
0
1
0
=
=
0
0
Xt = y Yt1 + t + t
where j = 1
x j , j = {y, , }. Furthermore, remembering that Et t+1 =
Et t+1 = 0, we have Et Xt+1 = y Yt . Hence, plugging this result and the first
equation in the second equation we get
A0 Yt = A1 Yt+1 + Bt + Ct
where A0 = 0y +0x y , A1 = 1y +1x y , B = +0x and C = +0x .
We then just use the algorithm as described previously.
Then, we make use of the result in proposition 5, to get t . Since it turns
out that the model is determinate, the expectation error is a function of the
fundamental shock t
1
t = V1 D11
U1 Q2 Bt
Plugging this result in the equation governing static equations, we end up with
1
Xt = y Yt1 + (e V1 D11
U1 Q2 B)t
63
0.4
0.988
0.025
delta
= 0.025;
rho
= 0.95;
beta
= 0.988;
%
% Deterministic Steady state
%
ysk = (1-beta*(1-delta))/(alpha*beta);
ksy = 1/ysk;
si = delta/ysk;
sc = 1-si;
% Define:
%
% Y=[k(t+1) a(t+1) E_tc(t+1)]
%
% X=[y,c,i,h]
%
ny = 3; % # of variables in vector Y
nx = 4; % # of variables in vector X
ne = 1; % # of fundamental shocks
nn = 1; % # of expectation errors
%
% Initialize the Upsilon matrices
%
UX=zeros(nx,nx);
UY=zeros(nx,ny);
UE=zeros(nx,ne);
UN=zeros(nx,nn);
G0Y=zeros(ny,ny);
G1Y=zeros(ny,ny);
G0X=zeros(ny,nx);
G1X=zeros(ny,nx);
GE=zeros(ny,ne);
GN=zeros(ny,nn);
%
% Production function
%
UX(1,1)=1;
UX(1,4)=alpha-1;
UY(1,1)=alpha;
UY(1,2)=rho;
UE(1)=1;
%
% Consumption c(t)=E(c(t)|t-1)+eta(t)
%
0.95
64
UX(2,2)=1;
UY(2,3)=1;
UN(2)=1;
%
% Resource constraint
%
UX(3,1)=1;
UX(3,2)=-sc;
UX(3,3)=-si;
%
% Consumption-leisure arbitrage
%
UX(4,1)=-1;
UX(4,2)=1;
UX(4,4)=1;
%
% Accumulation of capital
%
G0Y(1,1)=1;
G1Y(1,1)=1-delta;
G1X(1,3)=delta;
%
% Productivity shock
%
G0Y(2,2)=1;
G1Y(2,2)=rho;
GE(2)=1;
%
% Euler equation
%
G0Y(3,1)=1-beta*(1-delta);
G0Y(3,3)=1;
G0X(3,1)=-(1-beta*(1-delta));
G1X(3,2)=1;
%
% Solution
%
% Step 1: solve the first set of equations
%
PIY = inv(UX)*UY;
PIE = inv(UX)*UE;
PIN = inv(UX)*UN;
%
% Step 2: build the standard System
%
A0 = G0Y+G0X*PIY;
A1 = G1Y+G1X*PIY;
B
= GE+G1X*PIE;
C
= GN+G1X*PIN;
%
% Step 3: Call Sims routine
%
[MY,ME,ETA,MU_]=sims_solve(A0,A1,B,C);
%
65
1.6.3
Let us consider the simplest new keynesian model, with the following IS curve
yt = Et yt+1 (it Et t+1 ) + gt
where yt denotes output, t is the inflation rate, it is the nominal interest rate
and gt is a stochastic shock that follows an AR(1) process of the form
gt = g gt1 + gt
the model also includes a Phillips curve that relates positively inflation to the
output gap
t = yt + Et t+1 + ut
where ut is a supply shock that obeys
ut = u ut1 + ut
For stationarity purposes, we have |g | < 1 and |u | < 1.
The model is closed by a simple Taylor rule of the form
it = t + y yt
66
Consumption
0.8
1.8
0.7
1.6
1.4
0.6
1.2
0.5
1
0.8
0
10
Time
15
20
0.4
0
Investment
10
Time
15
20
15
20
Hours worked
1.5
5
4
3
2
0.5
1
0
0
10
Time
15
20
0
0
10
Time
67
Plugging this rule in the first equation, and remembering the definition of
expectation errors, the system rewrites
yt =Et1 yt + ty
t =Et1 t + t
gt =g gt1 + gt
ut =u ut1 + ut
(1 + y )yt =Et yt+1 t + Et t+1 + gt
t =yt + Et t+1 + ut
Defining Yt = {yt , t , gt , ut , Et yt+1 , Et t+1 } and t = {ty , t }, the system
rewrites
1
0
0
0
0
0
0
1
0
0
0
0
0
0
1
0
0
0
0
0
0
1
0
0
1 + y + 1 0 1
1
0 1 0
0
0
Yt = 0
0
0 0 0 1
0 0 0 0
0 g 0 0
0 0 u 0
0 0 0 0
0 0 0 0
0 0
0 0
1 0
t +
0 1
0 0
0 0
0
1
0
0
0
0
1
0
0
0
0
0
Yt1
0
1
0
0
0
0
0.4
0.9
g
0.9
u
0.9
y
0.25
1.5/0.5
68
%
alpha
= 0.4;
gy
= 0.25;
gp
= 0.5;
rho_g
= 0.9;
rho_u
= 0.95;
lambda = 1;
beta
= 0.9;
% Define:
%
% Y=[y(t),pi(t),g(t),u(t),E_t y(t+1),E_t pi(t+1)]
%
ny = 6; % # of variables in vector Y
ne = 2; % # of fundamental shocks
nn = 2; % # of expectation errors
%
% Initialize the matrices
%
A0 = zeros(ny,ny);
A1 = zeros(ny,ny);
B
= zeros(ny,ne);
C
= zeros(ny,nn);
%
% Output
%
A0(1,1) = 1;
A1(1,5) = 1;
C(1,1) = 1;
%
% Inflation
%
A0(2,2) = 1;
A1(2,6) = 1;
C(2,2) = 1;
%
% IS shock
%
A0(3,3) = 1;
A1(3,3) = rho_g;
B(3,1) = 1;
%
% Supply shock
%
A0(4,4) = 1;
A1(4,4) = rho_u;
B(4,2) = 1;
%
% IS curve
%
A0(5,1) = 1+alpha*gy;
A0(5,2) = alpha*gp;
A0(5,3) = -1;
A0(5,5) = -1;
A0(5,6) = -alpha;
69
%
% Phillips Curve
%
A0(6,1) = -lambda;
A0(6,2) = 1;
A0(6,4) = -1;
A0(6,6) = -beta;
%
% Call Sims routine
%
[MY,ME,ETA,MU_]=sims_solve(A0,A1,B,C);
1.6.4
AK growth model
s log(Ct+s )
s=0
70
havior of the consumer. The first order condition associated to the consumption/savings decisions may be obtained forming the following Lagrangean,
where t is the multiplier associated to the resource constraint
Lt = Et
s=0
Terms involving Ct :
max Et (log(Ct ) t Ct ) = max (log(Ct ) t Ct )
{Ct }
{Ct }
{Kt+1 }
71
T
X
k=0
k + lim T Et (XT +1 )
T
The second term in the right hand side of the latter equation corresponds
precisely to the transversality condition. Hence, Xt reduces to
Xt =
Kt+1 =
Ct
1
1
72
consumption, output and investment are just a linear function of capital, the
nonstationarity of capital translates into the non stationarity of these variables. Nevertheless, as can be seen from the law of motion of consumption, for
example, log(Ct ) log(Kt ) is a stationary process. Kt and Ct are then said
to be cointegrated with a cointegrating vector (1, 1).
This has extremely important economic implications, that may be analyzed
in the light of the impulse response functions, reported in figure 1.15. In fact,
figure 1.15 reports two balanced growth paths for each variable: The first one
corresponds to the path without any shock, the second one corresponds to
the path that includes a non expected positive shock on technology in period
10. As can be seen, this shock yields a permanent increase in all variables.
Therefore, this model can account for the fact that countries may not converge.
Why is that so? The answer to this question is actually simple and may be
Figure 1.15: Impulse response functions
Output
Consumption
0.052
0.0135
0.05
0.013
0.048
0.0125
0.046
0.012
0.044
0.0115
0.042
0.011
0.04
0.0105
0.038
10
20
30
40
50
0.01
10
20
30
Time
Time
Investment
Capital
40
50
40
50
0.038
1.3
0.036
0.034
1.2
0.032
1.1
0.03
0.028
10
20
30
Time
40
50
10
20
30
Time
73
Consumption
0.09
0.024
0.08
0.022
0.02
0.07
0.018
0.06
0.016
0.05
0.014
0.04
0.03
0.012
0
50
100
Time
150
200
0.01
50
Investment
100
Time
150
200
150
200
Capital
0.07
2.5
0.06
2
0.05
0.04
1.5
0.03
0.02
50
100
Time
150
200
50
100
Time
the rate of growth of each variable, which estimates are reported in table (1.4)
and which distributions are represented in figures 1.171.20. It is interesting
to note that all variables exhibit when taken in loglevels a spurious
correlation with output that just reflects the existence of a common trend due
to the balanced growth path hypothesis.
74
Corr(.,Y )
Corr(.,Y )
Y
0.40
0.79
1.00
-0.01
Y
0.99
C
0.40
0.09
0.30
0.93
C
0.99
I
0.40
1.06
0.99
-0.02
I
0.99
K
0.40
0.09
-0.08
0.93
K
0.99
Consumption
200
200
150
150
100
100
50
50
4
Time
4
Time
x 10
Investment
200
150
150
100
100
50
50
4
Time
6
3
x 10
Capital
200
6
3
x 10
0
2.5
3.5
4
Time
4.5
5.5
3
x 10
75
Consumption
200
200
150
150
100
100
50
50
8
Time
10
x 10
0.5
1
Time
Investment
200
150
150
100
100
50
50
0.01
2
x 10
Capital
200
0
0.008 0.009
1.5
0.5
1
Time
1.5
2
x 10
Consumption
5000
250
4000
200
3000
150
2000
100
1000
50
0
60
40
20
0
Time
20
40
60
0
0.2
0.25
Investment
0.3
0.35
Time
0.4
0.45
Capital
250
200
200
150
150
100
100
50
50
0
0.998
0.9985
0.999 0.9995
Time
1.0005
0
0.4
0.2
0
Time
0.2
0.4
76
Consumption
200
250
200
150
150
100
100
50
0
0.4
50
0.2
0
Time
0.2
0.4
0
0.7
0.75
Investment
200
150
150
100
100
50
50
0.2
0
Time
0.85
Time
0.9
0.95
Capital
200
0
0.4
0.8
0.2
0.4
0
0.4
0.2
0
Time
0.2
0.4
77
78
for s
= 1:nsim;
disp(s)
randn(state,s);
e
= randn(long,1)*se;
a
= zeros(long,1);
K
= zeros(long,1);
a(1) = log(ab)+e(1);
K(1) = K0;
for
i
= 2:long;
a(i)= rho*a(i-1)+(1-rho)*log(ab)+e(i);
K(i)= beta*(exp(a(i-1))+1-delta)*K(i-1);
end;
C
= (1-beta)*(exp(a)+1-delta).*K;
Y
= exp(a).*K;
I
= Y-C;
X
= [Y C I K];
dx
= diff(log(X));
mx(s,:)
= mean(dx);
sx(s,:)
= std(dx);
tmp
= corrcoef(dx);cx(s,:)=tmp(1,:);
tmp
= corrcoef(dx(2:end,1),dx(1:end-1,1));ry=tmp(1,2);
tmp
= corrcoef(dx(2:end,2),dx(1:end-1,2));rc=tmp(1,2);
tmp
= corrcoef(dx(2:end,3),dx(1:end-1,3));ri=tmp(1,2);
tmp
= corrcoef(dx(2:end,4),dx(1:end-1,4));rk=tmp(1,2);
rx(s,:)
= [ry rc ri rk];
end;
disp(mean(mx))
disp(mean(sx))
disp(mean(cx))
disp(mean(rx))
1.6.5
Announcements
In the last two examples, we will help you to give an answer to this crucial
question:
Why do these two guys annoy us with rational expectations?
In this example we will show you how different may the impulse response to a
shock be different depending on the fact that the shock is announced or not.
To illustrate this issue, let us go back to the problem of asset pricing. Let pt
be the price of a stock, dt be the dividend which will be taken as exogenous
and r be the rate of return on a riskless asset, assumed to be held constant
over time. As we have seen earlier, standard theory of finance states that when
agents are risk neutral, the asset pricing equation is given by:
Et pt+1 pt dt
+
=r
pt
pt
79
or equivalently
1
1
Et pt+1 +
dt
1+r
1+r
Let us now consider that the dividend policy of the firm is such that from
pt =
period 0 on, the firm serves a dividend equal to d0 . The price of the asset is
therefore given by
i
1 X
1
d0
pt =
Et dt+i =
1+r
1+r
r
i=0
Unexpected shock
2.5
2.5
1.5
1.5
0.5
10
20
30
Time
40
50
60
0.5
10
20
t0=20, T=40
2.5
1.5
1.5
10
20
30
Time
40
50
60
40
50
60
t0=30, T=40
2.5
0.5
30
Time
40
50
60
0.5
10
20
30
Time
80
it
it
T 1
1 X
1
1 X
1
d0 +
d1
1+r
1+r
1+r
1+r
i=t
i=T
it
it
T
1
X
X
1
1
1
1
d0 +
(d1 d0 + d0 )
1+r
1+r
1+r
1+r
i=t
i=T
it
it
1
1
1 X
1 X
d0 +
(d1 d0 )
1+r
1+r
1+r
1+r
i=t
i=T
d1
r
Hence, the dynamics of the asset price is given by
d0
for t < t0
r
T t
d1 d0
d0
1
pt =
for t0 6 t 6 T
+ 1+r
r
dr1
for t > T
r
pt =
Hence, compared to the earlier situation, there is now a transition phase that
takes place as soon as the individuals has learnt the news and exploits this
additional piece of information when formulating her expectations. This dynamics is depicted in figure 1.21 for different dates of announcement.
1.6.6
As a last example, we now have a look at the socalled Lucas critique. One
typical answer to the question raised in the previous section may be found
81
in the socalled Lucas critique (see e.g. Lucas [1976]) , or the econometric
policy evaluation critique, which asserts that because the apparently (for old
fashioned econometricians) structural parameters of a model may change when
policy changes, standard econometrics may not be used to study alternative
regimes. In order to illustrate this point, let us go back to the simplest example
we were dealing with:
yt = aEt yt+1 + bxt
xt = xt1 + t
which solution is given by
b
xt
1 a
Now let us assume for a while that yt denotes output and xt is money, which is
yt =
82
10
15
20
Time
25
30
35
40
b
xt
1 a
83
think of the model, solve the model and therefore evaluate and test the model.
This will be studied in the next chapter.
84
Bibliography
Blanchard, O. and C. Kahn, The Solution of Linear Difference Models under
Rational Expectations, Econometrica, 1980, 48 (5), 13051311.
Blanchard, O.J. and S. Fisher, Lectures on Macroeconomics, Cambridge: MIT
Press, 1989.
Lubik, T.A. and F. Schorfheide, Computing Sunspot Equilibria in Linear Rational Expectations Models, Journal of Economic Dynamics and Control,
2003, 28, 273285.
Lucas, R., Econometric policy Evaluation : a Critique, in K. Brunner and
A.H. Meltzer, editors, The Phillips Curve and Labor Markets, Amsterdam: NorthHolland, 1976.
Muth, J.F., Optimal Properties of Exponentially Weighted Forecasts, Journal
of the American Statistical Association, 1960, 55.
, Rational Expections and the Theory of Price Movements, Econometrica, 1961, 29, 315335.
Romer, P., Increasing Returns and Long Run Growth, Journal of Political
Economy, 1986, 94, 10021037.
Sargent, T., Macroeconomic Theory, MIT Press, 1979.
Sargent, T.J., Dynamic Macroeconomic Theory, Londres: Harvard University
Press, 1987.
Sims, C., Solving Linear Rational Expectations Models, manuscript, Princeton
University 2000.
85
86
BIBLIOGRAPHY
Contents
1 Expectations and Economic Dynamics
1.1
1.2
1.2.1
1.2.2
1.2.3
15
1.2.4
18
23
1.3.1
. . . . . . . .
24
1.3.2
Factorization . . . . . . . . . . . . . . . . . . . . . . . .
28
1.3.3
A matricial approach . . . . . . . . . . . . . . . . . . . .
29
33
1.4.1
Representation . . . . . . . . . . . . . . . . . . . . . . .
33
1.4.2
35
38
1.5.1
38
1.5.2
Representation . . . . . . . . . . . . . . . . . . . . . . .
39
1.5.3
40
1.5.4
47
Economic examples . . . . . . . . . . . . . . . . . . . . . . . . .
50
1.6.1
Labor demand . . . . . . . . . . . . . . . . . . . . . . .
51
1.6.2
58
1.6.3
65
1.6.4
AK growth model . . . . . . . . . . . . . . . . . . . . .
69
1.6.5
Announcements . . . . . . . . . . . . . . . . . . . . . . .
78
1.3
1.4
1.5
1.6
87
88
CONTENTS
1.6.6
80
List of Figures
1.1
10
1.2
Forward Solution . . . . . . . . . . . . . . . . . . . . . . . . . .
12
1.3
16
1.4
Backward Solution . . . . . . . . . . . . . . . . . . . . . . . . .
17
1.5
Deterministic Bubble . . . . . . . . . . . . . . . . . . . . . . . .
20
1.6
Bursting Bubble . . . . . . . . . . . . . . . . . . . . . . . . . .
22
1.7
Backwardforward solution . . . . . . . . . . . . . . . . . . . .
27
1.8
30
1.9
A source . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
31
32
. . . . . . . . . . . . . . . . . . . . . . . . . .
33
48
56
66
. . . . . . . . . . . . . . . . . . . .
72
73
74
75
75
76
79
82
89
90
LIST OF FIGURES
List of Tables
1.1
56
1.2
63
1.3
67
1.4
MonteCarlo Simulations . . . . . . . . . . . . . . . . . . . . .
74
91
Lecture Notes 2
McGrattan (1999)), which, as PEA, may be thought of as a generalization of the method of undetermined coefficients to the higher order
but exploits some orthogonality conditions rather than relying on simulations;
Each method is illustrated by an economic example, which is intended to show
you the potential and simplicity of the method. However, before going to such
methods, we shall now see why linearizing many not always be a good idea.
The big question is then
What are we missing?
2.1 Certainty equivalence

Consider an agent who chooses a variable y so as to maximize the expected value of an objective u(y, x), where x is a random variable with density g(x):

\max_y E(u(y,x)) = \int u(y,x) g(x) dx

The first order condition for choosing y is then given by (applying Leibniz' rule)

\int \frac{\partial u(y,x)}{\partial y} g(x) dx = 0    (2.1)

Note that since u(.) is concave and g(.) is positive, this condition is necessary and sufficient.
Assume now that u takes the quadratic form

u(y,x) = \begin{pmatrix} y & x \end{pmatrix}\begin{pmatrix} J_y \\ J_x \end{pmatrix} + \frac{1}{2}\begin{pmatrix} y & x \end{pmatrix}\begin{pmatrix} h_{yy} & h_{yx} \\ h_{xy} & h_{xx} \end{pmatrix}\begin{pmatrix} y \\ x \end{pmatrix}
       = J_y y + J_x x + \frac{1}{2}\left( h_{xx} x^2 + (h_{xy}+h_{yx}) xy + h_{yy} y^2 \right)    (2.2)
Note that in this case we are not maximizing the expected value of the problem but the value itself, taking into account the expected value of x. Given the functional form (2.2), the first order condition is now

J_y + h_{yy} y + \frac{1}{2}(h_{xy}+h_{yx}) E(x) = 0

which is exactly the same as the one obtained by maximizing u(y, E(x)) directly. In other words, we have, for the quadratic formulation,

\mathrm{Argmax}_y\; E(u(y,x)) = \mathrm{Argmax}_y\; u(y, E(x))

This is what is usually called the certainty equivalence principle: risk does not matter in decision making; the only thing that matters is the average value of the random variable x, not its variability. But this is usually not a general result.
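The following minimal Matlab sketch illustrates the point numerically: for a quadratic objective, the maximizer of E(u(y,x)) coincides with the maximizer of u(y,E(x)), while for a non-quadratic one it does not. Both functional forms and all parameter values below are illustrative assumptions, not taken from the text.

Jy=1; hyy=-1; hxy=0.5;                      % quadratic coefficients (assumed)
xg = randn(20000,1);                        % draws of x ~ N(0,1) (assumed)
u_quad = @(y,x) Jy*y + 0.5*hyy*y.^2 + hxy*x.*y;   % certainty equivalence holds
u_log  = @(y,x) log(y) - y.*exp(x);               % certainty equivalence fails
yg = linspace(0.05,3,600);                  % grid of candidate decisions y
for u = {u_quad, u_log}
   f  = u{1};
   Eu = arrayfun(@(y) mean(f(y,xg)), yg);   % E(u(y,x)) for each y on the grid
   [~,i1] = max(Eu);                        % maximizer of E(u(y,x))
   [~,i2] = max(f(yg, mean(xg)*ones(size(yg))));  % maximizer of u(y,E(x))
   fprintf('argmax E(u) = %5.3f, argmax u(y,E(x)) = %5.3f\n', yg(i1), yg(i2));
end

For the quadratic objective the two maximizers coincide (up to sampling error); for the logarithmic one they differ markedly, because E(exp(x)) > exp(E(x)).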
Let us consider, for example, the case of Burnside's [1998] asset pricing model.

2.1.1 An asset-pricing example
This model is a standard asset pricing model for which (i) the marginal intertemporal rate of substitution is an exponential function of the rate of growth of consumption and (ii) the endowment is a Gaussian exogenous process. As shown by Burnside (1998), this setting makes it possible to obtain a closed form solution to the problem. We consider a frictionless pure exchange economy à la Mehra and Prescott (1985) and Rietz (1988), with a single household and a unique perishable consumption good produced by a single tree. The household can hold equity shares to transfer wealth from one period to another. The problem of a single agent is then to choose consumption and equity holdings to maximize her expected discounted stream of utility, given by

E_t \sum_{\tau=0}^{\infty} \beta^\tau \frac{c_{t+\tau}^\theta}{\theta}  with  \theta \in (-\infty,0)\cup(0,1]    (2.3)

subject to the budget constraint

p_t e_{t+1} + c_t = (p_t + d_t) e_t    (2.4)

where p_t is the price of a share, d_t the dividend and e_t the equity holdings. In equilibrium c_t = d_t, and the first order conditions may be written in terms of the price-dividend ratio y_t \equiv p_t/d_t as

y_t = \beta E_t \left[ \exp(\theta x_{t+1})(1 + y_{t+1}) \right]    (2.5)

where x_t, the rate of growth of dividends, is assumed to be a Gaussian stationary AR(1) process

x_t = (1-\rho)\bar{x} + \rho x_{t-1} + \varepsilon_t    (2.6)

with |\rho| < 1 and \varepsilon_t \sim N(0,\sigma^2).    (2.7)
Burnside (1998) shows that the above equation admits an exact solution of the form

y_t = \sum_{i=1}^{\infty} \beta^i \exp\left( a_i + b_i (x_t - \bar{x}) \right)    (2.8)

where

a_i = \theta\bar{x} i + \frac{\theta^2\sigma^2}{2(1-\rho)^2}\left[ i - 2\rho\frac{1-\rho^i}{1-\rho} + \rho^2\frac{1-\rho^{2i}}{1-\rho^2} \right]

and

b_i = \frac{\theta\rho(1-\rho^i)}{1-\rho}

As can be seen from the definition of a_i, the volatility of the shock, \sigma, directly enters the decision rule; therefore Burnside's [1998] model does not satisfy the certainty equivalence hypothesis: risk matters for asset holding decisions.
What happens then, if we now obtain a solution relying on a first order Taylor approximation of the model? First of all, let us determine the deterministic steady state of the economy:

y^* = \beta\exp(\theta x^*)(1 + y^*)    (2.9)
x^* = \rho x^* + (1-\rho)\bar{x}    (2.10)

such that we get

x^* = \bar{x}  and  y^* = \frac{\beta\exp(\theta\bar{x})}{1-\beta\exp(\theta\bar{x})}    (2.11)

Linearizing (2.5) around this steady state (hatted variables denoting deviations from the steady state), we get

\hat{y}_t = \beta\exp(\theta x^*) E_t(\hat{y}_{t+1}) + \beta\exp(\theta x^*)(1+y^*)\theta E_t(\hat{x}_{t+1})

Guessing a solution of the form \hat{y}_t = \phi\hat{x}_t, taking expectations and identifying, we obtain

\phi = \frac{\theta\rho\beta\exp(\theta\bar{x})}{(1-\beta\exp(\theta\bar{x}))(1-\rho\beta\exp(\theta\bar{x}))}
In order to evaluate the accuracy of this approximation, we use two criteria:

E_1 = 100\,\frac{1}{N}\sum_{t=1}^{N}\left|\frac{y_t - \tilde{y}_t}{y_t}\right|   and   E_\infty = 100\,\max_t\left|\frac{y_t - \tilde{y}_t}{y_t}\right|

where y_t denotes the true solution for the price-dividend ratio and \tilde{y}_t is the approximation. E_1 is the average relative error made using the approximation rather than the true solution, while E_\infty is the maximal relative error. These criteria are evaluated over the interval x_t \in [\bar{x} - \Delta\sigma_x; \bar{x} + \Delta\sigma_x], where \Delta is selected such that we explore
99.99% of the distribution of x. Table 2.1 reports E_1 and E_\infty for the different cases. Our benchmark experiment amounts to considering Mehra and Prescott's [1985] parameterization of the asset pricing model. We therefore set the mean of the rate of growth of dividends to \bar{x} = 0.0179, its persistence to \rho = -0.139 and the volatility of the innovations to \sigma = 0.0348. These values are consistent with the properties of consumption growth in annual data from 1889 to 1979. \theta was set to -1.5, the value widely used in the literature, and \beta to 0.95, which is standard for an annual frequency. We then investigate the implications of changes in these parameters in terms of accuracy. In particular, we study the implications of larger and lower impatience, higher volatility, larger curvature of the utility function and more persistence in the rate of growth of dividends.
Table 2.1: Accuracy check

                 E_1     E_inf                   E_1     E_inf
Benchmark        1.43    1.46    theta = 0.5     0.29    0.29
beta  = 0.5      0.24    0.26    sigma = 0.001   0.01    0.03
beta  = 0.99     2.92    2.94    sigma = 0.1     11.70   11.72
theta = -10      23.53   24.47   rho   = 0       1.57    1.57
theta = -5       8.57    8.85    rho   = 0.5     5.52    6.76
theta = 0        0.50    0.51    rho   = 0.9     37.50   118.94

Note: The series defining the true solution was truncated after 800 terms, as no significant improvement was found from adding additional terms at machine accuracy. When exploring variations in \rho, the overall volatility of the rate of growth of dividends was maintained at its benchmark level.
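Under the reconstructed formulas for a_i, b_i and \phi above, the comparison between the exact solution and its linear approximation can be sketched in a few lines of Matlab (the truncation at 800 terms follows the note to the table; the width of the grid, 4 unconditional standard deviations, is an assumption):

theta=-1.5; beta=0.95; xbar=0.0179; rho=-0.139; sigma=0.0348;
i  = (1:800)';                               % truncation of the series
ai = theta*xbar*i + (theta^2*sigma^2)/(2*(1-rho)^2) ...
     *(i - 2*rho*(1-rho.^i)/(1-rho) + rho^2*(1-rho.^(2*i))/(1-rho^2));
bi = theta*rho*(1-rho.^i)/(1-rho);
sx = sigma/sqrt(1-rho^2);                    % unconditional std. dev. of x
xt = linspace(xbar-4*sx,xbar+4*sx,201);      % grid over the support of x
ye = (beta.^i.*exp(ai))'*exp(bi*(xt-xbar));  % exact solution, eq. (2.8)
ys = beta*exp(theta*xbar)/(1-beta*exp(theta*xbar));     % steady state y*
phi= theta*rho*beta*exp(theta*xbar)/((1-beta*exp(theta*xbar)) ...
     *(1-rho*beta*exp(theta*xbar)));
yl = ys + phi*(xt-xbar);                     % linear approximation
E1 = 100*mean(abs((ye-yl)./ye));             % average relative error
Ei = 100*max(abs((ye-yl)./ye));              % maximal relative error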
[Figure: the exact solution (for high and low \sigma) and the linear approximation, plotted as functions of x_t.]
The figure reveals that the linear approximation involves two types of errors:

1. in terms of curvature,
2. in terms of level.

The first type of error is obvious, as the linear approximation is not intended (as it cannot) to capture any curvature. The second type of error is related to the fact that we are using an approximation about the deterministic steady state. Therefore, the latter source of error is related to the risk component. In fact, this may be understood in light of the a_i terms in the exact solution, which include the volatility of the innovations, \sigma. In order to make sure that this error is indeed related to this component, we also report the exact solution when we cut the overall volatility by 25% (thick dashed line). As can be seen, the level error then diminishes dramatically, which indicates that the risk component plays a major role here: the average error is then cut by 20% (5% as far as the maximal error is concerned). Hence, this suggests that the linear approximation may only be accurate for low enough variability and curvature, which prevents its use for studying structural breaks.
2.2 Asymmetries

We now consider another situation where the linear approximation may perform poorly. This situation is related to the existence of strong asymmetries in decision rules, or strong asymmetries in the objective functions the economic agents have to optimize. In order to illustrate this situation, let us take the problem of a firm that has to decide on employment and which faces asymmetric adjustment costs. Asymmetric adjustment costs may be justified on institutional grounds. We may argue, for example, that there exist laws in the economy that render firings more costly than hirings.

We consider the case of a firm that has to decide on its level of employment. The firm is infinitely lived and produces a good relying on a technology that essentially uses labor (another way to think of it would be to assume that physical capital is a fixed factor).
The firm chooses its employment plan so as to maximize the expected discounted sum of its profits, net of the adjustment costs C(\lambda_t) it incurs when hiring or firing, subject to the law of motion of employment

n_t = \lambda_t + n_{t-1}    (2.12)

where \lambda_t denotes net hirings. The first order condition associated with this problem takes the form

C'(\lambda_t) = A - w_t + \frac{1}{1+r} E_t C'(\lambda_{t+1})    (2.13)

where A is the marginal product of labor and w_t the real wage. Solving this condition forward yields

C'(\lambda_t) = E_t \sum_{j=0}^{\infty} \left(\frac{1}{1+r}\right)^j (A - w_{t+j})    (2.14)
[Figure: the asymmetric adjustment cost function, plotted as a function of \lambda_t.]
Note that if the real wage follows the AR(1) process w_t = \gamma w_{t-1} + (1-\gamma)\bar{w} + \varepsilon_t, then

E_t(w_{t+j}) = \gamma^j w_t + (1 - \gamma^j)\bar{w}

such that the sum in (2.14) may be computed explicitly:

C'(\lambda_t) = \frac{(1+r)A}{r} - \frac{1+r}{1+r-\gamma} w_t - \left( \frac{1+r}{r} - \frac{1+r}{1+r-\gamma} \right)\bar{w}

Then \lambda_t is given by

\lambda_t = \Phi(w_t) \equiv C'^{-1}\left( \frac{(1+r)A}{r} - \frac{1+r}{1+r-\gamma} w_t - \left( \frac{1+r}{r} - \frac{1+r}{1+r-\gamma} \right)\bar{w} \right)

and we have

n_t = \Phi(w_t) + n_{t-1}
Since C'(.) may exhibit strong asymmetries, the decision rule may be extremely nonlinear too, yielding a decision rule of the form depicted in figure (2.3). As can be seen from the graph, the linear approximation would do a very poor job, as any departure from the steady state level (\lambda^* = 0) would create a large error. In other words, and as should have been expected, strong nonlinearities forbid the use of linear approximations.

Beyond this point, which may appear quite peculiar since such important nonlinearities are barely encountered after all, there exists a large class of models for which linear approximation does a bad job: models with binding constraints, which we now investigate.
2.3 Binding constraints

In this section, we provide an example where linear approximation should not be used because the decision rules are not differentiable. This is the case when the agent faces possibly binding constraints. To illustrate it, we will develop a model of a consumer who is constrained on her borrowing in the financial market.
We consider the case of a household who determines her consumption/saving plans in order to maximize her lifetime expected utility, which is characterized by the function

E_t \sum_{\tau=0}^{\infty} \beta^\tau \frac{c_{t+\tau}^{1-\sigma} - 1}{1-\sigma}    (2.15)

In each period, the household receives an exogenous stochastic income \omega_t,    (2.16)
with innovations \varepsilon_t \sim N(0,\sigma_\varepsilon^2). These revenues are then used to consume and purchase assets on the financial market. Therefore, the budget constraint in t is given by

a_{t+1} = (1+r)a_t + \omega_t - c_t    (2.17)

In addition, the household cannot borrow, such that a_{t+1} \geq 0.
The first order conditions of this model may be obtained by forming the Lagrangean associated with the maximization of (2.15) subject to (2.17) and the borrowing constraint,⁵ which yields    (2.18)

c_t^{-\sigma} = \mu_t + \beta(1+r) E_t c_{t+1}^{-\sigma}    (2.19)
a_{t+1} = (1+r)a_t + \omega_t - c_t    (2.20)
\mu_t a_{t+1} = 0    (2.21)
\mu_t \geq 0    (2.22)
a_{t+1} \geq 0    (2.23)

where \mu_t is the multiplier associated with the borrowing constraint. These conditions may be summarized by

c_t = \min\left( (1+r)a_t + \omega_t \;,\; \left[ \beta(1+r) E_t c_{t+1}^{-\sigma} \right]^{-1/\sigma} \right)

The decision rule for consumption is not differentiable at the point where asset holdings are just sufficient to guarantee a positive net position on asset holdings, i.e. where

\left( (1+r)a_t + \omega_t \right)^{-\sigma} = \beta(1+r) E_t c_{t+1}^{-\sigma}

⁵ E_t(.) denotes the mathematical conditional expectation. Expectations are conditional on information available at the beginning of period t.
Just to give you an idea of this phenomenon, figure 2.4 reports the consumption decision rule for two different values of \omega_t as a function of cash-on-hand, which is given by (1+r)a_t + \omega_t. This non-differentiability obviously implies that linear approximations cannot be useful in this case, as they are not defined in the neighborhood of the point that makes the household switch from the unconstrained to the constrained regime. Nevertheless, if we are
[Figure 2.4: Consumption decision rules for two values of \omega_t, plotted against cash-on-hand (1+r)a_t + \omega_t.]
to consider an economy with tiny shocks and where the steady state lies in the unconstrained regime, the linear approximation may be sufficient, as the decision rule is particularly smooth in this region (because of consumption smoothing).

We therefore need to investigate alternative methods, which however require some preliminaries.
Bibliography

Burnside, C., Solving asset pricing models with Gaussian shocks, Journal of Economic Dynamics and Control, 1998, 22, 329–340.

Christiano, L., Solving the Stochastic Growth Model by Linear Quadratic Approximation and by Value Function Iteration, Journal of Business and Economic Statistics, 1990, 8 (1), 23–26.

Collard, F. and M. Juillard, Accuracy of Stochastic Perturbation Methods: The Case of Asset Pricing Models, Journal of Economic Dynamics and Control, 2001, 25 (6/7), 979–999.

Marcet, A., Solving Nonlinear Stochastic Models by Parametrizing Expectations, mimeo, Carnegie-Mellon University, 1988.

Marcet, A. and G. Lorenzoni, The Parameterized Expectations Approach: Some Practical Issues, in R. Marimon and A. Scott, editors, Computational Methods for the Study of Dynamic Economies, Oxford: Oxford University Press, 1999, pp. 143–171.

McGrattan, E.R., Solving the Stochastic Growth Model with a Finite Element Method, Journal of Economic Dynamics and Control, 1996, 20, 19–42.

McGrattan, E.R., Application of Weighted Residual Methods to Dynamic Economic Models, in R. Marimon and A. Scott, editors, Computational Methods for the Study of Dynamic Economies, Oxford: Oxford University Press, 1999, pp. 114–142.

Mehra, R. and E.C. Prescott, The equity premium: a puzzle, Journal of Monetary Economics, 1985, 15, 145–161.

Rietz, T.A., The equity risk premium: a solution, Journal of Monetary Economics, 1988, 22, 117–131.

Tauchen, G. and R. Hussey, Quadrature Based Methods for Obtaining Approximate Solutions to Nonlinear Asset Pricing Models, Econometrica, 1991, 59 (2), 371–396.
Lecture Notes 3
Approximation Methods
In this chapter, we deal with a very important problem that we will encounter
in a wide variety of economic problems: approximation of functions. Such
a problem commonly occurs when it is too costly either in terms of time or
complexity to compute the true function or when this function is unknown
and we just need to have a rough idea of its main properties. Usually the only
thing that is required then is to be able to compute this function at one or a
few points and formulate a guess for all other values. This leaves us with some
choice concerning either the local or global character of the approximation
and the level of accuracy we want to achieve. As we will see in different
applications, choosing the method is often a matter of efficiency and ease of
computing.
Following Judd [1998], we will consider 3 types of approximation methods:

1. Local approximations, which essentially exploit information on the value of the function at one point and on its derivatives at the same point. The idea is then to obtain a (hopefully) good approximation of the function in a neighborhood of the benchmark point.

2. Lp approximations, which actually find a nice function that is close to the function we want to evaluate in the sense of an Lp norm. Ideally, we would need information on the whole function to find a good approximation, which is usually infeasible, or which would make the problem of approximation totally irrelevant! Therefore, we usually rely on interpolation, which then appears as the other side of the same problem, but which only requires knowledge of the function at some points.

3. Regressions, which may be viewed as an intermediate situation between the two preceding cases, as they rely, exactly as in econometrics, on m data points to find the n parameters of the approximating function.
3.1 Local approximations

3.1.1 Taylor series expansion

Taylor series expansion is certainly the most well-known and most natural approximation for any student.
The basic framework: This approximation relies on the standard Taylor theorem:

Theorem 1 Suppose F : R^n \rightarrow R is a C^{k+1} function. Then for x^* \in R^n we have

F(x) = F(x^*) + \sum_{i=1}^{n} \frac{\partial F}{\partial x_i}(x^*)(x_i - x_i^*)
+ \frac{1}{2}\sum_{i_1=1}^{n}\sum_{i_2=1}^{n} \frac{\partial^2 F}{\partial x_{i_1}\partial x_{i_2}}(x^*)(x_{i_1}-x_{i_1}^*)(x_{i_2}-x_{i_2}^*) + \ldots
+ \frac{1}{k!}\sum_{i_1=1}^{n}\ldots\sum_{i_k=1}^{n} \frac{\partial^k F}{\partial x_{i_1}\ldots\partial x_{i_k}}(x^*)(x_{i_1}-x_{i_1}^*)\ldots(x_{i_k}-x_{i_k}^*)
+ O\left( \|x - x^*\|^{k+1} \right)
If we use the Taylor expansion up to order k as an approximation of F around x^*, then we are sure that the error will be at most of order O(\|x-x^*\|^{k+1}).¹

In fact, we may look at Taylor series expansions from a slightly different point of view, namely that of power series. Consider the series

\sum_{k=0}^{\infty} \alpha_k (x - x^*)^k  with  \alpha_k = \frac{1}{k!}\frac{\partial^k F}{\partial x^k}(x^*)

and define

r = \sup\left\{ |x| : \sum_{k=0}^{\infty} \|\alpha_k x^k\| < \infty \right\}

r therefore provides the maximal radius of x \in C for which the complex series converges. That is, for any x \in C such that |x| < r, the series converges, while it diverges for |x| > r. F is then said to be analytic at x^* if

F(x) = \sum_{k=0}^{\infty} \alpha_k (x - x^*)^k  for  \|x - x^*\| < r

¹ Let us recall at this point that a function f : R^n \rightarrow R^k is O(x^\ell) if \lim_{x\rightarrow 0} \|f(x)\|/\|x\|^\ell < \infty.
For example, let us consider the tangent function tan(x), which is defined by the ratio of two analytic functions

\tan(x) = \frac{\sin(x)}{\cos(x)}

with

\sin(x) = \sum_{n=0}^{\infty} \frac{(-1)^n}{(2n+1)!} x^{2n+1}   and   \cos(x) = \sum_{n=0}^{\infty} \frac{(-1)^n}{(2n)!} x^{2n}
As another example, consider the Taylor series expansion of F(x) = \log(1-x) around x^* = 0,

\sum_{k=0}^{\infty} \frac{1}{k!}\frac{\partial^k F}{\partial x^k}(x^*)(x - x^*)^k = -\sum_{k=1}^{\infty} \frac{x^k}{k}

The function has a singularity at x^o = 1. What the theorem tells us is that this approximation may be used for values of x such that \|x - x^*\| is below \|x^* - x^o\| = \|0 - 1\| = 1, that is for -1 < x < 1. In other words, the radius of convergence of the Taylor series is 1. To illustrate this, we report in table 3.1 the "true" value of \log(1-x), its approximate value \Psi_{100}(x) using 100 terms in the summation, and the absolute deviation of this approximation from the true value.² As can be seen from the table, as soon as \|x - x^*\| approaches the radius, the approximation by the Taylor series expansion performs more and more poorly.
Table 3.1: Taylor series expansion for log(1-x)

x          log(1-x)      Psi_100(x)    |deviation|
-0.9999     0.69309718    0.68817193   0.00492525
-0.9900     0.68813464    0.68632282   0.00181182
-0.9000     0.64185389    0.64185376   1.25155e-007
-0.5000     0.40546511    0.40546511   1.11022e-016
 0.0000     0.00000000    0.00000000   0
 0.5000    -0.69314718   -0.69314718   2.22045e-016
 0.9000    -2.30258509   -2.30258291   2.18735e-006
 0.9900    -4.60517019   -4.38945277   0.215717
 0.9999    -9.21034037   -5.17740221   4.03294

² The word "true" lies between quotes as it has been computed by the computer, and is therefore itself an approximation!
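The entries of the table can be reproduced with a few lines of Matlab (the sketch below assumes a recent Matlab version with implicit array expansion):

x   = [-0.9999 -0.99 -0.9 -0.5 0 0.5 0.9 0.99 0.9999]';
k   = 1:100;
Psi = -(x.^k)*(1./k');          % Psi_100(x) = -sum_{k=1}^{100} x^k/k
err = abs(log(1-x) - Psi);      % absolute deviation from the "true" value
disp([x log(1-x) Psi err])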
As an economic application, consider the two equations that characterize the solution of the deterministic optimal growth model:³

\frac{1}{c_t} = \beta\frac{1}{c_{t+1}}\left( \alpha k_{t+1}^{\alpha-1} + 1 - \delta \right)    (3.1)

k_{t+1} = k_t^\alpha - c_t + (1-\delta)k_t    (3.2)

Linearization: let us denote by \hat{k}_t the deviation of the capital stock k_t with respect to its steady state level in period t, such that \hat{k}_t = k_t - k^*. Likewise, we define \hat{c}_t = c_t - c^*. The first step of linearization is to re-express the system as F(k_{t+1}, c_{t+1}, k_t, c_t) = 0 with

F(k_{t+1}, c_{t+1}, k_t, c_t) = \begin{pmatrix} k_{t+1} - k_t^\alpha + c_t - (1-\delta)k_t \\ \frac{1}{c_t} - \beta\frac{1}{c_{t+1}}\left( \alpha k_{t+1}^{\alpha-1} + 1 - \delta \right) \end{pmatrix}

and to take a first order Taylor expansion around the steady state:

F_1(k^*,c^*,k^*,c^*)\hat{k}_{t+1} + F_2(k^*,c^*,k^*,c^*)\hat{c}_{t+1} + F_3(k^*,c^*,k^*,c^*)\hat{k}_t + F_4(k^*,c^*,k^*,c^*)\hat{c}_t = 0

³ Since we just want to make the case for Taylor series expansions, we do not need to be any more precise on the origin of these two equations. We therefore take them as given, and will come back to their determination in the sequel.
Computing the derivatives, the second equation yields

-\frac{1}{c^{*2}}\hat{c}_t + \frac{\beta(\alpha k^{*\alpha-1}+1-\delta)}{c^{*2}}\hat{c}_{t+1} - \frac{\beta}{c^*}\alpha(\alpha-1)k^{*\alpha-2}\hat{k}_{t+1} = 0

which, using the fact that \beta(\alpha k^{*\alpha-1}+1-\delta) = 1 at the steady state, simplifies to give the system

\hat{k}_{t+1} - \alpha k^{*\alpha-1}\hat{k}_t + \hat{c}_t - (1-\delta)\hat{k}_t = 0
-\hat{c}_t + \hat{c}_{t+1} - \alpha\beta(\alpha-1)\frac{c^*}{k^*}k^{*\alpha-1}\hat{k}_{t+1} = 0

We then have to solve the implied linear dynamic system, but this is another story that we will deal with in a couple of chapters.
Log-linearization: A second common practice is to take a log-linear approximation to the equilibrium. Such an approximation is usually taken because it delivers a natural interpretation of the coefficients in front of the variables: these can be interpreted as elasticities. Indeed, let us consider a one-dimensional function f(x) and let us assume that we want to take a log-linear approximation of f around x^*. This would amount to using, as deviation, a log-deviation rather than a simple deviation, such that we define

\hat{x} = \log(x) - \log(x^*)

Then a restatement of the problem is in order, as we are to take an approximation with respect to \log(x):

f(x) = f(\exp(\log(x)))

which leads to the following first order Taylor expansion

f(x) \simeq f(x^*) + f'(\exp(\log(x^*)))\exp(\log(x^*))\hat{x} = f(x^*) + f'(x^*)x^*\hat{x}

If we apply this technique to the growth model, we end up with the following system

k^*\hat{k}_{t+1} - \alpha k^{*\alpha}\hat{k}_t + c^*\hat{c}_t - (1-\delta)k^*\hat{k}_t = 0
-\hat{c}_t + \hat{c}_{t+1} - (\alpha-1)(1-\beta(1-\delta))\hat{k}_{t+1} = 0
3.2 Regressions as approximation

This type of approximation is particularly common in economics, as it just corresponds to ordinary least squares (OLS). As you know, the problem essentially amounts to approximating a function F by another function G of exogenous variables. In other words, we have a set of observable endogenous variables y_i, i = 1 \ldots N, which we are willing to explain in terms of a set of exogenous variables X_i = \{x_{1i},\ldots,x_{ki}\}, i = 1 \ldots N. This problem amounts to finding the set of parameters \theta that solves

\min_\theta \sum_{i=1}^{N} \left( y_i - G(X_i;\theta) \right)^2    (3.3)

The idea is therefore that \theta is chosen such that, on average, G(X;\theta) is close enough to y, such that G delivers a good approximation of the true function F in the region of x that we consider. This is actually nothing else than econometrics! There are however several choices that can be made, and that give us much more freedom than in econometrics. We now investigate these choices.
Number of data points: In econometrics, the analyst cannot choose the number of data points that can be used to reveal information on the F function, as data are given by history. In numerical analysis this constraint is relaxed: we may, for example, be exactly identified, meaning that the number of data points, N, is exactly equal to the number of parameters, p, we want to reveal. Never would a good econometrician do this kind of thing! Obviously, we can instead impose a situation where N > p in order to exploit more information. The difference between these two choices should be clear to you: in the first case we are sure that the approximation will be exact at the selected points, whereas this will not necessarily be the case in the second experiment. To see this, assume we have a sample made of 2 data points for a function that we want to approximate using a linear function. We need exactly 2 parameters to define a linear function: the intercept \alpha and the slope \beta. In such a case, (3.3) rewrites

\min_{\{\alpha,\beta\}} (y_1 - \alpha - \beta x_1)^2 + (y_2 - \alpha - \beta x_2)^2

The first order conditions form the system

\begin{pmatrix} 1 & 1 \\ x_1 & x_2 \end{pmatrix}\begin{pmatrix} y_1 - \alpha - \beta x_1 \\ y_2 - \alpha - \beta x_2 \end{pmatrix} = 0 \;\equiv\; A.v = 0

This system then just amounts to finding the null space of the matrix A which, in the case x_1 \neq x_2 (which we can always impose, as we will see next), imposes v_i = 0, i = 1, 2. This therefore leads to

y_1 = \alpha + \beta x_1
y_2 = \alpha + \beta x_2

such that the approximation is exact. When the system is over-identified, this is not the case anymore.
Selection of data points: A second choice concerns the selection of the data points themselves. To see why it matters, consider the case of the capital stock, which is an extremely persistent variable, as it cumulates investment according to

k_t = \sum_{\ell=0}^{\infty} (1-\delta)^\ell i_{t-\ell}

Therefore, taking powers of the capital stock as a basis of exogenous variables is basically a bad idea. To see this, let us assume that \delta = 0.025 and that i_t is a white noise process with volatility 0.1; let us then simulate a 1000 data point process for the capital stock and compute the correlation matrix of k_t^j, j = 1,\ldots,4. We get:
          k_t      k_t^2    k_t^3    k_t^4
k_t      1.0000   0.9835   0.9572   0.9315
k_t^2    0.9835   1.0000   0.9933   0.9801
k_t^3    0.9572   0.9933   1.0000   0.9963
k_t^4    0.9315   0.9801   0.9963   1.0000
implying a very high correlation between the powers of the capital stock, and therefore raising the possibility of multicolinearity. A typical answer to this problem is to rely on orthogonal polynomials rather than monomials. We will discuss this issue extensively in the next section. A second possibility is to rely on parsimonious approaches that do not require too much information in terms of function specification; an alternative is then to use neural networks.

Before going to neural network approximations, we first have to deal with a potential problem present in all the examples given so far: when we solve a model, the true decision rule is unknown, such that we do not know the function we are dealing with. However, the main properties of the decision rule are known; in particular, we know that it has to satisfy some conditions imposed by economic theory. As an example, let us attempt to find an approximate solution for the consumption decision rule in the deterministic optimal growth model. Economic theory teaches us that consumption should satisfy the Euler equation
c_t^{-\sigma} = \beta c_{t+1}^{-\sigma}\left( \alpha k_{t+1}^{\alpha-1} + 1 - \delta \right)    (3.4)

together with the law of motion of capital

k_{t+1} = k_t^\alpha - c_t + (1-\delta)k_t

Let us assume that consumption may be approximated by the quadratic rule

c_t \simeq \Psi(k_t;\theta) = \theta_0 + \theta_1 k_t + \theta_2 k_t^2    (3.5)

over the interval [\underline{k}; \bar{k}]. Our problem is then to find the triple \theta = \{\theta_0,\theta_1,\theta_2\} such that

\sum_{t=1}^{N} R(k_t;\theta)^2  with  R(k_t;\theta) \equiv \Psi(k_t;\theta)^{-\sigma} - \beta\Psi(k_{t+1};\theta)^{-\sigma}\left( \alpha k_{t+1}^{\alpha-1} + 1 - \delta \right)

is minimal. This actually amounts to solving a nonlinear least squares problem. Note however that a lot of structure is put on this problem, as k_{t+1} has to satisfy the law of motion of capital.
The last step of the procedure then checks convergence: if the quantity

\sum_{t=1}^{N} R(k_t;\theta)^2

is small enough, the algorithm stops; otherwise \theta is updated and the procedure iterated.⁴

⁴ The "true" decision rule against which we evaluate the approximation was computed using value iteration, which we shall study later.
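A minimal Matlab sketch of this nonlinear least squares problem is given below. The parameter values (alpha, beta, delta, sigma), the grid, the initial guess and the use of fminsearch are all illustrative assumptions, not taken from the text:

function approx_rule
alpha=0.3; beta=0.95; delta=0.1; sigma=1.5;              % assumed parameters
ks = ((1-beta*(1-delta))/(alpha*beta))^(1/(alpha-1));    % steady state capital
k  = linspace(0.2*ks,1.8*ks,20)';                        % grid over [k_lo,k_hi]
th = fminsearch(@(th) sum(R(th,k,alpha,beta,delta,sigma).^2),[0.2;0.3;0]);
disp(th')
function r = R(th,k,alpha,beta,delta,sigma)
psi = @(k) max(th(1)+th(2)*k+th(3)*k.^2,1e-8);   % approximate rule, kept positive
c   = psi(k);
kp  = k.^alpha - c + (1-delta)*k;                % law of motion of capital
r   = c.^(-sigma) - beta*psi(kp).^(-sigma).*(alpha*kp.^(alpha-1)+1-delta);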
As can be seen from the graph, the solution obtained with our approximation is rather accurate, and we actually have

\frac{1}{N}\sum_{i=1}^{N}\left|\frac{c_i^{true} - c_i^{LS}}{c_i^{true}}\right| = 8.743369e{-}4   and   \max_{i\in\{1,\ldots,N\}}\left|\frac{c_i^{true} - c_i^{LS}}{c_i^{true}}\right| = 0.005140

such that the error we are making is particularly small. Indeed, using the approximate decision rule, an agent would make a maximal error of 0.51% in her economic calculus, i.e. 51 cents for every 100$ spent on consumption.
[Figure 3.2: Least-square approximation of consumption (consumption plotted against the capital stock).]
Neural networks: A single-layer neural network delivers an approximation of the form

G(x;\theta) = h\left( \sum_{i=1}^{n} \theta_i g(x^{(i)}) \right)

where x is the vector of inputs, and h and g are scalar functions. A common choice for g in the case of the single-layer neural network is g(x) = x. Therefore, if we set h to be the identity too, we are back to the standard OLS model with monomials.
[Figure 3.3: Neural networks. Panel (a): single-layer network; panel (b): hidden-layer feedforward network.]
A second, and perhaps much more interesting, type of neural network is depicted in panel (b) of figure 3.3. This type of neural network is called the hidden-layer feedforward network. In this specification we are closer to the idea of a network, as the transformed input is fed to another node that processes it to extract information. The associated functional form is given by

G(x;\theta,\gamma) = f\left( \sum_{j=1}^{m} \theta_j h\left( \sum_{i=1}^{n} \gamma_{ij} g(x^{(i)}) \right) \right)
Common choices for the activation function h are the step function

h(x) = \begin{cases} 1 & \text{for } x > 0 \\ 0 & \text{for } x < 0 \end{cases}

the sigmoid function

h(x) = \frac{1}{1+\exp(-x)}

or the cumulative distribution function of a Gaussian random variable

h(x) = \int_{-\infty}^{x} \frac{1}{\sqrt{2\pi}} \exp\left( -\frac{u^2}{2} \right) du

Obtaining an approximation then simply amounts to determining the set of coefficients \{\theta_j, \gamma_{ij};\; i = 1\ldots n,\; j = 1,\ldots,m\}. This can simply be achieved by nonlinear least squares:

\min_{\{\theta,\gamma\}} \sum_{\ell=1}^{N} \left( y_\ell - G(x_\ell;\theta,\gamma) \right)^2
One particularly nice feature of neural networks is that they deliver accurate approximations using only few parameters. This characteristic is related to the high flexibility of the approximating functions they use. Further, as established by the following theorem of Hornik, Stinchcombe and White [1989], neural networks are extremely powerful in that they offer a universal approximation method.

Theorem 3 Let f : R^n \rightarrow R be a continuous function to be approximated, and let h : R \rightarrow R be a continuous function such that either (i) \int h(x)dx is finite and nonzero, or (ii) h is a squashing function. Let

\Sigma(h) = \left\{ g : R^n \rightarrow R \;:\; g(x) = \sum_{j=1}^{m} a_j h(w_j'x + b_j),\; a_j, b_j \in R,\; w_j \in R^n,\; w_j \neq 0,\; m = 1, 2, \ldots \right\}

be the set of all possible single hidden-layer feedforward neural networks using h as the hidden layer activation function. Then, for all \varepsilon > 0, probability measures \mu, and compact sets K \subset R^n, there is a g \in \Sigma(h) such that

\sup_{x\in K} |f(x) - g(x)| \leq \varepsilon   and   \int |f(x) - g(x)| d\mu \leq \varepsilon
To illustrate the method, let us approximate the function

F(x) = \min\left( \max\left( -\frac{3}{2}, \left(x - \frac{1}{2}\right)^3 \right), 2 \right)

over the interval [-3; 3], and consider a single hidden-layer feedforward network made of two sigmoid nodes,

\tilde{F}(x;\theta,\gamma,\delta) = \frac{\theta_1}{1+\exp(-(\gamma_1 x + \delta_1))} + \frac{\theta_2}{1+\exp(-(\gamma_2 x + \delta_2))}

Nonlinear least squares estimation delivers the parameter estimates (the reported values are 6.8424, -10.0893, -1.5091, -7.6414 and -2.9238), and the resulting accuracy is

\left( \frac{1}{N}\sum_{i=1}^{N}\left( F(x_i) - \tilde{F}(x_i;\hat{\theta}) \right)^2 \right)^{1/2} = 0.0469   and   \max_i \left| F(x_i) - \tilde{F}(x_i;\hat{\theta}) \right| = 0.2200
i
True
Neural Network
1.5
1
0.5
0
0.5
1
1.5
2
3
0
x
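A sketch of how such a two-sigmoid network can be fitted by nonlinear least squares is given below (the grid, the starting values and the use of fminsearch are assumptions made for self-containedness):

F   = @(x) min(max(-1.5,(x-0.5).^3),2);     % function to approximate
x   = linspace(-3,3,201)';  y = F(x);       % evaluation grid and data
net = @(p,x) p(1)./(1+exp(-(p(2)*x+p(3)))) + p(4)./(1+exp(-(p(5)*x+p(6))));
ssr = @(p) sum((y-net(p,x)).^2);            % sum of squared residuals
p   = fminsearch(ssr,[3;2;1;-2;2;-2],optimset('MaxFunEvals',1e5,'MaxIter',1e5));
e2  = sqrt(mean((y-net(p,x)).^2));          % average error
ei  = max(abs(y-net(p,x)));                 % maximal error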
All these methods actually relied on regressions. They are simple, but they may be either totally unreliable and ill-conditioned in a number of problems, or quite demanding from a computational point of view. Orthogonal polynomials offer a way around these difficulties.
3.2.1 Orthogonal polynomials

This class of polynomials possesses, by definition, the orthogonality property, which will prove to be extremely efficient and useful in a number of problems. For instance, it will solve the multicolinearity problem we encountered with OLS. This property will also greatly simplify the computation of the approximation in a number of problems we will deal with in the sequel. First of all, we need to introduce some preliminary concepts.

Definition 4 (Weighting function) A weighting function \omega(x) on the interval [a; b] is a function that is positive almost everywhere on [a; b] and has a finite integral on [a; b].

An example of such a weighting function is \omega(x) = (1-x^2)^{-1/2} over the interval [-1; 1]. Indeed, \omega(x) is positive everywhere on (-1; 1), and

\int_{-1}^{1} (1-x^2)^{-1/2} dx = \left[ \arcsin(x) \right]_{-1}^{1} = \pi

which is finite.
Definition 5 (Inner product) Let us consider two functions f_1(x) and f_2(x), both defined at least on [a; b]. The inner product with respect to the weighting function \omega(x) is given by

\langle f_1, f_2 \rangle = \int_{a}^{b} f_1(x) f_2(x) \omega(x) dx

For example, taking f_1(x) = 1, f_2(x) = x and \omega(x) = (1-x^2)^{-1/2} on [-1; 1], we have

\langle f_1, f_2 \rangle = \int_{-1}^{1} \frac{x}{\sqrt{1-x^2}} dx = \left[ -\sqrt{1-x^2} \right]_{-1}^{1} = 0

Hence, in this case, the inner product of 1 and x with respect to \omega(x) on the interval [-1; 1] is identically null. This is what actually defines the orthogonality property.
Definition 6 (Orthogonal Polynomials) The family of polynomials \{P_n(x)\} is mutually orthogonal with respect to \omega(x) iff

\langle P_i, P_j \rangle = 0  for  i \neq j

Definition 7 (Orthonormal Polynomials) The family of polynomials \{P_n(x)\} is mutually orthonormal with respect to \omega(x) iff it is orthogonal and

\langle P_i, P_i \rangle = 1  for all  i

Table 3.3 reports the most common families of orthogonal polynomials (see Judd [1998] for a more detailed exposition), and table 3.4 their recursive formulation.

Table 3.3: Orthogonal polynomials (definitions)

Family     \omega(x)          [a; b]              Definition
Legendre   1                  [-1; 1]             P_n(x) = \frac{(-1)^n}{2^n n!}\frac{d^n}{dx^n}\left[(1-x^2)^n\right]
Chebychev  (1-x^2)^{-1/2}     [-1; 1]             T_n(x) = \cos(n\cos^{-1}(x))
Laguerre   \exp(-x)           [0; \infty)         L_n(x) = \frac{\exp(x)}{n!}\frac{d^n}{dx^n}\left(x^n\exp(-x)\right)
Hermite    \exp(-x^2)         (-\infty; \infty)   H_n(x) = (-1)^n\exp(x^2)\frac{d^n}{dx^n}\exp(-x^2)

Table 3.4: Orthogonal polynomials (recursive representation)

Family     Initialization               Recursion
Legendre   P_0(x) = 1, P_1(x) = x       P_{n+1}(x) = \frac{2n+1}{n+1}xP_n(x) - \frac{n}{n+1}P_{n-1}(x)
Chebychev  T_0(x) = 1, T_1(x) = x       T_{n+1}(x) = 2xT_n(x) - T_{n-1}(x)
Laguerre   L_0(x) = 1, L_1(x) = 1-x     L_{n+1}(x) = \frac{2n+1-x}{n+1}L_n(x) - \frac{n}{n+1}L_{n-1}(x)
Hermite    H_0(x) = 1, H_1(x) = 2x      H_{n+1}(x) = 2xH_n(x) - 2nH_{n-1}(x)

3.3 Least square orthogonal polynomial approximation

We now discuss one of the most common approaches to approximation, which goes back to the easy OLS approach we have seen previously. The problem consists in choosing, among all polynomials g of degree at most n, the one closest to F in the L^2 norm associated with the weighting function \omega:

\min_{\deg(g)\leq n} \int_a^b \left( F(x) - g(x) \right)^2 \omega(x) dx

Writing g(x) = \sum_{i=0}^{n} c_i \varphi_i(x), where \{\varphi_i\} is a family of polynomials orthogonal with respect to \omega, the first order conditions are

\left\langle F - \sum_{i=0}^{n} c_i\varphi_i \,,\, \varphi_j \right\rangle = 0  for  j = 0,\ldots,n

which yields

c_i = \frac{\langle F, \varphi_i \rangle}{\langle \varphi_i, \varphi_i \rangle}

Therefore, we have

F(x) \simeq \sum_{i=0}^{n} \frac{\langle F, \varphi_i \rangle}{\langle \varphi_i, \varphi_i \rangle}\,\varphi_i(x)
There are several examples of the use of this type of approximation. Fourier approximation is one of them, suitable for periodic functions. Nevertheless, I will focus in a coming section on one particular approach, which we will use quite often in the next chapters: Chebychev approximation.

Beyond least square approximation, there exist other approaches that depart from least squares in the norm they use, for instance uniform approximation, which attempts to find a sequence of polynomials p_n such that

\lim_{n\rightarrow\infty} \max_{x\in[a;b]} |F(x) - p_n(x)| = 0

There are theorems in approximation theory stating that the quality of this approximation improves polynomially as the degree of the polynomial increases, and that the rate of convergence is faster the smoother the function to be approximated.
3.4 Interpolation methods

Up to now, we have seen that there exist methods to compute the value of a function at some particular points. However, in most cases we are not only interested in the function at some points, but also between these points. This is the problem of interpolation.

3.4.1 Linear interpolation

Given two data points (x_{i-1}, y_{i-1}) and (x_i, y_i), linear interpolation approximates the function between the two nodes by the straight line through them:

y \simeq \frac{y_i - y_{i-1}}{x_i - x_{i-1}}\,x + \frac{x_i y_{i-1} - x_{i-1} y_i}{x_i - x_{i-1}}

Although such approximations have proved very useful in a lot of applications, they can be particularly inefficient for different reasons:

1. linear interpolation does not deliver an approximating function, but rather a collection of approximations for each interval;

2. it requires identifying the interval where the approximation is to be computed, which may be particularly costly when the grid is not uniform;

3. it can obviously perform very badly for nonlinear functions.

Therefore, there obviously exist alternative interpolation methods that perform better, but which are not necessarily more efficient.
3.4.2 Lagrange interpolation

Given a collection of data points \{(x_i, y_i); i = 1,\ldots,n\}, Lagrange interpolation constructs the unique polynomial P of degree n-1 such that y_i = P(x_i). Therefore, the method is exact at each node. It relies on the Lagrange polynomials

P_i(x) = \prod_{j\neq i} \frac{x - x_j}{x_i - x_j}

which satisfy P_i(x_i) = 1 and P_i(x_j) = 0 for j \neq i, such that

P(x) = \sum_{i=1}^{n} y_i P_i(x)

The obvious problem that we can immediately see from this formula is that, if the number of points is high, this type of interpolation becomes intractable. Indeed, just computing a single P_i(x) already requires 2(n-1) subtractions and n multiplications; this has to be done for the n data points to compute all the needed P_i(x); then computing P(x) requires n additions and n multiplications. Therefore, evaluation requires of the order of 3n^2 operations! One may therefore attempt to compute directly

P(x) = \sum_{i=0}^{n-1} \theta_i x^i

imposing exact interpolation at the nodes:

y_1 = \theta_0 + \theta_1 x_1 + \theta_2 x_1^2 + \ldots + \theta_{n-1} x_1^{n-1}
y_2 = \theta_0 + \theta_1 x_2 + \theta_2 x_2^2 + \ldots + \theta_{n-1} x_2^{n-1}
\vdots
y_n = \theta_0 + \theta_1 x_n + \theta_2 x_n^2 + \ldots + \theta_{n-1} x_n^{n-1}

or, in matrix form, A\theta = y, where A is the so-called Vandermonde matrix built on the nodes x_i, i = 1,\ldots,n:

A = \begin{pmatrix} 1 & x_1 & x_1^2 & \ldots & x_1^{n-1} \\ 1 & x_2 & x_2^2 & \ldots & x_2^{n-1} \\ \vdots & & & & \vdots \\ 1 & x_n & x_n^2 & \ldots & x_n^{n-1} \end{pmatrix}

As long as the nodes are distinct, A is invertible, such that the interpolating polynomial exists and is unique. (Indeed, the difference P^*(x) between two such interpolants is at most of degree n-1 and is zero at all n nodes; but the only polynomial of degree at most n-1 with n zeros is the zero polynomial.) Here again, the method may be quite demanding from a computational point of view, as we have to invert a matrix of size (n \times n). There exist much more efficient methods, in particular the so-called Chebychev approximation, which works very well for smooth functions.
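Both routes can be checked against each other in a few lines of Matlab; the function, the nodes and the evaluation point below are illustrative assumptions (the sketch also assumes a recent Matlab version with implicit array expansion):

xi = linspace(-1,1,5)';  yi = exp(xi);   % data {x_i, y_i} (assumed)
n  = length(xi);
A  = xi.^(0:n-1);                        % Vandermonde matrix
th = A\yi;                               % coefficients of sum th_i x^(i-1)
x  = 0.3;                                % evaluation point (assumed)
Pv = (x.^(0:n-1))*th;                    % Vandermonde route
Pl = 0;
for i=1:n                                % Lagrange route
   Pi = prod((x-xi([1:i-1 i+1:n]))./(xi(i)-xi([1:i-1 i+1:n])));
   Pl = Pl + yi(i)*Pi;
end

Both Pv and Pl deliver the same value up to rounding, as the interpolating polynomial is unique.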
3.5 Chebychev approximation

Chebychev polynomials are defined on the interval [-1; 1]; nevertheless, the approximation interval may be generalized to [a; b] by transforming the data using the formula

x = 2\frac{y - a}{b - a} - 1  for  y \in [a; b]

Beyond the standard orthogonality property, Chebychev polynomials exhibit a discrete orthogonality property: if \{r_k; k = 1,\ldots,n\} denote the roots of T_n(x), then

\sum_{k=1}^{n} T_i(r_k) T_j(r_k) = \begin{cases} 0 & \text{for } i \neq j \\ n & \text{for } i = j = 0 \\ n/2 & \text{for } i = j \neq 0 \end{cases}
[Figure 3.5: Chebychev polynomials on [-1; 1].]
A Chebychev approximation of a function F then takes the form

g_n(x) \equiv \frac{c_0}{2} + \sum_{i=1}^{n} c_i T_i(x)  with  c_i = \frac{2}{\pi}\int_{-1}^{1} \frac{F(x)T_i(x)}{\sqrt{1-x^2}} dx  for  i = 0,\ldots,n

and, for smooth functions, the approximation error satisfies a bound of the form

\|F - g_n\|_\infty \leq O\left( \frac{\log(n)}{n^k} \right)

This result is of great importance, as it states that the approximation g_n(x) will be as close as we might want to F(x) as the degree of approximation n increases to \infty. In effect, since the approximation error is bounded by a term that shrinks at rate \log(n)/n^k, accuracy improves quickly as n increases. Further, the next theorem establishes a useful property of the coefficients of the approximation.
Theorem 6 Assume F is a C^k function over the interval [-1; 1] that admits the Chebychev representation

F(x) = \frac{c_0}{2} + \sum_{i=1}^{\infty} c_i T_i(x)

Then there exists a constant \bar{c} such that

|c_i| \leq \frac{\bar{c}}{i^k}  for  i \geq 1

that is, the coefficients decline at a rate governed by the smoothness of F.
In practice, we rely on the following algorithm, which computes an approximation of the form

G(x) \equiv \sum_{i=0}^{n} \gamma_i T_i\left( 2\frac{x-a}{b-a} - 1 \right)

for a function F defined on [a; b]:

1. Choose a polynomial order n and a number of nodes m \geq n+1.

2. Compute the m Chebychev interpolation nodes on [-1; 1], the roots of T_m:

r_k = \cos\left( \frac{2k-1}{2m}\pi \right)  for  k = 1,\ldots,m

3. Adjust the nodes, r_k, to fit the [a; b] interval:

x_k = (r_k + 1)\frac{b-a}{2} + a  for  k = 1,\ldots,m

4. Evaluate the function at each node: y_k = F(x_k), k = 1,\ldots,m.

5. Compute the coefficients

\gamma_i = \frac{\sum_{k=1}^{m} y_k T_i(r_k)}{\sum_{k=1}^{m} T_i(r_k)^2}  for  i = 0,\ldots,n

to get the approximation

G(x) = \sum_{i=0}^{n} \gamma_i T_i\left( 2\frac{x-a}{b-a} - 1 \right)

Note that each \gamma_i is nothing but an OLS coefficient in a regression of y on T_i(x), \gamma_i = cov(y, T_i(x))/var(T_i(x)), exploiting the fact that the T_i are orthogonal among themselves at the nodes. In matrix form, \gamma = (X'X)^{-1}X'Y with

X = \begin{pmatrix} T_0(x_1) & T_1(x_1) & \ldots & T_n(x_1) \\ T_0(x_2) & T_1(x_2) & \ldots & T_n(x_2) \\ \vdots & & & \vdots \\ T_0(x_m) & T_1(x_m) & \ldots & T_n(x_m) \end{pmatrix}  and  Y = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_m \end{pmatrix}
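The algorithm can be implemented in a few lines of Matlab; the sketch below uses the function, interval and orders of the smooth example that follows, and builds the polynomials by the recursion of table 3.4:

n = 6; m = 100; a = 0.01; b = 2;        % degree, nodes, interval
F = @(x) x.^0.1;
r = cos((2*(1:m)'-1)*pi/(2*m));         % Chebychev nodes on [-1;1]
x = (r+1)*(b-a)/2 + a;                  % nodes mapped to [a;b]
y = F(x);
T = [ones(m,1) r];                      % Chebychev polynomials at the nodes
for i=3:n+1; T = [T 2*r.*T(:,i-1)-T(:,i-2)]; end
g = (T'*y)./diag(T'*T);                 % gamma_i by the OLS-type formula
z  = 1.2;  rz = 2*(z-a)/(b-a)-1;        % evaluation at an arbitrary point
Tz = [1 rz]; for i=3:n+1; Tz = [Tz 2*rz*Tz(i-1)-Tz(i-2)]; end
G  = Tz*g;                              % approximate value of F(z)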
We now report two examples implementing the algorithm. The first one deals with a smooth function of the type F(x) = x^\alpha. The second one evaluates the accuracy of the approximation in the case of a non-smooth function: F(x) = \min(\max(-1.5, (x-1/2)^3), 2).

In the case of the smooth function, we set \alpha = 0.1 and approximate the function over the interval [0.01; 2]. We select 100 nodes and evaluate the accuracy of degree 2 and degree 6 approximations. Figure 3.6 reports the true function and the corresponding approximations; table 3.5 reports the coefficients. As can be seen from the table, adding terms to the approximation does not alter the coefficients of lower degree. This just reflects the orthogonality properties of the Chebychev polynomials, which we saw in the formula determining each \gamma_i. This is of great importance, as it states that once we have obtained a high order approximation, obtaining lower orders is particularly simple. This is the economization principle. Further, as can be seen from the figure, a good approximation to the function is obtained at rather low degrees: the difference between the function and its approximation at order 6 is already small.
Table 3.5: Chebychev coefficients, F(x) = x^{0.1}

        n=2       n=6
c_0     0.9547    0.9547
c_1     0.1567    0.1567
c_2    -0.0598   -0.0598
c_3               0.0324
c_4              -0.0202
c_5               0.0136
c_6              -0.0096
[Figure 3.6: Chebychev approximation of F(x) = x^{0.1}: true function and n=2, n=6 approximations (left panel), residuals (right panel).]
We then turn to the non-smooth function, approximated by G(x) = \sum_{i=0}^{n} \gamma_i T_i(2\frac{x-a}{b-a}-1) with n = 7 and n = 15; table 3.6 reports the coefficients. As should be expected, the approximation has difficulties dealing with the kinks of the function. Note however that the function is exactly cubic between the two kink points \underline{x} and \bar{x}, where \underline{x} satisfies

(\underline{x} - 1/2)^3 = -1.5
Table 3.6: Chebychev coefficients, non-smooth function

         n=7       n=15
c_0     -0.0140   -0.0140
c_1      2.0549    2.0549
c_2      0.4176    0.4176
c_3     -0.3120   -0.3120
c_4     -0.1607   -0.1607
c_5     -0.0425   -0.0425
c_6     -0.0802   -0.0802
c_7      0.0571    0.0571
c_8                0.1828
c_9                0.0275
c_10              -0.1444
c_11              -0.0686
c_12               0.0548
c_13               0.0355
c_14              -0.0012
c_15               0.0208
[Figure 3.7: Chebychev approximation of the non-smooth function (n=7 and n=15) and residuals.]
and \bar{x} satisfies

(\bar{x} - 1/2)^3 = 2

If we were to approximate the function piecewise on the corresponding subintervals, the approximation would be perfect with n = 3. This suggests that piecewise approximation may be of interest in a number of cases.
3.6 Piecewise interpolation

The simplest piecewise interpolations are splines: polynomial pieces joined at given nodes \{x_i; i = 0,\ldots,m\} so that the resulting function is smooth. The elementary building blocks are the B-splines, defined recursively for i = 0,\ldots,m-1. The step function basis is

B_i^0(x) = \begin{cases} 0, & x < x_i \\ 1, & x_i \leq x \leq x_{i+1} \\ 0, & x > x_{i+1} \end{cases}

the "tent" basis is

B_i^1(x) = \begin{cases} 0, & x < x_i \\ \frac{x - x_i}{x_{i+1} - x_i}, & x_i \leq x \leq x_{i+1} \\ \frac{x_{i+2} - x}{x_{i+2} - x_{i+1}}, & x_{i+1} \leq x \leq x_{i+2} \\ 0, & x > x_{i+2} \end{cases}

and higher order B-splines are obtained from the recursion

B_i^n(x) = \frac{x - x_i}{x_{i+n} - x_i} B_i^{n-1}(x) + \frac{x_{i+n+1} - x}{x_{i+n+1} - x_{i+1}} B_{i+1}^{n-1}(x)

Cubic splines are the most widely used splines to interpolate functions. We therefore describe the method in greater detail for this case.
Let S_i(x) = a_i + b_i(x - x_i) + c_i(x - x_i)^2 + d_i(x - x_i)^3 denote the cubic polynomial used on the interval [x_i; x_{i+1}], i = 0,\ldots,n-1. A first set of restrictions imposes that the spline passes through the data at the left-hand nodes,

a_i = y_i  for  i = 0,\ldots,n-1    (3.6)

and at the last node,

a_{n-1} + b_{n-1}(x_n - x_{n-1}) + c_{n-1}(x_n - x_{n-1})^2 + d_{n-1}(x_n - x_{n-1})^3 = y_n    (3.7)

The second set of restrictions imposes continuity of the function at the upper bound of each interval,

S_i(x_i) = S_{i-1}(x_i)  for  i = 1,\ldots,n-1

which implies, noting h_i = x_i - x_{i-1},

a_i = a_{i-1} + b_{i-1}h_i + c_{i-1}h_i^2 + d_{i-1}h_i^3  for  i = 1,\ldots,n-1    (3.8)

This furnishes 2n identification restrictions, such that 2n additional restrictions are still needed. Since we are dealing with cubic spline interpolation, the approximation is required to be C^2, implying that first and second order derivatives should be continuous. This yields the following n-1 conditions for the first order derivatives,

S_i'(x_i) = S_{i-1}'(x_i)  for  i = 1,\ldots,n-1

or

b_i = b_{i-1} + 2c_{i-1}h_i + 3d_{i-1}h_i^2  for  i = 1,\ldots,n-1    (3.9)

and n-1 conditions for the second order derivatives,

2c_i = 2c_{i-1} + 6d_{i-1}h_i  for  i = 1,\ldots,n-1    (3.10)
We are still short of two conditions, which must be supplied by the user. One possibility is to impose the values of the derivatives at the endpoints, S_0'(x_0) = F'(x_0) and S_{n-1}'(x_n) = F'(x_n); this is the so-called Hermite spline. However, in a number of situations such information on the derivative of F is either not known or does not exist (think of F not being differentiable at some points), such that a further source of information is needed. One can then rely on an approximation of the slope by the secant line. This is what is proposed by the secant Hermite spline, which amounts to approximating F'(x_0) and F'(x_n) by the secant line over the corresponding interval:

S_0'(x_0) = \frac{S_0(x_1) - S_0(x_0)}{x_1 - x_0}   and   S_{n-1}'(x_n) = \frac{S_{n-1}(x_n) - S_{n-1}(x_{n-1})}{x_n - x_{n-1}}

From (3.10) we get

d_{i-1} = \frac{1}{3h_i}(c_i - c_{i-1})  for  i = 1,\ldots,n-1

and substituting into (3.8) and (3.9), the b's and d's may be eliminated to yield a system in the c's only:

\frac{3}{h_{i+1}}(y_{i+1} - y_i) - \frac{3}{h_i}(y_i - y_{i-1}) = h_i c_{i-1} + 2(h_i + h_{i+1})c_i + h_{i+1}c_{i+1}
In matrix form, Ac = B with

A = \begin{pmatrix} 2(h_0+h_1) & h_1 & & & \\ h_1 & 2(h_1+h_2) & h_2 & & \\ & h_2 & 2(h_2+h_3) & h_3 & \\ & & \ddots & \ddots & \\ & & h_{n-3} & 2(h_{n-3}+h_{n-2}) & h_{n-2} \\ & & & h_{n-2} & 2(h_{n-2}+h_{n-1}) \end{pmatrix}

c = \begin{pmatrix} c_1 \\ \vdots \\ c_{n-1} \end{pmatrix}   and   B = \begin{pmatrix} \frac{3}{h_1}(y_2 - y_1) - \frac{3}{h_0}(y_1 - y_0) \\ \vdots \\ \frac{3}{h_{n-1}}(y_n - y_{n-1}) - \frac{3}{h_{n-2}}(y_{n-1} - y_{n-2}) \end{pmatrix}
The matrix A is tridiagonal (and therefore sparse), symmetric and elementwise positive. It is hence positive definite and therefore invertible. We thus obtain all the c_i, i = 1,\ldots,n-1, and can compute the b's and d's as

b_{i-1} = \frac{y_i - y_{i-1}}{h_i} - \frac{1}{3}(c_i + 2c_{i-1})h_i  for  i = 1,\ldots,n-1,   and   b_{n-1} = \frac{y_n - y_{n-1}}{h_n} - \frac{2c_{n-1}}{3}h_n

d_{i-1} = \frac{1}{3h_i}(c_i - c_{i-1})  for  i = 1,\ldots,n-1,   and   d_{n-1} = -\frac{c_{n-1}}{3h_n}

and finally we have had a_i = y_i, i = 0,\ldots,n-1, from the very beginning.
Once the coefficients have been obtained, the evaluation of the spline at a given point has to be undertaken. The only difficult part in this job is to identify the interval to which the value of the argument belongs, i.e. to find i \in \{0,\ldots,n-1\} such that x \in [x_i; x_{i+1}]. Nevertheless, as long as the nodes are generated using an invertible formula, there will be no cost in determining the interval. Most of the time a uniform grid is used, such that the interval [a; b] is divided using the linear scheme x_i = a + i\Delta, where \Delta = (b-a)/(n-1) and i = 0,\ldots,n-1. In such a case, locating the interval is immediate. Note also that nothing prevents the use of a non-uniform grid: for the non-smooth function considered above, a natural choice would be the simple 4-node grid \{-3, 0.5 - 1.5^{1/3}, 0.5 + 2^{1/3}, 3\}, as taking this grid would yield a perfect approximation (remember that the central part of the function is cubic!).

As an example of spline approximation, figure 3.8 reports the spline approximation of the non-smooth function F(x) = \min(\max(-1.5, (x-1/2)^3), 2), considering a uniform grid over the [-3; 3] interval with 3, 7 and 15 nodes.
[Figure 3.8: Spline approximation of the non-smooth function with uniform grids of 3, 7 and 15 nodes.]
Figure 3.9 reports the L^2 and L^\infty approximation errors as functions of the number of nodes. It shows the potential of spline approximation, in that the error is driven to zero. Nevertheless, it also appears that convergence is not monotonic in the case of the L^\infty error. This is actually related to the fact that F, in this case, is not even C^1 on the whole interval. In fact, as soon as we consider a smooth function, convergence is monotonic, as can be seen from the lower panels, which report the errors for the function F(x) = x^{0.1} over the interval [0.01; 2]. This actually illustrates the following result.

Theorem 7 Let F be a C^4 function over the interval [x_0; x_n], let S be its cubic spline approximation on \{x_0, x_1,\ldots,x_n\}, and let \delta \geq \max_i\{x_i - x_{i-1}\}. Then

\|F - S\|_\infty \leq \frac{5}{384}\|F^{(4)}\|_\infty \delta^4

and

\|F' - S'\|_\infty \leq \frac{9 + \sqrt{3}}{216}\|F^{(4)}\|_\infty \delta^3

This theorem gives upper bounds on the spline approximation error, and indicates that these bounds decrease at a fast pace (a power of 4) as the number of nodes increases.
[Figure 3.9: L^2 and L^\infty approximation errors as functions of the number of nodes; upper panels: non-smooth function, lower panels: F(x) = x^{0.1}.]
% Cubic spline coefficients on a uniform grid. The initialization lines
% below were garbled in the source; nbx, xb, xe and the interpolated
% function are assumed values/names.
nbx = 20;                          % number of nodes (assumed)
xb  = -3;  xe = 3;                 % bounds of the grid (assumed)
dx  = (xe-xb)/(nbx-1);             % step size of the uniform grid
x   = (xb:dx:xe)';                 % grid
y   = min(max(-1.5,(x-0.5).^3),2); % data to interpolate (assumed)
A   = spalloc((nbx-2),(nbx-2),3*nbx-8);   % creates sparse matrix A
B   = zeros((nbx-2),1);                   % creates vector B
A(1,[1 2])=[2*(dx+dx) dx];
B(1)=3*(y(3)-y(2))/dx-3*(y(2)-y(1))/dx;
for i=2:nbx-3;
   A(i,[i-1 i i+1])=[dx 2*(dx+dx) dx];
   B(i)=3*(y(i+2)-y(i+1))/dx-3*(y(i+1)-y(i))/dx;
end
A(nbx-2,[nbx-3 nbx-2])=[dx 2*(dx+dx)];
B(nbx-2)=3*(y(nbx)-y(nbx-1))/dx-3*(y(nbx-1)-y(nbx-2))/dx;
c = [0;A\B];
a = y(1:nbx-1);
b = (y(2:nbx)-y(1:nbx-1))/dx-dx*([c(2:nbx-1);0]+2*c(1:nbx-1))/3;
d = ([c(2:nbx-1);0]-c(1:nbx-1))/(3*dx);
S = [a;b;c(1:nbx-1);d];            % Matrix of spline coefficients
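Once the coefficients are in hand, evaluating the spline at an arbitrary point xp only requires locating the interval, which is immediate on the uniform grid built above (this short sketch reuses the vectors a, b, c, d and the assumed names xb, dx, nbx from the code):

xp = 1.3;                                     % evaluation point (assumed)
i  = min(max(floor((xp-xb)/dx)+1,1),nbx-1);   % interval such that xp in [x_i,x_i+1]
u  = xp - (xb+(i-1)*dx);                      % distance to the left node
Sx = a(i) + b(i)*u + c(i)*u^2 + d(i)*u^3;     % spline value at xp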
One potential problem that may arise with the type of methods we have developed until now is that we have not imposed any particular restriction on the shape of the approximation relative to the true function. This may be of great importance in some cases: assume for instance that we need to approximate a function F(x_t) that characterizes the dynamics of a variable x. If the approximation fails to preserve, say, the monotonicity or the concavity of F, the approximated dynamics may behave qualitatively differently from the true ones.
3.7 Shape-preserving approximations

In this section, we will see an approximation method that preserves the shape of the function we want to approximate. This method was proposed by Schumaker [1983] and essentially amounts to exploiting information on both the level and the slope of the function to be approximated to build a smooth approximation. We will deal with two situations. The first one, Hermite interpolation, assumes that we have information on both the level and the slope of the function to approximate. The second one, which uses Lagrange data, assumes that no information on the slope of the function is available. Both methods were originally developed using quadratic splines.

3.7.1 Hermite interpolation

This method assumes that we have information on both the level and the slope of the function to be approximated. Assume we want to approximate the function F on the interval [x_1; x_2], and we know y_i = F(x_i) and z_i = F'(x_i), i = 1, 2. Schumaker proposes to build a quadratic function S(x) on [x_1; x_2]
that satisfies

S(x_i) = y_i  and  S'(x_i) = z_i  for  i = 1, 2

Schumaker first establishes that

Lemma 1 If

\frac{z_1 + z_2}{2} = \frac{y_2 - y_1}{x_2 - x_1}

then the quadratic spline

S(x) = y_1 + z_1(x - x_1) + \frac{z_2 - z_1}{2(x_2 - x_1)}(x - x_1)^2

satisfies the interpolation conditions, with slope S'(x) = z_1 + \frac{z_2 - z_1}{x_2 - x_1}(x - x_1).
However, the conditions stated by this lemma are extremely stringent and do not usually apply, such that we have to adapt the procedure. This may be done by adding a node between x_1 and x_2 and constructing another spline that satisfies the lemma.

Lemma 2 For every x^* \in (x_1, x_2) there exists a unique quadratic spline solving the interpolation problem, of the form

S(x) = \begin{cases} \alpha_{01} + \alpha_{11}(x - x_1) + \alpha_{21}(x - x_1)^2 & \text{for } x \in [x_1; x^*] \\ \alpha_{02} + \alpha_{12}(x - x^*) + \alpha_{22}(x - x^*)^2 & \text{for } x \in [x^*; x_2] \end{cases}

where

\alpha_{01} = y_1, \quad \alpha_{11} = z_1, \quad \alpha_{21} = \frac{\bar{z} - z_1}{2(x^* - x_1)}
\alpha_{02} = y_1 + \frac{z_1 + \bar{z}}{2}(x^* - x_1), \quad \alpha_{12} = \bar{z}, \quad \alpha_{22} = \frac{z_2 - \bar{z}}{2(x_2 - x^*)}

and

\bar{z} = \frac{2(y_2 - y_1) - \left( z_1(x^* - x_1) + z_2(x_2 - x^*) \right)}{x_2 - x_1}
If the latter lemma fully characterizes the quadratic spline, it gives no information on x^*, which therefore remains to be selected: x^* will be set such that the spline matches the desired shape properties. First note that if z_1 and z_2 are both positive (negative), then S(x) is monotone if and only if z_1\bar{z} \geq 0 (\leq 0), which is actually equivalent to

2(y_2 - y_1) \gtrless (x^* - x_1)z_1 + (x_2 - x^*)z_2  if  z_1, z_2 \gtrless 0

This essentially deals with the monotonicity problem, and we now have to tackle the question of curvature. To do so, we compute the slope of the secant line between x_1 and x_2:

\delta = \frac{y_2 - y_1}{x_2 - x_1}

Then, if (z_2 - \delta)(z_1 - \delta) \geq 0, this indicates the presence of an inflexion point in the interval [x_1; x_2], such that the interpolant can be neither convex nor concave. Conversely, if |z_2 - \delta| < |z_1 - \delta| and x^* satisfies

x_1 < x^* \leq \bar{x} \equiv x_1 + \frac{2(x_2 - x_1)(z_2 - \delta)}{z_2 - z_1}

the desired curvature is preserved; likewise if |z_2 - \delta| > |z_1 - \delta| and x^* satisfies

\underline{x} \equiv x_2 + \frac{2(x_2 - x_1)(z_1 - \delta)}{z_2 - z_1} \leq x^* < x_2

These conditions lead to the following selection rule:

3. if (z_1 - \delta)(z_2 - \delta) \geq 0, set x^* = (x_1 + x_2)/2 and stop; else go to 4.
4. if |z_1 - \delta| < |z_2 - \delta|, set x^* = (x_1 + \bar{x})/2 and stop; else go to 5.
5. if |z_1 - \delta| > |z_2 - \delta|, set x^* = (x_2 + \underline{x})/2 and stop.

We then have in hand a value of x^* for [x_1; x_2]. We apply this rule to each subinterval to get x_i^* \in [x_i; x_{i+1}], and then solve the general interpolation problem as explained in lemma 2.
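A minimal Matlab sketch of lemma 2 on one interval follows; the Hermite data and the choice x^* = (x_1+x_2)/2 (step 3 of the rule) are illustrative assumptions:

x1=0; x2=1; y1=0; y2=1; z1=3; z2=0.5;              % Hermite data (assumed)
xs = (x1+x2)/2;                                    % knot (assumed: step 3)
zb = (2*(y2-y1)-(z1*(xs-x1)+z2*(x2-xs)))/(x2-x1);  % zbar from lemma 2
a01=y1; a11=z1; a21=(zb-z1)/(2*(xs-x1));           % coefficients on [x1;xs]
a02=y1+((z1+zb)/2)*(xs-x1); a12=zb; a22=(z2-zb)/(2*(x2-xs));  % on [xs;x2]
S = @(x) (x<=xs).*(a01+a11*(x-x1)+a21*(x-x1).^2) ...
       + (x> xs).*(a02+a12*(x-xs)+a22*(x-xs).^2);

One can verify directly that S and S' are continuous at xs and that S matches the levels and slopes at both endpoints.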
Note that everything here assumes that we have Hermite data in hand, i.e. \{x_i, y_i, z_i : i = 0,\ldots,n\}. However, knowledge of the slope is usually not the rule, and we therefore have to adapt the algorithm to such situations.
3.7.2 Lagrange data

Assume now that we do not have any data on the slope of the function, that is, we are only endowed with Lagrange data \{x_i, y_i : i = 0,\ldots,n\}. In such a case, we just have to add the needed information, namely an estimate of the slope of the function at each node. Defining

L_i = \left[ (x_{i+1} - x_i)^2 + (y_{i+1} - y_i)^2 \right]^{1/2}   and   \delta_i = \frac{y_{i+1} - y_i}{x_{i+1} - x_i}

the slopes may be estimated as

z_i = \begin{cases} \frac{L_{i-1}\delta_{i-1} + L_i\delta_i}{L_{i-1} + L_i} & \text{if } \delta_{i-1}\delta_i > 0 \\ 0 & \text{if } \delta_{i-1}\delta_i \leq 0 \end{cases}  for  i = 2,\ldots,n-1

and

z_1 = \frac{3\delta_1 - z_2}{2}   and   z_n = \frac{3\delta_{n-1} - z_{n-1}}{2}

Then we just apply exactly the same procedure as described in the previous section.
3.8 Multidimensional approximations

Computing a multidimensional approximation to a function may be quite cumbersome, and even impossible in some cases. To understand the problem, let us restate an example provided by Judd [1998]. Consider we have data points \{P_1, P_2, P_3, P_4\} = \{(1,0), (-1,0), (0,1), (0,-1)\} in R^2 and the corresponding data z_i = F(P_i), i = 1,\ldots,4. Assume now that we want to construct an approximation of the function F using a linear combination of \{1, x, y, xy\}, defined as

G(x,y) = a + bx + cy + dxy

such that G(x_i, y_i) = z_i. Finding a, b, c, d amounts to solving the linear system

\begin{pmatrix} 1 & 1 & 0 & 0 \\ 1 & -1 & 0 & 0 \\ 1 & 0 & 1 & 0 \\ 1 & 0 & -1 & 0 \end{pmatrix}\begin{pmatrix} a \\ b \\ c \\ d \end{pmatrix} = \begin{pmatrix} z_1 \\ z_2 \\ z_3 \\ z_4 \end{pmatrix}

But the matrix of this system is singular (the term xy vanishes at all four data points), such that the system generally has no solution: in multidimensional problems, the choice of the basis and the choice of the grid cannot be made independently.
3.8.1 Tensor product bases

The idea here is to use the tensor product of univariate functions to form a basis of multivariate functions. In order to better understand this point, let us consider that we want to approximate a function F : R^2 \rightarrow R using simple monomials, starting from the univariate bases X = \{1, x, x^2\} and Y = \{1, y, y^2\}. The tensor product basis is then

\{1, x, y, xy, x^2, y^2, x^2y, xy^2, x^2y^2\}

i.e. all possible 2-term products of elements belonging to X and Y.

We are now in a position to define the n-fold tensor product basis for functions of n variables \{x_1,\ldots,x_i,\ldots,x_n\}.

Definition 10 Given a basis for functions of the single variable x_i, P^i = \{p_k^i(x_i)\}_{k=0}^{\kappa_i}, the tensor product basis is given by

B = \left\{ \prod_{i=1}^{n} p_{k_i}^i(x_i) \;:\; 0 \leq k_i \leq \kappa_i,\; i = 1,\ldots,n \right\}
An important problem with this type of tensor product basis is its size. For example, considering an m-dimensional space with polynomials of order n, we already get (n+1)^m terms! This exponential growth in the number of terms makes it particularly costly to use this type of basis as soon as the order of the polynomials or the number of dimensions is high. Nevertheless, it will often be satisfactory or sufficient for low enough polynomial orders (in practice, n = 2!). Therefore, one often relies on less computationally costly bases.

3.8.2 Complete polynomials

The complete polynomials basis of total degree \kappa retains only those products whose total degree does not exceed \kappa:

B_\kappa^c = \left\{ x_1^{k_1}\ldots x_n^{k_n} \;:\; k_1,\ldots,k_n \geq 0,\; \sum_{i=1}^{n} k_i \leq \kappa \right\}

To see this more clearly, let us consider the example developed in the previous section (X = \{1, x, x^2\} and Y = \{1, y, y^2\}) and let us assume that \kappa = 2. In this case

B^c = \{1, x, y, x^2, y^2, xy\} = B \backslash \{xy^2, x^2y, x^2y^2\}
Note that we have actually already encountered this type of basis, as this is typically what is delivered by Taylor's theorem in many dimensions:

F(x) \simeq F(x^*) + \sum_{i=1}^{n} \frac{\partial F}{\partial x_i}(x^*)(x_i - x_i^*) + \ldots + \frac{1}{k!}\sum_{i_1=1}^{n}\ldots\sum_{i_k=1}^{n} \frac{\partial^k F}{\partial x_{i_1}\ldots\partial x_{i_k}}(x^*)(x_{i_1}-x_{i_1}^*)\ldots(x_{i_k}-x_{i_k}^*)

For instance, for a function of two variables at order 2,

F(x,y) \simeq F(x^*,y^*) + F_x(x^*,y^*)(x - x^*) + F_y(x^*,y^*)(y - y^*)
+ \frac{1}{2}\left( F_{xx}(x^*,y^*)(x - x^*)^2 + 2F_{xy}(x^*,y^*)(x - x^*)(y - y^*) + F_{yy}(x^*,y^*)(y - y^*)^2 \right)

which rewrites

F(x,y) = \beta_0 + \beta_1 x + \beta_2 y + \beta_3 x^2 + \beta_4 y^2 + \beta_5 xy

such that the implicit polynomial basis is the complete polynomials basis of order 2 in 2 variables.
The key difference between tensor product bases and complete polynomials bases lies essentially in the rate at which the size of the basis increases. As aforementioned, tensor product bases grow exponentially, while complete polynomials bases only grow polynomially. This reduces the computational cost of approximation. But what do we lose by using complete polynomials rather than tensor product bases? From a theoretical point of view, Taylor's theorem gives us the answer: nothing! Indeed, Taylor's theorem indicates that the elements in B_\kappa^c deliver an approximation in the neighborhood of x^* that exhibits an asymptotic degree of convergence equal to \kappa. The \kappa-fold tensor product B can deliver only the same \kappa-th degree of convergence, as it does not contain all terms of degree \kappa+1. In other words, complete polynomials and tensor product bases deliver the same degree of asymptotic convergence, and therefore complete polynomials based approximations yield as good a level of accuracy as tensor product based approximations.
Once we have chosen a basis, we can proceed to approximation. For example, we may use Chebychev approximation in higher dimensional problems. Judd [1998] reports the algorithm for this problem. As we will see, it takes advantage of a very nice feature of orthogonal polynomials: they inherit their orthogonality property when extended to higher dimensions. Let us then assume we want to compute the Chebychev approximation of a 2-dimensional function F(x,y) over the interval [a_x; b_x] \times [a_y; b_y], and let us assume, to keep things simple for a while, that we use a tensor product basis. Then the algorithm is as follows:

1. Choose a polynomial order for x (n_x) and for y (n_y).

2. Compute m_x \geq n_x + 1 and m_y \geq n_y + 1 Chebychev interpolation nodes on [-1; 1]:

z_k^x = \cos\left( \frac{2k-1}{2m_x}\pi \right),\; k = 1,\ldots,m_x   and   z_k^y = \cos\left( \frac{2k-1}{2m_y}\pi \right),\; k = 1,\ldots,m_y

3. Adjust the nodes to fit the approximation intervals:

x_k = a_x + (1 + z_k^x)\frac{b_x - a_x}{2},\; k = 1,\ldots,m_x   and   y_k = a_y + (1 + z_k^y)\frac{b_y - a_y}{2},\; k = 1,\ldots,m_y

4. Evaluate the function at each pair of nodes, Y_{k\ell} = F(x_k, y_\ell), and compute the coefficients

\alpha_{ij} = \frac{\sum_{k=1}^{m_x}\sum_{\ell=1}^{m_y} Y_{k\ell}\, T_i^x(z_k^x) T_j^y(z_\ell^y)}{\left( \sum_{k=1}^{m_x} T_i^x(z_k^x)^2 \right)\left( \sum_{\ell=1}^{m_y} T_j^y(z_\ell^y)^2 \right)}

or, in matrix form,

\alpha = \frac{T^x(z^x)'\, Y\, T^y(z^y)}{\|T^x(z^x)\|^2 \|T^y(z^y)\|^2}

5. The approximation is then

G(x,y) = \sum_{i=0}^{n_x}\sum_{j=0}^{n_y} \alpha_{ij}\, T_i^x\left( 2\frac{x - a_x}{b_x - a_x} - 1 \right) T_j^y\left( 2\frac{y - a_y}{b_y - a_y} - 1 \right)
As an illustration of the algorithm, we compute the approximation of the CES function

F(x,y) = \left[ x^\rho + y^\rho \right]^{1/\rho}

on the [0.01; 2] \times [0.01; 2] interval for \rho = 0.75. We used 5th order polynomials for both x and y, and 20 nodes for both x and y, such that there are 400 possible interpolation nodes. Applying the algorithm we just described, we get the matrix of coefficients reported in table 3.7. As can be seen from the table, most of the coefficients are close to zero as soon as they involve the cross-product of higher order terms, such that using a complete polynomials basis would yield the same efficiency at a lower computational cost. Figure 3.10 reports the graph of the residuals of the approximation.
Table 3.7: Matrix of Chebychev coefficients (tensor product basis)

k_x \ k_y      0         1         2         3         4         5
0           2.4251    1.2744   -0.0582    0.0217   -0.0104    0.0057
1           1.2744    0.2030   -0.0366    0.0124   -0.0055    0.0029
2          -0.0582   -0.0366    0.0094   -0.0037    0.0018   -0.0009
3           0.0217    0.0124   -0.0037    0.0016   -0.0008    0.0005
4          -0.0104   -0.0055    0.0018   -0.0008    0.0004   -0.0003
5           0.0057    0.0029   -0.0009    0.0005   -0.0003    0.0002
%
% Step 3
%
Y = zeros(mx,my);
for ix=1:mx;
   for iy=1:my;
      Y(ix,iy) = (x(ix)^rho+y(iy)^rho)^(1/rho);
   end
end
%
% Step 4
%
Xx = [ones(mx,1) rx];
for i=3:nx+1;
   Xx = [Xx 2*rx.*Xx(:,i-1)-Xx(:,i-2)];
end
Xy = [ones(my,1) ry];
for i=3:ny+1;
   Xy = [Xy 2*ry.*Xy(:,i-1)-Xy(:,i-2)];
end
T2x = diag(Xx'*Xx);              % squared norms of the T_i at the nodes
T2y = diag(Xy'*Xy);
a   = (Xx'*Y*Xy)./(T2x*T2y');    % matrix of Chebychev coefficients
[Figure 3.10: Residuals of the Chebychev approximation (tensor product basis).]
If we now want to perform the same approximation using a complete polynomials basis, we just have to modify the algorithm to take into account the fact that, when iterating on i and j, we want to impose i + j \leq \kappa. Let us compute it for \kappa = 5. This implies that the basis will consist of

1, T_1^x(.), T_1^y(.), T_2^x(.), T_2^y(.), T_3^x(.), T_3^y(.), T_4^x(.), T_4^y(.), T_5^x(.), T_5^y(.),
T_1^x(.)T_1^y(.), T_1^x(.)T_2^y(.), T_1^x(.)T_3^y(.), T_1^x(.)T_4^y(.),
T_2^x(.)T_1^y(.), T_2^x(.)T_2^y(.), T_2^x(.)T_3^y(.),
T_3^x(.)T_1^y(.), T_3^x(.)T_2^y(.),
T_4^x(.)T_1^y(.)

and the matrix of coefficients is now triangular:

Table 3.8: Matrix of Chebychev coefficients (complete polynomials basis)

k_x \ k_y      0         1         2         3         4         5
0           2.4251    1.2744   -0.0582    0.0217   -0.0104    0.0057
1           1.2744    0.2030   -0.0366    0.0124   -0.0055
2          -0.0582   -0.0366    0.0094   -0.0037
3           0.0217    0.0124   -0.0037
4          -0.0104   -0.0055
5           0.0057
A first thing to note is that the coefficients that remain are the same as the ones we obtained with the tensor product basis. This should not come as a surprise, as what we just found is just an expression of the Chebychev economization we already encountered in the unidimensional case, which is itself a direct consequence of the orthogonality of Chebychev polynomials. Figure 3.11 reports the residuals of the approximation using the complete basis. As can be seen from the figure, this constrained approximation yields quantitatively similar results compared to the tensor product basis, therefore achieving almost the same accuracy while being less costly from a computational point of view. In the Matlab code, only the lines in step 4 that compute the matrix of coefficients are affected by the adoption of the complete polynomials basis.
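The modified lines of step 4 were lost in the source; a sketch of one possible modification, which simply zeroes out the coefficients violating the completeness restriction (reusing the matrices Xx, Xy, T2x, T2y and Y built above), is:

kappa = 5;
a = (Xx'*Y*Xy)./(T2x*T2y');
for i=0:nx;
   for j=0:ny;
      if i+j>kappa; a(i+1,j+1)=0; end   % impose i+j <= kappa
   end
end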
[Figure 3.11: Residuals of the Chebychev approximation (complete polynomials basis).]
3.9 Finite element approximations

Finite elements are extremely popular among engineers (especially in aeronautics). This approach considers basis elements that are zero over most of the domain of approximation. Although finite elements are extremely powerful in the case of 2-dimensional problems, they are more difficult to implement in higher dimensions. We therefore focus on the bidimensional case.
3.9.1 Bilinear approximations

A bilinear interpolation interpolates data linearly in both coordinate directions. Assume that we have the values of a function F(x,y) at the four corners of the reference square [-1; 1]^2,

P_1 = (-1,-1),\; P_2 = (1,-1),\; P_3 = (-1,1),\; P_4 = (1,1)

Then

F(x,y) \simeq F(P_1)b_1(x,y) + F(P_2)b_2(x,y) + F(P_3)b_3(x,y) + F(P_4)b_4(x,y)

where the b_i are the cardinal functions, equal to 1 at P_i and 0 at the three other corners; for instance, the cardinal function associated with the corner (1,1) is b(x,y) = \frac{1}{4}(1+x)(1+y). If we have data on [a_x; b_x] \times [a_y; b_y], we use the linear transformations we have already encountered a great number of times,

2\frac{x - a_x}{b_x - a_x} - 1   and   2\frac{y - a_y}{b_y - a_y} - 1

and proceed likewise on each rectangle of the grid. The resulting approximation is continuous, but its derivatives jump across cell boundaries; greater smoothness would be insured if we were to construct biquadratic or bicubic interpolations, for example. Note that this type of interpolation scheme is typically what is done when a computer draws a 3-dimensional graph of a function. In figure 3.12, we plot the residuals of the bilinear approximation of the CES function we approximated in the previous section, with 5 uniform intervals (6 nodes, such that x = \{0.010, 0.408, 0.806, 1.204, 1.602, 2.000\} and y = \{0.010, 0.408, 0.806, 1.204, 1.602, 2.000\}). As in the spline approximation procedure, the most difficult step, once we have obtained an approximation, is to determine the square to which the point for which we want an approximation belongs. We therefore face exactly the same type of problems.
[Figure 3.12: Residuals of the bilinear approximation.]
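A minimal Matlab sketch of bilinear interpolation on a rectangular grid follows; the function and grids are illustrative assumptions (the cell is mapped to the reference square and the four corner values are combined through the cardinal functions):

F  = @(x,y) (x.^0.75+y.^0.75).^(1/0.75);      % function to approximate (assumed)
xg = linspace(0.01,2,6); yg = linspace(0.01,2,6);   % uniform 6-node grids
xp = 1.3; yp = 0.7;                           % evaluation point (assumed)
i  = min(max(find(xg<=xp,1,'last'),1),5);     % cell indices
j  = min(max(find(yg<=yp,1,'last'),1),5);
u  = 2*(xp-xg(i))/(xg(i+1)-xg(i))-1;          % local coordinates in [-1;1]
v  = 2*(yp-yg(j))/(yg(j+1)-yg(j))-1;
G  = ( F(xg(i),yg(j))*(1-u)*(1-v) + F(xg(i+1),yg(j))*(1+u)*(1-v) ...
     + F(xg(i),yg(j+1))*(1-u)*(1+v) + F(xg(i+1),yg(j+1))*(1+u)*(1+v) )/4;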
3.9.2 Simplicial linear interpolation

An alternative is to interpolate linearly on triangles rather than rectangles, i.e. to fit a plane through three data points. To do so, and assuming that the Lagrange data have already been normalized, consider the points P_1 = (0,0), P_2 = (1,0) and P_3 = (0,1), with cardinal functions b_1(x,y) = 1 - x - y, b_2(x,y) = x and b_3(x,y) = y, each equal to 1 at the point to which the same index is associated (i = j). Let us now add the point P_4 = (1,1); we then have the following cardinal functions for P_2, P_3, P_4:

b_4(x,y) = 1 - x,\quad b_5(x,y) = 1 - y,\quad b_6(x,y) = x + y - 1

Therefore, on the square \{P_1, P_2, P_3, P_4\} the interpolant of F is given by

G(x,y) = \begin{cases} F(0,0)(1 - x - y) + F(0,1)y + F(1,0)x & \text{if } x + y \leq 1 \\ F(1,1)(x + y - 1) + F(0,1)(1 - x) + F(1,0)(1 - y) & \text{if } x + y > 1 \end{cases}

It should be clear to you that, if these methods are pretty easy to implement in two dimensions, they become much more cumbersome as the dimension of the problem increases.
Bibliography

Hornik, K., M. Stinchcombe, and H. White, Multi-Layer Feedforward Networks are Universal Approximators, Neural Networks, 1989, 2, 359–366.

Judd, K.L., Numerical Methods in Economics, Cambridge, Massachusetts: MIT Press, 1998.

Schumaker, L.L., On Shape-preserving Quadratic Spline Interpolation, SIAM Journal on Numerical Analysis, 1983, 20, 854–864.
Lecture Notes 4
4.1 Numerical differentiation

4.1.1 Computation of derivatives

A direct approach. Let us recall that the derivative of a function is given by

F'(x) = \lim_{\Delta\rightarrow 0} \frac{F(x + \Delta) - F(x)}{\Delta}

This suggests the forward difference approximation

F'(x) \simeq \frac{F(x + \Delta_x) - F(x)}{\Delta_x}    (4.1)
The problem is then: how big should \Delta_x be? It is obvious that \Delta_x should be small, in order to be as close as possible to the limit. The problem is that it cannot be too small because of the numerical precision of the computer. Assume for a while that the computer can only deliver a precision of 1e-2, and that we select \Delta_x = 0.00001; then F(x + \Delta_x) - F(x) would be 0 for the computer!

Further, Taylor's expansion theorem states that

F(x + \Delta_x) = F(x) + F'(x)\Delta_x + \frac{F''(\xi)}{2}\Delta_x^2

for some \xi between x and x + \Delta_x. Hence, if \hat{F} denotes the computed value of F, with |\hat{F}(x) - F(x)| \leq e, the approximation error satisfies

\left| \frac{\hat{F}(x + \Delta_x) - \hat{F}(x)}{\Delta_x} - F'(x) \right| \leq \frac{2e}{\Delta_x} + \frac{M}{2}\Delta_x

where M is an upper bound on |F''|. The right-hand side is minimized for \Delta_x = 2\sqrt{e/M}, such that the upper bound on the error is 2\sqrt{eM}. One problem here is that we usually do not know M. However, from a practical point of view, most people use the following scheme for \Delta_x:

\Delta_x = 1e{-}5 \times \max(|x|, 1e{-}8)

which essentially amounts to working close to machine precision.
Similarly, rather than taking a forward difference, we may also take the backward difference

F'(x) \simeq \frac{F(x) - F(x - \Delta_x)}{\Delta_x}    (4.2)
Central difference

There are a number of situations where one-sided differences are not accurate enough; one potential solution is then to use the central difference, or two-sided difference, approach, which essentially amounts to computing the derivative using the backward-forward formula

F'(x) ≃ (F(x+Δx) − F(x−Δx))/(2Δx)    (4.3)
What do we gain from using this formula? To see this, let us consider the Taylor series expansions of F(x+Δx) and F(x−Δx):

F(x+Δx) = F(x) + F'(x)Δx + (1/2)F''(x)Δx² + (1/6)F⁽³⁾(ξ₁)Δx³    (4.4)
F(x−Δx) = F(x) − F'(x)Δx + (1/2)F''(x)Δx² − (1/6)F⁽³⁾(ξ₂)Δx³    (4.5)

Subtracting (4.5) from (4.4) and rearranging, we get

F'(x) = (F(x+Δx) − F(x−Δx))/(2Δx) − (Δx²/6)F⁽³⁾(ξ)    (4.6)

such that the approximation error is now of order O(Δx²) rather than O(Δx).
More generally, suppose that an approximation A(h) to a quantity A, based on a step size h, satisfies A = A(h) + φhᵏ + O(h^{k+1}). Dropping the, hopefully tiny, term O(h^{k+1}) from this equation, we obtain a linear equation, A = A(h) + φhᵏ, in the two unknowns A and φ. But this really gives a different equation for each possible value of h. We can therefore get two different equations to identify both A and φ by just using two different step sizes. Doing this with step sizes h and h/2, for any h, we have

A = A(h) + φhᵏ + O(h^{k+1})
A = A(h/2) + φ(h/2)ᵏ + O(h^{k+1})    (4.7)

(note that the two symbols O(h^{k+1}) stand for two different sums of terms of order h^{k+1} and higher). Multiplying the second equation by 2ᵏ and subtracting the first, we obtain

A = (2ᵏA(h/2) − A(h))/(2ᵏ − 1) + O(h^{k+1})

where, once again, O(h^{k+1}) stands for a new sum of terms of order h^{k+1} and higher. Denoting

B(h) = (2ᵏA(h/2) − A(h))/(2ᵏ − 1)

we then have

A = B(h) + O(h^{k+1})

What have we done so far? We have defined an approximation B(h) whose error is of order k+1 rather than k, such that it is a better one than A(h). The generation of a new, improved approximation for A from two A(h)'s with different values of h is called Richardson extrapolation. We can then continue the process with B(h) to get a new, better approximation. This method is widely used when computing numerical integration or numerical differentiation.
Numerical differentiation with Richardson extrapolation. Assume we want to compute the first-order derivative of a function F ∈ C^{2n}(R) at a point x*. We may first compute, for a step h₀ and for h₁ = h₀/2, the approximate quantities

D₀⁰(F) = (F(x*+h₀) − F(x*−h₀))/(2h₀) and D₁⁰(F) = (F(x*+h₁) − F(x*−h₁))/(2h₁)

Then, according to the previous section, we may compute a better approximation as (since k = 2 in the case of central-difference numerical differentiation)

D₀¹(F) = (4D₁⁰(F) − D₀⁰(F))/3 = D₁⁰(F) + (D₁⁰(F) − D₀⁰(F))/3

More generally, halving the step at each stage, the recursion

D_j^ℓ(F) = D_{j+1}^{ℓ−1}(F) + (D_{j+1}^{ℓ−1}(F) − D_j^{ℓ−1}(F))/(4^ℓ − 1)

delivers approximations whose errors are proportional to h_j^{2(ℓ+1)}.
Matlab Code: Richardson Extrapolation
function D=richardson(f,x,varargin);
%
% function D=richardson(f,x,P1,...,Pn);
%
% f : function to be differentiated
% x : point at which the derivative is evaluated
%
tol    = 1e-12;                       % stopping criterion
h      = 1;                           % initial step size
fs     = feval(f,x+h,varargin{:});
fm     = feval(f,x-h,varargin{:});
D(1,1) = (fs-fm)/(2*h);               % first central-difference estimate
j      = 1;
rerr   = 1;
while rerr>tol;
   h  = h/2;
   fs = feval(f,x+h,varargin{:});
   fm = feval(f,x-h,varargin{:});
   D(j+1,1) = (fs-fm)/(2*h);          % derivative with updated step size
   %
   % recursion
   %
   for k = 1:j;
      D(j+1,k+1) = D(j+1,k)+(D(j+1,k)-D(j,k))/(4^k-1);
   end
   %
   % compute errors
   %
   err  = abs(D(j+1,j+1)-D(j,j));
   rerr = 2*err/(abs(D(j+1,j+1))+abs(D(j,j))+eps);
   j    = j+1;
end
n = size(D,1);
D = D(n,n);
4.1.2 Partial Derivatives

Let us now consider that, rather than having a single-variable function, the problem is multidimensional, such that F : Rⁿ → R, and that we now want to compute the first-order partial derivative

Fᵢ(x) = ∂F(x)/∂xᵢ

This may be achieved extremely easily by computing, for example in the case of the central difference formula,

Fᵢ(x) ≃ (F(x + eᵢΔx) − F(x − eᵢΔx))/(2Δx)

where eᵢ is a vector whose i-th component is 1 and all other elements are 0.
Matlab Code: Jacobian Matrix
function J=jacobian(func,x0,method,varargin);
%
% function J=jacobian(func,x0,method,P1,...,Pn);
%
% func   : function for which the jacobian is computed
% x0     : point at which it is evaluated
% method = 'c' -> central difference
%        = 'l' -> left (backward) difference
%        = 'r' -> right (forward) difference
%
x0  = x0(:);
f   = feval(func,x0,varargin{:});
m   = length(x0);
n   = length(f);
J   = zeros(n,m);
dev = diag(.00001*max(abs(x0),1e-8*ones(size(x0))));
if lower(method)=='l';
   for i=1:m;
      fb     = feval(func,x0-dev(:,i),varargin{:});
      J(:,i) = (f-fb)/dev(i,i);
   end;
elseif lower(method)=='r';
   for i=1:m;
      ff     = feval(func,x0+dev(:,i),varargin{:});
      J(:,i) = (ff-f)/dev(i,i);
   end;
elseif lower(method)=='c';
   for i=1:m;
      ff     = feval(func,x0+dev(:,i),varargin{:});
      fb     = feval(func,x0-dev(:,i),varargin{:});
      J(:,i) = (ff-fb)/(2*dev(i,i));
   end;
else
   error('Bad method specified')
end
4.1.3 Hessian

The Hessian matrix can be computed relying on the same approach as for the Jacobian matrix. Let us consider for example that we want to compute the second-order derivative of a function F : R → R using a central difference approach which, as we have seen, delivers higher accuracy. Let us first write the Taylor expansions of F(x+Δx) and F(x−Δx) up to order 3:

F(x+Δx) = F(x) + F'(x)Δx + (Δx²/2)F''(x) + (Δx³/6)F⁽³⁾(x) + (Δx⁴/4!)F⁽⁴⁾(ξ₁)
F(x−Δx) = F(x) − F'(x)Δx + (Δx²/2)F''(x) − (Δx³/6)F⁽³⁾(x) + (Δx⁴/4!)F⁽⁴⁾(ξ₂)

Summing the two equations, the odd-order terms cancel:

F(x+Δx) + F(x−Δx) = 2F(x) + Δx²F''(x) + (Δx⁴/4!)[F⁽⁴⁾(ξ₁) + F⁽⁴⁾(ξ₂)]

such that

F''(x) = (F(x+Δx) − 2F(x) + F(x−Δx))/Δx² − (Δx²/12)F⁽⁴⁾(ξ)

and the approximation error on the second-order derivative is O(Δx²).
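As a minimal sketch of this formula (not part of the original listings), the second derivative of F(x) = log(x) at x = 2 can be checked against its exact value −1/x²:

Matlab Code: Second Derivative by Central Differences (sketch)
F   = @(x) log(x);
x   = 2;
dx  = 1e-4;
d2F = (F(x+dx)-2*F(x)+F(x-dx))/dx^2;   % error is O(dx^2)
disp(abs(d2F-(-1/x^2)));               % compare with the exact value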
4.2 Numerical Integration

Numerical integration rules approximate an integral by a weighted sum of function evaluations:

∫ₐᵇ F(x)dx ≃ Σ_{i=1}^n ωᵢF(xᵢ)

where the coefficients ωᵢ depend on the method chosen to compute the integral. This approach to numerical integration is known as the quadrature problem. These methods essentially differ by (i) the weights that are assigned to each function evaluation and (ii) the nodes at which the function is evaluated. In fact, basic quadrature methods may be categorized in two broad classes:

1. The methods that are based on equally spaced data points: these are the Newton-Cotes formulas (the midpoint rule, the trapezoid rule and Simpson's rule).

2. The methods that are based on data points which are not equally spaced: these are the Gaussian quadrature formulas.
4.2.1 Newton-Cotes formulas

Newton-Cotes formulas replace the function by a simple interpolant on the interval and integrate that interpolant exactly (see figure 4.1).

[Figure 4.1: Newton-Cotes integration; the ordinates Y_a = F(a), Y_b = F(b) and Y_M = F((a+b)/2) are the values used by the rules.]

Midpoint rule

The midpoint rule evaluates the function at the center of the interval:

∫ₐᵇ F(x)dx = (b−a)F((a+b)/2) + ((b−a)³/24)F''(ξ)

where ξ ∈ [a; b], such that the approximate integral is given by

Î = (b−a)F((a+b)/2)

Note that this rule does not make any use of the end points. It is noteworthy that this approximation is far too coarse to be accurate, such that what is usually done is to break the interval [a; b] into smaller intervals and compute the approximation on each subinterval. The integral is then given by cumulating the subintegrals; we therefore end up with a composite rule. Hence, assume that the interval [a; b] is broken into n > 1 subintervals of size h = (b−a)/n, with midpoints xᵢ = a + (i − 1/2)h, the composite rule is

Îₙ = h Σ_{i=1}^n F(xᵢ)
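The composite midpoint rule can be coded in a few lines, mirroring the style of the Simpson listing reported below; the following is a sketch, not part of the original notes:

Matlab Code: Composite Midpoint Rule (sketch)
function mid=midpoint(func,a,b,n,varargin);
% func : function to be integrated
% a,b  : bounds of the interval
% n    : number of sub-intervals
h   = (b-a)/n;
x   = a+([1:n]-0.5)*h;             % midpoints of the n subintervals
y   = feval(func,x,varargin{:});
mid = h*sum(y);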
Trapezoid rule

The trapezoid rule essentially amounts to using a linear approximation of the function to be integrated between the two end points of the interval. This then defines the trapezoid {(a,0), (a,F(a)), (b,F(b)), (b,0)}, whose area, and consequently the approximate integral, is given by

Î = ((b−a)/2)(F(a) + F(b))
Indeed, the linear interpolant is

F(x) ≃ ((x−b)/(a−b))F(a) + ((x−a)/(b−a))F(b)

then

∫ₐᵇ F(x)dx ≃ ∫ₐᵇ [((x−b)/(a−b))F(a) + ((x−a)/(b−a))F(b)] dx
≃ (1/(b−a)) ∫ₐᵇ [(b−x)F(a) + (x−a)F(b)] dx
≃ (1/(b−a)) ∫ₐᵇ [(bF(a) − aF(b)) + x(F(b) − F(a))] dx
≃ bF(a) − aF(b) + (1/(b−a)) ∫ₐᵇ x(F(b) − F(a)) dx
≃ bF(a) − aF(b) + ((b²−a²)/(2(b−a)))(F(b) − F(a))
≃ bF(a) − aF(b) + ((b+a)/2)(F(b) − F(a))
≃ ((b−a)/2)(F(a) + F(b))
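The corresponding composite trapezoid rule, sketched below for completeness (it is not part of the original listings), simply gives half weight to the two end points:

Matlab Code: Composite Trapezoid Rule (sketch)
function trap=trapezoid(func,a,b,n,varargin);
% func : function to be integrated, a,b : bounds, n : sub-intervals
h    = (b-a)/n;
x    = a+[0:n]*h;
y    = feval(func,x,varargin{:});
y    = y(:);
trap = h*(sum(y)-(y(1)+y(n+1))/2);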
Simpson's rule

Simpson's rule attempts to circumvent an inefficiency of the trapezoid rule: a composite trapezoid rule may be far too coarse if F is smooth. An alternative is then to use a piecewise quadratic approximation of F that uses the values of F at a, b and (a+b)/2 as interpolating nodes. Figure 4.2 illustrates the rule. The thick line is the function F to be integrated and the thin line is the quadratic interpolant for this function. A quadratic interpolation may be obtained by the Lagrange interpolation formula, where ψ = (a+b)/2:

L(x) = ((x−ψ)(x−b)/((a−ψ)(a−b)))F(a) + ((x−a)(x−b)/((ψ−a)(ψ−b)))F(ψ) + ((x−a)(x−ψ)/((b−a)(b−ψ)))F(b)
Integrating L(x) over [a; b] term by term, and using h = (b−a)/2 (such that (a−ψ)(a−b) = 2h², (ψ−a)(ψ−b) = −h² and (b−a)(b−ψ) = 2h²), we get

I₁ = (F(a)/(2h²)) ∫ₐᵇ (x−ψ)(x−b)dx = (F(a)/(2h²)) [ (b³−a³)/3 − (ψ+b)(b²−a²)/2 + ψb(b−a) ] = (F(a)/(12h))(b² − 2ab + a²) = (h/3)F(a)

I₂ = −(F(ψ)/h²) ∫ₐᵇ (x−a)(x−b)dx = −(F(ψ)/h²) [ (b³−a³)/3 − (a+b)(b²−a²)/2 + ab(b−a) ] = (F(ψ)/(3h))(b−a)² = (4h/3)F(ψ)

I₃ = (F(b)/(2h²)) ∫ₐᵇ (x−a)(x−ψ)dx = (F(b)/(12h))(b² − 2ab + a²) = (h/3)F(b)

Summing the three terms, the approximate integral is

Î = I₁ + I₂ + I₃ = (h/3)[F(a) + 4F(ψ) + F(b)] = ((b−a)/6)[F(a) + 4F((a+b)/2) + F(b)]
If, as in the midpoint and trapezoid rules, we want to compute a better approximation of the integral by breaking [a; b] into an even number n > 2 of subintervals, we set h = (b−a)/n and xᵢ = a + ih, i = 0,…,n. The composite Simpson approximation is then

Îₙ = (h/3)[F(x₀) + 4F(x₁) + 2F(x₂) + 4F(x₃) + … + 2F(x_{n−2}) + 4F(x_{n−1}) + F(xₙ)]
Matlab Code: Simpson's Rule Integration
function simp=simpson(func,a,b,n,varargin);
%
% function simp=simpson(func,a,b,n,P1,...,Pn);
%
% func      : Function to be integrated
% a         : lower bound of the interval
% b         : upper bound of the interval
% n         : even number of sub-intervals => n+1 points
% P1,...,Pn : parameters of the function
%
h    = (b-a)/n;
x    = a+[0:n]*h;
y    = feval(func,x,varargin{:});
y    = y(:);                        % make sure y is a column vector
simp = h*(2*(1+rem(1:n-1,2))*y(2:n)+y(1)+y(n+1))/3;
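As a usage sketch, the following line approximates ∫₀¹ eˣdx = e − 1 with 20 subintervals:

simp = simpson(@exp,0,1,20);        % returns approximately 1.718282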
The preceding rules may also be used to approximate an integral over an unbounded domain, ∫_{−∞}^{+∞} F(x)dx, by setting a and b to large enough negative and positive values. However, this may be a particularly slow way of approximating the integral, and the next theorem provides an indirect way to achieve higher efficiency.
Theorem 1 If φ : R → R is a monotonically increasing C¹ function on the interval [a; b], then for any integrable function F(x) on [a; b] we have

∫ₐᵇ F(x)dx = ∫_{φ⁻¹(a)}^{φ⁻¹(b)} F(φ(y))φ'(y)dy

This theorem is just what we usually call a change of variables, and converts a problem where we want to integrate a function in the variable x into a perfectly equivalent problem where we integrate with respect to y, with y and x being related by the (possibly nonlinear) relation x = φ(y).
As an example, let us assume that we want to compute the average of a transformation of a Gaussian random variable x ∼ N(0,1). This is given by

(1/√(2π)) ∫_{−∞}^{+∞} G(x)e^{−x²/2}dx

such that F(x) = G(x)e^{−x²/2}/√(2π). Using the change of variable z = x/√2, the integral rewrites

(1/√π) ∫_{−∞}^{+∞} G(√2 z)e^{−z²}dz

We may instead map the real line into [0; 1] relying on

φ(y) = log(y/(1−y)) such that φ'(y) = 1/(y(1−y))

In this case, the integral rewrites

∫₀¹ F(log(y/(1−y))) · (1/(y(1−y))) dy

or, for the Gaussian example,

∫₀¹ h(y)dy with h(y) = (1/√π) G(√2 log(y/(1−y))) exp(−log²(y/(1−y))) / (y(1−y))
Table 4.1 reports the results for the different methods we have seen so far. As can be seen, the midpoint and the trapezoid rules perform pretty well with 20 subintervals, as the error is less than 1e-4, while Simpson's rule is less efficient as we need 40 subintervals to be able to reach a reasonable accuracy. We will see in the next section that there exist more efficient methods to deal with this type of problem.

Note that not all changes of variable are admissible. Indeed, in this case we might have used φ(y) = log(y/(1−y))^{1/4}, which also maps [0; 1] into R in a monotonically increasing way, but this would not have been an admissible transformation.
Table 4.1: Integration after a change of variable (each row uses an increasing number of subintervals; errors in parentheses)

Midpoint                 Trapezoid                Simpson
2.2232 (-0.574451)       1.1284 (0.520344)        1.5045 (0.144219)
1.6399 (0.0087836)       1.6758 (-0.0270535)      1.8582 (-0.209519)
1.6397 (0.00900982)      1.6579 (-0.00913495)     1.6519 (-0.0031621)
1.6453 (0.00342031)      1.6520 (-0.00332232)     1.6427 (0.00604608)
1.6488 (-4.31809e-005)   1.6487 (4.89979e-005)    1.6475 (0.00117277)
1.6487 (-2.92988e-006)   1.6487 (2.90848e-006)    1.6487 (-1.24547e-005)
4.2.2 Gaussian quadrature

Gaussian quadrature approximates an integral by a weighted sum

∫ₐᵇ F(x)dx ≃ Σ_{i=1}^n ωᵢF(xᵢ)

for some quadrature nodes xᵢ ∈ [a; b] and quadrature weights ωᵢ. All xᵢ's are arbitrarily set in Newton-Cotes formulas: as we have seen, we just imposed an equally spaced grid over the interval [a; b]. Then the weights ωᵢ follow from the fact that we want the approximation to be exact for any polynomial of order lower than or equal to the degree of the polynomial used to approximate the function. The question raised by Gaussian quadrature is then: "Isn't there a more efficient way to set the nodes and the weights?" The answer is clearly yes. The key point is then to try to get a good approximation to ∫ F(x)dx. In fact, Gaussian quadrature is much more general than simple integration, as it actually computes an approximation to the weighted integral

∫ₐᵇ F(x)w(x)dx ≃ Σ_{i=1}^n ωᵢF(xᵢ)
The nodes and weights are chosen relying on the family of polynomials {φ_k(x)} that are mutually orthogonal with respect to the weighting function w(x) on the interval [a; b]. A fundamental property of such a family is that φ_n possesses exactly n distinct real roots, all lying inside [a; b]. We will take advantage of this property: the nodes will be the roots of the orthogonal polynomial of order n, while the weights will be chosen such that the Gaussian formula is exact for lower-order polynomials,

∫ₐᵇ φ_k(x)w(x)dx = Σ_{i=1}^n ωᵢφ_k(xᵢ) for k = 0,…,n−1

This implies that the weights can be recovered by solving a linear system of the form

ω₁φ₀(x₁) + … + ωₙφ₀(xₙ) = ∫ₐᵇ w(x)dx
ω₁φ₁(x₁) + … + ωₙφ₁(xₙ) = 0
⋮
ω₁φ_{n−1}(x₁) + … + ωₙφ_{n−1}(xₙ) = 0

which rewrites Φω = Γ with

Φ = [ φ₀(x₁) … φ₀(xₙ) ; ⋮ ⋱ ⋮ ; φ_{n−1}(x₁) … φ_{n−1}(xₙ) ], ω = (ω₁,…,ωₙ)′ and Γ = (∫ₐᵇ w(x)dx, 0, …, 0)′

Note that the orthogonality property of the polynomials implies that the matrix Φ is invertible, such that ω = Φ⁻¹Γ. We now review the most commonly used Gaussian quadrature formulas.
Gauss-Chebyshev quadrature

This particular quadrature can be applied to problems that take the form

∫_{−1}^{1} F(x)(1−x²)^{−1/2}dx

for which we have

∫_{−1}^{1} F(x)(1−x²)^{−1/2}dx = (π/n) Σ_{i=1}^n F(xᵢ) + (π/2^{2n−1}) F^{(2n)}(ξ)/(2n)!

for some ξ ∈ [−1; 1], and where the nodes are given by the roots of the Chebyshev polynomial of order n:

xᵢ = cos((2i−1)π/(2n)), i = 1,…,n
It is obviously the case that we rarely have to compute an integral that exactly takes the form this quadrature imposes, and we are rather likely to compute ∫ₐᵇ F(x)dx. We then use the linear transformation

y = 2(x−a)/(b−a) − 1, implying dy = 2dx/(b−a)

such that

∫ₐᵇ F(x)dx = ((b−a)/2) ∫_{−1}^{1} F(a + (y+1)(b−a)/2) dy

To recover the Chebyshev weighting function, multiply and divide the integrand by (1−y²)^{−1/2}, that is integrate

G(y)(1−y²)^{−1/2} with G(y) = F(a + (y+1)(b−a)/2) √(1−y²)

such that

∫ₐᵇ F(x)dx ≃ (π(b−a)/(2n)) Σ_{i=1}^n F(a + (yᵢ+1)(b−a)/2) √(1−yᵢ²)

where the yᵢ's are the n Gauss-Chebyshev nodes on [−1; 1].
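Since the nodes are known in closed form, this formula is immediate to implement. As a sketch (not part of the original listings), the Gauss-Chebyshev approximation of ∫₀¹ eˣdx:

Matlab Code: Gauss-Chebyshev Quadrature (sketch)
n = 20; a = 0; b = 1;
i = 1:n;
y = cos((2*i-1)*pi/(2*n));                   % Chebyshev nodes on [-1;1]
x = a+(y+1)*(b-a)/2;                         % nodes mapped to [a;b]
I = pi*(b-a)/(2*n)*sum(exp(x).*sqrt(1-y.^2));
disp(abs(I-(exp(1)-1)));                     % approximation error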
Gauss-Legendre quadrature

This quadrature deals with the simple weighting function w(x) = 1 and approximates integrals of the form ∫_{−1}^{1} F(x)dx by

∫_{−1}^{1} F(x)dx = Σ_{i=1}^n ωᵢF(xᵢ) + Rₙ

for some ξ ∈ [−1; 1], where the residual Rₙ is proportional to F^{(2n)}(ξ). In this case, both the nodes and the weights are non-trivial to compute. Nevertheless, we can generate the nodes using any root-finding procedure, and the weights can be computed as explained earlier, noting that ∫_{−1}^{1} w(x)dx = 2. As before, an integral over [a; b] is handled with the linear transformation

y = 2(x−a)/(b−a) − 1, implying dy = 2dx/(b−a)

which yields

∫ₐᵇ F(x)dx ≃ ((b−a)/2) Σ_{i=1}^n ωᵢF(a + (yᵢ+1)(b−a)/2)

where yᵢ and ωᵢ are the Gauss-Legendre nodes and weights over the interval [−1; 1].
Such a simple formula has a direct implication when we want to compute the discounted value of an asset, the welfare of an agent or the discounted sum of profits in a finite-horizon problem, as it can be computed solving the integral

∫₀ᵀ e^{−θt}π(t)dt

in the case of a profit function π(t). However, it will often be the case that we will want to compute such quantities in an infinite-horizon model, something that this quadrature method cannot achieve unless considering a change of variable of the kind we studied earlier. Nevertheless, there exists a specific Gaussian quadrature that can achieve this task.

As an example of the potential of the Gauss-Legendre quadrature formula, we compute the welfare of an individual over a horizon of length T. Time is continuous and the welfare function takes the form

W = ∫₀ᵀ e^{−θt} (c(t)^σ/σ) dt

where we assume that c(t) = c* e^{−γt}. Results for n = 2, 4, 8 and 12 and T = 10, 50, 100 and 1000 (the latter as an approximation to an infinite horizon) are reported in table 4.2, where we set γ = 0.01, θ = 0.05 and c* = 1. As can be seen from the table, the integral converges pretty fast to the true value, as the absolute error is almost zero for n ≥ 8, except for T = 1000. Note that even with n = 4 a quite high level of accuracy can be achieved in most of the cases.
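A sketch of this computation is reported below; it assumes a routine gauss_leg(n) returning the Gauss-Legendre nodes and weights on [−1;1], which is not listed in these notes, and uses the welfare function as specified above:

Matlab Code: Welfare over a Finite Horizon (sketch)
theta = 0.05; gam = 0.01; cs = 1;    % calibration used in table 4.2
sig   = -1; T = 100; n = 8;
[y,w] = gauss_leg(n);                % hypothetical nodes-and-weights routine
t     = (y+1)*T/2;                   % nodes mapped to [0;T]
W     = (T/2)*sum(w.*exp(-theta*t).*(cs*exp(-gam*t)).^sig/sig);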
Gauss-Laguerre quadrature

This particular quadrature can be applied to problems that take the form

∫₀^∞ F(x)e^{−x}dx

The weighting function is w(x) = e^{−x}, and the approximation is then given by

∫₀^∞ F(x)e^{−x}dx = Σ_{i=1}^n ωᵢF(xᵢ) + ((n!)²/(2n)!) F^{(2n)}(ξ)

for some ξ ∈ [0; ∞). Here again, both the nodes and the weights are non-trivial to compute, but we can generate the nodes using any root-finding procedure, and the weights can be computed as explained earlier, noting that ∫₀^∞ w(x)dx = 1.
Consider now the welfare function over an infinite horizon, W = ∫₀^∞ e^{−θt}u(c(t))dt. The problem involves a discount rate θ that should be eliminated to stick to the Gauss-Laguerre form: using the change of variable y = θt, the integral rewrites

W = (1/θ) ∫₀^∞ e^{−y} u(c(y/θ)) dy

and can be approximated by

(1/θ) Σ_{i=1}^n ωᵢ u(c(yᵢ/θ))
Table 4.2: Welfare in finite horizon (Gauss-Legendre quadrature; errors in parentheses)

T=10
n    σ=-2.5                     σ=-1                       σ=0.5                   σ=0.9
2    -3.5392 (-3.19388e-006)    -8.2420 (-4.85944e-005)    15.3833 (0.000322752)   8.3929 (0.000232844)
4    -3.5392 (-3.10862e-014)    -8.2420 (-3.01981e-012)    15.3836 (7.1676e-011)   8.3931 (6.8459e-011)
8    -3.5392 (0)                -8.2420 (1.77636e-015)     15.3836 (1.77636e-015)  8.3931 (-1.77636e-015)
12   -3.5392 (-4.44089e-016)    -8.2420 (0)                15.3836 (3.55271e-015)  8.3931 (1.77636e-015)

T=50
2    -11.4098 (-0.00614435)     -21.5457 (-0.0708747)      33.6783 (0.360647)      17.6039 (0.242766)
4    -11.4159 (-3.62327e-008)   -21.6166 (-2.71432e-006)   34.0389 (4.87265e-005)  17.8467 (4.32532e-005)
8    -11.4159 (3.55271e-015)    -21.6166 (3.55271e-015)    34.0390 (7.10543e-015)  17.8467 (3.55271e-015)
12   -11.4159 (-3.55271e-015)   -21.6166 (-7.10543e-015)   34.0390 (1.42109e-014)  17.8467 (7.10543e-015)

T=100
2    -14.5764 (-0.110221)       -23.6040 (-0.938113)       32.5837 (3.63138)       16.4972 (2.28361)
4    -14.6866 (-1.02204e-005)   -24.5416 (-0.000550308)    36.2078 (0.00724483)    18.7749 (0.00594034)
8    -14.6866 (3.55271e-015)    -24.5421 (-1.03739e-012)   36.2150 (1.68896e-010)  18.7808 (2.39957e-010)
12   -14.6866 (-5.32907e-015)   -24.5421 (-1.77636e-014)   36.2150 (2.84217e-014)  18.7808 (1.77636e-014)

T=1000
2    -1.0153 (-14.9847)         -0.1066 (-24.8934)         0.0090 (36.3547)        0.0021 (18.8303)
4    -12.2966 (-3.70336)        -10.8203 (-14.1797)        7.6372 (28.7264)        3.2140 (15.6184)
8    -15.9954 (-0.00459599)     -24.7917 (-0.208262)       34.7956 (1.56803)       17.7361 (1.09634)
12   -16.0000 (-2.01256e-007)   -24.9998 (-0.000188532)    36.3557 (0.00798507)    18.8245 (0.00784393)
where yᵢ and ωᵢ are the Gauss-Laguerre nodes and weights over the interval [0; ∞). As before, we assume that c(t) = c*e^{−γt}. Results for n = 2, 4, 8 and 12 are reported in table 4.3, where we set γ = 0.01, θ = 0.05 and c* = 1. As can be seen from the table, the integral converges pretty fast to the true value, as the absolute error is almost zero for n ≥ 8. It is worth noting that the method performs far better than the Gauss-Legendre quadrature method with T = 1000. Note that even with n = 4 a quite high level of accuracy can be achieved in some cases.
Table 4.3: Welfare in infinite horizon (errors in parentheses)
n    σ=-2.5                     σ=-1                      σ=0.5                    σ=0.9
2    -15.6110 (0.388994)        -24.9907 (0.00925028)     36.3631 (0.000517411)    18.8299 (0.00248525)
4    -15.9938 (0.00622584)      -25.0000 (1.90929e-006)   36.3636 (3.66246e-009)   18.8324 (1.59375e-007)
8    -16.0000 (1.26797e-006)    -25.0000 (6.03961e-014)   36.3636 (0)              18.8324 (0)
12   -16.0000 (2.33914e-010)    -25.0000 (0)              36.3636 (0)              18.8324 (3.55271e-015)
Gauss-Hermite quadrature

This type of quadrature will be particularly useful when we consider stochastic processes with Gaussian distributions, as it approximates integrals of the type

∫_{−∞}^{+∞} F(x)e^{−x²}dx

for which we have

∫_{−∞}^{+∞} F(x)e^{−x²}dx = Σ_{i=1}^n ωᵢF(xᵢ) + (n!√π/2ⁿ) F^{(2n)}(ξ)/(2n)!

for some ξ ∈ (−∞; ∞). Here again, both the nodes and the weights are non-trivial to compute. The nodes can be computed using any root-finding procedure, and the weights can be computed as explained earlier, noting that ∫_{−∞}^{+∞} w(x)dx = √π.

A typical application is the computation of the expectation of a function of a Gaussian random variable x ∼ N(μ, σ²):

(1/(σ√(2π))) ∫_{−∞}^{+∞} F(x)e^{−(x−μ)²/(2σ²)}dx

In order to stick to the problem this type of approach can explicitly solve, we need to transform the variable using the linear map

y = (x−μ)/(σ√2)

such that the expectation rewrites

(1/√π) ∫_{−∞}^{+∞} F(σ√2 y + μ)e^{−y²}dy

and can therefore be approximated by

(1/√π) Σ_{i=1}^n ωᵢF(σ√2 yᵢ + μ)

where yᵢ and ωᵢ are the Gauss-Hermite nodes and weights over the interval (−∞; ∞).

As a first example, let us compute the average of a lognormal distribution, that is log(X) ∼ N(μ, σ²), for which we know that E(X) = exp(μ + σ²/2). Table 4.4 reports the approximation for μ = 0 and different values of σ.
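A minimal sketch of this computation, using the gauss_herm routine already used in these notes:

Matlab Code: Mean of a Lognormal Distribution (sketch)
mu = 0; sig = 0.5; n = 8;
[y,w] = gauss_herm(n);                 % Gauss-Hermite nodes and weights
EX = sum(w.*exp(sqrt(2)*sig*y+mu))/sqrt(pi);
disp([EX exp(mu+sig^2/2)]);            % compare with the exact mean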
Table 4.4: Gauss-Hermite quadrature (mean of a lognormal distribution; errors in parentheses)
n    σ=0.01                   σ=0.1                    σ=0.5                   σ=1.0                   σ=2.0
2    1.00005 (8.33353e-10)    1.00500 (8.35280e-06)    1.12763 (0.00552249)    1.54308 (0.105641)      3.76219 (3.62686)
4    1.00005 (2.22045e-16)    1.00501 (5.96634e-12)    1.13315 (2.46494e-06)   1.64797 (0.000752311)   6.99531 (0.393743)
8    1.00005 (2.22045e-16)    1.00501 (4.44089e-16)    1.13315 (3.06422e-14)   1.64872 (2.44652e-09)   7.38873 (0.00032857)
12   1.00005 (3.55271e-15)    1.00501 (3.55271e-15)    1.13315 (4.88498e-15)   1.64872 (1.35447e-14)   7.38906 (3.4044e-08)
This quadrature is also particularly helpful for the discretization of the shocks that we will face when we deal with methods for solving rational expectations models. In fact, we will often face shocks that follow Gaussian AR(1) processes

x_{t+1} = ρx_t + (1−ρ)x̄ + ε_{t+1}

where ε_{t+1} ∼ N(0, σ²). This implies that

∫ f(x_{t+1}|x_t)dx_{t+1} = ∫ (1/(σ√(2π))) exp{ −(1/2)((x_{t+1} − ρx_t − (1−ρ)x̄)/σ)² } dx_{t+1} = 1
which illustrates the fact that x is a continuous random variable. The question we now ask is: does there exist a discrete representation of x which is equivalent to its continuous representation? The answer to this question is yes, as shown in Tauchen and Hussey [1991]. Tauchen and Hussey propose to replace the integral by

∫ (f(x_{t+1}|x_t)/f(x_{t+1}|x̄)) f(x_{t+1}|x̄)dx_{t+1} ≡ ∫ Φ(x_{t+1}; x_t, x̄) f(x_{t+1}|x̄)dx_{t+1} = 1

where f(x_{t+1}|x̄) denotes the density of x_{t+1} conditional on the fact that x_t = x̄, such that

Φ(x_{t+1}; x_t, x̄) = f(x_{t+1}|x_t)/f(x_{t+1}|x̄) = exp{ −(1/2)[ ((x_{t+1} − ρx_t − (1−ρ)x̄)/σ)² − ((x_{t+1} − x̄)/σ)² ] }

Then we can use the standard linear transformation and impose y_t = (x_t − x̄)/(σ√2) to get

(1/√π) ∫ Φ(y_{t+1}; y_t, x̄) e^{−y_{t+1}²} dy_{t+1} = 1

for which we can use a Gauss-Hermite quadrature. Assume then that we have n nodes, such that, for each current state i,

(1/√π) Σ_{j=1}^n ωⱼ Φ(yⱼ; yᵢ; x̄) ≃ 1

The term (ωⱼ/√π)Φ(yⱼ; yᵢ; x̄) can be interpreted as an estimate π̂ᵢⱼ of the transition probability from state i to state j, but remember that the quadrature is just an approximation, such that Σ_{j=1}^n π̂ᵢⱼ = 1 will generally not hold exactly. Tauchen and Hussey therefore propose the following modification:

π̂ᵢⱼ = (ωⱼ/√π) Φ(yⱼ; yᵢ; x̄)/sᵢ where sᵢ = (1/√π) Σ_{j=1}^n ωⱼ Φ(yⱼ; yᵢ; x̄)
For instance, with n = 4 nodes we obtain a transition matrix of the form

Π = [ 0.7330  0.2557  0.0113  0.0000
      0.1745  0.5964  0.2214  0.0077
      0.0077  0.2214  0.5964  0.1745
      0.0000  0.0113  0.2557  0.7330 ]

meaning for instance that we stay in state 1 with probability 0.7330, but will transit from state 2 to state 3 with probability 0.2214.
Matlab Code: Discretization of an AR(1)
n     = 4;        % number of nodes
xbar  = 0;        % mean of the x process
rho   = 0.95;     % persistence parameter
sigma = 0.01;     % volatility
[xx,wx] = gauss_herm(n);            % nodes and weights for x
x_d = sqrt(2)*sigma*xx+xbar;        % discrete states
x   = xx(:,ones(n,1));              % current state (in rows)
y   = x';                           % next-period state (in columns)
w   = wx(:,ones(n,1))';             % weights attached to next-period nodes
%
% computation
%
px = (exp(y.*y-(y-rho*x).*(y-rho*x)).*w)./sqrt(pi);
sx = sum(px,2);
px = px./sx(:,ones(n,1));           % normalize each row to sum to one
4.2.3 Potential problems

In all the cases we dealt with in the previous sections, the integrals were well defined (up to some examples), but there may exist singularities in the function such that the integral is improper. For instance, think of integrating 1/√x over [0; 1]: the function diverges at 0. How will the methods we presented in the previous sections perform? The following theorem, by Davis and Rabinowitz [1984], states that standard methods can still be used.

Theorem 3 Assume that there exists a continuous monotonically increasing function G : [0; 1] → R such that ∫₀¹ G(x)dx < ∞ and |F(x)| ≤ |G(x)| on [0; 1]; then the Newton-Cotes rule (with F(1) = 0 to avoid the singularity in 1) and the Gauss-Legendre quadrature rule converge to ∫₀¹ F(x)dx as n increases to ∞.

Therefore, we can still apply standard methods to compute such integrals, but convergence is much slower and the error formulas cannot be used anymore, as ‖F^{(k)}(x)‖_∞ is infinite for k ≥ 1. If we still want to rely on error bounds, the singularity should first be removed using a change of variables.
4.2.4 Multivariate integration

The simplest way to extend the one-dimensional quadrature approach to higher dimensions is to rely on product rules, which essentially amount to multiplying sums. For instance, let xᵢᵏ and ωᵢᵏ, i = 1,…,n_k, be the quadrature nodes and weights of the one-dimensional problem along dimension k ∈ {1,…,s}, which can be obtained either from a Newton-Cotes or a Gaussian quadrature formula. The s-dimensional integral is then approximated by

Σ_{i₁=1}^{n₁} … Σ_{i_s=1}^{n_s} ω_{i₁}¹ ⋯ ω_{i_s}ˢ F(x_{i₁}¹, …, x_{i_s}ˢ)

A potential difficulty with this approach is that when the dimension of the space increases, the computational cost increases exponentially; this is the so-called curse of dimensionality. Therefore, this approach should be restricted to low-dimensional problems.
As an example of the use of this type of method, let us assume that we want to compute the first-order moment of the two-dimensional function F(x₁,x₂), where

(x₁, x₂)′ ∼ N(μ, Σ) with μ = (μ₁, μ₂)′ and Σ = [ σ₁₁ σ₁₂ ; σ₁₂ σ₂₂ ]

that is, the integral

(2π)^{−1}|Σ|^{−1/2} ∫∫ F(x₁,x₂) exp{ −(1/2)(x−μ)′Σ⁻¹(x−μ) } dx₁dx₂

where x = (x₁,x₂)′. Let Ω be the Cholesky decomposition of Σ, such that Σ = ΩΩ′, and let us make the change of variable

y = Ω⁻¹(x−μ)/√2 ⟺ x = √2 Ωy + μ

Then the integral rewrites

(1/π) ∫∫ F(√2 Ωy + μ) exp( −Σ_{i=1}^2 yᵢ² ) dy₁dy₂

which a product Gauss-Hermite rule approximates as

(1/π) Σ_{i₁=1}^{n₁} Σ_{i₂=1}^{n₂} ω_{i₁}¹ ω_{i₂}² F( √2 ω₁₁y_{i₁} + μ₁ , √2(ω₂₁y_{i₁} + ω₂₂y_{i₂}) + μ₂ )

where ω_{kℓ} denotes the (k,ℓ) element of Ω. In the experiment we set

Σ = [ 0.0100 0.0075 ; 0.0075 0.0200 ]

and F(x₁,x₂) = (e^{x₁} − e^{μ₁})(e^{x₂} − e^{μ₂}). The results are reported in table 4.5, where we consider different values for n₁ and n₂. It appears that the method performs well pretty fast, as the true value for the integral is 0.01038358129717, which is attained for n₁ ≥ 8 and n₂ ≥ 8.
Table 4.5: 2D Gauss-Hermite quadrature
n₁\n₂   2                  4                  8                  12
2       0.01029112845254   0.01038328639869   0.01038328710679   0.01038328710679
4       0.01029142086814   0.01038358058862   0.01038358129674   0.01038358129674
8       0.01029142086857   0.01038358058906   0.01038358129717   0.01038358129717
12      0.01029142086857   0.01038358058906   0.01038358129717   0.01038358129717
Matlab Code: 2D Gauss-Hermite Quadrature
n       = 2;                            % dimension of the problem
n1      = 8;                            % nodes along the first dimension
[x1,w1] = gauss_herm(n1);
n2      = 8;                            % nodes along the second dimension
[x2,w2] = gauss_herm(n2);
Sigma   = [0.01 0.0075;0.0075 0.02];    % covariance matrix
Omega   = chol(Sigma)';                 % Cholesky decomposition (lower triangular)
mu1     = 0;                            % means (values not shown in the original)
mu2     = 0;
int     = 0;
for i=1:n1;
   for j=1:n2;
      x12 = sqrt(2)*Omega*[x1(i);x2(j)]+[mu1;mu2];
      f   = (exp(x12(1))-exp(mu1))*(exp(x12(2))-exp(mu2));
      int = int+w1(i)*w2(j)*f;
   end
end
int = int/sqrt(pi^n);
4.2.5 Monte-Carlo integration

Monte-Carlo methods evaluate ∫ₐᵇ F(x)dx by drawing N random numbers x₁,…,x_N uniformly over [a; b] and computing the sample average

Î_F = ((b−a)/N) Σ_{i=1}^N F(xᵢ)

which converges to the integral by the law of large numbers. In practice the draws are produced by deterministic pseudo-random number generators.³

³ There have been attempts to build truly random number generators, but these techniques were far too costly and awkward.

⁴ Generating a two-dimensional sequence may be done by extracting subsequences: y_k = (x_{2k+1}, x_{2k+2}).
The weak statistical properties of the resulting numbers led to push linear congruential methods into disfavor; the solution has been to design more complicated generators. An example of those generators, quoted by Judd [1998], is the multiple prime random number generator, for which we report the Matlab code. This pseudo-random number generator, proposed by Haas [1987], generates integers between 0 and 99999, such that dividing the sequence by 100,000 returns numbers that approximate a uniform random variable over [0;1] with 5-digit precision. If higher precision is needed, the sequence may just be concatenated using the scheme (for 8-digit precision) 100,000·x_{2k} + x_{2k+1}. The advantage of this generator is its very long period, such that it passes a lot of randomness tests.

Matlab Code: Multiple Prime Random Number Generator
long = 10000;                  % length of the sample
m    = 971;
ia   = 11113;
ib   = 104322;
x    = zeros(long,1);
x(1) = 481;
for i= 2:long;
   m  = m+7;
   ia = ia+1907;
   ib = ib+73939;
   if m>=9973;m=m-9871;end
   if ia>=99991;ia=ia-89989;end
   if ib>=224729;ib=ib-96233;end
   x(i) = mod(x(i-1)*m+ia+ib,100000);
end
x = x/100000;                  % map the integers into [0;1]

[Figure 4.4: successive draws (x_k, x_{k+1}) of the generator.]
A key feature of all these random number generators is that they attempt to draw numbers from a uniform distribution over the interval [0;1]. There may however be some cases where we would like to draw numbers from another distribution, mainly the normal distribution. The way to handle this problem is then to invert the cumulative density function of the distribution we want to generate, to get a random draw from this particular distribution. More formally, assume we want numbers generated from the distribution F(·), and we have a draw {xᵢ}_{i=1}^N from the uniform distribution; then the draw {yᵢ}_{i=1}^N defined by yᵢ = F⁻¹(xᵢ) follows the distribution F. Inverting this function may be trivial in some cases (say the uniform over [a;b]), but may be more involved in others.
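For the normal distribution the inverse cdf is available through the built-in erfinv function, as the following sketch (not part of the original listings) illustrates:

Matlab Code: Drawing Normal Numbers by Inversion (sketch)
x = rand(10000,1);                 % uniform draws on [0;1]
y = sqrt(2)*erfinv(2*x-1);         % y = F^{-1}(x) for the standard N(0,1)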
Since the Monte-Carlo estimate is itself a random variable, it comes with a standard error. Recall that

var( (1/N) Σ_{i=1}^N Xᵢ ) = σ²/N where σ² = var(Xᵢ)

which may be estimated by

σ̂² = (1/(N−1)) Σ_{i=1}^N (Xᵢ − X̄)² with X̄ = (1/N) Σ_{i=1}^N Xᵢ

such that the variance of the crude Monte-Carlo estimator is evaluated as

σ̂²_{Î_F} = (1/(N−1)) Σ_{i=1}^N (F(xᵢ) − Î_F)²
We report in table 4.6 the results obtained when integrating the exponential function over [0;1]. This table illustrates why Monte-Carlo integration is seldom used on such simple problems: accuracy improves only at rate 1/√N, so many function evaluations are needed to reach a given precision.

Table 4.6: Crude Monte-Carlo example: ∫₀¹ eˣdx
N         Î_f          σ̂
10        1.54903750   0.13529216
100       1.69945455   0.05408852
1000      1.72543465   0.01625793
10000     1.72454262   0.00494992
100000    1.72139292   0.00156246
1000000   1.71853252   0.00049203
A first way to improve over crude Monte-Carlo is to use antithetic variates: each draw xᵢ is paired with its mirror image 1−xᵢ, which is perfectly negatively correlated with it, and the estimator is

Î_f^A = (1/(2N)) Σ_{i=1}^N (F(xᵢ) + F(1−xᵢ))

Results are reported in table 4.7.⁵

⁵ Note that we used the same seed when generating this integral and the one we generated using crude Monte-Carlo.
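A side-by-side sketch of the two estimators (not part of the original listings; both use the same uniform draws):

Matlab Code: Crude vs. Antithetic Monte-Carlo (sketch)
N  = 10000;
x  = rand(N,1);
Ic = mean(exp(x));                 % crude estimator of int_0^1 exp(x)dx
Ia = mean((exp(x)+exp(1-x))/2);    % antithetic estimator, same draws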
Table 4.7: Antithetic variates example: ∫₀¹ eˣdx
N         Î_f          σ̂
10        1.71170096   0.02061231
100       1.73211884   0.00908890
1000      1.72472178   0.00282691
10000     1.71917393   0.00088709
100000    1.71874441   0.00027981
1000000   1.71827383   0.00008845
Another approach, stratified sampling, breaks the interval into subintervals and samples separately within each of them. Splitting [0;1] at α, the estimator is

Î_f^s = (α/N_a) Σ_{i=1}^{N_a} F(xᵢᵃ) + ((1−α)/N_b) Σ_{i=1}^{N_b} F(xᵢᵇ)

where xᵢᵃ ∈ [0; α] and xᵢᵇ ∈ [α; 1]. Then the variance of this estimator is given by

(α²/N_a) var_a(F(x)) + ((1−α)²/N_b) var_b(F(x))

which, setting N_a = αN and N_b = (1−α)N, equals

(α/N) var_a(F(x)) + ((1−α)/N) var_b(F(x))

Table 4.8 reports results for the exponential function for α = 0.25. As can be seen from the table, up to the 10-point example, there is no difference between the crude Monte-Carlo method and the stratified sampling approach in the evaluation of the integral, and we find a potential gain in the use of this approach in the variance of the estimates. The potential problem that remains to be fixed is: how should α be selected? In fact, we would like to select α such that we minimize the volatility of the estimator.
Table 4.8: Stratified sampling example: ∫₀¹ eˣdx
N        Î_f          σ̂
10       1.52182534   0.11224567
100      1.69945455   0.04137204
1000     1.72543465   0.01187637
10000    1.72454262   0.00359030
100000   1.72139292   0.00114040
Control variates exploit a function φ that is close to F but whose integral is known analytically, writing

∫₀¹ F(x)dx = ∫₀¹ (F(x) − φ(x))dx + ∫₀¹ φ(x)dx

so that Monte-Carlo is applied only to the difference F − φ, which has a much smaller variance. For the exponential function a natural choice is φ(x) = 1 + x, whose integral over [0;1] is 1.5. Table 4.9 reports the results. As can be seen, the method performs a little worse than the antithetic variates, but far better than the crude Monte-Carlo.
Table 4.9: Control variates example: ∫₀¹ eˣdx
N         Î_f          σ̂
10        1.64503465   0.05006855
100       1.71897083   0.02293349
1000      1.72499149   0.00688639
10000     1.72132486   0.00210111
100000    1.71983807   0.00066429
1000000   1.71838279   0.00020900
Importance sampling rewrites the integral using a density G(x) on the domain D:

∫_D F(x)dx = ∫_D (F(x)/G(x)) G(x)dx = ∫_D H(x)G(x)dx

with H(x) ≡ F(x)/G(x), such that the integral is the expectation of H(x) when x is drawn from the density G, and may be estimated by (1/N) Σ_{i=1}^N H(xᵢ) with the xᵢ's drawn from G. The variance of this estimator is

(1/N) [ ∫_D (F(x)²/G(x))dx − (∫_D F(x)dx)² ]

which becomes small when G is close to being proportional to F.
Table 4.10: Importance sampling example: ∫₀¹ eˣdx
N         Î_f          σ̂
10        1.54903750   0.04278314
100       1.69945455   0.00540885
1000      1.72543465   0.00051412
10000     1.72454262   0.00004950
100000    1.72139292   0.00000494
1000000   1.71853252   0.00000049
4.2.6 Quasi-Monte-Carlo methods

Quasi-Monte-Carlo methods replace the pseudo-random draws by deterministic sequences that fill the domain evenly.

Definition 1 A sequence {xᵢ}_{i=1}^∞ ⊂ D ⊂ Rⁿ is said to be equidistributed over D iff

lim_{N→∞} (μ(D)/N) Σ_{i=1}^N F(xᵢ) = ∫_D F(x)dx

for all Riemann-integrable F, where μ(D) is the Lebesgue measure of D. In the one-dimensional case D = [a; b], this is

lim_{N→∞} ((b−a)/N) Σ_{i=1}^N F(xᵢ) = ∫ₐᵇ F(x)dx

Remember that the fractional part of a number is the part that lies right after the dot. It is denoted by {·}, such that {2.5} = 0.5, and can be computed as

{x} = x − max{k ∈ Z | k ≤ x}

The Matlab function that returns this component is x-fix(x).
Popular equidistributed sequences include:

Weyl: ({k√p₁},…,{k√pₙ})
Haber: ({(k(k+1)/2)√p₁},…,{(k(k+1)/2)√pₙ})
Niederreiter: ({k·2^{1/(n+1)}},…,{k·2^{n/(n+1)}})
Baker: ({k e^{r₁}},…,{k e^{rₙ}}), where the rᵢ's are rational and distinct numbers

In all these cases, the p's are usually prime numbers. Figure 4.6 reports a two-dimensional sample of 1000 points for each type of sequence.
[Figure 4.6: Quasi-Monte-Carlo sequences; panels: Weyl, Haber, Niederreiter and Baker sequences.]

There obviously exist other ways of obtaining sequences for quasi-Monte-Carlo methods that rely on low-discrepancy approaches, Fourier methods, or the so-called good lattice points approach. The interested reader may refer to chapter 9 in Judd [1998], but we will not investigate this any further, as this would bring us far away from our initial purpose.
Matlab Code: Equidistributed Sequences
n   = 2;                       % dimension of the space
nb  = 1000;                    % number of data points
K   = [1:nb]';                 % k=1,...,nb (column vector)
seq = 'NIEDERREITER';          % type of sequence
switch upper(seq)
case 'WEYL'                    % Weyl
   p = sqrt(primes(n+1));
   x = K*p;
   x = x-fix(x);
case 'HABER'                   % Haber
   p = sqrt(primes(n+1));
   x = (K.*(K+1)./2)*p;
   x = x-fix(x);
case 'NIEDERREITER'            % Niederreiter
   x = K*(2.^((1:n)/(1+n)));
   x = x-fix(x);
case 'BAKER'                   % Baker
   x = K*exp(1./primes(n+1));
   x = x-fix(x);
otherwise
   error('Unknown sequence requested')
end
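As a usage sketch (not part of the original listings), a one-dimensional Weyl sequence may replace the random draws in the crude Monte-Carlo estimator of ∫₀¹ eˣdx:

Matlab Code: Quasi-Monte-Carlo Integration (sketch)
nb = 10000;
K  = (1:nb)';
x  = K*sqrt(2);                    % one-dimensional Weyl sequence with p=2
x  = x-fix(x);
I  = mean(exp(x));                 % estimate of int_0^1 exp(x)dx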
Table 4.11: Quasi-Monte-Carlo example: ∫₀¹ eˣdx (errors in parentheses)
N         Weyl                        Haber                      Niederreiter                Baker
10        1.67548650 (0.0427953)      1.72014839 (0.00186656)    1.67548650 (0.0427953)      1.82322097 (0.104939)
100       1.71386433 (0.0044175)      1.75678423 (0.0385024)     1.71386433 (0.0044175)      1.71871676 (0.000434929)
1000      1.71803058 (0.000251247)    1.71480932 (0.00347251)    1.71803058 (0.000251247)    1.71817437 (0.000107457)
10000     1.71830854 (2.67146e-005)   1.71495774 (0.00332409)    1.71830854 (2.67146e-005)   1.71829897 (1.71431e-005)
100000    1.71829045 (8.62217e-006)   1.71890493 (0.000623101)   1.71829045 (8.62217e-006)   1.71827363 (8.20223e-006)
1000000   1.71828227 (4.36844e-007)   1.71816697 (0.000114855)   1.71828227 (4.36844e-007)   1.71828124 (5.9314e-007)
Bibliography

Davis, P.J. and P. Rabinowitz, Methods of Numerical Integration, New York: Academic Press, 1984.

Judd, K.L., Numerical Methods in Economics, Cambridge, Massachusetts: MIT Press, 1998.

Tauchen, G. and R. Hussey, "Quadrature-Based Methods for Obtaining Approximate Solutions to Nonlinear Asset Pricing Models", Econometrica, 1991, 59 (2), 371-396.
Lecture Notes 5

Solving nonlinear systems of equations

5.1 Solving one-dimensional problems

5.1.1 Fixed point iteration

The idea here is to express the problem as a fixed point, such that we would like to solve the one-dimensional problem

x = f(x)    (5.1)

by the simple iterative scheme x_{k+1} = f(x_k).
Let us start from x₀ = 0.95. The simple iterative scheme is found to be diverging, as illustrated in figure 5.1 and shown in table 5.1. Why? Simply because the function is not Lipschitz bounded in a neighborhood of the initial condition! Nevertheless, as soon as we use the damped scheme x_{k+1} = x_k + λ_k(f(x_k) − x_k) with λ₀ = 1 and λ_k = 0.99λ_{k−1}, the algorithm is able to find a solution, as illustrated in table 5.2. In fact, this trick is a numerical way to circumvent the fact that the function is not Lipschitz bounded.
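The damped scheme can be sketched as follows; the function f below is a hypothetical illustration, not the notes' example function:

Matlab Code: Damped Fixed-Point Iteration (sketch)
f      = @(x) sqrt(x+1)-0.1*x;     % hypothetical f, not the notes' example
x      = 0.95;                     % initial condition
lambda = 1;                        % initial damping
for k = 1:1000;
   x = x+lambda*(f(x)-x);          % damped update
   if abs(f(x)-x)<1e-8; break; end
   lambda = 0.99*lambda;           % geometric damping as in the text
end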
[Figure 5.1: F(x) against x; the simple iterative scheme diverges from x₀ = 0.95.]
Table 5.1: Simple iterative scheme
x_k          |f(x_k) − x_k|
1.011686     3.558355e-001
0.655850     3.434699e+000
4.090549     7.298439e+001
77.074938    n.c.
Table 5.2: Damped iterative scheme
x_k        λ_k      |f(x_k) − x_k|
1.011686   1.0000   3.558355e-001
1.011686   0.9900   3.558355e-001
1.007788   0.9801   3.313568e-001
1.000619   0.9703   2.856971e-001
0.991509   0.9606   2.264715e-001
0.982078   0.9510   1.636896e-001
0.973735   0.9415   1.068652e-001
0.967321   0.9321   6.234596e-002
0.963024   0.9227   3.209798e-002
0.960526   0.9135   1.435778e-002
0.959281   0.9044   5.467532e-003
0.958757   0.8953   1.723373e-003
0.958577   0.8864   4.314821e-004
0.958528   0.8775   8.035568e-005
0.958518   0.8687   9.863215e-006
0.958517   0.8601   5.907345e-007
5.1.2 Bisection method

The bisection method only requires the function to be continuous on an interval [a; b] such that f(a)f(b) < 0. Starting from x₀ = a and x₁ = b, one computes the midpoint

x₂ = (x₀ + x₁)/2

and keeps the half-interval over which the function changes sign, therefore generating a sequence of nested intervals I₀ ⊃ I₁ ⊃ I₂ ⊃ … whose length is halved at each iteration.

[Figure 5.2: the nested bisection intervals I₀, I₁, I₂ on [a; b].]
As can be seen, it takes more iterations than the previous iterative scheme (27 iterations for a = 0.5 and b = 1.5, and still 19 with a = 0.95 and b = 0.96!), but the bisection method is actually implementable in a much greater number of cases, as it only requires continuity of the function while not imposing the Lipschitz condition.
Matlab Code: Bisection Algorithm
function x=bisection(f,a,b,varargin);
%
% function x=bisection(f,a,b,P1,P2,...);
%
% f      : function for which we want to find a zero
% a,b    : lower and upper bounds of the interval (a<b)
% P1,... : parameters of the function
%
% x      : solution
%
epsi = 1e-8;
x0   = a;
x1   = b;
y0   = feval(f,x0,varargin{:});
y1   = feval(f,x1,varargin{:});
if a>=b
   error('b should be greater than a')
end
if y0*y1>=0
   error('a and b should be such that f(a)f(b)<0!')
end
err = 1;
while err>0;
   x2 = (x0+x1)/2;
   y2 = feval(f,x2,varargin{:});
   if y2*y0<0;
      x1 = x2;
      y1 = y2;
   else
      x0 = x1;
      x1 = x2;
      y0 = y1;
      y1 = y2;
   end
   err = abs(x1-x0)-epsi*(1+abs(x0)+abs(x1));
end
x = x2;

Table 5.3: Bisection progression
x₂          error
1.000000    5.000000e-001
0.750000    2.500000e-001
0.875000    1.250000e-001
0.937500    6.250000e-002
0.968750    3.125000e-002
0.958008    9.765625e-004
0.958527    3.051758e-005
0.958516    9.536743e-007
0.958516    2.980232e-008
0.958516    1.490116e-008
5.1.3 Newton's method

While bisection has proven more stable than the previous algorithm, it displays slow convergence. Newton's method will be more efficient, as it takes advantage of the information available on the derivatives of the function. A simple way to understand the underlying idea of Newton's method is to go back to the Taylor expansion of the function f:

f(x*) ≃ f(x_k) + (x* − x_k)f'(x_k)

but since x* is a zero of the function, we have

f(x_k) + (x* − x_k)f'(x_k) = 0

which, replacing x* by the new guess x_{k+1} one may formulate for a candidate solution, gives the recursive scheme

x_{k+1} = x_k − f(x_k)/f'(x_k) = x_k + δ_k

with δ_k = −f(x_k)/f'(x_k). The idea of the algorithm is then straightforward. For a given guess x_k, we compute the tangent line of the function at x_k and find the zero of this linear approximation to formulate a new guess. This process is illustrated in figure 5.3. The algorithm then works as follows:

1. Assign an initial value, x_k, k = 0, to x and a vector of termination criteria (ε₁, ε₂) > 0;
2. Compute f(x_k) and the associated derivative f'(x_k), and therefore the step δ_k = −f(x_k)/f'(x_k);
3. Compute x_{k+1} = x_k + δ_k;
4. If |x_{k+1} − x_k| < ε₁(1 + |x_{k+1}|) then go to 5, else go back to 2;
5. If |f(x_{k+1})| < ε₂ then stop and set x* = x_{k+1}; else report failure.
[Figure 5.3: Newton's method: the tangent at x₀ delivers the new guess x₁.]

Table 5.4: Newton progression
x_k         error
0.737168    2.371682e-001
0.900057    1.628891e-001
0.954139    5.408138e-002
0.958491    4.352740e-003
0.958516    2.504984e-005
0.958516    8.215213e-010
Matlab Code: Newton's Method (one dimension)
function [x,term]=newton1(f,x0,varargin);
eps1 = 1e-8;
eps2 = 1e-8;
dev  = .00001*max(abs(x0),1e-8);       % step for numerical differentiation
y0   = feval(f,x0,varargin{:});
err  = 1;
while err>0;
   dy0 = (feval(f,x0+dev,varargin{:})-feval(f,x0-dev,varargin{:}))/(2*dev);
   if dy0==0;
      error('Algorithm stuck at a local optimum')
   end
   d0   = -y0/dy0;                     % Newton step
   x    = x0+d0;
   err  = abs(x-x0)-eps1*(1+abs(x));
   y0   = feval(f,x,varargin{:});
   ferr = abs(y0);
   x0   = x;
end
if ferr<eps2;
   term = 1;
else
   term = 0;
end
Note that in order to apply this method, we need the first-order derivative of f to be nonzero at each evaluation point, otherwise the algorithm degenerates as it gets stuck at a local optimum. Further, the algorithm may get stuck in a cycle, as illustrated in figure 5.4: because the function has the same derivative in x₀ and x₁, the Newton iterative scheme cycles between the two values x₀ and x₁.

[Figure 5.4: Newton's method cycling between x₀ and x₁.]
A simple way to circumvent this problem is to dampen the step, using δ_k/2^j rather than δ_k, where

j = min{ i : 0 ≤ i ≤ i_max, |f( x_k − (1/2^i) f(x_k)/f'(x_k) )| < |f(x_k)| }

Should this condition be impossible to fulfill, one continues the process setting j = 0 as usual. In practice, one sets i_max = 4. However, in some cases, i_max should be adjusted: increasing i_max helps, at the cost of a larger computational time.
5.1.4 Secant methods

Secant methods just start by noticing that in Newton's method we need to evaluate the first-order derivative of the function, which may be quite costly. Regula falsi methods therefore propose to replace the evaluation of the derivative by the secant, such that the step is replaced by

δ_k = −f(x_k)(x_k − x_{k−1})/(f(x_k) − f(x_{k−1})) for k = 1,…

Therefore one has to feed the algorithm with two initial conditions x₀ and x₁ satisfying f(x₀)f(x₁) < 0. The algorithm then writes as follows:

1. Assign two initial values x_k, k = 0, 1, to x and a vector of termination criteria (ε₁, ε₂) > 0;
2. Compute f(x_k) and the step δ_k = −f(x_k)(x_k − x_{k−1})/(f(x_k) − f(x_{k−1}));
3. Compute x_{k+1} = x_k + δ_k;
4. If |x_{k+1} − x_k| < ε₁(1 + |x_{k+1}|) then go to 5, else go back to 2;
5. If |f(x_{k+1})| < ε₂ then stop and set x* = x_{k+1}; else report failure.
Table 5.5: Secant progression
x_k         error
1.259230    2.407697e-001
0.724297    5.349331e-001
1.049332    3.250351e-001
0.985310    6.402206e-002
0.955269    3.004141e-002
0.958631    3.362012e-003
0.958517    1.138899e-004
0.958516    4.860818e-007
0.958516    7.277312e-011
5.2 Multidimensional systems

5.2.1 Newton's method

Newton's method extends directly to a system F(x) = 0 with F : Rⁿ → Rⁿ. Taking a first-order Taylor expansion around the current guess and, replacing x* by the new guess x_{k+1} one may formulate for a candidate solution, we have the recursive scheme

x_{k+1} = x_k + δ_k where δ_k = −(∇F(x_k))⁻¹F(x_k)

such that the algorithm works as follows:

1. Assign an initial value, x_k, k = 0, to the vector x and a vector of termination criteria (ε₁, ε₂) > 0;
2. Compute F(x_k) and the associated Jacobian matrix ∇F(x_k);
3. Solve the linear system ∇F(x_k)δ_k = −F(x_k);
4. Compute x_{k+1} = x_k + δ_k;
5. If ‖x_{k+1} − x_k‖ < ε₁(1 + ‖x_{k+1}‖) then go to 6, else go back to 2;
6. If ‖F(x_{k+1})‖ < ε₂ then stop and set x* = x_{k+1}; else report failure.

All comments previously stated in the one-dimensional case apply to this higher-dimensional method.
Matlab Code: Newton's Method
function [x,term]=newton(f,x0,varargin);
%
% function x=newton(f,x0,P1,P2,...);
%
% f      : function for which we want to find a zero
% x0     : initial condition for x
% P1,... : parameters of the function
%
% x      : solution
% term   : termination status (1->OK, 0->failure)
%
eps1 = 1e-8;
eps2 = 1e-8;
x0   = x0(:);
y0   = feval(f,x0,varargin{:});
n    = size(x0,1);
dev  = diag(.00001*max(abs(x0),1e-8*ones(n,1)));
err  = 1;
while err>0;
   dy0 = zeros(n,n);
   for i= 1:n;
      f0       = feval(f,x0+dev(:,i),varargin{:});
      f1       = feval(f,x0-dev(:,i),varargin{:});
      dy0(:,i) = (f0-f1)/(2*dev(i,i));
   end
   if det(dy0)==0;
      error('Algorithm stuck at a local optimum')
   end
   d0   = -dy0\y0;                  % solve the linear system for the step
   x    = x0+d0;
   y    = feval(f,x,varargin{:});
   tmp  = sqrt((x-x0)'*(x-x0));
   err  = tmp-eps1*(1+sqrt(x'*x));
   ferr = sqrt(y'*y);
   x0   = x;
   y0   = y;
end
if ferr<eps2;
   term = 1;
else
   term = 0;
end
5.2.2
As in the onedimensional case, the main gain from using the secant method is
to avoid the computation of the jacobian matrix. Here, we will see a method
developed by Broyden which essentially amounts to define a Rn version of
the secant method. The ideas is to replace the jacobian matrix F (xk ) by a
matrix Sk at iteration k which serves as a guess for the jacobian. Therefore
17
(Fk Sk k )k0
k0 k
(Fk Sk k )k0
k0 k
19
end
if ferr<eps2;
term = 1;
else
term = 0;
end
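A minimal sketch of the resulting algorithm (assuming F returns a column vector and S is an initial guess for the Jacobian, e.g. obtained by finite differences):

Matlab Code: Broyden's Method (sketch)
function x=broyden(F,x,S,varargin);
tol = 1e-8;
y   = feval(F,x,varargin{:});
while sqrt(y'*y)>tol;
   d  = -S\y;                      % quasi-Newton step
   x  = x+d;
   yn = feval(F,x,varargin{:});
   S  = S+((yn-y)-S*d)*d'/(d'*d);  % Broyden rank-one update
   y  = yn;
end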
5.2.3 Final considerations

All these methods are numerical, such that we need a computer to find a solution. Never forget that a computer can only deal with numbers at a given accuracy; hence these methods will be more or less efficient depending on the scale of the system. For instance, imagine we deal with a system for which the numbers are close to zero (let's think of a model that is close to the non-trade theorem): then the computer will have a hard time trying to deal with numbers close to machine precision, and it may be a good idea to rescale the system.

Another important feature of all these methods is that they implicitly rely on a linear approximation of the function we want to solve. Therefore, the system should not be too nonlinear. For instance, assume you want to find the solution to the equation

c^θ/(c̄ − c)^θ = p

where p is given. There is then a great advantage to first rewriting the equation as

c^θ = p(c̄ − c)^θ

and then as

c = p^{1/θ}(c̄ − c)

such that the system is more linear. Such transformations often turn out to be extremely useful.
Bibliography

Judd, K.L., Numerical Methods in Economics, Cambridge, Massachusetts: MIT Press, 1998.
Lecture Notes 6

Perturbation methods

In these lecture notes, we will study the so-called perturbation method, a class of methods to which the linear approximation belongs. The basic idea of this approach is to include higher-order terms in the approximation, and therefore take both curvature and risk into account, thereby accounting for the first two points we raised in lecture notes 2. For further insights on these methods, see Judd [1998], Judd and Gaspard [1997], Sims [2000] or Schmitt-Grohe and Uribe [2001].

6.1 The general approach

6.1.1 Model representation

Following Schmitt-Grohe and Uribe [2001], we consider a model of the form

E_t F(y_{t+1}, y_t, x_{t+1}, x_t) = 0    (6.1)

whose solution consists of a decision rule and a law of motion of the form

y_t = g(x_t, σ) and x_{t+1} = h(x_t, σ) + σηε_{t+1}    (6.2)

where g maps Rⁿˣ × R₊ into Rⁿʸ and h maps Rⁿˣ × R₊ into Rⁿˣ. These are the true solution of the model, such that the model may be rewritten as

E_t F( g(x_{t+1}, σ), g(x_t, σ), h(x_t, σ) + σηε_{t+1}, x_t ) = 0

or, substituting the law of motion,

E_t F( g(h(x_t, σ) + σηε_{t+1}, σ), g(x_t, σ), h(x_t, σ) + σηε_{t+1}, x_t ) = 0

In order to save on notation, we will now get rid of time subscripts, and a prime will stand for t+1, such that the model rewrites

E_t F( g(h(x, σ) + σηε′, σ), g(x, σ), h(x, σ) + σηε′, x ) ≡ E_t F(x, σ) = 0
6.1.2 The Method

Since we can compute exactly neither g(·) nor h(·), we shall compute an approximation. But, since we want to take risk and curvature into account, we will take a higher-order approximation. To keep things simple (and you will see that they are already quite cumbersome) we will restrict ourselves to the second-order approximation, that is

E_t F^i(x, σ) ≃ E_t [ F^i(x*, 0) + F_x^i(x*, 0)(x − x*) + F_σ^i(x*, 0)σ + (1/2)(x − x*, σ) HF^i (x − x*, σ)′ ] = 0

for all i = 1,…,n, where HF^i denotes the Hessian of F^i. Note that we will take the approximation for accurate, such that this is the approximation that will be set to 0. Therefore, for the model to be solved, we need all derivatives, at any order, to be 0.
First-order approximation: We impose

E_t [ F(x*, 0) + F_x(x*, 0)(x − x*) + F_σ(x*, 0)σ ] = 0

and we therefore want to set

F^i(x*, 0) = 0, F_x^i(x*, 0) = 0, F_σ^i(x*, 0) = 0

for all i = 1,…,n. The objects we are looking for are however somewhat different, since we look for something like

g(x, σ) = g(x*, 0) + g_x(x*, 0)(x − x*) + g_σ(x*, 0)σ
h(x, σ) = h(x*, 0) + h_x(x*, 0)(x − x*) + h_σ(x*, 0)σ

Our problem is then to determine g(x*, 0), g_x(x*, 0), g_σ(x*, 0), h(x*, 0), h_x(x*, 0) and h_σ(x*, 0). But we already know at least two quantities, since by definition

y* = g(x*, 0) and x* = h(x*, 0)
To write the remaining conditions compactly, we rely on tensor notation:

[F_x]^i_j = ∂F^i/∂x_j

The same index in both superscript and subscript positions indicates a summation; for example,

[F_y]^i_α [g_x]^α_j = Σ_α (∂F^i/∂y_α)(∂g^α/∂x_j)

The condition E_t F_σ(x*, 0) = 0 then writes

[F_{y′}]^i_α ( [g_x]^α_β [h_σ]^β + [g_σ]^α ) + [F_y]^i_α [g_σ]^α + [F_{x′}]^i_β [h_σ]^β + [F_{x′}]^i_β [η]^β_φ E_t[ε′]^φ = 0

Note that this equation involves expectations of ε_{t+1}, which are identically equal to 0, such that the system reduces to

[F_{y′}]^i_α [g_x]^α_β [h_σ]^β + [F_{y′}]^i_α [g_σ]^α + [F_y]^i_α [g_σ]^α + [F_{x′}]^i_β [h_σ]^β = 0    (6.4)

Since this system is linear and homogeneous in [g_σ] and [h_σ], its solution is g_σ(x*, 0) = 0 and h_σ(x*, 0) = 0: up to first order, the decision rules are independent of σ, which is the certainty-equivalence property of linear approximations.
Second-order approximation: We now look for the second-order terms of g^i(·), i = 1,…,ny, and h^j(·), j = 1,…,nx, of the form

g^i(x, σ) = g^i(x*, 0) + g_x^i(x*, 0)(x − x*) + g_σ^i(x*, 0)σ + (1/2)(x − x*, σ) Hg^i (x − x*, σ)′
h^j(x, σ) = h^j(x*, 0) + h_x^j(x*, 0)(x − x*) + h_σ^j(x*, 0)σ + (1/2)(x − x*, σ) Hh^j (x − x*, σ)′

We have already solved the first-order problem; we now need to reveal information on the Hessians

Hg^i = [ g_xx^i(x*,0)  g_xσ^i(x*,0) ; g_σx^i(x*,0)  g_σσ^i(x*,0) ] and Hh^j = [ h_xx^j(x*,0)  h_xσ^j(x*,0) ; h_σx^j(x*,0)  h_σσ^j(x*,0) ]

These are identified by imposing

E_t F_xx(x*, 0) = 0, E_t F_xσ(x*, 0) = 0, E_t F_σx(x*, 0) = 0, E_t F_σσ(x*, 0) = 0
Let us start with g_xx(x*, 0) and h_xx(x*, 0); these matrices may be identified imposing

E_t F_xx(x*, 0) = 0

But before going to this point, we need to establish further notation. Like previously, we will rely on the tensor notation and adopt the following conventions:

[F_xx]^i_jk = ∂²F^i/∂x_j∂x_k, [F_xy]^i_jk = ∂²F^i/∂x_j∂y_k

The same index in both superscript and subscript positions indicates a summation; for example,

[F_{x′x′}]^i_βγ [h_x]^β_j [h_x]^γ_k = Σ_β Σ_γ (∂²F^i/∂x′_β∂x′_γ)(∂h^β/∂x_j)(∂h^γ/∂x_k)
Differentiating E_t F(x, σ) twice with respect to x and applying the chain rule, E_t F_xx(x*, 0) = 0 yields, for i = 1,…,n and j,k = 1,…,nx,

( [F_{y′y′}]^i_αγ [g_x]^γ_δ [h_x]^δ_k + [F_{y′y}]^i_αγ [g_x]^γ_k + [F_{y′x′}]^i_αδ [h_x]^δ_k + [F_{y′x}]^i_αk ) [g_x]^α_β [h_x]^β_j
+ [F_{y′}]^i_α ( [g_xx]^α_βγ [h_x]^β_j [h_x]^γ_k + [g_x]^α_β [h_xx]^β_jk )
+ ( [F_{yy′}]^i_αγ [g_x]^γ_δ [h_x]^δ_k + [F_{yy}]^i_αγ [g_x]^γ_k + [F_{yx′}]^i_αδ [h_x]^δ_k + [F_{yx}]^i_αk ) [g_x]^α_j + [F_y]^i_α [g_xx]^α_jk
+ ( [F_{x′y′}]^i_βγ [g_x]^γ_δ [h_x]^δ_k + [F_{x′y}]^i_βγ [g_x]^γ_k + [F_{x′x′}]^i_βδ [h_x]^δ_k + [F_{x′x}]^i_βk ) [h_x]^β_j + [F_{x′}]^i_β [h_xx]^β_jk
+ [F_{xy′}]^i_jγ [g_x]^γ_δ [h_x]^δ_k + [F_{xy}]^i_jγ [g_x]^γ_k + [F_{xx′}]^i_jδ [h_x]^δ_k + [F_{xx}]^i_jk = 0

Although it looks quite complicated, this is just a linear system that has to be solved for g_xx(x*, 0) and h_xx(x*, 0), as all the first-order derivatives are perfectly known. Proceeding in the same way, E_t F_σσ(x*, 0) = 0 delivers a linear system in g_σσ(x*, 0) and h_σσ(x*, 0); this system, which involves the covariance matrix of the innovations, is where risk enters the decision rules.

Finally, we can easily show that g_xσ(x*, 0) and h_xσ(x*, 0) are both equal to zero. Indeed, we have

E_t F_xσ(x*, 0) = [F_{y′}]^i_α ( [g_xσ]^α_β [h_x]^β_j + [g_x]^α_β [h_xσ]^β_j ) + [F_y]^i_α [g_xσ]^α_j + [F_{x′}]^i_β [h_xσ]^β_j = 0

Since this system is homogeneous in the unknowns g_xσ(x*, 0) and h_xσ(x*, 0), if a unique solution exists, it is zero. Hence

g_xσ(x*, 0) = 0 and h_xσ(x*, 0) = 0
6.2 An example: the asset pricing model

6.2.1 The model

We consider an economy populated by a single infinitely-lived representative agent whose preferences over consumption streams are

E_t Σ_{τ=0}^∞ β^τ c_{t+τ}^θ/θ with θ ∈ (−∞, 0) ∪ (0, 1]    (6.5)

The agent prices an asset entitling him to the stream of dividends {d_t}, such that the first-order condition is

p_t c_t^{θ−1} = β E_t [ c_{t+1}^{θ−1}(p_{t+1} + d_{t+1}) ]    (6.6)

while market clearing imposes

c_t = d_t    (6.7)

Defining the price-dividend ratio y_t = p_t/d_t and the rate of growth of dividends x_t = log(d_t/d_{t−1}), equations (6.6)-(6.7) reduce to

y_t = β E_t [ exp(θx_{t+1})(1 + y_{t+1}) ]    (6.9)

where x_t is assumed to be a Gaussian stationary AR(1) process

x_t = (1−ρ)x̄ + ρx_{t−1} + ε_t    (6.8)

Burnside [1998] shows that the above equation admits an exact solution of the form

y_t = Σ_{i=1}^∞ β^i exp( a_i + b_i(x_t − x̄) )    (6.10)

where

a_i = θx̄ i + (θ²σ²/(2(1−ρ)²)) [ i − 2ρ(1−ρ^i)/(1−ρ) + ρ²(1−ρ^{2i})/(1−ρ²) ]

and

b_i = θρ(1−ρ^i)/(1−ρ)

As can be seen from the definition of a_i, the volatility of the shock, σ, directly enters the decision rule; therefore Burnside's [1998] model does not make the certainty-equivalence hypothesis: risk matters for asset-holding decisions.
Finding a second-order approximation to the model: First of all, let us rewrite the two key equations that define the model:

y_t = β E_t [exp(θx_{t+1})(1 + y_{t+1})]
x_t = (1−ρ)x̄ + ρx_{t−1} + ε_t

and let us re-express the model as

x_{t+1} = h(x_t, ε_{t+1}) ≡ (1−ρ)x̄ + ρx_t + ε_{t+1}    (6.11)
y_t = E_t [f(y_{t+1}, x_{t+1})]    (6.12)

with f(y, x) = β exp(θx)(1 + y). The deterministic steady state satisfies

x* = ρx* + (1−ρ)x̄ ⟺ x* = x̄
y* = f(y*, x*) ⟺ y* = β exp(θx̄)/(1 − β exp(θx̄))    (6.13)

The solution to (6.12) is a function g(·) such that y_t = g(x_t, σ). Then (6.12) can be rewritten as

g(x_t, σ) = E_t [ f( g(h(x_t, ε_{t+1}), σ), h(x_t, ε_{t+1}) ) ] ≡ E_t [F(x_t, σ)]    (6.14)
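For the benchmark calibration used later (θ = −1.5, β = 0.95, x̄ = 0.0179), the deterministic steady state of the price-dividend ratio follows directly from (6.13); a one-line sketch:

Matlab Code: Steady State of the Price-Dividend Ratio (sketch)
theta = -1.5; beta = 0.95; xbar = 0.0179;
ys = beta*exp(theta*xbar)/(1-beta*exp(theta*xbar));   % approximately 12.3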
Let us now take an n-th order Taylor expansion of both sides of (6.14) around (x̄, 0), denoting x̂_t = x_t − x̄:

Σ_{k=0}^n (1/k!) Σ_{j=0}^k C(k,j) g_{k−j,j} x̂_t^{k−j} σ^j = E_t [ Σ_{k=0}^n (1/k!) Σ_{j=0}^k C(k,j) F_{k−j,j} x̂_t^{k−j} σ^j ]
= Σ_{k=0}^n (1/k!) Σ_{j=0}^k C(k,j) E_t[F_{k−j,j}] x̂_t^{k−j} σ^j    (6.15)

where C(k,j) denotes the binomial coefficient,

g_{k−j,j} = ∂^k g(x_t, σ)/(∂x_t^{k−j}∂σ^j) |_{x_t = x̄, σ = 0}, F_{k−j,j} = ∂^k F(x_t, σ)/(∂x_t^{k−j}∂σ^j) |_{x_t = x̄, σ = 0}

and λ_j = E_t(ε_{t+1}^j) denotes the j-th order moment of the innovation of the process. Identifying terms of identical order on both sides, and writing g_k for the k-th derivative of g with respect to x_t, we obtain a system of n+1 equations:

(1/k!) g_k = Σ_{j=0}^{n−k} (1/(j+k)!) C(j+k, j) F_{k,j} λ_j for k = 0,…,n    (6.16)

where the derivatives of F evaluate to

F_{k,j} = β exp(θx̄) [ θ^{j+k} + Σ_{ℓ=0}^{j+k} C(j+k, ℓ) θ^{j+k−ℓ} g_ℓ ]    (6.17)

with x_{t+1} = h(x_t, ε_{t+1}). Therefore, (6.16) together with (6.17) defines a system of n+1 equations,

(1/k!) g_k = Σ_{j=0}^{n−k} (1/(j+k)!) C(j+k, j) λ_j β exp(θx̄) [ θ^{j+k} + Σ_{ℓ=0}^{j+k} C(j+k, ℓ) θ^{j+k−ℓ} g_ℓ ], k = 0,…,n    (6.18)

which is linear in the n+1 unknowns (g₀,…,g_n) and is therefore easily solved.

It is worth noting that the solution of this system does depend on the higher moments of the distribution of the exogenous shocks, λ_j. Therefore, the properties of the decision rule (slope and curvature) will depend on these moments. But more importantly, the level of the rule will differ from the certainty-equivalent solution, such that the steady-state level, y*, is not given anymore by (6.13) but will be affected by the higher moments of the distribution of the shocks (essentially the volatility for our model, as the shocks are assumed to be normally distributed). Also noteworthy is that this approximation is particularly simple in our showcase economy, as the solution only depends on the exogenous variable x_t.
6.2.2 What do we gain?

This section checks the accuracy and evaluates the potential gains of the method presented above. As the model possesses a closed-form solution, we can directly check the accuracy of each solution we propose against this true solution. We first present the evaluation criteria that will be used to check the accuracy. We then conduct the evaluation of the approximation methods under study.

Criteria

Several criteria are considered to tackle the question of the accuracy of the different approximation methods under study. As the model admits a closed-form solution, the accuracy of the approximation method can be directly checked against the true decision rule. This is undertaken relying on the two following criteria:

E₁ = 100 × (1/N) Σ_{t=1}^N |(y_t − ỹ_t)/y_t|

and

E_∞ = 100 × max_t |(y_t − ỹ_t)/y_t|

where y_t denotes the true solution for the price-dividend ratio and ỹ_t is the approximated one. E₁ is the average relative error made using the approximation rather than the true solution, while E_∞ is the maximal relative error. These criteria are evaluated over the interval x_t ∈ [x̄ − Δσ_x, x̄ + Δσ_x], where Δ is selected such that we explore 99.99% of the distribution of x.
However, one may be more concerned (especially in finance) with the ability of the approximation method to account for the distribution of the price-dividend ratio, and therefore the moments of the distribution. We first compute the mean of y_t for different calibrations and different approximation methods. Further, we explore the stochastic properties of the innovation of y_t, ν_t = y_t − E_{t−1}(y_t), in order to assess the ability of each approximation method to account for the internal stochastic properties of the model. We thus report the standard deviation, skewness and kurtosis of ν_t, which provides information on the ability of the model to capture the heteroskedasticity of the innovation and, more importantly, the potential nonlinearities the model can generate.²

² The cdf of ν_t is computed using 20000 replications of Monte-Carlo simulations of 1000 observations for ν_t.

Approximation errors

Table 6.1 reports E₁ and E_∞ for the different approximation methods under study. We also consider different cases. Our benchmark
experiment amounts to considering Mehra and Prescott's [1985] parameterization of the asset pricing model. We therefore set the mean of the rate of growth of dividends to x̄ = 0.0179, its persistence to ρ = −0.139 and the volatility of the innovations to σ = 0.0348. These values are consistent with the properties of consumption growth in annual data from 1889 to 1979. θ was set to −1.5, the value widely used in the literature, and β to 0.95, which is standard for annual frequency. We then investigate the implications of changes in these parameters in terms of accuracy. In particular, we study the implications of larger and lower impatience, higher volatility, larger curvature of the utility function and more persistence in the rate of growth of dividends.
Table 6.1: Accuracy of the approximations

             Linear               Quadratic (CE)       Quadratic
             E₁        E_∞        E₁        E_∞        E₁        E_∞
Benchmark    1.4414    1.4774     1.4239    1.4241     0.0269    0.0642
β=0.5        0.2537    0.2944     0.2338    0.2343     0.0041    0.0087
β=0.99       2.9414    2.9765     2.9243    2.9244     0.0833    0.1737
θ=-10        23.7719   25.3774    23.1348   23.1764    4.5777    8.3880
θ=0          0.0000    0.0000     0.0000    0.0000     0.0000    0.0000
θ=0.5        0.2865    0.2904     0.2845    0.2845     0.0016    0.0038
σ=0.001      0.0012    0.0012     0.0012    0.0012     0.0000    0.0000
σ=0.1        11.8200   12.1078    11.6901   11.6933    1.2835    2.2265
ρ=0          1.8469    1.8469     1.8469    1.8469     0.0329    0.0329
ρ=0.5        5.9148    8.2770     4.9136    5.2081     0.7100    1.5640
ρ=0.9        57.5112   226.2032   31.8128   146.6219   36.8337   193.1591

Note: The series defining the true solution was truncated after 800 terms, as no significant improvement was found adding additional terms at the machine accuracy. When exploring variations in ρ, the overall volatility of the rate of growth of dividends was maintained at its benchmark level.
The quadratic approximation leads to a less than 0.1% error in the benchmark case. This error is even lower for low-volatility economies, and is essentially zero in the case σ = 0.001. Even for higher volatility, the gains from applying a quadratic approximation can be substantial, as the case σ = 0.1 shows. In effect, the average error drops from 12% in the linear case to 1.3% in the quadratic case. The gain is also important for low degrees of intertemporal substitution, as the error is less than 5% for θ = −10, compared to the 25% previously obtained.

The gains seem to be impressive, but something still remains to be understood: where does this gain come from?

Figure 6.1 sheds light on these results. We consider a rather extreme situation where θ = −5, ρ = −0.5 and the volatility of the shock is preserved. In this graph, "Linear" corresponds to the linear approximation of the true decision rule, "quadratic (CE)" is the quadratic approximation that ignores the correction introduced by the volatility (and therefore can be qualified as a certainty-equivalent quadratic approximation) and "quadratic" takes care of the correction.

As can be seen from figure 6.1, the major gain from moving from a linear to a non-stochastic higher-order Taylor expansion (CE) is found in the ability of the latter to take care of the curvature of the decision rule. But this is not a sufficient improvement as far as accuracy of the solution is concerned. This is confirmed by table 6.2, which reports the approximation errors in each case. It clearly indicates that the gain from increasing the order of approximation,

Table 6.2: Accuracy check (θ=-5, ρ=-0.5)
Linear               Quadratic (CE)      Quadratic
E₁        E_∞        E₁        E_∞       E₁        E_∞
5.9956    10.1111    4.3885    4.7762    0.7784    1.9404

without taking care of the stochastic component of the problem, is not as high as taking the risky component into account. Indeed, accounting for curvature,
[Figure 6.1: true and approximate decision rules as functions of x_t; legend: Exact, O1, DO2, O2, DO4, O4.]
i.e. moving from linear to quadratic (CE) only leads to a 27% gain in
terms of average error, where as taking risk into account permits to further
enhance the accuracy by 82%, therefore yielding a 87% total gain! In others,
increasing the order of the Taylor approximation, without taking into account
the information carried by the moments of the distribution ignoring the
stochastic dimension of the problem does not add that much in terms of
accuracy. The main improvement can be precisely found in adding the higher
order moments, as it yields a correction in the level of the rule, as can be seen
from figure 1. As soon as the volatility of the shock explicitly appears in the
rule then the rule shifts upward, thus getting closer to the true solution of
the problem. Then, the average (maximal) approximation error decreases to
0.8% (1.9%) to be compared to the earlier 6% (10%). Therefore, most of the
approximation error essentially lies in the level of the rule rather than in its
curvature. Therefore, a higher curvature in the utility function necessitates an
increase in the order of the moments of the distribution that should be taken
A p-th order Taylor series expansion of the true decision rule takes the form

ŷ_t = Σ_{i=0}^∞ exp(a_i) [ Σ_{k=0}^p (b_i^k / k!) (x_t − x̄)^k ]
Table 6.3 then reports the average and maximal errors induced by the Taylor expansion to the true rule. We only report these errors for cases where the previous analysis indicated an error greater than 1% for the O2 method. Table 6.3 clearly shows that approximation errors are large (greater than 1%) precisely when a second order Taylor series expansion to the true rule is not sufficient to produce an accurate representation of the analytical solution. For instance, in the θ = −10 case, a good approximation of the true rule can be obtained only with a third order Taylor series expansion. This indicates that we should use at least a third order Taylor series expansion to increase significantly the accuracy of the approximation. This phenomenon is even more pronounced as we consider persistence. Let us focus on the case ρ = 0.9, for which approximation errors (see table 6.1) are huge (more than 15% in the O2 approximation). As can be seen from table 6.3, the second order Taylor series expansion to the true rule is very inaccurate, as it produces maximal errors around 87%.
[Table 6.3: accuracy of Taylor series expansions to the true rule. Rows: Benchmark, β = 0.99, θ = −10, θ = −5, ρ = 0.1, ρ = 0.5, ρ = 0.9, each evaluated with the criteria E1(v) and E∞(v); columns give the errors by expansion order. Only a column of zeros survived extraction; the remaining numeric columns were lost.]
In this example, we only had to deal with an exogenous state variable, but in general we will also have to deal with endogenous state variables, such as the capital stock. Therefore, in the next section, we turn to the optimal growth model.
6.2.3 The optimal growth model

The Euler equation and the resource constraint of the optimal growth model write

E_t[ β (c_{t+1}/c_t)^{−σ} (α exp(a_{t+1}) k_{t+1}^{α−1} + 1 − δ) − 1 ] = 0    (6.19)

k_{t+1} = exp(a_t) k_t^α − c_t + (1 − δ) k_t    (6.20)
Since h(k_t, a_t) is entirely known once we have obtained g(k_t, a_t), solving the model reduces to finding g(k_t, a_t) that solves

E_t[G(c_{t+1}, k_{t+1}, a_{t+1}, c_t)] = 0

Note that c_{t+1} = g(k_{t+1}, a_{t+1}) = g(h(k_t, a_t), a_{t+1}) = f(k_t, a_t, ε_{t+1}), so that the latter equation rewrites as

E_t[G(f(k_t, a_t, ε_{t+1}), h(k_t, a_t), ρ a_t + (1 − ρ) ā + ε_{t+1}, g(k_t, a_t))] = 0

or

E_t[F(k_t, a_t, ε_{t+1})] = 0    (6.21)
We now present the perturbation method for the quadratic case. Taking the second order Taylor expansion of (6.21) around (k̄, ā, 0) yields

E_t[ F(k̄, ā, 0) + F_k k̂_t + F_a â_t + F_σ σ
  + (1/2) F_kk k̂_t² + (1/2) F_aa â_t² + (1/2) F_σσ σ²
  + F_ka k̂_t â_t + F_kσ k̂_t σ + F_aσ â_t σ ] = 0    (6.22)

where all derivatives are evaluated at (k̄, ā, 0), k̂_t = k_t − k̄ and â_t = a_t − ā.
Since k_t and a_t are perfectly known to the household when she takes her decisions, and since σ is a constant, expectations can be dropped in the previous equation, such that (6.22) reduces to

F(k̄, ā, 0) + F_k k̂_t + F_a â_t + F_σ σ
  + (1/2) F_kk k̂_t² + (1/2) F_aa â_t² + (1/2) F_σσ σ²
  + F_ka k̂_t â_t + F_kσ k̂_t σ + F_aσ â_t σ = 0    (6.23)
The decision rule is approximated likewise:

g(k_t, a_t, σ) ≈ g(k̄, ā, 0) + g_k k̂_t + g_a â_t + g_σ σ
  + (1/2) g_kk k̂_t² + (1/2) g_aa â_t² + (1/2) g_σσ σ²
  + g_ka k̂_t â_t + g_kσ k̂_t σ + g_aσ â_t σ    (6.24)
Note however that the general approach to the problem has enabled us to establish formally that g_σ(k̄, ā, 0), g_kσ(k̄, ā, 0) and g_aσ(k̄, ā, 0) should all be equal to 0, such that we just have to find g(k̄, ā, 0), g_k(k̄, ā, 0), g_a(k̄, ā, 0), g_kk(k̄, ā, 0), g_ka(k̄, ā, 0), g_aa(k̄, ā, 0) and g_σσ(k̄, ā, 0). These coefficients are obtained by identifying each term of the expansion.
clear all;
nx    = 2;    % reconstructed: number of state variables (k, a)
ny    = 1;    % reconstructed: number of controls (c)
ne    = 1;    % reconstructed: number of shocks
alpha = ...;  % capital elasticity (value lost in extraction)
sigma = ...;  % utility curvature (value lost in extraction)
beta  = ...;  % discount factor (value lost in extraction)
delta = ...;  % depreciation rate (value lost in extraction)
ab    = ...;  % average level of the shock (value lost in extraction)
rhoa  = ...;  % persistence of the shock (value lost in extraction)
s2    = ...;  % volatility of the innovation (value lost in extraction)
eta1  = ...;  % (value lost in extraction)
ETA   = ...;  % matrix of innovation loadings (value lost in extraction)
%
% Steady state
%
ksy = alpha*beta/(1-beta*(1-delta));
ysk = (1-beta*(1-delta))/(alpha*beta);
ys  = ksy^(alpha/(1-alpha));
ks  = ksy*ys;
cs  = ys*(1-delta*ksy);
%
% Solving the model
%
param = [beta sigma alpha delta rhoa as];   % vector of parameters
xs    = [ks as cs ks as cs];                % deterministic steady state
[J]   = diffext('model',xs,[],param);       % Jacobian matrix of the model
[H]   = hessext('model',xs,[],param);       % Hessian matrix of the model
[Gx,Hx,Gxx,Hxx,Gss,Hss]=solve(xs,J,H,ETA,nx,ny,ne);
%
% Simulating the economy (only once)
%
long  = 120;
tronc = 100;
slong = long+tronc;
T     = tronc+1:slong;
sim   = 1;
seed  = 1;
e     = randn(slong,ne)*s2;
S1    = zeros(nx,slong);   % states, linear approximation
S2    = zeros(nx,slong);   % states, quadratic approximation
X1    = zeros(ny,slong);   % controls, linear approximation
X2    = zeros(ny,slong);   % controls, quadratic approximation
S1(:,1) = ETA*e(:,1);
S2(:,1) = ETA*e(:,1);
tmp     = S2(:,1)*S2(:,1)';  % outer product (transpose restored)
X1(:,1) = Gx*S1(:,1);
X2(:,1) = X1(:,1)+0.5*Gxx*tmp(:);
for i=2:slong
  S1(:,i)=Hx*S1(:,i-1)+ETA*e(:,i);
  X1(:,i)=Gx*S1(:,i);
  S2(:,i)=S1(:,i)+0.5*Hxx*tmp(:)+0.5*Hss;
  tmp=S2(:,i)*S2(:,i)';      % outer product (transpose restored)
  X2(:,i)=Gs+0.5*Gss*s2+X1(:,i)+0.5*Gxx*tmp(:);  % Gs: constant of the rule
  X1(:,i)=Gs+X1(:,i);
end;
Figure 6.2 reports the relative differences in the decision rules for consumption and capital between the quadratic and the linear approximation, computed as

100 × (g^{quad} − g^{lin}) / g^{lin}

As appears clearly in the figure, taking into account risk induces a drop in the level of consumption of between 5 and 10% relative to the linear approximation. This just reflects the precautionary motive that is at work
[Figure 6.2: relative difference (in %) between the quadratic and linear decision rules for consumption and capital, plotted against k_t and exp(a_t).]
in this type of model. Since the agent is risk averse, she increases her savings in order to insure herself against shocks, thereby lowering her current expenditures. This also affects current investment, to a lesser extent, translating into lower capital accumulation.
6.3 Limits of the method
The perturbation method is and remains a local approximation. Therefore, even if in practice it may handle larger shocks than the now conventional (log)linear approximation, it cannot be used to study the implications of big structural shocks, such as a tax reform, unless this reform is marginal.

By construction, this method requires the model to be differentiable, which forbids the analysis of models with binding constraints.

The quadratic approximation may sometimes induce strange behaviors due to the quadratic term. In order to understand the problem, let us focus on a simplistic one dimensional model for which the quadratic approximation of the decision rule for the state variable would be

x_{t+1} − x* = φ₀ (x_t − x*) + φ₁ (x_t − x*)²
One can recognize the so-called logistic map that lies at the core of many chaotic systems. We may thus encounter cases in which the model is known to be locally stable, which can be checked by taking a linear approximation, but for which the quadratic approximation leads to chaotic dynamics, or at least unintended behavior (try, for example, to compute the IRF of capital to a 1% shock when a second order approximation is used with σ = 33.7145!). Note that this does not totally question the relevance of the approach, but rather reveals a problem of accuracy which calls for either other methods or higher order approximations.
Bibliography

Burnside, C., Solving asset pricing models with Gaussian shocks, Journal of Economic Dynamics and Control, 1998, 22, 329–340.

Judd, K.L., Numerical methods in economics, Cambridge, Massachusetts: MIT Press, 1998.

Judd, K.L. and J. Gaspar, Solving Large-Scale Rational-Expectations Models, Macroeconomic Dynamics, 1997, 1, 45–75.

Mehra, R. and E.C. Prescott, The equity premium: a puzzle, Journal of Monetary Economics, 1985, 15, 145–161.

Rietz, T.A., The equity risk premium: a solution, Journal of Monetary Economics, 1988, 22, 117–131.

Schmitt-Grohe, S. and M. Uribe, Solving Dynamic General Equilibrium Models Using a Second-Order Approximation to the Policy Function, mimeo, University of Pennsylvania, 2001.

Sims, C., Second Order Accurate Solution of Discrete Time Dynamic Equilibrium Models, manuscript, Princeton University, 2000.
Lecture Notes 7
Dynamic Programming
In these notes, we deal with a fundamental tool of dynamic macroeconomics: dynamic programming. Dynamic programming is a very convenient way of writing a large set of dynamic problems in economic analysis, as most of the properties of this tool are now well established and understood.¹ In these lectures, we will not deal with all the theoretical properties attached to this tool, but will rather give some recipes to solve economic problems using dynamic programming. In order to understand the method, we will first deal with deterministic models before extending the analysis to stochastic ones. We shall, however, begin with some preliminary definitions and theorems that justify the whole approach.
7.1 The Bellman equation and associated theorems

7.1.1 A heuristic derivation of the Bellman equation
Let us consider the case of an agent that has to decide on the path of a set of control variables {y_t}_{t=0}^∞ in order to maximize the discounted sum of its future payoffs u(y_t, x_t), where x_t is a vector of state variables assumed to evolve according to

x_{t+1} = h(x_t, y_t),  x_0 given.

The value of this problem is

V(x_t) = max_{{y_{t+s} ∈ D(x_{t+s})}_{s=0}^∞} Σ_{s=0}^∞ β^s u(y_{t+s}, x_{t+s})    (7.1)

¹ For a mathematical exposition of the problem see Bertsekas [1976]; for a more economic approach see Lucas, Stokey and Prescott [1989].
where D is the set of all feasible decisions for the variables of choice. Note that the value function is a function of the state variable only: since the model is markovian, only the current state is necessary to take decisions, such that the whole path can be predicted once the state variable is observed. Therefore, the value in t is only a function of x_t. (7.1) may now be rewritten as

V(x_t) = max_{y_t ∈ D(x_t)} [ u(y_t, x_t) + Σ_{s=1}^∞ β^s u(y_{t+s}, x_{t+s}) ]    (7.2)
or

V(x_t) = max_{y_t ∈ D(x_t)} [ u(y_t, x_t) + β max_{{y_{t+1+k} ∈ D(x_{t+1+k})}_{k=0}^∞} Σ_{k=0}^∞ β^k u(y_{t+1+k}, x_{t+1+k}) ]    (7.3)

so that

V(x_t) = max_{y_t ∈ D(x_t)} [ u(y_t, x_t) + β V(x_{t+1}) ]    (7.4)
This is the so-called Bellman equation, which lies at the core of dynamic programming theory. With this equation are associated, in each and every period t, a set of optimal policy functions for y and x, defined by

{y_t, x_{t+1}} ∈ Argmax_{y ∈ D(x)} u(y, x) + β V(x_{t+1})    (7.5)
Our problem is now to solve (7.4) for the function V(x_t). This problem is particularly complicated, as we are not solving for just a point satisfying the equation, but for a function that satisfies it. A simple procedure to find a solution would be the following:

1. Make an initial guess on the form of the value function, V₀(x_t).

2. Update the guess using the Bellman equation, such that

V_{i+1}(x_t) = max_{y_t ∈ D(x_t)} u(y_t, x_t) + β V_i(h(y_t, x_t))

3. If V_{i+1}(x_t) = V_i(x_t), a fixed point has been found and the problem is solved; if not, go back to 2 and iterate until convergence.

In other words, solving the Bellman equation just amounts to finding the fixed point of the Bellman equation or, introducing an operator notation, the fixed point of the operator T, such that

V_{i+1} = T V_i

where T stands for the list of operations involved in the computation of the Bellman equation (a sketch of this iteration is given below). The problem is then that of the existence and uniqueness of this fixed point. Luckily, mathematicians have provided conditions for the existence and uniqueness of a solution.
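As a minimal sketch of this iteration, assuming a hypothetical function bellman that performs the maximization step on a grid of N points (the growth model code below spells that step out), the fixed point problem looks as follows:

v    = zeros(N,1);        % initial guess V_0 (N is the grid size)
crit = 1; epsi = 1e-6;
while crit>epsi
  Tv   = bellman(v);      % apply the Bellman operator T
  crit = max(abs(Tv-v));  % distance between successive iterates
  v    = Tv;              % update the guess
end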
7.1.2 Existence and uniqueness of a solution
We then have the following remarkable theorem that establishes the existence and uniqueness of the fixed point of a contraction mapping.

Theorem 1 (Contraction Mapping Theorem) If (S, ρ) is a complete metric space and T : S → S is a contraction mapping with modulus β ∈ (0, 1), then (i) T has exactly one fixed point V ∈ S, and (ii) the sequence {V_n = T^n V₀}_{n=0}^∞ converges to V for any V₀ ∈ S.

Proof: Taking an arbitrary V₀ ∈ S and the sequence {V_n}_{n=0}^∞ defined by V_{n+1} = T V_n, we shall prove that {V_n}_{n=0}^∞ is a Cauchy sequence. First of all, note that the contraction property of T implies that

ρ(V₂, V₁) = ρ(T V₁, T V₀) ≤ β ρ(V₁, V₀)

and therefore

ρ(V_{n+1}, V_n) = ρ(T V_n, T V_{n−1}) ≤ β ρ(V_n, V_{n−1}) ≤ … ≤ β^n ρ(V₁, V₀)
Now consider two terms of the sequence, V_m and V_n, m > n. The triangle inequality implies that

ρ(V_m, V_n) ≤ ρ(V_m, V_{m−1}) + ρ(V_{m−1}, V_{m−2}) + … + ρ(V_{n+1}, V_n)

therefore, making use of the previous result, we have

ρ(V_m, V_n) ≤ (β^{m−1} + β^{m−2} + … + β^n) ρ(V₁, V₀) ≤ (β^n / (1 − β)) ρ(V₁, V₀)

Since β ∈ (0, 1), β^n → 0 as n → ∞, so that for each ε > 0 there exists N such that ρ(V_m, V_n) < ε for all m > n ≥ N. Hence {V_n}_{n=0}^∞ is a Cauchy sequence, and it therefore converges. Further, since we have assumed that S is complete, V_n converges to V ∈ S.

We now have to show that V = T V in order to complete the proof of the first part. Note that, for each ε > 0, the triangle inequality implies

ρ(V, T V) ≤ ρ(V, V_n) + ρ(V_n, T V)

But since {V_n}_{n=0}^∞ is a Cauchy sequence, we have

ρ(V, T V) ≤ ρ(V, V_n) + ρ(V_n, T V) ≤ ε/2 + ε/2 = ε

for any ε > 0, hence ρ(V, T V) = 0 and V = T V.
Hence, as in the simple algorithm above, if the value function satisfies a contraction property, simple iterations will deliver the solution. It therefore remains to provide conditions for the Bellman operator to be a contraction. These are provided by the following theorem.

Theorem 2 (Blackwell's Sufficiency Conditions) Let X ⊆ R^ℓ and B(X) be the space of bounded functions V : X → R with the uniform metric. Let T : B(X) → B(X) be an operator satisfying

1. (Monotonicity) For all V, W ∈ B(X), if V ≤ W then T V ≤ T W;

2. (Discounting) There exists some constant β ∈ (0, 1) such that for all V ∈ B(X) and a ≥ 0, we have

T(V + a) ≤ T V + β a

then T is a contraction with modulus β.
Proof: Let us consider two functions V, W ∈ B(X), and note that

V ≤ W + ρ(V, W)

Monotonicity first implies that

T V ≤ T(W + ρ(V, W))

and discounting that

T V ≤ T W + β ρ(V, W)

since ρ(V, W) ≥ 0 plays the same role as a. We therefore get

T V − T W ≤ β ρ(V, W)

Likewise, starting from W ≤ V + ρ(V, W), we end up with

T W − T V ≤ β ρ(V, W)

Consequently, we have

|T V − T W| ≤ β ρ(V, W)

so that

ρ(T V, T W) ≤ β ρ(V, W)

which defines a contraction. This completes the proof.
Let us check these conditions for the optimal growth model, whose law of motion is k_{t+1} = F(k_t) − c_t. In order to save on notation, let us drop the time subscript and denote the next period capital stock by k′, such that, plugging the law of motion of capital into the utility function, the Bellman equation rewrites

V(k) = max_{k′ ∈ K} u(F(k) − k′) + β V(k′)

Monotonicity is immediate: if V ≤ W, then u(F(k) − k′) + β V(k′) ≤ u(F(k) − k′) + β W(k′) for every k′ ∈ K, and therefore T V ≤ T W. As for discounting,

T(V + a)(k) = max_{k′ ∈ K} u(F(k) − k′) + β (V(k′) + a) = (T V)(k) + β a

Therefore, the Bellman equation satisfies discounting in the case of the optimal growth model. Hence, the optimal growth model satisfies Blackwell's sufficient conditions for a contraction mapping, and therefore the value function exists and is unique. We are now in position to design a numerical algorithm to solve the Bellman equation.
7.2 Deterministic dynamic programming

7.2.1 Value iteration
As an example, let us consider the optimal growth model, with

u(c) = (c^{1−σ} − 1)/(1 − σ)

and

k′ = k^α − c + (1 − δ) k
Then the Bellman equation writes

V(k) = max_{0 ≤ c ≤ k^α + (1−δ)k} (c^{1−σ} − 1)/(1 − σ) + β V(k′)

or, expressed in terms of k′,

V(k) = max_{(1−δ)k ≤ k′ ≤ k^α + (1−δ)k} ((k^α + (1 − δ)k − k′)^{1−σ} − 1)/(1 − σ) + β V(k′)
Now, let us define a grid of N feasible values for k such that we have
K = {k1 , . . . , kN }
and an initial value function V₀(k), that is, a vector of N numbers relating each k_ℓ to a value. Note that this may be anything we want, as we know by the contraction mapping theorem that the algorithm will converge; but if we want it to converge fast enough, it may be a good idea to impose a good initial guess. Finally, we need a stopping criterion.
Then, for each ℓ = 1, …, N, we compute the feasible values that can be taken by the quantity on the right hand side of the value function,

V_{ℓ,h} = u(k_ℓ^α + (1 − δ)k_ℓ − k_h) + β V_i(k_h),  h feasible

It is important to understand what "h feasible" means. Indeed, we only compute consumption when it is positive and smaller than total output, which restricts the number of possible values for k′. Namely, we want k′ to satisfy

0 ≤ k′ ≤ k_ℓ^α + (1 − δ) k_ℓ

which puts a lower and an upper bound on the index h. When the grid of values is uniform, that is when k_h = k̲ + (h − 1) d_k, where d_k is the increment, the upper bound is

h̄ = E[ (k_ℓ^α + (1 − δ) k_ℓ − k̲) / d_k ] + 1

where E[·] denotes the integer part.
Then we find

V*_ℓ = max_{h=1,…,h̄} V_{ℓ,h}

and set

V_{i+1}(k_ℓ) = V*_ℓ

keeping in memory the index h* = Argmax_{h=1,…,h̄} V_{ℓ,h}, such that we have k′(k_ℓ) = k_{h*}.
In figures 7.1 and 7.2, we report the value function and the decision rules obtained from the deterministic optimal growth model with α = 0.3, β = 0.95, δ = 0.1 and σ = 1.5. The grid for the capital stock is composed of 1000 data points ranging from (1 − d)k* to (1 + d)k*, where k* denotes the steady state and d = 0.9. The algorithm² then converges in 211 iterations and 110.5 seconds on a 800Mhz computer when the stopping criterion is ε = 1e−6.
Figure 7.1: Deterministic OGM (Value function, Value iteration)
[figure: value function plotted against k_t]
sigma = 1.5;
delta = 0.1;
beta  = 0.95;
alpha = 0.30;

nbk   = 1000;   % number of data points in the grid
crit  = 1;      % convergence criterion
epsi  = 1e-6;   % convergence parameter

ks    = ((1-beta*(1-delta))/(alpha*beta))^(1/(alpha-1));
dev   = 0.9;          % maximal deviation from steady state
kmin  = (1-dev)*ks;   % lower bound on the grid
kmax  = (1+dev)*ks;   % upper bound on the grid

² This code and those that follow are not efficient from a computational point of view, as they are intended to have you understand the method without adding any coding complications. Much faster implementations can be found in the accompanying matlab codes.
[Figure 7.2: decision rules (next period capital stock, consumption) plotted against k_t]

dk    = (kmax-kmin)/(nbk-1);      % implied increment
kgrid = linspace(kmin,kmax,nbk)'; % builds the grid (column, so that it
                                  % conforms with v below)
v     = zeros(nbk,1);             % value function
dr    = zeros(nbk,1);             % decision rule (will contain indices)
while crit>epsi;
  for i=1:nbk
    %
    % compute indexes for which consumption is positive
    %
    tmp  = (kgrid(i)^alpha+(1-delta)*kgrid(i)-kmin);
    imax = min(floor(tmp/dk)+1,nbk);
    %
    % consumption and utility
    %
    c    = kgrid(i)^alpha+(1-delta)*kgrid(i)-kgrid(1:imax);
    util = (c.^(1-sigma)-1)/(1-sigma);
    %
    % find value function
    %
    [tv(i),dr(i)] = max(util+beta*v(1:imax));
  end;
  crit = max(abs(tv-v));   % compute convergence criterion
  v    = tv;               % update the value function
end
%
% Final solution
%
kp  = kgrid(dr);
c   = kgrid.^alpha+(1-delta)*kgrid-kp;
util= (c.^(1-sigma)-1)/(1-sigma);
7.2.2 Taking advantage of interpolation
I now report the matlab code for this approach, using cubic spline interpolation for the value function, 20 nodes for the capital stock and 1000 nodes for consumption. The algorithm converges in 182 iterations and 40.6 seconds, starting from the initial condition for the value function

v₀(k) = (((c*/y*) k^α)^{1−σ} − 1) / ((1 − σ)(1 − β))
sigma = 1.5;    % assumed equal to the benchmark values above
delta = 0.1;    % (the numeric values were lost in extraction)
beta  = 0.95;
alpha = 0.30;

nbk   = 20;     % number of nodes for the capital stock
nbc   = 1000;   % number of nodes for consumption
crit  = 1;
epsi  = 1e-6;

ks    = ((1-beta*(1-delta))/(alpha*beta))^(1/(alpha-1));
dev   = 0.9;                       % maximal deviation from steady state
kmin  = (1-dev)*ks;                % lower bound on the grid
kmax  = (1+dev)*ks;                % upper bound on the grid
kgrid = linspace(kmin,kmax,nbk)';  % builds the grid (column)
cmin  = 0.01;                      % lower bound on consumption
cmax  = kmax^alpha;                % upper bound on consumption
c     = linspace(cmin,cmax,nbc)';  % consumption grid (column)
v     = zeros(nbk,1);              % value function
dr    = zeros(nbk,1);              % decision rule (will contain indices)
util  = (c.^(1-sigma)-1)/(1-sigma);
while crit>epsi;
  for i=1:nbk;
    kp = kgrid(i).^alpha+(1-delta)*kgrid(i)-c;  % implied next period capital
    vi = interp1(kgrid,v,kp,'spline');          % interpolated value function
    [Tv(i),dr(i)] = max(util+beta*vi);
  end
  crit = max(abs(Tv-v));   % Compute convergence criterion
  v    = Tv;               % Update the value function
end
%
% Final solution (dr indexes the consumption grid)
%
copt = c(dr);
kp   = kgrid.^alpha+(1-delta)*kgrid-copt;
util = (copt.^(1-sigma)-1)/(1-sigma);
v    = util/(1-beta);
Figure 7.3: Deterministic OGM (Value function, Value iteration with interpolation)
[figure: value function plotted against k_t]

7.2.3 Policy iterations: Howard improvement
[Figure 7.4: Deterministic OGM (Decision rules, Value iteration with interpolation); next period capital stock and consumption plotted against k_t]

The simple value iteration algorithm has the attractive feature of being particularly simple to implement. However, it is a slow procedure, especially for infinite horizon problems, since it can be shown that it converges at the rate β, which is usually close to 1! Further, it computes unnecessary quantities during the algorithm, which slows down convergence. Often, computation speed is really important, for instance when one wants to perform a sensitivity analysis of the results obtained in a model using different parameterizations. Hence, we would like to be able to speed up convergence. This can be achieved by relying on the Howard improvement method, which iterates on policy functions rather than on the value function. The algorithm may be described as follows:
1. Set an initial feasible decision rule for the control variable, y = f₀(x), and compute the value associated with this guess, assuming that this rule is operative forever:

V(x_t) = Σ_{s=0}^∞ β^s u(f_i(x_{t+s}), x_{t+s})

taking care of the fact that x_{t+1} = h(x_t, y_t) = h(x_t, f_i(x_t)), with i = 0. Set a stopping criterion ε > 0.

2. Find a new policy rule y = f_{i+1}(x) such that

f_{i+1}(x) ∈ Argmax_y u(y, x) + β V(x′)

where x′ = h(x, y).

3. Check whether ‖f_{i+1}(x) − f_i(x)‖ < ε; if yes then stop, else go back to 2.

Note that this method differs fundamentally from the value iteration algorithm in at least two dimensions:

i) one iterates on the policy function rather than on the value function;

ii) the decision rule is used forever, whereas it is assumed to be used for only two consecutive periods in the value iteration algorithm. It is precisely this last feature that accelerates convergence.
Note that when computing the value function, we actually have to solve a linear system of the form

V_{i+1}(x_ℓ) = u(f_{i+1}(x_ℓ), x_ℓ) + β V_{i+1}(h(x_ℓ, f_{i+1}(x_ℓ))),  for all x_ℓ ∈ X

for the function V_{i+1}.
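Stacking the N grid points, this system reads V = u + βQV, where Q is the N × N selection matrix with Q_{ℓ,h} = 1 if the policy maps x_ℓ to x_h, and 0 otherwise. Hence V = (I − βQ)⁻¹u, which is exactly what the line Tv = (speye(nbk)-beta*Q)\util(:) computes in the code below.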
sigma = 1.50;
delta = 0.10;
beta  = 0.95;
alpha = 0.30;

nbk   = 1000;
crit  = 1;
epsi  = 1e-6;

ks    = ((1-beta*(1-delta))/(alpha*beta))^(1/(alpha-1));
dev   = 0.9;                       % maximal deviation from steady state
kmin  = (1-dev)*ks;                % lower bound on the grid
kmax  = (1+dev)*ks;                % upper bound on the grid
devk  = (kmax-kmin)/(nbk-1);       % implied increment (restored, used below)
kgrid = linspace(kmin,kmax,nbk)';  % builds the grid (column)
v     = zeros(nbk,1);              % value function
kp0   = kgrid;                     % initial guess on k(t+1)
dr    = zeros(nbk,1);              % decision rule (will contain indices)
%
% Main loop
%
while crit>epsi;
  for i=1:nbk
    %
    % compute indexes for which consumption is positive
    %
    imax = min(floor((kgrid(i)^alpha+(1-delta)*kgrid(i)-kmin)/devk)+1,nbk);
    %
    % consumption and utility
    %
    c    = kgrid(i)^alpha+(1-delta)*kgrid(i)-kgrid(1:imax);
    util = (c.^(1-sigma)-1)/(1-sigma);
    %
    % find new policy rule
    %
    [v1,dr(i)]= max(util+beta*v(1:imax));
  end;
  %
  % decision rules
  %
  kp  = kgrid(dr);
  c   = kgrid.^alpha+(1-delta)*kgrid-kp;
  %
  % update the value
  %
  util = (c.^(1-sigma)-1)/(1-sigma);
  Q    = sparse(nbk,nbk);
  for i=1:nbk;
    Q(i,dr(i)) = 1;
  end
  Tv   = (speye(nbk)-beta*Q)\util(:);
  crit = max(abs(kp-kp0));
  v    = Tv;
  kp0  = kp;
end
[Figures 7.5 and 7.6: value function and decision rules (next period capital stock, consumption) obtained with policy iterations, plotted against k_t]
When the grid becomes large, exactly solving the linear system at each step may itself be costly. A modified policy iteration then replaces the exact solve by a partial evaluation:

1. set J₀ = V_i;

2. iterate k times on

J_{i+1} = u(y, x) + β Q J_i,  i = 0, …, k;

3. set V_{i+1} = J_k.

When k → ∞, J_k tends toward the solution of the linear system.
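A minimal sketch of this partial evaluation step, with util, Q and beta as in the code above and kk denoting the (hypothetical) number of evaluation sweeps, would be:

J = v;                     % start from the current value function
for it = 1:kk
  J = util(:)+beta*Q*J;    % one evaluation sweep under the fixed policy
end
v = J;                     % V_{i+1} = J_k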
7.2.4 Parametric dynamic programming
The last technique I will describe borrows from approximation theory, using either orthogonal polynomials or spline functions. The idea is to make a guess for the functional form of the value function and to iterate on the parameters of this functional form. The algorithm works as follows:

1. Choose a functional form for the value function Ṽ(x; Θ), a grid of interpolating nodes X = {x₁, …, x_N}, a stopping criterion ε > 0 and an initial vector of parameters Θ₀.

2. Using the conjecture for the value function, perform the maximization step in the Bellman equation, that is compute, for each ℓ = 1, …, N,

w_ℓ = T(Ṽ(x_ℓ, Θ_i)) = max_y u(y, x_ℓ) + β Ṽ(x′, Θ_i)

with x′ = h(x_ℓ, y).

3. Update the parameter vector by fitting Ṽ(·; Θ) to the data {x_ℓ, w_ℓ}, and iterate until ‖Θ_{i+1} − Θ_i‖ < ε.
First note that for this method to be implementable, we need the payoff function and the value function to be continuous. The approximation function may be a combination of polynomials, neural networks or splines. Note that during the optimization step we may have to rely on a numerical maximization algorithm, and the fitting step may involve numerical minimization in order to solve a nonlinear least-squares problem of the form

Θ_{i+1} ∈ Argmin_Θ Σ_{ℓ=1}^N (w_ℓ − Ṽ(x_ℓ; Θ))²

This algorithm is usually much faster than value iteration, as it does not require iterating on a large grid. As an example, I will once again focus on the optimal growth problem we have been dealing with so far, and approximate the value function by Chebyshev polynomials:

Ṽ(k; Θ) = Σ_{i=0}^p θ_i T_i(φ(k))

where φ(·) maps [k̲, k̄] into [−1, 1].
Figures 7.7 and 7.8 report the value function and the decision rules in this case, and table 7.1 reports the parameters of the approximation function. The algorithm converged in 242 iterations, but took far less time than value iteration.

[Figures 7.7 and 7.8: value function and decision rules (next period capital stock, consumption), parametric dynamic programming]
Table 7.1: Value function approximation (Chebyshev coefficients)

θ0 =  0.82367    θ4 = -0.10281    θ8  = -0.00617
θ1 =  2.78042    θ5 =  0.05148    θ9  =  0.00501
θ2 = -0.66012    θ6 = -0.02601    θ10 = -0.00281
θ3 =  0.23704    θ7 =  0.01126
sigma = 1.50;   % assumed equal to the benchmark values
delta = 0.10;   % (the numeric values were lost in extraction)
beta  = 0.95;
alpha = 0.30;

nbk   = 20;     % number of nodes
n     = 10;     % order of the Chebyshev polynomials
crit  = 1;
iter  = 1;
epsi  = 1e-6;
ks    = ((1-beta*(1-delta))/(alpha*beta))^(1/(alpha-1));
dev   = 0.9;                              % maximal dev. from steady state
kmin  = (1-dev)*ks;                       % lower bound on the grid
kmax  = (1+dev)*ks;                       % upper bound on the grid
rk    = -cos((2*[1:nbk]'-1)*pi/(2*nbk));  % Interpolating nodes
kgrid = kmin+(rk+1)*(kmax-kmin)/2;        % mapping into [kmin,kmax]
%
% Initial guess for the approximation
%
v     = (((kgrid.^alpha).^(1-sigma)-1)/((1-sigma)*(1-beta)));
X     = chebychev(rk,n);
th0   = X\v;
Tv    = zeros(nbk,1);
kp    = zeros(nbk,1);
%
% Main loop
%
options=foptions;
options(14)=1e9;
while crit>epsi;
  k0 = kgrid(1);
  for i=1:nbk
    param = [alpha beta delta sigma kmin kmax n kgrid(i)];
    kp(i) = fminu('tv',k0,options,[],param,th0);
    k0    = kp(i);
    Tv(i) = -tv(kp(i),param,th0);
  end;
  theta= X\Tv;
  crit = max(abs(Tv-v));
  v    = Tv;
  th0  = theta;
  iter = iter+1;
end
Functions

Two auxiliary functions are needed: tv, which computes (minus) the right hand side of the Bellman equation for a candidate k′ (only its comments survived in this copy; a reconstruction is sketched below), and value, which evaluates the Chebyshev approximation of the value function.
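A minimal reconstruction of tv, consistent with the surviving comments and with the way it is called in the main loop (the body is hypothetical, as the original was lost), might be:

function res = tv(kp,param,th0);
alpha = param(1); beta = param(2); delta = param(3); sigma = param(4);
kmin  = param(5); kmax = param(6); n = param(7); k = param(8);
kp    = sqrt(kp.^2);                  % insures positivity of k'
v     = value(kp,[kmin kmax n],th0);  % computes the value function
c     = k^alpha+(1-delta)*k-kp;       % computes consumption
d     = find(c<=0);                   % find negative consumption
c(d)  = NaN;                          % get rid of negative c
util  = (c.^(1-sigma)-1)/(1-sigma);   % computes utility
util(d) = -1e12;                      % utility = low number for c<0
res   = -(util+beta*v);               % compute -TV (we are minimizing)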
function v = value(k,param,theta);
kmin = param(1);
kmax = param(2);
n    = param(3);
k    = 2*(k-kmin)/(kmax-kmin)-1;   % map k into [-1,1]
v    = chebychev(k,n)*theta;
function Tx=chebychev(x,n);
X  = x(:);
lx = size(X,1);
if n<0; error('n should be a positive integer'); end
switch n;
  case 0;
    Tx=[ones(lx,1)];
  case 1;
    Tx=[ones(lx,1) X];
  otherwise
    Tx=[ones(lx,1) X];
    for i=3:n+1;
      Tx=[Tx 2*X.*Tx(:,i-1)-Tx(:,i-2)];   % Chebyshev recursion
    end
end
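For instance, chebychev(linspace(-1,1,5)',3) returns a 5 × 4 matrix whose columns are T₀, …, T₃ evaluated at the five nodes.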
Judd [1998] recommends using shape-preserving methods, such as the Schumaker approximation, in this case. Judd and Solnick [1994] have successfully applied this latter technique to the optimal growth model and found that the approximation was very good and dominated a lot of other methods (they actually obtain the same precision with 12 nodes as that achieved with a 1200 data point grid using a value iteration technique).
7.3 Stochastic dynamic programming

7.3.1 The stochastic Bellman equation
The stochastic problem differs from the deterministic problem in that we now have a … stochastic shock! The problem then defines a value function which has as arguments the state variable x_t but also the stationary shock s_t, whose sequence {s_t}_{t=0}^{+∞} satisfies a Markov process of the form

s_{t+1} = Φ(s_t, ε_{t+1})    (7.6)

The value function is now

V(x_t, s_t) = max_{{y_{t+τ} ∈ D(x_{t+τ}, s_{t+τ})}_{τ=0}^∞} E_t Σ_{τ=0}^∞ β^τ u(y_{t+τ}, x_{t+τ}, s_{t+τ})    (7.7)

subject to

x_{t+1} = h(x_t, y_t, s_t)    (7.8)

Breaking the sum, as in the deterministic case, we get

V(x_t, s_t) = max [ u(y_t, x_t, s_t) + E_t Σ_{τ=1}^∞ β^τ u(y_{t+τ}, x_{t+τ}, s_{t+τ}) ]
or

V(x_t, s_t) = max_{y_t ∈ D(x_t, s_t)} u(y_t, x_t, s_t) + β E_t [ max_{{y_{t+1+k}}} E_{t+1} Σ_{k=0}^∞ β^k u(y_{t+1+k}, x_{t+1+k}, s_{t+1+k}) ]

Note that, because each value of the shock defines a particular mathematical object, the maximization of the integral corresponds to the integral of the maximization; therefore the max operator and the expectation are interchangeable, so that we get

V(x_t, s_t) = max_{y_t ∈ D(x_t, s_t)} u(y_t, x_t, s_t) + β E_t V(x_{t+1}, s_{t+1})

which is precisely the Bellman equation for the stochastic dynamic programming problem.
7.3.2 Discretization of the shocks
A very important problem that arises whenever we deal with value iteration or policy iteration in a stochastic environment is that of the discretization of the space spanned by the shocks. Indeed, the use of a continuous support for the stochastic shocks is infeasible for a computer, which can only deal with discrete supports. We therefore have to transform the continuous problem into a discrete one, with the constraint that the asymptotic properties of the continuous and the discrete processes be the same. The question we face is therefore: does there exist a discrete representation of s which is equivalent to its continuous original representation? The answer is yes. In particular, as soon as we deal with (V)AR processes, we can use a very powerful tool: Markov chains.
A Markov chain is a stochastic process taking values in a finite set of states {s₁, …, s_M}, such that the probability of the next state s_{t+1} depends only on the current realization of s_t. The transition probabilities π_ij = Prob(s_{t+1} = s_j | s_t = s_i) must of course satisfy

1. π_ij ≥ 0 for all i, j = 1, …, M

2. Σ_{j=1}^M π_ij = 1 for all i = 1, …, M

and are gathered in the transition matrix

Π = [ π_11 … π_1M ;  …  ; π_M1 … π_MM ]
Note that π_ij^{(k)}, the (i, j) element of Π^k, gives the probability that s_{t+k} = s_j given that s_t = s_i. In the long run, we obtain the steady state equivalent for a Markov chain: the invariant, or stationary, distribution.

Definition 7 A stationary distribution for a Markov chain is a distribution π* such that

1. π*_j ≥ 0 for all j = 1, …, M

2. Σ_{j=1}^M π*_j = 1

3. π*′ = π*′ Π
Gaussian-quadrature approach to discretization

Consider an AR(1) process s_{t+1} = ρ s_t + (1 − ρ) s̄ + ε_{t+1} with Gaussian innovations of standard deviation σ. Its transition density integrates to one:

∫ (1/(σ√(2π))) exp{ −(1/2) [(s_{t+1} − ρ s_t − (1 − ρ) s̄)/σ]² } ds_{t+1} = 1

which illustrates the fact that s is a continuous random variable. Tauchen and Hussey [1991] propose to rewrite the integral as

∫ [f(s_{t+1}|s_t)/f(s_{t+1}|s̄)] f(s_{t+1}|s̄) ds_{t+1} ≡ ∫ Φ(s_{t+1}; s_t, s̄) f(s_{t+1}|s̄) ds_{t+1} = 1

where f(s_{t+1}|s̄) denotes the density of s_{t+1} conditional on s_t = s̄ (which amounts to the unconditional density), which in our case implies that

Φ(s_{t+1}; s_t, s̄) = exp{ −(1/2) ( [(s_{t+1} − ρ s_t − (1 − ρ) s̄)/σ]² − [(s_{t+1} − s̄)/σ]² ) }

We can then use the standard linear transformation z_t = (s_t − s̄)/(σ√2) to get

(1/√π) ∫ exp{ −(z_{t+1} − ρ z_t)² + z_{t+1}² } exp(−z_{t+1}²) dz_{t+1} = 1
for which we can use a Gauss-Hermite quadrature. Assume we have the n nodes z_j and weights ω_j; then

(1/√π) Σ_{j=1}^n ω_j Φ(z_j; z_i) ≃ 1

where Φ(z_j; z_i) = exp{ z_j² − (z_j − ρ z_i)² }. The term ω_j Φ(z_j; z_i)/√π can then be interpreted as an estimate π̂_ij of the transition probability from state i to state j. But remember that the quadrature is just an approximation, such that Σ_{j=1}^n π̂_ij = 1 will generally not hold exactly. Tauchen and Hussey therefore normalize

π̂_ij = ω_j Φ(z_j; z_i) / (√π ŝ_i),  where ŝ_i = (1/√π) Σ_{j=1}^n ω_j Φ(z_j; z_i)
function [s,p]=tauch_huss(xbar,rho,sigma,n)
% xbar  : mean of the x process
% rho   : persistence parameter
% sigma : volatility
% n     : number of nodes
%
% returns the states s and the transition probabilities p
%
[xx,wx] = gauss_herm(n);       % nodes and weights for the quadrature
s  = sqrt(2)*sigma*xx+xbar;    % discrete states
x  = xx(:,ones(n,1));          % current state z_i (constant across rows)
z  = x';                       % next state z_j (constant across columns)
w  = wx(:,ones(n,1))';         % weights attached to z_j
%
% computation
%
p  = (exp(z.*z-(z-rho*x).*(z-rho*x)).*w)./sqrt(pi);
sx = sum(p')';
p  = p./sx(:,ones(n,1));       % normalization of the rows
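For instance, a 4-state discretization of an AR(1) process with mean 0, persistence 0.9 and innovation volatility 0.01 is obtained as

[s,p] = tauch_huss(0,0.9,0.01,4);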
7.3.3 Value iteration

The value iteration algorithm is the same as in the deterministic case, except that the update step now computes the expectation over the Markov chain:

2. V_{i+1}(x, s_k) = max_{y ∈ D(x, s_k)} u(y, x, s_k) + β Σ_{j=1}^M π_kj V_i(x′, s′_j)

3. If ‖V_{i+1}(x, s) − V_i(x, s)‖ < ε go to the next step, else go back to 2.

4. Compute the final solution as y*(x, s) = y(x, x′(x, s), s).
As in the deterministic case, I will illustrate the method relying on the optimal growth model. In order to better understand the algorithm, let us consider a simple example and go back to the optimal growth model,⁴ with

u(c) = (c^{1−σ} − 1)/(1 − σ)

and

k′ = exp(a) k^α − c + (1 − δ) k

where a′ = ρ a + ε′. Then the Bellman equation writes

V(k, a) = max_c (c^{1−σ} − 1)/(1 − σ) + β ∫ V(k′, a′) dΓ(a′|a)

From the law of motion of capital we can determine consumption as

c = exp(a) k^α + (1 − δ) k − k′

such that, plugging this result into the Bellman equation, we have

V(k, a) = max_{k′} ((exp(a) k^α + (1 − δ) k − k′)^{1−σ} − 1)/(1 − σ) + β ∫ V(k′, a′) dΓ(a′|a)

A first problem we encounter is that we would like to be able to evaluate the integral involved by the rational expectation. We therefore have to discretize the shock. Here, I will consider that the technology shock can be accurately approximated by a 2-state Markov chain, such that a can take on 2 values a₁ and a₂ (a₁ < a₂). I will also assume that the transition matrix is symmetric, such that

Π = [ p  1−p ; 1−p  p ]

a₁, a₂ and p are selected such that the process reproduces the conditional first and second order moments of the AR(1) process:

First order moments:
p a₁ + (1 − p) a₂ = ρ a₁
(1 − p) a₁ + p a₂ = ρ a₂

Second order moments:
p a₁² + (1 − p) a₂² − (ρ a₁)² = σ_ε²
(1 − p) a₁² + p a₂² − (ρ a₂)² = σ_ε²

From the first two equations we get a₁ = −a₂ and p = (1 + ρ)/2. Plugging these results into the last two equations, we get a₁ = −√(σ_ε²/(1 − ρ²)).

⁴ The interested reader should refer to Lucas et al. [1989], chapter 9.
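A direct matlab transcription of these formulas (a sketch; note that the full code below instead applies an exponential transformation to the states) reads:

p  = (1+rho)/2;          % transition probability
a1 = -se/sqrt(1-rho^2);  % low state (se denotes sigma_eps)
a2 = -a1;                % high state (symmetric chain)
PI = [p 1-p;1-p p];      % transition matrix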
The Bellman equation then writes

V(k, a_i) = max_c (c^{1−σ} − 1)/(1 − σ) + β Σ_{j=1}^2 π_ij V(k′, a′_j)

Now, let us define a grid of N feasible values for k, such that we have

K = {k₁, …, k_N}

and an initial value function V₀(k), a vector of N numbers relating each k_ℓ to a value. Note that this may be anything we want, as we know by the contraction mapping theorem that the algorithm will converge; but if we want it to converge fast enough, it may be a good idea to impose a good initial guess. Finally, we need a stopping criterion.

In figures 7.9 and 7.10, we report the value function and the decision rules obtained for the stochastic optimal growth model with α = 0.3, β = 0.95, δ = 0.1, σ = 1.5, ρ = 0.8 and σ_ε = 0.12. The grid for the capital stock is composed of 1000 data points ranging from 0.2 to 6. The algorithm then converges in 192 iterations when the stopping criterion is ε = 1e−6.
[Figures 7.9 and 7.10: value function and decision rules (next period capital stock, consumption) of the stochastic optimal growth model, plotted against k_t for the two states of the shock]
sigma = 1.50;
delta = 0.10;
beta  = 0.95;
alpha = 0.30;
rho   = 0.80;
se    = 0.12;

nbk   = 1000;
nba   = 2;
crit  = 1;
epsi  = 1e-6;
%
% Discretization of the shock
%
p     = (1+rho)/2;
PI    = [p 1-p;1-p p];
se    = 0.12;
ab    = 0;
a1    = exp(-se*se/(1-rho*rho));
a2    = exp(se*se/(1-rho*rho));
A     = [a1 a2];
%
% Discretization of the state space
%
kmin  = 0.2;
kmax  = 6;
kgrid = linspace(kmin,kmax,nbk)';   % (column)
c     = zeros(nbk,nba);
util  = zeros(nbk,nba);
v     = zeros(nbk,nba);
Tv    = zeros(nbk,nba);
%
% Main loop
%
iter = 1;                % (initialization restored)
dr   = zeros(nbk,nba);   % (initialization restored)
while crit>epsi;
  for i=1:nbk
    for j=1:nba;
      c           = A(j)*kgrid(i)^alpha+(1-delta)*kgrid(i)-kgrid;
      neg         = find(c<=0);
      c(neg)      = NaN;
      util(:,j)   = (c.^(1-sigma)-1)/(1-sigma);
      util(neg,j) = -1e12;
    end
    [Tv(i,:),dr(i,:)] = max(util+beta*(v*PI));
  end;
  crit = max(max(abs(Tv-v)));
  v    = Tv;
  iter = iter+1;
end
kp = kgrid(dr);
for j=1:nba;
  c(:,j) = A(j)*kgrid.^alpha+(1-delta)*kgrid-kp(:,j);
end
7.3.4 Policy iterations

Policy iteration extends to the stochastic case in the same way: the policy evaluation step now solves the linear system

V(x_ℓ, s_k) = u(f(x_ℓ, s_k), x_ℓ, s_k) + β Σ_{j=1}^M π_kj V(x′_ℓ, s′_j)

for all grid points, which in stacked form reads V = u + βQV, where Q is an (N·M × N·M) matrix combining the transition probabilities of the chain with the decision rule.
[Figures 7.11 and 7.12: value function and decision rules of the stochastic optimal growth model, policy iterations]
sigma = 1.50;
delta = 0.10;
beta  = 0.95;
alpha = 0.30;
rho   = 0.80;
se    = 0.12;

nbk   = 1000;
nba   = 2;
crit  = 1;
epsi  = 1e-6;
%
% Discretization of the shock
%
p     = (1+rho)/2;
PI    = [p 1-p;1-p p];
se    = 0.12;
ab    = 0;
a1    = exp(-se*se/(1-rho*rho));
a2    = exp(se*se/(1-rho*rho));
A     = [a1 a2];
%
% Discretization of the state space
%
kmin  = 0.2;
kmax  = 6;
kgrid = linspace(kmin,kmax,nbk)';   % (column)
c     = zeros(nbk,nba);
util  = zeros(nbk,nba);
v     = zeros(nbk,nba);
Tv    = zeros(nbk,nba);
Ev    = zeros(nbk,nba);             % expected value function
dr0   = repmat([1:nbk]',1,nba);     % initial guess
dr    = zeros(nbk,nba);             % decision rule (will contain indices)
kp0   = kgrid(:,ones(nba,1));       % initial guess on k(t+1) (restored)
%
% Main loop
%
iter = 1;
while crit>epsi;
  for i=1:nbk
    for j=1:nba;
      c           = A(j)*kgrid(i)^alpha+(1-delta)*kgrid(i)-kgrid;
      neg         = find(c<=0);
      c(neg)      = NaN;
      util(:,j)   = (c.^(1-sigma)-1)/(1-sigma);
      util(neg,j) = -inf;
      Ev(:,j)     = v*PI(j,:)';    % expectation (transpose restored)
    end
    [v1,dr(i,:)]= max(util+beta*Ev);
  end;
  %
  % decision rules
  %
  kp = kgrid(dr);
  Q  = sparse(nbk*nba,nbk*nba);
  for j=1:nba;
    c = A(j)*kgrid.^alpha+(1-delta)*kgrid-kp(:,j);
    %
    % update the value
    %
    util(:,j)= (c.^(1-sigma)-1)/(1-sigma);
    Q0 = sparse(nbk,nbk);
    for i=1:nbk;
      Q0(i,dr(i,j)) = 1;
    end
    Q((j-1)*nbk+1:j*nbk,:) = kron(PI(j,:),Q0);
  end
  Tv   = (speye(nbk*nba)-beta*Q)\util(:);
  crit = max(max(abs(kp-kp0)));
  v    = reshape(Tv,nbk,nba);
  kp0  = kp;
  iter = iter+1;
end
c = zeros(nbk,nba);
for j=1:nba;
  c(:,j) = A(j)*kgrid.^alpha+(1-delta)*kgrid-kp(:,j);
end
7.4 Simulating the model

7.4.1 The deterministic case
After having solved the model by either value iteration or policy iteration, we have in hand a collection of points, X, for the state variable, and a decision rule that relates the control variable to this collection of points. In other words, we have Lagrange data which can be used to get a continuous decision rule, which may then be used to simulate the model.

In order to illustrate this, let us focus once again on the deterministic optimal growth model, and assume that we have solved the model by either value iteration or policy iteration. We therefore have a collection of data points for the capital stock, K = {k₁, …, k_N}, which form our grid, the associated decision rule for the next period capital stock, k′_i = h(k_i), i = 1, …, N, and the consumption level, which can be deduced from the next period capital stock using the law of motion of capital together with the resource constraint. Also note that we have the upper and lower bounds of the grid: k₁ = k̲ and k_N = k̄. We will use Chebyshev polynomials to approximate the rule, in order to avoid multicollinearity problems in the least squares algorithm. In other words, we will get a decision rule of the form

k′ = Σ_{ℓ=0}^n θ_ℓ T_ℓ(φ(k))
where φ(k) is a linear transformation mapping [k̲, k̄] into [−1, 1]. The algorithm works as follows:

1. Solve the model by value iteration or policy iteration, such that we have a collection of data points for the capital stock, K = {k₁, …, k_N}, and the associated decision rule for the next period capital stock, k′_i = h(k_i), i = 1, …, N. Choose a maximal order n for the Chebyshev polynomial approximation.

2. Apply the transformation

x_i = 2 (k_i − k̲)/(k̄ − k̲) − 1

to the grid points and compute the θ_ℓ by ordinary least squares.

3. Use the rule to simulate the capital path and recover the remaining variables from

y_t = k_t^α,  i_t = k_{t+1} − (1 − δ) k_t,  c_t = y_t − i_t
For instance, figure 7.13 reports the transitional dynamics of the model when the economy starts 50% below its steady state. In the figure, the plain line represents the dynamics of each variable, and the dashed line its steady state level.

Figure 7.13: Transitional dynamics (OGM)
[figure: capital stock, investment, output and consumption against time]
transk = 2*(kgrid(:)-kmin)/(kmax-kmin)-1;   % transformed grid (restored)
Tk = [ones(nbk,1) transk];
for i=3:n;
  Tk=[Tk 2*transk.*Tk(:,i-1)-Tk(:,i-2)];
end
b   = Tk\kp;            % Performs OLS
k0  = 0.5*ks;           % initial capital stock
nrep= 50;               % number of periods
k   = zeros(nrep+1,1);  % initialize dynamics
%
% iteration loop
%
k(1)= k0;
for t=1:nrep;
  trkt = 2*(k(t)-kmin)/(kmax-kmin)-1;
  Tk   = [1 trkt];
  for i=3:n;
    Tk = [Tk 2*trkt.*Tk(:,i-1)-Tk(:,i-2)];
  end
  k(t+1)=Tk*b;
end
y = k(1:nrep).^alpha;
i = k(2:nrep+1)-(1-delta)*k(1:nrep);
c = y-i;
Note: At this point, I assume that the model has already been solved and that the capital
grid has been stored in kgrid and the decision rule for the next period capital stock is stored
in kp.
7.4.2 The stochastic case
3. Assume the economy was in state i in t − 1. Draw p_t from a uniform distribution on [0, 1] and find the index j such that

c_{i(j−1)} < p_t ≤ c_{ij}

where c_{ij} = Σ_{k=1}^j π_ik denotes the cumulated transition probability; then s_j is the state of the economy in period t.⁵

⁵ The code to generate these two rules is exactly the same as for the deterministic version of the model, since when computing b=Tk\kp matlab will take care of the fact that kp is an (N × 2) matrix, such that b will be an ((n + 1) × 2) matrix, where n is the approximation order.
% (the initialization lines of this fragment were lost; plausible settings
%  are restored below and marked as such)
s      = [a1 a2];                    % restored: vector of states
cum_PI = [zeros(2,1) cumsum(PI,2)];  % restored: cumulated transition matrix
cpi    = size(PI,2);                 % restored: number of states
n      = 200;                        % restored: length of the simulated chain
sim    = rand(n,1);                  % restored: uniform draws
j      = zeros(n,1); j(1) = 1;       % restored: initial state
for k=2:n;
  j(k)=find(((sim(k)<=cum_PI(j(k-1),2:cpi+1))&...
             (sim(k)>cum_PI(j(k-1),1:cpi))));
end;
chain=s(j);   % simulated values
We are then in a position to simulate our optimal growth model. Figure 7.14 reports a simulation of the optimal growth model we solved before.

Figure 7.14: Simulated OGM
[figure: capital stock, investment, output and consumption against time]
  k(t+1) = Tk*b(:,id(t));   % next period capital (left hand side restored)
  y(t)   = A(id(t))*k(t).^alpha;   % output (left hand side restored)
end
i=k(2:long+1)-(1-delta)*k(1:long);   % computes investment
c=y-i;                               % computes consumption

Note: At this point, I assume that the model has already been solved and that the capital grid has been stored in kgrid and the decision rule for the next period capital stock is stored in kp.
7.5 Handling constraints

In this last section we consider a household that maximizes⁶

E_t Σ_{τ=0}^∞ β^τ (c_{t+τ}^{1−σ} − 1)/(1 − σ)    (7.9)

subject to the budget constraint

a_{t+1} = (1 + r) a_t + ω_t − c_t    (7.10)

where labor income takes one of two values, ω_t ∈ {ω̲, ω̄}, depending on the household's employment status    (7.11)

We assume that the household's state on the labor market is a random variable; further, the probability of being (un)employed in period t is greater if the household was also (un)employed in period t − 1. In other words, the state is persistent. The household also faces a borrowing constraint stating that she cannot borrow more than a threshold level a*:

a_{t+1} ≥ a*    (7.12)

⁶ E_t(.) denotes the mathematical conditional expectation. Expectations are conditional on information available at the beginning of period t.
The household thus maximizes (7.9) subject to (7.10)–(7.12), a problem which may be solved by one of the methods we have seen before. Implementing any of these methods while taking the constraint into account is straightforward: plugging the budget constraint into the utility function, the Bellman equation can be rewritten

V(a_t, ω_t) = max_{a_{t+1}} u((1 + r) a_t + ω_t − a_{t+1}) + β E_t V(a_{t+1}, ω_{t+1})

s.t. (7.11)–(7.12). But, since the borrowing constraint should be satisfied in each and every period, we have

V(a_t, ω_t) = max_{a_{t+1} ≥ a*} u((1 + r) a_t + ω_t − a_{t+1}) + β E_t V(a_{t+1}, ω_{t+1})

In other words, when defining the grid for a, one just has to take care of the fact that the minimum admissible value for a is a*. Positivity of consumption, which should be taken into account when searching for the optimal value of a_{t+1}, finally imposes a_{t+1} < (1 + r) a_t + ω_t, such that the Bellman equation writes

V(a_t, ω_t) = max_{a* ≤ a_{t+1} < (1+r) a_t + ω_t} u((1 + r) a_t + ω_t − a_{t+1}) + β E_t V(a_{t+1}, ω_{t+1})
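In practice, the constraint is then imposed simply through the grid. A minimal sketch (the bounds and grid size below are hypothetical):

astar = 0;                         % borrowing threshold (hypothetical)
amax  = 20;                        % upper bound on wealth (hypothetical)
nba   = 500;                       % number of grid points (hypothetical)
agrid = linspace(astar,amax,nba);  % a(t+1) >= astar holds by construction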
Bibliography

Aiyagari, S.R., Uninsured Idiosyncratic Risk and Aggregate Saving, Quarterly Journal of Economics, 1994, 109 (3), 659–684.

Bertsekas, D., Dynamic Programming and Stochastic Control, New York: Academic Press, 1976.

Judd, K. and A. Solnick, Numerical Dynamic Programming with Shape Preserving Splines, manuscript, Hoover Institution, 1994.

Judd, K.L., Numerical methods in economics, Cambridge, Massachusetts: MIT Press, 1998.

Lucas, R., N. Stokey, and E. Prescott, Recursive Methods in Economic Dynamics, Cambridge (MA): Harvard University Press, 1989.

Tauchen, G. and R. Hussey, Quadrature Based Methods for Obtaining Approximate Solutions to Nonlinear Asset Pricing Models, Econometrica, 1991, 59 (2), 371–396.
Lecture Notes 8

Parameterized expectations algorithm

The Parameterized Expectations Algorithm (PEA hereafter) was introduced by Marcet [1988]. As will become clear in a moment, it may be viewed as a generalized method of undetermined coefficients, in which economic agents learn the decision rule at each step of the algorithm. It therefore has a natural interpretation in terms of learning behavior. The basic idea of this method is to approximate the expectation function of the individuals, rather than attempting to recover the decision rules directly, by a smooth function, in general a polynomial function. Implicit in this approach is the fact that the space spanned by polynomials is dense in the space spanned by all functions, in the sense that

lim_{k→∞} inf_{θ ∈ R^k} sup_{x ∈ X} |F(x) − F̂(x; θ)| = 0

where F is the function to be approximated and F̂ is a kth order interpolating function parameterized by θ.
8.1 Basics

The PEA algorithm iterates on the parameters of the expectation function until it reveals the set of parameters insuring that the residuals from the Euler equations are a martingale difference sequence (E_t η_{t+1} = 0). Note that the main difficulty when solving the model is to deal with the integral involved by the expectation; the approach of the basic PEA algorithm is to approximate it by Monte Carlo simulations.

The PEA algorithm may be implemented to solve a large set of models that admit the following general representation:

F(E_t(E(y_{t+1}, x_{t+1}, y_t, x_t)), y_t, x_t, ε_t) = 0    (8.1)

where F : R^m × R^{ny} × R^{nx} × R^{ne} → R^{nx+ny} describes the model and E : R^{ny} × R^{nx} × R^{ny} × R^{nx} → R^m defines the transformed variables on which we take expectations. E_t is the standard conditional expectations operator, and ε_t is the set of innovations of the structural shocks that affect the economy.
In order to fix notation, let us take the optimal growth model as an example:

λ_t − β E_t[ λ_{t+1} (α z_{t+1} k_{t+1}^{α−1} + 1 − δ) ] = 0
c_t^{−σ} − λ_t = 0
k_{t+1} − z_t k_t^α + c_t − (1 − δ) k_t = 0
log(z_{t+1}) − ρ log(z_t) − ε_{t+1} = 0

In this example, we have y = {c, λ}, x = {k, z} and ε; the function E takes the form

E({c, λ}_{t+1}, {k, z}_{t+1}, {c, λ}_t, {k, z}_t) = λ_{t+1} (α z_{t+1} k_{t+1}^{α−1} + 1 − δ)

and F(.) stacks the four equations above, with the first written as λ_t − β E_t(·).
The idea of the PEA algorithm is then to replace the expectation function E_t(E(y_{t+1}, x_{t+1}, y_t, x_t)) by a parametric approximation function, Φ(x_t; θ), of the current state variables x_t and a vector of parameters θ, such that the approximated model may be restated as

F(Φ(x_t, θ), y_t, x_t, ε_t) = 0    (8.2)

The problem of the PEA algorithm is then to find a vector θ* such that

θ* ∈ Argmin_θ ‖Φ(x_t, θ) − E_t(E(y_{t+1}, x_{t+1}, y_t, x_t))‖²

In principle any norm might be used, some of which would call for LAD estimation methods; the usual practice, however, is to use the standard quadratic norm. Once θ*, and therefore the approximation function, has been found, Φ(x_t, θ*) and equation (8.2) may be used to generate time series for the variables of the model.
The algorithm may then be described as follows.

Step 1. Specify a guess for the function Φ(x_t, θ) and an initial θ₀. Choose a stopping criterion ε > 0 and a sample size T that should be large enough, and draw a sequence {ε_t}_{t=0}^T that will be used during the whole algorithm.

Step 2. At iteration i, and for the given θ_i, simulate, recursively, sequences {y_t(θ_i)}_{t=0}^T and {x_t(θ_i)}_{t=0}^T.

Step 3. Find θ̂_i such that

θ̂_i ∈ Argmin_θ Σ_{t=0}^{T−1} ‖E(y_{t+1}(θ_i), x_{t+1}(θ_i), y_t(θ_i), x_t(θ_i)) − Φ(x_t(θ_i), θ)‖²

which just amounts to performing a nonlinear least squares regression, taking E(y_{t+1}(θ_i), x_{t+1}(θ_i), y_t(θ_i), x_t(θ_i)) as the dependent variable, Φ(.) as the explanatory function and θ as the parameter to be estimated.

Step 4. Set θ_{i+1} to

θ_{i+1} = γ θ̂_i + (1 − γ) θ_i,  γ ∈ (0, 1]    (8.3)

and go back to step 2 until ‖θ_{i+1} − θ_i‖ < ε.
A first implementation issue is the choice of the approximation function Φ(.). How should it be specified? In fact, we are free to choose any functional form we may think of; nevertheless, economic theory may guide us, as may some constraints imposed by the method, more particularly in step 3. A widely used interpolating function combines the nonlinear aspects of the exponential function with polynomials, such that Φ_j(x, θ) (where j ∈ {1, …, m} refers to a particular expectation) takes the form of the exponential of a polynomial in the (logged) state variables. A difficulty with raw polynomials is that their successive powers are highly correlated. For instance, with σ = 0.01, the correlation matrix of the first four powers of a simulated AR(1) series is

1.0000  0.9998  0.9991  0.9980
0.9998  1.0000  0.9998  0.9991
0.9991  0.9998  1.0000  0.9998
0.9980  0.9991  0.9998  1.0000

for the sequence up to the power 4. Hence multicollinearity may occur. One way to circumvent this problem is to rely on orthogonal polynomials, instead of standard polynomials, in the interpolating function (a small sketch follows below).
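As a small sketch of this remedy, one can build nearly orthogonal regressors from a simulated series x using the chebychev routine of Lecture Notes 7 (xmin and xmax denote the sample bounds):

z = 2*(x-xmin)/(xmax-xmin)-1;  % map the sample into [-1,1]
X = chebychev(z,4);            % regressors [T0 ... T4], nearly orthogonal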
A second problem that arises in this approach is the selection of initial conditions for θ. This step is crucial for at least 3 reasons: (i) the problem is fundamentally nonlinear, (ii) convergence is not always guaranteed, and (iii) economic theory imposes a set of restrictions, for example to insure the positivity of some variables. Therefore, much attention should be paid when choosing an initial value for θ.

A third important problem is related to the choice of γ, the smoothing parameter. Too large a value may put too much weight on new values of θ and therefore reinforce potential forces leading to divergence of the algorithm. On the contrary, setting γ too close to 0 may be costly in terms of CPU time.

It must however be noted that no general rule can be given for these implementation issues; in most cases, one has to guess and try. I shall therefore now report 3 examples of implementation: the first one is the standard optimal growth model, the second the optimal growth model with investment irreversibility, and the last the problem of a household facing borrowing constraints. But before going to the examples, we shall consider a linear example that highlights the similarity between this approach and the undetermined coefficients approach.
8.2 A linear example

Consider the simple model

y_t = a E_t y_{t+1} + b x_t

where x_{t+1} = (1 − ρ) x̄ + ρ x_t + ε_{t+1}, and parameterize the solution as y_t = θ₀ + θ₁ x_t. For a simulated sample of size N, the PEA fixed point solves

min_{θ₀, θ₁} Σ_{t=1}^N (θ₀ + θ₁ x_t − a y_{t+1} − b x_t)²    (8.4)

with y_{t+1} = θ₀ + θ₁ x_{t+1}. The associated normal equations are

(1/N) Σ_{t=1}^N (θ₀ + θ₁ x_t − a y_{t+1} − b x_t) = 0
(1/N) Σ_{t=1}^N x_t (θ₀ + θ₁ x_t − a y_{t+1} − b x_t) = 0    (8.5)

The first equation rewrites

θ₀ + θ₁ (1/N) Σ_{t=1}^N x_t = a (1/N) Σ_{t=1}^N (θ₀ + θ₁ x_{t+1}) + b (1/N) Σ_{t=1}^N x_t

Asymptotically, we have

lim_{N→∞} (1/N) Σ_{t=1}^N x_t = lim_{N→∞} (1/N) Σ_{t=1}^N x_{t+1} = x̄    (8.6)

such that the first normal equation converges to

θ₀ + θ₁ x̄ = a (θ₀ + θ₁ x̄) + b x̄

The second normal equation likewise rewrites

θ₀ (1/N) Σ x_t + θ₁ (1/N) Σ x_t² = a (1/N) Σ x_t (θ₀ + θ₁ x_{t+1}) + b (1/N) Σ x_t²    (8.7)

Asymptotically, we have

lim_{N→∞} (1/N) Σ x_t = x̄  and  lim_{N→∞} (1/N) Σ x_t² = E(x²) = σ_x² + x̄²

and finally, since lim_{N→∞} (1/N) Σ x_t ε_{t+1} = 0,

lim_{N→∞} (1/N) Σ x_t x_{t+1} = lim_{N→∞} (1/N) Σ x_t ((1 − ρ) x̄ + ρ x_t + ε_{t+1}) = (1 − ρ) x̄² + ρ E(x²) = x̄² + ρ σ_x²

Solving the two asymptotic normal equations for θ₀ and θ₁, we obtain

θ₁ = b/(1 − aρ)  and  θ₀ = a b (1 − ρ) x̄ / ((1 − a)(1 − aρ))

such that the rule is

y_t = a b (1 − ρ) x̄ / ((1 − a)(1 − aρ)) + (b/(1 − aρ)) x_t

which corresponds exactly to the solution of the model (see Lecture Notes #1). Therefore, asymptotically, the PEA algorithm is nothing else but an undetermined coefficients method. A small numerical check is sketched below.
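The following sketch checks this numerically for illustrative parameter values (a, b, rho, xbar and se below are hypothetical, not from the notes):

a = 0.9; b = 1; rho = 0.8; xbar = 1; se = 0.1;
N = 20000; e = se*randn(N,1);
x = xbar*ones(N,1);
for t=1:N-1; x(t+1) = (1-rho)*xbar+rho*x(t)+e(t+1); end
theta = [0;0]; crit = 1;
while crit>1e-8
  y     = theta(1)+theta(2)*x;       % rule implied by the current theta
  X     = [ones(N-1,1) x(1:N-1)];
  thnew = X\(a*y(2:N)+b*x(1:N-1));   % regression step, cf. (8.5)
  crit  = max(abs(thnew-theta));
  theta = thnew;
end
% theta approaches [a*b*(1-rho)*xbar/((1-a)*(1-a*rho)); b/(1-a*rho)]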
8.3 The standard optimal growth model

Let us first recall the type of problem we have in hand. We are about to solve the set of equations

λ_t − β E_t[ λ_{t+1} (α z_{t+1} k_{t+1}^{α−1} + 1 − δ) ] = 0
c_t^{−σ} − λ_t = 0
k_{t+1} − z_t k_t^α + c_t − (1 − δ) k_t = 0
log(z_{t+1}) − ρ log(z_t) − ε_{t+1} = 0
The expectation to be parameterized is therefore

β E_t[ λ_{t+1} (α z_{t+1} k_{t+1}^{α−1} + 1 − δ) ]

In this problem, we have 2 state variables, k_t and z_t, such that Φ(.) should be a function of both k_t and z_t. We will make the guess

Φ(k_t, z_t; θ) = exp( θ₀ + θ₁ log(k_t) + θ₂ log(z_t) + θ₃ log(k_t)² + θ₄ log(z_t)² + θ₅ log(k_t) log(z_t) )

From the first equation of the above system, we have that, for a given θ,

λ_t(θ) = Φ(k_t, z_t; θ)  and therefore  c_t(θ) = λ_t(θ)^{−1/σ}

and we get

k_{t+1}(θ) = z_t k_t(θ)^α − c_t(θ) + (1 − δ) k_t(θ)

We then recover whole sequences {k_t(θ)}_{t=0}^T, {z_t}_{t=0}^T, {λ_t(θ)}_{t=0}^T and {c_t(θ)}_{t=0}^T, which makes it simple to compute a sequence for

β λ_{t+1}(θ) (α z_{t+1} k_{t+1}(θ)^{α−1} + 1 − δ)    (8.8)

whose log is regressed on the polynomial terms entering Φ to get θ̂.
We then set a new value for θ according to the updating scheme (8.3), and restart the process until convergence.

The parameterization used in the matlab code is given in table 8.1 and is totally standard. γ, the smoothing parameter, was set to 1, implying that at each iteration the new θ vector is entirely passed as the new guess in the

Table 8.1: Benchmark parameterization

β = 0.95   α = 0.3   δ = 0.1   ρ = 0.9   σ_e = 0.01
progression of the algorithm. The stopping criterion was set at ε = 1e−6, and T = 20000 data points were used to compute the OLS regression.

Initial conditions were set as follows. We first solve the model relying on a loglinear approximation. We then generate a random draw of size T for ε and generate series using the loglinear approximate solution. We then build the regressors and run the regression (8.8) to get an initial condition for θ, reported in table 8.2. The algorithm converges after 22 iterations and delivers the final decision
Table 8.2: Decision rule

          θ0       θ1       θ2       θ3      θ4       θ5
Initial   0.5386  -0.7367  -0.2428   0.1091  0.2152  -0.2934
Final     0.5489  -0.7570  -0.3337   0.1191  0.1580  -0.1961
rule reported in table 8.2. When γ is set at 0.75, 31 iterations are needed; 46 for γ = 0.5 and 90 for γ = 0.25. It is worth noting that the final decision rule does differ from the initial conditions, but not by as large an amount as one would have expected, meaning that in this setup (and provided the approximation is good enough³) certainty equivalence and nonlinearities do not play such a great role. In fact, as illustrated in figure 8.1, the capital decision rule does not display that much nonlinearity. Although particularly simple to implement (see the following matlab code), this method should be handled with care, as it may be difficult to obtain convergence for some models. Nevertheless, it has another attractive feature: it can handle problems with occasionally binding constraints, as the next sections illustrate.

³ Note that for the moment we have not made any evaluation of the accuracy of the decision rule. We will undertake such an evaluation in the sequel.
[Figure 8.1: decision rule for next period capital k_{t+1} as a function of k_t]
2.9
=
=
=
=
=
=
=
=
20000;
500;
init+long;
init+1:slong-1;
init+2:slong;
1e-6;
1;
1;
sigma
delta
beta
alpha
ab
rho
se
param
ksy
yss
kss
= 1;
= 0.1;
= 0.95;
= 0.3;
= 0;
= 0.9;
= 0.01;
= [ab alpha beta delta rho se sigma long init];
=(alpha*beta)/(1-beta*(1-delta));
= ksy^(alpha/(1-alpha));
= yss^(1/alpha);
12
iss
css
csy
lss
=
=
=
=
delta*kss;
yss-iss;
css/yss;
css^(-sigma);
randn(state,1);
e
= se*randn(slong,1);
a
= zeros(slong,1);
a(1)
= ab+e(1);
for i
= 2:slong;
a(i) = rho*a(i-1)+(1-rho)*ab+e(i);
end
b0
%
% Main Loop
%
iter = 1;
while crit>tol;
  %
  % Simulated path
  %
  k    = zeros(slong+1,1);
  lb   = zeros(slong,1);
  X    = zeros(slong,length(b0));
  k(1) = kss;
  for i = 1:slong;
    X(i,:)= [1 log(k(i)) a(i) log(k(i))*log(k(i)) a(i)*a(i) log(k(i))*a(i)];
    lb(i) = exp(X(i,:)*b0);
    k(i+1)= exp(a(i))*k(i)^alpha+(1-delta)*k(i)-lb(i)^(-1/sigma);
  end
  y    = beta*lb(T1).*(alpha*exp(a(T1)).*k(T1).^(alpha-1)+1-delta);
  bt   = X(T,:)\log(y);
  b    = gam*bt+(1-gam)*b0;
  crit = max(abs(b-b0));
  b0   = b;
  disp(sprintf('Iteration: %d\tConv. crit.: %g',iter,crit))
  iter = iter+1;
end;
8.4 The optimal growth model with irreversible investment
We now consider a variation of the previous model in which gross investment is restricted to be positive in each and every period:

i_t ≥ 0 ⟺ k_{t+1} ≥ (1 − δ) k_t    (8.9)

This amounts to assuming that there does not exist a second hand market for capital. In such a case, the problem of the central planner is to determine consumption and capital accumulation such that utility is maximized:

max E₀ Σ_{t=0}^∞ β^t (c_t^{1−σ} − 1)/(1 − σ)

s.t.

k_{t+1} = z_t k_t^α − c_t + (1 − δ) k_t

and

k_{t+1} ≥ (1 − δ) k_t

Forming the Lagrangean associated with the previous problem, we have

L_t = E_t Σ_{τ=0}^∞ β^τ [ (c_{t+τ}^{1−σ} − 1)/(1 − σ) + λ_{t+τ}( z_{t+τ} k_{t+τ}^α − c_{t+τ} + (1 − δ) k_{t+τ} − k_{t+τ+1} ) + μ_{t+τ}( k_{t+τ+1} − (1 − δ) k_{t+τ} ) ]    (8.10)

and the first order conditions write

λ_t − μ_t = β E_t[ λ_{t+1} (α z_{t+1} k_{t+1}^{α−1} + 1 − δ) − μ_{t+1} (1 − δ) ],  with λ_t = c_t^{−σ}    (8.11)
k_{t+1} = z_t k_t^α − c_t + (1 − δ) k_t    (8.12)
μ_t (k_{t+1} − (1 − δ) k_t) = 0    (8.13)
The main difference with the previous example is that the central planner now faces a constraint that may be binding in each and every period. This complicates the algorithm a little bit, as we now have to find a rule for the expectation function

E_t[φ_{t+1}]

where

φ_{t+1} ≡ λ_{t+1} (α z_{t+1} k_{t+1}^{α−1} + 1 − δ) − μ_{t+1} (1 − δ)

as well as for the multiplier μ_t. We proceed as follows:

1. Compute the sequences from (8.11) and (8.12) under the assumption that the constraint is not binding, that is μ_t(θ) = 0. In such a case, we just compute the sequences as in the standard optimal growth model.

2. Test whether, under this assumption, i_t(θ) > 0. If it is the case, then set μ_t(θ) = 0; otherwise set k_{t+1}(θ) = (1 − δ) k_t(θ), compute c_t(θ) from the resource constraint, and recover μ_t(θ) from (8.11).

Note that, using this procedure, μ_t is just treated as an additional variable used to compute the sequences needed to solve the model. We therefore do not need to compute its interpolating function explicitly; as far as φ_{t+1} is concerned, we use the same interpolating function as in the previous example and therefore run a regression of the type

log(φ_{t+1}(θ)) = θ₀ + θ₁ log(k_t(θ)) + θ₂ log(z_t) + θ₃ log(k_t(θ))² + θ₄ log(z_t)² + θ₅ log(k_t(θ)) log(z_t)    (8.14)

to get θ̂.
The matlab code is essentially the same as the one we used for the optimal growth model. The shock was artificially assigned a lower persistence and a greater volatility in order to increase the probability of the constraint binding, and therefore to illustrate the potential of this approach. γ, the smoothing parameter, was set to 1. The stopping criterion was set at ε = 1e−6 and T = 20000 data points were used to compute the OLS regression.

Table 8.3: Parameterization

β = 0.95   α = 0.3   δ = 0.1   ρ = 0.8   σ_e = 0.14
Initial conditions were obtained as in the previous example: we generated sequences {φ_{t+1}(θ)}_{t=0}^T, {k_t(θ)}_{t=0}^T and {z_t}_{t=0}^T and ran the regression (8.14) to get an initial condition for θ, reported in table 8.4. The algorithm converges after 115 iterations and delivers the final decision rule reported in table 8.4. Contrary
Table 8.4: Decision rule

          θ0       θ1       θ2       θ3       θ4       θ5
Initial   0.4620  -0.5760  -0.3909   0.0257   0.0307  -0.0524
Final     0.3558  -0.3289  -0.7182  -0.1201  -0.2168   0.3126
to the standard optimal growth model, the initial and final rules differ substantially: the coefficient on the capital stock in the final rule is half that of the initial rule, the coefficient on the shock is twice as large, and the signs of all the quadratic terms are reversed. This should not be surprising, as the initial rule is computed under (i) the certainty equivalence hypothesis and (ii) the assumption that the constraint never binds, whereas the size of the shocks we introduce in the model implies that the constraint binds in 2.8% of the cases. The latter quantity may seem rather small, but it is sufficient to dramatically alter the decisions of the central planner acting under rational expectations. This is illustrated by figures 8.2 and 8.3, which respectively report the decision rules for investment, capital and the Lagrange multiplier, and a typical path for investment and the Lagrange multiplier. As reflected in
[Figure 8.2: decision rules and simulated distribution; panels: investment, distribution of investment, capital stock and Lagrange multiplier, each against k_t]

[Figure 8.3: typical simulated paths of investment and the Lagrange multiplier against time]
the upper right panel of figure 8.2, which reports the simulated distribution of investment, the distribution is highly skewed and exhibits a mode at i_t = 0, revealing the fact that the constraint occasionally binds. This is also illustrated in the lower left panel, which reports the decision rule for the capital stock. As can be seen from this graph, the decision rule is bounded from below by the line (1 − δ)k_t (the grey line on the graph); such situations correspond to cases where the Lagrange multiplier is positive, as reported in the lower right panel of the figure.
Matlab Code: PEA Algorithm (Irreversible Investment)

clear all
long   = 20000;
init   = 500;
slong  = init+long;
T      = init+1:slong-1;
T1     = init+2:slong;
tol    = 1e-6;
crit   = 1;
gam    = 1;
sigma  = 1;
delta  = 0.1;
beta   = 0.95;
alpha  = 0.3;
ab     = 0;
rho    = 0.8;
se     = 0.125;
kss    = ((1-beta*(1-delta))/(alpha*beta))^(1/(alpha-1));
css    = kss^alpha-delta*kss;
lss    = css^(-sigma);
ysk    = (1-beta*(1-delta))/(alpha*beta);
csy    = 1-delta/ysk;
%
% Simulation of the shock
%
randn('state',1);
e      = se*randn(slong,1);
a      = zeros(slong,1);
a(1)   = ab+e(1);
for i  = 2:slong;
  a(i) = rho*a(i-1)+(1-rho)*ab+e(i);
end
%
% Initial guess
%
param  = [ab alpha beta delta rho se sigma long init];
b0     = peaoginit(e,param);
%
% Main Loop
%
iter   = 1;
while crit>tol;
  %
  % Simulated path
  %
  k    = zeros(slong+1,1);
  lb   = zeros(slong,1);
  mu   = zeros(slong,1);
  X    = zeros(slong,length(b0));
  k(1) = kss;
  for i = 1:slong;
    X(i,:)= [1 log(k(i)) a(i) log(k(i))*log(k(i)) a(i)*a(i) log(k(i))*a(i)];
    lb(i) = exp(X(i,:)*b0);
    iv    = exp(a(i))*k(i)^alpha-lb(i)^(-1/sigma);
    if iv>0;
      k(i+1) = (1-delta)*k(i)+iv;
      mu(i)  = 0;
    else
      k(i+1) = (1-delta)*k(i);
      c      = exp(a(i))*k(i)^alpha;
      mu(i)  = c^(-sigma)-lb(i);
    end
  end
  y    = beta*(lb(T1).*(alpha*exp(a(T1)).*k(T1).^(alpha-1)+1-delta) ...
         -mu(T1)*(1-delta));
  bt   = X(T,:)\log(y);
  b    = gam*bt+(1-gam)*b0;
  crit = max(abs(b-b0));
  b0   = b;
  disp(sprintf('Iteration: %d\tConv. crit.: %g',iter,crit))
  iter = iter+1;
end;
19
8.5 Borrowing constraints

As a last example, consider the problem of a household that maximizes its
expected lifetime utility subject to a borrowing constraint:

max E_t Σ_{τ=0}^{∞} β^τ u(c_{t+τ})

s.t.

a_{t+1} = (1+r)a_t + ω_t - c_t
a_{t+1} ⩾ \bar{a}
log(ω_{t+1}) = ρ log(ω_t) + (1-ρ) log(\bar{ω}) + ε_{t+1}

Let us first recall the first order conditions that are associated with this problem:

c_t^{-σ} = λ_t                                        (8.15)
λ_t = μ_t + β(1+r) E_t λ_{t+1}                        (8.16)
a_{t+1} = (1+r)a_t + ω_t - c_t                        (8.17)
log(ω_{t+1}) = ρ log(ω_t) + (1-ρ) log(\bar{ω}) + ε_{t+1}   (8.18)
μ_t (a_{t+1} - \bar{a}) = 0                           (8.19)
μ_t ⩾ 0                                               (8.20)
In order to solve this model, we have to find a rule for both the expectation
function

E_t[φ_{t+1}]   where   φ_{t+1} ≡ β R λ_{t+1}

and for μ_t. We propose to follow the same procedure as the previous one:

1. Compute two sequences for {λ_t(θ)}_{t=0}^T and {a_t(θ)}_{t=0}^T from (8.16)
   and (8.17) under the assumption that the constraint is not binding, that
   is μ_t(θ) = 0.

2. Test whether, under this assumption, a_{t+1}(θ) > \bar{a}. If it is the case, then
   set μ_t(θ) = 0; otherwise set a_{t+1}(θ) = \bar{a}, compute c_t(θ) from the
   resource constraint, and recover μ_t(θ) from (8.16), as sketched below.
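To make step 2 concrete, here is a minimal sketch of one date of the simulated path, written with the same variable names as the full code below (lb(i) = exp(X(i,:)*b0) approximates β R E_t λ_{t+1}, and ab stands for \bar{a}):

c1 = lb(i)^(-1/sigma);           % consumption if the constraint is ignored
a1 = R*a(i)+w(i)-c1;             % implied next period wealth
if a1 > ab;                      % constraint not binding
   a(i+1) = a1; c(i) = c1; mu(i) = 0;
else                             % binding: wealth is set to the bound
   a(i+1) = ab;
   c(i)   = R*a(i)+w(i)-ab;      % consumption from the budget constraint
   mu(i)  = c(i)^(-sigma)-lb(i); % multiplier recovered from (8.16)
end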
Note that, using this procedure, μ_t is just treated as an additional variable
which is used to compute a sequence to solve the model. We therefore do
not need to compute its interpolating function explicitly. As far as φ_{t+1} is
concerned, we use the same interpolating function as in the previous example
and therefore run a regression of the type

log(φ_{t+1}(θ)) = θ_0 + θ_1 a_t(θ) + θ_2 ω_t + θ_3 a_t(θ)^2 + θ_4 ω_t^2 + θ_5 a_t(θ)ω_t   (8.21)

to get \hat{θ}.
Table 8.5: Parameters

β       σ       ρ       σe      R
0.95    1.5     0.7     0.1     1.04

The smoothing parameter γ was set to 1. The stopping criterion was set at
ε = 1e-6 and T = 20000 data points were used to compute the OLS regression.
One key issue in this particular problem relates to the initial conditions. Indeed,
it is extremely difficult to find a good initial guess, as the only model for which
we might get an analytical solution while being related to the present model
is the standard permanent income model. Unfortunately, this model exhibits a
nonstationary behavior, in the sense that it generates an I(1) process for the
level of individual wealth and consumption, and therefore for the marginal
utility of wealth. We therefore have to take another route. We propose to start
from a simple rule of thumb of the form c_t = \tilde{r} a_t + ω_t + ε_t, with
ε_t ~ N(0, σ_c^2) (the code below uses \tilde{r} = 0.2 and σ_c = 0.1). We then
compute a_{t+1} from the law of motion of wealth. This delivers a sequence for
both a_t and c_t, and therefore for λ_t. We can then easily recover φ_{t+1} and
an initial θ from the regression (8.21) (see table 8.6).
Table 8.6: Decision rule

        Initial     Final
θ0       1.6740     1.5046
θ1      -0.6324    -0.5719
θ2      -2.1918    -2.1792
θ3       0.0133     0.0458
θ4       0.5438     0.7020
θ5       0.2971     0.3159
The algorithm converges after 79 iterations and delivers the final decision rule
reported in table 8.6. Note that although the final decision rule differs from
the initial one, the difference is not huge, meaning that our initialization
procedure is relevant. Figure 8.4 reports the decision rule of consumption in
terms of cash-on-hand, that is, the effective amount a household may use to
purchase goods (R a_t + ω_t - \bar{a}). Figure 8.5 reports the decision rule
for wealth accumulation as well as the implied distribution of wealth, which
exhibits a mode at \bar{a}, revealing that the constraint effectively binds
(in 13.7% of the cases).
[Figure 8.4: Consumption in terms of cash-on-hand (R a_t + ω_t - \bar{a})]

[Figure 8.5: Wealth accumulation and the distribution of wealth]

Matlab Code: PEA Algorithm (Borrowing Constraints)

clear
crit  = 1;
tol   = 1e-6;
gam   = 1;
long  = 20000;
init  = 500;
slong = long+init;
T     = init+1:slong-1;
T1    = init+2:slong;
rw    = 0.7;
sw    = 0.1;
wb    = 0;
beta  = 0.95;
R     = 1/(beta+0.01);
sigma = 1.5;
ab    = 0;
randn('state',1);
e     = sw*randn(slong,1);
w     = zeros(slong,1);
w(1)  = wb+e(1);
for i = 2:slong;
   w(i) = rw*w(i-1)+(1-rw)*wb+e(i);
end
w = exp(w);
%
% Initial guess from a simple rule of thumb c = rt*a + w + ec
%
a    = zeros(slong,1);
c    = zeros(slong,1);
lb   = zeros(slong,1);
X    = zeros(slong,6);
a(1) = 0;    % initial wealth (the original listing uses ass, which is not defined there)
rt   = 0.2;
sc   = 0.1;
randn('state',1234567890);
ec   = sc*randn(slong,1);
for i=1:slong;
   X(i,:) = [1 a(i) w(i) a(i)*a(i) w(i)*w(i) a(i)*w(i)];
   c(i)   = rt*a(i)+w(i)+ec(i);
   a1     = R*a(i)+w(i)-c(i);
   if a1>ab;
      a(i+1) = a1;
   else
      a(i+1) = ab;
      c(i)   = R*a(i)+w(i)-ab;
   end
end
lb = c.^(-sigma);
y  = log(beta*R*lb(T1));
b0 = X(T,:)\y;
%
% Main loop
%
iter = 1;
while crit>tol;
   a    = zeros(slong,1);
   c    = zeros(slong,1);
   lb   = zeros(slong,1);
   X    = zeros(slong,length(b0));
   a(1) = 0;
   for i=1:slong;
      X(i,:)= [1 a(i) w(i) a(i)*a(i) w(i)*w(i) a(i)*w(i)];
      lb(i) = exp(X(i,:)*b0);
      a1    = R*a(i)+w(i)-lb(i)^(-1/sigma);
      if a1>ab;
         a(i+1) = a1;
         c(i)   = lb(i).^(-1./sigma);
      else
         a(i+1) = ab;
         c(i)   = R*a(i)+w(i)-ab;
         lb(i)  = c(i)^(-sigma);
      end
   end
   y    = log(beta*R*lb(T1));
   b    = X(T,:)\y;
   b    = gam*b+(1-gam)*b0;
   crit = max(abs(b-b0));
   b0   = b;
   disp(sprintf('Iteration: %d\tConv. crit.: %g',iter,crit))
   iter = iter+1;
end;
Bibliography

Marcet, A., Solving Nonlinear Stochastic Models by Parametrizing Expectations, mimeo, Carnegie-Mellon University, 1988.

Marcet, A. and D.A. Marshall, Solving Nonlinear Rational Expectations Models by Parametrized Expectations: Convergence to Stationary Solutions, manuscript, Universitat Pompeu Fabra, Barcelona, 1994.

Marcet, A. and G. Lorenzoni, The Parameterized Expectations Approach: Some Practical Issues, in M. Marimon and A. Scott, editors, Computational Methods for the Study of Dynamic Economies, Oxford: Oxford University Press, 1999.
Lecture Notes 9

Minimum Weighted Residual Methods

9.1

9.1.1

We consider models whose equilibrium conditions can be cast in the general form

E_t F(y_{t+1}, x_{t+1}, y_t, x_t, ε_{t+1}) = 0   (9.1)

where F : R^{n_y} × R^{n_x} × R^{n_y} × R^{n_x} × R^{n_e} → R^{n_y+n_x}
describes the model and ε_t is the set of innovations of the structural shocks
that hit the economy. The solution of the model is a set of decision rules for
the control variables

y_t = g(x_t; θ)

which define the next period state variables as

x_{t+1} ≡ h(x_t, y_t, ε_{t+1}) = h(x_t, g(x_t; θ), ε_{t+1})

such that the model can be rewritten as

E_t R(x_t, ε_{t+1}; g, θ) = 0   (9.2)
The idea of the minimum weighted residual method is to replace the true
decision rule by a parametric approximation function, Φ(x_t; θ), of the current
state variables x_t and a vector of parameters θ. Therefore, in its original
implementation the minimum weighted residual method differs from the PEA
algorithm in that we are not seeking directly an expectation function but a
decision rule. The problem is then to find a vector of parameters θ such that,
when the agents use the so-defined rule of thumb, E_t R(x_t, ε_{t+1}; Φ, θ) can
be made as small as possible. But what do we mean by "small"? In fact, we
want to find a vector of parameters \hat{θ} such that

‖ E_t R(x_t, ε_{t+1}; Φ, {θ_i}_{i=0}^n) ‖ = 0

which corresponds to the set of identifying restrictions

∫_X φ_i(x) E_t R(x, ε_{t+1}; Φ, θ) dx = 0,  i = 0, ..., n   (9.3)

where the φ_i(·) are weighting functions.
9.1.2 Implementation

Choosing an implementation of the minimum weighted residual method
basically amounts to making three choices:

1. the choice of a family of approximating functions,
2. the choice of the weighting functions,
3. the choice of the method used to approximate the integrals involved by
   (i) the rational expectation and (ii) the identifying restriction (9.3).
Choosing a family of approximating functions: we may use, for instance,
piecewise linear ("finite element") basis functions of the tent form

φ_i(x) = (x - x_{i-1})/(x_i - x_{i-1})   if x ∈ [x_{i-1}, x_i]
         (x_{i+1} - x)/(x_{i+1} - x_i)   if x ∈ [x_i, x_{i+1}]
         0                               otherwise

or orthogonal (e.g. Chebychev) polynomials. In either case the decision rule
is approximated by a function of the form

Φ(x_t; θ) = Σ_{i=0}^{p} θ_i φ_i(x_t)
The choice of the weighting functions is extremely important in that it defines
the method we use. Traditionally, we may define three broad classes of method,
depending on the weighting function:

1. The least square method sets

   φ_i(x) = ∂R(x_t, ε_{t+1}; g, θ)/∂θ_i

2. The collocation method sets

   φ_i(x) = 1 if x = x_i,  0 if x ≠ x_i

   where the x_i are given interpolation nodes.

3. The Galerkin method sets φ_i(x) equal to the basis functions of the
   approximation itself.

Finally, two integration problems have to be dealt with:
1. The problem of the rational expectation: in this case, everything depends
   on the form of the process for the shocks. If we use Markov chains, the
   integration problem is highly simplified as it only involves a discrete sum.
   If we use a continuous support for the shocks, the choice of an integration
   method is dictated by the type of distribution we are assuming. In most
   cases, we will use Gaussian shocks, such that we will rely on the
   Gauss-Hermite quadrature methods described in lecture notes #4.

2. The problem of the inner product (equation (9.3)): in this case everything
   depends on the approximation method. If we use a collocation method, no
   integration method is needed, as collocation amounts to imposing that the
   residuals are zero at each node, such that (9.3) reduces to

   E_t R(x_i, ε_{t+1}; g, {θ_j}_{j=0}^n) = 0  for i = 1, ..., n
When least square methods are used, it is often the case that a Legendre
quadrature will do the job. When a Galerkin method is selected, this will
often be the right choice too. However, when we use Chebychev polynomials,
a Chebychev quadrature is in order, provided each weighting function is
defined as

φ_i(x) = T_i(ψ(x)) / √(1 - ψ(x)^2)

where ψ(·) maps the state space into [-1, 1].
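To fix ideas before turning to the implementations, the following minimal, self-contained sketch (not part of the codes accompanying these notes) applies the collocation logic to a trivial residual, R(x; θ) = Φ(x; θ) - e^x, by imposing zero residuals at the Chebychev nodes:

n    = 5;                                  % order of approximation
i    = (1:n+1)';
z    = cos((2*i-1)*pi/(2*(n+1)));          % roots of T_{n+1}
xmin = 0; xmax = 1;
x    = xmin+(z+1)*(xmax-xmin)/2;           % nodes in [xmin,xmax]
T    = cos(acos(z)*(0:n));                 % T_j(z) at the nodes
theta= T\exp(x);                           % zero residuals at the nodes
% check the accuracy on a fine grid
xg   = linspace(xmin,xmax,201)';
zg   = 2*(xg-xmin)/(xmax-xmin)-1;
Phi  = cos(acos(zg)*(0:n))*theta;
disp(max(abs(Phi-exp(xg))))                % approximation error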
9.2 Practical implementation

In the sequel, we will essentially discuss the collocation and Galerkin
implementations of a spectral method using Chebychev polynomials, as they
seem to be the most popular implementations of this method. (The least
square method can be straightforwardly implemented by minimizing the
implicit objective function we discussed.)

9.2.1 Collocation

In this section, we present the collocation method in its simplest form. We
will start by presenting the general algorithm, when Chebychev polynomials
are used, and then present as an example the stochastic optimal growth model.
For the moment, let us assume that we want to solve a rational expectation
model that writes as

E_t R(x_t, ε_{t+1}; g, θ) = 0
and we want to find an approximating function for the decision rule g(x_t)
over the domain [\underline{x}, \bar{x}]. Assume that we take as approximating
function

Φ(x_t, θ) = Σ_{i=0}^{n} θ_i T_i(ψ(x_t))

where ψ(·) maps [\underline{x}, \bar{x}] into [-1, 1]. (We discuss the
one-dimensional case; the multidimensional case will be illustrated in an
example. See also lecture notes #3, which presented multidimensional
approximation techniques.) The algorithm then works as follows.

1. Choose an order of approximation n, compute the n+1 roots of the
   Chebychev polynomial of order n+1 as

   z_i = cos((2i-1)π / (2(n+1)))  for i = 1, ..., n+1

   and formulate an initial guess for θ.

2. Compute x_i as

   x_i = \underline{x} + (z_i + 1)(\bar{x} - \underline{x})/2  for i = 1, ..., n+1

3. Evaluate the residuals E_t R(x_i, ε_{t+1}; Φ, θ) at each node, using the
   approximation Φ(x_t, θ) = Σ_{i=0}^{n} θ_i T_i(ψ(x_t)), and update θ until
   all residuals are zero.
The updating scheme for θ will typically be given by a nonlinear solver of the
type described in lecture notes #5. The computation of the integral involved
by the rational expectation will depend on the process we assume for the
shocks. In order to better understand it, let us discuss the implementation of
the collocation method on the stochastic optimal growth model, both in the
case of a Markov chain and in the case of an AR(1) process.
The stochastic OGM (Markov chain): We assume that the technology shock
can take two values, a ∈ {\underline{a}, \bar{a}}, and follows a Markov chain
with transition matrix

Π = ( p     1-p )
    ( 1-p   p   )
The Euler equation is given by

c_t^{-σ} - β E_t[ c_{t+1}^{-σ} (α exp(a_{t+1}) k_{t+1}^{α-1} + 1 - δ) ] = 0

which, when the current state is \underline{a}, rewrites

c_t(\underline{a})^{-σ} - β [ p c_{t+1}(\underline{a})^{-σ} (α exp(\underline{a}) k_{t+1}(\underline{a})^{α-1} + 1 - δ)
    + (1-p) c_{t+1}(\bar{a})^{-σ} (α exp(\bar{a}) k_{t+1}(\underline{a})^{α-1} + 1 - δ) ] = 0

and symmetrically when the current state is \bar{a}, where the notation
c_t(\underline{a}) (resp. c_t(\bar{a})) stands for the fact that the consumption
decision is contingent on the realization of the shock, \underline{a} (resp.
\bar{a}), and likewise for k_{t+1}(\underline{a}) (resp. k_{t+1}(\bar{a})).
It therefore appears that we actually have only two decision rules to compute,
one for c_t(\underline{a}) and one for c_t(\bar{a}), each taking the form

c_t(\underline{a}) ≈ Φ(k_t, θ(\underline{a})) = exp( Σ_{j=0}^{n} θ_j(\underline{a}) T_j(ψ(log(k_t))) )   (9.4)

c_t(\bar{a}) ≈ Φ(k_t, θ(\bar{a})) = exp( Σ_{j=0}^{n} θ_j(\bar{a}) T_j(ψ(log(k_t))) )   (9.5)
where the notation θ(a) accounts for the fact that the form of the decision
rule may differ depending on the realization of the shock, a. We may also
select different orders of approximation for the two rules, but in order to keep
things simple we impose the same order. The algorithm then works as follows.

1. Choose an order of approximation n, compute the n+1 roots of the
   Chebychev polynomial of order n+1 as

   z_i = cos((2i-1)π / (2(n+1)))  for i = 1, ..., n+1

2. Compute the nodes for the capital stock as

   k_i = exp( log(\underline{k}) + (z_i + 1)(log(\bar{k}) - log(\underline{k}))/2 )  for i = 1, ..., n+1

   to map [-1, 1] into [\underline{k}, \bar{k}].
3. Compute consumption at each node and in each state:

   c_t(\underline{a}) ≈ Φ(k_i, θ(\underline{a})) = exp( Σ_{j=0}^{n} θ_j(\underline{a}) T_j(ψ(log(k_i))) )

   c_t(\bar{a}) ≈ Φ(k_i, θ(\bar{a})) = exp( Σ_{j=0}^{n} θ_j(\bar{a}) T_j(ψ(log(k_i))) )

4. Compute the next period capital stock implied by the resource constraint,

   k_{t+1}(k_i, a) = exp(a) k_i^α + (1-δ)k_i - c_t(a)  for a ∈ {\underline{a}, \bar{a}}

5. Compute next period consumption for each possible future realization of the
   shock:

   a_{t+1} = \underline{a}:  c_{t+1} = Φ(k_{t+1}(k_i, a), θ(\underline{a})) = exp( Σ_{j=0}^{n} θ_j(\underline{a}) T_j(ψ(log(k_{t+1}(k_i, a)))) )

   a_{t+1} = \bar{a}:  c_{t+1} = Φ(k_{t+1}(k_i, a), θ(\bar{a})) = exp( Σ_{j=0}^{n} θ_j(\bar{a}) T_j(ψ(log(k_{t+1}(k_i, a)))) )

   for a ∈ {\underline{a}, \bar{a}}, and evaluate the residuals of the two Euler
   equations at each node, for all i = 1, ..., n+1.

6. If all residuals are close enough to zero then stop, else update θ and go
   back to 3.
From a practical point of view, the last step is performed using a Newton
algorithm. Initial conditions can be obtained from a linear approximation of
the model (see the matlab codes in directory growth/collocmc).
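For instance, if a linearization step delivers a log-linear consumption rule log(c_t) ≈ c_0 + c_k log(k_t), a natural way to turn it into an initial θ is to project that rule on the Chebychev basis at the nodes. A minimal sketch, using the helper functions that appear in the codes of these notes (rcheb, itransfo, cheb) and hypothetical coefficients c0 and ck:

c0   = -0.2; ck = 0.65;             % hypothetical log-linear coefficients
nbk  = 4;
kmin = log(0.1); ksup = log(6);
rk   = rcheb(nbk+1);                % Chebychev roots
kt   = exp(itransfo(rk,kmin,ksup)); % capital grid
XX   = cheb(rk,[0:nbk]);            % matrix of polynomials at the nodes
th0  = XX\(c0+ck*log(kt));          % projection of the linear rule on the basis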
Matlab Code: Collocation Method (OGM, Main Code)

clear all
global kmin ksup XX kt;
global nstate nbk ncoef XX XT PI;
nbk    = 4;       % Degree of polynomials
nodes  = nbk+1;   % # of Nodes
nstate = 2;       % # of possible states for technology shock
ncoef  = nbk+1;   % # of coefficients
delta  = 0.1;     % depreciation rate
beta   = 0.95;    % discount factor
alpha  = 0.3;     % capital elasticity
sigma  = 1.5;     % parameter of utility
%
% Steady state
%
ysk = (1-beta*(1-delta))/(alpha*beta);
ksy = 1/ysk;
ys  = ksy^(alpha/(1-alpha));
ks  = ys^(1/alpha);
is  = delta*ks;
cs  = ys-is;
%
% Markov Chain: technology shock (Tauchen-Hussey)
%
rho = 0.8;
se  = 0.2;
ma  = 0;
[agrid,wmat] = hernodes(nstate);
agrid = agrid*sqrt(2)*se;
PI    = transprob(agrid,wmat,0,rho,se);
at    = agrid+ma;
%
% grid for the capital stock
%
kmin = log(0.1);
ksup = log(6);
rk   = rcheb(nodes);                 % roots
kt   = exp(itransfo(rk,kmin,ksup)); % grid
XX   = cheb(rk,[0:nbk]);            % Polynomials
%
% Initial Conditions
%
a0 = repmat([-0.2 0.65 0.04 0 0],nstate,1);
a0 = a0(:);
%
% Main loop
%
param = [alpha beta delta sigma];
th    = fcsolve('residuals',a0,[],param,at);
th    = reshape(th,ncoef,nstate);
Matlab Code: Residuals Function (OGM, Collocation)

function res=residuals(theta,param,at);
global kmin ksup XX kt;
global nstate nbk ncoef PI;
alpha = param(1);
beta  = param(2);
delta = param(3);
sigma = param(4);
lt    = length(theta);
theta = reshape(theta,lt/nstate,nstate);
RHS=[];
LHS=[];
for i=1:nstate;
   ct  = exp(XX*theta(:,i));                    % C(t)
   k1  = exp(at(i))*kt.^alpha+(1-delta)*kt-ct;  % k(t+1)
   rk1 = transfo(log(k1),kmin,ksup);            % k(t+1) in [kmin,ksup]
   xk1 = cheb(rk1,[0:nbk]);                     % polynomials
   %
   % Euler equation for all states
   %
   aux = 0;
   for j=1:nstate;
      c1    = exp(xk1*theta(:,j));              % c(t+1)
      % Cumulates the expectation term of the Euler equation over states
      resid = beta*(alpha*exp(at(j))*k1.^(alpha-1)+1-delta).*c1.^(-sigma);
      aux   = aux+PI(i,j)*resid;
   end;
   RHS = [RHS -sigma*log(ct)];
   LHS = [LHS log(aux)];
end;
res = LHS-RHS;
res = res(:);
For the parameterization we use in the codes, and using 4th order Chebychev
polynomials, we obtain the decision rules reported in table 9.1.

Table 9.1: Decision rules (OGM, Collocation)

        a1         a2
θ0     -0.2807    -0.4876
θ1      0.7407     0.8406
θ2      0.0521     0.0546
θ3      0.0034     0.0009
θ4     -0.0006    -0.0006

Figure (9.1) reports the decision rules of consumption and next period capital
stock. As can be seen, they are (hopefully) alike those obtained from the value
iteration method (see lecture notes #7).
[Figure 9.1: Decision rules (OGM, Collocation): consumption and next period capital stock as functions of k_t]
The AR(1) process: In this case, we assume that the technology shock is
modelled as

a_{t+1} = ρ a_t + ε_{t+1}

where ε_{t+1} ~ N(0, σ^2). The Euler equation is given by

c_t^{-σ} - β E_t[ c_{t+1}^{-σ} (α exp(a_{t+1}) k_{t+1}^{α-1} + 1 - δ) ] = 0

Making the expectation explicit, the Euler equation takes the form

c_t^{-σ} - β (1/√(2πσ^2)) ∫ c_{t+1}^{-σ} (α exp(ρ a_t + ε_{t+1}) k_{t+1}^{α-1} + 1 - δ) exp(-ε_{t+1}^2 / (2σ^2)) dε_{t+1} = 0

together with the law of motion of capital

k_{t+1} - exp(a_t) k_t^α + c_t - (1-δ)k_t = 0

Applying the change of variable z = ε_{t+1}/(√2 σ), the Euler equation rewrites

c_t^{-σ} - β (1/√π) ∫ c_{t+1}^{-σ} (α exp(ρ a_t + √2 σ z) k_{t+1}^{α-1} + 1 - δ) exp(-z^2) dz = 0

which makes more explicit the need for a Gauss-Hermite quadrature. We then
just have to compute the nodes and weights for the quadrature, such that the
Euler equation rewrites

c_t^{-σ} - β (1/√π) Σ_{j=1}^{q} ω_j c_{t+1}^{-σ} (α exp(ρ a_t + √2 σ z_j) k_{t+1}^{α-1} + 1 - δ) = 0
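As an aside, the nodes z_j and weights ω_j can be obtained from the eigenvalues of the Jacobi matrix associated with the Hermite polynomials (the Golub-Welsch method). The following self-contained sketch (not part of the codes of these notes) builds them and checks the quadrature against the closed form E[exp(ρ a_t + ε)] = exp(ρ a_t + σ^2/2):

q = 8;                                     % number of quadrature nodes
j = (1:q-1)';
J = diag(sqrt(j/2),1)+diag(sqrt(j/2),-1);  % Jacobi matrix for Hermite polynomials
[V,D]   = eig(J);
[z,idx] = sort(diag(D));                   % nodes z_j
w = sqrt(pi)*V(1,idx)'.^2;                 % weights omega_j
rho = 0.8; sigma = 0.2; a = 0.1;
approx = sum(w.*exp(rho*a+sqrt(2)*sigma*z))/sqrt(pi);
exact  = exp(rho*a+sigma^2/2);
disp([approx exact])                       % the two numbers should coincide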
We now have to formulate a guess for the decision rule. Note that since we
have not specified particular values for the technology shock, we can introduce
it into the approximating rule, such that we will take

c_t ≈ Φ(k_t, a_t, θ) = exp( Σ_{j_k=0}^{n_k} Σ_{j_a=0}^{n_a} θ_{j_k j_a} T_{j_k}(ψ_k(log(k_t))) T_{j_a}(ψ_a(a_t)) )

Note that since we use collocation, the number of nodes should be equal to
the total number of coefficients. It may then be much easier to work with a
tensor basis rather than complete polynomials in this case (a small sketch of
the construction of such a basis is given below). The algorithm then works as
follows.
1. Choose orders of approximation n_k and n_a for each dimension, and
   compute the n_k+1 and n_a+1 roots of the Chebychev polynomials of
   order n_k+1 and n_a+1 as

   z_k^i = cos((2i-1)π / (2(n_k+1)))  for i = 1, ..., n_k+1
   z_a^i = cos((2i-1)π / (2(n_a+1)))  for i = 1, ..., n_a+1

2. Compute the nodes

   k_i = exp( log(\underline{k}) + (z_k^i + 1)(log(\bar{k}) - log(\underline{k}))/2 )  for i = 1, ..., n_k+1

   to map [-1, 1] into [\underline{k}, \bar{k}], and

   a_i = \underline{a} + (z_a^i + 1)(\bar{a} - \underline{a})/2  for i = 1, ..., n_a+1

   to map [-1, 1] into [\underline{a}, \bar{a}].
3. Compute consumption at each node pair (k_i, a_j) from

   Φ(k_i, a_j, θ) = exp( Σ_{j_k=0}^{n_k} Σ_{j_a=0}^{n_a} θ_{j_k j_a} T_{j_k}(ψ_k(log(k_i))) T_{j_a}(ψ_a(a_j)) )

4. Compute the next period capital stock k_{t+1}(k_i, a_j) from the resource
   constraint, and next period consumption Φ(k_{t+1}(k_i, a_j), ρ a_j + √2 σ z_ℓ, θ)
   at each quadrature node z_ℓ, for ℓ = 1, ..., q.

5. Evaluate the residuals of the Euler equation at each node pair,
   approximating the expectation by the Gauss-Hermite sum

   (1/√π) Σ_{ℓ=1}^{q} ω_ℓ Ξ(k_i, a_j, z_ℓ; θ)

   where Ξ(k_i, a_j, z_ℓ; θ) denotes the integrand evaluated at the quadrature
   node.

6. If all residuals are close enough to zero then stop, else update θ and go
   back to 3.
From a practical point of view, the last step is performed using a Newton
algorithm. Initial conditions can be obtained from a linear approximation of
the model (see the matlab codes in directory growth/collocg).
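Before turning to the code, here is a minimal sketch of how the tensor-product basis mentioned above can be assembled from the two one-dimensional bases, using the helper functions of these notes (rcheb and cheb):

nbk = 4; nba = 2;
zk  = rcheb(nbk+1);        % Chebychev roots for capital
za  = rcheb(nba+1);        % Chebychev roots for the shock
XK  = cheb(zk,[0:nbk]);    % (nbk+1) x (nbk+1) polynomials in k
XA  = cheb(za,[0:nba]);    % (nba+1) x (nba+1) polynomials in a
X   = kron(XA,XK);         % tensor basis: one row per node pair,
                           % one column per coefficient theta_{jk,ja}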
Matlab Code: Collocation Method (Stochastic OGM, Main Code)

clear all
global nbk kmin ksup XK kt;
global nba amin asup XA at;
global nstate nodea nodek wmat wgrid;
nbk    = 4;        % Degree of polynomials (capital)
nba    = 2;        % Degree of polynomials (technology shock)
nodek  = nbk+1;    % # of nodes (capital)
nodea  = nba+1;    % # of nodes (shock)
nstate = 12;       % # of quadrature nodes for the expectation
delta  = 0.1;      % depreciation rate
beta   = 0.95;     % discount factor
alpha  = 0.3;      % capital elasticity
sigma  = 1.5;      % parameter of utility
rho    = 0.8;      % persistence of AR(1)
se     = 0.2;      % standard deviation of innovation
ma     = 0;        % average of the process
%
% deterministic steady state
%
ysk = (1-beta*(1-delta))/(alpha*beta);
ksy = 1/ysk;
ys  = ksy^(alpha/(1-alpha));
ks  = ys^(1/alpha);
is  = delta*ks;
cs  = ys-is;
ab  = 0;
%
% grid for the technology shocks
%
[wgrid,wmat] = hernodes(nstate);    % weights and nodes for quadrature
wgrid = wgrid*sqrt(2)*se;
amin  = (ma+wgrid(nstate));
asup  = (ma+wgrid(1));
ra    = rcheb(nodea);               % roots
at    = itransfo(ra,amin,asup);     % grid
XA    = cheb(ra,[0:nba]);           % Polynomials
%
% grid for the capital stock
%
kmin = log(.1);
ksup = log(6);
rk   = rcheb(nodek);                % roots
kt   = exp(itransfo(rk,kmin,ksup)); % grid
XK   = cheb(rk,[0:nbk]);            % Polynomials
%
% Initial Conditions
%
a0 = [-0.23759592487257
       0.60814488103911
       0.03677400318790
       0.69025680170443
      -0.21654209984197
       0.00551243342828
       0.03499834613714
      -0.00341171507904
      -0.00449139656933
       0.00085302605779
       0.00285737302122
      -0.00002348542016
      -0.00011606672164
      -0.00003323351559
       0.00018045618825];
a0 = a0(:);
%
% main part!
%
param = [alpha beta delta sigma ma rho];
th    = fcsolve('residuals',a0,[],param);
For the parameterization we use in the codes, and using 4th order Chebychev
polynomials for the capital stock and 2nd order polynomials for the technology
shock, we obtain the decision rules reported in table 9.2.
Table 9.2: Decision rule (OGM, Collocation, AR(1) shock)

            T0(a_t)    T1(a_t)    T2(a_t)
T0(k_t)    -0.3609     0.6474     0.0338
T1(k_t)     0.7992    -0.2512     0.0103
T2(k_t)     0.0487    -0.0096    -0.0058
T3(k_t)     0.0019     0.0048    -0.0004
T4(k_t)    -0.0003     0.0001     0.0004
Figure (9.2) reports the decision rules of consumption and next period capital
stock.

[Figure 9.2: Decision rules (OGM, Collocation, AR(1)): consumption and next period capital stock as functions of k_t and a_t]

9.2.2 Galerkin

Let us now turn to the Galerkin implementation. As before, the model writes

E_t R(x_t, ε_{t+1}; g, θ) = 0
We therefore want to find an approximating function for the decision rule
g(x_t) over the domain [\underline{x}, \bar{x}]. Assume that we take as
approximating function

Φ(x_t, θ) = Σ_{i=0}^{n} θ_i T_i(ψ(x_t))

and, in accordance with the Galerkin method, take the weighting functions

φ_i(x) = T_i(ψ(x)) / √(1 - ψ(x)^2)

such that, once evaluated at m > n+1 nodes, the identifying restriction (9.3)
rewrites

T(ψ(x)) R(x, ε_{t+1}; Φ, θ) = 0

where

T(x) = ( T_0(ψ(x_1)) ... T_0(ψ(x_m)) )
       (    ...      ...     ...      )
       ( T_n(ψ(x_1)) ... T_n(ψ(x_m)) )

The algorithm is then as follows.

1. Choose an order of approximation n and a number of nodes m > n+1,
   compute the m roots of the Chebychev polynomial of order m as

   z_i = cos((2i-1)π / (2m))  for i = 1, ..., m

   and formulate an initial guess for θ.
2. Compute the matrix of Chebychev polynomials evaluated at these roots:

   T(z) = ( T_0(z_1) ... T_0(z_m) )
          (   ...    ...    ...   )
          ( T_n(z_1) ... T_n(z_m) )

3. Compute x_i as

   x_i = \underline{x} + (z_i + 1)(\bar{x} - \underline{x})/2  for i = 1, ..., m

4. Evaluate the residuals at each node using the approximation
   Φ(x_t, θ) = Σ_{i=0}^{n} θ_i T_i(ψ(x_t)), project them on the matrix of
   polynomials, and update θ until the projected residuals are zero.
The stochastic OGM (Markov chain): The setting is the same as in the
collocation case: the technology shock follows the two-state Markov chain with
transition matrix

Π = ( p     1-p )
    ( 1-p   p   )

the Euler equation is

c_t^{-σ} - β E_t[ c_{t+1}^{-σ} (α exp(a_{t+1}) k_{t+1}^{α-1} + 1 - δ) ] = 0

and the law of motion of capital is unchanged. For the parameterization used
above, we obtain the decision rule reported in table 9.3, which is very close to
the one obtained by collocation (compare with table 9.1).

Table 9.3: Decision rule (OGM, Galerkin, Markov chain)

        a1         a2
θ1      0.7407     0.8406
θ2      0.0521     0.0546
θ3      0.0034     0.0009
θ4     -0.0005    -0.0006
The AR(1) process: As in the case of a Markov chain, the matlab codes
associated with the Galerkin method are basically the same as the ones for
the collocation method, up to some minor differences. We use a greater
number of nodes (20 for capital and 10 for the shock), such that in the main
code the lines nodek=nbk+1; and nodea=nba+1; now read nodek=20; and
nodea=10;, and the residuals are projected on the Chebychev matrix, such
that the code for the residuals is slightly modified: the line res=LHS-RHS; is
replaced by res=XX'*(LHS-RHS);. Further, we use a complete basis rather
than a tensor basis (see the makepoly.m function).
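The dimensions involved in this projection step may be worth spelling out. A small sketch, where Rnod is a stand-in for one column of nodal Euler residuals and rcheb/cheb are the helper functions of these notes:

m = 20; n = 4;                % # of nodes and order of approximation
z    = rcheb(m);              % m Chebychev roots
XX   = cheb(z,[0:n]);         % m x (n+1) matrix of polynomials
Rnod = randn(m,1);            % stand-in for the nodal residuals
res  = XX'*Rnod;              % (n+1) Galerkin moment conditions: with
                              % m > n+1 nodes, projecting on the basis
                              % restores a square system in theta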
Table 9.4: Decision rule parameters

            T0(a)        T1(a)        T2(a)
T0(k)      -0.359970     0.647039     0.032769
T1(k)       0.799127    -0.252257     0.009890
T2(k)       0.048143    -0.009799    -0.005171
T3(k)       0.001764     0.004738
T4(k)      -0.000247
9.3

9.3.1

Let us first recall the type of problem we have in hand. We are about to solve
the set of equations

λ_t - β E_t[ λ_{t+1} (α exp(a_{t+1}) k_{t+1}^{α-1} + 1 - δ) ] = 0

c_t^{-σ} - λ_t = 0

and the object we now parameterize is the expectation function

E_t[ λ_{t+1} (α exp(a_{t+1}) k_{t+1}^{α-1} + 1 - δ) ]
In this problem, we will deal with a continuous AR(1) process (see directory
growth/peagalmc for the matlab code with a Markov chain), such that we have
two state variables, k_t and a_t, and Φ(·) should be a function of both k_t and
a_t. Like in the standard Galerkin procedure, we will use a guess of the form

Φ(k_t, a_t, θ) = exp( Σ_{j_k=0,...,n_k; j_a=0,...,n_a; j_a+j_k ⩽ max(n_k,n_a)} θ_{j_k j_a} T_{j_k}(ψ_k(log(k_t))) T_{j_a}(ψ_a(a_t)) )

The m_k and m_a nodes for the capital stock and the shock are obtained from
the roots

z_k^i = cos((2i-1)π / (2 m_k))  for i = 1, ..., m_k
z_a^i = cos((2i-1)π / (2 m_a))  for i = 1, ..., m_a

as

k_i = exp( log(\underline{k}) + (z_k^i + 1)(log(\bar{k}) - log(\underline{k}))/2 )  for i = 1, ..., m_k

to map [-1, 1] into [\underline{k}, \bar{k}], and

a_i = \underline{a} + (z_a^i + 1)(\bar{a} - \underline{a})/2  for i = 1, ..., m_a

to map [-1, 1] into [\underline{a}, \bar{a}]. At each node pair, the next period
capital stock k_{t+1}(k_i, a_j) is obtained from the resource constraint, and
next period consumption is computed from Φ(k_{t+1}(k_i, a_j), ρ a_j + √2 σ z_ℓ, θ)
at each quadrature node z_ℓ, ℓ = 1, ..., q, such that the expectation is
approximated by the corresponding Gauss-Hermite sum

(1/√π) Σ_{ℓ=1}^{q} ω_ℓ Ξ(k_i, a_j, z_ℓ; θ)

where Ξ denotes the integrand evaluated at the quadrature node.
At this point, it is not clear why the PEA modification to the minimum
weighted residual method could be interesting, as it does not differ much from
its standard implementation. However, it has the attractive feature of being
able to deal with binding constraints extremely easily. As an example of
implementation, we shall now study the stochastic optimal growth model with
an irreversibility in the investment decision.
9.3.2 Irreversible investment

We now consider a variation on the previous model, in the sense that we
restrict gross investment to be positive in each and every period:

i_t ⩾ 0 ⟺ k_{t+1} ⩾ (1-δ)k_t   (9.6)

This assumption amounts to assuming that there does not exist a second hand
market for capital. (You will also find in the matlab codes a version of the
problem of a consumer facing a borrowing constraint in the directory borrow.)
In such a case, the problem of the central planner is to determine consumption
and capital accumulation such that utility is maximal:

max E_0 Σ_{t=0}^{∞} β^t (c_t^{1-σ} - 1)/(1-σ)   (9.7)

subject to the resource constraint, the irreversibility constraint (9.6) and the
law of motion of the shock. The first order conditions imply

λ_t - μ_t = β E_t[ λ_{t+1} (α exp(a_{t+1}) k_{t+1}^{α-1} + 1 - δ) - μ_{t+1}(1-δ) ]   (9.8)

μ_t (k_{t+1} - (1-δ)k_t) = 0   (9.9)

μ_t ⩾ 0   (9.10)
The main difference with the previous example is that now the central planner
faces a constraint that may be binding in each and every period. This
complicates the algorithm a little bit, and we have to find a rule for the
expectation function

E_t[φ_{t+1}]   where   φ_{t+1} ≡ λ_{t+1} (α exp(a_{t+1}) k_{t+1}^{α-1} + 1 - δ) - μ_{t+1}(1-δ)

which we approximate, for each possible state a_ℓ, ℓ = 1, ..., n_a, by

E_t[φ_{t+1}] ≈ Φ(k_t, a_ℓ; θ) = exp( Σ_{j=0}^{n_k} θ_j(a_ℓ) T_j(ψ(log(k_t))) )

The algorithm is very close to the one we used in the case of the PEA and
closely follows ideas suggested in Marcet and Lorenzoni [1999] and Christiano
and Fisher [2000]. (You are left to study the matlab code located in
/irrev/peagalg for the continuous AR(1) case.) In the case of a Markov chain
and for the Galerkin procedure, it can be sketched as follows.
28
(2i 1)
2(n + 1)
for i = 1, . . . , mk
T0 (z1 ) . . . T0 (zm )
..
..
..
T (z) =
.
.
.
Tn (z1 ) . . . Tn (zm )
3. Compute ki as
log(k) log(k)
for i = 1, . . . , nk + 1
ki = exp log(k) + (zi + 1)
2
to map [-1;1] into [k; k].
4. Compute the approximating expectation function at each node ki , i =
1, . . . , mk , and for each possible state a` , ` = 1, . . . , na :
nk
X
(ki , a` ; ) = exp
(a` )Tj ((log(ki )))
j=0
t (ki , a` ; ) = 0
If it is binding then set
kt+1 (ki , a` ; ) = (1 )ki
t (ki , a` ; ) = ct (ki , a` ; )
t (ki , a` ; ) = t (ki , a` ; ) (ki , a` ; )
7. Then, for each state (ki , a` ), i = 1, . . . , nk and ` = 1, . . . , na , compute
the possible levels of future consumption without taking the constraint
into account
ct+1 (kt+1 (ki , a` ; ), a ; ) = (kt+1 (ki , a` ; ), a ; )1/
for = 1, . . . , na , the level of next period investment
it (kt+1 (ki , a` ; ), a ; ) = exp(a )kt+1 (ki , a` ; ) ct (kt+1 (ki , a` ; ), a ; )
8. Check whether the constraint is binding at each position
If the constraint is not binding then keep computed values for
ct+1 (kt+1 (ki , a` ; ), a ; ) and set
na
X
` (ki , a` , a ; )
=1
where
(ki , a` , a ; ) ct+1 (kt+1 (ki , a` ; ), a ; )
Matlab Code: PEA-Galerkin Method (Irreversible Investment, Main Code)

clear all
global nbk kmin ksup XX kt PI nstate;
nbk    = 10;       % Degree of polynomials
nodes  = 20;       % # of Nodes (assumed; not recoverable from the source)
nstate = 5;        % # of possible states for the shock
ncoef  = nbk+1;    % # of coefficients
delta  = 0.1;      % depreciation rate
beta   = 0.95;     % discount factor
alpha  = 0.3;      % capital elasticity
sigma  = 1.5;      % CRRA Utility
ysk    = (1-beta*(1-delta))/(alpha*beta);
ksy    = 1/ysk;
ys     = ksy^(alpha/(1-alpha));
ks     = ys^(1/alpha);
is     = delta*ks;
cs     = ys-is;
%
% Markov Chain: (Tauchen-Hussey 1991) technology shock
%
rho = 0.8;         % persistence
se  = 0.075;       % volatility
ma  = 0;
[agrid,wmat] = hernodes(nstate);
agrid = agrid*sqrt(2)*se;
PI    = transprob(agrid,wmat,0,rho,se);
at    = agrid+ma;
%
% grid for the capital stock
%
kmin = log(1);
ksup = log(7);
rk   = rcheb(nodes);            % roots
kt   = itransfo(rk,kmin,ksup);  % grid (in logs)
XX   = cheb(rk,[0:nbk]);        % Polynomials
%
% Initial Conditions (rows: T0..T10, columns: the 5 states;
% column ordering of the matrix is an assumption)
%
a0 = [-0.32518704 -0.22798091 -0.15021991 -0.12387075 -0.23608269
      -0.61535041 -0.64879442 -0.70031744 -0.84056683 -1.23998020
      -0.02892469 -0.03369096 -0.05570433 -0.15063263 -0.45685868
      -0.00565378 -0.00941495 -0.02618621 -0.09526710 -0.32031660
      -0.00296230 -0.00547789 -0.01692339 -0.06102182 -0.21319594
      -0.00145337 -0.00274142 -0.00988611 -0.03461623 -0.13635186
      -0.00065225 -0.00115070 -0.00536657 -0.01776343 -0.09027963
      -0.00025245 -0.00038302 -0.00291504 -0.00838555 -0.06431768
      -0.00004845 -0.00001360 -0.00162292 -0.00365588 -0.04689460
       0.00001150  0.00013398 -0.00088759 -0.00124601 -0.03294968
      -0.00003248  0.00013595 -0.00055706 -0.00013116 -0.02163670];
a0 = a0(:);
%
% Solves the problem effectively
%
param = [alpha beta delta sigma];
th    = fcsolve('residuals',a0,[],param,at);
th    = reshape(th,ncoef,nstate);
%
% Evaluate the approximation
%
lt = length(th);
nb = 100;
kt = [kmin:(ksup-kmin)/(nb-1):ksup]';
rk = transfo(kt,kmin,ksup);
XX = cheb(rk,[0:nbk]);
kt = exp(kt);
kp = zeros(nb,nstate);
it = zeros(nb,nstate);
ct = zeros(nb,nstate);
mu = zeros(nb,nstate);
for i=1:nstate;
   Ephi    = exp(XX*th(:,i));            % approximated expectation
   y       = exp(at(i))*kt.^alpha;       % output
   it(:,i) = max(y-Ephi.^(-1/sigma),0);  % investment (zero when binding)
   ct(:,i) = y-it(:,i);
   kp(:,i) = it(:,i)+(1-delta)*kt;
   mu(:,i) = ct(:,i).^(-sigma)-Ephi;     % Lagrange multiplier
end;
Matlab Code: Residuals Function (Irreversible Investment)

function res=residuals(theta,param,at);
global nbk kmin ksup XX kt PI nstate;
alpha = param(1);
beta  = param(2);
delta = param(3);
sigma = param(4);
lt    = length(theta);
theta = reshape(theta,lt/nstate,nstate);
RHS=[];LHS=[];
% lhs and rhs of Euler equation
for i=1:nstate;
   Ephi = exp(XX*theta(:,i));           % evaluate expectation
   y    = exp(at(i))*exp(kt).^alpha;    % Output
   iv   = max(y-Ephi.^(-1/sigma),0);    % takes care of the constraint
   k1   = (1-delta)*exp(kt)+iv;         % next period capital stock
   rk1  = transfo(log(k1),kmin,ksup);   % maps it into [-1;1]
   xk1  = cheb(rk1,[0:nbk]);            % Computes the polynomials
   aux  = 0;
   for j=1:nstate;
      Ephi1 = exp(xk1*theta(:,j));      % next period expectation (delivers consumption)
      y1    = exp(at(j))*k1.^alpha;     % next period output
      mu1   = max(y1.^(-sigma)-Ephi1,0);% mu>0 <=> lambda=(c=y)^(-sigma)
      tmp   = beta*((alpha*y1./k1+1-delta).*Ephi1-mu1*(1-delta));
      aux   = aux+PI(i,j)*tmp;
   end;
   RHS = [RHS aux];
   LHS = [LHS Ephi];
end;
res = XX'*(LHS-RHS);                    % Galerkin projection
res = res(:);
Table 9.5: Decision rule (Irreversible investment)

                 a1          a2          a3          a4          a5
T0(log(kt))     -0.271633   -0.198195   -0.131885   -0.076585   -0.044720
T1(log(kt))     -0.617972   -0.639585   -0.662475   -0.701972   -0.786584
T2(log(kt))     -0.020999   -0.021617   -0.024964   -0.042518   -0.094447
T3(log(kt))     -0.000673   -0.001298   -0.004093   -0.017748   -0.053895
T4(log(kt))     -0.000403   -0.000913   -0.002900   -0.012222   -0.032597
T5(log(kt))     -0.000251   -0.000670   -0.002002   -0.007493   -0.015666
T6(log(kt))     -0.000073   -0.000312   -0.001192   -0.004215   -0.005548
T7(log(kt))     -0.000031   -0.000103   -0.000519   -0.001893   -0.000922
T8(log(kt))     -0.000042   -0.000104   -0.000363   -0.000923    0.000202
T9(log(kt))     -0.000028   -0.000093   -0.000359   -0.000890   -0.000995
T10(log(kt))    -0.000008   -0.000032   -0.000182   -0.000557   -0.001279
Table 9.5 and Figure 9.3 report the approximate decision rules for an economy
where α = 0.3, β = 0.95, δ = 0.1, σ = 1.5, the persistence of the shock is
ρ = 0.8 and the volatility is σe = 0.075. As can be seen, only low values of the
shock make the constraint bind: the Lagrange multiplier becomes positive
precisely where unconstrained investment would turn negative. In that case,
the next period capital stock is given by what is left after depreciation, and
output is fully consumed.
[Figure 9.3: Decision rules (Irreversible investment, PEA-Galerkin); panels include consumption, investment and the Lagrange multiplier as functions of k]
Note that up to now, we have not addressed the important issue of the
evaluation of the rules we obtained using any of the methods presented so far.
This is the object of the next lecture notes.
Bibliography

Christiano, L.J. and J.D.M. Fisher, Algorithms for Solving Dynamic Models with Occasionally Binding Constraints, Journal of Economic Dynamics and Control, 2000, 24 (8), 1179-1232.

Judd, K., Projection Methods for Solving Aggregate Growth Models, Journal of Economic Theory, 1992, 58, 410-452.

Marcet, A. and G. Lorenzoni, The Parameterized Expectations Approach: Some Practical Issues, in M. Marimon and A. Scott, editors, Computational Methods for the Study of Dynamic Economies, Oxford: Oxford University Press, 1999.

McGrattan, E., Solving the Stochastic Growth Model with a Finite Element Method, Journal of Economic Dynamics and Control, 1996, 20, 19-42.

McGrattan, E., Application of Weighted Residual Methods to Dynamic Economic Models, in M. Marimon and A. Scott, editors, Computational Methods for the Study of Dynamic Economies, Oxford: Oxford University Press, 1999.