Vous êtes sur la page 1sur 431

Chapter 1

Expectations and Economic


Dynamics
Expectations lie at the core of economic dynamics as they usually determine,
not only the behavior of the agents, but also the main properties of the economy under study. Although having been soon recognized, the question of
expectations has been neglected for a while, as this is a pretty difficult issue to deal with. In this course, we will mainly be interested by rational
expectations

1.1

The rational expectations hypothesis

The term rational expectations is most closely associated with Nobel Laureate Robert Lucas of the University of Chicago, but the question of rationality
of expectations came into the place before Lucas investigated the issue (see
Muth [1960] or Muth [1961]). The most basic interpretation of rational expectations is usually summarized by the following statement:
Individuals do not make systematic errors in forming their expectations; expectations errors are corrected immediately, so that
on average expectations are correct.
But rational expectation is a bit more subtil concept that may be defined in
3 ways.
Definition 1 (Broad definition) Rational expectations are such that individuals formulate their expectations in an optimal way, which is actually com1

CHAPTER 1. EXPECTATIONS AND ECONOMIC DYNAMICS

parable to economic optimization.


This definition actually states that individuals collect information about the
economic environment and use it in an optimal way to specify their expectations. For example, assume that an individual wants to make forecasts on
an asset price, she needs to know the series of future dividends and therefore needs to make predictions about these dividends. She will then collect all
available information about the environment of the firm (expected demand, investments, state of the market. . . ) and use this information in an optimal way
to make expectations. But two key issues emerge then: (i) the cost of collecting information and (ii) the definition of the objective function. Hence, despite
its general formulation, this definition remains weakly operative. Therefore, a
second definition was proposed in the literature.
Definition 2 (middefinition) Agents do not waste any available piece of
information and use it to make the best possible fit of the real world.
This definition has the great advantage of avoiding to deal with the problem
of the cost of collecting information we only need to know that agents do
not waste information but it remains weakly operative in the sense it is
not mathematically specified. Hence, the following weak definition is most
commonly used.
Definition 3 (weak definition) Agents formulate expectations in such a way
that their subjective probability distribution of economic variables (conditional
on the available information) coincides with the objective probability distribution of the same variable (the state of Nature) in an equilibrium:
xet = E(xt |)
where denote the information set
When the model satisfies a markovian property, essentially consists of past
realizations of the stochastic variables from t=0 on. For instance, if we go back
to our individual who wants to predict the price of an asset in period t, will
essentially consist of all past realizations of this asset price: = {pti ; i =
1 . . . t}. Beyond, this definition assumes that agents know the model and the

1.1. THE RATIONAL EXPECTATIONS HYPOTHESIS

probability distributions of the shocks that hit the economy that is what is
needed to compute all the moments (average, standard deviations, covariances
. . . ) which are needed to compute expectations. In other words, and this is
precisely what makes rational expectations so attractive:
Expectations should be consistent with the model
= Solving the model is finding an expectation function.
Notation: Hereafter, we will essentially deal with markovian models, and will
work with the following notation:
Eti (xt ) = E(xt |ti )
where ti = {xk ; k = 0 . . . t i}.
The weak definition of rational expectations satisfies two vary important properties.
Proposition 1 Rational Expectations do not exhibit any bias: Let x
bt = xt xet
denote the expectation error:

Et1 (b
xt ) = 0
which essentially corresponds to the fact that individuals do not make systematic errors in forming their expectations.
Proposition 2 Expectation errors do not exhibit any serial correlation:
Covt1 (b
xt , x
bt1 ) = Et1 (b
xt x
bt1 ) Et1 (b
xt )Et1 (b
xt1 )
= Et1 (b
xt )b
xt1 Et1 (b
xt )b
xt1

= 0
Example 1 Lets consider the following AR(2) process
xt = 1 xt1 + 2 xt2 + t
such that the roots lies outside the unit circle and t is the innovation of the
process.

CHAPTER 1. EXPECTATIONS AND ECONOMIC DYNAMICS


1. Lets now specify = {xk ; k = 0, . . . , t 1}, then
E(xt |) = E(1 xt1 + 2 xt2 + t |)
= E(1 xt1 |) + E(2 xt2 |) + E(t |)
Note that by construction, we have xt1 and xt2 , therefore,
E(xt1 |) = xt1 and E(xt2 |) = xt2 . Since, t is an innovation,
it is orthogonal to any past realization of the process, t such that
E(t |) = 0. Hence
E(xt |) = 1 xt1 + 2 xt2
2. Lets now specify = {xk ; k = 0, . . . , t 2}, then
E(xt |) = E(1 xt1 + 2 xt2 + t |)
= E(1 xt1 |) + E(2 xt2 |) + E(t |)
Note that by construction, we have xt2 , such that as before E(xt2 |) =
xt2 . Further, we still have t such that E(t |) = 0. But now
xt1
/ such that
E(xt |) = 1 E(xt1 |) + 2 xt2
and we shall compute E(xt1 |):
E(xt1 |) = E(1 xt2 + 2 xt3 + t1 |)
= E(1 xt2 |) + E(2 xt3 |) + E(t1 |)
Note that xt2 , xt3 and t1 , such that
E(xt1 |) = 1 xt2 + 2 xt3
Hence
E(xt |) = (21 + 2 )xt2 + 2 xt3

This example illustrates the so called law of iterated projection.


Proposition 3 (Law of Iterated Projection) Lets consider two information sets t and t1 , such that t t1 , then
E(xt |t1 ) = E(E(xt |t )|t1 )

1.1. THE RATIONAL EXPECTATIONS HYPOTHESIS

Beyond, the example reveals a very important property of rational expectations: a rational expectation model is not a model in which the individual knows everything. Everything depends on the information structure. Lets consider some simple examples.
Example 2 (signal extraction) In this example, we will deal with a situation where the agents know the model but do not perfectly observe the shocks
they face. Information is therefore incomplete because the agents do not know
perfectly the distribution of the true shocks.
Assume that a firm wants to predict the demand, d, it will be addressed, but
only observes a random variable x that is related to d as
x=d+

(1.1)

where E(d) = 0, E(d2 ) = d < , E( 2 ) = < , E(d) = , and


E() = 0. This assumption amounts to state that x differs from d by a measurement error, . Note that in this example, we assume that there is a noisy
information, but the firm still knows the overall structure of the model (namely
it knows 1.1). The problem of the firm is then to formulate an expectation for
d only observing x: = {1, x}. In this case, the problem of the entrepreneur
is to determine E(d|). Since the entrepreneur knows the linear structure of
the model, it can guess that
E(d|) = 0 + 1 x
From proposition 1, we know that the expectation error exhibits no bias so that
E(d E(d|)|) = 0
which amounts to
E(d 0 1 x|) = 0
or

E(d 0 1 x|1) = 0
E(d 0 1 x|x) = 0

These are the two normal equation associated with an OLS estimate, hence we
have
1 =

2
Cov(x, d)
Cov(d + , d)
=
= 2 d 2
V(x)
V(d + )
d +

CHAPTER 1. EXPECTATIONS AND ECONOMIC DYNAMICS

and
0 =

d2 + 2

Example 3 (bounded memory) In this example, we deal with a situation


where the agents know the model but have a bounded memory in the sense they
forget past realization of the shocks.
Lets consider the problem of a firm which demand depends on expected aggregate demand and the price level. In order to keep things as simple as possible, we will assume that the price is an exogenous i.i.d process with mean p
and variance p2 ) and that aggregate demand is driven by the following simple
AR(1) process
Yt = Yt1 + (1 )Y + t
where || < 1 and t is the innovation of the process. The demand then takes
the following form
dt = E(Yt+1 |) pt
But rather than being defined as = {Yti , pti , ti ; i = 0 . . . }, now takes
the form = {Yti , pti , ti ; i = 0 . . . k, k < }. Computing the rational
expectation is now a bit more tricky. We first have to write down the Wold
decomposition of the process of Y
Yt = Y +

i ti

i=0

Then E(Yt+1 |) can be computed as


E(Yt+1 |) = E

Y +

X
i=0

!

i t+1i


Since Y is a deterministic constant, E Y = Y , such that
E(Yt+1 |) = Y +

i E(t+1i |)

i=0

Since = {Yti , pti , ti ; i = 0 . . . k, k < }, we have ti i > k, such


that, in this case E(ti |) = 0. Hence,
E(Yt+1 |) = Y +

k
X
i=0

i+1 ti

1.2. A PROTOTYPICAL MODEL OF RATIONAL EXPECTATIONS


hence
dt = Y +

k
X
i=0

i+1 ti

pt

which may be reexpressed in terms of observable variables as





dt = Y + Yt Y k+1 Yt(k+1) Y
pt

1.2

1.2.1

A prototypical model of rational expectations


Sketching up the model

In this section we try to characterize the behavior of an endogenous variable


y that obeys the following expectational difference equation
yt = aEt yt+1 + bxt

(1.2)

where Et yt+1 E(yt+1 |) where = {yti , xti , i = 0, . . . , }.


Equation (1.2) may be given different interpretations. We now provide you
with a number of models that suit this type of expectational difference equation.
Assetpricing model:

Let pt be the price of a stock, dt be the dividend,

and r be the rate of return on a riskless asset, assumed to be held constant


over time. Standard theory of finance teaches us that if agents are risk neutral,
then the arbitrage between holding stocks and the riskless asset should be such
that the expected return on the stock given by the expected rate of capital
gain plus the dividend/price ratio should equal the riskless interest rate:
Et pt+1 pt dt
+
=r
pt
pt
or equivalently
pt = aEt pt+1 + adt where a
The Cagan Model:

1
<1
1+r

The Cagan model is a macro model that was designed

to furnish an explanation to the hyperinflation problem. Cagan assumes that


the demand for real balances takes the following form

Mtd
e
= exp t+1
Pt

(1.3)

CHAPTER 1. EXPECTATIONS AND ECONOMIC DYNAMICS

e
where t+1
denotes expected inflation
e
t+1

Et (Pt+1 ) Pt
Pt

In an equilibrium, money demand equals money supply, such that


Mtd = Mts = Mt
hence in an equilibrium, equation (1.3) reduces to


Mt
Et (Pt+1 ) Pt
= exp
Pt
Pt

(1.4)

Taking logs lowercases will denote logged variables using the approximation log(1 + x) x and reorganizing, we end up with
pt = aEt (pt+1 ) + (1 a)mt where a =
Monopolistic competition

1+

Consider a monopolist that faces the following

demand
pt = yt Et yt+1

(1.5)

the term in yt accounts for the fact that the greater the greater the price
is, the lower the demand is. The term in Et yt+1 accounts for the fact that
greater expected sells tend to lower the price.1 The firm acts as a monopolist
maximizing its profit
max pt yt ct yt
yt

taking the demand (1.5) into account. ct is the marginal cost, which is assumed to follow an exogenous stochastic process. Note that we assume, for the
moment, that the firm adopts a purely static behavior. Profit maximization
taking (1.5) into account yields
2yt Et yt+1 ct = 0
which may be rewritten as
yt = aEt (pt+1 ) + bct + d where a =
1

,b=
and d =
2
2
2

If < 0, the model may be given an alternative interpretation. Greater expected sells
lead the firm to raise its price (you may think of goods such as tobacco, alcohol, . . . , each
good that may create addiction).

1.2. A PROTOTYPICAL MODEL OF RATIONAL EXPECTATIONS

At this point we are left with the expectational difference equation (1.2),
which may either be solved forward or backward looking depending on
the value of a. When |a| < 1 the solution should be forward looking, as it
will become clear in a moment, conversely, when |a| > 1 the model should be
solved backward. The next section investigates this issue.

1.2.2

Forward looking solutions: |a| < 1

The problem that arises with the case |a| < 1 may be understood by looking
at figure 1.1, which reports the dynamics of equation
Et yt+1 =

b
1
yt xt
a
a

Holding xt constant and therefore eliminating the expectation. As can be


seen from the figure, the path is fundamentally unstable as soon as we look at
it in the usual backward looking way. Starting from an initial condition that
differs from y, say y0 , the dynamics of y diverges. The system then displays
a bubble.2 A more interesting situation arises when the variable yt represents
a variable such as a price or consumption in any case a variable that shifts
following a shock and that does not have an initial condition but a terminal
condition of the form
lim |yt | <

(1.6)

In fact such a terminal condition which is often related to the socalled


transversality condition arising in dynamic optimization models bounds
the sequence of {yt }
t=0 and therefore imposes stationarity.

Solving this

system then amounts to find a sequence of stochastic variable that satisfies


(1.2). This may be achieved in different ways and we now present 3 possible
methods.
Forward substitution
This method proceeds by iterating forward on the system, making use of the
law of iterated projection (proposition 3). Let us first recall the expectational
difference equation at hand:
yt = aEt yt+1 + bxt
2

We will come back to this point later on.

10

CHAPTER 1. EXPECTATIONS AND ECONOMIC DYNAMICS


Figure 1.1: The regular case
Et yt+1 6
-

45

yt

y y0

Iterating one step forward that is plugging the value of yt evaluated in t + 1


in the expectation, we get
yt = aEt (Et+1 (ayt+2 + bxt+1 )) + bxt
The law of iterated projection implies that Et (Et+1 (yt+2 )) = Et yt+2 , so that
yt = a2 Et (yt+2 ) + abEt (xt+1 ) + bxt
Iterating one step forward, we get
yt = a2 Et (Et+2 (ayt+3 + bxt+2 )) + abEt (xt+1 ) + bxt
Once again making use of the law of iterated projection, we get
yt = a3 Et (yt+3 ) + a2 bEt (xt+2 ) + abEt (xt+1 ) + bxt
Continuing the process, we get
yt = b lim

k
X
i=0

ai Et (xt+i ) + lim ak+1 Et (yt+k+1 )


k

1.2. A PROTOTYPICAL MODEL OF RATIONAL EXPECTATIONS

11

For the first term to converge, we need the expectation Et (xt+k ) not to increase
at a too fast pace. Then provided that |a| < 1, a sufficient condition for the
first term to converge is that the expectation explodes at a rate lower than
|1/a 1|.3 In the sequel we will assume that this condition holds.
Finally, since |a| < 1, imposing that lim |yt | < holds, we have
t

k+1

lim a

Et (yt+k+1 ) = 0

and the solution is given by


yt = b

ai Et (xt+i )

(1.7)

i=0

In other words, yt is given by the discounted sum of all future expected values
of xt . In order to get further insight on the form of the solution, we may be
willing to specify a particular process for xt . We shall assume that it takes
the following AR(1) form:
xt = xt1 + (1 )x + t
where || < 1 for sake of stationarity and t is the innovation of the process.
Note that
Et xt+1 = xt + (1 )x
Et xt+2 = Et xt+1 + (1 )x = 2 xt + (1 )(1 + )x
Et xt+3 = Et xt+2 + (1 )x = 3 xt + (1 )(1 + + 2 )x
..
.
Et xt+i = i xt + (1 )(1 + + 2 + . . . + i )x = i xt + (1 i+1 )x
Therefore, the solution takes the form

X
ai (i xt + (1 i )x)
yt = b
= b

i=0

X
i=0

(a) (xt x) +

X
i=0

ax


xt x
x
+
= b
1 a 1 a
b
ab(1 )
=
x
xt +
1 a
(1 a)(1 a)


This will actually be the case with a stationary process.

12

CHAPTER 1. EXPECTATIONS AND ECONOMIC DYNAMICS

Figure 1.2 provides an example of the process generated by such a solution, in


the deterministic case and in the stochastic case. In the deterministic case, the
economy always lies on its longrun value y , which is the only stable point.
We then talk about steady state that is a situation where yt = yt+k = y .
In the stochastic case, the economy fluctuates around the mean of the process,
and it is noteworthy that any change in xt instantaneously translates into a
change in yt . Therefore, the persistence of yt is given by that of xt .
Figure 1.2: Forward Solution
Deterministic Case

Stochastic Case

9
8

5.5
7
5

6
5

4.5
4
4
0

50

100
Time

150

200

3
0

50

100
Time

150

200

Note: This example was generated using a = 0.8, b = 1, = 0.95, = 0.1 and x = 1.
Matlab Code: Forward Solution
\simple
%
% Forward solution
%
lg = 100;
T
= [1:long];
a
= 0.8;
b
= 1;
rho = 0.95;
sx = 0.1;
xb = 1;
%
% Deterministic case
%
y=a*b*xb/(1-a);
%
% Stochastic case
%
%
% 1) Simulate the exogenous process
%
x
= zeros(lg,1);
randn(state,1234567890);

1.2. A PROTOTYPICAL MODEL OF RATIONAL EXPECTATIONS

13

e
= randn(lg,1)*sx;
x(1) = xb;
for i=2:long;
x(i) = rho*x(i-1)+(1-rho)*xb+e(i);
end
%
% 2) Compute the solution
%
y
= b*x/(1-a*rho)+a*b*(1-rho)*xb/((1-a)*(1-a*rho));

Factorization
The method of factorization was introduced by Sargent [1979]. It amounts to
make use of the forward operator F , introduced in the first chapter.4 In a first
step, equation (1.2) is rewritten in terms of F
yt = aEt yt+1 +bxt Et (yt ) = aEt (yt+1 )+bEt (xt ) (1aF )Et yt = bEt xt
which rewrites as
E t yt = b

Et xt
1 aF

since |a| < 1, we have

X
1
ai F i
=
1 aF
i=0

Therefore, we have

E t y t = yt = b

ai F i Et xt = b

ai Et xt+i

i=0

i=0

Note that although we get, obviously, the same solution, this method is not
as transparent as the previous one since the terminal condition (1.6) does not
appear explicitly.
Method of undetermined coefficients
This method proceeds by making an initial guess on the form of the solution.
An educated guess for the problem at hand would be
yt =

i Et xt+i

i=0

Recall that the forward operator is such that F i Et (xt ) = Et (xt+i ).

14

CHAPTER 1. EXPECTATIONS AND ECONOMIC DYNAMICS

Plugging the guess in (1.2) leads to

i Et xt+i = aEt

i=0

i Et+1 xt+1+i

i=0

+ bxt

Solving the model then amounts to find the sequence of i , i = 0, . . . , such


that the guess satisfies the equation. We then proceed by identification.
i = 0 0 = b
i = 1 1 = a0
i = 2 2 = a1
..
.
such that i = ai1 , with 0 = b. Note that since |a| < 1, this sequence
converges toward 0 as i tends toward infinity. Therefore, the solution writes
yt = b

ai Et xt+i

i=0

The problem with such an approach is the we need to make the right guess
from the very beginning. Assume for a while that we had specified the following guess
yt = xt
Then
xt = aEt xt+1 + bxt
Identifying term by terms we would have obtained = b or = 0, which is
obviously a mistake.
As a simple example, let us assume that the process for xt is given by the same
AR(1) process as before. We therefore have to solve the following dynamic
system

yt = aEt yt+1 + bxt


xt = xt1 + (1 )x + t

Since the system is linear and that xt exhibits a constant term, we guess a
solution of the form
yt = 0 + 1 xt
Plugging this guess in the expectational difference equation, we get
0 + 1 xt = aEt (0 + 1 xt+1 ) + bxt

1.2. A PROTOTYPICAL MODEL OF RATIONAL EXPECTATIONS

15

which rewrites, computing the expectation5


0 + 1 xt = a0 + a1 xt + a1 (1 )x + bxt
Identifying term by term, we end up with the following system of equations
0 = a0 + a1 (1 )x1 = a1 + b
The second equation yields
1 =

b
1 a

the first one gives


0 =

ab(1 )
x
(1 a)(1 a)

One advantage of this method is that it is particularly simple, and it requires


the user to know enough on the economic problem to formulate the right guess.
This latter property precisely constitutes the major drawback of the method
as if formulating a guess is simple for linear economies it may be particularly
tricky even impossible in all other cases.

1.2.3

Backward looking solutions: |a| > 1

Until now, we have only considered the case of a regular economy in which
|a| < 1, which provided we are ready to impose a nonexplosion condition
yields a unique solution that only involves fundamental shocks. In this
section we investigate what happens when we relax the condition |a| < 1
and consider the case |a| > 1. This fundamentally changes the nature of the
solution, as can be seen from figure 1.3. More precisely, any initial condition
y0 for y is admissible as any leads the economy back to its longrun solution
y. The equilibrium is then said to be indeterminate.
From a mathematical point of view, the sum involved in the forward solution
is unlikely to converge. Therefore, the solution should be computed in an
alternative way. Let us recall the expectational difference equation
yt = aEt yt+1 + bxt
5
Note that this is here that we make use of the assumptions on the process for the
exogenous shock.

16

CHAPTER 1. EXPECTATIONS AND ECONOMIC DYNAMICS


Figure 1.3: The irregular case
Et yt+1 6

45
y

y0

yt

Note that, by construction, we have


yt+1 = Et (yt+1 ) + t+1
where t+1 is the expectational error, uncorrelated by construction with
the information set, such that Et t+1 = 0. The expectational difference equation then rewrites
yt = a(yt+1 t+1 ) + bxt
which may be restated as
yt+1 =

b
1
yt + xt + t+1
a
a

Since |a| > 1 this equation is stable and the system is fundamentally backward
looking. Note that t+1 is serially uncorrelated, and not necessarily correlated
with the innovations of xt . In other words, this shock may not be a fundamental shock and is alike a sunspot. For example, I wake up in the morning,
look at the weather and decides to consume more. Why? I dont know! This
is purely extrinsic to the economy!

1.2. A PROTOTYPICAL MODEL OF RATIONAL EXPECTATIONS

17

Figure 1.4 reports an example of such an economy. We have drawn the solution
to the model for different values of the volatility of the sunspot, using the
same draw. As can be seen, although each solution is perfectly admissible,
the properties of the economy are rather different depending on the volatility
of the sunspot variable. Besides, one may compute the volatility and the first
Figure 1.4: Backward Solution
=0.1

Without sunspot
2.5

2.5

1.5

1.5

0.5

0.5

0
0

50

100
Time
=0.5

150

200

0
0

50

50

100
Time

150

200

100
Time

150

200

0
0

100
Time
=1

150

200

2
0

50

Note: This example was generated using a = 1.8, b = 1, = 0.95, = 0.1 and x = 1.

order autocorrelation of yt :6
y2 =
y (1) =

a2
b2 ( + a)
2

+
2
x
(a2 1)(a )
a2 1
"
#
b2 (a2 1)x2
1
1+ 2
a
b (a + )x2 + a2 (a )2

Therefore, as should be expected, the overall volatility of y is an increasing


function of the volatility of the sunspot, but more important is the fact that
its persistence is lower the greater the volatility of the sunspot. Hence, there
6

We leave it to you as an exercize.

18

CHAPTER 1. EXPECTATIONS AND ECONOMIC DYNAMICS

may be many candidates to the solution of such a backward looking equation,


each displaying totally different properties.
Matlab Code: Backward Solution
%
% Backward solution
%
lg = 200;
T
= [1:lg];
a
= 1.8;
b
= 1;
rho = 0.95;
sx = 0.1;
xb = 1;
se = 0.1;
%
% 1) Simulate the exogenous process
%
x
= zeros(lg,1);
randn(state,1234567890);
e
= randn(lg,1)*sx;
x(1) = xb;
for i=2:lg;
x(i) = rho*x(i-1)+(1-rho)*xb+e(i);
end
%
% 2) Compute the solution
%
randn(state,1234567891);
es = randn(lg,1);
y1 = zeros(lg,1); % without sunspot
y2 = zeros(lg,1); % with sunspot (se=0.1)
y3 = zeros(lg,1); % with sunspot (se=0.5)
y4 = zeros(lg,1); % with sunspot (se=1)
y1(1) = 0;
y2(1) = es(1)*0.1;
y3(1) = es(1)*0.5;
y4(1) = es(1);
for i=2:lg;
y1(i) = y1(i-1)/a+b*x(i-1)/a;
y2(i) = y2(i-1)/a+b*x(i-1)/a+0.1*es(i);
y3(i) = y3(i-1)/a+b*x(i-1)/a+0.5*es(i);
y4(i) = y4(i-1)/a+b*x(i-1)/a+es(i);
end

1.2.4

One step backward: bubbles

Lets now go back to the forward looking solution. The ways we dealt with it
led us to eliminate any bubble that is we imposed condition (1.6) to bound
the sequence. By doing so, we restricted ourselves to a particular class of

1.2. A PROTOTYPICAL MODEL OF RATIONAL EXPECTATIONS

19

solution, but there may exist a wider class of admissible solution that satisfy
(1.2) without being bounded.
Let us now assume that such an alternative solution of the form does exist
yet = yt + bt

where yt is the solution (1.7) and bt is a bubble. In order for yet to be a solution
to (1.2), we need to place some additional assumption on its behavior.

If yet = yt + bt it has to be the case that Et yet+1 = Et yt+1 + Et bt+1 , such that

plugging this in (1.2), we get

yt + bt = aEt yt+1 + aEt bt+1 + bxt


Since yt is a solution to (1.2), we have that yt = aEt yt+1 + bxt such that the
latter equation reduces to
bt = aEt bt+1 Et bt+1 = a1 bt
Therefore, any bt that satisfies the latter restriction will be such that yet is a

solution to (1.2). Note that since |a| < 1 in the case of a forward solution,

bt explodes in expected values therefore referring directly to the common


sense of a speculative bubble. Up to this point we have not specified any
particular functional form for the bubble. Blanchard and Fisher [1989] provide
two examples of such bubbles:
1. The everexpanding bubble: bt then simply follows a deterministic trend
of the form:
bt = b0 at
It is then straightforward to verify that bt = aEt bt+1 . How should we
interpret such a behavior for the bubble? In order to provide with some
insights, lets consider the case of the assetpricing equation:
Et pt+1 pt dt
+
=r
pt
pt
where dt = d (for simplicity). It is straightforward to check that the
nobubble solution (the fundamental solution) takes the form:
pt = p =

d
r

20

CHAPTER 1. EXPECTATIONS AND ECONOMIC DYNAMICS


which sticks to the standard solution that states that the price of an asset
should be the discounted sum of expected dividends (you may check that
P
i
d /r =
i=0 (1+r) d ). If we now add a bubble of the kind we consider

that is bt = b0 at = b0 (1 + r)t provided b0 > 0 the price of the


asset will increase exponentially though the dividends are constant. The

explanation for such a result is simple: individuals are ready to pay a


price for the asset greater than expected dividends because they expect
the price to be higher in future periods, which implies that expected
capital gains will be able to compensate for the low price to dividend
ratio. This kind of anticipation is said to be selffulfilling. Figure 1.5
reports an example of such a bubble.
Figure 1.5: Deterministic Bubble
Asset price

Bubble

28

2.5
Bubble solution
Fundamental solution

27
2
26
1.5
25
24

10
Time

15

20

1
0

dividend/price

10
Time

15

20

Capital Gain (%)

0.039

0.35

0.0385

0.3

0.038
0.25
0.0375
0.2

0.037
0.0365
0

10
Time

15

20

0.15
0

10
Time

Note: This example was generated using d = 1, r = 0.04.

Matlab Code: Deterministic Bubble


%
% Example of a deterministic bubble
% The case of asset pricing (constant dividends)
%
d_star = 1;

15

20

1.2. A PROTOTYPICAL MODEL OF RATIONAL EXPECTATIONS

21

r
= 0.04;
%
% Fundamental solution p*
%
p_star = d_star/r;
%
% bubble
%
long
= 20;
T
= [0:long];
b
= (1+r).^T;
p
= p_star+b;

2. The burstingbubble: A problem with the previous example is that the


bubble is everexpanding whereas observation and common sense suggests that sometimes the bubble bursts. We may therefore define the
following bubble:
bt+1 =

(a)1 bt + t+1
t+1

with probability
with probability 1

with Et t+1 = 0. So defined, the bubble keeps on inflating with probability and bursts with probability (1 ). Lets check that bt = aEt bt+1
bt =
=
=
=

aEt (((a)1 bt + t+1 ) + (1 )t+1 )


aEt ((a)1 bt ) + t+1 )
aEt (a1 bt )
bt

taking bursting into account


grouping terms in t+1
since Et t+1 = 0
since bt is known in t

Figure 1.6 reports an example of such a bubble (the vertical lines in


the upper right panel of the figure corresponds to time when the bubble
bursts). The intuition for the result is the same as before: individuals are
ready to pay a higher price for the asset than the expected discounted
dividends because they expect with a sufficiently high probability that
the price will be high enough in subsequent periods to generate sufficient
capital gains to compensate for the lower price to dividend ratio. The
main difference with the previous case is that this bubble is now driven
by a stochastic variable, labelled as sunspot in the literature.

22

CHAPTER 1. EXPECTATIONS AND ECONOMIC DYNAMICS

Figure 1.6: Bursting Bubble


Asset price

Bubble

200

150
Bubble solution
Fundamental solution

150

100

100

50

50

0
0

50

100
Time

150

200

50
0

50

dividend/price

100
Time

150

200

Capital Gain (%)

0.06

50

0.05
0.04

0.03
0.02

50

0.01
0
0

50

100
Time

150

200

100
0

50

100
Time

150

200

Note: This example was generated using d = 1, = 0.95, r = 0.04.

1.3. A STEP TOWARD MULTIVARIATE MODELS

23

Matlab Code: Bursting Bubble


%
% Example of a bursting bubble
% The case of asset pricing (constant dividends)
%
d_star = 1;
r
= 0.04;
%
% Fundamental solution p*
%
p_star = d_star/r;
%
% bubble
%
long
= 200;
prob
= 0.95;
randn(state,1234567890);
e
= randn(long,1);
rand(state,1234567890);
ind
= rand(long,1);
b
= zeros(long,1);
dum
= zeros(long,1);
b(1)
= 0;
for i = 1:long-1;
dum(i)= ind(i)<prob;
b(i+1)= dum(i)*(b(i)*(1+r)/prob+e(i+1))+(1-dum(i))*e(i+1);
end;
p
= p_star+b;

Up to this point we have been dealing with very simple situations where the
problem is either backward looking or forward looking. Unfortunately, such
a case is rather scarce, and most of economic problems such as investment
decisions, pricing decisions . . . are both backward and forward looking. We
examine such situations in the next section.

1.3

A step toward multivariate Models

We are now interested in solving a slightly more complicated problem involving


one lag (for the moment!) of the endogenous variable:
yt = aEt yt+1 + byt1 + cxt

(1.8)

This equation may be encountered in many different models, either in macro,


micro, IO. . . as we will see later on. For the moment, let us assume that this
is obtained from whatever model we may think of and let us take it as given.

24

CHAPTER 1. EXPECTATIONS AND ECONOMIC DYNAMICS

We are now willing to solve this expectational equation. As before, there exist
many methods.

1.3.1

The method of undetermined coefficients

Let us recall that solving the equation using undetermined coefficients amounts
to formulate a guess for the solution and find some restrictions on the coefficients of the guess such that equation (1.8) is satisfied. An educated guess in
this case is given by
yt = yt1 +

i Et xt+i

i=0

Where does this guess come from? Experience! and this is precisely why
the method of undetermined coefficients, although it may appear particularly
practical in a number of (simple) problems, is not always appealing.
Plugging this guess in equation (1.8) yields
#
"

X
X
i Et+1 xt+1+i + byt1 + cxt
i Et xt+i = aEt yt +
yt1 +
i=0

i=0

= a yt1 +

i Et xt+i

i=0

+ aEt

"

+byt1 + cxt

= (a2 + b)yt1 + a

i Et xt+i + a

i=0

i Et+1 xt+1+i

i Et xt+1+i + cxt

i=0

i=0

Everything is then a matter of identification (term by term):


= a2 + b

(1.9)

0 = a0 + c

(1.10)

i = ai + ai1

i > 1

Solving (1.9) for amounts to solve the second order polynomial


1
b
2 + = 0
a
a
which admits two solutions such that

1 + 2 =
1 2 = ab

1
a

Three configurations may emerge from the above equation

(1.11)

1.3. A STEP TOWARD MULTIVARIATE MODELS

25

1. the two solutions lie outside the unit circle: the model is said to be a
source and only one particular point the steady state is a solution
to the equation.
2. One solution lie outside the unit circle and the other one inside: the
model exhibits the saddle path property.
3. The two solutions lie inside the unit circle: the model is said to be a sink
and there is indeterminacy.
Here, we will restrict ourselves to the situation where an extended version of
the condition |a| < 1 we were dealing with in the preceding section holds,
namely one root will be of modulus greater than one and the other less than
one. The model will therefore exhibit the socalled saddle point property, for
which we will provide a geometrical interpretation in a moment. To sum up,
we consider a situation where |1 | < 1 and |2 | > 1. Since we restrict ourselves
to the stationary solution, we necessarily have || < 1 so that = 1 .
Once has been obtained, we can solve for i , i = 0, . . .. 0 is obtained from
(1.10) and takes the value
0 =

c
1 a1

We then get i , i > 1, from (1.11) as


i =

a
i1 =
1 a1

1
a

1
i1
1

Since 1 + 2 = 1/a, the latter equation rewrites


i = 1
2 i1
where |2 | > 1, such that this sequence converges toward zero. Therefore the
solution is given by
yt = 1 yt1 +

X
c
i
2 Et xt+i
1 a1
i=0

Example 4 In the case of an AR(1) process for xt , the solution is straightforward, as all the process may be simplified. Indeed, let us consider the following
problem

yt = aEt yt+1 + byt1 + cxt


xt = xt1 + t

26

CHAPTER 1. EXPECTATIONS AND ECONOMIC DYNAMICS

with t ; N (0, ). An educated guess for the solution of this equation would
be
yt = yt1 + xt
Let us then compute the solution of the problem, that is let us find and
. Plugging the guess for the solution in the expectational difference equation
leads to
yt1 + xt = aEt (yt + xt+1 ) + byt1 + cxt
= a2 yt1 + axt + axt + byt1 + cxt
= (a2 + b)yt1 + (c + a( + ))xt
Therefore, we have to solve the system

= a2 + b
= c + a( + )
Like in the general case, we select the stable root of the first equation 1 , such
that |1 | < 1, and =

c
1a(1 +)

Figure (1.7) reports an example of such an

economy for two different parameterizations.


Matlab Code: BackwardForward Solution
%
% Solve for
%
% y(t)=a E y(t+1) + b y(t-1) + c x(t)
% x(t)= rho x(t-1)+e(t)
e iid(0,se)
%
% and simulate the economy!
%
a
= 0.25;
b
= 0.7;
c
= 1;
rho
= 0.95;
se
= 0.1;
mu
[m,i]
mu1
[m,i]
mu2

=
=
=
=
=

roots([a -1 b]);
min(mu);
mu(i);
max(mu);
mu(i);

alpha = b/(1-a*(mu1+rho));
%
% Simulation
%

1.3. A STEP TOWARD MULTIVARIATE MODELS

27

Figure 1.7: Backwardforward solution


x

6
4

0.5

2
0

2
0.5
0

50

100
Time

150

200

4
0

50

100
Time

150

200

150

200

Note: a = 0.25, b = 0.7, = 0.95, = 0.1

3
2

0.5

1
0

1
0.5
0

50

100
Time

150

200

2
0

50

Note: a = 0.7, b = 0.25, = 0.95, = 0.1

100
Time

28

CHAPTER 1. EXPECTATIONS AND ECONOMIC DYNAMICS

lg
= 200;
randn(state,1234567890);
e
= randn(lg,1)*se;
x
= zeros(lg,1);
y
= zeros(lg,1);
x(1) = 0;
y(1) = alpha*x(1);
for i = 2:lg;
x(i) = rho*x(i-1)+e(i);
y(i) = mu1*y(i-1)+alpha*x(i);
end

Note that contrary to the simple case we considered in the previous section, the
solution does not only inherit the persistence of the shock, but also generates its
own persistence through 1 as can be seen from the first order autocorrelation
(1) =

1.3.2

1 +
1 + 1

Factorization

The method of factorization proceeds into 2 steps.


1. Factor the model (1.8) making use of the leading operator F :
(aF 2 F + b)Et yt1 = cEt xt
which may be rewritten as


1
b
c
F2 F +
Et yt1 = Et xt
a
a
a
which may also be rewritten as
c
(F 1 )(F 2 )Et yt1 = Et xt
a
Note that 1 and 2 are the same as the ones obtained using the method
of undetermined coefficients, therefore the same discussion about their
size applies. We restrict ourselves to the case |1 | < 1 (backward part)
and |2 | > 1 (forward part) that is to saddle path solutions.
2. Derive a solution for yt : Starting from the last equation, we can rewrite
it as

c
(F 1 )Et yt1 = (F 2 )1 Et xt
a

1.3. A STEP TOWARD MULTIVARIATE MODELS


or
(F 1 )Et yt1 =

29

c
1
(1 1
2 F ) Et xt
a2

Since |2 | > 1, we know that


1
=
(1 1
2 F)

i
i
2 F

i=0

so that
(F 1 )Et yt1 =

c X i i
c X i
2 F Et xt =
2 Et xt+i
a2
a2
i=0

i=0

Now, applying the leading operator on the left hand side of the equation
and acknowledging that 2 = 1/a 1 , we have

X
c
i
yt = 1 yt1 +
2 Et xt+i
1 a1
i=0

1.3.3

A matricial approach

In this section, we would like to provide you with some geometrical intuition of
what is actually going on when the saddle path property applies in the model.
To do so, we will rely on a matricial approach. First of all, let us recall the
problem we have in hands:
yt = aEt yt+1 + byt1 + cxt
Introducing the technical variable zt defined as
zt+1 = yt
the model may be rewritten as7

  1


 
Et yt+1
yt
c
ab
a
=
xt

zt+1
zt
1
1 0
Remember that Et yt+1 = yt+1 t+1 where t+1 is an iid process which represents the expectation error, therefore, the system rewrites

 

 

  1
yt
c
1
yt+1
ab
a

xt
t+1
=
zt
1
0
zt+1
1 0
7
In the next section we will actually pool all the equations in a single system, but for
pedagogical purposes let us separate exogenous variables from the rest for a while.

30

CHAPTER 1. EXPECTATIONS AND ECONOMIC DYNAMICS

In order to understand the saddle path property let us focus on the homogenous part of the equation

  1




b
yt+1
yt
yt
a a
=
=W
zt+1
zt
zt
1 0
Provided b 6= 0 the matrix W can be diagonalized and may be rewritten as
W = P DP 1
where D contains the two eigenvalues of W and P the associated eigenvectors.
Figure 1.8 provides a way of thinking about eigenvectors and eigenvalues in
dynamical systems. The figure reports the two eigenvectors, P1 and P2 , associated with the two eigenvalues 1 and 2 of W . 1 is the stable root and
2 is the unstable root. As can be seen from the graph, displacements along
Figure 1.8: Geometrical interpretation of eigenvalues/eigenvectors
z 6

P2

P1
x2

6
x1 R

x2

x1

z
x3
x4


I
x4

x3 )

P1 are convergent, in the sense they shift either x1 or x4 toward the center of
the graph (x1 and x4 ), while displacements along P2 are divergent (shift of x2
and x3 to x2 and x3 ). In fact the eigenvector determines the direction along

1.3. A STEP TOWARD MULTIVARIATE MODELS

31

which the system will evolve and the eigenvalue the speed at which the shift
will take place.
The characteristic equation that gives the eigenvalues, in the case we are studying, is given by
1
b
det(W I) = 0 2 + = 0
a
a
which exactly corresponds to the equations we were dealing with in the previous sections. We will not enter the formal resolution of the model right now,
as we will undertake an extensive treatment in the next section. However, we
will just try to understand what may be going on using a phase diagram like
approach to understand the dynamics. Figures 1.91.11 report the different
possible configuration we may encounter solving this type of model. The first
one is a source (figure 1.9), which is such that no matter the initial condition
we feed the system with except y0 = y , z0 = z the system will explode.
Both y and z will not be bounded. The second one is a sink (figure 1.10), all
trajectories converge back to the steady state of the economy, one is then free
to choose whatever trajectory it wants to go back to the steady state. The
equilibrium is therefore indeterminate.
Figure 1.9: A source
yt

P2

yt+1 = 0

zt+1 = 0

6


I
i

P1
y

R



zt

32

CHAPTER 1. EXPECTATIONS AND ECONOMIC DYNAMICS


Figure 1.10: A sink: indeterminacy
yt

P1
y

P2

yt+1 = 0

zt+1 = 0

z z
-



y

i
i




zt

In the last situation (figure 1.11) this corresponds to the most commonly
encountered situation in economic theory the economy lies on a saddle: one
branch of the saddle converges to the steady state, the other one diverges. The
problem is then to select where to start from. It should be clear to you that in
t, zt is perfectly known as zt = yt1 which was selected in the earlier period.
zt is then said to be predetermined: the agents is endowed with its value when
she enters the period. This is part of the information set. Solving the system
therefore amounts to select a value for yt , given that for zt and the structure
of the model. How to proceed then? Let us assume for a while that at time
0, the economy is endowed with z0 , and assume that we impose the value y01
as a starting value for y. In such a case, the economy will explode: in other
words a solution including a bubble has been selected. If, alternatively, y02 is
selected, then the economy will converge to the steady state (z , y ) and all
the variables will be bounded. In other words, we have selected a trajectory
such that
lim |yt | <

holds. Otherwise stated, bubbles have been eliminated by imposing a terminal


condition. In the sequel, we will be mostly interested by situation were the

1.4. MULTIVARIATE RATIONAL EXPECTATIONS MODELS (THE SIMPLE CASE)33


Figure 1.11: The saddle path
yt

P2

yt+1 = 0

P1

zt+1 = 0

y2
0

1
y0

y



?

z0

zt

economy either lies on a saddle path or is indeterminate. In the next section,


we will show you how to solve an expectational multivariate system of the
kind we were considering up to now.

1.4
1.4.1

Multivariate Rational Expectations Models (The


simple case)
Representation

Let us assume that the model writes


Mcc Yt = Mcs Mcs St
Mss0 Et St+1 + Mss1 St = Msc0 Et Yt+1 + Msc1 Yt + Mse Et+1

(1.12)
(1.13)

where Yt is a ny 1 vector of endogenous variables, t is a 1 vector of


exogenous serially uncorrelated random disturbances. A fairly natural interpretation of this dynamic system may be found in the statespace form literature: equation (1.17) corresponds to the standard measurement equation. It
relates variables of interest Yt to state variables St . (1.13) is the state equa-

34

CHAPTER 1. EXPECTATIONS AND ECONOMIC DYNAMICS

tion that actually drives the dynamics of the economy under consideration:8 it
relates future values of states St+1 to current and expected values of variables
of interest, current state variables and shocks to fundamentals Et+1 . In other
words, (1.13) furnishes the transition from one state of the system to another
one. Our problem is then to solve this system.
As a first step, it would be great if we were able to eliminate all variables
defined by the measurement equation and restrict ourselves to a state equation,
as it would bring us back to our initial problem. To do so, we use (1.17) to
eliminate Yt .
1
Yt = Mcc
Mcs St

Plugging this expression in (1.13), we obtain:


Et St+1 = WS St + WE Et+1
where
1 M
WS = Mss0 Msc0 Mcc
cs

WE =

1 M
Mss0 Msc0 Mcc
cs

1

1

1 M
Mss1 Msc1 Mcc
cs

M se

We are then back to our expectational difference equation. But it needs additional work. Indeed, Farmer proposes a method that enables us to forget
about expectations when solving for the system. He proposes to replace the
expectation by the actual variable minus the expectation error
Et St+1 = St+1 Zt+1
where Et Zt+1 = 0. Then the system rewrites
St+1 = WS St + WE Et+1 + Zt+1

(1.14)

This is the system we will be dealing with.


8
Let us accept that statement for the moment, things will become clear as we will move
to examples.

1.4. MULTIVARIATE RATIONAL EXPECTATIONS MODELS (THE SIMPLE CASE)35

1.4.2

Solving the system

? have shown that the existence and uniqueness of a solution depends fundamentally on the position of the eigenvalues of WS relative to the unit circle.
Denoting by NB and NF the number of, respectively, predetermined and jump
variables, and by NI and NO the number of eigenvalues that lie inside and
outside the unit circle, we have the following proposition.
Proposition 4
(i) If NI = NB and NO = NF , then there exists a unique solution path for
the rational expectation model that converges to the steady state;
(ii) If NI > NB (and NO < NF ), then the system displays indeterminacy;
(iii) If NI > NB (and NO > NF ), then the system is a source.
Hereafter we will deal with the two first situations, the last one being never
studied in economics.
The diagonalization of WS leads to
WS = P D P 1
where D is the matrix that contains the eigenvalues of WS on its diagonal and
P is the matrix that contains the associated eigenvectors. For convenience,
we assume that both D and P are such that eigenvalues are sorted in the
ascending order. We shall then consider two cases
1. The model satisfies the saddle path property (NI = NB and NO = NF )
2. The model exhibit indeterminacy (NI > NB and NO < NF )
The saddle path
In this section, we consider the case were the model satisfies the saddle path
property (NI = NB and NO = NF ). For convenience, we consider the following
partitioning of the matrices







DB 0
PBB PBF
PBB PBF
1
D=
, P =
, P =
0 DF
PF B PF F
PF B PF F

36

CHAPTER 1. EXPECTATIONS AND ECONOMIC DYNAMICS

This partition conforms the position of the eigenvalues relative to the unit
circle. For instance, a B stands for the set of eigenvalues that lie within the
unit circle, whereas B stands for the set of eigenvalues that lie out of it.
We then apply the following modification to the system in order to make it
diagonal:
Set = P 1 St

so that

P 1 St+1 = P 1 WS P P 1 St + P 1 WE Et+1 + P 1 Zt+1


or
Set+1 = D Set + R Et+1 + P 1 Zt+1

The same partitioning is applied to R




RB.
R=
RF.
and the state vector
Set =

The system then rewrites as


! 

SeB,t+1
DB 0
=
0 DF
SeF,t+1

SeB,t
SeF,t

SeB,t
SeF,t
!

!


RB.
RF.

Et+1 +

PB.

PF.

Zt+1

Therefore, the law of motion of forward variables is given by

SeF,t+1 = DF SeF,t + RF. Et+1 + PF.


Zt+1

Taking expectations on both side of the equation

Et SeF,t+1 = DF SeF,t SeF,t = DF1 Et SeF,t+1

since DF is a diagonal matrix, forward iteration yields


SeF,t = lim DFj Et SeF,t+j
j

Provided Et SeF,t+j is bounded which amounts to eliminate bubbles we

have

lim DFj Et SeF,t+j = 0 SeF,t = 0

1.4. MULTIVARIATE RATIONAL EXPECTATIONS MODELS (THE SIMPLE CASE)37


Then by construction, we have
SeF,t = PF B SB,t + PF F SF,t

which furnishes a restriction on SB,t and SF,t

PF B SB,t + PF F SF,t = 0
This condition expresses the relationship that relates the jump variables to the
predetermined variables, and therefore defined the initial condition SF,t which
is compatible with (i) the initial conditions on the predetermined variables
and (ii) the stationarity of the solution:
SF,t = (PF F )1 PF B SB,t = SB,t
Plugging this result in the law of motion of backward variables we have
SB,t+1 = (WBB + WBF )SB,t + RB Et+1 + ZB,t+1
but by definition, no expectation error may be done when predicting a predetermined variable, such that ZBt+1 = 0. Hence, the solution of the problem is
given by
SB,t+1 = MSS SB,t + MSE Et+1

(1.15)

where MSS = (WBB + WBF ) and MSE = RB .


As far as the measurement equation is concerned, thing are then rather simple.
.
Let us define = M 1 M = ( .. ), we have
cs

cc

Yt = B SB,t + F SF,t = SB,t


where = (B + F ).
The system is therefore solved and may be represented as
SB,t+1 = MSS SB,t + MSE Et+1

(1.16)

Yt = SB,t

(1.17)

SF,t = SB,t

(1.18)

38

1.5

CHAPTER 1. EXPECTATIONS AND ECONOMIC DYNAMICS

Multivariate Rational Expectations Models (II)

In this section we present a method to solve for multivariate rational expectations models, a because there are many of them (almost as many as authors
that deal with this problem).9 The one we present was introduced by Sims
[2000] and recently revisited by Lubik and Schorfheide [2003]. It has the advantage of being general and explicitly dealing with expectation errors. This
latter property makes it particularly suitable for solving sunspot equilibria.

1.5.1

Preliminary Linear Algebra

Generalized Schur Decomposition:

This is a method to obtain eigenval-

ues from a system which is not invertible. One way to think of this approach is
to remember that when we compute the eigenvalues of a diagonalizable matrix
A, we want to find a number and an associated eigenvector V such that
(A I)V = 0
The generalized Schur decomposition of two matrices A and B attempts to
compute something similar, but rather than considering (AI), the problem
considers (A B). A more formal, and above all a more rigorous
statement of the Schur decomposition is given by the following definitions and
theorem.
Definition 4 Let P C Cnn be a matrixvalued function of a complex
variable (a matrix pencil). Then the set of its generalized eigenvalues (P ) is
defined as
(P ) = {z C : |P (z) = 0}
When P (z) writes as Az B, we denote this set as (A, B). Then there exists
a vector V such that BV = AV .
Definition 5 Let P (z) be a matrix pencil, P is said to be regular if there
exists z C such that |P (z)| =
6 0 i.e. if (P ) 6= C.
9
In the appendix we present an alternative method that enables you to solve for singular
systems.

1.5. MULTIVARIATE RATIONAL EXPECTATIONS MODELS (II)

39

Theorem 1 (The complex generalized Schur form) Let A and B belong


to Cnn and be such that P (z) = Az B is a regular matrix pencil. Then
there exist unitary n n matrices of complex numbers Q and Z such that
1. S = Q AZ is upper triangular,
2. T = Q BZ is upper triangular,
3. For each i, Sii and Tii are not both zero,
4. (A, B) = {Tii /Sii : Sii 6= 0}
5. The pairs (Tii , Sii ), i = 1 . . . n can be arranged in any order.
A formal proof of this theorem may be found in Golub and Van Loan [1996].
Singular Value Decomposition:

The singular value decomposition is used

for nonsquare matrices and is the most general form of diagonalization. Any
complex matrix A(n m) can be factored into the form
A = U DV
where U (n n), D(n m) and V (m m), with U and V unitary matrices
(U U = V V = I(nn) ). D is a diagonal matrix with positive values dii ,
i = 1 . . . r and 0 elsewhere. r is the rank of the matrix. dii are called the
singular values of A.

1.5.2

Representation

Let us assume that the model writes


A0 Yt = A1 Yt1 + Bt + Ct

(1.19)

where Yt is a n 1 vector of endogenous variables, t is a 1 vector of


exogenous serially uncorrelated random disturbances, and t is a k 1 vector
of expectation errors satisfying Et1 t = 0 for all t. A0 and A1 are both n n
coefficient matrices, while B is n and C is n k.

40

CHAPTER 1. EXPECTATIONS AND ECONOMIC DYNAMICS

As an example of a model, let us consider the simple macro model


Et yt+1 + Et t+1 = yt + Rt
Et t+1 = t yt
Rt = t + gt
gt = gt1 + t
Let us then recall that by definition of an expectation error, we have
t = Et1 t + t
yt = Et1 yt + ty
Plugging the definition of Rt into the first two equations, and making use of
the definition of expectation errors, the system rewrites
yt = Et1 yt + ty
t = Et1 t + t
Et yt+1 + Et t+1 yt t gt = 0
Et t+1 t + yt = 0
gt = gt1 + t
Now defining

1
0
0
1


1
0
0

Yt = (yt , t , Et yt+1 , Et t+1 , gt ) ,

0 0 0
1 0 0 0
0 1 0 0
0 0 0

0 0 0 0
1 1
Y
=
t

0 0 0 0
0 0
0 0 1
0 0 0 0

the system may be writte10

0
0
1
0
0
0

0 t + 0
0
Y
+
t1

0
0
0

1
0

0
1
0
0
0

A nice feature of this representation is that it makes full use of expectation


errors and therefore may be given a fully interpretable economic meaning.

1.5.3

Solving the system

We now turn to the resolution of the system (1.19). Since, A0 is not necessarily
invertible, we will make full use of the generalized Schur decomposition of
(A0 , A1 ). There therefore exist matrices Q, Z, T and S such that
Q T Z = A0 , Q SZ = A1 , QQ = ZZ = Inn
10

Note that Yt1 = (yt1 , t1 , Et1 yt , Et1 t , gt1 )

 y 
t

1.5. MULTIVARIATE RATIONAL EXPECTATIONS MODELS (II)

41

and T and S are upper triangular. Let us then define Xt = Z Yt and pre
multiply (1.19) by Q to get


 

 

T11 T12
W1,t
S11 S12
W1,t1
Q1
=
+
(Bt + Ct )
0 T22
W2,t
0 S22
W2,t1
Q2
(1.20)
Let us assume, without loss of generality that the system is ordered and partitioned such that the m 1 vector W2,t is purely explosive. Accordingly, the
remaining n m 1vector W1,t is stable. Let us first focus on the explosive
part of the system
T22 W2,t = S22 W2,t1 + Q2 (Bt + Ct )
For this particular block, the diagonal elements of T22 can be null, while S22
is necessarily full rank, as its diagonal elements must be different from zero if
the model is not degenerate. Therefore, the model may be written
1
1
W2,t = M W2,t+1 S22
Q2 (Bt+1 + Ct+1 ) where M S22
T22

Iterating forward, we get


s

W2,t = lim M W2,t+s


t

1
M s1 S22
Q2 (Bt+s + Ct+s )

s=1

In order to get rid of bubbles, we have to impose limt M s W2,t+s = 0, such


that
W2,t =

1
M s1 S22
Q2 (Bt+s + Ct+s )

s=1

Note that by definition of the vector Yt which does not involve any variable
which do not belong to the information set available in t, we should have
Et W2,t = W2,t . But,
Et W2,t = Et

1
M s1 S22
Q2 (Bt+s + Ct+s ) = 0

s=1

This therefore imposes a restriction on t and t . Indeed, if we go back to the


recursive formulation of W2,t and take into account that W2,t = 0 for all t, this
imposes
t
+
Q2 C
t
=
0
Q2 B
|{z}
|{z}
| {z }
|{z}
| {z }
(m ) ( 1)
(m k) (k 1)
(m 1)

(1.21)

42

CHAPTER 1. EXPECTATIONS AND ECONOMIC DYNAMICS

Our problem is now to know whether we can pin down the vector of expectation errors uniquely from that set of restrictions. Indeed, the vector t may
not be uniquely determined. This is the case for instance when the number
of expectation errors k exceeds the number of explosive components m. In
this case, equation (1.21) does not provide enough restrictions to determine
uniquely the vector t . In other words, it is possible to introduce expectation
errors which are not related with fundamental uncertainty the socalled
sunspot variables.
Sims [2000] shows that a necessary and sufficient condition for a stable solution
to exist is that the column space of Q2 B be contained in the column space of
Q2 C:
span(Q2 B) span(Q2 C)
Otherwise stated, we can reexpress Q2 B as a linear function of Q2 C (Q2 B =
Q2 C), implying that k > m. This is actually a generalization of the socalled
Blanchard and Khan condition that states that the number of explosive eigenvalues should be equal to the number of jump variables in the system. Lubik
and Schorfheide [2003] complement this statement by the following lemma.
Lemma 1 Statements (i) and (ii) are equivalent
(i) For every t R , there exists an t Rk such that Q2 Bt + Q2 Ct = 0.
(ii) There exists a (real) k matrix such that Q2 B = Q2 C
Endowed with this lemma, we can compute the set of all solutions (fully determinate and indeterminate solutions), reported in the following proposition.
Proposition 5 (Lubik and Schorfheide [2003]) Let t be a p 1 vector
of sunspot shocks, satisfying Et1 t = 0. Suppose that condition (i) of lemma
1 is satisfied. The full set of solutions for the forecast errors in the linear
rational expectations model is
1
t = (V1 D11
U1 Q2 B + V2 M1 )t + V2 M2 t

where M1 is a (k r) matrix and M2 is a (k r) p matrix.

1.5. MULTIVARIATE RATIONAL EXPECTATIONS MODELS (II)


Proof: First of all, we have to find a solution to equation (1.21). The problem is that the rows of matrix Q2 C can be linearly dependent. Therefore,
we will use the Singular Value Decomposition of Q2 C
Q2 C =

U
|{z}
mm

which may be partitioned as



D11
Q2 C = (U1 U2 )
0

0
0

D
|{z}
mk

V
|{z}
kk



V1
V2

= U1 D11 V1

where D11 is a rr matrix, where r is the number of linearly independent


rows in Q2 C therefore the actual number of restrictions. Accordingly,
U1 is m r, and V1 is k r.
Given that we are looking for a solution that satisfies Q2 B = Q2 C,
equation (1.21) rewrites
U1 D11 (V1 t + V1 t ) = |{z}
0
| {z } |
{z
}
mr
r1
m1

We therefore now have r restrictions to identify the kdimensional vector


of expectation errors.
We guess that the solution implies that forecast errors are a linear function of (i) fundamental shocks and (ii) a p 1 vector of sunspot shocks
t , satisfying Et1 t = 0:
t = t + t
where is k and is k p.
Plugging this guess in the former equation, we get
U1 D11 (V1 + V1 )t + U1 D11 V1 t = 0
for all t and t . This triggers that we should have
U1 D11 (V1 + V1 ) = 0
U1 D11 V1 = 0

(1.22)
(1.23)

Let us first focus on equation (1.22). Since V is an orthonormal matrix,


it satisfies V V = I otherwise stated V1 V1 +V2 V2 = I and V V = I,
implying that V1 V2 = 0. A direct consequence of the first part of this
statement is that
e + V 2 M1
= V1 (V1 ) + V2 (V2 ) = V1

e V and M1 V . Since V V1 = I and V V2 = 0, (1.22)


with
1
2
1
1
therefore rewrites
e ) = 0
U1 D11 (V1 +

from which we get

e = V1

43

44

CHAPTER 1. EXPECTATIONS AND ECONOMIC DYNAMICS


e and therefore . To do so,
We still need to identify to determine
we use the fact that Q2 B = Q2 C and Q2 C = U1 D11 V1 , to get
Q2 B = Q2 C = U1 D11 V1

Since U is orthonormal, we have U1 U1 = I, such that


1
V1 = U1 D11
Q2 B

e , we get
Therefore, plugging this result in the determination of
e = D1 U Q2 B

11 1

e + V2 M1 , we finally get
Since = V1

1
= V1 D11
U1 Q2 B + V2 M1

where M1 is left totally undetermined and therefore arbitrary.


We can now focus on (1.23) to determine . This is actually straightforward as it simply triggers that be orthogonal to V1 . But since
V1 V2 = 0, the orthogonal space of V1 is spanned by the columns of the
k (k r) matrix V2 . In other words, any linear combination of the
column of V2 would do the job. Hence
= V 2 M 2
where once again M2 is left totally undetermined and therefore arbitrary.
2

This last result tells us how to solve the model and under which condition
the system is determined or not. Indeed, let us recall that k is the number
of expectation errors, while r is the number of linearly independent expectation errors. According to this proposition, if k = r, all expectation errors
are linearly independent, and the system is therefore totally determinate. M1
and M2 are identically zeros. Conversely, if k > r expectation errors are not
linearly independent, meaning that the system does not provide enough restrictions to uniquely pin down the expectation errors. We therefore have to
introduce extrinsic uncertainty in the system the socalled sunspot variables. We will deal first with the determinate case, before considering the case
of indeterminate system.
Determinacy
This case occurs when the number of expectation errors exactly matches the
number of explosive components (k = m), or otherwise stated in the case

1.5. MULTIVARIATE RATIONAL EXPECTATIONS MODELS (II)

45

where k = r. As shown in proposition 5, the expectation errors are then just


a linear combination of fundamental disturbances for all t since both M1 and
M2 reduce to nil matrices. Therefore, in this case, we have
1
t = V1 D11
U1 Q2 Bt

such that the overall effect of fundamental shocks on Wt is


1
(Q1 B Q1 CV1 D11
U1 Q2 B)t

while that of purely extrinsic expectation errors is nil. To get such an effect
.
in the first part of system (1.20), we shall premultiply by the matrix [I .. ]
1
where Q1 CV1 D11
U1 . Then, taking into account that W2t = 0, we have

T11 T12 T22


0
I



W1,t
W2,t



S11 S12 S22
W1,t1
=
0
0
W2,t1


Q1 Q2
Bt
+
0

Noting that the inverse of the matrix



is

T11 T12 T22


0
I

1
1
T11
T11
(T12 T22 )
0
I

we have

  1

  1

1
W1,t1
W1,t
T11 (Q1 Q2 )
T11 S11 T11
(S12 S22 )
Bt
+
=
W2,t1
W2,t
0
0
0
Now recall that Wt = Z Yt and that ZZ = I. Therefore, premultiplying the
last equation by Z, we end up with a solution of the form
Yt = My Yt1 + Me t

(1.24)

with
M =Z

1
1
T11
S11 T11
(S12 S22 )
0
0

Z and Me = Z

1
T11
(Q1 Q2 )
0

46

CHAPTER 1. EXPECTATIONS AND ECONOMIC DYNAMICS

Indeterminacy
This case arises as soon as the number of expectation errors is greater than the
number of explosive components (k > m), which translates into the fact that
k > r. As shown in proposition 5, the expectation errors are then not only
linear combinations of fundamental disturbances for all t but also of purely
extrinsic disturbances called sunspot variables. Then, the expectation errors
are shown to be of the form
1
t = (V1 D11
U1 Q2 B + V2 M1 )t + V2 M2 t

where both M1 and M2 can be freely chosen. This actually raises several questions. The first one is how to select M1 and M2 ? They are totally arbitrary
the only restriction we have to impose is that M1 is a (k r) matrix and M2
is a (k r) p matrix. A second one is then how to interpret these sunspots?
In order to partially circumvent these difficulties, it is useful to introduce the
notion of beliefs. For instance, this amounts to introduce new shocks the
sunspots beside the standard expectation error. In such a case, a variable
yt will be determined by its expectation at time t 1, a shock on the beliefs
that leads to a revision of forecasts, and the expectation error
yt = Et1 yt + t + t
where t is the shock on the belief, that satisfies Et1 t = 0, and t is the
expectation error. t is a k 1 vector. Then the system 1.19 rewrites
A0 Yt = A1 Yt1 + Bt + C(t + t )
which can be restated in the form
A0 Yt = A1 Yt1 + B

t
t

+ C t

where B = [B C]. Implicit in this rewriting of the system is the fact that the
belief shock be treated like a fundamental shock, therefore condition (1.21)
rewrites
Q2 B

t
t

+ Q2 C t = 0

which leads, according to proposition 5, to an expectation error of the form


1
1
t = (V1 D11
U1 Q2 B + V2 M1 )t + (V1 D11
U1 Q2 C + V2 M1 )t

1.5. MULTIVARIATE RATIONAL EXPECTATIONS MODELS (II)

47

But, since Q2 C = U1 D11 V1 and V1 V1 + V2 V2 = I, this rewrites


1
t = (V1 D11
U1 Q2 B + V2 M1 )t + V2 (V2 + M1 )t

This shows that the expectation error is a function of both the fundamental
shocks and the beliefs.
If this latter formulation furnishes an economic interpretation to the sunspots,
it leaves unidentified the matrices M1 and M1 . From a practical point of view,
we can, arbitrarily, set these matrices to zeros and then proceed exactly as in
the determinate case, replacing B by B in the solution. This leads to


t
(1.25)
Yt = My Yt1 + Me
t
with
M =Z

1
1
(S12 S22 )
S11 T11
T11
0
0

Z and Me = Z

1
(Q1 Q2 )
T11
0

Note however, that even if we know the form of the solution, we know nothing
about the statistical properties of the t shocks. In particular, we do not know
their covariance matrix that can be set arbitrarily.

1.5.4

Using the model

In this section, we will show you how the solution may be used to study the
dynamic properties of the model from a quantitative point of view. We will
basically address two issues
1. Impulse response functions
2. Computation of moments
Impulse response functions
As we have already seen in the preceding chapter, the impulse response function of a variable to a shock gives us the expected response of the variable to
a shock at different horizons in other words this corresponds to the best
linear predictor of the variable if the economic environment remains the same
in the future. For instance, and just to remind you what it is, let us consider
the case of an AR(1) process:
xt = xt1 + (1 )x + t

48

CHAPTER 1. EXPECTATIONS AND ECONOMIC DYNAMICS

Assume for a while that no shocks occurred in the past, such that xt remained
steady at the level x from t = 0 to T . A unit positive shock of magnitude
occurs in T , xT is then given by
xT = x +

Figure 1.12: Impulse Response Function (AR(1))


6

xt

Time

In T + 1, no other shock feeds the process, such that xT +1 is given by


xT +1 = xT + (1 )x = x +
XT +2 is then given by
xT +2 = xT +1 + (1 )x = x + 2
therefore, as reported in figure 1.12, we have
xT +i = xT +i1 + (1 )x = x + i i > 1
In our system, obtaining impulse response functions is as simple as that, provided the solution has already been computed. Assume we want to compute

1.5. MULTIVARIATE RATIONAL EXPECTATIONS MODELS (II)

49

the response to one of the fundamental shocks (i,t Et ). On impact the


vector of endogenous variables (Yt )responds as

1
Yt = ME ei with eik =
0

if i = k
otherwise

The response as horizon j is then given by:


Yt+j = My Yt+j1 j > 0
Computation of moments
Let us focus on the computation of the moments for this economy. We will
describe two ways to do it. The first one uses a direct theoretical computation
of the moments, while the second one relies on MonteCarlo simulations.
The theoretical computation of moments can be achieved in a straightforward
way. Let us focus for a while on the covariance matrix of the state variables:
yy = E(Yt Yt )
Recall that in the most complicated case, we have
Yt = My Yt1 + ME t
with E(t t ) = ee .
Further, recall that we only consider stationary representations of the economy,
) whatever j. Hence, we have
such that SS = E(St+j St+j

yy = My yy My + My E(Yt1 t )ME + +Me E(t Yt1


)My + Me ee ME

Since both t are innovations, they are orthogonal to Yt , such that the previous
equation reduces to
yy = My yy My + Me ee ME
Solving this equation for SS can be achieved remembering that vec(ABC) =
(A C )vec(B), hence
vec(yy ) = (I My My )1 vec(ee )

50

CHAPTER 1. EXPECTATIONS AND ECONOMIC DYNAMICS

The computation of covariances at leads and lags proceeds the same way. For
). From
instance, assume we want to compute jSS = E(St Stj

Yt = My Yt1 + Me t
we know that
Yt =

Myj Ytj

+ Me

j
X

Myi ti

i=0

Therefore,

E(Yt Ytj
)

Myj E(Ytj Ytj


)

+ Me

j
X

Myi E(ti Ytj


)

i=0

Since are innovations, they are orthogonal to any past value of Y , such that

0
if i < j

E(ti Ytj ) =

ee Me if i = j
Then, the previous equation reduces to

E(Yt Ytj
) = Myj yy + Me Myj ee Me

The MonteCarlo simulation is as simple as computing Impulse Response


Functions, as it just amounts to simulate a process for , impose an initial
condition for Y0 and to iterate on
Yt = My Yt1 + Me t for t = 0, . . . , T
Then moments can be computed and stored in a matrix. The experiment is
conducted N times, as N one can compute the asymptotic distribution
of the moments.

1.6

Economic examples

This section intends to provide you with some economic applications of the set
of tools we have described up to now. We will consider three examples, two of
which may be thought of as micro examples. In the first one a firm decides on
its labor demand, the second one is a macro model and endogenous growth
model `
a la Romer [1986] which allows to show that even a nonlinear model
may be expressed in linear terms and therefore may be solved in a very simple
way. The last one deals with the socalled Lucas critique which has strong
implications on the econometric side.

1.6. ECONOMIC EXAMPLES

1.6.1

51

Labor demand

We consider the case of a firm that has to decide on its level of employment.
The firm is infinitely lived and produces a good relying on a decreasing returns
to scale technology that essentially uses labor another way to think of it
would be to assume that physical capital is a fixedfactor. This technology is
represented by the production function
Yt = f0 nt

f1 2
n with f0 , f1 > 0.
2 t

Using labor incurs two sources of cost


1. The standard payment for labor services: wt nt where wt is the real wage,
which positive sequence {wt }
t=0 is taken as given by the firm
2. A cost of adjusting labor which may be justified either by appealing to
reorganization costs, training costs, and that takes the form

(nt nt1 )2 with > 0


2
Labor is then determined by maximizing the expected intertemporal profit

s 

X
1

f1 2
2
max Et
f0 nt+s nt+s wt+s nt+s (nt+s nt+s1 )
1+r
2
2
{n }
=0
s=0

First order conditions:

Finding the first order conditions associated to

this dynamic optimization problem may be achieved in various ways. Here,


we will follow Sargent [1987] and will adopt the Lagrangean approach. Let
us fix s for a while and make some accountancy in order to find all the terms
involving nt+s
in s i, i = 2, . . .
in s 1
in s
in s + 1
in s + i, i = 2, . . .

none
none


s 
1

f1 2
2
Et
f0 nt+s nt+s wt+s nt+s (nt+s nt+s1 )
1+r
2
2
s+1 



1
(nt+s+1 nt+s )2
Et
1+r
2
none

52

CHAPTER 1. EXPECTATIONS AND ECONOMIC DYNAMICS

Hence, finding the optimality condition associated to nt+s reduces to maxi-

mizing



s 

1
1

f1 2
2
2
Et
(nt+s+1 nt+s )
f0 nt+s nt+s wt+s nt+s (nt+s nt+s1 )
1+r
2
2
1+r 2
which yields the following first order condition

s 



1
1
Et
f0 f1 nt+s wt+s (nt+s nt+s1 ) +
(nt+s+1 nt+s ) = 0
1+r
1+r
since r is a constant this reduces to




1
(nt+s+1 nt+s ) = 0
Et f0 f1 nt+s wt+s (nt+s nt+s1 ) +
1+r
Now remark that this relationship holds whatever s, such that we may restrict
ourselves to the case s = 0 which then yields noting that nti , i > 0 belongs
to the information set
f0 f1 nt wt (nt nt1 ) +

(Et nt+1 nt ) = 0
1+r

rearranging terms


1+r
f1 (1 + r)
nt + (1 + r)nt1 +
(f0 wt ) = 0
Et nt+1 2 + r +

Finally we have the transversality condition


lim (1 + r)T (nT nT 1 )nT = 0

T +

Solving the model:

In this example, we will apply all three methods that

we have described previously. Let us first start with factorization.


The preceding equation may be rewritten using the forward operator as




f1 (1 + r)
1+r
2
P (F )nt1 F 2 + r +
(wt f 0)
F + 1 + r nt1 =

P (F ) may be factorized as
P (F ) = (F 1 )(F 2 )
Let us compute the discriminant of this second order polynomial




f1 (1 + r) 2
f1
f1
2+r+
(1 + r) + 2(2 + r) > 0
4(1 + r) = (1 + r)

1.6. ECONOMIC EXAMPLES

53

Hence, since > 0, we know that the two roots are real. Further
f1 (1 + r)
<0

f1
P (1) = (1 + r) + 2(2 + r) > 0

P (0) = 1 + r > 0


f1 (1 + r)
1

2+r+
>1
P (x) = 0 x =
2

P (1) =

P (0) being greater than 0 and since P(1) is negative, one root lies between
0 and 1, and the other one is therefore greater than 1 since lim P (x) = .
x

The system therefore satisfies the saddle path property.


Let us assume then that 1 < 1 and 2 > 1. The expectational equation
rewrites
(F 1 )(F 2 )nt1 = wt f0 (F 1 )nt1 =

1 + r wt f 0
F 2

or
nt = 1 nt1 +

1 + r f0 wt
1 + r X i
2 Et (f0 wt+i )
=

n
+
1 t1
2 1 1
2
2 F
i=0

Since 1 2 = (1 + r), this rewrites

1 X i
nt = 1 nt1 +
2 Et (f0 wt+i )

i=0

or developing the series

f0 (1 + r)
1 X i
2 Et wt+i
nt =
+ 1 nt1
(2 1)

i=0

For practical purposes let us assume that wt follows an AR(1) process of the
form
wt = wt1 + (1 )w + t
we have
Et wt+i = i wt + (1 i )w
such that nt rewrites
nt =

f0 (1 + r)
1+r
(1 + r)(1 )
w + 1 nt1

wt
(2 1) (2 1)(2 )
( 2 )

54

CHAPTER 1. EXPECTATIONS AND ECONOMIC DYNAMICS

We now consider the problem from the method of undetermined coefficients


point of view, and guess that the solution takes the form
nt = 0 + 1 nt1 +

i Et wt+i

i=0

Plugging the guess in t + 1 in the Euler equation, we get


!
!

X
X
i Et+1 wt+i+1
i Et wt+i +
Et 0 + 1 0 + 1 nt1 +
i=0

i=0

!



X
f1 (1 + r)
i Et wt+i
0 + 1 nt1 +
2+r+

i=0

1+r
+(1 + r)nt1 +
(f0 wt ) = 0

which rewrites
0 (1 + 1 ) +

12 nt1

+ 1

i Et wt+i +

i=0



f1 (1 + r)
0 + 1 nt1 +
2+r+

+(1 + r)nt1 +

1+r
(f0 wt ) = 0

i Et wt+i+1

i=0

i Et wt+i

i=0

Identifying term by term, we get the system



f1 (1+r)

(1
+

2
+
r
+
0 + 1+r

0
1
f0 = 0




f
(1+r)
2 2 + r + 1
1 + (1 + r) = 0
1




f
(1+r)
1

0 1 2 + r +
1+r

=0





f
(1+r)
1
i 1 2 + r +
+ i1 = 0

The second equation of the system exactly corresponds to the second order
polynomial we solved in the factorization method. The system therefore exhibits the saddle path property so that 1 (0, 1) and 2 (1, ). Let us
recall that 1 + 2 = 2 + r + f1 (1 + r)/, such that the system for 0 and i
rewrites

0 (1 + 1 ) 2 + r +
0 2 1+r
=0

1
i = 2 i1

f1 (1+r)

0 +

1+r
f0

=0

1.6. ECONOMIC EXAMPLES

55

Therefore, we have
0 =

1+r
1
=
2

and i = i
2 0 . Finally, we have
0 =

f0 (1 + r)
(2 1)

We then find the previous solution

1 X i
f0 (1 + r)
2 Et wt+i
+ 1 nt1
nt =
(2 1)

i=0

As a final exercise, let us adopt the matricial approach to the problem. To


do so, and because this approach is essentially numerical, we need to assume a
particular process for the real wage. We will assume that it takes the preceding
AR(1) form. Further, we do not need to deal with levels in this approach such
that we will express the model in terms of deviation from its steady state. We
thus first compute this quantity, which is defined by


f1 (1 + r)
f0 w
1+r

n 2+r+
(f0 w) = 0 n =
n + (1 + r)n +

f1
Denoting n
bt = nt n and w
bt = wt w, and introducing the technical

variable zbt+1 = n
bt , the Labor demand reexpresses as


1+r
f1 (1 + r)
w
bt = 0
n
bt + (1 + r)b
zt
Et n
bt+1 2 + r +

We define the vector Yt = {b


zt+1 , n
bt , w
bt , Et n
bt+1 }. Remembering that n
bt =

Et1 n
bt + t , the system expresses as

1
0
0
1

1
1
0

2+r+

f1 (1+r)

0
0
1

0
0
0

1+r

zbt+1
n
bt
w
bt
Et n
bt+1

0
0
0
(1 + r)

0
0
1
0

0
0
0
0

0
0

t +

0
1
0
0
0
1
0
0

zbt
n
bt1
w
bt1
Et1 n
bt

We now provide you with an example of the type of dynamics this model
may generate. Figure 1.13 reports the impulse response function of labor to

56

CHAPTER 1. EXPECTATIONS AND ECONOMIC DYNAMICS


Table 1.1: Parameterization: labor demand
r
0.01

f0
1

f1
0.2

0.001/1

w
0.6

0.95

a positive shock on the real wage (table 1.1 reports the parameterization).
As expected, labor demand shifts downward instantaneously, but depending
on the size of the adjustment cost, the magnitude of the impact effect differs. When adjustment costs are low, the firm drastically cuts employment,
which goes back steadily to its initial level as the effects of the shock vanish. Conversely, when adjustment costs are high, the firm does not respond as
Figure 1.13: Impulse Response to a Wage Shock
Small adjustment costs ( = 0.001)
Real Wage

Labor Demand

0.8

0.6

0.4

0.2
0

10
Time

15

5
0

20

10
Time

15

20

15

20

High adjustment costs ( = 1)


Real Wage

Labor Demand

1.5
2

0.8

2.5
0.6
3
0.4
0.2
0

3.5
5

10
Time

15

20

4
0

10
Time

much as before since it wants to avoid paying the cost. Nevertheless, it remains
optimal to cut employment, so in order to minimize the cost, the firm spreads
it intertemporally by smoothing the employment profile, therefore generating
a hump shaped response of employment.

1.6. ECONOMIC EXAMPLES


Matlab Code: Labor Demand
%
% Labor demand
%
clear all
%
% Structural Parameters
%
r
= 0.02;
f0
= 1;
wb
= 0.6;
f1
= 0.2;
phi
= 1;
rho
= 0.95;
nb
= (f0-wb)/f1;
A0=[
1 -1 0 0
0 1 0 0
0 0 1 0
0 -(2+r+f1*(1+r)/phi) -(1+r)/phi 1
];
A1=[
0 0 0 0
0 0 0 1
0 0 rho 0
-(1+r) 0 0 0
];
B=[0;0;1;0];
C=[0;1;0;0];
% Call Sims Routine
[MY,ME] = sims_solve(A0,A1,B,C);
%
% IRF
%
nrep
= 20;
SHOCK = 1;
YS
= zeros(4,nrep);
YS(:,1)= ME*SHOCK;
for i = 2:nrep;
YS(:,i)=MY*YS(:,i-1);
end
T=1:nrep;
subplot(221);plot(T,Y(3,:));
subplot(222);plot(T,Y(2,:));

57

58

1.6.2

CHAPTER 1. EXPECTATIONS AND ECONOMIC DYNAMICS

The Real Business Cycle Model

We consider an economy that consists of a large number of dynastic households and a large number of firms. Firms are producing a homogeneous final
product that can be either consumed or invested by means of capital and labor
services. Firms own their capital stock and hire labor supplied by the households. Households own the firms. In each and every period three perfectly
competitive markets open the markets for consumption goods, labor services, and financial capital in the form of firms shares. Household preferences
are characterized by the lifetime utility function:
Et

X
s=0

h1+
log(ct+s ) t+s
1+
s

where 0 < < 1 is a constant discount factor, ct is consumption in period


t, ht is the fraction of total available time devoted to productive activity in
period t, > 0 and > 0. We assume that there exists a central planner
that determines hours, consumption and capital accumulation maximizing the
households utility function subject to the following budget constraint
ct + it = yt

(1.26)

where it is investment, and yt is output. Investment is used to form physical


capital, which accumulates in the standard form as:
kt+1 = it + (1 )kt with 0 6 6 1

(1.27)

where is the constant physical depreciation rate.


Output is produced by means of capital and labor services, relying on a constant returns to scale technology represented by the following Cobb Douglas
production function:
yt = at kt h1
with 0 < < 1
t

(1.28)

at represents a stochastic shock to technology or Solow residual, which evolves


according to:
log(at ) = log(at1 ) + (1 ) log(a) + t

(1.29)

1.6. ECONOMIC EXAMPLES

59

The unconditional mean of at is a, || < 1 and t is a gaussian white noise


with standard deviation of . Therefore, the central planner solves
max

{ct+s ,kt+1+s }
s=0

Et

s log(ct+s )

s=0

s.t.

h1+
t+s
1+

kt+1 =yt = at kt h1
ct + (1 )kt
t
log(at ) = log(at1 ) + (1 ) log(a) + t
The set of conditions characterizing the equilibrium is given by
yt
h
t ct =(1 )
ht
yt =at kt h1
t
yt =ct + it
kt+1 =it + (1 )kt



yt+1
ct

+1
1 =Et
ct+1
kt+1

(1.30)
(1.31)
(1.32)
(1.33)
(1.34)

and the transversality condition

kt+1+s
=0
ct+s
The problem with this dynamic system is that it is fundamentally nonlinear
lim s

and therefore the methods we have developed so far are not designed to handle
it. The usual way to deal with this type of system is then to take a linear
or loglinear approximation of each equation about the deterministic steady
state. Therefore, the first step is to find the deterministic steady state.
Deterministic steady state

Recall that the steady state value of a vari-

able, x, is the value x such that xt = x for all t. Therefore, the steady state
of the RBC model is characterized by the set of equations:
y
h c =(1 )
h
y =ak h 1

(1.35)
(1.36)

y t =c + i

(1.37)

k =i + (1 )k


y
1 =Et
+1
k

(1.38)
(1.39)

60

CHAPTER 1. EXPECTATIONS AND ECONOMIC DYNAMICS

From equation (1.38), we get


i = k

i
k
=

y
y

Then, equation (1.39) implies


1 (1 )
y
=
k

such that from the previous equation and (1.37)


i

c
=
=
s
=
= 1 si
c
y
1 (1 )
y

si

Then, from (1.35), we obtain

h =

1
sc

1
1+

Finally, it follows from the production function and the definition of y /k


that

y =a

1 (1 )

h , c = sc y , i = y c .

We are now in position to loglinearize the dynamic system.


Loglinearization:

A common practice in the macro literature is to take a

loglinear approximation to the equilibrium. Such an approximation is usually


taken because it delivers a natural interpretation of the coefficients in front of
the variables: these can be interpreted as elasticities. Indeed, lets consider
the following onedimensional function f (x) and lets assume that we want to
take a loglinear approximation of f around x. This would amount to have,
as deviation, a logdeviation rather than a simple deviation, such that we can
define
x
b = log(x) log(x )

Then, a restatement of the problem is in order, as we are to take an approximation with respect to log(x):
f (x) f (exp(log(x)))
which leads to the following first order Taylor expansion
f (x) f (x ) + f (exp(log(x )))exp(log(x ))b
x = f (x ) + f (x )x x
b

1.6. ECONOMIC EXAMPLES

61

Now, remember that by definition of the deterministic steady state, we have


f (x ) = 0, such that the latter equation reduces to
f (x) f (x )x x
b

Applying this technic to the system (1.30)(1.34), we end up with the system
(1 + )b
ht + b
ct ybt

(1.40)

ybt (1 )b
ht b
ht b
at = 0

(1.41)

ybt sc b
ct sibit = 0

(1.42)

b
kt+1 bit (1 )b
kt = 0

(1.43)

Et b
ct+1 b
ct (1 (1 ))(Et ybt+1 Et b
kt+1 )

(1.44)

b
at b
at1 bt

(1.45)

Note that only the last three equations of the system involve dynamics, but
they depend on variables that are defined in the first three equations. Either
we solve the first three equations in terms of the state and costate variables,
or we adapt a little bit the method. We choose the second solution.
Let us define Yt = {b
kt+1 , b
at , Et b
ct+1 } and Xt = {b
yt , b
ct , bit , b
ht }. The system can

be rewritten as a set of two equations. The first one gathers static equations
x Xt = y Yt1 + t + t

where t is the vector of expectation errors, which actually reduces to that


attached on b
ct , and

1
0
0 1

1
0
0
0
x =
=
1 sc si
0 y 0
1 1
0
1
0

0
0
0

0
1

1
0
=
0 0
0
0

The second one gathers the dynamic equations


0y Yt + 0x Et Xt+1 = 1y Yt1 + 1x + t + t


= 1

0
0

62

CHAPTER 1. EXPECTATIONS AND ECONOMIC DYNAMICS

with

0
0 0 0
1
0 0
0
0 0 0
0
1 0 0x =
0y =
(1 (1 )) 0 0 0
1 (1 ) 0 1
0 0 0
1 0 0
0
1x = 0 0 0 0
1y = 0
0
0 0

0 1 0 0
0
0

1
0
=
=
0
0

From the first equation, we obtain

Xt = y Yt1 + t + t
where j = 1
x j , j = {y, , }. Furthermore, remembering that Et t+1 =
Et t+1 = 0, we have Et Xt+1 = y Yt . Hence, plugging this result and the first
equation in the second equation we get
A0 Yt = A1 Yt+1 + Bt + Ct
where A0 = 0y +0x y , A1 = 1y +1x y , B = +0x and C = +0x .
We then just use the algorithm as described previously.
Then, we make use of the result in proposition 5, to get t . Since it turns
out that the model is determinate, the expectation error is a function of the
fundamental shock t
1
t = V1 D11
U1 Q2 Bt

Plugging this result in the equation governing static equations, we end up with
1
Xt = y Yt1 + (e V1 D11
U1 Q2 B)t

Figure 1.14 then reports the impulse response function to a 1% technology


shock. These IRFs are obtained using the set of parameters reported in table
1.2.
Matlab Code: The RBC Model
clear all
% Clear memory
%
% Structural parameters
%
alpha
= 0.4;

1.6. ECONOMIC EXAMPLES

63

Table 1.2: The Real Business Cycle Model: parameters

0.4

0.988

0.025

delta
= 0.025;
rho
= 0.95;
beta
= 0.988;
%
% Deterministic Steady state
%
ysk = (1-beta*(1-delta))/(alpha*beta);
ksy = 1/ysk;
si = delta/ysk;
sc = 1-si;
% Define:
%
% Y=[k(t+1) a(t+1) E_tc(t+1)]
%
% X=[y,c,i,h]
%
ny = 3; % # of variables in vector Y
nx = 4; % # of variables in vector X
ne = 1; % # of fundamental shocks
nn = 1; % # of expectation errors
%
% Initialize the Upsilon matrices
%
UX=zeros(nx,nx);
UY=zeros(nx,ny);
UE=zeros(nx,ne);
UN=zeros(nx,nn);
G0Y=zeros(ny,ny);
G1Y=zeros(ny,ny);
G0X=zeros(ny,nx);
G1X=zeros(ny,nx);
GE=zeros(ny,ne);
GN=zeros(ny,nn);
%
% Production function
%
UX(1,1)=1;
UX(1,4)=alpha-1;
UY(1,1)=alpha;
UY(1,2)=rho;
UE(1)=1;
%
% Consumption c(t)=E(c(t)|t-1)+eta(t)
%

0.95

64

CHAPTER 1. EXPECTATIONS AND ECONOMIC DYNAMICS

UX(2,2)=1;
UY(2,3)=1;
UN(2)=1;
%
% Resource constraint
%
UX(3,1)=1;
UX(3,2)=-sc;
UX(3,3)=-si;
%
% Consumption-leisure arbitrage
%
UX(4,1)=-1;
UX(4,2)=1;
UX(4,4)=1;
%
% Accumulation of capital
%
G0Y(1,1)=1;
G1Y(1,1)=1-delta;
G1X(1,3)=delta;
%
% Productivity shock
%
G0Y(2,2)=1;
G1Y(2,2)=rho;
GE(2)=1;
%
% Euler equation
%
G0Y(3,1)=1-beta*(1-delta);
G0Y(3,3)=1;
G0X(3,1)=-(1-beta*(1-delta));
G1X(3,2)=1;
%
% Solution
%
% Step 1: solve the first set of equations
%
PIY = inv(UX)*UY;
PIE = inv(UX)*UE;
PIN = inv(UX)*UN;
%
% Step 2: build the standard System
%
A0 = G0Y+G0X*PIY;
A1 = G1Y+G1X*PIY;
B
= GE+G1X*PIE;
C
= GN+G1X*PIN;
%
% Step 3: Call Sims routine
%
[MY,ME,ETA,MU_]=sims_solve(A0,A1,B,C);
%

1.6. ECONOMIC EXAMPLES

65

% Step 4: Recover the impact function


%
PIE=PIE-PIN*ETA;
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%
%
%
Impulse Response Functions
%
%
%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
nrep
= 20;
% horizon of responses
YS
= zeros(3,nrep);
XS
= zeros(4,nrep);
Shock
= 1;
YS(:,1) = ME*Shock;
XS(:,1) = PIE;
for t=2:nrep;
YS(:,t) = MY*YS(:,t-1);
XS(:,t) = PIY*YS(:,t-1);
end
subplot(221);plot(XS(1,:));title(Output);xlabel(Time)
subplot(222);plot(XS(2,:));title(Consumption);xlabel(Time)
subplot(223);plot(XS(3,:));title(Investment);xlabel(Time)
subplot(224);plot(XS(4,:));title(Hours worked);xlabel(Time)

1.6.3

A model with indeterminacy

Let us consider the simplest new keynesian model, with the following IS curve
yt = Et yt+1 (it Et t+1 ) + gt
where yt denotes output, t is the inflation rate, it is the nominal interest rate
and gt is a stochastic shock that follows an AR(1) process of the form
gt = g gt1 + gt
the model also includes a Phillips curve that relates positively inflation to the
output gap
t = yt + Et t+1 + ut
where ut is a supply shock that obeys
ut = u ut1 + ut
For stationarity purposes, we have |g | < 1 and |u | < 1.
The model is closed by a simple Taylor rule of the form
it = t + y yt

66

CHAPTER 1. EXPECTATIONS AND ECONOMIC DYNAMICS

Figure 1.14: IRF to a technology shock


Output

Consumption

0.8

1.8
0.7
1.6
1.4

0.6

1.2
0.5
1
0.8
0

10
Time

15

20

0.4
0

Investment

10
Time

15

20

15

20

Hours worked

1.5

5
4

3
2

0.5

1
0
0

10
Time

15

20

0
0

10
Time

1.6. ECONOMIC EXAMPLES

67

Plugging this rule in the first equation, and remembering the definition of
expectation errors, the system rewrites
yt =Et1 yt + ty
t =Et1 t + t
gt =g gt1 + gt
ut =u ut1 + ut
(1 + y )yt =Et yt+1 t + Et t+1 + gt
t =yt + Et t+1 + ut
Defining Yt = {yt , t , gt , ut , Et yt+1 , Et t+1 } and t = {ty , t }, the system
rewrites

1
0
0
0
0
0

0
1
0
0
0
0

0
0
1
0
0
0

0
0
0
1
0
0

1 + y + 1 0 1

1
0 1 0

0
0

Yt = 0
0

0 0 0 1
0 0 0 0
0 g 0 0
0 0 u 0
0 0 0 0
0 0 0 0

0 0

0 0

1 0
t +

0 1

0 0
0 0

0
1
0
0
0
0
1
0
0
0
0
0

Yt1

0
1
0
0
0
0

The set of parameter used in the numerical experiment is reported in table


1.3. As predicted by theory of Taylor rules, a coefficient below 1 yields
indeterminacy.
Table 1.3: New Keynesian model: parameters

0.4

0.9

g
0.9

u
0.9

y
0.25

1.5/0.5

Matlab Code: A Model with Real Indeterminacy


clear all
% Clear memory
%
% Structural parameters

68

CHAPTER 1. EXPECTATIONS AND ECONOMIC DYNAMICS

%
alpha
= 0.4;
gy
= 0.25;
gp
= 0.5;
rho_g
= 0.9;
rho_u
= 0.95;
lambda = 1;
beta
= 0.9;
% Define:
%
% Y=[y(t),pi(t),g(t),u(t),E_t y(t+1),E_t pi(t+1)]
%
ny = 6; % # of variables in vector Y
ne = 2; % # of fundamental shocks
nn = 2; % # of expectation errors
%
% Initialize the matrices
%
A0 = zeros(ny,ny);
A1 = zeros(ny,ny);
B
= zeros(ny,ne);
C
= zeros(ny,nn);
%
% Output
%
A0(1,1) = 1;
A1(1,5) = 1;
C(1,1) = 1;
%
% Inflation
%
A0(2,2) = 1;
A1(2,6) = 1;
C(2,2) = 1;
%
% IS shock
%
A0(3,3) = 1;
A1(3,3) = rho_g;
B(3,1) = 1;
%
% Supply shock
%
A0(4,4) = 1;
A1(4,4) = rho_u;
B(4,2) = 1;
%
% IS curve
%
A0(5,1) = 1+alpha*gy;
A0(5,2) = alpha*gp;
A0(5,3) = -1;
A0(5,5) = -1;
A0(5,6) = -alpha;

1.6. ECONOMIC EXAMPLES

69

%
% Phillips Curve
%
A0(6,1) = -lambda;
A0(6,2) = 1;
A0(6,4) = -1;
A0(6,6) = -beta;
%
% Call Sims routine
%
[MY,ME,ETA,MU_]=sims_solve(A0,A1,B,C);

1.6.4

AK growth model

Up to now, we have considered quadratic objective function in order to get


linear expectational difference equations. This may seem to be very restrictive.
However, there is a number of situations, where the dynamics generated by the
model is characterized by a linear expectational difference equation, despite
the objective function is not quadratic. We provide you with such an example
in this section.
We consider an endogenous growth model a` la Romer [1986] extended to a
stochastic environment. The economy consists of a large number of dynastic
households and a large number of firms. Firms are producing a homogeneous
final product that can be either consumed or invested by means of capital, but
contrary to the standard optimal growth model, returns to factors that can
be accumulated (namely capital) are exactly constant.
Household decides on consumption, Ct , and capital accumulation (or savings),
Kt+1 , maximizing her lifetime expected utility
max Et

s log(Ct+s )

s=0

subject to the resource constraint in the economy


Yt = Ct + It
and the law of motion of capital
Kt+1 = It + (1 )Kt with [0; 1]
It is investment, Yt denotes output, which is produced using a linear technology
of the form Yt = At Kt . At is a stochastic shock that we leave unspecified for

70

CHAPTER 1. EXPECTATIONS AND ECONOMIC DYNAMICS

the moment. We may think of it as a shift on the technology, such that it


represents a technology shock.
First order conditions:

We now present the derivation of the optimal be-

havior of the consumer. The first order condition associated to the consumption/savings decisions may be obtained forming the following Lagrangean,
where t is the multiplier associated to the resource constraint
Lt = Et

s log(Ct+s ) + t (At Kt + (1 )Kt Ct Kt+1 )

s=0

Terms involving Ct :
max Et (log(Ct ) t Ct ) = max (log(Ct ) t Ct )
{Ct }

{Ct }

Therefore, the FOC associated to consumption writes


1
= t
Ct
Likewise for the saving decision, terms involving Kt+1 :
max t Kt+1 + Et [t+1 (At+1 Kt+1 + (1 )Kt+1 )]

{Kt+1 }

such that the FOC is given by


t = Et [t+1 (At+1 + 1 )]
Finally, we impose the socalled transversality condition


KT +1
T
lim Et
=0
T
CT
Solving the dynamic system:

Plugging the first order condition on con-

sumption in the Euler equation, we get




1
1
= Et
(At+1 + 1 )
Ct
Ct+1
This system seems to be nonlinear, but we can make it linear very easily.
Indeed, let us multiply both sides of the Euler equation by Kt+1 , we get


Kt+1
Kt+1
= Et
(At+1 + 1 )
Ct
Ct+1

1.6. ECONOMIC EXAMPLES

71

But the resource constraint states that


Kt+1 + Ct = Kt (At + 1 ) Kt+2 + Ct+1 = Kt+1 (At+1 + 1 )
such that the Euler equation rewrites




Kt+2 + Ct+1
Kt+2
Kt+1
= Et 1 +
= Et
Ct
Ct+1
Ct+1
Let us denote Xt = Kt+1 /Ct , the latter equation rewrites
Xt = Et (1 + Xt+1 )
which has the same form as (1.2). As we have already seen, the solution for
such an equation can be easily obtained iterating forward. We then get
Xt = lim

T
X
k=0

k + lim T Et (XT +1 )
T

The second term in the right hand side of the latter equation corresponds
precisely to the transversality condition. Hence, Xt reduces to
Xt =

Kt+1 =
Ct
1
1

Plugging this relation in the resource constraint, we get


Kt+1 = (At + 1 )Kt
and
Ct = (1 )(At + 1 )Kt
Time series properties

Let us consider the solution for capital accumula-

tion. Taking logs, we get


log(Kt+1 ) = log(Kt ) + log((At + 1 ))
since At is an exogenous stochastic process, we immediately see that the process may be rewritten as
log(Kt+1 ) = log(Kt ) + t
where t log((At + 1 )). Such that we see that capital is an non
stationary process (an I(1) process) more precisely a random walk. Since

72

CHAPTER 1. EXPECTATIONS AND ECONOMIC DYNAMICS

consumption, output and investment are just a linear function of capital, the
nonstationarity of capital translates into the non stationarity of these variables. Nevertheless, as can be seen from the law of motion of consumption, for
example, log(Ct ) log(Kt ) is a stationary process. Kt and Ct are then said
to be cointegrated with a cointegrating vector (1, 1).
This has extremely important economic implications, that may be analyzed
in the light of the impulse response functions, reported in figure 1.15. In fact,
figure 1.15 reports two balanced growth paths for each variable: The first one
corresponds to the path without any shock, the second one corresponds to
the path that includes a non expected positive shock on technology in period
10. As can be seen, this shock yields a permanent increase in all variables.
Therefore, this model can account for the fact that countries may not converge.
Why is that so? The answer to this question is actually simple and may be
Figure 1.15: Impulse response functions
Output

Consumption

0.052

0.0135

0.05

0.013

0.048

0.0125

0.046

0.012

0.044

0.0115

0.042

0.011

0.04

0.0105

0.038

10

20

30

40

50

0.01

10

20

30

Time

Time

Investment

Capital

40

50

40

50

0.038
1.3

0.036
0.034

1.2

0.032
1.1
0.03
0.028

10

20

30
Time

40

50

10

20

30
Time

understood if we go back to the simplest Solow growth model. Assume there


is a similar shock in the Solow growth model, output increases on impact
and since income increases so does investment yielding higher accumulation.
Because the technology displays decreasing returns to capital in the solow
growth model, the marginal efficiency of capital decreases reducing incentives

1.6. ECONOMIC EXAMPLES

73

to investment so that capital accumulation slows down. The economy then


goes back to its steady state. Things are different in this model: Following
a shock, income increases. This triggers faster accumulation, but since the
marginal productivity of capital is totally determined by the exogenous shock,
there is no endogenous force that can drive the economy back to its steady
state. Therefore, each additional capital is kept forever.
This implies that following shocks, the economy will enter an ever growing
regime. This may be seen from figure 1.16 which reports a simulated path for
each variable. These simulated data may be used to generate time moments on
Figure 1.16: Simulated data
Output

Consumption

0.09

0.024

0.08

0.022
0.02

0.07

0.018
0.06
0.016
0.05

0.014

0.04
0.03

0.012
0

50

100
Time

150

200

0.01

50

Investment

100
Time

150

200

150

200

Capital

0.07

2.5

0.06
2
0.05
0.04
1.5
0.03
0.02

50

100
Time

150

200

50

100
Time

the rate of growth of each variable, which estimates are reported in table (1.4)
and which distributions are represented in figures 1.171.20. It is interesting
to note that all variables exhibit when taken in loglevels a spurious
correlation with output that just reflects the existence of a common trend due
to the balanced growth path hypothesis.

74

CHAPTER 1. EXPECTATIONS AND ECONOMIC DYNAMICS

Table 1.4: MonteCarlo Simulations

Corr(.,Y )

Corr(.,Y )

Y
0.40
0.79
1.00
-0.01
Y
0.99

C
0.40
0.09
0.30
0.93
C
0.99

I
0.40
1.06
0.99
-0.02
I
0.99

K
0.40
0.09
-0.08
0.93
K
0.99

Figure 1.17: Rates of growth: distribution of mean


Output

Consumption

200

200

150

150

100

100

50

50

4
Time

4
Time

x 10

Investment
200

150

150

100

100

50

50

4
Time

6
3

x 10

Capital

200

6
3

x 10

0
2.5

3.5

4
Time

4.5

5.5
3

x 10

1.6. ECONOMIC EXAMPLES

75

Figure 1.18: Rates of growth: distribution of standard deviation


Output

Consumption

200

200

150

150

100

100

50

50

8
Time

10
x 10

0.5

1
Time

Investment
200

150

150

100

100

50

50

0.01

2
x 10

Capital

200

0
0.008 0.009

1.5

0.011 0.012 0.013 0.014


Time

0.5

1
Time

1.5

2
x 10

Figure 1.19: Rates of growth: distribution of correlation with Y


Output

Consumption

5000

250

4000

200

3000

150

2000

100

1000

50

0
60

40

20

0
Time

20

40

60

0
0.2

0.25

Investment

0.3

0.35
Time

0.4

0.45

Capital

250

200

200

150

150
100
100
50

50
0
0.998

0.9985

0.999 0.9995
Time

1.0005

0
0.4

0.2

0
Time

0.2

0.4

76

CHAPTER 1. EXPECTATIONS AND ECONOMIC DYNAMICS

Figure 1.20: Rates of growth: distribution of first order autocorrelation


Output

Consumption

200

250
200

150

150
100
100
50

0
0.4

50

0.2

0
Time

0.2

0.4

0
0.7

0.75

Investment
200

150

150

100

100

50

50

0.2

0
Time

0.85
Time

0.9

0.95

Capital

200

0
0.4

0.8

0.2

0.4

0
0.4

0.2

0
Time

0.2

0.4

1.6. ECONOMIC EXAMPLES


Matlab Code: AK growth model
%
% AK growth model
%
long
= 200;
nsim
= 5000;
nrep
= 50;
%
% Structural parameters
%
gx
= 1.004;
beta
= 0.99;
delta
= 0.025;
rho
= 0.95;
se
= 0.0079;
ab
= (gx-beta*(1-delta))/beta;
K0
= 1;
%
% IRF
%
K1(1)
= K0;
K2(1)
= K0;
a2
= zeros(nrep,1);
K1
= zeros(nrep,1);
K2
= zeros(nrep,1);
K1(1)
= K0;
K2(1)
= K0;
a2(1)
= log(ab);
e
= zeros(nrep,1);
e(11)
= 10*se;
T=[1:nrep];
for i
= 2:nrep;
a2(i)= rho*a2(i-1)+(1-rho)*log(ab)+e(i);
K1(i)= beta*(ab+1-delta)*K1(i-1);
K2(i)= beta*(exp(a2(i-1))+1-delta)*K2(i-1);
end;
C1 = (1-beta)*(ab+1-delta).*K1;
Y1 = ab*K1;
I1 = Y1-C1;
C2 = (1-beta)*(exp(a2)+1-delta).*K2;
Y2 = exp(a2).*K2;
I2 = Y2-C2;
Y=[Y1(:) Y2(:)];
C=[C1(:) C2(:)];
K=[K1(:) K2(:)];
I=[I1(:) I2(:)];
%
% Simulations
%
cx=zeros(nsim,4);
mx=zeros(nsim,4);
sx=zeros(nsim,4);
rx=zeros(nsim,4);

77

78

CHAPTER 1. EXPECTATIONS AND ECONOMIC DYNAMICS

for s
= 1:nsim;
disp(s)
randn(state,s);
e
= randn(long,1)*se;
a
= zeros(long,1);
K
= zeros(long,1);
a(1) = log(ab)+e(1);
K(1) = K0;
for
i
= 2:long;
a(i)= rho*a(i-1)+(1-rho)*log(ab)+e(i);
K(i)= beta*(exp(a(i-1))+1-delta)*K(i-1);
end;
C
= (1-beta)*(exp(a)+1-delta).*K;
Y
= exp(a).*K;
I
= Y-C;
X
= [Y C I K];
dx
= diff(log(X));
mx(s,:)
= mean(dx);
sx(s,:)
= std(dx);
tmp
= corrcoef(dx);cx(s,:)=tmp(1,:);
tmp
= corrcoef(dx(2:end,1),dx(1:end-1,1));ry=tmp(1,2);
tmp
= corrcoef(dx(2:end,2),dx(1:end-1,2));rc=tmp(1,2);
tmp
= corrcoef(dx(2:end,3),dx(1:end-1,3));ri=tmp(1,2);
tmp
= corrcoef(dx(2:end,4),dx(1:end-1,4));rk=tmp(1,2);
rx(s,:)
= [ry rc ri rk];
end;
disp(mean(mx))
disp(mean(sx))
disp(mean(cx))
disp(mean(rx))

1.6.5

Announcements

In the last two examples, we will help you to give an answer to this crucial
question:
Why do these two guys annoy us with rational expectations?
In this example we will show you how different may the impulse response to a
shock be different depending on the fact that the shock is announced or not.
To illustrate this issue, let us go back to the problem of asset pricing. Let pt
be the price of a stock, dt be the dividend which will be taken as exogenous
and r be the rate of return on a riskless asset, assumed to be held constant
over time. As we have seen earlier, standard theory of finance states that when
agents are risk neutral, the asset pricing equation is given by:
Et pt+1 pt dt
+
=r
pt
pt

1.6. ECONOMIC EXAMPLES

79

or equivalently

1
1
Et pt+1 +
dt
1+r
1+r
Let us now consider that the dividend policy of the firm is such that from
pt =

period 0 on, the firm serves a dividend equal to d0 . The price of the asset is
therefore given by
i

1 X
1
d0
pt =
Et dt+i =
1+r
1+r
r
i=0

If, in period T , the firm unexpectedly decides to serve a dividend of d1 > d0 ,


the price of the asset will be given by
d1
t > T
pt =
r
In other words, the price of the asset shifts upward to its new level, as shown
in the upperleft panel of figure 1.21. Let us now assume that the firm anFigure 1.21: Asset pricing behavior
t0=10, T=40

Unexpected shock
2.5

2.5

1.5

1.5

0.5

10

20

30
Time

40

50

60

0.5

10

20

t0=20, T=40
2.5

1.5

1.5

10

20

30
Time

40

50

60

40

50

60

t0=30, T=40

2.5

0.5

30
Time

40

50

60

0.5

10

20

30
Time

nounces in t0 < T that it will raise its dividend from d0 to d1 in period T .


This dramatically changes the behavior of the asset price, as the structure of
information is totally modified. Indeed, before the shock is announced by the
firm, the level of the asset price establishes at
d0
pt =
r

80

CHAPTER 1. EXPECTATIONS AND ECONOMIC DYNAMICS

as before. In period t0 things change as the individuals now know that in


T t0 period the price will be different, this information is now included in
the information set they use to formulate expectations. Hence, from period
t0 to T , they take this information into account in their calculation, and the
asset price is now given by
pt =
=
=

it
it
T 1 

1 X
1
1 X
1
d0 +
d1
1+r
1+r
1+r
1+r
i=t
i=T
it
it
T
1 

X
X
1
1
1
1
d0 +
(d1 d0 + d0 )
1+r
1+r
1+r
1+r
i=t
i=T
it
it


1
1
1 X
1 X
d0 +
(d1 d0 )
1+r
1+r
1+r
1+r
i=t

i=T

Denoting j = i t in the first sum and = i T in the second, we have


j
+T t


1
1
1 X
1 X
d0 +
(d1 d0 )
pt =
1+r
1+r
1+r
1+r
j=0
=0

T t 

d0
1
d1 d0
=
+
r
1+r
r
Finally, from T on, the shock has taken place, such that the value of the asset
is given by

d1
r
Hence, the dynamics of the asset price is given by
d0
for t < t0

r
T t 


d1 d0
d0
1
pt =
for t0 6 t 6 T
+ 1+r
r

dr1
for t > T
r
pt =

Hence, compared to the earlier situation, there is now a transition phase that
takes place as soon as the individuals has learnt the news and exploits this
additional piece of information when formulating her expectations. This dynamics is depicted in figure 1.21 for different dates of announcement.

1.6.6

The Lucas critique

As a last example, we now have a look at the socalled Lucas critique. One
typical answer to the question raised in the previous section may be found

1.6. ECONOMIC EXAMPLES

81

in the socalled Lucas critique (see e.g. Lucas [1976]) , or the econometric
policy evaluation critique, which asserts that because the apparently (for old
fashioned econometricians) structural parameters of a model may change when
policy changes, standard econometrics may not be used to study alternative
regimes. In order to illustrate this point, let us go back to the simplest example
we were dealing with:
yt = aEt yt+1 + bxt
xt = xt1 + t
which solution is given by

b
xt
1 a
Now let us assume for a while that yt denotes output and xt is money, which is
yt =

discretionary provided by a central bank. An econometrician that has access


to data on output and money would estimate the reduced form of the model
yt = xt
where
b should converge to b/(1 a). Now the central banker would like to
evaluate the implications of a new monetary policy from t = T on
xt = xt1 + t with >

What should be done then? The oldfashioned econometrician would do the


following:
1. Take the estimated reduced form: yt =
bxt
2. Simulate paths for the new xt process

3. Analyse the properties of the time series


Stated like that all seems OK. But such an approach is totally false. Indeed,
underlying the rational expectations hypothesis is the fact that the agents
know the overall structure of the model, therefore, the agents know that from
t = T on the new monetary policy is
xt = xt1 + t with >

82

CHAPTER 1. EXPECTATIONS AND ECONOMIC DYNAMICS


Figure 1.22: The Lucas Critique
1.8
Misspecified
Correct
1.6
1.4
1.2
1
0.8
0.6
0.4
0.2
0
0

10

15

20
Time

25

30

35

40

the model needs then to be solved again to yield


yt =

b
xt
1 a

Therefore the econometrician should reestimate the reduced form to get


b t
yt = x

Keeping the old reduced form implies a systematic bias of


ab( )
(1 a)(1 a)
To give you an idea of the type of mistake one may do, we report in figure
1.22 the impulse response to a monetary shock in the second monetary rule
when the old reduced form (misspecified) and the new one (correct) are used.
As it should be clear to you using the wrong rule leads to a systematic bias
in policy evaluation since it biases in this case the impact effect of the
policy. Why is that so? Because the rational expectations hypothesis implies
that the expectation function is part of the solution of the model.Keep in mind
that solving a RE model amounts to find the expectation function.
Hence, from an econometric point of view, the rational expectations hypothesis
has extremely important implications since they condition the way we should

1.6. ECONOMIC EXAMPLES

83

think of the model, solve the model and therefore evaluate and test the model.
This will be studied in the next chapter.

84

CHAPTER 1. EXPECTATIONS AND ECONOMIC DYNAMICS

Bibliography
Blanchard, O. and C. Kahn, The Solution of Linear Difference Models under
Rational Expectations, Econometrica, 1980, 48 (5), 13051311.
Blanchard, O.J. and S. Fisher, Lectures on Macroeconomics, Cambridge: MIT
Press, 1989.
Lubik, T.A. and F. Schorfheide, Computing Sunspot Equilibria in Linear Rational Expectations Models, Journal of Economic Dynamics and Control,
2003, 28, 273285.
Lucas, R., Econometric policy Evaluation : a Critique, in K. Brunner and
A.H. Meltzer, editors, The Phillips Curve and Labor Markets, Amsterdam: NorthHolland, 1976.
Muth, J.F., Optimal Properties of Exponentially Weighted Forecasts, Journal
of the American Statistical Association, 1960, 55.
, Rational Expections and the Theory of Price Movements, Econometrica, 1961, 29, 315335.
Romer, P., Increasing Returns and Long Run Growth, Journal of Political
Economy, 1986, 94, 10021037.
Sargent, T., Macroeconomic Theory, MIT Press, 1979.
Sargent, T.J., Dynamic Macroeconomic Theory, Londres: Harvard University
Press, 1987.
Sims, C., Solving Linear Rational Expectations Models, manuscript, Princeton
University 2000.

85

86

BIBLIOGRAPHY

Contents
1 Expectations and Economic Dynamics

1.1

The rational expectations hypothesis . . . . . . . . . . . . . . .

1.2

A prototypical model of rational expectations . . . . . . . . . .

1.2.1

Sketching up the model . . . . . . . . . . . . . . . . . .

1.2.2

Forward looking solutions: |a| < 1 . . . . . . . . . . . .

1.2.3

Backward looking solutions: |a| > 1 . . . . . . . . . . .

15

1.2.4

One step backward: bubbles . . . . . . . . . . . . . . . .

18

A step toward multivariate Models . . . . . . . . . . . . . . . .

23

1.3.1

The method of undetermined coefficients

. . . . . . . .

24

1.3.2

Factorization . . . . . . . . . . . . . . . . . . . . . . . .

28

1.3.3

A matricial approach . . . . . . . . . . . . . . . . . . . .

29

Multivariate Rational Expectations Models (The simple case) .

33

1.4.1

Representation . . . . . . . . . . . . . . . . . . . . . . .

33

1.4.2

Solving the system . . . . . . . . . . . . . . . . . . . . .

35

Multivariate Rational Expectations Models (II) . . . . . . . . .

38

1.5.1

Preliminary Linear Algebra . . . . . . . . . . . . . . . .

38

1.5.2

Representation . . . . . . . . . . . . . . . . . . . . . . .

39

1.5.3

Solving the system . . . . . . . . . . . . . . . . . . . . .

40

1.5.4

Using the model . . . . . . . . . . . . . . . . . . . . . .

47

Economic examples . . . . . . . . . . . . . . . . . . . . . . . . .

50

1.6.1

Labor demand . . . . . . . . . . . . . . . . . . . . . . .

51

1.6.2

The Real Business Cycle Model . . . . . . . . . . . . . .

58

1.6.3

A model with indeterminacy . . . . . . . . . . . . . . .

65

1.6.4

AK growth model . . . . . . . . . . . . . . . . . . . . .

69

1.6.5

Announcements . . . . . . . . . . . . . . . . . . . . . . .

78

1.3

1.4

1.5

1.6

87

88

CONTENTS
1.6.6

The Lucas critique . . . . . . . . . . . . . . . . . . . . .

80

List of Figures
1.1

The regular case . . . . . . . . . . . . . . . . . . . . . . . . . .

10

1.2

Forward Solution . . . . . . . . . . . . . . . . . . . . . . . . . .

12

1.3

The irregular case . . . . . . . . . . . . . . . . . . . . . . . . .

16

1.4

Backward Solution . . . . . . . . . . . . . . . . . . . . . . . . .

17

1.5

Deterministic Bubble . . . . . . . . . . . . . . . . . . . . . . . .

20

1.6

Bursting Bubble . . . . . . . . . . . . . . . . . . . . . . . . . .

22

1.7

Backwardforward solution . . . . . . . . . . . . . . . . . . . .

27

1.8

Geometrical interpretation of eigenvalues/eigenvectors . . . . .

30

1.9

A source . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

31

1.10 A sink: indeterminacy . . . . . . . . . . . . . . . . . . . . . . .

32

1.11 The saddle path

. . . . . . . . . . . . . . . . . . . . . . . . . .

33

1.12 Impulse Response Function (AR(1)) . . . . . . . . . . . . . . .

48

1.13 Impulse Response to a Wage Shock . . . . . . . . . . . . . . . .

56

1.14 IRF to a technology shock . . . . . . . . . . . . . . . . . . . . .

66

1.15 Impulse response functions

. . . . . . . . . . . . . . . . . . . .

72

1.16 Simulated data . . . . . . . . . . . . . . . . . . . . . . . . . . .

73

1.17 Rates of growth: distribution of mean . . . . . . . . . . . . . .

74

1.18 Rates of growth: distribution of standard deviation . . . . . . .

75

1.19 Rates of growth: distribution of correlation with Y . . . . . .

75

1.20 Rates of growth: distribution of first order autocorrelation . . .

76

1.21 Asset pricing behavior . . . . . . . . . . . . . . . . . . . . . . .

79

1.22 The Lucas Critique . . . . . . . . . . . . . . . . . . . . . . . . .

82

89

90

LIST OF FIGURES

List of Tables
1.1

Parameterization: labor demand . . . . . . . . . . . . . . . . .

56

1.2

The Real Business Cycle Model: parameters . . . . . . . . . . .

63

1.3

New Keynesian model: parameters . . . . . . . . . . . . . . . .

67

1.4

MonteCarlo Simulations . . . . . . . . . . . . . . . . . . . . .

74

91

Lecture Notes 2

Towards nonlinear methods


In the previous lectures, we dealt with linear economies, for which there exist
straightforward methods to solve the involved expectational difference equations. However, most of the models we encounter in economics are fundamentally characterized by nonlinear dynamical features. We therefore need
methods to solve such models. The aim of this chapter is to introduce you
to such methods, by first pointing out the possible drawbacks of simple linearization a method commonly used in the literature. We will present four
methods which may be used to solve RE models:
1. Perturbation methods (see Judd (1998), Judd and Gaspard (1997), Collard and Juillard (2001a), Collard and Juillard (2001b) or ?)), which
essentially amount to take higherorder Taylor approximation of the
model;
2. Value iteration (see Christiano (1990), Tauchen and Hussey (1991)),
which may be simply thought of as finding a fixed point on an operator;
3. Parameterized Expectations Algorithm (PEA) (see Marcet (1988), Den Haan
and Marcet (1990) or Marcet and Lorenzoni (1999) among others), which
may be thought of as a generalization of the method of undetermined
coefficients to the higher order relying on simulations;
4. Minimum weighted residual methods (see Judd (1992), McGrattan (1996),
1

McGrattan (1999)), which, as PEA, may be thought of as a generalization of the method of undetermined coefficients to the higher order
but exploits some orthogonality conditions rather than relying on simulations;
Each method is illustrated by an economic example, which is intended to show
you the potential and simplicity of the method. However, before going to such
methods, we shall now see why linearizing many not always be a good idea.
The big question is then
What are we missing?

2.1

Risk and the Certainty Equivalence Hypothesis

Taking a either linear or loglinear approximation to the decision rules of an


economic model, is actually equivalent to taking a quadratic approximation
to the optimization problem that lies behind the optimal behavior of agents.
In so doing we encounter the socalled Certainty Equivalence property. In
order to understand the certainty equivalence property, let us consider the
following problem. Let us consider that x is a random variable with probability
density g(x) and let y be a variable decided by a decision maker (this may be
consumption for an household, investment or labor for a firm, the price for a
monopolist. . . ). This decision maker has an objective function u(y, x) which
is concave and twice continuously differentiable. Its y plan is then chosen by
solving
max E(u(y, x))
y

u(y, x)g(x)dx

The first order condition1 for choosing y is then given by (applying Leibniz
rule)

u(y, x)g(x)dx = 0

u(y, x)g(x)dx = 0
y

(2.1)

1
Note that since u(.) is concave and g(.) is positive, this condition is necessary and
sufficient.

Now, let us assume that u(y, x) is a second order Taylor expansion of


another objective, such that
H
u(y, x) = (y x)J + (y x)
2

y
x

where H is a negativedefinite (2 2) matrix, such that

1
y
Jy
hxx hxy
+ (y x)
u(y, x) = (y x)
x
Jx
hyx hyy
2

1
= Jy y + Jx x +
hxx x2 + (hxy + hyx )xy + hyy y 2
2

(2.2)

In such a case, (2.1) rewrites


Z h
i
2Jy + (hxy + hyx )Ex
x
Jy + (hxy + hyx ) + hyy y g(x)dx = 0 y =
2
2hyy
Now let us consider a situation where the objective of the decision maker
is

max u(y, E(x)) u y, xg(x)dx


y

Note that in this case, we are not maximizing the expected value of the problem
but the value, taking into account the expected value of x. Given the functional
form (2.2), the first order condition is now
Jy + (hxy + hyx )

2Jy + (hxy + hyx )Ex


Ex
+ hyy y = 0 y =
2
2hyy

which is exactly the same as before. In other words, we have for the
quadratic formulation
Argmax E(u(y, x)) = Argmax u(y, E(x))
y

This is what is usually called the Certainty equivalence principle: risk does not
matter in decision making, the only thing that matters is the average value
of the random variable x, not its variability. But this is usually not a general
result. Let us consider, for example, the case of Burnsides [1998] asset pricing
model.
3

2.1.1

An assetpricing example

This model is a standard asset pricing model for which (i) the marginal intertemporal rate of substitution is an exponential function of the rate of growth
of consumption and (ii) the endowment is a Gaussian exogenous process. As
shown by Burnside (1998), this setting permits to obtain a closed form solution
to the problem. We consider a frictionless pure exchange economy a
` la Mehra
and Prescott (1985) and Rietz (1988) with a single household and a unique
perishable consumption good produced by a single tree. The household can
hold equity shares to transfer wealth from one period to another. The problem of a single agent is then to choose consumption and equity holdings to
maximize her expected discounted stream of utility, given by
Et

X
=0

ct+
with (, 0) (0, 1]

(2.3)

subject to the budget constraint


pt et+1 + ct = (pt + dt )et

(2.4)

where (0, 1) is the agents subjective discount factor, ct is households


consumption of a single perishable good at date t, pt denotes the price of the
equity in period t and et is the households equity holdings in period t. Finally,
dt is the trees dividend in period t. Dividends are assumed to grow at rate xt
such that :
dt = exp(xt )dt1

(2.5)

where xt , the rate of growth of dividends, is assumed to be a Gaussian stationary AR(1) process
xt = (1 )x + xt1 + t

(2.6)

where is i.i.d. N (0, 2 ) with || < 1. Market clearing requires that et = 1


so that ct = dt in equilibrium. Like in Burnside (1998), let yt denote the
4

pricedividend ratio, yt = pt /dt . Then, condition for the households problem


can be shown to rewrite as
yt = Et [exp(xt+1 )(1 + yt+1 )]

(2.7)

Burnside (1998) shows that the above equation admits an exact solution of
the form2
yt =

i exp [ai + bi (xt x)]

(2.8)

i=1

where

and

2 2
2(1 i ) 2 (1 2i )
ai = xi +
+
i
2(1 )2
1
1 2

(1 i )
1
As can be seen from the definition of ai , the volatility of the shock, directly
bi =

enters the decision rule,, therefore Burnsides [1998] model does not make the
certainty equivalent hypothesis: risk matters for asset holdings decisions.
What happens then, if we now obtain a solution relying on a first order
Taylor approximation of the model?
First of all let us determine the deterministic steady state of the economy:
y ? = exp(x? )(1 + y ? )
x? = x? + (1 )x
such that we get
exp(x? )
1 exp(x? )
= x

y? =
x?

(2.9)
(2.10)

The first order Taylor expansion of the Euler equation yields

ybt = exp(x? )Et (b


yt+1 ) + exp(x? )Et (b
xt+1 )

See appendix ?? for a detailed exposition of the solution.

(2.11)

We actually recognize the simplest RE model we have been dealing with in


chapter 2 (yt = aEt yt+1 +bxt ) such that we may use a undetermined coefficient
approach and guess a decision rule of the form
ybt = b
xt

Plugging the guess in (2.11), we get

b
xt = exp(x? )Et (b
xt+1 ) + exp(x? )Et (b
xt+1 )
taking expectations and identifying, we obtain
=

exp(x? )
(1 exp(x? ))(1 exp(x? ))

such that the approximate decision rule may be written as


yt = y ? + (xt x? )

We are now in a position to compute the approximation error we make using the linear approximation. As the model admits a closed-form solution, the accuracy of the approximation method can be directly checked against the true decision rule. This is undertaken relying on the two following criteria

E_1 = 100 (1/N) Σ_{t=1}^N |(y_t − ỹ_t)/y_t|

and

E_∞ = 100 max_t |(y_t − ỹ_t)/y_t|

where y_t denotes the true solution for the price-dividend ratio and ỹ_t is the approximation of the true solution by the method under study. E_1 represents the average relative error an agent makes using the approximation rather than the true solution, while E_∞ is the maximal relative error made using the approximation rather than the true solution. These criteria are evaluated over the interval x_t ∈ [x̄ − Δσ_x; x̄ + Δσ_x], where σ_x is the unconditional standard deviation of x and Δ is selected such that we explore 99.99% of the distribution of x. Table 2.1 reports E_1 and E_∞ for the different cases. Our benchmark experiment amounts to considering the Mehra and Prescott's [1985] parameterization of the asset pricing model. We therefore set the mean of the rate of growth of dividends to x̄ = 0.0179, its persistence to ρ = −0.139 and the volatility of the innovations to σ = 0.0348. These values are consistent with the properties of consumption growth in annual data from 1889 to 1979. θ was set to −1.5, the value widely used in the literature, and β to 0.95, which is standard for annual frequency. We then investigate the implications of changes in these parameters in terms of accuracy. In particular, we study the implications of larger and lower impatience, higher volatility, larger curvature of the utility function and more persistence in the rate of growth of dividends.
Table 2.1: Accuracy check

               E_1     E_∞                  E_1     E_∞
Benchmark      1.43    1.46    β = 0.5      0.29    0.29
θ = 0.5        0.24    0.26    σ = 0.001    0.01    0.03
β = 0.99       2.92    2.94    σ = 0.1     11.70   11.72
θ = −10       23.53   24.47    ρ = 0        1.57    1.57
θ = −5         8.57    8.85    ρ = 0.5      5.52    6.76
ρ = −0.5       0.50    0.51    ρ = 0.9     37.50  118.94

Note: The series defining the true solution was truncated after 800 terms, as no significant improvement was found adding additional terms at machine accuracy. When exploring variations in ρ, the overall volatility of the rate of growth of dividends was maintained at its benchmark level.
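For concreteness, the following sketch computes the exact Burnside solution and the linear approximation under the benchmark parameterization and evaluates E_1 and E_∞. This is a minimal reconstruction, not the code behind table 2.1: the grid width (Δ = 4 standard deviations) and the equal weighting over the grid are assumptions.

Matlab Code: Exact Solution vs. Linear Approximation

theta = -1.5; beta = 0.95;                       % benchmark preferences
xbar  = 0.0179; rho = -0.139; sigma = 0.0348;    % dividend growth process
nt    = 800;                                     % truncation of the exact series
sx    = sigma/sqrt(1-rho^2);                     % unconditional std. dev. of x
x     = linspace(xbar-4*sx,xbar+4*sx,1000);      % grid (Delta = 4 is an assumption)
i     = (1:nt)';
ai    = theta*xbar*i + (theta^2*sigma^2/(2*(1-rho)^2)) ...
        *(i - 2*rho*(1-rho.^i)/(1-rho) + rho^2*(1-rho.^(2*i))/(1-rho^2));
bi    = theta*rho*(1-rho.^i)/(1-rho);
yex   = (beta.^i.*exp(ai))'*exp(bi*(x-xbar));    % exact price-dividend ratio (2.8)
ys    = beta*exp(theta*xbar)/(1-beta*exp(theta*xbar));   % steady state (2.9)
eta   = theta*rho*beta*exp(theta*xbar) ...
        /((1-beta*exp(theta*xbar))*(1-rho*beta*exp(theta*xbar)));
ylin  = ys + eta*(x-xbar);                       % linear approximation
E1    = 100*mean(abs((yex-ylin)./yex));          % average relative error
Einf  = 100*max(abs((yex-ylin)./yex));           % maximal relative error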

At first glance at table 2.1, it appears that the linear approximation can only accommodate situations where the economy does not experience high volatility or large persistence of the growth of dividends, or where the utility of individuals does not exhibit much curvature. This is for instance the case in the Mehra and Prescott's [1985] parameterization (the benchmark case), as both the average and maximal approximation errors lie around 1.5%. But, as is nowadays well known, an increase along any one of the aforementioned dimensions yields lower accuracy of the linear approximation. For instance, increasing the volatility of the innovations of the rate of growth of dividends to σ = 0.1 yields approximation errors of almost 12%, both on average and at the maximum, thus indicating that the approximation performs particularly badly in this case. This is even worse when the persistence of the exogenous process increases, as ρ = 0.9 yields an average approximation error of about 40% and a maximal approximation error of about 120%. This is also true for increases in the curvature of the utility function (see rows 4 and 5 of table 2.1).
Figure 2.1 sheds light on these results. It reports the exact solution to the problem (grey line) and the linear approximation of the true solution (thin black line). We consider a rather extreme situation where θ = −5, ρ = 0.5 and the volatility of the shock is preserved.
[Figure 2.1: Decision rule. The exact solution for high σ and low σ versus the linear approximation, as a function of x_t; the vertical gap between the exact solution and the linear approximation is the certainty equivalence bias. Note: this graph was obtained for θ = −5 and ρ = 0.5.]

As can be seen from figure 2.1, using a linear approximation induces two types of error:



1. in terms of curvature,
2. in terms of level.
The first type of error is obvious, as the linear approximation is neither intended nor able to capture any curvature. The second type of error is related to the fact that we are using an approximation about the deterministic steady state. Therefore, the latter source of error is related to the risk component. In fact, this may be understood in light of the a_i terms in the exact solution, which include the volatility of the innovations, σ². In order to make sure that this error is related to this component, we also report the exact solution when we cut the overall volatility by 25% (thick dashed line). As can be seen, the level error tends to diminish dramatically, which indicates that the risk component plays a major role here, as the average error is then cut by 20% (5% as far as the maximal error is concerned). Hence, this suggests that the linear approximation may only be accurate for low enough variability and curvature, which prevents its use for studying structural breaks.

2.2 Non-Linear Dynamics and Asymmetries

We now consider another situation where the linear approximation may perform poorly. This situation is related to the existence of strong asymmetries in the decision rules or in the objective functions the economic agents have to optimize. In order to illustrate this situation, let us take the problem of a firm that has to decide on employment and faces asymmetric adjustment costs. Asymmetric adjustment costs may be justified on institutional grounds. We may argue for example that there exist laws in the economy that render firings more costly than hirings.

We consider the case of a firm that has to decide on its level of employment. The firm is infinitely lived and produces a good relying on a technology that essentially uses labor (another way to think of it would be to assume that physical capital is a fixed factor). This technology is represented by the constant returns-to-scale production function³

Y_t = A n_t with A > 0.
Using labor incurs two sources of cost:

1. The standard payment for labor services, w_t n_t, where w_t is the real wage, whose positive sequence {w_t}_{t=0}^∞ is taken as given by the firm and is assumed to evolve as

w_t = ρ w_{t−1} + (1 − ρ) w̄ + ε_t

with ε_t ∼ U_[−ε̄; ε̄] and⁴ ε̄ < (1 − ρ) w̄.

2. A cost of adjusting labor, C(Δn_t), which satisfies

C(0) = 0, C′(0) = 0, C″(·) > 0

but displays some asymmetries. An example of such a function is depicted in figure 2.2.
Labor demand is then determined by maximizing the expected intertemporal profit

max_{{n_{t+s}, Δn_{t+s}}_{s=0}^∞} E_t Σ_{s=0}^∞ (1/(1+r))^s [ A n_{t+s} − w_{t+s} n_{t+s} − C(Δn_{t+s}) ]

subject to

n_t = Δn_t + n_{t−1}    (2.12)

which yields the two first order conditions

λ_t = C′(Δn_t)    (2.13)

λ_t = A − w_t + (1/(1+r)) E_t λ_{t+1}    (2.14)

³This will enable us to obtain an analytical solution to the problem.
⁴This assumption is imposed in order to guarantee the positivity of the real wage. Indeed, assume the economy experiences the worst shock in each and every period; then we would have w_{t+j} = ρ^j w_t + (1 − ρ^j) w̄ − ε̄ Σ_{k=0}^{j−1} ρ^k, which in the limit yields lim_{j→∞} w_{t+j} = w̄ − ε̄/(1 − ρ). The positivity condition then corresponds to what we impose in the main text.

[Figure 2.2: Asymmetric adjustment costs. An example of a convex adjustment cost function C(Δn_t) that is steeper on one side of Δn_t = 0 than on the other.]

where λ_t is the Lagrange multiplier associated with (2.12). The second equation may be solved by simply iterating forward, which yields

λ_t = Σ_{i=0}^∞ (1/(1+r))^i E_t(A − w_{t+i})

Note that

E_t(w_{t+j}) = ρ^j w_t + (1 − ρ^j) w̄

therefore

λ_t = ((1+r)/r)(A − w̄) − ((1+r)/(1+r−ρ))(w_t − w̄)

Then, Δn_t is given by

Δn_t = Φ(w_t) ≡ C′^{−1}[ ((1+r)/r)(A − w̄) − ((1+r)/(1+r−ρ))(w_t − w̄) ]

and we have

n_t = Φ(w_t) + n_{t−1}

[Figure 2.3: Decision rule for Δn_t, plotted as a function of w_t.]

Since C′(·) may exhibit strong asymmetries, the decision rule Φ may be extremely nonlinear too, yielding a decision rule of the form depicted in figure 2.3. As can be seen from the graph, a linear approximation would do a very poor job, as any departure from the steady state level (Δn* = 0) would create a large error. In other words, and as should have been expected, strong nonlinearities forbid the use of linear approximations.
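The shape of Φ of course depends on the cost function. A minimal sketch, assuming a hypothetical "linex" specification C(Δn) = φ[exp(aΔn) − aΔn − 1]/a², which satisfies C(0) = 0, C′(0) = 0, C″ > 0 and is asymmetric whenever a ≠ 0; all parameter values are illustrative assumptions, with A = w̄ chosen so that the steady state is Δn* = 0:

Matlab Code: Asymmetric Decision Rule (linex cost)

A = 1; wbar = 1;                  % assumed: A = wbar puts the steady state at dn* = 0
r = 0.04; rho = 0.9;              % assumed interest rate and wage persistence
phi = 10; a = 5;                  % linex cost C(dn) = phi*(exp(a*dn)-a*dn-1)/a^2
w   = linspace(0.8,1.2,201);      % grid for the real wage
lam = (1+r)*(A-wbar)/r - (1+r)*(w-wbar)/(1+r-rho);  % shadow value of labor
dn  = log(1+a*lam/phi)/a;         % dn = C'^{-1}(lam), since C'(dn) = phi*(exp(a*dn)-1)/a
plot(w,dn); xlabel('w_t'); ylabel('\Delta n_t');

With a > 0, hirings are more costly at the margin than firings, and the plotted rule is visibly steeper on one side of Δn = 0, as in figure 2.3.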
Beyond this point, which may appear quite peculiar (since such important nonlinearities are barely encountered after all), there exists a large class of models for which linear approximation would do a bad job: models with binding constraints, which we now investigate.

2.3 Dealing with binding constraints

In this section, we provide an example where linear approximation should not be used because the decision rules are not differentiable. This is the case when the agent faces possibly binding constraints. To illustrate it, we will develop a model of a consumer who is constrained on her borrowing in the financial market.

We consider the case of a household who determines her consumption/saving plans in order to maximize her lifetime expected utility, which is characterized by the function:

E_t⁵ Σ_{τ=0}^∞ β^τ (c_{t+τ}^{1−σ} − 1)/(1 − σ)  with β ∈ (0, 1), σ ∈ (1, ∞)    (2.15)

where 0 < β < 1 is a constant discount factor and c denotes the consumption bundle. In each and every period, the household enters the period with a level of assets a_t carried over from the previous period, on which it receives a constant real interest rate r. It also receives an endowment ω_t, which may either be thought of as something totally extrinsic to the economy or as wages. This endowment is taken to be exogenous, as we are not interested in its determination; it follows a stochastic process of the form

log(ω_t) = ρ log(ω_{t−1}) + (1 − ρ) log(ω̄) + ε_t    (2.16)

with ε_t ∼ N(0, σ_ε²). These revenues are then used to consume and purchase assets on the financial market. Therefore, the budget constraint in t is given by

a_{t+1} = (1 + r) a_t + ω_t − c_t    (2.17)

In addition, the household is subject to a borrowing constraint stating that she cannot borrow:

a_{t+1} ⩾ 0

The first order conditions to this model may be obtained by forming the Lagrangian of the problem:

L_t = E_t Σ_{τ=0}^∞ β^τ [ (c_{t+τ}^{1−σ} − 1)/(1 − σ) + λ_{t+τ} ((1 + r)a_{t+τ} + ω_{t+τ} − c_{t+τ} − a_{t+τ+1}) + μ_{t+τ} a_{t+τ+1} ]

where λ_t and μ_t respectively denote the Lagrange multipliers associated with the budget and the borrowing constraints. The first order conditions associated with the problem are then

c_t^{−σ} = λ_t    (2.18)

λ_t = μ_t + β(1 + r) E_t λ_{t+1}    (2.19)

λ_t ((1 + r)a_t + ω_t − c_t − a_{t+1}) = 0    (2.20)

μ_t a_{t+1} = 0    (2.21)

λ_t ⩾ 0    (2.22)

μ_t ⩾ 0    (2.23)

⁵E_t(·) denotes the mathematical conditional expectation. Expectations are conditional on information available at the beginning of period t.

together with the transversality condition. Manipulating the system, we see that consumption is given by

c_t = min[ (1 + r)a_t + ω_t , (β(1 + r) E_t c_{t+1}^{−σ})^{−1/σ} ]

The decision rule for consumption is not differentiable at the point where asset holdings are just sufficient to guarantee a positive net position on asset holdings, that is where

(1 + r)a_t + ω_t = (β(1 + r) E_t c_{t+1}^{−σ})^{−1/σ}
Just to give you an idea of this phenomenon, we report in figure 2.4 the consumption decision rule for two different values of ω_t, as a function of cash-on-hand, which is given by (1 + r)a_t + ω_t.⁶ This non-differentiability obviously implies that linear approximations cannot be useful in this case, as they are not defined in the neighborhood of the point that makes the household switch from the unconstrained to the constrained regime.

⁶We will see later on how these decision rules were computed.

[Figure 2.4: Consumption decision rule. Consumption c_t as a function of cash-on-hand, (1 + r)a_t + ω_t, for two values of ω_t; note the kink at the point where the borrowing constraint starts binding.]

Nevertheless, if we were to consider an economy with tiny shocks and where the steady state lies in the unconstrained regime, the linear approximation might be sufficient, as the decision rule is particularly smooth in this region (because of consumption smoothing). We therefore need to investigate alternative methods, which however require some preliminaries.

Bibliography

Burnside, C., Solving asset pricing models with Gaussian shocks, Journal of Economic Dynamics and Control, 1998, 22, 329–340.

Christiano, L., Solving the Stochastic Growth Model by Linear Quadratic Approximation and by Value Function Iteration, Journal of Business and Economic Statistics, 1990, 8 (1), 23–26.

Collard, F. and M. Juillard, Accuracy of Stochastic Perturbation Methods: The Case of Asset Pricing Models, Journal of Economic Dynamics and Control, 2001, 25 (6/7), 979–999.

Collard, F. and M. Juillard, Accuracy of Stochastic Perturbation Methods: The Case of Asset Pricing Models, Computational Economics, 2001, 17 (2/3), 125–139.

Den Haan, W. J. and M. Marcet, Solving the Stochastic Growth Model by Parametrizing Expectations, Journal of Business and Economic Statistics, 1990, 8, 31–34.

Judd, K., Projection Methods for Solving Aggregate Growth Models, Journal of Economic Theory, 1992, 58, 410–452.

Judd, K. L., Numerical Methods in Economics, Cambridge, Massachusetts: MIT Press, 1998.

Judd, K. L. and J. Gaspar, Solving Large-Scale Rational-Expectations Models, Macroeconomic Dynamics, 1997, 1, 45–75.

Marcet, A., Solving Nonlinear Stochastic Models by Parametrizing Expectations, mimeo, Carnegie-Mellon University, 1988.

Marcet, A. and G. Lorenzoni, The Parameterized Expectations Approach: Some Practical Issues, in R. Marimon and A. Scott, editors, Computational Methods for the Study of Dynamic Economies, Oxford: Oxford University Press, 1999, pp. 143–171.

McGrattan, E. R., Solving the Stochastic Growth Model with a Finite Element Method, Journal of Economic Dynamics and Control, 1996, 20, 19–42.

McGrattan, E. R., Application of Weighted Residual Methods to Dynamic Economic Models, in R. Marimon and A. Scott, editors, Computational Methods for the Study of Dynamic Economies, Oxford: Oxford University Press, 1999, pp. 114–142.

Mehra, R. and E. C. Prescott, The equity premium: a puzzle, Journal of Monetary Economics, 1985, 15, 145–161.

Rietz, T. A., The equity risk premium: a solution, Journal of Monetary Economics, 1988, 22, 117–131.

Tauchen, G. and R. Hussey, Quadrature Based Methods for Obtaining Approximate Solutions to Nonlinear Asset Pricing Models, Econometrica, 1991, 59 (2), 371–396.

Index

Asymmetries
Binding constraints
Certainty equivalence
Nonlinearities

Contents

2 Towards non-linear methods
  2.1 Risk and the Certainty Equivalence Hypothesis
    2.1.1 An asset-pricing example
  2.2 Non-Linear Dynamics and Asymmetries
  2.3 Dealing with binding constraints

List of Figures

2.1 Decision rule
2.2 Asymmetric adjustment costs
2.3 Decision rule for Δn_t
2.4 Consumption decision rule

List of Tables

2.1 Accuracy check

Lecture Notes 3

Approximation Methods
In this chapter, we deal with a very important problem that we will encounter
in a wide variety of economic problems: approximation of functions. Such
a problem commonly occurs when it is too costly either in terms of time or
complexity to compute the true function or when this function is unknown
and we just need to have a rough idea of its main properties. Usually the only
thing that is required then is to be able to compute this function at one or a
few points and formulate a guess for all other values. This leaves us with some
choice concerning either the local or global character of the approximation
and the level of accuracy we want to achieve. As we will see in different
applications, choosing the method is often a matter of efficiency and ease of
computing.
Following Judd [1998], we will consider three types of approximation methods:

1. Local approximations, which essentially exploit information on the value of the function at one point and on its derivatives at the same point. The idea is then to obtain a (hopefully) good approximation of the function in a neighborhood of the benchmark point.

2. L^p approximations, which actually find a nice function that is close to the function we want to evaluate in the sense of an L^p norm. Ideally, we would need information on the whole function to find a good approximation, which is usually infeasible or would make the problem of approximation totally irrelevant! Therefore, we usually rely on interpolation, which then appears as the other side of the same problem, but only requires knowing the function at some points.

3. Regressions, which may be viewed as an intermediate situation between the two preceding cases, as they usually rely, exactly as in econometrics, on m moments to find n parameters of the approximating function.

3.1 Local approximations

The problem of the local approximation of a function f : R → R is to make use of information about the function at a particular point x₀ ∈ R to produce a good approximation of f in a neighborhood of x₀. Among the various available methods, two are of particular interest: the Taylor series expansion and the Padé approximation.

3.1.1 Taylor series expansion

The Taylor series expansion is certainly the most well-known and natural approximation to any student.

The basic framework: This approximation relies on the standard Taylor theorem:

Theorem 1 Suppose F : Rⁿ → R is a C^{k+1} function. Then, for x* ∈ Rⁿ, we have

F(x) = F(x*) + Σ_{i=1}^n (∂F/∂x_i)(x*)(x_i − x_i*)
     + (1/2) Σ_{i₁=1}^n Σ_{i₂=1}^n (∂²F/∂x_{i₁}∂x_{i₂})(x*)(x_{i₁} − x*_{i₁})(x_{i₂} − x*_{i₂}) + . . .
     + (1/k!) Σ_{i₁=1}^n · · · Σ_{i_k=1}^n (∂^k F/∂x_{i₁} · · · ∂x_{i_k})(x*)(x_{i₁} − x*_{i₁}) · · · (x_{i_k} − x*_{i_k})
     + O(‖x − x*‖^{k+1})

The idea of the Taylor expansion approximation is then to form a polynomial approximation of the function F as described by the Taylor theorem. This approximation method therefore applies to situations where the function is sufficiently differentiable: a k-th order approximation requires F to be at least C^{k+1}. If this is the case, then we are sure that the error will be at most of order O(‖x − x*‖^{k+1}).¹
In fact, we may look at Taylor series expansions from a slightly different perspective and acknowledge that they amount to approximating the function by an infinite series. For instance, in the one-dimensional case, this amounts to writing

F(x) ≃ Σ_{k=0}^n α_k (x − x*)^k

where α_k = (1/k!) (∂^k F/∂x^k)(x*). As n tends toward infinity, the latter equation may be understood as a power series expansion of F in the neighborhood of x*.


This is a natural way to think of this type of approximation if we just think of the exponential function, for instance, and the way a computer delivers exp(x). Indeed, the formal definition of the exponential function is traditionally given by

exp(x) ≡ Σ_{k=0}^∞ x^k / k!

The advantage of this representation is that we are now in a position to give a very important theorem concerning the relevance of such approximations. Nevertheless, we first need to report some preliminary definitions.

Definition 1 We call radius of convergence of the complex power series Σ_{k=0}^∞ α_k x^k the quantity r defined by

r = sup { |x| : |Σ_{k=0}^∞ α_k x^k| < ∞ }

r therefore provides the maximal radius of x ∈ C for which the complex series converges: for any x ∈ C such that |x| < r the series converges, while it diverges for any x ∈ C such that |x| > r.


¹Let us recall at this point that a function f : Rⁿ → R^k is O(x^ℓ) if lim_{x→0} ‖f(x)‖/‖x‖^ℓ < ∞.

Definition 2 A function F : Ω ⊂ C → C is said to be analytic if, for every x* ∈ Ω, there exist a sequence {α_k} and a radius r such that

F(x) = Σ_{k=0}^∞ α_k (x − x*)^k for ‖x − x*‖ < r

Definition 3 Let F : Ω ⊂ C → C be a function, and x* ∈ Ω. x* is a singularity of F if F is analytic on Ω\{x*} but not on Ω.

For example, let us consider the tangent function, tan(x), which is defined as the ratio of two analytic functions

tan(x) = sin(x)/cos(x)

since cos(x) and sin(x) may be written as

cos(x) = Σ_{n=0}^∞ (−1)^n x^{2n}/(2n)!

sin(x) = Σ_{n=0}^∞ (−1)^n x^{2n+1}/(2n+1)!

However, the function obviously admits a singularity at x* = π/2, for which cos(x*) = 0.
Given these definitions, we can now state the following theorem:

Theorem 2 Let F be an analytic function for x ∈ Ω ⊂ C. If F, or any derivative of F, exhibits a singularity at x_o ∈ C, then the radius of convergence in the complex plane of the Taylor series expansion of F in the neighborhood of x*,

Σ_{k=0}^∞ (1/k!) (∂^k F/∂x^k)(x*) (x − x*)^k

is bounded from above by ‖x* − x_o‖.

This theorem is extremely important, as it gives us a guideline for using Taylor series expansions: it tells us that the series at x* cannot deliver any accurate, and therefore reliable, approximation of F at any point farther away from x* than any singular point of F. An extremely simple example to understand this point is provided by the approximation of the function F(x) = log(1 − x), where x ∈ (−∞, 1), in a neighborhood of x* = 0. First of all, note that x_o = 1 is a singular value for F(x). The Taylor series expansion of F(x) in the neighborhood of x* = 0 writes

log(1 − x) ≃ Ψ_n(x) ≡ −Σ_{k=1}^n x^k / k

What the theorem tells us is that this approximation may only be used for values of x such that ‖x − x*‖ is below ‖x* − x_o‖ = ‖0 − 1‖ = 1, that is for x such that −1 < x < 1. In other words, the radius of convergence of the Taylor approximation to this function is r = 1. As a numerical illustration, we report in table 3.1 the true value of log(1 − x), its approximate value Ψ₁₀₀(x) using 100 terms in the summation, and the absolute deviation of this approximation from the true value.² As can be seen from the table, as soon as ‖x − x*‖ approaches the radius, the approximation by the Taylor series expansion performs more and more poorly.
Table 3.1: Taylor series expansion for log(1 − x)

   x        log(1 − x)    Ψ₁₀₀(x)       |error|
 −0.9999    0.69309718    0.68817193    0.00492525
 −0.9900    0.68813464    0.68632282    0.00181182
 −0.9000    0.64185389    0.64185376    1.25155e-007
 −0.5000    0.40546511    0.40546511    1.11022e-016
  0.0000    0.00000000    0.00000000    0
  0.5000   −0.69314718   −0.69314718    2.22045e-016
  0.9000   −2.30258509   −2.30258291    2.18735e-006
  0.9900   −4.60517019   −4.38945277    0.215717
  0.9999   −9.21034037   −5.17740221    4.03294

²The word "true" lies between quotes as this value has itself been computed by the computer and is therefore an approximation!
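The numbers in table 3.1 are straightforward to reproduce; a minimal sketch:

Matlab Code: Taylor Series Approximation of log(1-x)

x   = [-0.9999 -0.99 -0.9 -0.5 0 0.5 0.9 0.99 0.9999];
psi = zeros(size(x));
for k = 1:100
    psi = psi - x.^k/k;            % partial sums of -sum_k x^k/k
end
err = abs(log(1-x) - psi);         % absolute deviation from the true value
disp([x' log(1-x)' psi' err'])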

The usefulness of the approach: This approach to approximation is particularly useful and quite widespread in economic dynamics. For instance, anyone who has ever tried to study the dynamic properties of any macro model, say the optimal growth model, has encountered the method at least once. Let's take for instance the optimal growth model, whose dynamics may be summarized, when preferences are logarithmic and technology is Cobb-Douglas, by the two following equations³

k_{t+1} = k_t^α − c_t + (1 − δ) k_t    (3.1)

1/c_t = β (1/c_{t+1}) (α k_{t+1}^{α−1} + 1 − δ)    (3.2)

It is a widespread practice to linearize or log-linearize such an economy around the steady state, which we will define later on, to study the local dynamic properties of the equilibrium. Let's assume that the steady state has already been found and is given by c_{t+1} = c_t = c* and k_{t+1} = k_t = k* for all t.

Linearization: Let us denote by k̂_t the deviation of the capital stock k_t with respect to its steady state level in period t, such that k̂_t = k_t − k*. Likewise, we define ĉ_t = c_t − c*. The first step of the linearization is to re-express the system in terms of functions. Here, we can define

F(k_{t+1}, c_{t+1}, k_t, c_t) = [ k_{t+1} − k_t^α + c_t − (1 − δ)k_t ; 1/c_t − β(1/c_{t+1})(α k_{t+1}^{α−1} + 1 − δ) ]

and then build the Taylor expansion

F(k_{t+1}, c_{t+1}, k_t, c_t) ≃ F(k*, c*, k*, c*) + F₁(k*, c*, k*, c*) k̂_{t+1} + F₂(k*, c*, k*, c*) ĉ_{t+1} + F₃(k*, c*, k*, c*) k̂_t + F₄(k*, c*, k*, c*) ĉ_t

³Since we just want to make the case for Taylor series expansions, we do not need to be any more precise on the origin of these two equations. We will therefore take them as given, but we will come back to their determination in the sequel.

We therefore just have to compute the derivatives of each sub-function, and realize that the steady state corresponds to a situation where F(k*, c*, k*, c*) = 0. This yields the following system

k̂_{t+1} − α k*^{α−1} k̂_t + ĉ_t − (1 − δ) k̂_t = 0
−(1/c*²) ĉ_t + β(α k*^{α−1} + 1 − δ)(1/c*²) ĉ_{t+1} − β(1/c*) α(α − 1) k*^{α−2} k̂_{t+1} = 0

which simplifies to

k̂_{t+1} − [α k*^{α−1} + 1 − δ] k̂_t + ĉ_t = 0
−ĉ_t + β[α k*^{α−1} + 1 − δ] ĉ_{t+1} − β α(α − 1)(c*/k*) k*^{α−1} k̂_{t+1} = 0

We then have to solve the implied linear dynamic system, but this is another story that we will deal with in a couple of chapters.

Log-linearization: Another common practice is to take a log-linear approximation to the equilibrium. Such an approximation is usually taken because it delivers a natural interpretation of the coefficients in front of the variables: these can be interpreted as elasticities. Indeed, let's consider the following one-dimensional function f(x), and let's assume that we want to take a log-linear approximation of f around x*. This amounts to using, as the deviation, a log-deviation rather than a simple deviation, such that we define

x̂ = log(x) − log(x*)

Then a restatement of the problem is in order, as we are to take an approximation with respect to log(x):

f(x) = f(exp(log(x)))

which leads to the following first order Taylor expansion

f(x) ≃ f(x*) + f′(exp(log(x*))) exp(log(x*)) x̂ = f(x*) + f′(x*) x* x̂

If we apply this technique to the growth model, we end up with the following system

k̂_{t+1} = (1/β) k̂_t − [ (1 − β(1 − δ))/(αβ) − δ ] ĉ_t
−ĉ_t + ĉ_{t+1} − (α − 1)(1 − β(1 − δ)) k̂_{t+1} = 0
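These coefficients are easy to compute once the steady state is known. A minimal sketch, with assumed (purely illustrative) parameter values:

Matlab Code: Coefficients of the Log-linearized Model

alpha = 0.3; beta = 0.95; delta = 0.1;           % assumed parameter values
ks = ((1-beta*(1-delta))/(alpha*beta))^(1/(alpha-1));  % steady-state capital stock
cs = ks^alpha - delta*ks;                        % steady-state consumption
% khat(t+1) = a1*khat(t) + a2*chat(t)
a1 = 1/beta;
a2 = -((1-beta*(1-delta))/(alpha*beta) - delta); % equals -cs/ks
% chat(t) = chat(t+1) + a3*khat(t+1)
a3 = -(alpha-1)*(1-beta*(1-delta));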

3.2 Regressions as approximation

This type of approximation is particularly common in economics, as it just corresponds to ordinary least squares (OLS). As you know, the problem essentially amounts to approximating a function F by another function G of exogenous variables. In other words, we have a set of observable endogenous variables y_i, i = 1 . . . N, which we are willing to explain in terms of the set of exogenous variables X_i = {x_{1i}, . . . , x_{ki}}, i = 1 . . . N. This problem amounts to finding a set of parameters θ that solves the problem

min_{θ ∈ R^p} Σ_{i=1}^N (y_i − G(X_i; θ))²    (3.3)

The idea is therefore that θ is chosen such that, on average, G(X; θ) is close enough to y, such that G delivers a good approximation of the true function F in the region of x that we consider. This is actually nothing else than econometrics!

There are however several choices that can be made, and that give us much more freedom than in econometrics. We now investigate these choices.

Number of data points: Econometricians never have the choice of the points they can use to reveal information on the function F, as data are given by history. In numerical analysis this constraint is relaxed: we may for example choose to be exactly identified, meaning that the number of data points, N, is exactly equal to the number of parameters, p, we want to reveal. Never would a good econometrician do this kind of thing! Obviously, we can instead impose a situation where N > p in order to exploit more information. The difference between these two choices should be clear: in the first case we are sure that the approximation will be exact at the selected points, whereas this will not necessarily be the case in the second experiment. To see this, just think of the following: assume we have a sample made of 2 data points for a function that we want to approximate using a linear function. We actually need 2 parameters to define a linear function: the intercept α and the slope β. In such a case, (3.3) rewrites

min_{α,β} (y₁ − α − βx₁)² + (y₂ − α − βx₂)²

which yields the system of orthogonality conditions

(y₁ − α − βx₁) + (y₂ − α − βx₂) = 0
(y₁ − α − βx₁)x₁ + (y₂ − α − βx₂)x₂ = 0

which rewrites

[ 1  1 ; x₁  x₂ ] [ y₁ − α − βx₁ ; y₂ − α − βx₂ ] ≡ A v = 0

This system then just amounts to finding the null space of the matrix A which, in the case x₁ ≠ x₂ (which we can always impose, as we will see next), imposes v_i = 0, i = 1, 2. This therefore leads to

y₁ = α + βx₁
y₂ = α + βx₂

such that the approximation is exact. When the system is over-identified, this is not the case anymore.
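A two-line sketch, with hypothetical data, makes the point:

Matlab Code: Exactly Identified Linear Fit

x  = [0.5; 2.0]; y = [1.2; 0.7];      % assumed sample of 2 data points
ab = [ones(2,1) x]\y;                 % alpha = ab(1), beta = ab(2)
res = y - [ones(2,1) x]*ab;           % residuals are zero: the fit is exact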
Selection of data points: This is actually a major difference between econometrics and numerical approximation: we do control the space over which we want to take an approximation. In particular, we can spread the data points wherever we want in order to control information. For instance, let us consider the particular case of the function depicted in figure 3.1. As this function exhibits a kink at x*, it may be beneficial to concentrate a lot of points around x* in order to reveal as much information as possible on the kink.
[Figure 3.1: Selection of points. A function y = F(x) with a kink at x*, around which data points should be concentrated.]

Functional forms: One key issue in the selection of the approximating function is its functional form. In most cases, a sequence of monomials of different degrees is selected: X_i = {1, x_i, x_i², . . . , x_i^p}. One advantage of this


selection is basically simplicity. However, this often turns out to be a very bad choice, as many problems then turn out to be ill-conditioned. The most obvious objection we may formulate against this specification choice is related to the multicollinearity problem. Assume for instance that you want to approximate a production function that depends on employment and the capital stock. The capital stock is basically an extremely smooth variable as, by construction, it is essentially a smooth moving average of investment decisions:

k_{t+1} = i_t + (1 − δ)k_t ⟺ k_{t+1} = Σ_{ℓ=0}^∞ (1 − δ)^ℓ i_{t−ℓ}

Therefore, taking powers of the capital stock as a basis of exogenous variables is basically a bad idea. To see this, let us assume that δ = 0.025 and that i_t is a white noise process with volatility 0.1; then let us simulate a 1000 data point process for the capital stock and compute the correlation matrix of k_t^j, j = 1, . . . , 4. We get:
         k_t      k_t²     k_t³     k_t⁴
k_t      1.0000   0.9835   0.9572   0.9315
k_t²     0.9835   1.0000   0.9933   0.9801
k_t³     0.9572   0.9933   1.0000   0.9963
k_t⁴     0.9315   0.9801   0.9963   1.0000
implying a very high correlation between the powers of the capital stock, and therefore raising the possibility of multicollinearity. A typical answer to this problem is to rely on orthogonal polynomials rather than monomials; we will discuss this issue extensively in the next section. A second possibility is to rely on parsimonious approaches that do not require too much information in terms of function specification. An alternative of this kind is to use neural networks.
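A minimal sketch reproducing an experiment of this kind follows. The seed, the initial capital stock, and the positive mean given to investment (which keeps capital positive on average) are assumptions, so the exact figures will differ from the matrix above depending on the draw:

Matlab Code: Multicollinearity of Monomials

rng(0);                               % assumed seed, for reproducibility
T = 1000; delta = 0.025;
i = delta + 0.1*randn(T,1);           % white noise investment with volatility 0.1
k = ones(T,1);                        % assumed initial capital stock
for t = 1:T-1
    k(t+1) = i(t) + (1-delta)*k(t);
end
disp(corrcoef([k k.^2 k.^3 k.^4]))    % correlation matrix of k, k^2, k^3, k^4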
Before moving to neural network approximations, we first have to deal with a potential problem faced in all the examples given so far when we solve a model: the true decision rule is unknown, such that we do not know the function we are dealing with. However, the main properties of the decision rule are known; in particular, we know that it has to satisfy some conditions imposed by economic theory. As an example, let us attempt to find an approximate solution for the consumption decision rule in the deterministic optimal growth model. Economic theory teaches us that consumption should satisfy the Euler equation
c_t^{−σ} = β c_{t+1}^{−σ} (α k_{t+1}^{α−1} + 1 − δ)    (3.4)

knowing that the capital stock evolves as

k_{t+1} = k_t^α − c_t + (1 − δ) k_t

Let us assume that consumption may be approximated by

ς(k_t; θ) = exp( θ₀ + θ₁ log(k_t) + θ₂ log(k_t)² )    (3.5)

over the interval [k̲; k̄]. Our problem is then to find the triple θ = {θ₀, θ₁, θ₂} such that

Σ_{t=1}^N [ ς(k_t; θ)^{−σ} − β ς(k_{t+1}; θ)^{−σ} (α k_{t+1}^{α−1} + 1 − δ) ]²

is minimal. This actually amounts to solving a nonlinear least squares problem. However, a lot of structure is put on this problem, as k_{t+1} has to satisfy the law of motion for capital:

k_{t+1} = k_t^α − exp( θ₀ + θ₁ log(k_t) + θ₂ log(k_t)² ) + (1 − δ) k_t

The algorithm then works as follows:

1. Set a grid of N data points {k_i}_{i=1}^N for the capital stock over the interval [k̲; k̄], and an initial vector θ = {θ₀, θ₁, θ₂}.

2. For each k_i, i = 1, . . . , N, and given θ, compute

c_t = ς(k_t; θ)

and

k_{t+1} = k_t^α − ς(k_t; θ) + (1 − δ) k_t

3. Compute

c_{t+1} = ς(k_{t+1}; θ)

and the quantity

R(k_t; θ) ≡ ς(k_t; θ)^{−σ} − β ς(k_{t+1}; θ)^{−σ} (α k_{t+1}^{α−1} + 1 − δ)

4. If the quantity Σ_{i=1}^N R(k_i; θ)² is minimal, then stop; else update θ = {θ₀, θ₁, θ₂} and go back to 2. A minimal sketch of these steps is reported below.
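The following sketch implements the algorithm using fminsearch as the updating routine. The starting values are assumptions (a bad guess can imply negative capital along step 2, which is why the objective takes absolute values of the residuals):

Matlab Code: Euler Equation Residual Minimization

alpha = 0.3; beta = 0.95; delta = 0.1; sigma = 1.5;
ks  = ((1-beta*(1-delta))/(alpha*beta))^(1/(alpha-1));  % steady-state capital
k   = linspace(0.1*ks,1.9*ks,20)';                      % grid of 20 nodes (step 1)
cs  = @(kk,th) exp(th(1)+th(2)*log(kk)+th(3)*log(kk).^2);   % guess (3.5)
kp  = @(th) k.^alpha - cs(k,th) + (1-delta)*k;          % law of motion (step 2)
R   = @(th) cs(k,th).^(-sigma) ...
      - beta*cs(kp(th),th).^(-sigma).*(alpha*kp(th).^(alpha-1)+1-delta);  % step 3
obj = @(th) sum(abs(R(th)).^2);       % abs(.) guards against complex residuals
th0 = [log(0.6); 0.3; 0];             % assumed starting values
th  = fminsearch(obj,th0);            % step 4: update until the sum is minimal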


As an example, I computed the approximate decision rule for the deterministic optimal growth model with α = 0.3, β = 0.95, δ = 0.1 and σ = 1.5. I consider that the model may deviate up to 90% from its steady state capital stock (k̲ = 0.1k* and k̄ = 1.9k*) and used 20 data points. Figure 3.2 reports the approximate decision rule versus the true decision rule.⁴

⁴The "true" decision rule was computed using value iteration, which we shall study later.

As can be seen from the graph, the solution we obtain with our approximation is rather accurate, and we actually have

(1/N) Σ_{i=1}^N |c_i^{true} − c_i^{LS}| / c_i^{true} = 8.743369e−4

and

max_{i∈{1,...,N}} |c_i^{true} − c_i^{LS}| / c_i^{true} = 0.005140

such that the error we are making is particularly small. Indeed, using the approximate decision rule, an agent would make a maximal error of 0.51% in her economic calculus, i.e. 51 cents for each 100$ spent on consumption.
[Figure 3.2: Least-square approximation of consumption. The true decision rule and the LS approximation, plotting consumption against the capital stock.]
A neural network may be simply viewed as a particular type of function, flexible enough to fit fairly general functions. A neural network may be simply understood using the standard metaphor of the human brain. There is an input, x, which is processed by a node. Each node is actually a function that transforms the input into an output, which is then itself passed to another node. For instance, panel (a) of figure 3.3, borrowed from Judd [1998], illustrates the single-layer neural network, whose functional form is

G(x; β) = h( Σ_{i=1}^n β_i g(x_i) )

where x is the vector of inputs, and h and g are scalar functions. A common choice for g in the case of the single-layer neural network is g(x) = x. Therefore, if we set h to be the identity too, we are back to the standard OLS model with monomials.
[Figure 3.3: Neural Networks. Panel (a): a single-layer network mapping inputs x₁, . . . , x_n into the output y; panel (b): a hidden-layer feedforward network.]

A second and perhaps much more interesting type of neural network is depicted in panel (b) of figure 3.3. This type of neural network is called the hidden-layer feedforward network. In this specification, we are closer to the idea of a network, as the transformed input is fed to another node that processes it to extract information. The associated functional form is given by

G(x; β, γ) = f( Σ_{j=1}^m β_j h( Σ_{i=1}^n γ_{ij} g(x_i) ) )
In this case, h is called the hidden-layer activation function. h should be a "squasher" function, i.e. a monotonically non-decreasing function that maps R onto [0; 1]. Three very popular such functions are

1. the heaviside step function

h(x) = 1 for x > 0, 0 for x < 0

2. the sigmoid function

h(x) = 1 / (1 + exp(−x))

3. cumulative distribution functions, for example the normal cdf

h(x) = (1/√(2π)) ∫_{−∞}^x exp(−t²/2) dt
Obtaining an approximation then simply amounts to determining the set of coefficients {β_j, γ_{ij}; i = 1 . . . n, j = 1 . . . m}. This can be achieved by simply running nonlinear least squares, that is, solving

min_{β,γ} Σ_{ℓ=1}^N (y_ℓ − G(x_ℓ; β, γ))²

One particularly nice feature of neural networks is that they deliver accurate approximations using only few parameters. This characteristic is related to the high flexibility of the approximating functions they use. Further, as established by the following theorem by Hornik, Stinchcombe and White [1989], neural networks are extremely powerful in that they offer a universal approximation method.

Theorem 3 Let f : Rⁿ → R be a continuous function to be approximated. Let h be a continuous function, h : R → R, such that either (i) ∫ h(x)dx is finite and non-zero and h is L^p for 1 ⩽ p ⩽ ∞, or (ii) h is a squashing function (non-decreasing, lim_{x→∞} h(x) = 1, lim_{x→−∞} h(x) = 0). Let

Σⁿ(h) = { g : Rⁿ → R | g(x) = Σ_{j=1}^m β_j h(w_j · x + a_j), a_j, β_j ∈ R, w_j ∈ Rⁿ, w_j ≠ 0, m = 1, 2, . . . }

be the set of all possible single hidden-layer feedforward neural networks using h as the hidden-layer activation function. Then, for all ε > 0, every probability measure μ and every compact set K ⊂ Rⁿ, there is a g ∈ Σⁿ(h) such that

sup_{x∈K} |f(x) − g(x)| ⩽ ε and ∫ |f(x) − g(x)| dμ ⩽ ε

This theorem is of great importance, as it provides a universal approximation result: for a broad class of activation functions, neural networks deliver an accurate approximation to any continuous function. Indeed, we may use for instance any squashing function of the type we described above, or any simple function that satisfies condition (i). Nevertheless, one potential limitation of the approach lies in the fact that we have to conduct a nonlinear estimation, which may be cumbersome under certain circumstances.
As an example, we will deal with the function

F(x) = min( max(−3/2, (x − 1/2)³), 2 )

over the interval [−3; 3], and consider a single hidden-layer feedforward network of the form

F̃(x; β, γ, ν) = β₁/(1 + exp(γ₁ x + ν₁)) + β₂/(1 + exp(γ₂ x + ν₂))

The algorithm is then straightforward:

1. Generate N values for x ∈ [−3; 3] and compute F(x).

2. Set initial values for Θ₀ = {β_i, γ_i, ν_i; i = 1, 2}.

3. Compute

Σ_{i=1}^N (F(x_i) − F̃(x_i; Θ))²

if this quantity is minimal then stop, else update Θ and go back to 3 (a minimal sketch follows below).
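A minimal sketch of this algorithm, again relying on fminsearch; the starting values p0 are assumptions and do matter for convergence:

Matlab Code: Neural Network Fit

N  = 1000;
x  = linspace(-3,3,N)';
F  = min(max(-1.5,(x-0.5).^3),2);              % function to fit (step 1)
Fn = @(p,x) p(1)./(1+exp(p(2)*x+p(3))) ...
   + p(4)./(1+exp(p(5)*x+p(6)));               % two-node sigmoid network
p0 = [2; 5; -5; -2; -5; -5];                   % assumed starting values (step 2)
p  = fminsearch(@(p) sum((F-Fn(p,x)).^2),p0);  % step 3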


The last step can be performed using a nonlinear minimizer, such that we are actually performing nonlinear least squares. Figure 3.4 plots the approximation when N = 1000; the solution vector yields the values reported in table 3.2. Note that, from the form of the vector, we can deduce that the first layer handles positive values for x, therefore corresponding to the left part of the function, while the second layer takes care of the negative part of the function.

Table 3.2: Neural Network approximation

  β₁        γ₁        ν₁          β₂        γ₂        ν₂
  2.0277    6.8424   −10.0893    −1.5091   −7.6414   −2.9238

Further, it appears that the function is not so badly approximated by this simple neural network, as
E₂ = [ (1/N) Σ_{i=1}^N (F(x_i) − F̃(x_i; Θ̂))² ]^{1/2} = 0.0469

and

E_∞ = max_i |F(x_i) − F̃(x_i; Θ̂)| = 0.2200

[Figure 3.4: Neural Network Approximation. The true function and the neural network fit over [−3; 3].]

All these methods actually rely on regressions. They are simple, but may either be totally unreliable and ill-conditioned in a number of problems, or difficult to compute because they rely on nonlinear optimization. We will now consider methods which are actually more powerful and somehow simpler to implement. They however require introducing some important preliminary concepts, among which that of orthogonal polynomials.

3.2.1 Orthogonal polynomials

This class of polynomials possesses, by definition, the orthogonality property, which will prove to be extremely efficient and useful in a number of problems. For instance, it will solve the multicollinearity problem we encountered with OLS. This property will also greatly simplify the computation of the approximation in a number of problems we will deal with in the sequel. First of all, we need to introduce some preliminary concepts.

Definition 4 (Weighting function) A weighting function ω(x) on the interval [a; b] is a function that is positive almost everywhere on [a; b] and has a finite integral on [a; b].
An example of such a weighting function is ω(x) = (1 − x²)^{−1/2} over the interval [−1; 1]. Indeed, ω(x) > 0 for all x ∈ (−1; 1), and

∫_{−1}^1 (1 − x²)^{−1/2} dx = arcsin(x) |_{−1}^1 = π

Definition 5 (Inner product) Let us consider two functions f₁(x) and f₂(x), both defined at least on [a; b]. The inner product with respect to the weighting function ω(x) is given by

⟨f₁, f₂⟩ = ∫_a^b f₁(x) f₂(x) ω(x) dx

For example, assume that f₁(x) = 1, f₂(x) = x and ω(x) = (1 − x²)^{−1/2}; then the inner product over the interval [−1; 1] is

⟨f₁, f₂⟩ = ∫_{−1}^1 x/√(1 − x²) dx = −√(1 − x²) |_{−1}^1 = 0

Hence, in this case, we have that the inner product of 1 and x with respect to ω(x) on the interval [−1; 1] is identically null. This is precisely what defines the orthogonality property.
Definition 6 (Orthogonal Polynomials) The family of polynomials {P_n(x)} is mutually orthogonal with respect to ω(x) iff ⟨P_i, P_j⟩ = 0 for i ≠ j.

Definition 7 (Orthonormal Polynomials) The family of polynomials {P_n(x)} is mutually orthonormal with respect to ω(x) iff it is orthogonal and ⟨P_i, P_i⟩ = 1 for all i.

Table 3.3 reports the most common families of orthogonal polynomials (see Judd [1998] for a more detailed exposition) and table 3.4 their recursive formulation.
Table 3.3: Orthogonal polynomials (definitions)

Family      ω(x)               [a; b]      Definition
Legendre    1                  [−1; 1]     P_n(x) = ((−1)^n / (2^n n!)) (d^n/dx^n)[(1 − x²)^n]
Chebychev   (1 − x²)^{−1/2}    [−1; 1]     T_n(x) = cos(n cos^{−1}(x))
Laguerre    exp(−x)            [0; ∞)      L_n(x) = (exp(x)/n!) (d^n/dx^n)[x^n exp(−x)]
Hermite     exp(−x²)           (−∞; ∞)     H_n(x) = (−1)^n exp(x²) (d^n/dx^n)[exp(−x²)]

3.3 Least square orthogonal polynomial approximation

We will now discuss one of the most common approaches to approximation, which goes back to the easy OLS approach we have seen previously.

Table 3.4: Orthogonal polynomials (recursive representation)

Family      Degree 0     Degree 1        Recursion
Legendre    P₀(x) = 1    P₁(x) = x       P_{n+1}(x) = ((2n+1)/(n+1)) x P_n(x) − (n/(n+1)) P_{n−1}(x)
Chebychev   T₀(x) = 1    T₁(x) = x       T_{n+1}(x) = 2x T_n(x) − T_{n−1}(x)
Laguerre    L₀(x) = 1    L₁(x) = 1 − x   L_{n+1}(x) = ((2n+1−x)/(n+1)) L_n(x) − (n/(n+1)) L_{n−1}(x)
Hermite     H₀(x) = 1    H₁(x) = 2x      H_{n+1}(x) = 2x H_n(x) − 2n H_{n−1}(x)

Definition 8 Let F : [a; b] → R be a function we want to approximate, and g(x) a polynomial approximation of F. The least square polynomial approximation of F with respect to the weighting function ω(x) is the degree n polynomial that solves

min_{deg(g)⩽n} ∫_a^b (F(x) − g(x))² ω(x) dx

This equation may be understood adopting an econometric point of view, as ω(x) may be given the weighting matrix interpretation we find in GMM estimation. Indeed, ω(x) furnishes an indication of how much we care about approximation errors as a function of x. Therefore, setting ω(x) = 1, which amounts to putting the same weight on any x, corresponds to a simple OLS approximation.
Let us now assume that g(x) = Σ_{i=0}^n c_i φ_i(x), where {φ_k(x)}_{k=0}^n is a sequence of orthogonal polynomials. The least square problem rewrites

min_{{c_i}_{i=0}^n} ∫_a^b ( F(x) − Σ_{i=0}^n c_i φ_i(x) )² ω(x) dx

for which the first order conditions are given by

∫_a^b ( F(x) − Σ_{i=0}^n c_i φ_i(x) ) φ_j(x) ω(x) dx = 0 for j = 0, . . . , n

which yields

c_i = ⟨F, φ_i⟩ / ⟨φ_i, φ_i⟩

Therefore, we have

F(x) ≃ Σ_{i=0}^n ( ⟨F, φ_i⟩ / ⟨φ_i, φ_i⟩ ) φ_i(x)

There are several examples of the use of this type of approximation. Fourier approximation is one of them, suitable for periodic functions. Nevertheless, I will focus in a coming section on one particular approach, which we will use quite often in the next chapters: Chebychev approximation.

Beyond least square approximation, there exist other approaches that depart from least squares by the norm they use:

1. Uniform approximations, which attempt to achieve

lim_{n→∞} max_{x∈[a;b]} |F(x) − p_n(x)| = 0

The main difference between this approach and L² approximation is that, contrary to L² norms, which put no restriction on the approximation at particular points, uniform approximation requires the approximation to be good at every single x, whereas L² approximation just requires the total error to be as small as possible.

2. Minimax approximations, which rest on the L∞ norm, such that these approximations attempt to find the best uniform approximation to the function F; that is, we search for the degree n polynomial that achieves

min_{deg(g)⩽n} ‖F(x) − g(x)‖_∞

There are theorems in approximation theory stating that the quality of this approximation increases polynomially as the degree of the polynomial increases, and that the polynomial rate of convergence is faster for smoother functions. Nevertheless, even if this approximation can be as good as needed for F, no use of its derivatives is made, such that the derivatives may be very poorly approximated. Finally, such approximation methods are extremely difficult to compute.

3.4 Interpolation methods

Up to now, we have seen that there exist methods to compute the value of a function at some particular points. But in most cases we are not only interested in the function at some points but also between these points. This is the problem of interpolation.

3.4.1 Linear interpolation

This method is totally straightforward and known to everybody. Assume you have a collection of data points C = {(x_i, y_i) | i = 1, . . . , n}. Then for any x ∈ [x_{i−1}; x_i], we can compute y as

y = ((y_i − y_{i−1})/(x_i − x_{i−1})) x + (x_i y_{i−1} − x_{i−1} y_i)/(x_i − x_{i−1})
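A minimal sketch, with hypothetical data, that also makes visible the interval search mentioned in the list of drawbacks below:

Matlab Code: Linear Interpolation

xd = [0 1 2 4]; yd = [0 1 4 16];     % assumed data points
x  = 3.2;                            % evaluation point
i  = find(xd <= x,1,'last');         % locate the bracketing interval
y  = yd(i) + (yd(i+1)-yd(i))*(x-xd(i))/(xd(i+1)-xd(i));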

Although such approximations have proved to be very useful in a lot of applications, the method can be particularly inefficient for different reasons:

1. it does not deliver an approximating function but rather a collection of approximations, one for each interval;

2. it requires identifying the interval where the approximation is to be computed, which may be particularly costly when the grid is not uniform;

3. it can obviously perform very badly for nonlinear functions.

Therefore, there obviously exist alternative interpolation methods that perform better, but which are not necessarily more efficient.

3.4.2 Lagrange interpolation

This type of approximation is quite demanding, as it considers a collection of data C = {(x_i, y_i) | i = 1, . . . , n} with distinct x_i, called the Lagrange data. Lagrange interpolation amounts to finding a degree n − 1 polynomial P(x) such that y_i = P(x_i). Therefore, the method is exact at each point. The Lagrange interpolating polynomials are defined by

P_i(x) = Π_{j≠i} (x − x_j)/(x_i − x_j)

Note that P_i(x_i) = 1 and P_i(x_j) = 0 for j ≠ i. The interpolation is then given by

P(x) = Σ_{i=1}^n y_i P_i(x)

The obvious problem that we can immediately see from this formula is that if the number of points is high, this type of interpolation becomes totally intractable. Indeed, just computing a single P_i(x) already requires 2(n − 1) subtractions and n multiplications, and this has to be done for each of the n data points to construct all the needed P_i(x). Then, to compute P(x), we need n additions and n multiplications. Therefore, this requires about 3n² operations! One may therefore actually attempt to compute directly

P(x) = Σ_{i=0}^{n−1} α_i x^i

which may be obtained by solving the linear system

y₁ = α₀ + α₁ x₁ + α₂ x₁² + . . . + α_{n−1} x₁^{n−1}
y₂ = α₀ + α₁ x₂ + α₂ x₂² + . . . + α_{n−1} x₂^{n−1}
  ⋮
y_n = α₀ + α₁ x_n + α₂ x_n² + . . . + α_{n−1} x_n^{n−1}

or

A α = y

where α = {α₀, α₁, . . . , α_{n−1}}′ and A is the so-called Vandermonde matrix for the x_i, i = 1, . . . , n:

A = [ 1  x₁   x₁²   · · ·  x₁^{n−1}
      1  x₂   x₂²   · · ·  x₂^{n−1}
      ⋮   ⋮    ⋮     ⋱       ⋮
      1  x_n  x_n²  · · ·  x_n^{n−1} ]
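In Matlab, this system can be set up and solved in a few lines; a minimal sketch with hypothetical data:

Matlab Code: Interpolation via the Vandermonde System

xd = [-1; 0; 0.5; 2]; yd = exp(xd);   % assumed Lagrange data (distinct nodes)
n  = length(xd);
A  = xd.^(0:n-1);                     % Vandermonde matrix
th = A\yd;                            % coefficients alpha_0 ... alpha_{n-1}
x  = 1.3;
P  = (x.^(0:n-1))*th;                 % evaluate the interpolating polynomial at x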

If the Lagrange formula guarantees the existence of the interpolation, the following theorem guarantees its uniqueness.

Theorem 4 Provided the interpolating points are distinct, there is a unique solution to the Lagrange interpolation problem.

The proof of the theorem may be sketched as follows. Assume that, besides P(x), there exists another interpolating polynomial P*(x) of degree at most n − 1 that also interpolates the n data points. Then, by construction, P(x) − P*(x) is at most of degree n − 1 and is zero at all n nodes. But the only polynomial of degree n − 1 that has n zeros is 0, such that P(x) = P*(x). Here again, the method may be quite demanding from a computational point of view, as we have to invert a matrix of size (n × n). There exist much more efficient methods, in particular the so-called Chebychev approximation, which works very well for smooth functions.

3.5 Chebychev approximation

Chebychev approximation uses . . . Chebychev polynomials as a basis (see figure 3.5). These polynomials are described by the recursion

T_{n+1}(x) = 2x T_n(x) − T_{n−1}(x) with T₀(x) = 1, T₁(x) = x

which admits as solution

T_n(x) = cos(n cos^{−1}(x))

As aforementioned, these polynomials form an orthogonal basis with respect to the weighting function ω(x) = (1 − x²)^{−1/2} over the interval [−1; 1]. Nevertheless, this interval may be generalized to [a; b] by transforming the data using the formula

x = 2 (y − a)/(b − a) − 1 for y ∈ [a; b]

Beyond the standard orthogonality property, Chebychev polynomials exhibit a discrete orthogonality property, which writes

Σ_{k=1}^n T_i(r_k) T_j(r_k) = 0 for i ≠ j; n for i = j = 0; n/2 for i = j ≠ 0

where the r_k, k = 1, . . . , n, are the roots of T_n(x) = 0.
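This property is easy to verify numerically; a minimal sketch:

Matlab Code: Discrete Orthogonality Check

n = 8;
r = cos((2*(1:n)'-1)*pi/(2*n));       % roots of T_n
T = cos(acos(r)*(0:n-1));             % T_i(r_k), i = 0,...,n-1, stacked in columns
disp(T'*T)                            % diagonal: n, n/2, ..., n/2; off-diagonal: ~0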

[Figure 3.5: Chebychev polynomials T_n(x) for n = 1, 2, 3, 4 over [−1; 1].]

A first theorem states the usefulness of Chebychev approximation.

Theorem 5 Assume F is a C^k function over the interval [−1; 1], and let

c_i = (2/π) ∫_{−1}^1 F(x) T_i(x)/√(1 − x²) dx for i = 0 . . . n

and

g_n(x) ≡ c₀/2 + Σ_{i=1}^n c_i T_i(x)

Then there exists ε < ∞ such that for all n ⩾ 2

‖F(x) − g_n(x)‖_∞ ⩽ ε log(n)/n^k

This theorem is of great importance, as it actually states that the approximation g_n(x) will be as close as we might want to F(x) as the degree of approximation n increases to ∞. In effect, since the approximation error is bounded from above by ε log(n)/n^k, and since the latter expression tends to zero as n tends toward ∞, g_n(x) converges uniformly to F(x) as n increases. Further, the next theorem establishes a useful property of the coefficients of the approximation.

Theorem 6 Assume F is a C^k function over the interval [−1; 1] and admits a Chebychev representation

F(x) = c₀/2 + Σ_{i=1}^∞ c_i T_i(x)

Then there exists c such that

|c_i| ⩽ c/i^k for i ⩾ 1

This theorem is particularly important, as it states that the smoother the function to be approximated (the greater k), the faster the pace at which the coefficients drop off. In other words, we will be able to achieve a high enough accuracy using fewer coefficients.
At this point, we have established that Chebychev approximation can be accurate for smooth functions, but we still do not know how to proceed to get a good approximation. In particular, a very important issue is the selection of the interpolating data points, the so-called nodes. This is the main problem of interpolation: how do we select nodes such that the interpolation error is minimized? The answer to this question is particularly simple in the case of Chebychev interpolation: the nodes should be the zeros of the n-th degree Chebychev polynomial.

We are then endowed with the data points needed to compute the approximation. Using m ⩾ n data points, we can compute the degree (n − 1) Chebychev approximation relying on the Chebychev regression algorithm, which we now describe in detail. When m = n, the algorithm reduces to the so-called Chebychev interpolation formula. Let us consider the following problem: let F : [a; b] → R; we construct a degree n ⩽ m polynomial approximation of F on [a; b]:

G(x) ≡ Σ_{i=0}^n α_i T_i( 2(x − a)/(b − a) − 1 )

1. Compute m ⩾ n + 1 Chebychev interpolation nodes on [−1; 1], which are the roots of the degree m Chebychev polynomial:

r_k = cos( (2k − 1)π/(2m) ) for k = 1 . . . m

2. Adjust the nodes, r_k, to fit the [a; b] interval:

x_k = (r_k + 1)(b − a)/2 + a for k = 1 . . . m

3. Evaluate the function F at each approximation node x_k to get a collection of ordinates:

y_k = F(x_k) for k = 1 . . . m

4. Compute the collection of n + 1 coefficients α = {α_i; i = 0 . . . n} as

α_i = Σ_{k=1}^m y_k T_i(r_k) / Σ_{k=1}^m T_i(r_k)²

5. Form the approximation

G(x) ≡ Σ_{i=0}^n α_i T_i( 2(x − a)/(b − a) − 1 )

Note that step 4 can actually be interpreted in terms of an OLS problem as, because of the orthogonality property of the Chebychev polynomials, α_i is given by

α_i = cov(y, T_i(r)) / var(T_i(r))

which may be recast in matrix notation as

α = (X′X)^{−1} X′Y

where

X = [ T₀(r₁)   T₁(r₁)   · · ·  T_n(r₁)
      T₀(r₂)   T₁(r₂)   · · ·  T_n(r₂)
        ⋮        ⋮        ⋱       ⋮
      T₀(r_m)  T₁(r_m)  · · ·  T_n(r_m) ]   and   Y = [ y₁ ; y₂ ; . . . ; y_m ]

We now report two examples implementing the algorithm. The first one deals with a smooth function of the type F(x) = x^κ. The second one evaluates the accuracy of the approximation in the case of a non-smooth function: F(x) = min(max(−1.5, (x − 1/2)³), 2).

In the case of the smooth function, we set κ = 0.1 and approximate the function over the interval [0.01; 2]. We select 100 nodes and evaluate the accuracy of the degree 2 and degree 6 approximations. Figure 3.6 reports the true function and the corresponding approximations, and table 3.5 reports the coefficients. As can be seen from the table, adding terms to the approximation does not alter the coefficients of lower degree. This just reflects the orthogonality property of the Chebychev polynomials that we saw in the formula determining each α_i. This is of great importance, as it states that once we have obtained a high order approximation, obtaining lower orders is particularly simple. This is the economization principle. Further, as can be seen from the figure, a good approximation to the function is obtained at rather low degrees: the difference between the function and its approximation is already small at order 6.

Table 3.5: Chebychev Coefficients: Smooth function

       n=2        n=6
c₀     0.9547     0.9547
c₁     0.1567     0.1567
c₂    −0.0598    −0.0598
c₃                0.0324
c₄               −0.0202
c₅                0.0136
c₆               −0.0096

[Figure 3.6: Smooth function F(x) = x^0.1. Left panel: the true function and its approximations for n = 2 and n = 6; right panel: the corresponding residuals.]

Matlab Code: Smooth Function Approximation

m  = 100;                          % number of nodes
n  = 6;                            % degree of polynomials
rk = -cos((2*(1:m)'-1)*pi/(2*m));  % roots of the degree m Chebychev polynomial
a  = 0.01;                         % lower bound of interval
b  = 2;                            % upper bound of interval
xk = (rk+1)*(b-a)/2+a;             % nodes mapped into [a;b]
Y  = xk.^0.1;                      % function evaluated at the nodes
%
% Build the Chebychev polynomials, evaluated at the roots rk
%
Tx      = zeros(m,n+1);
Tx(:,1) = ones(m,1);
Tx(:,2) = rk;
for i = 3:n+1
    Tx(:,i) = 2*rk.*Tx(:,i-1)-Tx(:,i-2);
end
%
% Chebychev regression
%
alpha = Tx\Y;                      % approximation coefficients
G     = Tx*alpha;                  % approximation at the nodes

In the case of the non-smooth function we consider, F(x) = min(max(−1.5, (x − 1/2)³), 2), the coefficients remain large even at degree 15, and the residuals remain high at order 15. This actually indicates that Chebychev approximations are well suited for smooth functions, but have more difficulty capturing kinks. Nevertheless, by increasing the order of the approximation drastically, we can achieve a much better approximation.

In the latter case, it seems that a piecewise approximation would perform better. Indeed, in this case, we may compute 3 approximations:

for x ∈ (−∞, x̲), G(x) = −1.5
for x ∈ (x̲, x̄), G(x) ≡ Σ_{i=0}^n α_i T_i( 2(x − a)/(b − a) − 1 )
for x ∈ (x̄, ∞), G(x) = 2

where x̲ is such that

(x̲ − 1/2)³ = −1.5

Table 3.6: Chebychev Coefficients: Non-smooth function

        n=3        n=7        n=15
c₀     −0.0140    −0.0140    −0.0140
c₁      2.0549     2.0549     2.0549
c₂      0.4176     0.4176     0.4176
c₃     −0.3120    −0.3120    −0.3120
c₄                −0.1607    −0.1607
c₅                −0.0425    −0.0425
c₆                −0.0802    −0.0802
c₇                 0.0571     0.0571
c₈                            0.1828
c₉                            0.0275
c₁₀                          −0.1444
c₁₁                          −0.0686
c₁₂                           0.0548
c₁₃                           0.0355
c₁₄                          −0.0012
c₁₅                           0.0208

[Figure 3.7: Non-smooth function F(x) = min(max(−1.5, (x − 1/2)³), 2). Left panel: the true function and its approximations for n = 3, 7, 15; right panel: the corresponding residuals.]

and x̄ satisfies

(x̄ − 1/2)³ = 2

In such a case, the approximation would be perfect with n = 3. This suggests that piecewise approximation may be of interest in a number of cases.

3.6 Piecewise interpolation

We have actually already seen a piecewise approximation method: the linear interpolation method. But there exist more powerful and efficient methods that use splines. A spline can be any smooth function that is piecewise polynomial, but most of all it should be smooth at the nodes.

Definition 9 A function S(x) on an interval [a; b] is a spline of order n if

1. S(x) is a C^{n−2} function on [a; b],

2. there exists a collection of ordered nodes a = x₀ < x₁ < . . . < x_m = b such that S(x) is a polynomial of degree n − 1 on each interval [x_i; x_{i+1}], i = 0, . . . , m − 1.

Examples of spline functions are:

1. Cubic splines, which are splines of order 4. These splines are the most popular and are of the form

S_i(x) = a_i + b_i(x − x_i) + c_i(x − x_i)² + d_i(x − x_i)³ for x ∈ [x_i; x_{i+1}]

2. B⁰-splines, which are splines of order 1:

B_i⁰(x) = 0 for x < x_i; 1 for x_i ⩽ x ⩽ x_{i+1}; 0 for x > x_{i+1}

3. B¹-splines, which are splines of order 2 that actually describe tent functions:

B_i¹(x) = 0 for x < x_i;
          (x − x_i)/(x_{i+1} − x_i) for x_i ⩽ x ⩽ x_{i+1};
          (x_{i+2} − x)/(x_{i+2} − x_{i+1}) for x_{i+1} ⩽ x ⩽ x_{i+2};
          0 for x > x_{i+2}

Such a spline reaches a peak at x = x_{i+1} and is upward (downward) sloping for x < x_{i+1} (x > x_{i+1}). Higher order spline functions are defined by the recursion

B_i^n(x) = ((x − x_i)/(x_{i+n} − x_i)) B_i^{n−1}(x) + ((x_{i+n+1} − x)/(x_{i+n+1} − x_{i+1})) B_{i+1}^{n−1}(x)

Cubic splines are the most widely used splines to interpolate functions. We therefore describe the method in greater detail for this case.

Let us assume that we are endowed with Lagrange data, i.e. a collection of nodes x_i and corresponding values y_i = F(x_i) of the function to interpolate: {(x_i, y_i) : i = 0 . . . n}. We therefore have in hand n intervals [x_i; x_{i+1}], i = 0, . . . , n − 1, for each of which we search for a cubic spline

S_i(x) = a_i + b_i(x − x_i) + c_i(x − x_i)² + d_i(x − x_i)³ for x ∈ [x_i; x_{i+1}]

The problem is then to select the 4n coefficients {a_i, b_i, c_i, d_i : i = 0, . . . , n − 1} using the n + 1 nodes. We therefore need 4n identification conditions.

The first set of restrictions is given by the collection of restrictions imposing that the spline approximation is exact at the nodes:

S_i(x_i) = y_i for i = 0, . . . , n − 1 and S_{n−1}(x_n) = y_n

which amounts to imposing

a_i = y_i for i = 0, . . . , n − 1    (3.6)

and

a_{n−1} + b_{n−1}(x_n − x_{n−1}) + c_{n−1}(x_n − x_{n−1})² + d_{n−1}(x_n − x_{n−1})³ = y_n    (3.7)

The second set of restrictions imposes continuity of the function at the interior nodes:

S_i(x_i) = S_{i−1}(x_i) for i = 1, . . . , n − 1

which implies, noting h_i = x_i − x_{i−1},

a_i = a_{i−1} + b_{i−1} h_i + c_{i−1} h_i² + d_{i−1} h_i³ for i = 1, . . . , n − 1    (3.8)

This furnishes 2n identification restrictions, such that 2n additional restrictions are still needed. Since we are dealing with cubic spline interpolation, the approximation is required to be C², implying that first and second order derivatives should be continuous. This yields the following n − 1 conditions for the first order derivatives:

S_i′(x_i) = S_{i−1}′(x_i) for i = 1, . . . , n − 1

or

b_i = b_{i−1} + 2c_{i−1} h_i + 3d_{i−1} h_i² for i = 1, . . . , n − 1    (3.9)

and the additional n − 1 conditions for the second order derivatives:

S_i″(x_i) = S_{i−1}″(x_i) for i = 1, . . . , n − 1

or

2c_i = 2c_{i−1} + 6d_{i−1} h_i for i = 1, . . . , n − 1    (3.10)

Equations (3.6)(3.10) therefore define a system of 4n 2 equations, such that

we are left with 2 degrees of freedom. Hence, we have to impose 2 additional


conditions. There are several ways to select such conditions
1. Natural cubic splines impose that the second order derivatives S000 (x0 ) =
Sn00 (xn ) = 0. Note that the latter is actually not to be calculated in our
34

problem, nevertheless this imposes conditions on both c0 and cn which


will be useful in the sequel. In fact it imposes
c0 = c n = 0
An interpretation of this condition is that the cubic spline is represented
by the tangent of S at x0 and xn
2. Another way to fix S(x) would be to use potential information on the
slope of the function to be approximated. In other words, one may set
0
S00 (x0 ) = F 0 (x0 ) and Sn1
(xn ) = F 0 (xn )

This is the socalled Hermite spline. However, in a number of situation such information on the derivative of F is either not known or does
not exist (think of F not being differentiable at some points), such that
further source of information is needed. One can then rely on an approximation of the slope by the secant line. This is what is proposed
by thesecant Hermite spline, which amounts to approximate F 0 (x0 ) and
F 0 (xn ) by the secant line over the corresponding interval:
S00 (x0 ) =

S0 (x1 ) S0 (x0 )
Sn1 (xn ) Sn1 (xn1 )
0
and Sn1
(xn ) =
x1 x 0
xn xn1

But from the identification scheme, we have S0 (x1 ) = S1 (x1 ) = y1 and


Sn1 (xn ) = yn , such that we get
b0 = (y1 y0 )/h1 and bn1 = (yn yn1 )/hn
Let us now focus on the natural cubic spline approximation, which imposes
c0 = cn = 0. First, note that the system (3.6)(3.9) has a recursive form, such
that from (3.9) we can get
di1 =

1
(ci ci1 ) for i = 1, . . . , n 1
3hi

Plugging this results in (3.9),we get


bi b1i1 = 2ci1 hi + (ci ci1 )hi = (ci + ci1 )hi for i = 1, . . . , n 1
35

and, (3.8) becomes


1
ai ai1 = bi1 hi + ci1 h2i + (ci ci1 )h2i
3
1
= bi1 hi + (ci + 2ci1 )h2i for i = 1, . . . , n 1
3
which we may rewrite as
ai ai1
1
= bi1 + (ci + 2ci1 )hi for i = 1, . . . , n 1
hi
3
Likewise, we have
1
ai+1 ai
= bi + (ci+1 + 2ci )hi+1 for i = 0, . . . , n 2
hi+1
3
substracting the last two equations, when defined, we get
1
1
ai+1 ai ai ai1

= bi bi1 + (ci+1 + 2ci )hi+1 (ci + 2ci1 )hi


hi+1
hi
3
3
for i = 1, . . . , n 2, which is then given, taking (3.6) and (3.7) into account,

by

3
hi+1

(yi+1 yi )

3
(yi yi1 ) = hi ci1 + 2(hi + hi+1 )ci + hi+1 ci+1
hi

for i = 1, . . . , n 2. We however have the additional n 1th identification

restriction that imposes c0 = 0 and the last restriction cn = 0 We therefore


endup with a system of the form
Ac = B
where

2(h0 + h1 )
h1

h1
2(h1 + h2 )
h2

h
2(h
+ h3 )
h3
2
2

A=
.
..

hn3
2(hn3 + hn2 )
hn2
hn2
2(hn2 + hn1 )
36

c1

c = ... and B =
cn1

3
h1 (y2
3
hn1 (yn

y1 )
..
.

yn1 )

3
h0 (y1

y0 )

3
hn2 (yn1

yn2 )

The matrix A is then said to be tridiagonal (and therefore sparse) and is also
symmetric and elementwise positive. It is hence positive definite and therefore
invertible. We then got all the ci , i = 1, . . . , n 1 and can compute the bs
and ds as
bi1 =

yn yn1 2cn1
yi yi1 1
(ci +2ci1 )hi for i = 1, . . . , n1 and bn1 =

hi
3
hn
3hn

and

cn1
1
(ci ci1 ) for i = 1, . . . , n 1 and dn1 =
3hi
3hn
finally we have had ai = yi , i = 0, . . . , n 1 from the very beginning.
di1 =

Once the approximation is obtained, the evaluation of the approximation

has to be undertaken. The only difficult part in this job is to identify the interval the value of the argument of the function we want to evaluate belongs to
i.e. we have to find i {0, . . . , n 1} such that x [xi , xi+1 ]. Nevertheless,
as long as the nodes are generated using an invertible formula, there will be no

cost to determine the interval. Most of the time, a uniform grid is used, such
that the interval [a; b] is divided using the linear scheme xi = a + i, where
= (b a)/(n 1), and i = 0, . . . , n 1. In such a case, it is particularly

simple to determine the interval as i is given by E[(x a)/]. Nevertheless


there are some cases where it may be efficient to use nonuniform grid. For

instance, in the case of the function we consider it would be useful to consider

the following simple 4 nodes grid {3, 0.5 3 1.5, 0.5 + 3 2, 3}, as taking this

grid would yield a perfect approximation (remember that the central part of
the function is cubic!)

As an example of spline approximation, figure 3.8 reports the spline approximation to the nonsmooth function F (x) = min(max(1.5, (x1/2) 3 ), 2)
considering a uniform grid over the [-3;3] interval with 3, 7 and 15 nodes. In
37

3
2
1
0

Figure 3.8: Cubic spline approximation


Approximation
Residuals
1.5
True
n=3
n=3
n=7
1
n=7
n=15
n=15
0.5
0

1
2
4

0
x

0.5
4

0
x

order to gauge the potential of spline approximation, we report in the upper


panel of figure 3.9 the L2 and L error of approximation. The L2 approximation error is given by kF (x)S(x)k while the L is given by max |F (x)S(x)|.
It clearly appears that increasing the number of nodes improves the approxi-

mation in that the error is driven to zero. Nevertheless, it also appears that
convergence is not monotonic in the case of the L error. This is actually related to the fact that F , in this case is not even C 1 on the overall interval. In
fact, as soon as we consider a smooth function this convergence is monotonic,
as can be seen from the lower panel that report it for the function F (x) = x 0.1
over the interval [0.01;2]. This actually illustrates the following result.
Theorem 7 Let F be a C 4 function over the interval [x0 ; xn ] and S its cubic
spline approximation on {x0 , x1 , . . . , xn } and let > maxi {xi xi1 }, then
kF Sk 6
and
0

kF S k

5
kF (4) k 4
384

p
9 + (3) (4)
6
kF k 3
216

This theorem actually gives upper bounds to spline approximation, and indicates that these bounds decrease at a fast pace (power of 4) as the number of
38

Figure 3.9: Approximation errors


F (x) = min(max(1.5, (x 1/2)3 ), 2) over [3; 3]

L2 error

0.03

0.8

L error

0.6

0.02

0.4
0.01

0.2

0
0

20
40
# of nodes

0
0

60

20
40
# of nodes

60

F (x) = x0.1 over [0.01; 2]


3

x 10

L2 error

0.2

L error

0.15

0.1
1
0
0

0.05
20
40
# of nodes

0
0

60

39

20
40
# of nodes

60

nodes increases (as diminishes). Splines are usually viewed as a particularly


good approximation method for two main reasons:
1. A good approximation may be achieved even for functions that are not
C or that do not possess high order derivatives. Indeed, as indicated in
theorem 7, the error term basically depends only on fourth order derivatives, such that even if the fifth order derivative were badly behaved then
an accurate approximation may be obtained.
2. Evaluation of splines is particularly cheap as they involve most of the
time at most cubic polynomials, the only costly part being the interval
search step.
nbx
a
b
dx
x
y

=
=
=
=
=
=

Matlab Code: Cubic Spline Approximation


8;
% number of nodes
-3;
% lower bound of interval
3;
% upper bound of interval
(b-a)/(n-1);
% step in the grid
[a:dx:b];
% grid points
min(max(-1.5,(x(i)-0.5)^3),2);

A
= spalloc((nbx-2),(nbx-2),3*nbx-8);
% creates sparse matrix A
B
= zeros((nbx-2),1);
% creates vector B
A(1,[1 2])=[2*(dx+dx) dx];
for i=2:nbx-3;
A(i,[i-1 i i+1])=[dx 2*(dx+dx) dx];
B(i)=3*(y(i+2)-y(i+1))/dx-3*(y(i+1)-y(i))/dx;
end
A(nbx-2,[nbx-3 nbx-2])=[dx 2*(dx+dx)];
c
a
b
d
S

=
=
=
=
=

[0;A\B];
y(1:nbx-1);
(y(2:nbx)-y(1:nbx-1))/dx-dx*([c(2:nbx-1);0]+2*c(1:nbx-1))/3;
([c(2:nbx-1);0]-c(1:nbx-1))/(3*dx);
[a;b;c(1:nbx-1);d];
% Matrix of spline coefficients

One potential problem that may arise with the type of method we have
developed until now is that we have not imposed any particular restriction on
the shape of the approximation relative to the true function. This may be of
great importance in some cases. Let us assume for instance that we need to
approximate the function F (xt ) that characterizes the dynamics of variable x
40

in the following backward looking dynamic equation:


xt+1 = F (xt )
Assume F is a concave function that is costly to compute, such that it is beneficial to approximate the function. However, as we have already seen from the
previous examples, many methods generate oscillations in the approximation.
This can create some important problems as it implies that the approximation
is not strictly concave, which is in turn crucial to characterize the dynamics of
variable x. Further, the approximation of a strictly increasing function may be
locally decreasing. All this may create some divergent path, or even generate
some spurious steady state, and therefore spurious dynamics. It is therefore
crucial to develop shape preserving methods preserving in particular the
curvature and monotonicity properties for such cases.

3.7

Shape preserving approximations

In this section, we will see an approximation method that preserves the shape
of the function we want to approximate. This method was proposed by Schumaker [1983] and essentially amounts to exploit some information on both
the level and the slope of the function to be approximated to build a smooth
approximation. We will deal with two situations. The first one Hermite
interpolation assumes that we have information on both the level and the
slope of the function to approximate. The second one that uses Lagrange
data assumes that no information on the slope of the function is available.
Both method was originally developed using quadratic splines.

3.7.1

Hermite interpolation

This method assumes that we have information on both the level and the
slope of the function to be approximated. Assume we want to approximate
the function F on the interval [x1 , x2 ] and we know yi = F (xi ) and zi = F 0 (xi ),
i = 1, 2. Schumaker proposes to build a quadratic function S(x) on [x1 ; x2 ]
41

that satisfies
S(xi ) = yi and S 0 (xi ) = zi for i = 1, 2
Schumaker establishes first that
Lemma 1 If

then the quadratic form

z1 + z 2
y2 y 1
=
2
x2 x 1

S(x) = y1 + z1 (x x1 ) +

z2 z 1
(x x1 )2
2(x2 x1 )

satisfies S(xi ) = yi and S 0 (xi ) = zi for i = 1, 2.

The construction of this function is rather appealing. If z1 and z2 have the


same sign then S 0 (x) has the same sign as z1 and z2 over [x1 ; x2 ]:
S 0 (x) = z1 +

(z2 z1 )
(x x1 )
(x2 x1 )

Hence, if F is monotically increasing (decreasing) on the interval [x1 ; x2 ], so is


S(x). Further, z1 > z2 (z1 < z2 ) indicates concavity (convexity), which S(x)
satisfies as S 00 (x) = (z2 z1 )/(x2 x1 ) < 0 (> 0).

However, the conditions stated by this lemma are extremely stringent and

do not usually apply, such that we have to adapt the procedure. This may be
done by adding a node between x1 and x2 and construct another spline that
satisfies the lemma.
Lemma 2 For every x? (x1 , x2 ) there exist a unique quadratic spline that
solves

S(xi ) = yi and S 0 (xi ) = zi for i = 1, 2


with a node at x? . This spline is given by

01 + 11 (x x1 ) + 21 (x x1 )2
S(x) =
02 + 12 (x x? ) + 22 (x x? )2
where

where z =

01 = y1
02 = y1 +

for x [x1 ; x? ]
for x [x? ; x2 ]

11 = z1 21 =
z+z1
?
22 =
2 (x x1 ) 12 = z

2(y2 y1 )(z1 (x? x1 )+z2 (x2 x? )


x2 x1

42

zz1
2(x? x1 )
z2 z
2(x2 x? )

If the later lemma fully characterized the quadratic spline, it gives no information on x? which therefore remains to be selected. x? will be set such that the
spline matches the desired shape properties. First note that if z1 and z2 are
both positive (negative), then S(x) is monotone if and only if z1 z > 0 (6 0)
which is actually equivalent to
2(y2 y1 ) R (x? x1 )z1 + (x2 x? )z2 if z1 , z2 R 0
This essentially deals with the monotonicity problem, and we now have to
tackle the question of curvature. To do so, we compute the slope of the secant
line between x1 and x2
=

y2 y 1
x2 x 1

Then, if (z2 )(z1 ) > 0, this indicates the presence of an inflexion point

in the interval [x1 ; x2 ] such that the interpolant cannot be neither convex nor
concave. Conversely, if |z2 | < |z1 | and x? satisfies
x1 < x ? 6 x x 1 +

2(x2 x1 )(z2 )
(z2 z1 )

then S(x), as described in the latter lemma, is convex (concave) if z1 < z2


(z1 > z2 ). Further, if z1 z2 > 0 it is also monotone.
If, on the contrary, |z2 | > |z1 | and x? satisfies
x x2 +

2(x2 x1 )(z1 )
6 x? < x2
(z2 z1 )

then S(x), as described in the latter lemma, is convex (concave) if z1 < z2


(z1 > z2 ).
This therefore endow us with a range of values for x? that will insure that
shape properties will be preserved.
1. Check if lemma 1 is satisfied. If so set x? = x2 and set S(x) as in lemma
2. Then stop else go to 2.
2. Compute = y2 y1 /x2 x1
43

3. if (z1 )(z2 ) > 0 set x? = (x1 + x2 )/2 and stop else goto 4.
4. if |z1 | < |z2 | set x? = (x1 + x)/2 and stop else goto 5.
5. if |z1 | > |z2 | set x? = (x2 + x)/2 and stop.
We have then in hand a value for x? for [x1 ; x2 ]. We then apply it to each
subinterval to get x?i [xi ; xi+1 ] and then solve the general interpolation
problem as explained in lemma 2.

Note here that everything assumes that with have Hermite data in hand
i.e. {xi , yi , zi : i = 0, . . . , n}. However, the knowledge of the slope is usually
not the rule and we therefore have to adapt the algorithm to such situations.

3.7.2

Unknown slope: back to Lagrange interpolation

Assume now that we do not have any data for the slope of the function, that
is we are only endowed with Lagrange data {xi , yi : i = 0, . . . , n}. In such a

case, we just have to add the needed information an estimate of the slope of

the function and proceed exactly as in Hermite interpolation. Schumaker


proposes the following procedure to get {zi ; i = 1, . . . , n}. Compute

and

1
Li = (xi+1 xi )2 + (yi+1 yi )2 2
i =

yi+1 yi
xi+1 xi

for i = 1, . . . , n 1. Then zi , i = 1, . . . , n can be recovered as

Li1 i1 + Li i
if i1 i > 0
zi =
i = 2, . . . , n 1
Li1 + Li
0
if i1 i 6 0

and

z1 =

3n1 sn1
31 z2
and zn =
2
2

Then, we just apply exactly the same procedure as described in the previous
section.
44

Up to now, all methods we have been studying are unidimensional whereas


most of the model we deal with in economics involve more than 1 variable.
We therefore need to extend the analysis to higher dimensional problems.

3.8

Multidimensional approximations

Computing a multidimensional approximation to a function may be quite cumbersome and even impossible in some cases. To understand the problem, let
us restate an example provided by Judd [1998]. Consider we have data points
{P1 , P2 , P3 , P4 } = {(1, 0), (1, 0), (0, 1), (0, 1)} in R2 and the corresponding

data zi = F (Pi ), i = 1, . . . , 4. Assume now that we want to construct the approximation of function F using a linear combination of {1, x, y, xy} defined

as

G(x, y) = a + bx + cy + dxy
such that G(xi , yi ) = zi . Finding a, b, c, d amounts to solve the linear system

1
1
0 0
a
z1
1 1

0 0

b = z2
1

0
1 0
z3
c
z4
1
0 1 0
d

which is not feasible as the matrix is not full rank.


This example reveals two potential problems:

1. Approximation in higher dimensional systems involves crossproduct


and therefore poses the problem of the selection of polynomial basis
to be used for approximation,
2. More important is the selection of the grid of nodes used to evaluate the
function to compute the approximation.
We now investigate these issues, by first considering the simplest way to
attack the question namely considering tensor product bases and then
moving to a second way of dealing with this problem considering complete
polynomials. In each case, we explain how Chebychev approximations can be
obtained.
45

3.8.1

Tensor product bases

The idea here is to use the tensor product of univariate functions to form a
basis of multivariate functions. In order to better understand this point, let
us consider that we want to approximate a function F : R2 R using simple

univariate monomials up to order 2: X = {1, x, x2 } and Y = {1, y, y 2 }. The


tensor product basis is given by

{1, x, y, xy, x2 , y 2 , x2 y, xy 2 , x2 y 2 }
i.e. all possible 2terms products of elements belonging to X and Y .
We are now in position to define the nfold tensor product basis for functions of n variables {x1 , . . . , xi , . . . , xn }.
Definition 10 Given a basis for n functions of the single variable xi : Pi =
i
{pki (xi )}k=0
then the tensor product basis is given by

1
n

Y
Y
...
B=
pk11 (x1 ) . . . pknn (xn )

k1 =0

kn =0

An important problem with this type of tensor product basis is their size.
For example, considering a mdimensional space with polynomials of order
n, we already get (n + 1)m terms! This exponential growth in the number
of terms makes it particularly costly to use this type of basis, as soon as the
number of terms or the number of nodes is high. Nevertheless, it will often
be satisfactory or sufficient for low enough polynomials (in practice n=2!)
Therefore, one often rely on less computationally costly basis.

3.8.2

Complete polynomials

As aforementioned, tensor product bases grow exponentially as the dimension


of the problem increases, complete polynomials have the great advantage of
growing only polynomially as the dimension increases. From an intuitive point
of view, complete polynomials bases take products of order lower than a priori
given into account, ignoring higher terms of higher degrees.
46

Definition 11 For N given, the complete set of polynomials of total degree


in n variables is given by
(
Bc =

xk11

. . . xknn : k1 , . . . , kn > 0,

n
X

ki 6

i=1

To see this more clearly, let us consider the example developed in the previous
section (X = {1, x, x2 } and Y = {1, y, y 2 }) and let us assume that = 2. In

this case, we end up with a complete polynomials basis of the type

B c = 1, x, y, x2 , y 2 , xy = B\{xy 2 , x2 y, x2 y 2 }

Note that we have actually already encountered this type of basis, as this is
typically what is done by Taylors theorem for many dimensions
F (x) ' F (x? ) +
..
.
+

n
X
F ?
(x )(xi x?i )
xi
i=1

n
n
X
F
1 X
...
(x? )(xi1 x?i1 ) . . . (xik x?ik )
k!
xi1 . . . xi1
i1 =1

ik =1

For instance, considering the Taylor expansion to the 2dimensional function


F (x, y) around (x? , y ? ) we get
F (x, y) ' F (x? , y ? ) + Fx (x? , y ? )(x x? ) + Fy (x? , y ? )(y y ? )

1
+
Fxx (x? , y ? )(x x? )2 + 2Fxy (x? , y ? )(x x? )(y y ? )
2
!
+Fyy (x? , y ? )(y y ? )2

which rewrites
F (x, y) = 0 + 1 x + 2 y + 3 x2 + 4 y 2 + 5 xy
such that the implicit polynomial basis is the complete polynomials basis of
order 2 with 2 variables.
47

The key difference between tensor product bases and complete polynomials bases lies essentially in the rate at which the size of the basis increases. As
aforementioned, tensor product bases grow exponentially while complete polynomials bases only grow polynomially. This reduces the computational cost of
approximation. But what do we loose using complete polynomials rather than
tensor product bases? From a theoretical point of view, Taylors theorem gives
us the answer: Nothing! Indeed, Taylors theorem indicates that the element
in B c delivers a approximation in the neighborhood of x? that exhibits an
asymptotic degree of convergence equal to k. The nfold tensor product, B,
can deliver only a k th degree of convergence as it does not contains all terms
of degree k + 1. In other words, complete polynomials and tensor product
bases deliver the same degree of asymptotic convergence and therefore complete polynomials based approximation yields an as good level of accuracy as
tensor product based approximations.
Once we have chosen a basis, we can proceed to approximation. For example, we may use Chebychev approximation in higher dimensional problems.
Judd [1998] reports the algorithm for this problem. As we will see, it takes advantage of a very nice feature of orthogonal polynomials: they inherit their orthogonality property even if we extend them to higher dimensions. Let us then
assume we want to compute the chebychev approximation of a 2dimensional
function F (x, y) over the interval [ax ; bx ] [ay ; by ] and let us assume to

keep things simple for a while that we use a tensor product basis. Then
the algorithm is as follows
1. Choose a polynomial order for x (nx ) and y (ny )

2. Compute mx > nx + 1 and my > ny + 1 Chebychev interpolation nodes


on [1; 1]
zkx

= cos

2k 1
, k = 1, . . . , mx
2mx

= cos

2k 1
, k = 1, . . . , my
2my

and
zky

48

3. Adjust the nodes to fit in both interval

bx ax
x
xk = ax + (1 + zk )
, k = 1, . . . , mx
2
and
yk = ay + (1 +

zky )

by ay
2

, k = 1, . . . , my

4. Evaluate the function F at each node to form


{k` = F (xk , y` ) : k = 1, . . . , mx ; ` = 1, . . . , my }
5. Compute the (nx +1)(ny +1) Chebychev coefficients ij , i = 0, . . . , nx ,
j = 0, . . . , ny as
my
mx X
X


k` Tix (zkx ) Tjy z`y

!
! my
ij = mk=1 `=1
x
X
X

2
Tjy z`y
Tix (zkx )2
`=1

k=1

which may be simply obtained in this case as


=

T x (z x )0 T y (z y )
kT x (z x )k2 kT y (z y )k2

6. Compute the approximation as


G(x, y) =

ny
nx X
X
i=0 j=0

ij Tix

y ay
x ax
y
1 Tj 2
1
2
bx ax
by ay

which may also be obtained as

y ay
x ax
y
x
1 T 2
1
G(x, y) = T
2
bx ax
by ay
As an illustration of the algorithm we compute the approximation of the CES
function
1

F (x, y) = [x + y ]
49

on the [0.01; 2][0.01; 2] interval for = 0.75. We used 5th order polynomials
for both x and y and 20 nodes for both x and y, such that there are 400
possible interpolation nodes. Applying the algorithm we just described, we
get the matrix of coefficients reported in table 3.7. As can be seen from the
table, most of the coefficients are close to zero as soon as they involve the
crossproduct of higher order terms, such that using a complete polynomial
basis would yield the same efficiency at a lower computational cost. Figure
3.10 reports the graph of the residuals for the approximation.
Table 3.7: Matrix of Chebychev coefficients (tensor product basis)
kx \ k y
0
1
2
3
4
5

0
2.4251
1.2744
-0.0582
0.0217
-0.0104
0.0057

1
1.2744
0.2030
-0.0366
0.0124
-0.0055
0.0029

2
-0.0582
-0.0366
0.0094
-0.0037
0.0018
-0.0009

3
0.0217
0.0124
-0.0037
0.0016
-0.0008
0.0005

4
-0.0104
-0.0055
0.0018
-0.0008
0.0004
-0.0003

5
0.0057
0.0029
-0.0009
0.0005
-0.0003
0.0002

Matlab Code: Chebychev Coefficients in R2 (Tensor Product Basis)


rho = 0.75;
mx = 20;
my = 20;
nx = 5;
ny = 5;
ax = 0.01;
bx = 2;
ay = 0.01;
by = 2;
%
% Step 1
%
rx = cos((2*[1:mx]-1)*pi/(2*mx));
ry = cos((2*[1:my]-1)*pi/(2*my));
%
% Step 2
%
x
= (rx+1)*(bx-ax)/2+ax;
y
= (ry+1)*(by-ay)/2+ay;

50

%
% Step 3
%
Y
= zeros(mx,my);
for ix=1:mx;
for iy=1:my;
Y(ix,iy) = (x(ix)^rho+y(iy)^rho)^(1/rho);
end
end
%
% Step 4
%
Xx = [ones(mx,1) rx];
for i=3:nx+1;
Xx= [Xx 2*rx.*Xx(:,i-1)-Xx(:,i-2)];
end Xy = [ones(my,1) ry];
for i=3:ny+1;
Xy= [Xy 2*ry.*Xy(:,i-1)-Xy(:,i-2)];
end
T2x = diag(Xx*Xx);
T2y = diag(Xy*Xy);
a
= (Xx*Y*Xy)./(T2x*T2y);

Figure 3.10: Residuals: Tensor product basis

0.01
0.005
0
0.005
0.01
0.015
0
0.5
1
1.5
y

1.5

51

0.5

If we now want to perform the same approximation using a complete polynomials basis, we just have to modify the algorithm to take into account the
fact that when iterating on i and j we want to impose i + j 6 . Let us
compute is for = 5. This implies that the basis will consists of
1, T1x (.), T1y (.), T2x (.), T2y (.), T3x (.), T3y (.), T4x (.), T4y (.), T5x (.), T5y (.),
T1x (.)T1y (.), T1x (.)T2y (.), T1x (.)T3y (.), T1x (.)T4y (.),
T2x (.)T1y (.), T2x (.)T2y (.), T2x (.)T3y (.),
T3x (.)T1y (.), T3x (.)T2y (.),
T4x (.)T1y (.)

Table 3.8: Matrix of Chebychev coefficients (Complete polynomials basis)


kx \ k y
0
1
2
3
4
5

0
2.4251
1.2744
-0.0582
0.0217
-0.0104
0.0057

1
1.2744
0.2030
-0.0366
0.0124
-0.0055

2
-0.0582
-0.0366
0.0094
-0.0037

3
0.0217
0.0124
-0.0037

4
-0.0104
-0.0055

5
0.0057

A first thing to note is that the coefficients that remain are the same as
the one we got in the tensor product basis. This should not be any surprise
as what we just find is just the expression of the Chebychev economization we
already encountered in the unidimensional case and which is just the direct
consequence of the orthogonality condition of chebychev polynomials. Figure
3.11 report the residuals from the approximation using the complete basis. As
can be seen from the figure, this constrained approximation yields quantitatively similar results compared to the tensor product basis, therefore achieving
almost the same accuracy while being less costly from a computational point
of view. In the matlab code section, we just report the lines in step 4 that
are affected by the adoption of the complete polynomials basis.
52

Matlab Code: Complete Polynomials Specificities


a=zeros(nx+1,ny+1);
for ix=1:nx+1;
iy = 1;
while ix+iy-2<=kappa
a(ix,iy)=(Xx(:,ix)*Y*Xy(:,iy))./(T2x(ix)*T2y(iy));
iy=iy+1;
end
end

Figure 3.11: Residuals: Complete polynomials basis

0.02
0.01
0
0.01
0.02
0

0.5

0.5

1.5
2

1.5
2

3.9

Finite element approximations

Finite element are extremely popular among engineers (especially in aeronautics). This approach considers elements that are zero over most of the
domain of approximation. Although they are extremely powerful in the case
of 2dimensional problems, they are more difficult to implement in higher
dimensions. We therefore focus on the bidimensional case.
53

3.9.1

Bilinear approximations

A bilinear interpolation proposes to interpolate data linearly in both coordinate directions. Assume that we have the values of a function F (x, y) at the
four points
P1 = (1, 1) P2 = (1, 1)
P3 = (1, 1)
P4 = (1, 1)

A cardinal interpolation basis on [1; 1] [1; 1] is provided by the set of


functions

b1 (x, y) = 41 (1 x)(1 y) b2 (x, y) = 14 (1 x)(1 + y)


b3 (x, y) = 41 (1 + x)(1 y) b4 (x, y) = 14 (1 + x)(1 + y)
All functions bi are zero on all Pj , i 6= j, but on the point to which is associated

the same index (i = j). Therefore, an approximation of F (x, y) on [1; 1]


[1; 1] is given by

F (x, y) ' F (1, 1)b1 (x, y)+F (1, 1)b2 (x, y)+F (1, 1)b3 (x, y)+F (1, 1)b4 (x, y)
If we have data on [ax ; bx ] [ay ; by ], we use the linear transformation we have
already encountered a great number of times

y ay
x ax
2
1, 2
1
bx ax
by ay

Then, if we have Lagrange data of the type {xi , yi , zi : i = 1, . . . , n}, we

proceed as follows

1. Construct a grid of nodes for x and y;


2. Construct the interpolant over each square applying the previous scheme;
3. Piece all interpolant together.
Note that an important issue of piecewise interpolation is related to the continuity of the approximation: individual pieces must meet continuously at
common edges. In bilinear interpolation, this is insured by the fact that two
interpolants overlap only at the edges of rectangles on which the approximation is a linear interpolant of 2 common end points. This would not be
54

insured if we were to construct biquadratic or bicubic interpolations for example. Note that this type of interpolation scheme is typically what is done
when a computer draw a 3d graph of a function. In figure 3.12, we plot
the residuals of the bilinear approximation of the CES function we approximated in the previous section, with 5 uniform intervals (6 nodes such that x =
{0.010, 0.408, 0.806, 1.204, 1.602, 2.000} and y = {0.010, 0.408, 0.806, 1.204, 1.602, 2.000}).

Like in the spline approximation procedure, the most difficult step once we
have obtained an approximation is to determine the square the point for which
we want an approximation belongs to. We therefore face exactly the same type
of problems.
Figure 3.12: Residuals of the bilinear approximation

1
0.5
0
0.5
1
0

0.5

1.5

1
1.5

0.5
2

3.9.2

Simplicial 2D linear interpolation

This method essentially amounts to consider triangles rather than rectangles


as an approximation basis. The idea is then to build triangles in the x y

plane. To do so, and assuming that the lagrange data have already been
55

transformed using the linear map described earlier, we set 3 points


P1 = (0, 0), P2 = (0, 1), P3 = (1, 0)
to which we associate 3 functions
b1 (x, y) = 1 x y, b2 (x, y) = y, b3 (x, y) = x
which are such that all functions bi are zero on all Pj , i 6= j, but on the point

to which is associated the same index (i = j). b1 b2 and b3 are the cardinal
functions on P1 , P2 , P3 . Let us now add the point P4 = (1, 1), then we have
the following cardinal functions for P2 , P3 , P4 :
b4 (x, y) = 1 x, b5 (x, y) = 1 y, b6 (x, y) = x + y 1
Therefore, on the square P1 , P2 , P3 , P4 the interpolant for F is given by

if x + y 6 1
F (0, 0)(1 x y) + F (0, 1)y + F (1, 0)x
G(x, y) =

F (0, 1)(1 x) + F (1, 0)(1 y) + F (1, 1)(x + y 1) if x + y > 1

It should be clear to you that if these methods are pretty easy to implement

in dimension 3, it becomes quite cumber some in higher dimensions, and noth


that much is proposed in the literature, are most of these methods were designed by engineers and physicists that essentially have to deal with 2, 3 at
most 4 dimensional problems. We will see that this will be a limitation in a
number of economic applications.

56

Bibliography
Hornik, K., M. Stinchcombe, and H. White, MultiLayer Feedforward Networks are Universal Approximators, Neural Networks, 1989, 2, 359366.
Judd, K.L., Numerical methods in economics, Cambridge, Massachussets:
MIT Press, 1998.
Schumaker, L.L., On Shapepreseving Quadratic Spline Interpolation, SIAM
Journal of Numerical Analysis, 1983, 20, 854864.

57

Index
Analytic function, 4

Tensor product bases, 46

Chebychev approximation, 24

Vandermonde matrix, 23

Complete polynomials, 47
complete polynomials, 46
Economization principle, 28
Hiddenlayer feedforward, 14
Least square orthogonal polynomial
approximation, 19
Linearization, 6
Local approximations, 2
Loglinearization, 7
Multidimensional approximation, 45
Neural networks, 11
Ordinary least square, 8
Orthogonal polynomials, 18
Radius of convergence, 3
Singlelayer, 14
Splines, 32
Taylor series expansion, 2
58

Contents
3 Approximation Methods
3.1

Local approximations
3.1.1

3.2

1
. . . . . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . .

Regressions as approximation . . . . . . . . . . . . . . . . . . .

3.2.1

Taylor series expansion

Orthogonal polynomials . . . . . . . . . . . . . . . . . .

18

3.3

Least square orthogonal polynomial approximation . . . . . . .

19

3.4

Interpolation methods . . . . . . . . . . . . . . . . . . . . . . .

22

3.4.1

Linear interpolation . . . . . . . . . . . . . . . . . . . .

22

3.4.2

Lagrange interpolation . . . . . . . . . . . . . . . . . . .

23

3.5

Chebychev approximation . . . . . . . . . . . . . . . . . . . . .

24

3.6

Piecewise interpolation . . . . . . . . . . . . . . . . . . . . . . .

32

3.7

Shape preserving approximations . . . . . . . . . . . . . . . . .

41

3.7.1

Hermite interpolation . . . . . . . . . . . . . . . . . . .

41

3.7.2

Unknown slope: back to Lagrange interpolation . . . . .

44

Multidimensional approximations . . . . . . . . . . . . . . . . .

45

3.8.1

Tensor product bases . . . . . . . . . . . . . . . . . . . .

46

3.8.2

Complete polynomials . . . . . . . . . . . . . . . . . . .

47

Finite element approximations . . . . . . . . . . . . . . . . . .

53

3.9.1

Bilinear approximations . . . . . . . . . . . . . . . . . .

53

3.9.2

Simplicial 2D linear interpolation . . . . . . . . . . . . .

55

3.8

3.9

59

60

List of Figures
3.1

Selection of points . . . . . . . . . . . . . . . . . . . . . . . . .

10

3.2

Leastsquare approximation of consumption . . . . . . . . . . .

13

3.3

Neural Networks . . . . . . . . . . . . . . . . . . . . . . . . . .

14

3.4

Neural Network Approximation . . . . . . . . . . . . . . . . . .

17

3.5

Chebychev polynomials . . . . . . . . . . . . . . . . . . . . . .

25

x0.1

3.6

Smooth function: F (x) =

. . . . . . . . . . . . . . . . . . .

29

3.7

Nonsmooth function: F (x) = min(max(1.5, (x 1/2)3 ), 2) .

31

3.8

Cubic spline approximation . . . . . . . . . . . . . . . . . . . .

38

Approximation errors

. . . . . . . . . . . . . . . . . . . . . . .

39

3.10 Residuals: Tensor product basis . . . . . . . . . . . . . . . . . .

52

3.11 Residuals: Complete polynomials basis . . . . . . . . . . . . . .

54

3.12 Residuals of the bilinear approximation . . . . . . . . . . . . .

56

3.9

61

62

List of Tables
3.1
3.2

Taylor series expansion for log(1 x) . . . . . . . . . . . . . . .

Neural Network approximation . . . . . . . . . . . . . . . . . .

17

3.3

Orthogonal polynomials (definitions) . . . . . . . . . . . . . . .

19

3.4

Orthogonal polynomials (recursive representation) . . . . . . .

20

3.5

Chebychev Coefficients: Smooth function . . . . . . . . . . . .

29

3.6

Chebychev Coefficients: Nonsmooth function . . . . . . . . . .

31

3.7

Matrix of Chebychev coefficients (tensor product basis) . . . .

50

3.8

Matrix of Chebychev coefficients (Complete polynomials basis)

52

63

Lecture Notes 4

Numerical differentiation and


integration
Numerical integration and differentiation is a key step is a lot of economic
applications, among which optimization of utility functions or profits, computation of expectations, incentive problems . . . .

4.1

Numerical differentiation

In a lot of economic problems and most of all in most of numerical problems we


will encounter we will have to compute either Jacobian of Hessian matrices, in
particular in optimization problems or when we will solve nonlinear equations
problems.

4.1.1

Computation of derivatives

A direct approach
Let us recall that the derivative of a function is given by
F (x + ) F (x)
0

F 0 (x) = lim

which suggests as an approximation of F 0 (x)


F 0 (x) '

F (x + x ) F (x)
x
1

(4.1)

The problem is then: how big should x be? It is obvious that x should be
small, in order to be as close as possible to the limit. The problem is that
it cannot be too small because of the numerical precision of the computer.
Assume for a while that the computer can only deliver a precision of 1e-2
and that we select x = 0.00001, then F (x + x ) F (x) would be 0 for the

computer as it would round x to 0! Theory actually delivers an answer to


this problem. Assume that F (x) is computed with accuracy e that is
|F (x) Fb(x)| 6 e

where Fb(x) is the computed value of F . If we compute the derivative using

formula (4.1), the computation error is given by

Fb(x + ) Fb (x) F (x + ) F (x)


2e

x
x

x
x
Further, Taylors expansion theorem states

F (x + x ) = F (x) + F 0 (x)x +

F 00 () 2
x
2

for [x; x + x ]. Therefore,1


F 00 ()
F (x + x ) F (x)
= F 0 (x) +
x
x
2
such that the approximation error is

Fb(x + ) Fb (x)

00 ()
F
2e

x
x 6
F 0 (x)

x
x
2

suppose now that M > 0 is an upper bound on |F 00 | in a neighborhood of x,


then we have

Fb (x + ) Fb(x)
2e
M

x
F 0 (x) 6
+ x

x
2

that is the approximation error is bounded above by


2e
M
+ x
x
2
1

Note that this also indicates that this approximation is O (x ).

If we minimize this quantity with respect to x , we obtain


r
e
?
x = 2
M

such that the upper bound is 2 eM . One problem here is that we usually
do not know M However, from a practical point of view, most people use the
following scheme for x
x = 1e 5. max(|x|, 1e 8)
which essentially amounts to work at the machine precision.
Ajouter ici une discussion de x
Similarly, rather than taking a forward difference, we may also take the
backward difference
F 0 (x) '

F (x) F (x x )
x

(4.2)

Central difference
There are a number of situations where onesided differences are not accurate enough, one potential solution is then to use the central difference or
twosided differences approach that essentially amounts to compute the
derivative using the backwardforward formula
F 0 (x) '

F (x + x ) F (x x )
2x

(4.3)

What do we gain from using this formula? To see this, let us consider the
Taylor series expansion of F (x + x ) and F (x x )
1
F (x + x ) = F (x) + F 0 (x)x + F 00 (x)2x +
2
1
F (x x ) = F (x) F 0 (x)x + F 00 (x)2x
2

1 (3)
F (1 )3x
6
1 (3)
F (2 )3x
6

(4.4)
(4.5)

where 1 [x, x + x ] and 2 [x x ; x]. Then, although the error term


involves the third derivative at two unknown points on two intervals, assuming
3

that F is at least C 3 , the central difference formula rewrites


F 0 (x) =

F (x + x ) F (x x ) 2x (3)
+
F ()
2x
6

with [x x ; x + x ]. A nice feature of this formula is therefore that it


is now O(2x ) rather than O(x ).

Further improvement: Richardson extrapolation


Basic idea of Richardson Extrapolation There are many approximation procedures in which one first picks a step size h and then generates an
approximation A(h) to some desired quantity A. Often the order of the error
generated by the procedure is known. This means that the quantity A writes
A = A(h) + hk + 0 hk+1 + 00 hk+2 + . . .
where k is some known constant, called the order of the error, and , 0 , 0 ,
. . . are some other (usually unknown) constants. For example, A may be the
derivative of a function, A(h) will be the approximation of the derivative when
we use a step size of h, and k will be set to 2.
The notation O(hk+1 ) is conventionally used to stand for a sum of terms
of order hk+1 and higher. So the above equation may be written
A = A(h) + hk + O(hk+1 )

(4.6)

Dropping the, hopefully tiny, term O(hk+1 ) from this equation, we obtain a
linear equation, A = A(h) + hk , in the two unknowns A and . But this
really gives a different equation for each possible value of h. We can therefore
get two different equations to identify both A and by just using two different
step sizes. Then doing this , using step sizes h and h/2, for any h, and taking
2k times
A = A(h/2) + (h/2)k + O(hk+1 )

(4.7)

(note that, in equations (4.6) and (4.7), the symbol O(hk+1 ) is used to stand
for two different sums of terms of order hk+1 and higher) and subtracting
4

equation (4.6) yields


(2k 1)A = 2k A(h/2) A(h) + O(hk+1 )
where O(hk+1 ) stands for a new sum of terms of order hk+1 and higher. We
then get

2k A(h/2) A(h)
+ O(hk+1 )
2k 1
where, once again, O(hk+1 ) stands for a new sum of terms of order hk+1 and
A=

higher. Denoting
B(h) =
then

2k A(h/2) A(h)
2k 1

A = B(h) + O(hk+1 )
What have we done so far? We have defined an approximation B(h) whose
error is of order k + 1 rather than k, such that it is a better one than A(h)s.
The generation of a new improved approximation for A from two A(h)s
with different values of h is called Richardson Extrapolation. We can then
continue the process with B(h) to get a new better approximation. This
method is widely used when computing numerical integration or numerical
differentiation.
Numerical differentiation with Richardson Extrapolation Assume
we want to compute the first order derivative of the function F C 2n R at
point x? . We may first compute the approximate quantity:
D00 (F ) =

F (x? + h0 ) F (x? h0 )
2h0

let us define h1 = h0 /2 and compute


D01 (F ) =

F (x? + h1 ) F (x? h1 )
2h1

Then according to the previous section, we may compute a better approximation as (since k = 2 in the case of numerical differentiation)
D01 (F ) =

4D01 (F ) D00 (F )
3
5

which may actually be rewritten as


D01 (F ) = D01 (F ) +

D01 (F ) D00 (F )
3

We then see that a recursive algorithm occurs as


Dj` (F )

j+1
D`1
(F )

`1
Dj+1
(F ) Dj`1 (F )

4k 1

note that since F is at most C 2n , then k 6 n such that


2(k+1)

F 0 (x? ) = Djk (F ) + O(hj

Hence, Djk (F ) yields an approximate value for F 0 (x? ) with an approximation


2(k+1)

error proportional to hj

. The recursive scheme is carried out until


|D0m D1m1 | <

in which case, D0m is used as an approximate value for F 0 (x? )


Matlab Code: Richardon Extrapolation
Function D = richardson(f,x,varargin)
%
% f -> function to differentiate
% x -> point at which the function is to be differentiated
% varargin -> parameters of the function
%
delta = 1e-12;
% error goal
toler = 1e-12;
% relative error goal
err
= 1;
% error bound
rerr = 1;
% relative error
h
= 1;
% initialize step size
j
= 1;
% initialize j
%
% First, compute the first derivative
%
fs
= feval(f,x+h,varargin{:});
fm
= feval(f,x-h,varargin{:});
D(1,1)= (fs-fm)/(2*h);
while (rerr>toler) & (err>delta) & (j<12)
h = h/2;
% update the step size
fs
= feval(f,x+h,varargin{:});

fm
= feval(f,x-h,varargin{:});
D(j+1,1) = (fs-fm)/(2*h);
% derivative with updated step size
%
% recursion
%
for k = 1:j,
D(j+1,k+1) = D(j+1,k-1+1) + (D(j+1,k-1+1)-D(j-1+1,k-1+1))/(4^k -1);
end
%
% compute errors
%
err = abs(D(j+1,j+1)-D(j-1+1,j-1+1));
rerr = 2*err/(abs(D(j+1,j+1))+abs(D(j-1+1,j-1+1))+eps);
j = j+1;
end
n = size(D,1);
D = D(n,n);

4.1.2

Partial Derivatives

Let us now consider that rather than having a single variable function, the
problem is multidimensional, such that F : Rn R and that we now want
to compute the first order partial derivative
Fi (x) =

F (x)
xi

This may be achieved extremely easily by computing, for example in the case
of central difference formula
F (x + ei x ) F (x ei x )
2x
where ei is a vector which i th component is 1 and all other elements are

0.

Matlab Code: Jacobian Matrix


function J=jacobian(func,x0,method,varargin);
%
% [df]=numgrad(func,x0,method,param)
%
% method = c -> centered difference

%
= l -> left difference
%
= r -> right difference
%
x0
= x0(:);
f
= feval(func,x0,varargin{:});
m
= length(x0);
n
= length(f);
J
= zeros(n,m);
dev = diag(.00001*max(abs(x0),1e-8*ones(size(x0))));
if (lower(method)==l);
for i=1:m;
ff= feval(func,x0+dev(:,i),varargin{:});
J(:,i)
= (ff-f)/dev(i,i);
end;
elseif (lower(method)==r)
for i=1:m;
fb= feval(func,x0-dev(:,i),varargin{:});
J(:,i)
= (f-fb)/dev(i,i);
end;
elseif (lower(method)==c)
for i=1:m;
ff= feval(func,x0+dev(:,i),varargin{:});
fb= feval(func,x0-dev(:,i),varargin{:});
J(:,i)
= (ff-fb)/(2*dev(i,i));
end;
else
error(Bad method specified)
end

4.1.3

Hessian

Hessian matrix can be computed relying on the same approach as for the
Jacobian matrix. Let us consider for example that we want to compute the
second order derivative of function F : R R using a central difference
approach, as we have seen it delivers higher accuracy. Let us write first write
the Taylors expansion of F (x + x ) and F (x x ) up to order 3
2x 00
F (x) +
2
2
F (x x ) = F (x) F 0 (x)x + x F 00 (x)
2
F (x + x ) = F (x) + F 0 (x)x +

3x (3)
F (x) +
6
3x (3)
F (x) +
6

4x (4)
F (1 )
4!
4x (4)
F (2 )
4!

with 1 [x; x + x ] and 2 [x x ; x]. We then get


F (x + x ) + F (x x ) = 2F (x) + 2x F 00 (x) +

4x (4)
[F (1 ) + F (4) (2 )]
4!

such that as long as F is at least C 4 , we have


F 00 (x) =

F (x + x ) 2F (x) + F (x x ) 2x (4)

F ()
2x
12

with [x x ; x + x ]. Note then that the approximate second order

derivative is O(2x ).

4.2

Numerical Integration

Numerical integration is a widely encountered problem in economics. For


example, if wa are to compute the welfare function in a continuous time model,
we will face an equation of the form
Z
et u(ct )dt
W =
0

Likewise in rational expectations models, we will have to compute conditional


expectations such that assuming that the innovations of the shocks are
gaussian we will quite often encounter an equation of the form
Z
1 2
1

f (X, )e 2 2 d
2

In general, numerical integration formulas approximate a definite integral by


a weighted sum of function values at points within the interval of integration.
In other words, a numerical integration rule takes the typical form
Z b
n
X
F (x)dx '
i F (xi )
a

i=0

where the coefficients i depend on the method chosen to compute the integral.
This approach to numerical integration is known as the quadrature problem.
These method essentially differ by (i) the weights that are assigned to each
function evaluation and (ii) the nodes at which the function is evaluated. In
fact basic quadrature methods may be categorized in two wide class:
9

1. The methods that are based on equally spaced data points: these are
Newtoncotes formulas: the midpoint rule, the trapezoid rule and
Simpson rule.
2. The methods that are based on data points which are not equally spaced:
these are Gaussian quadrature formulas.

4.2.1

NewtonCotes formulas

NewtonCotes formulas evaluate the function F at a finite number of points


and uses this point to build an interpolation between these points typically
a linear approximation in most of the cases. Then this interpolant is integrated
to get an approximate value of the integral.
Figure 4.1: NewtonCotes integration
F (x) 6
F (b)

Yb

Ya
YM
F (a)

a+b
2

The midpoint rule


The midpoint rule essentially amounts to compute the area of the rectangle
formed by the four points P0 = (a, 0), P1 = (b, 0), P2 = (a, f ()), P3 =
10

(b, f ()) where = (a + b)/2 as an approximation of the integral, such that

Z b
(b a)3 00
a+b
F (x)dx = (b a)F
+
F ()
2
4!
a
where [a; b], such that the approximate integral is given by

a+b
Ib = (b a)F
2

Note that this rule does not make any use of the end points. It is noteworthy
that this approximation is far too coarse to be accurate, such that what is usually done is to break the interval [a; b] into smaller intervals and compute the
approximation on each subinterval. The integral is then given by cumulating
the subintegrals, we therefore end up with a composite rule. Hence, assume
that the interval [a; b] is broken into n > 1 subintervals of size h = (b a)/n,

we have n+1 data points xi = a+ i 21 h with i = 1, . . . , n. The approximate

integral is then given by

Ibn = h

n
X

f (xi )

i=1

Matlab Code: Midpoint Rule Integration


function mpr=midpoint(func,a,b,n,varargin);
%
% function mpr=midpoint(func,a,b,n,P1,...,Pn);
%
% func
: Function to be integrated
% a
: lower bound of the interval
% b
: upper bound of the interval
% n
: number of sub-intervals => n+1 points
% P1,...,Pn : parameters of the function
%
h
= (b-a)/n;
x
= a+([1:n]-0.5)*h;
y
= feval(func,x,varargin{:});
mpr = h*(ones(1,n)*y);

Trapezoid rule
The trapezoid rule essentially amounts to use a linear approximation of the
function to be integrated between the two end points of the interval. This
11

then defines the trapezoid {(a, 0), (a, F (a)), (b, F (b)), (b, 0)} which area and
consequently the approximate integral is given by
(b a)
Ib =
(F (a) + F (b))
2

This may be derived appealing to the Lagrange approximation for function F


over the interval [a; b], which is given by
L (x) =

xb
xa
F (a) +
F (b)
ab
ba

then
Z

b
a

F (x)dx '
'
'
'
'
'
'

xa
xb
F (a) +
F (b)dx
ba
a ab
Z b
1
(b x)F (a) + (x a)F (b)dx
ba a
Z b
1
(bF (a) aF (b)) + x(F (b) F (a))dx
ba a
Z b
1
x(F (b) F (a))dx
bF (a) aF (b) +
ba a
b2 a2
bF (a) aF (b) +
(F (b) F (a))
2(b a)
b+a
(F (b) F (a))
bF (a) aF (b) +
2
(b a)
(F (a) + F (b))
2

Obviously, this approximation may be poor, as in the example reported in


figure 4.1, such that as in the midpoint rule we should break the [a; b] interval
in n > subintervals of size h = (ba)/n, we have n+1 data points xi = a+ih
and their corresponding function evaluations F (xi ) with i = 0, . . . , n. The
approximate integral is then given by
#
"
n1
X
h
F (xi )
F (x0 ) + F (xn ) + 2
Ibn =
2
i=1

12

Matlab Code: Trapezoid Rule Integration


function trap=trapezoid(func,a,b,n,varargin);
%
% function trap=trapezoid(func,a,b,n,P1,...,Pn);
%
% func
: Function to be integrated
% a
: lower bound of the interval
% b
: upper bound of the interval
% n
: number of sub-intervals => n+1 points
% P1,...,Pn : parameters of the function
%
h
= (b-a)/n;
x
= a+[0:n]*h;
y
= feval(func,x,varargin{:});
trap= 0.5*h*(2*sum(y(2:n))+y(1)+y(n+1));

Simpsons rule
The simpsons rule attempts to circumvent an inefficiency of the trapezoid rule:
a composite trapezoid rule may be far too coarse if F is smooth. An alternative
is then to use a piecewise quadratic approximation of F that uses the values
of F at a, b and (b + a)/2 as interpolating nodes. Figure 4.2 illustrates the
rule. The thick line is the function F to be integrated and the thin line is
the quadratic interpolant for this function. A quadratic interpolation may be
obtained by the Lagrange interpolation formula, where = (b + a)/2
L (x) =

(x a)(x b)
(x a)(x )
(x )(x b)
F (a) +
F () +
F (b)
(a )(a b)
( a)( b)
(b a)(b )

Setting h = (b a)/2 we can approximate the integral by


Z b
Z b
(x )(x b)
(x a)(x b)
(x a)(x )
F (x)dx '
F (a)
F () +
F (b)dx
2
2
2h
h
2h2
a
a
' I1 I2 + I3
We then compute each subintegral
Z b
(x )(x b)
F (a)dx
I1 =
2h2
a
Z
F (a) b 2
=
x (b + )x + bdx
2h2 a
13

Figure 4.2: Simpsons rule


F (x) 6

a+b
2

F (b)

F (a)

a+b
2

=
=

F (a) b3 a3
b2 a2

(b
+
)
+
b(b

a)
2h2
3
2
h
F (a) 2
(b 2ba + a2 ) = F (a)
12h
3

I2 =

I3 =

(x a)(x b)
F ()dx
h2
a
Z
F () b 2
x (b + a)x + abdx
=
h2 a

3
b2 a2
F () b a3
(b + a)
+ ba(b a)
=
h2
3
2
F ()
4h
=
(b a)2 = F ()
3h
3

=
=

(x )(x a)
F (b)dx
2h2
a
Z
F (b) b 2
x (a + )x + adx
2h2 a

b2 a2
F (b) b3 a3
(a + )
+ a(b a)
2h2
3
2
14

F (b) 2
h
(b 2ba + a2 ) = F (b)
12h
3

Then, summing the 3 components, we get an approximation of the integral


given by

b+a
ba
b
I=
F (a) + 4F
+ F (b)
6
2

If, like in the midpoint rule and the trapezoid rules we want to compute a
better approximation of the integral by breaking [a; b] into n > 2 even number
of subintervals, we set h = (b a)/n, xi = x + ih, i = 0, . . . , n. Then the

composite Simpsons rule is given by

h
Ibn = [F (x0 ) + 4F (x1 ) + 2F (x2 ) + 4F (x3 ) + . . . + 2F (xn2 ) + 4F (xn1 ) + F (xn )]
3
Matlab Code: Simpsons Rule Integration
function simp=simpson(func,a,b,n,varargin);
%
% function simp=simpson(func,a,b,n,P1,...,Pn);
%
% func
: Function to be integrated
% a
: lower bound of the interval
% b
: upper bound of the interval
% n
: even number of sub-intervals => n+1 points
% P1,...,Pn : parameters of the function
%
h
= (b-a)/n;
x
= a+[0:n]*h;
y
= feval(func,x,varargin{:});
simp= h*(2*(1+rem(1:n-1,2))*y(2:n)+y(1)+y(n+1))/3;

Infinite domains and improper integrals


The methods we presented so far were defined over finite domains, but it will
be often the case at least when we will be dealing with economic problems
that the domain of integration is infinite. We will now investigate how we
can transform the problem to be able to use standard methods to compute the
integrals. Nevertheless, we have to be sure that the integral is well defined.
15

For example, let us consider the integral


Z
F (x)dx

it may not exist because of either divergence if limx F (x) =


R
or because of oscillations as in sin(x)dx. Let us restrict ourselves to the
case where the integral exists. In this case, we can approximate
Z
F (x)dx

by

F (x)dx
a

setting a and b too large enough negative and positive values. However, this
may be a particularly slow way of approximating the integral, and the next
theorem provides a indirect way to achieve higher efficiency.
Theorem 1 If : R R is a monotonically increasing, C 1 , function on
the interval [a; b] then for any integrable function F (x) on [a; b] we have
Z 1 (b)
Z b
F ((y))0 (y)dy
F (x)dx =
1 (a)

This theorem is just what we usually call a change of variables, and convert a
problem where we want to integrate a function in the variable x into a perfectly
equivalent problem where we integrate with regards to y, with y and x being
related by the nonlinear relation: x = (y).
As an example, let us assume that we want to compute the average of the
transformation of a gaussian random variable x ; N (0, 1). This is given by
Z
x2
1

G(x)e 2 dx
2
such that F (x) = G(x)e

x2
2

. As a first change of variable, that will leave the

interval unchanged, we will apply the transformation z = x/ 2 such that the


integral rewrites

2
G( 2z)ez dz

16

We would like to transform this problem since it would be quite difficult to


compute this integral on the interval [a, b] and set both a and b to large negative
and positive values. Another possibility is to assume that we compute the
integral of a transformed problem over the interval [a; b]. We therefore look
for a C 1 , monotonically increasing transformation that would insure that
limya (y) = and limyb (y) = . Let us assume that a = 0 and

b = 1, a possible candidate for (y) is

1
y
(y) = log
such that 0 (y) =
1y
y(1 y)
In this case, the integral rewrites

Z 1
1
y
1

F
dy
2 log
1y
y(1 y)
0
or
1

1y
y

log((1y)/y)

y
1
2 log
dy
G
1y
y(1 y)

which is now equivalent to compute a simple integral of the form


Z 1
h(y)dy
0

with
1
h(y)

1y
y

log(y/(1y))

G
2 log

y
1y

1
y(1 y)

Table 4.1 reports the results for the different methods we have seen so far. As
can be seen, the midpoint and the trapezoid rule perform pretty well with
20 subintervals as the error is less than 1e-4, while the Simpson rule is less
efficient as we need 40 subintervals to be able to reach a reasonable accuracy.
We will see in the next section that there exist more efficient methods to deal
with this type of problem.
Note that not all change of variable are admissible. Indeed, in this case
we might have used (y) = log(y/(1 y))1/4 , which also maps [0; 1] into R in
a monotonically increasing way. But this would not have been an admissible
17

Table 4.1: Integration with a change in variables: True value=exp(0.5)


n
2
4
8
10
20
40

Midpoint
2.2232

Trapezoid
1.1284

Simpson
1.5045

(-0.574451)

(0.520344)

(0.144219)

1.6399

1.6758

1.8582

(0.0087836)

(-0.0270535)

(-0.209519)

1.6397

1.6579

1.6519

(0.00900982)

(-0.00913495)

(-0.0031621)

1.6453

1.6520

1.6427

(0.00342031)

(-0.00332232)

(0.00604608)

1.6488

1.6487

1.6475

(-4.31809e-005)

(4.89979e-005)

(0.00117277)

1.6487

1.6487

1.6487

(-2.92988e-006)

(2.90848e-006)

(-1.24547e-005)

transformation. Why? Remember that any approximate integration has an


associated error bound that depends on the derivatives of the function to be
integrated (the overall h(.) function). If the derivatives of h(.) are well defined
when y tends towards 0 or 1 in the case we considered in our experiments,
this is not the case for the latter case. In particular, the derivatives are found
to diverge as y tends to 1, such that the error bound does not converge. In
others, we always have to make sure that the derivatives of F ((y))0 (y) have
to be defined over the interval.

4.2.2

Gaussian quadrature

As we have seen from the earlier examples, NewtonCotes formulas actually


derives from piecewise interpolation theory, as they just use a collection of low
order polynomials to get an approximation for the function to be integrated
and then integrate this approximation which is in general far easier. These
formulas also write

b
a

F (x)dx '
18

n
X
i=1

i F (xi )

for some quadrature nodes xi [a; b] and quadrature weights i . All xi s are
arbitrarily set in NewtonCotes formulas, as we have seen we just imposed a

equally spaced grid over the interval [a; b]. Then the weights i follow from
the fact that we want the approximation to be equal for a polynomial of
order lower or equal to the degree of the polynomials used to approximate the
function. The question raised by Gaussian Quadrature is then Isnt there a
more efficient way to set the nodes and the weights? The answer is clearly
R
Yes. The key point is then to try to get a good approximation to F (x)dx.

The problem is what is a good approximation? Gaussian quadrature sets the


nodes and the weights in such a way that the approximation is exact when F
is a low order polynomial.

In fact, Gaussian quadrature is a much more general than simple integration, it actually computes an approximation to the weighted integral
Z b
n
X
F (x)w(x)dx '
i F (xi )
a

i=1

Gaussian quadrature imposes that the last approximation is exact when F is a


polynomial of order 2n 1. Further, the nodes and the weights are contingent

on the weighting function. Then orthogonal polynomials are expected to come


back in the whole story. This is stated in the following theorem by Davis and
Rabinowitz [1984]
Theorem 2 Assume {` (x)}
`=0 is an orthonormal family of polynomials with

respect to the weighting function w(x) on the interval [a; b], and define ` so

that ` (x) = k xk + . . .. Let xi , i = 1, . . . , n be the roots of the polynomial


n (x). If a < x1 < . . . < xn < b and if F C 2n [a; b], then
Z b
n
X
F (2n) ()
i F (xi ) + 2
w(x)F (x)dx =
n (2n)!
a
i=1

for [a; b] with


i =

n+1 /n
0
n (x)n+1 (x)
19

>0

This theorem is of direct applicability, as it gives for any weighting function a general formula for both the nodes and the weights. Fortunately, most of the job has already been done: there exist Gaussian quadrature formulas for a wide spectrum of weighting functions, and the values of the nodes and weights are given in tables. Assume we have a family of orthogonal polynomials, $\{\varphi_\ell(x)\}_{\ell=0}^n$; we know that for any $i \neq j$

$$\langle \varphi_i(x),\varphi_j(x)\rangle = 0$$

In particular, we have

$$\langle \varphi_k(x),\varphi_0(x)\rangle = \int_a^b \varphi_k(x)\varphi_0(x)w(x)dx = 0 \quad\text{for } k > 0$$

but since the orthogonal polynomial of order 0 is a constant, this reduces to

$$\int_a^b \varphi_k(x)w(x)dx = 0 \quad\text{for } k > 0$$

We will take advantage of this property. The nodes will be the roots of the orthogonal polynomial of order n, while the weights will be chosen such that the Gaussian formula is exact for lower order polynomials:

$$\int_a^b \varphi_k(x)w(x)dx = \sum_{i=1}^n \omega_i\varphi_k(x_i) \quad\text{for } k = 0,\ldots,n-1$$

This implies that the weights can be recovered by solving a linear system of the form

$$\begin{array}{rcl}
\omega_1\varphi_0(x_1)+\ldots+\omega_n\varphi_0(x_n) &=& \int_a^b w(x)dx\\
\omega_1\varphi_1(x_1)+\ldots+\omega_n\varphi_1(x_n) &=& 0\\
&\vdots&\\
\omega_1\varphi_{n-1}(x_1)+\ldots+\omega_n\varphi_{n-1}(x_n) &=& 0
\end{array}$$

which rewrites $\Phi\omega = \Gamma$ with

$$\Phi = \begin{pmatrix}\varphi_0(x_1)&\cdots&\varphi_0(x_n)\\ \vdots&\ddots&\vdots\\ \varphi_{n-1}(x_1)&\cdots&\varphi_{n-1}(x_n)\end{pmatrix},\quad \omega = \begin{pmatrix}\omega_1\\ \vdots\\ \omega_n\end{pmatrix} \quad\text{and}\quad \Gamma = \begin{pmatrix}\int_a^b w(x)dx\\ 0\\ \vdots\\ 0\end{pmatrix}$$

Note that the orthogonality property of the polynomials implies that the matrix $\Phi$ is invertible, such that $\omega = \Phi^{-1}\Gamma$. We now review the most commonly used Gaussian quadrature formulas.
Gauss-Chebychev quadrature

This particular quadrature can be applied to problems that take the form

$$\int_{-1}^1 F(x)(1-x^2)^{-1/2}dx$$

such that in this case $w(x) = (1-x^2)^{-1/2}$, $a = -1$ and $b = 1$. The very attractive feature of this Gaussian quadrature is that the weight is constant and equal to $\omega_i = \omega = \pi/n$, where n is the number of nodes, such that

$$\int_{-1}^1 F(x)(1-x^2)^{-1/2}dx = \frac{\pi}{n}\sum_{i=1}^n F(x_i) + \frac{\pi}{2^{2n-1}}\frac{F^{(2n)}(\xi)}{(2n)!}$$

for some $\xi \in [-1;1]$, where the nodes are given by the roots of the Chebychev polynomial of order n:

$$x_i = \cos\left(\frac{2i-1}{2n}\pi\right),\quad i = 1,\ldots,n$$

It is obviously the case that we rarely have to compute an integral that exactly takes the form this quadrature imposes; we are rather likely to compute

$$\int_a^b F(x)dx$$

Concerning the bounds of integration, we may use the change of variable

$$y = 2\frac{x-a}{b-a}-1 \quad\text{implying}\quad dy = \frac{2\,dx}{b-a}$$

such that the problem rewrites

$$\frac{b-a}{2}\int_{-1}^1 F\left(a+\frac{(y+1)(b-a)}{2}\right)dy$$

The weighting function is still missing; nevertheless, multiplying and dividing the integrand by $(1-y^2)^{-1/2}$, we get

$$\frac{b-a}{2}\int_{-1}^1 \frac{G(y)}{\sqrt{1-y^2}}\,dy \quad\text{with}\quad G(y) \equiv F\left(a+\frac{(y+1)(b-a)}{2}\right)\sqrt{1-y^2}$$

such that

$$\int_a^b F(x)dx \simeq \frac{\pi(b-a)}{2n}\sum_{i=1}^n F\left(a+\frac{(y_i+1)(b-a)}{2}\right)\sqrt{1-y_i^2}$$

where $y_i$, $i = 1,\ldots,n$, are the n Gauss-Chebychev quadrature nodes.
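As an illustration, the following lines sketch this last formula in matlab for the exponential function over [0;1] (true value $e-1 \simeq 1.7183$); the test function, the bounds and the number of nodes are mere examples.

Matlab Code: Gauss-Chebychev Integration (illustration)

n  = 10;                          % number of nodes
a  = 0; b = 1;                    % bounds of integration
i  = (1:n)';
y  = cos((2*i-1)*pi/(2*n));       % Gauss-Chebychev nodes on [-1;1]
x  = a+(y+1)*(b-a)/2;             % nodes mapped into [a;b]
fx = exp(x);                      % F evaluated at the mapped nodes
gc = pi*(b-a)*sum(fx.*sqrt(1-y.^2))/(2*n)   % approximation to exp(1)-1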


Gauss-Legendre quadrature

This particular quadrature can be applied to problems that take the form

$$\int_{-1}^1 F(x)dx$$

such that in this case $w(x) = 1$, $a = -1$ and $b = 1$. We are therefore back to a standard integration problem, as the weighting function is constant and equal to 1. We then have

$$\int_{-1}^1 F(x)dx = \sum_{i=1}^n \omega_i F(x_i) + \frac{2^{2n+1}(n!)^4}{(2n+1)!(2n)!}\frac{F^{(2n)}(\xi)}{(2n)!}$$

for some $\xi \in [-1;1]$. In this case, both the nodes and the weights are non trivial to compute. Nevertheless, we can generate the nodes using any root finding procedure, and the weights can be computed as explained earlier, noting that $\int_{-1}^1 w(x)dx = 2$.

Like in the case of Gauss-Chebychev quadrature, we may use the linear transformation

$$y = 2\frac{x-a}{b-a}-1 \quad\text{implying}\quad dy = \frac{2\,dx}{b-a}$$

to be able to compute integrals of the form

$$\int_a^b F(x)dx$$

which is then approximated by

$$\int_a^b F(x)dx \simeq \frac{b-a}{2}\sum_{i=1}^n \omega_i F\left(a+\frac{(y_i+1)(b-a)}{2}\right)$$

where $y_i$ and $\omega_i$ are the Gauss-Legendre nodes and weights over the interval $[-1;1]$.

Such a simple formula has a direct implication when we want to compute the discounted value of an asset, the welfare of an agent or the discounted sum of profits in a finite horizon problem, as each of these can be computed by solving an integral of the form

$$\int_0^T e^{-\theta t}u(c(t))dt \quad\text{with } T < \infty$$

in the case of the welfare of an individual, or

$$\int_0^T e^{-rt}\pi(x(t))dt \quad\text{with } T < \infty$$

in the case of a profit function. However, it will often be the case that we will want to compute such quantities in an infinite horizon model, something that this quadrature method cannot achieve unless considering a change of variable of the kind we studied earlier. Nevertheless, there exists a specific Gaussian quadrature that can achieve this task.
As an example of the potential of the Gauss-Legendre quadrature formula, we compute the welfare function of an individual over a finite horizon. Time is continuous and the welfare function takes the form

$$W = \int_0^T e^{-\theta t}\frac{c(t)^{1-\sigma}}{1-\sigma}dt$$

where we assume that $c(t) = c^\star e^{\gamma t}$. Results for n = 2, 4, 8 and 12 and T = 10, 50, 100 and 1000 (as an approximation to $\infty$) are reported in table 4.2, where we set $\gamma = 0.01$, $\theta = 0.05$ and $c^\star = 1$. As can be seen from the table, the integral converges pretty fast to the true value, as the absolute error is almost zero for n > 8, except for T = 1000. Note that even with n = 4 a quite high level of accuracy can be achieved in most cases.
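A minimal matlab sketch of this experiment follows. The notes leave the nodes to "any root finding procedure"; the sketch below instead obtains them from the eigenvalues of the Jacobi matrix built from the Legendre recurrence coefficients (the standard Golub-Welsch approach, an assumption rather than the method of the notes), and the CRRA specification and parameter values are illustrative assumptions, so the resulting number need not match table 4.2 digit for digit.

Matlab Code: Finite Horizon Welfare (Gauss-Legendre sketch)

n     = 8;                           % number of nodes
T     = 50;                          % horizon
theta = 0.05; gam = 0.01;            % discount and growth rates (assumed)
sigma = 2.5;  cstar = 1;             % CRRA curvature and consumption level (assumed)
k     = 1:n-1;
bet   = k./sqrt(4*k.^2-1);           % Legendre recurrence coefficients
J     = diag(bet,1)+diag(bet,-1);    % Jacobi matrix
[V,D] = eig(J);
[y,i] = sort(diag(D));               % nodes = eigenvalues, in [-1;1]
w     = 2*V(1,i)'.^2;                % weights (2 = integral of w(x)=1 over [-1;1])
t     = (y+1)*T/2;                   % nodes mapped into [0;T]
u     = (cstar*exp(gam*t)).^(1-sigma)/(1-sigma);
W     = (T/2)*sum(w.*exp(-theta*t).*u)   % Gauss-Legendre approximation of welfare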
Gauss-Laguerre quadrature

This particular quadrature can be applied to problems that take the form

$$\int_0^\infty F(x)e^{-x}dx$$

such that in this case $w(x) = e^{-x}$, $a = 0$ and $b = \infty$. The quadrature formula is then given by

$$\int_0^\infty F(x)e^{-x}dx = \sum_{i=1}^n \omega_i F(x_i) + \frac{(n!)^2}{(2n)!}F^{(2n)}(\xi)$$

for some $\xi \in [0;\infty)$. In this case, as in the Gauss-Legendre quadrature, both the nodes and the weights are non trivial to compute. Nevertheless, we can generate the nodes using any root finding procedure, and the weights can be computed as explained earlier, noting that $\int_0^\infty w(x)dx = 1$.

A direct application of this formula is that it can be used to compute the discounted sum of any quantity in an infinite horizon problem. Consider for instance the welfare of an individual: once we know the function c(t), it can be computed by solving the integral

$$\int_0^\infty e^{-\theta t}u(c(t))dt$$

The problem involves a discount rate $\theta$ that should be eliminated to stick to the exact formulation of the Gauss-Laguerre problem. Let us consider the linear map $y = \theta t$; the problem rewrites

$$\frac{1}{\theta}\int_0^\infty e^{-y}u\left(c\left(\frac{y}{\theta}\right)\right)dy$$

and can be approximated by

$$\frac{1}{\theta}\sum_{i=1}^n \omega_i\,u\left(c\left(\frac{y_i}{\theta}\right)\right)$$

where $y_i$ and $\omega_i$ are the Gauss-Laguerre nodes and weights over the interval $[0;\infty)$.

Table 4.2: Welfare in finite horizon

n    σ = 2.5             σ = 1               σ = 0.5            σ = 0.9

                                  T = 10
2    -3.5392             -8.2420             15.3833            8.3929
     (-3.19388e-006)     (-4.85944e-005)     (0.000322752)      (0.000232844)
4    -3.5392             -8.2420             15.3836            8.3931
     (-3.10862e-014)     (-3.01981e-012)     (7.1676e-011)      (6.8459e-011)
8    -3.5392             -8.2420             15.3836            8.3931
     (0)                 (1.77636e-015)      (1.77636e-015)     (-1.77636e-015)
12   -3.5392             -8.2420             15.3836            8.3931
     (-4.44089e-016)     (0)                 (3.55271e-015)     (1.77636e-015)

                                  T = 50
2    -11.4098            -21.5457            33.6783            17.6039
     (-0.00614435)       (-0.0708747)        (0.360647)         (0.242766)
4    -11.4159            -21.6166            34.0389            17.8467
     (-3.62327e-008)     (-2.71432e-006)     (4.87265e-005)     (4.32532e-005)
8    -11.4159            -21.6166            34.0390            17.8467
     (3.55271e-015)      (3.55271e-015)      (7.10543e-015)     (3.55271e-015)
12   -11.4159            -21.6166            34.0390            17.8467
     (-3.55271e-015)     (-7.10543e-015)     (1.42109e-014)     (7.10543e-015)

                                  T = 100
2    -14.5764            -23.6040            32.5837            16.4972
     (-0.110221)         (-0.938113)         (3.63138)          (2.28361)
4    -14.6866            -24.5416            36.2078            18.7749
     (-1.02204e-005)     (-0.000550308)      (0.00724483)       (0.00594034)
8    -14.6866            -24.5421            36.2150            18.7808
     (3.55271e-015)      (-1.03739e-012)     (1.68896e-010)     (2.39957e-010)
12   -14.6866            -24.5421            36.2150            18.7808
     (-5.32907e-015)     (-1.77636e-014)     (2.84217e-014)     (1.77636e-014)

                                  T = 1000
2    -1.0153             -0.1066             0.0090             0.0021
     (-14.9847)          (-24.8934)          (36.3547)          (18.8303)
4    -12.2966            -10.8203            7.6372             3.2140
     (-3.70336)          (-14.1797)          (28.7264)          (15.6184)
8    -15.9954            -24.7917            34.7956            17.7361
     (-0.00459599)       (-0.208262)         (1.56803)          (1.09634)
12   -16.0000            -24.9998            36.3557            18.8245
     (-2.01256e-007)     (-0.000188532)      (0.00798507)       (0.00784393)

Absolute errors in parentheses.

As an example of the potential of the Gauss-Laguerre quadrature formula, we compute the welfare function of an individual who lives for an infinite number of periods. Time is continuous and the welfare function takes the form

$$W = \int_0^\infty e^{-\theta t}\frac{c(t)^{1-\sigma}}{1-\sigma}dt$$

where we assume that $c(t) = c^\star e^{\gamma t}$. Results for n = 2, 4, 8 and 12 are reported in table 4.3, where we set $\gamma = 0.01$, $\theta = 0.05$ and $c^\star = 1$. As can be seen from the table, the integral converges pretty fast to the true value, as the absolute error is almost zero for n > 8. It is worth noting that the method performs far better than the Gauss-Legendre quadrature with T = 1000. Note that even with n = 4 a quite high level of accuracy can be achieved in some cases.
Table 4.3: Welfare in infinite horizon

n    σ = 2.5            σ = 1              σ = 0.5            σ = 0.9
2    -15.6110           -24.9907           36.3631            18.8299
     (0.388994)         (0.00925028)       (0.000517411)      (0.00248525)
4    -15.9938           -25.0000           36.3636            18.8324
     (0.00622584)       (1.90929e-006)     (3.66246e-009)     (1.59375e-007)
8    -16.0000           -25.0000           36.3636            18.8324
     (1.26797e-006)     (6.03961e-014)     (0)                (0)
12   -16.0000           -25.0000           36.3636            18.8324
     (2.33914e-010)     (0)                (0)                (3.55271e-015)

Absolute errors in parentheses.
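A minimal sketch of the infinite horizon computation follows. As in the Gauss-Legendre sketch above, the nodes and weights are obtained here from the Jacobi matrix of the Laguerre recurrence (an assumption: any root finding procedure would do), and the CRRA specification is the one assumed for table 4.3.

Matlab Code: Infinite Horizon Welfare (Gauss-Laguerre sketch)

n     = 8;                                % number of nodes
theta = 0.05; gam = 0.01;                 % discount and growth rates (assumed)
sigma = 2.5;  cstar = 1;                  % CRRA curvature and consumption level (assumed)
J     = diag(2*(1:n)-1)+diag(1:n-1,1)+diag(1:n-1,-1);  % Laguerre Jacobi matrix
[V,D] = eig(J);
[y,i] = sort(diag(D));                    % Gauss-Laguerre nodes
w     = V(1,i)'.^2;                       % weights (integral of exp(-x) over [0;inf) = 1)
t     = y/theta;                          % undo the linear map y = theta*t
u     = (cstar*exp(gam*t)).^(1-sigma)/(1-sigma);
W     = sum(w.*u)/theta                   % approximation to the welfare integral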

Gauss-Hermite quadrature

This type of quadrature will be particularly useful when we consider stochastic processes with Gaussian distributions, as it approximates integrals of the type

$$\int_{-\infty}^\infty F(x)e^{-x^2}dx$$

such that in this case $w(x) = e^{-x^2}$, $a = -\infty$ and $b = \infty$. The quadrature formula is then given by

$$\int_{-\infty}^\infty F(x)e^{-x^2}dx = \sum_{i=1}^n \omega_i F(x_i) + \frac{n!\sqrt{\pi}}{2^n}\frac{F^{(2n)}(\xi)}{(2n)!}$$

for some $\xi \in (-\infty;\infty)$. In this case, as in the last two quadratures, both the nodes and the weights are non trivial to compute. The nodes can be computed using any root finding procedure, and the weights can be computed as explained earlier, noting that $\int_{-\infty}^\infty w(x)dx = \sqrt{\pi}$.

As aforementioned, this type of quadrature is particularly useful when we want to compute the moments of a normal distribution. Let us assume that $x \sim \mathscr{N}(\mu,\sigma^2)$ and that we want to compute

$$\frac{1}{\sigma\sqrt{2\pi}}\int_{-\infty}^\infty F(x)\,e^{-\frac{(x-\mu)^2}{2\sigma^2}}dx$$

In order to stick to the problem this type of approach can explicitly solve, we need to transform the variable using the linear map

$$y = \frac{x-\mu}{\sigma\sqrt{2}}$$

such that the problem rewrites

$$\frac{1}{\sqrt{\pi}}\int_{-\infty}^\infty F(\sigma\sqrt{2}\,y+\mu)\,e^{-y^2}dy$$

and can therefore be approximated by

$$\frac{1}{\sqrt{\pi}}\sum_{i=1}^n \omega_i F(\sigma\sqrt{2}\,y_i+\mu)$$

where $y_i$ and $\omega_i$ are the Gauss-Hermite nodes and weights over $(-\infty;\infty)$.
As a first example, let us compute the average of a lognormal distribution, that is $\log(X) \sim \mathscr{N}(\mu,\sigma^2)$, for which we know that $E(X) = \exp(\mu+\sigma^2/2)$. This is particularly important, as in macroeconomics we will often rely on shocks that follow a lognormal distribution. Table 4.4 reports the results, with approximation errors in parentheses, for $\mu = 0$ and different values of $\sigma$.
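A minimal sketch of this computation, reusing the gauss_herm routine called by the discretization code below (assumed to return the Gauss-Hermite nodes and weights):

Matlab Code: Lognormal Mean (Gauss-Hermite sketch)

n     = 8;                        % number of nodes
mu    = 0;                        % mean of log(X)
sigma = 0.5;                      % standard deviation of log(X)
[y,w] = gauss_herm(n);            % Gauss-Hermite nodes and weights
EX    = sum(w.*exp(sqrt(2)*sigma*y+mu))/sqrt(pi)   % approximates exp(mu+sigma^2/2)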
Table 4.4: Gauss-Hermite quadrature

n    σ = 0.01          σ = 0.1           σ = 0.5           σ = 1.0           σ = 2.0
2    1.00005           1.00500           1.12763           1.54308           3.76219
     (8.33353e-10)     (8.35280e-06)     (0.00552249)      (0.105641)        (3.62686)
4    1.00005           1.00501           1.13315           1.64797           6.99531
     (2.22045e-16)     (5.96634e-12)     (2.46494e-06)     (0.000752311)     (0.393743)
8    1.00005           1.00501           1.13315           1.64872           7.38873
     (2.22045e-16)     (4.44089e-16)     (3.06422e-14)     (2.44652e-09)     (0.00032857)
12   1.00005           1.00501           1.13315           1.64872           7.38906
     (3.55271e-15)     (3.55271e-15)     (4.88498e-15)     (1.35447e-14)     (3.4044e-08)

Another direct application of this method in economics is related to the discretization of the shocks we face when dealing with methods for solving rational expectations models. In fact, we will often face shocks that follow Gaussian AR(1) processes

$$x_{t+1} = \rho x_t + (1-\rho)\bar{x} + \varepsilon_{t+1}$$

where $\varepsilon_{t+1} \sim \mathscr{N}(0,\sigma^2)$. This implies that

$$\int_{-\infty}^\infty f(x_{t+1}|x_t)dx_{t+1} = \int_{-\infty}^\infty \frac{1}{\sigma\sqrt{2\pi}}\exp\left\{-\frac{1}{2}\left(\frac{x_{t+1}-\rho x_t-(1-\rho)\bar{x}}{\sigma}\right)^2\right\}dx_{t+1} = 1$$

which illustrates the fact that x is a continuous random variable. The question we now ask is: does there exist a discrete representation of x which is equivalent to its continuous representation? The answer to this question is yes, as shown by Tauchen and Hussey [1991].2 Tauchen and Hussey propose to replace the integral by

$$\int_{-\infty}^\infty \frac{f(x_{t+1}|x_t)}{f(x_{t+1}|\bar{x})}f(x_{t+1}|\bar{x})dx_{t+1} \equiv \int_{-\infty}^\infty \Phi(x_{t+1};x_t,\bar{x})f(x_{t+1}|\bar{x})dx_{t+1} = 1$$

where $f(x_{t+1}|\bar{x})$ denotes the density of $x_{t+1}$ conditional on the fact that $x_t = \bar{x}$ (therefore the unconditional density), which in our case implies that

$$\Phi(x_{t+1};x_t,\bar{x}) \equiv \frac{f(x_{t+1}|x_t)}{f(x_{t+1}|\bar{x})} = \exp\left\{-\frac{1}{2}\left[\left(\frac{x_{t+1}-\rho x_t-(1-\rho)\bar{x}}{\sigma}\right)^2-\left(\frac{x_{t+1}-\bar{x}}{\sigma}\right)^2\right]\right\}$$

Then we can use the standard linear transformation and impose $y_t = (x_t-\bar{x})/(\sigma\sqrt{2})$ to get

$$\frac{1}{\sqrt{\pi}}\int_{-\infty}^\infty \exp\left\{-(y_{t+1}-\rho y_t)^2+y_{t+1}^2\right\}\exp\left(-y_{t+1}^2\right)dy_{t+1}$$

for which we can use a Gauss-Hermite quadrature. Assume then that we have the quadrature nodes $y_i$ and weights $\omega_i$, $i = 1,\ldots,n$; the quadrature leads to the formula

$$\frac{1}{\sqrt{\pi}}\sum_{j=1}^n \omega_j\Phi(y_j;y_i,\bar{x}) \simeq 1$$

In other words, we might interpret the quantity $\frac{\omega_j}{\sqrt{\pi}}\Phi(y_j;y_i,\bar{x})$ as an estimate $\hat\pi_{ij}$ of the transition probability from state i to state j. But remember that the quadrature is just an approximation, such that it will generally be the case that $\sum_{j=1}^n \hat\pi_{ij} = 1$ will not hold exactly. Tauchen and Hussey therefore propose the following modification:

$$\hat\pi_{ij} = \frac{\omega_j\Phi(y_j;y_i,\bar{x})}{\sqrt{\pi}\,s_i} \quad\text{where}\quad s_i = \frac{1}{\sqrt{\pi}}\sum_{j=1}^n \omega_j\Phi(y_j;y_i,\bar{x})$$

We then end up with a Markov chain with nodes $x_i = \sigma\sqrt{2}\,y_i+\bar{x}$ and transition probabilities $\hat\pi_{ij}$ given by the previous equation. The matlab code to generate such an approximation is then straightforward. It yields the following 4 states approximation to an AR(1) process with persistence $\rho = 0.9$ and $\sigma = 0.01$, with $\bar{x} = 0$:

$$x_d = \{-0.0233, -0.0074, 0.0074, 0.0233\}$$

and

$$\Pi = \begin{pmatrix}
0.7330 & 0.2557 & 0.0113 & 0.0000\\
0.1745 & 0.5964 & 0.2214 & 0.0077\\
0.0077 & 0.2214 & 0.5964 & 0.1745\\
0.0000 & 0.0113 & 0.2557 & 0.7330
\end{pmatrix}$$

meaning for instance that we stay in state 1 with probability 0.7330, but will transit from state 2 to state 3 with probability 0.2214.

2 This is actually a direct application of Gaussian quadrature.
Matlab Code: Discretization of an AR(1) Process

n     = 4;          % number of nodes
xbar  = 0;          % mean of the x process
rho   = 0.9;        % persistence parameter
sigma = 0.01;       % volatility
[xx,wx] = gauss_herm(n);           % nodes and weights for x
x_d   = sqrt(2)*sigma*xx+xbar;     % discrete states
x     = xx(:,ones(n,1));           % x(i,j) = xx(i): today's state
y     = x';                        % y(i,j) = xx(j): tomorrow's state
w     = wx(:,ones(n,1))';          % w(i,j) = wx(j): weights
%
% computation
%
px = (exp(y.*y-(y-rho*x).*(y-rho*x)).*w)./sqrt(pi);
sx = sum(px,2);                    % row sums s_i
px = px./sx(:,ones(n,1));          % normalized transition matrix

4.2.3 Potential problems

In all the cases we dealt with in the previous sections, the integrals were definite, or at least existed (up to some examples), but the function may exhibit singularities such that the integral is not definite. For instance, think of integrating $x^{-1/2}$ over [0;1]: the function diverges at 0. How will the methods we presented in the previous section perform? The following theorem by Davis and Rabinowitz [1984] states that standard methods can still be used.

Theorem 3 Assume that there exists a continuous monotonically increasing function $G : [0;1] \to \mathbb{R}$ such that $\int_0^1 G(x)dx < \infty$ and $|F(x)| \leqslant |G(x)|$ on [0;1]; then the Newton-Cotes rule (with $F(1) = 0$ to avoid the singularity in 1) and the Gauss-Legendre quadrature rule converge to $\int_0^1 F(x)dx$ as n increases to $\infty$.

Therefore, we can still apply standard methods to compute such integrals, but convergence is much slower, and the error formulas cannot be used anymore, as $\|F^{(k)}(x)\|$ is infinite for $k \geqslant 1$. Then, if we still want to use error bounds, we need to accommodate the rules to handle singularities. There are several ways of dealing with singularities:

1. develop a specific quadrature method to deal with the singularity;
2. use a change of variable.

Another potential problem is: how many intervals or nodes should we use? Usually there is no clear answer to that question, and we therefore have to adapt the method. This is the so-called adaptive quadrature method. The idea is to increase the number of nodes up to the point where further increases do not yield any significant change in the numerical integral. The disadvantage of this approach is the computational cost it involves.

4.2.4 Multivariate integration

There will be situations where we would like to compute multivariate integrals. This will in particular be the case when we deal with models in which the economic environment is hit by stochastic shocks, or in incentive problems where the principal has to reveal multiple characteristics. In such cases, numerical integration is in order. There are several ways of performing multivariate integration, among which product rules (which I will describe the most), non-product rules (which are extremely specific to the problem at hand), and Monte-Carlo and quasi Monte-Carlo methods.
Product rules
Let us assume that we want to compute the integral

$$\int_{a_1}^{b_1}\ldots\int_{a_s}^{b_s}F(x_1,\ldots,x_s)w_1(x_1)\ldots w_s(x_s)dx_1\ldots dx_s$$

for a function $F : \mathbb{R}^s \to \mathbb{R}$, where each $w_k$ is a weighting function. The idea of product rules is just to extend the standard one-dimensional quadrature approach to higher dimensions by multiplying sums. For instance, let $x_{i_k}^k$ and $\omega_{i_k}^k$, $i_k = 1,\ldots,n_k$, be the quadrature nodes and weights of the one-dimensional problem along dimension $k \in \{1,\ldots,s\}$, which can be obtained either from a Newton-Cotes formula or a Gaussian quadrature formula. The product rule approximates the integral by

$$\sum_{i_1=1}^{n_1}\ldots\sum_{i_s=1}^{n_s}\omega_{i_1}^1\ldots\omega_{i_s}^s F(x_{i_1}^1,\ldots,x_{i_s}^s)$$

A potential difficulty with this approach is that the computational cost increases exponentially with the dimension of the space: this is the so-called curse of dimensionality. Therefore, this approach should be restricted to low dimension problems.
As an example of the use of this type of method, let us assume that we want to compute the first order moment of the 2-dimensional function $F(x_1,x_2)$, where

$$\begin{pmatrix}x_1\\x_2\end{pmatrix} \sim \mathscr{N}\left(\begin{pmatrix}\mu_1\\\mu_2\end{pmatrix},\begin{pmatrix}\sigma_{11}&\sigma_{12}\\\sigma_{12}&\sigma_{22}\end{pmatrix}\right)$$

We therefore have to compute the integral

$$|\Sigma|^{-\frac{1}{2}}(2\pi)^{-1}\int_{-\infty}^\infty\int_{-\infty}^\infty F(x_1,x_2)\exp\left(-\frac{1}{2}(x-\mu)'\Sigma^{-1}(x-\mu)\right)dx_1dx_2$$

where $x = (x_1,x_2)'$, $\mu = (\mu_1,\mu_2)'$ and $\Sigma = \begin{pmatrix}\sigma_{11}&\sigma_{12}\\\sigma_{12}&\sigma_{22}\end{pmatrix}$. Let $\Omega$ be the Cholesky decomposition of $\Sigma$, such that $\Sigma = \Omega\Omega'$, and let us make the change of variable

$$y = \Omega^{-1}(x-\mu)/\sqrt{2} \iff x = \sqrt{2}\,\Omega y+\mu$$

Then the integral rewrites

$$\pi^{-1}\int_{-\infty}^\infty\int_{-\infty}^\infty F(\sqrt{2}\,\Omega y+\mu)\exp\left(-\sum_{i=1}^2 y_i^2\right)dy_1dy_2$$

We then use the product rule relying on one-dimensional Gauss-Hermite quadrature, such that we approximate the integral by

$$\frac{1}{\pi}\sum_{i_1=1}^{n_1}\sum_{i_2=1}^{n_2}\omega_{i_1}^1\omega_{i_2}^2 F\left(\sqrt{2}\,\Omega_{11}y_{i_1}+\mu_1,\ \sqrt{2}(\Omega_{21}y_{i_1}+\Omega_{22}y_{i_2})+\mu_2\right)$$

As an example (see the matlab code below), we set

$$F(x_1,x_2) = (e^{x_1}-e^{\mu_1})(e^{x_2}-e^{\mu_2})$$

with $\mu = (0.1, 0.2)'$ and

$$\Sigma = \begin{pmatrix}0.0100&0.0075\\0.0075&0.0200\end{pmatrix}$$

The results are reported in table 4.5, where we consider different values for $n_1$ and $n_2$. It appears that the method performs well pretty fast, as the true value of the integral is 0.01038358129717, which is attained for $n_1 \geqslant 8$ and $n_2 \geqslant 8$.
Table 4.5: 2D Gauss-Hermite quadrature

n1\n2   2                  4                  8                  12
2       0.01029112845254   0.01029142086814   0.01029142086857   0.01029142086857
4       0.01038328639869   0.01038358058862   0.01038358058906   0.01038358058906
8       0.01038328710679   0.01038358129674   0.01038358129717   0.01038358129717
12      0.01038328710679   0.01038358129674   0.01038358129717   0.01038358129717

Matlab Code: 2D Gauss-Hermite Quadrature (Product Rule)

n       = 2;                    % dimension of the problem
n1      = 8;                    % # of nodes for x1
[x1,w1] = gauss_herm(n1);       % nodes and weights for x1
n2      = 8;                    % # of nodes for x2
[x2,w2] = gauss_herm(n2);       % nodes and weights for x2
Sigma   = 0.01*[1 0.75;0.75 2];
Omega   = chol(Sigma);
mu1     = 0.1;
mu2     = 0.2;
int     = 0;
for i=1:n1;
   for j=1:n2;
      x12 = sqrt(2)*Omega*[x1(i);x2(j)]+[mu1;mu2];
      f   = (exp(x12(1))-exp(mu1))*(exp(x12(2))-exp(mu2));
      int = int+w1(i)*w2(j)*f;
   end
end
int = int/sqrt(pi^n);

The problem is that whenever the dimension of the problem increases, or the function becomes complicated, these procedures will not perform well, and relying on stochastic approximation may be a good idea.

4.2.5 Monte-Carlo integration

Monte-Carlo integration methods are sampling methods based on probability theory, which rely on repeated trials to reveal information. From an intuitive point of view, Monte-Carlo methods rest on the central limit theorem and the law of large numbers, and are capable of handling quite complicated and large problems. These two features make Monte-Carlo methods particularly worth learning.

A very important feature of Monte-Carlo methods is that they appeal to probability theory: any result of a Monte-Carlo experiment is therefore a random variable. This is precisely a very nice feature of Monte-Carlo methods, as by their probabilistic nature they put a lot of structure on the approximation error, which has a probabilistic distribution. Finally, by adjusting the size of the sample we can always increase the accuracy of the approximation. This is just a consequence of the central limit theorem.

The basic intuition behind Monte-Carlo integration may be found in figure 4.3. The dark curve is the univariate function we want to integrate, and the shaded area under this curve is the integral. The evaluation of an integral using Monte-Carlo simulations then amounts to drawing random numbers in the (x,y) plane (the dots in the graph); the integral of the function f is approximately given by the total area times the fraction of points that fall under the curve f(x). It is then obvious that the greater the number of points (the more information we get), the more accurate is the evaluation of this area. Further, this method will prove competitive only for complicated and/or multidimensional functions. Note that the integral evaluation will be better if the points are uniformly scattered over the entire area, that is, if the information is spread all over the area.

Figure 4.3: Basic idea of Monte-Carlo integration

Another way to think of it is just to realize that

$$\int_a^b f(x)dx = (b-a)E_{U_{[a;b]}}(f(x))$$

such that if we draw n random numbers $x_i$, $i = 1,\ldots,n$, from a $U_{[a;b]}$ distribution, an approximation of the integral of f(x) over the interval [a;b] is given by

$$\frac{b-a}{n}\sum_{i=1}^n f(x_i)$$

The key point here is the way we get random numbers.

Not so random numbers!

Monte-Carlo methods are usually associated with stochastic simulations and therefore rely on random numbers. But such numbers cannot be generated by computers.3 Computers are only capable (and this is already a great thing) of generating pseudo-random numbers, that is, numbers that look like random numbers because they look unpredictable. However, it should be clear that all these numbers are generated by deterministic algorithms (hence the term "pseudo"), whose implementation is said to be of the volatile type, in the sense that the seed (the initial value of a sequence) depends on an external unpredictable feeder like the computer clock. Two important properties are usually demanded of such generators:

1. zero serial correlation: we want i.i.d. sequences;
2. correct frequency of runs: we do not want to generate predictable sequences.

The most well-known and simplest random number generator relies on the so-called linear congruential method, which obeys the equation

$$x_{k+1} = ax_k + c \pmod{m}$$

One big advantage of this method is that it is pretty fast and cheap. The most popular implementation of this scheme assumes $a \equiv 3 \pmod 8$, $c = 0$ and $m = 2^b$, where b is the number of significant bits available on the computer (these days 32 or 64). Using this scheme we then generate sequences that resemble random numbers.4 For example, figure 4.4 reports a sequence of 250 numbers generated by this pseudo-random number generator; it looks like random numbers, it smells like randomness, it tastes like randomness, but this is not randomness! In fact, linear congruential methods are not immune from serial correlation on successive calls: if k random numbers at a time are used to plot points in k-dimensional space, the points will not fill up the space but will tend to lie on (k-1)-dimensional planes. This can easily be seen as soon as we plot $x_{k+1}$ against $x_k$, as done in figure 4.5.

Figure 4.4: A pseudo-random numbers draw (linear congruential generator)

3 There have been attempts to build truly random number generators, but these technics were far too costly and awkward.
4 Generating a 2-dimensional sequence may be done by extracting subsequences: $y_k = (x_{2k+1},x_{2k+2})$.

This too pronounced non-random pattern for these numbers led to push linear congruential methods into disfavor; the solution has been to design more complicated generators. An example of those generators, quoted by Judd [1998], is the multiple prime random number generator, for which we report the matlab code. This pseudo-random number generator, proposed by Haas [1987], generates integers between 0 and 99999, such that dividing the sequence by 100,000 returns numbers that approximate a uniform random variable over [0;1] with 5 digits of precision. If higher precision is needed, the sequence may just be concatenated using the scheme (for 8 digits of precision) $100{,}000\,x_{2k}+x_{2k+1}$. The advantage of this generator is that its period is over 85 trillions!

Figure 4.5: The linear congruential generator ($x_{k+1}$ plotted against $x_k$)

Matlab Code: Prime Random Number Generator

long = 10000;       % length of the sample
m    = 971;
ia   = 11113;
ib   = 104322;
x    = zeros(long,1);
x(1) = 481;
for i= 2:long;
   m = m+7;
   ia= ia+1907;
   ib= ib+73939;
   if m>=9973;m=m-9871;end
   if ia>=99991;ia=ia-89989;end
   if ib>=224729;ib=ib-96233;end
   x(i)=mod(x(i-1)*m+ia+ib,100000)/100000;   % normalize to [0;1) as described above
end

Other generators may be designed, and can be nonlinear, as in

$$x_{k+1} = f(x_k) \bmod m$$

or may take rather strange formulations, as the one reported by Judd [1998], which begins with a sequence of 55 odd numbers and computes

$$x_k = (x_{k-24}\,x_{k-55}) \bmod 2^{32}$$

which has a period length of about $10^{25}$, such that it passes a lot of randomness tests.
A key feature of all these random number generators is that they attempt to draw numbers from a uniform distribution over the interval [0;1]. There may however be cases where we would like to draw numbers from another distribution, mainly the normal distribution. The way to handle this problem is to invert the cumulative density function of the distribution we want to generate, so as to transform a uniform draw into a draw from this particular distribution. More formally, assume we want numbers generated from the distribution with density f(.), and that we have a draw $\{x_i\}_{i=1}^N$ from the uniform distribution; then the draw $\{y_i\}_{i=1}^N$ from the f distribution may be obtained by solving

$$\int_a^{y_i} f(s)ds = x_i \quad\text{for } i = 1,\ldots,N$$

Inverting this function may be trivial in some cases (say the uniform over [a;b]), but it may require an approximation, as in the case of a normal distribution.
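As a simple illustration of the principle, the exponential distribution inverts in closed form; the parameter and sample size below are mere examples.

Matlab Code: Inverse c.d.f. Sampling (exponential example)

N      = 10000;
lambda = 2;                      % parameter of the exponential distribution
x      = rand(N,1);              % uniform draws over [0;1]
y      = -log(1-x)/lambda;       % solves F(y) = 1-exp(-lambda*y) = x
mean(y)                          % should be close to 1/lambda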


Monte-Carlo integration

The underlying idea of Monte-Carlo integration may be found in the Law of Large Numbers.

Theorem 4 (Law of Large Numbers) If $X_i$ is a collection of i.i.d. random variables with density $\phi(x)$, then

$$\lim_{N\to\infty}\frac{1}{N}\sum_{i=1}^N X_i = \int x\phi(x)dx \quad\text{almost surely}$$

Further, we know that in this case

$$\mathrm{var}\left(\frac{1}{N}\sum_{i=1}^N X_i\right) = \frac{\sigma^2}{N} \quad\text{where}\quad \sigma^2 = \mathrm{var}(X_i)$$

If $\sigma^2$ is not known, it can be estimated by

$$\hat\sigma^2 = \frac{1}{N-1}\sum_{i=1}^N\left(X_i-\bar{X}\right)^2 \quad\text{with}\quad \bar{X} = \frac{1}{N}\sum_{i=1}^N X_i$$

With this in mind, we understand the potential of Monte-Carlo methods for numerical integration. Integrating a function F(x) over [0;1] is nothing else than computing the mean of F(x) assuming that $x \sim U_{[0;1]}$; therefore, a crude application of the Monte-Carlo method to compute the integral $\int_0^1 F(x)dx$ is to draw N numbers $x_i$ from a $U_{[0;1]}$ distribution and take

$$\hat{I}_F = \frac{1}{N}\sum_{i=1}^N F(x_i)$$

as an approximation of the integral. Further, as this is just an estimate of the integral, it is a random variable with variance

$$\sigma^2_{\hat{I}_F} = \frac{1}{N}\int_0^1\left(F(x)-I_F\right)^2dx = \frac{\sigma_F^2}{N}$$

where $\sigma_F^2$ may be estimated by

$$\hat\sigma_F^2 = \frac{1}{N-1}\sum_{i=1}^N\left(F(x_i)-\hat{I}_F\right)^2$$

such that the standard error of the Monte-Carlo integral is $\sigma_{\hat{I}_F} = \hat\sigma_F/\sqrt{N}$.
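In matlab, the crude estimator and its standard error write as follows for the exponential example of table 4.6 (the sample size is a mere example):

Matlab Code: Crude Monte-Carlo Integration

N   = 10000;
x   = rand(N,1);                 % uniform draws over [0;1]
fx  = exp(x);
If  = mean(fx)                   % Monte-Carlo estimate of the integral
sIf = std(fx)/sqrt(N)            % standard error of the estimate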

As an example of a crude application of Monte-Carlo integration, we report in table 4.6 the results obtained when integrating the exponential function over [0;1].

Table 4.6: Crude Monte-Carlo example: $\int_0^1 e^x dx$

N          $\hat{I}_f$    $\hat\sigma_{\hat{I}_f}$
10         1.54903750     0.13529216
100        1.69945455     0.05408852
1000       1.72543465     0.01625793
10000      1.72454262     0.00494992
100000     1.72139292     0.00156246
1000000    1.71853252     0.00049203

True value: 1.71828182

This table illustrates why Monte-Carlo integration is seldom used (i) for univariate integration and (ii) without modification. Indeed, as can be seen, a huge number of data points is needed to achieve, on average, a good enough approximation (1000000 points are needed to get an error lower than 0.5e-4), and the standard deviation associated with each experiment is far too high: even with only 10 data points, a Student test would lead us to accept the approximation despite its evident lack of accuracy! Therefore, several modifications are usually proposed in order to circumvent these drawbacks.
Antithetic variates: This acceleration method relies on the idea that if f is monotonically increasing, then f(x) and f(1-x) are negatively correlated. Estimating the integral as

$$\hat{I}_f^A = \frac{1}{2N}\sum_{i=1}^N\left(F(x_i)+F(1-x_i)\right)$$

will still furnish an unbiased estimator of the integral, while delivering a lower variance of the estimator because of the negative correlation between F(x) and F(1-x):

$$\mathrm{var}(\hat{I}_f^A) = \frac{\mathrm{var}(F(x))+\mathrm{var}(F(1-x))+2\,\mathrm{cov}(F(x),F(1-x))}{4N} = \frac{\sigma_F^2+\mathrm{cov}(F(x),F(1-x))}{2N} \leqslant \frac{\sigma_F^2}{N}$$

This method is particularly recommended when F is monotone. Table 4.7 illustrates the potential of the approach for the previous example. As can be seen, the gains in terms of volatility are particularly important, but these are also important in terms of average, even in small samples.5
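A minimal sketch of the antithetic estimator for the same example (each draw x is paired with 1-x):

Matlab Code: Antithetic Variates

N    = 10000;
x    = rand(N,1);
fa   = (exp(x)+exp(1-x))/2;       % average of F over the antithetic pair
IfA  = mean(fa)                   % unbiased estimate of the integral
sIfA = std(fa)/sqrt(N)            % standard error, lower than crude MC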
Stratified sampling: Stratified sampling rests on the basic and quite appealing idea that the variance of f over a subinterval of [0;1] should be lower than its variance over the whole interval. The underlying idea is to prevent draws from clustering in a particular region of the interval; we force the procedure to visit each subinterval, and thereby enlarge the information set used by the algorithm.

5 Note that we used the same seed when generating this integral as the one used for crude Monte-Carlo.

Table 4.7: Antithetic variates example: $\int_0^1 e^x dx$

N          $\hat{I}_f$    $\hat\sigma_{\hat{I}_f}$
10         1.71170096     0.02061231
100        1.73211884     0.00908890
1000       1.72472178     0.00282691
10000      1.71917393     0.00088709
100000     1.71874441     0.00027981
1000000    1.71827383     0.00008845

True value: 1.71828182

The stratified sampling approach works as follows. We set $\alpha \in (0,1)$, and we draw $N_a = \alpha N$ data points over $[0;\alpha]$ and $N_b = N-N_a = (1-\alpha)N$ over $[\alpha;1]$. Then the integral can be evaluated as

$$\hat{I}_f^s = \frac{\alpha}{N_a}\sum_{i=1}^{N_a}F(x_i^a) + \frac{1-\alpha}{N_b}\sum_{i=1}^{N_b}F(x_i^b)$$

where $x_i^a \in [0;\alpha]$ and $x_i^b \in [\alpha;1]$. The variance of this estimator is given by

$$\frac{\alpha^2}{N_a}\mathrm{var}_a(F(x)) + \frac{(1-\alpha)^2}{N_b}\mathrm{var}_b(F(x))$$

which equals

$$\frac{\alpha}{N}\mathrm{var}_a(F(x)) + \frac{1-\alpha}{N}\mathrm{var}_b(F(x))$$

Table 4.8 reports results for the exponential function with $\alpha = 0.25$. As can be seen from the table, apart from the 10 points example,6 there is hopefully no difference between the crude Monte-Carlo method and the stratified sampling approach in the evaluation of the integral, and we find a potential gain from this approach in the variance of the estimates. The remaining problem to be fixed is: how should $\alpha$ be selected? In fact, we would like to select $\alpha$ so as to minimize the volatility,

6 This is related to the very small sample in this case.

Table 4.8: Stratified sampling example: $\int_0^1 e^x dx$

N         $\hat{I}_f$    $\hat\sigma_{\hat{I}_f}$
10        1.52182534     0.11224567
100       1.69945455     0.04137204
1000      1.72543465     0.01187637
10000     1.72454262     0.00359030
100000    1.72139292     0.00114040

True value: 1.71828182

which amounts to setting $\alpha$ such that

$$\mathrm{var}_a(F(x)) = \mathrm{var}_b(F(x))$$

which drives the overall variance to

$$\frac{\mathrm{var}_b(F(x))}{N}$$
Control variates: The method of control variates tries to extract information from a function $\varphi$ that approximates the function to be integrated arbitrarily well, while being easy to integrate. Hence, assuming there exists such a function $\varphi$, similar to F but easily integrated, the identity

$$\int F(x)dx = \int\left(F(x)-\varphi(x)\right)dx + \int\varphi(x)dx$$

restates the problem as the Monte-Carlo integration of $(F-\varphi)$ plus the known integral of $\varphi$. The variance of $(F-\varphi)$ is given by $\sigma_F^2+\sigma_\varphi^2-2\,\mathrm{cov}(F,\varphi)$, which is lower than $\sigma_F^2$ provided the covariance between F and $\varphi$ is high enough.

In our example, we may use $\varphi(x) = 1+x$, since $\exp(x) \simeq 1+x$ in a neighborhood of zero. $\int_0^1(1+x)dx$ is simple to compute and equal to 1.5. Table 4.9 reports the results. As can be seen, the method performs a little worse than antithetic variates, but far better than crude Monte-Carlo.

Table 4.9: Control variates example: $\int_0^1 e^x dx$

N          $\hat{I}_f$    $\hat\sigma_{\hat{I}_f}$
10         1.64503465     0.05006855
100        1.71897083     0.02293349
1000       1.72499149     0.00688639
10000      1.72132486     0.00210111
100000     1.71983807     0.00066429
1000000    1.71838279     0.00020900

True value: 1.71828182

Importance sampling: Importance sampling attempts to circumvent an insufficiency of the crude Monte-Carlo method: by drawing numbers from a uniform distribution, information is spread all over the interval we are sampling over, but there are cases where this is not the most efficient strategy. Further, there may exist a simple transformation of the problem under which Monte-Carlo integration delivers a far better result in terms of variance. More formally, assume you want to integrate F over a given domain:

$$\int_D F(x)dx$$

Now assume there exists a function G such that $H = F/G$ is almost constant over the domain of integration D; the problem may be restated as

$$\int_D\frac{F(x)}{G(x)}G(x)dx = \int_D H(x)G(x)dx$$

Then we can easily integrate F by instead sampling H, drawing numbers not from a uniform density but rather from the non-uniform density G(x)dx. The approximated integral is then given by

$$\hat{I}_F^{is} = \frac{1}{N}\sum_{i=1}^N\frac{F(x_i)}{G(x_i)}$$

and it has variance

$$\sigma^2_{\hat{I}^{is}} = \frac{1}{N}\left(\int_D\frac{F(x)^2}{G(x)^2}G(x)dx - \left(\int_D\frac{F(x)}{G(x)}G(x)dx\right)^2\right) = \frac{1}{N}\left(\int_D\frac{F(x)^2}{G(x)}dx - \left(\int_D F(x)dx\right)^2\right)$$

The problem we still have is: how should G be selected? We see from the variance that if G were exactly proportional to F, the variance would reduce to zero, but then what would be the gain? Moreover, it may be the case that G is not a distribution, or is far too complicated to sample. In fact, we would like G to display a shape close to that of F while being simple to sample.

In the example reported in table 4.10, we used $G(x) = (1+\alpha)x^\alpha$, with $\alpha = 1.5$. As can be seen, the gains in terms of variance are particularly important, which renders the method particularly attractive; nevertheless, the selection of the G function requires a pretty good knowledge of the function to be integrated, which will not be the case in a number of economic problems.
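A minimal sketch of this experiment follows; drawing from the density $G(x) = (1+\alpha)x^\alpha$ over [0;1] by inverting its c.d.f. $x^{1+\alpha}$ is an implementation assumption rather than part of the original text.

Matlab Code: Importance Sampling

N     = 10000;
alpha = 1.5;
u     = rand(N,1);
x     = u.^(1/(1+alpha));               % draws from G(x) = (1+alpha)*x^alpha
h     = exp(x)./((1+alpha)*x.^alpha);   % H = F/G evaluated at the draws
Iis   = mean(h)                         % importance sampling estimate
sIis  = std(h)/sqrt(N)                  % estimated standard error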

Table 4.10: Importance sampling example: $\int_0^1 e^x dx$

N          $\hat{I}_f$    $\hat\sigma_{\hat{I}_f}$
10         1.54903750     0.04278314
100        1.69945455     0.00540885
1000       1.72543465     0.00051412
10000      1.72454262     0.00004950
100000     1.72139292     0.00000494
1000000    1.71853252     0.00000049

True value: 1.71828182

4.2.6 Quasi-Monte Carlo methods

Quasi-Monte Carlo methods are fundamentally different from Monte-Carlo methods, although they look very similar. In contrast to Monte-Carlo methods, which rely on probability theory, quasi-Monte Carlo methods rely on number theory (and Fourier analysis, but we will not explore this avenue here). As we have seen, Monte-Carlo methods use pseudo-random number generators, which are actually deterministic schemes. A first question that may then be addressed to such an approach is: if the MC sequences are deterministic, how can I use probability theory to get theoretical results? And in particular, what is the applicability of the Law of Large Numbers and the Central Limit Theorem? This is however a bit unfair, as many new random number generators pass the randomness tests. Nevertheless, why not acknowledge the deterministic nature of these sequences and try to use it? This is what quasi-Monte Carlo methods propose.

There is another nice feature of quasi-Monte Carlo methods, related to the rate of convergence. Indeed, we have seen that choosing N points uniformly in an n-dimensional space leads to a Monte-Carlo error that diminishes as $1/\sqrt{N}$. From an intuitive point of view, this comes from the fact that each new point adds linearly to an accumulated sum that will become the function average, and also linearly to an accumulated sum of squares that will become the variance. Since the estimated error is the square root of the variance, the power is $N^{-1/2}$. But we can accelerate convergence by relying on some purely deterministic schemes, as quasi-Monte Carlo methods do.

Quasi-Monte Carlo methods rely on equidistributed sequences, that is, sequences that satisfy the following definition.

Definition 1 A sequence $\{x_i\}_{i=1}^\infty \subset D \subset \mathbb{R}^n$ is said to be equidistributed over the domain D iff

$$\lim_{N\to\infty}\frac{\mu(D)}{N}\sum_{i=1}^N F(x_i) = \int_D F(x)dx$$

for all Riemann-integrable functions $F : \mathbb{R}^n \to \mathbb{R}$, where $\mu(D)$ is the Lebesgue measure of D.
measure of D.

In order to better understand what this exactly means, let us consider the unidimensional case: the sequence $\{x_i\}_{i=1}^\infty \subset \mathbb{R}$ is equidistributed over [a;b] if for any Riemann-integrable function we have

$$\lim_{N\to\infty}\frac{b-a}{N}\sum_{i=1}^N F(x_i) = \int_a^b F(x)dx$$

This is therefore just a formal statement of a uniform distribution, as it just states that if we sample data points correctly over the interval [a;b], these points should deliver a valid approximation to the integration problem. From an intuitive point of view, equidistributed sequences are just deterministic sequences that mimic the uniform distribution; but since they are, by essence, deterministic, we can select their exact location, and therefore avoid clustering or sampling the same point twice. This is why quasi-Monte Carlo methods appear so attractive: they should be more efficient.

There exist different ways of selecting equidistributed sequences. Judd [1998], chapter 9, reports different sequences that may be used, but they share the common feature of being generated by the scheme

$$x_{k+1} = (x_k+\theta) \bmod 1$$

which amounts to taking the fractional part of $k\theta$.7 $\theta$ should be an irrational number. These sequences are, among others:

7 Remember that the fractional part is the part of a number that lies right after the dot. It is denoted by $\{.\}$, such that $\{2.5\} = 0.5$, and can be computed as $\{x\} = x - \max\{k \in \mathbb{Z} : k \leqslant x\}$. The matlab expression that returns this component is x-fix(x).

Weyl: $(\{k\sqrt{p_1}\},\ldots,\{k\sqrt{p_n}\})$, where n is the dimension of the space;

Haber: $(\{\frac{k(k+1)}{2}\sqrt{p_1}\},\ldots,\{\frac{k(k+1)}{2}\sqrt{p_n}\})$;

Niederreiter: $(\{k\,2^{1/(1+n)}\},\ldots,\{k\,2^{n/(1+n)}\})$;

Baker: $(\{ke^{r_1}\},\ldots,\{ke^{r_n}\})$, where the $r$s are rational and distinct numbers.

In all these cases, the $p$s are usually prime numbers. Figure 4.6 reports a 2-dimensional sample of 1000 points for each type of sequence.

Figure 4.6: Quasi-Monte Carlo sequences (panels: Weyl, Haber, Niederreiter and Baker)

There obviously exist other ways of obtaining sequences for quasi-Monte Carlo methods, relying on low discrepancy approaches, Fourier methods, or the so-called good lattice points approach. The interested reader may refer to chapter 9 in Judd [1998], but we will not investigate this any further, as it would bring us far away from our initial purpose.
Matlab Code: Equidistributed Sequences

n   = 2;                         % dimension of the space
nb  = 1000;                      % number of data points
K   = (1:nb)';                   % k = 1,...,nb
seq = 'NIEDERREITER';            % type of sequence
switch upper(seq)
case 'WEYL'                      % Weyl
   p = sqrt(primes(n+1));
   x = K*p;
   x = x-fix(x);
case 'HABER'                     % Haber
   p = sqrt(primes(n+1));
   x = (K.*(K+1)/2)*p;
   x = x-fix(x);
case 'NIEDERREITER'              % Niederreiter
   x = K*(2.^((1:n)/(1+n)));
   x = x-fix(x);
case 'BAKER'                     % Baker
   x = K*exp(1./primes(n+1));
   x = x-fix(x);
otherwise
   error('Unknown sequence requested')
end

As an example, we report in table 4.11 the results obtained when integrating the exponential function over [0;1]. Once again, the potential gain of this type of method will be found in approximating integrals of multidimensional or complicated functions. Further, as for Monte-Carlo methods, this type of integration is not restricted to the $[0;1]^n$ hypercube: you may transform the function, or perform a change of variables, to be able to use the method. Finally, note that we may also apply all the acceleration methods developed for Monte-Carlo technics to the quasi-Monte Carlo approach.

Table 4.11: Quasi Monte-Carlo example: $\int_0^1 e^x dx$

N         Weyl                       Haber                      Niederreiter               Baker
10        1.67548650 (0.0427953)     1.72014839 (0.00186656)    1.67548650 (0.0427953)     1.82322097 (0.104939)
100       1.71386433 (0.0044175)     1.75678423 (0.0385024)     1.71386433 (0.0044175)     1.71871676 (0.000434929)
1000      1.71803058 (0.000251247)   1.71480932 (0.00347251)    1.71803058 (0.000251247)   1.71817437 (0.000107457)
10000     1.71830854 (2.67146e-005)  1.71495774 (0.00332409)    1.71830854 (2.67146e-005)  1.71829897 (1.71431e-005)
100000    1.71829045 (8.62217e-006)  1.71890493 (0.000623101)   1.71829045 (8.62217e-006)  1.71827363 (8.20223e-006)
1000000   1.71828227 (4.36844e-007)  1.71816697 (0.000114855)   1.71828227 (4.36844e-007)  1.71828124 (5.9314e-007)

True value: 1.71828182, absolute errors in parentheses.

Bibliography

Davis, P.J. and P. Rabinowitz, Methods of Numerical Integration, New York: Academic Press, 1984.

Judd, K.L., Numerical Methods in Economics, Cambridge, Massachusetts: MIT Press, 1998.

Tauchen, G. and R. Hussey, "Quadrature Based Methods for Obtaining Approximate Solutions to Nonlinear Asset Pricing Models", Econometrica, 1991, 59 (2), 371-396.

Index

Antithetic variates, 41
Composite rule, 11
Control variates, 43
Gauss-Chebychev quadrature, 21
Gauss-Laguerre quadrature, 24
Gauss-Legendre quadrature, 22
Hessian, 1
Importance sampling, 44
Jacobian, 1
Law of large numbers, 39
Midpoint rule, 10
Monte-Carlo, 34
Newton-Cotes, 10
Pseudo-random numbers, 36
Quadrature, 9
Quadrature nodes, 18
Quadrature weights, 18
Quasi-Monte Carlo, 46
Random numbers generators, 35
Richardson Extrapolation, 5
Simpson's rule, 13
Stratified sampling, 41
Trapezoid rule, 11

Contents

4 Numerical differentiation and integration
  4.1 Numerical differentiation
      4.1.1 Computation of derivatives
      4.1.2 Partial Derivatives
      4.1.3 Hessian
  4.2 Numerical Integration
      4.2.1 Newton-Cotes formulas . . . . . . . . . . . . . . . . . . 10
      4.2.2 Gaussian quadrature . . . . . . . . . . . . . . . . . . . 18
      4.2.3 Potential problems  . . . . . . . . . . . . . . . . . . . 30
      4.2.4 Multivariate integration  . . . . . . . . . . . . . . . . 31
      4.2.5 Monte-Carlo integration . . . . . . . . . . . . . . . . . 34
      4.2.6 Quasi-Monte Carlo methods . . . . . . . . . . . . . . . . 46

List of Figures

4.1 Newton-Cotes integration  . . . . . . . . . . . . . . . . . . . . 10
4.2 Simpson's rule  . . . . . . . . . . . . . . . . . . . . . . . . . 14
4.3 Basic idea of Monte-Carlo integration . . . . . . . . . . . . . . 35
4.4 A pseudo random numbers draw (linear congruential generator)  . . 37
4.5 The linear congruential generator . . . . . . . . . . . . . . . . 38
4.6 Quasi-Monte Carlo sequences . . . . . . . . . . . . . . . . . . . 48

List of Tables

4.1  Integration with a change in variables: True value=exp(0.5)  . . 18
4.2  Welfare in finite horizon . . . . . . . . . . . . . . . . . . . . 25
4.3  Welfare in infinite horizon . . . . . . . . . . . . . . . . . . . 26
4.4  Gauss-Hermite quadrature  . . . . . . . . . . . . . . . . . . . . 28
4.5  2D Gauss-Hermite quadrature . . . . . . . . . . . . . . . . . . . 33
4.6  Crude Monte-Carlo example: $\int_0^1 e^x dx$  . . . . . . . . . . 40
4.7  Antithetic variates example: $\int_0^1 e^x dx$  . . . . . . . . . 42
4.8  Stratified sampling example: $\int_0^1 e^x dx$  . . . . . . . . . 43
4.9  Control variates example: $\int_0^1 e^x dx$ . . . . . . . . . . . 44
4.10 Importance sampling example: $\int_0^1 e^x dx$  . . . . . . . . . 45
4.11 Quasi Monte-Carlo example: $\int_0^1 e^x dx$  . . . . . . . . . . 50

Lecture Notes 5

Solving non-linear systems of equations

The core of modern macroeconomics lies in the concept of equilibrium, which is usually expressed as a system of plausibly non-linear equations. Solving such a system can either be viewed as finding the zero of a given function $F : \mathbb{R}^n \to \mathbb{R}^n$, such that $x^\star \in \mathbb{R}^n$ satisfies

$$F(x^\star) = 0$$

or may be thought of as finding a fixed point, such that $x^\star \in \mathbb{R}^n$ satisfies

$$F(x^\star) = x^\star$$

Note however that the latter may easily be restated as finding the zero of $G(x) \equiv F(x)-x$.

5.1 Solving one dimensional problems

5.1.1 General iteration procedures

The idea here is to express the problem as a fixed point, such that we would like to solve a one-dimensional problem of the form

$$x = f(x) \quad\quad (5.1)$$

The idea is then particularly straightforward. If we are given a fixed point equation of the form (5.1) on an interval $\mathscr{I}$, then starting from an initial value $x_0 \in \mathscr{I}$, a sequence $\{x_k\}$ can be constructed by setting

$$x_k = f(x_{k-1}) \quad\text{for } k = 1,2,\ldots$$

Note that this sequence can be constructed only if, for every $k = 1,2,\ldots$,

$$x_k = f(x_{k-1}) \in \mathscr{I}$$

If the sequence $\{x_k\}$ converges, i.e.

$$\lim_{k\to\infty}x_k = x^\star \in \mathscr{I}$$

then $x^\star$ is a solution of (5.1); such a procedure is called an iterative procedure. There are obviously restrictions on the behavior of the function in order to be sure to get a solution.
Theorem 1 (Existence theorem) For a finite closed interval $\mathscr{I}$, the equation x = f(x) has at least one solution $x^\star \in \mathscr{I}$ if

1. f is continuous on $\mathscr{I}$,
2. $f(x) \in \mathscr{I}$ for all $x \in \mathscr{I}$.

If this theorem establishes the existence of at least one solution, we still need to establish its uniqueness. This can be achieved by appealing to the so-called Lipschitz condition for f.

Definition 1 If there exists a number $K \in [0;1)$ so that

$$|f(x)-f(x')| \leqslant K|x-x'| \quad\text{for all } x,x' \in \mathscr{I}$$

then f is said to be Lipschitz-bounded.

A direct and more implementable implication of this definition is that any function f for which $|f'(x)| < K < 1$ for all $x \in \mathscr{I}$ is Lipschitz-bounded. We then have the following theorem, which establishes uniqueness of the solution.

Theorem 2 (Uniqueness theorem) The equation x = f(x) has at most one solution $x^\star \in \mathscr{I}$ if f is Lipschitz-bounded in $\mathscr{I}$.

The implementation of the method is then straightforward:

1. Assign an initial value, $x_k$, $k = 0$, to x and choose a vector of termination criteria $(\varepsilon_1,\varepsilon_2,\varepsilon_3) > 0$;

2. Compute $f(x_k)$;

3. If either
   (a) $|x_k-x_{k-1}| \leqslant \varepsilon_1|x_k|$ (relative iteration error),
   (b) or $|x_k-x_{k-1}| \leqslant \varepsilon_2$ (absolute iteration error),
   (c) or $|f(x_k)-x_k| \leqslant \varepsilon_3$ (absolute functional error)
   is satisfied, then stop and set $x^\star = x_k$; else go to the next step;

4. Set $x_k = f(x_{k-1})$ and go back to 2.

Note that the first stopping criterion is usually preferred to the second one. Further, the updating scheme

$$x_k = f(x_{k-1})$$

is not always a good idea, and we might prefer to use

$$x_k = \lambda_k x_{k-1} + (1-\lambda_k)f(x_{k-1})$$

where $\lambda_k \in [0;1]$ and $\lim_{k\to\infty}\lambda_k = 0$. This latter process smooths convergence, which therefore takes more iterations, but enhances the behavior of the algorithm, in the sense that it often avoids crazy behavior of $x_k$.
As an example, let us take the simple function

$$f(x) = \exp((x-2)^2)-2$$

such that we want to find the $x^\star$ that solves

$$x^\star = \exp((x^\star-2)^2)-2$$

Let us start from $x_0 = 0.95$. The simple iterative scheme is found to be diverging, as illustrated in figure 5.1 and shown in table 5.1. Why? Simply because the function is not Lipschitz-bounded in a neighborhood of the initial condition! Nevertheless, as soon as we set $\lambda_0 = 1$ and $\lambda_k = 0.99\lambda_{k-1}$, the algorithm is able to find a solution, as illustrated in table 5.2. In fact, this trick is a numerical way to circumvent the fact that the function is not Lipschitz-bounded.
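A minimal sketch of the smoothed scheme follows; the 1e-8 relative tolerance and the iteration cap are assumptions, and the exact iterates may differ slightly from table 5.2 depending on where $\lambda$ is updated within the loop.

Matlab Code: Smoothed Iterative Scheme

x    = 0.95;                     % initial condition
lam  = 1;                        % lambda_0
crit = 1;
k    = 0;
while (crit>1e-8) & (k<100);
   lam  = 0.99*lam;              % lambda_k = 0.99*lambda_{k-1}
   xn   = lam*x+(1-lam)*(exp((x-2)^2)-2);   % smoothed update
   crit = abs(xn-x)/abs(xn);     % relative iteration error
   x    = xn;
   k    = k+1;
end
x                                % approaches the fixed point 0.958516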

Figure 5.1: Non-converging iterative scheme

Table 5.1: Divergence in simple iterative procedure

k    x_k          |x_k - x_{k-1}|/|x_k|   |f(x_k) - x_k|
1    1.011686     6.168583e-002           3.558355e-001
2    0.655850     3.558355e-001           3.434699e+000
3    4.090549     3.434699e+000           7.298439e+001
4    77.074938    7.298439e+001           n.c.

Table 5.2: Convergence in modified iterative procedure

k    x_k        λ_k      |x_k - x_{k-1}|/|x_k|   |f(x_k) - x_k|
1    1.011686   1.0000   6.168583e-002           3.558355e-001
2    1.011686   0.9900   6.168583e-002           3.558355e-001
3    1.007788   0.9801   5.717130e-002           3.313568e-001
4    1.000619   0.9703   4.886409e-002           2.856971e-001
5    0.991509   0.9606   3.830308e-002           2.264715e-001
6    0.982078   0.9510   2.736281e-002           1.636896e-001
7    0.973735   0.9415   1.767839e-002           1.068652e-001
8    0.967321   0.9321   1.023070e-002           6.234596e-002
9    0.963024   0.9227   5.238553e-003           3.209798e-002
10   0.960526   0.9135   2.335836e-003           1.435778e-002
11   0.959281   0.9044   8.880916e-004           5.467532e-003
12   0.958757   0.8953   2.797410e-004           1.723373e-003
13   0.958577   0.8864   7.002284e-005           4.314821e-004
14   0.958528   0.8775   1.303966e-005           8.035568e-005
15   0.958518   0.8687   1.600526e-006           9.863215e-006
16   0.958517   0.8601   9.585968e-008           5.907345e-007

5.1.2 Bisection method

We now turn to another method, which relies on bracketing and bisecting the interval on which the zero lies. Suppose f is a continuous function on an interval $\mathscr{I} = [a;b]$ such that $f(a)f(b) < 0$, meaning that f crosses the zero line at least once on the interval, as stated by the intermediate value theorem. The method then works as follows:

1. Define an interval [a;b] (a < b) on which we want to find a zero of f, such that $f(a)f(b) < 0$, and choose a stopping criterion $\varepsilon > 0$;

2. Set $x_0 = a$, $x_1 = b$, $y_0 = f(x_0)$ and $y_1 = f(x_1)$;

3. Compute the bisection of the inclusion interval

$$x_2 = \frac{x_0+x_1}{2}$$

and compute $y_2 = f(x_2)$;

4. Determine the new interval:
   if $y_0y_2 < 0$, then $x^\star$ lies between $x_0$ and $x_2$, thus set $x_0 = x_0$, $x_1 = x_2$, $y_0 = y_0$, $y_1 = y_2$;
   else $x^\star$ lies between $x_1$ and $x_2$, thus set $x_0 = x_1$, $x_1 = x_2$, $y_0 = y_1$, $y_1 = y_2$;

5. If $|x_1-x_0| \leqslant \varepsilon(1+|x_0|+|x_1|)$ then stop and set $x^\star = x_2$; else go back to 3.

This algorithm is illustrated in figure 5.2. Table 5.3 reports the convergence scheme of the bisection algorithm when we solve for the fixed point of

$$x = \exp((x-2)^2)-2$$

Figure 5.2: The Bisection algorithm

As can be seen, it takes more iterations than the previous iterative scheme (27 iterations for a=0.5 and b=1.5, but still 19 with a=0.95 and b=0.96!), but the bisection method is actually implementable in a much greater number of cases, as it only requires continuity of the function, without imposing the Lipschitz condition.

Table 5.3: Bisection progression

iteration    x2          error
1            1.000000    5.000000e-001
2            0.750000    2.500000e-001
3            0.875000    1.250000e-001
4            0.937500    6.250000e-002
5            0.968750    3.125000e-002
10           0.958008    9.765625e-004
15           0.958527    3.051758e-005
20           0.958516    9.536743e-007
25           0.958516    2.980232e-008
26           0.958516    1.490116e-008

Matlab Code: Bisection Algorithm

function x=bisection(f,a,b,varargin);
%
% function x=bisection(f,a,b,P1,P2,...);
%
% f      : function for which we want to find a zero
% a,b    : lower and upper bounds of the interval (a<b)
% P1,... : parameters of the function
%
% x      : solution
%
epsi = 1e-8;
x0   = a;
x1   = b;
y0   = feval(f,x0,varargin{:});
y1   = feval(f,x1,varargin{:});
if a>=b
   error('a should be smaller than b')
end
if y0*y1>=0
   error('a and b should be such that f(a)f(b)<0!')
end
err  = 1;
while err>0;
   x2 = (x0+x1)/2;
   y2 = feval(f,x2,varargin{:});
   if y2*y0<0;
      x1 = x2;
      y1 = y2;
   else
      x0 = x1;
      x1 = x2;
      y0 = y1;
      y1 = y2;
   end
   err = abs(x1-x0)-epsi*(1+abs(x0)+abs(x1));
end
x = x2;

5.1.3 Newton's method

While bisection has proven more stable than the previous algorithm, it displays slow convergence. Newton's method is more efficient, as it takes advantage of information available on the derivatives of the function. A simple way to understand the underlying idea of Newton's method is to go back to the Taylor expansion of the function f:

$$f(x^\star) \simeq f(x_k) + (x^\star-x_k)f'(x_k)$$

Since $x^\star$ is a zero of the function, we have

$$f(x_k) + (x^\star-x_k)f'(x_k) = 0$$

which, replacing $x^\star$ by the new guess $x_{k+1}$ one may formulate for a candidate solution, delivers the recursive scheme

$$x_{k+1} = x_k - \frac{f(x_k)}{f'(x_k)} = x_k + \delta_k$$

with $\delta_k = -f(x_k)/f'(x_k)$. The idea of the algorithm is then straightforward: for a given guess $x_k$, we compute the tangent line of the function at $x_k$ and find the zero of this linear approximation to formulate a new guess. This process is illustrated in figure 5.3. The algorithm works as follows:

1. Assign an initial value, $x_k$, $k = 0$, to x and choose a vector of termination criteria $(\varepsilon_1,\varepsilon_2) > 0$;

2. Compute $f(x_k)$ and the associated derivative $f'(x_k)$, and therefore the step $\delta_k = -f(x_k)/f'(x_k)$;

3. Compute $x_{k+1} = x_k+\delta_k$;

4. If $|x_{k+1}-x_k| < \varepsilon_1(1+|x_{k+1}|)$ then go to 5; else go back to 2;

5. If $|f(x_{k+1})| < \varepsilon_2$ then stop and set $x^\star = x_{k+1}$; else report failure.

Figure 5.3: Newton's algorithm

A delicate point in this algorithm is that we have to compute the derivative of the function, which has to be done numerically if the derivative is not known. Table 5.4 reports the progression of Newton's algorithm in the case of our test function.
Matlab Code: Simple Newton's Method

function [x,term]=newton_1d(f,x0,varargin);
%
% function [x,term]=newton_1d(f,x0,P1,P2,...);
%
% f      : function for which we want to find a zero
% x0     : initial condition for x
% P1,... : parameters of the function
%
% x      : solution
% term   : termination status (1->OK, 0->failure)
%
eps1 = 1e-8;
eps2 = 1e-8;
dev  = .00001*max(abs(x0),1e-8);     % step for numerical differentiation
y0   = feval(f,x0,varargin{:});
err  = 1;
while err>0;
   dy0 = (feval(f,x0+dev,varargin{:})-feval(f,x0-dev,varargin{:}))/(2*dev);
   if dy0==0;
      error('Algorithm stuck at a local optimum')
   end
   d0   = -y0/dy0;
   x    = x0+d0;
   err  = abs(x-x0)-eps1*(1+abs(x));
   y0   = feval(f,x,varargin{:});
   ferr = abs(y0);
   x0   = x;
end
if ferr<eps2;
   term = 1;
else
   term = 0;
end

Table 5.4: Newton progression

iteration   x_k        error
1           0.737168   2.371682e-001
2           0.900057   1.628891e-001
3           0.954139   5.408138e-002
4           0.958491   4.352740e-003
5           0.958516   2.504984e-005
6           0.958516   8.215213e-010

Note that in order to apply this method, we need the first order derivative of f to be non-zero at each evaluation point; otherwise the algorithm degenerates, as it gets stuck at a local optimum. Further, the algorithm may get stuck in a cycle, as illustrated in figure 5.4: because the function has the same derivative in $x_0$ and $x_1$, the Newton iterative scheme cycles between the two values $x_0$ and $x_1$.

Figure 5.4: Pathological cycling behavior

A way to escape these pathological behaviors is to alter the recursive scheme. A first way would be to set

$$x_{k+1} = x_k + \lambda\delta_k$$

with $\lambda \in [0,1]$. A better method is the so-called damped Newton's method, which replaces the standard iteration by

$$x_{k+1} = x_k + \frac{\delta_k}{2^j}$$

where

$$j = \min\left\{i : 0 \leqslant i \leqslant i_{max},\ \left|f\left(x_k-\frac{1}{2^i}\frac{f(x_k)}{f'(x_k)}\right)\right| < |f(x_k)|\right\}$$

Should this condition be impossible to fulfill, one continues the process setting j = 0 as usual. In practice, one sets $i_{max} = 4$; however, in some cases $i_{max}$ should be adjusted. Increasing $i_{max}$ helps, at the cost of larger computational time.
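A minimal sketch of the damping loop follows. It is written as a fragment meant to replace the update step inside the newton_1d routine above, reusing that code's names (y0 and dy0 for the current function value and derivative), which is an assumption about how one would wire it in.

Matlab Code: Damped Newton Step (fragment)

imax = 4;
d0   = -y0/dy0;                  % standard Newton step
j    = 0;
while (j<=imax) & (abs(feval(f,x0+d0/2^j,varargin{:}))>=abs(y0));
   j = j+1;                      % halve the step until |f| decreases
end
if j>imax; j = 0; end            % condition cannot be met: keep the full step
x    = x0+d0/2^j;                % damped update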

5.1.4 Secant methods (or regula falsi)

Secant methods start by noticing that in Newton's method we need to evaluate the first order derivative of the function, which may be quite costly. Regula falsi methods therefore propose to replace the evaluation of the derivative by the secant, such that the step is replaced by

$$\delta_k = -f(x_k)\frac{x_k-x_{k-1}}{f(x_k)-f(x_{k-1})} \quad\text{for } k = 1,\ldots$$

Therefore, one has to feed the algorithm with two initial conditions $x_0$ and $x_1$ satisfying $f(x_0)f(x_1) < 0$. The algorithm then writes as follows:

1. Assign two initial values, $x_k$, $k = 0,1$, to x and choose a vector of termination criteria $(\varepsilon_1,\varepsilon_2) > 0$;

2. Compute $f(x_k)$ and the step

$$\delta_k = -f(x_k)\frac{x_k-x_{k-1}}{f(x_k)-f(x_{k-1})}$$

3. Compute $x_{k+1} = x_k+\delta_k$;

4. If $|x_{k+1}-x_k| < \varepsilon_1(1+|x_{k+1}|)$ then go to 5; else go back to 2;

5. If $|f(x_{k+1})| < \varepsilon_2$ then stop and set $x^\star = x_{k+1}$; else report failure.

Matlab Code: Secant Method

function [x,term]=secant_1d(f,x0,x1,varargin);
eps1 = 1e-8;
eps2 = 1e-8;
y0   = feval(f,x0,varargin{:});
y1   = feval(f,x1,varargin{:});
if y0*y1>0;
   error('x0 and x1 must be such that f(x0)f(x1)<0');
end
err  = 1;
while err>0;
   d    = -y1*(x1-x0)/(y1-y0);
   x    = x1+d;
   y    = feval(f,x,varargin{:});
   err  = abs(x-x1)-eps1*(1+abs(x));
   ferr = abs(y);
   x0   = x1;
   x1   = x;
   y0   = y1;
   y1   = y;
end
if ferr<eps2;
   term = 1;
else
   term = 0;
end

Table 5.5 reports the progression of the algorithm starting from $x_0 = 0.5$ and $x_1 = 1.5$ for our showcase function. This method suffers the same convergence problems as Newton's method, but may be faster, as it does not involve the computation of first order derivatives.

Table 5.5: Secant progression

iteration   x_k        error
1           1.259230   2.407697e-001
2           0.724297   5.349331e-001
3           1.049332   3.250351e-001
4           0.985310   6.402206e-002
5           0.955269   3.004141e-002
6           0.958631   3.362012e-003
7           0.958517   1.138899e-004
8           0.958516   4.860818e-007
9           0.958516   7.277312e-011

5.2 Multidimensional systems

We now consider a system of n equations

$$\begin{array}{rcl}
f_1(x_1,\ldots,x_n) &=& 0\\
&\vdots&\\
f_n(x_1,\ldots,x_n) &=& 0
\end{array}$$

that we would like to solve for the vector $x = (x_1,\ldots,x_n)$. This is a standard problem in economics, arising as soon as we want to find a general equilibrium, a Nash equilibrium, or the steady state of a dynamic system. We now present two methods to achieve this task, which turn out to be the extensions of the Newton and secant methods to higher dimensions.

5.2.1 The Newton's method

As in the one-dimensional case, a simple way to understand the underlying idea of Newton's method is to go back to the Taylor expansion of the multidimensional function F
$$F(x^\star) \simeq F(x_k) + \nabla F(x_k)(x^\star - x_k)$$
but since $x^\star$ is a zero of the function, we have
$$F(x_k) + \nabla F(x_k)(x^\star - x_k) = 0$$
which, denoting by $x_{k+1}$ the new guess one may formulate for a candidate solution, yields the recursive scheme
$$x_{k+1} = x_k + \delta_k \quad\text{where}\quad \delta_k = -(\nabla F(x_k))^{-1}F(x_k)$$
such that the algorithm then works as follows.
1. Assign an initial value, $x_k$, k = 0, to the vector x and a vector of termination criteria $(\varepsilon_1,\varepsilon_2) > 0$
2. Compute $F(x_k)$ and the associated jacobian matrix $\nabla F(x_k)$
3. Solve the linear system $\nabla F(x_k)\delta_k = -F(x_k)$
4. Compute $x_{k+1} = x_k + \delta_k$
5. If $\|x_{k+1}-x_k\| < \varepsilon_1(1+\|x_{k+1}\|)$ then go to 6, else go back to 2
6. If $\|F(x_{k+1})\| < \varepsilon_2$ then stop and set $x^\star = x_{k+1}$; else report failure.

All comments previously stated in the one-dimensional case apply to this higher dimensional method.
Matlab Code: Newton's Method

function [x,term]=newton(f,x0,varargin);
%
% function x=newton(f,x0,P1,P2,...);
%
% f      : function for which we want to find a zero
% x0     : initial condition for x
% P1,... : parameters of the function
%
% x      : solution
% term   : Termination status (1->OK, 0-> Failure)
%
eps1 = 1e-8;
eps2 = 1e-8;
x0   = x0(:);
y0   = feval(f,x0,varargin{:});
n    = size(x0,1);
dev  = diag(.00001*max(abs(x0),1e-8*ones(n,1)));
err  = 1;
while err>0;
   dy0 = zeros(n,n);
   for i= 1:n;
      f0       = feval(f,x0+dev(:,i),varargin{:});
      f1       = feval(f,x0-dev(:,i),varargin{:});
      dy0(:,i) = (f0-f1)/(2*dev(i,i));
   end
   if det(dy0)==0;
      error('Algorithm stuck at a local optimum')
   end
   d0   = -dy0\y0;                 % solve the linear system for the step
   x    = x0+d0;
   y    = feval(f,x,varargin{:});
   tmp  = sqrt((x-x0)'*(x-x0));
   err  = tmp-eps1*(1+sqrt(x'*x));
   ferr = sqrt(y'*y);
   x0   = x;
   y0   = y;
end
if ferr<eps2;
   term = 1;
else
   term = 0;
end

As in the one-dimensional case, it may be useful to consider a damped method, in which the step is divided by $2^j$, where j is computed as
$$j = \min\left\{i \,:\, 0\leqslant i\leqslant i_{max},\ \left\|F\left(x_k + \frac{\delta_k}{2^i}\right)\right\|_2^2 < \|F(x_k)\|_2^2\right\}$$
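The corresponding backtracking loop is a one-line extension of the scalar case. The fragment below is a minimal sketch meant to be placed inside the Newton routine above (d0 and y0 are the step and residual computed there; imax = 4 is an illustrative choice):

% Minimal damping sketch: halve the step until the residual norm decreases
imax = 4;                       % illustrative maximal damping exponent
j    = 0;
while (j<imax) & (norm(feval(f,x0+d0/2^j,varargin{:}))^2 >= norm(y0)^2);
   j = j+1;
end
if norm(feval(f,x0+d0/2^j,varargin{:}))^2 >= norm(y0)^2; j = 0; end
x = x0+d0/2^j;                  % damped update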

5.2.2 The secant method

As in the one-dimensional case, the main gain from using the secant method is to avoid the computation of the jacobian matrix. Here, we will see a method developed by Broyden which essentially amounts to defining an $\mathbb{R}^n$ version of the secant method. The idea is to replace the jacobian matrix $\nabla F(x_k)$ by a matrix $S_k$ at iteration k, which serves as a guess for the jacobian. Therefore the step, $\delta_k$, now solves
$$S_k\delta_k = -F(x_k)$$
to get
$$x_{k+1} = x_k + \delta_k$$
The remaining point to elucidate is how to compute $S_{k+1}$. The idea is actually simple as soon as we remember that $S_k$ should be an approximate jacobian matrix; in other words, it should not be far away from the secant and should therefore solve
$$S_{k+1}\delta_k = F(x_{k+1}) - F(x_k)$$
This amounts to stating that we are able to compute the predicted change in F(.) for the specific direction $\delta_k$, but have no information for any other direction. Broyden's idea is to impose that the predicted changes in F(.) in directions orthogonal to $\delta_k$ under the new guess for the jacobian, $S_{k+1}$, are the same as under the old one:
$$S_{k+1}z = S_k z \quad\text{for } z'\delta_k = 0$$
This yields the following updating scheme:
$$S_{k+1} = S_k + \frac{(\Delta F_k - S_k\delta_k)\delta_k'}{\delta_k'\delta_k}$$
where $\Delta F_k = F(x_{k+1}) - F(x_k)$. Then the algorithm writes as


1. Assign an initial value, $x_k$, k = 0, to the vector x, set $S_k = I$, k = 0, and a vector of termination criteria $(\varepsilon_1,\varepsilon_2) > 0$
2. Compute $F(x_k)$
3. Solve the linear system $S_k\delta_k = -F(x_k)$
4. Compute $x_{k+1} = x_k + \delta_k$, and $\Delta F_k = F(x_{k+1}) - F(x_k)$
5. Update the jacobian guess by
$$S_{k+1} = S_k + \frac{(\Delta F_k - S_k\delta_k)\delta_k'}{\delta_k'\delta_k}$$
6. If $\|x_{k+1}-x_k\| < \varepsilon_1(1+\|x_{k+1}\|)$ then go to 7, else go back to 2
7. If $\|F(x_{k+1})\| < \varepsilon_2$ then stop and set $x^\star = x_{k+1}$; else report failure.
The convergence properties of Broyden's method are a bit inferior to those of Newton's. Nevertheless, this method may be worth trying in large systems, as it can be less costly since it does not involve the computation of the jacobian matrix. Further, when dealing with highly nonlinear problems, the jacobian can change drastically, such that the secant approximation may be particularly poor.
Matlab Code: Broyden's Method

function [x,term]=Broyden(f,x0,varargin);
%
% function x=Broyden(f,x0,P1,P2,...);
%
% f      : function for which we want to find a zero
% x0     : initial condition for x
% P1,... : parameters of the function
%
% x      : solution
% term   : Termination status (1->OK, 0-> Failure)
%
eps1 = 1e-8;
eps2 = 1e-8;
x0   = x0(:);
y0   = feval(f,x0,varargin{:});
S    = eye(size(x0,1));
err  = 1;
while err>0;
   d    = -S\y0;
   x    = x0+d;
   y    = feval(f,x,varargin{:});
   S    = S+((y-y0)-S*d)*d'/(d'*d);   % Broyden update of the jacobian guess
   tmp  = sqrt((x-x0)'*(x-x0));
   err  = tmp-eps1*(1+sqrt(x'*x));
   ferr = sqrt(y'*y);
   x0   = x;
   y0   = y;
end
if ferr<eps2;
   term = 1;
else
   term = 0;
end

5.2.3 Final considerations

All these methods are numerical, such that we need a computer to find a solution. Never forget that a computer can only deal with numbers with a given accuracy, hence these methods will be more or less efficient depending on the scale of the system. For instance, imagine we deal with a system for which the numbers are close to zero (let's think of a model that is close to the no-trade theorem); then the computer will have a hard time trying to deal with numbers close to machine precision. It may then be a good idea to rescale the system.
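As a trivial illustration of rescaling, suppose the root of f is of order 1e-8; solving for a rescaled unknown of order one is numerically much friendlier. The function and the scale below are made up for the example:

% Hypothetical example: the root of f is around 1e-8
f  = @(x) x - 1e-8;
% Rescale the unknown: x = 1e-8*z, and solve g(z)=0 for z of order one
g  = @(z) 1e8*f(1e-8*z);   % g(z) = z - 1, a well-scaled problem
z0 = fzero(g,0.5);         % z0 = 1
x0 = 1e-8*z0;              % recover the solution in the original scale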
Another important feature of all these methods is that they implicitly rely on a linear approximation of the function we want to solve. Therefore, the system should not be too nonlinear. For instance, assume you want to find the solution to the equation
$$\frac{c^{-\sigma}}{(\bar{c}-c)^{-\sigma}} = p$$
where p is given. Then there is a great advantage to first rewrite the equation as
$$c^{-\sigma} = p(\bar{c}-c)^{-\sigma}$$
and then as
$$c = p^{-1/\sigma}(\bar{c}-c)$$
such that the system is more linear. Such transformations often turn out to be extremely useful.
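To see why the transformed equation is friendlier, one can compare a solver's behavior on both forms. The numbers below (sigma = 2, cbar = 2, p = 1) are purely illustrative assumptions:

% Illustrative comparison: same root, two formulations
sigma = 2; cbar = 2; p = 1;
f1 = @(c) c.^(-sigma)./(cbar-c).^(-sigma) - p;  % original, very nonlinear
f2 = @(c) c - p^(-1/sigma)*(cbar-c);            % transformed, almost linear
c1 = fzero(f1,0.5);   % both return c = 1 here, but f2 is much
c2 = fzero(f2,0.5);   % better behaved far away from the solution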

Finally, in a number of cases we know the solution of a simpler system, and we may use a continuation approach to the problem. For instance, assume we want to solve a system F(x) = 0 which is particularly complicated, and we know the solution to the system G(x) = 0 to be particularly simple. We may then restate the problem of solving F(x) = 0 as solving
$$\lambda G(x) + (1-\lambda)F(x) = 0$$
with $\lambda\in[0;1]$. We first start with $\lambda = 1$ to get a first solution $x_0$, and then take it as an initial condition to solve the system for $\lambda_1 = 1-\varepsilon$, $\varepsilon > 0$ and small, to get $x_1$. This new solution is then used as an initial guess for the problem with $\lambda_2 < \lambda_1$. This process is repeated until we get the solution for $\lambda = 0$. This may seem quite a long process, but for complicated problems it may actually save a lot of time compared to spending hours finding a good initial value for the algorithm. Judd [1998] (chapter 5) reports more sophisticated continuation methods, known as homotopy methods, that have proven to be particularly powerful.
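A minimal continuation loop might look as follows; the handles F and G, the starting point x0G and the step size are placeholders for the example, and the solver is the Newton routine given above:

% Minimal continuation sketch: deform G(x)=0 into F(x)=0
% F, G are assumed to return column vectors; newton is the solver above
H = @(x,lambda) lambda*G(x) + (1-lambda)*F(x);
x = x0G;                              % known solution of G(x)=0
for lambda = 1:-0.1:0                 % illustrative step of 0.1
   x = newton(@(z) H(z,lambda),x);    % previous solution as initial guess
end
% at the end of the loop, x solves F(x)=0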


Bibliography

Judd, K.L., Numerical methods in economics, Cambridge, Massachusetts: MIT Press, 1998.



Contents

5 Solving nonlinear systems of equations
  5.1 Solving one dimensional problems
      5.1.1 General iteration procedures
      5.1.2 Bisection method
      5.1.3 Newton's method
      5.1.4 Secant methods (or Regula falsi)
  5.2 Multidimensional systems
      5.2.1 The Newton's method
      5.2.2 The secant method
      5.2.3 Final considerations

List of Figures

5.1 Non-converging iterative scheme
5.2 The Bisection algorithm
5.3 The Newton's algorithm
5.4 Pathological Cycling behavior

List of Tables

5.1 Divergence in simple iterative procedure
5.2 Convergence in modified iterative procedure
5.3 Bisection progression
5.4 Newton progression
5.5 Secant progression

Lecture Notes 6

Perturbation methods
In these lecture notes, we will study the socalled perturbation method, a
class of method the linear approximation belongs to. The basic idea of this
approach is to include higher order terms in the approximation, and therefore
take both curvature and risk into account therefore accounting for the two
first points we raised in lecture notes 2. For further insights on these methods
see Judd [1998], Judd and Gaspard [1997], Sims [2000] or Schmitt-Grohe and
Uribe [2001].

6.1 The general method

This presentation is mainly based on the paper by Schmitt-Grohe and Uribe [2001], who present the perturbation method at order 2.

6.1.1 Model representation

The set of equilibrium conditions of a wide variety of RE models takes the form
$$E_t F(y_{t+1}, y_t, x_{t+1}, x_t) = 0$$
where $F : \mathbb{R}^{n_y}\times\mathbb{R}^{n_y}\times\mathbb{R}^{n_x}\times\mathbb{R}^{n_x}\to\mathbb{R}^n$. The state vector $x_t$ is of size $(n_x\times 1)$ and the costate vector $y_t$ is of size $(n_y\times 1)$. The total number of variables is therefore given by $n = n_x + n_y$. We assume that the state vector may be partitioned as $x_t = [x_t^1; x_t^2]$, where $x^1$ consists of endogenous state variables, whereas $x^2$ consists of exogenous state variables. In order to simplify, let us assume that $x^2$ follows the process
$$x^2_{t+1} = M x^2_t + \tilde{\eta}\sigma\varepsilon_{t+1}$$
where $\varepsilon_{t+1}$ is $(n_\varepsilon\times 1)$ and is distributed as a $\mathscr{N}(0, I)$. All eigenvalues of M are assumed to have modulus less than one.
The solution to this model is of the form:
$$y_t = g(x_t,\sigma) \qquad (6.1)$$
$$x_{t+1} = h(x_t,\sigma) + \eta\sigma\varepsilon_{t+1} \quad\text{with}\quad \eta = \begin{bmatrix}0\\ \tilde{\eta}\end{bmatrix} \qquad (6.2)$$
where g maps $\mathbb{R}^{n_x}\times\mathbb{R}^+$ into $\mathbb{R}^{n_y}$ and h maps $\mathbb{R}^{n_x}\times\mathbb{R}^+$ into $\mathbb{R}^{n_x}$. These are the true solutions of the model, such that the model may be rewritten as
$$E_t F(g(x_{t+1},\sigma), g(x_t,\sigma), h(x_t,\sigma)+\eta\sigma\varepsilon_{t+1}, x_t) = 0$$
$$E_t F(g(h(x_t,\sigma)+\eta\sigma\varepsilon_{t+1},\sigma), g(x_t,\sigma), h(x_t,\sigma)+\eta\sigma\varepsilon_{t+1}, x_t) = 0$$
Et F (g (h(xt , ) + t+1 , ) , g(xt , ), h(xt , ) + t+1 , xt ) = 0
In order to save on notations, we will now get rid of time subscripts, and a prime will stand for t+1, such that the model rewrites
$$E_t F\big(g(h(x,\sigma)+\eta\sigma\varepsilon',\sigma),\ g(x,\sigma),\ h(x,\sigma)+\eta\sigma\varepsilon',\ x\big) = 0$$
or
$$E_t\mathscr{F}(x,\sigma) = 0$$

6.1.2 The Method

Since we can compute exactly neither g(.) nor h(.), we shall compute an approximation. But, since we want to take risk and curvature into account, we will take a higher order approximation for g(.) and h(.). To keep things simple (and you will see that they are already quite cumbersome), we will restrict ourselves to the second-order approximation, that is
$$E_t\mathscr{F}^i(x,\sigma) \simeq E_t\left[\mathscr{F}^i(x^\star,0) + \mathscr{F}^i_x(x^\star,0)(x-x^\star) + \mathscr{F}^i_\sigma(x^\star,0)\sigma + \frac12\big((x-x^\star)',\sigma\big)H_{\mathscr{F}^i}\begin{pmatrix}x-x^\star\\ \sigma\end{pmatrix}\right] = 0$$
for all $i = 1,\ldots,n$. Note that we will treat the approximation as exact, such that this is the approximation that will be set to 0. Therefore, for the model to be solved, we need all its derivatives, up to the order of approximation, to be 0 at $(x^\star,0)$.
First order approximation: Taking a first order approximation exactly corresponds to what we did at the very beginning of the chapter. Namely, we use the approximation
$$E_t\left[\mathscr{F}(x^\star,0) + \mathscr{F}_x(x^\star,0)(x-x^\star) + \mathscr{F}_\sigma(x^\star,0)\sigma\right] = 0$$
and we want to set
$$\mathscr{F}^i(x^\star,0) = 0,\quad \mathscr{F}^i_x(x^\star,0) = 0,\quad \mathscr{F}^i_\sigma(x^\star,0) = 0$$
for all $i = 1,\ldots,n$. The objects we are looking for are however somewhat different, since we look for something like
$$g(x,\sigma) = g(x^\star,0) + g_x(x^\star,0)(x-x^\star) + g_\sigma(x^\star,0)\sigma$$
$$h(x,\sigma) = h(x^\star,0) + h_x(x^\star,0)(x-x^\star) + h_\sigma(x^\star,0)\sigma$$
Our problem is then to determine $g(x^\star,0)$, $g_x(x^\star,0)$, $g_\sigma(x^\star,0)$, $h(x^\star,0)$, $h_x(x^\star,0)$ and $h_\sigma(x^\star,0)$. But we already know at least two quantities, since by definition
$$y^\star = g(x^\star,0) \quad\text{and}\quad x^\star = h(x^\star,0)$$

Taking the second condition seriously, we have to solve
$$[\mathscr{F}_x(x^\star,0)]^i_j = [F_{y'}]^i_\alpha[g_x]^\alpha_\beta[h_x]^\beta_j + [F_y]^i_\alpha[g_x]^\alpha_j + [F_{x'}]^i_\beta[h_x]^\beta_j + [F_x]^i_j = 0 \qquad (6.3)$$
with $i = 1,\ldots,n$, $\alpha = 1,\ldots,n_y$, and $\beta, j = 1,\ldots,n_x$. Implicit in this formulation is the fact that we are using tensor notation. Writing the Taylor expansion of a multidimensional system is very demanding on notation. We therefore adopt the following representation for partial derivatives, loosely adapted from the tensor notation advocated by Judd [1998]:
$$[F_x]^i_j = \frac{\partial F^i}{\partial x^j}$$
The same index in both superscript and subscript positions indicates a summation. For example,
$$[F_{x'}]^i_\beta[h_x]^\beta_j = \sum_\beta\frac{\partial F^i}{\partial x'^\beta}\frac{\partial h^\beta}{\partial x^j}$$
Equation (6.3) defines a $(n_y+n_x)\times n_x$ system of equations, sufficient to determine the $n_y\times n_x$ partial derivatives of y with respect to x and the $n_x\times n_x$ partial derivatives of $x'$ with respect to x. This system of equations is a matrix polynomial equation and is solved under the requirement that the system returns asymptotically to its equilibrium in the absence of future shocks (transversality condition). In other words, we are back to a linear RE model of the kind we solved in the first series of the lectures, such that we may use one of the methods we already saw to get $h_x(.)$ and $g_x(.)$.
Similarly, $g_\sigma(.)$ and $h_\sigma(.)$ can be obtained imposing $\mathscr{F}_\sigma(x^\star,0) = 0$. This amounts to solving the system
$$[\mathscr{F}_\sigma(x^\star,0)]^i = E_t\Big\{[F_{y'}]^i_\alpha[g_x]^\alpha_\beta[h_\sigma]^\beta + [F_{y'}]^i_\alpha[g_x]^\alpha_\beta[\eta]^\beta_\phi[\varepsilon']^\phi + [F_{y'}]^i_\alpha[g_\sigma]^\alpha + [F_y]^i_\alpha[g_\sigma]^\alpha + [F_{x'}]^i_\beta[h_\sigma]^\beta + [F_{x'}]^i_\beta[\eta]^\beta_\phi[\varepsilon']^\phi\Big\} = 0$$
Note that this equation involves expectations of $\varepsilon_{t+1}$, which are identically equal to 0, such that the system reduces to
$$[F_{y'}]^i_\alpha[g_x]^\alpha_\beta[h_\sigma]^\beta + [F_{y'}]^i_\alpha[g_\sigma]^\alpha + [F_y]^i_\alpha[g_\sigma]^\alpha + [F_{x'}]^i_\beta[h_\sigma]^\beta = 0 \qquad (6.4)$$
with $i = 1,\ldots,n$, $\alpha = 1,\ldots,n_y$, and $\beta = 1,\ldots,n_x$. But as may be noticed, this equation is homogenous in $g_\sigma$ and $h_\sigma$, such that if a unique solution exists, it has to be the case that
$$g_\sigma(x^\star,0) = 0 \quad\text{and}\quad h_\sigma(x^\star,0) = 0$$
In other words, risk does not matter and we are back to the certainty equivalence property. Now we will expand the solution up to the second order.
Second order approximation: We now want to obtain an approximation for each $g^i(.)$, $i = 1,\ldots,n_y$ and $h^j(.)$, $j = 1,\ldots,n_x$ of the form
$$g^i(x,\sigma) = g^i(x^\star,0) + g^i_x(x^\star,0)(x-x^\star) + g^i_\sigma(x^\star,0)\sigma + \frac12\big((x-x^\star)',\sigma\big)H^i_g\begin{pmatrix}x-x^\star\\ \sigma\end{pmatrix}$$
$$h^j(x,\sigma) = h^j(x^\star,0) + h^j_x(x^\star,0)(x-x^\star) + h^j_\sigma(x^\star,0)\sigma + \frac12\big((x-x^\star)',\sigma\big)H^j_h\begin{pmatrix}x-x^\star\\ \sigma\end{pmatrix}$$
We have already solved the first order problem; we now need to reveal information on
$$H^i_g = \begin{pmatrix}g^i_{xx}(x^\star,0) & g^i_{x\sigma}(x^\star,0)\\ g^i_{\sigma x}(x^\star,0) & g^i_{\sigma\sigma}(x^\star,0)\end{pmatrix} \quad\text{and}\quad H^j_h = \begin{pmatrix}h^j_{xx}(x^\star,0) & h^j_{x\sigma}(x^\star,0)\\ h^j_{\sigma x}(x^\star,0) & h^j_{\sigma\sigma}(x^\star,0)\end{pmatrix}$$
To do so, we only need to impose
$$E_t\mathscr{F}_{xx}(x^\star,0) = 0,\quad E_t\mathscr{F}_{x\sigma}(x^\star,0) = 0,\quad E_t\mathscr{F}_{\sigma x}(x^\star,0) = 0,\quad E_t\mathscr{F}_{\sigma\sigma}(x^\star,0) = 0$$

Let us start with $g_{xx}(x^\star,0)$ and $h_{xx}(x^\star,0)$; these matrices may be identified imposing
$$E_t\mathscr{F}_{xx}(x^\star,0) = 0$$
But before going to this point, we need to establish further notations. Like previously, we will rely on the tensor notations, and will adopt the following conventions:
$$[F_{xx}]^i_{jk} = \frac{\partial^2 F^i}{\partial x^j\partial x^k}, \qquad [F_{xy}]^i_{jk} = \frac{\partial^2 F^i}{\partial x^j\partial y^k}.$$
The same index in both superscript and subscript positions indicates a summation. For example,
$$[F_{xx}]^i_{\beta\delta}[h_x]^\beta_j[h_x]^\delta_k = \sum_\beta\sum_\delta\frac{\partial^2 F^i}{\partial x^\beta\partial x^\delta}\frac{\partial h^\beta}{\partial x^j}\frac{\partial h^\delta}{\partial x^k}.$$
Let us work it out now:
$$\begin{aligned}
[\mathscr{F}_{xx}(x^\star,0)]^i_{jk} =\ & \Big([F_{y'y'}]^i_{\alpha\gamma}[g_x]^\gamma_\delta[h_x]^\delta_k + [F_{y'y}]^i_{\alpha\gamma}[g_x]^\gamma_k + [F_{y'x'}]^i_{\alpha\delta}[h_x]^\delta_k + [F_{y'x}]^i_{\alpha k}\Big)[g_x]^\alpha_\beta[h_x]^\beta_j\\
&+ [F_{y'}]^i_\alpha\Big([g_{xx}]^\alpha_{\beta\delta}[h_x]^\delta_k[h_x]^\beta_j + [g_x]^\alpha_\beta[h_{xx}]^\beta_{jk}\Big)\\
&+ \Big([F_{yy'}]^i_{\alpha\gamma}[g_x]^\gamma_\delta[h_x]^\delta_k + [F_{yy}]^i_{\alpha\gamma}[g_x]^\gamma_k + [F_{yx'}]^i_{\alpha\delta}[h_x]^\delta_k + [F_{yx}]^i_{\alpha k}\Big)[g_x]^\alpha_j\\
&+ \Big([F_{x'y'}]^i_{\beta\gamma}[g_x]^\gamma_\delta[h_x]^\delta_k + [F_{x'y}]^i_{\beta\gamma}[g_x]^\gamma_k + [F_{x'x'}]^i_{\beta\delta}[h_x]^\delta_k + [F_{x'x}]^i_{\beta k}\Big)[h_x]^\beta_j\\
&+ [F_{xy'}]^i_{j\gamma}[g_x]^\gamma_\delta[h_x]^\delta_k + [F_{xy}]^i_{j\gamma}[g_x]^\gamma_k + [F_{xx'}]^i_{j\delta}[h_x]^\delta_k + [F_{xx}]^i_{jk}\\
&+ [F_y]^i_\alpha[g_{xx}]^\alpha_{jk} + [F_{x'}]^i_\beta[h_{xx}]^\beta_{jk} = 0
\end{aligned}$$
for $i = 1,\ldots,n$, $\alpha,\gamma = 1,\ldots,n_y$ and $j,k,\beta,\delta = 1,\ldots,n_x$. Although it looks quite complicated, this system is just a big linear system that we should solve for $g_{xx}(x^\star,0)$ and $h_{xx}(x^\star,0)$, as all the first order derivatives are perfectly known.

Likewise, for $g_{\sigma\sigma}(x^\star,0)$ and $h_{\sigma\sigma}(x^\star,0)$, we just need to impose
$$E_t\mathscr{F}_{\sigma\sigma}(x^\star,0) = 0$$
which amounts to
$$\begin{aligned}
[E_t\mathscr{F}_{\sigma\sigma}(x^\star,0)]^i =\ & [F_{y'y'}]^i_{\alpha\gamma}[g_x]^\gamma_\delta[\eta]^\delta_\xi[g_x]^\alpha_\beta[\eta]^\beta_\phi[I]^\phi_\xi\\
&+ [F_{y'}]^i_\alpha\Big([g_x]^\alpha_\beta[h_{\sigma\sigma}]^\beta + [g_{xx}]^\alpha_{\beta\delta}[\eta]^\delta_\xi[\eta]^\beta_\phi[I]^\phi_\xi + [g_{\sigma\sigma}]^\alpha\Big)\\
&+ [F_{y'x'}]^i_{\alpha\delta}[\eta]^\delta_\xi[g_x]^\alpha_\beta[\eta]^\beta_\phi[I]^\phi_\xi + [F_y]^i_\alpha[g_{\sigma\sigma}]^\alpha + [F_{x'}]^i_\beta[h_{\sigma\sigma}]^\beta\\
&+ [F_{x'y'}]^i_{\beta\gamma}[g_x]^\gamma_\delta[\eta]^\delta_\xi[\eta]^\beta_\phi[I]^\phi_\xi + [F_{x'x'}]^i_{\beta\delta}[\eta]^\delta_\xi[\eta]^\beta_\phi[I]^\phi_\xi = 0
\end{aligned}$$
for $i = 1,\ldots,n$, $\alpha,\gamma = 1,\ldots,n_y$, $\beta,\delta = 1,\ldots,n_x$ and $\phi,\xi = 1,\ldots,n_\varepsilon$. Like the previous system, although it looks quite complicated, this is just a linear system that has to be solved for $g_{\sigma\sigma}(x^\star,0)$ and $h_{\sigma\sigma}(x^\star,0)$, as all the first order derivatives are perfectly known.
Finally, we can easily show that $g_{x\sigma}(x^\star,0)$ and $h_{x\sigma}(x^\star,0)$ are both equal to zero. Indeed, we have
$$[E_t\mathscr{F}_{x\sigma}(x^\star,0)]^i_j = [F_{y'}]^i_\alpha\Big([g_x]^\alpha_\beta[h_{x\sigma}]^\beta_j + [g_{x\sigma}]^\alpha_\beta[h_x]^\beta_j\Big) + [F_y]^i_\alpha[g_{x\sigma}]^\alpha_j + [F_{x'}]^i_\beta[h_{x\sigma}]^\beta_j = 0$$
Since this system is homogenous in the unknowns $g_{x\sigma}(x^\star,0)$ and $h_{x\sigma}(x^\star,0)$, if a solution exists, it is zero. Hence
$$g_{x\sigma}(x^\star,0) = 0 \quad\text{and}\quad h_{x\sigma}(x^\star,0) = 0$$
Obviously, this solution cannot be obtained analytically, and a numerical package like Gauss or Matlab is needed to implement this method. Nevertheless, in order to give you an idea of how we can implement it, we will apply it to the asset pricing model developed in lecture notes 2 and to the optimal growth model.

6.2 Implementing the method

6.2.1 The asset pricing model

We now discuss the implementation of the perturbation method in the simple asset pricing model of Burnside [1998], and describe how this method can take advantage of the information carried by the higher moments of the distribution of the shocks. As a first step, let us briefly sketch the model again.
The model economy: We consider a frictionless pure exchange economy à la Mehra and Prescott [1985] and Rietz [1988], with a single household and a unique perishable consumption good produced by a single tree. The household can hold equity shares to transfer wealth from one period to another. The problem of a single agent is then to choose consumption and equity holdings to maximize her expected discounted stream of utility, given by
$$E_t\sum_{\tau=0}^\infty\beta^\tau\frac{c_{t+\tau}^\theta}{\theta} \quad\text{with}\quad \theta\in(-\infty,0)\cup(0,1] \qquad (6.5)$$
subject to the budget constraint
$$p_t e_{t+1} + c_t = (p_t + d_t)e_t \qquad (6.6)$$
where $\beta\in(0,1)$ is the agent's subjective discount factor, $c_t$ is the household's consumption of a single perishable good at date t, $p_t$ denotes the price of the equity in period t and $e_t$ is the household's equity holdings in period t. Finally, $d_t$ is the tree's dividend in period t. Dividends are assumed to grow at rate $x_t$, such that
$$d_t = \exp(x_t)d_{t-1} \qquad (6.7)$$
where $x_t$, the rate of growth of dividends, is assumed to be a Gaussian stationary AR(1) process
$$x_t = (1-\rho)\bar{x} + \rho x_{t-1} + \varepsilon_t \qquad (6.8)$$

where $\varepsilon_t$ is i.i.d. $\mathscr{N}(0,\sigma^2)$ and $|\rho| < 1$. Market clearing requires that $e_t = 1$, so that $c_t = d_t$ in equilibrium. Like in Burnside [1998], let $y_t$ denote the price-dividend ratio, $y_t = p_t/d_t$. Then, the optimality condition for the household's problem can be shown to rewrite as
$$y_t = \beta E_t\left[\exp(\theta x_{t+1})(1+y_{t+1})\right] \qquad (6.9)$$
Burnside [1998] shows that the above equation admits an exact solution of the form¹
$$y_t = \sum_{i=1}^\infty\beta^i\exp\left[a_i + b_i(x_t-\bar{x})\right] \qquad (6.10)$$
where
$$a_i = \theta\bar{x}i + \frac{\theta^2\sigma^2}{2(1-\rho)^2}\left[i - \frac{2\rho(1-\rho^i)}{1-\rho} + \frac{\rho^2(1-\rho^{2i})}{1-\rho^2}\right]$$
and
$$b_i = \frac{\theta\rho(1-\rho^i)}{1-\rho}$$
As can be seen from the definition of $a_i$, the volatility of the shock directly enters the decision rule; therefore Burnside's [1998] model does not make the certainty equivalence hypothesis: risk matters for asset holdings decisions.
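Since the exact solution is an infinite sum, in practice one evaluates a truncated version of (6.10). The sketch below uses the formulas as reconstructed above and the benchmark calibration discussed later in this section (truncation at 800 terms, as in the accuracy checks); it is an illustration, not the code of these notes:

% Evaluate the truncated exact solution (6.10) of Burnside's model
theta = -1.5; betap = 0.95; rho = -0.139; sx = 0.0348; xbar = 0.0179;
nsum  = 800;                           % truncation of the infinite sum
i     = (1:nsum)';
bi    = theta*rho*(1-rho.^i)/(1-rho);
ai    = theta*xbar*i + (theta^2*sx^2/(2*(1-rho)^2)) ...
        .*(i - 2*rho*(1-rho.^i)/(1-rho) + rho^2*(1-rho.^(2*i))/(1-rho^2));
x     = xbar;                          % evaluation point
y     = sum(betap.^i.*exp(ai+bi*(x-xbar)));   % price-dividend ratio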
Finding a second-order approximation to the model: First of all, let us rewrite the two key equations that define the model:
$$y_t = \beta E_t\left[\exp(\theta x_{t+1})(1+y_{t+1})\right]$$
$$x_t = (1-\rho)\bar{x} + \rho x_{t-1} + \varepsilon_t$$
and let us re-express the model as
$$x_{t+1} = h(x_t,\sigma) \qquad (6.11)$$
$$y_t = E_t\left[f(y_{t+1}, x_{t+1})\right] \qquad (6.12)$$
where $f(y,x) = \beta\exp(\theta x)(1+y)$ and $h(x,\sigma) = \rho x + (1-\rho)\bar{x} + \sigma\varepsilon'$. In this setting, the deterministic steady state is given by
$$x^\star = \rho x^\star + (1-\rho)\bar{x} \;\Longrightarrow\; x^\star = \bar{x}$$
$$y^\star = f(y^\star, x^\star) \;\Longrightarrow\; y^\star = \frac{\beta\exp(\theta\bar{x})}{1-\beta\exp(\theta\bar{x})} \qquad (6.13)$$
The solution to (6.12) is a function g(.) such that $y_t = g(x_t,\sigma)$. Then (6.12) can be rewritten as
$$g(x_t,\sigma) = E_t\left[f(g(h(x_t,\sigma),\sigma), h(x_t,\sigma))\right] = E_t\left[F(x_t,\sigma)\right] \qquad (6.14)$$

¹See appendix ?? for a detailed exposition of the solution.

The nth order approximation is then determined by
$$\sum_{k=0}^n\frac{1}{k!}\sum_{j=0}^k\binom{k}{j}g_{k-j,j}\,\widehat{x}_t^{\,k-j}\sigma^j = E_t\left[\sum_{k=0}^n\frac{1}{k!}\sum_{j=0}^k\binom{k}{j}F_{k-j,j}\,\widehat{x}_t^{\,k-j}\sigma^j\right] = \sum_{k=0}^n\frac{1}{k!}\sum_{j=0}^k\binom{k}{j}E_t\left[F_{k-j,j}\right]\widehat{x}_t^{\,k-j}\sigma^j \qquad (6.15)$$
where $\widehat{x}_t = x_t - x^\star$, $g_{k-j,j}$ denotes $\left.\frac{\partial^k g(x_t,\sigma)}{\partial x_t^{k-j}\partial\sigma^j}\right|_{x_t=x^\star,\sigma=0}$, $F_{k-j,j}$ denotes $\left.\frac{\partial^k F(x_t,\sigma)}{\partial x_t^{k-j}\partial\sigma^j}\right|_{x_t=x^\star,\sigma=0}$, and $\mu_j = E_t\left[\varepsilon_{t+1}^j\right]$.

Identifying terms by terms, and after some tedious accounting, we obtain a system of n+1 equations:
$$\frac{1}{k!}g_k = \sum_{j=0}^{n-k}\frac{1}{(j+k)!}\binom{j+k}{j}F_{k,j}\,\mu_j \qquad\text{for } k = 0,\ldots,n \qquad (6.16)$$

It shall be clear that each $F_{k,j}$ is a function of the $g_k$, $k = 0,\ldots,n$, as in our case
$$F_{k,j} = \beta\exp(\theta\bar{x})\left[\theta^{j+k} + \sum_{\ell=0}^{j+k}\binom{j+k}{\ell}\theta^{j+k-\ell}g_\ell\right] \qquad (6.17)$$
where $x_{t+1} = h(x_t,\varepsilon_{t+1})$.
Therefore, (6.16) together with (6.17) defines a linear system that should be solved for the $g_k$, $k = 0,\ldots,n$:
$$\frac{1}{k!}g_k = \sum_{j=0}^{n-k}\frac{1}{(j+k)!}\binom{j+k}{j}\beta\exp(\theta\bar{x})\left[\theta^{j+k} + \sum_{\ell=0}^{j+k}\binom{j+k}{\ell}\theta^{j+k-\ell}g_\ell\right]\mu_j,\qquad k = 0,\ldots,n \qquad (6.18)$$
with the first equation (k = 0) pinning down the level $g_0$ and the last one (k = n) the highest order term $g_n$.
It is worth noting that the solution of this system does depend on the higher moments of the distribution of the exogenous shocks, $\mu_j$. Therefore, the properties of the decision rule (slope and curvature) will depend on these moments. But more importantly, the level of the rule will differ from the certainty equivalent solution, such that the steady state level, $y^\star$, is not given anymore by (6.13), but will be affected by the higher moments of the distribution of the shocks (essentially the volatility in our model, as the shocks are assumed to be normally distributed). Also noteworthy is that this approximation is particularly simple in our showcase economy, as the solution only depends on the exogenous variable $x_t$.

6.2.2 What do we gain?

This section checks the accuracy and evaluates the potential gains of the method presented above. As the model possesses a closed-form solution, we can directly check the accuracy of each solution we propose against this true solution. We first present the evaluation criteria that will be used to check the accuracy. We then conduct the evaluation of the approximation methods under study.
Criteria

Several criteria are considered to tackle the question of the accuracy of the different approximation methods under study. As the model admits a closed-form solution, the accuracy of the approximation method can be directly checked against the true decision rule. This is undertaken relying on the two following criteria
$$E_1 = 100\times\frac{1}{N}\sum_{t=1}^N\left|\frac{y_t-\widetilde{y}_t}{y_t}\right| \quad\text{and}\quad E_\infty = 100\times\max_t\left|\frac{y_t-\widetilde{y}_t}{y_t}\right|$$
where $y_t$ denotes the true solution for the price-dividend ratio and $\widetilde{y}_t$ is the approximation of the true solution by the method under study. $E_1$ represents the average relative error an agent makes using the approximation rather than the true solution, while $E_\infty$ is the maximal relative error made using the approximation rather than the true solution. These criteria are evaluated over the interval $x_t\in[\bar{x}-\Delta\sigma_x, \bar{x}+\Delta\sigma_x]$, where $\Delta$ is selected such that we explore 99.99% of the distribution of x.
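Computing these two criteria is a one-liner once the true and approximate rules have been evaluated on a common grid; in the sketch below, the vectors y and ytilde are assumed to be given:

% Accuracy criteria: y is the true rule, ytilde its approximation,
% both evaluated on the same grid of N points (assumed given)
relerr = abs((y-ytilde)./y);
E1     = 100*mean(relerr);   % average relative error (in %)
Einf   = 100*max(relerr);    % maximal relative error (in %)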
However, one may be more concerned (especially in finance) with the ability of the approximation method to account for the distribution of the price-dividend ratio, and therefore for the moments of this distribution. We first compute the mean of $y_t$ for different calibrations and different approximation methods. Further, we explore the stochastic properties of the innovation of $y_t$, $\nu_t = y_t - E_{t-1}(y_t)$, in order to assess the ability of each approximation method to account for the internal stochastic properties of the model. We thus report the standard deviation, skewness and kurtosis of $\nu_t$, which provide information on the ability of the model to capture the heteroskedasticity of the innovation and, more importantly, the potential nonlinearities the model can generate.²
Approximation errors

Table 6.1 reports $E_1$ and $E_\infty$ for the different approximation methods under study, in a number of different cases. Our benchmark experiment amounts to considering Mehra and Prescott's [1985] parameterization of the asset pricing model. We therefore set the mean of the rate of growth of dividends to $\bar{x}$ = 0.0179, its persistence to $\rho$ = -0.139 and the volatility of the innovations to $\sigma$ = 0.0348. These values are consistent with the properties of consumption growth in annual data from 1889 to 1979. $\theta$ was set to -1.5, the value widely used in the literature, and $\beta$ to 0.95, which is standard for annual frequency. We then investigate the implications of changes in these parameters in terms of accuracy. In particular, we study the implications of larger and lower impatience, higher volatility, larger curvature of the utility function and more persistence in the rate of growth of dividends.

²The cdf of $\nu_t$ is computed using 20000 replications of Monte-Carlo simulations of 1000 observations for $\nu_t$.

At first glance at table 6.1, it appears that the linear approximation can only accommodate situations where the economy does not experience high volatility or large persistence of the growth of dividends, or where the utility of individuals does not exhibit much curvature. This is for instance the case in the Mehra and Prescott [1985] parameterization (benchmark case), as both the average and maximal approximation errors lie around 1.5%. But, as is nowadays well-known, an increase along any one of the aforementioned dimensions yields lower accuracy of the linear approximation. For instance, increasing the volatility of the innovations of the rate of growth of dividends to $\sigma$ = 0.1 yields approximation errors of almost 12%, both on average and at the maximum, thus indicating that the approximation performs particularly badly in this case. This is even worse when the persistence of the exogenous process increases, as $\rho$ = 0.9 yields an average approximation error of about 60% and a maximal approximation error of about 225%. This is also true for increases in the curvature of the utility function (see rows 4 and 5 of table 6.1).

If we now apply higher order Taylor expansions as proposed in section ??, the gains in accuracy are particularly significant. In most of the cases under study, the approximation error is reduced to less than 0.10%. For instance, in the benchmark case, the linear approximation led to a 1.5% error, while the

Table 6.1: Accuracy check

                   Linear              Quadratic (CE)        Quadratic
               E1        E∞           E1        E∞          E1        E∞
Benchmark    1.4414    1.4774       1.4239    1.4241      0.0269    0.0642
β=0.5        0.2537    0.2944       0.2338    0.2343      0.0041    0.0087
β=0.99       2.9414    2.9765       2.9243    2.9244      0.0833    0.1737
θ=-10       23.7719   25.3774      23.1348   23.1764      4.5777    8.3880
θ=0          0.0000    0.0000       0.0000    0.0000      0.0000    0.0000
θ=0.5        0.2865    0.2904       0.2845    0.2845      0.0016    0.0038
σ=0.001      0.0012    0.0012       0.0012    0.0012      0.0000    0.0000
σ=0.1       11.8200   12.1078      11.6901   11.6933      1.2835    2.2265
ρ=0          1.8469    1.8469       1.8469    1.8469      0.0329    0.0329
ρ=0.5        5.9148    8.2770       4.9136    5.2081      0.7100    1.5640
ρ=0.9       57.5112  226.2032      31.8128  146.6219     36.8337  193.1591

Note: The series defining the true solution was truncated after 800 terms, as no significant improvement was found adding additional terms at machine accuracy. When exploring variations in ρ, the overall volatility of the rate of growth of dividends was maintained at its benchmark level.

quadratic approximation leads to a less than 0.1% error. This error is even lower for low volatility economies, and is essentially zero in the case of σ = 0.001. Even for higher volatility, the gains from applying a quadratic approximation can be substantial, as the case σ = 0.1 shows: in effect, the average error drops from 12% in the linear case to 1.3% in the quadratic case. The gain is also important for low degrees of intertemporal substitution, as the error is less than 5% for θ = -10, compared to the 25% previously obtained.
The gains seem to be impressive, but something still remains to be understood: where does this gain come from?
Figure 6.1 sheds light on these results. We consider a rather extreme situation where θ = -5, ρ = -0.5 and the volatility of the shock is preserved. In this graph, "Linear" corresponds to the linear approximation of the true decision rule, "quadratic (CE)" is the quadratic approximation that ignores the correction introduced by the volatility, and which can therefore be qualified as a certainty equivalent quadratic approximation, and "quadratic" takes care of the correction.
As can be seen from figure 6.1, the major gain from moving from a linear to a non-stochastic higher order Taylor expansion (CE) is found in the ability of the latter to take care of the curvature of the decision rule. But this is not a sufficient improvement as far as accuracy of the solution is concerned. This is confirmed by table 6.2, which reports the approximation errors in each case. It clearly indicates that the gain from increasing the order of approximation,
Table 6.2: Accuracy check (θ=-5, ρ=-0.5)

   Linear            Quadratic (CE)       Quadratic
  E1       E∞          E1       E∞         E1       E∞
5.9956  10.1111      4.3885   4.7762     0.7784   1.9404

without taking care of the stochastic component of the problem, is not as high as that from taking the risky component into account. Indeed, accounting for curvature,

[Figure 6.1: Decision rule. The exact solution and its approximations (Exact, O1, DO2, O2, DO4, O4) plotted against $x_t$. Note: This graph was obtained for θ = -5 and ρ = -0.5.]

i.e. moving from linear to quadratic (CE), only leads to a 27% gain in terms of average error, whereas taking risk into account permits to further enhance the accuracy by 82%, therefore yielding an 87% total gain! In other words, increasing the order of the Taylor approximation without taking into account the information carried by the moments of the distribution (that is, ignoring the stochastic dimension of the problem) does not add that much in terms of accuracy. The main improvement can be precisely found in adding the higher order moments, as it yields a correction in the level of the rule, as can be seen from figure 6.1. As soon as the volatility of the shock explicitly appears in the rule, the rule shifts upward, thus getting closer to the true solution of the problem. Then, the average (maximal) approximation error decreases to 0.8% (1.9%), to be compared to the earlier 6% (10%). Therefore, most of the approximation error essentially lies in the level of the rule rather than in its curvature, and a higher curvature in the utility function necessitates an increase in the order of the moments of the distribution that should be taken into account in the approximate solution.


There may however be situations where increasing the curvature constitutes per se a real improvement that contributes to reducing the approximation error. This is illustrated by considering the high persistence case (ρ=0.9). Indeed, as shown in table 6.1, increasing the order of the Taylor series expansion from 1 to 2 is not sufficient, per se, to solve the accuracy problem.
To sum up, it appears that increasing the order of approximation and further considering the information carried by the higher order moments of the distribution yields great improvements in the accuracy of the approximation, mainly because higher order moments enhance the ability of the approximation to match the level of the decision rule. However, problems still remain to be solved, as the errors are still large for highly persistent economies. This can be easily explained if we consider the Taylor series expansion to the true decision rule, given by
$$y_t \simeq \sum_{i=1}^\infty\beta^i\exp(a_i)\left[\sum_{k=0}^p\frac{b_i^k}{k!}(x_t-\bar{x})^k\right]$$
Table 6.3 then reports the average and maximal errors induced by the Taylor expansion of the true rule. We only report these errors for cases where the previous analysis indicated an error greater than 1% for the O2 method. Table 6.3 clearly shows that approximation errors are large (greater than 1%) precisely when a second order Taylor series expansion of the true rule is already insufficient to produce an accurate representation of the analytical solution. For instance, in the θ = -10 case, a good approximation of the true rule can be obtained only after a third order Taylor series expansion. This indicates that we should use at least a third order Taylor series expansion to increase significantly the accuracy of the approximation. This phenomenon is even more pronounced as we consider persistence. Let us focus on the case ρ = 0.9, for which approximation errors (see table 6.1) are huge (more than 15% in the O2 approximation). As can be seen from table 6.3, the second order Taylor series expansion of the true rule is very inaccurate, as it produces maximal errors around 87%.

Table 6.3: Taylor series expansion to the true solution

                           Order of Taylor series expansion
Case          Crit.      1        2        3        4       10      12
Benchmark      E1       0.01     0.00     0.00     0.00    0.00    0.00
               E∞       0.03     0.00     0.00     0.00    0.00    0.00
β=0.99         E1       0.01     0.00     0.00     0.00    0.00    0.00
               E∞       0.03     0.00     0.00     0.00    0.00    0.00
θ=-10          E1       0.49     0.02     0.00     0.00    0.00    0.00
               E∞       1.60     0.09     0.00     0.00    0.00    0.00
θ=-5           E1       0.12     0.00     0.00     0.00    0.00    0.00
               E∞       0.37     0.01     0.00     0.00    0.00    0.00
σ=0.1          E1       0.01     0.00     0.00     0.00    0.00    0.00
               E∞       0.03     0.00     0.00     0.00    0.00    0.00
ρ=0.5          E1       0.63     0.03     0.00     0.00    0.00    0.00
               E∞       2.05     0.14     0.01     0.00    0.00    0.00
ρ=0.9          E1      32.30    13.25     4.49     1.29    0.00    0.00
               E∞     154.99    87.24    37.15    12.75    0.00    0.00

An order of 12 is actually needed to generate an accurate approximation of the true rule. This hence explains the poor approximation we obtained in the ρ = 0.9 case, even using a global approach, and indicates that a much higher order corrected Taylor series expansion is required to increase accuracy. This does not invalidate the method, but underlines one of its features: large orders can be necessary to get an accurate approximation.
Thus, the approximation error results indicate that this perturbation procedure is accurate for most of the cases under study. But they also provide an explanation, and thus a solution, when the approximation is not accurate.

In this example, we only had to deal with an exogenous state variable, but it will often be the case that we have to deal with endogenous state variables, such as the capital stock. Therefore, in the next section, we deal with the optimal growth model.

6.2.3 The second order approximation to the optimal growth model
We now discuss the implementation of the perturbation method for this model. The first order optimality condition associated with this model, with a CRRA utility function and a Cobb-Douglas production function, writes as
$$E_t\left[\beta c_{t+1}^{-\sigma}\left(\alpha\exp(a_{t+1})k_{t+1}^{\alpha-1}+1-\delta\right)-c_t^{-\sigma}\right] = 0 \qquad (6.19)$$
Finally, the law of motion of capital writes as
$$k_{t+1} - \exp(a_t)k_t^\alpha + c_t - (1-\delta)k_t = 0 \qquad (6.20)$$
Therefore, the model to be solved consists of equations (6.19) and (6.20).


Solving the model then amounts to finding two functions g(.) and h(.) of the two predetermined variables $k_t$ and $a_t$ such that:
$$c_t = g(k_t, a_t)$$
$$k_{t+1} = h(k_t, a_t) = \exp(a_t)k_t^\alpha + (1-\delta)k_t - c_t$$
Since $h(k_t, a_t)$ is entirely known once we have obtained $g(k_t, a_t)$, solving the model reduces to finding $g(k_t, a_t)$ that solves:
$$E_t\left[G(c_{t+1}, k_{t+1}, a_{t+1}, c_t)\right] = 0$$
Note that $c_{t+1} = g(k_{t+1}, a_{t+1}) = g(h(k_t,a_t), a_{t+1}) = f(k_t, a_t, \varepsilon_{t+1})$, so that the latter equation rewrites as
$$E_t\left[G\big(f(k_t,a_t,\varepsilon_{t+1}),\ h(k_t,a_t),\ \rho a_t + (1-\rho)\bar{a} + \varepsilon_{t+1},\ g(k_t,a_t)\big)\right] = 0$$
or
$$E_t\left[F(k_t, a_t, \sigma)\right] = 0 \qquad (6.21)$$

We now present the perturbation method for the quadratic case. Taking the second order Taylor expansion of (6.21) around $(\bar{k},\bar{a},0)$ yields
$$\begin{aligned}
E_t\Big[&F(\bar{k},\bar{a},0) + F_k(\bar{k},\bar{a},0)\widehat{k}_t + F_a(\bar{k},\bar{a},0)\widehat{a}_t + F_\sigma(\bar{k},\bar{a},0)\sigma\\
&+ \frac12 F_{kk}(\bar{k},\bar{a},0)\widehat{k}_t^2 + \frac12 F_{aa}(\bar{k},\bar{a},0)\widehat{a}_t^2 + \frac12 F_{\sigma\sigma}(\bar{k},\bar{a},0)\sigma^2\\
&+ F_{ka}(\bar{k},\bar{a},0)\widehat{k}_t\widehat{a}_t + F_{k\sigma}(\bar{k},\bar{a},0)\widehat{k}_t\sigma + F_{a\sigma}(\bar{k},\bar{a},0)\widehat{a}_t\sigma\Big] = 0 \qquad (6.22)
\end{aligned}$$
where $\widehat{k}_t = k_t - \bar{k}$ and $\widehat{a}_t = a_t - \bar{a}$. Since $k_t$ and $a_t$ are perfectly known to the household when she takes her decisions, and since $\sigma$ is a constant, expectations can be dropped from the previous equation, such that (6.22) reduces to
$$\begin{aligned}
&F(\bar{k},\bar{a},0) + F_k(\bar{k},\bar{a},0)\widehat{k}_t + F_a(\bar{k},\bar{a},0)\widehat{a}_t + F_\sigma(\bar{k},\bar{a},0)\sigma\\
&+ \frac12 F_{kk}(\bar{k},\bar{a},0)\widehat{k}_t^2 + \frac12 F_{aa}(\bar{k},\bar{a},0)\widehat{a}_t^2 + \frac12 F_{\sigma\sigma}(\bar{k},\bar{a},0)\sigma^2\\
&+ F_{ka}(\bar{k},\bar{a},0)\widehat{k}_t\widehat{a}_t + F_{k\sigma}(\bar{k},\bar{a},0)\widehat{k}_t\sigma + F_{a\sigma}(\bar{k},\bar{a},0)\widehat{a}_t\sigma = 0 \qquad (6.23)
\end{aligned}$$

Likewise, g(.) is approximated by
$$\begin{aligned}
g(k_t,a_t,\sigma) \simeq\ & g(\bar{k},\bar{a},0) + g_k(\bar{k},\bar{a},0)\widehat{k}_t + g_a(\bar{k},\bar{a},0)\widehat{a}_t + g_\sigma(\bar{k},\bar{a},0)\sigma\\
&+ \frac12 g_{kk}(\bar{k},\bar{a},0)\widehat{k}_t^2 + \frac12 g_{aa}(\bar{k},\bar{a},0)\widehat{a}_t^2 + \frac12 g_{\sigma\sigma}(\bar{k},\bar{a},0)\sigma^2\\
&+ g_{ka}(\bar{k},\bar{a},0)\widehat{k}_t\widehat{a}_t + g_{k\sigma}(\bar{k},\bar{a},0)\widehat{k}_t\sigma + g_{a\sigma}(\bar{k},\bar{a},0)\widehat{a}_t\sigma \qquad (6.24)
\end{aligned}$$
Note however that the general approach to the problem has enabled us to establish formally that $g_\sigma(\bar{k},\bar{a},0)$, $g_{k\sigma}(\bar{k},\bar{a},0)$ and $g_{a\sigma}(\bar{k},\bar{a},0)$ should all be equal to 0, such that we just have to find $g(\bar{k},\bar{a},0)$, $g_k(\bar{k},\bar{a},0)$, $g_a(\bar{k},\bar{a},0)$, $g_{kk}(\bar{k},\bar{a},0)$, $g_{ka}(\bar{k},\bar{a},0)$, $g_{aa}(\bar{k},\bar{a},0)$ and $g_{\sigma\sigma}(\bar{k},\bar{a},0)$. Identifying each term

amounts to solving the system:³
$$F(\bar{k},\bar{a},0) = 0,\quad F_k(\bar{k},\bar{a},0) = 0,\quad F_a(\bar{k},\bar{a},0) = 0,$$
$$F_{kk}(\bar{k},\bar{a},0) = 0,\quad F_{aa}(\bar{k},\bar{a},0) = 0,\quad F_{\sigma\sigma}(\bar{k},\bar{a},0) = 0,\quad F_{ka}(\bar{k},\bar{a},0) = 0$$
The first of these equations actually defines the steady state, whereas the next two amount to solving the linearized system, as we have shown in the theoretical section. This system can be solved numerically using different packages.⁴ The following codes illustrate the implementation of the method for $\alpha$ = 0.3, $\delta$ = 0.1, $\beta$ = 0.95, $\sigma$ = 2.5, $\rho$ = 0.9 and a volatility of 0.1.

³The exact form of this system is given in appendix.
⁴Schmitt-Grohe and Uribe [2001] propose their own package; Chris Sims also has a package that you can download from his webpage: http://www.princeton.edu/ sims/. In these notes I am using a package that M. Juillard and I designed, which is based on Schmitt-Grohe and Uribe [2001] but does not require additional matlab toolboxes like the symbolic toolbox.
Matlab Code: Perturbation Method (the OGM)

clear all;
nx = 2;          % # backward variables (k,a)
ny = 1;          % # forward and static variables (c)
ne = 1;          % # shocks (a)
alpha = 0.3;     % capital elasticity of output
sigma = 2.5;     % parameter of the utility function
beta  = 0.95;    % discount factor
delta = 0.1;     % depreciation rate
ab    = 0;       % average of the log of technology shock
rhoa  = 0.9;     % persistence of the technology shock
s2    = 1;       % volatility of the shock
eta1  = 0.1^2;
ETA   = [0;eta1];
%
% Steady state
%
ksy = alpha*beta/(1-beta*(1-delta));
ysk = (1-beta*(1-delta))/(alpha*beta);
ys  = ksy^(alpha/(1-alpha));
ks  = ksy*ys;
cs  = ys*(1-delta*ksy);
%
% Solving the model
%
param = [beta sigma alpha delta rhoa ab];   % vector of parameters
xs    = [ks ab cs ks ab cs];                % deterministic steady state
[J]   = diffext('model',xs,[],param);       % Jacobian matrix of the model
[H]   = hessext('model',xs,[],param);       % Hessian matrix of the model
[Gx,Hx,Gxx,Hxx,Gss,Hss]=solve(xs,J,H,ETA,nx,ny,ne);
Gs    = 0;       % g_sigma = 0 by certainty equivalence at the first order
%
% Simulating the economy (only once)
%
long  = 120;
tronc = 100;
slong = long+tronc;
T     = tronc+1:slong;
sim   = 1;
seed  = 1;
e     = randn(ne,slong)*s2;   % draws for the innovations
S1    = zeros(nx,slong);
S2    = zeros(nx,slong);
X1    = zeros(ny,slong);
X2    = zeros(ny,slong);
S1(:,1) = ETA*e(:,1);
S2(:,1) = ETA*e(:,1);
tmp     = S2(:,1)*S2(:,1)';
X1(:,1) = Gx*S1(:,1);
X2(:,1) = X1(:,1)+0.5*Gxx*tmp(:);
for i=2:slong
   S1(:,i) = Hx*S1(:,i-1)+ETA*e(:,i);
   X1(:,i) = Gx*S1(:,i);
   S2(:,i) = S1(:,i)+0.5*Hxx*tmp(:)+0.5*Hss;
   tmp     = S2(:,i)*S2(:,i)';
   X2(:,i) = Gs+0.5*Gss*s2+X1(:,i)+0.5*Gxx*tmp(:);
   X1(:,i) = Gs+X1(:,i);
end;

Matlab Code: The Model

function eq=model(xx,param);
%
% parameters
%
beta  = param(1);
sigma = param(2);
alpha = param(3);
delta = param(4);
ra    = param(5);
ab    = param(6);
%
% variables (leads and lags)
%
kp = xx(1);   % k(t+1)
ap = xx(2);   % a(t+1)
cp = xx(3);   % c(t+1)
k  = xx(4);   % k(t)
a  = xx(5);   % a(t)
c  = xx(6);   % c(t)
%
% Input the model as we write it on a sheet of paper
%
eq = zeros(3,1);
%
% Backward variables
%
eq(1) = kp-exp(a)*k^alpha+c-(1-delta)*k;
eq(2) = ap-ra*a-(1-ra)*ab;
%
% Forward variables
%
eq(3) = c^(-sigma)-beta*(cp^(-sigma))*(alpha*exp(ap)*(kp^(alpha-1))+1-delta);

Figure 6.2 reports the relative differences in the decision rules for consumption and capital between the quadratic and the linear approximation, computed as
$$100\times\frac{g^{quad}-g^{lin}}{g^{lin}}$$
As appears clearly in the figure, taking risk into account induces a drop in the level of consumption of between 5 and 10% relative to the linear approximation. This actually just reflects the precautionary motive that is at work

[Figure 6.2: Differences between Linear and Quadratic approximations. Left panel: consumption; right panel: next period capital stock; both plotted against $k_t$ and $\exp(a_t)$.]

in this type of model. Since the agent is risk averse, she increases her savings in order to insure herself against shocks, therefore lowering her current expenditures. This also affects current investment, although to a lesser extent, therefore translating into lower capital accumulation.

6.3 Some words of caution

- The perturbation method is and remains a local approximation. Therefore, even if in practice it may handle larger shocks than the now conventional (log-)linear approximation, it cannot be used to study the implications of big structural shocks, such as a tax reform, unless this reform is marginal.

- By construction, this method requires the model to be differentiable, which forbids the analysis of models with binding constraints.

- The quadratic approximation may sometimes induce strange behaviors due to the quadratic term. In order to understand the problem, let us focus on a simplistic one dimensional model for which the quadratic approximation of the decision rule for the state variable would be
$$x_{t+1} - x^\star = \alpha_0(x_t - x^\star) + \alpha_1(x_t - x^\star)^2$$
which, setting $\widehat{x}_t = x_t - x^\star$, may be rewritten as
$$\widehat{x}_{t+1} = \alpha_0\widehat{x}_t\left(1 + \frac{\alpha_1}{\alpha_0}\widehat{x}_t\right)$$
One can recognize the so-called logistic map that lies at the core of a lot of chaotic systems. We may thus encounter cases for which the model is known to be locally stable, which can be checked taking a linear approximation, but for which the quadratic approximation leads to chaotic dynamics, or at least unintended behavior (try for example to compute the IRF of capital to a 1% shock when a second order approximation is used with σ = 33.7145!). Note that this does not totally question the relevance of the approach, but rather reveals a problem of accuracy which calls for either other methods or higher order approximations.
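The point is easy to verify numerically: iterating a linear rule and its quadratic counterpart from the same initial condition can produce very different paths. The coefficients below are hypothetical, chosen only to trigger the problematic regime:

% Illustrative comparison of a linear and a quadratic approximate rule
% (coefficients are made up; a0<1, so the linear check signals stability)
a0 = 0.95; a1 = 20; T = 10;
xl = zeros(T,1); xq = zeros(T,1);
xl(1) = 0.1; xq(1) = 0.1;              % same initial deviation
for t=1:T-1
   xl(t+1) = a0*xl(t);                 % linear rule: converges to 0
   xq(t+1) = a0*xq(t) + a1*xq(t)^2;    % quadratic rule: diverges here
end
plot(1:T,[xl xq]); legend('linear','quadratic');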


Bibliography

Burnside, C., Solving asset pricing models with Gaussian shocks, Journal of Economic Dynamics and Control, 1998, 22, 329-340.
Judd, K.L., Numerical methods in economics, Cambridge, Massachusetts: MIT Press, 1998.
Judd, K.L. and J. Gaspard, Solving Large-Scale Rational-Expectations Models, Macroeconomic Dynamics, 1997, 1, 45-75.
Mehra, R. and E.C. Prescott, The equity premium: a puzzle, Journal of Monetary Economics, 1985, 15, 145-161.
Rietz, T.A., The equity risk premium: a solution, Journal of Monetary Economics, 1988, 22, 117-131.
Schmitt-Grohe, S. and M. Uribe, Solving Dynamic General Equilibrium Models Using a Second-Order Approximation to the Policy Function, mimeo, University of Pennsylvania, 2001.
Sims, C., Second Order Accurate Solution of Discrete Time Dynamic Equilibrium Models, manuscript, Princeton University, 2000.

Contents

6 Perturbation methods
  6.1 The general method
      6.1.1 Model representation
      6.1.2 The Method
  6.2 Implementing the method
      6.2.1 The asset pricing model
      6.2.2 What do we gain?
      6.2.3 The second order approximation to the optimal growth model
  6.3 Some words of caution

List of Figures

6.1 Decision rule
6.2 Differences between Linear and Quadratic approximations

List of Tables

6.1 Accuracy check
6.2 Accuracy check (θ=-5, ρ=-0.5)
6.3 Taylor series expansion to the true solution

Lecture Notes 7

Dynamic Programming
In these notes, we will deal with a fundamental tool of dynamic macroeconomics: dynamic programming. Dynamic programming is a very convenient way of writing a large set of dynamic problems in economic analysis, as most of the properties of this tool are now well established and understood.¹ In these lectures, we will not deal with all the theoretical properties attached to this tool, but will rather give some recipes to solve economic problems using dynamic programming. In order to understand the problem, we will first deal with deterministic models, before extending the analysis to stochastic ones. However, we shall first give some preliminary definitions and theorems that justify the whole approach.

7.1 The Bellman equation and associated theorems

7.1.1 A heuristic derivation of the Bellman equation

Let us consider the case of an agent that has to decide on the path of a set of control variables, $\{y_t\}_{t=0}^\infty$, in order to maximize the discounted sum of its future payoffs, $u(y_t, x_t)$, where $x_t$ is a state variable assumed to evolve according to
$$x_{t+1} = h(x_t, y_t),\quad x_0 \text{ given.}$$
We finally make the assumption that the model is markovian.
The optimal value our agent can derive from this maximization process is given by the value function
$$V(x_t) = \max_{\{y_{t+s}\in\mathscr{D}(x_{t+s})\}_{s=0}^\infty}\sum_{s=0}^\infty\beta^s u(y_{t+s}, x_{t+s}) \qquad (7.1)$$
where $\mathscr{D}$ is the set of all feasible decisions for the variables of choice. Note that the value function is a function of the state variable only: since the model is markovian, only the past is necessary to take decisions, such that the whole path can be predicted once the state variable is observed. Therefore, the value in t is only a function of $x_t$. (7.1) may now be rewritten as

¹For a mathematical exposition of the problem see Bertsekas [1976]; for a more economic approach see Lucas, Stokey and Prescott [1989].
$$V(x_t) = \max_{y_t\in\mathscr{D}(x_t),\,\{y_{t+s}\in\mathscr{D}(x_{t+s})\}_{s=1}^\infty}\left[u(y_t,x_t) + \sum_{s=1}^\infty\beta^s u(y_{t+s},x_{t+s})\right] \qquad (7.2)$$
Making the change of variable k = s-1, (7.2) rewrites
$$V(x_t) = \max_{y_t\in\mathscr{D}(x_t),\,\{y_{t+1+k}\in\mathscr{D}(x_{t+1+k})\}_{k=0}^\infty}\left[u(y_t,x_t) + \beta\sum_{k=0}^\infty\beta^k u(y_{t+1+k},x_{t+1+k})\right]$$
or
$$V(x_t) = \max_{y_t\in\mathscr{D}(x_t)}\left[u(y_t,x_t) + \beta\max_{\{y_{t+1+k}\in\mathscr{D}(x_{t+1+k})\}_{k=0}^\infty}\sum_{k=0}^\infty\beta^k u(y_{t+1+k},x_{t+1+k})\right] \qquad (7.3)$$

Note that, by definition, we have
$$V(x_{t+1}) = \max_{\{y_{t+1+k}\in\mathscr{D}(x_{t+1+k})\}_{k=0}^\infty}\sum_{k=0}^\infty\beta^k u(y_{t+1+k},x_{t+1+k})$$
such that (7.3) rewrites as
$$V(x_t) = \max_{y_t\in\mathscr{D}(x_t)} u(y_t,x_t) + \beta V(x_{t+1}) \qquad (7.4)$$
This is the so-called Bellman equation that lies at the core of the dynamic programming theory. With this equation are associated, in each and every period t, a set of optimal policy functions for y and x, which are defined by
$$\{y_t, x_{t+1}\} \in \operatorname*{Argmax}_{y\in\mathscr{D}(x)}\ u(y,x) + \beta V(x_{t+1}) \qquad (7.5)$$

Our problem is now to solve (7.4) for the function $V(x_t)$. This problem is particularly complicated, as we are not solving for just a point that would satisfy the equation; rather, we are interested in finding a function that satisfies the equation. A simple procedure to find a solution would be the following:
1. Make an initial guess on the form of the value function $V_0(x_t)$
2. Update the guess using the Bellman equation, such that
$$V_{i+1}(x_t) = \max_{y_t\in\mathscr{D}(x_t)} u(y_t,x_t) + \beta V_i(h(y_t,x_t))$$
3. If $V_{i+1}(x_t) = V_i(x_t)$, then a fixed point has been found and the problem is solved; if not, we go back to 2 and iterate on the process until convergence.
In other words, solving the Bellman equation just amounts to finding the fixed point of the Bellman equation or, introducing an operator notation, finding the fixed point of the operator T, such that
$$V_{i+1} = TV_i$$
where T stands for the list of operations involved in the computation of the Bellman equation. The problem is then that of the existence and the uniqueness of this fixed-point. Luckily, mathematicians have provided conditions for the existence and uniqueness of a solution.

7.1.2 Existence and uniqueness of a solution

Definition 1 A metric space is a set S, together with a metric $\rho : S\times S\to\mathbb{R}_+$, such that for all $x, y, z\in S$:
1. $\rho(x,y) \geqslant 0$, with $\rho(x,y) = 0$ if and only if $x = y$,
2. $\rho(x,y) = \rho(y,x)$,
3. $\rho(x,z) \leqslant \rho(x,y) + \rho(y,z)$.

Definition 2 A sequence $\{x_n\}_{n=0}^\infty$ in S converges to $x\in S$ if, for each $\varepsilon > 0$, there exists an integer $N_\varepsilon$ such that
$$\rho(x_n, x) < \varepsilon \text{ for all } n \geqslant N_\varepsilon$$

Definition 3 A sequence $\{x_n\}_{n=0}^\infty$ in S is a Cauchy sequence if, for each $\varepsilon > 0$, there exists an integer $N_\varepsilon$ such that
$$\rho(x_n, x_m) < \varepsilon \text{ for all } n, m \geqslant N_\varepsilon$$

Definition 4 A metric space $(S,\rho)$ is complete if every Cauchy sequence in S converges to a point in S.

Definition 5 Let $(S,\rho)$ be a metric space and $T : S\to S$ be a function mapping S into itself. T is a contraction mapping (with modulus $\beta$) if, for some $\beta\in(0,1)$,
$$\rho(Tx, Ty) \leqslant \beta\rho(x,y), \text{ for all } x, y\in S.$$

We then have the following remarkable theorem that establishes the existence and uniqueness of the fixed point of a contraction mapping.

Theorem 1 (Contraction Mapping Theorem) If $(S,\rho)$ is a complete metric space and $T : S\to S$ is a contraction mapping with modulus $\beta\in(0,1)$, then
1. T has exactly one fixed point $V\in S$ such that $V = TV$,
2. for any $V_0\in S$, $\rho(T^n V_0, V) \leqslant \beta^n\rho(V_0, V)$, with $n = 0, 1, 2,\ldots$
Since we are endowed with all the tools we need to prove the theorem, we shall do it.

Proof: In order to prove 1., we shall first prove that if we select any sequence $\{V_n\}_{n=0}^\infty$ such that, for each n, $V_n\in S$ and
$$V_{n+1} = TV_n$$
this sequence converges, and that it converges to $V\in S$. In order to show convergence of $\{V_n\}_{n=0}^\infty$, we shall prove that $\{V_n\}_{n=0}^\infty$ is a Cauchy sequence. First of all, note that the contraction property of T implies that
$$\rho(V_2, V_1) = \rho(TV_1, TV_0) \leqslant \beta\rho(V_1, V_0)$$
and therefore
$$\rho(V_{n+1}, V_n) = \rho(TV_n, TV_{n-1}) \leqslant \beta\rho(V_n, V_{n-1}) \leqslant\ldots\leqslant \beta^n\rho(V_1, V_0)$$
Now consider two terms of the sequence, $V_m$ and $V_n$, m > n. The triangle inequality implies that
$$\rho(V_m, V_n) \leqslant \rho(V_m, V_{m-1}) + \rho(V_{m-1}, V_{m-2}) + \ldots + \rho(V_{n+1}, V_n)$$
therefore, making use of the previous result, we have
$$\rho(V_m, V_n) \leqslant \left(\beta^{m-1} + \beta^{m-2} + \ldots + \beta^n\right)\rho(V_1, V_0) \leqslant \frac{\beta^n}{1-\beta}\rho(V_1, V_0)$$
Since $\beta\in(0,1)$, $\beta^n\to 0$ as $n\to\infty$, so that for each $\varepsilon > 0$ there exists $N_\varepsilon\in\mathbb{N}$ such that $\rho(V_m, V_n) < \varepsilon$. Hence $\{V_n\}_{n=0}^\infty$ is a Cauchy sequence and it therefore converges. Further, since we have assumed that S is complete, $V_n$ converges to $V\in S$.
We now have to show that $V = TV$ in order to complete the proof of the first part. Note that, for each $\varepsilon > 0$ and for $V_0\in S$, the triangle inequality implies
$$\rho(V, TV) \leqslant \rho(V, V_n) + \rho(V_n, TV)$$
But since $\{V_n\}_{n=0}^\infty$ is a Cauchy sequence, we have
$$\rho(V, TV) \leqslant \rho(V, V_n) + \rho(V_n, TV) \leqslant \frac{\varepsilon}{2} + \frac{\varepsilon}{2}$$
for n large enough, therefore $V = TV$.
Hence, we have proven that T possesses a fixed point and have therefore established its existence. We now have to prove uniqueness. This can be obtained by contradiction. Suppose there exists another function, say $W\in S$, that satisfies $W = TW$. Then the definition of the fixed point implies
$$\rho(V, W) = \rho(TV, TW)$$
but the contraction property implies
$$\rho(V, W) = \rho(TV, TW) \leqslant \beta\rho(V, W)$$
which, as $\beta < 1$, implies $\rho(V, W) = 0$ and so V = W. The limit is then unique.
Proving 2. is straightforward, as
$$\rho(T^n V_0, V) = \rho(T^n V_0, TV) \leqslant \beta\rho(T^{n-1}V_0, V)$$
but we have $\rho(T^{n-1}V_0, V) = \rho(T^{n-1}V_0, TV)$, such that
$$\rho(T^n V_0, V) \leqslant \beta\rho(T^{n-1}V_0, V) \leqslant \beta^2\rho(T^{n-2}V_0, V) \leqslant\ldots\leqslant \beta^n\rho(V_0, V)$$
which completes the proof.

This theorem is of great importance, as it establishes that any operator possessing the contraction property will exhibit a unique fixed-point, which therefore provides some rationale for the algorithm we were designing in the previous section. It also insures that, whatever the initial condition for this algorithm, if the value function satisfies a contraction property, simple iterations will deliver the solution. It therefore remains to provide conditions for the value function to be a contraction. These are provided by the following theorem.
Theorem 2 (Blackwell's Sufficiency Conditions) Let $X\subseteq\mathbb{R}^\ell$ and B(X) be the space of bounded functions $V : X\to\mathbb{R}$ with the uniform metric. Let $T : B(X)\to B(X)$ be an operator satisfying
1. (Monotonicity) Let $V, W\in B(X)$; if $V(x) \leqslant W(x)$ for all $x\in X$, then $TV(x) \leqslant TW(x)$
2. (Discounting) There exists some constant $\beta\in(0,1)$ such that for all $V\in B(X)$ and $a \geqslant 0$, we have
$$T(V + a)(x) \leqslant TV(x) + \beta a$$
then T is a contraction with modulus $\beta$.
Proof: Let us consider two functions $V, W\in B(X)$ satisfying 1. and 2. Note that
$$V \leqslant W + \rho(V, W)$$
Monotonicity first implies that
$$TV \leqslant T(W + \rho(V, W))$$
and discounting that
$$TV \leqslant TW + \beta\rho(V, W)$$
since $\rho(V, W) \geqslant 0$ plays the same role as a. We therefore get
$$TV - TW \leqslant \beta\rho(V, W)$$
Likewise, starting from $W \leqslant V + \rho(V, W)$, we end up with
$$TW - TV \leqslant \beta\rho(V, W)$$
Consequently, we have
$$|TV - TW| \leqslant \beta\rho(V, W)$$
so that
$$\rho(TV, TW) \leqslant \beta\rho(V, W)$$
which defines a contraction. This completes the proof.

This theorem is extremely useful, as it gives us simple tools to check whether a problem is a contraction, and therefore permits to check whether the simple algorithm we defined is appropriate for the problem we have in hand.
As an example, let us consider the optimal growth model, for which the Bellman equation writes
$$V(k_t) = \max_{c_t\in\mathscr{C}} u(c_t) + \beta V(k_{t+1})$$
with $k_{t+1} = F(k_t) - c_t$. In order to save on notations, let us drop the time subscript and denote the next period capital stock by $k'$, such that the Bellman equation rewrites, plugging the law of motion of capital in the utility function,
$$V(k) = \max_{k'\in\mathscr{K}} u(F(k) - k') + \beta V(k')$$
Let us now define the operator T as
$$(TV)(k) = \max_{k'\in\mathscr{K}} u(F(k) - k') + \beta V(k')$$
We would like to know if T is a contraction, and therefore if there exists a unique function V such that
$$V(k) = (TV)(k)$$
In order to achieve this task, we just have to check whether T is monotonic and satisfies the discounting property.
1. Monotonicity: Let us consider two candidate value functions, V and W, such that $V(k) \leqslant W(k)$ for all $k\in\mathscr{K}$. What we want to show is that $(TV)(k) \leqslant (TW)(k)$. In order to do that, let us denote by $\widetilde{k}'$ the optimal next period capital stock under V, that is
$$(TV)(k) = u(F(k) - \widetilde{k}') + \beta V(\widetilde{k}')$$
But now, since $V(k) \leqslant W(k)$ for all $k\in\mathscr{K}$, we have $V(\widetilde{k}') \leqslant W(\widetilde{k}')$, such that it should be clear that
$$(TV)(k) \leqslant u(F(k) - \widetilde{k}') + \beta W(\widetilde{k}') \leqslant \max_{k'\in\mathscr{K}} u(F(k) - k') + \beta W(k') = (TW)(k)$$
Hence we have shown that $V(k) \leqslant W(k)$ implies $(TV)(k) \leqslant (TW)(k)$, and have therefore established monotonicity.

2. Discounting: Let us consider a candidate value function, V, and a positive constant a.
$$(T(V+a))(k) = \max_{k'\in\mathscr{K}} u(F(k) - k') + \beta(V(k') + a) = \max_{k'\in\mathscr{K}} u(F(k) - k') + \beta V(k') + \beta a = (TV)(k) + \beta a$$
Therefore, the Bellman equation satisfies discounting in the case of the optimal growth model.

Hence, the optimal growth model satisfies Blackwell's sufficient conditions for a contraction mapping, and therefore the value function exists and is unique. We are now in a position to design a numerical algorithm to solve the Bellman equation.

7.2 Deterministic dynamic programming

7.2.1 Value function iteration

The contraction mapping theorem gives us a straightforward way to compute the solution to the Bellman equation: iterate on the operator T, such that $V_{i+1} = TV_i$, up to the point where the distance between two successive value functions is small enough. Basically, this amounts to applying the following algorithm:
1. Decide on a grid, $\mathscr{X}$, of admissible values for the state variable x,
$$\mathscr{X} = \{x_1,\ldots,x_N\}$$
formulate an initial guess for the value function $V_0(x)$ and choose a stopping criterion $\varepsilon > 0$.

2. For each $x_\ell\in\mathscr{X}$, $\ell = 1,\ldots,N$, compute
$$V_{i+1}(x_\ell) = \max_{x'\in\mathscr{X}} u(y(x_\ell,x'), x_\ell) + \beta V_i(x')$$
3. If $\|V_{i+1}(x) - V_i(x)\| < \varepsilon$, go to the next step, else go back to 2.
4. Compute the final solution as
$$y^\star(x) = y(x, x')$$
and
$$V^\star(x) = \frac{u(y^\star(x), x)}{1-\beta}$$

In order to better understand the algorithm, let us consider a simple example and go back to the optimal growth model, with
$$u(c) = \frac{c^{1-\sigma}-1}{1-\sigma}$$
and
$$k' = k^\alpha - c + (1-\delta)k$$
Then the Bellman equation writes
$$V(k) = \max_{0\leqslant c\leqslant k^\alpha+(1-\delta)k}\frac{c^{1-\sigma}-1}{1-\sigma} + \beta V(k')$$
From the law of motion of capital, we can determine consumption as
$$c = k^\alpha + (1-\delta)k - k'$$
such that, plugging this result in the Bellman equation, we have
$$V(k) = \max_{(1-\delta)k\leqslant k'\leqslant k^\alpha+(1-\delta)k}\frac{\left(k^\alpha + (1-\delta)k - k'\right)^{1-\sigma}-1}{1-\sigma} + \beta V(k')$$
Now, let us define a grid of N feasible values for k, such that we have
$$\mathscr{K} = \{k_1,\ldots,k_N\}$$
and an initial value function $V_0(k)$, that is a vector of N numbers relating each $k_\ell$ to a value. Note that this may be anything we want, as we know by the contraction mapping theorem that the algorithm will converge; but if we want it to converge fast enough, it may be a good idea to impose a good initial guess. Finally, we need a stopping criterion.
Then, for each $\ell = 1,\ldots,N$, we compute the feasible values that can be taken by the quantity on the right hand side of the value function
$$V_{\ell,h} = \frac{\left(k_\ell^\alpha + (1-\delta)k_\ell - k_h'\right)^{1-\sigma}-1}{1-\sigma} + \beta V(k_h) \quad\text{for h feasible}$$
It is important to understand what "h feasible" means. Indeed, we only compute consumption when it is positive and smaller than total output, which restricts the number of possible values for $k'$. Namely, we want $k'$ to satisfy
$$0 \leqslant k' \leqslant k^\alpha + (1-\delta)k$$
which puts a lower and an upper bound on the index h. When the grid of values is uniform, that is when $k_h = \underline{k} + (h-1)d_k$, where $d_k$ is the increment of the grid, the upper bound can be computed as
$$\bar{h} = E\left[\frac{k_\ell^\alpha + (1-\delta)k_\ell - \underline{k}}{d_k}\right] + 1$$
where E[x] denotes the integer part of x. Then we find
$$V^\star_\ell = \max_{h=1,\ldots,\bar{h}} V_{\ell,h}$$
and set
$$V_{i+1}(k_\ell) = V^\star_\ell$$
keeping in memory the index $h^\star = \operatorname{Argmax}_{h=1,\ldots,\bar{h}} V_{\ell,h}$, such that we have
$$k'(k_\ell) = k_{h^\star}$$

In figures 7.1 and 7.2, we report the value function and the decision rules
obtained from the deterministic optimal growth model with = 0.3, = 0.95,
10

= 0.1 and = 1.5. The grid for the capital stock is composed of 1000 data
points ranging from (1 k )k ? to (1 + k )k ? , where k ? denotes the steady

state and k = 0.9. The algorithm2 then converges in 211 iterations and 110.5
seconds on a 800Mhz computer when the stopping criterion is = 1e6 .
Figure 7.1: Deterministic OGM (Value function, Value iteration)

[Figure: the value function plotted against the capital stock kt]

Matlab Code: Value Function Iteration

sigma = 1.50;        % utility parameter
delta = 0.10;        % depreciation rate
beta  = 0.95;        % discount factor
alpha = 0.30;        % capital elasticity of output

nbk   = 1000;        % number of data points in the grid
crit  = 1;           % convergence criterion
epsi  = 1e-6;        % convergence parameter

ks    = ((1-beta*(1-delta))/(alpha*beta))^(1/(alpha-1));

dev   = 0.9;         % maximal deviation from steady state
kmin  = (1-dev)*ks;  % lower bound on the grid
kmax  = (1+dev)*ks;  % upper bound on the grid

² This code and those that follow are not efficient from a computational point of view, as they are intended to have you understand the method without adding any coding complications. Much faster implementations can be found in the accompanying matlab codes.

Figure 7.2: Deterministic OGM (Decision rules, Value iteration)

[Figure: next period capital stock and consumption plotted against kt]

dk    = (kmax-kmin)/(nbk-1);      % implied increment
kgrid = linspace(kmin,kmax,nbk)'; % builds the grid (column vector)
v     = zeros(nbk,1);             % value function
tv    = zeros(nbk,1);             % updated value function
dr    = zeros(nbk,1);             % decision rule (will contain indices)

while crit>epsi;
   for i=1:nbk
      %
      % compute indexes for which consumption is positive
      %
      tmp  = (kgrid(i)^alpha+(1-delta)*kgrid(i)-kmin);
      imax = min(floor(tmp/dk)+1,nbk);
      %
      % consumption and utility
      %
      c    = kgrid(i)^alpha+(1-delta)*kgrid(i)-kgrid(1:imax);
      util = (c.^(1-sigma)-1)/(1-sigma);
      %
      % find value function
      %
      [tv(i),dr(i)] = max(util+beta*v(1:imax));
   end;
   crit = max(abs(tv-v));  % Compute convergence criterion
   v    = tv;              % Update the value function
end
%
% Final solution
%
kp  = kgrid(dr);
c   = kgrid.^alpha+(1-delta)*kgrid-kp;
util= (c.^(1-sigma)-1)/(1-sigma);

7.2.2 Taking advantage of interpolation

A possible improvement of the method is to use a much looser grid on the capital stock but a pretty fine grid on the control variable (consumption in the optimal growth model). Then the next period value of the state variable can be computed much more precisely. However, because of this precision and the fact that the grid is coarser, the computed optimal value for the next period state variable may not lie on the grid, such that the value function is unknown at this particular value. Therefore, we use an interpolation scheme to get an approximation of the value function at this value. One advantage of this approach is that it involves fewer function evaluations and is usually less costly in terms of CPU time. The algorithm is then as follows:
1. Decide on a grid, X, of admissible values for the state variable x,

   X = {x1, . . . , xN}

   and a grid, Y, of admissible values for the control variable y,

   Y = {y1, . . . , yM}  with M ≫ N

   formulate an initial guess for the value function V0(x) and choose a stopping criterion ε > 0.

2. For each xℓ ∈ X, ℓ = 1, . . . , N, compute the next period state implied by each control,

   x′ℓ,j = h(yj, xℓ), j = 1, . . . , M

   compute an interpolated value function at each x′ℓ,j = h(yj, xℓ), denoted Ṽi(x′ℓ,j), and then

   Vi+1(xℓ) = max_{y∈Y} u(y, xℓ) + βṼi(x′ℓ,j)

3. If ‖Vi+1(x) − Vi(x)‖ < ε go to the next step, else go back to 2.

4. Compute the final solution as

   V*(x) = u(y*, x)/(1 − β)

I now report the matlab code for this approach when we use cubic spline interpolation for the value function, 20 nodes for the capital stock and 1000 nodes for consumption. The algorithm converges in 182 iterations and 40.6 seconds, starting from the initial condition for the value function

v0(k) = (((c*/y*)k^α)^(1−σ) − 1)/((1 − σ)(1 − β))

Matlab Code: Value Function Iteration with Interpolation

sigma = 1.5;    % utility parameter
delta = 0.1;    % depreciation rate
beta  = 0.95;   % discount factor
alpha = 0.30;   % capital elasticity of output

nbk   = 20;     % number of data points in the K grid
nbc   = 1000;   % number of data points in the C grid
crit  = 1;      % convergence criterion
epsi  = 1e-6;   % convergence parameter

ks    = ((1-beta*(1-delta))/(alpha*beta))^(1/(alpha-1));

dev   = 0.9;                      % maximal deviation from steady state
kmin  = (1-dev)*ks;               % lower bound on the K grid
kmax  = (1+dev)*ks;               % upper bound on the K grid
kgrid = linspace(kmin,kmax,nbk)'; % builds the K grid
cmin  = 0.01;                     % lower bound on the C grid
cmax  = kmax^alpha;               % upper bound on the C grid
c     = linspace(cmin,cmax,nbc)'; % builds the C grid
v     = zeros(nbk,1);             % value function
Tv    = zeros(nbk,1);             % updated value function
dr    = zeros(nbk,1);             % decision rule (will contain indices)
util  = (c.^(1-sigma)-1)/(1-sigma);

while crit>epsi;
   for i=1:nbk;
      kp            = kgrid(i)^alpha+(1-delta)*kgrid(i)-c;
      vi            = interp1(kgrid,v,kp,'spline');
      [Tv(i),dr(i)] = max(util+beta*vi);
   end
   crit = max(abs(Tv-v));  % Compute convergence criterion
   v    = Tv;              % Update the value function
end
%
% Final solution (dr indexes the consumption grid)
%
copt = c(dr);
kp   = kgrid.^alpha+(1-delta)*kgrid-copt;
util = (copt.^(1-sigma)-1)/(1-sigma);
v    = util/(1-beta);

Figure 7.3: Deterministic OGM (Value function, Value iteration with interpolation)

[Figure: value function plotted against kt]

7.2.3 Policy iterations: Howard Improvement
Figure 7.4: Deterministic OGM (Decision rules, Value iteration with interpolation)

[Figure: next period capital stock and consumption plotted against kt]

The simple value iteration algorithm has the attractive feature of being particularly simple to implement. However, it is a slow procedure, especially for infinite horizon problems, since it can be shown that it converges at the rate β, which is usually close to 1! Further, it computes unnecessary quantities during the algorithm, which slows down convergence. Computation speed is often really important, for instance when one wants to perform a sensitivity analysis of the results obtained in a model using different parameterizations. Hence, we would like to be able to speed up convergence. This can be achieved by relying on Howard's improvement method, which iterates on policy functions rather than on the value function. The algorithm may be described as follows:
1. Set an initial feasible decision rule for the control variable, y = f0(x), and compute the value associated to this guess, assuming that this rule is operative forever:

   V(xt) = Σ_{s=0}^∞ β^s u(fi(xt+s), xt+s)

   taking care of the fact that xt+1 = h(xt, yt) = h(xt, fi(xt)), with i = 0. Set a stopping criterion ε > 0.

2. Find a new policy rule y = fi+1(x) such that

   fi+1(x) ∈ Argmax_y u(y, x) + βV(x′)

   with x′ = h(x, fi(x)).

3. Check if ‖fi+1(x) − fi(x)‖ < ε; if yes then stop, else go back to 2.
Note that this method differs fundamentally from the value iteration algorithm in at least two dimensions:

(i) one iterates on the policy function rather than on the value function;

(ii) the decision rule is used forever, whereas it is assumed to be used for only two consecutive periods in the value iteration algorithm. It is precisely this last feature that accelerates convergence.

Note that when computing the value function we actually have to solve a linear system of the form

Vi+1(xℓ) = u(fi+1(xℓ), xℓ) + βVi+1(h(xℓ, fi+1(xℓ))), for all xℓ ∈ X

for Vi+1(xℓ), which may be rewritten as

Vi+1(xℓ) = u(fi+1(xℓ), xℓ) + βQVi+1(xℓ), for all xℓ ∈ X

where Q is an (N × N) matrix with

Qℓj = 1 if x′j ≡ h(fi+1(xℓ), xℓ), and Qℓj = 0 otherwise

Note that although it is a big matrix, Q is sparse, which can be exploited in solving the system, to get

Vi+1(x) = (I − βQ)^(−1) u(fi+1(x), x)

We apply this algorithm to the same optimal growth model as in the previous section and report the value function and the decision rules at convergence in figures 7.5 and 7.6. The algorithm converges in only 18 iterations and 9.8 seconds, starting from the same initial guess and using the same parameterization!

Matlab Code: Policy Iteration

sigma = 1.50;   % utility parameter
delta = 0.10;   % depreciation rate
beta  = 0.95;   % discount factor
alpha = 0.30;   % capital elasticity of output

nbk   = 1000;   % number of data points in the grid
crit  = 1;      % convergence criterion
epsi  = 1e-6;   % convergence parameter

ks    = ((1-beta*(1-delta))/(alpha*beta))^(1/(alpha-1));
dev   = 0.9;                      % maximal deviation from steady state
kmin  = (1-dev)*ks;               % lower bound on the grid
kmax  = (1+dev)*ks;               % upper bound on the grid
dk    = (kmax-kmin)/(nbk-1);      % implied increment
kgrid = linspace(kmin,kmax,nbk)'; % builds the grid
v     = zeros(nbk,1);             % value function
kp0   = kgrid;                    % initial guess on k(t+1)
dr    = zeros(nbk,1);             % decision rule (will contain indices)
%
% Main loop
%
while crit>epsi;
   for i=1:nbk
      %
      % compute indexes for which consumption is positive
      %
      imax = min(floor((kgrid(i)^alpha+(1-delta)*kgrid(i)-kmin)/dk)+1,nbk);
      %
      % consumption and utility
      %
      c    = kgrid(i)^alpha+(1-delta)*kgrid(i)-kgrid(1:imax);
      util = (c.^(1-sigma)-1)/(1-sigma);
      %
      % find new policy rule
      %
      [v1,dr(i)] = max(util+beta*v(1:imax));
   end;
   %
   % decision rules
   %
   kp   = kgrid(dr);
   c    = kgrid.^alpha+(1-delta)*kgrid-kp;
   %
   % update the value
   %
   util = (c.^(1-sigma)-1)/(1-sigma);
   Q    = sparse(nbk,nbk);
   for i=1:nbk;
      Q(i,dr(i)) = 1;
   end
   Tv   = (speye(nbk)-beta*Q)\util; % solve the linear system for the value
   crit = max(abs(kp-kp0));         % compute convergence criterion
   v    = Tv;                       % update the value function
   kp0  = kp;                       % update the policy rule
end

Figure 7.5: Deterministic OGM (Value function, Policy iteration)

[Figure: value function plotted against kt]

As illustrated in the particular example of the optimal growth model, the policy iteration algorithm only requires a few iterations. Unfortunately, we have to solve the linear system

(I − βQ)Vi+1 = u(y, x)

which may be particularly costly when the number of grid points is large. Therefore, a number of researchers have proposed to replace the matrix inversion by an additional iteration step, leading to the so-called modified policy iteration with k steps, which replaces the linear problem by the following iteration scheme:

Figure 7.6: Deterministic OGM (Decision rules, Policy iteration)

[Figure: next period capital stock and consumption plotted against kt]

1. Set J0 = Vi.

2. Iterate k times on

   Ji+1 = u(y, x) + βQJi, i = 0, . . . , k

3. Set Vi+1 = Jk.

When k → ∞, Jk tends toward the solution of the linear system.
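As a concrete illustration, here is a minimal Matlab sketch of this k-step update, reusing the objects Q, util, beta and v from the policy iteration code above; it simply replaces the line Tv = (speye(nbk)-beta*Q)\util, and the number of steps below is an arbitrary assumption.

Matlab Code: Modified Policy Iteration (k steps)

nstp = 50;               % number of iteration steps (arbitrary assumption)
J    = v;                % start from the current value function
for s=1:nstp;
   J = util+beta*(Q*J);  % one application of the policy evaluation operator
end
Tv   = J;                % approximates the solution of (I-beta*Q)Tv = util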

7.2.4 Parametric dynamic programming

The last technique I will describe borrows from approximation theory, using either orthogonal polynomials or spline functions. The idea is to make a guess for the functional form of the value function and iterate on the parameters of this functional form. The algorithm then works as follows:

1. Choose a functional form for the value function Ṽ(x; Θ), a grid of interpolating nodes X = {x1, . . . , xN}, a stopping criterion ε > 0 and an initial vector of parameters Θ0.

2. Using the conjecture for the value function, perform the maximization step in the Bellman equation, that is, compute

   wℓ = T(Ṽ(xℓ, Θi)) = max_y u(y, xℓ) + βṼ(x′, Θi)  s.t. x′ = h(y, xℓ), for ℓ = 1, . . . , N

3. Using the approximation method you have chosen, compute a new vector of parameters Θi+1 such that Ṽ(x, Θi+1) approximates the data (xℓ, wℓ).

4. If ‖Ṽ(x, Θi+1) − Ṽ(x, Θi)‖ < ε then stop, else go back to 2.

First note that for this method to be implementable, we need the payoff function and the value function to be continuous. The approximation function may be a combination of polynomials, neural networks, or splines. Note that during the optimization problem we may have to rely on a numerical maximization algorithm, and the approximation method may involve numerical minimization in order to solve a nonlinear least squares problem of the form:

Θi+1 ∈ Argmin_Θ Σ_{ℓ=1}^N (wℓ − Ṽ(xℓ; Θ))²

This algorithm is usually much faster than value iteration as it may not require iterating on a large grid. As an example, I will once again focus on the optimal growth problem we have been dealing with so far, and I will approximate the value function by

Ṽ(k; Θ) = Σ_{i=0}^p θi φi(2(k − k̲)/(k̄ − k̲) − 1)

where {φi(.)}_{i=0}^p is a set of Chebychev polynomials. In the example, I set p = 10 and used 20 nodes. Figures 7.7 and 7.8 report the decision rule and the value function in this case, and table 7.1 reports the parameters of the approximation function. The algorithm converged in 242 iterations, but took much less time than value iteration.

Figure 7.7: Deterministic OGM (Value function, Parametric DP)

[Figure: value function plotted against kt]

Figure 7.8: Deterministic OGM (Decision rules, Parametric DP)

[Figure: next period capital stock and consumption plotted against kt]

Table 7.1: Value function approximation

θ0     0.82367
θ1     2.78042
θ2    -0.66012
θ3     0.23704
θ4    -0.10281
θ5     0.05148
θ6    -0.02601
θ7     0.01126
θ8    -0.00617
θ9     0.00501
θ10   -0.00281

Matlab Code: Parametric Dynamic Programming

sigma = 1.50;   % utility parameter
delta = 0.10;   % depreciation rate
beta  = 0.95;   % discount factor
alpha = 0.30;   % capital elasticity of output

nbk   = 20;     % # of data points in the grid
p     = 10;     % order of polynomials
crit  = 1;      % convergence criterion
iter  = 1;      % iteration
epsi  = 1e-6;   % convergence parameter

ks    = ((1-beta*(1-delta))/(alpha*beta))^(1/(alpha-1));
dev   = 0.9;                             % maximal dev. from steady state
kmin  = (1-dev)*ks;                      % lower bound on the grid
kmax  = (1+dev)*ks;                      % upper bound on the grid
rk    = -cos((2*[1:nbk]'-1)*pi/(2*nbk)); % interpolating nodes
kgrid = kmin+(rk+1)*(kmax-kmin)/2;       % mapping into [kmin,kmax]
%
% Initial guess for the approximation
%
v     = (((kgrid.^alpha).^(1-sigma)-1)/((1-sigma)*(1-beta)));
X     = chebychev(rk,p);
th0   = X\v;
Tv    = zeros(nbk,1);
kp    = zeros(nbk,1);
%
% Main loop
%
options=foptions;         % settings for the (old style) optimization routines
options(14)=1e9;
while crit>epsi;
   k0 = kgrid(1);
   for i=1:nbk
      param = [alpha beta delta sigma kmin kmax p kgrid(i)];
      kp(i) = fminu('tv',k0,options,[],param,th0);
      k0    = kp(i);
      Tv(i) = -tv(kp(i),param,th0);
   end;
   theta = X\Tv;
   crit  = max(abs(Tv-v));
   v     = Tv;
   th0   = theta;
   iter  = iter+1;
end

Matlab Code: Extra Functions

function res=tv(kp,param,theta);
alpha   = param(1);
beta    = param(2);
delta   = param(3);
sigma   = param(4);
kmin    = param(5);
kmax    = param(6);
n       = param(7);
k       = param(8);
kp      = sqrt(kp.^2);                   % insures positivity of k'
v       = value(kp,[kmin kmax n],theta); % computes the value function
c       = k.^alpha+(1-delta)*k-kp;       % computes consumption
d       = find(c<=0);                    % find negative consumption
c(d)    = NaN;                           % get rid of negative c
util    = (c.^(1-sigma)-1)/(1-sigma);    % computes utility
util(d) = -1e12;                         % utility = low number for c<0
res     = -(util+beta*v);                % compute -TV (we are minimizing)

function v = value(k,param,theta);
kmin    = param(1);
kmax    = param(2);
n       = param(3);
k       = 2*(k-kmin)/(kmax-kmin)-1;
v       = chebychev(k,n)*theta;

function Tx=chebychev(x,n);
X  = x(:);
lx = size(X,1);
if n<0; error('n should be a positive integer'); end
switch n;
case 0;
   Tx=[ones(lx,1)];
case 1;
   Tx=[ones(lx,1) X];
otherwise
   Tx=[ones(lx,1) X];
   for i=3:n+1;
      Tx=[Tx 2*X.*Tx(:,i-1)-Tx(:,i-2)];
   end
end

A potential problem with the use of Chebychev polynomials is that they do not impose any assumption on the shape of the value function, which we know to be concave and strictly increasing in this case. This is why Judd [1998] recommends using shape-preserving methods, such as Schumaker approximation, in this case. Judd and Solnick [1994] have successfully applied this latter technique to the optimal growth model and found that the approximation was very good and dominated a lot of other methods (they actually get the same precision with 12 nodes as the one achieved with a 1200 data point grid using a value iteration technique).

7.3 Stochastic dynamic programming

In a large number of problems we have to deal with stochastic shocks (just think of a standard RBC model); dynamic programming techniques can obviously be extended to deal with such problems. I will first show how we can obtain the Bellman equation, before addressing some important issues concerning the discretization of shocks. Then I will describe the implementation of the value iteration and policy iteration techniques for the stochastic case.

7.3.1 The Bellman equation

The stochastic problem differs from the deterministic problem in that we now have a . . . stochastic shock! The problem then defines a value function which has as arguments the state variable xt but also the stationary shock st, whose sequence {st}_{t=0}^{+∞} satisfies

st+1 = Φ(st, εt+1)    (7.6)

where ε is a white noise process. The value function is therefore given by

V(xt, st) = max_{{yt+τ∈D(xt+τ,st+τ)}_{τ=0}^∞} Et Σ_{τ=0}^∞ β^τ u(yt+τ, xt+τ, st+τ)    (7.7)

subject to (7.6) and

xt+1 = h(xt, yt, st)    (7.8)

Since yt, xt and st are either perfectly observed or decided in period t (they are known in t), we may rewrite the value function as

V(xt, st) = max_{yt∈D(xt,st), {yt+τ∈D(xt+τ,st+τ)}_{τ=1}^∞} u(yt, xt, st) + Et Σ_{τ=1}^∞ β^τ u(yt+τ, xt+τ, st+τ)

or

V(xt, st) = max_{yt∈D(xt,st)} u(yt, xt, st) + max_{{yt+τ∈D(xt+τ,st+τ)}_{τ=1}^∞} Et Σ_{τ=1}^∞ β^τ u(yt+τ, xt+τ, st+τ)

Using the change of variable k = τ − 1, this rewrites

V(xt, st) = max_{yt∈D(xt,st)} u(yt, xt, st) + β max_{{yt+1+k∈D(xt+1+k,st+1+k)}_{k=0}^∞} Et Σ_{k=0}^∞ β^k u(yt+1+k, xt+1+k, st+1+k)

It is important at this point to recall that

Et(X(εt+τ)) = ∫ X(εt+τ) f(εt+τ|εt) dεt+τ
            = ∫∫ X(εt+τ) f(εt+τ|εt+τ−1) f(εt+τ−1|εt) dεt+τ dεt+τ−1
            = ∫ · · · ∫ X(εt+τ) f(εt+τ|εt+τ−1) · · · f(εt+1|εt) dεt+τ · · · dεt+1

which has as a corollary the law of iterated projections, such that the value function rewrites

V(xt, st) = max_{yt∈D(xt,st)} u(yt, xt, st) + β max_{{yt+1+k∈D(xt+1+k,st+1+k)}_{k=0}^∞} ∫ Et+1 [Σ_{k=0}^∞ β^k u(yt+1+k, xt+1+k, st+1+k)] f(st+1|st) dst+1

Note that, because each value of the shock defines a particular mathematical object, the maximization of the integral corresponds to the integral of the maximization; therefore the max operator and the integral are interchangeable, so that we get

V(xt, st) = max_{yt∈D(xt,st)} u(yt, xt, st) + β ∫ max_{{yt+1+k}_{k=0}^∞} Et+1 [Σ_{k=0}^∞ β^k u(yt+1+k, xt+1+k, st+1+k)] f(st+1|st) dst+1

By definition, the term under the integral corresponds to V(xt+1, st+1), such that the value function rewrites

V(xt, st) = max_{yt∈D(xt,st)} u(yt, xt, st) + β ∫ V(xt+1, st+1) f(st+1|st) dst+1

or equivalently

V(xt, st) = max_{yt∈D(xt,st)} u(yt, xt, st) + βEt V(xt+1, st+1)

which is precisely the Bellman equation for the stochastic dynamic programming problem.

7.3.2 Discretization of the shocks

A very important problem that arises whenever we deal with value iteration or policy iteration in a stochastic environment is that of the discretization of the space spanned by the shocks. Indeed, the use of a continuous support for the stochastic shocks is infeasible for a computer, which can only deal with discrete supports. We therefore have to transform the continuous problem into a discrete one, with the constraint that the asymptotic properties of the continuous and the discrete processes should be the same. The question we therefore face is: does there exist a discrete representation for s which is equivalent to its continuous original representation? The answer to this question is yes. In particular, as soon as we deal with (V)AR processes, we can use a very powerful tool: Markov chains.

Markov Chains: A Markov chain is a sequence of random values whose probabilities at a given time depend upon the value attained at the previous time. We will restrict ourselves to discrete-time Markov chains, in which the state changes at certain discrete time instants, indexed by an integer variable t. At each time step t, the Markov chain is in a state, denoted by s ∈ S ≡ {s1, . . . , sM}. S is called the state space.

The Markov chain is described in terms of transition probabilities πij. This transition probability should read as follows: if the state happens to be si in period t, the probability that the next state is equal to sj is πij. We therefore get the following definition.

Definition 6 A Markov chain is a stochastic process with a discrete indexing S, such that the conditional distribution of st+1 is independent of all previously attained states given st:

πij = Prob(st+1 = sj|st = si), si, sj ∈ S.

The important assumption we shall make concerning Markov processes is that the transition probabilities, πij, apply as soon as state si is activated, no matter the history of the shocks nor how the system got to state si. In other words, there is no hysteresis. From a mathematical point of view, this corresponds to the so-called Markov property

P(st+1 = sj|st = si, st−1 = si_{n−1}, . . . , s0 = si_0) = P(st+1 = sj|st = si) = πij

for all periods t, all states si, sj ∈ S, and all sequences {si_n}_{n=0}^{t−1} of earlier states. Thus, the probability of the next state st+1 only depends on the current realization of s.

The transition probabilities πij must of course satisfy

1. πij ≥ 0 for all i, j = 1, . . . , M

2. Σ_{j=1}^M πij = 1 for all i = 1, . . . , M

All of the elements of a Markov chain model can then be encoded in a transition probability matrix

    | π11 · · · π1M |
Π = |  :    ·    :  |
    | πM1 · · · πMM |

Note that π^k_ij then gives the probability that st+k = sj given that st = si. In the long run, we obtain the steady state equivalent for a Markov chain: the invariant distribution, or stationary distribution.

Definition 7 A stationary distribution for a Markov chain is a distribution π such that

1. πj ≥ 0 for all j = 1, . . . , M

2. Σ_{j=1}^M πj = 1

3. π = πΠ
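As a minimal illustration of this definition, the stationary distribution may be obtained by iterating on π = πΠ from any initial distribution; the transition matrix below is a purely illustrative assumption.

Matlab Code: Stationary Distribution of a Markov Chain

PI  = [0.9 0.1;0.2 0.8]; % illustrative transition matrix (rows sum to 1)
pi0 = ones(1,2)/2;       % start from the uniform distribution
for it=1:1000;
   pi0 = pi0*PI;         % one application of pi = pi*PI
end
disp(pi0);               % displays the (approximate) stationary distribution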
Gaussian-quadrature approach to discretization: Tauchen and Hussey [1991] provide a simple way to discretize VAR processes relying on Gaussian quadrature.³ I will only present the case of an AR(1) process of the form

st+1 = ρst + (1 − ρ)s̄ + εt+1

where εt+1 ∼ N(0, σ²). This implies that

∫ f(st+1|st) dst+1 = ∫ (1/(σ√(2π))) exp{−(1/2)[(st+1 − ρst − (1 − ρ)s̄)/σ]²} dst+1 = 1

which illustrates the fact that s is a continuous random variable. Tauchen and Hussey [1991] propose to replace the integral by

∫ (f(st+1|st)/f(st+1|s̄)) f(st+1|s̄) dst+1 ≡ ∫ Φ(st+1; st, s̄) f(st+1|s̄) dst+1 = 1

where f(st+1|s̄) denotes the density of st+1 conditional on the fact that st = s̄ (which amounts to the unconditional density), which in our case implies that

Φ(st+1; st, s̄) = f(st+1|st)/f(st+1|s̄) = exp{−(1/2)([(st+1 − ρst − (1 − ρ)s̄)/σ]² − [(st+1 − s̄)/σ]²)}

³ Note that we already discussed this issue in lecture notes #4.

Then we can use the standard linear transformation and impose zt = (st − s̄)/(σ√2) to get

(1/√π) ∫ exp{z²t+1 − (zt+1 − ρzt)²} exp{−z²t+1} dzt+1 = 1

for which we can use a Gauss-Hermite quadrature. Assume then that we have the quadrature nodes zj and weights ωj, j = 1, . . . , n; the quadrature leads to the formula

(1/√π) Σ_{j=1}^n ωj Φ(zj; zi; s̄) ≃ 1

in other words, we might interpret the quantity ωj Φ(zj; zi; s̄)/√π as an estimate π̂ij ≡ Prob(st+1 = sj|st = si) of the transition probability from state i to state j. But remember that the quadrature is just an approximation, such that it will generally be the case that Σ_{j=1}^n π̂ij = 1 will not hold exactly. Tauchen and Hussey therefore propose the following modification:

π̂ij = ωj Φ(zj; zi; s̄)/(√π ŝi)

where ŝi = (1/√π) Σ_{j=1}^n ωj Φ(zj; zi; s̄).

Matlab Code: Discretization of AR(1)

function [s,p]=tauch_huss(xbar,rho,sigma,n)
% xbar  : mean of the x process
% rho   : persistence parameter
% sigma : volatility
% n     : number of nodes
%
% returns the states s and the transition probabilities p
%
[xx,wx] = gauss_herm(n);          % nodes and weights for s
s       = sqrt(2)*sigma*xx+xbar;  % discrete states
x       = xx(:,ones(n,1));        % current state, constant across columns
z       = x';                     % next period state, constant across rows
w       = wx(:,ones(n,1))';       % quadrature weights
%
% computation
%
p       = (exp(z.*z-(z-rho*x).*(z-rho*x)).*w)./sqrt(pi);
sx      = sum(p')';               % row sums
p       = p./sx(:,ones(n,1));     % normalization: rows sum to 1
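As a usage sketch, and assuming the Gauss-Hermite routine gauss_herm used above is available on the path, a 5-state approximation of an AR(1) with mean 0, ρ = 0.95 and σ = 0.01 would be obtained as follows.

Matlab Code: Using tauch_huss

[s,p] = tauch_huss(0,0.95,0.01,5); % 5-node discretization of the AR(1)
disp(s');                          % discrete support of the process
disp(p);                           % transition probabilities (rows sum to 1)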

7.3.3 Value iteration

As in the deterministic case, the convergence of the simple value function iteration procedure is insured by the contraction mapping theorem. This is however a bit more subtle than that, as we have to deal with the convergence of a probability measure, which goes far beyond this introduction to dynamic programming.⁴ The algorithm is basically the same as in the deterministic case, up to the delicate point that we have to deal with the expectation. The algorithm then writes as follows:

1. Decide on a grid, X, of admissible values for the state variable x,

   X = {x1, . . . , xN}

   and a grid, S, for the shocks,

   S = {s1, . . . , sM}, together with the transition matrix Π = (πij)

   formulate an initial guess for the value function V0(x) and choose a stopping criterion ε > 0.
2. For each xℓ ∈ X, ℓ = 1, . . . , N, and sk ∈ S, k = 1, . . . , M, compute

   Vi+1(xℓ, sk) = max_{x′∈X} u(y(xℓ, sk, x′), xℓ, sk) + β Σ_{j=1}^M πkj Vi(x′, s′j)

3. If ‖Vi+1(x, s) − Vi(x, s)‖ < ε go to the next step, else go back to 2.

4. Compute the final solution as

   y*(x, s) = y(x, x′(x, s), s)

As in the deterministic case, I will illustrate the method relying on the optimal growth model, with

u(c) = (c^(1−σ) − 1)/(1 − σ)

and

k′ = exp(a)k^α − c + (1 − δ)k

where a′ = ρa + ε′. Then the Bellman equation writes

V(k, a) = max_c (c^(1−σ) − 1)/(1 − σ) + β ∫ V(k′, a′) dΦ(a′|a)

From the law of motion of capital we can determine consumption as

c = exp(a)k^α + (1 − δ)k − k′

such that, plugging this result in the Bellman equation, we have

V(k, a) = max_{k′} ((exp(a)k^α + (1 − δ)k − k′)^(1−σ) − 1)/(1 − σ) + β ∫ V(k′, a′) dΦ(a′|a)

A first problem that we encounter is that we would like to be able to evaluate the integral involved by the rational expectation. We therefore have to discretize the shock. Here, I will consider that the technology shock can be accurately approximated by a 2-state Markov chain, such that a can take on 2 values a1 and a2 (a1 < a2). I will also assume that the transition matrix is symmetric, such that

Π = [ p  1−p ; 1−p  p ]

a1, a2 and p are selected such that the process reproduces the conditional first and second order moments of the AR(1) process:

First order moments
pa1 + (1 − p)a2 = ρa1
(1 − p)a1 + pa2 = ρa2

Second order moments
pa1² + (1 − p)a2² − (ρa1)² = σ²
(1 − p)a1² + pa2² − (ρa2)² = σ²

From the first two equations we get a1 = −a2 and p = (1 + ρ)/2. Plugging these two results in the last two equations, we get a1 = −√(σ²/(1 − ρ²)). Hence, we will actually work with a value function of the form

V(k, ak) = max_c (c^(1−σ) − 1)/(1 − σ) + β Σ_{j=1}^2 πkj V(k′, a′j)

Now, let us define a grid of N feasible values for k such that we have

K = {k1, . . . , kN}

and an initial value function V0(k), that is, a vector of N numbers that relate each kℓ to a value. Note that this may be anything we want, as we know by the contraction mapping theorem that the algorithm will converge, but if we want it to converge fast enough it may be a good idea to impose a good initial guess. Finally, we need a stopping criterion.

In figures 7.9 and 7.10, we report the value function and the decision rules obtained for the stochastic optimal growth model with α = 0.3, β = 0.95, δ = 0.1, σ = 1.5, ρ = 0.8 and σε = 0.12. The grid for the capital stock is composed of 1000 data points ranging from 0.2 to 6. The algorithm then converges in 192 iterations when the stopping criterion is ε = 1e−6.

⁴ The interested reader should refer to Lucas et al. [1989], chapter 9.

Figure 7.9: Stochastic OGM (Value function, Value iteration)

[Figure: value function plotted against kt, one schedule per state of the shock]

Figure 7.10: Stochastic OGM (Decision rules, Value iteration)

[Figure: next period capital stock and consumption plotted against kt, one schedule per state of the shock]

Matlab Code: Value Function Iteration (Stochastic Case)

sigma = 1.50;   % utility parameter
delta = 0.10;   % depreciation rate
beta  = 0.95;   % discount factor
alpha = 0.30;   % capital elasticity of output
rho   = 0.80;   % persistence of the shock
se    = 0.12;   % volatility of the shock

nbk   = 1000;   % number of data points in the grid
nba   = 2;      % number of values for the shock
crit  = 1;      % convergence criterion
epsi  = 1e-6;   % convergence parameter
iter  = 1;      % iteration counter
%
% Discretization of the shock
%
p     = (1+rho)/2;
PI    = [p 1-p;1-p p];
ab    = 0;                        % mean of the log shock
a1    = exp(-se/sqrt(1-rho*rho)); % exp(a1), with a1 = -sqrt(se^2/(1-rho^2))
a2    = exp(se/sqrt(1-rho*rho));
A     = [a1 a2];
%
% Discretization of the state space
%
kmin  = 0.2;
kmax  = 6;
kgrid = linspace(kmin,kmax,nbk)';
util  = zeros(nbk,nba);
v     = zeros(nbk,nba);
Tv    = zeros(nbk,nba);
dr    = zeros(nbk,nba);

while crit>epsi;
   for i=1:nbk
      for j=1:nba;
         c           = A(j)*kgrid(i)^alpha+(1-delta)*kgrid(i)-kgrid;
         neg         = find(c<=0);
         c(neg)      = NaN;
         util(:,j)   = (c.^(1-sigma)-1)/(1-sigma);
         util(neg,j) = -1e12;
      end
      [Tv(i,:),dr(i,:)] = max(util+beta*(v*PI')); % expectation over a'
   end;
   crit = max(max(abs(Tv-v)));
   v    = Tv;
   iter = iter+1;
end

kp = kgrid(dr);
c  = zeros(nbk,nba);
for j=1:nba;
   c(:,j) = A(j)*kgrid.^alpha+(1-delta)*kgrid-kp(:,j);
end

7.3.4 Policy iterations

As in the deterministic case, we may want to accelerate the simple value iteration using Howard improvement. The stochastic case does not differ much from the deterministic case, apart from the fact that we now have to deal with different decision rules, which implies the computation of different Q matrices. The algorithm may be described as follows:

1. Set an initial feasible set of decision rules for the control variable, y = f0(x, sk), k = 1, . . . , M, and compute the value associated to this guess, assuming that this rule is operative forever, taking care of the fact that xt+1 = h(xt, yt, st) = h(xt, fi(xt, st), st), with i = 0. Set a stopping criterion ε > 0.

2. Find a new policy rule y = fi+1(x, sk), k = 1, . . . , M, such that

   fi+1(x, sk) ∈ Argmax_y u(y, x, sk) + β Σ_{j=1}^M πkj V(x′, s′j)

   with x′ = h(x, fi(x, sk), sk).

3. Check if ‖fi+1(x, s) − fi(x, s)‖ < ε; if yes then stop, else go back to 2.

When computing the value function we actually have to solve a linear system of the form

Vi+1(xℓ, sk) = u(fi+1(xℓ, sk), xℓ, sk) + β Σ_{j=1}^M πkj Vi+1(h(xℓ, fi+1(xℓ, sk), sk), s′j)

for Vi+1(xℓ, sk) (for all xℓ ∈ X and sk ∈ S), which may be rewritten, stacking states and shocks, as

Vi+1(x, s) = u(fi+1(x, s), x, s) + βQVi+1(x, s)

where Q is now an (NM × NM) matrix built from the decision rule in each state of the shock. Note that although it is a big matrix, Q is sparse, which can be exploited in solving the system, to get

Vi+1(x, s) = (I − βQ)^(−1) u(fi+1(x, s), x, s)

We apply this algorithm to the same optimal growth model as in the previous section and report the value function and the decision rules at convergence in figures 7.11 and 7.12. The algorithm converges in only 17 iterations, starting from the same initial guess and using the same parameterization!

Figure 7.11: Stochastic OGM (Value function, Policy iteration)

[Figure: value function plotted against kt, one schedule per state of the shock]

Figure 7.12: Stochastic OGM (Decision rules, Policy iteration)

[Figure: next period capital stock and consumption plotted against kt, one schedule per state of the shock]

Matlab Code: Policy Iteration (Stochastic Case)

sigma = 1.50;   % utility parameter
delta = 0.10;   % depreciation rate
beta  = 0.95;   % discount factor
alpha = 0.30;   % capital elasticity of output
rho   = 0.80;   % persistence of the shock
se    = 0.12;   % volatility of the shock

nbk   = 1000;   % number of data points in the grid
nba   = 2;      % number of values for the shock
crit  = 1;      % convergence criterion
epsi  = 1e-6;   % convergence parameter
iter  = 1;      % iteration counter
%
% Discretization of the shock
%
p     = (1+rho)/2;
PI    = [p 1-p;1-p p];
ab    = 0;                        % mean of the log shock
a1    = exp(-se/sqrt(1-rho*rho));
a2    = exp(se/sqrt(1-rho*rho));
A     = [a1 a2];
%
% Discretization of the state space
%
kmin  = 0.2;
kmax  = 6;
kgrid = linspace(kmin,kmax,nbk)';
c     = zeros(nbk,nba);
util  = zeros(nbk,nba);
v     = zeros(nbk,nba);
Ev    = zeros(nbk,nba);           % expected value function
kp0   = kgrid(:,ones(1,nba));     % initial guess on k(t+1)
dr    = zeros(nbk,nba);           % decision rule (will contain indices)
%
% Main loop
%
while crit>epsi;
   for i=1:nbk
      for j=1:nba;
         c           = A(j)*kgrid(i)^alpha+(1-delta)*kgrid(i)-kgrid;
         neg         = find(c<=0);
         c(neg)      = NaN;
         util(:,j)   = (c.^(1-sigma)-1)/(1-sigma);
         util(neg,j) = -inf;
         Ev(:,j)     = v*PI(j,:)';
      end
      [v1,dr(i,:)] = max(util+beta*Ev);
   end;
   %
   % decision rules
   %
   kp = kgrid(dr);
   Q  = sparse(nbk*nba,nbk*nba);
   for j=1:nba;
      c = A(j)*kgrid.^alpha+(1-delta)*kgrid-kp(:,j);
      %
      % update the value
      %
      util(:,j) = (c.^(1-sigma)-1)/(1-sigma);
      Q0 = sparse(nbk,nbk);
      for i=1:nbk;
         Q0(i,dr(i,j)) = 1;
      end
      Q((j-1)*nbk+1:j*nbk,:) = kron(PI(j,:),Q0);
   end
   Tv   = (speye(nbk*nba)-beta*Q)\util(:);
   crit = max(max(abs(kp-kp0)));
   v    = reshape(Tv,nbk,nba);
   kp0  = kp;
   iter = iter+1;
end
c = zeros(nbk,nba);
for j=1:nba;
   c(:,j) = A(j)*kgrid.^alpha+(1-delta)*kgrid-kp(:,j);
end

7.4 How to simulate the model?

At this point, no matter whether the model is deterministic or stochastic, we have a solution which is actually a collection of data points. There are several ways to simulate the model:

• one may simply use the pointwise decision rule and iterate on it;

• or one may use an interpolation scheme (least squares methods, spline interpolations, . . . ) in order to obtain a continuous decision rule.

Here I will pursue the second route.

7.4.1 The deterministic case

After having solved the model by either value iteration or policy iteration, we have in hand a collection of points, X, for the state variable and a decision rule that relates the control variable to this collection of points. In other words, we have Lagrange data which can be used to get a continuous decision rule, which may then be used to simulate the model.

In order to illustrate this, let us focus once again on the deterministic optimal growth model and assume that we have solved the model by either value iteration or policy iteration. We therefore have a collection of data points for the capital stock, K = {k1, . . . , kN}, which form our grid, the associated decision rule for the next period capital stock, k′i = h(ki), i = 1, . . . , N, and the consumption level, which can be deduced from the next period capital stock using the law of motion of capital together with the resource constraint. Also note that we have the upper and lower bounds of the grid, k1 = k̲ and kN = k̄. We will use Chebychev polynomials to approximate the rule, in order to avoid multicolinearity problems in the least squares algorithm. In other words, we will get a decision rule of the form

k′ = Σ_{ℓ=0}^n θℓ Tℓ(φ(k))

where φ(k) is a linear transformation that maps [k̲, k̄] into [−1; 1]. The algorithm works as follows:

1. Solve the model by value iteration or policy iteration, such that we have a collection of data points for the capital stock, K = {k1, . . . , kN}, and the associated decision rule for the next period capital stock, k′i = h(ki), i = 1, . . . , N. Choose a maximal order, n, for the Chebychev polynomial approximation.

2. Apply the transformation

   xi = 2(ki − k̲)/(k̄ − k̲) − 1

   to the data and compute Tℓ(xi), ℓ = 1, . . . , n, for i = 1, . . . , N.

3. Perform the least squares approximation, such that

   θ̂ = (T(x)′T(x))^(−1) T(x)′h(k)

For the parameterization we have been using so far, setting

φ(k) = 2(k − k̲)/(k̄ − k̲) − 1

and for an approximation of order 7, we obtain

θ̂ = {2.6008, 2.0814, −0.0295, 0.0125, −0.0055, 0.0027, −0.0012, 0.0007}

such that the decision rule for the capital stock can be approximated as

kt+1 = 2.6008 + 2.0814 T1(φ(kt)) − 0.0295 T2(φ(kt)) + 0.0125 T3(φ(kt)) − 0.0055 T4(φ(kt)) + 0.0027 T5(φ(kt)) − 0.0012 T6(φ(kt)) + 0.0007 T7(φ(kt))

Then the model can be used in the usual way, such that in order to get the transitional dynamics of the model we just have to iterate on this backward-looking finite-difference equation. Consumption, output and investment can be computed using their definitions:

it = kt+1 − (1 − δ)kt
yt = kt^α
ct = yt − it
For instance, figure 7.13 reports the transitional dynamics of the model when
the economy is started 50% below its steady state. In the figure, the plain line
represents the dynamics of each variable, and the dashed line its steady state
level.
Figure 7.13: Transitional dynamics (OGM)

[Figure: four panels, capital stock, investment, output and consumption plotted against time]

Matlab Code: Transitional Dynamics

n      = 8;                             % order of approximation+1
transk = 2*(kgrid-kmin)/(kmax-kmin)-1;  % transform the data
%
% Computes the matrix of Chebychev polynomials
%
Tk     = [ones(nbk,1) transk];
for i=3:n;
   Tk=[Tk 2*transk.*Tk(:,i-1)-Tk(:,i-2)];
end
b    = Tk\kp;           % performs OLS
k0   = 0.5*ks;          % initial capital stock
nrep = 50;              % number of periods
k    = zeros(nrep+1,1); % initialize dynamics
%
% iteration loop
%
k(1) = k0;
for t=1:nrep;
   trkt = 2*(k(t)-kmin)/(kmax-kmin)-1;
   Tk   = [1 trkt];
   for i=3:n;
      Tk = [Tk 2*trkt.*Tk(:,i-1)-Tk(:,i-2)];
   end
   k(t+1) = Tk*b;
end
y = k(1:nrep).^alpha;
i = k(2:nrep+1)-(1-delta)*k(1:nrep);
c = y-i;

Note: At this point, I assume that the model has already been solved, that the capital grid has been stored in kgrid and that the decision rule for the next period capital stock is stored in kp.

Another way to proceed would have been to use spline approximation, in which case the matlab code would have been (for the simulation part)

Matlab Code: OGM Simulation (cubic spline interpolation)

k(1) = k0;
for t=1:nrep;
   k(t+1) = interp1(kgrid,kp,k(t),'cubic');
end
y = k(1:nrep).^alpha;
i = k(2:nrep+1)-(1-delta)*k(1:nrep);
c = y-i;

7.4.2 The stochastic case

As in the deterministic case, it will often be a good idea to represent the solution as a continuous decision rule, such that we should apply the same methodology as before. The only departure from the previous experiment is that, since we have approximated the stochastic shock by a Markov chain, we have as many decision rules as we have states in the Markov chain. For instance, in the optimal growth model we solved in section 7.3, we have 2 decision rules, which are given by⁵

k1,t+1 = 2.8630 + 2.4761 T1(φ(kt)) − 0.0211 T2(φ(kt)) + 0.0114 T3(φ(kt)) − 0.0057 T4(φ(kt)) + 0.0031 T5(φ(kt)) − 0.0014 T6(φ(kt)) + 0.0009 T7(φ(kt))

k2,t+1 = 3.2002 + 2.6302 T1(φ(kt)) − 0.0543 T2(φ(kt)) + 0.0235 T3(φ(kt)) − 0.0110 T4(φ(kt)) + 0.0057 T5(φ(kt)) − 0.0027 T6(φ(kt)) + 0.0014 T7(φ(kt))

The only difficulty we have to deal with in the stochastic case is to simulate the Markov chain. Simulating a series of length T for the Markov chain can be achieved easily:

1. Compute the cumulative distribution of the Markov chain, Πc, i.e. the cumulative sum of each row of the transition matrix. Hence, πc_ij corresponds to the probability that the economy is in a state lower or equal to j, given that it was in state i in period t − 1.

2. Set the initial state and simulate T random numbers from a uniform distribution: {pt}_{t=1}^T. Note that these numbers, drawn from a U[0; 1], can be interpreted as probabilities.

3. Assume the economy was in state i in t − 1; find the index j such that

   πc_{i(j−1)} < pt ≤ πc_ij

   then sj is the state of the economy in period t.

⁵ The code to generate these two rules is exactly the same as for the deterministic version of the model since, when computing b=Tk\kp, matlab will take care of the fact that kp is an (N × 2) matrix, such that b will be a ((n + 1) × 2) matrix, where n is the approximation order.

Matlab Code: Simulation of a Markov Chain

s      = [-1;1];        % 2 possible states
PI     = [0.9 0.1;
          0.1 0.9];     % transition matrix
s0     = 1;             % initial state
T      = 1000;          % length of the simulation
p      = rand(T,1);     % simulated uniform draws
j      = zeros(T,1);    % will contain the state indices
j(1)   = s0;            % initial state
cum_PI = [zeros(2,1) cumsum(PI')']; % cumulative distribution of the chain
for t=2:T;
   j(t) = find(p(t)<=cum_PI(j(t-1),2:end) & p(t)>cum_PI(j(t-1),1:end-1));
end;
chain  = s(j);          % simulated values

We are then in a position to simulate our optimal growth model. Figure 7.14 reports a simulation of the optimal growth model we solved before.

Figure 7.14: Simulated OGM

[Figure: four panels, capital stock, investment, output and consumption plotted against time]
Matlab Code: Simulating the OGM

n      = 8;                             % order of approximation+1
transk = 2*(kgrid-kmin)/(kmax-kmin)-1;  % transform the data
%
% Computes approximation
%
Tk     = [ones(nbk,1) transk];
for i=3:n;
   Tk=[Tk 2*transk.*Tk(:,i-1)-Tk(:,i-2)];
end
b      = Tk\kp;
%
% Simulate the markov chain
%
long   = 200;                  % length of simulation
[a,id] = markov(PI,A,long,1);  % markov is a function:
                               % a contains the values for the shock
                               % id the state index
k0     = 2.6;                  % initial value for the capital stock
k      = zeros(long+1,1);      % initialize
y      = zeros(long,1);        % some matrices
k(1)   = k0;                   % initial capital stock
for t=1:long;
   % Prepare the approximation for the capital stock
   trkt = 2*(k(t)-kmin)/(kmax-kmin)-1;
   Tk   = [1 trkt];
   for i=3:n;
      Tk = [Tk 2*trkt.*Tk(:,i-1)-Tk(:,i-2)];
   end
   k(t+1) = Tk*b(:,id(t));        % use the appropriate decision rule
   y(t)   = A(id(t))*k(t).^alpha; % computes output
end
i = k(2:long+1)-(1-delta)*k(1:long); % computes investment
c = y-i;                             % computes consumption

Note: At this point, I assume that the model has already been solved, that the capital grid has been stored in kgrid and that the decision rule for the next period capital stock is stored in kp.


7.5 Handling constraints

In this example I will deal with the problem of a household that is constrained in her borrowing and who faces uncertainty on the labor market. This problem has been extensively studied by Ayiagari [1994].

We consider the case of a household who determines her consumption/saving plans in order to maximize her lifetime expected utility, which is characterized by the function:⁶

Et Σ_{τ=0}^∞ β^τ (c_{t+τ}^(1−σ) − 1)/(1 − σ)  with σ ∈ (0, 1) ∪ (1, ∞)    (7.9)

where 0 < β < 1 is a constant discount factor and c denotes the consumption bundle. In each and every period, the household enters the period with a level of assets at carried over from the previous period, for which she receives a constant real interest rate r. She may be employed, in which case she receives the aggregate real wage w, or unemployed, in which case she just receives unemployment benefits, b, which are assumed to be a fraction of the real wage: b = ψw, where ψ is the replacement ratio. These revenues are then used to consume and purchase assets on the financial market. Therefore, the budget constraint in t is given by

at+1 = (1 + r)at + ωt − ct    (7.10)

where

ωt = w if the household is employed, ψw otherwise    (7.11)

We assume that the household's state on the labor market is a random variable. Further, the probability of being (un)employed in period t is greater if the household was also (un)employed in period t − 1. In other words, the state on the labor market, and therefore ωt, can be modeled as a 2-state Markov chain with transition matrix Π. In addition, the household is subject to a borrowing constraint stating that she cannot borrow more than a threshold level a*:

at+1 ≥ a*    (7.12)

⁶ Et(.) denotes mathematical conditional expectations. Expectations are conditional on information available at the beginning of period t.
In this case, the Bellman equation writes

V(at, ωt) = max_{ct>0} u(ct) + βEt V(at+1, ωt+1)

s.t. (7.10)-(7.12), which may be solved by one of the methods we have seen before. Implementing any of these methods while taking the constraint into account is straightforward: plugging the budget constraint into the utility function, the Bellman equation can be rewritten

V(at, ωt) = max_{at+1} u((1 + r)at + ωt − at+1) + βEt V(at+1, ωt+1)

s.t. (7.11)-(7.12). But, since the borrowing constraint should be satisfied in each and every period, we have

V(at, ωt) = max_{at+1 ≥ a*} u((1 + r)at + ωt − at+1) + βEt V(at+1, ωt+1)

In other words, when defining the grid for a, one just has to take care of the fact that the minimum admissible value for a is a*. Positivity of consumption, which should be taken into account when searching for the optimal value of at+1, finally imposes at+1 ≤ (1 + r)at + ωt, such that the Bellman equation writes

V(at, ωt) = max_{a* ≤ at+1 ≤ (1+r)at+ωt} u((1 + r)at + ωt − at+1) + βEt V(at+1, ωt+1)

The matlab code borrow.m solves and simulates this problem.
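Although borrow.m is not reproduced here, a minimal value iteration sketch of this problem, in the spirit of the stochastic codes above, may look as follows; the interest rate, wage, replacement ratio, borrowing limit and transition matrix below are illustrative assumptions, not the values used in borrow.m.

Matlab Code: Borrowing Constraint (illustrative sketch)

sigma = 1.50;               % utility parameter
beta  = 0.95;               % discount factor
r     = 0.03;               % real interest rate (assumption)
w     = 1.00;               % real wage (assumption)
psi   = 0.50;               % replacement ratio (assumption)
astar = 0;                  % borrowing limit a* (assumption)
PI    = [0.9 0.1;0.4 0.6];  % labor market transition matrix (assumption)
omega = [w psi*w];          % income in the employed/unemployed states

nba   = 1000;                    % number of points on the asset grid
agrid = linspace(astar,20,nba)'; % the grid starts at the borrowing limit
v     = zeros(nba,2);            % value function
Tv    = zeros(nba,2);
dr    = zeros(nba,2);            % decision rule (indices)
util  = zeros(nba,2);
crit  = 1; epsi = 1e-6;
while crit>epsi;
   for i=1:nba
      for j=1:2;
         c           = (1+r)*agrid(i)+omega(j)-agrid; % c for each a(t+1)
         neg         = find(c<=0);                    % positivity of c
         c(neg)      = NaN;
         util(:,j)   = (c.^(1-sigma)-1)/(1-sigma);
         util(neg,j) = -1e12;
      end
      [Tv(i,:),dr(i,:)] = max(util+beta*(v*PI'));     % expectation over omega'
   end
   crit = max(max(abs(Tv-v)));
   v    = Tv;
end
ap = agrid(dr);   % decision rule for next period assets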


Bibliography

Ayiagari, R.S., Uninsured Idiosyncratic Risk and Aggregate Saving, Quarterly Journal of Economics, 1994, 109 (3), 659-684.

Bertsekas, D., Dynamic Programming and Stochastic Control, New York: Academic Press, 1976.

Judd, K. and A. Solnick, Numerical Dynamic Programming with Shape Preserving Splines, Manuscript, Hoover Institution, 1994.

Judd, K.L., Numerical Methods in Economics, Cambridge, Massachusetts: MIT Press, 1998.

Lucas, R., N. Stokey, and E. Prescott, Recursive Methods in Economic Dynamics, Cambridge (MA): Harvard University Press, 1989.

Tauchen, G. and R. Hussey, Quadrature Based Methods for Obtaining Approximate Solutions to Nonlinear Asset Pricing Models, Econometrica, 1991, 59 (2), 371-396.

Index

Bellman equation
Blackwell's sufficiency conditions
Cauchy sequence
Contraction mapping
Contraction mapping theorem
Discounting
Howard improvement
Invariant distribution
Lagrange data
Markov chain
Metric space
Monotonicity
Policy functions
Policy iterations
Stationary distribution
Transition probabilities
Transition probability matrix
Value function
Value iteration

Contents

7 Dynamic Programming
  7.1 The Bellman equation and associated theorems
      7.1.1 A heuristic derivation of the Bellman equation
      7.1.2 Existence and uniqueness of a solution
  7.2 Deterministic dynamic programming
      7.2.1 Value function iteration
      7.2.2 Taking advantage of interpolation
      7.2.3 Policy iterations: Howard Improvement
      7.2.4 Parametric dynamic programming
  7.3 Stochastic dynamic programming
      7.3.1 The Bellman equation
      7.3.2 Discretization of the shocks
      7.3.3 Value iteration
      7.3.4 Policy iterations
  7.4 How to simulate the model?
      7.4.1 The deterministic case
      7.4.2 The stochastic case
  7.5 Handling constraints

List of Figures

7.1  Deterministic OGM (Value function, Value iteration)
7.2  Deterministic OGM (Decision rules, Value iteration)
7.3  Deterministic OGM (Value function, Value iteration with interpolation)
7.4  Deterministic OGM (Decision rules, Value iteration with interpolation)
7.5  Deterministic OGM (Value function, Policy iteration)
7.6  Deterministic OGM (Decision rules, Policy iteration)
7.7  Deterministic OGM (Value function, Parametric DP)
7.8  Deterministic OGM (Decision rules, Parametric DP)
7.9  Stochastic OGM (Value function, Value iteration)
7.10 Stochastic OGM (Decision rules, Value iteration)
7.11 Stochastic OGM (Value function, Policy iteration)
7.12 Stochastic OGM (Decision rules, Policy iteration)
7.13 Transitional dynamics (OGM)
7.14 Simulated OGM

List of Tables

7.1 Value function approximation

Lecture Notes 8

Parameterized expectations algorithm

The Parameterized Expectations Algorithm (PEA hereafter) was introduced by Marcet [1988]. As will become clear in a moment, it may be viewed as a generalized method of undetermined coefficients, in which economic agents learn the decision rule at each step of the algorithm. It therefore has a natural interpretation in terms of learning behavior. The basic idea of this method is to approximate the expectation function of the individuals by a smooth function, in general a polynomial function, rather than attempting to recover the decision rules directly. Implicit in this approach is the fact that the space spanned by polynomials is dense in the space spanned by all functions, in the sense that

lim_{k→∞} inf_{θ∈R^k} sup_{x∈X} |F(x) − F̃(x; θ)| = 0

where F is the function to be approximated and F̃ is a kth order interpolating function parameterized by θ.

8.1 Basics

The basic idea that underlies this approach is to replace expectations by an a priori given function of the state variables of the problem at hand, and then reveal the set of parameters that insure that the residuals from the Euler equations are a martingale difference sequence (Et ηt+1 = 0). Note that the main difficulty when solving the model is to deal with the integral involved by the expectation. The approach of the basic PEA algorithm is to approximate it by Monte-Carlo simulations.
The PEA algorithm may be implemented to solve a large set of models that admit the following general representation

F(Et(E(yt+1, xt+1, yt, xt)), yt, xt, εt) = 0    (8.1)

where F : R^m × R^ny × R^nx × R^ne → R^(nx+ny) describes the model and E : R^ny × R^nx × R^ny × R^nx → R^m defines the transformed variables on which we take expectations. Et is the standard conditional expectations operator, and εt is the set of innovations of the structural shocks that affect the economy.

In order to fix notation, let us take the optimal growth model as an example:

λt − βEt[λt+1(αzt+1 kt+1^(α−1) + 1 − δ)] = 0
ct^(−σ) − λt = 0
kt+1 − zt kt^α + ct − (1 − δ)kt = 0
log(zt+1) − ρ log(zt) − εt+1 = 0

In this example, we have y = {c, λ}, x = {k, z} and ε = ε; the function E takes the form

E({c, λ}t+1, {k, z}t+1, {c, λ}t, {k, z}t) = λt+1(αzt+1 kt+1^(α−1) + 1 − δ)

while F(.) is given by

F(.) = [ λt − βEt[E({c, λ}t+1, {k, z}t+1, {c, λ}t, {k, z}t)] ;
         ct^(−σ) − λt ;
         kt+1 − zt kt^α + ct − (1 − δ)kt ;
         log(zt+1) − ρ log(zt) − εt+1 ]
The idea of the PEA algorithm is then to replace the expectation function Et(E(yt+1, xt+1, yt, xt)) by a parametric approximation function, Ψ(xt; θ), of the current state variables xt and a vector of parameters θ, such that the approximated model may be restated as

F(Ψ(xt, θ), yt, xt, εt) = 0    (8.2)

The problem of the PEA algorithm is then to find a vector θ* such that

θ* ∈ Argmin_θ ‖Ψ(xt, θ) − Et(E(yt+1, xt+1, yt, xt))‖²

that is, such that the solution satisfies the rational expectations hypothesis. At this point, note that we selected a quadratic norm, but one may also consider other metrics of the form

θ* ∈ Argmin_θ R(xt, θ)′ Ω R(xt, θ)

with R(xt, θ) ≡ Ψ(xt, θ) − Et(E(yt+1, xt+1, yt, xt)) and Ω a weighting matrix. This would then correspond to a GMM type of estimation. One may also consider

θ* ∈ Argmin_θ max{|Ψ(xt, θ) − Et(E(yt+1, xt+1, yt, xt))|}

which would call for LAD estimation methods. However, the usual practice is to use the standard quadratic norm.

Once θ*, and therefore the approximation function, has been found, Ψ(xt, θ*) and equation (8.2) may be used to generate time series for the variables of the model.
The algorithm may then be described as follows (a minimal code sketch is given right after the five steps).

Step 1. Specify a guess for the function Ψ(xt, θ) and an initial θ0. Choose a stopping criterion ε > 0 and a sample size T that should be large enough, and draw a sequence {εt}_{t=0}^T that will be used during the whole algorithm.

Step 2. At iteration i, and for the given θi, simulate, recursively, a sequence for {yt(θi)}_{t=0}^T and {xt(θi)}_{t=0}^T.

Step 3. Find G(θi) that satisfies

θ̂ ∈ Argmin_θ (1/T) Σ_{t=0}^T ‖E(yt+1(θ), xt+1(θ), yt(θ), xt(θ)) − Ψ(xt(θ), θ)‖²

which just amounts to performing a nonlinear least squares regression taking E(yt+1(θ), xt+1(θ), yt(θ), xt(θ)) as the dependent variable, Ψ(.) as the explanatory function and θ as the parameter to be estimated.

Step 4. Set θi+1 to

θi+1 = λθ̂i + (1 − λ)θi    (8.3)

where λ ∈ (0, 1) is a smoothing parameter. On the one hand, setting λ low helps convergence, but at the cost of increasing the computational time. As long as good initial conditions can be found and the model is not too nonlinear, setting λ close to 1 is sufficient; however, when dealing with strongly nonlinear models (with binding constraints, for example), decreasing λ will generally help a lot.

Step 5. If |θi+1 − θi| < ε then stop, otherwise go back to step 2.
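A minimal Matlab skeleton of the five steps might look as follows. The routines simul (step 2) and nls (step 3) are hypothetical placeholders for the model specific simulation and nonlinear least squares steps, and theta0 is the model specific initial guess, so this is a sketch of the loop's structure rather than an actual implementation.

Matlab Code: PEA Skeleton (illustrative sketch)

T      = 20000;        % sample size
lambda = 1;            % smoothing parameter
epsi   = 1e-6;         % stopping criterion
e      = randn(T,1);   % innovations, drawn once and for all (step 1)
theta  = theta0;       % initial guess (model specific, hypothetical)
crit   = 1;
while crit>epsi;
   [y,x,Ee] = simul(theta,e);  % step 2: simulate {y},{x} and the realized
                               % E(y(t+1),x(t+1),y(t),x(t)) (hypothetical)
   thetahat = nls(Ee,x,theta); % step 3: nonlinear least squares (hypothetical)
   crit     = max(abs(thetahat-theta));
   theta    = lambda*thetahat+(1-lambda)*theta; % step 4: smoothing update (8.3)
end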
Reading this algorithm, it appears that it may easily be given a learning interpretation. Indeed, each iteration may be interpreted as a learning step, in which the individual uses a rule of thumb as a decision rule, and reveals information on the kind of errors he/she makes using this rule of thumb. He/she then corrects the rule, that is, finds another θ, which will be used during the next step. But it should be noted that nothing in the algorithm guarantees that the algorithm always converges, nor that, if it does, it delivers a decision rule that is compatible with the rational expectations hypothesis.¹

At this point, several comments stemming from the implementation of the method are in order. First of all, we need to come up with an interpolating function, Ψ(.). How should it be specified? In fact, we are free to choose any functional form we may think of; nevertheless, economic theory may guide us, as well as some constraints imposed by the method, more particularly in step 3. A widely used interpolating function combines the nonlinear aspects of the exponential function with some polynomials, such that Ψj(x, θ) may take the form (where j ∈ {1, . . . , m} refers to a particular expectation)

Ψj(x, θ) = exp(θ′P(x))

where P(x) is a multivariate polynomial.² One advantage of this interpolating function is obviously that it guarantees positive values for the expectations, which turns out to be mostly the case in economics. One potential problem with such a functional form is precisely related to the fact that it uses simple polynomials, which may then generate multicolinearity problems during step 3. As an example, let us take the simple case in which the state variable is totally exogenous and is an AR(1) process with log-normal innovations:

log(at) = ρ log(at−1) + εt

with |ρ| < 1 and ε ∼ N(0, σ). The state variable is then at. If we simulate the sequence {at}_{t=0}^T with T = 10000 and compute the correlation matrix of {at, at^2, at^3, at^4}, we get, for ρ = 0.95 and σ = 0.01,

1.0000  0.9998  0.9991  0.9980
0.9998  1.0000  0.9998  0.9991
0.9991  0.9998  1.0000  0.9998
0.9980  0.9991  0.9998  1.0000

revealing that multicolinearity problems are likely to occur. As an illustrative example, assume that we want to approximate the expectation function in the optimal growth model: it will be a function of the capital stock, which is a particularly smooth sequence. There will therefore be little difference between the sequence itself and the sequence raised to the power 2, and the relative difference will be even smaller for the sequence raised to the power 4. Hence multicolinearity may occur. One way to circumvent this problem is to rely on orthogonal polynomials, instead of standard polynomials, in the interpolating function.

¹ For a convergence proof in the case of the optimal growth model, see Marcet and Marshall [1994].
² For instance, let us consider the case nx = 2; P(xt) may then consist of a constant term, x1t, x2t, x1t^2, x2t^2, and x1t x2t.
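This correlation matrix is easy to reproduce with a few lines of Matlab (up to sampling error, since the innovations are random draws):

Matlab Code: Multicolinearity Check

rho = 0.95; sigma = 0.01; T = 10000;
e   = sigma*randn(T,1);             % gaussian innovations
la  = zeros(T,1);                   % log(a)
for t=2:T;
   la(t) = rho*la(t-1)+e(t);        % AR(1) in logs
end
a = exp(la);                        % lognormal level of the process
disp(corrcoef([a a.^2 a.^3 a.^4])); % correlations of the powers of a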
A second problem that arises in this approach is to select initial conditions for θ. This step is crucial for at least 3 reasons: (i) the problem is fundamentally nonlinear, (ii) convergence is not always guaranteed, and (iii) economic theory imposes a set of restrictions, to insure positivity of some variables for example. Therefore, much attention should be paid when imposing an initial value for θ.

A third important problem is related to the choice of λ, the smoothing parameter. Too large a value may put too much weight on new values of θ and therefore reinforce the potential forces that lead to divergence of the algorithm. On the contrary, setting λ too close to 0 may be costly in terms of computational CPU time.

It must however be noted that no general rule can be given for these implementation issues, and in most cases one has to guess and try. Therefore, I shall now report 3 examples of implementation. The first one is the standard optimal growth model, the second one corresponds to the optimal growth model with investment irreversibility, and the last one is the problem of a household facing borrowing constraints.

But before going to the examples, we shall consider a linear example that will highlight the similarity between this approach and the undetermined coefficients approach.

8.2 A linear example

Let us consider the simple model

yt = aEt yt+1 + bxt
xt+1 = (1 − ρ)x̄ + ρxt + εt+1

where ε ∼ N(0, σε²). Finding an expectation function in this model amounts to finding a function Ψ(xt, θ) for Et(ayt+1 + bxt). Let us make the following guess for the solution:

Ψ(xt, θ) = θ0 + θ1 xt
In this case, solving the PEA problem amount to solve
N
1 X
((xt , ) ayt+1 bxt )2
{0 ,1 } N

min

t=1

The first order conditions for this problem are

(1/N) Σ_{t=1}^{N} (θ_0 + θ_1 x_t − a y_{t+1} − b x_t) = 0        (8.4)

(1/N) Σ_{t=1}^{N} x_t (θ_0 + θ_1 x_t − a y_{t+1} − b x_t) = 0        (8.5)

Equation (8.4) can be rewritten as

θ_0 + θ_1 (1/N) Σ_{t=1}^{N} x_t = a (1/N) Σ_{t=1}^{N} y_{t+1} + b (1/N) Σ_{t=1}^{N} x_t

But, since Ψ(x_t, θ) is an approximate solution for the expectation function, the model implies that

y_t = E_t(a y_{t+1} + b x_t) = Ψ(x_t, θ)

such that the former equation rewrites

θ_0 + θ_1 (1/N) Σ_{t=1}^{N} x_t = a (1/N) Σ_{t=1}^{N} (θ_0 + θ_1 x_{t+1}) + b (1/N) Σ_{t=1}^{N} x_t

Asymptotically, we have

lim_{N→∞} (1/N) Σ_{t=1}^{N} x_t = lim_{N→∞} (1/N) Σ_{t=1}^{N} x_{t+1} = x̄

such that this first order condition converges to

θ_0 + θ_1 x̄ = a θ_0 + a θ_1 x̄ + b x̄

Therefore, rearranging terms, we have

θ_0 (1 − a) + θ_1 (1 − a) x̄ = b x̄        (8.6)

Now, let us consider equation (8.5), which can be rewritten as

θ_0 (1/N) Σ_{t=1}^{N} x_t + θ_1 (1/N) Σ_{t=1}^{N} x_t² = a (1/N) Σ_{t=1}^{N} y_{t+1} x_t + b (1/N) Σ_{t=1}^{N} x_t²

Like for the first condition, we acknowledge that

y_t = E_t(a y_{t+1} + b x_t) = Ψ(x_t, θ)

such that the condition rewrites

θ_0 (1/N) Σ_{t=1}^{N} x_t + θ_1 (1/N) Σ_{t=1}^{N} x_t² = a (1/N) Σ_{t=1}^{N} (θ_0 + θ_1 x_{t+1}) x_t + b (1/N) Σ_{t=1}^{N} x_t²        (8.7)

Asymptotically, we have

lim_{N→∞} (1/N) Σ_{t=1}^{N} x_t = x̄  and  lim_{N→∞} (1/N) Σ_{t=1}^{N} x_t² = E(x²) = x̄² + σ_x²

Finally, we have

lim_{N→∞} (1/N) Σ_{t=1}^{N} x_t x_{t+1} = lim_{N→∞} (1/N) Σ_{t=1}^{N} x_t ((1 − ρ)x̄ + ρ x_t + ε_{t+1})

Since ε is the innovation of the process, we have lim_{N→∞} (1/N) Σ_{t=1}^{N} x_t ε_{t+1} = 0, such that

lim_{N→∞} (1/N) Σ_{t=1}^{N} x_t x_{t+1} = (1 − ρ)x̄² + ρ E(x²) = x̄² + ρ σ_x²

Hence, (8.7) asymptotically rewrites as

x̄(1 − a)θ_0 + ((1 − a)x̄² + (1 − aρ)σ_x²)θ_1 = b(x̄² + σ_x²)

We therefore have to solve the system

θ_0 (1 − a) + θ_1 (1 − a) x̄ = b x̄
x̄(1 − a)θ_0 + ((1 − a)x̄² + (1 − aρ)σ_x²)θ_1 = b(x̄² + σ_x²)

Premultiplying the first equation by x̄ and plugging the result into the second equation leads to

(1 − aρ) θ_1 σ_x² = b σ_x²

such that

θ_1 = b / (1 − aρ)

Plugging this result into the first equation, we get

θ_0 = a b (1 − ρ) x̄ / ((1 − a)(1 − aρ))

Therefore, asymptotically, the solution is given by

y_t = a b (1 − ρ) x̄ / ((1 − a)(1 − aρ)) + (b / (1 − aρ)) x_t

which corresponds exactly to the solution of the model (see Lecture notes #1). Therefore, asymptotically, the PEA algorithm is nothing else but an undetermined coefficients method.
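As a check, the following minimal matlab sketch implements the PEA iteration on this linear model; the parameter values (a = 0.9, b = 1, ρ = 0.8, x̄ = 1, σ_ε = 0.1) are illustrative choices of ours, not taken from the lecture notes:

% PEA on the linear model y(t) = a*E[y(t+1)] + b*x(t)
a = 0.9; b = 1; rho = 0.8; xb = 1; se = 0.1;
N = 20000; tol = 1e-8; crit = 1;
randn('state',1);
x = xb*ones(N+1,1);
for t = 1:N;
   x(t+1) = (1-rho)*xb+rho*x(t)+se*randn;
end
th = [0;0];                        % initial guess for (theta0,theta1)
while crit>tol;
   y    = th(1)+th(2)*x;           % y(t) = Psi(x(t),theta)
   Y    = a*y(2:N+1)+b*x(1:N);     % realized a*y(t+1)+b*x(t)
   bt   = [ones(N,1) x(1:N)]\Y;    % OLS step, i.e. (8.4)-(8.5)
   crit = max(abs(bt-th));
   th   = bt;                      % smoothing parameter set to 1
end
disp(th')   % theta1 -> b/(1-a*rho), theta0 -> a*b*(1-rho)*xb/((1-a)*(1-a*rho))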

8.3 Standard PEA solution: the Optimal Growth Model

Let us first recall the type of problem we have in hand. We are about to solve the set of equations

λ_t − β E_t[λ_{t+1}(α z_{t+1} k_{t+1}^{α−1} + 1 − δ)] = 0
c_t^{−σ} − λ_t = 0
k_{t+1} − z_t k_t^α + c_t − (1 − δ)k_t = 0
log(z_{t+1}) − ρ log(z_t) − ε_{t+1} = 0

Our problem will therefore be to get an approximation for the expectation function

E_t[β λ_{t+1}(α z_{t+1} k_{t+1}^{α−1} + 1 − δ)]

In this problem, we have 2 state variables, k_t and z_t, such that Ψ(·) should be a function of both k_t and z_t. We will make the guess

Ψ(k_t, z_t; θ) = exp(θ_0 + θ_1 log(k_t) + θ_2 log(z_t) + θ_3 log(k_t)² + θ_4 log(z_t)² + θ_5 log(k_t) log(z_t))

From the first equation of the above system, we have that, for a given vector θ = {θ_0, θ_1, θ_2, θ_3, θ_4, θ_5}, λ_t(θ) = Ψ(k_t(θ), z_t; θ), which enables us to recover

c_t(θ) = λ_t(θ)^{−1/σ}

and therefore get

k_{t+1}(θ) = z_t k_t(θ)^α − c_t(θ) + (1 − δ)k_t(θ)

We then recover a whole sequence for {k_t(θ)}_{t=0}^{T}, {z_t}_{t=0}^{T}, {λ_t(θ)}_{t=0}^{T} and {c_t(θ)}_{t=0}^{T}, which makes it simple to compute a sequence for

φ_{t+1}(θ) ≡ β λ_{t+1}(θ)(α z_{t+1} k_{t+1}(θ)^{α−1} + 1 − δ)

Since Ψ(k_t, z_t; θ) is an exponential function of a polynomial, we may run the regression

log(φ_{t+1}(θ)) = θ_0 + θ_1 log(k_t(θ)) + θ_2 log(z_t) + θ_3 log(k_t(θ))² + θ_4 log(z_t)² + θ_5 log(k_t(θ)) log(z_t)        (8.8)

to get θ̂. We then set a new value for θ according to the updating scheme (8.3) and restart the process until convergence.

The parameterization used in the matlab code is given in table 8.1 and is totally standard. λ, the smoothing parameter, was set to 1, implying that at each iteration the new vector θ is totally passed as the new guess in the progression of the algorithm. The stopping criterion was set at ε = 1e-6, and T = 20000 data points were used to compute the OLS regression.

Table 8.1: Optimal growth: Parameterization

 β      α     δ     ρ     σ_e
 0.95   0.3   0.1   0.9   0.01
Initial conditions were set as follows. We first solve the model relying on a loglinear approximation. We then generate a random draw of size T for ε and generate series using the loglinear approximate solution. We then build the needed series to recover a draw for {φ_{t+1}(θ)}, {k_t(θ)} and {z_t}, and run the regression (8.8) to get an initial condition for θ, reported in table 8.2. The algorithm converges after 22 iterations and delivers the final decision rule reported in table 8.2.

Table 8.2: Decision rule

          θ_0      θ_1       θ_2       θ_3      θ_4      θ_5
Initial   0.5386   -0.7367   -0.2428   0.1091   0.2152   -0.2934
Final     0.5489   -0.7570   -0.3337   0.1191   0.1580   -0.1961

When λ is set at 0.75, 31 iterations are needed, 46 for λ = 0.5 and 90 for λ = 0.25. It is worth noting that the final decision rule does differ from the initial conditions, but not by as large an amount as one would have expected, meaning that in this setup, and provided the approximation is good enough,³ certainty equivalence and nonlinearities do not play such a great role. In fact, as illustrated in figure 8.1, the capital decision rule does not display that much nonlinearity. Although particularly simple to implement (see the following matlab code), this method should be handled with care, as it may be difficult to obtain convergence for some models. Nevertheless it has another attractive feature: it can handle problems with possibly binding constraints.

³ Note that for the moment we have not made any evaluation of the accuracy of the decision rule. We will undertake such an evaluation in the sequel.

Figure 8.1: Capital decision rule (k_{t+1} plotted against k_t)

We now provide two examples of such models.


Matlab Code: PEA Algorithm (OGM)

clear all
long  = 20000;
init  = 500;
slong = init+long;
T     = init+1:slong-1;
T1    = init+2:slong;
tol   = 1e-6;
crit  = 1;
gam   = 1;
sigma = 1;
delta = 0.1;
beta  = 0.95;
alpha = 0.3;
ab    = 0;
rho   = 0.9;
se    = 0.01;
param = [ab alpha beta delta rho se sigma long init];
ksy   = (alpha*beta)/(1-beta*(1-delta));
yss   = ksy^(alpha/(1-alpha));
kss   = yss^(1/alpha);
iss   = delta*kss;
css   = yss-iss;
csy   = css/yss;
lss   = css^(-sigma);
%
% Simulation of the shock
%
randn('state',1);
e     = se*randn(slong,1);
a     = zeros(slong,1);
a(1)  = ab+e(1);
for i = 2:slong;
   a(i) = rho*a(i-1)+(1-rho)*ab+e(i);
end
b0    = peaoginit(e,param);    % Compute initial conditions
%
% Main Loop
%
iter  = 1;
while crit>tol;
   %
   % Simulated path
   %
   k    = zeros(slong+1,1);
   lb   = zeros(slong,1);
   X    = zeros(slong,length(b0));
   k(1) = kss;
   for i = 1:slong;
      X(i,:) = [1 log(k(i)) a(i) log(k(i))*log(k(i)) a(i)*a(i) log(k(i))*a(i)];
      lb(i)  = exp(X(i,:)*b0);
      k(i+1) = exp(a(i))*k(i)^alpha+(1-delta)*k(i)-lb(i)^(-1/sigma);
   end
   y    = beta*lb(T1).*(alpha*exp(a(T1)).*k(T1).^(alpha-1)+1-delta);
   bt   = X(T,:)\log(y);
   b    = gam*bt+(1-gam)*b0;
   crit = max(abs(b-b0));
   b0   = b;
   disp(sprintf('Iteration: %d\tConv. crit.: %g',iter,crit))
   iter = iter+1;
end;

8.4 PEA and binding constraints: Optimal growth with irreversible investment

We now consider a variation of the previous model, in the sense that we restrict gross investment to be positive in each and every period:

i_t ⩾ 0 ⟺ k_{t+1} ⩾ (1 − δ)k_t        (8.9)

This assumption amounts to assuming that there does not exist a second hand market for capital. In such a case, the problem of the central planner is to determine consumption and capital accumulation such that utility is maximal:

max_{{c_t, k_{t+1}}_{t=0}^{∞}} E_0 Σ_{t=0}^{∞} β^t (c_t^{1−σ} − 1)/(1 − σ)

s.t.
k_{t+1} = z_t k_t^α − c_t + (1 − δ)k_t
and
k_{t+1} ⩾ (1 − δ)k_t

Forming the Lagrangean associated with the previous problem, we have

L_t = E_t Σ_{τ=0}^{∞} β^τ [ (c_{t+τ}^{1−σ} − 1)/(1 − σ) + λ_{t+τ}(z_{t+τ} k_{t+τ}^α + (1 − δ)k_{t+τ} − c_{t+τ} − k_{t+τ+1}) + μ_{t+τ}(k_{t+τ+1} − (1 − δ)k_{t+τ}) ]

which leads to the following set of first order conditions

c_t^{−σ} = λ_t        (8.10)
λ_t − μ_t = β E_t[λ_{t+1}(α z_{t+1} k_{t+1}^{α−1} + 1 − δ) − μ_{t+1}(1 − δ)]        (8.11)
k_{t+1} = z_t k_t^α − c_t + (1 − δ)k_t        (8.12)
μ_t(k_{t+1} − (1 − δ)k_t) = 0        (8.13)

The main difference with the previous example is that now the central planner faces a constraint that may be binding in each and every period. This complicates the algorithm a little bit: we have to find a rule for both the expectation function

E_t[φ_{t+1}]

where

φ_{t+1} ≡ β[λ_{t+1}(α z_{t+1} k_{t+1}^{α−1} + 1 − δ) − μ_{t+1}(1 − δ)]

and for μ_t. We then proceed as suggested in Marcet and Lorenzoni [1999]:

1. Compute two sequences for {λ_t(θ)}_{t=0}^{∞} and {k_t(θ)}_{t=0}^{∞} from (8.11) and (8.12) under the assumption that the constraint is not binding, that is μ_t(θ) = 0. In such a case, we just compute the sequences as in the standard optimal growth model.

2. Test whether, under this assumption, i_t(θ) > 0. If it is the case, then set μ_t(θ) = 0; otherwise set k_{t+1}(θ) = (1 − δ)k_t(θ), compute c_t(θ) from the resource constraint, and recover μ_t(θ) from (8.11).

Note that, using this procedure, μ_t is just treated as an additional variable used to compute a sequence to solve the model. We therefore do not need to compute explicitly its interpolating function. As far as φ_{t+1} is concerned, we use the same interpolating function as in the previous example and therefore run a regression of the type

log(φ_{t+1}(θ)) = θ_0 + θ_1 log(k_t(θ)) + θ_2 log(z_t) + θ_3 log(k_t(θ))² + θ_4 log(z_t)² + θ_5 log(k_t(θ)) log(z_t)        (8.14)

to get θ̂.

Up to the shock, the parameterization we use in the matlab code, reported in table 8.3, is essentially the same as the one we used in the optimal growth model. The shock was artificially assigned a lower persistence and a greater volatility in order to increase the probability that the constraint binds, and therefore illustrate the potential of this approach. λ, the smoothing parameter, was set to 1. The stopping criterion was set at ε = 1e-6, and T = 20000 data points were used to compute the OLS regression.

Table 8.3: Optimal growth: Parameterization

 β      α     δ     ρ     σ_e
 0.95   0.3   0.1   0.8   0.14

Initial conditions were set as in the standard optimal growth model: we first solve the model relying on a loglinear approximation; we then generate a random draw of size T for ε and generate series using the loglinear approximate solution; we then build the needed series to recover a draw for {φ_{t+1}(θ)}, {k_t(θ)} and {z_t}, and run the regression (8.14) to get an initial condition for θ, reported in table 8.4. The algorithm converges after 115 iterations and delivers the final decision rule reported in table 8.4.

Table 8.4: Decision rule

          θ_0      θ_1       θ_2       θ_3       θ_4       θ_5
Initial   0.4620   -0.5760   -0.3909    0.0257    0.0307   -0.0524
Final     0.3558   -0.3289   -0.7182   -0.1201   -0.2168    0.3126

Contrary to the standard optimal growth model, the initial and final rules totally differ, in the sense that the coefficient on the capital stock in the final rule is half that of the initial rule, the coefficient on the shock is double, and the signs of all the quadratic terms are reversed. This should not be surprising, as the initial rule is computed under (i) the certainty equivalence hypothesis and (ii) the assumption that the constraint never binds, whereas the size of the shocks we introduce in the model implies that the constraint binds in 2.8% of the cases. The latter quantity may seem rather small, but it is sufficient to dramatically alter the decision of the central planner when it acts under rational expectations. This is illustrated by figures 8.2 and 8.3, which respectively report the decision rules for investment, capital and the Lagrange multiplier, and a typical path for investment and the Lagrange multiplier.
Figure 8.2: Decision rules (investment, distribution of investment, capital stock, Lagrange multiplier)

Figure 8.3: Typical investment path (investment and Lagrange multiplier over time)

As reflected in the upper right panel of figure 8.2, which reports the simulated distribution of investment, the distribution is highly skewed and exhibits a mode at i_t = 0, revealing the fact that the constraint occasionally binds. This is also illustrated in the lower left panel, which reports the decision rule for the capital stock. As can be seen from this graph, the decision rule is bounded from below by the line (1 − δ)k_t (the grey line on the graph); such situations correspond to a positive Lagrange multiplier, as reported in the lower right panel of the figure.
Matlab Code: PEA Algorithm (Irreversible Investment)

clear all
long  = 20000;
init  = 500;
slong = init+long;
T     = init+1:slong-1;
T1    = init+2:slong;
tol   = 1e-6;
crit  = 1;
gam   = 1;
sigma = 1;
delta = 0.1;
beta  = 0.95;
alpha = 0.3;
ab    = 0;
rho   = 0.8;
se    = 0.125;
kss   = ((1-beta*(1-delta))/(alpha*beta))^(1/(alpha-1));
css   = kss^alpha-delta*kss;
lss   = css^(-sigma);
ysk   = (1-beta*(1-delta))/(alpha*beta);
csy   = 1-delta/ysk;
%
% Simulation of the shock
%
randn('state',1);
e     = se*randn(slong,1);
a     = zeros(slong,1);
a(1)  = ab+e(1);
for i = 2:slong;
   a(i) = rho*a(i-1)+(1-rho)*ab+e(i);
end
%
% Initial guess
%
param = [ab alpha beta delta rho se sigma long init];
b0    = peaoginit(e,param);
%
% Main Loop
%
iter  = 1;
while crit>tol;
   %
   % Simulated path
   %
   k    = zeros(slong+1,1);
   lb   = zeros(slong,1);
   mu   = zeros(slong,1);
   X    = zeros(slong,length(b0));
   k(1) = kss;
   for i = 1:slong;
      X(i,:) = [1 log(k(i)) a(i) log(k(i))*log(k(i)) a(i)*a(i) log(k(i))*a(i)];
      lb(i)  = exp(X(i,:)*b0);
      iv     = exp(a(i))*k(i)^alpha-lb(i)^(-1/sigma);
      if iv>0;
         k(i+1) = (1-delta)*k(i)+iv;
         mu(i)  = 0;
      else
         k(i+1) = (1-delta)*k(i);
         c      = exp(a(i))*k(i)^alpha;
         mu(i)  = c^(-sigma)-lb(i);
      end
   end
   y    = beta*(lb(T1).*(alpha*exp(a(T1)).*k(T1).^(alpha-1)+1-delta) ...
          -mu(T1)*(1-delta));
   bt   = X(T,:)\log(y);
   b    = gam*bt+(1-gam)*b0;
   crit = max(abs(b-b0));
   b0   = b;
   disp(sprintf('Iteration: %d\tConv. crit.: %g',iter,crit))
   iter = iter+1;
end;

8.5 The Household's Problem With Borrowing Constraints

As a final example, we now report the case of a consumer who faces borrowing constraints, such that she solves the program

max_{c_t} E_t Σ_{τ=0}^{∞} β^τ u(c_{t+τ})

s.t.
a_{t+1} = (1 + r)a_t + ω_t − c_t
a_{t+1} ⩾ ā
log(ω_{t+1}) = ρ log(ω_t) + (1 − ρ) log(ω̄) + ε_{t+1}

Let us first recall the first order conditions associated with this problem:

c_t^{−σ} = λ_t        (8.15)
λ_t = μ_t + β(1 + r) E_t λ_{t+1}        (8.16)
a_{t+1} = (1 + r)a_t + ω_t − c_t        (8.17)
log(ω_{t+1}) = ρ log(ω_t) + (1 − ρ) log(ω̄) + ε_{t+1}        (8.18)
μ_t(a_{t+1} − ā) = 0        (8.19)
μ_t ⩾ 0        (8.20)

In order to solve this model, we have to find a rule for both the expectation function

E_t[φ_{t+1}]

where

φ_{t+1} ≡ β R λ_{t+1}

(with R ≡ 1 + r) and for μ_t. We propose to follow the same procedure as the previous one:

1. Compute two sequences for {λ_t(θ)}_{t=0}^{∞} and {a_t(θ)}_{t=0}^{∞} from (8.16) and (8.17) under the assumption that the constraint is not binding, that is μ_t(θ) = 0.

2. Test whether, under this assumption, a_{t+1}(θ) > ā. If it is the case, then set μ_t(θ) = 0; otherwise set a_{t+1}(θ) = ā, compute c_t(θ) from the budget constraint, and recover μ_t(θ) from (8.16).

Note that, using this procedure, μ_t is just treated as an additional variable used to compute a sequence to solve the model. We therefore do not need to compute explicitly its interpolating function. As far as φ_{t+1} is concerned, we use the same type of interpolating function as in the previous example and therefore run a regression of the type

log(φ_{t+1}(θ)) = θ_0 + θ_1 a_t(θ) + θ_2 ω_t + θ_3 a_t(θ)² + θ_4 ω_t² + θ_5 a_t(θ) ω_t        (8.21)

to get θ̂.

The parameterization is reported in table 8.5. λ, the smoothing parameter, was set to 1. The stopping criterion was set at ε = 1e-6, and T = 20000 data points were used to compute the OLS regression.

Table 8.5: Borrowing constraint: Parameterization

 ā    β      σ     ρ     σ_e   R
 0    0.95   1.5   0.7   0.1   1.04
One key issue in this particular problem is related to the initial conditions. Indeed, it is extremely difficult to find a good initial guess, as the only model for which we might get an analytical solution while being related to the present model is the standard permanent income model. Unfortunately, this model exhibits nonstationary behavior, in the sense that it generates an I(1) process for the level of individual wealth and consumption, and therefore for the marginal utility of wealth. We therefore have to take another route and propose the following procedure. For a given a_0 and a sequence {ω_t}_{t=0}^{T}, we generate c_0 = r̃ a_0 + ω_0 + ς_0, where r̃ > r and ς_0 ~ N(0, σ_ς). In practice, we took r̃ = 0.1 and σ_ς = 0.1. We then compute a_1 from the law of motion of wealth. If a_1 < ā, then a_1 is set to ā and c_0 = R a_0 + ω_0 − ā; otherwise c_0 is not modified. We then proceed in exactly the same way for all t > 0. We then have in hand a sequence for both a_t and c_t, and therefore for λ_t. We can then easily recover φ_{t+1} and an initial θ from the regression (8.21) (see table 8.6).
Table 8.6: Decision rule

          θ_0      θ_1       θ_2       θ_3      θ_4      θ_5
Initial   1.6740   -0.6324   -2.1918   0.0133   0.5438   0.2971
Final     1.5046   -0.5719   -2.1792   0.0458   0.7020   0.3159

The algorithm converges after 79 iterations and delivers the final decision rule reported in table 8.6. Note that, although the final decision rule effectively differs from the initial one, the difference is not huge, meaning that our initialization procedure is relevant. Figure 8.4 reports the decision rule of consumption in terms of cash-on-hand, that is the effective amount a household may use to purchase goods (R a_t + ω_t − ā). Figure 8.5 reports the decision rule for wealth accumulation as well as the implied distribution, which admits a mode at ā, revealing that the constraint effectively binds (in 13.7% of the cases).
Figure 8.4: Consumption decision rule (consumption plotted against cash-on-hand, R a_t + ω_t − ā)

Figure 8.5: Wealth accumulation (decision rule for wealth and distribution of wealth)

Matlab Code: PEA Algorithm (Borrowing Constraints)

clear all
crit  = 1;
tol   = 1e-6;
gam   = 1;
long  = 20000;
init  = 500;
slong = long+init;
T     = init+1:slong-1;
T1    = init+2:slong;
rw    = 0.7;
sw    = 0.1;
wb    = 0;
beta  = 0.95;
R     = 1/(beta+0.01);
sigma = 1.5;
ab    = 0;
%
% Simulation of the income process
%
randn('state',1);
e     = sw*randn(slong,1);
w     = zeros(slong,1);
w(1)  = wb+e(1);
for i = 2:slong;
   w(i) = rw*w(i-1)+(1-rw)*wb+e(i);
end
w = exp(w);
%
% Initial guess
%
a     = zeros(slong,1);
c     = zeros(slong,1);
lb    = zeros(slong,1);
X     = zeros(slong,6);
a(1)  = 0;                        % initial wealth
rt    = 0.2;
sc    = 0.1;
randn('state',1234567890);
ec    = sc*randn(slong,1);
for i=1:slong;
   X(i,:) = [1 a(i) w(i) a(i)*a(i) w(i)*w(i) a(i)*w(i)];
   c(i)   = rt*a(i)+w(i)+ec(i);
   a1     = R*a(i)+w(i)-c(i);
   if a1>ab;
      a(i+1) = a1;
   else
      a(i+1) = ab;
      c(i)   = R*a(i)+w(i)-ab;
   end
end
lb = c.^(-sigma);
y  = log(beta*R*lb(T1));
b0 = X(T,:)\y;
%
% Main loop
%
iter = 1;
while crit>tol;
   a    = zeros(slong,1);
   c    = zeros(slong,1);
   lb   = zeros(slong,1);
   X    = zeros(slong,length(b0));
   a(1) = 0;
   for i=1:slong;
      X(i,:) = [1 a(i) w(i) a(i)*a(i) w(i)*w(i) a(i)*w(i)];
      lb(i)  = exp(X(i,:)*b0);
      a1     = R*a(i)+w(i)-lb(i)^(-1/sigma);
      if a1>ab;
         a(i+1) = a1;
         c(i)   = lb(i).^(-1./sigma);
      else
         a(i+1) = ab;
         c(i)   = R*a(i)+w(i)-ab;
         lb(i)  = c(i)^(-sigma);
      end
   end
   y    = log(beta*R*lb(T1));
   b    = X(T,:)\y;
   b    = gam*b+(1-gam)*b0;
   crit = max(abs(b-b0));
   b0   = b;
   disp(sprintf('Iteration: %d\tConv. crit.: %g',iter,crit))
   iter = iter+1;
end;


Bibliography
Marcet, A., Solving Nonlinear Stochastic Models by Parametrizing Expectations, mimeo, CarnegieMellon University 1988.
and D.A. Marshall, Solving Nonlinear Rational Expectations Models
by Parametrized Expectations : Convergence to Stationary Solutions,
Manuscript, Universitat Pompeu Fabra, Barcelone 1994.
and G. Lorenzoni, The Parameterized Expectations Approach: Some
Practical Issues, in M. Marimon and A. Scott, editors, Computational
Methods for the Study of Dynamic Economies, Oxford: Oxford University
Press, 1999.

27


Contents

8 Parameterized expectations algorithm
  8.1 Basics
  8.2 A linear example
  8.3 Standard PEA solution: the Optimal Growth Model
  8.4 PEA and binding constraints: Optimal growth with irreversible investment
  8.5 The Household's Problem With Borrowing Constraints

List of Figures

  8.1 Capital decision rule
  8.2 Decision rules
  8.3 Typical investment path
  8.4 Consumption decision rule
  8.5 Wealth accumulation

List of Tables

  8.1 Optimal growth: Parameterization
  8.2 Decision rule
  8.3 Optimal growth: Parameterization
  8.4 Decision rule
  8.5 Borrowing constraint: Parameterization
  8.6 Decision rule

Lecture Notes 9

Minimum Weighted Residual Methods

The minimum weighted residual method was introduced in economics by Judd [1992]. As for the PEA algorithm, it may receive an interpretation in terms of a generalized method of undetermined coefficients. However, the comparison stops here, as the philosophy of minimum weighted residual methods differs fundamentally from that of PEA, as will become clear in a moment. Nevertheless, as for the PEA algorithm, the basic idea of the method is to approximate either the decision rule (in Judd's original implementation) or the expectation function of the individuals (as proposed by Christiano and Fisher [2000]) by a smooth function, which in most cases will combine orthogonal polynomials. The parameters of the function are then revealed by imposing identifying restrictions dictated by economic theory.

9.1 The Basic idea

9.1.1 Stating the problem

The minimum weighted residual method may be implemented to solve a large set of models that admit the following general representation

E_t F(y_{t+1}, x_{t+1}, y_t, x_t, ε_{t+1}) = 0        (9.1)

where F : R^{n_y} × R^{n_x} × R^{n_y} × R^{n_x} × R^{n_e} → R^{n_y + n_x} describes the model, and ε_t is the set of innovations of the structural shocks that hit the economy. The solution of the model is a set of decision rules for the control variables

y_t = g(x_t)

which define the next period state variables as

x_{t+1} = h(x_t, y_t, ε_{t+1}) = h(x_t, g(x_t), ε_{t+1})

such that the model can be rewritten

E_t R(x_t, ε_{t+1}; g) = 0        (9.2)

The idea of the minimum weighted residual method is to replace the true decision rule by a parametric approximation function, Ψ(x_t; θ), of the current state variables x_t and a vector of parameters θ. Therefore, in its original implementation, the minimum weighted residual method differs from the PEA algorithm in that we are not seeking directly an expectation function but a decision rule.¹ The problem is then to find a vector of parameters θ such that, when the agents use the so defined rule of thumb, E_t R(x_t, ε_{t+1}; Ψ, θ) can be made as small as possible. But what do we mean by small? In fact we want to find a vector of parameters θ̂ = {θ_i}_{i=0}^{n} such that

‖E_t R(x_t, ε_{t+1}; Ψ, {θ_i}_{i=0}^{n})‖ = 0

which corresponds to

∫ E_t R(x_t, ε_{t+1}; Ψ, θ) φ_i(x) dx = 0  for i = 0, …, n        (9.3)

where φ_i(x) is a weighting function, which operationalizes the notion of small.

¹ Note however that we will present a variant of the method that attempts to reveal the expectation function instead.

9.1.2 Implementation

Choosing an implementation of the minimum weighted residual method basically amounts to making 3 choices:

1. the choice of a family of approximating functions,
2. the choice of the weighting function,
3. the choice of the method to approximate the integrals involved by (i) the rational expectation and (ii) the identifying restriction (9.3).
Choosing a family of approximating functions: The choice of a family of approximating functions defines the class of minimum weighted residual method we use. It is commonly admitted to arrange the methods in two classes:

1. Finite element methods,
2. Spectral methods.

The finite element methods can be viewed as a piecewise application of the minimum weighted residual method. This approach assumes that the decision rule can be accurately represented by a collection of small elements of finite dimensions, called finite elements. The original decision rule is then considered as an assemblage of these elements, which are connected to each other at joints, called nodes (or nodal points), to form the entire decision rule. Therefore, the first step involved in solving the model using finite elements is to divide the state space into disjoint intervals, the elements. The method then essentially amounts to fitting low order polynomials or spline functions on each subinterval. Once an approximation is obtained over each subinterval, all approximations are pieced together to form the decision rule over the whole domain of the state space we consider. Typical finite elements are the tent functions

φ_i(x) = (x − x_{i−1}) / (x_i − x_{i−1})   if x ∈ [x_{i−1}; x_i]
       = (x_{i+1} − x) / (x_{i+1} − x_i)   if x ∈ [x_i; x_{i+1}]
       = 0                                 otherwise

in which we therefore recognize spline functions. McGrattan [1996] and McGrattan [1999] discuss and present the implementation of finite element methods in various economic models.
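As an illustration, a matlab version of this tent function might look as follows (the function name and the grid argument are our own; xgrid is any increasing vector of nodal points and i an interior index):

% tent.m: hat function phi_i on an increasing grid of nodes xgrid
function p = tent(x,xgrid,i);
p  = zeros(size(x));
lo = (x>=xgrid(i-1)) & (x<=xgrid(i));    % rising branch
hi = (x>=xgrid(i))   & (x<=xgrid(i+1));  % falling branch
p(lo) = (x(lo)-xgrid(i-1))/(xgrid(i)-xgrid(i-1));
p(hi) = (xgrid(i+1)-x(hi))/(xgrid(i+1)-xgrid(i));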
The spectral methods instead make use of higher order polynomials, usually orthogonal polynomials, but do not divide the domain of approximation into subintervals. The approximation is therefore conducted over the whole interval, which requires the decision rule to be continuous and differentiable. In this case, the typical approximation will look like

y_t = Σ_{i=0}^{n} θ_i φ_i(x_t)

where {φ_i(·)}_{i=0}^{n} is a family of orthogonal polynomials.


Choosing a weighting function: The choice of the weighting function is extremely important, in that it defines the method we will use. Traditionally, we may define 3 broad classes of methods, depending on the weighting function:

1. The least square method sets

φ_i(x) = ∂R(x_t, ε_{t+1}; Ψ, θ) / ∂θ_i

such that the problem amounts to finding θ̂ such that

θ̂ ∈ Argmin_θ ∫ E_t R(x_t, ε_{t+1}; θ)² dx_t

(a short derivation of this equivalence follows the list).

2. The collocation method uses the Dirac function as a weighting function, such that

φ_i(x) = δ(x − x_i) = 1 if x = x_i, 0 if x ≠ x_i

Therefore, by selecting this weighting function, we set the residuals to zero on the nodes x_1, …, x_n, the collocation points, such that the approximation is exact on these nodes. Note that nothing is imposed outside these nodes. When orthogonal polynomials are selected, these nodes are the roots of the polynomial of the highest order, and the method is called orthogonal collocation. Note that in this approximation method we only need n + 1 collocation points to identify the n + 1 elements of θ; otherwise the system is overidentified.

3. The Galerkin method uses the basis functions as weighting functions. For instance, if the approximation is a linear combination of Chebychev polynomials of order 0 to n, the weighting functions are given by the Chebychev polynomials of order 0 to n:

φ_i(x) = T_i(x)

In other words, this amounts to imposing the orthogonality of the residuals vis-à-vis the basis functions we are using. We are actually making use of the fact that a continuous function is identically zero if it is orthogonal to each element of a complete set of functions.
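The least square weighting announced in item 1 can be read directly off the first order conditions of the implicit program (stated loosely, ignoring the interchange of expectation and differentiation): differentiating the objective with respect to θ_i gives

∂/∂θ_i ∫ E_t R(x_t, ε_{t+1}; θ)² dx_t = 2 ∫ E_t R(x_t, ε_{t+1}; θ) (∂R(x_t, ε_{t+1}; θ)/∂θ_i) dx_t = 0

which is exactly the identifying restriction (9.3) with the weighting function set to the gradient of the residual.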
Choosing an integration method: We shall consider two problems here:

1. The computation of the rational expectation: in this case, everything depends on the form of the process for the shocks. If we use Markov chains, the integration problem is highly simplified, as it only involves a discrete sum. If we use a continuous support for the shocks, the choice of an integration method is dictated by the type of distribution we are assuming. In most cases, we will use gaussian shocks, such that we will rely on Gauss-Hermite quadrature methods, described in lecture notes #4.

2. The computation of the inner product (equation (9.3)): in this case, everything depends on the approximation method. If we use a collocation method, no integration method is needed, as collocation amounts to imposing that the residuals are zero at each node, such that (9.3) reduces to

E_t R(x_i, ε_{t+1}; Ψ, {θ_i}_{i=0}^{n}) = 0  for i = 1, …, n

When least square methods are used, it is often the case that a Legendre quadrature will do the job. When a Galerkin method is selected, this will often be the right choice too. However, when we use Chebychev polynomials, a Chebychev quadrature is in order, provided each weighting function is defined as²

φ_i(x) = T_i(φ(x)) / √(1 − φ(x)²)

where φ(x) is a linear function mapping the domain of x into [−1; 1].

² Remember that Chebychev quadrature computes integrals of the form ∫_{−1}^{1} F(x)/√(1 − x²) dx.

9.2 Practical implementation

In the sequel, we will essentially discuss the collocation and Galerkin implementations of a spectral method using Chebychev polynomials, as they seem to be the most popular implementations of this method.³

³ The least square method can be straightforwardly implemented by minimizing the implicit objective function we discussed.

9.2.1 The Collocation method

In this section, we present the collocation method in its simplest form. We will start by presenting the general algorithm, when Chebychev polynomials are used, and then present as an example the stochastic optimal growth model. For the moment, let us assume that we want to solve a rational expectation model that writes as

E_t R(x_t, ε_{t+1}; Ψ, θ) = 0

and we want to find an approximating function for the decision rule g(x_t) over the domain [x̲; x̄]. Assume that we take as approximating function

Ψ(x_t, θ) ≡ Σ_{i=0}^{n} θ_i T_i(φ(x_t))

where T_i(·) is the Chebychev polynomial of order i = 0, …, n. The problem is then to find the vector θ such that

E_t R(x_i, ε_{t+1}; Ψ, θ) = 0  for i = 1, …, n + 1
The algorithm is then as follows:⁴

1. Choose an order of approximation n, compute the n + 1 roots of the Chebychev polynomial of order n + 1 as

z_i = cos((2i − 1)π / (2(n + 1)))  for i = 1, …, n + 1

and formulate an initial guess for θ.

2. Compute x_i as

x_i = x̲ + (z_i + 1)(x̄ − x̲)/2  for i = 1, …, n + 1

to map [−1; 1] into [x̲; x̄].

3. Compute

E_t R(x_i, ε_{t+1}; Ψ, θ)  for i = 1, …, n + 1

4. If it is close enough to zero, then stop and form

Ψ(x_t, θ) ≡ Σ_{i=0}^{n} θ_i T_i(φ(x_t))

else update θ and go back to 3.

⁴ We discuss the one dimensional case. The multidimensional case will be illustrated in an example; you are also referred to lecture notes #3, which presented multidimensional approximation techniques.

The updating scheme for θ will typically be given by a nonlinear solver of the type described in lecture notes #5. The computation of the integral involved by the rational expectation will depend on the process we assume for the shocks. In order to better understand this, let us discuss the implementation of the collocation method on the stochastic optimal growth model, both in the case of a Markov chain and of an AR(1) process.
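Before turning to these examples, steps 1 and 2 of the algorithm can be sketched in a few lines of matlab (the order and the bounds below are illustrative values of ours):

% Chebychev collocation: nodes and basis matrix for an order-n rule
n    = 4;
z    = cos((2*[1:n+1]'-1)*pi/(2*(n+1)));  % roots of T(n+1)
xmin = 0.5; xmax = 2;                     % illustrative bounds
x    = xmin+(z+1)*(xmax-xmin)/2;          % mapped nodes
TX   = cos(acos(z)*[0:n]);                % TX(i,j) = T_(j-1)(z_i)
% an approximation with coefficients th is then evaluated as TX*th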
The stochastic OGM (Markov chain): We will actually consider an OGM in which the production function writes

y_t = exp(a_t) k_t^α

where a_t can take on two values {a̲, ā} with a transition matrix

Π = ( p      1 − p )
    ( 1 − p  p     )

The Euler equation is given by

c_t^{−σ} − β E_t[c_{t+1}^{−σ}(α exp(a_{t+1}) k_{t+1}^{α−1} + 1 − δ)] = 0

and the law of motion of capital by

k_{t+1} = exp(a_t) k_t^α − c_t + (1 − δ)k_t

But, as a matter of fact, by taking the definition of the Markov chain into account, the model may be restated as the following system

c_t(a̲)^{−σ} − β[ p c_{t+1}(a̲)^{−σ}(α exp(a̲) k_{t+1}(a̲)^{α−1} + 1 − δ)
              + (1 − p) c_{t+1}(ā)^{−σ}(α exp(ā) k_{t+1}(a̲)^{α−1} + 1 − δ) ] = 0

c_t(ā)^{−σ} − β[ (1 − p) c_{t+1}(a̲)^{−σ}(α exp(a̲) k_{t+1}(ā)^{α−1} + 1 − δ)
              + p c_{t+1}(ā)^{−σ}(α exp(ā) k_{t+1}(ā)^{α−1} + 1 − δ) ] = 0

k_{t+1}(a̲) − exp(a̲) k_t^α + c_t(a̲) − (1 − δ)k_t = 0
k_{t+1}(ā) − exp(ā) k_t^α + c_t(ā) − (1 − δ)k_t = 0

where the notation c_t(a̲) (resp. c_t(ā)) stands for the fact that the consumption decision is contingent on the realization of the shock, a̲ (resp. ā), and likewise for k_{t+1}(a̲) (resp. k_{t+1}(ā)).
It therefore appears that we actually have only two decision rules to compute, one for c_t(a̲) and one for c_t(ā), each taking the form⁵

c_t(a̲) ≃ Ψ(k_t, θ(a̲)) ≡ exp( Σ_{j=0}^{n} θ_j(a̲) T_j(φ(log(k_t))) )        (9.4)
c_t(ā) ≃ Ψ(k_t, θ(ā)) ≡ exp( Σ_{j=0}^{n} θ_j(ā) T_j(φ(log(k_t))) )        (9.5)

where the notation θ(a) accounts for the fact that the form of the decision rule may differ depending on the realization of the shock a. We might also select different orders of approximation for the two rules, but in order to keep things simple we chose to impose the same order. The algorithm then works as follows.

⁵ The exponential form was imposed in order to guarantee positivity of consumption.
1. Choose an order of approximation n, compute the n + 1 roots of the Chebychev polynomial of order n + 1 as

z_i = cos((2i − 1)π / (2(n + 1)))  for i = 1, …, n + 1

and formulate an initial guess for θ(a̲) and θ(ā).

2. Compute k_i as

k_i = exp( log(k̲) + (z_i + 1)(log(k̄) − log(k̲))/2 )  for i = 1, …, n + 1

to map [−1; 1] into [k̲; k̄].

3. Compute

c_t(a̲) ≃ Ψ(k_i, θ(a̲)) = exp( Σ_{j=0}^{n} θ_j(a̲) T_j(φ(log(k_i))) )
c_t(ā) ≃ Ψ(k_i, θ(ā)) = exp( Σ_{j=0}^{n} θ_j(ā) T_j(φ(log(k_i))) )

and

k_{t+1}(k_i, a̲) = exp(a̲) k_i^α − Ψ(k_i, θ(a̲)) + (1 − δ)k_i
k_{t+1}(k_i, ā) = exp(ā) k_i^α − Ψ(k_i, θ(ā)) + (1 − δ)k_i

at each node k_i, i = 1, …, n + 1.

4. Then compute the possible levels of future consumption:

a_t   a_{t+1}   Consumption
a̲     a̲        Ψ(k_{t+1}(k_i, a̲), θ(a̲)) = exp( Σ_{j=0}^{n} θ_j(a̲) T_j(φ(log(k_{t+1}(k_i, a̲)))) )
a̲     ā        Ψ(k_{t+1}(k_i, a̲), θ(ā)) = exp( Σ_{j=0}^{n} θ_j(ā) T_j(φ(log(k_{t+1}(k_i, a̲)))) )
ā     a̲        Ψ(k_{t+1}(k_i, ā), θ(a̲)) = exp( Σ_{j=0}^{n} θ_j(a̲) T_j(φ(log(k_{t+1}(k_i, ā)))) )
ā     ā        Ψ(k_{t+1}(k_i, ā), θ(ā)) = exp( Σ_{j=0}^{n} θ_j(ā) T_j(φ(log(k_{t+1}(k_i, ā)))) )

5. Evaluate the residuals

R(k_i, a̲; θ) = Ψ(k_i, θ(a̲))^{−σ} − [ p Λ(k_i, a̲, a̲) + (1 − p) Λ(k_i, a̲, ā) ]
R(k_i, ā; θ) = Ψ(k_i, θ(ā))^{−σ} − [ (1 − p) Λ(k_i, ā, a̲) + p Λ(k_i, ā, ā) ]

where

Λ(k_i, a_t, a_{t+1}) ≡ β Ψ(k_{t+1}(k_i, a_t), θ(a_{t+1}))^{−σ}(α exp(a_{t+1}) k_{t+1}(k_i, a_t)^{α−1} + 1 − δ)

for all i = 1, …, n + 1.

6. If all residuals are close enough to zero, then stop, else update θ and go back to 3.

From a practical point of view, the last step is performed using a Newton
algorithm. Initial conditions can be obtained from a linear approximation of
the model (see the matlab codes in directory growth/collocmc).
Matlab Code: Collocation Method (OGM, Main Code)

clear all
global kmin ksup XX kt;             % parameters that will be common to
global nstate nbk ncoef XX XT PI;   % several subfunctions
nbk    = 4;        % Degree of polynomials
nodes  = nbk+1;    % # of Nodes
nstate = 2;        % # of possible states for technology shock
ncoef  = nbk+1;    % # of coefficients
delta  = 0.1;      % depreciation rate
beta   = 0.95;     % discount factor
alpha  = 0.3;      % capital elasticity
sigma  = 1.5;      % parameter of utility
%
% Steady state
%
ysk = (1-beta*(1-delta))/(alpha*beta);
ksy = 1/ysk;
ys  = ksy^(alpha/(1-alpha));
ks  = ys^(1/alpha);
is  = delta*ks;
cs  = ys-is;
%
% Markov Chain: technology shock (Tauchen Hussey)
%
rho = 0.8;
se  = 0.2;
ma  = 0;
[agrid,wmat] = hernodes(nstate);
agrid = agrid*sqrt(2)*se;
PI    = transprob(agrid,wmat,0,rho,se);
at    = agrid+ma;
%
% grid for the capital stock
%
kmin = log(0.1);
ksup = log(6);
rk   = rcheb(nodes);                 % roots
kt   = exp(itransfo(rk,kmin,ksup));  % grid
XX   = cheb(rk,[0:nbk]);             % Polynomials
%
% Initial Conditions
%
a0 = repmat([-0.2 0.65 0.04 0 0],nstate,1);
a0 = a0(:);
%
% Main loop
%
param = [alpha beta delta sigma];
th    = fcsolve('residuals',a0,[],param,at);
th    = reshape(th,ncoef,nstate);

Matlab Code: Residuals Function (OGM)

function res=residuals(theta,param,at);
global nbk kmin ksup XX kt PI nstate;
alpha = param(1);
beta  = param(2);
delta = param(3);
sigma = param(4);
lt    = length(theta);
theta = reshape(theta,lt/nstate,nstate);
RHS   = [];
LHS   = [];
for i=1:nstate;
   ct  = exp(XX*theta(:,i));                    % C(t)
   k1  = exp(at(i))*kt.^alpha+(1-delta)*kt-ct;  % k(t+1)
   rk1 = transfo(log(k1),kmin,ksup);            % k(t+1) in [kmin;ksup]
   xk1 = cheb(rk1,[0:nbk]);                     % polynomials
   %
   % Euler equation for all states
   %
   aux = 0;
   for j=1:nstate;
      c1 = exp(xk1*theta(:,j));                 % c(t+1)
      %
      % Cumulates the expectation term of the Euler equation for all states
      %
      resid = beta*(alpha*exp(at(j))*k1.^(alpha-1)+1-delta).*c1.^(-sigma);
      aux   = aux+PI(i,j)*resid;
   end;
   RHS = [RHS -sigma*log(ct)];
   LHS = [LHS log(aux)];
end;
res = LHS-RHS;    % Note that residuals are taken in logs (easier)
res = res(:);

For the parameterization we use in the codes, and using 4th order Chebychev polynomials, we obtain the decision rules reported in table 9.1. Figure 9.1 reports the decision rules of consumption and next period capital stock. As can be seen, they are (hopefully) alike those obtained from the value iteration method (see lecture notes #7).

Table 9.1: Decision rules (OGM, Collocation)

         θ_0       θ_1      θ_2      θ_3      θ_4
a = a̲   -0.2807   0.7407   0.0521   0.0034   -0.0006
a = ā   -0.4876   0.8406   0.0546   0.0009   -0.0006

Figure 9.1: Decision rules (OGM, Collocation): next period capital stock and consumption
The AR(1) process: In this case, we assume that the technology shock is modelled as

a_{t+1} = ρ a_t + ε_{t+1}

where ε ~ N(0, σ²). The Euler equation is given by

c_t^{−σ} − β E_t[c_{t+1}^{−σ}(α exp(a_{t+1}) k_{t+1}^{α−1} + 1 − δ)] = 0

and the law of motion of capital by

k_{t+1} = exp(a_t) k_t^α − c_t + (1 − δ)k_t

But, as a matter of fact, by taking the definition of the AR(1) process into account, the model may be restated as the following system

c_t^{−σ} − (1/√(2πσ²)) ∫ β c_{t+1}^{−σ}(α exp(ρ a_t + ε_{t+1}) k_{t+1}^{α−1} + 1 − δ) exp(−ε_{t+1}²/(2σ²)) dε_{t+1} = 0
k_{t+1} − exp(a_t) k_t^α + c_t − (1 − δ)k_t = 0

We therefore have to compute the integral in the Euler equation. However, since we have assumed a gaussian distribution for the shock, we may use a Gauss-Hermite quadrature. We therefore make the change of variable z = ε/(σ√2), such that the Euler equation rewrites as

c_t^{−σ} − (1/√π) ∫ β c_{t+1}^{−σ}(α exp(ρ a_t + σ z √2) k_{t+1}^{α−1} + 1 − δ) exp(−z²) dz = 0

which makes more explicit the need for a Gauss-Hermite quadrature. We then just have to compute the nodes z_j and weights ω_j of the quadrature, such that the Euler equation rewrites

c_t^{−σ} − (1/√π) Σ_{j=1}^{q} ω_j β c_{t+1}^{−σ}(α exp(ρ a_t + σ z_j √2) k_{t+1}^{α−1} + 1 − δ) = 0
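The change of variables is easily checked on a case with a known answer, E[exp(ε)] = exp(σ²/2) for ε ~ N(0, σ²), using the standard 3-point Gauss-Hermite nodes and weights:

% 3-point Gauss-Hermite check of E[exp(eps)], eps ~ N(0,se^2)
z  = [-sqrt(3/2);0;sqrt(3/2)];                % Hermite nodes (q=3)
w  = [sqrt(pi)/6;2*sqrt(pi)/3;sqrt(pi)/6];    % associated weights
se = 0.2;
Eg = w'*exp(se*sqrt(2)*z)/sqrt(pi);           % = 1.0202, i.e. exp(se^2/2)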

We now have to formulate a guess for the decision rule. Note that, since we have not specified particular values for the technology shock, we can introduce it into the approximating rule, such that we will take

c_t ≃ Ψ(k_t, a_t, θ) ≡ exp( Σ_{j_k=0}^{n_k} Σ_{j_a=0}^{n_a} θ_{j_k j_a} T_{j_k}(φ_k(log(k_t))) T_{j_a}(φ_a(a_t)) )

Note that, since we use collocation, the number of nodes should be equal to the total number of coefficients. It may then be much easier to work with a tensor basis rather than complete polynomials in this case. The algorithm then works as follows.
1. Choose orders of approximation n_k and n_a for each dimension; compute the n_k + 1 and n_a + 1 roots of the Chebychev polynomials of order n_k + 1 and n_a + 1 as

z_k^i = cos((2i − 1)π / (2(n_k + 1)))  for i = 1, …, n_k + 1
z_a^i = cos((2i − 1)π / (2(n_a + 1)))  for i = 1, …, n_a + 1

and formulate an initial guess for θ.

2. Compute k_i as

k_i = exp( log(k̲) + (z_k^i + 1)(log(k̄) − log(k̲))/2 )  for i = 1, …, n_k + 1

to map [−1; 1] into [k̲; k̄], and

a_i = a̲ + (z_a^i + 1)(ā − a̲)/2  for i = 1, …, n_a + 1

to map [−1; 1] into [a̲; ā].

3. Compute

c_t ≃ Ψ(k_i, a_j, θ) = exp( Σ_{j_k=0}^{n_k} Σ_{j_a=0}^{n_a} θ_{j_k j_a} T_{j_k}(φ_k(log(k_i))) T_{j_a}(φ_a(a_j)) )

and

k_{t+1}(k_i, a_j) = exp(a_j) k_i^α − Ψ(k_i, a_j, θ) + (1 − δ)k_i

at each node (k_i, a_j), i = 1, …, n_k + 1 and j = 1, …, n_a + 1.

4. Then, for each state (k_i, a_j), compute the possible levels of future consumption needed to compute the integral,

c_{t+1} ≃ Ψ(k_{t+1}(k_i, a_j), ρ a_j + σ z_ℓ √2, θ)

for ℓ = 1, …, q.

5. Evaluate the residuals

R(k_i, a_j; θ) = Ψ(k_i, a_j, θ)^{−σ} − (1/√π) Σ_{ℓ=1}^{q} ω_ℓ Λ(k_i, a_j, z_ℓ; θ)

where

Λ(k_i, a_j, z_ℓ; θ) ≡ β Ψ(k_{t+1}(k_i, a_j), ρ a_j + σ z_ℓ √2, θ)^{−σ}(α exp(ρ a_j + σ z_ℓ √2) k_{t+1}(k_i, a_j)^{α−1} + 1 − δ)

for all i = 1, …, n_k + 1 and j = 1, …, n_a + 1.

6. If all residuals are close enough to zero, then stop, else update θ and go back to 3.
From a practical point of view, the last step is performed using a Newton
algorithm. Initial conditions can be obtained from a linear approximation of
the model (see the matlab codes in directory growth/collocg).
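For the two-dimensional problem, each row of the regressor matrix stacks the cross-products T_{j_k}(·)T_{j_a}(·). A tensor-basis row can be obtained with a Kronecker product; this is, up to the ordering of the terms and to any completeness restriction, what we take the makepoly function used below to build (the node values here are illustrative):

% Tensor-product basis row at one node (zk,za)
xk  = cos(acos(0.3)*[0:4]);     % T_0..T_4 at zk = 0.3
xa  = cos(acos(-0.5)*[0:2]);    % T_0..T_2 at za = -0.5
XX0 = kron(xa,xk);              % 1 x 15 row of cross-products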
Matlab Code: Collocation Method (Stochastic OGM, Main Code)
clear all
global nbk kmin ksup XK kt;
global nba amin asup XA at;
global nstate nodea nodek wmat wgrid;
nbk=4;
nba=2;
nodek=nbk+1;
nodea=nba+1;
nstate=12;

%
%
%
%
%

% parameters that will be common to


% several subfunctions
%

Degree of polynomials (capital)


Degree of polynomials (technology shock)
# of nodes for capital stock
# of nodes for technology shock
# of nodes for Gauss-Hermite quadrature

delta
= 0.1;
% depreciation rate
beta
= 0.95;
% discount factor
alpha
= 0.3;
% capital elasticity
sigma
= 1.5;
% parameter of utility
rho
= 0.8;
% persistence of AR(1)
se
= 0.2;
% standard deviation of innovation
ma
= 0;
% average of the process
%
% deterministic steady state
%
ysk
=(1-beta*(1-delta))/(alpha*beta);
ksy
= 1/ysk;
ys
= ksy^(alpha/(1-alpha));
ks
= ys^(1/alpha);

16

is
= delta*ks;
cs
= ys-is;
ab
= 0;
%
% grid for the technology shocks
%
[wgrid,wmat]=hernodes(nstate);
% weights and nodes for quadrature
wgrid
= wgrid*sqrt(2)*se;
amin
= (ma+wgrid(nstate));
asup
= (ma+wgrid(1));
ra
= rcheb(nodea);
% roots
at
= itransfo(ra,amin,asup);
% grid
XA
= cheb(ra,[0:nba]);
% Polynomials
%
% grid for the capital stock
%
kmin
= log(.1);
ksup
= log(6);
rk
= rcheb(nodek);
% roots
kt
= exp(itransfo(rk,kmin,ksup)); % grid
XK
= cheb(rk,[0:nbk]);
% Polynomials
%
% Initial Conditions
%
a0
= [
-0.23759592487257
0.60814488103911
0.03677400318790
0.69025680170443
-0.21654209984197
0.00551243342828
0.03499834613714
-0.00341171507904
-0.00449139656933
0.00085302605779
0.00285737302122
-0.00002348542016
-0.00011606672164
-0.00003323351559
0.00018045618825];
a0
= a0(:);
%
% main part!
%
param
= [alpha beta delta sigma ma rho];
th
= fcsolve(residuals,a0,[],param);

17

Matlab Code: Residuals Function (Stochastic OGM)


function res=residuals(theta,param);
global nbk kmin ksup XK kt;
global nba amin asup XA at;
global nstate nodea nodek wmat wgrid;
alpha
= param(1);
beta
= param(2);
delta
= param(3);
sigma
= param(4);
ma
= param(5);
rho = param(6);
RHS=[];
LHS=[];
XX=[];
for i
= 1:nodek;
for j=1:nodea;
XX0 = makepoly(XA(j,:),XK(i,:));
ct = exp(XX0*theta);
% consumption
XX = [XX;XX0];
k1 = exp(at(j))*kt(i).^alpha+(1-delta)*kt(i)-ct;
% k(t+1)
rk1 = transfo(log(k1),kmin,ksup);
% log(k1) to [-1;1]
xk1 = cheb(rk1,[0:nbk]);
% Cheb. polynomials
a1 = rho*at(j)+(1-rho)*ma+wgrid;
% next period shock
ra1 = transfo(a1,amin,asup);
% a(t+1) to [-1;1]
XA1 = cheb(ra1,[0:nba]);
% Cheb. Polynomials
XX1 = makepoly(XA1,xk1);
c1 = exp(XX1*theta);
% c(t+1)
%
% computes the integral
%
tmp = beta*(alpha*exp(a1)*k1.^(alpha-1)+1-delta).*c1.^(-sigma);
aux = wmat*tmp/sqrt(pi);
RHS = [RHS;log(aux)];
LHS = [LHS;log(ct.^(-sigma))];
end
end;
res = LHS-RHS;
res = res(:);

For the parameterization we use in the codes, and using 4th order Chebychev polynomials for the capital stock and second order polynomials for the technology shock, we obtain the decision rules reported in table 9.2. Figure 9.2 reports the decision rules of consumption and next period capital stock.

Table 9.2: Decision rule (Consumption, OGM, Collocation, AR(1))

           T_0(k_t)   T_1(k_t)   T_2(k_t)   T_3(k_t)   T_4(k_t)
T_0(a_t)   -0.3609     0.7992     0.0487     0.0019    -0.0003
T_1(a_t)    0.6474    -0.2512    -0.0096     0.0048     0.0001
T_2(a_t)    0.0338     0.0103    -0.0058    -0.0004     0.0004

Figure 9.2: Decision rules (OGM, Collocation, AR(1)): next period capital stock and consumption

9.2.2 The Galerkin method

In this section, we present the Galerkin method. As in the previous section, we will start by presenting the general algorithm, when Chebychev polynomials are used, and then present as an example the stochastic optimal growth model. Assume that we want to solve a rational expectation model that writes as

E_t R(x_t, ε_{t+1}; Ψ, θ) = 0
We therefore want to find an approximating function for the decision rule g(x_t) over the domain [x̲; x̄]. Assume that we take as approximating function

Ψ(x_t, θ) ≡ Σ_{i=0}^{n} θ_i T_i(φ(x_t))

where T_i(·) is the Chebychev polynomial of order i = 0, …, n. The problem is then to find the vector θ such that

∫_{−1}^{1} E_t R(x, ε_{t+1}; Ψ, {θ_i}_{i=0}^{n}) φ_i(φ(x)) dφ(x) = 0  for i = 0, …, n

where

φ_i(x) = T_i(φ(x)) / √(1 − φ(x)²)

In fact, the integral may be evaluated using a Gauss-Chebychev quadrature, such that, as the weights of the Gauss-Chebychev quadrature are constant, the problem amounts to solving

Σ_{j=1}^{m} E_t R(x_j, ε_{t+1}; Ψ, {θ_i}_{i=0}^{n}) T_i(φ(x_j)) = 0  for i = 0, …, n

which rewrites

T(φ(x)) R(x, ε_{t+1}; Ψ, θ) = 0

where

T(x) = ( T_0(φ(x_1)) … T_0(φ(x_m)) )
       ( ⋮                    ⋮     )
       ( T_n(φ(x_1)) … T_n(φ(x_m)) )
The algorithm then works as follows:⁶

1. Choose an order of approximation n, compute the m > n roots of the Chebychev polynomial of order m as

z_i = cos((2i − 1)π / (2m))  for i = 1, …, m

and formulate an initial guess for θ.

2. Compute the matrix

T(z) = ( T_0(z_1) … T_0(z_m) )
       ( ⋮               ⋮   )
       ( T_n(z_1) … T_n(z_m) )

3. Compute x_i as

x_i = x̲ + (z_i + 1)(x̄ − x̲)/2  for i = 1, …, m

to map [−1; 1] into [x̲; x̄].

4. Compute

E_t R(x_i, ε_{t+1}; Ψ, θ)  for i = 1, …, m

and evaluate

T(z) R(x, ε_{t+1}; Ψ, θ)

5. If it is close enough to zero, then stop and form

Ψ(x_t, θ) ≡ Σ_{i=0}^{n} θ_i T_i(φ(x_t))

else update θ and go back to 3.

⁶ We discuss the one dimensional case. The multidimensional case will be illustrated in an example; you are also referred to lecture notes #3, which presented multidimensional approximation techniques.

As before, the updated θ in steps 4 and 5 will typically be obtained from a nonlinear solver of the type described in lecture notes #5. Likewise, the computation of the integral involved by the rational expectation will depend on the process we assume for the shocks. In order to better understand this, and as in the previous section, we will discuss the implementation of the method on the stochastic optimal growth model, both in the case of a Markov chain and of an AR(1) process.
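To see the projection mechanics in isolation, consider the following toy sketch, in which the "residual" is deliberately linear in θ so that the orthogonality conditions can be solved in a single least-squares step; with a genuine Euler-equation residual, the same conditions are handed to a nonlinear solver instead (all values here are illustrative):

% Toy Galerkin projection: make TX*th - x.^0.3 orthogonal to the basis
n  = 4; m = 20;                    % order n, m > n quadrature nodes
z  = cos((2*[1:m]'-1)*pi/(2*m));   % Gauss-Chebychev nodes
x  = 0.1+(z+1)*(2-0.1)/2;          % nodes mapped into [0.1;2]
TX = cos(acos(z)*[0:n]);           % basis functions at the nodes
th = (TX'*TX)\(TX'*(x.^0.3));      % solves TX'*(TX*th - x.^0.3) = 0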
The stochastic OGM (Markov chain): We consider the same OGM as in the previous section, such that the production function writes

y_t = exp(a_t) k_t^α

where a_t can take on two values {a̲, ā} with a transition matrix

Π = ( p      1 − p )
    ( 1 − p  p     )

The Euler equation is given by

c_t^{−σ} − β E_t[c_{t+1}^{−σ}(α exp(a_{t+1}) k_{t+1}^{α−1} + 1 − δ)] = 0

and the law of motion of capital by

k_{t+1} = exp(a_t) k_t^α − c_t + (1 − δ)k_t

In fact, the algorithm is exactly the same up to some slight differences that I will just mention:

1. Rather than evaluating the residuals on n + 1 nodes, we evaluate them on m > n nodes.

2. The functions we want to set to zero are not the residuals anymore, as we have to compute the inner product. Therefore, the vector θ solves

T(z) R(k, a̲; θ) = 0
T(z) R(k, ā; θ) = 0
The matlab codes associated with the Galerkin method are basically the same as the ones for the collocation method, up to some minor differences. We use a greater number of nodes (20), such that in the main code the line nodes=nbk+1; now reads nodes=20;, and the residuals are projected on the Chebychev matrix, such that the code for the residuals is slightly modified: the line res=LHS-RHS; is now replaced by res=XX'*(LHS-RHS);. Applying these modifications, we end up with the decision rules reported in table 9.3. These decision rules do not differ much from those obtained from collocation at the 4 digits precision; they actually differ if we report numbers with a greater precision. This illustrates the remarkable capability of the collocation method to capture the main features of the decision rule in this particular example (which is actually related to the smoothness of the decision rules). This would not be the case for other, more sophisticated, models.

Table 9.3: Decision rules (OGM, Galerkin)

         θ_0       θ_1      θ_2      θ_3      θ_4
a = a̲   -0.2807   0.7407   0.0521   0.0034   -0.0005
a = ā   -0.4876   0.8406   0.0546   0.0009   -0.0006

The AR(1) process: Like in the case of a Markov chain, the matlab codes associated with the Galerkin method are basically the same as the ones for the collocation method, up to some minor differences. We use a greater number of nodes (20 for capital and 10 for the shock), such that in the main code the lines nodek=nbk+1; and nodea=nba+1; now read nodek=20; and nodea=10;, and the residuals are projected on the Chebychev matrix, such that the code for the residuals is slightly modified: the line res=LHS-RHS; is now replaced by res=XX'*(LHS-RHS);. Further, we use a complete basis rather than a tensor basis (see the makepoly.m function). The resulting decision rule is reported in table 9.4 (entries are left blank for the cross-products excluded from the complete basis).

Table 9.4: Decision rule parameters

          T_0(k)       T_1(k)       T_2(k)       T_3(k)       T_4(k)
T_0(a)   -0.359970     0.799127     0.048143     0.001764    -0.000247
T_1(a)    0.647039    -0.252257    -0.009799     0.004738
T_2(a)    0.032769     0.009890    -0.005171

9.3 The PEA modification

The PEA modification to the minimum weighted residual method has recently been proposed by Christiano and Fisher [2000]. The basic idea of this modification is to apply exactly the same technique as the one we just described to the expectation function rather than to the decision rule. One very attractive feature of this modification is that it enables us to handle, as easily as in the case of PEA, problems in which there is a possibly binding constraint. Since this modification is only a change in the function we are approximating, I will not elaborate any further on the method, which is the exact application of the previously described methods to the expectation function, but will rather provide detailed examples of application, focusing on the Galerkin method.

9.3.1 The optimal growth model

Let us first recall the type of problem we have in hand. We are about to solve the set of equations

λ_t − β E_t[λ_{t+1}(α exp(a_{t+1}) k_{t+1}^{α−1} + 1 − δ)] = 0
c_t^{−σ} − λ_t = 0
k_{t+1} − exp(a_t) k_t^α + c_t − (1 − δ)k_t = 0
a_{t+1} − ρ a_t − ε_{t+1} = 0

Our problem will therefore be to get an approximation for the expectation function

E_t[β λ_{t+1}(α exp(a_{t+1}) k_{t+1}^{α−1} + 1 − δ)]

In this problem, we deal with a continuous AR(1) process,⁷ such that we have 2 state variables, k_t and a_t, and Ψ(·) should be a function of both k_t and a_t. Like in the standard Galerkin procedure, we will use a guess of the form

Ψ(k_t, a_t, θ) ≡ exp( Σ_{j_k=0,…,n_k; j_a=0,…,n_a; j_a+j_k ⩽ max(n_k,n_a)} θ_{j_k j_a} T_{j_k}(φ_k(log(k_t))) T_{j_a}(φ_a(a_t)) )

therefore imposing a complete basis of polynomials. The algorithm is then pretty much the same as the one for the standard Galerkin procedure with an AR(1) process.

⁷ See directory growth/peagalmc for the matlab code with a Markov chain.
1. Choose orders of approximation n_k and n_a for each dimension; compute the m_k > n_k (n_k + 1 if you use collocation) and m_a > n_a (n_a + 1 for collocation) roots of the Chebychev polynomials of order m_k and m_a as

z_k^i = cos((2i − 1)π / (2m_k))  for i = 1, …, m_k
z_a^i = cos((2i − 1)π / (2m_a))  for i = 1, …, m_a

and formulate an initial guess for θ.

2. Compute k_i as

k_i = exp( log(k̲) + (z_k^i + 1)(log(k̄) − log(k̲))/2 )  for i = 1, …, m_k

to map [−1; 1] into [k̲; k̄], and

a_i = a̲ + (z_a^i + 1)(ā − a̲)/2  for i = 1, …, m_a

to map [−1; 1] into [a̲; ā].

3. Compute the approximating expectation function at each node (k_i, a_j), i = 1, …, m_k and j = 1, …, m_a:

Ψ(k_i, a_j, θ)

4. Deduce the level of consumption, which, from the Euler equation, is given by

c_t(k_i, a_j, θ) = Ψ(k_i, a_j, θ)^{−1/σ}

and the capital stock

k_{t+1}(k_i, a_j) = exp(a_j) k_i^α − c_t(k_i, a_j, θ) + (1 − δ)k_i

5. Then, for each state (k_i, a_j), compute the levels of the expectation function in t + 1 needed to compute the integral,

Ψ(k_{t+1}(k_i, a_j), ρ a_j + σ z_ℓ √2, θ)

and the consumption in t + 1,

c_{t+1}(k_{t+1}(k_i, a_j), ρ a_j + σ z_ℓ √2, θ) = Ψ(k_{t+1}(k_i, a_j), ρ a_j + σ z_ℓ √2, θ)^{−1/σ}

for ℓ = 1, …, q.

6. Evaluate the residuals (Ψ(k_i, a_j, θ) minus the expectation function)

R(k_i, a_j; θ) = Ψ(k_i, a_j, θ) − (1/√π) Σ_{ℓ=1}^{q} ω_ℓ Λ(k_i, a_j, z_ℓ; θ)

where

Λ(k_i, a_j, z_ℓ; θ) ≡ β Ψ(k_{t+1}(k_i, a_j), ρ a_j + σ z_ℓ √2, θ)(α exp(ρ a_j + σ z_ℓ √2) k_{t+1}(k_i, a_j)^{α−1} + 1 − δ)

for all i = 1, …, m_k and j = 1, …, m_a.

7. If all residuals (projected on the polynomials in the case of the standard Galerkin procedure) are close enough to zero, then stop, else update θ and go back to 3.

The implementation of this version of the problem is extremely close to the standard Galerkin (collocation) procedure, such that you are left to the matlab code (in the /growth/peagalg and /growth/peagalmc directories).
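To make the mapping from the expectation function to the residual concrete, here is a minimal sketch of steps 3 to 6 at a single node, with a 3-point Gauss-Hermite rule and a basis deliberately reduced to a constant and a log(k) term; all numerical values are our own illustrative choices:

% One residual evaluation in the PEA-Galerkin scheme at node (ki,aj)
alpha=0.3; beta=0.95; delta=0.1; sigma=1.5; rho=0.8; se=0.075;
th  = [0.1;-0.5];                           % illustrative coefficients
z   = [-sqrt(3/2);0;sqrt(3/2)];             % Gauss-Hermite nodes (q=3)
w   = [sqrt(pi)/6;2*sqrt(pi)/3;sqrt(pi)/6]; % associated weights
ki  = 2; aj = 0;
Psi = exp([1 log(ki)]*th);                  % expectation function (step 3)
c   = Psi^(-1/sigma);                       % consumption (step 4)
k1  = exp(aj)*ki^alpha-c+(1-delta)*ki;      % next period capital
a1  = rho*aj+se*sqrt(2)*z;                  % future shocks at the nodes
Psi1= exp([ones(3,1) log(k1)*ones(3,1)]*th);% expectation function in t+1
Lam = beta*Psi1.*(alpha*exp(a1)*k1^(alpha-1)+1-delta);
R   = Psi-w'*Lam/sqrt(pi);                  % residual (step 6)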

At this point, it may not be clear why the PEA modification to the minimum weighted residual method is interesting, as it does not differ much from the standard implementation. However, it has the attractive feature of being able to deal with binding constraints extremely easily. As an example of implementation, we shall now study the stochastic optimal growth model with an irreversibility in the investment decision.⁸

9.3.2 Irreversible investment

We now consider a variation of the previous model, in the sense that we restrict gross investment to be positive in each and every period:

i_t ⩾ 0 ⟺ k_{t+1} ⩾ (1 − δ)k_t        (9.6)

This assumption amounts to assuming that there does not exist a second hand market for capital. In such a case, the problem of the central planner is to determine consumption and capital accumulation such that utility is maximal:

max_{{c_t, k_{t+1}}_{t=0}^{∞}} E_0 Σ_{t=0}^{∞} β^t (c_t^{1−σ} − 1)/(1 − σ)

s.t.
k_{t+1} = exp(a_t) k_t^α − c_t + (1 − δ)k_t
and
k_{t+1} ⩾ (1 − δ)k_t

Forming the Lagrangean associated with the previous problem, we have

L_t = E_t Σ_{τ=0}^{∞} β^τ [ (c_{t+τ}^{1−σ} − 1)/(1 − σ) + λ_{t+τ}(exp(a_{t+τ}) k_{t+τ}^α + (1 − δ)k_{t+τ} − c_{t+τ} − k_{t+τ+1}) + μ_{t+τ}(k_{t+τ+1} − (1 − δ)k_{t+τ}) ]

⁸ You will also find in the matlab codes a version of the problem of a consumer facing a borrowing constraint, in the directory borrow.

which leads to the following set of first order conditions

c_t^{−σ} = λ_t        (9.7)
λ_t − μ_t = β E_t[λ_{t+1}(α exp(a_{t+1}) k_{t+1}^{α−1} + 1 − δ) − μ_{t+1}(1 − δ)]        (9.8)
k_{t+1} = exp(a_t) k_t^α − c_t + (1 − δ)k_t        (9.9)
μ_t(k_{t+1} − (1 − δ)k_t) = 0        (9.10)

The main difference with the previous example is that now the central planner faces a constraint that may be binding in each and every period. This complicates the algorithm a little bit: we have to find a rule for both the expectation function

E_t[φ_{t+1}]

where

φ_{t+1} ≡ β[λ_{t+1}(α exp(a_{t+1}) k_{t+1}^{α−1} + 1 − δ) − μ_{t+1}(1 − δ)]

and for μ_t. We then proceed as we did in the PEA algorithm: we compute an approximation to the expectation function and to the Lagrange multiplier, checking at each node whether the constraint is binding or not. In order to keep things simple, we will discuss the case where the shock follows a Markov chain.⁹ The rule of thumb will then be such that

E_t[φ_{t+1}] ≃ Ψ(k_t, a_ℓ; θ) ≡ exp( Σ_{j=0}^{n_k} θ_j(a_ℓ) T_j(φ(log(k_t))) )  for each a_ℓ, ℓ = 1, …, n_a

where n_a denotes the number of possible states for a.

The algorithm is very close to the one we used in the case of PEA, and closely follows ideas suggested in Marcet and Lorenzoni [1999] and Christiano and Fisher [2000]. In the case of a Markov chain, and for the Galerkin procedure, it can be sketched as follows.

⁹ You are left to study the matlab code located in /irrev/peagalg for the continuous AR(1) case.

1. Choose an order of approximation n_k and a number of possible states n_a for the shock; compute the m_k > n_k roots of the Chebychev polynomial of order m_k as

z_i = cos((2i − 1)π / (2m_k))  for i = 1, …, m_k

compute the parameters of the Markov chain (transition matrix Π = [π_{ℓν}]), and formulate an initial guess for θ.

2. Compute the matrix

T(z) = ( T_0(z_1)     … T_0(z_{m_k})     )
       ( ⋮                        ⋮      )
       ( T_{n_k}(z_1) … T_{n_k}(z_{m_k}) )

3. Compute k_i as

k_i = exp( log(k̲) + (z_i + 1)(log(k̄) − log(k̲))/2 )  for i = 1, …, m_k

to map [−1; 1] into [k̲; k̄].

4. Compute the approximating expectation function at each node k_i, i = 1, …, m_k, and for each possible state a_ℓ, ℓ = 1, …, n_a:

Ψ(k_i, a_ℓ; θ) = exp( Σ_{j=0}^{n_k} θ_j(a_ℓ) T_j(φ(log(k_i))) )

5. Deduce the level of consumption, which, from the Euler equation, is given by

c_t(k_i, a_ℓ, θ) = Ψ(k_i, a_ℓ; θ)^{−1/σ}

the level of investment

i_t(k_i, a_ℓ; θ) = exp(a_ℓ) k_i^α − c_t(k_i, a_ℓ, θ)

and the level of the next period capital stock

k_{t+1}(k_i, a_ℓ; θ) = i_t(k_i, a_ℓ; θ) + (1 − δ)k_i

6. Check whether the constraint is binding for each k_i, i = 1, …, m_k, and each a_ℓ, ℓ = 1, …, n_a:

If the constraint is not binding, keep the computed values for c_t(k_i, a_ℓ, θ) and k_{t+1}(k_i, a_ℓ; θ) and set

μ_t(k_i, a_ℓ; θ) = 0

If it is binding, set

k_{t+1}(k_i, a_ℓ; θ) = (1 − δ)k_i
c_t(k_i, a_ℓ, θ) = exp(a_ℓ) k_i^α
λ_t(k_i, a_ℓ; θ) = c_t(k_i, a_ℓ; θ)^{−σ}
μ_t(k_i, a_ℓ; θ) = λ_t(k_i, a_ℓ; θ) − Ψ(k_i, a_ℓ; θ)

7. Then, for each state (k_i, a_ℓ), i = 1, …, m_k and ℓ = 1, …, n_a, compute the possible levels of future consumption without taking the constraint into account,

c_{t+1}(k_{t+1}(k_i, a_ℓ; θ), a_ν; θ) = Ψ(k_{t+1}(k_i, a_ℓ; θ), a_ν; θ)^{−1/σ}

for ν = 1, …, n_a, and the level of next period investment

i_{t+1}(k_{t+1}(k_i, a_ℓ; θ), a_ν; θ) = exp(a_ν) k_{t+1}(k_i, a_ℓ; θ)^α − c_{t+1}(k_{t+1}(k_i, a_ℓ; θ), a_ν; θ)

8. Check whether the constraint is binding at each position:

If the constraint is not binding, keep the computed values for c_{t+1}(k_{t+1}(k_i, a_ℓ; θ), a_ν; θ) and set

μ_{t+1}(k_{t+1}(k_i, a_ℓ; θ), a_ν; θ) = 0

If it is binding, set

c_{t+1}(k_{t+1}(k_i, a_ℓ; θ), a_ν; θ) = exp(a_ν) k_{t+1}(k_i, a_ℓ; θ)^α
λ_{t+1}(k_{t+1}(k_i, a_ℓ; θ), a_ν; θ) = c_{t+1}(k_{t+1}(k_i, a_ℓ; θ), a_ν; θ)^{−σ}
μ_{t+1}(k_{t+1}(k_i, a_ℓ; θ), a_ν; θ) = λ_{t+1}(k_{t+1}(k_i, a_ℓ; θ), a_ν; θ) − Ψ(k_{t+1}(k_i, a_ℓ; θ), a_ν; θ)

9. Evaluate the residuals (Ψ(k_i, a_ℓ; θ) minus the expectation function)

R(k_i, a_ℓ; θ) = Ψ(k_i, a_ℓ; θ) − Σ_{ν=1}^{n_a} π_{ℓν} Λ(k_i, a_ℓ, a_ν; θ)

where

Λ(k_i, a_ℓ, a_ν; θ) ≡ β[ c_{t+1}(k_{t+1}(k_i, a_ℓ; θ), a_ν; θ)^{−σ}(α exp(a_ν) k_{t+1}(k_i, a_ℓ; θ)^{α−1} + 1 − δ) − μ_{t+1}(k_{t+1}(k_i, a_ℓ; θ), a_ν; θ)(1 − δ) ]

for all i = 1, …, m_k and ℓ = 1, …, n_a.

10. Compute the inner product involved by the Galerkin procedure,

T(φ(k)) R(k, a; θ)

11. If all inner products are close enough to zero, then stop, else update θ and go back to 3.

Note that, as in the standard PEA algorithm, we treat μ_t as a technical variable which is just used to compute the residuals. We therefore do not need to compute explicitly its interpolating function.
Matlab Code: PEA Galerkin (Irreversible Investment)
clear all
global kmin ksup XX kt;
% parameters that will be common to
global nstate nbk ncoef XX XT PI; % several subfunctions
nbk=10;
nodes=30;
nstate=5;
ncoef=nbk+1;
delta
beta
alpha
sigma
ysk
ksy

%
%
%
%

Degree of polynomials
# of Nodes
# of possible states for the shock
# of coefficients

= 0.1;
% depreciation rate
= 0.95;
% discount factor
= 0.3;
% capital elasticity
= 1.5;
% CRRA Utility
=(1-beta*(1-delta))/(alpha*beta);
= 1/ysk;

31

ys
= ksy^(alpha/(1-alpha));
ks
= ys^(1/alpha);
is
= delta*ks;
cs
= ys-is;
%
% Markov Chain: (Tauchen-Hussey 1991) technology shock
%
rho
= 0.8;
% persistence
se
= 0.075;
% volatility
ma
= 0;
[agrid,wmat]=hernodes(nstate);
agrid
= agrid*sqrt(2)*se;
PI
= transprob(agrid,wmat,0,rho,se);
at
= agrid+ma;
%
% grid for the capital stock
%
kmin
= log(1);
ksup
= log(7);
rk
= rcheb(nodes);
% roots
kt
= itransfo(rk,kmin,ksup);
% grid
XX
= cheb(rk,[0:nbk]);
% Polynomials
%
% Initial Conditions
%
a0=[
-0.32518704
-0.22798091
-0.15021991
-0.12387075
-0.61535041
-0.64879442
-0.70031744
-0.84056683
-0.02892469
-0.03369096
-0.05570433
-0.15063263
-0.00565378
-0.00941495
-0.02618621
-0.09526710
-0.00296230
-0.00547789
-0.01692339
-0.06102182
-0.00145337
-0.00274142
-0.00988611
-0.03461623
-0.00065225
-0.00115070
-0.00536657
-0.01776343
-0.00025245
-0.00038302
-0.00291504
-0.00838555
-0.00004845
-0.00001360
-0.00162292
-0.00365588
0.00001150
0.00013398
-0.00088759
-0.00124601
-0.00003248
0.00013595
-0.00055706
-0.00013116
];
%
% Solves the problem effectively
%
param
= [alpha beta delta sigma];
th
= fcsolve(residuals,a0,[],param,at);
th
= reshape(th,ncoef,nstate);
%
% Evaluate the approximation
%
lt
= length(th);
nb
= 100;

32

-0.23608269
-1.23998020
-0.45685868
-0.32031660
-0.21319594
-0.13635186
-0.09027963
-0.06431768
-0.04689460
-0.03294968
-0.02163670

kt
= [kmin:(ksup-kmin)/(nb-1):ksup];
rk
= transfo(kt,kmin,ksup);
XX
= cheb(rk(:),[0:nbk]);
kt
= exp(kt);
kp
= zeros(nb,nstate);
it
= zeros(nb,nstate);
ct
= zeros(nb,nstate);
mu
= zeros(nb,nstate);
for i=1:nstate;
Upsilon = exp(XX*th(:,i));
y
= exp(at(i))*kt.^alpha;
it(:,i) = max(y-Upsilon.^(-1/sigma),0);
ct(:,i) = y-it(:,i);
kp(:,i) = it(:,i)+(1-delta)*kt;
mu(:,i) = ct(:,i).^(-sigma)-Ephi;
end;
Matlab Code: Residuals Function (Irreversible Investment)
function res=residuals(theta,param,at);
global nbk kmin ksup XX kt PI nstate;
alpha
= param(1);
beta
= param(2);
delta
= param(3);
sigma
= param(4);
lt
= length(theta);
theta
= reshape(theta,lt/nstate,nstate);
RHS=[];LHS=[];
% lhs and rhs of Euler equation
for i=1:nstate;
Upsilon = exp(XX*theta(:,i));
% evaluate expectation
y
= exp(at(i))*exp(kt).^alpha;
% Output
iv
= max(y-Ephi.^(-1/sigma),0);
% takes care of the constraint
k1
= (1-delta)*exp(kt)+iv;
% next period capital stock
rk1
= transfo(log(k1),kmin,ksup);
% maps it into [-1;1]
xk1
= cheb(rk1,[0:nbk]);
% Computes the polynomials
aux
= 0;
for j=1:nstate;
Upsilon1 = exp(xk1*theta(:,j));
% used for the consumption
y1
= exp(at(j))*k1.^alpha;
% next period output
mu1
= max(y1.^(-sigma)-Ephi1,0); % mu>0 <=> lambda=(c=y)^(-sigma)
tmp
= beta*((alpha*y1./k1+1-delta).*Upsilon1-mu1*(1-delta));
aux
= aux+PI(i,j)*tmp;
end;
RHS = [RHS (aux)];
LHS = [LHS (Ephi)];
end;
res = XX*(LHS-RHS);
% Galerkins projection
res = res(:);

33

Once again an important issue in this problem is that of initial conditions.


The initial conditions reported in the matlab codes are actually extremely
close to the solution. They were obtained using a homotopy type of approach.
In praxis, we started from the deterministic optimal growth model without any
irreversibility, and get a decision rule using an approximation of order 10. We
then assign the decision rule we obtained in this case to a stochastic optimal
growth model, with 5 possible states but low volatility (0.01), and obtained a
new solution. Then we introduced the irreversibility constrained, and starting
from the set of initial conditions we had, we get an approximate solution for the
optimal growth model with irreversibility but with a low volatility. Starting
from this new solution, we solved the same problem with higher volatility
(0.02) and applied again the process until we reached the desired volatility
(se=0.075). At a first sight, this may seem quite a long and useless process,
but it saves a lot of time as it is much easier to go from a problem for which
we have a solution to the problem we actually want to solve than trying to
guess initial conditions.
Table 9.5: Decision rules parameters (Irreversible investment, PEAGalerkin)

T0 (log(kt ))
T1 (log(kt ))
T2 (log(kt ))
T3 (log(kt ))
T4 (log(kt ))
T5 (log(kt ))
T6 (log(kt ))
T7 (log(kt ))
T8 (log(kt ))
T9 (log(kt ))
T1 0(log(kt ))

a1
-0.271633
-0.617972
-0.020999
-0.000673
-0.000403
-0.000251
-0.000073
-0.000031
-0.000042
-0.000028
-0.000008

a2
-0.198195
-0.639585
-0.021617
-0.001298
-0.000913
-0.000670
-0.000312
-0.000103
-0.000104
-0.000093
-0.000032

a3
-0.131885
-0.662475
-0.024964
-0.004093
-0.002900
-0.002002
-0.001192
-0.000519
-0.000363
-0.000359
-0.000182

a4
-0.076585
-0.701972
-0.042518
-0.017748
-0.012222
-0.007493
-0.004215
-0.001893
-0.000923
-0.000890
-0.000557

a5
-0.044720
-0.786584
-0.094447
-0.053895
-0.032597
-0.015666
-0.005548
-0.000922
0.000202
-0.000995
-0.001279

Table 9.5 and Figure 9.3 report the approximate decision rules for an economy where = 0.3, = 0.95, = 0.1, = 1.5, the persistence of the shock
34

is = 0.8 and the volatility = 0.075. As can be seen, only low values
for the shock make the constraint binds. This is illustrated by the fact that
the Lagrange multiplier becomes positive, and investment becomes negative.
Therefore, the next period capital stock is given by what is left by the depreciation and output is fully consumed.
Figure 9.3: Decision rules (Irreversible investment, PEAGalerkin)
7

Next period capital stock

Consumption

6
5

1.5

4
3

2
1
0

4
k

0.5
0

4
k

Investment

0.5

0.3

0.4

Lagrange multiplier

0.2

0.3
0.1
0.2
0

0.1
0
0

4
k

0.1
0

4
k

Note that up to now, we have not address the important issue of the
evaluation of the rule we obtained using any of the method we have presented
so far. This is the object of the next lecture notes.

35

36

Bibliography
Christiano, L.J. and J.D.M. Fisher, Algorithms for solving dynamic models
with occasionally binding constraints, Journal of Economic Dynamics
and Control, 2000, 24 (8), 11791232.
Judd, K., Projection Methods for Solving Aggregate Growth Models, Journal
of Economic Theory, 1992, 58, 410452.
Marcet, A. and G. Lorenzoni, The Parameterized Expectations Approach:
Some Practical Issues, in M. Marimon and A. Scott, editors, Computational Methods for the Study of Dynamic Economies, Oxford: Oxford
University Press, 1999.
McGrattan, E., Solving the Stochastic Growth Model with a Finite Element
Method, Journal of Economic Dynamics and Control, 1996, 20, 1942.
, Application of Weighted Residual Methods to Dynamic Economic Models, in M. Marimon and A. Scott, editors, Computational Methods for the
Study of Dynamic Economies, Oxford: Oxford University Press, 1999.

37

38

Contents
9 Minimum Weighted Residual Methods
9.1

9.2

9.3

The Basic idea . . . . . . . . . . . . . . . . . . . . . . . . . . .

9.1.1

Stating the problem . . . . . . . . . . . . . . . . . . . .

9.1.2

Implementation . . . . . . . . . . . . . . . . . . . . . . .

Practical implementation . . . . . . . . . . . . . . . . . . . . .

9.2.1

The Collocation method . . . . . . . . . . . . . . . . . .

9.2.2

The Galerkin method . . . . . . . . . . . . . . . . . . .

19

The PEA modification . . . . . . . . . . . . . . . . . . . . . . .

24

9.3.1

The optimal growth model . . . . . . . . . . . . . . . .

24

9.3.2

Irreversible investment . . . . . . . . . . . . . . . . . . .

27

39

40

List of Figures
9.1

Decision rules (OGM, Collocation) . . . . . . . . . . . . . . . .

13

9.2

Decision rules (OGM, Collocation, AR(1)) . . . . . . . . . . . .

19

9.3

Decision rules (Irreversible investment, PEAGalerkin) . . . . .

35

41

42

List of Tables
9.1

Decision rules (OGM, Collocation) . . . . . . . . . . . . . . . .

13

9.2

Decision rule (Consumption, OGM, Collocation, AR(1)) . . . .

19

9.3

Decision rules (OGM, Galerkin) . . . . . . . . . . . . . . . . . .

23

9.4

Decision rule parameters . . . . . . . . . . . . . . . . . . . . . .

23

9.5

Decision rules parameters (Irreversible investment, PEAGalerkin) 34

43