
Martin Ellison

1 Motivation

Dynamic programming is one of the most fundamental building blocks of modern macroeconomics. It gives us the tools and techniques to analyse (usually numerically but often analytically) a whole class of models in which the problems faced by economic agents have a recursive nature. Recursive problems pervade macroeconomics: any model in which agents face repeated decision problems tends to have a recursive formulation. This lecture introduces two key concepts: the value function and value function iterations. To fully understand the intuition of dynamic programming, we begin with simple models that are deterministic. Models which are stochastic and nonlinear will be considered in future lectures.

2 Key reading

This lecture draws on the material in chapters 2 and 3 of “Dynamic Economics: Quantitative Methods and Applications” by Jérôme Adda and Russell Cooper, MIT Press, 2003. We will also use the book later in the course. It is a very accessible introduction to techniques for dynamic economics, covering topics including consumption, investment and employment. As it is very reasonably priced, I recommend this text for purchase.

3 Other reading

The same material is covered in several textbooks, most notably “Recursive Macroeconomic Theory”, 2nd ed., by Lars Ljungqvist and Tom Sargent, MIT Press, 2000, and the original grandfather text “Recursive Methods in Economic Dynamics” by Nancy Stokey and Robert Lucas, 1989. These cover the material with greater mathematical rigour, whereas Adda and Cooper place greater weight on intuitive understanding.

At the heart of dynamic programming is the value function, which shows the value of a particular state of the world. For example, what is the value of having income $I$ when there are two goods in the economy, $x_1$ and $x_2$, at prices $p_1$ and $p_2$, and utility is logarithmic and separable? To answer this question, we begin by asking how the consumer would allocate income $I$ across the two goods. The utility maximisation problem faced by the consumer is

$$\max_{x_1, x_2} \; \left[\omega \ln x_1 + (1 - \omega) \ln x_2\right] \quad \text{s.t.} \quad p_1 x_1 + p_2 x_2 = I$$

Taking first order conditions gives the familiar solution in which there are constant expenditure shares.

$$p_1 x_1 = \omega I, \qquad p_2 x_2 = (1 - \omega) I$$

Solving these two equations for $x_1$ and $x_2$, we can then substitute into the utility function to obtain

$$V(I, p_1, p_2) = \omega \ln\left(\frac{\omega I}{p_1}\right) + (1 - \omega) \ln\left(\frac{(1 - \omega) I}{p_2}\right)$$

Students of microeconomics will immediately recognise this as the indirect utility function. It describes how utility depends on the state variables $I$, $p_1$ and $p_2$. However, in dynamic programming terminology, we refer to it as the value function: the value associated with the state variables. Note that it is intrinsic to the value function that the agent (in this case the consumer) is optimising. More generally, we can write

$$V(I, p) = \max_{c \in C} u(c) \quad \text{s.t.} \quad pc = I$$

where $c$ is a vector of goods and $p$ is the vector of their corresponding prices. This formulation makes it explicit that the value function incorporates optimisation. Whilst the example in this section is trivial, we will see that recasting economic problems in terms of value functions turns out to be extremely powerful.
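As a quick numerical check of the closed-form value function, the Python sketch below compares $V(I, p_1, p_2)$ with a brute-force search over the share of income spent on good 1. The function names and the parameter values ($I = 10$, $p_1 = 2$, $p_2 = 5$, $\omega = 0.3$) are illustrative assumptions, not taken from the lecture.

```python
import math


def value_closed_form(I, p1, p2, omega):
    """Indirect utility V(I, p1, p2) implied by constant expenditure shares."""
    return (omega * math.log(omega * I / p1)
            + (1 - omega) * math.log((1 - omega) * I / p2))


def value_brute_force(I, p1, p2, omega, n=10_000):
    """Maximise omega*ln(x1) + (1-omega)*ln(x2) subject to p1*x1 + p2*x2 = I
    by trying every expenditure share k/n spent on good 1."""
    best_value, best_share = -math.inf, None
    for k in range(1, n):  # shares strictly between 0 and 1 keep both logs finite
        share = k / n
        x1 = share * I / p1
        x2 = (1 - share) * I / p2
        u = omega * math.log(x1) + (1 - omega) * math.log(x2)
        if u > best_value:
            best_value, best_share = u, share
    return best_value, best_share


V_exact = value_closed_form(I=10.0, p1=2.0, p2=5.0, omega=0.3)
V_search, share = value_brute_force(I=10.0, p1=2.0, p2=5.0, omega=0.3)
print(V_exact, V_search, share)  # the best share spent on good 1 equals omega
```

The search recovers both the value and the expenditure share $\omega$, which is what the first order conditions predict.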

5 Cake-eating example

To introduce dynamics to the problem, we now consider how quickly one should eat a cake of given size. Imagine the cake is initially of size $W_1$ and all cake should be eaten by the end of period $T$ (by which time presumably either the cake has become mouldy or the consumer has died and become mouldy!). Instantaneous utility derived from eating cake is given by the function $u(c_t)$ and the consumer discounts future utility by the factor $\beta$. This is a finite-horizon dynamic problem with discounting.

$$\max_{\{c_t\}} \sum_{t=1}^{T} \beta^{t-1} u(c_t) \quad \text{s.t.} \quad W_{t+1} = W_t - c_t, \quad W_1 \text{ given}$$

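In macroeconomics, the preferred solution method is one of direct attack. We solve the budget constraint forward to obtain

$$\sum_{t=1}^{T} c_t + W_{T+1} = W_1$$

and form the Lagrangian

$$\mathcal{L} = \sum_{t=1}^{T} \beta^{t-1} u(c_t) + \lambda \left[ W_1 - W_{T+1} - \sum_{t=1}^{T} c_t \right]$$

whose first order condition for each $c_t$ is

$$\beta^{t-1} u'(c_t) = \lambda$$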

Alternatively, combining the conditions for consecutive periods $t$ and $t+1$,

$$u'(c_t) = \beta u'(c_{t+1})$$

This is the familiar Euler equation, equating the net present value of marginal utility of consumption across consecutive time periods. On its own, it is not sufficient to determine uniquely how the cake should be eaten. For that, we also require $W_{T+1} = 0$, a terminal condition stating that no cake should be left over after period $T$.
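To make the direct attack concrete, the Python sketch below assumes logarithmic utility, $u(c) = \ln c$ (an illustrative choice; the lecture keeps $u$ general until the numerical section). The first order condition then gives $c_t = \beta^{t-1}/\lambda$, and the terminal condition $W_{T+1} = 0$ pins down $\lambda$, so that $c_t = \beta^{t-1} W_1 (1-\beta)/(1-\beta^T)$. The code builds this path for arbitrary illustrative values of $W_1$, $\beta$ and $T$ and checks the Euler equation and the terminal condition.

```python
def direct_attack_log(W1, beta, T):
    """Optimal consumption path c_1..c_T for u(c) = ln(c).

    From beta**(t-1) / c_t = lam and sum(c_t) = W1 (terminal condition W_{T+1} = 0),
    c_t = beta**(t-1) * W1 * (1 - beta) / (1 - beta**T).
    """
    c1 = W1 * (1 - beta) / (1 - beta ** T)
    return [c1 * beta ** (t - 1) for t in range(1, T + 1)]


W1, beta, T = 1.0, 0.9, 10
path = direct_attack_log(W1, beta, T)

# Terminal condition: the whole cake is eaten, so W_{T+1} = W1 - sum(c_t) = 0.
leftover = W1 - sum(path)

# Euler equation u'(c_t) = beta * u'(c_{t+1}) with u'(c) = 1/c.
euler_errors = [1 / path[t] - beta / path[t + 1] for t in range(T - 1)]
print(leftover, max(abs(e) for e in euler_errors))
```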

To solve the problem using the method of dynamic programming, we define a value function $V_T(W_1)$ to be the maximised value of the problem solved above with the method of direct attack, i.e.,

$$V_T(W_1) = \max_{\{c_t\}} \sum_{t=1}^{T} \beta^{t-1} u(c_t) \quad \text{s.t.} \quad W_{t+1} = W_t - c_t, \quad W_1 \text{ given}$$

Consider the first derivative of this value function, $V_T'(W_1)$. An increment in the initial cake size $W_1$ allows consumption in any period to increase; therefore, $V_T'(W_1) = \beta^{t-1} u'(c_t)$ for any $t$. It does not matter in which period the extra cake is eaten since, due to optimality, the return (in terms of the value function) of eating extra cake is equalised across periods.
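The claim that $V_T'(W_1) = \beta^{t-1} u'(c_t)$ for every $t$ can be checked numerically. The sketch below again assumes $u(c) = \ln c$, evaluates $V_T$ along the closed-form optimal path, and compares a centred finite-difference derivative with the discounted marginal utilities in each period. The helper names and the step size `h` are illustrative choices.

```python
import math


def optimal_path(W, beta, T):
    """Closed-form optimal path for u(c) = ln(c), from the direct attack."""
    c1 = W * (1 - beta) / (1 - beta ** T)
    return [c1 * beta ** (t - 1) for t in range(1, T + 1)]


def V_T(W, beta, T):
    """Value function: discounted utility of the optimal path from cake W."""
    path = optimal_path(W, beta, T)
    return sum(beta ** (t - 1) * math.log(c) for t, c in enumerate(path, start=1))


W1, beta, T, h = 1.0, 0.9, 6, 1e-6
dV = (V_T(W1 + h, beta, T) - V_T(W1 - h, beta, T)) / (2 * h)

# beta**(t-1) * u'(c_t) is the same number (the multiplier lambda) in every period.
path = optimal_path(W1, beta, T)
discounted_mu = [beta ** (t - 1) / c for t, c in enumerate(path, start=1)]
print(dV, discounted_mu)
```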

The power of dynamic programming becomes apparent when we add an additional period 0 to our problem. The problem at time 0 is to solve

$$\max_{c_0} \; \left[ u(c_0) + \beta V_T(W_1) \right] \quad \text{s.t.} \quad W_1 = W_0 - c_0, \quad W_0 \text{ given}$$

This is a simple problem to solve because we only have to choose $c_0$ rather than a whole time path for consumption $\{c_t\}$. The first order condition is simply

$$u'(c_0) = \beta V_T'(W_1)$$

However, we know from before that $V_T'(W_1) = \beta^{t-1} u'(c_t) = u'(c_1)$, so we can conclude

$$u'(c_0) = \beta u'(c_1)$$

and we have derived the Euler equation using the dynamic programming method. Notice how we did not need to worry about decisions from time $t = 1$ onwards. This is an example of Bellman's principle of optimality: it is sufficient to optimise today conditional on future behaviour being optimal.
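The period-0 argument can also be checked numerically. Under the same illustrative assumption $u(c) = \ln c$, the sketch below maximises $u(c_0) + \beta V_T(W_0 - c_0)$ over $c_0$ alone, using a golden-section search (a generic one-dimensional optimiser chosen here for convenience, not part of the lecture). The result should match the first-period consumption of the full $(T+1)$-period problem solved by direct attack, and satisfy $u'(c_0) = \beta u'(c_1)$. The values $W_0 = 1$, $\beta = 0.9$ and $T = 5$ are arbitrary.

```python
import math


def V_T(W, beta, T):
    """T-period value of cake W under u(c) = ln(c), from the direct-attack path."""
    c1 = W * (1 - beta) / (1 - beta ** T)
    return sum(beta ** (t - 1) * math.log(c1 * beta ** (t - 1)) for t in range(1, T + 1))


def golden_section_max(f, lo, hi, tol=1e-10):
    """Maximise a unimodal function f on the open interval (lo, hi)."""
    inv_phi = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    while b - a > tol:
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if f(c) > f(d):
            b = d  # the maximum lies in [a, d]
        else:
            a = c  # the maximum lies in [c, b]
    return (a + b) / 2


W0, beta, T = 1.0, 0.9, 5

# Period-0 problem: choose c0 only, treating behaviour from t = 1 on as optimal.
c0 = golden_section_max(lambda c: math.log(c) + beta * V_T(W0 - c, beta, T), 0.0, W0)

# Benchmark: first-period consumption of the (T+1)-period problem by direct attack.
c0_direct = W0 * (1 - beta) / (1 - beta ** (T + 1))

# Euler equation u'(c0) = beta * u'(c1), where c1 is optimal given W1 = W0 - c0.
c1 = (W0 - c0) * (1 - beta) / (1 - beta ** T)
print(c0, c0_direct, 1 / c0, beta / c1)
```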

The ease with which we did this is of course illusory, because we already knew the form of $V_T'(W_1)$ from the direct attack approach. In general, this will not be the case and we will not know the exact form of the value function or its first derivative. Fortunately, this is not an insurmountable problem. Our approach will be to make a first guess at the value function and then perform successive value function iterations until our guesses converge on the true value function. The next section is devoted to showing how these value function iterations are carried out and under what conditions they converge to the true value function.

We illustrate the convergence of value function iterations to the true value function in a general formulation of the dynamic programming problem. The key ingredients are a payoff function $\tilde{\sigma}(s_t, c_t)$ and a transition equation $s_{t+1} = \tau(s_t, c_t)$. The payoff function describes the instantaneous return from choosing a vector of controls $c_t$ at a given vector of states $s_t$. In the cake-eating example, $\tilde{\sigma}(\cdot)$ is simply the (direct) utility function. The transition equation describes the evolution of the vector of state variables $s_t$. For the cake-eating example, $\tau(\cdot)$ is the budget constraint $W_{t+1} = W_t - c_t$. Of crucial importance for the remainder of this course is that $\tilde{\sigma}(\cdot)$ and $\tau(\cdot)$ are not time-dependent, so the problem is stationary. This ensures that the problem has a recursive representation. In other words, for a given state vector, the problem faced by the agent is always the same.

The value function is now defined as the value of having a particular state $s$ (we remove the time index and use $s$ and $s'$ to denote the state vector in adjacent time periods).

$$V(s) = \max_{c \in C(s)} \left[ \tilde{\sigma}(s, c) + \beta V(s') \right] \quad \text{s.t.} \quad s' = \tau(s, c)$$

$C(s)$ is the set of all possible choices for the controls $c$ for a given state vector $s$. To make the notation more compact, we invert the transition equation to define the control $c$ as a function of current and future states, enabling the payoff function to be written as $\sigma(s, s')$. Instead of choosing $c$, the agent chooses the future state $s'$ from the set of feasible states $\Gamma(s)$.

$$V(s) = \max_{s' \in \Gamma(s)} \left[ \sigma(s, s') + \beta V(s') \right]$$

The problem now is to find the value function $V(s)$. In some cases, it is possible to make an intuitive guess at its form (e.g. quadratic in the state variables) and then proceed via the method of undetermined coefficients to show that the guess is consistent with optimality. The approach we take through value function iterations is more general, although it leads to a numerical rather than analytical solution. We begin by making an initial guess of the value function, $W(\cdot)$. The next guess of the value function is obtained by applying an operator $T$, defined as follows:

$$T(W)(s) = \max_{s' \in \Gamma(s)} \left[ \sigma(s, s') + \beta W(s') \right]$$

To put the iteration in words, in each iteration we reoptimise the choice of the future state $s'$. In doing this, we need to know how a change in the future state affects the payoff this period and in all future periods (the latter is often known as the continuation value). For the payoff this period, we can use the function $\sigma(s, s')$. For the payoff in future periods, we use the previous guess of the value function, $W(s')$. In effect, we reoptimise our choice of $s'$ assuming that $W(s')$ is a correct representation of the true value function from the next period onwards. Clearly it is not, but successive iterations will generally converge, so that $W(s')$ becomes the true value function $V(s')$ and the guess of the value function no longer changes between successive value function iterations.
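The operator $T$ and the iteration it drives can be sketched in a few lines of Python for a problem with finitely many states. This is a minimal sketch under stated assumptions: states are indexed $0, \dots, n-1$, `feasible(s)` returns $\Gamma(s)$ as indices, `payoff(s, s_next)` plays the role of $\sigma(s, s')$, and iterations stop once successive guesses differ by less than `tol` in the sup norm (a stopping rule chosen here for illustration). The example problem, a cake grid with $u(c) = \sqrt{c}$ so that eating nothing has a finite payoff, is hypothetical.

```python
import math


def bellman_operator(W, feasible, payoff, beta):
    """One application of T: T(W)(s) = max over s' in Gamma(s) of sigma(s, s') + beta * W(s')."""
    new_W, policy = [], []
    for s in range(len(W)):
        best_value, best_next = -math.inf, None
        for s_next in feasible(s):
            value = payoff(s, s_next) + beta * W[s_next]  # payoff today + continuation value
            if value > best_value:
                best_value, best_next = value, s_next
        new_W.append(best_value)
        policy.append(best_next)
    return new_W, policy


def value_function_iteration(W, feasible, payoff, beta, tol=1e-6, max_iter=10_000):
    """Apply T repeatedly until the guess of the value function stops changing."""
    for n in range(1, max_iter + 1):
        new_W, policy = bellman_operator(W, feasible, payoff, beta)
        gap = max(abs(a - b) for a, b in zip(new_W, W))
        W = new_W
        if gap < tol:
            return W, policy, n
    raise RuntimeError("value function iterations did not converge")


# Hypothetical example: the state is the cake size on the grid 0, 0.02, ..., 1.
GRID = [i / 50 for i in range(51)]


def feasible(s):
    """Gamma(s): any cake size up to the current one may be left for next period."""
    return range(s + 1)


def payoff(s, s_next):
    """sigma(s, s'): utility sqrt(c) from eating c = GRID[s] - GRID[s_next]."""
    return math.sqrt(GRID[s] - GRID[s_next])


V, policy, n_iter = value_function_iteration([0.0] * len(GRID), feasible, payoff, beta=0.9)
print(n_iter, V[-1], GRID[-1] - GRID[policy[-1]])  # iterations, V(1), consumption at s = 1
```

Because the stopping rule compares successive guesses, the returned `V` is an approximate fixed point: applying `bellman_operator` once more changes it by less than `tol`.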

How do we know that value function iterations will converge? Even if they do, how do we know that they converge to the unique value function? To answer these questions we need a fixed point theorem, since we wish to show that value function iterations converge to the unique fixed point defined by

$$V(s) = \max_{s' \in \Gamma(s)} \left[ \sigma(s, s') + \beta V(s') \right]$$

There are many fixed point theorems, some more useful than others. For our purposes, the most useful result is known as Blackwell’s sufficiency conditions. These conditions ensure that $T$ is a contraction mapping, which in turn guarantees convergence to a unique fixed point. The conditions are 1) monotonicity and 2) discounting of the $T$ operator. The mathematics behind these conditions can be found in Stokey and Lucas and many other places. Here, we will concentrate on gaining intuition into why Blackwell’s sufficiency conditions guarantee convergence of value function iterations to the unique true value function.

To illustrate the intuition, we describe a simple example in which the state space collapses to a single point. In this example, there is no possibility to change state, so there is no control or optimisation decision. However, there is a value associated with the (unique) state, so we can still illustrate the operation of value function iterations. In our simple example the value function will be a constant. If the initial guess of the value function is $W$, then the next guess of the value function is obtained by applying the operator

$$T(W) = \sigma + \beta W$$

Notice that the assumption of a collapsed state space removes the state dependency of the payoff function $\sigma(\cdot)$. It is a simple matter to plot the mapping graphically.

[Figure: the mapping $T(W) = \sigma + \beta W$ plotted against $W$, with the fixed point $V$]

This mapping is a contraction mapping, as the span of $W$ is contracted at each iteration by the $T$ operator. The key condition for convergence in the simple model is $|\beta| < 1$, which is guaranteed by discounting. Graphically, this ensures that the $T$ mapping cuts the 45° line from above and with a gradient of modulus less than 1.
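The collapsed-state example is easy to reproduce. The Python sketch below applies $T(W) = \sigma + \beta W$ repeatedly from two different starting guesses; the values $\sigma = 1$ and $\beta = 0.5$ and the starting points are arbitrary illustrations. Both sequences approach the fixed point $V = \sigma/(1-\beta)$, and the gap between them shrinks by the factor $\beta$ at every iteration, which is the contraction property described above.

```python
def iterate_T(W0, sigma, beta, n_iter):
    """Return the guesses W0, T(W0), T(T(W0)), ... for T(W) = sigma + beta * W."""
    guesses = [W0]
    for _ in range(n_iter):
        guesses.append(sigma + beta * guesses[-1])
    return guesses


sigma, beta = 1.0, 0.5
V = sigma / (1 - beta)  # fixed point: V = sigma + beta * V

high = iterate_T(10.0, sigma, beta, 40)   # start above V
low = iterate_T(-6.0, sigma, beta, 40)    # start below V
spans = [h - l for h, l in zip(high, low)]

print(high[:4])            # [10.0, 6.0, 4.0, 3.0]
print(spans[:4])           # [16.0, 8.0, 4.0, 2.0]
print(abs(high[-1] - V))   # close to 0
```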

Once we move to problems with a fully specified state space, the operator $T$ is applied to a function $W(s')$ rather than a constant $W$. In this case, discounting is not a sufficient condition for unique convergence of value function iterations to the true value function. Intuitively, what we require is that the $T$ mapping cuts the 45° surface from above in every direction and that the dynamics are stable. The condition of monotonicity ensures this. The original Blackwell paper from 1965 contains a formal proof, as do Stokey and Lucas.

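It is easy to see that Blackwell’s sufficiency conditions apply to the dynamic programming problems we will study. Monotonicity requires that if $W(s) \geq Q(s)$ for all $s \in S$, then $T(W)(s) \geq T(Q)(s)$ for all $s \in S$. This is guaranteed because our problem is one of maximisation. We have

$$T(W)(s) = \max_{s' \in \Gamma(s)} \left[ \sigma(s, s') + \beta W(s') \right] \geq \sigma(s, s'') + \beta W(s'') \geq \sigma(s, s'') + \beta Q(s'') = T(Q)(s)$$

where $s''$ is the future state chosen when $T$ is applied to $Q$ in the previous iteration. Discounting requires that adding a constant $k \geq 0$ to the value function raises the result of the operator by at most $\beta k$, i.e. $T(W + k)(s) \leq T(W)(s) + \beta k$ with $0 \leq \beta < 1$. This is satisfied trivially in our model because

$$T(W + k)(s) = \max_{s' \in \Gamma(s)} \left[ \sigma(s, s') + \beta W(s') + \beta k \right] = T(W)(s) + \beta k$$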
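Both of Blackwell’s conditions can be checked numerically for a concrete operator. The Python sketch below uses a hypothetical cake-eating grid with $u(c) = \sqrt{c}$ and $\Gamma(s) = \{s' \le s\}$ (illustrative choices, not from the lecture), draws an arbitrary guess $W$ and a pointwise smaller $Q$, and verifies monotonicity, the discounting property for a constant $k$, and the resulting contraction bound $\max_s |T(W)(s) - T(Q)(s)| \le \beta \max_s |W(s) - Q(s)|$. A check on one random draw illustrates the conditions; it is not a proof.

```python
import math
import random

GRID = [i / 20 for i in range(21)]  # hypothetical cake sizes 0, 0.05, ..., 1
BETA = 0.9


def T(W):
    """Bellman operator T(W)(s) = max over s' <= s of sqrt(GRID[s] - GRID[s']) + BETA * W[s']."""
    return [max(math.sqrt(GRID[s] - GRID[j]) + BETA * W[j] for j in range(s + 1))
            for s in range(len(GRID))]


rng = random.Random(0)                       # fixed seed for reproducibility
W = [rng.uniform(-5.0, 5.0) for _ in GRID]   # an arbitrary guess of the value function
Q = [w - rng.uniform(0.0, 1.0) for w in W]   # Q(s) <= W(s) for every s
k = 2.5

TW, TQ = T(W), T(Q)
TWk = T([w + k for w in W])

monotone = all(tw >= tq for tw, tq in zip(TW, TQ))
discounting = all(twk <= tw + BETA * k + 1e-12 for twk, tw in zip(TWk, TW))
contraction = (max(abs(a - b) for a, b in zip(TW, TQ))
               <= BETA * max(abs(w - q) for w, q in zip(W, Q)) + 1e-12)
print(monotone, discounting, contraction)  # True True True
```

The small tolerance `1e-12` only absorbs floating-point rounding; in exact arithmetic the discounting property holds with equality, as in the derivation above.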

7 Numerical example

In this final section we show how to apply the principles of dynamic programming to the cake-eating problem in practice. We discuss the Matlab program available from the (very preliminary) homepage of the Adda and Cooper book at http://www.eco.utexas.edu/~cooper/dynprog/dynprog1.html. This program iterates the value function and derives an optimal policy function.

Initialise the program by clearing the workspace and defining the number of value function iterations and the discount factor.

clear;
dimIter=30;
beta=0.75;

Next, define K as a row vector of possible cake sizes with 101 elements, starting from 0 and increasing in steps of 0.01 to 1. We store the row and column size of K in rowK and colK.

K=0:0.01:1;
[rowK,colK]=size(K);

V is a matrix that stores the results of the value function iterations. The rows of V correspond to the possible cake sizes defined in K. The columns of V contain successive value function iterations. The initial guess of the value function is zero for all sizes of cake.

V=zeros(colK,dimIter);

Begin with the first value function iteration and continue until the desired number of iterations has been completed.

for iter=1:dimIter

Define aux as an auxiliary matrix with one row and one column for each element of the cake-size vector K. We will use this matrix to store the value of choosing to leave K(ik2) cake for the next period when the current size of the cake is K(ik). We will actually only use the lower left triangle of the aux matrix, since it is impossible to leave more cake in the future than you have at present, i.e. K(ik2) ≤ K(ik). Entries that are never filled remain NaN.

aux=zeros(colK,colK)+NaN;

Beginning with the first possible current cake size, loop through all possible cake sizes.

for ik=1:colK;

For each current cake size K(ik), we examine the value of leaving each possible future cake size K(ik2) < K(ik). We ignore the possibility K(ik2) = K(ik), since that would imply zero consumption and therefore starvation (utility of minus infinity) under logarithmic utility.

for ik2=1:(ik-1)

The value of choosing K(ik2) when the current cake has size K(ik) is stored in the (ik, ik2) element of aux. It consists of two parts: the (here logarithmic) payoff function log(K(ik) − K(ik2)) and the discounted continuation value beta*V(ik2, iter). Note that this uses the value function V(ik2, iter) from the previous iteration.

aux(ik,ik2)=log(K(ik)-K(ik2))+beta*V(ik2,iter);
end
end

The newly iterated value function is derived by choosing the best future cake K(ik2) for each current cake K(ik). Simply looking for the maximum value of each row of aux (or, equivalently, the maximum value of each column of aux') is sufficient to find the optimal cake choices. Because max ignores NaN entries by default, only feasible choices are compared.

V(:,iter+1)=max(aux')';

Loop to the next value function iteration until the last iteration, dimIter, is reached.

end

The value function iterations are now complete. The final value function is stored in Val, with the corresponding indices of future cake choices in Ind. optK converts these indices into actual cake sizes, with optC the necessary consumption.

[Val,Ind]=max(aux');
optK=K(Ind);
optK=optK+Val*0;   % adding Val*0 sets optK to NaN where no feasible choice exists
optC=K'-optK';

figure(1)
plot(K,V);
xlabel('Size of Cake');
ylabel('Value Function');
figure(2)
plot(K,optC,'LineWidth',2)
hold on
plot(K,K','--r',...
    'LineWidth',2)
xlabel('Size of Cake');
ylabel('Optimal Consumption');
text(0.4,0.65,'45 degree line','FontSize',18)
text(0.4,0.13,'Optimal Consumption','FontSize',18)
legend('Optimal Consumption','45 degree line','Location','northwest')

The output of the code is shown below. In the first figure, the value function clearly converges over successive iterations. The second figure shows the optimal consumption as a function of current cake size.

[Figure 1: successive value function iterations; x-axis: Size of Cake (0 to 1), y-axis: Value Function]

[Figure 2: Optimal Consumption plotted with the 45 degree line; x-axis: Size of Cake (0 to 1), y-axis: Optimal Consumption]
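For readers without Matlab, the program above can be sketched in Python as follows. This is a translation under stated assumptions rather than the Adda and Cooper code itself: it uses plain lists instead of matrices, stores minus infinity (the limit of log utility at zero consumption) where the Matlab version leaves NaN, and omits the plotting. The function name `cake_vfi` is hypothetical.

```python
import math


def cake_vfi(dim_iter=30, beta=0.75, n_grid=101):
    """Value function iteration for cake eating with u(c) = log(c) on the grid
    K = 0, 0.01, ..., 1, following the structure of the Matlab program."""
    K = [i / (n_grid - 1) for i in range(n_grid)]   # possible cake sizes
    V = [0.0] * n_grid                              # initial guess: zero for every cake size
    history = [V]                                   # one entry per iteration, like the columns of V
    policy = [None] * n_grid
    for _ in range(dim_iter):
        new_V = [-math.inf] * n_grid
        policy = [None] * n_grid
        for ik in range(n_grid):                    # current cake K[ik]
            for ik2 in range(ik):                   # future cake K[ik2] < K[ik], so consumption > 0
                value = math.log(K[ik] - K[ik2]) + beta * V[ik2]
                if value > new_V[ik]:
                    new_V[ik], policy[ik] = value, ik2
        V = new_V
        history.append(V)
    opt_K = [K[j] if j is not None else math.nan for j in policy]  # optimal cake to leave
    opt_C = [k - ok for k, ok in zip(K, opt_K)]                        # optimal consumption
    return K, history, opt_C


K, history, opt_C = cake_vfi()
V = history[-1]
print(V[-1], opt_C[-1])  # value and optimal consumption for a cake of size 1
```

Starting from a zero guess, iteration $n$ gives the value of an $n$-period problem, so on this grid a cake must have at least $n$ units of 0.01 to be split into $n$ positive portions; smaller cakes keep a value of minus infinity, which mirrors the NaN entries of the Matlab version.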
