
DYNAMIC PROGRAMMING

Dynamic Programming
Dynamic programming is a useful mathematical technique for making a sequence of interrelated decisions.
It provides a systematic procedure for determining the optimal combination of decisions.
There is no standard mathematical formulation of the dynamic programming problem.
Knowing when to apply dynamic programming depends largely on experience with its general structure.

206

Prototype example

Stagecoach problem
A fortune seeker wants to travel from Missouri to California in the mid-19th century.
The journey has 4 stages. A is Missouri and J is California.
The cost of each route is its life-insurance premium, so the lowest-cost route corresponds to the safest trip.
207

Costs
Cost cij of going from state i to state j is:

Problem: which route minimizes the total cost of the policy?

208

Solving the problem


Note that the greedy approach does not work: the greedy solution A → B → F → I → J has a total cost of 13.
However, sacrificing a little on one stage may allow greater savings later; e.g., A → D → F is cheaper than A → B → F.
Another possibility is trial-and-error, but it requires too much effort even for this simple problem.
Dynamic programming is much more efficient than exhaustive enumeration, especially for large problems.
It usually starts from the last stage of the problem and enlarges it one stage at a time.

209

Formulation
Decision variables xn (n = 1, 2, 3, 4) are the immediate destination at stage n.
The route is A → x1 → x2 → x3 → x4, where x4 = J.

fn(s, xn) is the total cost of the best overall policy for the remaining stages, given that the actual state is s, ready to start stage n, and xn is selected as the immediate destination.

xn* minimizes fn(s, xn), and fn*(s) is the minimum value of fn(s, xn):

$$f_n^*(s) = \min_{x_n} f_n(s, x_n) = f_n(s, x_n^*)$$

210

Formulation
where

$$f_n(s, x_n) = \text{immediate cost (stage } n\text{)} + \text{minimum future cost (stages } n+1 \text{ onward)} = c_{s x_n} + f_{n+1}^*(x_n)$$

The value of csxn is given by cij with i = s (current state) and j = xn (immediate destination).
Objective: find f1*(A) and the corresponding route.
Dynamic programming finds successively f4*(s), f3*(s),
f2*(s) and finally f1*(A).

211

Solution procedure
When n = 4, the route is determined by its current
state s (H or I) and its final destination J.
Since f4*(s) = f4(s, J) = csJ, the solution for n = 4 is

s    f4*(s)    x4*
H    3         J
I    4         J

212

Stage n = 3
This stage needs a few calculations. If the fortune seeker is in state F, he can go to either H or I, with costs cF,H = 6 or cF,I = 3.
Choosing H, the minimum additional cost is f4*(H) = 3, so the total cost is 6 + 3 = 9.
Choosing I, the total cost is 3 + 4 = 7. This is smaller, so it is the optimal choice for state F.
213

Stage n = 3
Similar calculations can be made for the two other possible states, s = E and s = G, resulting in the table for n = 3:

                  f3(s, x3) = csx3 + f4*(x3)
s       x3 = H    x3 = I    f3*(s)    x3*
E       4         8         4         H
F       9         7         7         I
G       6         7         6         H

214

Stage n = 2
In this case, f2(s, x2) = csx2 + f3*(x2).
Example for node C:
x2 = E: f2(C, E) = cC,E + f3*(E) = 3 + 4 = 7 (optimal)
x2 = F: f2(C, F) = cC,F + f3*(F) = 2 + 7 = 9
x2 = G: f2(C, G) = cC,G + f3*(G) = 4 + 6 = 10
215

Stage n = 2
Similar calculations can be made for the two remaining states, s = B and s = D, resulting in the table for n = 2:

                  f2(s, x2) = csx2 + f3*(x2)
s       x2 = E    x2 = F    x2 = G    f2*(s)    x2*
B       11        11        12        11        E or F
C       7         9         10        7         E
D       8         8         11        8         E or F

216

Stage n = 1
Just one possible starting state: A.
x1 = B: f1(A, B) = cA,B + f2*(B) = 2 + 11 = 13.
x1 = C: f1(A, C) = cA,C + f2*(C) = 4 + 7 = 11 (optimal)
x1 = D: f1(A, D) = cA,D + f2*(D) = 3 + 8 = 11 (optimal)

Resulting in the table:

                  f1(s, x1) = csx1 + f2*(x1)
s       x1 = B    x1 = C    x1 = D    f1*(s)    x1*
A       13        11        11        11        C or D

217

Optimal solution
There are three optimal solutions, all with f1*(A) = 11: A → C → E → H → J, A → D → E → H → J, and A → D → F → I → J.

218
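The backward recursion just illustrated is easy to carry out in code. The sketch below is a minimal Python version; since the cost table itself is not reproduced in these notes, the `cost` dictionary holds assumed link costs (chosen to be consistent with the stage values derived above), so treat it as an illustration of the procedure rather than the slides' own data.

```python
# Minimal sketch of the backward recursion f_n*(s) = min_x { c_{s,x} + f_{n+1}*(x) }.
# The cost data below is assumed for illustration (the original cost table is not
# reproduced in these notes); the algorithmic structure is the point.
cost = {
    'A': {'B': 2, 'C': 4, 'D': 3},
    'B': {'E': 7, 'F': 4, 'G': 6},
    'C': {'E': 3, 'F': 2, 'G': 4},
    'D': {'E': 4, 'F': 1, 'G': 5},
    'E': {'H': 1, 'I': 4},
    'F': {'H': 6, 'I': 3},
    'G': {'H': 3, 'I': 3},
    'H': {'J': 3},
    'I': {'J': 4},
}

stages = [['A'], ['B', 'C', 'D'], ['E', 'F', 'G'], ['H', 'I'], ['J']]

f = {'J': 0.0}      # f*(J) = 0: nothing left to pay at the destination
best = {}           # best[s] = list of optimal immediate destinations from state s

# Move backward, one stage at a time (stages 4, 3, 2, 1 in the slides' numbering).
for stage in reversed(stages[:-1]):
    for s in stage:
        values = {x: cost[s][x] + f[x] for x in cost[s]}
        f[s] = min(values.values())
        best[s] = [x for x, v in values.items() if v == f[s]]

print(f['A'], best)   # minimum total cost from A and the optimal choices per state
```

With these assumed costs the run reproduces f1*(A) = 11 and the ties (C or D at stage 1, E or F from B and D) seen in the tables above.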

Characteristics of DP
1. The problem can be divided into stages, with a policy

decision required at each stage.

Example: 4 stages and life insurance policy to choose.


Dynamic programming problems require making a
sequence of interrelated decisions.
2. Each stage has a number of states associated with

the beginning of each stage.

Example: states are the possible territories where the


fortune seeker could be located.
States are possible conditions in which the system
might be.
219

Characteristics of DP
3. Policy decision transforms the current state to a state

associated with the beginning of the next stage.

Example: the fortune seeker's decision led him from his current state to the next state on his journey.
DP problems can be interpreted in terms of networks: each node corresponds to a state.
Value assigned to each link is the immediate
contribution to the objective function from making
that policy decision.
Objective corresponds to finding the shortest or the
longest path.
220

Characteristics of DP
4. The solution procedure finds an optimal policy for

the overall problem. Finds a prescription of the


optimal policy decision at each stage for each of the
possible states.
Example: solution procedure constructed a table for
each stage, n, that prescribed the optimal decision, xn*,
for each possible state s.
In addition to identifying optimal solutions, DP provides a policy prescription of what to do under every possible circumstance (which is why a decision is called a policy decision).

221

Characteristics of DP
5. Given the current state, an optimal policy for the

remaining stages is independent of the policy


decisions adopted in previous stages.

The optimal immediate decision depends only on the current state: this is the principle of optimality for DP.
Example: at any state, the best policy from then on is independent of how the fortune seeker got there.
Knowledge of the current state conveys all the information necessary for determining the optimal policy henceforth (Markovian property).

222

Characteristics of DP
6. Solution procedure begins by finding the optimal

policy for the last stage. Solution is usually trivial.


7. A recursive relationship that identifies optimal policy
for stage n, given optimal policy for stage n + 1, is
available.
Example: the recursive relationship was

$$f_n^*(s) = \min_{x_n} \left\{ c_{s x_n} + f_{n+1}^*(x_n) \right\}$$

Recursive relationship differs somewhat among


dynamic programming problems.

223

Characteristics of DP
7. (cont.) Notation:
N = number of stages.
n = label for the current stage (n = 1, 2, ..., N).
sn = current state for stage n.
xn = decision variable for stage n.
xn* = optimal value of xn (given sn).
fn(sn, xn) = contribution of stages n, n + 1, ..., N to the objective function if the system starts in state sn at stage n, the immediate decision is xn, and optimal decisions are made thereafter.
fn*(sn) = fn(sn, xn*).

224

Characteristics of DP
7. (cont.) recursive relationship:
$$f_n^*(s_n) = \max_{x_n} f_n(s_n, x_n) \quad \text{or} \quad f_n^*(s_n) = \min_{x_n} f_n(s_n, x_n),$$

where fn(sn, xn) is written in terms of sn, xn and f*n+1(sn+1).

8. Using the recursive relationship, the solution procedure starts at the end and moves backward stage by stage.
It stops when it finds the optimal policy starting at the initial stage; the optimal policy for the entire problem has then been found.
Example: the tables constructed for the successive stages illustrate this procedure.

225
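As a summary of characteristics 6 through 8, the following is a minimal, generic Python sketch of the backward solution procedure (minimization form). The names solve_backward, decisions and step are illustrative, not from the slides; each concrete problem supplies its own stages, states, feasible decisions and stage contributions.

```python
def solve_backward(states, decisions, step):
    """Generic backward DP sketch (minimization form).

    states[n]       -- list of possible states at stage n, for n = 0, ..., N
                       (states[N] are terminal states, valued at 0)
    decisions(n, s) -- iterable of feasible decisions x_n in state s at stage n
    step(n, s, x)   -- (immediate contribution, next state) for decision x in state s
    """
    N = len(states) - 1
    f = {(N, s): 0.0 for s in states[N]}        # boundary condition at the final stage
    policy = {}
    for n in range(N - 1, -1, -1):              # move backward, one stage at a time
        for s in states[n]:
            best_val, best_x = float("inf"), None
            for x in decisions(n, s):
                c, s_next = step(n, s, x)
                value = c + f[(n + 1, s_next)]
                if value < best_val:
                    best_val, best_x = value, x
            f[(n, s)] = best_val
            policy[(n, s)] = best_x
    return f, policy
```

For the stagecoach problem, for example, states[n] would be the letter nodes of stage n and step(n, s, x) would return (c_{s,x}, x).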

Deterministic dynamic programming


Deterministic problems: the state at the next stage is
completely determined by the state and the policy
decision at the current stage.

Form of the objective function: minimize or maximize


the sum, product, etc. of the contributions from the
individual stages.
Set of states: may be discrete or continuous, or a state
vector. Decision variables can also be discrete or
continuous.
226

Example: medical teams


The World Health Council has five medical teams to allocate to three underdeveloped countries.
Measure of performance: additional person-years of life, i.e., increased life expectancy (in years) times the country's population.
Thousands of additional person-years of life

Medical teams    Country 1    Country 2    Country 3
0                0            0            0
1                45           20           50
2                70           45           70
3                90           75           80
4                105          110          100
5                120          150          130
227

Formulation of the problem


Problem requires three interrelated decisions: how
many teams to allocate to the three countries (stages).
xn is the number of teams to allocate to country n (stage n).
What are the states? What changes from one stage to another?
sn = number of medical teams still available for the remaining countries.
Thus: s1 = 5, s2 = 5 - x1, s3 = s2 - x2.

228

States to be considered
(Figure: the possible states sn to be considered at each stage, together with the person-years data table shown above.)

229

Overall problem
pi(xi): measure of performance for allocating xi medical teams to country i.

$$\text{Maximize } \sum_{i=1}^{3} p_i(x_i),$$

subject to

$$\sum_{i=1}^{3} x_i = 5,$$

and the xi are nonnegative integers.

230

Policy
Recursive relationship relating these functions:

$$f_n^*(s_n) = \max_{x_n = 0, 1, \ldots, s_n} \left\{ p_n(x_n) + f_{n+1}^*(s_n - x_n) \right\}, \quad \text{for } n = 1, 2$$

$$f_3^*(s_3) = \max_{x_3 = 0, 1, \ldots, s_3} p_3(x_3), \quad \text{for } n = 3$$

231
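A short Python sketch of this recursion, using the person-years data from the table above (values in thousands). The variable names and loop structure are my own illustration of the max-form recursion, not code from the slides.

```python
# p[i][x] = thousands of additional person-years when x teams (0..5) go to country i+1,
# taken from the data table above.
p = [
    [0, 45, 70, 90, 105, 120],   # country 1
    [0, 20, 45, 75, 110, 150],   # country 2
    [0, 50, 70, 80, 100, 130],   # country 3
]
TEAMS = 5

# Backward recursion: f_n*(s_n) = max over x_n of { p_n(x_n) + f_{n+1}*(s_n - x_n) },
# with a fictitious stage 4 valued at 0 so that f_3*(s_3) = p_3(s_3) falls out naturally.
f = [[0] * (TEAMS + 1) for _ in range(4)]        # f[n][s], n = 0..3 (f[3] is the boundary)
best = [[0] * (TEAMS + 1) for _ in range(3)]

for n in (2, 1, 0):
    for s in range(TEAMS + 1):
        values = [p[n][x] + f[n + 1][s - x] for x in range(s + 1)]
        f[n][s] = max(values)
        best[n][s] = values.index(f[n][s])       # one optimal allocation (ties broken arbitrarily)

print(f[0][TEAMS])    # maximum total person-years starting with all 5 teams
```

The intermediate arrays f[2], f[1] and f[0] correspond to the stage tables built on the following slides.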

Solution procedure
For the last stage, n = 3, the values of p3(x3) are the last column of the data table. Here, x3* = s3 and f3*(s3) = p3(s3):

n = 3:

s3        0     1     2     3     4     5
f3*(s3)   0     50    70    80    100   130
x3*       0     1     2     3     4     5

232

Stage n = 2
Here, finding x2* requires calculating f2(s2, x2) for the values x2 = 0, 1, ..., s2. Example for s2 = 2:

x2 = 0: f2(2, 0) = p2(0) + f3*(2) = 0 + 70 = 70
x2 = 1: f2(2, 1) = p2(1) + f3*(1) = 20 + 50 = 70
x2 = 2: f2(2, 2) = p2(2) + f3*(0) = 45 + 0 = 45

So f2*(2) = 70, with x2* = 0 or 1.

233

Stage n = 2
Similar calculations can be made for the other values of s2:

n = 2:

                  f2(s2, x2) = p2(x2) + f3*(s2 - x2)
s2      x2 = 0    x2 = 1    x2 = 2    x2 = 3    x2 = 4    x2 = 5    f2*(s2)    x2*
0       0                                                            0          0
1       50        20                                                 50         0
2       70        70        45                                       70         0 or 1
3       80        90        95        75                             95         2
4       100       100       115       125       110                  125        3
5       130       120       125       145       160       150        160        4

234

Stage n = 1
Only state is the starting
state s1 = 5:
n = 1:

                  f1(s1, x1) = p1(x1) + f2*(s1 - x1)
s1      x1 = 0    x1 = 1    x1 = 2    x1 = 3    x1 = 4    x1 = 5    f1*(s1)    x1*
5       160       170       165       160       155       120       170        1
235

Optimal policy decision
x1* = 1, then s2 = 4 and x2* = 3, then s3 = 1 and x3* = 1: allocate 1 team to country 1, 3 teams to country 2 and 1 team to country 3, for a total of 170 thousand additional person-years of life.
236

Distribution of effort problem


One kind of resource is allocated to a number of activities.
A DP formulation involves only one (or a few) resources, while LP can deal with thousands of resources.
The LP assumptions of proportionality, divisibility and certainty can be violated; only additivity is necessary, because of the principle of optimality.
The World Health Council problem violates proportionality and divisibility (only integer allocations are allowed).

237

Formulation of distribution of effort


Stage n = activity n (n = 1, 2, ..., N),
xn = decision variable for stage n,
State sn = amount of resource still available for allocating to the remaining activities (n, ..., N).

When the system starts at stage n in state sn, the choice of xn results in the next state at stage n + 1:

Stage:    n                  n + 1
State:    sn  --(xn)-->      sn - xn

238

Example
Distributing scientists to research teams
Three research teams are working on an engineering problem to safely fly people to Mars.
Two extra scientists are available to reduce the probability of failure.

                   Probability of failure
New scientists     Team 1    Team 2    Team 3
0                  0.40      0.60      0.80
1                  0.20      0.40      0.50
2                  0.15      0.20      0.30

239
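The slides do not develop this example further here, but it has the same distribution-of-effort structure with a multiplicative objective. Assuming the goal is to minimize the product of the teams' failure probabilities (i.e., the probability that all teams fail, if they fail independently), a minimal Python sketch is:

```python
# q[i][x] = probability that team i+1 fails when given x new scientists (table above).
q = [
    [0.40, 0.20, 0.15],   # team 1
    [0.60, 0.40, 0.20],   # team 2
    [0.80, 0.50, 0.30],   # team 3
]
SCIENTISTS = 2

# Same distribution-of-effort recursion as before, but contributions multiply instead of
# adding: f_n*(s_n) = min over x_n of q_n(x_n) * f_{n+1}*(s_n - x_n), with f_4* = 1.
f = [[1.0] * (SCIENTISTS + 1) for _ in range(4)]
best = [[0] * (SCIENTISTS + 1) for _ in range(3)]

for n in (2, 1, 0):
    for s in range(SCIENTISTS + 1):
        values = [q[n][x] * f[n + 1][s - x] for x in range(s + 1)]
        f[n][s] = min(values)
        best[n][s] = values.index(f[n][s])

print(f[0][SCIENTISTS], best[0][SCIENTISTS])   # minimum overall failure probability and x1*
```

The assumed product objective is what motivates the remark, two slides below, that additivity (or a suitable analog such as multiplication of nonnegative terms) is what the principle of optimality really needs.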

Continuous dynamic programming


The previous examples had a discrete state variable sn at each stage.
They have all been reversible: the solution procedure could have moved either backward or forward stage by stage.
The next example is continuous. Since sn can take any value in certain intervals, the solutions fn*(sn) and xn* must be expressed as functions of sn.
Stages in the next example correspond to time periods, so the solution must proceed backward.

240

Example: scheduling jobs


The company Local Job Shop needs to schedule employment levels due to seasonal fluctuations.
Machine operators are difficult to hire and costly to train.
The peak-season payroll should not be maintained afterwards.
Overtime work on a regular basis should be avoided.

Requirements in the near future:

Season          Spring    Summer    Autumn    Winter    Spring
Requirements    255       220       240       200       255

241

Example: scheduling jobs


Employment above the level in the table costs 2000 per person per season.
The total cost of changing the level of employment from one season to the next is 200 times the square of the difference in employment levels.
Fractional levels are possible thanks to part-time employees.

242

Formulation
From the data, maximum employment should be 255 (spring). It is necessary to find the level of employment for the other seasons. Seasons are the stages.
One cycle of four seasons is considered, where stage 1 is summer and stage 4 is spring.
xn = employment level for stage n (n = 1, 2, 3, 4); x4 = 255.
rn = minimum employment level for stage n: r1 = 220, r2 = 240, r3 = 200, r4 = 255. Thus:
rn ≤ xn ≤ 255.

243

Formulation
Cost for stage n:
Cn = 200(xn - xn-1)² + 2000(xn - rn)
The state sn is the employment level in the preceding season, xn-1:
sn = xn-1 (for n = 1: s1 = x0 = x4 = 255).
Problem:

$$\text{minimize } \sum_{i=1}^{4} \left[ 200(x_i - x_{i-1})^2 + 2000(x_i - r_i) \right],$$

subject to

$$r_i \le x_i \le 255, \quad \text{for } i = 1, 2, 3, 4.$$

244

Detailed formulation
Choose x1, x2 and x3 so as to minimize the cost:
n    rn     Feasible xn         Possible sn = xn-1    Cost
1    220    220 ≤ x1 ≤ 255      s1 = 255              200(x1 - 255)² + 2000(x1 - 220)
2    240    240 ≤ x2 ≤ 255      220 ≤ s2 ≤ 255        200(x2 - x1)² + 2000(x2 - 240)
3    200    200 ≤ x3 ≤ 255      240 ≤ s3 ≤ 255        200(x3 - x2)² + 2000(x3 - 200)
4    255    x4 = 255            200 ≤ s4 ≤ 255        200(255 - x3)²

245

Formulation
Recursive relationship:

$$f_n^*(s_n) = \min_{r_n \le x_n \le 255} \left\{ 200(x_n - s_n)^2 + 2000(x_n - r_n) + f_{n+1}^*(x_n) \right\}$$

Basic structure of the problem:

246

Solution procedure
Stage 4: the solution is known to be x4* = 255.

s4                 f4*(s4)            x4*
200 ≤ s4 ≤ 255     200(255 - s4)²     255

Stage 3: possible values are 240 ≤ s3 ≤ 255:

$$f_3^*(s_3) = \min_{200 \le x_3 \le 255} \left\{ 200(x_3 - s_3)^2 + 2000(x_3 - 200) + f_4^*(x_3) \right\}$$
$$\phantom{f_3^*(s_3)} = \min_{200 \le x_3 \le 255} \left\{ 200(x_3 - s_3)^2 + 2000(x_3 - 200) + 200(255 - x_3)^2 \right\}$$

graphical solution:

247

Graphical solution for f3*(s3)

248

Calculus solution for f3*(s3)

Using calculus:

$$\frac{\partial}{\partial x_3} f_3(s_3, x_3) = 400(x_3 - s_3) + 2000 - 400(255 - x_3) = 400(2x_3 - s_3 - 250) = 0$$

$$x_3^* = \frac{s_3 + 250}{2}$$

s3                 f3*(s3)                                           x3*
240 ≤ s3 ≤ 255     50(250 - s3)² + 50(260 - s3)² + 1000(s3 - 150)    (s3 + 250)/2

249

Stage 2
Solved in a similar fashion:

$$f_2(s_2, x_2) = 200(x_2 - s_2)^2 + 2000(x_2 - r_2) + f_3^*(x_2)$$
$$\phantom{f_2(s_2, x_2)} = 200(x_2 - s_2)^2 + 2000(x_2 - 240) + 50(250 - x_2)^2 + 50(260 - x_2)^2 + 1000(x_2 - 150)$$

for 220 ≤ s2 ≤ 255 (possible values) and 240 ≤ x2 ≤ 255 (feasible values).
Setting ∂f2(s2, x2)/∂x2 = 0 yields:

$$x_2^* = \frac{2 s_2 + 240}{3}$$
250

Stage 2
The solution has to be feasible for 220 ≤ s2 ≤ 255 (i.e., it must satisfy 240 ≤ x2 ≤ 255 for every 220 ≤ s2 ≤ 255)!
x2* = (2s2 + 240)/3 is only feasible for 240 ≤ s2 ≤ 255.
We still need the feasible value of x2 that minimizes f2(s2, x2) when 220 ≤ s2 < 240.
When s2 < 240,

$$\frac{\partial}{\partial x_2} f_2(s_2, x_2) > 0 \quad \text{for } 240 \le x_2 \le 255,$$

so x2* = 240.
251

Stage 2 and Stage 1


s2                 f2*(s2)                                                               x2*
220 ≤ s2 ≤ 240     200(240 - s2)² + 115000                                               240
240 ≤ s2 ≤ 255     (200/9)[(240 - s2)² + (255 - s2)² + (270 - s2)²] + 2000(s2 - 195)     (2s2 + 240)/3

Stage 1: the procedure is similar.

s1      f1*(s1)    x1*
255     185000     247.5

Solution:
x1* = 247.5; x2* = 245; x3* = 247.5; x4* = 255
Total cost of 185 000
252
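Because the state and decision variables are continuous, the solution above had to be obtained by calculus. As a rough cross-check, one can discretize the employment levels on a fine grid and run the same backward recursion numerically; the sketch below (the grid step and the names are my own choices) should approximately reproduce the total cost of 185 000 found above.

```python
import numpy as np

r = [220, 240, 200, 255]            # seasonal minimum requirements for stages 1..4
STEP = 0.5                          # discretization of employment levels (fractions allowed)
levels = np.arange(200, 255 + STEP, STEP)

def cost(x, x_prev, rn):
    # stage cost: 200*(change in employment)^2 + 2000*(excess over the requirement)
    return 200 * (x - x_prev) ** 2 + 2000 * (x - rn)

# Backward pass over stages 3, 2, 1 with x4 fixed at 255; stage 4 contributes 200*(255 - x3)^2.
f_next = {x3: 200 * (255 - x3) ** 2 for x3 in levels}      # plays the role of f4*(s4), s4 = x3
for n in (3, 2, 1):
    f_here = {}
    for s in levels:                                       # s = employment of the previous season
        feasible = levels[levels >= r[n - 1]]
        f_here[s] = min(cost(x, s, r[n - 1]) + f_next[x] for x in feasible)
    f_next = f_here

print(f_next[255.0])    # approximate minimum total cost starting from s1 = 255
```

Since 247.5 and 245 lie on this 0.5 grid, the numeric minimum lands exactly on the analytic optimum.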

Probabilistic dynamic programming


The next stage is not completely determined by the state and policy decision at the current stage.
Instead, there is a probability distribution for determining the next state (see figure).
S = number of possible states at stage n + 1.
The system goes to state i (i = 1, 2, ..., S) with probability pi, given state sn and decision xn at stage n.
Ci = contribution of stage n to the objective function if the system goes to state i.
If the figure is expanded to include all possible states and decisions at all stages, it is a decision tree.

253

Basic structure

254

Probabilistic dynamic programming


The relation between fn(sn, xn) and f*n+1(sn+1) depends on the form of the objective function.
Example: minimize the expected sum of the contributions from the individual stages.
fn(sn, xn) is then the minimum expected sum from stage n onward, given state sn and policy decision xn at stage n:

$$f_n(s_n, x_n) = \sum_{i=1}^{S} p_i \left[ C_i + f_{n+1}^*(i) \right]$$

with

$$f_{n+1}^*(i) = \min_{x_{n+1}} f_{n+1}(i, x_{n+1}).$$

255

Example: determining reject allowances


The Hit-and-Miss Manufacturing Company has received an order to supply one item of a particular type.
The customer has specified stringent quality requirements, so the manufacturer may have to produce more than one item to obtain one that is acceptable. The number of extra items produced is the reject allowance.
The probability that an item is acceptable is 1/2, and the probability that it is defective is 1/2.
The number of acceptable items in a lot of size L therefore has a binomial distribution: the probability of no acceptable item is (1/2)^L.
The setup cost is 300 and the cost per item is 100. The maximum number of production runs is 3. The cost if there is no acceptable item after 3 runs is 1600.

256

Formulation
Objective: determine policy regarding lot size
(1 + reject allowance) for required production run(s)
that minimizes total expected cost.
Stage n = production run n (n = 1,2,3),
xn = lot size for stage n,
State sn = number of acceptable items still needed (1
or 0) at the beginning of stage n.
At stage 1, state s1 = 1.

257

Formulation
fn(sn, xn) = total expected cost for stages n, ..., 3 if the system starts in state sn at stage n, the immediate decision is xn, and optimal decisions are made thereafter:

$$f_n^*(s_n) = \min_{x_n = 0, 1, 2, \ldots} f_n(s_n, x_n),$$

where fn*(0) = 0.

Take the monetary unit to be 100. The contribution to the cost from stage n is then K(xn) + xn, with

$$K(x_n) = \begin{cases} 0, & \text{if } x_n = 0 \\ 3, & \text{if } x_n > 0 \end{cases}$$

Note that f4*(1) = 16.
258

Basic structure of the problem


Recursive relationship:

$$f_n^*(1) = \min_{x_n = 0, 1, 2, \ldots} \left\{ K(x_n) + x_n + \left(\tfrac{1}{2}\right)^{x_n} f_{n+1}^*(1) \right\}, \quad \text{for } n = 1, 2, 3$$

259
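This recursion is simple enough to evaluate directly in code. The sketch below is a minimal Python version that tracks only the state sn = 1 (an acceptable item is still needed; the state 0 trivially costs nothing more). The lot-size cap MAX_LOT = 5 is an assumption for illustration, not a value from the slides.

```python
# Costs in hundreds (monetary unit = 100): setup K = 3 if anything is produced, 1 per item,
# terminal penalty f4*(1) = 16 if no acceptable item remains after the three runs.
MAX_LOT = 5                       # assumed upper bound on the lot size considered per run

def K(x):
    return 3 if x > 0 else 0

f_next = 16.0                     # f4*(1)
policy = {}
for n in (3, 2, 1):               # backward over the three production runs
    values = {x: K(x) + x + (0.5 ** x) * f_next for x in range(MAX_LOT + 1)}
    f_here = min(values.values())
    policy[n] = [x for x, v in values.items() if v == f_here]
    f_next = f_here

print(f_next, policy)             # minimum expected cost (in hundreds) and optimal lot sizes per run
```

The run reproduces the tables on the next slide: f3*(1) = 8 with x3* = 3 or 4, f2*(1) = 7 with x2* = 2 or 3, and f1*(1) = 6 3/4 with x1* = 2.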

Solution procedure
n = 3:
                 f3(1, x3) = K(x3) + x3 + 16(1/2)^x3
s3    x3 = 0    1     2    3    4    5        f3*(s3)    x3*
0                                             0          0
1     16        12    9    8    8    8 1/2    8          3 or 4

n = 2:
                 f2(1, x2) = K(x2) + x2 + (1/2)^x2 f3*(1)
s2    x2 = 0    1    2    3    4              f2*(s2)    x2*
0                                             0          0
1     8         8    7    7    7 1/2          7          2 or 3

n = 1:
                 f1(1, x1) = K(x1) + x1 + (1/2)^x1 f2*(1)
s1    x1 = 0    1        2        3        4            f1*(s1)    x1*
1     7         7 1/2    6 3/4    6 7/8    7 7/16       6 3/4      2

260

Solution
Optimal policy: produce two items on the first
production run;
if none is acceptable, then produce either two or three
items on the second production run;
if none is acceptable, then produce either three or four
items on the third production run.
The total expected cost for this policy is 675.

261

Conclusions
Dynamic programming: very useful technique for
making a sequence of interrelated decisions. It
requires formulating an appropriate recursive
relationship for each individual problem.
Example: problem has 10 stages with 10 states and 10
possible decisions at each stage.
Exhaustive enumeration must consider up to 10 billion
combinations.
Dynamic programming need make no more than a
thousand calculations (10 for each state at each stage).

262
