
DYNAMIC PROGRAMMING

Dynamic Programming
Dynamic programming is a useful mathematical technique for making a sequence of interrelated decisions.
It provides a systematic procedure for determining the optimal combination of decisions.
There is no standard mathematical formulation of the dynamic programming problem.
Knowing when to apply dynamic programming depends largely on experience with its general structure.

206

Prototype example

Stagecoach problem
A fortune seeker wants to travel from Missouri to California in the mid-19th century.
The journey has 4 stages. A is Missouri and J is California.
The cost of each route is its life-insurance premium, so the lowest-cost route corresponds to the safest trip.
207

Costs
Cost cij of going from state i to state j is:

Problem: which route minimizes the total cost of the policy?

208

Solving the problem


Note that the greedy approach does not work: the greedy solution A → B → F → I → J has a total cost of 13.
However, sacrificing a little on one stage may allow greater savings later; e.g., A → D → F is cheaper than A → B → F.
Another possibility is trial-and-error, but it requires too much effort even for this simple problem.
Dynamic programming is much more efficient than exhaustive enumeration, especially for large problems.
It usually starts from the last stage of the problem and enlarges it one stage at a time.

209

Formulation
Decision variables xn (n = 1, 2, 3, 4) are the immediate destination at stage n.
The route is A → x1 → x2 → x3 → x4, where x4 = J.

fn(s, xn) is the total cost of the best overall policy for the remaining stages, given that the actual state is s, ready to start stage n, and xn is selected as the immediate destination.

xn* minimizes fn(s, xn), and fn*(s) is the minimum value of fn(s, xn):

$$f_n^*(s) = \min_{x_n} f_n(s, x_n) = f_n(s, x_n^*)$$

210

Formulation
where

$$f_n(s, x_n) = \text{immediate cost (stage } n\text{)} + \text{minimum future cost (stages } n+1 \text{ onward)} = c_{s x_n} + f_{n+1}^*(x_n)$$

The value of csxn is given by cij with i = s (current state) and j = xn (immediate destination).
Objective: find f1*(A) and the corresponding route.
Dynamic programming finds successively f4*(s), f3*(s),
f2*(s) and finally f1*(A).

211

Solution procedure
When n = 4, the route is determined by its current
state s (H or I) and its final destination J.
Since f4*(s) = f4(s, J) = csJ, the solution for n = 4 is

s    f4*(s)    x4*
H    3         J
I    4         J

212

Stage n = 3
This stage needs a few calculations. If the fortune seeker is in state F, he can go to either H or I, with costs cF,H = 6 or cF,I = 3.
Choosing H, the minimum additional cost is f4*(H) = 3, so the total cost is 6 + 3 = 9.
Choosing I, the total cost is 3 + 4 = 7. This is smaller, so it is the optimal choice for state F.
213

Stage n = 3
Similar calculations can be made for the two other possible states, s = E and s = G, resulting in the table for n = 3:

                  f3(s, x3) = csx3 + f4*(x3)
s       x3 = H    x3 = I    f3*(s)    x3*
E       4         8         4         H
F       9         7         7         I
G       6         7         6         H

214

Stage n = 2
In this case, f2(s, x2) = csx2 + f3*(x2).
Example for node C:
x2 = E: f2(C, E) = cC,E + f3*(E) = 3 + 4 = 7 (optimal)
x2 = F: f2(C, F) = cC,F + f3*(F) = 2 + 7 = 9
x2 = G: f2(C, G) = cC,G + f3*(G) = 4 + 6 = 10
215

Stage n = 2
Similar calculations can be made for the two remaining states, s = B and s = D, resulting in the table for n = 2:

                  f2(s, x2) = csx2 + f3*(x2)
s       x2 = E    x2 = F    x2 = G    f2*(s)    x2*
B       11        11        12        11        E or F
C       7         9         10        7         E
D       8         8         11        8         E or F

216

Stage n = 1
Just one possible starting state: A.
x1 = B: f1(A, B) = cA,B + f2*(B) = 2 + 11 = 13.
x1 = C: f1(A, C) = cA,C + f2*(C) = 4 + 7 = 11 (optimal)
x1 = D: f1(A, D) = cA,D + f2*(D) = 3 + 8 = 11 (optimal)

Resulting in the table:

                  f1(s, x1) = csx1 + f2*(x1)
s       x1 = B    x1 = C    x1 = D    f1*(s)    x1*
A       13        11        11        11        C or D

217

Optimal solution
There are three optimal solutions, all with f1*(A) = 11: A → C → E → H → J, A → D → E → H → J, and A → D → F → I → J.

218
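The backward recursion just illustrated is easy to carry out in code. The sketch below is a minimal Python version; since the cost table itself is not reproduced in these notes, the `cost` dictionary holds assumed link costs (chosen to be consistent with the stage values derived above), so treat it as an illustration of the procedure rather than the slides' own data.

```python
# Minimal sketch of the backward recursion f_n*(s) = min_x { c_{s,x} + f_{n+1}*(x) }.
# The cost data below is assumed for illustration (the original cost table is not
# reproduced in these notes); the algorithmic structure is the point.
cost = {
    'A': {'B': 2, 'C': 4, 'D': 3},
    'B': {'E': 7, 'F': 4, 'G': 6},
    'C': {'E': 3, 'F': 2, 'G': 4},
    'D': {'E': 4, 'F': 1, 'G': 5},
    'E': {'H': 1, 'I': 4},
    'F': {'H': 6, 'I': 3},
    'G': {'H': 3, 'I': 3},
    'H': {'J': 3},
    'I': {'J': 4},
}

stages = [['A'], ['B', 'C', 'D'], ['E', 'F', 'G'], ['H', 'I'], ['J']]

f = {'J': 0.0}      # f*(J) = 0: nothing left to pay at the destination
best = {}           # best[s] = list of optimal immediate destinations from state s

# Move backward, one stage at a time (stages 4, 3, 2, 1 in the slides' numbering).
for stage in reversed(stages[:-1]):
    for s in stage:
        values = {x: cost[s][x] + f[x] for x in cost[s]}
        f[s] = min(values.values())
        best[s] = [x for x, v in values.items() if v == f[s]]

print(f['A'], best)   # minimum total cost from A and the optimal choices per state
```

With these assumed costs the run reproduces f1*(A) = 11 and the ties (C or D at stage 1, E or F from B and D) seen in the tables above.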

Characteristics of DP
1. The problem can be divided into stages, with a policy

decision required at each stage.

Example: 4 stages and life insurance policy to choose.


Dynamic programming problems require making a
sequence of interrelated decisions.
2. Each stage has a number of states associated with

the beginning of each stage.

Example: states are the possible territories where the


fortune seeker could be located.
States are possible conditions in which the system
might be.
219

Characteristics of DP
3. Policy decision transforms the current state to a state

associated with the beginning of the next stage.

Example: the fortune seeker's decision led him from his current state to the next state on his journey.
DP problems can be interpreted in terms of networks: each node corresponds to a state.
Value assigned to each link is the immediate
contribution to the objective function from making
that policy decision.
Objective corresponds to finding the shortest or the
longest path.
220

Characteristics of DP
4. The solution procedure finds an optimal policy for

the overall problem. Finds a prescription of the


optimal policy decision at each stage for each of the
possible states.
Example: solution procedure constructed a table for
each stage, n, that prescribed the optimal decision, xn*,
for each possible state s.
In addition to identifying optimal solutions, DP provides a policy prescription of what to do under every possible circumstance (which is why a decision is called a policy decision).

221

Characteristics of DP
5. Given the current state, an optimal policy for the

remaining stages is independent of the policy


decisions adopted in previous stages.

The optimal immediate decision depends only on the current state: this is the principle of optimality for DP.
Example: at any state, the best policy from then on is independent of how the fortune seeker got there.
Knowledge of the current state conveys all the information necessary for determining the optimal policy henceforth (Markovian property).

222

Characteristics of DP
6. Solution procedure begins by finding the optimal

policy for the last stage. Solution is usually trivial.


7. A recursive relationship that identifies optimal policy
for stage n, given optimal policy for stage n + 1, is
available.
Example: the recursive relationship was

$$f_n^*(s) = \min_{x_n} \left\{ c_{s x_n} + f_{n+1}^*(x_n) \right\}$$

Recursive relationship differs somewhat among


dynamic programming problems.

223

Characteristics of DP
7. (cont.) Notation:
N = number of stages.
n = label for the current stage (n = 1, 2, ..., N).
sn = current state for stage n.
xn = decision variable for stage n.
xn* = optimal value of xn (given sn).
fn(sn, xn) = contribution of stages n, n + 1, ..., N to the objective function if the system starts in state sn at stage n, the immediate decision is xn, and optimal decisions are made thereafter.
fn*(sn) = fn(sn, xn*).

224

Characteristics of DP
7. (cont.) recursive relationship:
$$f_n^*(s_n) = \max_{x_n} f_n(s_n, x_n) \quad \text{or} \quad f_n^*(s_n) = \min_{x_n} f_n(s_n, x_n),$$

where fn(sn, xn) is written in terms of sn, xn and f*n+1(sn+1).

8. Using the recursive relationship, the solution procedure starts at the end and moves backward stage by stage.
It stops when it finds the optimal policy starting at the initial stage; the optimal policy for the entire problem has then been found.
Example: the tables constructed for the successive stages illustrate this procedure.

225
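As a summary of characteristics 6 through 8, the following is a minimal, generic Python sketch of the backward solution procedure (minimization form). The names solve_backward, decisions and step are illustrative, not from the slides; each concrete problem supplies its own stages, states, feasible decisions and stage contributions.

```python
def solve_backward(states, decisions, step):
    """Generic backward DP sketch (minimization form).

    states[n]       -- list of possible states at stage n, for n = 0, ..., N
                       (states[N] are terminal states, valued at 0)
    decisions(n, s) -- iterable of feasible decisions x_n in state s at stage n
    step(n, s, x)   -- (immediate contribution, next state) for decision x in state s
    """
    N = len(states) - 1
    f = {(N, s): 0.0 for s in states[N]}        # boundary condition at the final stage
    policy = {}
    for n in range(N - 1, -1, -1):              # move backward, one stage at a time
        for s in states[n]:
            best_val, best_x = float("inf"), None
            for x in decisions(n, s):
                c, s_next = step(n, s, x)
                value = c + f[(n + 1, s_next)]
                if value < best_val:
                    best_val, best_x = value, x
            f[(n, s)] = best_val
            policy[(n, s)] = best_x
    return f, policy
```

For the stagecoach problem, for example, states[n] would be the letter nodes of stage n and step(n, s, x) would return (c_{s,x}, x).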

Deterministic dynamic programming


Deterministic problems: the state at the next stage is
completely determined by the state and the policy
decision at the current stage.

Form of the objective function: minimize or maximize


the sum, product, etc. of the contributions from the
individual stages.
Set of states: may be discrete or continuous, or a state
vector. Decision variables can also be discrete or
continuous.
226

Example: medical teams


The World Health Council has five medical teams to allocate to three underdeveloped countries.
Measure of performance: additional person-years of life, i.e., increased life expectancy (in years) times the country's population.
Thousands of additional person-years of life

Medical teams    Country 1    Country 2    Country 3
0                0            0            0
1                45           20           50
2                70           45           70
3                90           75           80
4                105          110          100
5                120          150          130
227

Formulation of the problem


Problem requires three interrelated decisions: how
many teams to allocate to the three countries (stages).
xn is the number of teams to allocate to country n (stage n).
What are the states? What changes from one stage to another?
sn = number of medical teams still available for the remaining countries.
Thus: s1 = 5, s2 = 5 - x1, s3 = s2 - x2.

228

States to be considered
(Figure: the possible states sn to be considered at each stage, together with the person-years data table shown above.)

229

Overall problem
pi(xi): measure of performance for allocating xi medical teams to country i.

$$\text{Maximize } \sum_{i=1}^{3} p_i(x_i),$$

subject to

$$\sum_{i=1}^{3} x_i = 5,$$

and the xi are nonnegative integers.

230

Policy
Recursive relationship relating these functions:

$$f_n^*(s_n) = \max_{x_n = 0, 1, \ldots, s_n} \left\{ p_n(x_n) + f_{n+1}^*(s_n - x_n) \right\}, \quad \text{for } n = 1, 2$$

$$f_3^*(s_3) = \max_{x_3 = 0, 1, \ldots, s_3} p_3(x_3), \quad \text{for } n = 3$$

231
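A short Python sketch of this recursion, using the person-years data from the table above (values in thousands). The variable names and loop structure are my own illustration of the max-form recursion, not code from the slides.

```python
# p[i][x] = thousands of additional person-years when x teams (0..5) go to country i+1,
# taken from the data table above.
p = [
    [0, 45, 70, 90, 105, 120],   # country 1
    [0, 20, 45, 75, 110, 150],   # country 2
    [0, 50, 70, 80, 100, 130],   # country 3
]
TEAMS = 5

# Backward recursion: f_n*(s_n) = max over x_n of { p_n(x_n) + f_{n+1}*(s_n - x_n) },
# with a fictitious stage 4 valued at 0 so that f_3*(s_3) = p_3(s_3) falls out naturally.
f = [[0] * (TEAMS + 1) for _ in range(4)]        # f[n][s], n = 0..3 (f[3] is the boundary)
best = [[0] * (TEAMS + 1) for _ in range(3)]

for n in (2, 1, 0):
    for s in range(TEAMS + 1):
        values = [p[n][x] + f[n + 1][s - x] for x in range(s + 1)]
        f[n][s] = max(values)
        best[n][s] = values.index(f[n][s])       # one optimal allocation (ties broken arbitrarily)

print(f[0][TEAMS])    # maximum total person-years starting with all 5 teams
```

The intermediate arrays f[2], f[1] and f[0] correspond to the stage tables built on the following slides.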

Solution procedure
For the last stage, n = 3, the values of p3(x3) are the last column of the data table. Here, x3* = s3 and f3*(s3) = p3(s3):

n = 3:

s3        0     1     2     3     4     5
f3*(s3)   0     50    70    80    100   130
x3*       0     1     2     3     4     5

232

Stage n = 2
Here, finding x2* requires calculating f2(s2, x2) for the values x2 = 0, 1, ..., s2. Example for s2 = 2:

x2 = 0: f2(2, 0) = p2(0) + f3*(2) = 0 + 70 = 70
x2 = 1: f2(2, 1) = p2(1) + f3*(1) = 20 + 50 = 70
x2 = 2: f2(2, 2) = p2(2) + f3*(0) = 45 + 0 = 45

So f2*(2) = 70, with x2* = 0 or 1.

233

Stage n = 2
Similar calculations can be made for the other values of s2:

n = 2:

                  f2(s2, x2) = p2(x2) + f3*(s2 - x2)
s2      x2 = 0    x2 = 1    x2 = 2    x2 = 3    x2 = 4    x2 = 5    f2*(s2)    x2*
0       0                                                            0          0
1       50        20                                                 50         0
2       70        70        45                                       70         0 or 1
3       80        90        95        75                             95         2
4       100       100       115       125       110                  125        3
5       130       120       125       145       160       150        160        4

234

Stage n = 1
Only state is the starting
state s1 = 5:
n = 1:

                  f1(s1, x1) = p1(x1) + f2*(s1 - x1)
s1      x1 = 0    x1 = 1    x1 = 2    x1 = 3    x1 = 4    x1 = 5    f1*(s1)    x1*
5       160       170       165       160       155       120       170        1
235

Optimal policy decision
x1* = 1, then s2 = 4 and x2* = 3, then s3 = 1 and x3* = 1: allocate 1 team to country 1, 3 teams to country 2 and 1 team to country 3, for a total of 170 thousand additional person-years of life.
236

Distribution of effort problem


One kind of resource is allocated to a number of activities.
A DP formulation involves only one (or a few) resources, while LP can deal with thousands of resources.
The LP assumptions of proportionality, divisibility and certainty can be violated; only additivity is necessary, because of the principle of optimality.
The World Health Council problem violates proportionality and divisibility (only integer allocations are allowed).

237

Formulation of distribution of effort


Stage n = activity n (n = 1, 2, ..., N),
xn = decision variable for stage n,
State sn = amount of resource still available for allocating to the remaining activities (n, ..., N).

When the system starts at stage n in state sn, the choice of xn results in the next state at stage n + 1:

Stage:    n                  n + 1
State:    sn  --(xn)-->      sn - xn

238

Example
Distributing scientists to research teams
Three research teams are working on an engineering problem to safely fly people to Mars.
Two extra scientists are available to reduce the probability of failure.

                   Probability of failure
New scientists     Team 1    Team 2    Team 3
0                  0.40      0.60      0.80
1                  0.20      0.40      0.50
2                  0.15      0.20      0.30

239
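The slides do not develop this example further here, but it has the same distribution-of-effort structure with a multiplicative objective. Assuming the goal is to minimize the product of the teams' failure probabilities (i.e., the probability that all teams fail, if they fail independently), a minimal Python sketch is:

```python
# q[i][x] = probability that team i+1 fails when given x new scientists (table above).
q = [
    [0.40, 0.20, 0.15],   # team 1
    [0.60, 0.40, 0.20],   # team 2
    [0.80, 0.50, 0.30],   # team 3
]
SCIENTISTS = 2

# Same distribution-of-effort recursion as before, but contributions multiply instead of
# adding: f_n*(s_n) = min over x_n of q_n(x_n) * f_{n+1}*(s_n - x_n), with f_4* = 1.
f = [[1.0] * (SCIENTISTS + 1) for _ in range(4)]
best = [[0] * (SCIENTISTS + 1) for _ in range(3)]

for n in (2, 1, 0):
    for s in range(SCIENTISTS + 1):
        values = [q[n][x] * f[n + 1][s - x] for x in range(s + 1)]
        f[n][s] = min(values)
        best[n][s] = values.index(f[n][s])

print(f[0][SCIENTISTS], best[0][SCIENTISTS])   # minimum overall failure probability and x1*
```

The assumed product objective is what motivates the remark, two slides below, that additivity (or a suitable analog such as multiplication of nonnegative terms) is what the principle of optimality really needs.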

Continuous dynamic programming


The previous examples had a discrete state variable sn at each stage.
They have all been reversible: the solution procedure could have moved either backward or forward stage by stage.
The next example is continuous. Since sn can take any value in certain intervals, the solutions fn*(sn) and xn* must be expressed as functions of sn.
Stages in the next example correspond to time periods, so the solution must proceed backward.

240

Example: scheduling jobs


The company Local Job Shop needs to schedule employment levels due to seasonal fluctuations.
Machine operators are difficult to hire and costly to train.
The peak-season payroll should not be maintained afterwards.
Overtime work on a regular basis should be avoided.

Requirements in the near future:

Season          Spring    Summer    Autumn    Winter    Spring
Requirements    255       220       240       200       255

241

Example: scheduling jobs


Employment above the level in the table costs 2000 per person per season.
The total cost of changing the level of employment from one season to the next is 200 times the square of the difference in employment levels.
Fractional levels are possible thanks to part-time employees.

242

Formulation
From the data, maximum employment should be 255 (spring). It is necessary to find the level of employment for the other seasons. Seasons are the stages.
One cycle of four seasons is considered, where stage 1 is summer and stage 4 is spring.
xn = employment level for stage n (n = 1, 2, 3, 4); x4 = 255.
rn = minimum employment level for stage n: r1 = 220, r2 = 240, r3 = 200, r4 = 255. Thus:
rn ≤ xn ≤ 255.

243

Formulation
Cost for stage n:
Cn = 200(xn - xn-1)² + 2000(xn - rn)
The state sn is the employment level in the preceding season, xn-1:
sn = xn-1 (for n = 1: s1 = x0 = x4 = 255).
Problem:

$$\text{minimize } \sum_{i=1}^{4} \left[ 200(x_i - x_{i-1})^2 + 2000(x_i - r_i) \right],$$

subject to

$$r_i \le x_i \le 255, \quad \text{for } i = 1, 2, 3, 4.$$

244

Detailed formulation
Choose x1, x2 and x3 so as to minimize the cost:
n    rn     Feasible xn         Possible sn = xn-1    Cost
1    220    220 ≤ x1 ≤ 255      s1 = 255              200(x1 - 255)² + 2000(x1 - 220)
2    240    240 ≤ x2 ≤ 255      220 ≤ s2 ≤ 255        200(x2 - x1)² + 2000(x2 - 240)
3    200    200 ≤ x3 ≤ 255      240 ≤ s3 ≤ 255        200(x3 - x2)² + 2000(x3 - 200)
4    255    x4 = 255            200 ≤ s4 ≤ 255        200(255 - x3)²

245

Formulation
Recursive relationship:

$$f_n^*(s_n) = \min_{r_n \le x_n \le 255} \left\{ 200(x_n - s_n)^2 + 2000(x_n - r_n) + f_{n+1}^*(x_n) \right\}$$

Basic structure of the problem:

246

Solution procedure
Stage 4: the solution is known to be x4* = 255.

s4                 f4*(s4)            x4*
200 ≤ s4 ≤ 255     200(255 - s4)²     255

Stage 3: possible values are 240 ≤ s3 ≤ 255:

$$f_3^*(s_3) = \min_{200 \le x_3 \le 255} \left\{ 200(x_3 - s_3)^2 + 2000(x_3 - 200) + f_4^*(x_3) \right\}$$
$$\phantom{f_3^*(s_3)} = \min_{200 \le x_3 \le 255} \left\{ 200(x_3 - s_3)^2 + 2000(x_3 - 200) + 200(255 - x_3)^2 \right\}$$

graphical solution:

247

Graphical solution for f3*(s3)

248

Calculus solution for f3*(s3)

Using calculus:

$$\frac{\partial}{\partial x_3} f_3(s_3, x_3) = 400(x_3 - s_3) + 2000 - 400(255 - x_3) = 400(2x_3 - s_3 - 250) = 0$$

$$x_3^* = \frac{s_3 + 250}{2}$$

s3                 f3*(s3)                                           x3*
240 ≤ s3 ≤ 255     50(250 - s3)² + 50(260 - s3)² + 1000(s3 - 150)    (s3 + 250)/2

249

Stage 2
Solved in a similar fashion:

$$f_2(s_2, x_2) = 200(x_2 - s_2)^2 + 2000(x_2 - r_2) + f_3^*(x_2)$$
$$\phantom{f_2(s_2, x_2)} = 200(x_2 - s_2)^2 + 2000(x_2 - 240) + 50(250 - x_2)^2 + 50(260 - x_2)^2 + 1000(x_2 - 150)$$

for 220 ≤ s2 ≤ 255 (possible values) and 240 ≤ x2 ≤ 255 (feasible values).
Setting ∂f2(s2, x2)/∂x2 = 0 yields:

$$x_2^* = \frac{2 s_2 + 240}{3}$$
250

Stage 2
The solution has to be feasible for 220 ≤ s2 ≤ 255 (i.e., it must satisfy 240 ≤ x2 ≤ 255 for every 220 ≤ s2 ≤ 255)!
x2* = (2s2 + 240)/3 is only feasible for 240 ≤ s2 ≤ 255.
We still need the feasible value of x2 that minimizes f2(s2, x2) when 220 ≤ s2 < 240.
When s2 < 240,

$$\frac{\partial}{\partial x_2} f_2(s_2, x_2) > 0 \quad \text{for } 240 \le x_2 \le 255,$$

so x2* = 240.
251

Stage 2 and Stage 1


s2                 f2*(s2)                                                               x2*
220 ≤ s2 ≤ 240     200(240 - s2)² + 115000                                               240
240 ≤ s2 ≤ 255     (200/9)[(240 - s2)² + (255 - s2)² + (270 - s2)²] + 2000(s2 - 195)     (2s2 + 240)/3

Stage 1: the procedure is similar.

s1      f1*(s1)    x1*
255     185000     247.5

Solution:
x1* = 247.5; x2* = 245; x3* = 247.5; x4* = 255
Total cost of 185 000
252
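Because the state and decision variables are continuous, the solution above had to be obtained by calculus. As a rough cross-check, one can discretize the employment levels on a fine grid and run the same backward recursion numerically; the sketch below (the grid step and the names are my own choices) should approximately reproduce the total cost of 185 000 found above.

```python
import numpy as np

r = [220, 240, 200, 255]            # seasonal minimum requirements for stages 1..4
STEP = 0.5                          # discretization of employment levels (fractions allowed)
levels = np.arange(200, 255 + STEP, STEP)

def cost(x, x_prev, rn):
    # stage cost: 200*(change in employment)^2 + 2000*(excess over the requirement)
    return 200 * (x - x_prev) ** 2 + 2000 * (x - rn)

# Backward pass over stages 3, 2, 1 with x4 fixed at 255; stage 4 contributes 200*(255 - x3)^2.
f_next = {x3: 200 * (255 - x3) ** 2 for x3 in levels}      # plays the role of f4*(s4), s4 = x3
for n in (3, 2, 1):
    f_here = {}
    for s in levels:                                       # s = employment of the previous season
        feasible = levels[levels >= r[n - 1]]
        f_here[s] = min(cost(x, s, r[n - 1]) + f_next[x] for x in feasible)
    f_next = f_here

print(f_next[255.0])    # approximate minimum total cost starting from s1 = 255
```

Since 247.5 and 245 lie on this 0.5 grid, the numeric minimum lands exactly on the analytic optimum.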

Probabilistic dynamic programming


The next stage is not completely determined by the state and policy decision at the current stage.
Instead, there is a probability distribution for determining the next state (see figure).
S = number of possible states at stage n + 1.
The system goes to state i (i = 1, 2, ..., S) with probability pi, given state sn and decision xn at stage n.
Ci = contribution of stage n to the objective function if the system goes to state i.
If the figure is expanded to include all possible states and decisions at all stages, it is a decision tree.

253

Basic structure

254

Probabilistic dynamic programming


The relation between fn(sn, xn) and f*n+1(sn+1) depends on the form of the objective function.
Example: minimize the expected sum of the contributions from the individual stages.
fn(sn, xn) is then the minimum expected sum from stage n onward, given state sn and policy decision xn at stage n:

$$f_n(s_n, x_n) = \sum_{i=1}^{S} p_i \left[ C_i + f_{n+1}^*(i) \right]$$

with

$$f_{n+1}^*(i) = \min_{x_{n+1}} f_{n+1}(i, x_{n+1}).$$

255

Example: determining reject allowances


The Hit-and-Miss Manufacturing Company has received an order to supply one item of a particular type.
The customer has specified stringent quality requirements, so the manufacturer may have to produce more than one item to obtain one that is acceptable. The number of extra items produced is the reject allowance.
The probability that an item is acceptable is 1/2, and the probability that it is defective is 1/2.
The number of acceptable items in a lot of size L therefore has a binomial distribution: the probability of no acceptable item is (1/2)^L.
The setup cost is 300 and the cost per item is 100. The maximum number of production runs is 3. The cost if there is no acceptable item after 3 runs is 1600.

256

Formulation
Objective: determine policy regarding lot size
(1 + reject allowance) for required production run(s)
that minimizes total expected cost.
Stage n = production run n (n = 1,2,3),
xn = lot size for stage n,
State sn = number of acceptable items still needed (1
or 0) at the beginning of stage n.
At stage 1, state s1 = 1.

257

Formulation
fn(sn, xn) = total expected cost for stages n, ..., 3 if the system starts in state sn at stage n, the immediate decision is xn, and optimal decisions are made thereafter:

$$f_n^*(s_n) = \min_{x_n = 0, 1, 2, \ldots} f_n(s_n, x_n),$$

where fn*(0) = 0.

Take the monetary unit to be 100. The contribution to the cost from stage n is then K(xn) + xn, with

$$K(x_n) = \begin{cases} 0, & \text{if } x_n = 0 \\ 3, & \text{if } x_n > 0 \end{cases}$$

Note that f4*(1) = 16.
258

Basic structure of the problem


Recursive relationship:

$$f_n^*(1) = \min_{x_n = 0, 1, 2, \ldots} \left\{ K(x_n) + x_n + \left(\tfrac{1}{2}\right)^{x_n} f_{n+1}^*(1) \right\}, \quad \text{for } n = 1, 2, 3$$

259
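This recursion is simple enough to evaluate directly in code. The sketch below is a minimal Python version that tracks only the state sn = 1 (an acceptable item is still needed; the state 0 trivially costs nothing more). The lot-size cap MAX_LOT = 5 is an assumption for illustration, not a value from the slides.

```python
# Costs in hundreds (monetary unit = 100): setup K = 3 if anything is produced, 1 per item,
# terminal penalty f4*(1) = 16 if no acceptable item remains after the three runs.
MAX_LOT = 5                       # assumed upper bound on the lot size considered per run

def K(x):
    return 3 if x > 0 else 0

f_next = 16.0                     # f4*(1)
policy = {}
for n in (3, 2, 1):               # backward over the three production runs
    values = {x: K(x) + x + (0.5 ** x) * f_next for x in range(MAX_LOT + 1)}
    f_here = min(values.values())
    policy[n] = [x for x, v in values.items() if v == f_here]
    f_next = f_here

print(f_next, policy)             # minimum expected cost (in hundreds) and optimal lot sizes per run
```

The run reproduces the tables on the next slide: f3*(1) = 8 with x3* = 3 or 4, f2*(1) = 7 with x2* = 2 or 3, and f1*(1) = 6 3/4 with x1* = 2.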

Solution procedure
n = 3:
                 f3(1, x3) = K(x3) + x3 + 16(1/2)^x3
s3    x3 = 0    1     2    3    4    5        f3*(s3)    x3*
0                                             0          0
1     16        12    9    8    8    8 1/2    8          3 or 4

n = 2:
                 f2(1, x2) = K(x2) + x2 + (1/2)^x2 f3*(1)
s2    x2 = 0    1    2    3    4              f2*(s2)    x2*
0                                             0          0
1     8         8    7    7    7 1/2          7          2 or 3

n = 1:
                 f1(1, x1) = K(x1) + x1 + (1/2)^x1 f2*(1)
s1    x1 = 0    1        2        3        4            f1*(s1)    x1*
1     7         7 1/2    6 3/4    6 7/8    7 7/16       6 3/4      2

260

Solution
Optimal policy: produce two items on the first
production run;
if none is acceptable, then produce either two or three
items on the second production run;
if none is acceptable, then produce either three or four
items on the third production run.
The total expected cost for this policy is 675.

261

Conclusions
Dynamic programming: very useful technique for
making a sequence of interrelated decisions. It
requires formulating an appropriate recursive
relationship for each individual problem.
Example: problem has 10 stages with 10 states and 10
possible decisions at each stage.
Exhaustive enumeration must consider up to 10 billion
combinations.
Dynamic programming need make no more than a
thousand calculations (10 for each state at each stage).

262
