
PROBABILISTIC DYNAMIC PROGRAMMING

Neal Cristian S. Perlas


Probabilistic Dynamic Programming
(Stochastic Dynamic Programming)
• What does "stochastic" mean? It describes something with a random probability distribution or pattern that can be analyzed statistically but cannot be predicted precisely.
• Uncertainty is involved.
• A given input can lead to different outputs.
• Solved by backward recursion (the backward pass rule).
• Has three basic elements:
a. Stages
b. States
c. Objective

Many problems in Operations Research deal with future planning, and many future events are hard to predict with certainty, so stochastic dynamic programming (SDP) and related techniques are important. According to Bellman and Dreyfus [5], the stochastic case is always the actual situation.
Problem:
An enterprising young statistician believes that she has
developed a system for winning a popular Las Vegas
game. Her colleagues do not believe that her system
works, so they have made a large bet with her that if she
starts with three chips, she will not have at least five chips
after three plays of the game. Each play of the game
involves betting any desired number of available chips
and then either winning or losing this number of chips. The
statistician believes that her system will give her a
probability of 2/3 of winning a given play of the game.

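n (stage) = nth play of the game (n = 1, 2, 3)
$x_n$ (decision) = number of chips to bet at stage n
$s_n$ (state) = number of chips in hand at the start of stage n
Objective = to win the bet against her colleagues (to have at least 5 chips after three plays of the game)
$f_n(s_n, x_n)$ = probability of finishing the three plays with at least five chips, given that she starts stage n with $s_n$ chips, bets $x_n$, and bets optimally afterward
$f_n^*(s_n)$ = maximum of $f_n(s_n, x_n)$ over the feasible bets $x_n = 0, 1, \ldots, s_n$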

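If she wins (probability 2/3), the state at the next stage is $s_{n+1} = s_n + x_n$.
If she loses (probability 1 - 2/3 = 1/3), the state at the next stage is $s_{n+1} = s_n - x_n$.
The recursive relationship is therefore

$$f_n(s_n, x_n) = \frac{1}{3} f_{n+1}^*(s_n - x_n) + \frac{2}{3} f_{n+1}^*(s_n + x_n)$$

$$f_n^*(s_n) = \max_{x_n = 0, 1, \ldots, s_n} f_n(s_n, x_n)$$

with $f_4^*(s_4) = 1$ if $s_4 \ge 5$ and $f_4^*(s_4) = 0$ otherwise.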
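The recursion above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the names `f_star`, `P_WIN`, `TARGET`, and `N_PLAYS` are assumptions chosen here, and exact fractions are used so the result can be compared with the hand calculation.

```python
from fractions import Fraction
from functools import lru_cache

P_WIN = Fraction(2, 3)   # probability of winning one play (from the problem)
TARGET = 5               # chips needed after the last play
N_PLAYS = 3              # number of plays of the game


@lru_cache(maxsize=None)
def f_star(n, s):
    """Best probability of ending with >= TARGET chips when play n starts with s chips."""
    if n > N_PLAYS:
        # No plays left: the bet with the colleagues is already decided.
        return Fraction(1 if s >= TARGET else 0)
    # Try every feasible bet x = 0..s and keep the best expected outcome.
    return max(
        (1 - P_WIN) * f_star(n + 1, s - x) + P_WIN * f_star(n + 1, s + x)
        for x in range(s + 1)
    )


print(f_star(1, 3))
```

Memoizing on `(n, s)` means each state is solved once, which mirrors filling in one table row per state in the backward pass.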
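n = 3

| $s_3$ | $f_3^*(s_3)$ | $x_3^*$ |
|-------|--------------|---------|
| 0 | 0 | - |
| 1 | 0 | - |
| 2 | 0 | - |
| 3 | 2/3 | 2 (or more) |
| 4 | 2/3 | 1 (or more) |
| ≥5 | 1 | 0 (or ≤ $s_3 - 5$) |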
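As a check on the last-stage table, the following sketch enumerates every bet for a given number of chips at the third play and reports the best probability together with all bets that achieve it. The helper names `terminal_value` and `stage3_row` are hypothetical, introduced only for this illustration.

```python
from fractions import Fraction

P_WIN = Fraction(2, 3)   # probability of winning one play (from the problem)
TARGET = 5               # chips needed after the third play


def terminal_value(s):
    """1 if the final chip count wins the bet, else 0."""
    return Fraction(1 if s >= TARGET else 0)


def stage3_row(s3):
    """Return (best probability, list of optimal bets) for s3 chips at the last play."""
    values = {
        x: (1 - P_WIN) * terminal_value(s3 - x) + P_WIN * terminal_value(s3 + x)
        for x in range(s3 + 1)
    }
    best = max(values.values())
    return best, [x for x, v in values.items() if v == best]


for s3 in range(6):
    best, bets = stage3_row(s3)
    # When the best probability is 0, no bet helps, matching the "-" entries.
    print(s3, best, bets if best > 0 else "-")
```

For 3 chips the optimal bets are 2 and 3, which is what "2 (or more)" means in the table once the bet is capped at the chips in hand.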
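n = 2

$$f_2(s_2, x_2) = \frac{1}{3} f_3^*(s_2 - x_2) + \frac{2}{3} f_3^*(s_2 + x_2)$$

Rows for $s_2 = 0$ and $s_2 = 1$:

| $s_2$ | $x_2 = 0$ | $x_2 = 1$ | $f_2^*(s_2)$ | $x_2^*$ |
|-------|-----------|-----------|--------------|---------|
| 0 | 0 | | 0 | - |
| 1 | 0 | 0 | 0 | - |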

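$$f_2(0,0) = \frac{1}{3} f_3^*(0-0) + \frac{2}{3} f_3^*(0+0) = \frac{1}{3} f_3^*(0) + \frac{2}{3} f_3^*(0) = \frac{1}{3}(0) + \frac{2}{3}(0) = 0$$

$$f_2(1,0) = \frac{1}{3} f_3^*(1-0) + \frac{2}{3} f_3^*(1+0) = \frac{1}{3} f_3^*(1) + \frac{2}{3} f_3^*(1) = \frac{1}{3}(0) + \frac{2}{3}(0) = 0$$

$$f_2(1,1) = \frac{1}{3} f_3^*(1-1) + \frac{2}{3} f_3^*(1+1) = \frac{1}{3} f_3^*(0) + \frac{2}{3} f_3^*(2) = \frac{1}{3}(0) + \frac{2}{3}(0) = 0$$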
n = 2, row for $s_2 = 2$:

| $s_2$ | $x_2 = 0$ | $x_2 = 1$ | $x_2 = 2$ | $f_2^*(s_2)$ | $x_2^*$ |
|-------|-----------|-----------|-----------|--------------|---------|
| 2 | 0 | 4/9 | 4/9 | 4/9 | 1 or 2 |

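$$f_2(2,0) = \frac{1}{3} f_3^*(2-0) + \frac{2}{3} f_3^*(2+0) = \frac{1}{3} f_3^*(2) + \frac{2}{3} f_3^*(2) = \frac{1}{3}(0) + \frac{2}{3}(0) = 0$$

$$f_2(2,1) = \frac{1}{3} f_3^*(2-1) + \frac{2}{3} f_3^*(2+1) = \frac{1}{3} f_3^*(1) + \frac{2}{3} f_3^*(3) = \frac{1}{3}(0) + \frac{2}{3}\left(\frac{2}{3}\right) = \frac{4}{9}$$

$$f_2(2,2) = \frac{1}{3} f_3^*(2-2) + \frac{2}{3} f_3^*(2+2) = \frac{1}{3} f_3^*(0) + \frac{2}{3} f_3^*(4) = \frac{1}{3}(0) + \frac{2}{3}\left(\frac{2}{3}\right) = \frac{4}{9}$$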
n = 2, row for $s_2 = 3$:

| $s_2$ | $x_2 = 0$ | $x_2 = 1$ | $x_2 = 2$ | $x_2 = 3$ | $f_2^*(s_2)$ | $x_2^*$ |
|-------|-----------|-----------|-----------|-----------|--------------|---------|
| 3 | 2/3 | 4/9 | 2/3 | 2/3 | 2/3 | 0, 2 or 3 |

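$$f_2(3,0) = \frac{1}{3} f_3^*(3-0) + \frac{2}{3} f_3^*(3+0) = \frac{1}{3} f_3^*(3) + \frac{2}{3} f_3^*(3) = \frac{1}{3}\left(\frac{2}{3}\right) + \frac{2}{3}\left(\frac{2}{3}\right) = \frac{2}{3}$$

$$f_2(3,1) = \frac{1}{3} f_3^*(3-1) + \frac{2}{3} f_3^*(3+1) = \frac{1}{3} f_3^*(2) + \frac{2}{3} f_3^*(4) = \frac{1}{3}(0) + \frac{2}{3}\left(\frac{2}{3}\right) = \frac{4}{9}$$

$$f_2(3,2) = \frac{1}{3} f_3^*(3-2) + \frac{2}{3} f_3^*(3+2) = \frac{1}{3} f_3^*(1) + \frac{2}{3} f_3^*(5) = \frac{1}{3}(0) + \frac{2}{3}(1) = \frac{2}{3}$$

$$f_2(3,3) = \frac{1}{3} f_3^*(3-3) + \frac{2}{3} f_3^*(3+3) = \frac{1}{3} f_3^*(0) + \frac{2}{3} f_3^*(6) = \frac{1}{3}(0) + \frac{2}{3}(1) = \frac{2}{3}$$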
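n = 2, completed table:

$$f_2(s_2, x_2) = \frac{1}{3} f_3^*(s_2 - x_2) + \frac{2}{3} f_3^*(s_2 + x_2)$$

| $s_2$ | $x_2 = 0$ | $x_2 = 1$ | $x_2 = 2$ | $x_2 = 3$ | $x_2 = 4$ | $f_2^*(s_2)$ | $x_2^*$ |
|-------|-----------|-----------|-----------|-----------|-----------|--------------|---------|
| 0 | 0 | | | | | 0 | - |
| 1 | 0 | 0 | | | | 0 | - |
| 2 | 0 | 4/9 | 4/9 | | | 4/9 | 1 or 2 |
| 3 | 2/3 | 4/9 | 2/3 | 2/3 | | 2/3 | 0, 2 or 3 |
| 4 | 2/3 | 8/9 | 2/3 | 2/3 | 2/3 | 8/9 | 1 |
| ≥5 | 1 | | | | | 1 | 0 (or ≤ $s_2 - 5$) |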

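$$f_2(4,0) = \frac{1}{3} f_3^*(4-0) + \frac{2}{3} f_3^*(4+0) = \frac{1}{3} f_3^*(4) + \frac{2}{3} f_3^*(4) = \frac{1}{3}\left(\frac{2}{3}\right) + \frac{2}{3}\left(\frac{2}{3}\right) = \frac{2}{3}$$

$$f_2(4,1) = \frac{1}{3} f_3^*(4-1) + \frac{2}{3} f_3^*(4+1) = \frac{1}{3} f_3^*(3) + \frac{2}{3} f_3^*(5) = \frac{1}{3}\left(\frac{2}{3}\right) + \frac{2}{3}(1) = \frac{8}{9}$$

$$f_2(4,2) = \frac{1}{3} f_3^*(4-2) + \frac{2}{3} f_3^*(4+2) = \frac{1}{3} f_3^*(2) + \frac{2}{3} f_3^*(6) = \frac{1}{3}(0) + \frac{2}{3}(1) = \frac{2}{3}$$

$$f_2(4,3) = \frac{1}{3} f_3^*(4-3) + \frac{2}{3} f_3^*(4+3) = \frac{1}{3} f_3^*(1) + \frac{2}{3} f_3^*(7) = \frac{1}{3}(0) + \frac{2}{3}(1) = \frac{2}{3}$$

$$f_2(4,4) = \frac{1}{3} f_3^*(4-4) + \frac{2}{3} f_3^*(4+4) = \frac{1}{3} f_3^*(0) + \frac{2}{3} f_3^*(8) = \frac{1}{3}(0) + \frac{2}{3}(1) = \frac{2}{3}$$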
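The second-stage rows can also be regenerated mechanically from the third-stage values. The sketch below is only an illustration: `f3_star`, `f2`, and `stage2_row` are names assumed here, and `f3_star` simply hard-codes the values from the n = 3 table.

```python
from fractions import Fraction

P_WIN = Fraction(2, 3)   # probability of winning one play (from the problem)


def f3_star(s):
    """Stage-3 optimal values, copied from the n = 3 table."""
    if s >= 5:
        return Fraction(1)
    return Fraction(2, 3) if s >= 3 else Fraction(0)


def f2(s2, x2):
    """Probability of winning the bet when wagering x2 out of s2 chips at play 2."""
    return (1 - P_WIN) * f3_star(s2 - x2) + P_WIN * f3_star(s2 + x2)


def stage2_row(s2):
    """Return (values for x2 = 0..s2, best value, bets achieving the best value)."""
    values = [f2(s2, x2) for x2 in range(s2 + 1)]
    best = max(values)
    return values, best, [x for x, v in enumerate(values) if v == best]


for s2 in range(5):
    values, best, bets = stage2_row(s2)
    print(s2, [str(v) for v in values], str(best), bets)
```

For states whose best value is 0 the code lists every bet as tied, whereas the table shows "-" because no bet can win from there.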
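n = 1

$$f_1(s_1, x_1) = \frac{1}{3} f_2^*(s_1 - x_1) + \frac{2}{3} f_2^*(s_1 + x_1)$$

| $s_1$ | $x_1 = 0$ | $x_1 = 1$ | $x_1 = 2$ | $x_1 = 3$ | $f_1^*(s_1)$ | $x_1^*$ |
|-------|-----------|-----------|-----------|-----------|--------------|---------|
| 3 | 2/3 | 20/27 | 2/3 | 2/3 | 20/27 | 1 |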

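$$f_1(3,0) = \frac{1}{3} f_2^*(3-0) + \frac{2}{3} f_2^*(3+0) = \frac{1}{3} f_2^*(3) + \frac{2}{3} f_2^*(3) = \frac{1}{3}\left(\frac{2}{3}\right) + \frac{2}{3}\left(\frac{2}{3}\right) = \frac{2}{3}$$

$$f_1(3,1) = \frac{1}{3} f_2^*(3-1) + \frac{2}{3} f_2^*(3+1) = \frac{1}{3} f_2^*(2) + \frac{2}{3} f_2^*(4) = \frac{1}{3}\left(\frac{4}{9}\right) + \frac{2}{3}\left(\frac{8}{9}\right) = \frac{20}{27}$$

$$f_1(3,2) = \frac{1}{3} f_2^*(3-2) + \frac{2}{3} f_2^*(3+2) = \frac{1}{3} f_2^*(1) + \frac{2}{3} f_2^*(5) = \frac{1}{3}(0) + \frac{2}{3}(1) = \frac{2}{3}$$

$$f_1(3,3) = \frac{1}{3} f_2^*(3-3) + \frac{2}{3} f_2^*(3+3) = \frac{1}{3} f_2^*(0) + \frac{2}{3} f_2^*(6) = \frac{1}{3}(0) + \frac{2}{3}(1) = \frac{2}{3}$$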

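Reading the tables forward gives the optimal policy. Bet 1 chip on the first play. If she wins, she has 4 chips and bets 1 again; a second win leaves 5 chips, so she bets 0 on the third play, while a loss leaves 3 chips, so she bets 2 or 3. If she loses the first play, she has 2 chips and bets 1 or 2; a win then leaves 3 or 4 chips, from which she bets 2 or 3 (with 3 chips) or 1 to 4 (with 4 chips), while a loss means the bet is lost.

This policy gives the statistician a probability of 20/27 of winning her bet with her colleagues.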
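The 20/27 figure can be double-checked by fixing one optimal bet for each reachable state and summing the probabilities of all eight win/loss sequences. This is a sketch under stated assumptions: the `POLICY` dictionary picks one bet where the tables show ties, and `success_probability` is a hypothetical helper written for this check.

```python
from fractions import Fraction
from itertools import product

P_WIN = Fraction(2, 3)   # probability of winning one play (from the problem)

# One optimal bet per (play, chips) pair, read off the tables.
# Ties are broken arbitrarily; (3, 1) cannot win, so its bet is irrelevant.
POLICY = {
    (1, 3): 1,
    (2, 4): 1, (2, 2): 1,
    (3, 5): 0, (3, 3): 2, (3, 1): 0,
}


def success_probability(start_chips=3, target=5):
    """Exact probability of holding >= target chips after three plays under POLICY."""
    total = Fraction(0)
    for outcomes in product([True, False], repeat=3):
        chips, prob = start_chips, Fraction(1)
        for play, won in enumerate(outcomes, start=1):
            bet = POLICY[(play, chips)]
            chips += bet if won else -bet
            prob *= P_WIN if won else 1 - P_WIN
        if chips >= target:
            total += prob
    return total


print(success_probability())
```

Enumerating outcomes rather than simulating them keeps the check exact, so the result can be compared directly with the fraction obtained from the backward recursion.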
Thank
You!