Contents

1 Introduction
  1.1 Decision problems
    1.1.1 Choice of medical treatment (adapted from Weinstein and Fineberg [1980, p. 180])
    1.1.2 A choice between two languages for computer programs
    1.1.3 Monty Hall
  1.2 Procedures for handling decision problems: the decision theoretic approach
    1.2.1 Preliminaries
    1.2.2 Examples
2 Classical Decision Theory
  2.4 Exercises
  2.5 Solutions
3 Supersoft Decision Theory
4 Other Approaches to Decisions under Uncertainty
5 Evaluations and Choice Rules
Appendices
Chapter 1
Introduction
What will I suffer now? These words by Odysseus, one of the most successful decision makers of Western literature, express what most people at times experience both in their private lives and as professionals. It is also worth noting that they are not outdated: in contrast to other areas, people as decision makers generally rely more on intuition than on rules when solving decision problems. Since such a procedure can hardly be seen as an ideal one, especially if a proposed solution must be supported by arguments, it is natural to look into the possibility of finding formal methods, like those in logic, for solving decision problems.
1.1
Decision problems
If you are looking for the right kind of white colour for the walls of your drawing-room, or the perfect sour milk, then you are facing an optimization problem. If instead you are pondering which brand of yogurt to buy at your local grocery, or trying to find the ingredients of a decent meal there, then you are facing a problem of choice, or a problem of satisfaction. Decision problems come in one of these forms or as combinations of them. A good example of the latter case is the procedure Swedish authorities follow when selecting tenders under the Act of Public Procurement. First each tender is subjected to a test in order to see if it fulfills certain minimum requirements. Then a selection is made from the tenders that have passed the initial test.
There is also another and perhaps more familiar way of classifying decisions, namely in terms of timescales. In this course only decisions where there is ample time for deliberation are studied. Hence operational decisions made by firefighters, policemen and soldiers in the field are not considered. Moreover, the main focus will be on professional decision making since such
1.1.1 Choice of medical treatment (adapted from Weinstein and Fineberg [1980, p. 180])
A 68 year old woman has suffered from bad circulation in the left leg for some time. She is now seeing a doctor because of an injury that has led to an infection that may develop into gangrene in the left foot. There are two available options: either an immediate operation O leading to the insertion of an artificial limb below the left knee, or a treatment with drugs during three months M. Such a treatment is successful in seven out of ten cases. But if it fails, a more complicated operation leading to the insertion of an artificial limb above the left knee is needed. The doctor considers the probabilities for a successful operation to be 99 percent if it is done right away, and 90 percent if it is done after an unsuccessful treatment with drugs. The doctor is also willing to let the preferences of the patient influence her proposals. How should she act?
1.1.2 A choice between two languages for computer programs
1.1.3 Monty Hall
Suppose that you are to select one of three doors in a popular American TV show. You are informed that there is a car behind one of the doors, and a lemon behind each of the other ones. When you have made your choice, the host, Monty Hall, opens one of the doors that you didn't choose, and behind which there is a lemon. You are now invited to make a new choice. Should you accept the invitation and choose the other, still closed, door?
1.2 Procedures for handling decision problems: the decision theoretic approach

1.2.1 Preliminaries

1.2.2 Examples
v(r) > v(b) > v(s)    (1.1)

v(r) > v(a) > v(s)    (1.2)

meaning that recovery r is better than an artificial limb below the knee b, which in turn is better than serious consequences s. Likewise, recovery is better than an artificial limb above the knee a, which is better than serious consequences. Most patients would also set

v(b) > v(a)    (1.3)
Figure 1.2: The tree of the choice of a drug treatment. The probability of recovery r is 70 percent. Should the drug treatment fail, the probability of a successful operation with an artificial limb above the knee a is 90 percent, and the probability of serious consequences s is 10 percent.

In the tree of M (see figure 1.2), the symbol o stands for an operation, given that the drug treatment failed.
A mean consequence value is the weighted average of the mean consequence values of the subsequent nodes, where the weighted average is the sum of all consequence values multiplied by their respective probabilities. Looking at the tree in figure 1.3, the weighted average of the nodes is calculated as

a P(a) + b P(b) + c P(c)

where a, b and c represent the values of the nodes and P is the probability function. You can read more about the probability function in appendix 1.
The mean consequence value of M is the weighted average of v(o) and v(r). We have already set v(r) = 1, but we need to calculate the value of o, which in turn is the weighted average of v(s) and v(a), in order to calculate

M_M = P(r) v(r) + P(o) v(o)
    = 0.7 · 1 + 0.3 (0.9 v(a) + 0.1 v(s))
    = 0.7 + 0.27 v(a)    (1.4)

since v(r) = 1 and v(s) = 0. Likewise, the mean consequence value of O is

M_O = P(b) v(b) + P(s) v(s)
    = P(b) v(b) + P(s) · 0
    = P(b) v(b)
    = 0.99 v(b)    (1.5)
To determine which of the alternatives O and M has the highest value, we can calculate the difference ∆M between the mean consequence values of the two alternatives as

∆M = M_M - M_O
   = 0.27 v(a) + 0.7 - 0.99 v(b)    (1.6)
Figure 1.4: From the plot we can see that ∆M approaches its maximum when both v(a) and v(b) approach 0, and approaches its minimum when v(a) approaches 0 and v(b) approaches 1.
and hence ∆M must lie somewhere in the interval

-0.29 < ∆M < 0.7.
Since ∆M can take on both negative and positive values, we need more constraints on the values in order to favor one of the options. In particular we must introduce the notion of distance between values. One such constraint that comes to mind is

v(r) - v(b) ≥ v(b) - v(a)

which reflects the opinion that the distance in value between recovery and an artificial limb below the knee is not smaller than the distance between the two different artificial limbs. To employ the inequality we can rearrange things a bit:
v(r) - v(b) ≥ v(b) - v(a)
v(r) ≥ 2 v(b) - v(a)
v(r) + v(a) ≥ 2 v(b)
(v(r) + v(a)) / 2 ≥ v(b)

and, just as above, setting v(r) = 1 gives

v(b) ≤ (1 + v(a)) / 2    (1.7)
and thus a new interval for ∆M, depending only on v(a), where 0 < v(a) < 1, as in figure 1.5.
Figure 1.5: This plot shows the boundaries of ∆M as functions of v(a) only.
By using (1.7) and solving

0.7 - 0.72 v(a) = 0  ⟺  v(a) = 0.7/0.72 ≈ 0.9722

we see that the upper boundary of ∆M is negative only for v(a) above 0.9722. This alone does not settle the choice, because it seems safe to assume that the value of an artificial limb above the knee, v(a), is at least as high as 0.8. Setting v(a) = 0.8 in the original equation for ∆M gives us

∆M = 0.27 v(a) + 0.7 - 0.99 v(b)    (1.8)
   = 0.27 · 0.8 + 0.7 - 0.99 v(b)   (1.9)
   = 0.916 - 0.99 v(b)              (1.10)
which in turn means that v(b) must be higher than 0.925... if ∆M is to be less than zero, and consequently an immediate operation O is to be preferred.
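The computation above can be sketched in a few lines of Python. The probabilities (0.7, 0.9, 0.99) are the ones stated in the example; the function names are ours, not the text's.

```python
# Mean consequence values of the two options as functions of the patient's
# values v(a) and v(b), with v(r) = 1 and v(s) = 0 as set in the text.

def mean_value_M(v_a, v_r=1.0, v_s=0.0):
    """Drug treatment: recovery with prob. 0.7; otherwise a later operation
    that succeeds (limb above the knee) with prob. 0.9."""
    v_o = 0.9 * v_a + 0.1 * v_s      # value of the later operation
    return 0.7 * v_r + 0.3 * v_o     # = 0.7 + 0.27 * v_a

def mean_value_O(v_b, v_s=0.0):
    """Immediate operation: success (limb below the knee) with prob. 0.99."""
    return 0.99 * v_b + 0.01 * v_s   # = 0.99 * v_b

def delta_M(v_a, v_b):
    """Positive: prefer the drug treatment M; negative: prefer O."""
    return mean_value_M(v_a) - mean_value_O(v_b)

print(delta_M(0.8, 0.9))    # positive: choose M
print(delta_M(0.8, 0.95))   # negative: choose O
```

With v(a) = 0.8 the sign flips exactly where the text says it should, at v(b) = 0.916/0.99 ≈ 0.925.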
Suitable language for a computer program: a decision theoretic approach

As a first step it was agreed upon that the consequences ci of the two options were the following ones:
Option A
  c1: Prototype in C ready in September 1988, costs quite high, staff not entirely satisfied
  c2: Prototype in C somewhat delayed due to circumstances outside the control of TR
  c3: Neither c1 nor c2

Option B
  c4: Prototype in Prolog ready in September 1988, low costs, staff pleased
  c5: Prototype in Prolog somewhat delayed due to circumstances outside the control of TR
  c6: Only fractions of a prototype ready in September 1988, low costs, staff frustrated
  c7: None of c4, c5 or c6
It was then agreed upon that the probabilities and values were the following ones:

Probabilities: The probability of c1 is quite high, at least 2/3. It is almost certain that c1 or c2 occurs. The probability of the event c4 or c5 is quite uncertain; it could be as low as 0.1 and as high as 0.6. The event c4 or c5 or c6 is almost certain.
These estimates can be represented as

2/3 ≤ P(c1),  P(c1) + P(c2) = 1,    (1.11)

0.1 ≤ P(c4) + P(c5) ≤ 0.6,  P(c4) + P(c5) + P(c6) = 1,    (1.12)

0 < v(c4) - v(c5) < v(c5) - v(c1) < v(c2) < 1.    (1.13)
Here P(ci) is the probability of ci, and v(ci) is its numerical value. Note that the representation is not to be viewed as an exact one. For instance, the interval [0.1, 0.6] should be viewed as sufficiently large for P(c4) + P(c5). Hence we must be careful not to let a ranking of the options depend on values near the boundaries.
To simplify the comparison between the options we use the variables a, b and c such that

a = P(c1),  b = P(c4),  c = P(c5).
Take note that we now, according to (1.11) and (1.12), can substitute P(c2) with 1 - a, and P(c6) with 1 - (b + c). In addition, we set

v(c4) = 1,  v(c6) = 0    (1.14)

to reflect the best and the worst consequence, and use the variables x and y where

x = v(c2)    (1.15)

y = v(c1) - x = v(c1) - v(c2) = v(c4) - v(c5)    (1.16)
The intervals for x and y are a bit more tricky. In (1.15) we set x = v(c2) and in (1.16)

y = v(c1) - x = v(c1) - v(c2) = v(c4) - v(c5).

In other words, y represents the difference between v(c1) and v(c2), and between v(c4) and v(c5). Since v(c6) = 0, x represents the difference between v(c2) and v(c6). If we add another variable z = v(c5) - v(c1) and look at the inequalities in (1.13), we see that

0 < y < z < x < 1,  2y + z + x = 1

and thus we have that

0 < y < 0.25 < x < 1,  x + y > 0.5.
We now start the comparison of E[A] and E[B] by computing the maximum, minimum and mean of both, using the domain

D = {(a, b, c) | 2/3 ≤ a ≤ 1, 0 ≤ b, 0 ≤ c, 0.1 ≤ b + c ≤ 0.6}

specified above. Repeating the formulas for the expected values of A and B we have

E[A] = x + ay    (1.21)

E[B] = b + c(1 - y)    (1.22)
and

mean(E[B]) = ∫∫∫_D (b + c(1 - y)) dA / ∫∫∫_D dA.
Starting with

∫∫∫_D dA

we notice that the boundaries of a don't pose any particular problems. However, the boundaries of b and c are not as simple. Looking at the region plot in figure 1.6 we see that the region bounded by b and c forms a triangle with one of the corners missing. Thus we can start by integrating over the whole triangle and then subtract the integration over the missing corner.
Figure 1.6: The region in the (b, c) plane: a triangle with one corner missing.

The hypotenuse of the triangle is the line c = 3/5 - b, where 0 ≤ b ≤ 3/5, and the line making up the top boundary of the missing corner can similarly be expressed as c = 1/10 - b, where 0 ≤ b ≤ 1/10.
This gives

∫_{a=2/3}^{1} ∫_{b=0}^{3/5} ∫_{c=0}^{3/5-b} dc db da
= ∫_{a=2/3}^{1} ∫_{b=0}^{3/5} [c]_{0}^{3/5-b} db da
= ∫_{a=2/3}^{1} ∫_{b=0}^{3/5} (3/5 - b) db da
= ∫_{a=2/3}^{1} [3b/5 - b²/2]_{0}^{3/5} da
= ∫_{a=2/3}^{1} (9/25 - 9/50) da
= [9a/25 - 9a/50]_{2/3}^{1}
= (9/25 - 9/50) - (6/25 - 6/50)
= 3/50
and

∫_{a=2/3}^{1} ∫_{b=0}^{1/10} ∫_{c=0}^{1/10-b} dc db da
= ∫_{a=2/3}^{1} ∫_{b=0}^{1/10} [c]_{0}^{1/10-b} db da
= ∫_{a=2/3}^{1} ∫_{b=0}^{1/10} (1/10 - b) db da
= ∫_{a=2/3}^{1} [b/10 - b²/2]_{0}^{1/10} da
= ∫_{a=2/3}^{1} (1/100 - 1/200) da
= [a/100 - a/200]_{2/3}^{1}
= (1/100 - 1/200) - (1/150 - 1/300)
= 1/600
which finally gives

∫∫∫_D dA = 3/50 - 1/600 = 7/120.
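As a sanity check on the arithmetic, the same volume can be computed exactly with Python's fractions module, using the geometric shortcut that the integrand is 1: the a-range [2/3, 1] contributes a factor 1/3, and the (b, c) region is a triangle with legs 3/5 minus a corner triangle with legs 1/10.

```python
from fractions import Fraction as F

# Exact double-check of the volume 7/120 computed above.
triangle = F(1, 2) * F(3, 5) ** 2    # area 9/50
corner   = F(1, 2) * F(1, 10) ** 2   # area 1/200
a_length = F(1) - F(2, 3)            # 1/3

volume = a_length * (triangle - corner)
print(volume)  # 7/120
```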
Continuing in the same manner with calculating

∫∫∫_D (x + ay) dA = ∫_{a=2/3}^{1} ∫_{b=0}^{3/5} ∫_{c=0}^{3/5-b} (x + ay) dc db da
                  - ∫_{a=2/3}^{1} ∫_{b=0}^{1/10} ∫_{c=0}^{1/10-b} (x + ay) dc db da,

the first term is

∫_{a=2/3}^{1} ∫_{b=0}^{3/5} ∫_{c=0}^{3/5-b} (x + ay) dc db da
= ∫_{a=2/3}^{1} ∫_{b=0}^{3/5} [cx + acy]_{0}^{3/5-b} db da
= ∫_{a=2/3}^{1} ∫_{b=0}^{3/5} (3x/5 - bx + 3ay/5 - aby) db da
= ∫_{a=2/3}^{1} [3bx/5 - b²x/2 + 3aby/5 - ab²y/2]_{0}^{3/5} da
= ∫_{a=2/3}^{1} (9x/25 - 9x/50 + 9ay/25 - 9ay/50) da
= [9ax/25 - 9ax/50 + 9a²y/50 - 9a²y/100]_{2/3}^{1}
= 9x/50 + 9y/100 - 18x/150 - 36y/900
= 3x/50 + y/20
and the second term is

∫_{a=2/3}^{1} ∫_{b=0}^{1/10} ∫_{c=0}^{1/10-b} (x + ay) dc db da
= ∫_{a=2/3}^{1} ∫_{b=0}^{1/10} [cx + acy]_{0}^{1/10-b} db da
= ∫_{a=2/3}^{1} ∫_{b=0}^{1/10} (x/10 - bx + ay/10 - aby) db da
= ∫_{a=2/3}^{1} [bx/10 - b²x/2 + aby/10 - ab²y/2]_{0}^{1/10} da
= ∫_{a=2/3}^{1} (x/100 - x/200 + ay/100 - ay/200) da
= [ax/100 - ax/200 + a²y/200 - a²y/400]_{2/3}^{1}
= x/200 + y/400 - x/300 - 4y/3600
= x/600 + y/720
resulting in

∫∫∫_D (x + ay) dA = (3x/50 + y/20) - (x/600 + y/720) = 7x/120 + 7y/144

and

mean(E[A]) = ∫∫∫_D (x + ay) dA / ∫∫∫_D dA
           = (7x/120 + 7y/144) / (7/120)
           = x + 5y/6.    (1.23)
In the same manner,

∫∫∫_D (b + c(1 - y)) dA = ∫_{a=2/3}^{1} ∫_{b=0}^{3/5} ∫_{c=0}^{3/5-b} (b + c(1 - y)) dc db da
                        - ∫_{a=2/3}^{1} ∫_{b=0}^{1/10} ∫_{c=0}^{1/10-b} (b + c(1 - y)) dc db da.

The first term is

∫_{a=2/3}^{1} ∫_{b=0}^{3/5} [bc + c²(1 - y)/2]_{0}^{3/5-b} db da
= ∫_{a=2/3}^{1} ∫_{b=0}^{3/5} (9/50 - b²/2 - 9y/50 + 3by/5 - b²y/2) db da
= ∫_{a=2/3}^{1} [9b/50 - b³/6 - 9by/50 + 3b²y/10 - b³y/6]_{0}^{3/5} da
= ∫_{a=2/3}^{1} (9/125 - 9y/250) da
= [9a/125 - 9ay/250]_{2/3}^{1}
= (9/125 - 9y/250) - (6/125 - 3y/125)
= 3/125 - 3y/250

and the second term is

∫_{a=2/3}^{1} ∫_{b=0}^{1/10} [bc + c²(1 - y)/2]_{0}^{1/10-b} db da
= ∫_{a=2/3}^{1} ∫_{b=0}^{1/10} (1/200 - b²/2 - y/200 + by/10 - b²y/2) db da
= ∫_{a=2/3}^{1} [b/200 - b³/6 - by/200 + b²y/20 - b³y/6]_{0}^{1/10} da
= ∫_{a=2/3}^{1} (1/3000 - y/6000) da
= [a/3000 - ay/6000]_{2/3}^{1}
= (1/3000 - y/6000) - (1/4500 - y/9000)
= 1/9000 - y/18000,

so that

∫∫∫_D (b + c(1 - y)) dA = (3/125 - 3y/250) - (1/9000 - y/18000) = 43/1800 - 43y/3600.

We then have

mean(E[B]) = ∫∫∫_D (b + c(1 - y)) dA / ∫∫∫_D dA
           = (43/1800 - 43y/3600) / (7/120)
           = 43/105 - 43y/210.    (1.24)
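Equation (1.24) can be double-checked exactly in the same style. The (b, c) region is symmetric in b and c, so the integrals of b and of c over it coincide; over a right triangle with legs L, the integral of b equals L³/6, and the a-range [2/3, 1] contributes a factor 1/3. The helper names below are ours.

```python
from fractions import Fraction as F

def tri_b(L):                 # integral of b over a triangle with legs L
    return F(L) ** 3 / 6

region_b = tri_b(F(3, 5)) - tri_b(F(1, 10))                         # 43/1200
volume = F(1, 3) * (F(1, 2) * F(3, 5)**2 - F(1, 2) * F(1, 10)**2)   # 7/120

def mean_E_B(y):
    y = F(y)
    # integral of b + c(1 - y) over D; the c-part equals region_b by symmetry
    integral = F(1, 3) * (region_b + (1 - y) * region_b)
    return integral / volume

print(mean_E_B(0))  # 43/105
```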
where again 0.25 < x < 1 and 0 < y < 0.25. Hence only the maximum can have different signs, whilst the mean and the minimum always will be positive. But if D is replaced by

D′ = {(a, b, c) | 2/3 ≤ a ≤ 1, 0 ≤ b, 0 ≤ c, 0.1 ≤ b + c ≤ 0.5}    (1.25)
Monty Hall

We would have reasoned as follows. In line with the classical theory of probability, the distribution of objects behind the doors should be viewed as the outcome of a stochastic experiment. Moreover, there are two possibilities with regard to the distributions. Either all distributions are equally likely, or the car is most likely to appear behind the door in the middle, since the show is sponsored by a car maker who wants the best display of the car. These considerations would have led us to select the door in the middle. Assume now that the host opens door number one. Is it then more likely that the car is behind door number three than behind the door in the middle? An answer to this question must await the result of a few calculations depending on the cases mentioned above.
Case 1. All distributions are equally likely. Let Ci be the event that the car has been put behind door i. Then

P(C1) = P(C2) = P(C3) = 1/3.

Now, let Si be the event that door i has been selected and Oi the event that door number i has been opened by the host. Then

P(Ci) = P(Ci | Sj)
for i, j = 1, 2, 3. In other words, since the objects were put behind the doors in advance, the probability that the car has been put behind door i remains the same, regardless of which door has been selected; the expression P(Ci | Sj) is the probability that the car has been put behind door i, given that door j has been selected, see Appendix 1. Moreover, Monty Hall will never open the door behind which there is a car, thus

P(Oi | Ci and Sj) = 0

for i, j = 1, 2, 3. However, since we don't know whether Monty Hall favours one of the doors left when there is a possibility of choice (in other words, when the door behind which the car is hidden has been selected), we have to set

P(Oj | Ci and Si) = p

where i, j = 1, 2, 3, i ≠ j and 0 ≤ p ≤ 1. Lastly, we can be certain that Monty Hall won't open the door with the car behind it, and thus

P(Ok | Ci and Sj) = 1

for i, j, k = 1, 2, 3 where i, j and k are pairwise distinct.
Now we can calculate the probability that the car is behind the door we selected initially, given that Monty Hall opened one of the other doors, as

P(Ci | Oj and Si) = P(Ci and Oj and Si) / P(Oj and Si).

Monty Hall never opens the door hiding the car, and thus we don't need to consider the case when the car is behind door j. Consequently

P(Oj and Si) = P(Oj and Si and Ci) + P(Oj and Si and Ck)

but applying the product rule to the individual terms yields

P(Oj and Si) = P(Oj | Si and Ci) · P(Si and Ci) + P(Oj | Si and Ck) · P(Si and Ck)
             = P(Oj | Si and Ci) · P(Ci | Si) · P(Si) + P(Oj | Si and Ck) · P(Ck | Si) · P(Si).
Since we've already specified the probabilities of the factors at the beginning of this section, we can finally write the denominator as

P(Oj and Si) = p · (1/3) · P(Si) + 1 · (1/3) · P(Si)
             = (p/3) P(Si) + (1/3) P(Si)
             = (P(Si)/3) (p + 1).
Going back to our original expression, replacing the numerator and the denominator with the expressions derived above, we have

P(Ci | Oj and Si) = P(Ci and Oj and Si) / P(Oj and Si)
                  = (p · P(Si)/3) / ((P(Si)/3)(p + 1))
                  = p / (p + 1).

Since 0 ≤ p ≤ 1 we have p/(p + 1) ≤ 1/2,
and thus we would recommend choosing the other, still closed, door.
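A quick simulation, which is our own addition rather than part of the text, confirms the recommendation: whatever the host's bias p when he has a choice, staying wins about one time in three and switching about two times in three.

```python
import random

def play(switch, p=0.5, rng=random):
    """One round of the game; the host opens a goat door. When he can choose
    between two goat doors, he picks the lower-numbered one with prob. p."""
    car = rng.randrange(3)
    pick = rng.randrange(3)
    goats = [d for d in range(3) if d != pick and d != car]
    if len(goats) == 2:                                   # host has a choice
        opened = goats[0] if rng.random() < p else goats[1]
    else:
        opened = goats[0]
    if switch:
        pick = next(d for d in range(3) if d != pick and d != opened)
    return pick == car

rng = random.Random(0)
n = 100_000
stay   = sum(play(False, rng=rng) for _ in range(n)) / n
change = sum(play(True,  rng=rng) for _ in range(n)) / n
print(stay, change)   # roughly 1/3 and 2/3
```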
Case 2. The distribution of cars is skewed. In contrast to the previous case, we now consider the case where P(Ci) > P(Cj) = P(Ck) and set

P(Ci) = q  and  P(Cj) = P(Ck) = (1 - q)/2

where q > 1/3.
A calculation parallel to the one in Case 1 then yields

P(Ci | Oj and Si) = pq / (pq + (1 - q)/2)

and, taking the mean of this expression over a domain D′ of admissible values of p and q,

∫_{D′} pq/(pq + (1 - q)/2) dA / ∫_{D′} dA ≈ 0.84.
Chapter 2
Classical Decision Theory
2.1
(c) 550 SEK if the product of the outcomes is a square number and neither
(a) nor (b) holds.
Game 3
One throw with four distinguishable dice.
Stake: 200 SEK
Prize: 202 SEK if the product of the outcomes is at least two.
Game 4
One throw with four distinguishable dice.
Stake: 200 SEK
Prize: 250 000 SEK if the product of the outcomes is one.
According to the classical theory all outcomes of these lotteries are equally
likely. Moreover, each lottery has a value that is equal to its expected prize,
which is defined as the weighted arithmetic mean of the prizes with their
probabilities serving as weights. Accordingly, the classical theory assigns the
following expected prizes to our lotteries:
Game 1: (11/1296) · 12000 + (16/1296) · 4000 + (54/1296) · 1000 = 192.90

Game 2: (24/1296) · 3000 + (65/1296) · 2000 + (110/1296) · 550 = 202.55

Game 3: (1295/1296) · 202 = 201.84

Game 4: (1/1296) · 250000 = 192.90
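These expected prizes can be recomputed exactly; the probabilities (counts out of 1296 = 6⁴ equally likely outcomes of four dice) are the ones given above.

```python
from fractions import Fraction as F

# Expected prize of each game: the probability-weighted mean of the prizes.
games = {
    1: [(F(11, 1296), 12000), (F(16, 1296), 4000), (F(54, 1296), 1000)],
    2: [(F(24, 1296), 3000), (F(65, 1296), 2000), (F(110, 1296), 550)],
    3: [(F(1295, 1296), 202)],
    4: [(F(1, 1296), 250000)],
}

expected = {}
for g, prizes in games.items():
    expected[g] = sum(p * prize for p, prize in prizes)
    print(f"Game {g}: {float(expected[g]):.2f}")
```

Note that games 1 and 4 come out with exactly the same expected prize, 250000/1296 SEK.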
Game 3: 31
Game 4: 48 188 098
If we then demand that the probability of a mean deviation of at most 4 SEK from the expected prize should be at least 0.99, then Chebyshev's inequality yields the following numbers:
Game 1: 8 901 305
Game 2: 2 199 588
Game 3: 197
Game 4: 301 175 611
Now, these estimates are mainly interesting from a historical point of view, since modern sharper estimates yield lower but still forbiddingly high numbers. The Bernstein-Bennett inequality (Bennett, 1962), for instance, yields the following ones:
Game 1: 953 608
Game 2: 235 540
Game 3: 21
Game 4: 32 134 317
Using Talagrand's (1995) inequality we obtain even smaller numbers. Note that it isn't applicable to game 3, due to the size of the interval in comparison to the maximum win:
Game 1: 810 700
Game 2: 200 185
Game 4: 27 332 010
Moreover, extensive simulations (see Appendix 3) yield numbers in the same region. Hence these numbers are likely to be the true ones.
But then the evaluation of games of chance given by the classical theory seems rather unconvincing, and its influence almost a mystery. As a basis for a choice between the lotteries above, the following considerations seem more natural. A risk averse person should choose game 3, whereas one only interested in games where there are prospects for a substantial gain should choose game 4. If, on the other hand, someone badly needs 12 000 SEK, then game 1 ought to be the most attractive one. Only if such considerations are not met with success does a trade-off between prizes and probabilities seem to be needed. Then basing a choice between games on their expected prizes seems attractive mainly because the expected prize takes care of probabilities and prizes in such a simple way.
After this account, the salient features of Pure Classical Decision Theory may be summarized as follows:

- it aims at supporting a choice between options which are completely analyzed, i.e. the consequences of each option are determined and have prizes attached to them,

- the probabilities of the consequences are determined by considerations of symmetry, and

- the value of an option equals its expected prize.
2.2

2.2.1 Introduction to utilities

Consider the utility function

U(x) = x if x ≤ 1000, and U(x) = (1000/3) · log10(x) if x > 1000.    (2.1)
Then the expected utilities of the games in section 2.1 are as follows:

Game 1: (11/1296) U(12000) + (16/1296) U(4000) + (54/1296) U(1000) = 68.03

Game 2: (24/1296) U(3000) + (65/1296) U(2000) + (110/1296) U(550) = 123.33

Game 3: (1295/1296) U(202) = 201.84

Game 4: (1/1296) U(250000) = 1.39
Hence a person who opts for choosing between games based on this utility
function should choose game 3, provided she evaluates the games in accordance with the utility principle: the value of a game equals its expected
utility.
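A sketch of (2.1) and the resulting expected utilities in Python, assuming the reconstruction of the utility function given above (linear up to 1000 SEK, logarithmic beyond, continuous at 1000):

```python
import math

def U(x):
    # Utility (2.1): linear below 1000, (1000/3) * log10(x) above.
    return x if x <= 1000 else 1000 * math.log10(x) / 3

games = {
    1: [(11 / 1296, 12000), (16 / 1296, 4000), (54 / 1296, 1000)],
    2: [(24 / 1296, 3000), (65 / 1296, 2000), (110 / 1296, 550)],
    3: [(1295 / 1296, 202)],
    4: [(1 / 1296, 250000)],
}

eu = {g: sum(p * U(prize) for p, prize in prizes) for g, prizes in games.items()}
for g, val in eu.items():
    print(f"Game {g}: expected utility {val:.2f}")
```

With this U, game 3 indeed gets the highest expected utility, as the text states.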
But what grounds are there for evaluating games in accordance with this
principle? In case
U (x) = ax + b with a > 0
you can rank the games via the Law of Large Numbers since the expected
utility of a game g then equals
aE[g] + b.
But otherwise other grounds must be provided. Attempts in this direction
will be discussed in chapter 5.
Chapter 3
Supersoft Decision Theory
Supersoft Decision Theory (SSD) is a family of modifications of Classical Decision Theory. The common trait is that vague and numerically imprecise estimates of probabilities and values are allowed. The main advantage of admitting such estimates is that it makes a smooth application of decision theoretic methods to new problems feasible. This much can be seen from the first two examples of chapter 1. The main drawback, of course, is that calculations then become much harder and therefore in many cases require suitable software. In this chapter one version of the theory, called weak SSD (wSSD), is delineated. For other versions, see Danielson (1997), Ekenberg (2005), and Sundgren (2011).
3.1
We assume that the number of options is finite and that each option has a
finite number of consequences. Sometimes the consequences of an option are
best described as nodes of a tree. Such a tree can in some cases be quite
extensive, see e.g. Johansson (2003, 300-315).
Mathematically, a finite tree can be identified with a finite set T of finite sequences of natural numbers such that

i. T contains exactly one sequence of minimal length, called the root of the tree, and

ii. if (s1, ..., s_{n-1}, s_n) is an element of T, then either (s1, ..., s_{n-1}, s_n) is the element of minimal length or (s1, ..., s_{n-1}) is an element of T.

If (s1, ..., s_{n-1}, s_n) and (s1, ..., s_{n-1}) both are in T, then (s1, ..., s_{n-1}, s_n) is an immediate successor of (s1, ..., s_{n-1}) in T.
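The two conditions can be checked mechanically if a tree is represented as a Python set of tuples; the helper name below is ours.

```python
def is_tree(T):
    """Check conditions i and ii for a set T of tuples of natural numbers."""
    if not T:
        return False
    root = min(T, key=len)
    # i. exactly one sequence of minimal length
    if sum(1 for s in T if len(s) == len(root)) != 1:
        return False
    # ii. every non-root sequence extends a sequence that is also in T
    return all(s == root or s[:-1] in T for s in T)

print(is_tree({(1,), (1, 1), (1, 2)}))   # True
print(is_tree({(1,), (1, 2, 3)}))        # False: (1, 2) is missing
```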
must always be represented by open sets. Note in particular that if an estimate of E is vague then the same must hold of the corresponding estimate of
not E. Hence these two estimates must be represented by overlapping open
sets. Finally, the value of each leaf is estimated.
Sometimes such an evaluation is straightforward but sometimes it takes
the form of an aggregation. In public procurement, for instance, tenders
typically are first evaluated according to different criteria such as quality
and price. Then these evaluations are aggregated in some other way. In
wSSD, primary evaluations of almost any form are admitted but three forms
present themselves as the most natural ones:
i. v(E) = a,
ii. a < v(E) < b, and
iii. an ordering combined with an ordering of distances.
For convenience, all numbers are assumed to be non-negative and at most
1. As an end product the original decision problem is represented by a mathematical structure F called a decision frame. Such a structure typically has
the form (o1 , . . . , on , T1 , . . . , Tn , S[p], U[v]). Here o1 , . . . , on are the options
considered and T1 , . . . , Tn the corresponding decision trees. Moreover, S[p],
is the set of probability estimates and U[v] the set of value estimates.
Let's look at a couple of examples that utilize the above premises. The tree structures in figure 3.2 represent two different options o1 and o2. If we let T1 and T2 be the corresponding decision trees, then

T1 = {(E1), (E1, E1,1), (E1, E1,2)}

and

T2 = {(E2), (E2, E2,1), (E2, E2,2), (E2, E2,3)}.

Figure 3.2: The two trees, with roots E1 and E2, leaves E1,1 and E1,2, and leaves E2,1, E2,2 and E2,3 respectively.
Using exact estimates for the values of the leaves of T1 we can for example set

v(E1,1) = a  and  v(E1,2) = b

where a and b are real numbers in the interval [0, 1]. Again, using vague estimates for the values of E2,1, E2,2 and E2,3 we can for example set
a < v(E2,1 ) < v(E2,2 )
and
v(E2,3 ) = 2v(E2,1 ).
Using the definitions we made earlier, we can now define the decision
frame
F = (E1 , E2 , T1 , T2 , S[p], U[v]) ,
where
S[p] = {P(E1) = 1,
        P(E1,1) = a,
        P(E1,2) = 1 - a,
        a < P(E2,1) < b,
        P(E2,2) < P(E2,3)}
and
U[v] = {v(E1,1 ) = a,
v(E1,2 ) = b,
a < v(E2,1 ) < v(E2,2 ),
v(E2,3 ) = 2v(E2,1 )}.
3.2
3.3
When the original decision problem is represented by a decision frame, evaluations and decision rules originally formulated for classical decision theory may be used to decide whether a proposed project is acceptable, or to make a choice between the options considered. The fact that this representation is not straightforward, however, calls for some circumspection in employing them. In this section the evaluations used in example 1 of chapter 1 will be presented, followed by a discussion of decision rules based on extreme values.
3.3.1 Qualitative Evaluations
In many evaluations, options that are too risky are eliminated at an early stage. In other cases only options which promise high returns are given a closer examination. To formulate such evaluations in wSSD, take as a point of departure a decision maker B and a satisfiable decision frame

F = (o1, ..., on, T1, ..., Tn, S[p], U[v]).

Then the probability of the option oi yielding an outcome of value V is sufficiently high if and only if there exists a suitable number r such that P(Li,V) ≥ r. Here Li,V is the set of leaves of Ti with a value at least as high as V (see figure 3.4), and r depends on B. This leaves us with the task of explicating "P(Li,V) ≥ r" and "the outcome u has the value V".
Figure 3.4: Let the tree in the figure be T1, with leaf values 2, 5 and 6 reached from the root with probabilities 0.5, 0.2 and 0.3, and set V = 3. Then the set L1,V = {5, 6} and consequently P(L1,V) = P({5, 6}) = 0.2 + 0.3 = 0.5.
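The computation in figure 3.4 is easy to sketch if the leaves are given as (probability, value) pairs; this representation and the function name are ours, not the text's.

```python
# The tree T1 of figure 3.4: leaves with probabilities 0.5, 0.2, 0.3
# and values 2, 5, 6.
leaves = [(0.5, 2), (0.2, 5), (0.3, 6)]

def P_at_least(leaves, V):
    """P(L_V): total probability of leaves with value at least V."""
    return sum(p for p, v in leaves if v >= V)

print(P_at_least(leaves, 3))   # 0.2 + 0.3 = 0.5
```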
Explication of P(Li,V) ≥ r

At least the following candidates present themselves:

i. P(Li,V) ≥ r holds for all solutions of S[p],

ii. P(Li,V) ≥ r holds for all regular solutions of S[p],

iii. P(Li,V) ≥ r holds for a great deal of the solutions of S[p],

iv. P(Li,V) ≥ r holds for some regular solution of S[p], and

v. P(Li,V) ≥ r holds for some solution of S[p].
Comment. A solution is regular if and only if it does not contain values that are too close to the endpoints. If we recall that the representation in wSSD uses somewhat wide intervals, then (v) is too weak since it allows values near the endpoints, and perhaps (i) is too strong, even though its occurrence should be noted. Moreover, (iii) should be defined more precisely. One way of doing this is to say that (iii) holds if and only if

volume of A / volume of B

has a certain size. Here B is the set of solutions to S[p] and A the set of solutions to S[p] such that P(Li,V) ≥ r. Another possibility is to use the notion of contraction introduced by Mats Danielson (Danielson, 1997), which in turn is a modification of the notion of proportion due to Love Ekenberg (Ekenberg, 1994). Note that "the outcome u has the value V" can be explicated along the same lines.
3.3.2 Expected Value

AF(oi) = inf E(p, v, oi)    (3.1)

BF(oi) = sup E(p, v, oi)    (3.2)

CF(oi) = ∫∫ E(p, v, oi) dp dv / ∫∫ dp dv    (3.3)

Here

E(p, v, oi) = E[Ti] = Σ_{j=1}^{m} Pj Vj = Σ_{j=1}^{m} Pj E[Ti,j]

and the infimum, supremum and integrals are taken over the solutions of S[p] and U[v]. Evaluations of this kind are permissible and sometimes quite useful in ordering options, see example 3 of chapter 1.
If one of these evaluations is to be chosen, then the obvious choice is CF(oi), since it is a mean value of mean values, but sometimes a conservative decision maker wants to base a decision on all of them.

Example. Set o1 = (q, 1 - q; x, y) and o2 = (r, 1 - r; x, y) with 0 < r < q < 1 and 0 < y < x < 1. Here o1 is a lottery which yields x with probability q and y with probability 1 - q. Then o1 dominates o2 stochastically and therefore clearly is the better option.
To see how the evaluations introduced above order them, note first that

AF(o1) = inf(qx + (1 - q)y) = inf(y + q(x - y)) = y,
and then compute the normalizing constant

∫∫ dp dv = ∫_{q=0}^{1} ∫_{r=0}^{q} ∫_{x=0}^{1} ∫_{y=0}^{x} dy dx dr dq
= ∫_{0}^{1} ∫_{0}^{q} ∫_{0}^{1} [y]_{0}^{x} dx dr dq
= ∫_{0}^{1} ∫_{0}^{q} ∫_{0}^{1} x dx dr dq
= ∫_{0}^{1} ∫_{0}^{q} [x²/2]_{0}^{1} dr dq
= ∫_{0}^{1} ∫_{0}^{q} 1/2 dr dq
= ∫_{0}^{1} [r/2]_{0}^{q} dq
= ∫_{0}^{1} q/2 dq
= [q²/4]_{0}^{1}
= 1/4,
followed by

∫∫ E(p, v, o1) dv dp = ∫_{q=0}^{1} ∫_{r=0}^{q} ∫_{x=0}^{1} ∫_{y=0}^{x} (qx + (1 - q)y) dy dx dr dq
= ∫_{0}^{1} ∫_{0}^{q} ∫_{0}^{1} [qxy + (1 - q)y²/2]_{0}^{x} dx dr dq
= ∫_{0}^{1} ∫_{0}^{q} ∫_{0}^{1} (qx² - qx²/2 + x²/2) dx dr dq
= ∫_{0}^{1} ∫_{0}^{q} [qx³/3 - qx³/6 + x³/6]_{0}^{1} dr dq
= ∫_{0}^{1} ∫_{0}^{q} (q/6 + 1/6) dr dq
= ∫_{0}^{1} [qr/6 + r/6]_{0}^{q} dq
= ∫_{0}^{1} (q²/6 + q/6) dq
= [q³/18 + q²/12]_{0}^{1}
= 1/18 + 1/12
= 5/36
and

∫∫ E(p, v, o2) dv dp = ∫_{q=0}^{1} ∫_{r=0}^{q} ∫_{x=0}^{1} ∫_{y=0}^{x} (rx + (1 - r)y) dy dx dr dq
= ∫_{0}^{1} ∫_{0}^{q} ∫_{0}^{1} [rxy + (1 - r)y²/2]_{0}^{x} dx dr dq
= ∫_{0}^{1} ∫_{0}^{q} ∫_{0}^{1} (rx² - rx²/2 + x²/2) dx dr dq
= ∫_{0}^{1} ∫_{0}^{q} [rx³/3 - rx³/6 + x³/6]_{0}^{1} dr dq
= ∫_{0}^{1} ∫_{0}^{q} (r/6 + 1/6) dr dq
= ∫_{0}^{1} [r²/12 + r/6]_{0}^{q} dq
= ∫_{0}^{1} (q²/12 + q/6) dq
= [q³/36 + q²/12]_{0}^{1}
= 1/36 + 3/36
= 1/9.
Hence

CF(o1) = (5/36) / (1/4) = 20/36 ≈ 0.56

and

CF(o2) = (1/9) / (1/4) = 4/9 ≈ 0.44,

so CF(o1) > CF(o2) and we see that CF induces the correct order.
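A Monte Carlo check of these two numbers (our own sketch, not part of the text), sampling uniformly from the integration region 0 < r < q < 1, 0 < y < x < 1 by rejection:

```python
import random

rng = random.Random(1)
s1 = s2 = n = 0
while n < 200_000:
    q, r, x, y = (rng.random() for _ in range(4))
    if r < q and y < x:                 # keep only points in the region
        s1 += q * x + (1 - q) * y       # E(p, v, o1)
        s2 += r * x + (1 - r) * y       # E(p, v, o2)
        n += 1
print(s1 / n, s2 / n)   # close to 5/9 ≈ 0.556 and 4/9 ≈ 0.444
```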
3.3.3

min(oi) = min(inf_{V1∈∆1} V1, ..., inf_{Vj∈∆j} Vj, ..., inf_{Vm∈∆m} Vm)

and

max(oi) = max(sup_{V1∈∆1} V1, ..., sup_{Vj∈∆j} Vj, ..., sup_{Vm∈∆m} Vm)
Chapter 4

Other Approaches to Decisions under Uncertainty
In this chapter a few well-known decision rules for situations where we have more than one probability distribution are presented and evaluated. First, decision rules developed within Statistical Decision Theory are discussed; thereafter two different proposals put forward by philosophers are considered.
4.1

4.2

4.3 Watson (1974)

if q ≥ 1 - p. This should be compared with the result obtained by employing CF. Then o1 is the better option for a < b and o2 the better one for a > b, with b ≈ 0.13. Hence CF fares better in this example.
4.4 Levi (1974)

the size of a. Hence Levi's proposal seems to be overly liberal in some cases. On the other hand, o2 but not o1 is S-permissible if o1 = (p, 1 - p; x + a, y - b), o2 = (q, 1 - q; x, y), 0 < p < q < 1, and 0 < y - b < y < x < x + a < 1, even if b ≈ 0 and a ≈ 1.
4.5 Gärdenfors and Sahlin (1982)

These two authors propose that each epistemically possible probability distribution be given a probability index and that options having a distribution with a low index be discarded at the outset. A selection is then to be based on the evaluation AF, see section 3.3.2. Although no precise criteria for assigning indices to options are presented, their treatment of an example describing the qualms of a lady contemplating various betting odds indicates that all probability distributions commensurate with somewhat imprecise probability estimates should be assigned a low reliability index. Hence o1 rather than o2 should be chosen if o1 = (0.5, 0.5; x, y), o2 = (p, 1 - p; z, u), 0 < p < 1, and 0 < y < x < u < z < 1, even though o2 stochastically dominates o1. As presented, the upshot of discarding options having a low reliability index therefore seems questionable. But if that step is removed from the proposal of Gärdenfors and Sahlin, then it will suffer from all deficiencies accruing to AF.
Chapter 5
Evaluations and Choice Rules
This chapter contains a general discussion of evaluations of options and choice rules based on such evaluations. Since the discussion at some points is somewhat technical, it should be pointed out that the prospects of finding evaluations that are markedly superior to the ones introduced in chapters 1 and 3 seem to be bleak.
5.1 Fundamental Concepts
Let C = {c1, . . . , cn} be a finite set of real numbers such that c1 > · · · > cn. Then the set of finite options O over C is the least set S such that

i. if A is a vector (p1, . . . , pn; c1, . . . , cn) such that 0 ≤ pi and Σ pi = 1, then A ∈ S, and

ii. if A1 ∈ S, . . . , Am ∈ S, 0 ≤ pi and Σ pi = 1, then (p1, . . . , pm; A1, . . . , Am) ∈ S.

A is a normal option if and only if A equals (p1, . . . , pn; c1, . . . , cn) for some p1, . . . , pn such that 0 ≤ pi and Σ pi = 1. Each finite option A can be reduced to a unique normal option N(A), and two options A and B are said to be congruent if and only if N(A) = N(B). An evaluation is a function V : O × R^R → R such that V(A, f) = V(B, f) if A and B are congruent.
Comments. Finite options are to be viewed as lotteries which ultimately
yield USD. A fixed finite set C is used in order to simplify the presentation.
C is to be thought of as a sufficiently large set that contains all outcomes
explicitly mentioned in this chapter.
Examples
a1 Expected utility given f.
E(A, f) = p1 f(c1) + · · · + pn f(cn). Here and below N(A) = (p1, . . . , pn; c1, . . . , cn).

a2 Qualitative evaluation given f and a risk index (r, s).
S(A, f, r, s) = 1 if Σr pi ≤ s and S(A, f, r, s) = 0 if Σr pi > s. Here Σr indicates that the sum is taken over all i such that f(ci) ≤ r.

a2,c Continuous qualitative evaluation given f and risk index (r, s, δ).
S(A, f, r, s, δ) = 1 if Σr pi ≤ s, S(A, f, r, s, δ) = (s + δ − t)/δ if s ≤ t = Σr pi ≤ s + δ, and S(A, f, r, s, δ) = 0 if Σr pi ≥ s + δ.

a3 Maximin.
m(A, f) = min(f(ci), pi > 0).

a4 Maximax.
M(A, f) = max(f(ci), pi > 0).

a5 Hurwicz (1951).
Hα(A, f) = αM(A, f) + (1 − α)m(A, f).

a6 Maximin regret given K.
RK(A, f) = inf(R(A, B, f), B ≠ A, B ∈ K). Here R(A, B, f) = min(f(ci) − f(cj), pi, qj > 0) if N(B) = (q1, . . . , qn; c1, . . . , cn).
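For concreteness, a1 and a3-a5 can be sketched as follows (a minimal illustration with hypothetical helper names; a normal option is given as parallel lists of probabilities and outcomes):

```python
def expected_utility(probs, outcomes, f):                       # a1
    return sum(p * f(c) for p, c in zip(probs, outcomes))

def maximin(probs, outcomes, f):                                # a3
    return min(f(c) for p, c in zip(probs, outcomes) if p > 0)

def maximax(probs, outcomes, f):                                # a4
    return max(f(c) for p, c in zip(probs, outcomes) if p > 0)

def hurwicz(probs, outcomes, f, alpha):                         # a5
    return (alpha * maximax(probs, outcomes, f)
            + (1 - alpha) * maximin(probs, outcomes, f))

f = lambda c: c                        # identity utility
A = ([0.99, 0.01], [1000, 0])
assert expected_utility(*A, f) == 990.0
assert maximin(*A, f) == 0
assert maximax(*A, f) == 1000
assert hurwicz(*A, f, 0.5) == 500.0
```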
Given an evaluation V and a function f, we can define a semimetric dV,f and an order ≥V,f on O by setting dV,f(A, B) = |V(A, f) − V(B, f)| and A ≥V,f B if and only if V(A, f) ≥ V(B, f). These notions can in turn serve to define a choice rule RV,f as follows: RV,f : 2^O × R → 2^O such that RV,f(𝒜, ε) = {A ∈ 𝒜 | V(A, f) + ε ≥ V(B, f), for all B ∈ 𝒜}. RV,f(𝒜, ε) is called the set of ε-optimum options given V, f.

Comments. d is a semimetric on a set M if and only if d(x, y) ≥ 0, d(x, y) = d(y, x), and d(x, z) ≤ d(x, y) + d(y, z). Note that the last inequality implies that d(x, x) = 0. As is customary, ε is to be a small non-negative number. ε is introduced because 𝒜 does not always contain an option that is an optimum one given V, f but, at least for the evaluations considered in this chapter, always an ε-optimum one. To see this, set

𝒜 = {(p, 1 − p; c1, c2) | 0.5 < p < 0.9},

f(c1) > f(c2) and

V((p, 1 − p; c1, c2), f) = pf(c1) + (1 − p)f(c2).
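A quick numerical illustration of why ε is needed (the values of f(c1), f(c2) and ε below are our own):

```python
f1, f2 = 10.0, 0.0                      # f(c1) > f(c2)
V = lambda p: p * f1 + (1 - p) * f2     # expected utility of (p, 1-p; c1, c2)

# Over the open set 0.5 < p < 0.9 the supremum 0.9*f1 + 0.1*f2 = 9.0 is
# never attained, but every p close enough to 0.9 is eps-optimal.
supremum = 0.9 * f1 + 0.1 * f2
eps = 0.01
p = 0.9 - eps / (2 * (f1 - f2))         # p = 0.8995, inside the open interval
assert 0.5 < p < 0.9
assert V(p) < supremum                  # not an optimum option ...
assert V(p) + eps >= supremum           # ... but an eps-optimum one
```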
Examples
b1 Set A = (0.99, 0.01; 1000, 0), B = (0.01, 0.99; 2, 1), and A > B. Then (>, {A}, {B}) is a strong counterexample to maximin as a choice rule generator at level (F2, 0).

b2 Set A = (0.01, 0.99; 1000, 0), B = (0.99, 0.01; 999, 998), and B > A. Then (>, {B}, {A}) is a strong counterexample to maximax as a choice rule generator at level (F2, 0).

b3 Set B = (1 − s, s; t, r), C = (1 − s, s; u, r), t > u > r, and B > C. Then (>, {B}, {C}) is a weak counterexample to S(A, f, r, s) as a choice rule generator at level (F2, 0).

b4 Set B = (0.99, 0.01; 999, 0), C = (0.01, 0.99; 1000, 1), and B > C. Then (>, {B}, {C}) is a strong counterexample to Hα(A, f) as a choice rule generator at level (F2, 0).

b5 Set B = (0.99, 0.01; 999, 0), C = (0.01, 0.99; 1000, 1), K = {B, C}, and B > C. Then (>, {B}, {C}) is a strong counterexample to RK(A, f) as a choice rule generator at level (F2, 0).

b6 (Bernoulli 1738, Menger 1934) Set B = (1; 0), C = (2^−31, . . . , 2^−1, 2^−31; 2^30 − 15, . . . , 1 − 15, −15), B > C, and i(x) = x, for all x in R. Then (>, {B}, {C}) is a weak counterexample to E(A, f) at level ({i}, 0).

b7 (Allais 1953, 1979) Set B = (1; 10^6), C = (0.1, 0.89, 0.01; 5·10^6, 10^6, 0), B > C, and i as in (b6). Then (>, {B}, {C}) is a strong counterexample to E(A, f) as a choice rule generator at level ({i}, 0).
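The counterexamples b1 and b6 are easy to check numerically (identity utility i(x) = x, as in b6; helper names are ours):

```python
def E(probs, outcomes):                 # expected value under i(x) = x
    return sum(p * c for p, c in zip(probs, outcomes))

def maximin(probs, outcomes):
    return min(c for p, c in zip(probs, outcomes) if p > 0)

# b1: maximin ranks B above A, against an overwhelming expectation gap.
A = ([0.99, 0.01], [1000, 0])
B = ([0.01, 0.99], [2, 1])
assert maximin(*B) > maximin(*A)
assert E(*A) > E(*B)

# b6: the truncated St. Petersburg gamble C has expectation 0.5 > E(B) = 0,
# yet B = (1; 0) is commonly preferred.
probs = [2.0 ** -(k + 1) for k in range(30, -1, -1)] + [2.0 ** -31]
outcomes = [2.0 ** k - 15 for k in range(30, -1, -1)] + [-15.0]
assert abs(sum(probs) - 1.0) < 1e-12
assert abs(E(probs, outcomes) - 0.5) < 1e-6
```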
Remarks. As expected, all evaluations ignoring probabilities fail strongly at such a high level as (F2, 0). Hence it is doubtful whether it is good policy to rely on any of these to a large extent. However, the status of E(A, f) as a choice rule generator remains to be determined. To this end, section 5.7 contains an account of what can be inferred from the Allais example with respect to this problem. Moreover, the status of evaluations as preference generators must also be determined. This will be done in sections 5.3 and 5.4 with the help of common ratio tests. But first a few general remarks to clarify some issues.
5.2 Miscellaneous remarks

5.2.1 On Tests

5.2.2 On Options

5.2.3 Classification of Evaluations

5.2.4
Take as a starting point the Allais example (Allais, 1953, 1979). Set

B = (1; 10^6),
C = (0.1, 0.89, 0.01; 5·10^6, 10^6, 0),
D = (0.11, 0.89; 10^6, 0),
E = (0.1, 0.9; 5·10^6, 0).
5.3
5.3.1
Set

Ap = (p, 1 − p; c1, c3) and
Bq = (q, 1 − q; c2, c3)

with c1 > c2 > c3 and 1 > q > p > r > 0. Then ({Bq, Ap, Arp, Brq}, >) is called a common ratio test. In most cases p and q are comparatively large numbers and r a small one. Because of their simple structure, common ratio tests are deemed ideal for testing evaluations as preference generators. The fundamental observation concerning such tests is that

E(Ap, f) > E(Bq, f) if and only if E(Arp, f) > E(Brq, f),

for all f ∈ F1. On the other hand, for some c1, c2, c3 and when p and q are large but r small, most people, see Kahneman and Tversky (1979), prefer Bq to Ap and Arp to Brq. So ({Bq, Ap, Arp, Brq}, >) can be a counterexample to E(A, f) as a preference generator at level (F1, 0).
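The fundamental observation holds because E(Ap, f) > E(Bq, f) reduces to p(f(c1) − f(c3)) > q(f(c2) − f(c3)), and multiplying both probabilities by r > 0 preserves this inequality. A quick check (our illustrative numbers and utility):

```python
def E(p, c_hi, c_lo, f):                 # two-outcome option (p, 1-p; c_hi, c_lo)
    return p * f(c_hi) + (1 - p) * f(c_lo)

f = lambda x: x ** 0.5                   # an arbitrary increasing f in F1
c1, c2, c3 = 3.0, 2.0, 1.0
for p, q, r in [(0.9, 0.95, 0.1), (0.5, 0.99, 0.05), (0.8, 0.9, 0.2)]:
    lhs = E(p, c1, c3, f) > E(q, c2, c3, f)
    rhs = E(r * p, c1, c3, f) > E(r * q, c2, c3, f)
    assert lhs == rhs                    # same ranking before and after scaling
```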
Now, a challenge for anyone who wishes to propose an alternative to expected utility, say V(A, f), is to show that V(Ap, f) > V(Bq, f) if and only if V(Arp, f) > V(Brq, f), for all f in F1, does not hold, and that V(A, f) is compatible with stochastic dominance; see Sugden (1986) for a lucid account. Quite a few attempts have also been successful in these respects.
However, showing this does not entail that ({Bq, Ap, Arp, Brq}, >) cannot be a counterexample to V(A, f) as a preference generator at level (F2, 0). But the latter result is what is needed in order to have an evaluation that is a substantial improvement upon expected utility. The prospects for finding such an evaluation are, however, not particularly bright; this much can be concluded from the following observation: Select a set {Bq, Ap, Arp, Brq} as above with c1 ≥ c2 > c3 and 1 > q > p > 0.5. Let V be an evaluation that is compatible with stochastic dominance and f ∈ F2. Assume that V(Arp, f) > V(Brq, f). Now V(Aq, f) > V(Bq, f), and hence V(As, f) > V(Bq, f) for some s, p ≤ s < q. Moreover, V(Ars, f) > V(Brq, f). Hence the main advantage that V(A, f) may have over E(A, f) is that of a smaller distance between probabilities. The reader should bear this in mind when considering the following detailed examination.
5.4
5.4.1
Ha,b(A, f) = E(A, f) − aS(A, f) + b·M3(A, f)/S²(A, f).

Set A = (p, 1 − p; d, c) with f(c) = 0. Then

∂Ha,b/∂p = f(d) − af(d)(1 − 2p)(p(1 − p))^(−1/2)/2 − 2bf(d).

Set p ≤ 1/k² with k large. Then

af(d)(1 − 2p)(p(1 − p))^(−1/2)/2 ≥ af(d)k/4,

so ∂Ha,b/∂p ≤ f(d)(1 − 2b − ak/4) < 0 for large k. Hence ∂Ha,b/∂p < 0 if p is a small number. Accordingly, Ha,b is not an increasing function in p, and it is therefore hardly a serious rival to expected utility.
In 1979 Hagen proposed the following modification of Ha,b:

Hg,b(A, f) = E(A, f) − g(S(A, f)) + b·M3(A, f)/S²(A, f).

Then

∂Hg,b/∂p = f(d)(1 − g′(S(A, f))f(d)(p(1 − p))^(−1/2)(1 − 2p)/2 − 2b).
Hence ∂Hg,b/∂p > 0 only if b < 0.5. Assume then that g′(x) ≥ c > 0 and that (p(1 − p))^(−1/2) ≥ k with k large. Then g′(x) < 1/k if f(d) ≥ 2kx. Hence g′(x) = 0, which yields a contradiction. Hence we must have a uniform upper bound M on f(d), which in turn imposes restrictions on admissible f:s and d:s. Moreover, g′ must approach 0 as fast as p does. The simplest function satisfying these conditions seems to be x^a with a a large number. But then the contribution of the term g(S(A, f)) will be negligible in most cases. Hence we may neglect this term when determining how Hg,b performs in common ratio tests. To simplify the comparison with E(A, f), set p = sq with 0 < s < 1. Then
q/(qs) − (q + b(1 − 2q))/(qs + b(1 − 2qs)) = 1/s − (q(1 − 2b) + b)/(qs(1 − 2b) + b)
 = (qs(1 − 2b) + b − s(q(1 − 2b) + b))/(s(qs(1 − 2b) + b))
 = (b − bs)/(s(qs(1 − 2b) + b)) > 0.
Hence Hg,b is less risk averse than E(A, f ) in most common ratio tests and
therefore hardly a serious rival to it.
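The algebra in the last display can be double-checked numerically (a sanity check of ours, for parameter values in the relevant ranges 0 < b < 0.5, 0 < s < 1, 0.5 < q < 1):

```python
for b in (0.1, 0.3, 0.49):
    for s in (0.2, 0.5, 0.9):
        for q in (0.6, 0.8, 0.99):
            first = q / (q * s) - (q + b * (1 - 2 * q)) / (q * s + b * (1 - 2 * q * s))
            mid = 1 / s - (q * (1 - 2 * b) + b) / (q * s * (1 - 2 * b) + b)
            last = (b - b * s) / (s * (q * s * (1 - 2 * b) + b))
            assert abs(first - mid) < 1e-9   # the three forms agree ...
            assert abs(mid - last) < 1e-9
            assert last > 0                  # ... and are strictly positive
```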
5.4.2 Fishburn (1983)
V(A, f) = E(A, g1(f))/E(A, g2(f)).

Then

(pg1(f(c1)) + (1 − p)g1(f(0)))/(pg2(f(c1)) + (1 − p)g2(f(0))) > (pg1(f(c2)) + (1 − p)g1(f(0)))/(pg2(f(c2)) + (1 − p)g2(f(0)))

if and only if

(1 − p)g2(0)g1(f(c1)) > (1 − p)g2(0)g1(f(c2))

if and only if g1(x) is strictly increasing for x > 0. Hence the introduction of g1 seems pointless. Now

V(Ap, f) − V(Bq, f) > 0

if and only if

pg1(f(c1))(qg2(f(c2)) + (1 − q)g2(0)) > qg1(f(c2))(pg2(f(c1)) + (1 − p)g2(0))

if and only if

p(1 − q)/(q(1 − p)) > g1(f(c2))/g1(f(c1)).
Hence V is slightly more risk averse than E. But note that this is due to the introduction of the function g2 with g2(0) > 0. Hence the price seems to be too high.
5.4.3 Loomes and Sugden (1982, 1986)

These two authors claim that regret and disappointment ought to influence decision making (Loomes & Sugden, 1982; 1986), and in their paper of 1986 they propose an evaluation utilizing these ideas. Their work is closely related to that of Bell (1982; 1985). But, for all we know, Bell has contented himself with a discussion of special cases and never presented any evaluation. The proposal of Loomes and Sugden is as follows:

V(A, f) = Σ (pi f(ci) + D(f(ci) − E(A, f))),

where 1 ≤ i ≤ n and A = (p1, . . . , pn; c1, . . . , cn). Here D is supposed to measure elation and disappointment. D is non-decreasing and differentiable. Moreover, D(x) is convex if x > 0 and concave if x < 0. Finally, D(−x) = −D(x),
which yields D(0) = 0. To see when V is compatible with stochastic dominance, let Ap and f be as in section 5.4.2. Then
V(Ap, f) = pf(c1) + D(f(c1) − pf(c1)) + D(0 − pf(c1))
 = pf(c1) + D((1 − p)f(c1)) − D(pf(c1))

and

∂V/∂p = f(c1) − f(c1)D′((1 − p)f(c1)) − f(c1)D′(pf(c1))
 = f(c1)(1 − D′((1 − p)f(c1)) − D′(pf(c1))).

Hence 0 ≤ D′(x) < 1 for 0 < x < a, for some a > f(c1). But then x − D(x) is strictly increasing in this interval. To see how V performs in common ratio tests, set

D(pf(c1)) = pf(c1) − d

and

D(qf(c2)) = qf(c2) − e.

Then

V(Ap, f) − V(Bq, f) = d + D((1 − p)f(c1)) − e − D((1 − q)f(c2)).

But

d + D((1 − p)f(c1)) − e − D((1 − q)f(c2)) ≥ 0

if pf(c1) ≥ qf(c2), with equality only if pf(c1) = qf(c2), since x − D(x) is strictly increasing and compatibility with stochastic dominance holds. Hence V is less risk averse than E in common ratio tests and hardly a serious rival to E.
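A concrete instance (our choice of D, not the authors'): D(x) = 0.3x is odd, non-decreasing, weakly convex on x > 0 and weakly concave on x < 0, with D′ = 0.3 < 1. The identities used above then check out numerically:

```python
D = lambda x: 0.3 * x                    # illustrative D with D' < 1
f1, f2 = 10.0, 9.0                       # f(c1), f(c2); f(0) = 0

def V(p, fc):                            # V on the two-outcome option (p, 1-p; c, 0)
    E = p * fc
    return p * fc + D(fc - E) + D(0.0 - E)

p, q = 0.9, 0.95
assert abs(V(p, f1) - (p * f1 + D((1 - p) * f1) - D(p * f1))) < 1e-9
d = p * f1 - D(p * f1)                   # so D(p f(c1)) = p f(c1) - d
e = q * f2 - D(q * f2)                   # so D(q f(c2)) = q f(c2) - e
diff = d + D((1 - p) * f1) - e - D((1 - q) * f2)
assert abs((V(p, f1) - V(q, f2)) - diff) < 1e-9
assert p * f1 >= q * f2 and diff >= 0    # as claimed in the text
```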
5.4.4
These two authors propose an evaluation that is based on the idea that we
should replace the notion of the value of an outcome with the notion of the
value of an outcome given a probability. Since this idea contradicts one
of the basic presuppositions of normative decision theory, their proposal is
presented here only for the sake of completeness.
V(A, f) = Σ pi gi(f(ci)),

where 1 ≤ i ≤ n and A = (p1, . . . , pn; c1, . . . , cn). Here

gi(x) = (1/pi) ∫ φ(x, t) dt,

with the integral going from s_{i−1} to s_i, s0 = 0, s_i = p1 + · · · + p_i, and φ : R × [0, 1] → R such that φ is continuous, non-decreasing in the first variable, and φ(0, p) = 0. Setting φ(x, p) = x we get V(A, f) = E(A, f) as expected. If V is to be regular, then ∫_I φ(x, t) dt = x for I = [0, 1]. Moreover, {p | φ(x, p) is not strictly increasing in x} has measure zero if V is compatible with stochastic dominance. To see how V performs in common ratio tests, we will here only consider the case φ(x, p) = x·h(p), h continuous on [0, 1]. Now, in view of the Weierstrass approximation theorem, it suffices to set h(p) = Pn(p) with Pn a polynomial of degree n > 0. If V is compatible with stochastic dominance, then Pn ≥ 0 on [0, 1], and if Pn is to serve as a risk-averse weight, Pn must be strictly increasing. So the most interesting case is Pn(p) = (n + 1)p^n. Set Ap = (p, 1 − p; c1, 0), Bq = (q, 1 − q; c2, 0) with 0 < c2 < c1, 0.5 < p < q, and f strictly increasing with f(0) = 0. Then

V(Ap, f) = p^(n+1) f(c1) and V(Bq, f) = q^(n+1) f(c2).

Moreover,

V(Arp, f) = r^(n+1) p^(n+1) f(c1) and V(Brq, f) = r^(n+1) q^(n+1) f(c2).

Hence, for this choice of φ, V performs very much as E in common ratio tests.
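For this choice of φ the weight carried by the i-th outcome telescopes to s_i^(n+1) − s_{i−1}^(n+1), which gives the closed forms above. A short check (helper names ours; we use the exact antiderivative rather than numerical integration):

```python
def V(probs, utilities, n):              # rank-dependent value with h(p) = (n+1)p^n
    s, total = 0.0, 0.0
    for p, u in zip(probs, utilities):
        total += u * ((s + p) ** (n + 1) - s ** (n + 1))
        s += p
    return total

n, fc1, fc2 = 3, 10.0, 9.0
p, q, r = 0.6, 0.8, 0.1
assert abs(V([p, 1 - p], [fc1, 0.0], n) - p ** (n + 1) * fc1) < 1e-9
ratio_A = V([r * p, 1 - r * p], [fc1, 0.0], n) / V([p, 1 - p], [fc1, 0.0], n)
ratio_B = V([r * q, 1 - r * q], [fc2, 0.0], n) / V([q, 1 - q], [fc2, 0.0], n)
# Both ratios equal r^(n+1): scaling the probabilities by r never reverses
# the ranking, exactly as with E.
assert abs(ratio_A - r ** (n + 1)) < 1e-9 and abs(ratio_B - r ** (n + 1)) < 1e-9
```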
5.4.5 Quiggin (1982)
Quiggin proposes that an evaluation of an option should not be based directly upon the values and probabilities of the given outcomes, but that the probabilities first should be modified. The reason behind this claim is solely empirical: it seems to conform with observed behavior. Any such proposal is, of course, at variance with one of the basic tenets of normative decision theory. This view has, however, gained some momentum, so we had better comment upon it here. The simplest way to modify the given probabilities is to set

V(A, w, f) = Σ_{i=1}^{n} w(pi)f(ci).
5.4.6 Yaari (1987)
5.5 On Continuous Evaluations

5.5.1

n ≥ 3.

Hence by a new iteration we get that the most risk averse evaluation has the form vp^n. But this evaluation performs as E in common ratio tests.

5.5.2
5.6
Milnor (1953) is, at least from a mathematical point of view, the most satisfactory of the systems directly inspired by von Neumann and Morgenstern
(1944), and the system of Savage (1954; 1972) has been the most influential
of the systems inspired by Ramsey (1931). Finally, Oddie and Milne (1990) is
chosen mainly because they, in contradistinction to most authors, explicitly
aim at a justification of the utility principle. A discussion of their system
also forms a natural bridge to a general discussion of the possibilities of obtaining a non-circular formal justification of the utility principle. Hopefully,
a discussion of the systems mentioned above will convince most readers of
the tenability of the claims made at the outset. In case of doubt, a fuller
account is found in Malmnäs (1990).
5.6.1 Fundamental Concepts
5.6.2
Proof of (iv)
Case 1: γ = 1. Then

s(α, s(β, a, b), s(γ, a, b)) = s(α, s(β, a, b), a)
 = s(α, s(1 − β, b, a), a)
 = s(α(1 − β), b, a)
 = s(1 − α(1 − β), a, b)
 = s(αβ + (1 − α), a, b).

Case 2: γ < 1. Then

s(β, a, b) = s(δ, a, s(γ, a, b))

with δ = (β − γ)/(1 − γ). Hence

s(α, s(β, a, b), s(γ, a, b)) = s(α, s(δ, a, s(γ, a, b)), s(γ, a, b))
 = s(αδ, a, s(γ, a, b))
 = s(1 − αδ, s(1 − γ, b, a), a)
 = s((1 − αδ)(1 − γ), b, a)
 = s(1 − (1 − αδ)(1 − γ), a, b).

But 1 − (1 − αδ)(1 − γ) = αβ + (1 − α)γ. Hence

s(α, s(β, a, b), s(γ, a, b)) = s(αβ + (1 − α)γ, a, b).
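Reading s(α, a, b) as the convex combination αa + (1 − α)b, the identity just proved can be spot-checked numerically:

```python
s = lambda alpha, a, b: alpha * a + (1 - alpha) * b

a, b = 7.0, 2.0
for alpha in (0.0, 0.3, 1.0):
    for beta in (0.1, 0.6, 1.0):
        for gamma in (0.0, 0.5, 1.0):
            lhs = s(alpha, s(beta, a, b), s(gamma, a, b))
            rhs = s(alpha * beta + (1 - alpha) * gamma, a, b)
            assert abs(lhs - rhs) < 1e-9
```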
They also consider the relation at least as good as between prospects. This relation, labeled ≽, is employed to define the relations better than (>) and equally good (∼) in the following way: a > b if and only if a ≽ b and not b ≽ a, and a ∼ b if and only if a ≽ b and b ≽ a. The following axioms are then assumed:

Axiom 1. ≽ is a complete semi-order on S.

Axiom 2. {α | s(α, a, b) ≽ c} and {α | c ≽ s(α, a, b)} are closed, a, b, c ∈ S.

Axiom 3. If a ∼ a′, then s(1/2, a, b) ∼ s(1/2, a′, b).

From these axioms, the following theorem is then proved.
Representation Theorem

a. If ≽ and s satisfy (i)-(iii) and Axioms 1-3, then there exists a function U : S → R which is compatible with ≽ and s.

b. If U and U′ are real-valued functions on S that are compatible with ≽ and s, then U = αU′ + β, for some α > 0 and β.

Here, a function U : S → R is compatible with ≽ and s if and only if:

i. a ≽ b if and only if U(a) ≥ U(b), and

ii. U(s(α, a, b)) = αU(a) + (1 − α)U(b).
Proof of the Representation Theorem

The Representation Theorem follows from Lemmas 1-8 below:

Lemma 1. If a ≽ b ≽ c, then b ∼ s(α, a, c), for some α.
Lemma 2. If b ∼ s(αi, a, c) and lim_i αi = α, then b ∼ s(α, a, c).
Lemma 3. If a > b, then a > s(1/2, a, b) > b.
Lemma 4. If a > b and 0 < α < 1, then a > s(α, a, b) > b.
Lemma 5. If a ∼ a′, then a ∼ s(α, a, a′).
Lemma 6. If a ∼ a′, then s(α, a, b) = s(α, a′, b).
Lemma 7. If a > b, then s(α, a, b) > s(β, a, b) if and only if α > β.
Lemma 8. If a > b > c, then there is a unique α such that b ∼ s(α, a, c).

Part (a)

Case 1. a ∼ b, for all a, b ∈ S. Set U(x) = 1. Then U is compatible with ≽ and s.
and

Ma′,b′(x) = Ma,b(x).

iii. c > d > x. Then d ∼ s(α, c, x), for some α, 0 < α < 1. Hence

φa,b(d) = αφa,b(c) + (1 − α)φa,b(x).

So

(1 − α)(φa,b(d) − φa,b(x)) = α(φa,b(c) − φa,b(d)).

Hence

Ma,b(x) = (U(c) − U(d))/(U′(c) − U′(d))

and

β = U(d) − Ma,b(x)U′(d).
Proof of Lemma 1. Set

I1 = {α | s(α, a, c) ≽ b} and
I2 = {α | b ≽ s(α, a, c)}.

Then I1 ∪ I2 = [0, 1] since ≽ is a complete semiorder on S. Moreover, I1 and I2 are closed and non-empty. Hence I1 ∩ I2 ≠ ∅.
(p2/(1 − p1), . . . , pn/(1 − p1); c2, . . . , cn) = b2,

then

(p1, . . . , pn; c1, . . . , cn) = s(p1, b1, b2).

It can then be noted that (c) and (d) hold:

(c) If C = {c1, . . . , cn} and V : C → R is such that V(c1) > · · · > V(cn), then there exists a system (S, s, ≽) that is induced by V and satisfies the Herstein-Milnor axioms.

(d) If V is as in (c), (S, s, ≽) is induced by V, a = (p1, . . . , pn; c1, . . . , cn), and U is an extension of V to S that is weakly compatible with ≽ and compatible with s, then

U(a) = Σ_{i=1}^{n} pi U(ci).
for some α, β such that 1 > α > β > 0. We may then derive o1 > o2 if and only if

p > qα + (1 − q)β.

We also get E(o1, V) > E(o2, V) if and only if

p > q(v3 − v2)/(v1 − v2) + (1 − q)(v4 − v2)/(v1 − v2).

Now, these orderings coincide for all values of p, q such that 0 < p, q < 1 if and only if

α = (v3 − v2)/(v1 − v2) and β = (v4 − v2)/(v1 − v2).

Hence the Herstein-Milnor axioms force a decision maker to accept the principle of maximizing expected utility if and only if they force her to set

α = (v3 − v2)/(v1 − v2) and β = (v4 − v2)/(v1 − v2).

But they clearly can not do that, since (e) of the previous section shows that any choice of α, β such that 1 > α > β > 0 is compatible with the Herstein-Milnor axioms. Hence these axioms put very mild constraints on the admissible choices of α and β. Note also that (e) of section 5.6.2 shows when o1 > o2 is compatible with the Herstein-Milnor axioms. Indeed, set
o1 = (p1, . . . , pn; c1, . . . , cn) and
o2 = (q1, . . . , qn; c1, . . . , cn)

with c1 > · · · > cn. Then o1 > o2 is compatible with the Herstein-Milnor axioms if and only if

Σ_{i=1}^{j} pi > Σ_{i=1}^{j} qi

for some j. Assume then that

Σ_{i=1}^{j} pi − Σ_{i=1}^{j} qi = ε > 0

and set 1 − αj = ε/3 > αj+1 ≥ 0. Then (e) of section 5.6.2 shows that there exists a system that contains o1 > o2 and satisfies the Herstein-Milnor axioms.

Necessity. Obvious.
Hence these axioms provide a decision maker, facing a choice between two
options, with no useful guidelines.
Numerical Utility

von Neumann and Morgenstern (1953, p. 617) claim that their axioms make utility a number modulo a positive linear transformation. In this section it will be shown that this contention is at best somewhat misleading. To see this, assume that c0 > c1 > · · · > cn. Set, then, ci ∼ s(xi, c0, cn) for 1 ≤ i < n, and assume that 1 > x1 > · · · > x_{n−1} > 0. We can then construct a system S that satisfies the Herstein-Milnor axioms and show that there exists a unique function U : S → R compatible with ≽ and s and such that U(c0) = 1 and U(cn) = 0. We then get U(ci) = xi for 1 ≤ i < n. Hence the construction of S has not provided us with a more specific value than we had at the outset. Thus it cannot be said of a person who formally accepts the Herstein-Milnor axioms that she has a numerical utility function unique up to a positive linear transformation even if she is willing to let s(α, a, b) stand for αa + (1 − α)b. Moreover, a person's acceptance of these axioms does not imply that she has such a utility function. Rather, as we saw in section 5.6.2, we have to let s(α, a, b) stand for αa + (1 − α)b to get that result; but, as we saw there, from this assumption alone we can deduce that our agent has a numerical utility function modulo a positive linear transformation. Thus the contribution of the axioms can be questioned even in this respect.
5.6.3
From these axioms Savage proves that there exists a unique finitely additive probability measure P on P(X) such that P(A) ≥ P(B) if and only if A ≽ B. This measure is then employed to associate sets of acts with gambles as follows: let k1, . . . , kn be consequences and assume that

Σ_{i=1}^{n} pi = 1,

with pi ≥ 0. Then

G = {(pi, ki) | 1 ≤ i ≤ n}

is a gamble. With G we can associate the set FG that consists of all acts f such that there exists a partition {Bi}, 1 ≤ i ≤ n, of X with P(Bi) = pi and f(x) = ki for x ∈ Bi, 1 ≤ i ≤ n.
Savage then defines a utility as a real-valued function U on K such that

Σ_{i=1}^{n} pi U(ki) ≥ Σ_{i=1}^{m} qi U(ki′),

Σ_{i=1}^{n} pi U(ki),

(V(ci) − V(cn))/(V(c1) − V(cn)),

(iii) ci ∼ fi.
Here P is the unique probability measure on P(X) that is commensurate with (X, C, ≽).

Proof of (a). See the construction of section 5.6.3.

Proof of (b). Let U′ be a utility that exists according to the Representation Theorem of section 5.6.3. Then

U′(ci) = ((V(ci) − V(cn))/(V(c1) − V(cn)))·U′(c1) + ((V(c1) − V(ci))/(V(c1) − V(cn)))·U′(cn).

We thus get

V(ci) = αU′(ci) + β,

for some α > 0 and some β. Hence there exists an extension U of V with the desired properties.
5.6.4

Theorem. If V : C → R with V(ci) = 0, for some i, 1 ≤ i ≤ n, and U : O → R satisfies C1 and C2, then

U(p1, . . . , pn; c1, . . . , cn) = Σ_{i=1}^{n} pi V(ci).

Σ_{i=1}^{k} U(pi, 1 − pi; ci, c),

Σ_{i=1}^{n} U(pi, 1 − pi; ci, c).
and

U(p/q, 1 − p/q; ci, c) + U(1 − p/q, p/q; ci, c)
 = U(1/q, . . . , 1/q, 1 − p/q; ci, . . . , ci, c) + U(1 − p/q, p/q; ci, c)
 = pU(1/q, 1 − 1/q; ci, c) + U(1 − p/q, p/q; ci, c)
 = (p/q)V(ci) + U(1 − p/q, p/q; ci, c).

So

U(p/q, 1 − p/q; ci, c) = (p/q)V(ci).
But U(p, 1 − p; ci, c) is continuous in p. Hence the theorem follows.
Note. We have deviated from Oddie and Milne (1990) in some trivial
respects in order to simplify the presentation.
Justification of the Utility Principle

The theorem above does not provide any justification of the utility principle, but will form part of such a justification if the following results can be proved in a non-circular way:

Theorem 1. If V : C → R, then there exists a value function U : O → R that satisfies C1 and C2.

Theorem 2. Every reasonable value operation is a value function that satisfies C1 and C2.

Oddie and Milne (1990) do not explicitly argue for these theorems, but content themselves with stating that order considerations strongly suggest that the value of an act is a function of the values and probabilities of the outcomes. Now, order considerations - the value of an act lies between the values of the best outcome and the worst one, it increases if the probabilities of good outcomes increase, etc. - do suffice to guarantee the existence of a value function if the number of distinct outcomes is less than or equal to two. But they will not do so in case this number is greater than two, as the considerations of section 5.6.3 and the following simple result from Debreu (1954) show: Set

B = {(a, b) | a + b ≤ 1, a, b ≥ 0}

and (a, b) > (c, d) if and only if
(i) a + b > c + d, or
(ii) a + b = c + d and a > c.
Suppose now that f : B → R is such that f(a, b) > f(c, d) if and only if (a, b) > (c, d). Set g(c) = (f(0, c), f(c, 0)). Then g(c) ≠ ∅ if c > 0 and g(c) ∩ g(c′) = ∅ if c ≠ c′. But then the number of rational numbers is uncountable, since g(c) contains a rational number if c > 0. So order considerations alone will not lead to the existence of a value function, and a fortiori not to one satisfying C1 and C2. No, to prove the existence of such a value function we must introduce the operation s and the relation ≽ of Herstein and Milnor and assume that (O, s, ≽) satisfies their axioms.
Moreover, to get theorems 1 and 2, we must in view of the results in section 5.6.2 add the following axioms:

c1 > cn

and

ci ∼ s(V(ci), c1, cn)

for 1 < i < n. We have here assumed that

C = {c1, . . . , cn}

and that

1 = V(c1) > · · · > V(cn) = 0.

Then

(p1, . . . , pn; c1, . . . , cn) ∼ s(Σ pi V(ci), c1, cn).
5.6.5 Summary

(1) Existing axiomatic systems do not provide the utility principle or the principle of maximizing expected utility with a formal non-circular justification.

(2) The prospects for their obtaining such a justification are not very bright: indeed, assume

C = {c1, . . . , cn},

V : C → R such that V(c1) > · · · > V(cn),

and O equal to the set of options (p1, . . . , pn; c1, . . . , cn) such that

Σ_{i=1}^{n} pi = 1 and pi ≥ 0.
5.7

Here BE corresponds to choosing B from {B, C} and E from {D, E}. This in turn corresponds to choosing B rather than C and E rather than D at the same time. In outcome form BD etc. can be described as follows:

BD
BE
CD
CE

Now

PBD(x ≥ 10^6) = PBE(x ≥ 10^6) > PCD(x ≥ 10^6) > PCE(x ≥ 10^6).

Moreover,

6·10^6 − 2·10^6 > 2·10^6 − 10^6.

So there exists an f ∈ F3 such that

E(BE, f) > E(BD, f),
E(BE, f) > E(CD, f),
E(BE, f) > E(CE, f).

Hence the Allais example does not show that E(A, f) fails as a choice rule generator at level (F3, 0). We can also, see section 5.6.3, construct a system that satisfies Savage's axioms and contains the inequalities

BE > BD,
BE > CD,
BE > CE.

Finally, such constructions can also be performed for a host of axiom systems for expected utility; see Malmnäs (1990).
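The existence claim can be witnessed with a concrete utility f; the values below are our own construction (any f that is sufficiently concave above 10^6 will do), and compound options are formed by adding the payoffs of the two independently resolved components:

```python
from itertools import product

M = 10 ** 6
f = {0: 0.0, M: 1.0, 2 * M: 1.05, 5 * M: 1.09, 6 * M: 1.1, 10 * M: 1.12}

B = [(1.0, M)]
C = [(0.1, 5 * M), (0.89, M), (0.01, 0)]
D = [(0.11, M), (0.89, 0)]
E_opt = [(0.1, 5 * M), (0.9, 0)]

def combine(o1, o2):                     # independent product, payoffs added
    return [(p1 * p2, x1 + x2) for (p1, x1), (p2, x2) in product(o1, o2)]

def EU(o):
    return sum(p * f[x] for p, x in o)

BD, BE, CD, CE = (combine(B, D), combine(B, E_opt),
                  combine(C, D), combine(C, E_opt))
assert EU(BE) > EU(BD) and EU(BE) > EU(CD) and EU(BE) > EU(CE)
```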
5.8
In this case it may happen that the rule of dominance is violated by people without their being irrational. This can be seen from the following example from Mas-Colell et al. (1995). Suppose that someone is offered a trip to Venice including a week's stay at Hotel Cipriani, a poster of a genre painting by the Venetian painter Giandomenico Tiepolo, or a piece of chocolate. If these
References
About, P. & Boy, M. (1983). La Correspondance de Blaise Pascal et de Pierre de Fermat. Fontenay-aux-Roses.
Allais, M. (1953). Le Comportement de l'Homme Rationnel devant le Risque: Critique des Postulats et Axiomes de l'École Américaine. Econometrica, 21, pp. 503-546.
Allais, M. (1979). The Foundations of a Positive Theory of Choice involving Risk and a Criticism of the Postulates and Axioms of the American School. In Allais, M. & Hagen, O. (eds.) Expected Utility and the Allais Paradox. Dordrecht: Reidel, pp. 27-145.
Bell, D. (1982). Regret in Decision Making under Uncertainty. Operations Research, 30, pp. 961-981.
Bell, D. (1985). Disappointment in Decision Making under Uncertainty. Operations Research, 33, pp. 1-27.
Bergström, L. (1991). Cykliska preferenser. In Rabinowicz, W. (ed.) Valets vedermödor. Sex beslutsteoretiska studier. Stockholm: Thales.
Bernoulli, D. (1738). Specimen theoriae novae de mensura sortis. English translation in Bernoulli, D. (1954).
Bernoulli, D. (1954). Exposition of a New Theory on the Measurement of Risk. Econometrica, 22, pp. 23-36.
Bernoulli, J. (1713). Ars Conjectandi. Basel.
Blum, J.R. & Rosenblatt, J. (1967). On Partial a Priori Information in Statistical Inference. Ann. Math. Stat., 38, pp. 1671-1678.
Danielson, M. (1997). Computational Decision Analysis. Diss., Department of Computer and Systems Sciences, KTH, Stockholm.
Debreu, G. (1954). Representation of a Preference Ordering by a Numerical Function. In Thrall, R.M., Coombs, C.H. & Davis, R.L. (eds.) Decision Processes. New York: John Wiley.
Ekenberg, L. (1994). Decision Support in Numerically Imprecise Domains. Diss., Stockholm University, Stockholm.
Kahneman, D. & Tversky, A. (1979). Prospect Theory: An Analysis of Decision under Risk. Econometrica, 47, pp. 263-291.
Kofler, E. & Menges, G. (1976). Entscheidungen bei unvollständiger Information. Berlin, Heidelberg, New York: Springer-Verlag.
Levi, I. (1974). On Indeterminate Probabilities. The Journal of Philosophy, 71, pp. 391-418.
Loomes, G. & Sugden, R. (1982). Regret Theory: An Alternative Theory of Rational Choice under Uncertainty. The Economic Journal, 92, pp. 805-824.
Loomes, G. & Sugden, R. (1986). Disappointment and Dynamic Consistency in Choice under Uncertainty. Review of Economic Studies, LIII, pp. 271-282.
MacCrimmon, K.R. & Larsson, S. (1979). Utility Theory: Axioms versus Paradoxes. In Allais, M. & Hagen, O. (eds.) Expected Utility and the Allais Paradox. Dordrecht: Reidel, pp. 333-409.
Malmnäs, P.E. (1981). From Qualitative to Quantitative Probability. Diss., Stockholm: Almqvist & Wiksell International.
Malmnäs, P.E. (1990). Axiomatic Justifications of the Utility Principle: A Formal Investigation. Research Report HSFR 677/87.
Malmnäs, P.E. (1994). Axiomatic Justifications of the Utility Principle: A Formal Investigation. Synthese, 99, pp. 233-249.
Mas-Colell, A., Whinston, M.D. & Green, J.R. (1995). Microeconomic Theory. New York: Oxford University Press.
Menger, K. (1934). Das Unsicherheitsmoment in der Wertlehre. Zeitschrift für Nationalökonomie, V, pp. 459-485.
Oddie, G. & Milne, P. (1990). Act and Value. Theoria, LVII, pp. 42-76.
Quiggin, J. (1982). A Theory of Anticipated Utility. Journal of Economic Behavior and Organization, 3, pp. 323-343.
Ramsey, F.P. (1931). Truth and Probability. In Ramsey, F.P. The Foundations of Mathematics and Other Logical Essays. New York: Harcourt, Brace, pp. 156-198.
Appendices
Appendix 1
Elementary Probability
This appendix contains a short treatment of some parts of elementary probability theory. The emphasis is on notions and results fundamental for an
understanding of classical decision theory. For a full treatment of elementary
probability theory Feller (1968) is recommended.
The primary notion of elementary probability theory is that of a finite random experiment like throwing a die, tossing a coin, playing a soccer game, or shooting at a paper target a fixed number of times. The main feature of such an experiment is that the outcomes, denoted ωi (small omega), will vary if the experiment is repeated, but will all be contained in a given set called a sample space, denoted by Ω (capital omega).
When throwing a die, the sample space consists of the outcomes ω1 = 1, ω2 = 2, ω3 = 3, ω4 = 4, ω5 = 5 and ω6 = 6. The experiment of tossing a coin has a sample space with the outcomes ωhead = head and ωtail = tail. Playing a game of soccer or shooting at a paper target, however, has a sample space which is not quite as easily defined as that of throwing a die or tossing a coin. More on this later.
The sample spaces of throwing a die, tossing a coin or playing a game of soccer are all discrete, meaning countable. Other discrete sample spaces are the number of scores in a wrestling match, the number of faulty products in a batch, or the number of white cars on a given street on a rainy day. Note that we are here dealing with the natural numbers 0, 1, 2, 3 and so on.
When shooting at a paper target, the sample space is continuous, meaning that the outcomes are not countable. We can define an outcome (x, y) of shooting at a paper target as the number of centimeters away from the center of the target, horizontally and vertically. But the distance away from the center, in any given direction, can be any number along a continuous line, and thus the sample space itself is continuous.
Another property of a sample space is that it can be either finite or infinite. The throwing of a die has a finite sample space, because the only possible outcomes are 1, 2, 3, 4, 5 or 6. Likewise, the tossing of a coin has a finite sample space. The number of goals in a soccer game, however, gives in theory an infinite sample space, since there is no maximum limit on the number of goals. The sample space when shooting at a paper target is also infinite, since there are an infinite number of different distances from the center.
To summarize, a discrete sample space contains countable things, such as objects, while a continuous sample space contains measurable properties such as length, height, and speed. A sample space with a finite number of possible outcomes is finite, and a sample space with an unending number of possible outcomes is infinite.
In probability theory we are interested in something called events (often
denoted A, B, C, . . . or A1 , . . . , Am , B1 , . . . , Bn , . . . ). An event consist of a set
of outcomes, where a set of outcomes, denoted {i , . . . , n }, can be either a
single outcome, or several outcomes. When an event consists of only a single
outcome, it is sometimes referred to as a simple event. So, for example, if we
throw a die and want to specify an event A1 as the event that we throw a 6,
and an event A2 that we throw an even number, we can write these as
A1 = {6}, (1.1)
A2 = {2, 4, 6}. (1.2)
For a soccer game we can denote the event B, that the number of goals will
be either 1, 3 or 4, with
B = {1, 3, 4}. (1.3)
As events when shooting at a paper target, we can specify C1 to be the event
that the hit will be less than 5 cm from the center, the event C2 that the hit
will be in the upper half sector, and the event C3 that the hit will be in the
right half sector:

C1 = {hit less than 5 cm from center}, (1.4)
C2 = {hit in the upper half sector}, (1.5)
C3 = {hit in the right half sector}. (1.6)
The sample space Ω itself constitutes an event, the certain event, since we can always be sure
that such an event (for example D = Ω) will occur.¹ Trying to picture the certain
event of a finite sample space, as in the case of throwing a die, is quite easy.
We can be certain to get either a 1, 2, 3, 4, 5 or a 6. But, when it comes to
infinite sample spaces, like all possible outcomes of a soccer game, it becomes
difficult. How can one picture the unknown? By using negation. Looking
back at the event B in (1.3), we could define the certain event of the soccer
game to be either 1, 3 or 4, or something other than 1, 3 or 4, which together
encompasses all possible outcomes of a soccer game.
Since the aim of probability theory is to give a mathematical model of
random experiments, it freely employs set theoretic notation, and that's why
the results are called events, which in turn are defined as subsets of the sample
space Ω. A subset is a set consisting entirely of elements found
in another set. If we have the sets A = {1, 2, 3, 4} and B = {1, 3}, then, since
all the elements in B can be found in A, we call B a subset of A, written as

B ⊂ A. (1.7)
¹ For this definition to hold, we need to accept that an event occurs if and only if one
of its outcomes occurs.
² Other common ways to write the complement of, for example, the set S are S̄, Sᶜ and
S′.
A = {2, 4, 6},
B = {1, 2, 3},
C = {1}.

Using the union operator we can create the events A ∪ B = {2, 4, 6} ∪
{1, 2, 3} = {1, 2, 3, 4, 6}, and B ∪ C = {1, 2, 3} ∪ {1} = {1, 2, 3}. As you can
see, any of the elements in A ∪ B can be found in either A or B, or in both
A and B. Likewise, any of the elements in B ∪ C can be found in either B
or C, or in both B and C.

Using the intersection operator we instead single out the common outcomes:

A ∩ B = {2},
B ∩ C = {1},
A ∩ C = ∅,

and taking the complement of C with respect to the sample space gives
C′ = {2, 3, 4, 5, 6}.

Using the difference operator we can write the event "all outcomes in
B, but not in C" as B − C, and the event "all outcomes in C, but not in
B" as C − B. By adding parentheses, we see that this is the same as
using a combination of intersection and complement, as in B − C = B ∩ (C′) =
{2, 3} and C − B = C ∩ (B′) = ∅.
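The set operations above can be tried out directly, since most programming languages have a built-in set type. A minimal Python sketch, using the die events A, B and C from the example:

```python
# The die events from the example: A = even numbers, B = numbers below four, C = {1}.
A = {2, 4, 6}
B = {1, 2, 3}
C = {1}
omega = {1, 2, 3, 4, 5, 6}  # the sample space of one throw of a die

print(A | B)      # union: {1, 2, 3, 4, 6}
print(B | C)      # union: {1, 2, 3}
print(A & B)      # intersection: {2}
print(A & C)      # intersection: the empty set
print(omega - C)  # complement of C: {2, 3, 4, 5, 6}
print(B - C)      # difference: {2, 3}
print(C - B)      # difference: the empty set
```

The operators `|`, `&` and `-` correspond exactly to union, intersection and difference, and the complement of C is just the difference between the sample space and C.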
Figure 1.1: The shaded region represents the sample space (or certain event)
Ω.
Figure 1.3: The shaded region represents the event A′, the complement of A.
Let's look at an experiment of throwing one die two times. The certain
event can be represented by the sample space Ω = {(i, j) : 1 ≤ i, j ≤ 6}.
Essentially, this means that the sample space consists of the two throws i
and j, in that order, each of which can take on a number from 1 through 6.
Now, let A be the event that both throws yield even numbers, and B the
event that both throws give as results numbers less than four; then A ∩ B
is the event "two is the common outcome of the two throws". Moreover,
Figure 1.4: The shaded region represents the event A ∪ B, the union of A
and B.
Figure 1.5: The shaded region represents the event A ∩ B, the intersection
of A and B.
Figure 1.6: The shaded region represents the event A − B, the difference
between A and B.
Figure 1.7: Here the events A and B are disjoint, i.e. their intersection is
the empty set ∅.
A ∪ B is the event "neither the first throw nor the second one has five as an
outcome", and A′ the event "at least one of the two throws yields an odd
number".
To each event of a stochastic experiment, we can associate a non-negative
number indicating how probable the event is. Since probable is best under-
stood as probable in comparison with the certain event, choosing numbers
between 0 and 1 appears as the natural choice, where 1 is the probability of
the certain event. For the single-die events above we get, for example,

P (A ∪ C) = P ({1, 2, 4, 6}) = 4/6.
From this it follows that if two events are disjoint, as in the case of A and
C, the probability of the union of the events is the same as the sum of the
probabilities of each event: P (A C) = P (A) + P (C). This is true for an
arbitrary number of events, as long as they are disjoint.
The probability of the union of A and B is calculated differently, since
these events are not disjoint, having the common outcome 2. Just looking
at the union of the two events, A ∪ B = {1, 2, 3, 4, 6}, it's quite obvious that
the probability is P (A ∪ B) = 5/6. However, adding the probabilities of
events A and B gives us P (A) + P (B) = 3/6 + 3/6 = 1, which clearly isn't
correct. The error lies in the fact that we counted the probability of the
intersection of the events twice. Hence we need to subtract the probability
of the intersection to get the correct answer:

P (A ∪ B) = P (A) + P (B) − P (A ∩ B) = 3/6 + 3/6 − 1/6 = 5/6.
If two events are disjoint, i.e. their intersection is the empty set ∅, the
probability of the intersection is 0. It's like asking what the probability of
nothing is: something that will remain zero, at least until the end of the
universe. Take note, however, that nothing is not the same as an unexpected
event.
In order to calculate the probability of the intersection of two events that
are not disjoint, we also need to look at conditional probability. Essentially
this refers to the probability of some event, given that some other event has
already occurred. For example, the probability of some team winning a soccer
game is likely to be higher, given that they lead by 3-0, than if the score was
even. Actually, even though it's not made explicit, one can argue that proba-
bilities are always conditional, because assumptions are made about the state
prior to the result of the experiment. Consequently, to be meticulous about
probabilities, one should for example not only ask about the probability of
getting a head when tossing a coin, but about the probability of getting a
head, given that the coin is fair, and so on.
Although it might seem a bit lame to exemplify conditional probability
with the experiment of throwing a die, I will still do it for the sake of clarity.
Imagine that you have thrown a die without looking at it. An observer tells
you that the die shows either 1, 2, 3 or 6; a subset of the sample space. At
this point, asking for the probability of getting a 5 or a 6, is the same as
asking for the probability of getting a 6 (the intersection of the two events),
given the event that the result is either 1, 2, 3 or 6, since we know it can't
be 5. With this new information, the probability of getting a 5 or a 6 is 1/4
(one favorable event, the 6, out of four possible events).
Let A1 = {1, 2, 3, 6} and A2 = {5, 6} represent the events above. Then
we have

P (A1 ) = P ({1, 2, 3, 6}) = 4/6,
P (A2 ) = P ({5, 6}) = 2/6,
P (A1 ∩ A2 ) = P ({6}) = 1/6.

The conditional probability of A1 given A2 , written P (A1 |A2 ), is the prob-
ability of the intersection measured relative to the conditioning event:

P (A1 |A2 ) = P (A1 ∩ A2 ) / P (A2 ) = (1/6) / (1/3) = 1/2.

Now we have not only gotten the answer to P (A1 |A2 ), but also the answer
to P (A1 ∩ A2 ), since

P (A1 ∩ A2 ) = 1/6 = (1/3) · (1/2) = P (A2 )P (A1 |A2 ).

Hence, the formula for P (A ∩ B) is

P (A ∩ B) = P (A)P (B|A), (1.8)

or, with the roles of A and B interchanged,

P (A ∩ B) = P (B)P (A|B). (1.9)

Rearranging (1.8) gives the general definition of the conditional probability
of B given A:

P (B|A) = P (A ∩ B) / P (A). (1.10)

The formula for calculating the probability of the intersection if the events
are stochastically independent, and P (A) > 0 and P (B) > 0, is

P (A ∩ B) = P (A)P (B). (1.11)

Combining this with

P (A ∩ B) = P (A)P (B|A) (1.12)

gives

P (B|A) = P (A ∩ B) / P (A) = P (A)P (B) / P (A) = P (B), (1.13)

so for independent events the conditional probability equals the uncondi-
tional one. Note finally that

P (B|A) = 1 if and only if A ⊂ B. (1.14)
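Because every outcome of a fair die is equally likely, conditional probabilities like the ones above can be checked by simple counting. A minimal Python sketch (the helper name `P` is ours):

```python
from fractions import Fraction

A1 = {1, 2, 3, 6}            # the observer's report
A2 = {5, 6}                  # "a 5 or a 6"

def P(event):
    # Each of the six outcomes of a fair die has probability 1/6.
    return Fraction(len(event), 6)

P_cond = P(A1 & A2) / P(A2)  # P(A1 | A2) = (1/6) / (1/3)
print(P_cond)                # 1/2

# The product formula: P(A1 ∩ A2) = P(A2) P(A1 | A2).
assert P(A1 & A2) == P(A2) * P_cond
```

Using exact fractions instead of floating point makes the identities hold exactly rather than only approximately.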
A stochastic variable is a variable that can take on any value from a specific range,
but the exact value can't be predicted with certainty, only probabilistically. Thus, the
outcome of throwing a die, or the result of a soccer game, is a stochastic variable.
Figure 1.8: Here the event A is a subset of event B. Since the outcomes in
A are also in B, if A occurs B will also occur. But the event B can also
occur without event A occurring, since B obviously contains outcomes that
are not in A.
1A∩B = 1A 1B

Contrary to the probability of the intersection, the indicator function of the
intersection is much easier to calculate, since it only takes on the values 1 and
0. Thus, if both A and B occur, the indicator function of each of them
is 1, and thus the product of the two is also 1.

1A∪B = 1A + 1B − 1A 1B

Since the indicator function can only take on the values 0 or 1, we must
subtract the product of the indicator functions, which is 0 if only one of
A and B occurs, and 1 if both occur.
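Both indicator identities can be verified outcome by outcome. A minimal Python sketch for the die events A = {2, 4, 6} and B = {1, 2, 3} (the helper name `indicator` is ours):

```python
A = {2, 4, 6}
B = {1, 2, 3}

def indicator(event, outcome):
    # 1 if the outcome belongs to the event, otherwise 0.
    return 1 if outcome in event else 0

for w in range(1, 7):
    iA, iB = indicator(A, w), indicator(B, w)
    # 1_{A∩B} = 1_A 1_B
    assert indicator(A & B, w) == iA * iB
    # 1_{A∪B} = 1_A + 1_B - 1_A 1_B
    assert indicator(A | B, w) == iA + iB - iA * iB
print("both identities hold for every outcome")
```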
If A1 , . . . , An is a division of Ω into disjoint subsets (A1 ∪ · · · ∪ An = Ω),
then X = x1 1A1 + · · · + xn 1An is a simple stochastic variable, where
each xi is the value associated with the corresponding event Ai . Take again the
example of throwing a die. Let A1 be the event that the die shows a 1, let
A2 be the event that the die shows a 2, and so on. Let the value of each
event be the number on the die in dollars. Thus, if the die shows a 1 we
get one dollar, if the die shows a 2 we get two dollars, etc. The stochastic
variable X can then take on any value from 1 through 6, since in this case
X = 1 · 1A1 + · · · + 6 · 1A6 .
The expected value of X is defined as E(X) = x1 P (A1 ) + · · · + xn P (An ),
where xi is the value associated with event Ai . Consequently we can also write the
expected value as E(X) = x1 P (x1 ) + · · · + xn P (xn ). Looking at the same
example as in the above paragraph, the expected value of throwing a die
would be E(X) = 1 · (1/6) + · · · + 6 · (1/6) = 3.5.
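The expected value of the die payout can of course be computed directly from the definition. A minimal Python sketch:

```python
from fractions import Fraction

# Payout: the number shown, in dollars; each face has probability 1/6.
values = range(1, 7)
E = sum(x * Fraction(1, 6) for x in values)   # x1 P(x1) + ... + xn P(xn)
print(float(E))                               # 3.5
assert E == Fraction(7, 2)
```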
Expected values satisfy the following conditions: E(c · X) = c · E(X)
for any constant c, and E(X) ≤ E(Y ) if X ≤ Y . Moreover, the following
equality holds:
Addition property

E(X + Y ) = E(X) + E(Y )

Proof. Set

X = x1 1A1 + · · · + xn 1An = Σ_{i=1}^{n} xi 1Ai ,
Y = y1 1B1 + · · · + ym 1Bm = Σ_{j=1}^{m} yj 1Bj .

Since the Ai and the Bj each divide Ω into disjoint subsets, every outcome
lies in exactly one of the intersections Ai ∩ Bj , and therefore

X + Y = Σ_{i=1}^{n} Σ_{j=1}^{m} (xi + yj ) 1Ai∩Bj .

Taking expected values,

E(X + Y ) = Σ_{i=1}^{n} Σ_{j=1}^{m} (xi + yj ) P (Ai ∩ Bj )
= Σ_{i=1}^{n} Σ_{j=1}^{m} xi P (Ai ∩ Bj ) + Σ_{i=1}^{n} Σ_{j=1}^{m} yj P (Ai ∩ Bj ).

Since Σ_{j=1}^{m} P (Ai ∩ Bj ) = P (Ai ) and Σ_{i=1}^{n} P (Ai ∩ Bj ) = P (Bj ),
we consequently have

E(X + Y ) = Σ_{i=1}^{n} xi P (Ai ) + Σ_{j=1}^{m} yj P (Bj ) = E(X) + E(Y ).
Multiplication property: if the stochastic variables X and Y are independent,
then

E(XY ) = E(X)E(Y ). (1.15)

Proof. Set

X = x1 1A1 + · · · + xn 1An ,
Y = y1 1B1 + · · · + ym 1Bm .

The product of the stochastic variables X and Y is the sum of the products
of all possible combinations of X and Y :

XY = x1 1A1 y1 1B1 + · · · + x1 1A1 ym 1Bm + · · · + xn 1An y1 1B1 + · · · + xn 1An ym 1Bm
= Σ_{i=1}^{n} Σ_{j=1}^{m} xi yj 1Ai 1Bj = Σ_{i=1}^{n} Σ_{j=1}^{m} xi yj 1Ai∩Bj .

Hence, using the independence of X and Y,

E(XY ) = Σ_{i=1}^{n} Σ_{j=1}^{m} xi yj P (Ai ∩ Bj ) = Σ_{i=1}^{n} Σ_{j=1}^{m} xi yj P (Ai )P (Bj )
= Σ_{i=1}^{n} xi P (Ai ) · Σ_{j=1}^{m} yj P (Bj ) = E(X)E(Y ).
Chebyshev's inequality: if X has expected value µ = E(X) and standard
deviation σ, then for every λ > 0

P (|X − µ| ≥ λσ) ≤ 1/λ². (1.16)

What this means is that the probability that the actual outcome lies outside λ
standard deviations from the expected value is at most 1/λ². The greater the
number λ, the less the probability that the outcome lies outside the interval. By
setting ε = λσ we can write the inequality as

P (|X − µ| ≥ ε) ≤ σ²/ε². (1.17)

Proof. By the definition of the variance,

σ² = Σ_{i=1}^{n} (xi − µ)² P (xi ) ≥ Σ_{|xi−µ|≥ε} (xi − µ)² P (xi ) ≥ ε² Σ_{|xi−µ|≥ε} P (xi ). (1.18)

Now, since the events {X = x1 }, . . . , {X = xn } are pairwise disjoint, the sum of
all P (xi ) where |xi − µ| ≥ ε is the same as the probability that |X − µ| ≥ ε, and
consequently we can write

ε² Σ_{|xi−µ|≥ε} P (xi ) = ε² P (|X − µ| ≥ ε) ≤ σ²,

and dividing by ε² gives (1.17).
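For the die, µ = 3.5 and σ² = 35/12, so Chebyshev's bound can be compared with the exact probability. A minimal Python sketch with ε = 2:

```python
from fractions import Fraction

values = range(1, 7)
P = Fraction(1, 6)                            # each face equally likely
mu = sum(x * P for x in values)               # 7/2
var = sum((x - mu) ** 2 * P for x in values)  # 35/12

eps = 2
exact = sum(P for x in values if abs(x - mu) >= eps)  # only x = 1 and x = 6 qualify
bound = var / eps ** 2

print(exact, "<=", bound)   # 1/3 <= 35/48
assert exact <= bound
```

As usual with Chebyshev's inequality, the bound (35/48 ≈ 0.73) is far from tight: the exact probability is 1/3.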
Recall that the variance is defined as V (X) = E [X − E(X)]². Expanding the
square and using the addition property (together with the fact
that the expected value of the expected value itself is equal to the expected
value), we continue with

V (X) = E(X²) − E (2XE(X)) + E (E(X)²)
= E(X²) − 2E(X)E (E(X)) + E(E(X)²)
= E(X²) − 2E(X)E(X) + E(X)²
= E(X²) − 2E(X)² + E(X)²
= E(X²) − E(X)².

The last expression makes calculating the variance much easier than using
the formula in the definition.
The variance of a stochastic variable X, multiplied by some constant b, is

V (bX) = b² V (X)

Proof.

V (bX) = E [bX − bE(X)]²
= E [b(X − E(X))]²
= E (b² [X − E(X)]²)
= b² E [X − E(X)]²
= b² V (X)
Bienaymé's equality

V (X1 + · · · + Xn ) = V (X1 ) + · · · + V (Xn )

if X1 , . . . , Xn are pairwise independent.
Proof. Note first that E (X − E(X)) = E(X) − E (E(X)) = E(X) −
E(X) = 0 and that X + a and Y + b are independent so long as X and Y
are independent. Then
V (X1 + · · · + Xn ) = E [(X1 + · · · + Xn ) − E (X1 + · · · + Xn )]²
= E [(X1 − E(X1 )) + · · · + (Xn − E(Xn ))]²

Using a summation symbol instead, we get

V (Σ_{i=1}^{n} Xi ) = E [Σ_{i=1}^{n} (Xi − E(Xi ))]².

Expanding the square gives the squared terms, with i = j, and the mixed
terms, with j ≠ i. Using the addition property of the expected value, we can
then split the right hand side into two parts:

V (Σ_{i=1}^{n} Xi ) = E (Σ_{i=1}^{n} [Xi − E(Xi )]²) + E (Σ_{j≠i} [(Xi − E(Xi )) (Xj − E(Xj ))]).

Set

a = E (Σ_{i=1}^{n} [Xi − E(Xi )]²)

and

b = E (Σ_{j≠i} [(Xi − E(Xi )) (Xj − E(Xj ))]),

so we have

V (Σ_{i=1}^{n} Xi ) = a + b.

For the first part we get

a = Σ_{i=1}^{n} E [Xi − E(Xi )]²
= Σ_{i=1}^{n} E (Xi² − 2Xi E(Xi ) + E(Xi )²)
= Σ_{i=1}^{n} E(Xi²) − 2 Σ_{i=1}^{n} E(Xi )E(Xi ) + Σ_{i=1}^{n} E(Xi )²
= Σ_{i=1}^{n} E(Xi²) − 2 Σ_{i=1}^{n} E(Xi )² + Σ_{i=1}^{n} E(Xi )²
= Σ_{i=1}^{n} (E(Xi²) − E(Xi )²)
= Σ_{i=1}^{n} V (Xi ),

and for the second part

b = E (Σ_{j≠i} [Xi Xj − Xi E(Xj ) − Xj E(Xi ) + E(Xi )E(Xj )])
= Σ_{j≠i} E(Xi Xj ) − Σ_{j≠i} E (Xi E(Xj )) − Σ_{j≠i} E (Xj E(Xi )) + Σ_{j≠i} E (E(Xi )E(Xj ))
= Σ_{j≠i} E(Xi )E(Xj ) − Σ_{j≠i} E(Xi )E(Xj ) − Σ_{j≠i} E(Xj )E(Xi ) + Σ_{j≠i} E(Xi )E(Xj )
= 0,

where the pairwise independence was used to write E(Xi Xj ) = E(Xi )E(Xj ).
Hence V (Σ_{i=1}^{n} Xi ) = a + b = Σ_{i=1}^{n} V (Xi ).
Consider now the average X̄ = (X1 + · · · + Xn )/n of n pairwise independent
stochastic variables. Combining Bienaymé's equality with V (bX) = b² V (X)
gives

V (X̄) = (1/n²) V (X1 ) + · · · + (1/n²) V (Xn ).

Now, since X1 , . . . , Xn are equally distributed stochastic variables, the vari-
ance of them all is the same, so V (Xi ) = σi² = σ², and thus we have

V (X̄) = (1/n²) (σ1² + · · · + σn²)
= (1/n²) · n σ²
= σ²/n.
Bernoulli's Law of Large Numbers states that for every ε > 0

P (|X̄ − E(X)| ≥ ε) → 0 as n → ∞. (1.19)

In words this means that the probability that the difference between the
sample average and the expected value is greater than or equal to some number
ε approaches 0 as n approaches infinity. Another way to write it is

P (|(X1 + · · · + Xn )/n − E(X)| < ε) → 1 as n → ∞,

which might be more intuitive.
Proof. Recall Chebyshev's inequality:

P (|X − µ| ≥ ε) ≤ σ²/ε².

Applying Chebyshev's inequality to X̄ = (X1 + · · · + Xn )/n, whose variance
is σ²/n, we obtain

P (|X̄ − E(X)| ≥ ε) ≤ σ²/(nε²).

Since σ²/(nε²) → 0 when n → ∞ we have just proved Bernoulli's Law of
Large Numbers.
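The Law of Large Numbers is easy to illustrate by simulation. A minimal Python sketch using die throws (the seed is arbitrary and only makes the run reproducible):

```python
import random

random.seed(1)
mu = 3.5  # expected value of one die throw
for n in (10, 1000, 100_000):
    throws = [random.randint(1, 6) for _ in range(n)]
    mean = sum(throws) / n
    # The deviation of the sample average from 3.5 tends to shrink as n grows.
    print(n, abs(mean - mu))
```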
Appendix 2
Historical Notes
Classical decision theory and much of probability theory start with a lost
letter from Pierre de Fermat to Blaise Pascal. In his response, dated the 29th of
July 1654, Blaise Pascal accepts Fermat's solution of the Problem of Points
and outlines his own, which he considers simpler.
2.1 The Problem of Points

2.1.1 Pascal's solution
Although Pascal only considers special cases, his approach is perfectly general
since it consists of an algorithm for calculating the expected outcome for
each player. Pascal then claims that this number is the sought one. As an
illustration, suppose that A and B each has staked 32 pistoles in a fair game.
The player who first reaches four points wins the entire pot of 64 pistoles.
But what if the game is interrupted when A has obtained two points and B
only one? The stakes are to be divided as follows: A needs an additional two
points to win, and B needs three points to win. We can write that as R(2, 3).
The possible outcomes, had they taken one or two more turns, are depicted in
figure 2.1.
Pascal works backwards from the states in which the game is decided:
s (R(0, m)) = 64, since A has then won the pot, and s (R(n, 0)) = 0, since B
has then won it. In every other state each player wins the next point with
probability 1/2, so that

s (R(2, 3)) = (1/2) s (R(1, 3)) + (1/2) s (R(2, 2)),

where

s (R(1, 3)) = (1/2) 64 + (1/2) s (R(1, 2)),
s (R(1, 2)) = (1/2) s (R(0, 2)) + (1/2) s (R(1, 1)) = (1/2) 64 + (1/2) 32 = 48,
s (R(1, 1)) = (1/2) 64 + (1/2) 0 = 32,
s (R(2, 2)) = (1/2) s (R(1, 2)) + (1/2) s (R(2, 1)) = (1/2) 48 + (1/2) 16 = 32,
s (R(2, 1)) = (1/2) s (R(1, 1)) + (1/2) 0 = 16.

Putting the pieces together,

s (R(2, 3)) = (1/2) 32 + (1/4) 64 + (1/8) 64 + (1/8) 32 (2.1)
= 16 + 16 + 8 + 4 = 44.
Hence player A should receive 44 pistoles and player B 20 pistoles.
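Pascal's recursive step is easy to mechanize. A minimal Python sketch (the function name `share` and the use of memoization are ours; the pot of 64 pistoles follows the example above):

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def share(a, b, pot=64):
    """A's fair share of the pot when A needs a more points and B needs b,
    each point being won with probability 1/2."""
    if a == 0:        # A has already won
        return pot
    if b == 0:        # B has already won
        return 0
    return 0.5 * share(a - 1, b, pot) + 0.5 * share(a, b - 1, pot)

print(share(2, 3))        # 44.0 (A's share)
print(64 - share(2, 3))   # 20.0 (B's share)
```

The same two-line recursion answers any interrupted fair game between two players, not just the special cases Pascal worked through.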
2.1.2 Fermat's solution
Fermat notes that the game described above will end after at most four
additional throws, and that all results are equally probable. More precisely,
the results are the following ones:
aaaa
baaa
baab
babb
aaab
aabb
baba
bbab
aaba
abab
bbaa
bbba
abaa
abba
abbb
bbbb
Here a stands for a result that yields a point to A. He then notes that in 11
of these cases, namely those where A gets two points or more, A will win.
Hence A is to receive (11/16) · 64 = 44 pistoles. Note that the sample space
above contains results that will never occur but make all results equally
likely.
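Fermat's counting argument can be checked by enumerating the 16 sequences. A minimal Python sketch:

```python
from itertools import product

# All 2^4 equally likely outcomes of four more tosses ('a' = point to A).
outcomes = list(product("ab", repeat=4))
wins = [o for o in outcomes if o.count("a") >= 2]   # A needs two more points

print(len(wins), "of", len(outcomes))    # 11 of 16
print(len(wins) / len(outcomes) * 64)    # 44.0 pistoles
```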
2.2 Generalizations

2.2.1 Pascal's approach
It is assumed that all participants have equal chances in winning a point and
that each of them has staked 32 pistoles. The remaining possible outcomes
is presented in figure 2.2.
We use the same notation as above and notes that s (R(0, n, m)) = 96,
s (R(n, 0, m)) = 0, s (R(n, m, 0)) = 0 and s (R(n, n, n)) = 32. Following the
same recursive steps we can calculate As share as
s (R(1, 2, 2)) = (1/3) s (R(0, 2, 2)) + (1/3) s (R(1, 1, 2)) + (1/3) s (R(1, 2, 1))
= (1/3) 96 + (1/3) [(1/3) s (R(0, 1, 2)) + (1/3) s (R(1, 0, 2)) + (1/3) s (R(1, 1, 1))]
+ (1/3) [(1/3) s (R(0, 2, 1)) + (1/3) s (R(1, 1, 1)) + (1/3) s (R(1, 2, 0))]
= (1/3) 96 + (1/3) [(1/3) 96 + (1/3) 0 + (1/3) 32] + (1/3) [(1/3) 96 + (1/3) 32 + (1/3) 0]
= (1/3) 96 + (1/9) 96 + (1/9) 32 + (1/9) 96 + (1/9) 32
= (1/9) (288 + 96 + 32 + 96 + 32)
= 544/9. (2.2)

Hence A is to receive 544/9 pistoles.
2.2.2 Fermat's approach
The game will end after at most three throws. Hence the sample space
contains the following results:
aaa aab aac aba abb abc aca acb acc
baa bab bac bba bbb bbc bca bcb bcc
caa cab cac cba cbb cbc cca ccb ccc
(2.3)
Player A wins after getting at least one additional point, except when
player B or player C has managed to get two points before. This happens
in 17 of the 27 cases, and hence A is to receive (17/27) · 96 = 544/9 pistoles.
It should be borne in mind that the aim of Pascal's and Fermat's investi-
gations was to determine the stakes each player should receive, not to
determine probabilities in general. In this they were followed by their suc-
cessors Christiaan Huygens and James Bernoulli, who devoted much time to
variants of the Problem of Points, see under Exercises below. It is, however,
noteworthy that these brilliant minds did not succeed in obtaining a general
account of the probability of A winning the game, although such an account
presents few problems. Assume for instance that A is in need of p points, B
of r points, and C of s points. Then A wins the game if and only if she ob-
tains p points while B and C obtain at most r − 1 and s − 1 points respectively.
If the probabilities of the corresponding sequences of points are added, the
probability of A winning follows, see under Exercises.
It should also be noted that the division of stakes proposed by Fermat
and Pascal was immediately accepted by all mathematicians at the time, and
that their proposal also has a bearing on the games considered in Chapter
2. Assume namely that a player has bought a ticket in one of the games
described there and that some circumstance hinders a drawing from taking
place. Then, according to Fermat and Pascal, she is entitled to the expected
value of the game in question. The term expected value stems from the term
expectatio used by Christiaan Huygens in his treatise De Ratiociniis in Ludo
Aleae, which is contained in the first part of Bernoulli (1713). J. Bernoulli
there also gives a general argument to support the claim that the expectatio
of a game is what you should expect to get when playing it. Suppose that we
have n urns containing a1 , . . . , an SEK. If n players are allotted these urns by
a random experiment, then the expectatio of each player is the same. Since
the combined expectatio is a1 + · · · + an , the expectatio of each player is

(a1 + · · · + an ) / n.
2.3 Other Problems
Fermat and Pascal also seem to have solved the following problem posed
by Huygens, see About and Boy (1983). Two players A and B throw two
distinguishable dice in turn. A starts and wins if his throw results in the sum
of six, whereas B wins if his throw results in the sum of seven. What are the
odds for A and B respectively?
Fermat and Pascal found the right numbers, 30/61 for A and 31/61 for
B, but unfortunately their calculations are lost. To find the answer, note
first that the sum six occurs when the dice show the numbers (1, 5), (2, 4),
(3, 3), (4, 2) or (5, 1), and that the sum seven occurs when the dice show the
numbers (1, 6), (2, 5), (3, 4), (4, 3), (5, 2) or (6, 1). Note then that A can
win after any odd number of throws and B after any even number. The
probability that A wins after 2n − 1 throws (on the nth turn), with n ≥ 1,
equals

P (An ) = (31/36 · 30/36)^(n−1) · 5/36, (2.4)

and the probability that B wins after 2n throws (on the nth turn), with
n ≥ 1, equals

P (Bn ) = (31/36 · 30/36)^(n−1) · 31/36 · 6/36. (2.5)

The probability that either player wins on the mth turn is

P (Am ) + P (Bm ) = (31/36 · 30/36)^(m−1) · 5/36 + (31/36 · 30/36)^(m−1) · 31/36 · 6/36
= (31/36 · 30/36)^(m−1) · (5/36 + 31/36 · 6/36)
= (31/36 · 30/36)^(m−1) · 61/216. (2.6)

Hence the probability that A is the winner, given that the game is decided
on the mth turn, is

P (Am ) / (P (Am ) + P (Bm )) = (5/36) / (61/216) = 30/61, (2.7)

independently of m, so the odds are 30/61 for A and 31/61 for B.
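The geometric series hidden behind (2.4) and (2.5) can also be summed numerically. A minimal Python sketch with exact fractions (the variable names are ours):

```python
from fractions import Fraction

pA_turn = Fraction(5, 36)    # A throws a sum of six
pB_turn = Fraction(6, 36)    # B throws a sum of seven
miss = (1 - pA_turn) * (1 - pB_turn)   # 31/36 * 30/36: both miss a full turn

# P(A wins) = sum over n >= 1 of miss^(n-1) * 5/36 = (5/36) / (1 - miss),
# and similarly for B, who only gets to throw after A has missed.
pA = pA_turn / (1 - miss)
pB = (1 - pA_turn) * pB_turn / (1 - miss)

print(pA, pB)        # 30/61 31/61
assert pA + pB == 1  # someone eventually wins
```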
2.4 Exercises
two throws, B makes three throws, A makes three throws, etc. Find
the probability of A winning the game.
8. Game of stones (from Bernoulli (1713))
An urn contains four white stones and eight black ones. Three players
in turn draw stones without replacement until a white stone is drawn.
Determine the probability of winning for each player.
9. Game of stones (from Bernoulli (1713))
Like exercise 8 except that stones are drawn with replacement.
10. Game of dice (from Bernoulli (1713))
A and B are involved in a game where three dice are thrown. In case the
number of eyes is 11, A is to give B a coin, whereas she is to receive one
from B if the number of eyes is 14. They start the game with 12 coins
each, and whoever gains possession of all the coins wins the game. Determine
the quotient between their probabilities of winning the game.
2.5 Solutions
1. Problem of Points
The expected value for B is

0 · P (A wins the game) + 64 · P (B wins the game) pistoles.

Now A wins if she gains two points and B at most three points, whereas
B wins if she gains four points and A at most one. Hence B wins if
she gains four points and A none, or if she gains four points and A one
point but not the last one. Since

P (A wins a point) = P (B wins a point) = 1/2

we can calculate

P (B wins four points and A none) = (1/2)^4 = 1/16.

Moreover

P (B wins four points and A one point but not the last one) = 4/32

since A can win the first, second, third or fourth round. Hence

P (B wins the game) = P (B wins four points and A none)
+ P (B wins four points and A one point but not the last one)
= 1/16 + 4/32 = 3/16.

The expected value for B is thus 64 · 3/16 = 12 pistoles.
2. Problem of Points
The expected value for C is

E[C] = 0 · P (A wins the game)
+ 0 · P (B wins the game)
+ 96 · P (C wins the game) pistoles.

Now C wins the game if and only if she wins the next two rounds.
Hence

P (C wins the game) = 1/9.

Accordingly her expected value is 96/9 = 10 2/3 pistoles.
(a) Problem of Points
A wins the game if she wins p rounds including the last round,
while B wins at most q − 1 rounds. With a the probability that A
wins a round and b the probability that B does, we hence have

P (A wins the game) = Σ_{k=0}^{q−1} (p − 1 + k choose k) a^p b^k. (2.8)

As an example, set p = 2 and q = 3; then there are basically three
possible ways in which A can win: A gets two points and B zero
points, A gets two points and B one point, or A gets two points
and B two points. Looking closer at the last scenario, we see that
it can occur in three different ways. Let A represent a round where
player A wins and B a round where player B wins. Since A always
has to score in the last round we have a situation like · · · A, and
the question is in how many ways we can order the remaining ABB.
Of course we can list them like this:

ABB BAB BBA

But as the numbers become bigger, listing all possible combinations
like this soon gets close to impossible. Instead one can utilize
the fact that n distinguishable objects in a row can be ordered in
n(n − 1) · · · 1 = n! different ways. However, since the two Bs aren't
distinguishable (and the same would of course apply to the As had
we had more than one), the total number of orderings has to be
divided by the number of orderings of the Bs among themselves. To
reach the general case we use the binomial coefficient

(n choose k) = n! / (k!(n − k)!).

Here n represents the total number of rounds, namely p − 1 + k,
where p − 1 is the number of rounds won by A except the last one,
and k is the total number of rounds won by B. Back to our example,
we can calculate the number of distinct orderings of ABB as

(3 choose 2) = 3! / (2!(3 − 2)!) = (3 · 2!) / (2! · 1!) = 3.

Writing the expression (2.8) out in full for p = 2 and q = 3 gives

Σ_{k=0}^{2} (1 + k choose k) a² b^k = (1 choose 0) a² b⁰ + (2 choose 1) a² b¹ + (3 choose 2) a² b²
= a² + 2a²b + 3a²b².
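Formula (2.8) can be checked against Fermat's count for p = 2, q = 3 and a fair game (a = b = 1/2). A minimal Python sketch (the function name `p_A_wins` is ours):

```python
from fractions import Fraction
from math import comb

def p_A_wins(p, q, a, b):
    # Formula (2.8): sum over k = 0, ..., q-1 of C(p-1+k, k) a^p b^k.
    return sum(comb(p - 1 + k, k) * a**p * b**k for k in range(q))

result = p_A_wins(2, 3, Fraction(1, 2), Fraction(1, 2))
print(result)   # 11/16, matching Fermat's 11 winning cases out of 16
```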
(b) Player A wins the game if she wins the last round and, in addition,
p − 1 rounds while B wins at most q − 1 rounds and C at most
r − 1 rounds. Suppose that the probability of A gaining a point
equals a, and that the corresponding probabilities for B and C
are b and c respectively. Then

P (A wins the game) = a^p Σ_{k=0}^{q−1} Σ_{l=0}^{r−1} (p − 1 + k + l choose p − 1, k, l) b^k c^l.

Note that the double summation sign means: for each k, calculate
the expression for all l. Also, instead of a binomial coefficient we
are now using a multinomial coefficient, which expands as follows:

(p − 1 + k + l choose p − 1, k, l) = (p − 1 + k + l)! / ((p − 1)! k! l!).
· · · = 34 + 2/3 pistoles

and

· · · = 29 + 1/3 pistoles.
Now, set p = 1 − 5/36 (the probability of not getting a sum of six) and
q = 1 − 6/36 (the probability of not getting a sum of seven). Then

P (A1 ) = 1 − p,
P (A3 ) = p q^2 (1 − p^2),
P (A5 ) = p q^2 p^2 q^3 (1 − p^3),
P (A7 ) = p q^2 p^2 q^3 p^3 q^4 (1 − p^4).

Considering the recurring pattern we can write the more general formula

P (A2n+1 ) = p (pq)^((n−1)(n+2)/2) q^(n+1) (1 − p^(n+1)) = p^(n(n+1)/2) q^(n(n+3)/2) (1 − p^(n+1)).

For the game of stones without replacement, let ln be the event that the
first n stones drawn are all black and wn the event that the first white stone
appears in the nth draw. Then

P (ln ) = P (ln−1 ) · (8 − (n − 1)) / (12 − (n − 1)) for n = 1, . . . , 8

and

P (wn ) = P (ln−1 ) · 4 / (12 − (n − 1)) for n = 1, . . . , 9,
and then

P (player 1 wins) = P (w1 ) + P (w4 ) + P (w7 ) = 7/15,
P (player 2 wins) = P (w2 ) + P (w5 ) + P (w8 ) = 53/165
and
P (player 3 wins) = P (w3 ) + P (w6 ) + P (w9 ) = 7/33.
and consequently

sn = (1 − x^(n+1)) / (1 − x) = (1 − q^(3(n+1))) / (1 − q^3).

Since q^(3(n+1)) tends to zero quite rapidly as n gets large, we can omit
that term and approximate the original sum 1 + · · · + q^(3n) + · · · by
1 / (1 − q^3). Hence

P (A wins) = p / (1 − q^3),
P (B wins) = pq / (1 − q^3)
and
P (C wins) = pq^2 / (1 − q^3).
For the game of dice, a throw of three dice yields the sum 14 in 15 of the
216 possible outcomes and the sum 11 in 27 of them, so

a = 15/216 and b = 27/216.
Now, let P (An ) be the probability that A wins after n rounds, where n ≥ 12,
let A = i and B = i denote that player A and B won i times respectively,
and finally let L = i denote that neither player won in i rounds. Then

P (A12 ) = P (A = 12, B = 0, L = 0),
P (A13 ) = P (A = 12, B = 0, L = 1)
and
P (A14 ) = P (A = 12, B = 0, L = 2) + P (A = 13, B = 1, L = 0).

This can be generalized to

P (An ) = Σ_{12+2i≤n} P (A = 12 + i, B = i, L = n − 12 − 2i)
= Σ_{12+2i≤n} a^(12+i) b^i (1 − a − b)^(n−12−2i) · (n − 1)! / ((12 + i − 1)! i! (n − 12 − 2i)!).
Setting up the same equation for P (Bn ) is only a matter of shifting the
symbols, and thus we get

P (Bn ) = Σ_{12+2i≤n} a^i b^(12+i) (1 − a − b)^(n−12−2i) · (n − 1)! / ((12 + i − 1)! i! (n − 12 − 2i)!).

If we extract the factors a^12 and b^12 from the summations we can
then calculate

P (An ) / P (Bn )
= (a^12 Σ_{12+2i≤n} a^i b^i (1 − a − b)^(n−12−2i) (n − 1)! / ((12 + i − 1)! i! (n − 12 − 2i)!))
/ (b^12 Σ_{12+2i≤n} a^i b^i (1 − a − b)^(n−12−2i) (n − 1)! / ((12 + i − 1)! i! (n − 12 − 2i)!))
= a^12 / b^12.
Lastly we calculate

a^12 / b^12 = ((15/216) / (27/216))^12 = (5/9)^12 ≈ 0.00086.
Appendix 3
Monte Carlo simulations
In this appendix we will present the method used and the results from the
Monte Carlo simulations. Because even simple games such as the ones in
section 2.1 require rather extensive simulations, we've chosen to include only
those for games 1 and 2. However, the same method is applicable to games 3
and 4 as well.
As a reminder, game 1 has the following set up:
Game 1
One throw of four distinguishable dice.
Stake: 200 SEK
Prizes:
(a) 12 000 SEK if the product of the outcomes is an odd number and their
sum a square number.
(b) 4 000 SEK if the product of the outcomes is an odd number and their
sum minus one a square number.
(c) 1 000 SEK if the product of the outcomes is an odd number and neither
(a) nor (b) holds.
Expected value:

(11/1296) · 12000 + (16/1296) · 4000 + (54/1296) · 1000 = 192.90
The set up for game 2 is:
Game 2
One throw of four distinguishable dice.
Stake: 200 SEK
Prizes:
(a) 3 000 SEK if the product of the outcomes is a square number, their
sum is odd, and the number one does not occur.
(b) 2 000 SEK if the product of the outcomes is a square number, their
sum is even, and the number one does not occur.
(c) 550 SEK if the product of the outcomes is a square number and neither
(a) nor (b) holds.
Expected value:

(24/1296) · 3000 + (65/1296) · 2000 + (110/1296) · 550 = 202.55
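The probabilities 11/1296, 16/1296 and 54/1296 for game 1 can be verified by enumerating all 6^4 outcomes. A minimal Python sketch for game 1 (game 2 can be treated the same way; the helper names are ours):

```python
from itertools import product
from math import isqrt, prod

def is_square(n):
    return isqrt(n) ** 2 == n

def prize(dice):
    # Game 1: prizes only when the product of the outcomes is odd.
    if prod(dice) % 2 == 1:
        if is_square(sum(dice)):
            return 12000
        if is_square(sum(dice) - 1):
            return 4000
        return 1000
    return 0

outcomes = list(product(range(1, 7), repeat=4))   # all 1296 throws of four dice
ev = sum(prize(d) for d in outcomes) / len(outcomes)
print(round(ev, 2))   # 192.9
```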
To run the simulations we used R, a free platform for statistical com-
puting. We are aware that this isn't the implementation with the fastest
run-time, but the code has been written with the beginner programmer in
mind. All the code used is included bit by bit, and suggestions on how to
improve clarity are welcome. However, before going any further we need to
be clear about what to do.
The question which we will try to answer with the help of computer
simulations is: How many games need to be played in order for the average
prize to stay within +/- 4 SEK from the expected value with a probability of
at least 0.99? Now, if we play m games and then calculate the average prize,
it will either be within +/- 4 SEK from the mean, or it will not. Hence,
whether the average prize of m games falls within the specified interval or
not, can be seen as a Bernoulli trial (an experiment with only two possible
outcomes), equivalent to the tossing of a coin.
If we let p be the probability of the average prize falling within +/- 4 SEK
of the expected value, then we can set up the following two hypotheses:

H0 : p < 0.99

and

H1 : p ≥ 0.99.

Were p less than 0.99, we should expect to find more averages outside
the interval in the long run than if p were equal to or greater
than 0.99. Thus, if we get sufficiently many averages within the interval in
proportion to the number of trials, we could decide to reject the hypothesis
H0 .
The probability of getting n hits in a row is p^n, and thus the probability
that p < 0.99, given n hits in a row (and a uniform prior on p), is

P (p < 0.99) = ∫_0^0.99 p^n dp / ∫_0^1 p^n dp = (0.99^(n+1) / (n + 1)) / (1 / (n + 1)) = 0.99^(n+1).

Setting this probability to 0.01 we can then solve

0.01 = 0.99^(n+1)
0.01 / 0.99 = 0.99^n
n = (ln 0.01 − ln 0.99) / ln 0.99 = ln 0.01 / ln 0.99 − 1 = 457.2 . . .

Consequently, if each of 458 trials of m games were to result in an average prize
at most +/- 4 SEK from the expected value, we would reject H0 and rather
believe H1 to be the true one, and thus take m to be the number of games
that has to be played in order to stay within an interval of at most +/- 4 SEK
from the expected value with a probability of at least 0.99.
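The number 458 follows directly from the last equation. A minimal Python sketch:

```python
import math

# Smallest whole number of consecutive hits that pushes P(p < 0.99) below 0.01.
n = math.log(0.01) / math.log(0.99) - 1
print(n)               # 457.2...
print(math.ceil(n))    # 458
```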
Looking at the code used in the simulations, we first defined some com-
monly used constants in order to avoid unnecessary and time-consuming
calculations:
N <- 458;
EV <- c("GAME1" = 192.90, "GAME2" = 202.55);
# All square numbers up to 36^2 = 1296, the largest possible product of four dice.
EVEN_SQUARES <- (1:36)^2;
Then we defined a couple of helper functions to make some of the conditionals clearer:
even <- function(n) n %% 2 == 0;  # TRUE if n is even
odd  <- function(n) n %% 2 == 1;  # TRUE if n is odd
We put the functions for calculating the prize of a four-dice throw, de-
pending on which game we are playing, in a list. Doing this makes the rest of
the code simpler and possibly faster, since we don't have to use conditionals
in order to keep track of which game we are playing.
win <- list(
  "GAME1" = function(l) {
    if (odd(prod(l))) {
      if (sum(l) %in% EVEN_SQUARES) return(12000);
      if ((sum(l) - 1) %in% EVEN_SQUARES) return(4000);
      return(1000);
    }
    return(0);
  },
  "GAME2" = function(l) {
    if (prod(l) %in% EVEN_SQUARES) {
      if (odd(sum(l)) && !(1 %in% l)) return(3000);
      if (even(sum(l)) && !(1 %in% l)) return(2000);
      return(550);
    } else {
      return(0);
    }
  }
);
  }
  fun <- function(x) {
    p(p_min, p_max, N, x);
  }
  return(vapply(failed, fun, FUN.VALUE = numeric(1)));
}
3.1 Results
The results from the simulations of Game 1 are presented in figures 3.1 and
3.2, and the results from the simulations of Game 2 are presented in figures 3.3
and 3.4.
Looking at game 1, the first time at which none of the 458 trials fails is at
661,000 repeated games. The probability of zero failures given H0 is less than
0.01. Albeit not impossible, starting at 797,000 repeated games, zero failures
are becoming increasingly common and the probability of the outcome,
given H0 , is always less than 0.5. Thus it is reasonable to believe that, for
game 1, p < 0.99 up to somewhere in the neighborhood of 796,000 repeated games.
Considering game 2 in the same way, the first occurrence of zero failures is
at 167,000 repeated games, and at 252,000 repeated games, zero failures are
becoming increasingly common, with the probability of the outcome, given
H0 , less than 0.5. Consequently, it is reasonable to believe that p < 0.99 up to
approximately 251,000 repeated games for game 2.
Comparing the above results with the Bernstein-Bennet and Talagrand
inequalities, we have for game 1 the bounds 953,608 and 810,700, and for
game 2 the bounds 235,540 and 200,185, respectively. Thus, in summary, the
simulations for game 1 give a result which is a bit smaller than the inequal-
ities, whilst for game 2 they give a somewhat larger, more conservative number.
Figure 3.1: The number of failures among the 458 trials plotted against the
number of repeated games (500,000 to 2,000,000) for game 1.

Figure 3.2: The probability of the number of failures given H0 (that p < 0.99)
for game 1.
Figure 3.3: The number of failures among the 458 trials plotted against the
number of repeated games (0 to 600,000) for game 2.

Figure 3.4: The probability of the number of failures given H0 (that p < 0.99)
for game 2.