
DECISION PROBLEMS AND

PROCEDURES FOR HANDLING THEM


Per-Erik Malmnas
and
Andreas Paulsson
version 06.09.2015

Contents

1 Introduction
  1.1 Decision problems
    1.1.1 Choice of medical treatment (adapted from Weinstein and Fineberg [1980, p. 180])
    1.1.2 A choice between two languages for computer programs
    1.1.3 Monty Hall
  1.2 Procedures for handling decision problems: the decision theoretic approach
    1.2.1 Preliminaries
    1.2.2 Examples

2 Classical Decision Theory
  2.1 Pure Classical Decision Theory
  2.2 Modified Classical Decision Theory
    2.2.1 Introduction to utilities

3 Supersoft Decision Theory
  3.1 Representation of Decision Problems
  3.2 Satisfiable and not Satisfiable Decision Frames
  3.3 Evaluations of Options and Decision Rules in wSSD
    3.3.1 Qualitative Evaluations
    3.3.2 Expected Value
    3.3.3 Decision Rules Based on Extreme Values

4 Other Approaches to Decisions under Uncertainty
  4.1 Hodges and Lehmann (1952)
  4.2 Blum and Rosenblatt (1967)
  4.3 Watson (1974)
  4.4 Levi (1974)
  4.5 Gardenfors and Sahlin (1982)

5 Evaluations and Choice Rules
  5.1 Fundamental Concepts
  5.2 Miscellaneous remarks
    5.2.1 On Tests
    5.2.2 On Options
    5.2.3 Classification of Evaluations
    5.2.4 On the Relation between Preference Tests and Choice Tests
  5.3 Evaluations and Common Ratio Tests
    5.3.1 Introduction
  5.4 An Examination of Some Proposals
    5.4.1 Hagen (1969; 1972; 1979)
    5.4.2 Fishburn (1983)
    5.4.3 Loomes and Sugden (1986)
    5.4.4 Green and Jullien (1988)
    5.4.5 Quiggin (1982)
    5.4.6 Yaari (1987)
  5.5 On Continuous Evaluations
    5.5.1 Polynomial evaluations of options in two variables
    5.5.2 Polynomial evaluations of options with three outcomes
  5.6 Axiomatic Utility Theory and Expected Utility
    5.6.1 Fundamental Concepts
    5.6.2 Herstein and Milnor (1953)
    5.6.3 Savage (1954; 1972)
    5.6.4 Oddie and Milne (1990)
    5.6.5 Summary
  5.7 Allais' Example and Expected Utility
  5.8 Lotteries with Non-monetary Prizes

Appendices

Appendix 1 Elementary Probability

Appendix 2 Historical Notes
  2.1 The Classical Problem of Points
    2.1.1 Pascal's solution
    2.1.2 Fermat's solution
  2.2 Generalizations
    2.2.1 Pascal's approach
    2.2.2 Fermat's approach
  2.3 Other Problems
  2.4 Exercises
  2.5 Solutions

Appendix 3 Monte Carlo simulations
  3.1 Results

Chapter 1
Introduction

What will I suffer now? These words by Odysseus, one of the most successful decision makers of Western literature, express what most people at times experience, both in their private lives and as professionals. It is also worth noting that the words are not outdated, since, in contrast to other areas, man as a decision maker generally relies on intuition rather than rules when solving decision problems. Since such a procedure can hardly be seen as an ideal one, especially if a proposed solution must be supported by arguments, it is natural to look into the possibility of finding formal methods, like those in logic, for solving decision problems.

1.1 Decision problems

If you are looking for the right kind of white color for the walls of your drawing-room, or the perfect sour milk, then you are facing an optimization problem. If instead you are pondering which brand of yogurt to buy at your local grocery, or trying to find the ingredients of a decent meal there, then you are facing a problem of choice, or a problem of satisfaction. Decision problems come in one of these forms or as combinations of them. A good example of the latter case is the procedure Swedish authorities follow when selecting tenders under the Act of Public Procurement. First each tender is subjected to a test in order to see if it fulfills certain minimum requirements. Then a selection is made from the tenders that have passed the initial test.
There is also another and perhaps more familiar way of classifying decisions, namely in terms of timescales. In this course only decisions where there is ample time for deliberation are studied. Hence operational decisions made by firefighters, policemen and soldiers in the field are not considered. Moreover, the main focus will be on professional decision making, since such decisions ought to be well founded. However, as an introduction, three simple problems are presented.

1.1.1 Choice of medical treatment (adapted from Weinstein and Fineberg [1980, p. 180])

A 68-year-old woman has suffered from bad circulation in the left leg for some time. She is now seeing a doctor because of an injury that has led to an infection that may develop into gangrene in the left foot. There are two available options: either an immediate operation O, leading to the insertion of an artificial limb below the left knee, or a three-month treatment with drugs M. Such a treatment is successful in seven out of ten cases. But if it fails, a more complicated operation, leading to the insertion of an artificial limb above the left knee, is needed. The doctor considers the probability of a successful operation to be 99 percent if it is done right away, and 90 percent if it is done after an unsuccessful treatment with drugs. The doctor is also willing to let the preferences of the patient influence her proposals. How should she act?

1.1.2 A choice between two languages for computer programs

A result of a cooperation with section TR of ELLEMTEL, a company owned jointly by Telia and Ericsson.
In spring and summer 1987 TR had developed a small prototype of a support system for telephone services consisting of a simulator and a planner. The prototype was written in Prolog and was presented to various audiences. Since these presentations were quite successful, it was decided that TR should develop a more elaborate prototype containing more tools. Large parts of this prototype were implemented during the autumn, but it turned out that fitting the different parts to one another was harder than expected, since the Prolog system used could not handle large programs. The staff spent a lot of time learning more about Prolog and how to deal with large programs, and in spring 1988 it was decided that the prototype should be ready for presentation after the summer. The available options turned out to be the following ones:
A: The whole system is written in C
B: The section continues searching for a version of Prolog that can
handle large programs

C: The project is temporarily abandoned


TR then embarked upon evaluating the options by stating their pros and cons:
Pro A: The number of man-hours needed could be estimated without difficulty. The finished product would in all probability be stable and enable swift computations.
Con A: The whole code must be rewritten, and getting hold of proficient C programmers could be a problem. Moreover, the staff would be forced to spend a lot of time programming in C, which in turn would stop them from engaging in more interesting work.
Pro B: If a stable Prolog system is found, then the prototype would be completed with limited effort. Hence the staff could spend time on improving the system by adding more tools.
Con B: Only parts of a prototype might be completed in September.
Pro C: New and better Prolog systems are likely to appear in the not too distant future. Hence TR should for the moment engage in research and development rather than develop a prototype.
Con C: The whole approach of TR might be discredited if a complete prototype isn't completed by September.
At a preliminary meeting with me it turned out that C was not a viable option, and I was asked to demonstrate the possible merits of a decision theoretic approach to making a choice between A and B. The result is shown in the next section.

1.1.3 Monty Hall

Suppose that you are to select one of three doors in a popular American TV show. You are informed that there is a car behind one of the doors, and a lemon behind each of the other ones. When you have made your choice, the host, Monty Hall, opens one of the doors that you didn't choose, and behind which there is a lemon. You are now invited to make a new choice. Should you accept the invitation and choose the other, still closed, door?

1.2 Procedures for handling decision problems: the decision theoretic approach

1.2.1 Preliminaries

The salient feature of a decision theoretic approach to decision problems is that an option considered is evaluated in accordance with its consequences.
This entails that an evaluation of an option must be preceded by both an
enumeration and an evaluation of its consequences. For some decisions, see
for instance the Monty Hall example above, this causes no problem, but in
the other cases considered above, it is far from obvious how this is to be
accomplished. I will return to these problems in a moment after a few useful
general steps have been considered.
First of all it must be decided if the problem considered is a problem of
choice or a problem of satisfaction, or even an optimization problem. Then
a suitable framework has to be considered. Take as an example the problem
of how to handle the nuclear waste from the nuclear power plants in Sweden.
This could be seen either as a problem limited to nuclear waste or in the
more general context of long lived toxic waste. In this area feasible strategies
clearly depend on the framework chosen. Furthermore, a horizon must be
fixed since it is in practice impossible to enumerate all consequences of a
given option. Finally the evaluation of the consequences must in some cases
proceed in stages starting from a consideration of price or quality and ending
in a final evaluation of them.

1.2.2 Examples

In this section it is shown how a decision theoretic approach can be used to solve the problems mentioned above. The method used will be described in more detail in chapter 3, followed by some theoretical underpinnings in chapters 4 and 5.
Gangrene First, the doctor should try to find out if there exists a decisive factor. In this case two such factors come to mind: the possibility of a recovery and the risk of serious complications. If one of these is decisive for the patient, then the problem is an easy one. But if this is not the case, then some trade-off between values and probabilities is needed. One way of doing this is to let the probabilities of the various consequences act as weights and consider the weighted arithmetic mean of the values of the consequences of the options. Then the option with the highest mean value is chosen. Sadly, this approach presupposes a mathematical representation of the values of the outcomes, something found strange by some people in the medical profession. However, to my mind this can be done in an uncontroversial way. Let us first label the consequences as follows:
r: recovery
b: an artificial limb below the knee
a: an artificial limb above the knee
s: serious complications
(Note that s in reality is a set of consequences, but since these are the same for the options considered, there is no point in enumerating them.)
Let v be a value function, which maps a consequence to a specific value.
Consequently, if c is a consequence, then v(c) is the value of that consequence.
Returning to our decision problem, and with the help of the value function
v, we can now define the values of the consequences: v(r) is the value of a
recovery, v(b) is the value of an artificial limb below the knee, v(a) is the
value of an artificial limb above the knee, and v(s) is the value of serious
complications.
If we let the values of the consequences correspond to numerical values, where the better the consequence, the higher the numerical value, most patients are likely to rank the consequences as

    v(r) > v(b) > v(s),    (1.1)
    v(r) > v(a) > v(s),    (1.2)

meaning that recovery is better than an artificial limb below the knee, which in turn is better than serious complications. Likewise, recovery is better than an artificial limb above the knee, which is better than serious complications.
Most patients would also set

    v(b) > v(a)    (1.3)

so I will concentrate on this case.


Let us now calculate the mean consequence value (denoted M) of an immediate operation, MO, and of a drug treatment, MM. The decision problem can be modeled as a tree structure with nodes connected by edges (see figures 1.1 and 1.2), where the numbers above the edges represent the probabilities of the respective outcomes. For example, if we choose the alternative O, then the probability of a successful operation b is 0.99.

Figure 1.1: The tree of an immediate operation, showing the probability of a successful operation b to be 99 percent, and the probability of serious complications s to be 1 percent.

Figure 1.2: The tree of the choice of a drug treatment. The probability of recovery r is 70 percent. Should the drug treatment fail, the probability of a successful operation with an artificial limb above the knee a is 90 percent, and the probability of serious complications s is 10 percent.

In the tree of M (see figure 1.2), the symbol o stands for an operation, given that the drug treatment failed.
A mean consequence value is the weighted average of the mean consequence values of the subsequent nodes, where the weighted average is the sum of all consequence values multiplied by their respective probabilities. Looking at the tree in figure 1.3, the weighted average of the nodes is calculated as

    a·P(a) + b·P(b) + c·P(c)

where a, b and c represent the values of the nodes and P is the probability function. You can read more about the probability function in appendix 1.
The mean consequence value of M is the weighted average of v(o) and v(r). We will set v(r) = 1 below, but we also need the value of o, which in turn is the weighted average of v(a) and v(s), in order to calculate MM.

Figure 1.3: The weighted average of a, b and c is calculated as a·P(a) + b·P(b) + c·P(c).

Let's start by calculating v(o), and then continue with MM and MO:

    v(o) = P(a)v(a) + P(s)v(s)
    MM = P(o)v(o) + P(r)v(r)    (1.4)
    MO = P(b)v(b) + P(s)v(s)

From the inequalities in (1.1) and (1.2) we know that r is the best consequence and s is the worst consequence. Setting v(r) = 1 and v(s) = 0 gives us a numerical interval of consequence values ranging from 0 to 1, which enables us to simplify v(o), MM and MO to

    v(o) = P(a)v(a) + P(s)v(s)
         = P(a)v(a) + P(s)·0
         = P(a)v(a)
         = 0.9v(a)

    MM = P(o)v(o) + P(r)v(r)
       = P(o)v(o) + P(r)·1
       = P(o)v(o) + P(r)    (1.5)
       = 0.3·(0.9v(a)) + 0.7
       = 0.27v(a) + 0.7

    MO = P(b)v(b) + P(s)v(s)
       = P(b)v(b) + P(s)·0
       = P(b)v(b)
       = 0.99v(b)
To determine which of the alternatives O and M has the highest value, we can calculate the difference ΔM between the mean consequence values of the two alternatives as

    ΔM = MM − MO
       = 0.27v(a) + 0.7 − 0.99v(b)    (1.6)

Consequently, if ΔM is positive, the preferred alternative is a treatment with drugs M, and if it is negative the rational choice is an immediate operation O.
A three-dimensional plot of the equation in (1.6) can be seen in figure 1.4 (one dimension for each variable v(a) and v(b), and one dimension for the result ΔM). The plot's triangular shape is due to the constraint in (1.3), that v(b) > v(a). As v(b) → 0 and v(a) → 0, ΔM approaches its maximum ΔMmax, and as v(b) → 1 and v(a) → 0, ΔM approaches its minimum ΔMmin. From (1.1) and (1.2) we know that v(b) and v(a) can only take on values in the so-called open interval ]0, 1[, i.e. any value between 0 and 1, but not exactly 0 or exactly 1. Thus

    ΔMmax < 0.27·0 + 0.7 − 0.99·0 = 0.7
    ΔMmin > 0.27·0 + 0.7 − 0.99·1 = −0.29

Figure 1.4: From the plot we can see that ΔM approaches its maximum when both v(a) and v(b) approach 0, and approaches its minimum when v(a) approaches 0 and v(b) approaches 1.

Hence ΔM must lie somewhere in the interval

    −0.29 < ΔM < 0.7.
Since ΔM can take on both negative and positive values, we need more constraints on the values in order to favor one of the options. In particular we must introduce the notion of distance between values. One such constraint that comes to mind is

    v(r) − v(b) ≥ v(b) − v(a)

which reflects the opinion that the distance in value between recovery and an artificial limb below the knee is not smaller than the distance between the two different artificial limbs. To employ the inequality we can rearrange things a bit:

    v(r) − v(b) ≥ v(b) − v(a)
    v(r) ≥ 2v(b) − v(a)
    v(r) + v(a) ≥ 2v(b)
    (v(r) + v(a))/2 ≥ v(b)

and, just as above, set v(r) = 1, which in turn gives

    v(b) ≤ (1 + v(a))/2.

At the same time we know from (1.3) that

    v(a) < v(b)

and thus we have

    v(a) < v(b) ≤ (1 + v(a))/2.

Applying the above inequality to ΔM gives us

    0.27v(a) + 0.7 − 0.99·(1 + v(a))/2 ≤ ΔM < 0.27v(a) + 0.7 − 0.99v(a)
    0.205 − 0.225v(a) ≤ ΔM < 0.7 − 0.72v(a)    (1.7)

and thus a new interval for ΔM, depending only on v(a), where 0 < v(a) < 1, as in figure 1.5.

Figure 1.5: This plot shows the boundaries of M as functions of v(a) only.
By using (1.7) and solving

    0.7 − 0.72v(a) = 0  ⟺  v(a) = 0.7/0.72 ≈ 0.9722

we see that for ΔM to lie completely in a negative interval (i.e. for an immediate operation O to be preferred over a treatment with drugs M), the value of an artificial limb above the knee, v(a), needs to be greater than 0.9722. But then, since v(r) > v(b) > v(a), the values of v(r), v(b) and v(a) would be so close that an operation now can only be recommended if the risk of serious complications is a decisive factor. Hence the doctor should recommend a medical treatment even in this case. This is a reasonable recommendation even in the absence of the inequality

    v(r) − v(b) ≥ v(b) − v(a)

because it seems safe to assume that the value of an artificial limb above the knee, v(a), is at least as high as 0.8. Setting v(a) = 0.8 in the original equation for ΔM gives us

    ΔM = 0.27v(a) + 0.7 − 0.99v(b)    (1.8)
       = 0.27·0.8 + 0.7 − 0.99v(b)    (1.9)
       = 0.916 − 0.99v(b)             (1.10)

which in turn means that v(b) must be higher than 0.925... if ΔM is to be less than zero and, consequently, an immediate operation O is to be preferred.
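The analysis above is easy to check numerically. The following small Python sketch (an illustration added here, not part of the original text) evaluates ΔM = 0.27·v(a) + 0.7 − 0.99·v(b), with v(r) = 1 and v(s) = 0 fixed as in the text:

```python
# Sketch: evaluate Delta_M = M_M - M_O for the gangrene example,
# with v(r) = 1 and v(s) = 0 fixed as in the text.

def delta_m(v_a, v_b):
    """Delta_M = 0.27*v(a) + 0.7 - 0.99*v(b); positive favours drugs M."""
    assert 0 < v_a < v_b < 1, "requires v(s)=0 < v(a) < v(b) < v(r)=1"
    return 0.27 * v_a + 0.7 - 0.99 * v_b

# For moderate values the drug treatment M is preferred:
print(delta_m(0.8, 0.9) > 0)    # True

# Only when both values are very close to 1 can the operation win:
print(delta_m(0.93, 0.99) < 0)  # True

# Threshold from (1.7): Delta_M < 0.7 - 0.72*v(a), which is zero at
print(round(0.7 / 0.72, 4))     # 0.9722
```

The sample values 0.8/0.9 and 0.93/0.99 are illustrative choices, not values fixed by the text.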
Suitable language for a computer program: a decision theoretic approach As a first step it was agreed that the consequences ci of the two options were the following ones:
Option A
c1 Prototype in C ready in September 1988, costs quite high, staff not entirely satisfied
c2 Prototype in C somewhat delayed due to circumstances outside the control of TR
c3 Neither c1 nor c2
Option B
c4 Prototype in Prolog ready in September 1988, low costs, staff pleased
c5 Prototype in Prolog somewhat delayed due to circumstances outside the control of TR
c6 Only fractions of a prototype ready in September 1988, low costs, staff frustrated
c7 None of c4, c5 or c6
It was then agreed that the probabilities and values were the following ones:
Probabilities The probability of c1 is quite high, at least 2/3. It is almost certain that c1 or c2 occurs. The probability of the event c4 or c5 is quite uncertain; it could be as low as 0.1 and as high as 0.6. The event c4 or c5 or c6 is almost certain.

Values The most desirable consequence is c4, which is slightly better than c5. These two are clearly more desirable than c1, which is slightly better than c2, whereas c6 is much worse than c2.
Comparison of the options As is to be expected, the staff did not consider any factor as decisive. Therefore it is natural to compare the mean values of the options or, to borrow a term from probability theory, their expected values. As in the previous example this presupposes a mathematical representation. After some discussion TR accepted the following one:
    P(c1) ≥ 2/3
    P(c1) + P(c2) = 1    (1.11)

    0.1 ≤ P(c4) + P(c5) ≤ 0.6
    P(c4) + P(c5) + P(c6) = 1    (1.12)

    v(c6) < v(c2) < v(c1) < v(c5) < v(c4)
    v(c1) − v(c2) = v(c4) − v(c5) < v(c5) − v(c1) < v(c2) − v(c6)    (1.13)

Here P(ci) is the probability of ci, and v(ci) is its numerical value. Note that the representation is not to be viewed as an exact one. For instance, the interval [0.1, 0.6] should be viewed as sufficiently large to contain P(c4) + P(c5). Hence we must be careful not to let a ranking of the options depend on values near the boundaries.
To simplify the comparison between the options we use the variables a, b and c such that

    a = P(c1),  b = P(c4),  c = P(c5).

Note that we can now, according to (1.11) and (1.12), substitute P(c2) with 1 − a, and P(c6) with 1 − (b + c). In addition, we set

    v(c4) = 1
    v(c6) = 0    (1.14)

to reflect the best and the worst consequence, and use the variables x and y where

    x = v(c2)    (1.15)
    y = v(c1) − x = v(c1) − v(c2) = v(c4) − v(c5)    (1.16)

From (1.13) we know that

    v(c4) − v(c5) = v(c1) − v(c2)

so we can, according to (1.14), (1.15) and (1.16), make the substitution

    v(c5) = v(c4) + v(c2) − v(c1)
          = 1 + x − (x + y)
          = 1 − y.

Calculating the expected value of the alternatives A and B gives us

    E[A] = P(c1)v(c1) + P(c2)v(c2)
         = a(y + x) + (1 − a)x
         = ay + ax + x − ax
         = x + ay
and
and

    E[B] = P(c4)v(c4) + P(c5)v(c5) + P(c6)v(c6)
         = b + c(1 − y) + (1 − (b + c))·0
         = b + c(1 − y)

where the intervals for a, b, c, where again a = P(c1), b = P(c4) and c = P(c5), are

    2/3 ≤ a ≤ 1         (1.17)
    0 ≤ b               (1.18)
    0 ≤ c               (1.19)
    0.1 ≤ b + c ≤ 0.6   (1.20)

The intervals for x and y are a bit more tricky. In (1.15) we set

    x = v(c2)

and in (1.16)

    y = v(c1) − x = v(c1) − v(c2) = v(c4) − v(c5).

In other words, y represents the difference between v(c1) and v(c2), and between v(c4) and v(c5). Since v(c6) = 0, x represents the difference between v(c2) and v(c6). If we add another variable z = v(c5) − v(c1) and look at the inequalities in (1.13), we see that

    0 < y < z < x < 1,
    2y + z + x = 1

and thus we have that

    0 < y < 0.25 < x < 1,
    x + y > 0.5.
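These derived bounds can be sanity-checked by sampling. The sketch below (an added illustration, not part of the original text) draws triples satisfying 0 < y < z < x < 1 with 2y + z + x = 1 and confirms that the bounds hold on every admissible sample:

```python
# Sketch: sample (x, y, z) with 0 < y < z < x < 1 and 2y + z + x = 1,
# then check the bounds y < 0.25 < x and x + y > 0.5 derived above.
import random

random.seed(0)
found = 0
for _ in range(20_000):
    y = random.uniform(0, 0.5)
    z = random.uniform(0, 1)
    x = 1 - 2 * y - z              # enforce 2y + z + x = 1
    if 0 < y < z < x < 1:          # keep only admissible triples
        found += 1
        assert y < 0.25 < x
        assert x + y > 0.5
print(found > 0)  # True: the region is non-empty and the bounds hold
```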
We now start the comparison of E[A] and E[B] by computing the maximum, minimum and mean of both, using the domain

    D = {(a, b, c) | 2/3 ≤ a ≤ 1, 0 ≤ b, 0 ≤ c, 0.1 ≤ b + c ≤ 0.6}

specified above. Repeating the formulas for the expected values of A and B we have

    E[A] = x + ay         (1.21)
    E[B] = b + c(1 − y)   (1.22)

We begin by calculating the minimum value of E[A]. From (1.21) it should be clear that we need to set a to its minimum value according to D, and thus we obtain

    min(E[A]) = x + (2/3)y.

To get the maximum value of E[A] we set a to its largest value and get

    max(E[A]) = x + y.

We then treat E[B] in the same way, which gives

    min(E[B]) = 0.1(1 − y),
    max(E[B]) = 0.6.

As is customary, the mean of the means is defined as the integral of the means over D, divided by the volume of D. This gives

    mean(E[A]) = ∫∫∫_D (x + ay) dA / ∫∫∫_D dA

and

    mean(E[B]) = ∫∫∫_D (b + c(1 − y)) dA / ∫∫∫_D dA.

Starting with

    ∫∫∫_D dA

we notice that the boundaries of a don't pose any particular problems. However, the boundaries of b and c are not as simple. Looking at the region plot in figure 1.6, we see that the region bounded by b and c forms a triangle with one of the corners missing. Thus we can start by integrating over the whole triangle and then subtract the integral over the missing corner.

Figure 1.6: The region bounded by b and c.


The line that makes up the top boundary of the triangle can be expressed as

    c = 0.6 − b

where 0 ≤ b ≤ 3/5, and the line making up the top boundary of the missing corner can similarly be expressed as

    c = 0.1 − b

where 0 ≤ b ≤ 1/10. Now we can write the integral over D as

    ∫_{a=2/3}^{1} ∫_{b=0}^{3/5} ∫_{c=0}^{3/5−b} dc db da − ∫_{a=2/3}^{1} ∫_{b=0}^{1/10} ∫_{c=0}^{1/10−b} dc db da

Calculating one term at a time results in

    ∫_{a=2/3}^{1} ∫_{b=0}^{3/5} ∫_{c=0}^{3/5−b} dc db da
      = ∫_{a=2/3}^{1} ∫_{b=0}^{3/5} [c]_0^{3/5−b} db da
      = ∫_{a=2/3}^{1} ∫_{b=0}^{3/5} (3/5 − b) db da
      = ∫_{a=2/3}^{1} [(3/5)b − b²/2]_0^{3/5} da
      = ∫_{a=2/3}^{1} (9/25 − 9/50) da
      = [(9/25)a − (9/50)a]_{2/3}^{1}
      = (9/25 − 9/50) − (6/25 − 6/50)
      = 3/50

and

    ∫_{a=2/3}^{1} ∫_{b=0}^{1/10} ∫_{c=0}^{1/10−b} dc db da
      = ∫_{a=2/3}^{1} ∫_{b=0}^{1/10} [c]_0^{1/10−b} db da
      = ∫_{a=2/3}^{1} ∫_{b=0}^{1/10} (1/10 − b) db da
      = ∫_{a=2/3}^{1} [(1/10)b − b²/2]_0^{1/10} da
      = ∫_{a=2/3}^{1} (1/100 − 1/200) da
      = [(1/100)a − (1/200)a]_{2/3}^{1}
      = (1/100 − 1/200) − (2/300 − 1/300)
      = 1/600
which finally gives

    ∫∫∫_D dA = ∫_{a=2/3}^{1} ∫_{b=0}^{3/5} ∫_{c=0}^{3/5−b} dc db da − ∫_{a=2/3}^{1} ∫_{b=0}^{1/10} ∫_{c=0}^{1/10−b} dc db da
             = 3/50 − 1/600
             = 7/120.
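Since the region is a box in a crossed with a truncated triangle in (b, c), the value 7/120 can also be obtained by elementary geometry. A quick check (an added illustration, not part of the original text):

```python
# Sketch: the volume of D = {2/3 <= a <= 1, b, c >= 0, 0.1 <= b+c <= 0.6}
# as (length of a-interval) x (big triangle area - cut-off corner area).
a_width = 1 - 2 / 3                    # 1/3
big_triangle = 0.6 ** 2 / 2            # area of {b, c >= 0, b + c <= 0.6}
small_corner = 0.1 ** 2 / 2            # area of {b, c >= 0, b + c <= 0.1}
volume = a_width * (big_triangle - small_corner)
print(abs(volume - 7 / 120) < 1e-12)   # True: agrees with 3/50 - 1/600
```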
Continuing in the same manner with calculating

    ∫∫∫_D (x + ay) dA = ∫_{a=2/3}^{1} ∫_{b=0}^{3/5} ∫_{c=0}^{3/5−b} (x + ay) dc db da − ∫_{a=2/3}^{1} ∫_{b=0}^{1/10} ∫_{c=0}^{1/10−b} (x + ay) dc db da,

taking one term at a time, gives


    ∫_{a=2/3}^{1} ∫_{b=0}^{3/5} ∫_{c=0}^{3/5−b} (x + ay) dc db da
      = ∫_{a=2/3}^{1} ∫_{b=0}^{3/5} [cx + acy]_0^{3/5−b} db da
      = ∫_{a=2/3}^{1} ∫_{b=0}^{3/5} ((3/5)x − bx + (3/5)ay − aby) db da
      = ∫_{a=2/3}^{1} [(3/5)bx − (b²/2)x + (3/5)aby − (b²/2)ay]_0^{3/5} da
      = ∫_{a=2/3}^{1} ((9/25)x − (9/50)x + (9/25)ay − (9/50)ay) da
      = [(9/50)ax + (9/100)a²y]_{2/3}^{1}
      = (9/50)x + (9/100)y − ((18/150)x + (36/900)y)
      = (3/50)x + (1/20)y

and

    ∫_{a=2/3}^{1} ∫_{b=0}^{1/10} ∫_{c=0}^{1/10−b} (x + ay) dc db da
      = ∫_{a=2/3}^{1} ∫_{b=0}^{1/10} [cx + acy]_0^{1/10−b} db da
      = ∫_{a=2/3}^{1} ∫_{b=0}^{1/10} ((1/10)x − bx + (1/10)ay − aby) db da
      = ∫_{a=2/3}^{1} [(1/10)bx − (b²/2)x + (1/10)aby − (b²/2)ay]_0^{1/10} da
      = ∫_{a=2/3}^{1} ((1/100)x − (1/200)x + (1/100)ay − (1/200)ay) da
      = [(1/200)ax + (1/400)a²y]_{2/3}^{1}
      = (1/200)x + (1/400)y − ((2/600)x + (4/3600)y)
      = (1/600)x + (1/720)y
resulting in

    ∫∫∫_D (x + ay) dA = ((3/50)x + (1/20)y) − ((1/600)x + (1/720)y)
                      = (7/120)x + (7/144)y

which in turn gives

    mean(E[A]) = ∫∫∫_D (x + ay) dA / ∫∫∫_D dA
               = ((7/120)x + (7/144)y) / (7/120)
               = x + (5/6)y.    (1.23)
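The coefficient 5/6 has a simple interpretation: since E[A] = x + ay and a is independent of (b, c) in D, the coefficient of y is just the mean of a over [2/3, 1]. A numerical check (an added illustration, not part of the original text):

```python
# Sketch: the mean of a over [2/3, 1], approximated by a midpoint
# Riemann sum (exact for a linear integrand up to rounding), should
# equal 5/6, giving mean(E[A]) = x + (5/6) y.
import math

n = 10_000
a_lo, a_hi = 2 / 3, 1.0
step = (a_hi - a_lo) / n
mean_a = math.fsum(a_lo + (i + 0.5) * step for i in range(n)) / n
print(abs(mean_a - 5 / 6) < 1e-12)  # True
```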

To calculate the mean of E[B], we first solve

    ∫∫∫_D (b + c(1 − y)) dA = ∫_{a=2/3}^{1} ∫_{b=0}^{3/5} ∫_{c=0}^{3/5−b} (b + c(1 − y)) dc db da − ∫_{a=2/3}^{1} ∫_{b=0}^{1/10} ∫_{c=0}^{1/10−b} (b + c(1 − y)) dc db da.

Starting with the first term we have

    ∫_{a=2/3}^{1} ∫_{b=0}^{3/5} ∫_{c=0}^{3/5−b} (b + c(1 − y)) dc db da
      = ∫_{a=2/3}^{1} ∫_{b=0}^{3/5} [bc + (c²/2)(1 − y)]_0^{3/5−b} db da
      = ∫_{a=2/3}^{1} ∫_{b=0}^{3/5} (9/50 − b²/2 − (9/50)y + (3/5)by − (b²/2)y) db da
      = ∫_{a=2/3}^{1} [(9/50)b − b³/6 − (9/50)by + (3/10)b²y − (b³/6)y]_0^{3/5} da
      = ∫_{a=2/3}^{1} (9/125 − (9/250)y) da
      = [(9/125)a − (9/250)ay]_{2/3}^{1}
      = (9/125 − (9/250)y) − (6/125 − (3/125)y)
      = 3/125 − (3/250)y

and the second term yields

    ∫_{a=2/3}^{1} ∫_{b=0}^{1/10} ∫_{c=0}^{1/10−b} (b + c(1 − y)) dc db da
      = ∫_{a=2/3}^{1} ∫_{b=0}^{1/10} [bc + (c²/2)(1 − y)]_0^{1/10−b} db da
      = ∫_{a=2/3}^{1} ∫_{b=0}^{1/10} (1/200 − b²/2 − (1/200)y + (1/10)by − (b²/2)y) db da
      = ∫_{a=2/3}^{1} [(1/200)b − b³/6 − (1/200)by + (1/20)b²y − (b³/6)y]_0^{1/10} da
      = ∫_{a=2/3}^{1} (1/3000 − (1/6000)y) da
      = [(1/3000)a − (1/6000)ay]_{2/3}^{1}
      = (1/3000 − (1/6000)y) − (1/4500 − (1/9000)y)
      = 1/9000 − (1/18000)y

which taken together gives

    ∫∫∫_D (b + c(1 − y)) dA = (3/125 − (3/250)y) − (1/9000 − (1/18000)y)
                            = 43/1800 − (43/3600)y.

We then have

    mean(E[B]) = ∫∫∫_D (b + c(1 − y)) dA / ∫∫∫_D dA
               = (43/1800 − (43/3600)y) / (7/120)
               = 43/105 − (43/210)y.    (1.24)

A pairwise comparison between these linear forms then yields

    Δmax = max(E[A]) − max(E[B]) = x + y − 0.6,
    Δmean = mean(E[A]) − mean(E[B]) = x + (109/105)y − 43/105,
    Δmin = min(E[A]) − min(E[B]) = x + (2/3)y − 0.1(1 − y),

where again 0.25 < x < 1 and 0 < y < 0.25. Hence only Δmax can have different signs, whilst Δmean and Δmin will always be positive. But if D is replaced by

    D′ = {(a, b, c) : 2/3 ≤ a ≤ 1, 0 ≤ b, 0 ≤ c, 0.1 ≤ b + c ≤ 0.5}    (1.25)

then max(E[B]) = 0.5 and consequently

    Δmax = x + y − 0.5 > 0

since x + y > 0.5.
In order to test the stability of the sign of Δmean, D can be replaced by

    D″ = {(a, b, c) : 2/3 ≤ a ≤ 0.7, 0 ≤ b, 0 ≤ c, 0.3 ≤ b + c ≤ 0.6}    (1.26)

which still yields a positive difference,

    Δmean = x + (11/12)y − 7/15 > 0    (1.27)

(the calculations are left to the reader in order to save space), since x + y > 0.5 and x > 0.25. The sign of Δmean is remarkably stable, and since other sensitivity tests yield similar results, we can conclude that A is to be preferred to B.
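The sign claims for the original domain D can be grid-checked numerically. The sketch below (an added illustration, not part of the original text) uses the three Δ formulas derived above over the admissible (x, y) region:

```python
# Sketch: grid-check the signs of the three pairwise differences over
# 0.25 < x < 1, 0 < y < 0.25, x + y > 0.5 (the region derived earlier).
def deltas(x, y):
    d_max = x + y - 0.6                      # max(E[A]) - max(E[B])
    d_mean = x + (109 / 105) * y - 43 / 105  # mean(E[A]) - mean(E[B])
    d_min = x + (2 / 3) * y - 0.1 * (1 - y)  # min(E[A]) - min(E[B])
    return d_max, d_mean, d_min

signs_of_dmax = set()
for i in range(1, 74):
    for j in range(1, 25):
        x, y = 0.25 + 0.01 * i, 0.01 * j
        if x + y > 0.5:
            d_max, d_mean, d_min = deltas(x, y)
            signs_of_dmax.add(d_max > 0)
            # As claimed in the text, these two never change sign:
            assert d_mean > 0 and d_min > 0
print(signs_of_dmax)  # {False, True}: only Delta_max can change sign
```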

Monty Hall We would have reasoned as follows. In line with the classical
theory of probability, the distribution of objects behind the doors should
be viewed as the outcome of a stochastic experiment. Moreover, there are
two possibilities with regard to the distributions. Either all distributions
are equally likely, or the car is most likely to appear behind the door in
the middle since the show is sponsored by a car maker who wants the best
display of the car. These considerations would have led us to select the door
in the middle. Assume now that the host opens door number one. Is it then
more likely that the car is behind door number three than behind the door
in the middle? An answer to this question must await the result of a few
calculations depending on the cases mentioned above.
Case 1. All distributions are equally likely. Let Ci be the event
that the car has been put behind door i, then
P (C1 ) = P (C2 ) = P (C3 ) = 1/3.
Now, let Si be the event that door i has been selected and Oi the event that
door number i has been opened by the host. Then
P (Ci ) = P (Ci |Sj )

for i, j = 1, 2, 3. In other words, since the objects were put behind the doors in advance, the probability that the car has been put behind door i remains the same regardless of which door has been selected; the expression P(Ci|Sj) is the probability that the car has been put behind door i, given that door j has been selected, see Appendix 1. Moreover, Monty Hall will never open the door behind which there is a car, thus

    P(Oi|Ci and Sj) = 0

for i, j = 1, 2, 3. However, since we don't know whether Monty Hall favours one of the remaining doors when there is a possibility of choice (in other words, when the door behind which the car is hidden has been selected), we have to set

    P(Oj|Ci and Si) = p

where i, j = 1, 2, 3, i ≠ j and 0 ≤ p ≤ 1. Lastly, we can be certain that Monty Hall won't open the door with the car behind it, and thus

    P(Ok|Ci and Sj) = 1

for i, j, k = 1, 2, 3 where i, j and k are pairwise distinct.
Now we can calculate the probability that the car is behind the door we selected initially, given that Monty Hall opened one of the other doors, as

P(Ci | Oj and Si) = P(Ci and Oj and Si) / P(Oj and Si)

where i, j = 1, 2, 3 and i ≠ j. From now on the word "and" will be omitted from the expressions in order to save space.
The numerator can, using the product rule twice, be expressed as

P(Ci Oj Si) = P(Oj | Ci Si) P(Ci Si)
            = P(Oj | Ci Si) P(Ci | Si) P(Si)
            = p (1/3) P(Si)
            = (P(Si)/3) p.
Passing to the denominator, we note that the events Oj and Si can occur both when the car is behind door i and when it is behind door k, where k = 1, 2, 3 and k ≠ i, j. Remember that Monty will never open the door containing the car, and thus we don't need to consider the case when the car is behind door j. Consequently

P(Oj Si) = P(Oj Si Ci) + P(Oj Si Ck),

but applying the product rule to the individual terms yields

P(Oj Si) = P(Oj | Si Ci) P(Si Ci) + P(Oj | Si Ck) P(Si Ck)
         = P(Oj | Si Ci) P(Ci | Si) P(Si) + P(Oj | Si Ck) P(Ck | Si) P(Si),

and since we've already specified the probabilities of the factors at the beginning of this section, we can finally write the denominator as

P(Oj Si) = p (1/3) P(Si) + 1 · (1/3) P(Si)
         = (p/3) P(Si) + (1/3) P(Si)
         = (P(Si)/3)(p + 1).
Going back to our original expression, replacing the old expressions of the numerator and denominator, we have

P(Ci | Oj and Si) = P(Ci and Oj and Si) / P(Oj and Si)
                 = ((P(Si)/3) p) / ((P(Si)/3)(p + 1))
                 = p/(p + 1).

Now, because 0 ≤ p ≤ 1, we know that

0 ≤ p/(p + 1) ≤ 1/2,

and thus we would recommend choosing the other, still closed, door.
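The result p/(p + 1) can be checked by simulation. The sketch below conditions on the host opening door 1; the choices that the player always selects door 2 and that the host's bias p concerns door 1 are illustrative assumptions of ours, not part of the text.

```python
import random

def simulate(p_bias, trials=200_000, seed=1):
    """Estimate P(car behind the selected door | host opened door 1).

    Illustrative assumptions: the player always selects door 2; when the
    host can choose between doors 1 and 3, he opens door 1 with
    probability p_bias.
    """
    rng = random.Random(seed)
    stay_wins = opened_one = 0
    for _ in range(trials):
        car = rng.randint(1, 3)
        selected = 2
        if car == selected:
            opened = 1 if rng.random() < p_bias else 3
        else:
            # The host must open the single door that is neither
            # selected nor hiding the car.
            opened = ({1, 2, 3} - {selected, car}).pop()
        if opened == 1:
            opened_one += 1
            if car == selected:
                stay_wins += 1
    return stay_wins / opened_one

# The derived posterior is p/(p + 1): 0, 1/3 and 1/2 for these biases.
for p in (0.0, 0.5, 1.0):
    print(p, round(simulate(p), 3))
```

For every bias the estimated probability of winning by staying is at most 1/2, which is the content of the recommendation to switch.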
Case 2. The distribution of cars is skewed. In contrast to the previous case, we now consider the case where P(Ci) > P(Cj) = P(Ck) and set

P(Ci) = q and P(Cj) = P(Ck) = (1 − q)/2,

where 1/3 < q < 1. Calculations similar to those in case 1 yield

P(Ci | Oj Si) = pq P(Si) / (pq P(Si) + ((1 − q)/2) P(Si))
             = pq / (pq + (1 − q)/2).
As a matter of curiosity, if we define the domains

D = {(p, q) | 0 ≤ p ≤ 1, 1/3 < q < 1}

and

D′ = {(p, q) ∈ D | pq/(pq + (1 − q)/2) > 1/2},

then

∫_{D′} pq/(pq + (1 − q)/2) dA / ∫_D pq/(pq + (1 − q)/2) dA ≈ 0.84,

which would lead to the recommendation of not changing doors. However, one could argue that it is unlikely that q > 0.4, and for

0.4p/(0.4p + 0.3) > 1/2

to hold, p has to be greater than 3/4. But that Monty Hall would prefer a particular door in three out of four cases seems highly unlikely. Hence it seems reasonable to make a switch even in this case.
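The threshold claim for q = 0.4 is easy to verify numerically, and the ratio of the two integrals can be estimated by Monte Carlo sampling. A sketch (uniform sampling over D is an implementation choice of ours):

```python
import random

def posterior(p, q):
    """P(car behind the initially selected door | host opened another door),
    i.e. pq / (pq + (1 - q)/2) from the calculation above."""
    return p * q / (p * q + (1 - q) / 2)

# For q = 0.4, keeping the first door only pays if p > 3/4:
print(posterior(0.75, 0.4))  # essentially 1/2, the break-even point

# Monte Carlo estimate of the ratio of the integrals over D' and D.
rng = random.Random(7)
num = den = 0.0
for _ in range(400_000):
    p, q = rng.uniform(0, 1), rng.uniform(1/3, 1)
    f = posterior(p, q)
    den += f
    if f > 0.5:          # (p, q) lies in D'
        num += f
print(round(num / den, 2))  # in the neighbourhood of the 0.84 given above
```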

Chapter 2
Classical Decision Theory
2.1 Pure Classical Decision Theory

This theory can be viewed as an application of Probability Calculus to games of chance like the following ones:
Game 1
One throw of four distinguishable dice.
Stake: 200 SEK
Prizes:
(a) 12 000 SEK if the product of the outcomes is an odd number and their sum a square number.
(b) 4 000 SEK if the product of the outcomes is an odd number and their
sum minus one a square number.
(c) 1 000 SEK if the product of the outcomes is an odd number and neither
(a) nor (b) holds.
Game 2
One throw of four distinguishable dice.
Stake: 200 SEK
Prizes:
(a) 3 000 SEK if the product of the outcomes is a square number, their
sum is odd, and the number one does not occur.
(b) 2 000 SEK if the product of the outcomes is a square number, their
sum is even, and the number one does not occur.
(c) 550 SEK if the product of the outcomes is a square number and neither
(a) nor (b) holds.
Game 3
One throw with four distinguishable dice.
Stake: 200 SEK
Prize: 202 SEK if the product of the outcomes is at least two.
Game 4
One throw with four distinguishable dice.
Stake: 200 SEK
Prize: 250 000 SEK if the product of the outcomes is one.
According to the classical theory all outcomes of these lotteries are equally
likely. Moreover, each lottery has a value that is equal to its expected prize,
which is defined as the weighted arithmetic mean of the prizes with their
probabilities serving as weights. Accordingly, the classical theory assigns the
following expected prizes to our lotteries:
Game 1: (11/1296) · 12 000 + (16/1296) · 4 000 + (54/1296) · 1 000 = 192.90

Game 2: (24/1296) · 3 000 + (65/1296) · 2 000 + (110/1296) · 550 = 202.55

Game 3: (1295/1296) · 202 = 201.84

Game 4: (1/1296) · 250 000 = 192.90
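The probabilities behind these figures can be recovered by enumerating all 6⁴ = 1296 outcomes. A sketch for game 1, classifying outcomes exactly as the game description does:

```python
from itertools import product
from math import isqrt

def is_square(n):
    return isqrt(n) ** 2 == n

counts = {"a": 0, "b": 0, "c": 0}
for dice in product(range(1, 7), repeat=4):
    prod = dice[0] * dice[1] * dice[2] * dice[3]
    if prod % 2 == 1:                 # product of the outcomes is odd
        s = sum(dice)
        if is_square(s):
            counts["a"] += 1          # prize (a): sum is a square number
        elif is_square(s - 1):
            counts["b"] += 1          # prize (b): sum minus one is a square
        else:
            counts["c"] += 1          # prize (c)
print(counts)  # {'a': 11, 'b': 16, 'c': 54}

expected = (counts["a"] * 12_000 + counts["b"] * 4_000 + counts["c"] * 1_000) / 6**4
print(round(expected, 2))  # 192.9
```

The counts 11, 16 and 54 are exactly the weights used in the expected-prize computation for game 1 above.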

The assignment of expected prizes is to serve as a guide for a choice, and the classical theory in this case recommends game 2 if one were to choose between the different games. Furthermore, it even prescribes that anyone at any time should accept an offer to play game 2 or game 3, because their expected prizes are higher than the stakes. The basis for this prescription is provided by Bernoulli's law of large numbers (see Appendix 1), which shows that the average return in the long run will, with a high probability, be close to the expected prize. Thereby an empirical hypothesis is presupposed, to the effect that the relative frequency of a given outcome in the long run will equal its probability. To assess the basis for the classical assignment of expected prizes, it may be instructive to estimate the long run in this case. We start by noting the variances of the different games.
Game 1: 1 424 209
Game 2: 351 934
Game 3: 31
Game 4: 48 188 098
If we then demand that the probability of a mean deviation of at most 4 SEK from the expected prize should be at least 0.99, then Chebyshev's inequality yields the following numbers:
Game 1: 8 901 305
Game 2: 2 199 588
Game 3: 197
Game 4: 301 175 611
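These sample sizes follow from Chebyshev's inequality, P(|X̄n − μ| ≥ ε) ≤ σ²/(nε²): demanding a bound of 0.01 with ε = 4 gives n ≥ 100σ²/16. A sketch computing the exact variances from the prize distributions:

```python
from math import ceil

def game_variance(prize_probs):
    """Variance of the prize; prize_probs lists (prize, probability),
    with the remaining probability mass paying 0."""
    mu = sum(x * p for x, p in prize_probs)
    ex2 = sum(x * x * p for x, p in prize_probs)
    return ex2 - mu * mu

def chebyshev_n(variance, eps=4.0, alpha=0.01):
    # P(|mean of n plays - expected prize| >= eps) <= variance/(n*eps^2) <= alpha
    return ceil(variance / (alpha * eps ** 2))

g1 = game_variance([(12_000, 11/1296), (4_000, 16/1296), (1_000, 54/1296)])
g3 = game_variance([(202, 1295/1296)])
print(round(g1))        # 1424209, the variance listed for game 1
print(chebyshev_n(g3))  # 197, the Chebyshev number listed for game 3
```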
Now, these estimates are mainly interesting from an historical point of
view since modern sharper estimates yield lower but still forbiddingly high
numbers. The Bernstein-Bennett inequality (Bennett, 1962), for instance,
yields the following ones:
Game 1: 953 608
Game 2: 235 540
Game 3: 21
Game 4: 32 134 317
Using Talagrand's (1995) inequality we obtain even smaller numbers. Note that it isn't applicable to game 3 due to the size of the interval in comparison to the maximum win:
Game 1: 810 700
Game 2: 200 185
Game 4: 27 332 010
Moreover, extensive simulations (see Appendix 3) yield numbers in the same region. Hence these numbers are likely to be the true ones.
But then the evaluation of games of chance given by the classical theory seems rather unconvincing and its influence almost a mystery. As a basis for a choice between the lotteries above, the following considerations seem more natural. A risk-averse person should choose game 3, whereas one only interested in games where there are prospects for a substantial gain should choose game 4. If, on the other hand, someone badly needs 12 000 SEK, then game 1 ought to be the most attractive one. Only if such considerations do not meet with success does a trade-off between prizes and probabilities seem to be needed. Then basing a choice between games on their expected prizes seems to be attractive mainly because the expected prize takes care of probabilities and prizes in such a simple way.
After this account the salient features of Pure Classical Decision Theory may be summarized as follows:

- it aims at supporting a choice between options which are completely analyzed, i.e. the consequences of each option are determined and have prizes attached to them,
- the probabilities of the consequences are determined by considerations of symmetry, and
- the value of an option equals its expected prize.

2.2 Modified Classical Decision Theory

2.2.1 Introduction to utilities

Utilities were introduced by Daniel Bernoulli in 1738 to solve a mathematical problem (see Appendix 2). But it seems reasonable to use them even for handling finite decision problems like the ones described in sections 1.1.1 and 2.1.
Utilities are real numbers defined by functions from the outcomes of options or, as in section 2.1, from the prizes associated with these outcomes. In the example of section 2.1, we can for instance define a utility function by setting

U(x) = x,                   if x ≤ 1000
U(x) = (1000/3) log10(x),   if x > 1000          (2.1)

Then the expected utilities of the games in section 2.1 are as follows:
Game 1: (11/1296) U(12 000) + (16/1296) U(4 000) + (54/1296) U(1 000) = 68.03

Game 2: (24/1296) U(3 000) + (65/1296) U(2 000) + (110/1296) U(550) = 123.33

Game 3: (1295/1296) U(202) = 201.84

Game 4: (1/1296) U(250 000) = 1.39

Hence a person who opts for choosing between games based on this utility function should choose game 3, provided she evaluates the games in accordance with the utility principle: the value of a game equals its expected utility.
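The expected utilities can be recomputed directly from (2.1). A sketch:

```python
from math import log10

def U(x):
    """The utility function (2.1)."""
    return x if x <= 1000 else 1000 / 3 * log10(x)

games = {
    1: [(12_000, 11/1296), (4_000, 16/1296), (1_000, 54/1296)],
    2: [(3_000, 24/1296), (2_000, 65/1296), (550, 110/1296)],
    3: [(202, 1295/1296)],
    4: [(250_000, 1/1296)],
}
expected_utilities = {g: round(sum(p * U(x) for x, p in lot), 2)
                      for g, lot in games.items()}
print(expected_utilities)  # {1: 68.03, 2: 123.33, 3: 201.84, 4: 1.39}
```

Note how the concave utility function reverses the classical recommendation: game 3 now comes out on top instead of game 2.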
But what grounds are there for evaluating games in accordance with this principle? In case

U(x) = ax + b with a > 0,

you can rank the games via the Law of Large Numbers, since the expected utility of a game g then equals

a E[g] + b.

But otherwise other grounds must be provided. Attempts in this direction will be discussed in chapter 5.

Chapter 3
Supersoft Decision Theory
Supersoft Decision Theory (SSD) is a family of modifications of Classical Decision Theory. The common trait is that vague and numerically imprecise estimates of probabilities and values are allowed. The main advantage of admitting such estimates is that it makes a smooth application of decision-theoretic methods to new problems feasible. This much can be seen from the first two examples of chapter 1. The main drawback, of course, is that calculations then become much harder and therefore in many cases require suitable software. In this chapter one version of the theory, called weak SSD (wSSD), is delineated. For other versions, see Danielson (1997), Ekenberg (2005), and Sundgren (2011).

3.1 Representation of Decision Problems

We assume that the number of options is finite and that each option has a finite number of consequences. Sometimes the consequences of an option are best described as nodes of a tree. Such a tree can in some cases be quite extensive, see e.g. Johansson (2003, 300-315).
Mathematically, a finite tree can be identified with a finite set T of finite sequences of natural numbers such that

i. T contains exactly one sequence of minimal length, called the root of the tree, and

ii. if (s1, . . . , sn−1, sn) is an element of T, then either (s1, . . . , sn−1, sn) is the element of minimal length or (s1, . . . , sn−1) is an element of T.

If (s1, . . . , sn−1, sn) and (s1, . . . , sn−1) both are in T, then (s1, . . . , sn−1, sn) is an immediate successor of (s1, . . . , sn−1) in T.
An element of T without immediate successors in T is called a leaf of T.
Hence the first step in the representation of a decision problem consists of the construction of a number of trees. This is followed by an estimate of the probability of each node of each tree save the ones of minimal length.

Figure 3.1: An example of a tree structure.

The tree T in figure 3.1 can be described as the following set of sequences, where each sequence represents the path to a node:

T = {(1), (1, 2), (1, 2, 5), (1, 2, 6), (1, 3), (1, 3, 7), (1, 3, 8), (1, 3, 9), (1, 4)}.

Since the node labeled 1 is the minimal sequence of T, this is the root of the tree. The nodes labeled 5, 6, 7, 8 and 9 are all leaves since they don't have any immediate successors. We can also see that the sequences together represent all possible paths from the root to the different nodes.
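The set-of-sequences view of trees translates directly into code. A sketch (the function names are ours):

```python
def root(T):
    """The unique sequence of minimal length in T."""
    return min(T, key=len)

def successors(T, node):
    """Immediate successors of node: sequences extending it by one element."""
    return {s for s in T if len(s) == len(node) + 1 and s[:-1] == node}

def leaves(T):
    """Elements of T without immediate successors in T."""
    return {s for s in T if not successors(T, s)}

# The tree of figure 3.1:
T = {(1,), (1, 2), (1, 2, 5), (1, 2, 6), (1, 3),
     (1, 3, 7), (1, 3, 8), (1, 3, 9), (1, 4)}

print(root(T))            # (1,)
print(sorted(leaves(T)))  # the six leaf paths
```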
In wSSD any kind of probability estimate is admitted, but the basic ones have one of the following four forms:

i. P(E) = a,

ii. P(E1) = P(E2),

iii. a < P(E) < b, or

iv. a ≤ P(E) ≤ b.

The first form is the only one that is mandatory, since the immediate successors of a node form a sample space. The third one occurs above all in mathematical representations of vague estimates like "the event E is quite likely" or "it is more probable than not that E occurs", since vague estimates must always be represented by open sets. Note in particular that if an estimate of E is vague then the same must hold of the corresponding estimate of not-E. Hence these two estimates must be represented by overlapping open sets. Finally, the value of each leaf is estimated.
Sometimes such an evaluation is straightforward but sometimes it takes
the form of an aggregation. In public procurement, for instance, tenders
typically are first evaluated according to different criteria such as quality
and price. Then these evaluations are aggregated in some other way. In
wSSD, primary evaluations of almost any form are admitted but three forms
present themselves as the most natural ones:
i. v(E) = a,
ii. a < v(E) < b, and
iii. an ordering combined with an ordering of distances.
For convenience, all numbers are assumed to be non-negative and at most 1. As an end product the original decision problem is represented by a mathematical structure F called a decision frame. Such a structure typically has the form (o1, . . . , on, T1, . . . , Tn, S[p], U[v]). Here o1, . . . , on are the options considered and T1, . . . , Tn the corresponding decision trees. Moreover, S[p] is the set of probability estimates and U[v] the set of value estimates.
Let's look at a couple of examples that utilize the above premises. The tree structures in figure 3.2 represent two different options o1 and o2. If we let T1 and T2 be the corresponding decision trees, then

T1 = {(E1), (E1, E1,1), (E1, E1,2)}

and

T2 = {(E2), (E2, E2,1), (E2, E2,2), (E2, E2,3)}.

Figure 3.2: Each of the decision trees represents a unique option.

Since the events E1,1 and E1,2 make up the complete sample space of E1, the probability

P(E1) = P(E1,1) + P(E1,2) = 1.
Likewise, the probability

P(E2) = P(E2,1) + P(E2,2) + P(E2,3) = 1,

which reflects our belief that no outcomes except the immediate successors to E1 or E2, which we have defined explicitly, can occur. In its simplest form we can then let P(E1,1) = a and P(E1,2) = 1 − a, where a represents an exact probability estimate. Using vague probability estimates for the outcomes E2,1, E2,2 and E2,3, we could for example set

a < P(E2,1) < b and P(E2,2) < P(E2,3),

which would represent the belief that the probability of E2,1 lies somewhere in between a and b, and that the probability of E2,2 is less than that of E2,3.
When it comes to estimating the values of the different events, it is usually not possible to directly estimate the values of the options, in this case E1 and E2, since these are dependent on their respective successors. However, we can estimate the values of E1,1 and E1,2 exactly as

v(E1,1) = a and v(E1,2) = b,

where a and b are real numbers in the interval [0, 1]. Again, using vague estimates for the values of E2,1, E2,2 and E2,3 we can for example set

a < v(E2,1) < v(E2,2) and v(E2,3) = 2v(E2,1).

Using the definitions we made earlier, we can now define the decision frame

F = (E1, E2, T1, T2, S[p], U[v]),

where

S[p] = {P(E1) = 1, P(E1,1) = a, P(E1,2) = 1 − a, a < P(E2,1) < b, P(E2,2) < P(E2,3)}

and

U[v] = {v(E1,1) = a, v(E1,2) = b, a < v(E2,1) < v(E2,2), v(E2,3) = 2v(E2,1)}.
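In practice one can probe whether a set of constraints like S[p] has solutions by sampling candidate probability distributions and keeping those that satisfy every constraint. A sketch for the probability constraints on T2, with illustrative bounds a = 0.2 and b = 0.5 (our choice, not from the text):

```python
import random

a, b = 0.2, 0.5  # illustrative bounds, not from the text

def sample_distribution(rng):
    """Draw (P(E2,1), P(E2,2), P(E2,3)) uniformly from the simplex."""
    cuts = sorted([rng.random(), rng.random()])
    return cuts[0], cuts[1] - cuts[0], 1 - cuts[1]

def satisfies(dist):
    p1, p2, p3 = dist
    return a < p1 < b and p2 < p3   # the constraints from S[p]

rng = random.Random(3)
solutions = [d for _ in range(10_000)
             if satisfies(d := sample_distribution(rng))]
print(len(solutions) > 0)  # True: the frame's probability estimates are solvable
```

The same rejection-sampling idea extends to the value constraints in U[v].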
3.2 Satisfiable and not Satisfiable Decision Frames

F = (o1, . . . , on, T1, . . . , Tn, S[p], U[v]) is satisfiable if both S[p] and U[v] are solvable. Then the estimates in S[p] can be interpreted as probability estimates and the values expressed in U[v] can be measured with a common scale. Formally, this is all that is needed to apply all valuations of options developed for classical decision theory.
In case F is not satisfiable, S[p] and U[v] may in some cases be modified in the following ways to yield solvable sets:

i. If S[p] is not solvable, the only general way to proceed is to successively increase the size of the intervals until a solvable set is obtained. If this fails, then the estimates in S[p] are not genuine probability estimates and at least some of them must be reconsidered, see Malmnäs (1981).

ii. If U[v] is not solvable, the only general procedure available is the one outlined above, but in some cases more special procedures are to be preferred. This can be illustrated by a case much discussed by moral philosophers in Sweden.
A well-known problem, see e.g. Bergström (1991), is that pairwise comparisons may yield cycles: U[v] may e.g. contain the following inequalities:

v(ui) < v(ui+1) where 0 ≤ i ≤ n − 1, and
v(un) < v(u0),

which for the sake of clarity also can be written as

v(u0) < v(u1) < v(u2) < · · · < v(un) < v(u0).

Cycles may, of course, give rise to problems for decision makers, especially if they, like Bergström, are to choose between u0, . . . , u1000. A simple way out of this predicament is to replace < by ≤ in the inequalities above. Then all options are equally good. But perhaps the following procedures more truly reflect the value estimates in some cases.
Assume for instance that a decision maker B is facing a choice between u0, . . . , u1000 and that her valuations of them are as follows:

v(ui) < v(uj) where 0 ≤ i < j ≤ 999,
v(ui) < v(u1000) where 1 ≤ i ≤ 999, and
v(u0) > v(u1000),

which we also can express as

v(u0) < v(u1) < v(u2) < · · · < v(u998) < v(u999),
v(u1) < v(u2) < v(u3) < · · · < v(u999) < v(u1000), and
v(u1000) < v(u0).

Assume, in addition, that

v(u0) − v(u1000) = v(ui+1) − v(ui) where 0 ≤ i ≤ 999.

In other words, the pairwise distance with regard to value between any neighboring elements is the same.
If, then, N(ui) is the number of pairs where ui is on top, then

N(u0) = 1,
N(ui) = i where 1 ≤ i ≤ 999, and
N(u1000) = 999.

Hence B should choose u1000 if her choice is to be based on N, since v(u999) < v(u1000).
If her choice instead is based on a cup without seeding, then only u998, u999 or u1000 can come out as the winner. Should u0 play the first round against anyone but u1000, then u1000 will win, because it can't lose to anyone but u0. Since u0, . . . , u1000 comprise an odd number of competitors, we have to use a so-called bye in the cup. Any ui will always win against the bye. If for simplicity u0, . . . , u1000 are given the 1001 leftmost positions in the first round and the bye the last position, then the probability of u1000 not winning is approximately 10⁻³ (see figure 3.3 for an example). Hence a cup yields the same result as a series in this particular case.
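The cup argument can be checked by simulation. A sketch with random pairings each round and an unpaired player advancing as if meeting the bye (a simplification of the fixed bracket described above):

```python
import random

def beats(i, j):
    """u_i beats u_j, with the single cyclic exception that u_0 beats u_1000."""
    if {i, j} == {0, 1000}:
        return i == 0
    return i > j

def run_cup(rng):
    players = list(range(1001))          # u_0 ... u_1000, an odd field
    while len(players) > 1:
        rng.shuffle(players)
        nxt = []
        if len(players) % 2 == 1:
            nxt.append(players.pop())    # one player advances on a bye
        for a, b in zip(players[::2], players[1::2]):
            nxt.append(a if beats(a, b) else b)
        players = nxt
    return players[0]

rng = random.Random(11)
wins = sum(run_cup(rng) == 1000 for _ in range(300))
print(wins / 300)  # close to 1; u_1000 loses only if it runs into u_0 early
```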
An alternative way of dealing with cycles is to introduce a binary function D(ui, uj) expressing the difference in value between ui and uj instead of a unary utility function V(ui). The value of this function, which satisfies the conditions D(ui, uj) = −D(uj, ui) and −1 ≤ D(ui, uj) ≤ 1, need not be a number but can be a variable with some constraints. A choice can then be determined by the function

M(ui) = Σ_{j=0}^{n} D(ui, uj) = D(ui, u0) + D(ui, u1) + · · · + D(ui, un),

where n = 1000 in the example above.


Hence, formally, cycles may not pose a great problem. However, a decision maker whose value assessments have given rise to a cycle should at least consider the following principles:
Figure 3.3: Example of a cup where u1000 loses.


i. If a < b then a < pa + (1 − p)b < b where 0 < p < 1.

ii. If a < b and b < c then b ≈ pa + (1 − p)c for some p, 0 < p < 1.

iii. If a < b and b ≈ c then a < c.

iv. pa + (1 − p)b = (1 − q)a + qb where q = 1 − p and 0 < p < 1.

v. If a ≈ b and b = c then a ≈ c.

Here a, b, and c are lotteries or outcomes, a < b is short for "b is clearly better than a", and b ≈ c stands for "b and c are about equally good". As is customary, = stands for definitional equality.
From these principles and the cycle

u1 < u2 < u3 < u1

we first look at the partial relation

u3 < u1 < u2

and with the help of (ii.) conclude that

u1 ≈ qu3 + (1 − q)u2.

Replacing u1 in the relation u3 < u1 with the approximation above yields

u3 < qu3 + (1 − q)u2.

Furthermore, applying (i.) to the relation u2 < u3 we have

u2 < pu2 + (1 − p)u3 < u3,

which together with the approximation of u1 yields

pu2 + (1 − p)u3 < u3 < qu3 + (1 − q)u2.

Hence there is some lottery c such that u3 is both better and worse than c; for example when both p and q equal 1/2. Therefore B should either reconsider her initial assessments or rebut one of these principles.

3.3 Evaluations of Options and Decision Rules in wSSD

When the original decision problem is represented by a decision frame, evaluations and decision rules originally formulated for classical decision theory may be used to decide whether a proposed project is acceptable or to make a choice between the options considered. The fact that this representation is not straightforward, however, calls for some circumspection in employing them. In this section the evaluations used in example 1 of chapter 1 will be presented, followed by a discussion of decision rules based on extreme values.

3.3.1 Qualitative Evaluations

In many evaluations, options that are too risky are eliminated at an early stage. In other cases only options which promise high returns are given a closer examination. To formulate such evaluations in wSSD, take as a point of departure a decision maker B and a satisfiable decision frame

F = (o1, . . . , on, T1, . . . , Tn, S[p], U[v]).

Then the probability of the option oi yielding an outcome of value V is sufficiently high if and only if there exists a suitable number r such that P(Li,V) ≥ r. Here Li,V is the set of leaves of Ti with a value at least as high as V (see figure 3.4), and r depends on B. This leaves us with the task of explicating "P(Li,V) ≥ r" and "the outcome u has the value V".
Figure 3.4: Let the tree in the figure be T1 and set V = 3; then the set L1,V = {5, 6} and consequently P(L1,V) = P({5, 6}) = 0.2 + 0.3 = 0.5.
Explication of P(Li,V) ≥ r

At least the following candidates present themselves:

i. P(Li,V) ≥ r holds for all solutions of S[p],

ii. P(Li,V) ≥ r holds for all regular solutions of S[p],

iii. P(Li,V) ≥ r holds for a great deal of the solutions of S[p],

iv. P(Li,V) ≥ r holds for some regular solution of S[p], and

v. P(Li,V) ≥ r holds for some solution of S[p].

Comment. A solution is regular if and only if it does not contain values that are too close to the endpoints. If we recall that the representation in wSSD uses somewhat wide intervals, then (v) is too weak since it allows values near the endpoints, and perhaps (i) is too strong, even though its occurrence should be noted. Moreover, (iii) should be defined more precisely. One way of doing this is to say that (iii) holds if and only if the ratio

(volume of A) / (volume of B)

has a certain size. Here B is the set of solutions to S[p] and A the set of solutions to S[p] such that P(Li,V) ≥ r. Another possibility is to use the notion of contraction introduced by Mats Danielson (Danielson, 1997), which in turn is a modification of the notion of proportion due to Love Ekenberg (Ekenberg, 1994). Note that "the outcome u has the value V" can be explicated along the same lines.

3.3.2 Expected Value

If F = (o1, . . . , on, T1, . . . , Tn, S[p], U[v]) is a satisfiable decision frame, Ωp is the set of solutions to S[p], and Ωv is the set of solutions to U[v], then the expected value can be accommodated to wSSD in one of the following ways, where p ∈ Ωp and v ∈ Ωv:

AF(oi) = inf E(p, v, oi)                                            (3.1)

BF(oi) = sup E(p, v, oi)                                            (3.2)

CF(oi) = ∫_{Ωp} ∫_{Ωv} E(p, v, oi) dp dv / ∫_{Ωp} ∫_{Ωv} dp dv      (3.3)

Here

E(p, v, oi) = E[Ti] = Σ_{j=1}^{m} Pj Vj = Σ_{j=1}^{m} Pj E[Ti,j],

where the sum is taken over the components of p = (P1, . . . , Pm) and v = (V1, . . . , Vm) associated with the nodes of Ti. Note that the order of the operations is immaterial and that all combinations of them make sense. Note also that partial evaluations like

∫_{Ωp} E(p, v, oi) dp / ∫_{Ωp} dp                                   (3.4)

are permissible and sometimes quite useful in ordering options, see example 3 of chapter 1.
If one of these evaluations is to be chosen, then the obvious choice is CF(oi), since it is a mean value of mean values, but sometimes a conservative decision maker wants to base a decision on all of them.
Example. Set o1 = (q, 1 − q; x, y) and o2 = (r, 1 − r; x, y) with 0 < r < q < 1 and 0 < y < x < 1. Here o1 is a lottery which yields x with probability q and y with probability 1 − q. Then o1 dominates o2 stochastically and therefore clearly is the better option.
To see how the evaluations introduced above order them, note first that

AF(o1) = inf (qx + (1 − q)y) = inf (y + q(x − y)) = y,
AF(o2) = inf (rx + (1 − r)y) = inf (y + r(x − y)) = y,

and thus

AF(o1) = AF(o2).

Likewise

BF(o1) = sup (qx + (1 − q)y) = x,
BF(o2) = sup (rx + (1 − r)y) = x,

and

BF(o1) = BF(o2).

Hence the evaluations according to AF and BF fail to order o1 and o2 correctly. Passing to CF, we first note that the domains

Ωp = {(q, r) | 0 < r < q < 1} and Ωv = {(x, y) | 0 < y < x < 1}

each form a triangle. This yields
∫∫ dv dp = ∫₀¹ ∫₀^q ∫₀¹ ∫₀^x dy dx dr dq
         = ∫₀¹ ∫₀^q ∫₀¹ x dx dr dq
         = ∫₀¹ ∫₀^q (1/2) dr dq
         = ∫₀¹ (q/2) dq
         = 1/4,
∫∫ E(p, v, o1) dv dp = ∫₀¹ ∫₀^q ∫₀¹ ∫₀^x (qx + (1 − q)y) dy dx dr dq
                     = ∫₀¹ ∫₀^q ∫₀¹ (qx² + (1 − q)x²/2) dx dr dq
                     = ∫₀¹ ∫₀^q (q/6 + 1/6) dr dq
                     = ∫₀¹ (q²/6 + q/6) dq
                     = 1/18 + 1/12
                     = 5/36
and

∫∫ E(p, v, o2) dv dp = ∫₀¹ ∫₀^q ∫₀¹ ∫₀^x (rx + (1 − r)y) dy dx dr dq
                     = ∫₀¹ ∫₀^q ∫₀¹ (rx² + (1 − r)x²/2) dx dr dq
                     = ∫₀¹ ∫₀^q (r/6 + 1/6) dr dq
                     = ∫₀¹ (q²/12 + q/6) dq
                     = 1/36 + 1/12
                     = 1/9.
The above calculations give

CF(o1) = (5/36)/(1/4) = 20/36 ≈ 0.56

and

CF(o2) = (1/9)/(1/4) = 4/9 ≈ 0.44,

hence CF(o1) > CF(o2) and we see that CF induces the correct order.
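Since Ωp and Ωv are triangles, CF can also be estimated by Monte Carlo integration, which gives a useful cross-check of the exact values 20/36 and 4/9. A sketch:

```python
import random

def E(prob, x, y):
    """Expected value of a lottery (prob, 1 - prob; x, y)."""
    return prob * x + (1 - prob) * y

rng = random.Random(5)
n = 200_000
s1 = s2 = 0.0
for _ in range(n):
    # Uniform samples from the triangles 0 < r < q < 1 and 0 < y < x < 1.
    q, r = sorted((rng.random(), rng.random()), reverse=True)
    x, y = sorted((rng.random(), rng.random()), reverse=True)
    s1 += E(q, x, y)
    s2 += E(r, x, y)
print(round(s1 / n, 2), round(s2 / n, 2))  # about 0.56 and 0.44
```

Averaging over uniform samples from the two triangles computes exactly the ratio of integrals in (3.3), since the normalizing volume cancels.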

3.3.3 Decision Rules Based on Extreme Values

Pure maximin and maximax

Let F = (o1, . . . , on, T1, . . . , Tn, S[p], U[v]) be a satisfiable decision frame, Ωv the set of solutions to U[v] with v = (V1, . . . , Vm), and Ωj the projection of Ωv on the j:th coordinate, 1 ≤ j ≤ m. Set

min(oi) = min(inf V1, . . . , inf Vj, . . . , inf Vm)

and

max(oi) = max(sup V1, . . . , sup Vj, . . . , sup Vm),

where the infima and suprema are taken over Vj ∈ Ωj and each Vj is associated with a leaf of Ti. Then min(oi) is the smallest possible outcome for the alternative oi, and max(oi) is the largest possible outcome for the alternative oi.
The maximin principle prescribes that the alternative with the largest minimum value should be chosen. In other words, if the smallest possible outcome of the alternative o1 is a, the smallest possible outcome of the alternative o2 is b, and a > b, then alternative o1 is preferred over o2. Consequently, the maximin principle can be denoted

mF = max (min(o1), . . . , min(oi), . . . , min(on)).

The maximax principle prescribes that the alternative with the largest maximum value should be chosen. If the largest possible value of alternative o1 is c, the largest possible value of alternative o2 is d, and d > c, then o2 is to be preferred over o1. We can denote the maximax principle

MF = max (max(o1), . . . , max(oi), . . . , max(on)).

Comment. Note that the lotteries considered in section 3.3.2 are equally good according to these rules. More generally, all lotteries with the same outcomes are equally good according to them as long as the price per ticket is the same.
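For interval-valued leaf estimates the two rules reduce to comparing endpoint bounds. A sketch (the interval representation and the frame below are our simplifications, not from the text):

```python
def minmax_rules(options):
    """options: dict name -> list of (lo, hi) value intervals, one per leaf.

    min(o) is the least lower bound over the leaves of o, max(o) the
    greatest upper bound; the rules pick the options maximizing these.
    """
    mins = {o: min(lo for lo, _ in leaves) for o, leaves in options.items()}
    maxs = {o: max(hi for _, hi in leaves) for o, leaves in options.items()}
    maximin = max(mins, key=mins.get)   # the maximin recommendation
    maximax = max(maxs, key=maxs.get)   # the maximax recommendation
    return maximin, maximax

# A hypothetical frame: two options with interval-valued leaves.
options = {
    "o1": [(0.2, 0.4), (0.5, 0.9)],
    "o2": [(0.1, 0.3), (0.6, 1.0)],
}
print(minmax_rules(options))  # ('o1', 'o2')
```

Here maximin prefers o1 (worst case 0.2 against 0.1) while maximax prefers o2 (best case 1.0 against 0.9), illustrating how the two rules can disagree.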

Chapter 4
Other Approaches to Decisions under Uncertainty

In this chapter a few well-known decision rules for situations where we have more than one probability distribution are presented and evaluated. First, decision rules developed within Statistical Decision Theory are discussed; thereafter two different proposals put forward by philosophers are considered.

4.1 Hodges and Lehmann (1952)

Let F = (o1, . . . , on, T1, . . . , Tn, S[p], U[v]) be a satisfiable decision frame and s > 0. Then these two authors propose that first all options oi such that mF − min(oi) < s are selected (mF is the maximin of F, see section 3.3.3). Then from this set the option oi with the greatest value of AF(oi), the infimum of the expected value, is to be chosen.
They also propose the following refinement of step one in the selection process. Let

F1, . . . , Fi, . . . , Fm where Fi = (o1, . . . , on, T1, . . . , Tn, Si[p], U[v])

be a series of satisfiable decision frames such that Sj[p] is a contraction of Si[p] if j > i, and si > 0 for 1 ≤ i ≤ m. Here a contraction is obtained by the replacement of an interval by a subinterval. Then all options oi satisfying the condition mFj − min(oi) < sj for 1 ≤ j ≤ m are first selected.

Comment. Set o1 = (q, 1 − q; x, y) and o2 = (r, 1 − r; x, y) with 0 < r < q < 1 and 0 < y < x < 1. Here o1 is a lottery which yields x with probability q and y with probability 1 − q. Then o1 dominates o2 stochastically and therefore clearly is the better option. But according to the rule proposed by Hodges and Lehmann they are equally good.

4.2 Blum and Rosenblatt (1967)

These two authors propose that a selection of options is to be based on the evaluation AF, see section 3.3.2. The same proposal is put forward by Jackson et al. (1970), Randles and Hollander (1971), Solomon (1972) and Kofler and Menges (1976, p. 140). But this proposal makes the options considered in the previous section equally good.

4.3 Watson (1974)

Let F = (o1, . . . , on, T1, . . . , Tn, S[p], U[v]) be a satisfiable decision frame and P a solution to S[p]. Set

CF(oi, P) = ∫ E(P, v, oi) dv / ∫ dv,

M(P) = max(CF(oi, P), 1 ≤ i ≤ n),

L(P) = sup (M(P′) − M(P), P′ a solution to S[p])

and

m = inf (L(P), P a solution to S[p]).

Then Watson proposes that an option oi such that

CF(oi, P) = M(P) with L(P) = m

is to be chosen.
Comments.

- The rationale behind Watson's proposal is that we should minimize the damage due to a choice based on a wrong solution to S[p].

- Note that Watson's proposal orders the options considered in section 4.1 correctly, since CF(o1, P) > CF(o2, P) for all solutions P.

- Set o1 = (q, 1 − q; x, y) and o2 = (r, 1 − r; x + a, y) with 0 < r < q < 1 and 0 < y < x < 1 − a. Then Watson's proposal favors o2 for all a, 0 < a < 1. To see this, note that L(P) approaches m when both q and r approach 1. Hence it suffices to consider the limiting case q = r = 1. But

∫ E(P, v, o1) dv ≈ (1 − a)³/6 < (1 − a)³/6 + a(1 − a)²/2 ≈ ∫ E(P, v, o2) dv

if q ≈ 1 ≈ r. This should be compared with the result obtained by employing CF. Then o1 is the better option for a < b and o2 the better one for a > b with b ≈ 0.13. Hence CF fares better in this example.

4.4 Levi (1974)

Levi does not propose an evaluation or ranking of available options in a satisfiable decision frame F = (o1, . . . , on, T1, . . . , Tn, S[p], U[v]) but instead proposes ways of delimiting the set of permissible options. More precisely, he advocates that this should be done as follows. Select first the E-permissible options, pick then out the P-permissible ones. Select finally the S-permissible ones. Here

i. An option oi is E-permissible if there exists a p in Ωp and a v in Ωv such that E(p, v, oi) ≥ E(p, v, oj) for all j, 1 ≤ j ≤ n.

ii. An option oi is P-permissible if it is E-permissible and an optimum one with respect to freedom of choice.

iii. An option oi is S-permissible if it is P-permissible and there exists a v in Ωv such that w(oi, v) ≥ w(oj, v) for all j, 1 ≤ j ≤ n. Here w(oi, v) is the worst outcome of oi given v.

Comments. Set o1 = (p, 1 − p; x, y) and o2 = (q, 1 − q; x, y) with 0 < q ≤ 0.5 ≤ p < 1 and 0 < y < x < 1. Then both o1 and o2 are S-permissible according to Levi. This also holds for o1 = (p, 1 − p; x + a, y) and o2 = (q, 1 − q; x, y) if 0 < p < q < 1 and 0 < y < x < x + a < 1, regardless of
the size of a. Hence Levi's proposal seems to be overly liberal in some cases. On the other hand, o2 but not o1 is S-permissible if o1 = (p, 1 − p; x + a, y − b), o2 = (q, 1 − q; x, y), 0 < p < q < 1, and 0 < y − b < y < x < x + a < 1, even if b ≈ 0 and a ≈ 1.

4.5 Gärdenfors and Sahlin (1982)

These two authors propose that each epistemically possible probability distribution be given a reliability index and that options having a distribution with a low index be discarded at the outset. A selection is then to be based on the evaluation AF, see section 3.3.2. Although no precise criteria for assigning indices to options are presented, their treatment of an example describing the qualms of a lady contemplating various betting odds indicates that all probability distributions commensurate with somewhat imprecise probability estimates should be assigned a low reliability index. Hence o1 rather than o2 should be chosen if o1 = (0.5, 0.5; x, y), o2 = (p, 1 − p; z, u), 0 < p < 1, and 0 < y < x < u < z < 1, even though o2 stochastically dominates o1. As presented, the upshot of discarding options having a low reliability index therefore seems questionable. But if that step is removed from the proposal of Gärdenfors and Sahlin, then it will suffer from all deficiencies accruing to AF.


Chapter 5
Evaluations and Choice Rules
This chapter contains a general discussion of evaluations of options and choice rules based on such evaluations. Since the discussion is at some points somewhat technical, it should be pointed out at the outset that the prospects of finding evaluations that are markedly superior to the ones introduced in chapters 1 and 3 seem to be bleak.

5.1 Fundamental Concepts

Let C = {c1, . . . , cn} be a finite set of real numbers such that c1 > · · · > cn. Then the set of finite options O over C is the least set S such that

i. if A is a vector (p1, . . . , pn; c1, . . . , cn) such that 0 ≤ pi and Σ pi = 1, then A ∈ S, and

ii. if A1 ∈ S, . . . , Am ∈ S, 0 ≤ pi and Σ pi = 1, then (p1, . . . , pm; A1, . . . , Am) ∈ S.

A is a normal option if and only if A equals (p1, . . . , pn; c1, . . . , cn) for some p1, . . . , pn such that 0 ≤ pi and Σ pi = 1. Each finite option A can be reduced to a unique normal option N(A), and two options A and B are said to be congruent if and only if N(A) = N(B). An evaluation is a function V : O × R^R → R such that V(A, f) = V(B, f) if A and B are congruent.

Comments. Finite options are to be viewed as lotteries which ultimately yield amounts in USD. A fixed finite set C is used in order to simplify the presentation. C is to be thought of as a sufficiently large set that contains all outcomes explicitly mentioned in this chapter.
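The reduction of a nested finite option to its normal form N(A) can be sketched as follows. This is a minimal illustration of the definition above; the representation of options as pairs and the function name are ours, not the text's.

```python
# Sketch: reducing a finite option to its normal form N(A).
# An option is either an outcome (a number from C) or a pair
# (probs, subopts); nesting is flattened by multiplying probabilities.

C = [100, 10, 0]  # c1 > c2 > c3, an illustrative outcome set

def normal_form(option):
    """Return {outcome: probability} for a possibly nested option."""
    probs, subs = option
    dist = {c: 0.0 for c in C}
    for p, sub in zip(probs, subs):
        if isinstance(sub, tuple):          # a nested option
            for c, q in normal_form(sub).items():
                dist[c] += p * q
        else:                               # a plain outcome in C
            dist[sub] += p
    return dist

A = ((0.5, 0.5), (100, ((0.2, 0.8), (10, 0))))
B = ((0.5, 0.1, 0.4), (100, 10, 0))
# A and B are congruent: N(A) = N(B) = (0.5, 0.1, 0.4; 100, 10, 0).
```

Congruence of two options is then simply equality of the returned distributions, up to rounding.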


Examples

a1 Expected utility given f.
E(A, f) = p1 f(c1) + · · · + pn f(cn). Here and below N(A) = (p1, . . . , pn; c1, . . . , cn).

a2 Qualitative evaluation given f and a risk index (r, s).
S(A, f, r, s) = 1 if Σ_r pi ≤ s and S(A, f, r, s) = 0 if Σ_r pi > s. Here Σ_r indicates that the sum is taken over all i such that f(ci) ≤ r.

a2,c Continuous qualitative evaluation given f and risk index (r, s, ε).
S(A, f, r, s, ε) = 1 if Σ_r pi ≤ s, S(A, f, r, s, ε) = (s + ε − t)/ε if s ≤ t = Σ_r pi ≤ s + ε, and S(A, f, r, s, ε) = 0 if Σ_r pi ≥ s + ε.

a3 Maximin.
m(A, f) = min(f(ci), pi > 0).

a4 Maximax.
M(A, f) = max(f(ci), pi > 0).

a5 Hurwicz (1951).
H_α(A, f) = αM(A, f) + (1 − α)m(A, f).

a6 Maximin regret given K.
R_K(A, f) = inf(R(A, B, f), B ≠ A, B ∈ K). Here R(A, B, f) = min(f(ci) − f(cj), pi, qj > 0) if N(B) = (q1, . . . , qn; c1, . . . , cn).
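The evaluations (a1) and (a3)–(a5) can be illustrated on a single normal option as follows. A minimal sketch; the function names, the identity utility, and the sample option are our illustrative choices.

```python
# Sketch of evaluations (a1), (a3)-(a5) on a normal option
# N(A) = (p1, ..., pn; c1, ..., cn), with a utility function f.

def E(probs, outcomes, f):
    """(a1) Expected utility given f."""
    return sum(p * f(c) for p, c in zip(probs, outcomes))

def maximin(probs, outcomes, f):
    """(a3) Worst outcome that has positive probability."""
    return min(f(c) for p, c in zip(probs, outcomes) if p > 0)

def maximax(probs, outcomes, f):
    """(a4) Best outcome that has positive probability."""
    return max(f(c) for p, c in zip(probs, outcomes) if p > 0)

def hurwicz(probs, outcomes, f, alpha):
    """(a5) Hurwicz: an alpha-mixture of maximax and maximin."""
    return alpha * maximax(probs, outcomes, f) \
        + (1 - alpha) * maximin(probs, outcomes, f)

f = lambda x: x                 # the identity utility function
p, c = (0.99, 0.01), (1000, 0)  # the option (0.99, 0.01; 1000, 0)
# E = 990, maximin = 0, maximax = 1000, Hurwicz with alpha = 0.5 gives 500
```

Note how maximin and maximax discard the probabilities entirely; this is what the counterexamples (b1) and (b2) below exploit.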
Given an evaluation V and a function f, we can define a semimetric d_{V,f} and an order ≥_{V,f} on O by setting d_{V,f}(A, B) = |V(A, f) − V(B, f)| and A ≥_{V,f} B if and only if V(A, f) ≥ V(B, f). These notions can in turn serve to define a choice rule R_{V,f} as follows: R_{V,f} : 2^O × R → 2^O such that R_{V,f}(A, ε) = {A ∈ A | V(A, f) + ε ≥ V(B, f) for all B ∈ A}. R_{V,f}(A, ε) is called the set of ε-optimum options given V, f.

Comments. d is a semimetric on a set M if and only if d(x, y) ≥ 0, d(x, y) = d(y, x), and d(x, z) ≤ d(x, y) + d(y, z). Note that the last inequality implies that d(x, x) = 0. As is customary, ε is to be a small non-negative number. ε is introduced because A does not always contain an option that is an optimum one given V, f but, at least for the evaluations considered in this chapter, always an ε-optimum one. To see this, set

A = {(p, 1 − p; c1, c2) | 0.5 < p < 0.9},

f(c1) > f(c2) and V((p, 1 − p; c1, c2), f) = pf(c1) + (1 − p)f(c2).
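In this example the supremum of V over A sits at the excluded endpoint p = 0.9, so no optimum option exists, while an ε-optimum one does for every ε > 0. A numerical sketch (the values f(c1) = 10, f(c2) = 0 and the helper name are ours):

```python
# Sketch: the option set {(p, 1 - p; c1, c2) | 0.5 < p < 0.9} contains
# no optimum under expected utility (the supremum sits at the excluded
# endpoint p = 0.9), but it contains an eps-optimum for every eps > 0.

f1, f2 = 10.0, 0.0            # illustrative values with f(c1) > f(c2)

def V(p):
    return p * f1 + (1 - p) * f2

sup_value = V(0.9)            # supremum over the open interval, never attained

def eps_optimal_option(eps):
    """Return a p in (0.5, 0.9) whose value is within eps of the supremum."""
    delta = min(eps / (f1 - f2), 0.2) / 2
    return 0.9 - delta

p = eps_optimal_option(0.01)
# V(p) + 0.01 >= sup_value, although no option in the set attains sup_value
```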

A choice test is a triple (>, A, B) where > is a strict partial order on O and A, B are finite subsets of O such that A > B for all A ∈ A and B ∈ B. A preference test is a pair (A, >) where A is a finite subset of O and > is a strict partial order on A. Let V be an evaluation, F a subset of R^R, and ε ≥ 0. Then (>, A, B) is a weak counterexample at level (F, ε) to V as a choice rule generator if and only if (>, A, B) is a choice test such that R_{V,f}(A ∪ B, ε) ∩ B ≠ ∅ for all f ∈ F, and (>, A, B) is a strong counterexample at level (F, ε) to V as a choice rule generator if and only if (>, A, B) is a choice test such that B ⊆ R_{V,f}(A ∪ B, ε) for all f ∈ F. Moreover, (A, >) is a counterexample at level (F, ε) to V as a preference generator if and only if (A, >) is a preference test such that (A, ≥_{V,f}) ≠ (A, ≥_C) for each f ∈ F and each linear extension ≥_C of >. Finally, V fails strongly (weakly) to degree d as a choice rule generator at level (F, ε) if and only if there exists a d-convincing strong (weak) counterexample at this level to V as a choice rule generator, and V fails to degree d as a preference generator at level (F, ε) if and only if there exists a d-convincing counterexample at this level to V as a preference generator.
Comments. The concepts defined above can serve to assess an evaluation as a choice rule generator in the following two ways: first, counterexamples may be assigned degrees which indicate how convincing they are; secondly, it may for each proposed counterexample c and each number ε ≥ 0 be possible to determine the largest set F such that c is a counterexample at level (F, ε).
In this chapter, only the following three levels will be considered:

1. (F1, 0) where F1 = R^R,

2. (F2, 0) where F2 = {f ∈ R^R | f is strictly increasing}, and

3. (F3, 0) where F3 = {f ∈ R^R | f is strictly increasing but with a strictly decreasing rate of increase}.
As to degrees, it will only be required that each evaluation be compatible with stochastic dominance; that is, V(A, f) > V(B, f) if A dominates B stochastically. Here A dominates B stochastically if p1 + · · · + pi ≥ q1 + · · · + qi for all i, 1 ≤ i ≤ n − 1, and p1 + · · · + pi > q1 + · · · + qi for some i, 1 ≤ i ≤ n − 1. As before N(A) = (p1, . . . , pn; c1, . . . , cn) and N(B) = (q1, . . . , qn; c1, . . . , cn). Note that counterexamples primarily fault the order induced by a given evaluation. Hence we will below also speak of counterexamples to orderings not necessarily induced by evaluations.


Examples

b1 Set A = (0.99, 0.01; 1000, 0), B = (0.01, 0.99; 2, 1), and A > B. Then (>, {A}, {B}) is a strong counterexample to maximin as a choice rule generator at level (F2, 0).

b2 Set A = (0.01, 0.99; 1000, 0), B = (0.99, 0.01; 999, 998), and B > A. Then (>, {B}, {A}) is a strong counterexample to maximax as a choice rule generator at level (F2, 0).

b3 Set B = (1 − s, s; t, r), C = (1 − s, s; u, r), t > u > r, and B > C. Then (>, {B}, {C}) is a weak counterexample to S(A, f, r, s) as a choice rule generator at level (F2, 0).

b4 Set B = (0.99, 0.01; 999, 0), C = (0.01, 0.99; 1000, 1), and B > C. Then (>, {B}, {C}) is a strong counterexample to H_α(A, f) as a choice rule generator at level (F2, 0).

b5 Set B = (0.99, 0.01; 999, 0), C = (0.01, 0.99; 1000, 1), K = {B, C}, and B > C. Then (>, {B}, {C}) is a strong counterexample to R_K(A, f) as a choice rule generator at level (F2, 0).

b6 (Bernoulli 1738, Menger 1934) Set B = (1; 0), C = (2^{−31}, . . . , 2^{−1}, 2^{−31}; 2^{30} − 15, . . . , 1 − 15, −15), B > C, and i(x) = x for all x in R. Then (>, {B}, {C}) is a weak counterexample to E(A, f) at level ({i}, 0).

b7 (Allais 1953, 1979) Set B = (1; 10^6), C = (0.1, 0.89, 0.01; 5 · 10^6, 10^6, 0), B > C, and i as in (b6). Then (>, {B}, {C}) is a strong counterexample to E(A, f) as a choice rule generator at level ({i}, 0).
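Examples (b1) and (b7) can be checked numerically. A sketch with the identity utility; the helper names and the literal expected values are ours.

```python
# Sketch: numerical checks of (b1) and (b7) with the identity utility.

def E(option):                      # expected value, f = identity
    probs, outcomes = option
    return sum(p * c for p, c in zip(probs, outcomes))

def worst(option):                  # the maximin value m(A, i) of (a3)
    probs, outcomes = option
    return min(c for p, c in zip(probs, outcomes) if p > 0)

# (b1): A is intuitively far better than B, yet the security-first rule
# (a3) ranks B above A because B's worst outcome is higher.
A = ((0.99, 0.01), (1000, 0))
B = ((0.01, 0.99), (2, 1))
# worst(A) = 0 < 1 = worst(B), although E(A) = 990 while E(B) is about 1.01

# (b7), Allais: B7 is commonly preferred to C7, yet E ranks C7 above B7.
B7 = ((1.0,), (10**6,))
C7 = ((0.1, 0.89, 0.01), (5 * 10**6, 10**6, 0))
# E(B7) = 1 000 000 < 1 390 000 = E(C7)
```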
Remarks. As expected, all evaluations ignoring probabilities fail strongly at as high a level as (F2, 0). Hence it is doubtful whether it is good policy to rely on any of them to a large extent. However, the status of E(A, f) as a choice rule generator remains to be determined. To this end, section 5.5 contains an account of what can be inferred from the Allais example with respect to this problem. Moreover, the status of evaluations as preference generators must also be determined. This will be done in sections 5.3 and 5.4 with the help of common ratio tests. But first a few general remarks to clarify some issues.


5.2 Miscellaneous remarks

5.2.1 On Tests

A justification of an evaluation as a choice rule generator can take either the form of a demonstration that the given evaluation is the only one satisfying certain desirable properties (arguments from above) or the form of a demonstration that it does not to a large extent produce counterintuitive choices (arguments from below). Arguments from above, in particular for expected utility, abound in the literature. But, for all I can see (Malmnäs, 1994), they can only provide expected utility with a justification that is so weak as to be almost useless. Take as a case in point the axiom system of Herstein and Milnor (1953). According to these two authors (see 5.6.2 for details), an option A is better than an option B if and only if E(A, f) > E(B, f) for all f in F2, if and only if A dominates B stochastically. Set, for example, A = (1 − 10^{−6}, 10^{−6}; 10^6, 0) and B = (10^{−6}, 1 − 10^{−6}; 2, 1). Then the axiom system of Herstein and Milnor does not imply that A is better than B. Hence we can get a weak counterexample to the order induced by this axiom system at a level as close to (F2, 0) as we please. Accordingly, arguments from above offer little as a guide for selecting suitable choice rules. Hence most of the burden must be carried by arguments from below.
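The point of the example is that A almost certainly yields the far better prize, yet A does not dominate B stochastically, since A's worst outcome 0 lies below B's worst outcome 1; so the axioms are silent. A quick check, writing both options over the common ordered outcome set (10^6, 2, 1, 0) (the dominance tester is our sketch):

```python
# Sketch: A = (1 - 1e-6, 1e-6; 1e6, 0) and B = (1e-6, 1 - 1e-6; 2, 1),
# written over the common ordered outcome set (1e6, 2, 1, 0).

def dominates(p, q):
    """True if p stochastically dominates q (outcomes listed best first)."""
    cp = cq = 0.0
    weak, strict = True, False
    for i in range(len(p) - 1):      # cumulative sums up to i = n - 1
        cp, cq = cp + p[i], cq + q[i]
        if cp < cq - 1e-15:
            weak = False
        if cp > cq + 1e-15:
            strict = True
    return weak and strict

pA = (1 - 1e-6, 0.0, 0.0, 1e-6)   # A over (1e6, 2, 1, 0)
pB = (0.0, 1e-6, 1 - 1e-6, 0.0)   # B over (1e6, 2, 1, 0)
# A does not dominate B: after the outcome 1, B's cumulative mass is 1
# while A's is only 1 - 1e-6, so the axioms do not rank A above B.
```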

5.2.2 On Options

The present compendium is mainly devoted to normative decision theory. An axiom of this theory is that the value of a course of action a is a function of the ultimate outcomes of a and of the probabilities of these outcomes. Hence the virtues of various choice rules when the ultimate outcomes are prizes in USD can be discussed at the level of options.

5.2.3 Classification of Evaluations

An evaluation V(A, f) is regular if and only if V(A, f) = f(ci) in case A = (p1, . . . , pn; c1, . . . , cn) and pi = 1. V(A, f) is discriminating if there exists a bijection g in R^R such that g(V(A, f)) is regular. Of the evaluations considered in section 5.1 all except (a2) and (a2,c) are discriminating, and of the remaining ones only (a6) is not regular. Regularity seems to be a desirable property for an evaluation. V(A, f) is a continuous evaluation if V(Bn, f) converges to V(C, f) in case {Bn} converges to C. All evaluations defined above except (a2) are continuous ones. The reader should note that no evaluation that is continuous and regular can put a premium on security.

5.2.4 On the Relation between Preference Tests and Choice Tests

Take as a starting point the Allais example (Allais, 1953, 1979). Set

B = (1; 10^6),
C = (0.1, 0.89, 0.01; 5 · 10^6, 10^6, 0),
D = (0.11, 0.89; 10^6, 0),
E = (0.1, 0.9; 5 · 10^6, 0),

and B > C, E > D. Then ({B, C, D, E}, >) is a counterexample to E(A, f) as a preference generator at level (F1, 0). Indeed,

E(B, f) − E(C, f) = f(10^6) − 0.1f(5 · 10^6) − 0.89f(10^6) − 0.01f(0)
= 0.11f(10^6) + 0.89f(0) − 0.1f(5 · 10^6) − 0.9f(0)
= E(D, f) − E(E, f).
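The identity E(B, f) − E(C, f) = E(D, f) − E(E, f) holds for every utility function f, since both sides equal 0.11f(10^6) − 0.1f(5 · 10^6) − 0.01f(0). A quick numerical check with an arbitrary f (the particular concave f below is an illustrative choice of ours):

```python
# Sketch: checking E(B, f) - E(C, f) = E(D, f) - E(E, f) for the Allais
# options under an arbitrary utility function f.
import math

def E(option, f):
    probs, outcomes = option
    return sum(p * f(c) for p, c in zip(probs, outcomes))

M = 10**6
B = ((1.0,), (M,))
C = ((0.1, 0.89, 0.01), (5 * M, M, 0))
D = ((0.11, 0.89), (M, 0))
Eopt = ((0.1, 0.9), (5 * M, 0))

f = lambda x: math.log(1 + x)     # any f would do; this one is concave
lhs = E(B, f) - E(C, f)
rhs = E(D, f) - E(Eopt, f)
# lhs equals rhs up to rounding, so the preferences B > C and E > D
# cannot both agree with expected utility, whatever f is.
```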
Hence ({B, C, D, E}, >) is certainly a counterexample of the kind mentioned above. Moreover, empirical tests, see MacCrimmon and Larsson (1979) and Kahneman and Tversky (1979), indicate that it should be considered a convincing counterexample. Now, what kind of counterexample to E(A, f) as a choice rule generator can be constructed from the Allais example? The immediate consequences are the following ones: (>, {B}, {C}) is a strong counterexample to E(A, f) as a choice rule generator at level (G1, 0) and (>, {E}, {D}) is a strong counterexample to E(A, f) as a choice rule generator at level (G2, 0). Here

G1 = {f | E(E, f) ≥ E(D, f)} and
G2 = {f | E(B, f) ≥ E(C, f)}.

Moreover, G1 ∪ G2 = F1. On the other hand, G1 ∩ F3 ≠ ∅ and G2 ∩ F3 ≠ ∅. So these counterexamples are not really at a high level. Passing to more contrived counterexamples, we can consider the hypothetical choices at the same time or in succession; we can also consider mixtures of the given options. This possibility is, however, not open to those who side with Allais, since they cannot accept the following principle: if A1 > B1, . . . , An > Bn, then (p1, . . . , pn; A1, . . . , An) > (p1, . . . , pn; B1, . . . , Bn) for all p1 ≥ 0, . . . , pn ≥ 0 such that p1 + · · · + pn = 1. For this principle is not compatible with B > C, E > D, since (0.5, 0.5; B, E) = (0.5, 0.5; C, D). Considering the choices in succession cannot give rise to a formal counterexample since the underlying utility functions need not be kept constant. So the only possibility left is to

consider combinations of the given options. This case is discussed in some detail in section 5.5, and it is shown there that combinations do not yield a counterexample to E(A, f) at level (F3, 0). Now these findings are not limited to the Allais example but hold in general. So the substantial recent literature on preference tests has little to offer those who are interested in finding counterexamples to E(A, f) as a choice rule generator.

5.3 Evaluations and Common Ratio Tests

5.3.1 Introduction

Set

Ap = (p, 1 − p; c1, c3) and
Bq = (q, 1 − q; c2, c3)

with c1 > c2 > c3 and 1 > q > p > r > 0. Then ({Bq, Ap, Arp, Brq}, >) is
called a common ratio test. In most cases p and q are comparatively large
numbers and r a small one. Because of their simple structure, common ratio
tests are deemed ideal for testing evaluations as preference generators. The
fundamental observation concerning such tests is that
E(Ap , f ) > E(Bq , f ) if and only if E(Arp , f ) > E(Brq , f ),
for all f F1 .
On the other hand, for some c1 , c2 , c3 and when p and q are large but r
small, most people, see Kahneman and Tversky (1979), prefer Bq to Ap and
Arp to Brq . So ({Bq , Ap , Arp , Brq }, >) can be a counterexample to E(A, f )
as a preference generator at level (F1 , 0).
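The fundamental observation follows from the identity E(Arp, f) − E(Brq, f) = r(E(Ap, f) − E(Bq, f)): scaling both probabilities by the common ratio r merely rescales the difference, so its sign is preserved. A numerical sketch (the utility function and outcome values are illustrative choices of ours):

```python
# Sketch: E(A_rp, f) - E(B_rq, f) = r * (E(A_p, f) - E(B_q, f)),
# so the sign of the difference is preserved under common scaling by r.

def E(p_hi, c_hi, c_lo, f):
    """Expected utility of the option (p, 1 - p; c_hi, c_lo)."""
    return p_hi * f(c_hi) + (1 - p_hi) * f(c_lo)

f = lambda x: x ** 0.5            # any utility function would do
c1, c2, c3 = 100.0, 60.0, 1.0     # c1 > c2 > c3
p, q, r = 0.8, 0.9, 0.05          # 1 > q > p > r > 0

lhs = E(r * p, c1, c3, f) - E(r * q, c2, c3, f)
rhs = r * (E(p, c1, c3, f) - E(q, c2, c3, f))
# lhs equals rhs up to rounding, for every f, p, q and r.
```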
Now, a challenge for anyone who wishes to propose an alternative to expected utility, say V(A, f), is to show that "V(Ap, f) > V(Bq, f) if and only if V(Arp, f) > V(Brq, f), for all f in F1" does not hold, and that V(A, f) is nevertheless compatible with stochastic dominance; see Sugden (1986) for a lucid account. Quite a few attempts have also been successful in these respects. However, showing this does not entail that ({Bq, Ap, Arp, Brq}, >) cannot be a counterexample to V(A, f) as a preference generator at level (F2, 0). But the latter result is what is needed in order to have an evaluation that is a substantial improvement upon expected utility. The prospects for finding such an evaluation are, however, not particularly bright; this much can be concluded from the following observation. Select a set {Bq, Ap, Arp, Brq} as above with c1 ≫ c2 > c3 and 1 > q > p > 0.5. Let V be an evaluation that is compatible with stochastic dominance and f ∈ F2. Assume that V(Arp, f) > V(Brq, f). Now V(Aq, f) > V(Bq, f), and hence V(As, f) > V(Bq, f) for some s, p ≤ s < q. Moreover, V(Ars, f) > V(Brq, f). Hence the main advantage that V(A, f) may have over E(A, f) is that of a smaller distance between the probabilities involved. The reader should bear this in mind when considering the following detailed examination.

5.4 An Examination of Some Proposals

5.4.1 Hagen (1969; 1972; 1979)

Building on earlier work by Allais, Hagen tentatively proposed the following evaluation in 1969:

H_{a,b}(A, f) = E(A, f) − aS(A, f) + b · M3(A, f)/S²(A, f)

in case S²(A, f) > 0. If S²(A, f) = 0, then H_{a,b}(A, f) = E(A, f). Here

S²(A, f) = p1(f(c1) − E(A, f))² + · · · + pn(f(cn) − E(A, f))²,
0 ≤ S(A, f) = (S²(A, f))^{1/2},
M3(A, f) = p1(f(c1) − E(A, f))³ + · · · + pn(f(cn) − E(A, f))³,

and a, b > 0.

The rationale behind H_{a,b}(A, f) is that the value of an option should decrease with increasing dispersion and increase with positive skewness. The reason behind the division of M3 by S² is probably that all moments are to have equal weights. As is customary in this field, H_{a,b}(A, f) is claimed to be compatible with an axiom system that the interested reader can look up.
To gain some understanding of this evaluation, set B(p, d) = (p, 1 − p; d, 0) with p, d > 0 and f(d) > f(0) = 0. Then

H_{a,b}(B(p, d), f) = pf(d) − af(d)(p(1 − p))^{1/2} + bf(d)(1 − 2p) and

∂H/∂p = f(d) − af(d)(1 − 2p)/(2(p(1 − p))^{1/2}) − 2bf(d).

Let k be a large natural number and set p = 1/k². Then

af(d)(1 − 2p)/(2(p(1 − p))^{1/2}) > akf(d)/4.

Hence ∂H/∂p < 0 if p is a small number. Accordingly, H is not an increasing function of p, and it is therefore hardly a serious rival to expected utility.
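This failure of monotonicity is easy to exhibit numerically. A minimal sketch; the parameter values a = 0.5, b = 0.1 and the normalization f(d) = 1 are illustrative choices of ours, under which H_{a,b}(B(p, d), f) reduces to H(p) = p − a(p(1 − p))^{1/2} + b(1 − 2p):

```python
# Sketch: Hagen's H_{a,b} on B(p, d) = (p, 1 - p; d, 0) with f(d) = 1,
# so H(p) = p - a*sqrt(p*(1 - p)) + b*(1 - 2*p). Near p = 0 the
# dispersion term dominates and H is *decreasing* in p: raising the
# probability of the good outcome lowers the evaluation.
import math

def H(p, a=0.5, b=0.1):
    return p - a * math.sqrt(p * (1 - p)) + b * (1 - 2 * p)

# H(0.0002) < H(0.0001): a violation of monotonicity in p.
```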
In 1979 Hagen proposed the following modification of H_{a,b}:

H_{g,b}(A, f) = E(A, f) − g(S(A, f)) + b · M3(A, f)/S²(A, f)

in case S²(A, f) > 0. If S²(A, f) = 0, then H_{g,b}(A, f) = E(A, f). Here g(0) = 0, g(x) > 0 if x > 0, g′ is positive and continuous, and b > 0.

To see to what extent H_{g,b} is compatible with stochastic dominance, we compute ∂H/∂p as above, getting

∂H/∂p = f(d)(1 − g′(f(d)(p(1 − p))^{1/2})(1 − 2p)/(2(p(1 − p))^{1/2}) − 2b).

Hence ∂H/∂p > 0 only if b < 0.5. Assume then that (p(1 − p))^{1/2} ≤ 1/k with k large. Then g′(x) < 1/k if f(d) ≥ 2kx. Hence g′(x) = 0, which yields a contradiction. Hence we must have a uniform upper bound M on f(d), which in turn imposes restrictions on admissible f:s and d:s. Moreover, g′ must approach 0 as fast as p does. The simplest function satisfying these conditions seems to be x^a with a a large number. But then the contribution of the term g(S(A, f)) will be negligible in most cases. Hence we may neglect this term when determining how H_{g,b} performs in common ratio tests. To simplify the comparison with E(A, f), set p = sq with 0 < s < 1. Then

q/(qs) − (q + b(1 − 2q))/(qs + b(1 − 2qs))
= 1/s − (q(1 − 2b) + b)/(qs(1 − 2b) + b)
= (qs(1 − 2b) + b − s(q(1 − 2b) + b))/(s(qs(1 − 2b) + b))
= (b − bs)/(s(qs(1 − 2b) + b)) > 0.

Hence H_{g,b} is less risk averse than E(A, f) in most common ratio tests and therefore hardly a serious rival to it.

5.4.2 Fishburn (1983)

Fishburn there proposes the evaluation

V(A, f) = E(A, g1(f))/E(A, g2(f)).

Here g2(x) > 0. V is regular if and only if

g1(x)/g2(x) = x.

Assume that V is regular. Then g1(0) = 0. To see how V performs in common ratio tests, set

Ap = (p, 1 − p; c1, 0) and
Bp = (p, 1 − p; c2, 0)

with c1 > c2 > 0. Let f be strictly increasing with f(0) = 0. Then

V(Ap, f) − V(Bp, f) = pg1(f(c1))/(pg2(f(c1)) + (1 − p)g2(f(0))) − pg1(f(c2))/(pg2(f(c2)) + (1 − p)g2(f(0))) > 0

if and only if

(1 − p)g2(0)g1(f(c1)) > (1 − p)g2(0)g1(f(c2)),

if and only if g1(x) is strictly increasing for x > 0. Hence the introduction of g1 seems pointless. Now

V(Ap, f) − V(Bq, f) > 0

if and only if

pg1(f(c1))(qg2(f(c2)) + (1 − q)g2(0)) > qg1(f(c2))(pg2(f(c1)) + (1 − p)g2(0)),

if and only if

p(1 − q)/(q(1 − p)) > g1(f(c2))/g1(f(c1)).

Hence V is slightly more risk averse than E. But note that this is due to the introduction of the function g2 with g2(0) > 0. Hence the price seems to be too high.

5.4.3 Loomes and Sugden (1986)

These two authors claim that regret and disappointment (Loomes & Sugden, 1982; 1986) ought to influence decision making, and in their paper of 1986 they propose an evaluation utilizing these ideas. Their work is closely related to that of Bell (1982; 1985). But, for all we know, Bell has contented himself with a discussion of special cases and never presented any evaluation. The proposal of Loomes and Sugden is as follows:

V(A, f) = Σ (pi f(ci) + D(f(ci) − E(A, f))),

where 1 ≤ i ≤ n and A = (p1, . . . , pn; c1, . . . , cn). Here D is supposed to measure elation and disappointment. D is non-decreasing and differentiable. Moreover, D(x) is convex if x > 0 and concave if x < 0. Finally, D(−x) = −D(x), which yields D(0) = 0. To see when V is compatible with stochastic dominance, let Ap and f be as in section 5.4.2. Then

V(Ap, f) = pf(c1) + D(f(c1) − pf(c1)) + D(0 − pf(c1))
= pf(c1) + D((1 − p)f(c1)) − D(pf(c1))

and

∂V/∂p = f(c1) − f(c1)D′((1 − p)f(c1)) − f(c1)D′(pf(c1))
= f(c1)(1 − D′((1 − p)f(c1)) − D′(pf(c1))).

Hence 0 ≤ D′(x) < 1 for 0 < x < a, for some a > f(c1). But then x − D(x) is strictly increasing in this interval. To see how V performs in common ratio tests, set

D(pf(c1)) = pf(c1) − d and
D(qf(c2)) = qf(c2) − e.

Then

V(Ap, f) − V(Bq, f) = d + D((1 − p)f(c1)) − e − D((1 − q)f(c2)).

But

d + D((1 − p)f(c1)) − e − D((1 − q)f(c2)) ≥ 0

if pf(c1) ≥ qf(c2), with equality only if pf(c1) = qf(c2), since x − D(x) is strictly increasing and compatibility with stochastic dominance holds. Hence V is less risk averse than E in common ratio tests and hardly a serious rival to E.

5.4.4 Green and Jullien (1988)

These two authors propose an evaluation that is based on the idea that we should replace the notion of the value of an outcome with the notion of the value of an outcome given a probability. Since this idea contradicts one of the basic presuppositions of normative decision theory, their proposal is presented here only for the sake of completeness:

V(A, f) = Σ pi gi(f(ci)),

where 1 ≤ i ≤ n and A = (p1, . . . , pn; c1, . . . , cn). Here

gi(x) = (1/pi) ∫ φ(x, t) dt,

with the integral going from s_{i−1} to s_i, s_0 = 0, s_i = p_1 + · · · + p_i, and φ : R × [0, 1] → R such that φ is continuous, non-decreasing in the first variable, and φ(0, p) = 0. Setting φ(x, p) = x we get V(A, f) = E(A, f), as expected. If V is to be regular, then ∫_I φ(x, t) dt = x for I = [0, 1]. Moreover, {p | φ(x, p) is not strictly increasing in x} has measure zero if V is compatible with stochastic dominance. To see how V performs in common ratio tests, we will here only consider the case φ(x, p) = x · h(p), h continuous on [0, 1]. Now, in view of Weierstrass's approximation theorem, it suffices to set h(p) = Pn(p) with Pn a polynomial of degree n > 0. If V is compatible with stochastic dominance, then Pn ≥ 0 on [0, 1], and if Pn is to serve as a risk-averse weight, Pn must be strictly increasing. So the most interesting case is Pn(p) = (n + 1)p^n. Set Ap = (p, 1 − p; c1, 0), Bq = (q, 1 − q; c2, 0) with 0 < c2 < c1, 0.5 < p < q, and f strictly increasing with f(0) = 0. Then

V(Ap, f) = p^{n+1} f(c1) and
V(Bq, f) = q^{n+1} f(c2).

Moreover,

V(Arp, f) = r^{n+1} p^{n+1} f(c1) and
V(Brq, f) = r^{n+1} q^{n+1} f(c2).

Hence, for this choice of φ, V performs very much as E in common ratio tests.
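For φ(x, t) = x(n + 1)t^n the value of Ap = (p, 1 − p; c1, 0) can be computed by carrying out the integral directly: g1(f(c1)) = (1/p) ∫₀^p f(c1)(n + 1)t^n dt = f(c1)p^n, so V(Ap, f) = p · g1(f(c1)) = p^{n+1}f(c1). A numerical sketch (the midpoint-rule integration routine and parameter values are ours):

```python
# Sketch: with phi(x, t) = x * (n + 1) * t**n, the Green-Jullien value of
# A_p = (p, 1 - p; c1, 0) is p * g1(f(c1)), where
# g1(x) = (1/p) * integral of phi(x, t) dt over [0, p];
# this works out to p**(n + 1) * f(c1).

def V_Ap(p, fc1, n, steps=100000):
    # midpoint-rule integration of phi(f(c1), t) over [0, p]
    h = p / steps
    integral = sum(fc1 * (n + 1) * ((i + 0.5) * h) ** n
                   for i in range(steps)) * h
    g1 = integral / p
    return p * g1                  # the (1 - p)-branch contributes f(0) = 0

p, fc1, n = 0.8, 10.0, 3
# V_Ap(p, fc1, n) is (up to integration error) p**(n + 1) * fc1
```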

5.4.5 Quiggin (1982)

Quiggin proposes that an evaluation of an option should not be based directly upon the values and probabilities of the given outcomes, but that the probabilities first should be modified. The reason behind this claim is solely empirical: it seems to conform with observed behavior. Any such proposal is, of course, at variance with one of the basic tenets of normative decision theory. This view has, however, gained some momentum, so we had better comment upon it here. The simplest way to modify the given probabilities is to set

V(A, w, f) = Σ_{i=1}^{n} w(pi)f(ci),

where A = (p1, . . . , pn; c1, . . . , cn) and w : [0, 1] → R+, w(0) = 0. If V is regular, then w(1) = 1. Hence by letting f(c1) approach f(c2) we get w(p) + w(1 − p) = 1, which in turn implies that w(1/2) = 1/2. Moreover, w is strictly increasing if V is compatible with stochastic dominance. So, by induction, w(k/2^n) = k/2^n, 0 ≤ k ≤ 2^n. Hence w(x) = x if w is continuous. So, in order to improve upon expected utility, Quiggin proposes the following more complicated evaluation:

V(A, w, f) = Σ_{i=1}^{n} (w(si) − w(s_{i−1}))f(ci).

Here s0 = 0 and si = p1 + · · · + pi, and w : [0, 1] → R+ is continuous and strictly increasing. Moreover, w(0) = 0, w(1) = 1, and w(1/2) = 1/2. If V is to have any advantage over E in common ratio tests, then w must be convex. But, as pointed out by Yaari (1987), then w(x) = x. Indeed, we first get w(x) ≤ x, then w(k/2^n) = k/2^n by induction, and finally w(x) = x by continuity. So V is hardly an improvement upon expected utility.

5.4.6 Yaari (1987)

To overcome the drawbacks of Quiggin's proposal, Yaari puts forward the following evaluation:

V(A, g, f) = Σ_{i=1}^{n−1} (g(1 − s_{i−1}) − g(1 − s_i))f(ci) + g(pn)f(cn),

where A = (p1, . . . , pn; c1, . . . , cn) and g : [0, 1] → R+ is continuous and strictly increasing. Here s0 = 0 and si = p1 + · · · + pi. Again, if V is regular and compatible with stochastic dominance, then g(0) = 0 and g(1) = 1. By setting n = 2 we get

V((p, 1 − p; c1, c2), g, f) = g(p)f(c1) + g(1 − p)f(c2).

Hence by letting f(c1) approach f(c2) we get g(p) + g(1 − p) = 1 if g is to be reasonable. But since g must be convex if V is to have any advantage over E in common ratio tests, we get g(x) = x. Hence Yaari's evaluation hardly has any advantage over E in common ratio tests.

5.5 On Continuous Evaluations

In this section a few special cases of continuous evaluations are studied in some detail. Special consideration is devoted to risk averse evaluations and their performance in common ratio tests.

5.5.1 Polynomial evaluations of options in two variables

Consider all options (p, 1 − p; y, 0) with 0 ≤ p ≤ 1 and 0 ≤ y. Since we are only interested in evaluations v(y) such that v is continuous, strictly increasing and with v(0) = 0, we can replace the options with the set of vectors (p, v) with 0 ≤ p ≤ 1 and v ≥ 0. An evaluation of options can then be considered as a function V(p, v). Accordingly E((p, 1 − p; y, 0), v) equals pv. This evaluation is sometimes considered to be at fault when p is close to one and v is fairly large. This defect is often said to be a consequence of the fact that the rate of increase in p is constant.

To improve upon pv in this respect we may start by considering all polynomial evaluations of degree 2. Then

V(p, v) = ap² + bpv + cv² + dp + ev + f.

If V is a reasonable evaluation, then V is strictly increasing in p for v > 0 and strictly increasing in v for p > 0. Moreover, V(p, 0) = V(0, v) = 0 and V(1, v) = v. Hence f = e = d = c = a = 0 and b = 1. So pv is the only reasonable evaluation of degree 2.
Suppose instead that V is a polynomial of degree 4. It follows from the argument above that V can only contain mixed terms. So

V(p, v) = vp(ap² + (bv + c)p + dv² + ev + f).

Set p = 1. Then

v(a + bv + c) + v(dv² + ev + f) = v.

Hence d = b + e = 0 and a + c + f = 1. So

V(p, v) = vp(ap² + (bv + c)p − bv + 1 − a − c).

But V is increasing in p and the rate of increase is also increasing. So

∂V/∂p = v(3ap² + 2(bv + c)p − bv + 1 − a − c) > 0

and

∂²V/∂p² = v(6ap + 2(bv + c)) > 0.

Hence 1 − a − (bv + c) ≥ 0 and bv + c ≥ 0. So c ≥ 0. Hence a ≤ 1. Accordingly the most risk averse evaluation of degree 4 is p³v. But this evaluation performs as E in common ratio tests.
Consider then the general case, that is,

V(p, v) = vp(a_{n,0}p^n + (a_{n−1,1}v + a_{n−1,0})p^{n−1} + · · · + (a_{0,n}v^n + · · · + a_{0,0})p^0)

for n ≥ 3. But V(1, v) = v. So

a_{n,0} + · · · + a_{0,0} = 1,
a_{0,n} = 0,
a_{n−k,k} + · · · + a_{0,k} = 0,

where 0 < k < n. Let v be a small number. Then

V(p, v) ≈ vp(a_{n,0}p^n + · · · + a_{0,0}p^0).

This function is increasing for all p if V is a reasonable evaluation. So a_{0,0} ≥ 0, and the most risk averse evaluation satisfying this condition has a_{0,0} = 0. By iteration we get a_{k,0} = 0 for 0 ≤ k < n and a_{n,0} = 1. Hence the most risk averse evaluation V when p approaches 1 has the form

V(p, v) = vp(p^n + a_{n−1,1}vp^{n−1} + · · · + (a_{0,n}v^n + · · · + a_{0,1}v)p^0), n ≥ 3.

Hence by a new iteration we get that the most risk averse evaluation has the form vp^n. But this evaluation performs as E in common ratio tests.
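That an evaluation of the form p^n v performs as E in common ratio tests can be seen from the ratio V(Arp)/V(Brq) = (rp)^n v1/((rq)^n v2) = (p/q)^n · v1/v2 being independent of r: scaling both probabilities by a common ratio never reverses a comparison. A numerical sketch for the degree-4 case p³v (the sample values are ours):

```python
# Sketch: under V(p, v) = p**3 * v, scaling both probabilities by a
# common ratio r leaves the comparison between two options unchanged,
# exactly as under the expected value p * v.

def V(p, v):
    return p ** 3 * v

p, q = 0.8, 0.9
v1, v2 = 100.0, 60.0
for r in (1.0, 0.1, 0.01):
    assert (V(r * p, v1) > V(r * q, v2)) == (V(p, v1) > V(q, v2))
```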

5.5.2 Polynomial evaluations of options with three outcomes

Consider options like (p1, p2, p3; v1, v2, v3) and let V be an evaluation of such options. Assume that V separates the probability variables from the value variables and that V is linear in the value variables. Then

V(p, v) = v1 f1(p) + v2 f2(p) + v3 f3(p).

If v1 = v2 = 0, then V only depends on p3. So we get

V(p, v) = v1 f1(p1) + v2 f2(p2) + v3 f3(p3).

Setting v1 = v2 = v > 0 and v3 = 0, we get f1(p1) + f2(p2) = g(p1 + p2) if V is reasonable. Now f1(0) = f2(0) = 0. Then f1(x) = g(x) = f2(x). Hence kg(x) = g(kx) for each natural number k > 0. But then rg(x) = g(rx) for each positive rational number r. Finally, continuity implies that ag(x) = g(ax) for each positive real number a. Hence

V(p, v) = v1 p1 + v2 p2 + v3 p3.

Assume thereafter that

V(p, v) = f1(v1, p1) + f2(v2, p2) + f3(v3, p3).

Then fi is continuous, fi(v, 1) = v, and fi(0, p) = 0 = fi(v, 0). Setting again v1 = v2 = v > 0 and v3 = 0 we get f1(v, p1) + f2(v, p2) = g(v, p1 + p2) if V is reasonable. But then f1(v, p) = g(v, p) = f2(v, p). As before, g(v, ax) = ag(v, x). So g(v, p) = vp. Hence again

V(p, v) = v1 p1 + v2 p2 + v3 p3.

So in order to get other reasonable evaluations we must assume a rather intricate interdependence between probabilities and values.

5.6 Axiomatic Utility Theory and Expected Utility

It seems to be a common view (see, for instance, Kahneman [2011, p. 314]) that von Neumann and Morgenstern (1944) and the host of axiomatic theories which have appeared in their aftermath (see, for instance, Fishburn [1981]) provide a formal foundation for the utility principle, or at least for the principle of maximizing expected utility. In this section it will be shown that this opinion rests on a misunderstanding of the results obtained: existing axiomatic theories do not provide any formal foundation for the utility principle, nor can they be made to do so via the addition of new axioms, at least if the foundation is to be non-circular. To substantiate this claim, three such systems will be examined in some detail: Herstein and Milnor (1953), Savage (1954; 1972), and Oddie and Milne (1990). The rationale behind this selection is the following: the axiom system of Herstein and Milnor (1953) is, at least from a mathematical point of view, the most satisfactory of the systems directly inspired by von Neumann and Morgenstern (1944), and the system of Savage (1954; 1972) has been the most influential of the systems inspired by Ramsey (1931). Finally, Oddie and Milne (1990) is chosen mainly because they, in contradistinction to most authors, explicitly aim at a justification of the utility principle. A discussion of their system also forms a natural bridge to a general discussion of the possibilities of obtaining a non-circular formal justification of the utility principle. Hopefully, a discussion of the systems mentioned above will convince most readers of the tenability of the claims made at the outset. In case of doubt, a fuller account is found in Malmnäs (1990).

5.6.1 Fundamental Concepts

The utility principle states that the value of an option

A = (p1, . . . , pn; c1, . . . , cn)

equals E(A, V), and an agent is said to accept the utility principle if she assigns A the value E(A, V) given that she has assigned the value V(ci) to ci, 1 ≤ i ≤ n. Moreover, an ordering > between options is said to be compatible with the principle of maximizing expected utility if A > B implies E(A, V) > E(B, V). Finally, an agent who has assigned the value V(ci) to ci is said to accept the principle of maximizing expected utility if her value-ordering of the options is compatible with that principle.

5.6.2

Herstein and Milnor (1953)

These authors consider a non-empty set of outcomes C, a set of prospects S, and an operation s such that C ⊆ S and s(λ, a, b) ∈ S if a, b ∈ S and 0 ≤ λ ≤ 1. s(λ, a, b) is to be the prospect that a occurs with probability λ and b with probability 1 − λ, and s is to satisfy the following conditions:

i. s(1, a, b) = a
ii. s(λ, a, b) = s(1 − λ, b, a)
iii. s(μ, s(λ, a, b), b) = s(λμ, a, b)

From (i)-(iii) we can derive

iv. s(μ, s(λ, a, b), s(ν, a, b)) = s(μλ + (1 − μ)ν, a, b),

which will be needed below.

Proof of (iv)
Case 1: ν = 1. Then

s(μ, s(λ, a, b), s(ν, a, b)) = s(μ, s(λ, a, b), a)
= s(μ, s(1 − λ, b, a), a)
= s(μ(1 − λ), b, a)
= s(1 − μ(1 − λ), a, b)
= s(μλ + (1 − μ), a, b)
= s(μλ + (1 − μ)ν, a, b).

Case 2: ν < 1. Then

s(λ, a, b) = s(ρ, a, s(ν, a, b))

with ρ = (λ − ν)/(1 − ν) (assuming λ ≥ ν; the case λ < ν is symmetric by condition (ii)). Hence

s(μ, s(λ, a, b), s(ν, a, b)) = s(μ, s(ρ, a, s(ν, a, b)), s(ν, a, b))
= s(ρμ, a, s(ν, a, b))
= s(1 − ρμ, s(1 − ν, b, a), a)
= s((1 − ρμ)(1 − ν), b, a)
= s(1 − (1 − ρμ)(1 − ν), a, b).

But 1 − (1 − ρμ)(1 − ν) = μλ + (1 − μ)ν. Hence

s(μ, s(λ, a, b), s(ν, a, b)) = s(μλ + (1 − μ)ν, a, b).
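One concrete model of these conditions (our illustration, not part of the source) takes prospects to be probability vectors over a fixed outcome set and lets s mix them componentwise; conditions (i)-(iii) and the derived condition (iv) can then be verified numerically.

```python
# Illustration: prospects as probability vectors, s(lam, a, b) = lam*a + (1-lam)*b.

def s(lam, a, b):
    return tuple(lam * x + (1 - lam) * y for x, y in zip(a, b))

def close(u, v, eps=1e-12):
    return all(abs(x - y) < eps for x, y in zip(u, v))

a, b = (1.0, 0.0, 0.0), (0.0, 0.5, 0.5)
lam, mu, nu = 0.3, 0.6, 0.8

assert close(s(1, a, b), a)                              # condition (i)
assert close(s(lam, a, b), s(1 - lam, b, a))             # condition (ii)
assert close(s(mu, s(lam, a, b), b), s(lam * mu, a, b))  # condition (iii)
assert close(s(mu, s(lam, a, b), s(nu, a, b)),           # derived condition (iv)
             s(mu * lam + (1 - mu) * nu, a, b))
print("conditions (i)-(iv) hold in this model")
```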
They also consider the relation at least as good as between prospects. This relation, labeled ≽, is employed to define the relations better than (>) and equally good (∼) in the following way: a > b if and only if a ≽ b and not b ≽ a, and a ∼ b if and only if a ≽ b and b ≽ a. The following axioms are then assumed:

Axiom 1. ≽ is a complete semi-order on S.
Axiom 2. {λ | s(λ, a, b) ≽ c} and {λ | c ≽ s(λ, a, b)} are closed for all a, b, c ∈ S.
Axiom 3. If a ∼ a′, then s(1/2, a, b) ∼ s(1/2, a′, b).

From these axioms, the following theorem is then proved.


Representation Theorem
a. If ≽ and s satisfy (i)-(iii) and Axioms 1-3, then there exists a function U : S → R which is compatible with ≽ and s.
b. If U and U′ are real-valued functions on S that are compatible with ≽ and s, then U = αU′ + β, for some α > 0 and β.

Here, a function U : S → R is compatible with ≽ and s if and only if:
i. a ≽ b if and only if U(a) ≥ U(b), and
ii. U(s(λ, a, b)) = λU(a) + (1 − λ)U(b).

Proof of the Representation Theorem
The Representation Theorem follows from Lemmas 1-8 below:

Lemma 1. If a ≽ b ≽ c, then b ∼ s(λ, a, c), for some λ.
Lemma 2. If b ∼ s(λᵢ, a, c) and lim λᵢ = λ, then b ∼ s(λ, a, c).
Lemma 3. If a > b, then a > s(1/2, a, b) > b.
Lemma 4. If a > b and 0 < λ < 1, then a > s(λ, a, b) > b.
Lemma 5. If a ∼ a′, then a ∼ s(λ, a, a′).
Lemma 6. If a ∼ a′, then s(λ, a, b) ∼ s(λ, a′, b).
Lemma 7. If a > b, then s(λ, a, b) > s(μ, a, b) if and only if λ > μ.
Lemma 8. If a > b > c, then there is a unique λ such that b ∼ s(λ, a, c).
Part (a)
Case 1. a ∼ b for all a, b ∈ S. Set U(x) = 1. Then U is compatible with ≽ and s.

Case 2. Ia,b = S for some a, b ∈ S with a > b. Here Ia,b = {x | a ≽ x ≽ b}. Set U(x) = λ if and only if x ∼ s(λ, a, b). Then it follows from lemmas 1, 4, and 8 that U is a real-valued function on S. Moreover, lemma 7 implies that U is compatible with ≽. Suppose then that x ∼ s(λ, a, b) and y ∼ s(μ, a, b). Then it follows from condition (iv) of section 5.6.2 and lemma 6 that

s(ν, x, y) ∼ s(ν, s(λ, a, b), y)
∼ s(ν, s(λ, a, b), s(μ, a, b))
= s(νλ + (1 − ν)μ, a, b).

Hence

U(s(ν, x, y)) = νλ + (1 − ν)μ = νU(x) + (1 − ν)U(y),

which shows that U is compatible with s.
Case 3. Neither case 1 nor case 2. Choose c, d ∈ S such that c > d. Consider Ia,b, Ia′,b′ such that c, d ∈ Ia,b ∩ Ia′,b′. Set λa,b(x) = λ if and only if x ∼ s(λ, a, b), and λa′,b′(x) = λ if and only if x ∼ s(λ, a′, b′). Set then

Ma,b(x) = (λa,b(x) − λa,b(d)) / (λa,b(c) − λa,b(d))

and

Ma′,b′(x) = (λa′,b′(x) − λa′,b′(d)) / (λa′,b′(c) − λa′,b′(d)).

Then Ma,b is a positive linear transformation of λa,b. Hence Ma,b is compatible with ≽ and s on Ia,b, and Ma′,b′ is compatible with ≽ and s on Ia′,b′. In addition, Ma,b(c) = Ma′,b′(c) = 1 and Ma,b(d) = Ma′,b′(d) = 0. Suppose that x ∈ Ia,b ∩ Ia′,b′. Then Ma,b(x) = Ma′,b′(x). This can be seen in the following way:

i. c ≽ x ≽ d. Then x ∼ s(λ, c, d), for some λ. Hence

Ma,b(x) = λ = Ma′,b′(x).

ii. x > c > d. Then c ∼ s(λ, x, d), for some λ > 0. Hence

λa,b(c) = λλa,b(x) + (1 − λ)λa,b(d).

So

λa,b(c) − λa,b(d) = λ(λa,b(x) − λa,b(d)).

Hence

Ma,b(x) = 1/λ = Ma′,b′(x).

iii. c > d > x. Then d ∼ s(λ, c, x), for some λ, 0 < λ < 1. Hence

λa,b(d) = λλa,b(c) + (1 − λ)λa,b(x).

So

(1 − λ)(λa,b(x) − λa,b(d)) = −λ(λa,b(c) − λa,b(d)).

Hence

Ma,b(x) = −λ/(1 − λ) = Ma′,b′(x).

Accordingly, by setting U(x) = y if and only if Ma,b(x) = y for some a, b ∈ S such that x, c, d ∈ Ia,b, we get a real-valued function on S that is compatible with ≽ and s.
Part (b). Assume that U and U′ are compatible with ≽ and s. Let c > d be elements of S. Set

V(x) = (U(x) − U(d)) / (U(c) − U(d))

and

V′(x) = (U′(x) − U′(d)) / (U′(c) − U′(d)).

Then by reasoning as in case 3 above we see that V(x) = V′(x). Hence

U(x) = αU′(x) + β

with

α = (U(c) − U(d)) / (U′(c) − U′(d))

and

β = U(d) − αU′(d).

Proof of lemma 1. Set

I₁ = {λ | s(λ, a, c) ≽ b} and
I₂ = {λ | b ≽ s(λ, a, c)}.

Then I₁ ∪ I₂ = [0, 1] since ≽ is a complete semiorder on S. Moreover, I₁ and I₂ are closed and non-empty. Hence I₁ ∩ I₂ ≠ ∅.
Proof of lemma 2. Let I₁ and I₂ be as in the proof of lemma 1. Then λ ∈ I₁ ∩ I₂ since I₁ ∩ I₂ is closed and λᵢ ∈ I₁ ∩ I₂.
Proof of lemma 3. Assume that a > b. Suppose first that a ∼ s(1/2, a, b). Then axiom 3 yields that

s(1/2, a, b) ∼ s(1/2, s(1/2, a, b), b) = s(2⁻², a, b).

Hence a ∼ s(2⁻ⁿ, a, b) for every n ≥ 2. But then lemma 2 implies that

a ∼ s(0, a, b) = s(1, b, a) = b.

Suppose then that s(1/2, a, b) > a. Then lemma 1 yields that

a ∼ s(λ/2, a, b)

for some λ, 0 ≤ λ ≤ 1, and, repeating the argument,

a ∼ s(0, a, b) = b.

Both suppositions thus contradict a > b, so a > s(1/2, a, b). Since s(1/2, a, b) > b is shown in a similar way, lemma 3 follows.

Proof of lemma 4. Assume that a > b and that 0 < λ < 1. Define two sequences rᵢ, sᵢ in the following way:

(a) r₁ = 0 and s₁ = 1.
(b) If λ > (rₙ + sₙ)/2, then rₙ₊₁ = (rₙ + sₙ)/2 and sₙ₊₁ = sₙ, but if λ ≤ (rₙ + sₙ)/2, then rₙ₊₁ = rₙ and sₙ₊₁ = (rₙ + sₙ)/2.

Then rᵢ < λ ≤ sᵢ and sᵢ − rᵢ = 2⁻ⁱ⁺¹. Hence

lim rᵢ = lim sᵢ = λ.

A repeated use of lemma 3 then yields that

s(rᵢ, a, b) ≼ s(rᵢ₊₁, a, b) < s(sᵢ₊₁, a, b) ≼ s(sᵢ, a, b).

Finally, axiom 2 implies that the lemma holds.
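The bisection construction used in this proof is easy to mirror numerically; the hypothetical sketch below generates the sequences rᵢ, sᵢ and confirms the bracketing invariant rᵢ < λ ≤ sᵢ with sᵢ − rᵢ = 2⁻ⁱ⁺¹.

```python
# Illustration of the proof's bisection sequences for a target lam in (0, 1).

def bracket(lam, steps):
    r, s = 0.0, 1.0          # r1 = 0, s1 = 1
    for _ in range(steps):
        mid = 0.5 * (r + s)
        if lam > mid:        # move the lower end up...
            r = mid
        else:                # ...or the upper end down
            s = mid
    return r, s

lam = 0.3
r, s = bracket(lam, 40)
print(r < lam <= s, s - r == 2 ** -40)   # True True
```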


Proof of lemma 5. It is first shown that s(λ, a, a) ∼ a. Suppose first that a > s(λ, a, a), and take any μ, 0 < μ < 1. Then lemma 4 implies that

a > s(μ, a, s(λ, a, a)) = s(λ(1 − μ), a, a) > s(λ, a, a).

Let μ satisfy the condition 1 − λ(1 − μ) > λ and set κ = 1 − λ(1 − μ). Then

a > s(κ, a, a) > s(λ, a, a),

see condition (ii) of section 5.6.2. Take any ν, 0 < ν < 1. Then lemma 4 implies that

a > s(ν, a, s(κ, a, a)) = s(κ(1 − ν), a, a) > s(κ, a, a).

But κ(1 − ν) = λ for some ν in (0, 1), which gives s(λ, a, a) > s(κ, a, a), contradicting the previous display. Hence a > s(λ, a, a) entails a contradiction. In a similar way a contradiction follows from s(λ, a, a) > a. Hence s(λ, a, a) ∼ a. Assume then that a ∼ a′. Then axiom 3 and the argument above imply that a ∼ s(1/2, a′, a). Moreover, induction yields that a ∼ s(rₙ, a′, a) ∼ s(sₙ, a′, a). Here rₙ and sₙ are as in the proof of lemma 4. Finally, lemma 2 implies that a ∼ s(λ, a′, a).

Proof of lemma 6. Assume that a ∼ a′. Then axiom 3 implies that s(1/2, a, b) ∼ s(1/2, a′, b). Suppose now that s(rₙ, a, b) ∼ s(rₙ, a′, b) and that s(sₙ, a, b) ∼ s(sₙ, a′, b). Then

s((rₙ + sₙ)/2, a, b) = s(1/2, s(rₙ, a, b), s(sₙ, a, b))
∼ s(1/2, s(rₙ, a′, b), s(sₙ, a′, b))
= s((rₙ + sₙ)/2, a′, b).

Hence

s(rₙ₊₁, a, b) ∼ s(rₙ₊₁, a′, b) and s(sₙ₊₁, a, b) ∼ s(sₙ₊₁, a′, b).

Finally, lemma 2 implies that s(λ, a, b) ∼ s(λ, a′, b).

Proof of lemma 7. Assume that a > b and that 1 > λ > μ > 0. Then lemma 4 implies that a > s(λ, a, b) > b. Moreover,

s(λ, a, b) > s(ν, s(λ, a, b), b) = s(νλ, a, b) > b

if 1 > ν > 0. But μ = νλ for some such ν. Hence

s(λ, a, b) > s(μ, a, b).

The Justification of the Utility Principle

The Representation Theorem stated above is to serve as a justification of the utility principle. To see to what degree this holds, (p₁, . . . , pₙ; c₁, . . . , cₙ) must first be identified with an element of S. This is readily done as follows:

(i) (1, 0, . . . , 0; c₁, . . . , cₙ) = c₁,
(ii) If p₁ < 1, c₁ = b₁, and

(p₂/(1 − p₁), . . . , pₙ/(1 − p₁); c₂, . . . , cₙ) = b₂,

then

(p₁, . . . , pₙ; c₁, . . . , cₙ) = s(p₁, b₁, b₂).
It can then be noted that (c) and (d) hold:

(c) If C = {c₁, . . . , cₙ} and V : C → R is such that V(c₁) > · · · > V(cₙ), then there exists a system (S, s, ≽) that is induced by V and satisfies the Herstein-Milnor axioms.

(d) If V is as in (c), (S, s, ≽) is induced by V, a = (p₁, . . . , pₙ; c₁, . . . , cₙ), and U is an extension of V to S that is weakly compatible with ≽ and compatible with s, then

U(a) = p₁U(c₁) + · · · + pₙU(cₙ).

Here the following holds: a function U on S is weakly compatible with ≽ if and only if a ≽ b implies U(a) ≥ U(b), and a system (S, s, ≽) is induced by a function V : C → R such that V(c₁) > · · · > V(cₙ) if and only if

(i) S ⊇ C and s(λ, a, b) ∈ S if a, b ∈ S and 0 ≤ λ ≤ 1,
(ii) s satisfies conditions (i)-(iii) of section 5.6.2,
(iii) if a ∼ b, then a ≽ b and b ≽ a, and
(iv) cⱼ ∼ s((V(cⱼ) − V(cₙ))/(V(c₁) − V(cₙ)), c₁, cₙ), 1 ≤ j ≤ n.

Proof of (c). See the proof of (e) below.

Proof of (d). A simple induction on the definition of a.
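The induction behind (d) can be traced by evaluating the recursive identification with any U that is linear over s; in this hypothetical sketch the recursion returns exactly p₁U(c₁) + · · · + pₙU(cₙ).

```python
# Illustration: evaluate an option via clauses (i) and (ii), assuming
# U(s(lam, b1, b2)) = lam*U(b1) + (1 - lam)*U(b2) (compatibility with s).

def evaluate(ps, cs, U):
    if len(cs) == 1:                 # clause (i): the sure option is c1 itself
        return U[cs[0]]
    rest = [p / (1 - ps[0]) for p in ps[1:]]   # clause (ii): renormalized tail
    return ps[0] * U[cs[0]] + (1 - ps[0]) * evaluate(rest, cs[1:], U)

U = {"c1": 5.0, "c2": 2.0, "c3": -1.0}
ps, cs = [0.5, 0.3, 0.2], ["c1", "c2", "c3"]
direct = sum(p * U[c] for p, c in zip(ps, cs))
assert abs(evaluate(ps, cs, U) - direct) < 1e-9
print("recursive value equals the direct expected value")
```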


We are now in a position to see what kind of support the utility principle receives from the Herstein-Milnor axioms. (c) and (d) imply that the utility principle is consistent with these axioms, but note also that this is their only connection with the utility principle: the Herstein-Milnor axioms have no role in (d), and should we change the antecedent of (d) so as to add that (S, s, ≽) satisfies these axioms but delete the requirement that U be compatible with s, then any strictly increasing function of U would be compatible with ≽. We can also note that the form of U is completely determined by the condition that it be compatible with s, and that the clauses (iii) and (iv) appear in the definition above only to ensure that U be an extension of V.

Hence the axioms do not determine U given V, but any strictly increasing function of a function compatible with the axioms also satisfies that condition. It follows that the axioms primarily impose restrictions on ≽, and that it therefore is reasonable to say that a function is compatible with the axioms if and only if it is compatible with ≽.

But if we assume this position, then it becomes reasonable to say that a system (S, s, ≽), which satisfies the Herstein-Milnor axioms, is compatible with a function V : C → R such that V(c₁) > · · · > V(cₙ) if and only if

(i) S ⊇ C and s(λ, a, b) ∈ S if a, b ∈ S and 0 ≤ λ ≤ 1,
(ii) s satisfies conditions (i)-(iii) of section 5.6.2,
(iii) c₁ > · · · > cₙ, and
(iv) there exist 1 ≥ λ₁ > · · · > λₙ ≥ 0 such that cⱼ ∼ s(λⱼ, c₁, cₙ), 1 ≤ j ≤ n.
We can then prove the following simple result:

(e) If V : C → R is such that

V(c₁) > · · · > V(cₙ)

and

1 = λ₁ > · · · > λₙ = 0,

then there exists a system (S, s, ≽) that satisfies the Herstein-Milnor axioms and is such that cⱼ ∼ s(λⱼ, c₁, cₙ), 1 ≤ j ≤ n.

Proof. Let S be the least class T such that T ⊇ C and s(λ, a, b) ∈ T if a, b ∈ T and 0 ≤ λ ≤ 1. Define s in such a way that s satisfies conditions (i)-(iii) of section 5.6.2 and set cⱼ ∼ s(λⱼ, c₁, cₙ), 1 ≤ j ≤ n. Let ∼′ be the least equivalence relation such that:

(i) ∼′ is an extension of ∼,
(ii) a ∼′ b if a = b, and
(iii) b ∼′ c implies s(λ, a, b) ∼′ s(λ, a, c).

Then it holds for all a ∈ S that there exists a unique λ between 0 and 1 such that a ∼′ s(λ, c₁, cₙ). Finally, set a ≽ b if and only if λ ≥ μ for all λ, μ such that a ∼′ s(λ, c₁, cₙ) and b ∼′ s(μ, c₁, cₙ). Then (S, s, ≽) is as desired.

Hence the Herstein-Milnor axioms hardly provide the utility principle with a formal justification.
Maximization of Expected Utility

In this section we will show that an agent may very well accept the Herstein-Milnor axioms without endorsing the principle of maximizing expected utility. Indeed, let V : C → R with C = {c₁, c₂, c₃, c₄} and set

o₁ = (p, 1 − p; c₁, c₂) and
o₂ = (q, 1 − q; c₃, c₄).

Assume that the agent accepts the Herstein-Milnor axioms, has set V(cᵢ) = vᵢ with

1 > v₁ > v₃ > v₄ > v₂ > 0,

and has the following preferences:

c₁ > c₃ > c₄ > c₂.

Because she accepts the axioms, she sets

o₁ = s(p, c₁, c₂) and
o₂ = s(q, c₃, c₄).

Moreover, she should be willing to set

c₃ ∼ s(λ, c₁, c₂) and
c₄ ∼ s(μ, c₁, c₂),

for some λ, μ such that 1 > λ > μ > 0. We may then derive o₁ > o₂ if and only if

p > qλ + (1 − q)μ.

We also get E(o₁, V) > E(o₂, V) if and only if

p > q(v₃ − v₂)/(v₁ − v₂) + (1 − q)(v₄ − v₂)/(v₁ − v₂).
Now, these orderings coincide for all values of p, q such that 0 < p, q < 1 if and only if

λ = (v₃ − v₂)/(v₁ − v₂) and μ = (v₄ − v₂)/(v₁ − v₂).

Hence the Herstein-Milnor axioms force the agent to accept the principle of maximizing expected utility if and only if they force her to set

λ = (v₃ − v₂)/(v₁ − v₂) and μ = (v₄ − v₂)/(v₁ − v₂).

But they clearly cannot do that, since (e) of the previous section shows that any choice of λ, μ such that 1 > λ > μ > 0 is compatible with the Herstein-Milnor axioms. Hence these axioms put very mild constraints on the admissible choices of λ and μ. Note also that (e) of section 5.6.2 shows when o₁ > o₂ is compatible with the Herstein-Milnor axioms. Indeed, set
o₁ = (p₁, . . . , pₙ; c₁, . . . , cₙ) and
o₂ = (q₁, . . . , qₙ; c₁, . . . , cₙ)

with c₁ > · · · > cₙ. Then o₁ > o₂ is compatible with the Herstein-Milnor axioms if and only if

p₁ + · · · + pⱼ > q₁ + · · · + qⱼ

for some j, 1 ≤ j < n.

Proof. Sufficiency. Assume that

p₁ + · · · + pⱼ − (q₁ + · · · + qⱼ) = ε > 0.

Set

1 − λⱼ = ε/3 > λⱼ₊₁ ≥ 0.

Then (e) of section 5.6.2 shows that there exists a system that contains o₁ > o₂ and satisfies the Herstein-Milnor axioms.

Necessity. Obvious.

Hence these axioms provide a decision maker, facing a choice between two options, with no useful guidelines.
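The gap between the axioms and expected utility can be exhibited concretely. In the sketch below (all numbers chosen for illustration), the agent's λ and μ differ from the ratios (v₃ − v₂)/(v₁ − v₂) and (v₄ − v₂)/(v₁ − v₂), and for p = 0.7, q = 0.9 the two orderings of o₁ and o₂ disagree.

```python
# Illustration: an agent satisfying the Herstein-Milnor axioms with
# c3 ~ s(lam, c1, c2) and c4 ~ s(mu, c1, c2) ranks o1 above o2 iff
# p > q*lam + (1-q)*mu, while expected utility uses the v-ratios instead.

v1, v3, v4, v2 = 0.9, 0.6, 0.4, 0.1   # 1 > v1 > v3 > v4 > v2 > 0
lam, mu = 0.9, 0.1                    # admissible, but not the v-ratios

def prefers_o1_by_axioms(p, q):
    return p > q * lam + (1 - q) * mu

def prefers_o1_by_eu(p, q):
    ratio3 = (v3 - v2) / (v1 - v2)
    ratio4 = (v4 - v2) / (v1 - v2)
    return p > q * ratio3 + (1 - q) * ratio4

p, q = 0.7, 0.9
print(prefers_o1_by_axioms(p, q))  # False: the axiom-based ordering ranks o2 first
print(prefers_o1_by_eu(p, q))      # True: expected utility ranks o1 first
```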
Numerical Utility

von Neumann and Morgenstern (1953, p. 617) claim that their axioms make utility a number modulo a positive linear transformation. In this section it will be shown that this contention is at best somewhat misleading. To see this, assume that c₀ > c₁ > · · · > cₙ. Set, then, cᵢ ∼ s(xᵢ, c₀, cₙ) for 1 ≤ i < n, and assume that 1 > x₁ > · · · > xₙ₋₁ > 0. We can then construct a system S that satisfies the Herstein-Milnor axioms and show that there exists a unique function U : S → R compatible with ≽ and s and such that U(c₀) = 1 and U(cₙ) = 0. We then get U(cᵢ) = xᵢ for 1 ≤ i < n. Hence the construction of S has not provided us with a more specific value than we had at the outset. Thus it cannot be said of a person who formally accepts the Herstein-Milnor axioms that she has a numerical utility function unique up to a positive linear transformation even if she is willing to let s(λ, a, b) stand for λa + (1 − λ)b. Moreover, a person's acceptance of these axioms does not imply that she has such a utility function. Rather, as we saw in section 5.6.2, we have to let s(λ, a, b) stand for λa + (1 − λ)b to get that result; but, as we saw there, from this assumption alone we can deduce that our agent has a numerical utility function modulo a positive linear transformation. Thus the contribution of the axioms can be questioned even in this respect.

5.6.3 Savage (1954; 1972)

Savage considers a non-empty set X, called a world, another non-empty set K, called the set of consequences, and the set of all functions from X into K. This set is labeled F and its elements are called acts. They will be referred to by f, g, h, etc. Similarly, x, y, z, . . . will refer to elements of X, k, k′, . . . to elements of K, and A, B, C, . . . to subsets of X. He then considers the relations ≼, <, ∼, >, ≽ between acts and employs these to define the following concepts:

Definition 1. f = g(B) if and only if f(x) = g(x) for all x ∈ B.
Definition 2. f ≼ g given B if and only if f′ ≼ g′ for all f′, g′ such that f = f′(B), g = g′(B), and f′ = g′(X \ B); f′ ≼ g′ then holds for all such pairs or for none.
Definition 3. f = k if and only if f(x) = k for all x ∈ X.
Definition 4. k ≼ k′ if and only if f ≼ g, where f = k and g = k′; consequences are thus compared via the corresponding constant acts.
Definition 5. B is a null set if and only if f ∼ g given B for all f, g ∈ F.
Definition 6. f = (k, A, k′) if and only if f(x) = k for x ∈ A and f(x) = k′ for x ∈ X \ A.
Definition 7. A ≼ B if and only if (k, A, k′) ≼ (k, B, k′) for all k, k′ such that k > k′.
Definition 8. f ≼ k given B (k ≼ f given B) if and only if f ≼ g given B (g ≼ f given B), where g = k.

He then states the following axioms:

Axiom 1. ≼ is a complete semiorder on F.
Axiom 2. f ≼ g given B or g ≼ f given B for all f, g ∈ F.
Axiom 3. If f(x) = k, f′(x) = k′ for all x ∈ B, and B is not a null set, then f ≼ f′ given B if and only if k ≼ k′.
Axiom 4. A ≼ B or B ≼ A for all subsets A, B of X.
Axiom 5. k < k′ for some k, k′ ∈ K.
Axiom 6. If f < g, then there exists a partition {Bᵢ}, 1 ≤ i ≤ n, of X such that f < g′ and f′ < g for all f′, g′ such that f′ = f(X \ Bᵢ), g′ = g(X \ Bᵢ), and f′ = g′(Bᵢ) for some i, 1 ≤ i ≤ n.
Axiom 7. If f ≼ g(x) given B (g(x) ≼ f given B) for all x ∈ B, then f ≼ g given B (g ≼ f given B).


From these axioms Savage proves that there exists a unique finitely additive probability measure P on P(X) such that P(A) ≤ P(B) if and only if A ≼ B. This measure is then employed to associate sets of acts with gambles as follows: let k₁, . . . , kₙ be consequences and assume that

p₁ + · · · + pₙ = 1,

with pᵢ ≥ 0. Then

G = {(pᵢ, kᵢ) | 1 ≤ i ≤ n}

is a gamble. With G we can associate the set FG that consists of all acts f such that there exists a partition {Bᵢ}, 1 ≤ i ≤ n, of X with P(Bᵢ) = pᵢ and f(x) = kᵢ for x ∈ Bᵢ, 1 ≤ i ≤ n.

Savage then defines a utility as a real-valued function U on K such that

p₁U(k₁) + · · · + pₙU(kₙ) ≤ q₁U(k′₁) + · · · + qₘU(k′ₘ)

if and only if f ≼ g holds for all f ∈ FG and g ∈ FG′. Here

G = {(pᵢ, kᵢ) | 1 ≤ i ≤ n} and
G′ = {(qᵢ, k′ᵢ) | 1 ≤ i ≤ m}.

Thereafter, he proves the following theorem:

Representation Theorem. If (X, K, ≼) satisfies Savage's axioms, then there exists a utility on K.

This theorem is to serve as a justification of the utility principle (see Savage, 1972, p. 99). To see how far Savage has succeeded in this endeavor, note first that his axioms are consistent with it. This much follows from (a) and (b) below. But Savage's axioms cannot, of course, imply or justify the utility principle, because they only contain the relations ≼, >, etc. as primitive concepts. Indeed, let U be a utility in Savage's sense, and assume that f ∈ FG with

G = {(pᵢ, kᵢ) | 1 ≤ i ≤ n}.

Set

V(f) = p₁U(k₁) + · · · + pₙU(kₙ),

and let V′ be a strictly increasing function of V. Then V and V′ are both compatible with the relation ≼ when it is restricted to acts associated with gambles.
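This last point is easy to illustrate: any strictly increasing transform of V, say V′ = exp(V), orders gambles exactly as V does, so the ordering alone cannot single out the linear form. A hypothetical sketch:

```python
# Illustration: V and V' = exp(V) induce the same ordering on gambles.
import math

def V(gamble, U):
    return sum(p * U[k] for p, k in gamble)

U = {"k1": 3.0, "k2": 1.0, "k3": 0.0}
g1 = [(0.5, "k1"), (0.5, "k3")]   # V = 1.5
g2 = [(0.9, "k2"), (0.1, "k3")]   # V = 0.9

v_order = V(g1, U) > V(g2, U)
vprime_order = math.exp(V(g1, U)) > math.exp(V(g2, U))
print(v_order, vprime_order)   # True True: the two representations agree
```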
(a) For all C = {c₁, . . . , cₙ} and V : C → R such that V(c₁) > · · · > V(cₙ) there exist X and ≼ such that (X, C, ≼) satisfies Savage's axioms and is induced by V.

(b) If V : C → R, o = (p₁, . . . , pₙ; c₁, . . . , cₙ), (X, C, ≼) satisfies Savage's axioms and is induced by V, then there exists an extension U of V such that U : F → R, U is compatible with ≼, and

U(f) = p₁U(c₁) + · · · + pₙU(cₙ),

for all f ∈ T(o). Here

T(o) = {f | f ∈ F and P(f⁻¹(cᵢ)) = pᵢ, 1 ≤ i ≤ n},

an extension U to F of V is such that U(cᵢ) = V(cᵢ), and a system (X, C, ≼) that satisfies Savage's axioms is induced by a function V : C → R such that V(c₁) > · · · > V(cₙ) if and only if c₁ > cₙ, and for each cᵢ there exists an fᵢ ∈ F such that:

(i) fᵢ(x) = c₁ or fᵢ(x) = cₙ, for all x ∈ X,
(ii) P({x | fᵢ(x) = c₁}) = (V(cᵢ) − V(cₙ))/(V(c₁) − V(cₙ)),
(iii) cᵢ ∼ fᵢ.

Here P is the unique probability measure on P(X) that is commensurate with (X, C, ≼).

Proof of (a). See the construction of section 5.6.3.

Proof of (b). Let U′ be a utility that exists according to the Representation Theorem of section 5.6.3. Then

U′(cᵢ) = ((V(cᵢ) − V(cₙ))/(V(c₁) − V(cₙ)))U′(c₁) + ((V(c₁) − V(cᵢ))/(V(c₁) − V(cₙ)))U′(cₙ).

We thus get

V(cᵢ) = αU′(cᵢ) + β,

for some α > 0 and some β. Hence there exists an extension U of V with the desired properties.

Maximization of Expected Utility

A system (X, C, ≼) that satisfies Savage's axioms is said to be compatible with a function V : C → R such that V(c₁) > · · · > V(cₙ) if and only if c₁ > · · · > cₙ, and for each cᵢ there exists an fᵢ ∈ F and a number λᵢ such that

(i) fᵢ(x) = c₁ or fᵢ(x) = cₙ, for all x ∈ X,
(ii) P({x | fᵢ(x) = c₁}) = λᵢ, with 1 ≥ λ₁ > · · · > λₙ ≥ 0,
(iii) cᵢ ∼ fᵢ.

Here P is the unique probability measure on P(X) that is commensurate with (X, C, ≼). We can then prove the following simple result:

(c) If V : C → R is such that V(c₁) > · · · > V(cₙ) and 1 = λ₁ > · · · > λₙ = 0, then there exists a system (X, C, ≼) that satisfies Savage's axioms, is compatible with V, and is such that

P({x | fᵢ(x) = c₁}) = λᵢ

(fᵢ as in the definition above).

Proof. Set X = [0, 1] and K = C. Let P be a finitely additive measure on P(X) that is an extension of the ordinary Lebesgue measure on [0, 1]. Set

U(f) = λ₁P(B₁) + · · · + λₙP(Bₙ)

with

Bᵢ = {x | f(x) = cᵢ}

and λ₁, . . . , λₙ as above. Finally, define f ≼ g if and only if U(f) ≤ U(g). Then (X, C, ≼) is as desired.
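The construction can be imitated with finite step functions on [0, 1] (a simplified model; for finitely many pieces the ordinary Lebesgue measure of the sets Bᵢ suffices). The hypothetical sketch below checks that the two-outcome act fᵢ with P({x | fᵢ(x) = c₁}) = λᵢ is indifferent to the constant act cᵢ.

```python
# Illustration of the proof of (c): acts as lists of (measure, outcome) pieces
# partitioning [0, 1], with U(f) = sum of lam_i * P(B_i).

lams = {"c1": 1.0, "c2": 0.6, "c3": 0.3, "c4": 0.0}   # 1 = lam_1 > ... > lam_n = 0

def U(act):
    return sum(length * lams[c] for length, c in act)

def weakly_preferred(f, g):        # f at most as good as g iff U(f) <= U(g)
    return U(f) <= U(g)

f2 = [(0.6, "c1"), (0.4, "c4")]    # two-outcome act with P(f2 = c1) = lam_2
const_c2 = [(1.0, "c2")]           # the constant act c2
print(weakly_preferred(f2, const_c2) and weakly_preferred(const_c2, f2))  # True: c2 ~ f2
```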
Note also that (c) implies that o₁ > o₂ is compatible with Savage's axioms if and only if o₁ > o₂ is compatible with the axioms of Herstein-Milnor.

5.6.4 Oddie and Milne (1990)

These authors aim explicitly at a justification of the utility principle. They consider a finite set C = {c₁, . . . , cₙ} and the set O of all options

(p₁, . . . , pₖ; c₁, . . . , cₖ)

such that k ≥ 1, p₁ + · · · + pₖ = 1, and pᵢ ≥ 0 for 1 ≤ i ≤ k. They then prove the following result:

Theorem. If V : C → R with V(cᵢ) = 0 for some i, 1 ≤ i ≤ n, and U : O → R satisfies C₁ and C₂, then

U(p₁, . . . , pₙ; c₁, . . . , cₙ) = p₁V(c₁) + · · · + pₙV(cₙ).

Here U satisfies C₁ if and only if

U(p₁, . . . , pₖ; cᵢ, . . . , cᵢ) = V(cᵢ),

and U satisfies C₂ if and only if

U(p₁, . . . , pₖ; c₁, . . . , cₖ) = U(p₁, 1 − p₁; c₁, c) + · · · + U(pₖ, 1 − pₖ; cₖ, c),

where V(c) = 0 and U(p, 1 − p; cᵢ, c) is continuous in p.


Proof of the theorem. By C₂,

U(p₁, . . . , pₙ; c₁, . . . , cₙ) = U(p₁, 1 − p₁; c₁, c) + · · · + U(pₙ, 1 − pₙ; cₙ, c).

Hence we need only prove that

U(pᵢ, 1 − pᵢ; cᵢ, c) = pᵢV(cᵢ)

for 0 ≤ pᵢ ≤ 1. This is readily done:

V(cᵢ) = U(1; cᵢ)
= U(1/n, . . . , 1/n; cᵢ, . . . , cᵢ)
= nU(1/n, 1 − 1/n; cᵢ, c)

and

U(p/q, 1 − p/q; cᵢ, c) + U(1 − p/q, p/q; cᵢ, c) = U(p/q, 1 − p/q; cᵢ, cᵢ)
= U(1/q, . . . , 1/q, 1 − p/q; cᵢ, . . . , cᵢ, cᵢ) (p entries equal to 1/q)
= pU(1/q, 1 − 1/q; cᵢ, c) + U(1 − p/q, p/q; cᵢ, c)
= (p/q)V(cᵢ) + U(1 − p/q, p/q; cᵢ, c).

So

U(p/q, 1 − p/q; cᵢ, c) = (p/q)V(cᵢ).

But U(p, 1 − p; cᵢ, c) is continuous in p. Hence the theorem follows.

Note. We have deviated from Oddie and Milne (1990) in some trivial respects in order to simplify the presentation.
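A quick numerical check (our illustration) confirms that the linear form U(p₁, . . . , pₖ; c₁, . . . , cₖ) = p₁V(c₁) + · · · + pₖV(cₖ) satisfies both C₁ and C₂ when V(c) = 0.

```python
# Illustration: the linear U satisfies C1 and C2.

V = {"c": 0.0, "c1": 2.0, "c2": 5.0, "c3": -1.0}

def U(ps, cs):
    return sum(p * V[c] for p, c in zip(ps, cs))

# C1: a prospect yielding c_i for sure is worth V(c_i), however p is split
assert abs(U([0.2, 0.3, 0.5], ["c2", "c2", "c2"]) - V["c2"]) < 1e-12

# C2: U(p1,...,pk; c1,...,ck) = sum over i of U(p_i, 1 - p_i; c_i, c)
ps, cs = [0.5, 0.3, 0.2], ["c1", "c2", "c3"]
lhs = U(ps, cs)
rhs = sum(U([p, 1 - p], [c, "c"]) for p, c in zip(ps, cs))
assert abs(lhs - rhs) < 1e-12
print(lhs)   # 2.3
```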
Justification of the Utility Principle

The theorem above does not provide any justification of the utility principle, but will form part of such a justification if the following results can be proved in a non-circular way:

Theorem 1. If V : C → R, then there exists a value function U : O → R that satisfies C₁ and C₂.

Theorem 2. Every reasonable value operation is a value function that satisfies C₁ and C₂.

Oddie and Milne (1990) do not explicitly argue for these theorems, but content themselves with stating that order considerations strongly suggest that the value of an act is a function of the values and probabilities of the outcomes. Now, order considerations - the value of an act lies between the values of the best outcome and the worst one, it increases if the probabilities of good outcomes increase, etc. - do suffice to guarantee the existence of a value function if the number of distinct outcomes is less than or equal to two. But they will not do so in case this number is greater than two, as the considerations of section 5.6.3 and the following simple result from Debreu (1954) show: Set

B = {(a, b) | a + b ≤ 1, a, b ≥ 0}

and (a, b) > (c, d) if and only if

(i) a + b > c + d, or
(ii) a + b = c + d and a > c.

Suppose now that f : B → R is such that f(a, b) > f(c, d) if and only if (a, b) > (c, d). Set g(c) = (f(0, c), f(c, 0)), an open interval (note that (c, 0) > (0, c) when c > 0). Then g(c) ≠ ∅ if c > 0 and g(c) ∩ g(c′) = ∅ if c ≠ c′. But then the number of rational numbers is uncountable, since each g(c) contains a rational number if c > 0. So order considerations alone will not lead to the existence of a value function, and a fortiori not to one satisfying C₁ and C₂. To prove the existence of such a value function we must instead introduce the operation s and the relation ≽ of Herstein and Milnor and assume that (O, s, ≽) satisfies their axioms. Moreover, to get theorems 1 and 2, we must in view of the results in section 5.6.2 add the following axioms:

c₁ > cₙ

and

cᵢ ∼ s(V(cᵢ), c₁, cₙ)

for 1 < i < n. We have here assumed that C = {c₁, . . . , cₙ} and that 1 = V(c₁) > · · · > V(cₙ) = 0. Then

(p₁, . . . , pₙ; c₁, . . . , cₙ) ∼ s(p₁V(c₁) + · · · + pₙV(cₙ), c₁, cₙ)

and we get theorem 1. Moreover, we get theorem 2 if U is compatible with ≽. In this way a circular proof of the utility principle is obtained. Hence, it can be questioned if Oddie and Milne have contributed to providing the utility principle with a formal justification.
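Debreu's obstruction above can be made tangible: under the lexicographic order, a representing f would have to place the uncountably many pairwise disjoint intervals g(c) in the reals, each containing its own rational, which is impossible. The sketch below shows one natural candidate f (a hypothetical choice of ours) failing to represent the order.

```python
# Illustration: the lexicographic order on B has no numerical representation;
# a candidate f that weights the tie-break slightly misorders some pairs.

def lex_gt(x, y):
    return (x[0] + x[1], x[0]) > (y[0] + y[1], y[0])

def f(x):
    return x[0] + x[1] + 1e-9 * x[0]   # hypothetical candidate

assert lex_gt((0.5, 0.0), (0.0, 0.5))  # equal sums: first coordinate decides

x, y = (0.0, 0.5), (0.4999999999, 0.0)
print(lex_gt(x, y), f(x) > f(y))       # True False: f misorders this pair
```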

5.6.5 Summary

(1) Existing axiomatic systems do not provide the utility principle or the principle of maximizing expected utility with a formal non-circular justification.

(2) The prospects for obtaining such a justification are not very bright: indeed, assume

C = {c₁, . . . , cₙ},

V : C → R such that V(c₁) > · · · > V(cₙ), and O equal to the set of options (p₁, . . . , pₙ; c₁, . . . , cₙ) such that

p₁ + · · · + pₙ = 1 and pᵢ ≥ 0.

Then V gives rise to a set of value orderings on O. If n = 2, we get a unique order determined by the size of p₁. Hence we can, in this case, say that the value of (p₁, . . . , pₙ; c₁, . . . , cₙ) equals p₁ or is a strictly increasing function of p₁. But, in case n ≥ 3, we can, as we saw in the previous section, get orderings that are incompatible with any assignment of real numbers to the options. Hence, we must in this case introduce a stratagem that in some way or other brings about a reduction of the size of n; for instance by using topological methods like Debreu (1954) or by employing the operation s like von Neumann and Morgenstern (1944). But, as we saw in section 5.6.2, the introduction of s, which by the way is questionable, does not yield that all orderings are isomorphic to the one induced by the utility principle. To achieve that we must add axioms like those in the previous section. But we then get a justification of the utility principle that is patently circular.

5.7 Allais' Example and Expected Utility

Allais' example might point the way to a counterexample to E(A, f) as a choice rule generator in the following manner: take as point of departure the two inequalities

B > C and E > D.

Then

BE > BD,
BE > CE,
BE > CD.

Here BE corresponds to choosing B from {B, C} and E from {D, E}. This in turn corresponds to choosing B rather than C and E rather than D at the same time. In outcome form BD etc. can be described as follows:

BD = (0.11, 0.89; 2·10⁶, 10⁶),
BE = (0.1, 0.9; 6·10⁶, 10⁶),
CD = (0.011, 0.089, 0.0979, 0.7932, 0.0089; 6·10⁶, 5·10⁶, 2·10⁶, 10⁶, 0),
CE = (0.01, 0.089, 0.091, 0.801, 0.009; 10·10⁶, 6·10⁶, 5·10⁶, 10⁶, 0).

Now

PBD(x ≥ 10⁶) = PBE(x ≥ 10⁶) > PCD(x ≥ 10⁶) > PCE(x ≥ 10⁶).

Moreover,

6·10⁶ − 2·10⁶ > 2·10⁶ − 10⁶.

So there exists an f ∈ F such that

E(BE, f) > E(BD, f),
E(BE, f) > E(CD, f),
E(BE, f) > E(CE, f).
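The outcome forms above arise by summing the prizes of the two lotteries played independently. Assuming the standard Allais options introduced earlier in the book (B = (1; 10⁶), C = (0.10, 0.89, 0.01; 5·10⁶, 10⁶, 0), D = (0.11, 0.89; 10⁶, 0), E = (0.10, 0.90; 5·10⁶, 0); this identification is our assumption), the probabilities in the text, including 0.7932 for CD, can be recomputed:

```python
# Illustration: joint lotteries as sums of independent lotteries (assumed
# definitions of B, C, D, E; see the lead-in above).
from collections import defaultdict

def convolve(l1, l2):
    """Distribution of the summed prizes of two independent lotteries."""
    out = defaultdict(float)
    for p1, x1 in l1:
        for p2, x2 in l2:
            out[x1 + x2] += p1 * p2
    return dict(out)

M = 10**6
B = [(1.0, M)]
C = [(0.10, 5 * M), (0.89, M), (0.01, 0)]
D = [(0.11, M), (0.89, 0)]
E = [(0.10, 5 * M), (0.90, 0)]

CD = convolve(C, D)
assert abs(CD[M] - 0.7932) < 1e-12     # matches the outcome form in the text
assert abs(CD[0] - 0.0089) < 1e-12
BD = convolve(B, D)
assert abs(BD[2 * M] - 0.11) < 1e-12
print(sorted(CD.items()))
```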
Hence the Allais example does not show that E(A, f) fails as a choice rule generator at level (F₃, 0). We can also, see section 5.6.3, construct a system that satisfies Savage's axioms and contains the inequalities

BE > BD,
BE > CD,
BE > CE.

Finally, such constructions can also be performed for a host of axiom systems for expected utility, see Malmnäs (1990).

5.8 Lotteries with Non-monetary Prizes

In this case it may happen that the rule of dominance is violated by people without their being irrational. This can be seen from the following example from Mas-Colell et al. (1995). Suppose that someone is offered a trip to Venice including a week's stay at Hotel Cipriani, a poster of a genre painting by the Venetian painter Giandomenico Tiepolo, or a piece of chocolate. If these offers are labeled a, b, c, then most people would prefer a to b and b to c. But if they instead are offered tickets to the lotteries (0.99, 0.01; a, b) or (0.99, 0.01; a, c), then a preference for the second lottery over the first one is not uncommon and not unreasonable. A case for such a preference could take the following form: having the poster on the wall would remind me of a missed chance of seeing Venice, as would giving it away to someone close. It would also be embarrassing to give it to a charity organization, since its commercial value is so low. This also seems to make any attempt at selling it unattractive.


References
About, P & Boy, M. (1983). La Correspondance de Blaise Pascal et de Pierre
de Fermat, Fontenay aux Roses.
Allais, M. (1953). Le Comportement de l'Homme devant le Risque: Critique des Postulats et Axiomes de l'École Américaine. Econometrica, 21, pp. 503-546.
Allais, M. (1979). The Foundations of a Positive Theory of Choice involving
Risk and a Criticism of the Postulates and Axioms of the American School.
In Allais M. & Hagen, O. (ed.) Expected Utility and the Allais Paradox.
Dordrecht: Reidel, pp. 27-145.
Bell, D. (1982). Regret in Decision Making under Uncertainty. Operations Research, 30, pp. 961-981.
Bell, D. (1986). Disappointment in Decision Making under Uncertainty. Operations Research, 33, pp. 1-27.
Bergström, L. (1991). Cykliska preferenser. In Rabinowicz, W. (ed.) Valets vedermödor. Sex beslutsteoretiska studier. Stockholm: Thales.
Bernoulli, D. (1738). Specimen theoriae novae de mensura sortis (English
translation). In Bernoulli, D. (1954). Exposition of a New Theory on the
Measurement of Risk. Econometrica, 22, pp. 23-36.
Bernoulli, D. (1954). Exposition of a New Theory on the Measurement of
Risk. Econometrica, 22, pp. 23-36.
Bernoulli, J. (1713). Ars Coniectandi. Basel.
Blum, J.R. & Rosenblatt, J. (1967). On Partial a Priori Information in
Statistical Inference. Ann. Math. Stat., 38, pp. 1671-1678.
Danielson, Mats (1997). Computational decision analysis. Diss., Department
of Computer and Systems Sciences, KTH, Stockholm.
Debreu, G. (1954). Representation of a Preference Ordering by a Numerical
Function. In Thrall, R.M., Coombs, C.H. & Davis, R.L. (eds.). Decision
Processes. New York: John Wiley
Ekenberg, Love (1994). Decision support in numerically imprecise domains.
Diss., Stockholm University, Stockholm.


Ekenberg, Love (2005). A Unified Framework for Indeterminate Probabilities


and Utilities. Diss., Stockholm University, Stockholm.
Feller, W. (1968). An Introduction to Probability Theory and Its Applications.
Volume I, 3rd Edition, New York: John Wiley & Sons.
Fishburn, P. (1981). Subjective Expected Utility. Theory and Decision, 5,
pp. 205-42.
Fishburn, P. (1983). Transitive Measurable Utility. Journal of Economic
Theory, 31, pp. 293-317.
Green, J. & Jullien, B. (1988). Ordinal Independence in Nonlinear Utility
Theory. Journal of Risk and Uncertainty, 1, pp. 355-387.
Hagen, O. (1969). Separation of Cardinal Utility and Specific Utility of Risk
in Theory of Choices under Uncertainty. Statsøkonomisk Tidsskrift, 3, pp.
81-107.
Hagen, O. (1972). A New Axiomatization of Utility under Risk. Teorie a
Metoda, IV/2, pp. 55-80.
Hagen, O. (1979). Towards a Positive Theory of Preferences under Risk. In
Allais, M. & Hagen, O. (eds.). Expected Utility and the Allais Paradox.
Dordrecht: Reidel, pp. 271-302.
Herstein, I.N. & Milnor, J. (1953). An Axiomatic Approach to Measurable Utility. Econometrica, 21, pp. 291-297.
Hodges Jr, J.L. & Lehmann, E.L. (1952). The Use of Previous Experience
in Reaching Statistical Decisions. Ann. Math. Stat., 23, pp. 396-407.
Hurwicz, L. (1951). Some Specification Problems and Applications to Econometric Models. Econometrica, 19, pp. 343-344.
Jackson, D.A. et al. (1970). G2-Minimax Estimators in the Exponential
Family. Biometrika, 57, pp. 439-443.
Johansson, Henrik (2003). Decision Analysis in Fire Safety Engineering - Analysing Investments in Fire Safety [Electronic resource]. Diss., Department of Fire Safety Engineering, Lund University, P.O. Box 118, SE-221 00 Lund, Sweden. Available online:
http://lup.lub.lu.se/record/466080/file/2064108.pdf
Kahneman, D. (2011). Thinking Fast and Slow. New York: Farrar, Straus and Giroux.
Kahneman, D. & Tversky, A. (1979). Prospect Theory: An Analysis of
Decision under Risk. Econometrica, 47, pp. 263-293.
Kofler, E. & Menges, G. (1976). Entscheidungen bei unvollständiger Information. Berlin, Heidelberg, New York: Springer Verlag.
Levi, I. (1974). On Indeterminate Probabilities. The Journal of Philosophy, 71(13), pp. 391-418.
Loomes, G. & Sugden, R. (1982). Regret Theory: an Alternative Theory
of Rational Choice under Uncertainty. The Economic Journal, 92, pp.
805-824.
Loomes, G. & Sugden, R. (1986). Disappointment and Dynamic Consistency
in Choice under Uncertainty. Review of Economic Studies, LIII, pp. 271-282.
MacCrimmon, K.R. & Larsson, S. (1979). Utility Theory: Axioms versus
Paradoxes. In Allais, M. & Hagen, O. (eds.). Expected Utility and the
Allais Paradox. Dordrecht: Reidel, pp. 333-409.
Malmnäs, P.E. (1981). From Qualitative to Quantitative Probability. Diss., Stockholm: Almqvist & Wiksell International.
Malmnäs, P.E. (1990). Axiomatic Justifications of the Utility Principle: A Formal Investigation. Research Report HSFR 677/87.
Malmnäs, P.E. (1994). Axiomatic Justifications of the Utility Principle: A Formal Investigation. Synthese, 99, pp. 233-249.
Mas-Colell, A. et al. (1995). Microeconomic Theory, New York: Oxford UP.
Menger, K. (1934). Das Unsicherheitsmoment in der Wertlehre. Zeitschrift für Nationalökonomie, V, pp. 459-485.
Oddie, G. & Milne, P. (1990). Act and Value. Theoria, LVII, pp. 42-76.
Quiggin, J. (1982). A Theory of Anticipated Utility. Journal of Economic
Behavior and Organization, 3, pp. 323-343.
Ramsey, F.P. (1931). Truth and Probability. In Ramsey, F.P. (ed.) The Foundations of Mathematics and Other Logical Essays. New York: Harcourt, Brace, pp. 156-198.
Randles, R.H. & Hollander, M. (1971). Γ-Minimax Selection Procedures in Treatments versus Control Problems. Ann. Math. Stat., 42, pp. 330-341.
Savage, L.J. (1954). The Foundations of Statistics. New York: John Wiley
(2nd ed. 1972, Dover, New York).
Solomon, D.L. (1972). Λ-Minimax Estimation of a Multivariate Location Parameter. Journal of the American Statistical Association, 67, pp. 641-646.
Sugden, R. (1986). New Developments in the Theory of Choice under Uncertainty. Bulletin of Economic Research, 38, pp. 1-24.
Sundgren, David (2011). The Apparent Arbitrariness of Second-Order Probability Distributions [Electronic resource]. Diss., Department of Computer
and Systems Sciences, Stockholm University, Stockholm. Available online:
http://su.diva-portal.org/smash/get/diva2:397258/FULLTEXT02.pdf
von Neumann, J. & Morgenstern, O. (1944). Theory of Games and Economic
Behavior. Princeton: Princeton University Press (2nd ed. 1947, 3rd ed.
1953).
Watson, S.R. (1974). On Bayesian Inference with Incompletely Specified Prior Distributions. Biometrika, 61, pp. 193-196.
Weinstein, M.C. & Fineberg, H.V. (1980). Clinical Decision Analysis. Philadelphia: W.B. Saunders.
Yaari, M. (1987). The Dual Theory of Choice under Risk. Econometrica, 55,
pp. 95-115.


Appendices


Appendix 1
Elementary Probability
This appendix contains a short treatment of some parts of elementary probability theory. The emphasis is on notions and results fundamental for an
understanding of classical decision theory. For a full treatment of elementary
probability theory Feller (1968) is recommended.
The primary notion of elementary probability theory is that of a finite random experiment, like throwing a die, tossing a coin, playing a soccer game, or shooting at a paper target, a fixed number of times. The main feature of such an experiment is that the outcomes, denoted ωi (small omega), will vary if the experiment is to be repeated, but all be contained in a given set called a sample space, denoted by Ω (capital omega).
When throwing a die, the sample space consists of the outcomes ω1 = 1, ω2 = 2, ω3 = 3, ω4 = 4, ω5 = 5 and ω6 = 6. The experiment of tossing a coin has a sample space with the outcomes ωhead = head and ωtail = tail. Playing a game of soccer or shooting at a paper target, however, has a sample space which is not quite as easily defined as throwing a die or tossing a coin. More on this later.
The sample spaces of throwing a die, tossing a coin or playing a game of
soccer are all discrete, meaning countable. Other discrete sample spaces are
the number of scores in a wrestling match, the number of faulty products in
a batch, or the number of white cars on a given street on a rainy day. Note
that we are here dealing with the natural numbers 0, 1, 2, 3 and so on.
When shooting at a paper target, the sample space is continuous, meaning that the outcomes are not countable. We can define an outcome (x, y) of shooting at a paper target as the number of centimeters away from the center of the target, horizontally and vertically. But the distance from the center, in any given direction, can be seen as a number along a continuous line, and thus the sample space itself is continuous.
Another property of a sample space is that it can be either finite or infinite. The throwing of a die has a finite sample space, because the only possible outcomes are 1, 2, 3, 4, 5 or 6. Likewise, the tossing of a coin has a finite sample space. The number of goals in a soccer game, however, is in theory an infinite sample space, since there is no maximum limit on goals. The sample space when shooting at a paper target is also infinite, since there are an infinite number of different distances from the center.
To summarize, a discrete sample space contains countable things, such
as objects, while a continuous sample space contains measurable properties
such as length, height, and speed. A sample space with a fixed number of
possible outcomes is finite, and a sample space with an unknown or unending
number of possible outcomes, is infinite.
In probability theory we are interested in something called events (often denoted A, B, C, . . . or A1, . . . , Am, B1, . . . , Bn, . . . ). An event consists of a set of outcomes, denoted {ωi, . . . , ωn}, which can be either a single outcome or several outcomes. When an event consists of only a single outcome, it is sometimes referred to as a simple event. So, for example, if we throw a die and want to specify an event A1 as the event that we throw a 6, and an event A2 that we throw an even number, we can write these as
    A1 = {6},    (1.1)
    A2 = {2, 4, 6}.    (1.2)
For a soccer game we can denote the event B, that the number of goals will be either 1, 3 or 4, with

    B = {1, 3, 4}.    (1.3)
As events when shooting at a paper target, we can specify C1 to be the event that the hit will be less than 5 cm from the center, the event C2 that the hit will be in the upper half sector, and the event C3 that the hit will be in the right half sector.

    C1 = {hit less than 5 cm from center}    (1.4)
    C2 = {hit in upper half sector}    (1.5)
    C3 = {hit in right half sector}    (1.6)

The events C1, C2 and C3 each consist of an infinite number of outcomes (in this case hits), which should make it obvious why probability is about events, rather than about outcomes.
The sample space Ω is also called the certain event. This is because it encompasses all possible outcomes of an experiment, and thus we are certain that the event D = Ω will occur.¹ Trying to picture the certain event of a finite sample space, as in the case of throwing a die, is quite easy. We can be certain to get either a 1, 2, 3, 4, 5 or a 6. But when it comes to infinite sample spaces, like all possible outcomes of a soccer game, it becomes difficult. How can one picture the unknown? By using negation. Looking back at the event B in (1.3), we could define the certain event of the soccer game to be either 1, 3 or 4, or something other than 1, 3 or 4, which together encompasses all possible outcomes of a soccer game.
Since the aim of probability theory is to give a mathematical model of random experiments, it freely employs set theoretic notation, and that is why the results are called events, which in turn are defined as subsets of the sample space Ω. A subset is a set consisting entirely of elements found in another set. If we have the sets A = {1, 2, 3, 4} and B = {1, 3}, then, since all the elements in B can be found in A, we call B a subset of A, written as

    B ⊆ A.    (1.7)

The symbol ⊆ means a subset of, or equal to. Thus B ⊆ A even if B = {1, 2, 3, 4}.
New events can be created from already defined events, with the help of the set theoretic operations of union (∪), intersection (∩), complement (′),² and difference (−). Let A and B be two events (i.e. sets of outcomes), and let Ω be the sample space. Then A ∪ B is the event of all outcomes in at least one of A and B, A ∩ B is the event of the outcomes in both A and B, and A′ is the event of all outcomes in the sample space Ω excluding the outcomes in A. The complement of the sample space is the so called empty set, Ω′ = ∅.
Consider the experiment of throwing a single die. The sample space is Ω = {1, 2, 3, 4, 5, 6}, which is also the certain event.

    Ω = {1, 2, 3, 4, 5, 6}

Let A be the event that the outcome is an even number, A = {2, 4, 6}, let B be the event that the outcome is equal to or less than three, B = {1, 2, 3}, and let C be the event that the outcome is 1, C = {1}.

    A = {2, 4, 6}
    B = {1, 2, 3}
    C = {1}

¹ For this definition to hold, we need to accept that an event occurs if and only if one of its outcomes occurs.
² Other common ways to write the complement of, for example, the set S are S̄, ∁S and S′.
Using the union operator we can create the events A ∪ B = {2, 4, 6} ∪ {1, 2, 3} = {1, 2, 3, 4, 6}, and B ∪ C = {1, 2, 3} ∪ {1} = {1, 2, 3}. As you can see, any of the elements in A ∪ B can be found in either A or B, or in both A and B. Likewise, any of the elements in B ∪ C can be found in either B or C, or in both B and C.

    A ∪ B = {1, 2, 3, 4, 6}
    B ∪ C = {1, 2, 3}

With the intersection operator we can define the events A ∩ B = {2, 4, 6} ∩ {1, 2, 3} = {2}, and A ∩ C = {2, 4, 6} ∩ {1} = ∅. Here, all elements in the resulting set A ∩ B are found in both A and B. The same applies to A ∩ C, because the empty set is a subset of every set, and thus a subset of both A and C. Note that the intersection of two events A and B is sometimes written AB.

    A ∩ B = {2}
    A ∩ C = ∅

The complement is a unary operator, which means that it is applied to only one operand at a time, contrary to the previous operators union ∪ and intersection ∩. The complement of A is A′ = {1, 3, 5}, and the complement of C is C′ = {2, 3, 4, 5, 6}.

    A′ = {1, 3, 5}
    C′ = {2, 3, 4, 5, 6}

Using the difference operator we can write the event all outcomes in B, but not in C as B − C, and the event all outcomes in C, but not in B as C − B. However, by adding parentheses, we see that this is the same as using a combination of intersection and complement, as in B − C = BC′ = B ∩ C′ = {2, 3} and C − B = CB′ = C ∩ B′ = ∅.

    B − C = {2, 3}
    C − B = ∅
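As a sketch, the die events above can be reproduced with Python's built-in set type, whose operators |, & and - mirror union, intersection and difference; the variable names below simply follow the running example (the name sample_space is our own).

```python
# Single-die experiment; events follow the running example in the text.
sample_space = {1, 2, 3, 4, 5, 6}
A = {2, 4, 6}   # even outcome
B = {1, 2, 3}   # outcome equal to or less than three
C = {1}         # outcome is 1

union_AB = A | B                  # A ∪ B = {1, 2, 3, 4, 6}
intersection_AB = A & B           # A ∩ B = {2}
complement_A = sample_space - A   # A′ = Ω − A = {1, 3, 5}
difference_BC = B - C             # B − C = B ∩ C′ = {2, 3}

print(union_AB, intersection_AB, complement_A, difference_BC)
```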
To picture the relations between events consisting of arbitrary outcomes, one can use Venn diagrams, where the outer boundary represents the sample space, and the circles (or possibly any shape) the events. Figures 1.1 through 1.7 give a quick repetition of the set operators with the help of Venn diagrams.

Figure 1.1: The shaded region represents the sample space (or certain event) Ω.

Figure 1.2: The shaded region represents the event A.

Figure 1.3: The shaded region represents the event A′, the complement of A.
Let us look at an experiment of throwing one die two times. The certain event can be represented by the sample space Ω = {(i, j) : 1 ≤ i, j ≤ 6}. Essentially, this means that the sample space consists of the two throws i and j, in that order, each of which can take on a number from 1 through 6. Now, let A be the event that both throws yield even numbers, and B the event that both throws give as results numbers less than four; then A ∩ B is the event two is the common outcome of the two throws. Moreover,

Figure 1.4: The shaded region represents the event A ∪ B, the union of A and B.

Figure 1.5: The shaded region represents the event A ∩ B, the intersection of A and B.

Figure 1.6: The shaded region represents the event A − B, the difference between A and B.

Figure 1.7: Here the events A and B are disjoint, i.e. their intersection is the empty set ∅.
A ∪ B is the event neither the first throw nor the second one has five as an outcome, and A′ the event at least one of the two throws yields an odd number.
To each event of a stochastic experiment we can associate a non-negative number indicating how probable the event is. Since probable is best understood as probable in comparison with the certain event, choosing numbers between 0 and 1 appears as the natural choice, where 1 is the probability of the certain event.
For an experiment in which all outcomes are equally likely, we define the probability of each outcome as 1/n, where n is the total number of outcomes. Thus, when throwing a die, the probability of getting a one is 1/6, the probability of getting a two is 1/6, and so on. When tossing a coin, the probability of getting a head is 1/2, which is the same as the probability of getting a tail. Generally, the probability of an event is the number of favorable outcomes over the total number of outcomes.
Let us look at how the probability of events relates to the set operators union ∪, intersection ∩, and complement ′, which also involves the concept of conditional probability. We will again make use of the above events of the throwing of a die: A = {2, 4, 6}, B = {1, 2, 3}, and C = {1}. The probability of event A is 3/6, because the number of favorable outcomes is 3, and the total number of outcomes is 6. Consequently, the probability of B is also 3/6, and the probability of C is 1/6.


    P(A) = P({2, 4, 6}) = 3/6
    P(B) = P({1, 2, 3}) = 3/6
    P(C) = P({1}) = 1/6
The probability of the union of two events depends on whether the events are disjoint or not. Two events are disjoint if their intersection is the empty set ∅, i.e. if they have no outcomes in common. Thus, the events A and C are disjoint, but A and B are not.
Starting with the probability of the union of A and C: since

    A ∪ C = {1, 2, 4, 6}

then

    P(A ∪ C) = P({1, 2, 4, 6}) = 4/6.

From this follows that if two events are disjoint, as in the case of A and C, the probability of the union of the events is the same as the sum of the probabilities of each event: P(A ∪ C) = P(A) + P(C). This is true for an arbitrary number of events, as long as they are pairwise disjoint.
The probability of the union of A and B is calculated differently, since these events are not disjoint, having the outcome 2 in common. Just looking at the union of the two events, A ∪ B = {1, 2, 3, 4, 6}, it is quite obvious that the probability is P(A ∪ B) = 5/6. However, adding the probabilities of events A and B gives us P(A) + P(B) = 3/6 + 3/6 = 1, which clearly is not correct. The error lies in the fact that we counted the probability of the intersection of the events twice. Hence we need to subtract the probability of the intersection to get the correct answer.

    P(A ∪ B) = P(A) + P(B) − P(A ∩ B) = 5/6
    P(A ∪ B) = P({2, 4, 6}) + P({1, 2, 3}) − P({2}) = 5/6
If two events are disjoint, i.e. their intersection is the empty set ∅, the probability of the intersection is 0. It is like asking what the probability of nothing is; something that will remain zero, at least until the end of the universe. Take note, however, that nothing is not the same as an unexpected event.
In order to calculate the probability of the intersection of two events that are not disjoint, we also need to look at conditional probability. Essentially this refers to the probability of some event, given that some other event has already occurred. For example, the probability of some team winning a soccer game is likely to be higher given that they lead by 3-0 than if the score were even. Actually, even though it is not made explicit, one can argue that probabilities are always conditional, because assumptions are made about the state prior to the result of the experiment. Consequently, to be meticulous about probabilities, one should for example not only ask about the probability of getting a head when tossing a coin, but about the probability of getting a head given that the coin is fair, and so on.
Although it might seem a bit lame to exemplify conditional probability with the experiment of throwing a die, I will still do it for the sake of clarity. Imagine that you have thrown a die without looking at it. An observer tells you that the die shows either 1, 2, 3 or 6; a subset of the sample space. At this point, asking for the probability of getting a 5 or a 6 is the same as asking for the probability of getting a 6 (the intersection of the two events), given the event that the result is either 1, 2, 3 or 6, since we know it cannot be 5. With this new information, the probability of getting a 5 or a 6 is 1/4 (one favorable outcome, the 6, out of four possible outcomes).
Let A1 = {1, 2, 3, 6} and A2 = {5, 6} represent the events above. Then we have

    P(A1) = P({1, 2, 3, 6}) = 4/6
    P(A2) = P({5, 6}) = 2/6
    P(A1 ∩ A2) = P({6}) = 1/6
Thinking again about the conditional probability of A2, given that A1 has occurred, it is obvious that the only possible outcome of A2 left is the intersection of A1 and A2. Without the condition of A1 having occurred, the probability of the intersection is 1/6, and it is obvious that once A1 actually has occurred, the probability of the intersection is increased. But by how much? Let us for a moment look at the case when the condition is the intersection itself.

If the condition is the intersection itself, it should come as no surprise that the conditional probability is 1, i.e. P(A1 ∩ A2) given A1 ∩ A2 is 1. Since P(A1 ∩ A2) = 1/6, it is obvious that the factor we need to multiply P(A1 ∩ A2) by is 6. Writing 6 as 6/1, it looks like the inverse of P(A1 ∩ A2), which is exactly what it is. The conditional probability of some event A1, given another event A2, is calculated as the probability of their intersection, multiplied by the inverse of the probability of A2.
More formally, the conditional probability of some event B given some event A is written P(B|A), and has the equation

    P(B|A) = P(A ∩ B)P(A)⁻¹ = P(A ∩ B)/P(A)

Let us now calculate the probability of A1, given that A2 has occurred, pretending we do not know P(A1 ∩ A2). Initially we have

    P(A1|A2) = P(A1 ∩ A2)/P(A2).

Since we already know P(A2) we can write it as

    P(A1|A2) = P(A1 ∩ A2)/(1/3)

and rearrange the factors to get

    P(A1 ∩ A2)/P(A1|A2) = 1/3.

Exploiting the fact that any probability is a number from 0 to 1, and in this particular case some rational number n/6, we can for a moment rewrite

    P(A1 ∩ A2) = x/6  and  P(A1|A2) = y/6

where x and y are unknown. This gives us

    (x/6)/(y/6) = 1/3.

Rewriting the left fraction as a multiplication gives us

    (x/6)(6/y) = x/y = 1/3

and consequently x = 1, y = 3. Now we have not only gotten the answer to P(A1|A2), but also the answer to P(A1 ∩ A2):

    P(A1 ∩ A2) = 1/6
    P(A1|A2) = 3/6 = 1/2

Hence, the formula for P(A ∩ B) is

    P(A ∩ B) = P(A)P(B|A)    (1.8)

However, if the events A and B are stochastically independent of each other, as in the case of two throws of a die, then the probability of the intersection is simply P(A ∩ B) = P(A)P(B), with the condition that P(A) > 0 and P(B) > 0. For example, the probability of getting a 2 or a 3 on the first throw, and a 4, 5 or a 6 on the second throw, is (2/6)(3/6) = 1/6. However, this is in reality the same formula as in (1.8) above; it is just that if the events are independent, the factor P(B|A) is equal to P(B).
Now on to the probability of the complement of an event, which is rather straightforward. Since the complement of an event A is the event that A does not occur, and the probability of the certain event is 1, then

    P(A′) = 1 − P(A)
To summarize, let A and B be two events. If they are disjoint, i.e. A ∩ B = ∅, then the probability of their union is

    P(A ∪ B) = P(A) + P(B)    (1.9)

If the events A and B are not disjoint, then

    P(A ∪ B) = P(A) + P(B) − P(A ∩ B)    (1.10)

The formula for calculating the probability of the intersection, if the events are stochastically independent and P(A) > 0 and P(B) > 0, is

    P(A ∩ B) = P(A)P(B)    (1.11)

and if they are dependent

    P(A ∩ B) = P(A)P(B|A)    (1.12)

The conditional probability is

    P(B|A) = P(A ∩ B)/P(A)    (1.13)

Finally, the probability of the complement of an event A is

    P(A′) = 1 − P(A)    (1.14)
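For equally likely outcomes, the formulas (1.9)-(1.14) can all be checked by simple counting. A minimal sketch in Python, using a hypothetical helper prob that divides favorable outcomes by total outcomes:

```python
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}

def prob(event):
    """P(event) as favorable outcomes over total outcomes (hypothetical helper)."""
    return Fraction(len(event), len(sample_space))

A, B, C = {2, 4, 6}, {1, 2, 3}, {1}

# (1.10): P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
assert prob(A | B) == prob(A) + prob(B) - prob(A & B)
# (1.9): A and C are disjoint, so P(A ∪ C) = P(A) + P(C)
assert prob(A | C) == prob(A) + prob(C)
# (1.13): conditional probability P(B|A) = P(A ∩ B)/P(A)
assert prob(A & B) / prob(A) == Fraction(1, 3)
# (1.14): P(A′) = 1 − P(A)
assert prob(sample_space - A) == 1 - prob(A)
print("formulas verified on the die example")
```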

A real-valued function defined on a finite sample space Ω is called a stochastic variable (or random variable).³ If such a function assumes the value 1 if an event A occurs, and the value 0 if the event A does not occur (i.e. if the complement A′ occurs), then it is called the indicator of the event A and is labeled 1A. Thus

    1A = 1 if A occurs;  1A = 0 if A′ occurs.
Indicators have the following useful properties:

    1A ≤ 1B if and only if A ⊆ B

because, if the event A is a subset of B, we know that if A occurs, then B must occur. However, that B occurs does not necessarily mean that A occurs (see figure 1.8).

    1A′ = 1 − 1A

This implies that if the indicator function of the complement of A is 1, i.e. the event A does not occur, then the indicator function of A has to be 0, which is exactly what it is when A does not occur.
³ A stochastic variable is a variable that can take on any value from a specific range, but the exact value cannot be predicted with certainty, only probabilistically. Thus, the outcome of throwing a die, or the result of a soccer game, is a stochastic variable.

Figure 1.8: Here the event A is a subset of event B. Since the outcomes in
A also are in B, then if A occurs B will also occur. But the event B can also
occur without event A occurring, since B obviously contains outcomes that
are not in A.

    1A∩B = 1A · 1B

Contrary to the probability of the intersection, the indicator function of the intersection is much easier to calculate, since it only takes on the values 1 and 0. Thus, if both A and B occur, then the indicator function of each of them is 1, and thus the product of these is also 1.

    1A∪B = 1A + 1B − 1A · 1B

Since the indicator function can only take on the values 0 or 1, we must subtract the product of the indicator functions, which is 0 if the event is only A or B, and 1 if the event is the intersection of the two events.
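The three indicator identities can be verified outcome by outcome; a small sketch (the helper ind is ours, not from the text):

```python
def ind(event, outcome):
    """Indicator of an event: 1 if the outcome lies in the event, else 0."""
    return 1 if outcome in event else 0

sample_space = {1, 2, 3, 4, 5, 6}
A, B = {2, 4, 6}, {1, 2, 3}

for w in sample_space:
    # Intersection: the indicator of A ∩ B equals the product of the indicators.
    assert ind(A & B, w) == ind(A, w) * ind(B, w)
    # Union: indicator of A ∪ B equals the sum minus the product.
    assert ind(A | B, w) == ind(A, w) + ind(B, w) - ind(A, w) * ind(B, w)
    # Complement: indicator of A′ equals 1 minus the indicator of A.
    assert ind(sample_space - A, w) == 1 - ind(A, w)
print("indicator identities hold for every outcome")
```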
If A1, . . . , An is a division of Ω into disjoint subsets (A1 ∪ · · · ∪ An = Ω), then X = x1 · 1A1 + · · · + xn · 1An is a simple stochastic variable, where each value xi is the value X takes on the corresponding event Ai. Take again the example of throwing a die. Let A1 be the event that the die shows a 1, let A2 be the event that the die shows a 2, and so on. Let the value of each event be the number on the die in dollars. Thus, if the die shows a 1 we get one dollar, if the die shows a 2 we get two dollars, etc. The stochastic variable X can then take on any value from 1 through 6, since in this case X = 1 · 1A1 + · · · + 6 · 1A6.
The expected value of X is defined as E(X) = x1 P(A1) + · · · + xn P(An), where xi is the value X takes on the event Ai. Consequently we can also write the expected value as E(X) = x1 P(x1) + · · · + xn P(xn). Looking at the same example as in the above paragraph, the expected value of throwing a die would be E(X) = 1 · (1/6) + · · · + 6 · (1/6) = 3.5.
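As a sketch, the dollar-valued die example gives E(X) = 3.5 directly:

```python
from fractions import Fraction

# Each face pays its value in dollars and occurs with probability 1/6.
values = [1, 2, 3, 4, 5, 6]
p = Fraction(1, 6)

expected = sum(x * p for x in values)  # E(X) = 1·(1/6) + ... + 6·(1/6)
print(expected)  # 7/2, i.e. 3.5
```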
Expected values satisfy the following conditions: E(c · X) = c · E(X) for any constant c, and E(X) ≤ E(Y) if X ≤ Y. Moreover, the following equality holds:


Addition property
E(X + Y ) = E(X) + E(Y )
Proof. Set

    X = x1 · 1A1 + · · · + xn · 1An
    Y = y1 · 1B1 + · · · + ym · 1Bm

The events Ai ∩ Bj (i = 1, . . . , n; j = 1, . . . , m) form a division of Ω into disjoint subsets, and on Ai ∩ Bj the sum X + Y takes the value xi + yj, so

    X + Y = Σi Σj (xi + yj) · 1Ai∩Bj

The expected value of the sum is therefore

    E(X + Y) = Σi Σj (xi + yj)P(Ai ∩ Bj) = Σi Σj xi P(Ai ∩ Bj) + Σi Σj yj P(Ai ∩ Bj)

Since B1, . . . , Bm divide Ω, we have Σj P(Ai ∩ Bj) = P(Ai), and likewise Σi P(Ai ∩ Bj) = P(Bj). Hence

    E(X + Y) = Σi xi P(Ai) + Σj yj P(Bj) = E(X) + E(Y)

which we set out to prove.


Multiplication property
If the stochastic variables X and Y are independent, then

    E(XY) = E(X) · E(Y)    (1.15)

Proof. Set

    X = x1 · 1A1 + · · · + xn · 1An
    Y = y1 · 1B1 + · · · + ym · 1Bm

On the event Ai ∩ Bj the product XY takes the value xi yj, so

    XY = Σi Σj xi yj · 1Ai∩Bj

Since X and Y are independent we have P(Ai ∩ Bj) = P(Ai)P(Bj), and thus the expected value of the product is

    E(XY) = Σi Σj xi yj P(Ai)P(Bj) = (Σi xi P(Ai))(Σj yj P(Bj)) = E(X)E(Y)

Markov's inequality Let X ≥ 0 be a stochastic variable and λ > 0, then

    P(X ≥ λ) ≤ E(X)/λ

Proof. Let A be the event that X is greater than or equal to λ, and 1A the indicator of A. Then λ · 1A ≤ X, and the following holds:

    P(X ≥ λ) = E(1A) ≤ E(X/λ) = E(X)/λ

and thus we have proven the inequality.
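For the die (X the face value), Markov's inequality with λ = 5 gives P(X ≥ 5) = 2/6, which is indeed at most E(X)/5 = 7/10; a sketch:

```python
from fractions import Fraction

values = [1, 2, 3, 4, 5, 6]
p = Fraction(1, 6)

expectation = sum(x * p for x in values)   # E(X) = 7/2
lam = 5
tail = sum(p for x in values if x >= lam)  # P(X ≥ 5) = 2/6

# Markov's inequality: P(X ≥ λ) ≤ E(X)/λ
assert tail <= expectation / lam
print(tail, expectation / lam)  # 1/3 7/10
```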


Chebyshev's inequality Let X be a stochastic variable with expected value μ = E(X) and standard deviation σ. If k > 0 is a real number then

    P(|X − μ| ≥ kσ) ≤ 1/k²    (1.16)

What this means is that the probability that the actual outcome lies more than k standard deviations from the expected value is at most 1/k². The greater the number k, the smaller the probability that the outcome lies outside the interval μ ± kσ. By setting ε = kσ we can write the inequality as

    P(|X − μ| ≥ ε) ≤ σ²/ε²    (1.17)

Both ways of writing Chebyshev's inequality are common in the literature.


Proof. First we rewrite the inequality in (1.17) as

    ε²P(|X − μ| ≥ ε) ≤ σ²    (1.18)

We start by writing the variance σ² as

    σ² = Σi (xi − μ)²P(xi)

which obviously is greater than or equal to the same sum restricted to those xi which satisfy the inequality |xi − μ| ≥ ε; hence we have

    Σi (xi − μ)²P(xi) ≥ Σ|xi−μ|≥ε (xi − μ)²P(xi)

Since |xi − μ| ≥ ε for every term of the restricted sum, we can further state that

    Σ|xi−μ|≥ε (xi − μ)²P(xi) ≥ Σ|xi−μ|≥ε ε²P(xi)

Then we can rewrite the right hand side as

    Σ|xi−μ|≥ε ε²P(xi) = ε² Σ|xi−μ|≥ε P(xi)

Now, since the events {X = x1}, . . . , {X = xn} are pairwise disjoint, the sum of all P(xi) where |xi − μ| ≥ ε is the same as the probability that |X − μ| ≥ ε, and consequently we can write

    ε² Σ|xi−μ|≥ε P(xi) = ε²P(|X − μ| ≥ ε)

Looking back we have shown that

    σ² ≥ ε²P(|X − μ| ≥ ε)

which is the same as

    P(|X − μ| ≥ ε) ≤ σ²/ε²

and thus we have proved the inequality.


Two simple random variables X = x1 · 1A1 + · · · + xn · 1An and Y = y1 · 1B1 + · · · + ym · 1Bm are stochastically independent if the classes A1, . . . , An and B1, . . . , Bm are stochastically independent. For independent stochastic variables, the fundamental multiplication property (1.15) above holds.

The variance of a stochastic variable X, often denoted σ², V(X) or Var(X), is a measure of the mean variation of X from its expected value. It is defined as

    V(X) = E([X − E(X)]²)
Remember here the definition of a simple stochastic variable X = x1 · 1A1 + · · · + xn · 1An, and that the expected value is the weighted sum of the different possible values of that stochastic variable:

    E(X) = x1 P(x1) + · · · + xn P(xn).

Applying this to the variance, we can also write it as

    V(X) = P(x1)(x1 − E(X))² + · · · + P(xn)(xn − E(X))².
However, expanding the square in the first definition we get

    V(X) = E([X − E(X)]²) = E(X² − 2XE(X) + E(X)²)

Using the properties we have proved above, and remembering that the expected value of a constant is equal to the constant itself (this also means that the expected value of the expected value is equal to the expected value), we continue with

    V(X) = E(X²) − E(2XE(X)) + E(E(X)²)
         = E(X²) − 2E(X)E(E(X)) + E(E(X)²)
         = E(X²) − 2E(X)E(X) + E(X)²
         = E(X²) − 2E(X)² + E(X)²
         = E(X²) − E(X)²

The last expression makes calculating the variance much easier than using the formula in the definition.
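Both expressions give the same number, as the die example shows (V(X) = 35/12 ≈ 2.92); a sketch:

```python
from fractions import Fraction

values = [1, 2, 3, 4, 5, 6]
p = Fraction(1, 6)

ex = sum(x * p for x in values)                      # E(X) = 7/2
var_def = sum(p * (x - ex) ** 2 for x in values)     # E([X − E(X)]²)
var_short = sum(p * x ** 2 for x in values) - ex**2  # E(X²) − E(X)²

assert var_def == var_short
print(var_def)  # 35/12
```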
The variance of a stochastic variable X, multiplied by some constant b, is

    V(bX) = b²V(X)

Proof.

    V(bX) = E([bX − bE(X)]²)
          = E([b(X − E(X))]²)
          = E(b²[X − E(X)]²)
          = b²E([X − E(X)]²)
          = b²V(X)
Bienaymé's equality

    V(X1 + · · · + Xn) = V(X1) + · · · + V(Xn)

if X1, . . . , Xn are pairwise independent.

Proof. Note first that E(X − E(X)) = E(X) − E(E(X)) = E(X) − E(X) = 0 and that X + a and Y + b are independent so long as X and Y are independent. Then

    V(X1 + · · · + Xn) = E([(X1 + · · · + Xn) − E(X1 + · · · + Xn)]²)
                      = E([(X1 − E(X1)) + · · · + (Xn − E(Xn))]²)

Expanding the square on the right hand side, and using the addition property of the expected value to split it into two parts, we get V(X1 + · · · + Xn) = a + b, where a collects the squared terms

    a = E(Σi [Xi − E(Xi)]²)

and b collects the cross terms

    b = E(Σj≠i (Xi − E(Xi))(Xj − E(Xj)))

Using the identity E([Xi − E(Xi)]²) = E(Xi²) − E(Xi)², the first part becomes

    a = Σi E([Xi − E(Xi)]²) = Σi (E(Xi²) − E(Xi)²) = Σi V(Xi)

For the second part, note that Xi − E(Xi) and Xj − E(Xj) are independent for j ≠ i, so by the multiplication property

    b = Σj≠i E(Xi − E(Xi)) E(Xj − E(Xj)) = Σj≠i 0 · 0 = 0

Consequently, we now have

    V(X1 + · · · + Xn) = Σi (E(Xi²) − E(Xi)²) = Σi V(Xi)

which is what we wanted to prove.


The variance of the mean of a stochastic variable X, namely V(X̄), is

    V(X̄) = E((X̄ − E(X̄))²)

Since the mean of X is

    X̄ = (X1 + · · · + Xn)/n = (1/n)X1 + · · · + (1/n)Xn

where X1, . . . , Xn are equally distributed and independent stochastic variables, we can write the variance of X̄ as

    V(X̄) = V((X1 + · · · + Xn)/n) = V((1/n)X1 + · · · + (1/n)Xn)

and according to Bienaymé's equality above we can further expand it to

    V(X̄) = V((1/n)X1) + · · · + V((1/n)Xn).

Because V(bX) = b²V(X) we can then write this as

    V(X̄) = (1/n²)V(X1) + · · · + (1/n²)V(Xn).

Now, since X1, . . . , Xn are equally distributed stochastic variables, the variance of each of them is the same, so V(Xi) = σi² = σ², and thus we have

    V(X̄) = (1/n²)σ1² + · · · + (1/n²)σn²
         = (1/n²)(σ1² + · · · + σn²)
         = (1/n²) · nσ²
         = σ²/n = V(X)/n

Bernoulli's Law of Large Numbers Assume that X1, . . . , Xn are pairwise independent and equal with respect to expected value and variance. Then the following fundamental inequality holds for any ε > 0:

    P(|(X1 + · · · + Xn)/n − E(X)| ≥ ε) → 0  as n → ∞    (1.19)

In words this means that the probability that the difference between the sample average and the expected value is greater than or equal to some number ε approaches 0 as n approaches infinity. Another way to write it is

    P(|(X1 + · · · + Xn)/n − E(X)| < ε) → 1  as n → ∞

which might be more intuitive.

Proof. Recall Chebyshev's inequality:

    P(|X − μ| ≥ ε) ≤ σ²/ε².

Applying Chebyshev's inequality to X̄ = (X1 + · · · + Xn)/n, whose variance is σ²/n, we obtain

    P(|X̄ − E(X)| ≥ ε) ≤ σ²/(nε²)

Since σ²/(nε²) → 0 when n → ∞, we have proved Bernoulli's Law of Large Numbers.
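Bernoulli's law can be illustrated by simulation: the average of n die throws drifts toward E(X) = 3.5 as n grows. A sketch (the sample sizes and the seed are arbitrary choices of ours):

```python
import random

random.seed(1)  # fixed seed so the run is reproducible

def sample_average(n):
    """Average of n simulated throws of a fair die."""
    return sum(random.randint(1, 6) for _ in range(n)) / n

for n in (10, 1_000, 100_000):
    print(n, sample_average(n))
# The printed averages approach the expected value 3.5 as n grows.
```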


Appendix 2
Historical Notes
Classical decision theory and much of probability theory starts with a lost letter from Pierre de Fermat to Blaise Pascal. In his response, dated the 29th of July 1654, Blaise Pascal accepts Fermat's solution of the Problem of Points and outlines his own, which he considers simpler.

2.1 The Classical Problem of Points

Two players A and B stake an equal amount of money, which is to be won by the player who first gets n points. Points are gained after a throw of a die, where A gains one if an odd number is shown. Suppose now that the game is interrupted before either of the players has obtained n points. How are the stakes to be divided?

This problem had been discussed by Italian mathematicians during the 15th and 16th centuries, but they had failed to find a satisfactory solution. In particular, they had failed to notice that only the numbers of points still wanted are to be considered.

2.1.1 Pascal's solution

Although Pascal only considers special cases, his approach is perfectly general since it consists of an algorithm for calculating the expected outcome for each player. Pascal then claims that this number is the sought one. As an illustration, suppose that A and B have each staked 32 pistoles in a fair game. The player who first reaches four points wins the entire pot of 64 pistoles. But what if the game is interrupted when A has obtained two points and B only one? The stakes are to be divided as follows: A needs an additional two points to win, and B needs three points. We write this as R(2, 3).


The possible outcomes had they taken one or two more turns are depicted in figure 2.1.

Figure 2.1: Tree representation of the stakes left.


Let s (R(n, m)) be the function whose output is the sum player A should
receive if the game is interrupted when the stakes are R(n, m). Note that
we can easily calculate s (R(0, m)) = 64 since R(0, m) means player A has
won, and s (R(n, n)) = 32 because this indicates that the scores are even and
thus it is reasonable that the pot be split evenly. Now, what is the result of
s (R(2, 3))?


The calculations can be done recursively as follows:
\begin{align}
s(R(2,3)) &= \tfrac{1}{2}\,s(R(2,2)) + \tfrac{1}{2}\,s(R(1,3)) \notag\\
&= \tfrac{1}{2}\cdot 32 + \tfrac{1}{2}\,s(R(1,3)) \notag\\
&= \tfrac{1}{2}\cdot 32 + \tfrac{1}{2}\left(\tfrac{1}{2}\,s(R(0,3)) + \tfrac{1}{2}\,s(R(1,2))\right) \notag\\
&= \tfrac{1}{2}\cdot 32 + \tfrac{1}{2}\left(\tfrac{1}{2}\cdot 64 + \tfrac{1}{2}\,s(R(1,2))\right) \notag\\
&= \tfrac{1}{2}\cdot 32 + \tfrac{1}{2}\left(\tfrac{1}{2}\cdot 64 + \tfrac{1}{2}\left(\tfrac{1}{2}\,s(R(0,2)) + \tfrac{1}{2}\,s(R(1,1))\right)\right) \notag\\
&= \tfrac{1}{2}\cdot 32 + \tfrac{1}{2}\left(\tfrac{1}{2}\cdot 64 + \tfrac{1}{2}\left(\tfrac{1}{2}\cdot 64 + \tfrac{1}{2}\cdot 32\right)\right) \tag{2.1}\\
&= \tfrac{1}{2}\cdot 32 + \tfrac{1}{4}\cdot 64 + \tfrac{1}{8}\cdot 64 + \tfrac{1}{8}\cdot 32 = 44. \notag
\end{align}
Hence player A should receive 44 pistoles and player B 20 pistoles.
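Pascal's recursion is easy to mechanize. The following sketch (in Python; the function name share is our own) reproduces the value 44:

```python
def share(a, b, pot=64):
    """Player A's fair share when A still wants a points and B wants b."""
    if a == 0:          # A has won: A takes the whole pot
        return pot
    if b == 0:          # B has won: A gets nothing
        return 0
    # Each round is fair, so average the two possible continuations.
    return 0.5 * share(a - 1, b, pot) + 0.5 * share(a, b - 1, pot)

print(share(2, 3))  # 44.0
```

Note that the even-split value $s(R(n,n)) = 32$ falls out of the recursion rather than having to be stipulated.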

2.1.2 Fermat's solution

Fermat notes that the game described above will end after at most four additional throws, and that all results are equally possible. More precisely, the results are the following ones:

aaaa aaab aaba abaa
baaa aabb abab abba
baab baba bbaa abbb
babb bbab bbba bbbb

Here a stands for a result that yields a point to A. He then notes that A will win in 11 of these cases, namely those in which A gets two points or more. Hence A is to receive $\frac{11}{16}\cdot 64 = 44$ pistoles. Note that the sample space above contains results that will never occur but make all results equally likely.
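Fermat's count can be checked by brute force over the 16 equally likely continuations (a Python sketch):

```python
from itertools import product

# A wins a continuation iff it contains at least two a's.
wins = sum(1 for seq in product("ab", repeat=4) if seq.count("a") >= 2)
print(wins, wins / 16 * 64)  # 11 44.0
```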


2.2 Generalizations

Pascal and Fermat also consider the following generalization: A is missing one point while B and C are each missing two points. Their discussion of this case is best known for a mistake on Pascal's part. His suspicion that Fermat's approach would give the wrong numbers is patently false, as he himself later acknowledged, but he has a point in that Fermat's approach does not readily show the desired probabilities, whereas his own method is perfectly general.

2.2.1 Pascal's approach

It is assumed that all participants have equal chances of winning a point and that each of them has staked 32 pistoles. The remaining possible outcomes are presented in figure 2.2. We use the same notation as above and note that $s(R(0, n, m)) = 96$, $s(R(n, 0, m)) = 0$, $s(R(n, m, 0)) = 0$ and $s(R(n, n, n)) = 32$. Following the same recursive steps we can calculate A's share as
\begin{align}
s(R(1,2,2)) &= \tfrac{1}{3}\,s(R(0,2,2)) + \tfrac{1}{3}\,s(R(1,1,2)) + \tfrac{1}{3}\,s(R(1,2,1)) \notag\\
&= \tfrac{1}{3}\cdot 96 + \tfrac{1}{3}\left(\tfrac{1}{3}\,s(R(0,1,2)) + \tfrac{1}{3}\,s(R(1,0,2)) + \tfrac{1}{3}\,s(R(1,1,1))\right) \notag\\
&\quad + \tfrac{1}{3}\left(\tfrac{1}{3}\,s(R(0,2,1)) + \tfrac{1}{3}\,s(R(1,1,1)) + \tfrac{1}{3}\,s(R(1,2,0))\right) \notag\\
&= \tfrac{1}{3}\cdot 96 + \tfrac{1}{3}\left(\tfrac{1}{3}\cdot 96 + \tfrac{1}{3}\cdot 0 + \tfrac{1}{3}\cdot 32\right) + \tfrac{1}{3}\left(\tfrac{1}{3}\cdot 96 + \tfrac{1}{3}\cdot 32 + \tfrac{1}{3}\cdot 0\right) \notag\\
&= \tfrac{1}{3}\cdot 96 + \tfrac{1}{9}\cdot 96 + \tfrac{1}{9}\cdot 32 + \tfrac{1}{9}\cdot 96 + \tfrac{1}{9}\cdot 32 = \tfrac{1}{9}(288 + 96 + 32 + 96 + 32) \tag{2.2}\\
&= \frac{544}{9}. \notag
\end{align}
Hence A is to receive $\frac{544}{9}$ pistoles.
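The three-player recursion can be mechanized just like the two-player one (a Python sketch using exact fractions; the name share3 is our own, and the even-split value $s(R(1,1,1)) = 32$ again falls out of the recursion):

```python
from fractions import Fraction

def share3(a, b, c, pot=96):
    """A's fair share when A, B, C still want a, b, c points."""
    if a == 0:                 # A has won
        return Fraction(pot)
    if b == 0 or c == 0:       # someone else has won
        return Fraction(0)
    third = Fraction(1, 3)
    return third * (share3(a - 1, b, c, pot)
                    + share3(a, b - 1, c, pot)
                    + share3(a, b, c - 1, pot))

print(share3(1, 2, 2))  # 544/9
```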


Figure 2.2: Tree representation of the stakes left.

2.2.2 Fermat's approach

The game will end after at most three throws. Hence the sample space contains the following results:

aaa aab aac aba abb abc aca acb acc
baa bab bac bba bbb bbc bca bcb bcc          (2.3)
caa cab cac cba cbb cbc cca ccb ccc

Player A wins after getting at least one additional point, except when player B or player C manages to get two points first; this happens in 10 of the 27 cases, so A wins in 17. Hence A is to receive $\frac{17}{27}\cdot 96 = \frac{544}{9}$ pistoles.
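Again the count is easy to verify by enumerating the 27 equally likely continuations (a Python sketch; the helper winner is our own):

```python
from itertools import product
from fractions import Fraction

def winner(seq):
    """First player to collect the points they still want (A:1, B:2, C:2)."""
    need = {"a": 1, "b": 2, "c": 2}
    for ch in seq:
        need[ch] -= 1
        if need[ch] == 0:
            return ch

wins = sum(1 for seq in product("abc", repeat=3) if winner(seq) == "a")
print(wins, Fraction(wins, 27) * 96)  # 17 544/9
```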
It should be borne in mind that the aim of Pascal's and Fermat's investigations was to determine the stakes each player should receive, not to determine probabilities in general. In this they were followed by their successors Christiaan Huygens and James Bernoulli, who devoted much time to variants of the Problem of Points, see under Exercises below. It is, however, noteworthy that these brilliant minds did not succeed in obtaining a general account of the probability of A winning the game, although such an account presents few problems. Assume for instance that A is in need of p points, B of r points, and C of s points. Then A wins the game if and only if she obtains p points while B and C obtain at most r − 1 and s − 1 points respectively. If the probabilities of obtaining points are given, the probability of A winning follows, see under Exercises.
It should also be noted that the division of stakes proposed by Fermat and Pascal was immediately accepted by all mathematicians at the time, and that their proposal also has a bearing on the games considered in Chapter 2. For assume that a player has bought a ticket in one of the games described there and that some circumstance hinders a drawing from taking place. Then, according to Fermat and Pascal, she is entitled to the expected value of the game in question. The term expected value stems from the term expectatio used by Christiaan Huygens in his treatise De Ratiociniis in Ludo Aleae, which is contained in the first part of Bernoulli (1713). J. Bernoulli there also gives a general argument to support the claim that the expectatio of a game is what you should expect to get when playing it. Suppose that we have n urns containing $a_1, \dots, a_n$ SEK. If n players are allotted these urns by a random experiment, then the expectatio of each player is the same. Since the combined expectatio is $a_1 + \dots + a_n$, the expectatio of each player is
$$\frac{a_1 + \dots + a_n}{n}.$$

2.3 Other Problems

Fermat and Pascal also seem to have solved the following problem posed by Huygens, see About and Boy (1983). Two players A and B throw two distinguishable dice in turn. A starts and wins if his throw results in the sum of six, whereas B wins if his throw results in the sum of seven. What are the odds for A and B respectively?


Fermat and Pascal found the right numbers, 30/61 for A and 31/61 for
B, but unfortunately their calculations are lost. To find the answer, note
first that the sum six occurs when the dice show the numbers (1, 5), (2, 4),
(3, 3), (4, 2) or (5, 1), and that the sum seven occurs when the dice show the
numbers (1, 6), (2, 5), (3, 4), (4, 3), (5, 2) or (6, 1). Note then that A can
win after any odd number of throws and B after any even number. The probability that A wins after $2n-1$ throws (i.e. on her $n$th turn), with $n \ge 1$, equals
$$P(A_n) = \left(\frac{31}{36}\cdot\frac{30}{36}\right)^{n-1}\frac{5}{36}, \tag{2.4}$$
and the probability that B wins after $2n$ throws (on his $n$th turn), with $n \ge 1$, equals
$$P(B_n) = \left(\frac{31}{36}\cdot\frac{30}{36}\right)^{n-1}\frac{31}{36}\cdot\frac{6}{36}. \tag{2.5}$$
The probability that either player wins on the $m$th turn is
$$P(A_m) + P(B_m) = \left(\frac{31}{36}\cdot\frac{30}{36}\right)^{m-1}\left(\frac{5}{36} + \frac{31}{36}\cdot\frac{6}{36}\right) = \left(\frac{31}{36}\cdot\frac{30}{36}\right)^{m-1}\frac{6\cdot 61}{36^2}. \tag{2.6}$$
Hence the probability that A wins the game equals
$$P(A) = \frac{\sum_m P(A_m)}{\sum_m \left(P(A_m) + P(B_m)\right)} = \frac{\frac{5}{36}}{\frac{6\cdot 61}{36^2}} = \frac{30}{61}, \tag{2.7}$$
since the common factor $\left(\frac{31}{36}\cdot\frac{30}{36}\right)^{m-1}$ cancels. In a similar way it is shown that the probability that B wins equals $\frac{31}{61}$.
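The same answers follow from summing the geometric series directly, e.g. $P(A) = \frac{5/36}{\,1 - \frac{31}{36}\cdot\frac{30}{36}\,}$. A quick exact check in Python:

```python
from fractions import Fraction

r = Fraction(31, 36) * Fraction(30, 36)   # both players miss a full turn
pA = Fraction(5, 36) / (1 - r)            # sum of P(A_n) over all n
pB = (Fraction(31, 36) * Fraction(6, 36)) / (1 - r)
print(pA, pB)  # 30/61 31/61
```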

2.4 Exercises

1. Problem of Points (from Bernoulli (1713))


A is wanting two points and B four points. Stakes as above. Compute
the expected values.
2. Problem of Points (from Bernoulli (1713))
A and B are each wanting one point while C is wanting two points.
Stakes as above. Compute the expected values.
(a) Problem of Points
Assume that A wants p points and B q points. Suppose that the probability of A gaining a point is a and that the corresponding probability for B is b. Determine the probability of A winning the game.
(b) Much like (a) but with three players. Thus A wants p points, B wants q points and C wants r points. Each player has the chance to win each round with probability a, b and c respectively. Determine the probability of winning the game for each player.
3. Game of dice (from Bernoulli (1713))
A wins if she rolls a six with a die. Determine the number of trials A
should demand before entering the game.
4. Game of dice (from Bernoulli (1713))
A wins if she rolls a double six with two dice. Determine the number
of trials A should demand before entering the game.
5. Game of dice (from Bernoulli (1713))
A wins if she rolls a six. Determine the number of dice A should demand
before entering the game.
6. Game of dice (from Bernoulli (1713))
Two dice are thrown. A wins if the number of eyes is seven whereas
B wins if the number of eyes is ten. In any other case the stakes
are divided evenly. Stakes as above (32 pistoles each). Determine the
expected values.
7. Game of dice (from Bernoulli (1713))
A and B play with two dice until one of them wins. A wins if she rolls a sum of six while B wins if she rolls a sum of seven. Player A and B take turns in a sequence such that A makes one throw, B makes two throws, A makes two throws, B makes three throws, A makes three throws, etc. Find the probability of A winning the game.
8. Game of stones (from Bernoulli (1713))
An urn contains four white stones and eight black ones. Three players
in turn draw stones without replacement until a white stone is drawn.
Determine the probability of winning for each player.
9. Game of stones (from Bernoulli (1713))
Like exercise 8 except that stones are drawn with replacement.
10. Game of dice (from Bernoulli (1713))
A and B are involved in a game where three dice are thrown. In case the number of eyes is 11, A is to give B a coin, whereas she is to receive one from B if the number of eyes is 14. They start the game with 12 coins each and whoever gains possession of all coins wins the game. Determine the quotient between their probabilities of winning the game.

2.5 Solutions

1. Problem of Points
The expected value for B is
$$0 \cdot P(A \text{ wins the game}) + 64 \cdot P(B \text{ wins the game}) \text{ pistoles.}$$
Now A wins if she gains two points and B at most three points, whereas B wins if she gains four points and A at most one. Hence B wins if she gains four points and A none, or if she gains four points and A one point but not the last one. Since
$$P(A \text{ wins a point}) = P(B \text{ wins a point}) = \frac{1}{2}$$
we can calculate
$$P(B \text{ wins four points and A none}) = \frac{1}{2^4} = \frac{1}{16}.$$
Moreover
$$P(B \text{ wins four points and A one point but not the last one}) = \frac{4}{32},$$
since A can win the first, second, third or fourth round. Hence
$$P(B \text{ wins the game}) = \frac{1}{16} + \frac{4}{32} = \frac{3}{16}.$$
Consequently the expected value for B equals
$$64 \cdot \frac{3}{16} = 12 \text{ pistoles.}$$
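The same answer follows from the two-player recursion of section 2.1.1 (a Python sketch; the helper p_wins is our own):

```python
from fractions import Fraction

def p_wins(a, b):
    """P(A wins) when A still wants a points and B wants b, fair rounds."""
    if a == 0:
        return Fraction(1)
    if b == 0:
        return Fraction(0)
    return Fraction(1, 2) * (p_wins(a - 1, b) + p_wins(a, b - 1))

# A wants two points, B wants four; B's expected value out of 64 pistoles:
print(64 * (1 - p_wins(2, 4)))  # 12
```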

2. Problem of Points
The expected value for C is
$$E[C] = 0 \cdot P(A \text{ wins the game}) + 0 \cdot P(B \text{ wins the game}) + 96 \cdot P(C \text{ wins the game}) \text{ pistoles.}$$
Now C wins the game if and only if she wins the next two rounds. Hence
$$P(C \text{ wins the game}) = 1/9.$$
Accordingly her expected value is $96/9 = 32/3 \approx 10.7$ pistoles.
(a) Problem of Points
A wins the game if she wins p rounds, including the last round, while B wins at most $q-1$ rounds. Hence
$$P(A \text{ wins the game}) = \sum_{k=0}^{q-1}\binom{p-1+k}{k}\, a^p b^k. \tag{2.8}$$
As an example, set $p = 2$ and $q = 3$; then there are basically three ways in which A can win: A gets two points in a row and B zero points, A gets two points and B one point, or A gets two points and B two points. Looking closer at the last scenario, we see that it can occur in three different ways. Let A represent a round where player A wins and B a round where player B wins. Since A always has to score on the last round, we have a situation of the form $\cdots$A, and the question is in how many ways we can order the remaining ABB. Of course we can list them like this:

ABB  BAB  BBA

But as the numbers become bigger, listing all possible combinations like this soon gets close to impossible. Instead one can utilize the fact that n distinguishable objects in a row can be ordered in $n(n-1)\cdots 1 = n!$ different ways. However, since the two B's aren't distinguishable (and the same would of course apply to the A's, had we had more than one), the total number of ways has to be divided by $2!$. To reach the general case we can use the binomial coefficient
$$\binom{n}{k} = \frac{n!}{k!\,(n-k)!}.$$
Here n represents the total number of rounds, namely $p-1+k$, where $p-1$ is the number of rounds won by A except the last one, and k is the total number of rounds won by B. Back to our example, we can calculate the number of distinct orderings of ABB as
$$\binom{3}{2} = \frac{3!}{2!\,(3-2)!} = \frac{3\cdot 2!}{2!\,1!} = 3.$$
Writing the expression (2.8) out in full for $p = 2$ and $q = 3$ gives
$$\sum_{k=0}^{2}\binom{1+k}{k} a^2 b^k = \binom{1}{0}a^2 b^0 + \binom{2}{1}a^2 b^1 + \binom{3}{2}a^2 b^2 = a^2 + 2a^2 b + 3a^2 b^2.$$
(b) Player A wins the game if she wins the last round and, in addition, $p-1$ earlier rounds, while B wins at most $q-1$ rounds and C at most $r-1$ rounds. Suppose that the probability of A gaining a point equals a, and that the corresponding probabilities for B and C are b and c respectively. Then
$$P(A \text{ wins the game}) = a^p \sum_{k=0}^{q-1}\sum_{l=0}^{r-1}\binom{p-1+k+l}{p-1,\,k,\,l}\, b^k c^l.$$
Note that the double summation sign means: for each k, calculate the expression for all l. Also, instead of a binomial coefficient we are now using a multinomial coefficient, which expands as follows:
$$\binom{p-1+k+l}{p-1,\,k,\,l} = \frac{(p-1+k+l)!}{(p-1)!\,k!\,l!}.$$

In a similar way we can determine $P(A \text{ wins the game})$ for any number of players.
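The multinomial formula can be checked against Fermat's three-player example from section 2.2.2: with $p = 1$, $q = r = 2$ and $a = b = c = \frac{1}{3}$ it should give $\frac{17}{27}$ (a Python sketch; the helper p_a_wins3 is our own):

```python
from fractions import Fraction
from math import factorial

def p_a_wins3(p, q, r, a, b, c):
    """P(A wins) for three players via the multinomial formula above."""
    total = Fraction(0)
    for k in range(q):
        for l in range(r):
            coef = factorial(p - 1 + k + l) // (
                factorial(p - 1) * factorial(k) * factorial(l))
            total += coef * b**k * c**l
    return a**p * total

third = Fraction(1, 3)
print(p_a_wins3(1, 2, 2, third, third, third))  # 17/27
```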
3. Game of dice (from Bernoulli (1713))
It is presupposed that A ought only to engage in a game when the probability of her winning is at least 0.5. But then she should demand that n satisfy the inequality $\left(\frac{5}{6}\right)^n \le 0.5$. Hence she should demand to be given at least four trials; $\left(\frac{5}{6}\right)^4 \approx 0.48$.
4. Game of dice (from Bernoulli (1713))
As in the preceding problem the probability of A winning should be at least 0.5. But then she should ask to be given at least 25 trials, since
$$\left(\frac{35}{36}\right)^{25} < 0.5 < \left(\frac{35}{36}\right)^{24}.$$
This result was considered scandalous by the 17th century gambler Chevalier de Méré, since $\frac{4}{6} = \frac{24}{36}$, and it was one of the problems that triggered the development of probability theory, see Pascal's letter to Fermat of July 29, 1654.
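The two bounds are easy to confirm (Python):

```python
# 24 double-dice throws are not quite enough, 25 are:
print((35 / 36) ** 25 < 0.5 < (35 / 36) ** 24)  # True
```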
5. Game of dice (from Bernoulli (1713))
The solution is similar to the one of problem 3.
6. Game of dice (from Bernoulli (1713))
The number of eyes will be seven in six of 36 cases and ten in three of 36 cases. Hence the expected value of this game for A equals
$$64\cdot\frac{6}{36} + 32\cdot\frac{27}{36} = 34\tfrac{2}{3} \text{ pistoles,}$$
whereas for B it equals
$$64\cdot\frac{3}{36} + 32\cdot\frac{27}{36} = 29\tfrac{1}{3} \text{ pistoles.}$$
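As a check of the arithmetic (Python, exact fractions):

```python
from fractions import Fraction

evA = Fraction(6, 36) * 64 + Fraction(27, 36) * 32
evB = Fraction(3, 36) * 64 + Fraction(27, 36) * 32
print(evA, evB)  # 104/3 88/3, i.e. 34 2/3 and 29 1/3 pistoles
```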

7. Game of dice (from Bernoulli (1713))
We assume that the dice are distinguishable. The probability of rolling a sum of six is 5/36, whereas the probability of rolling a sum of seven is 6/36. Let $A_n$ be short for A wins after n trials; then the probability that A wins is
$$P(A_1) + P(A_3) + P(A_5) + \dots + P(A_n),$$
where n tends to infinity. Now, set $p = 1 - 5/36$ (the probability of not getting a sum of six) and $q = 1 - 6/36$ (the probability of not getting a sum of seven). Then
\begin{align*}
P(A_1) &= 1 - p\\
P(A_3) &= pq^2(1 - p^2)\\
P(A_5) &= pq^2\,p^2q^3(1 - p^3)\\
P(A_7) &= pq^2\,p^2q^3\,p^3q^4(1 - p^4).
\end{align*}
Considering the recurring pattern we can write the more general formula
$$P(A_{2n+1}) = p\,(pq)^{\frac{(n-1)(n+2)}{2}}\,q^{n+1}\left(1 - p^{n+1}\right),$$
which can be further simplified to
$$P(A_{2n+1}) = p^{\frac{n(n+1)}{2}}\,q^{\frac{n(n+3)}{2}}\left(1 - p^{n+1}\right).$$
$P(A_n)$ approaches zero quite rapidly as n increases; in fact $P(A_{11}) < 0.002$, and later terms are smaller still. Consequently it is enough, for our purposes, to calculate the probability of A winning the game as
$$P(A \text{ wins the game}) \approx P(A_1) + \dots + P(A_{13}) \approx 0.433.$$
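Summing the general formula term by term confirms the figure 0.433 (a Python sketch; the term $n = 0$ reduces to $P(A_1) = 1 - p$, and 20 terms are far more than needed):

```python
p = 31 / 36   # probability of not rolling a sum of six
q = 30 / 36   # probability of not rolling a sum of seven

total = sum(p**(n * (n + 1) // 2) * q**(n * (n + 3) // 2) * (1 - p**(n + 1))
            for n in range(20))
print(round(total, 3))  # 0.433
```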
8. Game of stones (from Bernoulli (1713))
The game ends after at most nine draws. Let $l_n$ be the event loss after n draws, and let $w_n$ be the event win after n draws. Then we can define a recursion such that
$$P(l_0) = 1,$$
$$P(l_n) = P(l_{n-1})\cdot\frac{8-(n-1)}{12-(n-1)} \quad \text{for } n = 1, \dots, 8,$$
and
$$P(w_n) = P(l_{n-1})\cdot\frac{4}{12-(n-1)} \quad \text{for } n = 1, \dots, 9.$$
If the players A, B and C draw stones in turns, then
$$P(A \text{ wins the game}) = P(w_1) + P(w_4) + P(w_7) = \frac{7}{15},$$
$$P(B \text{ wins the game}) = P(w_2) + P(w_5) + P(w_8) = \frac{53}{165}$$
and
$$P(C \text{ wins the game}) = P(w_3) + P(w_6) + P(w_9) = \frac{7}{33}.$$
Note that this solution can be applied in any similar game.
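The recursion translates directly into code (a Python sketch with exact fractions):

```python
from fractions import Fraction

w = [Fraction(0)]            # w[n] = P(win at draw n); index 0 unused
loss = Fraction(1)           # P(l_0) = 1
for n in range(1, 10):
    w.append(loss * Fraction(4, 12 - (n - 1)))
    if n <= 8:
        loss *= Fraction(8 - (n - 1), 12 - (n - 1))

pA = w[1] + w[4] + w[7]
pB = w[2] + w[5] + w[8]
pC = w[3] + w[6] + w[9]
print(pA, pB, pC)  # 7/15 53/165 7/33
```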


9. Game of stones (from Bernoulli (1713))
This exercise provides a splendid instance of a problem where it is easier to first give a general solution and then put in numbers. So assume that
$$P(\text{drawing a black stone}) = q$$
and
$$P(\text{drawing a white stone}) = p = 1 - q$$
with $0 < q < 1$. Then
$$P(\text{someone wins at draw number } n+1) = q^n p, \quad n \ge 0.$$
Since A can win at draws number $1, 4, \dots, 3n+1, \dots$, B at draws number $2, 5, \dots, 3n+2, \dots$, and C at draws $3, 6, \dots, 3n+3, \dots$, we get the following probabilities:
$$P(A \text{ wins}) = p\left(1 + q^3 + \dots + q^{3n} + \dots\right),$$
$$P(B \text{ wins}) = pq\left(1 + q^3 + \dots + q^{3n} + \dots\right)$$
and
$$P(C \text{ wins}) = pq^2\left(1 + q^3 + \dots + q^{3n} + \dots\right).$$
The sum $1 + q^3 + \dots + q^{3n} + \dots$ is a geometric series, whose partial sums we can write as
$$s_n = 1 + x + x^2 + \dots + x^n$$
with $x = q^3 < 1$. Multiplying both sides by x gives us
$$x s_n = x + x^2 + \dots + x^{n+1}.$$
Subtracting $x s_n$ from $s_n$ then gives us
$$s_n(1 - x) = 1 - x^{n+1}$$
and consequently
$$s_n = \frac{1 - x^{n+1}}{1 - x} = \frac{1 - q^{3(n+1)}}{1 - q^3}.$$
Since $q^{3(n+1)}$ tends to zero as n gets large, the sum of the series is
$$1 + q^3 + \dots + q^{3n} + \dots = \frac{1}{1 - q^3}.$$
Hence
$$P(A \text{ wins}) = \frac{p}{1 - q^3}, \quad P(B \text{ wins}) = \frac{pq}{1 - q^3} \quad \text{and} \quad P(C \text{ wins}) = \frac{pq^2}{1 - q^3}.$$
Setting $p = 1/3$ and $q = 2/3$, we finally get
$$P(A \text{ wins}) = 9/19, \quad P(B \text{ wins}) = 6/19 \quad \text{and} \quad P(C \text{ wins}) = 4/19.$$
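A check of the final step (Python, exact fractions):

```python
from fractions import Fraction

p, q = Fraction(1, 3), Fraction(2, 3)
denom = 1 - q**3                    # = 19/27
print(p / denom, p * q / denom, p * q**2 / denom)  # 9/19 6/19 4/19
```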
10. Game of dice (from Bernoulli (1713))
The number of outcomes when throwing three dice is $6^3 = 216$. To calculate the number of ways in which we can get the sums 11 and 14 we can take the following approach. The distinct combinations giving the sum 11 are
(1, 6, 4)  (1, 5, 5)
(2, 6, 3)  (2, 5, 4)
(3, 5, 3)  (3, 4, 4)
Let N(x) be the number of distinct orderings of the sequence x; then
N((1, 6, 4)) = 6   N((1, 5, 5)) = 3
N((2, 6, 3)) = 6   N((2, 5, 4)) = 6
N((3, 5, 3)) = 3   N((3, 4, 4)) = 3

Consequently, out of the 216 possible outcomes, $6 + 3 + 6 + 6 + 3 + 3 = 27$ add up to 11. Using the same procedure to find the number of outcomes giving the sum of 14, we first get the distinct combinations
(2, 6, 6)  (3, 6, 5)
(4, 6, 4)  (4, 5, 5)
and then we have
N((2, 6, 6)) = 3   N((3, 6, 5)) = 6
N((4, 6, 4)) = 3   N((4, 5, 5)) = 3
Thus in $3 + 6 + 3 + 3 = 15$ of the cases the sum adds up to 14. Set the probability that A gets a coin from B to
$$a = \frac{15}{216}$$
and the probability that B gets a coin from A to
$$b = \frac{27}{216}.$$

Now, let $P(A_n)$ denote the event that A wins after n rounds, where $n \ge 12$; let $A = i$ and $B = i$ denote that player A and B have won i times respectively, and let $L = i$ denote that neither player won in i rounds. Then
$$P(A_{12}) = P(A = 12 \wedge B = 0 \wedge L = 0),$$
$$P(A_{13}) = P(A = 12 \wedge B = 0 \wedge L = 1)$$
and
$$P(A_{14}) = P(A = 12 \wedge B = 0 \wedge L = 2) + P(A = 13 \wedge B = 1 \wedge L = 0).$$
This can be generalized to
$$P(A_n) = \sum_{i\,:\,12+2i \le n} P(A = 12+i \wedge B = i \wedge L = n-12-2i),$$
which can be calculated as
$$P(A_n) = \sum_{i\,:\,12+2i \le n} a^{12+i}\, b^i\, (1-a-b)^{n-12-2i}\, \frac{(n-1)!}{(12+i-1)!\; i!\; (n-12-2i)!}.$$

Setting up the same equation for $P(B_n)$ is only a matter of shifting the symbols, and thus
$$P(B_n) = \sum_{i\,:\,12+2i \le n} a^i\, b^{12+i}\, (1-a-b)^{n-12-2i}\, \frac{(n-1)!}{(12+i-1)!\; i!\; (n-12-2i)!}.$$
If we extract the factors $a^{12}$ and $b^{12}$ from the summations, the remaining sums are identical and cancel, so
$$\frac{P(A_n)}{P(B_n)} = \frac{a^{12}}{b^{12}}.$$
Lastly we calculate
$$\frac{a^{12}}{b^{12}} = \left(\frac{15/216}{27/216}\right)^{12} = \left(\frac{5}{9}\right)^{12} \approx 0.00086.$$
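Both the dice counts and the final quotient can be verified by brute force (a Python sketch):

```python
from fractions import Fraction
from itertools import product

c11 = sum(1 for d in product(range(1, 7), repeat=3) if sum(d) == 11)
c14 = sum(1 for d in product(range(1, 7), repeat=3) if sum(d) == 14)
ratio = (Fraction(c14, 216) / Fraction(c11, 216)) ** 12   # = (a/b)^12
print(c11, c14, float(ratio))  # 27 15, ratio is about 0.00086
```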


Appendix 3
Monte Carlo simulations
In this appendix we will present the method used and the results from the Monte Carlo simulations. Because even simple games such as the ones in section 2.1 require rather extensive simulations, we've chosen to include only those for games 1 and 2. However, the same method is applicable to games 3 and 4 as well.
As a reminder, game 1 has the following set up:
Game 1
One throw of four distinguishable dice.
Stake: 200 SEK
Prizes:
(a) 12 000 SEK if the product of the outcomes is an odd number and their sum a square number.
(b) 4 000 SEK if the product of the outcomes is an odd number and their sum minus one a square number.
(c) 1 000 SEK if the product of the outcomes is an odd number and neither (a) nor (b) holds.
Expected value:
$$\frac{11}{1296}\cdot 12000 + \frac{16}{1296}\cdot 4000 + \frac{54}{1296}\cdot 1000 \approx 192.90$$
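The three counts in the expected value can be verified by enumerating all $6^4 = 1296$ throws (a Python sketch, independent of the R code below):

```python
from itertools import product

squares = {n * n for n in range(1, 37)}
counts = [0, 0, 0]                      # prizes (a), (b), (c)
for dice in product(range(1, 7), repeat=4):
    prod_ = 1
    for d in dice:
        prod_ *= d
    if prod_ % 2 == 1:                  # odd product
        s = sum(dice)
        if s in squares:
            counts[0] += 1
        elif s - 1 in squares:
            counts[1] += 1
        else:
            counts[2] += 1

ev = (counts[0] * 12000 + counts[1] * 4000 + counts[2] * 1000) / 1296
print(counts, round(ev, 2))  # [11, 16, 54] 192.9
```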
The set up for game 2 is:
Game 2
One throw of four distinguishable dice.
Stake: 200 SEK
Prizes:

(a) 3 000 SEK if the product of the outcomes is a square number, their
sum is odd, and the number one does not occur.
(b) 2 000 SEK if the product of the outcomes is a square number, their
sum is even, and the number one does not occur.
(c) 550 SEK if the product of the outcomes is a square number and neither
(a) nor (b) holds.
Expected value:
$$\frac{24}{1296}\cdot 3000 + \frac{65}{1296}\cdot 2000 + \frac{110}{1296}\cdot 550 \approx 202.55$$
To run the simulations we used R, a free platform for statistical computing. We are aware that this isn't the implementation with the fastest run-time, but the code has been written with the beginner programmer in mind. All the code used is included bit by bit, and suggestions on how to improve clarity are welcome. However, before going any further we need to be clear about what to do.
The question which we will try to answer with the help of computer simulations is: How many games need to be played in order for the average prize to stay within +/- 4 SEK of the expected value with a probability of at least 0.99? Now, if we play m games and then calculate the average prize, it will either be within +/- 4 SEK of the expected value, or it will not. Hence, whether the average prize of m games falls within the specified interval or not can be seen as a Bernoulli trial (an experiment with only two possible outcomes), equivalent to the tossing of a coin.
If we let p be the probability of the average prize falling within +/- 4 SEK of the expected value, then we can set up the following two hypotheses:
$$H_0\colon p < 0.99 \qquad \text{and} \qquad H_1\colon p \ge 0.99.$$

Were p less than 0.99, we should expect to find more averages outside the interval in the long run than if p were equal to or greater than 0.99. Thus, if we get sufficiently many averages within the interval in proportion to the number of trials, we can decide to reject the hypothesis $H_0$.
The probability of getting n hits in a row is $p^n$, and thus the probability of getting n hits in a row, given $p < 0.99$, is
$$P(p < 0.99) = \frac{\int_{p=0}^{0.99} p^n \, dp}{\int_{p=0}^{1} p^n \, dp},$$

where the denominator is used to normalize the result. We can never be absolutely certain that p isn't less than 0.99, but if the probability of it being true is sufficiently small, we can choose to reject the hypothesis that it is indeed true. In this case we choose the probability $\alpha = 0.01$ as small enough for us to reject $H_0$. Now we need to determine how many times in a row the average has to fall within the interval in order for $P(p < 0.99) \le \alpha$.
Let's first calculate
$$P(p < 0.99) = \frac{\int_{p=0}^{0.99} p^n \, dp}{\int_{p=0}^{1} p^n \, dp} = \frac{\left[\frac{p^{n+1}}{n+1}\right]_0^{0.99}}{\left[\frac{p^{n+1}}{n+1}\right]_0^{1}} = \frac{0.99^{n+1}}{n+1}\cdot\frac{n+1}{1} = 0.99^{n+1}$$
and then solve
\begin{align*}
\alpha &= P(p < 0.99)\\
0.01 &= 0.99^{n+1}\\
\frac{0.01}{0.99} &= 0.99^{n}\\
\ln 0.01 - \ln 0.99 &= n \ln 0.99\\
n &= \frac{\ln 0.01 - \ln 0.99}{\ln 0.99} = \frac{\ln 0.01}{\ln 0.99} - 1 = 457.2\ldots
\end{align*}
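The bound is easy to check numerically (Python):

```python
from math import log

n = log(0.01) / log(0.99) - 1
print(round(n, 1))  # 457.2
# 458 consecutive hits suffice, 457 do not:
print(0.99 ** 459 < 0.01 < 0.99 ** 458)  # True
```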
Consequently, if each of 458 trials of m games were to result in an average prize at most +/- 4 SEK from the expected value, we would reject $H_0$, rather believe $H_1$ to be the true one, and thus take m to be the number of games that has to be played in order to stay within an interval of at most +/- 4 SEK from the expected value with a probability of at least 0.99.
Looking at the code used in the simulations, we first defined some commonly used constants in order to avoid unnecessary and time-consuming calculations:

N <- 458;
EV <- c("GAME1" = 192.90, "GAME2" = 202.55);
EVEN_SQUARES <- c(  1,    4,    9,   16,   25,   36,
                   49,   64,   81,  100,  121,  144,
                  169,  196,  225,  256,  289,  324,
                  361,  400,  441,  484,  529,  576,
                  625,  676,  729,  784,  841,  900,
                  961, 1024, 1089, 1156, 1225, 1296);

Then we defined a couple of helper functions to make some of the conditionals clearer:
even <- function(n) n %% 2 == 0;  # TRUE if n is even
odd  <- function(n) n %% 2 == 1;  # TRUE if n is odd

We put the functions for calculating the prize of a four-dice throw, depending on which game we are playing, in a list. Doing this makes the rest of the code simpler and possibly faster, since we don't have to use conditionals in order to keep track of which game we are playing. (Note that the list elements are named with =; using <- here would leave the list unnamed and break the lookup win[[game]] below.)
win <- list(
  "GAME1" = function(l) {
    if (odd(prod(l))) {
      if (sum(l) %in% EVEN_SQUARES) return(12000);
      if ((sum(l) - 1) %in% EVEN_SQUARES) return(4000);
      return(1000);
    }
    return(0);
  },
  "GAME2" = function(l) {
    if (prod(l) %in% EVEN_SQUARES) {
      if (odd(sum(l)) && !(1 %in% l)) return(3000);
      if (even(sum(l)) && !(1 %in% l)) return(2000);
      return(550);
    } else {
      return(0);
    }
  }
);
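As a sanity check of the prize logic, here is a small Python port of the GAME1 function with a few hand-checked throws (the name prize_game1 is our own):

```python
def prize_game1(dice):
    """Python port of the R GAME1 prize function above."""
    squares = {n * n for n in range(1, 37)}
    prod = 1
    for d in dice:
        prod *= d
    if prod % 2 == 1:
        s = sum(dice)
        if s in squares:
            return 12000
        if s - 1 in squares:
            return 4000
        return 1000
    return 0

print(prize_game1((1, 1, 1, 1)))  # 12000: sum 4 is a square
print(prize_game1((1, 1, 3, 5)))  # 4000: sum 10, and 9 is a square
print(prize_game1((1, 1, 1, 3)))  # 1000: odd product, 6 and 5 not squares
print(prize_game1((1, 2, 3, 4)))  # 0: even product
```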

The following function plays m games and returns a vector of length m, where each element contains the outcome/prize of a game.
play_games <- function(m, game) {
  play <- function(e) {
    win[[game]](sample(1:6, 4, replace = T));
  }
  wins <- vapply(rep(NA, m), play, FUN.VALUE = numeric(1));
  return(wins);
}

The core of the simulation is a function which plays m games, calculates the average prize of the m games, and then repeats the same procedure n times. It returns the number of times the average prize was outside the interval +/- 4 SEK from the expected value.
run_trials <- function(n, m, game) {
  fails <- 0;
  for (i in 1:n) {
    print(n - i + 1);  # Print trials left (i.e. current status)
    average <- mean(play_games(m, game));
    if (average < EV[game] - 4 || average > EV[game] + 4) {
      fails <- fails + 1;
    }
  }
  return(fails);
}

In order to automate the simulation process we wrote a function that runs a series of simulations and returns the result:
run_simulation <- function(mstart, mstop, minterval, game) {
  trials <- seq(mstart, mstop, minterval);
  fails <- rep(NA, length(trials));
  for (i in 1:length(trials)) {
    fails[i] <- run_trials(N, trials[i], game);
    plot(trials, fails, xlab = "Trial", ylab = "Failed",
         col = "red", pch = 19);
  }
  return(rbind(trials, fails));
}

Furthermore, to calculate the probability of some number of failures, given some probability interval of being within +/- 4 SEK from the expected value, we wrote the following function:
calculate_p <- function(failed, p_min, p_max) {
  p <- function(p_min, p_max, num_total, num_failed) {
    integrand <- function(p) {
      choose(num_total, num_failed) * p^(num_total - num_failed) *
        (1 - p)^num_failed;
    }
    numerator <- integrate(integrand, p_min, p_max);
    denominator <- integrate(integrand, 0, 1);
    return(numerator$value / denominator$value);
  }
  fun <- function(x) {
    p(p_min, p_max, N, x);
  }
  return(vapply(failed, fun, FUN.VALUE = numeric(1)));
}

3.1 Results

The results from the simulations of Game 1 are presented in figures 3.1 and 3.2, and the results from the simulations of Game 2 are presented in figures 3.3 and 3.4.
Looking at game 1, the first time at which none of the 458 trials fails is at 661,000 repeated games. The probability of zero failures given $H_0$ is less than 0.01. Starting at 797,000 repeated games, zero failures become increasingly common and the probability of the outcome, given $H_0$, is always less than 0.5. Thus it is reasonable to reject the hypothesis $p < 0.99$ somewhere in the neighborhood of 796,000 repeated games for game 1. Considering game 2 in the same way, the first occurrence of zero failures is at 167,000 repeated games, and from 252,000 repeated games on, zero failures become increasingly common, with the probability of the outcome, given $H_0$, less than 0.5. Consequently, it is reasonable to reject $p < 0.99$ at approximately 251,000 repeated games for game 2.
Comparing the above results with the Bernstein-Bennett and Talagrand inequalities, we have for game 1 the bounds 953,608 and 810,700, and for game 2 the bounds 235,540 and 200,185, respectively. Thus, in summary, the simulations for game 1 give a result which is a bit smaller than the inequalities, whilst for game 2 they give a bit more conservative numbers.


[Figure: Distribution of failures for Game 1 — number of failures plotted against number of played games (500,000 to 2,000,000).]
Figure 3.1: Number of failures when simulating game 1.

[Figure: Probability of outcome given hypothesis zero for Game 1 — probability (0 to 1) plotted against number of played games.]
Figure 3.2: The probability of the number of failures given H0 (that p < 0.99)
for game 1.


[Figure: Distribution of failures for Game 2 — number of failures plotted against number of played games (0 to 600,000).]
Figure 3.3: Number of failures when simulating game 2.

[Figure: Probability of outcome given hypothesis zero for Game 2 — probability (0 to 1) plotted against number of played games.]
Figure 3.4: The probability of the number of failures given H0 (that p < 0.99)
for game 2.

