Ulm University
Institute of Stochastics
Lecture Notes
Prof. Dr. Volker Schmidt
Summer 2010
Contents

1 Introduction
2 Markov Chains
   2.1.2 Examples
   2.1.3 Recursive Representation . . . . . . . . . . 10
3 Monte-Carlo Simulation
   Statistical Tests . . . . . . . . . . 63
   Inversion Method . . . . . . . . . . 69
   Acceptance-Rejection Method . . . . . . . . . . 75
   Gibbs Sampler . . . . . . . . . . 85
   Metropolis-Hastings Algorithm . . . . . . . . . . 89
1 Introduction
Markov chains
- are a fundamental class of stochastic models for sequences of non-independent random variables, i.e. of random variables possessing a specific dependency structure,
- have numerous applications, e.g. in insurance and finance,
- also play an important role in mathematical modelling and analysis in a variety of other fields such as physics, chemistry, life sciences, and material sciences.
Questions of scientific interest often exhibit a degree of complexity resulting in great difficulties if the
attempt is made to find an adequate mathematical model that is solely based on analytical formulae.
In these cases Markov chains can serve as an alternative tool as they are crucial for the construction of
computer algorithms for the Markov Chain Monte Carlo simulation (MCMC) of the mathematical models
under consideration.
This course on Markov chains and Monte Carlo simulation will be based on the methods and models introduced
in the course Elementare Wahrscheinlichkeitsrechnung und Statistik. Further knowledge of probability theory
and statistics can be useful but is not required.
The main focus of this course will be on the following topics:
Notions and results introduced in Elementare Wahrscheinlichkeitsrechnung und Statistik will be used
frequently. References to these lecture notes will be labelled by the prefix WR in front of the number
specifying the corresponding section, theorem, lemma, etc.
The following list contains only a small collection of introductory texts that can be recommended for in-depth studies of the subject, complementing the lecture notes.
E. Behrends (2000) Introduction to Markov Chains. Vieweg, Braunschweig
P. Brémaud (2008) Markov Chains, Gibbs Fields, Monte Carlo Simulation, and Queues. Springer, New York
B. Chalmond (2003) Modeling and Inverse Problems in Image Analysis. Springer, New York
D. Gamerman, H. Lopes (2006) Markov Chain Monte Carlo: Stochastic Simulation for Bayesian Inference. Chapman & Hall, London
O. Häggström (2002) Finite Markov Chains and Algorithmic Applications. Cambridge University Press, Cambridge
D. Levin, Y. Peres, E. Wilmer (2009) Markov Chains and Mixing Times. American Mathematical Society, Providence
S. Resnick (1992) Adventures in Stochastic Processes. Birkhäuser, Boston
C. Robert, G. Casella (2009) Introducing Monte Carlo Methods with R. Springer, Berlin
T. Rolski, H. Schmidli, V. Schmidt, J. Teugels (1999) Stochastic Processes for Insurance and Finance. Wiley, Chichester
Y. Suhov, M. Kelbert (2008) Probability and Statistics by Example. Volume 2. Markov Chains: A Primer in Random Processes and their Applications. Cambridge University Press, Cambridge
H. Thorisson (2002) Coupling, Stationarity, and Regeneration. Springer, New York
G. Winkler (2003) Image Analysis, Random Fields and Dynamic Monte Carlo Methods. Springer, Berlin
2 Markov Chains
Markov chains can describe the (temporal) dynamics of objects, systems, etc. that can possess one of finitely or countably many possible configurations at a given time, where these configurations will be called the states of the considered object or system, respectively. Examples for this class of objects and systems are
- the current prices of products like insurance policies, stocks or bonds, if they are observed on a discrete (e.g. integer) time scale,
- the monthly profit of a business,
- the current length of the checkout lines (so-called queues) in a grocery store,
- the vector of temperature, air pressure, precipitation and wind velocity recorded on an hourly basis at the meteorological office Ulm-Kuhberg,
- digital maps, for example describing the momentary spatial dispersion of a disease,
- microscopic 2D or 3D images describing the current state (i.e. structural geometric properties) of biological tissues or technical materials such as polymers, metals or ceramics.
Remarks
- In this course we will focus on discrete-time Markov chains, i.e., the temporal dynamics of the considered objects, systems, etc. will be observed stepwise, e.g. at integer points in time.
- The algorithms for Markov Chain Monte Carlo simulation we will discuss in part II of the course are based on exactly these discrete-time Markov chains.
- The number of potential states can be very high. For mathematical reasons it is therefore convenient to consider the case of infinitely many states as well. As long as the infinite case is restricted to countably many states, only slight methodological changes will be necessary.
2.1
2.1.1
The stochastic model of a discrete-time Markov chain with finitely many states consists of three components: state space, initial distribution and transition matrix.
- The model is based on the (finite) set of all possible states called the state space of the Markov chain. W.l.o.g. the state space can be identified with the set $E = \{1, 2, \ldots, \ell\}$, where $\ell \in \mathbb{N} = \{1, 2, \ldots\}$ is an arbitrary but fixed natural number.
- For each $i \in E$, let $\alpha_i$ be the probability of the system or object to be in state $i$ at time $n = 0$, where it is assumed that
$$\alpha_i \in [0,1] \,, \qquad \sum_{i=1}^{\ell} \alpha_i = 1 \,. \tag{1}$$
The vector $\alpha = (\alpha_1, \ldots, \alpha_\ell)^{\top}$ is called the initial distribution of the Markov chain.
- Furthermore, for each pair $i, j \in E$ we consider the (conditional) probability $p_{ij} \in [0,1]$ for the transition of the object or system from state $i$ to $j$ within one time step, where
$$\sum_{j=1}^{\ell} p_{ij} = 1 \,. \tag{2}$$
The $\ell \times \ell$ matrix $\mathbf{P} = (p_{ij})$ is called the (one-step) transition matrix of the Markov chain.

Definition. A sequence $X_0, X_1, \ldots : \Omega \to E$ of random variables is called a (homogeneous) Markov chain with initial distribution $\alpha$ and transition matrix $\mathbf{P}$ if
$$P(X_0 = i_0, X_1 = i_1, \ldots, X_n = i_n) = \alpha_{i_0}\, p_{i_0 i_1} \cdots p_{i_{n-1} i_n} \tag{3}$$
for all $n \geq 0$ and $i_0, \ldots, i_n \in E$. In particular,
$$P(X_0 = i_0, X_1 = i_1) = \alpha_{i_0}\, p_{i_0 i_1} \,, \tag{4}$$
and hence $P(X_1 = i_1 \mid X_0 = i_0) = p_{i_0 i_1}$ whenever $\alpha_{i_0} > 0$.
Corollary 2.1. For all $n \geq 1$ and $i_0, \ldots, i_n \in E$ with $P(X_{n-1} = i_{n-1}, \ldots, X_0 = i_0) > 0$,
$$P(X_n = i_n \mid X_{n-1} = i_{n-1}, \ldots, X_0 = i_0) = p_{i_{n-1} i_n} \,. \tag{5}$$
This follows from the definition (3) by summing over $i_0, \ldots, i_{n-2} \in E$ in the numerator and the denominator of the conditional probability.
Remarks
- Corollary 2.1 can be interpreted as follows: The conditional distribution of the (random) state $X_n$ of the Markov chain $\{X_n\}$ at time $n$ is completely determined by the state $X_{n-1} = i_{n-1}$ at the preceding time $n-1$. It is independent of the states $X_{n-2} = i_{n-2}, \ldots, X_1 = i_1, X_0 = i_0$ observed in the earlier history of the Markov chain.
- The definition of the conditional probability immediately implies the equivalence of (5) and
$$P(X_n = i_n, X_{n-2} = i_{n-2}, \ldots, X_0 = i_0 \mid X_{n-1} = i_{n-1}) = P(X_n = i_n \mid X_{n-1} = i_{n-1})\; P(X_{n-2} = i_{n-2}, \ldots, X_0 = i_0 \mid X_{n-1} = i_{n-1}) \,. \tag{6}$$
2.1.2 Examples
1. Weather Forecast
(see O. Häggström (2002) Finite Markov Chains and Algorithmic Applications. Cambridge University Press, Cambridge)
- We assume to observe the weather in an area whose typical weather is characterized by longer periods of rainy or dry days (denoted by "rain" and "sunshine"), where rain and sunshine exhibit approximately the same relative frequency over the entire year.
- It is sometimes claimed that the best way to predict tomorrow's weather is simply to guess that it will be the same tomorrow as it is today.
- If we assume that this way of predicting the weather will be correct in 75% of the cases (regardless of whether today's weather is rain or sunshine), then the weather can be easily modelled by a Markov chain.
- The state space consists of the two states 1 = "rain" and 2 = "sunshine". The transition matrix is given as follows:
$$\mathbf{P} = \begin{pmatrix} 0.75 & 0.25 \\ 0.25 & 0.75 \end{pmatrix} . \tag{7}$$
- Note that a crucial assumption for this model is the perfect symmetry between rain and sunshine in the sense that the probability that today's weather will persist tomorrow is the same regardless of today's weather.
- In areas where sunshine is much more common than rain a more realistic transition matrix would be the following:
$$\mathbf{P} = \begin{pmatrix} 0.5 & 0.5 \\ 0.1 & 0.9 \end{pmatrix} . \tag{8}$$
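The two-state weather model is easy to simulate. The following Python sketch (Python and all identifiers are our own illustration, not part of the original notes) estimates the long-run fraction of rainy days under the symmetric matrix (7), which should be close to 1/2:

```python
import random

def simulate_weather(p_stay=0.75, n_steps=100_000, seed=42):
    """Simulate the two-state weather chain with transition matrix (7):
    each state persists with probability p_stay."""
    rng = random.Random(seed)
    state = 1                       # 1 = rain, 2 = sunshine
    rain_days = 0
    for _ in range(n_steps):
        if rng.random() >= p_stay:  # leave the current state with prob. 1 - p_stay
            state = 3 - state       # switch between states 1 and 2
        rain_days += (state == 1)
    return rain_days / n_steps

frac = simulate_weather()
# By the symmetry of (7), rain occurs with limit frequency 1/2.
```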
3. Queues
The number of customers waiting in front of an arbitrary but fixed checkout desk in a grocery store can be modelled by a Markov chain in the following way:
- Let $X_0 = 0$ be the number of customers waiting in the line when the store opens.
- By $Z_n$ we denote the random number of new customers arriving while the cashier is serving the $n$-th customer ($n = 1, 2, \ldots$).
- We assume the random variables $Z, Z_1, Z_2, \ldots : \Omega \to \{0, 1, \ldots\}$ to be independent and identically distributed.
- The recursive definition
$$X_n = \max\{0,\, X_{n-1} + Z_n - 1\} \,, \qquad n \geq 1 \,, \tag{10}$$
yields a sequence of random variables $X_0, X_1, \ldots : \Omega \to \{0, 1, \ldots\}$ that is a Markov chain whose transition matrix $\mathbf{P} = (p_{ij})$ has the entries
$$p_{ij} = \begin{cases} P(Z = j + 1 - i) \,, & \text{if } j + 1 \geq i > 0 \text{ or } j > i = 0, \\ P(Z = 0) + P(Z = 1) \,, & \text{if } j = i = 0, \\ 0 \,, & \text{else.} \end{cases}$$
- $X_n$ denotes the random number of customers waiting in the line right after the cashier has finished serving the $n$-th customer, i.e., the customer who has just started checking out and hence already left the line is not counted any more.
4. Branching Processes
- We consider the reproduction process of a certain population, where $X_n$ denotes the total number of descendants in the $n$-th generation; $X_0 = 1$.
- We assume that
$$X_n = \sum_{i=1}^{X_{n-1}} Z_{n,i} \,, \tag{11}$$
where $\{Z_{n,i},\ n, i \in \mathbb{N}\}$ is a set of independent and identically distributed random variables mapping into the set $E = \{0, 1, \ldots\}$. The random variable $Z_{n,i}$ is the random number of descendants of individual $i$ in generation $n - 1$.
- The sequence $X_0, X_1, \ldots : \Omega \to \{0, 1, \ldots\}$ of random variables given by $X_0 = 1$ and the recursion (11) is called a branching process.
- One can show (see Section 2.1.3) that
$$p_{ij} = \begin{cases} P\Bigl( \sum_{k=1}^{i} Z_{1,k} = j \Bigr) \,, & \text{if } i > 0, \\ 1 \,, & \text{if } i = j = 0, \\ 0 \,, & \text{else.} \end{cases} \tag{12}$$
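The recursion (11) can be simulated generation by generation. A short Python sketch (our own illustration; the offspring distribution below is a hypothetical subcritical example with $E[Z] = 0.9 < 1$, so state $0$ is eventually absorbing):

```python
import random

def branching_step(x, offspring_probs, rng):
    """One generation of (11): each of the x individuals independently
    produces k descendants with probability offspring_probs[k]."""
    values = list(range(len(offspring_probs)))
    return sum(rng.choices(values, weights=offspring_probs)[0] for _ in range(x))

def simulate_branching(offspring_probs, n_generations=30, seed=7):
    rng = random.Random(seed)
    x = 1                 # X_0 = 1
    sizes = [x]
    for _ in range(n_generations):
        x = branching_step(x, offspring_probs, rng)
        sizes.append(x)
        if x == 0:        # state 0 is absorbing: p_00 = 1 by (12)
            break
    return sizes

sizes = simulate_branching([0.3, 0.5, 0.2])  # E[Z] = 0.9
```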
- Let $Z, Z_1, Z_2, \ldots$ be independent and identically distributed random variables that are uniformly distributed on $\{1, \ldots, 6\}$ (e.g. results of repeated dice rolls), i.e. $P(Z = k) = 1/6$ for $k = 1, \ldots, 6$.
- The sequence $X_0, X_1, \ldots : \Omega \to \{0, 1, \ldots, 999\}$ of random variables defined by the recursion formula
$$X_n = (X_{n-1} + Z_n) \bmod 1000 \tag{13}$$
is a Markov chain whose transition matrix $\mathbf{P} = (p_{ij})$ has the entries
$$p_{ij} = \begin{cases} 1/6 \,, & \text{if } (j + 1000 - i) \bmod 1000 \in \{1, \ldots, 6\}, \\ 0 \,, & \text{else.} \end{cases}$$
2.1.3 Recursive Representation
$$X_n = \varphi(X_{n-1}, Z_n) \,, \qquad n \geq 1 \,, \tag{14}$$
- where the last equality follows from the transformation theorem for independent and identically distributed random variables (see Theorem WR-3.18), as the random variables $X_0, \ldots, X_{n-1}$ are functions of $Z_1, \ldots, Z_{n-1}$ and hence independent of $\varphi(i_{n-1}, Z_n)$.
- In the same way one concludes that
$$P(\varphi(i_{n-1}, Z_n) = i_n) = p_{i_{n-1} i_n} \,.$$
Remarks
- The proof of Theorem 2.2 yields that the conditional probability $p_{ij} = P(X_n = j \mid X_{n-1} = i)$ is given by $p_{ij} = P(\varphi(i, Z_n) = j)$; $p_{ij}$ does not depend on $n$, as the innovations $Z_n$ are identically distributed.
- Moreover, the joint probability $P(X_0 = i_0, X_1 = i_1, \ldots, X_n = i_n)$ is given by
$$P(X_0 = i_0, X_1 = i_1, \ldots, X_n = i_n) = \alpha_{i_0}\, p_{i_0 i_1} \cdots p_{i_{n-1} i_n} \,, \tag{15}$$
where $\alpha_{i_0} = P(X_0 = i_0)$.
- Consequently, the sequence $X_0, X_1, \ldots$ of random variables given by the recursive definition (14) is a Markov chain following the definition given in (3).

Our next step will be to show that, vice versa, every Markov chain can be regarded as the solution of a recursive stochastic equation.
- Let $X_0, X_1, \ldots : \Omega \to E$ be a Markov chain with state space $E = \{1, 2, \ldots, \ell\}$, initial distribution $\alpha = (\alpha_1, \ldots, \alpha_\ell)^{\top}$ and transition matrix $\mathbf{P} = (p_{ij})$.
- Based on a recursive equation of the form (14) we will construct a Markov chain $X_0', X_1', \ldots$ with initial distribution $\alpha$ and transition matrix $\mathbf{P}$ such that
$$P(X_0 = i_0, \ldots, X_n = i_n) = P(X_0' = i_0, \ldots, X_n' = i_n) \,, \qquad i_0, \ldots, i_n \in E \,, \tag{16}$$
for all $n \geq 0$:

1. We start with a sequence $Z_0, Z_1, \ldots$ of independent random variables that are uniformly distributed on the interval $(0, 1]$.
2. First of all the $E$-valued random variable $X_0'$ is defined as follows:
$$X_0' = k \quad \text{if and only if} \quad Z_0 \in \Bigl( \sum_{i=1}^{k-1} \alpha_i \,, \; \sum_{i=1}^{k} \alpha_i \Bigr] \,,$$
i.e.,
$$X_0' = \sum_{k=1}^{\ell} k \, 1\!\mathrm{I} \Bigl( \sum_{i=1}^{k-1} \alpha_i < Z_0 \leq \sum_{i=1}^{k} \alpha_i \Bigr) \,. \tag{17}$$
3. The random variables $X_1', X_2', \ldots$ are defined by the recursive equation
$$X_n' = \varphi(X_{n-1}', Z_n) \,, \tag{18}$$
where the function $\varphi : E \times (0, 1] \to E$ is given by
$$\varphi(i, z) = \sum_{k=1}^{\ell} k \, 1\!\mathrm{I} \Bigl( \sum_{j=1}^{k-1} p_{ij} < z \leq \sum_{j=1}^{k} p_{ij} \Bigr) \,. \tag{19}$$

It is easy to see that the probabilities $P(X_0' = i_0, X_1' = i_1, \ldots, X_n' = i_n)$ for the sequence $\{X_n'\}$ defined by (17)–(18) are given by (3), i.e., $\{X_n'\}$ is a Markov chain with initial distribution $\alpha$ and transition matrix $\mathbf{P}$.
Remarks
- If (16) holds for two sequences $\{X_i\}$ and $\{X_i'\}$ of random variables, these sequences are called stochastically equivalent.
- The construction principle (17)–(19) can be exploited for the Monte-Carlo simulation of Markov chains with given initial distribution $\alpha$ and transition matrix $\mathbf{P}$.
- Markov chains on a countably infinite state space can be constructed and simulated in the same way. However, in this case (17)–(19) need to be modified by considering vectors $\alpha$ and matrices $\mathbf{P}$ of infinite dimension.
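The construction principle (17)–(19) is exactly inverse-transform sampling of the initial distribution and of the rows of $\mathbf{P}$. A minimal Python sketch (function names and the example chain are our own illustration):

```python
import bisect
import random
from itertools import accumulate

def sample_from(cum_probs, u):
    """Smallest 1-based k with cum_probs[k-2] < u <= cum_probs[k-1]:
    the indicator sums in (17) and (19)."""
    return min(bisect.bisect_left(cum_probs, u), len(cum_probs) - 1) + 1

def simulate_chain(alpha, P, n_steps, seed=0):
    """Construct X'_0, X'_1, ... from i.i.d. uniform variables Z_0, Z_1, ...
    via the recursive construction (17)-(19)."""
    rng = random.Random(seed)
    cum_alpha = list(accumulate(alpha))              # partial sums of alpha
    cum_rows = [list(accumulate(row)) for row in P]  # partial sums of each row
    x = sample_from(cum_alpha, rng.random())         # initial state, cf. (17)
    path = [x]
    for _ in range(n_steps):
        x = sample_from(cum_rows[x - 1], rng.random())  # transition, cf. (18)-(19)
        path.append(x)
    return path

# weather chain (7) with symmetric initial distribution
path = simulate_chain([0.5, 0.5], [[0.75, 0.25], [0.25, 0.75]], 1000)
```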
2.1.4
Let $X_0, X_1, \ldots : \Omega \to E$ be a Markov chain on the state space $E = \{1, 2, \ldots, \ell\}$ with initial distribution $\alpha = (\alpha_1, \ldots, \alpha_\ell)^{\top}$ and transition matrix $\mathbf{P} = (p_{ij})$.
- For arbitrary but fixed $n \geq 1$ and $i, j \in E$ the product $p_{i i_1} p_{i_1 i_2} \cdots p_{i_{n-1} j}$ can be interpreted as the probability of the path $i \to i_1 \to \ldots \to i_{n-1} \to j$.
- Consequently, the probability of the transition from state $i$ to state $j$ within $n$ steps is given by the sum
$$p_{ij}^{(n)} = \sum_{i_1, \ldots, i_{n-1} \in E} p_{i i_1} p_{i_1 i_2} \cdots p_{i_{n-1} j} \,, \tag{20}$$
where
$$p_{ij}^{(n)} = P(X_n = j \mid X_0 = i) \quad \text{if } P(X_0 = i) > 0. \tag{21}$$
Remarks
- The matrix $\mathbf{P}^{(n)} = (p_{ij}^{(n)})_{i,j = 1, \ldots, \ell}$ is called the $n$-step transition matrix of the Markov chain $\{X_n\}$.
- If we introduce the convention $\mathbf{P}^{(0)} = \mathbf{I}$, where $\mathbf{I}$ denotes the $\ell \times \ell$-dimensional identity matrix, then $\mathbf{P}^{(n)}$ has the following representation formulae.

Lemma 2.1. The equations
$$\mathbf{P}^{(n)} = \mathbf{P}^n \tag{22}$$
and
$$\mathbf{P}^{(n+m)} = \mathbf{P}^{(n)} \mathbf{P}^{(m)} \,, \qquad n, m \geq 0 \,, \tag{23}$$
hold. Equation (22) is an immediate consequence of (20) and the definition of matrix multiplication; (23) then follows from $\mathbf{P}^{n+m} = \mathbf{P}^n \mathbf{P}^m$.
Example (Weather Forecast)
For the general two-state weather model with
$$\mathbf{P} = \begin{pmatrix} 1 - p & p \\ p' & 1 - p' \end{pmatrix}$$
a direct calculation yields
$$\mathbf{P}^n = \frac{1}{p + p'} \begin{pmatrix} p' & p \\ p' & p \end{pmatrix} + \frac{(1 - p - p')^n}{p + p'} \begin{pmatrix} p & -p \\ -p' & p' \end{pmatrix} \,.$$
Remarks
- The matrix identity (23) is called the Chapman–Kolmogorov equation in the literature.
- Formula (23) yields the following useful inequalities.

Corollary 2.2. For arbitrary $n, m, r = 0, 1, \ldots$ and $i, j, k \in E$,
$$p_{ii}^{(n+m)} \geq p_{ij}^{(n)}\, p_{ji}^{(m)} \tag{24}$$
and
$$p_{ij}^{(r+n+m)} \geq p_{ik}^{(r)}\, p_{kk}^{(n)}\, p_{kj}^{(m)} \,. \tag{25}$$
Furthermore, Lemma 2.1 allows the following representation of the distribution of $X_n$. Recall that $X_n$ denotes the state of the Markov chain at step $n$.
Theorem 2.3. Let $X_0, X_1, \ldots$ be a Markov chain with state space $E = \{1, \ldots, \ell\}$, initial distribution $\alpha$ and one-step transition matrix $\mathbf{P}$. Then the vector $\alpha_n = (\alpha_{n1}, \ldots, \alpha_{n\ell})^{\top}$ of the probabilities $\alpha_{ni} = P(X_n = i)$ is given by the equation
$$\alpha_n^{\top} = \alpha^{\top} \mathbf{P}^n \,. \tag{26}$$

Proof. From the formula of total probability (see Theorem WR-2.6) and (21) we conclude that
$$P(X_n = j) = \sum_{i \in E} \alpha_i\, P(X_n = j \mid X_0 = i) = \sum_{i \in E} \alpha_i\, p_{ij}^{(n)} \,,$$
which by (22) is the $j$-th component of $\alpha^{\top} \mathbf{P}^n$. $\Box$
Remarks
- Due to Theorem 2.3 the probabilities $\alpha_{ni} = P(X_n = i)$ can be calculated via the $n$-th power $\mathbf{P}^n$ of the transition matrix $\mathbf{P}$.
- In this context it is often useful to find a so-called spectral representation of $\mathbf{P}^n$. It can be constructed by using the eigenvalues and a basis of eigenvectors of the transition matrix as follows. Note that there are matrices having no spectral representation.
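Equation (26) can also be evaluated directly by repeated vector–matrix multiplication. A small Python sketch (our own illustration, using the weather chain (7) as a test case):

```python
def vec_mat(v, P):
    """One step of (26): alpha_n^T = alpha_{n-1}^T P."""
    l = len(v)
    return [sum(v[i] * P[i][j] for i in range(l)) for j in range(l)]

def distribution_at(alpha, P, n):
    """Distribution of X_n via n successive multiplications: alpha^T P^n."""
    v = list(alpha)
    for _ in range(n):
        v = vec_mat(v, P)
    return v

# Weather chain (7) started in state 'rain'; since the second eigenvalue
# is 0.5, the distribution approaches (1/2, 1/2) geometrically fast.
alpha_n = distribution_at([1.0, 0.0], [[0.75, 0.25], [0.25, 0.75]], 20)
```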
A short recapitulation
- Let $\mathbf{A}$ be a (not necessarily stochastic) $\ell \times \ell$ matrix, let $\phi \neq 0$, $\psi \neq 0$ be two $\ell$-dimensional (column) vectors such that for each of them at least one of their components is different from $0$, and let $\theta$ be an arbitrary (real or complex) number.
- If
$$\phi^{\top} \mathbf{A} = \theta\, \phi^{\top} \qquad \text{and} \qquad \mathbf{A} \psi = \theta\, \psi \,, \tag{27}$$
then $\theta$ is an eigenvalue of $\mathbf{A}$, and $\phi$ and $\psi$ are left and right eigenvectors (for $\theta$).
- As (27) is equivalent to
$$(\mathbf{A} - \theta \mathbf{I})\, \psi = 0 \qquad \text{and} \qquad \phi^{\top} (\mathbf{A} - \theta \mathbf{I}) = 0 \,,$$
the eigenvalues are exactly the solutions of the characteristic equation
$$\det(\mathbf{A} - \theta \mathbf{I}) = 0 \,. \tag{28}$$
- Note that the determinant in (28) is a polynomial of order $\ell$. Thus, the algebraic equation (28) has $\ell$ possibly complex solutions $\theta_1, \ldots, \theta_\ell$. These solutions might not be all different from each other.
- W.l.o.g. we may assume the eigenvalues $\theta_1, \ldots, \theta_\ell$ to be ordered such that
$$|\theta_1| \geq |\theta_2| \geq \ldots \geq |\theta_\ell| \,.$$
- For every eigenvalue $\theta_i$ left and right eigenvectors $\phi_i$ and $\psi_i$, respectively, can be found.
Let
$$\Phi = \begin{pmatrix} \phi_1^{\top} \\ \vdots \\ \phi_\ell^{\top} \end{pmatrix}$$
be the $\ell \times \ell$ matrix formed by the left eigenvectors $\phi_1, \ldots, \phi_\ell$.
- By definition of the eigenvectors,
$$\Phi \mathbf{A} = \operatorname{diag}(\theta)\, \Phi \,, \tag{29}$$
where $\theta = (\theta_1, \ldots, \theta_\ell)^{\top}$.
- If $\Phi$ is invertible, then (29) implies $\mathbf{A} = \Phi^{-1} \operatorname{diag}(\theta)\, \Phi$ and hence
$$\mathbf{A}^n = \bigl( \Phi^{-1} \operatorname{diag}(\theta)\, \Phi \bigr)^n = \Phi^{-1} \bigl( \operatorname{diag}(\theta) \bigr)^n \Phi \,.$$
- Writing $\psi_1, \ldots, \psi_\ell$ for the columns of $\Phi^{-1}$ (which are right eigenvectors of $\mathbf{A}$), this spectral representation takes the form
$$\mathbf{A}^n = \sum_{i=1}^{\ell} \theta_i^n\, \psi_i\, \phi_i^{\top} \,. \tag{30}$$
Remarks
- An application of (30) to the transition matrix $\mathbf{A} = \mathbf{P}$ results in a simple algorithm for calculating the $n$-th power $\mathbf{P}^n$ in (26).
- For the necessary calculation of the eigenvalues and eigenvectors of $\mathbf{P}$, standard software like MAPLE, MATLAB or MATHEMATICA can be used.
- A striking advantage of the spectral representation (30) can be seen in the fact that the complexity of the numerical calculation of $\mathbf{P}^n$ stays constant if $n$ is increased.
- However, the derivation of (30) requires the eigenvectors $\psi_1, \ldots, \psi_\ell$ to be linearly independent. The next lemma gives a sufficient condition for the linear independence of eigenvectors.
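For the two-state weather chain the spectral representation (30) can be written out in closed form, since the eigenvalues are $1$ and $1 - p - p'$. A Python sketch (our own illustration; $p'$ is written `q`, and the eigenvector normalisation follows (31)):

```python
def two_state_power(p, q, n):
    """n-th power of P = [[1-p, p], [q, 1-q]] via the spectral
    representation (30): eigenvalues theta_1 = 1 and theta_2 = 1 - p - q,
    with eigenvectors normalised so that phi_i^T psi_j = delta_ij, cf. (31).
    theta_1 = 1:      psi_1 = (1, 1)^T,       phi_1 = (q/s, p/s)^T
    theta_2 = 1-p-q:  psi_2 = (p/s, -q/s)^T,  phi_2 = (1, -1)^T"""
    s = p + q
    t2n = (1.0 - s) ** n
    # P^n = 1 * psi_1 phi_1^T + theta_2^n * psi_2 phi_2^T
    return [[q / s + t2n * p / s, p / s - t2n * p / s],
            [q / s - t2n * q / s, p / s + t2n * q / s]]
```

For `p = q = 0.25` and `n = 2` this reproduces the direct matrix square of (7), e.g. entry $(1,1)$ equals $0.75^2 + 0.25^2 = 0.625$.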
Lemma 2.2
- If all eigenvalues $\theta_1, \ldots, \theta_\ell$ of $\mathbf{A}$ are pairwise distinct, every family of corresponding right eigenvectors $\psi_1, \ldots, \psi_\ell$ is linearly independent.
- Furthermore, if the left eigenvectors $\phi_1, \ldots, \phi_\ell$ are given by $\Phi = \Psi^{-1}$, where $\Psi = (\psi_1, \ldots, \psi_\ell)$, it holds that
$$\phi_i^{\top} \psi_j = \begin{cases} 1 & \text{if } i = j, \\ 0 & \text{if } i \neq j. \end{cases} \tag{31}$$

Proof
- The first statement will be proved by complete induction.
- As every eigenvector $\psi_1$ has at least one non-zero component, $a_1 \psi_1 = 0$ implies $a_1 = 0$.
2 MARKOV CHAINS
16
Let now all eigenvalues 1 , . . . , ` of A be pairwise different and let the eigenvectors 1 , . . . , k1
be linearly independent for a certain k ` .
In order to show the independence of 1 , . . . , k it suffices to show that
k
X
aj j = 0
(32)
j=1
implies a1 = . . . = ak = 0.
Let a1 , . . . , ak be such that (32) holds. This also implies
0 = A0 =
k
X
aj Aj =
j=1
k
X
aj j j .
j=1
k
X
aj j =
j=1
and thus
0=
k1
X
k
X
k aj j
j=1
(k j )aj j .
j=1
2.2
2.2.1
If the Markov chain $X_0, X_1, \ldots$ has a very large number $\ell$ of possible states, the spectral representation (30) of the $n$-step transition matrix $\mathbf{P}^{(n)} = \mathbf{P}^n$ discussed in Section 2.1.4 turns out to be inappropriate for calculating the probabilities $p_{ij}^{(n)}$ and $P(X_n = j)$ numerically. In this case one looks for conditions
- ensuring the existence of the limits $\lim_{n \to \infty} p_{ij}^{(n)}$ and $\lim_{n \to \infty} P(X_n = j)$, respectively, as well as their equality and independence of $i$,
- thus justifying to consider the limit $\pi_j = \lim_{n \to \infty} p_{ij}^{(n)} = \lim_{n \to \infty} P(X_n = j)$ as an approximation of $p_{ij}^{(n)}$ and $P(X_n = j)$ if $n \gg 1$.

This serves as a motivation to formally introduce the notion of the ergodicity of Markov chains.
Definition. The Markov chain $X_0, X_1, \ldots$ with transition matrix $\mathbf{P} = (p_{ij})$ and the corresponding $n$-step transition matrices $\mathbf{P}^{(n)} = (p_{ij}^{(n)})\ (= \mathbf{P}^n)$ is called ergodic if the limits
$$\pi_j = \lim_{n \to \infty} p_{ij}^{(n)} \tag{33}$$
exist for all $j \in E$, are positive and independent of $i \in E$, and form a probability function, i.e. $\sum_{j \in E} \pi_j = 1$.
Example (Weather Forecast)
- In order to illustrate the notion of an ergodic Markov chain we return to the simple example of weather forecast already discussed in Sections 2.1.2 and 2.1.4.
- Let $E = \{1, 2\}$ and
$$\mathbf{P} = \begin{pmatrix} 1 - p & p \\ p' & 1 - p' \end{pmatrix} \,, \qquad \mathbf{P}^n = \frac{1}{p + p'} \begin{pmatrix} p' & p \\ p' & p \end{pmatrix} + \frac{(1 - p - p')^n}{p + p'} \begin{pmatrix} p & -p \\ -p' & p' \end{pmatrix} \,.$$
- If $0 < p + p' < 2$, then $|1 - p - p'| < 1$ and hence
$$\lim_{n \to \infty} \mathbf{P}^n = \frac{1}{p + p'} \begin{pmatrix} p' & p \\ p' & p \end{pmatrix} \qquad \text{and} \qquad \pi = \lim_{n \to \infty} \alpha_n = \Bigl( \frac{p'}{p + p'} \,, \; \frac{p}{p + p'} \Bigr)^{\top} \,, \tag{34}$$
respectively. Note that the limit distribution in (34) does not depend on the choice of the initial distribution $\alpha\ (= \alpha_0)$.
- However, if $p + p' = 2$, then
$$\mathbf{P}^n = \begin{cases} \mathbf{P} & \text{if } n \text{ is odd,} \\ \mathbf{I} & \text{if } n \text{ is even,} \end{cases}$$
so that the limits in (33) do not exist and the Markov chain is not ergodic.
The ergodicity of Markov chains on an arbitrary finite state space can be characterized by the following notion from the theory of positive matrices.

Definition
- The $\ell \times \ell$ matrix $\mathbf{A} = (a_{ij})$ is called non-negative if all entries $a_{ij}$ of $\mathbf{A}$ are non-negative.
- The non-negative matrix $\mathbf{A}$ is called quasi-positive if there is a natural number $n_0 \geq 1$ such that all entries of $\mathbf{A}^{n_0}$ are positive.

Remark. If $\mathbf{A}$ is a stochastic matrix and we can find a natural number $n_0 \geq 1$ such that all entries of $\mathbf{A}^{n_0}$ are positive, then it is easy to see that for all natural numbers $n \geq n_0$ all entries of $\mathbf{A}^n$ are positive.
Theorem 2.4. The Markov chain $X_0, X_1, \ldots$ with state space $E = \{1, \ldots, \ell\}$ and transition matrix $\mathbf{P}$ is ergodic if and only if $\mathbf{P}$ is quasi-positive.

Proof
- First of all we show that the condition
$$a := \min_{i,j \in E} p_{ij}^{(n_0)} > 0 \quad \text{for some } n_0 \geq 1 \tag{35}$$
is sufficient for ergodicity.
- Let $m_j^{(n)} = \min_{i \in E} p_{ij}^{(n)}$ and $M_j^{(n)} = \max_{i \in E} p_{ij}^{(n)}$. The Chapman–Kolmogorov equation (23) gives
$$p_{ij}^{(n+1)} = \sum_{k \in E} p_{ik}\, p_{kj}^{(n)} \geq m_j^{(n)}$$
and thus
$$m_j^{(n+1)} = \min_{i} p_{ij}^{(n+1)} \geq m_j^{(n)} \,,$$
i.e., $m_j^{(n)} \leq m_j^{(n+1)}$; in the same way one obtains $M_j^{(n)} \geq M_j^{(n+1)}$ for all $n \geq 0$.
- Consequently, in order to show the existence of the limits $\pi_j$ in (33) it suffices to show that for all $j \in E$
$$\lim_{n \to \infty} \bigl( M_j^{(n)} - m_j^{(n)} \bigr) = 0 \,. \tag{36}$$
- For this purpose we consider the sets $E' = \{ k \in E : p_{i_0 k}^{(n_0)} \geq p_{j_0 k}^{(n_0)} \}$ and $E'' = E \setminus E'$ for arbitrary but fixed states $i_0, j_0 \in E$.
- As both rows of $\mathbf{P}^{(n_0)}$ sum to $1$,
$$\sum_{k \in E''} \bigl( p_{i_0 k}^{(n_0)} - p_{j_0 k}^{(n_0)} \bigr) = - \sum_{k \in E'} \bigl( p_{i_0 k}^{(n_0)} - p_{j_0 k}^{(n_0)} \bigr) \,,$$
and, since $\sum_{k \in E''} p_{i_0 k}^{(n_0)} \geq |E''|\, a$ and $\sum_{k \in E'} p_{j_0 k}^{(n_0)} \geq |E'|\, a$,
$$\sum_{k \in E'} \bigl( p_{i_0 k}^{(n_0)} - p_{j_0 k}^{(n_0)} \bigr) \leq 1 - \ell a \,.$$
- By another application of the Chapman–Kolmogorov equation (23) this yields for arbitrary $n \geq 0$ and $j \in E$
$$\begin{aligned} p_{i_0 j}^{(n_0 + n)} - p_{j_0 j}^{(n_0 + n)} &= \sum_{k \in E} \bigl( p_{i_0 k}^{(n_0)} - p_{j_0 k}^{(n_0)} \bigr)\, p_{kj}^{(n)} \\ &= \sum_{k \in E'} \bigl( p_{i_0 k}^{(n_0)} - p_{j_0 k}^{(n_0)} \bigr)\, p_{kj}^{(n)} + \sum_{k \in E''} \bigl( p_{i_0 k}^{(n_0)} - p_{j_0 k}^{(n_0)} \bigr)\, p_{kj}^{(n)} \\ &\leq \sum_{k \in E'} \bigl( p_{i_0 k}^{(n_0)} - p_{j_0 k}^{(n_0)} \bigr)\, M_j^{(n)} + \sum_{k \in E''} \bigl( p_{i_0 k}^{(n_0)} - p_{j_0 k}^{(n_0)} \bigr)\, m_j^{(n)} \\ &= \sum_{k \in E'} \bigl( p_{i_0 k}^{(n_0)} - p_{j_0 k}^{(n_0)} \bigr) \bigl( M_j^{(n)} - m_j^{(n)} \bigr) \\ &\leq (1 - \ell a) \bigl( M_j^{(n)} - m_j^{(n)} \bigr) \,. \end{aligned}$$
- As $i_0, j_0 \in E$ were arbitrary, this implies $M_j^{(n_0 + n)} - m_j^{(n_0 + n)} \leq (1 - \ell a) \bigl( M_j^{(n)} - m_j^{(n)} \bigr)$ and, by iteration, for any $k \geq 1$
$$M_j^{(k n_0 + n)} - m_j^{(k n_0 + n)} \leq \bigl( M_j^{(n)} - m_j^{(n)} \bigr) (1 - \ell a)^k \,. \tag{37}$$
- This ensures the existence of an (unbounded) sequence $n_1, n_2, \ldots$ such that for all $j \in E$
$$\lim_{k \to \infty} \bigl( M_j^{(n_k)} - m_j^{(n_k)} \bigr) = 0 \,; \tag{38}$$
as $M_j^{(n)} - m_j^{(n)}$ is monotonically non-increasing in $n$, this proves (36). Moreover, $\pi_j \geq m_j^{(n_0)} \geq a > 0$.
- Furthermore, $\sum_{j \in E} \pi_j = \sum_{j \in E} \lim_{n \to \infty} p_{ij}^{(n)} = \lim_{n \to \infty} \sum_{j \in E} p_{ij}^{(n)} = 1$, as the sum consists of finitely many summands.
- It follows immediately from $\min_{j \in E} \pi_j > 0$ and (33) that the condition (35) is necessary for ergodicity if one takes into account that the state space $E$ is finite. $\Box$
Remarks
- As the limits $\pi_j = \lim_{n \to \infty} p_{ij}^{(n)}$ of ergodic Markov chains do not depend on $i$ and the state space $E = \{1, \ldots, \ell\}$ is finite, clearly
$$\lim_{n \to \infty} \alpha_n^{\top} = \alpha^{\top} \lim_{n \to \infty} \mathbf{P}^{(n)} = \pi^{\top} \,.$$
- The proof of Theorem 2.4 does not only show the existence of the limits $\pi_j = \lim_{n \to \infty} p_{ij}^{(n)}$ but also yields the following estimate for the rate of convergence: The inequality (37) implies
$$\sup_{i,j \in E} \bigl| p_{ij}^{(n)} - \pi_j \bigr| \leq \sup_{j \in E} \bigl( M_j^{(n)} - m_j^{(n)} \bigr) \leq (1 - \ell a)^{\lfloor n / n_0 \rfloor} \tag{39}$$
and hence
$$\max_{j \in E} \bigl| \alpha_{nj} - \pi_j \bigr| \leq (1 - \ell a)^{\lfloor n / n_0 \rfloor} \,. \tag{40}$$
Estimates like (39) and (40) are referred to as geometric bounds for the rate of convergence in the literature.
Now we will show that the limits $\pi_j = \lim_{n \to \infty} p_{ij}^{(n)}$ can be regarded as the solution of a system of linear equations.

Theorem 2.5. Let $X_0, X_1, \ldots$ be an ergodic Markov chain with state space $E = \{1, \ldots, \ell\}$ and transition matrix $\mathbf{P} = (p_{ij})$. Then the vector $\pi = (\pi_1, \ldots, \pi_\ell)^{\top}$ of the limits $\pi_j = \lim_{n \to \infty} p_{ij}^{(n)}$ is the uniquely determined (positive) solution of the linear equation system
$$\pi_j = \sum_{i \in E} \pi_i\, p_{ij} \,, \qquad j \in E \,, \tag{41}$$
for which the additional condition $\sum_{j \in E} \pi_j = 1$ is imposed.
Proof
- The definition (33) of the limits $\pi_j$ and the Chapman–Kolmogorov equation (23) imply, by changing the order of limit and sum, that
$$\pi_j \stackrel{(33)}{=} \lim_{n \to \infty} p_{kj}^{(n)} \stackrel{(23)}{=} \lim_{n \to \infty} \sum_{i \in E} p_{ki}^{(n-1)}\, p_{ij} = \sum_{i \in E} \Bigl( \lim_{n \to \infty} p_{ki}^{(n-1)} \Bigr) p_{ij} \stackrel{(33)}{=} \sum_{i \in E} \pi_i\, p_{ij} \,.$$
- In order to show uniqueness, let $\pi' = (\pi_1', \ldots, \pi_\ell')^{\top}$ be an arbitrary probability solution of (41). Iterating (41) yields
$$\pi_j' = \sum_{i \in E} \pi_i'\, p_{ij}^{(n)} \,, \qquad j \in E \,, \tag{42}$$
for all $n = 1, 2, \ldots$.
- In particular (42) implies
$$\pi_j' \stackrel{(42)}{=} \lim_{n \to \infty} \sum_{i \in E} \pi_i'\, p_{ij}^{(n)} = \sum_{i \in E} \pi_i' \lim_{n \to \infty} p_{ij}^{(n)} \stackrel{(33)}{=} \sum_{i \in E} \pi_i'\, \pi_j = \pi_j \,. \qquad \Box$$
Remarks
- In matrix notation the linear equation system (41) is of the form $\pi^{\top} = \pi^{\top} \mathbf{P}$.
- If the number $\ell$ of elements in the state space is reasonably small, this equation system can be used for the numerical calculation of the probability function $\pi$; see Section 2.2.5.
- In case $\ell \gg 1$, Monte-Carlo simulation turns out to be a more efficient method to determine $\pi$; see Section 3.3.
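For moderate $\ell$, the system (41) with the normalisation constraint can be solved by Gaussian elimination. The following self-contained Python sketch is our own illustration (one of many possible implementations, not the notes' prescribed method); it exploits that the $\ell$ equations of (41) are linearly dependent, so the last one can be replaced by $\sum_j \pi_j = 1$:

```python
def stationary_distribution(P):
    """Solve pi^T = pi^T P, i.e. (P^T - I) pi = 0 together with sum(pi) = 1,
    by Gaussian elimination with partial pivoting."""
    l = len(P)
    # rows of A are the equations sum_i pi_i p_ij - pi_j = 0 (A = P^T - I)
    A = [[P[i][j] - (1.0 if i == j else 0.0) for i in range(l)] for j in range(l)]
    b = [0.0] * l
    A[l - 1] = [1.0] * l   # replace the redundant last equation by sum_j pi_j = 1
    b[l - 1] = 1.0
    for col in range(l):                      # forward elimination
        piv = max(range(col, l), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, l):
            f = A[r][col] / A[col][col]
            for c in range(col, l):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    pi = [0.0] * l                            # back substitution
    for r in range(l - 1, -1, -1):
        pi[r] = (b[r] - sum(A[r][c] * pi[c] for c in range(r + 1, l))) / A[r][r]
    return pi

pi = stationary_distribution([[0.75, 0.25], [0.25, 0.75]])  # symmetric chain (7)
```

For the symmetric weather chain (7) this returns the uniform distribution $(1/2, 1/2)$.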
2.2.2
Recall:
- If $\{X_n\}$ is a Markov chain whose one-step transition matrix $\mathbf{P}$ has only strictly positive entries $p_{ij}$, then the geometric bound for the rate of convergence to the limit distribution $\pi = (\pi_1, \ldots, \pi_\ell)^{\top}$ derived in (40) is given as follows:
$$\max_{j \in E} \bigl| \alpha_{nj} - \pi_j \bigr| = O\bigl( (1 - \ell a)^n \bigr) \,, \tag{43}$$
where $a = \min_{i,j \in E} p_{ij}$.
Example (Weather Forecast)
- For
$$\mathbf{P} = \begin{pmatrix} 1 - p & p \\ p' & 1 - p' \end{pmatrix} \,, \qquad \mathbf{P}^n = \frac{1}{p + p'} \begin{pmatrix} p' & p \\ p' & p \end{pmatrix} + \frac{(1 - p - p')^n}{p + p'} \begin{pmatrix} p & -p \\ -p' & p' \end{pmatrix}$$
we have
$$\mathbf{P}^{\infty} := \lim_{n \to \infty} \mathbf{P}^n = \frac{1}{p + p'} \begin{pmatrix} p' & p \\ p' & p \end{pmatrix} \,.$$
- Consequently,
$$\mathbf{P}^n - \mathbf{P}^{\infty} = \frac{(1 - p - p')^n}{p + p'} \begin{pmatrix} p & -p \\ -p' & p' \end{pmatrix} = O\bigl( |1 - p - p'|^n \bigr) \,, \tag{44}$$
where $1 - p - p'$ is the second eigenvalue of $\mathbf{P}$: the characteristic equation
$$\det(\mathbf{P} - \theta \mathbf{I}) = (1 - p - \theta)(1 - p' - \theta) - p p' = 0$$
of $\mathbf{P}$ has the two solutions $\theta_1 = 1$ and $\theta_2 = 1 - p - p'$.
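The geometric rate (44) can be checked numerically; a short Python sketch (our own illustration, with hypothetical parameters $p = 0.5$, $p' = 0.1$, so the rate is $|1 - p - p'|^n = 0.4^n$):

```python
def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def convergence_gap(p, q, n):
    """Maximal entry of |P^n - P_inf| for the two-state chain with
    P = [[1-p, p], [q, 1-q]]; by (44) it is bounded by |1 - p - q|^n."""
    P = [[1 - p, p], [q, 1 - q]]
    Pn = [[1.0, 0.0], [0.0, 1.0]]
    for _ in range(n):
        Pn = matmul(Pn, P)
    s = p + q
    Pinf = [[q / s, p / s], [q / s, p / s]]
    return max(abs(Pn[i][j] - Pinf[i][j]) for i in range(2) for j in range(2))

gap = convergence_gap(0.5, 0.1, 10)  # bounded by 0.4**10
```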
In general, geometric estimates of the form (44) for the rate of convergence can be derived by means of the following so-called Perron–Frobenius theorem for quasi-positive matrices.

Theorem 2.6. Let $\mathbf{A}$ be a quasi-positive $\ell \times \ell$ matrix with eigenvalues $\theta_1, \ldots, \theta_\ell$ such that $|\theta_1| \geq \ldots \geq |\theta_\ell|$. Then the following holds: The eigenvalue $\theta_1$ is real, positive and simple, $|\theta_i| < \theta_1$ for all $i = 2, \ldots, \ell$, and the left and right eigenvectors $\phi_1$, $\psi_1$ for $\theta_1$ can be chosen such that all their components are positive.

Note that for a stochastic matrix $\mathbf{P}$ one has $\theta_1 = 1$, since every eigenvalue $\theta$ with right eigenvector $\psi$ satisfies
$$|\theta| \max_{j \in E} |\psi_j| \leq \max_{i \in E} \sum_{j=1}^{\ell} p_{ij}\, |\psi_j| \leq \max_{j \in E} |\psi_j| \,.$$

Corollary 2.4. Let $\mathbf{P}$ be a quasi-positive transition matrix such that all eigenvalues $\theta_1, \ldots, \theta_\ell$ of $\mathbf{P}$ are pairwise distinct. Then
$$\sup_{j \in E} \bigl| \alpha_{nj} - \pi_j \bigr| = O\bigl( |\theta_2|^n \bigr) \,. \tag{45}$$
Proof
- Corollary 2.3 implies
$$\lim_{n \to \infty} \sum_{i=2}^{\ell} \theta_i^n\, \psi_i\, \phi_i^{\top} = 0 \,, \tag{46}$$
since $|\theta_i| < \theta_1 = 1$ for $i = 2, \ldots, \ell$.
- On the other hand, the spectral representation (30) yields
$$\mathbf{P}^n = \sum_{i=1}^{\ell} \theta_i^n\, \psi_i\, \phi_i^{\top} \qquad \text{and thus} \qquad \mathbf{P}^n - 1^n\, \psi_1 \phi_1^{\top} = \sum_{i=2}^{\ell} \theta_i^n\, \psi_i\, \phi_i^{\top} \,.$$
- As $\alpha_n^{\top} = \alpha^{\top} \mathbf{P}^n$ (see Theorem 2.3), this together with (46) shows (45). $\Box$
Example (Reaching a Consensus)
- Consider a committee of $\ell$ experts who individually give initial projections $b_1^{(0)}, \ldots, b_\ell^{(0)}$ of an unknown parameter.
- After getting to know the projections of the other committee members, expert $i$ revises his own projection $b_i$, replacing it by the weighted average
$$b_i^{(1)} = \sum_{j=1}^{\ell} p_{ij}\, b_j^{(0)} \,, \tag{47}$$
where the weight $p_{ij}$ describes the confidence of expert $i$ in the competence of expert $j$, with $p_{ij} \geq 0$ and $\sum_{j=1}^{\ell} p_{ij} = 1$, i.e., $\mathbf{P} = (p_{ij})$ is a stochastic matrix.
- Iterating this procedure yields $b_i^{(n)} = \sum_{j=1}^{\ell} p_{ij}\, b_j^{(n-1)} = \sum_{j=1}^{\ell} p_{ij}^{(n)}\, b_j^{(0)}$. If $\mathbf{P}$ is quasi-positive with limit distribution $\pi$, then
$$\lim_{n \to \infty} b_i^{(n)} = \lim_{n \to \infty} \sum_{j=1}^{\ell} p_{ij}^{(n)}\, b_j^{(0)} = \sum_{j=1}^{\ell} \Bigl( \lim_{n \to \infty} p_{ij}^{(n)} \Bigr)\, b_j^{(0)} = \sum_{j=1}^{\ell} \pi_j\, b_j^{(0)} \,,$$
independently of $i$.
- The consensus, i.e. the common projection of the unknown parameter, reached by the committee is thus given by
$$b = \sum_{j=1}^{\ell} \pi_j\, b_j^{(0)} \,. \tag{48}$$
Remarks
- For large $\ell$ the algebraic solution of the linear equation system (41) can be difficult.
- In this case the estimates for the rate of convergence become relevant for the practical implementation of the method to reach a consensus described in (47)–(48).
We consider the following numerical example. Let $\ell = 3$ and
$$\mathbf{P} = \begin{pmatrix} 2/6 & 1/6 & 3/6 \\ 1/4 & 1/4 & 2/4 \\ 2/8 & 1/8 & 5/8 \end{pmatrix} \,. \tag{49}$$
- The entries of this stochastic matrix imply that the third expert has a particularly high reputation among his colleagues.
- The solution $\pi = (\pi_1, \pi_2, \pi_3)^{\top}$ of the corresponding linear equation system (41) is given by
$$\pi_1 = \frac{21}{77} \,, \qquad \pi_2 = \frac{12}{77} \,, \qquad \pi_3 = \frac{44}{77} \,.$$
- The eigenvalues of the transition matrix given in (49) are $\theta_1 = 1$, $\theta_2 = 1/8$ and $\theta_3 = 1/12$.
- The basis in the rate of convergence given by (43) is
$$1 - 3a = 1 - 3 \min_{i,j = 1,2,3} p_{ij} = 1 - \frac{3}{8} = \frac{5}{8} \,,$$
whereas Corollary 2.4 yields the following substantially improved geometric rate of convergence:
$$\max_{i \in \{1, \ldots, \ell\}} \bigl| b_i^{(n)} - b \bigr| = O\bigl( |\theta_2|^n \bigr) \,,$$
where $\theta_2 = 1/8$ denotes the second largest eigenvalue of the stochastic matrix $\mathbf{P}$ given by (49).
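The numbers in this example can be verified directly. The following Python sketch (our own check, not part of the notes) confirms $\pi^{\top} \mathbf{P} = \pi^{\top}$ and checks the eigenvalues via the trace and determinant of (49):

```python
P = [[2/6, 1/6, 3/6],
     [1/4, 1/4, 2/4],
     [2/8, 1/8, 5/8]]

# stationarity of the claimed solution pi = (21/77, 12/77, 44/77)
pi = [21/77, 12/77, 44/77]
pi_P = [sum(pi[i] * P[i][j] for i in range(3)) for j in range(3)]

# eigenvalues 1, 1/8, 1/12: their sum is the trace, their product the determinant
trace = sum(P[i][i] for i in range(3))                       # = 1 + 1/8 + 1/12
det = (P[0][0] * (P[1][1] * P[2][2] - P[1][2] * P[2][1])
       - P[0][1] * (P[1][0] * P[2][2] - P[1][2] * P[2][0])
       + P[0][2] * (P[1][0] * P[2][1] - P[1][1] * P[2][0]))  # = 1 * 1/8 * 1/12
```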
2.2.3
Recall that in Theorem 2.4 we characterized the ergodicity of the Markov chain $X_0, X_1, \ldots$ by the quasi-positivity of its transition matrix $\mathbf{P}$. However, it can be difficult to show this property of $\mathbf{P}$ directly, especially if $\ell \gg 1$. Therefore, we will derive another (probabilistic) way to characterize the ergodicity of a Markov chain with finite state space. For this purpose we will need the following notion.

Definition
- For arbitrary but fixed states $i, j \in E$ we say that the state $j$ is accessible from state $i$ if $p_{ij}^{(n)} > 0$ for some $n \geq 0$, where $\mathbf{P}^{(0)} = \mathbf{I}$ (notation: $i \to j$).
- Another (equivalent) definition of accessibility of states is the following: Let $\tau_j = \min\{ n \geq 0 : X_n = j \}$ be the number of steps until the Markov chain $\{X_n\}$ reaches the state $j \in E$ for the first time, where we define $\tau_j = \infty$ if $X_n \neq j$ for all $n \geq 0$.

Theorem 2.7. Let $i \in E$ be such that $P(X_0 = i) > 0$. In this case $j$ is accessible from $i \in E$ if and only if $P(\tau_j < \infty \mid X_0 = i) > 0$.
Proof
- The condition is obviously necessary because $\{X_n = j\} \subset \{\tau_j \leq n\} \subset \{\tau_j < \infty\}$ and thus $p_{ij}^{(n)} = P(X_n = j \mid X_0 = i) \leq P(\tau_j < \infty \mid X_0 = i)$ for all $n \geq 0$.
- It is also sufficient: if $p_{ij}^{(n)} = 0$ for all $n \geq 0$, then
$$P(\tau_j < \infty \mid X_0 = i) = \lim_{n \to \infty} P\Bigl( \bigcup_{k=0}^{n-1} \{X_k = j\} \Bigm| X_0 = i \Bigr) \leq \lim_{n \to \infty} \sum_{k=0}^{n-1} P(X_k = j \mid X_0 = i) = \lim_{n \to \infty} \sum_{k=0}^{n-1} p_{ij}^{(k)} = 0 \,. \qquad \Box$$
Remarks
- The property of accessibility is transitive, i.e., $i \to k$ and $k \to j$ imply that $i \to j$; this follows from the inequality $p_{ij}^{(r+m)} \geq p_{ik}^{(r)}\, p_{kj}^{(m)}$, a special case of (25).
- If $i \to j$ and $j \to i$, the states $i$ and $j$ are said to communicate. The Markov chain $\{X_n\}$ and its transition matrix $\mathbf{P}$ are called irreducible if all pairs of states $i, j \in E$ communicate.
Examples
- The definition of irreducibility immediately implies that the $2 \times 2$ matrices
$$\mathbf{P}_1 = \begin{pmatrix} 1/2 & 1/2 \\ 1/2 & 1/2 \end{pmatrix} \qquad \text{and} \qquad \mathbf{P}_2 = \begin{pmatrix} 1/2 & 1/2 \\ 1/4 & 3/4 \end{pmatrix}$$
are irreducible.
- On the other hand, the $4 \times 4$ block matrix $\mathbf{P}$ consisting of $\mathbf{P}_1$ and $\mathbf{P}_2$,
$$\mathbf{P} = \begin{pmatrix} \mathbf{P}_1 & 0 \\ 0 & \mathbf{P}_2 \end{pmatrix} \,,$$
is not irreducible.
Besides irreducibility we need a second property of the transition probabilities, namely the so-called aperiodicity,
in order to characterize the ergodicity of a Markov chain in a simple way.
Definition
- The period $d_i$ of the state $i \in E$ is given by $d_i = \gcd\{ n \geq 1 : p_{ii}^{(n)} > 0 \}$, where $\gcd$ denotes the greatest common divisor. We define $d_i = \infty$ if $p_{ii}^{(n)} = 0$ for all $n \geq 1$.
- A state $i \in E$ is said to be aperiodic if $d_i = 1$.
- The Markov chain $\{X_n\}$ and its transition matrix $\mathbf{P} = (p_{ij})$ are called aperiodic if all states of $\{X_n\}$ are aperiodic.

We will now show that the periods $d_i$ and $d_j$ coincide if the states $i, j$ belong to the same equivalence class of communicating states. For this purpose we introduce the notation $i \to j\,[n]$ if $p_{ij}^{(n)} > 0$.
Theorem 2.8. If the states $i, j \in E$ communicate, then $d_i = d_j$.

Proof
- If $j \to j\,[n]$, $i \to j\,[k]$ and $j \to i\,[m]$ for certain $k, m, n \geq 1$, then the inequalities from Corollary 2.2 imply that $i \to i\,[k + m]$ and $i \to i\,[k + m + n]$.
- Thus, $k + m$ and $k + m + n$ are divisible by $d_i$. As a consequence the difference $n = (k + m + n) - (k + m)$ is also divisible by $d_i$.
- This shows that $d_i$ is a common divisor of all natural numbers $n$ having the property that $p_{jj}^{(n)} > 0$, i.e. $d_i \leq d_j$.
- For reasons of symmetry the same argument also proves that $d_j \leq d_i$. $\Box$
Corollary 2.5. Let the Markov chain $\{X_n\}$ be irreducible. Then all states of $\{X_n\}$ have the same period.

In order to show that the characterization of an ergodic Markov chain (see Theorem 2.4) considered in Section 2.2.1 is equivalent to the Markov chain being irreducible and aperiodic, we need the following elementary lemma from number theory.

Lemma 2.3. Let $k = 1, 2, \ldots$ be an arbitrary but fixed natural number. Then there is a natural number $n_0 \geq 1$ such that
$$\{ n_0, n_0 + 1, n_0 + 2, \ldots \} \subset \{ n_1 k + n_2 (k + 1) ;\ n_1, n_2 \geq 0 \} \,.$$

Proof. If $n \geq k^2$, there are integers $m, d \geq 0$ such that $n - k^2 = mk + d$ and $d < k$. Therefore
$$n = (k - d + m)\, k + d\, (k + 1)$$
and hence $n \in \{ n_1 k + n_2 (k + 1) ;\ n_1, n_2 \geq 0 \}$, i.e., $n_0 = k^2$ is the desired number. $\Box$
Theorem 2.9. The Markov chain $X_0, X_1, \ldots$ with state space $E = \{1, \ldots, \ell\}$ and transition matrix $\mathbf{P}$ is ergodic if and only if $\mathbf{P}$ is irreducible and aperiodic.

Proof
- Let us first assume the transition matrix $\mathbf{P}$ to be irreducible and aperiodic.
- For every $i \in E$ we consider the set $J(i) = \{ n \geq 1 : p_{ii}^{(n)} > 0 \}$, whose greatest common divisor is $1$ as $\mathbf{P}$ is aperiodic.
- The inequalities from Corollary 2.2 yield
$$p_{ii}^{(n+m)} \geq p_{ii}^{(n)}\, p_{ii}^{(m)} \qquad \text{and hence} \qquad n + m \in J(i) \quad \text{if } n, m \in J(i). \tag{50}$$
- Together with Lemma 2.3 this implies that there is a natural number $n(i) \geq 1$ such that
$$p_{ii}^{(n)} > 0 \qquad \text{for all } n \geq n(i). \tag{52}$$
- This result, the irreducibility of $\mathbf{P}$ and the inequality (25) in Corollary 2.2, i.e.
$$p_{ij}^{(r+n+m)} \geq p_{ik}^{(r)}\, p_{kk}^{(n)}\, p_{kj}^{(m)} \,,$$
imply that for each pair $i, j \in E$ of states there is a natural number $n(i,j) \geq 1$ such that $p_{ij}^{(n)} > 0$ for all $n \geq n(i,j)$.
- Hence all entries of $\mathbf{P}^{n_0}$ are positive for $n_0 = \max_{i,j \in E} n(i,j)$, i.e. $\mathbf{P}$ is quasi-positive, and Theorem 2.4 yields the ergodicity of $\{X_n\}$.
- Conversely, if $\{X_n\}$ is ergodic, then by Theorem 2.4 all entries of $\mathbf{P}^n$ are positive for all sufficiently large $n$; in particular all states communicate and $d_i = 1$ for every $i \in E$, i.e. $\mathbf{P}$ is irreducible and aperiodic. $\Box$
Remarks
- A simple example of a non-irreducible Markov chain can be given by our well-known model for the weather forecast, where $E = \{1, 2\}$ and
$$\mathbf{P} = \begin{pmatrix} 1 - p & p \\ p' & 1 - p' \end{pmatrix} \,.$$
- If $p = 0$ or $p' = 0$, then the corresponding Markov chain is clearly not irreducible and therefore, by Theorem 2.9, not ergodic.
- It is nevertheless possible that the linear equation system
$$\pi^{\top} = \pi^{\top} \mathbf{P} \tag{53}$$
has a uniquely determined probability solution $\pi$.
Example (Ehrenfest's Diffusion Model)
- Consider a Markov chain defined by a recursion of the form
$$X_n = \varphi(X_{n-1}, Z_n) \,, \tag{54}$$
whose innovations $Z_1, Z_2, \ldots : \Omega \to D$ are not independent and identically distributed, but such that for every $n \in \mathbb{N}$ the random variable $Z_n$ is conditionally independent of the random variables $Z_1, \ldots, Z_{n-1}, X_0, \ldots, X_{n-2}$ given $X_{n-1}$, i.e., for arbitrary $n \in \mathbb{N}$, $i_0, i_1, \ldots, i_{n-1} \in E$ and $k_1, \ldots, k_n \in D$
$$P(Z_n = k_n, Z_{n-1} = k_{n-1}, \ldots, Z_1 = k_1, X_{n-1} = i_{n-1}, \ldots, X_0 = i_0) = P(Z_n = k_n \mid X_{n-1} = i_{n-1})\; P(Z_{n-1} = k_{n-1}, \ldots, Z_1 = k_1, X_{n-1} = i_{n-1}, \ldots, X_0 = i_0) \,,$$
where we define $P(Z_n = k_n \mid X_{n-1} = i_{n-1}) = 0$ if $P(X_{n-1} = i_{n-1}) = 0$. Moreover, we assume that for arbitrary $i \in E$ and $k \in D$ the probabilities $P(Z_n = k \mid X_{n-1} = i)$ do not depend on $n \in \mathbb{N}$.
- One can show that the sequence $X_0, X_1, \ldots : \Omega \to E$ recursively defined by (54) is a Markov chain whose transition matrix $\mathbf{P} = (p_{ij})$ is given by
$$p_{ij} = P(\varphi(i, Z_1) = j \mid X_0 = i) \,,$$
if $P(X_0 = i) > 0$ for all $i \in E$.
- In the diffusion model of Ehrenfest, $\ell$ particles are distributed over two containers, and $X_n \in E = \{0, 1, \ldots, \ell\}$ denotes the number of particles in the first container after the $n$-th step, where
$$X_n = X_{n-1} + Z_n \tag{55}$$
and, given $X_{n-1}$, the random variable $Z_n \in \{-1, 1\}$ is conditionally independent of $Z_1, \ldots, Z_{n-1}, X_0, \ldots, X_{n-2}$ with $P(Z_n = 1) + P(Z_n = -1) = 1$ and
$$P(Z_n = -1 \mid X_{n-1} = i) = \frac{i}{\ell} \qquad \text{if } P(X_{n-1} = i) > 0 \,.$$
- The entries $p_{ij}$ of the transition matrix $\mathbf{P} = (p_{ij})$ are therefore given by
$$p_{ij} = \begin{cases} \dfrac{\ell - i}{\ell} \,, & \text{if } i < \ell \text{ and } j = i + 1, \\[4pt] \dfrac{i}{\ell} \,, & \text{if } i > 0 \text{ and } j = i - 1, \\[4pt] 0 \,, & \text{else.} \end{cases}$$
- In particular this implies $d_i = \gcd\{ n \geq 1 : p_{ii}^{(n)} > 0 \} = 2$ for all $i \in \{0, 1, \ldots, \ell\}$, i.e. the Markov chain given by (55) is not aperiodic (and thus by Theorem 2.9 not ergodic).
- In spite of this, the linear equation system
$$\pi^{\top} = \pi^{\top} \mathbf{P} \tag{56}$$
has a (uniquely determined) probability solution $\pi^{\top} = (\pi_0, \ldots, \pi_\ell)$, where
$$\pi_i = \binom{\ell}{i} \Bigl( \frac{1}{2} \Bigr)^{\ell} \,, \qquad i \in \{0, 1, \ldots, \ell\} \,. \tag{57}$$
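A quick numerical check (our own Python sketch, not part of the notes) confirms that the binomial weights (57) solve (56) for a chosen $\ell$, even though the chain is periodic:

```python
from math import comb

def ehrenfest_matrix(l):
    """Transition matrix of the Ehrenfest chain on E = {0, 1, ..., l}:
    up with probability (l-i)/l, down with probability i/l."""
    P = [[0.0] * (l + 1) for _ in range(l + 1)]
    for i in range(l + 1):
        if i < l:
            P[i][i + 1] = (l - i) / l
        if i > 0:
            P[i][i - 1] = i / l
    return P

l = 6
P = ehrenfest_matrix(l)
pi = [comb(l, i) * 0.5 ** l for i in range(l + 1)]  # binomial solution (57)
pi_P = [sum(pi[i] * P[i][j] for i in range(l + 1)) for j in range(l + 1)]
# pi_P equals pi, i.e. pi solves (56), although the chain has period 2.
```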
Remarks
- The diffusion model of Ehrenfest is a special case of the following class of Markov chains, called birth and death processes with two reflecting barriers in the literature.
The state space considered is E = {0, 1, . . . , `} whereas the transition matrix P = (pij ) is given by
q
1
P=
1
r1
p1
q2
r2
..
.
p2
..
.
qi
..
ri
..
.
pi
..
.
..
q`1
r`1
p`1
(58)
p
+ ri i + qi+1 i+1 , if 0 < i < `,
i1 i1
i =
q1 1 ,
if i = 0,
p
if i = `.
`1 `1 ,
One can show that
      πi = π0 (p1 p2 ⋯ p_{i−1}) / (q1 q2 ⋯ qi) ,   i ∈ {1, . . . , ℓ} ,
where π0 > 0 is defined by the condition Σ_{i=0}^{ℓ} πi = 1, i.e.
      π0 ( 1 + 1/q1 + p1/(q1 q2) + … + (p1 p2 ⋯ p_{ℓ−1})/(q1 q2 ⋯ qℓ) ) = 1
and, consequently,
      π0 = ( 1 + 1/q1 + p1/(q1 q2) + … + (p1 p2 ⋯ p_{ℓ−1})/(q1 q2 ⋯ qℓ) )^{−1} .
As we assume pi > 0 and qi > 0 for all i ∈ {1, . . . , ℓ − 1}, birth and death processes with two reflecting barriers are obviously irreducible.
If the additional condition ri > 0 is satisfied for some i ∈ {1, . . . , ℓ − 1}, then birth and death processes with two reflecting barriers are also aperiodic (and hence ergodic by Theorem 2.9).
2.2.4
Recall
If {Xn} is an irreducible and aperiodic Markov chain with (finite) state space E = {1, . . . , ℓ} and (quasi-positive) transition matrix P = (pij), then the limit distribution π = lim_{n→∞} αn is the uniquely determined probability solution of the following matrix equation (see Theorem 2.5):
      π⊤ = π⊤ P .                                                      (59)
If the Markov chain {Xn} is not assumed to be irreducible, there can be more than one solution of (59).
Moreover, if the initial distribution α0 of {Xn} is a solution of (59), then Theorem 2.3 and (59) imply
      α1⊤ = α0⊤ P = α0⊤ .
Theorem 2.10
Let {Xn} be a Markov chain with finite state space E = {1, . . . , ℓ} and irreducible transition matrix P. Then:
For arbitrary but fixed i, j ∈ E the entries qij^(n) of the stochastic (ℓ × ℓ)-dimensional matrices Qn = (qij^(n)), where
      Qn = (1/n) ( P + P² + … + Pⁿ ) ,                                 (60)
converge to a limit
      πj = lim_{n→∞} qij^(n) > 0 ,                                     (61)
which does not depend on i. The vector π = (π1, . . . , πℓ)⊤ is a solution of the matrix equation (59) and satisfies Σ_{j=1}^{ℓ} πj = 1.
The distribution π given by (60)–(61) is the only probability solution of (59).
A proof of Theorem 2.10 can be found in Chapter 7 of E. Behrends (2000) Introduction to Markov Chains, Vieweg, Braunschweig.
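Theorem 2.10 can be illustrated on a chain that is irreducible but not aperiodic: Pⁿ itself does not converge, while the Cesàro averages (60) do. A minimal sketch (assuming NumPy; the deterministic two-state flip is chosen as the simplest periodic example):

```python
import numpy as np

# A periodic (hence non-ergodic) but irreducible chain: deterministic flip on {1, 2}
P = np.array([[0.0, 1.0],
              [1.0, 0.0]])

n = 1000
Pk = np.eye(2)
S = np.zeros((2, 2))
for _ in range(n):
    Pk = Pk @ P          # P^k alternates between P and the identity
    S += Pk
Qn = S / n               # Q_n = (P + P^2 + ... + P^n) / n, cf. (60)

# Rows of Q_n approach the unique probability solution pi = (1/2, 1/2) of (59),
# even though P^n itself oscillates and has no limit.
assert np.allclose(Qn, 0.5)
```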
Remarks
Besides the invariance property α0 = α1 = . . ., the Markov chain {Xn} with stationary initial distribution α0 exhibits another, considerably stronger invariance property concerning all of its finite-dimensional distributions.
In this context we consider the following notion of a (strongly) stationary sequence of random variables.
Definition
Let X0, X1, . . . : Ω → E be an arbitrary sequence of random variables mapping into E = {1, . . . , ℓ} (which is not necessarily a Markov chain).
The sequence {Xn} of E-valued random variables is called stationary if for arbitrary k, n ∈ {0, 1, . . .} and i0, . . . , in ∈ E
      P(Xk = i0, Xk+1 = i1, . . . , Xk+n = in) = P(X0 = i0, X1 = i1, . . . , Xn = in) .   (62)
Theorem 2.11
Let X0, X1, . . . : Ω → E be a Markov chain with state space E = {1, . . . , ℓ}.
Then {Xn } is a stationary sequence of random variables if and only if the Markov chain {Xn } has a
stationary initial distribution.
Proof
The necessity of the condition follows immediately from Theorem 2.3 and from the definitions of a stationary initial distribution and a stationary sequence of random variables, respectively: (62) in particular implies that P(X1 = i) = P(X0 = i) for all i ∈ E, and from Theorem 2.3 we thus obtain α0⊤ = α1⊤ = α0⊤ P, i.e., α0 is a stationary initial distribution.
Conversely, suppose now that α0 is a stationary initial distribution of the Markov chain {Xn}. Then, by the definition (3) of a Markov chain {Xn}, we have
      P(Xk = i0, Xk+1 = i1, . . . , Xk+n = in)
         = Σ_{i′0,...,i′_{k−1} ∈ E} P(X0 = i′0, . . . , X_{k−1} = i′_{k−1}, Xk = i0, Xk+1 = i1, . . . , Xk+n = in)
         = Σ_{i′0,...,i′_{k−1} ∈ E} α_{0,i′0} p_{i′0 i′1} ⋯ p_{i′_{k−1} i0} p_{i0 i1} ⋯ p_{i_{n−1} in}
         = (α0⊤ Pᵏ)_{i0} p_{i0 i1} ⋯ p_{i_{n−1} in}
         = α_{0,i0} p_{i0 i1} ⋯ p_{i_{n−1} in}
         = P(X0 = i0, X1 = i1, . . . , Xn = in) ,
where the last but one equality is due to the stationarity of the initial distribution α0 and the last equality uses again the definition (3) of the Markov chain {Xn}.
Remarks
For some Markov chains whose transition matrices exhibit a specific structure, we already calculated their stationary initial distributions in Sections 2.2.2 and 2.2.3.
Now we will discuss two additional examples of this type. In these examples the state space is infinite, requiring an additional condition apart from quasi-positivity (or irreducibility and aperiodicity) in order to ensure the ergodicity of the Markov chains. Namely, a so-called contraction condition is imposed that prevents the probability mass from migrating towards infinity.
Examples
1. Queues
see T. Rolski, H. Schmidli, V. Schmidt, J. Teugels (2002) Stochastic Processes for Insurance and
Finance. J. Wiley & Sons, Chichester, p. 147.
We consider the example already discussed in Section 2.1.2 of the recursively defined Markov chain X0, X1, . . . ∈ {0, 1, . . .} with X0 = 0 and
      Xn = max{0, Xn−1 + Zn − 1} ,   n ≥ 1 ,                           (63)
where Z, Z1, Z2, . . . are independent and identically distributed random variables with values in {0, 1, . . .}. The transition probabilities are given by
      pij = P(Z = j + 1 − i)       if i > 0 and j ≥ i − 1, or if i = 0 and j > 0,
            P(Z = 0) + P(Z = 1)    if j = i = 0,                       (64)
            0                      otherwise.
It is not difficult to show that
the Markov chain {Xn} defined by the recursion formula (63) with its corresponding transition matrix (64) is irreducible and aperiodic if
      P(Z = 0) > 0   and   P(Z ≥ 2) > 0 ,                              (65)
for all n ≥ 1 the solution of the recursion equation (63) can be written as
      Xn = max{ 0, max_{k∈{1,...,n}} Σ_{r=k}^{n} (Zr − 1) }  ≝  max{ 0, max_{k∈{1,...,n}} Σ_{r=1}^{k} (Zr − 1) } ,   (66)
where ≝ denotes equality in distribution, and consequently
      lim_{n→∞} P(Xn = i) = P( sup_{k∈{1,2,...}} Σ_{r=1}^{k} (Zr − 1) = i )   for i > 0,
      lim_{n→∞} P(Xn = i) = P( sup_{k∈{1,2,...}} Σ_{r=1}^{k} (Zr − 1) ≤ 0 )   for i = 0.
Furthermore,
      πi = 0 for all i ≥ 0,   if E Z ≥ 1,
      πi > 0 for all i ≥ 0 and Σ_{i≥0} πi = 1,   if E Z < 1.
In the latter case the generating function of the stationary (limit) distribution π is given by
      g_π(s) = Σ_{i=0}^{∞} s^i πi = E s^X ,                            (67)
where
      X = max{ 0, sup_{k∈{1,2,...}} Σ_{r=1}^{k} (Zr − 1) } .
Namely, we have
      g_π(s) = (1 − ρ)(1 − s) / (g_Z(s) − s) ,   s ∈ (−1, 1),          (68)
where ρ = E Z. To see this, note that X ≝ max{0, X + Z − 1} with X and Z independent, which yields
      g_π(s) = (1/s) Σ_{k=1}^{∞} s^k P(X + Z = k) + P(X + Z = 0)
and hence
      g_π(s) = (s − 1) P(X + Z = 0) / (s − g_Z(s)) .                   (69)
As
      lim_{s→1} g_π(s) = 1   and   lim_{s→1} (d/ds) g_Z(s) = E Z ,
we obtain
      P(X + Z = 0) = 1 − ρ .
2. Birth and Death Processes with One Reflecting Barrier
Consider now the infinite state space E = {0, 1, . . .} and the transition matrix
          ( 0    1                       )
          ( q1   r1   p1                 )
      P = (      q2   r2   p2            )                             (70)
          (           q3   r3   p3       )
          (                ⋱    ⋱    ⋱   )
with pi + qi + ri = 1 for all i ≥ 1. The equation system π⊤ = π⊤P reads componentwise
      πi = p_{i−1} π_{i−1} + ri πi + q_{i+1} π_{i+1} ,   if i > 0,
      π0 = q1 π1 ,                                        if i = 0,    (71)
where we set p0 = 1.
Similarly to the birth and death processes with two reflecting barriers one can show that
the equation system (71) has a uniquely determined probability solution π⊤ if
      Σ_{j=1}^{∞} (p1 p2 ⋯ pj) / (q1 q2 ⋯ q_{j+1}) < ∞ ,               (72)
where then
      πi = π0 (p1 p2 ⋯ p_{i−1}) / (q1 q2 ⋯ qi) ,   i ≥ 1,
      π0 ( 1 + 1/q1 + Σ_{j=1}^{∞} (p1 p2 ⋯ pj)/(q1 q2 ⋯ q_{j+1}) ) = 1
and, consequently,
      π0 = ( 1 + 1/q1 + Σ_{j=1}^{∞} (p1 p2 ⋯ pj)/(q1 q2 ⋯ q_{j+1}) )^{−1} .
As we assume pi > 0 and qi > 0 for all i ∈ {1, 2, . . .}, birth and death processes with one reflecting barrier are obviously irreducible.
Furthermore, if ri > 0 for some i ∈ {1, 2, . . .}, then birth and death processes with one reflecting barrier are also aperiodic (as well as ergodic if the contraction condition (72) is satisfied).
2.2.5
First we show how the stationary initial distribution α0 (= π = lim_{n→∞} αn) of the Markov chain {Xn} can be computed by methods from linear algebra, in case the transition matrix P does not exhibit a particularly nice structure (but is quasi-positive) and the number ℓ of states is reasonably small.
Theorem 2.12
Let the transition matrix P of the Markov chain {Xn} be quasi-positive. Then the matrix I − P + E is invertible, and the uniquely determined probability solution π = lim_{n→∞} αn of the matrix equation π⊤ = π⊤P is given by
      π⊤ = e⊤ (I − P + E)^{−1} ,                                       (73)
where e = (1, . . . , 1)⊤ and all entries of the (ℓ × ℓ) matrix E are equal to 1.
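A direct numerical rendering of formula (73) might look as follows (a sketch, assuming NumPy; the 2×2 matrix is an arbitrary quasi-positive example, not one from the text):

```python
import numpy as np

# An arbitrary quasi-positive transition matrix
P = np.array([[0.8, 0.2],
              [0.5, 0.5]])
ell = P.shape[0]

E = np.ones((ell, ell))          # matrix of ones
e = np.ones(ell)
pi = e @ np.linalg.inv(np.eye(ell) - P + E)   # formula (73)

assert np.allclose(pi @ P, pi)   # pi solves pi^T = pi^T P
assert np.isclose(pi.sum(), 1.0)
```

For this example the exact solution is π = (5/7, 2/7)⊤, which the formula reproduces.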
Proof
In order to prove that the matrix I − P + E is invertible, we show that the only solution of the equation
      (I − P + E) x = 0                                                (74)
is given by x = 0.
As π satisfies the equation π⊤ = π⊤P, we obtain
      π⊤ (I − P) = 0⊤ .                                                (75)
Multiplying (74) from the left by π⊤ thus yields
      0 = π⊤ (I − P + E) x = 0 + π⊤ E x ,
i.e.
      π⊤ E x = 0 .                                                     (76)
As π⊤E = e⊤ (since Σ_{i} πi = 1), this implies
      e⊤ x = 0   and hence   E x = 0 .                                 (77)
Together with (74), Ex = 0 implies (I − P)x = 0, i.e. x = Pⁿx for all n ≥ 1, and hence x = lim_{n→∞} Pⁿx = (π⊤x) e. As e⊤x = 0, this yields π⊤x = 0 and therefore x = 0.
Remarks
For a larger number ℓ of states the numerical computation of the inverse matrix (I − P + E)^{−1} in (73) can cause difficulties. In this case it is often more convenient to solve the matrix equation π⊤ = π⊤P iteratively.
If the transition matrix P is quasi-positive and hence πℓ > 0, one can start by setting π̂ℓ = 1 and solving the modified equation
      π̂⊤ (I − P̂) = b⊤ ,                                               (78)
where P̂ = (pij)_{i,j=1,...,ℓ−1}, π̂⊤ = (π̂1, . . . , π̂_{ℓ−1}) and b⊤ = (p_{ℓ,1}, . . . , p_{ℓ,ℓ−1}).
The probability function π⊤ = (π1, . . . , πℓ) desired originally is then given by
      πi = π̂i / c   with   c = π̂1 + … + π̂ℓ ,   i = 1, . . . , ℓ .
Lemma 2.4
Let A be an (ℓ × ℓ) matrix such that Aⁿ → 0 for n → ∞. Then the matrix I − A is invertible and for all n = 1, 2, . . .
      I + A + … + A^{n−1} = (I − A)^{−1} (I − Aⁿ) .                    (79)
Proof
Obviously, for all n = 1, 2, . . .
      (I − A)(I + A + … + A^{n−1}) = I + A + … + A^{n−1} − A − … − Aⁿ = I − Aⁿ .   (80)
As Aⁿ → 0, we have det(I − Aⁿ) ≠ 0 for sufficiently large n, and (80) then implies det(I − A) ≠ 0, i.e. I − A is invertible. Multiplying (80) by (I − A)^{−1} from the left yields (79).
Lemma 2.5
Let the stochastic matrix P be quasi-positive and let P̂ be the ((ℓ−1) × (ℓ−1)) matrix introduced in (78). Then P̂ⁿ → 0 for n → ∞, the matrix I − P̂ is invertible, and
      (I − P̂)^{−1} = Σ_{n=0}^{∞} P̂ⁿ .                                  (81)
Proof
Because of Lemma 2.4 it suffices to show that P̂ⁿ → 0.
As P is quasi-positive by hypothesis, there is a natural number n0 ≥ 1 such that
      η = max_{i∈Ê} Σ_{j∈Ê} pij^{(n0)} < 1 ,   where Ê = {1, . . . , ℓ − 1}.
Furthermore,
      (P̂ⁿ)_{ij} = Σ_{i1,...,i_{n−1} ∈ Ê} p_{i i1} p_{i1 i2} ⋯ p_{i_{n−1} j} ,
and for n = k n0 we obtain
      Σ_{j∈Ê} (P̂^{k n0})_{ij}
         ≤ Σ_{i1,...,ik ∈ Ê} p^{(n0)}_{i i1} p^{(n0)}_{i1 i2} ⋯ p^{(n0)}_{i_{k−1} ik}
         = Σ_{i1,...,i_{k−1} ∈ Ê} p^{(n0)}_{i i1} ⋯ p^{(n0)}_{i_{k−2} i_{k−1}} Σ_{ik ∈ Ê} p^{(n0)}_{i_{k−1} ik}
         ≤ η Σ_{i1,...,i_{k−1} ∈ Ê} p^{(n0)}_{i i1} ⋯ p^{(n0)}_{i_{k−2} i_{k−1}}
         ≤ …
         ≤ η^k .
This yields lim_{n→∞} (P̂ⁿ)_{ij} ≤ lim_{k→∞} η^k = 0.
Remarks
As a consequence of Lemma 2.5, the solution π̂⊤ of the equation (78), i.e. of π̂⊤(I − P̂) = b⊤, is given by
      π̂⊤ = b⊤ Σ_{n=0}^{∞} P̂ⁿ ,                                        (82)
thus allowing an iterative computation of π̂⊤ = (π̂1, . . . , π̂_{ℓ−1}).
Notice that we start the iteration with b0⊤ = b⊤ as initial value, later setting b_{n+1}⊤ = bn⊤ P̂ for all n ≥ 0, and approximate
      π̂⊤ ≈ Σ_{n=0}^{n0} bn⊤                                            (83)
for sufficiently large n0.
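The iteration (78), (82)–(83) can be sketched as follows (assuming NumPy; the 3×3 matrix is an arbitrary quasi-positive example, and 200 iterations are far more than needed at this size):

```python
import numpy as np

# An arbitrary quasi-positive transition matrix
P = np.array([[0.5, 0.3, 0.2],
              [0.2, 0.6, 0.2],
              [0.1, 0.4, 0.5]])
ell = P.shape[0]

# Set pi_hat_ell = 1 and iterate b_{n+1}^T = b_n^T P_hat, cf. (78) and (82)-(83)
P_hat = P[:ell - 1, :ell - 1]            # upper-left (ell-1) x (ell-1) block
b = P[ell - 1, :ell - 1].copy()          # b^T = (p_{ell,1}, ..., p_{ell,ell-1})
pi_hat = b.copy()                        # partial sum b_0^T
for _ in range(200):
    b = b @ P_hat                        # b_{n+1}^T = b_n^T P_hat
    pi_hat += b                          # accumulate the series (82)

pi = np.append(pi_hat, 1.0)              # append pi_hat_ell = 1
pi /= pi.sum()                           # normalize: pi_i = pi_hat_i / c

assert np.allclose(pi @ P, pi, atol=1e-10)
```

The series converges geometrically because P̂ⁿ → 0 by Lemma 2.5.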
2.3
2.3.1
A stationary Markov chain X0, X1, . . . : Ω → E and its corresponding pair (P, π), consisting of the transition matrix P and the stationary initial distribution π, is called reversible if its finite-dimensional distributions do not depend on the orientation of the time axis, i.e., if
      P(X0 = i0, X1 = i1, . . . , X_{n−1} = i_{n−1}, Xn = in) = P(Xn = i0, X_{n−1} = i1, . . . , X1 = i_{n−1}, X0 = in)   (84)
for arbitrary n ≥ 1 and i0, . . . , in ∈ E.
First of all we will derive a simple characterization for the reversibility of stationary (but not necessarily ergodic)
Markov chains.
Theorem 2.13
Let X0, X1, . . . : Ω → E be a Markov chain with state space E, transition matrix P = (pij) and stationary initial distribution π = (π1, π2, . . .)⊤. The Markov chain is reversible if and only if
      πi pij = πj pji   for arbitrary i, j ∈ E.                        (85)
Proof
By definition (84) the condition (85) is clearly necessary, as (84) implies in particular
      P(X0 = i, X1 = j) = P(X1 = i, X0 = j)   for arbitrary i, j ∈ E.
Therefore
      πi pij = P(X0 = i, X1 = j) = P(X1 = i, X0 = j) = πj pji .
Conversely, if (85) holds, then the definition (3) of Markov chains yields
      P(X0 = i0, X1 = i1, . . . , X_{n−1} = i_{n−1}, Xn = in)
         =(3)  π_{i0} p_{i0 i1} p_{i1 i2} ⋯ p_{i_{n−1} in}
         =(85) p_{i1 i0} π_{i1} p_{i1 i2} ⋯ p_{i_{n−1} in}
         = …
         =(85) p_{i1 i0} p_{i2 i1} ⋯ p_{in i_{n−1}} π_{in}
         = π_{in} p_{in i_{n−1}} ⋯ p_{i1 i0}
         =(3)  P(Xn = i0, X_{n−1} = i1, . . . , X0 = in) .
Remarks
The proof of Theorem 2.13 does not require the stationary Markov chain X0 , X1 , . . . to be ergodic.
In other words,
if the transition matrix P is not irreducible or not aperiodic, and hence the limit distribution does not exist or is not uniquely determined, respectively, then Theorem 2.13 still holds if π is an arbitrary stationary initial distribution.
As P = (pij) is a stochastic matrix, (85) implies for arbitrary i ∈ E
      πi = πi Σ_{j∈E} pij = Σ_{j∈E} πi pij =(85) Σ_{j∈E} πj pji .
In other words: every initial distribution π satisfying the so-called detailed balance condition (85) is necessarily a stationary initial distribution, i.e. it satisfies the global balance condition π⊤ = π⊤P.
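Checking the detailed balance condition (85) for a given pair (P, π) amounts to testing whether the matrix with entries πi pij is symmetric. A sketch (assuming NumPy; the positive case uses the Ehrenfest chain of Section 2.2.3, and a cyclic chain of the kind considered in the examples below is used as a non-reversible instance — its particular numbers are chosen here for illustration):

```python
import numpy as np
from math import comb

def is_reversible(P, pi, tol=1e-12):
    """Check the detailed balance condition (85): pi_i p_ij = pi_j p_ji."""
    F = pi[:, None] * P          # F_ij = pi_i p_ij
    return np.allclose(F, F.T, atol=tol)

# The Ehrenfest chain with ell = 4 is reversible ...
ell = 4
P = np.zeros((ell + 1, ell + 1))
for i in range(ell + 1):
    if i < ell:
        P[i, i + 1] = (ell - i) / ell
    if i > 0:
        P[i, i - 1] = i / ell
pi = np.array([comb(ell, i) / 2**ell for i in range(ell + 1)])
assert is_reversible(P, pi)

# ... while a cyclic random walk (clockwise 0.75, counter-clockwise 0.25) is not,
# although its stationary distribution is uniform (the matrix is doubly stochastic).
Pc = np.array([[0.0, 0.75, 0.0, 0.25],
               [0.25, 0.0, 0.75, 0.0],
               [0.0, 0.25, 0.0, 0.75],
               [0.75, 0.0, 0.25, 0.0]])
pic = np.full(4, 0.25)
assert np.allclose(pic @ Pc, pic)
assert not is_reversible(Pc, pic)
```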
2 MARKOV CHAINS
40
v1
v4
v3
v2
v5
v6
v7
v8
`
i
pij =
if i < ` and j = i + 1,
if i > 0 and j = i 1,
(86)
else,
and the (according to Theorem 2.10 uniquely determined but not ergodic) stationary initial distribution
1 `
> = (0 , . . . , ` ) ,
where i = `
,
i {0, 1, . . . , `} .
(87)
2
i
One can easily see that
i pij = j pji
for arbitrary i, j E, i.e., the pair (P, ) given by (86) and (87) is reversible.
2. Birth and Death Processes
For the birth and death processes with two reflecting barriers considered in Section 2.2.3, let the transition matrix P = (pij) be of such a form that the equation π⊤ = π⊤P has a uniquely determined probability solution π⊤ = (π0, π1, . . . , πℓ). For this situation one can show that
      πi pij = πj pji ,   i, j ∈ E .
3. Random Walks on Graphs
A random walk on a graph G = (V, K) with vertex set V = {v1, . . . , vℓ} and edge set K is a Markov chain X0, X1, . . . : Ω → E with state space E = {1, . . . , ℓ} and transition matrix P = (pij), where
      pij = 1/di   if vi and vj are neighbors,
            0      otherwise,                                          (88)
and di denotes the number of neighbors of the vertex vi.
Figure 1 shows such a graph G = (V, K), where the set V = {v1, . . . , v8} contains 8 vertices and the set K consists of 12 edges. More precisely,
      K = { (v1, v2), (v1, v3), (v2, v3), (v2, v8), (v3, v4), (v3, v7), (v3, v8), (v4, v5), (v4, v6), (v5, v6), (v6, v7), (v7, v8) } .
One can show that
the transition matrix given by (88) is irreducible,
the (according to Theorem 2.10 uniquely determined) stationary initial distribution is given by
      π = ( d1/d, . . . , dℓ/d )⊤ ,   where   d = Σ_{i=1}^{ℓ} di ,     (89)
the pair (P, π) given by (88)–(89) is reversible, as for arbitrary i, j ∈ {1, . . . , ℓ} either vi and vj are neighbors and
      πi pij = (di/d)(1/di) = 1/d = (dj/d)(1/dj) = πj pji ,
or vi and vj are no neighbors and πi pij = 0 = πj pji.
The transition matrix P given by (88) for the numerical example defined in Figure 1 is not only irreducible but also aperiodic, and the stationary initial distribution π (= lim_{n→∞} αn) is given by
      π = ( 2/24, 3/24, 5/24, 3/24, 2/24, 3/24, 3/24, 3/24 )⊤ .
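The stationary distribution (89) for the graph of Figure 1 can be reproduced directly from the edge list (a sketch, assuming NumPy):

```python
import numpy as np

# Edge set K of the graph in Figure 1 (vertices numbered 1..8)
K = [(1, 2), (1, 3), (2, 3), (2, 8), (3, 4), (3, 7), (3, 8),
     (4, 5), (4, 6), (5, 6), (6, 7), (7, 8)]
ell = 8

deg = np.zeros(ell)
P = np.zeros((ell, ell))
for i, j in K:
    deg[i - 1] += 1              # each edge contributes to both endpoint degrees
    deg[j - 1] += 1
for i, j in K:
    P[i - 1, j - 1] = 1.0        # mark neighbors ...
    P[j - 1, i - 1] = 1.0
P /= deg[:, None]                # ... then p_ij = 1/d_i, cf. (88)

pi = deg / deg.sum()             # stationary distribution (89): pi_i = d_i / d
assert np.allclose(pi @ P, pi)
assert np.allclose(pi * 24, [2, 3, 5, 3, 2, 3, 3, 3])
```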
4. Cyclic Random Walks
The following example of a cyclic random walk is not reversible. Let E = {1, 2, 3, 4} and
          (  0    0.75   0    0.25 )
      P = ( 0.25   0    0.75   0   )                                   (90)
          (  0    0.25   0    0.75 )
          ( 0.75   0    0.25   0   ) ,
i.e., from each state the chain moves one step clockwise around the cycle 1 → 2 → 3 → 4 → 1 with probability 0.75 and one step counter-clockwise with probability 0.25.
[Figure: the cyclic random walk on E = {1, 2, 3, 4} with clockwise transition probability 0.75 and counter-clockwise transition probability 0.25]
a
ab
2b
P= a+b
(91)
b
ab
.
0
a+b
a
This transition matrix P is doubly stochastic, i.e., the transposed matrix P⊤ is also a stochastic matrix, and P is obviously quasi-positive. The (uniquely determined) stationary initial distribution π = lim_{n→∞} αn is given by
      π = (1/3, 1/3, 1/3)⊤ .
As the transition matrix P in (91) is not symmetric, the pair (P, π) is not reversible.
2.3.2
Recall that in Section 2.1.3 we showed that a stationary Markov chain X0, X1, . . . with transition matrix P = (pij) and stationary initial distribution π = (π1, . . . , πℓ)⊤ can be constructed as follows:
we started with a sequence Z0, Z1, . . . of independent and on [0, 1] uniformly distributed random variables and defined
      X0 = k   if and only if   Σ_{i=1}^{k−1} πi ≤ Z0 < Σ_{i=1}^{k} πi ,
i.e.
      X0 = Σ_{k=1}^{ℓ} k 1I( Σ_{i=1}^{k−1} πi ≤ Z0 < Σ_{i=1}^{k} πi ) ,    (92)
and, recursively,
      Xn = φ(X_{n−1}, Zn)   for n = 1, 2, . . . ,                      (93)
where the function φ : E × [0, 1] → E is given by
      φ(i, z) = Σ_{k=1}^{ℓ} k 1I( Σ_{j=1}^{k−1} pij ≤ z < Σ_{j=1}^{k} pij ) .   (94)
If the pair (P, π) is reversible, then the stationary Markov chain X0, X1, . . . constructed in (92)–(94) can be traced back into the past in the following way.
First of all we extend the sequence Z0, Z1, . . . of independent and on [0, 1] uniformly distributed random variables to a sequence . . . , Z−1, Z0, Z1, . . . of independent and identically distributed random variables that is unbounded in both directions.
Note that due to the assumed independence of . . . , Z−1, Z0, Z1, . . . this extension does not pose any problems, as the underlying probability space can be constructed via an appropriate product space, product-σ-algebra, and product measure.
The random variables X−1, X−2, . . . are now constructed recursively by setting
      X_{−n−1} = φ(X_{−n}, Z_{−n−1})   for n = 0, 1, . . . .           (95)
Theorem 2.14
Let X0, X1, . . . : Ω → E be a reversible Markov chain with state space E, transition matrix P = (pij) and stationary initial distribution π = (π1, . . . , πℓ)⊤. Then the sequence . . . , X−1, X0, X1, . . . : Ω → E defined by (92)–(95) is a stationary Markov chain with transition matrix P and one-dimensional marginal distribution π, i.e., for arbitrary k ∈ Z = {. . . , −1, 0, 1, . . .}, ik, ik+1, . . . , in ∈ E and m ≥ 1
      P(Xk = ik, Xk+1 = ik+1, . . . , X_{n−1} = i_{n−1}, Xn = in)
         = P(X_{k+m} = ik, X_{k+m+1} = ik+1, . . . , X_{n+m−1} = i_{n−1}, X_{n+m} = in)
         = π_{ik} p_{ik i_{k+1}} ⋯ p_{i_{n−1} in} .
The proof of Theorem 2.14 is quite similar to the ones given for Theorems 2.11 and 2.13 and is therefore omitted.
2.3.3
Let E = {1, . . . , `} and P be a quasi-positive (i.e. an irreducible and aperiodic) transition matrix.
In case the eigenvalues 1 , . . . , ` of P are pairwise distinct we showed by the PerronFrobenius
Theorem (see Corollary 2.4) that
max |nj j | = O(|2 |n ) ,
jE
(96)
where = (1 , . . . , ` )> is the (uniquely determined) solution of the equation > = > P.
If (P, ) is also reversible one can show that the basis |2 | considered in (96) cannot be improved.
Let D = diag(√πi) denote the diagonal matrix with entries √π1, . . . , √πℓ.
As the eigenvalues θ1, . . . , θℓ of P coincide with the eigenvalues of DPD^{−1}, we obtain θi ∈ R for all i ∈ E, and the right eigenvectors φ1, . . . , φℓ of DPD^{−1} can be chosen such that all of their components are real, that furthermore φ1, . . . , φℓ are also left eigenvectors of DPD^{−1}, and that the rows as well as the columns of the (ℓ × ℓ) matrix (φ1, . . . , φℓ) are orthonormal vectors.
The spectral representation (30) of A = DPD^{−1} yields for every n ≥ 1
      Pⁿ = (D^{−1}AD)ⁿ = D^{−1}AⁿD = Σ_{k=1}^{ℓ} θkⁿ D^{−1} φk (φk)⊤ D .
By plugging in θ1 = 1 and φ1 = (√π1, . . . , √πℓ)⊤ we obtain
      pij^{(n)} = πj + √(πj/πi) Σ_{k=2}^{ℓ} θkⁿ φ_{ki} φ_{kj} .        (97)
This shows that |θ2| is the smallest positive number such that the estimate for the rate of convergence considered in (96) holds uniformly for all initial distributions α0.
Remarks
Notice that (97) yields the following more precise specification of the convergence estimate (96). We have
      |pij^{(n)} − πj| ≤ √(πj/πi) Σ_{k=2}^{ℓ} |θk|ⁿ |φ_{ki}||φ_{kj}|
                      ≤ (1/√(min_{i∈E} πi)) |θ2|ⁿ Σ_{k=2}^{ℓ} |φ_{ki}||φ_{kj}|
                      ≤ (1/√(min_{i∈E} πi)) |θ2|ⁿ ,
as the column vectors φ1, . . . , φℓ and hence also the row vectors (φ_{1j}, . . . , φ_{ℓj}), j = 1, . . . , ℓ, form an orthonormal basis in R^ℓ and thus, by the Cauchy–Schwarz inequality,
      Σ_{k=2}^{ℓ} |φ_{ki}||φ_{kj}| ≤ ( Σ_{k=1}^{ℓ} φ_{ki}² )^{1/2} ( Σ_{k=1}^{ℓ} φ_{kj}² )^{1/2} = 1 .
Consequently,
      max_{j∈E} |α_{nj} − πj| ≤ (1/√(min_{i∈E} πi)) |θ2|ⁿ .            (98)
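The estimate (98) can be observed numerically for a reversible quasi-positive chain. In the sketch below (assuming NumPy) the periodic Ehrenfest chain is made aperiodic by the "lazy" modification P ↦ (P + I)/2, which keeps both the stationary distribution and reversibility; for ℓ = 4 the resulting eigenvalues are 1, 0.75, 0.5, 0.25, 0:

```python
import numpy as np
from math import comb

# Reversible, quasi-positive example: a "lazy" Ehrenfest chain
ell = 4
P = np.zeros((ell + 1, ell + 1))
for i in range(ell + 1):
    if i < ell:
        P[i, i + 1] = (ell - i) / ell
    if i > 0:
        P[i, i - 1] = i / ell
P = 0.5 * (P + np.eye(ell + 1))     # lazy version: aperiodic, same pi, still reversible

pi = np.array([comb(ell, i) / 2**ell for i in range(ell + 1)])

theta = np.sort(np.abs(np.linalg.eigvals(P)))[::-1]
theta2 = theta[1]                   # second largest absolute eigenvalue (= 0.75 here)
C = 1.0 / np.sqrt(pi.min())         # prefactor in (98)

alpha = np.zeros(ell + 1)
alpha[0] = 1.0                      # start in state 0 (a non-stationary alpha_0)
for n in range(1, 30):
    alpha = alpha @ P
    assert np.max(np.abs(alpha - pi)) <= C * theta2**n + 1e-12   # bound (98)
```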
However, the practical benefit of the estimate (98) can be limited for several reasons:
The factor in front of |θ2|ⁿ in (98) does not depend on the choice of the initial distribution α0.
The derivation of the estimate (98) requires the Markov chain to be reversible.
It can be difficult to determine the eigenvalue θ2 if the number of states is large.
Therefore, in Section 2.3.5 we consider an alternative convergence estimate, which depends on the initial distribution and does not require the reversibility of the Markov chain. Furthermore, in Section 2.3.7 we will derive an upper bound for the second largest absolute value |θ2| among the eigenvalues of a reversible transition matrix.
2.3.4
At first we discuss a method enabling us to transform (ergodic) transition matrices such that the resulting matrix is reversible.
Let P = (pij) be an irreducible and aperiodic (but not necessarily reversible) transition matrix, and let π = (π1, . . . , πℓ)⊤ be the corresponding stationary initial distribution such that πi > 0 for all i ∈ E.
Moreover, we consider the stochastic matrix P̃ = (p̃ij), where
      p̃ij = πj pji / πi ,                                              (99)
i.e., P̃ = D^{−2} P⊤ D², where again D = diag(√πi); P̃ is also an irreducible and aperiodic transition matrix having the same stationary initial distribution π = (π1, . . . , πℓ)⊤.
Definition
The pair (M, π), where the stochastic matrix M = (mij) is given by M = PP̃, is reversible, as we observe
      πi mij = πi Σ_{k=1}^{ℓ} pik (πj pjk / πk) = πj Σ_{k=1}^{ℓ} pjk (πi pik / πk) = πj mji .
The matrix M is called the multiplicative reversible version of the transition matrix P.
Remarks
All eigenvalues λ_{M,1}, . . . , λ_{M,ℓ} of M are real and lie in [0, 1], because M has the same eigenvalues as the symmetric and nonnegative definite matrix M′ = DMD^{−1}, where
      m′ij = √(πi/πj) mij = √(πi/πj) Σ_{k=1}^{ℓ} pik (πj pjk / πk) = Σ_{k=1}^{ℓ} (√πi pik)(√πj pjk) / πk
and hence
      M′ = DMD^{−1} = (DPD^{−1}) (DPD^{−1})⊤ .
As a consequence, the symmetric matrix M′ is diagonalizable, and its right and left eigenvectors φi and ψi can be chosen such that
      φi = ψi for all i ∈ E,   and the vectors φ1, . . . , φℓ form an orthonormal basis in R^ℓ.
Setting
      φ̃i = D^{−1} φi   and   ψ̃i = D φi ,   i ∈ E ,                    (100)
we obtain right and left eigenvectors of M, since Mφ̃i = D^{−1}M′φi = λ_{M,i} φ̃i and
      ψ̃i⊤ M = (φi)⊤ D M D^{−1} D = λ_{M,i} (φi)⊤ D = λ_{M,i} ψ̃i⊤ .
This yields the following spectral representation of the multiplicative reversible version M obtained from the transition matrix P; see also the spectral representation given by formula (30).
Theorem 2.15
For every x ∈ R^ℓ and n ≥ 1,
      Mⁿ x = Σ_{i=1}^{ℓ} λ_{M,i}ⁿ φ̃i ψ̃i⊤ x ,                          (101)
where φ̃i and ψ̃i are the right and left eigenvectors of M defined in (100).
Proof
As the (right) eigenvectors φ̃1, . . . , φ̃ℓ of M defined in (100) form a basis in R^ℓ, for every x ∈ R^ℓ there is a (uniquely determined) vector (x1^{(r)}, . . . , xℓ^{(r)})⊤ ∈ R^ℓ such that
      x = Σ_{i=1}^{ℓ} xi^{(r)} φ̃i .
Furthermore, we have Mφ̃i = λ_{M,i} φ̃i and hence Mⁿφ̃i = λ_{M,i}ⁿ φ̃i for arbitrary i ∈ E and n ∈ N. Thus we obtain
      Mⁿ x = Σ_{i=1}^{ℓ} xi^{(r)} Mⁿ φ̃i = Σ_{i=1}^{ℓ} xi^{(r)} λ_{M,i}ⁿ φ̃i .
On the other hand,
      ψ̃i⊤ x = Σ_{j=1}^{ℓ} xj^{(r)} ψ̃i⊤ φ̃j = Σ_{j=1}^{ℓ} xj^{(r)} (φi)⊤ D D^{−1} φj = Σ_{j=1}^{ℓ} xj^{(r)} (φi)⊤ φj = xi^{(r)} ,   (102)
where the last equality takes into account that φi = ψi for all i ∈ E and that the eigenvectors φ1, . . . , φℓ of M′ form an orthonormal basis of R^ℓ.
This proves the spectral representation (101).
2.3.5
For x, y ∈ L(E) = R^ℓ we consider the weighted inner product and norm
      (x, y)π = Σ_{i=1}^{ℓ} xi yi πi ,   ‖x‖π = ( Σ_{i=1}^{ℓ} xi² πi )^{1/2} .   (103)
The terms (weighted) mean μπ(x) and variance Varπ(x) of x ∈ L(E) will be used to denote the quantities
      μπ(x) = Σ_{i=1}^{ℓ} xi πi = (x, e)π                              (104)
and
      Varπ(x) = Σ_{i=1}^{ℓ} ( xi − μπ(x) )² πi ,                       (105)
respectively.
Lemma 2.6
For every x ∈ L(E),
      Varπ(x) = Varπ(P̃x) + ((I − M)x, x)π .                            (106)
Proof
Introducing the notation x̂ = x − μπ(x) e, we obtain μπ(x̂) = 0 and
      μπ(P̃x̂) = Σ_{i=1}^{ℓ} ( Σ_{j=1}^{ℓ} p̃ij (xj − μπ(x)) ) πi = Σ_{i,j=1}^{ℓ} πi p̃ij (xj − μπ(x)) = Σ_{i,j=1}^{ℓ} πj pji (xj − μπ(x)) = 0 ,
where the last but one equality follows from the definition (99) of the matrix P̃.
This implies
      ‖x̂‖π² = Varπ(x̂) = Varπ(x)   and   ‖P̃x̂‖π² = Varπ(P̃x) ,
and
      (P̃x̂, P̃x̂)π = Σ_{i=1}^{ℓ} ( Σ_{k=1}^{ℓ} p̃ik x̂k ) ( Σ_{j=1}^{ℓ} p̃ij x̂j ) πi
                  = Σ_{k,j=1}^{ℓ} x̂j x̂k Σ_{i=1}^{ℓ} πi p̃ik p̃ij
                  = Σ_{k,j=1}^{ℓ} x̂j x̂k Σ_{i=1}^{ℓ} πk pki p̃ij           (as πi p̃ik = πk pki)
                  = Σ_{k,j=1}^{ℓ} x̂j x̂k πk (PP̃)_{kj}
                  = (Mx̂, x̂)π ,                                          (107)
and thus
      Varπ(x) − Varπ(P̃x) = ‖x̂‖π² − ‖P̃x̂‖π² = (x̂, x̂)π − (Mx̂, x̂)π = ((I − M)x̂, x̂)π = ((I − M)x, x)π ,
as M is a stochastic matrix such that π⊤M = π⊤ and therefore (I − M)e = 0 and
      ((I − M)x, e)π = Σ_{i=1}^{ℓ} xi πi − Σ_{j=1}^{ℓ} xj Σ_{i=1}^{ℓ} πi mij = Σ_{i=1}^{ℓ} xi πi − Σ_{j=1}^{ℓ} xj πj = 0 .
For probability distributions α = (α1, α2, . . .)⊤ and π on E, let
      dTV(α, π) = (1/2) Σ_{i∈E} |αi − πi| ,
i.e., the distance dTV(α, π) between α and π is expressed via the total variation
      |α − π| = Σ_{i∈E} |αi − πi| ,                                    (109)
and let the χ²-contrast of α with respect to π be given by
      χ²(α; π) = Σ_{i∈E} (αi − πi)² / πi .                             (110)
The distance dTV(α, π) between α and π can be estimated via the χ²-contrast χ²(α; π) of α with respect to π as follows.
Lemma 2.7
      dTV²(α, π) ≤ (1/4) χ²(α; π) .                                    (111)
Proof
Taking into account that Σ_{i∈E} πi = 1, the Cauchy–Schwarz inequality yields
      ( Σ_{i∈E} |αi − πi| )² = ( Σ_{i∈E} (|αi − πi| / √πi) √πi )² ≤ Σ_{i∈E} πi · Σ_{i∈E} (αi − πi)² / πi = χ²(α; π) ,
and hence dTV²(α, π) = (1/4) ( Σ_{i∈E} |αi − πi| )² ≤ (1/4) χ²(α; π).
The rate of convergence α⊤Pⁿ → π⊤ for n → ∞ can now be estimated as follows.
Theorem 2.16
For every initial distribution α and all n ≥ 1,
      dTV²( α⊤Pⁿ, π ) ≤ ( χ²(α; π) / 4 ) λ_{M,2}ⁿ .                    (112)
Proof
Let γn = (γ_{n1}, . . . , γ_{nℓ})⊤, where γ_{ni} = (α⊤Pⁿ)_i / πi. Then for all i ∈ E
      (α⊤P^{n+1})_i / πi = (1/πi) Σ_{k=1}^{ℓ} pki (α⊤Pⁿ)_k = Σ_{k=1}^{ℓ} p̃ik γ_{nk}
and thus
      P̃ γn = γ_{n+1} .
Moreover, by definition (110) of the χ²-contrast χn² = χ²(α⊤Pⁿ; π) of α⊤Pⁿ with respect to π, we obtain
      χn² = Σ_{i=1}^{ℓ} ( (α⊤Pⁿ)_i − πi )² / πi = Σ_{i=1}^{ℓ} ( (α⊤Pⁿ)_i/πi − 1 )² πi = Σ_{i=1}^{ℓ} ( γ_{ni} − μπ(γn) )² πi ,
i.e.,
      χn² = Varπ(γn) ,                                                 (113)
since μπ(γn) = Σ_{i=1}^{ℓ} (α⊤Pⁿ)_i = 1. Together with Lemma 2.6 and P̃γn = γ_{n+1}, this yields
      χn² = χ_{n+1}² + ((I − M)γn, γn)π .                              (114)
On the other hand, the spectral representation (101) of M derived in Theorem 2.15 implies
      ((I − M)γn, γn)π = (γn, γn)π − (Mγn, γn)π
                       = (γn, γn)π − Σ_{i=1}^{ℓ} λ_{M,i} (φ̃i ψ̃i⊤ γn, γn)π
                       = (γn, γn)π − 1 − Σ_{i=2}^{ℓ} λ_{M,i} (φ̃i ψ̃i⊤ γn, γn)π ,
as λ_{M,1} = 1, φ̃1 = e and ψ̃1⊤ = π⊤, and therefore
      (φ̃1 ψ̃1⊤ γn, γn)π = (π⊤γn) (e, γn)π = μπ(γn)² = 1 .
Moreover, in (102) we have shown that ψ̃i⊤γn = γ_{ni}^{(r)}; as ψ̃1⊤ = π⊤, we can conclude γ_{n1}^{(r)} = π⊤γn = 1. Hence, as (φ̃i, φ̃j)π = (φi)⊤φj = δij,
      Σ_{i=2}^{ℓ} λ_{M,i} (φ̃i ψ̃i⊤ γn, γn)π = Σ_{i=2}^{ℓ} λ_{M,i} (γ_{ni}^{(r)})²
         ≤ λ_{M,2} Σ_{i=2}^{ℓ} (γ_{ni}^{(r)})² = λ_{M,2} ( Σ_{i=1}^{ℓ} (γ_{ni}^{(r)})² − 1 ) = λ_{M,2} ( (γn, γn)π − 1 ) = λ_{M,2} Varπ(γn) .
Consequently,
      ((I − M)γn, γn)π ≥ (γn, γn)π − 1 − λ_{M,2} ( (γn, γn)π − 1 ) = (1 − λ_{M,2}) Varπ(γn) .
Together with (113) and (114) this yields
      χn² ≥ χ_{n+1}² + (1 − λ_{M,2}) χn²   and hence   χ_{n+1}² ≤ λ_{M,2} χn² .
Thus we have shown that χn² ≤ λ_{M,2}ⁿ χ0² for all n ≥ 1, and consequently the assertion follows from Lemma 2.7.
2.3.6
Let E = {1, . . . , `} be an arbitrary finite set and let P be an (` `)dimensional transition matrix, which
is irreducible and aperiodic (i.e. quasipositive) as well as reversible.
Recall that
all eigenvalues of P are real (see Section 2.3.3), and
by the PerronFrobenius theorem (see Theorem 2.6 and Corollary 2.3) the eigenvalues of P are in the
interval (1, 1], where
the largest eigenvalue is 1 and the absolute values of the other eigenvalues are (strictly) less than 1.
Remarks
Instead of ordering the eigenvalues according to their absolute values (like above) we will now order
them with respect to their own size and denote them by 1 , . . . , ` such that
1 = 1 > 2 . . . ` > 1 .
e of the transition matrix P that was
Moreover, for the multiplicative reversible version M = PP
introduced in Section 2.3.4 we have
1 = 1 > 2 . . . ` > 0 ,
i.e., for the eigenvalues of the matrix M the notations 1 , . . . , ` and 1 , . . . , ` coincide.
For large ℓ, the calculation of the second largest absolute value |θ2| = max{λ2, |λℓ|} among the eigenvalues can cause difficulties. Therefore, in Section 2.3.7 we will derive bounds for λ2 and λℓ whose calculation is very simple.
These bounds are particularly useful if the stationary (limit) distribution π is at least in principle known, but in spite of this the corresponding Markov chain is started with a non-stationary initial distribution α; for example, it could be started in a predetermined state i ∈ E, i.e. αi = 1 and αj = 0 for j ≠ i.
In order to derive an upper bound for λ2, we need a representation formula for λ2 that is usually called the Rayleigh theorem in the literature and that is expressed via the so-called Dirichlet form
      D_{(P,π)}(x, x) = ((I − P)x, x)π                                 (115)
of the reversible pair (P, π), where (y, x)π denotes the inner product of y and x with respect to π; see (103).
Lemma 2.8
For every x ∈ R^ℓ,
      D_{(P,π)}(x, x) = (1/2) Σ_{i,j∈E} πi pij (xj − xi)² .            (116)
Proof
From the definition (103) of the inner product and the reversibility of the pair (P, π) we obtain
      2 ((I − P)x, x)π = 2 Σ_{i,j∈E} πi pij xi (xi − xj)
         = Σ_{i,j∈E} πi pij xi (xi − xj) + Σ_{i,j∈E} πj pji xj (xj − xi)
         =(85) Σ_{i,j∈E} πi pij xi (xi − xj) + Σ_{i,j∈E} πi pij xj (xj − xi)
         = Σ_{i,j∈E} πi pij (xj − xi)² .
We will now prove the Rayleigh theorem that yields a representation formula for the second largest eigenvalue λ2 of the reversible pair (P, π).
Theorem 2.17
Let R^ℓ_≠ = { x = (x1, . . . , xℓ)⊤ ∈ R^ℓ : xi ≠ xj for some pair i, j ∈ E } denote the set of all vectors in R^ℓ whose components are not all equal. For the eigenvalue λ2 of the reversible pair (P, π) the following holds:
      λ2 = 1 − inf_{x ∈ R^ℓ_≠} D_{(P,π)}(x, x) / Varπ(x) ,             (117)
where Varπ(x) denotes the variance of the components of x with respect to π defined in (105).
Proof
Lemma 2.8 implies for arbitrary c ∈ R and x ∈ R^ℓ
      D_{(P,π)}(x, x) = D_{(P,π)}(x − c e, x − c e) ,
and clearly Varπ(x) = Varπ(x − c e). Thus, the assertion (117) is equivalent to
      λ2 = 1 − inf_{x ∈ R^ℓ_0} D_{(P,π)}(x, x) / Varπ(x) ,             (118)
where R^ℓ_0 = { x ∈ R^ℓ_≠ : μπ(x) = 0 }. For x ∈ R^ℓ_0 consider the expansion
      x = Σ_{i=1}^{ℓ} xi^{(r)} φ̃i
with respect to the right eigenvectors φ̃1, . . . , φ̃ℓ of P. As λ1 = 1 we obtain
      (I − P)x = Σ_{i=2}^{ℓ} (1 − λi) xi^{(r)} φ̃i   and hence   D_{(P,π)}(x, x) = Σ_{i=2}^{ℓ} (1 − λi) (xi^{(r)})² .
On the other hand, as φ̃1 = e and the eigenvectors φ̃1, . . . , φ̃ℓ are orthonormal with respect to the inner product (·, ·)π, we can conclude that
      μπ(x) = (x, e)π = x1^{(r)}   and   Varπ(x) = Σ_{i=2}^{ℓ} (xi^{(r)})²   if μπ(x) = 0 .
Therefore,
      D_{(P,π)}(x, x) / Varπ(x) = ( Σ_{i=2}^{ℓ} (1 − λi)(xi^{(r)})² ) / ( Σ_{i=2}^{ℓ} (xi^{(r)})² )
         = (1 − λ2) + ( Σ_{i=3}^{ℓ} (λ2 − λi)(xi^{(r)})² ) / ( Σ_{i=2}^{ℓ} (xi^{(r)})² )
         ≥ 1 − λ2 .
This shows that (118) holds, as the last expression for the quotient D_{(P,π)}(x, x)/Varπ(x) implies
      D_{(P,π)}(x, x) / Varπ(x) = 1 − λ2
for x = φ̃2, where φ̃2 ∈ R^ℓ_0 as φ̃1 = e and φ̃2 are linearly independent.
2.3.7
In order to derive bounds for the eigenvalues λ2 and λℓ, the following notions and notations are necessary.
For each pair i, j ∈ E such that i ≠ j and pij > 0 we denote
by e = eij the corresponding directed edge of the transition graph,
by e− = i and e+ = j the starting and target vertices of e, respectively,
and we set Q(e) = πi pij.
Let E be the set of all directed edges e = eij such that i ≠ j and pij > 0.
Furthermore, for each i, j ∈ E such that i ≠ j we consider exactly one path γij from i to j, which is given by a vector γij = (i0, i1, . . . , i_{m−1}, im) of states such that i = i0, j = im and
      p_{i i1} p_{i1 i2} ⋯ p_{i_{m−1} j} > 0 ,
such that none of the edges e_{i_{k−1} i_k} is contained more than once (and m is the smallest possible number).
Let Γ be the set of all these paths, and for each path γij define
      |γij| = Σ_{e∈γij} 1/Q(e) = 1/(πi p_{i i1}) + 1/(π_{i1} p_{i1 i2}) + … + 1/(π_{i_{m−1}} p_{i_{m−1} j}) .   (119)
The so-called Poincaré coefficient of the reversible pair (P, π) is then given by
      κ = max_{e∈E} Σ_{γij ∋ e} |γij| πi πj .                          (120)
Finally we consider
the extended set of edges E′ ⊃ E, also containing the edges of the type i → i in case pii > 0, and
for all i ∈ E exactly one path σi from i to i which contains an odd number of edges in E′, such that no edge occurs more than once.
Let Γ′ be the set of all these paths, and for every path σi ∈ Γ′ let
      |σi| = Σ_{e∈σi} 1/Q(e)                                           (121)
and
      ι = max_{e∈E′} Σ_{σi ∋ e} |σi| πi .                              (122)
Theorem 2.18
For the eigenvalues λ2 and λℓ of the reversible pair (P, π),
      λ2 ≤ 1 − 1/κ   and   λℓ ≥ −1 + 2/ι ,                             (123)
and hence
      max{λ2, |λℓ|} ≤ 1 − min{ 1/κ , 2/ι } .                           (124)
Proof
First we will show that λ2 ≤ 1 − 1/κ. Because of Theorem 2.17 it suffices to show that
      Varπ(x) ≤ κ D_{(P,π)}(x, x) ,   x ∈ R^ℓ .                        (125)
We have
      Varπ(x) = (1/2) ( 2‖x‖π² − 2 μπ(x)² )
              = (1/2) ( Σ_{i∈E} xi² πi + Σ_{j∈E} xj² πj − 2 Σ_{i,j∈E} xi xj πi πj )
              = (1/2) Σ_{i,j∈E} (xi − xj)² πi πj
              = (1/2) Σ_{i,j∈E} ( Σ_{e∈γij} (1/√Q(e)) √Q(e) (x_{e−} − x_{e+}) )² πi πj
              ≤ (1/2) Σ_{i,j∈E} |γij| Σ_{e∈γij} Q(e) (x_{e−} − x_{e+})² πi πj
              = (1/2) Σ_{e∈E} Q(e) (x_{e−} − x_{e+})² ( Σ_{γij ∋ e} |γij| πi πj )
              ≤ κ D_{(P,π)}(x, x) ,
where the fifth line uses the Cauchy–Schwarz inequality, and the last inequality follows from Lemma 2.8 and the definition of the Poincaré coefficient; see (120). This shows (125).
In order to finish the proof it remains to show that λℓ ≥ −1 + 2/ι. For this purpose we exploit the following identity: for all x = (x1, . . . , xℓ)⊤ ∈ R^ℓ
      (1/2) Σ_{i,j∈E} (xi + xj)² πi pij = (Px, x)π + ‖x‖π² ,           (126)
which follows from
      (1/2) Σ_{i,j∈E} (xi + xj)² πi pij = Σ_{i,j∈E} xi xj πi pij + (1/2) Σ_{i,j∈E} xi² πi pij + (1/2) Σ_{i,j∈E} xj² πi pij
         = (Px, x)π + (1/2) Σ_{i∈E} xi² πi + (1/2) Σ_{j∈E} xj² πj = (Px, x)π + ‖x‖π² ,
where the last sum was computed using (85).
Let now σi = (i0, i1, . . . , i_{2m}, i_{2m+1}), where i = i0 = i_{2m+1}, be a path from i to i containing an odd number of edges such that every edge does not occur more than once. Then
      xi = (1/2) ( (xi + x_{i1}) − (x_{i1} + x_{i2}) + … + (x_{i_{2m}} + xi) ) = (1/2) Σ_{e∈σi} (−1)^{n(e)} (x_{e+} + x_{e−}) ,
where n(e) numbers the edges along σi. The Cauchy–Schwarz inequality thus yields
      ‖x‖π² = Σ_{i∈E} xi² πi = Σ_{i∈E} (πi/4) ( Σ_{e∈σi} (1/√Q(e)) √Q(e) (−1)^{n(e)} (x_{e+} + x_{e−}) )²
            ≤ Σ_{i∈E} (πi/4) |σi| Σ_{e∈σi} (x_{e+} + x_{e−})² Q(e)
            = (1/4) Σ_{e∈E′} (x_{e+} + x_{e−})² Q(e) ( Σ_{σi ∋ e} |σi| πi )
            ≤ (ι/4) Σ_{e∈E′} (x_{e+} + x_{e−})² Q(e)
            ≤ (ι/2) ( (Px, x)π + ‖x‖π² ) ,
where the last inequality follows from (126). Consequently,
      (Px, x)π ≥ (2/ι − 1) ‖x‖π²   for all x ∈ R^ℓ,
and hence λℓ = min_{x≠0} (Px, x)π / ‖x‖π² ≥ −1 + 2/ι.
Example
We return to the example of a random walk on a graph that has already been discussed in Section 2.3.1.
Let G = (V, K) be a connected graph with vertices V = {v1, . . . , vℓ} and edges K, where each edge connects two vertices, such that for each pair vi, vj ∈ V of vertices there is a path of edges in K connecting vi and vj.
A random walk on the graph G = (V, K) is a Markov chain X0, X1, . . . : Ω → E with state space E = {1, . . . , ℓ} and transition matrix P = (pij), where
      pij = 1/di   if vi and vj are neighbors,
            0      otherwise.                                          (127)
Recall that two vertices vi and vj are called neighbors if they are endpoints of the same edge, and that, for each vertex vi, di denotes its number of neighbors.
We already showed that the transition matrix P given in (127) is always irreducible (where we now additionally assume P to be aperiodic),
and that π = (d1/d, . . . , dℓ/d)⊤ with d = Σ_{i} di. As Q(e) = πi pij = (di/d)(1/di) = 1/d for every edge e ∈ E, we obtain
      |γij| = Σ_{e∈γij} 1/Q(e) = d β(γij) ,                            (128)
where β(γij) = #{e : e ∈ γij} denotes the number of edges (i.e. the length) of the path γij. Taking into account (127)–(128), this implies
      κ = max_{e∈E} Σ_{γij ∋ e} |γij| πi πj ≤ β* γ* (max_{i∈E} di)² / d ,   (129)
where β* = max_{γ∈Γ} β(γ) and γ* = max_{e∈E} #{γ ∈ Γ : γ ∋ e}, and hence
      λ2 ≤ 1 − d / ( β* γ* (max_{i∈E} di)² ) .                         (130)
Similarly,
      |σi| = Σ_{e∈σi} 1/Q(e) = d β(σi)
and
      Σ_{σi ∋ e} |σi| πi = Σ_{σi ∋ e} β(σi) di ≤ β0 γ0 max_{i∈E} di ,
where
      β0 = max_{σ∈Γ′} β(σ) ,   γ0 = max_{e∈E′} #{σ ∈ Γ′ : σ ∋ e} ,
and hence
      λℓ ≥ −1 + 2/ι ≥ −1 + 2 / ( β0 γ0 max_{i∈E} di ) .                (131)
Remarks
For the numerical example from Section 2.3.1 (the graph of Figure 1 with ℓ = 8 vertices) one obtains d = 24 and max_i di = 5, and the paths can be chosen such that
      β* = 3 ,   γ* = 7 ,   β0 = 3 ,   γ0 = 3 .
This yields
      λ2 ≤ 1 − 24/(25 · 3 · 7) < 24/25   and   λ8 ≥ −1 + 2/(5 · 3 · 3) = −43/45 ,
and hence
      max{λ2, |λ8|} < 24/25 .
3 MONTE-CARLO SIMULATION
58
Monte-Carlo Simulation
Besides the traditional ways of data acquisition in laboratory experiments and field tests the generation of
so-called synthetic data via computer simulation has gained increasing importance.
There is a variety of reasons for the increased benefit drawn from computer simulation used to investigate
a wide range of issues, objects and processes:
The most prominent reason is the rapidly growing performance of modern computer systems which has
extended our computational capabilities in a way that would not have been imaginable even a short
time ago.
Consequently, computer-based data generation is often considerably cheaper and less time-consuming
than traditional data acquisition in laboratory experiments and field tests.
Moreover, computer experiments can be repeated under constant conditions as frequently as necessary
whereas in traditional scientific experiments the investigated object is often damaged or even destroyed.
A further reason for the value of computer simulations is the fact that volume and structure of the analyzed data are often very complex, and that in this case data processing and evaluation are typically based on mathematical models whose characteristics cannot be (completely) described by analytical formulae. Thus, computer simulations of the considered models present a valuable alternative tool for analysis.
Computer experiments for the investigation of the issues, objects and processes of scientific interest are based on stochastic simulation algorithms. In this context one also uses the term Monte-Carlo simulation, which summarizes a huge variety of simulation algorithms.
1. Random number generators are the basis for Monte-Carlo simulation of single features, quantities and variables.

By these algorithms realizations of random variables can be generated via the computer. Those are called pseudo-random numbers.

The simulation of random variables is based on so-called standard random number generators providing realizations of random variables that are uniformly distributed on the unit interval (0, 1].

Certain transformation and rejection methods can be applied to these standard pseudo-random numbers in order to generate pseudo-random numbers for other (more complex) random variables having e.g. binomial, Poisson or normal distributions.
2. Computer experiments designed to investigate high-dimensional random vectors or the evolution of certain objects in time are based on more sophisticated algorithms from so-called dynamic Monte-Carlo simulation.

In this context Markov Chain Monte Carlo simulation (MCMC simulation) is a construction principle for algorithms that are particularly appropriate to simulate time-stationary equilibria of objects or processes.

Another example of the application of MCMC simulation is statistical image analysis.

An active field of research that has resulted in numerous publications in recent years is that of so-called coupling algorithms for perfect MCMC simulation. These coupling algorithms enable us to simulate time-stationary equilibria of objects and processes in a way that does not only allow approximations but simulations that are perfect in a certain sense.
3.1

3.1.1
First we recall two simple problems that can be solved by means of Monte-Carlo simulation and have already been discussed in the course Elementare Wahrscheinlichkeitsrechnung und Statistik.
1. Monte-Carlo estimation of π

Consider the square B = (−1, 1] × (−1, 1] ⊂ R² and the circle C = {(s, t) ∈ R² : s² + t² ≤ 1} inscribed in B. Then

    |C| / |B| = π/4 .

Let (S₁, T₁), (S₂, T₂), . . . be independent random vectors that are uniformly distributed on B. Then the random variables

    X_i = 1  if S_i² + T_i² < 1,  and  X_i = 0  else,

are independent and identically distributed random variables with expectation E X_i = π/4. Furthermore, the SLLN (see Theorem WR-5.15) implies that the arithmetic mean

    Y_n = n⁻¹ Σ_{i=1}^n X_i

converges to π/4 with probability 1, i.e., 4 Y_n is an unbiased and strongly consistent estimator for π.
For the implementation of this simulation algorithm one can proceed as follows:

Use a random number generator to generate 2n pseudo-random numbers s₁, t₁, . . . , s_n, t_n that are realizations of random variables uniformly distributed on (−1, 1].

Define

    x_i = 1  if s_i² + t_i² < 1,  and  x_i = 0  else.

Compute 4 (x₁ + . . . + x_n)/n.
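The estimation procedure above can be sketched in Python (a minimal illustration; the function name, the seed and the sample size are our choices, not part of the notes):

```python
import random

def estimate_pi(n, seed=0):
    """Monte-Carlo estimate of pi: the fraction of n random points of the
    square (-1, 1] x (-1, 1] that fall into the unit circle, times 4."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        s = rng.uniform(-1.0, 1.0)
        t = rng.uniform(-1.0, 1.0)
        if s * s + t * t < 1.0:
            hits += 1
    return 4.0 * hits / n

print(estimate_pi(100_000))  # close to 3.14159 for large n
```

By the SLLN the output approaches π as n grows; by the central limit theorem the error decreases at rate n^(−1/2).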
2. Monte-Carlo Integration

Let ϕ : [0, 1] → [0, 1] be a continuous function. Our goal is to find an estimator for the value of the integral ∫₀¹ ϕ(x) dx by Monte-Carlo simulation. We consider the following stochastic model.

Let the random variables X₁, X₂, . . . : Ω → R be independent and uniformly distributed on (0, 1], with probability density f_X given by

    f_X(x) = 1  if x ∈ [0, 1],  and  f_X(x) = 0  else.
Let Z_k = ϕ(X_k) for all k = 1, 2, . . .. By the transformation theorem for independent and identically distributed random variables (see Theorem WR-3.18) the random variables Z₁, Z₂, . . . are independent and identically distributed with

    E Z₁ = ∫₀¹ ϕ(x) f_X(x) dx = ∫₀¹ ϕ(x) dx .

Moreover, the SLLN implies that

    n⁻¹ Σ_{k=1}^n Z_k → ∫₀¹ ϕ(x) dx    a.s.

Hence n⁻¹ Σ_{k=1}^n Z_k is an unbiased and (strongly) consistent estimator for ∫₀¹ ϕ(x) dx, i.e., the probability for n⁻¹ Σ_{k=1}^n Z_k to be a good approximation of the integral ∫₀¹ ϕ(x) dx is high for sufficiently large n.
For the implementation of this simulation algorithm one can proceed similarly to Example 1:

Use a random number generator to generate n pseudo-random numbers x₁, . . . , x_n that are realizations of random variables being uniformly distributed on (0, 1].

Define z_k = ϕ(x_k) for k = 1, . . . , n.

Compute n⁻¹ Σ_{k=1}^n z_k.
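A minimal sketch of this estimator in Python (the test function ϕ(x) = x², whose integral over [0, 1] is 1/3, is our choice):

```python
import random

def mc_integral(phi, n, seed=0):
    """Monte-Carlo estimator (1/n) * sum(phi(X_k)) for the integral of
    phi over [0, 1], with X_1, ..., X_n independent and uniform on (0, 1]."""
    rng = random.Random(seed)
    # 1 - u lies in (0, 1], matching the interval used in the notes
    return sum(phi(1.0 - rng.random()) for _ in range(n)) / n

print(mc_integral(lambda x: x * x, 200_000))  # approximately 1/3
```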
3.1.2 Linear Congruential Generators

A widely used method to generate standard pseudo-random numbers is the linear congruential generator, given by the recursion

    z_k = (a z_{k−1} + c) mod m ,    k = 1, . . . , n .    (1)

The initial value z₀ ∈ {0, 1, . . . , m − 1} the algorithm is starting from is called the germ of the linear congruential generator. m ∈ N, a ∈ {0, 1, . . . , m − 1} and c ∈ {0, 1, . . . , m − 1} are further parameters called modulus, factor and increment of the congruential generator.

The scaling

    u_k = z_k / m    (2)

yields the standard pseudo-random numbers u₁, . . . , u_n.

As a next step we will solve the recursion equation (1), i.e., we will show how the number z_k that has been recursively defined in (1) can be expressed directly in terms of the initial value z₀ and the parameters m, a and c.
Theorem 3.1  For each k = 1, . . . , n it holds that

    z_k = ( a^k z₀ + c (a^k − 1)/(a − 1) ) mod m .    (3)
Proof

We show the assertion by mathematical induction. For k = 1 the claim (3) coincides with the recursion equation (1).

Let (3) be true for a certain k ≥ 1, i.e., there is an integer j ≥ 0 such that

    z_k = a^k z₀ + c (a^k − 1)/(a − 1) − j m .    (4)

Then the recursion equation (1) yields

    z_{k+1} = (a z_k + c) mod m
            = ( a ( a^k z₀ + c (a^k − 1)/(a − 1) − j m ) + c ) mod m
            = ( a^{k+1} z₀ + c (a(a^k − 1) + a − 1)/(a − 1) − a j m ) mod m
            = ( a^{k+1} z₀ + c (a^{k+1} − 1)/(a − 1) ) mod m ,

which completes the induction step.
Remarks

Obviously, the linear congruential generator defined in (1) can generate no more than m different numbers z₁, . . . , z_n.

As soon as a number z_k is repeated for the first time, i.e., there is some m₀ > 0 such that

    z_k = z_{k−m₀} ,

the same period of length m₀, which has already been completely generated, is started again, i.e.

    z_{k+j} = z_{k−m₀+j}    for all j ≥ 1.

An unfavorable choice of the parameters m, a, c and z₀, respectively, may result in a very short length m₀ of the period.
For example, m₀ = 2 can occur for certain parameter choices. In the literature, conditions on the parameters are discussed ensuring the generation of sequences z₁, . . . , z_n whose period m₀ is as large as possible and which also exhibit other desirable properties.
One of those properties is that the points (u₁, u₂), . . . , (u_{n−1}, u_n) formed by pairs of consecutive pseudo-random numbers u_{i−1}, u_i are uniformly spread over the unit square [0, 1]².

The following numerical examples illustrate that relatively small changes of the parameters a and c can result in completely different point patterns (u₁, u₂), . . . , (u_{n−1}, u_n).
Further details can be found in the text by Ripley (1987) that has already been mentioned and in the lecture notes by H. Künsch (ftp://stat.ethz.ch/U/Kuensch/skript-sim.ps), which also contain the following figures.
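A linear congruential generator following (1)-(2) can be sketched as follows; the closed form (3) from Theorem 3.1 offers a direct check (the parameter values are our choice):

```python
def lcg(m, a, c, z0, n):
    """Standard pseudo-random numbers u_1, ..., u_n from the linear
    congruential generator (1), scaled as in (2): u_k = z_k / m."""
    us = []
    z = z0
    for _ in range(n):
        z = (a * z + c) % m
        us.append(z / m)
    return us

# Closed form (3): z_k = (a^k z_0 + c (a^k - 1)/(a - 1)) mod m
m, a, c, z0, k = 256, 5, 1, 1, 10
z_k = (a**k * z0 + c * (a**k - 1) // (a - 1)) % m
print(z_k == round(lcg(m, a, c, z0, k)[-1] * m))  # True
```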
Figure 3: Point patterns for pairs (u_{i−1}, u_i) of consecutive pseudo-random numbers for m = 256
3.1.3
Statistical Tests
In the literature numerous statistical significance tests are discussed in order to investigate characteristics of random number generators; see e.g. G.S. Fishman (1996) Monte Carlo: Concepts, Algorithms and Applications, Springer, New York.

We only recall two such tests which are important for investigating characteristics of linear congruential generators (and other random number generators).

1. χ² Goodness-of-Fit Test

Pearson's χ² goodness-of-fit test is used to check if the generated pseudo-random numbers can be regarded as realizations of uniformly distributed random variables, and if we may assume the independence of these random variables.
Figure 4: Point patterns for pairs (u_{i−1}, u_i) of consecutive pseudo-random numbers for m = 256
Another method for the generation of sequences u₁, u₂, . . . of numbers having desirable characteristics is based on minimizing the Kolmogorov distance.

For the χ² test the interval (0, 1] is divided into r classes ((j − 1)/r, j/r], and Z_j(u₁, . . . , u_n) denotes the number of the u_i falling into the j-th class; the test statistic is

    T_n(u₁, . . . , u_n) = Σ_{j=1}^{r} ( Z_j(u₁, . . . , u_n) − n/r )² / (n/r) .
Figure 5: Point patterns for pairs (u_{i−1}, u_i) of consecutive pseudo-random numbers for m = 2048
If the sampling variables U₁, . . . , U_n are independent and uniformly distributed on the interval (0, 1], the test statistic T_n is asymptotically χ²_{r−1} distributed. Thus, for sufficiently large n the hypothesis H₀ : p = p₀ is rejected if

    T_n(u₁, . . . , u_n) > χ²_{r−1, 1−α} ,

where χ²_{r−1, 1−α} denotes the (1 − α)-quantile of the χ² distribution with r − 1 degrees of freedom.
We will illustrate this test by the following numerical example. For α = 0.05, n = 100 000 and r = 10 we want to check if the hypothesis that the sampling variables are uniformly distributed is conformable with a sample (u₁, . . . , u₁₀₀₀₀₀) of pseudo-random numbers. The sample has the following vector (z₁, . . . , z₁₀) of class frequencies:

    z₁      z₂      z₃      z₄     z₅      z₆      z₇     z₈     z₉      z₁₀
    9 995   10 045  10 127  9 816  10 130  10 040  9 890  9 858  10 083  10 016
In this case we obtain T₁₀₀₀₀₀(u₁, . . . , u₁₀₀₀₀₀) = 10.99 and hence

    T₁₀₀₀₀₀(u₁, . . . , u₁₀₀₀₀₀) = 10.99 < χ²_{9, 0.95} = 16.92 .

Thus, the hypothesis of a uniform distribution on (0, 1] is not rejected.
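The value of T₁₀₀₀₀₀ can be reproduced directly from the class frequencies above (a small sketch; the function name is ours):

```python
def chi2_statistic(freqs, n, r):
    """Pearson chi-square statistic: sum over the r classes of
    (z_j - n/r)^2 / (n/r)."""
    expected = n / r
    return sum((z - expected) ** 2 / expected for z in freqs)

freqs = [9995, 10045, 10127, 9816, 10130, 10040, 9890, 9858, 10083, 10016]
t = chi2_statistic(freqs, 100_000, 10)
print(round(t, 2))  # 10.99, below the 0.95-quantile 16.92 of the
                    # chi-square distribution with 9 degrees of freedom
```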
Remarks

As a generalization of the χ² goodness-of-fit test for checking the uniform distribution of some sample variables one can also check if for a given natural number d ≥ 1 (e.g. d = 2 or d = 3) the pseudo-random vectors (u₁, . . . , u_d), . . . , (u_{(n−1)d+1}, . . . , u_{nd}) can be regarded as realizations of independent random vectors (U₁, . . . , U_d), . . . , (U_{(n−1)d+1}, . . . , U_{nd}) that are uniformly distributed on (0, 1]^d.

For this purpose the unit cube (0, 1]^d is divided into r^d smaller cubes B_j of equal size, which are of the form ((i₁ − 1)/r, i₁/r] × . . . × ((i_d − 1)/r, i_d/r]. Furthermore, we consider the (r^d − 1)-dimensional (hypothetical) vector p₀ = (1/r^d, . . . , 1/r^d) of parameters and the test statistic T_n : R^{nd} → [0, ∞) where

    T_n(u₁, . . . , u_n) = Σ_{j=1}^{r^d} ( Z_j(u₁, . . . , u_n) − n/r^d )² / (n/r^d) ,

with Z_j(u₁, . . . , u_n) denoting the number of the pseudo-random vectors falling into the cube B_j.
2. Run Test

There are a number of other significance tests allowing to evaluate the quality of random number generators. In particular it can be verified if the generated pseudo-random numbers u₁, . . . , u_n can be regarded as realizations of independent random variables U₁, . . . , U_n having a certain distribution. In our case we consider the hypothesis of a uniform distribution on (0, 1].

The following run test checks in particular if the independence assumption for the sampling variables U₁, . . . , U_n is reflected sufficiently well by the pseudo-random numbers u₁, . . . , u_n. This is done by analyzing the lengths of monotonically increasing subsequences, also called runs, within the sequence u₁, u₂, . . . of pseudo-random numbers.

For this purpose we define the random variables V₁, V₂, . . . by V₁ = min{i ≥ 1 : U_i > U_{i+1}} and the recursion formula

    V_{j+1} = min{ i : i > V_j + 1, U_i > U_{i+1} } ,    j = 1, 2, . . . ,    (5)

and the run lengths W₁, W₂, . . . by

    W₁ = V₁ ,    W_{j+1} = V_{j+1} − V_j − 1 ,    j = 1, 2, . . . .    (6)
Theorem 3.3  The random variables W₁, W₂, . . . introduced in (6) are independent and identically distributed such that

    P(W_j = k) = k / (k + 1)! ,    k = 1, 2, . . . ,    (7)

if the random variables U₁, U₂, . . . are independent and uniformly distributed on (0, 1].

Proof

Let U₁, U₂, . . . be independent and uniformly distributed on (0, 1]. Then for all n ≥ 1 and for arbitrary natural numbers k₁, . . . , k_n ≥ 1, we get that

    P(W₁ = k₁, . . . , W_n = k_n) = P(V₁ = k₁, V₂ − V₁ − 1 = k₂, . . . , V_n − V_{n−1} − 1 = k_n)
                                  = P(V₁ = k₁, V₂ = k₂ + k₁ + 1, . . . , V_n = k_n + . . . + k₁ + n − 1) .

First we show by induction that for all k ≥ 1 and t ∈ [0, 1]

    P(U₁ ≤ . . . ≤ U_k ≤ t) = t^k / k! .    (8)
For k = 1, equation (8) obviously holds. By the formula of total probability we obtain

    P(U₁ ≤ . . . ≤ U_{k+1} ≤ t) = ∫₀¹ P(U₁ ≤ . . . ≤ U_k ≤ U_{k+1} ≤ t | U_{k+1} = x) P(U_{k+1} ∈ dx)
                                = ∫₀ᵗ P(U₁ ≤ . . . ≤ U_k ≤ x) dx ,

where the last equality is a consequence of the independence and (0, 1]-uniform distribution of U₁, U₂, . . ..

Assume now that (8) is true for some k ≥ 1. Then

    P(U₁ ≤ . . . ≤ U_{k+1} ≤ t) = ∫₀ᵗ P(U₁ ≤ . . . ≤ U_k ≤ x) dx = ∫₀ᵗ x^k / k! dx = t^{k+1} / (k + 1)! ,

where the second but one equality uses the induction hypothesis.
Hence, for each k ≥ 1,

    P(W₁ = k) = ∫₀¹ ( P(U₁ ≤ . . . ≤ U_k ≤ 1) − P(U₁ ≤ . . . ≤ U_k ≤ x) ) dx
              = ∫₀¹ ( 1/k! − x^k/k! ) dx
              = 1/k! − 1/(k + 1)!
              = k/(k + 1)! .
Remarks

Let us assume that sufficiently many pseudo-random numbers u₁, u₂, . . . have been generated, resulting in the n runs w₁, . . . , w_n according to (5) and (6).

We choose r pairwise disjoint intervals (a₁, b₁], . . . , (a_r, b_r] on the positive real axis such that the probabilities

    p₀,j = Σ_{k ∈ N ∩ (a_j, b_j]} k/(k + 1)! ,    j = 1, . . . , r ,

are positive. With Y_j(w₁, . . . , w_n) denoting the number of runs whose length falls into (a_j, b_j], the statistic

    Σ_{j=1}^{r} ( Y_j(w₁, . . . , w_n) − n p₀,j )² / (n p₀,j)

can then be used as in the χ² goodness-of-fit test above.
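The run lengths w₁, . . . , w_n in (5)-(6) can be computed from a given sequence as follows (a sketch with 1-based indexing as in (5); the function name is ours):

```python
def run_lengths(u):
    """Run lengths W_1, W_2, ... according to (5)-(6): V_j are the
    (1-based) positions i with u_i > u_{i+1}, where after each such
    position one element is skipped (the condition i > V_j + 1);
    W_1 = V_1 and W_{j+1} = V_{j+1} - V_j - 1."""
    ws = []
    prev_v = None
    i = 1
    while i < len(u):
        if u[i - 1] > u[i]:  # u_i > u_{i+1} in 1-based notation
            ws.append(i if prev_v is None else i - prev_v - 1)
            prev_v = i
            i += 2           # next candidate must satisfy i > V_j + 1
        else:
            i += 1
    return ws

print(run_lengths([0.3, 0.7, 0.2, 0.5, 0.9, 0.1]))  # [2, 2]
```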
3.2

Based on standard pseudo-random numbers u₁, u₂, . . . that can be generated by methods like the linear congruential generator, it is possible to generate pseudo-random numbers x₁, x₂, . . . that can be regarded as realizations of random variables X₁, X₂, . . . having other than uniform distributions. Examples are realizations x₁, x₂, . . . of exponentially, Poisson, binomially or normally distributed random variables X₁, X₂, . . ..

For this purpose one can apply algorithms like the so-called inversion method and rejection sampling, whose basic ideas will be explained by some examples.
A much more comprehensive discussion of these algorithms can be found e.g. in
L. Devroye (1986) Non-Uniform Random Variate Generation. Springer, New York,
G.S. Fishman (1996) Monte Carlo: Concepts, Algorithms and Applications. Springer, New York,
C.P. Robert and G. Casella (1999) Monte Carlo Statistical Methods. Springer, New York.
3.2.1 Inversion Method
The following property of the generalized inverse can be used as a basis for the generation of pseudo-random numbers x₁, x₂, . . . that can be regarded as realizations of random variables X₁, X₂, . . . whose distribution function F : R → [0, 1] is an arbitrary monotonically non-decreasing and right-continuous function such that lim_{x→−∞} F(x) = 0 and lim_{x→∞} F(x) = 1. Recall the following auxiliary result.

Let F : R → [0, 1] be an arbitrary distribution function. Then the function F⁻¹ : (0, 1] → R ∪ {∞} where

    F⁻¹(y) = inf{ x : F(x) ≥ y }    (9)

is called the generalized inverse of the distribution function F. For arbitrary x ∈ R and y ∈ (0, 1)

    y ≤ F(x)    if and only if    F⁻¹(y) ≤ x .    (10)
Theorem 3.4

Let U₁, U₂, . . . be a sequence of independent and uniformly distributed random variables on (0, 1] and let F : R → [0, 1] be a distribution function. Then the random variables X₁, X₂, . . . where X_i = F⁻¹(U_i) for i = 1, 2, . . . are independent and their distribution function is given by F.

Proof

The independence of X₁, X₂, . . . is an immediate consequence of the transformation theorem for independent random variables; see Theorem WR-3.18. Furthermore, (10) implies for arbitrary x ∈ R and i ∈ N

    P(X_i ≤ x) = P(F⁻¹(U_i) ≤ x) = P(U_i ≤ F(x)) = F(x) .
Examples

In the following we discuss some examples illustrating how Theorem 3.4 can be used in order to generate pseudo-random numbers x₁, x₂, . . . that can be regarded as realizations of independent random variables X₁, X₂, . . . with a given distribution function F : R → [0, 1]. These numbers are also referred to as F-distributed pseudo-random numbers x₁, x₂, . . ., in spite of the fact that the empirical distribution function F̂_n of the sample x₁, . . . , x_n is only an approximation of F for large n.

Note that Theorem 3.4 can only be applied directly if the generalized inverse F⁻¹ of F is given explicitly (i.e. by an analytical formula).
1. Exponential distribution

Let λ > 0 and F : R → [0, 1] be the distribution function of the Exp(λ)-distribution, i.e.

    F(x) = 1 − e^{−λx}  if x ≥ 0,  and  F(x) = 0  if x < 0.

Then F⁻¹(u) = −λ⁻¹ log(1 − u) for all u ∈ (0, 1]. By Theorem 3.4, we have X = −λ⁻¹ log U ~ Exp(λ) if U and hence also 1 − U are uniformly distributed on (0, 1], and the pseudo-random numbers x₁, . . . , x_n where

    x_i = − log(u_i)/λ    for i = 1, . . . , n

can be regarded as realizations of independent Exp(λ)-distributed random variables if u₁, . . . , u_n are realizations of independent (0, 1]-uniformly distributed random variables.
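A sketch of this inversion step in Python (we use 1 − u so that the argument of the logarithm stays in (0, 1]; the function name is ours):

```python
import math
import random

def exp_inversion(lam, n, seed=0):
    """Exp(lam)-distributed pseudo-random numbers x_i = -log(u_i)/lam
    obtained via the generalized inverse, as in Theorem 3.4."""
    rng = random.Random(seed)
    return [-math.log(1.0 - rng.random()) / lam for _ in range(n)]

xs = exp_inversion(2.0, 100_000)
print(sum(xs) / len(xs))  # close to the expectation 1/lam = 0.5
```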
2. Erlang distribution

Let λ > 0, r ∈ N and let F : R → [0, 1] be the distribution function of the Γ(λ, r)-distribution, i.e.

    F(x) = ∫₀ˣ λ^r v^{r−1} e^{−λv} / (r − 1)! dv  if x ≥ 0,  and  F(x) = 0  if x < 0.    (11)

Then the generalized inverse F⁻¹ of F cannot be determined explicitly and therefore Theorem 3.4 cannot be applied directly. However, one can show that X₁ + . . . + X_r ~ Γ(λ, r) if the random variables X₁, . . . , X_r are independent and Exp(λ)-distributed. By Theorem 3.4 the pseudo-random numbers y₁, . . . , y_n where

    y_i = − ( log u_{(i−1)r+1} + . . . + log u_{ir} ) / λ    for i = 1, . . . , n

can be regarded as realizations of independent Γ(λ, r)-distributed random variables if u₁, . . . , u_{nr} are realizations of independent (0, 1]-uniformly distributed random variables.
3. Normal distribution

In order to generate normally distributed pseudo-random numbers one can apply the so-called Box-Muller algorithm, which also requires exponentially distributed pseudo-random numbers.

Assume the random variables U₁, U₂ to be independent and uniformly distributed on (0, 1]. By Theorem 3.4, we get that X = −2 log U₁ is an Exp(1/2)-distributed random variable, and the random vector (Y₁, Y₂) where

    Y₁ = √X cos(2π U₂) ,    Y₂ = √X sin(2π U₂)

turns out to be N(o, I)-distributed, i.e., Y₁, Y₂ are independent and N(0, 1)-distributed random variables,
as for arbitrary y₁, y₂ ∈ R

    P(Y₁ ≤ y₁, Y₂ ≤ y₂)
      = P( √(−2 log U₁) cos(2π U₂) ≤ y₁ , √(−2 log U₁) sin(2π U₂) ≤ y₂ )
      = (1/2) ∫₀¹ ∫₀^∞ 1I( √x cos(2πu) ≤ y₁ , √x sin(2πu) ≤ y₂ ) e^{−x/2} dx du
      = (1/2π) ∫_{−∞}^{y₂} ∫_{−∞}^{y₁} e^{−(v² + w²)/2} dv dw
      = ( (1/√(2π)) ∫_{−∞}^{y₁} e^{−v²/2} dv ) · ( (1/√(2π)) ∫_{−∞}^{y₂} e^{−w²/2} dw ) ,

where the last but one equality follows from the substitution

    v = √x cos(2πu) ,    w = √x sin(2πu) ,

whose functional determinant is π.
The pseudo-random numbers y₁, . . . , y₂ₙ where

    y_{2k−1} = √(−2 log u_{2k−1}) cos(2π u_{2k}) ,    y_{2k} = √(−2 log u_{2k−1}) sin(2π u_{2k})    (12)

can thus be regarded as realizations of independent and N(0, 1)-distributed random variables, if u₁, . . . , u₂ₙ are realizations of independent and uniformly on (0, 1] distributed random variables U₁, . . . , U₂ₙ.

For arbitrary μ ∈ R and σ² > 0 the pseudo-random numbers y′₁, . . . , y′₂ₙ where y′_i = σ y_i + μ can be regarded as realizations of independent and N(μ, σ²)-distributed random variables.
Remarks
A faster algorithm for the generation of normally distributed pseudo-random numbers is obtained
if additionally a method of rejection sampling is applied that will be introduced in Section 3.2.3.
This method avoids the relatively time-consuming computation of the trigonometric functions in
(12).
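The Box-Muller formulas (12) translate directly into code (a sketch; the function name is ours):

```python
import math
import random

def box_muller(n_pairs, seed=0):
    """2*n_pairs N(0,1)-distributed pseudo-random numbers via (12)."""
    rng = random.Random(seed)
    ys = []
    for _ in range(n_pairs):
        u1 = 1.0 - rng.random()  # uniform on (0, 1], safe for log
        u2 = rng.random()
        r = math.sqrt(-2.0 * math.log(u1))
        ys.append(r * math.cos(2.0 * math.pi * u2))
        ys.append(r * math.sin(2.0 * math.pi * u2))
    return ys
```

For N(μ, σ²)-distributed numbers each output y is rescaled as σy + μ.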
3.2.2

Let a₀, a₁, . . . ∈ R be pairwise distinct numbers and let p = (p₀, p₁, . . .)⊤ be a probability function. If the random variable U is uniformly distributed on (0, 1], then the random variable X given by

    X = a_j    if p₀ + . . . + p_{j−1} ≤ U < p₀ + . . . + p_j ,    j = 0, 1, . . .    (13)

(with the empty sum for j = 0, i.e. X = a₀ if U < p₀) is distributed according to p. The pseudo-random numbers x₁, . . . , x_n defined by

    x_i = a_j    if p₀ + . . . + p_{j−1} ≤ u_i < p₀ + . . . + p_j

can thus be regarded as realizations of independent and p-distributed random variables where p = (p₀, p₁, . . .)⊤, if u₁, . . . , u_n are realizations of independent and uniformly distributed random variables on (0, 1].
Example  (Geometric distribution)

Let p ∈ (0, 1), q = 1 − p and

    p_j = 0  if j = 0,  and  p_j = p q^{j−1}  if j ≥ 1.

Then, for all j ≥ 1,

    1 − (p₁ + . . . + p_j) = p_{j+1} + p_{j+2} + . . . = p Σ_{i=j}^∞ q^i = q^j    (14)

and p_j = q^{j−1} − q^j. Furthermore, we consider the random variable

    X = ⌊ log U / log q ⌋ + 1 ,    (15)

where U is a (0, 1]-uniformly distributed random variable and ⌊z⌋ denotes the integer part of z. Then P(X = j) = p q^{j−1} for all j = 1, 2, . . ., i.e. X ~ Geo(p),
since

    X = ⌊ log U / log q ⌋ + 1
      = min{ j ≥ 1 : j > log U / log q }
      = min{ j ≥ 1 : j log q < log U }
      = min{ j ≥ 1 : q^j < U }
      = Σ_{j=1}^∞ j 1I( q^j < U ≤ q^{j−1} )
      = Σ_{j=1}^∞ j 1I( p₁ + . . . + p_{j−1} ≤ 1 − U < p₁ + . . . + p_j ) ,

where the last equality uses (14) and the fact that 1 − U is uniformly distributed as well. The pseudo-random numbers x₁, . . . , x_n where

    x_i = ⌊ log u_i / log q ⌋ + 1
can thus be regarded as realizations of independent and geometrically distributed random variables
X1 , . . . , Xn Geo(p)
if u1 , . . . , un are realizations of independent random variables U1 , . . . , Un that are uniformly distributed on the interval (0, 1].
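Formula (15) gives a one-line generator for Geo(p)-distributed pseudo-random numbers (a sketch; we again replace u by 1 − u to stay in (0, 1]):

```python
import math
import random

def geo_sample(p, n, seed=0):
    """Geo(p)-distributed pseudo-random numbers floor(log(u)/log(q)) + 1
    with q = 1 - p, following (15)."""
    rng = random.Random(seed)
    log_q = math.log(1.0 - p)
    return [int(math.log(1.0 - rng.random()) / log_q) + 1 for _ in range(n)]

xs = geo_sample(0.25, 100_000)
print(sum(xs) / len(xs))  # close to the expectation 1/p = 4
```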
For some discrete distributions there are specific transformation algorithms allowing the generation of pseudo-random numbers having this distribution.

Examples
1. Poisson distribution (with small expectation λ)

If λ > 0 is a small number, then the following procedure is appropriate to generate Poisson-distributed pseudo-random numbers, either by transformation of exponentially distributed pseudo-random numbers (as in Section 3.2.1) or directly based on (0, 1]-uniformly distributed pseudo-random numbers.

Let the random variables X₁, X₂, . . . be independent and Exp(λ)-distributed. If we consider the random variable Y = max{k ≥ 0 : X₁ + . . . + X_k ≤ 1}, formula (11) for the distribution function of the Erlang distribution yields for all j ≥ 0
    P(Y = j) = P(Y ≥ j) − P(Y ≥ j + 1)
             = P(X₁ + . . . + X_j ≤ 1) − P(X₁ + . . . + X_{j+1} ≤ 1)
             = ∫₀¹ λ e^{−λv} (λv)^{j−1}/(j − 1)! dv − ∫₀¹ λ e^{−λv} (λv)^j/j! dv
             = ∫₀¹ d/dv ( e^{−λv} (λv)^j / j! ) dv
             = e^{−λ} λ^j / j! .
The pseudo-random numbers y₁, . . . , y_n where

    y_i = max{ k ≥ 0 : x₁ + . . . + x_k ≤ i } − (y₁ + . . . + y_{i−1}) ,    i = 1, . . . , n ,    (16)

or, equivalently, directly in terms of standard pseudo-random numbers,

    y_i = max{ k ≥ 0 : u₁ · . . . · u_k ≥ e^{−λi} } − (y₁ + . . . + y_{i−1}) ,    i = 1, . . . , n ,    (17)

can thus be regarded as realizations of independent Poi(λ)-distributed random variables.
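The counting procedure (17) is easy to implement for a single Poi(λ)-distributed number: multiply uniforms until the product drops below e^{−λ} (a sketch; the function name and the passing of a shared generator are our choices):

```python
import math
import random

def poisson_sample(lam, rng):
    """One Poi(lam)-distributed pseudo-random number, following (17):
    the largest k such that u_1 * ... * u_k >= exp(-lam), with u_i
    uniform on (0, 1]."""
    threshold = math.exp(-lam)
    k = 0
    prod = 1.0 - rng.random()  # u_1
    while prod >= threshold:
        k += 1
        prod *= 1.0 - rng.random()
    return k

rng = random.Random(1)
print([poisson_sample(3.0, rng) for _ in range(5)])  # five Poi(3) integers
```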
2. Poisson distribution (with large expectation λ)

If λ is large, the following search procedure starting near ⌊λ⌋ can be used, where the recursion

    p_{j+1} = λ p_j / (j + 1) ,    j ≥ 0 ,    (18)

is applied to calculate the sums P_j = Σ_{k=0}^j p_k for j ≥ 0. Let ⌊λ⌋ > 0 be the integer part of λ and let U be a (0, 1]-uniformly distributed random variable. Then it is firstly checked if U < P_{⌊λ⌋}. If this inequality holds it is checked if U < P_{⌊λ⌋−1}, U < P_{⌊λ⌋−2}, . . ., where we define X = min{k : U < P_k}. If the inequality U < P_{⌊λ⌋} does not hold then it is checked if U < P_{⌊λ⌋+1}, U < P_{⌊λ⌋+2}, . . ., and we also define X = min{k : U < P_k}.
For the expectation E V of the necessary number V of checking steps we obtain the approximation

    E V ≈ 1 + E |X − λ| = 1 + √λ · E ( |X − λ| / √λ ) ≈ 1 + 0.798 √λ ,

where the last approximation uses the fact that the random variable (X − λ)/√λ is approximately N(0, 1)-distributed for large λ, for the following reasons. As the Poisson distribution is stable under convolutions, the random variable X ~ Poi(λ) can be viewed as the sum Σ_{i=1}^n X_i of n independent and Poi(λ/n)-distributed random variables X_i. The last approximation then follows from the central limit theorem for sums of independent and identically distributed random variables; see Theorem WR-5.16.
We observe that for increasing λ the mean number of checking steps only grows at rate √λ if this simulation procedure is applied, whereas for the formerly discussed method generating Poi(λ)-distributed pseudo-random numbers the necessary number of standard pseudo-random numbers grows linearly in λ.
3. Binomial distribution

For the generation of binomially distributed pseudo-random numbers one can proceed similarly to the Poisson case. For arbitrary but fixed numbers n ∈ N and p ∈ (0, 1) where q = 1 − p let

    a_j = j    and    p_j = n! / ( j! (n − j)! ) · p^j q^{n−j} ,    j = 0, 1, . . . , n .

The sums P_j = Σ_{k=0}^j p_k can be computed via the recursion

    p_{j+1} = (n − j)/(j + 1) · (p/q) · p_j ,    j = 0, 1, . . . , n − 1 .
3.2.3 Acceptance-Rejection Method
In this section we discuss another method for the generation of pseudo-random numbers y₁, y₂, . . . that can be regarded as realizations of independent and identically distributed random variables Y₁, Y₂, . . .. Their distribution function is assumed to be given; it is denoted by G.

This method also requires a sequence of independent and identically distributed pseudo-random numbers x₁, x₂, . . ., but we abandon the condition that they need to be uniformly distributed on (0, 1]. The only condition we impose on their distribution function F is that G needs to be absolutely continuous with respect to F with bounded density g(x) = dG(x)/dF(x), i.e., for some constant c > 0, we have

    g(x) ≤ c    and    G(y) = ∫_{(−∞, y]} g(x) dF(x) ,    x, y ∈ R .    (19)
We first consider the discrete case: let p = (p₀, p₁, . . .)⊤ and q = (q₀, q₁, . . .)⊤ be two probability functions such that, for some c > 0,

    g(j) = q_j / p_j ≤ c    for all j with p_j > 0 .    (20)
Theorem 3.5

Let (U₁, X₁), (U₂, X₂), . . . be a sequence of independent and identically distributed random vectors whose components are independent. Furthermore, let U_i be a (0, 1]-uniformly distributed random variable and X_i be distributed according to p. Then the random variable

    I = min{ k ≥ 1 : U_k < q_{X_k} / (c p_{X_k}) }    (21)

is geometrically distributed with expectation c, i.e., I ~ Geo(c⁻¹), and the random variable X_I is distributed according to q.
Proof

By the definition of I given in (21), we obtain for all j ≥ 1

    P(I = j) = P( U₁ ≥ q_{X₁}/(c p_{X₁}), . . . , U_{j−1} ≥ q_{X_{j−1}}/(c p_{X_{j−1}}), U_j < q_{X_j}/(c p_{X_j}) )
             = P( U₁ ≥ q_{X₁}/(c p_{X₁}) ) · . . . · P( U_{j−1} ≥ q_{X_{j−1}}/(c p_{X_{j−1}}) ) · P( U_j < q_{X_j}/(c p_{X_j}) )
             = p̃ q̃^{j−1} ,

where q̃ = 1 − p̃ and

    p̃ = P( U₁ < q_{X₁}/(c p_{X₁}) )
       = Σ_{k: p_k > 0} P( U₁ < q_{X₁}/(c p_{X₁}) | X₁ = k ) P(X₁ = k)
       = Σ_{k: p_k > 0} P( U₁ < q_k/(c p_k) ) p_k
       = Σ_{k: p_k > 0} ( q_k/(c p_k) ) p_k = 1/c .
Furthermore, for all j with p_j > 0,

    P(X_I = j) = Σ_{k=1}^∞ P(X_I = j, I = k) = Σ_{k=1}^∞ P(X_k = j, I = k)
               = Σ_{k=1}^∞ q̃^{k−1} P(X_k = j) P( U_k < q_j/(c p_j) )
               = Σ_{k=1}^∞ q̃^{k−1} p_j · q_j/(c p_j)
               = (q_j / c) Σ_{k=1}^∞ q̃^{k−1} = (q_j / c) · 1/(1 − q̃) = q_j ,
while for j with p_j = 0 we get

    P(X_I = j) = Σ_{k=1}^∞ P(X_k = j, I = k) ≤ Σ_{k=1}^∞ P(X_k = j) = 0 .
Remarks

Theorem 3.5 implies that the mean number of F-distributed pseudo-random numbers necessary to obtain a G-distributed random number is c. In case there are several alternatives for the choice of the distribution function F, possessing equally nice properties with respect to the generation of F-distributed pseudo-random numbers, one should choose the distribution function with the smallest c.

Furthermore, as a consequence of Theorem 3.5, the values g(x) and g(j) of the density in (19) and (20), respectively, need only be known up to a constant factor.
In the general (i.e. not necessarily discrete) case one can proceed in a similar way. The following result will serve as the foundation for constructing acceptance-rejection algorithms.

Theorem 3.6

Let F, G : R → [0, 1] be two arbitrary distribution functions such that (19) holds. Let (U₁, X₁), (U₂, X₂), . . . be a sequence of independent and identically distributed random vectors whose components are independent. Furthermore, let U_i be a (0, 1]-uniformly distributed random variable and X_i be distributed according to F.

Then the random variable

    I = min{ k ≥ 1 : U_k < g(X_k)/c }    (22)

is geometrically distributed with expectation c, i.e., I ~ Geo(c⁻¹), and the random variable Y = X_I is distributed according to G.
Proof

Similarly to the proof of Theorem 3.5 we obtain P(I = j) = p̃ q̃^{j−1} for any j ≥ 1, where q̃ = 1 − p̃ and

    p̃ = P( U₁ < g(X₁)/c ) = ∫_R P( U₁ < g(X₁)/c | X₁ = x ) dF(x)
       = ∫_R P( U₁ < g(x)/c ) dF(x) = ∫_R g(x)/c dF(x) = 1/c .
Furthermore, for all y ∈ R we have

    P(Y ≤ y) = P(X_I ≤ y) = Σ_{k=1}^∞ P(X_I ≤ y, I = k) = Σ_{k=1}^∞ P(X_k ≤ y, I = k)
             = Σ_{k=1}^∞ q̃^{k−1} ∫_{(−∞, y]} P( U_k < g(v)/c ) dF(v)
             = (1/(1 − q̃)) ∫_{(−∞, y]} g(v)/c dF(v)
             = ∫_{(−∞, y]} g(v) dF(v) = G(y) ,

where 1 − q̃ = p̃ = c⁻¹.
In the same way we obtain the following vectorial version of Theorem 3.6.

Theorem 3.7

Let m ≥ 1 be an arbitrary but fixed natural number, let F, G : R^m → [0, 1] be two arbitrary distribution functions (of m-dimensional random vectors) and let c > 0 be a constant such that

    g(x) ≤ c    and    G(y) = ∫_{(−∞, y]} g(x) dF(x) ,    x, y ∈ R^m .    (23)

Let (U₁, X₁), (U₂, X₂), . . . be a sequence of independent and identically distributed random vectors whose components are also independent. Furthermore, let U_i be a (0, 1]-uniformly distributed random variable and X_i be distributed according to F.

Then the random variable

    I = min{ k ≥ 1 : U_k < g(X_k)/c }    (24)

is geometrically distributed with expectation c, i.e., I ~ Geo(c⁻¹), and the random vector Y = X_I is distributed according to G.
Examples
1. Uniform distribution on bounded Borel sets
Let the random vector X : Ω → R^m (with distribution function F) be uniformly distributed on the square (−1, 1]^m and let B ∈ B((−1, 1]^m) be an arbitrary Borel subset of (−1, 1]^m of positive Lebesgue measure |B|.
Then the distribution function G : R^m → [0, 1] given by

    G(y) = ∫_{(−∞, y]} ( 2^m 1I(x ∈ B) / |B| ) dF(x) ,    y ∈ R^m ,

is absolutely continuous with respect to F and we obtain for the (Radon-Nikodym) density g : R^m → [0, ∞) that

    g(x) = 2^m 1I(x ∈ B) / |B| ,    c = 2^m / |B|    and    g(x)/c = 1I(x ∈ B) ,    x ∈ R^m .
By Theorem 3.7 we can now generate pseudo-random vectors y₁, y₂, . . . that are uniformly distributed on B in the following way:

1. Generate m pseudo-random numbers u₁, . . . , u_m that are uniformly distributed on the interval (0, 1].
2. If (2u₁ − 1, . . . , 2u_m − 1)⊤ ∉ B, then return to step 1.
3. Otherwise put y = (2u₁ − 1, . . . , 2u_m − 1)⊤.
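For m = 2 and B the unit circle, steps 1-3 above read as follows (a sketch; the function name is ours):

```python
import random

def uniform_on_disk(rng):
    """Acceptance-rejection sampling of a point uniformly distributed on
    the unit circle B = {(x1, x2): x1^2 + x2^2 <= 1}: proposals are
    uniform on the square (-1, 1]^2 and accepted iff they lie in B."""
    while True:
        y1 = 2.0 * rng.random() - 1.0
        y2 = 2.0 * rng.random() - 1.0
        if y1 * y1 + y2 * y2 <= 1.0:
            return (y1, y2)

rng = random.Random(0)
print(uniform_on_disk(rng))  # one point inside the unit circle
```

The mean number of proposals per accepted point is c = 2²/|B| = 4/π ≈ 1.27, in line with the remark after Theorem 3.5.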
2. Normal distribution
As an alternative to the Box-Muller algorithm discussed in Section 3.2.1 we will now introduce
another method to generate normally distributed pseudorandom numbers,
which is often called the polar method.
Notice that the polar method avoids calculating the trigonometric functions in (12).
Let the random vector (V₁, V₂) be uniformly distributed on the unit circle B, where B = {(x₁, x₂) ∈ R² : x₁² + x₂² ≤ 1}. Then the random vector (Y₁, Y₂) where

    Y₁ = V₁ √(−2 log(V₁² + V₂²)) / √(V₁² + V₂²) ,    Y₂ = V₂ √(−2 log(V₁² + V₂²)) / √(V₁² + V₂²)

is N(o, I)-distributed, i.e., Y₁, Y₂ are independent and N(0, 1)-distributed random variables. This can be seen as follows.
By the substitution v₁ = r cos ϕ, v₂ = r sin ϕ, i.e. by a transformation into polar coordinates, we obtain for arbitrary y₁, y₂ ∈ R

    P(Y₁ ≤ y₁, Y₂ ≤ y₂)
      = (1/π) ∫_B 1I( v₁ √(−2 log(v₁² + v₂²))/√(v₁² + v₂²) ≤ y₁ , v₂ √(−2 log(v₁² + v₂²))/√(v₁² + v₂²) ≤ y₂ ) d(v₁, v₂)
      = (1/π) ∫₀^{2π} ∫₀¹ r 1I( √(−2 log(r²)) cos ϕ ≤ y₁ , √(−2 log(r²)) sin ϕ ≤ y₂ ) dr dϕ
      = (1/2) (1/2π) ∫₀^{2π} ∫₀^∞ 1I( √x cos ϕ ≤ y₁ , √x sin ϕ ≤ y₂ ) e^{−x/2} dx dϕ ,

where the last equality results from the following substitution:

    x = −2 log(r²) ,    i.e.    (1/2) e^{−x/2} dx = 2 r dr .
By the same argument that was used to verify formula (12) in Section 3.2.1 one can check that the last term can be written as the product Φ(y₁) Φ(y₂) of two N(0, 1) distribution functions.

The pseudo-random numbers y₁, . . . , y₂ₙ with

    y_{2k−1} = √(−2 log(v_{2k−1}² + v_{2k}²)) · v_{2k−1} / √(v_{2k−1}² + v_{2k}²) ,
    y_{2k}   = √(−2 log(v_{2k−1}² + v_{2k}²)) · v_{2k}   / √(v_{2k−1}² + v_{2k}²)

can thus be regarded as realizations of independent and N(0, 1)-distributed random variables, if (v₁, v₂), . . . , (v_{2n−1}, v_{2n}) are realizations of the random vectors (V₁, V₂), . . . , (V_{2n−1}, V_{2n}) that are independent and uniformly distributed on the unit circle

    B = {(x₁, x₂) ∈ R² : x₁² + x₂² ≤ 1} .
Those can be generated via acceptancerejection sampling as explained in the last example.
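Combining the rejection step for the unit circle with the transformation above yields the polar method in code (a sketch; the function name is ours):

```python
import math
import random

def polar_normals(n_pairs, rng):
    """2*n_pairs N(0,1)-distributed pseudo-random numbers: (v1, v2) is
    drawn uniformly on the unit circle by rejection from the square,
    then scaled by sqrt(-2 log(s))/sqrt(s) with s = v1^2 + v2^2."""
    ys = []
    while len(ys) < 2 * n_pairs:
        v1 = 2.0 * rng.random() - 1.0
        v2 = 2.0 * rng.random() - 1.0
        s = v1 * v1 + v2 * v2
        if 0.0 < s <= 1.0:
            factor = math.sqrt(-2.0 * math.log(s)) / math.sqrt(s)
            ys.append(v1 * factor)
            ys.append(v2 * factor)
    return ys
```

Unlike the Box-Muller variant (12), no trigonometric functions are evaluated.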
3.2.4
In many cases random variables having absolutely continuous distributions can be represented as quotients of uniformly distributed random variables. Combined with acceptance-rejection sampling (see Section 3.2.3) this yields another type of simulation algorithm. The mathematical foundation for this type of algorithm is the following transformation theorem for the density of absolutely continuous random vectors.
Theorem 3.8

Let X = (X₁, . . . , X_n)⊤ : Ω → R^n be an absolutely continuous random vector with joint density f_X : R^n → [0, ∞) and let φ = (φ₁, . . . , φ_n) : R^n → R^n be a Borel-measurable function with continuous partial derivatives ∂φ_i/∂x_j (x₁, . . . , x_n).

Let now the Borel set C ∈ B(R^n) be picked in a way such that

    {x ∈ R^n : f_X(x) ≠ 0} ⊂ C    and    det( ∂φ_i/∂x_j (x₁, . . . , x_n) ) ≠ 0 ,    x = (x₁, . . . , x_n) ∈ C ,

which ensures that the restriction φ : C → D of φ to the set C is a bijection, where D = {φ(x) : x ∈ C} denotes the image of φ. Let φ⁻¹ = (φ₁⁻¹, . . . , φ_n⁻¹) : D → C be the inverse of φ : C → D.

Then the random vector Y = φ(X) is also absolutely continuous and the density f_Y(y) of Y is given by

    f_Y(y) = f_X( φ₁⁻¹(y), . . . , φ_n⁻¹(y) ) · | det( ∂φ_i⁻¹/∂y_j (y₁, . . . , y_n) ) |  if y = (y₁, . . . , y_n) ∈ D,
    f_Y(y) = 0  if y ∉ D,    (25)
which is the same as

    f_Y(y) = f_X( φ₁⁻¹(y), . . . , φ_n⁻¹(y) ) · | det( ∂φ_i/∂x_j (φ⁻¹(y₁, . . . , y_n)) ) |⁻¹  if y = (y₁, . . . , y_n) ∈ D,
    f_Y(y) = 0  if y ∉ D.    (26)
From Theorem 3.8 we obtain the following result concerning the representation of absolutely continuous random
variables as quotients of uniformly distributed random variables.
Theorem 3.9

Let f₀ : R → [0, ∞) be Borel measurable and bounded such that

    0 < ∫_R f₀(x) dx < ∞    and    sup_{x∈R} |x| √(f₀(x)) < ∞ .    (27)

Let the random vector (V₁, V₂) be uniformly distributed on the (bounded) Borel set

    B = { (x₁, x₂) ∈ R² : 0 < x₁ < √(f₀(x₂/x₁)) } .    (28)

Then the quotient V₂/V₁ is an absolutely continuous random variable with density f : R → [0, ∞) where

    f(x) = f₀(x) / ∫_R f₀(y) dy ,    x ∈ R .
Proof

Notice that (27) implies that the Borel set B defined in (28) is bounded, i.e. 0 < |B| < ∞. This is due to the following reasons. For x₂ > 0 the inequality x₁ < √(f₀(x₂/x₁)) is equivalent to x₂ < (x₂/x₁) √(f₀(x₂/x₁)); if on the other hand x₂ < 0, it is equivalent to x₂ > (x₂/x₁) √(f₀(x₂/x₁)). Therefore

    B ⊂ [ 0 , sup_{x∈R} √(f₀(x)) ] × [ inf_{x<0} x √(f₀(x)) , sup_{x>0} x √(f₀(x)) ] .    (29)
The following joint density f_{(V₁,V₂)}(v₁, v₂) of the random vector (V₁, V₂) is thus well defined:

    f_{(V₁,V₂)}(v₁, v₂) = 1I( (v₁, v₂) ∈ B ) / |B| .

We apply Theorem 3.8 with φ(x₁, x₂) = (x₁, x₂/x₁) on C = B, whose Jacobian determinant is

    det( ∂φ_i/∂x_j (x₁, x₂) ) = det [ 1, 0 ; −x₂/x₁², 1/x₁ ] = 1/x₁ ,    (x₁, x₂) ∈ C .

For the density of the quotient V₂/V₁ this yields

    f_{V₂/V₁}(y₂) = ∫₀^{√(f₀(y₂))} y₁ / |B| dy₁ = f₀(y₂) / (2 |B|) ,

and the assertion follows since the right-hand side must integrate to 1, i.e. 2|B| = ∫_R f₀(y) dy.
Example  (Normal distribution)

Theorem 3.9 yields a third method to generate N(0, 1)-distributed pseudo-random numbers (as an alternative to the Box-Muller algorithm from Section 3.2.1 and the polar method explained in Section 3.2.3).

Consider the function f₀ : R → [0, ∞) where f₀(x) = exp(−x²/2) for all x ∈ R. For the bounds in (29) we obtain:

    sup_{x∈R} √(f₀(x)) = 1 ,    inf_{x<0} x √(f₀(x)) = −√(2/e) ,    sup_{x>0} x √(f₀(x)) = √(2/e) .
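Together with acceptance-rejection sampling from the bounding rectangle in (29), Theorem 3.9 yields the following sampler for N(0, 1)-distributed pseudo-random numbers (a sketch; the function name is ours):

```python
import math
import random

def ratio_of_uniforms_normal(rng):
    """One N(0,1)-distributed pseudo-random number via Theorem 3.9 with
    f0(x) = exp(-x^2/2): draw (v1, v2) uniformly on the set B from (28)
    by rejection from the rectangle [0,1] x [-sqrt(2/e), sqrt(2/e)],
    then return the quotient v2/v1."""
    b = math.sqrt(2.0 / math.e)
    while True:
        v1 = rng.random()                    # uniform on [0, 1)
        v2 = (2.0 * rng.random() - 1.0) * b  # uniform on [-b, b]
        # membership in B: 0 < v1 < sqrt(f0(v2/v1)), i.e. v1^2 < f0(v2/v1)
        if v1 > 0.0 and v1 * v1 < math.exp(-((v2 / v1) ** 2) / 2.0):
            return v2 / v1

rng = random.Random(3)
print(ratio_of_uniforms_normal(rng))  # one standard normal number
```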
3.3
Let E be an arbitrary finite set, e.g. a family of possible digital binary or greyscale images x = (x(v), v ∈ V), where V is a finite set of pixels and every pixel v ∈ V in the observation window V gets mapped to a greyscale value x(v) ≥ 0, resulting in a matrix (x(v), v ∈ V) that has certain properties.

Let π : E → (0, 1) be an arbitrary probability function, i.e.

    Σ_{x∈E} π_x = 1    and    π_x > 0 ,    x ∈ E .
3.3.1
(see O. Hggstrm (2002) Finite Markov Chains and Algorithmic Applications. CU Press, Cambridge)
We consider a connected graph G = (V, K)
with finitely many vertices V = {v1 , . . . , v|V | }
and a certain set K V 2 of edges, each of them connecting two vertices.
Each vertex in V gets either mapped to 0 or 1,
where we consider the following set E {0, 1}|V | of admissible configurations,
characterized by the property that pairs of connected vertices are not allowed to obtain the value 1 on
both vertices; see also Figure 6.
As we want to pick one of the admissible configurations x ∈ E at random we consider the (discrete)
uniform distribution on E, i.e.

    π_x = 1/ℓ ,    x ∈ E ,    (30)

where ℓ = |E| denotes the number of all admissible configurations.
Given the current configuration x = (x(v), v ∈ V) and a on (0, 1] uniformly distributed random number z,
the new configuration x′ = (x′(v), v ∈ V) is given by

    x′(v_i) = 1        if z ∈ ((2i−2)/(2|V|), (2i−1)/(2|V|)] and x(w) = 0 for all vertices w ∈ V
                       connected to v_i ∈ V,
    x′(v_i) = 0        if z ∈ ((2i−1)/(2|V|), 2i/(2|V|)], or if z ∈ ((2i−2)/(2|V|), (2i−1)/(2|V|)] and
                       x(w) = 1 for some vertex w ∈ V connected to v_i,
    x′(v_i) = x(v_i)   if z ∉ ((2i−2)/(2|V|), 2i/(2|V|)] .    (31)
The following theorem implies that for sufficiently large n the output x_n = (x_n(v), v ∈ V) of the
algorithm can be regarded as a configuration that has been picked approximately according to the
distribution π.
Theorem 3.10
Let P = (p_{xx′}) be the transition matrix of the MCMC algorithm simulating the hard-core model in (31), and
let π be the probability function given in (30).
Then P is irreducible and aperiodic and the pair (P, π) is reversible.
Proof
In order to show that P = (p_{xx′}) is aperiodic it suffices to note that all diagonal elements p_{xx} of P are
positive.
The following considerations show that P is also irreducible.
Let x, x′ ∈ E be two admissible configurations and let m(x) and m(x′) denote the number of
vertices set to 1 in x and x′, respectively.
First we observe that the transition x → x₀ to the zero configuration x₀ ∈ E, where x₀(v) = 0 for all
v ∈ V, is possible in m(x) steps with positive probability.
For this transition all vertices that were originally set to 1 are subsequently set to 0. Each of these
steps happens with positive probability.
Afterwards, in a similar way the chain can pass from the zero configuration x₀ to the state x′ in
m(x′) steps, each of which again happens with positive probability.
Thus the transition x → x′ in a finite number of steps is possible with positive probability.
It is left to check that the detailed balance equation (2.85) holds, i.e.

    π_x p_{xx′} = π_{x′} p_{x′x} ,    x, x′ ∈ E .    (32)

For x = x′ this is obvious. If x and x′ differ in exactly one vertex, then

    π_x p_{xx′} = (1/ℓ) · (1/(2|V|)) = π_{x′} p_{x′x} ,

and if x and x′ differ in more than one vertex, both sides of (32) are equal to zero.
Remarks
For all x ∈ E let m(x) be the number of vertices set to 1 of the admissible configuration x.
If the admissible configuration is picked at random then the expectation E Y of the random number
Y of vertices set to 1 is given as

    E Y = (1/ℓ) Σ_{x∈E} m(x) .    (33)

If ℓ is large the direct calculation of the expectation E Y via formula (33) is in general not possible
because it is difficult to determine the numbers m(x) analytically.
A method to approximate the expectation E Y is based on generating k randomly picked admissible
configurations x_n^{(1)}, x_n^{(2)}, . . . , x_n^{(k)} ∈ E by k runs of the MCMC simulation algorithm described above.
As a consequence of the strong law of large numbers the arithmetic mean ( m(x_n^{(1)}) + m(x_n^{(2)}) + . . . +
m(x_n^{(k)}) ) / k is close to E Y with high probability if the run length n and the sample size k are sufficiently
large.
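The single-vertex update (31) and the estimation of E Y can be sketched as follows. The graph, run length and sample size are illustrative assumptions of mine, and the coin-toss formulation is an equivalent way of drawing the random number z:

```python
import random

def hardcore_step(x, neighbors):
    """One update of the hard-core chain: choose a vertex uniformly and
    toss a fair coin; on heads set it to 1 if all neighbors are 0,
    on tails set it to 0 (an equivalent formulation of rule (31))."""
    v = random.randrange(len(x))
    if random.random() < 0.5:
        if all(x[w] == 0 for w in neighbors[v]):
            x[v] = 1
    else:
        x[v] = 0

def estimate_ey(neighbors, n, k):
    """Approximate E Y from (33) by k independent runs of length n."""
    total = 0
    for _ in range(k):
        x = [0] * len(neighbors)         # start in the zero configuration
        for _ in range(n):
            hardcore_step(x, neighbors)
        total += sum(x)
    return total / k
```

For the cycle on 4 vertices a quick count gives ℓ = 7 admissible configurations (the zero configuration, four single 1s, and two opposite pairs), hence E Y = 8/7, which the estimator reproduces for moderate n and k.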
3.3.2
Gibbs Sampler
The MCMC algorithm for the generation of randomly picked admissible configurations of the hard-core model
(see Section 3.3.1) is a special case of a so-called Gibbs sampler for the simulation of discrete (high-dimensional)
random vectors.
Let V be a finite (nonempty) index set and let X = (X(v), v ∈ V ) be a discrete random vector
taking values in the finite state space E ⊂ R^{|V|} with probability 1, where we assume
that for every pair x, x′ ∈ E there is a finite sequence of states y₀, y₁, . . . , yₙ ∈ E such that

    y₀ = x ,    yₙ = x′    and    y_i and y_{i+1} differ in at most one component,    i = 0, . . . , n − 1 .    (34)
Let π = (π_x, x ∈ E) be the probability function of the random vector X with π_x > 0 for all x ∈ E, and for
all v ∈ V let

    π( x(v) | x(V∖{v}) ) = π_x / Σ_{z∈E: z(w)=x(w) for all w≠v} π_z    (35)

denote the conditional probability that the vth component takes the value x(v), given that all other
components coincide with those of x.
3. Generate the update x_{n+1}(v) of the vth component according to the (conditional) probability
function defined in (35).

Theorem 3.11   Let P = (p_{xx′}) be the transition matrix given by

    p_{xx′} = Σ_{v∈V} q_v π( x′(v) | x(V∖{v}) ) 1I( x(w) = x′(w) for all w ≠ v ) ,    x, x′ ∈ E ,    (36)

where the conditional probabilities π( x′(v) | x(V∖{v}) ) are defined in (35). Then P is irreducible and aperiodic and the
pair (P, π) is reversible.
Proof
In order to show that P = (p_{xx′}) is aperiodic it suffices to note that all diagonal elements of P are
positive, since by (35)

    p_{xx} ≥ Σ_{v∈V} q_v π( x(v) | x(V∖{v}) ) = Σ_{v∈V} q_v ( π_x / Σ_{z∈E: z(w)=x(w) for all w≠v} π_z ) > 0 .    (37)

Moreover, P is irreducible: by assumption (34), for every pair x, x′ ∈ E there is a finite sequence of
states y₀ = x, y₁, . . . , yₙ = x′ such that consecutive states differ in at most one component, and hence

    p_{xx′}^{(n)} ≥ Π_{i=0}^{n−1} p_{y_i y_{i+1}} > 0 .

It is left to show that the detailed balance equation (2.85) holds, i.e.

    π_x p_{xx′} = π_{x′} p_{x′x} ,    x, x′ ∈ E .    (38)
Let X₀, X₁, . . . be a Markov chain with state space E and the transition matrix P = (p_{xx′}) given by (36). As a
consequence of Theorem 3.11 we get that in this case

    lim_{n→∞} d_TV(α_n, π) = 0    (39)

for any initial distribution α₀, where α_n denotes the distribution of X_n. Furthermore, the Gibbs sampler shows
the following monotonic behavior.
Theorem 3.12
For all n = 0, 1, . . .,

    d_TV(α_{n+1}, π) ≤ d_TV(α_n, π) .    (40)
Proof
For arbitrary v ∈ V and x′ ∈ E, formula (35) implies

    Σ_{x∈E: x(w)=x′(w) for all w≠v} π( x′(v) | x(V∖{v}) ) π_x
        = π( x′(v) | x′(V∖{v}) ) Σ_{x∈E: x(w)=x′(w) for all w≠v} π_x = π_{x′} ,    (41)

where the last equality holds by the definition (35) of the conditional probabilities.
Using this and the definition (36) of the transition matrix P = (p_{xx′}) we obtain

    2 d_TV(α_{n+1}, π) = Σ_{x′∈E} | α_{n+1,x′} − π_{x′} |
      = Σ_{x′∈E} | Σ_{x∈E} α_{n,x} p_{xx′} − π_{x′} |
      = Σ_{x′∈E} | Σ_{v∈V} q_v Σ_{x∈E: x(w)=x′(w) for all w≠v} π( x′(v) | x(V∖{v}) ) ( α_{n,x} − π_x ) |    (by (36) and (41))
      ≤ Σ_{v∈V} q_v Σ_{x∈E} | α_{n,x} − π_x | Σ_{x′∈E: x′(w)=x(w) for all w≠v} π( x′(v) | x(V∖{v}) )
      = Σ_{v∈V} q_v Σ_{x∈E} | α_{n,x} − π_x |
      = Σ_{x∈E} | α_{n,x} − π_x | = 2 d_TV(α_n, π) ,

where we used that Σ_{x′∈E: x′(w)=x(w) for all w≠v} π( x′(v) | x(V∖{v}) ) = 1 (which also shows that
Σ_{x′∈E} p_{xx′} = 1) and that Σ_{v∈V} q_v = 1.
Remarks
A modified version of the Gibbs sampler considered in this section is the so-called cyclic Gibbs
sampler, which uses a different procedure for picking the component v ∈ V that will be updated.
Namely, it is not chosen according to a (given) probability function q = (q_v, v ∈ V ), where q_v > 0
for all v ∈ V,
but the components v ∈ V are sorted linearly and chosen one after another according to this order.
The selection of the update candidates thus becomes a deterministic procedure.
If k = n|V| + i for some numbers n = 0, 1, . . . and i = 1, . . . , |V|, then the matrix P(k) = (p_{xx′}(k)) of
the transition probabilities p_{xx′}(k) in step k is given as

    p_{xx′}(k) = π( x′(v_i) | x(V∖{v_i}) ) 1I( x(w) = x′(w) for all w ≠ v_i ) ,    x, x′ ∈ E .    (42)

Each matrix P(i) has π as a stationary distribution, since by (41)

    Σ_{x∈E} π_x p_{xx′}(i) = Σ_{x∈E} π_x π( x′(v_i) | x(V∖{v_i}) ) 1I( x(w) = x′(w) for all w ≠ v_i ) = π_{x′} ,

and consequently Σ_{x∈E} π_x p_{xx′} = π_{x′} for the transition matrix P = P(1) · · · P(|V|) of one complete
(forward) scan.
The pair (P, π) is in general not reversible. However, in Section 2.3.4 we showed that the pair (M, π)
is reversible, where

    M = P P̃    for    P̃ = diag(π_x⁻¹) Pᵀ diag(π_x)    (44)

denotes the multiplicative reversible version of P.
Theorem 3.13
It holds that

    M = P(1) · · · P(|V|) P(|V|) · · · P(1) ,    (45)

i.e., the multiplicative reversible version M of the forward-scan matrix P coincides with the forward–backward
scan matrix.
Proof
It suffices to show that P̃ = P(|V|) · · · P(1) for the matrix P̃ = (p̃_{xx′}) defined by (44).
Formulae (42)–(44) imply for arbitrary x, x′ ∈ E

    p̃_{xx′} = ( diag(π_x⁻¹) P(|V|)ᵀ · · · P(1)ᵀ diag(π_x) )_{xx′}
      = (1/π_x) Σ_{y₁,...,y_{|V|−1}∈E} π( x(v_{|V|}) | y₁(V∖{v_{|V|}}) ) 1I( x(w) = y₁(w), w ≠ v_{|V|} )
          π( y₁(v_{|V|−1}) | y₂(V∖{v_{|V|−1}}) ) · · · π_{x′} .

Applying the identity π_x π( y(v) | x(V∖{v}) ) = π_y π( x(v) | y(V∖{v}) ), which by (35) holds for
configurations x, y that agree off v, the factor 1/π_x can be moved through the product, yielding

    p̃_{xx′} = Σ_{y₁,...,y_{|V|−1}∈E} π( y₁(v_{|V|}) | x(V∖{v_{|V|}}) ) 1I( x(w) = y₁(w), w ≠ v_{|V|} )
          π( y₂(v_{|V|−1}) | y₁(V∖{v_{|V|−1}}) ) · · ·
      = ( P(|V|) · · · P(1) )_{xx′} .
Remarks
If Gibbs samplers are used in practice it is always assumed
that the conditional probabilities considered in (36) and (42) can be computed and sampled from
efficiently.
3.3.3   Metropolis–Hastings Algorithm
We will now show that the Gibbs sampler discussed in Section 3.3.2 is a special case of a class of MCMC
algorithms that are of the so-called Metropolis–Hastings type. This class generalizes two aspects of the
Gibbs sampler.
1. The transition matrix P = (p_{xx′}) can be of a more general form than the one defined by

    p_{xx′} = Σ_{v∈V} q_v π( x′(v) | x(V∖{v}) ) 1I( x(w) = x′(w) for all w ≠ v ) ,    x, x′ ∈ E .    (46)
2. Besides this, a procedure for acceptance or rejection of the updates x x0 is integrated into the
algorithm. It is based on a similar idea as the acceptance-rejection sampling discussed in Section 3.2.3;
see in particular Theorem 3.5.
Let V be a finite nonempty index set and let X = (X(v), v ∈ V ) be a discrete random vector
taking values in the finite state space E ⊂ R^{|V|} with probability 1.
As usual we assume π_x > 0 for all x ∈ E where π = (π_x, x ∈ E) is the probability function of the
random vector X.
We construct a Markov chain X₀, X₁, . . . with ergodic limit distribution π whose transition matrix P =
(p_{xx′}) is given by

    p_{xx′} = q_{xx′} a_{xx′} ,    x, x′ ∈ E with x ≠ x′ ,    (47)
where Q = (q_{xx′}) is an arbitrary stochastic matrix that is irreducible and aperiodic and additionally
satisfies q_{xx′} = 0 if and only if q_{x′x} = 0. The acceptance probabilities a_{xx′} are given by

    a_{xx′} = s_{xx′} / (1 + t_{xx′}) ,    (48)

where

    t_{xx′} = (π_x q_{xx′}) / (π_{x′} q_{x′x})   if q_{xx′} > 0,    and    t_{xx′} = 0   if q_{xx′} = 0,    (49)

and where S = (s_{xx′}) is an arbitrary symmetric matrix such that

    0 ≤ s_{xx′} ≤ 1 + min{ t_{xx′} , t_{x′x} } ,    x, x′ ∈ E .    (50)
Remarks
The structure given by (47) of the transition matrix P = (p_{xx′}) can be interpreted as follows: first a
potential update x → x′ is proposed according to the transition matrix Q = (q_{xx′}); it is then accepted
with probability a_{xx′}, and otherwise the chain remains in x.
Notice that apart from the transition matrix Q = (q_{xx′}) only the quotients π_x/π_{x′} need to be known
for all pairs x, x′ ∈ E of states such that q_{xx′} > 0.
The special case of the Gibbs sampler (see Section 3.3.2) is obtained
if the potential transition probabilities q_{xx′} are defined by (46).
Then for arbitrary x, x′ ∈ E such that #{v ∈ V : x(v) ≠ x′(v)} ≤ 1

    π_x q_{xx′} = π_{x′} q_{x′x}    and thus    t_{xx′} = 1 .

By defining s_{xx′} = 1 + min{ t_{xx′}, t_{x′x} } we obtain a_{xx′} = 1 for arbitrary x, x′ ∈ E such that
#{v ∈ V : x(v) ≠ x′(v)} ≤ 1.
Theorem 3.14   The transition matrix P = (p_{xx′}) defined by (47)–(50) is irreducible and aperiodic and the pair
(P, π) is reversible.

Proof
As the acceptance probabilities a_{xx′} given by (48)–(50) are positive for arbitrary x, x′ ∈ E, the
irreducibility and aperiodicity of P = (p_{xx′}) are inherited from the corresponding properties of Q = (q_{xx′}).
In order to check the detailed balance equation (2.85), i.e.

    π_x p_{xx′} = π_{x′} p_{x′x} ,    x, x′ ∈ E ,    (51)

it suffices to consider x ≠ x′ with q_{xx′} > 0. Then

    π_x p_{xx′} = π_x q_{xx′} a_{xx′}
      = π_x q_{xx′} · s_{xx′} π_{x′} q_{x′x} / ( π_{x′} q_{x′x} + π_x q_{xx′} )
      = π_{x′} q_{x′x} · s_{x′x} π_x q_{xx′} / ( π_x q_{xx′} + π_{x′} q_{x′x} ) = π_{x′} p_{x′x} ,

where the last equality follows by the symmetry of the matrix S = (s_{xx′}).
Examples
1. Metropolis Algorithm
The classic Metropolis algorithm is obtained if we consider equality in (50), i.e. if

    a_{xx′} = min{ 1, (π_{x′} q_{x′x}) / (π_x q_{xx′}) } ,    x, x′ ∈ E such that q_{xx′} > 0.    (52)

If the matrix Q = (q_{xx′}) of the potential transition probabilities is symmetric, then (52) implies

    a_{xx′} = min{ 1, π_{x′}/π_x } ,    x, x′ ∈ E such that q_{xx′} > 0.    (53)

In particular, if the potential updates x → x′ are chosen completely at random, i.e. if

    q_{xx′} = 1/|E| ,    x, x′ ∈ E ,

then the acceptance probabilities are given by (53).
2. Barker Algorithm
The so-called Barker algorithm is obtained if we consider the matrix S = (s_{xx′}) where s_{xx′} = 1
for arbitrary x, x′ ∈ E.
The acceptance probabilities a_{xx′} are then given by

    a_{xx′} = (π_{x′} q_{x′x}) / ( π_{x′} q_{x′x} + π_x q_{xx′} ) ,    (54)

and, for symmetric Q = (q_{xx′}), by

    a_{xx′} = π_{x′} / ( π_{x′} + π_x ) .    (55)
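A minimal sketch of the Metropolis algorithm with acceptance probabilities (53) on a small state space; the target distribution and the uniform proposal are illustrative assumptions of mine:

```python
import random

# Target: pi proportional to (1, 2, 3) on E = {0, 1, 2} (illustrative).
pi = [1 / 6, 2 / 6, 3 / 6]

def metropolis_step(x):
    """Propose x' uniformly on E (a symmetric Q with q = 1/|E|) and
    accept with probability min(1, pi[x'] / pi[x]), cf. (53)."""
    x_new = random.randrange(len(pi))
    if random.random() < min(1.0, pi[x_new] / pi[x]):
        return x_new
    return x
```

Running the chain for many steps, the empirical visit frequencies approach π, as Theorem 3.14 together with the ergodic theorem of Section 2.3 suggests.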
3.4
3.4.1
We will now show how the upper bounds for the variational distance d_TV(α_n, π) and the second largest
absolute value |λ|₂ = max{λ₂, |λ_ℓ|} of the eigenvalues λ₁, . . . , λ_ℓ of the transition matrix P derived in
Section 2.3 can be used
in order to determine upper bounds for the distance d_TV(α_n, π) occurring in the nth step of the
MCMC simulation via the Metropolis algorithm,
if the simulated distribution π satisfies the following conditions.
Namely we assume
that π_x ≠ π_{x′} for arbitrary x, x′ ∈ E such that x ≠ x′,
and that the states x₁, . . . , x_ℓ ∈ E are ordered such that π_{x₁} > . . . > π_{x_ℓ}.
We may thus (w.l.o.g.) return to the notation used in Section 2.3 and identify the states x₁, . . . , x_ℓ ∈ E
with the first ℓ natural numbers, i.e. E = {1, . . . , ℓ}.
The probabilities π_i (= π_{x_i}) can thus be written in the following way:

    π_i = b^{h(i)} / z(b) ,    i = 1, . . . , ℓ ,    (56)

where b ∈ (0, 1), h : {1, . . . , ℓ} → R, z(b) = Σ_{i=1}^ℓ b^{h(i)}, and where we assume that for some
constant c > 0

    h(i + 1) − h(i) ≥ c ,    i = 1, . . . , ℓ − 1 .    (57)

Furthermore, the definition of a Metropolis algorithm for the MCMC simulation of π = (π₁, . . . , π_ℓ)ᵀ
requires
that the basis b and the differences h(i + 1) − h(i) are known for all i = 1, . . . , ℓ − 1,
i.e. in particular that the quotients π_{i+1}/π_i are known for all i = 1, . . . , ℓ − 1.
As the matrix Q = (q_ij) of the potential transition probabilities we choose

    q_ij = 1/2    if i = 1, j = 1, 2 or i = ℓ, j = ℓ, ℓ − 1,
    q_ij = 1/2    if i = 2, . . . , ℓ − 1 and j = i − 1, i + 1,
    q_ij = 0 ,    else.    (58)

As Q is symmetric, the acceptance probabilities (52) become

    a_ij = min{ 1, (π_j q_ji)/(π_i q_ij) } = min{ 1, b^{h(j)−h(i)} } ,    i, j ∈ {1, . . . , ℓ} where q_ij = q_ji > 0.
By (56) and (58) the entries p_ij = q_ij a_ij of the transition matrix P = (p_ij) for the MCMC simulation
are thus given as

    p₁₁ = 1 − b^{h(2)−h(1)}/2 ,    p₁₂ = b^{h(2)−h(1)}/2 ,    p_{ℓ,ℓ−1} = p_{ℓℓ} = 1/2    (59)

and for i = 2, . . . , ℓ − 1

    p_{i,i−1} = 1/2 ,    p_{i,i+1} = b^{h(i+1)−h(i)}/2 ,    p_{ii} = ( 1 − b^{h(i+1)−h(i)} ) / 2 .    (60)
Theorem 3.15   The second largest eigenvalue λ₂ of the transition matrix P = (p_ij) defined by (59)–(60) has
the following upper bound:

    λ₂ ≤ 1 − (1 − b^{c/2})² / 2 .    (61)
Proof
By Theorem 3.14 the pair (P, π) is reversible.
Hence, Rayleigh's theorem (see Theorem 2.17) yields the following representation formula:

    λ₂ = 1 − inf_{x∈R^ℓ_≠} ( D_{(P,π)}(x, x) / Var_π(x) ) ,    (62)

where R^ℓ_≠ = { x = (x₁, . . . , x_ℓ)ᵀ ∈ R^ℓ : x_i ≠ x_j for some i, j ∈ E } denotes the subset of vectors
in R^ℓ whose components are not all equal,
Var_π(x) = ‖x‖²_π − (πᵀx)² is the variance of the components of x with respect to π,
and D_{(P,π)}(x, x) = ⟨(I − P)x, x⟩_π denotes the Dirichlet form of the reversible pair (P, π).
Due to (62) it is sufficient to show that

    Var_π(x) ≤ a D_{(P,π)}(x, x) ,    x ∈ R^ℓ ,    (63)

for some constant a such that

    0 < a ≤ 2 / (1 − b^{c/2})² .    (64)
Similarly to the proof of Theorem 2.18 we obtain (copying the notation used there) that for all β ∈ (0, 1)

    2 Var_π(x) = Σ_{i,j∈E} (x_i − x_j)² π_i π_j
      = Σ_{i,j∈E} ( Σ_{e∈γ_ij} Q(e)^β · Q(e)^{−β} (x_{e−} − x_{e+}) )² π_i π_j
      ≤ Σ_{i,j∈E} ( Σ_{e∈γ_ij} Q(e)^{2β} (x_{e−} − x_{e+})² ) ( Σ_{e∈γ_ij} Q(e)^{−2β} ) π_i π_j ,

where the Cauchy–Schwarz inequality has been used, the edge probability Q(e) = π_{e−} p_{e−e+} is assigned
to the directed edge e = (e−, e+), and γ_ij denotes the path from i to j.
Using the notation |γ_ij|_β = Σ_{e∈γ_ij} Q(e)^{−2β} we thus obtain

    2 Var_π(x) ≤ Σ_{i,j∈E} π_i π_j |γ_ij|_β Σ_{e∈γ_ij} Q(e)^{2β} (x_{e−} − x_{e+})²
      = Σ_e (x_{e−} − x_{e+})² Q(e) ( Q(e)^{2β−1} Σ_{γ_ij∋e} π_i π_j |γ_ij|_β )
      ≤ 2 a D_{(P,π)}(x, x)

for the constant

    a = max_e { Q(e)^{2β−1} Σ_{γ_ij∋e} π_i π_j |γ_ij|_β } ,    (65)

i.e., (63) holds with this choice of a.
It is left to show that the constant a considered in (65) satisfies the inequality (64).
For this purpose we choose the path γ_ij = (i, i + 1, . . . , j − 1, j) for each pair i, j ∈ E such that
i < j.
Then (56) and (59)–(60) imply

    Q(i, i + 1) = π_i p_{i,i+1} = ( b^{h(i)}/z(b) ) · ( b^{h(i+1)−h(i)}/2 ) = π_{i+1}/2 .

Thus, the reversibility of the pair (P, π) shown in Theorem 3.14 yields

    Q(i + 1, i) = Q(i, i + 1) = π_{i+1}/2 .

Because of (56) and (57) we obtain for arbitrary i, j ∈ E such that i < j

    |γ_ij|_β = (2/π_{i+1})^{2β} + . . . + (2/π_j)^{2β}
      = (2/π_j)^{2β} ( b^{2β(h(j)−h(i+1))} + . . . + b^{2β(h(j)−h(j−1))} + 1 )
      ≤ (2/π_j)^{2β} ( b^{2(j−i−1)βc} + . . . + b^{2βc} + 1 )
      ≤ (2/π_j)^{2β} / (1 − b^{2βc}) .

Moreover, all edges e are of the form e = (i, i + 1) or e = (i, i − 1), as for the entries p_ij of the
transition matrix P = (p_ij) defined by (58)–(60) we have p_ij = 0 if |i − j| > 1.
Thus, for β < 1/2,

    a = max_e { Q(e)^{2β−1} Σ_{γ_ij∋e} π_i π_j |γ_ij|_β }
      ≤ max_{k=1,...,ℓ−1} Q(k, k + 1)^{2β−1} Σ_{1≤i≤k, k+1≤j≤ℓ} π_i π_j^{1−2β} 2^{2β} / (1 − b^{2βc})
      ≤ 2 / ( (1 − b^{2βc})(1 − b^{c(1−2β)}) ) ,

as Q(k, k + 1)^{2β−1} = (π_{k+1}/2)^{2β−1}, Σ_{1≤i≤k} π_i ≤ 1, and hence

    Σ_{k+1≤j≤ℓ} π_j^{1−2β} = π_{k+1}^{1−2β} ( (π_{k+1}/π_{k+1})^{1−2β} + . . . + (π_ℓ/π_{k+1})^{1−2β} )
      ≤ π_{k+1}^{1−2β} / (1 − b^{c(1−2β)}) .

Choosing β = 1/4 finally yields a ≤ 2/(1 − b^{c/2})², i.e., the inequality (64) holds.
The following lemma will turn out to be useful in order to derive a lower bound for the smallest eigenvalue λ_ℓ of
the transition matrix P = (p_ij) defined by (59)–(60).

Lemma 3.1
Let A = (a_ij) be an arbitrary ℓ × ℓ matrix, let λ be an eigenvalue of A, and for all i = 1, . . . , ℓ let
r_i = Σ_{j: 1≤j≤ℓ, j≠i} |a_ij|. Then there is an index k ∈ {1, . . . , ℓ} such that

    |λ − a_kk| ≤ r_k .    (66)

Proof
Let φ = (φ₁, . . . , φ_ℓ)ᵀ be an eigenvector of A for the eigenvalue λ and let k be an index such that
|φ_k| = max_{j=1,...,ℓ} |φ_j| > 0. By definition of λ and φ we have Aφ = λφ. In particular

    Σ_{j=1}^ℓ a_kj φ_j = λ φ_k    and    (λ − a_kk) φ_k = Σ_{j: 1≤j≤ℓ, j≠k} a_kj φ_j .

This implies

    |λ − a_kk| |φ_k| ≤ Σ_{j: 1≤j≤ℓ, j≠k} |a_kj| |φ_j| ≤ r_k |φ_k|    and hence    |λ − a_kk| ≤ r_k .
Theorem 3.16   The smallest eigenvalue λ_ℓ of the transition matrix P = (p_ij) defined by (59)–(60) has the
following lower bound:

    λ_ℓ ≥ −b^c .    (67)

Proof
By Lemma 3.1 applied to A = P (and to the index k determined there for λ = λ_ℓ)

    |λ_ℓ − p_kk| ≤ Σ_{j: 1≤j≤ℓ, j≠k} p_kj = 1 − p_kk ,

and hence λ_ℓ ≥ −1 + 2 p_kk. Thus, by (59)–(60),

    λ_ℓ ≥ −1 + 2 min_{i=1,...,ℓ} p_ii ≥ −1 + 2 · (1 − b^c)/2 = −b^c .
Remark
Summarizing the results of Theorems 3.15 and 3.16 we have shown that

    |λ|₂ = max{ λ₂ , |λ_ℓ| } ≤ max{ 1 − (1 − b^{c/2})²/2 , b^c } = 1 − (1 − b^{c/2})²/2 .    (68)
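For the smallest nontrivial case ℓ = 2 the bounds (61), (67) and (68) can be checked directly, since the non-unit eigenvalue of a 2 × 2 stochastic matrix equals its trace minus 1. The values of b and h below are illustrative choices of mine:

```python
# Sanity check of the eigenvalue bounds for l = 2, with illustrative
# parameters b = 0.5, h(1) = 0, h(2) = 1 (so c = 1 and pi ~ (1, b)).
b, c = 0.5, 1.0
d = b ** c                      # d = b^(h(2)-h(1))

# By (59) the transition matrix is [[1 - d/2, d/2], [1/2, 1/2]];
# its eigenvalues are 1 and trace - 1 = 1/2 - d/2.
lam = 0.5 - d / 2               # the non-unit eigenvalue (= lambda_2 = lambda_l)

upper = 1 - (1 - b ** (c / 2)) ** 2 / 2   # bound (61) on lambda_2
lower = -(b ** c)                          # bound (67) on lambda_l

assert lower <= lam <= upper
```

With these parameters lam = 0.25, well inside the interval [−0.5, ≈0.957] given by (67) and (61).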
3.4.2
In this section we will investigate the characteristics of Monte-Carlo estimators for expectations.
Examples of similar problems were already discussed in Section 3.1.1,
where we estimated the number π by statistical means
and computed the value of integrals via Monte-Carlo simulation.
However, for these purposes we assumed
that the pseudo-random numbers can be regarded as realizations of independent and identically distributed sampling variables.
In the present section we assume that the sample variables form an (appropriately chosen) Markov
chain.
This is the reason why these estimators are called Markov-Chain-Monte-Carlo estimators (MCMC estimators).
Statistical Model
Let V be a finite (nonempty) index set and let X = (X(v), v ∈ V ) be a discrete random vector,
taking values in the finite state space E ⊂ R^{|V|} with probability 1,
where E is identified with the set E = {1, . . . , ℓ} of the first ℓ = |E| natural numbers.
Furthermore, we assume π_i > 0 for all i ∈ E where π = (π_i, i ∈ E) denotes the probability
function of the random vector X.
Our goal is to estimate the expectation θ = E φ(X) of a given function φ : E → R via MCMC simulation,
where

    θ = φᵀ π    (69)

and φ = (φ(1), . . . , φ(ℓ))ᵀ. As an estimator for θ we consider the sample mean

    θ̂_n = (1/n) Σ_{k=0}^{n−1} φ(X_k) ,    n ≥ 1 ,    (70)

where X₀, X₁, . . . is a Markov chain with state space E, arbitrary but fixed initial distribution α and
an irreducible and aperiodic transition matrix P = (p_ij), such that π is the ergodic limit distribution with respect to P.
Remarks
Typically, the initial distribution α does not coincide with the simulated distribution π.
Consequently, the MCMC estimator θ̂_n defined by (70) is not unbiased for fixed (finite) sample
size,
i.e. in general E θ̂_n ≠ θ for n ≥ 1.
For determining the bias E θ̂_n − θ the following representation formula will be helpful.
Theorem 3.17
For all n ≥ 1,

    E θ̂_n = (1/n) αᵀ Σ_{k=0}^{n−1} Pᵏ φ .    (71)
Proof
In Theorem 2.3 we proved that for all k ≥ 1 the distribution α_k of X_k is given by α_kᵀ = αᵀ Pᵏ.
Thus, by definition (70) of the MCMC estimator θ̂_n, we get that

    E θ̂_n = (1/n) Σ_{k=0}^{n−1} E φ(X_k) = (1/n) Σ_{k=0}^{n−1} α_kᵀ φ = (1/n) Σ_{k=0}^{n−1} αᵀ Pᵏ φ = (1/n) αᵀ Σ_{k=0}^{n−1} Pᵏ φ .
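Formula (71) can be evaluated numerically for a small chain, which illustrates that the bias vanishes as n grows. The two-state chain, the function φ and the initial distribution α are illustrative assumptions of mine:

```python
def mat_vec(P, v):
    """Multiply a matrix (given as a list of rows) by a column vector."""
    return [sum(P[i][j] * v[j] for j in range(len(v))) for i in range(len(P))]

# Illustrative two-state chain with ergodic limit distribution pi = (2/3, 1/3).
P = [[0.9, 0.1],
     [0.2, 0.8]]
pi = [2 / 3, 1 / 3]
alpha = [1.0, 0.0]       # initial distribution concentrated in the first state
phi = [0.0, 1.0]         # indicator of the second state, hence theta = pi_2

theta = sum(p * f for p, f in zip(pi, phi))      # theta = phi^T pi, cf. (69)

def expected_estimator(n):
    """E theta_hat_n = (1/n) alpha^T (sum_{k=0}^{n-1} P^k phi), cf. (71)."""
    v = phi[:]                        # v holds P^k phi, starting with k = 0
    acc = [0.0] * len(phi)
    for _ in range(n):
        acc = [a + x for a, x in zip(acc, v)]
        v = mat_vec(P, v)
    return sum(a * s for a, s in zip(alpha, acc)) / n
```

For this chain the bias E θ̂_n − θ is roughly of size 1/n: about 10⁻² for n = 100 and below 10⁻³ for n = 2000, in line with Theorem 3.18 below being an O(n⁻¹) statement.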
Remarks
As an immediate consequence of Theorem 3.17, the ergodicity of the transition matrix P, and (69),
one obtains

    lim_{n→∞} E θ̂_n = θ ,

i.e., the MCMC estimator θ̂_n is asymptotically unbiased.
Apart from this, the asymptotic behavior of n (E θ̂_n − θ) for n → ∞ can be determined. For this purpose we need
the following two lemmata.
Lemma 3.2
Let Π be the ℓ × ℓ matrix consisting of ℓ identical row vectors πᵀ. Then, for all n ≥ 1,

    (P − Π)ⁿ = Pⁿ − Π    (72)

and

    lim_{n→∞} (P − Π)ⁿ = 0 .    (73)

Proof
Evidently, (72) holds for n = 1.
If we assume that (72) holds for some n − 1 ≥ 1, then

    (P − Π)ⁿ = (P − Π)ⁿ⁻¹ (P − Π) = (Pⁿ⁻¹ − Π)(P − Π) = Pⁿ − Pⁿ⁻¹ Π − Π P + Π² = Pⁿ − Π ,

since Pⁿ⁻¹ Π = Π P = Π = Π². The zero convergence (73) then follows from (72), as Pⁿ → Π by the
ergodicity of P.
Remarks
By the zero convergence (P − Π)ⁿ → 0 for n → ∞ in Lemma 3.2 and by Lemma 2.4, the matrix I − (P − Π)
is invertible.
In order to show this it suffices to consider the matrix A = P − Π in Lemma 2.4.
The inverse matrix

    Z = ( I − (P − Π) )⁻¹    (74)

is called the fundamental matrix of P.
Lemma 3.3   The fundamental matrix Z = (I − (P − Π))⁻¹ of the irreducible and aperiodic transition matrix
P has the representation formulae

    Z = I + Σ_{k=1}^∞ (Pᵏ − Π)    (75)

and

    Z = I + lim_{n→∞} Σ_{k=1}^{n−1} ((n−k)/n) (Pᵏ − Π) .    (76)
Proof
Formula (75) follows from Lemmas 2.4 and 3.2, as for A = P − Π

    Z = (I − A)⁻¹
      = (I − A)⁻¹ lim_{n→∞} (I − Aⁿ)    (by (73))
      = lim_{n→∞} (I − A)⁻¹ (I − Aⁿ)
      = lim_{n→∞} ( I + A + . . . + Aⁿ⁻¹ )    (by (2.79))
      = I + Σ_{k=1}^∞ Aᵏ = I + Σ_{k=1}^∞ (Pᵏ − Π) .    (by (72))

In order to show (76) it suffices to verify that

    lim_{n→∞} ( Σ_{k=1}^{n−1} (Pᵏ − Π) − Σ_{k=1}^{n−1} ((n−k)/n)(Pᵏ − Π) ) = lim_{n→∞} (1/n) Σ_{k=1}^{n−1} k (P − Π)ᵏ = 0 .

Using the identity Σ_{k=1}^n k Aᵏ = Z ( Σ_{k=1}^n Aᵏ − n Aⁿ⁺¹ ) for A = P − Π we obtain

    (1/n) Σ_{k=1}^n k (P − Π)ᵏ = Z ( (1/n) Σ_{k=1}^n (P − Π)ᵏ ) − Z (P − Π)ⁿ⁺¹ → 0 ,

since Σ_{k=1}^∞ (P − Π)ᵏ = Z − I converges by (75), so that (1/n) Σ_{k=1}^n (P − Π)ᵏ → 0, and since
(P − Π)ⁿ⁺¹ → 0 by (73).
Theorem 3.17 and Lemma 3.3 enable us to give a more detailed description of the asymptotic behavior of the
bias E θ̂_n − θ.

Theorem 3.18
Let a = αᵀ Z φ − θ, where Z denotes the fundamental matrix of P that was introduced in (74).
Then, for all n ≥ 1,

    n ( E θ̂_n − θ ) = a + e_n ,    (77)

where {e_n} is a sequence of real numbers such that e_n → 0.

Proof
Because of (75) and αᵀ Π φ = πᵀ φ = θ we obtain

    a = αᵀ Z φ − θ
      = αᵀ φ + lim_{n→∞} Σ_{k=1}^{n−1} αᵀ (Pᵏ − Π) φ − θ
      = αᵀ φ + lim_{n→∞} ( Σ_{k=0}^{n−1} αᵀ Pᵏ φ − αᵀ φ − (n − 1) θ ) − θ
      = lim_{n→∞} ( n E θ̂_n − n θ ) ,

where the last equality uses the representation formula (71). Hence, (77) holds for the sequence {e_n}
given by e_n = n ( E θ̂_n − θ ) − a, which converges to 0.

3.4.3
For the statistical model introduced in Section 3.4.2 we now investigate the asymptotic behavior of the variance
Var θ̂_n if n → ∞.

Theorem 3.19   Define σ² = Σ_{i=1}^ℓ φ²(i) π_i − θ² + 2 φᵀ diag(π) (Z − I) φ, where Z is the fundamental matrix
defined by (74). Then, for any initial distribution α,

    lim_{n→∞} n Var θ̂_n = σ² .    (78)
Proof
Clearly,

    n² Var θ̂_n = E ( Σ_{k=0}^{n−1} φ(X_k) )² − ( Σ_{k=0}^{n−1} E φ(X_k) )²    (79)

and thus

    n² Var θ̂_n = Σ_{k=0}^{n−1} E φ²(X_k) + 2 Σ_{0≤k<k′≤n−1} E ( φ(X_k) φ(X_{k′}) ) − ( Σ_{k=0}^{n−1} E φ(X_k) )² .
This representation will now be used to show (78) for the case α = π.
In this case we observe

    ( Σ_{k=0}^{n−1} E φ(X_k) )² = (nθ)²    and    Σ_{k=0}^{n−1} E φ²(X_k) = n Σ_{i=1}^ℓ φ²(i) π_i .

Furthermore,

    Σ_{0≤k<k′≤n−1} E ( φ(X_k) φ(X_{k′}) ) = Σ_{k=1}^{n−1} (n − k) E ( φ(X₀) φ(X_k) ) ,

where

    E ( φ(X₀) φ(X_k) ) = Σ_{i=1}^ℓ Σ_{j=1}^ℓ φ(i) π_i p_ij^{(k)} φ(j) = φᵀ diag(π) Pᵏ φ

and Pᵏ = P(k) = (p_ij^{(k)}) denotes the matrix of the k-step transition probabilities.
A combination of the results above yields

    n Var θ̂_n = Σ_{i=1}^ℓ φ²(i) π_i + 2 φᵀ diag(π) Σ_{k=1}^{n−1} ((n−k)/n) Pᵏ φ − n θ²
      = Σ_{i=1}^ℓ φ²(i) π_i − θ² + 2 φᵀ diag(π) Σ_{k=1}^{n−1} ((n−k)/n) (Pᵏ − Π) φ ,

since φᵀ diag(π) Π φ = θ² and 2 Σ_{k=1}^{n−1} (n−k)/n = n − 1. By the representation formula (76) for the
fundamental matrix Z, the last expression converges to σ² for n → ∞, which proves (78) in the case
α = π.
For a general initial distribution α we will use a more precise notation: we write X₀^{(α)}, X₁^{(α)}, . . .
instead of X₀, X₁, . . . and θ̂_n^{(α)} instead of θ̂_n.
It suffices to show that

    lim_{n→∞} n ( Var θ̂_n^{(α)} − Var θ̂_n^{(π)} ) = 0 .    (80)

For this purpose we introduce the following notation: for 0 < r < n − 1 let

    Y_r^{(α)} = Σ_{k=0}^{r−1} φ(X_k^{(α)})    and    Z_{rn}^{(α)} = Σ_{k=r}^{n−1} φ(X_k^{(α)}) .

Then, by (79),

    n² Var θ̂_n^{(α)} = E ( Y_r^{(α)} + Z_{rn}^{(α)} )² − ( E ( Y_r^{(α)} + Z_{rn}^{(α)} ) )²
      = E ( Y_r^{(α)} )² − ( E Y_r^{(α)} )²
        + 2 ( E ( Y_r^{(α)} Z_{rn}^{(α)} ) − E Y_r^{(α)} E Z_{rn}^{(α)} )
        + E ( Z_{rn}^{(α)} )² − ( E Z_{rn}^{(α)} )² ,

where we denote the three summands in the last expression by I_r, II_{rn} and III_{rn}, respectively.
The first summand I_r does not depend on n, so (1/n) I_r → 0 as n → ∞ for each fixed r; by the
Cauchy–Schwarz inequality also (1/n) II_{rn} → 0 as n → ∞ for each fixed r.
Moreover, conditioning on the state of the chain at time r and using the Markov property, one obtains

    (1/n) | III_{rn}^{(α)} − III_{rn}^{(π)} |
      ≤ sup_{n>0} max_{j∈{1,...,ℓ}} (1/(n+r)) ( E ( Z_{0n}^{(δ_j)} )² − ( E Z_{0n}^{(δ_j)} )² ) · Σ_{i=1}^ℓ | α_{r,i} − π_i | ,

where δ_j denotes the distribution concentrated in state j and the supremum is finite.
Due to the ergodicity of the Markov chain X₀^{(α)}, X₁^{(α)}, . . ., the last sum becomes arbitrarily
small for sufficiently large r. This completes the proof of (80).
Remarks
Note that for the mean squared error E (θ̂_n − θ)² of the MCMC estimator θ̂_n = θ̂(X₁, . . . , Xₙ) defined
in (70) it holds that

    E (θ̂_n − θ)² = ( E θ̂_n − θ )² + Var θ̂_n ,    (81)

i.e., the mean squared error of the MCMC estimator θ̂_n is equal to the sum of the squared bias
(E θ̂_n − θ)² and the variance Var θ̂_n of the estimator θ̂_n.
Both summands on the right hand side of (81) converge to 0 if n → ∞, but with different rates of
convergence.
In Theorem 3.19 we showed that Var θ̂_n = O(n⁻¹).
On the other hand, by Theorem 3.18 we get that (E θ̂_n − θ)² = O(n⁻²).
Consequently, the asymptotic behavior of the mean squared error E (θ̂_n − θ)² of θ̂_n is crucially
influenced by the asymptotic variance of the estimator, whereas the bias plays a minor role.
In other words: it can make sense to choose the simulation matrix P such that
the asymptotic variance lim_{n→∞} n Var θ̂_n is as small as possible.
Theorem 3.20
Let P₁ = (p_{1,ij}) and P₂ = (p_{2,ij}) be two transition matrices on E such that (P₁, π) and (P₂, π) are
reversible.
For arbitrary i, j ∈ E such that i ≠ j let p_{1,ij} ≥ p_{2,ij},
i.e., outside the diagonal all entries of the transition matrix P₁ are greater than or equal to the corresponding entries of the transition matrix P₂.
Then, for any function φ : E → R,

    V(φ, P₁, π) ≤ V(φ, P₂, π) ,    (82)

where V(φ, P, π) = lim_{n→∞} n Var θ̂_n = σ² denotes the asymptotic variance considered in Theorem 3.19.
Proof
Let P = (p_ij) be a transition matrix such that the pair (P, π) is reversible. It suffices to show that

    ∂V(φ, P, π)/∂p_ij ≤ 0 ,    i, j ∈ E with i ≠ j,    (83)

where the off-diagonal entry p_ij is increased under the constraints that the rows of P still sum to 1
and that the pair (P, π) stays reversible (so that p_ii, p_jj decrease and p_ji increases accordingly).
By Theorem 3.19,

    ∂V(φ, P, π)/∂p_ij = 2 φᵀ diag(π) (∂Z/∂p_ij) φ ,    (84)

and differentiating the identity Z Z⁻¹ = I gives

    ∂Z/∂p_ij = −Z (∂Z⁻¹/∂p_ij) Z = Z (∂P/∂p_ij) Z ,

so that

    ∂V(φ, P, π)/∂p_ij = 2 φᵀ diag(π) Z (∂P/∂p_ij) Z φ .    (85)

As the pair (P, π) is reversible, by the representation formula (75) for the fundamental matrix Z = (z_ij)
that was derived in Lemma 3.3 we obtain for arbitrary i, j ∈ E

    π_i z_ij = π_i δ_ij + Σ_{k=1}^∞ ( π_i p_ij^{(k)} − π_i π_j ) = π_j δ_ji + Σ_{k=1}^∞ ( π_j p_ji^{(k)} − π_j π_i ) = π_j z_ji ,

i.e. diag(π) Z = Zᵀ diag(π). This implies

    φᵀ diag(π) Z = ( Σ_{i=1}^ℓ φ_i π_i z_{i1} , . . . , Σ_{i=1}^ℓ φ_i π_i z_{iℓ} )
      = ( π_1 Σ_{i=1}^ℓ z_{1i} φ_i , . . . , π_ℓ Σ_{i=1}^ℓ z_{ℓi} φ_i ) = (Zφ)ᵀ diag(π) .

Thus, by (85),

    ∂V(φ, P, π)/∂p_ij = 2 (Zφ)ᵀ diag(π) (∂P/∂p_ij) Z φ .    (86)

The constrained perturbation described above corresponds to

    ( diag(π) (∂P/∂p_ij) )_{i′j′} = π_i    if (i′, j′) = (i, j) or (i′, j′) = (j, i),
                                  = −π_i   if (i′, j′) = (i, i) or (i′, j′) = (j, j),
                                  = 0 ,    else.

This implies that the matrix −diag(π)(∂P/∂p_ij) is non-negative definite, since for all x ∈ R^ℓ

    xᵀ diag(π) (∂P/∂p_ij) x = −π_i (x_i − x_j)² ≤ 0 .

By (86) this yields for arbitrary i, j ∈ E such that i ≠ j

    ∂V(φ, P, π)/∂p_ij = 2 (Zφ)ᵀ diag(π) (∂P/∂p_ij) (Zφ) ≤ 0 .

This completes the proof of (83).
Remarks
As an immediate consequence of Theorem 3.20 we get that
the simulation matrix P of the Metropolis algorithm (i.e. if we consider equality in (50)) minimizes
the asymptotic variance V(φ, P, π)
within the class of all Metropolis–Hastings algorithms having an arbitrary but fixed potential transition
matrix Q = (q_ij).
3.5

3.5.1   Coupling to the Future
First of all we consider a method for coupling the paths of Markov chains where the time
is running forward, i.e. in a way that is perceived as natural.
Therefore, one also refers to this method as coupling to the future.
For all i ∈ {1, . . . , ℓ} let X^{(i)} = (X₀^{(i)}, X₁^{(i)}, . . .) be a homogeneous Markov chain with finite state space
E = {x₁, . . . , x_ℓ} and initial state X₀^{(i)} = x_i
such that π = (π_x, x ∈ E) is the ergodic limit distribution of the Markov chain X^{(i)}.

Definitions
Let φ : E × (0, 1] → E be an update function and let U^{(1)}, . . . , U^{(ℓ)} be sequences of independent,
on (0, 1] uniformly distributed innovations. For all k ∈ {1, . . . , ℓ} we consider the recursion

    X_n^{(i)} = φ( x_k , U_n^{(k)} )    if X_{n−1}^{(i)} = x_k .    (87)

The random variable τ = min{ n ≥ 1 : X_n^{(1)} = . . . = X_n^{(ℓ)} } is called coupling time, where we define
τ = ∞ if there is no natural number n such that X_n^{(1)} = . . . = X_n^{(ℓ)}.
Theorem 3.21   If the sequences of innovations U^{(1)}, . . . , U^{(ℓ)} are independent, then τ < ∞ with probability 1
and X_n^{(1)} = . . . = X_n^{(ℓ)} for all n > τ.

Proof
The recursive definition (87) of the Markov chains X^{(1)}, . . . , X^{(ℓ)} immediately implies X_n^{(1)} = . . . = X_n^{(ℓ)}
for all n > τ.
It is left to show that P(τ < ∞) = 1. We notice that it suffices to show that for arbitrary i ≠ i′

    lim_{r→∞} P( max{ n : X_n^{(i)} ≠ X_n^{(i′)} } ≤ r ) = 1 .

As

    P( max{ n : X_n^{(i)} ≠ X_n^{(i′)} } ≤ r ) = 1 − P( max{ n : X_n^{(i)} ≠ X_n^{(i′)} } > r )
      ≥ 1 − P( X_r^{(i)} ≠ X_r^{(i′)} ) ,

this follows from

    lim_{r→∞} P( X_r^{(i)} ≠ X_r^{(i′)} ) = 0 .
As the transition matrix is irreducible and aperiodic, there is an n₀ ≥ 1 such that

    c = min_{x,x′∈E} p_{xx′}^{(n₀)} > 0 ,

and we consider the decomposition r = m(r) n₀ + k for some m(r) ∈ {0, 1, . . .} and
k ∈ {0, 1, . . . , n₀ − 1}.
The independence of the innovation sequences U^{(1)}, . . . , U^{(ℓ)} yields for r → ∞

    P( X_r^{(i)} ≠ X_r^{(i′)} )
      = P( X_{n₀}^{(i)} ≠ X_{n₀}^{(i′)} , X_r^{(i)} ≠ X_r^{(i′)} )
      = Σ_{j=1}^ℓ Σ_{j′≠j} P( X_{n₀}^{(i)} = x_j , X_{n₀}^{(i′)} = x_{j′} ) P( X_r^{(i)} ≠ X_r^{(i′)} | X_{n₀}^{(i)} = x_j , X_{n₀}^{(i′)} = x_{j′} )
      = Σ_{j=1}^ℓ Σ_{j′≠j} P( X_{n₀}^{(i)} = x_j ) P( X_{n₀}^{(i′)} = x_{j′} ) P( X_{r−n₀}^{(j)} ≠ X_{r−n₀}^{(j′)} )
      ≤ Σ_{j=1}^ℓ p_{x_i x_j}^{(n₀)} ( Σ_{j′≠j} p_{x_{i′} x_{j′}}^{(n₀)} ) max_{j≠j′} P( X_{r−n₀}^{(j)} ≠ X_{r−n₀}^{(j′)} )
      ≤ (1 − c) max_{j≠j′} P( X_{r−n₀}^{(j)} ≠ X_{r−n₀}^{(j′)} )
      ≤ . . . ≤ (1 − c)^{m(r)} → 0 ,

where Σ_{j′≠j} p_{x_{i′} x_{j′}}^{(n₀)} = 1 − p_{x_{i′} x_j}^{(n₀)} ≤ 1 − c has been used.
Remarks
Under additional assumptions about the irreducible and aperiodic transition matrix P = (p_{xx′}) it can
be shown that the coupling time τ is finite even if a single innovation sequence U^{(1)} = . . . = U^{(ℓ)} = U
is used, where the update function φ : E × (0, 1] → E is given by

    φ(x, u) = x_j    if    Σ_{r=1}^{j−1} p_{xx_r} < u ≤ Σ_{r=1}^j p_{xx_r} .    (88)

Such an additional condition imposed on P will be discussed in the following theorem; see also the
monotonicity condition in Section 3.5.3.
Theorem 3.22
Let U^{(1)} = . . . = U^{(ℓ)} = U and let the update function φ : E × (0, 1] → E be given by (88). Furthermore,
for some x_{i₀} ∈ E, let

    max_{x∈E} Σ_{r=1}^{i₀−1} p_{xx_r} < min_{x∈E} Σ_{r=1}^{i₀} p_{xx_r} .    (89)

Then τ < ∞ with probability 1 and X_n^{(1)} = . . . = X_n^{(ℓ)} for all n > τ.
Proof
Similarly to the proof of Theorem 3.21 it suffices to show that for arbitrary i ≠ i′

    lim_{r→∞} P( X_r^{(i)} ≠ X_r^{(i′)} ) = 0 .

Observe that

    P( X_r^{(i)} ≠ X_r^{(i′)} )
      = P( X₁^{(i)} ≠ X₁^{(i′)} , X_r^{(i)} ≠ X_r^{(i′)} )
      = Σ_{j=1}^ℓ Σ_{j′≠j} P( X₁^{(i)} = x_j , X₁^{(i′)} = x_{j′} ) P( X_r^{(i)} ≠ X_r^{(i′)} | X₁^{(i)} = x_j , X₁^{(i′)} = x_{j′} )
      = Σ_{j=1}^ℓ Σ_{j′≠j} P( X₁^{(i)} = x_j , X₁^{(i′)} = x_{j′} ) P( X_{r−1}^{(j)} ≠ X_{r−1}^{(j′)} )
      ≤ ( 1 − P( X₁^{(i)} = X₁^{(i′)} ) ) max_{j≠j′} P( X_{r−1}^{(j)} ≠ X_{r−1}^{(j′)} )
      ≤ . . . ≤ (1 − d)^r → 0 ,

where d = P( X₁^{(i)} = X₁^{(i′)} ) > 0: by (88) and (89) the innovation U₁ falls with positive probability
into the interval ( max_{x∈E} Σ_{r=1}^{i₀−1} p_{xx_r} , min_{x∈E} Σ_{r=1}^{i₀} p_{xx_r} ], in which case φ(x, U₁) = x_{i₀} for every
x ∈ E, so that

    d ≥ P( X₁^{(i)} = X₁^{(i′)} = x_{i₀} ) ≥ min_{x∈E} Σ_{r=1}^{i₀} p_{xx_r} − max_{x∈E} Σ_{r=1}^{i₀−1} p_{xx_r} > 0 .
Remarks
In general the distribution of the common state at the coupling time τ differs from π. Consider for
example the transition matrix

    P = ( 0.5   0.5
          1     0   ) ,

whose stationary limit distribution is π = (2/3, 1/3)ᵀ. Here the two coupled chains X^{(1)} and X^{(2)}
can only meet in state x₁, since from state x₂ the chain always jumps to x₁; thus X_τ^{(1)} = X_τ^{(2)} = x₁
with probability 1, although π_{x₁} = 2/3 < 1.
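A small simulation of coupling to the future for the two-state chain P = ((0.5, 0.5), (1, 0)) considered in the remark, with one innovation per state as in (87); it confirms that the chains always meet in the first state. The function names are mine:

```python
import random

P = [[0.5, 0.5],
     [1.0, 0.0]]

def step(state, u):
    """Update function (88): move to the first state j whose cumulative
    transition probability reaches the innovation u."""
    acc = 0.0
    for j, p in enumerate(P[state]):
        acc += p
        if u <= acc:
            return j
    return len(P) - 1

def couple_to_future():
    """Run one chain per initial state; chains occupying the same state
    share the innovation for that state, cf. (87), so once they meet
    they stay together. Returns the common state at the coupling time."""
    chains = [0, 1]
    while len(set(chains)) > 1:
        u = [random.random() for _ in P]    # one innovation per state
        chains = [step(x, u[x]) for x in chains]
    return chains[0]
```

Every run returns state 0 (= x₁), in line with the remark that the coupling state is not distributed according to π.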
3.5.2

Recall that
the procedure of coupling to the future discussed in Section 3.5.1 starts at a deterministic time 0,
whereas the final state, i.e. the coupling time τ of the simulation, is random.
Moreover, the state distribution of the Markov chain X^{(i)} at the coupling time τ is in general not equal
to the stationary limit distribution π.
For the precise mathematical modelling of this procedure we need the following notation.
For each potential initial time −m, m ∈ {1, 2, . . .}, and for all i ∈ {1, . . . , ℓ} let

    X^{(−m,i)} = ( X_{−m}^{(−m,i)} , X_{−m+1}^{(−m,i)} , . . . )

be a homogeneous Markov chain with finite state space E = {x₁, . . . , x_ℓ}, initial state X_{−m}^{(−m,i)} = x_i,
and the update rule

    X_n^{(−m,i)} = φ( x_k , U_n^{(k)} )    if X_{n−1}^{(−m,i)} = x_k .    (90)

Definition   The random variable τ = min{ m ≥ 1 : X₀^{(−m,1)} = . . . = X₀^{(−m,ℓ)} } is called CFTP coupling time,
where we define τ = ∞ if there is no integer m such that X₀^{(−m,1)} = . . . = X₀^{(−m,ℓ)}.
Theorem 3.23
Let τ < ∞ with probability 1. Then X₀^{(−m,1)} = . . . = X₀^{(−m,ℓ)} and X₀^{(−m,i)} = X₀^{(−τ,j)} for arbitrary
m ≥ τ and i, j ∈ {1, . . . , ℓ}, and the common state X₀^{(−τ,1)} at time 0 is distributed according to π.
Proof
Directly by the recursive definition (90) of the Markov chains X^{(−m,1)}, . . . , X^{(−m,ℓ)}, we get that X₀^{(−m,1)} =
. . . = X₀^{(−m,ℓ)} and X₀^{(−m,i)} = X₀^{(−τ,j)} for arbitrary m ≥ τ and i, j ∈ {1, . . . , ℓ}.
Hence, for every x_k ∈ E,

    P( X₀^{(−τ,i)} = x_k ) = lim_{m→∞} P( X₀^{(−τ,i)} = x_k , τ ≤ m )
      = lim_{m→∞} P( X₀^{(−m,i)} = x_k , τ ≤ m )
      = lim_{m→∞} P( X₀^{(−m,i)} = x_k ) − lim_{m→∞} P( X₀^{(−m,i)} = x_k , τ > m )
      = lim_{m→∞} P( X₀^{(−m,i)} = x_k )
      = lim_{m→∞} P( X_m^{(0,i)} = x_k ) = π_{x_k} ,

where lim_{m→∞} P( X₀^{(−m,i)} = x_k , τ > m ) = 0 as P(τ < ∞) = 1, and where the last but one equality
is a consequence of the homogeneity of the Markov chain X^{(−m,i)}.
Remarks
If the number ` of elements in the state space E = {x1 , . . . , x` } is large,
the MCMC simulation of based on the CFTP algorithm by Propp and Wilson can be computationally inefficient
as for every initial state x1 , . . . , x` a complete path needs to be generated.
However, in some cases the computational complexity can be reduced. Examples will be discussed in
Sections 3.5.3 and 3.5.4.
In these special situations the state space E = {x₁, . . . , x_ℓ} and the update function φ : E × (0, 1] → E
possess certain monotonicity properties.
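The Propp–Wilson algorithm described above can be sketched as follows; the doubling of the look-back time m and the example chain in the usage note are illustrative choices of mine. Crucially, the innovations U₋ₘ₊₁, . . . , U₀ are reused when m is increased, as the construction (90) requires:

```python
import random

def propp_wilson(P):
    """Coupling from the past: run one chain per initial state from time -m
    with a single shared innovation sequence until all chains coincide at
    time 0; the returned state is an exact sample from pi (Theorem 3.23)."""
    l = len(P)

    def step(state, u):
        # update function (88): inverse-CDF step
        acc = 0.0
        for j in range(l):
            acc += P[state][j]
            if u <= acc:
                return j
        return l - 1

    innovations = []        # innovations[t-1] drives the step -t -> -t+1
    m = 1
    while True:
        while len(innovations) < m:
            innovations.append(random.random())   # extend into the past
        states = list(range(l))                   # all initial states at -m
        for t in range(m, 0, -1):
            u = innovations[t - 1]
            states = [step(s, u) for s in states]
        if len(set(states)) == 1:
            return states[0]
        m *= 2
```

For example, for P = ((0.5, 0.5), (0.2, 0.8)) the stationary distribution is π = (2/7, 5/7), and the empirical distribution of repeated CFTP samples matches it.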
3.5.3

We additionally assume that the state space E = {x₁, . . . , x_ℓ} is partially ordered and has a maximal
element 1 ∈ E and a minimal element 0 ∈ E, i.e., there is a relation ⪯ on E such that
(a) x ⪯ x ,    x ∈ E ,
(b) x ⪯ y and y ⪯ z ⟹ x ⪯ z ,    x, y, z ∈ E ,
(c) x ⪯ y and y ⪯ x ⟹ x = y ,    x, y ∈ E ,
(d) 0 ⪯ x ⪯ 1 ,    x ∈ E .
Moreover, we assume that the update function φ : E × (0, 1] → E is monotonously nondecreasing with
respect to ⪯, i.e., for arbitrary x, y ∈ E such that x ⪯ y

    φ(x, u) ⪯ φ(y, u) ,    u ∈ (0, 1] .    (91)
Furthermore, a single innovation sequence U = (U_n) is now used for all chains, i.e., for each initial
time −m

    X_n^{(−m,i)} = φ( X_{n−1}^{(−m,i)} , U_n ) ,    n = −m + 1, −m + 2, . . . .    (92)

Remarks
If x_i ⪯ x_j, then by (91) and (92) we get that for all n ≥ −m

    X_n^{(−m,i)} ⪯ X_n^{(−m,j)} .    (93)

In particular, consider the two chains

    X^{(−m,min)} = ( X_{−m}^{(−m,min)} , X_{−m+1}^{(−m,min)} , . . . )    and    X^{(−m,max)} = ( X_{−m}^{(−m,max)} , X_{−m+1}^{(−m,max)} , . . . )

with X_{−m}^{(−m,min)} = 0 and X_{−m}^{(−m,max)} = 1. Then (93) implies

    X_n^{(−m,min)} ⪯ X_n^{(−m,i)} ⪯ X_n^{(−m,max)} ,    n ≥ −m , i ∈ {1, . . . , ℓ} .    (94)

Due to (94) it suffices to choose an initial time that lies far enough in the past
such that the paths of X^{(−m,min)} and X^{(−m,max)} will have merged by time 0,
i.e., we consider the CFTP coupling time

    τ = min{ m ≥ 1 : X₀^{(−m,min)} = X₀^{(−m,max)} } .    (95)
Theorem 3.24
Let the update function φ : E × (0, 1] → E satisfy the monotonicity condition (91).
Then, for the CFTP coupling time τ defined by (95), it holds that τ < ∞ with probability 1.

Proof
As the argument showing that X₀^{(−m,i)} = X₀^{(−τ,j)} for arbitrary m ≥ τ and i, j ∈ {1, . . . , ℓ} if
P(τ < ∞) = 1 is similar to the proof of Theorem 3.23, this part of the proof is omitted.
We merely show that P(τ < ∞) = 1.
First of all, we observe that for all r ≥ 1

    { τ > r } ⊂ { X_{−r+1}^{(−r,min)} ≠ 1 , . . . , X₀^{(−r,min)} ≠ 1 } ,    (96)

as (94) implies

    { τ > r } = { X₀^{(−r,min)} ≠ X₀^{(−r,max)} }
      = { X_{−r+1}^{(−r,min)} ≠ X_{−r+1}^{(−r,max)} , . . . , X₀^{(−r,min)} ≠ X₀^{(−r,max)} }
      ⊂ { X_{−r+1}^{(−r,min)} ≠ 1 , . . . , X₀^{(−r,min)} ≠ 1 } .
Now let

    c = min_{x,x′∈E} p_{xx′}^{(n₀)} > 0    for some n₀ ≥ 1    (97)

and decompose r such that r = m(r) n₀ + k for some m(r) ∈ {0, 1, . . .} and k ∈ {0, 1, . . . , n₀ − 1}.
By (96) and (97) we obtain

    P(τ = ∞) = lim_{r→∞} P(τ > r)
      ≤ lim_{r→∞} P( X_{−r+1}^{(−r,min)} ≠ 1 , . . . , X₀^{(−r,min)} ≠ 1 )
      ≤ lim_{r→∞} Σ_{x₁,...,x_{m(r)} ≠ 1} p_{0 x₁}^{(n₀)} p_{x₁ x₂}^{(n₀)} . . . p_{x_{m(r)−1} x_{m(r)}}^{(n₀)}
      ≤ lim_{r→∞} (1 − c)^{m(r)} = 0 .
Remarks
Sometimes the update function φ : E × (0, 1] → E is not monotonously nondecreasing but nonincreasing
with respect to the partial order ⪯, i.e., for arbitrary x, y ∈ E such that x ⪯ y we have

    φ(x, u) ⪰ φ(y, u) ,    u ∈ (0, 1] .    (98)

In this case one can consider the twofold update function φ′ : E × (0, 1]² → E given by

    φ′(x; u₁, u₂) = φ( φ(x, u₁) , u₂ ) ,    x ∈ E; u₁, u₂ ∈ (0, 1] .    (99)

This function has the desired property, as by (98) and (99) we obtain for arbitrary x, y ∈ E such
that x ⪯ y

    φ′(x; u₁, u₂) = φ( φ(x, u₁), u₂ ) ⪯ φ( φ(y, u₁), u₂ ) = φ′(y; u₁, u₂) ,    u₁, u₂ ∈ (0, 1] .

Consequently, the modified CFTP coupling time

    τ′ = min{ m ≥ 1 : X₀^{(−2m,min)} = X₀^{(−2m,max)} }

is finite with probability 1, i.e., τ′ < ∞, and X₀^{(−2τ′,i)} is distributed according to π.
3.5.4

1. Birth-and-Death Processes
The update function φ : E × (0, 1] → E defined in (88) satisfies the monotonicity condition (91)
if the state space can be identified with the set E = {1, . . . , ℓ} equipped with the natural order of
the numbers 1, . . . , ℓ,
and if the simulation matrix P = (p_ij) is monotonously nondecreasing with respect to the order
≤, i.e., for arbitrary i, j ∈ E such that i ≤ j we have

    Σ_{r=k}^ℓ p_ir ≤ Σ_{r=k}^ℓ p_jr ,    k = 1, . . . , ℓ .    (100)
A whole class of transition matrices P = (p_ij) satisfying the monotonicity condition (100) is given by
the tridiagonal matrices of birth-and-death processes, which are of the type

    P = ( 1 − p₁₂   p₁₂             0               . . .   0
          p₂₁       1 − p₂₁ − p₂₃   p₂₃             . . .   0
          0         p₃₂             1 − p₃₂ − p₃₄   . . .   0
          ...       ...             ...                     ...
          0         0               0               . . .   1 − p_{ℓ,ℓ−1} ) ,

where the last row has the entries p_{ℓ,ℓ−1} and 1 − p_{ℓ,ℓ−1} in its final two columns, and where
0 < p_{i,i+1} ≤ 1/2 for all i = 1, . . . , ℓ − 1 and 0 < p_{i,i−1} ≤ 1/2 for all i = 2, . . . , ℓ.
Figure 7: Monotonic coupling to the past for monotonously nondecreasing birth-and-death processes
(the horizontal axis of each panel shows time)
On the other hand, the update function φ : E × (0, 1] → E defined in (88) is monotonously nonincreasing, see (98),
if P = (p_ij) is monotonously nonincreasing with respect to ≤,
i.e., if for arbitrary i, j ∈ E such that i ≤ j we have

    Σ_{r=k}^ℓ p_ir ≥ Σ_{r=k}^ℓ p_jr ,    k = 1, . . . , ℓ .    (101)
It is easy to show that there is no tridiagonal transition matrix P = (p_ij) satisfying condition (101), i.e., birth-and-death processes are never monotonously nonincreasing. However, condition (101) holds, for example, for the following matrix:
    ( 0     …   0     0     0     1   )
    ( 0     …   0     0     1/2   1/2 )
P = ( 0     …   0     1/3   1/3   1/3 )
    ( ⋮         ⋮     ⋮     ⋮     ⋮   )
    ( 1/ℓ   …   1/ℓ   1/ℓ   1/ℓ   1/ℓ )

i.e., in the i-th row the probability mass 1/i is spread uniformly over the i largest states.
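Condition (101) for this matrix can be checked numerically; the sketch below (for the hypothetical choice ℓ = 5) does so, and also tests a tridiagonal matrix, which fails (101) in accordance with the claim above.

```python
# A quick sketch checking condition (101) for the triangular matrix in which
# row i spreads the mass 1/i over the i largest states (shown for l = 5).

l = 5
P = [[0.0] * (l - i) + [1 / i] * i for i in range(1, l + 1)]

def is_monotone_nonincreasing(P):
    """Condition (101): tail sums shrink (weakly) from each row to the next."""
    n = len(P)
    return all(sum(P[i][k:]) >= sum(P[i + 1][k:]) - 1e-12
               for i in range(n - 1) for k in range(n))

assert is_monotone_nonincreasing(P)

# A tridiagonal birth-and-death matrix, by contrast, violates (101):
Q = [[0.5, 0.5, 0.0],
     [0.25, 0.5, 0.25],
     [0.0, 0.5, 0.5]]
assert not is_monotone_nonincreasing(Q)
```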
2. Ising Model

Like for the hard-core model discussed in Section 3.3.1, we consider a connected graph G = (V, K) with finitely many vertices V = {v_1, …, v_|V|} and a certain set K ⊂ V² of edges e = (v_i, v_j), each of them connecting two vertices v_i, v_j. One of the values −1 and 1 is assigned to each vertex, and we consider the state space E = {−1, 1}^|V| of all configurations x = (x(v), v ∈ V), i.e., for each v ∈ V either x(v) = −1 or x(v) = 1. If this is interpreted as an image, x(v) = 1 is regarded as a white pixel and x(v) = −1 as a black pixel.
For each x ∈ E let the probability π_x of the configuration x be given by

π_x = (1 / z_{G,J}) exp( J Σ_{e=(v_i,v_j)∈K} x(v_i) x(v_j) ) ,   (102)

where the normalizing constant z_{G,J} is given by

z_{G,J} = Σ_{x∈E} exp( J Σ_{e=(v_i,v_j)∈K} x(v_i) x(v_j) ) .
The following figure was taken from O. Häggström (2002) Finite Markov Chains and Algorithmic Applications, Cambridge University Press, Cambridge.
Figure 8: Typical configurations of the Ising model for J = 0 (upper left corner), J = 0.15 (upper right corner), J = 0.3 (lower left corner) and J = 0.5 (lower right corner)

It illustrates the role of the parameter J: an increase of J results in a more pronounced clumping tendency of identically colored pixels.
Let the simulation matrix P = (p_xx′) be given by the Gibbs sampler, i.e., assume that (36) holds, namely

p_xx′ = Σ_{v∈V} q_v π( x′(v) | x(−v) ) 1I( x(−v) = x′(−v) ) ,   x, x′ ∈ E ,

where x(−v) = (x(w), w ≠ v) and

π( x′(v) | x(−v) ) = π_{x⁺} / (π_{x⁺} + π_{x⁻})   if x′(v) = 1,
                     π_{x⁻} / (π_{x⁺} + π_{x⁻})   if x′(v) = −1,

using the notation x⁻(v) = −1 and x⁻(w) = x(w) for all w ≠ v, and similarly x⁺(v) = 1 and x⁺(w) = x(w) for all w ≠ v.
By (102) we obtain that

π( x′(v) | x(−v) ) = e^{J (k⁺ − k⁻)} / ( e^{J (k⁺ − k⁻)} + e^{−J (k⁺ − k⁻)} )    if x′(v) = 1,
                     e^{−J (k⁺ − k⁻)} / ( e^{J (k⁺ − k⁻)} + e^{−J (k⁺ − k⁻)} )   if x′(v) = −1,   (103)

where k⁺ = k⁺(x(−v)) and k⁻ = k⁻(x(−v)) denote the numbers of vertices connected to v having the values 1 and −1, respectively.
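The passage from (102) to (103) can be verified numerically on a small graph: the edges not touching v cancel in the ratio π_{x⁺}/π_{x⁻}, and the remaining sum over the neighbours of v equals k⁺ − k⁻. The sketch below (a 4-cycle with J = 0.3, chosen only for illustration) compares the conditional probabilities computed by brute force from (102) with the closed form (103).

```python
# A minimal numerical check (toy 4-cycle graph, assumed for illustration)
# that the closed form (103) agrees with the conditional probabilities of
# the Gibbs measure (102).
from itertools import product
from math import exp

V = [0, 1, 2, 3]
K = [(0, 1), (1, 2), (2, 3), (3, 0)]          # edges of a 4-cycle
J = 0.3

def weight(x):
    """Unnormalised probability exp(J * sum over edges), cf. (102)."""
    return exp(J * sum(x[i] * x[j] for i, j in K))

z = sum(weight(x) for x in product((-1, 1), repeat=len(V)))  # z_{G,J}
assert abs(sum(weight(x) / z for x in product((-1, 1), repeat=len(V))) - 1) < 1e-12

def conditional_bruteforce(x, v):
    """pi(1 | x(-v)) computed directly from (102)."""
    xp = tuple(1 if w == v else x[w] for w in V)   # x with x(v) = +1
    xm = tuple(-1 if w == v else x[w] for w in V)  # x with x(v) = -1
    return weight(xp) / (weight(xp) + weight(xm))

def conditional_closed_form(x, v):
    """pi(1 | x(-v)) via (103), k+ and k- counting +/- neighbours of v."""
    nbrs = [j for i, j in K if i == v] + [i for i, j in K if j == v]
    d = sum(1 for w in nbrs if x[w] == 1) - sum(1 for w in nbrs if x[w] == -1)
    return exp(J * d) / (exp(J * d) + exp(-J * d))

for x in product((-1, 1), repeat=len(V)):
    for v in V:
        assert abs(conditional_bruteforce(x, v) - conditional_closed_form(x, v)) < 1e-12
```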
For the state space E = {−1, 1}^|V| we define the partial order ⪯ by x ⪯ y if x(v) ≤ y(v) for all v ∈ V, such that x_1 ⪯ x ⪯ x_ℓ for all x ∈ E, where we assume the elements of the state space E = {x_1, …, x_ℓ} to be indexed in a way ensuring i ≤ j if x_i ⪯ x_j (this is e.g. the case if E is ordered lexicographically). Then (103) implies for arbitrary x, y ∈ E such that x ⪯ y

π( 1 | x(−v) ) ≤ π( 1 | y(−v) )   and   π( −1 | x(−v) ) ≥ π( −1 | y(−v) ) .   (104)
Let the update function φ : E × (0, 1]² → E be given by φ(x; u_1, u_2) = x′, where x′ = (x′(v), v ∈ V) and for all i = 1, …, |V|

x′(v_i) = 1        if Σ_{j=1}^{i−1} q(v_j) < u_1 ≤ Σ_{j=1}^{i} q(v_j) and u_2 ≤ π( 1 | x(−v_i) ),
          −1       if Σ_{j=1}^{i−1} q(v_j) < u_1 ≤ Σ_{j=1}^{i} q(v_j) and u_2 > π( 1 | x(−v_i) ),
          x(v_i)   else.

By (104), for arbitrary x, y ∈ E such that x ⪯ y we have

φ(x; u_1, u_2) ⪯ φ(y; u_1, u_2) ,   u_1, u_2 ∈ (0, 1] ,

i.e., the update function φ is monotonously nondecreasing.
A problem of the monotone CFTP algorithm discussed in Sections 3.5.3 and 3.5.4 is the necessity to save all innovations U_0, U_{−1}, …, U_{−τ}, where τ denotes the coupling time defined in (95), i.e.,

τ = min{ m ≥ 1 : X_0^{(−m,min)} = X_0^{(−m,max)} } .

Therefore, in the year 2000, David Wilson suggested the following modification of the CFTP algorithm, aiming at a reduction of the necessary memory allocation.
The main idea of the modification is to realize coupling to the past (see Sections 3.5.2 – 3.5.4) based on a sequence of independent and identically distributed blocks of forward simulation, where the (potential) initial times −m with m ∈ {1, 2, …} of the Markov chains X^{(−m,i)} = ( X_{−m}^{(−m,i)}, X_{−m+1}^{(−m,i)}, … ) can be picked at random.

If the innovation sequences U^{(1)}, …, U^{(ℓ)} are chosen identical, then, with probability 1, the coupling times

σ = min{ n ≥ 1 : X_n^{(1)} = … = X_n^{(ℓ)} }   and   τ = min{ m ≥ 1 : X_0^{(−m,1)} = … = X_0^{(−m,ℓ)} }

are finite.
Now we consider blocks of forward simulation of (at first deterministic) length T for some T ≥ 1. For each k ≥ 0 and i = 1, …, ℓ let X^{(kT,i)} = ( X_n^{(kT,i)}, n ≥ kT ) be the Markov chain given by X_{kT}^{(kT,i)} = x_i and

X_n^{(kT,i)} = φ( X_{n−1}^{(kT,i)}, U_n ) ,   n = kT + 1, kT + 2, … .

Furthermore, for each k ≥ 0 we consider the event

C_kT = { X_{(k+1)T}^{(kT,i)} = X_{(k+1)T}^{(kT,j)} for all i ≠ j ∈ {1, …, ℓ} }

that the ℓ chains started at time kT have coalesced by time (k+1)T, where we assume that T is chosen such that

0 < P( C_T ) = P( C_kT ) ,   k ≥ 0 .   (105)
The modified CFTP algorithm then proceeds as follows:

1. Simulate X_n^{(kT,i)} for n = kT + 1, …, (k+1)T and i = 1, …, ℓ.

2. Set m = k and k = k + 1. If the event C_mT has occurred, proceed with step 3; otherwise return to step 1.

3. Repeat steps 1 and 2 until the event C_m′T occurs for some m′ > m, and return the value of X_{m′T}^{(mT,i)} for an arbitrary i ∈ {1, …, ℓ} as a realization of the stationary distribution π.
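The three steps above can be sketched as follows (a hypothetical 3-state chain with an inversion-method update is assumed for illustration; neither is taken from the notes). Within each block all ℓ chains share the same innovations, while different blocks use fresh, independent innovations.

```python
# A sketch of the read-once modification of the CFTP algorithm.
import random

P = [[0.5, 0.0, 0.5],      # an assumed irreducible, aperiodic transition matrix
     [1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0]]
T = 2                      # block length

def phi(x, u):
    """Inversion update: the smallest state k with F_x(k) >= u."""
    cum = 0.0
    for k, p in enumerate(P[x]):
        cum += p
        if u <= cum:
            return k
    return len(P) - 1

def read_once_cftp(rng):
    """Steps 1-3: return X_{m'T}^{(mT,i)} for consecutive coalescent blocks m < m'."""
    l = len(P)
    carried = None                        # value of the chain started at time mT
    while True:
        carried_at_start = carried        # its value at the current block's start
        finals = list(range(l))           # chains started at this block, one per state
        for _ in range(T):
            u = rng.random()              # innovations shared within the block
            finals = [phi(x, u) for x in finals]
            if carried is not None:
                carried = phi(carried, u)
        if len(set(finals)) == 1:         # the event C_{kT} has occurred
            if carried_at_start is not None:
                return carried_at_start   # an exact sample
            carried = finals[0]           # remember the chain started at mT
```

For the matrix above the stationary distribution is π = (1/2, 1/4, 1/4) on the states 0, 1, 2, which the empirical distribution of repeated runs approaches.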
Example

For ℓ = 3 states we consider the irreducible and aperiodic transition matrix

P = ( 1/2   0   1/2 )
    (  1    0    0  )
    (  0    1    0  )

For block length T = 2 and the (0, 1]-uniformly distributed pseudo-random numbers

u = (0.01, 0.60, 0.82, 0.47, 0.36, 0.59, 0.34, 0.89, …)

we obtain the simulation run shown in Fig. 9.
Figure 9: Simulation run of the read-once CFTP algorithm for block length T = 2

If the block length T is chosen at random, namely as the coupling time of an independent preliminary CFTP run with innovation sequence U = (U_0, U_1, …), then by the following elementary but useful properties of the coupling times σ and τ we obtain P(C_T) ≥ 1/2.
Theorem 3.25  The random variables σ and τ have the same distribution, i.e., σ =ᵈ τ. Moreover, if the coupling times σ and τ are independent and almost surely finite, then

P( σ ≤ τ ) ≥ 1/2 .   (106)
Proof

By the homogeneity of the Markov chains X^{(1)}, …, X^{(ℓ)} and X^{(−m,1)}, …, X^{(−m,ℓ)}, for any natural number k ≥ 1 we have

P( σ = k ) = P( min{ n ≥ 1 : X_n^{(1)} = … = X_n^{(ℓ)} } = k )
           = P( min{ m ≥ 1 : X_0^{(−m,1)} = … = X_0^{(−m,ℓ)} } = k )
           = P( τ = k ) .
Let now the coupling times σ and τ be independent and finite with probability 1. This implies

P( σ ≤ τ ) = Σ_{k=1}^{∞} P( σ ≤ τ | τ = k ) P( τ = k )
           = Σ_{k=1}^{∞} P( σ ≤ k | τ = k ) P( τ = k )
           = Σ_{k=1}^{∞} P( σ ≤ k ) P( τ = k )
           = Σ_{k=1}^{∞} P( τ ≤ k ) P( σ = k )
           = P( τ ≤ σ ) ,

where the second-last equality follows from σ =ᵈ τ, which has been shown in the first part of the proof. Thus,

2 P( σ ≤ τ ) = P( σ ≤ τ ) + P( τ ≤ σ )
             = 1 − P( σ > τ ) + 1 − P( σ < τ )
             = 2 − P( σ ≠ τ )
             ≥ 2 − 1 = 1 .
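The inequality (106) can be illustrated numerically: for independent and identically distributed integer-valued coupling times, P(σ ≤ τ) = Σ_k P(σ ≤ k) P(τ = k), and the argument above gives 2 P(σ ≤ τ) = 1 + P(σ = τ) ≥ 1. The sketch below checks this for an assumed geometric-type distribution (the parameter 0.3 and the truncation point are arbitrary illustrative choices).

```python
# A numerical illustration of (106) for iid integer-valued coupling times
# with an assumed geometric distribution (success probability p = 0.3).

p = 0.3
N = 200                                              # truncation point
pmf = [(1 - p) ** (k - 1) * p for k in range(1, N + 1)]   # P(tau = k)
cdf = []
acc = 0.0
for w in pmf:
    acc += w
    cdf.append(acc)                                  # P(tau <= k)

prob_le = sum(cdf[k] * pmf[k] for k in range(N))     # P(sigma <= tau)
prob_eq = sum(w * w for w in pmf)                    # P(sigma = tau)

assert prob_le >= 0.5                                # inequality (106)
assert abs(2 * prob_le - (1 + prob_eq)) < 1e-9       # 2 P(sigma<=tau) = 1 + P(sigma=tau)
```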