Ulm University
Institute of Stochastics

Lecture Notes

Markov Chains and Monte-Carlo Simulation

Prof. Dr. Volker Schmidt
Summer 2010

Ulm, July 2010


Contents

1 Introduction

2 Markov Chains
  2.1 Specification of the Model and Examples
    2.1.1 State Space, Initial Distribution and Transition Probabilities
    2.1.2 Examples
    2.1.3 Recursive Representation
    2.1.4 The Matrix of the n-Step Transition Probabilities
  2.2 Ergodicity and Stationarity
    2.2.1 Basic Definitions and Quasi-positive Transition Matrices
    2.2.2 Estimates for the Rate of Convergence; Perron-Frobenius Theorem
    2.2.3 Irreducible and Aperiodic Markov Chains
    2.2.4 Stationary Initial Distributions
    2.2.5 Direct and Iterative Computation Methods
  2.3 Reversibility; Estimates for the Rate of Convergence
    2.3.1 Definition and Examples
    2.3.2 Recursive Construction of the Past
    2.3.3 Determining the Rate of Convergence under Reversibility
    2.3.4 Multiplicative Reversible Version of the Transition Matrix; Spectral Representation
    2.3.5 Alternative Estimate for the Rate of Convergence; χ²-Contrast
    2.3.6 Dirichlet Forms and Rayleigh Theorem
    2.3.7 Bounds for the Eigenvalues λ₂ and λ_ℓ

3 Monte-Carlo Simulation
  3.1 Generation of Pseudo-Random Numbers
    3.1.1 Simple Applications; Monte-Carlo Estimators
    3.1.2 Linear Congruential Generators
    3.1.3 Statistical Tests
  3.2 Transformation of Uniformly Distributed Random Numbers
    3.2.1 Inversion Method
    3.2.2 Transformation Algorithms for Discrete Distributions
    3.2.3 Acceptance-Rejection Method
    3.2.4 Quotients of Uniformly Distributed Random Variables
  3.3 Simulation Methods Based on Markov Chains
    3.3.1 Example: Hard-Core Model
    3.3.2 Gibbs Sampler
    3.3.3 Metropolis-Hastings Algorithm
  3.4 Error Analysis for MCMC Simulation
    3.4.1 Estimate for the Rate of Convergence
    3.4.2 MCMC Estimators; Bias and Fundamental Matrix
    3.4.3 Asymptotic Variance of Estimation; Mean Squared Error
  3.5 Coupling Algorithms; Perfect MCMC Simulation
    3.5.1 Coupling to the Future; Counterexample
    3.5.2 Propp-Wilson Algorithm; Coupling from the Past
    3.5.3 Monotone Coupling Algorithms
    3.5.4 Examples: Birth-and-Death Processes; Ising Model
    3.5.5 Read-Once Modification of the CFTP Algorithm

1 Introduction
Markov chains
- are a fundamental class of stochastic models for sequences of non-independent random variables, i.e. of random variables possessing a specific dependency structure,
- have numerous applications, e.g. in insurance and finance,
- also play an important role in mathematical modelling and analysis in a variety of other fields such as physics, chemistry, life sciences, and material sciences.

Questions of scientific interest often exhibit a degree of complexity that causes great difficulties if one attempts to find an adequate mathematical model based solely on analytical formulae. In these cases Markov chains can serve as an alternative tool, as they are crucial for the construction of computer algorithms for the Markov Chain Monte Carlo (MCMC) simulation of the mathematical models under consideration.

This course on Markov chains and Monte-Carlo simulation builds on the methods and models introduced in the course "Elementare Wahrscheinlichkeitsrechnung und Statistik". Further knowledge of probability theory and statistics can be useful but is not required.

The main focus of this course will be on the following topics:
- discrete-time Markov chains with finite state space
- stationarity and ergodicity
- Markov Chain Monte Carlo (MCMC)
- reversibility and coupling algorithms

Notions and results introduced in "Elementare Wahrscheinlichkeitsrechnung und Statistik" will be used frequently. References to these lecture notes are labelled by the prefix WR in front of the number specifying the corresponding section, theorem, lemma, etc.

The following list contains only a small selection of introductory texts that can be recommended for in-depth studies of the subject, complementing the lecture notes.
- E. Behrends (2000) Introduction to Markov Chains. Vieweg, Braunschweig
- P. Brémaud (2008) Markov Chains, Gibbs Fields, Monte Carlo Simulation, and Queues. Springer, New York
- B. Chalmond (2003) Modeling and Inverse Problems in Image Analysis. Springer, New York
- D. Gamerman, H. Lopes (2006) Markov Chain Monte Carlo: Stochastic Simulation for Bayesian Inference. Chapman & Hall, London
- O. Häggström (2002) Finite Markov Chains and Algorithmic Applications. Cambridge University Press, Cambridge
- D. Levin, Y. Peres, E. Wilmer (2009) Markov Chains and Mixing Times. American Mathematical Society, Providence
- S. Resnick (1992) Adventures in Stochastic Processes. Birkhäuser, Boston
- C. Robert, G. Casella (2009) Introducing Monte Carlo Methods with R. Springer, Berlin
- T. Rolski, H. Schmidli, V. Schmidt, J. Teugels (1999) Stochastic Processes for Insurance and Finance. Wiley, Chichester
- Y. Suhov, M. Kelbert (2008) Probability and Statistics by Example. Volume 2. Markov Chains: A Primer in Random Processes and their Applications. Cambridge University Press, Cambridge
- H. Thorisson (2002) Coupling, Stationarity, and Regeneration. Springer, New York
- G. Winkler (2003) Image Analysis, Random Fields and Dynamic Monte Carlo Methods. Springer, Berlin

2 Markov Chains
Markov chains can describe the (temporal) dynamics of objects, systems, etc.
- that can possess one of finitely or countably many possible configurations at a given time,
- where these configurations will be called the states of the considered object or system, respectively.

Examples of this class of objects and systems are
- the current prices of products like insurance policies, stocks or bonds, if they are observed on a discrete (e.g. integer) time scale,
- the monthly profit of a business,
- the current length of the checkout lines (so-called queues) in a grocery store,
- the vector of temperature, air pressure, precipitation and wind velocity recorded on an hourly basis at the meteorological office Ulm-Kuhberg,
- digital maps, for example describing the momentary spatial dispersion of a disease,
- microscopic 2D or 3D images describing the current state (i.e. structural geometric properties) of biological tissues or technical materials such as polymers, metals or ceramics.

Remarks
- In this course we will focus on discrete-time Markov chains, i.e., the temporal dynamics of the considered objects, systems, etc. will be observed stepwise, e.g. at integer points in time.
- The algorithms for Markov Chain Monte Carlo simulation we will discuss in part II of the course are based on exactly these discrete-time Markov chains.
- The number of potential states can be very high. For mathematical reasons it is therefore convenient to consider the case of infinitely many states as well. As long as the infinite case is restricted to countably many states, only slight methodological changes will be necessary.

2.1 Specification of the Model and Examples

2.1.1 State Space, Initial Distribution and Transition Probabilities

The stochastic model of a discrete-time Markov chain with finitely many states consists of three components: state space, initial distribution and transition matrix.
- The model is based on the (finite) set of all possible states, called the state space of the Markov chain. W.l.o.g. the state space can be identified with the set E = {1, 2, ..., ℓ}, where ℓ ∈ N = {1, 2, ...} is an arbitrary but fixed natural number.
- For each i ∈ E, let α_i be the probability that the system or object is in state i at time n = 0, where it is assumed that

  α_i ∈ [0, 1] ,   ∑_{i=1}^ℓ α_i = 1 .   (1)

  The vector α = (α_1, ..., α_ℓ)^⊤ of the probabilities α_1, ..., α_ℓ defines the initial distribution of the Markov chain.
- Furthermore, for each pair i, j ∈ E we consider the (conditional) probability p_{ij} ∈ [0, 1] for the transition of the object or system from state i to j within one time step. The ℓ × ℓ matrix P = (p_{ij})_{i,j=1,...,ℓ} of the transition probabilities p_{ij}, where

  p_{ij} ≥ 0 ,   ∑_{j=1}^ℓ p_{ij} = 1 ,   (2)

  is called the one-step transition matrix of the Markov chain.

For each set E = {1, 2, ..., ℓ}, any vector α = (α_1, ..., α_ℓ)^⊤ and any matrix P = (p_{ij}) satisfying the conditions (1) and (2), the notion of the corresponding Markov chain can now be introduced.
Definition
Let X_0, X_1, ... : Ω → E be a sequence of random variables defined on the probability space (Ω, F, P) and mapping into the set E = {1, 2, ..., ℓ}. Then X_0, X_1, ... is called a (homogeneous) Markov chain with initial distribution α = (α_1, ..., α_ℓ)^⊤ and transition matrix P = (p_{ij}), if

  P(X_0 = i_0, X_1 = i_1, ..., X_n = i_n) = α_{i_0} p_{i_0 i_1} ··· p_{i_{n−1} i_n}   (3)

for arbitrary n = 0, 1, ... and i_0, i_1, ..., i_n ∈ E.


Remarks
- A quadratic matrix P = (p_{ij}) satisfying (2) is called a stochastic matrix.
- The following Theorem 2.1 reveals the intuitive meaning of condition (3). In particular, the motivation for the choice of the terms "initial distribution" and "transition matrix" will become evident.
- Furthermore, Theorem 2.1 provides another (equivalent) definition of a Markov chain that is frequently found in the literature.

Theorem 2.1 The sequence {X_n} of E-valued random variables is a Markov chain if and only if there is a stochastic matrix P = (p_{ij}) such that

  P(X_n = i_n | X_{n−1} = i_{n−1}, ..., X_0 = i_0) = p_{i_{n−1} i_n}   (4)

for any n = 1, 2, ... and i_0, i_1, ..., i_n ∈ E such that P(X_{n−1} = i_{n−1}, ..., X_0 = i_0) > 0.


Proof
- Clearly condition (4) is necessary for {X_n} to be a Markov chain, as (4) follows immediately from (3) and the definition of the conditional probability; see Section WR-2.6.1.
- Let us now assume {X_n} to be a sequence of E-valued random variables such that a stochastic matrix P = (p_{ij}) exists that satisfies condition (4).
- For all i ∈ E we define α_i = P(X_0 = i) and observe that condition (3) obviously holds for n = 0.
- Furthermore, P(X_0 = i_0) = 0 implies P(X_0 = i_0, X_1 = i_1) = 0, and in case P(X_0 = i_0) > 0 we can conclude from (4) that

  P(X_0 = i_0, X_1 = i_1) = P(X_0 = i_0) P(X_1 = i_1 | X_0 = i_0) = α_{i_0} p_{i_0 i_1} .

  Therefore

  P(X_0 = i_0, X_1 = i_1) = 0 if α_{i_0} = 0,   and   P(X_0 = i_0, X_1 = i_1) = α_{i_0} p_{i_0 i_1} if α_{i_0} > 0,

  i.e., we have shown that (3) also holds for n = 1.
- Now assume that (3) holds for some n = k − 1 ≥ 1.
  - By the monotonicity of probability measures (see statement 2 in Theorem WR-2.1), P(X_0 = i_0, X_1 = i_1, ..., X_{k−1} = i_{k−1}) = 0 immediately implies P(X_0 = i_0, X_1 = i_1, ..., X_k = i_k) = 0.
  - On the other hand, if P(X_0 = i_0, X_1 = i_1, ..., X_{k−1} = i_{k−1}) > 0, then

    P(X_0 = i_0, X_1 = i_1, ..., X_k = i_k)
    = P(X_0 = i_0, X_1 = i_1, ..., X_{k−1} = i_{k−1}) P(X_k = i_k | X_{k−1} = i_{k−1}, ..., X_0 = i_0)
    = α_{i_0} p_{i_0 i_1} ··· p_{i_{k−2} i_{k−1}} p_{i_{k−1} i_k} .

- Thus, (3) also holds for n = k and hence for all n ∈ N.

Corollary 2.1 Let {X_n} be a Markov chain. Then

  P(X_n = i_n | X_{n−1} = i_{n−1}, ..., X_0 = i_0) = P(X_n = i_n | X_{n−1} = i_{n−1})   (5)

holds whenever P(X_{n−1} = i_{n−1}, ..., X_0 = i_0) > 0.


Proof
- Let P(X_{n−1} = i_{n−1}, ..., X_0 = i_0) and hence also P(X_{n−1} = i_{n−1}) be strictly positive.
- In this case (3) yields

  P(X_n = i_n | X_{n−1} = i_{n−1})
  = P(X_n = i_n, X_{n−1} = i_{n−1}) / P(X_{n−1} = i_{n−1})
  = ( ∑_{i_0,...,i_{n−2} ∈ E} P(X_n = i_n, ..., X_0 = i_0) ) / ( ∑_{i_0,...,i_{n−2} ∈ E} P(X_{n−1} = i_{n−1}, ..., X_0 = i_0) )
  = ( ∑_{i_0,...,i_{n−2} ∈ E} α_{i_0} p_{i_0 i_1} ··· p_{i_{n−2} i_{n−1}} p_{i_{n−1} i_n} ) / ( ∑_{i_0,...,i_{n−2} ∈ E} α_{i_0} p_{i_0 i_1} ··· p_{i_{n−2} i_{n−1}} )
  = p_{i_{n−1} i_n} ,

  where the third equality uses (3).
- This result and (4) imply (5).

Remarks
- Corollary 2.1 can be interpreted as follows: The conditional distribution of the (random) state X_n of the Markov chain {X_n} at time n is completely determined by the state X_{n−1} = i_{n−1} at the preceding time n − 1. It is independent of the states X_{n−2} = i_{n−2}, ..., X_1 = i_1, X_0 = i_0 observed in the earlier history of the Markov chain.
- The definition of the conditional probability immediately implies the equivalence of (5) and

  P(X_n = i_n, X_{n−2} = i_{n−2}, ..., X_0 = i_0 | X_{n−1} = i_{n−1})
  = P(X_n = i_n | X_{n−1} = i_{n−1}) · P(X_{n−2} = i_{n−2}, ..., X_0 = i_0 | X_{n−1} = i_{n−1}) .   (6)

  The conditional independence (6) is called the Markov property of {X_n}.
- The definitions and results of Section 2.1.1 remain valid if instead of a finite state space E = {1, 2, ..., ℓ} a countably infinite state space such as the set of all integers or all natural numbers is considered. It merely has to be taken into account that in this case α and P possess an infinite number of components and entries, respectively.

2.1.2 Examples

1. Weather Forecast
(see O. Häggström (2002) Finite Markov Chains and Algorithmic Applications. Cambridge University Press, Cambridge)
- We assume to observe the weather in an area whose typical weather is characterized by longer periods of rainy or dry days (denoted by "rain" and "sunshine"), where rain and sunshine exhibit approximately the same relative frequency over the entire year.
- It is sometimes claimed that the best way to predict tomorrow's weather is simply to guess that it will be the same tomorrow as it is today.
- If we assume that this way of predicting the weather is correct in 75% of the cases (regardless of whether today's weather is rain or sunshine), the weather can easily be modelled by a Markov chain.
- The state space consists of the two states 1 = "rain" and 2 = "sunshine". The transition matrix is given as follows:

    P = ( 0.75  0.25 )
        ( 0.25  0.75 ) .   (7)

- Note that a crucial assumption of this model is the perfect symmetry between rain and sunshine, in the sense that the probability that today's weather will persist tomorrow is the same regardless of today's weather.
- In areas where sunshine is much more common than rain, a more realistic transition matrix would be the following:

    P = ( 0.5  0.5 )
        ( 0.1  0.9 ) .   (8)
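As a quick numerical illustration (a sketch, not part of the original notes), one can iterate the matrix (8) and watch the rows converge to a common limit:

```python
import numpy as np

# Transition matrix (8): state 1 = rain, state 2 = sunshine
P = np.array([[0.5, 0.5],
              [0.1, 0.9]])

Pn = np.linalg.matrix_power(P, 50)  # 50-step transition matrix
print(Pn)
# Both rows are (almost) identical: whether it rains or shines
# today, the probability of rain in 50 days is about 1/6.
```

This anticipates the notion of ergodicity discussed in Section 2.2.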

2. Random Walks; Risk Processes
- Classic examples of Markov chains are so-called random walks. The (unbounded) basic model is defined in the following way:
  - Let Z, Z_1, Z_2, ... : Ω → Z be a sequence of independent and identically distributed random variables mapping into Z = {..., −1, 0, 1, ...}.
  - Let X_0 : Ω → Z be an arbitrary random variable that is independent of the increments Z_1, Z_2, ..., and define

      X_n = X_{n−1} + Z_n ,   n ≥ 1 .   (9)

  - Then the random variables X_0, X_1, ... form a Markov chain on the countably infinite state space E = Z with initial distribution α = (α_1, α_2, ...)^⊤, where α_i = P(X_0 = i). The transition probabilities are given by p_{ij} = P(Z = j − i).

Remarks
- The Markov chain given in (9) can be used as a model for the temporal dynamics of the solvency reserve of insurance companies. X_0 is then interpreted as the (random) initial reserve and the increments Z_n as the difference Z_n = a − Z_n′ between the risk-free premium income a > 0 and the random expenses for the liabilities Z_n′ in time period n ≥ 1.
- Another example of a random walk are the total winnings in n roulette games, already discussed in Section WR-1.3. In this case we have X_0 = 0. The distribution of the random increment Z is given by P(Z = i) = 1/2 for i ∈ {−1, 1} and P(Z = i) = 0 for i ∈ Z \ {−1, 1}.
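The recursion (9) translates directly into a simulation; a minimal sketch for the roulette example (fair ±1 increments, X_0 = 0):

```python
import random

def random_walk(n, x0=0):
    """Simulate n steps of the random walk (9) with P(Z = 1) = P(Z = -1) = 1/2."""
    path = [x0]
    for _ in range(n):
        z = random.choice([-1, 1])   # increment Z_n
        path.append(path[-1] + z)    # X_n = X_{n-1} + Z_n
    return path

random.seed(1)
print(random_walk(10))  # consecutive states always differ by exactly 1
```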

2 MARKOV CHAINS

3. Queues
The number of customers waiting in front of an arbitrary but fixed checkout desk in a grocery store can be modelled by a Markov chain in the following way:
- Let X_0 = 0 be the number of customers waiting in the line when the store opens.
- By Z_n we denote the random number of new customers arriving while the cashier is serving the n-th customer (n = 1, 2, ...). We assume the random variables Z, Z_1, Z_2, ... : Ω → {0, 1, ...} to be independent and identically distributed.
- The recursive definition

    X_n = max{0, X_{n−1} + Z_n − 1} ,   n ≥ 1 ,   (10)

  yields a sequence of random variables X_0, X_1, ... : Ω → {0, 1, ...} that is a Markov chain whose transition matrix P = (p_{ij}) has the entries

    p_{ij} = P(Z = j + 1 − i)      if i ≥ 1 and j ≥ i − 1, or if i = 0 and j ≥ 1,
    p_{ij} = P(Z = 0) + P(Z = 1)   if j = i = 0,
    p_{ij} = 0                     otherwise.

- X_n denotes the random number of customers waiting in the line right after the cashier has finished serving the n-th customer, i.e., the customer who has just started checking out and hence has already left the line is not counted any more.
4. Branching Processes
- We consider the reproduction process of a certain population, where X_n denotes the total number of descendants in the n-th generation; X_0 = 1.
- We assume that

    X_n = ∑_{i=1}^{X_{n−1}} Z_{n,i} ,   (11)

  where {Z_{n,i}, n, i ∈ N} is a set of independent and identically distributed random variables mapping into the set E = {0, 1, ...}. The random variable Z_{n,i} is the random number of descendants of individual i in generation n − 1.
- The sequence X_0, X_1, ... : Ω → {0, 1, ...} of random variables given by X_0 = 1 and the recursion (11) is called a branching process.
- One can show (see Section 2.1.3) that X_0, X_1, ... is a Markov chain with transition probabilities

    p_{ij} = P( ∑_{k=1}^i Z_{1,k} = j )   if i > 0,
    p_{ij} = 1                            if i = j = 0,
    p_{ij} = 0                            otherwise.
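The recursion (11) can be simulated generation by generation; a small sketch with an assumed offspring distribution (the probabilities are illustrative, not from the notes):

```python
import random

def branching_process(generations, offspring_probs):
    """Simulate (11): X_n is the sum of the offspring counts of the
    X_{n-1} individuals of the previous generation.

    offspring_probs[k] = P(Z = k) is an assumed offspring distribution.
    """
    ks = list(range(len(offspring_probs)))
    x = 1                 # X_0 = 1
    path = [x]
    for _ in range(generations):
        # sum over range(x) is empty once x = 0: extinction is absorbing
        x = sum(random.choices(ks, weights=offspring_probs)[0] for _ in range(x))
        path.append(x)
    return path

random.seed(3)
print(branching_process(8, [0.25, 0.5, 0.25]))  # mean offspring = 1 (critical case)
```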

5. Cyclic Random Walks
- Further examples of Markov chains can be constructed as follows (see E. Behrends (2000) Introduction to Markov Chains. Vieweg, Braunschweig, p. 4).
- We consider the finite state space E = {0, 1, ..., 999}, the initial distribution

    α = (1/16, 4/16, 6/16, 4/16, 1/16, 0, ..., 0)^⊤   (12)

  and the transition probabilities

    p_{ij} = 1/6   if (j + 1000 − i) mod 1000 ∈ {1, ..., 6},
    p_{ij} = 0     otherwise.

- Let X_0, Z_1, Z_2, ... : Ω → {0, 1, ..., 999} be independent random variables, where the distribution of X_0 is given by (12) and

    P(Z_1 = i) = P(Z_2 = i) = ... = 1/6 ,   i = 1, ..., 6 .

- The sequence X_0, X_1, ... : Ω → {0, 1, ..., 999} of random variables defined by the recursion formula

    X_n = (X_{n−1} + Z_n) mod 1000   (13)

  for n ≥ 1 is a Markov chain called a cyclic random walk.


Remarks
- An experiment corresponding to the Markov chain defined above can be designed in the following way. First of all we toss a coin four times and record the number of times "heads" occurs. This number x_0 is regarded as a realization of the random initial state X_0; see the Bernoulli scheme in Section WR-3.2.1.
- Afterwards a die is tossed n times. The outcome z_i of the i-th toss is interpreted as a realization of the random increment Z_i; i = 1, ..., n.
- The new state x_n of the system results from the update of the old state x_{n−1} according to (13), taking z_n as increment.
- If the experiment is realized not by tossing a coin and a die, respectively, but by a computer-based generation of pseudo-random numbers x_0, z_1, z_2, ..., the procedure is referred to as Monte-Carlo simulation.
- Methods allowing the construction of dynamic simulation algorithms based on Markov chains will be discussed in detail in the second part of this course; see Chapter 3 below.
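The coin-and-die experiment above can be mimicked directly on a computer; a minimal sketch (four Bernoulli(1/2) tosses realize the binomial initial distribution (12)):

```python
import random

def cyclic_random_walk(n):
    """Simulate the cyclic random walk (13) on E = {0, ..., 999}."""
    x = sum(random.randint(0, 1) for _ in range(4))  # X_0 ~ Bin(4, 1/2), cf. (12)
    path = [x]
    for _ in range(n):
        z = random.randint(1, 6)   # die toss: increment Z_n
        x = (x + z) % 1000         # update rule (13)
        path.append(x)
    return path

random.seed(4)
print(cyclic_random_walk(5))
```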

2.1.3 Recursive Representation

In this section we will show
- how Markov chains can be constructed from sequences of independent and identically distributed random variables,
- that the recursive formulae (9), (10), (11) and (13) are special cases of a general principle for the construction of Markov chains,
- that, vice versa, every Markov chain can be considered as the solution of a recursive stochastic equation.

As usual, let E = {1, 2, ..., ℓ} be a finite (or countably infinite) set.
- Furthermore, let (D, D) be a measurable space; e.g., D = R^d could be the d-dimensional Euclidean space and D = B(R^d) the Borel σ-algebra on R^d, or D = [0, 1] could be the unit interval and D = B([0, 1]) the Borel σ-algebra on [0, 1].
- Let now Z_1, Z_2, ... : Ω → D be a sequence of independent and identically distributed random variables mapping into D, and let X_0 : Ω → E be independent of Z_1, Z_2, ....
- Let the random variables X_1, X_2, ... : Ω → E be given by the stochastic recursion equation

    X_n = φ(X_{n−1}, Z_n) ,   (14)

  where φ : E × D → E is an arbitrary measurable function.


Theorem 2.2
Let the random variables X_0, X_1, ... : Ω → E be given by (14). Then

  P(X_n = i_n | X_{n−1} = i_{n−1}, ..., X_0 = i_0) = P(X_n = i_n | X_{n−1} = i_{n−1})

holds for any n ≥ 1 and i_0, i_1, ..., i_n ∈ E such that P(X_{n−1} = i_{n−1}, ..., X_0 = i_0) > 0.

Proof
- Formula (14) implies that

    P(X_n = i_n | X_{n−1} = i_{n−1}, ..., X_0 = i_0)
    = P(φ(X_{n−1}, Z_n) = i_n | X_{n−1} = i_{n−1}, ..., X_0 = i_0)
    = P(φ(i_{n−1}, Z_n) = i_n | X_{n−1} = i_{n−1}, ..., X_0 = i_0)
    = P(φ(i_{n−1}, Z_n) = i_n) ,

  where the last equality follows from the transformation theorem for independent and identically distributed random variables (see Theorem WR-3.18), as the random variables X_0, ..., X_{n−1} are functions of Z_1, ..., Z_{n−1} and hence independent of φ(i_{n−1}, Z_n).
- In the same way one concludes that

    P(φ(i_{n−1}, Z_n) = i_n) = P(φ(i_{n−1}, Z_n) = i_n | X_{n−1} = i_{n−1})
    = P(φ(X_{n−1}, Z_n) = i_n | X_{n−1} = i_{n−1})
    = P(X_n = i_n | X_{n−1} = i_{n−1}) .

Remarks
- The proof of Theorem 2.2 yields that the conditional probability p_{ij} = P(X_n = j | X_{n−1} = i) is given by p_{ij} = P(φ(i, Z_n) = j). In particular, p_{ij} does not depend on n, as the innovations Z_n are identically distributed.
- Moreover, the joint probability P(X_0 = i_0, X_1 = i_1, ..., X_n = i_n) is given by

    P(X_0 = i_0, X_1 = i_1, ..., X_n = i_n) = α_{i_0} p_{i_0 i_1} ··· p_{i_{n−1} i_n} ,   (15)

  where α_{i_0} = P(X_0 = i_0).
- Consequently, the sequence X_0, X_1, ... of random variables given by the recursive definition (14) is a Markov chain in the sense of the definition given in (3).

Our next step will be to show that, vice versa, every Markov chain can be regarded as the solution of a recursive stochastic equation.


Let X_0, X_1, ... : Ω → E be a Markov chain with state space E = {1, 2, ..., ℓ}, initial distribution α = (α_1, ..., α_ℓ)^⊤ and transition matrix P = (p_{ij}). Based on a recursive equation of the form (14) we will construct a Markov chain X_0′, X_1′, ... with initial distribution α and transition matrix P such that

  P(X_0 = i_0, ..., X_n = i_n) = P(X_0′ = i_0, ..., X_n′ = i_n) ,   i_0, ..., i_n ∈ E ,   (16)

for all n ≥ 0:

1. We start with a sequence Z_0, Z_1, ... of independent random variables that are uniformly distributed on the interval (0, 1].
2. First of all, the E-valued random variable X_0′ is defined as follows:

     X_0′ = k   if and only if   ∑_{i=1}^{k−1} α_i < Z_0 ≤ ∑_{i=1}^k α_i

   for all k = 1, ..., ℓ, i.e.

     X_0′ = ∑_{k=1}^ℓ k · 1I( ∑_{i=1}^{k−1} α_i < Z_0 ≤ ∑_{i=1}^k α_i ) .   (17)

3. The random variables X_1′, X_2′, ... are defined by the recursive equation

     X_n′ = φ(X_{n−1}′, Z_n) ,   (18)

   where the function φ : E × (0, 1] → E is given by

     φ(i, z) = ∑_{k=1}^ℓ k · 1I( ∑_{j=1}^{k−1} p_{ij} < z ≤ ∑_{j=1}^k p_{ij} ) .   (19)

It is easy to see that the probabilities P(X_0′ = i_0, X_1′ = i_1, ..., X_n′ = i_n) for the sequence {X_n′} defined by (17)-(18) are given by (3), i.e., {X_n′} is a Markov chain with initial distribution α and transition matrix P.
Remarks
- If (16) holds for two sequences {X_i} and {X_i′} of random variables, these sequences are called stochastically equivalent.
- The construction principle (17)-(19) can be exploited for the Monte-Carlo simulation of Markov chains with given initial distribution α and transition matrix P.
- Markov chains on a countably infinite state space can be constructed and simulated in the same way. However, in this case (17)-(19) need to be modified by considering vectors α and matrices P of infinite dimensions.
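The construction (17)-(19) amounts to repeatedly inverting cumulative sums; a compact sketch (using U[0,1) draws, which are distributionally equivalent to the (0,1] draws above):

```python
import bisect
import random
from itertools import accumulate

def simulate_chain(alpha, P, n):
    """Simulate n steps of a Markov chain via the construction (17)-(19).

    alpha: initial distribution (alpha_1, ..., alpha_l),
    P:     row-stochastic transition matrix, given as a list of rows.
    States are returned as 0-based indices 0, ..., l-1.
    """
    cum_alpha = list(accumulate(alpha))
    cum_rows = [list(accumulate(row)) for row in P]

    def draw(cum):
        # pick the state k whose cumulative interval contains Z;
        # min() guards against rounding in the last cumulative sum
        return min(bisect.bisect_left(cum, random.random()), len(cum) - 1)

    path = [draw(cum_alpha)]                   # step (17)
    for _ in range(n):
        path.append(draw(cum_rows[path[-1]]))  # steps (18)-(19)
    return path

random.seed(5)
print(simulate_chain([0.5, 0.5], [[0.75, 0.25], [0.25, 0.75]], 10))
```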

2.1.4 The Matrix of the n-Step Transition Probabilities

Let X_0, X_1, ... : Ω → E be a Markov chain on the state space E = {1, 2, ..., ℓ} with initial distribution α = (α_1, ..., α_ℓ)^⊤ and transition matrix P = (p_{ij}).
- For arbitrary but fixed n ≥ 1 and i, j ∈ E the product p_{i i_1} p_{i_1 i_2} ··· p_{i_{n−1} j} can be interpreted as the probability of the path i → i_1 → ... → i_{n−1} → j.
- Consequently, the probability of the transition from state i to state j within n steps is given by the sum

    p_{ij}^{(n)} = ∑_{i_1,...,i_{n−1} ∈ E} p_{i i_1} p_{i_1 i_2} ··· p_{i_{n−1} j} ,   (20)

  where

    p_{ij}^{(n)} = P(X_n = j | X_0 = i)   if P(X_0 = i) > 0.   (21)

Remarks
- The matrix P(n) = (p_{ij}^{(n)})_{i,j=1,...,ℓ} is called the n-step transition matrix of the Markov chain {X_n}.
- If we introduce the convention P(0) = I, where I denotes the ℓ × ℓ identity matrix, then P(n) has the following representation formulae.

Lemma 2.1 The equation

  P(n) = P^n   (22)

holds for arbitrary n = 0, 1, ..., and thus, for arbitrary n, m = 0, 1, ...,

  P(n+m) = P(n) P(m) .   (23)

Proof Equation (22) is an immediate consequence of (20) and the definition of matrix multiplication.

Example (Weather Forecast)
- Consider E = {1, 2}, and let

    P = ( 1−p   p   )
        ( p′   1−p′ )

  be an arbitrarily chosen transition matrix, i.e. 0 < p, p′ ≤ 1.
- One can show that the n-step transition matrix P(n) = P^n is given by the formula

    P^n = 1/(p+p′) ( p′  p )  +  (1−p−p′)^n/(p+p′) (  p   −p )
                   ( p′  p )                        ( −p′   p′ ) .

Remarks
- The matrix identity (23) is called the Chapman-Kolmogorov equation in the literature.
- Formula (23) yields the following useful inequalities.

Corollary 2.2 For arbitrary n, m, r = 0, 1, ... and i, j, k ∈ E,

  p_{ii}^{(n+m)} ≥ p_{ij}^{(n)} p_{ji}^{(m)}   (24)

and

  p_{ij}^{(r+n+m)} ≥ p_{ik}^{(r)} p_{kk}^{(n)} p_{kj}^{(m)} .   (25)

Furthermore, Lemma 2.1 allows the following representation of the distribution of X_n. Recall that X_n denotes the state of the Markov chain at step n.

Theorem 2.3
Let X_0, X_1, ... be a Markov chain with state space E = {1, ..., ℓ}, initial distribution α and one-step transition matrix P. Then the vector α_n = (α_{n1}, ..., α_{nℓ})^⊤ of the probabilities α_{ni} = P(X_n = i) is given by the equation

  α_n^⊤ = α^⊤ P^n .   (26)

Proof
- From the formula of total probability (see Theorem WR-2.6) and (21) we conclude that

    P(X_n = j) = ∑_{i∈E} α_i P(X_n = j | X_0 = i) = ∑_{i∈E} α_i p_{ij}^{(n)} ,

  where we define P(X_n = j | X_0 = i) = 0 if α_i = P(X_0 = i) = 0.
- Now statement (26) follows from Lemma 2.1.

Remarks
- Due to Theorem 2.3 the probabilities α_{ni} = P(X_n = i) can be calculated via the n-th power P^n of the transition matrix P.
- In this context it is often useful to find a so-called spectral representation of P^n. It can be constructed using the eigenvalues and a basis of eigenvectors of the transition matrix as follows. Note that there are matrices having no spectral representation.

A short recapitulation
- Let A be a (not necessarily stochastic) ℓ × ℓ matrix, let φ, ψ ≠ 0 be two ℓ-dimensional (column) vectors such that each of them has at least one component different from 0, and let λ be an arbitrary (real or complex) number.
- If

    Aφ = λφ   and   ψ^⊤ A = λψ^⊤ , respectively,   (27)

  then λ is an eigenvalue of A, and φ and ψ are right and left eigenvectors (for λ), respectively.
- As (27) is equivalent to

    (A − λI)φ = 0   and   ψ^⊤(A − λI) = 0^⊤ , respectively,

  λ is an eigenvalue of A if and only if λ is a solution of the so-called characteristic equation

    det(A − λI) = 0 .   (28)

- Note that the determinant in (28) is a polynomial of order ℓ. Thus, the algebraic equation (28) has ℓ possibly complex solutions λ_1, ..., λ_ℓ. These solutions need not all be distinct.
- W.l.o.g. we may assume the eigenvalues λ_1, ..., λ_ℓ to be ordered such that

    |λ_1| ≥ |λ_2| ≥ ... ≥ |λ_ℓ| .

- For every eigenvalue λ_i left and right eigenvectors ψ_i and φ_i, respectively, can be found.

- Let Φ = (φ_1, ..., φ_ℓ) be the ℓ × ℓ matrix whose columns are the right eigenvectors φ_1, ..., φ_ℓ, and let Ψ be the ℓ × ℓ matrix whose rows are the (transposed) left eigenvectors ψ_1^⊤, ..., ψ_ℓ^⊤.
- By the definition of the eigenvectors,

    AΦ = Φ diag(λ) ,   (29)

  where λ = (λ_1, ..., λ_ℓ)^⊤ and diag(λ) denotes the diagonal matrix with diagonal elements λ_1, ..., λ_ℓ.
- If the eigenvectors φ_1, ..., φ_ℓ are linearly independent, the inverse Φ^{−1} exists and we can set Ψ = Φ^{−1}. Moreover, in this case (29) implies

    A = Φ diag(λ) Φ^{−1} = Φ diag(λ) Ψ

  and hence

    A^n = (Φ diag(λ) Ψ)^n = Φ (diag(λ))^n Ψ .

- This yields the spectral representation of A:

    A^n = ∑_{i=1}^ℓ λ_i^n φ_i ψ_i^⊤ .   (30)

Remarks
- An application of (30) to the transition matrix A = P results in a simple algorithm for calculating the n-th power P^n in (26).
- For the necessary calculation of the eigenvalues and eigenvectors of P, standard software like MAPLE, MATLAB or MATHEMATICA can be used.
- A striking advantage of the spectral representation (30) is the fact that the complexity of the numerical calculation of P^n stays constant as n increases.
- However, the derivation of (30) requires the eigenvectors φ_1, ..., φ_ℓ to be linearly independent. The next lemma gives a sufficient condition for the linear independence of eigenvectors.
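A numerical sketch of (30): NumPy's `eig` returns the right eigenvectors as columns, and the left eigenvectors are obtained as the rows of the inverse, exactly as in the construction Ψ = Φ^{−1} above.

```python
import numpy as np

P = np.array([[0.5, 0.5],
              [0.1, 0.9]])
n = 8

lam, Phi = np.linalg.eig(P)   # eigenvalues and right eigenvectors (columns)
Psi = np.linalg.inv(Phi)      # rows = left eigenvectors, Psi = Phi^{-1}

# Spectral representation (30): P^n = sum_i lam_i^n * phi_i * psi_i^T
Pn = sum(lam[i] ** n * np.outer(Phi[:, i], Psi[i, :]) for i in range(2))
print(np.allclose(Pn, np.linalg.matrix_power(P, n)))  # True
```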
Lemma 2.2
- If all eigenvalues λ_1, ..., λ_ℓ of A are pairwise distinct, every family of corresponding right eigenvectors φ_1, ..., φ_ℓ is linearly independent.
- Furthermore, if the left eigenvectors ψ_1, ..., ψ_ℓ are given by Ψ = Φ^{−1}, it holds that

    ψ_i^⊤ φ_j = 1 if i = j,   and   ψ_i^⊤ φ_j = 0 if i ≠ j.   (31)

Proof
- The first statement will be proved by complete induction.
  - As every eigenvector φ_1 has at least one non-zero component, a_1 φ_1 = 0 implies a_1 = 0.

  - Let now all eigenvalues λ_1, ..., λ_ℓ of A be pairwise distinct and let the eigenvectors φ_1, ..., φ_{k−1} be linearly independent for a certain k ≤ ℓ.
  - In order to show the linear independence of φ_1, ..., φ_k it suffices to show that

      ∑_{j=1}^k a_j φ_j = 0   (32)

    implies a_1 = ... = a_k = 0.
  - Let a_1, ..., a_k be such that (32) holds. This also implies

      0 = A·0 = ∑_{j=1}^k a_j A φ_j = ∑_{j=1}^k a_j λ_j φ_j .

    The same argument yields

      0 = λ_k · 0 = λ_k ∑_{j=1}^k a_j φ_j = ∑_{j=1}^k λ_k a_j φ_j

    and thus, subtracting the two identities,

      0 = ∑_{j=1}^{k−1} (λ_k − λ_j) a_j φ_j .

  - As the eigenvectors φ_1, ..., φ_{k−1} are linearly independent,

      (λ_k − λ_1) a_1 = (λ_k − λ_2) a_2 = ... = (λ_k − λ_{k−1}) a_{k−1} = 0

    and hence a_1 = a_2 = ... = a_{k−1} = 0, as λ_k ≠ λ_j for 1 ≤ j ≤ k − 1. Now (32) immediately implies a_k = 0.
- If the eigenvalues λ_1, ..., λ_ℓ of A are pairwise distinct, the ℓ × ℓ matrix Φ consists of ℓ linearly independent column vectors and is thus invertible. Consequently, the matrix Ψ of the left eigenvectors is simply the inverse Ψ = Φ^{−1}. This immediately implies (31).

2.2 Ergodicity and Stationarity

2.2.1 Basic Definitions and Quasi-positive Transition Matrices

If the Markov chain X_0, X_1, … has a very large number ℓ of possible states, the spectral representation (30) of the n-step transition matrix P(n) = P^n discussed in Section 2.1.4 turns out to be inappropriate in order to calculate

the conditional probabilities p_ij^{(n)} = P(X_n = j | X_0 = i) of the random state X_n

as well as the (unconditional) probabilities P(X_n = j) = Σ_{i=1}^{ℓ} α_i p_ij^{(n)} of X_n

after n ≫ 1 (time) steps. However, there are certain conditions ensuring the existence of the limits lim_{n→∞} p_ij^{(n)} and lim_{n→∞} P(X_n = j), respectively, as well as their equality and independence of i, thus justifying to consider the limit π_j = lim_{n→∞} p_ij^{(n)} = lim_{n→∞} P(X_n = j) as an approximation of p_ij^{(n)} and P(X_n = j) if n ≫ 1.
This serves as a motivation to formally introduce the notion of the ergodicity of Markov chains.
Definition  The Markov chain X_0, X_1, … with transition matrix P = (p_ij) and the corresponding n-step transition matrices P(n) = (p_ij^{(n)}) (= P^n) is called ergodic if the limits

    π_j = lim_{n→∞} p_ij^{(n)}    (33)

1. exist for all j ∈ E,
2. are positive and independent of i ∈ E,
3. form a probability function π = (π_1, …, π_ℓ)^⊤, i.e. Σ_{j∈E} π_j = 1.

Example  (Weather Forecast)

In order to illustrate the notion of an ergodic Markov chain we return to the simple example of weather forecast already discussed in Sections 2.1.2 and 2.1.4.

Let E = {1, 2} and let

    P = ( 1 − p    p
          p′       1 − p′ )

be an arbitrary transition matrix such that 0 < p, p′ ≤ 1.

The n-step transition matrix P(n) = P^n is given by

    P^n = 1/(p + p′) ( p′  p        +  (1 − p − p′)^n/(p + p′) ( p    −p
                       p′  p )                                   −p′   p′ ) .

If p + p′ < 2, this and (26) imply

    lim_{n→∞} P^n = 1/(p + p′) ( p′  p
                                 p′  p )

and

    π = lim_{n→∞} α_n = ( p′/(p + p′) , p/(p + p′) )^⊤ ,    (34)

respectively. Note that the limit distribution π in (34) does not depend on the choice of the initial distribution α (= α_0).

However, if p + p′ = 2, then

    P^n = { P  if n is odd,
            I  if n is even.
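The convergence claimed in (34) is easy to observe numerically; a small Python sketch (the values p = 0.2, p′ = 0.6 are an arbitrary choice with p + p′ < 2):

```python
p, p2 = 0.2, 0.6                      # 0 < p, p' <= 1 and p + p' < 2

P = [[1 - p, p], [p2, 1 - p2]]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

Pn = P
for _ in range(200):                  # P^201, far in the ergodic regime
    Pn = matmul(Pn, P)

pi = (p2 / (p + p2), p / (p + p2))    # limit distribution from (34)
for row in Pn:                        # both rows of P^n approach pi
    assert abs(row[0] - pi[0]) < 1e-12 and abs(row[1] - pi[1]) < 1e-12
```

Since |1 − p − p′| = 0.2 here, the deviation of P^n from its limit decays like 0.2^n and is far below the asserted tolerance for n = 201.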

The ergodicity of Markov chains on an arbitrary finite state space can be characterized by the following notion
from the theory of positive matrices.


Definition

The ℓ × ℓ matrix A = (a_ij) is called non-negative if all entries a_ij of A are non-negative.

The non-negative matrix A is called quasi-positive if there is a natural number n_0 ≥ 1 such that all entries of A^{n_0} are positive.

Remark  If A is a stochastic matrix and we can find a natural number n_0 ≥ 1 such that all entries of A^{n_0} are positive, then it is easy to see that for all natural numbers n ≥ n_0 all entries of A^n are positive.

Theorem 2.4 The Markov chain X0 , X1 , . . . with state space E = {1, . . . , `} and transition matrix P is ergodic
if and only if P is quasi-positive.
Proof

First of all we show that the condition

    min_{i,j∈E} p_ij^{(n_0)} > 0    (35)

for some n_0 ∈ ℕ is sufficient for the ergodicity of {X_n}.


Let m_j^{(n)} = min_{i∈E} p_ij^{(n)} and M_j^{(n)} = max_{i∈E} p_ij^{(n)}. The Chapman–Kolmogorov equation (23) yields

    p_ij^{(n+1)} = Σ_{k∈E} p_ik p_kj^{(n)}

and thus

    m_j^{(n+1)} = min_{i∈E} p_ij^{(n+1)} = min_{i∈E} Σ_{k∈E} p_ik p_kj^{(n)} ≥ min_{i∈E} Σ_{k∈E} p_ik min_{l∈E} p_lj^{(n)} = m_j^{(n)} ,

i.e., m_j^{(n)} ≤ m_j^{(n+1)} for all n ≥ 0, where we define P(0) = I. A similar argument shows that M_j^{(n)} ≥ M_j^{(n+1)} for all n ≥ 0.

Consequently, in order to show the existence of the limits π_j in (33) it suffices to show that for all j ∈ E

    lim_{n→∞} ( M_j^{(n)} − m_j^{(n)} ) = 0 .    (36)

(n )

(n )

For this purpose we consider the sets E 0 = {k E : pi0 k0 pj0 k0 } and E 00 = E \ E 0 for arbitrary
but fixed states i0 , j0 E.
(n )

Let a = mini,jE pij 0 > 0. Then


X

(n )

(n )

pi0 k0 pj0 k0

=1

kE 0

and

X
kE 00

(n )

(n )

pi0 k0 pj0 k0

(n )

pi0 k0

(n )

pj0 k0 1 `a

kE 00

kE 0

X (n )
(n )
pi0 k0 pj0 k0 .
kE 0


By another application of the Chapman–Kolmogorov equation (23) this yields for arbitrary n ≥ 0 and j ∈ E

    p_{i_0 j}^{(n_0+n)} − p_{j_0 j}^{(n_0+n)} = Σ_{k∈E} ( p_{i_0 k}^{(n_0)} − p_{j_0 k}^{(n_0)} ) p_kj^{(n)}
      = Σ_{k∈E′} ( p_{i_0 k}^{(n_0)} − p_{j_0 k}^{(n_0)} ) p_kj^{(n)} + Σ_{k∈E″} ( p_{i_0 k}^{(n_0)} − p_{j_0 k}^{(n_0)} ) p_kj^{(n)}
      ≤ Σ_{k∈E′} ( p_{i_0 k}^{(n_0)} − p_{j_0 k}^{(n_0)} ) M_j^{(n)} + Σ_{k∈E″} ( p_{i_0 k}^{(n_0)} − p_{j_0 k}^{(n_0)} ) m_j^{(n)}
      = Σ_{k∈E′} ( p_{i_0 k}^{(n_0)} − p_{j_0 k}^{(n_0)} ) M_j^{(n)} − Σ_{k∈E′} ( p_{i_0 k}^{(n_0)} − p_{j_0 k}^{(n_0)} ) m_j^{(n)}
      = Σ_{k∈E′} ( p_{i_0 k}^{(n_0)} − p_{j_0 k}^{(n_0)} ) ( M_j^{(n)} − m_j^{(n)} )
      ≤ (1 − ℓa) ( M_j^{(n)} − m_j^{(n)} ) .

As a consequence, M_j^{(n_0+n)} − m_j^{(n_0+n)} ≤ ( M_j^{(n)} − m_j^{(n)} )(1 − ℓa), and by induction one shows that for any k ≥ 1

    M_j^{(k n_0 + n)} − m_j^{(k n_0 + n)} ≤ ( M_j^{(n)} − m_j^{(n)} )(1 − ℓa)^k .    (37)

This ensures the existence of an (unbounded) sequence n_1, n_2, … such that for all j ∈ E

    lim_{k→∞} ( M_j^{(n_k)} − m_j^{(n_k)} ) = 0 .    (38)

By the monotonicity of the differences M_j^{(n)} − m_j^{(n)} in n, (38) holds for every sequence n_1, n_2, … of natural numbers. This proves (36).

The limits π_j are positive because

    π_j = lim_{n→∞} p_ij^{(n)} ≥ lim_{n→∞} m_j^{(n)} ≥ m_j^{(n_0)} ≥ a > 0 .

Furthermore, Σ_{j∈E} π_j = Σ_{j∈E} lim_{n→∞} p_ij^{(n)} = lim_{n→∞} Σ_{j∈E} p_ij^{(n)} = 1, as the sum consists of finitely many summands.

It follows immediately from min_{j∈E} π_j > 0 and (33) that the condition (35) is necessary for ergodicity if one takes into account that the state space E is finite.

Remarks

As the limits π_j = lim_{n→∞} p_ij^{(n)} of ergodic Markov chains do not depend on i and the state space E = {1, …, ℓ} is finite, clearly

    lim_{n→∞} α_n^⊤ = lim_{n→∞} α_0^⊤ P(n) = π^⊤ .

The proof of Theorem 2.4 does not only show the existence of the limits π_j = lim_{n→∞} p_ij^{(n)} but also yields the following estimate for the rate of convergence: The inequality (37) implies

    sup_{i,j∈E} | p_ij^{(n)} − π_j | ≤ sup_{j∈E} ( M_j^{(n)} − m_j^{(n)} ) ≤ (1 − ℓa)^{⌊n/n_0⌋}    (39)

and hence

    sup_{j∈E} | α_{nj} − π_j | ≤ (1 − ℓa)^{⌊n/n_0⌋} ,    (40)

where ⌊n/n_0⌋ denotes the integer part of n/n_0.


Estimates like (39) and (40) are referred to as geometric bounds for the rate of convergence in the literature.

Now we will show that the limits π_j = lim_{n→∞} p_ij^{(n)} can be regarded as the solution of a system of linear equations.

Theorem 2.5
Let X_0, X_1, … be an ergodic Markov chain with state space E = {1, …, ℓ} and transition matrix P = (p_ij). In this case the vector π = (π_1, …, π_ℓ)^⊤ of the limits π_j = lim_{n→∞} p_ij^{(n)} is the uniquely determined (positive) solution of the linear equation system

    π_j = Σ_{i∈E} π_i p_ij ,    j ∈ E,    (41)

when additionally the condition Σ_{j∈E} π_j = 1 is imposed.

Proof

The definition (33) of the limits π_j and the Chapman–Kolmogorov equation (23) imply, by changing the order of limit and sum, that

    π_j = lim_{n→∞} p_kj^{(n)} = lim_{n→∞} Σ_{i∈E} p_ki^{(n−1)} p_ij = Σ_{i∈E} ( lim_{n→∞} p_ki^{(n−1)} ) p_ij = Σ_{i∈E} π_i p_ij .

Suppose now that there is another solution π′ = (π′_1, …, π′_ℓ)^⊤ of (41) such that π′_j = Σ_{i∈E} π′_i p_ij for all j ∈ E and Σ_{j∈E} π′_j = 1.

By induction one easily shows

    π′_j = Σ_{i∈E} π′_i p_ij^{(n)} ,    j ∈ E,    (42)

for all n = 1, 2, …. In particular, (42) implies

    π′_j = lim_{n→∞} Σ_{i∈E} π′_i p_ij^{(n)} = Σ_{i∈E} π′_i lim_{n→∞} p_ij^{(n)} = Σ_{i∈E} π′_i π_j = π_j .

Remarks

In matrix notation the linear equation system (41) is of the form π^⊤ = π^⊤ P.

If the number ℓ of elements in the state space is reasonably small, this equation system can be used for the numerical calculation of the probability function π; see Section 2.2.5.

In case ℓ ≫ 1, Monte-Carlo simulation turns out to be a more efficient method to determine π; see Section 3.3.
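For small ℓ, solving (41) with the normalization Σ_{j∈E} π_j = 1 amounts to replacing one equation of π^⊤(P − I) = 0 by the normalization and solving the resulting nonsingular system. A minimal Python sketch with exact rational arithmetic (the 3 × 3 matrix is a hypothetical example):

```python
from fractions import Fraction as F

P = [[F(1, 2), F(1, 4), F(1, 4)],
     [F(1, 3), F(1, 3), F(1, 3)],
     [F(1, 4), F(1, 4), F(1, 2)]]
n = 3

# Build the system A x = b for pi: A = P^T - I, last row replaced by normalization
A = [[P[j][i] - (F(1) if i == j else F(0)) for j in range(n)] for i in range(n)]
A[n - 1] = [F(1)] * n
b = [F(0)] * (n - 1) + [F(1)]

# Gauss-Jordan elimination with partial (nonzero) pivoting
for col in range(n):
    piv = next(r for r in range(col, n) if A[r][col] != 0)
    A[col], A[piv], b[col], b[piv] = A[piv], A[col], b[piv], b[col]
    for r in range(n):
        if r != col and A[r][col] != 0:
            f = A[r][col] / A[col][col]
            A[r] = [a - f * c for a, c in zip(A[r], A[col])]
            b[r] = b[r] - f * b[col]
pi = [b[i] / A[i][i] for i in range(n)]

assert sum(pi) == 1
assert all(sum(pi[i] * P[i][j] for i in range(n)) == pi[j] for j in range(n))
```

Replacing the last equation is legitimate because the rows of P^⊤ − I sum to the zero vector, so one equation of (41) is redundant and the normalization restores full rank for an irreducible P.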
2.2.2  Estimates for the Rate of Convergence; Perron–Frobenius-Theorem

Recall:

If {X_n} is a Markov chain whose 1-step transition matrix P has only strictly positive entries p_ij, then the geometric bound for the rate of convergence to the limit distribution π = (π_1, …, π_ℓ)^⊤ derived in (40) is given as follows:

    max_{j∈E} | α_{nj} − π_j | = O((1 − ℓa)^n) ,    (43)

where a = min_{i,j∈E} p_ij > 0.

Whenever the minimum a of the entries p_ij of the transition matrix P is close to 0, the bound in (43) is not very useful. However, in some cases the basis 1 − ℓa of the convergence estimate (43) can be improved.
Example  (Weather Forecast)

Let E = {1, 2} and

    P = ( 1 − p    p
          p′       1 − p′ ) ,

where 0 < p, p′ < 1.

In Section 2.2.1 we showed that the n-step transition matrix P(n) = P^n is given by

    P^n = 1/(p + p′) ( p′  p        +  (1 − p − p′)^n/(p + p′) ( p    −p
                       p′  p )                                   −p′   p′ )

and thus

    P^∞ = lim_{n→∞} P^n = 1/(p + p′) ( p′  p
                                       p′  p ) .

Consequently,

    P^n − P^∞ = (1 − p − p′)^n/(p + p′) ( p    −p        = O(|1 − p − p′|^n) ,    (44)
                                          −p′   p′ )

where p + p′ > 2a = 2 min_{i,j∈E} p_ij and hence |1 − p − p′| < 1 − 2a if p ≠ p′.


Remarks

The basis |λ_2| = |1 − p − p′| of the rate of convergence in (44) is the absolute value of the second largest eigenvalue λ_2 of the transition matrix P, as the characteristic equation

    det(P − λI) = (1 − p − λ)(1 − p′ − λ) − pp′ = 0

of P has the two solutions λ_1 = 1 and λ_2 = 1 − p − p′.

In general, geometric estimates of the form (44) for the rate of convergence can be derived by means of the following so-called Perron–Frobenius theorem for quasi-positive matrices.
Theorem 2.6
Let A be a quasi-positive ℓ × ℓ matrix with eigenvalues λ_1, …, λ_ℓ such that |λ_1| ≥ … ≥ |λ_ℓ|. Then the following holds:

(a) The eigenvalue λ_1 is real and positive,
(b) λ_1 > |λ_i| for all i = 2, …, ℓ,
(c) the right and left eigenvectors φ_1 and ψ_1 of λ_1 are uniquely determined up to a constant factor and can be chosen such that all components of φ_1 and ψ_1 are positive.

A proof of Theorem 2.6 can be found in Chapter 1 of E. Seneta (1981) Non-Negative Matrices and Markov Chains, Springer, New York.

Corollary 2.3  Let P be a quasi-positive transition matrix. Then

λ_1 = 1, φ_1 = e and ψ_1 = π, where e = (1, …, 1)^⊤ and π = (π_1, …, π_ℓ)^⊤;

|λ_i| < 1 for all i = 2, …, ℓ.
Proof

As P is a stochastic matrix, obviously Pe = e, and (41) implies π^⊤ P = π^⊤. Thus 1 is an eigenvalue of P, and e and π are right and left eigenvectors of this eigenvalue, respectively.

Let now λ be an arbitrary eigenvalue of P and let φ = (φ_1, …, φ_ℓ)^⊤ be an eigenvector corresponding to λ. By definition (27) of λ and φ,

    |λ| |φ_i| ≤ Σ_{j=1}^{ℓ} p_ij |φ_j| ≤ max_{j∈E} |φ_j| ,    i ∈ E .

Choosing i such that |φ_i| = max_{j∈E} |φ_j| > 0 shows that |λ| ≤ 1, and therefore λ_1 = 1 is the largest eigenvalue of P.

Theorem 2.6 now implies |λ_i| < 1 for i = 2, …, ℓ.

Corollary 2.3 yields the following geometric convergence estimate.

Corollary 2.4  Let P be a quasi-positive transition matrix such that all eigenvalues λ_1, …, λ_ℓ of P are pairwise distinct. Then

    sup_{j∈E} | α_{nj} − π_j | = O(|λ_2|^n) .    (45)

Proof

Corollary 2.3 implies

    lim_{n→∞} Σ_{i=2}^{ℓ} λ_i^n φ_i ψ_i^⊤ = 0 ,    (46)

as |λ_i| < 1 for all i = 2, …, ℓ.

Furthermore, Corollary 2.3 implies λ_1 = 1 as well as φ_1 = (1, …, 1)^⊤ and ψ_1 = π being the right and left eigenvectors of λ_1, respectively. Taking into account the spectral representation (30) of P^n, i.e.,

    P^n = Σ_{i=1}^{ℓ} λ_i^n φ_i ψ_i^⊤ ,

it is easy to see that

    P^n − ( π^⊤          = P^n − λ_1^n φ_1 ψ_1^⊤ = Σ_{i=2}^{ℓ} λ_i^n φ_i ψ_i^⊤ .
            ⋮
            π^⊤ )

As α_n^⊤ = α_0^⊤ P^n (see Theorem 2.3), this together with (46) shows (45).

Example (Reaching a Consensus)

see C. Hesse (2003) Angewandte Wahrscheinlichkeitstheorie. Vieweg, Braunschweig, p. 349

A committee consisting of ℓ members has the mandate to project a certain (economical) parameter θ ∈ ℝ; one could think of the German Council of Economic Experts projecting economic growth for the next year.

In a first step each of the ℓ experts gives a personal projection for θ, where the single projection results are denoted by θ̂_1^{(0)}, …, θ̂_ℓ^{(0)}.

What could be a method for the experts to settle on a common projection, i.e. to reach a consensus?

A simple approach would be to calculate the arithmetic mean (θ̂_1^{(0)} + … + θ̂_ℓ^{(0)})/ℓ, thus ignoring the different levels of expertise within the group.

Alternatively, every committee member could modify his own projection based on the projections of his ℓ − 1 colleagues and his personal assessment of their authors' expertise.

For arbitrary i, j ∈ {1, …, ℓ} the expert i attributes the trust probability p_ij to the expert j such that

    p_ij > 0    and    Σ_{j=1}^{ℓ} p_ij = 1 ,    i, j ∈ {1, …, ℓ},

and expert i modifies his original projection θ̂_i^{(0)}, replacing it by

    θ̂_i^{(1)} = Σ_{j=1}^{ℓ} p_ij θ̂_j^{(0)} .

In most cases the modified projections θ̂_1^{(1)}, …, θ̂_ℓ^{(1)} will still be different from each other. Therefore the procedure is repeated until the differences are sufficiently small.

Theorem 2.4 ensures that this can be achieved if the modification procedure is repeated often enough, as according to Theorem 2.4 the limits

    lim_{n→∞} θ̂_i^{(n)} = Σ_{j=1}^{ℓ} π_j θ̂_j^{(0)}    (47)

exist and do not depend on i, where the vector π = (π_1, …, π_ℓ)^⊤ of the limits π_j = lim_{n→∞} p_ij^{(n)} is the (uniquely determined) solution of the linear equation system (41), i.e. π^⊤ = π^⊤ P with P = (p_ij).

The equality (47) can be seen as follows:

    lim_{n→∞} θ̂_i^{(n)} = lim_{n→∞} Σ_{j=1}^{ℓ} p_ij θ̂_j^{(n−1)} = lim_{n→∞} Σ_{j=1}^{ℓ} p_ij^{(n)} θ̂_j^{(0)} = Σ_{j=1}^{ℓ} ( lim_{n→∞} p_ij^{(n)} ) θ̂_j^{(0)} = Σ_{j=1}^{ℓ} π_j θ̂_j^{(0)} .

The consensus, i.e. the common projection of the unknown parameter θ reached by the committee, is given by

    θ̂ = Σ_{j=1}^{ℓ} π_j θ̂_j^{(0)} .    (48)


Remarks

For large ℓ the algebraic solution of the linear equation system (41) can be difficult. In this case the estimates for the rate of convergence become relevant for the practical implementation of the method to reach a consensus described in (47).

We consider the following numerical example. Let ℓ = 3 and

    P = ( 2/6  1/6  3/6
          1/4  1/4  2/4      .    (49)
          2/8  1/8  5/8 )

The entries of this stochastic matrix imply that the third expert has a particularly high reputation among his colleagues.

The solution π = (π_1, π_2, π_3)^⊤ of the corresponding linear equation system (41) is given by

    π_1 = 21/77 ,    π_2 = 12/77 ,    π_3 = 44/77 ,

i.e. the projection θ̂_3^{(0)} of the third expert with the outstanding reputation is most influential.

The eigenvalues of the transition matrix given in (49) are λ_1 = 1, λ_2 = 1/8 and λ_3 = 1/12.

The basis in the rate of convergence given by (43) is

    1 − 3a = 1 − 3 min_{i,j=1,2,3} p_ij = 1 − 3/8 = 5/8 ,

whereas Corollary 2.4 yields the following substantially improved geometric rate of convergence:

    max_{i∈{1,…,ℓ}} | θ̂_i^{(n)} − θ̂ | = O(|λ_2|^n) ,

where λ_2 = 1/8 denotes the second largest eigenvalue of the stochastic matrix P given by (49).
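These numerical claims are easy to confirm exactly with rational arithmetic (Python sketch):

```python
from fractions import Fraction as F

# the stochastic matrix (49)
P = [[F(2, 6), F(1, 6), F(3, 6)],
     [F(1, 4), F(1, 4), F(2, 4)],
     [F(2, 8), F(1, 8), F(5, 8)]]

pi = [F(21, 77), F(12, 77), F(44, 77)]

# pi solves (41): pi^T = pi^T P, and sums to 1
assert sum(pi) == 1
assert all(sum(pi[i] * P[i][j] for i in range(3)) == pi[j] for j in range(3))

def det3(M):
    return (M[0][0] * (M[1][1] * M[2][2] - M[1][2] * M[2][1])
          - M[0][1] * (M[1][0] * M[2][2] - M[1][2] * M[2][0])
          + M[0][2] * (M[1][0] * M[2][1] - M[1][1] * M[2][0]))

# 1, 1/8 and 1/12 are eigenvalues: det(P - lambda * I) = 0 for each of them
for lam in (F(1), F(1, 8), F(1, 12)):
    M = [[P[i][j] - (lam if i == j else 0) for j in range(3)] for i in range(3)]
    assert det3(M) == 0
```

Since a 3 × 3 matrix has exactly three eigenvalues, the three exact roots of the characteristic polynomial confirm the list given above.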

2.2.3  Irreducible and Aperiodic Markov Chains

Recall that in Theorem 2.4 we characterized the ergodicity of the Markov chain X_0, X_1, … by the quasi-positivity of its transition matrix P. However, it can be difficult to show this property of P directly, especially if ℓ ≫ 1.

Therefore, we will derive another (probabilistic) way to characterize the ergodicity of a Markov chain with finite state space. For this purpose we will need the following notion.
For arbitrary but fixed states i, j ∈ E we say that the state j is accessible from the state i if p_ij^{(n)} > 0 for some n ≥ 0, where P(0) = I (notation: i → j).

Another (equivalent) definition for the accessibility of states is the following: Let τ_j = min{n ≥ 0 : X_n = j} be the number of steps until the Markov chain {X_n} reaches the state j ∈ E for the first time. We define τ_j = ∞ if X_n ≠ j for all n ≥ 0.

Theorem 2.7  Let i ∈ E be such that P(X_0 = i) > 0. In this case j is accessible from i ∈ E if and only if P(τ_j < ∞ | X_0 = i) > 0.


Proof

The condition is obviously necessary because

    {X_n = j} ⊂ {τ_j ≤ n} ⊂ {τ_j < ∞}

and thus

    0 < p_ij^{(n)} ≤ P(τ_j < ∞ | X_0 = i)

for some n ≥ 0 if j is accessible from i.

On the other hand, if i ≠ j and p_ij^{(n)} = 0 for all n ≥ 0, then

    P(τ_j < ∞ | X_0 = i) = lim_{n→∞} P(τ_j < n | X_0 = i)
      ≤ lim_{n→∞} P( ∪_{k=0}^{n−1} {X_k = j} | X_0 = i )
      ≤ lim_{n→∞} Σ_{k=0}^{n−1} P(X_k = j | X_0 = i) = lim_{n→∞} Σ_{k=0}^{n−1} p_ij^{(k)} = 0 .

Remarks

The property of accessibility is transitive, i.e., i → k and k → j imply that i → j. This is an immediate consequence of the inequality p_ij^{(r+m)} ≥ p_ik^{(r)} p_kj^{(m)} (see Corollary 2.2) and of the definition of accessibility.

Moreover, in case i → j and j → i we say that the states i and j communicate (notation: i ↔ j).

The property of communicating is an equivalence relation, as
(a) i ↔ i (reflexivity),
(b) i ↔ j if and only if j ↔ i (symmetry),
(c) i ↔ k and k ↔ j implies i ↔ j (transitivity).

As a consequence, the state space E can be completely divided into disjoint equivalence classes with respect to the equivalence relation ↔.

The Markov chain {X_n} with transition matrix P = (p_ij) is called irreducible if the state space E consists of only one equivalence class, i.e. i ↔ j for all i, j ∈ E.

Examples

The definition of irreducibility immediately implies that the 2 × 2 matrices

    P_1 = ( 1/2  1/2        and    P_2 = ( 1/2  1/2
            1/2  1/2 )                     1/4  3/4 )

are irreducible.

On the other hand, the 4 × 4 block matrix P consisting of P_1 and P_2,

    P = ( P_1  0
          0    P_2 ) ,

is not irreducible.
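Quasi-positivity (and hence, by Theorem 2.9 below, irreducibility plus aperiodicity) can be tested mechanically by inspecting matrix powers. A Python sketch; the search bound `max_power` is a heuristic cutoff of this sketch, not part of the theory:

```python
def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def quasi_positive(P, max_power=50):
    """True if some power P^n with n <= max_power has only positive entries."""
    Pn = P
    for _ in range(max_power):
        if all(x > 0 for row in Pn for x in row):
            return True
        Pn = matmul(Pn, P)
    return False

P1 = [[0.5, 0.5], [0.5, 0.5]]
P2 = [[0.5, 0.5], [0.25, 0.75]]
block = [[0.5, 0.5, 0.0, 0.0],     # block-diagonal matrix built from P1 and P2
         [0.5, 0.5, 0.0, 0.0],
         [0.0, 0.0, 0.5, 0.5],
         [0.0, 0.0, 0.25, 0.75]]

assert quasi_positive(P1) and quasi_positive(P2)
assert not quasi_positive(block)   # reducible: the off-diagonal blocks stay zero
```

For the block matrix no finite power ever becomes positive, since the zero blocks are preserved under multiplication; the bound merely keeps the sketch finite.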


Besides irreducibility we need a second property of the transition probabilities, namely the so-called aperiodicity,
in order to characterize the ergodicity of a Markov chain in a simple way.
Definition

The period d_i of the state i ∈ E is given by d_i = gcd{n ≥ 1 : p_ii^{(n)} > 0}, where gcd denotes the greatest common divisor. We define d_i = ∞ if p_ii^{(n)} = 0 for all n ≥ 1.

A state i ∈ E is said to be aperiodic if d_i = 1.

The Markov chain {X_n} and its transition matrix P = (p_ij) are called aperiodic if all states of {X_n} are aperiodic.

We will now show that the periods d_i and d_j coincide if the states i, j belong to the same equivalence class of communicating states. For this purpose we introduce the notation i → j[n] if p_ij^{(n)} > 0.

Theorem 2.8  If the states i, j ∈ E communicate, then d_i = d_j.

Proof

If j → j[n], i → j[k] and j → i[m] for certain k, m, n ≥ 1, then the inequalities from Corollary 2.2 imply that i → i[k + m] and i → i[k + m + n].

Thus, k + m and k + m + n are divisible by d_i. As a consequence, the difference n = (k + m + n) − (k + m) is also divisible by d_i.

This shows that d_i is a common divisor for all natural numbers n having the property that p_jj^{(n)} > 0, i.e. d_i ≤ d_j. For reasons of symmetry the same argument also proves that d_j ≤ d_i.

Corollary 2.5

Let the Markov chain {Xn } be irreducible. Then all states of {Xn } have the same period.

In order to show
that the characterization of an ergodic Markov chain (see Theorem 2.4) considered in Section 2.2.1 is
equivalent to the Markov chain being irreducible and aperiodic,
we need the following elementary lemma from number theory.
Lemma 2.3  Let k = 1, 2, … be an arbitrary but fixed natural number. Then there is a natural number n_0 ≥ 1 such that

    {n_0, n_0 + 1, n_0 + 2, …} ⊂ {n_1 k + n_2 (k + 1) ; n_1, n_2 ≥ 0} .

Proof

If n ≥ k², there are integers m, d ≥ 0 such that n − k² = mk + d and d < k. Therefore n = (k − d + m)k + d(k + 1) and hence

    n ∈ {n_1 k + n_2 (k + 1) ; n_1, n_2 ≥ 0} ,

i.e., n_0 = k² is the desired number.

Theorem 2.9  The transition matrix P is quasi-positive if and only if P is irreducible and aperiodic.

Proof

Let us first assume the transition matrix P to be irreducible and aperiodic.

For every i ∈ E we consider the set J(i) = {n ≥ 1 : p_ii^{(n)} > 0}, whose greatest common divisor is 1 as P is aperiodic. The inequalities from Corollary 2.2 yield p_ii^{(n+m)} ≥ p_ii^{(n)} p_ii^{(m)} and hence

    n + m ∈ J(i)    if n, m ∈ J(i).    (50)

We show that J(i) contains two successive numbers.

If J(i) did not contain two successive numbers, the elements of J(i) would have a minimal distance k ≥ 2. The consequence would be that mk + d ∈ J(i) for some m = 0, 1, … and d = 1, …, k − 1, as otherwise n = mk for all n ∈ J(i). But this is a contradiction to our hypothesis gcd(J(i)) = 1.

Let now n_1, n_1 + k ∈ J(i). Statement (50) then also implies a(n_1 + k) ∈ J(i) and n + b n_1 ∈ J(i) for arbitrary a, b ∈ ℕ, where

    n = mk + d ∈ J(i) .    (51)

We will show that there are natural numbers a, b ∈ {1, 2, …} such that the difference between a(n_1 + k) ∈ J(i) and n + b n_1 ∈ J(i) is less than k. From (51) we obtain

    a(n_1 + k) − n − b n_1 = (a − b) n_1 + (a − m) k − d

and hence for a = b = m + 1

    a(n_1 + k) − n − b n_1 = k − d < k .

Therefore, the set J(i) contains two successive numbers. Statement (50) and Lemma 2.3 yield that for every i ∈ E there is an n(i) ≥ 1 such that

    J(i) ⊃ {n(i), n(i) + 1, …} .    (52)

This result, the irreducibility of P and the inequality (25) in Corollary 2.2, i.e.

    p_ij^{(r+n+m)} ≥ p_ik^{(r)} p_kk^{(n)} p_kj^{(m)} ,

imply that for each pair i, j ∈ E of states there is a natural number n(ij) ≥ 1 such that

    J(ij) = {n ≥ 0 : p_ij^{(n)} > 0} ⊃ {n(ij), n(ij) + 1, …} ,

i.e., P is quasi-positive.

Conversely, the irreducibility and aperiodicity of quasi-positive transition matrices are immediate consequences of the definitions.


Remarks

A simple example of a non-irreducible Markov chain can be given by our well-known model for the weather forecast, where E = {1, 2} and

    P = ( 1 − p    p
          p′       1 − p′ ) .

If p = 0 or p′ = 0, then the corresponding Markov chain is clearly not irreducible and therefore, by Theorem 2.9, not ergodic.

It is nevertheless possible that the linear equation system

    π^⊤ = π^⊤ P    (53)

has one (or infinitely many) probability solutions π^⊤ = (π_1, π_2).

If for example p = 0 and p′ > 0, then i = 1 is a so-called absorbing state and π^⊤ = (1, 0) is the (uniquely determined) solution of the linear equation system (53).

If p = 0 and p′ = 0, every probability solution π^⊤ = (π_1, π_2) solves the linear equation system (53).
Now we give some examples of non-aperiodic Markov chains X_0, X_1, … : Ω → E.

In this context the random variables X_0, X_1, … : Ω → E are not given by a stochastic recursion formula X_n = φ(X_{n−1}, Z_n) of the type (14), where the increments Z_1, Z_2, … : Ω → D are independent and identically distributed random variables. We merely assume that the random variables Z_1, Z_2, … : Ω → D are conditionally independent in the following sense.

Note: As was shown in Section 2.1.3, it is nevertheless possible to construct a Markov chain that is stochastically equivalent to X_0, X_1, … and has independent increments; see the construction principle considered in (17)–(19).

Let E and D be arbitrary finite (or countably infinite) sets, let φ : E × D → E be an arbitrary function, and let X_0, X_1, … : Ω → E and Z_1, Z_2, … : Ω → D be random variables such that

    X_n = φ(X_{n−1}, Z_n)    (54)

and such that for every n ∈ ℕ the random variable Z_n is conditionally independent of the random variables Z_1, …, Z_{n−1}, X_0, …, X_{n−2} given X_{n−1}, i.e., for arbitrary n ∈ ℕ, i_0, i_1, …, i_{n−1} ∈ E and k_1, …, k_n ∈ D

    P(Z_n = k_n, Z_{n−1} = k_{n−1}, …, Z_1 = k_1, X_{n−1} = i_{n−1}, …, X_0 = i_0)
      = P(Z_n = k_n | X_{n−1} = i_{n−1}) P(Z_{n−1} = k_{n−1}, …, Z_1 = k_1, X_{n−1} = i_{n−1}, …, X_0 = i_0) ,

where we define P(Z_n = k_n | X_{n−1} = i_{n−1}) = 0 if P(X_{n−1} = i_{n−1}) = 0.

Moreover, we assume that for arbitrary i ∈ E and k ∈ D the probabilities P(Z_n = k | X_{n−1} = i) do not depend on n ∈ ℕ.

One can show that the sequence X_0, X_1, … : Ω → E recursively defined by (54) is a Markov chain whose transition matrix P = (p_ij) is given by

    p_ij = P( φ(i, Z_1) = j | X_0 = i ) ,

if P(X_0 = i) > 0 for all i ∈ E.


Example (Diffusion Model)

see P. Brémaud (1999) Markov Chains, Gibbs Fields, Monte Carlo Simulation, and Queues. Springer, New York, p. 76

The following simple model describing a diffusion process through a membrane was suggested in 1907 by the physicists Tatiana and Paul Ehrenfest. It is designed to model the heat exchange between two systems at different temperatures.

We consider ℓ particles, which are distributed between two containers A and B that are permeably connected but insulated with respect to their environment.

Assume there are X_{n−1} = i particles in A at time n − 1. Then one of the ℓ particles in the two containers is selected at random and transferred into the other container.

The state X_n of the system at time n is hence either X_n = i − 1 with probability i/ℓ (if the selected particle was in container A) or X_n = i + 1 with probability (ℓ − i)/ℓ (if the selected particle was in container B).

The random variables X_0, X_1, … : Ω → {0, 1, …, ℓ} can thus be defined recursively by the stochastic recursion formula

    X_n = X_{n−1} + Z_n ,    (55)

where given X_{n−1} the random variable Z_n is conditionally independent of the random variables Z_1, …, Z_{n−1}, X_0, …, X_{n−2}, with P(Z_n = 1) + P(Z_n = −1) = 1 and

    P(Z_n = −1 | X_{n−1} = i) = i/ℓ    if P(X_{n−1} = i) > 0 .

The entries p_ij of the transition matrix P = (p_ij) are therefore given by

    p_ij = { (ℓ − i)/ℓ  if i < ℓ and j = i + 1,
             i/ℓ        if i > 0 and j = i − 1,
             0          else.

In particular this implies d_i = gcd{n ≥ 1 : p_ii^{(n)} > 0} = 2 for all i ∈ {0, 1, …, ℓ}, i.e. the Markov chain given by (55) is not aperiodic (and thus by Theorem 2.9 not ergodic).

In spite of this, the linear equation system

    π^⊤ = π^⊤ P    (56)

has a (uniquely determined) probability solution π^⊤ = (π_0, …, π_ℓ), where

    π_i = (1/2^ℓ) \binom{ℓ}{i} ,    i ∈ {0, 1, …, ℓ} .    (57)
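Both claims — period 2 and the binomial solution (57) — can be checked for a small ℓ (Python sketch with exact arithmetic, ℓ = 4 being an arbitrary choice):

```python
from fractions import Fraction as F
from math import comb

l = 4
n_states = l + 1
P = [[F(0)] * n_states for _ in range(n_states)]
for i in range(n_states):
    if i < l:
        P[i][i + 1] = F(l - i, l)      # a B-particle moves to A
    if i > 0:
        P[i][i - 1] = F(i, l)          # an A-particle moves to B

pi = [F(comb(l, i), 2 ** l) for i in range(n_states)]       # (57)
assert sum(pi) == 1
assert all(sum(pi[i] * P[i][j] for i in range(n_states)) == pi[j]
           for j in range(n_states))                         # (56)

def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

# period 2: p_ii^{(n)} = 0 for every odd n (checked up to n = 9)
Pn = P
for n in range(1, 10):
    if n % 2 == 1:
        assert all(Pn[i][i] == 0 for i in range(n_states))
    Pn = matmul(Pn, P)
```

The parity check works because every step changes the particle count in A by exactly ±1, so a return to the same state needs an even number of steps.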

Remarks

The diffusion model of Ehrenfest is a special case of the following class of Markov chains, called birth and death processes with two reflecting barriers in the literature.


The state space considered is E = {0, 1, …, ℓ}, whereas the transition matrix P = (p_ij) is given by

    P = ( 0    1
          q_1  r_1  p_1
               q_2  r_2  p_2
                    ⋱    ⋱    ⋱                        (58)
                         q_{ℓ−1}  r_{ℓ−1}  p_{ℓ−1}
                                  1        0 )

where p_i > 0, q_i > 0 and p_i + q_i + r_i = 1 for all i ∈ {1, …, ℓ − 1}.

The linear equation system π^⊤ = π^⊤ P is of the form

    π_i = { p_{i−1} π_{i−1} + r_i π_i + q_{i+1} π_{i+1} ,  if 0 < i < ℓ,
            q_1 π_1 ,                                      if i = 0,
            p_{ℓ−1} π_{ℓ−1} ,                              if i = ℓ,

with the boundary conventions p_0 = q_ℓ = 1 and r_0 = r_ℓ = 0 corresponding to the reflecting barriers in (58). One can show that

    π_i = π_0 (p_1 p_2 ⋯ p_{i−1}) / (q_1 q_2 ⋯ q_i) ,    i ∈ {1, …, ℓ},

where π_0 > 0 is defined by the condition Σ_{i=0}^{ℓ} π_i = 1, i.e.

    π_0 ( 1 + 1/q_1 + p_1/(q_1 q_2) + … + (p_1 p_2 ⋯ p_{ℓ−1})/(q_1 q_2 ⋯ q_ℓ) ) = 1

and, consequently,

    π_0 = ( 1 + 1/q_1 + p_1/(q_1 q_2) + … + (p_1 p_2 ⋯ p_{ℓ−1})/(q_1 q_2 ⋯ q_ℓ) )^{−1} .

As we assume pi > 0 and qi > 0 for all i {1, . . . , ` 1}, birth and death processes with two reflecting
barriers are obviously irreducible.
If the additional condition ri > 0 is satisfied for some i {1, . . . , ` 1}, then birth and death processes
with two reflecting barriers are also aperiodic (and hence ergodic by Theorem 2.9).
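The product formula for π can likewise be verified mechanically. A Python sketch with exact arithmetic; the particular p_i, q_i, r_i values are an arbitrary admissible choice, and the reflecting-barrier rows of (58) give p_0 = q_ℓ = 1:

```python
from fractions import Fraction as F

l = 4
p = {1: F(1, 2), 2: F(1, 3), 3: F(1, 4)}      # upward rates p_1, ..., p_{l-1}
q = {1: F(1, 4), 2: F(1, 3), 3: F(1, 2), 4: F(1)}   # q_l = 1 (reflecting at l)
r = {i: 1 - p[i] - q[i] for i in range(1, l)}

P = [[F(0)] * (l + 1) for _ in range(l + 1)]
P[0][1] = F(1)                                 # p_0 = 1 (reflecting at 0)
P[l][l - 1] = F(1)
for i in range(1, l):
    P[i][i - 1], P[i][i], P[i][i + 1] = q[i], r[i], p[i]

# pi_i = pi_0 * (p_1 ... p_{i-1}) / (q_1 ... q_i), pi_0 from the normalization
w = [F(1)]
for i in range(1, l + 1):
    num, den = F(1), F(1)
    for j in range(1, i):
        num *= p[j]
    for j in range(1, i + 1):
        den *= q[j]
    w.append(num / den)
pi0 = 1 / sum(w)
pi = [pi0 * x for x in w]

assert sum(pi) == 1
assert all(sum(pi[i] * P[i][j] for i in range(l + 1)) == pi[j] for j in range(l + 1))
```

The stationarity assertion holds exactly here because the birth-and-death structure makes π satisfy the detailed-balance relations π_i p_i = π_{i+1} q_{i+1}.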
2.2.4  Stationary Initial Distributions

Recall

If {X_n} is an irreducible and aperiodic Markov chain with (finite) state space E = {1, …, ℓ} and (quasi-positive) transition matrix P = (p_ij), then the limit distribution π = lim_{n→∞} α_n is the uniquely determined probability solution of the following matrix equation (see Theorem 2.5):

    π^⊤ = π^⊤ P .    (59)
If the Markov chain {Xn } is not assumed to be irreducible there can be more than one solution for (59).


Moreover, if the initial distribution α_0 of {X_n} is a solution of (59), then Theorem 2.3 and (59) imply

    α_1^⊤ = α_0^⊤ P = α_0^⊤

and thus α_n = α_0 for all n ≥ 0.

Due to this invariance property, every probability solution of (59) is called a stationary initial distribution of {X_n}.

Conversely, it is possible to show that there is a unique probability solution of the matrix equation (59) if P is irreducible. However, this solution of (59) is not necessarily the limit distribution π = lim_{n→∞} α_n, as this limit does not exist if P is not aperiodic.
Theorem 2.10

Let P = (p_ij)_{i,j∈E} be an irreducible transition matrix, where E = {1, …, ℓ}.

For arbitrary but fixed i, j ∈ E the entries q_ij^{(n)} of the stochastic (ℓ × ℓ)-dimensional matrices Q_n = (q_ij^{(n)}), where

    Q_n = (1/n) ( P + P² + … + P^n ) ,    (60)

converge to a limit

    π_j = lim_{n→∞} q_ij^{(n)} > 0 ,    (61)

which does not depend on i. The vector π = (π_1, …, π_ℓ)^⊤ is a solution of the matrix equation (59) and satisfies Σ_{j=1}^{ℓ} π_j = 1.

The distribution π given by (60)–(61) is the only probability solution of (59).

A proof of Theorem 2.10 can be found in Chapter 7 of E. Behrends (2000) Introduction to Markov Chains, Vieweg, Braunschweig.

Remarks

Besides the invariance property α_0 = α_1 = …, the Markov chain {X_n} with stationary initial distribution α_0 exhibits still another invariance property for all finite-dimensional distributions that is considerably stronger. In this context we consider the following notion of a (strongly) stationary sequence of random variables.

Definition

Let X_0, X_1, … : Ω → E be an arbitrary sequence of random variables mapping into E = {1, …, ℓ} (which is not necessarily a Markov chain).

The sequence {X_n} of E-valued random variables is called stationary if for arbitrary k, n ∈ {0, 1, …} and i_0, …, i_n ∈ E

    P(X_k = i_0, X_{k+1} = i_1, …, X_{k+n} = i_n) = P(X_0 = i_0, X_1 = i_1, …, X_n = i_n) .    (62)


Theorem 2.11
Let X_0, X_1, … : Ω → E be a Markov chain with state space E = {1, …, ℓ}. Then {X_n} is a stationary sequence of random variables if and only if the Markov chain {X_n} has a stationary initial distribution.

Proof

The necessity of the condition follows immediately from Theorem 2.3 and from the definitions of a stationary initial distribution and a stationary sequence of random variables, respectively, as (62) in particular implies that P(X_1 = i) = P(X_0 = i) for all i ∈ E, and from Theorem 2.3 we thus obtain α_0^⊤ = α_1^⊤ = α_0^⊤ P, i.e., α_0 is a stationary initial distribution.

Conversely, suppose now that α_0 is a stationary initial distribution of the Markov chain {X_n}. Then, by the definition (3) of a Markov chain {X_n}, we have

    P(X_k = i_0, X_{k+1} = i_1, …, X_{k+n} = i_n)
      = Σ_{i′_0,…,i′_{k−1}∈E} P(X_0 = i′_0, …, X_{k−1} = i′_{k−1}, X_k = i_0, X_{k+1} = i_1, …, X_{k+n} = i_n)
      = Σ_{i′_0,…,i′_{k−1}∈E} α_{0,i′_0} p_{i′_0 i′_1} ⋯ p_{i′_{k−2} i′_{k−1}} p_{i′_{k−1} i_0} p_{i_0 i_1} ⋯ p_{i_{n−1} i_n}
      = ( α_0^⊤ P^k )_{i_0} p_{i_0 i_1} ⋯ p_{i_{n−1} i_n}
      = α_{0,i_0} p_{i_0 i_1} ⋯ p_{i_{n−1} i_n}
      = P(X_0 = i_0, X_1 = i_1, …, X_n = i_n) ,

where the last but one equality is due to the stationarity of the initial distribution α_0 and the last equality uses again the definition (3) of the Markov chain {X_n}.

Remarks

For some Markov chains whose transition matrices exhibit a specific structure, we already calculated their stationary initial distributions in Sections 2.2.2 and 2.2.3.

Now we will discuss two additional examples of this type. In these examples the state space is infinite, requiring an additional condition apart from quasi-positivity (or irreducibility and aperiodicity) in order to ensure the ergodicity of the Markov chains. Namely, a so-called contraction condition is imposed that prevents the probability mass from migrating towards infinity.
Examples

1. Queues

see T. Rolski, H. Schmidli, V. Schmidt, J. Teugels (2002) Stochastic Processes for Insurance and Finance. J. Wiley & Sons, Chichester, p. 147.

We consider the example already discussed in Section 2.1.2 of the recursively defined Markov chain X_0, X_1, … : Ω → {0, 1, …} with X_0 = 0 and

    X_n = max{0, X_{n−1} + Z_n − 1} ,    n ≥ 1,    (63)

where the random variables Z, Z_1, Z_2, … : Ω → {0, 1, …} are independent and identically distributed and the transition matrix P = (p_ij) is given by

    p_ij = { P(Z = j + 1 − i)      if j + 1 ≥ i > 0 or j > i = 0,
             P(Z = 0) + P(Z = 1)  if j = i = 0,
             0                    otherwise.    (64)
It is not difficult to show that

the Markov chain {X_n} defined by the recursion formula (63) with its corresponding transition matrix (64) is irreducible and aperiodic if

    P(Z = 0) > 0 ,    P(Z = 1) > 0    and    P(Z = 2) > 0 ,    (65)

for all n ≥ 1 the solution of the recursion equation (63) can be written as

    X_n = max{ 0, max_{k∈{1,…,n}} Σ_{r=k}^{n} (Z_r − 1) } =_d max{ 0, max_{k∈{1,…,n}} Σ_{r=1}^{k} (Z_r − 1) } ,    (66)

the limit probabilities π_i exist for all i ∈ {0, 1, …}, where

    π_i = lim_{n→∞} P( max{ 0, max_{k∈{1,…,n}} Σ_{r=1}^{k} (Z_r − 1) } = i )
        = { P( sup_{k∈{1,2,…}} Σ_{r=1}^{k} (Z_r − 1) = i )    for i > 0,
            P( sup_{k∈{1,2,…}} Σ_{r=1}^{k} (Z_r − 1) ≤ 0 )    for i = 0.

Furthermore,

    π_i = 0 for all i ∈ {0, 1, …}    if E Z ≥ 1,

    π_i > 0 for all i ∈ {0, 1, …} and Σ_{i≥0} π_i = 1    if (65) holds and E Z < 1.

Thus, for Markov chains with (countably) infinite state space,

irreducibility and aperiodicity do not always imply ergodicity, but, additionally, a certain contraction condition needs to be satisfied, where in the present example this condition is the requirement of a negative drift, i.e., E(Z − 1) < 0.

If the conditions (65) are satisfied and E Z < 1, then

the equation π^⊤ = π^⊤ P has a uniquely determined probability solution π^⊤ = (π_0, π_1, …), which coincides with the limit distribution (= lim_{n→∞} α_n^⊤) but which in general cannot be determined explicitly.

However, there is a simple formula for the generating function g_π : (−1, 1) → [0, 1] of π = (π_0, π_1, …)^⊤, where

    g_π(s) = Σ_{i=0}^{∞} s^i π_i = E s^{X_∞}    and    X_∞ = max{ 0, sup_{k∈{1,2,…}} Σ_{r=1}^{k} (Z_r − 1) } .    (67)


Namely, we have

    g_π(s) = ( (1 − ρ)(1 − s) ) / ( g_Z(s) − s ) ,    s ∈ (−1, 1) ,    (68)

where ρ = E Z and g_Z(s) = E s^Z is the generating function of Z.


Proof of (68)

By the definition (67) of $X_\infty$, we have $X_\infty \stackrel{d}{=} \max\{0,\, X_\infty + (Z-1)\}$, where $Z$ is independent of $X_\infty$. Furthermore, using the notation $x^+ = \max\{0, x\}$, we obtain

$$g_\pi(s) = E\, s^{X_\infty} = E\, s^{(X_\infty + Z - 1)^+} = E\,\bigl[s^{(X_\infty+Z-1)^+}\,1\!{\rm I}(X_\infty + Z - 1 \ge 0)\bigr] + E\,\bigl[s^{(X_\infty+Z-1)^+}\,1\!{\rm I}(X_\infty + Z - 1 = -1)\bigr]$$
$$= \frac{1}{s} \sum_{k=1}^{\infty} s^k\, P(X_\infty + Z = k) + P(X_\infty + Z = 0) = s^{-1} g_\pi(s)\, g_Z(s) + (1 - s^{-1})\, P(X_\infty + Z = 0)\,,$$

i.e.,

$$g_\pi(s) = \frac{(s-1)\, P(X_\infty + Z = 0)}{s - g_Z(s)}\,. \tag{69}$$

As $\lim_{s\to 1} g_\pi(s) = 1$ and $\lim_{s\to 1} \frac{d}{ds}\, g_Z(s) = E\,Z = \rho$, by l'Hôpital's rule we can conclude that

$$1 = \frac{P(X_\infty + Z = 0)}{1 - \rho}\,.$$

Hence (68) is a consequence of (69).
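The relation (68) can be checked numerically. The following sketch is not part of the original notes: it assumes an example distribution $Z \sim \mathrm{Bin}(2, 0.3)$, so that the conditions (65) hold and $E\,Z = 0.6 < 1$, truncates the state space at an assumed level $N$, and compares the generating function of the numerically computed stationary distribution with formula (68).

```python
import numpy as np

# Assumed example: Z ~ Bin(2, 0.3), so P(Z=0), P(Z=1), P(Z=2) > 0 and rho = E Z = 0.6 < 1
p0, p1, p2 = 0.49, 0.42, 0.09
N = 200                                  # truncation level (assumption)
P = np.zeros((N + 1, N + 1))
P[0, 0], P[0, 1] = p0 + p1, p2           # X_{n+1} = max(0, X_n + Z - 1)
for i in range(1, N):
    P[i, i - 1], P[i, i], P[i, i + 1] = p0, p1, p2
P[N, N - 1], P[N, N] = p0, p1 + p2       # reflect at the truncation level

# stationary distribution pi as the left eigenvector for eigenvalue 1
w, v = np.linalg.eig(P.T)
pi = np.real(v[:, np.argmax(np.real(w))])
pi /= pi.sum()

rho = 0.6                                # E Z
g_Z = lambda s: (0.7 + 0.3 * s) ** 2     # generating function of Bin(2, 0.3)
for s in (-0.5, 0.0, 0.5, 0.9):
    lhs = float(np.sum(pi * s ** np.arange(N + 1)))   # g_pi(s) computed from pi
    rhs = (1 - rho) * (1 - s) / (g_Z(s) - s)          # formula (68)
    assert abs(lhs - rhs) < 1e-8
```

The truncation is harmless here because the tail of $\pi$ decays geometrically for this choice of $Z$.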


2. Birth and death processes with one reflecting barrier

We modify the example of the birth and death process discussed in Section 2.2.3, now considering the infinite state space $E = \{0, 1, \ldots\}$ and the transition matrix

$$\mathbf{P} = \begin{pmatrix} 0 & 1 & & & \\ q_1 & r_1 & p_1 & & \\ & q_2 & r_2 & p_2 & \\ & & \ddots & \ddots & \ddots \end{pmatrix}, \tag{70}$$

where $p_i > 0$, $q_i > 0$ and $p_i + q_i + r_i = 1$ is assumed for all $i \in \{1, 2, \ldots\}$.

The linear equation system $\pi^\top = \pi^\top \mathbf{P}$ is of the form

$$\pi_i = \begin{cases} p_{i-1}\,\pi_{i-1} + r_i\, \pi_i + q_{i+1}\,\pi_{i+1} & \text{if } i > 0,\\ q_1\, \pi_1 & \text{if } i = 0, \end{cases} \tag{71}$$

where we set $p_0 = 1$. Similarly to the birth and death processes with two reflecting barriers, one can show that

the equation system (71) has a uniquely determined probability solution $\pi^\top$ if

$$\sum_{j=1}^{\infty} \frac{p_1 p_2 \cdots p_j}{q_1 q_2 \cdots q_{j+1}} < \infty\,, \tag{72}$$

the solution $\pi^\top = (\pi_0, \pi_1, \ldots)$ of (71) is given by

$$\pi_i = \pi_0\, \frac{p_1 p_2 \cdots p_{i-1}}{q_1 q_2 \cdots q_i}\,, \qquad i > 0\,,$$

where $\pi_0 > 0$ is defined by the condition $\sum_{i=0}^{\infty} \pi_i = 1$, i.e.

$$\pi_0 \Bigl(1 + \frac{1}{q_1} + \sum_{j=1}^{\infty} \frac{p_1 p_2 \cdots p_j}{q_1 q_2 \cdots q_{j+1}}\Bigr) = 1$$

and, consequently,

$$\pi_0 = \Bigl(1 + \frac{1}{q_1} + \sum_{j=1}^{\infty} \frac{p_1 p_2 \cdots p_j}{q_1 q_2 \cdots q_{j+1}}\Bigr)^{-1}.$$

As we assume $p_i > 0$ and $q_i > 0$ for all $i \in \{1, 2, \ldots\}$, birth and death processes with one reflecting barrier are obviously irreducible. Furthermore, if $r_i > 0$ for some $i \in \{1, 2, \ldots\}$, then birth and death processes with one reflecting barrier are also aperiodic (as well as ergodic if the contraction condition (72) is satisfied).
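For constant rates the normalizing series in (72) is geometric and $\pi$ can be written in closed form. The sketch below is an illustration, not part of the original notes; the values $p = 0.2$, $q = 0.5$ and the truncation level $N$ are assumptions.

```python
import numpy as np

# Assumed example: p_i = p, q_i = q for all i >= 1 with p < q, truncation at N
p, q = 0.2, 0.5
r = 1 - p - q
N = 100
P = np.zeros((N + 1, N + 1))
P[0, 1] = 1.0                            # reflecting barrier at 0, cf. (70)
for i in range(1, N):
    P[i, i - 1], P[i, i], P[i, i + 1] = q, r, p
P[N, N - 1], P[N, N] = q, r + p          # reflect at the truncation level
w, v = np.linalg.eig(P.T)
pi = np.real(v[:, np.argmax(np.real(w))])
pi /= pi.sum()

# closed form for constant rates: pi_0 = (1 + 1/(q-p))^{-1}, pi_i = pi_0 p^{i-1}/q^i
pi0 = 1 / (1 + 1 / (q - p))
assert abs(pi[0] - pi0) < 1e-8
for i in range(1, 10):
    assert abs(pi[i] - pi0 * p ** (i - 1) / q ** i) < 1e-8
```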

2.2.5 Direct and Iterative Computation Methods

First we show how the stationary initial distribution $\pi$ ($= \alpha = \lim_{n\to\infty} \alpha_n$) of the Markov chain $\{X_n\}$ can be computed based on methods from linear algebra, in case the transition matrix $\mathbf{P}$ does not exhibit a particularly nice structure (but is quasi-positive) and the number $\ell$ of states is reasonably small.

Theorem 2.12  Let the transition matrix $\mathbf{P}$ of the Markov chain $\{X_n\}$ be quasi-positive. Then the matrix $\mathbf{I} - \mathbf{P} + \mathbf{E}$ is invertible and the uniquely determined probability solution $\pi = \lim_{n\to\infty} \alpha_n$ of the matrix equation $\pi^\top = \pi^\top \mathbf{P}$ is given by

$$\pi^\top = e^\top (\mathbf{I} - \mathbf{P} + \mathbf{E})^{-1}\,, \tag{73}$$

where $e = (1, \ldots, 1)^\top$ and all entries of the $\ell \times \ell$ matrix $\mathbf{E}$ are equal to 1.
Proof

In order to prove that the matrix $\mathbf{I} - \mathbf{P} + \mathbf{E}$ is invertible, we show that the only solution of the equation

$$(\mathbf{I} - \mathbf{P} + \mathbf{E})\, x = 0 \tag{74}$$

is given by $x = 0$. As $\pi$ satisfies the equation $\pi^\top = \pi^\top \mathbf{P}$, we obtain

$$\pi^\top (\mathbf{I} - \mathbf{P}) = 0^\top\,. \tag{75}$$

Thus (74) implies

$$0 = \pi^\top (\mathbf{I} - \mathbf{P} + \mathbf{E})\, x = 0 + \pi^\top \mathbf{E}\, x\,, \qquad\text{i.e.}\qquad \pi^\top \mathbf{E}\, x = 0\,. \tag{76}$$

On the other hand, clearly $\pi^\top \mathbf{E} = e^\top$, and hence, as a consequence of (76),

$$e^\top x = 0 \qquad\text{and}\qquad \mathbf{E}\, x = 0\,. \tag{77}$$

Taking into account (74), this implies $(\mathbf{I} - \mathbf{P})\, x = 0$ and, equivalently, $\mathbf{P} x = x$. Thus, we also have $x = \mathbf{P}^n x$ for all $n \ge 1$. Furthermore, Theorem 2.4 implies $\mathbf{P}^n \to \mathbf{\Pi}$, where $\mathbf{\Pi}$ denotes the $\ell \times \ell$ matrix consisting of the $\ell$ identical (row) vectors $\pi^\top$. In other words, for $n \to \infty$ we have

$$x = \mathbf{P}^n x \to \mathbf{\Pi}\, x\,,$$

i.e. $x_i = \sum_{j=1}^{\ell} \pi_j x_j$ for all $i = 1, \ldots, \ell$. As the right-hand sides of these equations do not depend on $i$, we can conclude that $x = c\, e$ for some constant $c \in \mathbb{R}$. Moreover, as a consequence of (77),

$$0 = e^\top x = c\, e^\top e = c\,\ell$$

and hence $c = 0$, i.e. $x = 0$. Thus, the matrix $\mathbf{I} - \mathbf{P} + \mathbf{E}$ is invertible. Finally, (75) implies

$$\pi^\top (\mathbf{I} - \mathbf{P} + \mathbf{E}) = \pi^\top \mathbf{E} = e^\top$$

and, equivalently,

$$\pi^\top = e^\top (\mathbf{I} - \mathbf{P} + \mathbf{E})^{-1}\,.$$
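Formula (73) translates directly into a few lines of linear algebra. The transition matrix below is an assumed example, not taken from the notes.

```python
import numpy as np

# Assumed example of a quasi-positive transition matrix
P = np.array([[0.5, 0.3, 0.2],
              [0.2, 0.6, 0.2],
              [0.1, 0.4, 0.5]])
l = P.shape[0]
E = np.ones((l, l))                        # the l x l matrix of ones
e = np.ones(l)
pi = e @ np.linalg.inv(np.eye(l) - P + E)  # formula (73): pi^T = e^T (I - P + E)^{-1}
assert np.allclose(pi, pi @ P)             # global balance: pi^T = pi^T P
assert abs(pi.sum() - 1.0) < 1e-12         # pi is a probability distribution
```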

Remarks

Given a larger number $\ell$ of states, the numerical computation of the inverse matrix $(\mathbf{I} - \mathbf{P} + \mathbf{E})^{-1}$ in (73) can cause difficulties. In this case it is often more convenient to solve the matrix equation $\pi^\top = \pi^\top \mathbf{P}$ iteratively.

If the transition matrix $\mathbf{P}$ is quasi-positive and hence $\pi_\ell > 0$, one can start by setting $\widehat{\pi}_\ell = 1$ and solving the modified equation

$$\widehat{\pi}^\top (\mathbf{I} - \widehat{\mathbf{P}}) = b^\top\,, \tag{78}$$

where $\widehat{\mathbf{P}} = (p_{ij})_{i,j=1,\ldots,\ell-1}$, $\widehat{\pi}^\top = (\widehat{\pi}_1, \ldots, \widehat{\pi}_{\ell-1})$ and $b^\top = (p_{\ell 1}, \ldots, p_{\ell,\ell-1})$. The probability function $\pi^\top = (\pi_1, \ldots, \pi_\ell)$ desired originally is given by

$$\pi_i = \widehat{\pi}_i / c \quad\text{with}\quad c = \widehat{\pi}_1 + \ldots + \widehat{\pi}_\ell\,, \qquad i = 1, \ldots, \ell\,.$$

When solving the modified matrix equation (78) we use the facts that $\mathbf{I} - \widehat{\mathbf{P}}$ is invertible and that there is an expansion of $(\mathbf{I} - \widehat{\mathbf{P}})^{-1}$ as a power series, which is a consequence of the following two lemmata.

Lemma 2.4  Let $\mathbf{A}$ be an $\ell \times \ell$ matrix such that $\mathbf{A}^n \to 0$ for $n \to \infty$. Then the matrix $\mathbf{I} - \mathbf{A}$ is invertible and for all $n = 1, 2, \ldots$

$$\mathbf{I} + \mathbf{A} + \ldots + \mathbf{A}^{n-1} = (\mathbf{I} - \mathbf{A})^{-1}(\mathbf{I} - \mathbf{A}^n)\,. \tag{79}$$

Proof

Obviously, for all $n = 1, 2, \ldots$

$$(\mathbf{I} - \mathbf{A})(\mathbf{I} + \mathbf{A} + \ldots + \mathbf{A}^{n-1}) = \mathbf{I} + \mathbf{A} + \ldots + \mathbf{A}^{n-1} - \mathbf{A} - \ldots - \mathbf{A}^n = \mathbf{I} - \mathbf{A}^n\,. \tag{80}$$

Furthermore, the matrix $\mathbf{I} - \mathbf{A}^n$ is invertible for sufficiently large $n$, as by hypothesis $\mathbf{A}^n \to 0$. Consequently, for sufficiently large $n$ we have

$$0 \ne \det(\mathbf{I} - \mathbf{A}^n) = \det\bigl((\mathbf{I} - \mathbf{A})(\mathbf{I} + \mathbf{A} + \ldots + \mathbf{A}^{n-1})\bigr) = \det(\mathbf{I} - \mathbf{A})\, \det(\mathbf{I} + \mathbf{A} + \ldots + \mathbf{A}^{n-1})\,.$$

This implies $\det(\mathbf{I} - \mathbf{A}) \ne 0$ and hence $\mathbf{I} - \mathbf{A}$ is invertible. The assertion (79) now follows from (80).

Lemma 2.5  Let the stochastic matrix $\mathbf{P}$ be quasi-positive and let $\widehat{\mathbf{P}}$ be the $(\ell-1) \times (\ell-1)$ matrix introduced in (78). Then $\widehat{\mathbf{P}}^n \to 0$ for $n \to \infty$, the matrix $\mathbf{I} - \widehat{\mathbf{P}}$ is invertible, and

$$(\mathbf{I} - \widehat{\mathbf{P}})^{-1} = \sum_{n=0}^{\infty} \widehat{\mathbf{P}}^n\,. \tag{81}$$

Proof

Because of Lemma 2.4 it suffices to show that $\widehat{\mathbf{P}}^n \to 0$. As $\mathbf{P}$ is quasi-positive by hypothesis, there is a natural number $n_0 \ge 1$ such that

$$\eta = \max_{i \in \widehat{E}}\ \sum_{j \in \widehat{E}} p_{ij}^{(n_0)} < 1\,, \qquad\text{where } \widehat{E} = \{1, \ldots, \ell-1\}\,.$$

Furthermore,

$$(\widehat{\mathbf{P}}^n)_{ij} = \sum_{i_1,\ldots,i_{n-1} \in \widehat{E}} p_{ii_1}\, p_{i_1 i_2} \cdots p_{i_{n-1} j} \ \le\ \sum_{i_1,\ldots,i_{n-1} \in E} p_{ii_1}\, p_{i_1 i_2} \cdots p_{i_{n-1} j} = (\mathbf{P}^n)_{ij}$$

and thus $0 \le (\widehat{\mathbf{P}}^n)_{ij} \le (\mathbf{P}^n)_{ij} = p^{(n)}_{ij} < 1$ for all $n \ge n_0$ and $i, j \in \widehat{E}$. Writing $n$ as $n = k n_0 + m$ for some $k \ge 1$ and $m \in \{0, \ldots, n_0 - 1\}$, we obtain

$$(\widehat{\mathbf{P}}^n)_{ij} = \sum_{i_1,\ldots,i_k \in \widehat{E}} (\widehat{\mathbf{P}}^{n_0})_{ii_1} (\widehat{\mathbf{P}}^{n_0})_{i_1 i_2} \cdots (\widehat{\mathbf{P}}^{n_0})_{i_{k-1} i_k} (\widehat{\mathbf{P}}^{m})_{i_k j}$$
$$\le \sum_{i_1,\ldots,i_k \in \widehat{E}} p^{(n_0)}_{ii_1}\, p^{(n_0)}_{i_1 i_2} \cdots p^{(n_0)}_{i_{k-1} i_k} = \sum_{i_1,\ldots,i_{k-1} \in \widehat{E}} p^{(n_0)}_{ii_1} \cdots p^{(n_0)}_{i_{k-2} i_{k-1}} \sum_{i_k \in \widehat{E}} p^{(n_0)}_{i_{k-1} i_k}$$
$$\le\ \eta \sum_{i_1,\ldots,i_{k-1} \in \widehat{E}} p^{(n_0)}_{ii_1} \cdots p^{(n_0)}_{i_{k-2} i_{k-1}} \ \le\ \ldots\ \le\ \eta^k\,.$$

This yields $\lim_{n\to\infty} (\widehat{\mathbf{P}}^n)_{ij} \le \lim_{k\to\infty} \eta^k = 0$.

Remarks

As a consequence of Lemma 2.5, the solution $\widehat{\pi}^\top$ of the equation (78), i.e. of $\widehat{\pi}^\top (\mathbf{I} - \widehat{\mathbf{P}}) = b^\top$, is given by

$$\widehat{\pi}^\top = b^\top \sum_{n=0}^{\infty} \widehat{\mathbf{P}}^n\,, \tag{82}$$

thus allowing an iterative computation of $\widehat{\pi}^\top = (\widehat{\pi}_1, \ldots, \widehat{\pi}_{\ell-1})$. Notice that we start the iteration with $b_0^\top = b^\top$ as initial value, later setting $b_{n+1}^\top = b_n^\top \widehat{\mathbf{P}}$ for all $n \ge 0$. Thus, (82) can be rewritten as

$$\widehat{\pi}^\top = \sum_{n=0}^{\infty} b_n^\top\,, \tag{83}$$

and $\sum_{n=0}^{n_0} b_n^\top$ can be used as an approximation for $\widehat{\pi}^\top$ if $n_0 \ge 1$ is sufficiently large.
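The iterative scheme (82)-(83) can be sketched as follows; the transition matrix and the number of summed terms are assumptions of this illustration.

```python
import numpy as np

# Assumed example of a quasi-positive transition matrix
P = np.array([[0.5, 0.3, 0.2],
              [0.2, 0.6, 0.2],
              [0.1, 0.4, 0.5]])
l = P.shape[0]
Phat = P[:l - 1, :l - 1]        # (l-1) x (l-1) submatrix from (78)
b = P[l - 1, :l - 1].copy()     # b^T = (p_{l,1}, ..., p_{l,l-1})
pihat = np.zeros(l - 1)
bn = b                          # b_0^T = b^T
for _ in range(500):            # partial sum of the series (83); 500 terms assumed
    pihat += bn
    bn = bn @ Phat              # b_{n+1}^T = b_n^T Phat
pihat = np.append(pihat, 1.0)   # set pihat_l = 1
pi = pihat / pihat.sum()        # normalize: pi_i = pihat_i / c
assert np.allclose(pi, pi @ P, atol=1e-10)
```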

2.3 Reversibility; Estimates for the Rate of Convergence

2.3.1 Definition and Examples

A stationary Markov chain $X_0, X_1, \ldots : \Omega \to E$ and its corresponding pair $(\mathbf{P}, \pi)$, consisting of the transition matrix $\mathbf{P}$ and the stationary initial distribution $\pi$, is called reversible if its finite-dimensional distributions do not depend on the orientation of the time axis, i.e., if

$$P(X_0 = i_0, X_1 = i_1, \ldots, X_{n-1} = i_{n-1}, X_n = i_n) = P(X_n = i_0, X_{n-1} = i_1, \ldots, X_1 = i_{n-1}, X_0 = i_n) \tag{84}$$

for arbitrary $n \ge 0$ and $i_0, \ldots, i_n \in E$.

The reversibility of Markov chains is a particularly useful property for the construction of dynamic simulation algorithms; see Sections 3.3–3.5.

First of all we will derive a simple characterization of the reversibility of stationary (but not necessarily ergodic) Markov chains.


Theorem 2.13  Let $X_0, X_1, \ldots : \Omega \to E$ be a Markov chain with state space $E$, transition matrix $\mathbf{P} = (p_{ij})$ and stationary initial distribution $\pi = (\pi_1, \pi_2, \ldots)^\top$. The Markov chain is reversible if and only if

$$\pi_i\, p_{ij} = \pi_j\, p_{ji} \qquad\text{for arbitrary } i, j \in E\,. \tag{85}$$

Proof

By definition (84), the condition (85) is clearly necessary, as (84) implies in particular

$$P(X_0 = i, X_1 = j) = P(X_1 = i, X_0 = j) \qquad\text{for arbitrary } i, j \in E\,.$$

Therefore

$$\pi_i\, p_{ij} = P(X_0 = i, X_1 = j) = P(X_1 = i, X_0 = j) = \pi_j\, p_{ji}\,.$$

Conversely, if (85) holds, then the definition (3) of Markov chains yields

$$P(X_0 = i_0, X_1 = i_1, \ldots, X_{n-1} = i_{n-1}, X_n = i_n) \stackrel{(3)}{=} \pi_{i_0}\, p_{i_0 i_1}\, p_{i_1 i_2} \cdots p_{i_{n-1} i_n}$$
$$\stackrel{(85)}{=} \pi_{i_1}\, p_{i_1 i_0}\, p_{i_1 i_2} \cdots p_{i_{n-1} i_n} \ =\ \ldots\ \stackrel{(85)}{=}\ \pi_{i_n}\, p_{i_n i_{n-1}} \cdots p_{i_2 i_1}\, p_{i_1 i_0}$$
$$\stackrel{(3)}{=} P(X_0 = i_n, X_1 = i_{n-1}, \ldots, X_{n-1} = i_1, X_n = i_0) = P(X_n = i_0, X_{n-1} = i_1, \ldots, X_1 = i_{n-1}, X_0 = i_n)\,,$$

i.e., (84) holds.

Remarks

The proof of Theorem 2.13 does not require the stationary Markov chain $X_0, X_1, \ldots$ to be ergodic. In other words, if the transition matrix $\mathbf{P}$ is not irreducible or not aperiodic, and hence the limit distribution does not exist or is not uniquely determined, respectively, then Theorem 2.13 still holds if $\pi$ is an arbitrary stationary initial distribution.

As $\mathbf{P} = (p_{ij})$ is a stochastic matrix, (85) implies for arbitrary $i \in E$

$$\pi_i = \pi_i \sum_{j \in E} p_{ij} = \sum_{j \in E} \pi_i\, p_{ij} \stackrel{(85)}{=} \sum_{j \in E} \pi_j\, p_{ji}\,.$$

In other words: every initial distribution satisfying the so-called detailed balance condition (85) is necessarily a stationary initial distribution, i.e. it satisfies the global balance condition $\pi^\top = \pi^\top \mathbf{P}$.

Figure 1: Connected graph with vertices $v_1, \ldots, v_8$


Examples

1. Diffusion Model

We return to the diffusion model already discussed in Section 2.2.3 with the finite state space $E = \{0, 1, \ldots, \ell\}$, the irreducible (but not aperiodic) transition matrix $\mathbf{P} = (p_{ij})$, where

$$p_{ij} = \begin{cases} \dfrac{\ell - i}{\ell} & \text{if } i < \ell \text{ and } j = i+1,\\[1ex] \dfrac{i}{\ell} & \text{if } i > 0 \text{ and } j = i-1,\\[1ex] 0 & \text{else,} \end{cases} \tag{86}$$

and the (according to Theorem 2.10 uniquely determined, but not ergodic) stationary initial distribution

$$\pi^\top = (\pi_0, \ldots, \pi_\ell)\,, \qquad\text{where } \pi_i = \binom{\ell}{i}\Bigl(\frac{1}{2}\Bigr)^{\ell}\,, \quad i \in \{0, 1, \ldots, \ell\}\,. \tag{87}$$

One can easily see that $\pi_i\, p_{ij} = \pi_j\, p_{ji}$ for arbitrary $i, j \in E$, i.e., the pair $(\mathbf{P}, \pi)$ given by (86) and (87) is reversible.
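A quick numerical confirmation of this example; the value $\ell = 6$ is an arbitrary choice for the illustration.

```python
import numpy as np
from math import comb

l = 6                                   # number of particles (arbitrary choice)
P = np.zeros((l + 1, l + 1))
for i in range(l + 1):
    if i < l:
        P[i, i + 1] = (l - i) / l       # transition probabilities (86)
    if i > 0:
        P[i, i - 1] = i / l
pi = np.array([comb(l, i) for i in range(l + 1)]) / 2 ** l   # formula (87)
F = pi[:, None] * P                     # F[i, j] = pi_i p_ij
assert np.allclose(F, F.T)              # detailed balance (85)
assert np.allclose(pi, pi @ P)          # hence pi^T = pi^T P
```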
2. Birth and Death Processes

For the birth and death processes with two reflecting barriers considered in Section 2.2.3, let the transition matrix $\mathbf{P} = (p_{ij})$ be of such a form that the equation $\pi^\top = \pi^\top \mathbf{P}$ has a uniquely determined probability solution $\pi^\top$. For this situation one can show that

$$\pi_i\, p_{ij} = \pi_j\, p_{ji}\,, \qquad i, j \in E\,.$$

3. Random Walks on Graphs

We consider a connected graph $G = (V, K)$ with the set $V = \{v_1, \ldots, v_\ell\}$ of $\ell$ vertices and the set $K$ of edges, each of them connecting two vertices, such that for every pair $v_i, v_j \in V$ of vertices there is a path of edges in $K$ connecting $v_i$ and $v_j$. We say that two vertices $v_i$ and $v_j$ are neighbors if there is an edge connecting them, i.e., an edge having both of them as endpoints, and $d_i$ denotes the number of neighbors of $v_i$.

A random walk on the graph $G = (V, K)$ is a Markov chain $X_0, X_1, \ldots : \Omega \to E$ with state space $E = \{1, \ldots, \ell\}$ and transition matrix $\mathbf{P} = (p_{ij})$, where

$$p_{ij} = \begin{cases} \dfrac{1}{d_i} & \text{if the vertices } v_i \text{ and } v_j \text{ are neighbors,}\\[1ex] 0 & \text{else.} \end{cases} \tag{88}$$

Figure 1 shows such a graph $G = (V, K)$, where the set $V = \{v_1, \ldots, v_8\}$ contains 8 vertices and the set $K$ consists of 12 edges. More precisely,

$$K = \bigl\{(v_1, v_2), (v_1, v_3), (v_2, v_3), (v_2, v_8), (v_3, v_4), (v_3, v_7), (v_3, v_8), (v_4, v_5), (v_4, v_6), (v_5, v_6), (v_6, v_7), (v_7, v_8)\bigr\}\,.$$
One can show that

- the transition matrix given by (88) is irreducible,
- the (according to Theorem 2.10 uniquely determined) stationary initial distribution is given by

$$\pi = \Bigl(\frac{d_1}{d}, \ldots, \frac{d_\ell}{d}\Bigr)^\top\,, \qquad\text{where } d = \sum_{i=1}^{\ell} d_i\,, \tag{89}$$

- the pair $(\mathbf{P}, \pi)$ given by (88)-(89) is reversible, as for arbitrary $i, j \in \{1, \ldots, \ell\}$

$$\pi_i\, p_{ij} = \begin{cases} \dfrac{d_i}{d} \cdot \dfrac{1}{d_i} = \dfrac{1}{d} = \dfrac{d_j}{d} \cdot \dfrac{1}{d_j} = \pi_j\, p_{ji} & \text{if the vertices } v_i \text{ and } v_j \text{ are neighbors,}\\[1ex] 0 = \pi_j\, p_{ji} & \text{else.} \end{cases}$$

The transition matrix $\mathbf{P}$ given by (88) for the numerical example defined in Figure 1 is not only irreducible but also aperiodic, and the stationary initial distribution $\pi$ ($= \alpha = \lim_{n\to\infty} \alpha_n$) is given by

$$\pi = \Bigl(\frac{2}{24}, \frac{3}{24}, \frac{5}{24}, \frac{3}{24}, \frac{2}{24}, \frac{3}{24}, \frac{3}{24}, \frac{3}{24}\Bigr)^\top.$$
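These statements can be verified directly from the edge list of Figure 1:

```python
import numpy as np

# Edge list of the graph in Figure 1 (vertices numbered 1..8)
K = [(1, 2), (1, 3), (2, 3), (2, 8), (3, 4), (3, 7), (3, 8),
     (4, 5), (4, 6), (5, 6), (6, 7), (7, 8)]
l = 8
d = np.zeros(l)
for i, j in K:
    d[i - 1] += 1
    d[j - 1] += 1                      # vertex degrees d_i
P = np.zeros((l, l))
for i, j in K:
    P[i - 1, j - 1] = 1 / d[i - 1]     # transition probabilities (88)
    P[j - 1, i - 1] = 1 / d[j - 1]
pi = d / d.sum()                       # stationary distribution (89)
assert np.allclose(pi * 24, [2, 3, 5, 3, 2, 3, 3, 3])
assert np.allclose(pi, pi @ P)         # stationarity
F = pi[:, None] * P
assert np.allclose(F, F.T)             # detailed balance (85): reversibility
```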
4. Cyclic Random Walks

The following example of a cyclic random walk is not reversible. Let $E = \{1, 2, 3, 4\}$ and

$$\mathbf{P} = \begin{pmatrix} 0 & 0.75 & 0 & 0.25\\ 0.25 & 0 & 0.75 & 0\\ 0 & 0.25 & 0 & 0.75\\ 0.75 & 0 & 0.25 & 0 \end{pmatrix}, \tag{90}$$

i.e., the transition graph is given by Figure 2.

The transition matrix (90) is obviously irreducible, but not aperiodic, and the stationary initial distribution (which is uniquely determined by Theorem 2.10) is given by $\pi = (1/4, 1/4, 1/4, 1/4)^\top$. However,

$$\pi_1\, p_{12} = \frac{1}{4} \cdot \frac{3}{4} = \frac{3}{16} \ >\ \frac{1}{16} = \frac{1}{4} \cdot \frac{1}{4} = \pi_2\, p_{21}\,.$$

It is intuitively plausible that this cyclic random walk is not reversible, as clockwise steps are much more likely than counterclockwise movements.

Figure 2: Transition graph of the cyclic random walk


5. Doubly Stochastic Transition Matrix

Finally we consider the following example of a transition matrix $\mathbf{P} = (p_{ij})$ and a stationary initial distribution $\pi = (\pi_1, \ldots, \pi_\ell)^\top$ which are not reversible. For $a, b > 0$ such that $b < a$ and $2a + b = 1$, let

$$\mathbf{P} = \begin{pmatrix} a & a-b & 2b\\ a+b & b & a-b\\ 0 & a+b & a \end{pmatrix}. \tag{91}$$

This transition matrix $\mathbf{P}$ is doubly stochastic, i.e., the transposed matrix $\mathbf{P}^\top$ is also a stochastic matrix, and $\mathbf{P}$ is obviously quasi-positive. The (uniquely determined) stationary initial distribution $\pi = \lim_{n\to\infty} \alpha_n$ is given by $\pi = (1/3, 1/3, 1/3)^\top$. As the transition matrix $\mathbf{P}$ in (91) is not symmetric, the pair $(\mathbf{P}, \pi)$ is not reversible.
2.3.2 Recursive Construction of the Past

Recall that in Section 2.1.3 we showed that a stationary Markov chain $X_0, X_1, \ldots$ with transition matrix $\mathbf{P} = (p_{ij})$ and stationary initial distribution $\pi = (\pi_1, \ldots, \pi_\ell)^\top$ can be constructed as follows. We started with a sequence $Z_0, Z_1, \ldots$ of independent and on $[0,1]$ uniformly distributed random variables and defined

$$X_0 = k \qquad\text{if and only if}\qquad \sum_{i=1}^{k-1} \pi_i < Z_0 \le \sum_{i=1}^{k} \pi_i$$

for all $k = 1, \ldots, \ell$, i.e.

$$X_0 = \sum_{k=1}^{\ell} k\, 1\!{\rm I}\Bigl(\sum_{i=1}^{k-1} \pi_i < Z_0 \le \sum_{i=1}^{k} \pi_i\Bigr)\,. \tag{92}$$

The random variables $X_1, X_2, \ldots$ were defined by the recursion formula

$$X_n = \varphi(X_{n-1}, Z_n) \qquad\text{for } n = 1, 2, \ldots\,, \tag{93}$$

where the function $\varphi : E \times [0,1] \to E$ was given by

$$\varphi(i, z) = \sum_{k=1}^{\ell} k\, 1\!{\rm I}\Bigl(\sum_{j=1}^{k-1} p_{ij} < z \le \sum_{j=1}^{k} p_{ij}\Bigr)\,. \tag{94}$$

If the pair $(\mathbf{P}, \pi)$ is reversible, then the stationary Markov chain $X_0, X_1, \ldots$ constructed in (92)-(94) can be traced back into the past in the following way. First of all we extend the sequence $Z_0, Z_1, \ldots$ of independent and on $[0,1]$ uniformly distributed random variables to a sequence $\ldots, Z_{-1}, Z_0, Z_1, \ldots$ of independent and identically distributed random variables that is unbounded in both directions. Note that, due to the assumed independence of $\ldots, Z_{-1}, Z_0, Z_1, \ldots$, this extension does not pose any problems, as the underlying probability space can be constructed via an appropriate product space, product $\sigma$-algebra, and product measure. The random variables $X_{-1}, X_{-2}, \ldots$ are now constructed recursively, setting

$$X_{-(n+1)} = \varphi\bigl(X_{-n},\, Z_{-(n+1)}\bigr) \qquad\text{for } n = 0, 1, \ldots\,, \tag{95}$$

where the function $\varphi : E \times [0,1] \to E$ is defined in (94).

Theorem 2.14  Let $X_0, X_1, \ldots : \Omega \to E$ be a reversible Markov chain with state space $E$, transition matrix $\mathbf{P} = (p_{ij})$ and stationary initial distribution $\pi = (\pi_1, \ldots, \pi_\ell)^\top$. Then the sequence $\ldots, X_{-1}, X_0, X_1, \ldots : \Omega \to E$ defined by (92)-(95) is a stationary Markov chain with transition matrix $\mathbf{P}$ and one-dimensional marginal distribution $\pi$, i.e., for arbitrary $k \in \mathbb{Z} = \{\ldots, -1, 0, 1, \ldots\}$, $i_k, i_{k+1}, \ldots, i_n \in E$ and $m \ge 1$,

$$P(X_k = i_k, X_{k+1} = i_{k+1}, \ldots, X_{n-1} = i_{n-1}, X_n = i_n) = P(X_{k+m} = i_k, X_{k+m+1} = i_{k+1}, \ldots, X_{n+m-1} = i_{n-1}, X_{n+m} = i_n) = \pi_{i_k}\, p_{i_k i_{k+1}} \cdots p_{i_{n-1} i_n}\,.$$
The proof of Theorem 2.14 is quite similar to the ones given for Theorems 2.11 and 2.13 and is therefore omitted.
2.3.3 Determining the Rate of Convergence under Reversibility

Let $E = \{1, \ldots, \ell\}$ and let $\mathbf{P}$ be a quasi-positive (i.e. an irreducible and aperiodic) transition matrix. In case the eigenvalues $\theta_1, \ldots, \theta_\ell$ of $\mathbf{P}$ are pairwise distinct, we showed by the Perron–Frobenius theorem (see Corollary 2.4) that

$$\max_{j \in E} |\alpha_{nj} - \pi_j| = O\bigl(|\theta_2|^n\bigr)\,, \tag{96}$$

where $\pi = (\pi_1, \ldots, \pi_\ell)^\top$ is the (uniquely determined) solution of the equation $\pi^\top = \pi^\top \mathbf{P}$. If $(\mathbf{P}, \pi)$ is also reversible, one can show that the base $|\theta_2|$ considered in (96) cannot be improved.

Let $(\mathbf{P}, \pi)$ be reversible, where $\mathbf{P}$ is an irreducible and aperiodic transition matrix.

In this case the detailed balance condition (85) implies the symmetry of the matrix $\mathbf{D}\mathbf{P}\mathbf{D}^{-1}$, where $\mathbf{D} = {\rm diag}(\sqrt{\pi_i})$. As the eigenvalues $\theta_1, \ldots, \theta_\ell$ of $\mathbf{P}$ coincide with the eigenvalues of $\mathbf{D}\mathbf{P}\mathbf{D}^{-1}$, we obtain $\theta_i \in \mathbb{R}$ for all $i \in E$. Moreover, the right eigenvectors $\phi_1, \ldots, \phi_\ell$ of $\mathbf{D}\mathbf{P}\mathbf{D}^{-1}$ can be chosen such that all of their components are real, such that $\phi_1, \ldots, \phi_\ell$ are also left eigenvectors of $\mathbf{D}\mathbf{P}\mathbf{D}^{-1}$, and such that the rows as well as the columns of the $\ell \times \ell$ matrix $(\phi_1, \ldots, \phi_\ell)$ are orthonormal vectors.

The spectral representation (30) of $\mathbf{A} = \mathbf{D}\mathbf{P}\mathbf{D}^{-1}$ yields, for every $n \ge 1$,

$$\mathbf{P}^n = \bigl(\mathbf{D}^{-1}\mathbf{A}\mathbf{D}\bigr)^n = \mathbf{D}^{-1}\mathbf{A}^n\mathbf{D} = \sum_{k=1}^{\ell} \theta_k^n\, \mathbf{D}^{-1} \phi_k\, (\phi_k)^\top \mathbf{D}\,.$$

By plugging in $\theta_1 = 1$ and $\phi_1 = (\sqrt{\pi_1}, \ldots, \sqrt{\pi_\ell})^\top$, we obtain for arbitrary $i, j \in E$

$$p_{ij}^{(n)} = \pi_j + \sqrt{\frac{\pi_j}{\pi_i}}\, \sum_{k=2}^{\ell} \theta_k^n\, \phi_{ki}\, \phi_{kj}\,, \qquad\text{where } \phi_k = (\phi_{k1}, \ldots, \phi_{k\ell})^\top. \tag{97}$$

If $n$ is even or all eigenvalues $\theta_2, \ldots, \theta_\ell$ are nonnegative, then

$$\sup_{\alpha_0}\, \max_{j \in E} |\alpha_{nj} - \pi_j| \ \ge\ \max_{j \in E} \bigl|p_{jj}^{(n)} - \pi_j\bigr| = \max_{j \in E}\, \sum_{k=2}^{\ell} \theta_k^n\, (\phi_{kj})^2 \ \ge\ \theta_2^n\, \max_{j \in E}\, (\phi_{2j})^2\,.$$

This shows that $|\theta_2|$ is the smallest positive number such that the estimate for the rate of convergence considered in (96) holds uniformly for all initial distributions $\alpha_0$.

Remarks

Notice that (97) yields the following more precise specification of the convergence estimate (96). We have

$$\bigl|p_{ij}^{(n)} - \pi_j\bigr| \ \le\ \sqrt{\frac{\pi_j}{\pi_i}}\, \sum_{k=2}^{\ell} |\theta_k|^n\, |\phi_{ki}||\phi_{kj}| \ \le\ \frac{|\theta_2|^n}{\sqrt{\min_{i \in E} \pi_i}}\, \sum_{k=2}^{\ell} |\phi_{ki}||\phi_{kj}| \ \le\ \frac{|\theta_2|^n}{\sqrt{\min_{i \in E} \pi_i}}\,,$$

as the column vectors $\phi_1, \ldots, \phi_\ell$ and hence also the row vectors $(\phi_{1j}, \ldots, \phi_{\ell j})$, $j = 1, \ldots, \ell$, form an orthonormal basis in $\mathbb{R}^\ell$, and thus by the Cauchy–Schwarz inequality

$$\sum_{k=2}^{\ell} |\phi_{ki}||\phi_{kj}| \ \le\ \Bigl(\sum_{k=1}^{\ell} (\phi_{ki})^2\Bigr)^{1/2} \Bigl(\sum_{k=1}^{\ell} (\phi_{kj})^2\Bigr)^{1/2} = 1\,.$$

Consequently,

$$\max_{j \in E} |\alpha_{nj} - \pi_j| \ \le\ \frac{1}{\sqrt{\min_{i \in E} \pi_i}}\; |\theta_2|^n\,. \tag{98}$$

However, the practical benefit of the estimate (98) can be limited for several reasons:

- The factor in front of $|\theta_2|^n$ in (98) does not depend on the choice of the initial distribution $\alpha_0$.
- The derivation of the estimate (98) requires the Markov chain to be reversible.
- It can be difficult to determine the eigenvalue $\theta_2$ if the number of states is large.

Therefore, in Section 2.3.5 we consider an alternative convergence estimate, which depends on the initial distribution and does not require the reversibility of the Markov chain. Furthermore, in Section 2.3.7 we will derive an upper bound for the second largest absolute value $|\theta_2|$ among the eigenvalues of a reversible transition matrix.
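The estimate (98) can be illustrated on a small reversible chain; the matrix below is an assumed example (its eigenvalues are $1$, $1/2$ and $0$), not taken from the notes.

```python
import numpy as np

# Assumed example of a reversible, quasi-positive transition matrix
P = np.array([[0.5, 0.5, 0.0],
              [0.25, 0.5, 0.25],
              [0.0, 0.5, 0.5]])
pi = np.array([0.25, 0.5, 0.25])            # stationary distribution (detailed balance)
theta = np.sort(np.abs(np.linalg.eigvals(P)))
theta2 = theta[-2]                          # second largest absolute eigenvalue
alpha = np.array([1.0, 0.0, 0.0])           # start in state 1
for n in range(1, 20):
    alpha = alpha @ P
    bound = theta2 ** n / np.sqrt(pi.min()) # right-hand side of (98)
    assert np.max(np.abs(alpha - pi)) <= bound + 1e-12
```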

2.3.4 Multiplicative Reversible Version of the Transition Matrix; Spectral Representation

At first we will discuss a method enabling us to transform (ergodic) transition matrices such that the resulting matrix is reversible. Let $\mathbf{P} = (p_{ij})$ be an irreducible and aperiodic (but not necessarily reversible) transition matrix and let $\pi = (\pi_1, \ldots, \pi_\ell)^\top$ be the corresponding stationary initial distribution, such that $\pi_i > 0$ for all $i \in E$. Moreover, we consider the stochastic matrix $\widetilde{\mathbf{P}} = (\widetilde{p}_{ij})$, where

$$\widetilde{p}_{ij} = \frac{\pi_j\, p_{ji}}{\pi_i}\,, \tag{99}$$

i.e., $\widetilde{\mathbf{P}} = \mathbf{D}^{-2}\mathbf{P}^\top\mathbf{D}^{2}$ with $\mathbf{D} = {\rm diag}(\sqrt{\pi_i})$, which is also an irreducible and aperiodic transition matrix having the same stationary initial distribution $\pi = (\pi_1, \ldots, \pi_\ell)^\top$.

The pair $(\mathbf{M}, \pi)$, where the stochastic matrix $\mathbf{M} = (m_{ij})$ is given by $\mathbf{M} = \mathbf{P}\widetilde{\mathbf{P}}$, is reversible, as we observe

$$\pi_i\, m_{ij} = \pi_i \sum_{k=1}^{\ell} p_{ik}\, \frac{\pi_j\, p_{jk}}{\pi_k} = \pi_j \sum_{k=1}^{\ell} \frac{\pi_i\, p_{ik}}{\pi_k}\, p_{jk} = \pi_j\, m_{ji}\,.$$

Definition  The matrix $\mathbf{M} = \mathbf{P}\widetilde{\mathbf{P}}$ is called the multiplicative reversible version of the transition matrix $\mathbf{P}$.

Remarks

All eigenvalues $\theta_{M,1}, \ldots, \theta_{M,\ell}$ of $\mathbf{M}$ are real and in $[0,1]$, because $\mathbf{M}$ has the same eigenvalues as the symmetric and nonnegative definite matrix $\mathbf{M}' = \mathbf{D}\mathbf{M}\mathbf{D}^{-1}$, where

$$m'_{ij} = \sqrt{\frac{\pi_i}{\pi_j}}\; m_{ij} = \sqrt{\frac{\pi_i}{\pi_j}}\, \sum_{k=1}^{\ell} p_{ik}\, \frac{\pi_j\, p_{jk}}{\pi_k} = \sum_{k=1}^{\ell} \frac{\sqrt{\pi_i \pi_j}}{\pi_k}\; p_{ik}\, p_{jk}$$

and hence

$$\mathbf{M}' = \mathbf{D}\mathbf{M}\mathbf{D}^{-1} = \bigl(\mathbf{D}\mathbf{P}\mathbf{D}^{-1}\bigr)\bigl(\mathbf{D}\mathbf{P}\mathbf{D}^{-1}\bigr)^\top.$$

As a consequence, the symmetric matrix $\mathbf{M}'$ is diagonalizable, and its right and left eigenvectors $\phi_i$ and $\psi_i$ can be chosen such that $\phi_i = \psi_i$ for all $i \in E$ and such that the vectors $\phi_1, \ldots, \phi_\ell$ are an orthonormal basis in $\mathbb{R}^\ell$.
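The construction (99) and the reversibility of $(\mathbf{M}, \pi)$ can be checked on the doubly stochastic example (91); the values $a = 0.4$, $b = 0.2$ are assumptions satisfying $b < a$ and $2a + b = 1$.

```python
import numpy as np

a, b = 0.4, 0.2                            # assumed values with b < a, 2a + b = 1
P = np.array([[a, a - b, 2 * b],
              [a + b, b, a - b],
              [0.0, a + b, a]])            # the matrix (91), not reversible
pi = np.full(3, 1 / 3)                     # stationary distribution of (91)
Pt = (pi[None, :] * P.T) / pi[:, None]     # (99): ptilde_ij = pi_j p_ji / pi_i
M = P @ Pt                                 # multiplicative reversible version
F = pi[:, None] * M
assert np.allclose(F, F.T)                 # (M, pi) satisfies detailed balance (85)
assert np.allclose(pi, pi @ M)             # pi is stationary for M as well
D = np.diag(np.sqrt(pi))
theta_M = np.linalg.eigvalsh(D @ M @ np.linalg.inv(D))
assert np.all(theta_M >= -1e-12) and np.all(theta_M <= 1 + 1e-12)  # in [0, 1]
```

Since $\pi$ is uniform here, $\widetilde{\mathbf{P}} = \mathbf{P}^\top$ and $\mathbf{M} = \mathbf{P}\mathbf{P}^\top$ is symmetric.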

Then $\mu_1, \ldots, \mu_\ell$ and $\nu_1, \ldots, \nu_\ell$, where

$$\mu_i = \mathbf{D}^{-1}\phi_i \qquad\text{and}\qquad \nu_i = \mathbf{D}\phi_i\,, \qquad i \in E\,, \tag{100}$$

are right and left eigenvectors of $\mathbf{M}$, respectively, as for every $i \in E$

$$\mathbf{M}\mu_i = \mathbf{M}\mathbf{D}^{-1}\phi_i = \mathbf{D}^{-1}\bigl(\mathbf{D}\mathbf{M}\mathbf{D}^{-1}\bigr)\phi_i = \mathbf{D}^{-1}\theta_{M,i}\,\phi_i = \theta_{M,i}\,\mu_i$$

and

$$\nu_i^\top \mathbf{M} = \phi_i^\top \bigl(\mathbf{D}\mathbf{M}\mathbf{D}^{-1}\bigr)\mathbf{D} = \theta_{M,i}\,\phi_i^\top \mathbf{D} = \theta_{M,i}\,\nu_i^\top\,.$$

This yields the following spectral representation of the multiplicative reversible version $\mathbf{M}$ obtained from the transition matrix $\mathbf{P}$; see also the spectral representation given by formula (30).

Theorem 2.15  For arbitrary $n \in \mathbb{N}$ and $x \in \mathbb{R}^\ell$,

$$\mathbf{M}^n x = \sum_{i=1}^{\ell} \theta_{M,i}^{\,n}\; \mu_i\, \nu_i^\top x\,, \tag{101}$$

where $\mu_i$ and $\nu_i$ are the right and left eigenvectors of $\mathbf{M}$ defined in (100).
Proof

As the (right) eigenvectors $\mu_1, \ldots, \mu_\ell$ of $\mathbf{M}$ defined in (100) are also a basis of $\mathbb{R}^\ell$, for every $x \in \mathbb{R}^\ell$ there is a (uniquely determined) vector $\bigl(x_1^{(r)}, \ldots, x_\ell^{(r)}\bigr)^\top \in \mathbb{R}^\ell$ such that

$$x = \sum_{i=1}^{\ell} x_i^{(r)} \mu_i\,.$$

Furthermore, we have $\mathbf{M}\mu_i = \theta_{M,i}\,\mu_i$ and hence $\mathbf{M}^n \mu_i = \theta_{M,i}^{\,n}\,\mu_i$ for arbitrary $i \in E$ and $n \in \mathbb{N}$. Thus we obtain

$$\mathbf{M}^n x = \sum_{i=1}^{\ell} x_i^{(r)}\, \mathbf{M}^n \mu_i = \sum_{i=1}^{\ell} x_i^{(r)}\, \theta_{M,i}^{\,n}\, \mu_i\,.$$

On the other hand, (100) implies for arbitrary $i \in E$ and $x \in \mathbb{R}^\ell$

$$\nu_i^\top x = \sum_{j=1}^{\ell} x_j^{(r)}\, \nu_i^\top \mu_j = \sum_{j=1}^{\ell} x_j^{(r)}\, \phi_i^\top \mathbf{D}\mathbf{D}^{-1}\phi_j = \sum_{j=1}^{\ell} x_j^{(r)}\, \phi_i^\top \phi_j = x_i^{(r)}\,, \tag{102}$$

where the last equality takes into account that $\phi_i = \psi_i$ for all $i \in E$ and that the eigenvectors $\phi_1, \ldots, \phi_\ell$ of $\mathbf{M}'$ are an orthonormal basis of $\mathbb{R}^\ell$. This proves the spectral representation (101).

2.3.5 Alternative Estimate for the Rate of Convergence; $\chi^2$-Contrast

Based on the multiplicative reversible version $\mathbf{M} = \mathbf{P}\widetilde{\mathbf{P}}$ of the ergodic (but not necessarily reversible) transition matrix $\mathbf{P}$, we will now deduce an alternative estimate for the rate of convergence $\alpha^\top \mathbf{P}^n \to \pi^\top$ for $n \to \infty$; see Theorem 2.16. The following abbreviations and lemmata will turn out to be useful in the proof of Theorem 2.16.

Let $L(E)$ denote the family of all functions defined on $E = \{1, \ldots, \ell\}$ and mapping into the real line $\mathbb{R}$, and let $\pi = (\pi_1, \ldots, \pi_\ell)^\top$ be an arbitrary positive probability function from $L(E)$, i.e. $\pi_i > 0$ for all $i \in E$ and $\sum_{i=1}^{\ell} \pi_i = 1$. For arbitrary vectors $x = (x_1, \ldots, x_\ell)^\top \in L(E)$ and $y = (y_1, \ldots, y_\ell)^\top \in L(E)$ we denote by $(x, y)_\pi$ the inner product

$$(x, y)_\pi = \sum_{i=1}^{\ell} x_i\, y_i\, \pi_i \tag{103}$$

and by $\|x\|_\pi$ the induced norm, i.e.,

$$\|x\|_\pi = \sqrt{\sum_{i=1}^{\ell} x_i^2\, \pi_i}\,.$$

The terms (weighted) mean $e_\pi(x)$ and variance ${\rm Var}_\pi(x)$ of $x \in L(E)$ will be used to denote the quantities

$$e_\pi(x) = \sum_{i=1}^{\ell} x_i\, \pi_i = (x, e)_\pi \tag{104}$$

and

$${\rm Var}_\pi(x) = \|x\|_\pi^2 - \bigl(e_\pi(x)\bigr)^2\,, \tag{105}$$

respectively.
Lemma 2.6  For all $x \in L(E)$, it holds that

$${\rm Var}_\pi(x) = {\rm Var}_\pi\bigl(\widetilde{\mathbf{P}}x\bigr) + \bigl((\mathbf{I} - \mathbf{M})x,\, x\bigr)_\pi\,. \tag{106}$$
Proof

Introducing the notation $\widehat{x} = x - e_\pi(x)\, e$, we obtain that $e_\pi(\widehat{x}) = 0$ and

$$e_\pi\bigl(\widetilde{\mathbf{P}}\widehat{x}\bigr) = \sum_{i=1}^{\ell} \Bigl(\sum_{j=1}^{\ell} \widetilde{p}_{ij}\bigl(x_j - e_\pi(x)\bigr)\Bigr)\pi_i = \sum_{i,j=1}^{\ell} \widetilde{p}_{ij}\, x_j\, \pi_i - e_\pi(x) = \sum_{i,j=1}^{\ell} \pi_j\, p_{ji}\, x_j - e_\pi(x) = 0\,,$$

where the last but one equality follows from the definition (99) of the matrix $\widetilde{\mathbf{P}}$. This implies

$$\|\widehat{x}\|_\pi^2 = {\rm Var}_\pi(\widehat{x}) = {\rm Var}_\pi(x) \qquad\text{and}\qquad \|\widetilde{\mathbf{P}}\widehat{x}\|_\pi^2 = {\rm Var}_\pi\bigl(\widetilde{\mathbf{P}}\widehat{x}\bigr) = {\rm Var}_\pi\bigl(\widetilde{\mathbf{P}}x\bigr)\,.$$

On the other hand,

$$\|\widetilde{\mathbf{P}}\widehat{x}\|_\pi^2 = \bigl(\widetilde{\mathbf{P}}\widehat{x},\, \widetilde{\mathbf{P}}\widehat{x}\bigr)_\pi = \sum_{i=1}^{\ell} \sum_{j,k=1}^{\ell} \widetilde{p}_{ij}\, \widehat{x}_j\, \underbrace{\widetilde{p}_{ik}\, \pi_i}_{=\,\pi_k p_{ki}}\, \widehat{x}_k = \sum_{k,j=1}^{\ell} \bigl(\mathbf{P}\widetilde{\mathbf{P}}\bigr)_{kj}\, \widehat{x}_j\, \widehat{x}_k\, \pi_k = \bigl(\mathbf{M}\widehat{x},\, \widehat{x}\bigr)_\pi \tag{107}$$

and thus

$$\|\widehat{x}\|_\pi^2 - \|\widetilde{\mathbf{P}}\widehat{x}\|_\pi^2 = \bigl(\widehat{x},\, \widehat{x}\bigr)_\pi - \bigl(\mathbf{M}\widehat{x},\, \widehat{x}\bigr)_\pi = \bigl((\mathbf{I} - \mathbf{M})\widehat{x},\, \widehat{x}\bigr)_\pi = \bigl((\mathbf{I} - \mathbf{M})x,\, x\bigr)_\pi\,,$$

as $\mathbf{M}$ is a stochastic matrix such that $\pi^\top \mathbf{M} = \pi^\top$, and therefore $(\mathbf{I} - \mathbf{M})\, e = 0$ and

$$\bigl((\mathbf{I} - \mathbf{M})x,\, e\bigr)_\pi = \sum_{i=1}^{\ell} x_i\, \pi_i - \sum_{j=1}^{\ell} x_j \underbrace{\sum_{i=1}^{\ell} \pi_i\, m_{ij}}_{=\,\pi_j} = 0\,.$$

Taking into account (107), this shows the validity of (106).

We introduce the following notions. Let $E = \{1, \ldots, \ell\}$, let $\alpha = (\alpha_1, \ldots, \alpha_\ell)^\top$ and $\beta = (\beta_1, \ldots, \beta_\ell)^\top$ be arbitrary probability distributions on $E$, and let

$$d_{\rm TV}(\alpha, \beta) = \frac{1}{2} \sum_{i \in E} |\alpha_i - \beta_i|\,, \tag{108}$$

i.e., the distance $d_{\rm TV}(\alpha, \beta)$ between $\alpha$ and $\beta$ is expressed via the total variation

$$|\alpha - \beta| = \sum_{i \in E} |\alpha_i - \beta_i| \tag{109}$$

of the signed measure $\alpha - \beta$.

If $\beta_i > 0$ for all $i \in E$, we also consider the term

$$\chi^2(\alpha; \beta) = \sum_{i \in E} \frac{(\alpha_i - \beta_i)^2}{\beta_i}\,, \tag{110}$$

which is called the $\chi^2$-contrast of $\alpha$ with respect to $\beta$.

The distance $d_{\rm TV}(\alpha, \beta)$ between $\alpha$ and $\beta$ can be estimated via the $\chi^2$-contrast $\chi^2(\alpha; \beta)$ of $\alpha$ with respect to $\beta$ as follows.
Lemma 2.7  If $\beta_i > 0$ for all $i \in E$, then

$$d_{\rm TV}^2(\alpha, \beta) \ \le\ \frac{1}{4}\, \chi^2(\alpha; \beta)\,. \tag{111}$$

Proof

Taking into account that $\sum_{i \in E} \beta_i = 1$, an application of the Cauchy–Schwarz inequality yields

$$\Bigl(\sum_{i \in E} |\alpha_i - \beta_i|\Bigr)^2 = \Bigl(\sum_{i \in E} \frac{|\alpha_i - \beta_i|}{\sqrt{\beta_i}}\; \sqrt{\beta_i}\Bigr)^2 \ \le\ \sum_{i \in E} \beta_i\ \cdot\ \sum_{i \in E} \frac{(\alpha_i - \beta_i)^2}{\beta_i}\,.$$

This implies the assertion of the lemma.

The rate of convergence $\alpha^\top \mathbf{P}^n \to \pi^\top$ for $n \to \infty$ can now be estimated based on

- the second largest eigenvalue $\theta_{M,2}$ of the multiplicative reversible version $\mathbf{M} = \mathbf{P}\widetilde{\mathbf{P}}$ of the (ergodic) transition matrix $\mathbf{P}$,
- and the $\chi^2$-contrast $\chi^2(\alpha; \pi)$ of the initial distribution $\alpha$ with respect to the stationary limit distribution $\pi$.

Theorem 2.16  For any initial distribution $\alpha$ and for all $n \in \mathbb{N}$,

$$d_{\rm TV}^2\bigl(\alpha^\top \mathbf{P}^n,\, \pi^\top\bigr) \ \le\ \frac{\chi^2(\alpha; \pi)}{4}\; \theta_{M,2}^{\,n}\,. \tag{112}$$

Proof

Let $\eta_n = (\eta_{n1}, \ldots, \eta_{n\ell})^\top$, where $\eta_{ni} = (\alpha^\top \mathbf{P}^n)_i / \pi_i$. Then, for all $i \in E$,

$$\bigl(\widetilde{\mathbf{P}}\eta_n\bigr)_i = \sum_{k=1}^{\ell} \frac{\pi_k\, p_{ki}}{\pi_i}\; \frac{(\alpha^\top \mathbf{P}^n)_k}{\pi_k} = \frac{(\alpha^\top \mathbf{P}^{n+1})_i}{\pi_i}$$

and thus

$$\widetilde{\mathbf{P}}\eta_n = \eta_{n+1}\,.$$

Moreover, by definition (110) of the $\chi^2$-contrast $\chi_n^2 = \chi^2\bigl(\alpha^\top \mathbf{P}^n;\, \pi^\top\bigr)$ of $\alpha^\top \mathbf{P}^n$ with respect to $\pi^\top$, and using that $e_\pi(\eta_n) = \sum_{i} (\alpha^\top \mathbf{P}^n)_i = 1$, we obtain

$$\chi_n^2 = \sum_{i=1}^{\ell} \frac{\bigl((\alpha^\top \mathbf{P}^n)_i - \pi_i\bigr)^2}{\pi_i} = \sum_{i=1}^{\ell} \Bigl(\frac{(\alpha^\top \mathbf{P}^n)_i}{\pi_i} - 1\Bigr)^2 \pi_i = \sum_{i=1}^{\ell} \bigl(\eta_{ni} - e_\pi(\eta_n)\bigr)^2 \pi_i\,,$$

i.e.,

$$\chi_n^2 = {\rm Var}_\pi(\eta_n)\,. \tag{113}$$

Now the identity (106) derived in Lemma 2.6 yields

$$\chi_n^2 = \chi_{n+1}^2 + \bigl((\mathbf{I} - \mathbf{M})\eta_n,\, \eta_n\bigr)_\pi\,. \tag{114}$$

On the other hand, the spectral representation (101) of $\mathbf{M}$ derived in Theorem 2.15 implies

$$\bigl((\mathbf{I} - \mathbf{M})\eta_n,\, \eta_n\bigr)_\pi = (\eta_n, \eta_n)_\pi - \sum_{i=1}^{\ell} \theta_{M,i}\, \bigl(\mu_i \nu_i^\top \eta_n,\, \eta_n\bigr)_\pi = (\eta_n, \eta_n)_\pi - 1 - \sum_{i=2}^{\ell} \theta_{M,i}\, \bigl(\mu_i \nu_i^\top \eta_n,\, \eta_n\bigr)_\pi\,,$$

as $\theta_{M,1} = 1$, $\mu_1 = e$ and $\nu_1^\top = \pi^\top$, and therefore

$$\bigl(\mu_1 \nu_1^\top \eta_n,\, \eta_n\bigr)_\pi = \bigl(\pi^\top \eta_n\bigr)\, (e, \eta_n)_\pi = e_\pi(\eta_n) = 1\,.$$

As the eigenvectors $\mu_1, \ldots, \mu_\ell$ of $\mathbf{M}$ defined in (100) are a basis of $\mathbb{R}^\ell$, there is a (uniquely determined) vector $\bigl(\eta_{n1}^{(r)}, \ldots, \eta_{n\ell}^{(r)}\bigr)^\top \in \mathbb{R}^\ell$ such that $\eta_n = \sum_{i=1}^{\ell} \eta_{ni}^{(r)} \mu_i$. Moreover, in (102) we have shown that $\nu_i^\top \eta_n = \eta_{ni}^{(r)}$; as $\nu_1^\top = \pi^\top$, we can conclude that $\eta_{n1}^{(r)} = \pi^\top \eta_n = 1$. Furthermore, as $\mu_i = \mathbf{D}^{-1}\phi_i$ and $\nu_i = \mathbf{D}\phi_i$ for all $i \in E$, we have $(\mu_i, \mu_j)_\pi = \phi_i^\top \phi_j = \delta_{ij}$, and hence

$$\sum_{i=2}^{\ell} \theta_{M,i}\, \bigl(\mu_i \nu_i^\top \eta_n,\, \eta_n\bigr)_\pi = \sum_{i=2}^{\ell} \sum_{j=1}^{\ell} \theta_{M,i}\, \eta_{ni}^{(r)}\, \eta_{nj}^{(r)}\, (\mu_i, \mu_j)_\pi = \sum_{i=2}^{\ell} \theta_{M,i}\, \bigl(\eta_{ni}^{(r)}\bigr)^2$$
$$\le\ \theta_{M,2} \sum_{i=2}^{\ell} \bigl(\eta_{ni}^{(r)}\bigr)^2 = \theta_{M,2}\Bigl(\sum_{i=1}^{\ell} \bigl(\eta_{ni}^{(r)}\bigr)^2 - 1\Bigr) = \theta_{M,2}\, \bigl((\eta_n, \eta_n)_\pi - 1\bigr) = \theta_{M,2}\, {\rm Var}_\pi(\eta_n)\,.$$

Summarizing our results, we have seen that

$$\bigl((\mathbf{I} - \mathbf{M})\eta_n,\, \eta_n\bigr)_\pi \ \ge\ \bigl(1 - \theta_{M,2}\bigr)\, {\rm Var}_\pi(\eta_n)\,.$$

Because of (113) and (114) this implies

$$\chi_n^2 \ \ge\ \chi_{n+1}^2 + \bigl(1 - \theta_{M,2}\bigr)\,\chi_n^2 \qquad\text{and hence}\qquad \chi_{n+1}^2 \ \le\ \theta_{M,2}\, \chi_n^2\,.$$

Thus we have shown that $\chi_n^2 \le \theta_{M,2}^{\,n}\, \chi_0^2$ for all $n \ge 1$, and, consequently, the assertion follows from Lemma 2.7.
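Theorem 2.16 can be checked numerically for a non-reversible chain; the transition matrix below is an assumed example, and the starting state is an arbitrary choice.

```python
import numpy as np

# Assumed example of an ergodic (not reversible) transition matrix
P = np.array([[0.5, 0.3, 0.2],
              [0.2, 0.6, 0.2],
              [0.1, 0.4, 0.5]])
w, v = np.linalg.eig(P.T)
pi = np.real(v[:, np.argmax(np.real(w))])
pi /= pi.sum()                              # stationary distribution
Pt = (pi[None, :] * P.T) / pi[:, None]      # (99)
M = P @ Pt                                  # multiplicative reversible version
D = np.diag(np.sqrt(pi))
theta_M = np.sort(np.linalg.eigvalsh(D @ M @ np.linalg.inv(D)))
theta_M2 = theta_M[-2]                      # second largest eigenvalue of M
alpha = np.array([1.0, 0.0, 0.0])           # start in state 1
chi2_0 = np.sum((alpha - pi) ** 2 / pi)     # chi^2 contrast (110)
a = alpha
for n in range(1, 30):
    a = a @ P
    d_tv = 0.5 * np.abs(a - pi).sum()       # total variation distance (108)
    assert d_tv ** 2 <= chi2_0 / 4 * theta_M2 ** n + 1e-12   # bound (112)
```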

2.3.6 Dirichlet Forms and the Rayleigh Theorem

Let $E = \{1, \ldots, \ell\}$ be an arbitrary finite set and let $\mathbf{P}$ be an $(\ell \times \ell)$-dimensional transition matrix which is irreducible and aperiodic (i.e. quasi-positive) as well as reversible. Recall that

- all eigenvalues of $\mathbf{P}$ are real (see Section 2.3.3), and
- by the Perron–Frobenius theorem (see Theorem 2.6 and Corollary 2.3) the eigenvalues of $\mathbf{P}$ are in the interval $(-1, 1]$, where the largest eigenvalue is 1 and the absolute values of the other eigenvalues are (strictly) less than 1.

Remarks

Instead of ordering the eigenvalues according to their absolute values (like above), we will now order them with respect to their own size and denote them by $\lambda_1, \ldots, \lambda_\ell$, such that

$$1 = \lambda_1 > \lambda_2 \ge \ldots \ge \lambda_\ell > -1\,.$$

Moreover, for the multiplicative reversible version $\mathbf{M} = \mathbf{P}\widetilde{\mathbf{P}}$ of the transition matrix $\mathbf{P}$ introduced in Section 2.3.4 we have

$$1 = \lambda_1 > \lambda_2 \ge \ldots \ge \lambda_\ell \ge 0\,,$$

i.e., for the eigenvalues of the matrix $\mathbf{M}$ the notations $\theta_1, \ldots, \theta_\ell$ and $\lambda_1, \ldots, \lambda_\ell$ coincide.

For large $\ell$, the calculation of the second largest absolute value $|\theta_2| = \max\{\lambda_2, |\lambda_\ell|\}$ among the eigenvalues can cause difficulties. Therefore, in Section 2.3.7 we will derive bounds for $\lambda_2$ and $\lambda_\ell$ whose calculation is very simple. These bounds are particularly useful if the stationary (limit) distribution $\pi$ is, at least in principle, known, but in spite of this the corresponding Markov chain is started with a non-stationary initial distribution $\alpha$; for example, it could be started in a predetermined state $i \in E$, i.e. $\alpha_i = 1$ and $\alpha_j = 0$ for $j \ne i$.

In order to derive an upper bound for $\lambda_2$, we need a representation formula for $\lambda_2$, which is usually called the Rayleigh theorem in the literature and which is expressed in terms of the so-called Dirichlet form

$$D_{(\mathbf{P},\pi)}(x, x) = \bigl((\mathbf{I} - \mathbf{P})x,\, x\bigr)_\pi \tag{115}$$

of the reversible pair $(\mathbf{P}, \pi)$, where $(y, x)_\pi$ denotes the inner product of $y$ and $x$ with respect to $\pi$; see (103).

First of all we will show the following lemma.


Lemma 2.8  For all $x = (x_1, \ldots, x_\ell)^\top \in \mathbb{R}^\ell$,

$$D_{(\mathbf{P},\pi)}(x, x) = \frac{1}{2} \sum_{i,j \in E} \pi_i\, p_{ij}\, (x_j - x_i)^2\,. \tag{116}$$

Proof

From the definition (103) of the inner product $(\cdot\,,\cdot)_\pi$ and the reversibility of the pair $(\mathbf{P}, \pi)$ we obtain

$$2\,\bigl((\mathbf{I} - \mathbf{P})x,\, x\bigr)_\pi = 2 \sum_{i,j \in E} \pi_i\, p_{ij}\, x_i\, (x_i - x_j) = \sum_{i,j \in E} \pi_i\, p_{ij}\, x_i\, (x_i - x_j) + \sum_{i,j \in E} \pi_j\, p_{ji}\, x_j\, (x_j - x_i)$$
$$\stackrel{(85)}{=} \sum_{i,j \in E} \pi_i\, p_{ij}\, x_i\, (x_i - x_j) + \sum_{i,j \in E} \pi_i\, p_{ij}\, x_j\, (x_j - x_i) = \sum_{i,j \in E} \pi_i\, p_{ij}\, (x_j - x_i)^2\,.$$

We will now prove the Rayleigh theorem, which yields a representation formula for the second largest eigenvalue $\lambda_2$ of the reversible pair $(\mathbf{P}, \pi)$.

Theorem 2.17  Let $\mathbb{R}^\ell_{\ne} = \bigl\{x = (x_1, \ldots, x_\ell)^\top \in \mathbb{R}^\ell : x_i \ne x_j \text{ for some pair } i, j \in E\bigr\}$ denote the set of all vectors in $\mathbb{R}^\ell$ whose components are not all equal. For the eigenvalue $\lambda_2$ of the reversible pair $(\mathbf{P}, \pi)$ the following holds:

$$\lambda_2 = 1 - \inf_{x \in \mathbb{R}^\ell_{\ne}}\ \frac{D_{(\mathbf{P},\pi)}(x, x)}{{\rm Var}_\pi(x)}\,, \tag{117}$$

where ${\rm Var}_\pi(x)$ denotes the variance of the components of $x$ with respect to $\pi$ defined in (105).
Proof

Lemma 2.8 implies, for arbitrary $c \in \mathbb{R}$ and $x \in \mathbb{R}^\ell$,

$$D_{(\mathbf{P},\pi)}(x, x) = D_{(\mathbf{P},\pi)}(x - c\,e,\, x - c\,e)\,,$$

and likewise ${\rm Var}_\pi\bigl(x - e_\pi(x)\,e\bigr) = {\rm Var}_\pi(x)$. Thus, the assertion (117) is equivalent to

$$1 - \lambda_2 = \inf_{x \in \mathbb{R}^\ell_0}\ \frac{D_{(\mathbf{P},\pi)}(x, x)}{{\rm Var}_\pi(x)}\,, \tag{118}$$

where $\mathbb{R}^\ell_0 = \bigl\{x = (x_1, \ldots, x_\ell)^\top \in \mathbb{R}^\ell : e_\pi(x) = 0,\ x \ne 0\bigr\}$.

Let now the right eigenvectors $\mu_1, \ldots, \mu_\ell$ of $\mathbf{P}$ be chosen such that they are an orthonormal basis of $\mathbb{R}^\ell$ with respect to the inner product $(\cdot\,,\cdot)_\pi$, i.e. $(\mu_i, \mu_j)_\pi = 1$ if $i = j$ and $(\mu_i, \mu_j)_\pi = 0$ if $i \ne j$, where $\mu_1 = e$. For this, the eigenvectors $\phi_1, \ldots, \phi_\ell$ of the symmetric matrix $\mathbf{D}\mathbf{P}\mathbf{D}^{-1}$ are first chosen such that they are orthonormal with respect to the ordinary Euclidean inner product; then we can define $\mu_i = \mathbf{D}^{-1}\phi_i$ for all $i \in E$ (see also Section 2.3.3).

For every $x \in \mathbb{R}^\ell$ there is now a uniquely determined vector $\bigl(x_1^{(r)}, \ldots, x_\ell^{(r)}\bigr)^\top \in \mathbb{R}^\ell$ such that

$$x = \sum_{i=1}^{\ell} x_i^{(r)} \mu_i\,.$$

As $\lambda_1 = 1$, we obtain

$$(\mathbf{I} - \mathbf{P})\,x = \sum_{i=2}^{\ell} (1 - \lambda_i)\, x_i^{(r)} \mu_i \qquad\text{and hence}\qquad D_{(\mathbf{P},\pi)}(x, x) = \sum_{i=2}^{\ell} (1 - \lambda_i)\, \bigl(x_i^{(r)}\bigr)^2\,.$$

On the other hand, as $\mu_1 = e$ and the eigenvectors $\mu_1, \ldots, \mu_\ell$ are orthonormal with respect to the inner product $(\cdot\,,\cdot)_\pi$, we can conclude that

$$e_\pi(x) = (x, e)_\pi = x_1^{(r)} \qquad\text{and}\qquad {\rm Var}_\pi(x) = \sum_{i=2}^{\ell} \bigl(x_i^{(r)}\bigr)^2 \quad\text{if } e_\pi(x) = 0\,.$$

Thus, for every $x \in \mathbb{R}^\ell_0$,

$$\frac{D_{(\mathbf{P},\pi)}(x, x)}{{\rm Var}_\pi(x)} = \frac{\sum_{i=2}^{\ell} (1 - \lambda_i)\bigl(x_i^{(r)}\bigr)^2}{\sum_{i=2}^{\ell} \bigl(x_i^{(r)}\bigr)^2} = (1 - \lambda_2) + \frac{\sum_{i=3}^{\ell} (\lambda_2 - \lambda_i)\bigl(x_i^{(r)}\bigr)^2}{\sum_{i=2}^{\ell} \bigl(x_i^{(r)}\bigr)^2} \ \ge\ 1 - \lambda_2\,.$$

This shows that (118) holds, as the last expression for the quotient $D_{(\mathbf{P},\pi)}(x, x)/{\rm Var}_\pi(x)$ implies

$$\frac{D_{(\mathbf{P},\pi)}(x, x)}{{\rm Var}_\pi(x)} = 1 - \lambda_2 \qquad\text{for } x = \mu_2\,,$$

where $\mu_2 \in \mathbb{R}^\ell_0$, as $\mu_1 = e$ and $\mu_2$ are linearly independent and $e_\pi(\mu_2) = (\mu_2, \mu_1)_\pi = 0$.
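Lemma 2.8 and the representation (117) can be verified numerically on a small reversible pair; the matrix and the random sampling of test vectors are assumptions of this sketch.

```python
import numpy as np

# Assumed reversible example: a birth-and-death type matrix with pi = (1, 2, 1)/4
P = np.array([[0.5, 0.5, 0.0],
              [0.25, 0.5, 0.25],
              [0.0, 0.5, 0.5]])
pi = np.array([0.25, 0.5, 0.25])

def dirichlet(x):
    # right-hand side of (116)
    return 0.5 * np.sum(pi[:, None] * P * (x[None, :] - x[:, None]) ** 2)

rng = np.random.default_rng(0)
for _ in range(5):                           # Lemma 2.8: (115) equals (116)
    x = rng.normal(size=3)
    lhs = (x - P @ x) @ (pi * x)             # ((I - P)x, x)_pi
    assert abs(lhs - dirichlet(x)) < 1e-12

D = np.diag(np.sqrt(pi))
lam = np.sort(np.linalg.eigvalsh(D @ P @ np.linalg.inv(D)))
lam2 = lam[-2]                               # eigenvalues ordered by size
for _ in range(2000):                        # Rayleigh quotient is >= 1 - lambda_2
    x = rng.normal(size=3)
    var = np.sum(pi * x ** 2) - np.sum(pi * x) ** 2
    if var > 1e-12:
        assert dirichlet(x) / var >= 1 - lam2 - 1e-9
```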

2.3.7 Bounds for the Eigenvalues $\lambda_2$ and $\lambda_\ell$

In order to derive bounds for the eigenvalues $\lambda_2$ and $\lambda_\ell$, the following notions and notations are necessary.

For each pair $i, j \in E$ such that $i \ne j$ and $p_{ij} > 0$, we denote by $e = e_{ij}$ the corresponding directed edge of the transition graph, and by $e_- = i$ and $e_+ = j$ the starting and target vertices of $e$, respectively. Let $\mathcal{E}$ be the set of all directed edges $e = e_{ij}$ such that $i \ne j$ and $p_{ij} > 0$.

Furthermore, for each $i, j \in E$ such that $i \ne j$, we consider exactly one path $\gamma_{ij}$ from $i$ to $j$, which is given by a vector $\gamma_{ij} = (i_0, i_1, \ldots, i_{m-1}, i_m)$ of states such that $i = i_0$, $j = i_m$ and

$$p_{i i_1}\, p_{i_1 i_2} \cdots p_{i_{m-1} j} > 0\,,$$

such that none of the edges $e_{i_{k-1} i_k}$ is contained more than once (and $m$ is the smallest possible number). Let $\Gamma$ be the set of all these paths, and for each path $\gamma_{ij}$ define

$$|\gamma_{ij}|_Q = \sum_{e \in \gamma_{ij}} \frac{1}{Q(e)} = \frac{1}{\pi_i\, p_{i i_1}} + \frac{1}{\pi_{i_1}\, p_{i_1 i_2}} + \ldots + \frac{1}{\pi_{i_{m-1}}\, p_{i_{m-1} j}}\,, \tag{119}$$

where $Q(e_{i_{k-1} i_k}) = \pi_{i_{k-1}}\, p_{i_{k-1} i_k}$.

The so-called Poincaré coefficient $\kappa$ of the set of paths $\Gamma$ is then defined as

$$\kappa = \kappa(\Gamma) = \max_{e \in \mathcal{E}}\ \sum_{\gamma_{ij} \ni e} |\gamma_{ij}|_Q\, \pi_i\, \pi_j\,. \tag{120}$$

Finally, we consider the extended set of edges $\mathcal{E}' \supseteq \mathcal{E}$, which also contains the edges of the type $i \to i$ in case $p_{ii} > 0$, and for all $i \in E$ exactly one path $\sigma_i$ from $i$ to $i$ which contains an odd number of edges in $\mathcal{E}'$, such that no edge occurs more than once. Let $\Gamma'$ be the set of all these paths, and for every path $\sigma_i \in \Gamma'$ let

$$|\sigma_i|_Q = \sum_{e \in \sigma_i} \frac{1}{Q(e)}\,. \tag{121}$$

The coefficient $\iota$ of the path set $\Gamma'$ is then defined as

$$\iota = \iota(\Gamma') = \max_{e \in \mathcal{E}'}\ \sum_{\sigma_i \ni e} |\sigma_i|_Q\, \pi_i\,. \tag{122}$$

Theorem 2.18  For the eigenvalues $\lambda_2$ and $\lambda_\ell$ of $\mathbf{P}$ the following inequalities hold:

$$\lambda_2 \ \le\ 1 - \frac{1}{\kappa} \qquad\text{and}\qquad \lambda_\ell \ \ge\ -1 + \frac{2}{\iota}\,, \tag{123}$$

and hence

$$\max\{\lambda_2, |\lambda_\ell|\} \ \le\ 1 - \min\Bigl\{\frac{1}{\kappa}, \frac{2}{\iota}\Bigr\}\,. \tag{124}$$
Proof

First we will show that $\lambda_2 \le 1 - \kappa^{-1}$. Because of Theorem 2.17, it suffices to show that

$${\rm Var}_\pi(x) \ \le\ \kappa\, D_{(\mathbf{P},\pi)}(x, x)\,, \qquad x \in \mathbb{R}^\ell\,. \tag{125}$$

Using the notation introduced in (119), we obtain

$${\rm Var}_\pi(x) = \frac{1}{2}\bigl(2\|x\|_\pi^2 - 2\,e_\pi(x)^2\bigr) = \frac{1}{2}\Bigl(\sum_{i \in E} x_i^2\, \pi_i + \sum_{j \in E} x_j^2\, \pi_j - 2 \sum_{i,j \in E} x_i\, x_j\, \pi_i\, \pi_j\Bigr)$$
$$= \frac{1}{2} \sum_{i,j \in E} (x_i - x_j)^2\, \pi_i\, \pi_j = \frac{1}{2} \sum_{i,j \in E} \Bigl(\sum_{e \in \gamma_{ij}} \frac{1}{\sqrt{Q(e)}}\; \sqrt{Q(e)}\,\bigl(x_{e_+} - x_{e_-}\bigr)\Bigr)^2 \pi_i\, \pi_j\,.$$

An application of the Cauchy–Schwarz inequality yields

$${\rm Var}_\pi(x) \ \le\ \frac{1}{2} \sum_{i,j \in E} |\gamma_{ij}|_Q \sum_{e \in \gamma_{ij}} Q(e)\, \bigl(x_{e_+} - x_{e_-}\bigr)^2\, \pi_i\, \pi_j = \frac{1}{2} \sum_{e \in \mathcal{E}} Q(e)\, \bigl(x_{e_+} - x_{e_-}\bigr)^2 \Bigl(\sum_{\gamma_{ij} \ni e} |\gamma_{ij}|_Q\, \pi_i\, \pi_j\Bigr) \ \le\ \kappa\, D_{(\mathbf{P},\pi)}(x, x)\,,$$

where the last inequality follows from Lemma 2.8 and the definition (120) of the Poincaré coefficient $\kappa$. This shows (125).
In order to finish the proof it is left to show that $\lambda_\ell \ge -1 + 2\iota^{-1}$.

For this purpose we exploit the following equation: for all $x = (x_1, \ldots, x_\ell)^\top \in \mathbb{R}^\ell$,
$$\frac{1}{2} \sum_{i,j \in E} (x_i + x_j)^2\, \pi_i p_{ij} = (Px, x)_\pi + \|x\|_\pi^2\,, \qquad (126)$$
as the reversibility of $(P, \pi)$ implies
$$
\frac{1}{2} \sum_{i,j \in E} (x_i + x_j)^2\, \pi_i p_{ij}
= \frac{1}{2} \sum_{i,j \in E} x_i^2\, \pi_i p_{ij}
+ \frac{1}{2} \sum_{i,j \in E} x_j^2\, \pi_i p_{ij}
+ \sum_{i,j \in E} x_i x_j\, \pi_i p_{ij}
= \|x\|_\pi^2 + (Px, x)_\pi\,,
$$
where we used $\sum_{j \in E} p_{ij} = 1$ in the first sum and the reversibility equation (85), $\pi_i p_{ij} = \pi_j p_{ji}$, in the second.

Let now $\gamma_i = (i_0, i_1, \ldots, i_{2m}, i_{2m+1})$ with $i = i_0 = i_{2m+1}$ be a path from $i$ to $i$ containing an odd number of edges, such that no edge occurs more than once. Then
$$
x_i = \frac{1}{2}\Bigl((x_i + x_{i_1}) - (x_{i_1} + x_{i_2}) + \ldots + (x_{i_{2m}} + x_i)\Bigr)
= \frac{1}{2} \sum_{e \in \gamma_i} (-1)^{n(e)}\, (x_{e_+} + x_{e_-})\,,
$$
where $n(e) = k$ if $e = (i_k, i_{k+1}) \in \gamma_i$.


Similarly to the first part of the proof, the Cauchy–Schwarz inequality implies that for all $x = (x_1, \ldots, x_\ell)^\top \in \mathbb{R}^\ell$,
$$
\begin{aligned}
\|x\|_\pi^2
&\le \frac{1}{4} \sum_{i \in E} \pi_i \Bigl(\sum_{e \in \gamma_i} \sqrt{Q(e)}\,(-1)^{n(e)}\,(x_{e_+} + x_{e_-})\, \frac{1}{\sqrt{Q(e)}}\Bigr)^2 \\
&\le \frac{1}{4} \sum_{i \in E} \pi_i\, |\gamma_i| \sum_{e \in \gamma_i} (x_{e_+} + x_{e_-})^2\, Q(e) \\
&= \frac{1}{4} \sum_{e \in \mathcal{E}'} (x_{e_+} + x_{e_-})^2\, Q(e) \sum_{\gamma_i \ni e} |\gamma_i|\, \pi_i
\;\le\; \frac{\iota}{4} \sum_{e \in \mathcal{E}'} (x_{e_+} + x_{e_-})^2\, Q(e)\,.
\end{aligned}
$$

From (126) we can now conclude that
$$\|x\|_\pi^2 \le \frac{\iota}{2}\Bigl((Px, x)_\pi + \|x\|_\pi^2\Bigr)\,.$$

For $x = \phi_\ell$ we obtain in particular that
$$1 \le \frac{\iota}{2}\,(\lambda_\ell + 1) \qquad \text{and hence} \qquad \lambda_\ell \ge -1 + \frac{2}{\iota}\,. \qquad \Box$$

Example  (Random Walk on a Graph)

We return to the example of a random walk on a graph that has been already discussed in Section 2.3.1.
Let $G = (V, K)$ be a connected graph with vertices $V = \{v_1, \ldots, v_\ell\}$ and edges $K$, where each edge connects two vertices, such that for each pair $v_i, v_j \in V$ of vertices there is a path of edges in $K$ connecting $v_i$ and $v_j$. A random walk on the graph $G = (V, K)$ is a Markov chain $X_0, X_1, \ldots : \Omega \to E$ with state space $E = \{1, \ldots, \ell\}$ and transition matrix $P = (p_{ij})$, where
$$
p_{ij} = \begin{cases} \dfrac{1}{d_i} & \text{if the vertices $v_i$ and $v_j$ are neighbors,} \\[4pt] 0 & \text{else.} \end{cases} \qquad (127)
$$
Recall that two vertices $v_i$ and $v_j$ are called neighbors if they are endpoints of the same edge, and that, for each vertex $v_i$, $d_i$ denotes its number of neighbors.
We already showed that

- the transition matrix $P$ given in (127) is always irreducible (where we now additionally assume $P$ to be aperiodic),
- the uniquely determined stationary initial distribution is given by
$$\pi = \Bigl(\frac{d_1}{d}, \ldots, \frac{d_\ell}{d}\Bigr)^\top, \qquad \text{where } d = \sum_{i=1}^{\ell} d_i\,, \qquad (128)$$
- the pair $(P, \pi)$ given by (127)–(128) is reversible.


For the Poincaré coefficient introduced in (120) we obtain
$$\kappa = \kappa(\Gamma) = \max_{e \in \mathcal{E}} \sum_{\gamma_{ij} \ni e} |\gamma_{ij}|\, \pi_i \pi_j\,,
\qquad \text{where} \qquad |\gamma_{ij}| = \sum_{e \in \gamma_{ij}} \frac{1}{Q(e)} = d\, \#\gamma_{ij}$$
and $\#\gamma_{ij} = \#\{e : e \in \gamma_{ij}\}$ denotes the number of edges (i.e. the length) of the path $\gamma_{ij}$. Taking into account (127)–(128), this implies
$$\kappa(\Gamma) \le \frac{\Delta^2\, \gamma^*\, b}{d}\,, \qquad (129)$$
where

- $d/2$ denotes the total number of edges,
- $\Delta = \max_{i \in E} d_i$ is the maximum number of edges originating at a vertex,
- $\gamma^* = \max_{\gamma \in \Gamma} \#\gamma$ denotes the maximal path length, and
- $b = \max_{e \in \mathcal{E}} \#\{\gamma \in \Gamma : \gamma \ni e\}$ is the so-called bottleneck coefficient, i.e. the maximal number of paths containing a single edge.

From (123) and (129) we obtain the estimate
$$\lambda_2 \le 1 - \frac{d}{\Delta^2\, \gamma^*\, b} \qquad (130)$$
for the second largest eigenvalue $\lambda_2$ of $P$.


In a similar way one obtains the upper bound
$$\iota = \iota(\Gamma') = \max_{e \in \mathcal{E}'} \sum_{\gamma_i \ni e} |\gamma_i|\, \pi_i \le \Delta\, \gamma'\, b'\,,
\qquad \text{where} \qquad |\gamma_i| = \sum_{e \in \gamma_i} \frac{1}{Q(e)} = d\, \#\gamma_i\,,$$
$$\gamma' = \max_{\gamma \in \Gamma'} \#\gamma\,, \qquad b' = \max_{e \in \mathcal{E}'} \#\{\gamma \in \Gamma' : \gamma \ni e\}\,,$$
and hence
$$\lambda_\ell \ge -1 + \frac{2}{\iota} \ge -1 + \frac{2}{\Delta\, \gamma'\, b'}\,. \qquad (131)$$

Remarks

For the numerical example from Section 2.3.1 (the connected graph with the eight vertices $v_1, \ldots, v_8$ depicted there) the following holds:
$$d = 24\,, \quad \Delta = 5\,, \quad \gamma^* = 3\,, \quad b = 7 \qquad \text{and} \qquad \gamma' = 3\,, \quad b' = 3\,.$$

The inequalities (130) and (131) thus imply
$$\lambda_2 \le 1 - \frac{24}{25 \cdot 3 \cdot 7} < \frac{24}{25}
\qquad \text{and} \qquad
\lambda_8 \ge -1 + \frac{2}{5 \cdot 3 \cdot 3} = -\frac{43}{45}\,,$$
and hence
$$\max\{\lambda_2,\, |\lambda_8|\} < \frac{24}{25}\,.$$

3  Monte-Carlo Simulation
Besides the traditional ways of data acquisition in laboratory experiments and field tests the generation of
so-called synthetic data via computer simulation has gained increasing importance.
There is a variety of reasons for the increased benefit drawn from computer simulation used to investigate
a wide range of issues, objects and processes:
The most prominent reason is the rapidly growing performance of modern computer systems which has
extended our computational capabilities in a way that would not have been imaginable even a short
time ago.
Consequently, computer-based data generation is often considerably cheaper and less time-consuming
than traditional data acquisition in laboratory experiments and field tests.
Moreover, computer experiments can be repeated under constant conditions as frequently as necessary
whereas in traditional scientific experiments the investigated object is often damaged or even destroyed.
A further reason for the value of computer simulations is the fact that the volume and structure of the analyzed data are often very complex, and that in this case data processing and evaluation are typically based on mathematical models whose characteristics cannot be (completely) described by analytical formulae. Thus, computer simulations of the considered models present a valuable alternative tool for analysis.
Computer experiments for the investigation of the issues, objects and processes of scientific interest are
based on stochastic simulation algorithms. In this context one also uses the term MonteCarlo simulation
summarizing a huge variety of simulation algorithms.
1. Random number generators are the basis for MonteCarlo simulation of single features, quantities and
variables.
By these algorithms realizations of random variables can be generated via the computer. Those
are called pseudorandom numbers.
The simulation of random variables is based on socalled standard random number generators
providing realizations of random variables that are uniformly distributed on the unit interval
(0, 1].
Certain transformation and rejection methods can be applied to these standard pseudorandom
numbers in order to generate pseudorandom numbers for other (more complex) random variables
having e.g. binomial, Poisson or normal distributions.
2. Computer experiments designed to investigate highdimensional random vectors or the evolution of
certain objects in time are based on more sophisticated algorithms from socalled dynamic Monte
Carlo simulation.
In this context, Markov Chain Monte Carlo simulation (MCMC simulation) is a construction principle for algorithms that are particularly appropriate to simulate the time-stationary equilibria of objects or processes. Another example for the application of MCMC simulation is statistical image analysis.
An active field of research that has resulted in numerous publications in recent years is that of so-called coupling algorithms for perfect MCMC simulation. These coupling algorithms enable us to simulate time-stationary equilibria of objects and processes in a way that does not only allow approximations, but simulations that are perfect in a certain sense.

3.1  Generation of Pseudo-Random Numbers

3.1.1  Simple Applications; Monte-Carlo Estimators

First we recall two simple problems that can be solved by means of Monte-Carlo simulation and have already been discussed in the course "Elementare Wahrscheinlichkeitsrechnung und Statistik".

1. Algorithm to determine the number $\pi$

A simple computer algorithm for the Monte-Carlo simulation of $\pi$ is the following improved version of Buffon's needle experiment; see Sections 2.5 and 5.2.3 of the course "Elementare Wahrscheinlichkeitsrechnung und Statistik". This algorithm is based on the following geometrical facts.

We consider the square
$$B = (-1, 1] \times (-1, 1] \subset \mathbb{R}^2\,,$$
the circle $C$ inscribed into $B$, where
$$C = \{(x, y) : (x, y) \in B,\; x^2 + y^2 < 1\}\,,$$
and arbitrarily toss a point into the set $B$. Translated into the language of stochastics this means:

We consider two independent random variables $S$ and $T$ that are uniformly distributed on the interval $(-1, 1]$, and determine the probability of the event
$$A = \{(S, T) \in C\} = \{S^2 + T^2 < 1\}\,,$$
i.e. that the random point $(S, T)$ lies in $C \subset B$. Then
$$P(A) = P(S^2 + T^2 < 1) = \frac{|C|}{|B|} = \frac{\pi}{4}\,,$$
where $|B|$ and $|C|$ denote the areas of $B$ and $C$, respectively.

Similarly to Buffon's needle experiment, the equation $P(A) = \pi/4$ yields a method for the statistical estimation of $\pi$, which is based on the strong law of large numbers (SLLN) and can be easily implemented.

Let $(S_1, T_1), \ldots, (S_n, T_n)$ be independent and identically distributed random vectors whose distribution coincides with that of $(S, T)$, and which are regarded as a stochastic model for $n$ (independent) experiments. Then $X_1, X_2, \ldots, X_n$, where
$$X_i = \begin{cases} 1 & \text{if } S_i^2 + T_i^2 < 1, \\ 0 & \text{else,} \end{cases}$$
are independent and identically distributed random variables with expectation $E X_i = \pi/4$. Furthermore, the SLLN (see Theorem WR-5.15) implies that the arithmetic mean
$$Y_n = n^{-1} \sum_{i=1}^{n} X_i$$
converges to $\pi/4$ almost surely.

Thus, $Y_n$ is an unbiased and (strongly) consistent estimator for $\pi/4$, i.e., the probability of $4 Y_n$ being a good approximation for $\pi$ is very high if $n$ is large.

For the implementation of this simulation algorithm one can proceed as follows:

- Use a random number generator to generate $2n$ pseudo-random numbers $u_1, \ldots, u_{2n}$ that are realizations of random variables uniformly distributed on $(0, 1]$.
- Put $s_i = 2 u_i - 1$ and $t_i = 2 u_{n+i} - 1$ for $i = 1, \ldots, n$.
- Define
$$x_i = \begin{cases} 1 & \text{if } s_i^2 + t_i^2 < 1, \\ 0 & \text{else.} \end{cases}$$
- Compute $4 (x_1 + \ldots + x_n)/n$.
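The implementation steps above can be sketched in Python; the function name `estimate_pi` and the use of Python's built-in generator are our choices for illustration, not part of the lecture notes:

```python
import random

def estimate_pi(n, seed=0):
    """Monte-Carlo estimator for pi, based on P(S^2 + T^2 < 1) = pi/4."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        # s, t are (approximately) uniform on (-1, 1], obtained from standard uniforms
        s = 2.0 * rng.random() - 1.0
        t = 2.0 * rng.random() - 1.0
        if s * s + t * t < 1.0:
            hits += 1
    return 4.0 * hits / n
```

By the CLT, the error of the estimate decreases at rate $O(n^{-1/2})$, so roughly two additional decimal digits of $\pi$ require $10^4$ times more samples.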
2. Monte-Carlo Integration

Let $\varphi : [0, 1] \to [0, 1]$ be a continuous function. Our goal is to find an estimator for the value of the integral $\int_0^1 \varphi(x)\, dx$ that can be determined by Monte-Carlo simulation. We consider the following stochastic model.

Let the random variables $X_1, X_2, \ldots : \Omega \to \mathbb{R}$ be independent and uniformly distributed on $(0, 1]$, with probability density $f_X$ given by
$$f_X(x) = \begin{cases} 1 & \text{if } x \in [0, 1], \\ 0 & \text{else.} \end{cases}$$
Let $Z_k = \varphi(X_k)$ for all $k = 1, 2, \ldots$.

By the transformation theorem for independent and identically distributed random variables (see Theorem WR-3.18), the random variables $Z_1, Z_2, \ldots$ are independent and identically distributed with
$$E Z_1 = \int_{-\infty}^{\infty} \varphi(x)\, f_X(x)\, dx = \int_0^1 \varphi(x)\, dx\,.$$

Furthermore, the SLLN (see Theorem WR-5.15) implies that for $n \to \infty$,
$$\frac{1}{n} \sum_{k=1}^{n} Z_k \;\xrightarrow{\text{a.s.}}\; \int_0^1 \varphi(x)\, dx\,.$$

Hence $n^{-1} \sum_{k=1}^{n} Z_k$ is an unbiased and (strongly) consistent estimator for $\int_0^1 \varphi(x)\, dx$, i.e., the probability for $n^{-1} \sum_{k=1}^{n} Z_k$ to be a good approximation of the integral $\int_0^1 \varphi(x)\, dx$ is high for sufficiently large $n$.

For the implementation of this simulation algorithm one can proceed similarly to Example 1:

- Use a random number generator to generate $n$ pseudo-random numbers $x_1, \ldots, x_n$ that are realizations of random variables uniformly distributed on $(0, 1]$.
- Define $z_k = \varphi(x_k)$ for $k = 1, \ldots, n$.
- Compute $n^{-1} \sum_{k=1}^{n} z_k$.
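A minimal sketch of this estimator; the test integrand $\varphi(x) = x^2$ (with exact integral $1/3$) is our choice for illustration:

```python
import random

def mc_integral(phi, n, seed=0):
    """Strongly consistent Monte-Carlo estimator for the integral of phi over (0, 1]."""
    rng = random.Random(seed)
    return sum(phi(rng.random()) for _ in range(n)) / n
```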

3.1.2  Linear Congruential Generators

Most simulation algorithms are based on standard random number generators, whose goal is to generate sequences $u_1, \ldots, u_n$ of numbers in the unit interval $(0, 1]$. These are the so-called standard pseudo-random numbers, which can be regarded as realizations of independent random variables $U_1, \ldots, U_n$ uniformly distributed on $(0, 1]$.

A commonly established procedure to generate standard pseudo-random numbers is the following linear congruential method, where first of all the numbers $z_1, \ldots, z_n$ are generated according to the recursion formula
$$z_k = (a z_{k-1} + c) \mod m\,, \qquad k = 1, \ldots, n\,. \qquad (1)$$

The initial value $z_0 \in \{0, 1, \ldots, m-1\}$ the algorithm is starting from is called the germ (seed) of the linear congruential generator; $m \in \mathbb{N}$, $a \in \{0, 1, \ldots, m-1\}$ and $c \in \{0, 1, \ldots, m-1\}$ are further parameters, called the modulus, factor and increment of the congruential generator. The scaling
$$u_k = \frac{z_k}{m} \qquad (2)$$
yields the standard pseudo-random numbers $u_1, \ldots, u_n$.
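The recursion (1) and the scaling (2) can be sketched as follows; the "minimal standard" parameters $m = 2^{31} - 1$, $a = 16807$, $c = 0$ used in the test are a classical choice from the literature, not taken from the notes:

```python
def lcg(m, a, c, z0, n):
    """Linear congruential generator: z_k = (a*z_{k-1} + c) mod m, see (1).

    Returns the raw states z_1, ..., z_n and the scaled values u_k = z_k/m
    from (2).  Note that u_k lies in [0, (m-1)/m] rather than exactly (0, 1].
    """
    zs = []
    z = z0
    for _ in range(n):
        z = (a * z + c) % m
        zs.append(z)
    us = [z / m for z in zs]
    return zs, us
```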
As a next step we will solve the recursion equation (1), i.e., we will show how the number $z_k$ that has been recursively defined in (1) can be expressed directly in terms of the initial value $z_0$ and the parameters $m$, $a$ and $c$.

Theorem 3.1  For all $k \in \{1, \ldots, n\}$,
$$z_k = \Bigl(a^k z_0 + c\, \frac{a^k - 1}{a - 1}\Bigr) \mod m\,. \qquad (3)$$

Proof

We show the assertion by mathematical induction. For $k = 1$ the claim (3) coincides with the recursion equation (1).

Let (3) be true for a certain $k \ge 1$, i.e., there is an integer $j \ge 0$ such that
$$z_k = a^k z_0 + c\, \frac{a^k - 1}{a - 1} - j m\,. \qquad (4)$$

We show that this implies that (3) also holds for $k + 1$. By the recursion equation (1) and the induction hypothesis (4) we get that
$$
\begin{aligned}
z_{k+1} &= (a z_k + c) \mod m \\
&= \Bigl(a\Bigl(a^k z_0 + c\, \frac{a^k - 1}{a - 1} - j m\Bigr) + c\Bigr) \mod m \\
&= \Bigl(a^{k+1} z_0 + c\, \frac{a(a^k - 1) + a - 1}{a - 1} - a j m\Bigr) \mod m \\
&= \Bigl(a^{k+1} z_0 + c\, \frac{a^{k+1} - 1}{a - 1}\Bigr) \mod m\,,
\end{aligned}
$$
i.e., (3) also holds for $k + 1$. $\Box$
Remarks

- Obviously, the linear congruential generator defined in (1) can generate no more than $m$ different numbers $z_1, \ldots, z_n$.
- As soon as a number $z_k$ is repeated for the first time, i.e., there is some $m_0 > 0$ such that $z_k = z_{k - m_0}$, the same period of length $m_0$, which has already been generated completely, starts again, i.e.
$$z_{k+j} = z_{k - m_0 + j} \qquad \text{for all } j \ge 1\,.$$
- An unfavorable choice of the parameters $m$, $a$, $c$ and $z_0$, respectively, may result in a very short period length $m_0$. For example we have
$$m_0 = 2 \qquad \text{for } a = c = z_0 = 5 \text{ and } m = 10\,,$$
where the sequence $5, 0, 5, 0, \ldots$ is generated.
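The short period of the example above can be checked with a small helper (not part of the notes):

```python
def lcg_period(m, a, c, z0):
    """Length m0 of the cycle eventually entered by z_k = (a*z_{k-1} + c) mod m."""
    seen = {}  # state -> index of its first occurrence
    z, k = z0, 0
    while z not in seen:
        seen[z] = k
        z = (a * z + c) % m
        k += 1
    return k - seen[z]
```

For the parameters of Theorem 3.2, part 1 (e.g. $m = 8$, $a = 5$, $c = 1$), this helper returns the maximal period $m$.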


A desirable feature of linear congruential generators is a period length $m_0$ that is as close as possible to the maximum length $m$. We will now mention some (sufficient and necessary) conditions on the parameters $m$, $a$, $c$ and $z_0$, respectively, ensuring that the maximal possible period $m$ is obtained.
Theorem 3.2

1. If $c > 0$, then for every initial value $z_0 \in \{0, 1, \ldots, m-1\}$ the linear congruential generator defined in (1) generates a sequence $z_1, \ldots, z_n$ of numbers with maximal possible period $m$ if and only if the following conditions are satisfied:
   (a1) The parameters $c$ and $m$ are relatively prime.
   (a2) For every prime number $r$ dividing $m$, $a - 1$ is a multiple of $r$.
   (a3) If $m$ is a multiple of 4, then $a - 1$ is also a multiple of 4.

2. If $c = 0$, then $m_0 = m - 1$ for all $z_0 \in \{1, \ldots, m-1\}$ if and only if
   (b1) $m$ is prime and
   (b2) for every prime $r$ dividing $m - 1$ the number $a^{(m-1)/r} - 1$ is not divisible by $m$.

3. If $c = 0$ and if there is $k \in \mathbb{N}$ such that $m = 2^k \ge 16$, then $m_0 = m/4$ if and only if $z_0$ is an odd number and $a \mod 8 = 5$ or $a \mod 8 = 3$.

A proof of Theorem 3.2 using results from number theory (one of them being Fermat's little theorem) can be found e.g.

- in Section 2.7 of B.D. Ripley (1987) Stochastic Simulation, J. Wiley & Sons, New York, or
- in Section 3.2 of D.E. Knuth (1997) The Art of Computer Programming, Vol. II, Addison-Wesley, Reading MA.
We also refer to these two texts for the discussion

- of other generators for standard pseudo-random numbers, like nonlinear congruential generators, shift register generators and lagged Fibonacci generators, as well as their combinations,
- of alternative conditions for the parameters $m$, $a$, $c$ and $z_0$ of the linear congruential generator defined in (1), ensuring the generation of sequences $z_1, \ldots, z_n$ whose period $m_0$ is as large as possible and which also exhibit other desirable properties.

One of those properties is that the points $(u_1, u_2), \ldots, (u_{n-1}, u_n)$ formed by pairs of consecutive pseudo-random numbers $u_{i-1}$, $u_i$ are uniformly spread over the unit square $[0, 1]^2$. The following numerical examples illustrate that relatively small changes of the parameters $a$ and $c$ can result in completely different point patterns $(u_1, u_2), \ldots, (u_{n-1}, u_n)$.

Further details can be found in the text by Ripley (1987) that has already been mentioned and in the lecture notes by H. Künsch (ftp://stat.ethz.ch/U/Kuensch/skript-sim.ps), which also contain the following figures.

Figure 3: Point patterns for pairs $(u_{i-1}, u_i)$ of consecutive pseudo-random numbers for $m = 256$

3.1.3  Statistical Tests

In the literature, numerous statistical significance tests are discussed in order to investigate characteristics of random number generators; see e.g. G.S. Fishman (1996) Monte Carlo: Concepts, Algorithms and Applications, Springer, New York.

We only recall two such tests which are important for investigating characteristics of linear congruential generators (and other random number generators). Pearson's $\chi^2$ goodness-of-fit test is used to check

- if the generated pseudo-random numbers can be regarded as realizations of uniformly distributed random variables,
- and if we may assume the independence of these random variables.


Figure 4: Point patterns for pairs $(u_{i-1}, u_i)$ of consecutive pseudo-random numbers for $m = 256$
Another method for the generation of sequences $u_1, u_2, \ldots$ of numbers having desirable characteristics is based on minimizing the Kolmogorov distance
$$D_n(u_1, \ldots, u_n) = \sup_{x \in (0,1]} \Bigl|\, \frac{1}{n}\, \#\{i : 1 \le i \le n,\; 0 < u_i \le x\} - x \,\Bigr|$$
between the empirical distribution function of the sample $u_1, \ldots, u_n$ and the distribution function of the uniform distribution on $(0, 1]$, for every natural number $n$. In the literature this procedure is referred to as the Quasi-Monte-Carlo method; see e.g. H. Niederreiter (1992) Random Number Generation and Quasi-Monte-Carlo Methods, SIAM, Philadelphia.

1. $\chi^2$ goodness-of-fit test of the uniform distribution

The following test is considered in order to check if the pseudo-random numbers $u_1, \ldots, u_n$ can be regarded as realizations of independent sampling variables $U_1, \ldots, U_n$ that are uniformly distributed on the interval $(0, 1]$.

The interval $(0, 1]$ is divided into $r$ subintervals of equal length $(0, 1/r], \ldots, ((r-1)/r, 1]$, and we consider

- the $(r-1)$-dimensional (hypothetical) vector of parameters $p_0 = (1/r, \ldots, 1/r)$ and
- the test statistic $T_n : \mathbb{R}^n \to [0, \infty)$, where
$$T_n(u_1, \ldots, u_n) = \sum_{j=1}^{r} \frac{\bigl(Z_j(u_1, \ldots, u_n) - n/r\bigr)^2}{n/r}\,,$$
and $Z_j(u_1, \ldots, u_n) = \#\{i : 1 \le i \le n,\; j - 1 < r u_i \le j\}$ denotes the number of pseudo-random numbers $u_1, \ldots, u_n$ in the interval $((j-1)/r, j/r]$.

Figure 5: Point patterns for pairs $(u_{i-1}, u_i)$ of consecutive pseudo-random numbers for $m = 2048$
If the sampling variables $U_1, \ldots, U_n$ are independent and uniformly distributed on the interval $(0, 1]$, the test statistic $T_n$ is asymptotically $\chi^2_{r-1}$-distributed. Thus, for sufficiently large $n$ the hypothesis $H_0 : p = p_0$ is rejected if
$$T_n(u_1, \ldots, u_n) > \chi^2_{r-1,\, 1-\alpha}\,,$$
where $\chi^2_{r-1,\, 1-\alpha}$ denotes the $(1-\alpha)$-quantile of the $\chi^2$ distribution with $r - 1$ degrees of freedom.

We will illustrate this test by the following numerical example. For $\alpha = 0.05$, $n = 100\,000$ and $r = 10$ we want to check if the hypothesis that the sampling variables are uniformly distributed is compatible with a sample $(u_1, \ldots, u_{100\,000})$ of pseudo-random numbers. The sample has the following vector $(z_1, \ldots, z_{10})$ of class frequencies:

  z1     z2      z3      z4     z5      z6      z7     z8     z9      z10
  9995   10045   10127   9816   10130   10040   9890   9858   10083   10016

In this case we obtain $T_{100\,000}(u_1, \ldots, u_{100\,000}) = 10.99$ and hence
$$T_{100\,000}(u_1, \ldots, u_{100\,000}) = 10.99 < \chi^2_{9,\, 0.95} = 16.92\,.$$
Thus, the hypothesis of a uniform distribution on $(0, 1]$ is not rejected.
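The value $T_{100\,000} = 10.99$ can be recomputed from the class frequencies above in a few lines (helper name is ours):

```python
def chi2_statistic(counts, n, r):
    """Pearson chi-square statistic: T_n = sum_j (Z_j - n/r)^2 / (n/r)."""
    expected = n / r
    return sum((z - expected) ** 2 / expected for z in counts)

# Class frequencies from the numerical example above.
freqs = [9995, 10045, 10127, 9816, 10130, 10040, 9890, 9858, 10083, 10016]
t = chi2_statistic(freqs, n=100_000, r=10)  # approx. 10.99, below the 0.95-quantile 16.92
```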
Remarks

As a generalization of the $\chi^2$ goodness-of-fit test for checking the uniform distribution of some sample variables, one can also check if, for a given natural number $d \ge 1$ (e.g. $d = 2$ or $d = 3$), the pseudo-random vectors $(u_1, \ldots, u_d), \ldots, (u_{(n-1)d+1}, \ldots, u_{nd})$ can be regarded as realizations of independent random vectors $(U_1, \ldots, U_d), \ldots, (U_{(n-1)d+1}, \ldots, U_{nd})$ that are uniformly distributed on $(0, 1]^d$.

For this purpose the unit cube $(0, 1]^d$ is divided into $r^d$ smaller cubes $B_j$ of equal size, which are of the form $((i_1 - 1)/r, i_1/r] \times \ldots \times ((i_d - 1)/r, i_d/r]$. Furthermore, we consider

- the $(r^d - 1)$-dimensional (hypothetical) vector $p_0 = (1/r^d, \ldots, 1/r^d)$ of parameters and
- the test statistic $T_n : \mathbb{R}^{nd} \to [0, \infty)$, where
$$T_n(u_1, \ldots, u_{nd}) = \sum_{j=1}^{r^d} \frac{\bigl(Z_j(\mathbf{u}_1, \ldots, \mathbf{u}_n) - n/r^d\bigr)^2}{n/r^d}\,,$$
with $\mathbf{u}_i = (u_{(i-1)d+1}, \ldots, u_{id})$ and $Z_j(\mathbf{u}_1, \ldots, \mathbf{u}_n) = \#\{i : 1 \le i \le n,\; \mathbf{u}_i \in B_j\}$. Notice that $Z_j(\mathbf{u}_1, \ldots, \mathbf{u}_n)$ denotes the number of pseudo-random vectors in $B_j$.

2. Run Test

There are a number of other significance tests allowing one to evaluate the quality of random number generators. In particular, it can be verified if the generated pseudo-random numbers $u_1, \ldots, u_n$ can be regarded as realizations of independent random variables $U_1, \ldots, U_n$ having a certain distribution. In our case we consider the hypothesis of a uniform distribution on $(0, 1]$.

The following run test checks in particular if the independence assumption for the sampling variables $U_1, \ldots, U_n$ is reflected sufficiently well by the pseudo-random numbers $u_1, \ldots, u_n$. This is done by analyzing the lengths of monotonically increasing subsequences, also called runs, within the sequence $u_1, u_2, \ldots$ of pseudo-random numbers.

For this purpose we define the random variables $V_1, V_2, \ldots$ by the recursion formula
$$V_{j+1} = \min\{i : i > V_j + 1,\; U_i > U_{i+1}\}\,, \qquad j = 1, 2, \ldots\,, \qquad (5)$$
where $V_1 = \min\{i : i \ge 1,\; U_i > U_{i+1}\}$.

The random variables $W_1, W_2, \ldots$, where
$$W_1 = V_1 \qquad \text{and} \qquad W_{j+1} = V_{j+1} - (V_j + 1) \quad \text{for } j = 1, 2, \ldots\,, \qquad (6)$$
are called the runs of the sequence $U_1, U_2, \ldots$.

The significance test that will be constructed is based on the following property of the runs $W_1, W_2, \ldots$.

Theorem 3.3  The random variables $W_1, W_2, \ldots$ introduced in (6) are independent and identically distributed with
$$P(W_j = k) = \frac{k}{(k+1)!}\,, \qquad k = 1, 2, \ldots\,, \qquad (7)$$
if the random variables $U_1, U_2, \ldots$ are independent and uniformly distributed on $(0, 1]$.
Proof

Let $U_1, U_2, \ldots$ be independent and uniformly distributed on $(0, 1]$. Then for all $n \ge 1$ and for arbitrary natural numbers $k_1, \ldots, k_n \ge 1$ we get that
$$
\begin{aligned}
& P(W_1 = k_1, \ldots, W_n = k_n) \\
&= P(V_1 = k_1,\; V_2 - V_1 - 1 = k_2,\; \ldots,\; V_n - V_{n-1} - 1 = k_n) \\
&= P\bigl(V_1 = k_1,\; V_2 = k_2 + k_1 + 1,\; \ldots,\; V_n = k_n + \ldots + k_1 + n - 1\bigr) \\
&= P\bigl(U_i \le U_{i+1},\, i = 1, \ldots, k_1 - 1,\; U_{k_1} > U_{k_1+1}\,; \\
&\qquad\; U_i \le U_{i+1},\, i = k_1 + 2, \ldots, k_1 + 1 + k_2 - 1,\; U_{k_1+1+k_2} > U_{k_1+1+k_2+1}\,;\; \ldots\,; \\
&\qquad\; U_i \le U_{i+1},\, i = k_1 + 1 + \ldots + k_{n-1} + 2, \ldots, k_1 + 1 + \ldots + k_{n-1} + 1 + k_n - 1, \\
&\qquad\qquad U_{k_1+1+\ldots+k_{n-1}+1+k_n} > U_{k_1+1+\ldots+k_{n-1}+1+k_n+1}\bigr) \\
&= P\bigl(U_i \le U_{i+1},\, i = 1, \ldots, k_1 - 1,\; U_{k_1} > U_{k_1+1}\bigr) \\
&\quad \cdot P\bigl(U_i \le U_{i+1},\, i = k_1 + 2, \ldots, k_1 + 1 + k_2 - 1,\; U_{k_1+1+k_2} > U_{k_1+1+k_2+1}\bigr) \cdots \\
&= P\bigl(U_i \le U_{i+1},\, i = 1, \ldots, k_1 - 1,\; U_{k_1} > U_{k_1+1}\bigr) \cdots P\bigl(U_i \le U_{i+1},\, i = 1, \ldots, k_n - 1,\; U_{k_n} > U_{k_n+1}\bigr)\,,
\end{aligned}
$$
where the factorization uses the independence of the $U_i$ and the fact that the events refer to disjoint blocks of indices. This implies that the runs $W_1, W_2, \ldots$ are independent and identically distributed.

Furthermore, an induction argument shows that for arbitrary $k \in \mathbb{N}$ and $t \in (0, 1]$,
$$P(U_1 \le \ldots \le U_k \le t) = \frac{t^k}{k!}\,. \qquad (8)$$

For $k = 1$, equation (8) obviously holds. By the formula of total probability we obtain
$$
P(U_1 \le \ldots \le U_{k+1} \le t)
= \int_0^1 P(U_1 \le \ldots \le U_k \le U_{k+1} \le t \mid U_{k+1} = x)\, P(U_{k+1} \in dx)
= \int_0^t P(U_1 \le \ldots \le U_k \le x)\, dx\,,
$$
where the last equality is a consequence of the independence of $U_1, U_2, \ldots$ and their uniform distribution on $(0, 1]$.

Assume now that (8) is true for some $k \ge 1$. Then
$$P(U_1 \le \ldots \le U_{k+1} \le t) = \int_0^t P(U_1 \le \ldots \le U_k \le x)\, dx
= \int_0^t \frac{x^k}{k!}\, dx = \frac{t^{k+1}}{(k+1)!}\,,$$
where the second equality uses the induction hypothesis. This shows (8) for any $k \ge 1$.


Moreover, by (8) we can conclude that for any $k \in \mathbb{N}$,
$$
\begin{aligned}
P(U_1 \le \ldots \le U_k,\; U_k > U_{k+1})
&= \int_0^1 P(U_1 \le \ldots \le U_k,\; U_k > x)\, dx \\
&= \int_0^1 \Bigl(P(U_1 \le \ldots \le U_k \le 1) - P(U_1 \le \ldots \le U_k \le x)\Bigr)\, dx \\
&= \int_0^1 \Bigl(\frac{1}{k!} - \frac{x^k}{k!}\Bigr)\, dx
= \frac{1}{k!} - \frac{1}{(k+1)!}
= \frac{k}{(k+1)!}\,. \qquad \Box
\end{aligned}
$$

Remarks

Let us assume that sufficiently many pseudo-random numbers $u_1, u_2, \ldots$ have been generated, resulting in the $n$ runs $w_1, \ldots, w_n$ according to (5) and (6). We choose $r$ pairwise disjoint intervals $(a_1, b_1], \ldots, (a_r, b_r]$ on the positive real axis such that the probabilities
$$p_{0,j} = \sum_{k \in \mathbb{N} \cap (a_j, b_j]} \frac{k}{(k+1)!}\,, \qquad j = 1, \ldots, r\,,$$
are almost equal.

For these probabilities we consider

- the $(r-1)$-dimensional (hypothetical) vector $p_0 = (p_{0,1}, \ldots, p_{0,r-1})$ and
- the test statistic $T_n : \mathbb{R}^n \to [0, \infty)$, where
$$T_n(w_1, \ldots, w_n) = \sum_{j=1}^{r} \frac{\bigl(Y_j(w_1, \ldots, w_n) - n\, p_{0,j}\bigr)^2}{n\, p_{0,j}}\,,$$
and $Y_j(w_1, \ldots, w_n) = \#\{i : 1 \le i \le n,\; a_j < w_i \le b_j\}$ denotes the number of run lengths $w_1, \ldots, w_n$ belonging to class $j$.

According to Theorem 3.3, for large $n$ the hypothesis $H_0 : p = p_0$ will be rejected if $T_n(w_1, \ldots, w_n) > \chi^2_{r-1,\, 1-\alpha}$. Note that this requires the generation of a sufficiently large number of pseudo-random numbers $u_1, u_2, \ldots$.
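The extraction of the runs $w_1, \ldots, w_n$ defined by (5) and (6) can be sketched as follows (0-based indexing; the function name is ours):

```python
def run_lengths(us):
    """Extract run lengths W_1, W_2, ... from a sequence, as in (5)-(6).

    A descent u_i > u_{i+1} ends the current increasing run; the element
    u_{i+1} is skipped before the next run starts.
    """
    runs = []
    i, start = 0, 0
    while i + 1 < len(us):
        if us[i] > us[i + 1]:          # descent: current run ends at position i
            runs.append(i - start + 1)
            i += 2                     # skip the element after the descent
            start = i
        else:
            i += 1
    return runs
```

For example, the sequence $0.1, 0.3, 0.2, 0.5, 0.6, 0.4, 0.7$ yields the runs $2, 2$ (the elements $0.2$ and $0.4$ following the descents are skipped, as prescribed by (6)).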

3.2  Transformation of Uniformly Distributed Random Numbers

Based on standard pseudo-random numbers $u_1, u_2, \ldots$ that can be generated by methods like the linear congruential generator, it is possible to generate pseudo-random numbers $x_1, x_2, \ldots$ that can be regarded as realizations of random variables $X_1, X_2, \ldots$ having other than uniform distributions. Examples are realizations $x_1, x_2, \ldots$ of exponentially, Poisson, binomially or normally distributed random variables $X_1, X_2, \ldots$.

For this purpose one can apply algorithms like the so-called inversion method and rejection sampling, whose basic ideas will be explained by some examples. A much more comprehensive discussion of these algorithms can be found e.g. in

- L. Devroye (1986) Non-Uniform Random Variate Generation. Springer, New York,
- G.S. Fishman (1996) Monte Carlo: Concepts, Algorithms and Applications. Springer, New York,
- C.P. Robert and G. Casella (1999) Monte Carlo Statistical Methods. Springer, New York.

3.2.1  Inversion Method

The following property of the generalized inverse can be used as a basis for the generation of pseudo-random numbers $x_1, x_2, \ldots$ that can be regarded as realizations of random variables $X_1, X_2, \ldots$ whose distribution function $F : \mathbb{R} \to [0, 1]$ is an arbitrary monotonically nondecreasing and right-continuous function such that $\lim_{x \to -\infty} F(x) = 0$ and $\lim_{x \to \infty} F(x) = 1$.

Recall the following auxiliary result. Let $F : \mathbb{R} \to [0, 1]$ be an arbitrary distribution function. Then the function $F^{-1} : (0, 1] \to \mathbb{R} \cup \{\infty\}$, where
$$F^{-1}(y) = \inf\{x : F(x) \ge y\}\,, \qquad (9)$$
is called the generalized inverse of the distribution function $F$. For arbitrary $x \in \mathbb{R}$ and $y \in (0, 1)$,
$$y \le F(x) \qquad \text{if and only if} \qquad F^{-1}(y) \le x\,; \qquad (10)$$
see Lemma WR-4.1.

Theorem 3.4  Let $U_1, U_2, \ldots$ be a sequence of independent random variables uniformly distributed on $(0, 1]$ and let $F : \mathbb{R} \to [0, 1]$ be a distribution function. Then the random variables $X_1, X_2, \ldots$, where $X_i = F^{-1}(U_i)$ for $i = 1, 2, \ldots$, are independent and their distribution function is given by $F$.

Proof

The independence of $X_1, X_2, \ldots$ is an immediate consequence of the transformation theorem for independent random variables; see Theorem WR-3.18. Furthermore, (10) implies for arbitrary $x \in \mathbb{R}$ and $i \in \mathbb{N}$ that
$$P(X_i \le x) = P\bigl(F^{-1}(U_i) \le x\bigr) \overset{(10)}{=} P\bigl(U_i \le F(x)\bigr) = F(x)\,. \qquad \Box$$

Examples

In the following we discuss some examples illustrating how Theorem 3.4 can be used in order to generate pseudo-random numbers $x_1, x_2, \ldots$ that can be regarded as realizations of independent random variables $X_1, X_2, \ldots$ with a given distribution function $F : \mathbb{R} \to [0, 1]$. These numbers are also referred to as $F$-distributed pseudo-random numbers $x_1, x_2, \ldots$, in spite of the fact that the empirical distribution function $\widehat{F}_n$ of the sample $x_1, \ldots, x_n$ is only an approximation of $F$ for large $n$.

Note that Theorem 3.4 can only be applied directly if the generalized inverse $F^{-1}$ of $F$ is given explicitly (i.e. by an analytical formula). Unfortunately, this situation is merely an exception.

1. Exponential distribution

Let $\lambda > 0$ and let $F : \mathbb{R} \to [0, 1]$ be the distribution function of the $\mathrm{Exp}(\lambda)$ distribution, i.e.
$$F(x) = \begin{cases} 1 - e^{-\lambda x} & \text{if } x \ge 0, \\ 0 & \text{if } x < 0. \end{cases}$$
Then $F^{-1}(u) = -\lambda^{-1} \log(1 - u)$ for all $u \in (0, 1)$. By Theorem 3.4,

- we have $X = -\lambda^{-1} \log U \sim \mathrm{Exp}(\lambda)$, since $U$ and hence also $1 - U$ are uniformly distributed on $(0, 1]$,
- and the pseudo-random numbers $x_1, \ldots, x_n$, where
$$x_i = -\frac{\log u_i}{\lambda} \qquad \text{for } i = 1, \ldots, n\,,$$
can be regarded as realizations of $\mathrm{Exp}(\lambda)$-distributed random variables if $u_1, \ldots, u_n$ are realizations of independent random variables $U_1, \ldots, U_n$ uniformly distributed on $(0, 1]$.
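A sketch of this inversion step; using $-\log(1-u)$ instead of $-\log u$ is an equivalent choice, since $U$ and $1-U$ have the same distribution:

```python
import math
import random

def exponential_sample(lam, n, seed=0):
    """Exp(lambda) pseudo-random numbers via inversion: x_i = -log(u_i)/lambda."""
    rng = random.Random(seed)
    # rng.random() is uniform on [0, 1); 1 - rng.random() lies in (0, 1],
    # so the logarithm is always finite.
    return [-math.log(1.0 - rng.random()) / lam for _ in range(n)]
```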
2. Erlang distribution

Let $\lambda > 0$, $r \in \mathbb{N}$, and let $F : \mathbb{R} \to [0, 1]$ be the distribution function of the Erlang distribution, i.e., of the $\Gamma(\lambda, r)$ distribution, where
$$F(x) = \begin{cases} \displaystyle\int_0^x \frac{\lambda e^{-\lambda v}\, (\lambda v)^{r-1}}{(r-1)!}\, dv & \text{if } x \ge 0, \\ 0 & \text{if } x < 0. \end{cases} \qquad (11)$$

In this case the generalized inverse $F^{-1}$ of $F$ cannot be determined explicitly, and therefore Theorem 3.4 cannot be applied directly. However, one can show that $X_1 + \ldots + X_r \sim \Gamma(\lambda, r)$ if the random variables $X_1, \ldots, X_r$ are independent and $\mathrm{Exp}(\lambda)$-distributed. By Theorem 3.4, the pseudo-random numbers $y_1, \ldots, y_n$, where
$$y_i = x_{r(i-1)+1} + \ldots + x_{ri} = -\frac{\log\bigl(u_{r(i-1)+1} \cdots u_{ri}\bigr)}{\lambda} \qquad \text{for } i = 1, \ldots, n\,,$$
can be regarded as realizations of independent $\Gamma(\lambda, r)$-distributed random variables if $u_1, \ldots, u_{rn}$ are realizations of independent random variables uniformly distributed on $(0, 1]$.

In particular, for $\lambda = 1/2$ the pseudo-random numbers $y_1, \ldots, y_n$ can be regarded as realizations of $\chi^2_{2r}$-distributed random variables.
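The product form of $y_i$ can be sketched as follows (a minimal illustration):

```python
import math
import random

def erlang_sample(lam, r, n, seed=0):
    """Gamma(lambda, r) pseudo-random numbers as sums of r Exp(lambda) variables,
    computed via y = -log(u_1 * ... * u_r) / lambda."""
    rng = random.Random(seed)
    ys = []
    for _ in range(n):
        prod = 1.0
        for _ in range(r):
            prod *= 1.0 - rng.random()  # uniform on (0, 1]
        ys.append(-math.log(prod) / lam)
    return ys
```

With $\lambda = 1/2$ and $r = 3$ the samples are $\chi^2_6$-distributed, so their mean is close to $2r = 6$.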

3. Normal distribution

In order to generate normally distributed pseudo-random numbers one can apply the so-called Box–Muller algorithm, which also requires exponentially distributed pseudo-random numbers.

Assume the random variables $U_1, U_2$ to be independent and uniformly distributed on $(0, 1]$. By Theorem 3.4 we get that $X = -2 \log U_1$ is an $\mathrm{Exp}(1/2)$-distributed random variable, and the random vector $(Y_1, Y_2)$, where
$$Y_1 = \sqrt{X} \cos(2\pi U_2)\,, \qquad Y_2 = \sqrt{X} \sin(2\pi U_2)\,,$$
turns out to be $\mathrm{N}(o, \mathbf{I})$-distributed, i.e., $Y_1, Y_2$ are independent and $\mathrm{N}(0, 1)$-distributed random variables, as for arbitrary $y_1, y_2 \in \mathbb{R}$,
$$
\begin{aligned}
P(Y_1 \le y_1,\, Y_2 \le y_2)
&= P\Bigl(\sqrt{-2 \log U_1}\, \cos(2\pi U_2) \le y_1,\; \sqrt{-2 \log U_1}\, \sin(2\pi U_2) \le y_2\Bigr) \\
&= \frac{1}{2} \int_0^1 \int_0^{\infty} \mathbb{1}\bigl(\sqrt{x} \cos(2\pi u) \le y_1,\; \sqrt{x} \sin(2\pi u) \le y_2\bigr)\, e^{-x/2}\, dx\, du \\
&= \frac{1}{2\pi} \int_{-\infty}^{y_2} \int_{-\infty}^{y_1} e^{-(v^2 + w^2)/2}\, dv\, dw \\
&= \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{y_1} e^{-v^2/2}\, dv \;\cdot\; \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{y_2} e^{-w^2/2}\, dw\,,
\end{aligned}
$$
where the second-to-last equality follows from the substitution
$$v = \sqrt{x} \cos(2\pi u)\,, \qquad w = \sqrt{x} \sin(2\pi u)\,,$$
whose functional determinant is $\pi$.
The pseudo-random numbers $y_1, \ldots, y_{2n}$, where
$$y_{2k-1} = \sqrt{-2 \log u_{2k-1}}\, \cos(2\pi u_{2k})\,, \qquad
y_{2k} = \sqrt{-2 \log u_{2k-1}}\, \sin(2\pi u_{2k})\,, \qquad (12)$$
can thus be regarded as realizations of independent and $\mathrm{N}(0, 1)$-distributed random variables if $u_1, \ldots, u_{2n}$ are realizations of independent random variables $U_1, \ldots, U_{2n}$ that are uniformly distributed on $(0, 1]$.

For arbitrary $\mu \in \mathbb{R}$ and $\sigma^2 > 0$ the pseudo-random numbers $y_1', \ldots, y_{2n}'$, where $y_i' = \sigma y_i + \mu$, can be regarded as realizations of independent and $\mathrm{N}(\mu, \sigma^2)$-distributed random variables.

Remarks

A faster algorithm for the generation of normally distributed pseudo-random numbers is obtained if additionally a method of rejection sampling is applied, which will be introduced in Section 3.2.3. This method avoids the relatively time-consuming computation of the trigonometric functions in (12).
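Formula (12) can be implemented directly; the helper below is a sketch under the same assumptions (the shift to $(0, 1]$ uniforms keeps the logarithm finite):

```python
import math
import random

def box_muller(n, seed=0):
    """2n standard normal pseudo-random numbers from 2n uniforms via (12)."""
    rng = random.Random(seed)
    ys = []
    for _ in range(n):
        u1 = 1.0 - rng.random()  # uniform on (0, 1]
        u2 = rng.random()
        rad = math.sqrt(-2.0 * math.log(u1))
        ys.append(rad * math.cos(2.0 * math.pi * u2))
        ys.append(rad * math.sin(2.0 * math.pi * u2))
    return ys
```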

3.2.2  Transformation Algorithms for Discrete Distributions

If pseudo-random numbers $x_1, x_2, \ldots$ need to be generated that can be regarded as realizations of discrete random variables $X_1, X_2, \ldots$ taking the values $a_0, a_1, \ldots \in \mathbb{R}$ with probabilities $p_j = P(X_i = a_j) \ge 0$ for $j = 0, 1, \ldots$, then it is sometimes advisable to proceed as follows.

Let $U$ be a $(0, 1]$-uniformly distributed random variable and let the random variable $X$ be given by
$$
X = \begin{cases}
a_0 & \text{if } U < p_0, \\
a_1 & \text{if } p_0 \le U < p_0 + p_1, \\
\;\vdots \\
a_j & \text{if } p_0 + \ldots + p_{j-1} \le U < p_0 + \ldots + p_j, \\
\;\vdots
\end{cases} \qquad (13)
$$
Then $P(X = a_j) = p_j$ for all $j = 0, 1, \ldots$.

The pseudo-random numbers $x_1, \ldots, x_n$, where
$$
x_i = \begin{cases}
a_0 & \text{if } u_i < p_0, \\
a_1 & \text{if } p_0 \le u_i < p_0 + p_1, \\
\;\vdots \\
a_j & \text{if } p_0 + \ldots + p_{j-1} \le u_i < p_0 + \ldots + p_j, \\
\;\vdots
\end{cases}
$$
can thus be regarded as realizations of independent and $\mathbf{p}$-distributed random variables, where $\mathbf{p} = (p_0, p_1, \ldots)^\top$, if $u_1, \ldots, u_n$ are realizations of independent random variables uniformly distributed on $(0, 1]$.

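A minimal Python sketch of the inversion step (13); the helper name `discrete_inverse` and the final fallback return (guarding against floating-point round-off in the cumulative sums) are our own choices.

```python
import random

def discrete_inverse(values, probs, u=None):
    """Transform one uniform number u in (0,1] into a value a_j
    with probability p_j by scanning the cumulative sums as in (13)."""
    if u is None:
        u = random.random()
    cum = 0.0
    for a, p in zip(values, probs):
        cum += p
        if u < cum:
            return a
    return values[-1]  # guard against floating-point round-off
```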
Example (Geometric distribution)
We consider the following values for a_j and the corresponding probabilities p_j. Let a_j = j for j = 0, 1, …, and for 0 < p < 1, q = 1 − p, let
\[
p_j = \begin{cases} 0 & \text{if } j = 0,\\ p\, q^{j-1} & \text{if } j \ge 1. \end{cases}
\]
Then, for all j ≥ 1,
\[
1 - (p_1 + \ldots + p_j) = p_{j+1} + p_{j+2} + \ldots = p \sum_{i=j}^{\infty} q^{i} = q^{j}
\tag{14}
\]
and p_j = q^{j−1} − q^{j}.
Furthermore, we consider the random variable
\[
X = \Bigl\lfloor \frac{\log U}{\log q} \Bigr\rfloor + 1\,,
\tag{15}
\]
where U is a (0,1]-uniformly distributed random variable and ⌊z⌋ denotes the integer part of z. Then P(X = j) = p q^{j−1} for all j = 1, 2, …, i.e. X ∼ Geo(p),


as (14) and (15) imply
\[
\begin{aligned}
X &\overset{(15)}{=} \min\Bigl\{ j \ge 1 : j > \frac{\log U}{\log q} \Bigr\}
= \min\bigl\{ j \ge 1 : j \log q < \log U \bigr\}
= \min\bigl\{ j \ge 1 : q^{j} < U \bigr\} \\
&= \sum_{j=1}^{\infty} j\, 1\!\mathrm{I}\bigl( q^{j} < U \le q^{j-1} \bigr)
\overset{(14)}{=} \sum_{j=1}^{\infty} j\, 1\!\mathrm{I}\bigl( p_1 + \ldots + p_{j-1} \le 1-U < p_1 + \ldots + p_j \bigr)\,,
\end{aligned}
\]
where the random variable 1 − U is also uniformly distributed on (0,1].


The pseudo-random numbers x_1, …, x_n, where
\[
x_i = \Bigl\lfloor \frac{\log u_i}{\log q} \Bigr\rfloor + 1\,,
\]
can thus be regarded as realizations of independent and geometrically distributed random variables X_1, …, X_n ∼ Geo(p), if u_1, …, u_n are realizations of independent random variables U_1, …, U_n that are uniformly distributed on the interval (0,1].
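The transformation (15) can be sketched as follows; the function name `geometric` is an illustrative choice.

```python
import math
import random

def geometric(p, u=None):
    """One Geo(p)-distributed number via formula (15):
    floor(log u / log q) + 1 with q = 1 - p."""
    q = 1.0 - p
    if u is None:
        u = random.random() or 1e-300  # avoid log(0)
    # log u / log q >= 0 since both logarithms are negative,
    # so int() truncation equals the integer part
    return int(math.log(u) / math.log(q)) + 1
```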

For some discrete distributions there are specific transformation algorithms allowing the generation of pseudo-random numbers having this distribution.

Examples
1. Poisson distribution (with small expectation λ)
If λ > 0 is a small number, then the following procedure is appropriate to generate Poisson-distributed pseudo-random numbers, either by transformation of exponentially distributed pseudo-random numbers (as in Section 3.2.1) or directly based on (0,1]-uniformly distributed pseudo-random numbers.
Let the random variables X_1, X_2, … be independent and Exp(λ)-distributed. If we consider the random variable Y = max{k ≥ 0 : X_1 + … + X_k ≤ 1}, formula (11) for the distribution function of the Erlang distribution yields for all j ≥ 0
\[
\begin{aligned}
P(Y = j) &= P(Y \ge j) - P(Y \ge j+1)\\
&= P(X_1 + \ldots + X_j \le 1) - P(X_1 + \ldots + X_{j+1} \le 1)\\
&= \int_0^1 \frac{\lambda e^{-\lambda v} (\lambda v)^{j-1}}{(j-1)!}\, dv \;-\; \int_0^1 \frac{\lambda e^{-\lambda v} (\lambda v)^{j}}{j!}\, dv\\
&= \int_0^1 \frac{d}{dv}\, \frac{e^{-\lambda v} (\lambda v)^{j}}{j!}\, dv\\
&= \frac{e^{-\lambda} \lambda^{j}}{j!}\,.
\end{aligned}
\]
In other words, we obtained Y ∼ Poi(λ).


The pseudo-random numbers y_1, …, y_n, where
\[
y_i = \max\{ k \ge 0 : x_1 + \ldots + x_k \le i \} - (y_1 + \ldots + y_{i-1})\,, \qquad i = 1, \ldots, n\,,
\tag{16}
\]
and
\[
y_i = \max\{ k \ge 0 : u_1 \cdot \ldots \cdot u_k \ge e^{-\lambda i} \} - (y_1 + \ldots + y_{i-1})\,, \qquad i = 1, \ldots, n\,,
\tag{17}
\]
where y_0 = 0 and x_j = −(log u_j)/λ for j = 1, 2, …, can thus be regarded as realizations of independent and Poi(λ)-distributed random variables, if x_1, x_2, … are realizations of independent Exp(λ)-distributed random variables X_1, X_2, … and if u_1, u_2, … are realizations of independent random variables U_1, U_2, … that are uniformly distributed on the interval (0,1], respectively.
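The variant (17) for a single Poi(λ)-distributed number can be sketched as follows; the function name `poisson_small` is ours. The running product of uniform numbers plays the role of u_1 · … · u_k above.

```python
import math
import random

def poisson_small(lam):
    """One Poi(lam)-distributed number: count how many uniform factors
    the running product can absorb before dropping below e^{-lam}."""
    threshold = math.exp(-lam)
    prod, k = 1.0, 0
    while True:
        prod *= random.random()
        if prod < threshold:
            return k
        k += 1
```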
Remarks
As the expectation of the Poi(λ)-distribution is given by λ, the mean number of uniformly distributed pseudo-random numbers necessary in order to generate a new Poi(λ)-distributed pseudo-random number is also approximately λ. For large λ this effort can be reduced if one proceeds as follows.
2. Poisson distribution (with large expectation λ)
If λ > 0 is large, a_j = j and p_j = e^{−λ} λ^j / j! for j = 0, 1, …, then the procedure based directly on the transformation formula (13) is more appropriate to generate Poi(λ)-distributed pseudo-random numbers. The validity of the inequalities
\[
U < p_0\,, \quad p_0 \le U < p_0 + p_1\,, \;\ldots\,,\; p_0 + \ldots + p_{j-1} \le U < p_0 + \ldots + p_j\,, \;\ldots
\tag{18}
\]
needs to be checked in the order defined below.


Note that the recursion formula
\[
p_{j+1} = \frac{\lambda}{j+1}\, p_j\,, \qquad j \ge 0\,,
\]
is applied to calculate the sums P_j = ∑_{k=0}^{j} p_k for j ≥ 0.
Let ⌊λ⌋ > 0 be the integer part of λ. Then it is firstly checked if U < P_{⌊λ⌋}.
If this inequality holds, it is checked if U < P_{⌊λ⌋−1}, U < P_{⌊λ⌋−2}, …, where we define X = min{k : U < P_k}.
If the inequality U < P_{⌊λ⌋} does not hold, then it is checked if U < P_{⌊λ⌋+1}, U < P_{⌊λ⌋+2}, …, and we also define X = min{k : U < P_k}.
For the expectation E V of the necessary number V of checking steps we obtain the approximation
\[
E\,V \;\approx\; 1 + E\,|X - \lambda| \;=\; 1 + \sqrt{\lambda}\; E\,\frac{|X-\lambda|}{\sqrt{\lambda}} \;\approx\; 1 + 0.798\,\sqrt{\lambda}\,,
\]
where the last approximation uses the fact that the random variable (X − λ)/√λ is approximately N(0,1)-distributed for large λ (note that E|Z| = √(2/π) ≈ 0.798 for Z ∼ N(0,1)), for the following reasons. As the Poisson distribution is stable under convolutions, i.e.,
\[
\mathrm{Poi}(\lambda_1) * \ldots * \mathrm{Poi}(\lambda_n) = \mathrm{Poi}\Bigl( \sum_{i=1}^{n} \lambda_i \Bigr)\,,
\]
the random variable X ∼ Poi(λ) can be viewed as the sum ∑_{i=1}^{n} X_i of n independent and Poi(λ/n)-distributed random variables X_i. The last approximation then follows from the central limit theorem for sums of independent and identically distributed random variables; see Theorem WR-5.16.


We observe that for increasing λ the mean number of checking steps only grows with rate √λ if this simulation procedure is applied, whereas for the formerly discussed method generating Poi(λ)-distributed pseudo-random numbers the necessary number of standard pseudo-random numbers grows linearly in λ.
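The search outward from the mode ⌊λ⌋ can be sketched as follows; the function name `poisson_large` and the use of log-space computation of p_{⌊λ⌋} (for numerical stability, via `math.lgamma`) are our own implementation choices.

```python
import math
import random

def poisson_large(lam, u=None):
    """One Poi(lam)-distributed number via inversion (13),
    checking the inequalities (18) outward from floor(lam)."""
    if u is None:
        u = random.random()
    m = int(lam)
    # p_m = e^{-lam} lam^m / m!, computed in log space for stability
    pm = math.exp(-lam + m * math.log(lam) - math.lgamma(m + 1))
    # probs_down = [p_m, p_{m-1}, ..., p_0] via p_{j-1} = j/lam * p_j
    probs_down = [pm]
    pj = pm
    for j in range(m, 0, -1):
        pj = pj * j / lam
        probs_down.append(pj)
    Pk = sum(probs_down)               # P_m = p_0 + ... + p_m
    if u < Pk:
        # downward search: smallest k with u < P_k
        for k in range(m, 0, -1):
            if u >= Pk - probs_down[m - k]:   # u >= P_{k-1}
                return k
            Pk -= probs_down[m - k]
        return 0
    # upward search with the recursion p_{k} = lam/k * p_{k-1}
    pj, k = pm, m
    while True:
        k += 1
        pj = pj * lam / k
        Pk += pj
        if u < Pk:
            return k
```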
3. Binomial distribution
For the generation of binomially distributed pseudo-random numbers one can proceed similarly to the Poisson case. For arbitrary but fixed numbers n ∈ ℕ and p ∈ (0,1), where q = 1 − p, let
\[
a_j = j \qquad\text{and}\qquad p_j = \frac{n!}{j!\,(n-j)!}\; p^{j}\, q^{n-j}\,, \qquad j = 0, 1, \ldots, n\,.
\]
For j = 0, 1, …, n the sums P_j = ∑_{k=0}^{j} p_k are calculated via the recursion formula
\[
p_{j+1} = \frac{n-j}{j+1}\,\frac{p}{q}\; p_j\,, \qquad j = 0, 1, \ldots, n-1\,.
\]

If np > 0 is small, then the validity of the inequalities (18) is checked in the natural order, starting at U < p_0 and defining X = min{k : U < P_k}.
If np is large, then it is more efficient to check the validity of the inequalities (18) in the following order. It is firstly checked if U < P_{⌊np⌋}.
If this inequality holds, it is checked if U < P_{⌊np⌋−1}, U < P_{⌊np⌋−2}, …, where we also define X = min{k : U < P_k}.
If the inequality U < P_{⌊np⌋} does not hold, it is checked if U < P_{⌊np⌋+1}, U < P_{⌊np⌋+2}, …, where we again define X = min{k : U < P_k}.
3.2.3 Acceptance-Rejection Method

In this section we discuss another method for the generation of pseudo-random numbers y_1, y_2, … that can be regarded as realizations of independent and identically distributed random variables Y_1, Y_2, …. Their distribution function is assumed to be given; it is denoted by G.
This method also requires a sequence of independent and identically distributed pseudo-random numbers x_1, x_2, …, but we abandon the condition that they need to be uniformly distributed on (0,1]. The only condition we impose on their distribution function F is that G needs to be absolutely continuous with respect to F with bounded density g(x) = dG(x)/dF(x), i.e., for some constant c > 0, we have
\[
g(x) \le c \qquad\text{and}\qquad G(y) = \int_{-\infty}^{y} g(x)\, dF(x)\,, \qquad x, y \in \mathbb{R}\,.
\tag{19}
\]
First of all we consider the discrete case.


Let a_j = j for all j = 0, 1, …, and let p = (p_0, p_1, …)^⊤ and q = (q_0, q_1, …)^⊤ be two arbitrary probability functions such that for all j = 0, 1, … the equality p_j = 0 implies q_j = 0. Let X : Ω → {0, 1, …} be a random variable with P(X = j) = p_j for all j = 0, 1, …, and let c > 0 be a positive number such that
\[
g(j) = \frac{q_j}{p_j} \le c \qquad\text{for all } j \ge 0 \text{ such that } p_j > 0\,.
\tag{20}
\]

Theorem 3.5
Let (U_1, X_1), (U_2, X_2), … be a sequence of independent and identically distributed random vectors whose components are independent. Furthermore, let U_i be a (0,1]-uniformly distributed random variable and let X_i be distributed according to p. Then the random variable
\[
I = \min\Bigl\{ k \ge 1 : U_k < \frac{q_{X_k}}{c\, p_{X_k}} \Bigr\}
\tag{21}
\]
is geometrically distributed with expectation c, i.e., I ∼ Geo(c^{−1}), and the random variable Y = X_I is distributed according to q.
Proof
By the definition of I given in (21), we obtain for all j ≥ 1
\[
\begin{aligned}
P(I = j) &= P\Bigl( U_1 \ge \frac{q_{X_1}}{c\, p_{X_1}}\,, \ldots, U_{j-1} \ge \frac{q_{X_{j-1}}}{c\, p_{X_{j-1}}}\,,\; U_j < \frac{q_{X_j}}{c\, p_{X_j}} \Bigr)\\
&= P\Bigl( U_1 \ge \frac{q_{X_1}}{c\, p_{X_1}} \Bigr) \cdot \ldots \cdot P\Bigl( U_{j-1} \ge \frac{q_{X_{j-1}}}{c\, p_{X_{j-1}}} \Bigr)\, P\Bigl( U_j < \frac{q_{X_j}}{c\, p_{X_j}} \Bigr)
= \widetilde{p}\; \widetilde{q}^{\,j-1}\,,
\end{aligned}
\]
where q̃ = 1 − p̃ and
\[
\widetilde{p} = P\Bigl( U_1 < \frac{q_{X_1}}{c\, p_{X_1}} \Bigr)
= \sum_{k:\, p_k > 0} P\Bigl( U_1 < \frac{q_{X_1}}{c\, p_{X_1}} \,\Big|\, X_1 = k \Bigr)\, P(X_1 = k)
= \sum_{k:\, p_k > 0} P\Bigl( U_1 < \frac{q_k}{c\, p_k} \Bigr)\, p_k
= \sum_{k:\, p_k > 0} \frac{q_k}{c\, p_k}\, p_k = \frac{1}{c}\,.
\]
This shows I ∼ Geo(c^{−1}).


Furthermore, for any j ≥ 0 such that p_j > 0, we get that
\[
\begin{aligned}
P(Y = j) &= P(X_I = j) = \sum_{k=1}^{\infty} P(X_I = j,\, I = k) = \sum_{k=1}^{\infty} P(X_k = j,\, I = k)\\
&= \sum_{k=1}^{\infty} \widetilde{q}^{\,k-1}\, P(X_k = j)\, P\Bigl( U_k < \frac{q_j}{c\, p_j} \Bigr)
= \sum_{k=1}^{\infty} \widetilde{q}^{\,k-1}\, p_j\, \frac{q_j}{c\, p_j}
= \frac{q_j}{c} \sum_{k=1}^{\infty} \widetilde{q}^{\,k-1}
= \frac{q_j}{c}\, \frac{1}{1 - \widetilde{q}} = q_j\,,
\end{aligned}
\]


and for all j ≥ 0 such that p_j = 0,
\[
P(Y = j) = \sum_{k=1}^{\infty} P(X_k = j,\, I = k) \le \sum_{k=1}^{\infty} P(X_k = j) = 0\,. \qquad \Box
\]

Remarks
Theorem 3.5 implies that the mean number of F-distributed pseudo-random numbers necessary to obtain a G-distributed random number is c. In case there are several alternatives for the choice of the distribution function F, possessing equally nice properties with respect to the generation of F-distributed pseudo-random numbers, one should choose the distribution function with the smallest c.
Furthermore, as a consequence of Theorem 3.5, the values g(x) and g(j) of the density in (19) and (20), respectively, need only be known up to a constant factor.
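The acceptance-rejection loop of Theorem 3.5 can be sketched as follows; `sample_p` is a placeholder for any generator of p-distributed values (for instance the inversion method above), and on average c proposals are consumed per accepted sample.

```python
import random

def accept_reject(sample_p, p, q, c):
    """Discrete acceptance-rejection (Theorem 3.5): draw X ~ p via
    sample_p(), accept X = j with probability q_j / (c * p_j)."""
    while True:
        x = sample_p()
        if random.random() < q[x] / (c * p[x]):
            return x
```

For example, with a uniform proposal p on {0, 1, 2} and target q = (0.5, 0.3, 0.2), the smallest admissible constant is c = max_j q_j/p_j = 1.5.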
In the general (i.e. not necessarily discrete) case one can proceed in a similar way. The following result will serve as foundation for constructing acceptance-rejection algorithms.
Theorem 3.6
Let F, G : ℝ → [0,1] be two arbitrary distribution functions such that (19) holds. Let (U_1, X_1), (U_2, X_2), … be a sequence of independent and identically distributed random vectors whose components are independent. Furthermore, let U_i be a (0,1]-uniformly distributed random variable and let X_i be distributed according to F. Then the random variable
\[
I = \min\Bigl\{ k \ge 1 : U_k < \frac{g(X_k)}{c} \Bigr\}
\tag{22}
\]
is geometrically distributed with expectation c, i.e., I ∼ Geo(c^{−1}), and the random variable Y = X_I is distributed according to G.

Proof
Similarly to the proof of Theorem 3.5 we obtain P(I = j) = p̃ q̃^{j−1} for any j ≥ 1, where
\[
\widetilde{p} = P\Bigl( U_1 < \frac{g(X_1)}{c} \Bigr)
= \int_{\mathbb{R}} P\Bigl( U_1 < \frac{g(X_1)}{c} \,\Big|\, X_1 = x \Bigr)\, dF(x)
= \int_{\mathbb{R}} P\Bigl( U_1 < \frac{g(x)}{c} \Bigr)\, dF(x)
= \int_{\mathbb{R}} \frac{g(x)}{c}\, dF(x) = \frac{1}{c}\,.
\]
Furthermore, for all y ∈ ℝ we have
\[
\begin{aligned}
P(Y \le y) &= P(X_I \le y) = \sum_{k=1}^{\infty} P(X_I \le y,\, I = k) = \sum_{k=1}^{\infty} P(X_k \le y,\, I = k)\\
&= \sum_{k=1}^{\infty} \widetilde{q}^{\,k-1} \int_{-\infty}^{y} P\Bigl( U_k < \frac{g(v)}{c} \Bigr)\, dF(v)
= \frac{1}{1 - \widetilde{q}} \int_{-\infty}^{y} \frac{g(v)}{c}\, dF(v)
= \int_{-\infty}^{y} g(v)\, dF(v) = G(y)\,,
\end{aligned}
\]
where 1 − q̃ = p̃ = c^{−1}. □


In the same way we obtain the following vectorial version of Theorem 3.6.
Theorem 3.7
Let m ≥ 1 be an arbitrary but fixed natural number, let F, G : ℝ^m → [0,1] be two arbitrary distribution functions (of m-dimensional random vectors) and let c > 0 be a constant such that
\[
g(x) \le c \qquad\text{and}\qquad G(y) = \int_{(-\infty,\,y]} g(x)\, dF(x)\,, \qquad x, y \in \mathbb{R}^m\,.
\tag{23}
\]
Let (U_1, X_1), (U_2, X_2), … be a sequence of independent and identically distributed random vectors whose components are also independent. Furthermore, let U_i be a (0,1]-uniformly distributed random variable and let X_i be distributed according to F. Then the random variable
\[
I = \min\Bigl\{ k \ge 1 : U_k < \frac{g(X_k)}{c} \Bigr\}
\tag{24}
\]
is geometrically distributed with expectation c, i.e., I ∼ Geo(c^{−1}), and the random vector Y = X_I is distributed according to G.
Examples
1. Uniform distribution on bounded Borel sets
Let the random vector X : Ω → ℝ^m (with distribution function F) be uniformly distributed on the cube (−1,1]^m and let B ∈ B((−1,1]^m) be an arbitrary Borel subset of (−1,1]^m of positive Lebesgue measure |B|. Then the distribution function G : ℝ^m → [0,1] given by
\[
G(y) = \int_{(-\infty,\,y]} \frac{2^m\, 1\!\mathrm{I}(x \in B)}{|B|}\, dF(x)\,, \qquad y \in \mathbb{R}^m\,,
\]
is the distribution function of the uniform distribution on B. It is absolutely continuous with respect to F and we obtain for the (Radon-Nikodym) density g : ℝ^m → [0,∞) that
\[
g(x) = \frac{2^m\, 1\!\mathrm{I}(x \in B)}{|B|} \le c = \frac{2^m}{|B|}
\qquad\text{and}\qquad
\frac{g(x)}{c} = 1\!\mathrm{I}(x \in B)\,, \qquad x \in \mathbb{R}^m\,.
\]
By Theorem 3.7 we can now generate pseudo-random vectors y_1, y_2, … that are uniformly distributed on B in the following way:
1. Generate m pseudo-random numbers u_1, …, u_m that are uniformly distributed on the interval (0,1].
2. If (2u_1 − 1, …, 2u_m − 1)^⊤ ∉ B, then return to step 1.
3. Otherwise put y = (2u_1 − 1, …, 2u_m − 1)^⊤.
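For m = 2 and B the unit disk, steps 1-3 can be sketched as follows; the function name `uniform_on_disk` is ours.

```python
import random

def uniform_on_disk():
    """Uniform point on the unit disk B = {x1^2 + x2^2 <= 1} by
    acceptance-rejection from the square (-1, 1]^2 (Theorem 3.7)."""
    while True:
        v1 = 2.0 * random.random() - 1.0
        v2 = 2.0 * random.random() - 1.0
        if v1 * v1 + v2 * v2 <= 1.0:
            return v1, v2
```

Here c = 2^m/|B| = 4/π ≈ 1.27, so on average about 1.27 proposals are needed per accepted point.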
2. Normal distribution
As an alternative to the Box-Muller algorithm discussed in Section 3.2.1 we will now introduce another method to generate normally distributed pseudo-random numbers, which is often called the polar method. Notice that the polar method avoids calculating the trigonometric functions in (12).
Let the random vector (V_1, V_2) be uniformly distributed on the unit circle B, where B = {(x_1, x_2) ∈ ℝ² : x_1² + x_2² ≤ 1}. Then, the random vector (Y_1, Y_2), where
\[
Y_1 = \sqrt{-2\log(V_1^2 + V_2^2)}\; \frac{V_1}{\sqrt{V_1^2 + V_2^2}}\,, \qquad
Y_2 = \sqrt{-2\log(V_1^2 + V_2^2)}\; \frac{V_2}{\sqrt{V_1^2 + V_2^2}}\,,
\]
is N(o, I)-distributed, i.e., Y_1, Y_2 are independent and N(0,1)-distributed random variables. This can be seen as follows.
By the substitution v_1 = r cos φ, v_2 = r sin φ, i.e. by a transformation into polar coordinates, we obtain for arbitrary y_1, y_2 ∈ ℝ
\[
\begin{aligned}
P(Y_1 \le y_1,\, Y_2 \le y_2)
&= \frac{1}{\pi} \int_B 1\!\mathrm{I}\Bigl( \frac{v_1 \sqrt{-2\log(v_1^2+v_2^2)}}{\sqrt{v_1^2+v_2^2}} \le y_1\,,\; \frac{v_2 \sqrt{-2\log(v_1^2+v_2^2)}}{\sqrt{v_1^2+v_2^2}} \le y_2 \Bigr)\, d(v_1, v_2)\\
&= \frac{1}{\pi} \int_0^{2\pi}\!\! \int_0^1 r\, 1\!\mathrm{I}\bigl( \sqrt{-2\log(r^2)}\, \cos\varphi \le y_1\,,\; \sqrt{-2\log(r^2)}\, \sin\varphi \le y_2 \bigr)\, dr\, d\varphi\\
&= \frac{1}{2}\,\frac{1}{2\pi} \int_0^{2\pi}\!\! \int_0^\infty 1\!\mathrm{I}\bigl( \sqrt{x}\, \cos\varphi \le y_1\,,\; \sqrt{x}\, \sin\varphi \le y_2 \bigr)\, e^{-x/2}\, dx\, d\varphi\,,
\end{aligned}
\]
where the last equality results from the substitution
\[
x = -2\log(r^2)\,, \qquad\text{i.e.}\qquad r\,dr = \tfrac{1}{4}\, e^{-x/2}\, dx
\]
(with the orientation of the r-integral reversed). By the same argument that was used to verify formula (12) in Section 3.2.1 one can check that the last term can be written as the product Φ(y_1)Φ(y_2) of two N(0,1)-distribution functions.
The pseudo-random numbers y_1, …, y_{2n} with
\[
y_{2k-1} = \sqrt{-2\log(v_{2k-1}^2 + v_{2k}^2)}\; \frac{v_{2k-1}}{\sqrt{v_{2k-1}^2 + v_{2k}^2}}\,, \qquad
y_{2k} = \sqrt{-2\log(v_{2k-1}^2 + v_{2k}^2)}\; \frac{v_{2k}}{\sqrt{v_{2k-1}^2 + v_{2k}^2}}
\]
can thus be regarded as realizations of independent and N(0,1)-distributed random variables, if (v_1, v_2), …, (v_{2n−1}, v_{2n}) are realizations of the random vectors (V_1, V_2), …, (V_{2n−1}, V_{2n}) that are independent and uniformly distributed on the unit circle B = {(x_1, x_2) ∈ ℝ² : x_1² + x_2² ≤ 1}. Those can be generated via acceptance-rejection sampling as explained in the last example.
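Combining the rejection step with the transformation above gives the following sketch of the polar method; the function name `polar_pair` is ours, and the degenerate case s = 0 is skipped to avoid log(0).

```python
import math
import random

def polar_pair():
    """One pair of independent N(0,1) numbers via the polar method:
    (v1, v2) uniform on the unit disk, then the transformation above."""
    while True:
        v1 = 2.0 * random.random() - 1.0
        v2 = 2.0 * random.random() - 1.0
        s = v1 * v1 + v2 * v2
        if 0.0 < s <= 1.0:
            factor = math.sqrt(-2.0 * math.log(s)) / math.sqrt(s)
            return v1 * factor, v2 * factor
```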

3.2.4 Quotients of Uniformly Distributed Random Variables

In many cases random variables having absolutely continuous distributions can be represented as quotients of
uniformly distributed random variables.
Combined with acceptancerejection sampling (see Section 3.2.3) this yields another type of simulation
algorithm.
The mathematical foundation for this type of algorithm is the following transformation theorem for the
density of absolutely continuous random vectors.


Theorem 3.8
Let X = (X_1, …, X_n)^⊤ : Ω → ℝ^n be an absolutely continuous random vector with joint density f_X : ℝ^n → [0,∞) and let φ = (φ_1, …, φ_n) : ℝ^n → ℝ^n be a Borel-measurable function with continuous partial derivatives ∂φ_i/∂x_j (x_1, …, x_n). Let now the Borel set C ∈ B(ℝ^n) be picked in a way such that
\[
\{ x \in \mathbb{R}^n : f_X(x) \ne 0 \} \subset C
\qquad\text{and}\qquad
\det\Bigl( \frac{\partial \varphi_i}{\partial x_j}(x_1, \ldots, x_n) \Bigr) \ne 0\,, \qquad x = (x_1, \ldots, x_n) \in C\,,
\]
which ensures that the restriction φ : C → D of φ to the set C is a bijection, where D = {φ(x) : x ∈ C} denotes the image of C under φ. Let φ^{−1} = (φ_1^{−1}, …, φ_n^{−1}) : D → C be the inverse of φ : C → D.
Then the random vector Y = φ(X) is also absolutely continuous and the density f_Y(y) of Y is given by
\[
f_Y(y) = \begin{cases}
f_X\bigl( \varphi_1^{-1}(y), \ldots, \varphi_n^{-1}(y) \bigr)\, \Bigl| \det\Bigl( \dfrac{\partial \varphi_i^{-1}}{\partial y_j}(y_1, \ldots, y_n) \Bigr) \Bigr| & \text{if } y = (y_1, \ldots, y_n) \in D,\\
0 & \text{if } y \notin D,
\end{cases}
\tag{25}
\]
which is the same as
\[
f_Y(y) = \begin{cases}
f_X\bigl( \varphi_1^{-1}(y), \ldots, \varphi_n^{-1}(y) \bigr)\, \Bigl| \det\Bigl( \dfrac{\partial \varphi_i}{\partial x_j}\bigl( \varphi^{-1}(y_1, \ldots, y_n) \bigr) \Bigr) \Bigr|^{-1} & \text{if } y = (y_1, \ldots, y_n) \in D,\\
0 & \text{if } y \notin D.
\end{cases}
\tag{26}
\]

From Theorem 3.8 we obtain the following result concerning the representation of absolutely continuous random variables as quotients of uniformly distributed random variables.
Theorem 3.9
Let f′ : ℝ → [0,∞) be Borel measurable and bounded such that
\[
0 < \int_{\mathbb{R}} f'(x)\, dx < \infty
\qquad\text{and}\qquad
\sup_{x \in \mathbb{R}} |x| \sqrt{f'(x)} < \infty\,.
\tag{27}
\]
Let the random vector (V_1, V_2) be uniformly distributed on the (bounded) Borel set
\[
B = \bigl\{ (x_1, x_2) \in \mathbb{R}^2 : 0 < x_1 < \sqrt{f'(x_2/x_1)} \bigr\}\,.
\tag{28}
\]
Then the quotient V_2/V_1 is an absolutely continuous random variable with density f : ℝ → [0,∞), where
\[
f(x) = \frac{f'(x)}{\int_{\mathbb{R}} f'(y)\, dy}\,, \qquad x \in \mathbb{R}\,.
\]

Proof
Notice that (27) implies that the Borel set B defined in (28) is bounded, i.e. 0 < |B| < ∞. This is due to the following reasons.
For x_2 > 0 the inequality x_1 < √(f′(x_2/x_1)) is equivalent to x_2 < (x_2/x_1) √(f′(x_2/x_1)). If on the other hand x_2 < 0, it is equivalent to x_2 > (x_2/x_1) √(f′(x_2/x_1)). Therefore
\[
B \subset \Bigl[ 0,\, \sup_{x \in \mathbb{R}} \sqrt{f'(x)} \Bigr] \times \Bigl[ \inf_{x < 0} x \sqrt{f'(x)}\,,\; \sup_{x > 0} x \sqrt{f'(x)} \Bigr]
\tag{29}
\]
and
\[
B \subset \Bigl[ 0,\, \sup_{x \in \mathbb{R}} \sqrt{f'(x)} \Bigr] \times \Bigl[ -\sup_{x \in \mathbb{R}} |x| \sqrt{f'(x)}\,,\; \sup_{x \in \mathbb{R}} |x| \sqrt{f'(x)} \Bigr]\,.
\]
The following joint density f_{(V_1,V_2)}(v_1, v_2) of the random vector (V_1, V_2) is thus well defined:
\[
f_{(V_1,V_2)}(v_1, v_2) = |B|^{-1}\, 1\!\mathrm{I}\bigl( 0 < v_1 < \sqrt{f'(v_2/v_1)} \bigr)\,.
\]


The function φ : C → C, where C = (0,∞) × ℝ and φ(x_1, x_2) = (x_1, x_2/x_1), is a bijection of C onto itself and its functional determinant is given by
\[
\det\Bigl( \frac{\partial \varphi_i}{\partial x_j}(x_1, x_2) \Bigr)
= \det\begin{pmatrix} 1 & 0 \\[1ex] -\dfrac{x_2}{x_1^2} & \dfrac{1}{x_1} \end{pmatrix}
= \frac{1}{x_1}\,, \qquad (x_1, x_2) \in C\,.
\]
Theorem 3.8 therefore implies that the density f_{(V_1, V_2/V_1)}(y_1, y_2) of the random vector (V_1, V_2/V_1)^⊤ has the following form:
\[
f_{(V_1, V_2/V_1)}(y_1, y_2) = |B|^{-1}\, y_1\, 1\!\mathrm{I}\bigl( 0 < y_1 < \sqrt{f'(y_2)} \bigr)\,.
\]
Moreover, the marginal density f_{V_2/V_1}(y_2) of the second component V_2/V_1 of (V_1, V_2/V_1)^⊤ is given by
\[
f_{V_2/V_1}(y_2) = |B|^{-1} \int_0^{\sqrt{f'(y_2)}} y_1\, dy_1 = \frac{f'(y_2)}{2\,|B|}\,. \qquad \Box
\]

Example (Normal distribution)
Theorem 3.9 yields a third method to generate N(0,1)-distributed pseudo-random numbers (as an alternative to the Box-Muller algorithm from Section 3.2.1 and the polar method explained in Section 3.2.3). Consider the function f′ : ℝ → [0,∞) where f′(x) = exp(−x²/2) for all x ∈ ℝ. For the bounds in (29) we obtain:
\[
\sup_{x \in \mathbb{R}} \sqrt{f'(x)} = 1\,, \qquad
\inf_{x < 0} x \sqrt{f'(x)} = -\sqrt{2/e}\,, \qquad
\sup_{x > 0} x \sqrt{f'(x)} = \sqrt{2/e}\,.
\]
According to Theorem 3.9 a sequence x_1, x_2, … of N(0,1)-distributed pseudo-random numbers can now be generated as follows.
1. Generate a (0,1]-uniformly distributed pseudo-random number u and a (−√(2/e), √(2/e)]-uniformly distributed pseudo-random number v.
2. If u ≥ exp(−v²/(4u²)), i.e., if log u ≥ −v²/(4u²), i.e., v² ≥ −4u² log u, then return to step 1.
3. Otherwise put x = v/u.
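Steps 1-3 of this ratio-of-uniforms scheme can be sketched as follows; the function name `ratio_of_uniforms_normal` is ours.

```python
import math
import random

def ratio_of_uniforms_normal():
    """One N(0,1) number via Theorem 3.9 with f'(x) = exp(-x^2/2):
    (u, v) uniform on (0,1] x (-sqrt(2/e), sqrt(2/e)],
    accept if v^2 < -4 u^2 log u."""
    b = math.sqrt(2.0 / math.e)
    while True:
        u = random.random()
        if u == 0.0:
            continue  # avoid log(0)
        v = (2.0 * random.random() - 1.0) * b
        if v * v < -4.0 * u * u * math.log(u):
            return v / u
```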

3.3 Simulation Methods Based on Markov Chains

Let E be an arbitrary finite set, e.g. a family of possible digital binary or greyscale images x = (x(v), v ∈ V), where V is a finite set of pixels and every pixel v ∈ V in the observation window V gets mapped to a greyscale value x(v) ≥ 0, resulting in a matrix (x(v), v ∈ V) that has certain properties.
Let π : E → (0,1) be an arbitrary probability function, i.e.
\[
\sum_{x \in E} \pi_x = 1 \qquad\text{and}\qquad \pi_x > 0\,, \qquad x \in E\,.
\]

If the number |E| of elements in E is large, the inversion method discussed in Section 3.2 as well as acceptance-rejection sampling are inefficient algorithms for the generation of pseudo-random numbers x_1, x_2, … in E that are distributed according to π.
Remarks
An alternative simulation method is based on constructing a Markov chain X_0, X_1, … with state space E and an (appropriately chosen) irreducible and aperiodic transition matrix P, such that π is the ergodic limit distribution of the Markov chain. For sufficiently large n, X_n is approximately π-distributed and can thus serve as an efficient tool for the generation of (approximately) π-distributed pseudo-random elements in E. Therefore one also uses the term Markov-Chain-Monte-Carlo simulation (MCMC).

3.3.1 Example: Hard-Core Model
(see O. Häggström (2002) Finite Markov Chains and Algorithmic Applications. Cambridge University Press, Cambridge)
We consider a connected graph G = (V, K) with finitely many vertices V = {v_1, …, v_{|V|}} and a certain set K ⊂ V × V of edges, each of them connecting two vertices. Each vertex in V gets either mapped to 0 or 1, where we consider the following set E ⊂ {0,1}^{|V|} of admissible configurations, characterized by the property that pairs of connected vertices are not allowed to obtain the value 1 on both vertices; see also Figure 6.
As we want to pick one of the admissible configurations x ∈ E at random, we consider the (discrete) uniform distribution on E, i.e.
\[
\pi_x = \frac{1}{\ell}\,, \qquad x \in E\,,
\tag{30}
\]
where ℓ = |E| denotes the number of all admissible configurations.


Figure 6: Lattice G of size 8 × 8; black pixels correspond to the value 1


If the numbers |V| and |K| of vertices and edges, respectively, of the connected graph G = (V, K) are large, the explicit description of the set E of admissible configurations will cause difficulties. Therefore, the number ℓ of all admissible configurations is typically unknown. Consequently, formula (30) cannot be applied directly for the simulation of randomly picked admissible configurations.

MCMC Simulation Algorithm
Alternatively, a Markov chain X_0, X_1, … can be constructed that has the state space E and an (appropriately chosen) irreducible and aperiodic transition matrix P, such that the ergodic limit distribution π is given by (30). Then we generate a path x_0, x_1, … of the Markov chain using the recursive construction of Markov chains that has been discussed in Section 2.1.3:
1. Pick an admissible initial configuration x_0 ∈ E.
2. Pick an arbitrary vertex v ∈ V at random and toss a fair coin.
3. If the event "head" occurs and if x_n(w) = 0 for all vertices w ∈ V connected to v ∈ V, then set x_{n+1}(v) = 1; else set x_{n+1}(v) = 0.
4. The values of all other vertices w ≠ v are not changed, i.e., x_{n+1}(w) = x_n(w) for all w ≠ v.
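Steps 2-4 can be sketched as follows; the representation of a configuration as a dict and the function names are our own choices.

```python
import random

def hardcore_step(x, neighbors):
    """One update step of the hard-core chain (steps 2-4 above).
    x: dict vertex -> 0/1; neighbors: dict vertex -> adjacent vertices."""
    v = random.choice(list(x))                       # random vertex
    heads = random.random() < 0.5                    # fair coin
    if heads and all(x[w] == 0 for w in neighbors[v]):
        x[v] = 1
    else:
        x[v] = 0
    return x

def hardcore_chain(neighbors, n_steps):
    """Run the chain from the (admissible) all-zero configuration."""
    x = {v: 0 for v in neighbors}
    for _ in range(n_steps):
        hardcore_step(x, neighbors)
    return x
```

By construction every visited configuration stays admissible: a vertex is only set to 1 when all its neighbours carry the value 0.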
Remarks
In order to implement steps 2-4 of this algorithm, the update function φ : E × (0,1] → E considered in (2.19) needs to be specified. For this purpose the unit interval (0,1] is divided into 2|V| parts of equal length 1/(2|V|) that correspond to the events (v_1, head), (v_1, tail), …, (v_{|V|}, head), (v_{|V|}, tail).


Then x′ = φ(x, z), where
\[
x'(v_i) = \begin{cases}
1 & \text{if } z \in \Bigl( \dfrac{2i-2}{2|V|},\, \dfrac{2i-1}{2|V|} \Bigr] \text{ and } x(w) = 0 \text{ for all vertices } w \in V \text{ connected to } v_i \in V,\\[2ex]
0 & \text{if } z \in \Bigl( \dfrac{2i-1}{2|V|},\, \dfrac{2i}{2|V|} \Bigr], \text{ or } z \in \Bigl( \dfrac{2i-2}{2|V|},\, \dfrac{2i-1}{2|V|} \Bigr] \text{ and } x(w) = 0 \text{ does not hold for all vertices } w \in V \text{ connected to } v_i,\\[2ex]
x(v_i) & \text{if } z \notin \Bigl( \dfrac{2i-2}{2|V|},\, \dfrac{2i}{2|V|} \Bigr].
\end{cases}
\tag{31}
\]
The following theorem implies that for sufficiently large n the return x_n = (x_n(v), v ∈ V) of the algorithm can be regarded as a configuration that has been approximately picked according to the distribution π.

Theorem 3.10
Let P = (p_{xx′}) be the transition matrix of the MCMC algorithm given by (31) simulating the hard-core model, and let π be the probability function given in (30). Then P is irreducible and aperiodic, and the pair (P, π) is reversible.
Proof
In order to show that P = (p_{xx′}) is aperiodic it suffices to note that all diagonal elements p_{xx} of P are positive.
The following considerations show that P is also irreducible. Let x, x′ ∈ E be two admissible configurations and let m(x) and m(x′) denote the number of vertices set to 1 in x and x′, respectively. First we observe that the transition x → x_0 to the zero configuration x_0 ∈ E, where x_0(v) = 0 for all v ∈ V, is possible in m(x) steps with positive probability. For this transition all vertices that were originally set to 1 are subsequently set to 0; each of these steps happens with positive probability. Afterwards, in a similar way the chain can transfer from the zero state x_0 to the state x′ taking m(x′) steps, where each of them again happens with positive probability. Thus the transition x → x′ in a finite number of steps is possible with positive probability.
It is left to check that the detailed balance equation (2.85) holds, i.e.
\[
\pi_x\, p_{xx'} = \pi_{x'}\, p_{x'x}\,, \qquad x, x' \in E\,.
\tag{32}
\]
If the configurations x, x′ ∈ E coincide then (32) obviously holds.
If x(v) ≠ x′(v) for more than one vertex v ∈ V, then p_{xx′} = p_{x′x} = 0 and thus (32) also holds in this case.
Let now x(v) ≠ x′(v) for exactly one v ∈ V (and hence x(w) = x′(w) for all w ≠ v). Then x(w) = x′(w) = 0 for all vertices w ≠ v connected to v, and consequently
\[
\pi_x\, p_{xx'} = \frac{1}{\ell}\, \frac{1}{2|V|} = \pi_{x'}\, p_{x'x}\,. \qquad \Box
\]


Remarks
For all x ∈ E let m(x) be the number of vertices of the admissible configuration x that are set to 1. If the admissible configuration is picked at random, then the expectation E Y of the random number Y of vertices set to 1 is given as
\[
E\,Y = \frac{1}{\ell} \sum_{x \in E} m(x)\,.
\tag{33}
\]
If ℓ is large, the direct calculation of the expectation E Y via formula (33) is in general not possible because it is difficult to determine the numbers m(x) analytically. A method to approximate the expectation E Y is based on generating k randomly picked admissible configurations x_n^{(1)}, x_n^{(2)}, …, x_n^{(k)} ∈ E by k runs of the MCMC simulation algorithm described above.
As a consequence of the strong law of large numbers, the arithmetic mean (m(x_n^{(1)}) + m(x_n^{(2)}) + … + m(x_n^{(k)}))/k is close to E Y with high probability if the run length n and the sample size k are sufficiently large.

3.3.2 Gibbs Sampler

The MCMC algorithm for the generation of randomly picked admissible configurations of the hard-core model (see Section 3.3.1) is a special case of a so-called Gibbs sampler for the simulation of discrete (high-dimensional) random vectors.
Let V be a finite (nonempty) index set and let X = (X(v), v ∈ V) be a discrete random vector taking values in the finite state space E ⊂ ℝ^{|V|} with probability 1, where we assume that for every pair x, x′ ∈ E there is a finite sequence of states y_0, y_1, …, y_n ∈ E such that
\[
y_0 = x\,, \qquad y_n = x' \qquad\text{and}\qquad \#\{ v \in V : y_i(v) \ne y_{i+1}(v) \} = 1\,, \qquad i = 0, \ldots, n-1\,.
\tag{34}
\]

Let π = (π_x, x ∈ E) be the probability function of the random vector X with π_x > 0 for all x ∈ E, and for all v ∈ V let
\[
\pi_{x(v)\,|\,x(-v)} = P\bigl( X(v) = x(v) \mid X(-v) = x(-v) \bigr)
\tag{35}
\]
denote the conditional probability that the component X(v) of X has the value x(v), given that the vector X(−v) = (X(w), w ∈ V \ {v}) of the other components equals x(−v), where we assume (x(v), x(−v)) ∈ E.

MCMC Simulation Algorithm
Similar to Section 3.3.1, we construct a Markov chain X_0, X_1, … with state space E and an (appropriately chosen) irreducible and aperiodic transition matrix P, such that π is the ergodic limit distribution of X_0, X_1, …. Then we generate a path x_0, x_1, … of the Markov chain by the recursive construction discussed in Section 2.1.3:
1. Pick an initial state x_0 ∈ E.
2. Pick a component v ∈ V according to a given probability function q = (q_v, v ∈ V) such that q_v > 0 for all v ∈ V.
3. Generate the update x_{n+1}(v) of the v-th component according to the (conditional) probability function π_{· | x_n(−v)}, i.e., choose the value x(v) with probability π_{x(v) | x_n(−v)} among all x(v) such that (x(v), x_n(−v)) ∈ E.
4. The values of all components w ≠ v are not changed, i.e. x_{n+1}(w) = x_n(w) for all w ≠ v.
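For a toy state space small enough to enumerate, one random-scan update (steps 2-4) can be sketched as follows; the brute-force computation of the full conditional by scanning all states is only meant for illustration, not for high-dimensional use.

```python
import random

def gibbs_step(x, pi, q_weights, components):
    """One random-scan Gibbs update (steps 2-4 above).
    x: current state as a tuple; pi: dict state -> probability;
    q_weights: selection probabilities q_v; components: index list V."""
    v = random.choices(components, weights=q_weights)[0]
    # full conditional: all states agreeing with x off component v
    compatible = [z for z in pi
                  if all(z[w] == x[w] for w in components if w != v)]
    weights = [pi[z] for z in compatible]
    return random.choices(compatible, weights=weights)[0]
```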
Theorem 3.11
Let the transition matrix P = (p_{xx′}) be given as
\[
p_{xx'} = \sum_{v \in V} q_v\, \pi_{x'(v)\,|\,x(-v)}\, 1\!\mathrm{I}\bigl( x(-v) = x'(-v) \bigr)\,, \qquad x, x' \in E\,,
\tag{36}
\]
where the conditional probabilities π_{x′(v) | x(−v)} are defined in (35). Then P is irreducible and aperiodic, and the pair (P, π) is reversible.
Proof
The assertion can be proved similarly to the proof of Theorem 3.10.
In order to see that P = (p_{xx′}) is aperiodic it suffices to notice that for all x ∈ E
\[
p_{xx} = \sum_{v \in V} q_v\, \pi_{x(v)\,|\,x(-v)} = \sum_{v \in V} q_v\, \frac{\pi_x}{\displaystyle\sum_{z \in E:\, z(-v) = x(-v)} \pi_z} > 0
\]
and hence all diagonal elements p_{xx} of P are positive.
The following considerations show that P is irreducible. For arbitrary but fixed x, x′ ∈ E let k ≤ |V| be the number of components v ∈ V such that x(v) ≠ x′(v). For k = 0, i.e. x = x′, we already showed while proving the aperiodicity that p_{xx} > 0. Let now k > 0. Without loss of generality we may assume that the components are linearly ordered and that the first k components of x and x′ differ. By hypothesis (34) the state space E contains a sequence y_0, …, y_k ∈ E such that y_0 = x and
\[
y_1 = \bigl( x'(v_1), x(v_2), \ldots, x(v_{|V|}) \bigr)\,, \;\ldots\,,\;
y_k = \bigl( x'(v_1), \ldots, x'(v_k), x(v_{k+1}), \ldots, x(v_{|V|}) \bigr) = x'\,.
\]
Moreover, for each i = 0, …, k−1,
\[
p_{y_i y_{i+1}} = q_{v_{i+1}}\, \pi_{y_{i+1}(v_{i+1})\,|\,y_i(-v_{i+1})}
= q_{v_{i+1}}\, \frac{\pi_{y_{i+1}}}{\displaystyle\sum_{z \in E:\, z(-v_{i+1}) = y_i(-v_{i+1})} \pi_z} > 0
\tag{37}
\]
and thus p^{(k)}_{xx′} ≥ ∏_{i=0}^{k−1} p_{y_i y_{i+1}} > 0.

It is left to show that the detailed balance equation (2.85) holds, i.e.
\[
\pi_x\, p_{xx'} = \pi_{x'}\, p_{x'x}\,, \qquad x, x' \in E\,.
\tag{38}
\]
If x = x′, then (38) obviously holds.
If x(v) ≠ x′(v) for more than one component v ∈ V, then p_{xx′} = p_{x′x} = 0 and hence (38) holds.
Let now x(v) ≠ x′(v) for exactly one v ∈ V (and hence x(w) = x′(w) for all w ≠ v). Then (37) implies
\[
\pi_x\, p_{xx'}
= \pi_x\, \frac{q_v\, \pi_{x'}}{\displaystyle\sum_{z \in E:\, z(-v) = x(-v)} \pi_z}
= \pi_{x'}\, \frac{q_v\, \pi_x}{\displaystyle\sum_{z \in E:\, z(-v) = x'(-v)} \pi_z}
= \pi_{x'}\, p_{x'x}\,. \qquad \Box
\]


Let X_0, X_1, … be a Markov chain with state space E and the transition matrix P = (p_{xx′}) given by (36). As a consequence of Theorem 3.11 we get that in this case
\[
\lim_{n \to \infty} d_{\mathrm{TV}}(\alpha_n, \pi) = 0
\tag{39}
\]
for any initial distribution α_0, where α_n denotes the distribution of X_n. Furthermore, the Gibbs sampler shows the following monotonic behavior.
Theorem 3.12
For all n = 0, 1, …,
\[
d_{\mathrm{TV}}(\alpha_n, \pi) \;\ge\; d_{\mathrm{TV}}(\alpha_{n+1}, \pi)\,.
\tag{40}
\]

Proof
For arbitrary v ∈ V and x′ ∈ E, formula (35) implies
\[
\sum_{x \in E:\, x(-v) = x'(-v)} \pi_{x'(v)\,|\,x(-v)}\, \pi_x
\;\overset{(35)}{=}\; \sum_{x \in E:\, x(-v) = x'(-v)} \pi_{x'}\, \pi_{x(v)\,|\,x'(-v)}
\;=\; \pi_{x'} \underbrace{\sum_{x \in E:\, x(-v) = x'(-v)} \pi_{x(v)\,|\,x'(-v)}}_{=1} \;=\; \pi_{x'}\,.
\tag{41}
\]
Using this and the definition (36) of the transition matrix P = (p_{xx′}) we obtain
\[
\begin{aligned}
2\, d_{\mathrm{TV}}(\alpha_{n+1}, \pi)
&= \sum_{x' \in E} \bigl| \alpha_{n+1, x'} - \pi_{x'} \bigr|
= \sum_{x' \in E} \Bigl| \sum_{x \in E} \alpha_{n,x}\, p_{xx'} - \pi_{x'} \Bigr|\\
&\overset{(36)}{=} \sum_{x' \in E} \Bigl| \sum_{x \in E} \alpha_{n,x} \sum_{v \in V} q_v\, \pi_{x'(v)\,|\,x(-v)}\, 1\!\mathrm{I}\bigl( x(-v) = x'(-v) \bigr) - \pi_{x'} \Bigr|\\
&\overset{(41)}{=} \sum_{x' \in E} \Bigl| \sum_{v \in V} q_v \sum_{x \in E:\, x(-v) = x'(-v)} \pi_{x'(v)\,|\,x(-v)}\, \bigl( \alpha_{n,x} - \pi_x \bigr) \Bigr|\\
&\le \sum_{x' \in E} \sum_{v \in V} q_v \sum_{x \in E:\, x(-v) = x'(-v)} \pi_{x'(v)\,|\,x(-v)}\, \bigl| \alpha_{n,x} - \pi_x \bigr|\\
&= \sum_{x \in E} \sum_{v \in V} q_v\, \bigl| \alpha_{n,x} - \pi_x \bigr| \underbrace{\sum_{x' \in E:\, x'(-v) = x(-v)} \pi_{x'(v)\,|\,x(-v)}}_{=1}\\
&= \sum_{x \in E} \bigl| \alpha_{n,x} - \pi_x \bigr| = 2\, d_{\mathrm{TV}}(\alpha_n, \pi)\,. \qquad \Box
\end{aligned}
\]

Remarks
A modified version of the Gibbs sampler considered in this section is the so-called cyclic Gibbs sampler, which uses a different procedure for picking the component v ∈ V that will be updated. Namely, it is not chosen according to a (given) probability function q = (q_v, v ∈ V), where q_v > 0 for all v ∈ V; instead, the components v ∈ V are sorted linearly and chosen one after another according to this order. The selection of the update candidates thus becomes a deterministic procedure.

If k = n|V| + i for some numbers n = 0, 1, … and i = 1, …, |V|, then the matrix P(k) = (p_{xx′}(k)) of the transition probabilities p_{xx′}(k) in step k is given as
\[
p_{xx'}(k) = \pi_{x'(v_i)\,|\,x(-v_i)}\, 1\!\mathrm{I}\bigl( x(-v_i) = x'(-v_i) \bigr)\,, \qquad x, x' \in E\,.
\tag{42}
\]
For an entire (scan) cycle, updating each component exactly once, one obtains the following transition matrix:
\[
\mathbf{P} = \mathbf{P}(1) \cdot \ldots \cdot \mathbf{P}(|V|)\,.
\tag{43}
\]
It is easy to show that the matrix P = (p_{xx′}) given by (42) and (43) is irreducible and aperiodic and that π is the stationary (limit) distribution of P, as for all i = 1, …, |V| and for all x′ ∈ E formulae (41) and (42) imply that
\[
\sum_{x \in E} \pi_x\, p_{xx'}(i)
\;\overset{(42)}{=}\; \sum_{x \in E} \pi_x\, \pi_{x'(v_i)\,|\,x(-v_i)}\, 1\!\mathrm{I}\bigl( x(-v_i) = x'(-v_i) \bigr)
\;\overset{(41)}{=}\; \pi_{x'}
\]
and hence also
\[
\sum_{x \in E} \pi_x\, p_{xx'} = \pi_{x'}\,.
\]

The pair (P, π) is in general not reversible. However, in Section 2.3.4 we showed that the pair (M, π) is reversible, where
\[
\mathbf{M} = \mathbf{P}\, \widetilde{\mathbf{P}}
\qquad\text{for}\qquad
\widetilde{\mathbf{P}} = \mathrm{diag}\bigl( \pi_x^{-1} \bigr)\, \mathbf{P}^{\top}\, \mathrm{diag}\bigl( \pi_x \bigr)
\tag{44}
\]
denotes the multiplicative reversible version of P.
Theorem 3.13
The matrix M has the following representation:
\[
\mathbf{M} = \mathbf{P}(1) \cdot \ldots \cdot \mathbf{P}(|V|)\; \mathbf{P}(|V|) \cdot \ldots \cdot \mathbf{P}(1)\,,
\tag{45}
\]
i.e., the multiplicative reversible version M of the forward-scan matrix P coincides with the forward-backward scan matrix.
Proof
It suffices to show that P̃ = P(|V|) · … · P(1) for the matrix P̃ = (p̃_{xx′}) defined by (44). Formulae (42)-(44) imply for arbitrary x, x′ ∈ E
\[
\begin{aligned}
\widetilde{p}_{xx'}
&= \Bigl( \mathrm{diag}\bigl( \pi_x^{-1} \bigr)\, \mathbf{P}^{\top}\, \mathrm{diag}\bigl( \pi_x \bigr) \Bigr)_{xx'}
= \Bigl( \mathrm{diag}\bigl( \pi_x^{-1} \bigr)\, \mathbf{P}^{\top}(|V|) \cdot \ldots \cdot \mathbf{P}^{\top}(1)\, \mathrm{diag}\bigl( \pi_x \bigr) \Bigr)_{xx'}\\
&= \frac{1}{\pi_x} \sum_{y_1, \ldots, y_{|V|-1} \in E}
\pi_{x(v_{|V|})\,|\,y_1(-v_{|V|})}\, 1\!\mathrm{I}\bigl( x(-v_{|V|}) = y_1(-v_{|V|}) \bigr)\;
\pi_{y_1(v_{|V|-1})\,|\,y_2(-v_{|V|-1})}\, 1\!\mathrm{I}\bigl( y_1(-v_{|V|-1}) = y_2(-v_{|V|-1}) \bigr)
\cdot \ldots \cdot
\pi_{y_{|V|-1}(v_1)\,|\,x'(-v_1)}\, 1\!\mathrm{I}\bigl( y_{|V|-1}(-v_1) = x'(-v_1) \bigr)\; \pi_{x'}\,.
\end{aligned}
\]
This and (35) yield (similarly to the proof of (38))
\[
\begin{aligned}
\widetilde{p}_{xx'}
&= \sum_{y_1, \ldots, y_{|V|-1} \in E}
\pi_{y_1(v_{|V|})\,|\,x(-v_{|V|})}\, 1\!\mathrm{I}\bigl( x(-v_{|V|}) = y_1(-v_{|V|}) \bigr)\;
\pi_{y_2(v_{|V|-1})\,|\,y_1(-v_{|V|-1})}\, 1\!\mathrm{I}\bigl( y_1(-v_{|V|-1}) = y_2(-v_{|V|-1}) \bigr)
\cdot \ldots \cdot
\pi_{x'(v_1)\,|\,y_{|V|-1}(-v_1)}\, 1\!\mathrm{I}\bigl( y_{|V|-1}(-v_1) = x'(-v_1) \bigr)\\
&= \bigl( \mathbf{P}(|V|) \cdot \ldots \cdot \mathbf{P}(1) \bigr)_{xx'}\,. \qquad \Box
\end{aligned}
\]

Remarks
    - If Gibbs samplers are used in practice, it is always assumed that the conditional probabilities
      considered in (36) and (42),

          \pi(x(v) \mid x(-v)) = P\bigl( X(v) = x(v) \mid X(-v) = x(-v) \bigr) ,

      only depend on the vector (x(w), w \in N(v)) of the values attained by the random vector
      X = (X(w), w \in V) in a certain small neighborhood N(v) \subset V of v \in V.
    - The family N = {N(v), v \in V} of subsets of V is called a system of neighborhoods if for
      arbitrary v, w \in V
      (a) v \notin N(v),
      (b) w \in N(v) implies v \in N(w).
    - For the hard-core model from Section 3.3.1, N(v) is the set of those vertices w \neq v that are
      directly connected to v by an edge.
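The hard-core conditionals just mentioned make for a compact illustration. The following sketch runs a systematic-scan Gibbs sampler for the hard-core model on a small grid, where N(v) consists of the grid neighbors of v. It is a minimal example; the grid size, function names, and seed are illustrative choices, not part of the notes.

```python
import random

def make_grid_neighbors(n):
    """Neighborhood system N(v) for an n x n grid graph: N(v) contains the
    vertices directly connected to v by an edge (the hard-core case)."""
    nbrs = {}
    for i in range(n):
        for j in range(n):
            nbrs[(i, j)] = [(i + di, j + dj)
                            for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1))
                            if 0 <= i + di < n and 0 <= j + dj < n]
    return nbrs

def gibbs_sweep(x, nbrs, rng):
    """One forward scan: update every vertex from its full conditional.
    For the hard-core model, pi(x(v) = 1 | x(-v)) is 1/2 if all neighbors
    of v are 0, and 0 otherwise."""
    for v in nbrs:
        if any(x[w] == 1 for w in nbrs[v]):
            x[v] = 0
        else:
            x[v] = 1 if rng.random() < 0.5 else 0
    return x

rng = random.Random(1)
nbrs = make_grid_neighbors(5)
x = {v: 0 for v in nbrs}          # the empty configuration is feasible
for _ in range(100):
    gibbs_sweep(x, nbrs, rng)
# every visited configuration is feasible: no two adjacent occupied sites
assert all(x[v] == 0 or all(x[w] == 0 for w in nbrs[v]) for v in nbrs)
```

Note that each single-site update only reads the states of the neighbors in N(v), which is exactly the locality property described in the remark above.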
3.3.3   Metropolis-Hastings Algorithm

We will now show that the Gibbs sampler discussed in Section 3.3.2 is a special case of a class of MCMC
algorithms of the so-called Metropolis-Hastings type. This class generalizes two aspects of the Gibbs
sampler.

    1. The transition matrix P = (p_{xx'}) can be of a more general form than the one defined by

           p_{xx'} = \sum_{v \in V} q_v \, \pi(x'(v) \mid x(-v)) \, 1I(x(-v) = x'(-v)) ,   x, x' \in E.   (46)

    2. Besides this, a procedure for acceptance or rejection of the updates x \to x' is integrated into
       the algorithm. It is based on a similar idea as the acceptance-rejection sampling discussed in
       Section 3.2.3; see in particular Theorem 3.5.

    - Let V be a finite nonempty index set and let X = (X(v), v \in V) be a discrete random vector
      taking values in the finite state space E \subset R^{|V|} with probability 1.
    - As usual we assume \pi_x > 0 for all x \in E, where \pi = (\pi_x, x \in E) is the probability
      function of the random vector X.
    - We construct a Markov chain X_0, X_1, ... with ergodic limit distribution \pi whose transition
      matrix P = (p_{xx'}) is given by

          p_{xx'} = q_{xx'} a_{xx'} ,   x, x' \in E with x \neq x',                                    (47)

      where Q = (q_{xx'}) is an arbitrary irreducible and aperiodic stochastic matrix such that
      q_{xx'} = 0 if and only if q_{x'x} = 0.
    - Moreover, the matrix A = (a_{xx'}) is defined as

          a_{xx'} = \frac{s_{xx'}}{1 + t_{xx'}} ,                                                      (48)

      where

          t_{xx'} = \frac{\pi_x q_{xx'}}{\pi_{x'} q_{x'x}}   if q_{xx'} > 0,
          t_{xx'} = 0                                        if q_{xx'} = 0,                           (49)

      and S = (s_{xx'}) is an arbitrary symmetric matrix such that

          0 < s_{xx'} \le 1 + \min\{ t_{xx'}, t_{x'x} \} .                                             (50)

Remarks
    - The structure (47) of the transition matrix P = (p_{xx'}) can be interpreted as follows. At first
      a candidate x' \in E for the update x \to x' is selected according to Q = (q_{xx'}). If
      x' \neq x, then x' is accepted with probability a_{xx'}; i.e., with probability 1 - a_{xx'} the
      update x \to x' is rejected (and the current state is thus not changed).
    - In order to apply the Metropolis-Hastings algorithm defined by (47)-(50), for a given potential
      transition matrix Q = (q_{xx'}) only the quotients \pi_x / \pi_{x'} need to be known for all
      pairs x, x' \in E of states such that q_{xx'} > 0.
    - The special case of the Gibbs sampler (see Section 3.3.2) is obtained if the potential transition
      probabilities q_{xx'} are defined by (46). Then, for arbitrary x, x' \in E such that
      #{v \in V : x(v) \neq x'(v)} \le 1,

          \pi_x q_{xx'} = \pi_{x'} q_{x'x}   and thus   t_{xx'} = 1 .

      By defining s_{xx'} = 1 + \min\{ t_{xx'}, t_{x'x} \} we obtain a_{xx'} = 1 for arbitrary
      x, x' \in E such that #{v \in V : x(v) \neq x'(v)} \le 1.
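The selection-then-acceptance mechanism described in these remarks can be sketched as follows. This is a minimal Python illustration, not the notes' own code; the toy target `pi`, the uniform proposal `Q`, and the name `mh_step` are assumptions for the example. Note that only the ratios pi[x]/pi[x'] enter the acceptance step.

```python
import random

def mh_step(x, pi, Q, rng, s=None):
    """One Metropolis-Hastings update x -> x' following (47)-(50): propose x'
    from row Q[x], then accept with probability a = s / (1 + t), where
    t = pi[x] * Q[x][x'] / (pi[x'] * Q[x'][x]).  With the maximal symmetric
    choice s = 1 + min(t(x,x'), t(x',x)) this is the classical Metropolis
    acceptance min{1, pi[x'] Q[x'][x] / (pi[x] Q[x][x'])}."""
    states = list(Q[x])
    y = rng.choices(states, weights=[Q[x][z] for z in states])[0]
    if y == x:
        return x
    t_xy = (pi[x] * Q[x][y]) / (pi[y] * Q[y][x])
    t_yx = 1.0 / t_xy
    s_xy = s(t_xy, t_yx) if s else 1.0 + min(t_xy, t_yx)   # Metropolis default
    return y if rng.random() < s_xy / (1.0 + t_xy) else x

# toy target on E = {0, 1, 2}; only the ratios pi[x]/pi[x'] matter
pi = {0: 0.5, 1: 0.3, 2: 0.2}
Q = {x: {y: 1 / 3 for y in pi} for x in pi}    # symmetric proposal
rng = random.Random(0)
counts = {x: 0 for x in pi}
x = 0
for _ in range(200_000):
    x = mh_step(x, pi, Q, rng)
    counts[x] += 1
freq = {k: v / 200_000 for k, v in counts.items()}
assert all(abs(freq[k] - pi[k]) < 0.02 for k in pi)
```

The empirical frequencies approach the target distribution, in line with Theorem 3.14 below, which shows that the constructed chain is irreducible, aperiodic, and reversible with respect to \pi.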

Theorem 3.14    The transition matrix P = (p_{xx'}) defined by (47)-(50) is irreducible and aperiodic,
and the pair (P, \pi) is reversible.

Proof
    - As the acceptance probabilities a_{xx'} given by (48)-(50) are positive for arbitrary
      x, x' \in E, the irreducibility and aperiodicity of P = (p_{xx'}) are inherited from the
      corresponding properties of Q = (q_{xx'}).
    - In order to check the detailed balance equation (2.85), i.e.

          \pi_x p_{xx'} = \pi_{x'} p_{x'x} ,   x, x' \in E ,                                           (51)

      we consider two cases.
      - If q_{xx'} = q_{x'x} = 0, then p_{xx'} = p_{x'x} = 0 and (51) holds.
      - If q_{xx'} > 0, then also q_{x'x} > 0 and (47)-(50) imply

            \pi_x p_{xx'} = \pi_x q_{xx'} a_{xx'}
                = \pi_x q_{xx'} \, \frac{s_{xx'} \, \pi_{x'} q_{x'x}}{\pi_{x'} q_{x'x} + \pi_x q_{xx'}}
                = \pi_{x'} p_{x'x} ,

        where the last equality follows by the symmetry of the matrix S = (s_{xx'}).

Examples

1. Metropolis Algorithm

    - The classical Metropolis algorithm is obtained if we consider equality in (50), i.e. if

          s_{xx'} = 1 + \min\{ t_{xx'}, t_{x'x} \} ,   x, x' \in E .

    - In this case the acceptance probabilities a_{xx'} for arbitrary x, x' \in E such that
      q_{xx'} > 0 are of the following form:

          a_{xx'} = \frac{1 + \min\{ t_{xx'}, t_{x'x} \}}{1 + t_{xx'}}
                  = \frac{\min\{ 1 + t_{xx'}, 1 + t_{x'x} \}}{1 + t_{xx'}}
                  = \min\Bigl\{ 1, \frac{1 + t_{x'x}}{1 + t_{xx'}} \Bigr\}
                  = \min\Bigl\{ 1, \frac{\pi_{x'} q_{x'x}}{\pi_x q_{xx'}} \Bigr\} ,

      i.e.

          a_{xx'} = \min\Bigl\{ 1, \frac{\pi_{x'} q_{x'x}}{\pi_x q_{xx'}} \Bigr\} ,
              x, x' \in E such that q_{xx'} > 0.                                                       (52)

    - If the matrix Q = (q_{xx'}) of the potential transition probabilities is symmetric, then (52)
      implies

          a_{xx'} = \min\Bigl\{ 1, \frac{\pi_{x'}}{\pi_x} \Bigr\} ,
              x, x' \in E such that q_{xx'} > 0.                                                       (53)

    - In particular, if the potential updates x \to x' are chosen uniformly at random, i.e. if

          q_{xx'} = \frac{1}{|E|} ,   x, x' \in E ,

      then the acceptance probabilities a_{xx'} are also given by (53).


2. Barker Algorithm

    - The so-called Barker algorithm is obtained if we consider the matrix S = (s_{xx'}) where
      s_{xx'} = 1 for arbitrary x, x' \in E.
    - The acceptance probabilities a_{xx'} are then given by

          a_{xx'} = \frac{\pi_{x'} q_{x'x}}{\pi_{x'} q_{x'x} + \pi_x q_{xx'}} ,
              x, x' \in E such that q_{xx'} > 0.                                                       (54)

    - If the matrix Q = (q_{xx'}) of potential transition probabilities is symmetric, then

          a_{xx'} = \frac{\pi_{x'}}{\pi_{x'} + \pi_x} ,
              x, x' \in E such that q_{xx'} > 0.                                                       (55)


MCMC Simulation Algorithm

    - As for the Gibbs sampler (see Section 3.3.2), we construct a Markov chain X_0, X_1, ... with
      state space E and with the (irreducible and aperiodic) transition matrix P = (p_{xx'}) defined
      by (47)-(50), such that \pi is the ergodic limit distribution of X_0, X_1, ....
    - For sufficiently large n the distribution \alpha_n of X_n approximately coincides with \pi.
    - In estimating the approximation error of MCMC simulation algorithms it is useful to know the
      variational distance d_{TV}(\alpha_n, \pi) between the distributions \alpha_n and \pi, as well
      as upper bounds for it; see Section 3.4.1.

3.4     Error Analysis for MCMC Simulation

3.4.1   Estimate for the Rate of Convergence

We will now show how the upper bounds for the variational distance d_{TV}(\alpha_n, \pi) and for the
second largest absolute value |\lambda_2| = \max\{ \lambda_2, |\lambda_\ell| \} of the eigenvalues
\lambda_1, ..., \lambda_\ell of the transition matrix P derived in Section 2.3 can be used in order to
determine upper bounds for the distance d_{TV}(\alpha_n, \pi) occurring in the nth step of the MCMC
simulation via the Metropolis algorithm, if the simulated distribution \pi satisfies the following
conditions.

    - We assume that \pi_x \neq \pi_{x'} for arbitrary x, x' \in E such that x \neq x', and that the
      states x_1, ..., x_\ell \in E are ordered such that \pi_{x_1} > ... > \pi_{x_\ell}.
    - We may thus (w.l.o.g.) return to the notation used in Section 2.3 and identify the states
      x_1, ..., x_\ell \in E with the first \ell natural numbers, i.e. E = {1, ..., \ell}.
    - The probabilities \pi_i (= \pi_{x_i}) can thus be written in the following way:

          \pi_i = \frac{b^{h(i)}}{z(b)} ,   i = 1, ..., \ell ,                                         (56)

      where h : {1, ..., \ell} \to (1, \infty) is a monotonically increasing function and b \in (0, 1)
      is chosen such that, for a certain constant c \ge 1,

          h(i+1) - h(i) \ge c ,   i = 1, ..., \ell - 1,                                                (57)

      and z(b) = \sum_{i=1}^{\ell} b^{h(i)} is an (in general unknown) normalizing factor.
    - Furthermore, the definition of a Metropolis algorithm for the MCMC simulation of
      \pi = (\pi_1, ..., \pi_\ell)^T requires that the basis b and the differences h(i+1) - h(i) are
      known for all i = 1, ..., \ell - 1, i.e. in particular that the quotients \pi_{i+1} / \pi_i are
      known for all i = 1, ..., \ell - 1.


    - Let the matrix Q = (q_{ij}) of the potential transitions i \to j be given by

          q_{ij} = 1/2   if i = 1 and j = 1, 2,   or   i = \ell and j = \ell, \ell - 1,
                         or   i = 2, ..., \ell - 1 and j = i - 1, i + 1,
          q_{ij} = 0     else.                                                                         (58)

    - Let the acceptance probability a_{ij} be defined as in (53), i.e.

          a_{ij} = \min\Bigl\{ 1, \frac{\pi_j q_{ji}}{\pi_i q_{ij}} \Bigr\}
                 = \min\bigl\{ 1, b^{h(j) - h(i)} \bigr\} ,
              i, j \in {1, ..., \ell} with q_{ij} = q_{ji} > 0.

    - By (56) and (58) the entries p_{ij} = q_{ij} a_{ij} of the transition matrix P = (p_{ij}) for
      the MCMC simulation are thus given by

          p_{11} = 1 - \frac{b^{h(2) - h(1)}}{2} ,   p_{12} = \frac{b^{h(2) - h(1)}}{2} ,
          p_{\ell, \ell-1} = p_{\ell \ell} = \frac{1}{2}                                               (59)

      and, for i = 2, ..., \ell - 1,

          p_{i, i-1} = \frac{1}{2} ,   p_{i, i+1} = \frac{b^{h(i+1) - h(i)}}{2} ,
          p_{ii} = 1 - p_{i, i-1} - p_{i, i+1} .                                                       (60)

Theorem 3.15    The second largest eigenvalue \lambda_2 of the transition matrix P = (p_{ij}) defined
by (59)-(60) has the following upper bound:

    \lambda_2 \le 1 - \frac{(1 - b^{c/2})^2}{2} .                                                      (61)
Proof
    - By Theorem 3.14 the pair (P, \pi) is reversible. Hence, Rayleigh's theorem (see Theorem 2.17)
      yields the representation formula

          \lambda_2 = 1 - \inf_{x \in R^\ell_{\neq}} \frac{D_{(P,\pi)}(x, x)}{Var_\pi(x)} ,            (62)

      where R^\ell_{\neq} = { x = (x_1, ..., x_\ell)^T \in R^\ell : x_i \neq x_j for some i, j \in E }
      denotes the subset of vectors in R^\ell whose components are not all equal,
      Var_\pi(x) = \|x\|_\pi^2 - (\bar{x}_\pi)^2 is the variance of the components of x with respect
      to \pi, and D_{(P,\pi)}(x, x) = \langle (I - P) x, x \rangle_\pi denotes the Dirichlet form of
      the reversible pair (P, \pi).
    - Due to (62) it is sufficient to show that

          Var_\pi(x) \le a \, D_{(P,\pi)}(x, x) ,   x \in R^\ell ,                                     (63)

      for some constant a such that

          0 < a \le \frac{2}{(1 - b^{c/2})^2} .                                                        (64)

    - Similar to the proof of Theorem 2.18 (and copying the notation used there) we obtain, for all
      \beta \in (0, 1), by the Cauchy-Schwarz inequality

          2 Var_\pi(x) = \sum_{i,j \in E} (x_i - x_j)^2 \pi_i \pi_j
              = \sum_{i,j \in E} \Bigl( \sum_{e \in \gamma_{ij}} Q(e)^\beta (x_{e^-} - x_{e^+}) \frac{1}{Q(e)^\beta} \Bigr)^2 \pi_i \pi_j
              \le \sum_{i,j \in E} \Bigl( \sum_{e \in \gamma_{ij}} Q(e)^{2\beta} (x_{e^-} - x_{e^+})^2 \Bigr)
                  \Bigl( \sum_{e \in \gamma_{ij}} Q(e)^{-2\beta} \Bigr) \pi_i \pi_j ,

      where the edge probability Q(e) = \pi_{e^-} p_{e^- e^+} is assigned to the directed edge
      e = (e^-, e^+), and \gamma_{ij} denotes the chosen path from i to j.
    - Using the notation |\gamma_{ij}|_\beta = \sum_{e \in \gamma_{ij}} Q(e)^{-2\beta} we thus obtain

          2 Var_\pi(x) \le \sum_{i,j \in E} |\gamma_{ij}|_\beta \sum_{e \in \gamma_{ij}} Q(e)^{2\beta} (x_{e^-} - x_{e^+})^2 \pi_i \pi_j
              = \sum_{e} (x_{e^-} - x_{e^+})^2 Q(e) \, Q(e)^{2\beta - 1} \sum_{\gamma_{ij} \ni e} \pi_i \pi_j |\gamma_{ij}|_\beta ,

      where the outer sum extends over all directed edges e. This shows (63) for

          a = \max_{e} \Bigl\{ Q(e)^{2\beta - 1} \sum_{\gamma_{ij} \ni e} \pi_i \pi_j |\gamma_{ij}|_\beta \Bigr\} ,   (65)

      as we showed in Lemma 2.8 that

          2 D_{(P,\pi)}(x, x) = \sum_{e} (x_{e^-} - x_{e^+})^2 Q(e) .

    - It is left to show that the constant a considered in (65) satisfies the inequality (64). For
      this purpose we choose the path \gamma_{ij} = (i, i+1, ..., j-1, j) for each pair i, j \in E
      such that i < j. Then (56) and (59)-(60) imply

          Q(i, i+1) = \pi_i p_{i,i+1} = \frac{b^{h(i)}}{z(b)} \cdot \frac{b^{h(i+1) - h(i)}}{2} = \frac{\pi_{i+1}}{2} .

      Thus, the reversibility of the pair (P, \pi) shown in Theorem 3.14 yields

          Q(i+1, i) = Q(i, i+1) = \frac{\pi_{i+1}}{2} .

    - Because of (56) and (57) we obtain for arbitrary i, j \in E such that i < j

          |\gamma_{ij}|_\beta = \Bigl(\frac{2}{\pi_{i+1}}\Bigr)^{2\beta} + ... + \Bigl(\frac{2}{\pi_j}\Bigr)^{2\beta}
              \le \Bigl(\frac{2}{\pi_j}\Bigr)^{2\beta} \bigl( b^{2\beta c (j-i-1)} + ... + b^{2\beta c} + 1 \bigr)
              \le \frac{2^{2\beta} \pi_j^{-2\beta}}{1 - b^{2\beta c}} .

    - Moreover, all directed edges e are of the form e = (i, i+1) or e = (i, i-1), as for the entries
      p_{ij} of the transition matrix P = (p_{ij}) defined by (58)-(60) we have p_{ij} = 0 if
      |i - j| > 1. Thus, for \beta < 1/2,

          a = \max_{e} \Bigl\{ Q(e)^{2\beta - 1} \sum_{\gamma_{ij} \ni e} \pi_i \pi_j |\gamma_{ij}|_\beta \Bigr\}
            \le \max_{k=1,...,\ell-1} Q(k, k+1)^{2\beta - 1}
                \sum_{1 \le i \le k < k+1 \le j \le \ell} \pi_i \pi_j^{1-2\beta} \, \frac{2^{2\beta}}{1 - b^{2\beta c}}
            \le \frac{2}{(1 - b^{2\beta c})(1 - b^{c(1-2\beta)})} ,

      as Q(k, k+1)^{2\beta - 1} = (\pi_{k+1}/2)^{2\beta - 1}, \sum_{1 \le i \le k} \pi_i \le 1, and

          \sum_{k+1 \le j \le \ell} \pi_j^{1-2\beta}
              = \pi_{k+1}^{1-2\beta} \Bigl( 1 + \Bigl(\frac{\pi_{k+2}}{\pi_{k+1}}\Bigr)^{1-2\beta} + ... + \Bigl(\frac{\pi_\ell}{\pi_{k+1}}\Bigr)^{1-2\beta} \Bigr)
              \le \frac{\pi_{k+1}^{1-2\beta}}{1 - b^{c(1-2\beta)}} .

    - For \beta = 1/4 we obtain the estimate (64).

The following lemma will turn out to be useful in order to derive a lower bound for the smallest
eigenvalue \lambda_\ell of the transition matrix P = (p_{ij}) defined by (59)-(60).

Lemma 3.1
    Let A = (a_{ij}) be an arbitrary \ell x \ell matrix and for all i = 1, ..., \ell let
    r_i = \sum_{j: 1 \le j \le \ell, j \neq i} |a_{ij}|. Let \lambda be an arbitrary eigenvalue of A,
    let \phi = (\phi_1, ..., \phi_\ell)^T \neq o be a corresponding eigenvector, and let k be the
    number of a component such that

        |\phi_k| = \max_{i=1,...,\ell} |\phi_i| > 0 .

    Then

        |\lambda - a_{kk}| \le r_k .                                                                   (66)

Proof
    By definition of \lambda and \phi we have A \phi = \lambda \phi. In particular,

        \sum_{j=1}^{\ell} a_{kj} \phi_j = \lambda \phi_k
        and hence
        (\lambda - a_{kk}) \phi_k = \sum_{j: 1 \le j \le \ell, j \neq k} a_{kj} \phi_j .

    This implies

        |\lambda - a_{kk}| |\phi_k| \le \sum_{j: 1 \le j \le \ell, j \neq k} |a_{kj}| |\phi_j| \le r_k |\phi_k|
        and thus
        |\lambda - a_{kk}| \le r_k .

Theorem 3.16    The smallest eigenvalue \lambda_\ell of the transition matrix P = (p_{ij}) defined by
(59)-(60) has the following lower bound:

    \lambda_\ell \ge -b^c .                                                                            (67)

Proof
    By Lemma 3.1, applied to A = P (and to the index k determined for \lambda = \lambda_\ell),

        |\lambda_\ell - p_{kk}| \le \sum_{j: 1 \le j \le \ell, j \neq k} p_{kj} = 1 - p_{kk}
        and hence
        \lambda_\ell \ge -1 + 2 p_{kk} .

    Thus, taking into account (59)-(60), we derive

        \lambda_\ell \ge -1 + 2 \min_{i=1,...,\ell} p_{ii}
                     \ge -1 + 2 \Bigl( 1 - \frac{1}{2} - \frac{b^c}{2} \Bigr) = -b^c .

Remark
    Summarizing the results of Theorems 3.15 and 3.16, we have shown that

        |\lambda_2| = \max\{ \lambda_2, |\lambda_\ell| \}
            \le \max\Bigl\{ 1 - \frac{(1 - b^{c/2})^2}{2} , \, b^c \Bigr\}
            = 1 - \frac{(1 - b^{c/2})^2}{2} .                                                          (68)
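The bounds (61), (67), and (68) can be checked numerically for a concrete instance of the chain (59)-(60). The sketch below is illustrative only; the choices h(i) = i (so that the gaps equal c = 1), b = 0.5, and \ell = 6 are assumptions made for the example.

```python
import numpy as np

# Birth-and-death Metropolis chain from (59)-(60) with h(i) = i, so the
# gaps h(i+1) - h(i) equal c = 1; the target pi_i is proportional to b**i.
b, c, ell = 0.5, 1, 6
P = np.zeros((ell, ell))
P[0, 1] = b ** c / 2
P[0, 0] = 1 - P[0, 1]
P[ell - 1, ell - 2] = P[ell - 1, ell - 1] = 0.5
for i in range(1, ell - 1):
    P[i, i - 1] = 0.5
    P[i, i + 1] = b ** c / 2
    P[i, i] = 1 - P[i, i - 1] - P[i, i + 1]
assert np.allclose(P.sum(axis=1), 1.0)        # P is stochastic

eig = np.sort(np.linalg.eigvals(P).real)[::-1]
lam2, lam_min = eig[1], eig[-1]
# Theorem 3.15: lambda_2 <= 1 - (1 - b**(c/2))**2 / 2
assert lam2 <= 1 - (1 - b ** (c / 2)) ** 2 / 2 + 1e-12
# Theorem 3.16: lambda_ell >= -b**c
assert lam_min >= -b ** c - 1e-12
# (68): max(lambda_2, |lambda_ell|) <= 1 - (1 - b**(c/2))**2 / 2
assert max(lam2, abs(lam_min)) <= 1 - (1 - b ** (c / 2)) ** 2 / 2 + 1e-12
```

Since (P, \pi) is reversible, all eigenvalues are real, which is why taking the real part of the computed spectrum is legitimate here.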

3.4.2   MCMC Estimators; Bias and Fundamental Matrix

In this section we will investigate the characteristics of Monte-Carlo estimators for expectations.
Examples of similar problems were already discussed in Section 3.1.1, where we estimated the number
\pi and the value of integrals by statistical means via Monte-Carlo simulation. However, for these
purposes we assumed that the pseudo-random numbers can be regarded as realizations of independent and
identically distributed sampling variables. In the present section we assume that the sample variables
form an (appropriately chosen) Markov chain. This is the reason why these estimators are called
Markov-Chain-Monte-Carlo estimators (MCMC estimators).

Statistical Model

    - Let V be a finite (nonempty) index set and let X = (X(v), v \in V) be a discrete random vector
      taking values in the finite state space E \subset R^{|V|} with probability 1, where E is
      identified with the set E = {1, ..., \ell} of the first \ell = |E| natural numbers.
    - Furthermore, we assume \pi_i > 0 for all i \in E, where \pi = (\pi_i, i \in E) denotes the
      probability function of the random vector X.
    - Our goal is to estimate the expectation \mu = E \phi(X) via MCMC simulation, where

          \mu = \pi^T \phi                                                                             (69)

      and \phi = (\phi_1, ..., \phi_\ell)^T : E \to R is an arbitrary but fixed function.
    - As an estimator for \mu we consider the random variable

          \hat{\mu}_n = \frac{1}{n} \sum_{k=0}^{n-1} \phi(X_k) ,   n \ge 1,                            (70)

      where X_0, X_1, ... is a Markov chain with state space E, arbitrary but fixed initial
      distribution \alpha, and an irreducible and aperiodic transition matrix P = (p_{ij}) such that
      \pi is the ergodic limit distribution with respect to P.
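As a minimal illustration of the estimator (70), consider the following sketch (the two-state chain, the function phi, and the seed are illustrative assumptions; the chain's stationary distribution (2/3, 1/3) is easy to verify by hand):

```python
import random

# hatmu_n = (1/n) * sum_{k=0}^{n-1} phi(X_k) for a two-state chain whose
# ergodic limit distribution is pi = (2/3, 1/3); phi(i) = i, so mu = 1/3.
P = [[0.5, 0.5], [1.0, 0.0]]
phi = [0.0, 1.0]
rng = random.Random(42)
x, total, n = 0, 0.0, 500_000
for _ in range(n):
    total += phi[x]
    x = 0 if rng.random() < P[x][0] else 1
hatmu = total / n
assert abs(hatmu - 1 / 3) < 0.005
```

Even though the chain is started deterministically in state 0 (so \alpha \neq \pi), the estimate converges to \mu, in line with the asymptotic unbiasedness discussed below.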
Remarks
    - Typically, the initial distribution \alpha does not coincide with the simulated distribution \pi.
    - Consequently, the MCMC estimator \hat{\mu}_n defined by (70) is not unbiased for fixed (finite)
      sample size, i.e. in general E \hat{\mu}_n \neq \mu for n \ge 1.
    - For determining the bias E \hat{\mu}_n - \mu the following representation formula will be helpful.
Theorem 3.17    For all n \ge 1,

    E \hat{\mu}_n = \frac{1}{n} \alpha^T \Bigl( \sum_{k=0}^{n-1} P^k \Bigr) \phi .                     (71)


Proof
    In Theorem 2.3 we proved that for all k \ge 1 the distribution \alpha_k of X_k is given by
    \alpha_k^T = \alpha^T P^k. Thus, by definition (70) of the MCMC estimator \hat{\mu}_n, we get that

        E \hat{\mu}_n = \frac{1}{n} \sum_{k=0}^{n-1} E \phi(X_k)
                      = \frac{1}{n} \sum_{k=0}^{n-1} \alpha_k^T \phi
                      = \frac{1}{n} \sum_{k=0}^{n-1} \alpha^T P^k \phi
                      = \frac{1}{n} \alpha^T \Bigl( \sum_{k=0}^{n-1} P^k \Bigr) \phi .

Remarks
    - As an immediate consequence of Theorem 3.17, the ergodicity of the transition matrix P, and
      (69), one obtains

          \lim_{n \to \infty} E \hat{\mu}_n = \mu ,

      i.e., the MCMC estimator \hat{\mu}_n for \mu defined in (70) is asymptotically unbiased.
    - Apart from this, the asymptotic behavior of n (E \hat{\mu}_n - \mu) for n \to \infty can be
      determined. For this purpose we need the following two lemmata.
Lemma 3.2
    Let \Pi be the \ell x \ell matrix consisting of the \ell identical row vectors \pi^T. Then

        (P - \Pi)^n = P^n - \Pi                                                                        (72)

    for all n \ge 1, and in particular

        \lim_{n \to \infty} (P - \Pi)^n = 0 .                                                          (73)

Proof
    - Evidently, (72) holds for n = 1. If we assume that (72) holds for some n - 1 \ge 1, then

          (P - \Pi)^n = (P - \Pi)^{n-1} (P - \Pi) = (P^{n-1} - \Pi)(P - \Pi)
                      = P^n - \Pi P - P^{n-1} \Pi + \Pi^2 = P^n - \Pi ,

      where the last equality follows from the fact that

          \pi^T P = \pi^T   and thus   \Pi P = \Pi ,   P^{n-1} \Pi = \Pi = \Pi^2 .

      This proves (72) for all n \ge 1.
    - As P is assumed to be irreducible and aperiodic, by Theorems 2.4 and 2.9 we get that
      P^n - \Pi \to 0 as n \to \infty. Thus, by (72), also (P - \Pi)^n \to 0 as n \to \infty.

Remarks
    - By the zero convergence (P - \Pi)^n \to 0 for n \to \infty in Lemma 3.2 and by Lemma 2.4, the
      matrix I - (P - \Pi) is invertible. In order to show this it suffices to consider the matrix
      A = P - \Pi in Lemma 2.4.
    - The inverse matrix

          Z = \bigl( I - (P - \Pi) \bigr)^{-1}                                                         (74)

      is hence well defined. It is called the fundamental matrix of P.
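The fundamental matrix can be computed directly for a small chain. The sketch below (illustrative; the two-state transition matrix is an assumed example) checks that I - (P - \Pi) is indeed invertible and that the inverse agrees with the partial sums of the series of the matrices P^k - \Pi:

```python
import numpy as np

P = np.array([[0.5, 0.5], [1.0, 0.0]])
pi = np.array([2 / 3, 1 / 3])             # stationary distribution of P
Pi = np.tile(pi, (2, 1))                  # matrix of identical rows pi^T
Z = np.linalg.inv(np.eye(2) - (P - Pi))   # fundamental matrix (74)
# series representation: Z = I + sum_{k >= 1} (P^k - Pi)
S, Pk = np.eye(2), np.eye(2)
for _ in range(200):
    Pk = Pk @ P
    S += Pk - Pi
assert np.allclose(Z, S, atol=1e-10)
```

The rapid agreement reflects the geometric decay of (P - \Pi)^n established in Lemma 3.2.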

3 MONTECARLO SIMULATION

98

Lemma 3.3    The fundamental matrix Z = (I - (P - \Pi))^{-1} of the irreducible and aperiodic
transition matrix P has the representation formulae

    Z = I + \sum_{k=1}^{\infty} (P^k - \Pi)                                                            (75)

and

    Z = I + \lim_{n \to \infty} \sum_{k=1}^{n-1} \frac{n-k}{n} (P^k - \Pi) .                           (76)

Proof
    - Formula (75) follows from Lemmas 2.4 and 3.2 since, for A = P - \Pi,

          Z = (I - A)^{-1}
            = (I - A)^{-1} \lim_{n \to \infty} (I - A^n)
            = \lim_{n \to \infty} (I - A)^{-1} (I - A^n)
            = \lim_{n \to \infty} \bigl( I + A + ... + A^{n-1} \bigr)          (by (2.79))
            = I + \sum_{k=1}^{\infty} A^k
            = I + \sum_{k=1}^{\infty} (P^k - \Pi) ,                            (by (72))

    - In order to show (76) it suffices to notice that

          \sum_{k=1}^{n-1} (P^k - \Pi) - \sum_{k=1}^{n-1} \frac{n-k}{n} (P^k - \Pi)
              = \sum_{k=1}^{n-1} \frac{k}{n} (P^k - \Pi)
              = \frac{1}{n} \sum_{k=1}^{n-1} k (P - \Pi)^k

      and that the last expression converges to 0 for n \to \infty.
    - The zero convergence is due to the fact that for every \ell x \ell matrix A

          (I - A) \sum_{k=1}^{n} k A^k = \sum_{k=1}^{n} A^k - n A^{n+1}

      and thus, for A = P - \Pi,

          \lim_{n \to \infty} \frac{1}{n} \sum_{k=1}^{n} k (P - \Pi)^k
              = Z \lim_{n \to \infty} \frac{1}{n} \sum_{k=1}^{n} (P - \Pi)^k
                - Z \lim_{n \to \infty} (P - \Pi)^{n+1}
              = 0 ,

      since \sum_{k=1}^{n} (P - \Pi)^k \to Z - I by (75) and (P - \Pi)^{n+1} \to 0 by (73).

Theorem 3.17 and Lemma 3.3 enable us to give a more detailed description of the asymptotic behavior of
the bias E \hat{\mu}_n - \mu.

Theorem 3.18
    Let a = \alpha^T Z \phi - \mu, where Z denotes the fundamental matrix of P introduced in (74).
    Then, for all n \ge 1,

        n \bigl( E \hat{\mu}_n - \mu \bigr) = a + e_n ,                                                (77)

    where e_n is a remainder such that e_n \to 0 for n \to \infty.


Proof
    - The representation formula (75) in Lemma 3.3 yields

          \alpha^T Z \phi = \alpha^T \phi + \lim_{n \to \infty} \sum_{k=1}^{n-1} \alpha^T (P^k - \Pi) \phi
                          = \alpha^T \phi + \lim_{n \to \infty} \Bigl( \sum_{k=1}^{n-1} \alpha^T P^k \phi - (n-1) \mu \Bigr)
                          = \lim_{n \to \infty} \Bigl( \sum_{k=0}^{n-1} \alpha^T P^k \phi - (n-1) \mu \Bigr) ,

      where we used that \alpha^T \Pi \phi = \pi^T \phi = \mu.
    - Hence, taking into account Theorem 3.17 and (69), we obtain for a certain sequence {e_n} such
      that e_n \to 0:

          a = \alpha^T Z \phi - \mu
            = \sum_{k=0}^{n-1} \alpha^T P^k \phi - n \mu - e_n
            = n E \hat{\mu}_n - n \mu - e_n                  (by (71))
            = n \bigl( E \hat{\mu}_n - \mu \bigr) - e_n ,

      which is (77).
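Theorem 3.18 can be verified numerically by comparing the exact bias from (71) with the constant a. The two-state chain, the initial distribution, and the function phi below are illustrative assumptions:

```python
import numpy as np

P = np.array([[0.5, 0.5], [1.0, 0.0]])
pi = np.array([2 / 3, 1 / 3])
phi = np.array([0.0, 1.0])
alpha = np.array([1.0, 0.0])              # chain started in state 0
mu = pi @ phi                             # = 1/3
Pi = np.tile(pi, (2, 1))
Z = np.linalg.inv(np.eye(2) - (P - Pi))   # fundamental matrix (74)
a = alpha @ Z @ phi - mu                  # constant from Theorem 3.18
# exact bias via (71): E hatmu_n = (1/n) alpha^T (sum_{k<n} P^k) phi
n = 2000
S, Pk = np.zeros((2, 2)), np.eye(2)
for _ in range(n):
    S += Pk
    Pk = Pk @ P
bias_n = alpha @ S @ phi / n - mu
assert abs(n * bias_n - a) < 1e-6         # n * (E hatmu_n - mu) -> a, cf. (77)
```

The remainder e_n decays geometrically here, since (P - \Pi)^k decays geometrically for this chain.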

3.4.3   Asymptotic Variance of Estimation; Mean Squared Error

For the statistical model introduced in Section 3.4.2 we now investigate the asymptotic behavior of
the variance Var \hat{\mu}_n as n \to \infty.

Theorem 3.19    Let \sigma^2 = \sum_{i=1}^{\ell} \pi_i (\phi_i - \mu)^2 and let
Z = (I - (P - \Pi))^{-1} be the fundamental matrix of P defined by (74). Then

    \lim_{n \to \infty} n \, Var \hat{\mu}_n = \sigma^2 + 2 \phi^T diag(\pi) (Z - I) \phi .            (78)

Proof
    - Clearly,

          n^2 Var \hat{\mu}_n = E \Bigl( \sum_{k=0}^{n-1} \phi(X_k) \Bigr)^2 - \Bigl( \sum_{k=0}^{n-1} E \phi(X_k) \Bigr)^2   (79)

      and thus

          n^2 Var \hat{\mu}_n = \sum_{k=0}^{n-1} E \phi^2(X_k)
              + 2 \sum_{0 \le k < k' \le n-1} E \bigl( \phi(X_k) \phi(X_{k'}) \bigr)
              - \Bigl( \sum_{k=0}^{n-1} E \phi(X_k) \Bigr)^2 .

    - This representation will now be used to show (78) for the case \alpha = \pi, i.e. for the
      stationary Markov chain. In this case we observe

          \Bigl( \sum_{k=0}^{n-1} E \phi(X_k) \Bigr)^2 = (n \mu)^2
          and
          \sum_{k=0}^{n-1} E \phi^2(X_k) = n \sum_{i=1}^{\ell} \pi_i \phi_i^2 .

      Furthermore, by the stationarity of the Markov chain {X_n},

          \sum_{0 \le k < k' \le n-1} E \bigl( \phi(X_k) \phi(X_{k'}) \bigr)
              = \sum_{k=1}^{n-1} (n-k) \, E \bigl( \phi(X_0) \phi(X_k) \bigr) ,

      where

          E \bigl( \phi(X_0) \phi(X_k) \bigr) = \sum_{i=1}^{\ell} \sum_{j=1}^{\ell} \phi_i \pi_i p_{ij}^{(k)} \phi_j
              = \phi^T diag(\pi) P^k \phi

      and P^k = P^{(k)} = (p_{ij}^{(k)}) denotes the matrix of the k-step transition probabilities.
    - A combination of the results above yields

          n \, Var \Bigl( \frac{1}{n} \sum_{k=0}^{n-1} \phi(X_k) \Bigr)
              = \sum_{i=1}^{\ell} \pi_i \phi_i^2 + 2 \phi^T diag(\pi) \sum_{k=1}^{n-1} \frac{n-k}{n} P^k \phi - n \mu^2
              = \sigma^2 + 2 \phi^T diag(\pi) \sum_{k=1}^{n-1} \frac{n-k}{n} (P^k - \Pi) \phi ,

      where the second equality is due to the identity \mu^2 = \phi^T diag(\pi) \Pi \phi. Taking into
      account the representation formula (76) for Z - I, this implies (78) for \alpha = \pi.
    - It is left to show that (78) is also true for an arbitrary initial distribution \alpha. At this
      point we use a more precise notation: we write X_0^{(\alpha)}, X_1^{(\alpha)}, ... instead of
      X_0, X_1, ... and \hat{\mu}_n^{(\alpha)} instead of \hat{\mu}_n. It suffices to show that

          \lim_{n \to \infty} n \bigl( Var \hat{\mu}_n^{(\alpha)} - Var \hat{\mu}_n^{(\pi)} \bigr) = 0 .   (80)

    - For this purpose we introduce the following notation: for 0 < r < n - 1 let

          Y_r^{(\alpha)} = \sum_{k=0}^{r-1} \phi(X_k^{(\alpha)})
          and
          Z_{rn}^{(\alpha)} = \sum_{k=r}^{n-1} \phi(X_k^{(\alpha)}) .

      Then, by (79),

          n^2 \bigl( Var \hat{\mu}_n^{(\alpha)} - Var \hat{\mu}_n^{(\pi)} \bigr)
              = E \bigl( Y_r^{(\alpha)} + Z_{rn}^{(\alpha)} \bigr)^2 - \bigl( E ( Y_r^{(\alpha)} + Z_{rn}^{(\alpha)} ) \bigr)^2
                - E \bigl( Y_r^{(\pi)} + Z_{rn}^{(\pi)} \bigr)^2 + \bigl( E ( Y_r^{(\pi)} + Z_{rn}^{(\pi)} ) \bigr)^2
              = I_r + II_{rn} + III_{rn} ,

      where

          I_r = E \bigl( Y_r^{(\alpha)} \bigr)^2 - \bigl( E Y_r^{(\alpha)} \bigr)^2
                - E \bigl( Y_r^{(\pi)} \bigr)^2 + \bigl( E Y_r^{(\pi)} \bigr)^2 ,
          II_{rn} = 2 \bigl( E ( Y_r^{(\alpha)} Z_{rn}^{(\alpha)} ) - E Y_r^{(\alpha)} E Z_{rn}^{(\alpha)} \bigr)
                    - 2 \bigl( E ( Y_r^{(\pi)} Z_{rn}^{(\pi)} ) - E Y_r^{(\pi)} E Z_{rn}^{(\pi)} \bigr) ,
          III_{rn} = E \bigl( Z_{rn}^{(\alpha)} \bigr)^2 - \bigl( E Z_{rn}^{(\alpha)} \bigr)^2
                     - E \bigl( Z_{rn}^{(\pi)} \bigr)^2 + \bigl( E Z_{rn}^{(\pi)} \bigr)^2 .

    - I_r does not depend on n and hence \lim_{n \to \infty} n^{-1} I_r = 0.
    - As the state space E is finite, we obtain for c = \max_{i \in E} |\phi(i)| < \infty that

          \frac{1}{n} II_{rn}
              \le 4 r c \, E \Bigl| \frac{1}{n} \bigl( Z_{rn}^{(\alpha)} - E Z_{rn}^{(\alpha)} \bigr) \Bigr|
                + 4 r c \, E \Bigl| \frac{1}{n} \bigl( Z_{rn}^{(\pi)} - E Z_{rn}^{(\pi)} \bigr) \Bigr| .

      This implies \lim_{n \to \infty} n^{-1} II_{rn} = 0 for any r > 0, as

          \Bigl| \frac{1}{n} \bigl( Z_{rn}^{(\cdot)} - E Z_{rn}^{(\cdot)} \bigr) \Bigr| \le 2c

      with probability 1 for all n > r and

          \lim_{n \to \infty} E \Bigl| \frac{1}{n} \bigl( Z_{rn}^{(\cdot)} - E Z_{rn}^{(\cdot)} \bigr) \Bigr| = 0 .

    - Furthermore, by the Markov property, Z_{rn}^{(\alpha)} has the same distribution as
      Z_{0,n-r}^{(\alpha_r)}, where \alpha_r denotes the distribution of X_r^{(\alpha)}. Hence, for
      n > r > 0, we have the estimate

          \frac{1}{n} III_{rn}
              \le \frac{1}{n} \sum_{i=1}^{\ell} \Bigl| E \bigl( Z_{0,n-r}^{(\delta_i)} \bigr)^2 - \bigl( E Z_{0,n-r}^{(\delta_i)} \bigr)^2 \Bigr| \, | \alpha_{r,i} - \pi_i |
              \le \Bigl( \sup_{n > 0} \max_{j \in \{1,...,\ell\}} \frac{1}{n + r}
                    \Bigl| E \bigl( Z_{0n}^{(\delta_j)} \bigr)^2 - \bigl( E Z_{0n}^{(\delta_j)} \bigr)^2 \Bigr| \Bigr)
                  \sum_{i=1}^{\ell} | \alpha_{r,i} - \pi_i | ,

      where \delta_i denotes the one-point distribution in i, and it is easy to see that the supremum
      is finite.
    - Due to the ergodicity of the Markov chain X_0, X_1, ..., the last sum becomes arbitrarily small
      for sufficiently large r. This completes the proof of (80).
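Formula (78) can be checked against the exact variance of the stationary chain, since for \alpha = \pi the autocovariances are available in closed form. The two-state chain and the function phi below are illustrative assumptions:

```python
import numpy as np

P = np.array([[0.5, 0.5], [1.0, 0.0]])
pi = np.array([2 / 3, 1 / 3])
phi = np.array([0.0, 1.0])
mu = pi @ phi
Pi = np.tile(pi, (2, 1))
Z = np.linalg.inv(np.eye(2) - (P - Pi))
D = np.diag(pi)
sigma2 = pi @ (phi - mu) ** 2
limit = sigma2 + 2 * phi @ D @ (Z - np.eye(2)) @ phi     # right side of (78)
# exact n * Var(hatmu_n) for the stationary chain, via autocovariances:
# Cov(phi(X_0), phi(X_k)) = phi^T diag(pi) P^k phi - mu^2
n = 5000
n_var = sigma2
Pk = np.eye(2)
for k in range(1, n):
    Pk = Pk @ P
    gamma_k = phi @ D @ Pk @ phi - mu ** 2
    n_var += 2 * (n - k) / n * gamma_k
assert abs(n_var - limit) < 1e-3
```

The finite-n value approaches the limit at rate O(1/n), which matches the weighting (n-k)/n appearing in the representation (76) of Z - I.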

Remarks
    - Note that for the mean squared error E ( \hat{\mu}_n - \mu )^2 of the MCMC estimator
      \hat{\mu}_n = \hat{\mu}_n(X_0, ..., X_{n-1}) for \mu defined in (70) it holds that

          E ( \hat{\mu}_n - \mu )^2 = \bigl( E \hat{\mu}_n - \mu \bigr)^2 + Var \hat{\mu}_n ,          (81)

      i.e., the mean squared error of the MCMC estimator \hat{\mu}_n is equal to the sum of the
      squared bias ( E \hat{\mu}_n - \mu )^2 and the variance Var \hat{\mu}_n of the estimator.
    - Both summands on the right-hand side of (81) converge to 0 as n \to \infty, but with different
      rates of convergence: in Theorem 3.19 we showed that Var \hat{\mu}_n = O(n^{-1}), whereas by
      Theorem 3.18 we get that ( E \hat{\mu}_n - \mu )^2 = O(n^{-2}).
    - Consequently, the asymptotic behavior of the mean squared error E ( \hat{\mu}_n - \mu )^2 is
      crucially influenced by the asymptotic variance of the estimator, whereas the bias plays a minor
      role. In other words: it can make sense to choose the simulation matrix P such that the
      asymptotic variance \lim_{n \to \infty} n Var \hat{\mu}_n is as small as possible, even if this
      results in a certain increase of the asymptotic bias \lim_{n \to \infty} n ( E \hat{\mu}_n - \mu ).
    - In order to investigate this problem more deeply we introduce the following notation: let

          V(\phi, P, \pi) = \lim_{n \to \infty} n \, Var \hat{\mu}_n ,

      where \phi : E \to R is an arbitrary function and (P, \pi) is an arbitrary reversible pair.


Theorem 3.20
    Let P_1 = (p_{1,ij}) and P_2 = (p_{2,ij}) be two transition matrices on E such that (P_1, \pi)
    and (P_2, \pi) are reversible, and let p_{1,ij} \ge p_{2,ij} for arbitrary i, j \in E such that
    i \neq j, i.e., off the diagonal all entries of P_1 are greater than or equal to the corresponding
    entries of P_2. Then, for any function \phi : E \to R,

        V(\phi, P_1, \pi) \le V(\phi, P_2, \pi) .                                                      (82)

Proof
    - Let P = (p_{ij}) be a transition matrix such that the pair (P, \pi) is reversible. It suffices
      to show that

          \frac{\partial}{\partial p_{ij}} V(\phi, P, \pi) \le 0 ,   i, j \in E with i \neq j.         (83)

    - By Theorem 3.19,

          \frac{\partial}{\partial p_{ij}} V(\phi, P, \pi)
              = 2 \phi^T diag(\pi) \frac{\partial Z}{\partial p_{ij}} \phi ,                           (84)

      where Z denotes the fundamental matrix of P introduced in (74).
    - On the other hand, as Z Z^{-1} = I, we get that

          \frac{\partial Z}{\partial p_{ij}} Z^{-1} + Z \frac{\partial Z^{-1}}{\partial p_{ij}} = 0
          and thus
          \frac{\partial Z}{\partial p_{ij}} = - Z \frac{\partial Z^{-1}}{\partial p_{ij}} Z .

      Taking into account (84) this implies

          \frac{\partial}{\partial p_{ij}} V(\phi, P, \pi)
              = -2 \phi^T diag(\pi) Z \frac{\partial Z^{-1}}{\partial p_{ij}} Z \phi .                 (85)

    - As the pair (P, \pi) is reversible, by the representation formula (75) for the fundamental
      matrix Z = (z_{ij}) derived in Lemma 3.3 we obtain for arbitrary i, j \in E

          \pi_i z_{ij} = \pi_i \delta_{ij} + \sum_{k=1}^{\infty} \bigl( \pi_i p_{ij}^{(k)} - \pi_i \pi_j \bigr)
                       = \pi_j \delta_{ji} + \sum_{k=1}^{\infty} \bigl( \pi_j p_{ji}^{(k)} - \pi_j \pi_i \bigr)
                       = \pi_j z_{ji} .

      This implies

          \phi^T diag(\pi) Z
              = \Bigl( \sum_{i=1}^{\ell} \phi_i \pi_i z_{i1}, ..., \sum_{i=1}^{\ell} \phi_i \pi_i z_{i\ell} \Bigr)
              = \Bigl( \pi_1 \sum_{i=1}^{\ell} z_{1i} \phi_i, ..., \pi_\ell \sum_{i=1}^{\ell} z_{\ell i} \phi_i \Bigr)
              = ( Z \phi )^T diag(\pi) .

    - Thus, by (85),

          \frac{\partial}{\partial p_{ij}} V(\phi, P, \pi)
              = -2 ( Z \phi )^T diag(\pi) \frac{\partial Z^{-1}}{\partial p_{ij}} Z \phi
              = 2 ( Z \phi )^T diag(\pi) \frac{\partial P}{\partial p_{ij}} Z \phi ,                   (86)

      where the last equality is due to the fact that

          \frac{\partial Z^{-1}}{\partial p_{ij}} = - \frac{\partial P}{\partial p_{ij}} ,

      which is an immediate consequence of the definition (74) of Z.
    - As P = (p_{ij}) is a stochastic matrix and (P, \pi) is reversible, only the entries p_{ij} with
      i < j (or alternatively the entries p_{ij} with i > j) can be chosen freely. This can be seen as
      follows: for every pair i, j \in E such that i \neq j the entries p_{ji}, p_{ii} and p_{jj} can
      be expressed via p_{ij} in the following way:

          p_{ji} = \frac{\pi_i}{\pi_j} p_{ij} ,   p_{ii} = c - p_{ij} ,   p_{jj} = c' - \frac{\pi_i}{\pi_j} p_{ij} ,

      where c and c' are constants that do not depend on p_{ij}.
    - For arbitrary i', j' \in E the entry of the matrix product diag(\pi) (\partial P / \partial p_{ij})
      is thus given by

          \Bigl( diag(\pi) \frac{\partial P}{\partial p_{ij}} \Bigr)_{i'j'}
              = -\pi_i   if (i', j') = (i, i) or (i', j') = (j, j),
              =  \pi_i   if (i', j') = (i, j) or (i', j') = (j, i),
              =  0       else.

      This implies that the matrix diag(\pi) (\partial P / \partial p_{ij}) is negative semidefinite,
      i.e., for all x \in R^\ell

          x^T diag(\pi) \frac{\partial P}{\partial p_{ij}} x = -\pi_i (x_i - x_j)^2 \le 0 .

    - By (86) this yields for arbitrary i, j \in E such that i \neq j

          \frac{\partial}{\partial p_{ij}} V(\phi, P, \pi)
              = 2 ( Z \phi )^T diag(\pi) \frac{\partial P}{\partial p_{ij}} ( Z \phi ) \le 0 .

      This completes the proof of (83).

Remarks
    - As a particular consequence of Theorem 3.20 we get that the simulation matrix P of the
      Metropolis algorithm (i.e. the case of equality in (50)) minimizes the asymptotic variance
      V(\phi, P, \pi) within the class of all Metropolis-Hastings algorithms having an arbitrary but
      fixed potential transition matrix Q = (q_{ij}).
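This variance ordering can be observed numerically by comparing the Metropolis acceptance (53) with the Barker acceptance (55) for a common symmetric proposal. The sketch below is illustrative; the target pi, the test function phi, and the helper names are assumptions for the example:

```python
import numpy as np

def asymptotic_variance(P, pi, phi):
    """V(phi, P, pi) = sigma^2 + 2 phi^T diag(pi) (Z - I) phi, cf. (78)."""
    ell = len(pi)
    Pi = np.tile(pi, (ell, 1))
    Z = np.linalg.inv(np.eye(ell) - (P - Pi))
    mu = pi @ phi
    return pi @ (phi - mu) ** 2 + 2 * phi @ np.diag(pi) @ (Z - np.eye(ell)) @ phi

def mh_matrix(pi, accept):
    """Transition matrix (47) for a uniform symmetric proposal Q."""
    ell = len(pi)
    P = np.zeros((ell, ell))
    for i in range(ell):
        for j in range(ell):
            if i != j:
                P[i, j] = accept(pi[i], pi[j]) / ell
        P[i, i] = 1 - P[i].sum()
    return P

pi = np.array([0.5, 0.3, 0.15, 0.05])
phi = np.array([1.0, 2.0, 3.0, 4.0])
P_metropolis = mh_matrix(pi, lambda pi_x, pi_y: min(1.0, pi_y / pi_x))    # (53)
P_barker = mh_matrix(pi, lambda pi_x, pi_y: pi_y / (pi_y + pi_x))         # (55)
v_m = asymptotic_variance(P_metropolis, pi, phi)
v_b = asymptotic_variance(P_barker, pi, phi)
# Theorem 3.20: Metropolis dominates Barker off the diagonal, hence v_m <= v_b
assert v_m <= v_b + 1e-12
```

Since min{1, r} >= r / (1 + r) for every r > 0, the Metropolis chain has entrywise larger off-diagonal entries than the Barker chain, which is exactly the hypothesis of Theorem 3.20.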

3.5     Coupling Algorithms; Perfect MCMC Simulation

In this section we will discuss algorithms that are also based on Markov chains, but this new class of
algorithms simulates a given discrete distribution not only approximately but, in a certain sense,
exactly. Therefore, these techniques are referred to as methods of perfect MCMC simulation.

3.5.1   Coupling to the Future; a Counterexample

First of all we consider a method for coupling the paths of Markov chains where time runs forward,
i.e. in the way that is perceived as natural. Therefore, this method is also referred to as coupling
to the future.

    - For all i \in {1, ..., \ell} let X^{(i)} = ( X_0^{(i)}, X_1^{(i)}, ... ) be a homogeneous Markov
      chain with finite state space E = {x_1, ..., x_\ell}, with deterministic initial state
      X_0^{(i)} = x_i and with an irreducible and aperiodic transition matrix P = (p_{xx'}), such that
      \pi = (\pi_x, x \in E) is the ergodic limit distribution of the Markov chain X^{(i)}.

Definitions
    - For all k \in {1, ..., \ell} we consider a sequence U^{(k)} = ( U_1^{(k)}, U_2^{(k)}, ... ) of
      independent and (0, 1]-uniformly distributed random variables U_n^{(k)}, called innovations in
      step n for the current state x_k \in E.
    - We consider two different cases: either we assume the sequences U^{(1)}, ..., U^{(\ell)} to be
      independent, or we merely consider a single sequence U = ( U_1, U_2, ... ) and define
      U^{(1)} = ... = U^{(\ell)} = U.
    - Let the Markov chain X^{(i)} be defined recursively by

          X_n^{(i)} = \varphi\bigl( x_k, U_n^{(k)} \bigr)   if X_{n-1}^{(i)} = x_k ,                   (87)

      where \varphi : E \times (0, 1] \to E is a so-called valid update function, i.e.
      \varphi(x, \cdot) : (0, 1] \to E is piecewise constant for all x \in E, and for arbitrary
      x, x' \in E such that p_{xx'} > 0 the total length of the set
      { u \in (0, 1] : \varphi(x, u) = x' } equals p_{xx'}.
    - The random variable \tau = \min\{ n \ge 1 : X_n^{(1)} = ... = X_n^{(\ell)} \} is called coupling
      time, where we define \tau = \infty if there is no natural number n such that
      X_n^{(1)} = ... = X_n^{(\ell)}.
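Coupling to the future with a valid update function of inverse-CDF type can be sketched as follows. This is an illustrative Python fragment; the two-state example chain and the names are assumptions for the sketch:

```python
import random

def update(P, x, u):
    """Valid update function of inverse-CDF type: for innovation u in (0, 1],
    return the state j such that u falls into the j-th cumulative slot of
    row P[x]; the slot for state j has length P[x][j]."""
    acc = 0.0
    for j, p in enumerate(P[x]):
        acc += p
        if u <= acc:
            return j
    return len(P[x]) - 1

P = [[0.5, 0.5], [1.0, 0.0]]
rng = random.Random(7)
states = [0, 1]                       # one chain per initial state
tau = 0
while len(set(states)) > 1:           # run until all paths have merged
    # independent innovations: one fresh u per (still distinct) current state
    states = [update(P, x, rng.random()) for x in states]
    tau += 1
assert len(set(states)) == 1 and tau >= 1
```

Once the two paths occupy the same state they use the same innovation and therefore stay together, which is the mechanism behind the statement X_n^{(1)} = ... = X_n^{(\ell)} for all n > \tau.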
Theorem 3.21    If the sequences of innovations U^{(1)}, ..., U^{(\ell)} are independent, then
\tau < \infty with probability 1 and X_n^{(1)} = ... = X_n^{(\ell)} for all n > \tau.

Proof
    - The recursive definition (87) of the Markov chains X^{(1)}, ..., X^{(\ell)} immediately implies
      X_n^{(1)} = ... = X_n^{(\ell)} for all n > \tau.
    - It is left to show that P(\tau < \infty) = 1. We notice that it suffices to show that for
      arbitrary i \neq i'

          \lim_{r \to \infty} P\bigl( \max\{ n : X_n^{(i)} \neq X_n^{(i')} \} \le r \bigr) = 1 .

      As

          P\bigl( \max\{ n : X_n^{(i)} \neq X_n^{(i')} \} \le r \bigr)
              = 1 - P\bigl( \max\{ n : X_n^{(i)} \neq X_n^{(i')} \} > r \bigr)
              = 1 - P\bigl( X_r^{(i)} \neq X_r^{(i')} \bigr) ,

      this is equivalent to

          \lim_{r \to \infty} P\bigl( X_r^{(i)} \neq X_r^{(i')} \bigr) = 0 .

    - Let now n_0 \ge 1 be a natural number such that

          \min_{x, x' \in E} p_{xx'}^{(n_0)} = c > 0 ,

      and consider the decomposition r = m(r) n_0 + k for some m(r) \in {0, 1, ...} and
      k \in {0, 1, ..., n_0 - 1}. The independence of the innovation sequences
      U^{(1)}, ..., U^{(\ell)} yields for r \to \infty

          P\bigl( X_r^{(i)} \neq X_r^{(i')} \bigr)
              = P\bigl( X_{n_0}^{(i)} \neq X_{n_0}^{(i')}, \, X_r^{(i)} \neq X_r^{(i')} \bigr)
              = \sum_{j=1}^{\ell} \sum_{j' \neq j} P\bigl( X_{n_0}^{(i)} = x_j, X_{n_0}^{(i')} = x_{j'} \bigr)
                    P\bigl( X_r^{(i)} \neq X_r^{(i')} \mid X_{n_0}^{(i)} = x_j, X_{n_0}^{(i')} = x_{j'} \bigr)
              = \sum_{j=1}^{\ell} \sum_{j' \neq j} P\bigl( X_{n_0}^{(i)} = x_j \bigr) P\bigl( X_{n_0}^{(i')} = x_{j'} \bigr)
                    P\bigl( X_{r-n_0}^{(j)} \neq X_{r-n_0}^{(j')} \bigr)
              \le \Bigl( \sum_{j=1}^{\ell} p_{x_i x_j}^{(n_0)} \underbrace{\sum_{j' \neq j} p_{x_{i'} x_{j'}}^{(n_0)}}_{\le \, 1 - c} \Bigr)
                    \max_{j \neq j'} P\bigl( X_{r-n_0}^{(j)} \neq X_{r-n_0}^{(j')} \bigr)
              \le (1 - c) \max_{j \neq j'} P\bigl( X_{r-n_0}^{(j)} \neq X_{r-n_0}^{(j')} \bigr)
              \le ... \le (1 - c)^{m(r)} \to 0 .

Remarks
    - Under additional assumptions on the irreducible and aperiodic transition matrix P = (p_{xx'})
      it can be shown that the coupling time \tau is finite even if only a single sequence
      U = ( U_1, U_2, ... ) of innovations is considered, i.e. U = U^{(1)} = ... = U^{(\ell)}, and if
      for all x \in E the update function \varphi : E \times (0, 1] \to E is given by

          \varphi(x, u) = x_j   if   \sum_{r=1}^{j-1} p_{x x_r} < u \le \sum_{r=1}^{j} p_{x x_r} .     (88)

    - Such an additional condition imposed on P will be discussed in the following theorem; see also
      the monotonicity condition considered in Section 3.5.3.

Theorem 3.22    Let U^{(1)} = ... = U^{(\ell)} = U and let the update function
\varphi : E \times (0, 1] \to E be given by (88). Furthermore, for some x_{i_0} \in E, let

    \max_{x \in E} \sum_{r=1}^{i_0 - 1} p_{x x_r} < \min_{x \in E} \sum_{r=1}^{i_0} p_{x x_r} .        (89)

Then \tau < \infty with probability 1 and X_n^{(1)} = ... = X_n^{(\ell)} for all n > \tau.

Proof
    - Similar to the proof of Theorem 3.21 it suffices to show that for arbitrary i \neq i'

          \lim_{r \to \infty} P\bigl( X_r^{(i)} \neq X_r^{(i')} \bigr) = 0 .

    - Observe that

          P\bigl( X_r^{(i)} \neq X_r^{(i')} \bigr)
              = P\bigl( X_1^{(i)} \neq X_1^{(i')}, \, X_r^{(i)} \neq X_r^{(i')} \bigr)
              = \sum_{j=1}^{\ell} \sum_{j' \neq j} P\bigl( X_1^{(i)} = x_j, X_1^{(i')} = x_{j'} \bigr)
                    P\bigl( X_r^{(i)} \neq X_r^{(i')} \mid X_1^{(i)} = x_j, X_1^{(i')} = x_{j'} \bigr)
              = \sum_{j=1}^{\ell} \sum_{j' \neq j} P\bigl( X_1^{(i)} = x_j, X_1^{(i')} = x_{j'} \bigr)
                    P\bigl( X_{r-1}^{(j)} \neq X_{r-1}^{(j')} \bigr)
              \le \underbrace{\bigl( 1 - P\bigl( X_1^{(i)} = X_1^{(i')} \bigr) \bigr)}_{\le \, 1 - d}
                    \max_{j' \neq j} P\bigl( X_{r-1}^{(j)} \neq X_{r-1}^{(j')} \bigr)
              \le ... \le (1 - d)^r \to 0 ,

      where we use that (87)-(89) imply

          0 < d = \min_{x \in E} \sum_{r=1}^{i_0} p_{x x_r} - \max_{x \in E} \sum_{r=1}^{i_0 - 1} p_{x x_r}
                \le P\bigl( X_1^{(i)} = X_1^{(i')} = x_{i_0} \bigr)
                \le P\bigl( X_1^{(i)} = X_1^{(i')} \bigr) .

Remarks
    - In general P(\tau < \infty) = 1 does not imply X_\tau^{(i)} \sim \pi, i.e., at the coupling time
      \tau the distribution of the Markov chain X^{(i)} does in general not coincide with the
      stationary limit distribution \pi, although this could be a conjecture.
    - The following counterexample illustrates this paradox. Consider the state space E = {1, 2} and
      the irreducible and aperiodic transition matrix

          P = | 0.5  0.5 |
              |  1    0  |

      whose stationary limit distribution is \pi = (2/3, 1/3)^T. If
      X_{\tau-1}^{(1)} \neq X_{\tau-1}^{(2)}, we necessarily have X_{\tau-1}^{(1)} = 2 or
      X_{\tau-1}^{(2)} = 2, and therefore X_\tau^{(1)} = X_\tau^{(2)} = 1.


3.5.2   Propp-Wilson Algorithm; Coupling from the Past

Recall that the procedure of coupling to the future discussed in Section 3.5.1 starts at a
deterministic time 0, whereas the final state, i.e. the coupling time \tau of the simulation, is
random. Moreover, the distribution of the Markov chain X^{(i)} at the coupling time is in general not
equal to the stationary limit distribution \pi.

Therefore, we will now consider a different coupling method, which is called Coupling from the Past
(CFTP). It was developed in the mid-90s by Propp and Wilson at the Massachusetts Institute of
Technology (MIT). The procedure is similar to coupling to the future (see Section 3.5.1), but now the
initial time of the simulation is chosen randomly, whereas the final time is deterministic. In other
words, the Markov chains X^{(1)}, ..., X^{(\ell)} are not started at time 0, but sufficiently far
away in the past, such that by time 0 at the latest all paths will have merged.

For the precise mathematical modelling of this procedure we need the following notation.

    - For each potential initial time -m, m \in {1, 2, ...}, and for all i \in {1, ..., \ell} let

          X^{(m,i)} = \bigl( X_{-m}^{(m,i)}, X_{-m+1}^{(m,i)}, ... \bigr)

      be a homogeneous Markov chain with finite state space E = {x_1, ..., x_\ell}, with the
      (deterministic) initial state X_{-m}^{(m,i)} = x_i and with the irreducible and aperiodic
      transition matrix P = (p_{xx'}), such that \pi = (\pi_x, x \in E) is the ergodic limit
      distribution of X^{(m,i)}.
    - For every k \in {1, ..., \ell} we consider a sequence U^{(k)} = ( U_0^{(k)}, U_{-1}^{(k)}, ... )
      of independent and (0, 1]-uniformly distributed random variables. Like in Section 3.5.1 we call
      U_n^{(k)} an innovation in step n if the current state is x_k \in E.
    - We consider two cases: the innovation sequences U^{(1)}, ..., U^{(\ell)} are either independent,
      or U^{(1)} = ... = U^{(\ell)} = U.
    - Let the Markov chain X^{(m,i)} be defined recursively via the update function
      \varphi : E \times (0, 1] \to E, i.e.

          X_n^{(m,i)} = \varphi\bigl( x_k, U_n^{(k)} \bigr)   if X_{n-1}^{(m,i)} = x_k .               (90)

Definition    The random variable \tau = \min\{ m \ge 1 : X_0^{(m,1)} = ... = X_0^{(m,\ell)} \} is
called CFTP coupling time, where we define \tau = \infty if there is no integer m such that
X_0^{(m,1)} = ... = X_0^{(m,\ell)}.

Theorem 3.23    Let P(\tau < \infty) = 1. Then, for all m \ge \tau,

    X_0^{(m,1)} = ... = X_0^{(m,\ell)} .

Moreover, for arbitrary m \ge \tau and i, j \in {1, ..., \ell},

    X_0^{(m,i)} = X_0^{(\tau,j)} ,

and X_0^{(\tau,i)} is distributed according to \pi.

Proof
    - Directly by the recursive definition (90) of the Markov chains X^{(m,1)}, ..., X^{(m,\ell)}, we
      get that X_0^{(m,1)} = ... = X_0^{(m,\ell)} and X_0^{(m,i)} = X_0^{(\tau,j)} for arbitrary
      m \ge \tau and i, j \in {1, ..., \ell}.
    - As by hypothesis P(\tau < \infty) = 1, we obtain for arbitrary k \in {1, ..., \ell} that

          P\bigl( X_0^{(\tau,i)} = x_k \bigr)
              = \lim_{m \to \infty} P\bigl( X_0^{(\tau,i)} = x_k, \, \tau \le m \bigr)
              = \lim_{m \to \infty} P\bigl( X_0^{(m,i)} = x_k, \, \tau \le m \bigr)
              = \lim_{m \to \infty} P\bigl( X_0^{(m,i)} = x_k \bigr)
                - \underbrace{\lim_{m \to \infty} P\bigl( X_0^{(m,i)} = x_k, \, \tau > m \bigr)}_{= \, 0}
              = \lim_{m \to \infty} P\bigl( X_0^{(m,i)} = x_k \bigr)
              = \lim_{m \to \infty} P\bigl( X_m^{(0,i)} = x_k \bigr) = \pi_{x_k} ,

      where the last but one equality is a consequence of the homogeneity of the Markov chain
      X^{(m,i)}.

Remarks
    - If the number \ell of elements in the state space E = {x_1, ..., x_\ell} is large, the MCMC
      simulation of \pi based on the CFTP algorithm of Propp and Wilson can be computationally
      inefficient, as a complete path needs to be generated for every initial state x_1, ..., x_\ell.
    - However, in some cases the computational complexity can be reduced. Examples will be discussed
      in Sections 3.5.3 and 3.5.4. In these special situations the state space E = {x_1, ..., x_\ell}
      and the update function \varphi : E \times (0, 1] \to E possess certain monotonicity properties.
    - As a consequence it suffices to consider a single sequence U = ( U_0, U_{-1}, ... ) of
      independent and (0, 1]-uniformly distributed innovations. Moreover, only two different paths
      need to be generated.
3.5.3   Monotone Coupling Algorithms

We additionally assume that the state space E = {x_1, . . . , x_ℓ} is partially ordered and has a
maximal element 1̂ ∈ E and a minimal element 0̂ ∈ E, i.e., there is a relation ⪯ on E such that

(a) x ⪯ x for all x ∈ E,
(b) x ⪯ y and y ⪯ z imply x ⪯ z for all x, y, z ∈ E,
(c) x ⪯ y and y ⪯ x imply x = y for all x, y ∈ E,
(d) 0̂ ⪯ x ⪯ 1̂ for all x ∈ E.

Furthermore, we impose the condition that the update function φ : E × (0, 1] → E is monotonously
nondecreasing with respect to the partial order ⪯, i.e., for arbitrary x, y ∈ E such that x ⪯ y we have

    φ(x, u) ⪯ φ(y, u) ,   u ∈ (0, 1] .                                                     (91)

Let the innovations U^(1), . . . , U^(ℓ) be identical with probability 1, i.e., we merely consider a
single sequence U = U_0, U_{−1}, . . . of independent and (0, 1]-uniformly distributed random variables
and define U^(1) = . . . = U^(ℓ) = U.
For arbitrary m ∈ {1, 2, . . .} and i ∈ {1, . . . , ℓ} the Markov chain X^(−m,i) is recursively
defined by

    X_n^(−m,i) = φ( X_{n−1}^(−m,i) , U_n ) ,   n = −m + 1, −m + 2, . . . .                 (92)

Remarks
If x_i ⪯ x_j, then by (91) and (92) we get that for all n ≥ −m

    X_n^(−m,i) ⪯ X_n^(−m,j) .                                                              (93)

In particular, for arbitrary n ≥ −m and i ∈ {1, . . . , ℓ},

    X_n^(−m,min) ⪯ X_n^(−m,i) ⪯ X_n^(−m,max) ,                                             (94)

where X^(−m,min) and X^(−m,max) denote the Markov chains

    X^(−m,min) = ( X_{−m}^(−m,min), X_{−m+1}^(−m,min), . . . )   and
    X^(−m,max) = ( X_{−m}^(−m,max), X_{−m+1}^(−m,max), . . . )

that are recursively defined by (92) with X_{−m}^(−m,min) = 0̂ and X_{−m}^(−m,max) = 1̂.
Due to (94) it suffices to choose an initial time that lies far enough in the past such that the paths
of X^(−m,min) and X^(−m,max) will have merged by time 0, i.e., we consider the CFTP coupling time

    ζ = min{ m ≥ 1 : X_0^(−m,min) = X_0^(−m,max) } .                                       (95)

Theorem 3.24   Let the update function φ : E × (0, 1] → E satisfy the monotonicity condition (91).
Then, for the CFTP coupling time ζ defined by (95), it holds that ζ < ∞ with probability 1.
Moreover, for arbitrary m ≥ ζ and i, j ∈ {1, . . . , ℓ}, X_0^(−m,i) = X_0^(−ζ,j), and this common
value is distributed according to π.

Proof
As the argument showing that X_0^(−m,i) = X_0^(−ζ,j) ∼ π for arbitrary m ≥ ζ and i, j ∈ {1, . . . , ℓ}
if P(ζ < ∞) = 1 is similar to the proof of Theorem 3.23, this part of the proof is omitted.
We merely show that P(ζ < ∞) = 1.
First of all, we observe that for all r ≥ 1

    { ζ > r } ⊂ { X_{−r+1}^(−r,min) ≠ 1̂ , . . . , X_0^(−r,min) ≠ 1̂ } ,                   (96)

as (94) implies

    { ζ > r } = { X_0^(−r,min) ≠ X_0^(−r,max) }
              = { X_{−r+1}^(−r,min) ≠ X_{−r+1}^(−r,max) , . . . , X_0^(−r,min) ≠ X_0^(−r,max) }
              ⊂ { X_{−r+1}^(−r,min) ≠ 1̂ , . . . , X_0^(−r,min) ≠ 1̂ } .

As in the proof of Theorem 3.21, let n_0 ≥ 1 be a natural number such that

    min_{x,x' ∈ E} p_{xx'}^(n_0) = c > 0 ,                                                 (97)

and decompose r such that r = m(r) n_0 + k for some m(r) ∈ {0, 1, . . .} and
k ∈ {0, 1, . . . , n_0 − 1}. By (96) and (97) we obtain

    P( ζ = ∞ ) = lim_{r→∞} P( ζ > r )
               ≤ lim_{r→∞} P( X_{−r+1}^(−r,min) ≠ 1̂ , . . . , X_0^(−r,min) ≠ 1̂ )
               ≤ lim_{r→∞} Σ_{x_1,...,x_{m(r)} ≠ 1̂} p_{0̂ x_1}^(n_0) p_{x_1 x_2}^(n_0) · · · p_{x_{m(r)−1} x_{m(r)}}^(n_0)
               ≤ lim_{r→∞} (1 − c)^{m(r)} = 0 .
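The monotone algorithm only ever simulates the two sandwiching chains of (94). A minimal Python sketch (the inverse-transform update and the concrete birth-and-death matrix with ℓ = 4 and p_{i,i±1} = 0.3 are illustrative assumptions; that matrix has the uniform distribution as stationary distribution, by detailed balance):

```python
import random

def phi(P, x, u):
    """Inverse-transform update (88); nondecreasing in x when P satisfies (100)."""
    s = 0.0
    for k, p in enumerate(P[x]):
        s += p
        if u <= s:
            return k
    return len(P) - 1

def monotone_cftp(P, rng=random.random):
    """Monotone CFTP: only the chains started in 0-hat (state 0) and in
    1-hat (state ell-1) are run; by (94) their merging forces coalescence."""
    ell = len(P)
    us = []                                    # stored innovations U_0, U_{-1}, ...
    m = 1
    while True:
        while len(us) < m:
            us.append(rng())
        lo, hi = 0, ell - 1                    # X_{-m}^{(-m,min)}, X_{-m}^{(-m,max)}
        for n in range(m):
            u = us[m - 1 - n]
            lo, hi = phi(P, lo, u), phi(P, hi, u)
        if lo == hi:                           # the two paths have merged by time 0
            return lo
        m *= 2

# assumed birth-and-death simulation matrix (ell = 4, p_{i,i+1} = p_{i,i-1} = 0.3)
ell, p = 4, 0.3
P = [[0.0] * ell for _ in range(ell)]
for i in range(ell):
    if i > 0:
        P[i][i - 1] = p
    if i < ell - 1:
        P[i][i + 1] = p
    P[i][i] = 1.0 - sum(P[i])
random.seed(2)
draw = monotone_cftp(P)
```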

Remarks
Sometimes the update function φ : E × (0, 1] → E is not monotonously nondecreasing but nonincreasing
with respect to the partial order ⪯, i.e., for arbitrary x, y ∈ E such that x ⪯ y we have

    φ(x, u) ⪰ φ(y, u) ,   u ∈ (0, 1] .                                                     (98)

In this case the following cross-over technique turns out to be useful.
Based on the update function φ : E × (0, 1] → E we construct a new nondecreasing update function
φ' : E × (0, 1]^2 → E which is given as

    φ'(x; u_1, u_2) = φ( φ(x, u_1), u_2 ) ,   x ∈ E; u_1, u_2 ∈ (0, 1] .                   (99)

This function has the desired property, as by (98) and (99) we obtain for arbitrary x, y ∈ E such that
x ⪯ y

    φ'(x; u_1, u_2) = φ( φ(x, u_1), u_2 ) ⪯ φ( φ(y, u_1), u_2 ) = φ'(y; u_1, u_2) ,   u_1, u_2 ∈ (0, 1] ,

i.e., φ' : E × (0, 1]^2 → E is nondecreasing if φ : E × (0, 1] → E is nonincreasing.
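The cross-over construction can be checked mechanically on a small example. In the sketch below the nonincreasing matrix (row i uniform on the i largest states, cf. Section 3.5.4) and the function names are our assumptions; the update function is the inverse-transform construction:

```python
import itertools

L = 3
# assumed nonincreasing simulation matrix: row i is uniform on the i largest states
P = [[0.0, 0.0, 1.0],
     [0.0, 0.5, 0.5],
     [1/3, 1/3, 1/3]]

def phi(x, u):
    """Inverse-transform update; monotonously NONincreasing here, cf. (98)."""
    s = 0.0
    for k, p in enumerate(P[x]):
        s += p
        if u <= s:
            return k
    return L - 1

def phi2(x, u1, u2):
    """Cross-over update (99): composing two nonincreasing steps yields a
    NONdecreasing update function for the two-step matrix P^(2)."""
    return phi(phi(x, u1), u2)

# check (98) for phi and (91) for phi2 on a grid of innovations
grid = [0.05 * i for i in range(1, 20)]
for u in grid:
    for x, y in itertools.combinations(range(L), 2):        # x < y
        assert phi(x, u) >= phi(y, u)                       # nonincreasing
for u1, u2 in itertools.product(grid, repeat=2):
    for x, y in itertools.combinations(range(L), 2):
        assert phi2(x, u1, u2) <= phi2(y, u1, u2)           # nondecreasing
```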


Let now φ : E × (0, 1] → E be an update function with respect to the irreducible and aperiodic
transition matrix P = (p_xx') with ergodic limit distribution π = (π_x, x ∈ E). Then the map
φ' : E × (0, 1]^2 → E defined by (99) is a valid update function with respect to the irreducible and
aperiodic two-step transition matrix P^(2) = (p_xx'^(2)), and it has the same ergodic limit
distribution π = (π_x, x ∈ E).
In the same way that was used to prove Theorem 3.24 one can show that the coupling time

    ζ' = min{ m ≥ 1 : X_0^(−2m,min) = X_0^(−2m,max) }

is finite with probability 1, i.e. ζ' < ∞, and that X_0^(−2ζ',i) ∼ π for all i ∈ {1, . . . , ℓ} if
φ : E × (0, 1] → E is nonincreasing.

3.5.4   Examples: Birth-and-Death Processes; Ising Model

1. Birth-and-Death Processes
The update function φ : E × (0, 1] → E defined in (88) satisfies the monotonicity condition (91) if
the state space can be identified with the set E = {1, . . . , ℓ} equipped with the natural order ≤ of
the numbers 1, . . . , ℓ, and if the simulation matrix P = (p_ij) is monotonously nondecreasing with
respect to the order ≤, i.e., for arbitrary i, j ∈ E such that i ≤ j we have

    Σ_{r=k}^{ℓ} p_ir ≤ Σ_{r=k}^{ℓ} p_jr ,   k = 1, . . . , ℓ .                             (100)

A whole class of transition matrices P = (p_ij) satisfying the monotonicity condition (100) is given by
the tridiagonal matrices of birth-and-death processes, which are of the type

        ( 1 − p_12    p_12              0                 . . .   0                )
        ( p_21        1 − p_21 − p_23   p_23              . . .   0                )
    P = ( 0           p_32              1 − p_32 − p_34   . . .   0                )
        ( . . .                                                   . . .            )
        ( 0           0                 0                 . . .   p_{ℓ−1,ℓ}        )
        ( 0           0                 0                 . . .   1 − p_{ℓ,ℓ−1}    )

where 0 < p_{i,i+1} ≤ 1/2 for all i = 1, . . . , ℓ − 1 and 0 < p_{i,i−1} ≤ 1/2 for all i = 2, . . . , ℓ.
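Condition (100) is easy to check numerically: it says that the tail sums of the rows are nondecreasing in the row index. The following sketch (function names ours, the concrete ℓ = 4 matrix with p_{i,i±1} = 0.4 an illustrative assumption) verifies it:

```python
def tail_sums(row):
    """The tail sums (sum_{r=k}^{l} p_ir)_{k=1,...,l} appearing in (100)."""
    out, s = [], 0.0
    for p in reversed(row):
        s += p
        out.append(s)
    return out[::-1]

def satisfies_100(P, tol=1e-12):
    """Check condition (100): tail sums nondecreasing in the row index."""
    for i in range(len(P) - 1):
        a, b = tail_sums(P[i]), tail_sums(P[i + 1])
        if any(ai > bi + tol for ai, bi in zip(a, b)):
            return False
    return True

# illustrative tridiagonal birth-and-death matrix, l = 4, p_{i,i+1} = p_{i,i-1} = 0.4
l, p = 4, 0.4
P = [[0.0] * l for _ in range(l)]
for i in range(l):
    if i > 0:
        P[i][i - 1] = p
    if i < l - 1:
        P[i][i + 1] = p
    P[i][i] = 1.0 - sum(P[i])

assert satisfies_100(P)
```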

Figure 7: Monotone coupling to the past for monotonously nondecreasing birth-and-death processes
On the other hand, the update function φ : E × (0, 1] → E defined in (88) is monotonously
nonincreasing, see (98), if P = (p_ij) is monotonously nonincreasing with respect to ≤, i.e., if for
arbitrary i, j ∈ E such that i ≤ j we have

    Σ_{r=k}^{ℓ} p_ir ≥ Σ_{r=k}^{ℓ} p_jr ,   k = 1, . . . , ℓ .                             (101)

It is easy to show that there is no tridiagonal transition matrix P = (p_ij) satisfying condition
(101), i.e., birth-and-death processes are never monotonously nonincreasing. However, condition (101)
holds for example for the following matrix, whose i-th row is the uniform distribution on the i largest
states:

        ( 0         0           . . .   0           0           1           )
        ( 0         0           . . .   0           1/2         1/2         )
    P = ( 0         0           . . .   1/3         1/3         1/3         )
        ( . . .                                                             )
        ( 0         1/(ℓ−1)     . . .   1/(ℓ−1)     1/(ℓ−1)     1/(ℓ−1)     )
        ( 1/ℓ       1/ℓ         . . .   1/ℓ         1/ℓ         1/ℓ         )
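With rational arithmetic, condition (101) can be verified exactly for this matrix. A sketch under an illustrative choice of ℓ (helper names ours):

```python
from fractions import Fraction

def tails(row):
    """Tail sums (sum_{r=k}^{l} p_ir)_k from condition (101), computed exactly."""
    out, s = [], Fraction(0)
    for p in reversed(row):
        s += p
        out.append(s)
    return out[::-1]

def uniform_on_top(l):
    """Row i (i = 1, ..., l) is the uniform distribution on the i largest states."""
    return [[Fraction(0)] * (l - i - 1) + [Fraction(1, i + 1)] * (i + 1)
            for i in range(l)]

l = 6
P = uniform_on_top(l)
for i in range(l - 1):   # (101): tail sums nonincreasing in the row index
    assert all(a >= b for a, b in zip(tails(P[i]), tails(P[i + 1])))
```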

2. Ising Model
Like for the hard-core model discussed in Section 3.3.1, we consider a connected graph G = (V, K) with
finitely many vertices V = {v_1, . . . , v_|V|} and a certain set K ⊂ V^2 of edges e = (v_i, v_j),
each of them connecting two vertices v_i, v_j. One of the values −1 and 1 is assigned to each vertex,
and we consider the state space E = {−1, 1}^|V| of all configurations x = (x(v), v ∈ V), i.e., for
each v ∈ V either x(v) = −1 or x(v) = 1. If this is interpreted as an image, x(v) = −1 is regarded as
a white pixel and x(v) = 1 as a black pixel.
For each x ∈ E let the probability π_x of the configuration x be given by

    π_x = (1 / z_{G,J}) exp( J Σ_{e=(v_i,v_j) ∈ K} x(v_i) x(v_j) )                         (102)

for a certain parameter J ≥ 0, which is interpreted as inverse temperature in physics:

For J = 0 (infinite temperature) the distribution π = (π_x, x ∈ E) given by (102) is the discrete
uniform distribution.
For J ≫ 0 (low temperature) those configurations possess a large probability that have a small number
of connected pairs of vertices being differently colored.
For J → ∞ (zero temperature) the distribution π = (π_x, x ∈ E) given by (102) converges to the
two-point uniform distribution ( δ_0̂ + δ_1̂ ) / 2, where 0̂ and 1̂ denote the (extreme) configurations
consisting either only of white or only of black pixels, i.e., either 0̂(v) = −1 or 1̂(v) = 1 for all
v ∈ V.

Notice that z_{G,J} > 0 is an (in general unknown) normalizing constant, where

    z_{G,J} = Σ_{x ∈ E} exp( J Σ_{e=(v_i,v_j) ∈ K} x(v_i) x(v_j) ) .
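For very small graphs, π and z_{G,J} can be computed by brute force directly from (102); this is of course infeasible for realistic |V|, since |E| = 2^|V|. The 4-cycle and the value J = 0.5 below are assumptions of this sketch:

```python
import itertools, math

# assumed tiny graph: a 4-cycle on vertices 0, 1, 2, 3
V = [0, 1, 2, 3]
K = [(0, 1), (1, 2), (2, 3), (3, 0)]
J = 0.5

def interaction(x):
    """The sum over edges of x(v_i) * x(v_j) in the exponent of (102)."""
    return sum(x[i] * x[j] for i, j in K)

configs = list(itertools.product([-1, 1], repeat=len(V)))
z = sum(math.exp(J * interaction(x)) for x in configs)       # z_{G,J}
pi = {x: math.exp(J * interaction(x)) / z for x in configs}  # the distribution (102)

# for J > 0 the constant configurations 0-hat and 1-hat maximize the probability,
# and pi is invariant under the global flip x -> -x
most_probable = max(pi, key=pi.get)
```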

The following figure was taken from O. Häggström (2002) Finite Markov Chains and Algorithmic
Applications, Cambridge University Press, Cambridge.

Figure 8: Typical configuration of the Ising model for J = 0 (upper left corner), J = 0.15 (upper
right corner), J = 0.3 (lower left corner) and J = 0.5 (lower right corner)

It illustrates the role of the parameter J, i.e., an increase of J results in a more pronounced
clumping tendency of identically colored pixels.
Let the simulation matrix P = (p_xx') be given by the Gibbs sampler, i.e., assume that (36) holds,
namely

    p_xx' = Σ_{v ∈ V} q_v π( x'(v) | x(−v) ) 1I( x(−v) = x'(−v) ) ,   x, x' ∈ E ,

where x(−v) denotes the restriction of the configuration x to V \ {v}, and where for arbitrary
x, x' ∈ E such that x(−v) = x'(−v)

    π( x'(v) | x(−v) ) = π_{x⁺} / ( π_{x⁺} + π_{x⁻} )   if x'(v) = 1,
                         π_{x⁻} / ( π_{x⁺} + π_{x⁻} )   if x'(v) = −1,

using the notation x⁻(v) = −1 and x⁻(ṽ) = x(ṽ) for ṽ ≠ v, and similarly x⁺(v) = 1 and
x⁺(ṽ) = x(ṽ) for ṽ ≠ v.
By (102) we obtain for x'(v) = 1 that

    π( x'(v) | x(−v) ) = exp( J (k⁺(x, v) − k⁻(x, v)) )
                         / [ exp( J (k⁻(x, v) − k⁺(x, v)) ) + exp( J (k⁺(x, v) − k⁻(x, v)) ) ]

and in the same way for x'(v) = −1 that

    π( x'(v) | x(−v) ) = exp( J (k⁻(x, v) − k⁺(x, v)) )
                         / [ exp( J (k⁺(x, v) − k⁻(x, v)) ) + exp( J (k⁻(x, v) − k⁺(x, v)) ) ] .

Thus, we can summarize

    π( x'(v) | x(−v) ) = ( 1 + exp( 2 J (k⁻(x, v) − k⁺(x, v)) ) )^{−1}   if x'(v) = 1,
                         ( 1 + exp( 2 J (k⁺(x, v) − k⁻(x, v)) ) )^{−1}   if x'(v) = −1,    (103)

where k⁺(x, v) and k⁻(x, v) denote the number of vertices connected to v having the values 1 and −1,
respectively.
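Formula (103) is cheap to evaluate since it depends on x only through the neighbor counts of the updated vertex. The sketch below (names and the path graph are illustrative assumptions) also checks it against the defining ratio π_{x⁺}/(π_{x⁺} + π_{x⁻}), in which the normalizing constant z_{G,J} cancels:

```python
import math

def cond_prob_plus(x, v, J, neighbors):
    """pi(1 | x(-v)) according to (103): it depends on x only through the
    numbers k+ and k- of neighbors of v with values +1 and -1."""
    kp = sum(1 for w in neighbors[v] if x[w] == 1)
    km = sum(1 for w in neighbors[v] if x[w] == -1)
    return 1.0 / (1.0 + math.exp(2.0 * J * (km - kp)))

# check against pi_{x+} / (pi_{x+} + pi_{x-}) on an assumed path graph 0 - 1 - 2
neighbors = {0: [1], 1: [0, 2], 2: [1]}
edges = [(0, 1), (1, 2)]
J = 0.7

def weight(x):
    """Unnormalized probability exp(J * sum of edge products), cf. (102)."""
    return math.exp(J * sum(x[i] * x[j] for i, j in edges))

x = {0: 1, 1: -1, 2: -1}
for v in neighbors:
    xp = dict(x); xp[v] = 1        # the configuration x+ (value +1 at v)
    xm = dict(x); xm[v] = -1       # the configuration x- (value -1 at v)
    direct = weight(xp) / (weight(xp) + weight(xm))
    assert abs(direct - cond_prob_plus(x, v, J, neighbors)) < 1e-12
```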
For the state space E = {−1, 1}^|V| we define the partial order ⪯ by x ⪯ y if x(v) ≤ y(v) for all
v ∈ V, such that 0̂ ⪯ x ⪯ 1̂ for all x ∈ E, where we assume the elements of the state space
E = {x_1, . . . , x_ℓ} to be indexed in a way ensuring i ≤ j if x_i ⪯ x_j (this is e.g. the case if E
is ordered lexicographically).
Then (103) implies for arbitrary x, y ∈ E such that x ⪯ y

    π( 1 | x(−v) ) ≤ π( 1 | y(−v) )   and   π( −1 | x(−v) ) ≥ π( −1 | y(−v) ) ,            (104)

because 1/(1 + e^a) ≤ 1/(1 + e^b) for arbitrary a, b ∈ R such that a ≥ b.

Let the update function φ : E × (0, 1]^2 → E be given by φ(x; u_1, u_2) = x', where
x' = (x'(v), v ∈ V) and for all i = 1, . . . , |V|

    x'(v_i) = 1        if Σ_{j=1}^{i−1} q(v_j) < u_1 ≤ Σ_{j=1}^{i} q(v_j) and u_2 < π( 1 | x(−v_i) ),
    x'(v_i) = −1       if Σ_{j=1}^{i−1} q(v_j) < u_1 ≤ Σ_{j=1}^{i} q(v_j) and u_2 ≥ π( 1 | x(−v_i) ),
    x'(v_i) = x(v_i)   else.

By (104), for arbitrary x, y ∈ E such that x ⪯ y we have

    φ(x; u_1, u_2) ⪯ φ(y; u_1, u_2) ,   u_1, u_2 ∈ (0, 1] ,

i.e., condition (91) with respect to ⪯ is satisfied.
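This update function can be sketched directly; the triangle graph, the uniform choice q(v) = 1/|V| (encoded by the vertex selection via u_1) and the helper names are assumptions of the sketch. The final loop verifies the monotonicity (91) exhaustively for the small example:

```python
import math, itertools

neighbors = {0: [1, 2], 1: [0, 2], 2: [0, 1]}   # assumed: triangle graph
J = 0.4

def p_plus(x, v):
    """pi(1 | x(-v)) from (103)."""
    kp = sum(1 for w in neighbors[v] if x[w] == 1)
    km = len(neighbors[v]) - kp
    return 1.0 / (1.0 + math.exp(2.0 * J * (km - kp)))

def phi(x, u1, u2):
    """Monotone Gibbs update: u1 selects the vertex (uniform q), u2 decides
    its new value by comparison with pi(1 | x(-v)), as defined above."""
    n = len(neighbors)
    v = min(int(u1 * n), n - 1)      # vertex v_i with sum_{j<i} q < u1 <= sum_{j<=i} q
    y = list(x)
    y[v] = 1 if u2 < p_plus(x, v) else -1
    return tuple(y)

# verify (91): x <= y componentwise implies phi(x) <= phi(y) componentwise
configs = list(itertools.product([-1, 1], repeat=3))
grid = [0.1 * i + 0.05 for i in range(10)]
for x in configs:
    for y in configs:
        if all(a <= b for a, b in zip(x, y)):
            for u1 in grid:
                for u2 in grid:
                    fx, fy = phi(x, u1, u2), phi(y, u1, u2)
                    assert all(a <= b for a, b in zip(fx, fy))
```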


3.5.5   Read-Once Modification of the CFTP Algorithm

A problem of the monotone CFTP algorithm discussed in Sections 3.5.3 and 3.5.4 is the necessity to save
all innovations U_0, U_{−1}, . . . , U_{−ζ+1}, where ζ denotes the coupling time defined in (95), i.e.,

    ζ = min{ m ≥ 1 : X_0^(−m,min) = X_0^(−m,max) } .

Therefore, in the year 2000, David Wilson suggested the following modification of the CFTP algorithm,
aiming at a reduction of the necessary memory allocation.
The main idea of the modification is to realize coupling to the past (see Sections 3.5.2 – 3.5.4) based
on a sequence of independent and identically distributed blocks of forward simulation, where the
(potential) initial times −m ∈ {−1, −2, . . .} of the Markov chains
X^(−m,i) = ( X_{−m}^(−m,i), X_{−m+1}^(−m,i), . . . ) can be picked at random.
The innovation sequences U^(1), . . . , U^(ℓ) are chosen identical with probability 1, i.e., we merely
consider a single sequence U = . . . , U_{−1}, U_0, U_1, . . . of independent and (0, 1]-uniformly
distributed random variables and define U^(1) = . . . = U^(ℓ) = U.
Furthermore, we assume that the Markov chains X^(1), . . . , X^(ℓ) and X^(−m,1), . . . , X^(−m,ℓ)
defined by (87) and (90) have finite forward and backward coupling times

    τ = min{ n ≥ 1 : X_n^(1) = . . . = X_n^(ℓ) }   and   ζ = min{ m ≥ 1 : X_0^(−m,1) = . . . = X_0^(−m,ℓ) } ,

respectively, with probability 1.
Now we consider blocks of forward simulation of (at first deterministic) length T for some T ≥ 1.
For arbitrary k ≥ 0 and i = 1, . . . , ℓ, let X_{kT}^(kT,i) = x_i and

    X_n^(kT,i) = φ( X_{n−1}^(kT,i), U_n ) ,   n = kT + 1, kT + 2, . . . .

Furthermore, for each k ≥ 0 we consider the event

    C_{kT} = { X_{(k+1)T}^(kT,i) = X_{(k+1)T}^(kT,j)  for all  i ≠ j ∈ {1, . . . , ℓ} } ,

where the length T of the blocks is chosen such that

    0 < P( C_T ) = P( C_{kT} ) ,   k ≥ 0 .                                                 (105)

Starting at k = 0, the read-once modification of the CFTP algorithm is given as follows.

1. Simulate X_n^(kT,i) via φ and U for n = kT + 1, . . . , (k + 1)T.
2. Set m = k and k = k + 1. If the event C_{mT} has occurred, proceed with step 3; otherwise return to
   step 1.
3. Repeat steps 1 and 2 until the event C_{m'T} occurs for some m' > m, and return the value of
   X_{m'T}^(mT,i) for an arbitrary i ∈ {1, . . . , ℓ} as a realization of π.
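The steps above can be sketched in Python as follows; the helper names are ours and the inverse-transform update function (cf. (88)) is an assumed concrete choice. Each block draws fresh innovations and uses them exactly once, so nothing has to be stored between blocks:

```python
import random

def phi(P, x, u):
    """Inverse-transform update function, cf. (88)."""
    s = 0.0
    for k, p in enumerate(P[x]):
        s += p
        if u <= s:
            return k
    return len(P) - 1

def read_once(P, T, rng=random.random):
    """Read-once CFTP (Wilson, 2000): simulate i.i.d. forward blocks of
    length T; the state entering the next coalescing block after the first
    coalescing block is returned as a realization of pi."""
    ell = len(P)

    def run_block(start):
        """Run all ell chains through one block with shared innovations;
        states[i] is the end state of the chain started in state i."""
        states = list(range(ell))
        for _ in range(T):
            u = rng()
            states = [phi(P, x, u) for x in states]
        return states[start], all(x == states[0] for x in states)

    # steps 1-2: find the first coalescing block
    value, coalesced = run_block(0)
    while not coalesced:
        value, coalesced = run_block(0)
    # step 3: continue block by block; when the next coalescing block occurs,
    # the state that entered it is returned
    while True:
        out, coalesced = run_block(value)
        if coalesced:
            return value
        value = out

# the three-state matrix from the Example below, block length T = 2
P = [[0.5, 0.0, 0.5],
     [1/3, 1/3, 1/3],
     [0.0, 1.0, 0.0]]
random.seed(7)
sample = read_once(P, 2)
```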

Example
For ℓ = 3 states we consider the irreducible and aperiodic transition matrix

        ( 1/2    0    1/2 )
    P = ( 1/3   1/3   1/3 ) .
        (  0     1     0  )

For block length T = 2 and the (0, 1]-uniformly distributed pseudo-random numbers

    u = (0.01, 0.60, 0.82, 0.47, 0.36, 0.59, 0.34, 0.89, . . .)

we obtain the simulation run shown in Fig. 9.

Figure 9: Read-once algorithm


Remarks
As the simulation blocks and hence the events C_T, C_2T, . . . are independent and as
P(C_T) = P(C_2T) = . . . , the first m' blocks of forward simulation of the algorithm described above
in particular yield the coupling from the past discussed in Section 3.5.2 if they are considered in
reversed order. Therefore, X_{m'T}^(mT,i) ∼ π for all i ∈ {1, . . . , ℓ}. The last, i.e. the
(m' + 1)st, block of forward simulation serves only to define a stopping rule.
The read-once modification of the CFTP algorithm terminates with probability 1 if condition (105) is
satisfied, i.e. if P(C_T) > 0. For monotonously nondecreasing update functions this holds if T ≥ n_0,
where n_0 ≥ 1 is a natural number such that

    min_{x,x' ∈ E} p_{xx'}^(n_0) = c > 0 ,

see the proof of Theorem 3.24.


If T is a random variable having the same distribution as the forward coupling time τ and which is
independent of the innovation sequence U = U_0, U_1, . . . , then by the following elementary but
useful properties of the coupling times τ and ζ we obtain P(C_T) ≥ 1/2.

Theorem 3.25   The random variables τ and ζ have the same distribution, i.e. τ =_d ζ. Moreover, if the
coupling times τ and ζ are independent and almost surely finite, then

    P( τ ≤ ζ ) ≥ 1/2 .                                                                     (106)

Proof
By the homogeneity of the Markov chains X^(1), . . . , X^(ℓ) and X^(−m,1), . . . , X^(−m,ℓ), for any
natural number k ≥ 1 we have

    P( τ = k ) = P( min{ n ≥ 1 : X_n^(1) = . . . = X_n^(ℓ) } = k )
               = P( min{ m ≥ 1 : X_0^(−m,1) = . . . = X_0^(−m,ℓ) } = k )
               = P( ζ = k ) .

Let now the coupling times τ and ζ be independent and finite with probability 1. This implies

    P( τ ≤ ζ ) = Σ_{k=1}^{∞} P( τ ≤ ζ | ζ = k ) P( ζ = k ) = Σ_{k=1}^{∞} P( τ ≤ k | ζ = k ) P( ζ = k )
               = Σ_{k=1}^{∞} P( τ ≤ k ) P( ζ = k ) = Σ_{k=1}^{∞} P( ζ ≤ k ) P( τ = k )
               = P( ζ ≤ τ ) ,

where the last but one equality follows from τ =_d ζ, which has been shown in the first part of the
proof. Thus,

    2 P( τ ≤ ζ ) = P( τ ≤ ζ ) + P( ζ ≤ τ )
                 = 1 − P( τ > ζ ) + 1 − P( ζ > τ )
                 = 2 − P( τ ≠ ζ )
                 ≥ 2 − 1 = 1 .
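The two coupling times can also be compared empirically. The sketch below is illustrative: it reuses the three-state example matrix, samples τ and ζ independently with the inverse-transform update (cf. (88)), and reflects both τ =_d ζ and (106):

```python
import random

def phi(P, x, u):
    """Inverse-transform update function, cf. (88)."""
    s = 0.0
    for k, p in enumerate(P[x]):
        s += p
        if u <= s:
            return k
    return len(P) - 1

def forward_time(P, rng):
    """tau = min{ n >= 1 : X_n^(1) = ... = X_n^(l) } under shared innovations."""
    states = list(range(len(P)))
    n = 0
    while len(set(states)) > 1:
        u = rng()
        states = [phi(P, x, u) for x in states]
        n += 1
    return n

def backward_time(P, rng):
    """zeta = min{ m >= 1 : X_0^(-m,1) = ... = X_0^(-m,l) }, reusing the
    innovations U_0, U_{-1}, ... when the starting time is pushed back."""
    us = []
    m = 1
    while True:
        while len(us) < m:
            us.append(rng())
        states = list(range(len(P)))
        for n in range(m):
            states = [phi(P, x, us[m - 1 - n]) for x in states]
        if len(set(states)) == 1:
            return m
        m += 1

P = [[0.5, 0.0, 0.5], [1/3, 1/3, 1/3], [0.0, 1.0, 0.0]]
rng = random.Random(11).random
taus = [forward_time(P, rng) for _ in range(3000)]
zetas = [backward_time(P, rng) for _ in range(3000)]
p_le = sum(t <= z for t, z in zip(taus, zetas)) / 3000
```

With independent samples the empirical means of τ and ζ agree up to Monte Carlo error, and p_le stays clearly above 1/2, in accordance with Theorem 3.25.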
