
Random walk: notes

László Lovász

December 5, 2017

Contents
1 Basics
  1.1 Random walks and finite Markov chains
  1.2 Matrices of graphs
  1.3 Stationary distribution
  1.4 Harmonic functions

2 Times
  2.1 Return time
  2.2 Hitting time
  2.3 Commute time
  2.4 Cover time
  2.5 Universal traverse sequences

3 Mixing
  3.1 Eigenvalues
  3.2 Coupling
    3.2.1 Random coloring of a graph
  3.3 Conductance

4 Stopping rules
  4.1 Exit frequencies
  4.2 Mixing and ε-mixing

5 Applications
  5.1 Volume computation
    5.1.1 What is a convex body?
  5.2 Lower bounds on the complexity
    5.2.1 Monte-Carlo algorithms
    5.2.2 Measurable Markov chains
    5.2.3 Isoperimetry
    5.2.4 The ball walk
  5.3 Random spanning tree

1 Basics
1.1 Random walks and finite Markov chains
Let G = (V, E) be a connected finite graph with n ≥ 2 vertices and m edges. We usually assume that V = {1, . . . , n}. Let a(i, j) denote the number of edges connecting i and j.
A random walk on G is an (infinite) sequence of random vertices $v^0, v^1, v^2, \dots$, where $v^0$ is chosen from some given initial probability distribution $\sigma^0$ on V (often concentrated on a single point) and for each t ≥ 0, $v^{t+1}$ is obtained by choosing an edge from the uniform distribution on the set of edges incident with $v^t$, and moving to its other endpoint. We denote by $\sigma^t$ the distribution of $v^t$: $\sigma^t_i = P(v^t = i)$, and write $\sigma^t = (\sigma^t_i : i \in V)$.
If we are at node i, then the probability of moving to node j is $p_{ij} = a(i,j)/d(i)$. In the case of a simple graph,
$$p_{ij} = \begin{cases} 1/d(i), & \text{if } ij \in E(G),\\ 0, & \text{otherwise}. \end{cases} \tag{1}$$

Note that
$$\sum_{j \in V} p_{ij} = 1 \qquad (i \in V). \tag{2}$$

A nonnegative matrix $P = (p_{ij})_{i,j=1}^n$ satisfying (2) defines a finite Markov chain. A walk of the chain is the random sequence $(v^0, v^1, v^2, \dots)$ where $v^0$ is chosen from some initial probability distribution $\sigma^0$ and $v^{t+1}$ is chosen from the probability distribution $(p_{v^t,j} : j = 1, \dots, n)$. All these choices are made independently.
We call the Markov chain irreducible if for every $S \subseteq V$, $S \neq \emptyset, V$, there are $i \in S$ and $j \in V \setminus S$ with $p_{ij} > 0$. For the random walk on a graph, this just means that the graph is connected.

Proposition 1.1 In every irreducible Markov chain, every node is visited infinitely often
with probability 1.
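To make the definition concrete, here is a minimal simulation sketch (my own illustration, not part of the notes); the multigraph and step count are arbitrary. Parallel edges are encoded by repeating a neighbor in the adjacency list, so a uniformly random incident edge is just a uniform choice from that list.

```python
import random

# Hypothetical multigraph: a(0,1) = 2, a(0,2) = a(1,2) = 1.
adj = {0: [1, 1, 2], 1: [0, 0, 2], 2: [0, 1]}

def random_walk(start, steps):
    """Count visits of a random walk of the given length."""
    v, visits = start, {u: 0 for u in adj}
    for _ in range(steps):
        visits[v] += 1
        v = random.choice(adj[v])   # uniform over the edges incident with v
    return visits

print(random_walk(0, 100_000))      # every node gets visited (cf. Proposition 1.1)
```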

We conclude with some examples.

Example 1.2 (Waiting Problem) Let $A_1, A_2, \dots$ be a sequence of independent events, each with probability p. Let $A_N$ be the first one that occurs. Then $E(N) = 1/p$. Proof: $E(N) = p \cdot 1 + (1-p)(1 + E(N))$.

This can be modeled as a Markov chain with two states W (for Wait) and S (for Success),
where

$$p_{W,W} = 1-p, \quad p_{W,S} = p, \qquad p_{S,W} = 0, \quad p_{S,S} = 1.$$

(The last two values are given only to have a full Markov chain, since we don’t care what
happens after reaching S.)
An alternative proof would use the formula

$$P(N = t) = (1-p)^{t-1} p.$$

From this one could easily compute the expectation of $N^2$ and the variance of N:
$$E(N^2) = \frac{2-p}{p^2}, \qquad Var(N) = \frac{1-p}{p^2}.$$

Instead of the independence of the events $A_i$, it would suffice to assume that $P(A_i \mid \overline{A_1} \wedge \dots \wedge \overline{A_{i-1}}) = p$. If we only assume that $P(A_i \mid \overline{A_1} \wedge \dots \wedge \overline{A_{i-1}}) \ge p$, then the inequalities $E(N) \le 1/p$ and $E(N^2) \le (2-p)/p^2$ still follow.

Example 1.3 (Gambler’s Ruin) A gambler is betting on coin flips; at every turn, he loses one dollar or gains one dollar. He starts with k dollars, and sets a target wealth of n > k dollars. He quits if he either reaches n dollars (a win) or loses all his money (a loss).
What is his probability of winning?
This can be phrased as a random walk on a path with nodes {0, 1, . . . , n}, starting at
node k. Let f (k) be the probability of hitting n before 0. Then f (n) = 1, f (0) = 0, and
$$f(k) = \frac12 f(k-1) + \frac12 f(k+1) \qquad (1 \le k \le n-1).$$
So the numbers f (k) form an arithmetic progression, and hence f (k) = k/n.
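A quick Monte Carlo check of f(k) = k/n (my own sketch; the parameters are arbitrary):

```python
import random

def win_probability(k, n, trials=100_000):
    """Estimate the probability of hitting n before 0 from k."""
    wins = 0
    for _ in range(trials):
        x = k
        while 0 < x < n:
            x += random.choice((-1, 1))   # fair coin flip
        wins += (x == n)
    return wins / trials

print(win_probability(k=3, n=10), 3 / 10)  # the two values should be close
```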

Example 1.4 (Coupon Collector Problem) A type of cereal box contains one of n dif-
ferent coupons, each with the same probability. How many boxes do you have to open, in
expectation, in order to collect all of the different coupons?
Let $T_i$ denote the first time when i coupons have been collected. So $T_0 = 0 < T_1 = 1 < \dots < T_n$, and we want to determine $E(T_n)$. The difference $T_{k+1} - T_k$ is the number of boxes you open before finding a new coupon, when having k coupons already. Each box contains a new coupon with probability $(n-k)/n$, independently of the previous steps. Hence by the Waiting Problem (Example 1.2),
$$E(T_{k+1} - T_k) = \frac{n}{n-k},$$

and so the total expected time is
$$E(T_n) = \sum_{k=0}^{n-1} E(T_{k+1} - T_k) = \sum_{k=0}^{n-1} \frac{n}{n-k} = n\,\mathrm{har}(n) \sim n \ln n,$$
where
$$\mathrm{har}(n) = 1 + \frac12 + \frac13 + \dots + \frac1n.$$
This can be modeled as a random walk on a complete graph with a loop at each node.
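The identity $E(T_n) = n\,\mathrm{har}(n)$ is easy to test empirically; the following sketch (with arbitrary parameters, not from the notes) compares a simulated average with the formula.

```python
import random

def collect_all(n):
    """Number of boxes opened until all n coupon types are seen."""
    seen, steps = set(), 0
    while len(seen) < n:
        seen.add(random.randrange(n))   # one box = one uniform coupon
        steps += 1
    return steps

n, trials = 20, 20_000
estimate = sum(collect_all(n) for _ in range(trials)) / trials
har = sum(1 / i for i in range(1, n + 1))
print(estimate, n * har)   # the two numbers should be close
```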

Example 1.5 (Card shuffling) The usual “riffle shuffle” can be modeled as a Markov chain
on the set of all permutations of 52 cards. A step is generated by taking a random sequence
$\varepsilon_1 \varepsilon_2 \cdots \varepsilon_{52}$ of 0’s and 1’s of length 52. If there are $a$ 1’s in the sequence, we take the top $a$ cards off the deck, place them to the right of the rest, and then merge the two piles so that the i-th card comes from the pile on the right if and only if $\varepsilon_i = 1$. The question is, how
many steps are needed to “shuffle well”, i.e., to get a permutation approximately uniformly
distributed over all permutations? The surprising answer is: seven shuffle moves suffice.

1.2 Matrices of graphs


We need the following theorem. We call an n × n matrix irreducible if it does not contain a k × (n − k) block of 0’s disjoint from the diagonal.

Theorem 1.6 (Perron-Frobenius) If an n × n matrix has nonnegative entries then it has


a nonnegative real eigenvalue λ which has maximum absolute value among all eigenvalues.
This eigenvalue λ has a nonnegative real eigenvector. If, in addition, the matrix is irreducible,
then λ has multiplicity 1 and the corresponding eigenvector is positive (up to sign change).

The adjacency matrix of a simple graph G is defined as the n × n matrix $A = A_G = (A_{ij})$ in which
$$A_{ij} = \begin{cases} 1, & \text{if } i \text{ and } j \text{ are adjacent},\\ 0, & \text{otherwise}. \end{cases}$$

If G has multiple edges, then we let Aij = a(i, j). We could also allow loops and include this
information in the diagonal. In this course, a loop at node i adds one to the degree of the
node.
The Laplacian of the graph is defined as the n × n matrix $L = L_G = (L_{ij})$ in which
$$L_{ij} = \begin{cases} d(i), & \text{if } i = j,\\ -a(i,j), & \text{if } i \neq j. \end{cases}$$

So L = D − A, where D = DG is the diagonal matrix of the degrees of G. Clearly L1 = 0.

Let $P = D^{-1}A$ be the transition matrix of the random walk. Explicitly,
$$(P)_{ij} = p_{ij} = \frac{a(i,j)}{d(i)}.$$
If G is d-regular, then P = (1/d)A.
The matrix P is not symmetric in general, but the equation
$$D^{1/2} P D^{-1/2} = D^{-1/2} A D^{-1/2}$$
shows that P is similar to the symmetric matrix $\hat P = D^{-1/2} A D^{-1/2}$. In particular, the eigenvalues $\lambda_1 > \lambda_2 \ge \dots \ge \lambda_n$ of P are also eigenvalues of $\hat P$, and hence they are real numbers. If $w_1, \dots, w_n$ are the corresponding orthonormal eigenvectors, then the spectral decomposition
$$\hat P = \sum_{k=1}^n \lambda_k w_k w_k^T$$
gives the decomposition
$$P = \sum_{k=1}^n \lambda_k u_k v_k^T, \tag{3}$$
where $u_k = D^{-1/2} w_k$ are right eigenvectors and $v_k = D^{1/2} w_k$ are left eigenvectors of P. We have $u_k^T v_l = \mathbb{1}(k = l)$, and (with the appropriate scaling) $u_1 = \mathbb{1}$ and $v_1 = \pi$.
We can express the distribution after t steps by a simple formula. We have
$$\sigma^{t+1}_j = \sum_i \sigma^t_i \frac{a(i,j)}{d(i)}.$$
This can be written as $\sigma^{t+1} = P^T \sigma^t$, and hence
$$\sigma^t = (P^T)^t \sigma^0 = \sum_{k=1}^n \lambda_k^t (u_k^T \sigma^0) v_k. \tag{4}$$
The matrix entry $(P^t)_{ij}$ is the probability that starting at i we reach j in t steps.
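As a numerical illustration (my own sketch; the 4-node graph and step count are hypothetical, and numpy is assumed), one can build $P = D^{-1}A$ and iterate $\sigma^{t+1} = P^T \sigma^t$; since the graph is non-bipartite, $\sigma^t$ approaches the distribution $d(i)/(2m)$ discussed in the next section.

```python
import numpy as np

A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 1],
              [1, 1, 0, 1],
              [0, 1, 1, 0]], dtype=float)    # adjacency matrix
D = np.diag(A.sum(axis=1))
P = np.linalg.inv(D) @ A                     # transition matrix P = D^{-1} A

sigma = np.array([1.0, 0, 0, 0])             # start concentrated on node 0
for t in range(6):
    sigma = P.T @ sigma                      # sigma^{t+1} = P^T sigma^t
pi = A.sum(axis=1) / A.sum()                 # stationary distribution d(i)/(2m)
print(sigma, pi)                             # sigma is already close to pi
```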

Theorem 1.7 −1 is an eigenvalue of P if and only if G is bipartite.

1.3 Stationary distribution


Considering a random walk on a graph G, define
$$\pi_i = \frac{d(i)}{2m} \qquad (i \in V). \tag{5}$$
This is a probability distribution on V, called the stationary distribution of the chain. Note that
$$\pi_i p_{ij} = \frac{1}{2m} \qquad (ij \in E). \tag{6}$$

If we start the chain from initial distribution $\sigma^0 = \pi$, then
$$\sigma^1_j = \sum_i \pi_i p_{ij} = \frac{1}{2m}\, d(j) = \pi_j,$$

and repeating this, we see that the distribution σ t after any number of steps remains π. (This
explains the name.)
In the more general case of Markov chains, we cannot give such a simple definition, but the Perron–Frobenius Theorem implies that there is a distribution $(\pi_i : i = 1, \dots, n)$ preserved by the chain, i.e.,
$$\sum_{i=1}^n \pi_i p_{ij} = \pi_j \qquad (j = 1, \dots, n). \tag{7}$$

So π is a left eigenvector of the transition matrix P , belonging to the eigenvalue 1. (The


corresponding right eigenvector is 1.)
It also follows from the Perron–Frobenius Theorem that the stationary distribution of an
irreducible Markov chain is unique.
If G is regular, then the Markov chain is symmetric: puv = pvu . We say that the
Markov chain is time-reversible, if πu puv = πv pvu for all u and v. The random walk on an
(undirected) graph is time-reversible by (6). (Informally, this means that a random walk
considered backwards is also a random walk.)

Theorem 1.8 For the random walk on a non-bipartite graph, the distribution of v t tends to
the stationary distribution as t → ∞.

This is not true for bipartite graphs if n > 1, since $v^t$ is concentrated on one or the other color class, depending on the parity of t.

Proof. By the Perron–Frobenius Theorem, every eigenvalue of P is in the interval [−1, 1]; if G is non-bipartite then it follows that −1 is not an eigenvalue. Using (4), we see that
$$\sigma^t = \sum_{k=1}^n \lambda_k^t (u_k^T \sigma^0)\, v_k \to (u_1^T \sigma^0)\, v_1 \qquad (t \to \infty).$$
We know that $u_1 = \mathbb{1}$ and $v_1 = \pi$, and $u_1^T \sigma^0 = \sum_i \sigma^0_i = 1$, so $\sigma^t \to \pi$ as claimed. □

1.4 Harmonic functions


Let G be a connected simple graph, $S \subseteq V$ and $f : V \to \mathbb{R}$. The function f is called harmonic at a node $v \in V$ if
$$\frac{1}{d(v)} \sum_{u \in N(v)} f(u) = f(v), \tag{8}$$

asserting that the value of f at v is the average of its values at the neighbors of v. A node where a function is not harmonic is called a pole of the function. Another way of writing the definition is
$$\sum_{u \in N(v)} \big(f(v) - f(u)\big) = 0 \qquad \forall v \in V \setminus S. \tag{9}$$

If we allow multiple edges, then the definition is
$$\frac{1}{d(v)} \sum_{u \in V} a(u,v)\, f(u) = f(v), \tag{10}$$
where a(u, v) is the multiplicity of the edge uv.


Every constant function is harmonic at each node. On the other hand,

Proposition 1.9 Every nonconstant function on the nodes of a connected graph has at least
two poles.

Proof. Let S be the set where the function assumes its maximum, and let S′ be the set of those nodes in S that are connected to any node outside S. Then every node in S′ must be a pole, since in (8), every value f(u) on the left hand side is at most f(v), and at least one is less, so the average is less than f(v). Since the function is nonconstant, S is a nonempty proper subset of V, and since the graph is connected, S′ is nonempty. So there is a pole where the function attains its maximum. Similarly, there is another pole where it attains its minimum. □

For any two nodes, there is a nonconstant function that is harmonic everywhere else. More generally, we have the following theorem.

Theorem 1.10 For a connected simple graph G, nonempty set S ⊆ V and function f0 : S →
R, there is a unique function f : V → R extending f0 that is harmonic at each node of V \ S.

We call this function f the harmonic extension of $f_0$. Note that if |S| = 1, then the harmonic extension is a constant function (so it is also harmonic at the single node of S, and this does not contradict Proposition 1.9).
The uniqueness of the harmonic extension is easy by the argument in the proof of Propo-
sition 1.9. Suppose that f and f 0 are two harmonic extensions of f0 . Then g = f − f 0 is
harmonic on V \ S, and satisfies g(v) = 0 at each v ∈ S. If g is the identically 0 function,
then f = f 0 as claimed. Else, either its minimum or its maximum is different from 0. But
we have seen that both the minimizers and the maximizers contain at least one pole, which
is a contradiction.
To prove the existence, we describe three constructions, which all will be useful.

(a) Let u be the (random) point where a random walk starting at vertex v hits S (we
know this happens almost surely), and let f (v) = E(f0 (u)). Then f is a harmonic extension
of f0 .

(b) Consider the graph G as an electrical network, where each edge represents a unit
resistance. Keep node u ∈ S at electric potential f0 (u), and define f (v) as the potential of
node v. Then f is a harmonic extension of f0 . (Use Kirchhoff’s Laws.)
(c) Consider the edges of the graph G as ideal rubber bands with unit Hooke constant
(i.e., it takes h units of force to stretch them to length h). Let us nail down each node u ∈ S
to point f0 (u) on the real line, and let the graph find its equilibrium. The energy is a positive
definite quadratic form of the positions of the nodes, and so there is a unique minimizing
position, which is in equilibrium. The positions of the nodes define a harmonic extension of
f0 .

Example 1.11 Let S = {a, b} and $f_0(a) = 0$, $f_0(b) = 1$. Let f be the harmonic extension. Then f(v) is the probability that a random walk starting at v hits b before a.
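A sketch of Example 1.11 on a concrete graph (a path, chosen by me so that the answer is known from the Gambler's Ruin): the harmonic extension is found by solving the linear system expressing (8) at each interior node. Assumes numpy.

```python
import numpy as np

n = 5                                     # path 0-1-2-3-4, a = 0, b = 4
edges = [(i, i + 1) for i in range(n - 1)]
a, b = 0, n - 1
M, rhs = np.zeros((n, n)), np.zeros(n)
for v in range(n):
    if v in (a, b):
        M[v, v], rhs[v] = 1.0, (1.0 if v == b else 0.0)   # boundary values
    else:
        nbrs = [y if x == v else x for (x, y) in edges if v in (x, y)]
        M[v, v] = len(nbrs)
        for u in nbrs:
            M[v, u] -= 1.0                # d(v) f(v) - sum of f over N(v) = 0
print(np.linalg.solve(M, rhs))            # [0, 0.25, 0.5, 0.75, 1], i.e. f(v) = v/4
```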

We can extend the notion of harmonic functions to Markov chains. We say that a function $f : V \to \mathbb{R}$ is harmonic at node i if
$$\sum_j p_{ij} f(j) = f(i).$$

Essentially the same proof as for random walks implies that if a function on the nodes of an
irreducible Markov chain is harmonic at every node, then the function is constant.

Lemma 1.12 Let us stretch two nodes a, b of a graph G in the rubber band model to distance 1. Let F(a, b) be the force needed for this, and let f(u) be the position of node u if a is at point 0 and b is at point 1.
(a) The effective resistance in the electrical network between a and b is 1/F(a, b).
(b) $F(a,b) = \sum_{ij \in E} (f(i) - f(j))^2$.

Proof. (a) We have $F(a,b) = \sum_{i \in N(a)} f(i)$, since a neighbor i of a pulls a with force $f(i) - f(a) = f(i)$. On the other hand, if we fix the potentials f(a) = 0 and f(b) = 1, then by Ohm’s Law, the current through an edge ai is f(i). Hence the current through the network is $\sum_{i \in N(a)} f(i) = F(a,b)$, and by Ohm’s Law, the effective resistance is 1/F(a, b).
(b) For the rubber band model, imagine that we slowly stretch the graph until nodes a and b are at distance 1. When they are at distance t, the force pulling our hands is tF(a, b), and hence the energy we have to spend is
$$\int_0^1 t F(a,b)\, dt = \frac12 F(a,b).$$

This energy accumulates in the rubber bands. By a similar argument, the energy stored in the rubber band ij is $(f(i) - f(j))^2/2$. By conservation of energy, we get the identity
$$\sum_{ij \in E} (f(i) - f(j))^2 = F(a,b). \tag{11}$$
□


Exercise 1.13 For a finite Markov chain, its underlying graph is obtained by
connecting two states i and j by an edge if either pij > 0 or pji > 0. Prove that if
the underlying graph of a Markov chain is a tree, then the chain is time-reversible.
Exercise 1.14 Let G be a bipartite graph with bipartition {U, W}. Let us start a random walk from a node u ∈ U. Prove that for i ∈ U,
$$\sigma^{2t}_i \to \frac{d(i)}{m} \qquad (t \to \infty).$$
Exercise 1.15 Let G be the standard grid graph in the plane (defined on lattice
points, where two of them are connected by an edge if their distance is 1; so G is
countably infinite), and let f be a non-negative valued harmonic function on G.
Prove that f is constant.

2 Times
The return time Ru to node u is the expected number of steps of the random walk starting
at u before it returns to u.
The hitting time H(u, v) (also called access time) from vertex u to vertex v is the expected number of steps of a random walk starting at u before visiting v. We set $H_{\max} = \max_{u,v} H(u,v)$.
The commute time comm(u, v) between vertices u and v is the expected number of steps of a random walk starting at u before visiting v and returning to u. Clearly comm(u, v) = comm(v, u) = H(u, v) + H(v, u).
The cover time C(u) from vertex u is the expected number of steps of a random walk starting at u before every vertex is visited. We set $C_{\max} = \max_u C(u)$.
Warning: often the hitting time is defined as a random variable, the number of steps of
the random walk starting at u before visiting v (which depends on the random walk), and the
number H(u, v) is called the expected hitting time or mean hitting time; similarly for comm
etc.
Example 1.2 concerns hitting time, Example 1.4 concerns cover time, while Example 1.5
is about “mixing time” to be discussed later.

2.1 Return time


We start with a technical lemma.

Lemma 2.1 Let $v^0, v^1, \dots$ be a walk in a finite irreducible Markov chain started at a node u. Then the random variable $T = \min\{t : v^t = w\}$ has finite expectation and variance for any node w.

Proof. Between any two nodes u and w there is a path $u_0 = u, u_1, \dots, u_k = w$ of length k < n such that $p_{u_i,u_{i+1}} > 0$ for every 0 ≤ i < k. Therefore there is an ε > 0 such that the probability that we visit w within n steps is at least ε. The same holds for the next stretch of n steps etc. By the Waiting Problem (Example 1.2), the expected number of stretches to wait is at most 1/ε, so $E(T) \le n/\varepsilon$ is finite. Similarly, $E(T^2)$ is finite, and hence so is $Var(T) = E(T^2) - E(T)^2$. □

This lemma implies that Ru is finite and well-defined.

Theorem 2.2 For every finite Markov chain and every node u, Ru = 1/πu .

Proof. Before giving an exact proof, let us describe a simple heuristic why this is true. Consider a very long random walk $v^0, v^1, \dots, v^T$ started from the stationary distribution. Then $P(v^t = u) = \pi_u$, and so the expected number of visits to u is $T\pi_u$. The expected time between two consecutive visits is $R_u$, so the total time T is about $(T\pi_u) R_u$. So $T = T\pi_u R_u$, and $\pi_u R_u = 1$.
To make this precise, let N be a large positive integer, let ε > 0, and set $t = (1-\varepsilon) R_u N$. Start a random walk from the stationary distribution. Let $T_k$ denote the time of the k-th visit to u, and let X denote the number of visits before time t.
We have $E(T_{k+1} - T_k) = R_u$, and the differences $T_{k+1} - T_k$ are independent identically distributed random variables with finite expectation and variance by Lemma 2.1, hence
$$\frac{1}{N} T_N = \frac{1}{N} T_1 + \frac{1}{N} \sum_{k=1}^{N-1} (T_{k+1} - T_k)$$
will be arbitrarily close to $R_u$ if N is large enough, with probability arbitrarily close to 1. This means that with probability at least 1 − ε, $T_N \ge t$, and in such cases, X ≤ N. Thus
$$E(X) \le P(X \le N)\, N + P(X > N)\, t \le N + \varepsilon t.$$
On the other hand, we have $E(X) = t\pi_u$, and so $(\pi_u - \varepsilon)t = (\pi_u - \varepsilon)(1-\varepsilon) R_u N \le N$. Dividing by N and letting ε → 0, we get $\pi_u R_u \le 1$. The inequality $\pi_u R_u \ge 1$ follows similarly. □

Corollary 2.3 For the random walk on a graph starting from node v, the expected number of steps before a given edge uv is traversed from u to v is 2m.

2.2 Hitting time
There is a basic equation for hitting times:
$$H(i,j) = \begin{cases} 1 + \dfrac{1}{d(i)} \displaystyle\sum_{k \in N(i)} H(k,j), & \text{if } i \neq j,\\[2mm] 0, & \text{if } i = j. \end{cases} \tag{12}$$

Note that
$$1 + \frac{1}{d(i)} \sum_{k \in N(i)} H(k,i) = R_i = \frac{1}{\pi_i}$$
is the return time to i, so in terms of the matrix
$$H = \big(H(i,j)\big)_{i,j=1}^n,$$
we can write (12) as
$$(I - P)H = J - R, \tag{13}$$
where J is the all-1 matrix and R is the diagonal matrix with the return times in the diagonal.
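For a fixed target j, equation (12) is a nonsingular linear system in the unknowns H(i, j), i ≠ j, so the hitting times to j come from one linear solve. A small sketch (a hypothetical helper of my own, assuming numpy), checked against Example 2.4 below:

```python
import numpy as np

def hitting_times_to(A, j):
    """A: adjacency matrix; returns the vector (H(i, j) : i in V)."""
    n = len(A)
    P = A / A.sum(axis=1, keepdims=True)          # transition matrix
    idx = [i for i in range(n) if i != j]
    M = np.eye(n - 1) - P[np.ix_(idx, idx)]       # (I - P) with row/column j removed
    h = np.linalg.solve(M, np.ones(n - 1))        # H(i,j) = 1 + sum_k p_ik H(k,j)
    H = np.zeros(n)
    H[idx] = h
    return H

# Path of length 4 (5 nodes): H(0, 4) should be 4^2 = 16.
A = np.diag(np.ones(4), 1) + np.diag(np.ones(4), -1)
print(hitting_times_to(A, 4))                     # [16, 15, 12, 7, 0]
```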
We can give a nice geometric interpretation. Consider the graph as a rubber band struc-
ture, and attach a weight of d(v) to each node v. Nail the node b to the wall and let the
graph find its equilibrium. Each node v will be at a distance of H(v, b) below b.
The following two examples are easy to verify using this geometric interpretation.

Example 2.4 The hitting time on a path of length n from one endpoint to the other is $n^2$; more generally, the hitting time from a node at distance k from an endpoint v to v is $n^2 - (n-k)^2 = k(2n-k)$.

Example 2.5 The hitting time between two nodes at distance k on a circuit of length n is k(n − k).

Example 2.6 The hitting time on the 3-dimensional cube from one vertex to the opposite
one is 10.

Example 2.7 Take a clique of size n/2 and attach to it an endpoint of a path of length n/2. Let i be the attachment point, and j the “free” endpoint of the path. Then
$$H(i,j) \sim \frac{n^3}{8}.$$
This last example is the worst for trying to hit a node as soon as possible, at least up to a constant.

Theorem 2.8 For the random walk on a connected graph, for any two nodes at distance r, we have $H(i,j) < 2mr < n^3$.

Proof. For two adjacent nodes i and j, starting from i, the edge ij is traversed from j to i in expected time 2m (Corollary 2.3), so H(i, j) < 2m. More generally, let $(i = v_0, v_1, \dots, v_r = j)$ be a path, and let $T_k$ be the first time when $v_0, \dots, v_k$ have all been visited. By the above, $E(T_{k+1} - T_k) < 2m$, and hence $E(T_r) < 2mr < n^3$. □

Example 2.9 H(u, v) is not a symmetric function, even for time-reversible chains (even random walks on undirected graphs): for the first two nodes u and v of a path on n nodes, we have H(u, v) = 1 but H(v, u) = 2n − 3. Moreover, H(u, v) may be different from H(v, u) even on a regular graph (Exercise 2.24).
However, if the graph has a node-transitive automorphism group, then H(u, v) = H(v, u) for any two nodes (Corollary 2.12).

Lemma 2.10 (Cycle Reversal Lemma) For the random walk on a connected graph, for
any three nodes u, v and w, H(u, v) + H(v, w) + H(w, u) = H(u, w) + H(w, v) + H(v, u).

Proof. Starting a random walk at u, walk until v is visited; then walk until w is visited; then walk until u is reached. Call this random sequence a uvwu-tour. The expected number of steps in a uvwu-tour is H(u, v) + H(v, w) + H(w, u). On the other hand, we can express this number as follows. Let $W = (u_0, u_1, \dots, u_N = u_0)$ be a closed walk. The probability that we have walked exactly this way is
$$P(W) = \prod_{i=0}^{N-1} \frac{1}{d(u_i)},$$
which is independent of the starting point and remains the same if we reverse the order.
Let a(W) denote the number of ways this closed walk arises as a uvwu-tour, i.e., the number of occurrences of u in W where we can start W to get a uvwu-tour (note that the same value would be obtained by considering v or w instead of u). We shall show that the number of ways the reverse closed walk $W' = (u_N, u_{N-1}, \dots, u_0 = u_N)$ arises as a uwvu-tour is also a(W). Since the expected length of a uvwu-tour is $\sum_W P(W)\, a(W)\, |W|$, this will prove the identity in the problem. (It will also follow that a(W) is 1 or 2.)
Call an occurrence of u in the closed walk W “forward good” if starting from u and
following the walk until v occurs, then following it until w occurs, then following it until u
occurs, we traverse the whole walk exactly once. Call this occurrence “backward good” if this
holds with the orientation of W as well as the role of v and w reversed. Clearly a(W ) is the
number of “forward good” occurrences of u, so it suffices to verify that for every closed walk
W , the number of “forward good” occurrences of u is the same as the number of “backward
good” occurrences. (Note that a “forward good” occurrence need not be “backward good”.)
Assume that W arises as a uvwu-tour at least once; say $u_0 = u$, $u_i = v$, and $u_j = w$ (0 < i < j < N), where $W_1 = \{u_1, \dots, u_{i-1}\}$ does not contain v, $W_2 = \{u_{i+1}, \dots, u_{j-1}\}$ does not contain w, and $W_3 = \{u_{j+1}, \dots, u_{N-1}\}$ does not contain u. Assume first that $W_2$ does not contain u either. Then $u_0$ is the only “forward good” occurrence of u, and the last occurrence of u in $W_1$ is the only “backward good” occurrence.
Second, assume that W2 contains u. Similarly, we may assume that W3 contains v and W1
contains w. Let ut be the last occurrence of u in W2 . It is easy to check that ut is “backward
good”. So we see that if W arises as a uvwu-tour then it also arises as a uwvu-tour.
Assume now that a(W ) > 1. Then there must be a second “forward good” element, and
it is easy to check that this can only be the first occurrence us of u on W2 ; it also follows
that all occurrences of v on W2 must come before us , and similarly, all occurrences of w on
W3 must come before the first occurrence of v on W3 , and all occurrences of u on W1 must
come before the first occurrence of w on W1 . But in this case there are exactly two “forward
good” and exactly two “backward good” occurrences of u. So a(W ) = a(W 0 ) = 2. 

Corollary 2.11 The vertices of any graph can be ordered so that if u precedes v then H(u, v) ≤ H(v, u).

Proof. Fix any node u, and define an ordering $(v_1, \dots, v_n)$ of the nodes so that
$$H(u, v_1) - H(v_1, u) \ge H(u, v_2) - H(v_2, u) \ge \dots \ge H(u, v_n) - H(v_n, u).$$
Then by Lemma 2.10,
$$H(v_i, v_j) + H(v_j, u) + H(u, v_i) = H(v_j, v_i) + H(v_i, u) + H(u, v_j),$$
and so for i < j,
$$H(v_j, v_i) - H(v_i, v_j) = \big(H(u, v_i) - H(v_i, u)\big) - \big(H(u, v_j) - H(v_j, u)\big) \ge 0. \qquad \square$$

Corollary 2.12 If a graph has a node-transitive automorphism group, then H(u, v) = H(v, u) for any two nodes.

We express hitting times in terms of the transition matrix. By (3), we have
$$I - P = \sum_{k=2}^n (1 - \lambda_k)\, u_k v_k^T.$$

Consider the matrix
$$(I - P)' = \sum_{k=2}^n \frac{1}{1 - \lambda_k}\, u_k v_k^T.$$

This is a pseudoinverse of I − P: the matrix I − P is singular, so it does not have a proper inverse, but we have $(I-P)(I-P)'(I-P) = I-P$, $(I-P)'(I-P)(I-P)' = (I-P)'$, and it is easy to check that $(I-P)'(I-P)$ and $(I-P)(I-P)'$ are symmetric matrices. Furthermore, $(I-P)'\mathbb{1} = (I-P)'u_1 = 0$ by the orthogonality relations, and so $(I-P)'J = 0$. We can compute $(I-P)'$ by making it nonsingular, and then inverting it:
$$(I-P)' = (I - P + x\,\mathbb{1}\pi^T)^{-1} - \frac{1}{x}\,\mathbb{1}\pi^T$$
with an arbitrary x ≠ 0.
Let $G = -(I-P)'R$, where R is the diagonal matrix with the return times in the diagonal. Using that $R = 2mD^{-1}$, we can write this as
$$G = -2m\, D^{-1/2} (I - D^{1/2} P D^{-1/2})'\, D^{-1/2} = -2m\, D^{-1/2} (I - \hat P)'\, D^{-1/2},$$
showing that G is a symmetric matrix.

Lemma 2.13 $H(i,j) = G_{ij} - G_{jj}$.

Proof. Recall the equation
$$(I - P)H = J - R. \tag{14}$$
Unfortunately, the matrix I − P is singular, and so (14) does not uniquely determine H. But using the generalized inverse,
$$(I-P)H = (I-P)(I-P)'(I-P)H = (I-P)(I-P)'(J - R) = -(I-P)(I-P)'R,$$
and hence
$$(I-P)\big(H + (I-P)'R\big) = 0.$$
The nullspace of I − P consists of the multiples of $\mathbb{1}$, hence $(I-P)X = 0$ implies that every column of X is constant. So
$$H = -(I-P)'R + \mathbb{1}g^T = G + \mathbb{1}g^T \tag{15}$$
with some vector g. Using that $H_{ii} = 0$, we get that $g_i = -G_{ii}$. □

Corollary 2.14 Let $\lambda_1 = 1 > \lambda_2 \ge \dots \ge \lambda_n$ be the eigenvalues of the transition matrix P of the random walk on graph G, and let $u_1, \dots, u_n$, $v_1, \dots, v_n$ be the corresponding right and left eigenvectors. Then
$$H(i,j) = 2m \sum_{k=2}^n \frac{u_{kj} v_{kj} - u_{ki} v_{kj}}{d(j)(1 - \lambda_k)}.$$

Proof. We have
$$\big((I-P)'R\big)_{ij} = \sum_{k=2}^n \frac{u_{ki} v_{kj} R_j}{1 - \lambda_k} = 2m \sum_{k=2}^n \frac{u_{ki} v_{kj}}{d(j)(1 - \lambda_k)},$$
so $G_{ij} = -\big((I-P)'R\big)_{ij}$. Substituting in the formula of Lemma 2.13, the corollary follows. □

Lemma 2.15 (Random Target Lemma) For every Markov chain,
$$\sum_j \pi_j H(i,j) = N \tag{16}$$
is independent of the starting node i.

Proof. Let $f(i) = \sum_j \pi_j H(i,j)$; we show that f is harmonic. Indeed, for every node i,
$$\sum_j p_{ij} f(j) = \sum_j p_{ij} \sum_k \pi_k H(j,k) = \sum_k \pi_k \sum_j p_{ij} H(j,k) = \sum_{k \neq i} \pi_k \big(H(i,k) - 1\big) + \pi_i (R_i - 1) = \sum_k \pi_k H(i,k) = f(i)$$
(using that $\pi_i R_i = 1$). Since a function that is harmonic at every node of an irreducible chain is constant, f is independent of i. □


In the case of random walks on a graph, we can express N by the spectrum. Using (15) and $R\pi = 2mD^{-1}\pi = \mathbb{1}$,
$$H\pi = -(I-P)'R\pi + \mathbb{1}g^T\pi = -(I-P)'\mathbb{1} + (g^T\pi)\mathbb{1} = (g^T\pi)\mathbb{1}.$$
Hence
$$N = g^T\pi = \sum_{k=2}^n \frac{1}{1 - \lambda_k}.$$
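Both Corollary 2.14 and this spectral expression for N are easy to verify numerically. The following sketch (a hypothetical 4-node graph of my choosing, assuming numpy) uses the normalization $u_k = \Pi^{-1/2} w_k$, $v_k = \Pi^{1/2} w_k$ with $\Pi = \mathrm{diag}(\pi)$, for which $u_1 = \mathbb{1}$ and $v_1 = \pi$.

```python
import numpy as np

A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 1],
              [1, 1, 0, 1],
              [0, 1, 1, 0]], dtype=float)
d = A.sum(axis=1)
m = A.sum() / 2
pi = d / (2 * m)
Phat = A / np.sqrt(np.outer(d, d))           # symmetrized transition matrix
lam, W = np.linalg.eigh(Phat)                # eigenvalues in ascending order
lam, W = lam[::-1], W[:, ::-1]               # now lambda_1 = 1 comes first

U = W / np.sqrt(pi)[:, None]                 # columns u_k = Pi^{-1/2} w_k
V = W * np.sqrt(pi)[:, None]                 # columns v_k = Pi^{1/2} w_k
n = len(A)
H = np.zeros((n, n))
for i in range(n):
    for j in range(n):                       # Corollary 2.14
        H[i, j] = 2 * m * sum((U[j, k] * V[j, k] - U[i, k] * V[j, k])
                              / (d[j] * (1 - lam[k])) for k in range(1, n))
print(H @ pi)                                # constant vector: the random target lemma
print(sum(1 / (1 - lam[k]) for k in range(1, n)))  # the same constant N
```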

2.3 Commute time


Lemma 2.16 The probability that a random walk on a graph starting at u visits v before returning to u is $R_u/\mathrm{comm}(u,v)$.

Proof. Let T be the first time the random walk starting at u returns to u, and let S be the first time it returns to u after visiting v. Then $E(T) = R_u$ and $E(S) = \mathrm{comm}(u,v)$. Clearly S ≥ T, and equality holds if and only if the random walk visits v before returning to u; let p be the probability of this event. If it does not happen (with probability 1 − p), then after returning to u the walk must still visit v and then return to u, which takes comm(u, v) expected time. Hence
$$E(S - T) = p \cdot 0 + (1-p) \cdot \mathrm{comm}(u,v).$$
Thus
$$\mathrm{comm}(u,v) = E(S) = E(T) + E(S - T) = R_u + (1-p)\,\mathrm{comm}(u,v),$$
and the lemma follows. □

Lemma 2.17 If a connected graph G is considered as an electrical network (with unit resistances on the edges), then the effective resistance R(u, v) between nodes u and v is comm(u, v)/(2m).

Proof. Let $\varphi : V \to \mathbb{R}$ be the (unique) function that satisfies $\varphi(u) = 0$, $\varphi(v) = 1$, and is harmonic at the other nodes.
Let us keep node u at potential 0 and node v at potential 1. By Ohm’s Law, the current on an edge ij, in the direction from i to j, is $\varphi(j) - \varphi(i)$. Hence the total current from u to v is $\sum_{j \in N(u)} \varphi(j)$, and the effective resistance of the network is
$$R(u,v) = \Big(\sum_{j \in N(u)} \varphi(j)\Big)^{-1}. \tag{17}$$
On the other hand, for every j, $\varphi(j)$ is the probability that the walk starting at j hits v before u. Hence $(1/d(u)) \sum_{j \in N(u)} \varphi(j)$ is the probability that starting from u, we visit v before returning to u. Thus by Lemma 2.16,
$$\frac{1}{d(u)} \sum_{j \in N(u)} \varphi(j) = \frac{R_u}{\mathrm{comm}(u,v)}.$$
Since $d(u) R_u = d(u)/\pi_u = 2m$, together with (17) this proves the lemma. □

Theorem 2.18 For any two nodes at distance r, $\mathrm{comm}(i,j) < 4mr < 2n^3$.

This follows similarly as Theorem 2.8.

2.4 Cover time


Example 2.19 Assuming that we start from an endnode, the cover time of the path on n nodes is also $(n-1)^2$, since it suffices to reach the other endnode.

Example 2.20 To determine the cover time $C_n$ of a cycle of length n, note that it is the same as the time needed on a very long path, starting from the midpoint, to visit n nodes. We have to reach n − 1 nodes first, which takes $C_{n-1}$ steps on average. At this point, we have a subpath with n − 1 nodes covered, and we are sitting at one of its endpoints. To reach a new node means to reach one of the endnodes of a path with n + 1 nodes from a neighbor of an endnode. Clearly, this is the same as the hitting time between two consecutive nodes of a circuit of length n, which is one less than the return time, i.e., n − 1. This leads to the recurrence
$$C_n = C_{n-1} + (n-1).$$
Hence $C_n = n(n-1)/2$.

Theorem 2.21 $C_{\max} < 2nm$.

Let $H_{\max} = \max_{u,v} H(u,v)$ and $H_{\min} = \min_{u,v} H(u,v)$.

Theorem 2.22 $\mathrm{har}(n)\, H_{\min} \le C_{\max} \le \mathrm{har}(n)\, H_{\max}$.

Proof. Let $(\sigma_1, \dots, \sigma_n)$ be a uniform random permutation of the vertices, and let $A_k$ be the event that $\sigma_k$ is the last visited node of $\{\sigma_1, \dots, \sigma_k\}$. Then
$$P(A_k) = \frac{1}{k}.$$
Note: this is independent of the walk. Let $T_k$ be the first time $\sigma_1, \dots, \sigma_k$ are all visited; then
$$E(T_k - T_{k-1} \mid A_k) \le H_{\max},$$
and
$$E(T_k - T_{k-1} \mid \bar A_k) = 0.$$
Hence
$$E(T_k - T_{k-1}) \le \frac{1}{k} H_{\max} + \frac{k-1}{k} \cdot 0 = \frac{H_{\max}}{k}.$$
Summing over all k we get the upper bound in the theorem. The lower bound follows similarly. □

2.5 Universal traverse sequences


Let G be a connected d-regular graph, u ∈ V (G), and assume that at each node, the ends of the edges incident with the node are labeled 1, 2, . . . , d. A traverse sequence (for this graph, starting point, and labeling) is a sequence $(h_1, h_2, \dots, h_t) \in \{1, \dots, d\}^t$ such that if we start a walk at $v^0 = u$ and at the i-th step, we leave the current vertex through the edge labeled $h_i$, then we visit every vertex. A universal traverse sequence is a sequence which is a traverse sequence for every connected d-regular graph on n vertices, every labeling of it, and every starting point.

Theorem 2.23 For every d ≥ 2 and n ≥ 2, there exists a universal traverse sequence of length $O(d^2 n^3 \log n)$.

Proof. The “construction” is easy: we consider a random sequence. More exactly, let $t = \lceil 8 d^2 n^3 \log_2 n \rceil$, and let $H = (h_1, \dots, h_t)$ be chosen uniformly at random from $\{1, \dots, d\}^t$. For a fixed G, starting point, and labeling, the walk defined by H is just a random walk; so the probability p that H is not a traverse sequence is the same as the probability that a random walk of length t does not visit all nodes.
By Theorem 2.21, the expected time needed to visit all nodes is at most $2nm = dn^2$. Hence (by Markov’s Inequality) the probability that after $2dn^2$ steps we have not seen all nodes is less than 1/2. Since we may consider the next $2dn^2$ steps as another random walk etc., the probability that we have not seen all nodes after t steps is less than $2^{-t/(2dn^2)} \le n^{-4dn}$.
Now the total number of d-regular graphs G on n nodes, with the ends of the edges labeled, is less than $n^{dn}$ (less than $n^d$ choices at each node), and so the probability that H is not a traverse sequence for one of these graphs, with some starting point, is less than $n \cdot n^{dn} \cdot n^{-4dn} < 1$. So at least one sequence of length t is a universal traverse sequence. □
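The proof suggests a simple experiment: fix a labeled d-regular graph and test a random sequence. A sketch with a hypothetical encoding of my own (labels[v][h] is the node reached from v through its edge-end labeled h; here d = 2, i.e. a labeled cycle):

```python
import random

def is_traverse_sequence(labels, start, seq):
    """Check whether seq visits every node of the labeled graph from start."""
    v, visited = start, {start}
    for h in seq:
        v = labels[v][h]
        visited.add(v)
    return len(visited) == len(labels)

# A labeled 4-cycle: at node v, label 1 leads one way, label 2 the other.
cycle = {v: {1: (v + 1) % 4, 2: (v - 1) % 4} for v in range(4)}
seq = [random.randint(1, 2) for _ in range(200)]  # random sequences work whp
print(is_traverse_sequence(cycle, 0, seq))
```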

Exercise 2.24 Show by an example that H(u, v) can be different from H(v, u)
even for two nodes of a regular graph.
Exercise 2.25 Consider random walk on a connected regular graph G.
(a) The average of H(s, t) over all s ∈ N (t) is exactly n − 1.
(b) The average of H(s, t) over all s ∈ V (G) \ {t} is at least n − 1.
(c) The average of H(t, s) over all s ∈ V (G), with weights πs , is at least n − 1.
Exercise 2.26 Prove that the mean hitting time between two antipodal vertices of the k-cube $Q_k$ is asymptotically $2^k$.
Exercise 2.27 What is the cover time of the path when starting from an internal
node?
Exercise 2.28 The mean commute time between any pair of vertices of a d-
regular graph is at least n and at most 2nd/(d − λ2 ).
Exercise 2.29 Let G′ denote the graph obtained from G by identifying s and t, and let T(G) denote the number of spanning trees of G. Prove that $R_{st} = T(G')/T(G)$.
Exercise 2.30 (Rayleigh’s Principle) Adding any edge to a graph G does not increase the resistance $R_{st}$.
Exercise 2.31 Let G′ be obtained from the graph G by adding a new edge (a, b), and let s, t ∈ V (G).
(a) Prove that the mean commute time between s and t in G′ is not larger than the mean commute time in G.
(b) Show by an example that the analogous assertion is not valid for the mean hitting time.
(c) If a = t, then the mean hitting time from s to t is not larger in G′ than in G.
Exercise 2.32 Find a formula for the commute time in terms of the spectrum.
Exercise 2.33 Prove that the commute time between two nodes of a regular
graph is at least n.
Exercise 2.34 Let $\ell(u,v)$ denote the probability that a random walk starting at u visits every vertex before hitting v.
(a) Prove that if G is a circuit of length n then $\ell(u,v) = 1/(n-1)$ for all u ≠ v.
(b) If u and v are two non-adjacent vertices of a connected graph G such that {u, v} is not a cutset, then there is a neighbor w of u such that $\ell(w,v) < \ell(u,v)$.
(c) Assume that for every pair $u \neq v \in V(G)$, $\ell(u,v) = 1/(n-1)$. Show that G is either a circuit or a complete graph.

Exercise 2.35 Let N (u, v) denote the expected number of nodes a random walk
from u visits before v (including u; not the number of steps!). Prove that for every
u ∈ V (G) there exists a v ∈ V (G) \ {u} such that N (u, v) ≥ n/2.
Exercise 2.36 Let b be the expected number of steps before our random walk
visits more than half of the vertices. Prove that b ≤ 2Hmax .
Exercise 2.37 For $A \subseteq V$, define $H^A_{\min} = \min_{u,v \in A} H(u,v)$. Prove that $C_{\max} \ge \mathrm{har}(|A|)\, H^A_{\min}$.

3 Mixing
Perhaps the most important use of random walks in practice is sampling: choosing a random
element from a prescribed distribution over a large set.
A general method for sampling from a probability distribution π over a large and complicated set V is the following. We define a graph G with node set V such that the stationary distribution of the random walk on G is π. Let us assume the graph is not bipartite (for example, consider the lazy walk on it). By Theorem 1.8, $\sigma^t \to \pi$ as $t \to \infty$ for every starting distribution σ. This means
that by simulating a random walk on G for sufficiently many steps, we get a point whose
distribution is close to π. The question is, what does “close” mean, and how long do we have
to walk?
The total variation distance between two probability distributions on the same finite set V is defined by
$$d_{var}(\sigma, \tau) = \max_{A \subseteq V} \big(\sigma(A) - \tau(A)\big).$$
Since $\sigma(A) - \tau(A) = \tau(V \setminus A) - \sigma(V \setminus A)$, we could define the same value as the maximum of $\tau(A) - \sigma(A)$, or as the maximum of $|\sigma(A) - \tau(A)|$. It can also be expressed as
$$d_{var}(\sigma, \tau) = \frac12 \sum_{i \in V} |\sigma_i - \tau_i|.$$
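In code, the second formula is a one-liner (my own sketch; the distributions are dictionaries over the same state set):

```python
def d_var(sigma, tau):
    """Total variation distance between two distributions on the same set."""
    return 0.5 * sum(abs(sigma[i] - tau[i]) for i in sigma)

print(d_var({0: 1.0, 1: 0.0}, {0: 0.5, 1: 0.5}))  # 0.5
```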

The ε-mixing time is defined as the smallest t such that

dvar (σ t , π) ≤ ε

for every starting distribution σ. It is easy to see that the worst starting distribution is
concentrated on a single node.
This definition makes sense only for non-bipartite graphs. To avoid this exception, we often consider the lazy walk: at each step, we flip a coin, and we stay put if it is Heads and move according to the random walk if it is Tails. This can be described as the random walk on a graph where we add d(i) loops to each node i. (Recall that adding a loop increases the degree by 1 only.) Making the random walk lazy does not change the stationary distribution. It doubles all the previous “times”, since we are idle in half of the steps in expectation.

The transition matrix $P_{\mathrm{lazy}}$ of the lazy random walk can be expressed by the transition matrix P of the original walk very simply:
$$P_{\mathrm{lazy}} = \frac12 (P + I).$$
Since all eigenvalues of P are ≥ −1, all eigenvalues of $P_{\mathrm{lazy}}$ are nonnegative. In other words, the symmetrized version is positive semidefinite.

Example 3.1 On a complete graph $K_n$, starting from a given node, we have
$$d_{var}(\sigma^1, \pi) = \frac{1}{n}.$$
More generally,
$$d_{var}(\sigma^t, \pi) = \frac{1}{n(n-1)^{t-1}}$$
(exercise).

The ε-mixing time is in general difficult to compute, even to estimate. We are going to
discuss four different methods.

3.1 Eigenvalues
Theorem 3.2 Let G be a connected graph and let $\lambda_1 = 1 > \lambda_2 \ge \dots \ge \lambda_n$ be the eigenvalues of its transition matrix. Set $\lambda = \max\{\lambda_2, |\lambda_n|\}$ and $\pi_{\min} = \min_i \pi_i$. Then
$$|\sigma^t(S) - \pi(S)| \le \sqrt{\frac{\pi(S)}{\pi_{\min}}}\, \lambda^t.$$
If we start at node i, then
$$|p^t_{ij} - \pi_j| \le \sqrt{\frac{\pi_j}{\pi_i}}\, \lambda^t.$$

Proof. Consider the spectral decomposition of P:
$$P = \sum_{k=1}^n \lambda_k u_k v_k^T, \tag{18}$$
where $u_k$ and $v_k$ are the right and left eigenvectors of P, with $u_1 = \mathbb{1}$ and $v_1 = \pi$. So
$$\sigma^t(S) = \sigma^T P^t \mathbb{1}_S = \sum_{k=1}^n \lambda_k^t (\sigma^T u_k)(v_k^T \mathbb{1}_S) = \pi(S) + \sum_{k=2}^n \lambda_k^t (\sigma^T u_k)(v_k^T \mathbb{1}_S),$$
and hence
$$|\sigma^t(S) - \pi(S)| \le \lambda^t \sum_{k=2}^n |\sigma^T u_k|\, |v_k^T \mathbb{1}_S| \le \lambda^t \sum_{k=1}^n |\sigma^T u_k|\, |v_k^T \mathbb{1}_S|.$$

We can estimate this using Cauchy–Schwarz:
$$\sum_{k=1}^n |\sigma^T u_k|\, |v_k^T \mathbb{1}_S| \le \Big(\sum_{k=1}^n |\sigma^T u_k|^2\Big)^{1/2} \Big(\sum_{k=1}^n |v_k^T \mathbb{1}_S|^2\Big)^{1/2}.$$

Using that $(w_k)_{k=1}^n$ is an orthonormal basis, and writing $\Pi = \mathrm{diag}(\pi)$ (so that $u_k = \Pi^{-1/2} w_k$ and $v_k = \Pi^{1/2} w_k$),
$$\sum_{k=1}^n |\sigma^T u_k|^2 = \sum_{k=1}^n |\sigma^T \Pi^{-1/2} w_k|^2 = |\Pi^{-1/2} \sigma|^2 = \sum_{i=1}^n \frac{\sigma_i^2}{\pi_i} \le \frac{1}{\pi_{\min}}, \tag{19}$$
and similarly
$$\sum_{k=1}^n |v_k^T \mathbb{1}_S|^2 = |\Pi^{1/2} \mathbb{1}_S|^2 = \sum_{i \in S} \pi_i = \pi(S).$$

Thus
$$|\sigma^t(S) - \pi(S)| \le \lambda^t \frac{1}{\sqrt{\pi_{\min}}} \sqrt{\pi(S)}$$
as claimed. If σ is concentrated on node i, then the sum in (19) equals $1/\pi_i$; taking S = {j} gives the second bound. □


Example 3.3 Consider the k-cube $Q_k$. For $a \in \{0,1\}^k$, consider the vector $v^a \in \mathbb{R}^{V(Q_k)}$ defined by
$$v^a_x = (-1)^{a^T x}.$$
It is easy to check that this is an eigenvector of P with eigenvalue $1 - \frac{2|a|_1}{k}$, where $|a|_1 = \sum_{i=1}^k a_i$. Since there are $2^k$ such vectors $v^a$, these are all the eigenvectors. So the eigenvalues of P are
$$1,\ 1 - \frac{2}{k},\ 1 - \frac{4}{k},\ \dots,\ 1 - \frac{2k}{k} = -1.$$
So the eigenvalues of the lazy walk $P_{\mathrm{lazy}}$ are
$$1,\ 1 - \frac{1}{k},\ 1 - \frac{2}{k},\ \dots,\ 1 - \frac{k}{k} = 0.$$
Hence by Theorem 3.2, for the lazy walk we have
$$d_{var}(\sigma^t, \pi) \le \sqrt{2^k} \Big(1 - \frac{1}{k}\Big)^t < 2^{k/2} e^{-t/k}.$$
So for $t = \frac12 k^2 + Ck$, we have $d_{var}(\sigma^t, \pi) < (2/e)^{k/2} e^{-C}$.
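The bound is easy to compare with the true distance for small k; the sketch below (my own, with arbitrary k and step counts, assuming numpy) builds the lazy transition matrix of $Q_k$ explicitly.

```python
import numpy as np
from itertools import product

k = 6
verts = list(product([0, 1], repeat=k))
index = {v: i for i, v in enumerate(verts)}
n = len(verts)
P = np.zeros((n, n))
for v in verts:
    i = index[v]
    P[i, i] = 0.5                             # lazy: stay with probability 1/2
    for c in range(k):
        u = list(v); u[c] ^= 1
        P[i, index[tuple(u)]] = 0.5 / k       # otherwise flip a random coordinate
sigma = np.zeros(n); sigma[0] = 1.0           # start at a corner
for t in range(1, 61):
    sigma = P.T @ sigma
    if t % 20 == 0:                           # true d_var vs the eigenvalue bound
        print(t, 0.5 * np.abs(sigma - 1 / n).sum(), 2 ** (k / 2) * (1 - 1 / k) ** t)
```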

3.2 Coupling
A coupling in a Markov chain is a sequence of random pairs of nodes ((v 0 , u0 ), (v 1 , u1 ), . . . )
such that each of the sequences (v 0 , v 1 , . . . ) and (u0 , u1 , . . . ) is a walk of the chain.

Lemma 3.4 (Coupling Lemma) Let $(v^0, v^1, \dots)$ be a random walk starting from distribution σ, and let $(u^0, u^1, \dots)$ be another random walk starting from the stationary distribution π. Then
$$d_{var}(\sigma^t, \pi) \le P(v^t \neq u^t).$$

If we can couple the two random walks so that $P(v^t \neq u^t) < \varepsilon$ for some t, then we get a bound on the ε-mixing time.

Proof. Clearly $u^t$ is stationary for every t ≥ 0. Hence for every S ⊆ V,
$$\begin{aligned}
P(v^t \in S) - \pi(S) &= P(v^t \in S) - P(u^t \in S)\\
&= P(v^t \in S \mid u^t = v^t)\, P(u^t = v^t) + P(v^t \in S \mid u^t \neq v^t)\, P(u^t \neq v^t)\\
&\quad - P(u^t \in S \mid u^t = v^t)\, P(u^t = v^t) - P(u^t \in S \mid u^t \neq v^t)\, P(u^t \neq v^t)\\
&= \big(P(v^t \in S \mid u^t \neq v^t) - P(u^t \in S \mid u^t \neq v^t)\big)\, P(u^t \neq v^t)\\
&\le P(u^t \neq v^t). \qquad \square
\end{aligned}$$

Example 3.5 Consider a random walk on the k-cube $Q_k$. Since this is bipartite, we look at the lazy version. The vertices of the cube are sequences $(x_1, x_2, \dots, x_k)$, where $x_i \in \{0,1\}$. Let us assume that we start at vertex (0, . . . , 0). Then the random walk can be described as follows: being at $(x_1, x_2, \dots, x_k)$, we choose a coordinate 1 ≤ i ≤ k, and flip it with probability 1/2. So a walk of length t is described by a sequence $(i_1, \dots, i_t)$ ($i_s \in \{1, \dots, k\}$) and a sequence $(\varepsilon_1, \dots, \varepsilon_t)$ ($\varepsilon_s \in \{0,1\}$). Step s consists of adding $\varepsilon_s$ to $x_{i_s}$ modulo 2.
If we condition on the sequence $(i_1, \dots, i_t)$, then each $x_i$ for which $i \in \{i_1, \dots, i_t\}$ will be uniformly distributed over {0, 1}, and these distributions are independent. So conditioning on $\{i_1, \dots, i_t\} = \{1, \dots, k\}$, the distribution will be uniform on all vertices of the cube. The probability that $\{i_1, \dots, i_t\} \neq \{1, \dots, k\}$ can be estimated from the Coupon Collector Problem: the expectation of the first t for which $\{i_1, \dots, i_t\} = \{1, \dots, k\}$ is $k\,\mathrm{har}(k)$, and so the probability that in $2k\,\mathrm{har}(k)$ steps we have not seen all directions is at most 1/2 by Markov’s Inequality. It follows that after $2Nk\,\mathrm{har}(k)$ steps, the probability that $\{i_1, \dots, i_t\} \neq \{1, \dots, k\}$ is at most $2^{-N}$. Hence the ε-mixing time is at most $2k\,\mathrm{har}(k)\log_2(1/\varepsilon)$.
Compare this time bound with Exercise 2.26: the mean hitting time between antipodal vertices of $Q_k$ is approximately $2^k$.
Consider the lazy walk $v^0, v^1, \dots$ on the k-cube $Q_k$ started from an arbitrary distribution $\sigma^0$. We couple it with the lazy random walk $u^0, u^1, \dots$ started from the stationary (uniform) distribution π. Recall that the random walk $v^0, v^1, \dots$ is determined by two random sequences: a sequence $(i_1, i_2, \dots)$ ($i_s \in \{1, \dots, k\}$) of directions, and a sequence $(\varepsilon_1, \varepsilon_2, \dots)$ ($\varepsilon_s \in \{0,1\}$) of go-or-wait decisions. Let us use the same sequence of directions for $u^0, u^1, \dots$, but choose $(\varepsilon'_1, \varepsilon'_2, \dots)$ as follows. At a given time t, let $v^t = (x_1, \dots, x_k)$ and $u^t = (y_1, \dots, y_k)$. If $x_{i_t} = y_{i_t}$, then let $\varepsilon'_t = \varepsilon_t$; else, let $\varepsilon'_t = 1 - \varepsilon_t$.
After this step, the $i_t$-th coordinate of $u^t$ is equal to the $i_t$-th coordinate of $v^t$, and they remain equal forever. So once all coordinates have been chosen, $u^t = v^t$. By the same computation as before, we see that after $t = 2Ck \ln k$ steps, we have
$$d_{var}(\sigma^t, \pi) \le P(u^t \neq v^t) \le \frac{1}{C}.$$
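The coupling itself is short to simulate (a sketch with an arbitrary k of my choosing; coordinates are 0/1 lists): the two walks share the direction, and the second walk's go-or-wait bit is flipped exactly when the chosen coordinates disagree.

```python
import random

k = 10
x = [0] * k                                    # walk from a fixed vertex
y = [random.randint(0, 1) for _ in range(k)]   # walk from the uniform distribution
t = 0
while x != y:
    t += 1
    i = random.randrange(k)                    # shared direction
    eps = random.randint(0, 1)                 # go-or-wait bit of the first walk
    if x[i] == y[i]:
        x[i] ^= eps
        y[i] ^= eps                            # agree: move together
    else:
        x[i] ^= eps
        y[i] ^= 1 - eps                        # disagree: after this step x[i] == y[i]
print("coupled after", t, "steps")             # about k*har(k) steps on average
```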
Example 3.6 Consider the lazy walk $v^0, v^1, \dots$ on the path of length n started from an arbitrary distribution $\sigma^0$. We couple it with the lazy random walk $u^0, u^1, \dots$ started from the stationary distribution π. A step of the random walk is generated by a coin flip interpreted as “Left” or “Right”, and another coin flip interpreted as “Go” or “Stay”. If we are at an endpoint of the path, then both “Left” and “Right” mean the same move.
It will be convenient to assume that $u^t$ and $v^t$ start at an even distance from each other. This can be achieved by interpreting the first “Go” and “Stay” outcome differently if necessary. From here on, the two walks will “Go” and “Stay” at the same time. The coupling of the directions is also very simple: the two moves are independent until the two walks collide, and from then on, they stay together.
Suppose that $u^0$ is to the right of $v^0$. Then by the time $v^t$ reaches the right end of the path, the two walks must have collided. This takes at most $n^2$ expected time. By Markov’s Inequality, the probability that $v^t \neq u^t$ for $t = 2n^2$ is less than 1/2, and so after $t = 2n^2 \log_2(1/\varepsilon)$ steps,
$$d_{var}(\sigma^t, \pi) \le P(u^t \neq v^t) \le \varepsilon.$$
Note that if we start from an endpoint of the path, and $d_{var}(\sigma^t, \pi) < 1/4$ for some t, then the walk must have reached the right half of the path with substantial probability. If T is the first time it reaches the midpoint (say, n is even), then $E(T) = n^2/4$. With some work, one can deduce from this that $d_{var}(\sigma^t, \pi) < 1/4$ implies that $t > n^2/8$.

3.2.1 Random coloring of a graph

We want to choose a uniformly random k-coloring of a graph G (such that adjacent nodes get different colors). The method we describe will work when the maximum degree D of G satisfies k > 3D.
We define a graph H whose nodes are the k-colorings of G, with two of them connected by an edge if they differ at a single node. The degrees of H are bounded by kn; we make it kn-regular by adding appropriately many loops at the nodes.

While the graph itself may be exponentially large, it is easy to generate a random walk on H. If we have a k-coloring α, we select a uniformly random node v of G, and select a uniformly random color i; the pair (v, i) is the seed of the step. The new coloring α′ is defined by
$$\alpha'(u) = \begin{cases} i, & \text{if } u = v,\\ \alpha(u), & \text{otherwise}, \end{cases}$$
if this is a legal coloring; else, let α′ = α. This random walk on colorings is called “Glauber dynamics”.
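One step of this dynamics is easy to implement; the following sketch (hypothetical data structures of my own: a dict of neighbor lists and a list of colors) follows the seed construction above.

```python
import random

def glauber_step(neighbors, coloring, k):
    """One Glauber step: pick a random seed (v, i) and recolor v if legal."""
    v = random.randrange(len(coloring))
    i = random.randrange(k)
    if all(coloring[u] != i for u in neighbors[v]):
        coloring = coloring[:]            # legal: recolor v with i
        coloring[v] = i
    return coloring                       # otherwise stay put

# 4-cycle with k = 7 > 3D = 6 colors: run a few steps from a fixed coloring.
nbrs = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [0, 2]}
col = [0, 1, 0, 1]
for _ in range(5):
    col = glauber_step(nbrs, col, 7)
print(col)
```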

Theorem 3.7 If k > 3D, then starting from any k-coloring $\alpha^0$, we have
$$d_{var}(\alpha^t, \pi) < n\, e^{-t/(kn)}.$$
So if $t = kn \ln(n/\varepsilon)$, then $d_{var}(\alpha^t, \pi) < \varepsilon$.

Proof. We use the Coupling Lemma 3.4. Starting from two colorings $\alpha^0$ and $\beta^0$ (where $\beta^0$ is uniformly random), we construct two random walks $\alpha^0, \alpha^1, \dots$ and $\beta^0, \beta^1, \dots$ by using the same random seed in both. We want to show that
$$P(\alpha^t \neq \beta^t) \le n\, e^{-t/(kn)}. \tag{20}$$

Let $U^t = \{v \in V : \alpha^t(v) \neq \beta^t(v)\}$, $W^t = V \setminus U^t$ and $X^t = |U^t|$. For $v \in V$, let a(v) denote the number of edges joining v to the other class of the partition $\{U^t, W^t\}$. We claim that
$$E(X^{t+1} \mid X^t) \le \Big(1 - \frac{1}{kn}\Big) X^t. \tag{21}$$
Indeed, let (v, i) be the seed at step t, and let us fix v. If $v \in U^t$, then $X^t - 1 \le X^{t+1} \le X^t$, and we have $X^{t+1} = X^t - 1$ if color i does not occur among the neighbors of v in either coloring $\alpha^t$ or $\beta^t$. Since at least a(v) colors are common, the probability of this is
$$P(X^{t+1} = X^t - 1 \mid v) \ge \frac{k - 2d(v) + a(v)}{k}. \tag{22}$$
If $v \in W^t$, then $X^t \le X^{t+1} \le X^t + 1$, and we lose only if color i occurs among the neighbors of v in exactly one of the colorings $\alpha^t$ and $\beta^t$. There are at most 2a(v) such colors, and so
$$P(X^{t+1} = X^t + 1 \mid v) \le \frac{2a(v)}{k}. \tag{23}$$
Thus
$$E(X^{t+1} \mid X^t) \le X^t - \sum_{i \in U^t} \frac{k - 2d(i) + a(i)}{kn} + \sum_{i \in W^t} \frac{2a(i)}{kn}.$$

Using that
$$\sum_{i \in U^t} a(i) = \sum_{i \in W^t} a(i),$$
we get
$$E(X^{t+1} \mid X^t) \le X^t - \sum_{i \in U^t} \frac{k - 2d(i) - a(i)}{kn} \le X^t - \sum_{i \in U^t} \frac{k - 3d(i)}{kn}.$$
The last sum is at least $X^t/(kn)$, and so (21) follows.


From (21) we get by induction
$$E(X^t) \le \Big(1 - \frac{1}{kn}\Big)^t n,$$
and hence by Markov’s Inequality,
$$P(\alpha^t \neq \beta^t) = P(X^t \ge 1) \le \Big(1 - \frac{1}{kn}\Big)^t n < e^{-t/(kn)}\, n. \qquad \square$$


3.3 Conductance
The conductance of a Markov chain is defined by
$$\Phi = \min_{A} \frac{\sum_{i \in A} \sum_{j \in V \setminus A} \pi_i p_{ij}}{\pi(A)\,\pi(V \setminus A)},$$
where the minimum is extended over all non-empty proper subsets A of V. The numerator is the frequency with which a very long random walk steps from A to V \ A. The denominator is the frequency with which a very long sequence of independent random nodes from the stationary distribution steps from A to V \ A. So the ratio measures how strongly non-independent the consecutive nodes of a random walk are.
If (say) π(A) ≤ 1/2, then π(V \ A) ≥ 1/2 and
$$\sum_{i \in A} \sum_{j \in V \setminus A} \pi_i p_{ij} \le \sum_{i \in A} \sum_{j \in V} \pi_i p_{ij} = \sum_{i \in A} \pi_i = \pi(A),$$
and hence Φ ≤ 2.
In the case of a random walk on a graph, we have $\pi_i p_{ij} = 1/(2m)$ for every edge ij. Let $e(A, V \setminus A)$ denote the number of edges connecting A to V \ A; then
$$\Phi = \min_A \frac{e(A, V \setminus A)}{2m\,\pi(A)\,\pi(V \setminus A)}.$$
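For small graphs the conductance can be computed by brute force directly from this formula (a sketch with a hypothetical 4-node graph of my own; the enumeration is exponential in n, so this is purely illustrative):

```python
from itertools import combinations

edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
n, m = 4, len(edges)
deg = [sum(v in e for e in edges) for v in range(n)]

def conductance():
    """Minimize e(A, V\\A) / (2m pi(A) pi(V\\A)) over all proper subsets A."""
    best = float("inf")
    for r in range(1, n):
        for A in combinations(range(n), r):
            A = set(A)
            cut = sum((u in A) != (v in A) for u, v in edges)
            piA = sum(deg[v] for v in A) / (2 * m)
            best = min(best, cut / (2 * m * piA * (1 - piA)))
    return best

print(conductance())
```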
We can use the conductance to bound the eigenvalue gap and through this, the ε-mixing
time.

Theorem 3.8 $\dfrac{\Phi^2}{16} \le 1 - \lambda_2 \le \Phi$.

We need a couple of lemmas. The following formula for the eigenvalue gap in the case of a random walk on a graph follows from more general results in linear algebra, but we describe a direct proof.

Lemma 3.9
$$1 - \lambda_2 = \min\Big\{\frac{1}{2m} \sum_{ij \in E(G)} (x_i - x_j)^2 : \sum_i \pi_i x_i = 0,\ \sum_i \pi_i x_i^2 = 1\Big\}.$$

Proof. Let y be the unit eigenvector of the symmetrized transition matrix $\hat P$ belonging to the eigenvalue $\lambda_2$, and define $x_i = y_i/\sqrt{\pi_i}$. The vector y is orthogonal to the eigenvector $(\sqrt{\pi_i} : i \in V)$ belonging to the eigenvalue 1, and hence
$$\sum_i \pi_i x_i = \sum_i \sqrt{\pi_i}\, y_i = 0,$$
and
$$\sum_i \pi_i x_i^2 = \sum_i y_i^2 = 1.$$
Furthermore,
$$\frac{1}{2m} \sum_{ij \in E(G)} (x_i - x_j)^2 = \frac{1}{2m} \sum_{i \in V} d(i)\, x_i^2 - \frac{1}{m} \sum_{ij \in E(G)} x_i x_j = \sum_{i \in V} \pi_i x_i^2 - y^T \hat P y = 1 - \lambda_2.$$
It follows by a similar computation that this choice of x minimizes the right hand side. □

The second lemma can be considered as a linearized version of the theorem.

Lemma 3.10 Let G be a graph with conductance Φ. Let $y \in \mathbb{R}^V$ and assume that $\pi(\{i : y_i > 0\}) \le 1/2$, $\pi(\{i : y_i < 0\}) \le 1/2$ and $\sum_i \pi_i |y_i| = 1$. Then
$$\frac{1}{m} \sum_{ij \in E} |y_i - y_j| \ge \Phi.$$

Proof. Label the points by 1, . . . , n so that
$$y_1 \le \dots \le y_t < 0 = y_{t+1} = \dots = y_s < y_{s+1} \le \dots \le y_n.$$
Set $S_i = \{1, \dots, i\}$, and let $\nabla(S_i)$ denote the set of edges connecting $S_i$ to $V \setminus S_i$. Substituting $y_j - y_i = (y_{i+1} - y_i) + \dots + (y_j - y_{j-1})$, we get
$$\sum_{ij \in E} |y_i - y_j| = \sum_{i=1}^{n-1} |\nabla(S_i)|\, (y_{i+1} - y_i) \ge 2m\Phi \sum_{i=1}^{n-1} (y_{i+1} - y_i)\, \pi(S_i)\,\pi(V \setminus S_i).$$
Using that $\pi(S_i) \le 1/2$ for $i \le t$, $\pi(S_i) \ge 1/2$ for $i \ge s+1$, and that $y_{i+1} - y_i = 0$ for $t < i < s$, we obtain
$$\sum_{ij \in E} |y_i - y_j| \ge m\Phi \sum_{i=1}^{t} (y_{i+1} - y_i)\, \pi(S_i) + m\Phi \sum_{i=t+1}^{n-1} (y_{i+1} - y_i)\, \pi(V \setminus S_i) = m\Phi \sum_i \pi_i |y_i| = m\Phi. \qquad \square$$


Proof of Theorem 3.8. We prove the upper bound first. By Lemma 3.9, it suffices to exhibit a vector $x \in \mathbb{R}^V$ such that
$$\sum_i \pi_i x_i = 0, \qquad \sum_i \pi_i x_i^2 = 1 \tag{24}$$
and
$$\frac{1}{2m} \sum_{ij \in E(G)} (x_i - x_j)^2 = \Phi. \tag{25}$$

Let S be the minimizer in the definition of the conductance, and consider a vector of the type
$$x_i = \begin{cases} a, & \text{if } i \in S,\\ b, & \text{if } i \in V \setminus S. \end{cases}$$
Then the conditions are
$$\pi(S)\, a + \pi(V \setminus S)\, b = 0, \qquad \pi(S)\, a^2 + \pi(V \setminus S)\, b^2 = 1.$$
Solving these equations for a and b, we get
$$a = \sqrt{\frac{\pi(V \setminus S)}{\pi(S)}}, \qquad b = -\sqrt{\frac{\pi(S)}{\pi(V \setminus S)}},$$
and then straightforward substitution shows that (25) is satisfied as well.
To prove the lower bound, we again invoke Lemma 3.9: we prove that for every vector $x \in \mathbb{R}^V$ satisfying (24), we have
$$\sum_{ij \in E(G)} (x_i - x_j)^2 \ge \frac{\Phi^2 m}{8}. \tag{26}$$

Let x be any vector satisfying (24). We may assume that $x_1 \ge x_2 \ge \dots \ge x_n$. Let k $(1 \le k \le n)$ be the index for which $\pi(\{1, \dots, k-1\}) \le 1/2$ and $\pi(\{k+1, \dots, n\}) < 1/2$. Setting $z_i = \max\{0, x_i - x_k\}$ and choosing the sign of x appropriately, we may assume that
$$\sum_i \pi_i z_i^2 \ge \frac12 \sum_i \pi_i (x_i - x_k)^2 = \frac12 \sum_i \pi_i x_i^2 - x_k \sum_i \pi_i x_i + \frac{x_k^2}{2} = \frac12 + \frac{x_k^2}{2} \ge \frac12.$$

Now Lemma 3.10 can be applied to the numbers $y_i = z_i^2 / \sum_i \pi_i z_i^2$, and we obtain that
$$\sum_{ij \in E} |z_i^2 - z_j^2| \ge m\Phi \sum_i \pi_i z_i^2.$$

On the other hand, using the Cauchy–Schwarz inequality,
$$\sum_{ij \in E} |z_i^2 - z_j^2| \le \Big(\sum_{ij \in E} (z_i - z_j)^2\Big)^{1/2} \Big(\sum_{ij \in E} (z_i + z_j)^2\Big)^{1/2}.$$

It is easy to see that
$$\sum_{ij \in E} (z_i - z_j)^2 \le \sum_{ij \in E} (x_i - x_j)^2,$$

while the second factor can be estimated as follows:
$$\sum_{ij \in E} (z_i + z_j)^2 \le 2 \sum_{ij \in E} (z_i^2 + z_j^2) = 2 \sum_{i \in V} d(i)\, z_i^2 = 4m \sum_i \pi_i z_i^2.$$

Combining these inequalities, we obtain
$$\sum_{ij \in E} (x_i - x_j)^2 \ge \sum_{ij \in E} (z_i - z_j)^2 \ge \Big(\sum_{ij \in E} |z_i^2 - z_j^2|\Big)^2 \Big/ \sum_{ij \in E} (z_i + z_j)^2 \ge \frac{\Phi^2 m^2 \big(\sum_i \pi_i z_i^2\big)^2}{4m \sum_i \pi_i z_i^2} = \frac{\Phi^2 m}{4} \sum_i \pi_i z_i^2 \ge \frac{\Phi^2 m}{8}.$$
Dividing by 2m, the theorem follows. □

Corollary 3.11 For the lazy walk (so that all eigenvalues are nonnegative), any starting distribution σ and any t ≥ 0,
$$d_{var}(\sigma^t, \pi) \le \frac{1}{\sqrt{\pi_{\min}}} \Big(1 - \frac{\Phi^2}{16}\Big)^t.$$

To estimate the conductance, the following lemma is often very useful.

Lemma 3.12 Let F be a multiset of paths in G, and suppose that for some N ≥ 1 and every pair i ≠ j of nodes there are $\pi_i \pi_j N$ paths in the family connecting them. Suppose that each edge of G belongs to at most K of these paths. Then Φ ≥ N/(2mK).

Proof. Let A ⊂ V, A ≠ ∅. Let $F_A$ denote the subfamily of paths in F with one endpoint in A and one endpoint in V \ A. Then
$$|F_A| = \sum_{i \in A} \sum_{j \in V \setminus A} \pi_i \pi_j N = \pi(A)\,\pi(V \setminus A)\, N.$$
On the other hand, every path in $F_A$ contains at least one edge connecting A to V \ A. So if $e(A, V \setminus A)$ denotes the number of such edges, then
$$|F_A| \le K\, e(A, V \setminus A).$$
Hence, for every such A,
$$\frac{e(A, V \setminus A)}{2m\,\pi(A)\,\pi(V \setminus A)} \ge \frac{|F_A|}{2mK\,\pi(A)\,\pi(V \setminus A)} = \frac{N}{2mK}. \qquad \square$$

Corollary 3.13 Let G be a graph with a node- and edge-transitive automorphism group and diameter d ≥ 1. Then Φ > 1/d.

Proof. Let us select a shortest path $P_{ij}$ between every pair of nodes {i, j}. Every node has $\pi_i = 1/n$. Let F consist of these paths and their images under all automorphisms. So $|F| = g\binom{n}{2}$, where g is the number of automorphisms, and every pair of nodes is connected by g paths. The total length of these paths is at most $dg\binom{n}{2}$. By symmetry, every edge is covered by the same number of paths, which is at most $dg\binom{n}{2}/m$. So we can apply Lemma 3.12 with $N = gn^2$ and $K = dg\binom{n}{2}/m$, to get that
$$\Phi \ge \frac{N}{2mK} = \frac{n^2}{2d\binom{n}{2}} > \frac{1}{d}. \qquad \square$$

Example 3.14 For the k-cube $Q_k$, the last corollary gives that Φ > 1/k, and so by Theorem 3.8 the eigenvalue gap is at least $1/(16k^2)$. We know from Example 3.3 that the eigenvalue gap is in fact 2/k.

Exercise 3.15 Prove that dvar (σ, τ ) satisfies the triangle inequality.
Exercise 3.16 Prove that dvar (σ, τ ) is convex in both arguments.
Exercise 3.17 Prove that the ε-mixing time for the random walk on an n × n grid is at least $cn^2$ and at most $Cn^2$, where c and C are constants depending on ε only.
Exercise 3.18 A coupling of two random walks $u^0, u^1, \dots$ and $v^0, v^1, \dots$ is called Markovian if the sequence of pairs $(u^i, v^i)$ forms a Markov chain, i.e., given $u^i$ and $v^i$, the distribution of the pair $(u^{i+1}, v^{i+1})$ does not depend on the “prehistory”. (All couplings we constructed in the lecture had this property.)
Let T be the (random) first time when $u^T = v^T$. Let $s = E(T)$. Prove that $Var(T) \le 8s^2$.
Exercise 3.19 Let us start a lazy random walk on a graph from node v, and let $\sigma^t$ denote the distribution after t steps. Prove that $\sigma^t_v$ is monotone decreasing as a function of t.

4 Stopping rules
A stopping rule Γ is a map $V^* \to [0,1]$, where $\Gamma(v^0, \dots, v^t)$ is interpreted as the probability of continuing the walk, given that $(v^0, \dots, v^t)$ is the walk observed so far. Each such stop-or-go decision is made independently. (It suffices to define Γ whenever $(v^0, \dots, v^t)$ has positive probability, and $\Gamma(v^0, \dots, v^r) > 0$ for 0 ≤ r < t.)
We usually assume that the walk is stopped by Γ after a finite number of steps with probability 1. In this case, Γ can be regarded as a random variable with values in {0, 1, . . . }, so that we stop at $v^\Gamma$. The condition on Γ is that it is independent of the later part of the walk, i.e., of $v^{\Gamma+1}, v^{\Gamma+2}, \dots$ In this disguise, stopping rules are usually called stopping times. The condition $E\Gamma < \infty$ is sufficient (but not necessary) for the walk to be stopped eventually with probability 1.
We denote by $\sigma^\Gamma_i$ the probability that starting from distribution σ and following Γ, the chain is stopped at node i. If the walk stops with probability 1, then $\sigma^\Gamma$ is a probability distribution. A stopping rule Γ for which $\sigma^\Gamma = \tau$ is also called a stopping rule from σ to τ.

Example 4.1 The simplest stopping rule is “Stop after k steps”. Here k can be specified in advance, but it can also be generated randomly from some probability distribution over the nonnegative integers. The rule where k is chosen uniformly from {0, . . . , t} will be important. The rule where k is chosen from the binomial distribution with parameters t and 1/2 is equivalent to stopping the lazy walk after t steps.
The hitting time from u to v is the mean length of the rule “Stop when reaching v”. The cover time is the mean length of the rule “Stop when all nodes have been visited”.

For any starting distribution σ and target distribution τ, there is at least one stopping rule Γ such that $\sigma^\Gamma = \tau$; namely, we select a target state j from the distribution τ and walk until we reach j. We call this the naive stopping rule from σ to τ, and denote it by $\Omega_{\sigma,\tau}$. The mean length of this rule is given by
$$E\Omega_{\sigma,\tau} = \sum_{i,j} \sigma_i \tau_j H(i,j) = \sigma^T H \tau.$$
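The naive rule is straightforward to simulate; the sketch below (hypothetical graph and distributions of my own) estimates its mean length, which should approach $\sigma^T H \tau$.

```python
import random

adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}

def walk_until(v, target):
    """Number of steps of a random walk from v until it reaches target."""
    steps = 0
    while v != target:
        v = random.choice(adj[v])
        steps += 1
    return steps

sigma = [0.5, 0.5, 0.0, 0.0]                 # starting distribution
tau = [0.0, 0.0, 0.5, 0.5]                   # target distribution
trials, total = 20_000, 0.0
for _ in range(trials):
    i = random.choices(range(4), weights=sigma)[0]
    j = random.choices(range(4), weights=tau)[0]   # pick the target j from tau
    total += walk_until(i, j)                      # then walk until j
print(total / trials)                # estimates sum_{i,j} sigma_i tau_j H(i,j)
```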

In general, we can find much more efficient stopping rules. Rule Γ is said to be optimal (for
distributions σ and τ ) if EΓ is minimal among all rules Γ with σ Γ = τ . We will show that
this minimum is always attained, and will characterize those rules that are optimal. To this
end, we have to introduce some further quantities.

4.1 Exit frequencies


Given a Markov chain, a starting distribution, and a stopping rule Γ, the exit frequency $x_i = x_i(\Gamma)$ $(i \in V)$ is the expected number of times the walk leaves node i before stopping. It is clear that $\sum_i x_i(\Gamma) = E\Gamma$.

We start with the following basic identity.

Lemma 4.2 The exit frequencies of any stopping rule Γ from distribution σ to τ satisfy

∑_i p_{ij} x_i(Γ) − x_j(Γ) = τ_j − σ_j    (j ∈ V).
i

Proof. Count the expected number of times the walk is at node j: it is entered ∑_i p_{ij} x_i(Γ) + σ_j times (steps into j, plus possibly starting there) and left x_j(Γ) + τ_j times (steps out of j, plus possibly stopping there), and these two expectations are equal. 

Introducing the vector x(Γ) = (x_i(Γ) : i ∈ V), we can write this identity as

(P^T − I)x = τ − σ.    (27)
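Identity (27) can also be checked empirically. A small simulation sketch (illustrative, with assumed names, not code from the text): for the rule “stop after k steps”, estimate the exit frequencies on a triangle and compare (P^T − I)x with τ − σ, where τ = σP^k.

```python
import numpy as np

rng = np.random.default_rng(0)

def exit_frequencies(P, sigma, k, trials=50_000):
    """Estimate x_i for the rule 'stop after k steps' by simulation."""
    n = P.shape[0]
    x = np.zeros(n)
    for _ in range(trials):
        v = rng.choice(n, p=sigma)
        for _ in range(k):
            x[v] += 1                         # about to leave node v
            v = rng.choice(n, p=P[v])
    return x / trials

A = np.array([[0, 1, 1], [1, 0, 1], [1, 1, 0]], float)   # triangle
P = A / A.sum(axis=1, keepdims=True)
sigma = np.array([1.0, 0.0, 0.0])
k = 3
x = exit_frequencies(P, sigma, k)
tau = sigma @ np.linalg.matrix_power(P, k)    # ending distribution
print(P.T @ x - x)                            # the two printed vectors should
print(tau - sigma)                            # agree up to sampling error
```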

The exit frequencies are almost determined by the starting and ending distributions:

Lemma 4.3 For any distributions σ and τ, any two stopping rules Γ and Γ′ from
σ to τ, and any i ∈ V,

x_i(Γ) − x_i(Γ′) = π_i(EΓ − EΓ′).

Proof. If x′ is the vector of exit frequencies of Γ′, then by (27) we have (P^T − I)(x − x′) = 0.
Since the null space of P^T − I consists of the multiples of π, it follows that x − x′ = Kπ for some
real number K. By summing all entries, we see that K = EΓ − EΓ′. 

If, in particular, σ = τ and Γ′ is the rule “don’t move”, then we get that for any
stopping rule Γ with the same starting and ending distribution,

x_i(Γ) = π_i EΓ.    (28)

As an application of this formula, start a walk from a node i, and use the stopping rule
Γ: “stop when you return to i”. The expected length is the return time: EΓ = Ri . Equation
(28) gives that xi (Γ) = πi Ri . But clearly xi (Γ) = 1 (we exit the starting node exactly once),
and hence Ri = 1/πi . We also get that the expected number of times node j is visited before
returning to i is πj Ri = πj /πi .

Lemma 4.4 The exit frequencies for the naive stopping rule from state i to state j are given
by

x_k(i, j) = π_k (H(i, j) + H(j, k) − H(i, k)).

Proof. Clearly xj (i, j) = 0 for every i and j.


Starting from node i, consider the stopping rule “stop when you reach i after having seen j”.
By (28),

x_k(i, j) + x_k(j, i) = π_k(H(i, j) + H(j, i)).

In particular,

x_k(k, i) = π_k(H(i, k) + H(k, i)) − x_k(i, k) = π_k(H(i, k) + H(k, i)),

since x_k(i, k) = 0.

By a similar argument,

x_k(i, j) + x_k(j, k) + x_k(k, i) = π_k(H(i, j) + H(j, k) + H(k, i)),

and so (using x_k(j, k) = 0)

x_k(i, j) = π_k(H(i, j) + H(j, k) + H(k, i)) − x_k(j, k) − x_k(k, i)
          = π_k(H(i, j) + H(j, k) + H(k, i)) − π_k(H(i, k) + H(k, i))
          = π_k(H(i, j) + H(j, k) − H(i, k)). 

It follows easily that the exit frequencies for the naive stopping rule from σ to τ are given
by

x_k(Ω_{σ,τ}) = π_k ∑_{i,j} σ_i τ_j (H(i, j) + H(j, k) − H(i, k)).    (29)

A state j for which xj (Γ) = 0 is called a halting state for the rule Γ.

Theorem 4.5 A stopping rule is optimal for starting distribution σ and target distribution
τ if and only if it has a halting state.

Proof. The “if” direction is easy. Let Γ be a rule from σ to τ that has a halting state j,
and let Γ′ be any other rule from σ to τ. By Lemma 4.3,

x_i(Γ′) − x_i(Γ) = π_i(EΓ′ − EΓ).

In particular,

π_j(EΓ′ − EΓ) = x_j(Γ′) ≥ 0,

showing that EΓ′ ≥ EΓ. Since this holds for every Γ′, the rule Γ is optimal.
To prove the converse, we need to construct a stopping rule from σ to τ with a halting
state. Starting the chain from distribution σ, let αU be the distribution of the first node in
U , for every nonempty subset U of V . For example, αV = σ. We consider αU as a probability
distribution on V , even though it is concentrated on U .
We construct a decomposition

τ = ∑_{i=1}^{k} γ_i α_{S_i},    (30)

where S1 ⊃ S2 ⊃ · · · ⊃ Sk are nonempty subsets of V , and γi > 0.
Let S1 be the support of τ , and γ1 = max{c : cαS1 ≤ τ }. Then the support S2 of
τ − γ1 αS1 is a proper subset of S1 . Assuming that S1 ⊃ S2 ⊃ · · · ⊃ Sr and γ1 , . . . , γr have
been chosen, we define Sr+1 as the support of τ − γ1 αS1 − · · · − γr αSr . If Sr+1 = ∅, we stop.
Otherwise, let

γ_{r+1} = max{c : c α_{S_{r+1}} ≤ τ − γ_1 α_{S_1} − · · · − γ_r α_{S_r}}.


It is easy to see that this gives a decomposition (30). Summing (30) over V, we get that ∑_i γ_i = 1,
so the coefficients γ_i give a probability distribution on {1, . . . , k}.
This leads to a simple stopping rule: we choose i randomly from the distribution γ, and
walk until we hit S_i. The probability that we stop at node v is

∑_{i=1}^{k} γ_i (α_{S_i})_v = τ_v.

Furthermore, any node v ∈ Sk is a halting state, since v ∈ Si for every choice of i. 

Remark 4.6 The stopping rule in the proof can be called the “chain rule”, referring to
the chain of subsets. A rather neat way to think of this rule is to assign a “price”
a_v = ∑{γ_i : S_i ∋ v} to each node v. The rule is then implemented by choosing a random
“budget” b uniformly from [0, 1] and walking until a state j with a_j ≥ b is reached.
(Every node of S_k has price 1, so such a node always stops the walk.)
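The whole construction is effective. The sketch below (assumed helper names, not code from the text) computes α_U exactly via an absorbing-chain linear system, builds the decomposition (30) greedily as in the proof, and recovers the prices a_v of the remark; one can check numerically that ∑_i γ_i = 1 and that ∑_i γ_i α_{S_i} reproduces τ.

```python
import numpy as np

def alpha_U(P, sigma, U):
    """Distribution of the first node of the walk that lies in U."""
    n = P.shape[0]
    U = sorted(U)
    W = [i for i in range(n) if i not in U]
    alpha = np.zeros(n)
    alpha[U] = sigma[U]                       # walks that start inside U
    if W:
        # h[w, u] = P(first entry into U happens at u | start at w)
        h = np.linalg.solve(np.eye(len(W)) - P[np.ix_(W, W)], P[np.ix_(W, U)])
        alpha[U] += sigma[W] @ h
    return alpha

def chain_rule(P, sigma, tau, tol=1e-12):
    """Greedy decomposition tau = sum_i gamma_i alpha_{S_i} with nested S_i."""
    residual = tau.astype(float).copy()
    gammas, sets = [], []
    while residual.sum() > tol:
        S = list(np.flatnonzero(residual > tol))   # support of the residual
        a = alpha_U(P, sigma, S)
        pos = a > tol
        c = np.min(residual[pos] / a[pos])   # gamma = max{c : c*alpha <= residual}
        gammas.append(c)
        sets.append(S)
        residual = np.maximum(residual - c * a, 0.0)   # guard against round-off
    return gammas, sets

def prices(gammas, sets, n):
    """a_v = sum of gamma_i over the sets S_i containing v."""
    a = np.zeros(n)
    for g, S in zip(gammas, sets):
        a[S] += g
    return a
```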

Example 4.7 If we start from an endpoint on a path, then the naive rule for reaching the
stationary distribution is optimal. Indeed, the other endpoint is a halting state.

Example 4.8 The analysis in Example 3.5 suggests the following stopping rule Γ for the lazy
random walk on the k-cube Q_k: starting from a vertex u, we stop when (with the notation
there) {i_1, . . . , i_t} = {1, . . . , k}. We have seen that the distribution of v^Γ is uniform on all
vertices of the cube, so we have a stopping rule from u to π. If we ever reach the vertex u′ of
the cube opposite to u, then we have flipped every coordinate, so we must stop. This means
that u′ is a halting state for this rule, and so it is optimal. It follows that H(u, π) = k har(k).

It follows from Lemma 4.3 that the exit frequencies of any optimal stopping rule from σ
to τ are the same. We denote them by xi (σ, τ ). We denote by H(σ, τ ) the expected length of
any optimal stopping rule. If σ is concentrated on a node i and τ is concentrated on a node
j, then H(σ, τ ) = H(i, j).
Theorem 4.9 The expected number of steps of an optimal stopping rule is given by

H(σ, τ) = max_j (H(σ, j) − H(τ, j)),

and the exit frequencies are given by

x_k(σ, τ) = π_k (H(τ, k) − H(σ, k) + H(σ, τ)).

The proof will show that the maximum is achieved precisely when j is a halting state of
some optimal rule which stops at τ when started at σ.

Proof. Recall that the exit frequencies of the naive stopping rule are given by

x_k(Ω_{σ,τ}) = π_k ∑_{i,j} σ_i τ_j (H(i, j) + H(j, k) − H(i, k)).

To get the exit frequencies of an optimal stopping rule, we have to subtract a multiple of π
so as to make the minimal exit frequency equal to zero (this is what Lemma 4.3 and Theorem 4.5
allow). This means that

x_k(σ, τ) = π_k (∑_{i,j} σ_i τ_j (H(i, j) + H(j, k) − H(i, k)) − min_m ∑_{i,j} σ_i τ_j (H(i, j) + H(j, m) − H(i, m))).

After cancellation, we get

x_k(σ, τ) = π_k ((H(τ, k) − H(σ, k)) + max_m (H(σ, m) − H(τ, m))).

Summing over k,

H(σ, τ) = ∑_k π_k H(τ, k) − ∑_k π_k H(σ, k) + max_m (H(σ, m) − H(τ, m)).

The first two terms cancel by the Random Target Lemma. This proves the first formula.
Using it, we get

x_k(σ, τ) = π_k (H(τ, k) − H(σ, k) + H(σ, τ)),

as claimed. 
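Theorem 4.9 reduces the computation of H(σ, τ) to linear algebra on hitting times. A minimal sketch (illustrative names, not from the text; the hitting-time solver is the same as in the earlier sketch):

```python
import numpy as np

def hitting_times(P):
    n = P.shape[0]
    H = np.zeros((n, n))
    for j in range(n):
        idx = [i for i in range(n) if i != j]
        H[idx, j] = np.linalg.solve(np.eye(n - 1) - P[np.ix_(idx, idx)],
                                    np.ones(n - 1))
    return H

def access_time(P, sigma, tau):
    """H(sigma, tau) = max_j (H(sigma, j) - H(tau, j))  (Theorem 4.9)."""
    H = hitting_times(P)
    return np.max((sigma - tau) @ H)       # entry j is H(sigma, j) - H(tau, j)

# Path with 3 nodes, starting from an endpoint, target pi.
A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], float)
P = A / A.sum(axis=1, keepdims=True)
pi = A.sum(axis=1) / A.sum()
sigma = np.array([1.0, 0.0, 0.0])
print(access_time(P, sigma, pi), sigma @ hitting_times(P) @ pi)
```

On this path the two printed values agree, in accordance with Example 4.7: starting from an endpoint, the naive rule to π is already optimal.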

4.2 Mixing and ε-mixing


In practice, it could be difficult to follow a complicated stopping rule, but we can follow a
simple rule and not lose much.

Theorem 4.10 Let Y be chosen uniformly from {0, . . . , t − 1}. Then for any starting
distribution σ,

d_var(σ^Y, π) ≤ (1/t) H(σ, π).

Proof. Let Ψ be an optimal stopping rule from σ to π. Consider the following rule: follow
Ψ until it stops at v^Ψ, then generate Z ∈ {0, . . . , t − 1} uniformly (and independently of
the previous walk), and walk Z more steps. Since Ψ stops at a node from the stationary
distribution, σ^{Ψ+s} is stationary for every s ≥ 0, and hence so is σ^{Ψ+Z}.

On the other hand, let Y = Ψ + Z (mod t); then Y is uniformly distributed over {0, . . . , t − 1},
and so

σ_i^Y = P(v^Y = i) ≥ P(v^{Ψ+Z} = i, Ψ + Z ≤ t − 1)
      ≥ P(v^{Ψ+Z} = i) − P(v^{Ψ+Z} = i, Ψ + Z ≥ t)
      = π_i − P(v^{Ψ+Z} = i, Ψ + Z ≥ t).

Hence for every set A of nodes,

σ^Y(A) ≥ π(A) − P(v^{Ψ+Z} ∈ A, Ψ + Z ≥ t) ≥ π(A) − P(Ψ + Z ≥ t).

For any fixed value of Ψ, the probability that Ψ + Z ≥ t is at most Ψ/t, and hence

P(Ψ + Z ≥ t) ≤ E(Ψ/t) = (1/t) H(σ, π).

So π(A) − σ^Y(A) ≤ H(σ, π)/t, proving the theorem. 
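Numerically, the 1/t decay is easy to observe: compute σ^Y = (1/t) ∑_{s<t} σP^s directly and measure its distance to π. (A toy illustration with assumed names, not from the text.)

```python
import numpy as np

def dvar(a, b):
    return 0.5 * np.abs(a - b).sum()

A = np.array([[0, 1, 0, 0, 1], [1, 0, 1, 0, 0], [0, 1, 0, 1, 0],
              [0, 0, 1, 0, 1], [1, 0, 0, 1, 0]], float)   # 5-cycle
P = A / A.sum(axis=1, keepdims=True)
pi = A.sum(axis=1) / A.sum()
sigma = np.array([1.0, 0, 0, 0, 0])

for t in (5, 10, 20, 40, 80):
    avg, cur = np.zeros_like(sigma), sigma.copy()
    for _ in range(t):                 # sigma^Y = average of sigma P^s, s < t
        avg += cur
        cur = cur @ P
    print(t, dvar(avg / t, pi))        # roughly halves as t doubles
```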

Exercise 4.11 Prove that H(σ, τ ) satisfies the triangle inequality: H(σ, ρ) ≤
H(σ, τ ) + H(τ, ρ).
Exercise 4.12 Prove that H(σ, τ ) is convex in both arguments.
Exercise 4.13 Let α, β, γ, δ be four distributions such that α − β = c(γ − δ).
Prove that H(α, β) = cH(γ, δ).
Exercise 4.14 Consider the following “local” stopping rule Λ, given distributions
σ and τ on V: compute numbers x_i (i ∈ V) such that min_i x_i = 0 and
∑_i p_{ij} x_i − x_j = τ_j − σ_j for all j ∈ V. If we are at node i, we stop with
probability τ_i/(x_i + τ_i) (if x_i = τ_i = 0, the stopping probability can be set to 0).
(a) Prove that there exist numbers x_i satisfying the above conditions.
(b) Prove that the rule stops with probability 1.
(c) Prove that σ^Λ = τ.
(d) Prove that Λ is an optimal rule from σ to τ.
Exercise 4.15 Prove that for the random walk on a graph,

H(i, π) = max_j H(j, i) − H(π, i).

Exercise 4.16 Let Φ : σ → τ and Ψ : σ → ρ be two stopping rules such that Φ
stops no later than Ψ on any walk starting from σ. Prove that

H(τ, ρ) ≤ EΨ − EΦ.
Exercise 4.17 Call a node s of a graph G pessimal if it maximizes H(s, π). Let
i be a pessimal node and let j be a halting state of an optimal rule from i to π.
Prove that j is also pessimal, and that i is a halting state of an optimal rule from
j to π. In particular, every graph with at least two nodes has at least two pessimal
nodes.
Exercise 4.18 Assume that there is a stopping rule from σ to τ that never makes
more than t steps (t ≥ 0).
(a) Prove that H(σ, τ) + H(τ, σ^t) ≤ t.
(b) Choose a random integer Y uniformly from the interval [t, 2t − 1], and walk
for Y steps, starting from a random node from σ. Prove that the probability of
being at node i is at most 2π_i.

5 Applications
5.1 Volume computation
Computing the volume of high-dimensional convex bodies, and related tasks like numerical
integration in high dimension, have profound applications from geometry to statistics to
theoretical physics. The volume of certain special convex bodies can be expressed explicitly.
We will need that the volume of a ball B ⊆ R^n with radius 1 is

vol(B) = π^k/k!                         if n = 2k,
vol(B) = 2^{2k+1} k! π^k/(2k + 1)!      if n = 2k + 1,

and, using Stirling’s Formula,

vol(B) ∼ (1/√(πn)) (2eπ/n)^{n/2}    (n → ∞).
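Both cases are values of the single closed form vol(B) = π^{n/2}/Γ(n/2 + 1), which is also convenient for computation (a small illustrative helper, not from the text):

```python
import math

def ball_volume(n, R=1.0):
    """Volume of the n-dimensional ball of radius R."""
    return math.pi ** (n / 2) / math.gamma(n / 2 + 1) * R ** n

print(ball_volume(2))    # pi        (n = 2k, k = 1: pi^1 / 1!)
print(ball_volume(3))    # 4*pi/3    (n = 2k+1, k = 1: 2^3 * 1! * pi / 3!)
```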
Computing the volume of a general n-dimensional convex body, even of a polytope, is NP-hard.
For general convex bodies, every estimate of the volume that can be computed in
polynomial time will err on some inputs by a factor that grows exponentially with the
dimension.
But if we allow randomized algorithms (using a random number generator), one can
compute an estimate of the volume such that the probability that the relative error is larger
than a prescribed ε > 0 is arbitrarily small.
The basic tool is generating an approximately uniformly distributed random point in
a convex body. The random point in the convex body is generated using Markov chains.
The mixing rate of the Markov chain can be estimated using conductance. This in itself is
an interesting geometric problem, using an “isoperimetric inequality” concerning subsets of
convex bodies.
The algorithm sketched in this course has a running time of about n^{10}, which is polynomial
in the dimension n, but rather large. Further improvements (quite elaborate) lead to a
running time of O(n^4).

5.1.1 What is a convex body?

A convex body is a compact and full-dimensional convex set in R^n. For algorithmic purposes,
special classes of convex bodies occur with quite different descriptions in different situations.
Polyhedra are usually described as solution sets of systems of linear inequalities; but
polytopes (bounded convex polyhedra) can also be specified by listing their vertices. These
two descriptions are equivalent in the sense of classical mathematics, but from an algorithmic
point of view, it may make a lot of difference to have one description or the other as input:
for example, to maximize a linear objective function over the polytope is trivial if an explicit

list of the vertices is given, but quite difficult if the polytope is described by linear inequalities
(it is just the task of solving linear programs).
One meets many other forms of descriptions of convex bodies. In analysis, convex sets
often arise as level sets or epigraphs of convex functions, or unit balls of norms. We want our
results to be as independent of the specific description as possible. A way to achieve this
is to describe a convex body K ⊆ R^n (as an input to an algorithm) by a membership oracle:
a subroutine (oracle) deciding whether a point y ∈ Rn belongs to K or not. (We could allow
the oracle to err near the boundary of K.)
In addition, we shall assume throughout that we know the center and radius r of some
ball contained in K, and the radius R of another concentric ball containing K. One can
apply an appropriate affine transformation (this is nontrivial, but outside the topic of this
course) after which both balls are centered at the origin, r = 1, and R = n^{3/2}.
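In code, a membership oracle is simply a black-box predicate on points of R^n, carried along with the guaranteed radii r and R. A minimal illustration (our own example, not from the text):

```python
import numpy as np

def cube_oracle(y):
    """Membership oracle for the cube [-1, 1]^n: it contains the unit ball
    (r = 1) and is contained in the concentric ball of radius sqrt(n)."""
    return bool(np.all(np.abs(y) <= 1.0))

print(cube_oracle(np.zeros(10)))         # True: the center
print(cube_oracle(1.1 * np.ones(10)))    # False: outside the cube
```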

5.2 Lower bounds on the complexity


Theorem 5.1 Let A be a polynomial-time algorithm that computes a number V(K) for every
convex body K given by a membership oracle, such that V(K) ≥ vol(K). Then for every
large enough dimension n, there is a convex body K in R^n for which V(K) > 2^{0.99n} vol(K).

The proof shows that there is a trade-off between time and precision: if we want (say) a
relative error less than 2, then we need exponential time. The error of 2^{0.99n} in the theorem
can be replaced (with a more involved proof) by n^{0.99n}. This means that, up to the coefficient
of n in the exponent, the very rough estimate vol(B) ≤ vol(K) ≤ n^{3n/2} vol(B) cannot be
improved in general.

Lemma 5.2 Let B be the unit ball in R^n, and let P ⊆ B be a convex polytope with p vertices.
Then

vol(P) ≤ (p/2^n) vol(B).

Proof. For every vertex v of the polytope P, consider the ball B_v for which the segment
connecting v to the origin is a diameter (the “Thales ball” over the segment). We claim that
these balls cover the whole polytope. Indeed, let u ∈ P; then the closed halfspace
{x : u^T x ≥ u^T u} contains at least one point of the polytope (namely u), and hence it contains
a vertex v of P. But then the angle at u in the triangle 0uv is not acute, and hence u ∈ B_v.
Since the diameter of B_v is at most half the diameter of B, we have vol(B_v) ≤ 2^{−n} vol(B).
Using this, it is easy to estimate the volume of P:

vol(P) ≤ ∑_v vol(B_v) ≤ p 2^{−n} vol(B).

This proves the Lemma. 

Proof of the theorem. Apply the algorithm A with the ball B′ = n^{3/2}B as its input. It
returns a number V(B′) which satisfies V(B′) ≥ vol(B′).
Next, let S be the set of points that were asked from the oracle and that it declared to
be in the ball. Let Q = {n^{3/2}e_1, . . . , n^{3/2}e_n, −n^{3/2}e_1, . . . , −n^{3/2}e_n}, and let P be the convex
hull of S ∪ Q. (Throwing in the points of Q is a little technicality, needed since we
must guarantee that the unit ball is contained in the convex hull of S.) Let p be the number
of vertices of P.
If we apply the algorithm A with input P, then comparing its run with the previous run
step by step, we see that it asks the oracle the same points, and to these questions the
oracle gives the same answers. Hence the final result must be the same, and so V(P) = V(B′).
By the Lemma (scaled to the ball B′),

V(P) = V(B′) ≥ vol(B′) ≥ (2^n/p) vol(P).

Since p is bounded by a polynomial in n, the theorem follows. 

5.2.1 Monte-Carlo algorithms

It is a very interesting fact that once we allow randomized algorithms, the situation changes
dramatically. Randomization brings Monte-Carlo algorithms to mind. We have already made
the assumption that K is contained in the ball B′ = n^{3/2}B. Let us generate many random
points in B′ (this is not difficult), and count how often we hit K. This gives us an estimate
of the ratio of the volumes of K and B′, and we know the volume of B′. Are we done?
The problem is not generating a random point in the ball; this can be accomplished as
follows. Let X_1, . . . , X_n be independent random variables from the standard normal
distribution, and let Y be uniformly distributed in [0, 1]. Let
Z_i = n^{3/2} Y^{1/n} X_i/√(X_1^2 + · · · + X_n^2);
then Z = (Z_1, . . . , Z_n) is uniformly distributed over B′.
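In code (an illustrative sketch, with assumed names):

```python
import numpy as np

rng = np.random.default_rng(0)

def uniform_in_ball(n, R):
    """Uniform random point in the ball of radius R in R^n."""
    X = rng.standard_normal(n)          # X / |X| is a uniform direction
    Y = rng.uniform()                   # radius R * Y^(1/n) makes the
    return R * Y ** (1.0 / n) * X / np.linalg.norm(X)   # point volume-uniform

n = 5
z = uniform_in_ball(n, n ** 1.5)        # a point of B' = n^(3/2) B
print(np.linalg.norm(z) <= n ** 1.5)    # True
```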
The problem is that the volume of K may be smaller than the volume of B′ by an
exponential factor (in n). For example, if K is the cube whose vertices are the ±1-vectors,
then its volume is 2^n, while the smallest ball containing it has volume ∼ (2eπ)^{n/2}/√(πn).
Hence the first exponentially many random points will all miss the body K. This method
can be applied to estimate the ratio of the volumes of two convex bodies (one including the
other) only if this ratio is not too small.
This suggests the following trick: “connect” B and K by a sequence of convex bodies
B = K_0 ⊆ K_1 ⊆ . . . ⊆ K_m = K, so that vol(K_i)/vol(K_{i+1}) ≥ 1/2. Then these ratios can
be estimated by the Monte-Carlo method, and their product gives an estimate of the ratio
vol(B)/vol(K). Such a sequence is easily constructed: we can take e.g. K_i = K ∩ 2^{i/n}B.
Trivially K_0 ⊆ K_1 ⊆ . . .; since B ⊆ K, we have K_0 = B, and K_m = K for
m = ⌈(3/2) n log_2 n⌉.
Furthermore, since

2^{1/n} K_i = 2^{1/n} K ∩ 2^{(i+1)/n} B ⊇ K ∩ 2^{(i+1)/n} B = K_{i+1},

we have

vol(K_{i+1}) ≤ vol(2^{1/n} K_i) = 2 vol(K_i).
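Assembling the phases gives the following skeleton. Here sample_uniform(radius), which should return a (nearly) uniform random point of K ∩ radius·B, is an assumption of the sketch: providing such a subroutine is precisely the problem treated in the rest of this section.

```python
import math
import numpy as np

def volume_estimate(n, sample_uniform, samples=10_000):
    """Multiphase Monte-Carlo sketch: vol(K) = vol(B) * prod vol(K_{i+1})/vol(K_i)
    with K_i = K intersected with 2^(i/n) B.  Each ratio is estimated by sampling
    from K_{i+1} and counting hits in K_i (a point of K_{i+1} lies in K_i iff
    its norm is at most 2^(i/n))."""
    m = math.ceil(1.5 * n * math.log2(n))                 # K_m = K
    vol = math.pi ** (n / 2) / math.gamma(n / 2 + 1)      # vol(K_0) = vol(B)
    for i in range(m):
        r_in = 2 ** (i / n)
        hits = sum(np.linalg.norm(sample_uniform(2 ** ((i + 1) / n))) <= r_in
                   for _ in range(samples))
        hits = max(hits, 1)              # the expected hit ratio is >= 1/2 anyway
        vol *= samples / hits            # estimate of vol(K_{i+1}) / vol(K_i)
    return vol
```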

However, estimating vol(K_i)/vol(K_{i+1}) by the Monte-Carlo method is not so easy; the key
question in this algorithm is: how do we generate a uniformly distributed random point in a
convex body?

5.2.2 Measurable Markov chains

As can be expected, we use Markov chains. But K is an infinite set, and we have studied
finite Markov chains so far. One possibility is to take a sufficiently fine grid and do a random
walk on the part of the grid inside K. This is doable, but not very efficient. It is better to
generalize the theory of mixing times to Markov chains with an infinite underlying set.
Let Ω be a set endowed with a σ-algebra A of its subsets. To define a Markov chain on this
underlying space, we specify, for every u ∈ Ω, a probability measure P_u on (Ω, A). We assume
that for every A ∈ A the value P_u(A) is measurable as a function of u. Selecting w^0 from any
starting distribution σ^0 on Ω, we can generate a walk, i.e., a sequence of random points
w^0, w^1, w^2, . . . from Ω such that w^{i+1} is chosen from the distribution P_{w^i} (independently
of the values of w^0, . . . , w^{i−1}).
We say that a probability distribution π on (Ω, A) is stationary if, selecting one point
from π and making one step, the resulting point also has distribution π. Explicitly,

∫_Ω P_u(A) dπ(u) = π(A)    for every A ∈ A.

Such a distribution may not exist, but we will only need Markov chains where it does.
Several notions and results can be extended to these kinds of Markov chains with little
difficulty. Lazy walks can be defined as before. The variation distance of two probability
distributions α and β is

d_var(α, β) = sup_{A∈A} (α(A) − β(A)).

The quantity π_i p_{ij}, which came up often for finite chains, can be replaced by

Q(A, B) = P(u ∈ A, u′ ∈ B) = ∫_A P_x(B) dπ(x)

(here A, B ∈ A, u is a random point from π, and u′ is obtained by making one step from u).
Define

Φ(A) = Q(A, Ω \ A) / (π(A) π(Ω \ A)).

The conductance of the chain is

Φ = inf_{A∈A, 0<π(A)<1} Φ(A).

It is not hard to see that Q(A, Ω \ A) = Q(Ω \ A, A), and this implies that

P(u′ ∈ Ω \ A | u ∈ A) P(u ∈ A) + P(u′ ∈ A | u ∈ Ω \ A) P(u ∈ Ω \ A) = 2Φ(A) π(A) π(Ω \ A).

So we could define Φ(A) as half of the probability of crossing over between A and Ω \ A from
a random point from the stationary distribution, divided by π(A)π(Ω \ A).
To use conductance in order to estimate mixing times, we have to generalize Corollary
3.11. The quantity π_min causes difficulty, since it cannot be directly generalized to the
infinite case. The theorem below shows a way out (it needs a different proof from the one we
gave above).

Theorem 5.3 Suppose that the starting distribution σ satisfies σ(A) ≤ Kπ(A) for all A ∈ A.
Then the lazy walk satisfies

d_var(σ^t, π) ≤ √K (1 − Φ^2/16)^t

for all t ≥ 0.

5.2.3 Isoperimetry

The other auxiliary result we need is from geometry.

Theorem 5.4 Let K be a convex body in R^n with diameter d, and let K = K_1 ∪ K_2 ∪ K_3 be a
partition of K into three measurable sets such that the distance of K_1 and K_2 is at least ε.
Then

vol(K_3) ≥ (ε/d) min{vol(K_1), vol(K_2)}.
A long narrow cylinder shows that the bound is essentially sharp.

5.2.4 The ball walk

We generate a random point in K by using a random walk with steps in a small ball. Let rB
denote the ball centered at 0 with radius r. (For us the best choice will be r ≈ 1/√n,
but for a while it is better to leave this as a parameter.) Let v^0 be drawn from some
initial distribution σ on K. Given v^k, we generate a point u from the uniform distribution
over the ball v^k + rB (centered at v^k and with radius r). If u ∈ K, we move to v^{k+1} = u;
else, we let v^{k+1} = v^k. This defines a time-reversible Markov chain, called the ball walk
in K, and the uniform distribution on K is its stationary distribution.
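One step of the ball walk takes only a few lines (a sketch with assumed names; in_K plays the role of the membership oracle of Section 5.1.1):

```python
import numpy as np

rng = np.random.default_rng(0)

def ball_walk_step(v, r, in_K):
    """One ball-walk step: propose u uniform in v + rB, accept if u is in K."""
    n = len(v)
    X = rng.standard_normal(n)
    u = v + r * rng.uniform() ** (1.0 / n) * X / np.linalg.norm(X)
    return u if in_K(u) else v            # proposals outside K stay put

# Toy run inside the cube [-1, 1]^n with step radius r = 1/sqrt(n).
n = 10
in_cube = lambda y: bool(np.all(np.abs(y) <= 1.0))
v = np.zeros(n)
for _ in range(1000):
    v = ball_walk_step(v, 1 / np.sqrt(n), in_cube)
print(v)
```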
We need to estimate the conductance of the ball walk. Consider a partition K = K_1 ∪ K_2
into two measurable sets. For i = 1, 2, let

K_i′ = {x ∈ K_i : vol((x + rB) ∩ K_i) ≥ (9/10) vol(rB)},

and let K_3′ = K \ (K_1′ ∪ K_2′).

The key observation is that the distance of K_1′ and K_2′ is at least r/√n. Indeed, consider
two points x_i ∈ K_i′ (i = 1, 2), and assume (by way of contradiction) that |x_1 − x_2| < r/√n.
Then vol((x_1 + rB) ∩ (x_2 + rB)) > vol(rB)/5 (this is a nontrivial exercise in integration),
and so either K_1 or K_2 meets (x_1 + rB) ∩ (x_2 + rB) in a set of volume larger than
vol(rB)/10. But this contradicts the definition of the K_i′.
So we can apply Theorem 5.4 to the partition K = K_1′ ∪ K_2′ ∪ K_3′. Using that the
diameter of the bodies we deal with is bounded by n^{3/2}, we get

vol(K_3′) ≥ (r/n^2) min{vol(K_1′), vol(K_2′)} ≥ (r/n^2) min{vol(K_1) − vol(K_3′), vol(K_2) − vol(K_3′)},   (31)

and hence, rearranging,

vol(K_3′) ≥ (r/(r + n^2)) min{vol(K_1), vol(K_2)} ≥ (r/(2n^2)) min{vol(K_1), vol(K_2)}.   (32)
To estimate the conductance, notice that if we are at a point u ∈ K_i ∩ K_3′ and generate
a random point u′ ∈ u + rB, then with probability at least 1/10, u′ will not be in K_i.
Unfortunately, this can mean that u′ ∈ K_{3−i} or that u′ ∉ K; in the latter case, we stay
at u by the rule of the walk.

For the (incomplete) analysis that follows, we are going to ignore the second possibility; it
is small if the step-size r is small enough. With this simplification, we see that from (almost)
every point u ∈ K_3′, we step to the other side of the partition {K_1, K_2} with probability at
least 1/20. It follows that

Q(K_1, K_2) ≥ vol(K_3′)/(20 vol(K)) ≥ r min{vol(K_1), vol(K_2)}/(40 n^2 vol(K))
            = (r/(40n^2)) min{π(K_1), π(K_2)} ≥ (r/(40n^2)) π(K_1) π(K_2).

So (subject to the simplification of ignoring the steps leaving K) the conductance is at least
r/(40n^2).

Careful estimation of the case that was ignored above and optimization of the parameters
leads to the choice r = 1/√n, which gives a bound on the conductance of order n^{−5/2}. This
gives a polynomial bound of order n^5 on the mixing time.

5.3 Random spanning tree


Theorem 5.5 Consider a random walk on a graph G starting at node u, and mark, for each
node different from u, the edge through which the node was first entered. The edges marked
form a subtree T of G, which is spanning with probability 1, and every spanning tree occurs
with the same probability.

Proof. The proof uses a method called “coupling from the past”.
It is easy to see that the first entrances to each node form a spanning tree.
We consider a directed graph H whose nodes are the pairs (T, u), where T is a spanning tree
of G and u is any node of G, called the root. From (T, u), draw a directed edge to (T′, v) if
uv ∈ E(G) and T′ arises from T by deleting the first edge on the path from v to u and adding
the edge uv. Clearly each node (T, v) of H has indegree and outdegree d(v), and hence in the
stationary distribution of the random walk on H, the probability of a spanning tree with a
given root is proportional to the degree of the root (in G). If we draw a random spanning tree
from this distribution, and then forget about the root, we get every spanning tree with the
same probability.
A random walk on G induces a random walk on H as follows. Assume that we are at a
node v of G, and at a node (T, v) in H, where T is a spanning tree. If we move along an
edge vw in G, then we can move to the node (T′, w) in H by removing the first edge of the
path from w to v and adding the edge vw to the current spanning tree. We can follow this
random walk on H backwards as well, following the random walk on G backwards. We may
assume that both random walks are infinite both to the past and to the future.
Consider a particular time, and let T and v denote the current tree and the current root.

Claim 1 The last exits from each node other than v (as oriented edges) form a spanning
tree, oriented towards v.

Claim 2 The tree of last exits is T .

It follows that the last exits from each node other than the current node form a uniformly
distributed random spanning tree at every time. Reversing time, the same follows for the
first entrances. 
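Theorem 5.5 is an algorithm in disguise; in the literature this first-entrance procedure is known as the Aldous–Broder algorithm. A direct implementation sketch (illustrative names, not from the text):

```python
import random

def random_spanning_tree(adj, u):
    """Run a random walk from u; the first-entrance edges form a uniformly
    distributed spanning tree (Theorem 5.5).  adj: node -> list of neighbors."""
    visited = {u}
    tree = []
    v = u
    while len(visited) < len(adj):
        w = random.choice(adj[v])
        if w not in visited:              # first entrance into w: mark the edge
            tree.append((v, w))
            visited.add(w)
        v = w
    return tree

# 4-cycle: each of its 4 spanning trees appears with probability 1/4.
adj = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
print(random_spanning_tree(adj, 0))
```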

