
A COURSE IN STOCHASTIC PROCESSES

THEORY AND DECISION LIBRARY

General Editors: W. Leinfellner (Vienna) and G. Eberlein (Munich)

Series A: Philosophy and Methodology of the Social Sciences

Series B: Mathematical and Statistical Methods

Series C: Game Theory, Mathematical Programming and Operations Research

Series D: System Theory, Knowledge Engineering and Problem Solving

SERIES B: MATHEMATICAL AND STATISTICAL METHODS


VOLUME 34

Editor: H. J. Skala (Paderborn); Assistant Editor: M. Kraft (Paderborn); Editorial Board:

J. Aczel (Waterloo, Ont.), G. Bamberg (Augsburg), H. Drygas (Kassel), W. Eichhorn
(Karlsruhe), P. Fishburn (Murray Hill, N.J.), D. Fraser (Toronto), W. Janko (Vienna), P. de
Jong (Vancouver), T. Kariya (Tokyo), M. Machina (La Jolla, Calif.), A. Rapoport (Toronto),
M. Richter (Kaiserslautern), B. K. Sinha (Catonsville, Md.), D. A. Sprott (Waterloo, Ont.),
P. Suppes (Stanford, Calif.), H. Theil (St. Augustine, Fla.), E. Trillas (Madrid), L. A. Zadeh
(Berkeley, Calif.).

Scope: The series focuses on the application of methods and ideas of logic, mathematics and
statistics to the social sciences. In particular, formal treatment of social phenomena, the
analysis of decision making, information theory and problems of inference will be central
themes of this part of the library. Besides theoretical results, empirical investigations and the
testing of theoretical models of real world problems will be subjects of interest. In addition
to emphasizing interdisciplinary communication, the series will seek to support the rapid
dissemination of recent results.

The titles published in this series are listed at the end of this volume.
A COURSE IN
STOCHASTIC PROCESSES
Stochastic Models and
Statistical Inference

by

DENIS BOSQ
Institut de Statistique,
Universite Pierre et Marie Curie,
Paris, France
and
HUNG T. NGUYEN
Department of Mathematical Sciences,
New Mexico State University,
Las Cruces, New Mexico, U.SA.

"
~ ..

Springer-Science+Business Media, B.Y.


A C.I.P. Catalogue record for this book is available from the Library of Congress.

ISBN 978-90-481-4713-7 ISBN 978-94-015-8769-3 (eBook)


DOI 10.1007/978-94-015-8769-3

Printed on acid-free paper

All Rights Reserved


© 1996 Springer Science+Business Media Dordrecht
Originally published by Kluwer Academic Publishers in 1996.
Softcover reprint of the hardcover 1st edition 1996
No part of the material protected by this copyright notice may be reproduced or
utilized in any form or by any means, electronic or mechanical,
including photocopying, recording or by any information storage and
retrieval system, without written permission from the copyright owner.
Contents

Preface

1 Basic Probability Background
  1.1 Events and Probabilities
  1.2 Random variables and their distributions
  1.3 Expectation
  1.4 Limit theorems
  1.5 Exercises

2 Modeling Random Phenomena
  2.1 Random Phenomena
  2.2 Stochastic Processes
  2.3 Distributions of Stochastic Processes
  2.4 Some Important Properties of Stochastic Processes
  2.5 Exercises

3 Discrete-Time Markov Chains
  3.1 The Markov Model
  3.2 Distributions of Markov Chains
  3.3 Classification and Decomposition of States
  3.4 Stationary Distributions
  3.5 Exercises

4 Poisson Processes
  4.1 Motivation and Modeling
  4.2 Axioms of Poisson Processes
  4.3 Interarrival Times
  4.4 Some Properties of Poisson Processes
  4.5 Processes related to Poisson Processes
  4.6 Exercises

5 Continuous-Time Markov Chains
  5.1 Some typical examples
  5.2 Computational aspects
  5.3 Distributions of Birth and Death Chains
  5.4 Exercises

6 Random Walks
  6.1 Motivation and definitions
  6.2 Asymptotic behavior of the simple random walk
  6.3 Returns to the origin
  6.4 First passage times
  6.5 A classical game
  6.6 Exercises

7 Renewal Theory
  7.1 Motivation and examples
  7.2 The counting process
  7.3 Renewal equations
  7.4 Renewal Theorems
  7.5 Exercises

8 Queueing Theory
  8.1 Modeling and structure
  8.2 The queue M/M/1
  8.3 The queues M/M/s, $1 < s \le \infty$
  8.4 The queue M/G/1
  8.5 Exercises

9 Stationary Processes
  9.1 Autocovariance, Spectral Density, and Partial Autocorrelation
  9.2 Linear Prediction and the Wold Decomposition
  9.3 Limit Theorems for Stationary Processes
  9.4 Stationary Processes in Continuous Time
  9.5 Exercises

10 ARMA model
  10.1 Linear Processes
  10.2 Autoregressive Processes
  10.3 Moving Average Processes
  10.4 ARMA Processes
  10.5 Nonstationary Models and Exogenous Variables
  10.6 Exercises

11 Discrete-Time Martingales
  11.1 Generalities
  11.2 Examples and Applications
  11.3 Convergence of Martingales
  11.4 Exercises

12 Brownian Motion and Diffusion Processes
  12.1 Gaussian Processes
  12.2 Brownian Motion
  12.3 Stochastic Integral
  12.4 Diffusion Processes
  12.5 Processes Defined by Stochastic Differential Equations
  12.6 Exercises

13 Statistics for Poisson Processes
  13.1 The Statistical Model
  13.2 Estimation
  13.3 Tests
  13.4 Estimation for Poisson processes
  13.5 Confidence Intervals and Tests for λ
  13.6 Inference for Point Processes
  13.7 Exercises

14 Statistics of Discrete-Time Stationary Processes
  14.1 Stationarization
  14.2 Nonparametric Estimation in Stationary Processes
  14.3 Statistics of ARMA Processes
  14.4 Exercises

15 Statistics of Diffusion Processes
  15.1 Nonparametric Estimation in Continuous Time Processes
  15.2 Statistics of Wiener Processes
  15.3 Estimation in Diffusion Processes
  15.4 Exercises

A Measure and Integration
  A.1 Extension of measures
  A.2 Product measures
  A.3 Some theorems on integrals

B Banach and Hilbert Spaces
  B.1 Definitions
  B.2 $L^p$-spaces
  B.3 Hilbert spaces
  B.4 Fourier series
  B.5 Applications to probability theory

List of Symbols

Bibliography

Partial Solutions to Selected Exercises

Index
Preface

This text is an elementary introduction to Stochastic Processes in discrete
and continuous time with an initiation to statistical inference. The
material is standard and classical for a first course in Stochastic Processes
at the senior/graduate level (Lessons 1-12). To provide students with a
view of statistics of stochastic processes, three lessons (13-15) were added.
These lessons can either be optional or serve as an introduction to statistical
inference with dependent observations. Several points of this text need to
be elaborated.

(1) The pedagogy is somewhat obvious. Since this text is designed for a
one-semester course, each lesson can be covered in one week or so. Having
in mind a mixed audience of students from different departments (Math-
ematics, Statistics, Economics, Engineering, etc.), we have presented the
material in each lesson in the simplest way, with emphasis on motivation
of concepts, aspects of applications, and computational procedures.
Basically, we try to explain to beginners questions such as "What is the
topic of this lesson?", "Why this topic?", and "How is this topic studied math-
ematically?". The exercises at the end of each lesson will deepen the stu-
dents' understanding of the material and test their ability to carry out
basic computations. Exercises with an asterisk are optional (difficult) and
might not be suitable for homework, but should provide food for thought.
The purpose of the book, viewed as a text for a course or as a reference
book for self-study, is to provide students with a pleasant introduction to
the Theory of Stochastic Processes (without tears!). After completing the
course, the students should be able to take more advanced and technical
courses or to read more specialized books on the subject.

(2) In writing the text we faced the following dilemma. In general, mea-
sure theory is not required for a first course in Stochastic Processes. On
the other hand, it is true that measure theory is the language of probabil-
ity theory. When presenting the material, even at the simplest level, some
aspects of measure theory are necessary to make the treatment rigorous.
After all, this is a text about theory. Our approach is this. We do not
require measure theory for this text. However, whenever necessary, we will
call upon some facts from measure theory. A short appendix at the end of
the text contains these facts in some detail as well as other topics which
might not be familiar to the audience.
(3) The standard prerequisite is a solid first course in probability theory
and some calculus. However, Lesson 1 is devoted to a complete review of
the probability background needed for this text. Lessons 1-12 form the core
of a course in stochastic processes. As far as the statistical part of the book
(Lessons 13-15) is concerned, when it is used, for example, in a seminar on
initiation to statistics of random processes, students need a basic knowledge
from a first course in mathematical statistics.
A selected bibliography at the end of the book suggests some appropriate
references for this purpose as well as for further reading on topics omitted
in this text.
The real level of the course depends upon the background of the au-
dience. More specifically, depending on the interests and background of
the mixture of students, some aspects of measure theory, advanced top-
ics, generalities of results, and complete proofs, etc. can be emphasized
appropriately.

We would like to thank Professor H. Skala, Editor of the Series "Math-


ematical and Statistical Methods", for giving us the opportunity to write
a text in our own style. We extend our thanks also to Dr. Paul Roos
and Ms Angelique Hempel at Kluwer Academic for advising us during the
preparation of the manuscript.
We are grateful to Dr. Tonghui Wang of New Mexico State University for
proofreading the text and for his penetrating remarks and suggestions
concerning the final version of the text. The camera-ready version as well
as the design of the book is also due to him.

The first named author would like to thank Emmanuel Guerre for pro-
viding some exercises.
The second named author would like to thank his department head,
Professor Douglas Kurtz, for his encouragement.

Denis Bosq and Hung T. Nguyen


Paris and Las Cruces, Winter, 1995
Lesson 1

Basic Probability Background

This Lesson is a review of basic concepts in probability theory needed for


this Text. The notation in this Lesson will be used throughout the Text un-
less otherwise stated. We emphasize computational aspects. The Appendix
at the end of this Text contains additional topics.

1.1 Events and Probabilities


This section aims at providing the motivation for using probability spaces
to model random phenomena.
By an experiment, we mean the making of an observation. The result of
an experiment is called an outcome. The collection of all possible outcomes
of an experiment $\mathcal{E}$ is called the sample space and is denoted by $\Omega$. By a
random experiment, we mean an experiment such that observations under
identical conditions might not lead to the same outcome.
Suppose we consider the random experiment consisting of rolling two
dice. The sample space is
$$\Omega = \{(i,j) : i, j = 1, 2, \dots, 6\}.$$
Consider the event "the sum of the two numbers shown is equal to 7". This
event $A$ consists of sample points $(i,j)$ such that $i + j = 7$. Thus an event
is a subset of the set $\Omega$, and we write
$$A \subseteq \Omega \quad (A \text{ is contained in } \Omega).$$


If we perform the experiment and obtain the outcome $(2,5)$, then, since
$(2,5) \in A$ (the point $(2,5)$ belongs to $A$ or is a member of $A$), we say that
the event $A$ is realized, or $A$ occurs.
Since we cannot predict exactly what the outcome will be in a random
experiment such as this, we ask "what is the chance that $A$ will occur?"
The answer to this question will be a number $P(A)$ called the probability of
the event $A$. In an experiment whose sample space $\Omega$ is finite, it is possible
to assign a number $P(A)$ to all subsets $A$ of $\Omega$. The point is this. Since we
are interested in probabilities of events, subsets of a general sample space
$\Omega$ (such as $\Omega = \mathbb{R} = (-\infty, \infty)$, the set of real numbers) are considered as
events only if their probabilities can be assigned.
In our actual example, the collection $\mathcal{A}$ of all events is $\mathcal{P}(\Omega)$, the power
set of $\Omega$, that is, the collection of all possible subsets of $\Omega$, including the
empty set $\emptyset$ and $\Omega$. Events are stated in natural language and hence com-
pound events are formed by using logical connectives like "not", "and",
and "or". In the context of random experiments, events are subsets of $\Omega$.
The modeling of the above connectives in the context of Set Theory is as
follows.
The negation (or complement) of $A$ is $A^c = \{\omega \in \Omega : \omega \notin A\}$, where $\notin$
stands for "is not a member of".
For $A, B \subseteq \Omega$, "$A$ and $B$" is defined as
$$A \cap B = \{\omega \in \Omega : \omega \in A, \omega \in B\},$$
where $\cap$ stands for "intersection"; "$A$ or $B$" is
$$A \cup B = \{\omega \in \Omega : \omega \in A \text{ or } \omega \in B\},$$
where $\cup$ stands for "union". Note that the "or" here is not exclusive, i.e.,
we allow $\omega \in A \cup B$ if $\omega$ belongs to both $A$ and $B$.
In our example, since $\mathcal{A} = \mathcal{P}(\Omega)$, $\mathcal{A}$ is closed under all of the above set
operations, that is, if $A, B \in \mathcal{A}$, then $A^c$, $A \cap B$, and $A \cup B$ all belong to
$\mathcal{A}$.
We describe now the way to assign probability to events in our example.
It is plausible that any outcome $(i,j)$ will have the same chance to occur.
Thus we assign to each $\omega = (i,j)$ a number $f(\omega)$, called the probability of
the event $\{\omega\}$. By its meaning, $0 \le f(\omega) \le 1$. Here, since $f(\omega)$ is the same
for all $\omega \in \Omega$, we obtain
$$f(\omega) = \frac{1}{\#(\Omega)} = \frac{1}{36},$$
where $\#(\Omega)$ denotes the cardinality (number of elements) of $\Omega$. Observe
that $f : \Omega \to [0,1]$ satisfies the condition $\sum_{\omega \in \Omega} f(\omega) = 1$. Such a
function is called a probability mass function.
Now, for an event such as $A = \{\omega = (i,j) : i + j = 7\}$, how do we assign $P(A)$
from $f$? Recall that $A$ occurs if the outcome is any $(i,j)$ such that $i + j = 7$.
Thus it is plausible that
$$P(A) = \frac{\#(A)}{\#(\Omega)} = \frac{6}{36} = \frac{1}{6},$$
which equals $P(A) = \sum_{\omega \in A} f(\omega)$.
The operator $P$ is a mapping from $\mathcal{A}$ to $[0,1]$ satisfying the conditions
(i) $P(\Omega) = 1$ and
(ii) for $A, B \in \mathcal{A}$ with $A \cap B = \emptyset$ ($A$ and $B$ are disjoint or incompatible),
$$P(A \cup B) = P(A) + P(B).$$
Such an operator $P$ is called a probability measure. The condition (ii) above
is referred to as the finite additivity property of $P$. Specifically,
$$P(A_1 \cup \cdots \cup A_k) = \sum_{i=1}^{k} P(A_i)$$
when $A_i \cap A_j = \emptyset$, $1 \le i \ne j \le k$.
The triple $(\Omega, \mathcal{A}, P)$ above is called a probability space. A probability
space is a model for a random experiment.
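To make this concrete, here is a minimal numerical sketch (our addition, in Python, not part of the original text): it enumerates the sample space of the two-dice experiment and checks $P(A) = \#(A)/\#(\Omega)$ together with finite additivity. The event names `A` and `B` are ours.

```python
from fractions import Fraction

# Sample space for rolling two dice: 36 equally likely outcomes.
omega = [(i, j) for i in range(1, 7) for j in range(1, 7)]

def prob(event):
    # P(A) = #(A) / #(Omega) under the uniform mass function f(w) = 1/36.
    return Fraction(len([w for w in omega if event(w)]), len(omega))

A = lambda w: w[0] + w[1] == 7   # "the sum is 7"
B = lambda w: w[0] == w[1]       # "both dice show the same number"

print(prob(A))                   # 1/6
print(prob(B))                   # 1/6
# Finite additivity: A and B are disjoint, so P(A or B) = P(A) + P(B).
print(prob(lambda w: A(w) or B(w)) == prob(A) + prob(B))  # True
```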
Let us extend the above modeling of random experiments with finite
sample spaces to the case of experiments with countably infinite sample
spaces ($\Omega$ is countable if there is a one-to-one correspondence between $\Omega$
and the set $\mathbb{N} = \{0, 1, 2, \dots\}$ of non-negative integers). We say that $\Omega$ is
discrete if $\Omega$ is finite or countably infinite.
As an example of an experiment with infinitely many outcomes, consider
the experiment of tossing a fair coin until we first obtain a Head. The
outcome of this experiment is the number of tosses needed to obtain the
first Head. Obviously, $\Omega = \{1, 2, \dots\}$. As in the finite case, we first assign
$f(n)$ to each $n \in \Omega$, where $n$ stands for the outcome "the first Head occurs
on toss $n$". When tossing a coin $n$ times, there are $2^n$ possible combinations
of Heads and Tails, only one of which corresponds to the above outcome,
namely the first $n-1$ tosses yield Tails, and the $n$th toss yields a Head. Thus
$$f(n) = \frac{1}{2^n}, \quad n \ge 1.$$
Since
$$\sum_{\omega \in \Omega} f(\omega) = \sum_{n=1}^{\infty} f(n) = \sum_{n=1}^{\infty} \frac{1}{2^n} = 1,$$
$f$ is a probability mass function, where the summation $\sum$ is an infinite one.
In the discrete case, we can assign probabilities to all possible subsets
of $\Omega$ via the formula
$$P(A) = \sum_{\omega \in A} f(\omega), \quad A \subseteq \Omega,$$
where $f(\omega) = P(\{\omega\})$. Thus the collection of events is $\mathcal{A} = \mathcal{P}(\Omega)$.
The probability measure $P$ satisfies the following $\sigma$-additivity property:
for any sequence $A_n$, $n \ge 1$, of subsets of $\Omega$ (that is, $A_n \in \mathcal{A}$, $n \ge 1$),
where the $A_n$'s are disjoint (that is, $A_n \cap A_m = \emptyset$ for $n \ne m$),
$$P\left(\bigcup_{n=1}^{\infty} A_n\right) = \sum_{n=1}^{\infty} P(A_n).$$

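As a small illustration of $\sigma$-additivity (our addition, not from the original text): the event "the first Head appears on an odd-numbered toss" is a disjoint countable union over $n = 1, 3, 5, \dots$, so its probability is $\sum_{k \ge 0} 2^{-(2k+1)} = 2/3$. The sketch below checks this by truncating the series.

```python
# Probability mass function for "first Head on toss n": f(n) = 1/2**n.
f = lambda n: 0.5 ** n

total = sum(f(n) for n in range(1, 60))        # sum over the whole sample space
odd   = sum(f(n) for n in range(1, 60, 2))     # disjoint union over n = 1, 3, 5, ...

print(round(total, 12))   # 1.0  (f is a probability mass function)
print(round(odd, 12))     # 0.666666666667, i.e. 2/3 by sigma-additivity
```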
Consider random experiments with uncountably many outcomes (the con-
tinuous case). The typical and important experiment in this category is
"choosing a number at random from the unit interval $[0,1]$". Here $\Omega = [0,1]$.
Recall that for $\Omega$ discrete, we specify $\mathcal{A}$ and $P$ as follows:
(i) assign to each $\omega \in \Omega$ its probability value $f(\omega)$, and
(ii) assign to each $A \subseteq \Omega$ its probability $P(A) = \sum_{\omega \in A} f(\omega)$.
Now, $\Omega = [0,1]$ is uncountably infinite, so we cannot proceed as in the
discrete case. Indeed, if $f(\omega)$ denotes the probability of getting the point $\omega$
in $[0,1]$, then by the nature of the experiment, $f(\omega)$ should be constant, say
$a$ (every point in $[0,1]$ has the same chance to be selected). But since the
probability of $[0,1]$ is 1, $a$ must be zero! (Take $n$ points $\omega_i$, $i = 1, 2, \dots, n$,
in $[0,1]$, where $n$ is an integer greater than $1/a$; then $\sum_{i=1}^{n} f(\omega_i) = na > 1$
if $a \ne 0$.) As we will see, the probability value $f(\omega) = 0$ for each $\omega \in [0,1]$
does make sense, but this assignment $f(\cdot)$ cannot be used to define proba-
bilities on subsets of $[0,1]$.
For example, what is the chance that the point chosen at random will be
in the interval $[0, 0.25]$? It is clear that the answer should be $0.25$. More
generally, if $I$ is a sub-interval of $[0,1]$, then $P(I)$ should be $|I|$, the length
of $I$.
The above suggests that, for uncountable $\Omega$, we have to assign proba-
bility directly to subsets (events), that is, we need to specify a probability
measure $P$. The probability measure $P$ should be such that $P(I) = |I|$ for
any sub-interval $I$ of $[0,1]$. When $I$ reduces to a singleton $\{\omega\}$, $P(\{\omega\}) = 0$.
The next question is "What is the domain $\mathcal{A}$ of $P$?" In other words,
what are the subsets of $[0,1]$ which are events? Recall that a subset of
$\Omega$ is qualified as an event if we can assign to it a probability value. If we
set $\mathcal{A} = \mathcal{P}([0,1])$, as in the discrete case, then we must ask the following
question: Is there a probability measure $P$ on $\mathcal{P}([0,1])$ such that $P(I) = |I|$
for any sub-interval $I$ of $[0,1]$? Note that, to be consistent with the discrete
case, $P$ needs to be $\sigma$-additive.
It turns out that the answer to this mathematical problem is NO. The
reason is that $\mathcal{P}([0,1])$ is too big. Thus not all subsets of $[0,1]$ are events;
that is, $\mathcal{A}$ is a proper subset of $\mathcal{P}([0,1])$. To determine $\mathcal{A}$, we observe that $\mathcal{A}$
should contain intervals, and for any $A \in \mathcal{A}$, $P(A)$ should be derived from
$P(I) = |I|$ for intervals $I$. Furthermore, as in the discrete case, $\mathcal{A}$ should be
a $\sigma$-field, that is, $\mathcal{A}$ is a collection of subsets of $\Omega$ satisfying
(i) $\Omega \in \mathcal{A}$,
(ii) $A \in \mathcal{A}$ implies that $A^c \in \mathcal{A}$, and
(iii) for any sequence $A_n \in \mathcal{A}$, $n \ge 1$, $\bigcup_{n=1}^{\infty} A_n \in \mathcal{A}$.
Remarks.
(a) The above algebraic structure of $\mathcal{A}$ expresses the fact that $\mathcal{A}$ should
be large enough to contain all events of interest.
(b) (ii) and (iii) above imply that if $A_n \in \mathcal{A}$, $n \ge 1$, then $\bigcap_{n=1}^{\infty} A_n \in \mathcal{A}$
(Exercise).
(c) If (iii) above is replaced by
$$A, B \in \mathcal{A} \Longrightarrow A \cup B \in \mathcal{A},$$
then $\mathcal{A}$ is called a field. Note that a $\sigma$-field is a field (Exercise).


Thus we arrive at the general probabilistic model for an arbitrary ran-
dom experiment (discrete or continuous):

Definition 1.1 A probabilistic model for a random experiment is a prob-
ability space $(\Omega, \mathcal{A}, P)$, where $\Omega$ is the sample space, $\mathcal{A}$ is a $\sigma$-field of
subsets (events) of $\Omega$, and $P$ is a probability measure defined on $\mathcal{A}$ (for
$A \in \mathcal{A}$, $P(A)$ is the probability of $A$). A probability measure is a map
$P : \mathcal{A} \to [0,1]$ such that
(i) $P(\Omega) = 1$ and
(ii) for any sequence $A_n \in \mathcal{A}$, $n \ge 1$, where the $A_n$'s are pairwise
disjoint,
$$P\left(\bigcup_{n=1}^{\infty} A_n\right) = \sum_{n=1}^{\infty} P(A_n) \quad (\sigma\text{-additivity}).$$

The pair $(\Omega, \mathcal{A})$ is called a measurable space.


Let us go back to the specification of $\mathcal{A}$ for $\Omega = [0,1]$. In view of
the requirements imposed on $\mathcal{A}$ discussed earlier, $\mathcal{A}$ should be a $\sigma$-field
containing intervals. Thus we take $\mathcal{A}$ to be the smallest $\sigma$-field containing
intervals. If $\mathcal{C}$ denotes the collection of all sub-intervals of $[0,1]$, then we
write $\mathcal{A} = \sigma(\mathcal{C})$, which is called the $\sigma$-field generated by $\mathcal{C}$.
Remarks.

(a) The above $\sigma$-field is called the Borel $\sigma$-field of $[0,1]$ and is denoted
by $\mathcal{B}([0,1])$. Elements of $\mathcal{B}([0,1])$ are called Borel subsets of $[0,1]$.

(b) For $\Omega = \mathbb{R} = (-\infty, \infty)$, $\overline{\mathbb{R}} = [-\infty, \infty]$, $\mathbb{R}_+ = [0, \infty)$, and
$\overline{\mathbb{R}}_+ = [0, \infty]$, we have similarly $\mathcal{B}(\mathbb{R})$, $\mathcal{B}(\overline{\mathbb{R}})$, $\mathcal{B}(\mathbb{R}_+)$, and $\mathcal{B}(\overline{\mathbb{R}}_+)$, respectively.
The above choice of $\mathcal{A}$ as the Borel $\sigma$-field $\mathcal{B}([0,1])$ is justified by the
existence of a unique probability measure $P$ on it such that $P(I) = |I|$ for
any sub-interval $I$ of $[0,1]$. We omit the technical details. The $P$ so defined
is sometimes denoted as $dL(x)$ or $dx$ and is called the Lebesgue measure on
$[0,1]$.
We close this section by mentioning some useful properties of $P$. The
proofs of these properties are left as exercises.
Let $(\Omega, \mathcal{A}, P)$ be a probability space.

(i) $P$ is monotone increasing, i.e., if $A, B \in \mathcal{A}$ with $A \subseteq B$, then
$P(A) \le P(B)$.
(ii) For $A, B \in \mathcal{A}$, $P(A \cup B) = P(A) + P(B) - P(A \cap B)$. More generally,
$$P\left(\bigcup_{i=1}^{n} A_i\right) = \sum_{i=1}^{n} P(A_i) - \sum_{1 \le i < j \le n} P(A_i \cap A_j) + \cdots + (-1)^{n+1} P\left(\bigcap_{i=1}^{n} A_i\right)$$
(Poincaré's formula).
(iii) For $A \in \mathcal{A}$, $P(A^c) = 1 - P(A)$.
(iv) For $A_n \in \mathcal{A}$,
$$P\left(\bigcup_{n=1}^{\infty} A_n\right) \le \sum_{n=1}^{\infty} P(A_n) \quad (\text{sub-}\sigma\text{-additivity}).$$
(v) Limits of events. As in Real Analysis, we proceed as follows. A
sequence of subsets $A_n$, $n \ge 1$, of $\Omega$ is increasing if $A_n \subseteq A_{n+1}$, $\forall n \ge 1$.
For such a sequence, we define the limit as
$$\lim_{n \to \infty} A_n = \bigcup_{n=1}^{\infty} A_n.$$
Similarly, the sequence $A_n$ is decreasing if $A_{n+1} \subseteq A_n$, $\forall n \ge 1$, and
$$\lim_{n \to \infty} A_n = \bigcap_{n=1}^{\infty} A_n.$$
If the sequence is arbitrary, then $B_n = \bigcup_{i \ge n} A_i$ is a decreasing sequence
and $D_n = \bigcap_{i \ge n} A_i$ is an increasing sequence. Thus we define
$$\limsup_{n \to \infty} A_n = \bigcap_{n=1}^{\infty} \bigcup_{i=n}^{\infty} A_i \quad \text{and} \quad \liminf_{n \to \infty} A_n = \bigcup_{n=1}^{\infty} \bigcap_{i=n}^{\infty} A_i.$$
Note that
$$\liminf_{n \to \infty} A_n \subseteq \limsup_{n \to \infty} A_n.$$
When
$$\liminf_{n \to \infty} A_n = \limsup_{n \to \infty} A_n,$$
we say that $\lim_{n \to \infty} A_n$ exists and is equal to
$$\lim_{n \to \infty} A_n = \liminf_{n \to \infty} A_n = \limsup_{n \to \infty} A_n.$$
Note that $\limsup_{n \to \infty} A_n$ is also written as $(A_n \text{ i.o.})$, where i.o. stands
for "infinitely often", since $\omega \in \limsup_{n \to \infty} A_n$ if and only if $\omega \in A_n$ for
infinitely many $n$. Also, $\omega \in \liminf_{n \to \infty} A_n$ if and only if $\omega \in A_n$ for all
but a finite number of $n$.
If $A_n \in \mathcal{A}$, $n \ge 1$, is either an increasing or decreasing sequence of
events, then
$$\lim_{n \to \infty} P(A_n) = P\left(\lim_{n \to \infty} A_n\right) \quad (\text{monotone continuity}).$$

(vi) Borel-Cantelli Lemma. Let $A_n \in \mathcal{A}$, $n \ge 1$.

(a) If $\sum_{n=1}^{\infty} P(A_n) < \infty$, then $P(\limsup_{n \to \infty} A_n) = 0$.
(b) If the $A_n$'s are independent (see the definition below) and $\sum_{n=1}^{\infty} P(A_n)
= \infty$, then $P(\limsup_{n \to \infty} A_n) = 1$.
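The lemma can be made tangible by simulation; the sketch below (our addition, with an arbitrary choice of events $A_n = \{U_n < p_n\}$ for independent uniform $U_n$) contrasts the convergent case $p_n = 1/n^2$ with the divergent independent case $p_n = 1/n$.

```python
import random

random.seed(0)
N = 200_000

# Independent events A_n = {U_n < p_n}, simulated along one sample path.
finite_case   = sum(random.random() < 1.0 / n**2 for n in range(1, N + 1))
infinite_case = sum(random.random() < 1.0 / n    for n in range(1, N + 1))

# sum 1/n^2 < infinity: part (a) says only finitely many A_n occur (a.s.).
print("p_n = 1/n^2:", finite_case, "occurrences")    # small, stops growing with N
# sum 1/n = infinity and the A_n are independent: part (b) says infinitely many occur.
print("p_n = 1/n:  ", infinite_case, "occurrences")  # keeps growing, roughly like log N
```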
(vii) Conditional probability and independence. Let $A \in \mathcal{A}$ with
$P(A) \ne 0$. The conditional probability of $B \in \mathcal{A}$ given $A$ is defined to
be $P(A \cap B)/P(A)$ and is denoted by $P(B|A)$. (For $P(A) = 0$, $P(\cdot|A)$
is undefined.) For fixed $A$, the set function $P(\cdot|A)$, defined on $\mathcal{A}$, is a
probability measure.
From the definition of conditional probability, we obtain the multiplica-
tion formula for probabilities
$$P(A \cap B) = P(A)P(B|A).$$
More generally, if $A_1, A_2, \dots, A_n$ are events such that $P\left(\bigcap_{i=1}^{n-1} A_i\right) \ne 0$,
then
$$P\left(\bigcap_{i=1}^{n} A_i\right) = P(A_1)P(A_2|A_1) \cdots P\left(A_n \,\Big|\, \bigcap_{i=1}^{n-1} A_i\right).$$
The following law of total probability is useful in computing probabilities
of complicated events.
Let $\{A_1, A_2, \dots, A_n\}$ be a measurable partition of $\Omega$: $A_i \in \mathcal{A}$,
$i = 1, 2, \dots, n$, the $A_i$'s are disjoint, and $\bigcup_{i=1}^{n} A_i = \Omega$. Assuming that
$P(A_i) > 0$, $i = 1, 2, \dots, n$, then for any $B \in \mathcal{A}$,
$$P(B) = \sum_{i=1}^{n} P(B|A_i)P(A_i).$$
As a consequence, we obtain Bayes' formula: If $P(B) > 0$, then for any
$j = 1, 2, \dots, n$,
$$P(A_j|B) = \frac{P(A_j)P(B|A_j)}{\sum_{i=1}^{n} P(A_i)P(B|A_i)}.$$
The above formulas can be extended to an infinitely countable partition
of $\Omega$ (Exercise).
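A worked numerical instance of the last two formulas may help; the two-urn numbers below are our own illustration, not an example from the text.

```python
from fractions import Fraction

# Partition: A1 = "urn 1 chosen", A2 = "urn 2 chosen", each with probability 1/2.
P_A = [Fraction(1, 2), Fraction(1, 2)]
# B = "a white ball is drawn": urn 1 holds 3 white / 1 black, urn 2 holds 1 white / 3 black.
P_B_given_A = [Fraction(3, 4), Fraction(1, 4)]

# Law of total probability: P(B) = sum_i P(B|A_i) P(A_i).
P_B = sum(pb * pa for pb, pa in zip(P_B_given_A, P_A))
print(P_B)                               # 1/2

# Bayes' formula: P(A_1|B) = P(A_1) P(B|A_1) / P(B).
print(P_A[0] * P_B_given_A[0] / P_B)     # 3/4
```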
We turn now to the concept of independence of events. This is a stochas-
tic independence concept, since it is defined in terms of $P$.
For $A, B \in \mathcal{A}$ with $P(A), P(B) \ne 0$, it is intuitive that "$A$ is indepen-
dent of $B$" (with respect to $P$) when
$$P(A|B) = P(A),$$
and similarly, "$B$ is independent of $A$" when
$$P(B|A) = P(B).$$
In both cases,
$$P(A \cap B) = P(A)P(B), \tag{1.1}$$


which is taken to be the definition of the independence of two events $A$ and
$B$. Note that (1.1) is symmetric in $A$ and $B$ and makes sense for all events
(even if $P(A)$ or $P(B) = 0$).
It should be noted that, in general, disjoint events are not independent!
If $A \cap B = \emptyset$ and $P(A) \ne 0$, $P(B) \ne 0$, then (1.1) cannot hold.
Viewing $\{A, B\}$ as a collection of events, the above concept of (stochas-
tic) independence is extended to an arbitrary collection of events as follows.
Let $I$ be an arbitrary set. A collection $\{A_i, i \in I\} \subseteq \mathcal{A}$ is said to be inde-
pendent if for any finite $J \subseteq I$, we have
$$P\left(\bigcap_{j \in J} A_j\right) = \prod_{j \in J} P(A_j),$$
where the symbol $\prod$ stands for "product". In particular, when
$I = \{1, 2, \dots, n\}$, the events $A_1, A_2, \dots, A_n$ are (mutually) independent if
for $k = 1, 2, \dots, n$ and $1 \le i_1 < i_2 < \cdots < i_k \le n$,
$$P\left(\bigcap_{j=1}^{k} A_{i_j}\right) = \prod_{j=1}^{k} P(A_{i_j}).$$
The independence of the $A_i$'s implies that any two events $A_i$ and $A_j$ are in-
dependent (pairwise independence). However, the converse does not hold
(Exercise).
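The classical counterexample showing that pairwise independence does not imply mutual independence (two tosses of a fair coin; compare Exercise 1.7) can be checked mechanically; a minimal sketch, our addition:

```python
from fractions import Fraction
from itertools import product

omega = list(product("HT", repeat=2))        # four equally likely outcomes
P = lambda E: Fraction(len(E), len(omega))

A = {w for w in omega if w[0] == "H"}        # head on the first toss
B = {w for w in omega if w[1] == "H"}        # head on the second toss
C = {w for w in omega if (w[0] == "H") != (w[1] == "H")}   # exactly one head

# Pairwise independence holds:
print(P(A & B) == P(A) * P(B))   # True
print(P(A & C) == P(A) * P(C))   # True
print(P(B & C) == P(B) * P(C))   # True
# ... but mutual independence fails:
print(P(A & B & C) == P(A) * P(B) * P(C))    # False (the left side is 0)
```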
Viewing $\{A\}$ and $\{B\}$ as two collections of events, we define independent
collections of families of events as follows.
Let $I$ be a set and $\mathcal{C}_i \subseteq \mathcal{A}$, $i \in I$. Then the collections $\mathcal{C}_i$ are said to
be independent if for any finite $J \subseteq I$ and all $A_i \in \mathcal{C}_i$, $i \in J$,
$$P\left(\bigcap_{i \in J} A_i\right) = \prod_{i \in J} P(A_i).$$
In particular, when $I = \{1, 2, \dots\}$ and $\mathcal{C}_n = \{A_n\}$, $n \in I$, the infinite
sequence of events $A_n$, $n \ge 1$, is independent if any finite number of the $A_n$'s
are (mutually) independent.
Finally, note that for $A, B, C \in \mathcal{A}$, we say that $A$ and $B$ are independent
given $C$ if
$$P(A \cap B|C) = P(A|C)P(B|C).$$
The general concept of conditional independence appears naturally in the
context of Markov processes (e.g., Lesson 3 and Lesson 5), and will be
formulated in the context of random variables in Section 1.3.
1.2 Random variables and their distributions

In performing a random experiment such as rolling two dice, we might be
interested in various numerical quantities resulting from the outcomes of
the experiment. For example, $X =$ "the sum of the two numbers shown" and
$Y =$ "the product of the two numbers shown".
Since $X, Y, \dots$ are names of some quantities, they are called variables.
A variable, like $X$, can take different values: $2, 3, \dots, 12$. Unlike determin-
istic variables, to which we can assign values directly, the values of
$X$ depend on the outcome of the roll of the two dice. For example, the value
$X = 3$ corresponds to the outcome $\omega = (1,2)$ or $\omega = (2,1)$. Variables of
this type are called random variables. Since the values of a random vari-
able depend on the outcomes of random experiments, these variables are
in fact functions of outcomes.
For $\Omega = \{(i,j) : i, j = 1, 2, \dots, 6\}$, $X : \Omega \to \mathbb{R}$ and $X(\omega) = X(i,j) =
i + j$.
Many quantities of interest in the real world can be viewed as random
variables, such as the annual income of an individual randomly selected
from a population, the number of car accidents at a given location and a
given time of the day, the arrival times of customers between 9 am and 4 pm
at a bank, ....
Let $(\Omega, \mathcal{A}, P)$ be the model of the random experiment underlying a
random variable $X$. The range of $X$ (i.e., the possible values that $X$ can
take) is a subset of the real line $\mathbb{R}$. ($X$ is called a random vector if its
range is some subset of a Euclidean space $\mathbb{R}^d$, $d > 1$, and a random element
when its range is of some more complicated nature, such as the infinite-
dimensional space of continuous functions.) In a problem involving $X$, we
are interested in various events which can be "described" by $X$, such as
"$X$ belongs to some subset $A$ of $\mathbb{R}$". This event $(X \in A)$ occurs when the
outcome $\omega$ is such that $X(\omega) \in A$. Thus $(X \in A) = \{\omega \in \Omega : X(\omega) \in A\}$.
Since $X : \Omega \to \mathbb{R}$, we can write
$$(X \in A) = X^{-1}(A),$$
where $X^{-1} : \mathcal{P}(\mathbb{R}) \to \mathcal{P}(\Omega)$ is defined by
$$X^{-1}(A) = \{\omega : X(\omega) \in A\} \subseteq \Omega, \quad A \subseteq \mathbb{R}.$$
Since $P$ is specified on $(\Omega, \mathcal{A})$, we can assign to $(X \in A)$ the probability
value $P(X \in A) = P(X^{-1}(A))$, provided $X^{-1}(A) \in \mathcal{A}$. When $\Omega$ is discrete,
we take $\mathcal{A} = \mathcal{P}(\Omega)$ so that $X^{-1}(A) \in \mathcal{A}$ for all $A \subseteq \mathbb{R}$. But for uncountable
$\Omega$, this is not true, since not all subsets of $\Omega$ are events (elements
of $\mathcal{A}$). Recall that subsets of $\Omega$ are qualified as events only if we can assign
probability values to them. Now, on $\mathbb{R}$, there is a natural $\sigma$-field, namely
the Borel $\sigma$-field $\mathcal{B}(\mathbb{R})$ generated by the intervals of $\mathbb{R}$ (see Exercise 5 and
the Appendix). Also, for technical reasons given in the next section, events
associated with random variables are $(X \in A)$ for $A \in \mathcal{B}(\mathbb{R})$. Thus we
arrive at the following definition.

Definition 1.2 Let $(\Omega, \mathcal{A})$ and $(\mathbb{R}, \mathcal{B}(\mathbb{R}))$ be two measurable spaces. A
random variable is a map $X : \Omega \to \mathbb{R}$ such that for any $A \in \mathcal{B}(\mathbb{R})$,
$X^{-1}(A) \in \mathcal{A}$.

Remarks.
(a) A map $X$ satisfying the condition in the above definition is called a
measurable function. More specifically, $X$ is an $\mathcal{A}$-$\mathcal{B}(\mathbb{R})$ measurable func-
tion. Note that the probability $P$ on $(\Omega, \mathcal{A})$ plays no role in the definition.
(b) If the range of the random variable $X$ is discrete (continuous), then
$X$ is called a discrete (continuous) random variable.
(c) For technical reasons, we might need to consider extended random
variables, that is, we allow $\pm\infty$ as values. In this case, $X : \Omega \to \overline{\mathbb{R}} =
[-\infty, \infty]$ and, by definition, $X$ is an (extended) random variable if $\{\omega :
X(\omega) \le t\} \in \mathcal{A}$ for any $t \in \mathbb{R}$.
(d) More generally, for $d \ge 1$, a measurable mapping $X : (\Omega, \mathcal{A}) \to
(\mathbb{R}^d, \mathcal{B}(\mathbb{R}^d))$ is called a random vector. Writing $X = (X_1, X_2, \dots, X_d)$, where
$X_k : \Omega \to \mathbb{R}$, $k = 1, 2, \dots, d$, it can be shown that $X$ is a random
vector if and only if each $X_k$ is a random variable. Note that elements of
$\mathcal{B}(\mathbb{R}^d)$ are Borel sets of $\mathbb{R}^d$ (see Appendix).

Example 1.1 (a) The number of heads obtained in tossing a coin five times
and the number of tosses needed to obtain the first head in repeated tosses of a
coin are examples of discrete random variables.
(b) The waiting time for service of a customer in a queue and the time at
which some event of interest (such as breakdowns, earthquakes, ...) occurs
are examples of continuous random variables.

It can be shown that if $X$ and $Y$ are random variables defined on the
same $(\Omega, \mathcal{A})$, then $X \pm Y$, $XY$, $\max(X, Y)$, and $\min(X, Y)$ are also ran-
dom variables. Also, if $X_n$, $n \ge 1$, is a sequence of random variables, then
$\sup_n X_n$ and $\inf_n X_n$ are extended random variables (Exercise). It is also the
case for the following quantities:
$$\limsup_{n \to \infty} X_n = \lim_{n \to \infty} \left(\sup_{k \ge n} X_k\right)$$
and
$$\liminf_{n \to \infty} X_n = \lim_{n \to \infty} \left(\inf_{k \ge n} X_k\right).$$
In particular, when $\lim_{n \to \infty} X_n$ exists (that is, when $\limsup_{n \to \infty} X_n =
\liminf_{n \to \infty} X_n$), it is also a random variable.
The simplest random variables are indicator functions of events (sets).
Let $A \subseteq \Omega$; then the function $1_A : \Omega \to \{0, 1\}$ defined by
$$1_A(\omega) = \begin{cases} 1 & \text{if } \omega \in A \\ 0 & \text{elsewhere} \end{cases}$$
is called the indicator (function) of the set $A$. Obviously, if $A \in \mathcal{A}$, then $1_A$ is
a random variable. The events associated with $X = 1_A$ are $\{\emptyset, \Omega, A, A^c\}$.
This is a sub-$\sigma$-field of $\mathcal{A}$ and is called the $\sigma$-field generated by $1_A$, denoted
by $\sigma(1_A)$. Since
$$\sigma(1_A) = \left\{(1_A)^{-1}(B) : B \in \mathcal{B}(\mathbb{R})\right\},$$
we define the $\sigma$-field generated by an arbitrary random variable $X$ as
$$\sigma(X) = \{X^{-1}(B) : B \in \mathcal{B}(\mathbb{R})\}.$$
This is indeed a sub-$\sigma$-field of $\mathcal{A}$ (Exercise).
Let $P$ be a probability measure on $(\Omega, \mathcal{A})$. When dealing with random
variables defined on $(\Omega, \mathcal{A}, P)$, we are interested not in $P$ itself, but in the
probability measures induced over their ranges. For example, let $(\Omega, \mathcal{A}, P)$ be the finite prob-
ability space describing the random experiment of rolling two dice. Let $X$
denote the sum of the two numbers shown. Since $\Omega = \{(i,j) : i,j = 1, 2, \dots, 6\}$
is finite, the range of $X$ is also finite: $\mathcal{R}(X) = \{2, 3, \dots, 12\}$. The proba-
bility measure $P_X$ on $(\mathcal{R}(X), \mathcal{P}(\mathcal{R}(X)))$ is defined by
$$P_X(A) = P\left(X^{-1}(A)\right), \quad \forall A \subseteq \mathcal{R}(X).$$
This probability measure (induced by $X$) describes the probabilistic "be-
havior" of $X$ on its range. Here, since $\mathcal{R}(X)$ is finite, it suffices to specify
the values
$$P_X(x) = P(X = x), \quad \forall x \in \mathcal{R}(X).$$
In our example,
$$P(X = 2) = P\{(1,1)\} = 1/36, \quad P(X = 3) = P\{(1,2),(2,1)\} = 2/36, \dots$$

$x$        2      3      4      5      6      7      8      9      10     11     12
$P_X(x)$   1/36   2/36   3/36   4/36   5/36   6/36   5/36   4/36   3/36   2/36   1/36

The function $P_X(\cdot) : \mathcal{R}(X) \to [0,1]$ is a probability mass function
($\sum_{x \in \mathcal{R}(X)} P_X(x) = 1$).
The knowledge of the probability mass function of
$X$ is equivalent to that of the probability measure $P_X$, since for $A \subseteq \mathcal{R}(X)$,
$$P_X(A) = \sum_{x \in A} P_X(x).$$
Also, $X$ can be characterized by its cumulative distribution function (CDF,
or distribution function for short), defined as
$$F_X : \mathbb{R} \to [0,1], \quad F_X(x) = P(X \le x).$$
Here
$$F_X(x) = \sum_{y \le x} P_X(y),$$
and for $x \in \mathcal{R}(X) = \{2, 3, \dots, 12\}$,
$$P_X(x) = F_X(x) - F_X(x-1).$$
In general, for real-valued random variables, it turns out that the distribu-
tion function
$$F_X : \mathbb{R} \to [0,1], \quad F_X(x) = P(X \le x),$$
determines completely the induced probability measure $P_X(A) = P(X^{-1}(A))$
on $(\mathbb{R}, \mathcal{B}(\mathbb{R}))$. See the Appendix. Thus, in any case, distribution functions
characterize the probabilistic structures of random variables.
There are three types of distribution functions.
(a) $F$ is piecewise constant. There are at most a countable number
of jump points $x_1, x_2, \dots$, at which $\Delta F(x_n) = F(x_n) - F(x_n-) > 0$, where
$F(x-)$ denotes the left limit at $x$, i.e., $\lim_{y \uparrow x} F(y)$. In this case, the asso-
ciated probability mass function is $f(x_n) = \Delta F(x_n)$ with $\sum_n f(x_n) = 1$.
A random variable having such a distribution function is called a discrete
random variable.
(b) Absolutely continuous distribution functions. By this, we
mean a distribution function $F$ of the form
$$F(x) = \int_{-\infty}^{x} f(y)\,dy,$$


where $f : \mathbb{R} \to \mathbb{R}_+$ and $\int_{-\infty}^{\infty} f(y)\,dy = 1$. $f$ is called a probability den-
sity function. Random variables having this type of distribution function
are referred to as continuous random variables. Note that, except on a
countable set of points, $F(x)$ is differentiable and $F'(x) = f(x)$.
(c) Singular distribution functions. There are distribution functions
$F$ which are continuous (there are no mass points, that is, $P(X = x) = 0$
for all $x$), but which have all their points of increase (that is, points $x$ such that
$F(x + \varepsilon) - F(x - \varepsilon) > 0$ for all $\varepsilon > 0$) on sets of zero "Lebesgue measure".
As an example, let $X = \sum_{n=1}^{\infty} X_n/3^n$, where the $X_n$'s are independent
with the same distribution
$$P(X_n = 0) = 1 - P(X_n = 2) = 1/2.$$
Then the distribution $F$ of $X$ is continuous, and yet $F$ does not admit a
density. This can be seen as follows.
Each point $x \in [0,1]$ can be represented in ternary notation as $x =
\sum_{n=1}^{\infty} a_n/3^n$, where $a_n \in \{0, 1, 2\}$. The range of $X$ is the subset $A$ of $[0,1]$
consisting of the $x$ such that $a_n \in \{0, 2\}$. Now $A$ (the Cantor set) is obtained
as $A = \bigcap_{n=1}^{\infty} B_n$, where the $B_n$'s are constructed as follows. Starting with
$[0,1]$, we divide $[0,1]$ into 3 sub-intervals of length $1/3$ and delete the closed
middle interval $[1/3, 2/3]$; the remainder is $B_1 = [0, 1/3) \cup (2/3, 1]$. In
step two, divide $[0, 1/3)$ and $(2/3, 1]$, each into 3 sub-intervals of length
$1/3^2$, and delete the closed middle intervals; the remainder is
$$B_2 = \left[0, \frac{1}{3^2}\right) \cup \left(\frac{2}{3^2}, \frac{3}{3^2}\right) \cup \left(\frac{6}{3^2}, \frac{7}{3^2}\right) \cup \left(\frac{8}{3^2}, 1\right],$$
and so on. Note that the $B_n$'s decrease, and each $B_n$ is the disjoint union
of $2^n$ sub-intervals, each of length $1/3^n$. Thus the "length" of $A$ is
$$L(A) = \lim_{n \to \infty} L(B_n) = \lim_{n \to \infty} \left(\frac{2}{3}\right)^n = 0.$$
But since $A$ is the range of $X$, we have $P(X \in A) = 1$. These facts show
that $X$ does not have an absolutely continuous distribution $F$. It can be
shown, however, that $F$ is continuous.
Every distribution function $F$ can be written in the form $\alpha F_1 + \beta F_2 +
\gamma F_3$, where $\alpha + \beta + \gamma = 1$ and $F_1, F_2, F_3$ are of types (a), (b), (c) above,
respectively.
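A quick numerical sanity check of this construction (our addition): sample truncated sums $\sum_{n=1}^{30} X_n/3^n$ and verify that no sample falls strictly inside the first few deleted middle intervals.

```python
import random

random.seed(1)

def sample_X(terms=30):
    # X = sum X_n / 3**n with X_n equal to 0 or 2, each with probability 1/2.
    return sum(random.choice((0, 2)) / 3**n for n in range(1, terms + 1))

xs = [sample_X() for _ in range(10_000)]

# Open middle intervals deleted in the first two construction steps:
deleted = [(1/3, 2/3), (1/9, 2/9), (7/9, 8/9)]
hits = sum(any(a < x < b for a, b in deleted) for x in xs)
print(hits)   # 0 -- every sample lies in B_1 and B_2, consistent with P(X in A) = 1
```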
Distribution functions of random vectors are defined as follows.

Definition 1.3 Let $X = (X_1, \dots, X_n)$ be a random vector. Then
$$F_X : \mathbb{R}^n \to [0,1], \quad F_X(x_1, \dots, x_n) = P(X_1 \le x_1, \dots, X_n \le x_n)$$
is called the joint distribution function of the $X_i$'s. The joint density
function is
$$f(x_1, x_2, \dots, x_n) = \frac{\partial^n F_X}{\partial x_1 \partial x_2 \cdots \partial x_n} \quad (\text{when it exists}).$$
For $1 \le i_1 < i_2 < \cdots < i_k \le n$, the joint distribution of the random vector
$(X_{i_1}, X_{i_2}, \dots, X_{i_k})$ is
$$F_{(i_1, i_2, \dots, i_k)}(x_{i_1}, x_{i_2}, \dots, x_{i_k}) = F_X(\infty, \dots, x_{i_1}, \infty, \dots, x_{i_2}, \infty, \dots, x_{i_k}, \infty, \dots),$$
and is called a $k$-dimensional marginal distribution.

For example, the marginal distribution of $X_i$ is
$$F_i(x_i) = F(\infty, \dots, x_i, \infty, \dots, \infty)$$
(an expression like $F(x, \infty)$ means $\lim_{y \to \infty} F(x, y)$).
We discuss now the concept of conditional distributions. Let $(\Omega, \mathcal{A}, P)$
be a probability space. Recall that, for fixed $A \in \mathcal{A}$ with $P(A) \ne 0$, the
set function
$$P_A(\cdot) : \mathcal{A} \to [0,1], \quad P_A(B) = P(B|A),$$
is a probability measure on $\mathcal{A}$ and is called the conditional probability mea-
sure given $A$.
In applications, when several random variables are involved, we are
often interested in computing expressions like $P(Y \in A|X = x)$, denoted
also as $P_x^Y(A)$, for an event $A$ in the range of $Y$. As a function of $A$ for fixed
$x$, this set function is called the conditional law of $Y$ given that $X = x$.
The associated distribution function $F(y|x) = P(Y \le y|X = x)$ is the
conditional distribution of $Y$ given $X = x$. This function is well defined
when $P(X = x) \ne 0$. For example, suppose that $X$ is discrete with support
$\{x_1, x_2, \dots\}$ (that is, $P(X = x_n) > 0$, $n \ge 1$, and $\sum_{n=1}^{\infty} P(X = x_n) = 1$);
then $F(\cdot|x_n)$ represents the distribution of $Y$ after observing the value $x_n$
of $X$. Before observing $X$, $P(Y \in A|X)$ is a random variable defined by
$$P(Y \in A|X)(\omega) = \sum_{n=1}^{\infty} P(Y \in A|B_n)\,1_{B_n}(\omega),$$
where $B_n = \{\omega : X(\omega) = x_n\}$. Note that $\{B_n, n \ge 1\}$ forms a partition of
$\Omega$.
When $X$ is a continuous random variable (so that $P(X = x) = 0$, $\forall x$),
the situation is delicate! Note that, in the discrete case, we never have to
consider conditional probabilities with respect to events whose probabilities
are zero. The situation is different for a continuous $X$: the observed values
of $X$ are never mass points. For example, let $X$ be the outcome of randomly
selecting a point in the unit interval $[0,1]$. For $X = x$, we build an unbalanced
coin with probability of getting a head in a single toss equal to $x$. Let $Y$
denote the number of heads obtained when tossing that coin 10 times.
Obviously, the probability of getting $k$ heads is
$$P(Y = k|X = x) = \binom{10}{k} x^k (1-x)^{10-k},$$
while $P(X = x) = 0$.
The conditional distribution $F(y|x) = P(Y \le y|X = x)$ in such cases
can be defined rigorously by using some sophisticated mathematics (known
as the "Radon-Nikodym theorem"; see the Appendix). Some details will be
given in the next section.
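For this coin example, averaging the conditional probabilities over the uniform distribution of $X$ gives $P(Y = k) = \int_0^1 \binom{10}{k} x^k (1-x)^{10-k}\,dx = 1/11$ for every $k$, by the Beta integral $\int_0^1 x^k (1-x)^{n-k}\,dx = k!(n-k)!/(n+1)!$. The exact check below is our addition.

```python
from fractions import Fraction
from math import comb, factorial

def p_heads(k, n=10):
    # P(Y = k) = C(n,k) * Integral_0^1 x^k (1-x)^(n-k) dx, integrating the binomial
    # conditional probabilities against the uniform density of X on [0,1].
    # Beta integral: Integral_0^1 x^k (1-x)^(n-k) dx = k!(n-k)!/(n+1)!.
    beta = Fraction(factorial(k) * factorial(n - k), factorial(n + 1))
    return comb(n, k) * beta

print([str(p_heads(k)) for k in range(11)])   # each of the eleven values equals 1/11
```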
For computational purposes, when the pair of random variables $(X, Y)$
has a joint density function $f(x, y)$, then
$$F(y|x) = \int_{-\infty}^{y} f(z|x)\,dz,$$
where the conditional density function of $Y$ given $X = x$ is
$$f(y|x) = \frac{f(x,y)}{f_X(x)} \quad \text{for } f_X(x) \ne 0,$$
and is defined as zero for $f_X(x) = 0$, and $f_X(x)$ is the marginal density
function of $X$ given by
$$f_X(x) = \int_{-\infty}^{\infty} f(x, y)\,dy.$$
In view of the independence of events, the independence of random
variables is expressed as follows.
The random variables $X_1, \dots, X_n$ are said to be (mutually) independent
if
$$P(X_1 \in A_1, \dots, X_n \in A_n) = \prod_{i=1}^{n} P(X_i \in A_i)$$
for all choices of $A_i \in \mathcal{B}(\mathbb{R})$, $i = 1, 2, \dots, n$.
The interpretation is this. The information related to each $X_i$ is the
$\sigma$-field generated by $X_i$:
$$\sigma(X_i) = \{X_i^{-1}(B) : B \in \mathcal{B}(\mathbb{R})\}.$$
Saying that the $X_i$'s are independent is the same as saying that the col-
lections of events $\{\sigma(X_i) : i = 1, \dots, n\}$ are independent (see Section
1.1).
In this spirit, independence of an arbitrary collection of random vari-
ables (such as an infinite sequence of random variables) is defined similarly.
For discrete or continuous random variables $X_1, \dots, X_n$, the indepen-
dence of the $X_i$'s is expressed simply as
$$f(x_1, \dots, x_n) = \prod_{k=1}^{n} f_k(x_k), \quad \forall (x_1, \dots, x_n) \in \mathbb{R}^n,$$
where $f$ is the joint probability mass (or density) function of the $X_i$'s and
$f_k$ is the marginal probability mass (or density) function of $X_k$.
Sums of independent random variables appear often in the study of
stochastic processes. The following is the formula for obtaining their dis-
tributions.
Suppose that $X$ and $Y$ are two independent discrete random variables
with values in $\{0, 1, 2, \dots\}$. The distribution of $Z = X + Y$ is completely
determined by the probability mass function
$$f_Z(n) = P(Z = n) = P(X + Y = n), \quad n \ge 0.$$
Now, for fixed $n$, $(X + Y = n) = \bigcup_{k=0}^{n} (X = k, Y = n - k)$. Since the
events $\{\omega : X(\omega) = k, Y(\omega) = n - k\}$, $k = 0, 1, \dots, n$, are disjoint, we have
$$P(X + Y = n) = \sum_{k=0}^{n} P(X = k, Y = n - k) = \sum_{k=0}^{n} P(X = k)P(Y = n - k),$$
by independence. The counterpart of this formula in the continuous case
is
$$f_Z(z) = \int_{-\infty}^{\infty} f_X(x) f_Y(z - x)\,dx, \quad z \in \mathbb{R};$$
in symbols, $f_Z = f_X * f_Y$. The operation $*$ is called convolution. Note that
$f_X * f_Y = f_Y * f_X$. More generally, the convolution of two distribution
functions $F$ and $G$ is defined as
$$F * G(z) = \int_{-\infty}^{\infty} F(z - x)\,dG(x).$$
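The discrete convolution formula reproduces the table of Section 1.2 for the sum of two dice; the short sketch below (our addition) computes it exactly.

```python
from fractions import Fraction

# Mass function of one fair die.
die = {k: Fraction(1, 6) for k in range(1, 7)}

def convolve(f, g):
    # P(X + Y = n) = sum_k P(X = k) P(Y = n - k), for independent X and Y.
    h = {}
    for x, px in f.items():
        for y, py in g.items():
            h[x + y] = h.get(x + y, 0) + px * py
    return h

total = convolve(die, die)
print({n: str(p) for n, p in sorted(total.items())})
# {2: '1/36', 3: '1/18', ..., 7: '1/6', ..., 12: '1/36'}
```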


We conclude this section with a remark on the conditional independence
of random variables.
The Markov property (Lesson 2) states that "given the present, the
future is independent of the past". Now, in the context of random phe-
nomena, these states are functions of random variables. Thus we need to
formulate rigorously the concept of conditional independence of random
variables. Since independence of random variables is essentially related
to the $\sigma$-fields generated by them, the appropriate place for formulating this
concept is the next section.

1.3 Expectation

Consider a random experiment such as rolling two dice and let $X$ be the sum
of the two numbers shown. What is the average (mean or expected) value of $X$?
We will answer this question using our modeling scheme. The experiment
is modeled by the probability space $(\Omega, \mathcal{A}, P)$, where $\Omega = \{(i,j) : i,j =
1, 2, \dots, 6\}$, $\mathcal{A} = \mathcal{P}(\Omega)$, and $P(\{\omega\}) = 1/36$, $\forall \omega \in \Omega$. The random quantity
$X$ is modeled as a random variable, that is, a map from $\Omega$ to $\{2, 3, \dots, 12\}$.
The probability mass function of $X$ is
$$f(k) = P(\{\omega : X(\omega) = k\}), \quad k \in \{2, 3, \dots, 12\}.$$
If we repeat the experiment $n$ times, then each value $k$ is expected to
appear about $nf(k)$ times. Thus the average of the results of $X$ is
$$\frac{1}{n} \sum_{k=2}^{12} (nf(k))\,k = \sum_{k=2}^{12} k f(k).$$
Thus, for random variables with finite ranges, the expected value (or mean,
or expectation) of $X$ is taken to be
$$E(X) = \sum_{x} x P(X = x).$$
The extension of this formula to random variables whose ranges are infinite
(countable or not) is a little delicate. To avoid meaningless expressions such
as $\infty - \infty$, we first consider random variables of constant sign, say, non-
negative (extended) random variables.
A random variable $X$ with finite range $\{x_1, x_2, \dots, x_n\}$ can be written
as
$$X(\omega) = \sum_{i=1}^{n} x_i 1_{A_i}(\omega),$$
where $A_i = \{\omega : X(\omega) = x_i\}$. Note that the $A_i$'s form a (measurable)
partition of $\Omega$. Such a variable is called a simple random variable. We have
$$E(X) = \sum_{i=1}^{n} x_i P(A_i).$$
Now, let $X$ be an extended non-negative random variable defined on
$(\Omega, \mathcal{A}, P)$. Then $X$ is the limit (pointwise) of an increasing sequence of simple
random variables. It suffices to consider
$$X_n(\omega) = \sum_{i=0}^{n2^n - 1} \frac{i}{2^n} 1_{\left[\frac{i}{2^n} \le X < \frac{i+1}{2^n}\right]}(\omega) + n 1_{[X \ge n]}(\omega).$$
It can be checked that
$$X(\omega) = \lim_{n \to \infty} X_n(\omega), \quad \forall \omega \in \Omega.$$
On the other hand, since $X_n(\omega) \le X_{n+1}(\omega)$ for all $\omega \in \Omega$ and all $n \ge 1$,
we have $E(X_n) \le E(X_{n+1})$ (Exercise). Thus the sequence of non-
negative numbers $E(X_n)$ is non-decreasing and hence $\lim_{n \to \infty} E(X_n)$
exists (possibly $\infty$). The expected value of the (extended) non-negative
random variable $X$ is defined to be
$$E(X) = \lim_{n \to \infty} E(X_n).$$
Note that, in fact, the value of $E(X)$ is independent of any particular choice
of simple random variables $X_n \nearrow X$.
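The dyadic approximation can be coded directly; the Monte Carlo sketch below (our addition) applies $X_n$ pointwise to samples of an exponential variable with $E(X) = 1$ and watches $E(X_n)$ increase toward 1.

```python
import random

random.seed(2)

def X_n(x, n):
    # Dyadic simple-function approximation: X_n = i/2^n on [i/2^n, (i+1)/2^n), capped at n.
    return n if x >= n else int(x * 2**n) / 2**n

# Samples of an exponential random variable with mean E(X) = 1.
xs = [random.expovariate(1.0) for _ in range(100_000)]

for n in (1, 2, 4, 8):
    approx = sum(X_n(x, n) for x in xs) / len(xs)
    print(n, round(approx, 4))   # non-decreasing in n, approaching roughly 1.0
```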
When $X$ is a non-negative discrete random variable with probability mass
function $f$, then
$$E(X) = \sum_{x} x f(x),$$
and if $X$ is a non-negative continuous random variable with density function
$f$, then
$$E(X) = \int_{-\infty}^{\infty} x f(x)\,dx.$$
Now let $X$ be an arbitrary (extended) random variable. Then $X$ can be
written as the difference of two non-negative random variables, called the
positive and negative parts of $X$. Specifically, define
$$X^+(\omega) = \max(0, X(\omega)), \quad X^-(\omega) = -\min(0, X(\omega)).$$
Then $X(\omega) = X^+(\omega) - X^-(\omega)$.


Definition 1.4 Let $X$ be an extended random variable.
(i) If both $E(X^+)$ and $E(X^-)$ are $\infty$, we say that the expectation of $X$
does not exist.
(ii) When not both $E(X^+)$ and $E(X^-)$ are $\infty$, we say that the expectation
of $X$ exists and is equal to
$$E(X) = E(X^+) - E(X^-).$$
(iii) If both $E(X^+)$ and $E(X^-)$ are finite, then we say that the expectation
of $X$ is finite and that $X$ is integrable
($E|X| < \infty \iff E(X^+) < \infty$ and $E(X^-) < \infty$).

Note that $E(X)$ can be used to define the integral on $(\Omega, \mathcal{A}, P)$ as
$\int_{\Omega} X(\omega)\,dP(\omega)$. The following properties of expectation are easy to check.
(a) $X \le Y$ implies that $E(X) \le E(Y)$.
(b) For any real numbers $\alpha$ and $\beta$, $E(\alpha X + \beta Y) = \alpha E(X) + \beta E(Y)$.
For computations, we have
$$E(X) = \sum_{x} x f(x) \quad (\text{if } X \text{ is discrete})$$
and
$$E(X) = \int_{-\infty}^{\infty} x f(x)\,dx \quad (\text{if } X \text{ is continuous}).$$
More generally, if $\psi : \mathbb{R} \to \mathbb{R}$ is measurable, then
$$E(\psi(X)) = \int_{-\infty}^{\infty} \psi(x) f(x)\,dx.$$
If $X_1, X_2, \dots, X_n$ are independent random variables, then
$$E\left(\prod_{i=1}^{n} X_i\right) = \prod_{i=1}^{n} E(X_i) \quad (\text{Exercise}).$$
Note that, for an infinite sequence of independent random variables $X_n$,
$n \ge 1$, it might happen that
$$E\left(\prod_{n=1}^{\infty} X_n\right) \ne \prod_{n=1}^{\infty} E(X_n).$$
(See Exercise 10 of Lesson 11.)
Let $n \ge 1$ be an integer. If $X \ge 0$ or $X^n$ is integrable, then $E(X^n)$ is
called the moment of $X$ of order $n$ (or $n$th order moment of $X$). Note that
if $E(X^n) < \infty$, then $E(X^m) < \infty$ for $m \le n$. However, $X$ might not have
moments of order $> n$.
For $n = 2$, the quantity $E(X - E(X))^2$ is called the variance of $X$ and
is denoted as $\mathrm{Var}(X)$ or simply $V(X)$; its positive square root is called
the standard deviation of $X$. For two random variables $X$ and $Y$ having
second moments, the covariance of $X$ and $Y$ is the quantity
$$\mathrm{cov}(X, Y) = E\left[(X - E(X))(Y - E(Y))\right].$$
If $\mathrm{cov}(X, Y) = 0$, then $X$ and $Y$ are said to be uncorrelated. Of course,
independent random variables are uncorrelated, but the converse is not
true.
Now, we consider the important concept of conditional expectation.
Consider two random variables $X$ and $Y$, defined on $(\Omega, \mathcal{A}, P)$. We are
going to formulate the notion of the expectation of $Y$ when we observe $X$.
First, suppose that $X$ is discrete with range $\{x_n, n \ge 1\}$. The variable
$X$ induces a (measurable) partition (finite or countable) of $\Omega$:
$$D_n = \{\omega : X(\omega) = x_n\}, \quad n \ge 1.$$
When $X = x_n$, we might be interested in $P(A|X = x_n)$ for $A \in \mathcal{A}$ and in
$E(Y|X = x_n)$. Of course $P(A|X = x_n) = P(A|D_n)$.
Before observing $X$, the conditional probability of the event $A$ given $X$
is a random variable defined as
$$P(A|X)(\omega) = \sum_{n \ge 1} P(A|D_n)\,1_{D_n}(\omega).$$
If $Y$ is a random variable with finite range $\{y_1, y_2, \dots, y_m\}$, then
$$E(Y) = \sum_{i=1}^{m} y_i P(B_i), \quad B_i = \{\omega : Y(\omega) = y_i\};$$
thus, by analogy,
$$E(Y|X = x_n) = E(Y|D_n) = \sum_{i=1}^{m} y_i P(B_i|D_n)
= \frac{1}{P(D_n)} \sum_{i=1}^{m} y_i P(B_i \cap D_n) = \frac{E(Y 1_{D_n})}{P(D_n)}.$$
In general, if $Y$ is an extended random variable whose expectation exists,
then $E(Y|D)$ exists for $D \in \mathcal{A}$ with $P(D) > 0$, where
$$E(Y|D) = \int_{\Omega} Y(\omega)\,dP_D(\omega),$$
and $P_D(\cdot)$ denotes the conditional probability measure on $\mathcal{A}$ defined by
$$P_D(A) = P(A|D), \quad A \in \mathcal{A}.$$
It can be shown that
$$E(Y|D) = E(Y 1_D)/P(D).$$
Now, consider the partition $D_n$, $n \ge 1$, induced by the discrete random
variable $X$. Before observing $X$, the conditional expectation of $Y$ given $X$,
denoted as $E(Y|X)$, is a random variable. The above discussion leads to
the following definition.

Definition 1.5 Let $Y$ be an extended random variable whose expectation
exists and $X$ be a discrete random variable. Then the conditional expecta-
tion of $Y$ given $X$ is a random variable defined by
$$E(Y|X)(\omega) = \sum_{n \ge 1} E(Y|X = x_n)\,1_{(X = x_n)}(\omega).$$

The dependence of the expectation of $Y$ on $X$ can also be expressed
in terms of the $\sigma$-field $\sigma(X)$ generated by $X$. Here $\sigma(X)$ is the $\sigma$-field
generated by the partition $D_n = \{\omega : X(\omega) = x_n\}$, $n \ge 1$. Note that $\sigma(X)$
represents the information about $X$. Thus we can write
$$E(Y|X) = E(Y|\sigma(X)).$$
Note that $P(A|X) = P(A|\sigma(X))$. With this identification, we have the
following:
(i) The random variable $E(Y|\sigma(X))$ is $\sigma(X)$-measurable, and for any
$A \in \sigma(X)$,
$$\int_A Y(\omega)\,dP(\omega) = \int_A E(Y|\sigma(X))(\omega)\,dP(\omega),$$
where
$$\int_A Y(\omega)\,dP(\omega) = \int_{\Omega} 1_A(\omega) Y(\omega)\,dP(\omega).$$
(ii) By $E(Y|X_1, \dots, X_k)$, we mean $E(Y|\sigma(X_1, \dots, X_k))$.
(iii) We can define the conditional expectation of $Y$ with respect to any
sub-$\sigma$-field $\mathcal{D}$ of $\mathcal{A}$, as a function on $\Omega$ satisfying the conditions in (i). In partic-
ular, when $X$ is continuous, $E(Y|X)$ is still well defined in this framework.
The existence of a function $E(Y|X)$ satisfying the conditions in (i) is proved
by using a theorem in Measure Theory, known as the Radon-Nikodym the-
orem (see Appendix).
We list below some useful properties of conditional expectations (Exer-
cise). Let $\mathcal{D}$ be a sub-$\sigma$-field of $\mathcal{A}$.
(a) $E(\cdot|\mathcal{D})$ is increasing and linear:
$$X \le Y \Longrightarrow E(X|\mathcal{D}) \le E(Y|\mathcal{D}) \quad (\text{a.s.}),$$
where a.s. stands for almost surely, that is, the property is true on a subset
$\Omega_0 \subseteq \Omega$ with $P(\Omega_0) = 1$. Also, for $\alpha, \beta \in \mathbb{R}$,
$$E(\alpha X + \beta Y|\mathcal{D}) = \alpha E(X|\mathcal{D}) + \beta E(Y|\mathcal{D}) \quad (\text{a.s.}).$$
(b) For $\mathcal{D} = \{\emptyset, \Omega\}$, $E(X|\mathcal{D}) = E(X)$.
(c) $E(E(X|\mathcal{D})) = E(X)$.
(d) If $\mathcal{C}$ is a sub-$\sigma$-field of $\mathcal{A}$ with $\mathcal{C} \subseteq \mathcal{D}$, then
$$E(E(X|\mathcal{D})|\mathcal{C}) = E(X|\mathcal{C}) \quad (\text{a.s.}).$$
(e) If $X$ is independent of $\mathcal{D}$, that is, independent of $\{1_D : D \in \mathcal{D}\}$,
then
$$E(X|\mathcal{D}) = E(X) \quad (\text{a.s.}).$$
(f) If $Y$ is $\mathcal{D}$-measurable, then
$$E(XY|\mathcal{D}) = Y E(X|\mathcal{D}) \quad (\text{a.s.}).$$
(g) Jensen's inequality: If $\phi : \mathbb{R} \to \mathbb{R}$ is a convex function and $\phi(X)$
is integrable, then
$$\phi(E(X|\mathcal{D})) \le E(\phi(X)|\mathcal{D}) \quad (\text{a.s.}).$$
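Properties (b) and (c) can be verified by hand on the two-dice space: with $Y$ the sum and $X$ the first die, $E(Y|X) = X + 7/2$ and $E(E(Y|X)) = E(Y) = 7$. The exact check below is our addition.

```python
from fractions import Fraction
from itertools import product

omega = list(product(range(1, 7), repeat=2))
P = Fraction(1, 36)                       # uniform probability of each outcome

def cond_exp(i):
    # E(Y | X = i), where Y = sum of the two dice and X = the first die.
    ws = [w for w in omega if w[0] == i]
    return Fraction(sum(w[0] + w[1] for w in ws), len(ws))

print([str(cond_exp(i)) for i in range(1, 7)])      # i + 7/2 for each i
# Property (c): E(E(Y|X)) = E(Y).
lhs = sum(cond_exp(i) * Fraction(6, 36) for i in range(1, 7))
rhs = sum((w[0] + w[1]) * P for w in omega)
print(lhs == rhs == 7)   # True
```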

We close this section with the definition of conditional independence of
random variables.

Definition 1.6 We say that $X$ and $Y$ are conditionally independent
given $Z$ if for any $A \in \sigma(X)$ and $B \in \sigma(Y)$, we have
$$P(A \cap B|\sigma(Z)) = P(A|\sigma(Z))\,P(B|\sigma(Z)) \quad (\text{a.s.}).$$
1.4 Limit theorems

When using stochastic processes to model random phenomena (Lesson 2),
we are interested in their behavior for large values of the time parameter (as
well as in other properties, such as their time-dependent structures). The con-
cept of limits of sequences of random variables is suitable for investigating
this behavior.
Let $X_n$, $n \ge 1$, be a sequence of random variables defined on $\Omega$. There
are different kinds of convergence for $(X_n, n \ge 1)$.

Definition 1.7 The sequence $(X_n, n \ge 1)$ is said to converge in proba-
bility to a random variable $X$ if for any $\varepsilon > 0$,
$$\lim_{n \to \infty} P(|X_n - X| > \varepsilon) = 0;$$
in symbols, $X_n \xrightarrow{P} X$.

The interpretation is this: with high probability, $X_n$ is close to $X$ for
large values of $n$.
A stronger concept of convergence is

Definition 1.8 The sequence $(X_n, n \ge 1)$ is said to converge almost
surely (or with probability one) to $X$ if
$$P(\omega : X_n(\omega) \to X(\omega)) = 1;$$
in symbols, $X_n \xrightarrow{a.s.} X$.

Remarks.
(i) It can be shown that if $X_n \xrightarrow{a.s.} X$, then $X_n \xrightarrow{P} X$. The converse
does not hold. See Exercise 1.25.
(ii) To prove a.s. convergence, the following equivalent criterion is
useful:
$$X_n \xrightarrow{a.s.} X \iff \lim_{n \to \infty} P\left(\sup_{k \ge n} |X_k - X| > \varepsilon\right) = 0$$
for any $\varepsilon > 0$.
(iii) For random variables with finite second moments, Tchebychev's
inequality is useful for checking convergence in probability:
$$P(|X - E(X)| \ge \varepsilon) \le V(X)/\varepsilon^2.$$
Concerning the moments of random variables, we have


Definition 1.9 Let each $X_n$, $n \ge 1$, and $X$ have finite moments of order
$k$. Then the sequence $(X_n, n \ge 1)$ converges in $k$-mean to $X$ if
$$\lim_{n \to \infty} E\left(|X_n - X|^k\right) = 0;$$
in symbols, $X_n \xrightarrow{L^k} X$. In particular, when $k = 2$, the $L^2$-convergence is
also called convergence in mean square.

Remarks.
(i) The $L^k$-convergence implies the convergence in probability.
(ii) If $X_n \xrightarrow{L^k} X$, then $\lim_{n \to \infty} E(X_n) = E(X)$.
(iii) There are no simple relations between $L^k$-convergence and a.s.
convergence.
Finally, we are interested in the limiting distribution of the $X_n$'s.

Definition 1.10 Let $X_n$, $n \ge 1$, and $X$ be random variables with distri-
bution functions $F_n$, $n \ge 1$, and $F$, respectively. Then $X_n$ is said to
converge in distribution to $X$, denoted by $X_n \xrightarrow{D} X$, if
$$\lim_{n \to \infty} F_n(x) = F(x) \quad \forall x \in C(F),$$
where $C(F)$ denotes the subset of $\mathbb{R}$ on which $F$ is continuous.

Remarks.
(i) If $X_n \xrightarrow{P} X$, then $X_n \xrightarrow{D} X$.
(ii) When $X_n \xrightarrow{D} X$, $F$ is called the limiting distribution of the sequence
$(X_n, n \ge 1)$.
The two important results in Probability Theory related to the various
modes of convergence of random variables are the following.
A. Laws of large numbers.
There are two types of laws of large numbers, "strong" (a.s.)
and "weak" (in probability), according to the convergence concept involved.
(a) Weak law of large numbers. If $(X_n, n \ge 1)$ is a sequence of in-
dependent random variables having the same distribution (identically dis-
tributed) with finite mean $\mu$, then
$$\frac{X_1 + X_2 + \cdots + X_n}{n} \xrightarrow{P} \mu, \quad \text{as } n \to \infty.$$
(b) Strong law of large numbers. If $(X_n, n \ge 1)$ is a sequence of
independent, identically distributed random variables with $E(|X_1|) < \infty$,
then
$$\frac{X_1 + X_2 + \cdots + X_n}{n} \xrightarrow{a.s.} E(X_1), \quad \text{as } n \to \infty.$$
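Both laws are easy to observe numerically; the sketch below (our addition) tracks the running mean of i.i.d. exponential variables with mean $\mu = 1$ at a few checkpoints.

```python
import random

random.seed(3)
checkpoints = (10, 1_000, 100_000)

total = 0.0
for n in range(1, max(checkpoints) + 1):
    total += random.expovariate(1.0)      # i.i.d. exponential variables with mean 1
    if n in checkpoints:
        print(n, round(total / n, 4))     # running mean (X_1 + ... + X_n)/n drifts toward 1
```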
B. Central limit theorem.
This theorem concerns the limiting distribution of the partial sums
$S_n = X_1 + X_2 + \cdots + X_n$, properly centered and normalized. Specifically,
if $(X_n, n \ge 1)$ is a sequence of independent, identically distributed random
variables with finite common second moment, then
$$\frac{S_n - nE(X_1)}{\sigma\sqrt{n}} \xrightarrow{D} N(0,1), \quad \text{as } n \to \infty,$$
where $\sigma$ is the standard deviation of $X_1$ and $N(0,1)$ denotes the standard
normal random variable with probability density function given by
$$f(x) = \frac{1}{\sqrt{2\pi}} e^{-x^2/2}, \quad x \in \mathbb{R}.$$

Remarks.
(a) Saying that the sequence $Z_n = (S_n - nE(X_1))/(\sigma\sqrt{n}) \xrightarrow{D} N(0,1)$
is the same as
$$\lim_{n \to \infty} P(Z_n \le t) = \int_{-\infty}^{t} \frac{1}{\sqrt{2\pi}} e^{-x^2/2}\,dx, \quad \forall t \in \mathbb{R}.$$
(b) The proof of the central limit theorem involves a transformation of
the distribution functions, known as the "Fourier transform". Specifically, let
$f$ be the probability density function of the random variable $X$. Then the
characteristic function of $X$ is defined to be
$$\hat{f}(t) = E(e^{itX}) = \int_{-\infty}^{\infty} e^{itx} f(x)\,dx, \quad \forall t \in \mathbb{R},$$
where $i$ is the usual complex number $\sqrt{-1}$. The transformation $\hat{f}$ is "char-
acteristic" in the sense that it determines completely the distribution of
$X$. This transformation is useful in finding distribution functions. Other
transformations are:
(i) Generating functions. If $X$ is a non-negative, integer-valued ran-
dom variable, then the generating function of $X$ is defined by
$$\phi(t) = E(t^X) = \sum_{n=0}^{\infty} P(X = n)\,t^n$$
for $|t| < 1$.
(ii) Laplace transform. For $X \ge 0$, the Laplace transform of the
density $f$ of $X$ is
$$\psi(t) = E(e^{-tX}) = \int_{0}^{\infty} e^{-tx} f(x)\,dx$$
for any complex $t$ (with $\mathrm{Re}\,t \ge 0$).
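To close the section, a Monte Carlo check of the central limit theorem (our addition): standardized sums of i.i.d. uniform variables are compared with the standard normal distribution function $\Phi(t) = (1 + \mathrm{erf}(t/\sqrt{2}))/2$.

```python
import math
import random

random.seed(4)
n, reps = 50, 20_000
mu, sigma = 0.5, math.sqrt(1.0 / 12)      # mean and standard deviation of Uniform(0,1)

def z():
    # Standardized partial sum Z_n = (S_n - n*mu) / (sigma * sqrt(n)).
    s = sum(random.random() for _ in range(n))
    return (s - n * mu) / (sigma * math.sqrt(n))

zs = [z() for _ in range(reps)]
for t in (-1.0, 0.0, 1.0):
    empirical = sum(v <= t for v in zs) / reps
    phi = 0.5 * (1 + math.erf(t / math.sqrt(2)))
    print(t, round(empirical, 3), round(phi, 3))  # empirical P(Z_n <= t) vs Phi(t)
```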

1.5 Exercises

1.1. Specify $(\Omega, \mathcal{A}, P)$ for the following random experiments.
(i) Tossing a balanced coin five times.
(ii) An urn contains 10 white and 4 black balls. Five balls are drawn
(without replacement) from the urn. An outcome is defined as the number
of black balls obtained in the drawn sample.
(iii) Consider an unbalanced coin with probability of getting a head in
each toss equal to $p$. Toss that coin (independently) until the first head ap-
pears. An outcome is defined as the number of tosses needed.

1.2. Let $(\Omega, \mathcal{A}, P)$ be a probability space. Show that
(i) $\mathcal{A}$ is a field.
(ii) If $A, B \in \mathcal{A}$, then $A - B = \{\omega : \omega \in A, \omega \notin B\} \in \mathcal{A}$. (Hint: first
prove DeMorgan's Laws: $(A \cap B)^c = A^c \cup B^c$, $(A \cup B)^c = A^c \cap B^c$.)
(iii) If $A_n \in \mathcal{A}$, $n \ge 1$, then $\bigcap_{n=1}^{\infty} A_n \in \mathcal{A}$.
(iv) If $A, B \in \mathcal{A}$ with $A \subseteq B$, then $P(A) \le P(B)$.
(v) If $A_n \in \mathcal{A}$, $n \ge 1$, then
$$P\left(\bigcup_{n=1}^{\infty} A_n\right) \le \sum_{n=1}^{\infty} P(A_n).$$
(vi) If $A, B \in \mathcal{A}$, then
$$P(A \cup B) = P(A) + P(B) - P(A \cap B).$$

1.3. Let $A_n \subseteq \Omega$, $n \ge 1$. (i) Show that
$$\liminf_{n \to \infty} A_n \subseteq \limsup_{n \to \infty} A_n.$$
28 Lesson 1

(ii) Verify that

f: 1
l~~~f An = {w : n=l A :',(w) < oo}

whereas
lim sup An
n-+oo
= {w : f:
n=l
1An (w) = oo}.
Give an interpretation for these events.
1.4. Let 0 be an infinitely countable space and let f: 0 -+ [0,1] such that
EWEO f(w) = 1. Define P : P(O) -+ [0,1] by

P(A) = I: f(w), A E P(O).


wEA

Verify that P is a probability measure on the measurable space (0, P(O).


1.5. Let 0 be a set and C = P(O).
(i) Show that the collection of IT-fields containing C is not empty.
(ii) If Al and A2 are two IT-fields containing C, then Al n A2 is also a
=
IT-field containing C, where Al nA 2 {A: A E A l , A E A 2 }.
(iii) Show that the intersection of all IT-fields containing C is the smallest
IT-field containing C (the IT-field Al is smaller than the IT-field A2 if Al ~ A2
).
1.6. Let $(\Omega, \mathcal{A}, P)$ be a probability space. An infinite countable (measur-
able) partition of $\Omega$ is a sequence $A_n \in \mathcal{A}$, $n \ge 1$, such that $A_n \cap A_m = \emptyset$
for $n \ne m$, and $\bigcup_{n=1}^{\infty} A_n = \Omega$. Let $\{A_n, n \ge 1\}$ be an infinite countable
partition of $\Omega$ and $B \in \mathcal{A}$. Show that
\[
P(B) = \sum_{n=1}^{\infty} P(A_n) P(B|A_n)
\]
and for $P(B) > 0$,
\[
P(A_m|B) = \frac{P(A_m) P(B|A_m)}{\sum_{n=1}^{\infty} P(A_n) P(B|A_n)}, \quad \forall m \ge 1.
\]

1.7. Consider the following events in the experiment of tossing a fair
coin twice: $A =$ "a head occurs on the first toss", $B =$ "a head occurs on
the second toss", and $C =$ "exactly one head occurs". Are $A, B, C$ pairwise
independent? Are $A, B, C$ mutually independent?
1.8. Let $(\Omega, \mathcal{A}, P)$ be a probability space. Let $A, B, C \in \mathcal{A}$ such that
$P(A \cap B) > 0$. Show that if $P(C|A \cap B) = P(C|A)$, then $B$ and $C$ are
independent given $A$.
1.9. Let $(\Omega, \mathcal{A}, P)$ be a probability space. Let $X : \Omega \to \mathbb{R}$.
(i) Show that for $A, A_n \subseteq \mathbb{R}$, $n \ge 1$,
\[
X^{-1}(A^c) = \left(X^{-1}(A)\right)^c, \quad X^{-1}\left(\bigcap_{n=1}^{\infty} A_n\right) = \bigcap_{n=1}^{\infty} X^{-1}(A_n),
\]
and
\[
X^{-1}\left(\bigcup_{n=1}^{\infty} A_n\right) = \bigcup_{n=1}^{\infty} X^{-1}(A_n).
\]
(ii) Let $X^{-1}(\mathcal{B}(\mathbb{R})) = \{X^{-1}(A) : A \in \mathcal{B}(\mathbb{R})\}$. Verify that $X^{-1}(\mathcal{B}(\mathbb{R}))$
is a $\sigma$-field on $\Omega$. Let $X(\mathcal{A}) = \{X(A) : A \in \mathcal{A}\}$, where $X(A) = \{X(\omega) :
\omega \in A\}$. Is $X(\mathcal{A})$ a $\sigma$-field on $\mathbb{R}$?
(iii) Let $P_X(\cdot) = P\left(X^{-1}(\cdot)\right)$ on $\mathcal{B}(\mathbb{R})$. Verify that $P_X(\cdot)$ is a probability
measure.
1.10. Let $X$ be a random variable taking values in $\bar{\mathbb{R}} = [-\infty, \infty]$. Recall
that such an extended random variable is defined by the condition: $\{\omega :
X(\omega) \le t\} \in \mathcal{A}$ for any $t \in \mathbb{R}$.
(i) Verify that
\[
\{\omega : X(\omega) < \infty\} = \bigcup_{n=1}^{\infty} \{\omega : X(\omega) \le n\}.
\]
(ii) Use (i) to show that
\[
\{\omega : X(\omega) = \infty\}, \quad \{\omega : X(\omega) = -\infty\} \in \mathcal{A}.
\]
(iii) Verify that if $(X_n, n \ge 1)$ is a sequence of extended random variables,
then
\[
\{\omega : \sup_n X_n(\omega) \le t\} = \bigcap_n \{\omega : X_n(\omega) \le t\}, \quad \forall t \in \mathbb{R}.
\]
(iv) Use (iii) to show that $\sup_n X_n$ and $\inf_n X_n$ are extended random
variables. Show also that $\{\omega : X_n(\omega) \text{ converges}\} \in \mathcal{A}$.

1.11. Let $F$ be the distribution function of a random variable $X$, that is,
\[
F : \mathbb{R} \to [0,1], \quad F(x) = P(X \le x).
\]
Show that
(i) $F$ is monotone non-decreasing, i.e., $x < y$ implies $F(x) \le F(y)$.
(ii) $\lim_{x \to -\infty} F(x) = 0$, $\lim_{x \to \infty} F(x) = 1$.
(iii) $F$ is right-continuous, i.e., $\lim_{y \searrow x} F(y) = F(x)$ for any $x \in \mathbb{R}$.
1.12. A random variable $X$ taking values in an interval $[a, b] \subseteq \mathbb{R}$ is said to
be uniformly distributed on $[a, b]$ if it is a continuous random variable with
the probability density function given by
\[
f(x) = \frac{1}{b-a}\, 1_{[a,b]}(x), \quad x \in \mathbb{R}.
\]
(i) Find the distribution function of $X$.
(ii) Compute $P(X > \alpha)$ for $a < \alpha < b$.
1.13. Let $f(x) = \frac{\lambda}{2} e^{-\lambda |x|}$, $x \in \mathbb{R}$ (for some $\lambda > 0$).
(i) Verify that $f$ is a probability density function.
(ii) Find the associated distribution function.
1.14". Let X : (0, A) ~ (JR, B(JR». Show that X is a random variable if
and only if one of the following conditions holds. For all x E JR,
(i){w: X(w)~X}EA.
(ii) {w : X(w) > x} EA.
(iii) {w : X(w) ~ x} E A.
(iv) {w : X(w) < x} EA.
1.15. Compute the means and variances of the following random variables.
(i) Binomial: f(k) = (~)pk(I-P)n-k,k=0,1,2, ... ,nwithgiVen
nand p E [0,1].
(ii) Geometric: f(k) = p(1 '- p)k-l, k = 1,2,··· with p E [0,1].
(iii) Poisson: f( n) = e- A An / n!, n = 0, 1,2, ... with A > 0.
(iv) Exponential: f(x) = Ae- A"'I(o,oo)(x) with A > 0.
(v) Normal: f(x) = e-("'-JJ)2/(2u 2) /(.../2-i, x E JR with I' E IR and > 0.
(J'
Basic Probability Background 31

(vi) Gamma (n, A): /(x) = Ae-A~(AX)n-l j(n - 1)11[0,00)(x) with A > 0
and n> O.
1.16*. Let $X$ be a random variable taking values in $\{0, 1, 2, \ldots\}$. Show that
\[
E(X) = \sum_{n=0}^{\infty} P(X > n).
\]
1.17*. Show that
(i) If $X \ge 0$ then $E(X) = \int_0^{\infty} P(X > t)\,dt$.
(ii) For any real-valued random variable $X$,
\[
E(X) = \int_0^{\infty} P(X > t)\,dt - \int_{-\infty}^{0} P(X \le t)\,dt.
\]
(iii) $E(|X|^k) = k \int_0^{\infty} t^{k-1} P(|X| > t)\,dt$.
1.18. Let $X : (\Omega, \mathcal{A}, P) \to \bar{\mathbb{R}}^+ = [0, \infty]$ be a non-negative random vari-
able. For each integer $n$, define
\[
X_n(\omega) = \sum_{i=0}^{n2^n - 1} \frac{i}{2^n}\, 1_{\left[\frac{i}{2^n} \le X < \frac{i+1}{2^n}\right]}(\omega) + n\, 1_{[X \ge n]}(\omega).
\]
Show that
(i) For any $\omega \in \Omega$, $X_n(\omega) \le X_{n+1}(\omega)$, $\forall n$.
(ii) $\lim_{n\to\infty} X_n(\omega) = X(\omega)$, $\forall \omega \in \Omega$.
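A quick numerical check of this dyadic construction (a minimal Python sketch, assuming numpy; the sample value is an arbitrary choice):

import numpy as np

def dyadic_approx(x, n):
    # X_n of Exercise 1.18: round x down to a multiple of 2^-n, capped at n
    if x >= n:
        return float(n)
    return np.floor(x * 2**n) / 2**n

x = np.pi   # a sample value X(omega)
for n in range(1, 8):
    print(n, dyadic_approx(x, n))
# the printed values increase monotonically to x, illustrating (i) and (ii)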
1.19. Let $X$ and $Y$ be random variables with finite ranges. Show that
(i) $X \le Y$ implies that $E(X) \le E(Y)$.
(ii) $E(X + Y) = E(X) + E(Y)$.
1.20*. Let $X$ be a non-negative extended random variable. Suppose that
$P(X = \infty) > 0$. Show that $E(X) = \infty$.
1.21. Let $X$ be a random variable with values in $\{x_1, x_2, \ldots, x_n\}$. Let
$D_k = \{\omega : X(\omega) = x_k\}$, $k = 1, 2, \ldots, n$.
(i) Verify that the $D_k$'s form a (measurable) partition of $\Omega$.
(ii) For $A \in \mathcal{A}$, show that $E(P(A|X)) = P(A)$.
(iii) Let $Y$ be a discrete random variable, independent of $X$. Show that
\[
P(X + Y = n \,|\, X = m) = P(Y = n - m).
\]

1.22. Prove the properties of conditional expectation listed at the end of
Section 1.3.
1.23. Show that
(i) The characteristic function of $N(0,1)$ is $e^{-t^2/2}$. What is the charac-
teristic function of $N(\mu, \sigma^2)$?
(ii) The generating function of the Poisson random variable with $f(n) =
e^{-\lambda} \lambda^n / n!$, $n = 0, 1, 2, \ldots$, is $e^{-(1-t)\lambda}$.
1.24. Let $X_1, X_2, \ldots, X_n$ be independent random variables. Show that the
characteristic (respectively, generating) function of the sum $X_1 + X_2 + \cdots +
X_n$ is the product of the characteristic (respectively, generating) functions
of the $X_j$'s, $j = 1, 2, \ldots, n$.
1.25. Let $X, X_n, n \ge 1$, be random variables defined on $(\Omega, \mathcal{A}, P)$.
(i) Show that $A = \{\omega : X_n(\omega) \to X(\omega), \text{ as } n \to \infty\} \in \mathcal{A}$.
(ii) Let $A_n(\varepsilon) = \{\omega : |X_n(\omega) - X(\omega)| > \varepsilon\}$. Show that
\[
X_n \xrightarrow{a.s.} X \quad \text{if and only if} \quad P\left(\limsup_{n\to\infty} A_n(\varepsilon)\right) = 0,
\]
for any $\varepsilon > 0$.
(iii) Suppose that the $X_n$'s are independent with
\[
P(X_n = 1) = \frac{1}{n} = 1 - P(X_n = 0).
\]
Show that $X_n \xrightarrow{P} 0$. Does $X_n$ converge a.s. to 0? (Hint: use the Borel-
Cantelli lemma.)
Lesson 2

Modeling Random Phenomena

In this Lesson, we motivate the use of the concept of Stochastic Processes as


a means to model random phenomena. It is emphasized that the analysis
of random phenomena in terms of stochastic processes relies heavily on the
mathematical theory of probability.

2.1 Random Phenomena


As opposed to deterministic phenomena, random phenomena are those
whose outcomes cannot be predicted with certainty, under identical con-
ditions. We are all familiar with gambling schemes such as "tossing a fair
coin", "rolling a pair of dice", etc. Random phenomena which evolve in
time are the subject of this text. The following are examples of such phe-
nomena.

Example 2.1 A xerox machine in an office is either "out of order" or "in
operating condition". Let $X_n$ denote the state of the machine, say, at 8:00am
of the $n$th day. This is an example of a random phenomenon which evolves
in discrete time and which has a finite number of "states".

Example 2.2 Let $X_t$, $t \ge 0$, denote the state of a patient (with a specific
disease) at time $t$. Suppose there are four possible states: 1 = the patient
is identified as having the disease; 2 = recovery; 3 = death due to the disease;
4 = death due to some other cause. This is a random phenomenon which
evolves in continuous time and which has a finite "state space".


Example 2.3 In Example 2.1, starting with $n = 1$, let $Y_n$ be the number
of days (among the first $n$ days) where the machine is "out of order". The
sequence $\{Y_n, n \ge 1\}$ constitutes a random phenomenon evolving in discrete
time and having an infinitely countable state space $\{0, 1, 2, \ldots\}$.

Example 2.4 Consider an event such as "the arrival of a customer for
service at a bank". Obviously, such events occur at random times. If we let
$T_n$, $n \ge 1$, denote the arrival time of the $n$th customer, then the sequence
$\{T_n, n \ge 1\}$ constitutes a random phenomenon evolving in discrete time
and having a continuous state space $[0, \infty)$.

Example 2.5 In Example 2.4, if we let $N_t$, $t \ge 0$, be the number of events
that have occurred in the time interval $[0, t]$, then the family $\{N_t, t \ge 0\}$
constitutes a random phenomenon evolving in continuous time and having
a discrete state space.

Example 2.6 An example of a random phenomenon evolving in continuous
time and having a continuous state space, say $(-\infty, \infty)$, is the famous
Brownian motion. It was observed that small particles immersed in a liquid
exhibit irregular motions. Thus the displacement of a particle at time $t$,
$X_t$, along some axis from its starting position, is a random quantity. The
motion of the particle is a random phenomenon evolving in continuous time
and having a continuous state space.

2.2 Stochastic Processes


If we examine the above examples, then we see that there is uncertainty
in the "outcomes" of the phenomena. If we make the basic assumption
that the uncertainty involved is due to chance (or randomness), then it
makes sense to talk about the chances for their occurrences although the
outcomes cannot be predicted with certainty. For example, in tossing a fair
coin, although we cannot predict with certainty the occurrence of H(ead)
or T(ail) , we still can assign 50-50 chance to each of these two possible
outcomes.
Thus random phenomena can be viewed as families of random vari-
ables indexed by a time set. The mathematical concept of random variables
as well as related concepts were reviewed in Lesson 1.

From the above point of view, we are going to describe a random phe-
nomenon as a stochastic process, that is, a family of random variables
$X_t$, $t \in T$, where $T$ is some index set, usually $T \subseteq \mathbb{R} = (-\infty, \infty)$,
interpreted as a time set. The common range of the random variables $X_t$
(the set of their possible values) is called the state space of the process and
is denoted by $S$.

Stochastic processes are thus the mathematical models for random phe-
nomena. They are classified according to the nature of the time set $T$ and
the state space $S$ (discrete or continuous). For example, if $T$ is continuous,
say $[0, \infty)$, and $S$ is discrete, say $S = \{\ldots, -2, -1, 0, 1, 2, \ldots\}$, then the
process is called a continuous-time process with discrete state space. The
classification of stochastic processes is exemplified by the examples of the
previous section as follows.
Example 2.1: A discrete-time stochastic process with a finite state space.
Example 2.2: A continuous-time stochastic process with a finite state
space.
Example 2.3: A discrete-time stochastic process with a discrete state
space.
Example 2.4: A discrete-time stochastic process with a continuous state
space.
Example 2.5: A continuous-time stochastic process with a discrete state
space.
Example 2.6: A continuous-time stochastic process with a continuous
state space.

2.3 Distributions of Stochastic Processes


We are going to specify rigorously the structure of stochastic processes.
The standard probability background for this section has been reviewed in
Lesson 1.
As stated in Section 2.2, a stochastic process $X$ is a collection of random
variables $X_t$, $t \in T$. Since, in general, the time set $T$ is infinite (countable
or not), we need to elaborate a little bit on the concept of probability laws
(distributions) governing an infinite collection of random variables. Note
that a stochastic process $(X_t, t \in T)$ can be viewed as a random function,
that is, a random variable taking values in a space of functions. (See details
below.)
To be concrete, consider the case where $T = [0, \infty)$ and $S = \mathbb{R} =
(-\infty, \infty)$. Each random variable $X_t$ is defined on some probability space
$(\Omega, \mathcal{A}, P)$ and takes values in the set $\mathbb{R}$ of real numbers. To specify the
process $X = (X_t, t \ge 0)$ is to specify the space $(\Omega, \mathcal{A}, P)$ and the maps $X_t$,
$t \ge 0$. As we will see in the following Lessons, it is possible, in practice, to
specify the finite dimensional distributions of $X$, that is, joint cumulative
distribution functions (CDF) of the form
\[
F_{(t_1, \ldots, t_n)}(x_1, \ldots, x_n) = P\left(X_{t_1} \le x_1, \ldots, X_{t_n} \le x_n\right)
\]

for $n \ge 1$, $t_1, \ldots, t_n \in T$, $x_1, \ldots, x_n \in \mathbb{R}$, or equivalently, the probability
measures of the random vectors $(X_{t_1}, \ldots, X_{t_n})$, namely
\[
P_t(B) = P\{\omega : (X_{t_1}(\omega), \ldots, X_{t_n}(\omega)) \in B\} \tag{2.1}
\]
where $t = (t_1, \ldots, t_n)$ and $B \in \mathcal{B}(\mathbb{R}^n)$ (see Lesson 1 for notation).

The construction of $(\Omega, \mathcal{A}, P)$ and $X_t$ should take the set $\mathcal{F}$ of all finite
dimensional distributions of $X$ into account.

First, for each $\omega \in \Omega$, the sample path at $\omega$ is the real-valued function
defined on $T$: $t \to X_t(\omega)$. Thus we can take $\Omega = \mathbb{R}^T$, which is the set of
all real-valued functions defined on $T$, so that $X_t(\omega) = \omega(t)$, with $\omega \in \mathbb{R}^T$,
that is, for each $t \in T$,
\[
X_t : \mathbb{R}^T \to \mathbb{R}.
\]
For $X_t$ to be a random variable, the $\sigma$-field $\mathcal{A}$ on $\mathbb{R}^T$ should be such that
$X_t^{-1}(B) \in \mathcal{A}$ for any $B \in \mathcal{B}(\mathbb{R})$.

More generally, in view of (2.1), $\mathcal{A}$ should also contain all (finite dimen-
sional) cylinder sets of $\mathbb{R}^T$, that is, subsets $A$ of $\mathbb{R}^T$ of the form
\[
A = \{\omega \in \mathbb{R}^T : (\omega(t_1), \ldots, \omega(t_n)) \in B\}
\]
for some $B \in \mathcal{B}(\mathbb{R}^n)$.


Let $\mathcal{C}$ denote the set of all such cylinder sets of $\mathbb{R}^T$. Then take $\mathcal{A}$ to
be the $\sigma$-field generated by $\mathcal{C}$, denoted by $\sigma(\mathcal{C})$, i.e., the smallest $\sigma$-field
containing $\mathcal{C}$.

It remains to construct a probability measure $P$ on $(\mathbb{R}^T, \sigma(\mathcal{C}))$ satisfy-
ing (2.1) with the collection $\mathcal{F} = \{P_t\}$ given in advance.

Observe that if $(\Omega, \mathcal{A}, P)$ is given, then the induced collection $\mathcal{F}$ will
satisfy the following consistency condition:
(i) If $\alpha$ is a permutation of elements of $\{1, 2, \ldots, n\}$ and
\[
f_\alpha : \mathbb{R}^n \to \mathbb{R}^n : (x_1, \ldots, x_n) \to (x_{\alpha(1)}, \ldots, x_{\alpha(n)}),
\]
then, obviously,
\[
P_t(B) = P_{\alpha(t)}\left(f_\alpha^{-1}(B)\right),
\]
for $B \in \mathcal{B}(\mathbb{R}^n)$, $t = (t_1, \ldots, t_n)$, and $\alpha(t) = (t_{\alpha(1)}, \ldots, t_{\alpha(n)})$.
(ii) For $t = (t_1, \ldots, t_n)$, $s = (t_1, \ldots, t_n, s_{n+1})$, and $B \in \mathcal{B}(\mathbb{R}^n)$, we have
\[
P_t(B) = P_s(B \times \mathbb{R}).
\]
Thus, it is possible to construct $P$ compatible with (2.1) when the given
collection $\mathcal{F}$ satisfies the above consistency condition. Below we will sketch

the proof that $P$ is unique. The probability $P$ so obtained is referred to as
the distribution of the process $X$. It represents the complete probabilistic
information concerning the process $X$, in the same way that a probability
measure characterizes probabilistically a random variable. We also refer to
$P$ as the probability law governing the random evolution of the process $X$,
or of the random phenomenon under study.

Note that the construction $(\mathbb{R}^T, \sigma(\mathcal{C}), P)$ and $X_t : \mathbb{R}^T \to \mathbb{R} : \omega \to \omega(t)$
is referred to as the canonical representation of the process $X$. From
the probabilistic viewpoint, two processes are equivalent if they admit the
same collection of finite dimensional distributions $\mathcal{F}$.
The construction of $P$ from a consistent family $\mathcal{F}$ goes as follows.
First, it can be verified that the collection $\mathcal{C}$ of all (finite dimensional)
cylinder sets of $\mathbb{R}^T$ is a field. Define $P$ on $\mathcal{C}$ by
\[
P(A) = P_t(B), \tag{2.2}
\]
where $A = \{\omega \in \mathbb{R}^T : (\omega(t_1), \ldots, \omega(t_n)) \in B\}$, $t = (t_1, \ldots, t_n)$, and
$B \in \mathcal{B}(\mathbb{R}^n)$.

Although the representation of cylinder sets is not unique, $P$ is well-
defined on $\mathcal{C}$ through (2.2); that is, the value $P(A)$ is the same for all pos-
sible representations of $A$. This is guaranteed precisely by the consistency
condition of $\mathcal{F}$.

It can be shown that $P$ is $\sigma$-additive on the field $\mathcal{C}$. Then from a stan-
dard extension theorem in measure theory (see Appendix), $P$ is uniquely
extended to a probability measure on $\sigma(\mathcal{C})$. This result is called the Kol-
mogorov existence theorem.
From the above canonical representation of a stochastic process $X$, we
see that, in applications, it suffices to specify the set $\mathcal{F}$ of all possible finite
dimensional distributions of $X$. The knowledge of $\mathcal{F}$ is sufficient for com-
putations of all quantities of interest related to $X$. Since the computations
of various events of interest in stochastic processes are based on the rig-
orous calculus of probabilities, some technical problems should be at least
mentioned. We have in mind the justification of various subsets as events,
that is, as sets in the domain of a probability measure, so that the computations
of their probabilities make sense.

(a) The $\sigma$-field $\sigma(\mathcal{C})$ might be too small as compared to the space $\mathbb{R}^T$.
We might need to enlarge $\sigma(\mathcal{C})$ to include more subsets of $\mathbb{R}^T$.
For any given probability space $(\Omega, \mathcal{A}, P)$, it is always possible to enlarge
$\mathcal{A}$ without changing $P$ on $\mathcal{A}$ (see Exercise 2.2). A probability space $(\Omega, \mathcal{A}, P)$
is said to be complete if subsets of elements $A \in \mathcal{A}$ such that $P(A) = 0$
are elements of $\mathcal{A}$. In other words, all subsets of zero probability events
are events. Unless stated otherwise, $(\Omega, \mathcal{A}, P)$ is always assumed to be
complete, without loss of generality.
(b) When dealing with continuous-time stochastic processes, we might
be interested in computing the probabilities of "events" such as
$\{\omega \in \mathbb{R}^{[0,\infty)} : \omega(\cdot) \text{ is continuous}\}$,
\[
\{\omega : \omega(t) = 0 \text{ for some } t \ge 0\} = \bigcup_{t \ge 0} \{\omega : \omega(t) = 0\},
\]
\[
\left\{\omega : \sup_{t \ge 0} X_t(\omega) \le a\right\} = \bigcap_{t \ge 0} \{\omega : X_t(\omega) \le a\}.
\]
Now, observe that the above subsets of $\mathbb{R}^{[0,\infty)}$ are uncountable unions and
intersections of elements of $\sigma(\mathcal{C})$. They need not be in $\sigma(\mathcal{C})$. This also
implies that functions like $\sup_{t \ge 0} X_t(\cdot)$ and $\inf_{t \ge 0} X_t(\cdot)$ might not be
$\sigma(\mathcal{C})$-measurable (i.e., they might not be random variables).

Fortunately, since the real line $\mathbb{R}$ is rich enough in structure, the above
technical problem can be handled by calling upon the concept of separable
versions of stochastic processes.

Specifically, let $T = [0, \infty)$, or more generally, an interval of $\mathbb{R}$. A
stochastic process is said to be separable if there exist a countable dense set
$D \subseteq T$ and $A \in \mathcal{A}$ with $P(A) = 0$ such that
\[
\{\omega : X_t(\omega) \in F, \; t \in I \cap D\} \setminus \{\omega : X_t(\omega) \in F, \; t \in I \cap T\} \subseteq A \tag{2.3}
\]
for any closed set $F$ and any open interval $I$ of $\mathbb{R}$.


Let $B = \{\omega : X_t(\omega) \in F, \; t \in I \cap D\}$ and $C = \{\omega : X_t(\omega) \in F, \; t \in
I \cap T\}$; we have $C \subseteq B$ as $D \subseteq T$. Note that $B \setminus C = \{\omega : \omega \in B, \omega \notin C\}$.
For a separable process, (2.3) implies that
\[
A^c \cap B = A^c \cap C \quad \text{(see Exercise 2.3)},
\]
where $A^c$ is the complement of $A$, that is, $\Omega \setminus A$. Since $A \in \mathcal{A}$ and $I \cap D$ is
countable, $A^c \cap B \in \mathcal{A}$, and hence $A^c \cap C \in \mathcal{A}$.

Assuming that $(\Omega, \mathcal{A}, P)$ is complete, we have $A \cap C \subseteq A$ with $A \in \mathcal{A}$
and $P(A) = 0$, and hence $A \cap C \in \mathcal{A}$ (of course $P(A \cap C) = 0$). Now $C =
(A \cap C) \cup (A^c \cap C) \in \mathcal{A}$. Thus for separable stochastic processes, functions
such as $\sup_{t \ge 0} X_t(\cdot)$ and $\inf_{t \ge 0} X_t(\cdot)$ are legitimate random variables.

Fortunately, every stochastic process $X = (X_t, t \in T)$ with state space
$S \subseteq \mathbb{R}$ and $T$ being an interval of $\mathbb{R}$ has a separable version, that is, a
stochastic process $\tilde{X} = (\tilde{X}_t, t \in T)$ which is equivalent to $X$. Thus in the
following, without loss of generality, we always assume that $(\Omega, \mathcal{A}, P)$ is a
complete probability space and that real-valued, continuous-time processes
are separable.

2.4 Some Important Properties of Stochastic


Processes
First consider a very special stochastic process $X = (X_n, n = 1, 2, \ldots)$ with
state space $S = \{0, 1\}$. We assume that the variables $X_n$ are indepen-
dent and have the same distribution, say,
\[
P(X_n = 1) = p = 1 - P(X_n = 0), \quad \forall n \ge 1.
\]
Such a process is called a Bernoulli process. It is a mathematical model to
describe random phenomena such as
(i) tossing a fair coin indefinitely ($p = 1/2$), where we set $X_n = 1$ or $0$
according to whether the outcome of the $n$th toss is "head" or "tail";
(ii) any experiment consisting of an indefinite number of trials, in which
trials are independent of each other, each trial has only two possible out-
comes called S(uccess) and F(ailure), and the chance of getting an S in each
trial is the same for all trials.
For such processes, their distributions are easy to specify in terms of
finite dimensional distributions, mainly because of the assumption of inde-
pendence. In general, stochastic processes $X = (X_t, t \in T)$ exhibit some
form of dependence among variables. Also, the distribution of $X_t$ can be
different from that of $X_{t'}$ for $t \ne t'$. The specification of their distributions
can be easily made if the processes possess some special properties.

In the following, we consider processes with $T \subseteq \mathbb{R}$ and $S \subseteq \mathbb{R}$. Some-
times, it is necessary to use the extended real line $\bar{\mathbb{R}} = [-\infty, \infty]$ so that
expressions like sup(remum) and inf(imum) exist on any arbitrary subset
of $\bar{\mathbb{R}}$.

In the Bernoulli process $(X_n, n \ge 1)$, if we are interested in the number
of S's in the first $n$ trials, then we set
\[
Y_n = X_1 + X_2 + \cdots + X_n.
\]
The stochastic process $(Y_n, n \ge 1)$ has the following properties:
(i) for all choices of $n_1, n_2, \ldots, n_m$ with $n_1 < n_2 < \cdots < n_m$, the random
variables $Y_{n_2} - Y_{n_1}, Y_{n_3} - Y_{n_2}, \ldots, Y_{n_m} - Y_{n_{m-1}}$ are independent;
(ii) for any $n$ and $m$, the distribution of the increment $Y_{n+m} - Y_n$
depends only on $m$ (and not on $n$).
The above properties can be shared by other processes.
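Before formalizing these properties, here is a minimal simulation sketch (Python with numpy; the parameter values are arbitrary choices) checking that increments of $(Y_n)$ over intervals of equal length share the same mean and variance:

import numpy as np

# Simulate a Bernoulli process and its partial sums Y_n = X_1 + ... + X_n.
rng = np.random.default_rng(1)
p, n_steps, reps = 0.3, 50, 100000
X = (rng.random((reps, n_steps)) < p).astype(int)   # i.i.d. X_n in {0,1}
Y = X.cumsum(axis=1)                                # Y_n, number of successes

# Stationary increments: Y_{n+m} - Y_n should be distributed as Y_m.
inc = Y[:, 29] - Y[:, 9]    # Y_30 - Y_10, an increment over m = 20 steps
ref = Y[:, 19]              # Y_20 itself
print(inc.mean(), ref.mean())   # both approximately 20 * p = 6
print(inc.var(), ref.var())     # both approximately 20 * p * (1 - p)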
Definition 2.1 The process $X = (X_t, t \in T)$ is said to have independent
increments if for all choices of $t_1, t_2, \ldots, t_n$ in $T$ with $t_1 < t_2 < \cdots < t_n$,
the random variables $X_{t_2} - X_{t_1}, X_{t_3} - X_{t_2}, \ldots, X_{t_n} - X_{t_{n-1}}$ are independent.
If for any $t, s \in T$ with $t < s$, the distribution of $X_s - X_t$ depends only on
$s - t$, then the process $X$ is said to have stationary increments.

Various stochastic processes studied in this text have stationary and


independent increments, such as Poisson processes (Lesson 4).
From a modeling point of view, the assumption of independent incre-
ments seems appropriate when the random phenomenon exhibits an obvious
fact that outcomes in disjoint time intervals are independent. The station-
ary increments property can be postulated when it seems plausible that the
distribution of outcomes in any time interval depends only on the length of
that interval.
For processes having stationary and independent increments, their finite
dimensional distributions are obtained simply in terms of distributions of
increments. Thus it suffices to specify the latter in applications.
Next, the process $(Y_n, n \ge 1)$, associated with a Bernoulli process
$(X_n, n \ge 1)$, has the following form of "conditional dependence":
For any $n \ge 1$, the conditional distribution of $Y_n$ given $Y_1, Y_2, \ldots, Y_{n-1}$
is the same as the conditional distribution of $Y_n$ given $Y_{n-1}$. Indeed, since
$Y_n = Y_{n-1} + X_n$,
\[
P(Y_n = k \,|\, Y_1 = k_1, \ldots, Y_{n-1} = k_{n-1}) = P(Y_n = k \,|\, Y_{n-1} = k_{n-1})
= P(X_n = k - k_{n-1}).
\]
Roughly speaking, the "future" $Y_n$ depends only on the "present" $Y_{n-1}$,
and not on the entire "past" $Y_1, Y_2, \ldots, Y_{n-2}$. In other words, given the
present $Y_{n-1}$, the future $Y_n$ is independent of the past $Y_1, Y_2, \ldots, Y_{n-2}$.
This property is formulated in the general case under the name of Markov
property.

Definition 2.2 The stochastic process $X = (X_t, t \in T)$ is called a Markov
process if it has the following Markov property:
For any $t, s \in T$ with $t < s$, the conditional distribution of $X_s$ given
$X_t$ is the same as the conditional distribution of $X_s$ given $\{X_u, u \le t\}$.
Specifically, for any choices of $t_1 < t_2 < \cdots < t_n$ in $T$ and $B \in \mathcal{B}(\mathbb{R})$, we have
\[
P\left(X_{t_n} \in B \,|\, X_{t_1} = x_1, \ldots, X_{t_{n-1}} = x_{n-1}\right) = P\left(X_{t_n} \in B \,|\, X_{t_{n-1}} = x_{n-1}\right).
\]


The Markov property is suitable for modeling situations in which future
behaviors of random phenomena are not altered by additional information
about their past, once their present states are known. Markov processes
will occupy a large portion of this text. Poisson processes (Lesson 4) are
examples of continuous-time Markov chains (when the state space $S$ is
discrete, that is, finite or infinitely countable, the Markov process is called
a Markov chain), and Brownian motion (Lesson 12) is a continuous-time
Markov process.
The specification of the finite dimensional distributions of a Markov process
is obtained by using
(i) transition probability functions, that is, expressions of the form
\[
P(X_s \in B \,|\, X_t = x), \quad t < s,
\]
and
(ii) the initial distribution of $X_{t_0}$, where $t_0$ is the smallest element of $T$.
Note that the independent increments property is a stronger requirement
than the Markov property.
Next, observe that the identical distributions of the $X_n$'s in a Bernoulli
process can be expressed as follows.
For any $n, m$, the distributions of $X_n$ and $X_{n+m}$ are the same. More
generally, in view of the independence property of Bernoulli processes, for
any $n_1, n_2, \ldots, n_m$ and $k$, the joint distributions of the random vectors
$(X_{n_1}, X_{n_2}, \ldots, X_{n_m})$ and $(X_{n_1+k}, X_{n_2+k}, \ldots, X_{n_m+k})$ are the same. This
property is formulated for other processes as follows.

Definition 2.3 A stochastic process $X = (X_t, t \in T)$ is strictly station-
ary if for any choices of $t_1, t_2, \ldots, t_n$ in $T$ and $h > 0$ such that $t_i + h \in T$
for each $i = 1, 2, \ldots, n$, the joint distributions of $(X_{t_1}, X_{t_2}, \ldots, X_{t_n})$ and
$(X_{t_1+h}, X_{t_2+h}, \ldots, X_{t_n+h})$ are the same.

As a consequence of the definition, we see that, for a strictly stationary
process, the distributions of the $X_t$'s are identical. Indeed, take $h = s - t$,
$n = 1$, and $t_1 = t$ in the definition (with arbitrary $t, s$ such that $t < s$); the
distribution of $X_t$ is the same as that of $X_{t+(s-t)} = X_s$.

When dealing with second order processes, i.e., processes $(X_t, t \in T)$
such that $E(X_t^2) < \infty$ for each $t$, the following concept of stationarity is
useful (Lesson 9 and Lesson 10).
Definition 2.4 A second order process $(X_t, t \in T)$ is called a weakly sta-
tionary process if the mean function $m(t) = E(X_t)$ is independent of $t$,
and the covariance function $\mathrm{Cov}(X_s, X_t)$ depends only on the difference
$|t - s|$.
Remark.
A weakly stationary process is also referred to as stationary in the wide
sense, or second order stationary. A second order process which is strictly
stationary is also weakly stationary (exercise). However, a strictly station-
ary process with infinite second moments cannot be weakly stationary.

From a modeling point of view, stationary processes are appropriate for
modeling random phenomena whose random behavior seems unchanged
through shifts in time, as in economics, communication theory, etc.
Finally, if we consider a process, say $(X_n, n \ge 1)$, such that the $X_n$'s
are independent and have means zero, then the associated process $Y_n =
\sum_{k=1}^{n} X_k$, $n \ge 1$, has the following property:
\[
E(Y_{n+1} \,|\, Y_1 = y_1, \ldots, Y_n = y_n) = y_n,
\]
since $Y_{n+1} = Y_n + X_{n+1}$ with $E(X_{n+1}) = 0$. This property is formulated as
follows.
Definition 2.5 A stochastic process $(X_t, t \in T)$ with $E(|X_t|) < \infty$ for
each $t \in T$ is called a martingale if for any choices of $t_1 < t_2 < \cdots < t_{n+1}$
in $T$, we have
\[
E\left(X_{t_{n+1}} \,|\, X_{t_1}, X_{t_2}, \ldots, X_{t_n}\right) = X_{t_n} \quad \text{(a.s.)}.
\]
Discrete-time martingales will be treated in Lesson 11. The concept of
martingales is appropriate for modeling random phenomena such as fair
games. Note that martingale and Markov properties are distinct concepts
(Exercise).

2.5 Exercises
2.1. Give several examples of random phenomena together with their math-
ematical modeling by stochastic processes.
2.2*. Let $(\Omega, \mathcal{A}, P)$ be a probability space.
(i) Define the collection $\bar{\mathcal{A}}$ of subsets of $\Omega$ as follows.
For $A \subseteq \Omega$, $A \in \bar{\mathcal{A}}$ if and only if there are $B_1, B_2 \in \mathcal{A}$ such that
$B_1 \subseteq A \subseteq B_2$ and $P(B_1) = P(B_2)$. Verify that $\mathcal{A} \subseteq \bar{\mathcal{A}}$ and $\bar{\mathcal{A}}$ is a $\sigma$-field.
(ii) Define $\bar{P} : \bar{\mathcal{A}} \to [0,1]$ by $\bar{P}(A) = P(B_1) = P(B_2)$. Show that $\bar{P}$ is
well-defined and is a probability measure on $\bar{\mathcal{A}}$.
(iii) Let $A \in \bar{\mathcal{A}}$ with $\bar{P}(A) = 0$. Show that if $B \subseteq A$, then $B \in \bar{\mathcal{A}}$.
($(\Omega, \bar{\mathcal{A}}, \bar{P})$ is called the completion of $(\Omega, \mathcal{A}, P)$.)
2.3*. Let $X_t$, $t \in [0, \infty)$, be a real-valued stochastic process.
(i) Verify that
\[
\left\{\omega : \sup_{t \ge 0} X_t(\omega) \le x\right\} = \bigcap_{t \ge 0} \{\omega : X_t(\omega) \le x\}.
\]
(ii) For $A, B, C \in \mathcal{A}$, show that if $C \subseteq B$ and $B \setminus C \subseteq A$, then
$A^c \cap B = A^c \cap C$.
(iii) Show that if the process $(X_t, t \ge 0)$ is separable, then the map
\[
\omega \to \inf_{t \ge 0} X_t(\omega)
\]
is a random variable.
(iv) Explain why the assumption of completeness of $(\Omega, \mathcal{A}, P)$ is neces-
sary in addressing the concept of separability of stochastic processes.
2.4. Let $(X_n, n \ge 1)$ be a Bernoulli process with state space $S = \{0, 1\}$
and probability of "success" $p = P(X_n = 1)$.
(i) Compute $P(X_2 = 0, X_5 = 1, X_8 = 1)$.
(ii) Give an explicit formula for computing the finite-dimensional distribu-
tions of the process.
(iii) Let $Y_n = \sum_{i=1}^{n} X_i$, $n \ge 1$. Verify that the process $(Y_n, n \ge 1)$ has
stationary and independent increments.
(iv) Is $(Y_n, n \ge 1)$ a Markov process?
(v) Is $(Y_n, n \ge 1)$ a martingale?
2.5*. Consider the experiment consisting of tossing a fair coin indefinitely.
The space of all possible outcomes is
\[
\Omega = \{0, 1\}^{\mathbb{N}} = \{\omega = (\omega_1, \omega_2, \ldots) : \omega_i = 0, 1\}.
\]
(i) What is the cardinality of $\Omega$?
(ii) Specify the probability space $(\Omega, \mathcal{A}, P)$ for modeling the above ex-
periment.
2.6. Let $(X_t, t \ge 0)$ be a (strictly) stationary process with $E(X_t^2) < \infty$,
$\forall t \ge 0$. Show that this process is weakly stationary.
2.7. Let $X_t, Y_t$, $t \in T$, be two stochastic processes defined on $(\Omega, \mathcal{A}, P)$.
Show that if $X_t = Y_t$ almost surely, that is, $P(\omega : X_t(\omega) = Y_t(\omega)) = 1$
for any $t \ge 0$, then $(X_t, t \ge 0)$ and $(Y_t, t \ge 0)$ have the same collection of
finite dimensional distributions.
2.8*. Let $(\Omega, \mathcal{A}, P) = ([0,1], \mathcal{B}([0,1]), dx)$ and $S = [0,1]$. Let $X_t(\omega) = 0$
for each $t \in S$ and $\omega \in [0,1]$; let $Y_t(\omega) = 0$ for $t \in S \setminus \{t_0\}$ and each
$\omega \in [0,1]$, and let $Y_{t_0}(\omega) = 1$ for $\omega = t_0$ and $Y_{t_0}(\omega) = 0$ elsewhere. Verify
that for any $t \in [0,1]$, $P(X_t = Y_t) = 1$.
2.9. Let $(X_n, n \ge 1)$ be a discrete-time stochastic process with state space
$S = \{0, 1, 2, \ldots\}$. Suppose that the variables $X_n$ are independent and
have the same probability density function $f$. Consider the process $Y_n =
\sum_{i=1}^{n} X_i$, $n \ge 1$.
(i) Verify that $(Y_n, n \ge 1)$ is a Markov chain.
(ii) Show that $P(Y_{n+1} = y \,|\, Y_n = x)$ is independent of $n$.
(iii) Compute $P(Y_1 = n_1, Y_2 = n_2, Y_3 = n_3)$ in terms of $f$.
(iv) Give an explicit formula for computing the finite-dimensional distribu-
tions of the process $(Y_n, n \ge 1)$.
2.10. Let $(X_n, n \ge 0)$ be a process having independent increments. Show
that such a process is necessarily a Markov process.
Lesson 3

Discrete-Time Markov Chains

This Lesson is devoted to detailed studies of an important class of stochastic


processes whose time-dependent structures are simple but general enough
to model a variety of practical random phenomena.

3.1 The Markov Model


Consider the following typical random phenomena.

Example 3.1 Consider a system in which two identical pieces of equip-
ment are installed in parallel. These pieces of equipment act independently
of each other and have a reliability of $a \in [0,1]$ in a day (meaning that the
probability that a piece of equipment fails during this time period is $1 - a$).
Initially, they are in working condition. We are interested in the number of
pieces of equipment which fail after $n$ days; the time between a good working
condition (both pieces of equipment are working) and the breakdown of the
system (both pieces of equipment fail); and so on.
If we let $X_n$, $n \ge 0$, be the number of pieces of equipment which are
not in working condition at the beginning of the $n$th day, then obviously
the $X_n$'s are random numbers with possible values 0, 1, 2. This random
phenomenon can be modeled by a discrete-time stochastic process whose
state space $S$ is finite.

Example 3.2 Suppose a commodity is stocked to satisfy a continuing de-
mand. The stock is checked at times $t_n$, $n \ge 1$. At each checking time, if
the stock is below some prescribed level $a$, then the stock level is brought
up to some prescribed level $b$ ($a < b$); otherwise, no replenishment is un-
dertaken. Since the demand for the commodity during each time interval
$[t_{n-1}, t_n)$ cannot be predicted with certainty, the stock level just before $t_n$
is a random number.
If we let $X_n$, $n \ge 0$, be the stock level just before time $t_n$, then $\{X_n, n \ge
0\}$ is a discrete-time stochastic process with finite state space $S = \{0, 1, \ldots, b\}$.

Example 3.3 This example is typical of every situation in which a facil-
ity for common use is provided and waiting and queueing situations are
encountered. Suppose that customers arrive for service at a taxi stand, and
that a cab arrives every five minutes. Assume that a single customer is
served during each time period. Since the number of customers who arrive
at the stand at time $n$ is random, so is the number $X_n$ of customers waiting
in line at the start of time period $n$. The discrete-time stochastic process
$\{X_n, n \ge 0\}$ has an infinitely countable state space.

Example 3.4 Suppose that the lifetime of some piece of equipment is mea-
sured in units of time, say, minutes. When a piece of equipment fails, it is
immediately replaced by an identical one, and so on.
If we let $X_n$ be the remaining lifetime of the piece of equipment in
use at time $n$, then the discrete-time stochastic process $\{X_n, n \ge 0\}$ has
$\{0, 1, 2, \ldots\}$ as state space.

Now, if we examine the above examples, then we recognize that all the
above stochastic processes $(X_n, n \ge 0)$ possess a common time-dependent
structure. Indeed, in Example 3.1, if we observe $X_0 = i_0, X_1 = i_1, \ldots,
X_n = i_n$, then the prediction of the "future" $X_{n+1}$ depends only on the
"present" state $X_n = i_n$ of the process. The knowledge of the "past",
namely $X_0, X_1, \ldots, X_{n-1}$, will not contribute any improvement to the
prediction of $X_{n+1}$. In other words, the present $X_n$ contains all information
concerning the prediction of $X_{n+1}$. This property is expressed mathemati-
cally as: for any $n \ge 0$,
\[
P(X_{n+1} = i_{n+1} \,|\, X_0 = i_0, \ldots, X_n = i_n) = P(X_{n+1} = i_{n+1} \,|\, X_n = i_n). \tag{3.1}
\]
In Example 3.2, the relation between $X_{n+1}$ and the demand $Y_{n+1}$ (dur-
ing $[t_n, t_{n+1})$) is
\[
X_{n+1} = \begin{cases}
X_n - Y_{n+1} & \text{when } a < X_n \le b, \; Y_{n+1} \le X_n \\
b - Y_{n+1} & \text{when } X_n \le a, \; Y_{n+1} \le b \\
0 & \text{otherwise.}
\end{cases}
\]

Thus the property (3.1) is satisfied when $Y_{n+1}$ is conditionally independent
of $X_0, X_1, \ldots, X_{n-1}$ given $X_n$, i.e.,
\[
P(Y_{n+1}, X_0, X_1, \ldots, X_{n-1} \,|\, X_n) = P(Y_{n+1} | X_n) P(X_0, X_1, \ldots, X_{n-1} \,|\, X_n).
\]
Indeed, for example,
\begin{align*}
& P(X_{n+1} = j \,|\, X_0 = i_0, \ldots, X_{n-1} = i_{n-1}, X_n = i) \\
&= P(Y_{n+1} = i - j \,|\, X_0 = i_0, \ldots, X_{n-1} = i_{n-1}, X_n = i) \\
&= \frac{P(Y_{n+1} = i - j, X_0 = i_0, \ldots, X_{n-1} = i_{n-1}, X_n = i)}{P(X_0 = i_0, \ldots, X_{n-1} = i_{n-1}, X_n = i)} \\
&= \frac{P(Y_{n+1} = i - j, X_0 = i_0, \ldots, X_{n-1} = i_{n-1} \,|\, X_n = i) P(X_n = i)}{P(X_0 = i_0, \ldots, X_{n-1} = i_{n-1}, X_n = i)} \\
&= \frac{P(Y_{n+1} = i - j, X_0 = i_0, \ldots, X_{n-1} = i_{n-1} \,|\, X_n = i)}{P(X_0 = i_0, \ldots, X_{n-1} = i_{n-1} \,|\, X_n = i)} \\
&= P(Y_{n+1} = i - j \,|\, X_n = i) \quad \text{(by conditional independence)} \\
&= P(X_{n+1} = j \,|\, X_n = i).
\end{align*}
In Example 3.3, if we use $Y_n$ to denote the (random) number of new
customers arriving during the $n$th period, then it is clear that
\[
X_{n+1} = \begin{cases}
X_n - 1 + Y_n & \text{if } X_n \ge 1 \\
Y_n & \text{if } X_n = 0.
\end{cases}
\]
It is plausible to assume that the $Y_n$'s are mutually independent and have
the same distribution (i.e., an independent and identically distributed (i.i.d.)
sequence of random variables). In this case, the property (3.1) is clearly
satisfied and the one-step transition probabilities $P(X_{n+1} = j \,|\, X_n = i)$ do
not depend on $n$, since
\[
P(X_{n+1} = j \,|\, X_n = 0) = P(Y_n = j),
\]
\[
P(X_{n+1} = j \,|\, X_n = i \ne 0) = P(Y_n = j - i + 1),
\]
noting that the distribution of $Y_n$ is the same for all $n \ge 0$.
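The recursion above is easy to simulate; the following minimal Python sketch (Poisson arrivals are an arbitrary choice for the common distribution of the $Y_n$'s, and the rate is illustrative) generates one path of the queue length:

import numpy as np

# Simulate the queue of Example 3.3: X_{n+1} = X_n - 1 + Y_n if X_n >= 1,
# and X_{n+1} = Y_n if X_n = 0, with i.i.d. arrivals Y_n.
rng = np.random.default_rng(3)
n_steps, X = 30, [0]
for n in range(n_steps):
    Y = rng.poisson(0.8)    # new arrivals during period n (illustrative rate)
    X.append(X[-1] - 1 + Y if X[-1] >= 1 else Y)
print(X)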
Similarly, in Example 3.4, we have
\[
X_{n+1} = \begin{cases}
X_n - 1 & \text{if } X_n \ge 1 \\
Y_{n+1} & \text{if } X_n = 0,
\end{cases}
\]
where $Y_{n+1}$ denotes the lifetime of the piece of equipment installed at time
$n$. It is plausible to assume that $(Y_n, n \ge 0)$ is an i.i.d. sequence and $Y_{n+1}$
is independent of $X_k$, $k \le n$. The property (3.1) is then clearly satisfied.
Thus random phenomena of the above type can be described by a com-
mon stochastic process model.

Definition 3.1 A discrete-time stochastic process $(X_n, n \ge 0)$ having a
countable state space $S$ is called a discrete-time Markov chain if it
satisfies the following Markov property:
\[
P(X_{n+1} = i_{n+1} \,|\, X_0 = i_0, X_1 = i_1, \ldots, X_n = i_n) = P(X_{n+1} = i_{n+1} \,|\, X_n = i_n) \tag{3.2}
\]
for all $n \ge 0$ and $i_0, i_1, \ldots, i_n \in S$. When the one-step transition prob-
abilities $P_{ij} = P(X_{n+1} = j \,|\, X_n = i)$ do not depend on the time $n$, we
say that $(X_n, n \ge 0)$ is a discrete-time Markov chain with stationary
transition probabilities.

In this introductory text, we consider only this simple type of Markov
chain. This Lesson deals with discrete-time chains; the continuous-time
case will be treated in Lesson 5.
Remark.
The above Markov property is equivalent to an apparently more general
form of Markov property, namely, for any $n_1 < \cdots < n_k < n_{k+1}$ and
states $i_1, \ldots, i_k, i_{k+1}$,
\[
P\left(X_{n_{k+1}} = i_{k+1} \,|\, X_{n_1} = i_1, \ldots, X_{n_k} = i_k\right) = P\left(X_{n_{k+1}} = i_{k+1} \,|\, X_{n_k} = i_k\right)
\]
(Exercise 3.3).
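Stationary transition probabilities also make such chains straightforward to simulate; the following minimal Python sketch (the two-state matrix and parameters are arbitrary illustrations) draws a path from a given transition matrix and initial distribution:

import numpy as np

def simulate_chain(P, pi0, n_steps, rng):
    # One path of a Markov chain with stationary transition matrix P
    # (rows sum to 1) and initial distribution pi0 over states 0..k-1.
    path = [rng.choice(len(pi0), p=pi0)]
    for _ in range(n_steps):
        path.append(rng.choice(P.shape[1], p=P[path[-1]]))
    return path

P = np.array([[0.4, 0.6],
              [0.1, 0.9]])
rng = np.random.default_rng(2)
print(simulate_chain(P, pi0=np.array([0.5, 0.5]), n_steps=20, rng=rng))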

3.2 Distributions of Markov Chains


As stated in Lesson 2, the distribution of a Markov chain $(X_n, n \ge 0)$ with
state space $S$ is characterized by its finite dimensional distributions. Let
$\pi_0$ be the initial distribution of the chain, i.e., $\pi_0$ is the distribution of $X_0$.
Let $\mathbb{P} = [P_{ij}]$, $i, j \in S$, be the (one-step) transition probability matrix, where
\[
P_{ij} = P(X_{n+1} = j \,|\, X_n = i), \quad \forall n \ge 0.
\]
Note that the entries $P_{ij}$ are all non-negative, and $\sum_{j \in S} P_{ij} = 1$, $\forall i \in S$.
Such a matrix $\mathbb{P}$ is called a stochastic matrix. We are going to show that
$\pi_0$ and $\mathbb{P}$ determine completely all finite dimensional distributions of the
chain.

Lemma 3.1 For all $n \ge 0$ and $i_0, i_1, \ldots, i_n \in S$, we have
\[
P(X_0 = i_0, X_1 = i_1, \ldots, X_n = i_n) = \pi_0(i_0) P_{i_0 i_1} P_{i_1 i_2} \cdots P_{i_{n-1} i_n}. \tag{3.3}
\]
Proof.
\begin{align*}
& P(X_0 = i_0, X_1 = i_1, \ldots, X_n = i_n) \\
&= P(X_n = i_n \,|\, X_0 = i_0, \ldots, X_{n-1} = i_{n-1}) P(X_0 = i_0, \ldots, X_{n-1} = i_{n-1}) \\
&= P_{i_{n-1} i_n} P(X_0 = i_0, \ldots, X_{n-1} = i_{n-1}) \quad \text{(by the Markov property)} \\
&= P_{i_{n-1} i_n} P(X_{n-1} = i_{n-1} \,|\, X_0 = i_0, \ldots, X_{n-2} = i_{n-2}) \\
&\qquad \times P(X_0 = i_0, \ldots, X_{n-2} = i_{n-2}) \\
&= \cdots = P_{i_{n-1} i_n} P_{i_{n-2} i_{n-1}} \cdots P_{i_0 i_1} \pi_0(i_0). \qquad \diamond
\end{align*}

To derive the joint distributions of $(X_{n_1}, X_{n_2}, \ldots, X_{n_k})$ for any integers
$n_1, n_2, \ldots, n_k$ and $k \ge 1$, we proceed as follows. We assume, without loss
of generality, that $n_1 < n_2 < \cdots < n_k$. Observe that for any $i_m \in S$,
$m = 1, 2, \ldots, k$, we have
\[
(X_{n_1} = i_1, \ldots, X_{n_k} = i_k)
= \bigcup \big( X_0 = s_0, X_1 = s_1, \ldots, X_{n_1 - 1} = s_{n_1 - 1}, X_{n_1} = i_1, X_{n_1 + 1} = s_{n_1 + 1},
\ldots, X_{n_2 - 1} = s_{n_2 - 1}, X_{n_2} = i_2, \ldots, X_{n_k - 1} = s_{n_k - 1}, X_{n_k} = i_k \big),
\]
where the union $\bigcup$ is over all $s_q \in S$, $q \in \bigcup_{m=0}^{k-1} I_m$, where $I_0 = \{n : 0 \le
n < n_1\}$ and, for $m = 1, 2, \ldots, k-1$, $I_m = \{n : n_m < n < n_{m+1}\}$. Thus, by
using Lemma 3.1, we obtain the following theorem.

Theorem 3.1 For any integers $k \ge 1$, $n_1, \ldots, n_k$ and $i_m \in S$, $m = 1, 2, \ldots, k$,
we have
\[
P\left(X_{n_1} = i_1, \ldots, X_{n_k} = i_k\right) = \sum \pi_0(s_0) P_{s_0 s_1} \cdots P_{s_{n_1 - 1} i_1} P_{i_1 s_{n_1 + 1}} \cdots P_{s_{n_k - 1} i_k}, \tag{3.4}
\]
where the summation $\sum$ is over all $s_q \in S$, $q \in \bigcup_{m=0}^{k-1} I_m$.

For example,
\[
P(X_1 = i, X_3 = j) = \sum_{x \in S} \sum_{y \in S} P(X_0 = x, X_1 = i, X_2 = y, X_3 = j)
= \sum_{x \in S} \sum_{y \in S} \pi_0(x) P_{xi} P_{iy} P_{yj}.
\]

Remarks.
(a) The Markov property implies that all future moves of the chain
depend only on the present state. Specifically, if $A$ is an event depending on
$X_{n+1}, X_{n+2}, \ldots$, then $P(A \,|\, X_0, \ldots, X_n) = P(A \,|\, X_n)$. More generally, for
any bounded function $f$ of $X_{n+1}, X_{n+2}, \ldots$, we have
\[
E(f(X_{n+1}, X_{n+2}, \ldots) \,|\, X_0, X_1, \ldots, X_n) = E(f(X_{n+1}, X_{n+2}, \ldots) \,|\, X_n).
\]


For example, using Theorem 3.1, we have
\begin{align*}
P\left(X_{n+2} = i_{n+2} \,|\, X_0 = i_0, \ldots, X_n = i_n\right)
&= \frac{P\left(X_0 = i_0, \ldots, X_n = i_n, X_{n+2} = i_{n+2}\right)}{P(X_0 = i_0, \ldots, X_n = i_n)} \\
&= \frac{\sum_{i_{n+1} \in S} \pi_0(i_0) P_{i_0 i_1} \cdots P_{i_{n-1} i_n} P_{i_n i_{n+1}} P_{i_{n+1} i_{n+2}}}{\pi_0(i_0) P_{i_0 i_1} \cdots P_{i_{n-1} i_n}} \\
&= \sum_{i_{n+1} \in S} P_{i_n i_{n+1}} P_{i_{n+1} i_{n+2}}.
\end{align*}
Now
\[
P(X_{n+2} = i_{n+2} \,|\, X_n = i_n) = \frac{P(X_{n+2} = i_{n+2}, X_n = i_n)}{P(X_n = i_n)}
= \frac{\sum_{i_{n+1} \in S} \left[\sum \pi_0(i_0) P_{i_0 i_1} \cdots P_{i_{n-1} i_n}\right] P_{i_n i_{n+1}} P_{i_{n+1} i_{n+2}}}{\sum \pi_0(i_0) P_{i_0 i_1} \cdots P_{i_{n-1} i_n}},
\]
where $\sum$ is over $i_0 \in S$, $i_1 \in S$, \ldots, $i_{n-1} \in S$. Hence
\[
P\left(X_{n+2} = i_{n+2} \,|\, X_0 = i_0, \ldots, X_n = i_n\right) = P\left(X_{n+2} = i_{n+2} \,|\, X_n = i_n\right).
\]
It is clear that
\[
P\left(X_{n+2} = j \,|\, X_n = i\right) = P(X_2 = j \,|\, X_0 = i), \quad \forall n \ge 0,
\]
and more generally,
\[
P\left(X_{n+k} = j \,|\, X_n = i\right) = P(X_k = j \,|\, X_0 = i), \quad \forall n \ge 0, \; \forall k \ge 1.
\]

(b) In the Markov property, the "present" is $X_n$ with $n$ being a fixed
time. It turns out that, for a class of special random times $T$, a strong
Markov property holds in which the present $n$ is replaced by $T$. For example,
the strong Markov property holds for stopping times. A random variable
$T$, taking values in $\{0, 1, 2, \ldots\}$, is a stopping time for $(X_n, n \ge 0)$ if for
every $n \ge 0$, the occurrence of the event $(T = n)$ can be determined from
$X_0, X_1, \ldots, X_n$.

The complete probabilistic knowledge of a Markov chain can be used to
compute various quantities of interest. For example, the distribution of $X_n$
is determined from $\pi_0$ and $\mathbb{P}$ as follows:
\[
P(X_n = j) = \sum_{i \in S} P(X_0 = i, X_n = j)
= \sum_{i \in S} P(X_n = j \,|\, X_0 = i) P(X_0 = i) = \sum_{i \in S} \pi_0(i) P_{ij}^n, \tag{3.5}
\]
where $P_{ij}^n$ denotes the $n$-step transition probability $P(X_n = j \,|\, X_0 = i)$. For
$n = 1$, we simply write $P_{ij}$, and for $n = 0$ we have
\[
P_{ij}^0 = \begin{cases} 1 & \text{if } i = j \\ 0 & \text{if } i \ne j. \end{cases}
\]
In the remark (a) after Theorem 3.1, we have $P_{ij}^2 = \sum_{k \in S} P_{ik} P_{kj}$. Thus,
if we view $\mathbb{P} = [P_{ij}]$, $i, j \in S$, as a matrix, then $P_{ij}^2$ is precisely the entry
$(i, j)$ of the product of the matrix $\mathbb{P}$ with itself, i.e., $\mathbb{P}^2$. Recursively, we
can get all $n$-step transition probabilities from $\mathbb{P}$. Moreover, $\forall n, m \ge 0$,
we have the following Chapman-Kolmogorov equation:
\[
P_{ij}^{n+m} = \sum_{k \in S} P_{ik}^n P_{kj}^m, \quad \forall i, j \in S. \tag{3.6}
\]

Indeed,
\begin{align*}
P_{ij}^{n+m} &= P(X_{n+m} = j \,|\, X_0 = i) = \sum_{k \in S} P(X_n = k, X_{n+m} = j \,|\, X_0 = i) \\
&= \sum_{k \in S} P(X_n = k \,|\, X_0 = i) P(X_{n+m} = j \,|\, X_n = k).
\end{align*}
In matrix notation, $\mathbb{P}^{n+m} = \mathbb{P}^n \mathbb{P}^m$, where $\mathbb{P}^n$ is the $n$th power of the
matrix $\mathbb{P}$. The direct computation of $\mathbb{P}^n$ might be hard. Approximations
to $\mathbb{P}^n$, for $n$ large, can be obtained when the chain is stable (see Section
3.4).
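In code, the $n$-step probabilities are just matrix powers; the following minimal sketch (numpy; the two-state matrix is an arbitrary illustration) checks the Chapman-Kolmogorov equation and computes the distribution of $X_n$ as in (3.5):

import numpy as np

P = np.array([[0.4, 0.6],
              [0.1, 0.9]])
pi0 = np.array([0.5, 0.5])          # initial distribution

P5 = np.linalg.matrix_power(P, 5)
P3 = np.linalg.matrix_power(P, 3)
P2 = np.linalg.matrix_power(P, 2)

print(np.allclose(P5, P3 @ P2))     # Chapman-Kolmogorov: P^(3+2) = P^3 P^2
print(pi0 @ P5)                     # distribution of X_5, as in (3.5)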

3.3 Classification and Decomposition of States


Let S be a countable set. When a Markov chain (Xn, n ~ 0) has S as its
state space, the elements of S will be classified according to their status with
regard to the probabilistic distribution of the chain. This section provides
a basis for the assymptotic analysis in Section 3.4.
First, consider the following

Example 3.5 Let $(X_n, n \ge 0)$ be a Markov chain with state space $S =
\{1, 2, 3, 4\}$ and transition matrix
\[
\mathbb{P} = \begin{bmatrix}
1 & 0 & 0 & 0 \\
0 & 0.3 & 0.7 & 0 \\
0 & 0.1 & 0.9 & 0 \\
0.2 & 0.3 & 0.4 & 0.1
\end{bmatrix}.
\]
We can describe the evolution of this chain by using a graphical repre-
sentation. In the following directed graph, the vertices are the states and
an arrow from state $i$ to state $j$ (with value $P_{ij}$ over it) indicates that it
is possible for the chain to move from $i$ to $j$ with probability $P_{ij}$ in one
transition.

[Directed graph of the chain, with arrows between states labeled by the
one-step transition probabilities, e.g., 0.7 and 0.9.]
Let us formulate the concept of communication between states, which,
in turn, is used to decompose the state space.

Definition 3.2 (a) State $j$ can be reached (or is accessible) from state $i$, in
symbols $i \to j$, if $P_{ij}^n > 0$ for some integer $n \ge 0$. In words, $j$ can be
reached from $i$ if, starting from $i$, the chain can reach $j$ in a finite number
of transitions with positive probability. Note that, since $P_{ii}^0 = 1$ for all $i$,
any state $i$ can be reached from itself.
(b) If $i \to j$ and $j \to i$, then $i$ and $j$ are said to communicate, in
symbols $i \leftrightarrow j$.

In Example 3.5, states 2 and 3 communicate, while states 1 and 4 do
not.
The binary relation of communication $\leftrightarrow$ on $S$ is an equivalence rela-
tion (Exercise 3.7); that is,
(i) $\leftrightarrow$ is reflexive: $\forall i \in S$, $i \leftrightarrow i$;
(ii) $\leftrightarrow$ is symmetric: if $i \leftrightarrow j$, then $j \leftrightarrow i$; and
(iii) $\leftrightarrow$ is transitive: if $i \leftrightarrow j$ and $j \leftrightarrow k$, then $i \leftrightarrow k$.

It is an elementary fact from algebra that each equivalence relation on
a set $S$ induces a partition of $S$. Here let
\[
\bar{i} = \{j \in S : j \leftrightarrow i\},
\]
the equivalence class of $i$. Then $S$ is partitioned into disjoint equivalence
classes. In Example 3.5,
\[
\bar{1} = \{1\}, \quad \bar{2} = \bar{3} = \{2, 3\}, \quad \bar{4} = \{4\}.
\]


We observe that 1 is an absorbing state since once the chain hits 1, it will
remain at 1 forever. Looking at the transition matrix $\mathbb{P}$, we see that a
state $i$ is an absorbing state when $P_{ii} = 1$.
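Communication classes can also be computed mechanically from the transition matrix; the sketch below (plain Python with numpy, using the matrix of Example 3.5 as reconstructed above, so treat the numbers as illustrative) finds them by mutual reachability:

import numpy as np

def communicating_classes(P):
    # Reachability in at most n steps via powers of the 0/1 adjacency matrix.
    n = len(P)
    A = (np.eye(n) + (np.asarray(P) > 0)).astype(int)
    R = np.linalg.matrix_power(A, n) > 0
    classes = []
    for i in range(n):
        cls = frozenset(j for j in range(n) if R[i, j] and R[j, i])
        if cls not in classes:
            classes.append(cls)
    return classes

P = np.array([[1.0, 0.0, 0.0, 0.0],
              [0.0, 0.3, 0.7, 0.0],
              [0.0, 0.1, 0.9, 0.0],
              [0.2, 0.3, 0.4, 0.1]])
print(communicating_classes(P))   # 0-based: [{0}, {1, 2}, {3}]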
We formulate next the concept that states can be revisited by the chain
periodically.

Example 3.6 If we consider a Markov chain $(X_n, n \ge 0)$ on $S = \{0, 1\}$
with $\mathbb{P} = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}$, then it is clear that state 0 is revisited by the chain
at the transition steps $2n$, $n \ge 1$, since $P_{00}^{2n} = 1$ ($P_{00}^{2n+1} = 0$). Thus 2 is
considered as the period of state 0.

Definition 3.3 Let $i \in S$. Then $i$ is said to be periodic with period $\delta(i)$
if $\delta(i) \ge 2$, where $\delta(i)$ is the greatest common divisor (g.c.d.) of all $n \ge 1$
for which $P_{ii}^n > 0$. If $\delta(i) = 1$, then $i$ is said to be aperiodic. If $P_{ii}^n = 0$
for all $n \ge 1$, then we define $\delta(i)$ to be zero.

In Example 3.6, $\delta(0) = \delta(1) = 2$. Also, states 0 and 1 belong to the
same (communication) equivalence class. This is, in fact, true in general:
states belonging to the same class have the same period; in other words,
periodicity is a class property. This can be seen as follows.
Let $i \in S$ and suppose $\bar{i}$ has at least two states. Let $j \in \bar{i}$ and $j \ne i$.
Since $i \leftrightarrow j$, there exist integers $n, m \ge 1$ such that $P_{ij}^n > 0$ and $P_{ji}^m > 0$.

By the Chapman-Kolmogorov equation, we have
\[
P_{jj}^{m+n} = \sum_{k \in S} P_{jk}^m P_{kj}^n \ge P_{ji}^m P_{ij}^n > 0.
\]
Thus $\delta(j) \ne 0$. Similarly, $\delta(i) \ne 0$. By symmetry, it suffices to show that
$\delta(i) \le \delta(j)$. This is achieved if $\delta(i)$ divides all $k$ such that $P_{jj}^k > 0$. Let $k$
be such that $P_{jj}^k > 0$. By the Chapman-Kolmogorov equation again, we have
\[
P_{ii}^{n+m+k} \ge P_{ij}^{n+k} P_{ji}^m, \qquad P_{ij}^{n+k} \ge P_{ij}^n P_{jj}^k,
\]
so that
\[
P_{ii}^{n+m+k} \ge P_{ij}^n P_{jj}^k P_{ji}^m > 0.
\]
By definition, $\delta(i)$ divides $n + m + k$ (as well as $n + m$, since $P_{ii}^{n+m} \ge
P_{ij}^n P_{ji}^m > 0$), so that $\delta(i)$ divides $k$. $\diamond$
The directed graph of Example 3.5 brings out the fact that the subset
$\{2, 3\}$ of the state space is closed in the sense that once the chain enters
it, the chain cannot get out of it. Mathematically, a non-empty subset $A$
of $S$ is closed if $\forall i \in A$, $\forall j \notin A$, and $n \ge 1$, $P_{ij}^n = 0$ (equivalently,
$P_{ij} = 0$ for all $i \in A$ and $j \notin A$). Of course, the absorbing state 1 in
Example 3.5 forms a closed set (as a singleton), as does the whole space $S$.
The closed set $A = \{2, 3\}$ is an irreducible (or minimal) closed set in the
sense that no proper subset of $A$ is closed. The state space $\{1, 2, 3, 4\}$ is
closed but not an irreducible closed set. A closed set $A$ is irreducible if and
only if all states in $A$ communicate (Exercise 3.7). A Markov chain whose
state space consists of a single equivalence class (that is, all states
communicate) is called an irreducible Markov chain.

Example 3.7 (A random walk with barrier). Let $(X_n, n \ge 0)$ be a Markov
chain on $S = \{0, 1, 2, \ldots\}$ with transition probabilities
\[
P_{ii} = 0, \; \forall i \in S, \qquad P_{0j} = \begin{cases} 1 & \text{if } j = 1 \\ 0 & \text{if } j \ne 1, \end{cases}
\]
and for $i \ne 0$,
\[
P_{ij} = \begin{cases} p & \text{if } j = i + 1 \\ 1 - p & \text{if } j = i - 1, \end{cases}
\]
where $p \in (0,1)$. It can be checked that the chain is irreducible. Thus all
states have the same period. Since $P_{00}^n > 0$ only when $n = 2k$, $k \ge 1$,
$\delta(0) = 2$, and the chain is periodic with period 2.
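A short simulation (Python; the parameter values are arbitrary) makes the period of state 0 visible:

import numpy as np

# Random walk with a barrier at 0 (Example 3.7): from 0 move to 1; from
# i >= 1 move up with probability p, down with probability 1 - p.
rng = np.random.default_rng(4)
p, n_steps, x, returns = 0.4, 10000, 0, []
for n in range(1, n_steps + 1):
    x = 1 if x == 0 else x + (1 if rng.random() < p else -1)
    if x == 0:
        returns.append(n)
print(returns[:10])                       # returns to 0 occur at even times
print(all(n % 2 == 0 for n in returns))   # True, consistent with period 2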

We consider now a classification of states according to the probability
that the chain will return to its original state.
In looking at the directed graph of Example 3.5, we realize that the
equivalence classes $\{1\}$, $\{2, 3\}$ and $\{4\}$ of $S$ are of two different types:
(i) $\{1\}$ (or $\{2, 3\}$) is a class such that once in it, the chain cannot leave
it.
(ii) $\{4\}$ is a class such that it is possible to leave it, but once leaving it,
the chain will never return to it.
The classes of type (i) are called recurrent classes and those of type (ii)
are transient classes.
We are going to formalize these concepts in precise mathematical terms.
Specifically, we will define the concepts of recurrent and transient states,
and then show that recurrence and transience are class properties.
A state $i \in S$ is recurrent if, starting at $i$, the chain will return to $i$
surely (in a finite number of transitions). Specifically:

Definition 3.4 A state $i \in S$ is said to be recurrent if
\[
P(X_n = i \text{ for some } n \ge 1 \,|\, X_0 = i) = 1.
\]
A non-recurrent state is called transient.

For $i, j \in S$ and $n \ge 1$, let
\[
f_{ij}^n = P\left(X_n = j, \; X_k \ne j, \; k = 1, 2, \ldots, n-1 \,|\, X_0 = i\right),
\]
which is the probability that, starting at $i$, the first visit to $j$ takes place
at the $n$th transition. For $n = 0$, define $f_{ij}^0 = 0$ for all $i, j$. Then
$f_{ij} = \sum_{n=1}^{\infty} f_{ij}^n$ is the probability that, starting at $i$, the chain ever vis-
its $j$. Obviously, $i$ is recurrent if and only if $f_{ii} = 1$ (and $i$ is transient if
and only if $f_{ii} < 1$).
We are going to establish a criterion for recurrence in terms of transition
probabilities of the chain. For this purpose, we need to relate the $f_{ii}^n$'s to
the $P_{ii}^n$'s.
For $|s| < 1$, let
\[
F_{ii}(s) = \sum_{n=0}^{\infty} f_{ii}^n s^n, \quad \text{and} \quad P_{ii}(s) = \sum_{n=0}^{\infty} P_{ii}^n s^n.
\]
Then
\[
P_{ii}(s) = 1 / (1 - F_{ii}(s)). \tag{3.7}
\]

Indeed,
\begin{align*}
F_{ii}(s) P_{ii}(s) &= \sum_{n=0}^{\infty} \left(\sum_{k=0}^{n} f_{ii}^k P_{ii}^{n-k}\right) s^n \\
&= \sum_{n=1}^{\infty} \left(\sum_{k=0}^{n} f_{ii}^k P_{ii}^{n-k}\right) s^n = P_{ii}(s) - 1,
\end{align*}
since $f_{ii}^0 = 0$, $P_{ii}^0 = 1$, and for all $n \ge 1$,
\[
P_{ii}^n = \sum_{k=0}^{n} f_{ii}^k P_{ii}^{n-k}. \tag{3.8}
\]
The proof of (3.8) goes as follows. Let $A = \{\omega : X_0(\omega) = i, X_n(\omega) = i\}$.
Observe that for $\omega \in A$, the corresponding realization of the chain returns
to $i$ at time $n$, but not necessarily for the first time. In fact, the first return
to $i$ must have occurred at one of the times $1, 2, \ldots, n$. Let
\[
A_k = \{\omega : X_0(\omega) = i, \; X_k(\omega) = i, \; X_m(\omega) \ne i, \; m = 1, \ldots, k-1\}.
\]
Then the $A_k$'s, $k = 1, 2, \ldots, n$, are disjoint and $A = \bigcup_{k=1}^{n} A_k$. Thus
\[
P(A) = \sum_{k=1}^{n} P(A_k).
\]
Now
\begin{align*}
P(A_k) &= P(X_n = i \,|\, X_0 = i, X_m \ne i, m = 1, \ldots, k-1, X_k = i) \\
&\qquad \times P(X_0 = i, X_m \ne i, m = 1, \ldots, k-1, X_k = i) \\
&= P(X_n = i \,|\, X_k = i) P\left(X_k = i, X_m \ne i, m = 1, \ldots, k-1 \,|\, X_0 = i\right) \\
&\qquad \times P(X_0 = i) \quad \text{(by the Markov property)} \\
&= P_{ii}^{n-k} f_{ii}^k P(X_0 = i).
\end{align*}
But
\[
P(A) = P(X_n = i \,|\, X_0 = i) P(X_0 = i),
\]
and hence (3.8) follows (note that $f_{ii}^0 = 0$).
Theorem 3.2 A state $i$ is recurrent if and only if $\sum_{n=1}^{\infty} P_{ii}^n = \infty$.

Proof.
(a) Necessity. Suppose $\sum_{n=1}^{\infty} f_{ii}^n = 1$. Then using (i) of Abel's lemma
(Exercise 3.10), we get $\lim_{s \nearrow 1} F_{ii}(s) = 1$. Thus by (3.7), $\lim_{s \nearrow 1} P_{ii}(s) =
\infty$. Using (ii) of Abel's lemma, we get $\sum_{n=1}^{\infty} P_{ii}^n = \infty$.

(b) Sufficiency. Suppose that $\sum_{n=1}^{\infty} P_{ii}^n = \infty$. If $\sum_{n=1}^{\infty} f_{ii}^n < 1$, then
by (i) of Abel's lemma, $\lim_{s \nearrow 1} F_{ii}(s) < 1$, which, by (3.7), implies that
$\lim_{s \nearrow 1} P_{ii}(s) < \infty$. But then, by using (ii) of Abel's lemma, $\sum_{n=1}^{\infty} P_{ii}^n <
\infty$, contradicting the hypothesis. $\diamond$
Remarks.
(i) While $f_{ii} = \sum_{n=1}^{\infty} f_{ii}^n$ is the probability that, starting at $i$, the state
$i$ is eventually re-entered, the sum $\sum_{n=1}^{\infty} P_{ii}^n$ is the expected number of
returns to $i$. To see this, let $N_i$ be the random variable whose value $N_i(\omega)$
is the number of times $i$ appears in the realization $X_1(\omega), X_2(\omega), \ldots$. In
terms of the indicator function $1_{\{i\}}(\cdot)$,
\[
N_i(\omega) = \sum_{n=1}^{\infty} 1_{\{i\}}(X_n(\omega)),
\]
so that
\[
E(N_i \,|\, X_0 = i) = \sum_{n=1}^{\infty} P(X_n = i \,|\, X_0 = i) = \sum_{n=1}^{\infty} P_{ii}^n.
\]
(ii) Another interpretation of recurrence is this. The state $i$ is recurrent
if and only if, with probability one, it is revisited infinitely often.
Specifically,
\[
P(N_i = \infty \,|\, X_0 = i) = 1.
\]
(Dually, $i$ is transient if and only if it is revisited finitely often, with proba-
bility one. Thus, after the last visit, $i$ will never be re-entered again.) This
can be seen as follows.
\[
P(N_i < \infty \,|\, X_0 = i) = \sum_{m=0}^{\infty} P(N_i = m \,|\, X_0 = i).
\]
Now from Exercise 3.16, we have
\[
P(N_i = m \,|\, X_0 = i) = (1 - f_{ii})(f_{ii})^m, \quad \text{for } m \ge 0.
\]
Thus if $f_{ii} = 1$, then $P(N_i < \infty \,|\, X_0 = i) = 0$, so that $P(N_i = \infty \,|\, X_0 =
i) = 1$. If $f_{ii} < 1$, then
\[
P(N_i < \infty \,|\, X_0 = i) = \sum_{m=0}^{\infty} (1 - f_{ii})(f_{ii})^m = 1.
\]
The following theorem shows that recurrence is a class property.



Theorem 3.3 If $i$ is recurrent and $i \leftrightarrow j$, then $j$ is also recurrent.

Proof. By hypothesis, $P_{ij}^n > 0$ and $P_{ji}^m > 0$ for some $n, m \ge 1$. For $k \ge 1$,
we have $P_{jj}^{n+m+k} \ge P_{ji}^m P_{ii}^k P_{ij}^n$. Thus
\[
\sum_{k=0}^{\infty} P_{jj}^{n+m+k} \ge P_{ji}^m P_{ij}^n \sum_{k=0}^{\infty} P_{ii}^k.
\]
Note that $P_{ji}^m P_{ij}^n > 0$. Thus it is clear that if $\sum_{k=0}^{\infty} P_{ii}^k = \infty$, then
$\sum_{k=0}^{\infty} P_{jj}^k = \infty$. $\diamond$

From the above, we see that within each equivalence class, states are
of the same nature. (For transience as a class property, see Exercise 3.11.)
Moreover, as will be shown below, no states outside of a recurrent class
$A$ can be reached from the states in $A$. However, recurrent states in $A$ can
be reached from transient states outside of $A$ (thus it is possible to leave a
transient class, but then the chain will never return to it).

Theorem 3.4 Suppose $i$ is a recurrent state, and $i \to j \ne i$. Then neces-
sarily $j$ is recurrent.

Proof. Let $T_j$ be the time of the first visit to $j$, that is,
\[
T_j(\omega) = \begin{cases}
\min\{n \ge 1 : X_n(\omega) = j\} & \text{when } X_n(\omega) = j \text{ for some } n \ge 1, \\
\infty & \text{if } X_n(\omega) \ne j \text{ for all } n \ge 1.
\end{cases}
\]
Since $i \to j$, we have $f_{ij} > 0$ (Exercise 3.13). But
\[
f_{ij} = P(T_j < \infty \,|\, X_0 = i) = \sum_{n=1}^{\infty} P(T_j = n \,|\, X_0 = i),
\]
so that $P(T_j = n \,|\, X_0 = i) > 0$ for some $n \ge 1$. Let $m = \min\{n \ge 1 :
P(T_j = n \,|\, X_0 = i) > 0\}$. Note that $P(T_j = m \,|\, X_0 = i) = f_{ij}^m > 0$. Also,
from Exercise 3.13, we have, $\forall n \ge 0$,
\[
P_{ij}^n = \sum_{k=0}^{n} f_{ij}^k P_{jj}^{n-k} \ge f_{ij}^n. \tag{3.9}
\]
Thus $P_{ij}^m > 0$. Now,
\[
P_{ij}^m = \sum_{x_1 \in S} \cdots \sum_{x_{m-1} \in S} P(X_1 = x_1, \ldots, X_{m-1} = x_{m-1}, X_m = j \,|\, X_0 = i),
\]
so that there exist states $x_1, \ldots, x_{m-1}$ such that
\[
P(X_1 = x_1, \ldots, X_{m-1} = x_{m-1}, X_m = j \,|\, X_0 = i)
= P_{i x_1} P_{x_1 x_2} \cdots P_{x_{m-1} j} = a > 0.
\]
Note that $x_k \ne i$ for $k = 1, 2, \ldots, m-1$; that is, $i$ leads to $j$ in $m$ steps
without returning to $i$ on the way. Indeed, if one of the $x_k$'s equalled $i$, then
$j$ could be reached from $i$ in fewer than $m$ steps, contradicting the fact that
$f_{ij}^k = 0$ for $k = 1, 2, \ldots, m-1$, by (3.9) and the definition of $m$.
Since the probability of never returning to $i$ (that is, $P(T_i = \infty \,|\, X_0 =
i) = 1 - f_{ii}$) is obviously greater than that of the event "the chain, starting
at $i$, visits $x_1, \ldots, x_{m-1}, j$ in the first $m$ steps and never returns to $i$ after
the $m$th transition", we have
\[
1 - f_{ii} \ge a(1 - f_{ji}) \ge 0.
\]
But, $i$ being recurrent, $f_{ii} = 1$; since $a > 0$, we must have $f_{ji} = 1$, implying
that $j \to i$ (see Exercise 3.13). The result then follows from Theorem 3.3. $\diamond$

Finally, we classify recurrent states further based on the concept of mean
recurrence time. The mean recurrence time $\mu_i$ of state $i$ is the expected
number of transitions needed to return to $i$, that is, $\mu_i = E(T_i \,|\, X_0 = i)$.
For a transient state $i$, $\mu_i = \infty$ since $P(T_i = \infty \,|\, X_0 = i) > 0$ (see Exercise
1.20). When $i$ is a recurrent state, the conditional distribution of $T_i$, given
$X_0 = i$, is $f_{ii}^n = P(T_i = n \,|\, X_0 = i)$, $n \ge 1$, so that
\[
\mu_i = \sum_{n=1}^{\infty} n f_{ii}^n \le \infty.
\]

Definition 3.5 A recurrent state $i$ is
(a) positive recurrent if $\mu_i < \infty$,
(b) null recurrent if $\mu_i = \infty$,
(c) ergodic if $i$ is positive recurrent and aperiodic.

In view of class properties (see exercises), when an arbitrary (countable)
state space $S$ is irreducible, then either all states are transient, or all states
are positive recurrent, or all states are null recurrent.
If the state space $S$ is finite, then:

Theorem 3.5 If $S$ is a finite state space of a Markov chain, then at least
one state is recurrent, and all recurrent states are positive recurrent.
Proof. Suppose $S$ is a transient chain, i.e., all states of $S$ are transient.
Then, for $j \in S$,
\[
\lim_{n\to\infty} P_{ij}^n = 0, \quad \forall i \in S \quad \text{(see Exercise 3.17)}.
\]
Since $S$ is finite, we have for any given $i$,
\[
\sum_{j \in S} \lim_{n\to\infty} P_{ij}^n = \lim_{n\to\infty} \sum_{j \in S} P_{ij}^n = 0.
\]
But for each $n \ge 0$,
\[
\sum_{j \in S} P_{ij}^n = P(X_n \in S \,|\, X_0 = i) = 1,
\]
so that
\[
\lim_{n\to\infty} \sum_{j \in S} P_{ij}^n = \lim_{n\to\infty} (1) = 1,
\]
a contradiction. Thus the set of recurrent states is non-empty. Suppose
that the subset $A$ of null recurrent states is non-empty. As for transient
states, it can be shown that if $j$ is null recurrent, then
\[
\lim_{n\to\infty} P_{ij}^n = 0 \quad \text{for all } i \in S.
\]
Let $i \in A$; then it follows that
\[
\lim_{n\to\infty} P_{ij}^n = 0, \quad \forall j \in A.
\]
Thus $\lim_{n\to\infty} \sum_{j \in A} P_{ij}^n = 0$. But since $A$ is closed (see Exercise 3.15), we
have, for each $n \ge 0$,
\[
\sum_{j \in A} P_{ij}^n = P(X_n \in A \,|\, X_0 = i \in A) = 1.
\]
We arrive at a contradiction as before. $\diamond$


Remark.
Using the same technique as in the proof of the above theorem, we deduce
that any finite closed set of states (of an infinitely countable state space
$S$) contains at least one recurrent state. In particular, all states in a finite
irreducible closed set are positive recurrent. When $S$ is finite, states in
irreducible closed sets are positive recurrent, and the remaining states (if
any) are transient. There is no null recurrent state when $S$ is finite. The
situation is different when $S$ is infinite. For example, let $(X_n, n \ge 1)$ be a
Markov chain on $S = \{0, 1, 2, \ldots\}$ with transition probabilities ($0 < p < 1$):
\[
P_{ij} = \begin{cases}
1 - p & \text{when } j = i \\
p & \text{when } j = i + 1 \\
0 & \text{otherwise.}
\end{cases}
\]
Then all states are transient. Indeed, for any $i \in S$, $i \to i + 1$ but $i + 1 \not\to i$
(see Exercise 3.13).
Thus, unlike the finite case, it is possible that there is no recurrent
state in the infinite case. Also, in the infinite case, it is possible that all
states in an infinite irreducible closed set are null recurrent. For example,
consider Example 3.4 in Section 3.1. Suppose lifetimes are measured in
units of time. Let $g(n)$ be the common probability mass function of the $Y_n$'s.
Then $(X_n, n \ge 0)$ is a Markov chain on $S = \{0, 1, 2, \ldots\}$ with transition
probabilities given by:
\[
\text{for } i \ge 1, \quad P_{ij} = \begin{cases} 1 & \text{if } j = i - 1 \\ 0 & \text{otherwise,} \end{cases}
\]
and $P_{0j} = g(j+1)$, $\forall j \in S$. Suppose $g(j) > 0$ for all $j \ge 1$. It can be
checked that the chain is irreducible. By examining the state 0, one finds
that the chain is recurrent. If, in addition, $\sum_{k=1}^{\infty} k g(k) = \infty$, then state 0,
and hence the chain, is null recurrent.
We close this section with a decomposition theorem. Again, in
Example 3.5, the state space $S = \{1, 2, 3, 4\}$ is decomposed into $\{1\} \cup
\{2, 3\} \cup \{4\}$, in which $\{4\}$ is the set of transient states, while $\{1\}$ and $\{2, 3\}$
are irreducible closed sets of recurrent states. This type of decomposition is
true in general. The state space of an arbitrary Markov chain can be decom-
posed into a set of recurrent states (denoted by $S_R$) and a set of transient
states (denoted by $S_T$), one of which may be empty. Moreover, when
$S_R \ne \emptyset$, it can be decomposed further, uniquely, into disjoint irreducible
closed sets.

Theorem 3.6 $S = S_T \cup S_R$, where a non-empty $S_R$ is the union of a finite
or countable collection of irreducible closed sets $A_1, A_2, \ldots$.

Proof. We will first show that for each $i \in S_R$, there is an irreducible closed
set $B(i)$ containing $i$. Since $S_R \subseteq S$ is at most countable, the collection of
such irreducible closed sets is at most countable. To obtain the theorem,
it then suffices to prove that any two $B(i)$ and $B(j)$ are either identical or
disjoint.
Let $i \in S_R$. Define $B(i) = \{j \in S_R : i \to j\}$. Obviously $i \in B(i)$ (since $i$
is recurrent). Let $j, k \in B(i)$; then $i \leftrightarrow j$ and $i \leftrightarrow k$ in view of Theorem
3.4 (its proof), and hence $j \leftrightarrow k$, so that $B(i)$ is irreducible. Let $j \in B(i)$
and $k \notin B(i)$. Then $j \not\to k$ since, otherwise, $i \to k$, so that $k \in B(i)$. Thus
$B(i)$ is closed. Suppose $B(i) \cap B(j) \ne \emptyset$. Let $x \in B(i)$. Then $x \leftrightarrow k$
where $k \in B(i) \cap B(j)$. But $k \in B(j)$; we must have $x \in B(j)$ since $B(j)$
is closed. Thus $B(i) \subseteq B(j)$. By symmetry, we conclude that $B(i) = B(j)$.
$\diamond$

Remark. In the following discussions, we assume that S_T ≠ ∅ and the A_i's are not all empty.
If the chain starts at a recurrent state i in A_m, then since A_m is closed, it will never leave A_m. On the other hand, if the starting state i is in S_T, then the chain can be absorbed into one of the A_m's. Thus we are interested in computing absorption probabilities.
Let A be an irreducible closed set of recurrent states. Generalizing the case where A is a singleton {j}, we define the hitting time of A as follows:

T_A(ω) = min{n ≥ 1 : X_n(ω) ∈ A}

when X_n(ω) ∈ A for some n ≥ 1 (and T_A(ω) = ∞ if X_n(ω) ∉ A for all n ≥ 1). Let i ∈ S_T. Then the probability that the chain is absorbed by A, starting at i, is

a(i, A) = P(T_A < ∞ | X_0 = i).
The following system of linear equations can be used to compute a(i, A), at least when S is finite. We have

a(i, A) = Σ_{n=1}^∞ P(T_A = n | X_0 = i)
        = P(T_A = 1 | X_0 = i) + Σ_{n=2}^∞ P(T_A = n | X_0 = i)
        = P(X_1 ∈ A | X_0 = i) + Σ_{n=2}^∞ P(X_1 ∈ S_T, T_A = n | X_0 = i)
        = Σ_{j∈A} P(X_1 = j | X_0 = i) + Σ_{n=2}^∞ Σ_{j∈S_T} P(X_1 = j, T_A = n | X_0 = i)
        = Σ_{j∈A} P_ij + Σ_{j∈S_T} P_ij a(j, A)   (by the Markov property).
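In matrix terms, this system reads (I - P_TT) a = P_TA 1 over the transient states. A minimal numerical sketch, with a hypothetical 5-state transition matrix (states 0, 1 form an irreducible closed recurrent set A, state 2 is absorbing, states 3, 4 are transient):

```python
import numpy as np

# Hypothetical chain: A = {0, 1} closed and recurrent, {2} absorbing,
# S_T = {3, 4} transient.
P = np.array([
    [0.3, 0.7, 0.0, 0.0, 0.0],
    [0.4, 0.6, 0.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0, 0.0],
    [0.2, 0.1, 0.3, 0.2, 0.2],
    [0.1, 0.0, 0.4, 0.3, 0.2],
])
A = [0, 1]   # target closed set
T = [3, 4]   # transient states S_T

# a(i, A) = sum_{j in A} P_ij + sum_{j in S_T} P_ij a(j, A), i in S_T.
P_TT = P[np.ix_(T, T)]
r = P[np.ix_(T, A)].sum(axis=1)
a = np.linalg.solve(np.eye(len(T)) - P_TT, r)
print(dict(zip(T, a)))   # absorption probabilities a(3, A), a(4, A)
```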

3.4 Stationary Distributions


Consider a Markov chain (X_n, n ≥ 0) on a countable state space S with initial distribution π_0 and transition matrix IP. We are interested in knowing whether or not, after a long time n, the chain will attain a stationary mode. As we will see, such behavior of the chain depends on the matrix IP, and not on π_0. Observe that the Markov chain (X_n, n ≥ 0) will be a (strictly) stationary process if all the X_n's have the same distribution. Indeed, suppose
π_0 is the common distribution of X_n, ∀n ≥ 0. Then for any n_1 < n_2 < ... < n_k and m, we have

P(X_{n_1+m} = i_1, X_{n_2+m} = i_2, ..., X_{n_k+m} = i_k)
  = P(X_{n_1+m} = i_1) P^{n_2-n_1}_{i_1 i_2} ... P^{n_k-n_{k-1}}_{i_{k-1} i_k}   (by the Markov property)
  = π_0(i_1) P^{n_2-n_1}_{i_1 i_2} ... P^{n_k-n_{k-1}}_{i_{k-1} i_k}
  = P(X_{n_1} = i_1, X_{n_2} = i_2, ..., X_{n_k} = i_k),

so that (X_{n_1}, X_{n_2}, ..., X_{n_k}) and (X_{n_1+m}, X_{n_2+m}, ..., X_{n_k+m}) have the same joint distribution.
Thus if, for all n sufficiently large, the X_n's have the same distribution, and we take this common distribution as the initial distribution for the chain, then the chain will be stationary. (We also say that the chain is stable or in steady state mode.)
We are led to consider the limiting distribution of the chain, namely π(j) = lim_{n→∞} P(X_n = j), j ∈ S. Now,

P(X_n = j) = Σ_{i∈S} π_0(i) P^n_{ij},  ∀j ∈ S,   (3.10)

where π_0 is the initial distribution.
We see that the existence of π depends on lim_{n→∞} P^n_{ij}. It is clear that if the limits lim_{n→∞} P^n_{ij} = α(j) exist and α(·) is a probability distribution (i.e., α(j) ≥ 0 and Σ_{j∈S} α(j) = 1), then α(·) is the limiting distribution of the chain. Note that lim_{n→∞} P^n_{ij} should not depend on i.
In view of this, there is no hope of establishing the existence of the limiting distribution in situations such as:
(i) In Example 3.6, the chain is irreducible, recurrent, and periodic with period 2. We have

P^n_{00} = { 0  if n is odd
           { 1  if n is even.

Thus, P^n_{00} does not have a limit as n → ∞. The problem here is the periodicity.
(ii) Consider a Markov chain such that its state space S has two disjoint closed sets A and B. Then, for each n ≥ 1,

P(X_n ∈ A | X_0 = i) = { 1  if i ∈ A
                       { 0  if i ∈ B,

and hence lim_{n→∞} P(X_n ∈ A | X_0 = i) depends on the initial state i.



The asymptotic behavior of P^n_{ij} depends on the nature of the states i, j (transient, null recurrent, positive recurrent), which is determined by the transition matrix IP.
Consider the Markov chain (X_n, n ≥ 0) on S = {0, 1} with

IP = [ 1-a   a  ]
     [  b   1-b ]

with a, b ∈ (0, 1). Then it can be checked that

lim_{n→∞} IP^n = (1/(a+b)) [ b  a ]
                           [ b  a ],

so that α(j) = lim_{n→∞} P^n_{ij} exists and α(·) is indeed a probability distribution:

α(0) = lim_{n→∞} P^n_{00} = lim_{n→∞} P^n_{10} = b/(a+b)

and

α(1) = lim_{n→∞} P^n_{01} = lim_{n→∞} P^n_{11} = a/(a+b).
As mentioned in Section 3.2, in general the computation of IP^n is hard; in fact, determining the limiting distribution of the chain, when it exists, by other means will provide approximations for IP^n for large n.
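For this two-state chain the limit can also be checked numerically; a minimal sketch (the values of a and b are illustrative):

```python
import numpy as np

a, b = 0.3, 0.1   # illustrative rates
P = np.array([[1 - a, a],
              [b, 1 - b]])

# Brute-force P^n for large n against the closed-form limit above.
Pn = np.linalg.matrix_power(P, 200)
limit = np.array([[b, a], [b, a]]) / (a + b)
print(np.allclose(Pn, limit))   # True
```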
To determine the limiting distribution π, we observe that π is invariant, that is,

π(j) = Σ_{i∈S} π(i) P_ij,  ∀j ∈ S.   (3.11)
An invariant distribution is also called a stationary distribution. Indeed,

π(j) = lim_{n→∞} P(X_{n+1} = j) = lim_{n→∞} Σ_{i∈S} P(X_n = i) P_ij
     = Σ_{i∈S} [lim_{n→∞} P(X_n = i)] P_ij = Σ_{i∈S} π(i) P_ij

(by the dominated convergence theorem, see Appendix). In fact, π is the unique invariant distribution. Indeed, if π' is another invariant distribution, then taking π_0 = π', we have, for n ≥ 0, P(X_n = j) = π'(j) (see Remark below), so that lim_{n→∞} P(X_n = j) = π'(j), implying that π'(·) = π(·), since the convergence in distribution of the X_n's to π(·) is valid for any initial distribution π_0.
Remark. If the initial distribution π_0 is invariant (stationary), then the X_n's have the same distribution (and hence the chain is a strictly stationary process). This can be seen as follows. Suppose

π_0(j) = Σ_{i∈S} π_0(i) P_ij,  j ∈ S.   (3.12)

Then we can write

π_0(j) = Σ_{i∈S} π_0(i) P^n_{ij},  with n = 1.   (3.13)

Suppose that (3.13) holds for n. Then

P(X_{n+1} = j) = Σ_{i∈S} π_0(i) P^{n+1}_{ij}
              = Σ_{i∈S} π_0(i) Σ_{k∈S} P^n_{ik} P_kj = Σ_{k∈S} [Σ_{i∈S} π_0(i) P^n_{ik}] P_kj
              = Σ_{k∈S} π_0(k) P_kj   (by the induction hypothesis)
              = π_0(j).

Therefore (3.13) holds for all n ≥ 1. Thus, if the limiting distribution π of the chain exists, and we take π_0 = π, then all the X_n's will have the same distribution π. The converse is obvious: if the X_n's have a common distribution π, then π is stationary. Indeed, π(j) = P(X_0 = j) = P(X_n = j), and

P(X_1 = j) = Σ_{i∈S} P(X_0 = i) P_ij.
However, it should be noted that a Markov chain may have a (unique) stationary distribution without being stable; that is, it is not true that the X_n's will converge in distribution to this stationary distribution for any initial distribution π_0 (examine Example 3.6).
In the following, we will investigate the problem of existence of sta-
tionary distributions, and then the problem of convergence to a stationary
distribution.
In view of the Decomposition Theorem 3.6, the state space S consists
of a set of transient states, and a collection of irreducible, recurrent closed
sets. The restriction of the chain to each closed set is a Markov chain (called
a sub-chain). Thus it suffices to look at irreducible chains.
The following Theorem 3.7 shows that
a) An irreducible transient (or null recurrent) chain has no stationary
distribution.
b) An irreducible positive recurrent chain has a unique stationary dis-
tribution.
c) For an arbitrary Markov chain, if the chain does not have positive recurrent states, then it has no stationary distribution. If the set of all positive recurrent states of the chain is non-empty and forms an irreducible set, then the chain has a unique stationary distribution; if the set of all positive recurrent states of the chain is non-empty and is decomposed into distinct irreducible closed sets, then the chain has distinct stationary distributions.

Theorem 3.7 Let (X_n, n ≥ 0) be an irreducible Markov chain.
(a) If the chain has a stationary distribution π, then necessarily π is given by

π(i) = 1/μ_i,  i ∈ S,

where μ_i is the mean recurrence time of i (hence π is unique). Moreover, in this case, all states are positive recurrent.
(b) Conversely, if the (irreducible) chain is positive recurrent, then the function π on S defined by π(i) = 1/μ_i, i ∈ S, is the unique stationary distribution.

Proof. (a) Suppose that the chain has a stationary distribution π. Since the chain is irreducible, all states are either transient or recurrent. If the chain is transient, then lim_{n→∞} P^n_{ij} = 0 for all i, j ∈ S. Now, since π is a stationary distribution, we have, for each n and j ∈ S:

π(j) = Σ_{i∈S} π(i) P^n_{ij}.

Thus

π(j) = lim_{n→∞} Σ_{i∈S} π(i) P^n_{ij} = Σ_{i∈S} π(i) lim_{n→∞} P^n_{ij} = 0

(by the dominated convergence theorem), contradicting the fact that π is a probability distribution. Hence the chain has to be recurrent.
Recall that μ_i = E(T_i | X_0 = i), where T_i is the time of the first visit to i. Since T_i is a positive random variable, its expectation can be written as

μ_i = Σ_{n=1}^∞ P(T_i ≥ n | X_0 = i) = Σ_{n=1}^∞ P(T_i ≥ n, X_0 = i) / P(X_0 = i).

Take π_0 = π, so that P(X_0 = i) = π(i) and the chain becomes a stationary process. Now since (X_0 = i) ⊆ (T_i ≥ 1),

P(T_i ≥ 1, X_0 = i) = P(X_0 = i).

Note that if A, B are two events, then P(AB) = P(A) - P(AB'), where B' denotes the complement of B. For n ≥ 2, let

A = (X_m ≠ i, m = 1, ..., n-1),  B = (X_0 = i);

then

P(T_i ≥ n, X_0 = i) = P(X_m ≠ i, m = 1, ..., n-1, X_0 = i)
  = P(X_m ≠ i, m = 1, ..., n-1) - P(X_m ≠ i, m = 0, 1, ..., n-1)
  = P(X_m ≠ i, m = 0, 1, ..., n-2) - P(X_m ≠ i, m = 0, 1, ..., n-1)
  = a_{n-2} - a_{n-1},

where a_n = P(X_m ≠ i, m = 0, 1, ..., n) (the second equality uses the stationarity of the chain). Since the events A_n = (X_m ≠ i, m = 0, 1, ..., n) form a non-increasing sequence of events, we have A_n → ∩_n A_n = (X_m ≠ i, m ≥ 0) and hence

lim_{n→∞} a_n = P(X_m ≠ i, m ≥ 0) = 0,

since i is a recurrent state.
Now, from the above, we have

μ_i π(i) = π(i) + Σ_{n≥2} (a_{n-2} - a_{n-1})
         = π(i) + a_0 - lim_{n→∞} a_n
         = P(X_0 = i) + P(X_0 ≠ i) = 1.
Now, observe that π(i) > 0, ∀i ∈ S. Indeed, if π(j) = 0 for some j, then since π is a stationary distribution, we have

0 = π(j) = Σ_{k∈S} π(k) P^n_{kj} ≥ π(i) P^n_{ij}

for all n and all i ∈ S. On the other hand, since the chain is irreducible by hypothesis, for each i there exists an n such that P^n_{ij} > 0; thus π(i) = 0 for all i ∈ S, which is impossible since, again, π is a probability distribution on S. Thus μ_i = 1/π(i) < ∞, ∀i ∈ S, implying that all states are positive recurrent.

(b) Suppose now that (X_n, n ≥ 0) is an irreducible, positive recurrent chain, so that ∀i ∈ S, 0 < μ_i < ∞. Define π : S → ℝ_+ by

π(i) = 1/μ_i.

We are going to verify directly that π is indeed a stationary distribution, that is,

Σ_{i∈S} π(i) = 1  and  π(j) = Σ_{i∈S} π(i) P_ij,  ∀j ∈ S.

Let N_j(n) denote the number of visits to j during the first n transitions. Then

N_j(n) = Σ_{k=1}^n 1_j(X_k),

so that

E(N_j(n) | X_0 = i) = Σ_{k=1}^n P^k_{ij}.

Since Σ_{j∈S} P^k_{ij} = 1 for all k, we have

Σ_{j∈S} (1/n) E(N_j(n) | X_0 = i) = 1  for all n.   (3.14)

We are going to show that

lim_{n→∞} (1/n) E(N_j(n) | X_0 = i) = 1/μ_j,  ∀i, j ∈ S.   (3.15)
For this purpose, we fix j, and let T_m, m ≥ 1, be the time of the mth visit to j, that is,

T_m = min{n ≥ 1 : N_j(n) = m}.

By the probabilistic nature of the Markov chain, it is clear that the random variables T_1, T_2 - T_1, ..., T_m - T_{m-1} are independent and identically distributed. Conditioned on X_0 = j, the common mean of these random variables is μ_j. By the strong law of large numbers,

T_m/m = [T_1 + (T_2 - T_1) + ... + (T_m - T_{m-1})]/m → μ_j,  as m → ∞.

Now, T_{N_j(n)} ≤ n ≤ T_{N_j(n)+1}, so that

T_{N_j(n)}/N_j(n) ≤ n/N_j(n) ≤ T_{N_j(n)+1}/N_j(n).

Observe that since j is positive recurrent, N_j(n) → ∞ almost surely as n → ∞. Thus N_j(n)/n → 1/μ_j almost surely as n → ∞, and (3.15) follows by taking expectations under X_0 = i (the convergence of expectations is justified by the dominated convergence theorem, see Appendix). When S is finite, we have

Σ_{j∈S} 1/μ_j = Σ_{j∈S} lim_{n→∞} [(1/n) E(N_j(n) | X_0 = i)]
             = lim_{n→∞} Σ_{j∈S} (1/n) E(N_j(n) | X_0 = i) = 1
in view of (3.14) and (3.15). To verify that π(j) = 1/μ_j is stationary, we look at

Σ_{j∈S} (1/n) E(N_j(n) | X_0 = i) P_jk = Σ_{j∈S} (1/n) (Σ_{m=1}^n P^m_{ij}) P_jk
                                      = (1/n) [E(N_k(n+1) | X_0 = i) - P_ik]

(noting that Σ_{j∈S} P^m_{ij} P_jk = P^{m+1}_{ik}). Thus, when S is finite, we let n → ∞ in the above equality and obtain the desired result:

Σ_{j∈S} (1/μ_j) P_jk = 1/μ_k,  ∀k ∈ S.

The same result holds when S is infinite, but with much more technical details. □

Corollary 3.1 An irreducible Markov chain on a finite state space has a unique stationary distribution.
Corollary 3.1 An irreducible Markov chain on a finite state space has a
unique stationary distribution.

Proof. In view of Theorem 3.5, such a chain is positive recurrent, and the result follows from Theorem 3.7. □
Remark. In general, a Markov chain on a finite state space has at least
one stationary distribution. It is not necessarily so if the state space is
infinite.
As mentioned previously, even if a Markov chain has a unique stationary distribution π, it may happen that π is not the limiting distribution of the chain, i.e., the chain may not be stable. However, if this unique stationary distribution π is such that lim_{n→∞} P^n_{ij} = π(j), j ∈ S, then necessarily π is the limiting distribution of the chain. It can be shown that for an irreducible, aperiodic chain, we have

lim_{n→∞} P^n_{ij} = 1/μ_j,  for all i, j ∈ S.

If the irreducible chain (X_n, n ≥ 0) is periodic with period δ, then the chain (X_{δn}, n ≥ 0) is aperiodic, and hence

P^{δn}_{ii} = P(X_{δn} = i | X_0 = i) → δ/μ_i,  as n → ∞.
As a consequence, any irreducible, aperiodic and positive recurrent chain (i.e., an ergodic chain) is a stable chain. In particular, an irreducible, aperiodic Markov chain on a finite state space has a unique stationary distribution π which is the solution of

Σ_{i∈S} π(i) P_ij = π(j), ∀j ∈ S,   Σ_{i∈S} π(i) = 1.

Moreover,

lim_{n→∞} P^n_{ij} = π(j),  ∀i, j ∈ S.

Note that, in such a case, approximations to IP^n, for n large, can be obtained.
We close this Lesson with an important stochastic model in physics and biology (population growth). Consider the following random phenomenon. A population of "objects" evolves in generations as follows. Each object can produce identical objects (offspring). The offspring of members of the initial (zero) generation form the first generation, the offspring of the members of the first generation form the second generation, and so on. Suppose that each object independently produces offspring according to a probability mass function f on S = {0, 1, 2, ...}.
Let Z_i(n) be the number of offspring of the ith member of the nth generation; then the Z_i(n)'s are independent and have f as common offspring distribution. Let X_n be the size of the nth generation. Then

X_{n+1}(ω) = { Z_1(n)(ω) + ... + Z_{X_n(ω)}(n)(ω)  for X_n(ω) > 0
             { 0                                  for X_n(ω) = 0.

It follows that

P(X_{n+1} = j | X_n = i) = P(Z_1(n) + ... + Z_i(n) = j),

which can be computed in terms of f either by using convolution or by using generating functions. For example, suppose f(n) = e^{-λ} λ^n/n!, n = 0, 1, ... (Poisson distribution); then

P_ij = e^{-λi} (λi)^j / j!,  i, j ∈ {0, 1, 2, ...}.

The Markov chain (X_n, n ≥ 0) on S = {0, 1, 2, ...} so obtained is called a branching process.
The state 0 is an absorbing state, and the chain becomes extinct when it is absorbed by 0. However, if f(0) = P(Z = 0) = 0, then P(Z ≥ 1) = 1, so that the sequence X_n is almost surely non-decreasing, and hence the extinction of the population is impossible. So let us assume f(0) > 0. Note also that if f(1) = 1 - f(0), then the sequence X_n is non-increasing (a.s.), so that extinction cannot be avoided. Thus, in the sequel, we assume that

f(0) > 0 and f(0) + f(1) < 1,   (3.16)

and we are interested in computing the probability that the population ever dies out, in other words, the absorption probability into the state 0.
The state 0 forms the only positive recurrent class of the chain. The other states 1, 2, ... form a class of transient states, since it can be checked that all these states lead to 0, but 0 ↛ i, ∀i ≥ 1. The branching process is not irreducible. But clearly the probability distribution π concentrated on {0}, given by π(0) = 1 (π(i) = 0, ∀i ≥ 1), is the unique stationary distribution of the chain. However, the asymptotic behavior of the chain cannot be learned from such a π. As we will see, under (3.16), either, with probability one, the population will become extinct, or there is a positive probability that the population will grow indefinitely.
First, the time of absorption is

T(ω) = inf{n : X_n(ω) = 0} ≤ ∞,

where inf ∅ = ∞. The probability of extinction, given X_0 = 1, is

a = P(T < ∞ | X_0 = 1).

By the independence assumption, we have

P(T < ∞ | X_0 = i) = a^i,  i ≥ 1.

Since the subset of states {1, 2, ...} is a transient class, we see that

lim_{n→∞} X_n(ω) = { 0  if T(ω) < ∞
                   { ∞  if T(ω) = ∞.

Thus

P(lim_{n→∞} X_n = 0 | X_0 = i) = P(T < ∞ | X_0 = i) = a^i

and

P(lim_{n→∞} X_n = ∞ | X_0 = i) = P(T = ∞ | X_0 = i) = 1 - a^i,

for all i ∈ S = {0, 1, 2, ...}.
If a = 1, then extinction is certain. If a < 1, then the probability of extinction, given X_0 = i, is a^i < 1, and 1 - a^i > 0 is the probability that the population grows to infinity.

We are now going to determine the probability a. Let G be the probability generating function of the offspring distribution, that is,

G(t) = Σ_{n=0}^∞ f(n) t^n,  t ∈ [0, 1].

Now, it is clear that, ∀i, j ∈ S,

P(T_j = n+1 | X_0 = i) = Σ_{k≠j} P_ik P(T_j = n | X_0 = k),

for all n ≥ 1. Thus

f_ij = P(T_j < ∞ | X_0 = i) = Σ_{n=1}^∞ P(T_j = n | X_0 = i)
     = P(T_j = 1 | X_0 = i) + Σ_{n=2}^∞ P(T_j = n | X_0 = i)
     = P(X_1 = j | X_0 = i) + Σ_{n=2}^∞ Σ_{k≠j} P_ik P(T_j = n-1 | X_0 = k)
     = P_ij + Σ_{k≠j} P_ik f_kj.

For i = 1 and j = 0, we get

a = P(T_0 < ∞ | X_0 = 1) = f_10 = P_10 + Σ_{k=1}^∞ P_1k a^k = Σ_{k=0}^∞ P_1k a^k = G(a),

by noting that

Σ_{n=1}^∞ P(T_0 = n | X_0 = k) = P(T_0 < ∞ | X_0 = k) = f_k0 = a^k.

Thus a is a positive solution of the equation

t = G(t) on [0, 1].   (3.17)

Note that 1 is always a solution. It can be shown that if the mean of the offspring distribution μ = Σ_{n=1}^∞ n f(n) ≤ 1, then (3.17) has no roots in [0, 1), so that a = 1. On the other hand, if μ > 1, then (3.17) has a unique root t_0 in [0, 1), and necessarily a = t_0.
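The smallest root of t = G(t) can be found by iterating t ← G(t) from t = 0. A minimal sketch for a hypothetical Poisson(λ) offspring law, where G(t) = e^{λ(t-1)}:

```python
import math

def extinction_probability(lam, tol=1e-12):
    # Iterate t <- G(t) from 0; the iterates increase to the smallest root.
    t = 0.0
    while True:
        t_next = math.exp(lam * (t - 1.0))   # G(t) for Poisson(lam) offspring
        if abs(t_next - t) < tol:
            return t_next
        t = t_next

print(extinction_probability(0.8))   # mean <= 1: extinction certain, a = 1
print(extinction_probability(1.5))   # mean > 1: a < 1 (about 0.417)
```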

3.5 Exercises
3.1. Verify the Markov property of Example 3.1 and find the one-step
transition probability matrix.
3.2. Show that if (X_n, n ≥ 0) is a sequence of i.i.d. discrete random variables, then it is a Markov chain.
3.3. Let (X_n, n ≥ 0) be a Markov chain.
(i) Show that, ∀n, X_{n+1} is conditionally independent of X_0, X_1, ..., X_{n-1}, given X_n. Hint:

P(AB|C) = P(A|C)P(B|C) ⟺ P(A|BC) = P(A|C).

(ii) Use (i) to show that for n_1 < ... < n_k < n_{k+1},

P(X_{n_{k+1}} = i_{k+1} | X_{n_1} = i_1, ..., X_{n_k} = i_k) = P(X_{n_{k+1}} = i_{k+1} | X_{n_k} = i_k).
3.4. (Ehrenfest Model) Two urns U_1 and U_2 contain M balls in total. A ball is drawn at random. This selected ball is transferred from the urn it is in to the other. Let X_n denote the number of balls in U_1 at the end of the nth trial.
(a) Show that (X_n, n ≥ 0) is a Markov chain.
(b) Specify the state space and one-step transition matrix of the chain.
3.5. Let (Y_n, n ≥ 1) be a sequence of i.i.d. random variables such that

P(Y_n = 1) = p = 1 - P(Y_n = 0),

where p ∈ (0, 1). Consider

X_n = { Y_1 + Y_2 + ... + Y_n  for n ≥ 1
      { 0                     for n = 0.

Show that (X_n, n ≥ 0) is a Markov chain and find its transition probability matrix.
3.6. Let (X_n, n ≥ 0) be a Markov chain on S = {a, b, c} with transition probability matrix

IP = [P_ij] = [ 0    1    0
                1/4  1/4  1/2
                0    0    1  ].

(i) Compute P(X_4 = b | X_1 = a) and P(X_5 = b | X_1 = c, X_3 = c).
(ii) Describe the evolution of the chain by drawing a directed graph.

(iii) Indicate the states which communicate with each other. Are there any absorbing states?
3.7. Let (X_n, n ≥ 0) be a Markov chain with state space S. Let ∅ ≠ A ⊆ S.
(i) Show that A is closed if and only if P_ij = 0 for all i ∈ A and j ∉ A.
(ii) Show that a closed set A is irreducible if and only if all states in A communicate with each other.
3.8. Show that the communication relation ↔ on the state space S of a Markov chain is an equivalence relation.
3.9. Let (X_n, n ≥ 0) be a Markov chain on S with transition probability matrix IP.
(i) Show that, for i ∈ S, if P^n_{ii} > 0, then P^m_{ii} > 0 for some m > n. (Hint: use the Chapman-Kolmogorov equation.)
(ii) Let f_n be the probability mass function of X_n. Verify that, for all j ∈ S,

f_{n+1}(j) = Σ_{i∈S} f_n(i) P_ij.

3.10. (Abel's Lemma). Show that
(i) if Σ_{n=0}^∞ a_n converges, then

lim_{s↑1} Σ_{n=0}^∞ a_n s^n = Σ_{n=0}^∞ a_n.

(ii) If a_n ≥ 0 and

lim_{s↑1} Σ_{n=0}^∞ a_n s^n = a ≤ ∞,

then Σ_{n=0}^∞ a_n = a.
3.11. Show that transience and positive recurrence are class properties.
3.12. Let A be a recurrent class. Verify that the sub-matrix [P_ij], i, j ∈ A, is a stochastic matrix, and the associated Markov chain is recurrent and irreducible.
3.13. Let (X_n, n ≥ 0) be a Markov chain with state space S and transition matrix IP.
(i) Show that, for all i ≠ j and all n ≥ 0,

P^n_{ij} = Σ_{k=0}^n f^{(k)}_{ij} P^{n-k}_{jj},

where f^{(k)}_{ij} = P(T_j = k | X_0 = i).
(ii) Let T_j be the time of the first visit to j. Show that P(T_j < ∞ | X_0 = i) = f_ij.
(iii) For i ≠ j, show that i → j if and only if f_ij > 0.
(iv) Let i, j ∈ S such that i → j but j ↛ i (i.e., i cannot be reached from j). Show that i is a transient state.
(v) Show that the set of all recurrent states is a closed set.
3.14. Classify the states of the Markov chain (X_n, n ≥ 0) with state space {0, 1, 2, 3, 4, 5} and

IP = [ 0.2  0.8  0    0    0    0
       0.5  0.5  0    0    0    0
       0.1  0.2  0.3  0.4  0    0
       0.1  0    0.2  0.3  0    0.4
       0    0    0    0    0.3  0.7
       0    0    0    0    0.4  0.6 ].

3.15. Let (X_n, n ≥ 0) be a Markov chain with state space S.


(i) Let A be a closed subset of S. For i ∈ A, show that

P(X_n ∈ A | X_0 = i) = 1,  ∀n ≥ 0.

(ii) Show that the set of all null recurrent states is closed.
3.16. Let (X_n, n ≥ 0) be a Markov chain with state space S and transition matrix IP. Define N_j(ω) = the number of times j appears in the sequence (X_1(ω), X_2(ω), ...).
(a) Show that ∀k ≥ 1,

P(N_j ≥ k | X_0 = i) = f_ij (f_jj)^{k-1}.

Deduce that, for i ≠ j,

P(N_j = k | X_0 = i) = { 1 - f_ij                     for k = 0
                       { f_ij (f_jj)^{k-1} (1 - f_jj)  for k ≥ 1

and

P(N_i = k | X_0 = i) = (f_ii)^k (1 - f_ii),  for k ≥ 0.

(b) Suppose j is a transient state. Show that P(N_j = ∞ | X_0 = i) = 0.
(c) Suppose j is a recurrent state. Show that

P(N_j = ∞ | X_0 = i) = f_ij,  ∀i ∈ S.

(d) Suppose j is a recurrent state and i ≠ j, i → j. Show that Σ_{n=1}^∞ P^n_{ij} = ∞. (Hint: Σ_{n=1}^∞ P^n_{ij} = E(N_j | X_0 = i) and use (iii) of Exercise 3.13.)
3.17. Let j be a transient state of a Markov chain. Show that

lim_{n→∞} P^n_{ij} = 0,  for all i ∈ S.

(Hint: use Exercise 3.16 (a) to show that Σ_{n=1}^∞ P^n_{ij} < ∞.)
3.18. Let (X_n, n ≥ 0) be a Markov chain on S = {..., -2, -1, 0, 1, 2, ...} with

P_ij = { p      if j = i + 1
       { 1 - p  if j = i - 1

for all i ∈ S, where p ∈ (0, 1).
(i) Verify that this chain is irreducible and compute P^n_{ij} for n ≥ 0.
(ii) Show that the chain is recurrent when p = 1/2. (Hint: n! ~ √(2πn) e^{-n} n^n, as n → ∞.)

3.19. Let

IP = [ 0.2  0.8  0  0
       1    0    0  0
       0    0    0  1
       0    0    1  0 ]

be the transition matrix of a Markov chain on S = {a, b, c, d}. Determine the stationary distributions of the chain.
3.20. Let (X_n, n ≥ 0) be a Markov chain on S = {0, 1} with

IP = [ 1-p   p  ]
     [  q   1-q ],

where p, q ∈ [0, 1] and 0 < p + q < 2.
(i) Compute IP^n, n ≥ 1.
(ii) Find the unique stationary distribution π of the chain and verify that

π(j) = lim_{n→∞} P^n_{ij},  ∀j ∈ S.
3.21. Let (X_n, n ≥ 0) be a Markov chain on S = {0, 1, 2, 3, 4, 5} with

IP = [ 0.1  0.9  0    0    0    0
       0.5  0.5  0    0    0    0
       0.1  0.2  0.3  0.4  0    0
       0.1  0    0.2  0.3  0    0.4
       0    0    0    0    0.9  0.1
       0    0    0    0    0.5  0.5 ].

(i) Verify that all states are aperiodic.


(ii) Is the chain irreducible?
(iii) Identify the ergodic states.
3.22. Let (X_n, n ≥ 0) be a Markov chain on S = {0, 1, 2} with

IP = [ 0.1  0.2  0.7
       0.5  0    0.5
       0    0.1  0.9 ].

(i) Show that the chain is irreducible and ergodic.


(ii) Find its unique stationary distribution.
3.23*. Let X be a random variable taking values in {0, 1, 2, ...}. Let f be its probability mass function, and

G(t) = Σ_{n=0}^∞ f(n) t^n  for t ∈ [0, 1].

Assume that f(1) < 1. Show that
(i) If E(X) ≤ 1, then the equation G(t) = t has no roots in [0, 1).
(ii) If E(X) > 1, then the equation G(t) = t has a unique root in [0, 1).
Lesson 4

Poisson Processes

This Lesson is devoted entirely to an important class of continuous-time


Markov chains, the Poisson processes. This Lesson also serves as an intro-
duction to continuous-time Markov chains where the general theory will be
treated in Lesson 5.

4.1 Motivation and Modeling


Consider a sequence of events which occur at random instants, say T_1, T_2, ..., T_n, .... For example: the arrival of customers for service; the occurrence of breakdowns, accidents, earthquakes; the arrivals of particles registered by a Geiger counter.
The sequence (T_n, n ≥ 1) is called a point process. In the following we will suppose that 0 < T_1 < T_2 < ... < T_n < ... and lim_{n↑∞} T_n = ∞ with probability one. These properties mean that the registration of the events begins at time 0, that two events cannot occur at the same time, and that the observed phenomena take place during a long period. Note that 0 is not an event arrival time; the reason is that it is natural to suppose that the distribution of T_n is continuous.
Now a convenient method for describing (T_n) is to consider the associated counting process (N_t, t ≥ 0), where N_t represents the number of events that have occurred in the time interval [0, t].
(N_t, t ≥ 0) and (T_n, n = 1, 2, ...) contain the same information since, with probability one,

N_t = sup{n : n = 0, 1, 2, ...; T_n ≤ t},  t ≥ 0,   (4.1)

with the conventional notation T_0 = 0; whereas


T_n = inf{t : t ≥ 0, N_t ≥ n},  n = 0, 1, ....   (4.2)

These relations are visible in Figure 1, which shows a typical sample path for the Counting Process (N_t).

[Figure 1. A typical sample path of a Counting Process: N_t is a step function of t, jumping by 1 at each arrival time T_1, T_2, T_3, T_4, T_5.]

The following relations between (N_t) and (T_n) are also of interest:

{N_t = n} = {T_n ≤ t < T_{n+1}},   (4.3)

{N_t ≥ n} = {T_n ≤ t},   (4.4)

{s < T_n ≤ t} = {N_s < n ≤ N_t}.   (4.5)

On the other hand, if the sources which generate the events are independent, then it is natural to suppose that the respective numbers of events which occur on nonoverlapping time intervals are stochastically independent.
Furthermore, if the sources keep the same intensity over time, then the distribution of N_{t+h} - N_{s+h} does not depend on h.

4.2 Axioms of Poisson Processes


The above considerations lead to the following axioms:
A_0: 0 < T_1 < T_2 < ... < T_n < ... and lim_{n↑∞} T_n = ∞ with probability one.
A_1: (N_t, t ≥ 0) is an independent increments process, i.e., for any k ≥ 2 and 0 ≤ t_0 < t_1 < ... < t_k, the random variables N_{t_1} - N_{t_0}, N_{t_2} - N_{t_1}, ..., N_{t_k} - N_{t_{k-1}} are independent.
A_2: (N_t, t ≥ 0) is a stationary increments process, i.e., for any h > 0, 0 ≤ s < t, N_{t+h} - N_{s+h} and N_t - N_s have the same distribution.
If these axioms are valid we have the following astonishing result:

Theorem 4.1 If A_0, A_1 and A_2 hold, then there exists a strictly positive constant λ such that, for each 0 ≤ s < t,

P(N_t - N_s = k) = e^{-λ(t-s)} (λ(t-s))^k / k!,  k = 0, 1, 2, ....   (4.6)

Relation (4.6) means that N_t - N_s follows the Poisson distribution with parameter λ(t-s) (we use the notation P(a) to denote a Poisson distribution with parameter a); λ is called the intensity of the Poisson process (N_t).
Note that (4.6) together with A_1 and A_2 determine completely the distribution of (N_t, t ≥ 0), since N_0 = 0 a.s. and since, if 0 < t_1 < ... < t_k,

P(N_{t_1} = n_1, ..., N_{t_k} = n_k)
  = P(N_{t_1} = n_1, N_{t_2} - N_{t_1} = n_2 - n_1, ..., N_{t_k} - N_{t_{k-1}} = n_k - n_{k-1});

then using A_1, A_2 and (4.6) we obtain

P(N_{t_1} = n_1, ..., N_{t_k} = n_k) = e^{-λt_1} (λt_1)^{n_1}/n_1! × ...
  × e^{-λ(t_k - t_{k-1})} (λ(t_k - t_{k-1}))^{n_k - n_{k-1}} / (n_k - n_{k-1})! · 1_{0 ≤ n_1 ≤ ... ≤ n_k},   (4.7)

where n_1, ..., n_k ∈ ℕ. Now, according to Kolmogorov's existence theorem (see Lesson 2), the distribution of the entire process is determined.
Before making some comments about the axioms, we give the proof of Theorem 4.1. In that proof and in the rest of the Lesson, the expression "with probability one" will be omitted.
Proof. Let g_{t-s} be the moment generating function of N_t - N_s:

g_{t-s}(u) = E(u^{N_t - N_s}) = Σ_{k=0}^∞ P(N_t - N_s = k) u^k,  0 ≤ u ≤ 1.   (4.8)

Using the decomposition N_t = (N_t - N_s) + (N_s - N_0) and axioms A_1 and A_2, we get

g_t(u) = g_s(u) g_{t-s}(u),  0 ≤ s < t, 0 ≤ u ≤ 1,   (4.9)

which implies, for each pair (p, q) of integers,

g_{p/q}(u) = (g_{1/q}(u))^p = ((g_1(u))^{1/q})^p = (g_1(u))^{p/q}.   (4.10)
On the other hand, (4.9) entails that t ↦ g_t(u) is decreasing; consequently (4.10) remains valid for irrational t's:

g_t(u) = (g_1(u))^t,  t > 0.   (4.11)

We now show that g_t(u) cannot vanish. In fact, if g_{t_0}(u) = 0, then (4.11) implies g_1(u) = 0 and consequently g_t(u) = 0 for each t > 0. This is a contradiction since

g_t(u) ≥ P(N_t = 0) = P(T_1 > t) ↑ P(T_1 > 0) = 1 as t ↓ 0.

Finally we may let

g_t(u) = e^{-tλ(u)},   (4.12)

where λ(u) is positive.
It remains to identify λ(u). To this aim we first show that

P(N_h ≥ 2) = o(h) as h → 0.   (4.13)

Note that for h > 0,

∪_{n≥1} {N_{(n-1)h} = 0, N_{nh} - N_{(n-1)h} ≥ 2} ⊆ {T_2 < T_1 + h};

then, since P(N_t = 0) = g_t(0) = e^{-tλ(0)}, we obtain, using A_1 and A_2,

Σ_{n≥1} exp(-(n-1)hλ(0)) · P(N_h ≥ 2) ≤ P(T_2 < T_1 + h).   (4.14)

Now it is clear that λ(0) ≠ 0: otherwise (4.12) implies g_t(0) = 1 for each t > 0, consequently 1 = P(N_t = 0) = P(T_1 > t) for each t > 0, hence T_1 = +∞ a.s., which contradicts A_0. Thus (4.14) may be written in the form

P(N_h ≥ 2) / (1 - e^{-hλ(0)}) ≤ P(T_2 < T_1 + h).

Now, as h ↓ 0, P(T_2 < T_1 + h) ↓ P(T_2 ≤ T_1) = 0 and 1 - e^{-hλ(0)} ~ hλ(0); hence (4.13).
On the other hand, we have

λ(u) = lim_{h↓0} (1/h)(1 - e^{-hλ(u)}),

so by (4.8) and (4.12),

λ(u) = lim_{h↓0} Σ_{k≥1} (1/h) P(N_h = k)(1 - u^k).

Using (4.13) we obtain

0 ≤ lim_{h↓0} Σ_{k≥2} (1/h) P(N_h = k)(1 - u^k) ≤ lim_{h↓0} P(N_h ≥ 2)/h = 0.

Consequently,

λ(u) = lim_{h↓0} (1/h) P(N_h = 1)(1 - u) = λ(1 - u),

where λ = lim_{h↓0} P(N_h = 1)/h. Finally,

g_t(u) = e^{-λt(1-u)},  0 ≤ u ≤ 1,

which is the moment generating function of P(λt), and the proof is complete. □
The following important properties of (N_t) have been obtained in the above proof:

Corollary 4.1 As h → 0(+), we have

P(N_{t+h} - N_t = 0) = 1 - λh + o(h),   (4.15)

P(N_{t+h} - N_t = 1) = λh + o(h),   (4.16)

P(N_{t+h} - N_t ≥ 2) = o(h).   (4.17)

Thus, for small h, N_{t+h} - N_t follows approximately the Bernoulli distribution B(1, λh): in a sufficiently small time interval, at most one event may occur, and the probability of this occurrence is proportional to the length of that interval.
Comments about the axioms
In order to construct Poisson Processes, other axioms may be used. Consider the axioms
A'_0: N_0 = 0; 0 < P(N_t > 0) < 1, t > 0.
A_3: For any t ≥ 0,

lim_{h→0} P(N_{t+h} - N_t ≥ 2) / P(N_{t+h} - N_t = 1) = 0.

Then A'_0 and A_3 together with A_1 and A_2 imply (4.6). Clearly A'_0 and A_3 are consequences of A_0, A_1, and A_2. It should be noticed that our axioms are simpler than classical systems like A'_0, A_1, A_2, and A_3. The idea may be found in Neveu (1990).

4.3 Interarrival Times


Let (N_t, t ≥ 0) be a Counting Process associated with the Point Process (T_n, n ≥ 1). Set T_0 = 0 and consider the interarrival times

W_n = T_n - T_{n-1},  n ≥ 1.

If (N_t) is a Poisson Process, then the sequence (W_n) has some special properties, given by the following

Theorem 4.2 Let (N_t) be a Poisson Process with intensity λ. Then the W_n's are independent with common exponential distribution characterized by

P(W_n > t) = e^{-λt},  t > 0, n ≥ 1,   (4.18)

and consequently

E(W_n) = 1/λ,  n ≥ 1.   (4.19)

Theorem 4.2 contains an important and paradoxical property of Poisson Processes: if n ≥ 2, W_n is the waiting time between two successive events, but this interpretation is not true for W_1 = T_1 - T_0, since T_0 = 0 is not an event arrival time. However, W_1 and W_n have the same distribution!

Proof of Theorem 4.2. It suffices to show that

P(W_1 > t_1, ..., W_n > t_n) = Π_{i=1}^n e^{-λt_i},  t_1, ..., t_n ≥ 0; n ≥ 1.   (4.20)

If n = 1 the result follows from Theorem 4.1 since

P(W_1 > t_1) = P(T_1 > t_1) = P(N_{t_1} = 0) = e^{-λt_1}.

Now for convenience, we only establish (4.20) for n = 2. A similar proof could be given for n > 2.
Taking 0 ≤ s_1 < t_1 < s_2 < t_2, we may write

P(s_1 < T_1 ≤ t_1, s_2 < T_2 ≤ t_2)
  = P(N_{s_1} = 0, N_{t_1} - N_{s_1} = 1, N_{s_2} - N_{t_1} = 0, N_{t_2} - N_{s_2} ≥ 1)
  = e^{-λs_1} λ(t_1 - s_1) e^{-λ(t_1 - s_1)} e^{-λ(s_2 - t_1)} (1 - e^{-λ(t_2 - s_2)})
  = λ(t_1 - s_1)(e^{-λs_2} - e^{-λt_2})
  = ∫_{s_1 < y_1 ≤ t_1, s_2 < y_2 ≤ t_2} λ² e^{-λy_2} dy_1 dy_2,

which shows that

(y_1, y_2) ↦ λ² e^{-λy_2} 1_{0 < y_1 < y_2}

is the density of (T_1, T_2). Since (T_1, T_2) = (W_1, W_1 + W_2), it follows that the density of (W_1, W_2) is

λ² e^{-λ(w_1 + w_2)} 1_{w_1 > 0, w_2 > 0};

hence (4.20) by integration. □


Corollary 4.2 T_n has the Gamma(n, λ) distribution, with density

f_n(t) = λ e^{-λt} (λt)^{n-1}/(n-1)! · 1_{ℝ_+}(t).   (4.21)

Proof. Consider the identity

P(T_n > t) = P(N_t < n) = Σ_{j=0}^{n-1} e^{-λt} (λt)^j / j!,  t > 0.

Taking derivatives with respect to t we get

-f_n(t) = -λ Σ_{j=0}^{n-1} e^{-λt} (λt)^j / j! + Σ_{j=1}^{n-1} e^{-λt} λ^j t^{j-1} / (j-1)!
        = -λ e^{-λt} (λt)^{n-1} / (n-1)!,  t > 0;

hence the desired result follows. □


We now show that the properties obtained in Theorem 4.2 characterize Poisson Processes.

Theorem 4.3 Let (T_n) be a Point Process such that the random variables W_n = T_n - T_{n-1}, n ≥ 1, are independent with the same exponential distribution E(λ). Then the associated Counting Process (N_t) is a Poisson Process with intensity λ.

Proof. By hypothesis (W_1, ..., W_n) has the density

λ^n e^{-λ(w_1 + ... + w_n)} 1_{w_1 > 0, ..., w_n > 0}.

Setting

t_i = w_1 + ... + w_i,  1 ≤ i ≤ n,

we obtain the density of (T_1, ..., T_n):

f_{(T_1, ..., T_n)}(t_1, ..., t_n) = λ^n e^{-λt_n} 1_{0 < t_1 < ... < t_n}.   (4.22)

Now, for convenience, we only compute the distribution of (N_s, N_t - N_s), t > s. For that purpose we write

P(N_s = k, N_t - N_s = n) = P(T_k ≤ s < T_{k+1}, T_{k+n} ≤ t < T_{k+n+1})
  = ∫_{0 < t_1 < ... < t_k ≤ s < t_{k+1} < ... < t_{k+n} ≤ t < t_{k+n+1}} λ^{k+n+1} e^{-λ t_{k+n+1}} dt_1 ... dt_{k+n+1}.

Now it is easy to obtain the following equalities:

∫_t^∞ λ^{k+n+1} e^{-λ t_{k+n+1}} dt_{k+n+1} = λ^{k+n} e^{-λt},

∫_{s < t_{k+1} < ... < t_{k+n} ≤ t} dt_{k+1} ... dt_{k+n} = (t - s)^n / n!,

and

∫_{0 < t_1 < ... < t_k ≤ s} dt_1 ... dt_k = s^k / k!.

Combining the above results and applying Fubini's Theorem (see Appendix), we obtain

P(N_s = k, N_t - N_s = n) = (λ^n e^{-λ(t-s)} (t-s)^n / n!) × (λ^k e^{-λs} s^k / k!),
k = 0, 1, 2, ...; n = 0, 1, 2, ...,

which completes the proof of Theorem 4.3. □


Finally we may characterize a Poisson Process either by

(NtpNt1 -Ntp···,Nt" -Ntk _1) ...... P(Atd®···®P(A(tk-tk-d, (4.23)

k ~ 1, °
$ tl < ... < tk, or by

(Tl, T2 - Tl,···, Tn - Tn-d ...... ®ne(A), (4.24)

where ...... means "is distributed as" and ® denotes the product measure (see
Appendix).
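Characterization (4.24) gives a direct way to simulate a Poisson Process: cumulate i.i.d. exponential interarrival times. A minimal sketch (the values of λ and the horizon are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

def poisson_arrival_times(lam, horizon):
    # T_n = W_1 + ... + W_n with W_n i.i.d. E(lam), kept while T_n <= horizon.
    times, t = [], 0.0
    while True:
        t += rng.exponential(1.0 / lam)
        if t > horizon:
            return np.array(times)
        times.append(t)

# Sanity check: N_t should be Poisson(lam * t), so its mean is lam * t.
lam, t = 2.0, 10.0
counts = [len(poisson_arrival_times(lam, t)) for _ in range(5000)]
print(np.mean(counts), lam * t)   # both close to 20
```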

4.4 Some Properties of Poisson Processes


The current section is devoted to some properties which are useful for statistical studies of Poisson Processes.
a) Poisson processes and order statistics
First let us define the order statistics associated with i.i.d. real random variables U_1, ..., U_k as the random vector (U_(1), ..., U_(k)), where U_(1) ≤ ... ≤ U_(k) is a rearrangement of the U_i's.
The next theorem shows that (T_1, ..., T_k) is a "conditional" order statistics.

Theorem 4.4 Let (N_t) be a Poisson Process associated with the Point Process (T_n); then

L((T_1, ..., T_k) | N_t = k) = L(U_(1), ..., U_(k)),  k = 1, 2, ...; t > 0,   (4.25)

where (U_(1), ..., U_(k)) denotes the order statistics associated with i.i.d. random variables with uniform distribution over [0, t].
Proof. It is easy to show that (U_(1), ..., U_(k)) has the density (k!/t^k) 1_{0 < u_1 < ... < u_k < t} (exercise).
Now let us consider the conditional probability

P(t_i ≤ T_i ≤ t_i + h_i, 1 ≤ i ≤ k | N_t = k) = P(B | A) = P(A ∩ B)/P(A),

where A = {N_t = k}, B = {t_i ≤ T_i ≤ t_i + h_i, 1 ≤ i ≤ k}, and 0 < t_1 < t_1 + h_1 < t_2 < ... < t_k + h_k < t.
Noting that A ∩ B = (t_i ≤ T_i ≤ t_i + h_i, 1 ≤ i ≤ k, T_{k+1} > t) and that the density of (T_1, ..., T_{k+1}) is λ^{k+1} e^{-λu_{k+1}} 1_{0 < u_1 < ... < u_{k+1}} (see (4.22)), we get

P(A ∩ B) = λ^{k+1} ∫_{0 < t_1 < u_1 < t_1+h_1 < ... < t_k < u_k < t_k+h_k, u_{k+1} > t} e^{-λu_{k+1}} du_1 ... du_{k+1}.

By iterated integrations we obtain

P(A ∩ B) = λ^k e^{-λt} h_1 ... h_k,

and since N_t follows the P(λt) distribution, we have

P(A ∩ B)/P(A) = (k!/t^k) h_1 ... h_k;

dividing by h_1 ... h_k and taking the limits on both sides, as max_{1≤i≤k} h_i → 0, leads to the desired result. □
Comments
The preceding theorem shows that the generation of a Poisson Process on [0, t] can be performed as follows:
(a) Construct a realization of N_t, a P(λt) random variable.
(b) Let k be the result of (a); construct a sample U_1, ..., U_k of the uniform distribution over [0, t].
Thus Theorem 4.4 reveals that U_(1), ..., U_(k) may be considered as the arrival times of a Poisson Process with intensity λ. Clearly if k = 0, (b) does not take place and the corresponding event is T_1 > t.
Now from a statistical point of view, one realizes that, if N_t is observed, the reconstruction of a Poisson Process sample can be performed even if λ is unknown. This fact means that N_t furnishes all the information about λ contained in the arrival times: N_t is a "sufficient statistic". As a consequence we will see that an estimate of λ may be based on N_t (see Lesson 13).
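Steps (a) and (b) translate directly into code; a minimal sketch with illustrative λ and t:

```python
import numpy as np

rng = np.random.default_rng(1)

def poisson_arrivals_via_uniforms(lam, t):
    k = rng.poisson(lam * t)                      # step (a): N_t ~ P(lam t)
    return np.sort(rng.uniform(0.0, t, size=k))   # step (b): sorted uniforms

print(poisson_arrivals_via_uniforms(2.0, 5.0))
```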
We now state without proof another "conditional" result:

Theorem 4.5 If (T_n) is a Point Process associated with a Poisson Process, then

L((T_1, ..., T_{n-1}) | T_n = t) = L(U_(1), ..., U_(n-1)),  t > 0, n ≥ 2,   (4.26)

where (U_(1), ..., U_(n-1)) is the order statistics of i.i.d. random variables U_1, ..., U_{n-1} with uniform distribution over [0, t].

b) Renewal property of Poisson processes.


Let (N_t) be a Poisson Process and (T_n) the associated Point Process. Given an instant s > 0, define a new Point Process as

T_1^(s) = T_{N_s+1} - s,  T_2^(s) = T_{N_s+2} - s, ...

and the associated Counting Process

N_t^(s) = N_{s+t} - N_s,  t ≥ 0.

Then we have the following renewal property of (N_t).

Theorem 4.6 (N_t^(s), t ≥ 0) is a Poisson Process with the same intensity as (N_t). Furthermore, (N_t^(s), t ≥ 0) and (N_t, 0 ≤ t ≤ s) are independent.

Proof. The proof of the first claim is straightforward since A_0, A_1, and A_2 are clearly satisfied.
The second claim means that the random vectors U_1 = (N_{t_1}, ..., N_{t_k}) and V_1 = (N^(s)_{t_{k+1}}, ..., N^(s)_{t_{k+h}}) are independent for any choice of h ≥ 1, k ≥ 1, and 0 ≤ t_1 < ... < t_k ≤ s, t_{k+1} < ... < t_{k+h}.
In order to prove that statement it suffices to remark that

U = (N_{t_1}, N_{t_2} - N_{t_1}, ..., N_{t_k} - N_{t_{k-1}})

and

V = (N_{s+t_{k+1}} - N_s, N_{s+t_{k+2}} - N_{s+t_{k+1}}, ..., N_{s+t_{k+h}} - N_{s+t_{k+h-1}})

are independent, and that U_1 (respectively V_1) is a function of U (respectively V). This completes the proof of the theorem. □
Comments
(i) It can be shown that the property in Theorem 4.6 remains valid if one replaces the constant time s by a random variable S which depends only on the past (i.e., for any t ≥ 0, the event {S ≤ t} is a function of the random variables N_s, 0 ≤ s ≤ t). Such a random time is called a stopping time, and the associated Poisson Process is

N_t^(S) = N_{S+t} - N_S,  t ≥ 0.   (4.27)

(ii) The property described in Theorem 4.6 is somewhat surprising because it entails that T_{N_s+1} - s has the distribution E(λ); therefore T_n - T_{n-1} and T_{N_s+1} - T_{N_s} do not have the same distribution.
This fact is illustrated by the famous bus paradox: if buses arrive at a station every ten minutes and I arrive at the station at time s, the expectation of my waiting time is five minutes; if instead they arrive irregularly according to a Poisson Process with intensity 1/10, the expectation of my waiting time is now ten minutes! We refer to Exercise 4.7 for details.

c) Limit theorems for Poisson processes.


The asymptotic behavior of (N_t) for large t is given by the following theorem:

Theorem 4.7 Let (N_t) be a Poisson Process with intensity λ; then, as t tends to infinity,

N_t / t → λ   (4.28)

with probability one, and

(N_t - λt)/√(λt)   (4.29)

converges in distribution to the standard normal distribution.

Note that (4.29) may be written

P((N_t - λt)/√(λt) ≤ x) → (1/√(2π)) ∫_{-∞}^x e^{-u²/2} du,  x ∈ ℝ.   (4.30)

Proof of Theorem 4.7. Suppose that t is an integer (t = n) and write

N_n/n = [N_1 + (N_2 - N_1) + ... + (N_n - N_{n-1})]/n.

Then (4.28) is straightforward by applying the strong law of large numbers (Lesson 1).
Similarly, (4.29) is an easy consequence of the central limit theorem (Lesson 1) since

(N_n - λn)/√(λn) = [(N_1 - λ) + (N_2 - N_1 - λ) + ... + (N_n - N_{n-1} - λ)]/√(λn).

For the general case we refer to the exercises. □
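Both limits are easy to check empirically; a minimal sketch with illustrative λ and t:

```python
import numpy as np

rng = np.random.default_rng(4)

lam, t, n = 3.0, 400.0, 20000
N = rng.poisson(lam * t, size=n)      # n independent copies of N_t
z = (N - lam * t) / np.sqrt(lam * t)  # the normalized variable in (4.29)
print(N.mean() / t)                   # close to lam = 3, as in (4.28)
print(z.mean(), z.std())              # close to 0 and 1, as in (4.29)
```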

4.5 Processes related to Poisson Processes


a) Non-homogeneous Poisson processes.
In practice the intensity often varies with time; this is, for example, the case with interarrival times of telephone calls during the day. So it is natural to introduce Poisson processes with variable intensity. More precisely, a Counting Process (N_t, t ≥ 0) is a non-homogeneous Poisson Process if A_0, A_1, and A_3 are satisfied and, in addition, if
A_4: There exists a positive locally integrable function λ(·), defined over ℝ_+ and such that

lim_{h→0(+)} [1 - P(N_{t+h} - N_t = 0)]/h = λ(t),  t ≥ 0.

Under the above conditions, it can be shown that N_t ~ P(Λ(t)), where Λ(t) = ∫_0^t λ(s) ds (exercise). Clearly Λ(t) = λt in the homogeneous case.
Using a suitable change of time, the non-homogeneous Poisson process can be transformed into a homogeneous Poisson Process. If (N_t) is non-homogeneous, the process (M_s, s ≥ 0) defined by

M_s = N_{Λ^{-1}(s)},  s ≥ 0,   (4.31)

where Λ^{-1}(s) = inf{u : Λ(u) ≥ s}, is a homogeneous Poisson Process with intensity 1. For an application of (4.31), see Exercise 4.14.
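Read in reverse, (4.31) also gives a simulation recipe: if S_1, S_2, ... are the arrival times of a unit-rate homogeneous process, then T_n = Λ^{-1}(S_n) are the arrival times of a process with intensity λ(t). A minimal sketch with an illustrative intensity λ(t) = 1 + sin t, for which Λ(t) = t + 1 - cos t (inverted here by bisection):

```python
import numpy as np

rng = np.random.default_rng(2)

Lam = lambda t: t + 1.0 - np.cos(t)   # Lambda(t) for lambda(t) = 1 + sin t

def Lam_inv(s, hi=1e6):
    lo = 0.0
    for _ in range(200):              # bisection on the increasing Lambda
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if Lam(mid) < s else (lo, mid)
    return 0.5 * (lo + hi)

def nonhomogeneous_arrivals(horizon):
    times, s = [], 0.0
    while True:
        s += rng.exponential(1.0)     # unit-rate homogeneous arrivals S_n
        t = Lam_inv(s)
        if t > horizon:
            return np.array(times)
        times.append(t)

print(nonhomogeneous_arrivals(10.0))
```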

b) Compound Poisson processes.


Consider the following two examples of stochastic models:
(i) Policy holders suffer damages at Poisson times 0 < T_1 < ... < T_n < .... They respectively obtain settlements Y_1, Y_2, ...; then the total settlement at time t is given by

X_t = Σ_{n=0}^{N_t} Y_n,  t ≥ 0,   (4.32)

where, for convenience, Y_0 = 0.
(ii) A particle suffers impacts at Poisson times T_n; Y_n denotes the particle displacement at time T_n. Assuming the particle to be at 0 at time 0, its position X_t at time t is given by (4.32).
Thus a stochastic process (X_t, t ≥ 0) is said to be a compound Poisson process if it can be represented by (4.32), where (N_t) is a Poisson process, (Y_n, n ≥ 1) is a sequence of i.i.d. random variables with Y_0 = 0, and (N_t) and (Y_n) are assumed to be independent.
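Representation (4.32) is straightforward to simulate; a minimal sketch with hypothetical parameters (rate λ = 2, horizon t = 5, and Exponential(1) settlements, though any Y-distribution would do):

```python
import numpy as np

rng = np.random.default_rng(3)

def compound_poisson(lam, t):
    n = rng.poisson(lam * t)                    # N_t
    return rng.exponential(1.0, size=n).sum()   # Y_1 + ... + Y_{N_t}

samples = [compound_poisson(2.0, 5.0) for _ in range(10000)]
# E(X_t) = lam * t * E(Y_1), which equals 10 for these parameters.
print(np.mean(samples))
```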

c) Other processes related to Poisson processes.


We only mention some other processes related to Poisson processes. Details may be found in the bibliography.
- A counting process is called a generalized Poisson process if it satisfies A'_0, A_1, A_2, and
A_5: There exists a sequence (r_k, k = 1, 2, ...) such that for each k,

lim_{h→0(+)} P(N_{t+h} - N_t = k | N_{t+h} - N_t ≥ 1) = r_k.

- Poisson processes in ℝ^d: Let E be a bounded Borel set in ℝ^d and let m be a bounded measure on (E, B_E), where B_E is the Borel σ-field of E. A family (N_B, B ∈ B_E) of integer-valued random variables is called a Poisson Process with mean measure m if N_B ~ P(m(B)) for any B ∈ B_E, and if for any k ≥ 2 and any disjoint elements B_1, ..., B_k of B_E, the r.v.'s N_{B_1}, ..., N_{B_k} are independent.
- A Cox process is a non-homogeneous Poisson process where (λ(t), t ≥ 0) is itself a stochastic process.

4.6 Exercises
4.1. Customers arrive at a shop according to a Poisson Process at a mean rate of 10 per hour. Find the
(i) Probability that only 1 customer arrives in 2 minutes,
(ii) Probability of no customer in 5 minutes, and
(iii) Probability that in two disjoint 2-minute time intervals there arrive at least 2 customers.
(Hint: Use N_t ~ P(t/6), with t in minutes.)
4.2. With the same assumptions as in Exercise 4.1, compute the probability that the time interval between successive arrivals will be
(i) longer than 6 minutes,
(ii) shorter than 1 minute, and
(iii) between 2 and 7 minutes.

4.3. A particle counter records only every second particle arriving at the counter. Particles arrive according to a Poisson process at a mean rate of 6 per minute. Let S be the waiting time between two successive recorded particles. Find
(i) The distribution of S,
(ii) E(S) and V(S),
(iii) P(S < 1).
(Hint: write S = W_n + W_{n+1}.)
4.4. Let (N_t, t ≥ 0) be a Counting Process satisfying axioms A'_0, A_1, A_2, and A_3. Show that (4.6) is valid.
4.5. Let (T_n, n ≥ 1) be a Point Process associated with a Poisson process with intensity λ. Show (4.21) using the relation T_n = W_1 + ... + W_n.
4.6. Prove Theorem 4.5.
4.7. Let (N_t, t ≥ 0) be a Poisson process with intensity λ and let s be a strictly positive instant.
(i) Show that

P(T_{N_s+1} - s ≥ x, s - T_{N_s} ≥ y) = e^{-λ(x+y)} 1_{[0,s]}(y) 1_{ℝ_+}(x).

(ii) Compute the distribution of s - T_{N_s}. Show in particular that P(s - T_{N_s} = s) = e^{-λs}.
(iii) Compute the distribution of T_{N_s+1} - T_{N_s}.
(iv) Compare E(T_{N_s+1} - T_{N_s}) and E(T_{N_s+1} - s).
(v) Apply the above results to the bus paradox.
4.8. Prove Theorem 4.6 when s is replaced by a stopping time S.
4.9. Prove Theorem 4.7 for any t.
4.10. Let (N_t^1) and (N_t^2) be two independent Poisson processes with respective intensities λ_1 and λ_2. Show that (N_t^1 + N_t^2) is a Poisson process with intensity λ_1 + λ_2.
4.11. Under the assumptions of Exercise 4.10, consider the process

M_t = N_t^1 - N_t^2,  t ≥ 0.

(i) Show that (M_t) has stationary independent increments.
(ii) What is the distribution of M_t - M_s, t > s?
(iii) Compute

lim_{t→∞} P(|M_t| ≤ c),  c > 0.

(iv) Suppose that N_t^1 is the number of clients arriving at a taxi station and that N_t^2 is the number of taxis arriving at the same station. Interpret the above results.
4.12. Let (N_t^k, t ≥ 0), k = 1, 2, ..., be a sequence of independent Poisson processes with respective intensities λ_k such that Σ_{k=1}^∞ λ_k < ∞. Define

N_t = N_t^1 + 2N_t^2 + ... + kN_t^k + ...,  t ≥ 0.

Show that (N_t) is a generalized Poisson process and compute the characteristic function of N_t.
4.13. Let (N_t, t ≥ 0) be a Poisson process with intensity λ and E a bounded Borel set in ℝ_+.
(i) Using a "conditioning argument", find the distribution of

N_B = Σ_{n=1}^∞ 1_B(T_n),  B ∈ B_E.

(ii) Find the distribution of (N_{B_1}, ..., N_{B_k}), where B_1, ..., B_k are disjoint Borel sets of E.
(iii) Show that (N_B, B ∈ B_E) is a Poisson process in ℝ and find its mean measure.
4.14. Let (N_t, t ≥ 0) be a non-homogeneous Poisson process with intensity function λ(t). Find the conditional distribution of (T_1, ..., T_k) given N_t = k.
4.15. Let X_t = Σ_{n=0}^{N_t} Y_n, t ≥ 0, be a compound Poisson process. Suppose that λ is the intensity of (N_t) and Y_n is a zero-mean random variable with variance σ² > 0 and characteristic function φ, n ≥ 1.
(i) Find the characteristic function of X_t.
(ii) Find the asymptotic distribution of X_t/√λ as λ tends to infinity.
Lesson 5

Continuous - Time
Markov Chains

Poisson processes in Lesson 4 are examples of continuous-time stochastic


processes (with discrete state spaces) having the Markov property in the
continuous-time setting. In this Lesson, we discuss the probabilistic struc-
ture and some computational aspects of such processes with emphasis on
Birth and Death chains.

5.1 Some typical examples


Let (N_t, t ≥ 0) be a Poisson process with intensity λ. Its independent increments property implies the following.
For any time points 0 ≤ s_1 < ... < s_n < s < t and i_1, ..., i_n, i, j in the state space ℕ = {0, 1, ...} such that i_1 ≤ ... ≤ i_n ≤ i ≤ j, we have

P(N_t = j | N_{s_1} = i_1, ..., N_{s_n} = i_n, N_s = i) = P(N_t = j | N_s = i).

Indeed,

P(N_t = j | N_{s_1} = i_1, ..., N_{s_n} = i_n, N_s = i)
  = P(N_{s_1} = i_1, ..., N_s = i, N_t = j) / P(N_{s_1} = i_1, ..., N_s = i)
  = P(N_{s_1} = i_1, N_{s_2} - N_{s_1} = i_2 - i_1, ..., N_t - N_s = j - i)
    / P(N_{s_1} = i_1, N_{s_2} - N_{s_1} = i_2 - i_1, ..., N_s - N_{s_n} = i - i_n)
  = P(N_t - N_s = j - i) = P(N_t = j | N_s = i).
Moreover,

P(N_t = j | N_s = i) = { e^{-λ(t-s)} [λ(t-s)]^{j-i} / (j-i)!  for i ≤ j
                       { 0                                   for i > j

(see Lesson 4), so that, for fixed i, j, the quantity

P_ij(s, t) = P(N_t = j | N_s = i)

depends only on t - s. Thus it suffices to let

P_ij(t) = P(N_{s+t} = j | N_s = i),  for s, t ≥ 0, i ≤ j.


We now define stochastic processes having properties similar to the above. In this Lesson, the discrete state space S is taken to be the set of non-negative integers ℕ, or a subset of it, unless otherwise stated.

Definition 5.1 Let (X_t, t ≥ 0) be a continuous-time stochastic process with discrete state space S. Then (X_t, t ≥ 0) is said to be a continuous-time Markov chain if for any 0 ≤ s_1 < s_2 < ... < s_n < s < t and i_1, ..., i_n, i, j ∈ S,

P(X_t = j | X_{s_1} = i_1, ..., X_{s_n} = i_n, X_s = i) = P(X_t = j | X_s = i),   (5.1)

whenever these conditional probabilities are defined.

If the transition probability functions

P_ij(s, t) = P(X_t = j | X_s = i),  s < t,

depend only on t - s, then (X_t, t ≥ 0) is said to have stationary (or time-homogeneous) transition probabilities. In this case, we set

P_ij(t) = P(X_{s+t} = j | X_s = i)  for all s ≥ 0.

In this Lesson, we consider only Markov chains with stationary transition probabilities.
For t ≥ 0, IP(t) denotes the transition probability matrix [P_ij(t)]. If S is finite, then IP(t) is a finite (square) matrix, whereas if S is (countably) infinite, then IP(t) is an infinite matrix. Also,

IP(0) = I = [δ_ij],

the identity matrix, where

δ_ij = { 1  if i = j
       { 0  if i ≠ j.

The above definition generalizes the Markov property of the discrete-time case (see Lesson 3). Unlike the discrete-time case, there is no exact counterpart of the one-step transition probabilities, since there is no implicit unit length of time in the continuous-time case. Recall that, in the discrete-time case, the n-step transition probabilities can be expressed in terms of the one-step transition probabilities. Here, as we will see later, there is a matrix Q, called the (infinitesimal) generator of the chain, which plays the role of the one-step transition probability matrix in the discrete case.
Note right away that the distribution of a continuous-time Markov chain is completely specified from the knowledge of the IP(t)'s, t ≥ 0, and the initial distribution π_0 of X_0. Indeed, for 0 < t_1 < ... < t_n, we have, by the Markov property (5.1),

P(X_{t_1} = i_1, ..., X_{t_n} = i_n) = P(X_{t_1} = i_1) P_{i_1 i_2}(t_2 - t_1) ... P_{i_{n-1} i_n}(t_n - t_{n-1})

with

P(X_{t_1} = i_1) = Σ_{i∈S} P(X_0 = i, X_{t_1} = i_1) = Σ_{i∈S} π_0(i) P_{i i_1}(t_1).

Note that a Poisson process (with intensity λ) is non-decreasing, that is, s < t implies N_s ≤ N_t. If we interpret an occurrence of the event of interest as a "birth", then a Poisson process is called a birth process. A birth increases the population size by one. In view of Corollary 4.1, we see that the infinitesimal transition probabilities are

P(N_{t+h} - N_t = 1 | N_t = i) = λh + o(h),  as h ↓ 0.

The point is this: the intensity λ does not depend on i. However, in the study of population growth, the rate of birth might depend on the population size i at time t. Stochastic models for such situations are given as follows.

Definition 5.2 Let (X_t, t ≥ 0) be a stochastic process with state space S = {0, 1, 2, ...}. Then (X_t) is called a birth process if it is a non-decreasing Markov chain such that

P(X_{t+h} - X_t = 0 | X_t = i) = 1 - λ_i h + o(h)   (5.2)

and

P(X_{t+h} - X_t = 1 | X_t = i) = λ_i h + o(h).   (5.3)

The positive numbers λ_i, i ∈ S, are called the birth rates of the process.

Remarks.
(i) λ_i is interpreted as the birth rate at an instant at which the population size is i.
(ii) A Poisson process is a birth process with λ_i = λ for all i ∈ S.
(iii) (5.2) and (5.3) imply that

P(X_{t+h} - X_t ≥ 2 | X_t = i) = o(h)  as h ↓ 0.

Thus, in a short time interval, at most one birth can occur.
(iv) As we will see in subsequent sections, the knowledge of the λ_i's is sufficient for the specification of the IP(t)'s, which in turn, together with an initial distribution π_0, determine the distribution of the chain.

Example 5.1 Consider a population, say in biology or physics, in which no individual may die and each individual acts independently in giving birth to a new individual with probability λh + o(h) during (t, t+h]. Let X_t denote the population size at time t. Given that X_t = i, the number of births during (t, t+h], for small h, follows a binomial distribution B(i, λh), so that

P(X_{t+h} - X_t = k | X_t = i) = C(i, k) (λh)^k (1 - λh)^{i-k} + o(h),

and hence

P(X_{t+h} - X_t = 0 | X_t = i) = 1 - (iλ)h + o(h)

and

P(X_{t+h} - X_t = 1 | X_t = i) = (iλ)h + o(h).

Thus λ_i = iλ. This Markov chain is called a linear birth process (Yule process).

Example 5.2 Suppose that, in the population of Example 5.1, new individuals immigrate into the population at a constant rate ν. Then the birth rates become λ_i = iλ + ν. Chains of this type are called linear birth processes with immigration.

In a birth process, the population size can only increase with time. To model random phenomena in which a population can increase as well as decrease in size, say by births and deaths, we need to include the concept of death rates in the description of this more general type of process.

Definition 5.3 Let (X_t, t ≥ 0) be a Markov chain. Then (X_t) is called a birth and death process if, as h ↓ 0,

P(X_{t+h} - X_t = k | X_t = i) = { λ_i h + o(h)  if k = 1
                                 { μ_i h + o(h)  if k = -1   (5.4)
                                 { o(h)          if |k| > 1

(so that P(X_{t+h} - X_t = 0 | X_t = i) = 1 - (λ_i + μ_i)h + o(h)).

The λ_i's and μ_i's are called the birth rates and death rates, respectively.

Remark.
It is assumed that births and deaths occur independently of each other. Of course, λ_i ≥ 0, μ_i ≥ 0, with μ_0 = 0. We will discuss the problem of modeling a birth and death process with given λ_i and μ_i later. If μ_i = 0 for all i ≥ 0, then the chain is called a birth chain; if λ_i = 0 for all i ≥ 0, then the chain is called a death chain.

Example 5.3 Consider a population in which individuals do not reproduce. The death rate per individual is μ. Furthermore, suppose that new individuals immigrate into the population according to a Poisson process with intensity λ. Let X_t denote the population size at time t. Then as h ↓ 0, we have

P(X_{t+h} - X_t = 1 | X_t = i) = P(one arrival, no deaths) + o(h)
  = λh(1 - μh)^i + o(h) = λh + o(h),

P(X_{t+h} - X_t = -1 | X_t = i) = P(no arrivals, one death) + o(h)
  = (1 - λh) i (μh)(1 - μh)^{i-1} + o(h) = (iμ)h + o(h).

(Note that P(|X_{t+h} - X_t| ≥ 2 | X_t = i) = o(h).) Thus, λ_i = λ and μ_i = iμ.
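These rates translate into a simple simulation scheme: from state i the chain waits an exponential time with parameter λ + iμ and then moves up or down with probabilities proportional to λ and iμ (this jump-chain description of a continuous-time chain is developed formally in the sections below). A minimal sketch with illustrative parameters:

```python
import numpy as np

rng = np.random.default_rng(5)

def simulate(lam, mu, x0, horizon):
    # Gillespie-style simulation of the immigration-death chain of Example 5.3.
    t, x = 0.0, x0
    while True:
        rate = lam + x * mu               # total jump rate in state x
        t += rng.exponential(1.0 / rate)
        if t > horizon:
            return x
        x += 1 if rng.random() < lam / rate else -1

# The long-run population size fluctuates around lam/mu (= 10 here).
print(np.mean([simulate(2.0, 0.2, 0, 200.0) for _ in range(200)]))
```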

Example 5.4 Continuous-time branching processes are used to model the evolution of populations of, say, particles. Consider a population of particles which evolves as follows. Independently of each other, a particle alive at time t lives an additional random time, exponentially distributed with parameter a (0 < a < ∞), and then splits into a random number of offspring according to the offspring distribution f(k), k = 0, 1, 2, .... Let X_t denote the population size at time t. If X_t = i, then the process will remain at i for a random time, exponentially distributed with parameter ia, and then jump to another state j ≥ i - 1 with probability f(j - i + 1), and so on. As we will see in the next section, with the above description of the evolving population of particles, (X_t, t ≥ 0) can be modeled as a Markov chain on the state space S = {0, 1, 2, ...}. It follows from the above probabilistic structure that, as h ↓ 0,

P_ij(h) = i a f(j - i + 1) h + o(h)  for j ≠ i, j ≥ i - 1,

P_ii(h) = 1 - i a h + o(h)  for i ∈ S,

and

P_ij(h) = o(h)  for j < i - 1.

5.2 Computational aspects


Let (X_t, t ≥ 0) be a continuous-time Markov chain with stationary transition probabilities and with discrete state space S.
The transition probabilities P_ij(t) = P(X_t = j | X_0 = i) satisfy the following relations:

P_ij(t) ≥ 0,  for all t ≥ 0, i, j ∈ S,   (5.5)

Σ_{j∈S} P_ij(t) = 1,  for all t ≥ 0, i ∈ S,   (5.6)

and

P_ij(s + t) = Σ_{k∈S} P_ik(t) P_kj(s)  for all t, s ≥ 0, i, j ∈ S.   (5.7)

When S is infinite, it may happen that

Σ_{j∈S} P_ij(t) < 1  for some i and t.

This can be explained by saying that the chain has a positive probability of escaping to infinity (by adding the element ∞ to the state space S). Such a chain is said to be dishonest. The chain is honest when (5.6) holds.
(5.7) is the Chapman-Kolmogorov equation. Its proof is similar to the discrete-time case and is left as an exercise. In matrix form, (5.7) expresses the fact that the function t → IP(t) has the "semigroup property":

IP(t + s) = IP(t) IP(s),  t, s ≥ 0.   (5.8)

As we will see, in applications IP(t) will be determined from available data such as rates of births and deaths in Birth and Death chains. It should be kept in mind that (IP(t), t ≥ 0) qualifies as a family of transition matrices for a Markov chain if (5.5)-(5.7) hold. The condition (5.7) is essential since it allows one to define finite dimensional distributions via

P(X_{t_1} = i_1, ..., X_{t_n} = i_n) = Σ_{i∈S} π_0(i) P_{i i_1}(t_1) P_{i_1 i_2}(t_2 - t_1) ... P_{i_{n-1} i_n}(t_n - t_{n-1})

in a consistent manner (the system of finite dimensional distributions so obtained satisfies Kolmogorov's consistency condition for the existence of a stochastic process having these distributions as its finite dimensional distributions; see Lesson 2). (Exercise 5.11.)
Now, if we look at the examples in the previous section, we see that the data available are the infinitesimal transition rates q_ij. For example, for a birth process,

P_ij(h) = δ_ij + q_ij h + o(h)  as h ↓ 0,

where

q_ij = { -λ_i  if j = i
       { λ_i   if j = i + 1
       { 0     otherwise.

Thus we know P_ij(h) for small h. But then it is possible to compute P_ij(t) for all t ≥ 0. (Exercise 5.14.)
More formally, the above qij'S can be written as:

.. _ 1· Pij(h) - Oij _ 1· Pij(h) - Pij(O) - p' (0)


q'J - 1m
h'\.O
h - h'\.O
1m h - iJ·

(right derivative of Pij(t) at t = 0). Thus we would like to determine IP (t),


t ~ 0, from the knowledge of JP'(O).

Before addressing this important problem, we need to discuss the exis-


tence of IP'(O) for general Markov chains.
Observe that, for Pij(t) of a Poisson process,
lim Pij (t)
t'\.O
= Oij, i,j E S.

=
That is the function Pij(t) is right continuous at t 0 (recalling Pij(O) =
Oij). This condition turns out to be general enough for investigating general
Markov chains.
Definition 5.4 Let (IP(t), t ~ 0) be the transition matrix function of a
continuous-time Markov chain (Xt, t ~ 0) on S. We say that (IP(t), t ~ 0)
is standard if
limlP(t) = I. (5.9)
t'\.O

Remark. For (5.9) to hold, it suffices that

lim_{t↘0} P_ii(t) = 1 for all i ∈ S.

(See Exercise 5.5.)


Due to the rich structure of (5.8), it can be shown that (5.9) implies that IP'(0) exists. Moreover, −∞ ≤ P'_ii(0) ≤ 0 for all i ∈ S, P'_ij(0) < ∞ for i ≠ j, and P_ij(t) is continuously differentiable on (0, ∞) for all i, j (see Exercise 5.14).
Let

Q = [q_ij] = IP'(0) = [P'_ij(0)].

Since

q_ij = lim_{t↘0} (P_ij(t) − δ_ij)/t,

we see that q_ij ≥ 0 for i ≠ j, and q_ii ≤ 0 for all i.
The quantities q_ij are referred to as the (infinitesimal) transition rates of the chain. For reasons which will be specified shortly, the matrix Q is called the generator of the chain.
Note that, in general, Σ_{j∈S} P_ij(t) ≤ 1 (for a dishonest chain), so that

Σ_{j∈S} q_ij ≤ 0.

This can be seen as follows. Write

(1 − P_ii(t))/t ≥ (1/t) Σ_{j≠i} P_ij(t).

Taking limits as t ↘ 0 of both sides (on the right-hand side, first consider a finite number of terms, then let the number of terms go to infinity), we obtain

−q_ii ≥ Σ_{j≠i} q_ij.

When this inequality is strict, the chain can disappear instantaneously to infinity. However, in applications, chains are rather conservative, that is,

Σ_{j∈S} q_ij = 0 for all i ∈ S. (5.10)

For example, for chains with finite state space S, we have

Σ_{j∈S} P_ij(t) = 1

so that

1 − P_ii(t) = Σ_{j≠i} P_ij(t).

Since the P'_ij(0) exist and are finite for j ≠ i, it follows (since S is finite) that q_ii exists and is finite. The situation when S is infinite is more complicated. For example, although P'_ij(0) < ∞ for j ≠ i, q_ii might be −∞; also, even if q_ii is finite, (5.10) may not hold.
From the above analysis, we see that the generator Q of an (honest) Markov chain should be such that

q_ij ≥ 0 for j ≠ i

and

Σ_{j∈S} q_ij = 0 for all i.

To derive relations between Q and IP(t), we use the Chapman-Kolmogorov equation (5.7).
Differentiating (5.7) with respect to s and then setting s = 0, we get the so-called Kolmogorov forward equation

P'_ij(t) = Σ_{k∈S} P_ik(t) P'_kj(0) = Σ_{k∈S} P_ik(t) q_kj,

or, in matrix form,

IP'(t) = IP(t) Q. (5.11)

Similarly, differentiating (5.7) with respect to t and then setting t = 0, we get the Kolmogorov backward equation

IP'(t) = Q IP(t). (5.12)

Let us look at the case where S is finite. Suppose we are given a matrix Q with q_ij ≥ 0 for j ≠ i, −q_ii < ∞ for i ∈ S, and Σ_{j∈S} q_ij = 0 for all i ∈ S. Let

Z_ij(t) = δ_ij + Σ_{n=1}^∞ (t^n/n!) q_ij^{(n)}, (5.13)

where q_ij^{(n)} denotes a generic element of the nth power of the matrix Q, i.e., Q^n = [q_ij^{(n)}].
Since the matrix Q is finite, let a = max_{i,j} |q_ij| < ∞; then obviously

|q_ij^{(n)}| ≤ c a^n,

where c is a constant, so that (5.13) converges. It is easy to check that (5.13) is a solution of

Z'_ij(t) = Σ_{k∈S} q_ik Z_kj(t)

with the boundary condition Z_ij(0) = δ_ij. Moreover, the Z_ij(t) given by (5.13) do satisfy the Chapman-Kolmogorov equation. Thus (5.13) corresponds to the transition probabilities of a Markov chain having Q as generator. In summary, for finite S, the unique solution of (5.11) and (5.12) is given, in matrix form, by

IP(t) = e^{tQ}, (5.14)

where, for a square matrix A, e^A stands for

e^A = I + Σ_{n=1}^∞ A^n/n!,

so that

e^{tQ} = Σ_{n=0}^∞ (t^n/n!) Q^n, with Q^0 = I.

That is, the generator Q determines IP(t) uniquely.
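To make (5.14) concrete, here is a minimal numerical sketch in Python (NumPy assumed available); the two-state generator used is an illustrative choice, not from the text. It computes IP(t) = e^{tQ} by truncating the series Σ_n (t^n/n!) Q^n.

```python
import numpy as np

def transition_matrix(Q, t, n_terms=50):
    """Approximate IP(t) = e^{tQ} by the truncated series sum_n (t^n/n!) Q^n."""
    P = np.eye(Q.shape[0])            # Q^0 / 0! = I
    term = np.eye(Q.shape[0])
    for n in range(1, n_terms):
        term = term @ (t * Q) / n     # builds (tQ)^n / n! recursively
        P += term
    return P

# Illustrative two-state generator (rows sum to zero, q_ij >= 0 for j != i).
lam, mu = 2.0, 3.0
Q = np.array([[-lam, lam],
              [mu, -mu]])
P = transition_matrix(Q, t=0.5)
print(P)                # rows are probability vectors
print(P.sum(axis=1))    # each row sums to 1 (honest chain)
```

Each row of the result is a probability distribution, as (5.5) and (5.6) require.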
When S is infinite, the matrix Q has to satisfy some condition for (5.13) to converge. But for infinite S, (5.11) and (5.12) hold only under a stronger condition than (5.9).

Definition 5.5 We say that (IP(t), t ≥ 0) is uniform if

lim_{t↘0} P_ii(t) = 1 uniformly in i. (5.15)

Note that, since Σ_{j∈S} P_ij(t) = 1, we have P_ij(t) ≤ 1 − P_ii(t) for j ≠ i, so that (5.15) implies

lim_{t↘0} P_ij(t) = 0 for j ≠ i.

It can be shown that, under (5.15), Σ_{j∈S} q_ij = 0 for all i ∈ S. Moreover, IP(t) is then the unique solution of Kolmogorov's equations, namely IP(t) = e^{tQ}. In other words, if (IP(t), t ≥ 0) of a Markov chain (X_t) is uniform, then the knowledge of the generator Q = IP'(0), together with the initial distribution of X_0, determines the distribution of the chain.
Now, in view of Exercise 5.14(ii), the condition (5.15) is clearly satisfied when

sup_i |q_ii| < ∞. (5.16)

In fact, (5.16) is equivalent to (5.15).
It should be noted that a unique solution to Kolmogorov's equations may exist under weaker conditions than (5.15). In applications, if we have available a generator Q with q_ij ≥ 0 for j ≠ i and Σ_{j∈S} q_ij = 0 for all i, then the Kolmogorov backward equation has a smallest solution Z_ij(t) such that

Z_ij(t) ≥ 0 and Σ_{j∈S} Z_ij(t) ≤ 1

for all t ≥ 0 and i ∈ S (under the boundary condition Z_ij(0) = δ_ij). If

Σ_{j∈S} Z_ij(t) = 1 for all t ≥ 0 and i ∈ S,

then

Z(t) = [Z_ij(t)]

is the unique solution to both Kolmogorov equations.
The meaning of q_ii is clarified as follows. If (N_t, t ≥ 0) is a Poisson process with intensity λ, then

q_ij = λ if j = i + 1, −λ if j = i, and 0 otherwise.

(See Corollary 4.1.)
Also, −q_ii = λ is the parameter of the exponential distribution of the interarrival times (see Theorem 4.2). Looking at Figure 1 in Lesson 4, this provides an explanation for the behavior of the sample paths of this special Markov chain. Suppose that the chain enters a state i at time t; then it remains there for a random time exponentially distributed with mean 1/λ and then jumps to another state. General Markov chains have a similar behavior, as we now elaborate.
Let (X_t, t ≥ 0) be a Markov chain with state space S. As in the case of Poisson processes, let (T_n, n ≥ 0), with T_0 = 0, be an increasing sequence of random times such that

lim_{n→∞} T_n = ∞ (a.s.).

The times T_n, n ≥ 0, are the instants of transitions of the chain. At time T_n the chain is in state Y_n = X_{T_n}. Thus

X_t = Σ_{n=0}^∞ X_{T_n} 1_{[T_n, T_{n+1})}(t). (5.17)

The condition

lim_{n→∞} T_n = ∞ (a.s.)

is needed for the above representation of X_t to be valid for all t ≥ 0. Chains satisfying this condition are called non-explosive (or regular). If, with positive probability, lim_{n→∞} T_n < ∞, then the chain explodes, in the sense that it can make an infinite number of transitions during a time interval of finite length (so that sample paths of the chain might not be step functions; by a step function, we mean a function which has at most a finite number of jumps in any finite time interval). It turns out that if (IP(t), t ≥ 0) satisfies (5.15), then explosions are not possible (so that almost all sample paths of the corresponding chain are step functions). In applications, as we have seen before, this condition can be checked by looking at the matrix of infinitesimal transition rates (the generator Q), namely via (5.16).
Suppose that (5.16) is satisfied. We are going to show that the waiting (or holding) times in states are exponentially distributed, and in fact, conditionally upon the states being visited, these random times are independent.
Given that X_s = i, the waiting time W_i of the chain in state i is the (random) time until the chain first leaves i, that is,

W_i = inf{t > 0 : X_{s+t} ≠ i}.


(Technical note: the chain is separable and is defined on a complete probability space; see Lesson 2.)
The distribution of the random time W_i is determined as follows. Look at P(W_i > t | X_s = i). Now

(W_i > t) = {ω : X_{s+u}(ω) = i, u ∈ [0, t]}.

Let

A_n = {ω : X_{s+kt/2^n}(ω) = i, k = 0, 1, ..., 2^n}, n ≥ 1.

Then the A_n decrease to

A = ∩_{n=1}^∞ A_n = {ω : X_{s+u} = i for all u ∈ [0, t] of the form u = kt/2^n}.

Obviously (W_i > t) ⊆ A. Conversely, if ω ∉ (W_i > t), that is, W_i(ω) ≤ t, then there is v = W_i(ω) ≤ t such that X_{s+v}(ω) = j ≠ i. Obviously ω ∉ A if v = t. Suppose that v < t. Then, since the sample paths are step functions (and right continuous), X_{s+u}(ω) = j for all u ∈ [v, w) for some w > v. Choosing n, k appropriately so that kt/2^n ∈ [v, w), we have kt/2^n < t and X_{s+kt/2^n}(ω) = j ≠ i, so that ω ∉ A. Hence A = (W_i > t).

Note that

P(A_n | X_s = i) = P(X_{s+t/2^n} = i, ..., X_{s+(2^n−1)t/2^n} = i, X_{s+t} = i | X_s = i) = [P_ii(t/2^n)]^{2^n}

(by the Markov property). Now, since A_n ↘ A, we have

P(W_i > t | X_s = i) = P(A | X_s = i) = lim_{n→∞} P(A_n | X_s = i) = lim_{n→∞} [P_ii(t/2^n)]^{2^n} = e^{q_ii t}, if −∞ < q_ii ≤ 0

(since P_ii(t/2^n) = P_ii(0) + P'_ii(0) t/2^n + o(1/2^n) as n → ∞; for details, see Exercise 5.9). For q_ii = −∞, that is,

lim_{h↘0} (P_ii(h) − 1)/h = −∞, or lim_{h↘0} (1 − P_ii(h))/h = ∞,

we have, for arbitrary 0 < a < ∞, that (1 − P_ii(h))/h > a for h sufficiently small. Thus, for n sufficiently large,

P_ii(t/2^n) ≤ 1 − at/2^n,

which implies that

lim_{n→∞} [P_ii(t/2^n)]^{2^n} ≤ e^{−at}

for any a > 0, and hence

P(W_i > t | X_s = i) = 0.


Remark.
When q_ii = −∞, the state i is called an instantaneous state (P(W_i = 0 | X_s = i) = 1: upon entering i, the chain leaves it instantaneously). A state such that q_ii > −∞ is called stable. A stable state i such that q_ii = 0 is called an absorbing state (P(W_i > t | X_s = i) = 1 for all t > 0: once the chain enters i, it remains there forever). When entering a stable, non-absorbing state i (−∞ < q_ii < 0), the chain spends a random time W_i in i, where W_i is exponentially distributed with mean −1/q_ii, and then jumps to another state.
Consider a Markov chain (X_t, t ≥ 0) such that all states are stable (q_ii > −∞ for all i ∈ S). Using the strong Markov property (see Lesson 3), it can be shown that the successive states visited by (X_t), namely Y_n = X_{T_n}, n ≥ 0, form a discrete-time Markov chain whose one-step transition matrix R = [R_ij] (called the jump matrix) is determined as follows.
If i is absorbing (q_ii = 0), then the chain (X_t) will remain in i permanently once it has entered it. Thus it cannot jump to any other state, hence

R_ij = P(Y_{n+1} = j | Y_n = i) = 1 if j = i, and 0 if j ≠ i.

When q_ii < 0 (recall that q_ii ≤ 0 for all i and 0 ≤ q_ij < ∞ for all j ≠ i), that is, when i is a non-absorbing state, then obviously R_ii = 0. For j ≠ i and i non-absorbing, we have

R_ij = −q_ij/q_ii. (5.18)

Thus we have

R_ij = δ_ij if q_ii = 0, and R_ij = (δ_ij − 1) q_ij/q_ii if q_ii < 0.

The derivation of (5.18) is essentially based upon the strong Markov property. To see why (5.18) holds, argue as follows:

−q_ij/q_ii = lim_{h↘0} P_ij(h)/(1 − P_ii(h)),

where P_ij(h)/(1 − P_ii(h)) is the conditional probability that the chain jumps to state j given that the chain is in state i at time t and jumps during the interval (t, t + h).
In summary, under suitable conditions, the structure of (X_t) can be described as follows. The discrete-time chain Y_n = X_{T_n}, n ≥ 0, is Markov with one-step transition matrix R. Conditionally upon (Y_n), the waiting times T_{n+1} − T_n, n ≥ 0, are independent and exponentially distributed with parameters depending on the states being visited. Thus when the chain (X_t) enters an absorbing state, it stays there forever, whereas if it reaches a non-absorbing state X_{T_n} = i, it spends a random time W_i in that state, where W_i is exponentially distributed with mean −1/q_ii, and then jumps to another state X_{T_{n+1}} = j with probability R_ij.
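This jump-chain/holding-time description translates directly into a simulation recipe. The sketch below (Python; the 3-state generator and time horizon are illustrative assumptions, not data from the text) draws one sample path by alternating exponential holding times with mean −1/q_ii and jumps governed by R_ij = −q_ij/q_ii.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_ctmc(Q, i0, t_max):
    """Simulate one sample path of a CTMC with finite generator Q,
    using the jump matrix R and exponential holding times."""
    times, states = [0.0], [i0]
    t, i = 0.0, i0
    while True:
        rate = -Q[i, i]
        if rate == 0.0:                      # absorbing state: stay forever
            break
        t += rng.exponential(1.0 / rate)     # holding time W_i, mean -1/q_ii
        if t > t_max:
            break
        R_row = Q[i].copy()
        R_row[i] = 0.0
        R_row /= rate                        # R_ij = -q_ij / q_ii for j != i
        i = rng.choice(len(R_row), p=R_row)
        times.append(t); states.append(i)
    return times, states

# Illustrative generator: rows sum to zero, off-diagonal entries non-negative.
Q = np.array([[-3.0, 2.0, 1.0],
              [1.0, -1.0, 0.0],
              [0.5, 0.5, -1.0]])
print(simulate_ctmc(Q, i0=0, t_max=5.0))
```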

5.3 Distributions of Birth and Death Chains

In this section, we illustrate the determination of the transition probabilities of a Markov chain from its generator.
Let (X_t, t ≥ 0) be a Birth and Death chain on the state space S = {0, 1, 2, ...}, with birth and death rates λ_i, μ_i, respectively, i = 0, 1, 2, ... (μ_0 = 0).
The generator Q is specified by (see (5.4))

q_{i,i+1} = λ_i, q_{i,i−1} = μ_i, q_ii = −(λ_i + μ_i),

and q_ij = 0 otherwise. Note that for each i,

q_ii > −∞ and Σ_{j∈S} q_ij = 0.

Consider Kolmogorov's differential equations. The forward equation (in general, an infinite system of differential equations) takes the specific form, for i ≥ 0,

P'_i0(t) = −λ_0 P_i0(t) + μ_1 P_i1(t), (5.19)
P'_ij(t) = λ_{j−1} P_{i,j−1}(t) − (λ_j + μ_j) P_ij(t) + μ_{j+1} P_{i,j+1}(t), j ≥ 1,

whereas the backward equation takes the form, for j ≥ 0,

P'_0j(t) = −λ_0 P_0j(t) + λ_0 P_1j(t), (5.20)
P'_ij(t) = μ_i P_{i−1,j}(t) − (λ_i + μ_i) P_ij(t) + λ_i P_{i+1,j}(t), i ≥ 1.
Unless the state space S is finite, it is not evident that both (5.19) and (5.20) have a common solution

{P_ij(t), i, j ∈ S, t ≥ 0}

which describes the distribution of a Markov chain having Q as its generator.
Let us start by considering a (pure) birth chain on S = {0, 1, 2, ...} (μ_i = 0 for all i ∈ S). In this case, the forward system (5.19) becomes

P'_i0(t) = −λ_0 P_i0(t) for i ≥ 0, (5.21)
P'_ij(t) = λ_{j−1} P_{i,j−1}(t) − λ_j P_ij(t) for j ≥ 1,

subject to the boundary conditions P_ij(0) = δ_ij. It is possible to solve the forward system (5.21), yielding a unique solution. This can be seen as follows. Observe that, by the nature of a birth chain, P_ij(t) = 0 for j < i and t ≥ 0. Thus from (5.21) we first have

P'_ii(t) = −λ_i P_ii(t),

so that

P_ii(t) = e^{−λ_i t}, t ≥ 0.

The other P_ij(t), for j > i, can be computed recursively via

P_ij(t) = λ_{j−1} ∫_0^t e^{−λ_j (t−s)} P_{i,j−1}(s) ds. (5.22)
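As a quick sanity check of (5.22), the following sketch (Python; the constant rate, time grid, and number of states are illustrative choices) evaluates the recursion numerically with the trapezoidal rule for the Poisson case λ_i ≡ λ, and compares P_0j(t) with the Poisson probabilities e^{−λt}(λt)^j/j!.

```python
import math
import numpy as np

lam = 1.5                                   # illustrative constant birth rate
t_grid = np.linspace(0.0, 2.0, 2001)
dt = np.diff(t_grid)

# P_00(t) = e^{-lam t}; then P_0j(t) = lam * int_0^t e^{-lam (t-s)} P_{0,j-1}(s) ds
P_prev = np.exp(-lam * t_grid)
for j in range(1, 5):
    g = np.exp(lam * t_grid) * P_prev       # e^{lam s} P_{0,j-1}(s)
    integral = np.concatenate(([0.0], np.cumsum(0.5 * (g[1:] + g[:-1]) * dt)))
    P_prev = lam * np.exp(-lam * t_grid) * integral
    t = t_grid[-1]
    poisson = math.exp(-lam * t) * (lam * t) ** j / math.factorial(j)
    print(j, P_prev[-1], poisson)           # the two columns agree closely
```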

Thus, for arbitrarily specified birth rates λ_i, i ≥ 0, the P_ij(t) given by (5.22) are non-negative and satisfy the Chapman-Kolmogorov equation. But it may happen that

Σ_{j∈S} P_ij(t) < 1 for some i, t.

Since the P_ij(t)'s are functions of the birth rates λ_i's, this phenomenon can be checked by examining the generator Q. It can be verified that if (5.22) provides a proper probability distribution, that is,

Σ_{j∈S} P_ij(t) = 1 for all i ∈ S, t ≥ 0, (5.23)

(so that P(X_t < ∞) = 1 for all t), then (5.22) is also the unique solution of the backward equation, and in this case the generator Q does specify the transition probabilities of a Markov chain. Thus conditions on Q for (5.23) to hold are of practical importance. Previously, in Section 5.2, we mentioned a sufficient condition for (5.23) to hold, namely (5.16). This condition might be too strong for Birth and Death chains. It turns out that a weaker condition is both necessary and sufficient for the generator Q of a birth chain to specify uniquely the distribution of the chain.

Theorem 5.1 Let (X_t, t ≥ 0) be a birth chain on S = {0, 1, 2, ...} with birth rates λ_i, i ≥ 0. Then a necessary and sufficient condition for (5.23) to hold is

Σ_{i=0}^∞ 1/λ_i = ∞. (5.24)

Proof. (a) Sufficiency. Assume (5.24). For n ≥ i, let

S_n(t) = Σ_{j=i}^n P_ij(t).

Using the forward equation (5.21), we have

S'_n(t) = −λ_n P_in(t).

In virtue of the condition P_ij(0) = δ_ij, we obtain

1 − S_n(t) = λ_n ∫_0^t P_in(s) ds. (5.25)

As n → ∞, the right-hand side of (5.25) decreases to a limit α(t) (since, obviously, S_n(t) increases with n). Thus for each n ≥ i,

λ_n ∫_0^t P_in(s) ds ≥ α(t),

and hence

∫_0^t S_n(s) ds = Σ_{j=i}^n ∫_0^t P_ij(s) ds ≥ α(t) Σ_{j=i}^n 1/λ_j.

Now, since S_n(t) ≤ 1 (by (5.25)), we have

α(t) Σ_{j=i}^n 1/λ_j ≤ ∫_0^t S_n(s) ds ≤ t.

Under (5.24), these inequalities can only hold when α(t) = 0 for all t. Thus S_n(t) → 1 as n → ∞, for all t, meaning that (5.23) holds.
(b) Necessity. Since

∫_0^t P_ij(s) ds = (1 − S_j(t))/λ_j, j ≥ i,

we have

∫_0^t S_n(s) ds ≤ Σ_{j=i}^n 1/λ_j.

If Σ_{j=i}^∞ λ_j^{−1} < ∞, then ∫_0^t S_n(s) ds is bounded, contradicting the hypothesis that (5.23) holds, i.e.,

lim_{n→∞} S_n(t) = Σ_{j∈S} P_ij(t) = 1 for all t. □

Remark.
For a Poisson process, the transition matrices (IP(t), t ≥ 0) form a uniform semigroup (condition (5.16) holds). Also, condition (5.24) is clearly satisfied. For a linear birth process (Example 5.1), where λ_i = iλ, the chain is not uniform, but condition (5.24) does hold. As an example of a dishonest chain, consider λ_i = i², i ≥ 1. Since

Σ_{i=1}^∞ 1/i² = π²/6 < ∞,

the above theorem asserts that, for some t and i, Σ_{j∈S} P_ij(t) < 1, so that the chain escapes to infinity at or before time t with positive probability 1 − Σ_{j∈S} P_ij(t).
For a general Birth and Death process, the situation is similar. Given arbitrary λ_i ≥ 0, μ_i ≥ 0, i ≥ 0, there always exist transition probabilities P_ij(t), as a solution to Kolmogorov's differential equations, such that Σ_{j∈S} P_ij(t) ≤ 1. Under some conditions on the λ_i's and μ_i's (e.g., if they are bounded or increase sufficiently slowly), this solution is unique and determines an honest chain.
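The dishonesty of the chain with λ_i = i² can be made vivid by simulation: the holding time in state i is exponential with mean 1/i², so the total time spent running through all states has finite expectation Σ 1/i² = π²/6. The sketch below (Python; the starting state, truncation level, and number of replications are illustrative) estimates the explosion time.

```python
import numpy as np

rng = np.random.default_rng(1)

def explosion_time(n_states=100_000, i0=1):
    """Sum of exponential holding times of a pure birth chain with
    lambda_i = i^2 over states i0, i0+1, ...; approximates the explosion time."""
    i = np.arange(i0, n_states)
    return rng.exponential(1.0 / i**2).sum()

samples = np.array([explosion_time() for _ in range(1000)])
print(samples.mean())          # close to sum 1/i^2 = pi^2/6, about 1.645
print((samples < 3.0).mean())  # most paths have exploded well before t = 3
```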

Example 5.5 Consider a machine which is either in an operating state or in a repair state. Suppose that the operating time (respectively, the repair time) is a random variable, exponentially distributed with mean 1/λ (respectively, 1/μ). One question of interest is the probability that the machine is in an operating state at time t, knowing that it was operating at time zero.
For t ≥ 0, let X_t = 0 or 1 according to whether, at time t, the machine is in an operating state or in a repair state. (X_t, t ≥ 0) is a Birth and Death chain on the state space S = {0, 1}, with birth rates λ_0 = λ, λ_1 = 0 and death rates μ_0 = 0, μ_1 = μ.
The equation (5.19) becomes, for i = j = 0,

P'_00(t) = −λ P_00(t) + μ P_01(t). (5.26)

Since P_00(t) + P_01(t) = 1, (5.26) becomes

P'_00(t) + (λ + μ) P_00(t) = μ.

Thus

P_00(t) = μ/(λ + μ) + a e^{−(λ+μ)t}.

The constant a is determined by the initial condition P_00(0) = 1, and finally

P_00(t) = μ/(λ + μ) + (λ/(λ + μ)) e^{−(λ+μ)t}.
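As a hedged numerical check (Python, with SciPy's matrix exponential assumed available; the values of λ, μ, and t are arbitrary), one can compare the closed form for P_00(t) with the (0,0) entry of e^{tQ} from (5.14):

```python
import numpy as np
from scipy.linalg import expm   # matrix exponential

lam, mu, t = 2.0, 3.0, 0.7      # illustrative rates and time
Q = np.array([[-lam, lam],
              [mu, -mu]])
closed_form = mu/(lam + mu) + (lam/(lam + mu)) * np.exp(-(lam + mu)*t)
print(expm(t*Q)[0, 0], closed_form)   # the two values agree
```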
Example 5.6 Consider a population in which deaths occur at a constant rate μ (so that μ_0 = 0 and μ_i = μ for i ≥ 1, with all λ_i = 0). We are going to solve the forward equation to obtain the transition probabilities of the death chain.
For each i, by the nature of the death chain, it suffices to find P_ij(t) for j ≤ i, subject to P_ij(0) = δ_ij. Note that P_00(t) = 1. The forward equation takes the form

P'_i0(t) = μ P_i1(t),
P'_ij(t) = −μ P_ij(t) + μ P_{i,j+1}(t), j = 1, ..., i − 1,
P'_ii(t) = −μ P_ii(t).

The solution of this system of differential equations can be obtained directly, or by using (5.14).
First, P_ii(t) = e^{−μt}. Next,

P'_{i,i−1}(t) = −μ P_{i,i−1}(t) + μ P_ii(t).

If we let f(t) = P_{i,i−1}(t) and g(t) = μ P_ii(t) = μ e^{−μt}, then the above equation is of the form

f'(t) = −a f(t) + g(t) (5.27)

(here a = μ). The solution of (5.27) is

f(t) = f(0) e^{−at} + ∫_0^t e^{−a(t−s)} g(s) ds. (5.28)

Thus

P_{i,i−1}(t) = ∫_0^t e^{−μ(t−s)} μ e^{−μs} ds = μ t e^{−μt}.

Similarly, we get

P_ij(t) = ((μt)^{i−j}/(i−j)!) e^{−μt} for j = 1, ..., i, i ≥ 1.

Since

P_i1(t) = ((μt)^{i−1}/(i−1)!) e^{−μt},

we have

P'_i0(t) = μ P_i1(t) = (μ(μt)^{i−1}/(i−1)!) e^{−μt},

so that

P_i0(t) = ∫_0^t (μ(μs)^{i−1}/(i−1)!) e^{−μs} ds, i ≥ 1.
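A brief check of these formulas (Python; the rate, starting state, and time are illustrative): P_ij(t) for j ≥ 1 is a shifted Poisson probability, and P_i0(t) collects the remaining mass, since the chain on {0, ..., i} is honest.

```python
import math

mu, i, t = 1.3, 5, 2.0     # illustrative death rate, start state, time
P = [0.0] * (i + 1)
for j in range(1, i + 1):  # P_ij(t) = e^{-mu t} (mu t)^{i-j} / (i-j)!
    P[j] = math.exp(-mu*t) * (mu*t)**(i - j) / math.factorial(i - j)
P[0] = 1.0 - sum(P[1:])    # P_i0(t): probability the population has died out
print(P, sum(P))           # a probability vector summing to 1
```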

5.4 Exercises
5.1. Let (X_t, t ≥ 0) be a continuous-time Markov chain. Show that, for any 0 ≤ t_1 < t_2 < ... < t_n < ... < t_m and i_j ∈ S, j = 1, 2, ..., m,

P(X_{t_{n+1}} = i_{n+1}, ..., X_{t_m} = i_m | X_{t_1} = i_1, ..., X_{t_n} = i_n) = P(X_{t_{n+1}} = i_{n+1}, ..., X_{t_m} = i_m | X_{t_n} = i_n).

5.2. Consider a compound Poisson process (Lesson 4): X_t = Σ_{n=0}^{N_t} Y_n, where N_t is a Poisson process with intensity λ, and the Y_n's are integer-valued random variables, independent and identically distributed with common probability density function ψ.
(i) Show that (X_t, t ≥ 0) is a Markov chain.
(ii) In terms of λ and ψ, find the transition probability matrix (IP(t), t ≥ 0) of (X_t).
5.3. Suppose that a population evolves in time according to a continuous-time Markov chain. Furthermore, suppose that each individual in the population has a probability λh + o(h) of splitting into two, and a probability μh + o(h) of disappearing, in a small time interval (t, t + h). Find the birth rates and death rates.
5.4. Prove the Chapman-Kolmogorov equation (5.7).
5.5. Let (IP(t), t ≥ 0) be the family of transition probability matrices of a continuous-time Markov chain. Show that (IP(t), t ≥ 0) is standard if and only if

P_ii(t) → 1 as t ↘ 0, for all i.

5.6. Let (X_t, t ≥ 0) be a Markov chain with finite state space S.
(i) Suppose G = [g_ij] = IP'(0) exists. Show that Σ_{j∈S} g_ij = 0 for all i ∈ S.
(ii) Show that if P_ii(t) → 1 as t ↘ 0, then this convergence is uniform in i.

(iii) Show that if P_ij(t) is right continuous at t = 0, then P_ij(t) is continuous for all t.
(iv) Show that if (IP(t), t ≥ 0) is standard, then (X_t) is non-explosive.
5.7. Show that P_ii(t) → 1 uniformly in i if and only if sup_i {−q_ii} < ∞.
5.8. Show that sup_i {−q_ii} < ∞ is equivalent to sup_{i,j} |q_ij| < ∞.
5.9. Suppose that a_n → a < ∞ as n → ∞. Show that

lim_{n→∞} (1 − a_n/n)^n = e^{−a}.

5.10. Let (N_t, t ≥ 0) be a Poisson process with intensity λ. Viewing (N_t) as a birth process,
(i) determine the generator Q;
(ii) verify that condition (5.16) is satisfied;
(iii) derive the transition probabilities P_ij(t) by using equation (5.14).
5.11. Let Z_ij(t) satisfy

Z_ij(t) ≥ 0, Σ_{j∈S} Z_ij(t) = 1,

and

Z_ij(t + s) = Σ_{k∈S} Z_ik(t) Z_kj(s).

For a distribution π_0, define

P(X_{t_1} = i_1, ..., X_{t_n} = i_n) = Σ_{i∈S} π_0(i) Z_{i i_1}(t_1) ··· Z_{i_{n−1} i_n}(t_n − t_{n−1}).

Show that Kolmogorov's consistency condition is satisfied for this system of finite dimensional distributions.
5.12. Let (X_t, t ≥ 0) be a continuous-time branching process (see Example 5.4). Verify the following:
(i) If a is the parameter of the exponential distribution of the lifetime of each particle, then for X_t = i, the chain will remain in i for a random time, exponentially distributed with parameter ia.
(ii) At the end of the above time, the chain jumps to state j ≥ i − 1 with probability f(j − i + 1).

5.13. Let IP(t) be the transition matrix function of a Markov chain. Show that, for given t_0 > 0, the values of IP(t) for t > t_0 can be computed from the values of IP(t) for t ≤ t_0. (Hint: use the Chapman-Kolmogorov equation.)
5.14. Let (IP(t), t ≥ 0) be standard.
(i) For fixed i, j ∈ S, show that the function t → P_ij(t) is uniformly continuous. (Hint: use the Chapman-Kolmogorov equation to show that |P_ij(t + h) − P_ij(t)| ≤ 1 − P_ii(|h|).)
(ii) For each fixed i ∈ S, show that

q_ii = lim_{h↘0} (P_ii(h) − 1)/h ≥ −∞

always exists (the limit may be −∞). Also,

(1 − P_ii(t))/t ≤ −q_ii for t > 0.

(iii) For fixed i ≠ j, show that

q_ij = lim_{h↘0} P_ij(h)/h

exists and is finite. (Hint: use the following fact: for each t and h small enough, and ε > 0,

P_ij(h)/h ≤ (P_ij(t)/(t − h)) · (1/(1 − 3ε)),

for ε > 0 arbitrarily small.)
5.15. Let (IP(t), t ≥ 0) be standard (of an honest Markov chain). Let i ∈ S be such that q_ii > −∞. Verify that, for any j,

P'_ij(t) = Σ_{k∈S} q_ik P_kj(t).
5.16. In Example 5.5, compute P_10(t), P_01(t), and P_11(t).
5.17. Let (X_t, t ≥ 0) be a linear birth chain with immigration (Example 5.2), that is, λ_i = ν + iλ, i ≥ 0. Use the forward equation to derive the transition probabilities of the chain.
Lesson 6

Random Walks

In this Lesson, we study a special class of discrete-time Markov chains


known as random walks. Because of their special features, these stochastic
processes deserve a Lesson in their own right.

6.1 Motivation and definitions


Various random phenomena which evolve in time can be approximately
modeled by sums of independent and identically distributed (i.i.d.) random
variables (or vectors).
A simple random walk is used to describe the path of an intoxicated person who moves one step forward or backward at random. Starting at some position on the line, say 0, the person moves one unit forward to 1 or one unit backward to −1 at time n = 1, with equal probability 1/2, and his/her motion continues with the same pattern. For n ≥ 1, let X_n denote the jump at time n. Then X_n is regarded as a random variable with

P(X_n = 1) = P(X_n = −1) = 1/2.

It is reasonable to assume that the X_n's are independent. At time n, the position of the person is

S_n = X_1 + X_2 + ... + X_n.
The above mathematical model can be used to describe the game of heads or tails. At each toss of a fair coin, you bet on its outcome, winning one dollar if, say, heads comes up, and losing one dollar if tails comes up. Your gain after n independent tosses is expressed by

S_n = X_1 + X_2 + ... + X_n, n ≥ 1,

where the X_n's are i.i.d. with

X_n = 1 if the outcome of the nth toss is heads, and X_n = −1 if the outcome of the nth toss is tails.

This game is described by the motion of a point on the set ℤ of integers, where at each time unit it jumps to one of the two neighboring states with equal probabilities.
Physicists use this type of random walk model to approximate Brownian motion (Lesson 12). In this context, S_n denotes the position of a particle after n jumps.
Stochastic processes such as (S_n, n ≥ 1) above are used to model similar random phenomena in physical science and elsewhere, such as the problem of insurance risk, the escape of comets from the solar system, the content of a dam, etc.
Roughly speaking, a random walk is a discrete-time stochastic process whose increments are i.i.d. In this Lesson, we restrict ourselves to the case where the state space of a random walk is some subset of ℤ. Note that general random walks can have state spaces such as ℤ^d or ℝ^d, d ≥ 1.
Definition 6.1 Let (X_n, n ≥ 0) be a sequence of independent, integer-valued random variables, the X_n's, n ≥ 1, being i.i.d. with common probability density function ψ. Set S_0 = X_0 and S_n = X_0 + X_1 + ... + X_n for n ≥ 1; then the discrete-time stochastic process (S_n, n ≥ 0) with discrete state space S ⊆ ℤ is called a random walk on S.
When S = ℤ, the random walk is said to be unrestricted. If S is a proper subset of ℤ, then the random walk is said to be restricted. In this latter case, the endpoints of S are called barriers. There are two main types of barriers: the endpoint i is an absorbing barrier if the "particle" remains there forever once it reaches i; the endpoint j is a reflecting barrier if the particle bounces back once it reaches j.
It is clear from Definition 6.1 that a random walk (S_n, n ≥ 0) is a discrete-time Markov chain with stationary transition probabilities. The one-step transition probability matrix IP has entries given by

P_ij = P(S_{n+1} = j | S_n = i) = ψ(j − i). (6.1)



Indeed,

P(S_{n+1} = j | S_0, ..., S_n = i) = P(X_{n+1} = j − i) = P(X_{n+1} = j − i | S_n = i) = P(S_{n+1} = j | S_n = i).

Example 6.1 (i) A simple random walk on ℤ.
Let X_n, n ≥ 1, be i.i.d. random variables with

P(X_n = 1) = p, P(X_n = 0) = r, P(X_n = −1) = q,

where p + q + r = 1. The matrix IP has entries

P_ij = p if j = i + 1, r if j = i, q if j = i − 1, and 0 otherwise.

When r = 0 and p = q = 1/2, the simple random walk is said to be symmetric. Here is a realization of a symmetric random walk, starting from state 0.
[Figure: a realization (sample path) of a symmetric simple random walk starting from state 0.]

(ii) Bernoulli random walk.
Let S_0 = 0 and S_n = X_1 + ... + X_n, n ≥ 1, where the X_n's are i.i.d. with

P(X_n = 1) = p = 1 − P(X_n = 0).

(S_n, n ≥ 0) is a Markov chain on ℕ = {0, 1, 2, ...} with transition probabilities

P_ij = p if j = i + 1, 1 − p if j = i, and 0 otherwise.

(iii) A symmetric (simple) random walk with an absorbing barrier at the origin.
Consider a random walk on ℕ starting at some i > 0. At time n = 1, the random walk will be in state i + 1 or state i − 1 with equal probability 1/2, and so on. Moreover, if the random walk enters state 0 at some time n, then it remains in this state thereafter; that is, 0 is an absorbing state: the random walk stops once it reaches state 0. If we let the X_n's be i.i.d. with

P(X_n = 1) = P(X_n = −1) = 1/2,

then

S_0 = i, S_n = i + X_1 + ... + X_n, n ≥ 1.

This is a restricted random walk in which if S_n = 0 then S_{n+1} = 0. The transition probabilities of the Markov chain (S_n, n ≥ 0) are

P_00 = 1,

and for i ≥ 1,

P_ij = 1/2 if j = i + 1 or j = i − 1, and 0 otherwise.

(iv) A simple random walk with two reflecting barriers.
Again consider a random walk (S_n, n ≥ 0) with S_0 = i > 0. Suppose i < k, where k is a given positive integer such that if S_n = k then S_{n+1} = k − 1. Also suppose that 0 is a reflecting barrier, that is, if S_n = 0 then S_{n+1} = 1. If we let S_n = X_1 + ... + X_n, n ≥ 1, where the X_n's are i.i.d. with

P(X_n = 1) = p = 1 − P(X_n = −1),

then (S_n, n ≥ 0) is a random walk on {0, 1, ..., k}, whose transition probabilities are given by

P_0j = 1 if j = 1, and 0 if j ≠ 1,

P_kj = 1 if j = k − 1, and 0 if j ≠ k − 1,

and for 1 ≤ i ≤ k − 1,

P_ij = p if j = i + 1, 1 − p if j = i − 1, and 0 otherwise.

In using random walks to model phenomena in applications, typical problems of interest are:

(a) The asymptotic behavior of S_n as n gets large. How often is a given state i visited by the random walk? The probability of returning to the origin (or to any starting state i), the average time of return to the origin, etc.

(b) The hitting time of a state i, that is, the time at which state i is first entered. Starting, say, from the origin, this is also called the first passage time from 0 to i. If state i is an absorbing state, then the hitting time of i is the time to absorption.

(c) In a restricted random walk with two absorbing states, say a and b, one is interested in computing the probability that the random walk reaches a before b, etc.

Since random walks are sums of i.i.d. random variables, it is expected that limit theorems concerning such sums, such as the laws of large numbers, the law of the iterated logarithm, and the central limit theorem, will play some role in the study of the asymptotics of random walks.

As an introductory Lesson to the topic of random walks, we treat only the case of simple random walks, using elementary techniques. Note that powerful techniques from Markov chains and Martingales (Lesson 11) can be used in the analysis of random walks.
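Before turning to the asymptotics, here is a minimal simulation sketch (Python; the value of p, the horizon, and the number of paths are illustrative choices) for generating simple random walk paths; it can be used to eyeball the drift and oscillation behavior discussed in the next section.

```python
import numpy as np

rng = np.random.default_rng(2)

def random_walk_paths(p=0.5, n_steps=1000, n_paths=5):
    """Generate simple random walk paths S_n = X_1 + ... + X_n
    with P(X = 1) = p and P(X = -1) = 1 - p."""
    steps = rng.choice([1, -1], size=(n_paths, n_steps), p=[p, 1 - p])
    return np.cumsum(steps, axis=1)

paths = random_walk_paths(p=0.6)
print(paths[:, -1])   # final positions cluster around n*(2p - 1) = 200
```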

6.2 Asymptotic behavior of the simple random walk

We illustrate first the simple case of the unrestricted random walk on the integers ℤ.
Let S_0 = 0, S_n = X_1 + ... + X_n, n ≥ 1, where the X_n's are i.i.d. with common distribution given by

P(X_n = 1) = p = 1 − P(X_n = −1), 0 < p < 1.

The common mean and variance of the X_n's are

μ = 2p − 1 and σ² = 4p(1 − p),

respectively.
The asymptotic behavior of S_n as n → ∞ is as follows.
If μ > 0 (i.e., p > 1/2), then S_n → ∞ (a.s.) by the strong law of large numbers: the random walk drifts to ∞; whereas if μ < 0, then S_n → −∞ (a.s.): the random walk drifts to −∞.
If μ = 0 (i.e., p = 1/2, the random walk is symmetric), then the random walk oscillates between −∞ and ∞ with probability one, since in this case

−∞ = liminf_{n→∞} S_n < limsup_{n→∞} S_n = ∞ (a.s.).

Remark. The above fact follows from the law of the iterated logarithm, which states that for i.i.d. random variables X_n with E(X_n) = 0 and 0 < σ² = V(X_n) < ∞,

P(limsup_{n→∞} S_n/(2σ² n log log n)^{1/2} = 1) = 1

and

P(liminf_{n→∞} S_n/(2σ² n log log n)^{1/2} = −1) = 1.

From the above asymptotic behavior, it is plausible that the random walk (as a Markov chain) is recurrent when μ = 0 and transient when μ ≠ 0. To see this, in view of Theorem 3.2 (Lesson 3), let us compute

P^n_00 = P(S_n = 0 | S_0 = 0).



For −n ≤ j ≤ n, it is clear that P(S_n = j) = 0 if n and j do not have the same parity, whereas if n and j are of the same parity,

P(S_n = j) = \binom{n}{(n+j)/2} p^{(n+j)/2} (1 − p)^{(n−j)/2}.

In particular, we have for n ≥ 1,

P(S_{2n} = 0) = \binom{2n}{n} p^n (1 − p)^n

and

P(S_{2n−1} = 0) = 0.

Thus

Σ_{n=0}^∞ P^n_00 = Σ_{n=0}^∞ \binom{2n}{n} p^n (1 − p)^n.

Next, observe that

\binom{2n}{n} = (2n)!/(n! n!) ∼ 2^{2n}/√(πn),

in view of Stirling's formula:

n! ∼ √(2πn) n^n e^{−n} as n → ∞,

where f(n) ∼ h(n) as n → ∞ means that lim_{n→∞} f(n)/h(n) = 1. (For a proof of Stirling's formula, see, e.g., W. Feller, An Introduction to Probability Theory and Its Applications, Volume I, pp. 50–52, Wiley, 1957.)
It follows that

P^{2n}_00 ∼ [4p(1 − p)]^n/√(πn).

But 4p(1 − p) ≤ 1, with equality if and only if p = 1/2. Thus

Σ_{n=0}^∞ P^n_00 = ∞ if and only if p = 1/2.

In other words, the random walk is recurrent when μ = 0 and transient when μ ≠ 0.
Note that, in the transience case, the probability of returning to state 0 infinitely often is zero, whereas, in the recurrence case, this probability is one. Specifically,

(i) If μ ≠ 0, then P(limsup_{n→∞}(S_n = 0)) = 0.

(ii) If μ = 0, then P(limsup_{n→∞}(S_n = 0)) = 1.
It is interesting to prove (i) and (ii) directly, as follows.
Let A_n = (S_n = 0), n ≥ 1. Then (i) follows from the Borel-Cantelli lemma. Indeed, when μ ≠ 0, from the above analysis we have Σ_{n=0}^∞ P(A_n) < ∞, and hence

P(limsup_{n→∞} A_n) = 0.

(ii) does not follow from the Borel-Cantelli lemma, since the A_n's are not independent (although, when μ = 0, Σ_{n=0}^∞ P(A_n) = ∞). One can prove (ii) as follows.
Let N(ω) = #{n : S_n(ω) = 0}, the number of times the random walk reaches state 0 in the realization ω, and

A_k = {ω : N(ω) ≥ k}, k ≥ 1;

then A = ∩_{k=1}^∞ A_k is the event that the random walk reaches state 0 an infinite number of times. Assuming S_0 = 0, we have

P(A) = lim_{k→∞} P(A_k).

Now state 0 is recurrent, so P(A_1) = f_00 = 1 (see Section 3.3 for notation). By induction, we have

P(A_k) = (f_00)^k = 1, for k ≥ 1,

(see Exercise 6.6), and hence P(A) = 1.
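A quick Monte Carlo illustration of this dichotomy (Python; horizon and replication counts are arbitrary choices): for p = 1/2 the number of visits to 0 keeps growing with the horizon, while for p ≠ 1/2 it stabilizes.

```python
import numpy as np

rng = np.random.default_rng(3)

def mean_visits_to_zero(p, n_steps, n_paths=100):
    """Average number of indices n <= n_steps with S_n = 0, over simulated paths."""
    steps = rng.choice([1, -1], size=(n_paths, n_steps), p=[p, 1 - p])
    S = np.cumsum(steps, axis=1)
    return (S == 0).sum(axis=1).mean()

for n in (1_000, 10_000, 50_000):
    print(n, mean_visits_to_zero(0.5, n), mean_visits_to_zero(0.6, n))
# For p = 1/2 the count grows with the horizon (recurrence);
# for p = 0.6 it levels off at a finite value (transience).
```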


Remarks.
(a) Zero-one laws. The event A = limsup_{n→∞} A_n, where A_n = (S_n = 0), in the above simple random walk on ℤ has 0 or 1 as the possible values for its probability. It is interesting to prove (ii) within the context of the so-called zero-one laws.
Recall that the Borel-Cantelli lemma is a form of zero-one law for independent events, namely, if (B_n, n ≥ 1) is a sequence of independent events, then

P(limsup_{n→∞} B_n) = 0 or 1.

Now, the A_n, n ≥ 1, are not independent, but A = limsup_{n→∞} A_n is a symmetric event, that is, A remains unchanged under any finite permutation of (X_n, n ≥ 1). By this we mean the following. A one-to-one mapping π : {1, 2, ...} → {1, 2, ...} is called a finite permutation if π(n) = n except for a finite number of n. A finite permutation of (X_n, n ≥ 1) is (X_{π(n)}, n ≥ 1), where π is a finite permutation. Thus, P(A) = 0 or 1 according to the Hewitt-Savage zero-one law. Note that, in the random walk case, the X_n's are i.i.d.
It is interesting to note that, although A = limsup_{n→∞}(S_n = 0) is not a tail event, that is, A ∉ ∩_n σ(X_k, k ≥ n), the Kolmogorov zero-one law can be used to prove that P(A) = 1 when μ = 0. Indeed, for integer m > 0, let

C_m = {ω : limsup_{n→∞} S_n(ω)/√n > m}

and

D_m = {ω : liminf_{n→∞} S_n(ω)/√n < −m}.

Then C_m and D_m are tail events, and hence, according to the Kolmogorov zero-one law, P(C_m) and P(D_m) can take only 0 or 1 as values.
By the Central Limit Theorem (in the case p = 1/2), we have

lim_{n→∞} P(S_n/√n > m) = (1/√(2π)) ∫_m^∞ e^{−x²/2} dx > 0.

On the other hand, since

limsup_{n→∞} {ω : S_n(ω)/√n > m} ⊆ {ω : limsup_{n→∞} S_n(ω)/√n > m},

we have

P(C_m) = P(D_m) ≥ lim_{n→∞} P(S_n/√n > m) > 0.

Thus

P(C_m) = P(D_m) = 1 for all m ≥ 1.

It follows that P(C_m ∩ D_m) = 1 for all m ≥ 1, and hence P(B) = 1, where

B = ∩_{m=1}^∞ (C_m ∩ D_m) = {ω : limsup_{n→∞} S_n/√n = ∞, liminf_{n→∞} S_n/√n = −∞}

(observe that the events C_m ∩ D_m decrease as m → ∞). But B ⊆ A (since the walk moves in steps of ±1, oscillation between −∞ and ∞ forces S_n = 0 infinitely often), and hence P(A) = 1.
(b) When the random walk is recurrent (p = 1/2), it can be shown that the probability that the random walk reaches a state j ∈ ℤ in a finite number of steps, denoted F(0, j), is one, for any j. In the transience case, for example when p > 1/2,

F(0, j) = 1 for j > 0, 2(1 − p) for j = 0, and (q/p)^{|j|} for j < 0.

(c) The asymptotic behavior of the simple random walk is shared by general random walks on ℤ.
We summarize the above results in

Theorem 6.1 Let (S_n, n ≥ 0) be a simple random walk on ℤ (S_0 = 0, S_n = X_1 + ... + X_n, n ≥ 1, the X_n's being i.i.d. with common mean μ = 2p − 1).
(i) If μ > 0 (resp. μ < 0), then S_n drifts to ∞ (resp. −∞) almost surely.
(ii) If μ = 0, then S_n oscillates between −∞ and ∞, almost surely.
(iii) The simple random walk is recurrent or transient according to μ = 0 or μ ≠ 0.

6.3 Returns to the origin

Let (S_n, n ≥ 1) be a simple random walk on ℤ. We are interested in the random times at which the random walk returns to the origin, when S_0 = 0. It suffices to look at the time of first return, since the intervals between consecutive visits to 0 are simply independent copies of this time.
Let T_0^0 denote the time of first return to 0. T_0^0 is the hitting time of {0}, that is,

T_0^0(ω) = inf{n ≥ 1 : S_n(ω) = 0}.

We are going to determine the generating function of T_0^0, from which the distribution of T_0^0 as well as its expected value can be derived.
Let u(n) be the transition probability P^n_00 = P(S_n = 0 | S_0 = 0) and

v(n) = P(T_0^0 = n) = P(S_j ≠ 0, j = 1, ..., n − 1; S_n = 0).

Then, for n ≥ 1, we have

u(n) = Σ_{k=1}^n v(k) u(n − k). (6.2)

Indeed, (6.2) is a special case of the following general relation: for any i, j ∈ ℤ,

P^n_ij = Σ_{k=1}^n F_k(i, j) P^{n−k}_jj, (6.3)

where P^n_ij = P(S_n = j | S_0 = i) and

F_k(i, j) = P(S_1 ≠ j, ..., S_{k−1} ≠ j; S_k = j | S_0 = i).

(6.3) is proved by induction as follows.
(6.3) is true for n = 1, since P^0_jj = 1 and P_ij = F_1(i, j).
Suppose that (6.3) holds for k ≤ n. We have

P^{n+1}_ij = Σ_{x∈ℤ} P_ix P^n_xj (Chapman-Kolmogorov equation)
= Σ_{x∈ℤ} P_ix Σ_{k=1}^n F_k(x, j) P^{n−k}_jj (by the induction hypothesis).

But

Σ_{x∈ℤ} P_ix F_k(x, j) = P_ij F_k(j, j) + Σ_{x≠j} P_ix F_k(x, j) = P_ij F_k(j, j) + F_{k+1}(i, j).

Thus

P^{n+1}_ij = Σ_{k=1}^n P_ij F_k(j, j) P^{n−k}_jj + Σ_{k=1}^n F_{k+1}(i, j) P^{n−k}_jj
= P_ij Σ_{k=1}^n F_k(j, j) P^{n−k}_jj + Σ_{k=2}^{n+1} F_k(i, j) P^{n+1−k}_jj
= P_ij P^n_jj + Σ_{k=2}^{n+1} F_k(i, j) P^{n+1−k}_jj (by the induction hypothesis)
= Σ_{k=1}^{n+1} F_k(i, j) P^{n+1−k}_jj (since P_ij = F_1(i, j)).

Now (6.2) is a form of "convolution" of the u(n)'s and v(n)'s. This suggests the use of generating functions, since

(Σ_{n≥0} u(n) s^n)(Σ_{n≥1} v(n) s^n) = Σ_{n≥1} [Σ_{k=1}^n v(k) u(n − k)] s^n.



So let

U(s) = Σ_{n=0}^∞ u(n) s^n = 1 + Σ_{n=1}^∞ u(n) s^n.

We have from (6.2)

U(s) = 1 + U(s) V(s), (6.4)

where

V(s) = E(s^{T_0^0}) = Σ_{n=0}^∞ v(n) s^n.

From the distribution of S_n (see Section 6.2), we have

u(n) = P(S_n = 0) = \binom{n}{n/2} (pq)^{n/2}

for n even (and zero for n odd). Thus

U(s) = Σ_{n=0}^∞ \binom{2n}{n} (pq)^n s^{2n} = (1 − 4pq s²)^{−1/2}. (6.5)

This result can be seen as follows.
For x ∈ ℝ and n a nonnegative integer, set

\binom{x}{n} = x(x − 1) ··· (x − n + 1)/n!.

Then

\binom{−1/2}{n} = (−1/2)(−1/2 − 1) ··· (−1/2 − n + 1)/n! = (−1)(−3) ··· (−(2n − 1))/(2^n n!) = (−1)^n (1·3···(2n − 1))/(2^n n!),

so that

1·3···(2n − 1) = (−1)^n 2^n n! \binom{−1/2}{n}.

Thus

\binom{2n}{n} = (2n)!/(n! n!) = (1·3···(2n − 1)) 2^n n!/(n! n!) = (−1)^n \binom{−1/2}{n} 2^{2n},

and hence

U(s) = Σ_n (−1)^n \binom{−1/2}{n} 2^{2n} (pq s²)^n = Σ_n \binom{−1/2}{n} (−4pq s²)^n = (1 − 4pq s²)^{−1/2} for 4pq s² < 1.

Note that, for |a| < 1, we have

(1 + a)^x = Σ_n \binom{x}{n} a^n.
From (6.4) and (6.5), we obtain

V(s) = 1- (1- 4pqs2) 1/2 . (6.6)


Letting s /' 1 in (6.6), we have

V(I) = =
1 - (1 - 4pqi/ 2 1 - (1 - 4p(1 _ p»1/2
= 1- (4p2 - 4p+ 1)1/2 = 1- [(2p _1)2j1/2 = l-12p-ll,

which is the probability that the random walk, starting at 0, ever returns
to o.
When p = q = 1/2 (symmetric simple random walk), V(I) = 1, so that
with probability one, the random walk will return to the origin. However,
the expected time of the first return is infinite, since

L: nv(n) = V'(I) =
00

E(Tg) = 00.

Finally, to obtain the distribution of T_0^0, we expand V(s) = 1 − (1 − 4pq s²)^{1/2} as a power series in s:

V(s) = 1 − Σ_n \binom{1/2}{n} (−1)^n (4pq s²)^n = Σ_{n≥1} (−1)^{n+1} \binom{1/2}{n} (4pq)^n s^{2n}.

Thus

v(2n) = (−1)^{n+1} \binom{1/2}{n} (4pq)^n, n ≥ 1

(and v(2n − 1) = 0). Note that \binom{1/2}{n} < 0 for n even, whereas \binom{1/2}{n} > 0 for n odd.
More specifically, for n ≥ 1,

v(2n) = (2/(2n − 1)) \binom{2n−1}{n} (pq)^n = (2q/(2n − 1)) P(S_{2n−1} = 1).

Indeed,

v(2n) = (−1)^{n+1} (pq)^n 2^{2n} \binom{1/2}{n}
= (−1)^{n+1} (pq)^n 2^{2n} [(−1)^{n+1} (1·3···(2n − 3))/(2^n n!)]
= (pq)^n 2^n (1·3···(2n − 3))/n!
= 2(pq)^n (1·3···(2n − 3)) 2^{n−1}/n!
= 2(pq)^n (2(n − 1))!/(n!(n − 1)!)

(by observing that (n − 1)! 2^{n−1} = 2·4···(2n − 2), so that (2(n − 1))! = (1·3···(2n − 3)) (n − 1)! 2^{n−1}). Thus

v(2n) = (2(pq)^n/(2n − 1)) [(2n − 1)!/(n!(n − 1)!)] = (2/(2n − 1)) \binom{2n−1}{n} (pq)^n = (2q/(2n − 1)) P(S_{2n−1} = 1).

We summarize the above results in

Theorem 6.2 Let T_0^0 = inf{n ≥ 1 : S_n = 0} be the time of the first return to 0. Then
(i) The generating function of T_0^0 is

V(s) = 1 − (1 − 4pq s²)^{1/2}.

(ii) When p = q = 1/2 (symmetric simple random walk), the random walk will return to 0 with probability one, but E(T_0^0) = ∞.
(iii) The distribution of T_0^0 is given by: for n ≥ 1, P(T_0^0 = 2n − 1) = 0 and

P(T_0^0 = 2n) = (−1)^{n+1} \binom{1/2}{n} (4pq)^n.
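These formulas are easy to test by simulation. The sketch below (Python; p, the horizon, and the sample size are illustrative) estimates the probability of ever returning to 0 and compares it with V(1) = 1 − |2p − 1|.

```python
import numpy as np

rng = np.random.default_rng(4)

def returned_to_zero(p, n_steps=5_000, n_paths=1_000):
    """Fraction of paths that return to 0 within n_steps (estimates V(1))."""
    steps = rng.choice([1, -1], size=(n_paths, n_steps), p=[p, 1 - p])
    S = np.cumsum(steps, axis=1)
    return (S == 0).any(axis=1).mean()

for p in (0.5, 0.6, 0.8):
    print(p, returned_to_zero(p), 1 - abs(2*p - 1))
# The empirical frequencies approach V(1) = 1 - |2p - 1| (for p = 1/2 the
# finite horizon biases the estimate slightly downward).
```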

6.4 First passage times

Consider again a simple random walk starting at 0. Let a ∈ ℤ, a > 0. The first passage time from the origin to the point a is

T_a^0 = inf{n ≥ 1 : S_n = a}.

To derive the generating function of T_a^0, it suffices to determine that of T_1^0, since

T_a^0 = T_1^0 + T_2^1 + ... + T_a^{a−1},

where T_j^i is the first passage time from state i to state j. These first passage times are i.i.d., so that

G_a(s) = (W(s))^a,

where G_a(·) and W(·) denote the generating functions of T_a^0 and T_1^0, respectively.
Now

T_1^0(ω) = inf{n ≥ 1 : S_n(ω) = 1}.

Conditioning on X_1, we get, for n ≥ 2,

P(T_1^0 = n) = p P(T_1^0 = n | X_1 = 1) + q P(T_1^0 = n | X_1 = −1) = 0 + q P(T_1^{−1} = n − 1) = q P(T_2^0 = n − 1).

Set w(n) = P(T_1^0 = n) and φ(n) = P(T_2^0 = n); we have that

w(n) = q φ(n − 1). (6.7)

Multiplying (6.7) by s^n and summing over n ≥ 2:

W(s) = w(1) s + Σ_{n=2}^∞ w(n) s^n = ps + q Σ_{n=2}^∞ φ(n − 1) s^n = ps + qs Σ_{n=1}^∞ φ(n) s^n.

But Σ_{n=1}^∞ φ(n) s^n is the generating function of T_2^0, which is the sum T_1^0 + T_2^1, so that

Σ_{n=1}^∞ φ(n) s^n = W²(s).

Thus

W(s) = ps + qs W²(s). (6.8)

The roots of this quadratic equation are

(1 ± √(1 − 4pq s²))/(2qs).

The function (1 + √(1 − 4pq s²))/(2qs) cannot be a generating function, since

lim_{s→0} (1 + √(1 − 4pq s²))/(2qs) = ∞,

whereas W(0) ≤ 1. Thus, the generating function of T_1^0 is

W(s) = (1 − √(1 − 4pq s²))/(2qs), 0 < s < 1,

and hence the generating function of T_a^0 is

G_a(s) = [(1 − √(1 − 4pq s²))/(2qs)]^a.

Now,

W(1) = (1 − |2p − 1|)/(2q) = 1 if p ≥ 1/2, and p/q if p < 1/2.

Thus

P(T_a^0 < ∞) = G_a(1) = 1 if p ≥ 1/2, and (p/q)^a if p < 1/2,

and E(T_a^0) = ∞ if p < 1/2. For p ≥ 1/2,

W'(1) = ∞ for p = 1/2, and 1/(p − q) for p > 1/2,

so that

G'_a(1) = a W^{a−1}(1) W'(1) = ∞ for p = 1/2, and a/(p − q) for p > 1/2.

Therefore, in a symmetric random walk on ℤ, all first passage times have an infinite expectation. The distribution of T_a^0 can be obtained via convolution of that of T_1^0. By expanding W(s) in a power series as in Section 6.3 (see Exercise 6.14), the distribution of T_1^0 can be obtained as

P(T_1^0 = 2n − 1) = (1/(2n − 1)) \binom{2n−1}{n} p^n q^{n−1} = (1/(2n − 1)) P(S_{2n−1} = 1), n ≥ 1.

Remark.
A direct calculation of the distribution of T_a^0 can be carried out through an analysis of the sample paths of the random walk, as follows.
From Exercise 6.1, we know that each path leading from (0, 0) to (n, a) has probability p^{(n+a)/2} q^{(n−a)/2}. The total number of paths from (0, 0) to (n, a) is \binom{n}{(n+a)/2}.
It is clear that

P(T_a^0 = n) = α p^{(n+a)/2} q^{(n−a)/2},

where α is the number of paths from (0, 0) to (n, a) without touching or crossing the level a before n. To find α, it suffices to determine the number of paths from (0, 0) to (n, a) which touch or cross the level a before time n. There are two kinds of such paths:
Type I: paths such that S_{n−1} = a + 1;
Type II: paths such that S_{n−1} = a − 1.
The total number of paths of Type I is

\binom{n−1}{(n+a)/2}.

Observe that a path of Type II must touch or cross the level a before time n − 1. Thus the total number of paths of Type II is the same as the number of paths from (0, 0) to (n − 1, a − 1) which touch or cross a before time n − 1. The following reflection principle shows that this number is the same as the number of all paths from (0, 0) to (n − 1, a + 1), that is, \binom{n−1}{(n+a)/2}.
By looking at the figure below, we see that if Γ_1 is a path from (0, 0) to (n − 1, a − 1) which touches or crosses a before time n − 1, then there is a path Γ_2 from (0, 0) to (n − 1, a + 1) obtained by setting Γ_2 = Γ_1 up to the first time θ (< n − 1) the path Γ_1 hits a, the rest of Γ_2 being obtained by reflecting Γ_1 about the level a.

[Figure: the reflection principle; a path from (0, 0) to (n − 1, a − 1) touching the level a, and its reflection about a ending at (n − 1, a + 1).]

Conversely, if Γ_3 is a path from (0, 0) to (n − 1, a + 1), then consider the path Γ_4 obtained by setting Γ_4 = Γ_3 up to the first time Γ_3 hits a (such a time exists since, in order to reach a + 1, Γ_3 must reach the level a before time n − 1), the rest of Γ_4 being obtained by reflection. Γ_4 so obtained is a path from (0, 0) to (n − 1, a − 1) which touches or crosses a before n. Thus there is a one-to-one correspondence between the two kinds of paths.
Therefore

α = \binom{n}{(n+a)/2} − 2 \binom{n−1}{(n+a)/2} = (a/n) \binom{n}{(n+a)/2}

and

P(T_a^0 = n) = (a/n) \binom{n}{(n+a)/2} p^{(n+a)/2} q^{(n−a)/2}

for n + a even and n ≥ a, that is, for n = a + 2k, k ≥ 0.

The distribution of T_a^0 for a < 0 is obtained in a similar fashion (see Exercise 6.16).
We summarize the above results in

Theorem 6.3 Let T_a^0 = inf{n ≥ 1 : S_n = a}, a > 0, a ∈ ℤ. Then
(i) The generating function of T_a^0 is

G_a(s) = [(1 − (1 − 4pq s²)^{1/2})/(2qs)]^a.

(ii) P(T_a^0 < ∞) = 1 if p ≥ 1/2, and (p/q)^a if p < 1/2.

(iii) E(T_a^0) = ∞ for p ≤ 1/2.

(iv) The distribution of T_a^0 is given by

P(T_a^0 = n) = (a/n) \binom{n}{(n+a)/2} p^{(n+a)/2} q^{(n−a)/2}

for n = a + 2k, k ≥ 0.
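A hedged simulation check of part (ii) (Python; the parameter values are arbitrary): estimate P(T_a^0 ≤ N) for a large horizon N and compare with (p/q)^a when p < 1/2.

```python
import numpy as np

rng = np.random.default_rng(5)

def hit_level(p, a, n_steps=5_000, n_paths=1_000):
    """Fraction of paths reaching level a within n_steps (estimates P(T_a^0 < inf))."""
    steps = rng.choice([1, -1], size=(n_paths, n_steps), p=[p, 1 - p])
    S = np.cumsum(steps, axis=1)
    return (S.max(axis=1) >= a).mean()

p, a = 0.4, 3
print(hit_level(p, a), (p / (1 - p)) ** a)   # empirical vs (p/q)^a for p < 1/2
```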

6.5 A classical game

Probabilities of ruin.
We turn now to the analysis of the situation where a random walk leaves a given interval in ℤ. Various practical situations (including gambling games!) motivate the following set-up.
Consider again, for simplicity, a simple random walk on ℤ. Set S_0 = 0 and

S_n = X_1 + ... + X_n, n ≥ 1,

where the X_n's are i.i.d. with

P(X_n = 1) = p = 1 − P(X_n = −1).

For i ∈ ℤ, the random walk starting at i is denoted by

S_n^i = i + S_n, n ≥ 0.

We use P^{(i)} to denote the distribution of the process (S_n^i, n ≥ 0) (see Exercise 6.7).
Now consider two integers a and b with a < b. Let T_a and T_b be the hitting times of a and b, respectively, that is,

T_a(ω) = inf{n ≥ 0 : S_n^i(ω) = a}.

Then

α(i) = P^{(i)}(T_a < T_b), for a ≤ i ≤ b,

is the probability that the random walk reaches a before reaching b.
Conditioning upon X_1, we have that

α(i) = p α(i + 1) + q α(i − 1). (6.9)



On the other hand, from the definition of α(i), it is clear that

α(a) = 1 and α(b) = 0. (6.10)

The method of particular solutions can be used to solve (6.9) (see Exercise 6.8). Here, in view of the form of the difference equation (6.9), a direct way to solve (6.9) is as follows. Observe that, in view of α(b) = 0,

α(i) = − Σ_{j=i+1}^b [α(j) − α(j − 1)], (6.11)

and from (6.9) we have

α(j) − α(j − 1) = (p/q)[α(j + 1) − α(j)], (6.12)

so that

α(i) = Σ_{j=i+1}^b (p/q)^{b−j} α(b − 1) = ((1 − (p/q)^{b−i})/(1 − p/q)) α(b − 1).

Since α(a) = 1, we obtain

α(b − 1) = (1 − p/q)/(1 − (p/q)^{b−a}),

so that

α(i) = (1 − (p/q)^{b−i})/(1 − (p/q)^{b−a}), (6.13)

for a ≤ i ≤ b, provided that p ≠ q (0 < p < 1).
When p = q = 1/2, the solution of (6.9), subject to (6.10), is

α(i) = (b − i)/(b − a), a ≤ i ≤ b.

Now let β(i) = P^{(i)}(T_b < T_a). Then, similarly, we have, for a ≤ i ≤ b,

β(i) = (1 − (q/p)^{i−a})/(1 − (q/p)^{b−a}) when p ≠ q, and β(i) = (i − a)/(b − a) when p = q.

From the above expressions for α(i) and β(i), it follows that

α(i) + β(i) = 1, for a ≤ i ≤ b,



meaning that with probability one, the random walk, starting from i, will
reach either a or b.
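As a numerical sanity check of (6.13) (Python; the values of a, b, i, and p are illustrative), one can compare the simulated probability of reaching a before b with the closed form.

```python
import numpy as np

rng = np.random.default_rng(6)

def ruin_probability(i, a, b, p, n_paths=20_000):
    """Estimate P^(i)(T_a < T_b) for the simple random walk by simulation."""
    wins = 0
    for _ in range(n_paths):
        s = i
        while a < s < b:
            s += 1 if rng.random() < p else -1
        wins += (s == a)
    return wins / n_paths

a, b, i, p = 0, 10, 4, 0.45
r = p / (1 - p)
alpha = (1 - r**(b - i)) / (1 - r**(b - a))   # formula (6.13)
print(ruin_probability(i, a, b, p), alpha)
```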
Let us interpret the above results in the context of games. Suppose that the initial capital of Player I is z and that of Player II is y. At each trial, Player I wins one dollar with probability p and loses one dollar with probability q = 1 − p. The fortune of Player I after n trials is

S_n^z = z + X_1 + ... + X_n.

Player I is ruined when the random walk enters state 0 before state z + y (if the random walk enters state z + y first, then Player II is ruined). In this context, states 0 and z + y are absorbing states. The probabilities of ruin are computed as before by taking a = 0, i = z > 0 and b = z + y.
Expected duration of the game.
In the following, for simplicity, we take a = 0 and b > 0. The simple random walk on ℤ starting at i (0 ≤ i ≤ b) represents the fortune of Player I with initial capital i:

S_n^{(i)} = i + X_1 + ... + X_n, n ≥ 0

(the initial capital of Player II is b − i). Since states 0 and b are absorbing states, the game stops when the random walk reaches either 0 or b. Thus the stopping time of the game is

τ^{(i)}(ω) = inf{n ≥ 0 : S_n^{(i)}(ω) ∈ {0, b}},

and hence the expected duration of the game, starting at i, is E(τ^{(i)}).


Let μ(i) = E(τ^{(i)}), for 0 ≤ i ≤ b. Obviously,

μ(0) = μ(b) = 0. (6.14)

To find μ(i), we derive a difference equation for μ(i) as follows:

E(τ^{(i)}) = E(E(τ^{(i)} | X_1))
= Σ_{k≥1} k [p P(τ^{(i)} = k | X_1 = 1) + q P(τ^{(i)} = k | X_1 = −1)]
= Σ_{j≥0} (j + 1) [p P(τ^{(i+1)} = j) + q P(τ^{(i−1)} = j)]
= p μ(i + 1) + q μ(i − 1) + 1.

Thus

μ(i) = p μ(i + 1) + q μ(i − 1) + 1. (6.15)

Case p = q.
A particular solution of (6.15) is μ(i) = −i². Observe that the difference of any two solutions of (6.15) satisfies the difference equation

r(i) = (1/2)(r(i + 1) + r(i − 1)),

which has r(i) = i and r(i) ≡ 1 as particular solutions, so that all solutions of (6.15), when p = q = 1/2, are of the form

μ(i) = −i² + τ + γi.

The unique solution of (6.15) under the boundary condition (6.14) is μ(i) = i(b − i).
Case p ≠ q.
In this case, (6.15) has a particular solution given by μ(i) = i/(q − p), and the difference of any two solutions of (6.15) satisfies

r(i) = p r(i + 1) + q r(i − 1),

which has r(i) ≡ 1 and r(i) = (q/p)^i as particular solutions. Thus all solutions of (6.15), when p ≠ q, are of the form

μ(i) = τ + γ (q/p)^i + i/(q − p).

Under (6.14), we have

μ(i) = (1/(q − p)) [i − b (1 − (q/p)^i)/(1 − (q/p)^b)].

Note that

(1 − (q/p)^i)/(1 − (q/p)^b) = β(i) = P^{(i)}(T_b < T_0).
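A quick simulation check of the expected duration (Python; the values of i, b, and the sample size are illustrative): for p = 1/2 the mean duration should be close to i(b − i).

```python
import numpy as np

rng = np.random.default_rng(7)

def mean_duration(i, b, p=0.5, n_paths=10_000):
    """Estimate E(tau^(i)): mean number of steps until the walk
    started at i hits 0 or b."""
    total = 0
    for _ in range(n_paths):
        s, n = i, 0
        while 0 < s < b:
            s += 1 if rng.random() < p else -1
            n += 1
        total += n
    return total / n_paths

i, b = 3, 8
print(mean_duration(i, b), i * (b - i))   # simulated mean vs i(b - i) for p = 1/2
```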
Remarks.
(i) In the above analysis, we implicitly assume that μ(i) < ∞ for all 0 < i < b. This fact can be proved as follows.

For m ≥ 1, let

τ_m^{(i)}(ω) = min{0 ≤ k ≤ m : S_k^{(i)}(ω) = 0 or b}, with τ_m^{(i)}(ω) = m if the above set is empty.

Since the τ_m^{(i)} increase with m, we have that

μ(i) = lim_{m→∞} E(τ_m^{(i)}).

On the other hand, the event {τ_m^{(i)} = k} depends only on X_1, X_2, ..., X_k, so that {τ_m^{(i)} = k} is independent of X_{k+1}, ..., X_m. As a note, the random variable τ_m^{(i)} is a stopping time with respect to the increasing sequence of σ-fields F_n = σ(X_1, ..., X_n), n ≥ 1 (F_0 = {∅, Ω}), in the sense that

{τ_m^{(i)} = k} ∈ F_k, for all k ≥ 0.

Consider the stopped sum

S_{τ_m^{(i)}}^{(i)}(ω) = Σ_{k=0}^m S_k^{(i)}(ω) 1_{(τ_m^{(i)} = k)}(ω).

We have

E(S_{τ_m^{(i)}}^{(i)}) = Σ_{k=0}^m E(S_k^{(i)} 1_{(τ_m^{(i)} = k)})
= Σ_{k=0}^m E[(S_m^{(i)} + S_k^{(i)} − S_m^{(i)}) 1_{(τ_m^{(i)} = k)}]
= E(S_m^{(i)}) + Σ_{k=0}^m E(S_k^{(i)} − S_m^{(i)}) P(τ_m^{(i)} = k)

(this follows from the fact that the random variable S_k^{(i)} − S_m^{(i)} = −(X_{k+1} + ... + X_m) is independent of the random variable 1_{(τ_m^{(i)} = k)}, as noted earlier), so that

E(S_{τ_m^{(i)}}^{(i)}) = Σ_{k=0}^m E(S_k^{(i)}) P(τ_m^{(i)} = k) = Σ_{k=0}^m [i + k(p − q)] P(τ_m^{(i)} = k) = i + (p − q) E(τ_m^{(i)}).

Thus, if p − q ≠ 0,

E(τ_m^{(i)}) ≤ b/|p − q|,

and hence μ(i) < ∞. If p − q = 0, then we need to relate E(τ_m^{(i)}) to S_{τ_m^{(i)}}^{(i)} through another quantity. It is left as Exercise 6.12 to show that

E[(S_{τ_m^{(i)}}^{(i)})²] = i² + E(τ_m^{(i)}).

Thus E(τ_m^{(i)}) ≤ b², and hence again μ(i) < ∞.

(ii) In the gambling scheme, the case b = ∞ corresponds to a game against an infinitely rich adversary. The random walk describing this situation has 0 as an absorbing barrier. More specifically, for a = 0, the equation (6.13) becomes

α(i) = α_b(i) = ((q/p)^i − (q/p)^b)/(1 − (q/p)^b) for p ≠ 1/2, and α_b(i) = (b − i)/b for p = 1/2.

Thus, letting b → ∞, the probability of ruin (starting at i) is such that

lim_{b→∞} α_b(i) = 1 when p ≤ 1/2, and (q/p)^i when p > 1/2.

Also, when p = 1/2,

lim_{b→∞} μ(i) = lim_{b→∞} i(b − i) = ∞,

whereas, for p < 1/2,

lim_{b→∞} μ(i) = lim_{b→∞} (1/(q − p)) [i − b (1 − (q/p)^i)/(1 − (q/p)^b)] = i/(q − p).

Note that in the context of a game against an infinitely rich adversary, if p > 1/2, then the game may go on forever!
Distribution of the duration of the game.
Let 0 < b and A = {0, b}. The (random) duration of the game, starting at i (0 < i < b), is

τ^{(i)}(ω) = inf{n ≥ 0 : S_n^{(i)}(ω) ∈ A}.



The event {τ^{(i)} = n} decomposes into two disjoint events:

B = {S_k^{(i)} ∉ A, k = 0, 1, ..., n − 1; S_n^{(i)} = 0}

and

C = {S_k^{(i)} ∉ A, k = 0, 1, ..., n − 1; S_n^{(i)} = b}.

Let

u(i, n) = P(B) and v(i, n) = P(C).

Then

P(τ^{(i)} = n) = u(i, n) + v(i, n).

First consider u(i, n). By conditioning upon X_1, we obtain a difference equation for u(i, n):

u(i, n + 1) = p u(i + 1, n) + q u(i − 1, n) (6.16)

with boundary conditions

u(0, n) = u(b, n) = 0 for n ≥ 1; u(0, 0) = 1, u(i, 0) = 0 for i ≥ 1. (6.17)

A difference equation for the generating function U(i, s) = Σ_{n=0}^∞ u(i, n) s^n of the u(i, n)'s is obtained from (6.16) as follows. Multiplying (6.16) by s^{n+1} leads to

u(i, n + 1) s^{n+1} = ps · u(i + 1, n) s^n + qs · u(i − 1, n) s^n.

Summation over all n ≥ 0 leads to

U(i, s) = ps U(i + 1, s) + qs U(i − 1, s) (6.18)

(noting that u(i, 0) = 0 for i ≥ 1), with boundary conditions

U(0, s) ≡ 1, U(b, s) ≡ 0. (6.19)

Consider a solution of (6.18) of the form λ^i(s). Such a solution satisfies

λ^i(s) = ps λ^{i+1}(s) + qs λ^{i−1}(s),

or the quadratic equation

ps λ²(s) − λ(s) + qs = 0,

whose roots are (1 ± √(1 − 4pq s²))/(2ps), 0 < s < 1. Thus the general solution of (6.18) is of the form

U(i, s) = a(s) λ_1^i(s) + β(s) λ_2^i(s),

where

λ_1(s) = (1 + √(1 − 4pq s²))/(2ps), λ_2(s) = (1 − √(1 − 4pq s²))/(2ps).

The boundary conditions (6.19) lead to

a(s) + β(s) ≡ 1 and a(s) λ_1^b(s) + β(s) λ_2^b(s) ≡ 0,

so that

a(s) = −λ_2^b(s)/(λ_1^b(s) − λ_2^b(s)), β(s) = λ_1^b(s)/(λ_1^b(s) − λ_2^b(s)),

and hence

U(i, s) = (λ_1^i(s) λ_2^b(s) − λ_1^b(s) λ_2^i(s))/(λ_2^b(s) − λ_1^b(s)) = (q/p)^i (λ_1^{b−i}(s) − λ_2^{b−i}(s))/(λ_1^b(s) − λ_2^b(s)),

by observing that λ_1(s) λ_2(s) = q/p.
The generating function V(i, s) of the v(i, n)'s is obtained by replacing p, q, i by q, p, b − i, respectively, in the above expression for U(i, s). The coefficients u(i, n), v(i, n) are obtained by expanding U(i, s), V(i, s) in power series as usual. For details, see Feller (1957).

6.6 Exercises
6.1. Let (X_n, n ≥ 1) be i.i.d. with

P(X_n = 1) = p = 1 − P(X_n = −1).

Let S_n = X_1 + X_2 + ... + X_n, n ≥ 1.
(i) Show that, for −n ≤ j ≤ n,

P(S_n = j) = \binom{n}{(n+j)/2} p^{(n+j)/2} (1 − p)^{(n−j)/2} if n + j is even, and 0 if n + j is odd.

(ii) Let S_0 = i and S_n = i + X_1 + ... + X_n, n ≥ 1. Find the distribution of S_n.
6.2. Let (X_n, n ≥ 1) be i.i.d. with finite mean μ ≠ 0. Let

A_n = {ω : |(X_1(ω) + ... + X_n(ω))/n − μ| > |μ|/2}.

(i) Show that

P(limsup_{n→∞} A_n) = 0.

(ii) Show that {ω : X_1(ω) + ... + X_n(ω) = 0} ⊆ A_n and

P(X_1 + ... + X_n = 0 infinitely often) = 0.

6.3. Let (S_n, n ≥ 0) be a random walk with

P(X_n = 1) = p, P(X_n = −1) = q, P(X_n = 0) = r, p + q + r = 1.

(i) Compute the common mean and variance of the X_n's and find the distribution of S_n.
(ii) Show that S_n drifts to ∞ or −∞ according to p > q or p < q.
6.4*. Let (S_n, n ≥ 0) be a simple random walk on ℤ: P(X_n = 1) = p, P(X_n = −1) = q (p + q = 1). Show that

Σ_{n=0}^∞ P^{2n}_00 = 1/|p − q|.

(Thus, if p ≠ q, then the series is convergent, whereas if p = q, then the series is divergent.)
6.5*. Let (S_n, n ≥ 0) be a simple random walk on ℤ. Viewing this stochastic process as a Markov chain with stationary transition probabilities, and using the notation of Lesson 3, define the following quantities:

P_ij = P(S_{n+1} = j | S_n = i) = ψ(j − i),

where ψ is the common density of the X_n's, n ≥ 1, S_0 = X_0, S_n = X_0 + X_1 + ... + X_n, n ≥ 1.
(i) Let g_n(i, j) = Σ_{k=0}^n P^k_ij. Verify that lim_{n→∞} g_n(i, j) exists (≤ ∞). Show that, for all i, j ∈ ℤ,

g_n(i, j) ≤ g_n(0, 0).

(Hint: use Exercise 3.13(i) of Lesson 3.)
(ii) Let

g(i, j) = Σ_{k=0}^∞ P^k_ij and f_ij = Σ_{n=1}^∞ f^n_ij,

where

f^n_ij = P(S_n = j, S_k ≠ j for k = 1, ..., n − 1 | S_0 = i).

Show that g(0, 0) = 1/(1 − f_00).
(iii) Show that, if i ≠ j, then

f_ij = lim_{n→∞} g_n(i, j)/g_n(0, 0).

6.6. Let (S_n, n ≥ 0) be a random walk on ℤ. Let N(ω) = #{n : S_n(ω) = 0}, A_k = {ω : N(ω) ≥ k}, and a = P(A_1). Show by induction that P(A_k) = a^k, k ≥ 1.
6.7. Let S_n = i + X_1 + ... + X_n, n ≥ 0, be a simple random walk on ℤ. Let

Ω_i = {(i, a_1, a_2, ...) : a_j ∈ ℤ, j = 1, 2, ...}.

Specify the σ-field A_i on Ω_i and the probability measure P_i (on A_i) which is the distribution of the stochastic process (S_n, n ≥ 0).
6.8. Consider the equation

β(i) = p β(i + 1) + q β(i − 1), a ≤ i ≤ b,

in Section 6.5 (with p ≠ q).
(i) Verify that all solutions of the above equation are of the form τ + γ (q/p)^i, for some constants τ and γ.
(ii) Show that the only solution of the equation with β(a) = 0 and β(b) = 1 is

[(q/p)^i − (q/p)^a] / [(q/p)^b − (q/p)^a].

6.9. With the notation of Section 6.5, let

β(i) = P^{(i)}(T_b < T_a).

Consider the difference equation

β(i) = p β(i + 1) + q β(i − 1) for a < i < b. (*)

(i) When p ≠ q, verify that β(i) ≡ 1 and β(i) = (q/p)^i are solutions of (*). Also, for constants τ and γ, τ + γ (q/p)^i is a solution of (*). Determine τ and γ so that β(a) = 0 and β(b) = 1.

(ii) For p = q, verify that β(i) ≡ 1 and β(i) = i are solutions of (*), and hence τ + γi is a solution of (*). Determine τ and γ so that β(a) = 0 and β(b) = 1.
6.10. In a simple random walk, show that, for any i, j ∈ ℤ,

P(limsup_{n→∞}(S_n^i = j)) = 0 or 1

according to p ≠ 1/2 or p = 1/2.


6.11. Let (X_n, n ≥ 1) be a sequence of i.i.d. random variables with

P(X_n = 1) = p, P(X_n = −1) = q, P(X_n = 0) = 1 − p − q.

Let S_0 = 0, S_n = X_1 + ... + X_n.
(i) Find the distribution of S_n, n ≥ 1.

(ii) Use the Central Limit Theorem to approximate, for integers a and b, P(a ≤ S_n ≤ b) when n is sufficiently large.
6.12. Let (X_n, n ≥ 1) be i.i.d. with P(X_n = 1) = P(X_n = −1) = 1/2. Let S_n^{(i)} = i + X_1 + ... + X_n, for 0 ≤ i ≤ b. Let

τ_m^{(i)} = min{0 ≤ k ≤ m : S_k^{(i)} = 0 or b}, with τ_m^{(i)} = m if the above set is empty.

Show that

E[(S_{τ_m^{(i)}}^{(i)})²] = i² + E(τ_m^{(i)}).
6.13. Consider a simple random walk on 'lh, starting at O. Let Tg denote
the time of the first return to O. Show that
for p = 1/2
P (To' < 00) = { ~(1- p) for p < 1/2
for p > 1/2

6.14. Expand in a power series the generating function

W(s) = (1- V1- 4pqS2) /(2qs)


146 Lesson 6

and identify the coefficients w(n). Show that w(n) = 0 for n even and
w(2n -1) (-It+ 1 ( 1/2 ) (4pq)n
n 2q
_1
2 _-P(S
1
n
2n-l _
- 1) =-1- (2n -
~-1 n
1 ) pnqn-l
'

for n ~ 1.
6.15. Let (Sn, n ~ 0) be a symmetric random walk on '/l, (p =q
1/2, So = 0), and
To(w) = inf{n ~ 1 : Sn = O}.
Compute P (To = 2n).
6.16. Let (Sn, n ~ 0) be a simple random walk with So = O. Let a E '/l,-O.
(i) Show that the distribution of the first passage time to state a is given
by
p(T2
lal
= n) = -;;:P(Sn = a), n = lal + 2k, k ~ O.
(ii) Use (i) and Stirling's formula to show that E(T~) = 00 in the case
of a symmetric random walk.
Lesson 7

Renewal Theory

This Lesson is devoted to the study of a class of random walks whose steps
are non-negative. With the interpretation of renewals, these stochastic
processes model many random phenomena of interest. Renewal theory
provides tools for the analysis of such processes.

7.1 Motivation and examples


Consider a specific item (such as a machine, a light bulb, an electronic
device ... ), which is placed in service, say, at time O. Upon its failure, a new
identical one is immediately installed, and so on. In a replacement model
such as this, one might be interested in the (random) number of items to
keep the system alive during a fixed time interval, the time from some given
time point t until the next replacement, ...
Since lifetimes of identical items are appropriately modeled as random
variables so that epochs of renewal are random times. On the other hand,
the above phenomenon evolves in time, as such, stochastic processes are
appropriate tools for modeling.
Let Xn denote the lifetime of the nth-item. It is a random variable
talking values in [0,00). It is natural to postulate that the Xn's, n ? 1, are
independent and identically distributed (i.i.d.) with a common distribution
F. The times at which renewals occur are Sn = Xl +X2 + .. ·+Xn , n ? 1,
(So = 0). Sn is the total lifetime of the first n items, or the time until the
nth failure. The discrete-time stochastic process (Sn, n ? 0) is a random
walk with steps Xn being non-negative. It represents sucessive occurences
of events (failures of items). The number of items which have failed by time
t is denoted by Nt, and clearly Nt = max{n ? 0 : Sn ~ t}. Since Nt is a

147
148 Lesson 7

random variable , one needs to specify its distribution in order to compute


quantities of interest such as the average number of renewals up to time t.
The process (Sn, n ~ 0), or (Nt, t ~ 0), is called a renewal process.
A renewal process is used to model successive occurences of events such
as the failures of items, the incidences of earthquakes, the emission of par-
ticles, ...
As spelled out above, the investigation of renewal processes consists
of studying functions of non-negative i.i.d. random variables representing
successive intervals between renewals. The main objective of renewal the-
ory is to derive properties of various random variables associated with the
Sn's and Nt's, from the knowledge of the "inter-arrival" distribution F.
(In practice, F can be estimated from observed data.) Note that this is
possible since, given the structure of a renewal process, finite dimensional
distributions of counting process (Nt, t ~ 0) can be determined from F.

Definition 7.1 Let X n , n ~ 1 be a sequence of non-negative i.i.d. ran-


dom variables (defined on some probability space (0, A, P)). The process
(Sn, n ~ 0), where So = 0, Sn =
Xl + X2 + ... + X n , n ~ I, is called a
renewal process.

Remarks.
(i) Alternatively, the associated counting process (Nt, t ~ 0) is also
called a renewal process.
(ii) Motivated by applications, the Xn's are called lifetimes, or inter-
arrival times. The Sn's are renewal times, and Nt counts the number of
renewals upto time t.
(iii) A renewal process is specified by the common distribution F of the
Xn's. In Section 7.2, we will see that distributions of Sn's and the Nt's can
be expressed in terms of F.

Example 7.1 A Poisson process (Lesson 4) is a renewal process in which


inter-arrival times are exponentially distributed. Specifically,

F(x) = (1- e- A2:)I(o.oo)(x),


where A is the intensity of the Poisson process.

Example 7.2 A Bernoulli random walk is a renewal process in which the


= =
Xn's are discrete: P(Xn 1) 1 - P(Xn 0). =
Example 7.3 Let (Yn , n ~ 0) be a (recurrent) Markov chain. Suppose that
some state j is of interest, and we are interested in the times at which the
Renewal Theory 149

chain visits j. Suppose Yo = j. Then the times between successive visits to


j are
Xl = min{n > 0: Yn = j},
and for k ~ 1,

Xk+l = min{n > X" : Yn = j} - X".


The Markov property implies that the X" 's are i.i.d., so that (Sn, n ~ 0)
is a renewal process. Note that, if Yo =f:. j, then the above X" 's, k ~ 1 are
independent, and the X" 's , k ~ 2 are i.i.d. In this case, the distribution
of Xl is different from that of the X" 's k ~ 2, and we call (Sn, n ~ 0) a
delayed renewal process. Delayed renewal processes are used to model
phenomena in which the origin of time is not a renewal time. For example,
in the replacement model, the item in service at time 0 is not new.

Example 7.4 Consider a system which can be either in state on (oper-


ating) or off (breakdown). Upon a breakdown, the system is immediately
repaired. let Y n , Zn denote the operating times and repair times, respec-
tively. If the Yn's (resp. Zn's) are i.i.d. (resp. i.i.d.), and (Yn , n ~ 1) is
independent of (Zn, n ~ 1), then Xn = Y n + Zn, n ~ 1, are i.i.d., with
the common distribution obtained as a convolution of the distribution of
Y n and Zn. The associated counting process Nt registers the number of
complete repairs by time t. A quantity of interest in this alternating re-
newal process is the probability that the system is in state "on" at time t.
The probabilistic technique for computing such a quantity is called renewal
theory.

As we will see in Section 7.3 and 7.4, renewal theory is based on an


argument, called renewal argument, which states roughly as follows. At
renewal times, the process probabilistically restarts itself, so that the future
after one of these times, say Sn, looks probabilistically like as it did back at
time 0, in other words, the process beyond Sn is a probabilistic replica of
the process starting from O. The renewal times Sn are called regeneration
times.
Formally, a stochastic process (Zt, t E T) is called a regenerative process,
if there is a sequence of random (stopping) times Sn, n ~ 0, such that
(Sn, n ~ 0) is a renewal process, and after any Sn, the process (Zt, t E T)
has the same distribution as the whole process, that is, for any n, k, 0 <
tl < t2 < ... < t", the distribution of (Zs .. +tj, 1 ~ j ~ k) is the same as the
distribution of (Zt j ' 1 ~ j ~ k); and moreover, the process (ZS .. +t, t E T)
is independent of {So, Sl, ... , Sn}. For example, in a Markov chain, if we
let (Sn) be the sequence of return times to the origin 0, then clearly the
150 Lesson 7

evolution of the chain after such return time is that of the chain starting
at O.
Examples of such regenerative processes are (Nt, t ~ 0), (SN1 +1 - t, t ~ 0).
See also the renewal property of Poisson processes in lesson 4.
The renewal argument, based upon regenerative processes, is essential
in deriving renewal equations in renewal theory.

7.2 The counting process


Let (Xn,n ~ 1) be a sequence of i.i.d., positive random variables with
common distribution F. Let So = 0, Sn = X -1 + Xa + ... + X)n, n ~ 1.
The counting process (Nt, t ~ 0) is defined by

Nt(w) = sup{n ~ 0: Sn $ t}, t ~ O.

We have No = 0, so that Nt is the cardinality of the set {n ~ 1 : Sn $ t}.


Thus the time point 0 is not considered as a renewal.
Since the Xn's are assumed to be positive, we have Jl = E(X) > O. As
a consequence, for each t, the random variable Nt is finite almost surely
(a.s.). See Exercise 7.1. Thus we can write

Nt(w) = max{n ~ 0 : Sn $ t}. (7.1)


However, as t -+ 00, Nt -+ 00, a.s. Indeed, in the one hand, from the
definition (7.1), Nt is monotone increasing in t, and on the other hand,
SInce
(Nt ~ n) = (Sn $ t), (7.2)
lim P(Nt
t ..... oo
> n) = 1, for any n,
we obtain
P (lim
t ..... oo
Nt = 00) = 1. (7.3)
The distributions of Sn and Nt can be expressed in terms of F as follows.
Since Sn is a sum of i.i.d. random variables, its distribution Fn is the
n-fold convolution of F with itself, that is

P(Sn $ x) = Fn(x) = F*n(x) = (F * F * ... * F) (x), (n times).


(See Lesson 1). Now

P(Nt = n) = P(Nt ~ n) - P(Nt ~ n + 1)


Fn(t) - Fn+l(t)
= rn(t) - F*(n+1)(t). (7.4)
Renewal Theory 151

Note that since So = 0, its distribution is l[o,oo)(x), so that F*O(x) =


l[o,oo)(x). Of course, F*l = F.

Example 7.5 Consider a renewal process with

F(x) = (1- e->'X) l[o,oo)(x).

Since F is absolutely continuous, F *F has density I * I, where I(x) =


Ae->,x1(0,00)(x). Now

1* I(z) = l z
I(z - x)/(x)dx = A2 e->'z z, z ~0
and hence
Anzn-1e->'z
rn(z) = r(n-l) * I(z) = 1_ 1\1 l(o,oo)(Z) (Gamma distribution),

P(Nt = n)
1°t
F*n(z) _ F*(n+l)(t)
An zn-le->.z 1t An+1 zne->.z
dz
~--~~dz-
(n-1)! ° n!
( At)n
_,_e->'t, n ~ O.
n.
As in the case of Poisson processes, the random variable Nt in a general
renewal process has finite moments of all orders. This can be seen as follows.
Since the Xn's are not concentrated at 0, there is some a > 0 such that
P(Xl ~ a) > O. Consider the truncated renewal process:

X:(w) = a1(x.. ~a)(w).


Then, clearly, X: ~ Xn and

S: = Xl +X~ + ···+X: ~ Sn,


and hence
Nt ~ Nt = max{n ~ 0 : S: ~ t}.
The result follows by observing that the random variable N ta + 1 has
a negative binomial distribution with parameters p = P(Xl ~ a) and
r= [~] + 1, (where [x] denotes the integer part of x), that is

P(Nt+1=k)= (~::::~ )pr(l_ p)k_ r , k ~ r.


152 Lesson '1

Indeed, (Nt'" = k - 1) is the event that the rth "sucess" (getting the value
a in a Bernoulli trial with outcome a or 0) occurs at the kth trial.
We describe now the asymptotics of Nt, as t ~ 00. First, note that by
(7.3), Nt ~ 00 as t ~ 00, (a.s.), we have that

SN.!Nt ---+ Jl, (a.s.), as n ~ 00,

provided that Jl < 00. Indeed, by the strong law of large numbers, Sn/n ~
Jl, a.s., as n ~ 00. On the other hand,

Sn(w)
{W : - n- ---+ Jl
} n {w: Nt(w) ~ oo} ~
{SN'(W)(W)}
w: Nt(w) ---+ Jl .

Theorem 7.1 Let 0 < Jl = E(Xn) < 00. Then

lim Nt = .!. (a.s.).


t_oo t Jl

Proof. For each t > 0, we have

SN. ~ t < SN.+I


so that
SN.
- <-t
- - .Nt-+-1
<SN.+1
Nt - Nt Nt + 1 Nt
and the result follows by the discussion prior to the statement of the The-
~m. 0
Remark. When Jl = 00,

· -
11m Nt= 0 (a.s.).
t_oo t

To see this, apply the Theorem 7.1 to the truncated sequence X~ = Xn1(X"Sa)
and the associated Nt, S~, and then letting a ~ 00.
In the case of Poisson processes, the Xn's are exponentially distributed
= =
with Jl .A-I and variance (1'2 Var(X n ) .A-I. Thus E(Nt ) =.At t/Jl= =
and Var(Nt ) = t(1'2/Jl 3 . In general, it can be shown that (see Section 7.4)
t (1'2 - Jl2
E(Nt ) = - + 2? + 0(1), t ~ 00
Jl Jl
and
(1'2t
Var(Nt ) = 3"" + o(t), t ~oo.
Jl
Renewal Theory 153

Thus, it is expected that

(Nt - ~) I J u 2t I JJ3
will converge in distribution, as t - 00, to the standard normal random
variable.

Theorem 7.2 Suppose that u 2 = Var(X) < 00. Then

r p (Nt tiP) 1 1:1:


t~~ Ju-2tlp3 ~X = c)(x) = V2-i e- Y /2dy,
-00
2
"Ix E JR.

Proof. Let x E IR. We have

P (Nt - tlJJ <


..;u 2tl p3 -
x) = P (Nt < ! + J I'
x u 2 tlJJ 3 ) = P (Sn(t) > t) ,
where n(t) is the integer part of tip + xJu2t11'3.
Thus, it suffices to show that

lim P (Sn(t)
t-oo
> t) = c)(x).

Now, n(t) - 00 as t - 00, so that we have, by the central limit theorem,


and by the fact that

t - n(t)JJ
n(t)yfu -+ -x, as t - 00,

we get

lim P (Sn(t) > t) lim P (Sn(t) - n(t)JJ > t - n(t)JJ)


t-oo t_oo n(t)yfu n(t)yfu
l-C)(-x)=c)(x). <>

7.3 Renewal equations


We turn now to an important object in the study of renewal processes. As
proved in the previous section, for each t ~ 0, E(Nt ) < 00. The renewal
function m(t) is defined to be

m(t) = E(Nt ), t ~ o.
154 Lesson 7

The function m(t) can be expressed in terms of the common distribution


F of the Xn's as follows.

= L: P(Nt ~ n) = L: P(Sn ~ t) = L: Fn(t) = L: F*n(t).


00 00 00 00

m(t) (7.5)
n=l n=l n=l n=l

From (7.5), we see that

L: Fn(t) = F(t) + L: Fn+l(t)


00 00

m(t) = Fl(t) +
n=2 n=l

L: (Fn * F) = F(t) + (m * F)(t),


00

= F(t) + (7.6)
n=l

1t
where
(m * F)(t) = m(t - x)dF(x).

An equation of the form (7.6) is referred to as a renewal equation. A


renewal equation is an integral equation of the form

A(t) = H(t) + 1t A(t - x)dF(x) (7.7)

in which, the functions H and F are known, whereas A(·) is unknown.


In (7.6), the renewal function m(·) satisfies the renewal equation with
H(t) =F(t). In fact, m( . ) is unique solution of (7.6) which is bounded
on finite intervals. As we will see in the next section, many quantities of
interest can be expressed as solutions of renewal equations, so that renewal
theory consists of solving renewal equations and of studying the asymptotic
behavior of these solutions.
For (7.6), the solution m is written as

m = ~ Fn = (~Fn) * F = (m + 1) * F.
It turns out that the solution of (7.7) has the same pattern, namely

A(t) = H(t) + 1t H(t - x)dm(x),

or
A=(m+l)*H. (7.8)
Renewal Theory 155

Indeed, we have
H+(m+l)*H*F = H+F*H+F2*H+···
= H*(1+F+F2+···)=H*(m+l).
Assuming that A and H are bounded on finite intervals, (m + 1) * His
the unique solution of (7.7). Indeed, if B is another solution of (7.7), then
for G = B - (m + 1) * H, we have G = G * F (recalling (m + 1) * H is a
solution of (7.7». But then,
G = G * F = (G * F) * F = ... = G * Fn, for all n.
Thus

G(t) = lot G(t - z)dFn(z), for all n

= lim
n-oo}o
t G(t - z)dFn(z).

By hypothesis, the function G is bounded on [0, t], for each fixed t, say
IGI ~ at, so that
11t G(t - z)dFn(z)1 ~ atFn(t).
But m(t) = E:'=l Fn(t) < 00, implying that
lim Fn(t) = 0, for each fixed t.
n-oo
We have that G == 0.
As an example, consider A(t) = E (SN.+1). The so-called renewal argu-
ment (Section 7.1) is used to derive a renewal equation for A(t). Specifically,
A(t) = E [E (SN.+d] .
Now
for t <z
E(SN.+1IX1 =z)= { :+A(t-z) for t ~ z.
Thus

A(t) = looo E(SN.+dX1 = z)dF(z)

1 00
zdF(z) + lot [z + A(t - z)] dF(z)

looo zdF(z) + lot A(t - z)dF(z)


E(X 1 ) + (A * F)(t).
156 Lesson 7

The solution A(t) of the above renewal equation is expressed in terms of


= =
the known function H(t) E(XI} fooo xdF(x), and the renewal function
m(t) via (7.8):

E(SNt+I} = E(XI} + fat E(XI)dm(x) = (m(t) + I)E(XI}.


Now m(t) + 1 = E(Nt + 1), so that, for the random sum SNt +1
Xl + X 2 + ... + XN1 +1, we have
E (SNt +1) = E(Nt + I)E(XI}. (7.9)

Remark. (7.9) is a special case of a result known as the Wald's equation,


namely, if X n , n ~ 1 is a sequence of i.i.d. random variables with finite
mean and N is a stopping time with respect to (Xn, n ~ 1), such that
E(N) < 00, then
E(XI +X2 + .. ,+XN) = E(N)E(XI}. (7.10)
In the above analysis, Nt + 1 is indeed a stopping time since (Nt + 1 = n)
if and only if Xl +X2 + .. '+Xn - l ~ t and Xl +X2 + .. ,+Xn > t. (Note
that Nt is not a stopping time, since to determine whether ot not the event
(Nt = n) has occured, we need also to look at X n +1') On the other hand,

E(Nt + 1) = m(t) + 1 < 00, for t > O.


The asymptotic behavior of the renewal function m(t), as t --+ 00, is
expressed in the following theorem.

Theorem 7.3 (Elementary renewal theorem). If 0 < I' = E(XI} < 00,
then
lim m(t) = ~. (7.11)
t_oo t I'

Proof. By (7.9), we have


1
m(t) = -E (SNt+I) - 1,
I'
so that
m(t) 1 1 1
- = -+-E(SN+I-t)--.
t I' I't t t
Since SNt+1 ~ t, we have E(SNt+1 - t) ~ 0, so that
· . f-
1ImlD m(t)
-
> -1. (7.12)
t_oo t - I'
Renewal Theory 157

On the other hand, since

SNt - t ~ SNt+1 - SNt = XNt +1,

we have
E (SNt - t) ~ E (XNt+d
(Note that, in general, E (XNt +1) f E(Xd.) and

m(t) 1 1
-t - -+
< p. -E(XN+d·
p.t t

If the Xn's are bounded a.s., that is,

P(Xn ~ a) = 1, n ~ 1, for some a > 0,


then E (XNt +1) ::; a so that

. m(t)
1Imsup-- 1
<-. (7.13)
t-co t - P.

If Xn's are not bounded a.s., then we apply the previous analysis to the
truncated renewal process

X~ = { ;;n if Xn
if Xn
<a
~ a

noting that m(t) ::; ma(t), leading to

m(t) _1 .
lim sup -t ~ E(Xf)
t-co
By letting a -+ 00, E(Xf) ::; E(Xd by monotone convergence theorem (see
Appendix), we obtain the result. <>
Remark. When p. =00, limt_co m(t)/t =
0, by using the truncation
technique.
The Theorem 7.3 is in fact a consequence of a more general theorem in
the next section.

7.4 Renewal Theorems


The Elementary Renewal Theorem (Theorem 7.3) in the previous section
states that, for large t, the expected number of renewals m(t) is of the order
158 Lesson 7

of til-'. A refinement of this result will be given below in Theorem 7.4 (Re-
newal Theorem). In Theorem 7.4, we distinguish two types of distribution
F.
In the context of renewal processes, F is the distribution of a non-
negative random variableX. If X is discrete with values in 1N = {O, 1,2, ...},
then, with probability one, X takes values of the form nd, for n E 1N and
= =
d 1. It is clear that z nd is a point of increase of F in the sense that,
for any real numbers a < z < b, we have F(b) - F(a) > 0, or equivalently,
for any s > 0, F(z + s) - F(z - s) > 0. More generally, the distribution F
°
is said to be arithmetic (or lattice) if there is d > such that all points of
increase of F are of the form nd, n E 1N. The largest such d is called the
span of F. An arithmetic distribution F corresponds to a random variable
X which assumes, with probability one, only values which are multiples of
d. If there is no such d for F, then F is said to be non arithmetic. For
example, if F is continuous, then F is nonarithmetic.
Here is the so-called Renewal Theorem, its proof is complicated and
hence omitted.

Theorem 7.4 (Renewal Theorem). (i) If the distribution F is nonarith-


metic, then, for any h > 0,

m(t + h) - m(t) --+ hll-' as t -+ 00. (7.14)

(ii) If the distribution F is arithmetic with span d, then, for any h which
is a multiple of d, (7.14) holds.

Remarks.
(a) In the statement of Theorem 7.4, the limits are ° when J.t = 00.
(b) The interpretation of Theorem 7.4 is this. For t large, the expected
number of renewals in an interval of length h is approximately hi1-'.
The fact that Theorem 7.4 implies Theorem 7.3 is left as an exercise
(Exercise 7.6).
It turns out that Theorem 7.4 is equivalent to Theorem 7.5 (below)
which is useful in determining asymptotics of solutions of renewal equations.
Specifically, the limit, as t -+ 00, of A(t) = H(t) + (H * m)(t), solution of
renewal equation A(t) = H(t)+(A*F)(t), is provided in Theorem 7.5 when
the function H( . ) satisfies certain conditions.
Since technical details as well as a formal proof of Theorem 7.5 will be
omitted, we focus instead on motivation and applications of this theorem.
Renewal Theory 159

First, observe that if H(t) = l[o,IJj(t), then


A(t) = l[o,IJj(t) + 1t l[o,IJj(t - x)dm(x)

i t
t-IJ
dm(x) = m(t) - m(t - a), for t > a.

It follows from Theorem 7.4 that

lim A(t) = -
t ..... oo
11
J.I. 0
00
H(x)dx, (7.15)

where 10 00
H(x)dx denotes the usual Riemann integral of H(x) on [0,00).
Recall that a (measurable) function H: [0,00) --+ [0,00) is Riemann-
integrable on [0,00) if H is Riemann-integrable on [0, a] for all a > 0, and
limIJ ..... oo IolJ H(x)dx exists. (The Riemann integral of H on [0,00) is then
taken to be this limit).
From above we see that (7.15) holds for H(t) = l[o,IJj(t), which is
Riemann-integrable on [0,00). However, if H(t) is an arbitrary Riemann-
integrable function on [0,00), (7.15) may fail (see Feller (1966), Vol II, pp
349). To see which additional conditions we need to impose on Riemann-
integrable functions H, consider the following.
For h > 0, the intervals [(n -1)h, nh), n ~ 1, form a partition of [0, 00).
Let
an(h) = inf{H(x) : (n -1)h ~ x < nh},
fin(h) = sup{H(x) : (n - l)h ~ x < nh},
and
00 00

f(t) = L: an(h) 1[(n-1)h,nhj(t), g(t) = L: fin (h) 1[(n-1)h,nhj(t).


n=O n=O

Then
f(t) ~ H(t) ~ g(t), Vt ~ 0,
so that
(f * m)(t) ~ (H * m)(t) ~ (g * m)(t).
Suppose that

L: an(h) L: fin (h)


00 00

a(h) = h and fi(h) = h


n=l n=l
160 Lesson 7

converge absolutely, then it can be shown that

lim (J * m)(t) =
t-+oo
~a(h)
J1.
and lim (g * m)(t)
t-+oo
= ~(j(h),
J1.

So that

~a(h)
J1.
~ liminf(H
t-+oo
* m)(t) ~ limsup(H
t-+oo
* m)(t) ~ ~(j(h).
J1.
If, in addtion, we suppose that

lim ((j(h) - a(h)) = 0,


h'\.O

then (7.15) holds with

10[00 H(z)dz = h'\.O


lim a(h) = lim (j(h).
h'\.O

Thus for H : [0,00) --+ [0,00) such that (j(h) < 00, for h > 0, and
limh'\.o ((j(h) - a(h)) = 0, (7.15) holds. Since the Riemann integral of such
a function H is obtained directly as limh'\.o a(h), H is said to be directly
Riemann integrable.
A directly Riemann integrable function is Riemann integrable on [0, 00),
but the converse fails.
It can be shown that (see Exercise 7.7) the concept of direct Riemann
integrability coincides with the usual Riemann integrability for a function
which is zero outside of some finite interval, or monotonic.
Examples.
(i) H(t) = l[O,aj(t).
(ii) H ~ 0, non-increasing and 1000 H(z)dz < 00.
We now state, without proof, the following important theorem.
Theorem 7.5 (Key Renewal Theorem). Let (Sn, n ~ 0) be a renewal
process with interarrival distribution F, and mean J1. = 00 zdF(z). Let A 10
be the solution of the renewal equation

A(t) = H(t) + (A * F)(t), t ~ 0,


where H is directly Riemann integrable on [0,00).
(i) If F is nonarithmetic, then

lim A(t)
t-+oo
11
= -J1. 0
00
H(z)dz. (7.16)
Renewal Theory 161

(ii) If F is arithmetic with span d, then

d
=- L
00
lim A(x + nd) H(x + kd). (7.17)
t-+oo p. k=O

(The limits are zero when p. = 00, and (7.14) holds for all x> 0).

In the rest of this section, we are going to use (7.16) to derive asymp-
totics of various quantities of interest in renewal theory.
In Section 7.3, we mentioned that, in a renewal process with F nonar-
=
ithmetic, and E(Xl) p., Var(X) 0'2 < 00, =
t 0'2 - p.2
E(Nt ) = m(t) = -
p.
+ 2?
p.
+ 0(1), t -+ 00.

This result can be shown now by using the following strategy:


(i) Using the renewal argument to obtain a renewal equation for

t
A(t) = m(t) + 1 - -.
p.

First note that


t t
A(t) E(Nt ) + 1- - = E(Nt + 1) - -
p. P.
1 t 1
-E (SN.+d - - = -E (SN.+1 - t).
p. p. p.

Now,

X - t for t ::; x
E (SN.+1 - tlX1 = x) = {
E (SNt_",+1 _ (t - x)) for t > x

so that

E(SNt +1 - t) E [E (SNt+l - tIX1)]

1 00
(x - t)dF(x) + lt p.A(t - x)dF(x)

or
p.A(t) = H(t) + (p.A * F)(t).
162 Lesson 7

(ii) The function H(t) = JtOO(:c - t)dF(:c) is directly Riemann integrable


since it is monotone non-increasing and Riemann integrable over [0,00) (see
Exercise 7.9), and hence by (7.16),

lim 1'A(t)
t-+oo
=!
l'
1 0
00
H(:c)d:c = ,-1'2_2+_U_2
l'

Since m(t) -tf1' = A(t) -1, we have

,!~~ [m(t) _ ~] = u2 - 1'2


l' 21''''
or
t u 2 - 1'2
m(t) = - + 2? + 0(1), t -+ 00.
l' l'

We consider now various types of "lifetimes" when we observe a renewal


process at some time point t. Given t, the random time Ct = t - SNi is
called the current lifetime, whereas SNi+l - t = R t is called the residual
(or excess) lifetime. The total lifetime is C t +Rt = XNi+l. We are going to
use renewal theory to derive distributions of these lifetimes. Of course, it
suffices to find the joint distribution of (Ct , Rt). In the sequent, we assume
that F is nonarithmetic.
Since
P (Ct ~ :c, Rt ~ y) = P (Rt - y ~ :c + y), (7.18)
we need first to determine the distribution of residual lifetime Rt.
For this purpose, we let, for fixed y, A(t) = P(Rt > -V). We derive a
renewal equation for A(t) by conditioning on Xl:

for:c > t + y
P(R, > .IX, > z) = { ~(R,-. >.) for t < :c < t + y
for t ~ :c.
Thus

A(t) = 10 00
P(Rt > ylXl > :c)dF(:c)

1t+y
00 dF(:c) + it
0
A(t - :c)dF(:c)

H(t) + (A * F)(t),
where
H(t) = 1 00

t+y
dF(:c) =1- F(t + v).
Renewal Theory 163

The solution of this renewal equation is

A(t) = H(t) + (H * m)(t),


so that

P(Rt ::::; y) = 1 - A(t) = F(t + y) -It [1- F(t +y- x)]dm(x). (7.19)

The limiting distribution of Rt , as t -+ 00, can be obtained by using the


Theorem 7.5.
The above function H(t) = 1 - F(t + y) is non-increasing. It will be
directly Riemann integrable if it is Riemann integrable over [0,00), that is

1 00
[1 - F(t + y)]dt < 00.

But, for y > 0,

1 00
[1- F(t + y)]dt = 100
[1- F(x)]dx ::::; 1 00
[1- F(x)]dx = E(Xd = Jl..
Thus H(t) is directly Riemann integrable when Jl. < 00, and in this case,
(7.16) yields

lim P (SNt+ 1
t .... oo
- t 11
> y) = -
Jl. 11
00
[1- F(x)]dx (7.20)

= -11
or
lim P(Rt ::::; y) 00
[1- F(x)]dx.
t .... oo Jl. 11

Now, (7.18) and (7.19) yield

lim P (Ct
t .... oo
;::: X, Rt ;::: y) 11
= -Jl. 00

:1'+11
[1- F(z)]dz, (7.21)

which, in turn, yields the limiting distribution of the current lifetime Ct:

lim P (Ct ;::: x) = t lim P (Ct ;::: x, Rt ;::: 0) = ..!:.1°O [1- F(z)]dz. (7.22)
t .... oo .... oo Jl. :I'

Note that the distribution of Ct can be derived from that of Rt as


follows.
For x > t, it is obvious that P(Ct ::::; x) = 1. For x ::::; t

P(Ct < x) = P(Rt-:I'::::; x) = F(t) -I t


-:I'[l- F(t - z)]dm(z).
164 Lesson 7

The distribution of the total lifetime XN. +1 = C t + R t can be obtained,


in principle, from the joint distribution of (Ct , Rt ). As we will see, the
distribution of XN.+l is different from that of Xl, since the random variable
XN.+l depends on Nt.
We can obtain the distribution of XN.+1 by using renewal theory.
Let A(t) = P (XN.+1 > V), then, as usual,

A(t - z) for z ~ t
P (XN.+l > ylXl = z) = { 1 for z > max(t, y)
o elsewhere

and hence

P(XN.+l > y) = 1 00
P(XN.+1 > ylXl = z)dF(z)
= 1 00
P(XN.+l > ylXl = z)dF(z)
+lot P(XN.+1 > ylXl = z)dF(z)
1 dF(z) + Jot A(t - z)dF(z)
00
max(t,y)
H(t) + (A * F)(t),
where H(t) = 1 - F (max(t, y». Thus

A(t) = H(t) + (H * m)(t). (7.23)

Again, since H is directly Riemann integrable, we have, by (7.16),

lim A(t)
t-+oo
11
= .J1.:. . 0
00
[1 - F (max(t, y»]dt.

As an example, consider F(z) = (1- e-Aa:) l(o,oo)(z). Then

H(t) = e- A max(t,y) and met) = At.

The formula (7.23) becomes

A(t)= P (XN.+1 > y) = [1 + max(t, v)] e- AY , y> 0,

which is different from P(XI > y) = e- Ay •


Finally, we consider a useful generalization of renewal processes.
Renewal Theory 165

(i) Delayed renewal processes.


In a renewal process, the random variables X n , n ~ 1, represent, e.g.,
lifetimes of successive items placed into service. The times of renewal are
Sn = Xl +X2 + ... +Xn . To handle the situation where the origin oftime
is not a reneqal time, that is, the first item has been installed for some time
prior to time zero, we need to distinguish the first random variable Xl from
the Xn's for n ~ 2. In such a case, the distribution G of Xl is different
from the common distribution F of the Xn's n ~.
Formally, let (Xn, n ~ 1) be a sequence of independent, positive random
variables such that the Xn's, n ~ 2, are i.i.d. Let So = 0 and Sn =
Xl + X 2 + ... + X n , n ~ 1. Then (Sn, n ~ 0) is called a delayed renewal
process.
As in the case of an ordinary renewal process, the quantities of interest
are

NP(w) = max{n ~ 0: Sn(W) ~ t} and mD(t) = E (NP) .


Basic results for delayed renewal processes are derived as in the case of
ordinary renewal processes. For example,

P (NP = n) P(Sn ~ t) - P(Sn+l ~ t)


(G * Fn-d(t) - (G * Fn)(t),
00

mD(t) = E (NP) = :L(G * Fn-d(t) = G(t) + (G * m)(t),


n=l

lim ~mD (t) = !,


t-oo t p.
and
lim [mD(t
t-oo
+ h) - mD(t)] = !!.,
JJ
provided that F is nonarithmetic. Note that JJ is the mean of X 2 •
(ii) Stationary renewal processes.
From (7.19), we see that the limiting distribution of the residual lifetime
Rt = SN.+1 - tis
11"
-
JJ 0
[1 - F(x)]dx = lim P (SN1 +1 - t
t-oo
~ y) .

If a renewal process has been operated for a long time so that the residual
lifetime of the item in service at time zero has above limiting distribution,
166 Lesson 7

then we have a stationary renewal process. Specifically, in a delayed renewal


process, if the distribution G of Xl is specified to be

G(y) = -l1
I-' 0
Y
[1- F(x)]dx, y> 0,

then this delayed renewal process is called a stationary renewal process. In


such a process, the renewal rate is constant. Indeed, since m D (t) satisfies
the renewal equation

mD(t) = G(t) + (m D * F)(t) (Exercise 7.12)

whose solution is mD(t) = til-'.


Note that in a stationary renewal process, the counting process (Nf, t ~
0) has stationary increments (see Exercise 7.13). Also, (Nf, t ~ 0) is a
stationary process (see Lesson 2). Poisson processes are stationary renewal
processes. Indeed

F(x) = (1- e->'X) l(o,oo)(x) and I-' = 10


00
[1- F(x)]dx =
1
I'

l1
Thus
-
I-' 0
Y [1- F(x)]dx = A l0
Y e->,xdx = 1- e->'y = F(y).

7.5 Exercises
7.1. Let (Xn' n ~ 1) be a sequence ofi.i.d. nonnegative random variables.
(i) Show that if E(Xn) = 0 then P(Xn = 0) = 1.
(ii) Show that E(Xn) > 0 if and only if P(Nt < 00) = 1, where Nt =
sup{n ~ 0: Sn ~ t}, t ~ 0, So = 0, Sn = Xl + X2 + ... + X n , n ~ 1.
7.2. Let (Nt, t ~ 0) be a Poisson process with intensity A. Use the formula
00

m(t) = E(Nt ) = 2: Fn(t)


n=l

to compute m(t).
7.3. Use the renewal argument to show that the renewal function m(t)
satisfies the renewal equation

m(t) = F(t) + lt m(t - x)dF(x).


Renewal Theory 167

(i) For x > 0, verify that

o ift<x
E(NtIXl = x) = { 1 + m(t - x) ift ~ x.

(ii) Use m(t) = E [E(NtIXI)] to verify that


m(t) = 1t E (NtlX l = x) dF(x) = F(t) + 1t m(t - x)dF(x).

7.4. Let (Xn, n ~ 1) be LLd. with P(Xn 1) 1 - P(Xn= =-1) = = 1/2,


=
Sn Xl + X2 + ... + X n , n ~ 1, and T(w) min{n : Sn =
I}. =
(i) Show that T is a stopping time, that is, (T = n) E cr(Xl' ... X n ), for
all n.
(ii) Compute E(ST).
(iii) Use Wald's equation (7.10) to show that E(T) = 00.

7.5. Consider a renewal process with

F(x) = (1- e->':I:) l(o,oo)(x).


For h > 0, compute m(t + h) - m(t). Verify that

lim (m(t
t-oo
+ h) - m(t» = ~,
I-'

where I-' = E(X) = 1000 xdF(x).


7.6. Consider a renewal process with distribution F.
(i) Suppose that F is nonarithmetic. Use Theorem 7.4 to show that

m(n) 1
- - ---+ - , as n -+ 00.
n I-'

(Hint: Look at m(n + 1) - m(n) and use the fact that if Xn -+ x then
E~=l Xk/ n ---+ x, as n -+ 00.)

(ii) Suppose that F is nonarithmetic. Use (i) to show that

m(t) 1
- - ---+ - as t -+ 00.
t 1-"
168 Lesson 7

(iii) Suppose that F is arithmetic with span d. Show that the result of
(ii) still holds. (Hint: First show (i) by looking at m(nd + d) - m(nd) and
applying Theorem 7.4.)
7.7. Let H : [0,00) ---. [0,00). Show that
(i) If H is directly Riemann integrable on [0,00), then H is Riemann
integrable on [0,00).
(ii) If H is non-increasing and Riemann integrable on [0,00), then H is
directly Riemann integrable.
(iii) If H is continuous and zero outside some finite interval, then H is
directly Riemann integrable.
7.8. Suppose that the distribution F is nonarithmetic. Use Theorem 7.5 to
obtain (i) of Theorem 7.4.
7.9. Let H(t) = Jooo(x - t)dF(x). Show that H(.) is monotone non-
increasing and Riemann integrable over [0,00).
7.10. Let (Sn, n ~ 1) be a renewal process with F(x) = (1 - e- AX ) 1(o,oo)(x).

(i) Find the distribution of SNl +1 - t.


(ii) Find the distribution of t - SNi •
(iii) Show that SNlH - t and t - SNI are independent.
(iv) Use (7.22) to find the distribution of XNI+1 (Hint: Consider y ~ t,
and y > t in the integration). Compute the mean of XNI +1.
7.11. Let Ct =t - SNjI and for each fixed y, Ay(t) = P(Ct ~ y).
(i) Show that the function t -+ Ay(t) satisfies the renewal equation
Ay(t) = =
H(t) + (Ay * F)(t) with H(t) [1 - F(t)]1[o,y](t).
(ii) Solve the renewal equation in (i) to find the distribution of Ct.
7.12. Let (Sn,n ~ 0) be a delayed renewal process. Show that mD(t)
satisfies the renewal equation

mD(t) = G(t) + (m D * F)(t).


7.13. Let (Sn,n ~ 0) be a delayed renewal process. Show that if the
counting process (Nf, t ~ 0) has stationary increments, then necessarily
the distribution G of Xl is given by
Renewal Theory 169

1 r
G(y) = P(XI ~ y) = P10 [1- F(:c)]d:c,
where F is the common distribtuion of the Xn's, n ~ 2, and J1. = Iooo [1 -
F(:c)]d:c.
7.14. (Alternating renewal processes).
Consider a machine which can be either in operating condition (on) or
in breakdown state (off). At time t = 0, the machine is on and remains
on until time UI at which it breaks down. let VI be the repair time after
the first breakdown. After repair, the machine will be on for a length of
time U2, and so on. Suppose that the random variables Un, n ~ 1 (resp.
Vn , n ~ 1) are i.i.d. with common distribution G (resp. H), and these two
sequences of random variables are independent of each other.
Let Xn = Un + Vn , n ~ 1.
(i) Show that So = 0, Sn = Xl + X 2 + ... + X n , n ~ 1 form a renewal
process. Specify the common distribution F of the Xn's in terms of G and
H.
(ii) What is the meaning of Nt = max{n : Sn ~ t}.
(iii) Let pet) be the probability that the machine is on at time t. Find
a renewal equation for pet).
(iv) Solve the renewal equation in (iii) to get pet).
(v) Suppose that F is nonarithmetic, find the limiting probability
limt_oo pet).
7.15. Let (Sn,n ~ 0) be a (delayed) stationary renewal process. Show that
the distribution of SNi+1 - t is independent of t, namely,

P (SNi+1 - t ~ 1i
y) = -
J1. 0
Y
[1- F(:c)]d:c

for all t.
Lesson 8

Queueing Theory

This Lesson presents an introduction to the stochastic analysis of queueing


systems. Queueing systems arise in a variety of activities in fields such as
management and technology. The applications of stochastic processes such
as Markov chains, random walks and renewal processes constitute the core
of the analysis.

8.1 Modeling and structure


In activities such as inventories, industrial processes, communications and
transportation traffic, business operations, physical processes ... , we are
interested in understanding the process involved for better planning and
control.
Consider the situation where "customers" arrive at some location to
seek some kind of service. The operation of such a system is described as
follows. Given a number of servers, a customer, upon arrival, will receive
service immediately if one of the service counters is free, otherwise he has
to wait by joining a waiting line or a queue. Obviously, in such a dynamical
system, problems of interest include such things as the length of the queue,
the waiting time of a customer in the queue, the total time spent by a
customer in the system (waiting time plus service time at a counter), ...
With appropriate interpretations of terminologies such as customers,
servers, arrival times and service times, the above mentioned activities share
a common structure. In a situation such as this, it is a scientific routine to
develop a general theory in order to study all similar phenomena. The first
step in building such a theory is to examine the basic components of a typi-
cal case. In the example of customers above, the basic components forming

171
172 Lesson 8

the system are: the way in which the customers arrive, the type of service,
the service policy and the number of servers. Having identified the type
of service, the service policy and the number of servers, we are uncertain,
except in trivial cases, about the arrival times of customers as well as the
duration of their service times at the counters. If we view the irregularities
in these uncertain quantities as statistical fluctuations, then we can model
them as random quantities using probability theory. Moreover, queueing
systems evolve in time, stochastic processes are a natural tool of modeling.
The random components of a queueing system consists of the arrival
times and the service times of customers. If we denote by To =
Tl < T2 < ... the successive arrival times of customers, then, as ran-
° <
dom variables, they form a point process on [0,00) (see Lesson 4). Let the
inter arrival times be Xn = Tn - Tn - l , n ~ 1. The Xn's are assumed to
be i.i.d., positive random variables, with common distribution F (so that
Tn = Xl + ... + X n , n ~ 0, is a renewal process). Also, let Yn be the
service time of the nth customer. It is reasonable to assume that the Yn's
are positive i.i.d. random variables, with common distribution H. More-
over, (Xn' ~ 1) and (Yn , n ~ 1) are independent of each other. With this
structure, the random components of a queueing system are characterized
by the distributions F and H.
To complete the description of the structure of a queueing system, we
need to specify the service policy and the number of severs.
As an introduction to queueing theory, we consider only the most nat-
ural and the simplest service policy, namely ''first come, first served", that
is, customers are served in the order of their arrival. The number of servers
s is either 1 (single server system), 1 < s < 00 (s counters in parallel),
or even s = 00 (in this case, each customer will be served immediately on
arrival, so that there is no queue. This situation has not only theoretical
interests, but also can be used as approximations to systems with large s).
For simplicity, we assume that the capacity of the queueing system under
study is unlimited, that is all arrivals are admitted to the system.
Thus, in the case of unlimited capacity and a specified service policy
(here ''first come, first served"), a queueing system is characterized as a
triple F / H / s. For example, consider a queueing system with s = 1, and in
which, customers arrive according to a Poisson process with rate A, so that
F is exponential, that is

F(z) = (1- e->':II) 1(0,00)(z).

Sometimes, the service time distribution H can be postulated to be ex-


ponential with parameter 1-'. Of course, in practice, the modeling of the
arrival process as a Poisson process is appropriate when chances are small
Queueing Theory 173

that more than one arrival can occur in a small time interval, such as in
telephone communication systems.
Queueing systems F I His are classified according to the nature of F,
H, and s. Thus MIMII denotes a queueing system with both F and H
exponential and s = 1 (a single server Poisson queue), where M stands
for Markov, in view of the lack-of-memory property of the exponential
distribution (namely, P (X > s + tlX > t) = P(X > s), for all t, s > 0).
Let G stand for "general", then MIG/3 denotes a queueing system with
F exponential, H arbitrary, and s = 3. In this lesson, we will study the
system MIMIs, 1 ~ s ~ 00, and MIGII.

8.2 The queue MIM/1


We start out by considering a single server queueing system with inter-
arrival time distribution F and service time distribution H being exponen-
tial with parameter A and 1', respectively. Various random quantities of
interest will be spelled out, and the analysis will consist of obtaining their
distri bu tions.
Let Q(t) denote the number of customers present in the queueing system
at time t (either waiting or being served). Q(t) is referred to as the system
length at time t. (Q(t), t ~ 0) is a continuous-time stochastic process with
discrete state space IN.
In the special case of MIMII queues, it turns out that (Q(t), t ~ 0) is
a Markov chain, in fact, a Birth and Death process.

Theorem 8.1 In a MIMII queue, the process (Q(t), t ~ 0) is a birth and


= =
death process with birth rates Ai A, i ~ 0, and death rate 1'0 0, I'i 1', =
i> 1.

Proof. The transitions in (Q(t), t ~ 0) are caussed only by arrivals and


departures of customers. In the one hand, the arrival process is a Poisson
process, independent of service process, and on the other hand, the lack-
of-memory property of the (exponential) service time distribution implies
that the excess (residual) service time at any time t is independent of all
past service times. From these observations, it follows that (Q(t), t ~ 0)
is a Markov chain with stationary transition probabilities. Now, it can be
checked that, as h "" 0,
Ah + o(h) for k = 1, i ~ 0
P (Q(t + h) - Q(t) = kIQ(t) = i) = { I'h + o(h) for k = -1, i ~ 1
O(h) for Ikl > 1.
<>
174 Lesson 8

In view of Theorem 8.1, it suffices to determine the transition matrix


[Pij(t)], where
Pij(t) = P (Q(t + s) = jIQ(s) = i).
This can be achieved by solving Kolmogorov forward equations for Birth
and Death processes (Lesson 5). In our case for i ~ 0,

pIo(t) = -APlo + J.'Pi1(t)


PIj(t) = APi,j-l(t) - (A + J.')Pij(t) + J.'Pi,j+1(t) for j ~ 1.

The above system of differential equations can be solved by the method


of generating functions and Laplace transforms. An explicit expression for
Pij(t) can also be obtained by a combinatorial method. The computational
procedures are rather lengthy. We summarize below the results (for details,
see, e.e., Prabhu, 1965).

Theorem 8.2 In a M / M /1 queue, the transition probabilities of the Birth


and death chain (Q(t), t ~ 0) are given by

Pij(t) = OJ_i(t) + p-i-lOi+i+l(t) + (1- p)piE>-i-j-2(t), t ~ 0, i,j E IN,


(8.1)
where
OJ(t) = f
n=O
e-At(At)n+j . e-1Jt(J.'t)n
(n + j)! n!'
j ~ 0,

A
p= -, O_j = p-j OJ (t), j ~ 0,
J.'
and
E>j(t) = I:0i(t).
i~j

Remarks.
(i) If we let A(t), D(t) be the number of arrivals and departures during
(0, t], respectively, then (OJ(t),j E 7l) is the probability density of the
random variable X(t) = A(t)-D(t), and E>j(t) = P(X(t) :5 j) (see Exercise
8.1).
(ii) For j ~ 0, if we denote by Ij(z) the (modified) Bessel function of
order j, that is

1 (z)2n+j
I: nUn + 1)! 2"
= n=O
00

Ij(z) ' Z E JR, (8.2)


Queueing Theory 175

then
OJ (t) = e-()..+/J)t rJ /2 1j(2t"';>;). (8.3)
(iii) In Exercise 8.1, the student is asked to show that

P(X(t) = -j) = (r)j P(X(t) = j).


(iv) Let X, Y denote the inter-arrival time and service time, respectively.
The ratio p = E(Y)I E(X) is referred to as the traffic intensity of the
queueing system. In the M1MII queue, p = AI Il, which is the ratio of the
arrival rate and the service rate. Also, since

E (A(Y» = E [E(A(Y)IY)]
= 1 o
00
AtdH(t) = AE(Y) = -,
A
Il
p is the expected number of arrivals during a service time.

Using Theorem 8.2, we investigate now the behavior of the process


(Q(t), t~ 0) for large t. As expected, if p < 1 (that is A < Il), then the
system-length will be a steady state after a long period of time, whereas if
p > 1, with probability one, the system-length grows to infinity. Specifi-
cally,

Theorem 8.3 For i,j E IN,

t~~ Pij(t) = { a1 - p)rJ if p < 1


if p ~ 1.

Proof. Look at (8.2) and (8.3). First, we use the following result (see
Erdelyi, 1953):
efl:
1j(x) "" tn=:' X -+ 00, for all j.
v2'1rx
For x = 2t.,jAji, and t -+ 00, we have

_()..+ )t ·/2
'>:;
e2t V"''' 'Ji"2 t
e -(YX- v,., ·/2
O·(t)
J "" e /J"J
IF J.... I ..... rr--... - nf ~\'1J?I' \ 1 / ... Y"J .

In (8.1), we also need the asymptotic behavior of E>k(t) as t -+ 00. Now


k _(YX_..jii)2 t k

E>k(t) = .L OJ(t) "" 2t'lrt)1/2(AIl)1/4 .L pi/2.


3=-00 3=-00
176 Lesson 8

If P > 1 then 1:;=-00 pi/2 < 00, so that


lim ek(t) = 0, for all k.
t-+oo
Thus,
lim Pij(t) = O.
t-+oo
If p = 1, then (8.1) reduces to
Pij(t) = OJ_i(t) + 0i+i+l(t),

and hence
lim Pij(t) = O.
t-+oo
If p < 1, then
lim ek(t)
t-+oo
= 1, for all k,

sInce
00 _(..;x_.,fij)2 , 00
1- ek(t) = .L OJ(t) '" 2(:t)1/2().. )1/4 . L pi/2 < 00
.1=k+1 J1. .1=k+l

(see Remark (i) above). Thus,

lim Pi;(t) = (1 - p)pi.


1-+00 <>

Remarks.
(i) The above theorem says that 7r(j) = lim,-+oo Pij(t) exists for all
i E IN, and are independent of i. If p < 1, then 7r is the unique stationary
distribution of the Markov chain (Q(t), t ~ 0), whereas, if p ~ 1, then there
is no stationary distribution.
(ii) It is left as an exercise (Exercise 8.2) to verify that, when p < 1, the
geometric distribution

7r(j) = (1 - p)pi , i = 0, 1,2, ...


is indeed a stationary distribution of (Q(t), t ~ 0).
(iii) When p ~ 1, we have, for all i,i E IN,
lim P (Q(t)
1-+00
= iIQ(O) = i) = O.
Queueing Theory 177

As an application of Theorem 8.3, let us compute several quantities of


interest when the queueing system M / M /1 is in its steady state mode. We
assume that p = >"/Jl. < 1. Since 7r(j) = (1 - p)pi, j = 0,1,2, ... is the
stationary distribution of (Q(t), t ~ 0), we have

lim P (Q(t) = j) = 7r(j), j = 0, 1,2, ...


t-+oo

Thus, in its steady state mode, the queueing system is governed by the
random variable Q(oo), or simply Q, whose distribution is 7r( .).
The expected number of customers in the system is
00 00 >..
E(Q) = l:j7r(j) = (1- p) l:jtl = 1 ~ =---==1"'
;=0 ;=0 p Jl.

Let Q" denote the number of customers in the queue, then

Q" = { ~-1 ifQ =


if Q ~ 1.
°
Thus we have
E(Q") = t ( j -1)7r(j) = _(
>..2 .• ,
;=1 Jl. Jl.
Let W be the total time spent by a customer in the system. Its distri-
bution is obtained as follows. Note that this customer joins the queue after
some large time t has elapsed, so that the distribution of W does not vary
much with t.
Now, W is the time required to serve all customers present in the system
at the time of arrival of the customer in question, including his service time.
Thus
W = Yt + Y2 + ... + YQ+1,
where Yt is the residual service time of the customer being served. Since
the service time Y is exponential (with parameter Jl.), Yt is also exponetial
with parameter Jl., due to the lack-of-memory property of the exponential
distribution. Therefore the conditional distribution of W given Q = j,
j = 0,1,2, ..., is the Gamma distribution with parameter (j + 1, Jl.). We
have
00

peW ~ z) l:P(W ~ zlQ = j)P(Q = j)


;=0
00 1:1: e-I-'t Jl.H1 t; .
l: 'r (1- p)P' dt
;=0 0 J.
178 Lesson 8

1 : e-ptJ!l!;-dt
(t)i
(1- p)1' 1
o J!
= (1 - p)1' 111: exp [-1'(1- p)t] dt =1 - e-(J.I-.>')t.

Thus, W is exponentially distributed with parameter I' - A, and hence


E(W) = 1/(1' - A).
Let W" be the waiting time of a customer. Then

ifQ= 0
W" = { ~t + Y2 + ... + YQ if Q ~ 1.

We have
P(W" = 0) = P(Q = 0) = 1- p
and
00

P(O < W":5 z) Ep(O < W" :5 zlQ = j)P(Q = j)


i=1

t;lor
00 e-/lt l'iti - 1
(j 1\1
.
(l-p)P'dt

1 - pe-/l(1- P)1I: , z> O.


Thus, putting together,
P(W" :5 z) = 1 - pe-/l(1- p )lI:, z ~ O.

Note that the distribution F of W" has a discontinuity at 0 (F(O) = 1- p),


and is continuous on z > 0 with

dF(t) = PI'(1- p)e-/l(1- p)lI:dz.


From the distribution of W" , we get

E(W*) = ~
It is easy to check that

(i) E(Q) = AE(W), E(Q") = AE(W*) (Little's formulae).


(ii) E(W) = E(W") + 1/1', E(Q) = E(Q*) + A/I'.
These relations turn out to hold in general queueing systems.
Queueing Theory 179

Remarks.
(i) For a MIMII queueing system, the waiting time W(t) is the time
required to serve all customers present in the system at time t. It is clear
that
-if Q(t) = 0
W(t) = { ~t + Y 2 + ... + YQ(t) if Q(t) ?: 1.

(W(t), t ?: 0) is a Markov process, and the distribution of each W(t) can be


obtained as in the case of W· above. As a matter of fact, the distribution
of W· is the limiting distribution of W(t), as t -+ 00, when p < 1. It can
be shown that, when p?: 1, liIDt-+oo P(W(t) ~ x) = 0, for all x, so that, as
t gets large, the waiting time W(t) increases indefinitely.
(ii) Queueing systems evolve in cycles, where a cycle is formed by an idle
period and a busy period. An idle period is the time interval during which
no customers are present in the system (so that the server is continuously
idle). In a MIMII queue, it is clear that idle periods are i.i.d. exponentially
distributed with parameter A. A busy period is a time interval during which
the server is continuously occupied. If a customer, upon his arrival, finds
the server free, then the busy period, initiated by him, begins. During his
service time, other customers arrive and form a queue, waiting for service.
The busy period ends when the server becomes free again. It can be shown
that the busy period B has density function 9 given by

1
g(x) = -{)-l(X), x > 0, (see (8.3»
x

where
e-(>'+/J):J: 00 1 ( ) 2n+l
{)-l(X) = 'P ~n!(n+l)! x.;>:P .

By taking Laplace transform of g, it can be seen that

P(B<OO)={ ~/p if p ~ 1
if p > 1,

meaning that the busy period will eventually end when p ~ 1, and may
continue indefinitely when p > 1, with positive probability.
It we let Bi denote a busy period initiated by i customers (that is,
initially there are i customers in the queue), then B j is the sum of i i.i.d.
random variables which are distributed as B.
180 Lesson 8

8.3 The queues M/M/s, 1 < s < 00


In this section, we consider queueing systems with more than one servers.
We continue to assume that customers arrive according to a Poisson process
with rate A (so that the inter-arrival times X n , n ~ 1, are i.i.d. exponen-
tially distributed with mean I/A), and the service times Yn , n ~ 1, are Li.d.
exponentially distributed with mean 1/1', and (Xn, n ~ 1) , (Yn , n ~ 1) are
independent of each other.
First, consider a queueing system M/M/s with 1 < s < 00 (s-server
queue). The service policy is ''first come, first served", and the queue
capacity is unlimited.
Let Q(t) denote the system-length at time t (waiting or being served).
A customer who arrives at time t must wait in the line if all servers are
occupied (i.e., Q(t) ~ s ), otherwise, he is served immediately. As in the
case of a M/M/l queue, (Q(t), t ~ 0) is a Birth and death chain with birth
and death rates given by (Exercise 8.5):

Ai A, i ~ 0,
if1~i~s
I'i = { il'
Sl' if i >s
1'0 O. (8.4)

The quantity Sl' is referred to as the rate of global service. The traffic
intensity of M/M/s is defined to be p = A/Sl'.
As usual, the transition probabilities of the chain (Q(t), t ~ 0) can be
obtained by solving, say, the forward equations (see (5.19), lesson 5). For
i ~ 0, j ~ 1,

PIo(t) = -AoPiO(t) + I'lPi1(t),


PIj(t) = Aj-1PI,j-l(t) - (Aj + I'j )Pij(t) + I'H1Pi,Hl(t). (8.5)

Since the analytic solution of the system (8.5) is complicated, we restrict


ourselves to the stationary mode of the chain. Let us proceed informally
as follows. let 7r(j) = limt_oo Pij(t). Then the 7r(j)'s satisfy (8.5) with
PIj(t) = 0 so that
Ao7r(O) 1'17r(1)
(Aj + I'j )7r(j) Aj_l7r(j - 1) + I'j+17r(j + 1), j ~ 1. (8.6)

If we add the above equations (j = 0,1, .. . n), then

An7r(n) = I'n+l7r(n + 1), n ~ O. (8.7)


Queueing Theory 181

Thus,
AoA1 ... An-111"(0),
= 1'11'2'" J.tn
1I"(n) n~1. (8.8)

On the other hand, since E:=o 1I"(n) = 1, we get

11"(0) = [1 + f: AoA1 ... An_1]-1 (8.9)


n=1 1'11'2'" J.tn '
provided, of course, that

Loo AoA1'" An-1


-=--=---=..:---=. < 00. (8.10)
n=1 1'11'2'" J.tn
For Ai, J.ti given in (8.4), we have

f: AoA1 ... An-1


,-1
"AoA1" 'A n -1 +L
00
AOA1" 'An -1
n=1 1'11'2'" J.tn ~ 1'11'2'" J.tn n=, 1'11'2'" J.tn
,-1 1 (A)n 00 1 (A)n
~ n!' ; + n_,
n_1 ~ s!sn-, ;
_ 8 -1 1 ( A) n S' 00 n

- ~ n! ; + s! ~ P ,

where p = A/sJ.t. Thus (8.10) holds when p < 1. Therefore, suppose


that p < 1, the chain (Q(t), t ~ 0) has 11"( .) as its stationary distribution.
Specifically, using (8.4), we have

E
-1
[
,- 1 1 Ani A 8 ]
11"(0) = n! (;) + s! (;) (1- p)-l ,

and

~(j) = { +, (~)i
J. /.I

,!,,-.
1 (A)i
Ii 11"(0)
for 0
for j
~ j

~ s.
<s

As in the M/M/1 queue, various quantities of interest in a M/M/s


queue can be easily computed using the stationary distribution 11"( .). It
is left as an exercise (Exercise 8.6) for the computations of the expected
system-length as well as the expected number of customers in the queue
E(Q*), where here
ifQ < s
Q* = { ~- s if Q ~ s.
182 Lesson 8

Consider next the queue M/M/oo. In this case, there is no queue: each
customer, upon arrival, is immediately served. The system length Q(t)
becomes the number of busy servers at time t. (Q(t), t ;::: 0) is a Birth and
death process with birth and death rates given by

~i =~, i;::: 0,
I'i = iI', i;::: 1.
and the corresponding forward equations are

PIo(t) -~PiO(t) + I'Pi1(t)


PIj(t) ~Pj,j-l(t) - (~+ jl')Pjj(t) + (j + 1)I'Pj,j+1(t), j;::: 1.

It can be shown that the solution Pjj(t) of this system of equations is given

1; ( ~
by

Pjj(t) = j n ) e-q).!IJpi-nqj-H2n (~) n

where p = e- IJt = 1- q.
Here, for all 0 < ~, I' < 00, the stationary distribution of (Q(t), t ;::: 0)
exists and is given by

1r(j) = lim Pj.(t) _


t-oo
(~)j e-).!IJ j;::: 0,
I' J.. , '
J - - --

that is, 1r( . ) is a Poisson distribution with parameter ~/1'.

8.4 The queue M/G/1


Consider waiting phenomena in which it is still reasonable to model inter-
arrival times of customers as random variables exponentially distributed
(with parameter ~) but the service time Y might be more realistically
modeled by a random variable having a distribution different than an ex-
ponential distribution, such as uniform, triangular, or deterministic (con-
stant service times), ... In general, the service time distribution H is not
specified. We investigate essentially the stochastic process Q(t), t ;::: 0) and
its stationary distribution. But when H is not exponential, the process
(Q(t), t ;::: 0) is no longer Markovian. However, this process has a set of re-
generation points, namely {tn, n 2:: I}, where tn is the instant at which the
nth-customer leaves the queueing system, in other words, (Q(t),t;::: 0) is a
regenerative process. Note that {tn, n 2:: I} is a set ofregeneration points if
Queueing Theory 183

and only iffor all t > T E {tn, n ~ I}, the conditional distribution of Q(t)
given Q(T) is the same as the conditional distribution ofQ(t), given Q(z),
z ::; T. Thus, the evolution of Q(t) after T is independent of its history dur-
ing (0, T]. Note that if Q(t) is Markovian, such as in a M/M/l queue, then
the set of all t is the set of regeneration points for Q(t). It follows that the
discrete-time process Qn = Q(Tn), n ~ 1, is a discrete-time Markov chain.
(Qn, n ~ 1) is referrred to as an imbedded Markov chain. We are going to
determine the stationary distribution of (Qn, n ~ 1) to provide informa-
tion about the queueing system, where characteristics of the system will be
computed after time which is an instant of departure of a customer. Note
that in a M/M/l queue where (Q(t), t ~ 0) is Markovian, the queueing
system will be in steady mode after a large and arbitrary time t.
We assume that the variance of the service time, V ar(Y), is finite, and
we write E(Y) = 1//l.
It is easy to check that Qn = Q(t n ), n ~ 1, is a Markov chain. Note that
Qn is the number of customers left behind in the system by the customer
at time tn. Let Cn denote the number of customers arriving during the
service time of the nth-customer, that is Cn = N(Yn ), where N(t) is the
number of arrivals during (0, t], and Yn is distributed as H. Then

P(Cn = j) = E (P(Cn = jlY)) = 1 00


P(N(Y) = jlY = t)dH(t)
=
o
100 e->.t(>.t);
., dH(t).
J.
Note that the variables Cn, n ~ 1 are i.i.d., with common distribution aj
with
1
aj = 00 e->'t().t)j (j!)-ldH(t), j ~ 0.

Since
Qn+l ={
Cn+l
Qn - 1 + Cn+l .
ifQn = °
ifQn ~ 1,
or, equivalently,
Qn+1 = Qn - On + Cn +1 , (8.11)
where
if Qn ~ 1
On ={ ~ if Qn = 0,
we see that Qn+l does not depend on Qm, m::; n - 1.
From (8.11), the one-step transition matrix IP of the chain (Qn, n ~ 1)
is easily obtained.
POj = P(Qn+l = jlQn = 0) = P(Cn+1 = j) = aj, j ~ 0.
184 Lesson 8

For i ~ 1

P(Qn+1 = jlQn = i) = P(Cn+1 = j-Hl) = {qj-i+1


o if j - i + 1 ~ 0
if j - i + 1 < O.

Thus, IP = [Pij] with


POj = aj, j ~ 0,
and
p... _ { aj -i+1 for 1 ::; i ::; j + 1
IJ - 0 for i > j + 1.
Note that aj > 0 for all j ~ OJ the chain (Qn, n ~ 1) is irreducible and
aperiodic. Thus its stationary distribution exists when p < 1, where the
traffic intensity is p = E(Y)/E(X) = >'/1-" Since p = E(Cn ), which is the
expected number of arrivals during a service time, intuitively, the queue
length increases without limit as time increases, if p > 1. In fact, it can be
shown that the chain (Qn, n ~ 1) is positive recurrent, null recurrent, or
= =
transient according to p Ei=oja; is < 1, 1, or > 1, respectively.
When p < 1, the stationary distribution 7r(j) = liffin_oo Ptj, j ~ 0, is
the solution of 00

7r(j) = 2: 7r(i)Pij, j~O (8.12)


i=O
with 7r(j) ~ 0 and Ei=o 7r(j) = 1.
To solve (8.12), we use the method of generating functions. First,
rewrite (8.12) in terms of the a_j's:

π(j) = π(0) p_{0j} + Σ_{i=1}^∞ π(i) p_{ij}
     = π(0) a_j + Σ_{i=1}^{j+1} π(i) a_{j-i+1}
     = π(0) a_j + Σ_{i=0}^{j+1} π(i) a_{j-i+1} - π(0) a_{j+1}.

Thus,

Σ_{j=0}^∞ π(j) z^j = π(0) Σ_{j=0}^∞ a_j z^j
  + (1/z) Σ_{j=0}^∞ ( Σ_{i=0}^{j+1} π(i) a_{j-i+1} ) z^{j+1}
  - (π(0)/z) Σ_{j=0}^∞ a_{j+1} z^{j+1}.   (8.13)

Let

A(z) = Σ_{j=0}^∞ a_j z^j,   Π(z) = Σ_{j=0}^∞ π(j) z^j.
We see that

A(z) Π(z) = Σ_{j=0}^∞ b_j z^j = B(z),

where b_j = Σ_{i=0}^{j} a_{j-i} π(i). Thus (8.13) becomes

Π(z) = π(0) A(z) + (1/z)[B(z) - b_0] - (π(0)/z)[A(z) - a_0]
     = π(0) A(z) + (1/z)[A(z)Π(z) - a_0 π(0)] - (π(0)/z)[A(z) - a_0]

and hence

Π(z) = π(0)(1 - z) A(z) / (A(z) - z).   (8.14)

Now, using (8.11), we have, in the stationary mode,

ρ = E(C_{n+1}) = E(δ_n) = P(Q_n ≥ 1) = 1 - P(Q_n = 0) = 1 - π(0).

Thus (8.14) becomes

Π(z) = (1 - ρ)(1 - z) A(z) / (A(z) - z).   (8.15)
Remark. From (8.15), the stationary distribution π(·) can be determined as
follows. Given the service time distribution H, we have A(z) = ψ(λ - λz),
where ψ is the Laplace transform ψ(z) = \int_0^\infty e^{-zt} dH(t). From
this, an explicit expression for Π(z) is obtained. Then the π(j)'s can be
determined by developing Π(z) into a power series.
As in the case of an M/M/1 queue, various quantities of interest can be
computed in the stationary mode of an M/G/1 queue. For example, if Q
denotes the system length left behind by a departing customer, then

E(Q) = Π'(1) = ρ + λ² E(Y²) / (2(1 - ρ)).

(See also Exercise 8.9.)


Let W denote the total time spent in the queueing system by a customer.
Then

π(j) = \int_0^\infty P(N(W) = j | W = t) \, dF_W(t),

where F_W denotes the distribution function of W. (Indeed, π(j) = P(Q = j),
where these j customers arrived during the time W spent by a departing
customer.) Thus,

π(j) = \int_0^\infty e^{-\lambda t} (\lambda t)^j / j! \, dF_W(t)

and hence

E(Q) = Σ_{j=0}^∞ j π(j) = λ E(W).

It follows that

E(W) = (1/λ) E(Q) = 1/μ + λ E(Y²) / (2(1 - ρ)).

Note that, if W* denotes the waiting time of a customer, then

E(W*) = E(W) - E(Y) = E(W) - 1/μ = λ E(Y²) / (2(1 - ρ)).
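The formulas above are easy to evaluate numerically. The sketch below
(not from the book) computes E(Q), E(W), and E(W*) for illustrative rates
λ = 0.8 and μ = 1 with an exponential service time, an arbitrary choice of
H.

```python
# Hypothetical illustration of the stationary M/G/1 quantities derived above.
lam, mu = 0.8, 1.0                      # arbitrary illustrative rates
EY, EY2 = 1.0 / mu, 2.0 / mu ** 2       # first two moments of an Exp(mu) service
rho = lam * EY                          # traffic intensity

EQ = rho + lam ** 2 * EY2 / (2.0 * (1.0 - rho))   # E(Q)
EW = EQ / lam                                     # E(W) = E(Q)/lam
EWstar = lam * EY2 / (2.0 * (1.0 - rho))          # E(W*) = E(W) - E(Y)

print(EQ, EW, EWstar)                   # note EW - EWstar == EY == 1/mu
```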
Remark. The analysis of a G/M/1 queue can be carried out in a similar
manner. In such a queue, the process (Q(t), t ≥ 0) is regenerative with
regeneration points the instants of arrivals. Specifically, if r_n denotes
the instant at which the nth customer joins the queue, and Q_n = Q(r_n^-)
is the number of customers ahead of him in the queueing system at the
epoch of his arrival, then (Q_n, n ≥ 1) is a Markov chain. The situation is
much more complicated for a G/G/1 queue.

8.5 Exercises
8.1. In an M/M/1 queue with inter-arrival times and service times
exponentially distributed with parameters λ, μ, respectively, let

A(t) = number of arrivals during (0, t],
D(t) = number of departures during (0, t],

and

X(t) = A(t) - D(t),  t ≥ 0.

(i) Determine the distribution of X(t).
(ii) Show that

P(X(t) = -j) = (μ/λ)^j P(X(t) = j).

(iii) Compute the generating function E(z^{X(t)}) of X(t). Show that
P(X(t) < ∞) = 1.
(iv) Show that P(Q(t) < ∞ | Q(0) = i) = 1 for all i.
8.2. Consider the Markov chain (Q(t), t ≥ 0) in an M/M/1 queue with traffic
intensity ρ < 1. Suppose that the distribution of Q(0) is given by

P(Q(0) = j) = (1 - ρ)ρ^j,  j = 0, 1, 2, ...

Show that, for any t > 0, Q(t) has the same distribution as Q(0).

(Hint: P(Q(t) ≤ j | Q(0) = i) = e_{j-i}(t) - ρ^{j+1} e_{-(i+j+2)}(t).)

8.3. Consider an M/M/1 queue in which a customer arrives, on the average,
every 10 minutes, and the mean service time is 6 minutes. In the stationary
state, compute
(i) The probability that at least two customers are waiting for service.
(ii) The probability that a customer who arrives and waits has 5 other
customers in front of him.
8.4. In an M/M/1 queue with λ < μ, suppose

P(Q(0) = j) = (1 - ρ)ρ^j,  j = 0, 1, 2, ...,  ρ = λ/μ.

(i) Compute the variance of Q(t), for t > 0.
(ii) Let W be the total time spent by a customer in the system; compute
E(W) directly (i.e., without using the distribution of W) by using

E(W) = Σ_{j=0}^∞ E(W | A_j) P(A_j),

where A_j denotes the event that, upon arrival, the customer has j
customers in the system in front of him.
(iii) With the notation given in Section 8.2, show that
E(W) = E(W*) + 1/μ. Use Little's formulas to show next that
E(Q) = E(Q*) + λ/μ.
8.5. Consider a queue M/M/s with 1 < s < ∞. Show that the birth and death
rates of the Markov chain (Q(t), t ≥ 0) are

λ_i = λ,  i ≥ 0,

μ_i = iμ if 1 ≤ i ≤ s,  μ_i = sμ if i > s.

8.6. In the stationary mode of the queue M/M/s with 1 < s < ∞,
(i) Compute the expected system-length E(Q).
(ii) What is the probability that a customer must wait for service?
(iii) Compute the expected number of customers in the queue E(Q*).
(iv) Find the distribution of Q*.
(v) Let W* be the waiting time of a customer; find the distribution of
W* and E(W*).
(vi) Compute the expected number of free servers.


8.7. Consider the queue M/M/∞. The conditional distribution of Q(t) given
Q(0) = i is known to be the convolution of a binomial distribution with
parameter (i, p) with a Poisson distribution with parameter qλ/μ, where
p = e^{-μt} and q = 1 - p.
(i) Use this fact to compute

E(Q(t) | Q(0) = i),  Var(Q(t) | Q(0) = i).

(ii) In the stationary mode, compute the expected number of busy servers
and the expected total time spent in the system by a customer.
8.8. In an M/G/1 queue with service time Y having finite mean and variance,
i.e.,

E(Y) = \int_0^\infty t \, dH(t) = 1/μ,
E(Y²) = \int_0^\infty t² \, dH(t) < ∞.

(i) Let C_n be the number of arrivals during the service time Y_n of the
nth customer. Show that E(C_n) = λ/μ = ρ.
(Hint: C_n = N(Y_n).)
(ii) Compute the generating function of C_n:

ψ(z) = Σ_{j=0}^∞ P(C_n = j) z^j.

(iii) Use (ii) to find E(C_n) and Var(C_n).
8.9. Consider the stationary mode of an M/G/1 queue. Let Q denote the
system length left behind by a departing customer. Use Exercise 8.8 and
the relation (8.11) in Section 8.3 to show that

E(Q) = ρ + λ² E(Y²) / (2(1 - ρ)).

(Hint: Take expectations in Q²_{n+1} = (Q_n - δ_n + C_{n+1})².)
Lesson 9

Stationary Processes

In this Lesson we point out specific properties of discrete-time weakly
stationary processes. In particular, we describe the inner correlation of
such a process, investigate the problem of predicting the future of a
weakly stationary process, and study asymptotic theory.

9.1 Autocovariance, Spectral Density, and Partial Autocorrelation
Definition 9.1 A real second order stochastic process (X_t, t ∈ ℤ) is said
to be (weakly) stationary if

E(X_t) does not depend on t   (9.1)

and

Cov(X_s, X_t) is a function of |t - s|.   (9.2)
In the following we will suppose that E(X_t) = 0 and E(X_t²) > 0 unless
otherwise stated. (On the other hand, we always identify two random
variables X and Y if P(X = Y) = 1.)
Note that if (X_t) is a second order, zero mean process defined on the
probability space (Ω, A, P), then X_t belongs to the Hilbert space
L²(Ω, A, P) and Cov(X_s, X_t) = E(X_s X_t) is the scalar product of X_s
and X_t in that space (see Appendix). This geometric interpretation will
be used repeatedly in the current Lesson.
When (X_t) is stationary we define its autocovariance (or autocovariance
function) by

γ_t = E(X_0 X_t),  t ∈ ℤ,   (9.3)

and its autocorrelation by

ρ_t = γ_t / γ_0,  t ∈ ℤ.   (9.4)

(γ_t) satisfies the following properties:

γ_0 = Var(X_t),   γ_{-t} = γ_t,   γ_{t-s} = Cov(X_s, X_t).

The autocovariance function is an important object since it describes all
the inner correlation of (X_t). As an application, we consider the
following problem: determine the linear regression of X_{n+1} with respect
to X_1, ..., X_n, i.e., a random variable of the form
X*_{n+1} = Σ_{i=1}^n a_i X_i that minimizes the quadratic error
E(X_{n+1} - Σ_{i=1}^n a_i X_i)².
The solution exists and is unique since it is the orthogonal projection of
X_{n+1} on the linear space generated by X_1, ..., X_n, denoted by
sp(X_1, ..., X_n) (see Appendix).
Consequently a = (a_1, ..., a_n) is a solution of the system

Σ_{i=1}^n a_i γ_{i-j} = γ_{n+1-j},  j = 1, ..., n,   (9.5)

obtained from the orthogonality relations

E((X*_{n+1} - X_{n+1}) X_j) = 0,  j = 1, ..., n.

Thus a is completely determined by (γ_t).
Note that a is unique if and only if the Toeplitz matrix
(γ_{j-i})_{1≤i,j≤n} is non-singular.
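As a small numerical sketch (not from the book), the system (9.5) is a
Toeplitz linear system and can be solved directly; the autocovariance
γ_t = 0.7^{|t|} below is an arbitrary illustrative choice.

```python
# Hypothetical illustration: solving (9.5) for the best linear predictor.
import numpy as np
from scipy.linalg import solve_toeplitz

n = 5
gamma = lambda t: 0.7 ** abs(t)         # an arbitrary autocovariance

# System: sum_i a_i gamma_{i-j} = gamma_{n+1-j}, j = 1, ..., n.
col = np.array([gamma(k) for k in range(n)])          # gamma_0, ..., gamma_{n-1}
rhs = np.array([gamma(n + 1 - j) for j in range(1, n + 1)])
a = solve_toeplitz(col, rhs)
print(a)                                # coefficients a_1, ..., a_n
```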
Spectral density. Let (X_t, t ∈ ℤ) be a stationary process such that
Σ_t |γ_t| < ∞. We define its spectral density by

f(λ) = (1/2π) Σ_{t∈ℤ} γ_t cos λt,  λ ∈ [-π, π].   (9.6)
The following statement summarizes some important properties of f:

Theorem 9.1 f is an even, positive, continuous function which determines
(γ_t, t ∈ ℤ) via the formula

γ_t = \int_{-π}^{π} cos λt \, f(λ) \, dλ,  t ∈ ℤ.   (9.7)

Proof. Clearly f is even, i.e., f(λ) = f(-λ), and continuous since
|γ_t cos λt| ≤ |γ_t| implies the uniform convergence of the series in
(9.6). On the other hand (9.7) is valid since γ_t is a Fourier coefficient
of f.
It remains to show that f is positive. For this purpose we consider the
so-called periodogram

I_n(λ) = (1/2πn) |Σ_{t=1}^n X_t e^{iλt}|²
       = (1/2πn) Σ_{s,t=1}^n X_s X_t e^{iλ(t-s)}.   (9.8)

Taking expectation on both sides and rearranging the terms we obtain

E(I_n(λ)) = (1/2π) Σ_{|t|≤n-1} (1 - |t|/n) γ_t cos λt.   (9.9)

Now since

|(1 - |t|/n) γ_t cos λt| ≤ |γ_t|,

we may apply the Lebesgue dominated convergence theorem to the counting
measure over ℤ (see Appendix). It follows that

lim_{n→∞} E(I_n(λ)) = f(λ),

and since I_n(λ) ≥ 0 implies E(I_n(λ)) ≥ 0, we have f(λ) ≥ 0 and the proof
of Theorem 9.1 is therefore complete. ◊
More generally, it can be shown that, if (X_t) is any stationary process,
then there exists a bounded measure on [-π, π], say μ, such that

γ_t = \int_{-π}^{π} cos λt \, dμ(λ),  t ∈ ℤ.   (9.10)

μ is unique provided it is supposed to be symmetric (that is,
μ([-b, -a]) = μ([a, b]), -π ≤ a < b ≤ π). It is called the spectral measure
of (X_t).
The associated distribution function

F(λ) = μ([-π, λ]),  -π ≤ λ ≤ π,   (9.11)

is called the spectral distribution function of (X_t).

Note that, if the spectral density f does exist, then

F(λ) = \int_{-π}^{λ} f(u) \, du.   (9.12)

If X_s and X_t remain highly correlated for large |t - s|, then the
spectral density does not exist (see Examples 9.2 and 9.4).
We now give some important examples of stationary processes with their
spectral measures.
Example 9.1 A real second order process (ε_t, t ∈ ℤ) is said to be a white
noise if
(i) E(ε_t) = 0, t ∈ ℤ,
(ii) E(ε_t²) = σ² > 0, t ∈ ℤ, and
(iii) E(ε_s ε_t) = 0, s, t ∈ ℤ, s ≠ t.
If in addition the ε_t's are i.i.d., then (ε_t) is said to be a strong
white noise. A white noise is stationary with autocovariance
γ_t = σ² 1_{t=0} and consequently with spectral density given by

f(λ) = σ²/2π,  λ ∈ [-π, π].   (9.13)

Example 9.2 Consider the degenerate process X_t = X_0, t ∈ ℤ, where
E(X_0) = 0, E(X_0²) = σ². Then μ = σ² δ_{(0)}, where δ_{(0)} denotes the
Dirac measure at 0 (see Appendix).
Example 9.3 Consider the model

X_t = Σ_{j=0}^∞ ρ^j ε_{t-j},  t ∈ ℤ,   (9.14)

where (ε_t) is a white noise and |ρ| < 1. Then (X_t) is stationary with
autocorrelation (ρ^{|t|}) and spectral density

f(λ) = (σ²/2π) |1 - ρ e^{iλ}|^{-2},  λ ∈ [-π, π].   (9.15)

Example 9.4 Consider the process

X_t = Σ_{j=1}^k (A_j cos λ_j t + B_j sin λ_j t),  t ∈ ℤ,   (9.16)

where (A_1, B_1, ..., A_k, B_k) is a finite sequence of orthogonal random
variables such that E(A_j) = E(B_j) = 0, E(A_j²) = E(B_j²) = σ_j²,
j = 1, ..., k, and λ_1, ..., λ_k ∈ (0, π]. Then (X_t) is stationary and

γ_t = Σ_{j=1}^k σ_j² cos λ_j t,  t ∈ ℤ.   (9.17)

The spectral measure of (X_t) is

μ = Σ_{j=1}^k (σ_j²/2) [δ_{(λ_j)} + δ_{(-λ_j)}],   (9.18)

where δ_{(a)} denotes the Dirac measure at a (see Appendix).
The model (9.16) is crucial since it may be proved that every stationary
process can be approximated by processes of that type.
Relation (9.18) shows how the spectral measure points out the predom-
inant amplitudes and frequencies of a stationary process.
We now give a result which allows us to compute the spectral measure
of a stationary process defined by a linear filter.

Theorem 9.2 Let (U_t, t ∈ ℤ) be a stationary process with spectral measure
μ and let (V_t, t ∈ ℤ) be the process defined by the mean square
convergent series

V_t = Σ_{j=0}^∞ c_j U_{t-j},  t ∈ ℤ,   (9.19)

where Σ_j |c_j| < ∞ (linear filter). Then (V_t) is stationary with spectral
measure Γ such that

dΓ(λ) = |Σ_{j=0}^∞ c_j e^{iλj}|² dμ(λ).   (9.20)

Proof. First, using the Cauchy criterion, it is easy to check that the
series in (9.19) converges in mean square.
On the other hand, since convergence in mean square implies convergence in
1-mean, we have

E(V_t) = Σ_j c_j E(U_{t-j}) = 0.

Now the bicontinuity of the scalar product in a Hilbert space entails

E(V_s V_t) = Σ_{j,j'} c_j c_{j'} γ_{(t-s)+(j-j')},

where γ denotes the autocovariance of (U_t).
It remains to prove (9.20). To this aim we first notice that for all n

|Σ_{1≤j,j'≤n} c_j c_{j'} cos λ[(t-s) + (j-j')]| ≤ (Σ_j |c_j|)²,

therefore we may apply the Lebesgue dominated convergence theorem,
obtaining the relation

E(V_s V_t) = \int_{-π}^{π} Σ_{j,j'} c_j c_{j'} cos λ[(t-s) + (j-j')] \, dμ(λ),

hence

E(V_s V_t) = Re \int_{-π}^{π} Σ_{j,j'} c_j c_{j'} e^{iλ[(t-s)+(j-j')]} \, dμ(λ)
           = \int_{-π}^{π} cos λ(t-s) |Σ_j c_j e^{iλj}|² \, dμ(λ)
           = \int_{-π}^{π} cos λ(t-s) \, dΓ(λ).

Finally Γ is the spectral measure of (V_t) since it is symmetric, bounded,
and satisfies (9.10). ◊
Partial autocorrelation. We now define the partial autocorrelation of a
stationary process. This concept is of interest in statistics (see Lesson
14).
First, consider the zero-mean square integrable random variables X, Y,
Z_1, ..., Z_k. The partial correlation coefficient of X and Y with respect
to Z_1, ..., Z_k is defined by

r(X; Z_1, ..., Z_k; Y) = Cov(X - X*, Y - Y*)
                         / [Var(X - X*) Var(Y - Y*)]^{1/2},

where X* and Y* are the orthogonal projections of X and Y on
sp(Z_1, ..., Z_k).
Now the partial autocorrelation of a stationary process (X_t, t ∈ ℤ) is
defined by

r_k = r(X_t; X_{t-1}, ..., X_{t-k+1}; X_{t-k}),  k ≥ 1

(r_1 is simply the correlation coefficient of X_t and X_{t-1}). r_k is thus
the correlation coefficient of the two residuals obtained after regressing
X_t and X_{t-k} on the intermediate variables of the process.
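The definition can be turned into a direct computation. The sketch below
(not from the book) estimates r_k on a simulated path by regressing X_t
and X_{t-k} on the intermediate variables and correlating the residuals;
the AR(1) sample is an arbitrary illustrative choice.

```python
# Hypothetical illustration: sample partial autocorrelation via the two
# regression residuals of the definition above.
import numpy as np

rng = np.random.default_rng(0)
T = 20000
x = np.zeros(T)
for t in range(1, T):                   # an AR(1) path, arbitrary choice
    x[t] = 0.6 * x[t - 1] + rng.standard_normal()

def partial_autocorr(x, k):
    n = len(x) - k
    y0, yk = x[k:], x[:n]               # X_t and X_{t-k}
    Z = np.column_stack([np.ones(n)]
                        + [x[k - j: k - j + n] for j in range(1, k)])
    def resid(y):                       # residual after regressing on Z
        beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
        return y - Z @ beta
    u, v = resid(y0), resid(yk)
    return u @ v / np.sqrt((u @ u) * (v @ v))

print([round(partial_autocorr(x, k), 3) for k in (1, 2, 3)])  # ~0.6, ~0, ~0
```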

9.2 Linear Prediction and the Wold Decomposition

Let (X_t, t ∈ ℤ) be a stationary process and let H_t be the closure in
L²(Ω, A, P) of the linear space generated by (X_s, s ≤ t).
The best linear predictor of X_{t+1} with respect to (X_s, s ≤ t) is the
orthogonal projection X̂_{t+1} of X_{t+1} on H_t. Its prediction error is
defined by

σ² = E(X_{t+1} - X̂_{t+1})².   (9.21)

Using stationarity one may infer that σ² does not depend on t (see
Exercise 9.3).
(X_t) is said to be regular if σ² > 0; otherwise (X_t) is said to be
deterministic.
If (X_t) is deterministic, then (9.21) entails X_{t+1} ∈ H_t (a.s.) and
consequently the past and present of the process determine its future.
If (X_t) is regular one may define a white noise by setting

ε_t = X_t - X̂_t,  t ∈ ℤ.   (9.22)

(ε_t) is called the innovation (process) of (X_t). The following theorem
provides a decomposition of a regular process.

Theorem 9.3 (The Wold decomposition) Let (X_t, t ∈ ℤ) be a stationary
regular process. Then

X_t = Σ_{j=0}^∞ a_j ε_{t-j} + Y_t,  t ∈ ℤ,   (9.23)

where the series converges in mean square and where a_0 = 1, Σ_j a_j² < ∞,
(ε_t) is a white noise, ε_t ⊥ Y_s, s, t ∈ ℤ, and Y_t ∈ ∩_{j≥0} H_{t-j},
t ∈ ℤ.
The sequences (a_j), (ε_t), and (Y_t) are uniquely defined by (9.23) and
the above properties. In particular, (ε_t) is the innovation of (X_t).

Proof. Let us set ε_t = X_t - X̂_t, t ∈ ℤ. Then
(ε_{t-j}/σ : j = 0, 1, 2, ...) is an orthonormal system in L²(Ω, A, P).
Therefore

X_t = Σ_{j=0}^∞ b_j (ε_{t-j}/σ) + Y_t,

where b_j = E(X_t ε_{t-j}/σ), Σ_j b_j² < ∞, and Y_t ⊥ ε_{t-j}, j ≥ 0. We
thus obtain (9.23) with a_j = E(X_t ε_{t-j})/σ², j ≥ 0, and in particular
a_0 = 1 since E(X_t ε_t) = E((X_t - X̂_t) ε_t) = σ².
Now, since Y_t = X_t - Σ_j a_j ε_{t-j} and ε_{t-j} ∈ H_{t-j}, j ≥ 0, we
have Y_t ∈ H_t.
On the other hand let us set Y_t = Ŷ_t + δ_t, where Ŷ_t denotes the
orthogonal projection of Y_t on H_{t-1}. Then δ_t ⊥ H_{t-1} and δ_t ∈ H_t.
But Y_t ⊥ ε_t and Ŷ_t ∈ H_{t-1} ⊥ ε_t imply δ_t ⊥ ε_t. Therefore
δ_t ⊥ X̂_t + ε_t = X_t. Finally we obtain δ_t ⊥ H_{t-1} and δ_t ⊥ X_t,
which imply δ_t ⊥ H_t, and since δ_t ∈ H_t we may assert that δ_t = 0.
Thus we have shown that Y_t = Ŷ_t, or equivalently that Y_t ∈ H_{t-1}.
Similarly it can be proved that Y_t ∈ H_{t-j}, j ≥ 2.
It remains to show the unicity. Let us consider the decomposition

X_t = Σ_{j=0}^∞ a'_j ε'_{t-j} + Y'_t,  t ∈ ℤ,

where (a'_j), (ε'_t), and (Y'_t) satisfy the same conditions as (a_j),
(ε_t), and (Y_t). Then we may write

X_t = ε'_t + ( Σ_{j=1}^∞ a'_j ε'_{t-j} + Y'_t ),

where ε'_t ⊥ H_{t-1} and

Σ_{j=1}^∞ a'_j ε'_{t-j} + Y'_t ∈ H_{t-1}.

Consequently

Σ_{j=1}^∞ a'_j ε'_{t-j} + Y'_t = X̂_t  and  ε'_t = X_t - X̂_t.

The rest of the proof is straightforward. ◊
The process Z_t = X_t - Y_t, t ∈ ℤ, has the Wold decomposition

Z_t = Σ_{j=0}^∞ a_j ε_{t-j},  t ∈ ℤ.   (9.24)

In other words, ε_t = X_t - X̂_t = Z_t - Ẑ_t, where Ẑ_t is the orthogonal
projection of Z_t on the closure of sp(Z_s, s ≤ t) and
a_j = E(Z_t ε_{t-j})/σ², j ≥ 0. (Z_t) is said to be purely
nondeterministic.
A linear process is a process which satisfies (9.24) with Σ_j |a_j| < ∞.
Various kinds of linear processes will be considered in Lesson 10.

9.3 Limit Theorems for Stationary Processes

The laws of large numbers and the central limit theorem (see Lesson 1) are
valid for stationary processes as soon as |γ_t| is small enough for large
t. In the following we set

X̄_n = Σ_{t=1}^n X_t / n,  n ≥ 1.
Theorem 9.4 (Weak law of large numbers) Let (X_t, t ∈ ℤ) be a stationary
process such that E(X_t) = m and with autocovariance (γ_t). Then

X̄_n →_{L²} m  ⟺  (1/n) Σ_{|t|≤n-1} (1 - |t|/n) γ_t → 0   (9.25)

as n → ∞. In particular if Σ_t |γ_t| < ∞, then we have

n E(X̄_n - m)² → 2π f(0),   (9.26)

where f denotes the spectral density of (X_t).
Proof. Noting that E(X̄_n - m)² = Var(X̄_n), we may write

E(X̄_n - m)² = (1/n²) Σ_{1≤s,t≤n} Cov(X_s, X_t)
            = (1/n²) Σ_{1≤s,t≤n} γ_{t-s}
            = (1/n) Σ_{|t|≤n-1} (1 - |t|/n) γ_t,   (9.27)

hence (9.25).
In order to prove (9.26), we first note that |(1 - |t|/n) γ_t| ≤ |γ_t| and
then apply the dominated convergence theorem to the counting measure over
ℤ (see Appendix), obtaining

Σ_{|t|≤n-1} (1 - |t|/n) γ_t → Σ_{t∈ℤ} γ_t

as n → ∞. Now (9.6) shows that Σ_{t∈ℤ} γ_t = 2π f(0), therefore (9.26)
follows from (9.27). ◊
If (X_t) satisfies the condition in (9.23) and is strictly stationary,
then it can be proved that X̄_n →^{a.s.} m (strong law of large numbers).
We now state without proof a central limit theorem:

Theorem 9.5 If

X_t = m + Σ_{j=0}^∞ a_j ε_{t-j},  t ∈ ℤ,

where m ∈ ℝ, a_0 = 1, Σ_j |a_j| < ∞, and (ε_t) is a strong white noise,
then

√n (X̄_n - m) →^D N,   (9.28)

where N denotes a random variable with distribution N(0, Σ_{t∈ℤ} γ_t).
Note that the variance of N may be written in the alternative form
σ² (Σ_j a_j)².
9.4 Stationary Processes in Continuous Time
In this section we give some indications about second order
continuous-time processes and weakly stationary continuous-time processes.
Second order calculus. Consider a second order process (X_t, t ∈ I), where
I is an interval of reals. It may be considered as a random function by
defining the transformation which associates to every ω ∈ Ω the function
t ↦ X_t(ω), called the sample function corresponding to ω.
Then it is natural to construct a theory in which it is possible to talk
about continuity, differentiation, and integration of the process. We
develop these concepts in the L² sense. In this section, we suppose that
(t, ω) → X_t(ω) is B(I) ⊗ A - B(ℝ) measurable (see Appendix).

Definition 9.2 (X_t, t ∈ I) is said to be L²-continuous at t ∈ I if
X_{t+h} →_{L²} X_t as h → 0, and L²-differentiable at t ∈ I if
(X_{t+h} - X_t)/h converges in mean square to a limit X'_t as h → 0.

The following theorem provides a criterion for L²-continuity.

Theorem 9.6 Let (X_t, t ∈ I) be a second order process with covariance
C(s, t) = Cov(X_s, X_t), s, t ∈ I. Then the following statements are
equivalent:
(a) (X_t) is L²-continuous over I.
(b) C is continuous over the diagonal of I × I.
(c) C is continuous over I × I.

Proof. (a) implies (c) since X_s →_{L²} X_t and X_{s'} →_{L²} X_{t'}
entail Cov(X_s, X_{s'}) → Cov(X_t, X_{t'}) by bicontinuity of the scalar
product in a Hilbert space.
Obviously (c) implies (b). Finally (b) implies (a) since s → t entails

E(X_s - X_t)² = E(X_s)² - 2E(X_s X_t) + E(X_t)²
             = C(s, s) - 2C(s, t) + C(t, t) → 0. ◊
We now state a criterion for L²-differentiability.

Theorem 9.7 Let (X_t, t ∈ I) be a zero mean second order process. The
following conditions are equivalent:
(a) (X_t) is L²-differentiable at t_0.
(b) E[((X_{t_0+h} - X_{t_0})/h)((X_{t_0+k} - X_{t_0})/k)] has a limit ℓ
as (h, k) → (0, 0).

Proof. (a) implies (b) by bicontinuity of the scalar product; moreover
ℓ = E(X'_{t_0})².
In order to prove that (b) implies (a), let us consider the auxiliary
variable Y_h = (X_{t_0+h} - X_{t_0})/h. Then E(Y_h Y_k) → ℓ, therefore

E(Y_h - Y_k)² = E(Y_h)² - 2E(Y_h Y_k) + E(Y_k)² → ℓ - 2ℓ + ℓ = 0

as (h, k) → (0, 0), and the Cauchy criterion entails (a). ◊
We now turn to integration. Let (X_t, a ≤ t ≤ b) be a zero mean second
order process. In order to define its L²-integral on [a, b], let us
consider the Riemann sums

I_n = Σ_{i=1}^{k_n} X_{s_{n,i}} (t_{n,i} - t_{n,i-1}),

where a = t_{n,0} < t_{n,1} < ... < t_{n,k_n} = b and
s_{n,i} ∈ [t_{n,i-1}, t_{n,i}]. If I_n →_{L²} I as n → ∞ and
sup_i (t_{n,i} - t_{n,i-1}) → 0, then (X_t) is said to be L²-integrable on
[a, b] with integral

I = \int_a^b X_t \, dt.   (9.29)

It can be proved that I does not depend on the sequence of partitions
(t_{n,i}) and on the choice of (s_{n,i}).
Note that I is a square integrable random variable and
E(I) = lim_n E(I_n) = 0. If (X_t) is not zero mean, its L²-integral is
defined by \int_a^b (X_t - E(X_t)) dt + \int_a^b E(X_t) dt, provided
t → E(X_t) is integrable over [a, b].
We have the following basic criterion for L²-integrability.

Theorem 9.8 (X_t) is L²-integrable on [a, b] if and only if its covariance
C is Riemann integrable on [a, b] × [a, b].

Proof. Let (I_n) and (J_m) be Riemann sums associated with two sequences
of partitions (t_{n,i}) and (τ_{m,j}). (X_t) is L²-integrable with
integral I if and only if I_n →_{L²} I and J_m →_{L²} I. These conditions
are equivalent to E(I_n - J_m)² → 0 and E(I_n J_m) → ℓ, and therefore
equivalent to

Σ_{i=1}^{k_n} Σ_{j=1}^{k_m} C(s_{n,i}, s_{m,j})(t_{n,i} - t_{n,i-1})(τ_{m,j} - τ_{m,j-1}) → ℓ.

This last condition means that C is Riemann integrable on
[a, b] × [a, b]. ◊
The following property of the integral is useful.

Theorem 9.9 If C is continuous on the diagonal of [a, b] × [a, b], and f
and g are two continuous functions on [a, b], then

E[ \int_a^b f(t) X_t \, dt \int_a^b g(t) X_t \, dt ]
  = \iint_{[a,b]²} f(s) C(s, t) g(t) \, ds \, dt.   (9.30)

In particular,

E( \int_a^b X_t \, dt )² = \iint_{[a,b]²} C(s, t) \, ds \, dt   (9.31)

and (9.30) remains valid if C is only assumed to be integrable on
[a, b] × [a, b].

The proof is similar to the previous ones and is therefore omitted.


Applications. Consider the following input-output scheme:

signal → system → response

Some examples are:
particle emission → registration by a counter,
arrival of customers → duration of service,
dollar's currency → foreign trade.

Let h(t, s) be the response at time t to a signal of intensity 1 emitted
at time s. In many systems h has the form

h(t, s) = g(t - s) if s ≤ t,  h(t, s) = 0 if s > t.

If the signal intensity (X_t) is supposed to be random and if the system
starts at time 0, the response Y_t at time t is given by

Y_t = \int_0^t g(t - s) X_s \, ds,  t ≥ 0,   (9.32)

where the integral is taken in the L²-sense.

Weakly stationary processes. The autocovariance of a stationary
continuous-time process is defined by

γ_t = Cov(X_0, X_t),  t ∈ ℝ.   (9.33)

If (γ_t) is integrable on ℝ, then the spectral density of (X_t) is defined
by setting

f(λ) = (1/2π) \int_{-∞}^{∞} γ_t cos λt \, dt,  λ ∈ ℝ.   (9.34)

If f is integrable on ℝ we have the Fourier inversion formula

γ_t = \int_{-∞}^{∞} f(λ) cos λt \, dλ,  t ∈ ℝ.   (9.35)

The following theorem summarizes some results about second order calculus
for stationary processes. The proof is omitted.

Theorem 9.10 (i) (X_t) is L²-continuous if and only if (γ_t) is continuous
at t = 0.
(ii) If (γ_t) is continuous at t = 0 then (X_t) is integrable over every
compact interval.
(iii) (X_t) is L²-differentiable if and only if (γ_t) is twice
differentiable at t = 0.
Concerning the law of large numbers, we have the following.

Theorem 9.11 Let (X_t, t ∈ ℝ) be a stationary process with mean m and
autocovariance (γ_t). If (γ_t) is continuous at t = 0 and integrable on ℝ
then, as T → ∞,

(1/T) \int_0^T X_t \, dt →_{L²} m.   (9.36)

Proof. The continuity of (γ_t) on ℝ entails the integrability of the
covariance of (X_t) on [0, T] × [0, T] and, by Theorem 9.8, the
L²-integrability of (X_t) on [0, T].
Now using Theorem 9.9 we obtain

Var( (1/T) \int_0^T X_t \, dt )
  = (1/T²) \iint_{[0,T]²} γ(t - s) \, ds \, dt
  = (2/T²) \iint_{0≤s≤t≤T} γ(t - s) \, ds \, dt
  = (2/T²) \int_0^T (T - u) γ(u) \, du
  = (1/T) \int_{-T}^{T} (1 - |u|/T) γ(u) \, du,

therefore

Var( (1/T) \int_0^T X_t \, dt ) ≤ (1/T) \int_{-∞}^{∞} |γ(u)| \, du,

which tends to zero as T tends to infinity, and since

Var( (1/T) \int_0^T X_t \, dt ) = E( (1/T) \int_0^T X_t \, dt - m )²,

the proof of Theorem 9.11 is now complete. ◊

9.5 Exercises
9.1. Prove (9.15), (9.17), and (9.18).
9.2. (i) Give a detailed proof of Theorem 9.1.
(ii) Give a proof of Theorem 9.2 without using the Lebesgue theorem but
with the additional assumption Σ_t |γ_t| < ∞.
9.3. Let (X_t, t ∈ ℤ) be a zero mean stationary process.
(i) Show that X̂_t^{(p)} →_{L²} X̂_t as p → ∞, where X̂_t^{(p)} denotes
the orthogonal projection of X_t on sp(X_{t-1}, ..., X_{t-p}).
(ii) Use (i) to prove that E(X_t - X̂_t)² does not depend on t.
9.5. Let (Ct, tEll) be a strong white noise. Set X t = a + bt + ct, tEll
and define
1 I:
2k +
Zt= - - " X t+; ,
1 'L..J tEll .
;=-1:
Compute the mean and the covariance of (Zt). Is (Zt) stationary?
9.6. Let (N_t, t ≥ 0) be a Poisson process with intensity λ. Study its
L²-continuity, L²-differentiability, and L²-integrability.
9.7. Show that L²-differentiability does not imply the usual
differentiability of the sample functions. (Hint: Consider the process
X_t(ω) = 1_{{t}}(ω), t ∈ [0, 1], with (Ω, A, P) = ([0, 1], B[0,1], μ),
where μ denotes Lebesgue measure.)
9.8. Prove Theorem 9.9.
9.9. (i) Prove Theorem 9.10.
(ii) Show that if (X_t) is stationary and L²-differentiable, then (X'_t)
has the autocovariance (-γ''(t)).
9.10. (Karhunen-Loève expansion). Let (X_t, a ≤ t ≤ b) be a second order
L²-continuous stochastic process. By Mercer's theorem (see e.g., U.
Grenander, Abstract Inference, Wiley, (1981), pp. 62-64), we have

C(s, t) = Σ_{n=0}^∞ λ_n φ_n(s) φ_n(t),  a ≤ s, t ≤ b,

where C denotes the covariance of (X_t), (φ_n) denotes an orthonormal
sequence in L²([a, b]), and (λ_n) a sequence of real numbers such that

\int_a^b C(s, t) φ_n(t) \, dt = λ_n φ_n(s),  a ≤ s ≤ b,  n ≥ 0.

Furthermore the series converges uniformly and every φ_n is continuous.
(i) Show that (X_t, a ≤ t ≤ b) is L²-integrable on [a, b] and define
ξ_n = \int_a^b X_t φ_n(t) dt, n ≥ 0.
(ii) Show that E(ξ_n ξ_m) = λ_n δ_{n,m}.
(iii) Show that E(X_t ξ_n) = λ_n φ_n(t).
(iv) Find an expression for E(X_t - Σ_{k=0}^n ξ_k φ_k(t))² and prove that

X_t = Σ_{n=0}^∞ ξ_n φ_n(t),  a ≤ t ≤ b,

where the series converges in mean square (K.-L. expansion).


Lesson 10

ARMA model

In this Lesson we introduce the popular autoregressive/moving average
(ARMA) model and study its probabilistic properties. Statistics for ARMA
processes will appear in Lesson 14.

10.1 Linear Processes

(X_t, t ∈ ℤ) is a linear process if for all t the random variable X_t can
be represented as a linear combination of shocks which take place at times
t, t-1, t-2, .... More precisely we have the following

Definition 10.1 (X_t, t ∈ ℤ) is said to be a linear process if for every
t,

X_t = Σ_{j=0}^∞ a_j ε_{t-j}   (10.1)

with a_0 = 1, Σ_j |a_j| < ∞, and where (ε_t, t ∈ ℤ) is a white noise.

Remark. Note that the series in (10.1) converges in mean square and with
probability 1 (Exercise 10.1). Note also that if one replaces the
condition Σ_j |a_j| < ∞ by Σ_j a_j² < ∞, then the series remains
convergent in mean square; however, the process may be affected by long
memory dependence.
Properties. A linear process is zero mean weakly stationary with
autocovariance

γ_t = σ² Σ_{j=0}^∞ a_j a_{j+|t|},  t ∈ ℤ,   (10.2)
where σ² = E(ε_t)². Furthermore ε_s ⊥ X_t if s > t (Exercise 10.2).
Finally, Theorem 9.2 implies that (X_t, t ∈ ℤ) has a spectral density
given by

f(λ) = (σ²/2π) |Σ_{j=0}^∞ a_j e^{iλj}|²,  λ ∈ [-π, π].   (10.3)

Operator B. We now define the backward shift operator B by

B(X_t) = X_{t-1}

or simply B X_t = X_{t-1}. Powers of B are defined by

B^j X_t = X_{t-j},  j = 0, 1, 2, ...,

in particular B^0 = I (the identity).
Using these notations one may write (10.1) in the symbolic form

X_t = ( Σ_{j=0}^∞ a_j B^j ) ε_t.   (10.4)

Note that Σ_{j=0}^∞ a_j B^j can be defined rigorously as an operator on
the space of weakly stationary processes.
Invertibility. In spite of its appearance, (10.1) is not the Wold
decomposition of (X_t) (see Lesson 9) because the crucial property
"ε_t ∈ H_t" is missing. If this property is satisfied, then the process
(X_t) is said to be invertible and ε_t may be written as

ε_t = Σ_{j=0}^∞ π'_j X_{t-j},  t ∈ ℤ,   (10.5)

where π'_0 = 1 and Σ_j |π'_j| < ∞. Now by setting π_j = -π'_j, j ≥ 1, we
obtain

X_t = Σ_{j=1}^∞ π_j X_{t-j} + ε_t,  t ∈ ℤ.   (10.6)

Consequently the orthogonal projection of X_t on H_{t-1} is

X̂_t = Σ_{j=1}^∞ π_j X_{t-j},  t ∈ ℤ.   (10.7)
Note that (10.5) may be written as

ε_t = ( Σ_{j=0}^∞ π'_j B^j ) X_t,

which means that the operator Σ_{j=0}^∞ a_j B^j is invertible with inverse
Σ_{j=0}^∞ π'_j B^j.

ARMA model. The statistical analysis of a general linear process is
intricate because (X_t) contains an infinity of (unknown) parameters,
namely σ², a_1, a_2, ..., or π_1, π_2, .... In order to obtain more usable
processes one may consider the simpler models

X_t = Σ_{j=0}^q a_j ε_{t-j}   (10.8)

and

X_t = Σ_{j=1}^p π_j X_{t-j} + ε_t   (10.9)

obtained by cutting out (10.1) and (10.6) respectively.
More generally, one may define the so-called ARMA(p, q) model by the
equation

X_t + Σ_{j=1}^p φ_j X_{t-j} = ε_t + Σ_{j=1}^q θ_j ε_{t-j}.   (10.10)

In the following we will study the processes defined by (10.8), (10.9),
and (10.10).

10.2 Autoregressive Processes

Definition 10.2 (X_t, t ∈ ℤ) is said to be an autoregressive process of
order p (AR(p)) if

X_t = Σ_{j=1}^p π_j X_{t-j} + ε_t,  t ∈ ℤ,   (10.11)

with π_p ≠ 0 and where (ε_t, t ∈ ℤ) is a white noise such that ε_t ⊥ X_s,
s, t ∈ ℤ, s < t.
Example 10.1 The simplest example of an autoregressive process is the
AR(1) defined by

X_t = Σ_{j=0}^∞ ρ^j ε_{t-j},  t ∈ ℤ,   (10.12)

where 0 < |ρ| < 1. Thus ε_t ⊥ X_s, s < t, and by considering
X_t - ρX_{t-1} we obtain the equation

X_t = ρ X_{t-1} + ε_t,   (10.13)

which proves that (X_t) is actually an AR(1).

Note that, if |ρ| = 1, (10.13) has no weakly stationary solution
(Exercise 10.4).
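A quick simulation (not from the book) illustrates Example 10.1: the
sample autocorrelation of an AR(1) path generated by (10.13) decays like
ρ^k. The value ρ = 0.5 and the sample size are arbitrary choices.

```python
# Hypothetical illustration: autocorrelation of a simulated AR(1).
import numpy as np

rng = np.random.default_rng(1)
rho, T = 0.5, 100000                    # arbitrary illustrative choices
x = np.zeros(T)
for t in range(1, T):
    x[t] = rho * x[t - 1] + rng.standard_normal()   # equation (10.13)

def autocorr(x, k):
    xc = x - x.mean()
    return (xc[k:] @ xc[: len(xc) - k]) / (xc @ xc)

print([round(autocorr(x, k), 3) for k in range(4)])  # ~1, 0.5, 0.25, 0.125
```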
The following lemma gives the classical condition for existence and
unicity of the process satisfying (10.11).

Lemma 10.1 Equation (10.11) has a weakly stationary solution as soon as
the polynomial P(z) = 1 - Σ_{j=1}^p π_j z^j satisfies P(z) ≠ 0 for all
z ∈ ℂ such that |z| ≤ 1. If such a solution exists, it is unique.

Proof. We first deal with unicity. If (X_t) is a weakly stationary process
satisfying (10.11) and if

X_t = Σ_{j=1}^{p*} π*_j X_{t-j} + ε*_t,  t ∈ ℤ,

with obvious notations and with ε*_t ⊥ X_s, s < t, then

X̂_t = Σ_{j=1}^p π_j X_{t-j} = Σ_{j=1}^{p*} π*_j X_{t-j}.

If π_1 ≠ π*_1, then X_{t-1} ∈ H_{t-2} and by stationarity X_t ∈ H_{t-1},
t ∈ ℤ. Now, since ε_t = X_t - Σ_{j=1}^p π_j X_{t-j}, we have
ε_t ∈ H_{t-1}, which contradicts ε_t ⊥ H_{t-1}. Then step by step, we can
prove that π_j = π*_j and finally that p = p* and ε*_t = ε_t.
We now turn to the existence. 1/P(z) has no pole in a closed disc of
center 0 and radius 1 + h, where h is strictly positive and small enough.
Consequently 1/P(z) has the power series representation

1/P(z) = Σ_{j=0}^∞ a_j z^j,  |z| < 1 + h,   (10.14)

with a_0 = 1 and Σ_j |a_j| < ∞.


Now let us define a linear process by setting

X_t = Σ_{j=0}^∞ a_j ε_{t-j} = ( Σ_{j=0}^∞ a_j B^j ) ε_t;   (10.15)

it is easy to check that

P(B) X_t = P(B) ( Σ_{j=0}^∞ a_j B^j ) ε_t = I ε_t = ε_t.   (10.16)

Thus (X_t) is a solution of (10.11) and the proof is therefore
completed. ◊
In the following we will suppose that P(z) ≠ 0 if |z| ≤ 1. Note that this
implies the representation

P(z) = Π_{i=1}^p (1 - g_i z)   (10.17)

with |g_i| < 1, i = 1, 2, ..., p.


Autocovariance.

Theorem 10.1 (Yule-Walker equations) If (X_t) is an AR(p), then its
autocovariance (γ_t) satisfies the system

γ_k = Σ_{j=1}^p π_j γ_{k-j},  k = 1, 2, ...,   (10.18)

Σ_{j=1}^p π_j γ_j + σ² = γ_0.   (10.19)

Proof. If k ≥ 1, we have

0 = E(ε_t X_{t-k}) = E[ (X_t - Σ_{j=1}^p π_j X_{t-j}) X_{t-k} ]
  = γ_k - Σ_{j=1}^p π_j γ_{k-j},

proving (10.18). On the other hand,

E(ε_t X_t) = E[(X_t - X̂_t) X_t] = E(X_t - X̂_t)² = σ²

and

E(ε_t X_t) = E[ (X_t - Σ_{j=1}^p π_j X_{t-j}) X_t ]
           = γ_0 - Σ_{j=1}^p π_j γ_j,

hence (10.19) follows. ◊
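The Yule-Walker system lends itself to estimation: replacing γ_k by
empirical autocovariances in (10.18) and solving for the π_j recovers the
AR coefficients. The sketch below (not from the book) does this for an
AR(2) with arbitrarily chosen coefficients.

```python
# Hypothetical illustration: Yule-Walker estimation of an AR(2).
import numpy as np

rng = np.random.default_rng(2)
T, pi1, pi2 = 200000, 0.5, -0.3         # arbitrary illustrative choices
x = np.zeros(T)
for t in range(2, T):
    x[t] = pi1 * x[t - 1] + pi2 * x[t - 2] + rng.standard_normal()

def gamma_hat(x, k):
    xc = x - x.mean()
    return (xc[k:] @ xc[: len(xc) - k]) / len(xc)

g = [gamma_hat(x, k) for k in range(3)]
# (10.18) for k = 1, 2:  gamma_k = pi_1 gamma_{k-1} + pi_2 gamma_{k-2}
G = np.array([[g[0], g[1]],
              [g[1], g[0]]])
print(np.linalg.solve(G, [g[1], g[2]]))   # ~ (0.5, -0.3)
```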


The Yule-Walker equations are useful in statistics (see Lesson 14). On
the other hand, (10.18) allows to specify the asymptotic correlation between
X. and X t as It - 81 tends to infinity. Let us define the autocorrelation
of (Xt ) by putting
"Yle
Pie =-, k = 0,1,2,···.
"Yo
Fom (10.18) we obtain the difference equation

P(B)PIe = O. (10.20)

Now using (10.17), we see that, for every i, g:,


k ~ 0 is a particular so-
lution of (10.20). Using a method similar to that for homogeneous linear
differential equations we obtain the solution of (10.20):
P
Pie = I:cigf, (10.21)
i=l

which implies that Pie tends to zero at an exponential rate as k tends to


infinity.
Spectral density. Since an AR(p) is a linear process it possesses a
spectral density given by (10.3). Using (10.11) and Theorem 9.2, we obtain
the alternative form

f(λ) = (σ²/2π) · 1/|P(e^{iλ})|²,  λ ∈ [-π, π].   (10.22)
Partial autocorrelation. The partial autocorrelation (r_k) of a stationary
process has been defined in Lesson 9. The partial autocorrelation of an
AR(p) has a characteristic property stated in the following theorem.

Theorem 10.2 If (X_t) is an AR(p), then

r_p = π_p,  r_k = 0 for k > p.   (10.23)

Proof. For every k > 1, let Y^{(k)} be the projection of the square
integrable random variable Y on the linear space generated by
X_{t-1}, ..., X_{t-k+1}.
First, let us specify r_p. If p = 1 we clearly have r_p = ρ_p = π_1. If
p > 1, (10.11) implies

X_t^{(p)} = Σ_{j=1}^{p-1} π_j X_{t-j} + π_p X_{t-p}^{(p)} + 0.   (10.24)

Subtracting (10.24) from (10.11), we obtain

X_t - X_t^{(p)} = π_p (X_{t-p} - X_{t-p}^{(p)}) + ε_t.   (10.25)

Now, since ε_t ⊥ (X_{t-p} - X_{t-p}^{(p)}), (10.25) entails

E[ (X_t - X_t^{(p)})(X_{t-p} - X_{t-p}^{(p)}) ]
  = π_p E(X_{t-p} - X_{t-p}^{(p)})²   (10.26)

and by stationarity

E(X_{t-p} - X_{t-p}^{(p)})² = E(X_t - X_t^{(p)})².   (10.27)

Thus (10.26) and (10.27) imply that r_p = π_p.
Second, let us calculate r_k for k > p. Using again (10.11) we obtain

X_t^{(k)} = Σ_{j=1}^p π_j X_{t-j}.

Therefore

E[ (X_t - X_t^{(k)})(X_{t-k} - X_{t-k}^{(k)}) ]
  = E[ ε_t (X_{t-k} - X_{t-k}^{(k)}) ] = 0,

which implies r_k = 0. ◊

10.3 Moving Average Processes

Definition 10.3 (X_t, t ∈ ℤ) is said to be a moving average process of
order q (MA(q)) if

X_t = Σ_{j=0}^q a_j ε_{t-j},  t ∈ ℤ,   (10.28)

where a_0 = 1, a_q ≠ 0, and (ε_t, t ∈ ℤ) is a white noise such that
ε_t ∈ H_t, t ∈ ℤ.
Example 10.2 Let us consider the process

X_t = ε_t + a_1 ε_{t-1},  t ∈ ℤ,   (10.29)

where 0 < |a_1| < 1. Then (X_t) is an MA(1) since

ε_t = X_t - Σ_{j=1}^∞ (-1)^{j+1} a_1^j X_{t-j}   (10.30)

(see Exercise 10.6).

Existence. Here existence reduces to invertibility. Putting
Q(z) = 1 + a_1 z + ... + a_q z^q, we may write (10.28) in the form

X_t = Q(B) ε_t.   (10.31)

Similarly as in the AR(p) case, the invertibility of Q(B) takes place as
soon as Q(z) ≠ 0 for |z| ≤ 1. In that case we have

1/Q(z) = 1 - Σ_{j=1}^∞ π_j z^j,  |z| < 1 + h,

with h small enough and Σ_j |π_j| < ∞, and consequently

ε_t = X_t - Σ_{j=1}^∞ π_j X_{t-j}   (10.32)

or equivalently

ε_t = (Q(B))^{-1} X_t.   (10.33)
We now state without proof some properties of MA processes.
The autocovariance is given by

γ_k = σ² Σ_{j=0}^{q-k} a_j a_{j+k} for 0 ≤ k ≤ q,  γ_k = 0 for k > q.

The partial autocorrelation (r_k) is, in general, difficult to compute.
It tends to zero at an exponential rate as k tends to infinity. If (X_t)
is an MA(1) process, we have

r_k = (-1)^{k+1} a_1^k (1 - a_1²) / (1 - a_1^{2(k+1)}).   (10.34)

Using (10.3) we obtain the spectral density

f(λ) = (σ²/2π) |Q(e^{iλ})|²,  λ ∈ [-π, π].   (10.35)
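As a sanity check (not from the book), note that at k = 1 the formula
(10.34) reduces to r_1 = a_1/(1 + a_1²), which is also the ordinary lag-1
correlation of an MA(1); the sketch below compares it with a sample
estimate for an arbitrary a_1.

```python
# Hypothetical illustration: (10.34) at k = 1 versus a sample estimate.
import numpy as np

rng = np.random.default_rng(3)
a1, T = 0.6, 200000                     # arbitrary illustrative choices
eps = rng.standard_normal(T + 1)
x = eps[1:] + a1 * eps[:-1]             # the MA(1) of (10.29)

r1_exact = a1 * (1 - a1 ** 2) / (1 - a1 ** 4)   # = a1 / (1 + a1^2)
xc = x - x.mean()
r1_sample = (xc[1:] @ xc[:-1]) / (xc @ xc)
print(r1_exact, r1_sample)              # both ~ 0.441
```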
10.4 ARMA Processes

Definition 10.4 (X_t, t ∈ ℤ) is said to be a (mixed) autoregressive /
moving average process of order (p, q) (ARMA(p, q)) if

Φ(B) X_t = Θ(B) ε_t,   (10.36)

where

Φ(B) = I - Σ_{j=1}^p φ_j B^j  (φ_p ≠ 0),

Θ(B) = I - Σ_{j=1}^q θ_j B^j  (θ_q ≠ 0),

with Φ(z)Θ(z) ≠ 0 if |z| ≤ 1, and where (ε_t) is a white noise.

The condition Φ(z)Θ(z) ≠ 0 if |z| ≤ 1 implies the invertibility of Φ(B)
and Θ(B). Therefore

X_t = (Φ(B))^{-1} Θ(B) ε_t   (10.37)

and

ε_t = (Θ(B))^{-1} Φ(B) X_t.   (10.38)

(10.37) means that (X_t) is a linear process and (10.38) that (ε_t) is the
innovation process of (X_t).
Properties. Using (10.36) we obtain for every k > 0

γ_k = E(X_t X_{t-k})
    = Σ_{j=1}^p φ_j γ_{k-j} + Σ_{j=1}^q θ_j E(ε_{t-j} X_{t-k}),

hence

γ_k = Σ_{j=1}^p φ_j γ_{k-j},  k > q,   (10.39)

which is a Yule-Walker equation (Theorem 10.1). Consequently (γ_k) behaves
like the autocovariance of an AR process: it tends to zero at an
exponential rate as k goes to infinity.
Applying Theorem 9.2 on both sides of (10.37), one obtains the following
spectral density

f(λ) = (σ²/2π) |Θ(e^{iλ}) / Φ(e^{iλ})|²,  λ ∈ [-π, π].   (10.40)

Noting that an ARMA(p, q) can be approximated by MA(q') processes (see
(10.38)), it is possible to show that the partial autocorrelation of an
ARMA(p, q) tends to zero at an exponential rate.
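The sketch below (not from the book) evaluates the spectral density
(10.40) of an ARMA(1,1) on a grid; the coefficients are arbitrary
illustrative choices.

```python
# Hypothetical illustration: the ARMA(1,1) spectral density (10.40).
import numpy as np

phi1, theta1, sigma2 = 0.5, 0.3, 1.0    # arbitrary illustrative choices
lam = np.linspace(-np.pi, np.pi, 9)
z = np.exp(1j * lam)
Phi = 1 - phi1 * z                      # Phi(e^{i lambda})
Theta = 1 - theta1 * z                  # Theta(e^{i lambda})
f = sigma2 / (2 * np.pi) * np.abs(Theta / Phi) ** 2
print(np.round(f, 4))                   # an even, positive function
```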
10.5 Nonstationary Models and Exogenous Variables

In practice, processes often have a nonstationary part which can be a
trend or (and) a seasonality. They may also be influenced by other
processes.
In this section, we give some indications about models which take these
facts into account.
ARIMA processes. Let us consider the random walk

X_t = ε_1 + ε_2 + ... + ε_t,  t ≥ 1,   (10.41)

where as usual (ε_t) denotes a white noise. Clearly (X_t) is not
stationary; however X_t - X_{t-1} = ε_t is stationary and it can be
considered as an ARMA(0, 0) process. We will say that (X_t) is an
autoregressive integrated moving average (ARIMA) process.
More generally an ARIMA(p, q, d) process satisfies the equation

Φ(B)(I - B)^d X_t = Θ(B) ε_t,   (10.42)

where Φ and Θ have the same properties as in Definition 10.4 and d is an
integer which characterizes the trend.
Note that Φ(B)(I - B)^d is not invertible. Consequently (X_t) cannot be
defined as a linear process. In fact, in order to define (X_t) precisely,
it is necessary to consider p + d nonrandom initial values
X_{t_0-1}, ..., X_{t_0-p-d}, which allow one to compute X_{t_0} by using
(10.42). When all these values are eliminated, (X_t) takes its "cruising
speed" and then (I - B)^d X_t behaves like an ARMA(p, q).
SARIMA processes. If (X_t) has a trend together with a period S, one may
consider the model

Φ_1(B)(I - B)^d X_t = Θ_1(B) ε_{1,t}   (10.43)

with

Φ_2(B^S)(I - B^S)^D X_t = Θ_2(B^S) ε_t,   (10.44)

where Φ_1, Θ_1, Φ_2, Θ_2 are polynomials of respective degrees p, q, P, Q.
We obtain the so-called SARIMA(p, q, d; P, Q, D)_S process.

Example 10.3 The model SARIMA(0, 1, 1; 0, 1, 1)_{12} is very useful in
econometrics. It may be defined by the equation

(I - B)(I - B^{12}) X_t = (I - θB)(I - ΘB^{12}) ε_t,
0 < θ < 1,  0 < Θ < 1.



ARMAX processes. All the above models are "closed": they explain the
present of (X_t) only by its own past. It would be more realistic to take
into account some exogenous variables. For example, electricity
consumption is linked with temperature.
The ARMAX model is an "open" process defined by

P(B) X_t = R(B) Z_t + Q(B) ε_t,   (10.45)

where P, Q, R are polynomials and (Z_t) is the exogenous process.
More generally one may introduce SARIMAX processes.

10.6 Exercises
10.1. (i) Show that the series in (10.1) converges in mean square. (Hint:
Use the Cauchy criterion.)
(ii) Show that this series converges with probability 1. (Hint: Show that
E(Σ_j |a_j| |ε_{t-j}|) < ∞.)
10.2. Show that a linear process is
(i) zero mean,
(ii) weakly stationary with autocovariance given by (10.2),
(iii) such that ε_s ⊥ X_t if s > t.
10.3. Let (X_t) be a linear process associated with an i.i.d. white noise
(ε_t). Show that (X_t) is strictly stationary.
10.4. Let (X_t) be a process satisfying (10.13).
(i) Verify that

X_t = ε_t + ρ ε_{t-1} + ... + ρ^k ε_{t-k} + ρ^{k+1} X_{t-k-1},  k ≥ 1.

(ii) Show that, if |ρ| = 1, (10.13) has no weakly stationary solution
satisfying ε_s ⊥ X_t, s > t.
10.5. Let (X_t) be an AR(2) process defined by

(1 - g_1 B)(1 - g_2 B) X_t = ε_t

with |g_1| < 1, |g_2| < 1, and g_1 ≠ g_2. Compute γ_k and show that it
tends to zero at an exponential rate.
10.6. Prove (10.30).
10.7. Consider the MA(1) process

X_t = ε_t + a_1 ε_{t-1},  t ∈ ℤ,

where 0 < |a_1| < 1.
(i) Show that the projection X̂_{n+1}^{(n)} of X_{n+1} on the linear space
generated by X_1, X_2, ..., X_n is given by

X̂_{n+1}^{(n)} = Σ_{j=1}^n ψ_j X_{n+1-j},

where ψ_1, ..., ψ_n satisfy the difference equations

a_1 ψ_{j-1} + (1 + a_1²) ψ_j + a_1 ψ_{j+1} = 0,  2 ≤ j ≤ n - 1,

with conditions

(1 + a_1²) ψ_n + a_1 ψ_{n-1} = 0

and

(1 + a_1²) ψ_1 + a_1 ψ_2 = a_1.

(ii) Use (i) to prove (10.34).
10.8. Let (X_t) be the random walk given by (10.41).
(i) Compute the correlation coefficient ρ_{t,h} of X_t and X_{t+h}.
(ii) Determine lim ρ_{t,h} when t/h → ∞.
10.9. Let (η_t) be a strong white noise (i.e., the η_t's are i.i.d.) and
let (r_t, t ∈ ℤ) be a sequence of i.i.d. random variables such that
|r_t| ≤ a < 1, where a is constant. The processes (r_t) and (η_t) are
supposed to be independent.
(i) Show that for every t ∈ ℤ the series
Σ_{j=1}^∞ r_{t-1} ... r_{t-j} η_{t-j} converges in mean square.
(ii) Consider the process

X_t = η_t + Σ_{j=1}^∞ r_{t-1} ... r_{t-j} η_{t-j},  t ∈ ℤ,

and show that

X_t = r_{t-1} X_{t-1} + η_t,  t ∈ ℤ,

and that, for each t, r_t is independent of X_t, X_{t-1}, ....
(iii) Compute E(X_t | X_{t-1}, X_{t-2}, ...) as a function of ρ = E(r_t)
and X_{t-1}, X_{t-2}, ....
(iv) Show that (X_t) is a weakly stationary process and determine its
innovation (ε_t).
10.10. Let (η_t, t ∈ ℤ) be a white noise. Set

X_t = - Σ_{k=1}^∞ ρ^k η_{t+k},  t ∈ ℤ,

where 0 < |ρ| < 1.
(i) Show that (X_t) is stationary and such that

X_t = (1/ρ) X_{t-1} + η_t,  t ∈ ℤ.

(ii) Compute the spectral density of (X_t).
(iii) Compute E(X_t η_{t+1}) and show that (η_t) is not the innovation of
(X_t).
(iv) Set

ε_t = X_t - ρ X_{t-1},  t ∈ ℤ.

Show that (ε_t) is a white noise such that E(ε_t²) < E(η_t²).
(v) Show that (ε_t) is the innovation of (X_t).
Lesson 11

Discrete-Time
Martingales

A martingale is a particular sequence of dependent random variables. The


concept comes from gambling systems but is also of great interest in Prob-
ability and Statistics.

11.1 Generalities

Let (Ω, A, P) be a probability space and let (B_n, n ≥ 1) be an increasing
sequence of sub σ-fields of A, that is, each B_n is a σ-field contained in
A, and for each n, B_n ⊆ B_{n+1}.
We shall say that a sequence (X_n, n ≥ 1) of real random variables is
(B_n)-adapted (or simply B_n-adapted) if, for every n, X_n is
B_n-measurable. For example, if B_n = σ(X_1, ..., X_n), n ≥ 1, then,
clearly, (B_n, n ≥ 1) is an increasing sequence of sub σ-fields of A, and
(X_n) is B_n-adapted. (See Lesson 1 for notation and details.)

Definition 11.1 Let (Y_n, n ≥ 1) be a sequence of positive or integrable
real random variables which is B_n-adapted. Then
(i) (Y_n) is said to be a martingale if

E^{B_n}(Y_{n+1}) = Y_n,  n ≥ 1;   (11.1)

(ii) (Y_n) is said to be a submartingale if

E^{B_n}(Y_{n+1}) ≥ Y_n,  n ≥ 1;   (11.2)

(iii) (Y_n) is said to be a supermartingale if

E^{B_n}(Y_{n+1}) ≤ Y_n,  n ≥ 1.   (11.3)

In the above definitions, E^{B_n} denotes the conditional expectation with
respect to B_n, and the symbol "a.s." is omitted in (11.1) - (11.3). The
same conventions will be used in the following.
Note that (11.1) implies that (Y_n) is B_n-adapted.
We now give some elementary properties of martingales, submartingales,
and supermartingales.

Lemma 11.1 (i) (Y_n) is a martingale if and only if it is B_n-adapted and

\int_B Y_{n+1} \, dP = \int_B Y_n \, dP,  B ∈ B_n,  n ≥ 1.   (11.4)

(ii) The sequence (E(Y_n)) is
- constant if (Y_n) is a martingale,
- increasing if (Y_n) is a submartingale, and
- decreasing if (Y_n) is a supermartingale.
(iii) (Y_n) is a martingale if and only if

E^{B_n}(Y_{n+p}) = Y_n,  n ≥ 1,  p ≥ 0.   (11.5)

Proof.
(i) Since Y_n is B_n-measurable, (11.4) means that Y_n = E^{B_n}(Y_{n+1}),
by definition of conditional expectation (see Lesson 1).
(ii) (11.1) implies

E[E^{B_n}(Y_{n+1})] = E(Y_{n+1}) = E(Y_n)

and (11.2) implies

E[E^{B_n}(Y_{n+1})] = E(Y_{n+1}) ≥ E(Y_n).

For a supermartingale, the proof is similar.
(iii) If (Y_n) is a martingale, we have for every p ≥ 1,

E^{B_n}(Y_{n+p}) = E^{B_n}(E^{B_{n+p-1}}(Y_{n+p})) = E^{B_n}(Y_{n+p-1}).

It follows that

E^{B_n}(Y_{n+p}) = E^{B_n}(Y_{n+p-2}) = ... = E^{B_n}(Y_{n+1}) = Y_n.

The converse is obvious since E^{B_n}(Y_n) = Y_n implies the
B_n-measurability of Y_n. ◊
Analogues of (11.4) and (11.5) can be established for submartingales
and supermartingales (see Exercise 11.1).

Lemma 11.2 (i) If Y_n ∈ L¹(B_n), n ≥ 1, then it is a martingale if and
only if

Y_n = c + X_1 + ... + X_n,  n ≥ 1,   (11.6)

where X_k ∈ L¹(B_k), k ≥ 1, E^{B_{k-1}}(X_k) = 0, k ≥ 1 (with
B_0 = {∅, Ω}), and c is a constant.
(ii) If Y_n ∈ L²(B_n), n ≥ 1, then

E(X_i X_j) = 0,  1 ≤ i ≠ j ≤ n.   (11.7)

Proof. (i) If (Y_n) is a martingale, we set

c = E(Y_1),  X_1 = Y_1 - E(Y_1),  X_n = Y_n - Y_{n-1},  n ≥ 2,

hence (11.6) follows. Also X_k ∈ L¹(B_k), k ≥ 1, and

E^{B_{k-1}}(X_k) = E^{B_{k-1}}(Y_k - Y_{k-1}) = Y_{k-1} - Y_{k-1} = 0.

Conversely if (Y_n) is defined by (11.6), then we have

E^{B_{n-1}}(Y_n) = c + X_1 + ... + X_{n-1} + E^{B_{n-1}}(X_n) = Y_{n-1}.

(ii) If Y_n ∈ L²(B_n), then (11.6) holds and if i < j then

E(X_i X_j) = E(E^{B_i}(X_i X_j)) = E(X_i E^{B_i}(X_j))

and using the same method as in the proof of Lemma 11.1(iii), we obtain
E^{B_i}(X_j) = 0, hence (11.7). ◊
(11.6) is valid for a submartingale (or supermartingale) except that
E^{B_{k-1}}(X_k) = 0 is replaced by E^{B_{k-1}}(X_k) ≥ 0 (or
E^{B_{k-1}}(X_k) ≤ 0). See Exercise 11.2.

11.2 Examples and Applications

a) Gambler's Fortune.
Suppose that in a certain game of chance a gambler wins or loses X_n at
the nth play; then his fortune after n plays is

Y_n = c + X_1 + ... + X_n,

where c is the gambler's fortune at the beginning of the game.
The game is said to be fair if E^{B_{n-1}}(X_n) = 0, favorable if
E^{B_{n-1}}(X_n) ≥ 0, and unfavorable if E^{B_{n-1}}(X_n) ≤ 0. Here
B_n = σ(X_1, X_2, ..., X_n), n ≥ 1, B_0 = {∅, Ω}, and X_n is integrable.
The condition (11.6) shows that (Y_n) is a martingale if the game is fair,
and Exercise 11.2 shows that (Y_n) is a submartingale (or supermartingale)
if the game is favorable (or unfavorable).
In this context, the interpretation of Lemma 11.1(ii) is obvious. Note
however that these properties of (E(Y_n)) cannot be taken as definitions
of fair, favorable, and unfavorable games since they do not take into
account a possible gambler's strategy which could change the nature of
the game.
b) Gambler's Strategy.
Consider again a certain game of chance. A strategy is a sequence
(c_n, n ≥ 1) of random variables such that:
(i) c_n = 1 if and only if the gambler decides to play at the nth game.
(ii) c_n = 0 if and only if the gambler decides to miss the nth game.
(iii) c_n is B_{n-1}-measurable: the gambler's decision depends only on
X_1, ..., X_{n-1}.
Then the new sequence of gambler's fortunes is

Z_n = c + c_1 X_1 + ... + c_n X_n,  n ≥ 1.   (11.8)
Now if (Y_n) is a martingale, then we have

E^{B_{n-1}}(Z_n) = c + c_1 X_1 + ... + c_{n-1} X_{n-1} + E^{B_{n-1}}(c_n X_n)
                 = Z_{n-1} + c_n E^{B_{n-1}}(X_n) = Z_{n-1}   (11.9)

and consequently (Z_n) is also a martingale.
If the game is unfavorable, (11.9) remains valid but since
E^{B_{n-1}}(X_n) ≤ 0, it follows that

E^{B_{n-1}}(Z_n) ≤ Z_{n-1}

and unfortunately the game is again unfavorable! Similarly if (Y_n) is a
submartingale, then (Z_n) is also a submartingale.
A stopping time is a particular strategy defined by a random variable τ
which takes its values in ℕ* ∪ {+∞} and satisfies

{τ = n} ∈ B_n,  n ≥ 1.

The associated strategy is

c_n = 1_{{τ ≥ n}},  n ≥ 1,
and is executed in the following way: the gambler plays all the games
until the τth and then stops definitively.
Thus the successive fortunes are

Z_n = c + Σ_{i=1}^{min(n,τ)} X_i,  n ≥ 1,   (11.10)

and, of course, (Y_n) and (Z_n) have the same nature.
c) Random walk and the ruin problem.
Let (X_n, n ≥ 1) be a sequence of i.i.d. random variables with a common
distribution given by

P(X_n = 1) = p,  P(X_n = -1) = 1 - p,  n ≥ 1,

where 0 < p < 1. Then

S_n = X_1 + ... + X_n,  n ≥ 1,

is a random walk and obviously:
if p < 1/2, then (S_n) is a supermartingale,
if p = 1/2, then (S_n) is a martingale, and
if p > 1/2, then (S_n) is a submartingale;
here B_n = σ(X_1, ..., X_n), n ≥ 1.
Now using a suitable transformation, we can construct a function of S_n
which is a martingale whatever p. Let us actually define

Y_n = ((1 - p)/p)^{S_n},  n ≥ 1;   (11.11)

then

E^{B_n}(Y_{n+1}) = E^{B_n}[ ((1 - p)/p)^{S_n + X_{n+1}} ]
                 = ((1 - p)/p)^{S_n} E^{B_n}[ ((1 - p)/p)^{X_{n+1}} ].

Since (X_1, ..., X_n) and X_{n+1} are independent, we have

E^{B_n}[ ((1 - p)/p)^{X_{n+1}} ] = E[ ((1 - p)/p)^{X_{n+1}} ]
  = p ((1 - p)/p) + (1 - p) ((1 - p)/p)^{-1} = 1,

so that

E^{B_n}(Y_{n+1}) = Y_n,  n ≥ 1,

and (Y_n) is a martingale for every p.
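A Monte Carlo check (not from the book): since (Y_n) is a martingale,
E(Y_n) = E(Y_1) = 1 for every n, even for a biased walk; p = 0.7 below is
an arbitrary choice.

```python
# Hypothetical illustration: E(((1-p)/p)^{S_n}) stays equal to 1.
import numpy as np

rng = np.random.default_rng(4)
p, n, reps = 0.7, 20, 200000            # arbitrary illustrative choices
steps = rng.choice([1, -1], size=(reps, n), p=[p, 1 - p])
S = steps.cumsum(axis=1)
Y = ((1 - p) / p) ** S
print(Y.mean(axis=0)[[0, 4, 9, 19]])    # each entry ~ 1
```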
Now if S_n is the fortune of a gambler who wins or loses one dollar with
probabilities p and 1 - p at each game, the ruin problem is defined by the
stopping time

τ = inf{n : S_n = a or S_n = -b},  a, b ≥ 0,

where {S_τ = -b} corresponds to the ruin of the gambler and {S_τ = a} to
the ruin of his opponent.
Similarly as in b), it can be proved that the perturbation of (Y_n) by τ
does not affect its nature (see Exercise 11.3).
d) Polya's urn model.
An urn contains a red and b black balls. A ball is drawn at random. It is
replaced and c balls of the color drawn are added. A new random drawing is
then made and this procedure is repeated. This model can be used to
describe contagious diseases.
Let Y_n be the proportion of black balls after the nth drawing; then
Y_0 = b/(a + b) and we claim that (Y_n) is a martingale adapted to
B_n = σ(Y_0, Y_1, ..., Y_n), n ≥ 0.
To prove this we set Y_n = α/β, where α is the number of black balls and
β the total number of balls after the nth drawing. Then, whatever (α, β)
is,

E(Y_{n+1} | Y_n = α/β) = ((α + c)/(β + c)) (α/β)
                        + (α/(β + c)) (1 - α/β) = α/β,   (11.12)

and since (Y_n) is clearly a Markov process, we obtain

E^{B_n}(Y_{n+1}) = Y_n.
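A short simulation (not from the book) of Polya's urn shows the martingale
in action: the proportion of black balls fluctuates early on and then
settles, anticipating the convergence results of Section 11.3. Here
a = b = c = 1, an arbitrary choice.

```python
# Hypothetical illustration: one trajectory of the Polya urn proportion Y_n.
import numpy as np

rng = np.random.default_rng(5)
a, b, c, n = 1, 1, 1, 2000              # arbitrary illustrative choices
black, total = b, a + b
path = [black / total]
for _ in range(n):
    if rng.random() < black / total:    # a black ball is drawn
        black += c
    total += c
    path.append(black / total)
print(path[:3], path[-3:])              # early swings, then a settled value
```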

e) Likelihood ratio.
Let (X_n, n ≥ 1) be a sequence of i.i.d. random variables with common
density f or g. X_1, ..., X_n are supposed to be observed and we wish to
determine the true density. For that purpose we construct the likelihood
ratio (see Lesson 13) by setting

Y_n = Π_{i=1}^n g(X_i) / Π_{i=1}^n f(X_i),  n ≥ 1.   (11.13)

This quantity is "small" if f is the true density and "large" otherwise.
We now prove that (Y_n) is a martingale adapted to B_n = σ(X_1, ..., X_n),
n ≥ 1, provided f is the true density.
For convenience we assume that f is strictly positive; then we have

E(Y_{n+1} | X_1, ..., X_n)
  = [ Π_{i=1}^n g(X_i) / Π_{i=1}^n f(X_i) ] \int (g(y)/f(y)) f(y) \, dy
  = Y_n

or equivalently

E^{B_n}(Y_{n+1}) = Y_n,  n ≥ 1,

which is the desired result.

11.3 Convergence of Martingales

In all the examples presented in Section 11.2, the behaviour of (Y_n) for
n large is of great interest. We now study this asymptotic behaviour. In
the following we shall suppose that (Y_n) is integrable.
We establish, first, two technical lemmas.

Lemma 11.3 Let (Y_n) be a martingale and let u be a convex function such
that u(Y_n) is integrable for every n. Then (u(Y_n)) is a submartingale.

Proof. Using Jensen's inequality (see Lesson 1), we obtain

E^{B_n}[u(Y_{n+1})] ≥ u[E^{B_n}(Y_{n+1})] = u(Y_n),  n ≥ 1,

as desired. ◊

Lemma 11.4 (Kolmogorov's inequality) Let (U_n) be a positive
submartingale; then

P( max_{k≤n} U_k > t ) ≤ E(U_n)/t,  t > 0,  n ≥ 1.   (11.14)

Proof. Let

A_j = {U_1 ≤ t, ..., U_{j-1} ≤ t, U_j > t}

and

A = ∪_{j=1}^n A_j = { max_{k≤n} U_k > t }.
Since U_n is positive, we have

U_n ≥ U_n 1_A = Σ_{j=1}^n U_n 1_{A_j},

therefore

E(U_n) ≥ Σ_{j=1}^n E(U_n 1_{A_j}).   (11.15)

Now since A_j ∈ B_j and (U_n) is a submartingale, we have

E^{B_j}(U_n 1_{A_j}) = 1_{A_j} E^{B_j}(U_n) ≥ 1_{A_j} U_j.

Taking expectation on both sides of this inequality, we obtain

E(U_n 1_{A_j}) ≥ E(U_j 1_{A_j}).   (11.16)

But A_j implies {U_j > t} and consequently

E(1_{A_j} U_j) ≥ t P(A_j).   (11.17)

Then using (11.15) - (11.17), we get

E(U_n) ≥ t Σ_{j=1}^n P(A_j) = t P(A),

hence (11.14). ◊


Corollary 11.1 If (Y_n) is a martingale, then

P( max_{k≤n} |Y_k| > t ) ≤ E(|Y_n|)/t,  t > 0,  n ≥ 1.   (11.18)

Proof. Since the function x ↦ |x| is convex, Lemma 11.3 shows that
(|Y_n|) is a submartingale. Thus applying (11.14) to |Y_n|, we obtain
(11.18). ◊
We are now in a position to state the martingale convergence theorem.

Theorem 11.1 (Convergence of martingales) Let (Y_n) be a martingale such
that

sup_{n≥1} E(Y_n²) < ∞;   (11.19)

then there exists a random variable Y such that

Y_n →_{L²} Y  and  Y_n → Y a.s.   (11.20)
Proof.
(i) We first prove L²-convergence by using the Cauchy criterion.
Note that if n ≥ m,

E(Y_n Y_m) = E[E^{B_m}(Y_n Y_m)] = E[Y_m E^{B_m}(Y_n)] = E(Y_m²),

hence

E(Y_n - Y_m)² = E(Y_n²) - E(Y_m²),  n ≥ m.   (11.21)

Now Lemma 11.3 and (11.19) show that (Y_n²) is a submartingale. Thus
E(Y_n²) is increasing (see Lemma 11.1(ii)) and since it is bounded,
E(Y_n²) ↗ ℓ for some finite ℓ.
Using (11.21), we infer that

lim_{n,m→∞} E(Y_n - Y_m)² = ℓ - ℓ = 0,

which proves L²-convergence.
(ii) We now turn to almost sure convergence. Note first that

(Y_k - Y_m,  k = m + 1, m + 2, ...)

is a martingale since

E^{B_k}(Y_{k+1} - Y_m) = Y_k - Y_m,  k > m.

Thus ((Y_k - Y_m)², k > m) is a positive submartingale (see Lemma 11.3)
and, applying Kolmogorov's inequality, we find, for n ≥ m,

P(A_{m,n}(ε)) ≤ E(Y_n - Y_m)²/ε² = [E(Y_n²) - E(Y_m²)]/ε²,   (11.22)

where

A_{m,n}(ε) = ∪_{k=m+1}^n {|Y_k - Y_m| > ε}.

Then letting n → ∞ in (11.22), we obtain

P( ∪_{k=m+1}^∞ {|Y_k - Y_m| > ε} ) ≤ [ℓ - E(Y_m²)]/ε²,  m ≥ 1.   (11.23)

Now letting m → ∞ in (11.23), we have

P( ∩_{m≥1} ∪_{k=m+1}^∞ {|Y_k - Y_m| > ε} ) = 0,  ε > 0,

therefore

P( ∩_{ν≥1} ∪_{m≥1} ∩_{k=m+1}^∞ {|Y_k - Y_m| ≤ 1/ν} ) = 1

and finally

lim_{k,k'→∞} |Y_k - Y_{k'}| = 0  a.s.,

which proves a.s. convergence. ◊


Condition (11.19) can be improved, as the following theorem shows.

Theorem 11.2 Let (Y_n) be a martingale such that

sup_n E(|Y_n|) < ∞;   (11.24)

then there exists a random variable Y such that

Y_n →^{a.s.} Y.   (11.25)

The proof is omitted.
We now present two important probabilistic applications of Theorem 11.1.

Theorem 11.3 (Strong law of large numbers) Let (X_n) be a sequence of
B_n-adapted, square integrable random variables such that

E^{B_n}(X_{n+1}) = 0,  n ≥ 1.

Then if 0 < a_n ↗ ∞ and Σ_n E(X_n²)/a_n² < ∞, we have

(X_1 + ... + X_n)/a_n →^{a.s.} 0.   (11.26)

Proof. We need the following classical Kronecker lemma, whose proof is
left as an exercise.

Lemma 11.5 Let (x_n) be a real sequence such that the series Σ_n x_n/a_n
converges, where 0 < a_n ↗ ∞; then

lim_{n→∞} (x_1 + ... + x_n)/a_n = 0.
Now let us consider the random sequence

Y_n = Σ_{k=1}^n X_k/a_k,  n ≥ 1.

Lemma 11.2 shows that (Y_n) is a square integrable martingale and that
(11.7) is satisfied, hence

E(Y_n²) = Σ_{k=1}^n E(X_k²)/a_k² ≤ Σ_{k=1}^∞ E(X_k²)/a_k² < ∞,

and by Theorem 11.1 it follows that

Y_n →^{a.s.} Y;

hence (11.26) is obtained by Kronecker's lemma. ◊


Theorem 11.4 (Random series) Let (X_n) be a sequence of square integrable,
zero mean independent random variables such that

Σ_n E(X_n²) < ∞.

Then Σ_n X_n converges a.s. and in mean square.

Proof. Note that

Y_n = X_1 + ... + X_n,  n ≥ 1,

is a martingale which satisfies (11.19), hence

Y_n →_{L²} Y  and  Y_n → Y a.s. ◊

Corollary 11.2 Let (X_n) be a sequence of square integrable, zero mean
independent random variables such that Σ_n X_n converges in mean square;
then Σ_n X_n converges a.s.

Proof. The L²-convergence entails

Σ_n E(X_n²) = E( (Σ_n X_n)² ) < ∞

and the result follows from Theorem 11.4. ◊
Finally we state without proof a central limit theorem.
Theorem 11.5 (Central limit theorem for martingales) Let (Y_n) be a square
integrable martingale. Set

X_n = Y_n - Y_{n-1},  n ≥ 2,

and suppose that
(i) the X_n are uniformly bounded and
(ii) Σ_n E^{B_{n-1}}(X_n²) = ∞, a.s.
Then

Y_{ν_t}/√t →^D N ~ N(0, 1),

where

ν_t = min{ n : Σ_{k=2}^n E^{B_{k-1}}(X_k²) ≥ t }

with the convention ν_t = 1 if {...} = ∅.

11.4 Exercises
11.1. Let (Y_n) be a B_n-adapted integrable random sequence.
(i) Show that (Y_n) is a supermartingale if and only if

\int_B Y_{n+1} \, dP ≤ \int_B Y_n \, dP,  B ∈ B_n,  n ≥ 1.

(Hint: choose B = {Y_n ≤ E^{B_n}(Y_{n+1})}.)
(ii) Show that (Y_n) is a supermartingale if and only if

E^{B_n}(Y_{n+p}) ≤ Y_n,  n ≥ 1,  p ≥ 0.

(iii) Prove the similar results for a submartingale.


11.2. Prove the analogues of (11.6) for a supermartingale and a
submartingale.
11.3. Prove that (Y_n), defined by (11.11) and perturbed by

τ = inf{n : S_n = a or S_n = -b},

is again a martingale.
11.4. Let Y be a square integrable random variable and let (B_n) be an
increasing sequence of σ-fields. Show that (E^{B_n}(Y)) is a martingale
which converges in mean square and almost surely.
11.5. Conversely, show that if a martingale (Y_n) converges in mean
square, then

Y_n = E^{B_n}(Z),  n ≥ 1,

for some square integrable random variable Z, and therefore (Y_n)
converges almost surely.
11.6. Let (Y_n) be the martingale defined in Polya's urn model.
(i) Show that Y_n → Y a.s. and in L².
(ii)* Show that Y ~ β(μ, ν) with density

[Γ(μ + ν) / (Γ(μ)Γ(ν))] (1 - x)^{μ-1} x^{ν-1},  0 < x < 1,

where μ = b/c and ν = a/c.


11.7. Consider the probability space ([0,1], B[0,1], P), where P is
Lebesgue measure and B[0,1] the Borel σ-field of [0,1]. Let f be a
continuously differentiable real function on [0,1]. Define

Y_n(ω) = 2^n [f(i/2^n) - f((i-1)/2^n)],  ω ∈ [(i-1)/2^n, i/2^n),

1 ≤ i ≤ 2^n, n ≥ 1, with [(2^n - 1)/2^n, 2^n/2^n) = [(2^n - 1)/2^n, 1].
(i) Prove that (Y_n) is a martingale.
(ii) Prove that Y_n → Y a.s. and in L².
11.8. Let (Xn) be a sequence of independent random variables such that
for n ~ 2

1
=~.
2
n -1
P(Xn -1 = -n) = n 2 P ( Xn- 1 = n2 n_ 1
)

(i) Show that


Yn = X 2 + ... + X n , n ~ 2

is a martingale.
(ii) Show that Yn ~ 00.
232 Lesson 11

11.9. Let (Xn) be a finite-state Markov chain with transition matrix (Pij).
suppose that for all i,
LPijXj = AXi,
j

where A i= O. Show that

Yn = A-nxx .. , n~1

is a martingale.
11.10. Let (Xn) be a sequence of independent positive random variables
such that E(Xn) = 1.

(i) Show that Yn = II?=l Xi, n ~ 1 is a martingale and that Yn ~ Y.


(ii) Consider the special case where

P (Xn = ~) = P (Xn = ~) =~, n ~ 1.


Show that, in this case, Y = 0 a.s. and compare E (II:l Xi) with II:l E(Xi)'
11.11 >1<. (Doob-Meyer decomposition).
Let (Xn) be a sequence of Bn-adapted integrable random variables.
(i) Show that there exists a martingale (Yn ) and a sequence (An) of
Bn_l-adapted integrable random variables such that

Xn = Yn +An, n ~ 1 a.s.

(ii) Show that the above decomposition is unique.


(iii) Show that (An) is increasing if and only if (Xn) is a sub martingale
(see e.g. Rao (1984), p. 184).
Lesson 12

Brownian Motion and


Diffusion Processes

This Lesson is mainly devoted to the study of continuous time processes


defined by differential equations. The crucial mathematical tool is the
stochastic integral.

12.1 Gaussian Processes


A real random variable Xo is said to be standard Gaussian (or standard
normal) if its distribution has density

1 _:1:2/2
f(x) = 0/ie , z E JR. (12.1)

A real random variable X is said to be Gaussian (or normal) if there


exists a standard Gaussian random variable Xo and real numbers a and b
such that
X = aXo+b. (12.2)
If a I: 0, then X has density

f,..,q{x)
1
= 0'0/i exp
(x
-
_1-')2)
20'2 ' x E JR, (12.3)

where I-' = band = lal. The associated distribution is


0' denoted by
N(I-',0'2).
If a = 0, then X = b is degenerated with distribution O(b).

233
234 Lesson 12

Let X = (Xl"'" Xn) be a n-dimensional random vector. X is said


to be Gaussian (or normaQ if for every a = (al,"" an) in IRn , the real
random variable E~=l aiXi is Gaussian.
If IRn is equipped with its Euclidean structure, then it is equivalent to
say that (a, X) is Gaussian for each a in IRn where (, ) denotes the scalar
product in IRn.
The following lemma exhibits the crucial property of Gaussian vectors.

Lemma 12.1 let X be a p-dimensional random vector and let T be an


affine transformation of IRP into IRq, then T(X) is a Gaussian random
vector.

Proof. There exists a linear mapping from IRP to IRq, say A, and B in IRq,
such that T(X) = AX + B. Hence from every a E IRq,

(a, T(X») = (a, A(X») + (a, B) = (A' a, X) + (a, B),


where A' denotes the transpose of A. Thus (a, T(X») is a real Gaussian
random variable and consequently T(X) is Gaussian. <>
Now let I' = (E(Xt}, ... , E(Xn» be the expectation of a Gaussian
vector X and let C = (Cov(X.,Xt », 1 ~ s,t ~ n be its covariance matrix.
Then the characteristic function of X is given by

<px(t) = E (ei(t,X)) = exp(i(t,I')exp (-~(t,Ct)), t E IRn. (12.4)

This expression shows that I' and C determine the distribution of X.


If C is nonsingular, then X has density given by

fx(x) = 1"_\n/~lrtll/? exp (-~«x -1'), C(x -1'»)) , x E IRn , (12.5)

where ICI denotes the determinant of C.


For the proofs of (12.4) and (12.5), see Exercise 12.1 and Exercise 12.2.
We are now in a position to define Gaussian processes.
Definition 12.1 A real stochastic process (Xt, t E T) is said to be Gaus-
sian if all vectors (Xtl' ... , Xt/c)' k 2:: 1, tl, ... , tic E I are Gaussian.

Concerning the distribution of a Gaussian process, we have the following


Lemma 12.2 The distribution of a Gaussian process is completely specified
by the mappings

t ~ E(Xt ) and (s,t) ~ Cov(X.,Xt ).


Brownian Motion and Diffusion Processes 235

Proof. Clear, since


(i) the distribution of a stochastic process is completely specified by its
finite dimensional distributions (Lesson 2);
(ii) the distribution of a Gaussian vector is determined by its mean and
its covariance matrix. <>
The following lemma gives a criterion for independence of the margins
of a Gaussian vector.

=
Lemma 12.3 Let X (Xl, ... , Xn) be a n-dimensional Gaussian vector.
Then Xl, ... ,Xn are independent if and only if the covariance matrix is
diagonal.

Proof. Let tP x, tP Xl' ... tP X .. be the characteristic functions of X , Xl, ... , X n ,


respectively. The Xi'S are independent if and only if
n

tPX(tl, ... , t n ) = II tPX;(ti), (tl' ... , t n ) E IRn. (12.6)


i=l

Now we may and do suppose that X is zero mean. Thus (12.6) is


equivalent to

exp Hp;tiE(X,Xil] =exp [-~~tlE(Xlll'


which in turn is equivalent to

E(XiXi) = 0, i -:/: j,

which means that ex is diagonal. <>

12.2 Brownian Motion


We now study the most important continuous time Gaussian process.

Definition 12.2 (Wt, t E JR+) is said to be a Wiener process (or a


Brownian motion process) if
A. W t is distributed as N(O, (12t), t ~ 0, where (12 > is constant.
B. (Wt ) has independent increments.
°
Recall that B means that W t2 - W t1 , ... , Wt" - W tk _1 are independent
for every k ~ 3 and every positive tl, ... , t",.
236 Lesson 12

Interpretation. A particle plunged into a homogeneous fluid suffers


impacts of the surrounding molecules. Wt is the abscissa at time t of the
projection of the particle on a given axis. Wt is the result of a large number
of independent small displacements and is therefore Gaussian by the Central
Limit Theorem.
Brown (1827) was the first to observe this phenomenon, Bachelier (1900)
gave a mathematical description of this motion, Einstein (1905) interpreted
physically the parameter (12 and Wiener (1923) gave a rigorous mathemat-
ical derivation of the process.
The following theorem gives characterization and some properties of
Brownian motion process.

Theorem 12.1 (i) A Wiener process has stationary increments and its
covanance IS
C(s, t) = (12 min(s, t), s, t ? O. (12.7)
(ii) Conversely every zero mean Gaussian process with covariance (12 min(s, t)
is a Wiener process.

Proof. (i) Let ¢ be the characteristic function of an increment W t - Ws


(s < t). Since
Wt = (Wt - W8) + W"
the axioms of Wiener process entail

exp (_~(12tu2) = ¢(u)exp (_~(12SU2), U E 1R

hence
¢(u) = exp ( _~(12(t - s)u) , u E JR,

which is the characteristic function of N(O, (12(t - s». This proves that the
distribution of Wt - W8 depends only on t - s.
Now in order to establish (12.7), it suffices to write

C(s, t) E(W8 Wt) = E (W8(Wt - Ws »+ E(W8 2)

E(Ws2) = (12S, S ~ t.

(ii) Let (Xt, t ? 0) be a zero mean Gaussian process satisfying (12.6).


Then A is clearly valid. Concerning B, note that

E «Xt4 - X t3 )(Xt2 - X tt »= (12(t2-t l-t2+t d = 0, 0 ~ it < t2 ~ ta < t4,


then using Lemma 12.3, it is easy to prove B, details are omitted. <>
Brownian Motion and Diffusion Processes 237

Second order calculus for Brownian motion. Since C(s, t) 00 2 mines, t) =


we infer from Theorem 9.6 and Theorem 9.8 that (Wt ) is L2-continuous and
locally L2-integrable.
Note that (Wt ) is not L 2-differentiable since

2 h
E ( Wt+hh- Wt)2 -_ h
00
--+ 00 as -+ 00.

In fact it can be proved that, with probability 1, the sample functions of


(Wt ) are continuous but not differentiable (in the usual sense). Proof of
this property is rather tricky. We only prove the slightly weaker result as
follows.

Theorem 12.2 The total variation 01 (Wt ) over each real interval is al-
most surely infinite.

Recall that the total variation of a real function over [a, b] is defined by
k

v(J) = sup L: I/(ti) - l(ti-1)1,


i=1
where the supremum is taken over all k and over all choices of (ti) such
that a = to < t1 < ... < tk = b.
v(J) is finite if and only if I is the difference of two monotonous func-
tions. In particular if I has a bounded derivative then v(J) < 00.
Proof of Theorem 12.2. We may and do suppose that (Wt ) is standard
= =
(i.e. 00 2 1) and that [a, b] [0,1]. Define
2ft
Yn = I: IWi / 2ft - W(i-1)/2 ft l, n;?: 1 (12.8)
i=1
and let N be an auxiliary random variable with standard normal distribu-
tion. Let us set E(IN!) = a and V(IN!) = [3, then

E (IWi/2ft - W(i-1)/2ftl) = aT n / 2
and
V (IWi / 2 ft - W(i-1)/2,, 1) = [32- n .
=
Then B entails E(Yn ) a2 n / 2 and V(Yn ) [3. =
Using Tchebychev inequality (Lesson 1) we get

P (IYn - a2 n/2 1> n) ~ ~.


238 Lesson 12

Thus the Borel-Cantelli lemma (Lesson 1) implies

P (liminfYn ~ Ot2 n / 2 - n) = 1,
hence
Yn - + 00 a.s.
and consequently v «W »=t 00 a.s. <>
The following Karhunen-Loeve expansion may be used as an alternative
definition of the Wiener process.

Theorem 12.3 The K. L. expansion of a standard Wiener process is given


by

Wt = ~ v'2 en sin ( n + ~) 1Ii, 0 ~ t ~ 1, (12.9)

where (en) is a sequence of independent random variables with distributions


N(O, (n + 1/2)-21("-2).
Proof. See Exercise 12.3. <>
It can be established that the series in (12.9) converges uniformly with
probability 1. This entails the a.s. continuity of the sample functions of
Wiener process.

12.3 Stochastic Integral

J:
We now introduce Ito integral, an essential tool in the theory of diffusion
processes. This integral has the form f(t)dW(t) but its definition turns
out to be intricate since the differential dW(t) does not exist in the usual
sense as we have seen in the previous section.
Let us consider a standard Wiener process (Wt ) defined on the prob-
ability space (0, A, P) and let :Ft =
u(W" a ~ s ~ t), a ~ t ~ b be the
family of increasing u-fields associated with W = (Wt, a ~ t ~ b). :Ft is
the set of those events which only depends on the behaviour of (Wa) in the
time interval [a, t].
We now define the class C of integrands as the class of random functions
f such that
(a) f E L2 ([a, b] x 0, B[/J,h] X A, L ® p),
where L denotes the Lebesgue measure over [a, b].
(b) For every t E [a, b], the random variable f(t,.) is :Ft-measurable.
Brownian Motion and Diffusion Processes 239

Note that (a) and Fubini's theorem (see Appendix) entail


J: E (1/(t)12) dt < 00. (b) means that I(t,w) is nonanticipating with
respect to W; in other words the value of I(t) depends only on past and
present values of the Wiener process.
We first consider the stochastic integral for the class e of step random
functions which belong to C, that is functions I of the form
n-I
I(t,w) = L: li(W)I[ti.ti+l)(t), a ~ t ~ b, wE 0, (12.10)
i=O

where a = to < tl < ... < tn = b and where, by convention, [tn-I, tn) =
[tn-I. b].
Note that since I is nonanticipating, we have li(W) l(ti'W), i = =
0,1, ... , n - i, consequently Ii is Fti measurable and E(fl) < 00.
Now we define the Ito integral of I by setting

1a
b
l(t)dW(t)
n-I

= L: li(Wti+
i=O
1 - wt;}. (12.11)

It is easy to prove that (12.11) does not depend on the partition (ti).
The following lemma is crucial.

Lemma 12.4 I : I 1---+ t l(t)dW(t) is a linear mapping from e into


L2(O,A, P), and a

E (l b
IdW) = 0, lEe, (12.12)

E (1 1b
6
IdW 9dW) = 1 6
E (f(t)g(t» dt, I, gEe. (12.13)

Note that (12.13) means that I is an isometry from e into L2(O,A, P).
Proof. First since Ii and W ti +1 - W ti are square integrable independent
random variables, Ii (Wti+l - W t ;) is square integrable and consequently
J: IdW is in L 2(O,A,P).
Now J IdW and J gdW may be defined by using the same partition
(ti)' thus Ito integral is clearly linear.
Second, noting that F t = 0" (Wa, Ws - W a, a ~ s ~ t) and that (Wt ) has
independent increments, we get

E (Wti +1 - W ti 1Ft;) = E (Wti+1 - Wt;) . (12.14)


240 Lesson 12

On the other hand, from the Tt;-measurability of Ii, we obtain

E (J (Wt;+l - Wt;) ITt;} = liE (Wt;+l - Wt;ITt;). (12.15)


Thus (12.14) and (12.15) imply

E [I (Wt;+l - Wt;) ITt;] = E (Wt;+l - Wt;) = 0, i = 0,1, ... , n - 1.


Consequently

E (1b) n-l
IdW = ~ E [E (I (Wt;+l - Wt;) ITt;)] = O.
For (12.13), write
n-l n-l
1= L li 1[t;,t;+1) and 9 = L gi 1[t;,t;+1)'
i=O i=O
then

E (l l fdW 9dW) = E [t: f;g; (W';" - W.;) (W' H ' - W") 1


=L E [E (Ji9j (Wt;+l - Wt;) (Wti+l - Wtj ) ITmax(t;,tj»)]
i,j
== LE(aij).
i,j

If i < j, then ti < ti+l :::; tj hence


aij = li9j (Wt;+l - Wt;) E (Wti+l - Wtj ITtj) = O.
Similarlyaij = 0 if i > j. Now if i = j, then ti = tj and
au = ligiE (Wt;+l - Wt J2 = ligi(ti+l - ti)'
Finally

E (l b IdW 1b 9dW)
n-l
L E(figi)(ti+l - ti)
i=O
1b E [/(t)g(t)] dt. <>
The following technical lemma allows to extend the integral to C.
Brownian Motion and Diffusion Processes 241

Lemma 12.5 t = e, where the closure is taken in the Hilbert space


L2 ([a, b] x fl, B[a,b] ® A, L ® P).
Proof. See Exercise 12.4. <>
Theorem 12.4 f 1---+
continuous and such that
J:
fdW has a unique extension to e, which is linear,

E ( [ fdW) =0, fEe (12.16)

and

E (l b lbfdW 9dW) = lb E [f(t)g(t)] dt, f, gEe. (12.17)

This extension is called the Ito integral or the stochastic integral.

Proof. Immediate from Lemma 12.4 and 12.5 and the bicontinuity of the
scalar product. <>
We now indicate some properties of the stochastic integral:
(i) If f is non-random, then

lb fdW is distributed as N (0, 1b f 2 (t)dt) .

If in addition f is continuously differentiable, then we have the integral by


parts formula

1 b
fdW = f(b)W(b) - f(a)W(a) -
1b a f'(t)W(t)dt. (12.18)

(ii) Define

Xt = it f(s)dW(s), a ~ t ~ b, (f E e), (12.19)

then (Xt ) is nonanticipating with respect to (Wt ) and has continuous sample
functions (a.s.)
(iii) (Xt ) is a (continuous-time)martingale with respect to (Ft ), that is

E(XtIF,) = X" a ~ s ~ t ~ b. (12.20)


For the proofs of above properties, see exercises.
242 Lesson 12

12.4 Diffusion Processes


In many practical situations, a particle is plunged into a nonhomogeneous
and moving fluid. For example the ''fluid'' is the economic conjuncture and
the "particle" is the price of some product.
We now can model such a situation by considering the stochastic dif-
ferential equation

dXt = J.l(Xt , t)dt + O"(Xt, t)dWt. (12.21)

where J.l(Xt , t) is called the drift term and O"(Xt, t) the diffusion coeffi-
cient.
Intepretation. X t is the position ofthe particle at the time t, 1'(:1:, t) is
the velocity of a small volume v of fluid located at :I: at time t. A particle
within v will carry out Brownian motion with parameter 0"(:1:, t). (12.21)
gives the change in position of the particle in the time interval [t, t + dt].
The appropriate mathematical form of (12.21) is

X t- 1t
Xa = I' (X, , s)ds + 1t O"(X"s)dW(s), a ~ t ~ b, (12.22)

where the first integral is in L2 sense and the second is in Ito sense.
We now state assumptions which allow to claim that (12.22) has a so-
lution.
Ai. There exists k > 0 such that

IJ.l(Y, t) - 1'(:1:, t)1 ~ kly - :1:1, 11'(:1:, t)1 ~ kv'1 + :1: 2


and
IJ.l(Y, t) - 0"(:1:, t)1 ~ kly - :1:1, 10"(:1:, t)1 ~ kv'1 + :1: 2 ,
for all :1:, y, t.
A2 • Xa E L2(0, :Fa, P) where :Fa = O"(Wa).

Theorem 12.5 If Ai and A2 hold, then there exists (Xt, a ~ t ~ b) such


that
(i) (Xt ) E C,
(ii) (Xt ) is L2-continuous and its sample functions are a.s. continuous,
(iii) (Xt ) is a solution of (12.22), and
(iv) (Xt ) is unique in the sense that if (yt) is solution of (12.22) with
=
Ya Xa and satisfies (ii), then

P(yt = X t , t E [a, b]) = 1. (12.23)


Brownian Motion and Diffusion Processes 243

Proof (sketch). We split the proof in four parts.


(a) Let us define the sequence of processes Xn = (Xn(t), a $ t $ b),
n;::: 0 by
Xo(t) = X(J, a$t$b (12.24)

and

Xn(t) = X(J + it Jl(Xn-l(s), s)ds + it u(Xn_l(s), s)dW(s), n;::: 1.

We are going to show that (Xn) is a sequence of approximations to the


solution of (12.22).
First we show by induction that the processes (Xn(t» satisfy (i), (ii),
and
(ii)': s 1---+ Jl(Xn(s), s) is bounded a.s. and (s,w) 1-+ u(Xn(s,w), s)
belongs to C.
Clearly Xo satisfies the above conditions. Now we assume that Xn has
these properties. To establish L 2-continuity, we compute for s < t,

= 1 t
1 t ]2

r
E (IXn+1(t) - Xn+l(sW) E [ Jl(Xn(x), u)du + u(Xn(X), u)dW(x)

< 2E (J Jl + 2E (J u) 2• (12.25)

Using Schwarz's inequality and the isometry of Ito integral, we obtain

E (lXn+l(t) - Xn+l(sW) $ 2(t - s) 1t E (m 2(Xn(X), u) du


t
+21 E (U 2(Xn(X), u)dW(x» . (12.26)

Now from Al we get

E (lXn+l(t) - Xn+l(sW) $ 2[(t - s) + l]k21t [1 + E (X~(x»] duo


(12.27)
Since Xn is L2-continuous, E (X~(x») is bounded over [a, b]. Hence the
bound in (12.27) tends to zero as t - s tends to zero. This proves the
L2-continuity of X n +l .
The other properties are straightforward consequences of properties of
stochastic integral and are therefore omitted.
244 Lesson 12

(b) We now show that there exists (Xt, a::; t ::; b) such that
L~
Xn(t) --+ Xt , a ::; t ::; b. (12.28)

For this purpose, define for every t E [a, b],

.6. 0 = X a , .6.n(t) = Xn(t) - Xn-l(t), n ~ 1,


then

E (l.6. n(tW) = [1 [P(Xn(s), s) - p(Xn-I(S), s)]ds


t

+ 1 t
[u(Xn(s),s) - U(Xn_l(s),S)]dW(S)].

Using the same method as above (see (12.25) and (12.26)), we obtain

E (I.6. n(t) 12) ::; 2(t - a) 1t E [P(Xn(S), s) - p(Xn-I(S), S)]2 ds

+ 1t E [u(Xn(s), s) - u(Xn_l(s), s)]2 ds

1t
and from AI,
E (I.6. n(tW) ::; K E (l.6.n _ l (s)1 2 ) , (12.29)

where K = 2[(b - a) + 1]k 2 • In particular,


E (1.6. I (tW) ::; K(t - a)E(X~),

E (1.6.2(tW) ::; K 1t E (I.6. n(sW) ds ::; K 2(t ~ a)2 E(X~),


and, inductively

E (l.6. n(tW) ::; K n (t - ~)n E(X~). (12.30)


n.
Now for n > m,
n
IXn(t) - X/J(t) 1 = L 12-i/ 22i/ 2.6.j (t)12
j=m+l

< (t,2-;) (t,vI6.;(t)I') (12.31)


Brownian Motion and Diffusion Processes 245

by Schwarz's inequality.
From (12.28) and (12.29), we infer that

E (IXn(t) - XI'(tW) :s E(X2)a ~


L.J [2K(b .,- a)F ---+ 0, (12.32)
j=mH J.

as n, m ~ 00, hence (12.28) from Cauchy criterion.


Note that (12.32) shows that the convergence is uniform with respect
to t, thus (Xt ) is L2-continuous. It is easy to verify that (Xt ) E C and is
a.s. continuous.
(c) We now show that (Xt ) is solution of (12.22). Let us set

D(t) = X t - Xa - i t Jl(Xs , s)ds - i t u(Xs, s)ds,

we have

D(t) = [Xt - Xn+l(t)] - i t Ut(Xs , s) - Jl(Xn(s), s)]ds

-it[u(xs, s) - u(Xn(s), s)]dW(s)


- Pn+Qn+Rn .

By (12.28),
L2
Pn = X t - Xn+l(t) ---+ O.
Using I we get

Qn = i t
a Ut(Xs, s) - Jl(Xn(s), s)]ds ~ O.
2

Finally the isometry of Ito integral and Al imply

i t
Rn = a [u(Xs, s) - u(Xn(S), s)]dW(s) ---+
L2
0,

and consequently D(t) 0 as desired. =


(d) It remains to prove unicity. Since (Xt ) and (yt) satisfy (12.22), we
have

X t- = it
yt (j.t(Xs, s) - Jl(Ys , s)]ds + it [u(Xs, s) - u(Ys, s)]dW(s).
246 Lesson 12

Again Al and the isometry of Ito integral yields

E (IXt - yt12) $ A it E (IX. - Y.12) ds == F(t)

hence
F'(t) - AF(t) $ 0,

! (e-
then
At F(t)) $ 0.

Finally e- At F(t) is a positive decreasing function which vanishes at a.


Therefore F(t) = 0, a $ t $ b and consequently

P(Xt = yt) = 1, a$ t $ b

or

P ( n
tEQn[a,bj
{Xt = yt}) = 1,

but since the sample functions of (Xt ) and (yt) are continuous (a.s.), we
obtain (12.23) and the proof of Theorem 12.5 is now complete. 0
Ito's differentiation formula. The following change-of-variable formula
is very useful. Let ¢ : IR x [a, b] - - IR such that the partial derivatives
a¢/ax, a¢/at, a 2 ¢/au 2 exist and are continuous for all (x, t) in IR x [a, b]
and let (Xt, a $ t $ b) with stochastic differential

dXt = /(t)dt + g(t)dWt, (12.33)

then yt = ¢(Xt, t) has stochastic differential


dyt = h(t)dt + gl(t)dWt, (12.34)

where
a¢ a¢ 1 2 a2 ¢
h(t) = /(t) ax (Xt, t) + at (Xt, t) + '2 g (t) ax 2 (Xt, t) (12.35)

and
gl(t) = g(t) a¢
ax (Xt, t). (12.36)

For the proof we refer to Ash and Gardner (1975).


Brownian Motion and Diffusion Processes 247

12.5 Processes Defined by Stochastic Differ-


ential Equations
In this section we give some important examples of continuous time pro-
cesses.
(i) A simple example is the process (Xt ) satisfying

{ dXt = I'dt + udWt (12.37)


Xo =0,

where I' E 1R and u > 0 are constants and where (Wt, t ~ 0) is a standard
Wiener process. Then we have

X t = I't + uWt , t ~ 0 (12.38)

so that (Xt ) is a Wiener process perturbated by a linear trend.


(ii) Black-Scholes process. Consider the stochastic differential equation

{ dXt = Xt(j.t(t)dt + u(t)dWtl (12.39)


Xo = Xo > 0,
where Xo is constant and where I' and u are not random and satisfy as-
sumption At.
Let us set
¢(x, t) = log(x/xo), x> 0, t ~ 0
and define a process (yt) by

yt = ¢(Xt, t), (12.40)

then Yo = 0 and by Ito's formula (12.34), we have

dyt = [I'(t) - u 2it)] dt + u(t)dW(t),

hence
yt = 1t [I'(s) - u2~s)] ds + 1t u(s)dW(s),

and from (12.40),

Xt = Xo exp {1 t [I'(s) - U2~S)] ds + 1t u(s)dW(s) }, t ~0 (12.41)


248 Lesson 12

If J.t(t) = I' and u(t) = u, one obtains the Black-Scholes process


Xt = Xo exp {(I' - u 2 /2)t + uwtl, t ~ O. (12.42)

Note that X t has a log-normal distribution since log X t '" N(Iog Xo +


(I' - u 2 /2)t, u 2t). More precisely, the process

COgXt -logx:- (I' - u 2 /2)t, t ~ 0)


is a standard Wiener process.
On the other hand, since
Xt
euWt = Xo exp (I' - u 2 /2)t) ,

we have as t -+ 00,

--+ 0 if I' < u 2 /2


Xt
euWt
{ = Xo =
if I' u 2 /2 (12.43)
--+ 00 if I' > u 2 /2.

Interpretation. Consider a financial market in which a stock is available.


The stock is a risky asset whose price at time t is X t . The formal notation

dXt = J.tdt + udWt


Xt
suggests that I' may be interpreted as the mean rate of return for the stock
and u 2 as the variance of return. The asymptotic behaviour of X t as t -+ 00
is specified by (12.43).
(iii) Ornstein-Uhlenbeck process. This process is solution of the stochas-
tic differential equation

{ dXt = -OXtdt + u(t)dWt (12.44)


Xo = Xo,
where Xo is nonrandom and 0 is a strictly positive parameter.
In order to solve this equation, we set

yt = ue St X t •

Then Ito's formula (12.34) implies

dyt = ue St dWt
Brownian Motion and Diffusion Processes 249

1t
hence
e8t X t - Xo = (1' e83 dW(s), t ~ o. (12.45)

A more general form of the O. U. process may be obtained by considering


a bilateral Wiener process (Wt ) defined by

Wi, t~O
W t = { W: t , t~ 0,
(12.46)

where (Wi, t ~ 0) and (W?, t ~ 0) are two independent standard Wiener


processes.
Replacing "Xo = xo" by the initial condition Xto in (12.42) and using
the same method we get

X t = e- 8(t-to) Xto + (1' tlto e- 8(t-3)dW(s), t ~ to. (12.47)

By letting (formally!) to -7 -00, we obtain the process

Xt = (1' {too e- 8(t-3)dW(s), t E JR. (12.48)

This process is Markovian stationary Gaussian and has auto covariance


e- 8t
1t = 20' (1'2 t~ o. (12.49)

Interpretation. X t may be interpreted as the velocity at time t of a


particle which executes a Brownian motion.
(iii) Purly nondeterministic stationary Gaussian Processes. These
processes take the form

Xt = {too get - s)dW(s), t E JR, (12.50)

where g is nonrandom, vanishes over JR- and is square integrable over


JR+ with respect to Lebesgue measure. (Wt, t E JR) is a bilateral Wiener
process. Finally

1t-00
get - s)dW(s) =
~~-oo
l~m 1t
~
get - s)dW(s), (12.51)

where the integral on the right is an Ito's integral.


250 Lesson 12

The Ornstein-Uhlenbeck process corresponds to

g(X) = o-e-9u1H4(x), u E JR.

Note that (12.50) may be considered as a Wold decomposition (see Les-


son 9), in continuous time.

12.6 Exercises
12.1. Characteristic function of a Gaussian vector.
(i) Show that the c.f. of Y, which is distributed as N(/J, 0- 2 ), is

tPy(X) = ei/Jue-q2u2/2 , u E JR.

(ii) Prove (12.4). Hint: consider the case /J = 0 and use the equality
tPy(X) = tP(t,x)(I), t E JR.

12.2. Gaussian density. Consider the probability space (JRn , BJR'" P)


where P has the density

1 -, exp (
,- -. -21 ~xJ
n ) ,(Xl, ... ,xn ) E JRn .
J=l

Let Xo be the r.v. defined by

Xo(w) = w, wE JRn .

(i) Show that Xo is Gaussian.


(ii) Let C be an n x n nonsingular covariance matrix and let Y = AXo+/J
be a r. v. defined on (JRn , BJR'" P) where /J E JRn and A is a matrix such
that AA' = C. Show that Y is Gaussian with density given by (12.5).
12.3. Karhunen-Loeve expanssion of a standard Wiener process.
(i) Consider the equations

11 min(s, t)tPn(s)ds = AntPn(t), 0~t ~ 1, n ?: O.


Show that tPn has two continuous derivatives and that

AntP~(t) = -tPn(t).
Brownian Motion and Diffusion Processes 251

(ii) Show that

tPn(t) = v'2sin (n+~) 7rt, 0~t ~ 1, ~ 0n


and that
An = 1
(n + t) 2 7r2 ' n ~ O.
(iii) Prove (12.9).
12.4*. t = c.
(i) Show that if I E C and if t 1--+ I(t,.) is a (uniformly) continuous
map of [a, b] into L2(0, A, P), then lEt. Hint: Define
n-l
g(t, w) = L I(ti ,w)l[ti,ti+d(t)
;=0

and show that f:EI/(t) - g(t)1 2 dt can be made arbi~rary small.


(ii) Show that if IE C and is bounded then I E C. Hint: Define

In)(t,w) = 100
e- U I (t - ~,w) du
and show that In E C and (In) --+ I in L2 ([a, b] x 0).
(iii) Show that t = C. Hint: Consider

gn = I 1IJI<n, IE C, n ~ 1.

12.5. Let (Wt , t ~ 0) be a standard Wiener process, [a, b] an interval in 1R+


and a = to < tl < ... tn = b a partition of [a, b]. Define
n-l
In(A) =L (AWti +1 + (1- A)Wti ) (Wti+ 1 - W to ) ,
i=O

where 0 < A < 1.


(i) show that In(.~) converges in L2 sense as max(ti+l - ti) tends to
zero.
(ii) Determine f:WtdWt .
(iii) For which value of A the limit of In(A) is (Wf - W'1)/2?
Remark: (Wf - W;)/2 is called the Stratonovitch integral of W t (For a
study of this stochastic integral see Sobcsyk (1991)).
252 Lesson 12

12.6. Prove (12.18). Hint: Define fn = E?;01 f(ti)l[ti,ti+l) and show that

1a
b n-1
fn dW = f(t n-1)Wt .. - f(to)Wto - Erf(ti) - f(ti-I)]Wti ·
i=1

12.7. Show that the process (X,) defined by (12.19) is


(i) nonanticipating,
(ii) with continuous sample functions (a.s.), and
(iii) a martingale.
12.8. Let (Xt ) be the Ornstein-Uhlenbeck process defined by (12.48).
(i) Prove (12.49).
(ii) Show that (Xt ) is a Markov process and compute

Xt+h = E(Xt+h IX6 , S ~ t), t E JR, h > O.

(iii) Compute the prediction error E ( Xt+h - Xt+h) and find its limit
as h --+ 00.

12.9. Consider the random walk

Xn(t) = e1 +~..
... + en
vn6.z, t ~ 0,

where e1, ... ,en are independent and such that


1
P(ei = 6.z) = P(ei = -6.z) = -
2
and where t = n6.t. Show that, if

(6.Z)2 --+ (1'2 > 0, as 6.t --+ 0,


6.t
then (Xn (t1), ... , Xn(tk)) converges in distribution to (W(t1), ... , W(tk)),
where W(.) is a Wiener process and 0 ~ t1 < ... < tk.
12.10. Let (Wt, t ~ 0) be a Wiener process with parameter (1'2 and let
Tn, n ~ 1) be a Poisson process with intensity A. (Wt ) and (Tn) are sup-
posed to be independent.
(i) Show that W 1---* WT.. (w)(w) is a random variable for all n ~ l.
(ii) Compute E(WT.. ) and V(WT.. ).
(iii) Determine the characteristic function and the distribution of WT".
(iv) Find the distribution of WT" - WT"_l.
Brownian Motion and Diffusion Processes 253

12.11. Let (Xt, t ~ 0) be a zero mean Gaussian process with covariance


E(X3Xt) = u(s)v(t), 0 ~ s ~ t,
where u and v are continuous and such that v(t) :I 0 and a(t) = u(t)/v(t)
is strictly increasing.
(i) Show that
X(a-1(t)) t >0
Yt = v(a-1(t)) , -
is a standard Wiener process.
(ii) Apply (i) to the Ornstein-Uhlenbeck process.
12.12. Let (Wt, t ~ 0) be a standard Wiener process.
(i) Show that (Wt ) is a martingale with respect to
:Ft = U(W3' S ~ t),
t ~ O.
(ii) Show that Wl-t and exp(AWt -A 2 t/2) where A E IR are martingales
with respect to (:Ft )
12.13. Brownian bridge. Let (Wt, t ~ 0) be a standard Wiener process.
Show that the process defined by
Bt = Wt -tWl, t E [0,1],
is a zero mean Gaussian process with covariance function given by
C(s, t) = s(1 - t), for s ~ t.
Such a Gaussian process is called the Brownian bridge.
12.14. Reflection principle of Brownian motion process. Let (Wt, t ~
0) be a standard Wiener process. For given T, we are interested in finding
the distribution of the random variable
YT = sup W t .
09:ST
(i) Explain why the sup in the definition of YT can be replaced by max.
(ii) Let A = {w : YT(W) > x, WT(W) > x} and B {w : YT(W) > =
x, WT(W) ~ x}. Verify that

P(A) = P(WT > x) = v'27rT


1 1 00
(y2)
'" exp - 2T dy.
(iii) The so-called Reflection principle is the following type of argument.
Let T", be the first time the process (Wt ) hits x. Between T", and T, it is
plausible that the probabilities for (Wt ) to be below or above x are the
same. Thus P(A) =
P(B). Use this fact to determine the distribution of
YT.
Lesson 13

Statistics for Poisson


Processes

This Lesson begins with a review of some basic concepts in Statistics. As


an application we study statistical inference for Poisson processes.

13.1 The Statistical Model


In the Mathematical Theory of Statistics, the results of observations are
interpreted as values of a random vector or a random function X.
It is only known that the distribution of X belongs to a class of distri-
butions P. Thus the statistical model is a triple (E, B, P) where E is a
nonempty set, B a u-field of subsets of E, and P a family of probabilities
on (E, B). The observed random element X is defined by X(x) x, x E E. =
The family P can always be parameterized and represented in the form
P = {P(J, (J E 8}. In the following we assume that (J 1---+ P(J is injective.
P is said to be dominated by a u-finite measure L if every P(J has a
density f(x, (J) with respect to L.

Example 13.1 Let us consider the Gaussian model:

(IRn, B IR'" N(I-', u 2 r8)n, (1-', ( 2 ) E IR x IR~) ,

which corresponds to the observation of X = (Xl, ... , X n ), where Xl' ... ' Xn
are i.i.d. Gaussian random variables with distributionN(I-',u 2 ), and IR~ =
(0,00).

255
256 Lesson 13

This model is dominated by Lebesgue measure on IRn and X has the


density

I (Xl, ... , Xn; (1-', 0"2)) = (~)n/2 exp [- 2~2 ~(Xi - 1-')2], (13.1)

where (Xl, ... ,Xn) E IRn and (1-',0"2) E IR x IRt.

Sufficiency.
Let (F, C) be a measurable space. A statistic S with values in (F, C)
is, by definition, a B-C measurable mapping of E into F.
A statistic S is said to be sufficient if there exists a variant of the
conditional probability P;(BIS), BE B, () E e which does not depend on
().
This property means that S(X) contains all the available information
concerning ().
The following theorem provides a useful criterion for sufficiency. Proof
is omitted.

Theorem 13.1 (Factorization theorem). liP is dominated with den-


sity I( x, (}), then a statistic T is sufficient il and only il

I(x, (}) = h1 (S(x), (})h 2 (x), (13.2)


where h1 (., ()) and h2 are positive measurable functions.

In the Gaussian model (13.1), the statistic S =


(x, s2) where x
E~=l =
xdn and S2 E~=l(X - x)2 In is sufficient (see Exercise 13.1).

13.2 Estimation
Let (E, B, P 9, () E e) be a statistical model and let 9 be a measurable
mapping from (e, V) into (e', V'), where V and V' are O"-fields over e and
e' respectively.
In order to evaluate g«(}) from the observation X, one uses an estimator,
that is a statistic with values in (e', V').
In the Gaussian model, S2 = E~=l (x-x)2 In is an estimator of g(l-', 0"2) =
0"2. It is important to note that an estimator only depends on X. For ex-
ample, s~ = E~l (x - 1-')2 In is not an estimator of 0"2 because it cannot
be computed from the observations.
Statistics for Poisson Processes 257

The accuracy of an estimator T(X) is specified by a risk function. If


a' = JR, the most popular risk function is the quadratic error
R(T, (J) = Ee (T(X) - g«(J))2 , (13.3)

where the symbol Ee means that the expectation is taken with respect to
Pe·
The quadratic error generates a partial ordering on the set To of all
estimators of g«(J) as follows. Consider Sand T in To, then S is said to be
preferable to T (S -< T) if and only if

Ee (S(X) - g«(J))2 ~ Ee (T(X) - g«(J))2, (J E 8. (13.4)

Now let T be a subset of To, then T* E T is said to be optimal within


T if T* -< T for all T E T.
In general it is not possible to find an optimal estimator except for some
special classes. We now introduce such a class.
Unbiased estimators.
An estimator is said to be an unbiased estimator of the real valued
function g«(J) if Tis Pe-integrable for each (J and

Ee(T) = g«(J), (J E 8. (13.5)

In Example 13.1, X = (Xl + ... + Xn)/n is an unbiased estimator of 1'.


The following theorem shows that a "good" unbiased estimator is a
function of a sufficient statistic.

Theorem 13.2 {Rao-Blackwell theorem}. Let S be a sufficient statis-


tic and let T be a square integrable unbiased estimator of the real valued
function g«(J). Then
E9(TIS) -< T, (13.6)
where E;(·IS) denotes the expectation with respect to P;(·IS).
Rroof. First since P;('IS) does not depends on (J, T* = E9(TIS) is actually'
an estimator. Now for every (J E 8,

Ee(T* - g«(J))2 = Ee [E; (T* - g«(J))2IS)] ,


but Schwarz inequality implies

(E;(TIS) - g«(J))2 = (E;(T - g«(J))IS)2


< E; [(T - g«(J))2IS] ,
258 Lesson 13

hence

Ee(T* - g(0»2 ~ Ee E; [(T - g(0»2IS]


< Ee(T - g(0»2. <>

In order to obtain an optimality result we now define a complete statis-


1;ic S as a statistic such that if Ee(g(S» = 0 for all 0 E e, then g(S) = 0
(Pe a.s. for all 0 E e). Then we have the following theorem.
Theorem 13.3 (Lehmann-Scheffe theorem). If S is a complete suffi-
cient statistic and if the class T of unbiased estimators is not empty, then
there exists an optimal estimator T* within T. This estimator is given by
T* = E;(TIS), where T is any element in T and 0 any element in e.

Proof. It T E T then E*(TIS) E T since

Ee (E;(TIS» = Ee(T) = g(o).


Now for any T' in T we have E*(T'IS) E T and

Ee (E*(T'IS) - E"(TIS» = 0, 0E e.
By the completeness of S it follows that

E"(T'IS) = E*(TIS) == T" (a.s.).


Finally using Theorem 13.2, we obtain

T" -< T, T' E T. <>


In the Gaussian model, (z, s2) is complete.
Method of maximum likelihood.
This is a general method of estimation which is commonly used, In
particular if no unbiased estimator is available.
Let (E, B, !(., o)L, 0 E e) be a dominated statistical model with ob-
served random element X. The random function !(x, 0) is called the like-
lihood (function) associated with the model.
The statistic 0 defined by

!(X,O) = max!(X, 0) (13.7)


eEe

is called the maximum likelihood estimator (MLE) of 9. In regular cases,


(13.7) has an unique measurable solution.
Statistics for Poisson Processes 259

The factorization theorem 13.1 shows that if S is a sufficient statistic,


then the MLE is a function of S.
Consistency.
If X(n) = (Xl, ... , Xn) is a n-dimensional random vector, then the
accuracy of an estimator Tn based on X(n) increases with n. Thus it is
natural to consider the behavior of Tn as n tends to infinity.
In that context, the asymptotic statistical model is defined by a triple
(Eoo, Boo, Pe,oo, () E 9) and by an increasing sequence (Bn) of sub u-fields
of Boo. For a fixed n, the observation is a Bn-measurable random element
X(n) and an estimator of g«(}) is denoted by Tn.
If (9', V') is a metric space equipped with its Borel u-field and with
distance d, then (Tn) is said to be consistent in probability if and only if

Pe,oo (d(Tn,g«(})) > c) --+ 0, c > 0, () E 9. (13.8)


Similarly one defines almost sure consistency and L2-consistency. The
asymptotic behavior of Tn in distribution is also of interest as we shall see
below.
Conserning the MLE On, we consider the case of observations X(n) =
(Xl, ... , X n ), n ~ 1, where (Xn) is a sequence ofi.i.d. real random variables
with common density I(x, (}) where () is a real parameter. Then under some
regularity conditions, we have
a.l. ()
() E 9 (13.9)
A

n --+ ,
(}

and
r.::TTi1\ 'D
ynI«(})«(}n - ()) N '" N(O, 1), (13.10)
A

--+

where I«(}) is the so-called Fisher inforIIlation quantity defined by

I«(}) = Ee (8Iog/8()(X, (}))2 ,() E 9. (13.11)

Under the same condtions, any unbiased estimator Tn satisfies the


CraIIler-Rao inequality
1
Ve(Tn) ~ nI«(}) , () E 9, n ~ 1, (13.12)

where Ve is the variance taken with respect to P9.


If Ve(Tn) = [nI«(})]-l, then the estimator is said to be efficient. It can
be shown that On is asymptotically efficient, i.e.,
nI«(})Ve(On) --+ 1, as n~ 00.
260 Lesson 13

13.3 Tests
Given a statistical model (E, 8, Pe, () E e), we wish to test the hypothesis
Ho : () E eo, against H 1 : () E e1 == e - eo.
Ho is called the null hypothesis and H1 the alternative hypothesis. These
expressions are justified by a dissymmetry in the problem which is visible
in the following example.
Example 13.2 n trials with a coin are performed. The problem is to test
that the coin is fair. An associated statistical model is

({O, l}n, P( {O, l}n), (1 - (})c(o) + (}C(l) , °: :; () : :; 1)


and the null hypothesis is "() = 1/2".

is accepted if ¢ °
A test ¢ is a measurable mapping of (E,8) into ({O, I}, P({O, I})). Ho
= =
ans rejected if ¢ 1. Note that this does not mean that
H1 is accepted! For that, one must contruct a new test problem where H1
(or some other hypothesis) is the null hypothesis. The above ¢ is completely
specified by its critical region W =
{x : ¢(x) I}. =
The probabilities of error Pe(W), () E eo and Pe(E - W), () E e 1
measure the quality of ¢. Taking into account dissymmetry one defines the
level of significance
Q.p =
sup Pe(W) (13.13)
eeeo
and the power function
(3.p = Pe(W), () Eel. (13.14)
A "good"test has a small level of significance and a large power.
Let Q E [0,1] be a given number and let Ta, be the family of tests
satisfying Q.p ::::; Q, ¢ ETa. A test ¢o is said to be optimal within Ta or
uniformly most powerful (UMP) in Ta if
(3.po«(J) ~ (3.p«(}), () Eel, ¢ ETa. (13.15)
The following classical result gives the optimal test in the simplest case.
Proof is omitted.
Theorem 13.4 (Neyman-Pearson lemma). Let (E,8'/e,() E {(}o,(}d)
be a dominated statistical model. Then the test ¢o defined by the critical
region
W = {x : fel(X) ~ cfeo(x)}, (13.16)
where c is a constant, is optimal in Ta4>o for testing () = (Jo against () = (}1.
Statistics for Poisson Processes 261

In the general dominated case, a commonly used test is the likelihood


ratio test defined by the critical region

w = {x : f(x, 0) ~ c sup f(x, On,


geE>o

where 0 is the MLE of O. This test has good asymptotic properties.

Monotone likelihood ratio.


A family of densities (/(-,0), 0 E e), where e is a real interval, is said
to have monotone likelihood ratio if there exists a real statistic U such that
whenever 01 < O2 , the likelihood ratio has the form

~~x, ~2~ = g9 1 ,9 2 (U(x» , (13.17)

where g9 1 ,9 2 (') is a strictly increasing function. For such a family, an UMP


=
test does exist if eo {O : 0 ~ Oo}.

Theorem 13.5 Let (/(·,0), 0 E e) be a family of densities with monotone


likelihood ratio. Then, for testing 0 ~ 00 against 0 > 00 , any test of the
form
¢(x) = l(U(x)~e) (13.18)
is UMP in To •.

This result is a consequence of Neyman-Pearson lemma and (13.17).

13.4 Estimation for Poisson processes


Consider an observed Poisson process (Nt, t ~ 0) with unknown intensity
A. We wish to estimate A using available data. For convenience, we as-
sume that the statistical model contains the degenerate Poisson process
corresponding to A = 0 unless otherwise stated.

Observation over a fixed time interval.


Suppose that the process is observed over the time interval [0, T]. Data
may be written as To, ... ,TNT with as usual To = 0 in order to avoid a
possible empty set of observations.
Now Theorem 4.4 shows that the conditional distribution
£, [(To, ... , TNT) INT] does not depend on A. Thus NT is a sufficient statistic
and therefore contains all the information.
262 Lesson 13

The likelihood function of NT is then

fe N A) = e->'T (AT)NT (13.19)


T, 1\T' '
lYT·

with convention 00 = 1. Hence the MLE

jT = NT/T. (13.20)

We now show that jT is optimal.

Theorem 13.6 jT is the minimum variance unbiased estimator for A.

Proof. First, since NT has a Poisson distribution with parameter AT, it


follows that
E>. (jT)
= A, A ~ 0. (13.21)

We now prove that NT is a complete statistic. Let g( NT) be an inte-


grable random variable such that

E>. (9(NT» = 0, A ~ 0, (13.22)

or equivalently

00 (AT)n
G(A) = Lg(n) - , - = 0, A ~ 0. (13.23)
n=O n.
The power series G(A) vanishes over [0,00), hence

g(n) = 0, n = 0, 1, ... (13.24)

and NT is complete. Applying Theorem 13.3 to the statistic NT, we obtain


the desired result. <>
The asymptotic behavior of jT is given by Theorem 4.7.
Observation over a random time interval.
If A is strictly positive, the process may be observed on the random time
interval [0, Tn] and Theorem 4.5 shows that Tn is a sufficient statistic.
Using Corollary 4.2, we obtain the likelihood function

f(Tn, A) = Ae->'T" (ATn)n-l


1__ 1'\ I , n ~ 1, (13.25)

hence the MLE


A~ = n/Tn . (13.26)
Statistics for Poisson Processes 263

This estimator is not unbiased since for n ~ 2,

E>. (A:) = n 1
00

o u
A
_e->'u ,
(AU)n-l
_,. du

~ foo (AU)n-2 Ae->,udu _ ~


n - 1 io (n - 2)! - n - 1·

Consequently An = (n-1)/Tn is unbiased. It can be shown that Tn is com-


plete and therefore that An is an optimal unbiased estimator (see Exercise
13.7).
Concerning A: we have the following

n 2A2
E>.(A:-A)2=, . . • \1 .. n\' n~3 (13.27)

and
A: - - A, m.q. and a.s. (13.28)
(see Exercise 13.9).

13.5 Confidence Intervals and Tests for A


Confidence intervals.
If the process is observed over a fixed time interval, Theorem 4.7 allows
to construct a confidence interval for A.
Let a E (0,1) be a given number and let Zl-a such that

P(INI ~ Zl-a) = 1- a, (13.29)

where N '" N(O, 1). Then (5.29) entails

r,;;jT - A )
lim P>. ( IvT~1 ~
T-+oo VA
Zl-a = 1- a, (13.30)

which leads to a confidence region of asymptotic confidence level 1 - a.


Actually (13.30) may be written as

P>.(AER(jT,a») --I-a, as T--+ 00, (13.31)

where R(jT, a) is a random set.


264 Lesson 13

Replacing V>. by ..;r;. we obtain a confidence interval for A

h(a) = [ AT -
A

r-
V{5; Z1 - CX , AT +
A

r-
V(5; Z1 -
1
CX • (13.32)

Now if the process is observed over [0, Tn], then the construction of a
confidence interval is based on the fact that 2ATn is distributed as x2(2n).
Consider q1 and q2 such that

P(ql ~ Q ~ q2) =1- a, (13.33)

where Q is distributed as x2(2n), then

2Tn' ~])
PA ( A E [ .l!.. 2Tn = 1- a. (13.34)

Tests.
Consider the problem of testing 0 < A ~ AO against A > AO.
Note that if NT is observed, the model is dominated by the counting
measure over N with density

f(x, A) = e-AT(AT):I: xEN,


x! '
thus
f(x, A2) (>..1-A2)T (A2):I: N
f(x, A1) = e A1' 0 < A1 < A2, X E .

Consequently the family (f(., A), A> 0) has monotone likelihood ratio.
By Theorem 13.5, it follows that an optimal test has the form

¢(NT) = INT>c, (13.35)

In order to specify c, we consider Zcx defined by

P(N ~ zcx) = a, 0< a < 1,

where N ....., N(O, 1).


Now the monotonicity of the likelihood ratio implies

a", = sup P)..(NT ~ c) = PAo(NT ~ c)


)..:5)..0
Statistics for Poisson Processes 265

and using Theorem 4.7, we obtain


NT - >'oT )
lim P Ao ( /\rF ~ Za = a.
T-+oo v>'oT
Finally the critical region {NT ~ >'oT + zaJ>'oT} provides a test with
asymptotic level a.
We now turn to the test based on Tn. In that case, we use the property
2>'oTn "" x 2(2n) for obtaining an optimal test of level a defined by the
critical region
Tn < q2n(a) (13.36)
-~'
where P (Q ~ q2n(a)) = a with Q "" x2(2n). Details are left as an exercise.

13.6 Inference for Point Processes


We now give some indications about statistical inference when a general
point process is observed. In fact a main problem is to verify if such a
process is Poissonian or not.
Testing the Poissonian hypothesis.
Suppose that a point process is observed over [0,7l We wish to verify
whether or not we are dealing with an homogeneous Poisson process. For
this purpose, we test the uniformity of the distribution of the time arrivals
Tl"'" Tn given NT = n (see Theorem 4.4).
The test uses pseudo-distances between the empirical distribution
1
E
n
Vn = ;; C(Ti} (13.37)
i=l

and the theoretical distribution U, which is here the uniform distribution


over [0, T].
Typical pseudo-distances are
Dn = Vn sup Ivn ( -00, x] - v( -00, xli, (13.38)

and
Wn = n i: :r:eIR

(vn(-oo, x] - v( -00, x])2 dv(x), (13.39)

Q~k) = n Lk [
Vn
(i=.!T iT]
k ' k . 1- V.
(i=.!T iT]] 2
k , k , (13.40)
;=1 v (yT, tTl
266 Lesson 13

where k is fixed.
It can be proved that if n tends to infinity, then

Dn ~ K , W.n ~ W' n
Q(k) ~ Q(k) ,

where K has the so-called Kolmogorov distribution, W the so-called


Von Mises distribution, and Q(k) the X 2(k - 1). The associated tests
have respective critical regions

{Dn > cal (Kolmogorov test),

{Wn > c~} (Von Mises test),


{Q~k) > c~} (x2-test),
where

P(K > cal = P(W > c~) = p(Q(k) > c~) = a, 0<a < 1.
The choice between these tests is difficult. Some practical considerations
seem to prefer the K. and V. M. tests but the x2-test is easy to compute.
Remark.
This method for testing the Poisson character of a point process is com-
monly used in practice. Note however that the "uniformly property", say
U, does not characterize Poisson processes! For example Cox Processes (see
Section 4.5) verify U. Thus from a theoretical point of view, we only test
Ho : U against Hl: not U.
Comparing two Poisson processes.
Consider two independent Poisson processes with intensities A and N
and time arrivals 11 and TJ respectively. We wish to test Ho: A = N given
data Tnl and T~2.
Since 2ATnl '" x 2(2nI) and 2A'Tn2 '" X2(2n2), the random variable
(TnJnl)/(Tn2/n2) follows the Fisher distribution F(2nl,2n2)' provided
A =A'.
From this property, we deduce the critical region

n2Tnl
{- T. > fa/2 } U{n2Tnl
-T.:$ h-a/2 } (13.41)
nl n2 nl n2

with P(F > ff3) = p, 0 < p < 1, and F", F(2nb 2n2).
Note that, since a Poisson process has independent increments, this
test may be utilized for the verification of homogeneity of a Poisson process
observed over two disjoint intervals.
Statistics for Poisson Processes 267

A typical example should be the intensity of accidents at a crossroad


before and after some works intending to improve the road safety.
Estimation in Cox processes.
Recall that a Cox process (Nt, t ~ 0) is a nonhomogeneous Poisson
process with stochastic intensity (A(t), t ~ 0). We consider the simple case
where
A(t) = k(t)Z, t ~ 0, (13.42)
where k(t) is a known deterministic function and Z an exponential random
variable with unknown mean O.
If (Nt, 0 ~ t ~ T) is observed, it may be established that the likelihood
is given by

)-NT+I
L(O) = Or(NT + 1) (
0+ 10
T
k(s)ds J1
NT
k(Tj), (13.43)

where TI, ... , TNT denotes the time arrival. Note that L(O) is not defined
if NT = o.
Now NT is a sufficient statistic for 0 and the MLE is
" 1 fT
0= NT 10 k(s)ds. (13.44)

13.7 Exercises
13.l. Show that (x, S2) is sufficient in the Gaussian model ( Hint: use the
factorization theorem).
13.2. (i) Show that (13.12) is valid under some regularity conditions (Hint:
show that Cov(Tn, 8Iogf(X,O)/80) = 1 and use Schwarz inequality).
(ii) Find the models for which (13.12) is an equality.
13.3. Construct the statistical model associated with observations X I, ... , Xn
of i.i.d. random variables with uniform distribution on [0,0], 0 > O. Find a
sufficient statistic and determine the MLE of O. Compute its variance and
explain why (13.12) is not valid.
13.4. In Exercise 13.2, construct an optimal test for testing 0 = 1/2 against
0> 1/2.
13.5. Prove Neyman-Pearson lemma.
13.6. Prove Theorem 13.5.
268 Lesson 13

13.7. Show that Xn = (n - 1)/Tn is an optimal unbiased estimator of the


intensity A of an homogeneous Poisson process. .
13.8. Verify that 2ATn ...... x2(2n).
13.9. (i) Prove (13.27).
(ii) Prove (13.28) (Hint: use the strong laws of large numbers).
13.10. Consider the problem oftesting A = AO against A =1= AO when NT is
observed. Construct a test based on the likelihood ratio. Compute its level
and describe its power function. Study its asymptotic behavior as T - 00.
13.11. Show that the test defined by (13.36) is optimal.

13.12*. Prove that Q~k) ~ Q(k).

13.13. Let X 1, ... ,Xn be n independent random variables with distribution


N(JJ, 0- 2 ), 0- > o.
(i) Show that Q; = E?:1(Xi - JJ)2/0- 2 has the x2(n) distribution with
density
2- n / 2
fn(Y) = r (j) e-Y/2yn/2-11I14(Y)'

10
where r(a) = 00 e-:C x u- 1 dx.
=
Show that E( Q;) n and V ar( Q;) = 2n.

r. . (n -
(ii) Show that

n:a~ = t
0=1
(Xi ~ Xn x2 1),

where Xn = E~=1 Xi/no


(iii) If X ...... N(O, 1) and Y ...... x2 (n) are independent, then Tn =
y'nX/VY has the so-called Student distribution with n degrees offreedom.
Show that the density of Tn is

r(nt1) ( x 2 )-(n+1)/2
gn(x) = r-=.... In\ 1+ ~ , x E JR.

=
Show that E(Tn) 0 and Var(Tn) =
n/(n - 2), n > 2. Find the limit of
gn(x) as n tends to infinity.
(iv) Prove that Xn and S~ are independent and find the distribution of
vn=T(Xn - JJ)/Sn.
Statistics for Poisson Processes 269

(v) Show that if Y '" x2 (nt) and Z "" x2(n2) are independent then
F = ~ ~ has the so-called Fisher distribution with (nl' n2) degrees of
freedom. Prove that the density of F is

_ r (~) nt/2 n2/2 x n1 / 2- I


h n1 ,n2(X) -r (T) r (~) n i n 2 (ni + n2 x )(n 1+n 2 )/2 1(O,OO)(x).
Show that
E(F)=~ n2 >2
n2 - 2'
and
Var(F) = 2n~(nl + n2 - 2) n>4.
nl(n2 - 2)2(n2 -4)'
Lesson 14

Statistics of Discrete-Time
Stationary Processes

In this Lesson, we distinguish between nonparametric methods which ap-


pear in the general case and parametric methods which are used in ARMA
models.

14.1 Stationarization
The first step in the statistical analysis of an observed stochastic process is
to extract the possible trend and seasonality and eliminate them in order
to obtain a stationary process.
Detecting trend and seasonality.

(1) Let (yt, t E 7l)be an observed process admitting the decomposition

yt = mt + Xt , tEll, (14.1)

where mt is a "slowly varying" deterministic function (the "trend compo-


nent") and (Xt ) a stationary zero mean process.
Let us define
n

S = L: l{Yt- Yt_l>O}, (14.2)


t=2

then S is large if (mt) is increasing and small if (mt) is decreasing. If (mt)


is constant then S does not differ significantly from (n - 1)/2. Thus Scan
be used to detect a monotonic trend.

271
272 Lesson Lf.

(2) Another model with trend is the ARIMA process (10.43). In that case,
the trend is random and is detected by considering the sample correlation
n h - -
A Et':-l (Yt - Yn)(Yt+h - Yn ) h = 1, .. . ,n -1, (14.3)
Ph = E~=l (Yt - Yn)2 '

where Yn = E~=l Ytln.


If (Yt) is stationary, then Ph ~ 0 as n -+ 00. If a trend exists, then
IPh I is not small for h large (for the behavior of Ph in this case, see Exercise
9.8).
(3) We now study the detection of seasonality. For convenience, we consider
a process of the form
p
v
Lt
~
= L..Jaje it>.·J + X t, t E 'lh, (14.4)
j=l

where aj E ~*, Aj E (-11",11"), i = 1, ... ,p, and (Xt ) is a stationary process.


If A¢ P1, ... , Ap}, then

1 1
..j211"n ~ ajei(>.;->.) ei~(>';->') - 1
n p
..j211"n L ei>.tYt
t=l
= J=l e'(>';->') -1
1 n
_ _ ~ X -i>.t
+ ..j211"n L..J t e == An + Bn (14.5)
t=l '

where liffin-+oo An = 0 and where Bn is bounded in L2-norm.


On the contrary, if A = Aj for some j, we have

_~
1 n
Lt =
L..Je -i).;tv
'F-" t=l
v.G1I"n
# -a·
211" J
+ Cn, (14.6)

where en
is bounded in L2-norm.
Now consider the periodogram associated with Y 1 , .•• , Yn :

In(A) = -1-ltei >'tYtI2. (14.7)


..j211"n t=l

It is then bounded in L 2-norm if A ¢ {A1' ... , Ap} and tends to infinity in


L 2-norm otherwise.
Thus the magnitude of (In(A), -11" $ A $ 11") allows to detect the hidden
frequencies of (Yt).
Statistics of Discrete-Time Stationary Processes 273

In order to test stationarity, we now deal with the special model

y;t = aexp ( 2-;-t


.2k1r )
+ Ct, t = 1, .. . ,n, (14.8)

where k is a known integer such that 0 < kin < 1/2, a is unknown and
(ct) is a white noise with a known variance 0- 2 •
We wish to test H 0 : a = 0 using the statistic

T. - 411"
n - 0- 2 m
L (2k1l")
n . (14.9)

Now it is easy to prove that the zero mean Gaussian vector

( t; ct (21rkt) t; ct (21rkt))
n
cos --;- ,
n •
sm --;-

has a covariance matrix given by

r=[ E~i cos 2(2dt)


E~=i s~n2 (2:kt)
0- 2
o n
0- 2 ]
and it follows that Tn '" X2 (2) if a O. Hence the critical region is
{Tn> qa} where
p(Q(2) > qa) = a with Q(2) '" X2(2).
If 0- 2 is unknown, then Tn is replaced by

T. -
n -
411"
0- 2
L (2k1l")
m n '

where u2 = E~=i ~2 In.


Eliminating the trend and seasonality.
Let (yt, tEll) be an observed process admitting the general decompo-
sition
yt =
mt + St + Xt, tEll, (14.10)
where (mt) is the trend component, (St) a periodic function with known
period T (the "seasonal component"), and (Xt ) a stationary zero mean
process.
If mt ans St have a simple form, then their estimation may be performed
using least square method. Suppose for instance that

mt = bo + bit + ... + bptP (14.11)


274 Lesson 14

and
St = CISlt + ... + C'TS'Tt, (14.12)
where
Slet = l{t=le(mod'T)}' k = 1, ... , T. (14.13)
Since E;=1 Slet = 1, it is necessary to introduce an additional assump-
tion which should ensure identifiability of the model. A natural condition
IS
'T
L = 0, Cle (14.14)
1e=1

which expresses the compensation of seasonal effects over a period.


Now, given the data Yb ... , Yn , the least square estimator, (mt, St), of
(mt, St) is obtained by minimizing
n
L (yt - bo - ... - bptP - ZI Slt - ... - C'T S'Tt)2
t=1

under the constraint (14.14).


Then the trend and seasonality may be eliminated by constructing the
artificial data
Xt = yt - mt - St, t 1, .. . ,n. = (14.15)
The above technique of elimination suffers the drawback of perturbating
the data.
A more flexible method of elimination is differencing. Consider the first
difference operator 'V =
I - B, where B is the backward shift operator
introduced in Lesson 10.
If yt has the decomposition

yt = mt + Xt, tEll,

where mt has the polynomial form (14.2), then we have

'VPyt = p!bp + 'VPX t , t EZ (14.16)

and consequently ('VPyt) is a stationary process with mean p!bp •


Differencing may also be used to eliminate seasonality in the model
(14.10), since by applying 1- B'T, one obtains the nonseasonal model

(I - B'T)yt = (mt - mt-'T) + (Xt - X'-'T). (14.17)

Finally differencing is useful when the trend is random and especially


in ARIMA model.
Statistics of Discrete-Time Stationary Processes 275

14.2 Nonparametric Estimation in Stationary


Processes
1) Estimation of Moments.
let us consider a real weakly stationary process (Xt, t E 7Z) observed at
instants 1, ... , n. A natural estimation of the mean J.I. of (Xt ) is the sample
mean
_ 1 n
Xn = - L::Xt • (14.18)
n t=l
The asymptotic properties of this unbiased estimator are given in The-
orem 9.4 and 9.5. Note that (9.26) allows to construct confidence inter-
vals and tests for J.I. provided to have at one's disposal, an estimator of
Lt1't = 27rf(0). This problem is studied below.
If (Xt ) is zero-mean with unknown auto covariance , one may define the
sample autocovariance by setting

1 n-t
'Yt n _ t L:: X.X.+t 0~t ~ n- 1
.=1
o t 2: n. (14.19)

'Ytis unbiased for every t ~ n - 1. Its convergence is given by the following


theorem.

Theorem 14.1 Let (Xt ) be a zero mean stationary process such that E(Xi) <
00 and E(X.Xt+.X.+"X.+.'+t) does not depend on s. If

E (XoXtX,X,+t) ~ 1'1, as s -+ 00,

then we have
L~
t2:0
A

1't ~ 1't, (14.20)


as n -+ 00.

Proof. Clear since the process

y(t) = X'+t X , -1't, z E 7Z



satissfies assumptions in Theorem 9.4. ¢
276 Lesson 14
If (Xt ) has an unknown mean, then the sample auto covariance is defined
by
1 n-l
n _ 1 ~)X. - Xn)(X.+t - Xn),
>I<
'Yt 0:::; t :::; n - 1
0, t> n (14.21)

and its convergence is obtained by combining Theorem 9.4 and Theorem


14.1 (Exercise 14.4).
Asymptotic distributions of ('Y; , t ~ 0) are rather complicated and there-
fore difficult to utilize.
2) Estimation of spectral density.
The periodogram (see (9.8) and (14.7)) is a natural estimator of the
spectral density since it may be written as
1
In(A) = 2'11" L rt cos At, A E [-'II", '11"], (14.22)
tEZ

where rt= (n - t).yt/n, t E 'll is the so called modified sample autocovari-


ance.
Using (9.9) we have seen that the bias f(A) - E(In(A)) of In(A) tends
to zero as n approaches infinity.
Concerning the variance of In(A), we consider the particular case (Xt ) =
(Ct). Then, similarly as in Section 14.1, it may be established that
2
In(O) = ~Q(l)
2'11" '
where Q(1) ,..- X2 (1). Consequently

V(In(O))
2)2 '
= 2 (;'11"
which shows that In is not consistent!
More generally, if (Xt ) satisfies some regularity conditions, one can show
that

V(In(A)) ~ f2(A) A ¢ {-'II",O,'II"},


~ 2f2(A) AE{'II",O,'II"}, (14.23)

and
Cov (In (A), In (A')) ~ 0 if A oj; ±A'. (14.24)
Statistics of Discrete-Time Stationary Processes 277

The non consistency of In is not surprising since, in fact, this estima-


tor only uses n observations for estimating the n dimensional parameter
('Yo, 'Y1,···, 'Yn-1).
Thus in order to obtain a consistent estimator, it is necessary to mod-
ify the periodogram. The primary idea is to estimate the kn dimensional
parameter ('YO,'Y1, ... ,'Yk,,-1) where kn - 00 and n/kn - 00.
More generally, one may consider a weight function W : [-1,1] 1----+
[-1,1] symmetric and such that W(O) = 1 and we define the associated
estimator of I by

In(>') = 211"
1
L
n-1
W
( t )
kn 1't cos>'t, >. E [-11",11"]. (14.25)
t=-(n-1)

Typical examples of weight functions are


(i) W = 1[-1,1),
(ii) W(x) = l-Ixi. Ixi ~ 1 (Bartlett),
(iii) W(x) = 1- 2a+2acos1l"x, Ixi ~ 1 (Blackman-Turkey), and
(iv) W(x) = 1-6x 2 +6IxI 3, Ixl ~ 1/2 and W(x) = 2(1-lxI)3, Ixl > 1/2
(Parzen).
The rate ofthe convergence ofsuch an estimatoris given by the following
theorem.

Theorem 14.2 II I is twice continuously differentiable, EteiZ It 2'Yt I < 00,

L IE(XoXhXrX6) - ('Yh'Yr-6 + 'Yr'Yh-6 + 'Y6'Yh-6)1 < 00,


h,r,8eiZ
and the estimator In is defined by (14.25) where W is continuous over
=
[-1,1] and satisfies liIIlu ..... o(l- W(u))/u 2 a> 0 and where k n [cn 1 / 2 ], =
then

n 4 / 5 E (fn(>') - 1(>.))2 --+ f3 > 0, 0< 1>'1 < 11", (14.26)

11
where
f3 = c/2(>.)
-1
W2( u)du +
c
~ [1"(>.)]2 a 2.
We only give some indications about the proof First since I is twice
continuously differentiable, it may be shown that

1(>') - E{fn(>')) = 0 (k~) (14.27)


278 Lesson 14

by studying the rest ofthe Fourier series of I.


On the other hand, noting that

E('Yf - 1't)2 = 0 (~) , t ~0


and that the number of dominating terms in the variance of In is approxi-
mately kn, it may be proved that

V{fn(>'» = 0 (~ ) . (14.28)

From (14.27) and (14.28), it follows that

E{fn(>') - 1(>'» = 0 (k; ) + 0 (k~)


and the optimal choice k n ~ n 1 / 5 gives the rate n- 4 / 5 • o
Remark.
A global measure of the quality of In as an estimator of I is the mean

i:
integrated square error (MISE) defined by

In = E (fn(>') - 1(>.»2 d>' = Eil/n - 111 2 , (14.29)

where the norm II . Ilis taken in the space L2 ([-11", 11"]).


If assumptions of the Theorem 14.2 hold, then

Eil/n - 1112 = O(n- 4 / 5 ). (14.30)


It is interesting to note that the rate obtained in (14.26) and (14.30) is
optimal even if (Xt ) is a white noise. This type of phenomenon is typical
for infinite dimensional parameters when the optimal rate is, in general,
n- 1 for finite dimensional parameters.
3) Nonparametric estimation in strictly stationary processes.
If the observed process is strictly stationary, it is natural to try to esti-
mate its finite dimensional distributions.
Suppose that (Xt ), tEll) is a real strictly stationary process and that
the density, say g, of Xo does exist.
An estimator of 9 given the data Xl, X 2, ... , Xn is defined by
1
L
n
g~(x) =h 1[~_h .. /2,~+h .. /21(Xt), X E nt, (14.31)
n n t=l
Statistics of Discrete-Time Stationary Processes 279

where h n is a bandwidth parameter.


A more general estimator is given by

gn(X) = _1_ ~ K
nh n L...J
(x -h Xt )
'
x E JR, (14.32)
t=l n

where the kernel K : JR -+ JR is a density over JR. g~ is a kernel estimator


corresponding to K = 1[-1/2,1/2].
Under some regularity conditions, it may be proved that

E(gn(x) - g(x))2 = O(n- 4 / 5 ) (14.33)

and that
I: E(gn(x) - g(x))2dx = O(n- 4 / 5 ) (14.34)

provided h n ~ n- l / 5 .
This estimator is more accurate than the classical histogram (see Exer-
cise 14.5) which only reaches the rate n- 2 / 3 •
Results of the same kind are obtained when estimating the density of
(Xl' ... ' Xk), k ~ 2.
Another important problem is autoregression estimation. Suppose that
(Xt ) is in addition a Markov process with autoregression

r(x) = E(Xt+1IXt = x), x E JR. (14.35)

Then the kernel autoregression estimator is defined by

E;;;ll Xt+l K ("'1.;')


rn(x) = "n K(U;_X') x E JR, (14.36)
L..,.,t=l h ..

where K is strictly positive.


The local and global rates of this estimator are again n- 4 / 5 •
Application to prediction.
Suppose that we wish to predict X n +1 given the data Xl' ... ' X n .
Then rn generates the statistical predictor:

E~;ll Xt+lK (XDh~X')


rn(Xn) = "n
L..,.,t=l
K (Xa-X,)
h ..
Xn E JR. (14.37)
280 Lesson 14
The normal kernel K(x) = (211")-l e-x 2/ 2, x E IR and

In ]1/2
hn = [; L:(Xt - Xn)2 n- 1/5
t=l

are commonly used in practice.

14.3 Statistics of ARM A Processes


We now deal with an observed ARMA(p, q) process (Xt ) (see Lesson 10)
obtained through a possible stationarization (see Section 14.1) and a pre-
liminary estimation of the mean (see Section 14.2).
1) Identification.
The first step in the analysis of ARM A process is to identify (p, q) or
more precisely to construct an estimator (p, q) of (p, q).
First we summarize characteristic properties of AR and MA processes
in the following table

Pk rk
MA(q) Pk =0, k > q Irkl = O(e O!Al), a > 0
AR(p) IPkl== O(e O!Al), a> 0 rk = 0, k > p
where (Pk) denotes autocorrelation and (rk) partial autocorrelation (see
Lesson 10).
Therefore if (Pk) and (rk) are suitable estimators of (Pk) and (rk) re-
spectively, then we obtain the following empirical criterion:
If IPkl is small for k > q, then the model is a MA(q).
If Irkl is small for k > p, then the model is an AR(p).
If IPk I and Irk I decrease rather slowly, then the model is mixed.

Now in order to construct our estimators, we need the following result.

Lemma 14.1 Let (Xt, t E 7Z) be a zero mean stationary process with au-
tocorrelation (Pk) and partial autocorrelation (rk). Consider the linear re-
gression of X t with respect to X t - 1 , ... , X t - k :

k
X; = L: aikXt-i. (14.38)
i=l
Statistics of Discrete-Time Stationary Processes 281

If aa, ... , au are unique, then we have


k
Pi = L:a;kP;-i, i = 1, .. . ,k (14.39)
;=1

and
rk = akk. (14.40)

Proof (sketch). The definition of linear regression is given in Section 9.1.


Now (14.39) is a straightforward consequence of (9.5) (with some change of
notation). Finally the proof of (14.40) is similar to that of Theorem 10.2.
<>
We are now in a position to define Pk and rk. Set

Pk = (~XtXtH) f (t x ;), k ~1 (14.41)


t=1 t=1
and define estimators lzik of the regression coefficients by
k
Pk=L:lzikh-i, i=I, ... ,k, (14.42)
;=1

then
rk = lzkk. (14.43)
It may be checked that Pk and rk are consistent estimators but the
above criterion remains empirical.
We now consider the general case.
Akaike's criterion.
It is based on the minimization of the quatity

AIC (p, q) = log U;,q + (p + q) log nfn, (14.44)


where n is the number of observations and U;,q the MLE of 0"2 computed
as if (Xt ) should be a Gaussian ARMA process.
Thus
(Pn,lin) = argminAIC(p,q). (14.45)
(p,q)

The Akaike's criterion may be interpreted as follows: one chooses the


ARMA(p, q) model for which the prediction error 0"2 is minimum. Con-
cerning the consistency, we have the following
282 Lesson 14

Theorem 14.3
(fin, iin) ~ (p, q) as n -+ 00. (14.46)
Proof is omitted.
Note that (14.46) means that, with probability one, there exists a ran-
=
dom integer N such that fin p and iin q for every n ~ N.=
2) Estimation.
We now suppose that (Xt ) is an ARMA(p, q) where (p, q) is known. The
problem is to estimate the unknown parameter
TJ = (¢l, ... , ¢p'(h, ... , (Jq, (1'2),
where (1'2 is the variance of Ct and
p q
Xt - L: ¢j Xt-j = Ct - L: (JjCt_j (14.47)
j=l j=l

(see (10.37».
If (Xt ) is Gaussian, the MLE provides a good estimator of TJ since it
may be checked that it is asymptotically efficient. However its implemen-
tation is tricky because the likelihood is complicated.
In the particular case of a MA(q), we have

+ L: (JjCt_j,
IJ
Xt = Ct t E 'fl. (14.48)
j=l

Thus
(Xl," .,Xn) = A(cl-q, .. . ,cn), (14.49)
where A is a linear mapping. This allows us to write explicitly the likelihood
(Exercise 14.7).
In the general case the problem can be simplified by approximating (Xt )
to a MA(Q).
Now if (Xt ) is in an AR(p), then the conditional MLE provides a simple
and interesting alternative method.
Recall that p

Xt = L: 'lrjXt_j + Ct, t E 'fl (14.50)


j=l

and consider the random vector


(X l - p , •• . ,XQ,cl, ... ,cn)
Statistics of Discrete-Time Stationary Processes 283

with density

!(X1-p, ... , XO) (o. J2;i) -n exp (- 2~2 t u~)


t=l
, (14.51)

(X1-p, ... , Xo, U1, ... , un) E IRn+p, where! denotes the density of
(X 1 - p , ••• , Xo). Using the change of variables
p
Ut = Xt - L 7rj X t_j, t = 1, .. . ,n,
j=l
we deduce from (14.51) that the conditional density of (Xl, ... , Xn) given
(X 1 - p , ••• , Xo) is

g(X1, ... , XnIX1-p, ... , xo) = (O'v2';;)-n exp (- 2!2 tt=l


z;), (14.52)

where Zt = Xt - E:=l 7rj X t_j·

For convenience, we now suppose that the data are Xl- p , .•• , X o, Xl ... ·, X n .
The conditional likelihood is then g(X1"'" X n IX1- p, ... , Xo) and the con-
ditional MLE is the solution of the system
1 PIn
- L XtXt-k - L L Xt-jXt-k = 0,
n
7rj - k=I, ... ,p,
n t=l j=l n t=l

0'
2 ~ (Xl -
= -1 L..J 7r 1 X t - 1 - ... - 7r pX t _p)2 , (14.53)
n
t=1
hence the estimator fJn = (11'1, ... , lrp, u2 )n.
Note that these equations may be obtained fro~ the Yule-Walker equa-
tions (Theorem 10.1) with replacement ofthe autocovariances by (modified)
sample autovariances. (14.53) may be used in the non-Gaussian case and
fJn is consistent.
3) Diagnostic checking.
The operations performed in 1) and 2) specify completely the model.
In order to verify if the model fits to the data, we define the residuals
€t by
~(B)Xt = 8(B)€t, (14.54)
where ~(B) = 1- E~=l ¢jBj and 8(B) = 1- EJ=l DjBj.
284 Lesson 14

Independence of ii, ... ,in is tested by using the portmanteau statistic


K
Qn = n I:p~(i), (14.55)
1:=1

where (pr.(i), k ~ 1) is the residuals sample autocorrelation.


If K > p + q it can be proved that

Qn ~ Q(K-p-q) ,..., X2(K - P - q), (14.56)

hence the critical region {Qn > qa}, where


P (Q(K-P-q) > qa) = ex, O<ex<1.

If the model is rejected, then the identification must be corrected.

14.4 Exercises
14.1. Consider S defined by (14.2). Compute E(S), Var(S) and give a
bound for P(IS - E(S)I > TJ), TJ> 0 in the following cases.
(i) (yt) = (Ct), where (Ct) are Li.d. random variables with common
uniform distribution over [-1/2,1/2].
(ii) yt = at + ct, t E '/1, where a E JR*.
14.2. Consider Tn defined by (14.9). Show that Tn ,..., X2(2) when a = O.
14.3. Construct a confidence interval of asymptotic confidence level 1 - ex
for the mean of a stationary process (1(0) is supposed to be known).
14.4 Prove the consistency of 1; defined by (14.21).
14.5 Let (Ct, t E '/1,) be a strictly stationary real process. Suppose that the
density , g, of co does exist and define the histogram estimator by

Yn(x) =: t
t=l
lU/1:",(i+1)/r.,,)(Ct), x E [j/k n, (j + 1)/kn), j E '/1,.

Show that conditions k n --10 00 and kn/n --10 0 entail


L2
Yn(x) -+ g(x), x E JR

provided that 9 is continuous.


14.6. Give a detailed proof of Lemma 14.1.
Statistics of Discrete-Time Stationary Processes 285

14.7. Consider the process

Xt = Ct + OCt-1, t E '/l"
where 101 < 1 and where (ct) is a Gaussian white noise. Determine the
likelihood and compute the MLE (0, &2).
14.8. Consider the process

X t = pXt - 1 + ct, t E '/l"

where Ipi < 1 and where (ct) is a Gaussian white noise. Given the data
X o, Xl"'" X n , compute the conditional MLE (p, &2).
14.9. Let (Xt, t E '/l,) be a zero mean stationary process with spectral
density
1
=
f(>t.) 211" (1 + 20 cos >t. + ( 2 ), -11" ~ >t. ~ 11",
where 101 < 1.
(i) Determine the auto covariance (rt) of (Xt ) and verify that ro > 211'11.
(ii) Define
Z2(a) = aX1 + (1- a)X2' a E JR.
Find a number a which minimizes Var(Z2(a)).
(iii) Define
1- 2a
Zn(a) = a(XI + X 2) + --2-
n-
(X2 + ... + Xn-d, a E JR, n ~ 3.

=
Determine a number a an which minimizes Var (Zn(a)). Find a condition
which ensures that an lin. =
»
(iv) Compute limn_co nVar(Zn(an = t. Compare t and liffin_co nVar(Xn).
Conclusion?
14.10. Define

Xt = Yt - 0.4Yt-1 and Wt = Yt - 2.5Yt-1,

where (Yt) is zero mean stationary.


(i) Express the auto covariance of (Xt ) and (Wt ) in terms of autocovari-
ance of (Yt).
(ii) Show that (Xt ) and (Wt ) have the same autocorrelation.
(iii) Suppose that Yl, ... , Yn are observed. Construct an estimator of
the auto covariance of (Yt) and deduce autocovariance's estimators for (Xt )
and (Wt ).
Lesson 15

Statistics of Diffusion
Processes

This Lesson deals with statistics of continuous time processes, especially


diffusion processes. Nonparametric and parametric methods are considered.

15.1 Nonparametric Estimation in Continu-


ous Time Processes
1) Estimation of moments and spectral density
Let us consider the weakly stationary process (Xt, t E JR) observed over
the interval [0,11. Natural estimators of moments and spectral density are
the sample mean
- 1 fT
XT = T 10 Xtdt, (15.1)

the sample autocovariance

"f3*
1
-T
IT-3 (X t - XT )(Xt+3 - XT )dt, s<t
- s 0
0, s ~ t, (15.2)

and the periodogram

h(,x) = 2~T11T ei >.tXtdtI 2 , ,x E JR, (15.3)

287
288 Lesson 15

provided that the above L 2-integrals exist. If, for instance, (Xt ) has an
auto covariance continuous at t = 0, then these estimators are well defined
(see Theorem 9.10).
The consistency of XT is given by Theorem 9.11 and concerning 'Y: an
adaptation of Theorem 14.1 gives the convergence (Exercise 15.1). Similarly
as in the discrete case, the periodogram is asymptotically unbiased but not
consistent and must be modified using weight functions.
We do not develop these properties since in practice it is difficult to
observe a process in continuous time. Classical schemes for observing a
continuous time process are as follows.
a) Observable coordinates.
Owing to inertia of the measurement's device, the observations take the

J
form
Zt = X 6 ¢(s, t)ds, °~ t ~ T, (15.4)

where ¢ is some deterministic function; for example,


1
¢(s, t) = 2c l[t-e,t+ej(s), c > 0. (15.5)

The quality of estimators based on these observable coordinates is poor.


b) Deterministic sampling.
Suppose that (Xt ) is observed at intervals 0,20, ... , no, where °is a
fixed positive number. Then a natural estimator of the mean is
1
= - L: Xi6
n
X~6) (15.6)
n i=l

and its asymptotic behaviour is governed by the properties of the discrete


stationary process (Xi6, i E Z).
Concerning the auto covariance ('Yt) of (Xt ), a difficulty arises. If (Xt )
is zero mean, an estimator such that

°~
1 n-j
• (6)
'Yj n_ j L: Xi6 X (i+j)6,
i=l
j ~n- 1
= 0, j~n (15.7)

provides some information about ('Yj6,j = 0,1,2 ....) but not about 'Yt for
t ¢ {O, 0, 2o, ...}.
Statistics of Diffusion Processes 289

Clearly the same drawback appears in spectral density estimation: it is


possible to estimate the spectral density t(fJ) of (XifJ), but not the spectral
density of (Xt ) because t is, in general, not determined by t(fJ). This
phenomenon is called aliasing.
c) Random sampling.
We now suppose that (Xt ) is observed at times T i , T2, ... , Tn which are
time arrivals of a Poisson process (Nt, t ~ 0) independent of (Xt ).
This Poisson sampling is superior to sampling at regularly spaced in-
stants as shown by the following lemma.
Lemma 15.1 Let (Xt,t E JR) and (yt,t E JR) be stochastic processes con-
tinuous in probability and let (Nt, t ~ 0) be a Poisson process with time
arrivals (Tn,n ~ 1) and independent of (Xt ) and (yt). If(XT",n ~ 1) and
(YT", n ~ 1) have the same distribution, then (Xt ) and (yt) have the same
distribution.

For a proof we refer to Karr (1986).


As a consequence of Lemma 15.1, we see that the spectral density of
(XT,,) determines the spectral of (Xt): aliasing disappears!
Now suppose that (XT,,) is observed over [0,11 and define an estimator
of the mean m by
-(N) 1 ~
XT =
>'T L..J l(T,,~T) XT", (15.8)
n=i
where the intensity>. of (Nt) is supposed to be known.
We then have the following results.

Theorem 15.1 (i) X~N) is an unbiased estimator of 1'.


(ii) If the autocovariance (-yt) of (Xt ) is locally integrable, then

V ar (X-(N»)
T +m2
= 'Yo >'T + .!.1T
T (1-~)
T 'Y$ d s. (15.9)
-T
(iii) If ('Y$) is integrable over JR, then

TVar(X¥""») -+ 'Yo+ m2
>.
+100
-00 'Y.ds=~2. (15.10)

(iv) If('Y$) is integrable over JR, XT ~ m, and

VT(XT - m) -E... N --- IV (0, [: 'Y$dS) , (15.11)


290 Lesson 15

then
VT(X~N) - m) ~ N '" N (0, E2) . (15.12)

The proof of Theorem 15.1 is left as Exercise 15.2.


Note that the proof of Theorem 9.11 and the dominated convergence
theorem entail
T Var(XT) -+ [ : -y,ds, (15.13)

which should be compared with (15.10).


Concerning (-yd, we assume for simplicity that J1. = 0 and consider the
estimator

4N)(t)
IT _- \2Th
1\
1
T
Loo
. .
1(T;<TT'<T) X T,XT.K (t - (ToI- }
- , J_ • J h
To))
' t > 0,
1,}=1 T
(15.14)
where the kernel K is a density over JR and hT is a banwidth parameter.
This estimator has good asymptotic properties (see Karr (1986)).
Finally the spectral density

f(A) =-11
7r 0
00
-Yt cos Atdt, A E JR (15.15)

can be estimated from XT1 , ••• , XT .. by


1 n-l n-l
j~N)(A) = ----;- L LXTIoXT,,+t Wn(Tk+l - Tk) cosA(Tk+i - Tk)' A E JR,
n7r1\ l=l k=1
(15.16)
where Wn(t) = K(-yh n ), K is the Fourier transform of K (see (15.14)), and
h n is a banwidth parameter.
Details may be again found in Karr (1986).
2) Estimation and prediction for strictly stationary processes.
If (Xt ) is strictly stationary and if the density 9 of Xo exists, then it
can be estimated by

1
gT(X) = ThT
iT
0 K
(x-Xt)
---,;:;:- dt, x E JR, (15.17)

where hT is a banwidth parameter and the kernel K is a density over JR.


If in addition (Xt ) is Markovian and if

rH(X) = E(Xt+HIXt = x), x E JR (15.18)


Statistics of Diffusion Processes 291

is well defined, then its kernel estimator is given by

foT - H Xt+H K ( ~ ) dt
,
rHT(z) = f: K
(
aJ"h;' )
dt
' Z E IR (15.19)

and the associated predictor of XT+H given (Xt, 0 ~ t ~ T) is


XT+H = rH,T(XT). (15.20)
Under some regularity conditions, the rate T- 4 / 5 is reached by 9T and
rH,T (compare with (14.33».
Now if the sample functions of (Xt ) are continuous but not differen-
tiable, then their local irregularity furnishes additional information and the
parametric rate T-l is attained by 9T and rH,T. In particular, this phe-
nomenon occurs if (Xt ) is a diffusion process satisfying mild conditions.
Details are given in Exercise 15.4.

15.2 Statistics of Wiener Processes


Let (Wt ) be a Wiener process observed over the time interval [0, T]. The
unknown parameter is (12.

r
Let us define an estimator associated with (Wt, 0 ~ t ~ T) by

l~~~f~ f; [W (~:) -W(Ck ~n1)T)


2"
·2
(1T

liminfZn · (15.21)
n-+oo

In order to study its behaviour, we need the following lemma.

Lemma 15.2 Let Xl' ... ' Xn be real i. i. d. random variables such that
E(Xt) < 00 and E(Xi) = 0, then

E(t, X,) = .E(Xtl+ 3.(. - 1) (E(XnJ' .


4 (15.22)

Proof. Consider the identity

(~X,r (~Xir (~Xi)' (~Xl+ fuXiXi)'


= =

= I:xt+ I: X1Xj+2I:I: x lXjXk + I:I:XiXjXkXt


i i~j i j# i~j k#
292 Lesson 15

and use the independence for obtaining (15.22). <>


We then obtaining a surprising result:
Theorem 15.2
Uf =U2 a.s. T>O. (15.23)
Proof. Define

y;2 =
k"
~
Tu2
[w (kT) _ W (k -1)kT)]2
2n 2 n '
1<k
__
< 2n.

For every k, y k2" ,... X2(1) and YIn, ... , Y2"n are independent since the
Wiener process has independent increments.
Now from Tchebychev's inequality and Lemma 15.2, it follows that for
allg>O

P(l Z n - u2 1>g) P ( I;n ?;(yl:. -1)1 >


2 2"
g
)

2" 2 ]4 8
E [ Lk=I(Yk" - 1) U
< 24ng4
u 8 a2 n + 3b2n(2n - 1)
< g4 24n (15.24)

=
where a E(yk8J and b = [E(YlJ]2 are constant.
Now (15.24) entails
L P (IZn - u 2 1 > g) < 00, g> 0
n

and applying Borel-Cantelli lemma, we obtain


Zn --+ u2 a.s. (15.25)
hence (15.23). <>
Theorem 15.2 means that the observation of Wiener process over an
arbitrarily short time interval allows to construct a perfect estimator of u 2 !
Note that this result is theoretical since it is very difficult to observe a
Wiener process over a full time interval.
Actually (15.23) may be interpreted as a convergence result since Zn
is an estimator of u 2 based on data (W(kTj2n),o ::; k ::; 2n) and which
satisfies (15.25).
Recall that here T is fixed: the "asymptotic" corresponds to accuracy
of the instrument used for observing the process. Some other asymptotics
are considered in Exercises 15.5 and 15.6.
Statistics of Diffusion Processes 293

15.3 Estimation in Diffusion Processes


This section is devoted to a brief exposition of parametric estimation in
diffusion processes. Proofs are not given because they use techniques which
are beyond the scope of this introductory book.
Consider first a diffusion process (Xt ) solution of the stochastic differ-
ential equation
dXt = J.L(O, Xt)dt + dWt , (15.26)
where (Wt ) is a standard Wiener process and where J.L satisfies the condi-
tions

IJ.L(O, y) - J.L(O, x)1 ~ kly - xl,


IJ.L(O, x)1 ~ kV1 + x2, (15.27)

P (iT J.L(O,X s )2ds < 00) = 1.


°°
and where is a real unknown parameter. We wish to estimate from the
data (Xt, ~ t ~ T).
Let C = C ([0, 1']) be the Banach space of continuous functions defined
°
on [0,1'], equipped with its norm

11/11 = sup I/(u)l, I E C ([0, 1']) .


uE[O,T]

°
Then
X=(Xt,O~t~T) and W = (Wt, ~ t ~ T)
are C-valued random functions with distributions pf and pf, respectively.
It can be proved that pl admits a density with respect to Pew and the
associated likelihood is given by

I(O;Xt,O ~ t ~ T) = exp (it J.L(O,xt) - ~ iT J.L 2(O,Xt)dt) , (15.28)

°
where dXt is defined by (15.26).
Thus can be estimated by the MLE OT. We now make the following
hypothesis
(i) (Xt ) is a strictly stationary Markov process.
(ii) For every function <f; integrable with respect to the distribution pfo
of X o, we have

lim T
t-oo
liT° <f;(Xt)dt = 1 00

-00
<f;(x)dPfO(x), a.s.
294 Lesson 15

(ergodicity).
(iii) The statistical model is identifiable:

f«(h;Xt,O ~ t ~ T) = f(02;Xt , 0 ~ t ~ T) ~ 01 = O2,


(iv) '" is twice continuously differentiable with respect to O.

We are now in a position to state the convergence theorem of OT:

Theorem 15.3 If conditions (i) - (iv) hold, then for all 00 and as T --I- 00,

" X
OT --+ 00 P8 0 a.s., (15.29)

and
../T(OT - (0) ~ N ",.N (0, Iia 1 ) , (15.30)
where
l -E (8",(OO'XO»)2
80 - 80 80

The Ornstein-Uhlenbeck process (see (12.42) and (12.46» satisfies as-


sumptions in Theorem 15.3 (see Exercise 15.7).
Small diffusions.
Consider a physical system governed by the ordinary differential equa-
tion
dXt
dt
",(8, :1:,), = (15.31)

where t is the time.


If the system suffers small random perturbations, then it is more realistic
to replace (15.31) by the stochastic differential equation

dXt = ",(8, Xt)dt + cdWt , (15.32)

where the positive number c is "small".


The statistical model is described similarly as above. In particular the
likelihood is

f(8,X e ) = exp (loT ",(0;;:) dX: -loT ",2(~'2X:) dt) ,


where xe =
(X: ,0 ~ t ~ T) is the solution of (15.32).
The associated MLE satisfies, as c --I- 0

pfo~ (1ge - 80 1> TJ) --+ 0, TJ>O


Statistics of Diffusion Processes 295

and
£-1(0£ - (0 ) ~ N '" N (0, lir/) ,

[iT
where
leo = limEeo
£_0 0
(0J.t(00,Xf))2
00 dt
1
and T is fixed.

15.4 Exercises
15.1. Prove the consistency of 1; under suitable conditions.
15.2. Prove the results in Therorem 15.1 (Hint: for (iv) compute the char-
acteristic function of VT(X!f) - J.t)).

15.3. Look at the estimators 'Y~N) and j~N) «15.14) and (15.16)) and try
to explain the rationale which leads to them (Hint: use the simple kernel
K = 1[-1/2.1/21 and recall that hT and h n are small).
15.4*. Consider a strictly stationary process (Xt, t E JR) such that the
density fu of (Xo, Xu) does exists for u =/; 0 and satisfies

f lI¢ull du < 00, (A)


J(O,oo)

where ¢u(Y, z) = fu(Y, z) - g(y)g(z), 9 denotes the density of X o, and II . II


denotes the supnorm. Furthermore, ¢u is assumed to be continuous at
(x, u) for all x E JR.
(i) Consider the estimator gT of 9 defined by (15.17) where

K(v) = .~e-1J2/2dv, v E JR
V 211"
and show that

T(1- TU) Cov [1hT K (X-Xo)


= 2 10r -,;:;:- 'h1 (x-Xu)]
T-Var(gT(x)] T K ---;;;- duo (1)

(ii) Deduce from (i) and (A) that

TVar[gT(x)] = 21T h~ K(V1)K(V2) IT (1- ;) tPu(y,z)dudydz, (2)


296 Lesson 15

=
where VI (x - y)/hT, V2 = (x - Z)/hT.
(iii) Show that

1100 tPu(y, z)du -1 (1 - ;) T


tPu(y, z)dul

~ iT IltPulloo + iT TlltPulloo
[00
du 0
u
du , y, z E R.

(iv) Show that the bound in (iii) tends to zero as T -+ 00.

( v) Prove that

T· V[UT(X)] ---? 21 00
tPu(X, x)du.

(vi) Show that if f is twice continuously differentiable and if f and f"


are bounded, then
Iu(x) - E (UT(X)) I = O(h~)
as T -+ 00.
(vii) Show that for a suitable choice of hT , we have

E(UT(X) - U(x))2 = 0 (~) .

15.5. Consider the process

Xt = uWt , t ~ 0,

where (Wt ) is a standard Wiener process and u > 0 is an unknown param-


eter.
(i) Suppose that X to , X t1 , .•. , X t .. are observed, where 0 = to < tl ... <
=
tn T and consider the estimator

~2
Un = T1 ~ (
L..J X tj - X t j _ 1 )2 •

j=l

Compute E(a-~) and V(a-~).


(ii) Set .6. n = maxl~j~n(tj - tj-I). Fix T and let n tends to infinity.
Show that

lim E (a-~ - u 2 ) 2
n-+oo
=0 if and only if lim.6. n
n-+oo
= o.
Statistics of Diffusion Processes 297

(iii) Define the estimator

u2 _
n -
.!.
n
t
j=l
(Xt; - X t ;_1)2
tj - tj-l

(a) Compute E(u;) and V(u;).


(b) Show that
E (&; - 00 2 ) 2 ~ E (u; _ 002) 2 •
Find a condition which ensures the equality.
( c) Prove that for fixed T,

lim E (u; -
n-+oo
00 2 )2 = o.

15.6. (Continuation of 15.5). Suppose that data are X T1 , ... , XT" where
T1 , ••• , Tn are time arrivals of a Poisson process with known intensity A and
independent of (Xt ). Construct an estimator of 00 2 based on X Tu ... , XT"
and prove its consistency as n tends to infinity.
15.7. Let (Xt, t E JR) be an Ornstein-Uhlenbeck process «12.42) and
(12.46» observed over the interval [0,11.
(i) Compute the likelihood (15.28).
(ii) Compute the MLE OT.
(iii) Find a confidence region for () by using (15.30).
15.8. Consider the process

X t =tY +cWt , o ~ t ~ 1,
where Y '" N(J-t, £2) and (Wt, 0 ~ t ~ 1) is a standard Wiener process
independent of Y.
(i) Determine an unbiased estimator its of J-t based on (Xt, 0 ~ t ~ 1).
(ii) Show that its is consistent as c --+ o.
15.9. Let (Xt ) be an Ornstein-Uhlenbeck process observed at times 0,1, ... , n.
(i) Determine a conditional MLE estimator On of () based on (Xo, Xl, ... , X n).
(ii) Show the consistency in probability of On as n --+ 00.
15.10*. Consider the stochastic process (Xt, t ~ 0) defined by

dXt = «(}Xt - Xl)dt + ooXtdWt (B)

with initial condition Xo = Xo.


298 Lesson 15

(i) Show that the solution of (B) is

Xo exp [(0 - u 2/2)t + uWt ]


Xt = 1 + Xo fot exp [(0 - u 2/2)s + uW3 ] ds

(ii) Consider the estimator OT of 0 defined by

. llT
OT= T
o
Xtdt +
lT - .
0
dXt
Xt

Show that OT is an unbiased estimator and that

Pe (lOT - 01> '7) -+ 0, '7 > 0


as T --+ 00.
Appendix A

Measure and Integration

A.1 Extension of measures


Let 0 be a set. A collection A of subsets of 0 is called a u-field if
(i) 0 E A,
(ii) If A E A then AC E A,
(iii) If (An, n ~ 1) is a sequence of elements of A, then Un~l An E A.

The pair (O,A) is called a measurable space. When 0 = IRd , d ~ 1,


A is constructed as follows. First observe that if C ~ P(O) (the power set
of 0), then the class of u-fields containing C is not empty since obviously
P(O) is a u-field containing C. On the other hand, it is easy to check that
the intersection (in P(O» of all u-fields containing C is a u-field: it is the
smallest u-field containing C, and we denote it as u(C).
The Borel u-field of IRd is the u-field U(ed), where ed is the class of
open sets of IRd. We also denote this u-field as B(IRd) and refer to it as the
u-field generated by ed.
By a measure on a measurable space (0, A), we mean a mapping J.t :
= =
A --+ ni+ [0,00], such that J.t(0) 0 and J.t is u-additive (or countably
additive), that is

J.t (yAn) = ~J.t(An)


for any sequence of An E A, pairwise disjoint.
The triple (0, A, J.t) is referred to as a measure space. For example,
o = 'lI., A = P('lI.), J.t(A) = cardinality of A if A is finite, and 00 if A is
299
300 Appendix A

infinite. This discrete measure is called the counting measure on 'U.,. Note
that J.l( {n}) = 1 for all n E 'U.,. A property is said to hold almost everywhere
(a.e.) on 0, with respect to a measure J.l (J.l- a.e.) if it holds in 0 except in
some subset A E A with J.l(A) = O.
J.l is said to be bounded if J.l(0) < 00. A probability measure P is a
bounded measure with P(O) = 1. For a E 0, the Dirac measure at a,
denoted by O( a), is the measure defined by

if a E A
o(a)(A) ={ ~ if a ¢ A

for A E A. J.l is said to be u-finite if there is a sequence En E A such that


0= Un En and J.l(En) < 00 for all n. The Lebesgue measure on JR (or JRd,
d ~ 1) is u-finite.
Let C = {[a, b) : a, bE JR}. Define J.l on C by J.l([a, b)) = b - a. It can
be checked that J.l is u-additive on C. We extend J.l to the field 1) generated
by C, where 1) consists of all finite, disjoint unions of elements of C. For
U?=l[ai, bi) with [ai, bt) n [aj, bj) = 0, for i f j,

J.l CQ[ai,bt )) = t(bi - ail·

It is clear that J.l is u-additive on 1). The u-field generated by 1) is the Borel
u-field B(JR). There exists a unique u-additive extension of J.l to B(JR) (see
Theorem below). The completion of this measure on (JR, B(JR)) is called
the Lebesgue measure on JR. Since
00

JR= U(-n,n) and J.l«-n,n)) < 00


n=l
for all n, J.l is u-additive.

Theorem A.1 (extension of measures). Let 1) be a field of subsets of


o. If J.l is u-finite and u-additive on 1), then there exists a unique measure
fi on (0, u(1))) such that
jj(A) = J.l(A), for any A E 1).

As an application, consider the following.


Let F be a distribution function on JR, that is
(i) F is nondecreasing,
Measure and Integration 301

(ii) F is right continuous (and having left limits on IR), and


(iii) liIIlz--+-oo F(z) = 0, liIIlz--+oo F(z) = 1.
Then there exists a unique probability measure on (IR,B(IR», denoted as
dF, such that

dF «a, b]) = F(b) - F(a), for all - 00 ~ a < b < 00.


The probability measure dF is called the Lebesgue-Stieljes measure as-
sociated with F. The situation for (IRd,B(IRd») is similar.

The Lebesgue measure on (IRd, B(IRd») is obtained as in the case of


IR by considering the case C of "rectangles" of the form TIt=1 (ai, bi]. It
turns out that such a measure is a product measure constructed from the
Lebesgue measure on IR, as we will see below.

A.2 Product measures


Let (Oi, Ai), i = 1,2, ... , d be measurable spaces. The product u-/ield, de-
noted as ®1=1Ai, on the cartesian product TIt=1 Oi, is the u-field generated
by the field V which consists of all finite, disjoint unions of "rectangles" of
the form TIt=1 Ai, Ai E Ai, i = 1,2, ... , d.
Let I'i be a u-finite measure on (Oi,Ai), i = 1,2, ... ,d. Then there
exists a unique u-finite measure on (TIt:1 Oi, ®1=1Ai), denoted as ®1=1I'i,
such that

®1=1I'i (u, Ai) = u'l'i(Ai)


for any Ai E Ai, i = 1,2, .. . d.
The measure ®1=1I'i is called the product measure of the l'i'S, and
(TIt=1 Oi, ®1=1 Ai, ®1=1I'i) is called the product measure space of the spaces
(Oi, Ai, I'i), i = 1,2, ... , d.
Remark.
In the study of the stochastic processes, we encounter spaces of the form

IRT = {f : T ---+ IR} where, e.g., T = 1N or T = IR+.


The construction of ®iETA; (where here A; = B(IR), for all i E T) goes as
follows.
302 Appendix A

A cylinder in mT is a subset of RT of the form

Al X A2 ... x An X R X R X ••• Ai E B(R).

Then ®iETA (or B(RT» is the u-field on RT generated by the class of


all cylinders of RT. Let J denote the collection od all finite subsets of T.
Given a family of probability spaces

(Rj, B(Rj), Pj) , i EJ

where

Rj --:71
R· X R·'12 x··· x R·'1 .. ' i = {it, h, ... ,in}, ii E T, Rj; = R,
the Kolmogorov existence theorem (Lesson 2) asserts that there is a unique
probability measure P on (RT , B(RT» such that

P (Ail x Ah x ... x Aj .. x R x R· .. ) = Pj (Ail x Ah ... x Aj .. )

for all cylinders, provided that the family of probability measure (Pj, i E J)
satisfies the consistency condition.

A.3 Some theorems on integrals


Let (0, A, JJ) be a measure space. The integration of real-valued functions
defined on {1 with respect to JJ is carried as in Lesson 1.

Theorem A.2 (monotone convergence theorem.)


If fn is an non-decreasing sequence of nonnegative measurable functions,
then
lim [ fn(w)dJJ(w) = [ ( lim fn(w») dJJ(w).
n-oo in in n-oo

Two measurable functions f and 9 are said to be equal almost every-


where (a.e.) if JJ (w : f(w) i= g(w» = O.

Theorem A.3 (Lebesgue dominated convergence theorem).


Let (In, n ~ 1) be a sequence of measurable functions such that
(i) fn converges a.e. to f, that is

JJ {w : liminffn(w) = limsupfn(w) = f(W)}C = 0,


n-+oo n-+oo
Measure and Integration 303

in other words In converges to I except on a set olJ1.-measure zero;


(ii) there exists a integrable function g such that I/nl ~ 9 (a.e.}.
then the functions In, n ~ 1, I are integrable and

lim [ In(w)dJ1.(w) = [ l(w)dJ1.(w).


n-ooio io
Theorem A.4 (Fubini theorem).
II I is integrable with respect to the product measure 1'1 ® 1'2, then

[
k1X~
I(Wl, w2)d(J1.1 ® 1'2)
k1[ [rk2 I(Wl, W2)dJ1.2(W2)] dJ1.1(Wl)
102 [101 I(Wl,W2)dJ1.1(Wl)] dJ1.2(W2),
where WI ~ J0 2 /(Wl,W2)dJ1.2(W2) (resp. W2 ~ J0 1 I(Wl,W2)dJ1.1(wd) is
defined 1'1- a.e. (reap. 1'2- a.e.).

Theorem A.S (Radon-Nikodym theorem).


Let I' and v be two (u-finite) measures on (0, A). If I' is absolutely
continuous with respect to v (i.e., J1.(A) = 0 whenever v(A) = 0), then
there exists an integrable non-negative function I (with respect to v) such

L
that
J1.(A) = I(w)dv(w), 'v'AEA. (*)

The function I (called the Radon-Nikodym derivative of I' with respect


to v) is unique in the sence that, if 9 is another function verifying (*), then
9 = I v-a.e. (i.e., v{w : I(w) =p g(w)} = 0).
Appendix B

Banach and Hilbert


Spaces

B.1 Definitions
Let U be a linear space over JR. A nonnegative real function II . II defined
on U is called a norm if

(i) Ilxll = 0 -¢:::? x = 0,


(ii) Ilx + yll ~ Ilxll + IIYII, x, y E U (triangular inequality),
(iii) Ilaxll = lalllxll, x E U, a E JR.
A normed space is a linear space U endowed with a norm II . II. A
sequence (x n ) from a normed space (E, II . II) is called a Cauchy sequence
if
lim Ilxm - xnll = O.
n,m-+oo

A Banach space is a normed space where every Cauchy sequence is conver-


gent (i.e., a complete normed space).
A Hilbert space is a Banach space (H, II . II) in which there is a defined
function < ., . > on H x H to JR such that

(i) < alxl +a2x2, y >= al < Xl, y> +a2 < X2, y >, where aI, a2 E JR
and Xl,X2,y E Hj
(ii) < X, y >=< y, x>, X,y E Hj and

(iii) < x, X>= IlxW, x E H.


305
306 Appendix B

Such a function is called a "scalar product" (or an inner product). A


Banach space U is said to be separable if there exists a countable subset D
of U which is dense (i.e., for all x E E, there exists a sequence (x n ) from
D such that limn ..... oo Ilx n - xII = 0).
In a separable Banach space U, the Borel u-field Bu is defined as the
u-field generated by the open balls (the open ball with center as x and
radius r is the set {V: V E U, IIx - vII < r}).
Examples.
(a) JRn is a separable Hilbert space with scalar product
n

< x, V >= L XiYi x = (x!, ... , xn ), V = (VI, ... , Vn),


i=l

where Xl, ... , X n , VI, ... ,Vn E JR. A typical countable dense subset of JRn
is Qn where Q denotes the set of rational numbers.
(b) The space C ([0, 11) of continuous real functions defined over [0,11
is a separable Banach space with respect ot the supremum norm defined by

IIxil = sup Ix(t)l, x E C([O,11).


te[D,T]

B.2 V-spaces
Let (U, B, 1') be a measure space, p ~ 1 and LP(I') the space of real mea-
surable functions I on (U, B) such that fu I/IP dl' < 00.
The relation
I "" 9 <=> I = 9 I' a.e.
is an equivalence relation over LP(I').
Now the space LP(I') of equivalence classses associated with "" IS a
Banach space with the norm

IlFilp = (fu I/IPdl') lIP,


where I is any element in the class F E LP(I'). In the following we identify
I with its equivalence class.
It is convenient to denote L OO (I') the space of classes of real measurable
functions I such that

11/1100 = inf{M : I'{t : I(t) ~ M} = O} < 00.


Banach and Hilbert Spaces 307

Then II . 1100 induces a norm over LOO(IJ) which becomes a Banach space.
L 2 (IJ) is a special V-space since it is a Hilbert space with scalar product

< I, 9 >= L IgdIJ, I, 9 E L 2 (IJ).

If p E [1,00]' q E [1,00]' and l/p + l/q = 1, then Holder inequality


provides a link between V(IJ) and Lq(IJ): if IE V(IJ) and 9 E Lq(IJ), then
Ig E L 1 (IJ) and
Illglll ::; 1l/llpllgllq·
If p = q = 2 this inequality entails the Schwarz inequality
I< I, 9 > I ::; 11/1I211g112.
Holder inequality shows that, if 9 E Lq(IJ), and if p = q/(q - 1), then
J
11---+ IgdIJ is a bounded linear functional on V(IJ) with norm Ilgllq.
ifJ:
The norm of ifJ is defined as

sup{lifJ(f)I: I: 11/11p::; I}.

Conversely, it can be proved that if p E [1,00) every bounded linear func-


J
tional on V(IJ) has the representation I 1----+ IgdIJ, 9 E Lq(IJ) (Riesz
representation theorem). Note that if in particular p = q = 2, this result is
valid. A similar representation does not hold for bounded linear functionals
on LOO(IJ). For details we refer to Royden (1968).

B.3 Hilbert spaces


Let H be an Hilbert space with scalar product < ., . > and the norm II . II.
If < x, y >= 0 for some x and y in H, we say that x and y are orthogonal
and write x 1. y. The following theorem is crucial:

Theorem B.1 (Orthogonal projection theorem).


Let M be a closed linear subspace of an Hilbert space H. Then there exists
a unique mapping 7rM of H onto M such that for every x E H,

x- 7r M (x) 1. y, yEM.

7rM is called the orthogonal projection of H onto M.

Note. The linear space M is said to be closed if Xn -+ x, Xn EM, n ~ 1


implies x EM.
308 Appendix B

7r M has the following properties: it is linear, contractive (i.e., II7rM (x)1I $


IIxll) and =
idempotent (i.e., 7rM (7rM (x» 7r M (x».
Conversely every mapping ¢ : H 1---+ H which is linear, contractive,
and idempotent is an orthogonal projection onto some closed subspace of
H.
Complete orthogonal system.
A set Tin H is called an orthonormal system if II¢II = 1 for each ¢ E T
and ¢ 1. 1f; if ¢ and 1f; are any different elements in T. T is said to be
complete of < x, ¢ >= 0 implies x = O.
Concerning complete orthonormal systems we have the following result:

Theorem B.2 If H is a separable Hilbert space, then every complete or-


thonormal systems is countable. If (¢n) is such a system, we have

00

x= L < x, ¢n > ¢n, xEH


n=O

and

IIxll 2 = L < x, ¢n >2 .


00

n=O

an =< x, ¢n >, n E IN is the sequence of Fourier coefficients of x


associated with (¢n).

B.4 Fourier series


Let us consider the special space L2(A1I") where A7r denotes Lebesgue mea-
sure on [-7r, 7r]. Then a classical complete orthonormal system in L2(A1I")
is defined by

1 1
¢o = .,f2i' ¢2k+1(t) = Vi coskt, ¢2k+2(t) = ~ sin kt, k E IN.

Theorem 2 implies that the associated Fourier series converges to x


in L2(A1I")' In 1966, L. Carleson has shown that this series also converges
A1I"-almost everywhere.
If x has m continuous derivatives and if x(-7r) = x(7r), then an =
O(n- m ). If m ~ 2 this implies that the associated Fourier series converges
to x absolutely and uniformly.
Banach and Hilbert Spaces 309

B.5 Applications to probability theory


(1) If (a,A,p) is a probability space, then L 2 (P) is the space of (classes
of) real random variables which have finite variances.
If X, Y E L 2 (P) are zero mean, we have

!IXII = O'(X), IWII = O'(Y), < X, Y >= cov(X, Y).


p(X, Y) = cov (X, Y)j(O'(X)O'(Y)) may be interpreted as the cosine of the
angle [X, Y].
(2) If Y, Xl, X 2, ... , Xn are zero-mean square integrable random vari-
ables, the linear regression of Y with respect to Xl, ... , Xn is the orthogonal
projection of Yon M = sp(Xl , ... , Xn), where for A ~ L 2(P), sp(A) is
the span of A, i.e., the smallest closed subspace containing A.
More generally, let Y E L 2 (P) and X be a random variable with range
(F, C). The conditional expectation E(YIX) is the orthogonal projection of
Y on the space

M = {Z : Z E L 2(P), 3g : Z = g(X)}.

An abstract form of conditional expectation is the following: Let B be


a sub O'-field of A and let PB be the restriction of P to B. Then L 2 (PB)
may be considered as a closed linear subspace of L 2(P) and the conditional
expectation with respect to B is the orthogonal projection EB of L2 (P)
onto L 2 (PB). Therefore EB has the following characterization:
(i) EB(y) E L 2 (PB) for every Y E L 2 (P) and
(ii) Y - EB(y) 1.. Z for every Y E L 2 (P) and Z E L 2 (Pa).
The following theorem gives characteristic properties of conditional ex-
pectations:

Theorem B.3 A mapping 7r : L 2(P) 1---+ L 2(P) is a conditional expec-


tation if and only if it is linear, contractive, idempotent and such that
=
7r(Y) ~ 0 ifY ~ 0 and 7r(c) c for every constant c.

(3) Let us finally note that convergence in mean square is nothing but
convergence in L 2 (P). Consequently if liIDn,m-+oo E(Yn - Ym)2 =
0, then
there exists Y E L 2(P) such that liIDn-+oo E(Yn - y)2 O. =
More details about connection between Hilbert spaces and probability
theory can be found in Rao (1984).
List of Symbols

1R: Set of real numbers.


1R* = 1R\ {OJ.
1R: {x: -00 $ x $ oo}.
1R+: {x:O$x<oo}.
1R+: {x: O$x$oo}.
N: Set of non-negative integers.
7l: Set of integers { ... , -2, -1,0,1,2, ... }.
0: Empty set.
#(A): Number of elements of the set A.
p(n): Power set of n.
AC: Complement of the set A.
o'(C): O'-field generated by C.
O'(X): O'-field generated by random variable X.
Var(X): V(X), Variance of the random variable X.
Cov(X, Y): Covariance of X and Y.
X N(O, 1): The random variable X is distributed as N(O, 1).
fV

N(O, 1): Standard normal distribution.


N(I-', 0'2): Normal distribution with mean I-' and variance a 2.
B(1Rd ): Borel a-field of 1Rd , d ~ l.
BE: Borel O'-field of E.
dL(X): dx, Lebesgue measure.
h(a): Dirac measure at a.
(An i.o.): An infinitely often, limsupn..... oo An.
i.i.d. Independent and identically distributed.
R( X): Range of the variable X.
Fx: The cumulative distribution function of X.

311
312 List of Symbols

E(X): Expected value of X.


p
Xn ----+ X: Convergence in probability.
a.s. X.
X n ----+ . Almost sure convergence.
LP
Xn ----+ X: Convergence in pth mean.
Xn ~ X: Convergence in distribution.
LP(n, A, P): LP(P), LP(n), equivalence classes of pth-integrable
random variables.
0: Product of a-fields, or of measures.
p, 0 v: Product measure of p, and v.
lA: Indicator function of set A.
E 8 (X): E(XIB), conditional expectation of X given the sub-
a-field B.
£(X): Law on distribution of the random variable X.
P(A): Poisson distribution with mean A.
*: Convolution.
X.lY: X and Yare independent.
Bibliography

[1] Anderson, T. W. (1971) The Statistical Analysis of Time Series. John


Wiley, New York.

[2] Ash, R. B. and Gardner, M. F. (1975) Topics in Stochastic Processes.


Academic Press, New York.

[3] Bosq, D. and Lecoutre, J. P. (1987) Theorie de I'Estimation Punc-


tionelle. Economia, France.

[4] Bosq, D. and Lecoutre, J. P. (1992) Analyse et Prevision des Series


Chronologiques. Masson, Paris.

[5] Chung, K. L. (1967) Markov Chains with Stationary Transition Prob-


abilities. (Second edition), Spring Verlag, New York.

[6] Doob, J. L. (1953) Stochastic Processes. John Wiley, New York.

[7] Erdelyi, A. et al (1953) Higher Transcendental Functions. McGraw-


Hill, New York.

[8] Feller, W. (1957) An Introduction to Probability and Its Applications.


Vol. I, Vol. II (1966), John Wiley, New York.

[9] Freedman, D. (1971) Markov Chains. Holder-day, San Francisco.

[10] Halmos, P. R. (1950) Measure Theory. Van Nostrand, Princeton.


[11] Karlin, S. and Taylor, H. M. (1975) A First Course in Stochastic Pro-
cesses. Academic Press, San Diego.

[12] Karr, A. F. (1986) Point Processes and Their Statistical Inference.


Marcel Dekker, New York.

[13] Neveu, J. (1990) Cours de theorie du signal. University of Paris VI.

313
314 Bibliography

[14] Nguyen, H. T. and Rogers, G. S. (1989) Fundamentals of Mathematical


Statistics. Vol. I, II. Springer Verlag, New York.
[15] Prabhu, N. V. (1965) Queues and Inventories. John Wiley, New York.
[16] Priestley, M. B. (1981) Spectral Analysis and Time Series. John Wiley,
New York.
[17] Roo, M. M. (1984) probability Theory with Applications. Academic
Press, San Diego.
[18] Royden, H. L. (1988) Real Analysis. Third Edition. MacMilan Pub-
lishing Company, New York.
[19] Sobcsyk, K. (1991) Stochastic Differential Equations. Kluwer Aca-
demic, Dordrecht.
Partial Solutions to
Selected Exercises

Lesson 1.
1.10. (i) {w: X(w) < oo} = Un~l{W : X(W) ~ n} E A.
(ii) {w : X(w) = oo} = {w: X(w) < oo}C E A and
{w: X(w) = -oo} = nn::;o{w: X(w) ~ n} E A.
(iii) {w : SUPn Xn(w) ~ t} = nn{w : Xn(w) ~ t} and infn Xn =
- sUPn( -Xn ).
1.17. (i) Clearly if X ? 0 and simple, then E(X) = Jooo P(X > t)dt. If
X ? 0 (measurable), then let Xn / X, with Xn ? 0, simple (Exercise
1.18). Since, for each t, (Xn > t) / Un>l(Xn > t) = (X > t), the result
follows by monotone continuity of P and by the monotone convergence
theorem (Appendix).
(ii) Write X = X+ - X- .
= =
(iii) E(IXlk) Jooo P(IXI > tl/k)dt, then y tl/k.
1.18. (i) Let

i
Ai,n = { w: 2n ~ X(w)
i+l}
<"""2rI ' i = 0, 1, ... , n2n - 1

and Bn = {w : X(w) ? n}. Then clearly, for each n ? 1, the Ai,n'S and Bn
form a partition of n. To show that 'Vw, Xn(W) ~ X n+1(w), we consider
two cases:
(a) w E Bn. If X(w) ? n + 1, then Xn(w) = =
n < n + 1 Xn+1(w)j If
n ~ X(w) < n + 1, then
n2 n +1 (n + 1)2n +1
2n +1 ~ X(w) < ..

315
316 Partial Solutions to Selected Exercises

Also
n2n+1 n2n+1 + 1 (n + 1)2n +1
2n+1 < 2n+1 < 2n+1
If w E Aj,n+l with j = n2 n+1, then Xn (w) = nand Xn+l (w) = j 12n+1 = n.
If
n2n+1 + 1 (n + 1)2n+1
- .. ~ X(w) < n".L1 ,

then Xn(w) = n < Xn+1(w) = (j + 1)/2n+1.


(b) w E U~~~-l Ai,n, that is 2i" ~ Xn(w) < W
for some i = 0,1, ...,
n2n - 1. But then,
-2i- < X() 2( i + 1)
w < -0...-:-:--<-
2n +1 - 2n +1 .
If 2:i i ~ X(w) < ~~t!, then Xn(w) = 2i" = Xn+1(w) = 2:il.

If 2"2itl ~ X(w) < 2(itP


2" , then Xn(w) -- 2"i < '2"ft.FT
2i+1 -- X n+1(w).
(ii) For each w, liffin ..... co Xn(w) = X(w). Indeed, if X(w) = 00, then
X(w) ~ n for all n, so that Xn(w) = n - - 00 = X(w) as n ~ 00.
For X(w) < 00, there exists no(w) such that X(w) < no(w). But then,
IXn(w) - X(w)1 ~ 1/2n for all n ~ no(w).
1.20. Let Xn /' X an in Exercise 1.18. Then
= n-+oo
E(X) lim E(Xn) ~ lim [nP(X ~ n)] ~ lim [nP(X = 00)] = 00
n-+oo n-+oo

if P(X = 00) > O.


1.25. (iii) "Ie > 0, P(Xn > e) ~ lin - - 0 as n -+ 00. Let 0 < e < 1,
and An(e) = {w : Xn(w) > e}. Since the An(e)'s are independent, and
P(An(e)) = lin, so that E:=l P(An(e)) = 00, we have, by Borel-Cantelli
lemma, P (limsupn ..... co An(e)) = 1. Thus, in view of (ii), Xn does not
converge a.s. to O.

Lesson 2.
2.2. (i) For A E A, take Bl = B2 = A.
(ii) If Bl ~ A ~ B2 and B~ ~ A ~ B~ with P(Bt} = P(B2),
P(BD = P(B~), then P(B1) = P(BD. Indeed,
P(Bt} ~ P(B~) = P(BD = P(B1BD + P(N) ~ P(Bt},
where P(N) = o.
P(n) = p(n) = 1. P(A C ) = P(BD = 1- P(Bt} = 1- P(A).
Partial Solutions to Selected Exercises 317

For An disjoint, the corresponding B 1,n's are disjoint, so that

P (Un An) = En P(B1,n) = En P(An).


(iii) B1 ~ A ~ B2 with PCB!) = P(B2) = O. Let B ~ A, then
o~ B ~ B2 with 0, B2 E A and P(0) = P(B2).
2.3. (i) W E nt>o{w : Xt(w) ~ x} means that, \It, Xt(w) ~ x. Thus,
SUPt>oXt(w) ~ x.
Conversely, suppose SUPt>oXt(w) ~ x. For each t,
Xt(w) ~ x. -
(ii) Let B - e ~ A and e ~ B. Then Ace ~ AC B. If this inclusion
is strict, then there exists W E AC B such that W ft Ace or, equivalently,
W E (ACe)C =
AU ec. But since W E AC, we must have W E eCj on the
other hand, wEB, we have wEB - e ~ A, impossible.
(iii) \Ix E JR, {w : inft~o Xt(w) < x} = Ut~dw : Xt(w) < x}.
2.5. (i) n has the cardinality of the continuum, since there is a one-to-one
mapping from [0,1) to n: for 0 ~ x < 1, consider the proper dyadique
expansion x = E::'=l Wn /2n, Wn E {O, 1}, with an infinite number of zeros.
(ii) Take n = [0,1), A = 8[0, 1), P = dx.
2.8. \It = ¥t on n. For t = to,
=f. to, X t
A = {w : Xto(w) = 0 =f. ¥to = 1} = {to}.
Since dx({to}) = 0, P(AC) = 1. Thus P(Xto = ¥to) = pen - {to}) = 1.

Lesson 3.
3.13. (i) Let 1j(w) = inf{k ~ 1 : Xk(W) = j}. We have {w : Xn(w) =
j} ~ {w : 1j(w) ~ n}. Thus

P(Xn = j\Xo = i) P(Xn= j, 1j ~ n\Xo = i)


n
E P(Xn = j, 1j = k\Xo = i).
k=l

But

P(Xn = j, 1j = k\Xo = i) = P(Xn = j\Xk = j)P(1j = k\Xo = i) = PIr kfi~


so that Plj = E~=l fi~PIrk.
318 Partial Solutions to Selected Exercises

(iii) Use (i).


(iv) If i is recurrent, then as in the proof of Theorem 3.4, /ji = 1, so
that by (iii), j ~ i.
(v) Let i,j be recurrent and transient, respectively. Then i f+ j in view
of Theorem 3.4.
3.15. (i) Let A be a closed set of S. Then for i E A and j ¢ A, Plj = 0,
Vn ~ O. Thus

1 = P(Xn E SIXo = i) = Ep(Xn = jlXo = i) = EPlj


jES jES

E Plj = P(Xn E AIXo = i).


jEA

(ii) Let A be the set of all null recurrent states. Let i E A and j ¢ A.
If j is transient, then i f+ j. If j is positive recurrent, then i f+ j, since
otherwise j -+ i, but then i has to be positive recurrent.
3.16. (a) Nj(w) = E~=ll(x .. =j)(w). Consider the successive indices n ~ 1
at which Xn(w) = j:

1j(w) = 1f 1 )(w) = inf{n ~ 1: Xn(w) = j},


1f2 )(w) = inf{n > 1f 1 )(w) : Xn(w) = j}, ...
we have

P(Nj ~ klXo = i) = P (1j(1) < 00, ... , 1j(k) <: oolXo = i)


P (1fl) < oolXo = i) [p (1fl) < oolXo = j)r-1
lij(fjj)k-l.

For i i= j,
P(Nj = klXo = i) P(Nj ~ klXo = i) - P(Nj ~ k + llXo = i)
{ I - lij for k = 0
lij(fjj)k-l(1 - Ijj) for k ~ 1.

For j = i, peN - i = klXo = i) = (fii)k(1 - Iii)' k ~ o.


(b) Let j be a transient state, so that Ijj < 1.

P(Nj = oolXo = i) = k-+oo


lim P(Nj ~ klXo = i) = lim lij(fjj)k-l = o.
k-+oo
Partial Solutions to Selected Exercises 319

(d) Let j be a recurrent state and i -+ j,


00 00

LPij E(NjIXo = i) = LP(Nj > klXo = i)


n=l 1:=0
00 00

LP(Nj ~ k+ llXo = i) = L/ij(fjj)1:


1:=0 1:=0
00

lij L(l) = 00,


1:=0

since lij > O.


Lesson 4.
4.7. (i) If z ~ 0 and 0 ~ Y ~ s then

{TN.+! - s ~ z, s - TN. ~ y} = {N,+x - N, = 0, N, - N'-1I = O}

(N,+x - N" N, - N'-II) has distribution P(..\z) ® P("\y), hence

P (TN.+! - s ~ z, s - TN. ~ y) = e->'x e->'1I (1)

If z < 0 or y ¢ [0, s] this probability is clearly O.


(ii) (1) shows that TN.+1 - s and s - TN. are independent and that

P(s - TN. ~ y) = e->'1I1[0,,](Y), (2)


in particular

P (s - TN. = s) = P(N, = 0) = e->", (3)

Thus the distribution of s - TN. is a mixture of an absolutely continuous


measure and a discrete measure.
(iii) We have TN.+1 - TN. = (TN.+1 - s) + (s - TN.), consequently,
PCTN.+I-TN.) =
PCTN.+l-') * PC,-TN.)' where * denotes the convolution
product. After calculations, we obtain the density of TN.+! - TN. as
..\2ze->'z 0 <z<s
I(z) ={ ..\(1 + ..\s)e->'z z ~ s.
(iv) Since TN.+! - s has en exponential distribution, we have
1
E(TN.+l-S)=~. (4)
320 Partial Solutions to Selected Exercises

On the other hand, (2) and (3) imply E(s - TN.) = (1- e-)..$)/A hence
2 e-)..$
E (TN. +1 - TN,) = 'I - -A-' (5)

(v) If buses arrive at a station according to a Poisson process with


intensity A and if I arrive at the station at time s, then the expectation of
my waiting time is given by (4). Now formula (5) shows that, if s is large,
this expectation is approximately the "intuitive" ~E(TN.+1 - TN.)'
4.11. (i) If 0 ~ s < t, P(M,-M.) depends only on t - s since
P(M,-M.) = P(Nt-N;) * P(N~-Nn = P(NL) * P(-NL) '
thus the increments of M t are stationary.
Now if 0 ~ t1 < ... < tic we have

(Mt2 - Mt1 ,···, Mt/c - Mt/c-d


= (Nl2 - NlJ - (N!2 - N~), ... , (Nt~ - Nl/c-1) - (Nt2/c - Nt2._ 1))·
Then, using independence between (Nl) and (N?) together with indepen-
dence between the increments of theses processes, it is easy to infer inde-
pedence between Mh - Mtll ... , Mt/c - Mt/c-1.
(ii) From (i) it suffices to compute the distribution of Mh where h = t-s.
We have P(Mh = k) = P (Nl- Nl = k), k E 'lh and by independence

= k) = L
00

P(Mh p(Nl = k + p)p(Nl = p),


pES.
where Sic = {p: k + p ~ 0, p ~ o}. Hence

P(Mh = k) = (A1h)ke-()..1+)..2)k L (A1 A2h2 )P


pES/c (k + p)!p! .

(iii) First let us suppose that A1 > A2' Then E(Mt) = (A1 - A2)t --+ 00
as t -+ 00. Now IMtl ~ c ~ IMt - E(Mt)1 ~ c - E(Mt) and for t
large enough c - E(Mt ) ~ -E(Mt )/2, thus IMtl ~ c ~ IMt - E(Mt)1 ~
E(Mt )/2. By using Tchebychev inequality we get

P(IMtl ~ c) < P (1Mt _E(M)It >- E(M2 t )) <- 4Var(M


(E(Mt))2
t)

4(A1 + A2)t
(A1 _ A2)2t2 --+ 0 as t -+ 00.
Partial Solutions to Selected Exercises 321

The case A2 > Al may be treated similarly by putting Mf = Nl - Nl


and noticing that 1M:! IMtl. =
Finally if Al = A2 = A we may write

Mt Nl - At At - Nl
V1i = V1i + V1i .

By using Theorem 4.7 and the independence between (Nl) and (Nl) we
obtain
lim P
t ..... oo
(I~
VAt <
- z) = P(INI <
- z), z E 114

where N "" N(O, 2). Now for all £ > 0, there exists a real number Ze/2 such
that P(lNI ~ Ze/2) = £/2. On the other hand c/V1i ~ Ze/2 for t large
enough. Therefore P(IMtl ~ c) ~ P (IMtl/V1i ~ Ze/2) and for t large
enough
IMtl)
P ( V1i ~ Ze/2 ~ P(INI ~ Ze/2) + 2" £,
£
=
which proves that limt ..... oo P(IMtl ~ c) = o.
(iv) The results obtained in (iii) show that, even if Al = A2, the differ-
ence INl- Nli has a tendency for increasing with t.
4.13. (i) Let t be a positive number such that B ~ [0, t] and let k be a
positive integer. We have

P(NB = k) = 'L P(NB = klNt = n)P(Nt = n).


n~k

Now from Theorem 4.4, we infer that the conditional distribution of NB


given {Nt =n} is binomial B(n, m(B)/t), where m denotes the Lebesgue
measure. Therefore

P(NB = k) = 'L ( ~ ) (m(B))k (1- m(B))n-k e_At(At)k


n~k t t n!
e-Am(B) (Am(B))k
k~l.
k!
Thus P(NB = =
0) e-Am(B) and NB "" P(Am(B)). (Poisson distribution
with parameter Am(B)).

(ii) Similarly as above we consider t such that U:=I Bj C [0, t]. Then
by using again Theorem 4.4, we see that the conditional distribution of
322 Partial Solutions to Selected Exercises

(NBl' ... , N B", N[o,tj - U7=1 Bj) given {Nt = n} is mulitnomial and hence
P(NB1 .... ,NB,,) = P(Am(Bt)) ® ... ® P(Am(B,,)). (1)

(ii) The above formula (1) shows that (NB) is a Poisson process on
R with mean measure Am: the two definitions of a Poisson process on R
agree.
4.15. (i) Write

E (e iIlXi ) = f
1:=0
E (e illXi INt = k) P(Nt = k) = f
1:=0
e->.t (A:t ¢1:(u),

u E R (recall that Yo = 0). Thus we have


E (e iIlXi ) = e->.t(l-.p(II», uER.

(ii) Since Yn has a finite variance, we may write

2
u ) = 1 - 2A
¢ ( V). u 0'2 + 0 (uT 2
)
.
Therefore
E (eiIlXdv'X) = e-q2tIl2/2->.to(1I2/>.)

and
lim E (eiIlXi/".;x) = e-q2tIl2/2 uER,
>'_00 '

which shows that the asymptotic distribution of Xt/V). is N(O, 0'2t). Actu-
ally it can be proved that the Poisson process "tends" to a Wiener process
as A tends to infinity.

Lesson 5.
5.5. For i i' j,

o ~ Pij(t) ~ 2:Pi1:(t) = 1- Pii(t) -? 0,


1:¢j

as t ~ O.
5.6. (i)

"'g . = L..J
L..J'J
'" lim ![p,
t '..J(t) - 0"] = lim! "'[p,
t'\,O L..J'J
'J
.. (t) - 0"]
t'\,O t 'J
jES ;ES ;ES
Partial Solutions to Selected Exercises 323

(since S is finite). But EjEs Pij(t) = EjEs Cij = 1, and hence


EjEs gij = 0, 'Vi E S.
(ii) Easy.
(iii) For c > 0,

Pij(t + c) - Pij(t) = L Pi/c (c)Pkj (t) - Pij(t)


kES

Pii(c)Pij(t) + LPik(C)Pkj(t) - Pij(t)


k¢i
L Pik(c)Pkj(t) - [1- Pii(c)]Pij(t).
k¢i

Thus, since Pkj(t) ~ 1,

-[1 - Pii(c)] ~ Pij(t + c) - Pij(t) ~ L Pik(c) = 1 - Pii(c),


k¢i

or IPij(t + c) - Pij(t)1 ~ 1 - Pii(c). For c < 0, similarly, we get IPij(t +


c) - Pij(t)1 ~ 1 - Pii( -c).
(iv) Since S is finite.
5.7. Pii(t) ~ e9iit ~ 1 + giit, so that 1 - Pii(t) ~ -giit.
5.S. Use gij ~ ° i= EjEs gij = 0.
for i j,
= kto + (t -
5.14. For
and note that ° s =k t - kto to. oft/to.
t > to,
let
~
be the integer part
~
Write t
Using (5.8), we get
kto)

IP(t) = IP(kto + s) = IP(to)IP «k - l)to + s)


= IP2(tO)JP «k - 2)to + s) = ... = JPk(to)IP(s).

5.15. (ii) Let qi = -qii so that qi = limh_o(l - Pii(h»/h ~ 00. If


liminfh_o(l - Pii(h»/h = 00, then obviously, qi exists and is equal to 00,
so that, 'Vt > 0, (1- Pii(t»/t ~ qi.
Suppose that liminfh_ o(l- Pii(h»/h < 00. Then'Vt > 0,

1- Pii(t) I. . f 1- Pii(h)
-- <
-
Imln
h_O
, ,

and hence
·
I1m 1 - Pii(h) I· . f 1 - Pii(h)
sup h ~ 1m III , ,
h_O h_O
324 Partial Solutions to Selected Exercises

and limh-+o[l- Pu(h)]/h exists.

(iii) Letting h -t 0 in PiiJh) ~ ~i~(~) 1':3£ we get

1. Pij(h) Pij(t) 1
IT.!~P - h - ~ - t-1 _ 3c < 00.

Letting t -t 0 we obtain

·
1Imsup--< . f Pi)' (t)
Pi)' (h) l'Imln 1
----< 00
h_O h - t-+O t 1 - 3c '
and hence
• Pi)' (h)
1Imsup--= l' . f Pi)' (t)
Imln - - ,
h-+O h t-+O t
since c is arbitrarily small.

Lesson 6.
6.4.
P~r: = ( 2: ) (pqt = (-It ( -~2 )22n(pqt.

(For x E IR and k E 1N, ( : ) = x(x - 1) ... (x - k + 1)/k!). Thus

00
~
L..J
p,2n
00 ~(-lt ( -~2 ) (4pqt = (1- 4pq)-1/2
n=O

= [(2p_l)2]-1/2 = _1_ = _1_.


2p-l p- q
6.9. 8(i) = p8(i + 1) + q8(i - 1).
(i) 8(i) = [(q/p)i _ (q/p)o] / [(q/p)b _ (q/p)o].
6.10. S~ = i + Xl + ... + X n , n ~ O. We have

P (s;. = j) = ( (n + F- i)/2 ) p(n+i -i)/2q(n-H i)/2

for n +j - i even and Ii - jl ~ n (and zero otherwise).


6.13. T~(w) = inf{n ~ 1 : Sn = O}. Let

E P(T~ = 2n)s2n = 1- Vl- 4pqs2.


00

¢(s) =
n=l
Partial Solutions to Selected Exercises 325

P(T~ < 00) = 4>(1) = 1- v'1- 4pq = l-Ip - ql·


6.14. Note that

Ψ(s) = (1 − √(1 − 4pqs²)) / (2qs)
     = [1 − Σ_{n=0}^∞ (−1)^n C(1/2, n)(4pqs²)^n] / (2qs)
     = Σ_{n=1}^∞ (−1)^{n+1} C(1/2, n) ((4pq)^n/(2q)) s^{2n−1}
     = a_1 s + a_3 s³ + ....

Lesson 7.
7.1. Use (i) and (N_t ≥ n) = (S_n ≤ t).
7.10. (i) (R_t > x) = (no renewals in (t, t + x]).
(ii) For x < t, (C_t > x) = (no renewals in (t − x, t]).
(iii) (R_t > x, C_t > y) = (no renewals in (t − y, t + x]), for 0 < y < t.
(iv) X_{N_t+1} = C_t + R_t.
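
For instance, if the renewal process is a Poisson process with intensity λ, (i) gives

P(R_t > x) = P(no renewals in (t, t + x]) = e^{−λx}, x > 0,

so the residual lifetime R_t is exponentially distributed for every t, which reflects the memoryless property of the exponential interarrival times.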

Lesson 8.
8.1. (i) For each t, A(t) and D(t) are independent, thus

P(X(t) = j) = Σ_{n=0}^∞ P(A(t) = n + j) P(D(t) = n)
            = Σ_{n=0}^∞ (e^{−λt}(λt)^{n+j}/(n + j)!)(e^{−μt}(μt)^n/n!)
            = e^{−(λ+μ)t} (λ/μ)^{j/2} I_j(x),

where I_j(x) = Σ_{n=0}^∞ (x/2)^{2n+j}/[n!(n + j)!] and x = 2t√(λμ).

(ii) For j ≥ 0,

P(X(t) = −j) = Σ_n P(A(t) = n − j) P(D(t) = n)
             = Σ_{n=j}^∞ (e^{−λt}(λt)^{n−j}/(n − j)!)(e^{−μt}(μt)^n/n!)
             = Σ_{k=0}^∞ (e^{−λt}(λt)^k/k!)(e^{−μt}(μt)^{k+j}/(k + j)!)
             = (λ/μ)^{−j} P(X(t) = j).

(iii) E(z^{X(t)}) = E(z^{A(t)}) E(z^{−D(t)}) = e^{−λt(1−z)} e^{−μt(1−1/z)}. Thus, as
z ↗ 1, E(z^{X(t)}) → 1.
(iv) P(Q(t) < ∞ | Q(0) = i) = Σ_{j=−∞}^∞ Q_j(t) = 1 by (iii).
8.2.

P(Q(t) ≤ j) = E[P(Q(t) ≤ j | Q(0))]
            = Σ_{i=0}^∞ P(Q(t) ≤ j | Q(0) = i) P(Q(0) = i)
            = Σ_{i=0}^∞ [e_{j−i}(t) − ρ^{j+1} e_{−(i+j+1)}(t)] (1 − ρ)ρ^i = 1 − ρ^{j+1}.

Thus P(Q(t) = j) = P(Q(t) ≤ j) − P(Q(t) ≤ j − 1) = (1 − ρ)ρ^j.
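
Thus, for every t, Q(t) is geometrically distributed over {0, 1, 2, ...}. For instance, if ρ = 1/2, then P(Q(t) = j) = 2^{−(j+1)} and E(Q(t)) = ρ/(1 − ρ) = 1.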

8.7. (i) E(Q(t) | Q(0) = i) = λ/μ + (i − λ/μ)e^{−μt}, (p = 1 − q = e^{−μt}).

Var(Q(t) | Q(0) = i) = q(λ/μ) + ipq = λ/μ + (i − λ/μ)e^{−μt} − i e^{−2μt}.

(ii) E(Q) = λ/μ, E(W) = 1/μ.
8.9. Q_{n+1} = Q_n − δ_n + C_{n+1} and

Q²_{n+1} = Q²_n + δ_n + C²_{n+1} − 2Q_n − 2δ_n C_{n+1} + 2Q_n C_{n+1}

(here δ²_n = δ_n and δ_n Q_n = Q_n). Since C_{n+1} is independent of Q_n, δ_n, and, by stationarity, E(Q²_{n+1}) = E(Q²_n) and E(C_n) = E(δ_n) = ρ, we have 2E(Q)(1 − ρ) = 2ρ(1 − ρ) + λ²E(Y²), and hence E(Q) = ρ + λ²E(Y²)/[2(1 − ρ)].

Lesson 9.
9.3. (i) Let H_{t−1,p} be the subspace spanned by X_{t−1}, ..., X_{t−p}. Since X̂_t ∈ H_{t−1}, the closure of ∪_{p≥1} H_{t−1,p}, there exists a sequence (X^{(p)}, p ≥ 1), X^{(p)} ∈ H_{t−1,p}, such that X^{(p)} → X̂_t as p → ∞ (in mean square). Thus, it suffices to verify that

lim_{p→∞} E(X^{(p)} − X̂_t^{(p)})² = 0.

Now, for each p, X_t − X̂_t^{(p)} ⊥ H_{t−1,p}, and hence

E(X^{(p)} − X_t)² = E(X^{(p)} − X̂_t^{(p)})² + E(X̂_t^{(p)} − X_t)².

But

‖X_t − X̂_t‖ ≤ ‖X_t − X̂_t^{(p)}‖ ≤ ‖X_t − X^{(p)}‖ ≤ ‖X_t − X̂_t‖ + ‖X̂_t − X^{(p)}‖,

so that

lim_{p→∞} E(X_t − X̂_t^{(p)})² = lim_{p→∞} E(X_t − X^{(p)})² = E(X_t − X̂_t)²,

and hence E(X^{(p)} − X̂_t^{(p)})² → 0 as p → ∞, as desired.

(ii) As a consequence of (i), we have

E(X_t − X̂_t)² = lim_{p→∞} E(X_t − X̂_t^{(p)})². (1)

Now, by stationarity, the covariance matrix C_{t,p} of (X_t, X_{t−1}, ..., X_{t−p}) does not depend on t. Hence the same property is valid for E(X_t − X̂_t^{(p)})², and consequently for E(X_t − X̂_t)² (see (1)).

9.7. Set

Z_{h_n}(ω) = (X_{t+h_n}(ω) − X_t(ω))/h_n = (1/h_n)(1_{{t+h_n}}(ω) − 1_{{t}}(ω)),

ω, t, t + h_n ∈ [0, 1] and lim_{n→∞} h_n = 0. Then, for every n,

Z_{h_n}(ω) = 0, ω ∉ {t, t + h_1, t + h_2, ...}.

Thus (X_t) is L²-differentiable although its sample functions are not differentiable and even not continuous.
9.10. (i) Since (X_t) is L²-continuous, C is continuous (Theorem 9.6). It follows that C is Riemann integrable on [a, b] × [a, b], hence (X_t) is L²-integrable on [a, b] (Theorem 9.8). Similarly (X_t φ_n(t)) is L²-integrable on [a, b].
(ii) Theorem 9.9 implies

E(ξ_n ξ_m) = ∬_{[a,b]²} φ_n(s) C(s, t) φ_m(t) ds dt
           = ∫_a^b φ_m(t) dt ∫_a^b φ_n(s) C(s, t) ds
           = ∫_a^b φ_m(t) λ_n φ_n(t) dt = λ_n δ_nm.

(iii) By using the definition of the L²-integral, it is easy to prove that

E(X_t ∫_a^b X_s φ_n(s) ds) = ∫_a^b E(X_s X_t) φ_n(s) ds.

Thus,

E(X_t ξ_n) = ∫_a^b C(s, t) φ_n(s) ds = λ_n φ_n(t).

(iv) Combining (ii) and (iii) we obtain

E(X_t − Σ_{k=0}^n ξ_k φ_k(t))² = C(t, t) − 2E[X_t Σ_{k=0}^n ξ_k φ_k(t)] + E[(Σ_{k=0}^n ξ_k φ_k(t))²]
                              = C(t, t) − Σ_{k=0}^n λ_k φ_k²(t),

which tends to zero as n → ∞. Finally

X_t = Σ_{k=0}^∞ ξ_k φ_k(t), a ≤ t ≤ b,

where the series converges in mean square.
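
A completely explicit instance of this expansion is worked out in Exercise 12.3 below: for the Wiener process on [0, 1], one has C(s, t) = min(s, t), and the eigenelements are

φ_n(t) = √2 sin((n + 1/2)πt), λ_n = (n + 1/2)^{−2}π^{−2}, n ≥ 0.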

Lesson 10.
10.7. (i) By using the orthogonality relations

E[(X_{n+1} − X̂_{n+1}) X_j] = 0, 1 ≤ j ≤ n,

one obtains the system

γ_{n+1−j} = Σ_{i=1}^n ψ_i γ_{n+1−i−j}, 1 ≤ j ≤ n, (1)

where γ denotes the autocovariance of (X_t). Now we have γ_0 = σ²(1 + α_1²), γ_{−1} = γ_1 = σ²α_1, γ_l = 0 for |l| > 1. Thus the desired result follows by substituting these values in (1).
(ii) By using a similar method as in the proof of Theorem 10.2, it can be proved that r_n = ψ_1 (see also Lemma 14.1). Now by using recursively the difference equation obtained in (i), one can infer that ψ_1 = [(−1)^{n+1}α_1^n(1 − α_1²)]/(1 − α_1^{2(n+1)}), hence the result.
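
As a check, for n = 1 the formula gives ψ_1 = α_1/(1 + α_1²), the lag-one autocorrelation of the MA(1) process; this is as it should be, since the best linear predictor of X_2 based on X_1 alone is ρ_1 X_1.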

10.9. (i) We use the Cauchy criterion. First note that, for p < q,

E[(Σ_{j=p}^q r_{t−1}···r_{t−j} η_{t−j})²] = Σ_{j=p}^q E(r²_{t−1}···r²_{t−j}) E(η²_{t−j}),

since r_{t−1}, ..., r_{t−j}, η_{t−p}, ..., η_{t−q} are independent. Consequently,

E[(Σ_{j=p}^q r_{t−1}···r_{t−j} η_{t−j})²] ≤ Σ_{j=p}^q a^j E(η²_{t−j}) ≤ E(η_0²) Σ_{j=p}^q a^j,

and the bound tends to zero as p and q tend to infinity.

(ii) By using the definition of X_t and X_{t−1}, it is easy to verify that X_t − r_{t−1}X_{t−1} = η_t. On the other hand, X_t is a function of η_t, η_{t−1}, ... and r_{t−1}, r_{t−2}, ..., and r_t is independent of these variables. Thus r_t is independent of X_t. For X_{t−j}, the proof is similar.
(iii) Applying the results in (ii), we obtain

E(X_t | X_{t−1}, ...) = X_{t−1}E(r_{t−1}) + E(η_t) = ρX_{t−1}.

(iv) Set ε_t = X_t − ρX_{t−1}, t ∈ ℤ; then (ε_t) is a white noise (see (iii)), and (X_t) is an AR(1); thus it is stationary and (ε_t) is its innovation process.
10.10. (i) Theorem 9.2 shows that (X_t) is stationary. On the other hand,

X_t − (1/ρ)X_{t−1} = −Σ_{k=1}^∞ ρ^k η_{t+k} + (1/ρ) Σ_{k=1}^∞ ρ^k η_{t−1+k} = η_t.

(ii) The spectral density of η_t is

f_η(λ) = Var(η_0)/(2π), λ ∈ [−π, π]

(see Example 9.1). Now applying Theorem 9.2 to the filter defined by η_t = X_t − X_{t−1}/ρ, we obtain

(Var(η_0)/(2π)) dλ = |1 − e^{iλ}/ρ|² dμ(λ),

where μ denotes the spectral measure of (X_t). Thus (X_t) has a spectral density given by

f(λ) = (Var(η_0)/(2π)) |1 − e^{iλ}/ρ|^{−2}, λ ∈ [−π, π]. (1)

(iii) The continuity of the scalar product in a Hilbert space entails

E(X_t η_{t+1}) = −lim_{p→∞} Σ_{k=1}^p ρ^k E(η_{t+k}η_{t+1}) = −ρE(η²_{t+1}) ≠ 0,

which proves that η_{t+1} is not orthogonal to X_t. Thus (η_t) is not the innovation of (X_t).
(iv) It is easy to verify that (ε_t) is a white noise. By using again Theorem 9.2, we obtain

f(λ) = (Var(ε_0)/(2π)) |1 − ρe^{iλ}|^{−2}, λ ∈ [−π, π]. (2)

The comparison of (2) with (1) gives

Var(ε_0) = ρ² Var(η_0) < Var(η_0).

(v) It is easy to show that X_s = Σ_{j=0}^∞ ρ^j ε_{s−j}, s ∈ ℤ. Thus ε_t ⊥ X_s,
s < t, and (ε_t) is the innovation of (X_t).
Lesson 11.
11.4. (E^{B_n}(Y)) is a martingale since E^{B_n}Y is square integrable and

E^{B_n}(E^{B_{n+1}}(Y)) = E^{B_n}(Y), n ≥ 1.

Now Jensen's inequality for conditional expectation (see Lesson 1) entails

(E^{B_n}Y)² ≤ E^{B_n}(Y²), n ≥ 1.

Taking expectation on both sides leads to

E(E^{B_n}Y)² ≤ E(E^{B_n}(Y²)) = E(Y²), n ≥ 1.

Thus (E^{B_n}(Y)) satisfies the condition in Theorem 11.1 and consequently converges in mean square and almost surely.
Note that it can be proved that

E^{B_n}Y → E^{B_∞}(Y) a.s.,

where B_∞ = σ(∪_n B_n).


11.5. Let (Y_n) be a martingale such that Y_n → Y in L² as n → ∞.
For every integer n, we have

E^{B_n}(Y_p) = Y_n, n ≤ p.

Now since E^{B_n} is an orthogonal projection in L²(P) (see Appendix), it is continuous. Then Y_p → Y in L² as p → ∞ implies that E^{B_n}(Y_p) → E^{B_n}(Y) in L² for every fixed n. Hence Y_n = E^{B_n}(Y), n ≥ 1, and (Y_n) converges almost surely (see Exercise 11.4).
11.8. (i) Let (B_n) be the sequence of σ-fields defined by B_n = σ(X_1, ..., X_n), n ≥ 1. Since X_{n+1} is independent of X_1, ..., X_n and E(X_{n+1}) = 0, we have

E^{B_n}(Y_{n+1}) = X_2 + ... + X_n + E^{B_n}(X_{n+1})
               = X_2 + ... + X_n + E(X_{n+1}) = Y_n, n ≥ 1,

and (Y_n) is therefore a martingale.

(ii) We have

Σ_{n≥2} P(X_{n−1} = −n) = Σ_{n≥2} 1/n² < ∞.

Then the Borel-Cantelli lemma entails that there exists Ω_0 such that P(Ω_0) = 1 and, for every ω ∈ Ω_0, X_{n−1}(ω) = n/(n² − 1) for n ≥ n_0(ω). Consequently, if n ≥ n_0(ω), we have

Y_n(ω) − Y_{n_0}(ω) = Σ_{p=n_0(ω)}^n p/(p² − 1).

Taking the limit as n → ∞, we obtain Y_n(ω) → ∞. Hence Y_n → ∞ a.s. Note that a consequence of this result is sup_n E(|Y_n|) = ∞ (see Theorem 11.2).
11.10. (i) Since X_1, ..., X_n are integrable and independent, ∏_{i=1}^n X_i is integrable. Now if B_n = σ(X_1, ..., X_n), n ≥ 1, we have

E^{B_n}(∏_{i=1}^{n+1} X_i) = (∏_{i=1}^n X_i) E^{B_n}(X_{n+1}) = (∏_{i=1}^n X_i) E(X_{n+1}) = ∏_{i=1}^n X_i,

thus (Y_n) is a martingale. Now sup_n E(|Y_n|) = 1, so Theorem 11.2 entails Y_n → Y a.s.
(ii) Let

T_n = Σ_{i=1}^n 1_{{X_i = 3/2}}, n ≥ 1.

By the strong law of large numbers, we have

T_n/n → P(X_i = 3/2) = 1/2 a.s.

It means that there exists Ω_0 such that P(Ω_0) = 1 and T_n(ω)/n → 1/2 for ω ∈ Ω_0.
Now let ε be a positive number such that 3^{1/2+ε} < 2. There exists n_0(ω) such that

T_n(ω)/n < 1/2 + ε for n ≥ n_0(ω), ω ∈ Ω_0.

Hence, for ω ∈ Ω_0,

∏_{i=1}^n X_i(ω) = 3^{T_n(ω)}/2^n < (3^{1/2+ε}/2)^n → 0, as n → ∞,

thus Y = 0 a.s.
Finally E(∏_{i=1}^∞ X_i) = E(Y) = 0 while ∏_{i=1}^∞ E(X_i) = 1.
Lesson 12.
12.3. (i) The equation which defines φ_n can be written in the form

∫_0^t sφ_n(s) ds + t∫_t^1 φ_n(s) ds = λ_n φ_n(t), 0 ≤ t ≤ 1. (1)

Since φ_n is continuous (see, e.g., Exercise 9.10), the left side of this equation is differentiable, hence φ_n is differentiable. Taking the derivative on both sides, we obtain

tφ_n(t) + ∫_t^1 φ_n(s) ds − tφ_n(t) = λ_n φ'_n(t). (2)

A second differentiation gives λ_n φ''_n(t) = −φ_n(t).

(ii) By solving the above differential equation, we find

φ_n(t) = a cos(t/√λ_n) + b sin(t/√λ_n), (3)

where a and b are constants. From (1) we obtain φ_n(0) = 0, hence a = 0.
Now (2) entails φ'_n(1) = 0, which implies b λ_n^{−1/2} cos λ_n^{−1/2} = 0, which in turn implies λ_n = [(n + 1/2)²π²]^{−1}. On the other hand,

1 = ∫_0^1 φ_n²(t) dt = ∫_0^1 b² sin²((n + 1/2)πt) dt,

and consequently b = ±√2. The sign of φ_n being arbitrary, one can choose b = √2; thus

φ_n(t) = √2 sin((n + 1/2)πt), 0 ≤ t ≤ 1, n ≥ 0.

(iii) Exercise 9.10 shows that

X_t = √2 Σ_{n≥0} ξ_n sin((n + 1/2)πt), 0 ≤ t ≤ 1,

where the series converges in mean square and where ξ_n = ∫_0^1 W_t φ_n(t) dt with E(ξ_n) = 0 and Var(ξ_n) = λ_n = (n + 1/2)^{−2}π^{−2}, n ≥ 0. Finally, since (W_t) is a Gaussian process, the L²-integral ∫_0^1 W_t φ_n(t) dt is a limit in mean square of Gaussian random variables (see, e.g., Section 9.4) and is therefore Gaussian: ξ_n ∼ N(0, λ_n).
12.5. (i) Elementary computations show that

I_n(λ) = Σ_{i=0}^{n−1} (1/2)(W²_{t_{i+1}} − W²_{t_i}) + (λ − 1/2) Σ_{i=0}^{n−1} (W_{t_{i+1}} − W_{t_i})².

The first sum is nothing but (W²_b − W²_a)/2. Concerning the second sum, we have

E(Σ_{i=0}^{n−1} (W_{t_{i+1}} − W_{t_i})² − (b − a))² = E(Σ_{i=0}^{n−1} Y_i)²,

where Y_i = (W_{t_{i+1}} − W_{t_i})² − (t_{i+1} − t_i), i = 0, ..., n − 1. Hence E(Y_i) = 0 and Var(Y_i) = 2(t_{i+1} − t_i)². Furthermore, Y_0, Y_1, ..., Y_{n−1} are independent. Therefore

E(Σ_{i=0}^{n−1} Y_i)² = 2 Σ_{i=0}^{n−1} (t_{i+1} − t_i)² ≤ 2(b − a) sup_i (t_{i+1} − t_i) → 0, as n → ∞.

Thus

Σ_{i=0}^{n−1} (W_{t_{i+1}} − W_{t_i})² → b − a in L²,

and finally

I_n(λ) → (W²_b − W²_a)/2 + (λ − 1/2)(b − a) in L², 0 ≤ λ ≤ 1. (1)

(ii) In order to compute the Itô integral ∫_a^b W_t dW_t, we may define a sequence of approximating functions by

f_n(t) = Σ_{i=1}^n W_{t_i} 1_{[t_i, t_{i+1}]}(t), a ≤ t ≤ b.

These functions are nonanticipative and

∫_a^b f_n(t) dW_t = Σ_{i=1}^n W_{t_i}(W_{t_{i+1}} − W_{t_i}) = I_n(0).

Taking the limit in mean square and using (1), we obtain

∫_a^b W_t dW_t = (W²_b − W²_a)/2 − (b − a)/2.

(iii) In order to obtain the "natural" value (W²_b − W²_a)/2, one may choose λ = 1/2 and define formally

∫_a^b f_n(t) dW_t = Σ_{i=1}^n ((W_{t_i} + W_{t_{i+1}})/2)(W_{t_{i+1}} − W_{t_i}),

which leads to the so-called Stratonovitch integral

∫_a^b W_t dW_t = (W²_b − W²_a)/2.
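
Note that the Stratonovitch and Itô values differ by exactly (b − a)/2, that is, by half the quadratic variation obtained in (i); this is also what (1) gives when comparing λ = 1/2 with λ = 0.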

12.10. (i) The mappings f : ω ↦ (T_n(ω), ω) and g : (t, ω) ↦ W_t(ω) are measurable. Then h : ω ↦ W_{T_n(ω)}(ω) is measurable since h = g ∘ f.
(ii) We have

E(W_{T_n}) = ∫ E(W_{T_n} | T_n = t) dP_{T_n}(t)
           = ∫ E(W_t | T_n = t) dP_{T_n}(t)
           = ∫ E(W_t) dP_{T_n}(t) (by independence)
           = 0.

Similarly

Var(W_{T_n}) = E(W²_{T_n}) = ∫ E(W_t²) dP_{T_n}(t)
             = σ² ∫ t dP_{T_n}(t) = σ² E(T_n) = nσ²/λ

(see, e.g., Lesson 4).
(iii) We have

E(e^{iuW_{T_n}}) = ∫ E(e^{iuW_{T_n}} | T_n = t) dP_{T_n}(t)
                = ∫ E(e^{iuW_t}) dP_{T_n}(t) = ∫ e^{−σ²tu²/2} dP_{T_n}(t)

(see Exercise 12.1). Hence, by using Corollary 4.2, we obtain

E(e^{iuW_{T_n}}) = ∫_0^∞ e^{−σ²tu²/2} (λ(λt)^{n−1}/(n−1)!) e^{−λt} dt
                = ∫_0^∞ e^{−at} (λ(λt)^{n−1}/(n−1)!) dt,

where a = λ + σ²u²/2. Noting that

∫_0^∞ e^{−at} (a(at)^{n−1}/(n−1)!) dt = 1,

we obtain

E(e^{iuW_{T_n}}) = (1 + σ²u²/(2λ))^{−n}, u ∈ ℝ.

Now the characteristic function of the Laplace distribution (i.e., the distribution with density e^{−|x|}/2, x ∈ ℝ) is (1 + u²)^{−1}. Consequently, [1 + (σ²u²)/(2λ)]^{−1} is the characteristic function of the distribution P_{σ,λ} with density (√(2λ)/(2σ)) e^{−|x|√(2λ)/σ}. Finally W_{T_n} has the distribution P_{σ,λ}^{*n}, where * denotes the convolution product.
(iv) The characteristic function of W_{T_n} − W_{T_{n−1}} is

φ_n(u) = ∬ E(e^{iu(W_{T_n} − W_{T_{n−1}})} | T_{n−1} = t_{n−1}, T_n − T_{n−1} = s_n)
            dP_{T_{n−1}}(t_{n−1}) dP_{T_n − T_{n−1}}(s_n)
       = ∫ e^{−σ² s_n u²/2} λ e^{−λ s_n} ds_n (by Theorem 4.2)
       = (1 + σ²u²/(2λ))^{−1}, u ∈ ℝ,

which is again the characteristic function of P_{σ,λ}.
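
As a check, expanding the characteristic function,

(1 + σ²u²/(2λ))^{−1} = 1 − (σ²/(2λ))u² + o(u²),

shows that P_{σ,λ} has mean 0 and variance σ²/λ; the convolution P_{σ,λ}^{*n} then has variance nσ²/λ, in agreement with (ii).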


12.14. (i) Since the sample functions of (W_t) are a.s. continuous, we have

sup_{0≤t≤T} W_t = max_{0≤t≤T} W_t a.s.;

hence Y_T has the same distribution as Z_T = max_{0≤t≤T} W_t.

(ii) Clearly {W_T > x} = {Z_T > x, W_T > x}. Thus

P(A) = P(W_T > x) = (1/√(2πT)) ∫_x^∞ exp(−y²/(2T)) dy.

(iii) The reflection principle implies that, for every positive x,

P(Z_T > x, W_T > x) = P(Z_T > x, W_T < x),

since P(W_T = x) = 0. Consequently

P(Z_T > x) = 2P(Z_T > x, W_T > x) = 2P(W_T > x)
           = (2/√(2πT)) ∫_x^∞ exp(−y²/(2T)) dy, x > 0.

Thus Z_T has the density √(2/(πT)) exp(−x²/(2T)) 1_{ℝ₊}(x).
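
For instance, with T = 1 and x = 1.96,

P(Z_1 > 1.96) = 2P(W_1 > 1.96) ≈ 2(0.025) = 0.05,

so the maximum of the Wiener process over [0, 1] exceeds 1.96 with probability about 5%.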

Lesson 13.
13.2. (i) Suppose that L_n(x, θ) = ∏_{i=1}^n f(x_i, θ) (where x = (x_1, ..., x_n)) is strictly positive and differentiable over Θ (an open set in ℝ), and consider the relations

∫ L_n(x, θ) dx = 1 (1)

and

∫ T_n(x) L_n(x, θ) dx = θ. (2)

If it is possible to differentiate (1) and (2) under the sign ∫, we have

∫ (∂L_n/∂θ) dx = 0 (3)

and

∫ T_n ((1/L_n)(∂L_n/∂θ)) L_n dx = 1. (4)

Now, note that (4) may be written as

Cov(T_n, (1/L_n)(∂L_n/∂θ)) = 1,

where the random variable X^{(n)} is omitted. Then, the Schwarz inequality entails

Var_θ(T_n) Var_θ((1/L_n)(∂L_n/∂θ)) ≥ 1.

Now, taking (3) into account, we see that

Var_θ((1/L_n)(∂L_n/∂θ)) = nI(θ),

hence the desired result, provided I(θ) ≠ 0.

(ii) (13.12) is an equality if and only if

Cov(T_n, (1/L_n)(∂L_n/∂θ))² = Var_θ(T_n) nI(θ),

that is, if and only if

T_n − θ = A(θ)(1/L_n)(∂L_n/∂θ).

Integrating with respect to θ leads to a density of the form

L_n(x, θ) = A_n(θ) e^{T_n(x)B_n(θ)} C_n(x), x ∈ ℝⁿ, θ ∈ Θ. (5)

This is the so-called exponential model. Many families of distributions belong to the class (5).
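
For instance (taking the counting measure on ℕⁿ as dominating measure), the Poisson family f(x, θ) = e^{−θ}θ^x/x! gives

L_n(x, θ) = e^{−nθ} θ^{Σ_i x_i} / ∏_i x_i!,

which is of the form (5) with T_n(x) = Σ_{i=1}^n x_i, B_n(θ) = log θ, A_n(θ) = e^{−nθ} and C_n(x) = 1/∏_{i=1}^n x_i!.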
13.3. The statistical model associated with X_1, ..., X_n is
(ℝⁿ, B_{ℝⁿ}, U_θ^{⊗n}, θ ∈ (0, ∞)), where U_θ denotes the uniform distribution over [0, θ]. S = sup_{1≤i≤n} X_i is a sufficient statistic since

f(x_1, ..., x_n; θ) = θ^{−n} 1_{[0,θ]}(sup_{1≤i≤n} x_i) (1)

satisfies (13.2). Now, by maximizing (1), it is easy to see that S is the MLE of θ. S has the density θ^{−n} n z^{n−1} 1_{[0,θ]}(z), hence

E_θ(S) = ∫_0^θ θ^{−n} n zⁿ dz = nθ/(n + 1).

Thus T = ((n + 1)/n) S is an unbiased estimator of θ. By Theorem 13.3, if S is complete then T is optimal.
Now if g is such that E_θ(g(S)) = 0, θ > 0, then ∫_0^θ h(z) dz = 0, θ > 0, where h(z) = z^{n−1}g(z). Therefore

∫_{θ_1}^{θ_2} h⁺(z) dz = ∫_{θ_1}^{θ_2} h⁻(z) dz, 0 < θ_1 < θ_2.

Hence the σ-finite measures defined by the densities h⁺ and h⁻ agree on the intervals of (0, ∞) and therefore on B((0, ∞)). Hence h⁺ = h⁻ a.e., which implies g = 0 a.e., i.e., g = 0 P_θ a.s. for all θ, which means that S is complete. It is easy to see that Var_θ(T) = θ²/[n(n + 2)], θ > 0. Thus Var_θ(T) = O(n^{−2}), which is better than the bound given by (13.12). In fact the regularity conditions for the validity of (13.12) are not satisfied here (see, e.g., Exercise 13.2).
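
For instance, with n = 10, Var_θ(T) = θ²/120, whereas the unbiased moment estimator 2X̄_n (note that E_θ(2X̄_n) = θ) has variance 4θ²/(12n) = θ²/30: T is four times more efficient.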
13.5. Let ψ be a test such that P_{θ_0}(ψ = 1) ≤ P_{θ_0}(φ = 1). Consider the positive random variable Z defined by

Z(z) = (1_{{φ=1}}(z) − 1_{{ψ=1}}(z))(f_{θ_1}(z) − c f_{θ_0}(z)), z ∈ E.

We have

∫ Z dm = ∫ (1_{{φ=1}}(z) − 1_{{ψ=1}}(z))(f_{θ_1}(z) − c f_{θ_0}(z)) dm ≥ 0,

hence

P_{θ_1}(φ = 1) − P_{θ_1}(ψ = 1) ≥ c[P_{θ_0}(φ = 1) − P_{θ_0}(ψ = 1)] ≥ 0,

which proves that φ is more powerful than ψ.
13.9. (i) We already know that E_λ(λ*_n) = nλ/(n − 1). On the other hand,

E_λ(λ*_n)² = ∫_0^∞ (n²/u²) λe^{−λu} ((λu)^{n−1}/(n − 1)!) du
           = (n²λ²/((n − 1)(n − 2))) ∫_0^∞ λe^{−λu} ((λu)^{n−3}/(n − 3)!) du
           = n²λ²/((n − 1)(n − 2)).

Hence

Var_λ(λ*_n) = E(λ*_n)² − (E(λ*_n))² = n²λ²/((n − 1)²(n − 2)),

and finally

E_λ(λ*_n − λ)² = Var_λ(λ*_n) + (E(λ*_n) − λ)² = ((n² + n − 2)λ²)/((n − 1)²(n − 2)), λ > 0. (1)

(ii) By the strong law of large numbers,

(W_1 + ... + W_n)/n → E(W_1) = 1/λ a.s.

Thus

λ*_n = n/(W_1 + ... + W_n) → λ a.s.

On the other hand, (1) implies that

E_λ(λ*_n − λ)² = O(1/n) → 0.
Lesson 14.
14.2. Let

A_n = Σ_{t=1}^n ε_t cos(2πkt/n) and B_n = Σ_{t=1}^n ε_t sin(2πkt/n).

First, (A_n, B_n) is a Gaussian vector since every linear combination of its components can be written as Σ_{t=1}^n a_t ε_t (where the a_t's are constants), which is a Gaussian random variable.
Now, by independence,

Var(A_n) = σ² Σ_{t=1}^n cos²(2πkt/n) and Var(B_n) = σ² Σ_{t=1}^n sin²(2πkt/n).

Concerning the covariance, we have

Cov(A_n, B_n) = Σ_{s,t=1}^n E(ε_s ε_t) cos(2πks/n) sin(2πkt/n)
             = σ² Σ_{t=1}^n cos(2πkt/n) sin(2πkt/n)
             = (σ²/2) Σ_{t=1}^n sin(4πkt/n) = 0,

since Σ_{t=1}^n e^{4iπkt/n} = 0. Hence the covariance matrix of (A_n, B_n) is diagonal and, since (A_n, B_n) is Gaussian, it follows that A_n and B_n are independent. Now

T_n = (2/(nσ²))(A_n² + B_n²) = Y + Z,

where Y and Z are independent with common distribution χ²(1). Thus T_n ∼ χ²(2).
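
Since χ²(2) is the exponential distribution with mean 2, P(T_n > x) = e^{−x/2}; for instance, the 5% critical value is 2 log 20 ≈ 5.99, which is the basis of the classical periodogram tests for a hidden periodicity.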
14.5. First, for every x ∈ ℝ, there exists j = j_n(x) such that x ∈ [j/k_n, (j + 1)/k_n); then

E(Y_n(x)) = k_n P(j/k_n ≤ ε_1 < (j + 1)/k_n) = k_n ∫_{j/k_n}^{(j+1)/k_n} g(u) du = g(u_n),

where j/k_n ≤ u_n ≤ (j + 1)/k_n. Thus

E(Y_n(x)) − g(x) = g(u_n) − g(x) → 0, as n → ∞,

since |u_n − x| ≤ 1/k_n → 0. Now, by independence,

Var(Y_n(x)) = (k_n²/n²) Σ_{t=1}^n P(j/k_n ≤ ε_t < (j + 1)/k_n)[1 − P(j/k_n ≤ ε_t < (j + 1)/k_n)]
            ≤ (k_n²/n) P(j/k_n ≤ ε_1 < (j + 1)/k_n)
            ≤ (k_n²/n) ∫_{j/k_n}^{(j+1)/k_n} g(u) du = (k_n/n) g(u_n).

As n → ∞, k_n/n → 0 and g(u_n) → g(x). Thus Var(Y_n(x)) → 0. Finally

E(Y_n(x) − g(x))² = Var(Y_n(x)) + [E(Y_n(x)) − g(x)]² → 0, as n → ∞,

which means that

Y_n(x) → g(x) in L², x ∈ ℝ.

14.9. (i) f has the decomposition

f(λ) = (1/(2π)) Σ_{t=−∞}^∞ γ_t cos λt, −π ≤ λ ≤ π,

where the Fourier coefficients γ_t are unique. Thus

γ_0 = 1 + θ², γ_1 = γ_{−1} = θ, γ_t = 0 if |t| > 1.

Clearly γ_0 = 1 + θ² > 2|θ| = 2|γ_1| since |θ| < 1.
(ii) We have

Var(Z_2(a)) = a²Var(X_1) + (1 − a)²Var(X_2) + 2a(1 − a)Cov(X_1, X_2)
           = [a² + (1 − a)²]γ_0 + 2a(1 − a)γ_1
           = 2a²(1 + θ² − θ) − 2a(1 + θ² − θ) + (1 + θ²),

which is minimum if a = 1/2.
(iii) Let A = a(X_1 + X_n) and B = ((1 − 2a)/(n − 2))(X_2 + ... + X_{n−1}). Then
Var(Z_n(a)) = Var(A) + Var(B) + 2Cov(A, B). We have Var(A) = 2a²(1 + θ²),

Var(B) = ((1 − 2a)²/(n − 2)²)[(n − 2)(1 + θ²) + 2(n − 3)θ],

and 2Cov(A, B) = 4aθ(1 − 2a)/(n − 2); hence Var(Z_n(a)) has the form

Var(Z_n(a)) = α_n a² + β_n a + γ_n.

After some easy but tedious calculations, we obtain

a_n = [(n − 2)(1 + θ²) + (n − 4)θ] / [n(n − 2)(1 + θ²) − 4θ].

Thus, in order to have a_n = 1/n, a necessary and sufficient condition is θ = 0.
(iv) Noting that a_n ∼ (1 + θ² + θ)/[n(1 + θ²)] as n → ∞ and that the dominating term in nVar(Z_n(a_n)) is nγ_n, we obtain

lim_{n→∞} nVar(Z_n(a_n)) = 1 + θ² + 2θ.

On the other hand, Theorem 9.4 shows that

lim_{n→∞} nVar(X̄_n) = Σ_{t=−∞}^∞ γ_t = 1 + θ² + 2θ.

Thus

Var_θ(Z_n(a_n)) ≤ Var_θ(X̄_n), |θ| < 1,

and the inequality is strict except if θ = 0. However Z_n(a_n) and X̄_n have the same asymptotic variance.

These results can be applied in statistics in the following way: if θ is known and X_t = Y_t − m, t ∈ ℤ, where m is unknown, then, given the data Y_1, ..., Y_n, we can construct two unbiased estimators of m:

m*_n = a_n(Y_1 + Y_n) + ((1 − 2a_n)/(n − 2))(Y_2 + ... + Y_{n−1})

and m̂_n = (Y_1 + ... + Y_n)/n.
The above results show that m*_n is better than m̂_n for n small and that they are equivalent for n large.
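
For instance, with θ = 0.5 and n = 10,

a_n = [8(1.25) + 6(0.5)]/[10·8(1.25) − 4(0.5)] = 13/98 ≈ 0.133,

noticeably larger than 1/n = 0.1: when θ > 0, the endpoint observations receive more weight than under simple averaging.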

Lesson 15.
15.4. (i) From Theorem 9.9, it follows that

Var(g_T(x)) = (1/T²) ∬_{[0,T]²} Cov((1/h_T)K((x − X_s)/h_T), (1/h_T)K((x − X_t)/h_T)) ds dt.

Using a method similar to that in the proof of Theorem 9.11, we obtain the desired result.
(ii) Let us consider the function

ψ : (u, y, z) ↦ (1 − u/T)|φ_u(y, z)| K((x − y)/h_T) K((x − z)/h_T);

assumption (A) shows that ψ is integrable over [0, T] × ℝ². On the other hand,

Cov((1/h_T)K((x − X_0)/h_T), (1/h_T)K((x − X_u)/h_T))
    = ∬ (1/h_T²) K((x − y)/h_T) K((x − z)/h_T) φ_u(y, z) dy dz.

Then (2) follows from the Fubini theorem applied to (1).
(iii) We have

|∫_0^∞ φ_u(y, z) du − ∫_0^T (1 − u/T)φ_u(y, z) du| ≤ ∫_T^∞ ‖φ_u‖_∞ du + ∫_0^T (u/T)‖φ_u‖_∞ du,

y, z ∈ ℝ.
(iv) By (A) it follows that lim_{T→∞} ∫_T^∞ ‖φ_u‖_∞ du = 0. Now since

(u/T) 1_{[0,T]}(u) ‖φ_u‖_∞ ≤ ‖φ_u‖_∞,

by using (A) again and the dominated convergence theorem, we obtain
lim_{T→∞} ∫_0^T (u/T)‖φ_u‖_∞ du = 0.
(v) (iii) and (iv) imply that

sup_{(y,z)} |∫_0^∞ φ_u(y, z) du − ∫_0^T (1 − u/T)φ_u(y, z) du| → 0, as T → ∞.

Thus,

T Var(g_T(x)) = 2 ∬_{ℝ²} (1/h_T²) K((x − y)/h_T) K((x − z)/h_T) (∫_0^∞ φ_u(y, z) du) dy dz + o(1).

Now, as φ_u is continuous at (x, x) and ‖φ_u‖_∞ is integrable, the dominated convergence theorem implies that G : (y, z) ↦ ∫_0^∞ φ_u(y, z) du is also continuous at (x, x). Then it is easy to show that

lim_{h_T→0} ∬_{ℝ²} (1/h_T²) K((x − y)/h_T) K((x − z)/h_T) [G(y, z) − G(x, x)] dy dz = 0,

hence the desired result.
(vi) We have

E(g_T(x)) = E[(1/(T h_T)) ∫_0^T K((x − X_t)/h_T) dt] = (1/T) ∫_0^T E[(1/h_T)K((x − X_t)/h_T)] dt,

and by stationarity,

E(g_T(x)) = E[(1/h_T)K((x − X_0)/h_T)] = ∫_{−∞}^∞ (1/h_T)K((x − u)/h_T) g(u) du
          = ∫_{−∞}^∞ K(v) g(x − v h_T) dv.

As ∫_{−∞}^∞ K(v) dv = 1, we obtain

E(g_T(x)) − g(x) = ∫_{−∞}^∞ K(v)[g(x − v h_T) − g(x)] dv,

and by the Taylor formula,

E(g_T(x)) − g(x) = −h_T ∫_{−∞}^∞ vK(v)g'(x) dv + (h_T²/2) ∫_{−∞}^∞ v²K(v) g''(x − θ v h_T) dv,

where 0 < θ < 1. Noting that ∫_{−∞}^∞ vK(v) dv = 0 and using again the dominated convergence theorem, we obtain

lim_{T→∞} h_T^{−2}(E(g_T(x)) − g(x)) = (g''(x)/2) ∫_{−∞}^∞ v²K(v) dv.

(vii) The quadratic error of g_T(x) may be written as

E(g_T(x) − g(x))² = Var(g_T(x)) + (E(g_T(x)) − g(x))² = O(T^{−1}) + O(h_T⁴)

(by using (v) and (vi)). Then, choosing h_T = T^{−1/4}, we obtain

E(g_T(x) − g(x))² = O(T^{−1}).

Comment: This result is somewhat surprising since, in the discrete case and with similar assumptions, the quadratic error is O(n^{−4/5}) (see, e.g., (14.33)). The reason is that the irregularity of the sample paths of (X_t, t ∈ ℝ) furnishes additional information which allows one to reach a better rate of convergence for g_T. Note that the Ornstein-Uhlenbeck process satisfies the above assumptions and, in particular, (A).

15.5. (i) Y_j = (X_{t_j} − X_{t_{j−1}})²/[σ²(t_j − t_{j−1})] ∼ χ²(1); then E(Y_j) = 1,
Var(Y_j) = 2. Therefore E(σ̂_n²) = σ², and by independence,

Var(σ̂_n²) = (2σ⁴/T²) Σ_{j=1}^n (t_j − t_{j−1})².

(ii) If lim_{n→∞} δ_n = 0, then we have

E(σ̂_n² − σ²)² = Var(σ̂_n²) = (2σ⁴/T²) Σ_{j=1}^n (t_j − t_{j−1})² ≤ (2σ⁴/T) δ_n → 0,

as n → ∞. Conversely, if

E(σ̂_n² − σ²)² = (2σ⁴/T²) Σ_{j=1}^n (t_j − t_{j−1})² → 0 (1)

and δ_n ↛ 0, then there exist an ε > 0 and a subsequence (δ_{n'}) of (δ_n) such that δ_{n'} ≥ ε for all n'. Therefore

E(σ̂_{n'}² − σ²)² ≥ (2σ⁴/T²) δ_{n'}² ≥ 2σ⁴ε²/T²,

which contradicts (1). Thus δ_n → 0.

(iii) (a) E(σ̃_n²) = σ² and

Var(σ̃_n²) = (1/n²) Σ_{j=1}^n Var[(X_{t_j} − X_{t_{j−1}})²]/(t_j − t_{j−1})² = 2σ⁴/n.

(b) The Schwarz inequality entails

((1/n) Σ_{j=1}^n (t_j − t_{j−1}))² ≤ (1/n) Σ_{j=1}^n (t_j − t_{j−1})², (2)

that is,

T²/n² ≤ (1/n) Σ_{j=1}^n (t_j − t_{j−1})².

Thus,

2σ⁴/n ≤ (2σ⁴/T²) Σ_{j=1}^n (t_j − t_{j−1})²,

which is the desired inequality. Now (2) is an equality if and only if
t_j − t_{j−1} = T/n, j = 1, ..., n. In that case,

E(σ̂_n² − σ²)² = E(σ̃_n² − σ²)².

(c) E(σ̃_n² − σ²)² = 2σ⁴/n → 0 as n → ∞. Finally σ̃_n² is better than σ̂_n²
and is consistent even if δ_n ↛ 0.
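
For instance, for equidistant sampling t_j = jT/n, both estimators have quadratic error 2σ⁴/n; with n = 100 this gives a standard deviation √(2/100) σ² ≈ 0.14σ² for the estimation of σ².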
15.8. (i) Noting that E(X_t/t) = m, we may consider the estimator

m̂_ε = (1 − ε)^{−1} ∫_ε^1 t^{−1} X_t dt.

Thus

E(m̂_ε) = (1 − ε)^{−1} ∫_ε^1 E(X_t/t) dt = m.

(Note that ∫_0^1 t^{−1} X_t dt is undefined.)

(ii)

Var(m̂_ε) = (1 − ε)^{−2} ∬_{[ε,1]²} Cov(X_s/s, X_t/t) ds dt
          = (1 − ε)^{−2} ∬_{[ε,1]²} [ε² + ε² min(s^{−1}, t^{−1})] ds dt
          = (1 − ε)^{−2} [ε²(1 − ε)² + 2ε² ∬_{ε≤s≤t≤1} t^{−1} ds dt]
          = O(ε) → 0, as ε → 0.
Index
L2-consistency, 259 backward shift operator, 206
L2-continuous, 198 Banach space, 305
L 2-differentiable, 198 Bernoulli process, 39
L 2-integrable, 199 Bernoulli random walk, 120
L^p-space, 306 Bessel function, 174
σ-additivity, 5 best linear predictor, 194
σ-field, 299 bias, 276
n-step transition probability, 51 bilateral Wiener process, 249
birth and death process, 99
birth process, 97
absolutely continuous, 303
birth rate, 97, 99
absolutely continuous distribution
Black-Scholes process, 247
functions, 13
Borel-Cantelli Lemma, 7
absorbing state, 108
branching process, 70
absorption probability, 62
Brownian bridge, 253
adapted-Bn, 219
Brownian motion process, 235
Akaike's criterion, 281 bus paradox, 90
aliasing, 289 busy period, 179
almost everywhere, 300
almost sure consistency, 259 canonical representation, 37
almost surely, 23 Cauchy sequence, 305
alternating renewal process, 149 Central limit theorem, 26
alternative hypothesis, 260 central limit theorem, 230
ARIMA process, 214 Chapman-Kolmogorov equation, 51
ARMA model, 207 characteristic function, 26
ARMAX process, 215 closed linear subspace, 309
asymptotic statistical model, 259 communication, 52
autocorrelation, 190 complete, 37, 38
autocovariance, 189, 209, 212 complete normed space, 305
autoregression estimation, 279 complete orthogonal system, 308
autoregressive / moving average compound Poisson process, 91
process, 213 conditional distributions, 15
autoregressive process, 207 conditional expectation, 22, 309


conditional independence, 9 distribution


conditional likelihood, 283 arithmetic, 158
conditional MLE, 282 nonarithmetic, 158
conditionally independent, 23 distribution of the process, 37
confidence interval, 263 Doob-Meyer decomposition, 232
confidence region of asymptotic confidence level, 263
drift term, 242
conservative chain, 102 elementary renewal theorem, 156
consistency condition, 36 empirical distribution, 265
consistent in probability, 259 equivalent processes, 37
continuous-time branching process, estimator, 256
99 asymptotically efficient, 259
continuous-time Markov chain, 96 efficient, 259
convergence almost surely, 24 maximum likelihood, 258
convergence in k-mean, 25 optimal, 257
convergence in distribution, 25 preferable, 257
convergence in probability, 24 unbiased, 257
convergence of martingales, 226 expectation, 18
convolution, 17 expected value, 18
counting measure, 191, 300 extension of measure, 300
counting process, 79
covariance, 21 factorization theorem, 256
covariance function, 41 field, 5
Cox process, 267 σ-field, 5
Cramér-Rao inequality, 259 Borel σ-field, 6
critical region, 260 finite dimensional distribution, 35
cumulative distribution function, first passage time, 131
13 Fisher distribution, 269
current lifetime, 162 Fisher information quantity, 259
cylinder set, 36 Fourier coefficient, 308
Fourier series, 308
death rate, 99 Fubini theorem, 303
delayed renewal process, 149, 165
DeMorgan's Laws, 27 game
deterministic sampling, 288 fair, 222
diagnostic checking, 283 favorable, 222
diffusion coefficient, 242 unfavorable, 222
diffusion processes, 242 Gamma distribution, 151
Dirac measure, 192 Gaussian distribution, 282
directly Riemann integrable, 160 Gaussian model, 255
discrete-time Markov chain, 48 Gaussian processes, 233, 234
dishonest chain, 100 generalized Poisson process, 92

generating function, 26, 126 Kolmogorov Forward equation, 103


generator of the chain, 102 Kolmogorov's inequality, 225

Hilbert space, 305 Laplace transform, 27


histogram estimator, 284 law of the iterated logarithm, 122
hitting time, 62 laws of laege numbers, 25
Holder inequality, 307 least square method, 273
honest chain, 100 Lebesgue dominated convergence
theorem, 302
identification, 280 Lebesgue measure, 6, 300
idle period, 179 Lebesgue-Stieljes measure, 301
imbedded Markov chain, 183 Lehmann-Scheffe theorem, 258
independent increment, 81 level of significance, 260
independent increments, 39 likelihood function, 258
indicator functions, 12 likelihood ratio, 224
infinitesimal transition probabil- limiting distribution, 63
ity, 97 linear birth process, 98
infinitesimal transition rate, 101 with immigration, 98
initial distribution, 48 linear filter, 193
innovation process, 195 linear process, 205
instantaneous state, 107 linear regression, 190, 309
intensity, 81 Little's formula, 178
interarrival time, 84 Markov property, 40, 48
invariant distribution, 64 martingale, 42, 219
invertibility, 206 mean function, 41
irreducible closed set, 54 mean integrated square error, 278
irreducible Markov chain, 54 measurable space, 5, 299
isometry, 239 measure, 299
Itô integral, 238 measure space, 299
Itô's differentiation formula, 246 modified sample autocovariance, 276
Jensen's inequality, 23 moment, 20
jump matrix, 108 Monotone continuity, 7
monotone convergence theorem, 302
Karhunen-Loève expansion, 238 monotone likelihood ratio, 261
kernel autoregression estimator, 279 moving average process, 211
kernel estimator, 279
key renewal theorem, 160 Neyman-Pearson lemma, 260
Kolmogorov Backward equation, non-absorbing state, 108
103 non-explosive chain, 106
Kolmogorov distribution, 266 non-homogeneous Poisson process,
Kolmogorov existence theorem, 37 91

nonanticipating, 239 random series, 229


norm, 305 random time, 50
normed space, 305 random variable, 11
null hypothesis, 260 random walk, 118
transient, 123
one-step transition probability, 48 absorbing barrier, 120
order statistics, 87 recurrent, 123
Ornstein-Uhlenbeck process, 248 random walks
orthogonal projection theorem, 307 reflecting barrier, 120
orthonormal system, 308 Rao-Blackwell theorem, 257
rate of birth, 97
partial autocorrelation, 194, 210, recurrent class, 55
212 reflection principle, 253
periodogram, 191, 287 regeneration point, 182
point of increase, 158 regenerative process, 182
point process, 79 renewal argument, 149
Poisson process, 79 renewal equation, 153
Poisson sampling, 289 renewal function, 153
Polya's urn model, 224 renewal property, 89
portmanteau statistic, 284 renewal theorem, 157, 158
power function, 260 residual lifetime, 162
prediction error, 194, 252 residuals, 283
Probability risk function, 257
probability measure, 5
probability space, 5 sample autocovariance, 275, 287
probability sample function, 198
law of total probability, 8 sample mean, 275, 287
probability mass function, 13 sample path, 36
probability of ruin, 135 SARIMA process, 214
process seasonality, 271
regular, 195 second order stationary, 41
weakly stationary, 189 semigroup property, 100
product σ-field, 301 separable version, 38
product measure, 301 simple random walk, 119
product measure space, 301 singular distribution functions, 14
small diffusion, 294
quadratic error, 190, 257 span, 158
queue, 171 spectral density, 190, 210, 276
spectral distribution function, 191
Radon-Nikodym theorem, 303 spectral measure, 191
random function, 35, 238 stable distribution, 65
random sampling, 289 stable state, 108

standard deviation, 21 symmetric random walk, 120


standard Gaussian, 233 system length, 173
standard normal, 233
standard transition matrix, 101 test, 260, 264
state likelihood ratio, 261
absorbing, 53 optimal, 260
aperiodic, 53 uniformly most powerful, 260
communicate, 52 time of first return, 126
ergodic, 59 time set, 34
null recurrent, 59 total lifetime, 162
periodic, 53 traffic intensity, 175
positive recurrence, 59 transient class, 55
reached, 52 transition probability matrix, 48
recurrent, 55 transition rate, 102
transient, 55 trend, 271
state space, 35
stationary distribution, 64, 176 uncorrelated, 21
stationary in the wide sense, 41 uniform transition matrix, 104
stationary increment, 81
variance, 21
stationary increments, 40
Von Mises distribution, 266
stationary renewal process, 165
stationary transition probability, waiting time, 179
48, 96 weak law of large numbers, 25,
statistical model, 255 197
statistical predictor, 279 weakly stationary, 41
Stirling's formula, 123 weakly stationary process, 201
stochastic differential equation, 242 weight function, 277
stochastic integral, 238 Wiener process, 235
stochastic matrix, 48
stochastic process, 34 Yule-Walker equation, 209
stopping time, 156, 222
Stratonovitch integral, 251 zero-one law, 124
strictly stationary, 41 Hewitt-Savage zero-one law,
strong law of large numbers, 26, 125
228 Kolmogorov's zero-one law, 125
strong Markov property, 50
Student distribution, 268
sub-σ-additivity, 6
submartingale, 219
sufficiency, 256
supermartingale, 220
symmetric event, 124
THEORY AND DECISION LIBRARY

SERIES B: MATHEMATICAL AND STATISTICAL METHODS


Editor: H. J. Skala, University of Paderborn, Germany

1. D. Rasch and M.L. Tiku (eds.): Robustness of Statistical Methods and


Nonparametric Statistics. 1984 ISBN 90-277-2076-2
2. J.K. Sengupta: Stochastic Optimization and Economic Models. 1986
ISBN 90-277-2301-X
3. J. Aczél: A Short Course on Functional Equations. Based upon Recent
Applications to the Social and Behavioral Sciences. 1987
ISBN Hb 90-277-2376-1; Pb 90-277-2377-X
4. J. Kacprzyk and S.A. Orlovski (eds.): Optimization Models Using Fuzzy Sets
and Possibility Theory. 1987 ISBN 90-277-2492-X
5. A.K. Gupta (ed.): Advances in Multivariate Statistical Analysis. Pillai
Memorial Volume. 1987 ISBN 90-277-2531-4
6. R. Kruse and K.D. Meyer: Statistics with Vague Data. 1987
ISBN 90-277-2562-4
7. J.K. Sengupta: Applied Mathematics for Economics. 1987
ISBN 90-277-2588-8
8. H. Bozdogan and A.K. Gupta (eds.): Multivariate Statistical Modeling and
Data Analysis. 1987 ISBN 90-277-2592-6
9. B.R. Munier (ed.): Risk, Decision and Rationality. 1988
ISBN 90-277-2624-8
10. F. Seo and M. Sakawa: Multiple Criteria Decision Analysis in Regional
Planning. Concepts, Methods and Applications. 1988 ISBN 90-277-2641-8
11. I. Vajda: Theory of Statistical Inference and Information. 1989
ISBN 90-277-2781-3
12. J.K. Sengupta: Efficiency Analysis by Production Frontiers. The Non-
parametric Approach. 1989 ISBN 0-7923-0028-9
13. A. Chikán (ed.): Progress in Decision, Utility and Risk Theory. 1991
ISBN 0-7923-1211-2
14. S.E. Rodabaugh, E.P. Klement and U. Höhle (eds.): Applications of Category
Theory to Fuzzy Subsets. 1992 ISBN 0-7923-1511-1
15. A. Rapoport: Decision Theory and Decision Behaviour. Normative and
Descriptive Approaches. 1989 ISBN 0-7923-0297-4
16. A. Chikán (ed.): Inventory Models. 1990 ISBN 0-7923-0494-2
17. T. Bromek and E. Pleszczyńska (eds.): Statistical Inference. Theory and
Practice. 1991 ISBN 0-7923-0718-6
THEORY AND DECISION LIBRARY: SERIES B

18. J. Kacprzyk and M. Fedrizzi (eds.): Multiperson Decision Making Models


Using Fuzzy Sets and Possibility Theory. 1990 ISBN 0-7923-0884-0
19. G.L. Gómez M.: Dynamic Probabilistic Models and Social Structure. Essays
on Socioeconomic Continuity. 1992 ISBN 0-7923-1713-0
20. H. Bandemer and W. Näther: Fuzzy Data Analysis. 1992
ISBN 0-7923-1772-6
21. A.G. Sukharev: Minimax Models in the Theory of Numerical Methods. 1992
ISBN 0-7923-1821-8
22. J. Geweke (ed.): Decision Making under Risk and Uncertainty. New Models
and Empirical Findings. 1992 ISBN 0-7923-1904-4
23. T. Kariya: Quantitative Methods for Portfolio Analysis. MTV Model
Approach. 1993 ISBN 0-7923-2254-1
24. M.J. Panik: Fundamentals of Convex Analysis. Duality, Separation, Represen-
tation, and Resolution. 1993 ISBN 0-7923-2279-7
25. J.K. Sengupta: Econometrics of Information and Efficiency. 1993
ISBN 0-7923-2353-X
26. B.R. Munier (ed.): Markets, Risk and Money. Essays in Honor of Maurice
Allais. 1995 ISBN 0-7923-2578-8
27. D. Denneberg: Non-Additive Measure and Integral. 1994
ISBN 0-7923-2840-X
28. V.L. Girko: Statistical Analysis of Observations of Increasing Dimension.
1995 ISBN 0-7923-2886-8
29. B.R. Munier and M.J. Machina (eds.): Models and Experiments in Risk and
Rationality. 1994 ISBN 0-7923-3031-5
30. M. Grabisch, H.T. Nguyen and E.A. Walker: Fundamentals of Uncertainty
Calculi with Applications to Fuzzy Inference. 1995 ISBN 0-7923-3175-3
31. D. Helbing: Quantitative Sociodynamics. Stochastic Methods and Models of
Social Interaction Processes. 1995 ISBN 0-7923-3192-3
32. U. Höhle and E.P. Klement (eds.): Non-Classical Logics and Their Applica-
tions to Fuzzy Subsets. A Handbook of the Mathematical Foundations of
Fuzzy Set Theory. 1995 ISBN 0-7923-3194-X
33. M. Wygralak: Vaguely Defined Objects. Representations, Fuzzy Sets and
Nonclassical Cardinality Theory. 1996 ISBN 0-7923-3850-2
34. D. Bosq and H.T. Nguyen: A Course in Stochastic Processes. Stochastic
Models and Statistical Inference. 1996 ISBN 0-7923-4087-6

KLUWER ACADEMIC PUBLISHERS - DORDRECHT / BOSTON / LONDON
