Scope: The series focuses on the application of methods and ideas of logic, mathematics and
statistics to the social sciences. In particular, formal treatment of social phenomena, the
analysis of decision making, information theory and problems of inference will be central
themes of this part of the library. Besides theoretical results, empirical investigations and the
testing of theoretical models of real world problems will be subjects of interest. In addition
to emphasizing interdisciplinary communication, the series will seek to support the rapid
dissemination of recent results.
The titles published in this series are listed at the end of this volume.
A COURSE IN
STOCHASTIC PROCESSES
Stochastic Models and
Statistical Inference
by
DENIS BOSQ
Institut de Statistique,
Université Pierre et Marie Curie,
Paris, France
and
HUNG T. NGUYEN
Department of Mathematical Sciences,
New Mexico State University,
Las Cruces, New Mexico, U.S.A.
"
~ ..
Preface ix
4 Poisson Processes 79
4.1 Motivation and Modeling 79
4.2 Axioms of Poisson Processes 81
4.3 Interarrival Times 84
4.4 Some Properties of Poisson Processes 87
4.5 Processes related to Poisson Processes 91
4.6 Exercises 92
Bibliography 313
Index 347
Preface
(1) The pedagogy is somewhat obvious. Since this text is designed for a one-semester course, each lesson can be covered in one week or so. Having in mind a mixed audience of students from different departments (Mathematics, Statistics, Economics, Engineering, etc.), we have presented the material in each lesson in the simplest way, with emphasis on motivation of concepts, aspects of applications, and computational procedures. Basically, we try to explain to beginners questions such as "What is the topic in this lesson?", "Why this topic?", and "How does one study this topic mathematically?". The exercises at the end of each lesson will deepen the students' understanding of the material and test their ability to carry out basic computations. Exercises with an asterisk are optional (difficult) and might not be suitable for homework, but should provide food for thought.
The purpose of the book, viewed as a text for a course or as a reference book for self-study, is to provide students with a pleasant introduction to the theory of stochastic processes (without tears!). After completing the course, the students should be able to take more advanced and technical courses or to read more specialized books on the subject.
(2) In writing the text we faced the following dilemma. In general, measure theory is not required for a first course in stochastic processes. On the other hand, it is true that measure theory is the language of probability theory. When presenting the material, even at the simplest level, some
The first named author would like to thank Emmanuel Guerre for pro-
viding some exercises.
The second named author would like to thank his department head,
Professor Douglas Kurtz for his encouragement.
Lesson 1. Basic Probability Background
If we perform the experiment and obtain the outcome (2,5), then, since (2,5) ∈ A (the point (2,5) belongs to A, or is a member of A), we say that the event A is realized, or A occurs.
Since we cannot predict exactly what the outcome will be in a random experiment such as this, we ask "what is the chance that A will occur?" The answer to this question will be a number P(A), called the probability of the event A. In an experiment whose sample space Ω is finite, it is possible to assign a number P(A) to all subsets A of Ω. The point is this. Since we are interested in probabilities of events, subsets of a general sample space Ω (such as Ω = ℝ = (−∞, ∞), the set of real numbers) are considered as events only if their probabilities can be assigned.
In our actual example, the collection A of all events is P(Ω), the power set of Ω, that is, the collection of all possible subsets of Ω, including the empty set ∅ and Ω itself. Events are stated in natural language, and hence compound events are formed by using logical connectives like "not", "and", and "or". In the context of random experiments, events are subsets of Ω. The modeling of the above connectives in the context of set theory is as follows.
The negation (or complement) of A is Aᶜ = {ω ∈ Ω : ω ∉ A}, where ∉ stands for "is not a member of".
For A, B ⊆ Ω, "A and B" is defined as
A ∩ B = {ω ∈ Ω : ω ∈ A and ω ∈ B},
where ∩ stands for "intersection"; "A or B" is
A ∪ B = {ω ∈ Ω : ω ∈ A or ω ∈ B},
where ∪ stands for "union". Note that the "or" here is not exclusive, i.e., we allow ω ∈ A ∪ B if ω belongs to both A and B.
In our example, since A = P(Ω), A is closed under all the above set operations, that is, if A, B ∈ A then Aᶜ, A ∩ B, and A ∪ B all belong to A.
We now describe the way to assign probability to events in our example. It is plausible that any outcome (i, j) has the same chance to occur. Thus we assign to each ω = (i, j) a number f(ω), called the probability of the event A = {ω}. By its meaning, 0 ≤ f(ω) ≤ 1. Here, since f(ω) is the same for all ω ∈ Ω, we obtain
f(ω) = 1/#(Ω) = 1/36,
where #(Ω) denotes the cardinality (number of elements) of Ω. Observe that f : Ω → [0, 1] satisfies the condition Σ_{ω∈Ω} f(ω) = 1. Such a function is called a probability mass function.
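The uniform assignment f(ω) = 1/36 and the rule P(A) = Σ_{ω∈A} f(ω) can be checked mechanically; a small Python sketch (the event "the sum is 7" is our own choice of A):

```python
from fractions import Fraction

# Sample space for rolling two dice: all ordered pairs (i, j).
omega = [(i, j) for i in range(1, 7) for j in range(1, 7)]

# Uniform probability mass function: f(w) = 1/#(omega) = 1/36.
f = {w: Fraction(1, len(omega)) for w in omega}
assert sum(f.values()) == 1  # a pmf must sum to 1

# Probability of an event A (a subset of omega), e.g. "the sum is 7".
A = {w for w in omega if w[0] + w[1] == 7}
P_A = sum(f[w] for w in A)
print(P_A)  # 1/6
```

Exact rational arithmetic (`Fraction`) avoids any floating-point blur in these finite computations.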
when Aᵢ ∩ Aⱼ = ∅ for 1 ≤ i ≠ j ≤ k.
The triple (Ω, A, P) above is called a probability space. A probability space is a model for a random experiment.
Let us extend the above modeling of random experiments with finite sample spaces to the case of experiments with countably infinite sample spaces (Ω is countable if there is a one-to-one correspondence between Ω and the set ℕ = {0, 1, 2, ...} of non-negative integers). We say that Ω is discrete if Ω is finite or countably infinite.
As an example of an experiment with infinitely many outcomes, consider the experiment of tossing a fair coin until we first obtain a Head. The outcome of this experiment is the number of tosses needed to obtain the first Head. Obviously, Ω = {1, 2, ...}. As in the finite case, we first assign f(n) to each n ∈ Ω, where n stands for the outcome "the first Head occurs on toss n". When tossing a coin n times, there are 2ⁿ possible combinations of Heads and Tails, only one of which corresponds to the above outcome, namely the first n − 1 tosses yield Tails and the nth toss yields a Head. Thus
f(n) = 1/2ⁿ, n ≥ 1.
Since
Σ_{ω∈Ω} f(ω) = Σ_{n=1}^∞ f(n) = Σ_{n=1}^∞ 1/2ⁿ = 1,
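A quick numerical sanity check of this pmf (a Python sketch; the horizon of 50 terms is our own choice): the partial sums of f(n) = 1/2ⁿ approach 1, and the missing mass is exactly the geometric tail 2⁻⁵⁰.

```python
from fractions import Fraction

# f(n) = 1/2**n is the probability that the first Head occurs on toss n.
f = lambda n: Fraction(1, 2**n)

# Partial sums over n = 1..50 approach 1, as required of a pmf on the
# countably infinite sample space {1, 2, 3, ...}.
partial = sum(f(n) for n in range(1, 51))
assert 1 - partial == Fraction(1, 2**50)  # missing mass = tail of the series
```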
question: Is there a probability measure P on P([0,1]) such that P(I) = |I| for any sub-interval I of [0,1]? Note that, to be consistent with the discrete case, P needs to be σ-additive.
It turns out that the answer to this mathematical problem is NO. The reason is that P([0,1]) is too big. Thus not all subsets of [0,1] are events, that is, A is a proper subset of P([0,1]). To determine A, we observe that A should contain intervals, and for any A ∈ A, P(A) should be derived from P(I) = |I| for intervals I. Furthermore, as in the discrete case, A should be a σ-field, that is, A is a collection of subsets of Ω satisfying
(i) Ω ∈ A,
(ii) A ∈ A implies that Aᶜ ∈ A, and
(iii) for any sequence Aₙ ∈ A, n ≥ 1, ∪_{n=1}^∞ Aₙ ∈ A.
Remarks.
(a) The above algebraic structure of A expresses the fact that A should be large enough to contain all events of interest.
(b) (ii) and (iii) above imply that if Aₙ ∈ A, n ≥ 1, then ∩_{n=1}^∞ Aₙ ∈ A (exercise).
(c) If (iii) above is replaced by
A, B ∈ A ⟹ A ∪ B ∈ A,
(a) The above σ-field is called the Borel σ-field of [0,1] and is denoted by B([0,1]). Elements of B([0,1]) are called Borel subsets of [0,1].
(iv) For an increasing sequence Aₙ ∈ A (that is, Aₙ ⊆ A_{n+1}, ∀n ≥ 1), we define
lim_{n→∞} Aₙ = ∪_{n=1}^∞ Aₙ.
Similarly, the sequence Aₙ is decreasing if A_{n+1} ⊆ Aₙ, ∀n ≥ 1, and
lim_{n→∞} Aₙ = ∩_{n=1}^∞ Aₙ.
If the sequence is arbitrary, then Bₙ = ∪_{i=n}^∞ Aᵢ is a decreasing sequence and Dₙ = ∩_{i=n}^∞ Aᵢ is an increasing sequence. Thus we define
lim sup_{n→∞} Aₙ = ∩_{n=1}^∞ ∪_{i=n}^∞ Aᵢ  and  lim inf_{n→∞} Aₙ = ∪_{n=1}^∞ ∩_{i=n}^∞ Aᵢ.
Note that
lim inf_{n→∞} Aₙ ⊆ lim sup_{n→∞} Aₙ.
When
lim inf_{n→∞} Aₙ = lim sup_{n→∞} Aₙ,
the common value is written lim_{n→∞} Aₙ.
Note that lim sup_{n→∞} Aₙ is also written as (Aₙ i.o.), where i.o. stands for "infinitely often", since ω ∈ lim sup_{n→∞} Aₙ if and only if ω ∈ Aₙ for infinitely many n. Also, ω ∈ lim inf_{n→∞} Aₙ if and only if ω ∈ Aₙ for all but a finite number of n.
If Aₙ ∈ A, n ≥ 1, is either an increasing or a decreasing sequence of events, then
lim_{n→∞} P(Aₙ) = P(lim_{n→∞} Aₙ).  (Monotone continuity)
P(A ∩ B) = P(A)P(B|A).
P(A|B) = P(A),
and similarly, "B is independent of A" when
P(B|A) = P(B).
In both cases,
P(A ∩ B) = P(A)P(B).
More generally, a finite family of events Aⱼ, j ∈ J, is said to be (mutually) independent if
P(∩_{j∈J} Aⱼ) = ∏_{j∈J} P(Aⱼ)
holds for the family and for every subcollection of it.
The independence of the Aᵢ's implies that any two events Aᵢ and Aⱼ are independent (pairwise independence). However, the converse does not hold (exercise).
Viewing {A} and {B} as two collections of events, we define independent collections (families) of events as follows.
Let I be a set and Cᵢ ⊆ A, i ∈ I. Then the collections Cᵢ are said to be independent if for any finite J ⊆ I and all Aᵢ ∈ Cᵢ, i ∈ J,
P(∩_{i∈J} Aᵢ) = ∏_{i∈J} P(Aᵢ).
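The standard counterexample showing that pairwise independence does not imply mutual independence uses two fair coin tosses; a Python sketch enumerating the sample space:

```python
from fractions import Fraction
from itertools import product

# Two fair coin tosses; each of the four outcomes has probability 1/4.
omega = list(product("HT", repeat=2))
P = lambda E: Fraction(sum(1 for w in omega if w in E), len(omega))

A = {w for w in omega if w[0] == "H"}        # head on the first toss
B = {w for w in omega if w[1] == "H"}        # head on the second toss
C = {w for w in omega if w.count("H") == 1}  # exactly one head

# Pairwise independence holds ...
assert P(A & B) == P(A) * P(B)
assert P(A & C) == P(A) * P(C)
assert P(B & C) == P(B) * P(C)

# ... but mutual independence fails: A ∩ B ∩ C is empty.
assert P(A & B & C) == 0 != P(A) * P(B) * P(C)
```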
(X ∈ A) = X⁻¹(A),
where X⁻¹ : P(ℝ) → P(Ω) is defined by
Remarks.
(a) A map X satisfying the condition in the above definition is called a measurable function. More specifically, X is an A–B(ℝ) measurable function. Note that the probability P on (Ω, A) plays no role in the definition.
(b) If the range of the random variable X is discrete (continuous), then X is called a discrete (continuous) random variable.
(c) For technical reasons, we might need to consider extended random variables, that is, we allow ±∞ as values. In this case, X : Ω → ℝ̄ = [−∞, ∞], and by definition X is an (extended) random variable if {ω : X(ω) ≤ t} ∈ A for any t ∈ ℝ.
(d) More generally, for d ≥ 1, a measurable mapping X : (Ω, A) → (ℝᵈ, B(ℝᵈ)) is called a random vector. Writing X = (X₁, X₂, ..., X_d), where X_k : Ω → ℝ, k = 1, 2, ..., d, it can be shown that X is a random vector if and only if each X_k is a random variable. Note that elements of B(ℝᵈ) are Borel sets of ℝᵈ (see Appendix).
Example 1.1 (a) The number of heads obtained in tossing a coin five times and the number of tosses needed to obtain the first head in repeated tosses of a coin are examples of discrete random variables.
(b) The waiting time for service of a customer in a queue and the time at which some event of interest (such as breakdowns, earthquakes, ...) occurs are examples of continuous random variables.
and
lim inf_{n→∞} Xₙ = lim_{n→∞} (inf_{k≥n} X_k).
In particular, when lim_{n→∞} Xₙ exists (that is, when lim sup_{n→∞} Xₙ = lim inf_{n→∞} Xₙ), it is also a random variable.
The simplest random variables are indicator functions of events (sets). Let A ⊆ Ω; then the function 1_A : Ω → {0, 1} defined by
1_A(ω) = 1 if ω ∈ A, and 1_A(ω) = 0 otherwise,
Let (Ω, A, P) be the probability space describing the random experiment of rolling two dice. Let X denote the sum of the two numbers shown. Since Ω = {(i, j) : i, j = 1, 2, ..., 6} is finite, the range of X is also finite: R(X) = {2, 3, ..., 12}. The probability measure P_X on (R(X), P(R(X))) is defined by
P(X = 2) = P{(1,1)} = 1/36, P(X = 3) = P{(1,2), (2,1)} = 2/36, ...,
x        | 2    3    4    5    6    7    8    9    10   11   12
P(X = x) | 1/36 2/36 3/36 4/36 5/36 6/36 5/36 4/36 3/36 2/36 1/36

P_X(A) = Σ_{x∈A} P_X(x).
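The table can be reproduced by enumerating the 36 equally likely pairs; a Python sketch (the event "even sum" is our own choice of A):

```python
from fractions import Fraction
from collections import Counter

# X = sum of the two numbers shown; P_X from the uniform measure on 36 pairs.
counts = Counter(i + j for i in range(1, 7) for j in range(1, 7))
P_X = {x: Fraction(c, 36) for x, c in counts.items()}

# Matches the table: counts 1, 2, ..., 6, ..., 1 out of 36 for x = 2..12.
assert [counts[x] for x in range(2, 13)] == [1, 2, 3, 4, 5, 6, 5, 4, 3, 2, 1]

# P_X(A) = sum of P_X(x) over x in A, e.g. A = "even sum".
assert sum(P_X[x] for x in range(2, 13) if x % 2 == 0) == Fraction(1, 2)
```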
B₂ = [0, 1/3²] ∪ [2/3², 3/3²] ∪ [6/3², 7/3²] ∪ [8/3², 1],
and so on. Note that the Bₙ's decrease, and each Bₙ is the disjoint union of 2ⁿ sub-intervals, each of length 1/3ⁿ. Thus the "length" of A is
L(A) = lim_{n→∞} L(Bₙ) = lim_{n→∞} (2/3)ⁿ = 0.
But since A is the range of X, we have P(X ∈ A) = 1. These facts show that X does not have an absolutely continuous distribution F. It can be shown, however, that F is continuous.
Every distribution function F can be written in the form αF₁ + βF₂ + γF₃, where α + β + γ = 1 and F₁, F₂, F₃ are of types (a), (b), (c) above, respectively.
Distribution functions of random vectors are defined as follows.
For 1 ≤ i₁ < i₂ < ... < i_k ≤ n, the joint distribution of the random vector (X_{i₁}, X_{i₂}, ..., X_{i_k}) is
F_{(i₁, i₂, ..., i_k)}(x_{i₁}, x_{i₂}, ..., x_{i_k}) = F_X(∞, ..., x_{i₁}, ∞, ..., x_{i₂}, ∞, ..., x_{i_k}, ∞, ...),
and is a k-dimensional marginal distribution.
P(Y ∈ A | X)(ω) = Σ_{n=1}^∞ P(Y ∈ A | Bₙ) 1_{Bₙ}(ω),
where Bₙ = {ω : X(ω) = xₙ}. Note that {Bₙ, n ≥ 1} forms a partition of Ω.
while P(X = x) = 0. The conditional distribution F(y|x) = P(Y ≤ y | X = x) in such cases can be defined rigorously by using some sophisticated mathematics (known as the Radon-Nikodym theorem; see Appendix). Some details will be given in the next section.
For computational purposes, when the pair of random variables (X, Y) has a joint density function f(x, y), then
F(y|x) = ∫_{−∞}^y f(z|x) dz, where f(z|x) = f(x, z)/f_X(x) for f_X(x) > 0,
and f(z|x) is defined as zero for f_X(x) = 0; here f_X(x) is the marginal density function of X, given by
f_X(x) = ∫_{−∞}^∞ f(x, y) dy,
where f is the joint mass (or density) probability function of the Xᵢ's and fᵢ is the marginal mass (or density) probability function of Xᵢ.
Sums of independent random variables appear often in the study of stochastic processes. The following is the formula for obtaining their distributions.
Suppose that X and Y are two independent discrete random variables with values in {0, 1, 2, ...}. The distribution of Z = X + Y is completely determined by the mass probability function
f_Z(n) = P(Z = n) = P(X + Y = n), n ≥ 0.
Now, for fixed n, (X + Y = n) = ∪_{k=0}^n (X = k, Y = n − k). Since the events
{ω : X(ω) = k, Y(ω) = n − k}, k = 0, 1, ..., n, are disjoint, we have
P(X + Y = n) = Σ_{k=0}^n P(X = k, Y = n − k) = Σ_{k=0}^n P(X = k)P(Y = n − k),
by independence. The counterpart of this formula in the continuous case is
f_Z(z) = ∫_{−∞}^∞ f_X(x) f_Y(z − x) dx,
and the convolution of two distribution functions F and G is defined as
(F ∗ G)(z) = ∫_{−∞}^∞ F(z − y) dG(y).
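The discrete convolution formula is easy to exercise numerically. As an illustration (our own choice of distributions), the sum of independent Poisson(1) and Poisson(2) variables should be Poisson(3), and the convolution sum reproduces this:

```python
import math

def pois(lam, k):
    """Poisson(lam) probability mass function."""
    return math.exp(-lam) * lam**k / math.factorial(k)

def convolve(f, g, n):
    """P(X + Y = n) = sum over k of P(X = k) P(Y = n - k), X, Y independent."""
    return sum(f(k) * g(n - k) for k in range(n + 1))

# The convolution of Poisson(1) and Poisson(2) matches Poisson(3) term by term.
for n in range(10):
    lhs = convolve(lambda k: pois(1, k), lambda k: pois(2, k), n)
    assert abs(lhs - pois(3, n)) < 1e-12
```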
1.3 Expectation
Consider a random experiment such as rolling two dice and let X be the sum of the two numbers shown. What is the average (mean or expected) value of X? We will answer this question using our modeling scheme. The experiment is modeled by the probability space (Ω, A, P), where Ω = {(i, j) : i, j = 1, 2, ..., 6}, A = P(Ω), and P({ω}) = 1/36, ∀ω ∈ Ω. The random quantity X is modeled as a random variable, that is, a map from Ω to {2, 3, ..., 12}. The probability mass function of X is
Thus, for random variables with finite ranges, the expected value (or mean, or expectation) of X is taken to be
E(X) = Σ_x x P(X = x).
The extension of this formula to random variables whose ranges are infinite (countable or not) is a little delicate. To avoid meaningless expressions such as ∞ − ∞, we first consider random variables with constant sign, say, non-negative (extended) random variables.
A random variable X with finite range {x₁, x₂, ..., xₙ} can be written as
X(ω) = Σ_{i=1}^n xᵢ 1_{Aᵢ}(ω),
where Aᵢ = {ω : X(ω) = xᵢ}, and then
E(X) = Σ_{i=1}^n xᵢ P(Aᵢ).
Xₙ(ω) = Σ_{i=0}^{n2ⁿ−1} (i/2ⁿ) 1_{[i/2ⁿ ≤ X < (i+1)/2ⁿ]}(ω) + n 1_{[X ≥ n]}(ω),
E(X) = lim_{n→∞} E(Xₙ).
Note that, in fact, the value of E(X) is independent of any particular choice of simple Xₙ ↗ X.
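The approximation of E(X) by the expectations of the simple functions Xₙ can be watched converge numerically. A Python sketch for X exponentially distributed with mean 1 (our own choice; the probabilities P(a ≤ X < b) = e⁻ᵃ − e⁻ᵇ stand in for abstract integration):

```python
import math

# Approximate E(X) for X ~ Exponential(1) via the simple functions
# X_n = sum_i (i/2^n) 1[i/2^n <= X < (i+1)/2^n] + n 1[X >= n].
# Here P(a <= X < b) = e^-a - e^-b and P(X >= n) = e^-n.
def E_Xn(n):
    total = n * math.exp(-n)  # contribution of the n 1[X >= n] term
    for i in range(n * 2**n):
        a, b = i / 2**n, (i + 1) / 2**n
        total += a * (math.exp(-a) - math.exp(-b))
    return total

# E(X_n) increases to E(X) = 1 as n grows (X_n <= X_{n+1} <= X pointwise).
approx = E_Xn(10)
assert E_Xn(5) <= E_Xn(6) <= approx <= 1
assert 0 < 1 - approx < 1e-2  # within the 2^-10 discretization error
```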
When X is a non-negative discrete random variable with mass probability function f, then
E(X) = Σ_x x f(x),
and if X is a non-negative continuous random variable with density function f, then
E(X) = ∫_{−∞}^∞ x f(x) dx.
(iii) If both E(X⁺) and E(X⁻) are finite, then we say that the expectation of X is finite and that X is integrable, and
E(X) = Σ_x x f(x) (if X is discrete)  and  E(X) = ∫_{−∞}^∞ x f(x) dx (if X is continuous).
More generally, if ψ : ℝ → ℝ is measurable, then
E(ψ(X)) = ∫_{−∞}^∞ ψ(x) f(x) dx.
If E(|X|ⁿ) < ∞, then E(|X|ᵐ) < ∞ for m ≤ n. However, X might not have moments of order > n.
For n = 2, the quantity E(X − E(X))² is called the variance of X and is denoted by Var(X) or simply V(X); its positive square root is called the standard deviation of X. For two random variables X and Y having second moments, the covariance of X and Y is the quantity
Cov(X, Y) = E[(X − E(X))(Y − E(Y))].
Dₙ = {ω : X(ω) = xₙ}, n ≥ 1.
If Y is a random variable with finite range {y₁, y₂, ..., y_m}, then
E(Y) = Σ_{i=1}^m yᵢ P(Bᵢ), Bᵢ = {ω : Y(ω) = yᵢ};
thus, by analogy,
E(Y|D) = ∫_Ω Y(ω) dP_D(ω),
where P_D(·) denotes the conditional probability measure on A defined by P_D(A) = P(A ∩ D)/P(D); equivalently,
E(Y|D) = E(Y 1_D)/P(D).
Now, consider the partition Dₙ, n ≥ 1, induced by the discrete random variable X. Before observing X, the conditional expectation of Y given X, denoted by E(Y|X), is a random variable. The above discussion leads to the following definition.
E(Y|X) = E(Y|σ(X)).
For A ∈ σ(X),
∫_A Y(ω) dP(ω) = ∫_A E(Y|σ(X))(ω) dP(ω),
where
∫_A Y(ω) dP(ω) = ∫_Ω 1_A(ω) Y(ω) dP(ω).
(ii) By E(Y|X₁, ..., X_k), we mean E(Y|σ(X₁, ..., X_k)).
in symbols, Xₙ →ᴾ X.
The interpretation is this: with high probability, Xₙ is close to X for large values of n.
A stronger concept of convergence is
Definition 1.8 The sequence (Xₙ, n ≥ 1) is said to converge almost surely (or with probability one) to X if
P(ω : Xₙ(ω) → X(ω)) = 1,
in symbols, Xₙ →ᵃ·ˢ· X.
Remarks.
(i) It can be shown that if Xₙ →ᵃ·ˢ· X, then Xₙ →ᴾ X. The converse does not hold. See Exercise 1.25.
(ii) To prove a.s. convergence, the following equivalent criterion is useful:
Xₙ →ᵃ·ˢ· X ⟺ lim_{n→∞} P(sup_{k≥n} |X_k − X| > ε) = 0 for every ε > 0.
Remarks.
(i) Lᵏ-convergence implies convergence in probability.
Remarks.
(i) If Xₙ →ᴾ X, then Xₙ →ᴰ X.
Remarks.
(a) Saying that the sequence Zₙ = (Sₙ − nE(X₁))/(σ√n) →ᴰ N(0, 1) is the same as saying
lim_{n→∞} P(Zₙ ≤ t) = ∫_{−∞}^t (1/√(2π)) e^{−x²/2} dx, ∀t ∈ ℝ.
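The statement can be checked numerically for a fair coin, where Sₙ is Binomial(n, 1/2), E(X₁) = 1/2 and σ = 1/2; the binomial tail is computed exactly, so no simulation is needed (a Python sketch):

```python
import math

# Z_n = (S_n - n E(X1)) / (sigma sqrt(n)) with S_n ~ Binomial(n, 1/2):
# P(Z_n <= t) should approach Phi(t), the N(0,1) distribution function.
def P_Zn_le(n, t):
    # S_n <= n/2 + t * sqrt(n)/2, since E(X1) = 1/2 and sigma = 1/2.
    cutoff = n / 2 + t * math.sqrt(n) / 2
    return sum(math.comb(n, k) for k in range(0, math.floor(cutoff) + 1)) / 2**n

Phi = lambda t: (1 + math.erf(t / math.sqrt(2))) / 2  # standard normal cdf

n = 2000
for t in (-1.0, 0.0, 1.0):
    assert abs(P_Zn_le(n, t) - Phi(t)) < 0.02
```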
characteristic function of X is defined to be
φ(t) = E(e^{itX}) = ∫_{−∞}^∞ e^{itx} dF(x).
(i) Generating function. For X with values in {0, 1, 2, ...}, the generating function of X is
φ(t) = E(tˣ) = Σ_{n=0}^∞ P(X = n) tⁿ
for |t| < 1.
(ii) Laplace transform. For X ≥ 0, the Laplace transform of the density f of X is
ψ(t) = E(e^{−tX}) = ∫_0^∞ e^{−tx} f(x) dx.
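As a numerical illustration (our own choice of distribution): for X ~ Poisson(λ), the generating function has the closed form E(tˣ) = e^{λ(t−1)}, and partial sums of the defining series match it:

```python
import math

# Generating function phi(t) = E(t^X) = sum_n P(X = n) t^n, |t| < 1.
# For X ~ Poisson(lam), phi(t) = exp(lam * (t - 1)) in closed form.
def phi_series(lam, t, terms=100):
    return sum(math.exp(-lam) * lam**n / math.factorial(n) * t**n
               for n in range(terms))

lam = 2.0
for t in (0.0, 0.3, 0.9):
    # The series converges fast, so 100 terms already match the closed form.
    assert abs(phi_series(lam, t) - math.exp(lam * (t - 1))) < 1e-12
```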
1.5 Exercises
1.1. Specify (Ω, A, P) for the following random experiments.
(i) Tossing a balanced coin five times.
(ii) An urn contains 10 white and 4 black balls. Five balls are drawn (without replacement) from the urn. An outcome is defined as the number of black balls obtained in the drawn sample.
(iii) Consider an unbalanced coin with probability p of getting a head in each toss. Toss the coin (independently) until the first head appears. An outcome is defined as the number of tosses needed.
1.2. Let (Ω, A, P) be a probability space. Show that
(i) A is a field.
(ii) If A, B ∈ A, then A − B = {ω : ω ∈ A, ω ∉ B} ∈ A. (Hint: first prove De Morgan's laws: (A ∩ B)ᶜ = Aᶜ ∪ Bᶜ, (A ∪ B)ᶜ = Aᶜ ∩ Bᶜ.)
(iii) If Aₙ ∈ A, n ≥ 1, then ∩_{n=1}^∞ Aₙ ∈ A.
(iv) If A, B ∈ A with A ⊆ B, then P(A) ≤ P(B).
(vi) If A, B ∈ A, then
P(A ∪ B) = P(A) + P(B) − P(A ∩ B).
1.3. Show that
lim inf_{n→∞} Aₙ = {ω : Σ_{n=1}^∞ 1_{Aₙᶜ}(ω) < ∞},
whereas
lim sup_{n→∞} Aₙ = {ω : Σ_{n=1}^∞ 1_{Aₙ}(ω) = ∞}.
Give an interpretation of these events.
1.4. Let Ω be a countably infinite space and let f : Ω → [0, 1] be such that Σ_{ω∈Ω} f(ω) = 1. Define P : P(Ω) → [0, 1] by P(A) = Σ_{ω∈A} f(ω).
P(B) = Σ_{n=1}^∞ P(Aₙ)P(B|Aₙ),
and for P(B) > 0,
P(A_m|B) = P(A_m)P(B|A_m) / Σ_{n=1}^∞ P(Aₙ)P(B|Aₙ), ∀m ≥ 1.
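The total-probability and Bayes formulas above amount to a few lines of arithmetic; a Python sketch with hypothetical numbers (a three-set partition with made-up priors and likelihoods):

```python
# Total probability and Bayes' formula for a partition {A_1, A_2, A_3}.
# The numbers below are illustrative only.
prior = [0.5, 0.3, 0.2]   # P(A_1), P(A_2), P(A_3); must sum to 1
lik   = [0.9, 0.5, 0.1]   # P(B | A_n)

P_B = sum(p * l for p, l in zip(prior, lik))            # total probability
posterior = [p * l / P_B for p, l in zip(prior, lik)]   # P(A_n | B)

assert abs(P_B - 0.62) < 1e-12
assert abs(sum(posterior) - 1) < 1e-12  # the posterior is a distribution
```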
the second toss", and C = "exactly one head occurs". Are A, B, C pairwise independent? Are A, B, C mutually independent?
1.8. Let (Ω, A, P) be a probability space. Let A, B, C ∈ A such that P(A ∩ B) > 0. Show that if P(C|A ∩ B) = P(C|A), then B and C are independent given A.
1.9. Let (Ω, A, P) be a probability space. Let X : Ω → ℝ.
(i) Show that for A, Aₙ ⊆ ℝ, n ≥ 1,
(iv) Use (iii) to show that supₙ Xₙ and infₙ Xₙ are extended random variables. Show also that {ω : Xₙ(ω) converges} ∈ A.
Show that
(i) F is monotone non-decreasing, i.e., x < y implies F(x) ≤ F(y).
(ii) lim_{x→−∞} F(x) = 0, lim_{x→+∞} F(x) = 1.
(iii) F is right-continuous, i.e., lim_{y↓x} F(y) = F(x) for any x ∈ ℝ.
1.12. A random variable X taking values in an interval [a, b] ⊆ ℝ is said to be uniformly distributed on [a, b] if it is a continuous random variable with density function given by
f(x) = (1/(b − a)) 1_{[a,b]}(x), x ∈ ℝ.
(vi) Gamma(n, λ): f(x) = λe^{−λx}(λx)^{n−1}/(n − 1)! · 1_{[0,∞)}(x) with λ > 0 and integer n ≥ 1.
1.16*. Let X be a random variable taking values in {0, 1, 2, ...}. Show that
E(X) = Σ_{n=0}^∞ P(X > n).
E(X) = ∫_0^∞ P(X > t) dt − ∫_{−∞}^0 P(X ≤ t) dt.
(iii) E(|X|ᵏ) = k ∫_0^∞ t^{k−1} P(|X| > t) dt.
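The identity E(X) = Σ_{n≥0} P(X > n) in Exercise 1.16 can be verified numerically; a Python sketch for a geometric variable (our own choice), where both sides equal q/(1 − q):

```python
from fractions import Fraction

# X geometric on {0, 1, 2, ...}: P(X = n) = (1 - q) q^n, with q = 1/3 here.
q = Fraction(1, 3)
N = 60  # truncation level; the neglected tails are of order q^N

EX_direct = sum(n * (1 - q) * q**n for n in range(N))  # sum of n P(X = n)
EX_tails  = sum(q**(n + 1) for n in range(N))          # P(X > n) = q^(n+1)

# Both equal E(X) = q / (1 - q) = 1/2 up to a tiny truncation error.
assert abs(float(EX_direct) - 0.5) < 1e-12
assert abs(float(EX_tails) - 0.5) < 1e-12
```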
1.18. Let X : (Ω, A, P) → ℝ̄₊ = [0, ∞] be a non-negative random variable. For each integer n, define
Xₙ = Σ_{i=0}^{n2ⁿ−1} (i/2ⁿ) 1_{[i/2ⁿ ≤ X < (i+1)/2ⁿ]} + n 1_{[X ≥ n]}.
Show that
(i) For any ω ∈ Ω, Xₙ(ω) ≤ X_{n+1}(ω), ∀n.
(ii) lim_{n→∞} Xₙ(ω) = X(ω), ∀ω ∈ Ω.
1.19. Let X and Y be random variables with finite ranges. Show that
(i) X ≤ Y implies that E(X) ≤ E(Y).
(ii) E(X + Y) = E(X) + E(Y).
1.20*. Let X be a non-negative extended random variable. Suppose that P(X = ∞) > 0. Show that E(X) = ∞.
1.21. Let X be a random variable with values in {x₁, x₂, ..., xₙ}. Let
D_k = {ω : X(ω) = x_k}, k = 1, 2, ..., n.
(i) Verify that the D_k's form a (measurable) partition of Ω.
(ii) For A ∈ A, show that E(P(A|X)) = P(A).
(iii) Let Y be a discrete random variable, independent of X. Show that
Lesson 2. Modeling Random Phenomena
set of their possible values) is called the state space of the process and is denoted by S.
Stochastic processes are thus the mathematical models for random phenomena. They are classified according to the nature of the time set T and the state space S (discrete or continuous). For example, if T is continuous, say [0, ∞), and S is discrete, say S = {..., −2, −1, 0, 1, 2, ...}, then the process is called a continuous-time process with discrete state space. The classification of stochastic processes is exemplified by the examples of the previous section as follows.
Example 2.1: A discrete-time stochastic process with a finite state space.
Example 2.2: A continuous-time stochastic process with a finite state
space.
Example 2.3: A discrete-time stochastic process with a discrete state
space.
Example 2.4: A discrete-time stochastic process with a continuous state
space.
Example 2.5: A continuous-time stochastic process with a discrete state
space.
Example 2.6: A continuous-time stochastic process with a continuous
state space.
where t = (t₁, ..., tₙ) and B ∈ B(ℝⁿ) (see Lesson 1 for notation).
The construction of (Ω, A, P) and X_t should take the set F of all finite-dimensional distributions of X into account.
First, for each ω ∈ Ω, the sample path at ω is the real-valued function defined on T: t → X_t(ω). Thus we can take Ω = ℝᵀ, the set of all real-valued functions defined on T, so that X_t(ω) = ω(t), with ω ∈ ℝᵀ; that is, for each t ∈ T,
X_t : ℝᵀ → ℝ.
For X_t to be a random variable, the σ-field A on ℝᵀ should be such that X_t⁻¹(B) ∈ A for any B ∈ B(ℝ).
More generally, in view of (2.1), A should also contain all (finite-dimensional) cylinder sets of ℝᵀ, that is, subsets A of ℝᵀ of the form
then, obviously,
P_t(B) = P_{σ(t)}(f_σ⁻¹(B)),
for B ∈ B(ℝⁿ), t = (t₁, ..., tₙ), and σ(t) = (t_{σ(1)}, ..., t_{σ(n)}), σ being a permutation of {1, ..., n}.
(ii) For t = (t₁, ..., tₙ), s = (t₁, ..., tₙ, s_{n+1}), and B ∈ B(ℝⁿ), we have
(a) The σ-field σ(C) might be too small compared to the space ℝᵀ. We might need to enlarge σ(C) to include more subsets of ℝᵀ.
For any given probability space (Ω, A, P), it is always possible to enlarge A without changing P on A (see Exercise 2). A probability space (Ω, A, P) is said to be complete if subsets of elements A ∈ A with P(A) = 0 are themselves elements of A. In other words, all subsets of zero-probability events
Aᶜ ∩ B = A ∩ C (see Exercise 3).
the random variables X_{t₂} − X_{t₁}, X_{t₃} − X_{t₂}, ..., X_{tₙ} − X_{tₙ₋₁} are independent. If for any t, s ∈ T with t < s the distribution of X_s − X_t depends only on s − t, then the process X is said to have stationary increments.
2.5 Exercises
2.1. Give several examples of random phenomena together with their mathematical modeling by stochastic processes.
2.2*. Let (Ω, A, P) be a probability space.
(i) Define the collection Ā of subsets of Ω as follows: for A ⊆ Ω, A ∈ Ā if and only if there are B₁, B₂ ∈ A such that B₁ ⊆ A ⊆ B₂ and P(B₁) = P(B₂). Verify that A ⊆ Ā and that Ā is a σ-field.
(ii) Define P̄ : Ā → [0, 1] by P̄(A) = P(B₁) = P(B₂). Show that P̄ is well-defined and is a probability measure on Ā.
(iii) Let A ∈ Ā with P̄(A) = 0. Show that if B ⊆ A, then B ∈ Ā.
((Ω, Ā, P̄) is called the completion of (Ω, A, P).)
2.3*. Let X_t, t ∈ [0, ∞), be a real-valued stochastic process.
(i) Verify that
ω ↦ inf_{t≥0} X_t(ω)
is a random variable.
(iv) Explain why the assumption of completeness of (Ω, A, P) is necessary in addressing the concept of separability of stochastic processes.
2.4. Let (Xₙ, n ≥ 1) be a Bernoulli process with state space S = {0, 1} and probability of "success" p = P(Xₙ = 1).
(i) Compute P(X₂ = 0, X₅ = 1, X₈ = 1).
(ii) Give an explicit formula for computing the finite-dimensional distributions of the process.
(iii) Let Yₙ = Σ_{i=1}^n Xᵢ, n ≥ 1. Verify that the process (Yₙ, n ≥ 1) has stationary and independent increments.
(iv) Is (Yₙ, n ≥ 1) a Markov process?
(v) Is (Yₙ, n ≥ 1) a martingale?
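For part (i), independence factorizes the finite-dimensional probability into (1 − p)p²; a Python sketch confirming this against brute-force enumeration of all length-8 paths (the value of p is our own choice):

```python
from itertools import product

p = 0.3  # probability of "success"; an illustrative choice

def path_prob(path):
    """Probability of one 0/1 path of the Bernoulli process."""
    out = 1.0
    for x in path:
        out *= p if x == 1 else 1 - p
    return out

# Brute-force P(X2 = 0, X5 = 1, X8 = 1) over all length-8 paths ...
brute = sum(path_prob(w) for w in product((0, 1), repeat=8)
            if w[1] == 0 and w[4] == 1 and w[7] == 1)  # 1-based indices 2, 5, 8

# ... which independence reduces to (1 - p) * p * p.
assert abs(brute - (1 - p) * p**2) < 1e-12
```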
2.5*. Consider the experiment consisting of tossing a fair coin indefinitely.
The space of all possible outcomes is
Lesson 3. Discrete-Time Markov Chains
the stock is below some prescribed level a, then the stock level is brought up to some prescribed level b (a < b); otherwise, no replenishment is undertaken. Since the demand for the commodity during each time interval [tₙ₋₁, tₙ) cannot be predicted with certainty, the stock level just before tₙ is a random number.
If we let Xₙ, n ≥ 0, be the stock level just before time tₙ, then {Xₙ, n ≥ 0} is a discrete-time stochastic process with finite state space S = {0, 1, ..., b}.
Example 3.4 Suppose that the lifetime of some piece of equipment is measured in units of time, say minutes. When a piece of equipment fails, it is immediately replaced by an identical one, and so on.
If we let Xₙ be the remaining lifetime of the piece of equipment in use at time n, then the discrete-time stochastic process {Xₙ, n ≥ 0} has {0, 1, 2, ...} as state space.
Now, if we examine the above examples, we recognize that all the above stochastic processes (Xₙ, n ≥ 0) possess a common time-dependent structure. Indeed, in Example 3.1, if we observe X₀ = i₀, X₁ = i₁, ..., Xₙ = iₙ, then the prediction of the "future" X_{n+1} depends only on the "present" state Xₙ = iₙ of the process. The knowledge of the "past", namely X₀, X₁, ..., X_{n−1}, will not contribute to any improvement of the prediction of X_{n+1}. In other words, the present Xₙ contains all information concerning the prediction of X_{n+1}. This property is expressed mathematically as: for any n ≥ 0,
P(X_{n+1} = j | X₀ = i₀, ..., Xₙ = iₙ) = P(X_{n+1} = j | Xₙ = iₙ).  (3.1)
X_{n+1} = Xₙ − 1 + Yₙ if Xₙ ≥ 1,  and  X_{n+1} = Yₙ if Xₙ = 0.
It is plausible to assume that the Yₙ's are mutually independent and have the same distribution (i.e., form an independent and identically distributed (i.i.d.) sequence of random variables). In this case, the property (3.1) is clearly satisfied and the one-step transition probabilities P(X_{n+1} = j | Xₙ = i) do not depend on n, since
P(X_{n+1} = j | Xₙ = 0) = P(Yₙ = j),
P(X_{n+1} = j | Xₙ = i ≠ 0) = P(Yₙ = j − i + 1),
noting that the distribution of Yₙ is the same for all n ≥ 0.
Similarly, in Example 3.4, we have
X_{n+1} = Xₙ − 1 if Xₙ ≥ 1,  and  X_{n+1} = Y_{n+1} if Xₙ = 0,
where Y_{n+1} denotes the lifetime of the piece of equipment installed at time n. It is plausible to assume that (Yₙ, n ≥ 0) is an i.i.d. sequence and that Y_{n+1} is independent of X_k, k ≤ n. The property (3.1) is then clearly satisfied.
Thus random phenomena of the above type can be described by a common stochastic process model.
(Exercise 3.3).
Proof.
P(X_{n₁} = i₁, ..., X_{n_k} = i_k) = Σ π(s₀) p_{s₀ s₁} ··· p_{s_{n₁−1} i₁} p_{i₁ s_{n₁+1}} ··· p_{s_{n_k−1} i_k},  (3.4)
where the summation Σ is over all s_q ∈ S for the indices q ∉ {n₁, ..., n_k}.
For example,
Remarks.
(a) The Markov property implies that all future moves of the chain depend only on the present state. Specifically, if A is an event depending on
= Σ_{i_{n+1}∈S} p_{iₙ i_{n+1}} p_{i_{n+1} i_{n+2}}.
Now
where P^n_{ij} denotes the n-step transition probability P(Xₙ = j | X₀ = i). For n = 1, we simply write p_{ij}, and for n = 0 we have
P^0_{ij} = 1 if i = j, and P^0_{ij} = 0 if i ≠ j.
In the remark (a) after Theorem 3.1, we have P^2_{ij} = Σ_{k∈S} p_{ik} p_{kj}. Thus, if we view ℙ = [p_{ij}], i, j ∈ S, as a matrix, then P^2_{ij} is precisely the entry (i, j) of the product of the matrix ℙ with itself, i.e., of ℙ². Recursively, we can get all n-step transition probabilities from ℙ. Moreover, for all n, m ≥ 0, we have the following Chapman-Kolmogorov equation:
P^{n+m}_{ij} = Σ_{k∈S} P^n_{ik} P^m_{kj}, ∀i, j ∈ S.  (3.6)
Indeed,
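The Chapman-Kolmogorov equation is exactly the rule of matrix multiplication; a Python sketch with a small illustrative transition matrix (the entries are our own choice):

```python
# Chapman-Kolmogorov, P^(n+m)_ij = sum_k P^n_ik P^m_kj, checked by
# comparing P^5 with P^2 P^3 on a 2-state chain.
def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def matpow(P, n):
    # P^0 is the identity: P^0_ij = 1 if i = j, 0 otherwise.
    R = [[float(i == j) for j in range(len(P))] for i in range(len(P))]
    for _ in range(n):
        R = matmul(R, P)
    return R

P = [[0.9, 0.1],
     [0.3, 0.7]]

lhs = matpow(P, 5)                       # P^(2+3)
rhs = matmul(matpow(P, 2), matpow(P, 3)) # P^2 P^3
assert all(abs(lhs[i][j] - rhs[i][j]) < 1e-12 for i in range(2) for j in range(2))
```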
Let us formulate the concept of communication between states, which, in turn, is used to decompose the state space.
Definition 3.2 (a) State j can be reached from (or is accessible from) state i, in symbols i → j, if P^n_{ij} > 0 for some integer n ≥ 0. In words, j can be reached from i if, starting from i, the chain can reach j in a finite number of transitions with positive probability. Note that, since P^0_{ii} = 1 for all i, any state i can be reached from itself.
(b) If i → j and j → i, then i and j are said to communicate, in symbols i ↔ j.
C(i) = {j ∈ S : j ↔ i},
P^{n+m+k}_{ii} ≥ P^{n+k}_{ij} P^m_{ji},   P^{n+k}_{ij} ≥ P^n_{ij} P^k_{jj},
so that
P^{n+m+k}_{ii} ≥ P^n_{ij} P^k_{jj} P^m_{ji} > 0.
p_{ii} = 0 for all i ∈ S; p_{0i} = 1 if i = 1 and p_{0i} = 0 if i ≠ 1; and for i ≠ 0,
p_{ij} = p if j = i + 1,  and  p_{ij} = 1 − p if j = i − 1,
where p ∈ (0, 1). It can be checked that the chain is irreducible. Thus all states have the same period. Since P^n_{00} > 0 only when n = 2k, k ≥ 1, δ(0) = 2, and the chain is periodic with period 2.
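The period δ(0) = gcd{n ≥ 1 : P^n_{00} > 0} can be computed directly by powering a truncated transition matrix; a Python sketch of the walk above (the truncation level and p are our own choices; truncation cannot affect short-time return probabilities to 0):

```python
from math import gcd

# Reflecting random walk on {0, 1, ..., N}: p_01 = 1 and, for i >= 1,
# p_{i,i+1} = p, p_{i,i-1} = 1 - p (truncated at N for computation).
N, p = 10, 0.4
P = [[0.0] * (N + 1) for _ in range(N + 1)]
P[0][1] = 1.0
for i in range(1, N):
    P[i][i + 1] = p
    P[i][i - 1] = 1 - p
P[N][N - 1] = 1 - p
P[N][N] = p  # absorb the truncated mass so rows sum to 1

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

# delta(0) = gcd{ n >= 1 : P^n_00 > 0 }: only even n allow a return to 0.
Pn, returns = P, []
for n in range(1, 9):
    if Pn[0][0] > 1e-15:
        returns.append(n)
    Pn = matmul(Pn, P)

assert returns == [2, 4, 6, 8]
assert gcd(*returns) == 2  # the chain has period 2
```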
F_{ii}(s) = Σ_{n=1}^∞ f^n_{ii} sⁿ  and  P_{ii}(s) = Σ_{n=0}^∞ P^n_{ii} sⁿ.
Then
P_{ii}(s) = 1/(1 − F_{ii}(s)).  (3.7)
Indeed,
F_{ii}(s) P_{ii}(s) = Σ_{n=0}^∞ (Σ_{k=0}^n f^k_{ii} P^{n−k}_{ii}) sⁿ.
But
P(A) = P(Xₙ = i | X₀ = i) P(X₀ = i),
and hence (3.8) follows (note that f^0_{ii} = 0).
Theorem 3.2 A state i is recurrent if and only if Σ_{n=1}^∞ P^n_{ii} = ∞.
Proof.
(a) Necessity. Suppose Σ_{n=1}^∞ f^n_{ii} = 1. Then, using (i) of Abel's lemma (Exercise 3.10), we get lim_{s↑1} F_{ii}(s) = 1. Thus, by (3.5), lim_{s↑1} P_{ii}(s) = ∞. Using (ii) of Abel's lemma, we get Σ_{n=1}^∞ P^n_{ii} = ∞.
(b) Sufficiency. Suppose that Σ_{n=1}^∞ P^n_{ii} = ∞. If Σ_{n=1}^∞ f^n_{ii} < 1, then by (i) of Abel's lemma lim_{s↑1} F_{ii}(s) < 1, which, by (3.5), implies that lim_{s↑1} P_{ii}(s) < ∞. But then, by (ii) of Abel's lemma, Σ_{n=1}^∞ P^n_{ii} < ∞, contradicting the hypothesis. ◊
Remarks.
(i) While f_{ii} = Σ_{n=1}^∞ f^n_{ii} is the probability that, starting at i, the state i is eventually re-entered, the sum Σ_{n=1}^∞ P^n_{ii} is the expected number of returns to i. To see this, let N_i be the random variable whose value N_i(ω) is the number of times i appears in the realization X₁(ω), X₂(ω), .... In terms of the indicator function 1_{{i}}(·),
N_i(ω) = Σ_{n=1}^∞ 1_{{i}}(Xₙ(ω)),
so that
E(N_i | X₀ = i) = Σ_{n=1}^∞ P(Xₙ = i | X₀ = i) = Σ_{n=1}^∞ P^n_{ii}.
Proof. By hypothesis, P^n_{ij} > 0 and P^m_{ji} > 0 for some n, m ≥ 1. For k ≥ 1, we have P^{n+m+k}_{jj} ≥ P^m_{ji} P^k_{ii} P^n_{ij}. Thus
Σ_{k=0}^∞ P^{n+m+k}_{jj} ≥ P^m_{ji} P^n_{ij} Σ_{k=0}^∞ P^k_{ii}.
Note that P^m_{ji} P^n_{ij} > 0. Thus it is clear that if Σ_{k=0}^∞ P^k_{ii} = ∞, then Σ_{k=0}^∞ P^k_{jj} = ∞. ◊
From the above, we see that within each equivalence class, states are of the same nature. (For transience as a class property, see Exercise 3.11.) Moreover, as will be shown below, no state outside of a recurrent class A can be reached from the states in A. However, recurrent states in A can be reached from transient states outside of A (thus it is possible to leave a transient class, but then the chain will never return to it).
so that
lim_{n→∞} Σ_{j∈S} P^n_{ij} = lim_{n→∞} (1) = 1,
p_{ij} = 1 − p if j = i,  p_{ij} = p if j = i + 1,  and p_{ij} = 0 otherwise.
Then all states are transient. Indeed, for any i ∈ S, i → i + 1 but i + 1 ↛ i (see Exercise 3.13).
Thus, unlike the finite case, it is possible that there is no recurrent state in the infinite case. Also, in the infinite case, it is possible that all states in an infinite irreducible closed set are null recurrent. For example, consider Example 3.4 in Section 3.1. Suppose lifetimes are measured in units of time. Let g(n) be the common probability mass function of the Yₙ's. Then (Xₙ, n ≥ 0) is a Markov chain on S = {0, 1, 2, ...} with transition probabilities given by:
for i ≥ 1, p_{ij} = 1 if j = i − 1 and p_{ij} = 0 otherwise,
and p_{0j} = g(j + 1), ∀j ∈ S. Suppose g(j) > 0 for all j ≥ 1. It can be checked that the chain is irreducible. By examining the state 0, one finds that the chain is recurrent. If, in addition, Σ_{k=1}^∞ k g(k) = ∞, then state 0, and hence the chain, is null recurrent.
We close this section with a decomposition theorem. Again, in Example 3.5, the state space S = {1, 2, 3, 4} is decomposed into {1} ∪ {2, 3} ∪ {4}, in which {4} is the set of transient states, while {1} and {2, 3} are irreducible closed sets of recurrent states. This type of decomposition is true in general. The state space of an arbitrary Markov chain can be decomposed into a set of recurrent states (denoted by S_R) and a set of transient states (denoted by S_T), either of which may be empty. Moreover, when S_R ≠ ∅, it can be decomposed further, uniquely, into disjoint irreducible closed sets.
Proof. We will first show that for each i ∈ S_R, there is an irreducible closed
set B(i) containing i. Since S_R ⊆ S is at most countable, the collection of
such irreducible closed sets is at most countable. To obtain the theorem,
it then suffices to prove that any two B(i) and B(j) are either identical or
disjoint.
Let i ∈ S_R. Define B(i) = {j ∈ S_R : i → j}. Obviously i ∈ B(i) (since i
is recurrent). Let j, k ∈ B(i); then i ↔ j and i ↔ k in view of Theorem
3.4 (its proof), and hence j ↔ k, so that B(i) is irreducible. Let j ∈ B(i)
and k ∉ B(i). Then j ↛ k since, otherwise, i → k, so that k ∈ B(i). Thus
B(i) is closed. Suppose B(i) ∩ B(j) ≠ ∅. Let x ∈ B(i). Then x ↔ k,
where k ∈ B(i) ∩ B(j). But k ∈ B(j), so we must have x ∈ B(j) since B(j)
is closed. Thus B(i) ⊆ B(j). By symmetry, we conclude that B(i) = B(j).
◇
62 Lesson 3
    P_00^n = { 0   if n is odd
             { 1   if n is even.

Thus, P_00^n does not have a limit as n → ∞. The problem here is the lack
of aperiodicity.
(ii) Consider a Markov chain such that its state space S has two disjoint
closed sets A and B. Then, for each n ~ 1,
    P(X_n ∈ A | X_0 = i) = { 1   if i ∈ A
                           { 0   if i ∈ B
    lim_{n→∞} ℙ^n = (1/(a+b)) [ b  a ]
                              [ b  a ],

so that α(j) = lim_{n→∞} P_ij^n exists and α(·) is indeed a probability distri-
bution:

    α(0) = lim_{n→∞} P_00^n = lim_{n→∞} P_10^n = b/(a+b)

and

    α(1) = lim_{n→∞} P_01^n = lim_{n→∞} P_11^n = a/(a+b).
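The limit above is easy to check numerically. The sketch below (plain Python; the values a = 0.3, b = 0.1 are illustrative assumptions, not from the text) raises the two-state matrix to a high power and compares each row with (b/(a+b), a/(a+b)).

```python
# Two-state chain: P = [[1-a, a], [b, 1-b]]. Each row of P^n should
# converge to the limiting distribution (b/(a+b), a/(a+b)).
def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def mat_pow(P, n):
    R = [[1.0, 0.0], [0.0, 1.0]]  # identity
    for _ in range(n):
        R = mat_mul(R, P)
    return R

a, b = 0.3, 0.1                      # illustrative transition parameters
P = [[1 - a, a], [b, 1 - b]]
Pn = mat_pow(P, 200)
alpha = (b / (a + b), a / (a + b))   # claimed limit: (0.25, 0.75)
for i in range(2):
    for j in range(2):
        assert abs(Pn[i][j] - alpha[j]) < 1e-9
```

The rows agree regardless of the starting state, which is exactly the statement that α(·) is the limiting distribution.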
As mentioned in Section 3.2, the computation of ℙ^n is in general hard;
in fact, determining the limiting distribution of the chain, when it exists,
by other means will provide approximations for ℙ^n for large n.
To determine the limiting distribution π, we observe that π is invariant,
that is,

    π(j) = Σ_{i∈S} π(i) P_ij,   ∀j ∈ S.   (3.11)
An invariant distribution is also called a stationary distribution. Indeed,

    π(j) = lim_{n→∞} P(X_{n+1} = j) = lim_{n→∞} Σ_{i∈S} P(X_n = i) P_ij = Σ_{i∈S} π(i) P_ij
(by dominated convergence theorem, see Appendix). In fact, 11" is the unique
invariant distribution. Indeed, if π' is another invariant distribution, then
taking π_0 = π', we have P(X_n = j) = π'(j) for n ≥ 0 (see Remark below),
so that lim_{n→∞} P(X_n = j) = π'(j), implying that π'(·) = π(·), since
the convergence in distribution of the X_n's to π(·) is valid for any initial
distribution π_0.
Remark. If the initial distribution 11"0 is invariant (stationary), then the
Xn's have the same distribution (and hence the chain is a strictly stationary
process). This can be seen as follows. Suppose
then the chain has a unique stationary distribution; if the set of all positive
recurrent states of the chain is non-empty and is decomposed into distinct
irreducible closed sets, then the chain has distinct stationary distributions.
Proof. (a) Suppose that the chain has a stationary distribution π. Since
the chain is irreducible, all states are either transient or recurrent. If the
chain is transient, then lim_{n→∞} P_ij^n = 0 for all i, j ∈ S. Now, since π is a
stationary distribution, we have, for each n and j ∈ S:
Thus

    π(j) = lim_{n→∞} Σ_{i∈S} π(i) P_ij^n = Σ_{i∈S} π(i) lim_{n→∞} P_ij^n = 0
    μ_i = Σ_{n=1}^∞ n f_ii^{(n)}.
Take π_0 = π so that P(X_0 = i) = π(i), and the chain becomes a stationary
process. Now since (X_0 = i) ⊆ (T_i ≥ 1),

    P(T_i ≥ 1, X_0 = i) = P(X_0 = i).
Note that if A, B are two events, then P(AB) = P(A) − P(AB'), where B'
denotes the complement of B. For n ≥ 2, let

    A = (X_m ≠ i, m = 1, ..., n − 1),   B = (X_0 = i),
then

    lim_{n→∞} a_n = P(X_m ≠ i, m ≥ 0) = 0,

since i is a recurrent state.
Now, from the above, we have

    π(i) + a_0 − lim_{n→∞} a_n = P(X_0 = i) + P(X_0 ≠ i) = 1.
Now, observe that π(i) > 0, ∀i ∈ S. Indeed, if π(i) = 0 for some i, then,
since π is a stationary distribution, we have π(i) ≥ π(j) P_ji^n
for all n and all j ∈ S. On the other hand, since the chain is irreducible by
hypothesis, for each j there exists an n such that P_ji^n > 0; thus π(j) = 0 for
all j ∈ S, which is impossible since, again, π is a probability distribution on S. Thus
μ_i = 1/π(i) < ∞, ∀i ∈ S, implying that all states are positive recurrent.
    π(i) = 1/μ_i.

We are going to verify directly that π is indeed a stationary distribution,
that is,

    Σ_{i∈S} π(i) = 1   and   π(j) = Σ_{i∈S} π(i) P_ij,   ∀j ∈ S.
Let N_j(n) denote the number of visits to j during the first n transitions.
Then

    N_j(n) = Σ_{k=1}^n 1_{j}(X_k)

so that

    E(N_j(n) | X_0 = i) = Σ_{k=1}^n P_ij^k.
    T_m/m = [T_1 + (T_2 − T_1) + ... + (T_m − T_{m−1})]/m → μ_j,   as m → ∞.

    Σ_{j∈S} 1/μ_j = Σ_{j∈S} lim_{n→∞} [(1/n) E(N_j(n) | X_0 = i)]
                  ≤ lim_{n→∞} Σ_{j∈S} (1/n) E(N_j(n) | X_0 = i) = 1
    Σ_{j∈S} 1/μ_j = 1

and

    Σ_{j∈S} (1/μ_j) P_jk = 1/μ_k,   ∀k ∈ S.

The same result holds when S is infinite, but with much more technical
details. ◇
Corollary 3.1 An irreducible Markov chain on a finite state space has a
unique stationary distribution.
Proof. In view of Theorem 3.5, such a chain is positive recurrent, and the
result follows from Theorem 3.7.
Remark. In general, a Markov chain on a finite state space has at least
one stationary distribution. It is not necessarily so if the state space is
infinite.
As mentioned previously, even if a Markov chain has a unique stationary
distribution π, it may happen that π is not the limiting distribution of the
chain, i.e. the chain may not be stable. However, if this unique stationary
distribution π is such that lim_{n→∞} P_ij^n = π(j), j ∈ S, then necessarily
π is the limiting distribution of the chain. It can be shown that for an
irreducible, aperiodic chain, we have

    lim_{n→∞} P_ij^n = 1/μ_j,   for all i, j ∈ S.
If the irreducible chain (X_n, n ≥ 0) is periodic with period δ, then the chain
(X_{δn}, n ≥ 0) is aperiodic, and hence
Moreover

    lim_{n→∞} P_ij^n = π(j),   ∀i, j ∈ S.

Note that, in such a case, approximations to ℙ^n, for n large, can be ob-
tained.
We close this Lesson with an important stochastic model in physics and
biology, (population growth). Consider the following random phenomenon.
A population of "objects" evolves in generations as follows. Each object can
produce identical objects (offspring). The offspring of members of the initial
(zero) generation form the first generation, the offspring of the members of
the first generation form the second generation, and so on. Suppose that
each object independently produces offspring according to a probability
mass function f on S = {0, 1, 2, ...}.
Let Z_i(n) be the number of offspring of the ith member of the nth generation;
then the Z_i(n)'s are independent and have f as common offspring distribution.
Let X_n be the size of the nth generation. Then
    P_ij = e^{−λi} (λi)^j / j!,   i, j ∈ {0, 1, 2, ...}.
Note also that if f(1) = 1 − f(0), then the sequence X_n is non-increasing (a.s.),
so that extinction cannot be avoided. Thus, in the sequel, we assume that
Since the subset of states {1, 2, ...} is a transient class, we see that

    lim_{n→∞} X_n(ω) = { 0   if T(ω) < ∞
                       { ∞   if T(ω) = ∞.

Thus

    P(lim_{n→∞} X_n = 0 | X_0 = i) = P(T < ∞ | X_0 = i) = a^i,

and

    P(lim_{n→∞} X_n = ∞ | X_0 = i) = P(T = ∞ | X_0 = i) = 1 − a^i,

for all i ∈ S = {0, 1, 2, ...}.
If a = 1, then extinction is certain. If a < 1, then the probability of
extinction, given X_0 = i, is a^i < 1, and 1 − a^i > 0 is the probability that
the population grows to infinity.
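The extinction probability a is the smallest root in [0, 1] of the fixed-point equation a = G(a), where G is the offspring generating function introduced just below. As a hedged illustration (the Poisson offspring law with mean m is an assumption for this sketch, not taken from the text), G(t) = e^{m(t−1)}, and iterating a ↦ G(a) from 0 converges monotonically to the smallest fixed point:

```python
import math

def extinction_prob(m, iters=2000):
    """Smallest root of a = G(a) for Poisson(m) offspring,
    where G(t) = exp(m*(t-1)); fixed-point iteration from 0."""
    a = 0.0
    for _ in range(iters):
        a = math.exp(m * (a - 1.0))
    return a

a_super = extinction_prob(2.0)   # supercritical case: extinction prob < 1
a_sub = extinction_prob(0.5)     # subcritical case: extinction is certain

assert a_super < 1.0
assert abs(a_super - math.exp(2.0 * (a_super - 1.0))) < 1e-12  # a = G(a)
assert abs(a_sub - 1.0) < 1e-6
```

The dichotomy seen here (a < 1 exactly when the mean offspring number exceeds 1) is the content of the extinction criterion discussed around equation a = G(a).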
72 Lesson 3
    G(t) = Σ_{n=0}^∞ f(n) t^n,   t ∈ [0, 1].

    Σ_{k=0}^∞ P_{1k} a^k = G(a),

by noting that
3.5 Exercises
3.1. Verify the Markov property of Example 3.1 and find the one-step
transition probability matrix.
3.2. Show that if (Xn, n 2: 0) is a sequence of i.i.d. discrete random
variables, then it is a Markov chain.
3.3. Let (Xn, n 2: 0) be a Markov chain.
(i) Show that, ∀n, X_{n+1} is conditionally independent of X_0, X_1, ..., X_{n−1},
given X_n. Hint:
(ii) Use (i) to show that for n_1 < ... < n_k < n_{k+1},

    P(X_{n_{k+1}} = i_{k+1} | X_{n_1} = i_1, ..., X_{n_k} = i_k) = P(X_{n_{k+1}} = i_{k+1} | X_{n_k} = i_k).
3.4. (Ehrenfest Model) Two urns U_1 and U_2 contain M balls in total. A
ball is drawn at random. This selected ball is transferred from the urn it
is in to the other. Let X_n denote the number of balls in U_1 at the end of
the nth trial.
(a) Show that (Xn, n 2: 0) is a Markov chain.
(b) Specify the state space and one-step transition matrix of the chain.
3.5. Let (Y_n, n ≥ 1) be a sequence of i.i.d. random variables such that

    X_n = { Y_1 + Y_2 + ... + Y_n   for n ≥ 1
          { 0                       for n = 0.

Show that (X_n, n ≥ 0) is a Markov chain and find its transition probability
matrix.
3.6. Let (Xn, n 2: 0) be a Markov chain on S = {a, b, c} with transition
l!]
probability matrix
0
IP = [Pij] = [ 0 t! 0
1.
4
1.
4
0
(i) Compute P(X_4 = b | X_1 = a) and P(X_5 = b | X_1 = c, X_3 = c).
(ii) Describe the evolution of the chain by drawing a directed graph.
(iii) Indicate the states which communicate with each other. Are there
any absorbing states?
3.7. Let (X_n, n ≥ 0) be a Markov chain with state space S. Let ∅ ≠ A ⊆ S.
(i) Show that A is closed if and only if P_ij = 0 for all i ∈ A and j ∉ A.
(ii) Show that a closed set A is irreducible if and only if all states in A
communicate with each other.
3.8. Show that the communication relation ↔ on the state space S of a
Markov chain is an equivalence relation.
3.9. Let (Xn, n ~ 0) be a Markov chain on S with transition probability
matrix IP.
(i) Show that, for i ∈ S, if P_ii^n > 0 then P_ii^m > 0 for some m > n. (Hint:
use the Chapman-Kolmogorov equation.)
(ii) Let f_n be the probability mass function of X_n. Verify that, for all
j ∈ S,

    f_{n+1}(j) = Σ_{i∈S} f_n(i) P_ij.
3.10. Show the following:
(i) If a_n ≥ 0, then

    lim_{s↗1} Σ_{n=0}^∞ a_n s^n = Σ_{n=0}^∞ a_n.

(ii) If a_n ≥ 0 and

    lim_{s↗1} Σ_{n=0}^∞ a_n s^n = a < ∞,

then Σ_{n=0}^∞ a_n = a.
3.11. Show that transience and positive recurrence are class properties.
3.12. Let A be a recurrent class. Verify that the sub-matrix [P_ij], i, j ∈
A, is a stochastic matrix, and that the associated Markov chain is recurrent
and irreducible.
3.13. Let (X_n, n ≥ 0) be a Markov chain with state space S and transition
matrix ℙ.
    ℙ = [ 0.2  0.8  0    0    0    0   ]
        [ 0.5  0.5  0    0    0    0   ]
        [ 0.1  0.2  0.3  0.4  0    0   ]
        [ 0.1  0    0.2  0.3  0    0.4 ]
        [ 0    0    0    0    0.3  0.7 ]
        [ 0    0    0    0    0.4  0.6 ]
(ii) Show that the set of all null recurrent states is closed.
3.16. Let (X_n, n ≥ 0) be a Markov chain with state space S and transition
matrix ℙ. Define N_j(ω) = the number of times j appears in the sequence
(X_1(ω), X_2(ω), ...).
(a) Show that ∀k ≥ 0,

    P(N_j = k | X_0 = i) = { 1 − f_ij                        for k = 0
                           { f_ij (f_jj)^{k−1} (1 − f_jj)    for k ≥ 1
and

    P(N_i = k | X_0 = i) = (f_ii)^k (1 − f_ii),   for k ≥ 0.
(b) Suppose j is a transient state. Show that P(N_j = ∞ | X_0 = i) = 0.
(c) Suppose j is a recurrent state. Show that
(Hint: use Exercise 3.16 (a) to show that Σ_n P_ij^n < ∞.)
3.18. Let (X_n, n ≥ 0) be a Markov chain on S = {..., −2, −1, 0, 1, 2, ...}
with

    P_ij = { p       if j = i + 1
           { 1 − p   if j = i − 1

for all i ∈ S, where p ∈ (0, 1).
(i) Verify that this chain is irreducible and compute P_ij^n for n ≥ 0.
(ii) Show that the chain is a recurrent chain when p = 1/2. (Hint:
n! ~ √(2πn) e^{−n} n^n, as n → ∞.)
3.19. Let
0.2 0.8 0
F=U
1 0 0
o 0 1
010
be the transition matrix of a Markov chain on S = {a, b, c, d}. Determine
the stationary distributions of the chain.
3.20. Let (Xn, n ~ 0) be a Markov chain on S = {O, I} with
    ℙ = [ 1 − p    p   ]
        [   q    1 − q ].

(ii) Find the unique stationary distribution π of the chain and verify
that

    π(j) = lim_{n→∞} P_ij^n,   ∀j ∈ S.
3.21. Let (Xn, n ~ 0) be a Markov chain on S = {O, 1,2,3,4, 5} with
    ℙ = [ 0.1  0.9  0    0    0    0   ]
        [ 0.5  0.5  0    0    0    0   ]
        [ 0.1  0.2  0.3  0.4  0    0   ]
        [ 0.1  0    0.2  0.3  0    0.4 ]
        [ 0    0    0    0    0.9  0.1 ]
        [ 0    0    0    0    0.5  0.5 ]
Poisson Processes
These relations are visible in Figure 1, which shows a typical sample path
of the Counting Process (N_t).
[Figure 1: a typical sample path of (N_t); N_t increases by 1 at each of the
arrival times T_1, T_2, T_3, T_4, T_5.]
The following relations between (Nt) and (Tn) are also of interest
On the other hand, if the sources which generate the events are indepen-
dent, then it is natural to suppose that the respective numbers of events
which occur on nonoverlapping time intervals are stochastically indepen-
dent.
Furthermore, if the sources keep the same intensity over time, then
the distribution of N_{t+h} − N_{s+h} does not depend on h.
    P(N_{t_1} = n_1, ..., N_{t_k} = n_k)
        = e^{−λt_1} (λt_1)^{n_1} / n_1! × ...
          × e^{−λ(t_k − t_{k−1})} (λ(t_k − t_{k−1}))^{n_k − n_{k−1}} / (n_k − n_{k−1})!
          × 1_{0 ≤ n_1 ≤ ... ≤ n_k},   (4.7)

where n_1, ..., n_k ∈ ℕ. Now, according to Kolmogorov's existence theorem
(see Lesson 2), the distribution of the entire process is determined.
(See Lesson 2), the distribution of the entire process is determined.
Before making some comments about the axioms, we give the proof of
Theorem 4.1. In that proof and in the remainder of the Lesson, the
expression "with probability one" will be omitted.
Proof. Let g_{t−s} be the moment generating function of N_t − N_s.
Using the decomposition N_t = (N_t − N_s) + (N_s − N_0), and axioms A1 and
A2, we get
which implies for each pair (p, q) of integers
On the other hand, (4.9) entails that t ↦ g_t(u) is decreasing; consequently
(4.10) remains valid for irrational t's:
Now it is clear that Λ(0) ≠ 0, since otherwise (4.12) implies g_t(0) = 1 for
each t > 0, consequently 1 = P(N_t = 0) = P(T_1 > t) for each t > 0, hence
T_1 = +∞ a.s., which contradicts A0. Thus (4.14) may be written in the
form
    P(N_h ≥ 2) / (1 − e^{−hΛ(0)}) ≤ P(T_2 < T_1 + h).

Now, as h ↓ 0, P(T_2 < T_1 + h) ↓ P(T_2 ≤ T_1) = 0 and 1 − e^{−hΛ(0)} ~ hΛ(0),
hence (4.13).
On the other hand, we have

    Λ(u) = lim_{h↓0} (1/h) (1 − e^{−hΛ(u)}),
so by (4.8) and (4.12)
Consequently,
which is the moment generating function of P(λt), and the proof is complete.
◇
The following important properties of (Nt) have been obtained in the
above proof:
    lim_{h→0} P(N_{t+h} − N_t ≥ 2) / P(N_{t+h} − N_t = 1) = 0.
    W_n = T_n − T_{n−1},   n ≥ 1.
If (Nt) is a Poisson Process, then the sequence (Wn ) has some special
properties given by the following
Theorem 4.2 Let (N_t) be a Poisson Process with intensity λ. Then the
W_n's are independent with common exponential distribution characterized
by

    P(W_n > t) = e^{−λt},   t > 0,  n ≥ 1,   (4.18)

and consequently

    E(W_n) = 1/λ,   n ≥ 1.   (4.19)
    (y_1, y_2) ↦ λ² e^{−λy_2} 1_{0<y_1<y_2}

is the density of (T_1, T_2). Since (T_1, T_2) = (W_1, W_1 + W_2), it follows that
the density of (W_1, W_2) is

    λ² e^{−λ(w_1+w_2)} 1_{w_1>0, w_2>0},
    f_n(t) = λ e^{−λt} (λt)^{n−1} / (n−1)! · 1_{ℝ+}(t).   (4.21)

    P(T_n > t) = P(N_t < n) = Σ_{j=0}^{n−1} e^{−λt} (λt)^j / j!,   t > 0.
Taking the derivative with respect to t, we get
Theorem 4.3 Let (T_n) be a Point Process such that the random variables
W_n = T_n − T_{n−1}, n ≥ 1, are independent with the same exponential
distribution E(λ). Then the associated Counting Process (N_t) is a Poisson
Process with intensity λ.
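Theorem 4.3 suggests a direct way to simulate a Poisson process: generate i.i.d. exponential interarrival times W_n and count how many partial sums T_n fall in [0, t]. The sketch below is an illustration (the parameter values are assumptions, not from the text); it checks the Monte Carlo mean of N_t against λt.

```python
import random

def poisson_count(lam, t, rng):
    """Number of arrivals in [0, t]: count partial sums T_n of
    i.i.d. Exp(lam) interarrival times W_n that do not exceed t."""
    n, total = 0, rng.expovariate(lam)
    while total <= t:
        n += 1
        total += rng.expovariate(lam)
    return n

rng = random.Random(0)
lam, t, reps = 1.5, 2.0, 20000
mean = sum(poisson_count(lam, t, rng) for _ in range(reps)) / reps
assert abs(mean - lam * t) < 0.1   # E(N_t) = lambda * t = 3
```

This construction is exactly the content of the theorem: the counting process built from exponential interarrivals has Poisson-distributed increments.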
Setting

    t_i = w_1 + ... + w_i,   1 ≤ i ≤ n,

we obtain the density of (T_1, ..., T_n):

    ∫_{t_{k+n}}^∞ λ^{k+n+1} e^{−λt_{k+n+1}} dt_{k+n+1} = λ^{k+n} e^{−λt_{k+n}},

and

    ∫_{0<t_1<...<t_k≤s} dt_1 ... dt_k = s^k / k!.
Combining the above results and applying Fubini's Theorem (see Appendix),
we obtain
k ≥ 1, 0 ≤ t_1 < ... < t_k, or by
where ~ means "is distributed as" and ⊗ denotes the product measure (see
Appendix).
Theorem 4.4 Let (N_t) be a Poisson Process associated with the Point
Process (T_n); then
where (U_{(1)}, ..., U_{(k)}) denotes the order statistics associated with i.i.d.
random variables with uniform distribution over [0, t].
Proof. It is easy to show that (U_{(1)}, ..., U_{(k)}) has the density
k!/t^k · 1_{0<u_1<...<u_k<t} (exercise).
Now let us consider the conditional probability

    P(t_i ≤ T_i ≤ t_i + h_i; 1 ≤ i ≤ k | N_t = k) = P(B | A) = P(A ∩ B) / P(A),

where 0 < t_1 < t_1 + h_1 < t_2 < ... < t_k + h_k < t.
Noting that A ∩ B = (t_i ≤ T_i ≤ t_i + h_i; 1 ≤ i ≤ k, T_{k+1} > t) and that
the density of (T_1, ..., T_{k+1}) is λ^{k+1} e^{−λu_{k+1}} 1_{0<u_1<...<u_{k+1}}
(see (4.22)), we get
    N_t^{(s)} = N_{s+t} − N_s,   t ≥ 0.
Then we have the following renewal property of (Nt).
Proof. The proof of the first claim is straightforward, since A0, A1, and
A2 are clearly satisfied.
The second claim means that the random vectors U_1 = (N_{t_1}, ..., N_{t_k})
and V_1 = (N^{(s)}_{t_{k+1}}, ..., N^{(s)}_{t_{k+h}}) are independent for any choice of
h ≥ 1, k ≥ 1, and 0 ≤ t_1 < ... < t_k ≤ s < t_{k+1} < ... < t_{k+h}.
In order to prove that statement it suffices to remark that U = (N_{t_1}, N_{t_2} −
N_{t_1}, ..., N_{t_k} − N_{t_{k−1}}) and
    P((N_t − λt)/√(λt) ≤ x) → (1/√(2π)) ∫_{−∞}^x e^{−u²/2} du,   x ∈ ℝ.   (4.30)

    M_s = N_{Λ^{−1}(s)},   s ≥ 0,   (4.31)
    lim_{h→0+} P(N_{t+h} − N_t = k | N_{t+h} − N_t ≥ 1) = r_k.
- Poisson processes in ℝ^d: Let E be a bounded Borel set in ℝ^d and
let m be a bounded measure on (E, B_E), where B_E is the Borel σ-field of
E. A family (N_B, B ∈ B_E) of integer-valued random variables is called a
Poisson Process with mean measure m if N_B ~ P(m(B)) for any B ∈ B_E,
and if for any k ≥ 2 and any disjoint elements B_1, ..., B_k of B_E, the r.v.'s
N_{B_1}, ..., N_{B_k} are independent.
- A Cox process is a non-homogeneous Poisson process where (Λ(t), t >
0) is itself a stochastic process.
4.6 Exercises
4.1. Customers arrive at a shop according to a Poisson Process at a mean
rate of 10 times per hour. Find the
(i) Probability that only 1 customer arrives in 2 minutes,
(ii) Probability of no customer in 5 minutes, and
(iii) Probability that in each of two disjoint 2-minute time intervals at
least 2 customers arrive.
(Hint: Use N_t ~ P(t/6), with t in minutes.)
4.2. With the same assumption as in exercise 1, compute the probability
that the time interval between successive arrivals will be
(i) longer than 6 minutes,
4.3. A particle counter records only every second particle arriving at the
counter. Particles arrive according to a Poisson process at a mean rate of
6 per minute. Let S be the waiting time between two successive recorded
particles. Find
(i) The distribution of S,
(ii) E(S) and V(S),
(iii) P(S < 1).
(Hint: write S = W_n + W_{n+1}.)
4.4. Let (N_t, t ≥ 0) be a Counting process satisfying axioms A'_0, A_1, A_2,
and A_3. Show that (4.6) is valid.
4.5. Let (T_n, n ≥ 1) be a Point process associated with a Poisson process
with intensity λ. Show (4.11) using the relation T_n = W_1 + ... + W_n.
4.6. Prove Theorem 4.4.
4.7. Let (Nt, t ~ 0) be a Poisson process with intensity A and let s be a
strictly positive instant.
(i) Show that
    M_t = N_{s+t} − N_s,   t ≥ 0.
    N_B = Σ_{n=1}^∞ 1_B(T_n),   B ∈ B_E.
(ii) Find the distribution of (N_{B_1}, ..., N_{B_k}), where B_1, ..., B_k are dis-
joint Borel sets of E.
(iii) Show that (NB , B E BE) is a Poisson process in IR and find its
mean measure.
4.14. Let (N_t, t ≥ 0) be a nonhomogeneous Poisson process with intensity
function λ(t). Find the conditional distribution of (T_1, ..., T_k) given
N_t = k.
4.15. Let X_t = Σ_{n=1}^{N_t} Y_n, t ≥ 0, be a compound Poisson process.
Suppose that λ is the intensity of (N_t) and Y_n is a zero-mean random
variable with variance σ² > 0 and characteristic function φ, n ≥ 1.
(i) Find the characteristic function of X_t.
(ii) Find the asymptotic distribution of X_t/√λ as λ tends to infinity.
Lesson 5
Continuous - Time
Markov Chains
    P(N_t = j | N_{s_1} = i_1, ..., N_{s_n} = i_n, N_s = i)
      = P(N_{s_1} = i_1, ..., N_s = i, N_t = j) / P(N_{s_1} = i_1, ..., N_s = i)
      = P(N_{s_1} = i_1, N_{s_2} − N_{s_1} = i_2 − i_1, ..., N_t − N_s = j − i)
        / P(N_{s_1} = i_1, N_{s_2} − N_{s_1} = i_2 − i_1, ..., N_s − N_{s_n} = i − i_n)
      = P(N_t − N_s = j − i) = P(N_t = j | N_s = i).
95
Moreover,

    ℙ(0) = I = [δ_ij],

the identity matrix, where

    δ_ij = { 1   if i = j
           { 0   if i ≠ j.
with
and
P (Xt+h - X t = 11Xt = i) = Aih + o(h). (5.3)
The positive numbers Ai, i E S, are called the birth rates of the process.
Remarks.
(i) Ai is interpreted as the birth rate at an instant at which the popula-
tion size is i.
(ii) A Poisson process is a birth process with Ai = A, for all i E S.
(iii) (5.2) and (5.3) imply that
and hence
Example 5.2 Suppose that, in the population of Example 5.1, new individ-
uals immigrate into the population at a constant rate ν. Then the birth rates
become λ_i = iλ + ν. Chains of this type are called linear birth processes
with immigration.
In a birth process, the population size can only increase with time. To
model random phenomena in which a population can increase as well as
decrease in size, say by births and deaths, we need to include the concept
of death rates into the description of this more general type of processes.
    P(X_{t+h} − X_t = k | X_t = i) = { λ_i h + o(h)   if k = 1
                                     { μ_i h + o(h)   if k = −1    (5.4)
                                     { o(h)           if |k| > 1.
Remark.
It is assumed that births and deaths occur independently of each other.
Of course, λ_i ≥ 0, μ_i ≥ 0 with μ_0 = 0. We will discuss the problem of
modeling a birth and death process with given λ_i and μ_i later. If μ_i = 0
for all i ≥ 0, then the chain is called a birth chain; if λ_i = 0 for all i ≥ 0,
then the chain is called a death chain.
(Note that P(|X_{t+h} − X_t| ≥ 2 | X_t = i) = o(h).) Thus, λ_i = λ and μ_i = iμ.
This can be explained by saying that the chain has a positive probability of
escaping to infinity (by adding the element ∞ to the state space S). Such
a chain is said to be dishonest. The chain is honest when (5.6) holds.
(5.7) is the Chapman-Kolmogorov equation. Its proof is similar to the
discrete-time case and is left as an exercise. In matrix form, (5.7) expresses
the fact that the function t ↦ ℙ(t) has the "semigroup property":
matrices for a Markov chain if (5.5) - (5.7) hold. The condition (5.7) is
essential since it allows one to define finite-dimensional distributions via
That is, the function P_ij(t) is right continuous at t = 0 (recalling P_ij(0) =
δ_ij). This condition turns out to be general enough for investigating general
Markov chains.
Definition 5.4 Let (IP(t), t ~ 0) be the transition matrix function of a
continuous-time Markov chain (Xt, t ~ 0) on S. We say that (IP(t), t ~ 0)
is standard if
    lim_{t↘0} ℙ(t) = I.   (5.9)
Remark.
For (5.9) to hold, it suffices that

    lim_{t↘0} P_ii(t) = 1,   for all i ∈ S.
    Σ_{j∈S} q_ij ≤ 0.
Taking limits as t ↘ 0 of both sides (on the right hand side, first consider
a finite number of terms, then let the number of terms go to infinity), we
obtain

    −q_ii ≥ Σ_{j≠i} q_ij.
    Σ_{j∈S} P_ij(t) = 1,

so that

    1 − P_ii(t) = Σ_{j≠i} P_ij(t).
Since the P'_ij(0) exist and are finite for j ≠ i, it follows (since S
is finite) that q_ii exists and is finite. The situation when S is infinite is more
complicated. For example, although P'_ij(0) < ∞ for j ≠ i, q_ii might be
−∞; also, even if q_ii is finite, (5.10) may not hold.
From the above analysis, we see that the generator Q of an (honest)
Markov chain should be such that

    q_ij ≥ 0   for j ≠ i

and

    Σ_{j∈S} q_ij = 0   for all i.
or in matrix form
IP'(t) = IP(t)Q. (5.11)
Similarly, differentiating (5.7) with respect to t and setting t = 0, we get
the Kolmogorov Backward equation
Let us look at the case where S is finite. Suppose we are given a matrix Q
with q_ij ≥ 0 for j ≠ i, |q_ii| < ∞ for i ∈ S, and Σ_{j∈S} q_ij = 0 for all i ∈ S. Let
    Z_ij(t) = δ_ij + Σ_{n=1}^∞ (t^n / n!) q_ij^{(n)},   (5.13)

where q_ij^{(n)} denotes a generic element of the nth power of the matrix Q,
i.e., Q^n = [q_ij^{(n)}].
Since the matrix Q is finite, let a = max_{i,j} |q_ij| < ∞; then obviously,

    |q_ij^{(n)}| ≤ c a^n,
so that the series in (5.13) is dominated by Σ_{n=1}^∞ (t^n/n!) c a^n =
c (e^{at} − 1) < ∞; in particular it converges for every t, and Z(0) = I.
    lim_{t→0} P_ii(t) = 1   uniformly in i.   (5.15)
Note that, since Σ_{j∈S} P_ij(t) = 1, we have P_ij(t) ≤ 1 − P_ii(t) for j ≠ i,
so that (5.15) implies

    lim_{t→0} P_ij(t) = 0.
It can be shown that, under (5.15), Σ_{j∈S} q_ij = 0 for all i ∈ S. Moreover,
ℙ(t) is the unique solution of Kolmogorov's equations, namely ℙ(t) = e^{tQ}.
In other words, if (ℙ(t), t ≥ 0) of a Markov chain (X_t) is uniform, then the
knowledge of the generator Q = ℙ'(0), together with the initial distribution
of X_0, determines the distribution of the chain.
Now, in view of Exercise 5.14(ii), the condition (5.15) is clearly
satisfied when

    sup_i |q_ii| < ∞.   (5.16)
then
Z(t) = [Zij(t)]
is the unique solution to both Kolmogorov equations.
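For a finite state space, the series (5.13) is just the matrix exponential ℙ(t) = e^{tQ} computed term by term. A small sketch (the 2×2 generator below is an illustrative assumption, not from the text) truncates the series and checks both that rows sum to 1 and the semigroup property ℙ(s+t) = ℙ(s)ℙ(t):

```python
def mat_mul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def expm(Q, t, terms=60):
    """P(t) = e^{tQ} via the truncated series I + sum_n (t^n/n!) Q^n."""
    n = len(Q)
    P = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in P]        # term_0 = I
    for k in range(1, terms):
        term = mat_mul(term, Q)         # term_k = term_{k-1} * Q * (t/k)
        term = [[t / k * x for x in row] for row in term]
        P = [[P[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return P

Q = [[-1.0, 1.0], [2.0, -2.0]]          # illustrative generator
P3, prod = expm(Q, 1.1), mat_mul(expm(Q, 0.4), expm(Q, 0.7))
for i in range(2):
    assert abs(sum(P3[i]) - 1.0) < 1e-9           # rows sum to 1
    for j in range(2):
        assert abs(P3[i][j] - prod[i][j]) < 1e-9  # semigroup property
```

Because the rows of Q sum to 0, the rows of e^{tQ} sum to 1, which is the honesty condition (5.23) in this finite setting.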
The meaning of q_ii is clarified as follows. If (N_t, t ≥ 0) is a Poisson
process with intensity λ, then

    q_ij = { λ    if j = i + 1
           { −λ   if j = i
           { 0    otherwise.
    lim_{n→∞} T_n = ∞   (a.s.)
The times T_n, n ≥ 0, are the instants of transitions of the chain. At time
T_n the chain is in state Y_n = X_{T_n}. Thus

    X_t = Σ_{n=0}^∞ Y_n 1_{[T_n, T_{n+1})}(t).
The condition

    lim_{n→∞} T_n = ∞   (a.s.)

is needed for the above representation of X_t to be valid for all t ≥ 0. Chains
satisfying this condition are called non-explosive (or regular). If, with posi-
tive probability, lim_{n→∞} T_n < ∞, then the chain explodes, in the sense that
it can make an infinite number of transitions during a time interval of finite
length (so that sample paths of the chain might not be step functions. By
a step function, we mean a function such that, in any finite time interval,
it has at most a finite number of jumps). It turns out that if (IP(t), t ~ 0)
satisfies (5.15), then explosions are not possible (so that almost all sample
paths of the corresponding chain are step functions). In applications, as we
have seen before, this condition can be checked by looking at the matrix of
infinitesimal transition rates (generator) Q, namely (5.16).
Suppose that (5.16) is satisfied. We are going to show that waiting (or
holding) times in states are exponentially distributed, and in fact, condi-
tionally upon states being visited, these random times are independent.
Given that X_s = i, the waiting time W_i of the chain in state i is the
(random) time until the chain first leaves i, that is
    A = ∩_{n=1}^∞ A_n,   A_n = {ω : X_{s+u}(ω) = i, ∀u ∈ [0, t] of the form u = kt/2^n}.
Note that

    P(A_n | X_s = i)
      = P(X_s = i, X_{s+t/2^n} = i, ..., X_{s+(2^n−1)t/2^n} = i, X_{s+t} = i | X_s = i)
      = [P_ii(t/2^n)]^{2^n}   (by the Markov property)
    lim_{n→∞} P(A_n | X_s = i) = lim_{n→∞} [P_ii(t/2^n)]^{2^n}
    lim_{h↘0} (P_ii(h) − 1)/h = −∞,

or

    lim_{h↘0} (1 − P_ii(h))/h = ∞,

meaning that, for arbitrary 0 < a < ∞, we have (1 − P_ii(h)) h^{−1} > a for h
sufficiently small. Thus, for n sufficiently large,
implies that

    lim_{n→∞} [P_ii(t/2^n)]^{2^n} ≤ e^{−at},
state such that q_ii > −∞ is called stable. A stable state i such that q_ii = 0
is called an absorbing state (P(W_i > t | X_s = i) = 1 for all t > 0; once
the chain enters i, it remains there forever). When entering a stable, non-
absorbing state i (−∞ < q_ii < 0), the chain spends a random time W_i in
absorbing state i (-00 < qii < 0), the chain spends a random time Wi in
i, where Wi is exponentially distributed with mean -l/qii, then jumps to
another state.
Consider a Markov chain (X_t, t ≥ 0) such that all states are stable
(q_ii > −∞, ∀i ∈ S). Using the strong Markov property (see Lesson 3),
it can be shown that the successive states visited by (X_t), namely Y_n =
X_{T_n}, n ≥ 0, form a discrete-time Markov chain whose one-step transition
matrix R = [R_ij] (called the jump matrix) is determined as follows.
If i is absorbing (q_ii = 0), then the chain (X_t) will remain in i perma-
nently once it has entered it. Thus it cannot jump to any other state, hence

    R_ij = P(Y_{n+1} = j | Y_n = i) = { 1   if j = i
                                      { 0   if j ≠ i.
When q_ii < 0 (recall that q_ii ≤ 0 for all i and 0 ≤ q_ij < ∞ for all j ≠ i),
that is, when i is a non-absorbing state, then obviously R_ii = 0.
For j :/; i and i non-absorbing, we have
Rij = -qij/qii. (5.18)
Thus we have

    R_ij = { δ_ij                     if q_ii = 0
           { (δ_ij − 1) q_ij / q_ii   if q_ii < 0.
The derivation of (5.18) is essentially based upon the strong Markov
property. To see why (5.18) holds, argue as follows.
    −q_ij / q_ii = lim_{h↘0} P_ij(h) / (1 − P_ii(h)),
where P_ij(h)/(1 − P_ii(h)) is the conditional probability that the chain jumps
to state j given that the chain is in state i in the time interval (t, t + h)
and is going to jump during that interval.
In summary, under suitable conditions, the structure of (X_t) can be
described as follows. The discrete-time chain Y_n = X_{T_n}, n ≥ 0, is Markov
with one-step transition matrix R. Conditionally upon (Y_n), the waiting
times T_{n+1} − T_n, n ≥ 0, are independent and exponentially distributed with
parameters depending on the states being visited. Thus when the chain (X_t)
enters an absorbing state, it will stay there forever, whereas if it reaches a
non-absorbing state X_{T_n} = i, it will spend a random time W_i in that state,
where W_i is exponentially distributed with mean −1/q_ii, and then jumps
to another state X_{T_{n+1}} = j with probability R_ij.
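The summary above translates directly into a simulation scheme: hold in state i for an Exp(−q_ii) time, then jump according to the row R_i of the jump matrix. The sketch below (the two-state generator is an illustrative assumption) estimates the long-run fraction of time spent in state 0 and compares it with the value 2/3 implied by the holding rates 1 and 2.

```python
import random

def simulate_ctmc(Q, x0, horizon, rng):
    """Simulate a CTMC from generator Q: exponential holding time
    with rate -q_ii in state i, then a jump with R_ij = -q_ij/q_ii."""
    time_in = [0.0] * len(Q)
    t, i = 0.0, x0
    while t < horizon:
        hold = rng.expovariate(-Q[i][i])
        time_in[i] += min(hold, horizon - t)
        t += hold
        # jump probabilities of the embedded (jump) chain
        probs = [(-Q[i][j] / Q[i][i]) if j != i else 0.0
                 for j in range(len(Q))]
        u, acc = rng.random(), 0.0
        for j, p in enumerate(probs):
            acc += p
            if u <= acc:
                i = j
                break
    return time_in

rng = random.Random(1)
Q = [[-1.0, 1.0], [2.0, -2.0]]  # leave 0 at rate 1, leave 1 at rate 2
occ = simulate_ctmc(Q, 0, 50000.0, rng)
assert abs(occ[0] / sum(occ) - 2.0 / 3.0) < 0.02
```

The 2/3 arises because the alternating jump chain visits both states equally often while the mean holding times are 1 and 1/2.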
    q_{i,i+1} = λ_i,
    q_{i,i−1} = μ_i,
    q_ii = −(λ_i + μ_i),

and q_ij = 0 otherwise. Note that for each i,
    P'_ii(t) = −λ_i P_ii(t)

so that

    P_ii(t) = e^{−λ_i t},   t ≥ 0.
Other Pij(t), for j > i, can be computed recursively, via
Thus, for arbitrary specified birth rates λ_i, i ≥ 0, the above P_ij(t) given
by (5.22) are non-negative and satisfy the Chapman-Kolmogorov equation.
But it may happen that
Since the P_ij(t)'s are functions of the birth rates λ_i's, this phenomenon can
be checked by examining the generator Q. It can be verified that if (5.22)
provides a proper probability distribution, that is,
(so that P(X_t < ∞) = 1 for all t), then (5.22) is also the unique solution
of the backward equation, and in this case the generator Q does specify the
transition probabilities of a Markov chain. Thus conditions on Q for (5.23)
to hold are of practical importance. Previously, in Section 5.2, we have
mentioned a sufficient condition for (5.23) to hold, namely (5.16). This
condition might be too strong for Birth and Death chains. It turns out
that a weaker condition might be necessary and sufficient for the generator
Q of a birth chain to specify uniquely the distribution of the chain.
Theorem 5.1 Let (Xt, t ~ 0) be a birth chain on S = {O, 1,2, ...} with
birth rates Ai, i ~ O. Then a necessary and sufficient condition for (5.23)
to hold is
    Σ_{i=0}^∞ 1/λ_i = ∞.   (5.24)
    S'_n(t) = −λ_n P_in(t).

By virtue of the condition P_ij(0) = δ_ij, we obtain
As n → ∞, the right-hand side of (5.25) decreases to a limit a(t) (since,
obviously, S_n(t) increases with n). Thus for each n ≥ i,

    ∫_0^t S_n(s) ds = Σ_{j=i}^n ∫_0^t P_ij(s) ds ≥ a(t) Σ_{j=i}^n 1/λ_j,

and hence

    a(t) Σ_{j=i}^n 1/λ_j ≤ ∫_0^t S_n(s) ds ≤ t.
Under (5.24), these inequalities can only hold when a(t) = 0 for all t. Thus
S_n(t) → 1 as n → ∞, for all t, meaning that (5.23) holds.
(b) Necessity. Since

    ∫_0^t P_ij(s) ds = (1 − S_j(t)) / λ_j,   j ≥ i,

we have

    ∫_0^t S_n(s) ds ≤ Σ_{j=i}^n 1/λ_j.
Remark.
For a Poisson process, the transition matrices (ℙ(t), t ≥ 0) form a uni-
form semigroup (condition (5.16) holds). Also, condition (5.24) is clearly
satisfied. For a linear birth process (Example 5.1), where λ_i = iλ, the
chain is not uniform, but condition (5.24) does hold. As an example of a
dishonest chain, consider λ_i = i², i ≥ 1. Since

    Σ_{i=1}^∞ 1/i² = π²/6 < ∞,

the above theorem asserts that, for some t and i, Σ_{j∈S} P_ij(t) < 1, so that
the chain escapes to infinity at or before time t with positive probability
1 − Σ_{j∈S} P_ij(t).
For a general Birth and Death process, the situation is similar. Given
arbitrary λ_i ≥ 0 and μ_i ≥ 0, i ≥ 0, there always exist transition prob-
abilities P_ij(t), as a solution to Kolmogorov's differential equations, such
that Σ_{j∈S} P_ij(t) ≤ 1. Under some conditions on the λ_i's and μ_i's (e.g.,
that they are bounded or increase sufficiently slowly), this solution is unique
and determines an honest chain.
The constant a is determined by the initial condition P_00(0) = 1, and finally
For each i, by the nature of the death chain, it suffices to find P_ij(t) for
j ≤ i, subject to P_ij(0) = δ_ij. Note that P_00(t) = 1. The forward equation
takes the form

    P'_i0(t) = μ P_i1(t)
    P'_ij(t) = −μ P_ij(t) + μ P_{i,j+1}(t),   j = 1, ..., i − 1
    P'_ii(t) = −μ P_ii(t).
The solution of this system of differential equations can be obtained directly
by using (5.14).
First, P_ii(t) = e^{−μt}. Next,
if we let f(t) = P_{i,i−1}(t) and g(t) = μ P_ii(t) = μ e^{−μt}, then the above
equation is of the form
Thus

    P_{i,i−1}(t) = ∫_0^t e^{−μ(t−s)} μ e^{−μs} ds = μt e^{−μt}.
Similarly, we get
we have

    P'_i0(t) = μ P_i1(t) = μ (μt)^{i−1} / (i−1)! · e^{−μt}

so that

    P_i0(t) = ∫_0^t μ (μs)^{i−1} / (i−1)! · e^{−μs} ds,   i ≥ 1.
5.4 Exercises
5.1. Let (Xt, t ~ 0) be a continuous-time Markov chain. Show that, for
any 0 ~ tl < t2 < ... < tn < ... < tm and ij E S, j = 1,2, ... , m,
    lim_{n→∞} (1 − a_n/n)^n = e^{−a}.
    Z_ij(t) ≥ 0,   Σ_{j∈S} Z_ij(t) = 1,

and

    Z_ij(t + s) = Σ_{k∈S} Z_ik(t) Z_kj(s).
For a distribution 11"0, define
P (Xti = iI, ••• ,Xtn = in) = L 11"0 ( i)Ziii (tI) ... Zin_ii n(tn - tn-I).
iES
5.13. Let $\mathbb{P}(t)$ be the transition matrix function of a Markov chain. Show that, for given $t_0 > 0$, the values of $\mathbb{P}(t)$ for $t > t_0$ can be computed from the values $\mathbb{P}(t)$ for $t \le t_0$. (Hint: Use the Chapman-Kolmogorov equation.)
5.14. Let $(\mathbb{P}(t), t \ge 0)$ be standard.
(i) For fixed $i, j \in S$, show that the function $t \to P_{ij}(t)$ is uniformly continuous. (Hint: Use the Chapman-Kolmogorov equation to show that $|P_{ij}(t+h) - P_{ij}(t)| \le 1 - P_{ii}(|h|)$.)
(ii) For each fixed $i \in S$, show that
$$q_{ii} = \lim_{h\searrow 0} \frac{P_{ii}(h) - 1}{h} \ge -\infty$$
exists and is finite. (Hint: Use the following fact: for each t, h small enough,
and £ > 0, we have
$$\frac{P_{ij}(h)}{h} \le \frac{P_{ij}(t)}{t-h}\cdot\frac{1}{1-3\varepsilon}$$
for $\varepsilon > 0$ arbitrarily small.)
5.15. Let $(\mathbb{P}(t), t \ge 0)$ be standard (of an honest Markov chain). Let $i \in S$ be such that $q_{ii} > -\infty$. Verify that, for any $j$,
$$P'_{ij}(t) = \sum_{k\in S} q_{ik}\, P_{kj}(t).$$
5.16. In Example 5.5, compute $P_{10}(t)$, $P_{01}(t)$, and $P_{11}(t)$.
5.17. Let $(X_t, t \ge 0)$ be a linear birth chain with immigration (Example 5.2), that is, $\lambda_i = \nu + i\lambda$, $i \ge 0$. Use the forward equation to derive the transition probabilities of the chain.
Lesson 6
Random Walks
$$P(X_n = 1) = P(X_n = -1) = \frac{1}{2}.$$
It is reasonable to assume that the Xn's are independent. At time n, the
position of the person is
Sn = Xl + X 2 + ... + X n .
The above mathematical model can be used to describe the game of
heads or tails. At each toss of a fair coin, you bet on its outcome, winning
117
one dollar if, say, heads comes up, and losing one dollar if tails comes up. Your gain after n independent tosses is expressed by $S_n$.
Indeed,
$$P_{ij} = \begin{cases} p & \text{if } j = i+1 \\ r & \text{if } j = i \\ q & \text{if } j = i-1 \\ 0 & \text{otherwise.} \end{cases}$$
When $r = 0$ and $p = q = 1/2$, the simple random walk is said to be symmetric. Here is a realization of a symmetric random walk, starting from state 0.
[Figure: a sample path of a symmetric random walk starting at 0, taking values in {-2, -1, 0, 1, 2}.]
with transition probabilities
$$P_{ij} = \begin{cases} p & \text{if } j = i+1 \\ q & \text{if } j = i-1 \\ 0 & \text{otherwise.} \end{cases}$$
If 0 is an absorbing state, then $P_{00} = 1$, and for $i \ge 1$,
$$P_{ij} = \begin{cases} 1/2 & \text{if } j = i+1 \text{ or } j = i-1 \\ 0 & \text{otherwise,} \end{cases}$$
then $(S_n, n \ge 0)$ is a random walk on $\{0, 1, \ldots, k\}$, whose transition probabilities are given by
$$P_{0j} = \begin{cases} 1 & \text{if } j = 1 \\ 0 & \text{if } j \neq 1, \end{cases} \qquad P_{kj} = \begin{cases} 1 & \text{if } j = k-1 \\ 0 & \text{if } j \neq k-1, \end{cases}$$
and for $1 \le i \le k-1$,
$$P_{ij} = \begin{cases} p & \text{if } j = i+1 \\ 1-p & \text{if } j = i-1 \\ 0 & \text{otherwise.} \end{cases}$$
(b) The hitting time of a state i, that is, the time at which state i is first entered. Starting, say, from the origin, this is also called the first passage time from 0 to i. If state i is an absorbing state, then the hitting time of i is the time to absorption.
(c) In a restricted random walk with two absorbing states, say a and b, one is interested in computing the probability that the random walk reaches a before b, and so on.
respectively.
The asymptotic behavior of $S_n$ as $n \to \infty$ is as follows.
If $\mu > 0$ (i.e., $p > 1/2$), then $S_n \to \infty$ (a.s.) by the strong law of large numbers: the random walk drifts to $\infty$; whereas, if $\mu < 0$, then $S_n \to -\infty$ (a.s.): the random walk drifts to $-\infty$.
If $\mu = 0$ (i.e., $p = 1/2$, the random walk is symmetric), then the random walk oscillates between $-\infty$ and $\infty$ with probability one, since in this case,
Remark. The above fact follows from the law of the iterated logarithm, which states that for i.i.d. random variables $X_n$ with $E(X_n) = 0$ and $0 < \sigma^2 = V(X_n) < \infty$,
$$P\left(\limsup_{n\to\infty} \frac{S_n}{(2\sigma^2 n \log\log n)^{1/2}} = 1\right) = 1$$
and
$$P\left(\liminf_{n\to\infty} \frac{S_n}{(2\sigma^2 n \log\log n)^{1/2}} = -1\right) = 1.$$
From the above asymptotic behavior, it is plausible that the random walk (as a Markov chain) is recurrent when $\mu = 0$, and transient when $\mu \neq 0$. To see this, in view of Theorem 3.2 (Lesson 3), let us compute $\sum_n P^n_{00}$. For $-n \le j \le n$, it is clear that $P(S_n = j) = 0$ if n and j do not have the same parity, whereas if n and j are of the same parity,
$$P(S_{2n} = 0) = \binom{2n}{n}\, p^n (1-p)^n$$
and
$$P(S_{2n-1} = 0) = 0.$$
Thus
$$\sum_{n=0}^{\infty} P_{00}^{2n} = \sum_{n=0}^{\infty} \binom{2n}{n}\, p^n (1-p)^n.$$
By Stirling's formula, $n! \sim \sqrt{2\pi n}\, n^n e^{-n}$, where $f(n) \sim h(n)$ means $\lim_{n\to\infty} f(n)/h(n) = 1$. (For a proof of Stirling's formula, see e.g., W. Feller, An Introduction to Probability Theory and Its Applications, Volume I, pp. 50-52, Wiley, 1957.) It follows that
$$P_{00}^{2n} \sim \frac{[4p(1-p)]^n}{\sqrt{\pi n}}.$$
But $4p(1-p) \le 1$ with equality if and only if $p = 1/2$. Thus
$$\sum_{n=0}^{\infty} P_{00}^{2n} = \infty \quad \text{if and only if} \quad p = 1/2.$$
In other words, the random walk is recurrent when $\mu = 0$ and transient when $\mu \neq 0$.
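The dichotomy $\sum_n P_{00}^{2n} = \infty$ iff $p = 1/2$ is easy to see numerically. A small sketch (the helper name `return_prob_sum` is ours): the partial sums stabilize near $1/|p-q|$ when $p \neq 1/2$ and keep growing like $\sqrt{n}$ when $p = 1/2$.

```python
def return_prob_sum(p, n_terms):
    """Partial sum of sum_{n>=0} C(2n,n) (p(1-p))^n = sum_n P_00^{2n}."""
    x = p * (1 - p)
    total, term = 0.0, 1.0                 # term_0 = C(0,0) x^0 = 1
    for n in range(n_terms):
        total += term
        # ratio C(2n+2, n+1) / C(2n, n) = (2n+2)(2n+1) / (n+1)^2
        term *= x * (2 * n + 2) * (2 * n + 1) / ((n + 1) ** 2)
    return total

s06 = return_prob_sum(0.6, 500)            # converges: limit is 1/|p-q| = 5
s05a = return_prob_sum(0.5, 500)           # diverges: partial sums grow like sqrt(n)
s05b = return_prob_sum(0.5, 2000)
print(s06, s05a, s05b)
```

The terms are computed by their successive ratios to avoid overflow in the binomial coefficients.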
Note that, in the transient case, the probability of returning to state 0 infinitely often is zero, whereas, in the recurrent case, this probability is one. Specifically,
Ak = {w : N(w) ~ k}, k ~ 1,
then A = n~=l Ak is the event that the random walk reaches state 0 an
infinite number of times. Assuming So = 0, we have
Now, since state 0 is recurrent, $P(A_1) = f_{00} = 1$ (see Section 3.3 for notation). By induction, we have
$\pi: \{1, 2, \ldots\} \to \{1, 2, \ldots\}$ is called a finite permutation if $\pi(n) = n$ except for finitely many n. A finite permutation of $(X_n, n \ge 1)$ is $(X_{\pi(n)}, n \ge 1)$, where $\pi$ is a finite permutation. Thus, $P(A) = 0$ or 1 according to the Hewitt-Savage zero-one law. Note that, in the random walk case, the $X_n$'s are i.i.d.
It is interesting to note that, although $A = \limsup_{n\to\infty}(S_n = 0)$ is not a tail event, that is, $A \notin \sigma(X_k, k \ge n)$, Kolmogorov's zero-one law can be used to prove that $P(A) = 1$ when $\mu = 0$. Indeed, for integer $m > 0$,
$$\lim_{n\to\infty} P\left(\frac{S_n}{\sqrt{n}} > m\right) = \frac{1}{\sqrt{2\pi}} \int_m^{\infty} e^{-x^2/2}\,dx > 0.$$
Since
$$\limsup_{n\to\infty}\left\{\omega : \frac{S_n(\omega)}{\sqrt{n}} > m\right\} \subseteq \left\{\omega : \limsup_{n\to\infty} \frac{S_n(\omega)}{\sqrt{n}} > m\right\},$$
we have
$$P(C_m) = P(D_m) \ge \lim_{n\to\infty} P\left(\frac{S_n}{\sqrt{n}} > m\right) > 0.$$
Thus
P(Cm) = P(Dm) = 1 for all m ~ 1.
It follows that $P(C_m \cap D_m) = 1$ for all $m \ge 1$, and hence $P(B) = 1$, where
$$B = \bigcap_{m=1}^{\infty} (C_m \cap D_m) = \left\{\omega : \limsup_{n\to\infty} \frac{S_n}{\sqrt{n}} = \infty,\ \liminf_{n\to\infty} \frac{S_n}{\sqrt{n}} = -\infty\right\}$$
(observe that the $C_m \cap D_m$ decrease as $m \to \infty$). But $B \subseteq A$, and hence $P(A) = 1$.
(b) When the random walk is recurrent (p = 1/2), it can be shown
that the probability that the random walk reaches a state j E 'll in a finite
for j > 0
F(O,j) ={ ~(1-~) for j = 0
Indeed, (6.2) is a special case of the following general relation: for any $i, j \in \mathbb{Z}$,
$$P_{ij}^n = \sum_{k=1}^{n} F_k(i,j)\, P_{jj}^{n-k}, \qquad (6.3)$$
where $P_{ij}^n = P(S_n = j \mid S_0 = i)$ and
So let
00 00
Then
Thus
and hence
$$(1 + a)^{\gamma} = \sum_{n} \binom{\gamma}{n} a^n.$$
$$V(1) = 1 - (1 - 4pq)^{1/2} = 1 - (1 - 4p(1-p))^{1/2} = 1 - (4p^2 - 4p + 1)^{1/2} = 1 - [(2p-1)^2]^{1/2} = 1 - |2p - 1|,$$
which is the probability that the random walk, starting at 0, ever returns to 0.
When $p = q = 1/2$ (symmetric simple random walk), $V(1) = 1$, so that with probability one, the random walk will return to the origin. However, the expected time of the first return is infinite, since
$$\sum_n n\, v(n) = V'(1) = \infty, \quad \text{that is,} \quad E(T_0^0) = \infty.$$
Thus
$$v(2n) = (-1)^{n+1}\binom{1/2}{n}(4pq)^n, \qquad n \ge 1$$
(and $v(2n-1) = 0$). Note that for n even, $\binom{1/2}{n} < 0$, whereas
$$v(2n) = \frac{1}{2n-1}\binom{2n}{n}(pq)^n = \frac{2q}{2n-1}\, P(S_{2n-1} = 1).$$
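The two expressions for $v(2n)$ can be verified in exact arithmetic. A quick sketch using Python's `fractions` (the helper `binom_half`, our name, computes the generalized binomial coefficient $\binom{1/2}{n}$):

```python
from math import comb
from fractions import Fraction

def binom_half(n):
    """Generalized binomial coefficient C(1/2, n), as an exact fraction."""
    result = Fraction(1)
    for k in range(n):
        result *= (Fraction(1, 2) - k) / (k + 1)
    return result

p = Fraction(3, 10)
q = 1 - p
for n in range(1, 8):
    lhs = (-1) ** (n + 1) * binom_half(n) * (4 * p * q) ** n
    mid = Fraction(comb(2 * n, n), 2 * n - 1) * (p * q) ** n
    # third form: (2q / (2n-1)) * P(S_{2n-1} = 1)
    rhs = Fraction(2, 2 * n - 1) * q * comb(2 * n - 1, n) * p ** n * q ** (n - 1)
    assert lhs == mid == rhs
print("v(2n) identities verified for n = 1, ..., 7")
```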
Indeed,
$$T_a^0 = \inf\{n \ge 1 : S_n = a\}.$$
To derive the generating function of $T_a^0$, it suffices to determine that of $T_1^0$, since
$$T_a^0 = T_1^0 + T_2^1 + \cdots + T_a^{a-1},$$
where $T_j^i$ is the first passage time from state i to state j. These first passage times are i.i.d., so that
$$G_a(s) = \left(W(s)\right)^a,$$
where $G_a(\cdot)$ and $W(\cdot)$ denote the generating functions of $T_a^0$ and $T_1^0$, respectively.
Now
$$T_1^0(\omega) = \inf\{n \ge 1 : S_n(\omega) = 1\}.$$
Conditioning on $X_1$ and summing, we get
$$W(s) = ps + qs\sum_{n=1}^{\infty} \phi(n)s^n.$$
But $\sum_{n=1}^{\infty} \phi(n)s^n$ is the generating function of $T_2^0$, which is the sum $T_1^0 + T_2^1$, so that
$$\sum_{n=1}^{\infty} \phi(n)s^n = W^2(s).$$
Thus
$$W(s) = ps + qsW^2(s). \qquad (6.8)$$
The roots of this quadratic equation are
$$\frac{1 \pm \sqrt{1 - 4pqs^2}}{2qs},$$
so that
$$G_a'(1) = a\,W^{a-1}(1)\,W'(1) = \begin{cases} \infty & \text{for } p = 1/2 \\ \dfrac{a}{p-q} & \text{for } p > 1/2. \end{cases}$$
Therefore, in a symmetric random walk on $\mathbb{Z}$, all first passage times have infinite expectation. The distribution of $T_a^0$ can be obtained via convolution of that
Remark.
A direct calculation of the distribution of $T_a^0$ can be carried out through an analysis of sample paths of the random walk as follows.
From Exercise 6.1, we know that each path leading from (0,0) to (n,a) has probability $p^{(n+a)/2} q^{(n-a)/2}$. The total number of paths from (0,0) to (n,a) is $\binom{n}{(n+a)/2}$.
It is clear that
$$\binom{n-1}{\frac{(n-1)+(a+1)}{2}} = \binom{n-1}{\frac{n+a}{2}}.$$
Observe that a path of Type II must touch or cross the level a before time n-1. Thus the total number of paths of Type II is the same as that of paths from (0,0) to (n-1, a-1) which touch or cross a before time n-1. The following reflection principle shows that the number of paths from (0,0) to (n-1, a-1) which touch or cross a before time n-1 is the same as that of all paths from (0,0) to (n-1, a+1), that is, $\binom{n-1}{(n+a)/2}$.
By looking at the figure below, we see that if $\Gamma_1$ is a path from (0,0) to (n-1, a-1) which touches or crosses a before time n-1, then there is a path $\Gamma_2$ from (0,0) to (n-1, a+1) obtained by setting $\Gamma_2 = \Gamma_1$ up to the first time $\delta$ ($< n-1$) the path $\Gamma_1$ hits a; the rest of $\Gamma_2$ is obtained by reflecting $\Gamma_1$ about the level a.
[Figure: reflection of the path $\Gamma_1$ about the level a, with the reflected path ending at a+1 and the original at a-1.]
Theorem 6.3 Let $T_a^0 = \inf\{n \ge 1 : S_n = a\}$, $a > 0$, $a \in \mathbb{Z}$. Then
(i) the generating function of $T_a^0$ is
$$G_a(s) = \left(\frac{1 - \sqrt{1 - 4pqs^2}}{2qs}\right)^a;$$
(ii)
$$P(T_a^0 = n) = \frac{a}{n}\binom{n}{\frac{n+a}{2}}\, p^{(n+a)/2}\, q^{(n-a)/2}$$
for $n = a + 2k$, $k \ge 0$.
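Theorem 6.3 (ii) can be checked against a direct dynamic-programming computation of the first-passage probabilities. A sketch (function names are ours):

```python
from math import comb

def first_passage_exact(a, p, n_max):
    """P(T_a = n), n <= n_max, by dynamic programming over paths kept below a."""
    q = 1 - p
    dist = {0: 1.0}                        # law of S_k restricted to {T_a > k}
    probs = {}
    for n in range(1, n_max + 1):
        new, hit = {}, 0.0
        for s, pr in dist.items():
            if s + 1 == a:
                hit += pr * p              # the walk steps up into a for the first time
            else:
                new[s + 1] = new.get(s + 1, 0.0) + pr * p
            new[s - 1] = new.get(s - 1, 0.0) + pr * q
        probs[n] = hit
        dist = new
    return probs

a, p = 2, 0.55
q = 1 - p
probs = first_passage_exact(a, p, 12)
for n in range(a, 13, 2):                  # n = a + 2k
    formula = a / n * comb(n, (n + a) // 2) * p ** ((n + a) // 2) * q ** ((n - a) // 2)
    assert abs(probs[n] - formula) < 1e-12
print("Theorem 6.3 (ii) verified for a = 2, p = 0.55, n <= 12")
```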
$$S_n^{(i)} = i + S_n, \qquad n \ge 0.$$
We use $P^{(i)}$ to denote the distribution of the process $(S_n^{(i)}, n \ge 0)$ (see Exercise 6.7).
Now consider two integers a and b with $a < b$. Let $T_a$ and $T_b$ be the hitting times of a and b, respectively, that is
Then
$$\alpha(i) = P^{(i)}(T_a < T_b), \qquad a \le i \le b,$$
is the probability that the random walk reaches a before reaching b.
Conditioning upon $X_1$, we have that
The method of particular solutions can be used to solve (6.9) (see Exercise 6.8). Here, in view of the form of difference equation (6.9), a direct way to solve (6.9) is as follows. Observe that, in view of $\alpha(b) = 0$,
$$\alpha(i) = -\sum_{j=i+1}^{b} \left[\alpha(j) - \alpha(j-1)\right] \qquad (6.11)$$
and from (6.9), we have
$$\alpha(i) = \sum_{j=i+1}^{b} \left(\frac{p}{q}\right)^{b-j} \alpha(b-1) = \frac{1 - (p/q)^{b-i}}{1 - p/q}\,\alpha(b-1),$$
so that
$$\alpha(i) = \frac{1 - (p/q)^{b-i}}{1 - (p/q)^{b-a}}, \qquad (6.13)$$
for $a \le i \le b$, provided that $p \neq q$ ($0 < p < 1$).
When $p = q = 1/2$, the solution of (6.9), subject to (6.10), is
$$\alpha(i) = \frac{b-i}{b-a}, \qquad a \le i \le b.$$
Now let $\beta(i) = P^{(i)}(T_b < T_a)$. Then, similarly, we have, for $a \le i \le b$,
From the above expressions for $\alpha(i)$ and $\beta(i)$, it follows that $\alpha(i) + \beta(i) = 1$, meaning that, with probability one, the random walk, starting from i, will reach either a or b.
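Formula (6.13) and its symmetric case are easy to test by Monte Carlo. A sketch, with `alpha_exact` (our name) implementing (6.13) for a general interval [a, b] and `alpha_mc` simulating the walk:

```python
import random

def alpha_exact(i, a, b, p):
    """Probability of reaching a before b from i (formula (6.13))."""
    q = 1 - p
    if p == q:
        return (b - i) / (b - a)
    r = p / q
    return (1 - r ** (b - i)) / (1 - r ** (b - a))

def alpha_mc(i, a, b, p, sims=100_000, seed=7):
    rng = random.Random(seed)
    wins = 0
    for _ in range(sims):
        s = i
        while a < s < b:
            s += 1 if rng.random() < p else -1
        wins += (s == a)
    return wins / sims

i, a, b, p = 3, 0, 8, 0.45
exact = alpha_exact(i, a, b, p)
approx = alpha_mc(i, a, b, p)
print(exact, approx)
```

The boundary cases $\alpha(a) = 1$ and $\alpha(b) = 0$ follow directly from the formula.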
Let us interpret the above results in the context of games. Suppose that the initial capital of player I is z and that of player II is y. At each trial, player I wins one dollar with probability p and loses one dollar with probability $q = 1 - p$. The fortune of player I after n trials is
$$S_n^{(z)} = z + X_1 + \cdots + X_n.$$
Player I is ruined when the random walk enters state 0 before state z+y (if the random walk enters state z+y first, then player II is ruined). In this context, states 0 and z+y are absorbing states. The probabilities of ruin are computed as before by taking $a = 0$, $i = z > 0$ and $b = z + y$.
Expected duration of the game.
In the following, for simplicity, we take $a = 0$ and $b > 0$. The simple random walk on $\mathbb{Z}$, starting at i ($0 \le i \le b$), represents the fortune of Player I with initial capital i:
$$S_n^{(i)} = i + X_1 + \cdots + X_n, \qquad n \ge 0$$
(the initial capital of Player II is $b - i$). Since states 0 and b are absorbing states, the game stops when the random walk reaches either 0 or b. Thus the stopping time of the game is
$$E\left(\tau^{(i)}\right) = E\left(E\left(\tau^{(i)} \mid X_1\right)\right).$$
Thus
$$\mu(i) = p\,\mu(i+1) + q\,\mu(i-1) + 1. \qquad (6.15)$$
Case p = q.
A particular solution of (6.15) is $\mu(i) = -i^2$. Observe that the difference of any two solutions of (6.15) satisfies the difference equation
$$r(i) = \frac{1}{2}\left(r(i+1) + r(i-1)\right),$$
which has $r(i) = i$ and $r(i) \equiv 1$ as particular solutions, so that all solutions of (6.15), when $p = q = 1/2$, are of the form
$$\mu(i) = -i^2 + \tau + \gamma i.$$
The unique solution of (6.15) under the boundary condition (6.14) is $\mu(i) = i(b-i)$.
Case p ≠ q.
In this case, (6.15) has a particular solution given by $\mu(i) = i/(q-p)$, and the difference of any two solutions of (6.15) satisfies
$$r(i) = p\,r(i+1) + q\,r(i-1),$$
which has $r(i) \equiv 1$ and $r(i) = (q/p)^i$ as particular solutions. Thus all solutions of (6.15), when $p \neq q$, are of the form
$$\mu(i) = \tau + \gamma\left(\frac{q}{p}\right)^i + \frac{i}{q-p}.$$
Under (6.14), we have
$$\mu(i) = \frac{1}{q-p}\left[i - b\,\frac{1 - (q/p)^i}{1 - (q/p)^b}\right].$$
Note that
$$\frac{1 - (q/p)^i}{1 - (q/p)^b} = \beta(i) = P^{(i)}(T_b < T_0).$$
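Both expressions for the expected duration $\mu(i)$ (with $a = 0$) can be compared with simulation. A sketch (names ours):

```python
import random

def duration_exact(i, b, p):
    """Expected time to absorption at 0 or b, starting from i."""
    q = 1 - p
    if p == q:
        return i * (b - i)                 # symmetric case
    r = q / p
    return (i - b * (1 - r ** i) / (1 - r ** b)) / (q - p)

def duration_mc(i, b, p, sims=50_000, seed=3):
    rng = random.Random(seed)
    total = 0
    for _ in range(sims):
        s, steps = i, 0
        while 0 < s < b:
            s += 1 if rng.random() < p else -1
            steps += 1
        total += steps
    return total / sims

ex_half, mc_half = duration_exact(3, 8, 0.5), duration_mc(3, 8, 0.5)
ex_drift, mc_drift = duration_exact(3, 8, 0.6), duration_mc(3, 8, 0.6)
print(ex_half, mc_half)
print(ex_drift, mc_drift)
```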
Remarks.
(i) In the above analysis, we implicitly assumed that $\mu(i) < \infty$ for all $0 < i < b$. This fact can be proved as follows.
For $m \ge 1$, let $\tau_m^{(i)} = \min(\tau^{(i)}, m)$. Then
$$\mu(i) = \lim_{m\to\infty} E\left(\tau_m^{(i)}\right).$$
On the other hand, the event $\{\tau_m^{(i)} = k\}$ depends only on $X_1, X_2, \ldots, X_k$, so that $\{\tau_m^{(i)} = k\}$ is independent of $X_{k+1}, \ldots, X_m$. As a note, the random variable $\tau_m^{(i)}$ is a stopping time with respect to the increasing $\sigma$-fields $\mathcal{F}_n = \sigma(X_1, \ldots, X_n)$, $n \ge 1$ ($\mathcal{F}_0 = \{\emptyset, \Omega\}$), in the sense that $\{\tau_m^{(i)} = k\} \in \mathcal{F}_k$. Writing
$$S^{(i)}_{\tau_m^{(i)}}(\omega) = \sum_{k=0}^{m} S_k^{(i)}(\omega)\, 1_{\{\tau_m^{(i)} = k\}}(\omega),$$
we have
$$E\left(S^{(i)}_{\tau_m^{(i)}}\right) = i + (p - q)\, E\left(\tau_m^{(i)}\right).$$
Thus, if $p - q \neq 0$,
$$E\left(\tau_m^{(i)}\right) \le \frac{b}{|p-q|},$$
and hence $\mu(i) < \infty$. If $p - q = 0$, then we need to relate $E(\tau_m^{(i)})$ to $S^{(i)}_{\tau_m^{(i)}}$ through another quantity. It is left as Exercise 6.12 to show that
$$E\left(S^{(i)}_{\tau_m^{(i)}}\right)^2 = i^2 + E\left(\tau_m^{(i)}\right).$$
Thus $E(\tau_m^{(i)}) \le b^2$, and hence again, $\mu(i) < \infty$.
$$\alpha(i) = \alpha_b(i) = \begin{cases} \dfrac{1 - (p/q)^{b-i}}{1 - (p/q)^b} & \text{for } p \neq 1/2 \\[2mm] \dfrac{b-i}{b} & \text{for } p = 1/2. \end{cases}$$
$$\lim_{b\to\infty} \mu(i) = \lim_{b\to\infty} i(b-i) = \infty,$$
$$\lim_{b\to\infty} \mu(i) = \lim_{b\to\infty} \frac{1}{q-p}\left[i - b\,\frac{1 - (q/p)^i}{1 - (q/p)^b}\right] = \frac{i}{q-p}.$$
A difference equation for the generating function $U(i,s)$ of the $u(i,n)$'s (that is, $U(i,s) = \sum_{n=0}^{\infty} u(i,n)s^n$) is obtained from (6.16) as follows. Multiplying (6.16) by $s^{n+1}$ leads to a difference equation whose characteristic equation is
$$ps\lambda^2(s) - \lambda(s) + qs = 0,$$
and hence
$$U(i,s) = \frac{\lambda_1^b(s)\lambda_2^i(s) - \lambda_2^b(s)\lambda_1^i(s)}{\lambda_1^b(s) - \lambda_2^b(s)} = \left(\frac{q}{p}\right)^i \frac{\lambda_1^{b-i}(s) - \lambda_2^{b-i}(s)}{\lambda_1^b(s) - \lambda_2^b(s)},$$
by observing that $\lambda_1(s)\lambda_2(s) = q/p$.
The generating function $V(i,s)$ of the $v(i,n)$'s is obtained by replacing p, q, i by q, p, b-i, respectively, in the above expression for $U(i,s)$.
The coefficients u(i, n), v(i, n) are obtained by expanding U(i, s), V(i, s) in
power series as usual. For details, see Feller (1957).
6.6 Exercises
6.1. Let (Xn, n ~ 1) be i.i.d. with
$$A_n = \left\{\omega : \left|\frac{X_1(\omega) + \cdots + X_n(\omega)}{n} - \mu\right| \ge \varepsilon\right\}.$$
(i) Show that
$$P\left(\limsup_{n\to\infty} A_n\right) = 0.$$
(i) Compute the common mean and variance of the Xn's and find the
distribution of Sn.
(ii) Show that Sn drifts to 00 or -00 according to p > q or p < q.
6.4*. Let $(S_n, n \ge 0)$ be a simple random walk on $\mathbb{Z}$, $P(X_n = 1) = p$, $P(X_n = -1) = q$ ($p + q = 1$). Show that
$$\sum_{n=0}^{\infty} P_{00}^{2n} = \frac{1}{|p - q|}.$$
6.6. Let $(S_n, n \ge 0)$ be a random walk on $\mathbb{Z}$. Let $N(\omega) = \#\{n : S_n(\omega) = 0\}$, $A_k = \{\omega : N(\omega) \ge k\}$ and $a = P(A_1)$. Show by induction that $P(A_k) = a^k$, $k \ge 1$.
6.7. Let $S_n = i + X_1 + \cdots + X_n$, $n \ge 0$, be a simple random walk on $\mathbb{Z}$. Let
$$\Omega_i = \{(i, a_1, a_2, \ldots) : a_j \in \mathbb{Z},\ j = 1, 2, \ldots\}.$$
Specify the $\sigma$-field $\mathcal{A}_i$ on $\Omega_i$ and the probability measure $P_i$ (on $\mathcal{A}_i$) which is the distribution of the stochastic process $(S_n, n \ge 0)$.
6.8. Consider the equation
(i) When $p \neq q$, verify that $\beta(i) \equiv 1$ and $\beta(i) = (q/p)^i$ are solutions of (*). Also, for constants $\tau$ and $\gamma$, $\tau + \gamma(q/p)^i$ is a solution of (*). Determine $\tau$ and $\gamma$ so that $\beta(a) = 0$ and $\beta(b) = 1$.
(ii) For $p = q$, verify that $\beta(i) \equiv 1$ and $\beta(i) = i$ are solutions of (*), and hence $\tau + \gamma i$ is a solution of (*). Determine $\tau$ and $\gamma$ so that $\beta(a) = 0$ and $\beta(b) = 1$.
6.10. In a simple random walk, show that, for any $i, j \in \mathbb{Z}$,
$$P\left(\limsup_{n\to\infty}(S_n = j)\right) = 0 \text{ or } 1$$
Let So = 0, Sn = Xl + ... + X n .
(i) Find the distribution of Sn, n ~ 1.
and identify the coefficients $w(n)$. Show that $w(n) = 0$ for n even and
$$w(2n-1) = \frac{(-1)^{n+1}}{2q}\binom{1/2}{n}(4pq)^n = \frac{1}{2n-1}\,P(S_{2n-1} = 1) = \frac{1}{2n-1}\binom{2n-1}{n}\, p^n q^{n-1},$$
for $n \ge 1$.
6.15. Let $(S_n, n \ge 0)$ be a symmetric random walk on $\mathbb{Z}$ ($p = q = 1/2$, $S_0 = 0$), and
$$T_0(\omega) = \inf\{n \ge 1 : S_n(\omega) = 0\}.$$
Compute $P(T_0 = 2n)$.
6.16. Let $(S_n, n \ge 0)$ be a simple random walk with $S_0 = 0$. Let $a \in \mathbb{Z}\setminus\{0\}$.
(i) Show that the distribution of the first passage time to state a is given by
$$P(T_a^0 = n) = \frac{|a|}{n}\,P(S_n = a), \qquad n = |a| + 2k, \ k \ge 0.$$
(ii) Use (i) and Stirling's formula to show that $E(T_a^0) = \infty$ in the case of a symmetric random walk.
Lesson 7
Renewal Theory
This Lesson is devoted to the study of a class of random walks whose steps
are non-negative. With the interpretation of renewals, these stochastic
processes model many random phenomena of interest. Renewal theory
provides tools for the analysis of such processes.
Remarks.
(i) Alternatively, the associated counting process (Nt, t ~ 0) is also
called a renewal process.
(ii) Motivated by applications, the $X_n$'s are called lifetimes, or interarrival times. The $S_n$'s are renewal times, and $N_t$ counts the number of renewals up to time t.
(iii) A renewal process is specified by the common distribution F of the
Xn's. In Section 7.2, we will see that distributions of Sn's and the Nt's can
be expressed in terms of F.
evolution of the chain after such return time is that of the chain starting
at O.
Examples of such regenerative processes are $(N_t, t \ge 0)$ and $(S_{N_t+1} - t, t \ge 0)$. See also the renewal property of Poisson processes in Lesson 4.
The renewal argument, based upon regenerative processes, is essential
in deriving renewal equations in renewal theory.
$$f * f(z) = \int_0^z f(z-x)f(x)\,dx = \lambda^2 z\, e^{-\lambda z}, \qquad z \ge 0,$$
and hence
$$f_n(z) = (f_{n-1} * f)(z) = \frac{\lambda^n z^{n-1} e^{-\lambda z}}{(n-1)!}\, 1_{(0,\infty)}(z) \quad \text{(Gamma distribution)},$$
$$P(N_t = n) = F^{*n}(t) - F^{*(n+1)}(t) = \int_0^t \frac{\lambda^n z^{n-1} e^{-\lambda z}}{(n-1)!}\,dz - \int_0^t \frac{\lambda^{n+1} z^n e^{-\lambda z}}{n!}\,dz = \frac{(\lambda t)^n}{n!}\, e^{-\lambda t}, \qquad n \ge 0.$$
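That a renewal process with exponential interarrival times has Poisson-distributed counts can be illustrated by simulation. A sketch (the helper name `renewal_count` is ours):

```python
import math, random

def renewal_count(t, lam, rng):
    """Number of renewals in (0, t] with Exp(lam) interarrival times."""
    clock, n = 0.0, 0
    while True:
        clock += rng.expovariate(lam)
        if clock > t:
            return n
        n += 1

rng = random.Random(11)
lam, t, sims = 2.0, 1.5, 100_000
counts = [renewal_count(t, lam, rng) for _ in range(sims)]
rows = []
for n in range(5):
    emp = counts.count(n) / sims
    poi = (lam * t) ** n / math.factorial(n) * math.exp(-lam * t)  # (lam t)^n e^{-lam t}/n!
    rows.append((n, emp, poi))
print(rows)
```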
As in the case of Poisson processes, the random variable Nt in a general
renewal process has finite moments of all orders. This can be seen as follows.
Since the Xn's are not concentrated at 0, there is some a > 0 such that
P(Xl ~ a) > O. Consider the truncated renewal process:
Indeed, $(N_t^a = k-1)$ is the event that the rth "success" (getting the value a in a Bernoulli trial with outcome a or 0) occurs at the kth trial.
We describe now the asymptotics of $N_t$ as $t \to \infty$. First, note that by (7.3), $N_t \to \infty$ as $t \to \infty$ (a.s.); we have that $N_t/t \to 1/\mu$ (a.s.) provided that $\mu < \infty$. Indeed, by the strong law of large numbers, $S_n/n \to \mu$, a.s., as $n \to \infty$. On the other hand,
$$\left\{\omega : \frac{S_n(\omega)}{n} \to \mu\right\} \cap \{\omega : N_t(\omega) \to \infty\} \subseteq \left\{\omega : \frac{S_{N_t(\omega)}(\omega)}{N_t(\omega)} \to \mu\right\}.$$
$$\lim_{t\to\infty} \frac{N_t}{t} = 0 \quad \text{(a.s.)}.$$
To see this, apply Theorem 7.1 to the truncated sequence $X_n^a = X_n 1_{(X_n \le a)}$ and the associated $N_t^a$, $S_n^a$, and then let $a \to \infty$.
In the case of Poisson processes, the $X_n$'s are exponentially distributed with $\mu = \lambda^{-1}$ and variance $\sigma^2 = \operatorname{Var}(X_n) = \lambda^{-2}$. Thus $E(N_t) = \lambda t = t/\mu$ and $\operatorname{Var}(N_t) = \lambda t = t\sigma^2/\mu^3$. In general, it can be shown that (see Section 7.4)
$$E(N_t) = \frac{t}{\mu} + \frac{\sigma^2 - \mu^2}{2\mu^2} + o(1), \qquad t \to \infty,$$
and
$$\operatorname{Var}(N_t) = \frac{\sigma^2 t}{\mu^3} + o(t), \qquad t \to \infty.$$
$$\frac{N_t - t/\mu}{\sqrt{\sigma^2 t/\mu^3}}$$
will converge in distribution, as $t \to \infty$, to a standard normal random variable.
$$\lim_{t\to\infty} P\left(S_{n(t)} > t\right) = \Phi(x).$$
Choosing n(t) so that
$$\frac{t - n(t)\mu}{\sigma\sqrt{n(t)}} \to -x \quad \text{as } t \to \infty,$$
we get
$$m(t) = E(N_t), \qquad t \ge 0.$$
$$m(t) = \sum_{n=1}^{\infty} P(N_t \ge n) = \sum_{n=1}^{\infty} P(S_n \le t) = \sum_{n=1}^{\infty} F_n(t), \qquad (7.5)$$
$$m(t) = F_1(t) + \sum_{n=2}^{\infty} F_n(t) = F(t) + \sum_{n=1}^{\infty} (F_n * F)(t) = F(t) + (m * F)(t), \qquad (7.6)$$
where
$$(m * F)(t) = \int_0^t m(t-x)\,dF(x).$$
In convolution notation,
$$m = \sum_{n\ge1} F_n = \left(\sum_{n\ge0} F_n\right) * F = (m + 1) * F.$$
It turns out that the solution of (7.7) has the same pattern, namely
$$A = (m + 1) * H. \qquad (7.8)$$
Indeed, we have
$$H + (m+1)*H*F = H + F*H + F_2*H + \cdots = H*(1 + F + F_2 + \cdots) = H*(m+1).$$
Assuming that A and H are bounded on finite intervals, $(m+1)*H$ is the unique solution of (7.7). Indeed, if B is another solution of (7.7), then for $G = B - (m+1)*H$, we have $G = G*F$ (recalling that $(m+1)*H$ is a solution of (7.7)). But then,
G = G * F = (G * F) * F = ... = G * Fn, for all n.
Thus
$$G(t) = \lim_{n\to\infty} \int_0^t G(t-z)\,dF_n(z).$$
By hypothesis, the function G is bounded on [0, t] for each fixed t, say $|G| \le a_t$, so that
$$\left|\int_0^t G(t-z)\,dF_n(z)\right| \le a_t F_n(t).$$
But $m(t) = \sum_{n=1}^{\infty} F_n(t) < \infty$, implying that
$$\lim_{n\to\infty} F_n(t) = 0, \quad \text{for each fixed } t.$$
It follows that $G \equiv 0$.
As an example, consider $A(t) = E(S_{N_t+1})$. The so-called renewal argument (Section 7.1) is used to derive a renewal equation for A(t). Specifically,
$$A(t) = E\left[E\left(S_{N_t+1} \mid X_1\right)\right].$$
Now
$$E\left(S_{N_t+1} \mid X_1 = z\right) = \begin{cases} z & \text{for } t < z \\ z + A(t-z) & \text{for } t \ge z. \end{cases}$$
Thus
$$A(t) = \int_t^{\infty} z\,dF(z) + \int_0^t \left[z + A(t-z)\right]dF(z)$$
Theorem 7.3 (Elementary Renewal Theorem). If $0 < \mu = E(X_1) < \infty$, then
$$\lim_{t\to\infty} \frac{m(t)}{t} = \frac{1}{\mu}. \qquad (7.11)$$
we have
$$E\left(S_{N_t+1} - t\right) \le E\left(X_{N_t+1}\right)$$
(note that, in general, $E(X_{N_t+1}) \neq E(X_1)$), and
$$\frac{m(t)}{t} \le \frac{1}{\mu} + \frac{1}{\mu t}\, E\left(X_{N_t+1}\right).$$
Hence
$$\limsup_{t\to\infty} \frac{m(t)}{t} \le \frac{1}{\mu}. \qquad (7.13)$$
If the $X_n$'s are not bounded a.s., then we apply the previous analysis to the truncated renewal process
$$X_n^a = \begin{cases} X_n & \text{if } X_n < a \\ a & \text{if } X_n \ge a, \end{cases}$$
obtaining
$$\limsup_{t\to\infty} \frac{m(t)}{t} \le \frac{1}{E(X_1^a)}.$$
By letting $a \to \infty$, $E(X_1^a) \uparrow E(X_1)$ by the monotone convergence theorem (see Appendix), and we obtain the result. ◊
Remark. When $\mu = \infty$, $\lim_{t\to\infty} m(t)/t = 0$, by using the truncation technique.
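As a numerical illustration of Theorem 7.3 (here with Uniform(0,1) interarrival times, so $\mu = 1/2$), $m(t)/t$ approaches $1/\mu = 2$ as t grows; the sketch below estimates $E(N_t)/t$ by simulation (function name ours):

```python
import random

def simulate_m_over_t(t, rng, sims=20_000):
    """Estimate m(t)/t = E(N_t)/t for Uniform(0,1) interarrival times."""
    total = 0
    for _ in range(sims):
        clock, n = 0.0, 0
        while True:
            clock += rng.random()          # interarrival ~ Uniform(0,1), mu = 1/2
            if clock > t:
                break
            n += 1
        total += n
    return total / sims / t

rng = random.Random(5)
results = {t: simulate_m_over_t(t, rng) for t in (5.0, 20.0, 80.0)}
print(results)                             # values approach 1/mu = 2 as t grows
```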
Theorem 7.3 is in fact a consequence of a more general theorem in the next section.
of $t/\mu$. A refinement of this result will be given below in Theorem 7.4 (Renewal Theorem). In Theorem 7.4, we distinguish two types of distribution F.
In the context of renewal processes, F is the distribution of a non-negative random variable X. If X is discrete with values in $\mathbb{N} = \{0, 1, 2, \ldots\}$, then, with probability one, X takes values of the form nd, for $n \in \mathbb{N}$ and $d = 1$. It is clear that $x = nd$ is a point of increase of F in the sense that, for any real numbers $a < x < b$, we have $F(b) - F(a) > 0$, or equivalently, for any $s > 0$, $F(x+s) - F(x-s) > 0$. More generally, the distribution F is said to be arithmetic (or lattice) if there is $d > 0$ such that all points of increase of F are of the form nd, $n \in \mathbb{N}$. The largest such d is called the span of F. An arithmetic distribution F corresponds to a random variable X which assumes, with probability one, only values which are multiples of d. If there is no such d for F, then F is said to be nonarithmetic. For example, if F is continuous, then F is nonarithmetic.
Here is the so-called Renewal Theorem; its proof is complicated and hence omitted.
(ii) If the distribution F is arithmetic with span d, then, for any h which
is a multiple of d, (7.14) holds.
Remarks.
(a) In the statement of Theorem 7.4, the limits are 0 when $\mu = \infty$.
(b) The interpretation of Theorem 7.4 is this: for t large, the expected number of renewals in an interval of length h is approximately $h/\mu$.
The fact that Theorem 7.4 implies Theorem 7.3 is left as an exercise
(Exercise 7.6).
It turns out that Theorem 7.4 is equivalent to Theorem 7.5 (below)
which is useful in determining asymptotics of solutions of renewal equations.
Specifically, the limit, as t -+ 00, of A(t) = H(t) + (H * m)(t), solution of
renewal equation A(t) = H(t)+(A*F)(t), is provided in Theorem 7.5 when
the function H( . ) satisfies certain conditions.
Since technical details as well as a formal proof of Theorem 7.5 will be
omitted, we focus instead on motivation and applications of this theorem.
$$\int_{t-a}^{t} dm(x) = m(t) - m(t-a), \qquad \text{for } t > a.$$
$$\lim_{t\to\infty} A(t) = \frac{1}{\mu}\int_0^{\infty} H(x)\,dx, \qquad (7.15)$$
where $\int_0^{\infty} H(x)\,dx$ denotes the usual Riemann integral of H(x) on $[0,\infty)$.
Recall that a (measurable) function $H: [0,\infty) \to [0,\infty)$ is Riemann-integrable on $[0,\infty)$ if H is Riemann-integrable on [0, a] for all $a > 0$, and $\lim_{a\to\infty}\int_0^a H(x)\,dx$ exists. (The Riemann integral of H on $[0,\infty)$ is then taken to be this limit.)
From the above we see that (7.15) holds for $H(t) = 1_{[0,a]}(t)$, which is Riemann-integrable on $[0,\infty)$. However, if H(t) is an arbitrary Riemann-integrable function on $[0,\infty)$, (7.15) may fail (see Feller (1966), Vol. II, p. 349). To see which additional conditions we need to impose on Riemann-integrable functions H, consider the following.
For h > 0, the intervals [(n -1)h, nh), n ~ 1, form a partition of [0, 00).
Let
$$a_n(h) = \inf\{H(x) : (n-1)h \le x < nh\}, \qquad \beta_n(h) = \sup\{H(x) : (n-1)h \le x < nh\},$$
and
$$\alpha(h) = h\sum_{n=1}^{\infty} a_n(h), \qquad \beta(h) = h\sum_{n=1}^{\infty} \beta_n(h),$$
with step functions $f = \sum_{n\ge1} a_n(h)\,1_{[(n-1)h,\,nh)}$ and $g = \sum_{n\ge1} \beta_n(h)\,1_{[(n-1)h,\,nh)}$.
Then
f(t) ~ H(t) ~ g(t), Vt ~ 0,
so that
(f * m)(t) ~ (H * m)(t) ~ (g * m)(t).
Suppose that
$$\lim_{t\to\infty} (f * m)(t) = \frac{\alpha(h)}{\mu} \quad \text{and} \quad \lim_{t\to\infty} (g * m)(t) = \frac{\beta(h)}{\mu},$$
so that
$$\frac{\alpha(h)}{\mu} \le \liminf_{t\to\infty}(H * m)(t) \le \limsup_{t\to\infty}(H * m)(t) \le \frac{\beta(h)}{\mu}.$$
If, in addition, we suppose that
Thus, for $H : [0,\infty) \to [0,\infty)$ such that $\beta(h) < \infty$ for $h > 0$ and $\lim_{h\searrow 0}(\beta(h) - \alpha(h)) = 0$, (7.15) holds. Since the Riemann integral of such a function H is obtained directly as $\lim_{h\searrow 0}\alpha(h)$, H is said to be directly Riemann integrable.
A directly Riemann integrable function is Riemann integrable on [0, 00),
but the converse fails.
It can be shown that (see Exercise 7.7) the concept of direct Riemann
integrability coincides with the usual Riemann integrability for a function
which is zero outside of some finite interval, or monotonic.
Examples.
(i) $H(t) = 1_{[0,a]}(t)$.
(ii) $H \ge 0$, non-increasing, and $\int_0^{\infty} H(z)\,dz < \infty$.
We now state, without proof, the following important theorem.
Theorem 7.5 (Key Renewal Theorem). Let $(S_n, n \ge 0)$ be a renewal process with interarrival distribution F and mean $\mu = \int_0^{\infty} x\,dF(x)$. Let A be the solution of the renewal equation $A(t) = H(t) + (A * F)(t)$, where H is directly Riemann integrable. If F is nonarithmetic, then
$$\lim_{t\to\infty} A(t) = \frac{1}{\mu}\int_0^{\infty} H(x)\,dx. \qquad (7.16)$$
(ii) If the distribution F is arithmetic with span d, then, for any x,
$$\lim_{n\to\infty} A(x + nd) = \frac{d}{\mu}\sum_{k=0}^{\infty} H(x + kd). \qquad (7.17)$$
(The limits are zero when $\mu = \infty$, and (7.17) holds for all $x > 0$.)
In the rest of this section, we are going to use (7.16) to derive asymp-
totics of various quantities of interest in renewal theory.
In Section 7.3, we mentioned that, in a renewal process with F nonarithmetic, $E(X_1) = \mu$, and $\operatorname{Var}(X_1) = \sigma^2 < \infty$,
$$E(N_t) = m(t) = \frac{t}{\mu} + \frac{\sigma^2 - \mu^2}{2\mu^2} + o(1), \qquad t \to \infty.$$
$$A(t) = m(t) + 1 - \frac{t}{\mu}.$$
Now,
$$E\left(S_{N_t+1} - t \mid X_1 = x\right) = \begin{cases} x - t & \text{for } t \le x \\ E\left(S_{N_{t-x}+1} - (t-x)\right) & \text{for } t > x, \end{cases}$$
so that
$$\mu A(t) = \int_t^{\infty} (x-t)\,dF(x) + \int_0^t \mu A(t-x)\,dF(x),$$
or
$$\mu A(t) = H(t) + (\mu A * F)(t).$$
$$\lim_{t\to\infty} \mu A(t) = \frac{1}{\mu}\int_0^{\infty} H(x)\,dx = \frac{\mu^2 + \sigma^2}{2\mu}.$$
$$P(R_t > y \mid X_1 = x) = \begin{cases} 1 & \text{for } x > t+y \\ 0 & \text{for } t < x \le t+y \\ P(R_{t-x} > y) & \text{for } x \le t. \end{cases}$$
Thus
$$A(t) = P(R_t > y) = \int_0^{\infty} P(R_t > y \mid X_1 = x)\,dF(x) = \int_{t+y}^{\infty} dF(x) + \int_0^t A(t-x)\,dF(x) = H(t) + (A * F)(t),$$
where
$$H(t) = \int_{t+y}^{\infty} dF(x) = 1 - F(t+y).$$
$$P(R_t \le y) = 1 - A(t) = F(t+y) - \int_0^t [1 - F(t+y-x)]\,dm(x). \qquad (7.19)$$
Here $H(t) = 1 - F(t+y)$ is non-increasing, and it is directly Riemann integrable provided
$$\int_0^{\infty} [1 - F(t+y)]\,dt < \infty.$$
Indeed,
$$\int_0^{\infty} [1 - F(t+y)]\,dt = \int_y^{\infty} [1 - F(x)]\,dx \le \int_0^{\infty} [1 - F(x)]\,dx = E(X_1) = \mu.$$
Thus H(t) is directly Riemann integrable when $\mu < \infty$, and in this case, (7.16) yields
$$\lim_{t\to\infty} P\left(S_{N_t+1} - t > y\right) = \frac{1}{\mu}\int_y^{\infty} [1 - F(x)]\,dx, \qquad (7.20)$$
or
$$\lim_{t\to\infty} P(R_t \le y) = 1 - \frac{1}{\mu}\int_y^{\infty} [1 - F(x)]\,dx.$$
$$\lim_{t\to\infty} P\left(C_t \ge x,\ R_t \ge y\right) = \frac{1}{\mu}\int_{x+y}^{\infty} [1 - F(z)]\,dz, \qquad (7.21)$$
which, in turn, yields the limiting distribution of the current lifetime $C_t$:
$$\lim_{t\to\infty} P(C_t \ge x) = \lim_{t\to\infty} P(C_t \ge x,\ R_t \ge 0) = \frac{1}{\mu}\int_x^{\infty} [1 - F(z)]\,dz. \qquad (7.22)$$
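The limit law (7.20) for the residual lifetime can be observed by simulation. For F = Uniform(0,1) we have $\mu = 1/2$ and the limiting CDF $(1/\mu)\int_0^y(1-x)\,dx = 2y - y^2$ on [0,1]; a sketch (helper name ours):

```python
import random

def residual_life(t, rng):
    """R_t = S_{N_t+1} - t for a renewal process with Uniform(0,1) interarrivals."""
    clock = 0.0
    while clock <= t:
        clock += rng.random()
    return clock - t

rng = random.Random(9)
t, sims = 30.0, 50_000
samples = [residual_life(t, rng) for _ in range(sims)]
rows = []
for y in (0.25, 0.5, 0.75):
    emp = sum(s <= y for s in samples) / sims
    rows.append((y, emp, 2 * y - y * y))   # limiting CDF G(y) = 2y - y^2
print(rows)
```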
$$P\left(X_{N_t+1} > y \mid X_1 = z\right) = \begin{cases} A(t-z) & \text{for } z \le t \\ 1 & \text{for } z > \max(t,y) \\ 0 & \text{elsewhere,} \end{cases}$$
and hence
$$P\left(X_{N_t+1} > y\right) = \int_0^{\infty} P\left(X_{N_t+1} > y \mid X_1 = z\right)dF(z) = \int_{\max(t,y)}^{\infty} dF(z) + \int_0^t A(t-z)\,dF(z) = H(t) + (A * F)(t),$$
where $H(t) = 1 - F(\max(t,y))$. Thus
$$\lim_{t\to\infty} A(t) = \frac{1}{\mu}\int_0^{\infty} \left[1 - F(\max(t,y))\right]dt.$$
If a renewal process has been operating for a long time, so that the residual lifetime of the item in service at time zero has the above limiting distribution,
$$G(y) = \frac{1}{\mu}\int_0^y [1 - F(x)]\,dx, \qquad y > 0.$$
Thus
$$\frac{1}{\mu}\int_0^y [1 - F(x)]\,dx = \lambda\int_0^y e^{-\lambda x}\,dx = 1 - e^{-\lambda y} = F(y).$$
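The identity just verified, that the equilibrium distribution G of an exponential F is F itself, can also be confirmed numerically; the sketch below evaluates $G(y) = (1/\mu)\int_0^y [1-F(x)]\,dx$ by the trapezoid rule and compares it with F(y):

```python
import math

lam = 2.0
mu = 1.0 / lam                             # mean of the Exp(lam) lifetime

def G(y, n=10_000):
    """Equilibrium distribution (1/mu) * int_0^y [1 - F(x)] dx for F = Exp(lam)."""
    h = y / n
    vals = [math.exp(-lam * (k * h)) for k in range(n + 1)]   # 1 - F(x) = e^{-lam x}
    integral = h * (sum(vals) - 0.5 * (vals[0] + vals[-1]))   # trapezoid rule
    return integral / mu

rows = []
for y in (0.1, 0.5, 2.0):
    rows.append((y, G(y), 1 - math.exp(-lam * y)))            # G(y) vs F(y)
print(rows)
```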
7.5 Exercises
7.1. Let $(X_n, n \ge 1)$ be a sequence of i.i.d. nonnegative random variables.
(i) Show that if $E(X_n) = 0$ then $P(X_n = 0) = 1$.
(ii) Show that $E(X_n) > 0$ if and only if $P(N_t < \infty) = 1$, where $N_t = \sup\{n \ge 0 : S_n \le t\}$, $t \ge 0$, $S_0 = 0$, $S_n = X_1 + X_2 + \cdots + X_n$, $n \ge 1$.
7.2. Let $(N_t, t \ge 0)$ be a Poisson process with intensity $\lambda$. Use the formula $m(t) = \sum_{n=1}^{\infty} F_n(t)$ to compute m(t).
7.3. Use the renewal argument to show that the renewal function m(t) satisfies the renewal equation, noting that
$$E(N_t \mid X_1 = x) = \begin{cases} 0 & \text{if } t < x \\ 1 + m(t-x) & \text{if } t \ge x. \end{cases}$$
$$\lim_{t\to\infty}\left(m(t+h) - m(t)\right) = \frac{h}{\mu},$$
$$\frac{m(n)}{n} \to \frac{1}{\mu}, \quad \text{as } n \to \infty.$$
(Hint: Look at $m(n+1) - m(n)$ and use the fact that if $x_n \to x$ then $\sum_{k=1}^n x_k/n \to x$, as $n \to \infty$.)
$$\frac{m(t)}{t} \to \frac{1}{\mu} \quad \text{as } t \to \infty.$$
(iii) Suppose that F is arithmetic with span d. Show that the result of
(ii) still holds. (Hint: First show (i) by looking at m(nd + d) - m(nd) and
applying Theorem 7.4.)
7.7. Let $H : [0,\infty) \to [0,\infty)$. Show that
(i) If H is directly Riemann integrable on [0,00), then H is Riemann
integrable on [0,00).
(ii) If H is non-increasing and Riemann integrable on [0,00), then H is
directly Riemann integrable.
(iii) If H is continuous and zero outside some finite interval, then H is
directly Riemann integrable.
7.8. Suppose that the distribution F is nonarithmetic. Use Theorem 7.5 to
obtain (i) of Theorem 7.4.
7.9. Let $H(t) = \int_t^{\infty} (x-t)\,dF(x)$. Show that H(·) is monotone non-increasing and Riemann integrable over $[0,\infty)$.
7.10. Let $(S_n, n \ge 1)$ be a renewal process with $F(x) = (1 - e^{-\lambda x})\,1_{(0,\infty)}(x)$.
$$G(y) = P(X_1 \le y) = \frac{1}{\mu}\int_0^y [1 - F(x)]\,dx,$$
where F is the common distribution of the $X_n$'s, $n \ge 2$, and $\mu = \int_0^{\infty} [1 - F(x)]\,dx$.
7.14. (Alternating renewal processes).
Consider a machine which can be either in operating condition (on) or
in breakdown state (off). At time t = 0, the machine is on and remains
on until time $U_1$, at which it breaks down. Let $V_1$ be the repair time after the first breakdown. After repair, the machine will be on for a length of
time U2, and so on. Suppose that the random variables Un, n ~ 1 (resp.
Vn , n ~ 1) are i.i.d. with common distribution G (resp. H), and these two
sequences of random variables are independent of each other.
Let Xn = Un + Vn , n ~ 1.
(i) Show that So = 0, Sn = Xl + X 2 + ... + X n , n ~ 1 form a renewal
process. Specify the common distribution F of the Xn's in terms of G and
H.
(ii) What is the meaning of $N_t = \max\{n : S_n \le t\}$?
(iii) Let P(t) be the probability that the machine is on at time t. Find a renewal equation for P(t).
(iv) Solve the renewal equation in (iii) to get P(t).
(v) Suppose that F is nonarithmetic; find the limiting probability $\lim_{t\to\infty} P(t)$.
7.15. Let $(S_n, n \ge 0)$ be a (delayed) stationary renewal process. Show that the distribution of $S_{N_t+1} - t$ is independent of t, namely,
$$P\left(S_{N_t+1} - t \le y\right) = \frac{1}{\mu}\int_0^y [1 - F(x)]\,dx$$
for all t.
Lesson 8
Queueing Theory
the system are: the way in which the customers arrive, the type of service,
the service policy and the number of servers. Having identified the type
of service, the service policy and the number of servers, we are uncertain,
except in trivial cases, about the arrival times of customers as well as the
duration of their service times at the counters. If we view the irregularities
in these uncertain quantities as statistical fluctuations, then we can model
them as random quantities using probability theory. Moreover, since queueing systems evolve in time, stochastic processes are a natural modeling tool.
The random components of a queueing system consist of the arrival times and the service times of customers. If we denote by $0 < T_1 < T_2 < \cdots$ the successive arrival times of customers ($T_0 = 0$), then, as random variables, they form a point process on $[0,\infty)$ (see Lesson 4). Let the interarrival times be $X_n = T_n - T_{n-1}$, $n \ge 1$. The $X_n$'s are assumed to be i.i.d., positive random variables, with common distribution F (so that $T_n = X_1 + \cdots + X_n$, $n \ge 0$, is a renewal process). Also, let $Y_n$ be the service time of the nth customer. It is reasonable to assume that the $Y_n$'s are positive i.i.d. random variables, with common distribution H. Moreover, $(X_n, n \ge 1)$ and $(Y_n, n \ge 1)$ are independent of each other. With this structure, the random components of a queueing system are characterized by the distributions F and H.
To complete the description of the structure of a queueing system, we need to specify the service policy and the number of servers.
As an introduction to queueing theory, we consider only the most natural and the simplest service policy, namely "first come, first served", that is, customers are served in the order of their arrival. The number of servers s is either 1 (a single-server system), $1 < s < \infty$ (s counters in parallel), or even $s = \infty$ (in this case, each customer is served immediately on arrival, so that there is no queue; this situation is not only of theoretical interest, but can also be used as an approximation to systems with large s). For simplicity, we assume that the capacity of the queueing system under study is unlimited, that is, all arrivals are admitted to the system.
Thus, in the case of unlimited capacity and a specified service policy
(here ''first come, first served"), a queueing system is characterized as a
triple F / H / s. For example, consider a queueing system with s = 1, and in
which, customers arrive according to a Poisson process with rate A, so that
F is exponential, that is, F(t) = 1 − e^{−λt}, t ≥ 0.
that more than one arrival can occur in a small time interval, such as in
telephone communication systems.
Queueing systems F/H/s are classified according to the nature of F,
H, and s. Thus M/M/1 denotes a queueing system with both F and H
exponential and s = 1 (a single server Poisson queue), where M stands
for Markov, in view of the lack-of-memory property of the exponential
distribution (namely, P(X > s + t | X > t) = P(X > s), for all t, s > 0).
Let G stand for "general"; then M/G/3 denotes a queueing system with
F exponential, H arbitrary, and s = 3. In this lesson, we will study the
systems M/M/s, 1 ≤ s ≤ ∞, and M/G/1.
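Before turning to the mathematics, the FCFS dynamics just described are easy to simulate. The sketch below (an added illustration, not from the text; the rate values are arbitrary) estimates the mean time a customer spends in an M/M/1 system, which for ρ < 1 should approach 1/(μ − λ):

```python
import random

def simulate_mm1(lam, mu, n_customers, seed=0):
    """Simulate an M/M/1 FCFS queue; return the mean time spent in the system."""
    rng = random.Random(seed)
    arrival = 0.0        # arrival time of the current customer
    depart_prev = 0.0    # departure time of the previous customer
    total_time = 0.0
    for _ in range(n_customers):
        arrival += rng.expovariate(lam)            # interarrival ~ Exp(lambda)
        start = max(arrival, depart_prev)          # service starts when server is free
        depart_prev = start + rng.expovariate(mu)  # service ~ Exp(mu)
        total_time += depart_prev - arrival        # time in system
    return total_time / n_customers

est = simulate_mm1(lam=1.0, mu=2.0, n_customers=200_000)
print(est)  # close to 1/(mu - lam) = 1
```

The estimate fluctuates around the theoretical value; longer runs tighten it.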
ρ = λ/μ, θ_{−j}(t) = ρ^{−j} θ_j(t), j ≥ 0,
and
Θ_j(t) = Σ_{i≤j} θ_i(t).
Remarks.
(i) If we let A(t), D(t) be the number of arrivals and departures during
(0, t], respectively, then (θ_j(t), j ∈ ℤ) is the probability density of the
random variable X(t) = A(t) − D(t), and Θ_j(t) = P(X(t) ≤ j) (see Exercise
8.1).
(ii) For j ≥ 0, if we denote by I_j(z) the (modified) Bessel function of
order j, that is,
I_j(z) = Σ_{n=0}^∞ (1/(n!(n + j)!)) (z/2)^{2n+j},
then
θ_j(t) = e^{−(λ+μ)t} ρ^{j/2} I_j(2t√(λμ)). (8.3)
(iii) In Exercise 8.1, the student is asked to show that
E(A(Y)) = E[E(A(Y)|Y)] = ∫_0^∞ λt dH(t) = λE(Y) = λ/μ,
that is, ρ is the expected number of arrivals during a service time.
Proof. Look at (8.2) and (8.3). First, we use the following result (see
Erdelyi, 1953):
I_j(x) ~ e^x/√(2πx), x → ∞, for all j.
For x = 2t√(λμ) and t → ∞, we have
θ_j(t) ~ e^{−(√λ − √μ)² t} ρ^{j/2} / (2√π (λμ)^{1/4} t^{1/2}),
and hence
lim Pij(t) = O.
t-+oo
If ρ < 1, then
lim_{t→∞} Θ_k(t) = 1, for all k,
since
1 − Θ_k(t) = Σ_{j=k+1}^∞ θ_j(t) ~ (e^{−(√λ−√μ)² t} / (2(πt)^{1/2} (λμ)^{1/4})) Σ_{j=k+1}^∞ ρ^{j/2},
with Σ_{j=k+1}^∞ ρ^{j/2} < ∞.
Remarks.
(i) The above theorem says that π(j) = lim_{t→∞} P_ij(t) exists for all
i ∈ ℕ and is independent of i. If ρ < 1, then π is the unique stationary
distribution of the Markov chain (Q(t), t ≥ 0), whereas, if ρ ≥ 1, then there
is no stationary distribution.
(ii) It is left as an exercise (Exercise 8.2) to verify that, when ρ < 1, the
stationary distribution is the geometric distribution π(j) = (1 − ρ)ρ^j, j ≥ 0.
Thus, in its steady state mode, the queueing system is governed by the
random variable Q(∞), or simply Q, whose distribution is π(·).
The expected number of customers in the system is
E(Q) = Σ_{j=0}^∞ j π(j) = (1 − ρ) Σ_{j=0}^∞ j ρ^j = ρ/(1 − ρ) = λ/(μ − λ),
and
(1 − ρ)μ ∫_0^t e^{−μs} Σ_{j=0}^∞ (λs)^j/j! ds = (1 − ρ)μ ∫_0^t e^{−μ(1−ρ)s} ds = 1 − e^{−(μ−λ)t}.
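As a quick numerical check (added here; the rates are arbitrary), the geometric stationary law π(j) = (1 − ρ)ρ^j indeed yields E(Q) = ρ/(1 − ρ):

```python
lam, mu = 1.0, 2.0
rho = lam / mu

# stationary distribution of M/M/1: pi(j) = (1 - rho) * rho**j
pi = [(1 - rho) * rho**j for j in range(200)]  # truncate the negligible tail

mean_q = sum(j * p for j, p in enumerate(pi))
print(abs(mean_q - rho / (1 - rho)) < 1e-10)  # True
```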
W* = { 0                       if Q = 0,
     { Y_1* + Y_2 + ⋯ + Y_Q    if Q ≥ 1.
We have
P(W" = 0) = P(Q = 0) = 1- p
and
for t > 0,
P(W* ≤ t) = (1 − ρ) + Σ_{j=1}^∞ (1 − ρ)ρ^j ∫_0^t e^{−μs} μ^j s^{j−1}/(j − 1)! ds.
It follows that
E(W*) = ρ/(μ − λ).
It is easy to check that
Remarks.
(i) For an M/M/1 queueing system, the waiting time W(t) is the time
required to serve all customers present in the system at time t. It is clear
that
W(t) = { 0                           if Q(t) = 0,
       { Y_1* + Y_2 + ⋯ + Y_{Q(t)}   if Q(t) ≥ 1.
g(x) = (1/x) θ_{−1}(x), x > 0 (see (8.3)),
where
θ_{−1}(x) = e^{−(λ+μ)x} √(μ/λ) Σ_{n=0}^∞ (1/(n!(n + 1)!)) (x√(λμ))^{2n+1}.
P(B < ∞) = { 1     if ρ ≤ 1,
           { 1/ρ   if ρ > 1,
meaning that the busy period will eventually end when ρ ≤ 1, and may
continue indefinitely, with positive probability, when ρ > 1.
If we let B_i denote a busy period initiated by i customers (that is,
initially there are i customers in the queue), then B_i is the sum of i i.i.d.
random variables which are distributed as B.
180 Lesson 8
λ_i = λ, i ≥ 0,
μ_i = { iμ   if 1 ≤ i ≤ s,
      { sμ   if i > s,
μ_0 = 0. (8.4)
The quantity sμ is referred to as the global service rate. The traffic
intensity of M/M/s is defined to be ρ = λ/(sμ).
As usual, the transition probabilities of the chain (Q(t), t ≥ 0) can be
obtained by solving, say, the forward equations (see (5.19), Lesson 5). For
i ≥ 0, j ≥ 1,
Thus,
π(n) = (λ_0 λ_1 ⋯ λ_{n−1} / (μ_1 μ_2 ⋯ μ_n)) π(0), n ≥ 1. (8.8)
It follows that
π(0) = [ Σ_{n=0}^{s−1} (1/n!)(λ/μ)^n + (1/s!)(λ/μ)^s (1 − ρ)^{−1} ]^{−1},
and
π(j) = { (1/j!)(λ/μ)^j π(0)             for 0 ≤ j
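These formulas translate directly into code. The sketch below (an added illustration with arbitrary parameter values) computes π(0) and the π(j) for an M/M/s queue, assuming ρ = λ/(sμ) < 1:

```python
from math import factorial

def mms_stationary(lam, mu, s, jmax):
    """Stationary probabilities pi(0..jmax) of an M/M/s queue (requires rho < 1)."""
    a = lam / mu          # offered load lambda/mu
    rho = a / s           # traffic intensity
    # pi(0) = [ sum_{n<s} a^n/n! + (a^s/s!) (1-rho)^{-1} ]^{-1}
    pi0 = 1.0 / (sum(a**n / factorial(n) for n in range(s))
                 + a**s / factorial(s) / (1 - rho))
    pis = []
    for j in range(jmax + 1):
        if j < s:
            pis.append(a**j / factorial(j) * pi0)
        else:
            pis.append(a**j / (factorial(s) * s**(j - s)) * pi0)
    return pis

probs = mms_stationary(lam=3.0, mu=2.0, s=2, jmax=500)
print(abs(sum(probs) - 1.0) < 1e-9)  # True: the truncated tail is negligible
```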
Consider next the queue M/M/oo. In this case, there is no queue: each
customer, upon arrival, is immediately served. The system length Q(t)
becomes the number of busy servers at time t. (Q(t), t ≥ 0) is a birth and
death process with birth and death rates given by
λ_i = λ, i ≥ 0,
μ_i = iμ, i ≥ 1,
and the corresponding forward equations are
It can be shown that the solution P_ij(t) of this system of equations is
the convolution of a binomial distribution with parameters (i, p) and a
Poisson distribution with mean (λ/μ)q, where p = e^{−μt} = 1 − q.
Here, for all 0 < λ, μ < ∞, the stationary distribution of (Q(t), t ≥ 0)
exists and is the Poisson distribution with mean λ/μ.
and only if for all t > T ∈ {t_n, n ≥ 1}, the conditional distribution of Q(t)
given Q(T) is the same as the conditional distribution of Q(t) given Q(z),
z ≤ T. Thus, the evolution of Q(t) after T is independent of its history
during (0, T]. Note that if Q(t) is Markovian, such as in an M/M/1 queue,
then the set of all t is the set of regeneration points for Q(t). It follows that
the discrete-time process Q_n = Q(T_n), n ≥ 1, is a discrete-time Markov
chain. (Q_n, n ≥ 1) is referred to as an imbedded Markov chain. We are
going to determine the stationary distribution of (Q_n, n ≥ 1) to provide
information about the queueing system, where characteristics of the system
will be computed at times which are instants of departure of customers.
Note that in an M/M/1 queue, where (Q(t), t ≥ 0) is Markovian, the
queueing system will be in steady mode after a large and arbitrary time t.
We assume that the variance of the service time, Var(Y), is finite, and
we write E(Y) = 1/μ.
It is easy to check that Q_n = Q(t_n), n ≥ 1, is a Markov chain. Note that
Q_n is the number of customers left behind in the system by the customer
departing at time t_n. Let C_n denote the number of customers arriving
during the service time of the nth customer, that is, C_n = N(Y_n), where
N(t) is the number of arrivals during (0, t], and Y_n is distributed as H.
Since
Q_{n+1} = { C_{n+1}              if Q_n = 0,
          { Q_n − 1 + C_{n+1}    if Q_n ≥ 1,
or, equivalently,
Q_{n+1} = Q_n − δ_n + C_{n+1}, (8.11)
where
δ_n = { 1   if Q_n ≥ 1,
      { 0   if Q_n = 0,
we see that Q_{n+1} does not depend on Q_m, m ≤ n − 1.
From (8.11), the one-step transition matrix P of the chain (Q_n, n ≥ 1)
is easily obtained:
P_{0j} = P(Q_{n+1} = j | Q_n = 0) = P(C_{n+1} = j) = a_j, j ≥ 0.
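The recursion (8.11) can be simulated directly. In the sketch below (an added illustration; for exponential service, C_n = N(Y_n) is sampled as a Poisson(λY) variate with Y ~ Exp(μ)), the long-run fraction of departure instants leaving an empty system approaches π(0) = 1 − ρ:

```python
import math
import random

def sample_C(rng, lam, mu):
    """Number of Poisson(lam) arrivals during one service time Y ~ Exp(mu)."""
    y = rng.expovariate(mu)
    # Knuth's method for a Poisson(lam * y) variate
    L, k, p = math.exp(-lam * y), 0, 1.0
    while True:
        p *= rng.random()
        if p <= L:
            return k
        k += 1

def embedded_chain(lam, mu, n_steps, seed=1):
    """Iterate Q_{n+1} = Q_n - delta_n + C_{n+1}; return fraction of steps with Q = 0."""
    rng = random.Random(seed)
    q, zeros = 0, 0
    for _ in range(n_steps):
        delta = 1 if q >= 1 else 0              # delta_n as in (8.11)
        q = q - delta + sample_C(rng, lam, mu)
        zeros += (q == 0)
    return zeros / n_steps

frac = embedded_chain(lam=1.0, mu=2.0, n_steps=100_000)
print(frac)  # roughly 1 - rho = 0.5
```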
For i ≥ 1,
P_{ij} = P(Q_{n+1} = j | Q_n = i) = P(C_{n+1} = j − i + 1) = a_{j−i+1}, j ≥ i − 1.
Thus, the stationary distribution (π(j), j ≥ 0) of the chain satisfies
π(j) = π(0)a_j + Σ_{i=1}^{j+1} π(i) a_{j−i+1}, j ≥ 0. (8.13)
We see that, if W denotes the total time spent in the system by a customer,
with distribution F_W, then Q_n is the number of arrivals during the system
time of the nth (departing) customer. Thus,
π(j) = ∫_0^∞ e^{−λt} (λt)^j / j! dF_W(t),
and hence
E(Q) = Σ_{j=0}^∞ j π(j) = λE(W).
It follows that
E(W) = (1/λ) E(Q) = 1/μ + λE(Y²) / (2(1 − ρ)).
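For exponential service, E(Y²) = 2/μ², and the formula above reduces to the M/M/1 value E(W) = 1/(μ − λ); a one-line check (added here, with arbitrary rates):

```python
lam, mu = 1.0, 2.0
rho = lam / mu
EY2 = 2.0 / mu**2                              # second moment of Exp(mu) service
EW = 1.0 / mu + lam * EY2 / (2 * (1 - rho))    # E(W) = 1/mu + lam E(Y^2)/(2(1-rho))
print(abs(EW - 1.0 / (mu - lam)) < 1e-12)      # True
```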
Note that, if W* denotes the waiting time of a customer, then
8.5 Exercises
8.1. In an M/M/1 queue with interarrival times and service times exponentially
distributed with parameters λ, μ, respectively, let
A(t) = number of arrivals during (0, t],
D(t) = number of departures during (0, t],
and
X(t) = A(t) - D(t), t ~ O.
(i) Determine the distribution of X(t).
(ii) Show that
Show that, for any t > 0, Q(t) has the same distribution as Q(O).
where Aj denotes the event that, upon arrival, the customer has in front of
him j customers in the system.
(iii) With the notation given in Section 8.2, show that E(W) = E(W*) +
1/μ. Use Little's formula to show next that E(Q) = E(Q*) + λ/μ.
8.5. Consider a queue M/M/s with 1 < s < ∞. Show that the birth and
death rates of the Markov chain (Q(t), t ≥ 0) are
λ_i = λ, i ≥ 0,
μ_i = { iμ   if 1 ≤ i ≤ s,
      { sμ   if i > s.
8.6. In the stationary mode of the queue M/M/s with 1 < s < ∞,
(i) Compute the expected system-length E(Q).
(ii) What is the probability that a customer must wait for service?
(iii) Compute the expected number of customers in the queue E(Q*).
(iv) Find the distribution of Q*.
(v) Let W* be the waiting time of a customer; find the distribution of
W* and E(W*).
E(Y) = ∫_0^∞ t dH(t) = 1/μ,
E(Y²) = ∫_0^∞ t² dH(t) < ∞.
(i) Let C_n be the number of arrivals during the service time Y_n of the
nth customer. Show that E(C_n) = λ/μ = ρ.
(Hint: C_n = N(Y_n).)
(ii) Compute the generating function of C_n:
g(z) = Σ_{j=0}^∞ a_j z^j, |z| ≤ 1.
(iii) Show that
E(Q) = ρ + λ²E(Y²) / (2(1 − ρ)).
(Hint: Take expectations in Q²_{n+1} = (Q_n − δ_n + C_{n+1})².)
Lesson 9
Stationary Processes
γ_t = ∫_{−π}^{π} cos(λt) f(λ) dλ, t ∈ ℤ. (9.7)
Now since
|(1 − |t|/n) γ_t cos(λt)| ≤ |γ_t|,
we may apply the Lebesgue dominated convergence theorem to the counting
measure over ℤ (see Appendix). It follows that
lim_{n→∞} E(I_n(λ)) = f(λ),
and since I_n(λ) ≥ 0 implies E(I_n(λ)) ≥ 0, we have f(λ) ≥ 0 and the proof
of Theorem 9.1 is therefore complete. ◊
More generally, it can be shown that, if (X_t) is any stationary process,
then there exists a bounded measure on [−π, π], say μ, such that
γ_t = ∫_{−π}^{π} cos(λt) dμ(λ), t ∈ ℤ.
Note that, if the spectral density f does exist, then dμ(λ) = f(λ) dλ.
If X_s and X_t remain highly correlated for large |t − s|, then the spectral
density does not exist (see Examples 9.2 and 9.4).
We now give some important examples of stationary processes with their
spectral measures.
Example 9.1 A real second order process (ε_t, t ∈ ℤ) is said to be a white
noise if
(i) E(ε_t) = 0, t ∈ ℤ,
(ii) E(ε_t²) = σ² > 0, t ∈ ℤ, and
(iii) E(ε_s ε_t) = 0, s, t ∈ ℤ, s ≠ t.
If in addition the ε_t's are i.i.d., then (ε_t) is said to be a strong white
noise. A white noise is stationary with autocovariance γ_t = σ² 1_{t=0} and
consequently with spectral density given by
f(λ) = σ²/(2π), λ ∈ [−π, π]. (9.13)
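A strong white noise is easy to generate, and its empirical autocovariance reproduces γ_t = σ² 1_{t=0}; the following sketch (an added illustration, stdlib only) checks this:

```python
import random

rng = random.Random(0)
sigma, n = 1.5, 200_000
eps = [rng.gauss(0.0, sigma) for _ in range(n)]   # strong (i.i.d.) white noise

def acov(x, t):
    """Empirical autocovariance at lag t for a zero-mean sample."""
    return sum(x[i] * x[i + t] for i in range(len(x) - t)) / (len(x) - t)

print(acov(eps, 0))   # about sigma**2 = 2.25
print(acov(eps, 3))   # about 0.0
```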
X_t = Σ_{j=0}^∞ ρ^j ε_{t−j}, t ∈ ℤ, (9.14)
where (ε_t) is a white noise and |ρ| < 1. Then (X_t) is stationary with
autocorrelation (ρ^{|t|}) and spectral density
f(λ) = (σ²/(2π)) |1 − ρe^{iλ}|^{−2}, λ ∈ [−π, π]. (9.15)
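As a numerical check of (9.15) (added here), integrating f over [−π, π] recovers γ_0 = σ²/(1 − ρ²):

```python
import cmath
import math

sigma2, rho = 1.0, 0.6

def f(lam):
    # spectral density of X_t = sum_j rho^j eps_{t-j}, as in (9.15)
    return sigma2 / (2 * math.pi) * abs(1 - rho * cmath.exp(1j * lam)) ** -2

# gamma_0 = integral of f over [-pi, pi] should equal sigma2 / (1 - rho^2)
n = 20_000
h = 2 * math.pi / n
gamma0 = sum(f(-math.pi + (k + 0.5) * h) for k in range(n)) * h
print(abs(gamma0 - sigma2 / (1 - rho**2)) < 1e-6)  # True
```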
The model (9.16) is crucial since it may be proved that every stationary
process can be approximated by processes of that type.
Relation (9.18) shows how the spectral measure points out the predom-
inant amplitudes and frequencies of a stationary process.
We now give a result which allows us to compute the spectral measure
of a stationary process defined by a linear filter.
If Y_t = Σ_{j=0}^∞ c_j X_{t−j} with Σ_j |c_j| < ∞, then (Y_t) is stationary with
spectral measure dμ_Y(λ) = |Σ_{j=0}^∞ c_j e^{iλj}|² dμ_X(λ).
Proof. First using the Cauchy criterion, it is easy to check that the series
in (9.17) converges in mean square.
On the other hand, since convergence in mean square implies convergence
in L¹-mean, we have
|Σ_{1≤j,j'≤n} c_j c_{j'} cos λ[(t − s) + (j − j')]| ≤ (Σ_j |c_j|)²,
The partial correlation of two square integrable random variables X and Y
given Z_1, ..., Z_k is defined as
r(X, Y; Z_1, ..., Z_k) = Corr(X − X*, Y − Y*),
where X· and Y· are the orthogonal projections of X and Y on Sp(Zl, ... , Zk).
Now the partial autocorrelation of a stationary process (X_t, t ∈ ℤ)
is defined by
Using stationarity one may infer that σ² does not depend on t (see Exercise
9.3).
(X_t) is said to be regular if σ² > 0; otherwise (X_t) is said to be
deterministic.
If (X_t) is deterministic, then (9.21) entails X_{t+1} ∈ H_t (a.s.) and
consequently the past and present of the process determine its future.
If (X_t) is regular, one may define a white noise by setting
(Ct) is called the innovation (process) of (Xt ). The following theorem pro-
vides a decomposition of a regular process.
where the series converges in mean square and where a_0 = 1, Σ_j a_j² < ∞,
(ε_t) is a white noise, ε_t ⊥ Y_s, s, t ∈ ℤ, and Y_t ∈ ∩_{j≥0} H_{t−j}, t ∈ ℤ.
The sequences (a_j), (ε_t), and (Y_t) are uniquely defined by (9.23) and the
above properties. In particular, (ε_t) is the innovation of (X_t).
X_t = Σ_{j=0}^∞ b_j ε'_{t−j} + Y'_t,
that is,
X_t = ε'_t + (Σ_{j=1}^∞ a_j ε'_{t−j} + Y'_t),
so that
Σ_{j=1}^∞ a_j ε_{t−j} + Y_t = X̂_t and ε_t = X_t − X̂_t.
The rest of the proof is straightforward. <>
The process Z_t = X_t − Y_t, t ∈ ℤ, has the Wold decomposition
Z_t = Σ_{j=0}^∞ a_j ε_{t−j}, t ∈ ℤ.
Theorem 9.4 (Weak law of large numbers) Let (X_t, t ∈ ℤ) be a stationary
process such that E(X_t) = m and with autocovariance (γ_t). Then
X̄_n →^{L²} m ⟺ (1/n) Σ_{|t|≤n−1} (1 − |t|/n) γ_t → 0. (9.25)
E(X̄_n − m)² = (1/n²) Σ_{1≤s,t≤n} Cov(X_s, X_t)
            = (1/n²) Σ_{1≤s,t≤n} γ_{t−s}
            = (1/n) Σ_{|t|≤n−1} (1 − |t|/n) γ_t, (9.27)
hence (9.25).
In order to prove (9.26), we first note that |(1 − |t|/n) γ_t| ≤ |γ_t| and
then apply the dominated convergence theorem to the counting measure over
ℤ (see Appendix), obtaining
Σ_{|t|≤n−1} (1 − |t|/n) γ_t → Σ_{t∈ℤ} γ_t.
X_t = m + Σ_{j=0}^∞ a_j ε_{t−j}, t ∈ ℤ,
where m ∈ ℝ, a_0 = 1, Σ_j |a_j| < ∞, and (ε_t) is a strong white noise, then
√n (X̄_n − m) →^D N, (9.28)
where N denotes a random variable with distribution N(0, Σ_{t∈ℤ} γ_t).
Note that the variance of N may be written under the alternative form
σ² (Σ_j a_j)².
9.4 Stationary Processes in Continuous Time
In this section we give some indications about second order continuous time
processes and weakly stationary continuous time processes.
Second order calculus. Consider a second order process (X_t, t ∈ I), where
I is an interval of reals. It may be considered as a random function by
defining the transformation which associates to every ω ∈ Ω the function
t ↦ X_t(ω), called the sample function corresponding to ω.
Then it is natural to construct a theory in which it is possible to talk
about continuity, differentiation and integration of the process. We develop
these concepts in the L² sense. In this section, we suppose that (t, ω) ↦
X_t(ω) is B(I) ⊗ A–B(ℝ) measurable (see Appendix).
Theorem 9.7 Let (X_t, t ∈ I) be a zero mean second order process. The
following conditions are equivalent.
where a = t_{n,0} < t_{n,1} < ⋯ < t_{n,n} = b and s_{n,i} ∈ [t_{n,i−1}, t_{n,i}). If
I_n →^{L²} I as n → ∞, we write
I = ∫_a^b X_t dt. (9.29)
It can be proved that I does not depend on the sequence of partitions (tn,i)
and on the choice of (Sn,i).
Note that I is a square integrable random variable and E(I) = lim_n E(I_n) =
0. If (X_t) is not zero mean, its L²-integral is defined by ∫_a^b (X_t − E(X_t)) dt +
∫_a^b E(X_t) dt, provided t ↦ E(X_t) is integrable over [a, b].
We have the following basic criterion for L 2-integrability.
Proof. Let (I_n) and (J_m) be Riemann sums associated with two sequences
of partitions (t_{n,i}) and (τ_{m,j}). (X_t) is L²-integrable with integral I if and
only if I_n →^{L²} I and J_m →^{L²} I. These conditions are equivalent to E(I_n −
This last condition means that C is Riemann integrable on [a, b] x [a, b]. <>
The following property of the integral is useful.
E[∫_a^b f(t) X_t dt ∫_a^b g(t) X_t dt] = ∫∫_{[a,b]²} f(s) C(s, t) g(t) ds dt. (9.30)
In particular,
E(∫_a^b X_t dt)² = ∫∫_{[a,b]²} C(s, t) ds dt, (9.31)
and (9.29) remains valid if C is only assumed to be integrable on [a, b] x [a, b].
by setting
f(λ) = (1/(2π)) ∫_{−∞}^{∞} γ_t cos(λt) dt, λ ∈ ℝ. (9.34)
If f is integrable on ℝ we have the Fourier inversion formula
The following theorem summarizes some results about second order cal-
culus for stationary processes. Proof is omitted.
V((1/T) ∫_0^T X_t dt) = (1/T²) ∫∫_{[0,T]²} γ(t − s) ds dt
  = (2/T²) ∫∫_{0≤s≤t≤T} γ(t − s) ds dt
  = (2/T²) ∫_0^T (T − u) γ(u) du
  = (1/T) ∫_{−T}^{T} (1 − |u|/T) γ(u) du,
therefore
V((1/T) ∫_0^T X_t dt) ≤ (1/T) ∫_{−T}^{T} |γ(u)| du,
and, since
V((1/T) ∫_0^T X_t dt) = E((1/T) ∫_0^T X_t dt − m)²,
the proof of Theorem 9.11 is now complete. ◊
9.5 Exercises
9.1. Prove (9.15), (9.17), and (9.18).
9.2. (i) Give a detailed proof of Theorem 9.1.
(ii) Give a proof of Theorem 9.2 without using Lebesgue theorem but
with the additional assumption Et Iht I < 00.
9.3. Let (Xt, tEll) be a zero mean stationary process.
(i) Show that X̂_t^{(p)} →^{L²} X̂_t as p → ∞, where X̂_t^{(p)} denotes the orthogonal
projection of X_t on sp(X_{t−1}, ..., X_{t−p}).
(ii) Use (i) to prove that E(X_t − X̂_t)² does not depend on t.
9.4. What is the Wold decomposition of the process given by (9.14)? Justify
your answer.
9.5. Let (ε_t, t ∈ ℤ) be a strong white noise. Set X_t = a + bt + ε_t, t ∈ ℤ,
and define
Z_t = (1/(2k + 1)) Σ_{j=−k}^{k} X_{t+j}, t ∈ ℤ.
Compute the mean and the covariance of (Zt). Is (Zt) stationary?
9.6. Let (N_t, t ≥ 0) be a Poisson process with intensity λ. Study its
L²-continuity, L²-differentiability, and L²-integrability.
9.7. Show that L 2-differentiability does not imply the usual differentiability
of the sample functions. (Hint: Consider the process X_t(ω) = 1_{{t}}(ω), t ∈
[0, 1], with (Ω, A, P) = ([0, 1], B[0, 1], μ), where μ denotes Lebesgue measure.)
9.8. Prove Theorem 9.9.
9.9. (i) Prove Theorem 9.10.
(ii) Show that if (X_t) is stationary and L²-differentiable, then (X'_t) has
the autocovariance (−γ''(t)).
9.10. (Karhunen-Loève expansion). Let (X_t, a ≤ t ≤ b) be a second order
L²-continuous stochastic process. By Mercer's theorem (see e.g., U.
Grenander, Abstract Inference, Wiley, 1981, pp. 62-64), we have
X_t = Σ_{n=0}^{∞} ξ_n φ_n(t), a ≤ t ≤ b,
ARMA model
X_t = Σ_{j=0}^∞ a_j ε_{t−j} (10.1)
with a_0 = 1, Σ_j |a_j| < ∞, and where (ε_t, t ∈ ℤ) is a white noise.
Remark. Note that the series in (10.1) converges in mean square and with
probability 1 (Exercise 10.1). Note also that if one replaces the condition
Σ_j |a_j| < ∞ by Σ_j a_j² < ∞, then the series remains convergent in mean
square; however, the process may be affected by long memory dependence.
Properties. A linear process is zero mean and weakly stationary with
autocovariance
γ_t = σ² Σ_{j=0}^∞ a_j a_{j+|t|}, t ∈ ℤ. (10.2)
B(X_t) = X_{t−1},
B^j X_t = X_{t−j}, j = 0, 1, 2, ⋯,
where π'_0 = 1, Σ_j |π'_j| < ∞. Now, by setting π_j = −π'_j, j ≥ 1, we obtain
ε_t = (Σ_{j=0}^∞ π_j B^j) X_t
X_t = Σ_{j=0}^∞ a_j ε_{t−j} (10.8)
and
X_t = Σ_{j=1}^p π_j X_{t−j} + ε_t (10.9)
with π_p ≠ 0, and where (ε_t, t ∈ ℤ) is a white noise such that ε_t ⊥ X_s,
s, t ∈ ℤ, s < t.
where 0 < |ρ| < 1. Thus ε_t ⊥ X_s, s < t, and by considering X_t − ρX_{t−1},
we obtain the equation
X_t = ρX_{t−1} + ε_t, (10.13)
which proves that (X_t) is actually an AR(1).
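Equation (10.13) is easy to simulate, and the empirical autocorrelation then approximates ρ^k; a short sketch (added illustration, arbitrary ρ):

```python
import random

def ar1(rho, n, seed=0):
    """Simulate n steps of X_t = rho X_{t-1} + eps_t with standard Gaussian noise."""
    rng = random.Random(seed)
    x, xs = 0.0, []
    for _ in range(n):
        x = rho * x + rng.gauss(0.0, 1.0)
        xs.append(x)
    return xs

xs = ar1(0.5, 100_000)
g0 = sum(v * v for v in xs) / len(xs)                              # lag 0
g2 = sum(xs[i] * xs[i + 2] for i in range(len(xs) - 2)) / (len(xs) - 2)  # lag 2
ratio = g2 / g0
print(ratio)  # close to rho**2 = 0.25
```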
X̂_t = Σ_{j=1}^∞ π_j X_{t−j} = Σ_{j=1}^∞ π'_j X_{t−j}.
If π_1 ≠ π'_1, then X_{t−1} ∈ H_{t−2} and by stationarity X_t ∈ H_{t−1}, t ∈ ℤ.
Now, since ε_t = X_t − Σ_{j=1}^p π_j X_{t−j}, we have ε_t ∈ H_{t−1}, which contradicts
ε_t ⊥ H_{t−1}. Then, step by step, we can prove that π'_j = π_j and finally that
p' = p and ε'_t = ε_t.
We now turn to the existence. 1/P(z) has no pole in a closed disc of
center 0 and radius 1 + h, where h is strictly positive and small enough.
Consequently, 1/P(z) has the power series representation
1/P(z) = Σ_{j=0}^∞ b_j z^j, |z| < 1 + h.
Thus (Xt ) is a solution of (10.11) and the proof is therefore completed. <>
In the following we will suppose that P(z) ≠ 0 if |z| ≤ 1. Note that this
implies the representation
Proof. If k ≥ 1, we have
E(ε_t X_t) = E[(X_t − X̂_t) X_t] = E(X_t − X̂_t)² = σ²,
and
σ² = γ_0 − Σ_{j=1}^p π_j γ_j.
P(B)ρ_k = 0. (10.20)
Proof. For every k > 1, let Y^{(k)} be the projection of the square integrable
random variable Y on the linear space generated by X_{t−1}, ⋯, X_{t−k+1}.
X_t − X̂_t^{(p)} = π_p (X_{t−p} − X̂_{t−p}^{(p)}) + ε_t, (10.25)
and by stationarity
E(X_{t−p} − X̂_{t−p}^{(p)})² = E(X_t − X̂_t^{(p)})². (10.27)
Therefore
ε_t = X_t − Σ_{j=1}^∞ (−1)^{j+1} a_1^j X_{t−j} (10.30)
ε_t = X_t − Σ_{j=1}^∞ π_j X_{t−j} (10.32)
or equivalently
ε_t = (Q(B))^{−1} X_t. (10.33)
We now state without proof some properties of M A processes.
The partial autocorrelation is given by
r_k = (−1)^{k+1} a_1^k (1 − a_1²) / (1 − a_1^{2(k+1)}). (10.34)
Using (10.3) we obtain the spectral density
f(λ) = (σ²/(2π)) |Q(e^{iλ})|², λ ∈ [−π, π]. (10.35)
Consider now the process defined by
X_t = X_{t−1} + ε_t, t ∈ ℤ,
where as usual (ε_t) denotes a white noise. Clearly (X_t) is not stationary;
however, X_t − X_{t−1} = ε_t is stationary and can be considered as an
ARMA(0, 0) process. We will say that (X_t) is an autoregressive integrated
moving average (ARIMA) process.
More generally, an ARIMA(p, q, d) process satisfies the equation
Φ(B)(I − B)^d X_t = Θ(B)ε_t, (10.43)
where Φ and Θ have the same properties as in Definition 10.4 and d is an
integer which characterizes the trend.
Note that Φ(B)(I − B)^d is not invertible. Consequently (X_t) cannot
be defined as a linear process. In fact, in order to define (X_t) precisely, it
is necessary to consider p + d nonrandom initial values X_{t_0−1}, ⋯, X_{t_0−p−d},
which allow one to compute X_{t_0} by using (10.43). When the influence of
these initial values has been eliminated, (X_t) reaches its "cruising speed"
and then (I − B)^d X_t behaves like an ARMA(p, q).
SARIMA processes. If (X t ) has a trend together with a period S, one
may consider the model
with
Φ_2(B^S)(I − B^S)^D X_t = Θ_2(B^S)ε_t, (10.44)
where Φ_1, Θ_1, Φ_2, Θ_2 are polynomials of respective degrees p, q, P, Q.
We obtain the so-called SARIMA(p, q, d; P, Q, D)_S process.
Example 10.3 The model SARIMA(0, 1, 1; 0, 1, 1)_{12} is very useful in
econometrics. It may be defined by the equation
ARMAX processes. All the above models are "closed": they explain the
present of (X_t) only by its own past. It would be more realistic to take into
account some exogenous variables. For example, electricity consumption is
linked with temperature.
The ARMAX model is an "open" process defined by
10.6 Exercises
10.1. (i) Show that the series in (10.1) converges in mean square. (Hint:
Use Cauchy criterion.)
(ii) Show that this series converges with probability 1. (Hint: Show that
E(Σ_j |a_j||ε_{t−j}|) < ∞.)
10.2. Show that a linear process is
(i) zero mean,
(ii) weakly stationary with autocovariance given by (10.2),
(iii) such that ε_s ⊥ X_t if s > t.
10.3. Let (X_t) be a linear process associated with an i.i.d. white noise (ε_t).
Show that (X_t) is strictly stationary.
10.4. Let (X_t) be a process satisfying (10.13).
(i) Verify that
X_t = ε_t + a_1 ε_{t−1}, t ∈ ℤ,
X̂_{n+1} = Σ_{j=1}^n ψ_j X_{n+1−j},
where ψ_1, ..., ψ_n satisfy the difference equations
X_t = −Σ_{k=1}^∞ ρ^k η_{t+k}, t ∈ ℤ,
Discrete-Time
Martingales
11.1 Generalities
Let (Ω, A, P) be a probability space and let (B_n, n ≥ 1) be an increasing
sequence of sub-σ-fields of A, that is, each B_n is a σ-field contained in A,
and for each n, B_n ⊆ B_{n+1}.
We shall say that a sequence (X_n, n ≥ 1) of real random variables is
(B_n)-adapted (or simply B_n-adapted) if, for every n, X_n is B_n-measurable.
For example, if B_n = σ(X_1, ..., X_n), n ≥ 1, then, clearly, (B_n, n ≥ 1) is
an increasing sequence of sub-σ-fields of A, and (X_n) is B_n-adapted. (See
Lesson 1 for notation and details.)
Proof.
(i) Since Y_n is B_n-measurable, (11.4) means that Y_n = E^{B_n}(Y_{n+1}), by
definition of conditional expectation (see Lesson 1).
(ii) (11.1) implies
It follows that
and using the same method as in the proof of Lemma 11.1(iii), we obtain
E^{B_j}(X_j) = 0, hence (11.7). ◊
(11.6) is valid for a submartingale (or supermartingale) except that
E^{B_{k−1}}(X_k) = 0 is replaced by E^{B_{k−1}}(X_k) ≥ 0 (or E^{B_{k−1}}(X_k) ≤ 0). See
Exercise 11.2.
Yn=c+Xl+···+Xn,
(i) C_n = 1 if and only if the gambler decides to play at the nth game.
(ii) C_n = 0 if and only if the gambler decides to miss the nth game.
(iii) C_n is B_{n−1}-measurable: the gambler's decision depends only on
X_1, ..., X_{n−1}.
Then the new sequence of gambler's fortunes is
C_n = 1_{{T ≥ n}}, n ≥ 1,
and is executed in the following way: the gambler plays every game up to
the Tth and then stops definitively.
Thus the successive fortunes are
Z_n = c + Σ_{i=1}^{min(n,T)} X_i, n ≥ 1, (11.10)
and, of course, (Yn ) and (Zn) have the same nature.
c) Random walk and the ruin problem.
Let (X_n, n ≥ 1) be a sequence of i.i.d. random variables with a common
distribution given by
P(Xn = 1) = p, P(Xn = -1) = 1- p, n ~ 1,
where 0 < p < 1. Then
Sn = Xl + ... + Xn, n~1
then
so that
E^{B_n}(Y_{n+1}) = Y_n, n ≥ 1,
and (Y_n) is a martingale for every p.
Now if S_n is the fortune of a gambler who wins or loses one dollar with
probabilities p and 1 − p at each game, the ruin problem is defined by the
stopping time
where {S_n = −b} corresponds to the ruin of the gambler and {S_n = a} to
the ruin of his opponent.
Similarly as in b), it can be proved that the perturbation of (Yn ) by T
does not affect its nature (see Exercise 11.3).
d) Polya's urn model.
An urn contains a red and b black balls. A ball is drawn at random.
It is replaced and c balls of the color drawn are added. A new random
drawing is then made and this procedure is repeated. This model can be
used to describe contagious diseases.
Let Y_n be the proportion of black balls after the nth drawing; then
Y_0 = b/(a + b) and we claim that (Y_n) is a martingale adapted to B_n =
σ(Y_0, Y_1, ..., Y_n), n ≥ 0.
To prove this, we set Y_n = α/β, where α is the number of black balls and
β the total number of balls after the nth drawing. Then, whatever (α, β) is,
E^{B_n}(Y_{n+1}) = Y_n.
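The martingale property E(Y_n) = Y_0 = b/(a + b) can be observed numerically; the sketch below (an added illustration with arbitrary a, b, c) averages the proportion of black balls over many simulated urns:

```python
import random

def polya_path(a, b, c, n_draws, rng):
    """Return the proportions of black balls after each of n_draws Polya draws."""
    black, total = b, a + b
    props = []
    for _ in range(n_draws):
        if rng.random() < black / total:   # a black ball is drawn
            black += c
        total += c                          # c balls of the drawn color are added
        props.append(black / total)
    return props

rng = random.Random(42)
# E(Y_n) stays at Y_0 = b/(a+b): average the final proportion over many paths.
final = [polya_path(a=2, b=3, c=1, n_draws=50, rng=rng)[-1] for _ in range(20_000)]
m = sum(final) / len(final)
print(m)  # about b/(a+b) = 0.6
```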
e) Likelihood ratio.
Let (X_n, n ≥ 1) be a sequence of i.i.d. random variables with common
density f or g. X_1, ..., X_n are supposed to be observed and we wish to
determine the true density. For that purpose we construct the likelihood
ratio (see Lesson 13) by setting
Y_n = Π_{i=1}^n g(X_i) / Π_{i=1}^n f(X_i), n ≥ 1. (11.13)
Lemma 11.3 Let (Y_n) be a martingale and let u be a convex function such
that u(Y_n) is integrable for every n. Then (u(Y_n)) is a submartingale.
Proof. Using Jensen's inequality (see Lesson 1), we obtain
as desired. ◊
Proof. Let
A_j = {U_1 ≤ t, ..., U_{j−1} ≤ t, U_j > t}
and
A = ∪_{j=1}^n A_j = {max_{k≤n} U_k > t}.
Then
U_n ≥ U_n 1_A = Σ_{j=1}^n U_n 1_{A_j},
therefore
E(U_n) ≥ Σ_{j=1}^n E(U_n 1_{A_j}). (11.15)
Proof. Since the function x ↦ |x| is convex, Lemma 11.3 shows that
(|Y_n|) is a submartingale. Thus, applying (11.14) to |Y_n|, we obtain (11.18).
◊
We are now in a position to state the martingale convergence theorem.
Theorem 11.1 (Convergence of martingales) Let (Y_n) be a martingale
such that
sup_{n≥1} E(Y_n²) < ∞. (11.19)
Proof.
(i) We first prove L2-convergence by using Cauchy criterion.
Note that if n ≥ m,
E(Y_n Y_m) = E[E^{B_m}(Y_n Y_m)] = E[Y_m E^{B_m}(Y_n)] = E(Y_m²),
hence
E(Y_n − Y_m)² = E(Y_n²) − E(Y_m²), n ≥ m. (11.21)
Now Lemma 11.3 and (11.19) show that (Y_n²) is a submartingale. Thus
E(Y_n²) is increasing (see Lemma 11.1(ii)) and, since it is bounded, E(Y_n²) ↑ ℓ
for some finite ℓ.
Using (11.21), we infer that
lim_{n,m→∞} E(Y_n − Y_m)² = ℓ − ℓ = 0,
which proves L²-convergence.
(ii) We now turn to almost sure convergence. Note first that
(Y_k − Y_m, k = m + 1, m + 2, ...)
is a martingale, since
E^{B_k}(Y_{k+1} − Y_m) = Y_k − Y_m, k > m.
Thus ((Y_k − Y_m)², k > m) is a positive submartingale (see Lemma 11.3)
and, applying Kolmogorov's inequality, we find, for n ≥ m,
P(max_{m<k≤n} |Y_k − Y_m| ≥ ε) ≤ E(Y_n − Y_m)²/ε²,
therefore
P(∩_{ν≥1} ∪_{m≥1} ∩_{k>m} {|Y_k − Y_m| ≤ 1/ν}) = 1,
and finally
lim_{k,k'→∞} |Y_k − Y_{k'}| = 0 a.s. ◊
Y_n →^{a.s.} Y. (11.25)
Proof is omitted.
We now present two important probabilistic applications of Theorem
11.1.
(X_1 + ⋯ + X_n)/a_n →^{a.s.} 0. (11.26)
Proof. We need the following classical Kronecker's lemma whose proof is
left as an exercise.
Lemma 11.5 Let (x_n) be a real sequence such that the series Σ_n x_n/a_n
converges, where 0 < a_n ↑ ∞. Then
lim_{n→∞} (x_1 + ⋯ + x_n)/a_n = 0.
Y_n = Σ_{k=1}^n X_k/a_k, n ≥ 1.
Lemma 11.2 shows that (Y_n) is a square integrable martingale and that
(11.7) is satisfied, hence
Y_n →^{a.s.} Y,
since
Σ_n E(X_n²) = E(Σ_n X_n²) < ∞,
and the result follows from Theorem 11.4. ◊
Finally we state without proof a central limit theorem.
X_n = Y_n − Y_{n−1}, n ≥ 2,
where
11.4 Exercises
11.1. Let (Y_n) be a B_n-adapted integrable random sequence.
(i) Show that (Y_n) is a supermartingale if and only if
Hint: choose
B = {Y_n ≤ E^{B_n}(Y_{n+1})}.
(ii) Show that (Y_n) is a supermartingale if and only if
T = inf{n : Sn = a or Sn = -b}
is again a martingale.
for some square integrable random variable Z and therefore (Yn ) converges
almost surely.
11.6. Let (Y_n) be the martingale defined in Polya's urn model.
(i) Show that Y_n → Y a.s. and in L².
(ii)* Show that Y ~ β(μ, ν) with density
f(y) = y^{μ−1} (1 − y)^{ν−1} / B(μ, ν), 0 < y < 1.
(i) Show that (Y_n), where Y_n = X_1 + ⋯ + X_n and the X_n are independent
with
P(X_n = −n) = 1/n², P(X_n = n/(n² − 1)) = 1 − 1/n², n ≥ 2,
is a martingale.
(ii) Show that Y_n →^{a.s.} ∞.
11.9. Let (X_n) be a finite-state Markov chain with transition matrix (p_ij).
Suppose that for all i,
Σ_j p_ij x_j = λ x_i.
Show that
Y_n = λ^{−n} x_{X_n}, n ≥ 1,
is a martingale.
11.10. Let (Xn) be a sequence of independent positive random variables
such that E(Xn) = 1.
X_n = Y_n + A_n, n ≥ 1, a.s.
f(x) = (1/√(2π)) e^{−x²/2}, x ∈ ℝ. (12.1)
f_{μ,σ}(x) = (1/(σ√(2π))) exp(−(x − μ)²/(2σ²)), x ∈ ℝ, (12.3)
Proof. There exists a linear mapping from ℝ^p to ℝ^q, say A, and B in ℝ^q,
such that T(X) = AX + B. Hence for every a ∈ ℝ^q,
Lemma 12.3 Let X = (X_1, ..., X_n) be an n-dimensional Gaussian vector.
Then X_1, ..., X_n are independent if and only if the covariance matrix is
diagonal:
E(X_i X_j) = 0, i ≠ j,
Theorem 12.1 (i) A Wiener process has stationary increments and its
covariance is
C(s, t) = σ² min(s, t), s, t ≥ 0. (12.7)
(ii) Conversely, every zero mean Gaussian process with covariance σ² min(s, t)
is a Wiener process.
hence
φ(u) = exp(−σ²(t − s)u²/2), u ∈ ℝ,
which is the characteristic function of N(0, σ²(t − s)). This proves that the
distribution of W_t − W_s depends only on t − s.
Now, in order to establish (12.7), it suffices to write
E(W_s W_t) = E(W_s²) = σ²s, s ≤ t.
E((W_{t+h} − W_t)/h)² = σ²/h → ∞ as h → 0.
Theorem 12.2 The total variation of (W_t) over each real interval is
almost surely infinite.
Recall that the total variation of a real function f over [a, b] is defined by
V(f) = sup Σ_{i=1}^k |f(t_i) − f(t_{i−1})|,
the supremum being taken over all partitions a = t_0 < t_1 < ⋯ < t_k = b.
Over [0, 1] we have
E(|W_{i/2^n} − W_{(i−1)/2^n}|) = α 2^{−n/2}
and
V(|W_{i/2^n} − W_{(i−1)/2^n}|) = β 2^{−n}.
Setting Y_n = Σ_{i=1}^{2^n} |W_{i/2^n} − W_{(i−1)/2^n}|, this entails E(Y_n) = α 2^{n/2} and
V(Y_n) = β.
Using Tchebychev's inequality (Lesson 1) we get
P(lim inf Y_n ≥ α 2^{n/2} − n) = 1,
hence
Y_n → ∞ a.s.
and consequently V((W_t)) = ∞ a.s. ◊
The following Karhunen-Loeve expansion may be used as an alternative
definition of the Wiener process.
We now introduce the Ito integral, an essential tool in the theory of diffusion
processes. This integral has the form ∫_a^b f(t) dW(t), but its definition turns
out to be intricate, since the differential dW(t) does not exist in the usual
sense, as we have seen in the previous section.
Let us consider a standard Wiener process (W_t) defined on the probability
space (Ω, A, P) and let F_t = σ(W_s, a ≤ s ≤ t), a ≤ t ≤ b, be the
family of increasing σ-fields associated with W = (W_t, a ≤ t ≤ b). F_t is
the set of those events which depend only on the behaviour of (W_s) in the
time interval [a, t].
We now define the class C of integrands as the class of random functions
f such that
(a) f ∈ L²([a, b] × Ω, B_{[a,b]} ⊗ A, L ⊗ P),
where L denotes the Lebesgue measure over [a, b].
(b) For every t ∈ [a, b], the random variable f(t, ·) is F_t-measurable.
where a = t_0 < t_1 < ⋯ < t_n = b and where, by convention, [t_{n−1}, t_n) =
[t_{n−1}, b].
Note that since f is nonanticipating, we have f_i(ω) = f(t_i, ω), i =
0, 1, ..., n − 1; consequently, f_i is F_{t_i}-measurable and E(f_i²) < ∞.
Now we define the Ito integral of f by setting
∫_a^b f(t) dW(t) = Σ_{i=0}^{n−1} f_i (W_{t_{i+1}} − W_{t_i}). (12.11)
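Definition (12.11) can be tried on the nonanticipating integrand f = W itself, for which the left-endpoint sums converge to the Ito integral ∫_0^1 W dW = (W_1² − 1)/2; a stdlib sketch (added illustration):

```python
import math
import random

def ito_sum_W(n_steps, rng):
    """Left-endpoint sum sum_i W_{t_i} (W_{t_{i+1}} - W_{t_i}) over [0, 1]."""
    dt = 1.0 / n_steps
    w, s = 0.0, 0.0
    for _ in range(n_steps):
        dw = rng.gauss(0.0, math.sqrt(dt))
        s += w * dw          # integrand evaluated at the LEFT endpoint
        w += dw
    return s, w              # (approximate Ito integral, W_1)

rng = random.Random(7)
errs = []
for _ in range(2_000):
    s, w1 = ito_sum_W(500, rng)
    errs.append(s - (w1 * w1 - 1.0) / 2.0)   # compare with (W_1^2 - 1)/2
mean_err = sum(errs) / len(errs)
print(abs(mean_err) < 0.05)  # True: the discrepancy vanishes as n_steps grows
```

Evaluating at the left endpoint is what makes the integrand nonanticipating; a midpoint rule would instead approximate the Stratonovich integral W_1²/2.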
It is easy to prove that (12.11) does not depend on the partition (ti).
The following lemma is crucial.
E(∫_a^b f dW) = 0, f ∈ C, (12.12)
E(∫_a^b f dW ∫_a^b g dW) = ∫_a^b E(f(t) g(t)) dt, f, g ∈ C. (12.13)
Note that (12.13) means that I is an isometry from C into L²(Ω, A, P).
Proof. First, since f_i and W_{t_{i+1}} − W_{t_i} are square integrable independent
random variables, f_i(W_{t_{i+1}} − W_{t_i}) is square integrable and consequently
∫_a^b f dW is in L²(Ω, A, P).
Now ∫ f dW and ∫ g dW may be defined by using the same partition
(t_i); thus the Ito integral is clearly linear.
Second, noting that F_t = σ(W_a, W_s − W_a, a ≤ s ≤ t) and that (W_t) has
independent increments, we get
E(∫_a^b f dW) = Σ_{i=0}^{n−1} E[E(f_i (W_{t_{i+1}} − W_{t_i}) | F_{t_i})] = 0.
For (12.13), write
f = Σ_{i=0}^{n−1} f_i 1_{[t_i, t_{i+1})} and g = Σ_{i=0}^{n−1} g_i 1_{[t_i, t_{i+1})};
then
E(∫_a^b f dW ∫_a^b g dW) = Σ_{i=0}^{n−1} E(f_i g_i)(t_{i+1} − t_i) = ∫_a^b E[f(t) g(t)] dt. ◊
The following technical lemma allows us to extend the integral to C.
and
Proof. Immediate from Lemmas 12.4 and 12.5 and the bicontinuity of the
scalar product. ◊
We now indicate some properties of the stochastic integral:
(i) If f is non-random (and differentiable), then
∫_a^b f dW = f(b)W(b) − f(a)W(a) − ∫_a^b f'(t) W(t) dt. (12.18)
(ii) Define
then (X_t) is nonanticipating with respect to (W_t) and has continuous sample
functions (a.s.).
(iii) (X_t) is a (continuous-time) martingale with respect to (F_t), that is,
where μ(X_t, t) is called the drift term and σ(X_t, t) the diffusion coefficient.
Interpretation. X_t is the position of the particle at time t, and μ(x, t) is
the velocity of a small volume v of fluid located at x at time t. A particle
within v will carry out Brownian motion with parameter σ(x, t). (12.21)
gives the change in position of the particle in the time interval [t, t + dt].
The appropriate mathematical form of (12.21) is
X_t − X_a = ∫_a^t μ(X_s, s) ds + ∫_a^t σ(X_s, s) dW(s), a ≤ t ≤ b, (12.22)
where the first integral is in the L² sense and the second is in the Ito sense.
We now state assumptions which allow us to claim that (12.22) has a solution.
A1. There exists k > 0 such that
and
E(|X_{n+1}(t) − X_{n+1}(s)|²) = E[∫_s^t μ(X_n(u), u) du + ∫_s^t σ(X_n(u), u) dW(u)]²
≤ 2E(∫_s^t μ(X_n(u), u) du)² + 2E(∫_s^t σ(X_n(u), u) dW(u))².  (12.25)
(b) We now show that there exists (X_t, a ≤ t ≤ b) such that
X_n(t) → X_t in L²,  a ≤ t ≤ b.  (12.28)
+ ∫_a^t [σ(X_n(s), s) − σ(X_{n−1}(s), s)] dW(s).
Using the same method as above (see (12.25) and (12.26)), we obtain, from A1 and Schwarz's inequality,
E(|Δ_n(t)|²) ≤ K ∫_a^t E(|Δ_{n−1}(s)|²) ds.  (12.29)
From (12.28) and (12.29), we infer that
we have
By (12.28),
ρ_n = X_t − X_{n+1}(t) → 0 in L².
Using A1 we get
Q_n = ∫_a^t [μ(X_s, s) − μ(X_n(s), s)] ds → 0 in L²,
and
R_n = ∫_a^t [σ(X_s, s) − σ(X_n(s), s)] dW(s) → 0 in L²,
X_t − Y_t = ∫_a^t [μ(X_s, s) − μ(Y_s, s)] ds + ∫_a^t [σ(X_s, s) − σ(Y_s, s)] dW(s).
246 Lesson 12
hence
F′(t) − λF(t) ≤ 0,
then
(d/dt)(e^{−λt} F(t)) ≤ 0,
and since F(a) = 0 and F ≥ 0, F vanishes on [a, b]. Therefore
P(X_t = Y_t) = 1,  a ≤ t ≤ b,
or
P( ∩_{t ∈ Q ∩ [a,b]} {X_t = Y_t} ) = 1,
but since the sample functions of (X_t) and (Y_t) are continuous (a.s.), we obtain (12.23) and the proof of Theorem 12.5 is now complete.  ◊
Ito's differentiation formula. The following change-of-variable formula is very useful. Let φ : ℝ × [a, b] → ℝ be such that the partial derivatives ∂φ/∂x, ∂φ/∂t, ∂²φ/∂x² exist and are continuous for all (x, t) in ℝ × [a, b], and let (X_t, a ≤ t ≤ b) be a process with stochastic differential
where
h(t) = f(t) ∂φ/∂x (X_t, t) + ∂φ/∂t (X_t, t) + (1/2) g²(t) ∂²φ/∂x² (X_t, t)  (12.35)
and
g₁(t) = g(t) ∂φ/∂x (X_t, t).  (12.36)
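A quick sanity check of (12.35): with φ(x, t) = x² and dX_t = dW_t (so f ≡ 0, g ≡ 1), the formula gives h(t) = 1 and g₁(t) = 2X_t; hence W_t² − t, the Ito-corrected process, must have mean zero. A Monte Carlo sketch (sample sizes are arbitrary):

```python
import random, math

random.seed(3)
T, n_paths = 2.0, 50000
mean = 0.0
for _ in range(n_paths):
    WT = random.gauss(0.0, math.sqrt(T))  # W_T ~ N(0, T)
    mean += WT ** 2 - T                   # Ito correction: E[W_T^2 - T] = 0
mean /= n_paths
print(mean)   # near 0
```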
where μ ∈ ℝ and σ > 0 are constants and where (W_t, t ≥ 0) is a standard Wiener process. Then we have
hence
Y_t = ∫_0^t [μ(s) − σ²(s)/2] ds + ∫_0^t σ(s) dW(s).
Setting Y_t = e^{θt} X_t, we obtain
dY_t = σ e^{θt} dW_t,
hence
e^{θt} X_t − X_0 = σ ∫_0^t e^{θs} dW(s),  t ≥ 0.  (12.45)
W_t = W¹_t for t ≥ 0  and  W_t = W²_{−t} for t ≤ 0,  (12.46)
∫_{−∞}^t g(t − s) dW(s) = l.i.m._{a→−∞} ∫_a^t g(t − s) dW(s),  (12.51)
12.6 Exercises
12.1. Characteristic function of a Gaussian vector.
(i) Show that the c.f. of Y, which is distributed as N(μ, σ²), is
(ii) Prove (12.4). Hint: consider the case μ = 0 and use the equality φ_Y(x) = φ_{(t,x)}(1), t ∈ ℝ.
(1/(2π)^{n/2}) exp( −(1/2) Σ_{j=1}^n x_j² ),  (x₁, …, x_n) ∈ ℝⁿ.
X_0(ω) = ω,  ω ∈ ℝⁿ.
λ_n φ″_n(t) = −φ_n(t).
f_n(t, ω) = ∫_0^∞ e^{−u} f(t − u/n, ω) du
and show that f_n ∈ C and (f_n) → f in L²([a, b] × Ω).
(iii) Show that C̄ = C. Hint: Consider
g_n = f 1_{|f|<n},  f ∈ C,  n ≥ 1.
12.6. Prove (12.18). Hint: Define f_n = Σ_{i=0}^{n−1} f(t_i) 1_{[t_i, t_{i+1})} and show that
∫_a^b f_n dW = f(t_{n−1}) W_{t_n} − f(t_0) W_{t_0} − Σ_{i=1}^{n−1} [f(t_i) − f(t_{i−1})] W_{t_i}.
(iii) Compute the prediction error E(X_{t+h} − X̂_{t+h})² and find its limit as h → ∞.
Xn(t) = e1 +~..
... + en
vn6.z, t ~ 0,
which corresponds to the observation of X = (X₁, …, X_n), where X₁, …, X_n are i.i.d. Gaussian random variables with distribution N(μ, σ²), and ℝ*₊ = (0, ∞).
256 Lesson 13
f(x₁, …, x_n; (μ, σ²)) = (1/(2πσ²))^{n/2} exp[ −(1/(2σ²)) Σ_{i=1}^n (x_i − μ)² ],  (13.1)
Sufficiency.
Let (F, C) be a measurable space. A statistic S with values in (F, C) is, by definition, a B-C measurable mapping of E into F.
A statistic S is said to be sufficient if there exists a variant of the conditional probability P_θ(B | S), B ∈ B, θ ∈ Θ, which does not depend on θ.
This property means that S(X) contains all the available information concerning θ.
The following theorem provides a useful criterion for sufficiency. The proof is omitted.
13.2 Estimation
Let (E, B, P_θ, θ ∈ Θ) be a statistical model and let g be a measurable mapping from (Θ, D) into (Θ′, D′), where D and D′ are σ-fields over Θ and Θ′ respectively.
In order to evaluate g(θ) from the observation X, one uses an estimator, that is, a statistic with values in (Θ′, D′).
In the Gaussian model, s² = Σ_{i=1}^n (x_i − x̄)²/n is an estimator of g(μ, σ²) = σ². It is important to note that an estimator only depends on X. For example, s*² = Σ_{i=1}^n (x_i − μ)²/n is not an estimator of σ² because it cannot be computed from the observations.
Statistics for Poisson Processes 257
where the symbol E_θ means that the expectation is taken with respect to P_θ.
The quadratic error generates a partial ordering on the set of all estimators of g(θ) as follows. Consider S and T in this set; then S is said to be preferable to T (S ≺ T) if and only if
hence
E_θ( E*(T′|S) − E*(T|S) ) = 0,  θ ∈ Θ.
By the completeness of S it follows that
θ̂_n → θ
and
√(n I(θ)) (θ̂_n − θ) →^D N ~ N(0, 1),  (13.10)
13.3 Tests
Given a statistical model (E, B, P_θ, θ ∈ Θ), we wish to test the hypothesis H₀ : θ ∈ Θ₀ against H₁ : θ ∈ Θ₁ = Θ − Θ₀.
Ho is called the null hypothesis and H1 the alternative hypothesis. These
expressions are justified by a dissymmetry in the problem which is visible
in the following example.
Example 13.2 n trials with a coin are performed. The problem is to test
that the coin is fair. An associated statistical model is
A test φ is a measurable mapping of (E, B) into ({0, 1}, P({0, 1})). H₀ is accepted if φ = 0 and rejected if φ = 1. Note that this does not mean that H₁ is accepted! For that, one must construct a new test problem where H₁ (or some other hypothesis) is the null hypothesis. The above φ is completely specified by its critical region W = {x : φ(x) = 1}.
The probabilities of error P_θ(W), θ ∈ Θ₀, and P_θ(E − W), θ ∈ Θ₁, measure the quality of φ. Taking the dissymmetry into account, one defines the level of significance
α_φ = sup_{θ ∈ Θ₀} P_θ(W)  (13.13)
and the power function
β_φ(θ) = P_θ(W),  θ ∈ Θ₁.  (13.14)
A "good"test has a small level of significance and a large power.
Let Q E [0,1] be a given number and let Ta, be the family of tests
satisfying Q.p ::::; Q, ¢ ETa. A test ¢o is said to be optimal within Ta or
uniformly most powerful (UMP) in Ta if
(3.po«(J) ~ (3.p«(}), () Eel, ¢ ETa. (13.15)
The following classical result gives the optimal test in the simplest case.
Proof is omitted.
Theorem 13.4 (Neyman-Pearson lemma). Let (E, B, f_θ, θ ∈ {θ₀, θ₁}) be a dominated statistical model. Then the test φ₀ defined by the critical region
W = {x : f_{θ₁}(x) ≥ c f_{θ₀}(x)},  (13.16)
where c is a constant, is optimal in T_{α_{φ₀}} for testing θ = θ₀ against θ = θ₁.
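For two simple Poisson hypotheses the critical region (13.16) can be written out explicitly; since the likelihood ratio is monotone in the observed count, (13.16) reduces to an upper-tail region. A sketch with illustrative values θ₀ = 2, θ₁ = 5, c = 1 (not taken from the text):

```python
import math

def np_reject(x, theta0, theta1, c):
    """Critical region (13.16) for Poisson(theta0) vs Poisson(theta1)."""
    f0 = math.exp(-theta0) * theta0 ** x / math.factorial(x)
    f1 = math.exp(-theta1) * theta1 ** x / math.factorial(x)
    return f1 >= c * f0

# the ratio f1/f0 = e^{theta0-theta1} (theta1/theta0)^x grows with x,
# so the rejection region is {x >= k} for some integer k
rejected = [x for x in range(15) if np_reject(x, 2.0, 5.0, 1.0)]
print(rejected)
```

Here the region is {4, 5, …}; in practice the constant c is tuned so that P_{θ₀}(W) equals the desired level.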
λ̂_T = N_T / T.  (13.20)
or equivalently
G(λ) = Σ_{n=0}^∞ g(n) (λT)ⁿ / n! = 0,  λ ≥ 0.  (13.23)
The power series G(λ) vanishes over [0, ∞), hence
E_λ(λ*_n) = n ∫_0^∞ (λ/u) e^{−λu} (λu)^{n−1}/(n−1)! du
and
E_λ(λ*_n − λ)² = (n + 2)λ² / ((n − 1)(n − 2)),  n ≥ 3,  (13.27)
and
λ*_n → λ, in mean square and a.s.  (13.28)
(see Exercise 13.9).
lim_{T→∞} P_λ( |√T (λ̂_T − λ)| / √λ ≤ z_{1−α} ) = 1 − α,  (13.30)
I_T(α) = [ λ̂_T − √(λ̂_T/T) z_{1−α},  λ̂_T + √(λ̂_T/T) z_{1−α} ].  (13.32)
Now if the process is observed over [0, Tn], then the construction of a
confidence interval is based on the fact that 2ATn is distributed as x2(2n).
Consider q₁ and q₂ such that
P_λ( λ ∈ [ q₁/(2T_n), q₂/(2T_n) ] ) = 1 − α.  (13.34)
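The estimator (13.20) and the asymptotic interval (13.32) are easy to reproduce by simulation. The sketch below draws a Poisson process from its exponential interarrival times; the quantile 1.96 is hard-coded in place of z_{1−α} (an illustrative choice giving roughly 95% two-sided coverage), and the intensity and horizon are arbitrary:

```python
import random, math

random.seed(4)
lam_true, T = 3.0, 400.0

t, N_T = 0.0, 0
while True:
    t += random.expovariate(lam_true)  # i.i.d. Exp(lambda) interarrival times
    if t > T:
        break
    N_T += 1

lam_hat = N_T / T                      # MLE (13.20)
half = 1.96 * math.sqrt(lam_hat / T)   # half-width of (13.32)
ci = (lam_hat - half, lam_hat + half)
print(lam_hat, ci)
```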
Tests.
Consider the problem of testing 0 < A ~ AO against A > AO.
Note that if NT is observed, the model is dominated by the counting
measure over N with density
Consequently the family (f(., A), A> 0) has monotone likelihood ratio.
By Theorem 13.5, it follows that an optimal test has the form
and
D_n = √n sup_{x∈ℝ} |F_n(x) − F(x)|,  W_n = n ∫_ℝ [F_n(x) − F(x)]² dF(x),
Q_n^{(k)} = n Σ_{j=1}^k [ ν_n(((j−1)T/k, jT/k]) − ν(((j−1)T/k, jT/k]) ]² / ν(((j−1)T/k, jT/k]),  (13.40)
where k is fixed.
It can be proved that if n tends to infinity, then
D_n →^D K,  W_n →^D W,  Q_n^{(k)} →^D Q^{(k)},
with critical values determined by
P(K > c_α) = P(W > c′_α) = P(Q^{(k)} > c″_α) = α,  0 < α < 1.
The choice between these tests is difficult. Some practical considerations seem to favor the Kolmogorov and von Mises tests, but the χ²-test is easy to compute.
Remark.
This method for testing the Poisson character of a point process is commonly used in practice. Note however that the "uniformity property", say U, does not characterize Poisson processes! For example, Cox processes (see Section 4.5) verify U. Thus from a theoretical point of view, we only test H₀ : U against H₁ : not U.
Comparing two Poisson processes.
Consider two independent Poisson processes with intensities λ and λ′ and arrival times (T_j) and (T′_j) respectively. We wish to test H₀ : λ = λ′ given data T_{n₁} and T′_{n₂}.
Since 2λT_{n₁} ~ χ²(2n₁) and 2λ′T′_{n₂} ~ χ²(2n₂), the random variable (T_{n₁}/n₁)/(T′_{n₂}/n₂) follows the Fisher distribution F(2n₁, 2n₂), provided λ = λ′.
From this property, we deduce the critical region
{ n₂T_{n₁}/(n₁T′_{n₂}) ≥ f_{α/2} } ∪ { n₂T_{n₁}/(n₁T′_{n₂}) ≤ f_{1−α/2} }  (13.41)
with P(F > f_β) = β, 0 < β < 1, and F ~ F(2n₁, 2n₂).
Note that, since a Poisson process has independent increments, this
test may be utilized for the verification of homogeneity of a Poisson process
observed over two disjoint intervals.
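The ratio in (13.41) can be formed directly from simulated arrival times; under H₀ it is an F(2n₁, 2n₂) variable and hovers around 1. A sketch (the rates and sample sizes are illustrative; in practice the ratio is compared with the F quantiles f_{α/2} and f_{1−α/2}):

```python
import random

random.seed(5)

def nth_arrival(lam, n):
    """T_n: time of the n-th arrival of a Poisson process with intensity lam."""
    return sum(random.expovariate(lam) for _ in range(n))

n1, n2, lam = 200, 200, 2.0
Tn1 = nth_arrival(lam, n1)      # first process
Tn2 = nth_arrival(lam, n2)      # second, independent process

F = (Tn1 / n1) / (Tn2 / n2)     # ~ F(2*n1, 2*n2) under H0: equal intensities
print(F)                        # close to 1 under H0
```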
L(θ) = θ Γ(N_T + 1) ( θ + ∫_0^T k(s) ds )^{−(N_T+1)} Π_{j=1}^{N_T} k(T_j),  (13.43)
where T₁, …, T_{N_T} denote the arrival times. Note that L(θ) is not defined if N_T = 0.
Now N_T is a sufficient statistic for θ and the MLE is
θ̂ = (1/N_T) ∫_0^T k(s) ds.  (13.44)
13.7 Exercises
13.1. Show that (x̄, s²) is sufficient in the Gaussian model (Hint: use the factorization theorem).
13.2. (i) Show that (13.12) is valid under some regularity conditions (Hint: show that Cov(T_n, ∂ log f(X, θ)/∂θ) = 1 and use the Schwarz inequality).
(ii) Find the models for which (13.12) is an equality.
13.3. Construct the statistical model associated with observations X₁, …, X_n of i.i.d. random variables with uniform distribution on [0, θ], θ > 0. Find a sufficient statistic and determine the MLE of θ. Compute its variance and explain why (13.12) is not valid.
13.4. In Exercise 13.2, construct an optimal test for testing θ = 1/2 against θ > 1/2.
13.5. Prove the Neyman-Pearson lemma.
13.6. Prove Theorem 13.5.
13.6. Prove Theorem 13.5.
where Γ(a) = ∫_0^∞ e^{−x} x^{a−1} dx.
Show that E(Q_n²) = n and Var(Q_n²) = 2n.
(ii) Show that
n σ̂_n²/σ² = Σ_{i=1}^n ((X_i − X̄_n)/σ)² ~ χ²(n − 1),
g_n(x) = Γ((n+1)/2) / (Γ(n/2)√(nπ)) (1 + x²/n)^{−(n+1)/2},  x ∈ ℝ.
Show that E(T_n) = 0 and Var(T_n) = n/(n − 2), n > 2. Find the limit of g_n(x) as n tends to infinity.
(iv) Prove that X̄_n and S_n² are independent and find the distribution of √(n−1)(X̄_n − μ)/S_n.
(v) Show that if Y ~ χ²(n₁) and Z ~ χ²(n₂) are independent, then F = (Y/n₁)/(Z/n₂) has the so-called Fisher distribution with (n₁, n₂) degrees of freedom. Prove that the density of F is
Statistics of Discrete-Time
Stationary Processes
14.1 Stationarization
The first step in the statistical analysis of an observed stochastic process is
to extract the possible trend and seasonality and eliminate them in order
to obtain a stationary process.
Detecting trend and seasonality.
Y_t = m_t + X_t,  t ∈ ℤ,  (14.1)
272 Lesson 14
(2) Another model with trend is the ARIMA process (10.43). In that case,
the trend is random and is detected by considering the sample correlation
ρ̂_h = Σ_{t=1}^{n−h} (Y_t − Ȳ_n)(Y_{t+h} − Ȳ_n) / Σ_{t=1}^n (Y_t − Ȳ_n)²,  h = 1, …, n − 1,  (14.3)
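The sample correlation (14.3) is a few lines of code, and it behaves exactly as the trend-detection discussion suggests: for a series with a (random) trend ρ̂_h stays near 1 at small lags, while for white noise it is near 0. A sketch with simulated data (series lengths and seeds are arbitrary):

```python
import random

def sample_acf(y, max_lag):
    """Sample correlations rho_hat_h of (14.3), h = 1..max_lag."""
    n = len(y)
    ybar = sum(y) / n
    denom = sum((v - ybar) ** 2 for v in y)
    return [sum((y[t] - ybar) * (y[t + h] - ybar) for t in range(n - h)) / denom
            for h in range(1, max_lag + 1)]

random.seed(6)
walk = [0.0]                                   # random-walk-type trend
for _ in range(999):
    walk.append(walk[-1] + random.gauss(0.0, 1.0))
noise = [random.gauss(0.0, 1.0) for _ in range(1000)]

print(sample_acf(walk, 3))    # all close to 1
print(sample_acf(noise, 3))   # all close to 0
```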
(1/√(2πn)) Σ_{t=1}^n e^{−iλt} Y_t = (1/√(2πn)) Σ_{j=1}^p a_j e^{i(λ_j−λ)} (e^{in(λ_j−λ)} − 1)/(e^{i(λ_j−λ)} − 1) + (1/√(2πn)) Σ_{t=1}^n X_t e^{−iλt} = A_n + B_n,  (14.5)
and, at λ = λ_j,
(1/√(2πn)) Σ_{t=1}^n e^{−iλ_j t} Y_t = √(n/(2π)) a_j + C_n,  (14.6)
where C_n is bounded in L²-norm.
Now consider the periodogram associated with Y 1 , .•• , Yn :
where k is a known integer such that 0 < kin < 1/2, a is unknown and
(ct) is a white noise with a known variance 0- 2 •
We wish to test H 0 : a = 0 using the statistic
T_n = (4π/σ²) I_n(2kπ/n),  (14.9)
which under H₀ is a function of
( Σ_{t=1}^n ε_t cos(2πkt/n), Σ_{t=1}^n ε_t sin(2πkt/n) ).
and
s_t = c₁ s_{1t} + … + c_τ s_{τt},  (14.12)
where
s_{kt} = 1_{{t ≡ k (mod τ)}},  k = 1, …, τ.  (14.13)
Since Σ_{k=1}^τ s_{kt} = 1, it is necessary to introduce an additional assumption which should ensure identifiability of the model. A natural condition is
Σ_{k=1}^τ c_k = 0.  (14.14)
Y_t = m_t + X_t,  t ∈ ℤ,
γ̂_t = (1/(n − t)) Σ_{s=1}^{n−t} X_s X_{s+t},  0 ≤ t ≤ n − 1,
γ̂_t = 0,  t ≥ n.  (14.19)
Theorem 14.1 Let (X_t) be a zero mean stationary process such that E(X_t⁴) < ∞ and E(X_s X_{s+t} X_{s+s′} X_{s+s′+t}) does not depend on s. If
then we have
γ̂_t → γ_t in L²,  t ≥ 0.
V(I_n(0)) → 2 (γ₀/(2π))²,
which shows that I_n is not consistent!
More generally, if (Xt ) satisfies some regularity conditions, one can show
that
and
Cov(I_n(λ), I_n(λ′)) → 0 if λ ≠ ±λ′.  (14.24)
Statistics of Discrete-Time Stationary Processes 277
f̂_n(λ) = (1/(2π)) Σ_{t=−(n−1)}^{n−1} w(t/k_n) γ̂_t cos λt,  λ ∈ [−π, π].  (14.25)
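A sketch of (14.25) with the triangular (Bartlett) weight w(u) = max(0, 1 − |u|) — an illustrative choice of weight function, not prescribed by the text — applied to white noise, whose spectral density is the constant σ²/(2π) ≈ 0.159 for σ = 1:

```python
import math, random

def gamma_hat(x, t):
    """Sample autocovariance (14.19) of a zero-mean series."""
    n = len(x)
    return sum(x[s] * x[s + t] for s in range(n - t)) / (n - t)

def f_hat(x, lam, k_n):
    """Weighted estimator (14.25) with Bartlett window w(u) = max(0, 1-|u|)."""
    n = len(x)
    total = 0.0
    for t in range(-(n - 1), n):
        w = max(0.0, 1.0 - abs(t) / k_n)
        if w > 0.0:
            total += w * gamma_hat(x, abs(t)) * math.cos(lam * t)
    return total / (2.0 * math.pi)

random.seed(7)
x = [random.gauss(0.0, 1.0) for _ in range(2000)]
est = f_hat(x, 1.0, k_n=20)
print(est)   # near 1/(2*pi) ~ 0.159
```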
where
β = c f²(λ) ∫_{−1}^{1} w²(u) du + (a²c⁴/4) [f″(λ)]².
We only give some indications about the proof. First, since f is twice continuously differentiable, it may be shown that
V(f̂_n(λ)) = O(k_n/n).  (14.28)
integrated square error (MISE) defined by ∫_{−∞}^{∞} E(g_n(x) − g(x))² dx, where
g_n(x) = (1/(n h_n)) Σ_{t=1}^n K((x − X_t)/h_n),  x ∈ ℝ,  (14.32)
and that
∫_{−∞}^{∞} E(g_n(x) − g(x))² dx = O(n^{−4/5})  (14.34)
provided h_n ≈ n^{−1/5}.
This estimator is more accurate than the classical histogram (see Exercise 14.5), which only reaches the rate n^{−2/3}.
Results of the same kind are obtained when estimating the density of
(Xl' ... ' Xk), k ~ 2.
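A sketch of the kernel estimator (14.32) with a Gaussian kernel and a bandwidth of order n^{−1/5} (the kernel, the bandwidth constant and the evaluation point are all illustrative choices):

```python
import math, random

def kde(data, x, h):
    """Kernel density estimator g_n of (14.32), Gaussian kernel K."""
    n = len(data)
    return sum(math.exp(-0.5 * ((x - xt) / h) ** 2) for xt in data) \
        / (n * h * math.sqrt(2.0 * math.pi))

random.seed(8)
data = [random.gauss(0.0, 1.0) for _ in range(5000)]
n = len(data)
s = math.sqrt(sum(v * v for v in data) / n)   # scale estimate
h = s * n ** (-0.2)                           # bandwidth ~ n^{-1/5}
est = kde(data, 0.0, h)
print(est)   # near the N(0,1) density at 0: 1/sqrt(2*pi) ~ 0.399
```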
Another important problem is autoregression estimation. Suppose that
(Xt ) is in addition a Markov process with autoregression
h_n = [ (1/n) Σ_{t=1}^n (X_t − X̄_n)² ]^{1/2} n^{−1/5}
          ρ_k                            r_k
MA(q)     ρ_k = 0, k > q                 |r_k| = O(e^{−αk}), α > 0
AR(p)     |ρ_k| = O(e^{−αk}), α > 0      r_k = 0, k > p
where (Pk) denotes autocorrelation and (rk) partial autocorrelation (see
Lesson 10).
Therefore if (Pk) and (rk) are suitable estimators of (Pk) and (rk) re-
spectively, then we obtain the following empirical criterion:
If IPkl is small for k > q, then the model is a MA(q).
If Irkl is small for k > p, then the model is an AR(p).
If IPk I and Irk I decrease rather slowly, then the model is mixed.
Lemma 14.1 Let (Xt, t E 7Z) be a zero mean stationary process with au-
tocorrelation (Pk) and partial autocorrelation (rk). Consider the linear re-
gression of X t with respect to X t - 1 , ... , X t - k :
X*_t = Σ_{i=1}^k a_{ik} X_{t−i}.  (14.38)
and
r_k = a_{kk}.  (14.40)
then
r̂_k = â_{kk}.  (14.43)
It may be checked that ρ̂_k and r̂_k are consistent estimators but the above criterion remains empirical.
We now consider the general case.
Akaike's criterion.
It is based on the minimization of the quantity
Theorem 14.3
(p̂_n, q̂_n) → (p, q) a.s. as n → ∞.  (14.46)
The proof is omitted.
Note that (14.46) means that, with probability one, there exists a random integer N such that p̂_n = p and q̂_n = q for every n ≥ N.
2) Estimation.
We now suppose that (X_t) is an ARMA(p, q) where (p, q) is known. The problem is to estimate the unknown parameter
η = (φ₁, …, φ_p, θ₁, …, θ_q, σ²),
where σ² is the variance of ε_t and
X_t − Σ_{j=1}^p φ_j X_{t−j} = ε_t − Σ_{j=1}^q θ_j ε_{t−j}  (14.47)
(see (10.37)).
If (X_t) is Gaussian, the MLE provides a good estimator of η since it may be checked to be asymptotically efficient. However, its implementation is tricky because the likelihood is complicated.
In the particular case of an MA(q), we have
X_t = ε_t + Σ_{j=1}^q θ_j ε_{t−j},  t ∈ ℤ.  (14.48)
Thus
(X₁, …, X_n) = A(ε_{1−q}, …, ε_n),  (14.49)
where A is a linear mapping. This allows us to write explicitly the likelihood
(Exercise 14.7).
In the general case the problem can be simplified by approximating (X_t) by an MA(Q).
Now if (Xt ) is in an AR(p), then the conditional MLE provides a simple
and interesting alternative method.
Recall that
with density
(x_{1−p}, …, x₀, u₁, …, u_n) ∈ ℝ^{n+p}, where f denotes the density of (X_{1−p}, …, X₀). Using the change of variables
u_t = x_t − Σ_{j=1}^p π_j x_{t−j},  t = 1, …, n,
we deduce from (14.51) that the conditional density of (X₁, …, X_n) given (X_{1−p}, …, X₀) is
For convenience, we now suppose that the data are X_{1−p}, …, X₀, X₁, …, X_n. The conditional likelihood is then g(X₁, …, X_n | X_{1−p}, …, X₀) and the conditional MLE is the solution of the system
(1/n) Σ_{t=1}^n X_t X_{t−k} − Σ_{j=1}^p π_j (1/n) Σ_{t=1}^n X_{t−j} X_{t−k} = 0,  k = 1, …, p,
σ̂² = (1/n) Σ_{t=1}^n (X_t − π̂₁ X_{t−1} − … − π̂_p X_{t−p})²,  (14.53)
hence the estimator η̂_n = (π̂₁, …, π̂_p, σ̂²).
Note that these equations may be obtained from the Yule-Walker equations (Theorem 10.1) with replacement of the autocovariances by (modified) sample autocovariances. (14.53) may be used in the non-Gaussian case and η̂_n is consistent.
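For p = 1 the system (14.53) has a closed form: π̂₁ = Σ X_t X_{t−1} / Σ X_{t−1}², after which σ̂² is the mean squared residual. A sketch on a simulated AR(1) (the parameter values are illustrative):

```python
import random

random.seed(9)
pi_true, n = 0.6, 5000
x = [0.0]
for _ in range(n):                       # X_t = pi * X_{t-1} + eps_t
    x.append(pi_true * x[-1] + random.gauss(0.0, 1.0))

# conditional MLE (14.53) with p = 1
num = sum(x[t] * x[t - 1] for t in range(1, n + 1))
den = sum(x[t - 1] ** 2 for t in range(1, n + 1))
pi_hat = num / den
sigma2_hat = sum((x[t] - pi_hat * x[t - 1]) ** 2 for t in range(1, n + 1)) / n
print(pi_hat, sigma2_hat)   # near 0.6 and 1.0
```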
3) Diagnostic checking.
The operations performed in 1) and 2) completely specify the model. In order to verify that the model fits the data, we define the residuals ε̂_t by
Φ̂(B) X_t = Θ̂(B) ε̂_t,  (14.54)
where Φ̂(B) = 1 − Σ_{j=1}^p φ̂_j B^j and Θ̂(B) = 1 − Σ_{j=1}^q θ̂_j B^j.
14.4 Exercises
14.1. Consider S defined by (14.2). Compute E(S), Var(S) and give a bound for P(|S − E(S)| > η), η > 0, in the following cases.
(i) (Y_t) = (ε_t), where (ε_t) are i.i.d. random variables with common uniform distribution over [−1/2, 1/2].
(ii) Y_t = at + ε_t, t ∈ ℤ, where a ∈ ℝ*.
14.2. Consider T_n defined by (14.9). Show that T_n ~ χ²(2) when a = 0.
14.3. Construct a confidence interval of asymptotic confidence level 1 − α for the mean of a stationary process (f(0) is supposed to be known).
14.4. Prove the consistency of γ̂_t defined by (14.21).
14.5. Let (ε_t, t ∈ ℤ) be a strictly stationary real process. Suppose that the density g of ε₀ does exist and define the histogram estimator by
g_n(x) = (k_n/n) Σ_{t=1}^n 1_{[j/k_n, (j+1)/k_n)}(ε_t),  x ∈ [j/k_n, (j+1)/k_n),  j ∈ ℤ.
X_t = ε_t + θ ε_{t−1},  t ∈ ℤ,
where |θ| < 1 and where (ε_t) is a Gaussian white noise. Determine the likelihood and compute the MLE (θ̂, σ̂²).
14.8. Consider the process
where |ρ| < 1 and where (ε_t) is a Gaussian white noise. Given the data X₀, X₁, …, X_n, compute the conditional MLE (ρ̂, σ̂²).
14.9. Let (Xt, t E '/l,) be a zero mean stationary process with spectral
density
f(λ) = (1/(2π)) (1 + 2θ cos λ + θ²),  −π ≤ λ ≤ π,
where |θ| < 1.
(i) Determine the autocovariance (γ_t) of (X_t) and verify that γ₀ > 2|γ₁|.
(ii) Define
Z₂(a) = aX₁ + (1 − a)X₂,  a ∈ ℝ.
Find a number a which minimizes Var(Z₂(a)).
(iii) Define
Z_n(a) = a(X₁ + X_n) + ((1 − 2a)/(n − 2)) (X₂ + … + X_{n−1}),  a ∈ ℝ,  n ≥ 3.
Determine a number a = a_n which minimizes Var(Z_n(a)). Find a condition which ensures that a_n = 1/n.
(iv) Compute lim_{n→∞} n Var(Z_n(a_n)) = ℓ. Compare ℓ and lim_{n→∞} n Var(X̄_n). Conclusion?
14.10. Define
Statistics of Diffusion
Processes
"f3*
1
-T
IT-3 (X t - XT )(Xt+3 - XT )dt, s<t
- s 0
0, s ~ t, (15.2)
288 Lesson 15
provided that the above L 2-integrals exist. If, for instance, (Xt ) has an
auto covariance continuous at t = 0, then these estimators are well defined
(see Theorem 9.10).
The consistency of X̄_T is given by Theorem 9.11 and, concerning γ̂*_s, an adaptation of Theorem 14.1 gives the convergence (Exercise 15.1). Similarly
as in the discrete case, the periodogram is asymptotically unbiased but not
consistent and must be modified using weight functions.
We do not develop these properties since in practice it is difficult to
observe a process in continuous time. Classical schemes for observing a
continuous time process are as follows.
a) Observable coordinates.
Owing to the inertia of the measurement device, the observations take the form
Z_t = ∫ X_s φ(s, t) ds,  0 ≤ t ≤ T,  (15.4)
γ̂_j^{(δ)} = (1/(n − j)) Σ_{i=1}^{n−j} X_{iδ} X_{(i+j)δ},  0 ≤ j ≤ n − 1,
γ̂_j^{(δ)} = 0,  j ≥ n,  (15.7)
provides some information about (γ_{jδ}, j = 0, 1, 2, …) but not about γ_t for t ∉ {0, δ, 2δ, …}.
Statistics of Diffusion Processes 289
Var(X̄_T^{(N)}) = (γ₀ + m²)/(λT) + (1/T) ∫_{−T}^{T} (1 − |s|/T) γ_s ds.  (15.9)
(iii) If (γ_s) is integrable over ℝ, then
T Var(X̄_T^{(N)}) → (γ₀ + m²)/λ + ∫_{−∞}^{∞} γ_s ds = Σ².  (15.10)
then
√T (X̄_T^{(N)} − m) →^D N ~ N(0, Σ²).  (15.12)
γ̂_T^{(N)}(t) = (1/(λ²T h_T)) Σ_{i,j≥1} 1_{(T_i ≤ T, T_j ≤ T)} X_{T_i} X_{T_j} K( (t − (T_j − T_i))/h_T ),  t > 0,  (15.14)
where the kernel K is a density over ℝ and h_T is a bandwidth parameter. This estimator has good asymptotic properties (see Karr (1986)).
Finally the spectral density
f(λ) = (1/π) ∫_0^∞ γ_t cos λt dt,  λ ∈ ℝ,  (15.15)
g_T(x) = (1/(T h_T)) ∫_0^T K( (x − X_t)/h_T ) dt,  x ∈ ℝ,  (15.17)
r̂_{H,T}(x) = ∫_0^{T−H} X_{t+H} K((x − X_t)/h_T) dt / ∫_0^{T−H} K((x − X_t)/h_T) dt,  x ∈ ℝ,  (15.19)
Let us define an estimator associated with (Wt, 0 ~ t ~ T) by
lim inf_{n→∞} Z_n.  (15.21)
Lemma 15.2 Let X₁, …, X_n be real i.i.d. random variables such that E(X₁⁸) < ∞ and E(X_i) = 0; then
Y²_{kn} = (2ⁿ/(Tσ²)) [ W(kT/2ⁿ) − W((k − 1)T/2ⁿ) ]²,  1 ≤ k ≤ 2ⁿ.
For every k, Y²_{kn} ~ χ²(1) and Y²_{1n}, …, Y²_{2ⁿn} are independent since the Wiener process has independent increments.
Now from Tchebychev's inequality and Lemma 15.2, it follows that for all ε > 0,
P(|Z_n − σ²| > ε) ≤ (σ⁸/(2^{4n} ε⁴)) E[ Σ_{k=1}^{2ⁿ} (Y²_{kn} − 1) ]⁴ ≤ (σ⁸/ε⁴) (a·2ⁿ + 3b·2ⁿ(2ⁿ − 1))/2^{4n},  (15.24)
where a = E(Y⁸_{kn}) and b = [E(Y⁴_{kn})]² are constants.
Now (15.24) entails
Σ_n P(|Z_n − σ²| > ε) < ∞,  ε > 0,
Then
X = (X_t, 0 ≤ t ≤ T)  and  W = (W_t, 0 ≤ t ≤ T)
are C-valued random functions with distributions P_X^T and P_W^T, respectively. It can be proved that P_X^T admits a density with respect to P_W^T and the associated likelihood is given by
where dX_t is defined by (15.26).
Thus θ can be estimated by the MLE θ̂_T. We now make the following hypotheses.
(i) (Xt ) is a strictly stationary Markov process.
(ii) For every function φ integrable with respect to the distribution P_{X₀}^{θ₀} of X₀, we have
lim_{T→∞} (1/T) ∫_0^T φ(X_t) dt = ∫_{−∞}^{∞} φ(x) dP_{X₀}^{θ₀}(x),  a.s.
294 Lesson 15
(ergodicity).
(iii) The statistical model is identifiable:
Theorem 15.3 If conditions (i)-(iv) hold, then for all θ₀ and as T → ∞,
θ̂_T → θ₀,  P_{θ₀}-a.s.,  (15.29)
and
√T (θ̂_T − θ₀) →^D N ~ N(0, I_{θ₀}^{−1}),  (15.30)
where
I_{θ₀} = E_{θ₀} ( ∂μ(θ₀, X₀)/∂θ )²
and
ε^{−1} (θ̂_ε − θ₀) →^D N ~ N(0, I_{θ₀}^{−1}),
where
I_{θ₀} = lim_{ε→0} E_{θ₀} ∫_0^T ( ∂μ(θ₀, X_t)/∂θ )² dt
and T is fixed.
15.4 Exercises
15.1. Prove the consistency of γ̂*_s under suitable conditions.
15.2. Prove the results in Theorem 15.1 (Hint: for (iv) compute the characteristic function of √T(X̄_T^{(N)} − μ)).
15.3. Look at the estimators γ̂_T^{(N)} and f̂_T^{(N)} ((15.14) and (15.16)) and try to explain the rationale which leads to them (Hint: use the simple kernel K = 1_{[−1/2, 1/2]} and recall that h_T and h_n are small).
15.4*. Consider a strictly stationary process (X_t, t ∈ ℝ) such that the density f_u of (X₀, X_u) does exist for u ≠ 0 and satisfies
K(v) = (1/√(2π)) e^{−v²/2},  v ∈ ℝ,
and show that
where v₁ = (x − y)/h_T, v₂ = (x − z)/h_T.
(iii) Show that
≤ (2/T) ∫_0^T ||φ_u||_∞ du + (2/T) ∫_T^∞ ||φ_u||_∞ du,  y, z ∈ ℝ.
(v) Prove that
T · V[g_T(x)] → 2 ∫_0^∞ φ_u(x, x) du.
X_t = σ W_t,  t ≥ 0,
σ̂²_n = (1/T) Σ_{j=1}^n (X_{t_j} − X_{t_{j−1}})².
lim_{n→∞} E(σ̂²_n − σ²)² = 0  if and only if  lim_{n→∞} Δ_n = 0.
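The estimator σ̂²_n above can be checked on a simulated path of X_t = σW_t with an equidistant grid (σ, T and the mesh are illustrative choices); its mean square error vanishes as the mesh Δ_n → 0:

```python
import random, math

random.seed(10)
sigma_true, T, n = 1.5, 1.0, 20000
dt = T / n                                    # equidistant mesh Delta_n

qv = 0.0
for _ in range(n):
    dX = sigma_true * random.gauss(0.0, math.sqrt(dt))  # increment of sigma*W
    qv += dX ** 2                             # accumulate squared increments
sigma2_hat = qv / T
print(sigma2_hat)   # near sigma_true^2 = 2.25
```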
σ̃²_n = (1/n) Σ_{j=1}^n (X_{t_j} − X_{t_{j−1}})² / (t_j − t_{j−1})
lim_{n→∞} E(σ̃²_n − σ²)² = 0.
15.6. (Continuation of 15.5). Suppose that the data are X_{T₁}, …, X_{T_n}, where T₁, …, T_n are arrival times of a Poisson process with known intensity λ and independent of (X_t). Construct an estimator of σ² based on X_{T₁}, …, X_{T_n} and prove its consistency as n tends to infinity.
15.7. Let (X_t, t ∈ ℝ) be an Ornstein-Uhlenbeck process ((12.42) and (12.46)) observed over the interval [0, T].
(i) Compute the likelihood (15.28).
(ii) Compute the MLE θ̂_T.
(iii) Find a confidence region for θ by using (15.30).
15.8. Consider the process
X_t = tY + cW_t,  0 ≤ t ≤ 1,
where Y ~ N(μ, ℓ²) and (W_t, 0 ≤ t ≤ 1) is a standard Wiener process independent of Y.
(i) Determine an unbiased estimator μ̂ of μ based on (X_t, 0 ≤ t ≤ 1).
(ii) Show that μ̂ is consistent as c → 0.
15.9. Let (X_t) be an Ornstein-Uhlenbeck process observed at times 0, 1, …, n.
(i) Determine a conditional MLE estimator θ̂_n of θ based on (X₀, X₁, …, X_n).
(ii) Show the consistency in probability of θ̂_n as n → ∞.
15.10*. Consider the stochastic process (Xt, t ~ 0) defined by
θ̂_T = (1/T) ( ∫_0^T X_t dt + ∫_0^T dX_t / X_t )
infinite. This discrete measure is called the counting measure on ℤ. Note that μ({n}) = 1 for all n ∈ ℤ. A property is said to hold almost everywhere (a.e.) on Ω with respect to a measure μ (μ-a.e.) if it holds on Ω except on some subset A ∈ A with μ(A) = 0.
μ is said to be bounded if μ(Ω) < ∞. A probability measure P is a bounded measure with P(Ω) = 1. For a ∈ Ω, the Dirac measure at a, denoted by δ(a), is the measure defined by
δ(a)(A) = 1 if a ∈ A,  δ(a)(A) = 0 if a ∉ A.
It is clear that μ is σ-additive on D. The σ-field generated by D is the Borel σ-field B(ℝ). There exists a unique σ-additive extension of μ to B(ℝ) (see the theorem below). The completion of this measure on (ℝ, B(ℝ)) is called the Lebesgue measure on ℝ. Since
00
where
where
R_i = R_{i₁} × R_{i₂} × ⋯ × R_{i_n},  i = {i₁, i₂, …, i_n},  i_j ∈ T,  R_{i_j} = ℝ,
the Kolmogorov existence theorem (Lesson 2) asserts that there is a unique probability measure P on (ℝ^T, B(ℝ^T)) such that
for all cylinders, provided that the family of probability measures (P_i, i ∈ J) satisfies the consistency condition.
∫_{Ω₁×Ω₂} f(ω₁, ω₂) d(μ₁ ⊗ μ₂) = ∫_{Ω₁} [ ∫_{Ω₂} f(ω₁, ω₂) dμ₂(ω₂) ] dμ₁(ω₁) = ∫_{Ω₂} [ ∫_{Ω₁} f(ω₁, ω₂) dμ₁(ω₁) ] dμ₂(ω₂),
where ω₁ ↦ ∫_{Ω₂} f(ω₁, ω₂) dμ₂(ω₂) (resp. ω₂ ↦ ∫_{Ω₁} f(ω₁, ω₂) dμ₁(ω₁)) is defined μ₁-a.e. (resp. μ₂-a.e.).
that
μ(A) = ∫_A f(ω) dν(ω),  ∀A ∈ A.  (*)
B.1 Definitions
Let U be a linear space over JR. A nonnegative real function II . II defined
on U is called a norm if
(i) ⟨a₁x₁ + a₂x₂, y⟩ = a₁⟨x₁, y⟩ + a₂⟨x₂, y⟩, where a₁, a₂ ∈ ℝ and x₁, x₂, y ∈ H;
(ii) ⟨x, y⟩ = ⟨y, x⟩, x, y ∈ H; and
where x₁, …, x_n, v₁, …, v_n ∈ ℝ. A typical countable dense subset of ℝⁿ is ℚⁿ, where ℚ denotes the set of rational numbers.
(b) The space C([0, 1]) of continuous real functions defined over [0, 1] is a separable Banach space with respect to the supremum norm defined by
B.2 V-spaces
Let (U, B, μ) be a measure space, p ≥ 1, and 𝓛^p(μ) the space of real measurable functions f on (U, B) such that ∫_U |f|^p dμ < ∞.
The relation
f ~ g ⇔ f = g μ-a.e.
is an equivalence relation over 𝓛^p(μ).
Now the space L^p(μ) of equivalence classes associated with ~ is a Banach space with the norm
Then ||·||_∞ induces a norm over L^∞(μ), which becomes a Banach space.
L²(μ) is a special L^p-space since it is a Hilbert space with scalar product
x − π_M(x) ⊥ y,  y ∈ M.
and
φ₀ = 1/√(2π),  φ_{2k+1}(t) = (1/√π) cos kt,  φ_{2k+2}(t) = (1/√π) sin kt,  k ∈ ℕ.
M = {Z : Z E L 2(P), 3g : Z = g(X)}.
(3) Let us finally note that convergence in mean square is nothing but convergence in L²(P). Consequently, if lim_{n,m→∞} E(Y_n − Y_m)² = 0, then there exists Y ∈ L²(P) such that lim_{n→∞} E(Y_n − Y)² = 0.
More details about the connection between Hilbert spaces and probability theory can be found in Rao (1984).
List of Symbols
Lesson 1.
1.10. (i) {ω : X(ω) < ∞} = ∪_{n≥1} {ω : X(ω) ≤ n} ∈ A.
(ii) {ω : X(ω) = ∞} = {ω : X(ω) < ∞}ᶜ ∈ A and
{ω : X(ω) = −∞} = ∩_{n≤0} {ω : X(ω) ≤ n} ∈ A.
(iii) {ω : sup_n X_n(ω) ≤ t} = ∩_n {ω : X_n(ω) ≤ t} and inf_n X_n = −sup_n(−X_n).
1.17. (i) Clearly if X ≥ 0 and simple, then E(X) = ∫_0^∞ P(X > t) dt. If X ≥ 0 (measurable), then let X_n ↗ X, with X_n ≥ 0, simple (Exercise 1.18). Since, for each t, (X_n > t) ↗ ∪_{n≥1}(X_n > t) = (X > t), the result follows by monotone continuity of P and by the monotone convergence theorem (Appendix).
(ii) Write X = X⁺ − X⁻.
(iii) E(|X|ᵏ) = ∫_0^∞ P(|X| > t^{1/k}) dt, then set y = t^{1/k}.
1.18. (i) Let
A_{i,n} = {ω : i/2ⁿ ≤ X(ω) < (i+1)/2ⁿ},  i = 0, 1, …, n2ⁿ − 1,
and B_n = {ω : X(ω) ≥ n}. Then clearly, for each n ≥ 1, the A_{i,n}'s and B_n form a partition of Ω. To show that ∀ω, X_n(ω) ≤ X_{n+1}(ω), we consider two cases:
(a) ω ∈ B_n. If X(ω) ≥ n + 1, then X_n(ω) = n < n + 1 = X_{n+1}(ω); if n ≤ X(ω) < n + 1, then
n2^{n+1}/2^{n+1} ≤ X(ω) < (n + 1)2^{n+1}/2^{n+1}.
316 Partial Solutions to Selected Exercises
Also
n2^{n+1}/2^{n+1} < (n2^{n+1} + 1)/2^{n+1} < (n + 1)2^{n+1}/2^{n+1}.
If ω ∈ A_{j,n+1} with j = n2^{n+1}, then X_n(ω) = n and X_{n+1}(ω) = j/2^{n+1} = n.
If
(n2^{n+1} + 1)/2^{n+1} ≤ X(ω) < (n + 1)2^{n+1}/2^{n+1},
Lesson 2.
2.2. (i) For A ∈ A, take B₁ = B₂ = A.
(ii) If B₁ ⊆ A ⊆ B₂ and B₁′ ⊆ A ⊆ B₂′ with P(B₁) = P(B₂), P(B₁′) = P(B₂′), then P(B₁) = P(B₁′). Indeed,
P(B₁) ≤ P(B₂′) = P(B₁′) = P(B₁B₁′) + P(N) ≤ P(B₁),
where P(N) = 0.
P̄(Ω) = P(Ω) = 1. P̄(Aᶜ) = P(B₁ᶜ) = 1 − P(B₁) = 1 − P̄(A).
Lesson 3.
3.13. (i) Let T_j(ω) = inf{k ≥ 1 : X_k(ω) = j}. We have {ω : X_n(ω) = j} ⊆ {ω : T_j(ω) ≤ n}. Thus
But
(ii) Let A be the set of all null recurrent states. Let i ∈ A and j ∉ A. If j is transient, then i ↛ j. If j is positive recurrent, then i ↛ j, since otherwise j → i, but then i would have to be positive recurrent.
3.16. (a) N_j(ω) = Σ_{n=1}^∞ 1_{(X_n = j)}(ω). Consider the successive indices n ≥ 1 at which X_n(ω) = j:
For i ≠ j,
P(N_j = k | X₀ = i) = P(N_j ≥ k | X₀ = i) − P(N_j ≥ k + 1 | X₀ = i)
= 1 − f_{ij}  for k = 0,
= f_{ij}(f_{jj})^{k−1}(1 − f_{jj})  for k ≥ 1.
On the other hand, (2) and (3) imply E(s − T_{N_s}) = (1 − e^{−λs})/λ, hence
E(T_{N_s+1} − T_{N_s}) = 2/λ − e^{−λs}/λ.  (5)
= k) = L
00
(iii) First let us suppose that λ₁ > λ₂. Then E(M_t) = (λ₁ − λ₂)t → ∞ as t → ∞. Now |M_t| ≤ c implies |M_t − E(M_t)| ≥ E(M_t) − c, and for t large enough E(M_t) − c ≥ E(M_t)/2, thus |M_t| ≤ c implies |M_t − E(M_t)| ≥ E(M_t)/2. By using the Tchebychev inequality we get
P(|M_t| ≤ c) ≤ 4(λ₁ + λ₂)t / ((λ₁ − λ₂)²t²) → 0 as t → ∞.
M_t/√t = (N_t¹ − λt)/√t + (λt − N_t²)/√t.
By using Theorem 4.7 and the independence between (N_t¹) and (N_t²) we obtain
lim_{t→∞} P( |M_t|/√(λt) ≤ z ) = P(|N| ≤ z),  z ∈ ℝ₊,
where N ~ N(0, 2). Now for all ε > 0, there exists a real number z_{ε/2} such that P(|N| ≤ z_{ε/2}) = ε/2. On the other hand, c/√t ≤ z_{ε/2} for t large enough. Therefore P(|M_t| ≤ c) ≤ P(|M_t|/√t ≤ z_{ε/2}) and for t large enough
P( |M_t|/√t ≤ z_{ε/2} ) ≤ P(|N| ≤ z_{ε/2}) + ε/2,
which proves that lim_{t→∞} P(|M_t| ≤ c) = 0.
(iv) The results obtained in (iii) show that, even if λ₁ = λ₂, the difference |N_t¹ − N_t²| has a tendency to increase with t.
4.13. (i) Let t be a positive number such that B ~ [0, t] and let k be a
positive integer. We have
(ii) Similarly as above we consider t such that ∪_{j=1}^k B_j ⊆ [0, t]. Then, by using again Theorem 4.4, we see that the conditional distribution of (N_{B₁}, …, N_{B_k}, N_{[0,t] − ∪_{j=1}^k B_j}) given {N_t = n} is multinomial and hence
P_{(N_{B₁}, …, N_{B_k})} = P(λm(B₁)) ⊗ ⋯ ⊗ P(λm(B_k)).  (1)
(iii) The above formula (1) shows that (N_B) is a Poisson process on ℝ with mean measure λm: the two definitions of a Poisson process on ℝ agree.
4.15. (i) Write
E(e^{iuX_t}) = Σ_{k=0}^∞ E(e^{iuX_t} | N_t = k) P(N_t = k) = Σ_{k=0}^∞ e^{−λt} ((λt)ᵏ/k!) φᵏ(u),
φ(u/√λ) = 1 − σ²u²/(2λ) + o(u²/λ).
Therefore
E(e^{iuX_t/√λ}) = e^{−σ²tu²/2 − λt·o(u²/λ)}
and
lim_{λ→∞} E(e^{iuX_t/√λ}) = e^{−σ²tu²/2},  u ∈ ℝ,
which shows that the asymptotic distribution of X_t/√λ is N(0, σ²t). Actually it can be proved that the Poisson process "tends" to a Wiener process as λ tends to infinity.
Lesson 5.
5.5. For i i' j,
as t ~ O.
5.6. (i)
Σ_{j∈S} g_{ij} = Σ_{j∈S} lim_{t↘0} (1/t)[P_{ij}(t) − δ_{ij}] = lim_{t↘0} (1/t) Σ_{j∈S} [P_{ij}(t) − δ_{ij}]
(1 − P_{ii}(t))/t ≤ lim inf_{h→0} (1 − P_{ii}(h))/h,
and hence
lim sup_{h→0} (1 − P_{ii}(h))/h ≤ lim inf_{h→0} (1 − P_{ii}(h))/h,
lim sup_{h→0} P_{ij}(h)/h ≤ (P_{ij}(t)/t) · 1/(1 − 3ε) < ∞.
Letting t → 0 we obtain
lim sup_{h→0} P_{ij}(h)/h ≤ lim inf_{t→0} (P_{ij}(t)/t) · 1/(1 − 3ε) < ∞,
and hence
lim sup_{h→0} P_{ij}(h)/h = lim inf_{t→0} P_{ij}(t)/t,
since ε is arbitrarily small.
Lesson 6.
6.4.
P₀₀^{(2n)} = C(2n, n)(pq)ⁿ = (−1)ⁿ C(−1/2, n) 2^{2n} (pq)ⁿ.
Σ_{n=0}^∞ P₀₀^{(2n)} = Σ_{n=0}^∞ (−1)ⁿ C(−1/2, n) (4pq)ⁿ = (1 − 4pq)^{−1/2}
and
φ(s) = Σ_{n=1}^∞
Lesson 7.
7.1. Use (i) and (Nt ~ n) = (Sn ~ t).
7.10. (i) (Rt > x) = (no renewals in (t, t + x]).
(ii) For x < t, (Ct > x) = (no renewals in (t - x, t]).
(iii) (Rt > x, Ct > y) = (no renewals in (t - y, t + x]), for 0 < y < t.
(iv) XNi+l = Ct + R t .
Lesson 8.
8.1. (i) For each t, A(t) and D(t) are independent, thus
P(X(t) = j) = Σ_{n=0}^∞ P(A(t) = n + j) P(D(t) = n)
= Σ_{n=0}^∞ ( e^{−λt}(λt)^{n+j}/(n + j)! ) ( e^{−μt}(μt)ⁿ/n! )
= Σ_{k=0}^∞ ( e^{−λt}(λt)^{k+j}/(k + j)! ) ( e^{−μt}(μt)ᵏ/k! )
E [ej-i(t) -
00
Lesson 9.
9.3. (i) Let H_{t−1,p} be the subspace spanned by X_{t−1}, …, X_{t−p}. Since X̂_t ∈ H_{t−1} = ∪_{p≥1} H_{t−1,p}, there exists a sequence (X̂^{(p)}, p ≥ 1), X̂^{(p)} ∈ H_{t−1,p}, such that X̂^{(p)} → X̂_t, as p → ∞ (in mean square). Thus, it suffices to verify that
lim_{p→∞} E(X̂^{(p)} − X_t^{(p)})² = 0.
But
||X̂_t − X_t|| ≤ ||X̂_t^{(p)} − X_t|| ≤ ||X̂_t^{(p)} − X̂_t|| + ||X̂_t − X_t||
9.7. Set
Z_{h_n}(ω) = (X_{t+h_n}(ω) − X_t(ω))/h_n = (1/h_n)( 1_{{t+h_n}}(ω) − 1_{{t}}(ω) ),
E(e_n e_m) = ∫∫_{[a,b]²} φ_n(s) C(s, t) φ_m(t) ds dt
C(t, t) − Σ_{k=0}^∞ λ_k φ_k²(t),
X_t = Σ_{k=0}^∞ e_k φ_k(t),  a ≤ t ≤ b,
Lesson 10.
10.7. (i) By using the orthogonality relations
where $\gamma$ denotes the autocovariance of $(X_t)$. Now we have $\gamma_0 = \sigma^2(1 + a_1^2)$, $\gamma_{-1} = \gamma_1 = \sigma^2 a_1$, and $\gamma_l = 0$ for $|l| > 1$. Thus the desired result follows by substituting these values in (1).
(ii) By using a similar method as in the proof of Theorem 10.2, it can be proved that $r_n = \phi_1$ (see also Lemma 14.1). Now by using recursively the difference equation obtained in (i), one can infer that $\phi_1 = [(-1)^{n+1} a_1 (1 - a_1^2)]/(1 - a_1^{2(n+1)})$, hence the result.
10.9. (i) We use the Cauchy criterion. First note that, for $p < q$,
$$E\left(\sum_{j=p}^{q} r_{t-1}\cdots r_{t-j}\,\eta_{t-j}\right)^2 = \sum_{j=p}^{q} E\left(r_{t-1}^2 \cdots r_{t-j}^2\right) E(\eta_{t-j}^2),$$
since $r_{t-1},\ldots,r_{t-j},\eta_{t-p},\ldots,\eta_{t-q}$ are independent. Consequently
$$E\left(\sum_{j=p}^{q} r_{t-1}\cdots r_{t-j}\,\eta_{t-j}\right)^2 \le \sum_{j=p}^{q} a^j E(\eta_{t-j}^2) \le E(\eta_0^2)\sum_{j=p}^{q} a^j.$$
$$X_t - \frac{1}{\rho}X_{t-1} = \cdots$$
(ii) The spectral density of $\eta_t$ is $\cdots$
which proves that $\eta_{t+1}$ is not orthogonal to $X_t$. Thus $(\eta_t)$ is not the innovation of $(X_t)$.
(iv) It is easy to verify that $(\varepsilon_t)$ is a white noise. By using again Theorem 9.2, we obtain
Thus $(E^{\mathcal{B}_n}(Y))$ satisfies the condition in Theorem 11.1 and consequently converges in mean square and almost surely.
Note that it can be proved that
$$E^{\mathcal{B}_n} Y \to E^{\mathcal{B}_\infty}(Y),$$
Then the Borel-Cantelli lemma entails that there exists $\Omega_0$ such that $P(\Omega_0) = 1$ and, for every $\omega \in \Omega_0$, $X_{n-1} = n/(n^2 - 1)$ for $n \ge n_0(\omega)$. Consequently, if $n \ge n_0(\omega)$, we have
$$Y_n(\omega) - Y_{n_0}(\omega) = \sum_{p=n_0(\omega)}^{n} \frac{p}{p^2 - 1}.$$
Taking the limit as $n \to \infty$, we obtain $Y_n(\omega) \to \infty$. Hence $Y_n \xrightarrow{\text{a.s.}} \infty$. Note that a consequence of this result is $\sup_n E(|Y_n|) = \infty$ (see Theorem 11.2).
11.10. (i) Since $X_1,\ldots,X_n$ are integrable and independent, $\prod_{i=1}^{n} X_i$ is integrable. Now if $\mathcal{B}_n = \sigma(X_1,\ldots,X_n)$, $n \ge 1$, we have
thus $(Y_n)$ is a martingale. Now $\sup_n E(|Y_n|) = 1$, then Theorem 11.2 entails $Y_n \xrightarrow{\text{a.s.}} Y$.
(ii) Let
$$T_n = \sum_{i=1}^{n} 1_{\{X_i = 3/2\}}, \quad n \ge 1.$$
By the strong law of large numbers,
$$\frac{T_n}{n} \xrightarrow{\text{a.s.}} P\left(X_i = \frac{3}{2}\right) = \frac{1}{2}.$$
It means that there exists $\Omega_0$ such that $P(\Omega_0) = 1$ and $T_n(\omega)/n \to 1/2$ for $\omega \in \Omega_0$.
Now let $\varepsilon$ be a positive number such that $3^{\varepsilon + 1/2} < 2$. There exists $n_0(\omega)$ such that
$$\frac{T_n(\omega)}{n} < \frac{1}{2} + \varepsilon \quad \text{for } n \ge n_0(\omega),\ \omega \in \Omega_0,$$
thus $Y = 0$ a.s.
Finally $E\left(\prod_{i=1}^{\infty} X_i\right) = E(Y) = 0$ while $\prod_{i=1}^{\infty} E(X_i) = 1$.
Lesson 12.
12.3. (i) The equation which defines $\varphi_n$ can be written in the form
Since $\varphi_n$ is continuous (see, e.g., Exercise 9.10), the left side of this equation is differentiable, hence $\varphi_n$ is differentiable. Taking derivatives on both sides, we obtain
and consequently $b = \pm\sqrt{2}$. As the sign of $\varphi_n$ is arbitrary, one can choose $b = \sqrt{2}$, thus
Since $(W_t)$ is a Gaussian process, the $L^2$-integral $\int_0^1 W_t \varphi_n(t)\,dt$ is the limit in mean square of Gaussian random variables (see, e.g., Section 9.4) and is therefore Gaussian: $\xi_n \sim N(0, \lambda_n)$.
12.5. (i) Elementary computations show that
$$I_n(\lambda) = \sum_{i=0}^{n-1} \frac{1}{2}\left(W_{t_{i+1}}^2 - W_{t_i}^2\right) + \left(\lambda - \frac{1}{2}\right)\sum_{i=0}^{n-1}\left(W_{t_{i+1}} - W_{t_i}\right)^2.$$
The first sum is nothing but $(W_b^2 - W_a^2)/2$. Concerning the second sum, we have
$$\sum_{i=0}^{n-1}\left(W_{t_{i+1}} - W_{t_i}\right)^2 = (b - a) + \sum_{i=0}^{n-1} Y_i,$$
where $Y_i = (W_{t_{i+1}} - W_{t_i})^2 - (t_{i+1} - t_i)$, $i = 0,\ldots,n-1$. Hence $E(Y_i) = 0$ and $\mathrm{Var}(Y_i) = 2(t_{i+1} - t_i)^2$. Furthermore, $Y_0, Y_1, \ldots, Y_{n-1}$ are independent. Therefore
$$E\left(\sum_{i=0}^{n-1} Y_i\right)^2 = 2\sum_{i=0}^{n-1}(t_{i+1} - t_i)^2 \le 2(b - a)\sup_i\,(t_{i+1} - t_i) \to 0, \quad \text{as } n \to \infty.$$
Thus
$$\sum_{i=0}^{n-1}\left(W_{t_{i+1}} - W_{t_i}\right)^2 \xrightarrow{L^2} b - a$$
and finally
$$I_n(\lambda) \xrightarrow{L^2} \frac{1}{2}\left(W_b^2 - W_a^2\right) + \left(\lambda - \frac{1}{2}\right)(b - a), \quad 0 \le \lambda \le 1. \tag{1}$$
$$\int_a^b f_n(t)\,dW_t = \sum_{i=1}^{n} W_{t_i}\left(W_{t_{i+1}} - W_{t_i}\right) = I_n(0).$$
Taking the limit in mean square and using (1), we obtain
$$\int_a^b W_t\,dW_t = \frac{1}{2}\left(W_b^2 - W_a^2\right) - \frac{b - a}{2}.$$
(iii) In order to obtain the "natural" value $\frac{1}{2}\left(W_b^2 - W_a^2\right)$, one may choose $\lambda = \frac{1}{2}$ and define formally
$$\int_a^b f_n(t)\,dW_t = \sum_{i=1}^{n} \frac{W_{t_i} + W_{t_{i+1}}}{2}\left(W_{t_{i+1}} - W_{t_i}\right),$$
$$\int_a^b W_t\,dW_t = \frac{1}{2}\left(W_b^2 - W_a^2\right).$$
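These two discretizations can be compared on a simulated path (a sketch, not part of the original solution); the identities below hold pathwise, for any realization of the increments:

```python
import numpy as np

rng = np.random.default_rng(0)
a, b, n = 0.0, 1.0, 10_000
dW = rng.normal(0.0, np.sqrt((b - a) / n), size=n)   # Brownian increments
W = np.concatenate(([0.0], np.cumsum(dW)))           # W_a = 0 for simplicity

ito = np.sum(W[:-1] * dW)                            # lambda = 0: left endpoints
strat = np.sum(0.5 * (W[:-1] + W[1:]) * dW)          # lambda = 1/2: midpoint values

# The symmetrized sum telescopes exactly to (W_b^2 - W_a^2)/2, pathwise:
assert abs(strat - 0.5 * (W[-1] ** 2 - W[0] ** 2)) < 1e-10
# The Ito sum differs from it by half the quadratic variation,
# which converges to b - a as the mesh shrinks:
qv = np.sum(dW ** 2)
assert abs(ito - (strat - 0.5 * qv)) < 1e-10
assert abs(qv - (b - a)) < 0.1
```

The first assertion shows why $\lambda = 1/2$ produces the "natural" value: it is an algebraic identity, true before any limit is taken.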
12.10. (i) The mappings $f : \omega \mapsto (T_n(\omega), \omega)$ and $g : (t, \omega) \mapsto W_t(\omega)$ are measurable. Then $h : \omega \mapsto W_{T_n(\omega)}(\omega)$ is measurable since $h = g \circ f$.
(ii) We have
Similarly
$$E\left(e^{iuW_{T_n}}\right) = \int_0^{\infty} e^{-\sigma^2 u^2 t/2}\,\frac{\lambda(\lambda t)^{n-1}}{(n-1)!}\,e^{-\lambda t}\,dt = \int_0^{\infty} e^{-\alpha t}\,\frac{\lambda(\lambda t)^{n-1}}{(n-1)!}\,dt,$$
where $\alpha = \lambda + \sigma^2 u^2/2$. Since
$$\int_0^{\infty} \frac{e^{-\alpha t}\,\alpha(\alpha t)^{n-1}}{(n-1)!}\,dt = 1,$$
we obtain
$$E\left(e^{iuW_{T_n}}\right) = \left(1 + \frac{\sigma^2 u^2}{2\lambda}\right)^{-n}, \quad u \in \mathbb{R}.$$
Now the characteristic function of the Laplace distribution (i.e., the distribution with density $e^{-|x|}/2$, $x \in \mathbb{R}$) is $(1 + u^2)^{-1}$. Consequently, $\left[1 + (\sigma^2 u^2)/(2\lambda)\right]^{-1}$ is the characteristic function of the distribution $P_{\sigma,\lambda}$ with density $\frac{\sqrt{2\lambda}}{2\sigma}\,e^{-|x|\sqrt{2\lambda}/\sigma}$. Finally $W_{T_n}$ has the distribution $P_{\sigma,\lambda}^{*n}$, where $*$ denotes the convolution product.
(iv) The characteristic function of $W_{T_n} - W_{T_{n-1}}$ is
$$\phi_n(u) = E\left(e^{iu(W_{T_n} - W_{T_{n-1}})}\right) = \left(1 + \frac{\sigma^2 u^2}{2\lambda}\right)^{-1}, \quad u \in \mathbb{R},$$
obtained by conditioning on $T_{n-1} = t_{n-1}$ and $T_n - T_{n-1} = s_n$.
$$\sup_{0 \le t \le T} W_t = \max_{0 \le t \le T} W_t \quad \text{a.s.}$$
$$P(A) = P(W_T > x) = \frac{1}{\sqrt{2\pi T}}\int_x^{\infty} \exp\left(-\frac{y^2}{2T}\right)dy.$$
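The Gaussian tail integral can be cross-checked against the complementary error function (the values of $T$ and $x$ below are arbitrary; this check is not part of the original solution):

```python
import math

T, x = 2.0, 0.8
# P(W_T > x) for W_T ~ N(0, T), via the complementary error function
p = 0.5 * math.erfc(x / math.sqrt(2.0 * T))

# trapezoidal evaluation of (2*pi*T)^{-1/2} * int_x^{x+12*sqrt(T)} exp(-y^2/(2T)) dy
m = 200_000
hi = x + 12.0 * math.sqrt(T)          # tail beyond this point is negligible
h = (hi - x) / m
vals = [math.exp(-(x + k * h) ** 2 / (2.0 * T)) for k in range(m + 1)]
integral = h * (sum(vals) - 0.5 * (vals[0] + vals[-1])) / math.sqrt(2.0 * math.pi * T)
assert abs(integral - p) < 1e-8
```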
Lesson 13.
13.2. (i) Suppose that $L_n(x, \theta) = \prod_{i=1}^{n} f(x_i, \theta)$ (where $x = (x_1,\ldots,x_n)$) is strictly positive and differentiable over $\Theta$ (an open set in $\mathbb{R}$), and consider the relations
$$\int L_n(x, \theta)\,dx = 1 \tag{1}$$
and
$$\int T_n(x)\,L_n(x, \theta)\,dx = \theta. \tag{2}$$
Differentiating (1) and (2) with respect to $\theta$, we obtain
$$\int \left(\frac{1}{L_n}\frac{\partial L_n}{\partial \theta}\right) L_n\,dx = 0 \tag{3}$$
and
$$\int T_n \left(\frac{1}{L_n}\frac{\partial L_n}{\partial \theta}\right) L_n\,dx = 1. \tag{4}$$
Now, note that (4) may be written as
$$\mathrm{Cov}\left(T_n,\ \frac{1}{L_n}\frac{\partial L_n}{\partial \theta}\right) = 1,$$
where the random variable $X^{(n)}$ is omitted. Then, the Schwarz inequality entails
$$\mathrm{Var}_\theta(T_n)\,\mathrm{Var}_\theta\left(\frac{1}{L_n}\frac{\partial L_n}{\partial \theta}\right) \ge 1.$$
Now, taking (3) into account, we see that
$$\mathrm{Var}_\theta\left(\frac{1}{L_n}\frac{\partial L_n}{\partial \theta}\right) = nI(\theta),$$
$$\mathrm{Cov}\left(T_n,\ \frac{1}{L_n}\frac{\partial L_n}{\partial \theta}\right)^2 = \mathrm{Var}_\theta(T_n)\,nI(\theta),$$
$$T_n - \theta = A(\theta)\left(\frac{1}{L_n}\frac{\partial L_n}{\partial \theta}\right).$$
satisfies (13.2). Now by maximizing (1), it is easy to see that $S$ is the MLE of $\theta$. $S$ has the density $\theta^{-n} n z^{n-1} 1_{[0,\theta]}(z)$, hence
$$E_\theta(S) = \int_0^{\theta} \theta^{-n} n z^{n}\,dz = \frac{n}{n+1}\,\theta.$$
Thus $T = \frac{n+1}{n} S$ is an unbiased estimator of $\theta$. By Theorem 13.3, if $S$ is complete then $T$ is optimal.
Now if $g$ is such that $E_\theta(g(S)) = 0$, $\theta > 0$, then $\int_0^{\theta} h(z)\,dz = 0$, $\theta > 0$, where $h(z) = z^{n-1} g(z)$. Therefore
$$\int_{\theta_1}^{\theta_2} h^+(z)\,dz = \int_{\theta_1}^{\theta_2} h^-(z)\,dz, \quad 0 < \theta_1 < \theta_2.$$
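A small simulation (not part of the original solution) illustrates the bias computation for the sample maximum; the sample size, $\theta$, and replication count below are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(42)
n, theta, reps = 5, 2.0, 200_000
samples = rng.uniform(0.0, theta, size=(reps, n))
S = samples.max(axis=1)            # MLE of theta: the sample maximum
T = (n + 1) / n * S                # bias-corrected estimator
assert abs(S.mean() - n / (n + 1) * theta) < 0.01   # E(S) = n*theta/(n+1)
assert abs(T.mean() - theta) < 0.01                 # T is unbiased
```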
$$E_\lambda(\hat\lambda_n^2) = \int_0^{\infty} \frac{n^2}{u^2}\,e^{-\lambda u}\,\frac{\lambda(\lambda u)^{n-1}}{(n-1)!}\,du = \frac{n^2\lambda^2}{(n-1)(n-2)}\int_0^{\infty} \frac{\lambda e^{-\lambda u}(\lambda u)^{n-3}}{(n-3)!}\,du = \frac{n^2\lambda^2}{(n-1)(n-2)}.$$
Hence
$$\mathrm{Var}_\lambda(\hat\lambda_n) = E(\hat\lambda_n^2) - \left(E(\hat\lambda_n)\right)^2 = \frac{n^2\lambda^2}{(n-1)^2(n-2)}$$
and finally
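As a numerical sanity check of the second-moment integral above (not part of the original solution; $\lambda$ and $n$ are arbitrary choices):

```python
import math
import numpy as np

lam, n = 1.5, 8
u = np.linspace(1e-6, 40.0, 400_001)
# Gamma(n, lam) density of S_n = X_1 + ... + X_n, evaluated in log space
dens = np.exp((n - 1) * np.log(lam * u) - lam * u + np.log(lam) - math.lgamma(n))
integrand = (n / u) ** 2 * dens            # integrand of E[(n / S_n)^2]
h = u[1] - u[0]
integral = h * np.sum((integrand[1:] + integrand[:-1]) / 2)   # trapezoid rule
expected = n ** 2 * lam ** 2 / ((n - 1) * (n - 2))
assert abs(integral - expected) < 1e-4
```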
Lesson 14.
14.2. Let
$$A_n = \sum_{t=1}^{n} \varepsilon_t \cos\frac{2\pi k}{n}t \quad \text{and} \quad B_n = \sum_{t=1}^{n} \varepsilon_t \sin\frac{2\pi k}{n}t.$$
First, $(A_n, B_n)$ is a Gaussian vector since every linear combination of its components can be written as $\sum_{t=1}^{n} a_t \varepsilon_t$ (where the $a_t$'s are constants), which is a Gaussian random variable.
Now, by independence,
$$\mathrm{Var}(A_n) = \sigma^2 \sum_{t=1}^{n} \cos^2\frac{2\pi k}{n}t \quad \text{and} \quad \mathrm{Var}(B_n) = \sigma^2 \sum_{t=1}^{n} \sin^2\frac{2\pi k}{n}t,$$
and
$$\mathrm{Cov}(A_n, B_n) = \sum_{s,t=1}^{n} E(\varepsilon_s \varepsilon_t) \cos\frac{2\pi k}{n}s\,\sin\frac{2\pi k}{n}t = \sigma^2 \sum_{t=1}^{n} \cos\frac{2\pi k}{n}t\,\sin\frac{2\pi k}{n}t = \frac{\sigma^2}{2}\sum_{t=1}^{n} \sin\frac{4\pi k}{n}t = 0,$$
since $\sum_{t=1}^{n} e^{4i\pi kt/n} = 0$. Hence the covariance matrix of $(A_n, B_n)$ is diagonal and, since $(A_n, B_n)$ is Gaussian, it follows that $A_n$ and $B_n$ are independent. Now
$$T_n = \frac{2}{n\sigma^2}\left(A_n^2 + B_n^2\right) = Y + Z,$$
where $Y$ and $Z$ are independent with common distribution $\chi^2(1)$. Thus, $T_n \sim \chi^2(2)$.
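The trigonometric identities used above are exact for any integer frequency $k$ with $1 \le k < n/2$; a quick check (not part of the original solution):

```python
import numpy as np

n, k = 12, 3                       # any 1 <= k < n/2 works here
t = np.arange(1, n + 1)
c = np.cos(2 * np.pi * k * t / n)
s = np.sin(2 * np.pi * k * t / n)
assert abs(np.sum(np.exp(4j * np.pi * k * t / n))) < 1e-10  # geometric sum is 0
assert abs(np.sum(c * s)) < 1e-10                           # Cov(A_n, B_n) term
assert abs(np.sum(c ** 2) - n / 2) < 1e-10                  # Var(A_n) = n*sigma^2/2
assert abs(np.sum(s ** 2) - n / 2) < 1e-10                  # Var(B_n) = n*sigma^2/2
```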
14.5. First, for every $x \in \mathbb{R}$, there exists $j = j_n(x)$ such that $x \in [j/k_n, (j+1)/k_n)$; then
$$E\left(Y_n(x)\right) = \frac{k_n}{n}\,nP\left(\frac{j}{k_n} \le \xi_1 < \frac{j+1}{k_n}\right) = k_n \int_{j/k_n}^{(j+1)/k_n} g(u)\,du = g(u_n),$$
and
$$\mathrm{Var}\left(Y_n(x)\right) \le \frac{k_n^2}{n^2}\,nP\left(\frac{j}{k_n} \le \xi_1 < \frac{j+1}{k_n}\right) \le \frac{k_n^2}{n}\int_{j/k_n}^{(j+1)/k_n} g(u)\,du = \frac{k_n}{n}\,g(u_n).$$
As $n \to \infty$, $k_n/n \to 0$ and $g(u_n) \to g(x)$. Thus $\mathrm{Var}(Y_n(x)) \to 0$. Finally
$$\lim_{n\to\infty} n\,\mathrm{Var}(\bar{X}_n) = \sum_{t=-\infty}^{\infty} \gamma_t = 1 + \theta^2 + 2\theta.$$
Thus
$$\mathrm{Var}_\theta\left(Z_n(a_n)\right) \le \mathrm{Var}_\theta(\bar{X}_n), \quad |\theta| < 1,$$
and the inequality is strict except if $\theta = 0$. However, $Z_n(a_n)$ and $\bar{X}_n$ have the same asymptotic variance.
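For the MA(1) case the limit above can be checked exactly, since $\mathrm{Var}(\bar{X}_n)$ has a closed form in the autocovariances (a sketch with an arbitrary $\theta$, not part of the original solution):

```python
theta, sigma2 = 0.4, 1.0
g0 = sigma2 * (1.0 + theta ** 2)   # gamma_0 for MA(1)
g1 = sigma2 * theta                # gamma_1; gamma_l = 0 for |l| > 1

def n_var_mean(n: int) -> float:
    # n * Var(X_bar_n) = sum_{|l| < n} (1 - |l|/n) * gamma_l
    return g0 + 2.0 * (1.0 - 1.0 / n) * g1

target = 1.0 + theta ** 2 + 2.0 * theta    # = (1 + theta)^2
assert abs(n_var_mean(10_000) - target) < 1e-3
```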
342 Partial Solutions to Selected Exercises
Lesson 15.
15.4. (i) From Theorem 9.9, it follows that
$$\mathrm{Cov}\left(\frac{1}{h_T}K\left(\frac{x - X_0}{h_T}\right),\ \frac{1}{h_T}K\left(\frac{x - X_u}{h_T}\right)\right)$$
$$\left|\int_0^{\infty} \phi_u(y,z)\,du - \int_0^{T}\left(1 - \frac{u}{T}\right)\phi_u(y,z)\,du\right| \le \int_T^{\infty} \|\phi_u\|_\infty\,du + \int_0^{T} \frac{u}{T}\,\|\phi_u\|_\infty\,du, \quad y, z \in \mathbb{R}.$$
$$\sup_{(y,z)}\left|\int_0^{\infty} \phi_u(y,z)\,du - \int_0^{T}\left(1 - \frac{u}{T}\right)\phi_u(y,z)\,du\right| \to 0, \quad \text{as } T \to \infty.$$
Thus,
$$\cdots \times \int_0^{\infty} \phi_u(y,z)\,du\,dy\,dz + o(1).$$
$$E\left(\hat{g}_T(x)\right) = E\left[\frac{1}{h_T}K\left(\frac{x - X_0}{h_T}\right)\right] = \int_{-\infty}^{\infty} \frac{1}{h_T}K\left(\frac{x - u}{h_T}\right)g(u)\,du.$$
As $\int_{-\infty}^{\infty} K(v)\,dv = 1$, we obtain
$$E\left(\hat{g}_T(x)\right) - g(x) = \int_{-\infty}^{\infty} K(v)\left[g(x - vh_T) - g(x)\right]dv,$$
and by the Taylor formula
where $0 < \theta < 1$. Noting that $\int_{-\infty}^{\infty} vK(v)\,dv = 0$ and by using again the dominated convergence theorem, we obtain
$$E\left(\hat{g}_T(x)\right) - g(x) \sim \frac{h_T^2}{2}\,g''(x)\int_{-\infty}^{\infty} v^2 K(v)\,dv.$$
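With a Gaussian kernel and a Gaussian density this bias expansion can be checked in closed form, since the smoothed density is again Gaussian (a sketch, not part of the original solution; $x$ and $h$ are arbitrary):

```python
import math

def phi(x: float, var: float = 1.0) -> float:
    # centered normal density with variance var
    return math.exp(-x * x / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

x, h = 0.7, 0.05
# For Gaussian K and g = N(0,1) density, E g_hat(x) = (K_h * g)(x) = N(0, 1+h^2) density
bias = phi(x, 1.0 + h * h) - phi(x)
g2 = (x * x - 1.0) * phi(x)        # g''(x) for the N(0,1) density
approx = 0.5 * h * h * g2          # (h^2/2) g''(x), since int v^2 K(v) dv = 1 here
assert abs(bias - approx) < 1e-5
```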
15.5. (i) $Y_j = \left(X_{t_j} - X_{t_{j-1}}\right)^2 / \left[\sigma^2(t_j - t_{j-1})\right] \sim \chi^2(1)$, then $E(Y_j) = 1$, $\mathrm{Var}(Y_j) = 2$. Therefore $E(\hat\sigma_n^2) = \sigma^2$, and by independence,
$$\mathrm{Var}(\hat\sigma_n^2) = \frac{2\sigma^4}{T^2}\sum_{j=1}^{n}(t_j - t_{j-1})^2.$$
as $n \to \infty$. Conversely, if
$$E\left(\hat\sigma_n^2 - \sigma^2\right)^2 = \frac{2\sigma^4}{T^2}\sum_{j=1}^{n}(t_j - t_{j-1})^2 \le \frac{2\sigma^4}{T^2}\,\delta_n \to 0, \tag{1}$$
that is
$$\frac{T^2}{n^2} \le \frac{1}{n}\sum_{j=1}^{n}(t_j - t_{j-1})^2.$$
Thus,
$$\frac{2\sigma^4}{n} \le \frac{2\sigma^4}{T^2}\sum_{j=1}^{n}(t_j - t_{j-1})^2,$$
$$\mathrm{Var}(\hat{m}_\varepsilon) = \frac{1}{(1-\varepsilon)^2}\iint_{[\varepsilon,1]^2} \mathrm{Cov}\left(\frac{X_s}{s}, \frac{X_t}{t}\right)ds\,dt = \frac{2}{(1-\varepsilon)^2}\iint_{\varepsilon \le s \le t \le 1} \cdots\,ds\,dt = O(\varepsilon) \to 0, \quad \text{as } \varepsilon \to 0.$$
Index
$L^2$-consistency, 259
$L^2$-continuous, 198
$L^2$-differentiable, 198
$L^2$-integrable, 199
$L^p$-space, 306
$\sigma$-additivity, 5
$\sigma$-field, 299
$n$-step transition probability, 51

absolutely continuous, 303
absolutely continuous distribution functions, 13
absorbing state, 108
absorption probability, 62
adapted-$B_n$, 219
Akaike's criterion, 281
aliasing, 289
almost everywhere, 300
almost sure consistency, 259
almost surely, 23
alternating renewal process, 149
alternative hypothesis, 260
ARIMA process, 214
ARMA model, 207
ARMAX process, 215
asymptotic statistical model, 259
autocorrelation, 190
autocovariance, 189, 209, 212
autoregression estimation, 279
autoregressive / moving average process, 213
autoregressive process, 207

backward shift operator, 206
Banach space, 305
Bernoulli process, 39
Bernoulli random walk, 120
Bessel function, 174
best linear predictor, 194
bias, 276
bilateral Wiener process, 249
birth and death process, 99
birth process, 97
birth rate, 97, 99
Black-Scholes process, 247
Borel-Cantelli Lemma, 7
branching process, 70
Brownian bridge, 253
Brownian motion process, 235
bus paradox, 90
busy period, 179

canonical representation, 37
Cauchy sequence, 305
central limit theorem, 26, 230
Chapman-Kolmogorov equation, 51
characteristic function, 26
closed linear subspace, 309
communication, 52
complete, 37, 38
complete normed space, 305
complete orthogonal system, 308
compound Poisson process, 91
conditional distributions, 15
conditional expectation, 22, 309