Contents

I. PROBABILITY
A.BENHARI
1. Inequalities
2. Convergences of Sequences of Random Variables
3. The Weak Laws of Large Numbers
4. The Strong Laws of Large Numbers
5. The Central Limit Theorems
Conditioning. Conditioned distribution and expectation
1. The conditioned probability and expectation
2. Properties of the conditioned expectation
3. Regular conditioned distribution of a random variable
Transition Probabilities
1. Definitions and notations
2. The product between a probability and a transition probability
3. Contractivity properties of a transition probability
4. The product between transition probabilities
5. Invariant measures. Convergence to a stable matrix
Disintegration of the probabilities on product spaces
II.
III.
1. Maximal inequalities
2. Almost sure convergence of semimartingales
3. Uniform integrability and the convergence of semimartingales in L^1
4. Singular martingales. Exponential martingales.
Bibliography
I. PROBABILITY
Preliminary remark: for a family (A_i)_{i∈I} of subsets of Ω, the indicator functions satisfy

    1_{⋂_{i∈I} A_i} = inf_{i∈I} 1_{A_i},   1_{⋃_{i∈I} A_i} = sup_{i∈I} 1_{A_i}

Definition: Let Ω be a set and 𝒦 a σ-algebra of subsets of Ω. A mapping P : 𝒦 → [0,1] is called a probability if
(1) P(A) ≥ 0 for all A ∈ 𝒦;
(2) P(⋃_{i=1}^∞ A_i) = Σ_{i=1}^∞ P(A_i) for all A_1, A_2, …, A_n, … ∈ 𝒦 such that A_i ∩ A_j = ∅ when i ≠ j;
(3) P(Ω) = 1 (which implies that P(∅) = 0).
The triplet (Ω, 𝒦, P) is called a probability space.
Remark 1: Usually, Ω is called the sample space, 𝒦 the field of random events, and every A ∈ 𝒦 a random event.

Remark 2: If Ω = {ω_1, ω_2, …, ω_N} and p(ω_i) = 1/N, where i = 1, 2, …, N, then the resulting space (Ω, 𝒦, P) is called a classical probability space.

Example: Let Ω = {ω_1, ω_2}.
(1) If p(ω_1) = 1/3, p(ω_2) = 2/3, then (Ω, 𝒦, P) is a discrete probability space.
(2) If p(ω_1) = p(ω_2) = 1/2, then (Ω, 𝒦, P) is a classical probability space.
Example: Let Ω = {ω_1, ω_2, …, ω_n, …}, 𝒦 = 2^Ω and

    p(ω_n) = 6/(nπ)² = (1/n²) / Σ_{k=1}^∞ (1/k²),   n = 1, 2, …

Since Σ_{k=1}^∞ 1/k² = π²/6, the p(ω_n) sum to 1 and (Ω, 𝒦, P) is a discrete probability space.
Theorem (Inclusion–Exclusion, Poincaré): For any events A_1, …, A_n,

    P(⋃_{i=1}^n A_i) = Σ_{k=1}^n (−1)^{k−1} Σ_{1≤i_1<⋯<i_k≤n} P(A_{i_1} ⋯ A_{i_k})

Hint: Proceed by induction on n. For the step from n to n+1,

    P(⋃_{i=1}^{n+1} A_i) = P(⋃_{i=1}^n A_i) + P(A_{n+1}) − P(⋃_{i=1}^n (A_i A_{n+1}))

Applying the induction hypothesis to the first and the third term,

    = Σ_{k=1}^n (−1)^{k−1} Σ_{1≤i_1<⋯<i_k≤n} P(A_{i_1} ⋯ A_{i_k}) + P(A_{n+1}) − Σ_{k=1}^n (−1)^{k−1} Σ_{1≤i_1<⋯<i_k≤n} P(A_{i_1} ⋯ A_{i_k} A_{n+1})

Collecting the terms according to the size of the index set,

    = Σ_{1≤i_1≤n+1} P(A_{i_1}) + Σ_{k=2}^n (−1)^{k−1} [ Σ_{1≤i_1<⋯<i_k≤n} P(A_{i_1} ⋯ A_{i_k}) + Σ_{1≤i_1<⋯<i_{k−1}≤n} P(A_{i_1} ⋯ A_{i_{k−1}} A_{n+1}) ] + (−1)^n P(A_1 ⋯ A_n A_{n+1})

    = Σ_{k=1}^{n+1} (−1)^{k−1} Σ_{1≤i_1<⋯<i_k≤n+1} P(A_{i_1} ⋯ A_{i_k}) #
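The alternating sum lends itself to a direct numerical check. The sketch below (Python; the three divisibility events on a uniform 100-point sample space are a hypothetical example, not from the text) computes P(A ∪ B ∪ C) both directly and by inclusion–exclusion:

```python
from itertools import combinations

# uniform probability space on 100 points; three hypothetical events
universe = range(100)
A = {w for w in universe if w % 2 == 0}
B = {w for w in universe if w % 3 == 0}
C = {w for w in universe if w % 5 == 0}
events = [A, B, C]

def p(event):
    """P under the uniform measure on the 100-point space."""
    return len(event) / 100

direct = p(A | B | C)

# inclusion-exclusion: alternating sum over all non-empty index subsets
ie = 0.0
for k in range(1, len(events) + 1):
    for subset in combinations(events, k):
        ie += (-1) ** (k - 1) * p(set.intersection(*subset))

assert abs(direct - ie) < 1e-9   # both equal 0.74
```

The direct count and the signed sum agree exactly, as the theorem guarantees.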
Conditional probability

Definition: For two events A and B with P(A) > 0,

    P(B|A) = P(AB)/P(A)

is called the conditional probability of B given A.

Theorem: Let (Ω, 𝒦, P) be a probability space and A ∈ 𝒦 with P(A) > 0; then the triplet (A, 𝒦_A, P_A), where 𝒦_A = {AB : B ∈ 𝒦} and P_A(AB) = P(B|A), is a probability space.

Proof: The only non-trivial point is countable additivity, which follows from the

Remark: B ⊃ A ⟹ A ∩ B = A, and A ∩ ⋃_k E_k = ⋃_k (A ∩ E_k). #
Theorem (Bayes): Let E_1, E_2, … be a partition of Ω with P(E_k) > 0 and A an event with P(A) > 0; then

    P(E_i|A) = P(E_i)P(A|E_i) / Σ_k P(E_k)P(A|E_k)

Proof:

    P(E_i|A) = P(AE_i)/P(A) = P(E_i)P(A|E_i) / Σ_k P(E_k)P(A|E_k) #

Remark 1: Recall that two events A and B are said to be independent if P(AB) = P(A)P(B). In this case,

    P(A|B) = P(AB)/P(B) = P(A)

Remark 2: Recall that two events A and B are said to be incompatible if AB = ∅. In this case, P(AB) = 0.
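The formula amounts to renormalizing prior × likelihood by the total (the denominator). A minimal sketch (Python; the three-hypothesis priors and likelihoods are hypothetical numbers, not from the text):

```python
# hypothetical priors P(E_i) and likelihoods P(A|E_i) for three hypotheses
priors = [0.5, 0.3, 0.2]
likelihoods = [0.1, 0.4, 0.8]

# denominator of Bayes' formula: P(A) = sum_k P(E_k) P(A|E_k)
evidence = sum(p * l for p, l in zip(priors, likelihoods))

# posteriors P(E_i|A) = P(E_i) P(A|E_i) / P(A)
posteriors = [p * l / evidence for p, l in zip(priors, likelihoods)]

assert abs(sum(posteriors) - 1) < 1e-9   # posteriors form a probability vector
```

Note how the hypothesis with the small prior but large likelihood ends up with the largest posterior (16/33 here).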
Appendix: Combinatorics

Sample Selection: Suppose there are m distinguishable elements; how many ways are there to select r elements from these m distinguishable elements?

Order counts?   Repetitions allowed? (with/without replacement)   Number of ways          Remarks
Yes             Yes                                               m^r
Yes             No                                                m!/(m−r)!               Permutation
No              Yes                                               (m+r−1)!/(r!(m−1)!)     Combination with repetition
No              No                                                m!/(r!(m−r)!)           Combination
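Python's standard library exposes three of the four counts directly; a quick sketch with m = 5, r = 3 (hypothetical values chosen for illustration):

```python
import math

m, r = 5, 3

ordered_with_rep = m ** r                      # m^r
ordered_without_rep = math.perm(m, r)          # m!/(m-r)!  (permutation)
unordered_with_rep = math.comb(m + r - 1, r)   # (m+r-1)!/(r!(m-1)!)
unordered_without_rep = math.comb(m, r)        # m!/(r!(m-r)!)  (combination)
```

For m = 5, r = 3 these give 125, 60, 35 and 10 respectively.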
Balls into Cells: There are eight different ways in which n balls can be placed into k cells:

Distinguishable balls?   Distinguishable cells?   Can cells be empty?   Number of ways
Yes                      Yes                      Yes                   k^n
Yes                      Yes                      No                    k! S(n, k)
No                       Yes                      Yes                   (k+n−1)!/(n!(k−1)!)
No                       Yes                      No                    (n−1)!/((k−1)!(n−k)!)
Yes                      No                       Yes                   Σ_{r=1}^k S(n, r)
Yes                      No                       No                    S(n, k)
No                       No                       Yes                   Σ_{r=1}^k p_r(n)
No                       No                       No                    p_k(n)

where

    S(n, k) = (1/k!) Σ_{r=1}^k (−1)^{k−r} C(k, r) r^n

is the Stirling number of the second kind and p_k(n) is the number of partitions of the number n into exactly k integer pieces.
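Both building blocks of the table can be computed directly; a sketch (Python, using the explicit alternating sum for S(n, k) and the standard recurrence for p_k(n)):

```python
import math
from functools import lru_cache

def stirling2(n, k):
    """Stirling number of the second kind, via the explicit alternating sum."""
    total = sum((-1) ** (k - r) * math.comb(k, r) * r ** n for r in range(k + 1))
    return total // math.factorial(k)

@lru_cache(maxsize=None)
def partitions_exact(n, k):
    """Number of partitions of n into exactly k positive integer parts."""
    if k == 0:
        return 1 if n == 0 else 0
    if n < k:
        return 0
    # either some part equals 1 (remove it) or all parts are >= 2 (subtract 1 from each)
    return partitions_exact(n - 1, k - 1) + partitions_exact(n - k, k)

# sanity checks: S(4,2) = 7, and k! S(n,k) counts surjections {1..n} -> {1..k}
assert stirling2(4, 2) == 7
assert math.factorial(2) * stirling2(4, 2) == 2 ** 4 - 2   # 14 surjections
assert partitions_exact(6, 3) == 3                          # 4+1+1, 3+2+1, 2+2+2
```

The surjection identity is exactly the "distinguishable balls, distinguishable cells, none empty" row of the table.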
Remark 2: In applications, a random variable ξ can be used to depict a random experiment and an event of the form {ξ ∈ E} can be used to depict a result of the experiment, i.e., a random event.

Definition: Let (Ω, 𝒦, P) be a probability space and ξ a random variable; then

    F(x) = P{ω ∈ Ω : ξ(ω) < x}

is called the distribution (function) of ξ. It is monotone increasing, with lim_{x→−∞} F(x) = 0 and lim_{x→+∞} F(x) = 1.
Example (Binomial distribution): P{ξ = k} = C_n^k p^k q^{n−k}, where C_n^k = n!/(k!(n−k)!), q = 1 − p and k = 0, 1, …, n.

Remark 2: If {ξ = k} denotes the event that among n independent random experiments exactly k are successful, then P{ξ = k} = C_n^k p^k q^{n−k}.

Theorem (Poisson): If np_n → λ > 0 as n → +∞, then

    lim_{n→+∞} C_n^k p_n^k (1 − p_n)^{n−k} = (λ^k/k!) e^{−λ}

Proof:

    C_n^k p_n^k (1 − p_n)^{n−k} = C_n^k (np_n/n)^k (1 − np_n/n)^{n−k}
    = (1/k!) (1 − 1/n)(1 − 2/n) ⋯ (1 − (k−1)/n) (np_n)^k (1 − np_n/n)^{n−k}
    → (λ^k/k!) e^{−λ}   as n → +∞

since (np_n)^k → λ^k and (1 − np_n/n)^{n−k} → e^{−λ}. #
Example (Geometric distribution): If the variables ξ_1, ξ_2, …, ξ_n, … are statistically independent and identically distributed Bernoulli trials with success probability p, and ξ is the index of the first success, then

    P{ξ = k} = q^{k−1} p,   k = 1, 2, …

Example (Hypergeometric distribution):

    P{ξ = k} = C_M^k C_{N−M}^{n−k} / C_N^n,   where M < N, k ≤ M, n ≤ N and k = 0, 1, …, n
Example (Poisson distribution): P{ξ = k} = (λ^k/k!) e^{−λ}, where λ > 0, k = 0, 1, ….

Definition: A function f(x) ≥ 0 with ∫_{−∞}^{+∞} f(x)dx = 1 is called a probability density function, and

    F(x) = ∫_{−∞}^{x} f(ξ)dξ

is a distribution function, i.e., F(x) is monotone increasing, continuous and lim_{x→−∞} F(x) = 0, lim_{x→+∞} F(x) = 1.

Theorem: Let ξ be a continuous random variable with distribution F(x); then there must be a probability density function f(x) such that F(x) = ∫_{−∞}^{x} f(ξ)dξ.

Remark: For a continuous random variable, the relation between its distribution and its probability density function is as follows:

    F(x) = ∫_{−∞}^{x} f(ξ)dξ   ⟺   F′(x) = f(x)
Example (Uniform distribution):

    f(x) = 1/(b−a) for x ∈ (a, b),   f(x) = 0 otherwise

Example (Normal distribution):

    f(x) = (1/(√(2π)σ)) e^{−(x−μ)²/(2σ²)},   x ∈ (−∞, +∞)

Example (Exponential distribution):

    f(x) = λe^{−λx} for x ≥ 0,   f(x) = 0 for x < 0,   where λ > 0

Its distribution function is

    F(x) = P{ξ < x} = ∫_{−∞}^x f(t)dt = ∫_0^x λe^{−λt}dt = 1 − e^{−λx} for x ≥ 0,   F(x) = 0 for x < 0

Theorem (Memorylessness): The exponential distribution is memoryless: for all x, Δx ≥ 0,

    P{ξ ≥ x + Δx | ξ ≥ x} = P{ξ ≥ Δx}
Proof:
(1) At first, we have

    P{ξ ≥ x + Δx | ξ ≥ x} = P{ξ ≥ x + Δx; ξ ≥ x}/P{ξ ≥ x} = P{ξ ≥ x + Δx}/P{ξ ≥ x} = e^{−λ(x+Δx)}/e^{−λx} = e^{−λΔx} = P{ξ ≥ Δx}

(2) Moreover, expanding the exponential,

    1 − e^{−λΔx} = λΔx + Σ_{k=2}^∞ (−1)^{k−1}(λΔx)^k/k! = λΔx + o(Δx)

where o(Δx) = Σ_{k=2}^∞ (−1)^{k−1}(λΔx)^k/k! satisfies lim_{Δx→0} o(Δx)/Δx = 0. #

Remark: e^x = Σ_{n=0}^∞ x^n/n!.

Conversely, suppose P{ξ < t + Δt | ξ ≥ t} = λΔt + o(Δt).

Proof:
Let p(t) = P{ξ ≥ t}; then p(0) = P{ξ ≥ 0} = 1 and

    p(t + Δt) = P{ξ ≥ t + Δt} = P{ξ ≥ t + Δt; ξ ≥ t} = P{ξ ≥ t + Δt | ξ ≥ t} P{ξ ≥ t}
    = [1 − P{ξ < t + Δt | ξ ≥ t}] p(t) = [1 − λΔt + o(Δt)] p(t)

which leads to

    p′(t) = lim_{Δt→0} [p(t + Δt) − p(t)]/Δt = −λp(t) + p(t) lim_{Δt→0} o(Δt)/Δt = −λp(t)   ⟹   d ln p(t)/dt = −λ

so p(t) = e^{−λt}, i.e., ξ is exponentially distributed. #
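The memoryless identity can be observed on simulated data; a sketch (Python; λ = 2, s = 0.5, t = 0.3 are hypothetical values):

```python
import math
import random

random.seed(0)
lam = 2.0
N = 200_000
samples = [random.expovariate(lam) for _ in range(N)]

s, t = 0.5, 0.3
p_tail = sum(x >= t for x in samples) / N                          # P{xi >= t}
conditioned = [x for x in samples if x >= s]
p_cond = sum(x >= s + t for x in conditioned) / len(conditioned)   # P{xi >= s+t | xi >= s}

assert abs(p_tail - math.exp(-lam * t)) < 0.01   # exact tail e^{-lambda t}
assert abs(p_cond - p_tail) < 0.02               # memorylessness
```

The conditional tail probability matches the unconditional one up to simulation noise, exactly as the theorem states.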
Example (Speaking Time): Suppose the probability of a telephone being used at time t and released during the coming period (t, t+Δt] is λΔt + o(Δt); what is the distribution of the time T during which the telephone is being used, i.e., the speaking time of a telephone user? By the preceding argument, T is exponentially distributed with parameter λ.
Example: Suppose there are n persons speaking at time t; what is the probability of the event that 2 or more persons finish speaking in the coming time period (t, t+Δt]?

Solution:
Let ξ_i be a random variable such that {ξ_i = 1} represents the event that the i-th person finishes speaking in the time period (t, t+Δt]; then

    P{ξ_i = 1} = λΔt + o(Δt),   P{ξ_i = 0} = 1 − λΔt + o(Δt)

where i = 1, 2, …, n. Thus, for the random variable Σ_{i=1}^n ξ_i,

    lim_{Δt→0⁺} P{Σ_{i=1}^n ξ_i ≥ 2}/Δt = lim_{Δt→0⁺} [1 − P{Σ_{i=1}^n ξ_i = 0} − P{Σ_{i=1}^n ξ_i = 1}]/Δt
    = lim_{Δt→0⁺} [1 − (1 − λΔt + o(Δt))^n − n(λΔt + o(Δt))(1 − λΔt + o(Δt))^{n−1}]/Δt = 0

i.e., the probability that 2 or more persons finish speaking in (t, t+Δt] is o(Δt).

Example (Gamma distribution):

    f(x) = (λ^α/Γ(α)) x^{α−1} e^{−λx} for x > 0,   f(x) = 0 for x ≤ 0

where Γ(α) = ∫_0^{+∞} t^{α−1} e^{−t}dt and α, λ > 0.
Functions of a random variable. Let η = g(ξ) with g strictly monotone and differentiable. Then

    F_η(y) = P{g(ξ) < y} = ∫_{g(x)<y} f_ξ(x)dx

and differentiating,

    f_η(y) = dF_η(y)/dy = f_ξ(g⁻¹(y)) |dg⁻¹(y)/dy|

(the same formula covers both the increasing and the decreasing case, the absolute value absorbing the sign).

Remark 1: In short, f_{g(ξ)}(y) = f_ξ(g⁻¹(y)) |dg⁻¹(y)/dy|.

Remark 2: Recall the differentiation rule for an integral whose limits depend on the parameter:

    d/dx ∫_{f(x)}^{g(x)} h(x,t)dt = g′(x)h(x, g(x)) − f′(x)h(x, f(x)) + ∫_{f(x)}^{g(x)} ∂h(x,t)/∂x dt
Example (η = aξ + b, a > 0):

    F_η(y) = P{η = aξ + b < y} = P{ξ < (y−b)/a} = F_ξ((y−b)/a)

    f_η(y) = dF_η(y)/dy = (1/a) f_ξ((y−b)/a)

Remark: For any a ≠ 0, f_η(y) = (1/|a|) f_ξ((y−b)/a).

Example (η = ξ²): For y > 0,

    F_η(y) = P{η = ξ² < y} = P{−√y < ξ < √y} = F_ξ(√y) − F_ξ(−√y)

    f_η(y) = dF_η(y)/dy = [f_ξ(√y) + f_ξ(−√y)]/(2√y)

and F_η(y) = 0, f_η(y) = 0 for y ≤ 0.
Example (η = e^ξ): For y > 0,

    F_η(y) = P{e^ξ < y} = P{ξ < ln y} = F_ξ(ln y),   f_η(y) = dF_η(y)/dy = (1/y) f_ξ(ln y)

and f_η(y) = 0 for y ≤ 0.

Example (η = ln ξ, ξ > 0):

    F_η(y) = P{η = ln ξ < y} = P{ξ < e^y} = F_ξ(e^y),   f_η(y) = dF_η(y)/dy = f_ξ(e^y) e^y

Example (η = sin ξ): For −1 < y ≤ 1, {sin x < y} = ⋃_k (−π − sin⁻¹y + 2kπ, sin⁻¹y + 2kπ), so

    F_η(y) = P{η = sin ξ < y} = ∫_{⋃_k (−π − sin⁻¹y + 2kπ, sin⁻¹y + 2kπ)} f_ξ(x)dx

with F_η(y) = 0 for y ≤ −1 and F_η(y) = 1 for y > 1.
Random vectors. Let (ξ, η) be a random vector and F(x, y) = P{ξ < x; η < y} its joint distribution. In the discrete case,

    F(x₁, x₂, …, x_n) = Σ_{k₁<x₁, …, k_n<x_n} P{ξ₁ = k₁; ξ₂ = k₂; …; ξ_n = k_n}

In the continuous case the joint density f satisfies

    ∫⋯∫ f(x₁, x₂, …, x_n) dx₁dx₂⋯dx_n = 1

and

    F(x₁, x₂, …, x_n) = ∫_{−∞}^{x₁} ∫_{−∞}^{x₂} ⋯ ∫_{−∞}^{x_n} f(ξ₁, ξ₂, …, ξ_n) dξ₁dξ₂⋯dξ_n

Marginal distributions are obtained by letting the remaining arguments tend to +∞:

    F(x₁, x₂, …, x_p) = F(x₁, x₂, …, x_p, x_{p+1} = +∞, …, x_n = +∞)

    f(ξ₁, ξ₂, …, ξ_p) = ∫⋯∫ f(ξ₁, …, ξ_p, ξ_{p+1}, …, ξ_n) dξ_{p+1}⋯dξ_n

Conditional distributions: in the discrete case we prefer the conditional probability to the conditional distribution:

    P{ξ₁ = k₁; …; ξ_p = k_p | ξ_{p+1} = k_{p+1}; …; ξ_n = k_n} = P{ξ₁ = k₁; …; ξ_n = k_n} / P{ξ_{p+1} = k_{p+1}; …; ξ_n = k_n}

In the continuous case, the conditional density is

    f(ξ₁, …, ξ_p | x_{p+1}, …, x_n) = f(ξ₁, …, ξ_p, x_{p+1}, …, x_n) / f_{ξ_{p+1}⋯ξ_n}(x_{p+1}, …, x_n)
Sum of random variables. Let (ξ, η) have joint density f(x, y) and ζ = ξ + η; then, substituting x + y = u,

    F_ζ(z) = P{ξ + η < z} = ∫∫_{x+y<z} f(x, y)dxdy = ∫_{−∞}^{+∞} [∫_{−∞}^{z−y} f(x, y)dx] dy
    = ∫_{−∞}^{+∞} [∫_{−∞}^{z} f(u−y, y)du] dy = ∫_{−∞}^{z} [∫_{−∞}^{+∞} f(u−y, y)dy] du = ∫_{−∞}^{z} f_ζ(u)du

where

    f_ζ(z) = dF_ζ(z)/dz = ∫_{−∞}^{+∞} f(z−y, y)dy

If ξ and η are independent, then

    f_ζ(z) = ∫_{−∞}^{+∞} f_ξ(z−y) f_η(y)dy = (f_ξ * f_η)(z)
Example (Erlang distribution): Let T₁, T₂, … be independent, exponentially distributed with parameter λ, and S_n = T₁ + ⋯ + T_n; then

    f_{S_n}(x) = λ^n x^{n−1} e^{−λx}/(n−1)! for x ≥ 0,   f_{S_n}(x) = 0 for x < 0,   where n ≥ 1

Solution: By induction, using the convolution formula,

    f_{S_{n+1}}(x) = ∫ f_{S_n}(t) f_{T_{n+1}}(x−t)dt = ∫_0^x [λ^n t^{n−1} e^{−λt}/(n−1)!] λe^{−λ(x−t)}dt
    = [λ^{n+1} e^{−λx}/(n−1)!] ∫_0^x t^{n−1}dt = λ^{n+1} x^n e^{−λx}/n!

Remark:

    lim_{Δx→0⁺} P{S_n < Δx}/Δx = lim_{Δx→0⁺} f_{S_n}(Δx) = lim_{Δx→0⁺} λ^n (Δx)^{n−1} e^{−λΔx}/(n−1)! = λ if n = 1,   0 if n ≥ 2

This remark shows that the probability of 2 or more telephones being called by a person during a period is a higher-order infinitesimal of the period.
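The induction step can be checked by evaluating the convolution integral numerically; a sketch (Python; λ = 1.5 and z = 2 are hypothetical values):

```python
import math

lam = 1.5

def f_exp(x):
    """Exponential density with parameter lam."""
    return lam * math.exp(-lam * x) if x >= 0 else 0.0

def f_erlang(n, x):
    """Erlang density lam^n x^{n-1} e^{-lam x} / (n-1)!."""
    return lam ** n * x ** (n - 1) * math.exp(-lam * x) / math.factorial(n - 1) if x >= 0 else 0.0

def convolve_at(f, g, z, steps=20_000):
    """Trapezoid-rule evaluation of (f*g)(z) = integral_0^z f(t) g(z-t) dt."""
    h = z / steps
    total = 0.5 * (f(0.0) * g(z) + f(z) * g(0.0))
    for i in range(1, steps):
        t = i * h
        total += f(t) * g(z - t)
    return total * h

z = 2.0
approx = convolve_at(f_exp, f_exp, z)                       # density of S_2
approx3 = convolve_at(lambda x: f_erlang(2, x), f_exp, z)   # density of S_3
assert abs(approx - f_erlang(2, z)) < 1e-6
assert abs(approx3 - f_erlang(3, z)) < 1e-6
```

Each convolution of the exponential with the n-th Erlang density lands exactly on the (n+1)-th, as the induction predicts.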
Difference of random variables. Let ζ = ξ − η; then, substituting x − y = u,

    F_ζ(z) = P{ξ − η < z} = ∫∫_{x−y<z} f(x, y)dxdy = ∫_{−∞}^{+∞} [∫_{−∞}^{z+y} f(x, y)dx] dy
    = ∫_{−∞}^{+∞} [∫_{−∞}^{z} f(u+y, y)du] dy = ∫_{−∞}^{z} [∫_{−∞}^{+∞} f(u+y, y)dy] du = ∫_{−∞}^{z} f_ζ(u)du

where

    f_ζ(z) = dF_ζ(z)/dz = ∫_{−∞}^{+∞} f(z+y, y)dy

If ξ and η are independent, then f_ζ(z) = ∫ f_ξ(z+y) f_η(y)dy.
x <z
0
zy
0
z
y
0
+
yf
(
uy
,
y
)
dy
0
yf (uy, y )dy du
z
z
+
where f (z ) =
dF (z )
dz
y f (zy, y )dy .
zy
0 +
xy < z
0
z
0
z 1 u
1 u
= f , y du dy + f , y du dy
xy = u
y y
y y
0
z
0
+ 1 u
1 u
f
,
y
dy
0 y y y f y , y dydu
z
z
+ 1
u
= f , y dy du = f (u )du
y
y
where f (z ) =
A.BENHARI
dF (z )
dz
yf
z
, y dy .
y
-28-
Example: Suppose ξ and η are independent random variables with the same exponential distribution, i.e.,

    f(x, y) = f_ξ(x) f_η(y) = λ² e^{−λ(x+y)} for x > 0, y > 0,   f(x, y) = 0 otherwise

Consider α = ξ + η and β = ξ/η. With the substitution p = x + y, q = x/y, i.e., x = pq/(1+q), y = p/(1+q), the Jacobian is

    J = ∂(x, y)/∂(p, q) = | q/(1+q)    p/(1+q)²  |
                          | 1/(1+q)   −p/(1+q)²  |

so |J| = p/(1+q)² and dxdy = [p/(1+q)²] dpdq. Hence for u > 0, v > 0,

    F_{αβ}(u, v) = P{α = ξ + η < u; β = ξ/η < v} = ∫∫_{0<p<u, 0<q<v} f(pq/(1+q), p/(1+q)) [p/(1+q)²] dpdq

and differentiating,

    f_{αβ}(u, v) = f(uv/(1+v), u/(1+v)) u/(1+v)² = λ² u e^{−λu}/(1+v)² for u > 0, v > 0,   and 0 otherwise

Remark 1: Since f_{αβ}(u, v) = [λ² u e^{−λu}] · [1/(1+v)²] factorizes, the sum ξ + η and the ratio ξ/η are independent: ξ + η has the Erlang density λ²u e^{−λu} and ξ/η the density 1/(1+v)².
f ( , , L , )
n
2 = 2 1 2
one toone correspendence
M
n f n (1 , 2 , L , n )
1 g 1 (1 , 2 , L , n )
g ( , , L , )
n
2 = 2 1 2
M
n g n (1 , 2 , L , n )
then
F12 Ln (y1 , y 2 , L , y n ) = P{1 < y1 ; 2 < y 2 ;L; n < y n }
1 2 L n
f1 ( x 1 , x 2 ,L, x n )< y1
f 2 ( x 1 , x 2 ,L, x n )< y 2
(x 1 , x 2 ,L , x n )dx 1dx 2 L dx n
M
f n ( x 1 , x 2 ,L, x n )< y n
u1 = f1 ( x1 , x 2 ,L, x n )
u 2 = f 2 ( x1 , x 2 ,L, x n )
M
u n = f n ( x1 , x 2 ,L, x n )
u 1 < y1
u 2 <y2
1 2 L n
M
un <yn
which leads to
A.BENHARI
g 1
u 2
g 2
u 2
M
g n
u 2
g 1
u n
g 2
L
u n is Jacobian matrix.
O
M
g n
L
u n
L
-30-
1.1. Definition

Discrete case:

    E[g(ξ)] = Σ_k g(x_k) P{ξ = x_k},   provided Σ_k |g(x_k)| P{ξ = x_k} < +∞

Remark 1: For a random vector,

    E[g(ξ, η)] = Σ_{i,j} g(x_i, y_j) P{ξ = x_i; η = y_j}

Continuous case:

    E[g(ξ)] = ∫_{−∞}^{+∞} g(x) f(x)dx,   provided ∫_{−∞}^{+∞} |g(x)| f(x)dx < +∞

Remark 1: For a random vector,

    E[g(ξ, η)] = ∫_{−∞}^{+∞} ∫_{−∞}^{+∞} g(x, y) f(x, y)dxdy
1.2. Properties

Theorem: The expectation E[·] is a linear operator, i.e.,

    E[a f(ξ) + b g(η)] = a E[f(ξ)] + b E[g(η)]

where f(x) and g(x) are two functions, ξ and η two random variables, and a and b two numbers.

Remark: E[(ξ − Eξ)²] = 0 ⟺ P{ξ = Eξ} = 1.
1.3. Moments

Definition: Let ξ be a random variable; then E[ξ^k] is called the k-th moment of ξ, E[|ξ|^k] the k-th absolute moment, and E[(ξ − Eξ)^k] the k-th central moment. In particular, Eξ is the expectation and Dξ = E[(ξ − Eξ)²] the variance of ξ.

Remark: For the standardized variable ξ* = (ξ − Eξ)/√(Dξ) we have E[ξ*] = 0, D[ξ*] = 1.

Hint: E[(ξ − Eξ)²] = E[ξ² − 2ξEξ + (Eξ)²] = E[ξ²] − (Eξ)².

Theorem: If ξ₁, …, ξ_n are independent, then

    D[Σ_{i=1}^n a_i ξ_i] = E[(Σ_{i=1}^n a_i ξ_i − Σ_{i=1}^n a_i Eξ_i)²] = E[(Σ_{i=1}^n a_i(ξ_i − Eξ_i))²]
    = Σ_{i=1}^n Σ_{j=1}^n a_i a_j E[(ξ_i − Eξ_i)(ξ_j − Eξ_j)] = Σ_{i=1}^n a_i² E[(ξ_i − Eξ_i)²] = Σ_{i=1}^n a_i² Dξ_i
Examples:

Bernoulli distribution: P{ξ = 1} = p, P{ξ = 0} = 1 − p; then

    Eξ = p,   Dξ = E(ξ − Eξ)² = p(1 − p)

Binomial distribution: P{ξ = k} = C_n^k p^k q^{n−k}, k = 0, 1, …, n; then

    Eξ = np,   Dξ = E(ξ − Eξ)² = npq

Poisson distribution: P{ξ = k} = (λ^k/k!) e^{−λ}, k = 0, 1, 2, …; then

    Eξ = λ,   Dξ = E(ξ − Eξ)² = E[ξ²] − (Eξ)² = λ

Uniform distribution: f(x) = 1/(b−a) for x ∈ (a, b), 0 otherwise; then

    Eξ = (a+b)/2,   Dξ = E(ξ − Eξ)² = (b−a)²/12

Exponential distribution: f(x) = λe^{−λx} for x > 0, 0 otherwise; then

    Eξ = 1/λ,   Dξ = E(ξ − Eξ)² = 1/λ²

Normal distribution: f(x) = (1/(√(2π)σ)) e^{−(x−μ)²/(2σ²)}, x ∈ (−∞, +∞); then

    Eξ = μ,   Dξ = σ²
Theorem (Hölder's inequality):

    E[|ξη|] ≤ (E[|ξ|^p])^{1/p} (E[|η|^q])^{1/q}

where p > 1 and 1/p + 1/q = 1.

Proof:
(1) We first prove that u^α v^β ≤ αu + βv, where u ≥ 0, v ≥ 0, 0 < α < 1 and α + β = 1.
Let's begin with the function y = x^α, where 0 < α < 1. Since y″ = α(α−1)x^{α−2} < 0 for all x > 0, the function y = x^α is concave over the range (0, +∞), so its graph lies below the tangent at x = 1:

    x^α ≤ 1 + α(x − 1) = αx + β

Setting x = u/v, where v > 0 and u ≥ 0, we then have

    v^β u^α ≤ αu + βv

Again, the above inequality can be applied to the case of v = 0.
(2) Let

    u = |ξ|^p / E[|ξ|^p],   v = |η|^q / E[|η|^q],   α = 1/p,   β = 1/q   (so that 1/p + 1/q = 1)

Then

    |ξη| / [(E[|ξ|^p])^{1/p} (E[|η|^q])^{1/q}] ≤ (1/p) |ξ|^p / E[|ξ|^p] + (1/q) |η|^q / E[|η|^q]

Applying the mathematical expectation to both sides of the above inequality gives

    E[|ξη|] / [(E[|ξ|^p])^{1/p} (E[|η|^q])^{1/q}] ≤ 1/p + 1/q = 1   ⟹   E[|ξη|] ≤ (E[|ξ|^p])^{1/p} (E[|η|^q])^{1/q} #
Theorem (Cauchy–Schwarz inequality): For all x,

    0 ≤ E[(η + xξ)²] = x² E[ξ²] + 2x E[ξη] + E[η²]   ⟹   (E[ξη])² ≤ E[ξ²] E[η²]

Definition: The correlation coefficient of two second-order random variables ξ and η is

    ρ = E[(ξ − Eξ)(η − Eη)] / √(Dξ Dη)

Remark 1: By the Cauchy–Schwarz inequality,

    |ρ| = |E[(ξ − Eξ)(η − Eη)]| / √(Dξ Dη) ≤ √(E[(ξ − Eξ)²] E[(η − Eη)²]) / √(Dξ Dη) = 1

Remark 2: Note the differences between the concepts of incompatibility (sets), statistical independence (probability) and uncorrelation (mathematical expectation).
Theorem (Linear Correlation): Let ξ and η be two second-order random variables and ρ the correlation coefficient of ξ and η; then

    |ρ| = 1 ⟺ η = aξ + b

where a and b are two numbers.

Proof:
(1) If η = aξ + b, then

    ρ = E[(ξ − Eξ)(η − Eη)] / √(E[(ξ − Eξ)²] E[(η − Eη)²]) = E[(ξ − Eξ)(aξ + b − aEξ − b)] / √(E[(ξ − Eξ)²] E[(aξ + b − aEξ − b)²])
    = a E[(ξ − Eξ)²] / (|a| E[(ξ − Eξ)²]) = ±1

(2) If ρ = 1, then

    E[(η − Eη)/√(Dη) − (ξ − Eξ)/√(Dξ)]² = E[(η − Eη)²]/Dη + E[(ξ − Eξ)²]/Dξ − 2E[(ξ − Eξ)(η − Eη)]/√(Dξ Dη) = 1 + 1 − 2ρ = 0

    ⟹ P{(η − Eη)/√(Dη) = (ξ − Eξ)/√(Dξ)} = 1 ⟹ P{η = aξ + b} = 1

where a = √(Dη/Dξ), b = Eη − √(Dη/Dξ) Eξ.

(3) If ρ = −1, then

    E[(η − Eη)/√(Dη) + (ξ − Eξ)/√(Dξ)]² = 1 + 1 + 2ρ = 0   ⟹   P{η = aξ + b} = 1

where a = −√(Dη/Dξ), b = Eη + √(Dη/Dξ) Eξ. #
Example (Linear Regression): Let ξ and η be two second-order random variables and consider the error

    e(a, b) = E[(η − aξ − b)²]

How to choose a and b to make the error e(a, b) as small as possible? By taking partial derivatives of e(a, b) with respect to a and b, one can have

    ∂e(a, b)/∂a = −2E[(η − aξ − b)ξ] = 0   ⟹   a E[ξ²] + b μ₁ = E[ξη]
    ∂e(a, b)/∂b = −2E[η − aξ − b] = 0   ⟹   a μ₁ + b = μ₂

where μ₁ = Eξ, μ₂ = Eη. Solving these equations, with σ₁² = Dξ and σ₂² = Dη,

    a = ρ σ₂/σ₁,   b = μ₂ − a μ₁

The best linear predictor is thus

    L(ξ) = ρ (σ₂/σ₁)(ξ − μ₁) + μ₂

and the minimal error is e_min = E[(η − L(ξ))²] = σ₂²(1 − ρ²). If |ρ| = 1, then E[(η − L(ξ))²] = 0, i.e., η = L(ξ). #
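The normal-equation solution a = cov(ξ, η)/Dξ, b = Eη − a·Eξ can be recovered from sample moments; a sketch (Python; the toy model η = 2ξ + 1 + noise is a hypothetical example for illustration):

```python
import random
import statistics

random.seed(3)
N = 100_000
# hypothetical linear model with Gaussian noise
xi = [random.gauss(0, 1) for _ in range(N)]
eta = [2 * x + 1 + random.gauss(0, 0.5) for x in xi]

m1, m2 = statistics.fmean(xi), statistics.fmean(eta)
var1 = statistics.pvariance(xi)
cov = statistics.fmean([x * y for x, y in zip(xi, eta)]) - m1 * m2

# minimizing E[(eta - a*xi - b)^2] gives a = cov/D(xi), b = E(eta) - a E(xi)
a = cov / var1
b = m2 - a * m1
assert abs(a - 2) < 0.02
assert abs(b - 1) < 0.02
```

The recovered coefficients match the ones used to generate the data, up to sampling noise.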
Conditional expectation. The conditional expectation of η given ξ = x is defined as

    E[η | x] = ∫ y f(y|x)dy = ∫ y [f(x, y)/f_ξ(x)] dy

It satisfies the total expectation formula

    ∫ E[η | x] f_ξ(x)dx = ∫∫ y [f(x, y)/f_ξ(x)] f_ξ(x) dydx = ∫∫ y f(x, y)dydx = E[η]

Example: From f(y|x) ≥ 0 and

    ∫ f(y|x)dy = ∫ [f(x, y)/f_ξ(x)] dy = f_ξ(x)/f_ξ(x) = 1

it follows that f(y|x) is a probability density and E[η | x] = ∫ y f(y|x)dy is its expectation. In particular, for every function g,

    ∫ (y − E[η|x])² f(y|x)dy ≤ ∫ (y − g(x))² f(y|x)dy

Theorem (Regression): Let ξ and η be two random variables; then for all functions g(x),

    E[(η − E[η|ξ])²] ≤ E[(η − g(ξ))²]

Proof:

    E[(η − g(ξ))²] = ∫∫ (y − g(x))² f(x, y)dydx = ∫ [∫ (y − g(x))² f(y|x)dy] f_ξ(x)dx
    ≥ ∫ [∫ (y − E[η|x])² f(y|x)dy] f_ξ(x)dx = E[(η − E[η|ξ])²] #

Remark: The theorem shows that if one wants to look for a function g(x) such that g(ξ) approaches η best among others, then the conditional expectation E[η | ξ] is the best choice. The resultant variable E[η | ξ] is often called the regression of η with respect to ξ.
Generating functions. Let ξ be a nonnegative integer-valued random variable with generating function g(x) = E[x^ξ] = Σ_k x^k P(ξ = k). Differentiating n times,

    d^n g(x)/dx^n = Σ_k k(k−1)⋯(k−n+1) x^{k−n} P(ξ = k)

    lim_{x→1⁻} d^n g(x)/dx^n = Σ_k k(k−1)⋯(k−n+1) P(ξ = k) = E[ξ(ξ−1)⋯(ξ−n+1)]

Example: Let ξ be a random variable satisfying the binomial distribution; the generating function of ξ is then given by

    g(x) = E[x^ξ] = Σ_{k=0}^n x^k C_n^k p^k q^{n−k} = (xp + q)^n

    E[ξ] = lim_{x→1⁻} dg(x)/dx = lim_{x→1⁻} n(xp + q)^{n−1} p = np

    E[ξ²] = lim_{x→1⁻} d²g(x)/dx² + np = lim_{x→1⁻} n(n−1)(xp + q)^{n−2} p² + np = n(n−1)p² + np

    Dξ = E[ξ²] − (E[ξ])² = n(n−1)p² + np − n²p² = np(1 − p) = npq

Example: Let ξ be a random variable satisfying the Poisson distribution; the generating function of ξ is then given by

    g(x) = E[x^ξ] = Σ_{k=0}^∞ x^k (λ^k/k!) e^{−λ} = e^{−λ} e^{λx} = e^{λ(x−1)}

    E[ξ] = lim_{x→1⁻} dg(x)/dx = lim_{x→1⁻} λ e^{λ(x−1)} = λ

    E[ξ²] = lim_{x→1⁻} d²g(x)/dx² + λ = lim_{x→1⁻} λ² e^{λ(x−1)} + λ = λ² + λ

    Dξ = E[(ξ − E[ξ])²] = E[ξ²] − E²[ξ] = λ² + λ − λ² = λ
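The factorial moments g′(1) = E[ξ] and g″(1) = E[ξ(ξ−1)] can be verified by direct summation of the pmf; a sketch (Python; the Poisson case with a hypothetical λ = 2.5, series truncated where the tail is negligible):

```python
import math

lam = 2.5
K = 60   # truncation point; the Poisson tail beyond is negligible for lam = 2.5
pmf = [lam ** k * math.exp(-lam) / math.factorial(k) for k in range(K)]

mean = sum(k * p for k, p in enumerate(pmf))              # g'(1) = lambda
fact2 = sum(k * (k - 1) * p for k, p in enumerate(pmf))   # g''(1) = lambda^2
variance = fact2 + mean - mean ** 2                        # = lambda

assert abs(mean - lam) < 1e-9
assert abs(variance - lam) < 1e-9
```

Both the mean and the variance come out equal to λ, matching the computation above.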
Multivariate normal distribution. Let ξ = (ξ₁, ξ₂, …, ξ_n) be an n-dimensional random vector, μ = (E[ξ])ᵀ = (μ₁, μ₂, …, μ_n) and R = E[(ξ − μ)(ξ − μ)ᵀ] its covariance matrix. The vector ξ is called normal if its density is

    f(x) = (2π)^{−n/2} |R|^{−1/2} exp(−(1/2)(x − μ)ᵀ R⁻¹ (x − μ)),   where x = (x₁, x₂, …, x_n) ∈ ℝⁿ

Remark: When n = 2,

    R = ( σ₁²      ρσ₁σ₂ )        R⁻¹ = 1/(1−ρ²) (  1/σ₁²        −ρ/(σ₁σ₂) )
        ( ρσ₁σ₂    σ₂²   )                        ( −ρ/(σ₁σ₂)     1/σ₂²    )

and

    f(x, y) = [1/(2πσ₁σ₂√(1−ρ²))] exp{ −1/(2(1−ρ²)) [ (x−μ₁)²/σ₁² − 2ρ(x−μ₁)(y−μ₂)/(σ₁σ₂) + (y−μ₂)²/σ₂² ] }

where ρ = cov(ξ₁, ξ₂)/(σ₁σ₂) is the correlation coefficient. Since the marginal densities are

    f₁(x) = (1/(√(2π)σ₁)) e^{−(x−μ₁)²/(2σ₁²)},   f₂(y) = (1/(√(2π)σ₂)) e^{−(y−μ₂)²/(2σ₂²)}

we have

    ρ = 0 ⟺ f(x, y) = f₁(x) f₂(y) #

Example: The marginal and conditional distributions of a multivariate normal distribution are still normal.
Proof:
Marginal distributions:

    f_ξ(x) = (1/(√(2π)σ₁)) e^{−(x−μ₁)²/(2σ₁²)} = N(μ₁, σ₁²),   f_η(y) = (1/(√(2π)σ₂)) e^{−(y−μ₂)²/(2σ₂²)} = N(μ₂, σ₂²)

Conditional distributions:

    f(y|x) = f(x, y)/f_ξ(x)
    = [1/(2πσ₁σ₂√(1−ρ²))] exp{−1/(2(1−ρ²))[(x−μ₁)²/σ₁² − 2ρ(x−μ₁)(y−μ₂)/(σ₁σ₂) + (y−μ₂)²/σ₂²]} / [(1/(√(2π)σ₁)) exp{−(x−μ₁)²/(2σ₁²)}]
    = [1/(√(2π)σ₂√(1−ρ²))] exp{ −[y − μ₂ − ρ(σ₂/σ₁)(x−μ₁)]² / (2σ₂²(1−ρ²)) }
    = N( ρ(σ₂/σ₁)(x−μ₁) + μ₂, σ₂²(1−ρ²) ) #

Remark: In particular,

    E[η | x] = ∫ y f(y|x)dy = ρ(σ₂/σ₁)(x−μ₁) + μ₂

so the regression of one normal component on the other is linear.

Theorem: Let ξ = (ξ₁, ξ₂, …, ξ_n) be an n-dimensional normal random vector and

    A = ( a₁₁  a₁₂  …  a₁ₙ )
        ( a₂₁  a₂₂  …  a₂ₙ )
        (  ⋮    ⋮   ⋱   ⋮  )
        ( aₘ₁  aₘ₂  …  aₘₙ )

an m×n matrix; then η = Aξ is an m-dimensional normal random vector.

Remark 1: This theorem shows that the linear transform of a normal random vector is still normal.

Remark 2: It is possible that random variables ξ₁, ξ₂, …, ξ_n are not jointly normal even though each of them is normal.
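The conditional mean ρ(σ₂/σ₁)(x−μ₁) + μ₂ and conditional variance σ₂²(1−ρ²) can be checked by simulation; a sketch (Python; standard bivariate normal with a hypothetical ρ = 0.6, conditioning on ξ in a thin slab around x₀ = 1):

```python
import math
import random
import statistics

random.seed(8)
rho, N = 0.6, 400_000
pairs = []
for _ in range(N):
    x = random.gauss(0, 1)
    # standard bivariate normal with correlation rho (mu1 = mu2 = 0, sigma1 = sigma2 = 1)
    y = rho * x + math.sqrt(1 - rho ** 2) * random.gauss(0, 1)
    pairs.append((x, y))

x0, eps = 1.0, 0.05
ys = [y for x, y in pairs if abs(x - x0) < eps]   # condition on xi near x0
cond_mean = statistics.fmean(ys)
cond_var = statistics.pvariance(ys)

assert abs(cond_mean - rho * x0) < 0.04          # rho*(sigma2/sigma1)*(x0 - mu1) + mu2
assert abs(cond_var - (1 - rho ** 2)) < 0.05     # sigma2^2 * (1 - rho^2)
```

The empirical conditional mean is linear in x₀, and the conditional variance does not depend on x₀, as the formula predicts.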
Memo

Definition:

    E[g(ξ)] = Σ_k g(x_k) P{ξ = x_k}   or   ∫ g(x) f(x)dx
    E[g(ξ, η)] = Σ_{k,m} g(x_k, y_m) P{ξ = x_k; η = y_m}   or   ∫∫ g(x, y) f(x, y)dxdy

Characteristics:

    Eξ = E[ξ],   Dξ = E[(ξ − Eξ)²],   ρ = E[(ξ − Eξ)(η − Eη)] / √(Dξ Dη)

Properties:

    E[Σ_i a_i ξ_i] = Σ_i a_i E[ξ_i],   (E[ξη])² ≤ E[ξ²] E[η²]

Linear Regression:

    L(ξ) = ρ(σ₂/σ₁)(ξ − μ₁) + μ₂,   E[(η − L(ξ))²] = σ₂²(1 − ρ²)

where μ₁ = Eξ, σ₁² = Dξ, μ₂ = Eη, σ₂² = Dη.

Regression: for every function g,

    E[(η − g(ξ))²] ≥ E[(η − E[η|ξ])²]

Normal Distribution:

    f(x, y) = N(μ₁, μ₂, σ₁², σ₂², ρ)
    f_ξ(x) = N(μ₁, σ₁²),   f_η(y) = N(μ₂, σ₂²),   f(y|x) = N(ρ(σ₂/σ₁)(x−μ₁) + μ₂, σ₂²(1−ρ²))

and every linear combination Σ_i a_i ξ_i of jointly normal variables is normal.
Limit Theorems

1. Inequalities

Hajek & Renyi Inequality: Let ξ₁, …, ξ_n be independent random variables with finite second moment and C₁, …, C_n be numbers such that C₁ ≥ ⋯ ≥ C_n ≥ 0; then for all 1 ≤ m < n and all ε > 0,

    P{ max_{m≤j≤n} C_j |Σ_{i=1}^j (ξ_i − Eξ_i)| ≥ ε } ≤ ε⁻² [ C_m² Σ_{j=1}^m Dξ_j + Σ_{j=m+1}^n C_j² Dξ_j ]

Kolmogorov Inequality (the case C_j ≡ 1, m = 1):

    P{ max_{1≤j≤n} |Σ_{i=1}^j (ξ_i − Eξ_i)| ≥ ε } ≤ ε⁻² Σ_{j=1}^n Dξ_j

Chebyshev Inequality: Let ξ be a random variable with finite second moment; then for all ε > 0,

    P{|ξ − Eξ| ≥ ε} ≤ Dξ/ε²

Hint: Chebyshev's inequality can be regarded as a special case of Kolmogorov's inequality when letting n = 1. Chebyshev's inequality can also be proven directly:

    P{|ξ − Eξ| ≥ ε} = ∫_{|x−Eξ|≥ε} f(x)dx ≤ ∫_{|x−Eξ|≥ε} (|x − Eξ|²/ε²) f(x)dx ≤ (1/ε²) ∫ |x − Eξ|² f(x)dx = Dξ/ε²
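The Chebyshev bound can be compared with empirical tail probabilities; a sketch (Python; the exponential sample with rate 1 is a hypothetical choice):

```python
import random
import statistics

random.seed(4)
N = 100_000
xs = [random.expovariate(1.0) for _ in range(N)]   # mean 1, variance 1
mu, var = statistics.fmean(xs), statistics.pvariance(xs)

# P{|xi - E xi| >= eps} <= D(xi)/eps^2 for every eps > 0
for eps in (1.0, 2.0, 3.0):
    p = sum(abs(x - mu) >= eps for x in xs) / N
    assert p <= var / eps ** 2 + 1e-9
```

For this distribution the bound is quite loose (e.g. the true tail at ε = 2 is e⁻³ ≈ 0.05 against the bound 0.25), which is typical: Chebyshev trades sharpness for complete generality.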
2. Convergences of Sequences of Random Variables

Almost sure convergence:   P{ω : lim_{n→+∞} ξ_n(ω) = ξ(ω)} = 1
Convergence in probability:   lim_{n→+∞} P{ω : |ξ_n(ω) − ξ(ω)| ≥ ε} = 0 for all ε > 0
Convergence in mean of order r:   lim_{n→+∞} E[|ξ_n − ξ|^r] = 0
Convergence in distribution:   lim_{n→+∞} F_n(x) = F(x) at every continuity point x of F

Almost sure convergence and convergence in mean each imply convergence in probability, which in turn implies convergence in distribution.
3. The Weak Laws of Large Numbers

Definition: The sequence ξ₁, ξ₂, …, ξ_n, … satisfies the weak law of large numbers if there are numbers a_n such that for all ε > 0,

    lim_{n→+∞} P{ |(1/n) Σ_{k=1}^n ξ_k − a_n| ≥ ε } = 0

Remark: The convergence involved in the weak laws of large numbers is exactly the type of convergence in probability. In fact, let η_n = (1/n) Σ_{k=1}^n ξ_k − a_n, n = 1, 2, …; then

    lim_{n→+∞} P{ |(1/n) Σ_{k=1}^n ξ_k − a_n| ≥ ε } = lim_{n→+∞} P{|η_n| ≥ ε} = 0

This means that the sequence of random variables η₁, η₂, …, η_n, … converges in probability to zero.

Theorem (The Weak Law of Large Numbers, Khintchine): Suppose the second-order random variables ξ₁, ξ₂, …, ξ_n, … are independent and identically distributed; then for all ε > 0,

    lim_{n→+∞} P{ |(1/n) Σ_{k=1}^n ξ_k − μ| ≥ ε } = 0

where μ = E[ξ_k].

Proof: By the Chebyshev inequality,

    P{ |(1/n) Σ_{k=1}^n ξ_k − μ| ≥ ε } ≤ ε⁻² E[((1/n) Σ_{k=1}^n ξ_k − μ)²] = ε⁻² σ²/n → 0 as n → +∞

where σ² = E[(ξ_k − μ)²]. #
4. The Strong Laws of Large Numbers

Definition: The sequence ξ₁, ξ₂, …, ξ_n, … satisfies the strong law of large numbers if there are numbers a_n such that

    P{ lim_{n→+∞} [(1/n) Σ_{k=1}^n ξ_k − a_n] = 0 } = 1

Remark 1: The convergence involved in the strong laws of large numbers is exactly the type of convergence almost everywhere. In fact, let η_n = (1/n) Σ_{k=1}^n ξ_k − a_n, n = 1, 2, …; then

    P{ lim_{n→+∞} [(1/n) Σ_{k=1}^n ξ_k − a_n] = 0 } = P{ lim_{n→+∞} η_n = 0 } = 1

Remark 2: Since convergence almost everywhere implies convergence in probability, a sequence of random variables satisfying the strong law of large numbers must satisfy the weak one.

Theorem (The Strong Law of Large Numbers, Kolmogorov): Suppose the second-order random variables ξ₁, ξ₂, …, ξ_n, … are independent of each other and Σ_{n=1}^{+∞} Dξ_n/n² < +∞; then

    P{ lim_{n→+∞} (1/n) Σ_{k=1}^n (ξ_k − Eξ_k) = 0 } = 1,   i.e., a_n = (1/n) Σ_{k=1}^n Eξ_k

Theorem (The Strong Law of Large Numbers, Khintchine): Suppose the second-order random variables ξ₁, ξ₂, …, ξ_k, … are independent and identically distributed; then

    P{ lim_{n→+∞} (1/n) Σ_{k=1}^n ξ_k = μ } = 1

where μ = Eξ_k.

Hint: Since the random variables ξ₁, ξ₂, …, ξ_k, … are identically distributed, one can have

    Σ_{k=1}^{+∞} Dξ_k/k² = Dξ₁ Σ_{k=1}^{+∞} 1/k² < +∞

Remark: If ξ_k satisfies the 0–1 distribution P{ξ_k = 1} = p, P{ξ_k = 0} = 1 − p, then

    Eξ_k = p   and   P{ lim_{n→+∞} (1/n) Σ_{k=1}^n ξ_k = p } = 1

Note that (1/n) Σ_{k=1}^n ξ_k represents the frequency of occurrence of the event {ξ_k = 1} in n Bernoulli experiments; the law of large numbers implies that the frequency will approximate the corresponding probability p as n → +∞.
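The frequency-to-probability convergence in the last remark is easy to watch numerically; a sketch (Python; p = 0.3 is a hypothetical success probability):

```python
import random

random.seed(5)
p = 0.3

def frequency(n):
    """Relative frequency of success in n Bernoulli(p) experiments."""
    return sum(random.random() < p for _ in range(n)) / n

err_small = abs(frequency(100) - p)        # typically of order 1/sqrt(100)
err_large = abs(frequency(1_000_000) - p)  # typically of order 1/sqrt(10^6)
assert err_large < 0.005
```

The deviation shrinks at roughly the 1/√n rate, consistent with the Chebyshev estimate used in the weak law.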
5. The Central Limit Theorems

Let ξ₁, ξ₂, … be independent random variables with finite second moments and

    ζ_n = [Σ_{i=1}^n ξ_i − Σ_{i=1}^n Eξ_i] / √(Σ_{i=1}^n Dξ_i)

The central limit theorems give the conditions under which the distribution of ζ_n will tend to the standard normal distribution N(0, 1) as n → +∞, i.e.,

    lim_{n→+∞} P{ζ_n < x} = (1/√(2π)) ∫_{−∞}^x e^{−t²/2} dt

Remark: The convergence involved in the central limit theorems is exactly the type of convergence in distribution. In fact, let Φ(x) = (1/√(2π)) ∫_{−∞}^x e^{−t²/2} dt and Φ_n(x) = P{ζ_n < x}, n = 1, 2, …; then

    lim_{n→+∞} P{ζ_n < x} = (1/√(2π)) ∫_{−∞}^x e^{−t²/2} dt   ⟺   lim_{n→+∞} Φ_n(x) = Φ(x)

The Central Limit Theorem (Lindeberg–Levy): If ξ₁, ξ₂, … are IID with μ = Eξ_i and σ² = Dξ_i < +∞, then

    lim_{n→+∞} P{ζ_n < x} = (1/√(2π)) ∫_{−∞}^x e^{−t²/2} dt,   where ζ_n = [Σ_{i=1}^n ξ_i − nμ] / (σ√n)

The Central Limit Theorem (de Moivre & Laplace): Let ξ₁, ξ₂, …, ξ_n, … be a sequence of IID random variables with P{ξ_i = 1} = p, P{ξ_i = 0} = 1 − p = q for all i; then

    lim_{n→+∞} P{Σ_{i=1}^n ξ_i = k} / [ (1/√(2πnpq)) e^{−(k−np)²/(2npq)} ] = 1

    lim_{n→+∞} P{ (Σ_{i=1}^n ξ_i − np)/√(npq) < x } = (1/√(2π)) ∫_{−∞}^x e^{−t²/2} dt

Remark: For the binomial sum Σ_{i=1}^n ξ_i we so far have two approximations:

    P{Σ_{i=1}^n ξ_i = k} ≈ ((np)^k/k!) e^{−np},   when n is large enough and p is small enough;

    P{Σ_{i=1}^n ξ_i = k} ≈ (1/√(2πnpq)) e^{−(k−np)²/(2npq)},   when n is large enough. In this case,

    P{0 ≤ Σ_{i=1}^n ξ_i < x} = P{ −np/√(npq) ≤ (Σ_{i=1}^n ξ_i − np)/√(npq) < (x − np)/√(npq) } ≈ (1/√(2π)) ∫_{−np/√(npq)}^{(x−np)/√(npq)} e^{−t²/2} dt
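The convergence of the standardized sum to Φ can be seen by simulation; a sketch (Python; sums of n = 100 uniforms are a hypothetical choice of IID summands):

```python
import math
import random

random.seed(6)
n, trials = 100, 50_000

def Phi(x):
    """Standard normal distribution function."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

x = 1.0
below = 0
for _ in range(trials):
    s = sum(random.random() for _ in range(n))   # sum of n uniforms on (0,1)
    z = (s - n * 0.5) / math.sqrt(n / 12)        # standardize: mean n/2, variance n/12
    below += z < x

assert abs(below / trials - Phi(x)) < 0.015      # Phi(1) is about 0.8413
```

Even at n = 100 the empirical distribution of ζ_n is essentially indistinguishable from N(0, 1) at this sampling resolution.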
Conditioning. Conditioned distribution and expectation

1. The conditioned probability and expectation

Let (Ω, K, P) be a probability space and A, B ∈ K with P(A) > 0. Then

(1.1) P(B|A) = P(B∩A)/P(A)

This is called the conditioned probability of B given A.
Of course P(B|A) = P(B) ⟺ P(B∩A) = P(B)P(A) ⟺ A and B are independent.
If A is given, we may consider the function P_A : K → [0,1] given by

(1.2) P_A(B) = P(B|A)

It is obvious that P_A is a new probability on the σ-algebra K, called the conditioned probability given A.
The integral of a random variable X with respect to it will be denoted by E(X|A) or E_A(X). The computing formula is

PROPOSITION 1.1. E(X|A) = E(X·1_A)/P(A)

Proof. Obvious for X = 1_B. Then apply the usual method of four steps: X simple, X nonnegative, X arbitrary. □

Let now Y be a discrete random variable and I be the set {y : P(Y = y) ≠ 0}. Then I is at most countable and Y admits the canonical representation Y = Σ_{y∈I} y·1_{Y=y} (a.s.). In many statistical problems one gets interested in the quantity

(1.3) P(B|Y) := Σ_{y∈I} P(B|Y = y)·1_{Y=y}

This quantity will be called the conditioned probability of B given the random variable Y.

EXAMPLE. An urn has n labelled balls (that is, I = {1, 2, …, n}). One draws two balls without replacement. The first one is Y and the second one is X. One wants to compute P(X = x|Y) and to compare it with P(X = x). Accepting that we are in the classical context, Ω = I² \ {(i, i) : i ∈ I}, thus |Ω| = n(n−1). Then

    P(X = x|Y = y) = P(X = x, Y = y)/P(Y = y) = |{X = x, Y = y}|/|{Y = y}| = 0 if x = y,   1/(n−1) if x ≠ y

(as, given Y = y, X has only n−1 possibilities). It means that

    P(X = x|Y) = Σ_{y∈I\{x}} (1/(n−1))·1_{Y=y} = (1/(n−1))·1_{Y≠x}.   Compare this with P(X = x) = 1/n.

Looking at (1.3) one remarks four things: (i) the conditioned probability is a random variable; (ii) the random variable does not depend as much on Y as on the sets {Y = y} which form a partition of Ω; (iii) this random variable is measurable with respect to the σ-algebra σ(Y) := Y⁻¹(B(ℝ)); and, finally, (iv) the random variable may be not defined everywhere, but only almost surely: if P(Y = y) = 0, then P(B|Y = y) may be any number from 0 to 1. A convention, as good as any other, would be in this case to decree that P(B|Y = y) = 0.
It means that a more natural definition would be the conditioned probability of B given a partition Δ = (Δ_j)_{j∈I}, where I is at most countable. Then the analog of (1.3) would be

(1.4) P(B|Δ) = Σ_{j: P(Δ_j)≠0} P(B|Δ_j)·1_{Δ_j}

(1.5) E(X|Δ) = Σ_{j: P(Δ_j)≠0} E(X|Δ_j)·1_{Δ_j},   X ∈ L¹

Since the partition Δ generates the σ-algebra F = σ(Δ) = {⋃_{j∈J} Δ_j : J ⊂ I}, we can say that the right hand side of (1.5) is a definition for E(X|F), instead of E(X|Δ). So

(1.6) E(X|F) = Σ_{j: P(Δ_j)≠0} E(X|Δ_j)·1_{Δ_j},   X ∈ L¹

One checks that (i) E(X|F) is F-measurable and (ii) E(E(X|F)·1_A) = E(X·1_A) for every A = ⋃_{j∈J} Δ_j ∈ F; indeed, since the Δ_j are disjoint,

    E(E(X|F)·1_A) = Σ_{j∈J} E(E(X|Δ_j)·1_{Δ_j}) = Σ_{j∈J} E(X|Δ_j)·P(Δ_j) = Σ_{j∈J} E(X·1_{Δ_j}) = E(X·1_A)

The conditions (i) and (ii) are used to define E(X|F) in general situations.

Definition 1.1. Let X ∈ L¹(Ω, K, P) and F ⊂ K be a sub-σ-algebra. We say that Y = E(X|F) (read: Y is the conditioned expectation of X given F) iff

(1.7) Y is F-measurable and A ∈ F ⟹ E(X·1_A) = E(Y·1_A)

Definition 1.2. Let B ∈ K. By P(B|F) we shall understand E(1_B|F). Read: the conditioned probability of B given F.
One may remark that the key concept is that of conditioned expectation.

Property 1. Uniqueness. If Y₁ and Y₂ both satisfy (1.7), then Y₁ = Y₂ (a.s.). Indeed, applying (1.7) to the sets {Y₁ > Y₂ + 1/n} ∈ F and letting n → ∞ one gets that P(Y₁ > Y₂) = 0. In the same way one gets that P(Y₁ < Y₂) = 0, that is P(Y₁ ≠ Y₂) = 0 ⟺ Y₁ = Y₂ (a.s.).

Property 2. Conditioning with respect to a trivial σ-algebra. If every set in F has probability 0 or 1, then E(X|F) = EX; in particular the conditional expectation of a constant is the constant itself. Indeed, let Y = E(X|F) and L_b = {Y ≤ b}. These sets belong to F, b < c ⟹ L_b ⊂ L_c, and their probability can be either 0 or 1. As 0 = P(∩_b L_b) = lim_{b→−∞} P(L_b), some of these sets have probability 0. Let c = sup{b : P(L_b) = 0}. Then, due to the definition of c, P(L_{c+ε}) = 1 > 0 for every ε > 0; in the same way P(L_{c−ε}) = 0. By the monotone continuity of any measure it follows that P(Y ≤ c) = 1 but P(Y < c) = 0 ⟹ P(Y = c) = 1 ⟹ Y = c (a.s.). So Y is a constant a.s. If in (1.7) we take A = Ω, we get that EX = E(X·1_A) = E(Y·1_A) = EY = Ec = c. As for the second claim, it is obvious from (1.7).

Property 3. Projectivity. If F ⊂ G are two σ-algebras then E(E(X|G)|F) = E(X|F). As a consequence of Property 2, we get that EX = E(E(X|G)).

Proof. Let Y = E(X|G) and Z = E(X|F). We want to check that E(Y|F) = Z. Firstly, Z is F-measurable. Secondly, let A ∈ F. Then E(Z·1_A) = E(X·1_A) (by 1.7) = E(Y·1_A) (again by 1.7; notice that A ∈ F ⟹ A ∈ G!). It means that E(Y|F) = Z. □
Property 4. Linearity. If a, b ∈ ℝ and X₁, X₂ ∈ L¹ then E(aX₁ + bX₂|F) = aE(X₁|F) + bE(X₂|F) (a.s.)

Proof. Let Y_j = E(X_j|F), j = 1, 2. Let Y = aY₁ + bY₂ and A ∈ F. Then Y is F-measurable and, moreover, E(Y·1_A) = E((aY₁ + bY₂)·1_A) = aE(Y₁·1_A) + bE(Y₂·1_A) = aE(X₁·1_A) + bE(X₂·1_A) (by 1.7) = E((aX₁ + bX₂)·1_A), checking the second condition from (1.7). □

Property 5. Monotonicity. If X₁ ≤ X₂ then E(X₁|F) ≤ E(X₂|F) (a.s.)

Proof. Using Property 4, it is enough to check that X ≥ 0 ⟹ E(X|F) ≥ 0 (a.s.). Let Y = E(X|F). Y is F-measurable and A ∈ F ⟹ E(Y·1_A) = E(X·1_A) ≥ 0 since X ≥ 0. If one puts A = {Y < 0} it follows that E(Y·1_A) = −E(Y⁻) ≥ 0 ⟹ E(Y⁻) ≤ 0 ⟹ E(Y⁻) = 0 ⟹ Y⁻ = 0 (a.s.) ⟹ Y = Y⁺ ≥ 0 (a.s.). □

Property 6. Jensen's inequality. Let X : Ω → I be a random variable and f : I → ℝ be convex (here I is an interval!). Then E(f(X)|F) ≥ f(E(X|F)).

Proof. A convex function f can be written as f = sup{h_a : a ∈ Δ}, with Δ at most countable and h_a affine functions, h_a(x) = m_a x + n_a (for instance Δ = ℚ∩I and, if a ∈ Δ, h_a is a tangent of f at (a, f(a)); it is known that a convex function has at least one tangent at every point).
Then E(f(X)|F) = E(sup{h_a(X) : a ∈ Δ}|F) ≥ sup{E(h_a(X)|F) : a ∈ Δ} (by Property 5, monotonicity) = sup{E(m_a X + n_a|F) : a ∈ Δ} = sup{m_a E(X|F) + n_a : a ∈ Δ} (by linearity and Property 2: the expectation of a constant is the constant itself) = f(E(X|F)). □

Property 7. Contractivity. Let p ∈ [1, ∞] and X ∈ L^p. Then ‖E(X|F)‖_p ≤ ‖X‖_p.
As a consequence the conditioned expectation is a linear contraction from L^p(Ω, K, P) to L^p(Ω, F, P).

Proof. There are two cases.
1. 1 ≤ p < ∞. The claim is E|E(X|F)|^p ≤ E|X|^p. Let f(x) = |x|^p. Then f : ℝ → ℝ is convex so we know that E(f(X)|F) ≥ f(E(X|F)) ⟹ E(|X|^p|F) ≥ |E(X|F)|^p. If we take the expectation, we get E(E(|X|^p|F)) ≥ E(|E(X|F)|^p) which, because of Property 3, is exactly our claim.
2. p = ∞. Let then M = ‖X‖_∞. It means that |X| ≤ M (a.s.) ⟹ E(|X| | F) ≤ E(M|F) = M (by Property 5, monotonicity) ⟹ |E(X|F)| ≤ E(|X| | F) ≤ M (a.s.) ⟹ ‖E(X|F)‖_∞ ≤ M. □
Property 8. Taking out what is known. If Z is F-measurable and X, Z are such that XZ ∈ L¹ (for instance X ∈ L^p, Z ∈ L^q with 1/p + 1/q = 1, p, q ∈ [1, ∞]), then E(ZX|F) = Z·E(X|F) (a.s.)

Proof. For Z = 1_B with B ∈ F and any A ∈ F one has E(1_A·1_B·E(X|F)) = E(1_{A∩B}·E(X|F)) = E(X·1_{A∩B}) = E((1_B X)·1_A) (since A∩B ∈ F), so 1_B·E(X|F) = E(1_B X|F). By linearity the claim holds for simple Z = Σ_i b_i·1_{B_i}, and the usual approximation steps extend it to the general case. □

Property 9. Conditional Hölder inequality. If 1/p + 1/q = 1, p, q ∈ (1, ∞), X ∈ L^p and Y ∈ L^q, then E(|XY| | F) ≤ (E(|X|^p|F))^{1/p} (E(|Y|^q|F))^{1/q} (a.s.)

Property 10. Conditional expectation as an orthogonal projection. Let X ∈ L² and, for Y ∈ L²(Ω, F, P), let D(Y) = ‖X − Y‖₂². Then D is convex and has a unique (a.s.) point of minimum, which is exactly Y = E(X|F). Moreover, with Z = E(X|F), the following Pythagoras rule holds:

    ‖X − Y‖₂² = ‖X − Z‖₂² + ‖Z − Y‖₂²

As a consequence the mapping E_F : L² → L²(Ω, F, P) given by E_F(X) = E(X|F) is the orthogonal projector from the Hilbert space L² onto the Hilbert subspace L²(Ω, F, P).

Proof. Let Z = E(X|F). Then ‖X − Y‖₂² = E(X − Y)² = E((X − Z) + (Z − Y))² = E((X − Z)²) + E((Z − Y)²) + 2E((X − Z)(Z − Y)). The last term is equal to 2E(E((X − Z)(Z − Y)|F)) (Property 3) = 2E((Z − Y)·E(X − Z|F)) (Property 8, as Z − Y is F-measurable) = 2E((Z − Y)(E(X|F) − Z)) = 2E((Z − Y)(Z − Z)) = 0. It means that ‖X − Y‖₂² = ‖X − Z‖₂² + ‖Z − Y‖₂². □
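For an F generated by a finite partition, formula (1.6) and the defining relation (1.7) can be checked empirically; a sketch (Python; the two-block partition by the sign of a Gaussian draw is a hypothetical setup, with the empirical measure on N points playing the role of P):

```python
import random
import statistics

random.seed(7)
N = 100_000
xs = [random.gauss(0, 1) for _ in range(N)]

# partition Delta_1 = {X < 0}, Delta_2 = {X >= 0}; F = sigma(partition)
groups = {0: [x for x in xs if x < 0], 1: [x for x in xs if x >= 0]}
cond_mean = {j: statistics.fmean(g) for j, g in groups.items()}   # E(X|Delta_j)

# Y = E(X|F) is constant on each block; (1.7): E(Y 1_A) = E(X 1_A) for A in F
for j, g in groups.items():
    lhs = cond_mean[j] * len(g) / N   # E(Y 1_{Delta_j})
    rhs = sum(g) / N                  # E(X 1_{Delta_j})
    assert abs(lhs - rhs) < 1e-9

# projectivity with G = K, F = sigma(partition): E(E(X|F)) = EX
ey = sum(cond_mean[j] * len(g) / N for j, g in groups.items())
assert abs(ey - statistics.fmean(xs)) < 1e-9
```

In the L² picture of Property 10, the block-wise mean is exactly the orthogonal projection of X onto the functions that are constant on each block.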
Property 11. Conditioning and independence. If X is independent of F, then E(X|F) = EX. It is not true in general that E(X|F) = EX ⟹ X is independent of F. However, if P(B|F) = const, then P(B|F) = P(B) and B is independent of F.

Proof. Let X be independent of F and Y = EX. The task is to prove that Y fulfills the conditions (1.7). As measurability is obvious, let A ∈ F (hence A is independent of X ⟹ X and 1_A are independent). Then E(X·1_A) = EX·E1_A = EX·P(A) = E(EX·1_A) = E(Y·1_A), checking the first claim.
As for the converse, it cannot be true: it is enough to choose X = 1_A − 1_B with P(A) = P(B) = p and F = σ(Δ), where Δ = (Δ_j)_{j∈J} is an (at most) countable partition of Ω. Then EX = 0 and E(X|F) = P(A|F) − P(B|F) = Σ_{j∈J} [(P(A∩Δ_j) − P(B∩Δ_j))/P(Δ_j)]·1_{Δ_j}. If A and B are such that P(A∩Δ_j) = P(B∩Δ_j) ≠ pP(Δ_j), that would be an example where it is possible that E(X|F) = EX = 0 but X is not independent of F, since P(X = 1, Δ_j) = P(A∩Δ_j) ≠ P(X = 1)P(Δ_j).
However, suppose that P(B|F) = c where c is a constant. By (1.7) this means that E(1_A·1_B) = E(c·1_A) ∀A ∈ F, or that P(A∩B) = cP(A) ∀A ∈ F. If A = Ω one finds the constant c = P(B) and discovers that the definition relation (1.7) means that P(A∩B) = P(A)P(B) ∀A ∈ F; in other words, B is independent of F. □
Property 12. The regression function. If Y is a random variable and X ∈ L¹, then there exists a Borel measurable function h such that E(X|σ(Y)) = h(Y). Indeed, any σ(Y)-measurable simple function is of the form Σ_{1≤j≤n} a_j·1_{Y∈B_j} = h(Y) with h = Σ_{1≤j≤n} a_j·1_{B_j}; for the general case it is enough to put h = liminf_n h_n. In our very case the only fact that matters is that the regression function E(X|Y) must be σ(Y)-measurable.
Property 13. The case of equality in Jensen's inequality. Suppose f is strictly convex and twice differentiable. If E(f(X)|F) = f(E(X|F)), then X = E(X|F) (a.s.); in particular, X is F-measurable.

Proof. By Taylor's formula,

(2.1) f(x) = f(a) + f′(a)(x − a) + ((x − a)²/2)·f″(θ)

for some θ = θ(x) lying somewhere between a and x. Remark that the mapping x ↦ f″(θ(x)), being a ratio between two continuous functions, is continuous itself and thus measurable. Now replace in (2.1) x with X and a with E(X|F). We get

(2.2) f(X) = f(a) + f′(a)(X − a) + ((X − a)²/2)·f″(θ(X))

Conditioning on F,

(2.3) E(f(X)|F) = f(a) + f′(a)·E(X − a|F) + E(((X − a)²/2)·f″(θ(X))|F)

We applied the fact that f(a) and f′(a) are already F-measurable, and Property 8. Taking into account that E(X − a|F) = a − a = 0, it follows that

(2.4) E(f(X)|F) = f(a) + E(((X − a)²/2)·f″(θ(X))|F)

If E(f(X)|F) = f(E(X|F)) = f(a), then E(((X − a)²/2)·f″(θ(X))|F) = 0. But f is strictly convex, thus f″ ≥ 0 and the set on which f″ = 0 contains no interval. If Y ≥ 0 and E(Y|F) = 0, then Y = 0 a.s.; thus f″(θ(X))·(X − a)²/2 = 0 (a.s.). Let A = {ω : f″(θ(X(ω))) = 0} and B = {ω : (X(ω) − a(ω))² = 0}. We know that P(A∪B) = 1. If ω ∈ A then f(X(ω)) = f(a) + f′(a)(X(ω) − a); well, that may happen only if X(ω) = a, else on the interval joining a and X(ω) f would be linear, which we denied. So in this case X(ω) = E(X|F)(ω). If ω ∈ B there is no problem either: X(ω) = a. So X = E(X|F) a.s.
The second assertion is stronger, but it comes from the fact that E(f(X)) = E(f(E(X|F))) ⟹ E(E(f(X)|F)) = E(f(E(X|F))) ⟹ E(f(X)|F) = f(E(X|F)) (as if we know that U ≥ V and EU = EV, then U = V, too!) ⟹ X = E(X|F). □
Property 14. The interior and the adherence of a set in a σ-algebra.

Proof. Let Y = E(X⁺|F), Z = E(X⁻|F), B = {Y > 0} and C = {X > 0}_F. We claim that B = C. Indeed, both these sets belong to F. Due to the definition (1.7) we have that E(X⁺1_B) = E(E(X⁺|F)1_B) = E(E(X⁺|F)) (since always EY = E(Y1_{Y>0})!) = E(X⁺). But X⁺1_B ≤ X⁺ and the two have the same expectation ⇒ X⁺1_B = X⁺ (a.s.) ⇒ {X⁺ > 0} ⊂ B ⇒ {X > 0} ⊂ B ⇒ {X > 0}_F ⊂ B (by (2.8)) ⇒ C ⊂ B. For the converse inclusion, remark that E(X⁺|F)1_C = E(X⁺1_C|F) (Property 8!) = E(X1_{X>0}1_C|F) (as X⁺ = X1_{X>0}!) = E(X1_{X>0}|F) (as {X > 0} ⊂ C!) = E(X⁺|F). Meaning that {E(X⁺|F) > 0} ⊂ C, that is, B ⊂ C. In the same way one checks that the sets {E(X⁻|F) > 0} and {X < 0}_F coincide. Now it is clear that YZ = 0 ⇔ {Y > 0} ∩ {Z > 0} = ∅ ⇔ {X > 0}_F ∩ {X < 0}_F = ∅ (a.s.), and in that case |Y − Z| = Y + Z, proving our equivalences (2.9).

Example. If X = 1_A − 1_B, then ‖X‖₁ = P(A) + P(B) and ‖E(X|F)‖₁ = E|P(A|F) − P(B|F)|. These two quantities coincide iff (A)_F ∩ (B)_F = ∅ (a.s.).
Case 3. p = ∞. Let M = ‖X‖_∞. As ‖X‖_∞ = ‖|X|‖_∞ we may as well suppose that X ≥ 0. We already know that ‖E(X|F)‖_∞ ≤ M. The claim is that ‖E(X|F)‖_∞ = M ⇔ ‖P(X > M − ε|F)‖_∞ = 1 ∀ε > 0. Let ε > 0. Then X ≤ M − ε + ε1_{X>M−ε} ⇒ E(X|F) ≤ M − ε + εP(X > M − ε|F) ⇒ ‖E(X|F)‖_∞ ≤ M − ε + ε‖P(X > M − ε|F)‖_∞. If ‖E(X|F)‖_∞ = M, then M ≤ M − ε + ε‖P(X > M − ε|F)‖_∞ ⇒ ‖P(X > M − ε|F)‖_∞ ≥ 1 ⇒ ‖P(X > M − ε|F)‖_∞ = 1, proving the implication "⇒". For the other implication remark that X ≥ (M − ε)1_{X>M−ε} ⇒ E(X|F) ≥ (M − ε)P(X > M − ε|F) ⇒ ‖E(X|F)‖_∞ ≥ (M − ε)‖P(X > M − ε|F)‖_∞ = M − ε for any ε > 0. Meaning that ‖E(X|F)‖_∞ = M.
Example. Let Ω = [1,∞), K = B([1,∞)), F = σ(π) with π = {[n, n+1)}_{n≥1}, P = ρλ where ρ(x) = 1/x². Let A_k = ∪_{n=k}^∞ [n, n + 1/n). Then

P(A_k|F) = Σ_{n=k}^∞ P(A_k | [n, n+1)) 1_{[n,n+1)} = Σ_{n=k}^∞ (P([n, n + 1/n))/P([n, n+1))) 1_{[n,n+1)}.
For a measurable X : Ω → (E,E) the mapping B ↦ P(X⁻¹(B)|F) is σ-additive almost surely: for any sequence (B_n)_n of disjoint sets from E,

(3.1) (P∘X⁻¹)(∪_{n=1}^∞ B_n | F) = Σ_{n=1}^∞ (P∘X⁻¹)(B_n | F) (a.s.).

Indeed, (P∘X⁻¹)(∪_{n=1}^∞ B_n | F) = E(1_{X⁻¹(∪ₙBₙ)} | F) (by the very definition!) = E(1_{∪ₙX⁻¹(Bₙ)} | F) = E(Σ_{n=1}^∞ 1_{X⁻¹(Bₙ)} | F) (the sets X⁻¹(B_n) are disjoint!) = Σ_{n=1}^∞ E(1_{X⁻¹(Bₙ)} | F) (by conditioned Beppo Levi!) = Σ_{n=1}^∞ P(X⁻¹(B_n)|F) = Σ_{n=1}^∞ (P∘X⁻¹)(B_n | F).

The trouble is that the equality (3.1) holds only almost surely. That is, the neglectable set of those ω failing the property that (P∘X⁻¹)(∪_{n=1}^∞ B_n | F)(ω) = Σ_{n=1}^∞ (P∘X⁻¹)(B_n | F)(ω) depends on the sequence of disjoint sets (B_n)_n. The question is whether one can choose the versions in such a way that a single neglectable set works for all sequences at once. In that case P∘X⁻¹(·|F)(ω) would be a real probability on (E,E) for all ω outside a neglectable set. That is the regular conditioned distribution of X given F. To be precise:

Definition. Let (E,E) be a measurable space and X : Ω → E be a measurable function. A function Q : Ω×E → [0,1] having the properties

(i). ω ↦ Q(ω,B) is a version for P(X⁻¹(B)|F) ∀B ∈ E;
(ii). B ↦ Q(ω,B) is a probability on (E,E) ∀ω ∈ Ω

is called the regular conditioned distribution of X given F. Another name for this object could be: a regular version for the conditioned distribution of X given F.

At first glance it is not at all obvious why such a regular version should exist at all. We shall prove the following rather remarkable fact:
Proposition 3.1. If X is a real random variable, then a regular conditioned distribution of X given F exists.

Proof. For every rational x let G(x,·) be a version of

(3.2) G(x,·) = P(X ≤ x|F) (a.s.), x ∈ ℚ.

By monotony, for rationals x < y the sets A_{x,y} = {G(x,·) > G(y,·)} are neglectable; by conditioned Beppo Levi, G(x + 1/n, ·) → P(X ≤ x|F) (a.s.), i.e. the sets B_x = {lim_n G(x + 1/n, ·) ≠ G(x,·)} are neglectable, too. Let further C := {lim_{x→−∞} G(x,·) ≠ 0} and D := {lim_{x→+∞} G(x,·) ≠ 1}. Again by Beppo Levi, the sets C and D are neglectable.

Let N be the union of all these sets: N = ∪_{x<y} A_{x,y} ∪ ∪_x B_x ∪ C ∪ D ∈ F. Being a countable union of neglectable sets, N is neglectable itself. Define

(3.3) F(x,ω) = inf{G(y,ω) : y ∈ ℚ, y > x} if ω ∉ N, and F(x,ω) = 1_{[0,∞)}(x) if ω ∈ N.
We claim that

(i). x ↦ F(x,ω) is a distribution function for any ω;
(ii). ω ↦ F(x,ω) is F-measurable for any x;
(iii). F(x,·) = P(X ≤ x|F) (a.s.) for any x ∈ ℝ.

Let us check (i). For ω ∈ N there is nothing to prove: in this case F(x,ω) = 1_{[0,∞)}(x), obviously a distribution function. Suppose that ω ∉ N. Clearly F is non-decreasing. If x ∈ ℚ, then by (3.2) we see that F(x,ω) = G(x,ω). So F(−∞,ω) = 0, F(∞,ω) = 1. The only problem is to prove that F(·,ω) is right-continuous. Suppose that ω ∉ N is fixed; we shall not write it, to simplify the writing. Then lim_{y↓x} F(y) = inf{F(y) : y ∈ (x,∞)} (as F is non-decreasing!) = inf{inf{G(z) : z ∈ ℚ, z > y} : y ∈ (x,∞)} = inf{G(z) : z ∈ ℚ, z > x} (for any function G and any family of sets (A_ι)_{ι∈I} the equality inf{inf{G(x) : x ∈ A_ι} : ι ∈ I} = inf{G(x) : x ∈ ∪_{ι∈I} A_ι} obviously holds — check it as an amusing exercise!) = F(x).
Let λ(·,ω) be the Lebesgue–Stieltjes measure given by the distribution function F(·,ω) and let C = {B ∈ B(ℝ) : λ(B,·) = E(1_B(X)|F) (a.s.)}. If (B_n)_n ⊂ C are disjoint, then λ(∪_{n=1}^∞ B_n, ·) = Σ_{n=1}^∞ λ(B_n, ·) = Σ_{n=1}^∞ E(1_{Bₙ}(X)|F) (a.s.) (by Property 8.1, conditioned Beppo Levi) = E(1_{∪ₙBₙ}(X)|F) (a.s.) ⇒ ∪_{n=1}^∞ B_n ∈ C.

From (i) and (iii) it follows that C contains the π-system M = {(−∞,x] : x ∈ ℝ} and the σ-algebra generated by M. It happens that this coincides with B(ℝ). The conclusion is: λ(B,·) = E(1_B(X)|F) (a.s.) ∀B ∈ B(ℝ). Or, in another notation, λ(B,·) = P∘X⁻¹(B|F) (a.s.). Therefore λ is a regular version for P∘X⁻¹(·|F).
For a measurable f : ℝ → ℝ (nonnegative or bounded) define

(∫f dλ)(ω) := ∫f(x)λ(dx,ω).

For a simple function f = Σ_{i=1}^n a_i1_{B_i} one gets E(f(X)|F) = Σ_{i=1}^n a_iE(1_{B_i}(X)|F) = Σ_{i=1}^n a_iλ(B_i,·) = ∫f dλ (a.s.). If f ≥ 0 is measurable, choose simple functions f_n ↑ f; by conditioned Beppo Levi, E(f(X)|F) = lim_n E(f_n(X)|F) = lim_n ∫f_n dλ = ∫f dλ (a.s.); the bounded case follows by writing f = f⁺ − f⁻. So

E(f(X)|F) = ∫f dλ (a.s.).

Let now X ∈ L² and let λ be a regular version for its conditioned distribution, λ = (P∘X⁻¹)(·|F). We know that λ exists due to Proposition 3.1. Then the conditioned expectation is given by

(3.6) E(X|F)(ω) = ∫x λ(dx,ω)

and the conditioned variance by

Var(X|F)(ω) := E((X − E(X|F))²|F)(ω) = ∫x² λ(dx,ω) − (∫x λ(dx,ω))².

Proof. These are easy consequences of the transport formula, the first relation with the function f(x) = x. For the second one notice that E((X − E(X|F))²|F) = E(X² − 2XE(X|F) + E(X|F)²|F) = E(X²|F) − 2E(X|F)E(X|F) + E(X|F)² (by Property 9!) = E(X²|F) − E(X|F)², and apply the transport formula with f(x) = x².
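In the discrete case the regular conditioned distribution is just the table of conditional probabilities, so (3.6) and the variance formula can be checked directly. This is a minimal sketch with a hypothetical joint law of our own choosing (the names p, lam, cond_mean are illustrative, not the text's notation):

```python
from fractions import Fraction as F

# Hypothetical joint law p(x, y) of a discrete pair (X, Y); given σ(Y),
# the regular conditioned distribution λ(·, y) is P(X = x | Y = y), and
#   E(X|Y=y)   = Σ_x x·λ({x}, y)
#   Var(X|Y=y) = Σ_x x²·λ({x}, y) − E(X|Y=y)²
p = {(0, 0): F(1, 8), (1, 0): F(3, 8),
     (0, 1): F(2, 8), (2, 1): F(2, 8)}

def p2(y):                                # marginal of Y
    return sum(v for (x, yy), v in p.items() if yy == y)

def lam(y):                               # conditional law of X given Y = y
    return {x: v / p2(y) for (x, yy), v in p.items() if yy == y}

def cond_mean(y):
    return sum(x * q for x, q in lam(y).items())

def cond_var(y):
    m = cond_mean(y)
    return sum(x * x * q for x, q in lam(y).items()) - m * m

EX = sum(x * v for (x, y), v in p.items())
tower = sum(cond_mean(y) * p2(y) for y in (0, 1))   # E(E(X|Y)) = EX
```

Each `lam(y)` sums to 1 (condition (ii) of the definition), and the tower property E(E(X|Y)) = EX holds exactly in rational arithmetic.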
Now we shall busy ourselves to find more or less practical formulae to compute the conditioned regular distributions. Suppose that F = σ(Y) for some random variable Y with values in (E,E), and that λ : B(ℝ)×Ω → [0,1] fulfills the following assumptions:

(i). B ↦ λ(B,ω) is a probability on (ℝ,B(ℝ)) ∀ω;
(ii). ω ↦ λ(B,ω) is F-measurable ∀B ∈ B(ℝ);
(iii). the set N_B = {ω : λ(B,ω) ≠ P(X⁻¹(B)|F)(ω)} is neglectable ∀B ∈ B(ℝ).

As F = σ(Y), by Property 12 λ(B,·) must be of the form λ(B,ω) = h_B(Y(ω)) where h_B : E → ℝ is E-measurable, and this measurability explains the claim (ii). Let us denote h_B(Y(ω)) by λ(B,Y(ω)). Then B ↦ λ(B,y) is a probability on (ℝ,B(ℝ)) ∀y ∈ Range(Y). Indeed, let y = Y(ω) ∈ Range(Y) and let (B_n)_n be a sequence of disjoint Borel sets. Then λ(∪_{n=1}^∞ B_n, y) = λ(∪_{n=1}^∞ B_n, Y(ω)) = λ(∪_{n=1}^∞ B_n, ω) (λ(·,ω) is a probability) = Σ_{n=1}^∞ λ(B_n, Y(ω)) = Σ_{n=1}^∞ λ(B_n, y).
Example: F generated by a countable partition. Let F = σ(π), where π = (Ω_i)_{i∈I} is an at most countable partition of Ω, let I₀ = {i ∈ I : P(Ω_i) > 0} and let P_i(A) = P(A∩Ω_i)/P(Ω_i) for i ∈ I₀, as defined in 1.1. Then a regular version is

λ = Σ_{i∈I₀} (P_i∘X⁻¹)1_{Ω_i} + λ*·1_{Ω₀},

where λ* is an arbitrary probability and Ω₀ is the union of the neglectable atoms Ω_i, i ∉ I₀. The F-measurability of the function ω ↦ λ(B,ω) is obvious, and the fact that for any given ω the function B ↦ λ(B,ω) = (P_i∘X⁻¹)(B) (with Ω_i the unique set containing ω) is a probability is clear, too, due to the definition 1.1. Finally, λ(B,·) coincides with (P∘X⁻¹)(B|F) (a.s.).

In particular, if Y is a discrete random variable with values in the at most countable set I, then P(X ∈ B|Y) = Σ_{y∈I} P(X ∈ B|Y = y)1_{Y=y}. We can leave the formula as it is, but if ω belongs to the neglectable set {Y ∈ Range(Y) \ I'}, where I' = {y : P(Y = y) > 0}, then P(X ∈ B|Y)(ω) = 0 ∀B and that would not be a probability. To have a regular version, we have to add a fictive probability λ* on that set.
Discrete case. Suppose that the pair (X,Y) is discrete: P(X = x, Y = y) = p(x,y) for (x,y) ∈ I, where I is at most countable. Let I₁ = pr₁(I) and I₂ = pr₂(I). Of course I₁ and I₂ are at most countable, I ⊂ I₁×I₂. Then

(3.10) (P∘X⁻¹)(·|Y) = Σ_{x∈I₁} (p(x,Y)/p₂(Y)) δ_x

(3.11) (P∘Y⁻¹)(·|X) = Σ_{y∈I₂} (p(X,y)/p₁(X)) δ_y

where p₁(x) = Σ_{y∈I₂} p(x,y) and p₂(y) = Σ_{x∈I₁} p(x,y).

Indeed, the distribution of X is P∘X⁻¹ = Σ_{x∈I₁} p₁(x)δ_x and the distribution of Y is P∘Y⁻¹ = Σ_{y∈I₂} p₂(y)δ_y. By the previous example,

P(X = x|Y) = Σ_{y∈I₂} P(X = x|Y = y)1_{Y=y} = Σ_{y∈I₂} (P(X = x, Y = y)/P(Y = y))1_{Y=y} = Σ_{y∈I₂} (p(x,y)/p₂(y))1_{Y=y},

hence we can write P(X = x|Y) = p(x,Y)/p₂(Y) ∀x ∈ I₁. This is a discrete distribution which can be written in a shorter form as (P∘X⁻¹)(·|Y) = Σ_{x∈I₁} (p(x,Y)/p₂(Y))δ_x, proving (3.10). The equality (3.11) is proved in the same way.

Absolutely continuous case. Suppose now that P∘(X,Y)⁻¹ = ρ·(μ⊗ν), i.e. the pair (X,Y) has the density ρ with respect to a product μ⊗ν of σ-finite measures. Let

ρ₁(x) = ∫ρ(x,y)dν(y) and ρ₂(y) = ∫ρ(x,y)dμ(x).

Then the regular conditioned distributions are given by the conditional densities

(P∘X⁻¹)(dx|Y)(ω) = (ρ(x,Y(ω))/ρ₂(Y(ω)))dμ(x) and (P∘Y⁻¹)(dy|X)(ω) = (ρ(X(ω),y)/ρ₁(X(ω)))dν(y).
Proof. It is easy to see that ρ₁ and ρ₂ are the densities of X and Y with respect to μ and ν. (For instance, P(X ∈ A) = P((X,Y) ∈ A×ℝ) = ∫_A(∫ρ(x,y)dν(y))dμ(x) = ∫_A ρ₁ dμ.) The task is to prove that E(1_A(X)|Y) = ∫1_A(x)(ρ(x,Y)/ρ₂(Y))dμ(x) (a.s.), that is, E(1_A(X)1_C) = E((∫1_A(x)(ρ(x,Y)/ρ₂(Y))dμ(x))1_C) ∀C ∈ σ(Y). As any C with this property is of the form C = Y⁻¹(B) for some B ∈ B(ℝ), the task is to prove that

(3.14) E(1_A(X)1_B(Y)) = E((∫1_A(x)(ρ(x,Y)/ρ₂(Y))dμ(x))1_B(Y)).

But E((∫1_A(x)(ρ(x,Y)/ρ₂(Y))dμ(x))1_B(Y)) = ∫(∫1_A(x)(ρ(x,y)/ρ₂(y))dμ(x))1_B(y)d(P∘Y⁻¹)(y) = ∫(∫1_A(x)(ρ(x,y)/ρ₂(y))dμ(x))ρ₂(y)1_B(y)dν(y) = ∫∫1_A(x)1_B(y)ρ(x,y)dμ(x)dν(y) (by Fubini!) = ∫_{A×B} ρ d(μ⊗ν) = P∘(X,Y)⁻¹(A×B) = E(1_A(X)1_B(Y)), which is (3.14).

(3.15) An intuitive interpretation, for μ = ν = λ (the Lebesgue measure) and ρ continuous:

P(X ∈ A | y − δ < Y < y + δ) = P(X ∈ A, y − δ < Y < y + δ)/P(y − δ < Y < y + δ) = (∫_{y−δ}^{y+δ}(∫_A ρ(u,v)dλ(u))dv) / (∫_{y−δ}^{y+δ} ρ₂(v)dv)

(we used the fact that for continuous functions the Lebesgue and the Riemann integrals coincide and the fact that if the function v ↦ ∫ρ(u,v)dλ(u) is continuous, then v ↦ ∫1_A(u)ρ(u,v)dλ(u) is continuous, too). It follows that

lim_{δ→0} P(X ∈ A | y − δ < Y < y + δ) = (∫_A ρ(u,y)dλ(u))/ρ₂(y) (one applies l'Hospital's rule!) = ∫1_A(x)(ρ(x,y)/ρ₂(y))dλ(x),

which is exactly the conditional distribution from the statement.
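The limit (3.15) can be illustrated numerically. This is a minimal sketch with a hypothetical density ρ(x,y) = x + y on (0,1)² (so ρ₂(y) = 1/2 + y); the function names are ours, not the text's, and the integrals are approximated by midpoint Riemann sums:

```python
# Hypothetical density on (0,1)²: rho(x, y) = x + y, rho2(y) = 1/2 + y.
def rho(x, y):
    return x + y

def riemann2(f, ax, bx, ay, by, n=200):
    # midpoint rule on [ax,bx] x [ay,by]; exact for affine integrands
    hx, hy = (bx - ax) / n, (by - ay) / n
    return sum(f(ax + (i + 0.5) * hx, ay + (j + 0.5) * hy)
               for i in range(n) for j in range(n)) * hx * hy

y, A = 0.5, (0.0, 0.5)
limits = []
for delta in (0.2, 0.05, 0.01):
    num = riemann2(rho, A[0], A[1], y - delta, y + delta)
    den = riemann2(rho, 0.0, 1.0, y - delta, y + delta)
    limits.append(num / den)     # P(X in A | y-δ < Y < y+δ)

# target value: ∫_0^{1/2} (u + 1/2) du / ρ₂(1/2) = (3/8) / 1 = 0.375
```

For this particular density the conditional probability equals 3/8 for every δ, so the "limit" is visible already at finite δ.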
Transition Probabilities
1. Definitions and notations.
Let (E,E) and (F,F) be two measurable spaces. A function Q : E×F → [0,1] is called a transition probability from E to F if

(i). x ↦ Q(x,B) is E-measurable ∀B ∈ F and
(ii). B ↦ Q(x,B) is a probability on (F,F) ∀x ∈ E.

Thus we can imagine Q as a family (Q_x)_{x∈E} of probabilities on (F,F) indexed on the set E. That is the way they do in statistics: they denote Q by (P_x)_{x∈E}. We shall denote by Q(x) the probability defined by Q(x)(B) = Q(x,B).

We shall write in short "Let E →(Q) F" instead of "Let Q be a transition probability from E to F".

Example 1. The regular conditioned distribution of a random variable X given a sub-σ-algebra F, denoted by P∘X⁻¹(·|F), is a transition probability from (Ω,F) to (ℝ,B(ℝ)) (see Conditioning, section 3). Indeed, if we put Q(ω,B) = P(X ∈ B|F)(ω) = P∘X⁻¹(B|F)(ω), then (i) and (ii) are fulfilled by the very construction of Q.

Example 2. A particular case is the transition probability Q given by Q(Y(ω),B) = P(X ∈ B|Y)(ω) (the regular version!) where X and Y are two random variables. This time Q is a transition probability from (ℝ,B(ℝ)) to itself.
Example 3. If F is at most countable and F = P(F) (all the subsets of F!) then all the transition probabilities from E to F are of the form

(1.1) Q(x) = Σ_{y∈F} q(x,y)δ_y with q(x,y) ≥ 0, Σ_{y∈F} q(x,y) = 1 ∀x ∈ E,

where q(x,y) = Q(x,{y}). Such a function q is called a stochastic matrix. If E and F are finite, this is an ordinary matrix with the sum of the entries on every line equal to 1. We can think of a stochastic matrix as being a collection of stochastic vectors, that is, of nonnegative vectors with the sum of the components equal to 1.
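Example 3 translates directly into code. A minimal sketch, with a hypothetical kernel of our own choosing, representing Q as a dict of rows q(x,·) and checking the two stochastic-matrix conditions:

```python
# Hypothetical finite transition probability (Example 3): each row
# q(x, ·) is a probability on F, so rows are nonnegative and sum to 1.
E = ["a", "b"]
Fset = [0, 1, 2]
q = {"a": {0: 0.5, 1: 0.5, 2: 0.0},
     "b": {0: 0.1, 1: 0.3, 2: 0.6}}

def is_transition_probability(q, E, Fset):
    return all(
        abs(sum(q[x][y] for y in Fset) - 1.0) < 1e-12
        and all(q[x][y] >= 0 for y in Fset)
        for x in E)
```

The dict-of-rows layout mirrors the "collection of stochastic vectors" picture: each `q[x]` is one stochastic vector.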
Let μ be a measure on (E,E). Define

(2.1) (μ⊗Q)(C) = ∫Q(x, C(x,·))dμ(x), C ∈ E⊗F (where C(x,·) = {y : (x,y) ∈ C} is the section of C),

(2.2) (μQ)(B) = ∫Q(x,B)dμ(x), B ∈ F.

Proposition 2.1.
(i). If μ is a bounded signed measure on (E,E), then μ⊗Q is a bounded signed measure on E⊗F. If μ is a probability, then μ⊗Q is a probability, too. If f : E×F → ℝ is measurable (nonnegative or bounded) then

(2.3) ∫f d(μ⊗Q) = ∫(∫f(x,y)Q(x,dy))dμ(x).

Remark. The meaning of (2.3) is that firstly we integrate f(x,·) with respect to the measure Q(x) and then we integrate the resulting function with respect to the measure μ. The notation from (2.3) is awkward, that is why one denotes ∫(∫f(x,·)dQ(x))dμ(x) instead. The most accepted notation is, however,

(2.4) ∫f d(μ⊗Q) = ∫∫f(x,y)Q(x,dy)dμ(x).

(ii). If μ is a bounded signed measure on (E,E), then μQ is a bounded signed measure on F. If μ is a probability, then μQ is a probability, too. If f : F → ℝ is measurable (nonnegative or bounded) then

(2.5) ∫f d(μQ) = ∫(∫f(y)Q(x,dy))dμ(x).
Proof. It is easy. Firstly, both μ⊗Q and μQ are measures because of the Beppo Levi theorem. Indeed, if the C_n are disjoint, then

(μ⊗Q)(∪_{n=1}^∞ C_n) = ∫Q(x, (∪_{n=1}^∞ C_n)(x,·))dμ(x) = ∫Q(x, ∪_{n=1}^∞ C_n(x,·))dμ(x) = ∫Σ_{n=1}^∞ Q(x, C_n(x,·))dμ(x) = Σ_{n=1}^∞ (μ⊗Q)(C_n).

Thus, (μ⊗Q)(E×F) = ∫Q(x,F)dμ(x) = ∫1 dμ(x) = μ(E); so, if μ(E) = 1, (μ⊗Q)(E×F) = 1 too. As about the formula (2.4), its proof is standard, dividing into the usual steps: indicator, simple function, nonnegative function, any. The same holds for (2.5).
Remark 2.1. Suppose that F is countable. Then Q has the form (1.1), and (2.3), (2.5) become

(2.6) (μ⊗Q)(A×{y}) = ∫q(x,y)1_A(x)dμ(x)

(2.7) (μQ)({y}) = ∫q(x,y)dμ(x).

If E is countable too and μ = Σ_{x∈E} p(x)δ_x, then (μQ)({y}) = Σ_{x∈E} p(x)q(x,y); therefore μQ is the product between the row vector (p(x))_{x∈E} and the matrix q.
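In the countable case (2.7) is literally a row-vector-times-matrix product. A minimal sketch with hypothetical data (the names p, q, muQ are ours):

```python
# (2.7): if μ = Σ p(x)δ_x then (μQ)({y}) = Σ_x p(x)·q(x, y), i.e. the
# row vector p multiplied by the stochastic matrix q.
p = {"a": 0.25, "b": 0.75}                       # a probability on E
q = {"a": {0: 0.5, 1: 0.5},                      # hypothetical kernel
     "b": {0: 0.1, 1: 0.9}}

def muQ(p, q):
    ys = {y for row in q.values() for y in row}
    return {y: sum(p[x] * q[x][y] for x in p) for y in sorted(ys)}

out = muQ(p, q)      # again a probability on F
```

Here out[0] = 0.25·0.5 + 0.75·0.1 = 0.2 and out[1] = 0.8, and the total mass 1 is preserved, as Proposition 2.1(ii) asserts.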
Let M(E,E) denote the set of all the bounded signed measures on the measurable space (E,E), let Prob(E,E) be the set of all the probabilities on that space and let Bo(E,E) denote the set of all the bounded measurable functions f : E → ℝ.

Notice that M(E,E) is a Banach space with respect to the variation norm defined as ‖μ‖ = μ⁺(E) + μ⁻(E), where μ = μ⁺ − μ⁻ is the Hahn–Jordan decomposition of μ. Recall that μ⁺ is defined by μ⁺(A) = μ(A∩H), where H is the Hahn–Jordan set of μ, that is a set (almost surely defined) with the property that μ(H) = sup{μ(A) : A ∈ E}. In this Banach space the set Prob(E,E) is closed and convex.

On the other hand Bo(E,E) is a Banach space too, with the uniform norm ‖f‖ = sup{|f(x)| : x ∈ E}. The connection between these two spaces is given by

Lemma 2.2.
(i). ‖μ‖ = sup{|∫f dμ| : f ∈ Bo(E,E), ‖f‖ = 1};
(ii). ‖f‖ = sup{|∫f dμ| : μ ∈ M(E,E), ‖μ‖ = 1};
(iii). |∫f dμ| ≤ ‖f‖·‖μ‖.

It means that the mapping (μ,f) ↦ ⟨μ,f⟩ := ∫f dμ is a duality.
Proof. For (i), let H be the Hahn–Jordan set of μ and f = 1_H − 1_{H^c}. As ‖f‖ = 1 and ∫f dμ = μ⁺(E) + μ⁻(E) = ‖μ‖, the supremum is attained; the inequality sup{|∫f dμ| : ‖f‖ = 1} ≤ ‖μ‖ is (iii). For (ii), take μ = δ_x: then |∫f dμ| = |f(x)| and ‖μ‖ = 1, hence ‖f‖ = sup{|∫f dμ| : μ ∈ M(E,E), ‖μ‖ = 1}.

Consider now the operators T : M(E,E) → M(F,F) given by T(μ) = μQ and T′ : Bo(F,F) → Bo(E,E) given by

(2.12) T′(f) = Qf, defined by Qf(x) = ∫f dQ(x) = ∫f(y)Q(x,dy).

They are dual to each other: ∫f dT(μ) = ∫f d(μQ) = ∫Qf dμ ∀f ∈ Bo(F,F), μ ∈ M(E,E), by (2.5). It follows that |∫f dT(μ)| ≤ ‖T′(f)‖·‖μ‖ ≤ ‖f‖·‖μ‖, hence ‖T‖ ≤ 1 and ‖T′‖ ≤ 1. If F is at most countable, then

(2.15) Qf(x) = Σ_{y∈F} q(x,y)f(y),

the usual product between the stochastic matrix q and the column vector of the values of f.
Let M₀(E,E) = {μ ∈ M(E,E) : μ(E) = 0}; for instance (δ_x − δ_{x′})/2 ∈ M₀(E,E), since (δ_x − δ_{x′})(E) = 1 − 1 = 0.

Let us define the quantity

(3.1) δ(Q) = (1/2)·sup{‖Q(x) − Q(x′)‖ : x, x′ ∈ E} = sup{‖((δ_x − δ_{x′})/2)Q‖ : x ≠ x′},

the Dobrushin contraction coefficient of Q.

(3.5) Let us check that ‖μQ‖ ≤ δ(Q)‖μ‖ for every μ ∈ M₀(E,E). Let μ = μ⁺ − μ⁻ be the Hahn–Jordan decomposition of μ, H the Hahn–Jordan set, K = H^c, a = μ⁺(E), b = μ⁻(E); as μ(E) = 0, a = b = ‖μ‖/2. Consider the measure m = μ⁺⊗μ⁻ on E×E, so that m(H×K) = m(E×E) = ab. For f ∈ Bo(F,F) with ‖f‖ ≤ 1,

∫f d(μQ) = ∫(∫f(y)Q(x,dy))dμ⁺(x) − ∫(∫f(y)Q(x′,dy))dμ⁻(x′)
= (1/(ab))∫∫(b∫f(y)Q(x,dy) − a∫f(y)Q(x′,dy))dm(x,x′)
= (1/b)∫∫(∫f(y)Q(x,dy) − ∫f(y)Q(x′,dy))dm(x,x′) (as a = b)
≤ (1/b)·m(E×E)·sup{|∫f d(Q(x) − Q(x′))| : x, x′ ∈ E}
≤ a·sup{‖Q(x) − Q(x′)‖ : x, x′ ∈ E} = (‖μ‖/2)·2δ(Q) = δ(Q)‖μ‖.

Taking the supremum over ‖f‖ ≤ 1 and using Lemma 2.2(i), ‖μQ‖ ≤ δ(Q)‖μ‖.

Corollary 3.2. δ(Q) = ‖T|M₀(E,E)‖ = sup{‖μQ‖ : μ ∈ M₀(E,E), ‖μ‖ ≤ 1}; indeed "≤" is the contraction above and "≥" comes from the measures (δ_x − δ_{x′})/2 ∈ X := {(δ_x − δ_{x′})/2 : x ≠ x′ ∈ E} ⊂ M₀(E,E). In particular, for two probabilities μ₁, μ₂,

‖μ₁Q − μ₂Q‖ = ‖(μ₁ − μ₂)Q‖ ≤ δ(Q)‖μ₁ − μ₂‖ (since (μ₁ − μ₂)(E) = 1 − 1 = 0!) ≤ δ(Q)(‖μ₁‖ + ‖μ₂‖) = δ(Q)(1 + 1).

If F is at most countable, then the coefficient δ(Q) is computable. Indeed, if Q(x) = Σ_{y∈F} q(x,y)δ_y and Q(x′) = Σ_{y∈F} q(x′,y)δ_y, then ‖Q(x) − Q(x′)‖ = Σ_{y∈F} |q(x,y) − q(x′,y)|, hence

δ(Q) = (1/2)·sup_{x,x′∈E} Σ_{y∈F} |q(x,y) − q(x′,y)|.
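The countable-case formula for δ(Q) and the contraction inequality ‖μQ − νQ‖ ≤ δ(Q)‖μ − ν‖ can be checked mechanically. A minimal sketch on a hypothetical 2×2 stochastic matrix (all names ours):

```python
# Dobrushin coefficient of a finite stochastic matrix (rows = q(x, ·)):
#   δ(Q) = ½ max_{x,x'} Σ_y |q(x,y) − q(x',y)|
def dobrushin(q):
    return 0.5 * max(sum(abs(a - b) for a, b in zip(r1, r2))
                     for r1 in q for r2 in q)

def apply_kernel(mu, q):                  # μQ as row-vector × matrix
    return [sum(mu[i] * q[i][j] for i in range(len(q)))
            for j in range(len(q[0]))]

def var_norm(mu, nu):                     # variation norm of μ − ν
    return sum(abs(a - b) for a, b in zip(mu, nu))

q = [[0.9, 0.1],
     [0.2, 0.8]]                          # hypothetical kernel
mu, nu = [1.0, 0.0], [0.0, 1.0]           # two extreme probabilities

d = dobrushin(q)                          # ½(|0.9−0.2| + |0.1−0.8|) = 0.7
contraction_ok = (var_norm(apply_kernel(mu, q), apply_kernel(nu, q))
                  <= d * var_norm(mu, nu) + 1e-12)
```

For μ = δ_x and ν = δ_{x′} the inequality is in fact an equality, which is exactly how δ(Q) was defined.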
Let Q₁ be a transition probability from E₁ to E₂ and Q₂ one from E₂ to E₃. Define

(4.1) (Q₁⊗Q₂)(x₁,C) = ∫∫1_C(x₂,x₃)Q₂(x₂,dx₃)Q₁(x₁,dx₂), C ∈ E₂⊗E₃,

(4.2) (Q₁Q₂)(x₁,A₃) = (Q₁⊗Q₂)(x₁, E₂×A₃) = ∫Q₂(x₂,A₃)Q₁(x₁,dx₂), A₃ ∈ E₃.

Proposition 4.1.
(i). If f : E₂×E₃ → ℝ is bounded or nonnegative then

(4.3) ∫f d(Q₁⊗Q₂)(x₁) = ∫∫f(x₂,x₃)Q₂(x₂,dx₃)Q₁(x₁,dx₂) ( = ((Q₁⊗Q₂)f)(x₁) ).

(ii). If f : E₃ → ℝ is bounded or nonnegative then

(4.4) ∫f d(Q₁Q₂)(x₁) = ∫∫f(x₃)Q₂(x₂,dx₃)Q₁(x₁,dx₂) ( = ((Q₁Q₂)f)(x₁) ).

In the countable case, with stochastic matrices q₁, q₂, these read

(4.7) ((Q₁⊗Q₂)f)(x₁) = Σ_{x₂∈E₂, x₃∈E₃} f(x₂,x₃)q₁(x₁,x₂)q₂(x₂,x₃)

(4.8) ((Q₁Q₂)f)(x₁) = Σ_{x₂∈E₂, x₃∈E₃} f(x₃)q₁(x₁,x₂)q₂(x₂,x₃).

The products are associative:

(4.11) (μQ₁)Q₂ = μ(Q₁Q₂) and (Q₁Q₂)Q₃ = Q₁(Q₂Q₃).

Indeed, ∫f d[(μQ₁)Q₂] = ∫Q₂f d(μQ₁) = ∫Q₁(Q₂f)dμ = ∫(Q₁Q₂)f dμ = ∫f d[μ(Q₁Q₂)] (by (4.4)), so both quantities coincide. As about (4.11), one gets [(Q₁Q₂)Q₃](x) = (δ_x(Q₁Q₂))Q₃ = ((δ_xQ₁)Q₂)Q₃ and [Q₁(Q₂Q₃)](x) = (δ_xQ₁)(Q₂Q₃) = ((δ_xQ₁)Q₂)Q₃, which is the same.
Remark. If all the spaces are at most countable, then (4.9) and (4.10) are the usual products between a row vector and a matrix (this is (4.9)) or between a matrix and a column vector (this is (4.10)).
Corollary 4.3. The Dobrushin contraction coefficient is submultiplicative; the following inequality holds:

(4.12) δ(Q₁Q₂) ≤ δ(Q₁)δ(Q₂).

Proof. Let T₁ : M₀(E₁,E₁) → M₀(E₂,E₂) and T₂ : M₀(E₂,E₂) → M₀(E₃,E₃) be defined as T₁(μ) = μQ₁ and T₂(μ) = μQ₂. Then we know from Corollary 3.2 that δ(Q₁) = ‖T₁‖ and δ(Q₂) = ‖T₂‖. Notice that T₂T₁(μ) = T₁(μ)Q₂ = (μQ₁)Q₂ = μ(Q₁Q₂). It means that δ(Q₁Q₂) = ‖T₂T₁‖ ≤ ‖T₂‖‖T₁‖ = δ(Q₁)δ(Q₂).
Suppose now that (E_j,E_j)_j are measurable spaces and that Q_j are transition probabilities from E_j to E_{j+1}. Because of the associativity the product Q₁Q₂…Q_n is well defined. If all these spaces coincide and Q_i = Q, then this product will be denoted by Qⁿ. The fact that δ is submultiplicative has important consequences.
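Submultiplicativity can be verified on finite matrices, where Q₁Q₂ is the ordinary matrix product. A minimal sketch with hypothetical matrices (all names ours):

```python
# Checking (4.12), δ(Q1·Q2) ≤ δ(Q1)·δ(Q2), on small stochastic matrices.
def matmul(q1, q2):
    return [[sum(q1[i][k] * q2[k][j] for k in range(len(q2)))
             for j in range(len(q2[0]))] for i in range(len(q1))]

def dobrushin(q):
    return 0.5 * max(sum(abs(a - b) for a, b in zip(r1, r2))
                     for r1 in q for r2 in q)

q1 = [[0.9, 0.1], [0.3, 0.7]]             # δ(q1) = 0.6
q2 = [[0.5, 0.5], [0.1, 0.9]]             # δ(q2) = 0.4
d12 = dobrushin(matmul(q1, q2))
submultiplicative = d12 <= dobrushin(q1) * dobrushin(q2) + 1e-12
```

For 2-state chains the inequality is actually an equality (here δ(q₁q₂) = 0.24 = 0.6·0.4); in larger state spaces it is typically strict.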
Disintegration of the probabilities on product spaces

1.
Let (Ω,K,P) be a probability space. Recall the following result from the lesson Conditioning: for a real random variable Y and a sub-σ-algebra F ⊂ K, the regular conditioned distribution P∘Y⁻¹(·|F) exists. In other words there exists a transition probability Q from (Ω,F) to (ℝ,B(ℝ)) such that

(1.1) Q(ω,B) = P(Y⁻¹(B)|F)(ω) (a.s.) ∀B ∈ B(ℝ).

What is wrong with this Q? If X takes values in a more general measurable space (E,E), we would like to have a transition probability Q* from (Ω,F) to (E,E) such that

(1.2) Q*(ω,A) = P(X⁻¹(A)|F)(ω) (a.s.) ∀A ∈ E, for almost all ω.

Suppose E ⊂ ℝ carries the trace structure via an embedding i and Y = i∘X. If B₁ and B₂ are two Borel sets such that A = i⁻¹(B₁) = i⁻¹(B₂), then P(X⁻¹(A)|F) = P(X⁻¹(i⁻¹(B₁))|F) = P((i∘X)⁻¹(B₁)|F) = P(Y⁻¹(B₁)|F) = Q(·,B₁) (a.s.) and, in the same way, P(X⁻¹(A)|F) = P(Y⁻¹(B₂)|F) = Q(·,B₂) (a.s.), hence
(1.3) Q(·,B₁) = Q(·,B₂) (a.s.),

so Q*(·,A) := Q(·,i(A)) is well defined up to neglectable sets for every A in

E = {A ⊂ E : A = i⁻¹(B) for some B ∈ B(ℝ)}.

Proposition 1.1. Suppose that the measurable space (E,E) has the following property: there exists a mapping i : E → ℝ such that

(1.4) E = i⁻¹(B(ℝ)) and i(E) ∈ B(ℝ).

Let X : Ω → E be measurable. Then X has a regular conditioned distribution given F.

Proof. Let Y = i∘X, let Q be a regular version for P∘Y⁻¹(·|F) and define Q*(ω,A) = Q(ω,i(A)). If (A_n)_n ⊂ E are disjoint, then the sets (i(A_n))_n are disjoint, too. Indeed, the A_n are of the form i⁻¹(B_n) with B_n Borel sets. Replacing, if need may be, B_n with the new Borel sets B_n∩i(E), we may assume that (B_n)_n are disjoint as well. Then the sets i(A_n) = i(i⁻¹(B_n)) = B_n∩i(E) are disjoint. It follows that Q*(ω, ∪_{n=1}^∞ A_n) = Q(ω, ∪_{n=1}^∞ i(A_n)) = Σ_{n=1}^∞ Q(ω, i(A_n)) = Σ_{n=1}^∞ Q*(ω,A_n). The remaining requirements are checked in the same way.
Example 2. Fix an integer p ≥ 2 (for instance p = 10 or p = 2). Then any x ∈ (0,1) can be written as x = Σ_{n=1}^∞ d_n(x)/pⁿ, where the digits d_n(x) are integers from 0 to p−1. Imposing the condition that any x of the form x = kp⁻ⁿ be written with a finite set of digits (that is, denying the possibility of expansions of the form x = 0.c₁…c_naaaa… where a = p−1) this expansion is unique. Now consider the mapping i : (0,1)² → (0,1) defined by

(1.6) i(x,y) = d₁(x)/p + d₁(y)/p² + d₂(x)/p³ + d₂(y)/p⁴ + d₃(x)/p⁵ + d₃(y)/p⁶ + …

(on the odd positions the digits of x and on the even ones the digits of y); this function is one-to-one and measurable (since all the functions d_n are measurable). It is true that i is not onto, because in Range(i) there are no numbers z of the form z = 0.ac₂ac₄ac₆… with a = p−1, since we denied that possibility. However, the function j : (0,1) → [0,1)² defined by

(1.7) j(z) = (d₁(z)/p + d₃(z)/p² + d₅(z)/p³ + …, d₂(z)/p + d₄(z)/p² + d₆(z)/p³ + …)

has the obvious property that j(i(x,y)) = (x,y) ∀x,y ∈ (0,1), and it is measurable. This fact ensures the measurability of i⁻¹ : B := Range(i) → (0,1)² because of the following equality:

(1.8) (i⁻¹)⁻¹(C) = i(C) = j⁻¹(C)∩Range(i).

Indeed, z ∈ i(C) ⇒ z = i(u), u ∈ C ⇒ j(z) = j(i(u)) = u ∈ C ⇒ z ∈ j⁻¹(C)∩Range(i). Conversely, z ∈ j⁻¹(C)∩Range(i) ⇒ j(z) ∈ C and z = i(u) for some u ∈ (0,1)² ⇒ j(i(u)) ∈ C ⇒ u ∈ C, z = i(u) ⇒ z ∈ i(C). So the only problem is to check that Range(i) is a Borel set. But that is easy: its complement is the set of all the numbers z with the property that, starting from some n on, all the odd (or all the even) positions carry the digit a = p−1. Meaning that (0,1) \ Range(i) = ∪_{n=1}^∞ (O_n ∪ E_n) with O_n = ∩_{j>n} {z ∈ (0,1) : d_{2j−1}(z) = p−1} and E_n = ∩_{j>n} {z ∈ (0,1) : d_{2j}(z) = p−1}, a Borel set.
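The interleaving maps (1.6) and (1.7) are easy to play with in code. A minimal sketch truncated to finitely many base-10 digits (the truncation depth N and all function names are ours; with floating point, j∘i recovers (x,y) only up to roughly 10⁻⁵):

```python
# Digit interleaving of Example 2, truncated to N digits in base p = 10.
def digits(x, N, p=10):
    ds = []
    for _ in range(N):
        x *= p
        d = int(x)
        ds.append(d)
        x -= d
    return ds

def i_map(x, y, N=8, p=10):               # (1.6) truncated
    z, scale = 0.0, 1.0
    for dx, dy in zip(digits(x, N, p), digits(y, N, p)):
        scale /= p; z += dx * scale       # odd positions: digits of x
        scale /= p; z += dy * scale       # even positions: digits of y
    return z

def j_map(z, N=8, p=10):                  # (1.7) truncated
    ds = digits(z, 2 * N, p)
    x = sum(d / p ** (k + 1) for k, d in enumerate(ds[0::2]))
    y = sum(d / p ** (k + 1) for k, d in enumerate(ds[1::2]))
    return x, y

x, y = 0.375, 0.6875                      # finitely many decimal digits
xr, yr = j_map(i_map(x, y))
```

With exact (infinite) expansions j∘i would be the identity, which is the content of (1.7).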
Proof. Let B_j, j = 1,2, be Borel sets on the line isomorphic with E_j, and let f_j : E_j → B_j be the isomorphisms. Then f = (f₁,f₂) : E₁×E₂ → B₁×B₂ is an isomorphism, too. Let then i be the canonical embedding of B₁×B₂ into ℝ², let h : ℝ² → (0,1)² be an isomorphism (for instance h(x,y) = (h₀(x),h₀(y)) with h₀(x) = e⁻ˣ/(1+e⁻ˣ), the usual logistic-type function) and let ι : (0,1)² → Range(ι) ⊂ (0,1) be the isomorphism from Example 2. The composition φ := ι∘h∘i∘f is then an isomorphism from E₁×E₂ onto Range(φ).
2.
Let now E₁, E₂ be standard Borel spaces, X = (X₁,X₂) : Ω → E₁×E₂ measurable and μ = P∘X₁⁻¹. Recall that for a transition probability Q from E₁ to E₂,

(2.2) ∫f d(μ⊗Q) = ∫∫f(x,y)Q(x,dy)dμ(x).

Let F = σ(X₁) := X₁⁻¹(E₁). Then X₂ has a regular conditioned distribution of the form P(X₂ ∈ B₂|F) = Q*(·,B₂), where Q* is a transition probability from (Ω,σ(X₁)) to (E₂,E₂), because of Corollary 1.2. The fact that Q* is of the form Q*(ω,B) = Q(X₁(ω),B) for some other transition probability Q comes from the universality property studied at the lesson Conditioning.

Now all we have to do is to check the equality

(2.3) Ef(X) = ∫f d(μ⊗Q).

Indeed, Ef(X₁,X₂) = E(E(f(X₁,X₂)|F)) = ∫(∫f(X₁,y)Q(X₁,dy))dP (by the transport formula) = ∫∫f(x,y)Q(x,dy)dμ(x) = ∫f d(μ⊗Q) (by (2.2)!). For a simple function f = Σ_{i∈I} c_i1_{C_i} this reads Σ_{i∈I} c_i(μ⊗Q)(C_i) = Σ_{i∈I} c_iE(1_{C_i}(X)) = Ef(X), and the general case follows by the standard approximation.
3.
Let X = (X_j)_{1≤j≤n} be a random vector whose components X_j take values in standard Borel spaces (E_j,E_j).
where one takes the regular versions for the conditioned distributions. If we denote by Q_i these conditioned distributions (the precise meaning is: Q_i(X₁,…,X_i;B_{i+1}) = P(X_{i+1} ∈ B_{i+1}|X₁,X₂,…,X_i) (a.s.), i = 1,2,…,n−1) and we denote by μ the distribution of X₁, then one can write the not very precise relation (3.4) as

(3.5) P∘X⁻¹ = μ⊗Q₁⊗…⊗Q_{n−1}.

This product is to be understood as being computed in the prescribed order. We have no associativity rule yet.

If all the spaces are discrete (meaning that the E_j are at most countable and E_j = P(E_j), an obvious standard Borel space) then (3.4) says nothing more than the well known multiplication rule

(3.6) P(X₁=x₁,…,X_n=x_n) = P(X₁=x₁)P(X₂=x₂|X₁=x₁)…P(X_n=x_n|X₁=x₁,…,X_{n−1}=x_{n−1})

(of course, if the right member has sense) and (3.5) says the same thing using transition probabilities:

(3.7) P(X₁=x₁,…,X_n=x_n) = p(x₁)q₁(x₁;x₂)q₂(x₁,x₂;x₃)…q_{n−1}(x₁,x₂,…,x_{n−1};x_n)

where p(x₁) = μ({x₁}) and q_i(x₁,x₂,…,x_i;x_{i+1}) = Q_i(x₁,x₂,…,x_i;{x_{i+1}}) = P(X_{i+1}=x_{i+1}|X₁=x₁,…,X_i=x_i).
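The multiplication rule (3.7) is the computational content of the disintegration. A minimal sketch on a hypothetical 3-step discrete chain (the law p and kernels q1, q2 are illustrative choices of ours):

```python
from fractions import Fraction as Fr

# (3.7): P(X1=a, X2=b, X3=c) = p(a)·q1(a; b)·q2(a, b; c)
p = {0: Fr(1, 2), 1: Fr(1, 2)}                        # law of X1
q1 = {0: {0: Fr(1, 4), 1: Fr(3, 4)},                  # P(X2=·|X1=·)
      1: {0: Fr(2, 3), 1: Fr(1, 3)}}
q2 = {(0, 0): {0: Fr(1, 2), 1: Fr(1, 2)},             # P(X3=·|X1,X2)
      (0, 1): {0: Fr(1, 1), 1: Fr(0, 1)},
      (1, 0): {0: Fr(1, 3), 1: Fr(2, 3)},
      (1, 1): {0: Fr(1, 4), 1: Fr(3, 4)}}

def joint(x1, x2, x3):
    return p[x1] * q1[x1][x2] * q2[(x1, x2)][x3]

total = sum(joint(a, b, c)
            for a in (0, 1) for b in (0, 1) for c in (0, 1))
```

Because each kernel row is a probability, the joint masses sum to 1 automatically, which is the probabilistic meaning of (3.5).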
We want to define the associativity of the product (3.5). To do that, the first step is to define the precise meaning of Q₁⊗Q₂.

So, now n = 3. We can look at the product E₁×E₂×E₃ as being in fact E₁×(E₂×E₃). If we apply Proposition 2.1 for the standard Borel space E₂×E₃ and Proposition 2.1 from the lesson Transition probabilities we obtain

(3.8) P∘X⁻¹ = μ⊗Q ⇔ Ef(X) = ∫∫f(x,y,z)Q(x,d(y,z))dμ(x) if f is measurable, bounded,

where Q is a transition probability from E₁ to E₂×E₃ with the property that

(3.9) P∘X⁻¹ = (μ⊗Q₁)⊗Q₂ ⇔ Ef(X) = ∫∫∫f(x,y,z)Q₂(x,y;dz)Q₁(x;dy)dμ(x) (same f),

which should hold for any μ (μ = δ_x included). This leads us to define the product of Q₁ with Q₂ by the relation

(3.11) (Q₁⊗Q₂)(x,C) = ∫∫1_C(y,z)Q₂(x,y;dz)Q₁(x,dy), C ∈ E₂⊗E₃,

so that ∫∫1_C(y,z)Q₂(x,y;dz)Q₁(x,dy) =
∫1_C d[(Q₁⊗Q₂)(x)], and, by the usual steps, ∫f d[(Q₁⊗Q₂)(x)] = ∫∫f(y,z)Q₂(x,y;dz)Q₁(x,dy) for every measurable nonnegative or bounded f on E₂×E₃.

On the other hand, in the four-space setting let Q* = Q₂⊗Q₃. This is a transition probability from E₁×E₂ to E₃×E₄; therefore ∫f d[(Q₁⊗Q*)(x)] = ∫∫f(y,z)Q*(x,y;dz)Q₁(x,dy). It is the same kind of integral. With more natural notations both of them can be written via

(3.15) (Q₁Q₂)(x,B₃) := (Q₁⊗Q₂)(x,E₂×B₃) = ∫∫1_{B₃}(z)Q₂(x,y;dz)Q₁(x,dy) = ∫Q₂(x,y;B₃)Q₁(x,dy).

This is a transition probability from E₁ to E₃.
Proposition 3.2. The usual product is associative, too. Namely the following equalities hold:

(3.17) (μQ₁)Q₂ = μ(Q₁Q₂)

(3.18) (Q₁Q₂)Q₃ = Q₁(Q₂Q₃)

Proof. For (3.17), [(μQ₁)Q₂](B₃) = ∫Q₂(x₂,B₃)d(μQ₁)(x₂) = ∫∫Q₂(x₂,B₃)Q₁(x₁,dx₂)dμ(x₁) = ∫(Q₁Q₂)(x₁,B₃)dμ(x₁) = [μ(Q₁Q₂)](B₃), and,
continuing, E(f(X₃)) = E(∫f d(Q₁Q₂)(X₁)) = ∫∫∫f(z)Q₂(X₁,y;dz)Q₁(X₁,dy)dP; in the discrete case this is Σ_{y∈E₂} Σ_z f(z)q₁(x,y)q₂(x,y;z). For the general statement one works with functions f(X₁,…,X_n,x_{n+1}) on E₁×…×E_{n+1}; the set of those C for which (3.21) holds is a λ-system which contains the rectangles, hence all of E₁⊗…⊗E_{n+1}.
1. The standard normal density is

φ₀,₁(x) = (1/√(2π))e^{−x²/2}.

We denote by X ~ N(0,1) a variable with this density. The distribution function of X is

Φ(x) = N(0,1)((−∞,x]) = (1/√(2π))∫_{−∞}^x e^{−u²/2} du.

There exists no explicit formula for Φ, but it can be computed numerically. Due to the symmetry of the density φ₀,₁, it is easy to see that Φ(−x) = 1 − Φ(x).

If X = μ + σY with Y ~ N(0,1) and σ > 0, then F_X(x) = P(Y ≤ (x−μ)/σ) = Φ((x−μ)/σ), hence the density of X is

φ_X(x) = F_X′(x) = (1/σ)φ₀,₁((x−μ)/σ) = (1/(σ√(2π)))e^{−(x−μ)²/(2σ²)}.

We denote this density with φ_{μ,σ²} and the distribution of X with N(μ,σ²). Due to obvious reasons we read this distribution as "the normal with expectation μ and dispersion σ²". Its characteristic function is

(1.2) φ_X(t) = Ee^{it(μ+σY)} = e^{itμ}Ee^{itσY} = e^{itμ − t²σ²/2}.
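Φ is available numerically through the error function, and the symmetry Φ(−x) = 1 − Φ(x) can be checked directly. A minimal sketch using only the standard library:

```python
import math

# Φ(x) = (1 + erf(x/√2)) / 2; the symmetry Φ(−x) = 1 − Φ(x) follows
# from erf being odd.
def Phi(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

symmetry = [Phi(-x) + Phi(x) for x in (0.0, 0.5, 1.0, 2.5)]  # each ≈ 1
```

The familiar statistical value Φ(1.96) ≈ 0.975 is a convenient sanity check.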
2. Random vectors. Let X = (X_j)_{1≤j≤n} be a random vector with all components in L², and let EX = (EX_j)_{1≤j≤n} be its expectation. Let

(2.1) f(t) = E‖X − t‖₂² := Σ_{j=1}^n E(X_j − t_j)².

Then f(t) ≥ f(EX). In other words EX is the best constant which approximates X if the optimum criterion is L². Indeed,

f(t) = Σ_{j=1}^n (EX_j² − 2t_jEX_j + t_j²) = Σ_{j=1}^n (t_j − EX_j)² + Σ_{j=1}^n Var(X_j).
The analog of the variance is the matrix C = Cov(X) with the entries c_{i,j} = Cov(X_i,X_j), where

(2.2) Cov(X_i,X_j) = EX_iX_j − EX_iEX_j.

The reason is

Proposition 2.2. Let X be a random vector from L², C be its covariance matrix and t ∈ ℝⁿ. Then

(2.3) Var(t′X) = t′Ct.

Proof. Var(t′X) = E(t′X)² − (E(t′X))² = Σ_{1≤i,j≤n} t_it_jE(X_iX_j) − Σ_{1≤i,j≤n} t_it_jE(X_i)E(X_j) = Σ_{1≤i,j≤n} c_{i,j}t_it_j = t′Ct.

Remark 2.1. Any covariance matrix C is symmetric and non-negatively defined, since according to (2.3), t′Ct ≥ 0 ∀t ∈ ℝⁿ. We shall see that for any non-negatively defined matrix C there exists a random vector X having C as covariance matrix.

Remark 2.2. We know that, if X is a random variable, then Var(α + βX) = β²Var(X). The n-dimensional analog is

(2.3′) Cov(μ + AX) = ACov(X)A′.

Indeed, Cov(μ + AX) = Cov(AX) (the constants don't matter) and (Cov(AX))_{i,j} = E((AX)_i(AX)_j) − E((AX)_i)E((AX)_j) = E((Σ_{1≤r≤n} a_{i,r}X_r)(Σ_{1≤s≤n} a_{j,s}X_s)) − (Σ_{1≤r≤n} a_{i,r}EX_r)(Σ_{1≤s≤n} a_{j,s}EX_s) = Σ_{1≤r,s≤n} a_{i,r}a_{j,s}(Cov(X))_{r,s} = (ACov(X)A′)_{i,j}.
The standard normal distribution in ℝⁿ, N(0,I_n), has the density

(2.4) φ_{0,I_n}(x) = (2π)^{−n/2}e^{−x′x/2}

and, the components being independent N(0,1), the characteristic function φ_{N(0,I_n)}(t) = Ee^{it′X} = Π_{j=1}^n Ee^{it_jX_j} = e^{−t′t/2}.

Let now Y ~ N(0,I_k) and A be an n×k matrix. Let μ ∈ ℝⁿ. Consider the vector

(2.6) X = μ + AY.

Its expectation is μ and, applying (2.3′), its covariance is C = Cov(X) = ACov(Y)A′ = AA′ (since clearly Cov(Y) = I_k). Its characteristic function is

φ_X(t) = Ee^{it′X} = Ee^{it′(μ+AY)} = e^{it′μ}Ee^{i(A′t)′Y} = e^{it′μ}φ_Y(A′t) = e^{it′μ}e^{−t′AA′t/2} = e^{it′μ − t′Ct/2}.
The first interesting fact is that φ_X depends on C rather than on A. The second one is that C can be any non-negatively defined n×n matrix. Indeed, as one knows from linear algebra, any non-negatively defined matrix C can be written as C = ODO′ where O is an orthogonal matrix and D a diagonal one, with all the elements d_{j,j} non-negative. Let A = OΔ with Δ the diagonal matrix with Δ_{j,j} = √(d_{j,j}); then AA′ = OΔΔ′O′ = ODO′ = C. The distribution of X = μ + AY is denoted by N(μ,C); its characteristic function is

φ_X(t) = φ_{N(μ,C)}(t) = e^{it′μ − t′Ct/2}.

If C is invertible, N(μ,C) is absolutely continuous with the density

(2.8) φ_{μ,C}(x) = det(C)^{−1/2}(2π)^{−n/2}e^{−(x−μ)′C⁻¹(x−μ)/2}.
Proof. Let A be such that X = μ + AY, C = AA′. We choose A to be square and invertible. Then det(C) = det(AA′) = det(A)det(A′) = det²(A). Let f : ℝⁿ → ℝ be measurable and bounded. Then

Ef(X) = Ef(μ + AY) = ∫f(μ + AY)dP = ∫f(μ + Ay)dP∘Y⁻¹(y) = ∫f(μ + Ay)(2π)^{−n/2}e^{−y′y/2}dλⁿ(y).

Changing the variable x = μ + Ay, i.e. y = A⁻¹(x − μ), one sees that dλⁿ(y) = |D(y)/D(x)|dλⁿ(x) = |det(A)|⁻¹dλⁿ(x), so

Ef(X) = ∫f(x)(2π)^{−n/2}e^{−(A⁻¹(x−μ))′(A⁻¹(x−μ))/2}|det(A)|⁻¹dλⁿ(x)
= det(C)^{−1/2}∫f(x)(2π)^{−n/2}e^{−(x−μ)′A′⁻¹A⁻¹(x−μ)/2}dλⁿ(x)
= det(C)^{−1/2}∫f(x)(2π)^{−n/2}e^{−(x−μ)′C⁻¹(x−μ)/2}dλⁿ(x) = ∫f φ_{μ,C}dλⁿ,

since A′⁻¹A⁻¹ = (AA′)⁻¹ = C⁻¹; hence, by the transport formula, Ef(X) = ∫f dP∘X⁻¹ with P∘X⁻¹ = φ_{μ,C}λⁿ.

3.
If X ~ N(μ,C) and t ∈ ℝⁿ, then φ_{t′X}(s) = Ee^{is(t′X)} = e^{is(t′μ) − s²t′Ct/2}, so t′X ~ N(t′μ, t′Ct).

Property 3.3. Inside a Gaussian vector, "uncorrelated" implies "independent": if X ~ N(μ,C) and J ⊂ {1,…,n} is such that Cov(X_i,X_j) = 0 whenever i ∈ J, j ∉ J, then X_J = (X_i)_{i∈J} and X_{J^c} are independent.

Proof. Due to (iii) from Corollary 3.2 we may assume that J = {1,2,…,k}, hence J^c = {k+1,…,n}. If i ∈ J, j ∉ J then Cov(X_i,X_j) = r(X_i,X_j)σ(X_i)σ(X_j) = 0. Let Y = X_J and Z = X_{J^c}. We can write then X = (Y,Z). From (iv), Corollary 3.2, we know that Y and Z are normally distributed: the first one is Y ~ N(μ_J, C_J) and Z ~ N(μ_K, C_K) with K = J^c. Moreover, as i ∈ J, j ∈ K ⇒ Cov(X_i,X_j) = 0, it follows that C is block diagonal: C = diag(C_J, C_K). Let t ∈ ℝⁿ and write t = (t_J,t_K). It is easy to see that t′Ct = t_J′C_Jt_J + t_K′C_Kt_K, hence

φ_X(t) = e^{it_J′μ_J − t_J′C_Jt_J/2}·e^{it_K′μ_K − t_K′C_Kt_K/2} = φ_Y(t_J)φ_Z(t_K).

The uniqueness theorem says that if two distributions have the same characteristic function, they must coincide. It means that P∘(Y,Z)⁻¹ = (P∘Y⁻¹)⊗(P∘Z⁻¹) ⇔ Y and Z are independent.
Property 3.4. Convolution of normal distributions is normal. Precisely,

X₁ ~ N(μ₁,C₁), X₂ ~ N(μ₂,C₂), X₁ independent of X₂ ⇒ X₁ + X₂ ~ N(μ₁+μ₂, C₁+C₂)

(here it is understood that X₁ and X₂ have the same dimension!).

Proof. φ_{X₁}(t) = e^{it′μ₁ − t′C₁t/2} and φ_{X₂}(t) = e^{it′μ₂ − t′C₂t/2}. It follows that

φ_{X₁+X₂}(t) = φ_{X₁}(t)φ_{X₂}(t) = e^{it′(μ₁+μ₂) − t′(C₁+C₂)t/2}.
Let now (X_j)_{1≤j≤n} be i.i.d. random variables, let x̄ = x̄_n = (X₁ + X₂ + … + X_n)/n be their average (from the law of large numbers we know that x̄_n → EX₁; in statistics one calls x̄_n an estimator of the expectation) and let

s² := s_n²(X) = ((X₁ − x̄)² + (X₂ − x̄)² + … + (X_n − x̄)²)/(n − 1) = (Σ_{j=1}^n X_j² − n x̄²)/(n − 1).
Proof. Let us firstly suppose that X_j ~ N(0,1). Let us consider the n×n matrix A whose first row is (1/√n, 1/√n, …, 1/√n) and whose k-th row, for 2 ≤ k ≤ n, is

(1/√(k(k−1)), …, 1/√(k(k−1)), −(k−1)/√(k(k−1)), 0, …, 0)

(the first k−1 entries equal to 1/√(k(k−1)), the k-th entry equal to −(k−1)/√(k(k−1)), the remaining ones null). The reader is invited to check that A is orthogonal, that is, that AA′ = I_n. Let X = (X_j)_{1≤j≤n} and Y = AX. By (3.1), Y ~ N(0, AI_nA′) = N(0,I_n). Thus the Y_j are all independent, according to Property 3.3. So Y₁, Y₂², Y₃², …, Y_n² are independent, too. But Y₁ = √n·x̄_n. On the other hand,

Σ_{j=2}^n Y_j² = Σ_{j=1}^n Y_j² − Y₁² = Σ_{j=1}^n (AX)_j² − n x̄² = Σ_{j=1}^n (Σ_{k=1}^n a_{j,k}X_k)² − n x̄² = Σ_{1≤k,r≤n} (Σ_{j=1}^n a_{j,k}a_{j,r})X_kX_r − n x̄² = X′A′AX − n x̄² = X′X − n x̄² (since A′A = I_n!) = Σ_{j=1}^n X_j² − n x̄² = (n−1)s_n².

As √n·x̄_n = Y₁ and (n−1)s_n² = Y₂² + … + Y_n² are functions of disjoint families of independent variables, x̄_n and s_n² are independent. For general X_j ~ N(μ,σ²) apply the above to the variables (X_j − μ)/σ.
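The two computational facts in this proof — that A is orthogonal and that Y = AX satisfies Y₁ = √n·x̄ and Y₂² + … + Yₙ² = (n−1)s² — can be verified numerically. A minimal sketch in plain Python (the function names are ours):

```python
import math

# Helmert-type matrix from the proof: first row constant 1/√n, row k has
# k−1 entries 1/√(k(k−1)) followed by −(k−1)/√(k(k−1)) and zeros.
def helmert(n):
    A = [[1.0 / math.sqrt(n)] * n]
    for k in range(2, n + 1):
        c = 1.0 / math.sqrt(k * (k - 1))
        A.append([c] * (k - 1) + [-(k - 1) * c] + [0.0] * (n - k))
    return A

def AAt(A):
    n = len(A)
    return [[sum(A[i][k] * A[j][k] for k in range(n)) for j in range(n)]
            for i in range(n)]

n = 5
A = helmert(n)
prod = AAt(A)
orthogonal = all(abs(prod[i][j] - (1.0 if i == j else 0.0)) < 1e-12
                 for i in range(n) for j in range(n))

x = [2.0, -1.0, 0.5, 3.0, 1.5]            # arbitrary sample values
Y = [sum(A[i][k] * x[k] for k in range(n)) for i in range(n)]
xbar = sum(x) / n
s2 = sum((v - xbar) ** 2 for v in x) / (n - 1)
```

Here Y[0] equals √n·x̄ and the sum of the remaining squared coordinates equals (n−1)s², exactly as the proof claims.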
4.
Let Z₁,…,Z_n be random variables from L² and let

(4.1) H = {Σ_{j=1}^n α_jZ_j : α_j ∈ ℝ, 1 ≤ j ≤ n}

be the linear subspace of L² spanned by the Z_j. Let U ∈ L². We shall denote the orthogonal projection of U onto H by U*. Hence U* = Σ_{j=1}^n α_jZ_j and

(i). U − U* ⊥ Z_j ∀ 1 ≤ j ≤ n;
(ii). ⟨U,Z_k⟩ = Σ_{j=1}^n α_j⟨Z_j,Z_k⟩, 1 ≤ k ≤ n.

We shall suppose that all the variables Z_j are linearly independent (viewed as vectors in L²: Σ_{j=1}^n α_jZ_j = 0 ⇒ α₁ = … = α_n = 0).

(4.2) The coefficients (α_j)_j are the solution of the linear system (ii). The matrix G = (⟨Z_j,Z_k⟩)_{1≤j,k≤n} is called the Gram matrix. Remark that this matrix is invertible, since if t ∈ ℝⁿ then t′Gt = Σ_{1≤j,k≤n} ⟨Z_j,Z_k⟩t_jt_k = ‖Σ_{j=1}^n t_jZ_j‖₂² ≥ 0 and, as the Z_j are linearly independent, t′Gt = 0 only for t = 0.
Suppose now that (Y,Z) is a centered Gaussian vector, Z = (Z₁,…,Z_n), let Y* be the vector of the projections of the components of Y onto H and C the covariance matrix of Y − Y*. Computing the conditioned characteristic function one finds

φ_{Y|Z}(s) = e^{is′Y* − s′Cs/2} (a.s.).

For every ω this is the characteristic function of N(Y*(ω),C).

Remark. As a consequence, the regression function E(Y|Z) coincides with Y*. Indeed, by the transport formula 3.5, lesson Conditioning, E(Y|Z) is the integral with respect to P∘Y⁻¹(·|Z), i.e. with respect to N(Y*,C). And that is exactly Y*. It follows that the regression function is linear in Z. Remark also that the conditioned covariance matrix C does not depend on Z.

The restriction that all the Z_j be linearly independent is not serious and may be removed.
Corollary 4.2. If X = (Y,Z) is normally distributed, then the regular conditioned distribution P∘Y⁻¹(·|Z) is also normal.

Proof. Let k be the dimension of H. Choose k r.v.'s among the Z_j's which form a basis in H. Denote them by Z_{j₁}, Z_{j₂}, …, Z_{j_k}. Then the other Z_j are linear combinations of these k random variables, thus the σ-algebra σ(Z) is generated only by them. Let Z₀ be the vector (Z_{j₁}, Z_{j₂}, …, Z_{j_k}). It follows that P∘Y⁻¹(·|Z) = P∘Y⁻¹(·|Z₀), and the previous result applies.
Example: the bivariate case. Let X = (X₁,X₂) be a centered Gaussian vector with Var(X₁) = σ₁², Var(X₂) = σ₂² and correlation coefficient r, |r| < 1, so that

C = (σ₁², rσ₁σ₂; rσ₁σ₂, σ₂²), det C = σ₁²σ₂²(1 − r²),

and the inverse is

C⁻¹ = (1/(σ₁²σ₂²(1 − r²)))·(σ₂², −rσ₁σ₂; −rσ₁σ₂, σ₁²).

Then the characteristic function is φ_X(s) = e^{−s′Cs/2} and, by (2.8), the density is

(4.4) φ_{0,C}(x) = det(C)^{−1/2}(2π)⁻¹e^{−x′C⁻¹x/2} = (1/(2πσ₁σ₂√(1−r²)))·exp(−(x₁²/σ₁² − 2rx₁x₂/(σ₁σ₂) + x₂²/σ₂²)/(2(1−r²))).

In this case the projection of X₁ onto H is very simple: X₁* = aX₂ with a chosen such that ⟨X₁ − aX₂, X₂⟩ = 0 ⇔ rσ₁σ₂ = aσ₂² ⇔ a = rσ₁/σ₂. The covariance matrix from (4.3) becomes a positive number,

Var(X₁ − X₁*) = E(X₁ − X₁*)² = σ₁² − 2arσ₁σ₂ + a²σ₂² = σ₁²(1 − r²),

thus

(4.5) P∘X₁⁻¹(·|X₂) = N((rσ₁/σ₂)X₂, σ₁²(1 − r²)).

In the same way we see that

(4.6) P∘X₂⁻¹(·|X₁) = N((rσ₂/σ₁)X₁, σ₂²(1 − r²)).

If EX = (μ₁,μ₂) then, taking into account that X_j and X_j − μ_j generate the same σ-algebra, the formulae (4.4)–(4.6) become

(4.7) φ_{μ,C}(x) = (1/(2πσ₁σ₂√(1−r²)))·exp(−((x₁−μ₁)²/σ₁² − 2r(x₁−μ₁)(x₂−μ₂)/(σ₁σ₂) + (x₂−μ₂)²/σ₂²)/(2(1−r²)))

(4.8) P∘X₁⁻¹(·|X₂) = N(μ₁ + (rσ₁/σ₂)(X₂ − μ₂), σ₁²(1 − r²))
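The coefficients in (4.5) can be read off directly from the covariance matrix: the regression slope is C₁₂/C₂₂ and the residual variance is C₁₁ − C₁₂²/C₂₂. A minimal sketch with illustrative parameter values of our own choosing:

```python
# Bivariate normal conditioning: slope = C12/C22 = r·σ1/σ2 and
# residual variance = C11 − C12²/C22 = σ1²(1 − r²). Values are
# hypothetical.
s1, s2, r = 2.0, 0.5, -0.3
C = [[s1 * s1, r * s1 * s2],
     [r * s1 * s2, s2 * s2]]

slope = C[0][1] / C[1][1]                # regression coefficient of X1 on X2
resid_var = C[0][0] - C[0][1] ** 2 / C[1][1]
```

This matches the projection computation in the text: the slope is exactly a = rσ₁/σ₂ and the residual variance σ₁²(1 − r²), independently of the observed value of X₂.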
(4.9) P∘X₂⁻¹(·|X₁) = N(μ₂ + (rσ₂/σ₁)(X₁ − μ₁), σ₂²(1 − r²))

5.
Theorem 5.1. Let (X_n)_n be a sequence of i.i.d. random k-dimensional vectors from L². Let a = EX₁ and C = Cov(X₁). Then

(5.1) s_n := (X₁ + X₂ + … + X_n − na)/√n → N(0,C) in distribution.

Proof. We shall apply the convergence theorem for characteristic functions. Let Y_n = X_n − a, let φ be the characteristic function of Y₁ and φ_n be the characteristic function of s_n. Thus φ(t) = Ee^{it′(X₁−a)} and φ_n(t) = Ee^{it′s_n} = φⁿ(t/√n). We shall prove that φ_n(t) → φ_{N(0,C)}(t).

Let Z_n = t′Y_n. Then the random variables Z_n are i.i.d., from L², EZ_n = t′EY_n = 0 and Var(Z_n) = t′Ct. Using the usual CLT, (Z₁ + Z₂ + … + Z_n)/√n converges in distribution to N(0, t′Ct). Let ψ_n be the characteristic function of (Z₁ + Z₂ + … + Z_n)/√n. It is easy to see that ψ_n(1) = φ_n(t). But ψ_n(1) → φ_{N(0,t′Ct)}(1) = e^{−t′Ct/2} = φ_{N(0,C)}(t).
Corollary 5.2. Let X, Y be two i.i.d. random vectors from L² with the property that P∘X⁻¹ = P∘((X+Y)/√2)⁻¹. Then P∘X⁻¹ = N(0,C) for some covariance matrix C.

Proof. If X and (X+Y)/√2 have the same distribution, then EX = E((X+Y)/√2) = (2/√2)EX = √2·EX, hence EX = 0. Now let (X_n)_n be a sequence of i.i.d. random vectors having the same distribution as X. It is easy to prove by induction that (X₁ + X₂ + … + X_{2ⁿ})/√(2ⁿ) has the same distribution as X. (Indeed, for n = 1 it is our very assumption. Suppose it holds for n; check it for n + 1. So (X₁ + X₂ + … + X_{2ⁿ})/√(2ⁿ) and (X_{2ⁿ+1} + X_{2ⁿ+2} + … + X_{2ⁿ+2ⁿ})/√(2ⁿ) are i.i.d. and both have the distribution of X. Then

(1/√2)·((X₁ + … + X_{2ⁿ})/√(2ⁿ) + (X_{2ⁿ+1} + … + X_{2^{n+1}})/√(2ⁿ)) = (X₁ + X₂ + … + X_{2^{n+1}})/√(2^{n+1})

must have the same distribution, too.) But s_{2ⁿ} := (X₁ + X₂ + … + X_{2ⁿ})/√(2ⁿ) converges in distribution to N(0,C) by Theorem 5.1 (here a = 0), hence P∘X⁻¹ = N(0,C).
Proposition 5.3. Let X and Y be two i.i.d. random vectors. Suppose that X+Y and XY are again i.i.d. Then X N(0,C) for some covariance matrix C = Cov(X).
Proof. Let k be the dimension of X. Let t k . Then tX and tY are again i.i.d.
As X+Y and X-Y are i.i.d, it follows that tX + tY and tX tY are i.i.d.
Thats why we shall prove first our claim in the unidimensional case. That is, now
k = 1.
Let φ be the characteristic function of X. As X+Y and X−Y are independent, it follows that φ_{X+Y,X−Y}(s,t) = φ_{X+Y}(s)·φ_{X−Y}(t). On the other hand, φ_{X+Y,X−Y}(s,t) = E e^{is(X+Y)+it(X−Y)} = E e^{iX(s+t)+iY(s−t)} = E e^{i(s+t)X}·E e^{i(s−t)Y}, which is the same with

(5.2) φ(s+t)·φ(s−t) = φ²(s)·φ(t)·φ(−t) for all s, t.

On the other hand, X+Y and X−Y have the same distribution. It means that they have the same characteristic function. As φ_{X+Y}(t) = φ_X(t)φ_Y(t) = φ²(t) and φ_{X−Y}(t) = φ_X(t)φ_Y(−t) = φ(t)φ(−t), we infer that φ(t) = φ(−t) = conj(φ(t)) for all t. It follows that φ(t) is real for all t, hence (5.2) becomes

(5.3) φ(s+t)·φ(s−t) = φ²(s)·φ²(t) for all s, t.

If s = t, (5.3) becomes φ(2s)·φ(0) = φ⁴(s) ⇒ φ(2s) = φ⁴(s) ≥ 0 for all s. Thus φ is non-negative and φ(t) = φ(−t) for all t.

Let h = log φ. Then (5.3) becomes

(5.4) h(s+t) + h(s−t) = 2(h(s) + h(t)) for all s, t.

If in (5.4) we let t = 0, we get 2h(s) = 2(h(s) + h(0)) ⇒ h(0) = 0.
If in (5.4) we let s = 0, we get h(t) + h(−t) = 2(h(t) + h(0)) = 2h(t) ⇒ h(t) = h(−t).

Finally, replacing h with kh, we see that (5.4) remains the same. That is why we may assume that h(1) = 1. By induction one checks that h(n) = n² for every positive integer n. Indeed, for n = 0 or n = 1 this is true. Suppose it holds up to n; check it for n+1. Letting s = n, t = 1 in (5.4) we get

(5.5) h(n+1) + h(n−1) = 2(h(n) + h(1)) ⇒ h(n+1) + (n−1)² = 2n² + 2 ⇒ h(n+1) = (n+1)².

It follows that h(x) = x² for every integer x. Now set s = t. Then (5.4) becomes h(2t) = 4h(t). If 2t is an integer, we see that (2t)² = 4h(t) ⇒ h(t) = t². So the claim holds for halves of integers. Repeating the reasoning, the claim h(x) = x² holds for any number of the form x = m·2^{−n}, m, n integers. But the numbers of this form are dense and h is continuous, so the claim holds for any x. Remembering the constant k we get

(5.6) h(x) = kx² for all x.

On the other hand, φ ≤ 1 ⇒ h ≤ 0 ⇒ k ≤ 0 ⇒ k = −σ²/2 for some nonnegative σ. The conclusion is that

(5.7) φ(t) = exp(−σ²t²/2) for some σ ≥ 0.
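A quick numerical sanity check (illustration only) that the Gaussian characteristic function of (5.7) really satisfies the functional equation (5.3); the value of σ below is an arbitrary choice.

```python
import math

# Check that phi(t) = exp(-sigma^2 t^2 / 2) satisfies (5.3):
#     phi(s+t) * phi(s-t) = phi(s)^2 * phi(t)^2
sigma = 1.7
phi = lambda t: math.exp(-sigma**2 * t**2 / 2)

ok = True
for s in (-2.0, -0.3, 0.5, 1.1):
    for t in (-1.4, 0.0, 0.8, 2.2):
        lhs = phi(s + t) * phi(s - t)
        rhs = phi(s)**2 * phi(t)**2
        ok = ok and abs(lhs - rhs) < 1e-12
print(ok)  # True
```

The identity holds exactly because the exponents satisfy (s+t)² + (s−t)² = 2s² + 2t², which is precisely the quadratic equation (5.4) for h.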
II. STATISTICS
Basic Concepts
A function of random samples is called a statistic. The commonly-used statistics are listed as follows:

Sample Moments: (1/n) Σ_{i=1}^{n} ξ_i^k, k = 1, 2, … The first sample moment ξ̄ = (1/n) Σ_{i=1}^{n} ξ_i is also called the sample mean.

Sample Central Moments: (1/n) Σ_{i=1}^{n} (ξ_i − ξ̄)^k, k = 1, 2, …

Sample Variance: S_n² = (1/(n−1)) Σ_{i=1}^{n} (ξ_i − ξ̄)².

Note that the sample variance S² is different from the sample central moment of second order, (1/n) Σ_{i=1}^{n} (ξ_i − ξ̄)².
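The distinction between the two variance-like statistics can be made concrete with a small sketch (the data values are hypothetical):

```python
# Sample variance S_n^2 (divisor n-1) versus the second sample central
# moment (divisor n).
def sample_mean(xs):
    return sum(xs) / len(xs)

def sample_variance(xs):          # S_n^2, divisor n-1
    m = sample_mean(xs)
    return sum((x - m)**2 for x in xs) / (len(xs) - 1)

def central_moment(xs, k):        # (1/n) * sum (x_i - mean)^k
    m = sample_mean(xs)
    return sum((x - m)**k for x in xs) / len(xs)

xs = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
print(sample_mean(xs))            # 5.0
print(sample_variance(xs))        # 32/7
print(central_moment(xs, 2))      # 4.0
```

The two statistics differ by the factor n/(n−1); the divisor n−1 makes S_n² an unbiased estimator of the population variance.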
Theorem Let ξ_1, ξ_2, …, ξ_n, … be the random samples taken from the population ξ; then for every k for which E[ξ^k] exists,

P{ lim_{n→+∞} (1/n) Σ_{i=1}^{n} ξ_i^k = E[ξ^k] } = 1.

Proof: Note that ξ_1^k, ξ_2^k, …, ξ_n^k, … are independent and of the same distribution as that of ξ^k; it then follows from the strong law of large numbers that

P{ lim_{n→+∞} (1/n) Σ_{i=1}^{n} ξ_i^k = E[ξ^k] } = 1. #

Remark: The theorem shows that the sample average approximates the statistical average.
2. Sample Distributions
The distribution of a statistic is called a sample distribution.
2.1. 2 (Chi-Square)-Distribution
Definition A continuous random variable is said to be 2 (Chi-Square) distributed with n
degree of freedom if its density functions is as follows:
n 2 2 x2
x e
n
f (x ) = 2 n
2
x >0
others
as
f (x )dx = . The upward percentage point can be obtained by looking up the probability
(n )
table concerned.
Theorem If the random variable ξ has a χ²-distribution with n degrees of freedom, then μ = E[ξ] = n and σ² = E(ξ − Eξ)² = 2n.

Theorem If the random variables ξ_1, ξ_2, …, ξ_n are independent and of the same standard normal distribution N(0,1), then the random variable χ² = ξ_1² + ξ_2² + … + ξ_n² is distributed in accordance with the χ² (Chi-square) distribution with n degrees of freedom.
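The last two theorems can be illustrated together by a Monte Carlo sketch: sums of n squared standard normals should have mean n and variance 2n (the sample size and seed are arbitrary choices).

```python
import random

# Sum of n squared i.i.d. N(0,1) variables is chi-square(n):
# mean n, variance 2n.
random.seed(1)
n, reps = 5, 20000
vals = [sum(random.gauss(0, 1)**2 for _ in range(n)) for _ in range(reps)]
mean = sum(vals) / reps
var = sum((v - mean)**2 for v in vals) / reps
print(round(mean, 1), round(var, 1))  # close to n = 5 and 2n = 10
```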
Theorem If the random variables χ_1², χ_2², …, χ_k² are independent and possess χ²-distributions with n_1, n_2, …, n_k degrees of freedom respectively, then Σ_{i=1}^{k} χ_i² possesses the χ²-distribution with Σ_{i=1}^{k} n_i degrees of freedom.
2.2. t (Student)-Distribution

Definition A continuous random variable ξ is said to possess the so-called t (Student) distribution with n degrees of freedom if its density function is as follows:

f(x) = ( Γ((n+1)/2) / (√(nπ) Γ(n/2)) ) · (1 + x²/n)^{−(n+1)/2}.

For 0 < α < 1, the upward percentage point t_α(n) is defined by ∫_{t_α(n)}^{+∞} f(x) dx = α and can be obtained by looking up the probability table concerned.

Theorem If the random variable ξ has a t-distribution with n degrees of freedom, then μ = E[ξ] = 0 and σ² = E(ξ − Eξ)² = n/(n−2) for n > 2.
Theorem If the random variable ξ is distributed in accordance with the standard normal distribution N(0,1), the random variable η in accordance with the χ²-distribution with n degrees of freedom, and if ξ and η are independent of each other, then the random variable ξ/√(η/n) is distributed in accordance with the t-distribution with n degrees of freedom.
2.3. F-Distribution

Definition A continuous random variable ξ is said to possess the so-called F-distribution with m and n degrees of freedom if its density function is as follows:

f(x) = ( Γ((m+n)/2) / (Γ(m/2)Γ(n/2)) ) · (m/n)^{m/2} x^{m/2 − 1} (1 + (m/n)x)^{−(m+n)/2}, x > 0; f(x) = 0, others.

Remark 1: The degrees of freedom m and n are the only two parameters of the F-distribution.

Remark 2: For all 0 < α < 1, the value F_α(m, n), called the upward percentage point, is defined by ∫_{F_α(m,n)}^{+∞} f(x) dx = α. In fact, F_α(m, n) = 1/F_{1−α}(n, m): if ξ ~ F(m, n), then 1/ξ ~ F(n, m), and

1 − α = P{1/ξ > F_{1−α}(n, m)} = P{ξ < 1/F_{1−α}(n, m)} = 1 − P{ξ ≥ 1/F_{1−α}(n, m)}
⇒ α = P{ξ ≥ 1/F_{1−α}(n, m)} ⇒ F_α(m, n) = 1/F_{1−α}(n, m).

Theorem If the random variable ξ has an F-distribution with m and n degrees of freedom, then

μ = E[ξ] = n/(n−2) for n > 2, σ² = E(ξ − Eξ)² = 2n²(m + n − 2) / (m(n−2)²(n−4)) for n > 4.

Theorem If the random variable ξ possesses a χ²-distribution with m degrees of freedom, the random variable η a χ²-distribution with n degrees of freedom, and if ξ and η are independent of each other, then the random variable (ξ/m)/(η/n) possesses an F-distribution with m and n degrees of freedom.
3. Normal Populations

Theorem Suppose ξ_1, ξ_2, …, ξ_n are the random samples taken from a normal population N(μ, σ²), with ξ̄ = (1/n) Σ_{i=1}^{n} ξ_i the sample mean and S² = (1/(n−1)) Σ_{i=1}^{n} (ξ_i − ξ̄)² the sample variance; then

(1) ξ̄ and S² are independent of each other;
(2) ξ̄ ~ N(μ, σ²/n), (n−1)S²/σ² ~ χ²(n−1), (ξ̄ − μ)/(S/√n) ~ t(n−1).

Theorem Suppose the random samples ξ_1, …, ξ_n and η_1, …, η_m are taken from two independent normal populations N(μ_1, σ_1²) and N(μ_2, σ_2²) respectively, and let

ξ̄ = (1/n) Σ_{i=1}^{n} ξ_i, S_1² = (1/(n−1)) Σ_{i=1}^{n} (ξ_i − ξ̄)²,
η̄ = (1/m) Σ_{i=1}^{m} η_i, S_2² = (1/(m−1)) Σ_{i=1}^{m} (η_i − η̄)²,
S² = ((n−1)S_1² + (m−1)S_2²) / (n + m − 2);

then

(1) ((ξ̄ − η̄) − (μ_1 − μ_2)) / √(σ_1²/n + σ_2²/m) ~ N(0,1);
(2) (S_1²/σ_1²) / (S_2²/σ_2²) = ( ((n−1)S_1²/σ_1²)/(n−1) ) / ( ((m−1)S_2²/σ_2²)/(m−1) ) ~ F(n−1, m−1);
(3) ((ξ̄ − η̄) − (μ_1 − μ_2)) / (S√(1/n + 1/m)) ~ t(n + m − 2), if σ_1 = σ_2 = σ.
Parameter Estimation
1. Point Estimation

1.1. Point Estimators

Let ξ_1, ξ_2, …, ξ_n be the random samples taken from the same population characterized by a random variable ξ, and θ an unknown parameter appearing in the distribution of ξ. By point estimation we mean the attempt to look for a statistic g(ξ_1, ξ_2, …, ξ_n) to estimate the unknown parameter θ.

Mean Square Consistent Estimators An estimator g(ξ_1, ξ_2, …, ξ_n) for a parameter θ is said to be mean square consistent if lim_{n→+∞} E[(g(ξ_1, ξ_2, …, ξ_n) − θ)²] = 0.
Method of Moments (MOM) Estimators Let θ_1, θ_2, …, θ_m be the unknown parameters. Equate the first m sample moments to the corresponding population moments:

(1/n) Σ_{i=1}^{n} ξ_i^k = E[ξ(θ_1, θ_2, …, θ_m)^k], k = 1, 2, …, m.

This is a system of m equations with m unknowns, the solution to which is the so-called MOM estimators of θ_1, θ_2, …, θ_m.

Remark: The method of moments is motivated by the strong law of large numbers: the sample moments (1/n) Σ_{i=1}^{n} ξ_i^k converge almost surely to the population moments E[ξ^k].
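As a sketch of the recipe, take the normal population N(μ, σ²): equating the first two sample moments to E[ξ] = μ and E[ξ²] = σ² + μ² gives μ̂ = ξ̄ and σ̂² = (1/n)Σξ_i² − ξ̄² (the true parameter values and seed below are arbitrary test choices):

```python
import random

# MOM for N(mu, sigma^2): mu_hat = first sample moment,
# sigma2_hat = second sample moment - mu_hat^2.
random.seed(2)
mu, sigma = 3.0, 2.0
xs = [random.gauss(mu, sigma) for _ in range(50000)]

m1 = sum(xs) / len(xs)                 # first sample moment
m2 = sum(x * x for x in xs) / len(xs)  # second sample moment
mu_hat, sigma2_hat = m1, m2 - m1**2
print(round(mu_hat, 1), round(sigma2_hat, 1))  # close to 3.0 and 4.0
```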
Maximum Likelihood Estimators (MLE) The likelihood function of the samples is

L_{(θ_1, θ_2, …, θ_m)}(x_1, x_2, …, x_n) = Π_{i=1}^{n} f_{(θ_1, θ_2, …, θ_m)}(x_i) (continuous populations),
L_{(θ_1, θ_2, …, θ_m)}(x_1, x_2, …, x_n) = Π_{i=1}^{n} p_{(θ_1, θ_2, …, θ_m)}(x_i) (discrete populations),

and the MLE estimators are defined by

(θ̂_1, θ̂_2, …, θ̂_m) = argmax_{(θ_1, θ_2, …, θ_m)} L_{(θ_1, θ_2, …, θ_m)}(ξ_1, ξ_2, …, ξ_n).

In practice, if the derivatives of a likelihood function L_{(θ_1, …, θ_m)} with respect to the unknown parameters exist, one can obtain the MLE estimators from the solution to the following equations:

∂ ln L_{(θ_1, …, θ_m)} / ∂θ_i = 0, i = 1, 2, …, m.
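A worked sketch for the exponential density f(x) = λe^{−λx} (an example of mine, not from the text): ln L = n ln λ − λΣx_i, and setting d(ln L)/dλ = 0 gives λ̂ = n/Σx_i = 1/ξ̄.

```python
import math
import random

# MLE for the exponential density f(x) = lam * exp(-lam * x):
# lam_hat = 1 / sample mean.
random.seed(3)
lam = 2.5
xs = [random.expovariate(lam) for _ in range(100000)]

lam_hat = len(xs) / sum(xs)

def log_likelihood(l, xs):
    return len(xs) * math.log(l) - l * sum(xs)

# lam_hat maximizes the log-likelihood among nearby candidates
best = max((log_likelihood(l, xs), l)
           for l in (lam_hat * 0.9, lam_hat, lam_hat * 1.1))
print(round(lam_hat, 1), best[1] == lam_hat)  # close to 2.5, True
```

Since the exponential log-likelihood is strictly concave in λ, the stationary point is the global maximum, which the comparison above confirms numerically.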
2. Interval Estimation

Definition Let ξ_1, ξ_2, …, ξ_n be the random samples taken from the same population and θ an unknown parameter appearing in the population distribution. If for all 0 < α < 1 (usually small enough), one can determine two statistics a(ξ_1, ξ_2, …, ξ_n) and b(ξ_1, ξ_2, …, ξ_n) such that P{a < θ < b} = 1 − α, then (a, b) is called a confidence interval for θ with confidence level 1 − α.

Example Suppose ξ_1, ξ_2, …, ξ_n are the random samples taken from a normal population N(μ, σ²).

(1) (Estimation of μ) If the variance σ² is known, it follows from (ξ̄ − μ)/(σ/√n) ~ N(0,1) that for all 0 < α < 1,

P{ |ξ̄ − μ|/(σ/√n) < z_{α/2} } = 1 − α ⇒ P{ ξ̄ − z_{α/2}·σ/√n < μ < ξ̄ + z_{α/2}·σ/√n } = 1 − α.

(2) (Estimation of μ) If the variance is unknown, it follows from (ξ̄ − μ)/(S/√n) ~ t(n−1) that for all 0 < α < 1,

P{ |ξ̄ − μ|/(S/√n) < t_{α/2}(n−1) } = 1 − α ⇒ P{ ξ̄ − t_{α/2}(n−1)·S/√n < μ < ξ̄ + t_{α/2}(n−1)·S/√n } = 1 − α.

(3) (Estimation of σ²) It follows from (n−1)S²/σ² ~ χ²(n−1) that

P{ χ²_{1−α/2}(n−1) < (n−1)S²/σ² < χ²_{α/2}(n−1) } = 1 − α ⇒ P{ (n−1)S²/χ²_{α/2}(n−1) < σ² < (n−1)S²/χ²_{1−α/2}(n−1) } = 1 − α,

where z_{α/2}, t_{α/2}(n−1) and χ²_{α/2}(n−1) are the upward percentage points of the
corresponding distributions.
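The known-variance interval of case (1) can be sketched as follows; the data and σ are hypothetical, and z_{0.025} = 1.96 is the tabulated standard-normal percentage point.

```python
# Confidence interval for mu with known sigma:
#   ( xbar - z_{alpha/2} * sigma/sqrt(n),  xbar + z_{alpha/2} * sigma/sqrt(n) )
def z_interval(xs, sigma, z_half_alpha=1.96):
    n = len(xs)
    xbar = sum(xs) / n
    half = z_half_alpha * sigma / n**0.5
    return xbar - half, xbar + half

xs = [4.8, 5.1, 5.0, 4.9, 5.3, 5.2, 4.7, 5.0, 5.1]  # hypothetical data
lo, hi = z_interval(xs, sigma=0.2)
print(round(lo, 3), round(hi, 3))
```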
Example Suppose the random samples ξ_1, ξ_2, …, ξ_n are taken from a normal population N(μ_1, σ_1²), the random samples η_1, η_2, …, η_m are taken from another normal population N(μ_2, σ_2²), and the two populations are independent of each other.

(1) (Estimation of μ_1 − μ_2, variances known) It follows from ((ξ̄ − η̄) − (μ_1 − μ_2)) / √(σ_1²/n + σ_2²/m) ~ N(0,1) that

P{ |(ξ̄ − η̄) − (μ_1 − μ_2)| / √(σ_1²/n + σ_2²/m) < z_{α/2} } = 1 − α
⇒ P{ (ξ̄ − η̄) − z_{α/2}√(σ_1²/n + σ_2²/m) < μ_1 − μ_2 < (ξ̄ − η̄) + z_{α/2}√(σ_1²/n + σ_2²/m) } = 1 − α.

(2) (Estimation of μ_1 − μ_2, variances unknown but equal) It follows from ((ξ̄ − η̄) − (μ_1 − μ_2)) / (S√(1/n + 1/m)) ~ t(n + m − 2), where S² = ((n−1)S_1² + (m−1)S_2²)/(n + m − 2), that for all 0 < α < 1,

P{ |(ξ̄ − η̄) − (μ_1 − μ_2)| / (S√(1/n + 1/m)) < t_{α/2}(n + m − 2) } = 1 − α
⇒ P{ (ξ̄ − η̄) − t_{α/2}(n+m−2)·S√(1/n + 1/m) < μ_1 − μ_2 < (ξ̄ − η̄) + t_{α/2}(n+m−2)·S√(1/n + 1/m) } = 1 − α.

(3) (Estimation of σ_1²/σ_2²) It follows from (n−1)S_1²/σ_1² ~ χ²(n−1) and (m−1)S_2²/σ_2² ~ χ²(m−1) that

(S_1²/σ_1²) / (S_2²/σ_2²) = ( ((n−1)S_1²/σ_1²)/(n−1) ) / ( ((m−1)S_2²/σ_2²)/(m−1) ) ~ F(n−1, m−1),

which leads to

P{ F_{1−α/2}(n−1, m−1) < (S_1²/σ_1²)/(S_2²/σ_2²) < F_{α/2}(n−1, m−1) } = 1 − α
⇒ P{ (S_1²/S_2²)·(1/F_{α/2}(n−1, m−1)) < σ_1²/σ_2² < (S_1²/S_2²)·(1/F_{1−α/2}(n−1, m−1)) } = 1 − α.
Tests of Hypotheses

A statistical hypothesis H_0 is an assumption about the unknown parameters appearing in a population distribution or about the population distribution itself. A number of random samples ξ_1, ξ_2, …, ξ_n taken from the population are then used to make the probability P{H_0 is rejected | H_0 is true} as small as possible. This is realized in practice by setting up the equation P{H_0 is rejected | H_0 is true} = α for a given significance level 0 < α < 1.
Test of the hypothesis H_0: μ = μ_0 against the alternative H_1: μ < μ_0 (μ > μ_0) of the mean of a normal distribution with known variance σ².

If the hypothesis H_0: μ = μ_0 is true, then (ξ̄ − μ_0)/(σ/√n) ~ N(0,1), hence

P{H_0 is rejected | H_0 is true} = P{H_1: μ < μ_0 is accepted | H_0 is true} = P{ (ξ̄ − μ_0)/(σ/√n) < −z_α } = α,

which leads to the decision rule:

if (ξ̄ − μ_0)/(σ/√n) ≥ −z_α, then accept H_0: μ = μ_0; otherwise accept H_1: μ < μ_0;
if (ξ̄ − μ_0)/(σ/√n) ≤ z_α, then accept H_0: μ = μ_0; otherwise accept H_1: μ > μ_0.
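The one-sided decision rule can be sketched directly; the data, μ_0, σ, and the tabulated point z_{0.05} = 1.645 below are all hypothetical test inputs.

```python
# One-sided z-test of H0: mu = mu0 against H1: mu < mu0 (known sigma).
def z_test_less(xs, mu0, sigma, z_alpha=1.645):
    n = len(xs)
    xbar = sum(xs) / n
    stat = (xbar - mu0) / (sigma / n**0.5)
    return "H0" if stat >= -z_alpha else "H1: mu < mu0"

xs = [9.6, 9.8, 9.5, 9.7, 9.4, 9.6, 9.5, 9.9, 9.3]  # hypothetical data
print(z_test_less(xs, mu0=10.0, sigma=0.3))
```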
Test of the hypothesis H_0: μ = μ_0 against the alternative H_1: μ < μ_0 (μ > μ_0) of the mean of a normal distribution with unknown variance.

If the hypothesis H_0: μ = μ_0 is true, then (ξ̄ − μ_0)/(S/√n) ~ t(n−1), which leads to

P{H_0 is rejected | H_0 is true} = P{H_1: μ < μ_0 is accepted | H_0 is true} = P{ (ξ̄ − μ_0)/(S/√n) < −t_α(n−1) } = α,

and to the decision rule:

if (ξ̄ − μ_0)/(S/√n) ≥ −t_α(n−1), then accept H_0: μ = μ_0; otherwise accept H_1: μ < μ_0;
if (ξ̄ − μ_0)/(S/√n) ≤ t_α(n−1), then accept H_0: μ = μ_0; otherwise accept H_1: μ > μ_0.
Test of the hypothesis H_0: σ = σ_0 against the alternative H_1: σ ≠ σ_0 of the variance of a normal distribution.

If H_0 is true, then (n−1)S²/σ_0² ~ χ²(n−1), which leads to

P{H_0 is rejected | H_0 is true} = P{ (n−1)S²/σ_0² ≤ χ²_{1−α/2}(n−1) ∪ (n−1)S²/σ_0² ≥ χ²_{α/2}(n−1) } = α,

and to the decision rule: if χ²_{1−α/2}(n−1) < (n−1)S²/σ_0² < χ²_{α/2}(n−1), then accept H_0; otherwise accept H_1.

Test of the hypothesis H_0: σ = σ_0 against the alternative H_1: σ < σ_0 (σ > σ_0) of the variance of a normal distribution.

If H_0 is true, then (n−1)S²/σ_0² ~ χ²(n−1), which leads to

P{H_1: σ > σ_0 is accepted | H_0 is true} = P{ (n−1)S²/σ_0² > χ²_α(n−1) } = α,

and to the decision rule:

if (n−1)S²/σ_0² < χ²_α(n−1), then accept H_0: σ = σ_0; otherwise accept H_1: σ > σ_0;
if (n−1)S²/σ_0² > χ²_{1−α}(n−1), then accept H_0: σ = σ_0; otherwise accept H_1: σ < σ_0.
Test of the hypothesis H_0: μ_1 = μ_2 against the alternative H_1: μ_1 ≠ μ_2 of the means of two normal distributions with unknown but equal variances.

If H_0 is true, then (ξ̄ − η̄)/(S√(1/n_1 + 1/n_2)) ~ t(n_1 + n_2 − 2), where S² = ((n_1 − 1)S_1² + (n_2 − 1)S_2²)/(n_1 + n_2 − 2), which leads to the decision rule: if |ξ̄ − η̄|/(S√(1/n_1 + 1/n_2)) < t_{α/2}(n_1 + n_2 − 2), then accept H_0; otherwise accept H_1.

Test of the hypothesis H_0: μ_1 = μ_2 against the alternative H_1: μ_1 < μ_2 (μ_1 > μ_2).

If H_0 is true, then (ξ̄ − η̄)/(S√(1/n_1 + 1/n_2)) ~ t(n_1 + n_2 − 2), with the same pooled S², which leads to

P{ (ξ̄ − η̄)/(S√(1/n_1 + 1/n_2)) < −t_α(n_1 + n_2 − 2) } = α (respectively P{ … > t_α(n_1 + n_2 − 2) } = α),

and to the decision rule:

if (ξ̄ − η̄)/(S√(1/n_1 + 1/n_2)) ≥ −t_α(n_1 + n_2 − 2), then accept H_0; otherwise accept H_1: μ_1 < μ_2;
if (ξ̄ − η̄)/(S√(1/n_1 + 1/n_2)) ≤ t_α(n_1 + n_2 − 2), then accept H_0; otherwise accept H_1: μ_1 > μ_2.

Test of the hypothesis H_0: σ_1 = σ_2 against the alternative H_1: σ_1 ≠ σ_2 of the variances of two normal distributions.

If H_0 is true, then S_1²/S_2² = (S_1²/σ_1²)/(S_2²/σ_2²) ~ F(n_1 − 1, n_2 − 1), which leads to

P{H_0 is rejected | H_0 is true} = P{ S_1²/S_2² ≤ F_{1−α/2}(n_1 − 1, n_2 − 1) ∪ S_1²/S_2² ≥ F_{α/2}(n_1 − 1, n_2 − 1) } = α,

and to the decision rule: if F_{1−α/2}(n_1 − 1, n_2 − 1) < S_1²/S_2² < F_{α/2}(n_1 − 1, n_2 − 1), then accept H_0; otherwise accept H_1.

Test of the hypothesis H_0: σ_1 = σ_2 against the alternative H_1: σ_1 < σ_2 (σ_1 > σ_2).

If H_0 is true, then S_1²/S_2² ~ F(n_1 − 1, n_2 − 1), which leads to

P{H_1: σ_1 < σ_2 is accepted | H_0 is true} = P{ S_1²/S_2² < F_{1−α}(n_1 − 1, n_2 − 1) } = α,

and to the decision rule:

if S_1²/S_2² ≥ F_{1−α}(n_1 − 1, n_2 − 1), then accept H_0; otherwise accept H_1: σ_1 < σ_2;
if S_1²/S_2² ≤ F_α(n_1 − 1, n_2 − 1), then accept H_0; otherwise accept H_1: σ_1 > σ_2.
Introduction
1. Definition

Definition Let T be an index set. If for all t ∈ T, ξ_t is a random variable over the same probability space, then the collection of random variables {ξ_t, t ∈ T} is called a random process.

Remark 1: {ξ_t, t ∈ T} is called a discrete-time (discrete-parameter) random process if T is a countable (finite or denumerably infinite) set; {ξ_t, t ∈ T} is called a continuous-time random process if T is a continuum.

Remark 2: The set of all possible values the random variables of a process may take is called the state space of the process. The state space may be a continuum or a countable set.

Remark 3: There are four possible combinations for time and state of a random process: continuous-time and continuous-state, continuous-time and discrete-state, discrete-time and continuous-state, and discrete-time and discrete-state.
Definition A random process {ξ_t, −∞ < t < +∞} is said to be periodic with period T if for all t, P{ξ_{t+T} = ξ_t} = 1.
All these joint distributions constitute the family of finite-dimensional distributions of the
process.
(2) Consistency:

F(x_1, t_1; x_2, t_2; …; x_n, t_n) = F(x_1, t_1; x_2, t_2; …; x_n, t_n; +∞, t_{n+1}; …; +∞, t_{n+m}).
3. Mathematical Expectations

Definition Let {ξ_t, t ∈ T} be a random process; then its mean function is μ_t = E[ξ_t], its variance function is

σ_t² = E[(ξ_t − μ_t)²] = E[ξ_t²] − μ_t²,

its correlation function is R(t_1, t_2) = E[ξ_{t_1} ξ_{t_2}], and its covariance function is

cov(t_1, t_2) = E[(ξ_{t_1} − μ_{t_1})(ξ_{t_2} − μ_{t_2})] = R(t_1, t_2) − μ_{t_1} μ_{t_2}.

A process is said to be orthogonal if for all t_1 ∈ T and t_2 ∈ T with t_1 ≠ t_2, R(t_1, t_2) = E[ξ_{t_1} ξ_{t_2}] = 0.
4. Examples

4.1. Processes with Independent, Stationary or Orthogonal Increments

Definition (Independent Increments) A random process {ξ_t, t ∈ T} is said to have independent increments if for all t_1 < t_2 < … < t_n in T, the increments ξ_{t_2} − ξ_{t_1}, ξ_{t_3} − ξ_{t_2}, …, ξ_{t_n} − ξ_{t_{n−1}} are independent.

Example For a centered process with independent increments starting from ξ_0 = 0, one has, for t_1 ≤ t_2,

cov(t_2, t_1) = E[(ξ_{t_2} − Eξ_{t_2})(ξ_{t_1} − Eξ_{t_1})] = E[ξ_{t_2} ξ_{t_1}] = E[(ξ_{t_2} − ξ_{t_1} + ξ_{t_1}) ξ_{t_1}]
= E[(ξ_{t_2} − ξ_{t_1}) ξ_{t_1}] + E[ξ_{t_1}²] = E[ξ_{t_2} − ξ_{t_1}]·E[ξ_{t_1}] + E[ξ_{t_1}²] = E[ξ_{t_1}²] = σ_{t_1}². #
1. General Properties

Definition A random process {ξ_t, t ∈ T} is called a Markov process if for all t_1 < t_2 < … < t_k < t_{k+1},

F_{ξ_{t_{k+1}} | ξ_{t_k} … ξ_{t_1}}(x_{k+1}, t_{k+1} | x_k, t_k; x_{k−1}, t_{k−1}; …; x_1, t_1)
= P{ξ_{t_{k+1}} < x_{k+1} | ξ_{t_k} = x_k; ξ_{t_{k−1}} = x_{k−1}; …; ξ_{t_1} = x_1}
= P{ξ_{t_{k+1}} < x_{k+1} | ξ_{t_k} = x_k} = F_{ξ_{t_{k+1}} | ξ_{t_k}}(x_{k+1}, t_{k+1} | x_k, t_k).

Remark 1: The definition of a Markov process means that the future depends only on the present and has nothing to do with the past (history can tell nothing more about the future than the present does).

Remark 2: A Markov process is called a Markov chain if its state space is discrete.

Theorem Let {ξ_t, t ≥ 0} be an IID random process, i.e., for all 0 ≤ t_1 < t_2 < … < t_k < t_k + τ, the random variables ξ_{t_1}, ξ_{t_2}, …, ξ_{t_k}, ξ_{t_k+τ} are independent and identically distributed; then the process is a homogenous Markov process.
Proof:

P{ξ_{t_k+τ} < x | ξ_{t_k} = x_k; …; ξ_{t_1} = x_1} = P{ξ_{t_k+τ} < x; ξ_{t_k} = x_k; …; ξ_{t_1} = x_1} / P{ξ_{t_k} = x_k; …; ξ_{t_1} = x_1}
(independence) = P{ξ_{t_k+τ} < x}·P{ξ_{t_k} = x_k} ⋯ P{ξ_{t_1} = x_1} / (P{ξ_{t_k} = x_k} ⋯ P{ξ_{t_1} = x_1})
= P{ξ_{t_k+τ} < x} = P{ξ_{t_k+τ} < x; ξ_{t_k} = x_k} / P{ξ_{t_k} = x_k} = P{ξ_{t_k+τ} < x | ξ_{t_k} = x_k}.

This shows that the process is a Markov process. Furthermore, for all 0 ≤ t < t + τ,

P{ξ_{t+τ} < x | ξ_t = y} (independence) = P{ξ_{t+τ} < x} (identical distribution) = P{ξ_τ < x} = P{ξ_τ < x | ξ_0 = y},

so the process is homogenous. #
Theorem A process {ξ_t, t ≥ 0} with independent and stationary increments is a homogenous Markov process.

Proof:

P{ξ_{t_k+τ} < x | ξ_{t_k} = x_k; …; ξ_{t_2} = x_2; ξ_{t_1} = x_1}
= P{ξ_{t_k+τ} − ξ_{t_k} < x − x_k; ξ_{t_k} − ξ_{t_{k−1}} = x_k − x_{k−1}; …; ξ_{t_2} − ξ_{t_1} = x_2 − x_1; ξ_{t_1} = x_1}
 / P{ξ_{t_k} − ξ_{t_{k−1}} = x_k − x_{k−1}; …; ξ_{t_2} − ξ_{t_1} = x_2 − x_1; ξ_{t_1} = x_1}
(independent increments) = P{ξ_{t_k+τ} − ξ_{t_k} < x − x_k}
= P{ξ_{t_k+τ} − ξ_{t_k} < x − x_k; ξ_{t_k} = x_k} / P{ξ_{t_k} = x_k} = P{ξ_{t_k+τ} < x | ξ_{t_k} = x_k}.

This shows that the process is a Markov process. Furthermore, for all t ≥ 0,

P{ξ_{t+τ} < x | ξ_t = x_k} = P{ξ_{t+τ} − ξ_t < x − x_k; ξ_t = x_k} / P{ξ_t = x_k}
(independent increments) = P{ξ_{t+τ} − ξ_t < x − x_k} (stationary increments) = P{ξ_τ − ξ_0 < x − x_k} = P{ξ_τ < x | ξ_0 = x_k},

where we used {ω: ξ_{t+τ}(ω) < x, ξ_t(ω) = y} = {ω: ξ_{t+τ}(ω) − ξ_t(ω) < x − y, ξ_t(ω) = y}. #
Remark: From now on, discrete-time Markov chains appearing in this section are all assumed to be homogenous.

The 0-step transition probabilities are

p_xy^(0) = P{ξ_n = y | ξ_n = x} = 1 if x = y, 0 if x ≠ y.
Theorem (Chapman-Kolmogorov) p_xy^(n+k) = Σ_z p_xz^(k) p_zy^(n).

Proof:

p_xy^(n+k) = P{ξ_{n+k+m} = y | ξ_m = x} = P{ξ_{n+k+m} = y; ξ_m = x} / P{ξ_m = x}
= Σ_z P{ξ_{n+k+m} = y; ξ_{k+m} = z; ξ_m = x} / P{ξ_m = x}
= Σ_z P{ξ_{n+k+m} = y | ξ_{k+m} = z; ξ_m = x}·P{ξ_{k+m} = z; ξ_m = x} / P{ξ_m = x}
= Σ_z P{ξ_{n+k+m} = y | ξ_{k+m} = z}·P{ξ_{k+m} = z | ξ_m = x} = Σ_z p_xz^(k) p_zy^(n). #

Remark: From the Chapman-Kolmogorov theorem, one can conclude that k-step transition probabilities can be derived from the one-step transition probabilities. In fact,

p_xy^(2) = Σ_z p_xz p_zy, p_xy^(3) = Σ_z p_xz^(2) p_zy, …, p_xy^(k) = Σ_z p_xz^(k−1) p_zy.
z
Example If we let P = (p_xy) be the (possibly infinite) one-step transition matrix and P^(m) = (p_xy^(m)) the m-step transition matrix of the chain, then the theorem can be expressed in matrix form:

P^(m) = P^m.

In fact, from the Chapman-Kolmogorov theorem we have p_xy^(2) = Σ_z p_xz p_zy, which is the (x, y) entry of P·P, so P^(2) = P², and by induction P^(m) = P^m.
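A concrete sketch of P^(2) = P² for a hypothetical 3-state chain, comparing the matrix product with the Chapman-Kolmogorov sum entry by entry:

```python
# 2-step transition probabilities: matrix square vs. the
# Chapman-Kolmogorov sum p_xz * p_zy.
P = [[0.5, 0.3, 0.2],
     [0.1, 0.6, 0.3],
     [0.4, 0.0, 0.6]]

def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

P2 = matmul(P, P)
# entry (0, 2) directly from the Chapman-Kolmogorov sum
p02 = sum(P[0][z] * P[z][2] for z in range(3))
print(round(P2[0][2], 3), round(p02, 3))
```

Each row of P² still sums to 1, as it must for a stochastic matrix.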
Theorem Let {ξ_n, n = 0, 1, …} be a homogenous Markov chain; then the distribution of ξ_n can be expressed as

p^(n) = p P^(n) = p P^n,

where p = (p_x) is the row vector of initial probabilities p_x = P{ξ_0 = x} and p^(n) is the row vector of the distribution of ξ_n.

Remark: Recall that k-step transition probabilities can be derived from one-step transition probabilities; the theorem thus shows that the distribution of ξ_n is determined by the one-step transition probabilities together with the initial probabilities.
Theorem Let {ξ_n, n = 0, 1, …} be a homogenous Markov chain; then the joint distribution of ξ_{n_1}, ξ_{n_2}, …, ξ_{n_k}, ξ_{n_{k+1}} can be expressed as

P{ξ_{n_{k+1}} = x_{k+1}; ξ_{n_k} = x_k; …; ξ_{n_1} = x_1}
= P{ξ_{n_{k+1}} = x_{k+1} | ξ_{n_k} = x_k; …; ξ_{n_1} = x_1}·P{ξ_{n_k} = x_k; …; ξ_{n_1} = x_1}
= P{ξ_{n_{k+1}} = x_{k+1} | ξ_{n_k} = x_k}·P{ξ_{n_k} = x_k; …; ξ_{n_1} = x_1}
= p_{x_k x_{k+1}}^(n_{k+1} − n_k)·P{ξ_{n_k} = x_k; …; ξ_{n_1} = x_1}
= … = P{ξ_{n_1} = x_1}·p_{x_1 x_2}^(n_2 − n_1) ⋯ p_{x_{k−1} x_k}^(n_k − n_{k−1})·p_{x_k x_{k+1}}^(n_{k+1} − n_k).
2.2.1. Communication

Definition Two states x and y are said to communicate with each other if they are accessible from one another, often denoted by x ↔ y.

Hint (communication is an equivalence relation):

p_xx^(0) = P{ξ_0 = x | ξ_0 = x} = 1 > 0 (Reflexivity);
x → y means p_xy^(n) > 0 for some n, so x ↔ y ⇔ y ↔ x (Symmetry);
if p_xy^(n) > 0 and p_yz^(k) > 0, then by the Chapman-Kolmogorov equation p_xz^(n+k) ≥ p_xy^(n) p_yz^(k) > 0, so x ↔ y and y ↔ z imply x ↔ z (Transitivity).

Remark: Since communication is an equivalence relation, one can divide the state space into disjoint equivalence classes; the states in the same equivalence class communicate with each other, while the states belonging to different equivalence classes can't.
Definition A homogenous Markov chain is said to be irreducible if any two states of the
chain can communicate with each other.
2.2.2. Recurrence

Let f_xy^(k) be the probability that the chain starting from the state x reaches the state y for the first time after k steps. Furthermore, let f_xy = Σ_{k=1}^{+∞} f_xy^(k); f_xy is then the probability that the chain starting from the state x reaches the state y for the first time after some finite number of steps.

Remark: It follows from the definition of f_xy^(k) that for all n ≥ 1, p_xy^(n) = Σ_{k=1}^{n} f_xy^(k) p_yy^(n−k).

Definition A state x of a homogenous Markov chain is said to be recurrent if, after starting from it, the probability of returning to it after some finite number of steps is one, i.e., f_xx = 1. A state that is not recurrent is said to be transient.
Example Let

      a    b    c    d
  a [ 1/2  1/2  0    0  ]
  b [ 1/2  1/2  0    0  ]
  c [ 1/4  1/4  1/4  1/4 ]
  d [ 0    0    0    1  ]

be the one-step transition matrix of a Markov chain; then the states a, b and d are recurrent, while c is transient.
Theorem A state x is recurrent if and only if Σ_{n=1}^{+∞} p_xx^(n) = +∞.

Proof: From the remark above, p_xx^(n) = Σ_{k=1}^{n} f_xx^(k) p_xx^(n−k), hence for every N

Σ_{n=1}^{N} p_xx^(n) = Σ_{k=1}^{N} f_xx^(k) Σ_{t=0}^{N−k} p_xx^(t) ≤ ( Σ_{k=1}^{N} f_xx^(k) ) ( 1 + Σ_{t=1}^{N} p_xx^(t) ),

so that

( Σ_{n=1}^{N} p_xx^(n) ) / ( 1 + Σ_{t=1}^{N} p_xx^(t) ) ≤ Σ_{k=1}^{N} f_xx^(k) ≤ f_xx ≤ 1.

(1) Suppose Σ_{n=1}^{+∞} p_xx^(n) = +∞. Letting N → +∞, the left-hand side tends to 1; hence f_xx = 1, i.e., x is recurrent.

(2) Suppose Σ_{n=1}^{+∞} p_xx^(n) < +∞ and, towards a contradiction, that f_xx = 1. For fixed M and every N ≥ M,

Σ_{n=1}^{N} p_xx^(n) = Σ_{k=1}^{N} f_xx^(k) Σ_{t=0}^{N−k} p_xx^(t) ≥ Σ_{k=1}^{M} f_xx^(k) Σ_{t=0}^{N−M} p_xx^(t).

Letting N → +∞ and then M → +∞ gives, with S = Σ_{t=1}^{+∞} p_xx^(t) < +∞,

S ≥ f_xx (1 + S) = 1 + S,

which is impossible. Hence f_xx < 1, i.e., x is transient. #

Remark: If a state x is recurrent, the chain returns to x infinitely many times; if a state x is transient, the chain leaves x forever after returning to it finitely many times. Therefore, if the state space of a chain is finite, at least one of its states must be recurrent.
Theorem If x is recurrent and x → y, then y → x and y is recurrent.

Proof: The conclusion y → x is self-evident, for otherwise the chain could leave x via y without ever returning, and x would not be recurrent. Furthermore, choosing m, k with p_yx^(m) > 0 and p_xy^(k) > 0, we have p_yy^(m+n+k) ≥ p_yx^(m) p_xx^(n) p_xy^(k), so

Σ_{n=1}^{+∞} p_yy^(n) ≥ p_yx^(m) p_xy^(k) Σ_{n=1}^{+∞} p_xx^(n) = +∞

because x is recurrent; by the preceding theorem y is recurrent. #

Remark: Although a transient state can reach a recurrent state, a recurrent state cannot reach a transient state.
Theorem If a homogenous Markov chain with finite state space is irreducible, then all its
states are recurrent.
Proof:
Recall that a homogenous Markov chain with finite state space must have at least one recurrent state x. For every other state y, it follows from the irreducibility of the chain that x and y communicate with each other, and therefore y must also be recurrent. #
Definition A set A of states is said to be closed if no state inside A can lead to any state outside A, i.e., for all x ∈ A, y ∉ A and n ≥ 1, p_xy^(n) = 0.
Remark: The fact that A is closed does not exclude the possibility of a state outside A
reaching a state inside A.
Theorem Let R be the set of all recurrent states of a homogenous Markov chain; then

(1) R is closed.
(2) If a binary relation ~ is defined on R such that for all x, y ∈ R, x ~ y means x ↔ y, then the relation is an equivalence relation.

Hint: As we have proven in the preceding subsection, a recurrent state can't reach a transient state; thus R is closed.

Remark 1: Since the communication relation ~ on R is an equivalence relation, R can then be divided into disjoint equivalence classes R = R_1 + R_2 + …. It is clear that each of the equivalence classes is also closed.
Example Let

      a    b    c    d
  a [ 1/2  1/2  0    0  ]
  b [ 1/2  1/2  0    0  ]
  c [ 1/4  1/4  1/4  1/4 ]
  d [ 0    0    0    1  ]

be the one-step transition matrix of a Markov chain; then the states a, b and d are recurrent, while c is transient. The state space S = {a, b, c, d} can be decomposed as

S = T + R_1 + R_2,

where T = {c}, R_1 = {a, b} and R_2 = {d}.
Definition A state x of a homogenous Markov chain is said to have period T > 1 if p_xx^(n) = 0 whenever n is not a multiple of T, and T is the largest positive integer with this property. A state that is not periodic is said to be aperiodic.
Remark: One should tell the difference between the periodicity of a random process and that
of a state of the process.
Definition Let p_ij be the one-step transition probabilities of a homogenous Markov chain; a discrete distribution (π_i) is called a stationary distribution of the chain if

Σ_i π_i p_ij = π_j for all j.

Remark: Here π_i ≥ 0 and Σ_i π_i = 1.

Definition A finite-state homogenous Markov chain is said to be regular if all entries of P^n are positive when n is large enough.

Theorem (Ergodic Theorem) If a finite-state homogenous Markov chain is regular, then the chain is ergodic and its limit distribution is also its stationary distribution.
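For a regular chain the distribution p P^n converges to the stationary distribution regardless of the initial distribution; a sketch with a hypothetical 2-state chain, where π solves π P = π in closed form:

```python
# Iterating p -> p*P for a regular chain converges to the stationary
# distribution pi, which satisfies pi = pi * P.
P = [[0.9, 0.1],
     [0.4, 0.6]]

def step(p, P):
    return [sum(p[i] * P[i][j] for i in range(len(p)))
            for j in range(len(P[0]))]

p = [1.0, 0.0]          # any initial distribution works
for _ in range(200):
    p = step(p, P)

# exact stationary distribution: 0.1*pi0 = 0.4*pi1, pi0 + pi1 = 1
print(round(p[0], 4), round(p[1], 4))  # 0.8 0.2
```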
Example (Random Walk) Let {ξ_n, n = 0, 1, …} be a random process such that ξ_n indicates the location of the particle at the moment n; we will then address the following issue: is the process a homogeneous Markov chain?

Let η_1, η_2, …, η_m, … be the random variables such that {η_m = 1} indicates the event that the particle moves one step forwards at the moment m and {η_m = −1} the event that it moves one step backwards, with P{η_m = 1} = p and P{η_m = −1} = q = 1 − p. Then ξ_n = Σ_{m=1}^{n} η_m + k_0, where k_0 is the initial location of the particle. Note that η_1, η_2, …, η_m, … are independent, so the process has independent and stationary increments and is therefore a homogenous Markov chain. Moreover,

P{ξ_n = k} = P{ Σ_{m=1}^{n} η_m = k − k_0 } = C(n, (n+k−k_0)/2) p^{(n+k−k_0)/2} q^{(n−k+k_0)/2},

and the one-step transition probabilities are

P{ξ_{n+1} = j | ξ_n = i} = p if j = i+1, q if j = i−1, 0 others, n ≥ 0.
Remark (diagonalization):

Ax = λx ⇔ (A − λI)x = 0 ⇔ |A − λI| = 0; if A has n linearly independent eigenvectors x_1, …, x_n, then X^{-1}AX = Λ = diag(λ_1, λ_2, …, λ_n), where X = (x_1, x_2, …, x_n), hence A = XΛX^{-1} and A^n = XΛ^n X^{-1}.

Example Let A = [ 1−a  a ; b  1−b ], where 0 < a, b < 1.

(1) The eigenvalues and eigenvectors of A are given as follows:

|A − λI| = (1−a−λ)(1−b−λ) − ab = (1−λ)² − (a+b)(1−λ) = 0 ⇒ λ_1 = 1, λ_2 = 1 − a − b;
Ax = λ_1 x ⇒ x_1 = (1, 1)'; Ax = λ_2 x ⇒ x_2 = (a, −b)';
X = (x_1, x_2) = [ 1  a ; 1  −b ], X^{-1} = (1/(a+b)) [ b  a ; 1  −1 ].

(2) It follows from A = X diag(1, 1−a−b) X^{-1} that

A^n = X diag(1, (1−a−b)^n) X^{-1} = (1/(a+b)) [ b  a ; b  a ] + ((1−a−b)^n/(a+b)) [ a  −a ; −b  b ]

→ (1/(a+b)) [ b  a ; b  a ] as n → +∞.
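A numerical check of the closed form for A^n at illustrative values of a, b and n:

```python
# Compare repeated matrix multiplication with the closed form
#   A^n = (1/(a+b)) [[b, a], [b, a]]
#         + ((1-a-b)^n/(a+b)) [[a, -a], [-b, b]].
a, b = 0.3, 0.2

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

A = [[1 - a, a], [b, 1 - b]]
An = [[1.0, 0.0], [0.0, 1.0]]
n = 7
for _ in range(n):
    An = matmul(An, A)

lam = (1 - a - b)**n
closed = [[(b + lam * a) / (a + b), (a - lam * a) / (a + b)],
          [(b - lam * b) / (a + b), (a + lam * b) / (a + b)]]
err = max(abs(An[i][j] - closed[i][j]) for i in range(2) for j in range(2))
print(err < 1e-12)  # True
```

Since |1 − a − b| < 1, the second term vanishes geometrically, which is exactly the convergence to the stable matrix stated above.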
Theorem (Chapman-Kolmogorov Equation) Let {ξ_t, t ∈ T} be a homogenous continuous-time Markov chain and p_ij(τ) = P{ξ_{t+τ} = j | ξ_t = i}; then

p_ij(τ + υ) = P{ξ_{t+τ+υ} = j | ξ_t = i} = Σ_k P{ξ_{t+τ+υ} = j; ξ_{t+τ} = k; ξ_t = i} / P{ξ_t = i}
= Σ_k P{ξ_{t+τ+υ} = j | ξ_{t+τ} = k; ξ_t = i}·P{ξ_{t+τ} = k | ξ_t = i} = Σ_k p_ik(τ) p_kj(υ).

Definition A homogenous continuous-time Markov chain is said to be random-continuous if

lim_{τ→0+} p_ij(τ) = lim_{τ→0+} P{ξ_{t+τ} = j | ξ_t = i} = δ_ij = 1 if i = j, 0 if i ≠ j.

Remark: Random continuity means that the chain cannot change from one state to another in no time. From now on, homogenous continuous-time Markov chains in this section are all assumed to be random-continuous.
Definition The transition rates of the chain are defined, when the limits exist, as

q_ij = lim_{τ→0+} p_ij(τ)/τ < +∞ for i ≠ j, q_ii = lim_{τ→0+} (p_ii(τ) − 1)/τ.

Remark 1: q_ij is called the transition rate from state i to state j, which plays the same role as that of the one-step transition probability in the case of discrete-time Markov chains.

Remark 2: q_ij can be uniformly expressed as

q_ij = lim_{τ→0+} (p_ij(τ) − δ_ij)/τ.

Definition A chain is said to be conservative if Σ_j q_ij = 0, i.e., −q_ii = Σ_{j≠i} q_ij.

Remark: It can be proven that finite-state random-continuous Markov chains are conservative. In fact,

Σ_j q_ij = Σ_j lim_{τ→0+} (p_ij(τ) − δ_ij)/τ = lim_{τ→0+} (Σ_j p_ij(τ) − 1)/τ = lim_{τ→0+} (1 − 1)/τ = 0.
Theorem (Kolmogorov Equations) For a conservative chain,

(forward)  dp_ij(τ)/dτ = Σ_k p_ik(τ) q_kj, τ ≥ 0;
(backward) dp_ij(τ)/dτ = Σ_k q_ik p_kj(τ), τ ≥ 0.

Proof:

dp_ij(τ)/dτ = lim_{Δτ→0} (p_ij(τ + Δτ) − p_ij(τ))/Δτ = lim_{Δτ→0} (Σ_k p_ik(τ) p_kj(Δτ) − p_ij(τ))/Δτ
= Σ_k p_ik(τ) lim_{Δτ→0} (p_kj(Δτ) − δ_kj)/Δτ = Σ_k p_ik(τ) q_kj,

and similarly, expanding p_ij(τ + Δτ) = Σ_k p_ik(Δτ) p_kj(τ),

dp_ij(τ)/dτ = Σ_k lim_{Δτ→0} (p_ik(Δτ) − δ_ik)/Δτ · p_kj(τ) = Σ_k q_ik p_kj(τ). #

Remark: The Kolmogorov equations are ordinary differential equations of the first order, which can be solved as long as the transition rates q_ij and the initial transition probabilities p_ij(0) are given. Note that p_ij(0) = δ_ij if the process is random-continuous.
Example (A Two-State Chain) Consider a chain with states 0 and 1 whose holding time in state 0 is exponential with rate λ and in state 1 exponential with rate μ; find the transition probabilities p_ij(τ).

Solution:

Transition rates. Suppose the chain has stayed at state 0 for some time t; then

p_01(Δt) = P{ξ_{t+Δt} = 1 | ξ_t = 0} = P{τ_0 < t + Δt | τ_0 ≥ t} = λΔt + o(Δt),
q_01 = lim_{Δt→0+} p_01(Δt)/Δt = λ, q_00 = −q_01 = −λ.

Suppose the chain has stayed at state 1 for some time t; then

p_10(Δt) = P{ξ_{t+Δt} = 0 | ξ_t = 1} = μΔt + o(Δt), q_10 = μ, q_11 = −q_10 = −μ.

Kolmogorov forward equations. Using p_i0(τ) + p_i1(τ) = 1,

p'_i0(τ) = Σ_k p_ik(τ) q_k0 = −λ p_i0(τ) + μ p_i1(τ) = −(λ+μ) p_i0(τ) + μ [p_i0(τ) + p_i1(τ)] = −(λ+μ) p_i0(τ) + μ,
p'_i1(τ) = Σ_k p_ik(τ) q_k1 = λ p_i0(τ) − μ p_i1(τ) = −(λ+μ) p_i1(τ) + λ.

Multiplying the first equation by e^{(λ+μ)τ} gives d(e^{(λ+μ)τ} p_i0(τ))/dτ = μ e^{(λ+μ)τ}, hence

p_i0(τ) = μ/(λ+μ) + C e^{−(λ+μ)τ};

with p_00(0) = 1 we get C = λ/(λ+μ), so p_00(τ) = μ/(λ+μ) + (λ/(λ+μ)) e^{−(λ+μ)τ};
with p_10(0) = 0 we get C = −μ/(λ+μ), so p_10(τ) = μ/(λ+μ) − (μ/(λ+μ)) e^{−(λ+μ)τ}.

Likewise p_i1(τ) = λ/(λ+μ) + C e^{−(λ+μ)τ}, with

p_01(τ) = λ/(λ+μ) − (λ/(λ+μ)) e^{−(λ+μ)τ} (from p_01(0) = 0),
p_11(τ) = λ/(λ+μ) + (μ/(λ+μ)) e^{−(λ+μ)τ} (from p_11(0) = 1). #
Theorem (Fokker-Planck Equations) Let p_i(t) = P{ξ_t = i}; then

dp_j(t)/dt = Σ_k p_k(t) q_kj.

Proof:

dp_j(t)/dt = (d/dt) Σ_i P{ξ_t = j; ξ_0 = i} = Σ_i p_i(0) dp_ij(t)/dt
= Σ_i p_i(0) Σ_k p_ik(t) q_kj = Σ_k ( Σ_i p_i(0) p_ik(t) ) q_kj = Σ_k p_k(t) q_kj. #

Remark: Again, the Fokker-Planck equations are ordinary differential equations of the first order and can be solved as long as the transition rates q_ij as well as the initial probabilities p_j(0) are given.
1.4. Ergodicity

Definition A Markov chain {ξ_t, t ∈ T} is said to be ergodic if for all possible states i and j the limit lim_{τ→+∞} p_ij(τ) = π_j exists, does not depend on i, and satisfies 0 ≤ π_j ≤ 1 and Σ_j π_j = 1.

Remark: From 0 ≤ p_ij(τ) ≤ 1 we have 0 ≤ π_j ≤ 1, and (for a finite-state chain) from Σ_j p_ij(τ) = 1 we have

Σ_j π_j = lim_{τ→+∞} Σ_j p_ij(τ) = 1.

This means that (π_j) is a discrete distribution, which we often call the limiting probabilities of the chain.

Theorem For a finite-state Markov chain, if it is regular, i.e., there is a time period τ such that for all possible states i and j, p_ij(τ) > 0, then it is ergodic.

Remark: If a finite-state random-continuous Markov chain is irreducible, i.e., any two states of the chain can communicate with each other, then it is regular and therefore ergodic.
Theorem If a finite-state Markov chain is ergodic, its Kolmogorov forward equations reduce to linear equations when time τ is large enough.

Hint: From lim_{τ→+∞} p_ij(τ) = π_j,

lim_{τ→+∞} p'_ij(τ) = lim_{τ→+∞} lim_{Δτ→0+} (p_ij(τ+Δτ) − p_ij(τ))/Δτ = 0,

and from p'_ij(τ) = Σ_k p_ik(τ) q_kj we get Σ_k π_k q_kj = 0.

Theorem If a finite-state Markov chain is ergodic, its Fokker-Planck equations reduce to the same linear equations when time t is large enough.

Hint: p'_j(t) = Σ_k p_k(t) q_kj, p'_j(t) → 0 and p_k(t) → π_k, so Σ_k π_k q_kj = 0.

Remark: When the chain is ergodic, its Kolmogorov forward equations and Fokker-Planck equations approximate to the same system of linear equations.

1.5. Birth and Death Processes

A birth and death process is a continuous-time Markov chain with states 0, 1, 2, … and q_ij = 0 for |i − j| > 1.

Remark: The transition rates λ_i = q_{i,i+1} are often called birth rates and μ_i = q_{i,i−1} death rates. It follows from Σ_j q_ij = 0 that q_ii = −(λ_i + μ_i). The Kolmogorov backward equations become

p'_ij(τ) = Σ_k q_ik p_kj(τ) = μ_i p_{i−1,j}(τ) − (λ_i + μ_i) p_ij(τ) + λ_i p_{i+1,j}(τ),

and in the ergodic case the limiting probabilities satisfy

λ_{j−1} π_{j−1} − (λ_j + μ_j) π_j + μ_{j+1} π_{j+1} = 0.
Example If a birth and death process with states 0, 1, …, m is ergodic, it follows from the Fokker-Planck equations that

−λ_0 π_0 + μ_1 π_1 = 0,
λ_{j−1} π_{j−1} − (λ_j + μ_j) π_j + μ_{j+1} π_{j+1} = 0, j = 1, …, m−1,
λ_{m−1} π_{m−1} − μ_m π_m = 0.

Adding the first j+1 equations gives λ_j π_j = μ_{j+1} π_{j+1} for j = 0, 1, …, m−1, hence

π_{j+1} = (λ_j/μ_{j+1}) π_j = π_0 Π_{i=0}^{j} λ_i/μ_{i+1}, j = 0, 1, …, m−1,

and from Σ_j π_j = 1,

π_0 = 1 / ( 1 + Σ_{j=0}^{m−1} Π_{i=0}^{j} λ_i/μ_{i+1} ),
π_{j+1} = ( Π_{i=0}^{j} λ_i/μ_{i+1} ) / ( 1 + Σ_{j=0}^{m−1} Π_{i=0}^{j} λ_i/μ_{i+1} ), j = 0, 1, …, m−1.
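The product formula can be checked directly against the detailed-balance relations λ_j π_j = μ_{j+1} π_{j+1}; the rates below are hypothetical:

```python
# Stationary distribution of a birth-death chain:
#   pi_{j+1} = pi_0 * prod_{i=0..j} lambda_i / mu_{i+1}.
lams = [1.0, 2.0, 1.5]        # birth rates lambda_0..lambda_2
mus = [0.0, 2.0, 1.0, 3.0]    # death rates mu_1..mu_3 at indices 1..3

prods = [1.0]
for j in range(len(lams)):
    prods.append(prods[-1] * lams[j] / mus[j + 1])

total = sum(prods)
pi = [p / total for p in prods]

# detailed balance: lambda_j * pi_j = mu_{j+1} * pi_{j+1}
balanced = all(abs(lams[j] * pi[j] - mus[j + 1] * pi[j + 1]) < 1e-12
               for j in range(len(lams)))
print(balanced, round(sum(pi), 6))  # True 1.0
```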
1.6. Poisson Processes

Remark: A counting process is a continuous-time and discrete-state process, which is often used to represent the total number of events that have occurred up to time t, i.e., within the interval [0, t].

Definition A counting process {ξ_t, t ≥ 0} is called a Poisson process having rate λ > 0 if (1) ξ_0 = 0, (2) the process has independent increments, and (3) the number of events in any interval of length t is Poisson distributed with mean λt, i.e., for all s, t ≥ 0,

P{ξ_{s+t} − ξ_s = n} = ((λt)^n / n!) e^{−λt}, n = 0, 1, 2, …

Remark: It immediately follows from the condition (3) that the increments of a Poisson process are stationary. Moreover, for small Δτ,

P{ξ_{t+Δτ} − ξ_t = 1} = λΔτ e^{−λΔτ} = λΔτ + o(Δτ),
P{ξ_{t+Δτ} − ξ_t ≥ 2} = 1 − P{ξ_{t+Δτ} − ξ_t = 0} − P{ξ_{t+Δτ} − ξ_t = 1} = 1 − e^{−λΔτ} − λΔτ e^{−λΔτ} = o(Δτ).
Theorem A counting process { t t 0} is a Poisson process having rate > 0 if and only if it
satisfies the following conditions:
(1) 0 = 0
(2) the process has independent and stationary increments
(3) for all t, P{ξ_{t+Δt} − ξ_t = 1} = λΔt + o(Δt) and P{ξ_{t+Δt} − ξ_t ≥ 2} = o(Δt).
Proof:
If {ξ_t, t ≥ 0} is a Poisson process, the conditions (1)-(3) are clearly satisfied. We now prove that the conditions (1)-(3) are sufficient for {ξ_t, t ≥ 0} to be a Poisson process. For convenience, we denote by P_n(t) = P{ξ_t = n} the probability of occurrence of n events within the interval [0, t].

From

P_0(t+h) = P{ξ_{t+h} = 0} = P{ξ_t = 0; ξ_{t+h} − ξ_t = 0}
(independent increments) = P{ξ_t = 0}·P{ξ_{t+h} − ξ_t = 0}
(stationary increments) = P_0(t)·P{ξ_h − ξ_0 = 0}
(condition (3)) = P_0(t)[1 − λh + o(h)],

we get P'_0(t) = lim_{h→0} (P_0(t+h) − P_0(t))/h = −λP_0(t), hence P_0(t) = Ce^{−λt}; P_0(0) = P{ξ_0 = 0} = 1 gives C = 1 and

P_0(t) = e^{−λt}.

For n ≥ 1,

P_n(t+h) = P{ξ_{t+h} = n}
= P{ξ_t = n; ξ_{t+h} − ξ_t = 0} + P{ξ_t = n−1; ξ_{t+h} − ξ_t = 1} + Σ_{k=2}^{n} P{ξ_t = n−k; ξ_{t+h} − ξ_t = k}
(independent and stationary increments, condition (3)) = P_n(t)[1 − λh + o(h)] + P_{n−1}(t)[λh + o(h)] + o(h),

which yields P'_n(t) = −λP_n(t) + λP_{n−1}(t), i.e.,

(d/dt)(e^{λt} P_n(t)) = λ e^{λt} P_{n−1}(t).

When n = 1: (d/dt)(e^{λt} P_1(t)) = λ e^{λt} P_0(t) = λ ⇒ P_1(t) = (λt + C)e^{−λt}; P_1(0) = P{ξ_0 = 1} = 0 gives C = 0 and P_1(t) = λt e^{−λt}.

When n = 2: (d/dt)(e^{λt} P_2(t)) = λ e^{λt} P_1(t) = λ²t ⇒ P_2(t) = ((λt)²/2! + C)e^{−λt}; P_2(0) = P{ξ_0 = 2} = 0 gives C = 0 and P_2(t) = ((λt)²/2!) e^{−λt}.

By induction,

P_n(t) = ((λt)^n / n!) e^{−λt}. #

Remark: If the increments are not stationary, the resulting process is called a nonhomogenous Poisson process.
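A Monte Carlo sketch of the theorem: generating events with independent exponential(λ) interarrival times (a construction justified in the examples below) and counting those up to time t produces a count whose mean and variance are both approximately λt, as the Poisson law requires. The parameter values and seed are arbitrary.

```python
import random

# Count exponential(lam) interarrivals falling in [0, t]; the count
# should be approximately Poisson(lam * t): mean = variance = lam*t.
random.seed(4)
lam, t, reps = 2.0, 3.0, 20000

def count_events(lam, t):
    n, clock = 0, random.expovariate(lam)
    while clock <= t:
        n += 1
        clock += random.expovariate(lam)
    return n

counts = [count_events(lam, t) for _ in range(reps)]
mean = sum(counts) / reps
var = sum((c - mean)**2 for c in counts) / reps
print(round(mean, 1), round(var, 1))  # both close to lam*t = 6
```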
1.6.2. Properties

Example (Statistical Averages) Let {ξ_t, t ≥ 0} be a Poisson process; then

(1) the mean value and variance are

E[ξ_t] = E[ξ_t − ξ_0] = λt, D[ξ_t] = D[ξ_t − ξ_0] = λt.

This implies that {ξ_t, t ≥ 0} is not a weakly stationary process.

(2) the correlation function is, for τ ≥ 0,

E[ξ_{t+τ} ξ_t] = E[(ξ_{t+τ} − ξ_t + ξ_t) ξ_t] = E[ξ_{t+τ} − ξ_t]·E[ξ_t] + E[ξ_t²]
= λτ·λt + (λt + λ²t²) = λt(λτ + λt + 1).
Example (Transition Probabilities and Rates) A Poisson process has independent and stationary increments, hence it is a homogenous (and random-continuous) Markov chain with

p_ij(τ) = P{ξ_{t+τ} = j | ξ_t = i} = P{ξ_{t+τ} − ξ_t = j − i} = ((λτ)^{j−i} / (j−i)!) e^{−λτ} for j ≥ i, 0 others,

and transition rates

q_ij = lim_{τ→0+} (p_ij(τ) − δ_ij)/τ = −λ if j = i, λ if j = i+1, 0 others.
1.6.3. Examples
Example (Exponential Interarrivals) Let $\{\xi_t, t \ge 0\}$ be a Poisson process representing the total number of events that have occurred within the interval $[0,t]$, $W_n$ a continuous random variable representing the time of occurrence of the $n$-th event, $n \ge 1$, and $T_n = W_n - W_{n-1}$ the interarrival time between the occurrence of the $(n-1)$-th event and that of the $n$-th event, $n \ge 2$; then
$F_{W_n}(t) = P\{W_n \le t\} = P\{\xi_t \ge n\} = \sum_{k=n}^{+\infty}\frac{(\lambda t)^k}{k!}e^{-\lambda t}$

$f_{W_n}(t) = \frac{dF_{W_n}(t)}{dt} = \frac{d}{dt}\left[\sum_{k=n}^{+\infty}\frac{(\lambda t)^k}{k!}e^{-\lambda t}\right] = \frac{\lambda(\lambda t)^{n-1}}{(n-1)!}e^{-\lambda t}$

Moreover, for $\tau > 0$,

$P\{T_n = W_n - W_{n-1} > \tau\} = P\{\xi_{t+\tau}-\xi_t = 0\} = e^{-\lambda\tau}$

$f_{T_n}(\tau) = \frac{d}{d\tau}P\{T_n \le \tau\} = \frac{d}{d\tau}\left[1 - P\{T_n > \tau\}\right] = \lambda e^{-\lambda\tau}$

that is, the interarrival times are exponentially distributed with parameter $\lambda$.
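A small Monte Carlo sketch of the result above (the rate and sample size are illustrative choices): building the event stream from i.i.d. exponential gaps, the empirical mean and tail of the gaps should match $1/\lambda$ and $e^{-\lambda\tau}$.

```python
import math
import random

# Check that Poisson interarrival times behave like Exp(lam):
# empirical mean ~ 1/lam and empirical tail ~ exp(-lam*tau).
random.seed(0)
lam = 3.0
gaps = [random.expovariate(lam) for _ in range(200000)]

mean_gap = sum(gaps) / len(gaps)
tau = 0.5
tail = sum(1 for g in gaps if g > tau) / len(gaps)

assert abs(mean_gap - 1 / lam) < 5e-3           # E[T] = 1/lam
assert abs(tail - math.exp(-lam * tau)) < 5e-3  # P{T > tau} = e^{-lam*tau}
```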
Example (The M/M/n Queue) Let $\{\xi_t, t \ge 0\}$ be a Poisson process having rate $\lambda$ representing the number of customers arriving at an n-server service station. Each customer, upon arrival, goes directly into service if any of the servers are free, and if not, joins the queue. When a server finishes serving a customer, that customer leaves the station, and the next customer in the queue, if anyone is waiting, enters service. The service time for a customer is assumed to be an exponentially distributed random variable having mean $1/\mu$ and independent of the service times of the other customers. Now let $\{\eta_t, t \ge 0\}$ be a random process representing the number of customers in the station at time t; is it a birth and death process?
Solution:
$p_{ij}(\tau) = P\{\eta_{t+\tau}=j \mid \eta_t=i\} = \begin{cases} \lambda\tau + o(\tau) & j=i+1 \\ i\mu\tau + o(\tau) & j=i-1,\ 1 \le i \le n \\ n\mu\tau + o(\tau) & j=i-1,\ i > n \\ o(\tau) & |j-i| > 1 \end{cases}$

$q_{ij} = \lim_{\tau\to 0^+}\frac{p_{ij}(\tau)}{\tau} = \begin{cases} \lambda & j=i+1 \\ i\mu & j=i-1,\ 1 \le i \le n \\ n\mu & j=i-1,\ i > n \\ 0 & |j-i| > 1 \end{cases}$

so the process is indeed a birth and death process.
Remark: M/M/n indicates that both the interarrival times and the service times are exponentially distributed and that there are n servers in the system.
Suppose the average arrival rate of customers to the system and the average service rate are $\lambda$ and $\mu\ (>\lambda)$ respectively; then the transition rates are given by

$q_{ij} = \lim_{\tau\to 0^+}\frac{P\{\eta_{t+\tau}=j \mid \eta_t=i\}}{\tau} = \begin{cases} \lim_{\tau\to 0^+}\frac{\lambda\tau+o(\tau)}{\tau} = \lambda & j=i+1 \\ \lim_{\tau\to 0^+}\frac{\mu\tau+o(\tau)}{\tau} = \mu & j=i-1 \\ \lim_{\tau\to 0^+}\frac{o(\tau)}{\tau} = 0 & |j-i|>1 \end{cases}$

Thus, the process is a birth and death process. It follows from the Fokker-Planck equation that

$p_0'(t) = -\lambda p_0(t) + \mu p_1(t)$

$p_j'(t) = \lambda p_{j-1}(t) - (\lambda+\mu)p_j(t) + \mu p_{j+1}(t)$, $j \ge 1$

Letting $t \to +\infty$ (so that $p_j'(t) \to 0$),

$0 = -\lambda p_0 + \mu p_1$

$0 = \lambda p_{j-1} - (\lambda+\mu)p_j + \mu p_{j+1}$, $j \ge 1$

The first equation gives $\mu p_1 = \lambda p_0$, and the second gives $\mu p_{j+1} - \lambda p_j = \mu p_j - \lambda p_{j-1} = \cdots = \mu p_1 - \lambda p_0 = 0$, so with $\rho = \lambda/\mu$,

$p_j = \rho\,p_{j-1} = \cdots = \rho^j p_0$

Since $\sum_{j=0}^{+\infty}p_j = 1$, we get $p_0 = 1-\rho$ and $p_j = (1-\rho)\rho^j$. The mean number of customers in the system is

$L = \sum_{k=0}^{+\infty}kp_k = (1-\rho)\sum_{k=0}^{+\infty}k\rho^k = \frac{\rho}{1-\rho} = \frac{\lambda}{\mu-\lambda}$

and the mean queue length is

$L_Q = \sum_{k=1}^{+\infty}(k-1)p_k = L - (1-p_0) = \frac{\rho^2}{1-\rho}$
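A minimal numerical sketch of the stationary M/M/1 quantities (the rates and truncation level are illustrative choices):

```python
# Check p_k = (1-rho)*rho**k, L = rho/(1-rho), L_Q = rho**2/(1-rho)
# by direct summation of the truncated geometric distribution.
lam, mu = 2.0, 5.0
rho = lam / mu
K = 200  # truncation level; the geometric tail beyond it is negligible

p = [(1 - rho) * rho ** k for k in range(K)]
L = sum(k * p[k] for k in range(K))
LQ = sum((k - 1) * p[k] for k in range(1, K))

assert abs(sum(p) - 1.0) < 1e-12
assert abs(L - rho / (1 - rho)) < 1e-12
assert abs(LQ - rho ** 2 / (1 - rho)) < 1e-12
```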
Remark 1: For a Markov process the conditional density given the whole past depends only on the most recent state: for $t > t_n > \cdots > t_1$,

$f_{t \mid t_n t_{n-1}\cdots t_1}(y \mid x_n, x_{n-1},\ldots,x_1) = f_{t \mid t_n}(y \mid x_n)$

Remark 2: A continuous-time and continuous-state Markov process $\{\xi_t, t \in T\}$ is homogeneous if and only if its transition density function $f_{t,t+\tau}(y \mid x)$ is independent of the initial time t.
Remark 3:

$F_{t,t+\tau}(y \mid x) = P\{\xi_{t+\tau} < y \mid \xi_t = x\} = \int_{-\infty}^{y} f_{t,t+\tau}(p \mid x)\,dp$

Theorem (Chapman-Kolmogorov) For $\tau, \sigma > 0$,

$f_{t,t+\tau+\sigma}(y \mid x) = \int_{-\infty}^{+\infty} f_{t,t+\tau}(z \mid x)\,f_{t+\tau,t+\tau+\sigma}(y \mid z)\,dz$
Proof:

$f_{t,t+\tau+\sigma}(y \mid x) = \frac{f_{t,t+\tau+\sigma}(x,y)}{f_t(x)} = \int \frac{f_{t,t+\tau,t+\tau+\sigma}(x,z,y)}{f_t(x)}\,dz$

$= \int f_{t+\tau+\sigma \mid t,t+\tau}(y \mid x,z)\,\frac{f_{t,t+\tau}(x,z)}{f_t(x)}\,dz \overset{\text{Markov property}}{=} \int f_{t+\tau,t+\tau+\sigma}(y \mid z)\,f_{t,t+\tau}(z \mid x)\,dz$ #
(3) the increments are normally distributed: $\xi_{t+\tau}-\xi_t \sim N(0, \sigma^2\tau)$, where $\sigma > 0$.

Remark 1: If $\sigma = 1$, the process is called a standard Wiener process.

Remark 2: Condition (3) implies that the Wiener process is a process with stationary increments.
For any $0 < t_1 < t_2 < \cdots < t_n$ and numbers $\alpha_1,\ldots,\alpha_n$ (with $t_0 = 0$, $\xi_0 = 0$),

$\sum_{i=1}^{n}\alpha_i\xi_{t_i} = \alpha_n(\xi_{t_n}-\xi_{t_{n-1}}) + (\alpha_n+\alpha_{n-1})\xi_{t_{n-1}} + \sum_{i=1}^{n-2}\alpha_i\xi_{t_i} = \cdots = \sum_{i=1}^{n}\left(\sum_{k=i}^{n}\alpha_k\right)(\xi_{t_i}-\xi_{t_{i-1}})$

Since the increments are independent normal variables, so is the random variable $\sum_{i=1}^{n}\alpha_i\xi_{t_i}$; hence the Wiener process is a Gaussian process. Moreover,

$E[\xi_t] = E[\xi_t-\xi_0] = 0$, $D[\xi_t] = D[\xi_t-\xi_0] = E[(\xi_t-\xi_0)^2] = \sigma^2 t$

and, for $\tau \ge 0$,

$E[\xi_{t+\tau}\xi_t] = E[(\xi_{t+\tau}-\xi_t)\xi_t] + E[\xi_t^2] = E[\xi_{t+\tau}-\xi_t]E[\xi_t] + \sigma^2 t = \sigma^2 t$
t+
Example Let $\{\xi_t, t \ge 0\}$ be a Wiener process; what is the transition density $f_{t,t+\tau}(y \mid x)$?

Solution:

$F_{\xi_{t+\tau}\xi_t}(y,x) = P\{\xi_{t+\tau}<y;\ \xi_t<x\} = P\{(\xi_{t+\tau}-\xi_t)+\xi_t<y;\ \xi_t<x\}$

Setting $U = \xi_{t+\tau}-\xi_t$, $V = \xi_t$ (independent, by the independent increments),

$= \iint_{u+v<y,\ v<x} f_{UV}(u,v)\,du\,dv \overset{s=u+v,\ t=v}{=} \int_{-\infty}^{y}\int_{-\infty}^{x} f_{UV}(s-t,t)\,dt\,ds$

Hence

$f_{\xi_{t+\tau}\xi_t}(y,x) = \frac{\partial^2 F_{\xi_{t+\tau}\xi_t}(y,x)}{\partial y\,\partial x} = f_{UV}(y-x,x) = f_U(y-x)f_V(x)$

and

$f_{t,t+\tau}(y \mid x) = \frac{f_{\xi_{t+\tau}\xi_t}(y,x)}{f_{\xi_t}(x)} = f_U(y-x) = \frac{1}{\sqrt{2\pi\sigma^2\tau}}\exp\left[-\frac{(y-x)^2}{2\sigma^2\tau}\right]$

that is, $f_{t,t+\tau}(y \mid x) = N(x, \sigma^2\tau)$ #
Alternatively, recall that for a bivariate normal vector

$(X,Y) \sim N(\mu_1, \mu_2, \sigma_1^2, \sigma_2^2, \rho)$, $f_{Y \mid X}(y \mid x) = N\left(\rho\frac{\sigma_2}{\sigma_1}(x-\mu_1)+\mu_2,\ \sigma_2^2(1-\rho^2)\right)$

Since

$\xi_{t+\tau} = \xi_{t+\tau}-\xi_0 \sim N(0, \sigma^2(t+\tau))$, $\xi_t = \xi_t-\xi_0 \sim N(0, \sigma^2 t)$

$\rho = \frac{E[\xi_{t+\tau}\xi_t]}{\sqrt{D[\xi_{t+\tau}]D[\xi_t]}} = \frac{\sigma^2 t}{\sqrt{\sigma^2(t+\tau)\cdot\sigma^2 t}} = \sqrt{\frac{t}{t+\tau}}$

we have $(\xi_t, \xi_{t+\tau}) \sim N(0, 0, \sigma^2 t, \sigma^2(t+\tau), \rho)$, and the general formula gives again

$f_{t,t+\tau}(y \mid x) = N\left(\rho\frac{\sqrt{\sigma^2(t+\tau)}}{\sqrt{\sigma^2 t}}\,x,\ \sigma^2(t+\tau)(1-\rho^2)\right) = N(x, \sigma^2\tau)$
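A quick numeric check of the bivariate-normal route above (a sketch; the values of the variance parameter, times and conditioning point are illustrative choices): the conditional mean and variance of the general formula must reduce to $x$ and $\sigma^2\tau$.

```python
import math

# With sig1^2 = s^2*t, sig2^2 = s^2*(t+tau), rho = sqrt(t/(t+tau)),
# the conditional law N(rho*(sig2/sig1)*x, sig2^2*(1-rho^2))
# should reduce to N(x, s^2*tau).
s, t, tau, x = 1.7, 2.0, 0.5, 0.3
sig1 = s * math.sqrt(t)
sig2 = s * math.sqrt(t + tau)
rho = math.sqrt(t / (t + tau))

cond_mean = rho * (sig2 / sig1) * x
cond_var = sig2 ** 2 * (1 - rho ** 2)

assert abs(cond_mean - x) < 1e-12
assert abs(cond_var - s ** 2 * tau) < 1e-12
```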
Problems 3-1 (18)
$B = \{b_i(o)\}$, $b_i(o) = p(o \mid Q_t = i)$, $i = 1,\ldots,N$

Assumption 1: The t-th state, given the (t-1)-th state, is independent of the previous states:

$p(q_t \mid q_{t-1},\ldots,q_1) = p(q_t \mid q_{t-1})$

Assumption 2: The t-th output, given the t-th state, is independent of the other outputs and states:

$p(o_t \mid o_T,\ldots,o_{t+1},o_{t-1},\ldots,o_1; q_T,\ldots,q_1) = p(o_t \mid q_t)$
Example

$p(o_T,\ldots,o_1 \mid q_T,\ldots,q_1) = \frac{p(o_T,\ldots,o_1; q_T,\ldots,q_1)}{p(q_T,\ldots,q_1)} = \frac{p(o_T \mid o_{T-1},\ldots,o_1; q_T,\ldots,q_1)\,p(o_{T-1},\ldots,o_1; q_T,\ldots,q_1)}{p(q_T,\ldots,q_1)}$

$= \frac{p(o_T \mid q_T)\,p(o_{T-1} \mid o_{T-2},\ldots,o_1; q_T,\ldots,q_1)\,p(o_{T-2},\ldots,o_1; q_T,\ldots,q_1)}{p(q_T,\ldots,q_1)}$

$= \frac{p(o_T \mid q_T)\,p(o_{T-1} \mid q_{T-1})\,p(o_{T-2},\ldots,o_1; q_T,\ldots,q_1)}{p(q_T,\ldots,q_1)} = \cdots = \prod_{t=1}^{T}p(o_t \mid q_t) = \prod_{t=1}^{T}b_{q_t}(o_t)$

Similarly,

$p(q_T,\ldots,q_1) = p(q_T \mid q_{T-1},\ldots,q_1)\,p(q_{T-1},\ldots,q_1) = \cdots = p(q_1)\prod_{t=2}^{T}p(q_t \mid q_{t-1}) = \pi_{q_1}\prod_{t=2}^{T}a_{q_{t-1}q_t}$
$p(o_T,\ldots,o_1) = \sum_{q_T,\ldots,q_1}p(o_T,\ldots,o_1 \mid q_T,\ldots,q_1)\,p(q_T,\ldots,q_1) = \sum_{q_T,\ldots,q_1}\left[\prod_{t=1}^{T}b_{q_t}(o_t)\right]\pi_{q_1}\prod_{t=2}^{T}a_{q_{t-1}q_t}$
But this calculation involves the number of operations in the order of N T . This is very large
even if the length of the sequence, T, is moderate. Therefore we have to look for other
methods for this calculation.
The decoding problem is to find the most likely state sequence $(q_T^*,\ldots,q_1^*) = \arg\max_{q_T,\ldots,q_1}p(q_T,\ldots,q_1 \mid o_T,\ldots,o_1)$.

Note that

$p(q_T,\ldots,q_1 \mid o_T,\ldots,o_1) = \frac{p(q_T,\ldots,q_1; o_T,\ldots,o_1)}{p(o_T,\ldots,o_1)}$

so we have

$\arg\max_{q_T,\ldots,q_1}p(q_T,\ldots,q_1 \mid o_T,\ldots,o_1) = \arg\max_{q_T,\ldots,q_1}p(q_T,\ldots,q_1; o_T,\ldots,o_1)$
$p(o_T,\ldots,o_1) = ?$

Define the forward variable

$\alpha_t(q_t) = p(o_t,\ldots,o_1, q_t)$

It is easy to see that the following recursive relationship holds:

$\alpha_1(q_1) = p(o_1, q_1) = p(o_1 \mid q_1)p(q_1) = \pi_{q_1}b_{q_1}(o_1)$

$\alpha_{t+1}(q_{t+1}) = b_{q_{t+1}}(o_{t+1})\sum_{q_t}a_{q_t q_{t+1}}\alpha_t(q_t)$

$p(o_T,\ldots,o_1) = \sum_{q_T}p(o_T,\ldots,o_1, q_T) = \sum_{q_T}\alpha_T(q_T)$

Define also the backward variable

$\beta_t(q_t) = p(o_T,\ldots,o_{t+1} \mid q_t)$

As in the case of $\alpha_t(q_t)$ there is a recursive relationship which can be used to calculate $\beta_t(q_t)$ efficiently:

$\beta_T(q_T) = 1$

$\beta_t(q_t) = \sum_{q_{t+1}}\beta_{t+1}(q_{t+1})\,b_{q_{t+1}}(o_{t+1})\,a_{q_t q_{t+1}}$

$p(o_T,\ldots,o_1) = \sum_{q_1}p(o_T,\ldots,o_1, q_1) = \sum_{q_1}p(o_T,\ldots,o_2 \mid o_1, q_1)\,p(o_1, q_1)$

$= \sum_{q_1}p(o_T,\ldots,o_2 \mid q_1)\,p(o_1 \mid q_1)\,p(q_1) = \sum_{q_1}\beta_1(q_1)\,b_{q_1}(o_1)\,\pi_{q_1}$

More generally, $p(o_T,\ldots,o_1) = \sum_{q_t}\alpha_t(q_t)\beta_t(q_t)$ for any t.
The above equation is very useful, especially in deriving the formulas required for gradient-based training.
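A minimal sketch of the forward and backward recursions above for a discrete HMM (the model $(\pi, A, B)$ and the observation sequence are toy values): the two routes must give the same likelihood $p(o)$.

```python
pi = [0.6, 0.4]                    # pi_q
A = [[0.7, 0.3], [0.4, 0.6]]       # A[q][r] = p(r | q)
B = [[0.5, 0.5], [0.1, 0.9]]       # B[q][o] = p(o | q)
obs = [0, 1, 1, 0]
N, T = len(pi), len(obs)

# forward: alpha[t][q] = p(o_1..o_{t+1}, state q at step t)
alpha = [[pi[q] * B[q][obs[0]] for q in range(N)]]
for t in range(1, T):
    alpha.append([B[q][obs[t]] * sum(A[r][q] * alpha[t - 1][r] for r in range(N))
                  for q in range(N)])

# backward: beta[t][q] = p(future observations | state q at step t)
beta = [[1.0] * N for _ in range(T)]
for t in range(T - 2, -1, -1):
    beta[t] = [sum(beta[t + 1][r] * B[r][obs[t + 1]] * A[q][r] for r in range(N))
               for q in range(N)]

p_forward = sum(alpha[T - 1][q] for q in range(N))
p_backward = sum(beta[0][q] * B[q][obs[0]] * pi[q] for q in range(N))
assert abs(p_forward - p_backward) < 1e-12

# p(o) = sum_q alpha_t(q) * beta_t(q) for every t
for t in range(T):
    assert abs(sum(alpha[t][q] * beta[t][q] for q in range(N)) - p_forward) < 1e-12
```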
A.BENHARI
-166-
Find $(q_T^*,\ldots,q_1^*) = \arg\max_{q_T,\ldots,q_1}p(q_T,\ldots,q_1 \mid o_T,\ldots,o_1)$, or equally

$(q_T^*,\ldots,q_1^*) = \arg\max_{q_T,\ldots,q_1}p(o_T,\ldots,o_1; q_T,\ldots,q_1)$

A natural way to solve this problem is to calculate the probabilities of all possible state sequences and pick the most likely one. But sometimes this method does not give a physically meaningful state sequence. Therefore we would go for another method which has no such problems.

In this method, commonly known as the Viterbi algorithm, the whole state sequence with the maximum likelihood is found. In order to facilitate the computation we define an auxiliary variable,

$\delta_t(q_t) = \max_{q_{t-1},\ldots,q_1}p(o_t,\ldots,o_1; q_t, q_{t-1},\ldots,q_1)$

then we have

$\delta_{t+1}(q_{t+1}) = \max_{q_t,\ldots,q_1}p(o_{t+1} \mid o_t,\ldots,o_1; q_{t+1}, q_t,\ldots,q_1)\,p(o_t,\ldots,o_1; q_{t+1}, q_t,\ldots,q_1)$

$= \max_{q_t,\ldots,q_1}p(o_{t+1} \mid q_{t+1})\,p(o_t,\ldots,o_1; q_{t+1}, q_t,\ldots,q_1) = b_{q_{t+1}}(o_{t+1})\max_{q_t}a_{q_t q_{t+1}}\delta_t(q_t)$

which gives the highest probability that the partial observation sequence and state sequence up to the moment t+1 can have, when the current state is $q_{t+1}$. Note that

$\delta_1(q_1) = \pi_{q_1}b_{q_1}(o_1)$, $\delta_2(q_2) = \max_{q_1}p(o_2, o_1; q_2, q_1) = \max_{q_1}p(o_2 \mid q_2)\,p(o_1; q_2, q_1) = b_{q_2}(o_2)\max_{q_1}a_{q_1 q_2}\delta_1(q_1)$

So the procedure to find the most likely state sequence starts from the following calculation:

$\max_{q_T,\ldots,q_1}p(o_T,\ldots,o_1; q_T,\ldots,q_1) = \max_{q_T}\delta_T(q_T) = \max_{q_T}b_{q_T}(o_T)\max_{q_{T-1}}a_{q_{T-1}q_T}\delta_{T-1}(q_{T-1}) = \cdots$

and the optimal states are recovered by backtracking the maximizing arguments. This whole algorithm can be interpreted as a search in a graph whose nodes are formed by the states of the HMM at each time instant t.
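A minimal sketch of the Viterbi recursion above, checked against a brute-force maximization over all $N^T$ state sequences (the model and observations are toy values):

```python
from itertools import product

pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.5, 0.5], [0.1, 0.9]]
obs = [0, 1, 1, 0]
N, T = len(pi), len(obs)

# delta[t][q] = max over earlier states of p(o_1..o_{t+1}; ..., state q)
delta = [[pi[q] * B[q][obs[0]] for q in range(N)]]
for t in range(1, T):
    delta.append([B[q][obs[t]] * max(A[r][q] * delta[t - 1][r] for r in range(N))
                  for q in range(N)])
viterbi_max = max(delta[T - 1])

# brute force over every state sequence
def path_prob(path):
    p = pi[path[0]] * B[path[0]][obs[0]]
    for t in range(1, T):
        p *= A[path[t - 1]][path[t]] * B[path[t]][obs[t]]
    return p

brute_max = max(path_prob(path) for path in product(range(N), repeat=T))
assert abs(viterbi_max - brute_max) < 1e-15
```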
Define the likelihood of the model $\theta = (\pi, A, B)$ as

$L(\theta) = p(o_T,\ldots,o_1 \mid \theta)$

Then the ML criterion can be given as

$\theta^* = \arg\max_\theta L(\theta)$

However, there is no known way to analytically solve for the model $\theta = (\pi, A, B)$ which maximizes the quantity $L(\theta)$. But we can choose model parameters such that it is locally maximized, using an iterative procedure like the Baum-Welch method or a gradient-based method, which are described below.
First one of those variables is defined as the probability of being in state $q_t$ at t and in state $q_{t+1}$ at t+1. Formally,

$\xi_t(q_{t+1}, q_t) = P(Q_{t+1}=q_{t+1}, Q_t=q_t \mid o_T,\ldots,o_1)$

Then

$\xi_t(q_{t+1}, q_t) = \frac{p(q_{t+1}, q_t; o_T,\ldots,o_1)}{p(o_T,\ldots,o_1)} = \frac{p(o_T,\ldots,o_{t+1}, q_{t+1} \mid o_t,\ldots,o_1, q_t)\,p(o_t,\ldots,o_1, q_t)}{p(o_T,\ldots,o_1)}$

$= \frac{p(o_T,\ldots,o_{t+1}, q_{t+1} \mid q_t)\,\alpha_t(q_t)}{p(o_T,\ldots,o_1)} = \frac{p(o_T,\ldots,o_{t+2} \mid o_{t+1}, q_{t+1}, q_t)\,p(o_{t+1}, q_{t+1} \mid q_t)\,\alpha_t(q_t)}{p(o_T,\ldots,o_1)}$

$= \frac{p(o_T,\ldots,o_{t+2} \mid q_{t+1})\,p(o_{t+1} \mid q_{t+1}, q_t)\,p(q_{t+1} \mid q_t)\,\alpha_t(q_t)}{p(o_T,\ldots,o_1)} = \frac{\beta_{t+1}(q_{t+1})\,b_{q_{t+1}}(o_{t+1})\,a_{q_t q_{t+1}}\,\alpha_t(q_t)}{p(o_T,\ldots,o_1)}$

The second variable is

$\gamma_t(q_t) = P(Q_t=q_t \mid o_T,\ldots,o_1)$

that is, the probability of being in state $q_t$ at t, given the observation sequence and the model. $\gamma_t(q_t)$ can also be derived from the forward and backward variables:

$\gamma_t(q_t) = \frac{p(o_T,\ldots,o_1, q_t)}{p(o_T,\ldots,o_1)} = \frac{p(o_T,\ldots,o_{t+1} \mid o_t,\ldots,o_1, q_t)\,p(o_t,\ldots,o_1, q_t)}{p(o_T,\ldots,o_1)}$

$= \frac{p(o_T,\ldots,o_{t+1} \mid q_t)\,p(o_t,\ldots,o_1, q_t)}{p(o_T,\ldots,o_1)} = \frac{\alpha_t(q_t)\beta_t(q_t)}{p(o_T,\ldots,o_1)}$

One can see that the relationship between $\gamma_t(q_t)$ and $\xi_t(q_{t+1}, q_t)$ is given by

$\gamma_t(q_t) = \frac{p(o_T,\ldots,o_1, q_t)}{p(o_T,\ldots,o_1)} = \sum_{q_{t+1}}\frac{p(o_T,\ldots,o_1, q_{t+1}, q_t)}{p(o_T,\ldots,o_1)} = \sum_{q_{t+1}}\xi_t(q_{t+1}, q_t)$
Now it is possible to describe the Baum-Welch learning process, where the parameters of the HMM are updated in such a way as to maximize the quantity $p(o_1, o_2,\ldots,o_T)$. Assuming a starting model $\theta = (\pi, A, B)$, we first calculate the forward and backward variables $\alpha$ and $\beta$ using the recursions, and then $\xi$ and $\gamma$. The next step is to update the HMM parameters according to the following equations, known as re-estimation formulas.
$\hat{\pi}_q = \gamma_1(q)$, $\hat{a}_{qq'} = \frac{\sum_{t=1}^{T-1}\xi_t(q', q)}{\sum_{t=1}^{T-1}\gamma_t(q)}$, $\hat{b}_q(o) = \frac{\sum_{1 \le t \le T,\ o_t=o}\gamma_t(q)}{\sum_{t=1}^{T}\gamma_t(q)}$

When R independent observation sequences are available, the statistics are accumulated over the sequences: with $\gamma_t^{(r)}$ and $\xi_t^{(r)}$ computed from the r-th sequence,

$\hat{\pi}_q = \frac{1}{R}\sum_{r=1}^{R}\gamma_1^{(r)}(q)$, $\hat{a}_{qq'} = \frac{\sum_{r=1}^{R}\sum_{t=1}^{T-1}\xi_t^{(r)}(q', q)}{\sum_{r=1}^{R}\sum_{t=1}^{T-1}\gamma_t^{(r)}(q)}$, $\hat{b}_q(o) = \frac{\sum_{r=1}^{R}\sum_{1 \le t \le T,\ o_t^{(r)}=o}\gamma_t^{(r)}(q)}{\sum_{r=1}^{R}\sum_{t=1}^{T}\gamma_t^{(r)}(q)}$
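A minimal sketch computing $\xi$ and $\gamma$ from the forward/backward variables and checking the consistency identities used above, $\sum_q\gamma_t(q)=1$ and $\gamma_t(q)=\sum_{q'}\xi_t(q',q)$ (the model and observations are toy values):

```python
pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.5, 0.5], [0.1, 0.9]]
obs = [0, 1, 1, 0]
N, T = len(pi), len(obs)

alpha = [[pi[q] * B[q][obs[0]] for q in range(N)]]
for t in range(1, T):
    alpha.append([B[q][obs[t]] * sum(A[r][q] * alpha[t - 1][r] for r in range(N))
                  for q in range(N)])
beta = [[1.0] * N for _ in range(T)]
for t in range(T - 2, -1, -1):
    beta[t] = [sum(beta[t + 1][r] * B[r][obs[t + 1]] * A[q][r] for r in range(N))
               for q in range(N)]
po = sum(alpha[T - 1][q] for q in range(N))

gamma = [[alpha[t][q] * beta[t][q] / po for q in range(N)] for t in range(T)]
# xi[t][qn][q] = P(next state qn, current state q | observations)
xi = [[[beta[t + 1][qn] * B[qn][obs[t + 1]] * A[q][qn] * alpha[t][q] / po
        for q in range(N)] for qn in range(N)] for t in range(T - 1)]

for t in range(T):
    assert abs(sum(gamma[t]) - 1.0) < 1e-12
for t in range(T - 1):
    for q in range(N):
        assert abs(gamma[t][q] - sum(xi[t][qn][q] for qn in range(N))) < 1e-12
```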
Hint: We check that $H$, the set of all second-order random variables with $\langle\xi,\eta\rangle = E[\xi\bar{\eta}]$, is a Hilbert space.

$H$ is a linear space: for any constants $C_1, C_2$,

$E\left[|C_1\xi + C_2\eta|^2\right] = E\left[(C_1\xi + C_2\eta)\overline{(C_1\xi + C_2\eta)}\right] \le |C_1|^2E\left[|\xi|^2\right] + |C_2|^2E\left[|\eta|^2\right] + 2|C_1||C_2|\sqrt{E\left[|\xi|^2\right]E\left[|\eta|^2\right]} < +\infty$

$H$ is an inner-product space: $\langle\xi,\eta\rangle = E[\xi\bar{\eta}]$ satisfies the axioms of an inner product; in particular $\langle\xi,\xi\rangle = E[|\xi|^2] = 0 \iff P\{\xi=0\}=1$, provided we identify random variables that are equal almost surely.

Remark 1: $\|\xi\| = \sqrt{\langle\xi,\xi\rangle} = \sqrt{E[|\xi|^2]}$ is then a norm.

Remark 2: Convergence in $H$ is mean-square convergence:

$\lim_{n\to+\infty}\xi_n = \xi_0 \overset{\text{def}}{\iff} \lim_{n\to+\infty}\|\xi_n - \xi_0\| = 0 \iff \lim_{n\to+\infty}E\left[|\xi_n - \xi_0|^2\right] = 0$
Theorem For all $t_1, t_2, \ldots, t_n \in T$, the matrix

$\Gamma = \begin{pmatrix} \Gamma(t_1,t_1) & \Gamma(t_1,t_2) & \cdots & \Gamma(t_1,t_n) \\ \Gamma(t_2,t_1) & \Gamma(t_2,t_2) & \cdots & \Gamma(t_2,t_n) \\ \vdots & \vdots & \ddots & \vdots \\ \Gamma(t_n,t_1) & \Gamma(t_n,t_2) & \cdots & \Gamma(t_n,t_n) \end{pmatrix}$

is nonnegative definite, where $\Gamma(t,s) = E\left[(\xi_t - m_t)\overline{(\xi_s - m_s)}\right]$ is the covariance function.

Proof:

For all numbers $\alpha_1, \alpha_2, \ldots, \alpha_n$,

$(\alpha_1 \cdots \alpha_n)\,\Gamma\,(\alpha_1 \cdots \alpha_n)^* = \sum_{i=1}^{n}\sum_{j=1}^{n}\alpha_i\bar{\alpha}_j\Gamma(t_i,t_j) = \sum_{i=1}^{n}\sum_{j=1}^{n}E\left[\alpha_i(\xi_{t_i}-m_{t_i})\overline{\alpha_j(\xi_{t_j}-m_{t_j})}\right]$

$= E\left[\left|\sum_{i=1}^{n}\alpha_i(\xi_{t_i}-m_{t_i})\right|^2\right] \ge 0$ #
Let $\{\xi_t, t \in T\}$ be a process with orthogonal increments and $\xi_a = 0$; then

(1) For all $a \le t_1 \le t_2$, we have

$\langle\xi_{t_1}, \xi_{t_2}-\xi_{t_1}\rangle = \langle\xi_{t_1}-\xi_a, \xi_{t_2}-\xi_{t_1}\rangle = 0$

(2) For all $t_1 \le t_2 \in T$, we have

$\langle\xi_{t_1}, \xi_{t_2}\rangle = \langle\xi_{t_1}, \xi_{t_2}-\xi_{t_1}\rangle + \langle\xi_{t_1}, \xi_{t_1}\rangle = \langle\xi_{t_1}, \xi_{t_1}\rangle = \|\xi_{t_1}\|^2$

and consequently

$\|\xi_{t_2}-\xi_{t_1}\|^2 = \langle\xi_{t_2},\xi_{t_2}\rangle - \langle\xi_{t_1},\xi_{t_2}\rangle - \langle\xi_{t_2},\xi_{t_1}\rangle + \langle\xi_{t_1},\xi_{t_1}\rangle = \|\xi_{t_2}\|^2 - \|\xi_{t_1}\|^2$
3. Random Analysis

3.1. Limits

Definition Let $\{\xi_t, t \in (a,b)\}$ be a second-order random process and $\xi$ a second-order random variable; $\lim_{t\to t_0}\xi_t = \xi$ is then defined as $\lim_{t\to t_0}\|\xi_t - \xi\| = 0$, where $t_0 \in (a,b)$.

Theorem (Criterion) $\lim_{t\to t_0}\xi_t$ exists if and only if $\lim_{t\to t_0,\,s\to t_0}\langle\xi_t, \xi_s\rangle$ exists.
3.2. Continuity

Definition A second-order random process $\{\xi_t, t \in T\}$ is said to be continuous at the point $t_0 \in T$ if given any $\varepsilon > 0$, there will be $\delta > 0$ such that for all $t \in T$ with $|t-t_0| < \delta$,

$\|\xi_t - \xi_{t_0}\| < \varepsilon$

Theorem If $\xi_t$ is continuous at $t_0$, then so is its mean value: $\lim_{t\to t_0}E[\xi_t] = E[\xi_{t_0}]$.

Proof:

$\left|E[\xi_t] - E[\xi_{t_0}]\right| = \left|E[\xi_t - \xi_{t_0}]\right| \le E\left|\xi_t - \xi_{t_0}\right| \le \|\xi_t - \xi_{t_0}\| \to 0$

Theorem If $\xi_t$ is continuous at $t_0$ and at $s_0$, then $\lim_{t\to t_0,\,s\to s_0}\langle\xi_t, \xi_s\rangle = \langle\xi_{t_0}, \xi_{s_0}\rangle$.

Proof:

$\left|\langle\xi_t, \xi_s\rangle - \langle\xi_{t_0}, \xi_{s_0}\rangle\right| = \left|\langle\xi_t-\xi_{t_0}, \xi_s-\xi_{s_0}\rangle + \langle\xi_t-\xi_{t_0}, \xi_{s_0}\rangle + \langle\xi_{t_0}, \xi_s-\xi_{s_0}\rangle\right|$

$\le \|\xi_t-\xi_{t_0}\|\|\xi_s-\xi_{s_0}\| + \|\xi_t-\xi_{t_0}\|\|\xi_{s_0}\| + \|\xi_{t_0}\|\|\xi_s-\xi_{s_0}\| \to 0$
3.3. Derivatives

Definition The second-order random variable $\eta$ is said to be the derivative of a second-order random process $\{\xi_t, t \in T\}$ at the point $t_0 \in T$ if given any $\varepsilon > 0$, there will be $\delta > 0$ such that for all $t \in T$ with $|t-t_0| < \delta$,

$\left\|\frac{\xi_t - \xi_{t_0}}{t - t_0} - \eta\right\| < \varepsilon$, i.e., $\lim_{t\to t_0}\frac{\xi_t - \xi_{t_0}}{t - t_0} = \eta = \xi_{t_0}'$

Theorem $\xi_t$ is differentiable at $t_0$ if $\frac{\partial^2 R(t,s)}{\partial t\,\partial s}$ exists and is continuous at the point $(t_0, t_0)$.

Proof:

Recall that $\lim_{t\to t_0}\frac{\xi_t - \xi_{t_0}}{t - t_0}$ exists if and only if the limit

$\lim_{t\to t_0,\,s\to t_0}\left\langle\frac{\xi_t - \xi_{t_0}}{t - t_0}, \frac{\xi_s - \xi_{t_0}}{s - t_0}\right\rangle = \lim_{t\to t_0,\,s\to t_0}\frac{R(t,s) - R(t,t_0) - R(t_0,s) + R(t_0,t_0)}{(t-t_0)(s-t_0)}$

exists. By the mean value theorem (with $0 < \theta, \theta' < 1$) the quotient can be written as

$\frac{1}{s-t_0}\left[\frac{\partial R(t_0+\theta(t-t_0), s)}{\partial t} - \frac{\partial R(t_0+\theta(t-t_0), t_0)}{\partial t}\right] = \frac{\partial^2 R(t_0+\theta(t-t_0),\ t_0+\theta'(s-t_0))}{\partial t\,\partial s}$

so the existence and continuity of $\frac{\partial^2 R(t,s)}{\partial t\,\partial s}$ at $(t_0, t_0)$ guarantees that

$\lim_{t\to t_0,\,s\to t_0}\left\langle\frac{\xi_t - \xi_{t_0}}{t - t_0}, \frac{\xi_s - \xi_{t_0}}{s - t_0}\right\rangle = \frac{\partial^2 R(t_0, t_0)}{\partial t\,\partial s}$

exists. #
Theorem If $\xi_t$ is differentiable, the correlation function of the derivative process is obtained by differentiating under the expectation:

$E[\xi_t'\bar{\xi}_s'] = E\left[\lim_{h\to 0}\frac{\xi_{t+h}-\xi_t}{h}\ \overline{\lim_{k\to 0}\frac{\xi_{s+k}-\xi_s}{k}}\right] = \lim_{h\to 0,\,k\to 0}E\left[\frac{\xi_{t+h}-\xi_t}{h}\ \overline{\frac{\xi_{s+k}-\xi_s}{k}}\right] = \frac{\partial^2 R(t,s)}{\partial t\,\partial s}$

3.4. Integrals

Definition Let $\{\xi_t, a \le t \le b\}$ be a second-order random process; the mean-square integral is defined as the limit of the Riemann sums,

$\int_a^b \xi_t\,dt = \lim_{\max\Delta t_i \to 0}\sum_{i=1}^{n}\xi_{\tau_i}\Delta t_i$, $\tau_i \in [t_{i-1}, t_i]$

whenever this limit exists in $H$.
Stationary Processes
Example Let $\{\xi_t, t \in T\}$ be a strictly stationary process with finite second-order moment; then

(1) for all $t \in T$, since $F(x;t) = F(x;0)$, we have

$E[\xi_t] = \int_{-\infty}^{+\infty}x\,dF(x;t) = \int_{-\infty}^{+\infty}x\,dF(x;0) = m = \text{Const.}$

$E\left[(\xi_t - m)^2\right] = \int_{-\infty}^{+\infty}(x-m)^2\,dF(x;t) = \int_{-\infty}^{+\infty}(x-m)^2\,dF(x;0) = \text{Const.}$

(2) for all $t_1, t_2 \in T$, since $F(x,t_1;y,t_2) = F(x,0;y,t_2-t_1)$,

$E[\xi_{t_2}\xi_{t_1}] = \int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty}xy\,dF(x,t_1;y,t_2) = \int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty}xy\,dF(x,0;y,t_2-t_1) = R(t_2-t_1)$

Hence a strictly stationary process with finite second moment is weakly stationary.
Theorem Let $\{\xi_t, t \in T\}$ be a weakly stationary process and $R(\tau) = E[\xi_{t+\tau}\bar{\xi}_t]$; then

(1) $R(0) = E\left[|\xi_t|^2\right] \ge 0$

(2) $R(-\tau) = \overline{R(\tau)}$

(3) $|R(\tau)| = \left|E[\xi_{t+\tau}\bar{\xi}_t]\right| \overset{\text{Schwarz inequality}}{\le} \sqrt{E\left[|\xi_{t+\tau}|^2\right]E\left[|\xi_t|^2\right]} = R(0)$

(4) $R(\tau)$ is nonnegative definite: for all $t_1, \ldots, t_n$ and numbers $\alpha_1, \ldots, \alpha_n$,

$(\alpha_1 \cdots \alpha_n)\begin{pmatrix} R(t_1-t_1) & \cdots & R(t_1-t_n) \\ \vdots & \ddots & \vdots \\ R(t_n-t_1) & \cdots & R(t_n-t_n) \end{pmatrix}\begin{pmatrix} \alpha_1 \\ \vdots \\ \alpha_n \end{pmatrix}^* = \sum_{i=1}^{n}\sum_{j=1}^{n}\alpha_i\bar{\alpha}_jR(t_i-t_j) = E\left[\left|\sum_{i=1}^{n}\alpha_i\xi_{t_i}\right|^2\right] \ge 0$

Theorem Let $\{\xi_t, t \in T\}$ and $\{\eta_t, t \in T\}$ be jointly weakly stationary and $R_{\xi\eta}(\tau) = E[\xi_{t+\tau}\bar{\eta}_t]$; then

(1) $R_{\xi\eta}(-\tau) = E[\xi_{t-\tau}\bar{\eta}_t] = \overline{E[\eta_t\bar{\xi}_{t-\tau}]} = \overline{R_{\eta\xi}(\tau)}$

(2) $\left|R_{\xi\eta}(\tau)\right| = \left|E[\xi_{t+\tau}\bar{\eta}_t]\right| \overset{\text{Schwarz inequality}}{\le} \sqrt{E\left[|\xi_{t+\tau}|^2\right]E\left[|\eta_t|^2\right]} = \sqrt{R_\xi(0)R_\eta(0)}$
2.3. Periodicity

Theorem (Periodicity) Let $\{\xi_t, -\infty < t < +\infty\}$ be a weakly stationary process; $\xi_t$ is periodic with period T if and only if its correlation function $R(\tau)$ is periodic with period T.

Hint:

$E\left[(\xi_{t+T} - \xi_t)^2\right] = E\left[\xi_{t+T}^2\right] + E\left[\xi_t^2\right] - 2E[\xi_{t+T}\xi_t] = 2[R(0) - R(T)]$
Theorem Let $\{\xi_t, a < t < b\}$ be a weakly stationary process and $R(\tau)$ its correlation function; $\xi_t$ has derivatives within the open interval $(a,b)$ if $R''(\tau)$ exists and is continuous at the point $\tau = 0$.

In that case the mean of the derivative vanishes,

$E[\xi_t'] = E\left[\lim_{h\to 0}\frac{\xi_{t+h}-\xi_t}{h}\right] = \lim_{h\to 0}\frac{E[\xi_{t+h}] - E[\xi_t]}{h} = \lim_{h\to 0}\frac{m-m}{h} = 0$

and, with $R(t,s) = E[\xi_t\bar{\xi}_s] = R(t-s)$, the correlation function of the derivative process is

$R_{\xi'}(t,s) = \frac{\partial^2 R(t,s)}{\partial t\,\partial s} = \frac{\partial^2 R(t-s)}{\partial t\,\partial s} = -R''(t-s)$, i.e., $R_{\xi'}(\tau) = -R''(\tau)$
Definition The time averages of a process are

$\langle\xi_t\rangle = \lim_{T\to+\infty}\frac{1}{2T}\int_{-T}^{T}\xi_t\,dt$, $\langle\xi_{t+\tau}\xi_t\rangle = \lim_{T\to+\infty}\frac{1}{2T}\int_{-T}^{T}\xi_{t+\tau}\xi_t\,dt$ (time average)
Theorem The mean of a weakly stationary random process $\{\xi_t, -\infty < t < +\infty\}$ is ergodic if and only if

$\lim_{T\to+\infty}\frac{1}{T}\int_{0}^{2T}\left(1 - \frac{q}{2T}\right)\left[R(q) - \mu^2\right]dq = \lim_{T\to+\infty}\frac{1}{T}\int_{0}^{2T}\left(1 - \frac{q}{2T}\right)C(q)\,dq = 0$

where $C(q) = R(q) - \mu^2$ and $\mu = E[\xi_t]$.
Proof:

Note that

$E[\langle\xi_t\rangle] = E\left[\lim_{T\to+\infty}\frac{1}{2T}\int_{-T}^{T}\xi_t\,dt\right] = \lim_{T\to+\infty}\frac{1}{2T}\int_{-T}^{T}E[\xi_t]\,dt = \mu$

so the mean is ergodic, i.e. $P\{\langle\xi_t\rangle = \mu\} = 1$, if and only if $D[\langle\xi_t\rangle] = E\left[|\langle\xi_t\rangle - \mu|^2\right] = 0$. Now

$E\left[\langle\xi_t\rangle^2\right] = E\left[\lim_{T\to+\infty}\frac{1}{4T^2}\int_{-T}^{T}\xi_t\,dt\int_{-T}^{T}\xi_s\,ds\right] = \lim_{T\to+\infty}\frac{1}{4T^2}\int_{-T}^{T}\int_{-T}^{T}R(t-s)\,dt\,ds$

With the substitution $p = t+s$, $q = t-s$ (whose Jacobian contributes the factor $\frac{1}{2}$, the square $[-T,T]^2$ becoming $|p+q|<2T$, $|p-q|<2T$),

$\lim_{T\to+\infty}\frac{1}{4T^2}\int_{-T}^{T}\int_{-T}^{T}R(t-s)\,dt\,ds = \lim_{T\to+\infty}\frac{1}{8T^2}\iint_{|p+q|<2T,\ |p-q|<2T}R(q)\,dp\,dq$

$= \lim_{T\to+\infty}\frac{1}{8T^2}\left[\int_{-2T}^{0}R(q)\,dq\int_{-2T-q}^{2T+q}dp + \int_{0}^{2T}R(q)\,dq\int_{-2T+q}^{2T-q}dp\right]$

$= \lim_{T\to+\infty}\frac{1}{2T^2}\int_{0}^{2T}(2T-q)R(q)\,dq = \lim_{T\to+\infty}\frac{1}{T}\int_{0}^{2T}\left(1-\frac{q}{2T}\right)R(q)\,dq$

using that $R$ is even. Since the same computation applied to the constant $\mu^2$ gives exactly $\mu^2$, subtracting yields

$D[\langle\xi_t\rangle] = \lim_{T\to+\infty}\frac{1}{T}\int_{0}^{2T}\left(1-\frac{q}{2T}\right)\left[R(q)-\mu^2\right]dq = \lim_{T\to+\infty}\frac{1}{T}\int_{0}^{2T}\left(1-\frac{q}{2T}\right)C(q)\,dq$

and the claim follows. #
Theorem The correlation function of a weakly stationary random process $\{\xi_t, -\infty < t < +\infty\}$ is ergodic if and only if

$\lim_{T\to+\infty}\frac{1}{T}\int_{0}^{2T}\left(1 - \frac{q}{2T}\right)\left[B_\tau(q) - R^2(\tau)\right]dq = 0$

where $B_\tau(q) = E[\eta_{t+q}\eta_t] = E[\xi_{t+q+\tau}\xi_{t+q}\xi_{t+\tau}\xi_t]$.

Proof:

Let $\eta_t = \xi_{t+\tau}\xi_t$; then

$E[\eta_t] = E[\xi_{t+\tau}\xi_t] = R(\tau)$, $E[\eta_{t+q}\eta_t] = E[\xi_{t+q+\tau}\xi_{t+q}\xi_{t+\tau}\xi_t] = B_\tau(q)$

depend only on q (and the fixed $\tau$). This shows that $\eta_t$ is at least weakly stationary. It follows from the preceding theorem that

$P\{\langle\eta_t\rangle = E[\eta_t]\} = P\{\langle\xi_{t+\tau}\xi_t\rangle = R(\tau)\} = 1 \iff \lim_{T\to+\infty}\frac{1}{T}\int_{0}^{2T}\left(1-\frac{q}{2T}\right)\left[B_\tau(q) - R^2(\tau)\right]dq = 0$ #
Definition (Power Spectrum) $S(\omega) = \lim_{T\to+\infty}\frac{E\left[|F(\omega,T)|^2\right]}{2T}$, where $F(\omega,T)$ is the Fourier transform of the truncated random process.

Theorem (Wiener-Khinchin) For a weakly stationary process,

$S(\omega) = \int_{-\infty}^{+\infty}R(\tau)e^{-j\omega\tau}\,d\tau$, $R(\tau) = \frac{1}{2\pi}\int_{-\infty}^{+\infty}S(\omega)e^{j\omega\tau}\,d\omega$

Theorem For a real weakly stationary process, $S(\omega)$ is even:

$S(-\omega) = \int_{-\infty}^{+\infty}R(\tau)e^{j\omega\tau}\,d\tau \overset{\tau\to-\tau}{=} \int_{-\infty}^{+\infty}R(-\tau)e^{-j\omega\tau}\,d\tau = \int_{-\infty}^{+\infty}R(\tau)e^{-j\omega\tau}\,d\tau = S(\omega)$ #

Definition (White Noise) A weakly stationary process $\{\xi_t, -\infty < t < +\infty\}$ is said to be a white noise process if its spectrum is flat, i.e., $S(\omega) = \sigma^2$ (Const.)
Remark: Since

$\int_{-\infty}^{+\infty}\delta(\tau)e^{-j\omega\tau}\,d\tau = 1$, $\frac{1}{2\pi}\int_{-\infty}^{+\infty}e^{j\omega\tau}\,d\omega = \delta(\tau)$

we have

$R(\tau) = \frac{1}{2\pi}\int_{-\infty}^{+\infty}S(\omega)e^{j\omega\tau}\,d\omega = \frac{1}{2\pi}\int_{-\infty}^{+\infty}\sigma^2e^{j\omega\tau}\,d\omega = \sigma^2\delta(\tau)$
$y(n) + \sum_{k=1}^{K}a_ky(n-k) = \beta_0x(n)$

(2) a random sequence $y(n)$ is said to be in accordance with a moving-average (MA) model of order M if it can be expressed as

$y(n) = \sum_{m=0}^{M}\beta_mx(n-m)$

(3) a random sequence $y(n)$ is said to be in accordance with an autoregressive moving-average (ARMA) model of orders K and M if it can be expressed as

$y(n) + \sum_{k=1}^{K}a_ky(n-k) = \sum_{m=0}^{M}\beta_mx(n-m)$

where $x(n)$ is a zero-mean white noise with spectrum

$S_x(e^{j\omega}) = \sum_{m=-\infty}^{+\infty}R_x(m)e^{-j\omega m} = \sigma_x^2\sum_{m=-\infty}^{+\infty}\delta(m)e^{-j\omega m} = \sigma_x^2$

Let $H(z) = \frac{\sum_{m=0}^{M}\beta_mz^{-m}}{1+\sum_{k=1}^{K}a_kz^{-k}}$ and $z_{\max}$ the largest pole of $H(z)$; if $|z_{\max}| < 1$, then the model is said to be causal and stable and $H(z)$ is called the transition function of the model.
Remark 1: From now on, the ARMA models we encounter in this lecture are all assumed to be causal and stable, unless otherwise stated.

Remark 2: If $H(z)$ is the transition function of an ARMA model, then $h(n) = Z^{-1}[H(z)]$ is called the impulse response of the model. It can be easily proven that

$h(n) = 0$ for $n < 0$ (causal) and $\sum_{n=0}^{+\infty}|h(n)| < +\infty$ (stable)

For the AR model,

$H(z) = \frac{\beta_0}{1+\sum_{k=1}^{K}a_kz^{-k}}$

For the MA model,

$H(z) = \sum_{m=0}^{M}\beta_mz^{-m}$

In general, the impulse response satisfies

$h(n) + \sum_{k=1}^{K}a_kh(n-k) = \sum_{m=0}^{M}\beta_m\delta(n-m)$
Example What are the impulse responses for the following models?

(1) AR(1): $y(n) - \alpha y(n-1) = x(n)$, $|\alpha| < 1$

$H(z) = \frac{1}{1-\alpha z^{-1}}$, $|z| > |\alpha| \implies h(n) = Z^{-1}\left[\frac{1}{1-\alpha z^{-1}}\right] = \alpha^nu(n)$

(2) AR(2): $y(n) - \alpha_1y(n-1) - \alpha_2y(n-2) = x(n)$

$h(n) - \alpha_1h(n-1) - \alpha_2h(n-2) = \delta(n)$, $h(n) = 0$ for all $n < 0$

$h(0) = 1$, $h(1) = \alpha_1$, $h(n) = \alpha_1h(n-1) + \alpha_2h(n-2)$, $n \ge 2$
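A minimal sketch checking the AR(1) impulse response $h(n) = \alpha^n$ by feeding a unit impulse through the recursion $y(n) = \alpha y(n-1) + x(n)$ (the coefficient and horizon are illustrative choices):

```python
alpha = 0.8
L = 30

h = []
prev = 0.0
for n in range(L):
    x = 1.0 if n == 0 else 0.0   # unit impulse delta(n)
    y = alpha * prev + x
    h.append(y)
    prev = y

for n in range(L):
    assert abs(h[n] - alpha ** n) < 1e-12
```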
mean value:

$\mu_y = E[y(n)] = E\left[\sum_{k=0}^{+\infty}h(k)x(n-k)\right] = \sum_{k=0}^{+\infty}h(k)E[x(n-k)] = 0$

correlation function:

$R_y(m) = E[y(n+m)y(n)] = E\left[\sum_{p=0}^{+\infty}h(p)x(n+m-p)\sum_{q=0}^{+\infty}h(q)x(n-q)\right]$

$= \sum_{q=0}^{+\infty}\sum_{p=0}^{+\infty}h(p)h(q)E[x(n-q)x(n+m-p)] = \sigma_x^2\sum_{q=0}^{+\infty}\sum_{p=0}^{+\infty}h(p)h(q)\delta(q-p+m)$

$= \sigma_x^2\sum_{q=0}^{+\infty}h(q)h(q+m) = \sigma_x^2R_h(m)$

where $R_h(m) = \sum_{q=0}^{+\infty}h(q)h(q+m)$.

variance:

$\sigma_y^2 = R_y(0) = \sigma_x^2R_h(0) = \sigma_x^2\sum_{n=0}^{+\infty}h^2(n)$

normalized correlation function:

$\rho_y(m) = \frac{R_y(m)}{\sigma_y^2} = \frac{R_h(m)}{R_h(0)}$

spectrum:

$S_y(e^{j\omega}) = \sum_{m=-\infty}^{+\infty}R_y(m)e^{-j\omega m} = \sigma_x^2\sum_{m=-\infty}^{+\infty}\sum_{k=0}^{+\infty}h(k)h(k+m)e^{-j\omega m}$

$\overset{n=k+m}{=} \sigma_x^2\sum_{k=0}^{+\infty}h(k)e^{j\omega k}\sum_{n=0}^{+\infty}h(n)e^{-j\omega n} = \sigma_x^2\left|H(e^{j\omega})\right|^2 = \sigma_x^2\left|\frac{\sum_{m=0}^{M}\beta_me^{-j\omega m}}{1+\sum_{k=1}^{K}a_ke^{-j\omega k}}\right|^2$

Remark: It is clear that $y(n)$ is also a zero-mean weakly stationary random process.
Example For the AR(1) model above,

$\rho_y(m) = \frac{R_h(m)}{R_h(0)} = \frac{\sum_{k=0}^{+\infty}\alpha^k\alpha^{k+|m|}}{\sum_{k=0}^{+\infty}\alpha^{2k}} = \alpha^{|m|}$, $\sigma_y^2 = \sigma_x^2\sum_{n=0}^{+\infty}\alpha^{2n} = \frac{\sigma_x^2}{1-\alpha^2}$

Theorem The output is uncorrelated with future inputs: for $n > m$,

$E[x(n)y(m)] = E\left[x(n)\sum_{k=0}^{+\infty}h(k)x(m-k)\right] = \sum_{k=0}^{+\infty}h(k)E[x(n)x(m-k)] = \sigma_x^2\sum_{k=0}^{+\infty}h(k)\delta(n-m+k) = 0$
Remark: The theorem is straightforward because of the causality of the model. The causality
states that the output from the model is only dependent upon the input to the model at present
and in the past and has nothing to do with the input in the future.
Write the AR model as (absorbing the signs into the coefficients $a_k$)

$y(n) - \sum_{k=1}^{K}a_ky(n-k) = \beta_0x(n)$

Multiplying by $y(n-i)$, $i \ge 1$, and taking expectations (the input $x(n)$ being uncorrelated with the past output $y(n-i)$) yields the Yule-Walker equations

$R_y(i) = \sum_{k=1}^{K}a_kR_y(i-k)$, i.e., $\rho_y(i) = \sum_{k=1}^{K}a_k\rho_y(i-k)$, $i = 1,2,\ldots,K$

In matrix form,

$\begin{pmatrix} \rho_y(0) & \rho_y(-1) & \cdots & \rho_y(1-K) \\ \rho_y(1) & \rho_y(0) & \cdots & \rho_y(2-K) \\ \vdots & \vdots & \ddots & \vdots \\ \rho_y(K-1) & \rho_y(K-2) & \cdots & \rho_y(0) \end{pmatrix}\begin{pmatrix} a_1 \\ a_2 \\ \vdots \\ a_K \end{pmatrix} = \begin{pmatrix} \rho_y(1) \\ \rho_y(2) \\ \vdots \\ \rho_y(K) \end{pmatrix}$

The parameters $a_1, a_2, \ldots, a_K$ can be then derived from the solution to the above equations.

Remark: In practice, $R_y(i) = E[y(n)y(n-i)]$ is replaced by the sample estimate $\frac{1}{K}\sum_{k=n-K+i}^{n}y(k)y(k-i)$, where $i = 1,2,\ldots,K$.
From

$y(n) - \sum_{k=1}^{K}a_ky(n-k) = x(n)$

we have

$\sigma_x^2 = E\left[x^2(n)\right] = E\left[\left(y(n) - \sum_{k=1}^{K}a_ky(n-k)\right)^2\right] = R_y(0) - 2\sum_{k=1}^{K}a_kR_y(k) + \sum_{p=1}^{K}\sum_{q=1}^{K}a_pa_qR_y(p-q)$

$\overset{\sum_{q}a_qR_y(p-q)=R_y(p)}{=} R_y(0) - 2\sum_{k=1}^{K}a_kR_y(k) + \sum_{p=1}^{K}a_pR_y(p) = R_y(0) - \sum_{k=1}^{K}a_kR_y(k)$

The variance $\sigma_x^2$ can be obtained after the parameters $a_1, a_2, \ldots, a_K$ have been estimated.
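A minimal sketch of the Yule-Walker estimation above for an AR(2) model $y(n) = a_1y(n-1) + a_2y(n-2) + x(n)$: build $R_y$ from the impulse response ($R_y(m) = \sigma_x^2\sum_kh(k)h(k+m)$, truncated), then solve the 2x2 Yule-Walker system and recover $a_1$, $a_2$ and $\sigma_x^2$. The coefficients, variance and truncation length are illustrative choices.

```python
a1, a2, sx2 = 0.5, -0.3, 2.0
L = 4000  # truncation; h(n) decays geometrically since the model is stable

h = [1.0, a1]
for n in range(2, L):
    h.append(a1 * h[n - 1] + a2 * h[n - 2])

def Ry(m):
    return sx2 * sum(h[k] * h[k + m] for k in range(L - m))

R0, R1, R2 = Ry(0), Ry(1), Ry(2)
# Yule-Walker: R(1) = a1*R(0) + a2*R(1), R(2) = a1*R(1) + a2*R(0)
det = R0 * R0 - R1 * R1
est_a1 = (R1 * R0 - R2 * R1) / det
est_a2 = (R2 * R0 - R1 * R1) / det

assert abs(est_a1 - a1) < 1e-6
assert abs(est_a2 - a2) < 1e-6
# variance: sigma_x^2 = R(0) - a1*R(1) - a2*R(2)
assert abs((R0 - est_a1 * R1 - est_a2 * R2) - sx2) < 1e-6
```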
From the MA model

$y(n) = \sum_{m=0}^{M}\beta_mx(n-m)$

we have

$y(n)y(n-i) = \sum_{m=0}^{M}\beta_mx(n-m)\sum_{k=0}^{M}\beta_kx(n-i-k) = \sum_{m=0}^{M}\sum_{k=0}^{M}\beta_m\beta_kx(n-m)x(n-i-k)$

$R_y(i) = \sigma_x^2\sum_{k=0}^{M}\sum_{m=0}^{M}\beta_k\beta_m\delta(k+i-m) = \sigma_x^2\sum_{k=0}^{M-i}\beta_k\beta_{k+i}$, $i = 0,1,\ldots,M$

Writing $\tilde{\beta}_k = \frac{\beta_k}{\beta_0}$ and $\tilde{\sigma}_x^2 = \beta_0^2\sigma_x^2$, the equations become

$R_y(i) = \tilde{\sigma}_x^2\sum_{k=0}^{M-i}\tilde{\beta}_k\tilde{\beta}_{k+i}$, $i = 0,1,\ldots,M$

Thus, the unknowns $\tilde{\sigma}_x^2, \tilde{\beta}_1, \ldots, \tilde{\beta}_M$ can be derived from the solutions to the above M+1 equations.
From the ARMA model

$y(n) - \sum_{k=1}^{K}a_ky(n-k) = \sum_{m=0}^{M}\beta_mx(n-m)$

multiplying by $y(n-M-i)$ and taking expectations, for $i = 1,2,\ldots,K$ the right-hand side vanishes (all the inputs involved are in the future of $y(n-M-i)$), so

$R_y(M+i) = \sum_{k=1}^{K}a_kR_y(M+i-k)$, i.e., $\rho_y(M+i) = \sum_{k=1}^{K}a_k\rho_y(M+i-k)$

In matrix form,

$\begin{pmatrix} \rho_y(M) & \rho_y(M-1) & \cdots & \rho_y(M+1-K) \\ \rho_y(M+1) & \rho_y(M) & \cdots & \rho_y(M+2-K) \\ \vdots & \vdots & \ddots & \vdots \\ \rho_y(M+K-1) & \rho_y(M+K-2) & \cdots & \rho_y(M) \end{pmatrix}\begin{pmatrix} a_1 \\ a_2 \\ \vdots \\ a_K \end{pmatrix} = \begin{pmatrix} \rho_y(M+1) \\ \rho_y(M+2) \\ \vdots \\ \rho_y(M+K) \end{pmatrix}$

The parameters $a_1, a_2, \ldots, a_K$ can be then derived from the solutions to the above equations. With the $a_k$ known, $y(n) - \sum_{k=1}^{K}a_ky(n-k)$ is a pure MA process $\sum_{m=0}^{M}\beta_mx(n-m)$, and the remaining unknowns $\tilde{\sigma}_x^2 = \beta_0^2\sigma_x^2$ and $\tilde{\beta}_1 = \beta_1/\beta_0, \ldots, \tilde{\beta}_M = \beta_M/\beta_0$ can be estimated as in the MA case.
4. Problems

(1) An IID process must be strictly stationary.

In fact, let $\{\xi_t, t \in T\}$ be an IID process; then

$F(x_1,\ldots,x_n; t_1,\ldots,t_n) \overset{\text{independence}}{=} \prod_{i=1}^{n}F(x_i; t_i) \overset{\text{identical distribution}}{=} \prod_{i=1}^{n}F(x_i) = F(x_1,\ldots,x_n; t_1+\tau,\ldots,t_n+\tau)$

(2) If $\{\xi_n\}$ is a sequence of zero-mean random variables with common variance $\sigma^2$ and $E[\xi_n\xi_m] = 0$ (when $n \ne m$), then

$E[\xi_n\xi_m] = \begin{cases} \sigma^2 & n=m \\ 0 & n \ne m \end{cases} = \sigma^2\delta(n-m)$
(3) Let $\theta$ be a random variable possessing a uniform distribution over the interval $(0, 2\pi)$ and $\xi_t = a\cos(\omega t + \theta)$; then

$E[\xi_t] = \frac{1}{2\pi}\int_{0}^{2\pi}a\cos(\omega t + x)\,dx \overset{y=\omega t+x}{=} \frac{a}{2\pi}\int_{\omega t}^{\omega t+2\pi}\cos(y)\,dy = 0$

$E[\xi_{t_2}\xi_{t_1}] = \frac{a^2}{2\pi}\int_{0}^{2\pi}\cos(\omega t_1 + x)\cos(\omega t_2 + x)\,dx \overset{\cos\alpha\cos\beta = \frac{\cos(\alpha+\beta)+\cos(\alpha-\beta)}{2}}{=} \frac{a^2}{2}\cos(\omega(t_2 - t_1))$

This implies that the process $\{\xi_t = a\cos(\omega t + \theta), -\infty < t < +\infty\}$ is weakly stationary. #
(4) Let $s(t)$ be a periodic function with period T, $\theta$ be a random variable possessing the uniform distribution on the interval $(0, T)$ and $\{\xi_t = s(t+\theta), -\infty < t < +\infty\}$; then for all $-\infty < t < +\infty$, we have

$E[\xi_t] = \int_{-\infty}^{+\infty}s(t+x)f_\theta(x)\,dx = \frac{1}{T}\int_{0}^{T}s(t+x)\,dx \overset{y=t+x}{=} \frac{1}{T}\int_{t}^{t+T}s(y)\,dy \overset{\text{periodicity}}{=} \frac{1}{T}\int_{0}^{T}s(y)\,dy = \text{const.}$

$E[\xi_{t_2}\xi_{t_1}] = \frac{1}{T}\int_{0}^{T}s(t_1+x)s(t_2+x)\,dx \overset{y=t_1+x}{=} \frac{1}{T}\int_{t_1}^{t_1+T}s(y)s(t_2-t_1+y)\,dy \overset{\text{periodicity}}{=} \frac{1}{T}\int_{0}^{T}s(y)s(t_2-t_1+y)\,dy = R(t_2-t_1)$

This implies that the process $\{\xi_t = s(t+\theta), -\infty < t < +\infty\}$ is weakly stationary. #
(5) Let $\{\xi_t, -\infty < t < +\infty\}$ be a random process such that for all $-\infty < t < +\infty$,

$P\{\xi_t = k\} = \begin{cases} \frac{1}{2} & k = I \\ \frac{1}{2} & k = -I \\ 0 & \text{others} \end{cases}$

Furthermore, for all $\tau > 0$, if we denote by $A_k$ the event that the process changes its values k times within the period $[t, t+\tau)$, then

$P\{A_k\} = \frac{(\lambda\tau)^k}{k!}e^{-\lambda\tau}$

Thus, for all $-\infty < t < +\infty$, we have

$E[\xi_t] = I\cdot\frac{1}{2} + (-I)\cdot\frac{1}{2} = 0$

and, with $\tau = |t_2 - t_1|$,

$E[\xi_{t_2}\xi_{t_1}] = I^2P\{\text{even number of changes}\} - I^2P\{\text{odd number of changes}\} = I^2\sum_{k=0}^{+\infty}(-1)^k\frac{(\lambda\tau)^k}{k!}e^{-\lambda\tau} = I^2e^{-2\lambda\tau}$

Note that the above result can also be applied to the case of $t_2 = t_1$.

This implies that the process is weakly stationary. #
(6) If $\{\xi_t, -\infty < t < +\infty\}$ is a periodic weakly stationary process with period T, then its correlation function is also periodic with period T.

Proof:

Since $E\left[|\xi_{t+T} - \xi_t|^2\right] = 0$, the Schwarz inequality gives

$0 \le \left|E[(\xi_{t+\tau+T} - \xi_{t+\tau})\xi_t]\right| \le \sqrt{E\left[|\xi_{t+\tau+T} - \xi_{t+\tau}|^2\right]E\left[|\xi_t|^2\right]} = 0 \implies E[(\xi_{t+\tau+T} - \xi_{t+\tau})\xi_t] = 0$

$R(\tau+T) - R(\tau) = E[(\xi_{t+\tau+T} - \xi_{t+\tau})\xi_t] = 0 \implies R(\tau+T) = R(\tau)$ #
Martingales
1. Simple properties

Definitions. Let (Ω,K,P) be a probability space and (Fn)n≥1 a filtration, i.e. a nondecreasing sequence of sub-σ-algebras of K. A sequence X = (Xn)n≥1 of random variables is called adapted if Xn is Fn-measurable for any positive integer n. The system (Ω,K,P,(Fn)n) is called a stochastic basis.
Example. If we define Fn := σ(X1,X2,…,Xn), then X is clearly adapted. This filtration is called the natural filtration given by X.
Definitions. Let X be an adapted sequence. Suppose that Xn ∈ L1 for any n. Then X is called
A supermartingale if E(Xn+1|Fn) ≤ Xn ∀n;
A martingale if E(Xn+1|Fn) = Xn ∀n;
A submartingale if E(Xn+1|Fn) ≥ Xn ∀n;
A semimartingale if X is either a supermartingale or a martingale or a submartingale.
Remark. If one does not define the filtration, it is understood that he has in
mind the natural filtration. Also notice that a martingale is both a sub- and a
super- martingale and conversely, if X is both sub- and super- martingale, it is a
martingale.
Remark. In the literature the concept of semimartingale is slightly different.
However, we shall use it only in this sense.
Examples.
1. Let εn be a sequence of i.i.d. r.v. from L1 and let a = Eε1. Let Fn = σ(ε1,ε2,…,εn) and Xn = ε1 + ε2 +…+ εn. Then a ≤ 0 ⟹ X is a supermartingale, a = 0 ⟹ X is a martingale and a ≥ 0 ⟹ X is a submartingale. If we think of εn as being the gain of a player at the nth game, then Xn is the gain of the player after n games. So we can understand a supermartingale or a submartingale as the gain in an unfair game and the martingale as the gain in a fair game. Supermartingale = the game is unfavorable to the player and submartingale = game favorable to the player.
Proof. E(Xn+1|Fn) = E(Xn+εn+1|Fn) = E(Xn|Fn) + E(εn+1|Fn) = Xn + E(εn+1|Fn) (as Xn is Fn-measurable) = Xn + Eεn+1 (as εn+1 is independent of Fn) ⟹ E(Xn+1|Fn) = Xn + a.
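The identity E(Xn+1|Fn) = Xn + a can be checked exactly for a finitely supported step distribution by enumerating atoms of Fn. A minimal sketch (the step distribution and horizon are illustrative choices; here the steps take the values -1, 0, 1 with equal probability, so a = 0):

```python
from itertools import product
from fractions import Fraction

values = [-1, 0, 1]
prob = Fraction(1, 3)
n = 3  # condition on the first n steps

for past in product(values, repeat=n):       # one atom of F_n per path
    Xn = sum(past)
    # conditional expectation of X_{n+1} given this atom
    cond = sum(prob * (Xn + e) for e in values)
    assert cond == Xn                        # = Xn + a with a = 0
```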
2. Let εn be a sequence of non-negative i.i.d. r.v. from L1 and let a = Eε1. Let Fn = σ(ε1,ε2,…,εn) and Xn = ε1ε2…εn. Then a ≤ 1 ⟹ X is a supermartingale, a = 1 ⟹ X is a martingale and a ≥ 1 ⟹ X is a submartingale.
Proof. Similar. E(Xn+1|Fn) = E(Xnεn+1|Fn) = XnE(εn+1|Fn) (as Xn is Fn-measurable) = XnEεn+1 (as εn+1 is independent of Fn) ⟹ E(Xn+1|Fn) = aXn.
3. Let (Fn)n be a filtration and f ∈ L1. Let Xn = E(f|Fn). Then X is a martingale. The random variable X∞ = E(f|F∞) is called the tail of X. Martingales of this form are called martingales with tail.
but not in L .
5. Another concrete example. Let εn be i.i.d. with the distribution (δ-1+δ1)/2. Let Fn = σ(ε1,…,εn). Let Bn ∈ Fn be such that P(Bn) → 0 as n → ∞ but P(limsup Bn) = 1. Define the sequence Xn by recurrence as follows: X1 = 1 and Xn+1 = Xn(1+εn+1) + εn+1·1Bn for n ≥ 1. Then Xn converges in probability to 0 but P(limsup Xn = liminf Xn) = 0. That is, Xn diverges almost surely.
Proof. Remark that εn+1(ω) = -1 and ω ∉ Bn ⟹ Xn+1(ω) = 0, hence Xn+1(ω) ≠ 0 ⟹ εn+1(ω) = 1, Xn(ω) ≠ 0 or ω ∈ Bn. That is, {Xn+1 ≠ 0} ⊆ {εn+1 = 1, Xn ≠ 0} ∪ Bn ⟹ P({Xn+1 ≠ 0}) ≤ P({εn+1 = 1, Xn ≠ 0} ∪ Bn) ≤ P(εn+1 = 1, Xn ≠ 0) + P(Bn) = P(Xn ≠ 0)P(εn+1 = 1) + P(Bn) = P(Xn ≠ 0)/2 + P(Bn).
Let pn = P(Xn ≠ 0) and qn = P(Bn). So pn+1 ≤ pn/2 + qn ∀n and qn → 0. Applying the recurrence many times we see that pn+1 ≤ 2^{-1}pn + qn ≤ 2^{-2}pn-1 + 2^{-1}qn-1 + qn ≤ 2^{-3}pn-2 + 2^{-2}qn-2 + 2^{-1}qn-1 + qn ≤ … ≤ 2^{-n}p1 + (q1 + 2q2 + … + 2^{n-1}qn)/2^{n-1}. As 2^{-n}p1 → 0 and, by Cesaro-Stolz, lim (q1 + 2q2 + … + 2^{n-1}qn)/2^{n-1} = lim 2^{n}qn+1/(2^{n} − 2^{n-1}) = 2·lim qn+1 = 0, it means that P(Xn ≠ 0) → 0.
0. Now suppose that Xn() a for some a . Then Xn+1() Xn() 0 . But
from the recurrence relation we infere that Xn+1 Xn = n+1(Xn + 1Bn ). So, if Xn+1
Xn = Xn + 1Bn (as n = 1) converges to 0, then Xn() + 1Bn () 0, too ,
which is the same with the claim that Xn() + 1Bn () 0 , meaning that 1Bn ()
has a limit. But we know that P(liminf Bn) lim P(Bn) = 0 and P(limsup Bn) = 1 ,
i.e. the sequence 1Bn diverges a.s. . Therefore P( Xn converges to a finite limit)
= 0. Suppose that Xn() . That will imply the fact that Xn() > 0 for any n
great enough. But P(Xn+k > 0 k) P(Xn+j0) j and that converges to 0. Meaning
that P(limXn = or -) = 0. We inferr that Xn diverges a.s.
The fact that Xn is a martingale is obvious, since E(Xn+1Fn) = XnE(1+n+1Fn)+
1Bn E(n+1Fn) (as Xn is Fn measurable and Bn Fn ) = Xn E(1+n+1) + 1Bn E(n+1) = Xn
(as En+1 = 0) . On the other hand remark that
n 1
q
j =1
A.BENHARI
-198-
Moreover, if X,Y are martingales, then aX+bY is a martingale ∀a,b ∈ ℝ, meaning that the set of all the martingales of some stochastic basis is a vector space. Moreover, X is a supermartingale ⟺ -X is a submartingale.
The proof is obvious and left to the reader.
Property 1.3. If X is a martingale and f is a convex function such that f(Xn) ∈ L1 ∀n, then the sequence Yn = f(Xn) is a submartingale. If f is concave and f(Xn) ∈ L1 ∀n, then the sequence Yn = f(Xn) is a supermartingale. As a consequence, if X is a martingale, then (|Xn|)n, ((Xn)+)n and (Xn2)n (the latter when Xn ∈ L2) are submartingales.
Proof. It is Jensen's inequality for conditioned expectations. Suppose f is convex. Then E(Yn+1|Fn) = E(f(Xn+1)|Fn) ≥ f(E(Xn+1|Fn)) = f(Xn) = Yn.
Property 1.4. The Doob-Meyer decomposition. The submartingales are actually sums between martingales and increasing sequences. Any submartingale X can be written as X = M + A where M is a martingale and A is nondecreasing (An ≤ An+1 a.s.) and predictable (i.e. An+1 is Fn-measurable).
Proof. Let us define the sequence An by the following recurrence: A1 = 0, A2 = E(X2|F1) − X1, A3 = A2 + E(X3|F2) − X2, …, An+1 = An + E(Xn+1|Fn) − Xn. As X is a submartingale, A is indeed non-decreasing. By the definition, An+1 is Fn-measurable. Let Mn = Xn − An. As Mn+1 = Mn + Xn+1 − E(Xn+1|Fn) it follows that M is indeed a martingale.
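An exact sketch of the recurrence above for the submartingale Xn = Sn², where Sn is a simple symmetric ±1 random walk: here E(Xn+1|Fn) = Sn² + 1, so the compensator is deterministic, An = n − 1 (with A1 = 0). The horizon is an illustrative choice.

```python
from itertools import product
from fractions import Fraction

half = Fraction(1, 2)
N = 4

for path in product([-1, 1], repeat=N):      # one atom of F_N per path
    S, A = 0, Fraction(0)
    for n in range(1, N + 1):
        S += path[n - 1]
        if n >= 2:
            # E(X_n | F_{n-1}) = (S_{n-1})^2 + 1, computed exactly
            S_prev = S - path[n - 1]
            cond = half * (S_prev + 1) ** 2 + half * (S_prev - 1) ** 2
            assert cond == S_prev ** 2 + 1
            A += cond - S_prev ** 2          # A_n = A_{n-1} + E(X_n|F_{n-1}) - X_{n-1}
    assert A == N - 1                        # compensator is deterministic: A_n = n-1
```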
Property 1.5. Martingale transforms. Let X = (Xn)n≥1 and B = (Bn)n≥0 be adapted sequences of r.v. such that Bn(Xn+1 − Xn) ∈ L1 (that happens for instance if Bn ∈ L∞ and Xn ∈ L1 ∀n). Remark that, unlike X, B starts from 0. We shall agree that B0 is a constant in order to be measurable with respect to any σ-algebra. Let us define a new sequence denoted by B•X by the recurrence (B•X)1 = B0X1 and, for n ≥ 1, (B•X)n+1 = (B•X)n + Bn(Xn+1 − Xn). (Or, directly, (B•X)n = B0X1 + B1(X2 − X1) + B2(X3 − X2) + … + Bn-1(Xn − Xn-1) for n ≥ 2.) Call the sequence B•X the transform of X by B. Then
(i) If X is a martingale, B•X is a martingale, too;
(ii). If X is a submartingale and Bn ≥ 0 ∀n, then B•X is a submartingale, too; if Bn ≤ 0 ∀n, B•X is a supermartingale.
(iii). If Bn = c is a constant sequence, c ∈ L∞(F1), then B•X = cX.
Proof. E((B•X)n+1|Fn) = E((B•X)n + Bn(Xn+1 − Xn)|Fn) = (B•X)n + BnE(Xn+1 − Xn|Fn).
2. Stopping times

Definition. A random variable τ : Ω → {1,2,…} ∪ {∞} is called a stopping time if {τ = n} ∈ Fn for all n. For an adapted sequence X and a stopping time τ, define the stopped variable by

(2.1) Xτ(ω) = Xτ(ω)(ω) if τ(ω) < ∞ (and Xτ(ω) = X∞(ω) on {τ = ∞}, if a limit variable X∞ is given)

and the σ-algebra Fτ = {A ∈ K | A∩{τ = n} ∈ Fn ∀n}.
Remark that, while there exists an ambiguity in the definition of Xτ on the set {τ = ∞}, if τ < ∞ a.s. there is no imprecision.
Property 2.1. Examples of stopping times and properties of Fτ.
(i). Any constant is a stopping time.
(ii). If τ = k = constant, then Fτ = Fk, meaning that the definition of Fτ is natural.
(iii). If X is adapted and B ∈ B(ℝ), then τB defined as τB = inf {n | Xn ∈ B} is a stopping time. (We adopt the convention that inf ∅ = ∞.) This stopping time is called the hitting time of B.
(iv). If τ is a stopping time and A ∈ Fτ then τA is again a stopping time, where τA = τ1A + ∞·1Ω\A.
(v). If σ and τ are stopping times and σ ≤ τ, then Fσ ⊆ Fτ.
(vi) A ∈ Fσ ⟹ A∩{σ≤τ} ∈ Fτ, A∩{σ=τ} ∈ Fτ
(vii) {σ≤τ} ∈ Fσ∩Fτ, {σ=τ} ∈ Fσ∩Fτ
(viii) Fσ∧τ = Fσ∩Fτ, σ(Fσ∪Fτ) = Fσ∨τ
Proof. (i) and (ii) are obvious. For (iii) remark that {τB = n} = {X1∉B, X2∉B, …, Xn-1∉B, Xn∈B} ∈ Fn since X is adapted.
(iv) It is easy: {τA = n} = {τ = n}∩A ∈ Fn due to the definition of Fτ.
(v). It is also immediate: A ∈ Fσ ⟹ A∩{σ = k} ∈ Fk, so A∩{τ = n} = ∪k≤n (A∩{σ=k}∩{τ=n}) ∈ Fn (as A∩{σ=k} ∈ Fk ⊆ Fn and {τ = n} ∈ Fn).
(vi). Let A ∈ Fσ. To prove that A∩{σ≤τ} ∈ Fτ we have to check that A∩{σ≤τ}∩{τ=n} ∈ Fn ∀n. But A∩{σ≤τ}∩{τ=n} = A∩{σ≤n}∩{τ=n} belongs to Fn since A ∈ Fσ ⟹ A∩{σ ≤ n} ∈ Fn and τ is a stopping time ⟹ {τ = n} ∈ Fn. As for the set A∩{σ=τ}, it belongs both to Fτ (as A∩{σ=τ}∩{τ=n} = (A∩{σ = n})∩{τ = n}) and to Fσ (as A∩{σ = τ}∩{σ = n} = (A∩{σ = n})∩{τ = n}).
(vii). That {σ≤τ} ∈ Fτ is an easy consequence of (vi) (just set A = Ω). To check that {σ≤τ} ∈ Fσ, let n be arbitrary. Then {σ≤τ}∩{σ = n} = {σ=n}∩{τ≥n} = {σ=n} \ ({σ=n}∩{τ<n}) ∈ Fn as {σ=n} ∈ Fn and {τ < n} ∈ Fn. Thus {σ≤τ} ∈ Fσ∩Fτ. As for {σ = τ}, it is even easier: {σ = τ}∩{σ = n} = {σ = τ}∩{τ = n} = {σ = n}∩{τ = n} ∈ Fn.
(viii). As σ∧τ is a stopping time and σ∧τ ≤ σ, σ∧τ ≤ τ, it follows that Fσ∧τ ⊆ Fσ∩Fτ. Conversely, if A ∈ Fσ∩Fτ, then A∩{σ∧τ ≤ n} = (A∩{σ ≤ n}) ∪ (A∩{τ ≤ n}) ∈ Fn hence A ∈ Fσ∧τ. As both σ ≤ σ∨τ and τ ≤ σ∨τ, Fσ ∪ Fτ ⊆ Fσ∨τ ⟹ σ(Fσ∪Fτ) ⊆ Fσ∨τ. Conversely, A ∈ Fσ∨τ ⟹ A = (A∩{σ∨τ=σ}) ∪ (A∩{σ∨τ=τ}). The first set is in Fσ and the second one in Fτ hence their union is in σ(Fσ∪Fτ).
Remark. Xτ is Fτ-measurable. Indeed, for B ∈ B(ℝ),

{Xτ ∈ B} = ∪n≥1 ({Xn ∈ B, τ = n}) ∪ {X∞ ∈ B, τ = ∞} = ∪n≥1 (Xn-1(B)∩{τ = n}) ∪ (X∞-1(B)∩{τ = ∞})

and each term of the union meets the defining condition of Fτ.

Proposition. For f ∈ L1,

(2.2) E(f|Fτ) = Σn≥1 1{τ=n}E(f|Fn) + 1{τ=∞}E(f|F∞)

Proof. Let Y be the right-hand side of (2.2). By the same reasoning as before, Y is Fτ-measurable. Let A ∈ Fτ. The task is to prove that E(f1A) = E(Y1A). But E(Y1A) = E(Σn 1{τ=n}∩A E(f|Fn)) + E(1{τ=∞}∩A E(f|F∞)) = Σn E(E(f1{τ=n}∩A|Fn)) + E(E(f1{τ=∞}∩A|F∞)) = Σn E(f1{τ=n}∩A) + E(f1{τ=∞}∩A) = E(f1A). Notice that we have commuted the sum with the expectation due to the Lebesgue dominated convergence theorem: the partial sums gn = Σk≤n 1{τ=k}∩A E(f|Fk) are dominated by the integrable r.v. Σk 1{τ=k}E(|f| |Fk).
Proposition. If X is a (super/sub)martingale and τ is a stopping time, then the stopped sequence (Xτ∧n)n is a (super/sub)martingale with respect to the same filtration.
Proof. Let τ be a stopping time and Bn = 1{τ > n} = 1{n < τ} for n ≥ 1 and B0 = 1. Due to the definition of a stopping time, B is an adapted sequence. Let X be an adapted sequence. Then (B•X)n = Xτ∧n. Indeed, if τ(ω) = n, n ≥ 2, then Bk(ω) = 1 if k < n and = 0 if k ≥ n. Let k ≤ n. Then (B•X)k(ω) = (B0X1 + B1(X2 − X1) + B2(X3 − X2) + … + Bk-1(Xk − Xk-1))(ω) = (X1 + (X2 − X1) + (X3 − X2) + … + (Xk − Xk-1))(ω) = Xk(ω). If k > n, then (B•X)k(ω) = (X1 + B1(X2 − X1) + … + Bn-1(Xn − Xn-1) + Bn(Xn+1 − Xn) + … + Bk-1(Xk − Xk-1))(ω) = (X1 + (X2 − X1) + … + (Xn − Xn-1) + 0·(Xn+1 − Xn) + … + 0·(Xk − Xk-1))(ω) = Xn(ω). If n = 1, then (B•X)1 = B0X1 = X1 = Xτ∧1 holds in this case, too. The claim now follows from Property 1.5, since Bn ≥ 0.
Xτ∧n → Xτ a.s. and, as ‖Xτ − Xτ∧n‖1 → 0, it means that Xτ is in L1, too. The same holds for Xσ. But we know that E(Xτ∧n1A) ≥ E(Xσ∧n1A) ∀A ∈ Fn, n fixed. As Fn ⊆ Fn+k for k ≥ 0, it follows that E(Xτ∧(n+k)1A) ≥ E(Xσ∧(n+k)1A) ∀A ∈ Fn, n fixed, for any k ≥ 1. Letting k → ∞ and keeping in mind that fn → f in L1 ⟹ E(fn1A) → E(f1A) ∀A, it follows that
1
Fn . Then A is an algebra of
n =1
n =1
E(
k =0
n + k +1
1{ >n + k } )
k =0
k =0
Therefore is regular.
Σk≥1 P(τ > k−1) = Eτ < ∞.
After all, the conclusion is that ‖Yn − Y‖2² = ‖Y − Sτ∧n‖2² → 0 as n → ∞. Meaning that Yn → Y in L2 ⟹ Yn² → Y² in L1 ⟹ Zn = Yn² − (τ∧n)σ² → Z = Y² − τσ² in L1. So τ is regular for Z.
Remark. In statistics one uses Wald's inequalities in a slightly different setting: τ is a counting variable which is independent of the ξ's. We can see that case as a particular one of ours as follows: let us extend the natural filtration with the σ-algebra generated by τ, i.e. Fn = σ(ξ1, ξ2, …, ξn, τ). Then X remains a semimartingale with respect to the new filtration because E(Xn+1|Fn) = E(Xn + ξn+1|Fn) = Xn + E(ξn+1|Fn) and ξn+1 is independent of Fn (the associativity of the independence: if F1 (here σ(ξ1,…,ξn)), F2 (here σ(τ)) and F3 (here σ(ξn+1)) are independent, then σ(F1 ∪ F2) is independent of F3).
Remark. One should not believe that automatically any stopping time with finite expectation is regular. For instance, if Xn = n² (this is a submartingale!) and τ is such that Eτ < ∞ but Eτ² = ∞, then Xτ = τ² is not even in L¹, in spite of the fact that the Xn, being constants, are in L¹. So Xτ∧n cannot converge to Xτ in L¹!
3.

There are two players, A and B, playing a game. The first one has a capital of a euros, the second one b euros (a, b positive integers). If A wins a round, he gains 1 euro; if B wins, A loses 1 euro. They decide to play until the ruin, i.e. until one of them loses all his money. Let τ be the ruin time, that is, the number of games after which play stops. We want to find the probability that A wins and the expectation of τ.
Suppose that the probability that A wins a round is p. Let q be the probability of a draw and r the probability that B wins. To avoid trivialities we accept that p, r ≠ 0. Let ξn be the gain of A at the nth game; its distribution is given by P(ξn = −1) = r, P(ξn = 0) = q, P(ξn = 1) = p. Thus
(3.1) μ := Eξ1 = p − r
and, as Eξ1² = p + r,
(3.2) σ² := Var(ξ1) = p + r − (p − r)² = p(1 − p) + r(1 − r) + 2pr.
We accept that the ξ's are independent. Let Xn = ξ1 + … + ξn. This is the gain of the first player after playing n games.
The game stops the first time when Xn = b (in this case B is ruined) or Xn = −a (now A has lost all his money). So τ = inf{n : Xn = b or Xn = −a}. Let (Fn)n be the natural filtration.
Remark first that τ < ∞ a.s. That is, P(−a < Xn < b for any n) = 0. Indeed, if μ ≠ 0 the law of large numbers says that Xn/n → μ a.s., hence also in probability. So P(n(μ−ε) ≤ Xn ≤ n(μ+ε)) → 1 as n → ∞ for any ε > 0. We infer that Xn → ∞ if μ > 0 and Xn → −∞ if μ < 0. In both cases P(−a < Xn < b for any n) = 0.
If μ = 0, the Central Limit Theorem asserts that Xn/(σ√n) → N(0,1) in distribution. Therefore P(−a < Xn < b for any n) ≤ P(−a < Xn < b) = P(−a/(σ√n) < Xn/(σ√n) < b/(σ√n)) ≤ P(−ε < Xn/(σ√n) < ε) (for n great enough) → N(0,1)((−ε, ε)) for any ε > 0. As the normal distribution is absolutely continuous, the quantity N(0,1)((−ε, ε)) can be made arbitrarily small. So P(τ = ∞) = P(−a < Xn < b for any n) = 0 in this case, too.
Why Eτ < ∞?
There exists a direct proof, but it is pretty sophisticated. Here is an indirect one.
Let Yn = Xn − nμ. Then (Yn)n is a martingale and EYn = 0. Then E(Yτ∧n) = 0, since any bounded stopping time (in our case τ∧n) is regular. It means that E(Xτ∧n) = μE(τ∧n) ∀n. But the right hand term converges to μEτ, by Beppo Levi. The left hand one is bounded between −a and b, since −a ≤ Xτ∧n ≤ b, hence the limit EXτ = E(a.s. lim Xτ∧n) = μEτ. In particular, for μ ≠ 0 this forces Eτ ≤ max(a, b)/|μ| < ∞.
(3.6)
P(A wins) =
Notice that if there are no draws then, in the fair case p = r, one gets Eτ = ab; and suppressing the draws does not change the win-probabilities.
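For the fair, draw-free walk (p = r = 1/2, q = 0), the claim Eτ = ab can be checked exactly by solving the linear system m(x) = 1 + (m(x−1) + m(x+1))/2 with m(−a) = m(b) = 0. A minimal sketch of ours, in exact arithmetic:

```python
from fractions import Fraction

def mean_ruin_time(a, b):
    """Exact E(tau) for the fair +/-1 walk started at 0, absorbed at -a or b:
    solve m(x) = 1 + (m(x-1) + m(x+1))/2, m(-a) = m(b) = 0, with the
    Thomas algorithm on the tridiagonal system (off-diagonals -1/2)."""
    n = a + b - 1                        # unknowns m(x), x = -a+1 .. b-1
    diag = [Fraction(1)] * n
    rhs = [Fraction(1)] * n
    for i in range(1, n):                # forward elimination
        w = Fraction(-1, 2) / diag[i - 1]
        diag[i] -= w * Fraction(-1, 2)
        rhs[i] -= w * rhs[i - 1]
    m = [Fraction(0)] * n
    m[-1] = rhs[-1] / diag[-1]
    for i in range(n - 2, -1, -1):       # back substitution
        m[i] = (rhs[i] + Fraction(1, 2) * m[i + 1]) / diag[i]
    return m[a - 1]                      # entry for the start x = 0

for a, b in [(1, 1), (2, 3), (5, 7)]:
    assert mean_ruin_time(a, b) == a * b
```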
Convergence of martingales

1. Maximal inequalities
Therefore P(τ < ∞) = P(∪n {τ ≤ n}) = limn P(τ ≤ n) (since the sets increase!) ≤ EX1/a. As a consequence P(X* > a) ≤ EX1/a.
(1.4) P(X* > a) ≤ supn{E|Xn|}/a
(1.5) aP(X*n > a) ≤ E(|Xn| 1{X*n > a})
Proof. Let m = supn E|Xn|, let a > 0 and let Yn = |Xn|. Then Y is another submartingale, by Jensen's inequality, hence m = limn E|Xn|. Let
(1.6) τ = inf{n : Yn > a} (inf ∅ := ∞!)
Then the stopped sequence (Yτ∧n)n remains a submartingale (any bounded stopping time is regular!) and Yτ∧n ≥ a1{τ≤n} + Yn1{τ>n}. (Indeed, by the very definition of τ, τ < ∞ ⇒ Yτ > a!)
It follows that a1{τ≤n} ≤ Yτ∧n ⇒ aP(τ ≤ n) ≤ EYτ∧n ≤ EYn ≤ m (the stopping theorem applied to the pair of regular stopping times τ∧n and n!). It means that P(τ ≤ n) ≤ m/a for any n, hence P(τ < ∞) ≤ m/a. But clearly {τ < ∞} = {X* > a}.
The second inequality comes from the remark that τ ≤ n ⇔ X*n > a. So a1{τ≤n} ≤ Yτ∧n1{τ≤n} ⇒ aP(τ ≤ n) ≤ E(Yτ∧n1{τ≤n}) ≤ E(Yn1{τ≤n}) (as τ∧n ≤ n ⇒ Yτ∧n ≤ E(Yn|Fτ∧n) by the stopping theorem ⇒ E(Yτ∧n1A) ≤ E(Yn1A) ∀A ∈ Fτ∧n; our A is {τ ≤ n}!). Recalling that {τ ≤ n} = {X*n > a}, we discover that aP(X*n > a) ≤ E(Yn1{X*n > a}) = E(|Xn|1{X*n > a}), which is exactly (1.5).
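Inequality (1.5) can be checked exhaustively on a toy submartingale, here Yn = |Xn| for the simple symmetric ±1 walk (our choice of example); every one of the 2^n equally likely paths is enumerated, so no sampling is involved.

```python
from itertools import product

def doob_sides(n, a):
    """Totals, over all 2^n equally likely paths of the symmetric +/-1 walk,
    of the two sides of (1.5): a * #{X*_n > a}  vs  sum of |X_n| over
    those paths, where X*_n = max_{k<=n} |X_k|."""
    hit, tail_sum = 0, 0
    for steps in product((-1, 1), repeat=n):
        x, xmax = 0, 0
        for s in steps:
            x += s
            xmax = max(xmax, abs(x))   # running maximum of |X_k|
        if xmax > a:
            hit += 1
            tail_sum += abs(x)
    return a * hit, tail_sum

lhs, rhs = doob_sides(10, 2)
assert lhs <= rhs          # a P(X*_n > a) <= E(|X_n| 1{X*_n > a})
```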
We shall prove now another kind of maximal inequalities, concerned with the norms of X*.
(i) If p > 1 and 1/p + 1/q = 1, then
(1.7) ||X*||p ≤ q supn ||Xn||p.
(ii) If the Xn are only in L¹, then
(1.8) EX* ≤ e/(e−1) · (1 + supn E(|Xn| ln⁺|Xn|)).
Proof. (i). Recall the following trick when dealing with non-negative random variables: if f: [0,∞) → R is differentiable and X ≥ 0, then Ef(X) = f(0) + ∫0∞ f′(t)P(X > t)dt. If f(x) = x^p the above formula becomes EX^p = ∫0∞ pt^{p−1}P(X > t)dt.
Now write (1.5) as tP(X*n > t) ≤ E(Yn1{X*n > t}) and multiply it with pt^{p−2}. We obtain
E(X*n)^p = ∫0∞ pt^{p−1}P(X*n > t)dt ≤ ∫0∞ pt^{p−2}(∫ Yn1{X*n > t}dP)dt = q∫ Yn(∫0∞ (p−1)t^{p−2}1[0,X*n)(t)dt)dP (we applied Fubini, the nonnegative case, and p = q(p−1)) = q∫ Yn(X*n)^{p−1}dP ≤ q ||Yn||p ||(X*n)^{p−1}||q (Hölder!). But
||(X*n)^{p−1}||q = (∫(X*n)^{(p−1)q}dP)^{1/q} = (E(X*n)^p)^{1/q},
hence E(X*n)^p ≤ q||Yn||p (E(X*n)^p)^{1/q}. As E(X*n)^p < ∞ (because X*n ≤ |X1| + … + |Xn|), we may divide and get (E(X*n)^p)^{1−1/q} ≤ q||Yn||p, that is, ||X*n||p ≤ q||Yn||p for every n. As a consequence, letting n → ∞ (Beppo Levi, since X*n ↑ X*), ||X*||p ≤ q supk ||Yk||p = q supk ||Xk||p, which is (1.7).
(ii). Write (1.5) as P(X*n > t) ≤ (1/t)E(Yn1{X*n > t}) and integrate that from 1 to ∞:
∫1∞ P(X*n > t)dt ≤ ∫1∞ (1/t)E(Yn1{X*n > t})dt = ∫ Yn(∫1∞ (1/t)1(0,X*n)(t)dt)dP = ∫ Yn ln⁺(X*n)dP. Since EX*n ≤ 1 + ∫1∞ P(X*n > t)dt, we get
(1.10) EX*n ≤ 1 + E(Yn ln⁺(X*n)).
Now look at the right hand term of (1.10). The integrand is of the form a ln⁺b. As a ln b = a ln(a·(b/a)) = a ln a + a ln(b/a) and x > 0 ⇒ ln x ≤ x/e, it follows that a ln b ≤ a ln a + a·b/(ae) = a ln a + b/e. The inequality holds with x ln x replaced by x ln⁺x: if b > 1, then a ln⁺b = a ln b ≤ a ln a + b/e ≤ a ln⁺a + b/e, and if b ≤ 1, then a ln⁺b = 0 ≤ a ln⁺a + b/e. We got the elementary inequality
(1.11) a ln⁺b ≤ a ln⁺a + b/e, ∀a, b ≥ 0.
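The elementary inequality (1.11) is easy to test numerically on a grid; a small sketch of ours, where lnplus is our name for ln⁺:

```python
import math

def lnplus(x):
    """ln^+ x = max(ln x, 0), with the convention ln^+ 0 = 0."""
    return max(math.log(x), 0.0) if x > 0 else 0.0

# Check a ln+(b) <= a ln+(a) + b/e over a grid of nonnegative values.
grid = [0.0, 0.1, 0.5, 1.0, 2.0, math.e, 10.0, 100.0]
for a in grid:
    for b in grid:
        assert a * lnplus(b) <= a * lnplus(a) + b / math.e + 1e-12
```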
Using (1.11) with a = Yn and b = X*n in (1.10) one gets EX*n ≤ 1 + E(Yn ln⁺Yn) + EX*n/e.
This implies that (1 − e⁻¹)EX*n ≤ 1 + E(Yn ln⁺Yn) ∀n. Remark that the sequence (Yn ln⁺Yn)n is a submartingale due to the convexity of the function x ↦ x ln⁺x and Jensen's inequality. So the sequence (E(Yn ln⁺Yn))n is non-decreasing. Be that as it may, it is clear now that (1 − e⁻¹)EX*n ≤ 1 + supk E(Yk ln⁺Yk), which implies (1.8) letting n → ∞.
Remark. If sup ||Xn||p < ∞, we say that X is bounded in Lp. Doob's inequalities point out that if p > 1 and X is bounded in Lp, then X* is in Lp. However, this doesn't hold for p = 1: if X is bounded in L¹, X* may not be in L¹. A counterexample is the martingale from Example 4, previous lesson. If we want X* to be in L¹, it means that we want X to be bounded in Lln⁺L, meaning the condition supn E(|Xn| ln⁺|Xn|) < ∞ from (1.8).
2. Almost sure convergence of semimartingales

Fix a < b and let τ1 < τ2 < … be the successive times at which the sequence X crosses the level a downward and the level b upward; the number of upcrossings of [a, b] is βa,b(ω) := sup{k : τ2k(ω) < ∞}, so that
(2.2) βa,b(ω) ≥ k ⇔ τ2k(ω) < ∞.
Lemma 2.1. The bounded sequence (Xn)n is convergent iff βa,b < ∞ a.s. for all a < b.
For a nonnegative supermartingale X one has
(2.3) P(βa,b ≥ k) ≤ (a/b)^k, 0 < a < b.
Proof. Let k be fixed and define the sequence Z of random variables as follows:
Zn(ω) = 1 if n < τ1(ω)
 = Xn(ω)/a if τ1(ω) ≤ n < τ2(ω) (notice that Xτ1(ω)/a ≤ 1!)
 = b/a if τ2(ω) ≤ n < τ3(ω) (notice that Xτ2(ω)/a ≥ b/a!)
 = (b/a)·Xn(ω)/a if τ3(ω) ≤ n < τ4(ω) (notice that (b/a)·Xτ3(ω)/a ≤ b/a!)
 = (b/a)² if τ4(ω) ≤ n < τ5(ω) (notice that (b/a)·Xτ4(ω)/a ≥ (b/a)²!)
 = …
 = (b/a)^(k−1)·Xn(ω)/a if τ2k−1(ω) ≤ n < τ2k(ω) (notice that (b/a)^(k−1)·Xτ2k−1(ω)/a ≤ (b/a)^(k−1)!)
 = (b/a)^k if τ2k(ω) ≤ n (notice that (b/a)^(k−1)·Xτ2k(ω)/a ≥ (b/a)^k!)
As X is a nonnegative supermartingale, the constants and the sequences Y(j)n = (b/a)^(j−1)·Xn/a are nonnegative supermartingales, and we took care that at each combining moment τj the jump be downward; it means that we can apply Proposition (1.1) with the result that Z is a non-negative supermartingale. Moreover, Zn ≥ (b/a)^k·1{τ2k ≤ n}. Therefore E((b/a)^k·1{τ2k ≤ n}) ≤ EZn ≤ EZ1 ≤ 1. We obtain the inequality P(τ2k ≤ n) ≤ (a/b)^k. Letting n → ∞, we get P(τ2k < ∞) ≤ (a/b)^k which, corroborated with (2.2),
gives us (2.3).
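Inequality (2.3) can be illustrated by exact enumeration on a toy nonnegative martingale of our choosing: the symmetric walk started at 2 and absorbed at 0, counting upcrossings of [a, b] = [1, 3] over a 12-step horizon.

```python
from itertools import product
from fractions import Fraction

def upcrossings(path, a, b):
    """Number of passages of `path` from a value <= a to a later value >= b."""
    count, below = 0, path[0] <= a
    for x in path[1:]:
        if below and x >= b:
            count, below = count + 1, False
        elif not below and x <= a:
            below = True
    return count

n, a, b = 12, 1, 3
hits = {}                      # k -> number of paths with beta_{a,b} >= k
for steps in product((-1, 1), repeat=n):
    x, path = 2, [2]
    for s in steps:
        if x > 0:              # absorption at 0 keeps the martingale nonnegative
            x += s
        path.append(x)
    for j in range(1, upcrossings(path, a, b) + 1):
        hits[j] = hits.get(j, 0) + 1
for k, cnt in hits.items():    # P(beta >= k) <= (a/b)^k, here (1/3)^k
    assert Fraction(cnt, 2 ** n) <= Fraction(a, b) ** k
```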
Corollary 2.3. Any non-negative supermartingale X converges a.s. to a random variable X∞ such that E(X∞|Fn) ≤ Xn. In words, we can add to X its tail X∞ and it remains a supermartingale.
Proposition 2.4. Let X be a submartingale with the property that supn E(Xn)⁺ < ∞. Then Xn converges a.s. to some X∞ ∈ L¹.
Proof. Let Yn = (Xn)⁺. As x ↦ x⁺ is convex and non-decreasing, Y is another submartingale. Let Zp = E(Yp|Fn), p ≥ n. Then Zp+1 = E(Yp+1|Fn) = E(E(Yp+1|Fp)|Fn) ≥ E(Yp|Fn), hence (Zp)p≥n is nondecreasing. Let Mn = limp Zp.
We claim that (Mn)n is a non-negative martingale. First of all, EMn = E(limp Zp) = limp E(Zp) (Beppo Levi) = limp E(Yp) = supp E(Xp)⁺ < ∞ (as Y is a submartingale). Therefore Mn ∈ L¹. Next, E(Mn+1|Fn) = E(limp E(Yp|Fn+1)|Fn) = limp E(E(Yp|Fn+1)|Fn) (conditioned Beppo Levi!) = limp E(Yp|Fn) = Mn. Thus M is a martingale. Being non-negative, it has an a.s. limit, M∞, by Corollary 2.3.
Let Un = Mn − Xn. Then U is a supermartingale and Un ≥ 0 (clearly, since Un = limp E(Yp|Fn) − Xn = limp E(Yp − Xn|Fn) = limp E((Xp)⁺ − Xn|Fn) ≥ limp E(Xp − Xn|Fn) ≥ 0; keep in mind that X is a submartingale!).
By Corollary 2.3, U has an a.s. limit, too; denote it by U∞. It follows that X = M − U is a difference between two convergent sequences. As both M∞ and U∞ are finite, the meaning is that X has a limit itself, X∞ ∈ L¹.
random walks.
Corollary 2.6. Let ξ = (ξn)n be i.i.d. r.v.'s from L∞. Let Sn = ξ1 + … + ξn, S0 = 0, and let m = Eξ1. Let a ≥ 0 and τ = τa be the hitting time of (a, ∞), that is, τ = inf{n : Sn > a}. Suppose that the ξn are not constants.
Then m ≥ 0 ⇒ τ < ∞ (a.s.).
The same holds for the hitting time of the interval (−∞, a).
Proof. If m > 0, it is simple: the sequence Sn converges a.s. to ∞ due to the LLN (Sn/n → m > 0 ⇒ Sn → ∞!). The problem is if m = 0. In that case let Xn = a − Sn. Then X is a martingale and EXn = a. If a < 0, τ = 0 and there is nothing to prove. So we shall suppose that a ≥ 0. In this case X0 = a ≥ 0 and
(2.4) τ = inf{n : Xn < 0}.
Here is how we shall use the boundedness of the steps ξn: let M = ||ξ1||∞, so that |ξn| ≤ M a.s.
The stopping theorem tells us that Y = (Xτ∧n)n is another martingale, since every bounded stopping time (we mean τ∧n!) is regular. But Yn ≥ −M, since for n ≥ τ, Yn = Xτ and, from (2.4), Xτ−1 ≥ 0 while |Xτ − Xτ−1| = |ξτ| ≤ M, so Xτ ≥ 0 − M = −M. So Yn + M is another martingale, this time nonnegative. By Corollary 2.5, Yn + M should converge a.s. Subtracting M, it follows that Yn → f for some f ∈ L¹. So Xτ∧n → f ⇒ a − Sτ∧n → f ⇒ Sτ∧n → a − f. Let E = {τ = ∞}. If ω ∈ E, then a − f(ω) = lim Sn(ω), meaning that Sn(ω) is convergent.
Well, the sequence Sn diverges a.s. Here is why: if (Sn)n were convergent, it would be Cauchy; thus |Sn+k − Sn| < ε for all great n and all k. Hence |Sn+k − Sn| < ε, |Sn+2k − Sn+k| < ε, |Sn+3k − Sn+2k| < ε, …. But if the ξn are not constants, there exist k and ε such that P(|Sn+k − Sn| < ε) = q < 1. Then, as the above differences are i.i.d., P(|Sn+k − Sn| < ε, |Sn+2k − Sn+k| < ε, |Sn+3k − Sn+2k| < ε, …) = q·q·q·… = 0. So P({(Sn(ω))n is Cauchy}) = 0.
The only conclusion is that P(E) = 0.
3. Uniform integrability and the convergence of semimartingales in L1

We want to establish conditions under which a martingale X converges to X∞ in L¹. In that case we shall call X a martingale with tail.
Proposition 3.1. If X is a martingale and Xn → X∞ in L¹, then Xn = E(X∞|Fn).
Proof. From the definition of the conditioned expectation we see that the claim is that E(Xn1A) = E(X∞1A) for any A ∈ Fn. But Xn+k → X∞ in L¹ as k → ∞ ⇒ E(Xn+k1A) → E(X∞1A) as k → ∞. And E(Xn+k1A) = E(E(Xn+k1A|Fn)) = E(1A E(Xn+k|Fn)) = E(1A Xn).
Proposition 3.2. Conversely, if Xn = E(f|Fn), then Xn → E(f|F∞) both a.s. and in L¹.
Proof. Let Z = E(f|F∞).
Suppose first that f ≥ 0. Then Xn is a nonnegative martingale. According to Corollary 2.3, X converges a.s. to some X∞ from L¹.
Step 1. If f is even bounded, f ≤ M, then Xn ≤ M too; hence X∞ ≤ M ⇒ |X∞ − Xn| ≤ 2M. By Lebesgue's domination criterion E|X∞ − Xn| → 0, thus Xn → X∞ in L¹. Moreover, if A ∈ Fn then E(Xn+k1A) → E(X∞1A), thus E(X∞1A) = limk E(E(Xn+k1A|Fn)) = limk E(1A E(Xn+k|Fn)) = E(1A Xn) (since X is a martingale!). It means that E(X∞|Fn) = Xn. But E(Z|Fn) = E(E(f|F∞)|Fn) = E(f|Fn) = Xn. Therefore Z and X∞ are both from L¹(F∞) and E(Z|Fn) = E(X∞|Fn) ∀n. As F∞ is generated by the union of all the Fn and that union is an algebra, it follows that Z = X∞. We proved the claim if f is bounded and nonnegative.
Step 2. If f ≥ 0, let fa = f∧a. Let a be great enough such that ||f − fa||1 < ε for a given arbitrary ε. Then ||E(f|F∞) − E(f|Fn)||1 ≤ ||E(f|F∞) − E(fa|F∞)||1 + ||E(fa|F∞) − E(fa|Fn)||1 + ||E(fa|Fn) − E(f|Fn)||1 ≤ ||f − fa||1 + ||E(fa|F∞) − E(fa|Fn)||1 + ||fa − f||1 (due to the contractivity of the conditioned expectation, see the lesson!) ≤ 2ε + ||E(fa|F∞) − E(fa|Fn)||1. According to Step 1, the second term converges to 0 (as fa is bounded and nonnegative). It follows that limsupn ||E(f|F∞) − E(f|Fn)||1 ≤ 2ε ⇒ E(f|Fn) → E(f|F∞) in L¹.
Step 3. f arbitrary. We write f = f⁺ − f⁻. Then E(f⁺|Fn) → E(f⁺|F∞) both a.s. and in L¹, and the same holds for E(f⁻|Fn) → E(f⁻|F∞). Subtracting the two relations, we infer that E(f|Fn) → E(f|F∞) both a.s. and in L¹.
Remark. The result of Propositions 3.1 and 3.2 is that even if all the martingales bounded in L¹ converge a.s., only the martingales of the form Xn = E(f|Fn) have a tail, that is, converge to their a.s. limit in L¹.
Definition. Let X = (Xn)n be a sequence of random variables from L¹. We say that X is uniformly integrable iff for any ε > 0 there exists an a = a(ε) such that E(|Xn|1{|Xn| > a}) < ε ∀n. Notice that one can write the condition from the definition also as E(|Xn − φa(Xn)|) < ε ∀n, where φa(x) = (x∧a)∨(−a), or as E(|Xn| − |Xn|∧a) < ε ∀n.
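For a single integrable variable the tail mass E(X1{X > a}) can always be pushed below any ε by taking a large; uniform integrability asks this to happen uniformly over the whole sequence. A quick numeric illustration with a geometric variable (our choice):

```python
def tail_mass(a, terms=4000):
    """E(X 1{X > a}) for X geometric on {1, 2, ...} with P(X = n) = 2^-n,
    so EX = 2 (truncated series; the tail beyond `terms` is negligible)."""
    return sum(n * 0.5 ** n for n in range(a + 1, terms))

masses = [tail_mass(a) for a in (1, 5, 10, 20)]
assert all(later < earlier for earlier, later in zip(masses, masses[1:]))
assert masses[-1] < 1e-3        # a = 20 already gives tail mass < 0.001
```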
Proposition 3.4. Suppose that Xn → X a.s. Then Xn → X in L¹ iff the sequence (Xn)n is uniformly integrable.
Proof. ⇒. Let ε > 0. Let a be such that ||X − φa(X)||1 < ε/3. Let n(ε) be such that n > n(ε) ⇒ ||X − Xn||1 < ε/3. Then n > n(ε) ⇒ ||Xn − φa(Xn)||1 ≤ ||Xn − X||1 + ||X − φa(X)||1 + ||φa(X) − φa(Xn)||1 ≤ ε/3 + ε/3 + ||Xn − X||1 ≤ 3ε/3 = ε (we used |φa(x) − φa(y)| ≤ |x − y|). For n ≤ n(ε) let bn > 0 be such that ||Xn − φbn(Xn)||1 < ε. Finally, let A = max{a, b1, b2, …, bn(ε)}. Then E(|Xn − φA(Xn)|) < ε ∀n.
⇐. Let ε > 0 and a as in the definition of uniform integrability; from Fatou we infer that X is in L¹, too, as E|X| = E(liminfn |Xn|) ≤ liminfn E(|Xn|) < ∞ (according to Proposition 3.3!). Let then a be chosen such that ||X − φa(X)||1 < ε and ||φa(Xn) − Xn||1 < ε ∀n. Then ||X − Xn||1 ≤ ||X − φa(X)||1 + ||φa(X) − φa(Xn)||1 + ||φa(Xn) − Xn||1 = I + II + III. The first term is less than ε; the last one is less than ε; as about the term II, Xn → X a.s. ⇒ φa(Xn) → φa(X), since φa is continuous. But the sequence (φa(Xn))n is dominated by a, therefore ||φa(X) − φa(Xn)||1 → 0 as n → ∞ by Lebesgue's domination principle.
The conclusion is that limsupn ||X − Xn||1 ≤ 2ε. And ε is arbitrary.
Corroborating with Propositions 3.1 and 3.2, we arrive at the following conclusion: a martingale converges a.s. and in L¹ (i.e. has a tail) iff it is uniformly integrable.
There is a classical criterion for uniform integrability: X is uniformly integrable iff supn Eφ(|Xn|) < ∞ for some non-decreasing function φ with φ(x)/x → ∞ as x → ∞; moreover, φ can be chosen convex.
Proof. ⇒. We shall first establish an auxiliary result:
Lemma. Let (α(m))m≥0 be non-decreasing and nonnegative with limm α(m) = ∞, and put
φ(t) := ∫0t ( Σm≥0 α(m)1[m,m+1)(s) ) ds.
Then
(3.2) φ is non-decreasing and convex;
(3.3) limx→∞ φ(x)/x = ∞;
(3.4) Eφ(Y) ≤ Σm≥0 α(m)P(Y ≥ m) for every nonnegative random variable Y.
Proof of the Lemma. As the sequence (α(m))m is non-decreasing and nonnegative, the integrand is a non-decreasing step function, so φ is non-decreasing and convex. Moreover limx φ(x)/x = limm φ(m+1)/(m+1) (here m is an integer!) = limm (α(0) + α(1) + … + α(m))/(m+1) = limm α(m) (by Stolz-Cesàro!) = ∞. Finally,
Eφ(Y) = Σm≥0 E(φ(Y)1{m ≤ Y < m+1}) ≤ Σm≥0 E(φ(m+1)1{m ≤ Y < m+1}) (as φ is non-decreasing) = Σm≥0 φ(m+1)(P(Y ≥ m) − P(Y ≥ m+1)) = Σm≥0 (φ(m+1) − φ(m))P(Y ≥ m) = Σm≥0 α(m)P(Y ≥ m)
(since φ(0) = 0 and ∫m^(m+1) φ′(t)dt = α(m)).
Now suppose X is uniformly integrable and choose integers 1 ≤ a1 < a2 < … such that E(Y1{Y ≥ an}) ≤ 2⁻ⁿ for every Y = |Xk|. Then
Σm≥an mP(m ≤ Y < m+1) = anP(an ≤ Y < an+1) + (an+1)P(an+1 ≤ Y < an+2) + (an+2)P(an+2 ≤ Y < an+3) + … = anP(Y ≥ an) + P(Y ≥ an+1) + P(Y ≥ an+2) + … ≥ Σm≥an P(Y ≥ m) (since an ≥ 1!), or
(3.5) Σm≥an P(Y ≥ m) ≤ E(Y1{Y ≥ an}) ≤ 2⁻ⁿ.
Let now α(m) := #{n ≥ 1 : an ≤ m}; it is non-decreasing and tends to ∞. Then Σm≥1 α(m)P(Y ≥ m) = Σn≥1 Σm≥an P(Y ≥ m) ≤ Σn≥1 2⁻ⁿ = 1, so by the Lemma supk Eφ(|Xk|) ≤ 1 for a convex φ with φ(x)/x → ∞.
n 1
( y ) A
( y )
y
. We can find
y
A
such a t because of the property (t)/t as t , which we assumed.
(Y )
1{Y t } )
Let then Y be one of the random variables Xk. Then E(Y1{Y t}) E(
A
(Y )
E(
) =
E(Y)
A = .
A
A
A
Corollary 3.8. If a martingale X is bounded in Lp (p > 1) or in Lln⁺L, then it is uniformly integrable. Bounded in Lln⁺L means that sup{E(|Xn|ln⁺|Xn|)} < ∞. In this case one applies the criterion above with φ(x) = x^p, respectively φ(x) = x ln⁺x.
4. Singular martingales. Exponential martingales.
As a consequence, EXτ = 1.
Proof. This stopping time is finite a.s. by Corollary 2.7. It means that Xτ∧n → Xτ (a.s.). But notice that Sτ∧n ≤ a. Thus, if t > 0, Xτ∧n = e^{tSτ∧n − (τ∧n)φ(t)} ≤ e^{ta} (since φ(t) = log Ee^{tξ1} ≥ log e^{tEξ1} (by Jensen!) = tEξ1 ≥ 0!), so we can apply Lebesgue's domination criterion to infer that Xτ∧n → Xτ in L¹, too.
There is a case when this fact is enough to find the distribution of τ = τa. Suppose that P(ξn = 1) = p and P(ξn = −1) = q. This is the simplest random walk: the probability of a step to the right is p and the probability of a step to the left is q = 1 − p. Suppose a is a positive integer. Then Sτ = a. As the above proposition tells us that E e^{tSτ − τφ(t)} = 1, it means that E e^{ta − τφ(t)} = 1 ∀t ≥ 0 ⇔ E e^{−τφ(t)} = e^{−at} ∀t ≥ 0. Let us denote φ(t) by u ≥ 0. The function φ(t) becomes in our case φ(t) = ln(pe^t + qe^{−t}) = u, hence
(4.2) pe^t + qe^{−t} = e^u.
The idea is to find the positive t = θ(u) from the equation (4.2) in order to find the Laplace transform of τ,
(4.3) L(u) = Ee^{−uτ} = e^{−aθ(u)}.
A bit of calculus points out that
(4.4) t = θ(u) = ln((e^u + √(e^{2u} − 4pq))/(2p)),
which, replaced in (4.3), gives us
(4.5) L(u) = ((e^u + √(e^{2u} − 4pq))/(2p))^{−a} = ((e^u − √(e^{2u} − 4pq))/(2q))^a.
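That θ(u) from (4.4) really solves (4.2), and is nonnegative for u ≥ 0, can be confirmed numerically; p = 0.6 below is an arbitrary choice of ours.

```python
import math

p = 0.6
q = 1 - p
for u in (0.0, 0.1, 0.5, 1.0, 2.0):
    # theta(u) from (4.4): the larger root of p e^{2t} - e^u e^t + q = 0
    t = math.log((math.exp(u) + math.sqrt(math.exp(2 * u) - 4 * p * q)) / (2 * p))
    # check (4.2): p e^t + q e^-t = e^u
    assert abs(p * math.exp(t) + q * math.exp(-t) - math.exp(u)) < 1e-9
    assert t >= -1e-12          # the chosen root is the nonnegative one
```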
Remark that the Laplace transform is the ath power of another Laplace transform, which means that τ is a sum of a i.i.d. random variables (its distribution is an a-fold convolution). That should not be very surprising: in order to reach the level a, the random walk S must successively reach the levels 1, 2, …, a−1!
If one expands (4.5) in series, one discovers the moments of τ. In order to find the distribution of τ it is more convenient to deal with the generating function g(x) = Ex^τ. We want x to be in [0,1]; we can arrange that by replacing e^{−u} with x (since u ≥ 0 ⇔ 0 < x ≤ 1!). Then we obtain
(4.5′) g(x) = ((1 − √(1 − 4pqx²))/(2qx))^a.
Recall the expansion
(4.6) 1 − √(1 − x) = Σn≥1 C(2n−1, n)/((2n−1)·2^{2n−1}) x^n = x/2 + x²/8 + x³/16 + 5x⁴/128 + 7x⁵/256 + …
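The coefficients in (4.6), C(2n−1, n)/((2n−1)2^{2n−1}), match the listed values, and the partial sums do converge to 1 − √(1−x); a small check of ours:

```python
import math
from fractions import Fraction

def series_coeff(n):
    """n-th coefficient in (4.6): C(2n-1, n) / ((2n-1) 2^(2n-1))."""
    return Fraction(math.comb(2 * n - 1, n), (2 * n - 1) * 2 ** (2 * n - 1))

assert [series_coeff(n) for n in range(1, 6)] == [
    Fraction(1, 2), Fraction(1, 8), Fraction(1, 16),
    Fraction(5, 128), Fraction(7, 256)]

x = 0.5                      # inside the radius of convergence
partial = sum(float(series_coeff(n)) * x ** n for n in range(1, 60))
assert abs(partial - (1 - math.sqrt(1 - x))) < 1e-9
```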
Replacing x by 4pqx² in the expansion (4.6) and dividing by 2qx, one gets
(4.7) g(x) = ( Σn≥1 C(2n−1, n)/(2n−1) · p^n q^{n−1} x^{2n−1} )^a = (px + p²qx³ + 2p³q²x⁵ + 5p⁴q³x⁷ + 14p⁵q⁴x⁹ + 42p⁶q⁵x¹¹ + …)^a.
In particular, for a = 1 we can read off the distribution of the first hitting time of level 1:
(4.8) P(τ0→1 = 2n−1) = C(2n−1, n)/(2n−1) · p^n q^{n−1}, n ≥ 1.
For p = q = 1/2 this becomes P(τ0→1 = 2n−1) = C(2n−1, n)/((2n−1)·2^{2n−1}).
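Formula (4.8) can be sanity-checked: the first two terms match direct path counts, and for p > 1/2 the probabilities sum to 1 (the level is reached a.s.); p = 0.6 below is an arbitrary choice of ours.

```python
import math

def hit_law(n, p):
    """(4.8): P(tau_{0->1} = 2n-1) = C(2n-1, n) p^n q^(n-1) / (2n-1)."""
    q = 1 - p
    return math.comb(2 * n - 1, n) * p ** n * q ** (n - 1) / (2 * n - 1)

p = 0.6
assert abs(hit_law(1, p) - p) < 1e-12                 # the single one-step path
assert abs(hit_law(2, p) - p * p * (1 - p)) < 1e-12   # the path down-up-up
assert abs(sum(hit_law(n, p) for n in range(1, 400)) - 1.0) < 1e-6
```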
Remark. Notice that for p > 1/2, Eτa = a/(2p−1) < ∞, while for p = 1/2, Eτa = ∞.
Bibliography:
1. P. Billingsley: Probability and Measure, Wiley and Sons, New York, 1979
2. L.Breiman: Probability, Addison-Wesley, Reading, 1968
3. W. B. Davenport, Jr. and W. L. Root, An Introduction to the Theory of Random Signals
and Noise. New York, NY, USA: McGraw-Hill Inc., 1958.
4. W. B. Davenport, Jr., Probability and Random Processes: An Introduction for Applied
Scientists and Engineers. New York, NY, USA: McGraw-Hill Inc., 1970.
5. C. Dellacherie, P.-A. Meyer: Probabilités et Potentiel, Vol. 2, Hermann, Paris, 1980
6. J. L. Doob, Stochastic Processes. New York, NY, USA: John Wiley & Sons Inc.,
1958.
7. W. Feller: An Introduction to Probability Theory and Its Applications, Vol. I & II, Wiley, 1966.
8. J. E. Freund, Mathematical Statistics. Englewood Cliffs, NJ, USA: Prentice-Hall, Inc.,
16th printing, 1962.
9. J. E. Freund and G. A. Simon, Modern Elementary Statistics. Englewood Cliffs, NJ,
USA: Prentice-Hall, Inc., 8th ed., 1992.
10. W. A. Gardner, Introduction to Random Processes with Applications to Signals and
Systems. London, UK: Collier Macmillan Publishers, 1986.
11. Peter Galko, ELG 5119/92.519 Stochastic Processes Course Notes, Faculty of
Engineering, University of Ottawa, Ottawa, ON, Canada, Fall 1987.
12. W. A. Gardner, Introduction to Random Processes with Applications to Signals and
Systems. New York, NY, USA: McGraw-Hill Publishing Company, 2nd ed., 1990.
13. B. V. Gnedenko, Theory of Probability. New York, NY, USA: Chelsea Publishing
Co., 1962. Library of Congress Card No. 61-13496.
14. R. M. Gray and L. D. Davisson, Random Processes: A Mathematical Approach for
Engineers. Englewood Cliffs, NJ, USA: Prentice-Hall, Inc., 1986.
15. H. P. Hsu, Schaum's outline of theory and problems of probability, random variables,
and random processes. New York, NY, USA: McGraw-Hill Inc., 1997.
16. A. N. Kolmogorov, Foundations of the Theory of Probability. New York, NY, USA: Chelsea Publishing Co., English translation of the 1933 German edition, 2nd English ed., 1956.
17. Alberto Leon-Garcia, Probability and Random Processes for Electrical Engineering.
Reading, MA, USA: Addison Wesley Publishing Co. Inc., 2nd ed., 1994. ISBN 0-20150037-X.
18. Alberto Leon-Garcia, Student Solutions Manual: Probability and Random Processes
for Electrical Engineering. Reading, MA, USA: Addison-Wesley Publishing Co. Inc.,
2nd ed., 1994. ISBN 0-201-55738-X.
19. M. Loève, Probability Theory. Princeton, NJ, USA: D. Van Nostrand Co., Inc., 2nd ed., 1960.
20. M. Loève, Probability Theory, vol. I. New York, NY, USA: Springer, 4th ed., 1977.
21. M. Loève, Probability Theory, vol. II. New York, NY, USA: Springer, 4th ed., 1978.
22. I. Miller and J. E. Freund, Probability and Statistics for Engineers. Englewood Cliffs,
NJ, USA: Prentice-Hall, Inc., 2nd ed., 1977.
23. F. Mosteller, R. E. K. Rourke, and G. B. Thomas, Jr., Probability and Statistics. Reading, MA, USA: Addison-Wesley Publishing Company Inc., 1961.
24. I. P. Natanson, Theory of Functions of a Real Variable. New York, NY, USA:
Frederick Ungar Publishing Co., 1955.
25. J. Neveu: Martingales à temps discret, Masson, Paris, 1972