
Advanced Stochastic Processes.

David Gamarnik

LECTURE 2 Random variables and measurable functions. Strong Law of Large Numbers (SLLN). Scary stuff continued ...

Outline of Lecture
Random variables and measurable functions. Extension Theorem. Borel-Cantelli Lemma and SLLN

1.1. Random variables and measurable functions


Definition 1.1. Given two pairs (Ω₁, F₁), (Ω₂, F₂), each consisting of a sample space and a σ-field, a function X : Ω₁ → Ω₂ is defined to be measurable if for every A ∈ F₂ we have X⁻¹(A) ∈ F₁. When Ω₂ is the set of all reals R and F₂ is the Borel σ-field, the function X is called a random variable. This definition naturally extends to the case when Ω₂ = Rᵈ; in this case we call X a random vector. Also, since the set of integers is a subset of R, the definition of a random variable includes the case of integer-valued random variables.

Exercise 1. Suppose a function X : Ω → R is such that X⁻¹((−∞, x)) ∈ F for every real value x. Prove that X is a measurable function.

Note that we do not need a probability measure P on Ω₁ or Ω₂ in order to define measurable functions. A probability measure is needed only when we discuss probability distributions below.

Examples. (a) It is easy to give an example of a function which is not measurable. Suppose, for example, Ω₁ = Ω₂ = Ω and both consist of exactly 3 elements ω₁, ω₂, ω₃. Say F₁ is the trivial σ-field (which consists of only ∅ and Ω) and F₂ is the full σ-field consisting of all 8 subsets of Ω. Then the identity transformation X : Ω → Ω is not measurable: take any non-empty


set A ⊂ Ω with A ≠ Ω. Then A is measurable with respect to F₂, but X⁻¹(A) = A is not measurable with respect to F₁.

(b) (Figure.) Say Ω = [0, 1]² and X : Ω → R is defined by X(ω) = ω₁ + ω₂. We claim that X is a random variable when Ω is equipped with the Borel σ-field. Here is the proof. For every real value x consider the set A = {ω = (ω₁, ω₂) : ω₁ + ω₂ > x}. We will prove that A is measurable (belongs to the Borel σ-field of [0, 1]²). Taking the complement of A then proves that X is a random variable. Consider the countable set of pairs of rationals (r₁, r₂) such that r₁ + r₂ > x. For each of them find n = n(r₁, r₂), the smallest integer large enough that the rectangle

B(r₁, r₂; 1/n) = {(ω₁, ω₂) ∈ [0, 1]² : |ω₁ − r₁| ≤ 1/n, |ω₂ − r₂| ≤ 1/n}

lies entirely in A (this is possible by the strict inequality r₁ + r₂ > x). Observe that every pair (ω₁, ω₂) satisfying ω₁ + ω₂ > x lies in one of these rectangles. Thus A is the union ∪_{r₁,r₂} B(r₁, r₂; 1/n(r₁, r₂)) of the countable collection of such rectangles and therefore belongs to the Borel σ-field of [0, 1]².

(c) Say Ω = C[0, ∞) equipped with the Borel σ-field, and X : Ω → R is the function which maps every continuous function f(t) into max_{0≤t≤1} |f(t)|. Then X is a random variable on Ω. Indeed, for every x, X⁻¹((−∞, x]) is the set of all functions f such that max_{0≤t≤1} |f(t)| ≤ x. But this is exactly the set B(0, x, 1) used in Definition 1.5 of Lecture 1. The sets of this type generate the Borel σ-field, and in particular, belong to it. Thus X is measurable.

The concept of random variables naturally leads to the concept of a probability distribution.

Definition 1.2. (Figure.) Given a probability space (Ω, F, P) and a random variable X : Ω → R, the associated probability distribution is defined to be the function F : R → [0, 1] given by F(x) = P({ω : X(ω) ≤ x}). When F(x) is a differentiable function of x, its derivative f(x) = F′(x) is called the density function.

In other words, F(x) is the probability given to the set of all elementary outcomes which are mapped by X into a value at most x. It is probability distributions which are usually discussed in elementary probability classes. There, one usually defines a probability distribution as a function satisfying certain properties (for example, it should be non-decreasing and should converge to unity as x → ∞). Here these properties can be derived from the given definition of a probability distribution.

Proposition 1. F(x) is non-decreasing, non-negative, and lim_{x→−∞} F(x) = 0, lim_{x→+∞} F(x) = 1.

Proof. HW
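To make Definition 1.2 concrete, here is a small numerical sketch (an illustration added to these notes, not part of the original): it estimates the distribution function of X(ω) = ω₁ + ω₂ from Example (b) by Monte Carlo, drawing ω from the uniform probability measure on [0, 1]², and compares it with the exact value, which by the area interpretation is F(x) = x²/2 for 0 ≤ x ≤ 1 and F(x) = 1 − (2 − x)²/2 for 1 ≤ x ≤ 2.

```python
import random

def F_exact(x):
    # Distribution function of X = w1 + w2 under the uniform measure
    # on [0,1]^2: the area of the region {(w1, w2) : w1 + w2 <= x}.
    if x < 0:
        return 0.0
    if x <= 1:
        return x * x / 2
    if x <= 2:
        return 1 - (2 - x) ** 2 / 2
    return 1.0

def F_monte_carlo(x, n=100_000, seed=0):
    # Estimate P({w : X(w) <= x}) by sampling outcomes w uniformly.
    rng = random.Random(seed)
    hits = sum(rng.random() + rng.random() <= x for _ in range(n))
    return hits / n

for x in (0.25, 0.5, 1.0, 1.5):
    print(x, F_exact(x), F_monte_carlo(x))
```

Note that the estimates are non-decreasing in x and approach 1, consistent with Proposition 1.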

The concept of probability distributions allows one to perform probability-related calculations without appealing to the more abstract notion of a probability measure. This is not possible, however, when we discuss probability spaces like C[0, ∞).


Having defined random variables and associated probability distributions, we can define further expected values, moments, moment generating functions, etc., in a more formal way than is done in elementary probability classes. We do this only heuristically, highlighting the main ideas.

Definition 1.3. A random variable X : Ω → R is called simple if it takes only finitely many values x₁, x₂, . . . , xₘ. The expected value of a simple random variable X is defined to be the quantity

E[X] = Σ_{1≤i≤m} xᵢ P({ω : X(ω) = xᵢ}).
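For example (a hypothetical simple random variable, added here for illustration), if X takes the values 0, 1, 2 with probabilities 1/4, 1/2, 1/4, then E[X] = 0 · (1/4) + 1 · (1/2) + 2 · (1/4) = 1.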

What if X is not simple? How do we define its expected value? The idea is to approximate X by a sequence of simple random variables. For simplicity, assume that X takes values only in the interval [0, A] for some A > 0; that is, X : Ω → [0, A]. Now consider Xₙ(ω) = k/n if X(ω) ∈ ((k − 1)/n, k/n]. Then Xₙ is a simple random variable. It can be shown that the sequence of the corresponding expected values E[Xₙ] converges. Its limit is called the expected value E[X] of X. It is also sometimes written as ∫_Ω X(ω) dP(ω). This definition of expected value satisfies all the properties of expected values one studies in elementary probability courses, for example the fact E[X²] ≥ (E[X])², the Markov inequality, the Chebyshev inequality, Jensen's inequality, etc.
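A small numerical sketch of this approximation (an illustration under the added assumption that X is uniform on [0, 1], so A = 1 and E[X] = 1/2 is known exactly):

```python
# The simple random variable X_n takes the value k/n on the event
# {X in ((k-1)/n, k/n]}, which has probability exactly 1/n when X
# is uniform on [0, 1].
def expected_value_of_Xn(n):
    return sum((k / n) * (1 / n) for k in range(1, n + 1))

for n in (1, 10, 100, 1000):
    print(n, expected_value_of_Xn(n))  # (n+1)/(2n), converging to 0.5
```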

1.2. What's an i.i.d. sequence of random variables?


Now we can give a formal definition of a stochastic process, the principal notion for this course.

Definition 1.4. Let T be the set of all non-negative reals R₊ or integers Z₊. A stochastic process {Xₜ}_{t∈T} is a family of random variables Xₜ : Ω → R parametrized by T.

Remark. Note that a sample outcome ω corresponding to a stochastic process is a function X(ω) : T → R, and the sample space of the corresponding stochastic process is the space of functions from T into R. But often we consider restrictions. For example, when T = [0, ∞) we might consider only continuous functions from [0, ∞) into R.

Example. Set Ω = C[0, ∞) equipped with the Borel σ-field. Define Xₜ(ω) = ω(t) for every sample ω ∈ C[0, ∞). Then {Xₜ}_{t∈[0,∞)} is a stochastic process. This is true because each function Xₜ : C[0, ∞) → R is a random variable (we will prove this later in the course).

Remark. The definition naturally extends to the case when observations are functions T → Rᵈ into d-dimensional Euclidean space.

One of the simplest (to analyze, but not to define) examples of a stochastic process is an i.i.d. (independent, identically distributed) stochastic process. What is an i.i.d. stochastic process? In probability courses it was common to say "X₁, X₂, . . . is an i.i.d. sequence of Bernoulli random variables with parameter 0 < p < 1", or "Z₁, Z₂, . . . is an i.i.d. sequence of Normal (Gaussian) random variables with expected value μ and variance σ²". What do we mean by this? How does it fit with (Ω, F, P)? We are almost equipped to answer this question, but need a little more technical machinery. The definition of a probability space includes defining a function P : F → [0, 1]. How can we define this function on the entire σ-field, when we sometimes cannot even describe the σ-field explicitly? The help comes from the Extension Theorem (ET). A rough idea is that if the σ-field is generated by


some collection of sets A, and we can define P on A only, then there is a unique extension of the function P onto the entire σ-field, provided some conditions are satisfied.

1.2.1. Extension Theorem

Theorem 1.5 (Extension Theorem). Given a sample space Ω, let A be a collection of subsets of Ω such that for every A ∈ A its complement Ω \ A is also in A, and for every finite sequence A₁, . . . , Aₘ ∈ A the union ∪_{1≤j≤m} Aⱼ is also in A. Suppose P : A → [0, 1] is such that
(a) P(Ω) = 1,
(b) P(∪_{j=1}^∞ Aⱼ) ≤ Σ_{j=1}^∞ P(Aⱼ), whenever ∪_{j=1}^∞ Aⱼ ∈ A,
(c) P(∪_{j=1}^∞ Aⱼ) = Σ_{j=1}^∞ P(Aⱼ), whenever ∪_{j=1}^∞ Aⱼ ∈ A and the Aᵢ, i = 1, 2, . . . are mutually exclusive.
Then the function P uniquely extends to a probability measure P : F(A) → [0, 1] defined on the σ-field generated by A.

Remark. Note that the requirement on A is to be a collection of sets with properties very similar to those of a σ-field. The only difference is that we do not require every infinite union of sets in A to be in A as well.

1.2.2. Examples and applications

Uniform probability measure. Consider Ω = [0, 1] and let A be the set of finite unions of open or closed non-intersecting intervals: [a₁, b₁) ∪ [a₂, b₂] ∪ · · · ∪ (aₘ, bₘ). It is easy to check that A satisfies the conditions of the ET. Consider the function P : A → [0, 1] which maps every such union of intervals to the value Σ_{1≤i≤m} (bᵢ − aᵢ) (that is, the total length of these intervals). It can be checked that this also satisfies the conditions of the ET (we skip the proof). Thus, by the ET, there exists a unique extension of the function P to a probability measure on the entire Borel σ-field, since this σ-field is generated by intervals. This probability measure is called the uniform probability measure on [0, 1].

Other types of continuous distributions. What about other distributions like Normal, Exponential, etc.? The proper definition of these probability measures is introduced similarly. For example, the standard Normal distribution is defined as the probability space (R, B, P), where B is the Borel σ-field on R and P assigns to each interval [a, b] the value ∫_a^b (1/√(2π)) e^{−t²/2} dt. Then each non-intersecting collection of intervals [aᵢ, bᵢ], 1 ≤ i ≤ m, is assigned the value which is the sum of the corresponding integrals. Again the set of finite unions of non-intersecting intervals satisfies the conditions of the ET, and applying the ET we obtain that the probability measure P is defined on the entire Borel σ-field B.

1.2.3. i.i.d. sequences

i.i.d. coin tosses. Let Ω = {0, 1}^∞. Recall that the product σ-field is the σ-field generated by cylinder-type sets A(ω̄). Let A be the set of finite unions of such sets ∪_{1≤j≤k} A(ω̄ⱼ). Again, it can be checked that A satisfies the conditions of the ET. For every finite sequence ω̄ = (ω₁, . . . , ωₘ) and the corresponding set A(ω̄) we set P(A(ω̄)) simply to be 1/2ᵐ (the probability of a particular sequence of 0/1 observations in the first m coin tosses is 1/2ᵐ). For example, the probability that the first four tosses are zeros is 1/2⁴. Then, for every union of k non-intersecting sets ∪_{1≤j≤k} A(ω̄ⱼ) we set their


corresponding value to k/2ᵐ (when each ω̄ⱼ has length m). The conditions of the ET again can be checked, but we skip the proof. Then, by the ET there is a unique extension of P to the entire product σ-field of Ω. This is what we call a sequence of i.i.d. unbiased coin tosses, also known as a sequence of i.i.d. Bernoulli random variables with parameter 1/2. The phrase "i.i.d.", in proper probabilistic terms, means the probability space (Ω, F, P) constructed above.

General i.i.d. type distributions. We have formally defined an i.i.d. Bernoulli sequence. What about general i.i.d. sequences? They are defined similarly, by considering infinite products and cylinder-type sets. First we set Ω = R^∞. On it we consider the product σ-field F. Define A to be the set of finite unions of cylinder-type sets. Recall that a cylinder set A is a set of the form A = [a₁, b₁] × (a₂, b₂) × · · · × [aₘ, bₘ) × R^∞, a product of closed, open, or half-closed/half-open intervals. Recall also that cylinder sets generate, by definition, the product σ-field F. Suppose we have a probability space (R, B, P) defined on R and its Borel σ-field B (for example, P corresponds to the standard Normal distribution). Then for every cylinder set A we define P(A) = ∏_{1≤j≤m} P([aⱼ, bⱼ]). Again we check that A and P satisfy the conditions of the ET (we skip the proof). Thus there is a unique extension of P to the entire product σ-field F of R^∞, since A generates this σ-field. Then we define Xₘ(ω) = ωₘ for every ω ∈ R^∞. We note that Xₘ is a random variable, as it is a measurable function from R^∞ into R. The sequence X₁, X₂, . . . is a stochastic process which we call an i.i.d. sequence of random variables. Essentially we have embedded a sequence of random variables {Xₘ} into a single probability space (R^∞, F, P).

Is this definition consistent with the elementary definition of i.i.d.? Recall that the elementary definition of an i.i.d. sequence is P(X₁ ≤ x₁, . . . , Xₘ ≤ xₘ) = ∏_{1≤j≤m} P(Xⱼ ≤ xⱼ). Is this true in our case? Note

P(X₁ ≤ x₁, . . . , Xₘ ≤ xₘ) = P{ω ∈ R^∞ : ω₁ ∈ (−∞, x₁], . . . , ωₘ ∈ (−∞, xₘ]}
  = P((−∞, x₁] × · · · × (−∞, xₘ] × R^∞)
  = ∏_{1≤j≤m} P((−∞, xⱼ]),

where the last equality follows from how we defined P on cylinder sets. But the product of these probabilities is exactly ∏_{1≤j≤m} P(Xⱼ ≤ xⱼ). Thus the identity checks.
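A quick numerical sanity check of this consistency (an illustrative sketch added here, not part of the original notes, under the assumption that P is the standard Normal measure): draw many samples of (X₁, X₂, X₃) as i.i.d. standard Normals and compare the empirical joint probability P(X₁ ≤ x₁, X₂ ≤ x₂, X₃ ≤ x₃) with the product of the marginals ∏ⱼ Φ(xⱼ).

```python
import math
import random

def Phi(x):
    # Standard Normal distribution function via the error function.
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

rng = random.Random(0)
xs = (0.5, -0.3, 1.2)  # arbitrary thresholds x1, x2, x3
n = 200_000

hits = 0
for _ in range(n):
    sample = [rng.gauss(0.0, 1.0) for _ in xs]  # one draw of (X1, X2, X3)
    if all(s <= x for s, x in zip(sample, xs)):
        hits += 1

empirical = hits / n
product = math.prod(Phi(x) for x in xs)
print(empirical, product)  # the two numbers should nearly agree
```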

1.3. Borel-Cantelli Lemma and Strong Law of Large Numbers (SLLN)


The Strong Law of Large Numbers (SLLN), like the Central Limit Theorem, is one of the most fundamental theorems in probability theory. Yet properly stating it, let alone proving it, is not as straightforward as, for example, the Weak Law of Large Numbers (WLLN). We now use the (Ω, F, P) framework to properly state and prove the SLLN.

We begin with a very useful tool, the Borel-Cantelli Lemma. Given a sample space Ω, a σ-field F, and an infinite sequence A₁, A₂, . . . , Aₘ, . . . ∈ F, define A_{i.o.} ("i.o." stands for "infinitely often") to be the set of all ω which belong to infinitely many of the Aₘ's. One can write A_{i.o.} as (check


the validity of this identity)

A_{i.o.} = ∩_{m≥1} ∪_{j≥m} Aⱼ.

Lemma 1.6 (Borel-Cantelli Lemma). Given a probability space (Ω, F, P) and an infinite sequence of events Aₘ, m ≥ 1, suppose Σₘ P(Aₘ) < ∞. Then P(A_{i.o.}) = 0. In words: the probability that the events Aₘ happen infinitely often is equal to zero.

Proof. Define Bₘ = ∪_{j≥m} Aⱼ. Then A_{i.o.} = ∩ₘ Bₘ, and B₁ ⊇ B₂ ⊇ B₃ ⊇ · · ·. Using Proposition 4, part (b) (applied to the complement sets), we obtain P(A_{i.o.}) = limₘ P(Bₘ). But since Σₘ P(Aₘ) < ∞, the tail parts of the sum satisfy limₘ Σ_{j≥m} P(Aⱼ) = 0. Moreover, P(Bₘ) = P(∪_{j≥m} Aⱼ) ≤ Σ_{j≥m} P(Aⱼ). Therefore limₘ P(Bₘ) = 0. We conclude P(A_{i.o.}) = 0.

Theorem 1.7 (SLLN). Consider an i.i.d. sequence of random variables Xₙ, n = 1, 2, . . . corresponding to some probability measure (R, B, P). Suppose E[|X₁|] < ∞. Then almost surely

lim_{n→∞} (Σ_{1≤i≤n} Xᵢ)/n = E[X₁].

Formally, define

A = {ω ∈ R^∞ : lim_{n→∞} (Σ_{1≤i≤n} Xᵢ(ω))/n = E[X₁]}.

Then P(A) = 1.

Proof. The proof of this fundamental result in probability theory is complicated (see for example [2]). Here, for simplicity, we consider the special case when the random variable X₁ has a finite fourth moment. Namely,

E[|X₁|⁴] = ∫ |X₁(ω)|⁴ dP(ω) < ∞.

Let us center the random variables Xᵢ in the following way: Yᵢ = Xᵢ − E[Xᵢ]. Then the Yᵢ have zero expected value. Since

(Σ_{1≤i≤n} Xᵢ)/n = (Σ_{1≤i≤n} Yᵢ)/n + E[X₁],

it suffices to prove that (Σ_{1≤i≤n} Yᵢ)/n converges almost surely to zero. Fix ε > 0 and define the event

Aₙ(ε) = {ω ∈ R^∞ : |(Σ_{1≤i≤n} Yᵢ(ω))/n| > ε}.

Applying the Markov inequality,

P(|(Σ_{1≤i≤n} Yᵢ)/n| > ε) = P((Σ_{1≤i≤n} Yᵢ)⁴ > n⁴ε⁴) ≤ E[(Σ_{1≤i≤n} Yᵢ)⁴]/(n⁴ε⁴).

When we expand E[(Σ_{1≤i≤n} Yᵢ)⁴] we note that only the terms of the form E[Yᵢ⁴] and E[Yᵢ²Yⱼ²] are non-zero, since the expected value of Yᵢ is zero and the sequence is i.i.d. Also, by independence, E[Yᵢ²Yⱼ²] = (E[Y₁²])². Counting the terms (there are n terms of the form E[Yᵢ⁴], and 3n(n − 1) terms of the form E[Yᵢ²Yⱼ²] with i ≠ j, since the multinomial coefficient 4!/(2!2!) = 6 multiplies each of the n(n − 1)/2 pairs), we obtain the bound

(nE[Y₁⁴] + 3n(n − 1)(E[Y₁²])²)/(n⁴ε⁴) ≤ 3n²(E[Y₁⁴] + (E[Y₁²])²)/(n⁴ε⁴) = 3(E[Y₁⁴] + (E[Y₁²])²)/(n²ε⁴).

This expression is finite by our assumption of finiteness of the fourth moment. Since the sum

Σ_{n≥1} 3(E[Y₁⁴] + (E[Y₁²])²)/(n²ε⁴) < ∞,

applying the Borel-Cantelli Lemma we conclude that the probability that Aₙ(ε) happens for infinitely many n is zero. In other words, for almost every ω ∈ R^∞ there exists n₀ = n₀(ω, ε) such that for all n > n₀ we must have |(Σ_{1≤i≤n} Yᵢ(ω))/n| ≤ ε. Taking a countable sequence ε = 1/k, k = 1, 2, . . ., and intersecting the corresponding almost sure events, we conclude that for almost every ω

lim_{n→∞} (Σ_{1≤i≤n} Yᵢ(ω))/n = 0.

This concludes the proof.
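A short simulation of the theorem (an illustrative sketch added here, not part of the original notes, assuming i.i.d. Exponential random variables with mean 1): along a single sample path, the running sample means should approach E[X₁] = 1.

```python
import random

rng = random.Random(1)
n = 100_000
running_sum = 0.0

# One sample path of the running means (1/n) * sum_{i<=n} X_i for
# i.i.d. Exponential(1) random variables, whose expected value is 1.
for i in range(1, n + 1):
    running_sum += rng.expovariate(1.0)
    if i in (10, 100, 1_000, 10_000, 100_000):
        print(i, running_sum / i)  # approaches E[X1] = 1
```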

1.4. Reading assignments


Notes "Modeling experiments", pages 1.4, 1.5, 2.2.
Grimmett and Stirzaker [2], Chapters 1 and 2; Chapter 7, Sections 7.3-7.5.
Durrett [1], Chapter 1, Sections 1-7.

BIBLIOGRAPHY
1. R. Durrett, Probability: theory and examples, Duxbury Press, second edition, 1996.
2. G. R. Grimmett and D. R. Stirzaker, Probability and random processes, Oxford Science Publications, 1985.
