
BST 401 Probability Theory

Xing Qiu Ha Youn Lee

Department of Biostatistics and Computational Biology


University of Rochester

October 12, 2009

Qiu, Lee BST 401


Outline

1 Convergence of Sequence of Measurable Functions



Random Variables Review
I’ll start with the simplest non-trivial probability space: $(\Omega_1, 2^{\Omega_1}, \mu_1)$, where $\Omega_1 = \{H, T\}$ and $\mu_1(\{H\}) = \mu_1(\{T\}) = \frac{1}{2}$.
I can define a sequence of random variables $X_1, X_2, \ldots$ on this probability space in this way:
$$X_n(\omega) = \begin{cases} 0, & \omega = H, \\ \frac{1}{n}, & \omega = T. \end{cases}$$

My point: different random variables are only different ways to assign numbers to outcomes. They change neither the whole space nor the probability measure.
Bernoulli random variable review: X ∼ Bernoulli(p) means
P(X = 1) = p and P(X = 0) = 1 − p. So it is more than
just a coin tossing distribution: it must send the two
possible outcomes to numbers 0 and 1.
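To make the point concrete, here is a minimal Python sketch of the fair-coin space and the sequence X_n defined above. The names `OMEGA`, `MU`, `X`, and `law` are my own illustrative choices, not notation from the course. The space and the measure are fixed once; only the assignment of numbers changes with n.

```python
from fractions import Fraction

# Fair-coin probability space (Omega_1, 2^Omega_1, mu_1) from the slide.
OMEGA = ("H", "T")
MU = {"H": Fraction(1, 2), "T": Fraction(1, 2)}


def X(n):
    """Return the random variable X_n as a plain function on OMEGA."""
    return lambda omega: Fraction(0) if omega == "H" else Fraction(1, n)


def law(rv):
    """Push mu_1 forward through rv: map each value to its probability."""
    dist = {}
    for omega in OMEGA:
        value = rv(omega)
        dist[value] = dist.get(value, Fraction(0)) + MU[omega]
    return dist


# OMEGA and MU never change; only the numbers attached to H and T do.
for n in (1, 2, 10):
    print(n, law(X(n)))
```

Each X_n puts mass 1/2 on 0 and mass 1/2 on 1/n, so none of them is a Bernoulli r.v. except X_1.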
Random Variables on the Same Space

Let Y be the “casino r.v.”: P(Y = 1) = q, P(Y = −1) = 1 − q. Y is not a Bernoulli r.v.!
The following examples show that X and Y can be defined on the same probability space: a) Y = 2X − 1 (q = p); b) Y = 1 − 2X (q = 1 − p); c) Y ≡ 1 (q = 1); d) Y ≡ −1 (q = 0).
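If $X_1 \sim \mathrm{Bernoulli}(\frac{1}{2})$ and $X_2 \sim \mathrm{Bernoulli}(\frac{1}{3})$, they cannot both be defined on the same two-point coin space! If the two outcomes have probabilities $r$ and $1 - r$, every event has probability $0$, $r$, $1 - r$, or $1$, and no single $r$ makes both $\frac{1}{2}$ and $\frac{1}{3}$ available.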
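One way to check the last claim by brute force, assuming “the same probability space” means the fair-coin space (Ω1, 2^Ω1, µ1) from the review slide: list every {0, 1}-valued function on {H, T} and record P(X = 1). The helper name `achievable_success_probs` is hypothetical.

```python
from fractions import Fraction
from itertools import product

# Assumption: the common space is the fair-coin space from the review slide.
OMEGA = ("H", "T")
MU = {"H": Fraction(1, 2), "T": Fraction(1, 2)}


def achievable_success_probs():
    """Collect P(X = 1) over every {0, 1}-valued function X on OMEGA."""
    probs = set()
    for values in product((0, 1), repeat=len(OMEGA)):
        X = dict(zip(OMEGA, values))
        probs.add(sum((MU[w] for w in OMEGA if X[w] == 1), Fraction(0)))
    return probs


probs = achievable_success_probs()
print(sorted(probs))
print(Fraction(1, 3) in probs)
```

Only 0, 1/2, and 1 show up, so a Bernoulli(1/3) r.v. does not live on this space. On a richer space, such as a product space, both variables can of course coexist.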



Random Variables and the Product Space (I)

Let $X_1, X_2$ be two separate[1] yet identical Bernoulli r.v.s defined on $\Omega_1$ and $\Omega_2$, where $\Omega_2$ is just a copy of $\Omega_1$.
My point is, though $\Omega_2$ is a copy of $\Omega_1$ and $X_2$ assigns the same numbers to the same outcomes, $X_1 \neq X_2$ because they can take different values.
In probability theory, $X_1 = X_2$ is very strict. It means that a) $X_1$ and $X_2$ are defined on the same probability space; b) $X_1(\omega) = X_2(\omega)$ for all $\omega \in \Omega$.
In the same spirit, $X_n \xrightarrow{a.s.} X^*$ means that a) the $X_n$ are defined on the same probability space; b) they converge to $X^*$ almost surely.

[1] I cannot use the word “independent” here because I haven’t defined it yet.
Random Variables and the Product Space (II)
The product space/measure is a way to connect the
otherwise separate probability spaces/random variables.
X1 on Ω1 , X2 on Ω2 . We may consider them as X̃1 and X̃2
on Ω1 × Ω2 in this way:
X̃1 : Ω1 × Ω2 → R, X̃1 (ω1 , ω2 ) = X1 (ω1 ).
X̃2 : Ω1 × Ω2 → R, X̃2 (ω1 , ω2 ) = X2 (ω2 ).
We can do this for an infinite sequence of r.v.s. Let
X1 , X2 , . . . be a sequence of r.v.s defined on separate
probability spaces Ω1 , Ω2 , . . .. The product space Ω∞
contains outcomes such as (H, H, T , H, T , T , . . .).

Y
X̃n : Ωn → R, X̃n (ω1 , ω2 , . . .) = Xn (ωn ).
n
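The two-coin case of this lifting can be sketched in a few lines of Python. This is only an illustration under my own assumptions: both coins are fair, heads is coded as 1 and tails as 0, and the names `X1_tilde`, `X2_tilde`, `OMEGA12`, and `MU12` are made up for the example.

```python
from fractions import Fraction
from itertools import product

OMEGA1 = OMEGA2 = ("H", "T")
MU1 = MU2 = {"H": Fraction(1, 2), "T": Fraction(1, 2)}


# Assumed coding (not fixed by the slides): heads -> 1, tails -> 0.
def X1(w1):
    return 1 if w1 == "H" else 0


def X2(w2):
    return 1 if w2 == "H" else 0


# Product space Omega_1 x Omega_2 with the product measure.
OMEGA12 = list(product(OMEGA1, OMEGA2))
MU12 = {(w1, w2): MU1[w1] * MU2[w2] for (w1, w2) in OMEGA12}


def X1_tilde(w):
    """Lift of X1: looks only at the first coordinate."""
    return X1(w[0])


def X2_tilde(w):
    """Lift of X2: looks only at the second coordinate."""
    return X2(w[1])


# The lifted variables share one domain, so they can be compared pointwise.
differ = [w for w in OMEGA12 if X1_tilde(w) != X2_tilde(w)]
p_equal = sum(MU12[w] for w in OMEGA12 if X1_tilde(w) == X2_tilde(w))
print(differ)
print(p_equal)
```

The lifted variables disagree on (H, T) and (T, H), so they are not equal as random variables even though X1 and X2 are given by the same formula.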



Random Variables and the Product Space (III)

The point: $\tilde{X}_n$ is just $X_n$ defined for the product space, so we don’t need to make any distinction in practice.
The $\tilde{X}_n$ are now defined on the same probability space, so they can be compared.
Apparently $X_n \neq X_m$ in general. There is only one exception: $X_n(\omega_n) = X_m(\omega_m) = \text{const.}$ for all $\omega_n \in \Omega_n$ and $\omega_m \in \Omega_m$. This turns out to be the case in the strong law of large numbers (SLLN), whose limit is a constant.
SLLN (without proof, just state the conclusion for a special case): Let $Z_n = \frac{1}{n} \sum_{i=1}^{n} X_i$. Demonstrate the behavior of $Z_n$ up to $n = 3$.
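The behavior of Z_n up to n = 3 can be tabulated by enumerating the product space directly. The sketch below assumes fair coins and the coding H -> 1, T -> 0, neither of which is fixed by the slide; `law_of_Z` is a hypothetical helper name.

```python
from fractions import Fraction
from itertools import product


def law_of_Z(n):
    """Distribution of Z_n = (X_1 + ... + X_n) / n under n fair coin flips."""
    dist = {}
    for omega in product("HT", repeat=n):
        # Assumed coding: X_i = 1 on heads, 0 on tails.
        z = Fraction(sum(1 for w in omega if w == "H"), n)
        dist[z] = dist.get(z, Fraction(0)) + Fraction(1, 2 ** n)
    return dist


for n in (1, 2, 3):
    print(n, sorted(law_of_Z(n).items()))
```

Already at n = 3 the mass starts to gather around 1/2: the values 1/3 and 2/3 carry probability 3/8 each, while the extremes 0 and 1 carry only 1/8 each.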



About the Homework

Back to homework #6, problem 1. It asks you to prove a.s. convergence. Without any handy theorems/tools, you must start from scratch, that is, prove that for almost every $\omega$, $X_1(\omega), X_2(\omega), \ldots$ converges as a sequence of real numbers.
Homework #7, problem 2. Limits are defined for $\omega \in \Omega^\infty$. You must use countably many set operations on rectangles (essentially finite-dimensional rectangles) to define those sets.



Convergence in measure/probability

$f_n \xrightarrow{\mu} f$ iff $\forall \epsilon > 0$, $\mu(\{\omega : |f_n(\omega) - f(\omega)| \ge \epsilon\}) \to 0$.
Convergence in measure says that the measure of “not-convergent” points shrinks to zero. Or in probability theory: the probability of seeing an “outlier” (those $\omega$ such that $|f_n(\omega) - f(\omega)| \ge \epsilon$) tends to zero.
It looks awfully like a.e. convergence, but it is not the same! Counterexample: shrinking but bouncy indicators.
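One standard way to build such indicators is the “typewriter” sequence on [0, 1) with Lebesgue measure. I am assuming this is the construction meant here. Write n = 2^k + j with 0 ≤ j < 2^k and let f_n be the indicator of [j/2^k, (j+1)/2^k). The function names below are illustrative only.

```python
from fractions import Fraction


def block(n):
    """Write n = 2**k + j with 0 <= j < 2**k and return (k, j)."""
    k = n.bit_length() - 1
    return k, n - 2 ** k


def f(n, omega):
    """Indicator of [j/2^k, (j+1)/2^k) evaluated at omega in [0, 1)."""
    k, j = block(n)
    return 1 if Fraction(j, 2 ** k) <= omega < Fraction(j + 1, 2 ** k) else 0


def measure_of_outliers(n):
    """Lebesgue measure of {omega : |f_n(omega) - 0| >= eps}, any 0 < eps <= 1."""
    k, _ = block(n)
    return Fraction(1, 2 ** k)


omega = Fraction(1, 3)
hits = [n for n in range(1, 64) if f(n, omega) == 1]
print([measure_of_outliers(n) for n in (1, 2, 4, 8, 16, 32)])
print(hits)
```

The outlier sets have measure 2^(-k), which goes to 0, so f_n converges to 0 in measure. Yet every fixed ω is hit once in each dyadic block, so f_n(ω) equals 1 infinitely often and the sequence converges at no point.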



Weak Convergence of Measures

All the modes of convergence we have defined so far are convergence of measurable functions/random variables w.r.t. a fixed probability measure.
In the laws of large numbers, we need convergence in probability (weak law) or even a.e. convergence (SLLN). But in the CLT, we are satisfied with knowing that the resulting distribution is normal; we don’t really care about pointwise convergence.
This leads us to consider a totally different kind of convergence: a convergence of distributions/measures instead of convergence of random variables.



Definition
Let $P_1, P_2, \ldots$ be probability measures on $\Omega$. $P_n \xrightarrow{w} P$ iff any one of the following equivalent conditions holds:
$F_n(x) \to F(x)$ for all continuity points $x$ of $F$ (including $\pm\infty$), where $F_n$ and $F$ are the distribution functions of $P_n$ and $P$. (Durrett book definition)
$P_n(A) \to P(A)$ for all continuity sets $A$ of $P$, which are sets such that $P(\partial A) = 0$.
$\int_\Omega f \, dP_n \to \int_\Omega f \, dP$ for all bounded, continuous functions $f$.
Several other criteria. See Thm 2.8.1 in Ash’s book.
We say a sequence of r.v.s $X_1, X_2, \ldots$ converges weakly (converges in distribution) to $X_\infty$ if the distribution functions $F_n$ associated with $X_n$ converge weakly to that of $X_\infty$.
This topic will be revisited in the CLT chapter.
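A small example of why the first criterion only asks for convergence at continuity points: take P_n to be the point mass at 1/n and P the point mass at 0. This pair is my own choice of illustration, not an example from the slides. The sketch checks the distribution functions numerically at a few points; a check at one large n is of course not a proof.

```python
def F_n(x, n):
    """Distribution function of the point mass at 1/n."""
    return 1.0 if x >= 1.0 / n else 0.0


def F(x):
    """Distribution function of the point mass at 0 (the candidate limit)."""
    return 1.0 if x >= 0.0 else 0.0


def settled_at(x, n_max=10_000):
    """Heuristic check: has F_n(x) reached F(x) by n = n_max?"""
    return F_n(x, n_max) == F(x)


print(settled_at(-0.5), settled_at(0.01), settled_at(0.0))
```

F_n(x) agrees with F(x) for large n at every x ≠ 0, but F_n(0) = 0 for every n while F(0) = 1. Since 0 is the only discontinuity point of F, this still counts as weak convergence.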



Relations Between Different Convergences

$L^p$ convergence implies convergence in measure.
If µ is a probability measure, a.e. convergence implies convergence in measure.
For finite measures (probabilities), $L^\infty$ convergence implies $L^p$ convergence; $L^p$ convergence implies $L^{p'}$ convergence, if $p > p'$ (a homework problem).
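The first relation is a one-line consequence of Markov's (Chebyshev's) inequality. A sketch, assuming $0 < p < \infty$ and $f_n \to f$ in $L^p$: for any fixed $\epsilon > 0$,

```latex
\mu\bigl(\{\omega : |f_n(\omega) - f(\omega)| \ge \epsilon\}\bigr)
  = \mu\bigl(\{\omega : |f_n(\omega) - f(\omega)|^p \ge \epsilon^p\}\bigr)
  \le \frac{1}{\epsilon^p} \int_\Omega |f_n - f|^p \, d\mu
  = \frac{\|f_n - f\|_p^p}{\epsilon^p}
  \longrightarrow 0 \quad (n \to \infty).
```

The bound does not need µ to be finite, which is why the first relation carries no assumption on µ.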
