
On the Central Limit Theorem

Bernd Losert
December 6, 2009
1 Preliminaries
Given a collection $\{(S_n, \Sigma_n, P_n) : n \in \mathbb{Z}^+\}$ of probability spaces, there exists a probability space $(S, \Sigma, P)$ and a sequence of independent random variables $(Y_n)$, where $Y_n : (S, \Sigma) \to (S_n, \Sigma_n)$, such that $P\{Y_n \in A\} = P_n(A)$ for all $A \in \Sigma_n$. If $X_n$ is a real random variable on $(S_n, \Sigma_n, P_n)$, then $W_n := X_n \circ Y_n$ is a real random variable on $(S, \Sigma, P)$, the sequence $(W_n)$ is independent, and $P\{W_n \in B\} = P_n\{X_n \in B\}$ for all Borel sets $B \subseteq \mathbb{R}$. The latter statement is equivalent to saying that the law $\mathcal{L}(W_n) = P \circ W_n^{-1}$ of $W_n$ and the law $\mathcal{L}(X_n) = P_n \circ X_n^{-1}$ of $X_n$ are equal as probability measures on $(\mathbb{R}, \mathcal{B})$, where $\mathcal{B}$ is the collection of Borel sets of $\mathbb{R}$. As a consequence,
$$\int_S f(W_n)\,dP = \int_{S_n} f(X_n)\,dP_n$$
for any Borel measurable $f : \mathbb{R} \to \mathbb{R}$ for which either side exists, so quantities such as the mean and variance of $X_n$ equal those of $W_n$. Therefore, without loss of generality, every real random variable mentioned below will be a real random variable on $(S, \Sigma, P)$.
2 Metrics that Metrize Weak Convergence
By $X_n \to X$ in distribution, we mean that $\mathcal{L}(X_n) \to \mathcal{L}(X)$ weakly, that is, $\int f\,d\mathcal{L}(X_n) \to \int f\,d\mathcal{L}(X)$, or equivalently $E(f(X_n)) \to E(f(X))$, for every bounded continuous function $f : \mathbb{R} \to \mathbb{R}$.
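For $f : \mathbb{R} \to \mathbb{R}$, define
$$\|f\|_{BL} = \sup\{|f(x)| : x \in \mathbb{R}\} + \sup\left\{\frac{|f(x) - f(y)|}{|x - y|} : x, y \in \mathbb{R} \text{ and } x \neq y\right\},$$
which we call the bounded Lipschitz norm. It is a norm on the set $\{f : \mathbb{R} \to \mathbb{R} : \|f\|_{BL} < \infty\}$.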
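As a rough numerical illustration of this definition, the Python sketch below estimates $\|f\|_{BL}$ for a function restricted to a finite interval. It is an approximation under stated assumptions, not part of the original text: the helper `bl_norm_estimate` is hypothetical, the supremum of $|f|$ is taken over a finite grid, and the Lipschitz constant is approximated by difference quotients of neighbouring grid points, so the result is a lower bound for the norm of $f$ on that interval.

```python
import numpy as np


def bl_norm_estimate(f, lo, hi, num=200_001):
    """Grid estimate of the bounded Lipschitz norm of f restricted to [lo, hi].

    sup|f| is replaced by the maximum over the grid, and the Lipschitz
    constant by the largest difference quotient between neighbouring grid
    points, so the result is a lower bound for the norm on [lo, hi].
    """
    x = np.linspace(lo, hi, num)
    y = f(x)
    sup_part = np.max(np.abs(y))
    lip_part = np.max(np.abs(np.diff(y)) / np.diff(x))
    return float(sup_part + lip_part)


# sin has sup|sin| = 1 and Lipschitz constant 1, so ||sin||_BL = 2.
print(bl_norm_estimate(np.sin, -10.0, 10.0))
```

For $f = \sin$ the estimate should come out very close to $2$; for a continuously differentiable $f$ the Lipschitz part approaches $\sup|f'|$ on the interval as the grid is refined.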
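For two probability measures $P$ and $Q$ on $(\mathbb{R}, \mathcal{B})$, let
$$\beta(P, Q) = \sup\left\{\left|\int f\,dP - \int f\,dQ\right| : f : \mathbb{R} \to \mathbb{R} \text{ and } \|f\|_{BL} \le 1\right\}$$
and let
$$d_k(P, Q) = \sup\left\{\left|\int f\,dP - \int f\,dQ\right| : f : \mathbb{R} \to \mathbb{R} \text{ and } f^{(i)} \text{ exists with } |f^{(i)}| \le 1 \text{ for } 1 \le i \le k\right\}.$$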
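Because $\beta$ and $d_k$ are suprema, any single admissible $f$ already gives a lower bound. The Python sketch below illustrates this for two discrete measures, a point mass at $0$ and a fair coin on $\{-1, 1\}$, both chosen purely as hypothetical examples; the helpers `integral` and `sup_lower_bound` are likewise hypothetical. It computes lower bounds only, not the metrics themselves.

```python
import numpy as np


def integral(f, atoms, weights):
    """Integral of f against the discrete measure sum_i weights[i] * delta_{atoms[i]}."""
    return float(np.dot(weights, f(np.asarray(atoms, dtype=float))))


def sup_lower_bound(test_funcs, P, Q):
    """max over test_funcs of |int f dP - int f dQ|.

    If every test function is admissible for beta (||f||_BL <= 1) or for
    d_k, the result is a lower bound for that metric.
    """
    return max(abs(integral(f, *P) - integral(f, *Q)) for f in test_funcs)


P = ([0.0], [1.0])              # point mass at 0
Q = ([-1.0, 1.0], [0.5, 0.5])   # fair coin on {-1, 1}

# Admissible for beta: sup|f| + Lip(f) = 1/2 + 1/2 = 1.
beta_funcs = [lambda x: np.sin(x) / 2, lambda x: np.cos(x) / 2]
# Admissible for d_3: first three derivatives bounded by 1 in absolute value.
d3_funcs = [np.sin, np.cos, lambda x: x]

print("beta >=", sup_lower_bound(beta_funcs, P, Q))
print("d_3  >=", sup_lower_bound(d3_funcs, P, Q))
```

With these choices the bounds work out to $\beta(P, Q) \ge (1 - \cos 1)/2$, coming from $\cos/2$, and $d_3(P, Q) \ge 1 - \cos 1$, coming from $\cos$.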
Theorem 2.1. The functions $\beta$ and $d_k$ are metrics on the set of probability measures on $(\mathbb{R}, \mathcal{B})$.
Theorem 2.2. $\beta\bigl(\mathcal{L}(X_n), \mathcal{L}(X)\bigr) \to 0$ if and only if $X_n \to X$ in distribution. The statement also holds if we replace $\beta$ with $d_k$.
Proof. We first prove that $\beta\bigl(\mathcal{L}(X_n), \mathcal{L}(X)\bigr) \to 0$ implies $X_n \to X$ in distribution. We will do this in four steps.
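Step 1. We claim that $\beta\bigl(\mathcal{L}(X_n), \mathcal{L}(X)\bigr) \to 0$ implies that the sequence $(\mathcal{L}(X_n))$ is uniformly tight, that is, for every $\varepsilon > 0$ there is an $M$ such that $P\{|X_n| > M\} < \varepsilon$ for all $n$. Suppose not. Then for some $\varepsilon > 0$, after passing to a subsequence and relabeling, there is a sequence $(M_n)$ with $M_n \ge 1$ that converges to $\infty$ such that $P\{|X_n| > M_n\} \ge \varepsilon$ for all $n$. Define $f_n : \mathbb{R} \to \mathbb{R}$ by
$$f_n(x) = \begin{cases} 1 & \text{if } x \in (-\infty, -M_n], \\ -x + 1 - M_n & \text{if } x \in [-M_n, -M_n + 1], \\ 0 & \text{if } x \in [-M_n + 1, M_n - 1], \\ x + 1 - M_n & \text{if } x \in [M_n - 1, M_n], \\ 1 & \text{if } x \in [M_n, \infty), \end{cases}$$
and let $g_n = f_n/2$. Note that $\|f_n\|_{BL} = 2$ and so $\|g_n\|_{BL} = 1$. Moreover,
$$E(f_n(X_n)) \ge E\bigl(f_n(X_n)\,1_{\{|X_n| > M_n\}}\bigr) = E\bigl(1_{\{|X_n| > M_n\}}\bigr) = P\{|X_n| > M_n\},$$
and so $E(g_n(X_n)) \ge \varepsilon/2$ for all $n$. We also have that
$$0 \le E(g_n(X)) \le E(f_n(X)) = E\bigl(f_n(X)\,1_{\{|X| > M_n - 1\}}\bigr) \le E\bigl(1_{\{|X| > M_n - 1\}}\bigr) = P\{|X| > M_n - 1\},$$
and since $M_n \to \infty$, we have that $P\{|X| > M_n - 1\} \to 0$ and consequently $E(g_n(X)) \to 0$. This means there is an $M$ such that $E(g_m(X)) < \varepsilon/4$ for all $m > M$. Since $\|g_m\|_{BL} = 1$ for every $m$, we have $|E(g_m(X_n)) - E(g_m(X))| \le \beta\bigl(\mathcal{L}(X_n), \mathcal{L}(X)\bigr)$ for all $m$ and $n$. Since $\beta\bigl(\mathcal{L}(X_n), \mathcal{L}(X)\bigr) \to 0$, there exists an $N$ such that $\beta\bigl(\mathcal{L}(X_n), \mathcal{L}(X)\bigr) < \varepsilon/4$ for all $n > N$, and hence
$$E(g_m(X_n)) < E(g_m(X)) + \varepsilon/4 < \varepsilon/2 \quad \text{for all } n > N \text{ and all } m > M.$$
Therefore, for all $n > \max\{M, N\}$ we have that $E(g_n(X_n)) < \varepsilon/2$, contradicting the fact that $E(g_n(X_n)) \ge \varepsilon/2$ for all $n$.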
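Step 2. Let $f : \mathbb{R} \to \mathbb{R}$ satisfy $\|f\|_{BL} \le 1$ and let $k \ge 0$. Since $\beta\bigl(\mathcal{L}(X_n), \mathcal{L}(X)\bigr) \to 0$, it follows that $E(f(X_n)) \to E(f(X))$, so
$$E\bigl(f(X_n)\,1_{\{|X_n| \le k\}}\bigr) \to E\bigl(f(X)\,1_{\{|X| \le k\}}\bigr).$$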
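Step 3. By the Stone-Weierstrass theorem, we can approximate any continuous $f : \mathbb{R} \to \mathbb{R}$ satisfying $\sup|f| \le 1$ uniformly on $[-k, k]$ by a polynomial $p : \mathbb{R} \to \mathbb{R}$. It thus suffices to prove that $E(p(X_n)) \to E(p(X))$ in order to prove that $E(f(X_n)) \to E(f(X))$.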
Step 4. Let $M$ and $M'$ be the suprema of $|p|$ and $|p'|$ on $[-k, k]$, respectively, and let $q = \frac{1}{M + M'}\,p$. Then $\|q\|_{BL} \le 1$ on $[-k, k]$ and $|E(q(X_n)) - E(q(X))| \to 0$, hence $E(p(X_n)) \to E(p(X))$.
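Stone-Weierstrass only guarantees that a suitable polynomial exists. As a concrete stand-in (a swapped-in technique, not the author's), the Python sketch below uses NumPy's Chebyshev interpolation to approximate a hypothetical bounded continuous function $f(x) = 1/(1 + x^2)$ on $[-k, k]$ with $k = 3$, and then performs the rescaling of Step 4. The suprema $M$ and $M'$ are only estimated on a finite grid.

```python
import numpy as np
from numpy.polynomial import Chebyshev

k = 3.0


def f(x):
    """Hypothetical bounded continuous function to approximate on [-k, k]."""
    return 1.0 / (1.0 + x**2)


# Stand-in for Stone-Weierstrass: Chebyshev interpolation of degree 80 on [-k, k].
p = Chebyshev.interpolate(f, 80, domain=[-k, k])

grid = np.linspace(-k, k, 20_001)
uniform_error = float(np.max(np.abs(p(grid) - f(grid))))

# Step 4: M = sup|p| and M' = sup|p'| on [-k, k], estimated on the grid.
M = float(np.max(np.abs(p(grid))))
M_prime = float(np.max(np.abs(p.deriv()(grid))))
q = p * (1.0 / (M + M_prime))

# By construction sup|q| + sup|q'| = 1 on the grid.
bl_q = float(np.max(np.abs(q(grid))) + np.max(np.abs(q.deriv()(grid))))
print(uniform_error, M, M_prime, bl_q)
```

On the grid, $\sup|q| + \sup|q'| = M/(M+M') + M'/(M+M') = 1$ up to rounding, which is the normalization Step 4 relies on; degree 80 was picked so that the uniform error is tiny for this particular $f$.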
Now suppose $X_n \to X$ in distribution. Let $f : \mathbb{R} \to \mathbb{R}$ be bounded and continuous, so that $E(f(X_n)) \to E(f(X))$. We will prove that $\beta\bigl(\mathcal{L}(X_n), \mathcal{L}(X)\bigr) \to 0$ in four steps.
Step 1. The sequence $(X_n)$ is uniformly tight.
Step 2. $P\{a \le X_n \le b\} \to P\{a \le X \le b\}$ for any $a, b \in \mathbb{R}$ with $P\{X = a\} = P\{X = b\} = 0$.
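Step 3. Let $\varepsilon, k > 0$ and partition $[-k, k]$ into $2k/\varepsilon$ intervals $I_i$ of length $\varepsilon$. Define $g : [-k, k] \to \mathbb{R}$ by $g(t) = \sum_i c_i 1_{I_i}(t)$ for some $c_i$ with $|c_i| \le 1$, and let $\mathcal{T}_\varepsilon$ be the collection of all such $g$. We claim that $|E(g(X_n)) - E(g(X))| \to 0$ for any $g \in \mathcal{T}_\varepsilon$. Note that
$$|E(g(X_n)) - E(g(X))| = \left|\sum_i c_i\bigl(P\{X_n \in I_i\} - P\{X \in I_i\}\bigr)\right| \le \sum_i |c_i|\,\bigl|P\{X_n \in I_i\} - P\{X \in I_i\}\bigr|.$$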
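To make the bound in Step 3 concrete, the Python sketch below takes a hypothetical sequence $X_n \sim N(1/n, 1)$, which converges in distribution to $X \sim N(0, 1)$, and a hypothetical $g \in \mathcal{T}_\varepsilon$ with alternating coefficients $c_i = \pm 1$, and computes both sides of the inequality exactly from the normal distribution function via `math.erf`. Taking the intervals half-open, $[e_i, e_{i+1})$, is an assumption; the text does not specify endpoints.

```python
import math


def normal_cdf(x, mean=0.0):
    """Distribution function of N(mean, 1)."""
    return 0.5 * (1.0 + math.erf((x - mean) / math.sqrt(2.0)))


def interval_probs(edges, mean):
    """P{Y in [e_i, e_{i+1})} for Y ~ N(mean, 1) and consecutive edges."""
    return [normal_cdf(b, mean) - normal_cdf(a, mean) for a, b in zip(edges, edges[1:])]


k, eps = 3.0, 0.5
m = round(2 * k / eps)                      # 2k/eps intervals of length eps
edges = [-k + i * eps for i in range(m + 1)]
c = [(-1) ** i for i in range(m)]           # hypothetical coefficients, |c_i| <= 1


def gap_and_bound(n):
    """Return |E g(X_n) - E g(X)| and sum_i |c_i| |P{X_n in I_i} - P{X in I_i}|
    for the hypothetical choice X_n ~ N(1/n, 1), X ~ N(0, 1)."""
    pn = interval_probs(edges, 1.0 / n)
    p = interval_probs(edges, 0.0)
    gap = abs(sum(ci * (a - b) for ci, a, b in zip(c, pn, p)))
    bound = sum(abs(ci) * abs(a - b) for ci, a, b in zip(c, pn, p))
    return gap, bound


for n in (1, 10, 100, 1000):
    print(n, gap_and_bound(n))
```

Both the gap and the bound shrink as $n$ grows, in line with the claim for this particular $g$.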
3 The Central Limit Theorem
To prove that $\frac{1}{\sqrt{n}}\sum_{i=1}^{n} X_i$ converges to some normal random variable $Z$ in distribution, we need only prove convergence with respect to $\beta$ or $d_k$, by the last theorem of the previous section. This is what we set out to do in the following theorem.
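Theorem 3.1. Let $Z \sim N(0, 1)$ and suppose $(X_n)$ is a sequence of independent random variables with $E(X_n) = 0$, $\mathrm{var}(X_n) = 1$ and $E(|X_m|^3) = E(|X_n|^3) < \infty$ for all $m, n$. Then
$$\lim_{n \to \infty} d_3\left(\mathcal{L}\left(\frac{1}{\sqrt{n}}\sum_{i=1}^{n} X_i\right), \mathcal{L}(Z)\right) = 0.$$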
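Proof. Let $\bar{X}_n = \frac{1}{\sqrt{n}}\sum_{i=1}^{n} X_i$, let $(Z_n)$ be a sequence of independent normal random variables with mean $0$ and variance $1$, independent of $(X_n)$, and let $\bar{Z}_n = \frac{1}{\sqrt{n}}\sum_{i=1}^{n} Z_i$. Then $Z$ and $\bar{Z}_n$ have the same distribution, and so $d_3(\mathcal{L}(\bar{X}_n), \mathcal{L}(Z)) = d_3(\mathcal{L}(\bar{X}_n), \mathcal{L}(\bar{Z}_n))$. Ergo, it suffices to prove that $d_3(\mathcal{L}(\bar{X}_n), \mathcal{L}(\bar{Z}_n)) \to 0$. This special property of normal random variables, namely that a normalized sum of independent standard normals is again standard normal, is the key to proving the central limit theorem.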
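For completeness, one standard way to confirm that $\bar{Z}_n$ has the same law as $Z$ (a sketch, not taken from the original argument) uses characteristic functions, the independence of the $Z_j$, and $E(e^{itZ_j}) = e^{-t^2/2}$:
$$E\bigl(e^{it\bar{Z}_n}\bigr) = \prod_{j=1}^{n} E\bigl(e^{i(t/\sqrt{n})Z_j}\bigr) = \prod_{j=1}^{n} e^{-t^2/(2n)} = e^{-t^2/2}, \qquad t \in \mathbb{R}.$$
Since a characteristic function determines the law, $\mathcal{L}(\bar{Z}_n) = \mathcal{L}(Z)$.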
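Let $f : \mathbb{R} \to \mathbb{R}$ be three times differentiable with $|f^{(i)}| \le 1$ for $i = 1, 2, 3$ and consider
$$\left|\int f\,d\mathcal{L}(\bar{X}_n) - \int f\,d\mathcal{L}(\bar{Z}_n)\right| = \bigl|E(f(\bar{X}_n)) - E(f(\bar{Z}_n))\bigr| = \bigl|E\bigl(f(\bar{X}_n) - f(\bar{Z}_n)\bigr)\bigr|.$$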
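Our goal is to show that the above goes to $0$ as $n \to \infty$. To do this, we start by using the Lindeberg trick. This trick consists of writing $f(\bar{X}_n) - f(\bar{Z}_n)$ as a sum of $n$ differences like so:
$$\begin{aligned}
& f\Bigl(\tfrac{X_1}{\sqrt{n}} + \tfrac{X_2}{\sqrt{n}} + \tfrac{X_3}{\sqrt{n}} + \cdots + \tfrac{X_n}{\sqrt{n}}\Bigr) - f\Bigl(\tfrac{Z_1}{\sqrt{n}} + \tfrac{X_2}{\sqrt{n}} + \tfrac{X_3}{\sqrt{n}} + \cdots + \tfrac{X_n}{\sqrt{n}}\Bigr) \\
+\; & f\Bigl(\tfrac{Z_1}{\sqrt{n}} + \tfrac{X_2}{\sqrt{n}} + \tfrac{X_3}{\sqrt{n}} + \cdots + \tfrac{X_n}{\sqrt{n}}\Bigr) - f\Bigl(\tfrac{Z_1}{\sqrt{n}} + \tfrac{Z_2}{\sqrt{n}} + \tfrac{X_3}{\sqrt{n}} + \cdots + \tfrac{X_n}{\sqrt{n}}\Bigr) \\
+\; & f\Bigl(\tfrac{Z_1}{\sqrt{n}} + \tfrac{Z_2}{\sqrt{n}} + \tfrac{X_3}{\sqrt{n}} + \cdots + \tfrac{X_n}{\sqrt{n}}\Bigr) - f\Bigl(\tfrac{Z_1}{\sqrt{n}} + \tfrac{Z_2}{\sqrt{n}} + \tfrac{Z_3}{\sqrt{n}} + \cdots + \tfrac{X_n}{\sqrt{n}}\Bigr) \\
+\; & \cdots + f\Bigl(\tfrac{Z_1}{\sqrt{n}} + \tfrac{Z_2}{\sqrt{n}} + \tfrac{Z_3}{\sqrt{n}} + \cdots + \tfrac{X_n}{\sqrt{n}}\Bigr) - f\Bigl(\tfrac{Z_1}{\sqrt{n}} + \tfrac{Z_2}{\sqrt{n}} + \tfrac{Z_3}{\sqrt{n}} + \cdots + \tfrac{Z_n}{\sqrt{n}}\Bigr)
\end{aligned}$$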
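The telescoping identity is purely algebraic, so it can be checked on a single sample. In the Python sketch below, the Rademacher choice for the $X_i$, $n = 8$, the seed, and $f = \cos$ (which satisfies $|f^{(i)}| \le 1$) are all assumptions for illustration; `hybrid(k)` is a hypothetical helper returning the argument with $Z_1, \dots, Z_k$ followed by $X_{k+1}, \dots, X_n$, all divided by $\sqrt{n}$.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 8
X = rng.choice([-1.0, 1.0], size=n)   # hypothetical Rademacher X_1, ..., X_n
Z = rng.standard_normal(n)            # independent standard normals Z_1, ..., Z_n
f = np.cos                            # |f'|, |f''|, |f'''| <= 1


def hybrid(k):
    """(Z_1 + ... + Z_k + X_{k+1} + ... + X_n) / sqrt(n)."""
    return (Z[:k].sum() + X[k:].sum()) / np.sqrt(n)


# k-th Lindeberg term, k = 1, ..., n: swap X_k for Z_k.
terms = [f(hybrid(k - 1)) - f(hybrid(k)) for k in range(1, n + 1)]

# S_k = (Z_1 + ... + Z_{k-1} + X_{k+1} + ... + X_n) / sqrt(n); Python index k-1 is X_k.
S = [(Z[: k - 1].sum() + X[k:].sum()) / np.sqrt(n) for k in range(1, n + 1)]

lhs = f(X.sum() / np.sqrt(n)) - f(Z.sum() / np.sqrt(n))
print(lhs, sum(terms))  # agree up to rounding, since the sum telescopes
```

The list `S` also confirms that the $k$th term has the form $f(S_k + X_k/\sqrt{n}) - f(S_k + Z_k/\sqrt{n})$.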
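Note that we are basically just adding
$$-f\Bigl(\sum_{j=1}^{k} \frac{Z_j}{\sqrt{n}} + \sum_{j=k+1}^{n} \frac{X_j}{\sqrt{n}}\Bigr) + f\Bigl(\sum_{j=1}^{k} \frac{Z_j}{\sqrt{n}} + \sum_{j=k+1}^{n} \frac{X_j}{\sqrt{n}}\Bigr) = 0$$
to $f(\bar{X}_n) - f(\bar{Z}_n)$ for $k = 1, 2, \ldots, n-1$. The $k$th term of the Lindeberg trick is just
$$f\Bigl(\sum_{j=1}^{k-1} \frac{Z_j}{\sqrt{n}} + \frac{X_k}{\sqrt{n}} + \sum_{j=k+1}^{n} \frac{X_j}{\sqrt{n}}\Bigr) - f\Bigl(\sum_{j=1}^{k-1} \frac{Z_j}{\sqrt{n}} + \frac{Z_k}{\sqrt{n}} + \sum_{j=k+1}^{n} \frac{X_j}{\sqrt{n}}\Bigr)$$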
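which, by letting $S_k = \sum_{j=1}^{k-1} Z_j/\sqrt{n} + \sum_{j=k+1}^{n} X_j/\sqrt{n}$, becomes $f(S_k + X_k/\sqrt{n}) - f(S_k + Z_k/\sqrt{n})$. Its Taylor expansion is
$$\begin{aligned}
& f(S_k) + f'(S_k)\,\frac{X_k}{\sqrt{n}} + \frac{f''(S_k)}{2}\,\frac{X_k^2}{n} + \frac{f'''(\xi_k)}{6}\,\frac{X_k^3}{n^{3/2}} \\
-\; & f(S_k) - f'(S_k)\,\frac{Z_k}{\sqrt{n}} - \frac{f''(S_k)}{2}\,\frac{Z_k^2}{n} - \frac{f'''(\zeta_k)}{6}\,\frac{Z_k^3}{n^{3/2}}
\end{aligned}$$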
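where $\xi_k$ and $\zeta_k$ are real-valued functions on $S$, with $\xi_k$ between $S_k$ and $S_k + X_k/\sqrt{n}$ and $\zeta_k$ between $S_k$ and $S_k + Z_k/\sqrt{n}$. What happens when we take the expected value of the above? In the first column we have $E(f(S_k)) - E(f(S_k)) = 0$. In the second column, we have that
$$E\Bigl(f'(S_k)\,\frac{X_k}{\sqrt{n}}\Bigr) - E\Bigl(f'(S_k)\,\frac{Z_k}{\sqrt{n}}\Bigr) = E(f'(S_k))\,E\Bigl(\frac{X_k}{\sqrt{n}}\Bigr) - E(f'(S_k))\,E\Bigl(\frac{Z_k}{\sqrt{n}}\Bigr) = 0,$$
the first equality following from the fact that $S_k$ is independent of both $X_k$ and $Z_k$, and the last equality following from the fact that $E(X_k) = E(Z_k) = 0$. In the third column, we have
$$E\Bigl(\frac{f''(S_k)}{2}\,\frac{X_k^2}{n}\Bigr) - E\Bigl(\frac{f''(S_k)}{2}\,\frac{Z_k^2}{n}\Bigr) = 0$$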
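using the fact that $S_k$ is independent of both $X_k$ and $Z_k$ and that $E(X_k^2) = \mathrm{var}(X_k) = E(Z_k^2) = 1$. We can't do anything about the fourth column, so the expected value of the $k$th term is
$$\underbrace{E\Bigl(\frac{f'''(\xi_k)}{6}\,\frac{X_k^3}{n^{3/2}}\Bigr)}_{x_k} - \underbrace{E\Bigl(\frac{f'''(\zeta_k)}{6}\,\frac{Z_k^3}{n^{3/2}}\Bigr)}_{z_k}$$
and consequently
$$\bigl|E\bigl(f(\bar{X}_n) - f(\bar{Z}_n)\bigr)\bigr| = \Bigl|\sum_{k=1}^{n} (x_k - z_k)\Bigr| \le \sum_{k=1}^{n} |x_k - z_k| \le \sum_{k=1}^{n} \bigl(|x_k| + |z_k|\bigr).$$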
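The only analytic estimate behind the fourth column is Taylor's theorem with the Lagrange remainder: if $|f'''| \le 1$, then $|f(s+h) - f(s) - f'(s)h - f''(s)h^2/2| \le |h|^3/6$. Applied with $h = X_k/\sqrt{n}$, this is what yields $|x_k| \le E(|X_k|^3)/(6n^{3/2})$. The Python sketch below checks the inequality on a grid for the hypothetical choice $f = \cos$; the grid ranges are arbitrary.

```python
import numpy as np


# Hypothetical test function f = cos, whose first three derivatives are bounded by 1.
def f(x):
    return np.cos(x)


def df(x):
    return -np.sin(x)


def d2f(x):
    return -np.cos(x)


s = np.linspace(-5.0, 5.0, 401)[:, None]   # base points (playing the role of S_k)
h = np.linspace(-2.0, 2.0, 401)[None, :]   # increments (playing the role of X_k / sqrt(n))

# What is left after the first three "columns" of the Taylor expansion.
remainder = f(s + h) - (f(s) + df(s) * h + d2f(s) * h**2 / 2)

# Lagrange form: |remainder| = |f'''(xi)| |h|^3 / 6 <= |h|^3 / 6.
bound_ok = bool(np.all(np.abs(remainder) <= np.abs(h) ** 3 / 6 + 1e-12))
print(bound_ok)
```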
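Since $E(|X_k|^3)$ is some finite number $a$, which does not depend on $k$ by hypothesis, and $|f'''(\xi_k)| \le 1$, we have that
$$|x_k| \le E\Bigl(\frac{|f'''(\xi_k)|}{6}\,\frac{|X_k|^3}{n^{3/2}}\Bigr) \le \frac{a}{6n^{3/2}}.$$
Similarly, $|z_k| \le b/(6n^{3/2})$ with $b = E(|Z_k|^3)$, a constant which does not depend on $k$ since the $Z_k$ are identically distributed. Therefore,
$$\bigl|E\bigl(f(\bar{X}_n) - f(\bar{Z}_n)\bigr)\bigr| \le \sum_{k=1}^{n}\Bigl(\frac{a}{6n^{3/2}} + \frac{b}{6n^{3/2}}\Bigr) = \frac{nc}{n^{3/2}} = \frac{c}{\sqrt{n}},$$
where $c := a/6 + b/6$ does not depend on $f$. As this is true for any such $f$, it follows that $d_3(\mathcal{L}(\bar{X}_n), \mathcal{L}(\bar{Z}_n)) \le c/\sqrt{n}$, and so $d_3(\mathcal{L}(\bar{X}_n), \mathcal{L}(\bar{Z}_n)) \to 0$. This completes the proof.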
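As a sanity check of the final bound, take the hypothetical example of Rademacher $X_k$ (equal to $\pm 1$ with probability $1/2$ each) and $f = \cos$, which satisfies $|f^{(i)}| \le 1$. Then $a = E(|X_k|^3) = 1$, $b = E(|Z|^3) = 2\sqrt{2/\pi}$, and both expectations have closed forms: by independence $E(\cos \bar{X}_n) = \cos(1/\sqrt{n})^n$, while $E(\cos Z) = e^{-1/2}$. The Python sketch below compares this exact gap with $c/\sqrt{n}$.

```python
import math


def exact_gap(n):
    """|E cos(Xbar_n) - E cos(Z)| for hypothetical Rademacher X_k.

    E exp(i X_k / sqrt(n)) = cos(1/sqrt(n)), so by independence
    E cos(Xbar_n) = cos(1/sqrt(n)) ** n; for Z ~ N(0, 1), E cos(Z) = exp(-1/2).
    """
    return abs(math.cos(1.0 / math.sqrt(n)) ** n - math.exp(-0.5))


a = 1.0                              # E|X_k|^3 for Rademacher X_k
b = 2.0 * math.sqrt(2.0 / math.pi)   # E|Z|^3 for Z ~ N(0, 1)
c = a / 6 + b / 6                    # constant from the proof

for n in (1, 10, 100, 1000, 10_000):
    print(n, exact_gap(n), c / math.sqrt(n))
```

In this example the gap shrinks noticeably faster than $c/\sqrt{n}$; the bound from the proof is only an upper bound and need not be attained for a particular $f$.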