
7 Chebyshev’s inequality and the law of large numbers

In this chapter we will look at some results from probability which can be useful in statistics.
We first prove a result, called Markov’s inequality, which we will use to prove Chebyshev’s inequality. Markov’s inequality can also be used to derive some useful results in its own right, as we shall see. Note that both Markov and Chebyshev are Russian names and the originals are written in the Cyrillic alphabet. There are different ways of transliterating these names into the Latin alphabet, so some books may have Markoff or Tchebysheff or other versions.
Theorem 7.1. Markov’s inequality
Let X be a random variable and g a non-negative function with domain the real line; then

    P[g(X) ≥ k] ≤ E[g(X)]/k    for every k > 0.
Proof. Assume that X is continuous with pdf f(x); then

    E[g(X)] = ∫_{−∞}^{∞} g(x)f(x) dx
            = ∫_{x:g(x)≥k} g(x)f(x) dx + ∫_{x:g(x)<k} g(x)f(x) dx
            ≥ ∫_{x:g(x)≥k} g(x)f(x) dx          since g(x), f(x) ≥ 0
            ≥ ∫_{x:g(x)≥k} k f(x) dx            since g(x) ≥ k on this region
            = k P[g(X) ≥ k]

and the result follows.

Example 7.1. X is a random variable taking only non-negative values. The mean E[X] = µ. Use the Markov inequality to find an upper bound for P(X ≥ 3µ). Find the exact probability if X has an exponential distribution with mean θ⁻¹.
We may use Markov’s inequality with g(x) = x and k = 3µ to get

    P(X ≥ 3µ) ≤ E[X]/(3µ) = µ/(3µ) = 1/3.

If X has an exponential distribution with mean θ⁻¹ then 3µ = 3/θ and the required probability is

    P(X ≥ 3/θ) = ∫_{3/θ}^{∞} θe^{−θx} dx = [−e^{−θx}]_{3/θ}^{∞} = 0 + e^{−3} = 0.0498.

This value is much less than the upper bound, but the Markov bound has to hold for every non-negative random variable with this mean.
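The comparison is easy to check numerically. The Python sketch below computes both values; the rate θ = 2 is an arbitrary illustrative choice, since neither the bound nor the exact tail probability depends on it.

    import math

    theta = 2.0                # arbitrary illustrative rate; the mean is 1/theta
    mu = 1 / theta

    # Markov bound for P(X >= 3*mu), valid for any non-negative X with mean mu
    markov_bound = mu / (3 * mu)           # = 1/3

    # Exact tail probability for an exponential with mean 1/theta:
    # P(X >= 3*mu) = exp(-theta * 3*mu) = exp(-3)
    exact = math.exp(-theta * 3 * mu)

    print(f"Markov bound: {markov_bound:.4f}")   # 0.3333
    print(f"Exact value : {exact:.4f}")          # 0.0498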
Example 7.2. Suppose X has a binomial distribution with n = 90
and p = 1/3. Use Markov’s inequality to find an upper bound for
P (X ≥ 50).
We may use Markov’s inequality with g(x) = x and k = 50 to get

    P(X ≥ 50) ≤ E[X]/50 = 30/50 = 0.6.
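Again the bound can be compared with the exact binomial tail probability. The sketch below uses only the Python standard library; the exact value turns out to be far smaller than 0.6.

    from math import comb

    n, p = 90, 1 / 3
    mean = n * p                               # 30

    # Markov bound for P(X >= 50)
    markov_bound = mean / 50                   # 0.6

    # Exact binomial tail probability P(X >= 50)
    exact = sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(50, n + 1))

    print(f"Markov bound: {markov_bound:.2f}")   # 0.60
    print(f"Exact value : {exact:.2e}")          # orders of magnitude below the bound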
Theorem 7.2. Chebyshev’s inequality
If X is a random variable with mean µ and finite variance σ² then

    P[|X − µ| ≥ rσ] = P[(X − µ)² ≥ r²σ²] ≤ 1/r²

for every r > 0.
Proof. Take g(x) = (x − µ)² and k = r²σ² in Markov’s inequality. Then

    P[(X − µ)² ≥ r²σ²] ≤ E[(X − µ)²]/(r²σ²)

but E[(X − µ)²] = σ² and the result follows.

Note that we can rewrite the inequality as

    P[|X − µ| < rσ] ≥ 1 − 1/r².

If we let ε = rσ, so that r = ε/σ, we have

    P[|X − µ| ≥ ε] ≤ σ²/ε²
and

    P[|X − µ| < ε] ≥ 1 − σ²/ε².
We see that

    P[µ − rσ < X < µ + rσ] ≥ 1 − 1/r²

so that, for example, if r = 2,

    P[µ − 2σ < X < µ + 2σ] ≥ 1 − 1/4 = 3/4.

Of course particular distributions may greatly exceed this lower bound. For example, if X is normally distributed then

    P[µ − 2σ < X < µ + 2σ] > 0.95.

There are examples of random variables which attain the lower bound (see Assignment 9 for one such), so we cannot improve on Chebyshev’s inequality without imposing extra conditions.
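A small simulation illustrates how conservative the bound can be for a particular distribution. The sketch below draws from a normal distribution (the values of µ, σ and the seed are arbitrary choices) and compares the empirical coverage of (µ − 2σ, µ + 2σ) with the Chebyshev lower bound of 3/4.

    import random

    random.seed(0)
    mu, sigma, r, n = 5.0, 2.0, 2.0, 100_000   # arbitrary illustrative values

    sample = [random.gauss(mu, sigma) for _ in range(n)]
    coverage = sum(abs(x - mu) < r * sigma for x in sample) / n

    chebyshev_lower_bound = 1 - 1 / r**2       # 0.75 for r = 2

    print(f"Chebyshev lower bound: {chebyshev_lower_bound}")
    print(f"Empirical coverage   : {coverage:.4f}")   # close to 0.95 for a normal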
Example 7.3. Y is a random variable with mean 11 and variance 9.
Use Chebyshev’s inequality to find

(a) a lower bound for P (6 < Y < 16)


(b) the value of c such that P (|Y − 11| ≥ c) ≤ 0.09.

(a)

    P(6 < Y < 16) = P(11 − 5 < Y < 11 + 5)
                  = P(|Y − 11| < 5)
                  ≥ 1 − 9/25
                  = 16/25 = 0.64.
(b)

    P(|Y − 11| ≥ c) ≤ σ²/c².

Therefore we need σ²/c² = 0.09, so c² = 9/0.09 = 100 and c = 10.
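The arithmetic can be checked directly; the short Python sketch below reproduces both parts.

    import math

    mu, sigma2 = 11, 9
    sigma = math.sqrt(sigma2)

    # (a) P(6 < Y < 16) = P(|Y - mu| < 5) = P(|Y - mu| < r*sigma) with r = 5/sigma
    r = 5 / sigma
    print(f"(a) lower bound: {1 - 1 / r**2:.2f}")    # 0.64

    # (b) solve sigma^2 / c^2 = 0.09 for c
    c = math.sqrt(sigma2 / 0.09)
    print(f"(b) c = {c:.1f}")                        # 10.0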
Example 7.4. The US mint produces dimes with an average diameter of 0.5 in and a standard deviation of 0.01 in. Using Chebyshev’s inequality, find a lower bound for the number of coins in a batch of 400 having diameter between 0.48 in and 0.52 in.
We use

    P[|X − µ| < rσ] ≥ 1 − 1/r²

with µ = 0.5, σ = 0.01 and r = 2. So

    P[|X − 0.5| < 2 × 0.01] ≥ 1 − 1/4 = 3/4
therefore at least 300 of the 400 coins can be expected to have diameters between 0.48 in and 0.52 in.
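The same calculation in Python, with r deduced from the interval endpoints:

    mu, sigma, batch = 0.5, 0.01, 400
    r = (0.52 - mu) / sigma                  # = 2
    lower_bound_prob = 1 - 1 / r**2          # = 3/4
    print(f"{lower_bound_prob * batch:.0f}") # 300, matching "at least 300 of the 400 coins"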
Example 7.5. For a certain soil the number of wireworms per cubic
foot has mean 100. Assuming a Poisson distribution for the number
of wireworms, give an interval that will include at least 5/9 of the
sample values on wireworm counts obtained from a large number of
1 cubic foot samples.
We have µ = 100, σ² = 100 (since it is Poisson) and so σ = 10. Using Chebyshev’s inequality we have

    P[|X − 100| < 10r] ≥ 1 − 1/r²
so that

    P(100 − 10r < X < 100 + 10r) ≥ 1 − 1/r².

We want 1 − 1/r² = 5/9, which implies that r = 3/2, so the required interval is (85, 115).
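The sketch below solves 1 − 1/r² = 5/9 for r and forms the interval; only the Python standard library is used.

    import math

    mu = 100
    sigma = math.sqrt(mu)            # Poisson: variance equals the mean, so sigma = 10
    target = 5 / 9                   # required lower bound on the probability

    # Chebyshev: 1 - 1/r**2 = target  =>  r = 1/sqrt(1 - target)
    r = 1 / math.sqrt(1 - target)

    print(f"r = {r:.2f}, interval = ({mu - r * sigma:.1f}, {mu + r * sigma:.1f})")
    # r = 1.50, interval = (85.0, 115.0)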

Consider a random variable Y which is the number of successes in n Bernoulli trials with probability of success p on each trial. Thus Y has a binomial distribution with parameters n and p. Y/n is the proportion of successes. If p is unknown then Y/n is an estimate of p. How close is Y/n to p as n increases?
We know that

    Var(Y/n) = (1/n²) Var(Y)
             = (1/n²) np(1 − p)
             = p(1 − p)/n

Let ε > 0. Then Chebyshev’s inequality gives us

    P[|Y/n − p| < ε] ≥ 1 − p(1 − p)/(nε²)
So for a fixed ε and 0 < p < 1, if we take the limit as n tends to infinity we have

    lim_{n→∞} P[|Y/n − p| < ε] ≥ lim_{n→∞} [1 − p(1 − p)/(nε²)] = 1

but a limit of probabilities cannot be larger than 1, so

    lim_{n→∞} P[|Y/n − p| < ε] = 1.
So for any ε > 0, if n is large enough then Y/n will be within ε of p with probability as close to 1 as we like.
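A simulation makes the convergence visible. The Python sketch below (the seed and the values of p, ε and the number of replications are arbitrary choices) estimates P(|Y/n − p| < ε) for increasing n and compares it with the Chebyshev lower bound.

    import random

    random.seed(1)
    p, eps, reps = 0.3, 0.02, 1000           # arbitrary illustrative values

    for n in (100, 1_000, 10_000):
        hits = 0
        for _ in range(reps):
            y = sum(random.random() < p for _ in range(n))   # binomial(n, p) draw
            hits += abs(y / n - p) < eps
        bound = max(0.0, 1 - p * (1 - p) / (n * eps**2))     # Chebyshev lower bound
        print(f"n={n:6d}  Chebyshev bound={bound:.3f}  empirical={hits / reps:.3f}")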

More generally, let X be a random variable with density fX(x), mean µ and finite variance σ². Let X̄n be the sample mean of a random sample of size n from this distribution, so X̄n has mean µ and variance σ²/n. Let ε and δ be any two specified numbers satisfying ε > 0 and 0 < δ < 1. If n is any integer greater than σ²/(ε²δ) then

    P(|X̄n − µ| ≤ ε) = P(−ε ≤ X̄n − µ ≤ ε) ≥ 1 − δ.
Proof. We apply Markov’s inequality with g(X) = (X̄n − µ)² and k = ε². Then

    P(|X̄n − µ| < ε) = P((X̄n − µ)² < ε²)
                     ≥ 1 − E[(X̄n − µ)²]/ε²
                     = 1 − (σ²/n)/ε²
                     ≥ 1 − δ

so long as δ > σ²/(nε²), or equivalently n > σ²/(δε²).

This is called a law of large numbers.


Example 7.6. X has an unknown mean and variance equal to 1. How
large a random sample must be taken in order that the probability
will be at least 0.95 that the sample mean will lie within 0.5 of the
population mean?
We have σ² = 1, ε = 0.5 and 1 − δ = 0.95, so that δ = 0.05. So we need

    n > σ²/(δε²) = 1/(0.05 × (0.5)²) = 80.
Example 7.7. How large a sample must be taken in order that you are 99% certain that X̄n is within 0.5σ of µ? We have ε = 0.5σ and 1 − δ = 0.99, so that δ = 0.01. So we need

    n > σ²/(δε²) = σ²/(0.01 × (0.5σ)²) = 400.
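Both sample-size calculations follow from the same bound n > σ²/(δε²); the small Python helper below (a sketch, not a prescribed routine) reproduces them.

    def chebyshev_sample_size_bound(sigma2, eps, delta):
        """Value that n must exceed so that P(|Xbar_n - mu| < eps) >= 1 - delta."""
        return sigma2 / (delta * eps**2)

    # Example 7.6: sigma^2 = 1, eps = 0.5, delta = 0.05
    print(f"{chebyshev_sample_size_bound(1.0, 0.5, 0.05):.1f}")   # 80.0, so take n > 80

    # Example 7.7: eps = 0.5*sigma, delta = 0.01; sigma^2 cancels, so use sigma = 1
    print(f"{chebyshev_sample_size_bound(1.0, 0.5, 0.01):.1f}")   # 400.0, so take n > 400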

8 Limiting moment generating functions and the Central Limit Theorem

We start by showing that a binomial can be approximated by a Poisson when n is sufficiently large and p is fairly small. You should have seen this before, but we are going to show it in a new way by taking the limit of an mgf. We make use of the following theorem, which we will not prove:
Theorem 8.1. If a sequence of moment generating functions ap-
proaches a certain mgf, say M (t), then the limit of the correspond-
ing distributions must be the distribution with mgf M (t).

Let Y be binomial with parameters n and p. We take the limit as n → ∞ such that np = λ, so that p → 0.
The moment generating function of Y is

    MY(t) = (1 − p + pe^t)^n.

Letting p = λ/n we have

    MY(t) = (1 − λ/n + (λ/n)e^t)^n
          = (1 + λ(e^t − 1)/n)^n

A well-known result is that lim_{n→∞} (1 + b/n)^n = e^b. Using this result we see that

    lim_{n→∞} MY(t) = exp{λ(e^t − 1)}

but this is the mgf of a Poisson distribution with mean λ and hence, applying the Theorem, we see that the limit of the binomial distributions is a Poisson distribution.
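The quality of the approximation is easy to inspect numerically. The Python sketch below compares the binomial(n, λ/n) and Poisson(λ) probability mass functions for a few values of k, taking n = 100 and λ = 4 as arbitrary illustrative values.

    from math import comb, exp, factorial

    n, lam = 100, 4.0            # arbitrary illustrative values
    p = lam / n

    for k in range(9):
        binom_pmf = comb(n, k) * p**k * (1 - p)**(n - k)
        poisson_pmf = exp(-lam) * lam**k / factorial(k)
        print(f"k={k}: binomial={binom_pmf:.5f}  Poisson={poisson_pmf:.5f}")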

We can now see a proof of the Central Limit Theorem. This result
has been stated and used before to justify assuming a sample mean is
approximately normal and other approximations. This proof applies
to distributions with a finite positive variance and a moment gener-
ating function. Other versions of the theorem make less restrictive
assumptions but we cannot prove those.
Theorem 8.2. Central Limit Theorem
If X̄n is the mean of a random sample X1, X2, . . . , Xn from a distribution with a finite mean µ and a finite positive variance σ², and the moment generating function of X exists, then the distribution of

    Wn = (X̄n − µ)/(σ/√n) = (ΣXi − nµ)/(σ√n)

is N(0, 1) in the limit as n → ∞.
Proof. We let Y = (X − µ)/σ with moment generating function
MY (t).
We find the moment generating function of Wn:

    E[exp(tWn)] = E[exp((t/(σ√n))(ΣXi − nµ))]
                = E[exp((t/√n)(X1 − µ)/σ) · · · exp((t/√n)(Xn − µ)/σ)]
                = E[exp((t/√n)(X1 − µ)/σ)] · · · E[exp((t/√n)(Xn − µ)/σ)]   (by independence)
                = [MY(t/√n)]^n
Now E[Y] = 0 and Var[Y] = E[Y²] = 1, so MY′(0) = 0 and MY″(0) = 1.
We expand MY(t) in a Taylor expansion as

    MY(t) = 1 + MY′(0)t + MY″(0)t²/2! + MY‴(0)t³/3! + · · ·
          = 1 + t²/2 + MY‴(0)t³/3! + · · ·


therefore MY(t/√n) is given by

    MY(t/√n) = 1 + t²/(2n) + MY‴(0)t³/(3! n^{3/2}) + · · ·
             = 1 + (1/n)[t²/2 + MY‴(0)t³/(3!√n) + · · ·]

We see that

    [MY(t/√n)]^n = (1 + (1/n)[t²/2 + MY‴(0)t³/(3!√n) + · · ·])^n
Thus if n is large the terms after t²/2 inside the brackets are negligible, since each carries an extra factor of 1/√n, and so

    lim_{n→∞} [MY(t/√n)]^n = lim_{n→∞} (1 + t²/(2n))^n = exp(t²/2)

but this is the mgf of a standard normal distribution and so we see that Wn has a limiting N(0, 1) distribution.
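A simulation illustrates the theorem. The Python sketch below standardizes the means of repeated Uniform(0, 1) samples (the parent distribution, sample size, number of replications and seed are all arbitrary choices) and checks that the resulting values of Wn look like draws from N(0, 1).

    import random
    import statistics

    random.seed(2)
    n, reps = 30, 5000                  # arbitrary illustrative choices
    mu, sigma = 0.5, (1 / 12) ** 0.5    # mean and sd of the Uniform(0, 1) parent

    w = []
    for _ in range(reps):
        xs = [random.random() for _ in range(n)]                  # Uniform(0, 1) sample
        w.append((statistics.mean(xs) - mu) / (sigma / n**0.5))   # standardized mean W_n

    print(f"mean of W_n (should be near 0): {statistics.mean(w):.3f}")
    print(f"sd of W_n   (should be near 1): {statistics.stdev(w):.3f}")
    within2 = sum(abs(v) < 2 for v in w) / reps
    print(f"P(|W_n| < 2) (should be near 0.954): {within2:.3f}")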

