
Central Limit Theorem:

CLT setup:
1. Let X be any random variable that follows a probability distribution,
called the population distribution. Let $\mu$ and $\sigma$ be the mean and
the standard deviation (SD) of this population distribution.
2. Let $\bar{X}$ be the mean of a sample of size n drawn from the population.
Note that $\bar{X}$ is also a random variable.
3. The probability distribution that $\bar{X}$ follows is called the sampling
distribution of the sample mean, or the sampling distribution.
4. Let $\mu_{\bar{X}}$ and $\sigma_{\bar{X}}$ be the mean and the standard
deviation of the sampling distribution, respectively. ($\sigma_{\bar{X}}$ is
also called the standard error, SE, for convenience.)
CLT results:
1. The sampling distribution of the sample mean is approximately
normal;
2. $\mu_{\bar{X}} = \mu$;
3. $\sigma_{\bar{X}} = \sigma / \sqrt{n}$;
4. the approximation improves as n increases.
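
The CLT results can be checked numerically with a small simulation. The sketch below is only an illustration: the exponential population (mean 1, SD 1), the sample size 64, the number of trials, and the function name simulate_sample_means are assumptions chosen for the demo, not part of the notes.

```python
import random
import statistics
from math import sqrt


def simulate_sample_means(n, trials=5000, seed=0):
    """Draw `trials` samples of size n from an exponential population
    (mean 1, SD 1) and return the list of their sample means."""
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.expovariate(1.0) for _ in range(n))
        for _ in range(trials)
    ]


# Assumed population for the demo: exponential with rate 1 (mean 1, SD 1).
mu, sigma, n = 1.0, 1.0, 64
means = simulate_sample_means(n)

mean_of_means = statistics.fmean(means)  # CLT result 2: close to mu
sd_of_means = statistics.stdev(means)    # CLT result 3: close to sigma / sqrt(n)

print(f"mean of sample means: {mean_of_means:.4f} (theory {mu:.4f})")
print(f"SD of sample means:   {sd_of_means:.4f} (theory {sigma / sqrt(n):.4f})")
```

Even though the exponential population is strongly skewed, the simulated sample means should cluster around the population mean with a spread close to sigma/sqrt(n).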

CLT Examples
1. The average male drinks 2 L of water when active outdoors (with a standard
deviation of 0.8 L). You are planning a full-day nature trip for 64 men and will
bring 130 L of water. What is the probability that you will run out?
Step 1: The population random variable, X, is the amount of water a male
drinks when active outdoors. X follows a distribution (we don't need to know
what distribution it is) with mean $\mu = 2$ and standard deviation $\sigma = 0.8$.

Step 2: The sample mean, $\bar{X}$, in this problem, is the average amount of
water this group of 64 men will consume during this full-day nature trip.
The sample size n is 64. According to CLT, $\bar{X}$ is approximately normal
with $\mu_{\bar{X}} = \mu = 2$ and $\sigma_{\bar{X}} = \sigma / \sqrt{n} = 0.8 / \sqrt{64} = 0.1$.
Step 3: We are looking for P(out of water) = $P(\bar{X} > 130/64 = 2.03125)$.
Step 4: $P(\bar{X} > 130/64 = 2.03125)$ = 1 - NORM.DIST(2.03125, 2, 0.1, 1) =
0.3773. That is, the probability that you will run out of water is 37.73%.
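
The same calculation can be reproduced outside Excel. This is a minimal sketch using Python's standard-library statistics.NormalDist in place of NORM.DIST; the variable names are illustrative assumptions.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 2.0, 0.8, 64   # population mean, population SD, sample size
water_available = 130.0       # litres brought on the trip

se = sigma / sqrt(n)                 # standard error: 0.8 / 8
threshold = water_available / n      # average per man that exhausts the supply

# P(X-bar > threshold) under the normal approximation from the CLT
p_run_out = 1 - NormalDist(mu, se).cdf(threshold)

print(f"SE = {se:.4f}, threshold = {threshold:.5f}, P(run out) = {p_run_out:.4f}")
```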
2. You sample 25 apples from your farm's harvest of over 200,000 apples. The
mean weight of the sample is 100 grams (with a 30 gram sample standard
deviation). What is the probability that the mean weight of all 200,000 apples is
within 90 and 110 grams?
Step 1: Let X be the weight of any apple from the harvest. X is a random
variable following a population distribution which we don't know. Let $\mu$ be
the population mean and $\sigma$ be the population SD. We are looking for
$P(90 \le \mu \le 110)$.
Step 2: $\bar{X}$ is the sample mean. Sample size n = 25. Since n is sufficiently
large, $\bar{X}$ approximately follows a normal distribution (the sampling
distribution). $\mu_{\bar{X}} = \mu$ and $\sigma_{\bar{X}} = \sigma / \sqrt{n}$.
However, $\sigma$ is unknown. We replace it with the sample standard deviation,
denoted by s. Thus, $\sigma_{\bar{X}} \approx s / \sqrt{n} = 30 / 5 = 6$.

Remark: $\bar{X}$ is approximately normally distributed. Then the statistic
$(\bar{X} - \mu) / (\sigma / \sqrt{n})$ (this is nothing but a z score) follows
the standard normal distribution. The sample standard deviation, s, is itself a
random variable. Strictly speaking, the statistic $(\bar{X} - \mu) / (s / \sqrt{n})$
no longer follows a normal distribution, but a t distribution. The good news is
that our sample size n = 25 is sufficiently large. The normal distribution is
then a quite accurate approximation to the t distribution.
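Step 3:

$P(90 \le \mu \le 110)$
= $P(90 \le \mu_{\bar{X}} \le 110)$
= $P(\bar{X} - 10 \le \mu_{\bar{X}} \le \bar{X} + 10)$
= $P(-10 \le \bar{X} - \mu_{\bar{X}} \le 10)$
= $P(-10/6 \le (\bar{X} - \mu_{\bar{X}}) / \sigma_{\bar{X}} \le 10/6)$ (SE = 6)
= NORM.S.DIST(10/6, 1) - NORM.S.DIST(-10/6, 1) = 0.9044
That is, the probability that the population mean weight is between 90 and
110 grams is 90.44%.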
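
As a cross-check of the arithmetic in this example, here is a short sketch that mirrors the two NORM.S.DIST calls with Python's standard-library statistics.NormalDist. It follows the normal approximation used in the notes (not the t distribution), and the variable names are illustrative assumptions.

```python
from math import sqrt
from statistics import NormalDist

s, n = 30.0, 25       # sample SD and sample size
half_width = 10.0     # 100 g plus or minus 10 g

se = s / sqrt(n)      # estimated standard error: 30 / 5
z = half_width / se   # 10 / 6

std_normal = NormalDist()                          # mean 0, SD 1
p_within = std_normal.cdf(z) - std_normal.cdf(-z)  # area between -z and z

print(f"SE = {se:.1f}, z = {z:.4f}, probability = {p_within:.4f}")
```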

CLT Exercises
To estimate the annual starting salary for accounting majors, researchers randomly
selected a sample of 400 accounting graduates. It is found that the average annual
starting salary of these 400 graduates is $45,000 with a standard deviation of
$10,000.
a. What is the probability that the mean annual starting salary of all accounting
majors will be between $40,000 and $50,000?

b. What is the probability that the mean annual starting salary of all accounting
majors will be between $50,000 and $60,000?
c. What is the probability that the mean annual starting salary of all accounting
majors will be less than $35,000?
d. What is the probability that the mean annual starting salary of all accounting
majors will be more than $55,000?

Bottom 15% means P(X < a) = 15%, so use 0.15 rather than 0.85.

When you want to know the probability, use: NORM.DIST(x, mean, standard_dev,
TRUE)

But here we already know the probability and want to calculate x, so use:

NORM.INV(probability, mean, standard_dev)

which in this case is: NORM.INV(15%, 45000, 5000) = 39817.83
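
For readers without Excel, the inverse lookup can be sketched with the inv_cdf method of Python's standard-library statistics.NormalDist, which plays the role of NORM.INV. The mean 45000 and SD 5000 are taken from the formula above; the variable names are illustrative assumptions.

```python
from statistics import NormalDist

mean, standard_dev = 45_000.0, 5_000.0
probability = 0.15   # bottom 15%, i.e. P(X < a) = 0.15

# Inverse CDF: the value a such that P(X < a) equals `probability`
cutoff = NormalDist(mean, standard_dev).inv_cdf(probability)

# Round trip: feeding the cutoff back into the CDF recovers the probability
check = NormalDist(mean, standard_dev).cdf(cutoff)

print(f"cutoff = {cutoff:.2f}, P(X < cutoff) = {check:.4f}")
```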

a) S.E. approx = (sample S.D.)/sqrt(n) = 10000/sqrt(400) = 500

P(40000<=u<=50000) = P(X-5000<=u<=X+5000) = P(-5000<=X-u<=5000) = P(-5000/500<=(X-u)/SE<=5000/500)
plug into Excel as NORM.S.DIST(5000/500,1) - NORM.S.DIST(-5000/500,1) =

1 ... so approximately 100% probability that mean salary is between 40k and 50k

b) P(50000<=u<=60000) = P(X+5000<=u<=X+15000) = P(-15000<=X-u<=-5000) =

P(-15000/500<=(X-u)/SE<=-5000/500)
plug into Excel as NORM.S.DIST(-5000/500,1) - NORM.S.DIST(-15000/500,1) = 0 ... so approximately 0% probability
that mean salary is between 50k and 60k
c) and d) I am getting 0%
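
The four answers can be checked together. The sketch below follows the same approach as these notes, a normal approximation centered on the sample mean with SE = s/sqrt(n); the helper name prob_between and the variable names are assumptions made for illustration.

```python
from math import sqrt
from statistics import NormalDist

x_bar, s, n = 45_000.0, 10_000.0, 400   # sample mean, sample SD, sample size
se = s / sqrt(n)                         # 10000 / 20

# Normal approximation used in the notes: centered at x_bar with SD = SE
approx = NormalDist(x_bar, se)


def prob_between(lo, hi):
    """Area of the approximating normal curve between lo and hi."""
    return approx.cdf(hi) - approx.cdf(lo)


p_a = prob_between(40_000, 50_000)   # a) between 40k and 50k
p_b = prob_between(50_000, 60_000)   # b) between 50k and 60k
p_c = approx.cdf(35_000)             # c) less than 35k
p_d = 1 - approx.cdf(55_000)         # d) more than 55k

for label, p in zip("abcd", (p_a, p_b, p_c, p_d)):
    print(f"{label}) {p:.6f}")
```

With a standard error of only 500, every bound in the exercise lies at least 10 standard errors from the sample mean, which is why the answers collapse to essentially 1 or 0.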

Be very careful when you use the term standard deviation. Is it the
population standard deviation (sigma), the sample standard deviation (s),
or the standard deviation of the sampling distribution of the sample mean
(standard error)? In CLT, what we need is the standard error, which is equal
to sigma/sqrt(n). When sigma is unknown, we use the approximation:
standard error = s/sqrt(n).
