
The Central Limit Theorem

A Review of Terminology

We begin our journey into inferential statistics. Most of the time, the
population mean and population standard deviation are impossible or too
expensive to determine exactly. Two of the major tasks of a statistician
are to approximate the population mean and to analyze how accurate that
approximation is. The most common way of accomplishing these tasks is
by sampling. The researcher obtains a (hopefully random) sample from the
population and uses it to make inferences about the population. From the
sample, the statistician computes several numbers, such as the sample
size, the sample mean, and the sample standard deviation. Numbers
computed from a sample are called statistics.

Example

How many cups of coffee do you drink each week?

If we asked this question of two different five-person groups, we would
probably get two different sample means and two different sample
standard deviations. Choosing different samples from the same
population produces different statistics.

The distribution of a statistic (such as the sample mean) over all
possible samples of the same size is called its sampling distribution.
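To make the definition concrete, here is a minimal Python sketch that lists every possible sample of size 5 from a small population of weekly coffee counts and computes each sample's mean. The eight population values are hypothetical, invented only for illustration. The list of sample means is the sampling distribution of the sample mean, and its average equals the population mean.

```python
from itertools import combinations
from statistics import mean

# Hypothetical population: weekly cups of coffee for eight people (illustrative values only).
population = [0, 2, 3, 5, 7, 10, 14, 21]
n = 5

# Every possible sample of size n drawn without replacement, and the mean of each one.
sample_means = [mean(s) for s in combinations(population, n)]

print("number of samples:", len(sample_means))
print("smallest / largest sample mean:", min(sample_means), max(sample_means))
print("population mean:", mean(population))
print("mean of the sample means:", mean(sample_means))
```

Different samples give different means (here anywhere from 3.4 to 11.4 cups), yet the sampling distribution as a whole is centered on the population mean.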

The Five Dice Experiment

Consider the distribution of a single die roll. It is uniform (flat)
over the values 1 through 6. If we roll five dice and compute the mean
of the five values, we can find the probability distribution of that
mean. We will see that it looks much more like a normal distribution
than the flat distribution of a single die.

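The five dice experiment can also be worked out exactly. This Python sketch enumerates all 6^5 = 7776 equally likely rolls, tabulates the distribution of the mean as exact fractions, and prints a rough text histogram. The mean of x̄ comes out to 3.5 and its variance to (35/12)/5, which anticipates the theorem that follows.

```python
from collections import Counter
from fractions import Fraction
from itertools import product
from math import sqrt

# Exact distribution of the mean of five fair dice, by listing all 6**5 equally likely outcomes.
counts = Counter(sum(roll) for roll in product(range(1, 7), repeat=5))
total = 6 ** 5

# Probability of each possible mean value (sum / 5), kept as exact fractions.
dist = {Fraction(s, 5): Fraction(c, total) for s, c in sorted(counts.items())}

mu = sum(m * p for m, p in dist.items())
var = sum((m - mu) ** 2 * p for m, p in dist.items())

print("mean of x-bar:", float(mu))
print("sd of x-bar:", sqrt(var))

# Crude text histogram: the bars bulge in the middle, unlike the flat single-die distribution.
for m, p in dist.items():
    print(f"{float(m):4.1f} {'#' * round(1000 * p)}")
```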
The Central Limit Theorem

Let $\bar{x}$ denote the mean of a random sample of size $n$ from a
population having mean $\mu$ and standard deviation $\sigma$. Let

$\mu_{\bar{x}}$ = the mean value of $\bar{x}$, and

$\sigma_{\bar{x}}$ = the standard deviation of $\bar{x}$.

Then

A. $\mu_{\bar{x}} = \mu$
B. $\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}}$
C. When the population distribution is normal, so is the distribution
of $\bar{x}$, for any $n$.
D. For large $n$, the distribution of $\bar{x}$ is approximately normal,
regardless of the population distribution.

Rule of thumb: n > 30 is large
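Parts A, B and D can be checked by simulation. The Python sketch below assumes a strongly skewed population (an exponential distribution with mean 1 and standard deviation 1, chosen only for illustration), draws 20,000 samples of size 36, and compares the mean and standard deviation of the sample means with $\mu$ and $\sigma/\sqrt{n}$. It also reports the share of sample means within 1.96 standard deviations of $\mu$, which a normal curve puts at about 0.95.

```python
import random
from math import fsum, sqrt
from statistics import mean, pstdev

# Assumed (hypothetical) population: exponential with mean 1 and sd 1, strongly right-skewed.
random.seed(1)
mu, sigma = 1.0, 1.0
n = 36            # sample size, above the n > 30 rule of thumb
reps = 20_000     # number of simulated samples

# Each entry is the mean of one simulated sample of size n.
sample_means = [fsum(random.expovariate(1.0) for _ in range(n)) / n for _ in range(reps)]

se = sigma / sqrt(n)   # predicted standard deviation of x-bar: 1/6
# If x-bar is approximately normal, about 95% of sample means lie within 1.96 * se of mu.
share = sum(abs(m - mu) < 1.96 * se for m in sample_means) / reps

print(f"mean of x-bar = {mean(sample_means):.4f} (theory {mu})")
print(f"sd of x-bar   = {pstdev(sample_means):.4f} (theory {se:.4f})")
print(f"share within 1.96 se = {share:.3f} (normal curve: about 0.95)")
```

Plotting a histogram of `sample_means` would show a nearly symmetric bell shape even though the population itself is far from normal.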

Example:

Suppose that we play a slot machine on which a one-dollar bet is either
doubled (we win $1) or lost (we lose $1). If there is a 45% chance of
winning, then the expected value of a one-dollar wager is

$\mu = (1)(.45) + (-1)(.55) = -.10$

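We can compute the standard deviation from the variance, using $\mu = -.1$:

x       p(x)    (x - μ)²    p(x)(x - μ)²
 1      .45     1.21        .5445
-1      .55      .81        .4455
Total                       .9900

So the variance is $\sigma^2 = .99$ and the standard deviation is

$\sigma = \sqrt{.99} \approx .995$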

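If we feed 100 silver dollars into the slot machine, one per play, then
we expect to lose an average of ten cents per play, with a standard
deviation of

$\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}} = \dfrac{.995}{\sqrt{100}} \approx .0995$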

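Notice that this standard deviation of the average (about ten cents) is
ten times smaller than the standard deviation of a single play (about a
dollar), and it keeps shrinking as the number of plays grows. This is
why casinos, which see an enormous number of plays, are all but assured
of making money.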
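The slot-machine numbers can be reproduced in a few lines of Python. This sketch computes the expected value, the variance from the table, $\sigma$, and $\sigma/\sqrt{n}$ for 100 plays, then shows how the standard deviation of the average shrinks for larger, hypothetical numbers of plays.

```python
from math import sqrt

# Payout of a one-dollar bet: +1 with probability .45, -1 with probability .55.
outcomes = {1: 0.45, -1: 0.55}

mu = sum(x * p for x, p in outcomes.items())                 # expected value per play
var = sum(p * (x - mu) ** 2 for x, p in outcomes.items())     # variance, as in the table
sigma = sqrt(var)

n = 100                      # number of plays
se = sigma / sqrt(n)         # standard deviation of the average result per play

print(f"mu = {mu:.2f}, variance = {var:.4f}, sigma = {sigma:.4f}, sd of mean = {se:.4f}")

# How the standard deviation of the average shrinks with more plays (hypothetical counts).
for plays in (100, 10_000, 1_000_000):
    print(f"{plays:>9} plays: sd of mean = {sigma / sqrt(plays):.5f}")
```

The expected loss per play stays at ten cents, while the spread around it falls by a factor of 10 for every hundredfold increase in plays.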

Now let us find the probability that the gambler does not lose any
money, that is, that the mean $\bar{x}$ is greater than or equal to 0.
We first compute the z-score:

$z = \dfrac{0 - (-.1)}{.0995} \approx 1.01$

Now we go to the table to find the associated probability. We get
.8438. Since we want the area to the right, we subtract from 1 to get

$P(z > 1.01) = 1 - P(z < 1.01) = 1 - .8438 = .1562$

There is about a 16% chance that the gambler will not lose.
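Instead of a printed table, Python's standard-library `statistics.NormalDist` (Python 3.8 and later) gives the normal area directly. The sketch uses the rounded values $\mu = -.1$ and $\sigma_{\bar{x}} = .0995$ from the text. Rounding z to two decimals reproduces the table answer, while the unrounded z gives a slightly larger probability (about .157).

```python
from statistics import NormalDist

mu, se = -0.1, 0.0995        # mean and standard deviation of the average result over 100 plays

z = (0 - mu) / se
z_table = round(z, 2)        # a printed table is indexed to two decimal places

# Area to the right of z under the standard normal curve.
p_exact = 1 - NormalDist().cdf(z)
p_table = 1 - NormalDist().cdf(z_table)

print(f"z = {z:.4f}, P(mean >= 0) = {p_exact:.4f} (table, z = {z_table}: {p_table:.4f})")
```

Equivalently, `1 - NormalDist(mu, se).cdf(0)` skips the explicit z-score.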

Sampling Distributions for Proportions

The last example was a special case of proportions, that is, of Boolean
(win/lose) data. From now on, we can use the following theorem.

The Central Limit Theorem for Proportions

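Let $p$ be the probability of success and $q = 1 - p$ the probability of
failure. The sampling distribution of the sample proportion $\hat{p}$ for
samples of size $n$ is approximately normal with mean and standard
deviation

$\mu_{\hat{p}} = p, \qquad \sigma_{\hat{p}} = \sqrt{\dfrac{pq}{n}}$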

Example

The new Endeavor SUV has been recalled because 5% of the cars
experience brake failure. The Tahoe dealership has sold 200 of these
cars. What is the probability that fewer than 4% of the cars from Tahoe
experience brake failure?

Solution

We have

$p = .05, \quad q = .95, \quad n = 200$

so

$\mu_{\hat{p}} = p = .05, \qquad \sigma_{\hat{p}} = \sqrt{\dfrac{(.05)(.95)}{200}} \approx .0154$

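Let $x$ be the number of the 200 cars that experience brake failure.
Since 4% of 200 is 8, we want to find

$P(x < 8)$

Because $x$ is a whole number, $x < 8$ means $x \le 7$. Using the
continuity correction, we find instead

$P(x < 7.5)$

This is equivalent to

$P(\hat{p} < 7.5/200) = P(\hat{p} < .0375)$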

We find the z-score

$z = \dfrac{.0375 - .05}{.0154} \approx -.81$

The table gives a probability of .2090. We can conclude that there is
about a 21% chance that fewer than 4% of the cars will experience brake
failure.

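The brake-failure calculation can be checked in Python. The sketch follows the same steps ($\sigma_{\hat{p}} = \sqrt{pq/n}$, the continuity correction to 7.5 cars, then the z-score) and, purely as a cross-check that the text itself does not perform, also sums the exact binomial probabilities $P(x \le 7)$ with `math.comb`.

```python
from math import comb, sqrt
from statistics import NormalDist

p, n = 0.05, 200
q = 1 - p

mu_phat = p
sigma_phat = sqrt(p * q / n)                   # about .0154

# "Fewer than 4%" means fewer than 8 cars, i.e. at most 7; the continuity correction uses 7.5.
cutoff = 7.5 / n                               # .0375
z = (cutoff - mu_phat) / sigma_phat
p_normal = NormalDist().cdf(z)

# Cross-check only: the exact binomial probability P(x <= 7).
p_exact = sum(comb(n, k) * p**k * q**(n - k) for k in range(8))

print(f"sigma = {sigma_phat:.4f}, z = {z:.2f}, normal approx = {p_normal:.4f}, exact = {p_exact:.4f}")
```

Here the normal approximation and the exact binomial answer both round to about 21%.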
Control Charts for Proportions

Earlier we discussed how to construct a control chart. For proportions,
we can use the same tool, remembering that the Central Limit Theorem for
proportions tells us how to find the mean and standard deviation.

Example

Heavenly Ski resort conducted a study of falls on its advanced run over
twelve consecutive ten-minute periods. During each ten-minute period
there were 40 boarders on the run. The data are shown below, where r is
the number of falls in the period:

Period  1    2    3     4    5     6    7    8    9     10   11    12
r       14   18   11    16   19    22   6    12   13    16   9     17
r/40    .35  .45  .275  .40  .475  .55  .15  .30  .325  .40  .225  .425

Make a p-chart and list any out-of-control signals by type (I, II, III).

Solution

First we find $\bar{p}$ by dividing the total number of falls by the
total number of boarders observed (12 periods of 40 boarders each):

$\bar{p} = \dfrac{173}{12(40)} = \dfrac{173}{480} \approx .36$

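The mean, .36, is the center line of the chart. Next we compute the
standard deviation, with $\bar{p} = .36$ and $n = 40$:

$\sigma = \sqrt{\dfrac{\bar{p}(1 - \bar{p})}{n}} = \sqrt{\dfrac{(.36)(.64)}{40}} \approx .076 \approx .08$

The values two and three standard deviations below and above the mean
are

.36 - (2)(.08) = .20     .36 - (3)(.08) = .12

.36 + (2)(.08) = .52     .36 + (3)(.08) = .60

Now we can use these values as before to construct a control chart and
determine any out-of-control signals.

Notice that no nine consecutive points lie on one side of the center
line (.36), no two of three consecutive points lie above .52 and no two
of three lie below .20 (only period 6 is above .52 and only period 7 is
below .20), and no points lie below .12 or above .60. Hence this process
is in control.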
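The p-chart limits and the three checks can be sketched in Python. The code uses the table's fall counts and the unrounded values $\bar{p} \approx .3604$ and $\sigma \approx .0759$ rather than hand-rounded ones, and it encodes the three signals as the text states them: a run of nine points on one side of the center line, two of three consecutive points beyond $2\sigma$ on the same side, and any point beyond $3\sigma$. The helper names `has_run_of_nine` and `two_of_three` are hypothetical, not from any library.

```python
from math import sqrt

falls = [14, 18, 11, 16, 19, 22, 6, 12, 13, 16, 9, 17]   # falls in each ten-minute period
n = 40                                                   # boarders on the run each period

props = [r / n for r in falls]
p_bar = sum(falls) / (n * len(falls))                    # center line, 173/480
sigma = sqrt(p_bar * (1 - p_bar) / n)

lo2, hi2 = p_bar - 2 * sigma, p_bar + 2 * sigma
lo3, hi3 = p_bar - 3 * sigma, p_bar + 3 * sigma

# Periods (1-based) with a point beyond the 3-sigma limits.
beyond_3 = [i + 1 for i, x in enumerate(props) if x < lo3 or x > hi3]

def has_run_of_nine(values, center):
    """True if nine consecutive points lie strictly on the same side of the center line."""
    run, side = 0, 0
    for x in values:
        s = (x > center) - (x < center)      # +1 above, -1 below, 0 on the line
        if s == 0:
            run = 0
        elif s == side:
            run += 1
        else:
            run = 1
        side = s
        if run >= 9:
            return True
    return False

def two_of_three(values, lo, hi):
    """True if some three consecutive points have at least two beyond the same 2-sigma limit."""
    for i in range(len(values) - 2):
        window = values[i:i + 3]
        if sum(x > hi for x in window) >= 2 or sum(x < lo for x in window) >= 2:
            return True
    return False

print(f"p-bar = {p_bar:.4f}, sigma = {sigma:.4f}")
print(f"2-sigma limits: {lo2:.3f}, {hi2:.3f}; 3-sigma limits: {lo3:.3f}, {hi3:.3f}")
print("beyond 3 sigma:", beyond_3)
print("run of nine:", has_run_of_nine(props, p_bar))
print("two of three beyond 2 sigma:", two_of_three(props, lo2, hi2))
```

None of the three checks fires for this data, matching the conclusion that the process is in control.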
