Stat 6166 Sampling Distribution of Mean

10-1
TOPIC (10) SAMPLING VARIABILITY AND SAMPLING DISTRIBUTIONS
Recall that we typically cannot census the entire
population of interest so we take a sample from that
population in order to make estimates and draw
conclusions about the population.
The sample mean $\bar{x}$ is the estimator of the unknown
population mean $\mu$. Similarly, the sample standard
deviation $s$ is the estimator of the unknown population
standard deviation $\sigma$.
10-2
1) SAMPLING DISTRIBUTION of the Sample Mean $\bar{x}$
Important Point: The value of $\bar{x}$ will vary with
each sample taken from the population.
10-3
EXAMPLE Suppose we had a very small population
of 5 units with X-values {2, 4, 8, 10, 14}. What is the
frequency distribution of the sample mean $\bar{x}$ based
on a random sample of 2 units?
Here, $\mu = 7.6$ and $\sigma = 4.77$ (the standard deviation
computed with the $N - 1$ divisor).
Let's take samples of size 2 with replacement. The
total number of possible samples, ignoring order, is 15.
Sample      $\bar{x}$
(2, 4)      3
(2, 8)      5
(2, 10)     6
(2, 14)     8
(4, 8)      6
(4, 10)     7
(4, 14)     9
(8, 10)     9
(8, 14)     11
(10, 14)    12
(2, 2)      2
(4, 4)      4
(8, 8)      8
(10, 10)    10
(14, 14)    14
Mean of $\bar{x}$: $\mu_{\bar{x}} = \frac{1}{15}(3 + 5 + \cdots + 10 + 14) = 7.6$
[Figure: histogram of the 15 sample means; x-axis: Upper Boundaries (x <= boundary), y-axis: No of obs; expected Normal curve overlaid]
10-4
Std. Deviation of $\bar{x}$: $\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}} = \dfrac{4.77}{\sqrt{2}} = 3.376$
We can think of the list of samples (and their $\bar{x}$
values) as a population of samples, each sample with
a value for the variable of interest!
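The enumeration above can be checked with a short script. This is only a sketch: it assumes the 15 samples are the unordered pairs drawn with replacement, and it uses `statistics.stdev` (the N - 1 divisor) because that reproduces the 4.77 quoted in these notes. The variable names are illustrative.

```python
from itertools import combinations_with_replacement
from math import sqrt
from statistics import mean, stdev

population = [2, 4, 8, 10, 14]
n = 2

# All unordered samples of size n drawn with replacement (15 of them for n = 2).
samples = list(combinations_with_replacement(population, n))
sample_means = [mean(s) for s in samples]

mu = mean(population)               # population mean, 7.6
sigma = stdev(population)           # N - 1 divisor, about 4.77 as in the notes
mean_of_means = mean(sample_means)  # average of the 15 sample means
sigma_xbar = sigma / sqrt(n)        # sigma / sqrt(n), about 3.376

print(len(samples), mean_of_means, round(sigma_xbar, 3))
```

The average of the sample means matches the population mean, which is the unbiasedness property discussed next.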
Some Things To Note About The Behavior Of Sample Means:
1) $\bar{x}$ varies from sample to sample (called
SAMPLING VARIABILITY)
2) the average of the sample means = the average of the
population sampled
$\mu_{\bar{x}} = \mu$
The sample mean $\bar{x}$ is said to be UNBIASED
for the population mean $\mu$
3) The frequency distribution of the sample means
does not match the distribution of the original
population: it is centered in the same place but the
shape and variability (range) are different
10-5
4) Knowing the frequency distribution for the
sample means allows us to calculate probabilities
about the mean.
5) the variability of the sample means < the variability
of the X-values in the population sampled
$\sigma_{\bar{x}} < \sigma$
6) The frequency distribution of the sample means
is called the SAMPLING DISTRIBUTION of $\bar{x}$.
Its shape and its variability, $\sigma_{\bar{x}}$, depend on the
sample size.
Its center, $\mu_{\bar{x}}$, depends on whether the sampling is
unbiased or not.
All three characteristics depend on the sampling
method (i.e. all can change if the method changes)
10-6
Effects Of Sample Size And Sampling Method
Let's take samples of size 3 with replacement. The
total number of possible samples, ignoring order, is 35.
Samples:
All three values different (10 samples):
(2, 4, 8)  (2, 4, 10)  (2, 4, 14)  (2, 8, 10)  (2, 8, 14)
(2, 10, 14)  (4, 8, 10)  (4, 8, 14)  (4, 10, 14)  (8, 10, 14)
One value repeated (20 samples):
(2, 2, 4)  (2, 4, 4)  (2, 2, 8)  (2, 8, 8)  (2, 2, 10)
(2, 10, 10)  (2, 2, 14)  (2, 14, 14)  (4, 4, 8)  (4, 8, 8)
(4, 4, 10)  (4, 10, 10)  (4, 4, 14)  (4, 14, 14)  (8, 8, 10)
(8, 10, 10)  (8, 8, 14)  (8, 14, 14)  (10, 10, 14)  (10, 14, 14)
All three values the same (5 samples):
(2, 2, 2)  (4, 4, 4)  (8, 8, 8)  (10, 10, 10)  (14, 14, 14)
[Figure: Frequency Distribution of Sample Means, n=3; histogram with y-axis No of obs (0 to 11); expected Normal curve overlaid]
Mean of $\bar{x}$: $\mu_{\bar{x}} = 7.6$
Std. Deviation of $\bar{x}$: $\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}} = \dfrac{4.77}{\sqrt{3}} = 2.754$
Increasing the sample size made the shape even more
normal and decreased the variability as well.
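To see the effect of sample size side by side, the sketch below repeats the enumeration for n = 2 and n = 3. It makes the same assumptions as the notes (unordered samples drawn with replacement, sigma taken with the N - 1 divisor), and `summarize` is a hypothetical helper name.

```python
from itertools import combinations_with_replacement
from math import sqrt
from statistics import mean, stdev

population = [2, 4, 8, 10, 14]

def summarize(n):
    """Return (number of samples, mean of sample means, sigma / sqrt(n))."""
    samples = list(combinations_with_replacement(population, n))
    means = [sum(s) / n for s in samples]
    return len(samples), mean(means), stdev(population) / sqrt(n)

for n in (2, 3):
    count, centre, spread = summarize(n)
    print(n, count, round(centre, 3), round(spread, 3))
```

The center stays at 7.6 for both sample sizes while sigma / sqrt(n) shrinks. The spread for n = 3 differs from the notes' 2.754 in the third decimal only because the code does not round sigma to 4.77 first.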
10-7
What is the probability $\Pr(6.6 < \bar{x} < 8.6)$?
We can get an approximate answer using the fact that
it looks like $\bar{x}$ is normally distributed with a mean of
7.6 and a standard deviation of 2.75.
$$
\begin{aligned}
\Pr(6.6 < \bar{x} < 8.6) &= \Pr\left(\frac{6.6 - 7.6}{2.75} < Z < \frac{8.6 - 7.6}{2.75}\right) \\
&= \Pr(-0.36 < Z < +0.36) \\
&= \Pr(Z < +0.36) - \Pr(Z < -0.36) \\
&= 0.6406 - 0.3594 \\
&= 0.2812
\end{aligned}
$$
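The same probability can be computed without a Z table. This sketch uses `statistics.NormalDist` from the Python standard library, with the mean 7.6 and standard deviation 2.75 assumed above; the result comes out slightly larger than 0.2812 because the z-values are not rounded to two decimals.

```python
from statistics import NormalDist

# Approximate sampling distribution of the sample mean for n = 3.
xbar_dist = NormalDist(mu=7.6, sigma=2.75)

# Pr(6.6 < xbar < 8.6) as a difference of two cumulative probabilities.
prob = xbar_dist.cdf(8.6) - xbar_dist.cdf(6.6)
print(round(prob, 4))
```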
10-8
SAMPLING DISTRIBUTION of $\bar{x}$:
Suppose we have a population with a mean $\mu$ and a
standard deviation $\sigma$ and we take a sample of size n.
As long as the sample is random and either we keep
the sample size to less than 5% of the population or
otherwise we sample with replacement, the frequency
distribution of the sample mean has the following
characteristics:
1. $\mu_{\bar{x}} = \mu$
2. $\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}}$
3. The shape of the distribution is
a) a bell-curve (Normal), if the original population
that we sampled has a bell-curve distribution.
b) (CENTRAL LIMIT THEOREM) approximately a bell-curve if
the sample size is relatively large, regardless of the
shape of the frequency distribution of the
original population.
relatively large = 30 or more
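A small simulation can illustrate the three characteristics. The sketch below is an illustration, not part of the original notes: it assumes a strongly skewed population (exponential with rate 1, so mean 1 and standard deviation 1), a sample size of n = 30, and a fixed random seed.

```python
import random
from math import sqrt
from statistics import fmean, pstdev

random.seed(0)  # fixed seed so the run is repeatable

n, reps = 30, 20000
mu, sigma = 1.0, 1.0  # an exponential population with rate 1 has mean 1 and sd 1

# Draw `reps` random samples of size n and record each sample mean.
sample_means = [
    fmean(random.expovariate(1.0) for _ in range(n)) for _ in range(reps)
]

centre = fmean(sample_means)   # should be close to mu
spread = pstdev(sample_means)  # should be close to sigma / sqrt(n)
# Share of sample means within one sigma / sqrt(n) of mu (about 0.68 for a bell curve).
within_one = sum(abs(m - mu) <= sigma / sqrt(n) for m in sample_means) / reps

print(round(centre, 3), round(spread, 3), round(sigma / sqrt(n), 3), round(within_one, 3))
```

Even though the population is far from bell-shaped, the simulated sample means center near the population mean, have spread near sigma / sqrt(n), and put roughly 68% of their values within one standard deviation of the center.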
10-9
EXAMPLE In a study of the evolutionary history of
the amphipod Gammarus minus, one of the variables
used to distinguish subspecies is the length of the first
antennae. If the population found in caves only recently
separated from the subspecies found in springs, the
length of the antennae should be similar in the two
groups. Spring animals have an average first antennal
length of 2.6 mm and a population standard deviation of
0.7 mm.
What is the probability that your sample of 10 cave
animals would yield a mean length of 3.1 mm or larger if the
two subspecies split off recently?
First we note that the sample size is relatively small
so we need to assume that antennal length is normally
distributed (which seems reasonable). Then the
sampling distribution of $\bar{x}$ is Normal with mean
$\mu_{\bar{x}} = 2.6$ and standard deviation
$\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}} = \dfrac{0.7}{\sqrt{10}} = 0.221$.
10-10
Then
$\Pr(\bar{x} > 3.1) = 1 - \Pr(\bar{x} \le 3.1)$ where
$$
\Pr(\bar{x} \le 3.1) = \Pr\left(\frac{\bar{x} - 2.6}{0.221} < \frac{3.1 - 2.6}{0.221}\right) = \Pr(Z < 2.26) = 0.9881
$$
So, $\Pr(\bar{x} > 3.1) = 1 - 0.9881 = 0.0119$
Hence, this event is very unlikely if the two subspecies
separated recently. Should your sample actually yield
a mean of 3.1 or more, it would be strong evidence against
the hypothesis that they split recently.
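The tail probability in this example can be reproduced directly. This is a minimal sketch that takes the values used in the calculation above (mean 2.6, sigma 0.7, n = 10) and assumes, as the notes do, that antennal length is normally distributed.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 2.6, 0.7, 10

# Sampling distribution of the sample mean: Normal(mu, sigma / sqrt(n)).
xbar_dist = NormalDist(mu, sigma / sqrt(n))

# Pr(xbar > 3.1) is the upper tail beyond 3.1.
p_tail = 1 - xbar_dist.cdf(3.1)
print(round(xbar_dist.stdev, 3), round(p_tail, 4))
```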
10-11
2) SAMPLING DISTRIBUTION of the Sample Proportion $p$
If we want to estimate what proportion of the
population ($\pi$) are in the category we have defined as
a success, we take a random sample from that
population and calculate the sample proportion in that
category ($p$).
The shape of the sampling distribution for $p$ depends
very heavily on the sample size n and the population
proportion $\pi$.
EXAMPLE Suppose we had repeatedly tossed n=5
dice where $\pi = 0.5$ for Pr(1). The frequency
distribution for the sample proportion is:
[Figure: histogram of the sample proportion for n = 5; y-axis: No of obs (0 to 800); expected Normal curve overlaid]
10-12
The mean of this sampling distribution is 0.5 and the
standard deviation is 0.2236.
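These two values can be checked against the exact distribution of the sample proportion. The sketch below assumes the number of successes in n = 5 independent tosses follows a binomial distribution with success probability 0.5; the variable names are illustrative.

```python
from math import comb, sqrt

n, pi = 5, 0.5

# Exact distribution of the sample proportion p = k / n for k = 0, ..., n successes.
dist = {k / n: comb(n, k) * pi**k * (1 - pi) ** (n - k) for k in range(n + 1)}

mean_p = sum(p * w for p, w in dist.items())
sd_p = sqrt(sum((p - mean_p) ** 2 * w for p, w in dist.items()))

print(round(mean_p, 4), round(sd_p, 4))
```

With only six possible values of p (0, 0.2, ..., 1), the distribution is symmetric here but still quite coarse compared with a Normal curve.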
Important Points: For any given sample size, the
closer $\pi$, the population proportion, is to 1/2,
A) the more symmetric the shape of the frequency
distribution of the sample proportion $p$
B) the larger the variability of values of $p$
Important Points: For any given value of $\pi$, the
population proportion, a larger sample size from that
population has
A) a more symmetric shape for the frequency
distribution of the sample proportion $p$
B) a smaller variability in the values of $p$
Let's put what we've learned about sample
proportions into one statement:
10-13
SAMPLING DISTRIBUTION of $p$
Suppose we have a population with a binary variable.
The proportion of successes in the population is $\pi$
and we take a random sample of size n.
As long as the sample is random so that each sampled
unit is independent of any other sampled unit, the
frequency distribution of the sample proportion has
the following characteristics:
1. $\mu_p = \pi$
2. $\sigma_p = \sqrt{\dfrac{\pi(1 - \pi)}{n}}$
3. (CENTRAL LIMIT THEOREM) The shape of
the distribution is approximately normal when n is
large and $\pi$ is not too close to 0 or 1.
The further $\pi$ is from 1/2, the larger n has to be in
order for the shape to be a bell-curve. A rule of thumb
is that the CLT holds if both
$n\pi \ge 10$ and $n(1 - \pi) \ge 10$.
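The three characteristics and the rule of thumb fit in one small helper. This is a sketch only: `proportion_sampling_summary` is a hypothetical name, and the threshold of 10 is the rule of thumb stated above rather than a hard boundary.

```python
from math import sqrt

def proportion_sampling_summary(n, pi):
    """Return (mean of p, sd of p, whether the rule of thumb for normality holds)."""
    mean_p = pi
    sd_p = sqrt(pi * (1 - pi) / n)
    clt_ok = n * pi >= 10 and n * (1 - pi) >= 10
    return mean_p, sd_p, clt_ok

# n = 5, pi = 0.5: too small a sample (n * pi = 2.5).
print(proportion_sampling_summary(5, 0.5))
# n = 100, pi = 0.5: both counts are 50, so the rule is satisfied.
print(proportion_sampling_summary(100, 0.5))
# n = 100, pi = 0.05: pi is far from 1/2 and n * pi = 5, so n must be larger.
print(proportion_sampling_summary(100, 0.05))
```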
10-14
EXAMPLE Suppose that the proportion of a
specific form of birth defect was 1 in 1000 live births
around the early 1900s. A researcher claims that
better hygiene and health care have decreased the rate
to something much smaller (say 1 in 10,000 now). To
test this hypothesis the scientist collects birth records
at random for 25,000 children born in 1999. There
were 17 children with the birth defect. What is the
probability of observing so few defects or even fewer
if the 1 in 1000 rate is still true?
If $\pi = 1/1000$ is true then the mean proportion of
successes in random samples of 25000 is
$\mu_p = \pi = 0.001$ and the standard deviation for a sample
proportion is
$$
\sigma_p = \sqrt{\frac{\pi(1 - \pi)}{n}} = \sqrt{\frac{0.001(0.999)}{25000}} = 0.0002.
$$
A random sample of 25,000 should be sufficiently large for normality
but let's check to make sure:
$n\pi = 25000(0.001) = 25$ and of course
$n(1 - \pi) = 25000(0.999) = 24975$. Both are bigger than 10
so we can proceed.
$$
\Pr\left(p \le \frac{17}{25000}\right) = \Pr\left(Z \le \frac{0.00068 - 0.001}{0.0002}\right) = \Pr(Z \le -1.60) = 0.0548
$$

There is evidence to suggest that the rate has gone
down but it isn't very strong.
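The whole calculation for this example can be sketched in a few lines, using the Normal approximation justified by the check above. The variable names are illustrative; the unrounded standard deviation gives a probability that agrees with 0.0548 to about three decimals.

```python
from math import sqrt
from statistics import NormalDist

n, pi0, observed = 25000, 0.001, 17

p_obs = observed / n                # observed sample proportion, 0.00068
sd_p = sqrt(pi0 * (1 - pi0) / n)    # standard deviation of p if pi0 is true
z = (p_obs - pi0) / sd_p            # standardized distance from pi0

# Pr(p <= 17/25000) under the 1-in-1000 rate, via the standard Normal.
p_value = NormalDist().cdf(z)
print(round(sd_p, 6), round(z, 2), round(p_value, 4))
```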
