TOPIC (10) SAMPLING VARIABILITY AND SAMPLING DISTRIBUTIONS
Recall that we typically cannot census the entire
population of interest so we take a sample from that
population in order to make estimates and draw
conclusions about the population.
The sample mean x̄ is the estimator of the unknown
population mean μ. Similarly, the sample standard
deviation s is the estimator of the unknown population
standard deviation σ.

1) SAMPLING DISTRIBUTION of the Sample Mean x̄
Important Point: The value of x̄ will vary with
each sample taken from the population.

EXAMPLE Suppose we had a very small population
of 5 units with X-values {2, 4, 8, 10, 14}. What is the
frequency distribution of the sample mean x̄ based
on a random sample of 2 units?
Here, μ = 7.6 and σ = 4.77.
Let's take samples of size 2 with replacement. The
total number of possible samples (ignoring order) is 15.
Sample      x̄
(2, 4)      3
(2, 8)      5
(2, 10)     6
(2, 14)     8
(4, 8)      6
(4, 10)     7
(4, 14)     9
(8, 10)     9
(8, 14)     11
(10, 14)    12
(2, 2)      2
(4, 4)      4
(8, 8)      8
(10, 10)    10
(14, 14)    14

Mean of x̄: μ_x̄ = (1/15)(3 + 5 + ... + 10 + 14) = 7.6

[Figure: histogram of the sample means for n = 2; vertical axis "No of obs", horizontal axis "Upper Boundaries (x <= boundary)", with the expected Normal curve overlaid]
Std. Deviation of x̄: σ_x̄ = σ/√n = 4.77/√2 = 3.376
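
The size-2 example can be checked with a short script. This is only a sketch: it follows the example's convention of listing the 15 unordered samples and weighting them equally, and the variable names are illustrative. It also assumes that the quoted σ = 4.77 uses the n−1 divisor (statistics.stdev), since that is the form that reproduces it; the divide-by-N form (statistics.pstdev) gives about 4.27.

```python
from itertools import combinations_with_replacement
from math import sqrt
from statistics import mean, stdev

population = [2, 4, 8, 10, 14]

# The 15 unordered samples of size 2 drawn with replacement, as in the example.
samples = list(combinations_with_replacement(population, 2))
sample_means = [mean(s) for s in samples]

mu = mean(population)                # population mean
sigma = stdev(population)            # n-1 divisor (assumption: matches the quoted 4.77)
mean_of_means = mean(sample_means)   # average of the 15 sample means
std_error = sigma / sqrt(2)          # sigma / sqrt(n) with n = 2

for s, m in zip(samples, sample_means):
    print(s, m)
print(len(samples), mu, mean_of_means, round(sigma, 2), round(std_error, 3))
```

The mean of the sample means agrees with the population mean, which is the unbiasedness property discussed next.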

We can think of the list of samples (and their x̄
values) as a population of samples, each sample with
a value for the variable of interest!
Some Things To Note About The Behavior Of
Sample Means:
1) x̄ varies from sample to sample (called
   SAMPLING VARIABILITY)
2) the average of the sample means = the average of the
   population sampled
       μ_x̄ = μ
   The sample mean x̄ is said to be UNBIASED
   for the population mean μ.
3) The frequency distribution of the sample means
   does not match the distribution of the original
   population: it is centered in the same place but the
   shape and variability (range) are different.

4) Knowing the frequency distribution for the
   sample means allows us to calculate probabilities
   about the mean.
5) the variability of the sample means < the variability
   of the X-values in the population sampled
       σ_x̄ < σ
6) The frequency distribution of the sample means
   is called the SAMPLING DISTRIBUTION of x̄.
   Its shape and its variability, σ_x̄, depend on the
   sample size.
   Its center, μ_x̄, depends on whether the sampling is
   unbiased or not.
   All three characteristics depend on the sampling
   method (i.e. all can change if the method changes).

Effects Of Sample Size And Sampling Method
Let's take samples of size 3 with replacement. The
total number of possible samples (ignoring order) is 35.

Sample
(2, 4, 8)     (2, 4, 10)    (2, 4, 14)    (2, 8, 10)    (2, 8, 14)
(2, 10, 14)   (4, 8, 10)    (4, 8, 14)    (4, 10, 14)   (8, 10, 14)
(2, 2, 4)     (2, 4, 4)     (2, 2, 8)     (2, 8, 8)     (2, 2, 10)
(2, 10, 10)   (2, 2, 14)    (2, 14, 14)   (4, 4, 8)     (4, 8, 8)
(4, 4, 10)    (4, 10, 10)   (4, 4, 14)    (4, 14, 14)   (8, 8, 10)
(8, 10, 10)   (8, 8, 14)    (8, 14, 14)   (10, 10, 14)  (10, 14, 14)
(2, 2, 2)     (4, 4, 4)     (8, 8, 8)     (10, 10, 10)  (14, 14, 14)

[Figure: Frequency Distribution of Sample Means, n=3; vertical axis "No of obs", horizontal axis "Upper Boundaries (x <= boundary)", with the expected Normal curve overlaid]

Mean of x̄: μ_x̄ = 7.6
Std. Deviation of x̄: σ_x̄ = σ/√n = 4.77/√3 = 2.754

Increasing the sample size made the shape even more
normal and decreased the variability as well.
What is the probability Pr(6.6 < x̄ < 8.6)?
We can get an approximate answer using the fact that
it looks like x̄ is normally distributed with a mean of
7.6 and a standard deviation of 2.75.

Pr(6.6 < x̄ < 8.6)
= Pr((6.6 − 7.6)/2.75 < Z < (8.6 − 7.6)/2.75)
= Pr(−0.36 < Z < +0.36)
= Pr(Z < +0.36) − Pr(Z < −0.36)
= 0.6406 − 0.3594
= 0.2812
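
The same probability can be evaluated without a Z table. A minimal sketch using Python's statistics.NormalDist, taking the mean 7.6 and standard deviation 2.75 from the approximation above (the variable names are illustrative). The result differs from 0.2812 in the third decimal place because z is not rounded to 0.36 first.

```python
from statistics import NormalDist

# Normal approximation to the sampling distribution of the sample mean (n = 3).
x_bar_dist = NormalDist(mu=7.6, sigma=2.75)

# Pr(6.6 < x_bar < 8.6) as a difference of two cumulative probabilities.
prob = x_bar_dist.cdf(8.6) - x_bar_dist.cdf(6.6)
print(round(prob, 4))
```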

SAMPLING DISTRIBUTION of x̄:
Suppose we have a population with a mean μ and a
standard deviation σ and we take a sample of size n.
As long as the sample is random and either we keep
the sample size to less than 5% of the population or
otherwise we sample with replacement, the frequency
distribution of the sample mean has the following
characteristics:
1. μ_x̄ = μ
2. σ_x̄ = σ/√n
3. The shape of the distribution is
   a) a bell-curve (Normal), if the original population
      that we sampled has a bell-curve distribution.
   b) (CENTRAL LIMIT THEOREM) a bell-curve if
      the sample size is relatively large regardless of the
      shape of the frequency distribution of the
      original population.
      "relatively large" = 30 or more
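
The three properties can be illustrated by simulation. The sketch below is not part of the original notes: it assumes a strongly right-skewed exponential population (μ = 1, σ = 1), draws 5000 samples of size n = 30, and compares the center and spread of the simulated sample means with μ and σ/√n. The seed, population and number of repetitions are arbitrary choices.

```python
import random
from math import sqrt
from statistics import mean, stdev

random.seed(0)  # fixed seed so the run is repeatable

n = 30          # "relatively large" sample size
reps = 5000     # number of simulated samples (arbitrary choice)

# Assumed population: exponential with rate 1, so mu = 1 and sigma = 1.
sample_means = [
    sum(random.expovariate(1.0) for _ in range(n)) / n
    for _ in range(reps)
]

center = mean(sample_means)    # should be close to mu = 1
spread = stdev(sample_means)   # should be close to sigma / sqrt(n)

# Share of sample means within one standard deviation of the center;
# a Normal shape would put roughly 68% there.
within_one_sd = sum(abs(m - center) <= spread for m in sample_means) / reps

print(round(center, 3), round(spread, 3), round(1 / sqrt(n), 3),
      round(within_one_sd, 3))
```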

EXAMPLE In a study of the evolutionary history of
the amphipod Gammarus minus, one of the variables
used to distinguish subspecies is the length of the first
antennae. If the population found in caves only recently
separated from the subspecies found in springs, the
length of the antennae should be similar in the two
groups. Spring animals have an average first antennal
length of 2.6 mm and a population standard deviation of
0.7 mm.
What is the probability that your sample of 10 cave
animals would yield a mean length of 3.1 mm or larger if the
two subspecies split off recently?

First we note that the sample size is relatively small
so we need to assume that antennal length is normally
distributed (which seems reasonable). Then the
sampling distribution of x̄ is Normal with mean
μ_x̄ = 2.6 and standard deviation of

σ_x̄ = σ/√n = 0.7/√10 = 0.221.

Then
Pr(x̄ > 3.1) = 1 − Pr(x̄ ≤ 3.1) where

Pr(x̄ ≤ 3.1) = Pr((x̄ − 2.6)/0.221 < (3.1 − 2.6)/0.221)
            = Pr(Z < 2.26) = 0.9881

So, Pr(x̄ > 3.1) = 1 − 0.9881 = 0.0119

Hence, this event is very unlikely if the two subspecies
separated recently. Should your sample actually yield
a mean of 3.1 mm or more, it would be strong evidence
against the hypothesis that they split recently!
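
The calculation for the amphipod example can be reproduced as follows. This is a sketch under the example's own assumptions (Normal antennal lengths, mean 2.6 mm as used in the calculation, σ = 0.7 mm, n = 10); the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu = 2.6      # spring-population mean used in the calculation (mm)
sigma = 0.7   # population standard deviation (mm)
n = 10        # number of cave animals sampled

std_error = sigma / sqrt(n)                      # sigma / sqrt(n)
x_bar_dist = NormalDist(mu=mu, sigma=std_error)  # sampling distribution of the mean

p_upper = 1 - x_bar_dist.cdf(3.1)                # Pr(sample mean > 3.1)
print(round(std_error, 3), round(p_upper, 4))
```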

2) SAMPLING DISTRIBUTION of the Sample Proportion p
If we want to estimate what proportion of the
population (π) are in the category we have defined as
a success, we take a random sample from that
population and calculate the sample proportion in that
category (p).
The shape of the sampling distribution for p depends
very heavily on the sample size n and the population
proportion π.
EXAMPLE Suppose we had repeatedly tossed n=5
dice where π = 0.5 for Pr(1). The frequency
distribution for the sample proportion is:
[Figure: histogram of the sample proportion for n = 5, π = 0.5; vertical axis "No of obs" (0 to 800), horizontal axis "Upper Boundaries (x <= boundary)", with the expected Normal curve overlaid]

The mean of this sampling distribution is 0.5 and the
standard deviation is 0.2236.
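
These two values can be checked exactly rather than by repeated tossing. The sketch below assumes the five tosses are independent, each with success probability 0.5, so the number of successes is binomial; it lists the possible sample proportions k/5 with their probabilities and computes the mean and standard deviation. The names are illustrative.

```python
from math import comb, sqrt

n, pi = 5, 0.5

# Possible sample proportions k/n and their binomial probabilities.
dist = {k / n: comb(n, k) * pi**k * (1 - pi) ** (n - k) for k in range(n + 1)}

mean_p = sum(p * prob for p, prob in dist.items())
sd_p = sqrt(sum((p - mean_p) ** 2 * prob for p, prob in dist.items()))

for p, prob in dist.items():
    print(p, round(prob, 5))
print(round(mean_p, 4), round(sd_p, 4))
```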
Important Points: For any given sample size, the
closer π, the population proportion, is to 1/2,
A) the more symmetric the shape of the frequency
   distribution of the sample proportion p
B) the larger the variability of values of p
Important Points: For any given value of π, the
population proportion, a larger sample size from that
population has
A) a more symmetric shape for the frequency
   distribution of the sample proportion p
B) a smaller variability in the values of p
Let's put what we've learned about sample
proportions into one statement:

SAMPLING DISTRIBUTION of p
Suppose we have a population with a binary variable.
The proportion of successes in the population is π
and we take a random sample of size n.
As long as the sample is random so that each sampled
unit is independent of any other sampled unit, the
frequency distribution of the sample proportion has
the following characteristics:
1. μ_p = π
2. σ_p = √(π(1 − π)/n)
3. (CENTRAL LIMIT THEOREM) The shape of
   the distribution is approximately normal when n is
   large and π is not too close to 0 or 1.
   The further π is from 1/2, the larger n has to be in
   order for the shape to be a bell-curve. A rule-of-thumb
   is that the CLT holds if both
   nπ ≥ 10 and n(1 − π) ≥ 10.
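
To see how quickly the required sample size grows as π moves away from 1/2, the rule-of-thumb can be turned into a small helper. This is a sketch only: the function names and the π values in the loop are illustrative, not from the notes.

```python
from math import ceil, sqrt


def sd_of_proportion(pi, n):
    """Standard deviation of the sample proportion, sqrt(pi * (1 - pi) / n)."""
    return sqrt(pi * (1 - pi) / n)


def min_n_for_normal_approx(pi, threshold=10):
    """Smallest n with n*pi >= threshold and n*(1 - pi) >= threshold."""
    return ceil(threshold / min(pi, 1 - pi))


# Illustrative values of pi, moving away from 1/2.
for pi in (0.5, 0.1, 0.01):
    n_min = min_n_for_normal_approx(pi)
    print(pi, n_min, round(sd_of_proportion(pi, n_min), 4))
```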

EXAMPLE Suppose that the proportion of a
specific form of birth defect was 1 in 1000 live births
around the early 1900s. A researcher claims that
better hygiene and health care have decreased the rate
to something much smaller (say 1 in 10,000 now). To
test this hypothesis the scientist collects birth records
at random for 25,000 children born in 1999. There
were 17 children with the birth defect. What is the
probability of observing so few defects or even fewer
if the 1 in 1000 rate is still true?
If π = 1/1000 is true then the mean proportion of
successes in random samples of 25000 is
μ_p = π = 0.001 and the standard deviation for a sample
proportion is

σ_p = √(π(1 − π)/n) = √(0.001(0.999)/25000) = 0.0002.

A random sample of 25,000 is sufficiently large for normality
but let's check to make sure:
nπ = 25000(0.001) = 25 and of course
n(1 − π) = 25000(0.999) = 24975. Both are bigger than 10
so we can proceed.

Pr(p ≤ 17/25000) = Pr(Z ≤ (0.00068 − 0.001)/0.0002)
                 = Pr(Z ≤ −1.60) = 0.0548


There is evidence to suggest that the rate has gone
down but it isn't very strong.
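
The steps of this example can be retraced in a few lines. This is a sketch using the figures given in the example (π = 0.001, n = 25000, 17 cases); the variable names are illustrative. The result differs slightly from the hand calculation because σ_p is not rounded to 0.0002.

```python
from math import sqrt
from statistics import NormalDist

pi0 = 0.001      # rate if "1 in 1000" is still true
n = 25000        # birth records sampled
defects = 17     # observed cases

p_hat = defects / n                  # sample proportion, 0.00068
sd_p = sqrt(pi0 * (1 - pi0) / n)     # standard deviation of the sample proportion

# Rule-of-thumb check before using the Normal approximation.
normal_ok = n * pi0 >= 10 and n * (1 - pi0) >= 10

z = (p_hat - pi0) / sd_p
prob = NormalDist().cdf(z)           # Pr(Z <= z)
print(normal_ok, round(sd_p, 5), round(z, 2), round(prob, 4))
```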
