
Fifth Week

Sampling Distributions

Adapted from:
Probability & Statistics for Engineers & Scientists, 8th Ed.
Walpole/Myers/Myers/Ye (c)2007
Introduction to Business Statistics, 5e
Kvanli/Guynes/Pavur (c)2000
South-Western College Publishing
Statistics for Managers
Using Microsoft Excel 4th Edition

Statistics for Managers Using Microsoft Excel, 4e 2004 Prentice-Hall, Inc. Chap 6-1
Populations & Samples

A population is the set (possibly infinite) of all possible observations.


Each observation is a random variable X having some (often unknown)
probability distribution, f(x).
If the population distribution is known, it might be referred to, for
example, as a normal population, etc.
A sample is a subset of a population.
Our goal is to make inferences about the population based on an
analysis of the sample.
A biased sample, usually obtained by taking convenient, rather than
representative observations, will consistently over- or under-estimate
some characteristic of the population.
Observations in a random sample are made independently and at
random. Here, random variables X1, X2, ..., Xn in the sample all have
the same distribution as the population, X.
Inferential Statistics

Population and Sample
A statistic is a summary of the sample.
A parameter is a summary of the population.
Inference: drawing conclusions about the population through the
sample.
Parameter & Statistic

Parameter (population): average height of ASDI training participants
over the last 3 years.
Statistic (sample): average height of ASDI training participants
this year.
Sampling Distributions

A sampling distribution is a
distribution of all of the possible
values of a statistic for a given size
sample selected from a population

Sample Statistics

Any function of the random variables X1, X2, ..., Xn
making up a random sample is called a statistic.
The most important statistics, as we have seen, are the
sample mean, sample variance and sample standard
deviation:

$$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$$

$$S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1} = \frac{n \sum_{i=1}^{n} X_i^2 - \left( \sum_{i=1}^{n} X_i \right)^2}{n(n - 1)}$$
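As a quick numerical check, the two algebraically equivalent forms of
the sample variance can be compared on a small data set. This is only a
sketch: the function names and the data values are made up for
illustration.

```python
import math
import statistics

def sample_mean(xs):
    """Sample mean: sum of the observations divided by n."""
    return sum(xs) / len(xs)

def sample_variance_definition(xs):
    """S^2 from the definition: squared deviations summed, over n - 1."""
    n = len(xs)
    xbar = sample_mean(xs)
    return sum((x - xbar) ** 2 for x in xs) / (n - 1)

def sample_variance_shortcut(xs):
    """S^2 from the form [n*sum(x^2) - (sum x)^2] / [n(n - 1)]."""
    n = len(xs)
    return (n * sum(x * x for x in xs) - sum(xs) ** 2) / (n * (n - 1))

data = [18, 20, 22, 24]  # hypothetical sample, for illustration only
xbar = sample_mean(data)
s2_def = sample_variance_definition(data)
s2_short = sample_variance_shortcut(data)
s = math.sqrt(s2_def)  # sample standard deviation

print(f"Xbar = {xbar}, S^2 = {s2_def:.4f} (shortcut: {s2_short:.4f}), S = {s:.4f}")
```

Both forms agree with `statistics.variance`, which also divides by
n - 1.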
Sampling Distributions

Starting with an unknown population distribution, we
can study the sampling distribution, or distribution of a
sample statistic (like Xbar or S) calculated from a sample
of size n from that population.
The sample consists of independent and identically distributed
observations X1, X2, ..., Xn from the population.
Based on the sampling distributions of Xbar and S for samples of
size n, we will make inferences about the population mean and
variance, $\mu$ and $\sigma^2$.
We could approximate the sampling distribution of Xbar by taking
a large number of random samples of size n and plotting the
distribution of the Xbar values.
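The idea of approximating the sampling distribution of Xbar by repeated
sampling can be sketched in a few lines. The population here (uniform
on 0 to 10), the sample size, and the helper name
`simulate_sample_means` are assumptions chosen only for the demo.

```python
import random
import statistics

def simulate_sample_means(draw, n, num_samples, rng):
    """Draw num_samples random samples of size n and return their means."""
    return [
        statistics.fmean(draw(rng) for _ in range(n))
        for _ in range(num_samples)
    ]

rng = random.Random(0)  # fixed seed so the run is repeatable

def draw(r):
    """One observation from the assumed population: uniform on [0, 10]."""
    return r.uniform(0, 10)

means = simulate_sample_means(draw, n=5, num_samples=10_000, rng=rng)

# Crude text histogram of the Xbar values, one row per unit-wide bin
bins = [0] * 10
for m in means:
    bins[min(int(m), 9)] += 1
for lo, count in enumerate(bins):
    print(f"{lo:2d}-{lo + 1:2d} {'#' * (count // 100)}")
```

The histogram piles up around the population mean of 5 even though the
population itself is flat.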
Developing a Sampling Distribution

Assume there is a population.
Population size N = 4 (individuals A, B, C, D)
Random variable, X, is the age of individuals
Values of X: 18, 20, 22, 24 (years)

Developing a Sampling Distribution (continued)

Summary Measures for the Population Distribution:

$$\mu = \frac{\sum X_i}{N} = \frac{18 + 20 + 22 + 24}{4} = 21$$

$$\sigma = \sqrt{\frac{\sum (X_i - \mu)^2}{N}} = 2.236$$

[Figure: Uniform Distribution. P(x) against x = 18, 20, 22, 24 for
individuals A, B, C, D.]

Developing a Sampling Distribution (continued)

Now consider all possible samples of size n = 2

16 possible samples (sampling with replacement):

1st            2nd Observation
Obs     18      20      22      24
18    18,18   18,20   18,22   18,24
20    20,18   20,20   20,22   20,24
22    22,18   22,20   22,22   22,24
24    24,18   24,20   24,22   24,24

16 Sample Means:

1st      2nd Observation
Obs    18   20   22   24
18     18   19   20   21
20     19   20   21   22
22     20   21   22   23
24     21   22   23   24

Developing a Sampling Distribution (continued)

Sampling Distribution of All Sample Means

16 Sample Means:

1st      2nd Observation
Obs    18   20   22   24
18     18   19   20   21
20     19   20   21   22
22     20   21   22   23
24     21   22   23   24

[Figure: Sample Means Distribution. P(Xbar) against Xbar = 18, 19, 20,
21, 22, 23, 24 (no longer uniform).]
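The distribution of the 16 sample means can be tabulated directly by
enumerating every ordered sample of size 2 drawn with replacement. The
sketch below uses exact fractions; the variable names are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

ages = [18, 20, 22, 24]  # population values from the example
n = 2                    # sample size

# All ordered samples of size n, drawn with replacement (4**2 = 16)
samples = list(product(ages, repeat=n))
means = [Fraction(sum(s), n) for s in samples]

# Each sample is equally likely, so P(mean) = count / number of samples
counts = Counter(means)
distribution = {
    m: Fraction(c, len(samples)) for m, c in sorted(counts.items())
}

for m, p in distribution.items():
    print(f"Xbar = {m}: P = {p}")
```

The probabilities rise from 1/16 at Xbar = 18 to 1/4 at Xbar = 21 and
fall back symmetrically, which is why the distribution is no longer
uniform.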
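Developing a Sampling Distribution (continued)

Summary Measures of this Sampling Distribution
(here N = 16 is the number of possible samples):

$$\mu_{\bar{X}} = \frac{\sum \bar{X}_i}{N} = \frac{18 + 19 + 20 + \cdots + 24}{16} = 21$$

$$\sigma_{\bar{X}} = \sqrt{\frac{\sum (\bar{X}_i - \mu_{\bar{X}})^2}{N}} = \sqrt{\frac{(18 - 21)^2 + (19 - 21)^2 + \cdots + (24 - 21)^2}{16}} = 1.58$$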
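These summary measures can be reproduced by brute force, and the result
compared with the population standard deviation divided by the square
root of n. A minimal sketch, with illustrative variable names:

```python
import math
from itertools import product
from statistics import fmean, pstdev

ages = [18, 20, 22, 24]  # population values from the example
n = 2                    # sample size

mu = fmean(ages)         # population mean
sigma = pstdev(ages)     # population standard deviation (divides by N)

# Means of all 16 equally likely samples drawn with replacement
sample_means = [fmean(s) for s in product(ages, repeat=n)]
mu_xbar = fmean(sample_means)
sigma_xbar = pstdev(sample_means)

print(f"mu = {mu}, sigma = {sigma:.3f}")
print(f"mu_xbar = {mu_xbar}, sigma_xbar = {sigma_xbar:.3f}")
print(f"sigma / sqrt(n) = {sigma / math.sqrt(n):.3f}")
```

The last two printed values coincide, so for this population the spread
of the sample means equals the population spread divided by the square
root of the sample size.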

Comparing the Population with its Sampling Distribution

Population: N = 4, $\mu = 21$, $\sigma = 2.236$
Sample Means Distribution: n = 2, $\mu_{\bar{X}} = 21$, $\sigma_{\bar{X}} = 1.58$

[Figure: two bar charts side by side. Left: P(X) against X = 18, 20,
22, 24 (A, B, C, D). Right: P(Xbar) against Xbar = 18, 19, ..., 24.]
Sampling Distribution Summary

Normal distribution: Sampling distribution of Xbar when $\sigma$ is known,
for any population distribution (approximately, for large n).
Also the sampling distribution for the difference of the means of
two different samples.
t-distribution: Sampling distribution of Xbar when $\sigma$ is unknown and
S is used. Population must be normal.
Also the sampling distribution for the difference of the means of
two different samples when $\sigma$ is unknown.
Chi-square ($\chi^2$) distribution: Sampling distribution of $S^2$
(through the quantity $(n-1)S^2/\sigma^2$). Population must be normal.
F-distribution: The distribution of the ratio of two independent
$\chi^2$ random variables, each divided by its degrees of freedom.
Sampling distribution of the ratio of the variances of two
different samples. Population must be normal.
Standard Error of the Mean

Different samples of the same size from the same
population will yield different sample means.
A measure of the variability from sample to sample is
given by the Standard Error of the Mean:

$$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$$

Note that the standard error of the mean decreases as
the sample size increases.
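To see how quickly the standard error shrinks, it can be tabulated for
a few sample sizes. The value of sigma below is an assumption picked
for illustration, and `standard_error` is a hypothetical helper name.

```python
import math

def standard_error(sigma, n):
    """Standard error of the mean, sigma / sqrt(n)."""
    return sigma / math.sqrt(n)

sigma = 2.0  # assumed population standard deviation, illustration only
for n in (4, 16, 25, 100, 400):
    print(f"n = {n:3d}  standard error = {standard_error(sigma, n):.3f}")
```

Because of the square root, quadrupling the sample size only halves the
standard error.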

If the Population is Normal

If a population is normal with mean $\mu$ and
standard deviation $\sigma$, the sampling distribution
of Xbar is also normally distributed with

$$\mu_{\bar{X}} = \mu \quad \text{and} \quad \sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$$

(This assumes that sampling is with replacement or
sampling is without replacement from an infinite population)

Sampling Distribution Properties

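For sampling with replacement:
As n increases, $\sigma_{\bar{x}}$ decreases.

[Figure: two sampling distributions centered at $\mu_{\bar{x}}$, one
labeled "Larger sample size" and one labeled "Smaller sample size".]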
Central Limit Theorem
The central limit theorem is the most important theorem
in statistics. It states that
If Xbar is the mean of a random sample of size n from a
population with an arbitrary distribution with mean $\mu$ and
finite variance $\sigma^2$, then as $n \to \infty$, the sampling
distribution of Xbar approaches a normal distribution with mean $\mu$
and standard deviation $\sigma/\sqrt{n}$.
As a rule of thumb, the normal approximation is adequate under the
following conditions:
For any population distribution if $n \ge 30$.
For n < 30, if the population distribution is generally shaped like
a normal distribution.
For any value of n if the population distribution is normal.
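A small simulation gives a feel for the theorem. The sketch assumes an
exponential population with rate 1 (so its mean and standard deviation
are both 1), which is strongly skewed; the sample size and the number
of repetitions are arbitrary choices.

```python
import math
import random
import statistics

rng = random.Random(42)  # fixed seed for repeatability
n = 30                   # sample size
num_samples = 5_000      # number of simulated samples

# Assumed population: exponential with rate 1, so mu = 1 and sigma = 1.
# It is strongly right-skewed, i.e. clearly not normal.
means = [
    statistics.fmean(rng.expovariate(1.0) for _ in range(n))
    for _ in range(num_samples)
]

mean_of_means = statistics.fmean(means)
sd_of_means = statistics.stdev(means)
se = 1.0 / math.sqrt(n)  # sigma / sqrt(n) for this population

# For a normal distribution about 68% of values lie within one
# standard deviation of the mean.
within_one_se = sum(abs(m - 1.0) <= se for m in means) / num_samples

print(f"mean of sample means: {mean_of_means:.3f} (mu = 1)")
print(f"sd of sample means:   {sd_of_means:.3f} (sigma/sqrt(n) = {se:.3f})")
print(f"share within one standard error: {within_one_se:.3f}")
```

The sample means center on the population mean with spread close to
sigma over root n, and roughly two thirds of them fall within one
standard error, as a normal curve would predict.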
If the Population is not Normal

We can apply the Central Limit Theorem:
Even if the population is not normal,
sample means from the population will be
approximately normal as long as the sample size is
large enough,
and the sampling distribution will have

$$\mu_{\bar{x}} = \mu \quad \text{and} \quad \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$$
Central Limit Theorem

As the sample size gets large enough, the sampling distribution
becomes almost normal regardless of the shape of the population.

[Figure: sampling distribution of the sample mean as n increases.]
If the Population is not Normal
(continued)

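Sampling distribution properties:
Central Tendency: $\mu_{\bar{x}} = \mu$
Variation: $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$ (Sampling with replacement)

[Figure: Population Distribution, centered at $\mu$.]
[Figure: Sampling Distribution (becomes normal as n increases),
centered at $\mu_{\bar{x}}$, with curves for a smaller and a larger
sample size.]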
How Large is Large Enough?

For most distributions, n > 30 will give a
sampling distribution that is nearly normal
For fairly symmetric distributions, n > 15
For normal population distributions, the
sampling distribution of the mean is always
normally distributed

Example

Suppose a population has mean $\mu = 8$ and
standard deviation $\sigma = 2$. Suppose a random
sample of size n = 25 is selected.

What is the probability that the sample mean is
between 7.8 and 8.2?

Example: $\mu = 8$, $\sigma = 2$, n = 25

$$P(7.8 < \bar{X} < 8.2) = P\left( \frac{7.8 - 8}{2/\sqrt{25}} < \frac{\bar{X} - \mu_{\bar{X}}}{\sigma_{\bar{X}}} < \frac{8.2 - 8}{2/\sqrt{25}} \right) = P(-.5 < Z < .5) = .3830$$

[Figure: Sampling Distribution of Xbar with $\mu_{\bar{X}} = 8$ and
$\sigma_{\bar{X}} = 2/\sqrt{25} = .4$, shaded between 7.8 and 8.2.
Standardized Normal Distribution with $\mu_Z = 0$ and $\sigma_Z = 1$,
shaded between -0.5 and 0.5, with area .1915 on each side of 0.]
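The same probability can be computed without a table. The sketch below
uses `statistics.NormalDist` from the Python standard library
(Python 3.8 or later) and evaluates it two ways: directly on the
sampling distribution of Xbar, and after standardizing to Z.

```python
import math
from statistics import NormalDist

mu, sigma, n = 8, 2, 25
standard_error = sigma / math.sqrt(n)  # 2 / 5 = 0.4

# Directly on the sampling distribution of Xbar
xbar_dist = NormalDist(mu=mu, sigma=standard_error)
p_direct = xbar_dist.cdf(8.2) - xbar_dist.cdf(7.8)

# After standardizing both limits to Z
z_lo = (7.8 - mu) / standard_error
z_hi = (8.2 - mu) / standard_error
p_standardized = NormalDist().cdf(z_hi) - NormalDist().cdf(z_lo)

print(f"z limits: {z_lo:.2f}, {z_hi:.2f}")
print(f"P(7.8 < Xbar < 8.2) = {p_direct:.4f}")
```

The computed value agrees with the table-based .3830 to about three
decimal places; the small difference comes from rounding in the table.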
Inferences About the
Population Mean

We often want to test hypotheses about the population mean (hypothesis
testing will be formalized later).
Example:
Suppose a manufacturing process is designed to produce parts with
$\mu = 6$ cm in diameter, and suppose $\sigma$ is known to be .15 cm. If
a random sample of 80 parts has xbar = 6.046 cm, what is the
probability (P-value) that a value this far from the mean could occur
by chance if $\mu$ is truly 6 cm?

$$z = \frac{6.046 - 6.00}{.15/\sqrt{80}} = 2.74$$

$$P[\,|\bar{X} - 6.0| \ge .046\,] = P[\,|Z| \ge 2.74\,] = 2P[Z \ge 2.74] = 2(1 - .9969) = .0062$$
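For comparison, the two-sided P-value in this example can be computed
numerically. A minimal sketch using `statistics.NormalDist`; the
variable names are illustrative.

```python
import math
from statistics import NormalDist

mu0, sigma, n = 6.0, 0.15, 80  # hypothesized mean, known sigma, sample size
xbar = 6.046                   # observed sample mean

# Standardize the observed mean using the standard error sigma / sqrt(n)
z = (xbar - mu0) / (sigma / math.sqrt(n))

# Two-sided P-value: probability of a deviation at least this large
# in either direction
p_two_sided = 2 * (1 - NormalDist().cdf(abs(z)))

print(f"z = {z:.2f}, two-sided P-value = {p_two_sided:.4f}")
```

The result is approximately 0.006, in line with the table calculation;
the table value differs slightly because z is rounded to 2.74 before
the lookup.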


Difference Between Two Means

In addition, we can make inferences about the
difference between two population means based on the
difference between two sample means.
The central limit theorem also holds in this case.
If independent samples of size n1 and n2 are drawn at
random from two populations, discrete or continuous,
with means $\mu_1$ and $\mu_2$, and variances $\sigma_1^2$ and
$\sigma_2^2$, then as n1 and n2 get large, the sampling distribution of
Xbar1 - Xbar2 approaches a normal distribution with

$$\mu_{\bar{X}_1 - \bar{X}_2} = \mu_1 - \mu_2 \quad \text{and} \quad \sigma^2_{\bar{X}_1 - \bar{X}_2} = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}.$$
Difference of Two Means
Example

Suppose we record the drying time in hours of 20 samples each
of two types of paint, type A and type B. Suppose we know that
the population standard deviations are both equal to 1/2 hour.
Assuming that the population means are equal, what is the
probability that the difference in the sample means is greater
than 1/2 hour?

$$\mu_{\bar{X}_A - \bar{X}_B} = \mu_A - \mu_B = 0$$

$$\sigma^2_{\bar{X}_A - \bar{X}_B} = \frac{\sigma_A^2}{n_A} + \frac{\sigma_B^2}{n_B} = \frac{.25}{20} + \frac{.25}{20} = .025$$

$$z = \frac{(\bar{x}_A - \bar{x}_B) - (\mu_A - \mu_B)}{\sqrt{\sigma_A^2/n_A + \sigma_B^2/n_B}} = \frac{.5 - 0}{\sqrt{.025}} = 3.16$$
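The calculation can be carried one step further to a tail probability.
The sketch below assumes the question is one-sided, that is, the
probability that Xbar_A - Xbar_B exceeds 1/2 hour; that reading is an
assumption, as are the variable names.

```python
import math
from statistics import NormalDist

sigma_a = sigma_b = 0.5  # population standard deviations (hours)
n_a = n_b = 20           # sample sizes
mu_diff = 0.0            # mu_A - mu_B under the equal-means assumption
observed_diff = 0.5      # threshold for the difference in sample means

# Variance of Xbar_A - Xbar_B for independent samples
var_diff = sigma_a**2 / n_a + sigma_b**2 / n_b

z = (observed_diff - mu_diff) / math.sqrt(var_diff)

# Upper-tail probability (one-sided reading of the question, assumed)
p_upper = 1 - NormalDist().cdf(z)

print(f"variance = {var_diff:.3f}, z = {z:.2f}, P(Z > z) = {p_upper:.4f}")
```

The tail probability is below 0.001, so a difference of half an hour
would be very unlikely if the two population means were really equal.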
t-Distribution (when $\sigma$ is Unknown)

The problem with the central limit theorem is that it assumes that
$\sigma$ is known.
Generally, if $\mu$ is being estimated from the sample, $\sigma$ must be
estimated from the sample as well.
The t-distribution can be used if $\sigma$ is unknown, but it requires
that the original population be normally distributed.
Let X1, X2, ..., Xn be independent, normally distributed random variables
with mean $\mu$ and standard deviation $\sigma$. Then the random variable
T below has a t-distribution with $\nu = n - 1$ degrees of freedom:

$$T = \frac{\bar{X} - \mu}{S/\sqrt{n}}$$

The t-distribution is like the normal, but with greater spread since both
Xbar and S have fluctuations due to sampling.
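The greater spread of T can be seen in a simulation. This sketch
assumes a standard normal population and n = 10 (so 9 degrees of
freedom); both choices, and the helper name `t_statistic`, are only for
illustration.

```python
import math
import random
import statistics

rng = random.Random(1)  # fixed seed for repeatability
mu, sigma = 0.0, 1.0    # assumed normal population
n = 10                  # sample size, giving n - 1 = 9 degrees of freedom
num_samples = 20_000

def t_statistic(sample, mu):
    """T = (Xbar - mu) / (S / sqrt(n)), with S the sample std deviation."""
    xbar = statistics.fmean(sample)
    s = statistics.stdev(sample)
    return (xbar - mu) / (s / math.sqrt(len(sample)))

t_values = [
    t_statistic([rng.gauss(mu, sigma) for _ in range(n)], mu)
    for _ in range(num_samples)
]

var_t = statistics.pvariance(t_values)
# Share of |T| beyond 1.96; for a standard normal Z this would be 5%
tail_share = sum(abs(t) > 1.96 for t in t_values) / num_samples

print(f"variance of T:       {var_t:.3f} (standard normal: 1)")
print(f"share of |T| > 1.96: {tail_share:.3f} (standard normal: 0.05)")
```

The simulated variance comes out above 1 (the theoretical value for
9 degrees of freedom is 9/7), and noticeably more than 5% of the T
values fall beyond 1.96, which is why normal critical values are too
narrow for small samples.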
Using the t-Distribution

Observations on the t-Distribution:
The t-statistic is like the normal, but using S rather than $\sigma$.
Table value depends on the sample size (degrees of freedom).
The t-distribution is symmetric with $\mu = 0$, but $\sigma^2 > 1$. As
would be expected, $\sigma^2$ is largest for small n.
Approaches the normal distribution ($\sigma^2 = 1$) as n gets large.
For a given probability $\alpha$, the table shows the value of t that
has probability $\alpha$ to the right of it.
The t-distribution can also be used for hypotheses concerning the
difference of two means where $\sigma_1$ and $\sigma_2$ are unknown, as
long as the two populations are normally distributed.
Usually if $n \ge 30$, S is a good enough estimator of $\sigma$, and the
normal distribution is typically used instead.
