
Introduction to Probability and Statistics

Handout #7

Instructor: Lingzhou Xue

TA: Daniel Eck

The pdf file for this class is available on the class web page.

Chapter 8
Fundamental Sampling Distributions and
Data Descriptions

Sample Mean $\bar{X}$ and Sample Variance $S^2$.

Histogram and Box Plot.

Central Limit Theorem (CLT).

$\chi^2$, $t$, and $F$ Distributions.

Example 1: Sample Distribution


The sample distribution is the distribution resulting from the
collection of actual data. A major characteristic of a sample is
that it contains a finite (countable) number of scores; the number of scores is denoted by the letter n. For example, suppose
that the following data were collected:
15 14 15 18 15 20 15 16 17 14 17 13 11 14 18 12 17 12 21 8
14 17 14 12 13 15 15 16 17 14 16 13 14 15 18 16 16 17 14 15
16 15 17 12 14 14 13 13 13 14
These numbers constitute a sample distribution.

[Figure: Histogram (density scale) of the sample distribution.]

In addition to the frequency distribution, the sample distribution can be described with numbers, called statistics. Examples
of statistics are the mean, median, mode, standard deviation,
range, and correlation coefficient, among others.
If a different sample were taken, different scores would result.
However, there would also be some consistency: while the
statistics would not be exactly the same, they would be similar. To bring order to this variability, statisticians have developed
probability models.
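As a sketch, the statistics listed above can be computed for the 50 scores of Example 1 with Python's standard `statistics` module (the variable names are illustrative; the correlation coefficient is left out because it needs paired data):

```python
import statistics

# Example 1 data: the 50 collected scores.
scores = [
    15, 14, 15, 18, 15, 20, 15, 16, 17, 14, 17, 13, 11, 14, 18, 12, 17, 12, 21, 8,
    14, 17, 14, 12, 13, 15, 15, 16, 17, 14, 16, 13, 14, 15, 18, 16, 16, 17, 14, 15,
    16, 15, 17, 12, 14, 14, 13, 13, 13, 14,
]

n = len(scores)                          # number of scores, n
mean = statistics.mean(scores)           # sample mean
median = statistics.median(scores)       # middle value after sorting (average of two when n is even)
mode = statistics.mode(scores)           # most frequent score
sd = statistics.stdev(scores)            # sample standard deviation (divisor n - 1)
value_range = max(scores) - min(scores)  # largest minus smallest score

print(n, mean, median, mode, round(sd, 3), value_range)
```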

[Figure: Histograms (density vs. x) of four different samples.]

Random Sampling

Population
A population consists of the totality of the observations with
which we are concerned.
It is the entire group we are interested in, which we wish to
describe or draw conclusions about.
Sample
A sample is a subset of a population.

2008 Presidential Race from CNN.



Example 2
Suppose you want to find out the percentage of students at UMN
who enjoy reading Time. If we randomly select 20% of the students, this selection is the sample in this experiment,
and the population is all of the students who
attend UMN.

A simple random sample of size $n$ consists of $n$ individuals from
the population chosen in such a way that every set of $n$ individuals
has an equal chance to be the sample actually selected.
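A simple random sample can be drawn in Python with `random.sample`, which selects `n` distinct items without replacement so that every subset of size `n` is equally likely. The population below is a hypothetical list of 1,000 student ID numbers, not actual UMN data:

```python
import random

# Hypothetical population: student ID numbers 1..1000 (illustrative only).
population = list(range(1, 1001))

rng = random.Random(2024)  # seeded so the draw is reproducible

# A 20% sample, as in Example 2. random.sample draws without replacement,
# and every subset of this size has the same chance of being selected.
n = len(population) // 5
sample = rng.sample(population, n)
print(len(sample), sorted(sample)[:5])
```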
Random Sample
Let $X_1, X_2, \dots, X_n$ be $n$ independent random variables, each
having the same probability distribution $f(x)$. Define $X_1, X_2, \dots, X_n$
to be a random sample of size $n$ from the population $f(x)$ and
write its joint probability distribution as
$$g(x_1, x_2, \dots, x_n) = f(x_1)f(x_2)\cdots f(x_n).$$

Some Important Statistics


Statistic
A statistic is a function of random variables that does not depend upon any unknown parameter.
Sample Mean & Sample Variance
If $X_1, X_2, \dots, X_n$ represent a random sample of size $n$, then the
sample mean is defined by the statistic
$$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$$
and the sample variance is defined by the statistic
$$S^2 = \frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{X})^2.$$

Example 3
A comparison of coffee prices at 4 randomly selected grocery
stores in San Diego showed increases from the previous month
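of 12, 15, 17, and 20 cents for a 1-pound bag. Find the variance
of this random sample of price increases.
Solution:
$$\bar{x} = \frac{1}{4}\sum_{i=1}^{4} x_i = 16 \text{ cents},$$
$$s^2 = \frac{1}{4-1}\sum_{i=1}^{4}(x_i - \bar{x})^2 = \frac{(-4)^2 + (-1)^2 + 1^2 + 4^2}{3} = \frac{34}{3}.$$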
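A minimal sketch of the two definitions, using exact `Fraction` arithmetic so the result matches the hand calculation of Example 3 (the function names are illustrative):

```python
from fractions import Fraction

def sample_mean(xs):
    """Sample mean: (1/n) times the sum of the observations."""
    return Fraction(sum(xs), len(xs))

def sample_variance(xs):
    """Sample variance: sum of squared deviations from the mean, divided by n - 1."""
    xbar = sample_mean(xs)
    return sum((x - xbar) ** 2 for x in xs) / (len(xs) - 1)

# Example 3: monthly price increases (cents) at 4 stores.
increases = [12, 15, 17, 20]
print(sample_mean(increases), sample_variance(increases))  # 16 34/3
```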

Theorem
If $S^2$ is the variance of a random sample of size $n$, we may write
$$S^2 = \frac{1}{n(n-1)}\left[n\sum_{i=1}^{n} X_i^2 - \left(\sum_{i=1}^{n} X_i\right)^2\right].$$

Proof:
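One way to fill in the proof (a sketch): expand the square, use $\sum_{i=1}^{n} X_i = n\bar{X}$, and then divide both sides by $n-1$.

```latex
\begin{aligned}
\sum_{i=1}^{n}(X_i - \bar{X})^2
  &= \sum_{i=1}^{n} X_i^2 - 2\bar{X}\sum_{i=1}^{n} X_i + n\bar{X}^2 \\
  &= \sum_{i=1}^{n} X_i^2 - n\bar{X}^2
     \qquad \text{since } \sum_{i=1}^{n} X_i = n\bar{X} \\
  &= \frac{1}{n}\left[n\sum_{i=1}^{n} X_i^2 - \left(\sum_{i=1}^{n} X_i\right)^2\right].
\end{aligned}
```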

Example 4
Find the sample mean and variance of the data 3, 4, 5, 6, 6, and
7, representing the number of trout caught by a random sample
of 6 fishermen on June 19, 1996, at Lake Muskoka.
Solution:
$$\sum_{i=1}^{6} x_i^2 = 171, \qquad \sum_{i=1}^{6} x_i = 31, \qquad n = 6.$$
Hence
$$s^2 = \frac{1}{(6)(5)}\left[(6)(171) - 31^2\right] = \frac{13}{6}.$$
Thus the sample standard deviation is $s = \sqrt{13/6} = 1.47$.
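The computational formula and the definition can be checked against each other on the Example 4 data; a sketch with exact fractions:

```python
import math
from fractions import Fraction

trout = [3, 4, 5, 6, 6, 7]            # Example 4 data
n = len(trout)
sum_x = sum(trout)                    # 31
sum_x2 = sum(x * x for x in trout)    # 171

# Computational (shortcut) formula from the theorem.
s2_shortcut = Fraction(n * sum_x2 - sum_x ** 2, n * (n - 1))

# Definition: squared deviations from the mean, divided by n - 1.
xbar = Fraction(sum_x, n)
s2_definition = sum((x - xbar) ** 2 for x in trout) / (n - 1)

print(s2_shortcut, s2_definition, round(math.sqrt(s2_shortcut), 2))  # 13/6 13/6 1.47
```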

Example 5
The numbers of incorrect answers on a true-false competency
test for a random sample of 15 students were recorded as follows:
2, 1, 3, 0, 1, 3, 6, 0, 3, 4. Find
1. the sample mean;
2. the sample variance.

Mode
The mode of a list of numbers is the value (or values) that
occurs most frequently. A trick to remember this one: mode
starts with the same first two letters as most. Most
frequently: Mode. You'll never forget that one!
Median
The median is the middle value in your list. When the list has an
odd number of entries, the median is the middle entry after
sorting the list into increasing order. When the list has an even
number of entries, the median is the sum of the two middle
numbers (after sorting the list into increasing order) divided by
two. So remember to line up your values: the middle number
is the median! Be sure to remember the odd and even rule.
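The odd/even rule for the median and the "most frequent value" rule for the mode can be sketched as two small functions (hypothetical helper names; Python's `statistics.median` and `statistics.multimode` provide similar behavior):

```python
from collections import Counter

def median(values):
    """Median by the odd/even rule: sort, then take the middle entry
    (odd count) or the average of the two middle entries (even count)."""
    xs = sorted(values)
    n = len(xs)
    mid = n // 2
    if n % 2 == 1:
        return xs[mid]
    return (xs[mid - 1] + xs[mid]) / 2

def modes(values):
    """All values that occur most frequently (a list may have more than one mode)."""
    counts = Counter(values)
    top = max(counts.values())
    return sorted(v for v, c in counts.items() if c == top)

print(median([3, 4, 5, 6, 6, 7]), modes([3, 4, 5, 6, 6, 7]))  # 5.5 [6]
print(median([7, 1, 4]), modes([1, 1, 2, 2, 3]))              # 4 [1, 2]
```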

Data Displays and Graphical Methods


Box Plot
A box plot (also known as a box-and-whisker diagram or plot, or
candlestick chart) is a convenient way of graphically depicting the
five-number summary, which consists of the 25th percentile (the lower
or first quartile, $Q_1$), the median, the 75th percentile (the upper
or third quartile, $Q_3$), and the two adjacent values at the ends of
the whiskers; in addition, the box plot indicates which observations,
if any, are considered unusual, or outliers.
Outlier
Outliers are observations that are considered to be unusually far
from the bulk of the data. Technically, one may view an outlier
as an observation that represents a rare event. If an observation's
distance from the box exceeds 1.5 times the interquartile range,
$Q_3 - Q_1$ (in either direction), the observation may be labeled an
outlier.
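A sketch of the 1.5-times-IQR rule applied to the Example 1 scores. It assumes the quartiles are the medians of the lower and upper halves of the sorted data (the middle value excluded when n is odd); software packages use several other quartile conventions, which can shift $Q_1$ and $Q_3$ slightly. The helper names are hypothetical:

```python
import statistics

def quartiles(values):
    """Q1 and Q3 as medians of the lower and upper halves of the sorted data.
    When n is odd the middle value belongs to neither half."""
    xs = sorted(values)
    half = len(xs) // 2
    lower = xs[:half]
    upper = xs[len(xs) - half:]
    return statistics.median(lower), statistics.median(upper)

def outliers(values, k=1.5):
    """Observations more than k * IQR below Q1 or above Q3."""
    q1, q3 = quartiles(values)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return sorted(x for x in values if x < low or x > high)

# Example 1 scores.
scores = [
    15, 14, 15, 18, 15, 20, 15, 16, 17, 14, 17, 13, 11, 14, 18, 12, 17, 12, 21, 8,
    14, 17, 14, 12, 13, 15, 15, 16, 17, 14, 16, 13, 14, 15, 18, 16, 16, 17, 14, 15,
    16, 15, 17, 12, 14, 14, 13, 13, 13, 14,
]
q1, q3 = quartiles(scores)
print(q1, q3, outliers(scores))  # 14 16 [8, 20, 21]
```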

[Figure: Box plot.]

Example 6
The following numbers are the numbers of marbles owned by fifteen
different boys (arranged from least to greatest):
18 27 34 52 54 59 61 68 78 82 85 87 91 93 100.
1. Find the median.
2. Find the lower quartile.
3. Find the upper quartile.
4. Find the interquartile range.
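Assuming the quartiles are the medians of the lower seven and upper seven values (the middle value excluded), a convention that some software does not follow, the four quantities can be computed as:

```python
import statistics

marbles = [18, 27, 34, 52, 54, 59, 61, 68, 78, 82, 85, 87, 91, 93, 100]

xs = sorted(marbles)
half = len(xs) // 2                   # 7 values in each half (n = 15 is odd)
median = statistics.median(xs)        # 8th value
q1 = statistics.median(xs[:half])     # median of the lower 7 values
q3 = statistics.median(xs[-half:])    # median of the upper 7 values
iqr = q3 - q1                         # interquartile range

print(median, q1, q3, iqr)  # 68 52 87 35
```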

Box-and-Whisker Plot for Example 6.


Sampling Distribution of Means


Sampling Distribution
The probability distribution of a statistic is called a sampling
distribution.


If we are sampling from a population with unknown distribution,
either finite or infinite, the sampling distribution of $\bar{X}$ will
be approximately normal with mean $\mu$ and variance $\sigma^2/n$, provided that the sample size is large ($n > 30$).
Central Limit Theorem
If $\bar{X}$ is the mean of a random sample of size $n$ taken from a
population with mean $\mu$ and finite variance $\sigma^2$, then the limiting
form of the distribution of
$$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}},$$
as $n \to \infty$, is the standard normal distribution $N(0, 1)$.

Example 7
Let $\bar{X}$ be the sample mean of a random sample of size 100 drawn
from an exponential distribution with density
$$f(x) = \frac{1}{4}e^{-x/4}, \quad x > 0.$$
[Figure: Exponential p.d.f. with mean 4.]

Decide which of the graphs labeled (a)-(d) would most closely
resemble the sampling distribution of the sample mean $\bar{X}$.
Explain your reasoning briefly.
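A small simulation makes the reasoning concrete: draw many samples of size 100 from the exponential population with mean 4 and look at the resulting sample means. The number of replications and the seed are arbitrary choices. By the CLT the means should be roughly normal with mean 4 and standard deviation $4/\sqrt{100} = 0.4$, far less skewed than the population itself:

```python
import random
import statistics

rng = random.Random(1)
theta = 4.0   # mean (and standard deviation) of the exponential population
n = 100       # sample size
reps = 2000   # number of simulated samples (arbitrary)

# random.expovariate takes the rate 1/theta, so each draw has mean theta.
xbars = [
    statistics.fmean(rng.expovariate(1 / theta) for _ in range(n))
    for _ in range(reps)
]

# CLT prediction: mean close to theta, standard deviation close to theta / sqrt(n).
print(round(statistics.fmean(xbars), 2), round(statistics.stdev(xbars), 2))
```

A histogram of `xbars` would look like a narrow bell curve centered near 4, which is the feature to look for among the candidate graphs.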

Example 8
An electrical firm manufactures light bulbs that have a length of
life that is approximately normally distributed, with mean equal
to 800 hours and a standard deviation of 40 hours. Find the
probability that a random sample of 16 bulbs will have an average
life of less than 775 hours.
Solution:
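A sketch of the calculation with `statistics.NormalDist` from the standard library: standardize 775 using $\sigma/\sqrt{n} = 40/4 = 10$ and read off the normal CDF.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 800, 40, 16
se = sigma / sqrt(n)        # standard error of the mean: 40 / 4 = 10 hours
z = (775 - mu) / se         # -2.5
p = NormalDist().cdf(z)     # P(Xbar < 775) = P(Z < -2.5)
print(z, round(p, 4))       # -2.5 0.0062
```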

Example 9
The blood cholesterol levels of a population of workers have
mean 202 and standard deviation 14.

1. If a sample of 36 workers is selected, approximate the probability that the sample mean of their blood cholesterol levels
will lie between 198 and 206.

2. Repeat 1 when the sample size is 64.

Solution:
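Both parts use the CLT approximation $\bar{X} \approx N(\mu, \sigma^2/n)$; a sketch with a hypothetical helper `prob_mean_between`:

```python
from math import sqrt
from statistics import NormalDist

def prob_mean_between(a, b, mu, sigma, n):
    """Normal (CLT) approximation to P(a < Xbar < b) for samples of size n."""
    xbar_dist = NormalDist(mu, sigma / sqrt(n))
    return xbar_dist.cdf(b) - xbar_dist.cdf(a)

p36 = prob_mean_between(198, 206, mu=202, sigma=14, n=36)   # part 1
p64 = prob_mean_between(198, 206, mu=202, sigma=14, n=64)   # part 2
print(round(p36, 4), round(p64, 4))
```

The larger sample has the smaller standard error (14/8 versus 14/6), so the probability that the sample mean lands within 4 units of 202 increases.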

Sampling Distribution: Difference Between Two Averages


If independent samples of size $n_1$ and $n_2$ are drawn at random
from two populations, discrete or continuous, with means $\mu_1$
and $\mu_2$ and variances $\sigma_1^2$ and $\sigma_2^2$, respectively, then the sampling
distribution of the difference of means, $\bar{X}_1 - \bar{X}_2$, is approximately
normally distributed with mean and variance given by
$$\mu_{\bar{X}_1 - \bar{X}_2} = \mu_1 - \mu_2 \quad\text{and}\quad \sigma^2_{\bar{X}_1 - \bar{X}_2} = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}.$$
Hence
$$Z = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}}$$
is approximately a standard normal variable.

Example 10
The television picture tubes of manufacturer A have a mean lifetime of 6.5 years and a standard deviation of 0.9 year, while those
of manufacturer B have a mean lifetime of 6.0 years and a standard deviation of 0.8 year. What is the probability that a random
sample of 36 tubes from manufacturer A will have a mean lifetime that is at least 1 year more than the mean lifetime of a
sample of 49 tubes from manufacturer B?
Solution:
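A sketch of the two-sample calculation, using the mean $\mu_A - \mu_B = 0.5$ and the variance $\sigma_A^2/n_A + \sigma_B^2/n_B$ of $\bar{X}_A - \bar{X}_B$:

```python
from math import sqrt
from statistics import NormalDist

mu_a, sd_a, n_a = 6.5, 0.9, 36   # manufacturer A
mu_b, sd_b, n_b = 6.0, 0.8, 49   # manufacturer B

mean_diff = mu_a - mu_b                          # 0.5 year
se_diff = sqrt(sd_a**2 / n_a + sd_b**2 / n_b)    # about 0.189 year
z = (1.0 - mean_diff) / se_diff                  # about 2.65
p = 1 - NormalDist().cdf(z)                      # P(XbarA - XbarB >= 1)
print(round(z, 2), round(p, 4))
```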

Example 11
The mean score for freshmen on an aptitude test at a certain
college is 540, with a standard deviation of 50. What is the
probability that two groups of students selected at random, consisting of 64 and 100 students, respectively, will differ in their
mean scores by

1. more than 10 points?

2. an amount between 5 and 10 points?

Solution:
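Because both groups come from the same population, $\bar{X}_1 - \bar{X}_2$ is approximately normal with mean 0 and variance $50^2/64 + 50^2/100$. This sketch reads "differ by" as the absolute difference $|\bar{X}_1 - \bar{X}_2|$, an interpretation rather than something the problem states explicitly:

```python
from math import sqrt
from statistics import NormalDist

sigma, n1, n2 = 50, 64, 100
# Same population for both groups, so the mean of the difference is 0.
se = sqrt(sigma**2 / n1 + sigma**2 / n2)   # about 8.004 points
diff = NormalDist(0, se)                   # approximate distribution of Xbar1 - Xbar2

# 1. |Xbar1 - Xbar2| > 10: both tails.
p_more_than_10 = 2 * (1 - diff.cdf(10))
# 2. 5 < |Xbar1 - Xbar2| < 10: one band on each side of 0.
p_between_5_and_10 = 2 * (diff.cdf(10) - diff.cdf(5))
print(round(p_more_than_10, 4), round(p_between_5_and_10, 4))
```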

Sampling Distribution of $S^2$
If $S^2$ is the variance of a random sample of size $n$ taken from a
normal population having the variance $\sigma^2$, then the statistic
$$\chi^2 = \frac{(n-1)S^2}{\sigma^2} = \sum_{i=1}^{n}\frac{(X_i - \bar{X})^2}{\sigma^2}$$
has a chi-squared distribution with $\nu = n - 1$ degrees of freedom.
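A quick Monte Carlo check of the theorem (a sketch; $n = 21$, $\sigma^2 = 5$, a population mean of 0, and the seed are arbitrary choices). A chi-squared variable with $\nu$ degrees of freedom has mean $\nu$ and variance $2\nu$, so the simulated values of $(n-1)S^2/\sigma^2$ should average about 20 with variance about 40:

```python
import random
import statistics

rng = random.Random(7)   # fixed seed for reproducibility
n, sigma2 = 21, 5.0      # sample size and population variance (illustrative)
reps = 4000              # number of simulated samples (arbitrary)

chi2_values = []
for _ in range(reps):
    # Normal population with mean 0 and variance sigma2.
    sample = [rng.gauss(0.0, sigma2 ** 0.5) for _ in range(n)]
    s2 = statistics.variance(sample)            # sample variance, divisor n - 1
    chi2_values.append((n - 1) * s2 / sigma2)   # the chi-squared statistic

print(round(statistics.fmean(chi2_values), 2), round(statistics.variance(chi2_values), 2))
```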

Example 12
Find the probability that a random sample of 21 observations,
from a normal population with variance $\sigma^2 = 5$, will have a
variance $s^2$
1. greater than 2.065;
2. between 2.065 and 3.6445.
Solution:
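Since $\nu = 20$ is even, the chi-squared upper tail has the closed form $P(\chi^2_{2k} > x) = \sum_{j=0}^{k-1} e^{-x/2}(x/2)^j/j!$, so the table lookups can be checked with the standard library alone (the helper names are hypothetical):

```python
from math import exp

def chi2_upper_tail_even_df(x, df):
    """P(chi^2_df > x) for even df, via
    P(chi^2_{2k} > x) = sum_{j=0}^{k-1} e^{-x/2} (x/2)^j / j!."""
    if df % 2:
        raise ValueError("this closed form needs an even number of degrees of freedom")
    lam = x / 2
    term = exp(-lam)       # j = 0 term
    total = term
    for j in range(1, df // 2):
        term *= lam / j    # now e^{-x/2} (x/2)^j / j!
        total += term
    return total

def to_chi2(s2, n=21, sigma2=5):
    """chi^2 = (n - 1) s^2 / sigma^2."""
    return (n - 1) * s2 / sigma2

df = 21 - 1
p_greater = chi2_upper_tail_even_df(to_chi2(2.065), df)                # P(s^2 > 2.065)
p_between = p_greater - chi2_upper_tail_even_df(to_chi2(3.6445), df)   # P(2.065 < s^2 < 3.6445)
print(round(to_chi2(2.065), 3), round(to_chi2(3.6445), 3), round(p_greater, 3), round(p_between, 3))
```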

t-Distribution
Let $Z$ be a standard normal random variable and $V$ a chi-squared random variable with $\nu$ degrees of freedom. If $Z$ and $V$
are independent, then the distribution of the random variable
$T$, where
$$T = \frac{Z}{\sqrt{V/\nu}},$$
is given by the density function
$$h(t) = \frac{\Gamma[(\nu+1)/2]}{\Gamma(\nu/2)\sqrt{\pi\nu}}\left(1 + \frac{t^2}{\nu}\right)^{-(\nu+1)/2}, \quad -\infty < t < \infty.$$
This is known as the t-distribution with $\nu$ degrees of freedom.
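The density can be evaluated directly. This sketch uses `math.lgamma` so the gamma ratio does not overflow for large $\nu$, and compares the peak heights with the standard normal peak $1/\sqrt{2\pi}$ (the function names are illustrative):

```python
from math import exp, lgamma, pi, sqrt

def t_pdf(t, nu):
    """Density h(t) of the t-distribution with nu degrees of freedom.
    lgamma is used instead of gamma so large nu does not overflow."""
    log_c = lgamma((nu + 1) / 2) - lgamma(nu / 2)
    return exp(log_c) / sqrt(pi * nu) * (1 + t * t / nu) ** (-(nu + 1) / 2)

def normal_pdf(z):
    """Standard normal density, for comparison."""
    return exp(-z * z / 2) / sqrt(2 * pi)

# Peak heights h(0) for nu = 2, 5, 100: they rise toward the normal peak as nu grows.
for nu in (2, 5, 100):
    print(nu, round(t_pdf(0.0, nu), 4))
print("normal", round(normal_pdf(0.0), 4))
```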

[Figure: The t-distribution curves f(x) for $\nu = 2$, 5, and 100.]

Corollary
Let $X_1, X_2, \dots, X_n$ be independent random variables that are all
normal with mean $\mu$ and standard deviation $\sigma$. Let
$$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i \quad\text{and}\quad S^2 = \frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{X})^2.$$
Then the random variable $T = \dfrac{\bar{X} - \mu}{S/\sqrt{n}}$ has a t-distribution with
$\nu = n - 1$ degrees of freedom.

Example 13
Find $k$ such that $P(k < T < -1.761) = 0.045$ for a random
sample of size 15 selected from a normal distribution with mean $\mu$,
and $T = \dfrac{\bar{X} - \mu}{s/\sqrt{n}}$.
Solution:
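Without tables, the probabilities can be approximated by integrating $h(t)$ numerically. This sketch assumes the upper limit is $-1.761$ (a negative value), so that $P(T < k) = P(T < -1.761) - 0.045$. It uses Simpson's rule on $[0, |x|]$ plus symmetry for the CDF and bisection to solve for $k$; the helper names are hypothetical:

```python
from math import exp, lgamma, pi, sqrt

def t_pdf(t, nu):
    """t density with nu degrees of freedom."""
    c = exp(lgamma((nu + 1) / 2) - lgamma(nu / 2)) / sqrt(pi * nu)
    return c * (1 + t * t / nu) ** (-(nu + 1) / 2)

def t_cdf(x, nu, steps=2000):
    """P(T <= x): Simpson's rule on [0, |x|] (steps must be even), then symmetry about 0."""
    a = abs(x)
    h = a / steps
    area = t_pdf(0.0, nu) + t_pdf(a, nu)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * t_pdf(i * h, nu)
    area *= h / 3
    return 0.5 + area if x >= 0 else 0.5 - area

def t_quantile(p, nu, lo=-50.0, hi=50.0):
    """Solve t_cdf(k, nu) = p for k by bisection."""
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, nu) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

nu = 15 - 1                          # n = 15, so nu = 14
p_upper = t_cdf(-1.761, nu)          # P(T < -1.761), about 0.05
k = t_quantile(p_upper - 0.045, nu)  # P(T < k) must be about 0.005
print(round(p_upper, 3), round(k, 3))
```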

What Is the t-Distribution Used for?


The t-distribution is used extensively in problems that deal with
inference about the population mean or in problems that involve comparative samples (i.e., in cases where one is trying to
determine whether means from two samples are significantly different). The reader should note that the use of the t-distribution
for the statistic
$$T = \frac{\bar{X} - \mu}{S/\sqrt{n}}$$
requires that $X_1, X_2, \dots, X_n$ be normal.

Example 14
Suppose scores on an IQ test are normally distributed, with a
mean of 100. Suppose 25 people are randomly selected and
tested. The standard deviation in the sample group is 25. What
is the probability that the average test score in the sample group
will be at most 110.3?
Solution:
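Here $\sigma$ is unknown and the sample standard deviation is used, so $T = (\bar{X} - 100)/(25/\sqrt{25})$ has a t-distribution with 24 degrees of freedom. A sketch using a numerically integrated t CDF (hypothetical helper names; Simpson's rule plus symmetry):

```python
from math import exp, lgamma, pi, sqrt

def t_pdf(t, nu):
    """t density with nu degrees of freedom."""
    c = exp(lgamma((nu + 1) / 2) - lgamma(nu / 2)) / sqrt(pi * nu)
    return c * (1 + t * t / nu) ** (-(nu + 1) / 2)

def t_cdf(x, nu, steps=2000):
    """P(T <= x): Simpson's rule on [0, |x|] (steps must be even), then symmetry about 0."""
    a = abs(x)
    h = a / steps
    area = t_pdf(0.0, nu) + t_pdf(a, nu)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * t_pdf(i * h, nu)
    area *= h / 3
    return 0.5 + area if x >= 0 else 0.5 - area

n, mu, s = 25, 100, 25
t_value = (110.3 - mu) / (s / sqrt(n))   # 10.3 / 5 = 2.06
p = t_cdf(t_value, n - 1)                # P(Xbar <= 110.3) = P(T <= 2.06), 24 d.f.
print(round(t_value, 2), round(p, 3))
```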

Note for $\chi^2_\alpha(\nu)$
In the textbook, we have $\alpha = P(\chi^2 > \chi^2_\alpha(\nu))$. That is, $\chi^2_\alpha$
represents the $\chi^2$-value above which we find an area equal to $\alpha$.
Note for $t_\alpha(\nu)$
In the textbook, we use $\alpha = P(T > t_\alpha(\nu))$. That is, $t_\alpha$ represents
the t-value above which we find an area equal to $\alpha$.
Note for $f_\alpha(\nu_1, \nu_2)$
In the textbook, we have $\alpha = P(F > f_\alpha(\nu_1, \nu_2))$. That is, $f_\alpha$
represents the f-value above which we find an area equal to $\alpha$.

Example 13b

Consider $T = \dfrac{\bar{X} - \mu}{s/\sqrt{n}}$ for a random sample of size $n = 8$.
1. Calculate $P(T < 2.517)$ and $P(2.998 < T < 3.499)$.
2. Find $k$ such that $P(k < T < 2.517) = 0.975$.
Solution:

F-Distribution
Let $U$ and $V$ be two independent random variables having chi-squared distributions with $\nu_1$ and $\nu_2$ degrees of freedom, respectively. Then the distribution of the random variable $F = \dfrac{U/\nu_1}{V/\nu_2}$ is
given by the density
$$f(x) = \begin{cases}
\dfrac{\Gamma[(\nu_1+\nu_2)/2]\,(\nu_1/\nu_2)^{\nu_1/2}}{\Gamma(\nu_1/2)\,\Gamma(\nu_2/2)} \cdot \dfrac{x^{(\nu_1/2)-1}}{(1+\nu_1 x/\nu_2)^{(\nu_1+\nu_2)/2}}, & x > 0,\\[1ex]
0, & x \le 0.
\end{cases}$$
This is known as the F-distribution with $\nu_1$ and $\nu_2$ degrees of
freedom.

[Figure: The F-distribution curves f(x) for $(\nu_1, \nu_2) = (6, 10)$, $(10, 30)$, and $(100, 100)$.]

Theorem
If $S_1^2$ and $S_2^2$ are the variances of independent random samples
of size $n_1$ and $n_2$ taken from normal populations with variances
$\sigma_1^2$ and $\sigma_2^2$, respectively, then
$$F = \frac{S_1^2/\sigma_1^2}{S_2^2/\sigma_2^2} = \frac{\sigma_2^2 S_1^2}{\sigma_1^2 S_2^2}$$
has an F-distribution with $\nu_1 = n_1 - 1$ and $\nu_2 = n_2 - 1$ degrees
of freedom.

What Is the F-Distribution Used for?

The F-distribution is used in two-sample situations to draw inferences about the population variances. However, the F-distribution is also applied to many other types of problems in which
sample variances are involved. In fact, the F-distribution is also
called the variance ratio distribution.

Example 15
If $S_1^2$ and $S_2^2$ represent the variances of independent random samples of size $n_1 = 31$ and $n_2 = 25$, taken from normal populations
with variances $\sigma_1^2 = 20$ and $\sigma_2^2 = 10$, respectively, find:
1. $P(S_2^2 < 7.526)$.
2. $P(S_1^2/S_2^2 > 3.88)$.

Solution:

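1.
$$\begin{aligned}
P(S_2^2 < 7.526) &= P\left(\frac{24\,S_2^2}{10} < \frac{(24)(7.526)}{10}\right) = P(\chi^2_{24} < 18.0624) \\
&= 1 - P(\chi^2_{24} > 18.062) = 1 - 0.80 = 0.20.
\end{aligned}$$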

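2.
$$\begin{aligned}
P\left(\frac{S_1^2}{S_2^2} > 3.88\right) &= P\left(\frac{S_1^2/\sigma_1^2}{S_2^2/\sigma_2^2} > 3.88\,\frac{\sigma_2^2}{\sigma_1^2}\right) = P\left(\frac{S_1^2/20}{S_2^2/10} > 3.88 \cdot \frac{10}{20}\right) \\
&= P\big(F_{(30,24)} > 1.94\big) = 0.05.
\end{aligned}$$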
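Both parts can be checked numerically with the standard library. Part 1 uses $\chi^2 = (n_2 - 1)S_2^2/\sigma_2^2$ with 24 degrees of freedom (even, so the Poisson-sum closed form applies), and part 2 integrates the F density with $(30, 24)$ degrees of freedom by Simpson's rule. The helper names are hypothetical:

```python
from math import exp, lgamma, log

def chi2_upper_even(x, df):
    """P(chi^2_df > x) for even df: sum_{j<df/2} e^{-x/2} (x/2)^j / j!."""
    lam = x / 2
    term = total = exp(-lam)
    for j in range(1, df // 2):
        term *= lam / j
        total += term
    return total

def f_pdf(x, nu1, nu2):
    """F density with (nu1, nu2) degrees of freedom; assumes nu1 > 2 so it vanishes at 0."""
    if x <= 0:
        return 0.0
    log_c = (lgamma((nu1 + nu2) / 2) - lgamma(nu1 / 2) - lgamma(nu2 / 2)
             + (nu1 / 2) * log(nu1 / nu2))
    return exp(log_c + (nu1 / 2 - 1) * log(x) - (nu1 + nu2) / 2 * log(1 + nu1 * x / nu2))

def f_upper(x, nu1, nu2, steps=2000):
    """P(F > x) = 1 - (Simpson's-rule integral of the density over [0, x])."""
    h = x / steps
    area = f_pdf(0.0, nu1, nu2) + f_pdf(x, nu1, nu2)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * f_pdf(i * h, nu1, nu2)
    return 1 - area * h / 3

n1, n2, var1, var2 = 31, 25, 20, 10

# Part 1: chi^2 = (n2 - 1) S2^2 / sigma2^2 with 24 degrees of freedom.
x1 = (n2 - 1) * 7.526 / var2              # 18.0624
p1 = 1 - chi2_upper_even(x1, n2 - 1)      # P(S2^2 < 7.526)

# Part 2: F = (S1^2/sigma1^2) / (S2^2/sigma2^2) with (30, 24) degrees of freedom.
x2 = 3.88 * var2 / var1                   # 1.94
p2 = f_upper(x2, n1 - 1, n2 - 1)          # P(S1^2/S2^2 > 3.88)
print(round(x1, 4), round(p1, 3), round(x2, 2), round(p2, 3))
```

Up to table rounding, these agree with the tabled quantiles $\chi^2_{0.80}(24) = 18.062$ and $f_{0.05}(30, 24) = 1.94$.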