Experimental Design and

Analysis of Variance
Chapter 11
McGraw-Hill/Irwin Copyright 2011 by The McGraw-Hill Companies, Inc. All rights reserved.
Experimental Design and
Analysis of Variance
11.1 Basic Concepts of Experimental Design
11.2 One-Way Analysis of Variance
11.1 Basic Concepts of
Experimental Design
So far, we have considered only one way of
collecting and comparing data:
using independent random samples
Often, data are collected as the result of an
experiment
An experiment is used to systematically study how one or more factors
(variables) influence the variable that is being
studied
Experimental Design
In an experiment, there is strict control over the factors
(independent variables) contributing to the experiment
The values or levels of the factors are called treatments
The objective is to compare and estimate the effects of
different treatments on the response variable
The different treatments are assigned to objects (the
test subjects) called experimental units
When a treatment is applied to more than one
experimental unit, the treatment is being replicated
Experimental Design
A designed experiment is an experiment where
the analyst controls which treatments are used
and how they are applied to the experimental
units
Example: An oil company wishes to study how
three different gasoline types (A, B, and C) affect
the mileage of a midsized car.
Response Variable: Mileage
Treatments: Gasoline Type (A, B, and C)
Experimental Units: Midsized Cars
Experimental Design
In a completely randomized experimental design,
independent random samples are assigned to each
of the treatments
For example, suppose three experimental units are to be
assigned to each of five treatments
For completely randomized experimental design, randomly
pick three experimental units for one treatment, randomly
pick three different experimental units from those
remaining for the next treatment, and so on
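The random assignment described above can be sketched in a few lines of Python. This is only an illustration: the function name, the unit labels (unit_1 ... unit_15), the treatment labels (T1 ... T5), and the seed are hypothetical and do not come from the slides. Shuffling the units once and cutting the shuffled list into consecutive groups is equivalent to repeatedly picking units at random from those remaining.

```python
import random


def completely_randomized_assignment(units, treatments, per_treatment, seed=None):
    """Randomly assign `per_treatment` distinct units to each treatment."""
    if len(units) < len(treatments) * per_treatment:
        raise ValueError("not enough experimental units for this design")
    rng = random.Random(seed)
    shuffled = list(units)
    rng.shuffle(shuffled)  # random permutation of the experimental units
    # Consecutive slices of a random permutation are random picks
    # without replacement from the units that remain.
    return {
        treatment: shuffled[k * per_treatment:(k + 1) * per_treatment]
        for k, treatment in enumerate(treatments)
    }


# Hypothetical labels: 15 units, 5 treatments, 3 units per treatment.
units = [f"unit_{k}" for k in range(1, 16)]
treatments = ["T1", "T2", "T3", "T4", "T5"]
assignment = completely_randomized_assignment(units, treatments, per_treatment=3, seed=11)

for treatment, group in assignment.items():
    print(treatment, group)
```

Fixing the seed makes the assignment reproducible; leaving it as None gives a fresh random assignment on each run.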
Experimental Design
Once the experimental units are assigned
and the experiment is performed, a value of
the response variable is observed for each
experimental unit
Obtain a sample of values for the response
variable for each treatment
Example: Battery Testing
Suppose you wish to determine which of
three brands of AA battery (Energizer,
Eveready, and Tiger) lasts the longest when
used in a remote-controlled car. You have 30
cars, so you assign 10 to each battery brand.
Determine the following:
Response Variable
Treatment
Experimental Unit
Gasoline Mileage Case
North American Oil Company is attempting to
develop a reasonably priced gasoline that will
deliver improved gasoline mileages. As part
of its development process, the company
would like to compare the effects of three
types of gasoline (A, B and C) on gasoline
mileage. To test the three types of gasoline,
the company assigned 5 cars to each type of
gasoline and measured their mileages.
11.2 One-Way Analysis of
Variance
The objective is to estimate and compare the effects of the
different treatments on the response variable.
Given p treatments on a response variable, we try to
estimate the differences between the means $\mu_i$ of the
treatments.
ANOVA
Want to study the effects of all p treatments on a
response variable
For each treatment, find the mean and standard deviation
of all possible values of the response variable when using
that treatment
For treatment i, find the treatment mean $\mu_i$
One-way analysis of variance (one-way ANOVA) estimates and
compares the effects of the different treatments on
the response variable
It does so by estimating and comparing the treatment means
$\mu_1, \mu_2, \ldots, \mu_p$
ANOVA Notation
p is the total number of treatments
i is the label of a treatment (ex: A, B, C)
$n_i$ denotes the size of the sample randomly selected
for treatment i
$x_{ij}$ is the jth value of the response variable using
treatment i
$\bar{x}_i$ is the average of the sample of $n_i$ values for
treatment i
$\bar{x}_i$ is the point estimate of the treatment mean $\mu_i$
$s_i$ is the standard deviation of the sample of $n_i$
values for treatment i
$s_i$ is the point estimate for the treatment (population)
standard deviation $\sigma_i$
Gasoline Mileage Case
p = 3; i = A, B, C
$n_A = n_B = n_C = 5$
Mileage $x_{ij}$ (mpg) for car j using gasoline type i:
j    Type A   Type B   Type C
1    34.0     35.3     33.3
2    35.0     36.5     34.0
3    34.3     36.4     34.7
4    35.5     37.0     33.0
5    35.8     37.6     34.9
Gasoline Mileage Case
The mean of a sample is the point
estimate for the corresponding
treatment mean
$\bar{x}_A$ = 34.92 mpg estimates $\mu_A$
$\bar{x}_B$ = 36.56 mpg estimates $\mu_B$
$\bar{x}_C$ = 33.98 mpg estimates $\mu_C$
Gasoline Mileage Case
The standard deviation of a sample is the
point estimate for the corresponding
treatment standard deviation
$s_A$ = 0.7662 mpg estimates $\sigma_A$
$s_B$ = 0.8503 mpg estimates $\sigma_B$
$s_C$ = 0.8349 mpg estimates $\sigma_C$
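As a quick check, the sample means and sample standard deviations can be recomputed from the 15 mileages listed for the Gasoline Mileage Case. The sketch below uses Python's standard statistics module; the variable names are illustrative only. Note that statistics.stdev divides by $n_i - 1$, which is the sample standard deviation used here.

```python
from statistics import mean, stdev

# Mileages (mpg) from the Gasoline Mileage Case, five cars per gasoline type.
mileage = {
    "A": [34.0, 35.0, 34.3, 35.5, 35.8],
    "B": [35.3, 36.5, 36.4, 37.0, 37.6],
    "C": [33.3, 34.0, 34.7, 33.0, 34.9],
}

# Point estimates of the treatment means and standard deviations.
sample_mean = {g: mean(values) for g, values in mileage.items()}
sample_sd = {g: stdev(values) for g, values in mileage.items()}  # divides by n_i - 1

for g in mileage:
    print(f"Type {g}: mean = {sample_mean[g]:.2f} mpg, s = {sample_sd[g]:.4f} mpg")
```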
One-Way ANOVA
Assumptions
1. Completely randomized experimental design
Assume that a sample has been selected
randomly for each of the p treatments on the
response variable using a completely randomized
experimental design
2. Constant variance
The p populations of values of the response
variable (associated with the p treatments) all
have the same variance
3. Normality
The p populations of values of the response
variable all have normal distributions
4. Independence
The samples of experimental units are randomly
selected, independent samples
One-Way ANOVA
Assumptions
To make sure that unequal variances will not
be a problem:
Take the same sample size per treatment
Check the different sample standard deviations
General Rule: The one-way ANOVA results will
be approximately correct if the largest sample
standard deviation is no more than twice the
smallest sample standard deviation.
Gasoline Mileage Case
The sample standard deviations are close to one another:
$s_A$ = 0.7662 mpg, $s_B$ = 0.8503 mpg, $s_C$ = 0.8349 mpg
The largest is about 1.11 times the smallest (0.8503 / 0.7662),
well under the factor of two in the general rule, and the
sample sizes are equal
Testing for Significant Differences
Between Treatment Means
Are there any statistically significant differences
between the sample (treatment) means?
The null hypothesis is that the means of all p
treatments are the same
$H_0: \mu_1 = \mu_2 = \cdots = \mu_p$
The alternative is that some (or all, but at least two)
of the p treatments have different effects on the
mean response
$H_a$: at least two of $\mu_1, \mu_2, \ldots, \mu_p$ differ
Testing for Significant Differences
Between Treatment Means
Compare the between-treatment variability
to the within-treatment variability
Between-treatment variability is the variability of
the sample means from sample to sample
Ex: Variability between $\bar{x}_A$, $\bar{x}_B$, $\bar{x}_C$
Within-treatment variability is the variability of the
individual values within each sample
Ex: Variability between $\bar{x}_A$ and $x_{A1}, x_{A2}, \ldots, x_{A5}$
Comparing Between-Treatment
Variability and Within-Treatment
Variability
Partitioning the Total Variability
in the Response
Total Variability = Between-Treatment Variability + Within-Treatment Variability
Total Sum of Squares = Treatment Sum of Squares + Error Sum of Squares
SSTO = SST + SSE
$\sum_{i=1}^{p} \sum_{j=1}^{n_i} (x_{ij} - \bar{x})^2 = \sum_{i=1}^{p} n_i (\bar{x}_i - \bar{x})^2 + \sum_{i=1}^{p} \sum_{j=1}^{n_i} (x_{ij} - \bar{x}_i)^2$
where $\bar{x}$ is the overall mean of all n sample values
Mean Squares
The treatment mean square is
$MST = \frac{SST}{p - 1}$
The error mean square is
$MSE = \frac{SSE}{n - p}$
where n is the total number of observations in all p samples
Gasoline Mileage Case
$SST = \sum_{i=1}^{p} n_i (\bar{x}_i - \bar{x})^2 = n_A (\bar{x}_A - \bar{x})^2 + n_B (\bar{x}_B - \bar{x})^2 + n_C (\bar{x}_C - \bar{x})^2$
$= 5(34.92 - 35.153)^2 + 5(36.56 - 35.153)^2 + 5(33.98 - 35.153)^2$
$= 17.0493$
$SSE = \sum_{i=A,B,C} \sum_{j=1}^{n_i} (x_{ij} - \bar{x}_i)^2 = \sum_{j=1}^{n_A} (x_{Aj} - \bar{x}_A)^2 + \sum_{j=1}^{n_B} (x_{Bj} - \bar{x}_B)^2 + \sum_{j=1}^{n_C} (x_{Cj} - \bar{x}_C)^2$
$= 8.028$
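The two sums of squares, and the partition SSTO = SST + SSE, can be verified directly from the mileage data. The following is a minimal sketch in plain Python; the helper name group_mean and the variable names are illustrative only.

```python
# Mileages (mpg) from the Gasoline Mileage Case.
mileage = {
    "A": [34.0, 35.0, 34.3, 35.5, 35.8],
    "B": [35.3, 36.5, 36.4, 37.0, 37.6],
    "C": [33.3, 34.0, 34.7, 33.0, 34.9],
}


def group_mean(values):
    """Sample mean of one treatment's values."""
    return sum(values) / len(values)


all_values = [x for values in mileage.values() for x in values]
n = len(all_values)
grand_mean = sum(all_values) / n  # overall mean, about 35.153

# Treatment sum of squares: variability of the sample means around the overall mean.
sst = sum(len(v) * (group_mean(v) - grand_mean) ** 2 for v in mileage.values())

# Error sum of squares: variability of the values around their own sample mean.
sse = sum((x - group_mean(v)) ** 2 for v in mileage.values() for x in v)

# Total sum of squares: variability of all values around the overall mean.
ssto = sum((x - grand_mean) ** 2 for x in all_values)

print(f"SST = {sst:.4f}, SSE = {sse:.4f}, SSTO = {ssto:.4f}")
```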
F Test for Difference Between
Treatment Means
Suppose that we want to compare p
treatment means
The null hypothesis is that all treatment
means are the same:
$H_0: \mu_1 = \mu_2 = \cdots = \mu_p$
The alternative hypothesis is that they are not
all the same:
$H_a$: at least two of $\mu_1, \mu_2, \ldots, \mu_p$ differ
F Test for Difference Between
Treatment Means
Define the F statistic:
$F = \frac{MST}{MSE} = \frac{SST/(p - 1)}{SSE/(n - p)}$
The p-value is the area under the F curve to
the right of F, where the F curve has p − 1
numerator and n − p denominator degrees of
freedom
F Test for Difference Between
Treatment Means
Reject $H_0$ in favor of $H_a$ at the α level of
significance if
$F > F_\alpha$, or if
p-value < α
$F_\alpha$ is based on p − 1
numerator and n − p
denominator degrees
of freedom
Gasoline Mileage Case
Computing the F statistic:
$F = \frac{MST}{MSE} = \frac{SST/(p - 1)}{SSE/(n - p)} = \frac{17.0493/(3 - 1)}{8.028/(15 - 3)} = 12.74$
To test $H_0$ at α = 0.05, we use $F_{0.05}$ with
Numerator: p − 1 = 3 − 1 = 2
Denominator: n − p = 15 − 3 = 12
$F_{0.05}$ = 3.89
Since F = 12.74 > $F_{0.05}$ = 3.89, we reject $H_0$
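The F test for the Gasoline Mileage Case can be reproduced with a short Python sketch. The sums of squares and the critical value 3.89 are taken from the slides. The p-value line is an addition that is not in the slides: it uses a closed form for the right-tail area of the F distribution that holds only when the numerator degrees of freedom equal 2, as they do here. For other designs a statistics library would be needed.

```python
# Values taken from the slides for the Gasoline Mileage Case.
sst = 17.0493  # treatment sum of squares
sse = 8.028    # error sum of squares
p = 3          # number of treatments
n = 15         # total number of observations

df_num = p - 1  # numerator degrees of freedom
df_den = n - p  # denominator degrees of freedom

mst = sst / df_num  # treatment mean square
mse = sse / df_den  # error mean square
f_stat = mst / mse

f_crit = 3.89  # F_0.05 with 2 and 12 degrees of freedom, as quoted on the slide
reject_h0 = f_stat > f_crit

# Assumption: closed-form right-tail area, valid ONLY when df_num == 2:
#   P(F > f) = (1 + 2 f / df_den) ** (-df_den / 2)
assert df_num == 2, "closed-form p-value below requires 2 numerator df"
p_value = (1 + 2 * f_stat / df_den) ** (-df_den / 2)

print(f"MST = {mst:.4f}, MSE = {mse:.4f}, F = {f_stat:.4f}")
print(f"p-value = {p_value:.4f}, reject H0 at alpha = 0.05: {reject_h0}")
```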
Excel Output: ANOVA Test
Anova: Single Factor
SUMMARY
Groups    Count   Sum     Average   Variance
Type A    5       174.6   34.92     0.587
Type B    5       182.8   36.56     0.723
Type C    5       169.9   33.98     0.697
ANOVA
Source of Variation   SS         df   MS       F         P-value   F crit
Between Groups        17.0493    2    8.5246   12.7424   0.0011    3.8853
Within Groups         8.028      12   0.669
Total                 25.07733   14
F Test for Difference Between
Treatment Means
From the F test, we can conclude that at
least two of the treatment means differ. But
how do we know which ones differ?
We compare two means at a time. (Pairwise
Comparison)
Pairwise Comparisons,
Individual Intervals
Tukey simultaneous 100(1 − α)%
confidence interval for $\mu_i - \mu_h$:
$(\bar{x}_i - \bar{x}_h) \pm q_\alpha \sqrt{\frac{MSE}{m}}$
$q_\alpha$ is the upper α percentage point of
the studentized range for p and (n − p)
from Table A.9
m denotes the common sample size
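As an illustration, the Tukey intervals for the Gasoline Mileage Case can be computed as follows. The sample means, SSE, and common sample size come from the slides. The studentized range point $q_{0.05}$ = 3.77 for p = 3 and n − p = 12 is an assumption taken from a standard studentized range table; it is not quoted in the slides and should be checked against Table A.9.

```python
from itertools import combinations
from math import sqrt

# Sample means (mpg) and error mean square from the Gasoline Mileage Case.
means = {"A": 34.92, "B": 36.56, "C": 33.98}
mse = 8.028 / 12  # SSE / (n - p)
m = 5             # common sample size

# Assumption: q_0.05 for p = 3 treatments and n - p = 12 degrees of freedom,
# read from a standard studentized range table (not given in the slides).
q_alpha = 3.77

half_width = q_alpha * sqrt(mse / m)

# Simultaneous 95% intervals for mu_i - mu_h over all pairs of treatments.
intervals = {}
for i, h in combinations(means, 2):
    diff = means[i] - means[h]
    intervals[(i, h)] = (diff - half_width, diff + half_width)

for (i, h), (low, high) in intervals.items():
    verdict = "differ" if low > 0 or high < 0 else "no significant difference"
    print(f"mu_{i} - mu_{h}: [{low:.3f}, {high:.3f}] -> {verdict}")
```

An interval that excludes zero indicates that the two treatment means differ at the chosen simultaneous confidence level.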

Pairwise Comparisons,
Individual Intervals
If the sample sizes of the two treatments
are unequal:
$(\bar{x}_i - \bar{x}_h) \pm \frac{q_\alpha}{\sqrt{2}} \sqrt{MSE \left( \frac{1}{n_i} + \frac{1}{n_h} \right)}$
Confidence Intervals for
Treatment Means
A point estimate of the treatment mean is the
sample mean of a treatment
We can also make a confidence interval for
each treatment mean with a confidence level of
100(1 − α)%:
$\bar{x}_i \pm t_{\alpha/2} \sqrt{\frac{MSE}{n_i}}$
where $t_{\alpha/2}$ is based on n − p degrees of freedom
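A short sketch of these intervals for the Gasoline Mileage Case follows. The sample means, sample sizes, and SSE come from the slides. The t point $t_{0.025}$ = 2.179 for 12 degrees of freedom is an assumption taken from a standard t table; it is not quoted in the slides.

```python
from math import sqrt

# Sample means (mpg), sample sizes, and error mean square from the slides.
means = {"A": 34.92, "B": 36.56, "C": 33.98}
sizes = {"A": 5, "B": 5, "C": 5}
mse = 8.028 / 12  # SSE / (n - p)

# Assumption: t_0.025 with n - p = 12 degrees of freedom from a standard
# t table (not given in the slides); this gives 95% individual intervals.
t_crit = 2.179

confidence_intervals = {}
for g in means:
    half_width = t_crit * sqrt(mse / sizes[g])
    confidence_intervals[g] = (means[g] - half_width, means[g] + half_width)

for g, (low, high) in confidence_intervals.items():
    print(f"95% CI for mu_{g}: [{low:.3f}, {high:.3f}] mpg")
```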
Hypothesis Testing Between
Treatment Means
$H_0: \mu_i - \mu_h = 0$
$H_a: \mu_i - \mu_h \neq 0$
This test tells us whether the two treatment
means are equal or different.
$t = \frac{\bar{x}_i - \bar{x}_h}{\sqrt{MSE \left( \frac{1}{n_i} + \frac{1}{n_h} \right)}}$
Hypothesis Testing Between
Treatment Means
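Critical Value = $q_\alpha / \sqrt{2}$, where $q_\alpha$ is the studentized
range point for r = p and v = n − p
Rejection Rule: If the absolute value of the test statistic
is greater than the critical value, reject $H_0$.
If we reject $H_0$, this means that the two
treatment means are not equal.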
Hypothesis Testing Between
Treatment Means
Tukey simultaneous comparison t-values (d.f. = 12)
                 Type C   Type A   Type B
                 33.98    34.92    36.56
Type C   33.98
Type A   34.92   1.82
Type B   36.56   4.99     3.17
critical values for experimentwise error rate:
0.05   2.67
0.01   3.56
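The t-values in this table can be reproduced from the pairwise test statistic, using the sample means and MSE from the slides. The sketch below is illustrative only; it compares each absolute t-value with the 0.05 critical value of 2.67 quoted above.

```python
from itertools import combinations
from math import sqrt

# Sample means (mpg), sample sizes, and error mean square from the slides.
means = {"C": 33.98, "A": 34.92, "B": 36.56}
sizes = {"C": 5, "A": 5, "B": 5}
mse = 8.028 / 12  # SSE / (n - p)

critical_05 = 2.67  # critical value for experimentwise error rate 0.05 (from the table)

t_values = {}
for i, h in combinations(means, 2):
    standard_error = sqrt(mse * (1 / sizes[i] + 1 / sizes[h]))
    # Absolute value, because the alternative hypothesis is two-sided.
    t_values[(i, h)] = abs(means[i] - means[h]) / standard_error

for (i, h), t in t_values.items():
    decision = "reject H0" if t > critical_05 else "do not reject H0"
    print(f"Type {i} vs Type {h}: t = {t:.2f} -> {decision}")
```

With these values, Type B differs from both Type A and Type C at the 0.05 experimentwise level, while Type A and Type C do not differ significantly.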
Hypothesis Testing Between
Treatment Means
p-values for pairwise t-tests
                 Type C   Type A   Type B
                 33.98    34.92    36.56
Type C   33.98
Type A   34.92   .0942
Type B   36.56   .0003    .0081
