Ch. 16.

Single-Factor Studies
Ming-Hung (Jason) Kao STP-531. Ch. 16
Overview
An example
ANOVA model, ANOVA table & a partition of the total sum
of squares
F test for equality of factor level means
An Example
Example 1. To determine the best dosage level for a drug to treat
a medical condition
three dosage levels are considered
30 patients are enrolled and are randomly assigned to the
treatment groups; each group has 10 patients
such a design is said to be balanced since each treatment is
replicated the same number of times
Here, we have a single-factor (3 levels) experimental study;
e.g., one possible random assignment of patients #1 to #30 is:
dosage level 1: #16, #23, #18, #6, #8, #2, #4, #7, #9, #25
dosage level 2: #27, #24, #17, #28, #22, #29, #20, #13, #14, #15
dosage level 3: #19, #1, #12, #3, #5, #30, #21, #10, #26, #11
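A random assignment like the one listed above can be sketched in code. This is only a minimal illustration, not the procedure used to produce the slide: the function name `randomize` and the seed value are hypothetical, and a different seed gives a different, equally valid assignment.

```python
import random

def randomize(n_subjects=30, n_groups=3, seed=None):
    """Randomly split subject IDs 1..n_subjects into n_groups equal-sized groups."""
    if n_subjects % n_groups != 0:
        raise ValueError("a balanced design needs n_subjects divisible by n_groups")
    rng = random.Random(seed)
    ids = list(range(1, n_subjects + 1))
    rng.shuffle(ids)  # random permutation of the subject IDs
    size = n_subjects // n_groups
    # Consecutive blocks of the shuffled IDs form the treatment groups.
    return {g + 1: sorted(ids[g * size:(g + 1) * size]) for g in range(n_groups)}

# The seed is arbitrary (hypothetical); it only makes the run reproducible.
assignment = randomize(seed=531)
for level, patients in assignment.items():
    print(f"dosage level {level}:", ", ".join(f"#{p}" for p in patients))
```

Every patient lands in exactly one group and each group has 10 patients, so the resulting design is balanced.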
Building a single-factor ANOVA model
Let $Y_{ij}$ be the response obtained from the $j$th subject of the
$i$th treatment group. We decompose the response into two
parts: 1) the factor level mean & 2) noise
The statistical model can be written as:
$Y_{ij} = \mu_i + \varepsilon_{ij}$; $i = 1, 2, 3$, $j = 1, ..., 10$.
Assumptions of the model:
Corresponding to each factor level (or each $i$), there is a
probability distribution of responses.
Each probability distribution is normal
Each probability distribution has the same variance
Responses are statistically independent
These assumptions can be summarized by:
$Y_{ij} \overset{ind.}{\sim} N(\mu_i, \sigma^2)$
or, equivalently,
$\varepsilon_{ij} \overset{i.i.d.}{\sim} N(0, \sigma^2)$
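To get a feel for these assumptions, one can simulate responses from the model: each response is its factor level mean plus independent normal noise with a common variance. The sketch below is only an illustration. The factor level means (10, 12, 15), the common standard deviation (2), the seed and the function name `simulate` are made-up values, not quantities from the drug example.

```python
import random
import statistics

def simulate(mu, sigma, n_per_group, seed=None):
    """Draw Y_ij = mu_i + eps_ij with eps_ij i.i.d. N(0, sigma^2)."""
    rng = random.Random(seed)
    return {
        i + 1: [m + rng.gauss(0.0, sigma) for _ in range(n_per_group)]
        for i, m in enumerate(mu)
    }

# Hypothetical parameter values: three factor level means, one common sigma.
data = simulate(mu=[10.0, 12.0, 15.0], sigma=2.0, n_per_group=10, seed=16)
for i, ys in data.items():
    print(f"group {i}: sample mean = {statistics.mean(ys):.2f}, "
          f"sample sd = {statistics.stdev(ys):.2f}")
```

The sample means scatter around the chosen factor level means, while the sample standard deviations all estimate the same sigma.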
Building a single-factor ANOVA model
A graphical interpretation (Figure 16.1 on page 680):
This can be viewed as:
the observations in the $i$th treatment group form a random
sample from a population with mean $\mu_i$ and variance $\sigma^2$, and the
probability distribution of each population is normal.
Building a single-factor ANOVA model
In general, a single-factor ANOVA model can be written as:
$Y_{ij} = \mu_i + \varepsilon_{ij}$; $i = 1, ..., r$, $j = 1, ..., n_i$.
$\varepsilon_{ij} \overset{i.i.d.}{\sim} N(0, \sigma^2)$
We have $r$ treatment groups; the $i$th group has $n_i$ subjects
$\mu_1, ..., \mu_r$ are unknown, and are parameters of interest
$\sigma^2$ is also an unknown parameter
we would like to use data to study these unknown parameters
(will learn later!)
Note. This model is called the cell means model in the
textbook.
Building a single-factor ANOVA model
Some statistical properties:
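$E(\varepsilon_{ij}) = 0$ and $var(\varepsilon_{ij}) \equiv \sigma^2\{\varepsilon_{ij}\} = \sigma^2$
$E(Y_{ij}) = E(\mu_i + \varepsilon_{ij}) = \mu_i$
$\sigma^2\{Y_{ij}\} = \sigma^2\{\mu_i + \varepsilon_{ij}\} = \sigma^2$
Note. $\mu_i$ is unknown, but fixed (population mean of the $i$th
treatment group)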
Fitting of ANOVA model
Example (page 685). A food company wished to test four package
designs for a new breakfast cereal. Twenty stores were selected as
experimental units, each was randomly assigned one of the
package designs. Sales, in number of cases, were observed and
recorded below (with one missing value).
Design   Sales (cases) per store   Total   Mean
1        11  17  16  14  15           73   14.6
2        12  10  15  19  11           67   13.4
3        23  20  18  17               78   19.5
4        27  33  22  26  28          136   27.2
Fitting of ANOVA model
What is the factor?
What are the treatments?
How to formulate an ANOVA model?
Our interest lies in the true mean response of each factor
level. How would you use the data to estimate (approximate)
these true means ($\mu_i$ in your model)?
Design   Sales (cases) per store   Total   Mean
1        11  17  16  14  15           73   14.6
2        12  10  15  19  11           67   13.4
3        23  20  18  17               78   19.5
4        27  33  22  26  28          136   27.2
Why is your estimate a good one?
Fitting of ANOVA model
A good estimate? (some statistical properties; optional)
Review. In a regression problem, we would like to fit a
regression line, say $y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$, to the data. To
estimate the unknown parameters, $\beta_0$ and $\beta_1$, we consider the
least squares criterion, which is to find $\hat\beta_0$ and $\hat\beta_1$ so that
$Q = \sum_i (y_i - (\hat\beta_0 + \hat\beta_1 x_i))^2 = \sum_i (y_i - \hat y_i)^2 = \sum_i e_i^2$
(error sum of squares; SSE) is minimized.
Our model here is: $Y_{ij} = \mu_i + \varepsilon_{ij}$ and SSE is
$Q = \sum_i \sum_j (Y_{ij} - \hat\mu_i)^2 = \sum_i \sum_j (Y_{ij} - \hat Y_{ij})^2$.
Fact. $Q$ is minimized when $\hat\mu_i = \bar Y_{i\cdot}$ (mean of the $i$th group);
see Comments on page 688.
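A short sketch of why the group mean minimizes $Q$ (the textbook comments give the full argument; this is only the usual calculus step). Each $\mu_i$ appears only in the terms of its own group, so $Q = \sum_i \sum_j (Y_{ij} - \mu_i)^2$ can be minimized one group at a time by setting the derivative with respect to $\mu_i$ to zero:

```latex
\begin{aligned}
\frac{\partial Q}{\partial \mu_i}
  &= -2 \sum_{j=1}^{n_i} (Y_{ij} - \mu_i) = 0 \\
\Longrightarrow\quad \sum_{j=1}^{n_i} Y_{ij} &= n_i \hat{\mu}_i \\
\Longrightarrow\quad \hat{\mu}_i &= \frac{1}{n_i} \sum_{j=1}^{n_i} Y_{ij} = \bar{Y}_{i\cdot}
\end{aligned}
```

The second derivative is $2 n_i > 0$, so this stationary point is indeed a minimum.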
Fitting of ANOVA model
Now, $\hat\mu_i = \bar Y_{i\cdot}$ (the sample group mean) is the least squares
estimator; its value is a least squares estimate, since it is obtained
using the least squares criterion.
Good properties:
The estimator is unbiased; $E\{\hat\mu_i\} = \mu_i$
The estimator has the smallest variance (the most precise)
among all linear unbiased estimators (Best Linear Unbiased Estimator;
BLUE).
With Normality, these estimators are minimum variance
unbiased estimators
With Normality, the least squares estimators are also the
maximum likelihood estimators
Building a single-factor ANOVA model
In general, for a single-factor ANOVA model: $Y_{ij} = \mu_i + \varepsilon_{ij}$
the factor level (or treatment) mean $\mu_i$ is estimated by the
LSE:
$\hat\mu_i = \bar Y_{i\cdot} = \frac{Y_{i1} + Y_{i2} + \cdots + Y_{in_i}}{n_i}$
the fitted value for $Y_{ij}$ (used to predict the response in the $i$th
group) is:
$\hat Y_{ij} = \hat\mu_i = \bar Y_{i\cdot}$.
The residual is: $e_{ij} = Y_{ij} - \hat Y_{ij}$
the deviation of an observation from its fitted value
an important property: $\sum_j e_{ij} = 0$; $i = 1, ..., r$
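The estimates, fitted values and residuals can be computed directly for the package-design data given earlier in these notes. The following is a minimal sketch in plain Python; the variable names (`sales`, `mu_hat`, `residuals`) are illustrative choices, not part of the course material.

```python
import math

# Cereal sales (cases per store) for the four package designs.
sales = {
    1: [11, 17, 16, 14, 15],
    2: [12, 10, 15, 19, 11],
    3: [23, 20, 18, 17],
    4: [27, 33, 22, 26, 28],
}

# Least squares estimate of each factor level mean: the group sample mean.
mu_hat = {i: sum(ys) / len(ys) for i, ys in sales.items()}

# Every observation in group i has fitted value mu_hat[i];
# the residual is the observation minus that fitted value.
residuals = {i: [y - mu_hat[i] for y in ys] for i, ys in sales.items()}

for i in sales:
    shown = [round(e, 1) for e in residuals[i]]
    print(f"design {i}: mean = {mu_hat[i]:.1f}, residuals = {shown}")
    # Within each group the residuals sum to zero (up to rounding error).
    assert math.isclose(sum(residuals[i]), 0.0, abs_tol=1e-9)
```

The printed means should agree with the Mean column of the data table (14.6, 13.4, 19.5, 27.2).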
ANOVA Table - SS
After obtaining the estimates, the next step is to create an
ANOVA table to make further statistical inferences
We start by partitioning the total variation (total sum of
squares) of the data:
SSTO = SSTR + SSE
$\sum_i \sum_j (Y_{ij} - \bar Y_{\cdot\cdot})^2 = \sum_i \sum_j (\bar Y_{i\cdot} - \bar Y_{\cdot\cdot})^2 + \sum_i \sum_j (Y_{ij} - \bar Y_{i\cdot})^2$
Note. $\bar Y_{\cdot\cdot}$ is the mean of all observations.
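ANOVA Table - SS
An intuitive explanation of the decomposition:
Start with the deviation $(Y_{ij} - \bar Y_{\cdot\cdot})$
the deviation of $Y_{ij}$ around the overall mean $\bar Y_{\cdot\cdot}$ arises because
$Y_{ij}$ is in the $i$th group, whose mean is $(\bar Y_{i\cdot} - \bar Y_{\cdot\cdot})$ away from $\bar Y_{\cdot\cdot}$, and
within this group, $Y_{ij}$ is $(Y_{ij} - \bar Y_{i\cdot})$ away from its group mean
So, $(Y_{ij} - \bar Y_{\cdot\cdot}) = (\bar Y_{i\cdot} - \bar Y_{\cdot\cdot}) + (Y_{ij} - \bar Y_{i\cdot})$
Considering all observations, we square both sides and sum
them up. With some algebra, we have:
$\sum_i \sum_j (Y_{ij} - \bar Y_{\cdot\cdot})^2 = \sum_i \sum_j (\bar Y_{i\cdot} - \bar Y_{\cdot\cdot})^2 + \sum_i \sum_j (Y_{ij} - \bar Y_{i\cdot})^2$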
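The "some algebra" step can be sketched as follows (a standard argument, written out here for convenience rather than quoted from the textbook). Squaring the two-part deviation produces a cross-product term, and that term vanishes because the deviations around each group mean sum to zero within the group:

```latex
\begin{aligned}
\sum_i \sum_j (Y_{ij} - \bar{Y}_{\cdot\cdot})^2
 &= \sum_i \sum_j \left[ (\bar{Y}_{i\cdot} - \bar{Y}_{\cdot\cdot})
    + (Y_{ij} - \bar{Y}_{i\cdot}) \right]^2 \\
 &= \sum_i \sum_j (\bar{Y}_{i\cdot} - \bar{Y}_{\cdot\cdot})^2
  + \sum_i \sum_j (Y_{ij} - \bar{Y}_{i\cdot})^2
  + 2 \sum_i (\bar{Y}_{i\cdot} - \bar{Y}_{\cdot\cdot})
      \underbrace{\sum_j (Y_{ij} - \bar{Y}_{i\cdot})}_{=\,0} \\
 &= \sum_i \sum_j (\bar{Y}_{i\cdot} - \bar{Y}_{\cdot\cdot})^2
  + \sum_i \sum_j (Y_{ij} - \bar{Y}_{i\cdot})^2
\end{aligned}
```

The factor $(\bar Y_{i\cdot} - \bar Y_{\cdot\cdot})$ does not depend on $j$, which is why it can be pulled outside the inner sum.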
ANOVA Table - SS
SSTO $= \sum_i \sum_j (Y_{ij} - \bar Y_{\cdot\cdot})^2$: total variability of $Y_{ij}$;
The ANOVA model tells us that the total variability can be
due to:
treatments (factor levels):
SSTR $= \sum_i \sum_j (\bar Y_{i\cdot} - \bar Y_{\cdot\cdot})^2 = \sum_i n_i (\bar Y_{i\cdot} - \bar Y_{\cdot\cdot})^2$; and
random variation (error):
SSE $= \sum_i \sum_j (Y_{ij} - \bar Y_{i\cdot})^2$
Optional.
a mathematical proof of SSTO = SSTR + SSE can be
found on page 692.
this can be easily proved via matrix algebra (e.g., STP 526)
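The partition can also be checked numerically. The sketch below computes the three sums of squares for the package-design data from these notes and confirms that they add up; the variable names are illustrative, and this is a plain-Python check rather than the SAS program used in the course.

```python
import math

# Cereal sales (cases per store) for the four package designs.
sales = {
    1: [11, 17, 16, 14, 15],
    2: [12, 10, 15, 19, 11],
    3: [23, 20, 18, 17],
    4: [27, 33, 22, 26, 28],
}

all_obs = [y for ys in sales.values() for y in ys]
grand_mean = sum(all_obs) / len(all_obs)                      # overall mean
group_mean = {i: sum(ys) / len(ys) for i, ys in sales.items()}  # group means

# Total, treatment and error sums of squares, following the definitions above.
ssto = sum((y - grand_mean) ** 2 for y in all_obs)
sstr = sum(len(ys) * (group_mean[i] - grand_mean) ** 2 for i, ys in sales.items())
sse = sum((y - group_mean[i]) ** 2 for i, ys in sales.items() for y in ys)

print(f"SSTO = {ssto:.4f}")
print(f"SSTR = {sstr:.4f}")
print(f"SSE  = {sse:.4f}")
assert math.isclose(ssto, sstr + sse)  # SSTO = SSTR + SSE
```

The three values should match the sums of squares in the SAS output for this example (746.42, 588.22 and 158.20, to two decimals).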
ANOVA Table - d.f.
In addition to the sums of squares, another component of an ANOVA
table is the degrees of freedom (d.f.):
SSTO has $(n_T - 1)$ d.f. associated with it; here,
$n_T = \sum_{i=1}^{r} n_i$ is the total sample size.
Note. $\sum_i \sum_j (Y_{ij} - \bar Y_{\cdot\cdot}) = 0$
SSE has $(n_T - r)$ d.f. associated with it
Note. $\sum_{j=1}^{n_i} (Y_{ij} - \bar Y_{i\cdot}) = 0$; $i = 1, ..., r$
Another way: there are $r$ parameters to be estimated
SSTR has $(r - 1)$ d.f. associated with it
Note. $\sum_{i=1}^{r} n_i (\bar Y_{i\cdot} - \bar Y_{\cdot\cdot}) = 0$
decomposing the d.f.:
$(n_T - 1) = (r - 1) + (n_T - r)$
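ANOVA Table - MS
Also, mean square = sum of squares / (degrees of freedom):
$MSTR = \frac{SSTR}{r - 1}$; $MSE = \frac{SSE}{n_T - r}$
Now we have an ANOVA table:
Source       SS     d.f.       MS
Treatments   SSTR   r - 1      MSTR
Error        SSE    n_T - r    MSE
Total        SSTO   n_T - 1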
ANOVA Table - back to example
Example. (p. 685)
Consider again the Food Company problem:
Factor: package design (4 levels)
Experimental units: 20 stores, with approximately equal sales
volumes
Design: completely randomized design (CRD)
Data:
Design 1: 11 17 16 14 15
Design 2: 12 10 15 19 11
Design 3: 23 20 18 17
Design 4: 27 33 22 26 28
the SAS program used to obtain the ANOVA table:
Tab16 1.sas
ANOVA Table - Example
SAS output:
                              Sum of
Source             DF        Squares    Mean Square   F Value   Pr > F
Model               3    588.2210526    196.0736842     18.59   <.0001
Error              15    158.2000000     10.5466667
Corrected Total    18    746.4210526

Level of
design     N          Mean       Std Dev
1          5    14.6000000    2.30217289
2          5    13.4000000    3.64691651
3          4    19.5000000    2.64575131
4          5    27.2000000    3.96232255
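For readers without SAS, the degrees of freedom, mean squares and F value above can be reproduced with a few lines of Python. This is a hedged sketch, not a translation of the course's SAS program: the function name `one_way_anova` and its return layout are hypothetical, and the p-value column is left out because it needs the F distribution (available, for example, as `scipy.stats.f.sf` if SciPy is installed).

```python
import statistics

def one_way_anova(groups):
    """Compute one-way ANOVA table entries for {level: [observations]}."""
    r = len(groups)                                   # number of factor levels
    n_total = sum(len(ys) for ys in groups.values())  # total sample size n_T
    grand_mean = sum(sum(ys) for ys in groups.values()) / n_total
    means = {i: sum(ys) / len(ys) for i, ys in groups.items()}

    sstr = sum(len(ys) * (means[i] - grand_mean) ** 2 for i, ys in groups.items())
    sse = sum((y - means[i]) ** 2 for i, ys in groups.items() for y in ys)

    df_tr, df_err = r - 1, n_total - r
    mstr, mse = sstr / df_tr, sse / df_err
    return {
        "df": (df_tr, df_err, n_total - 1),
        "SS": (sstr, sse, sstr + sse),  # total obtained via SSTO = SSTR + SSE
        "MS": (mstr, mse),
        "F": mstr / mse,
    }

sales = {
    1: [11, 17, 16, 14, 15],
    2: [12, 10, 15, 19, 11],
    3: [23, 20, 18, 17],
    4: [27, 33, 22, 26, 28],
}

table = one_way_anova(sales)
print(f"{'Source':<16}{'DF':>4}{'SS':>14}{'MS':>14}{'F':>8}")
print(f"{'Model':<16}{table['df'][0]:>4}{table['SS'][0]:>14.4f}"
      f"{table['MS'][0]:>14.4f}{table['F']:>8.2f}")
print(f"{'Error':<16}{table['df'][1]:>4}{table['SS'][1]:>14.4f}"
      f"{table['MS'][1]:>14.4f}")
print(f"{'Corrected Total':<16}{table['df'][2]:>4}{table['SS'][2]:>14.4f}")

# Per-level summary, mirroring the second block of the SAS output.
for level, ys in sales.items():
    print(f"design {level}: N = {len(ys)}, mean = {statistics.mean(ys):.4f}, "
          f"sd = {statistics.stdev(ys):.4f}")
```

The design has unequal group sizes (one store is missing for design 3), which the formulas handle through the weights $n_i$ in SSTR and the error degrees of freedom $n_T - r = 15$.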