Ch16 1

Ch. 16.
Single-Factor Studies
Ming-Hung (Jason) Kao STP-531. Ch. 16
Overview
An example
ANOVA model, ANOVA table & a partition of the total sum
of squares
F test for equality of factor level means
An Example
Example 1. To determine the best dosage level for a drug to treat
a medical condition:
three dosage levels are considered
30 patients are enrolled and are randomly assigned to the
treatment groups; each group has 10 patients
such a design is said to be balanced since each treatment is
replicated the same number of times
Here, we have a single-factor (3 levels) experimental study;
e.g., one possible random assignment of patients is:
dosage level 1 #16, #23, #18, #6, #8, #2, #4, #7, #9, #25
dosage level 2 #27, #24, #17, #28, #22, #29, #20, #13, #14, #15
dosage level 3 #19, #1, #12, #3, #5, #30, #21, #10, #26, #11
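A random assignment like the one listed above can be sketched in Python. This is a minimal illustration, not the procedure used for the example: the function name `assign_balanced`, the seed value, and the use of the standard `random` module are all assumptions made here.

```python
import random

def assign_balanced(n_subjects, n_groups, seed=None):
    """Randomly split subject IDs 1..n_subjects into equal-sized groups."""
    if n_subjects % n_groups != 0:
        raise ValueError("a balanced design needs n_subjects divisible by n_groups")
    rng = random.Random(seed)
    ids = list(range(1, n_subjects + 1))
    rng.shuffle(ids)  # random permutation of the subject IDs
    size = n_subjects // n_groups
    # consecutive blocks of the shuffled list form the treatment groups
    return {g + 1: ids[g * size:(g + 1) * size] for g in range(n_groups)}

groups = assign_balanced(30, 3, seed=531)  # the seed is arbitrary (hypothetical)
for level, members in groups.items():
    print(f"dosage level {level}:", ", ".join(f"#{m}" for m in members))
```

Each patient appears in exactly one group, and every group has 10 patients, so the design is balanced.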
Building a single-factor ANOVA model
Let $Y_{ij}$ be the response obtained from the $j$th subject of the
$i$th treatment group. We decompose the response into two
parts: 1) the factor level mean & 2) noise
The statistical model can be written as:
$Y_{ij} = \mu_i + \varepsilon_{ij}$; $i = 1, 2, 3$, $j = 1, \ldots, 10$.
Assumptions of the model:
Corresponding to each factor level (or each i ), there is a
probability distribution of responses.
Each probability distribution is normal
Each probability distribution has the same variance
Responses are statistically independent
These assumptions can be summarized by:
$Y_{ij} \overset{\text{ind.}}{\sim} N(\mu_i, \sigma^2)$
or, equivalently,
$\varepsilon_{ij} \overset{\text{i.i.d.}}{\sim} N(0, \sigma^2)$
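To make the model concrete, here is a small simulation sketch in Python. The factor level means (`mu`) and the common standard deviation (`sigma`) are made-up values chosen only for illustration; they do not come from the drug example, and the function name `simulate_cell_means` is hypothetical.

```python
import random

def simulate_cell_means(mu, sigma, n_per_group, seed=None):
    """Draw Y_ij = mu_i + eps_ij with eps_ij i.i.d. N(0, sigma^2)."""
    rng = random.Random(seed)
    return [[m + rng.gauss(0.0, sigma) for _ in range(n_per_group)] for m in mu]

mu = [10.0, 12.0, 15.0]   # hypothetical factor level means
sigma = 2.0               # hypothetical common standard deviation
y = simulate_cell_means(mu, sigma, n_per_group=10, seed=1)

for i, group in enumerate(y, start=1):
    mean_i = sum(group) / len(group)
    print(f"group {i}: sample mean = {mean_i:.2f} (true mean {mu[i - 1]})")
```

Each simulated group is a random sample from a normal population with its own mean and the same variance, which is exactly what the assumptions state.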
A graphical interpretation (Figure 16.1 on page 680):
This can be viewed as:
the observations in the $i$th treatment group form a random
sample from a population with mean $\mu_i$ and variance $\sigma^2$, and the
probability distribution of each population is Normal.
In general, a single-factor ANOVA model can be written as:
$Y_{ij} = \mu_i + \varepsilon_{ij}$; $i = 1, \ldots, r$, $j = 1, \ldots, n_i$.
$\varepsilon_{ij} \overset{\text{i.i.d.}}{\sim} N(0, \sigma^2)$
We have $r$ treatment groups; the $i$th group has $n_i$ subjects
$\mu_1, \ldots, \mu_r$ are unknown, and are parameters of interest
$\sigma^2$ is also an unknown parameter
we would like to use data to study these unknown parameters
(will learn later!)
Note. This model is called the cell means model in the
textbook.
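Some statistical properties:
$E(\varepsilon_{ij}) = 0$ and $\operatorname{var}(\varepsilon_{ij}) = \sigma^2\{\varepsilon_{ij}\} = \sigma^2$
$E(Y_{ij}) = E(\mu_i + \varepsilon_{ij}) = \mu_i$
$\sigma^2\{Y_{ij}\} = \sigma^2\{\mu_i + \varepsilon_{ij}\} = \sigma^2$
Note. $\mu_i$ is unknown, but fixed (population mean of the $i$th
treatment group)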
Fitting of ANOVA model
Example (page 685). A food company wished to test four package
designs for a new breakfast cereal. Twenty stores were selected as
experimental units, and each was randomly assigned one of the
package designs. Sales, in number of cases, were observed and
recorded below (with one missing value).
Design     Sales (cases)     Total   Mean
Design 1   11 17 16 14 15    73      14.6
Design 2   12 10 15 19 11    67      13.4
Design 3   23 20 18 17       78      19.5
Design 4   27 33 22 26 28    136     27.2
What is the factor?
What are the treatments?
How to formulate an ANOVA model?
Our interest lies in the true mean response of each factor
level. How would you use the data to estimate (approximate)
these true means ($\mu_i$ in your model)?
Why is your estimate a good one?
A good estimate? (some statistical properties; optional)
Review. In a regression problem, we would like to fit a
regression line, say $y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$, to the data. To
estimate the unknown parameters, $\beta_0$ and $\beta_1$, we consider the
least squares criterion, which is to find $\hat{\beta}_0$ and $\hat{\beta}_1$ so that
$Q = \sum_i (y_i - (\hat{\beta}_0 + \hat{\beta}_1 x_i))^2 = \sum_i (y_i - \hat{y}_i)^2 = \sum_i e_i^2$
(error sum of squares; SSE) is minimized.
Our model here is: $Y_{ij} = \mu_i + \varepsilon_{ij}$ and SSE is
$Q = \sum_i \sum_j (Y_{ij} - \hat{\mu}_i)^2 = \sum_i \sum_j (Y_{ij} - \hat{Y}_{ij})^2$.
Fact. $Q$ is minimized when $\hat{\mu}_i = \bar{Y}_{i\cdot}$ (mean of the $i$th group);
see Comments on page 688.
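The fact can be checked with a short calculus argument. The sketch below is the standard derivation written in the notation of this section; it is not quoted from the textbook's Comments.

```latex
\begin{aligned}
Q &= \sum_{i=1}^{r}\sum_{j=1}^{n_i} (Y_{ij} - \mu_i)^2, \\
\frac{\partial Q}{\partial \mu_i} &= -2\sum_{j=1}^{n_i} (Y_{ij} - \mu_i) = 0
\;\Longrightarrow\;
\hat{\mu}_i = \frac{1}{n_i}\sum_{j=1}^{n_i} Y_{ij} = \bar{Y}_{i\cdot}.
\end{aligned}
```

Since $\partial^2 Q / \partial \mu_i^2 = 2 n_i > 0$, this stationary point is a minimum.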
Now, $\hat{\mu}_i = \bar{Y}_{i\cdot}$ (the sample group mean) is the least squares
estimator; its value is a least squares estimate, since it is obtained
using the least squares criterion.
Good properties:
The estimator is unbiased; $E\{\hat{\mu}_i\} = \mu_i$
The estimator has the smallest variance (the most precise)
among all linear unbiased estimators (Best Linear Unbiased Estimator;
BLUE).
With Normality, these estimators are minimum variance
unbiased estimators
With Normality, the least squares estimators are also the
maximum likelihood estimators
In general, for a single-factor ANOVA model: $Y_{ij} = \mu_i + \varepsilon_{ij}$
the factor level (or treatment) mean $\mu_i$ is estimated by the
LSE:
$\hat{\mu}_i = \bar{Y}_{i\cdot} = \dfrac{Y_{i1} + Y_{i2} + \cdots + Y_{in_i}}{n_i}$
the fitted value for $Y_{ij}$ (used to predict the response in the $i$th
group) is:
$\hat{Y}_{ij} = \hat{\mu}_i = \bar{Y}_{i\cdot}$.
The residual is: $e_{ij} = Y_{ij} - \hat{Y}_{ij}$
the deviation of an observation from its fitted value
an important property: $\sum_j e_{ij} = 0$
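As an illustration, the estimates, fitted values, and residuals can be computed for the package design data from the example on page 685. This is a sketch in plain Python; the variable names (`sales`, `mu_hat`, `residuals`) are chosen for the example and are not from the textbook.

```python
# Sales (cases) by package design, from the example on page 685
sales = {
    1: [11, 17, 16, 14, 15],
    2: [12, 10, 15, 19, 11],
    3: [23, 20, 18, 17],        # one store's value is missing
    4: [27, 33, 22, 26, 28],
}

# Least squares estimate of each factor level mean: the group sample mean
mu_hat = {i: sum(y) / len(y) for i, y in sales.items()}

# Residual e_ij = Y_ij - fitted value, where the fitted value is the group mean
residuals = {i: [y_ij - mu_hat[i] for y_ij in y] for i, y in sales.items()}

for i in sales:
    print(f"design {i}: mean = {mu_hat[i]:.1f}, "
          f"sum of residuals = {sum(residuals[i]):.10f}")
```

Within each design the residuals sum to zero, up to floating-point rounding.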
ANOVA Table SS
After obtaining the estimates, the next step is to create an
ANOVA table to make further statistical inferences
We start by partitioning the total variation (total sum of
squares) of the data:
SSTO = SSTR + SSE
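$\sum_i \sum_j (Y_{ij} - \bar{Y}_{\cdot\cdot})^2 = \sum_i \sum_j (\bar{Y}_{i\cdot} - \bar{Y}_{\cdot\cdot})^2 + \sum_i \sum_j (Y_{ij} - \bar{Y}_{i\cdot})^2$
Note. $\bar{Y}_{\cdot\cdot}$ is the mean of all observations.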
ANOVA Table SS
An intuitive explanation of the decomposition:
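Start with the deviation $(Y_{ij} - \bar{Y}_{\cdot\cdot})$
the deviation of $Y_{ij}$ around the overall mean $\bar{Y}_{\cdot\cdot}$ is due to two sources:
$Y_{ij}$ is in the $i$th group, whose mean is $(\bar{Y}_{i\cdot} - \bar{Y}_{\cdot\cdot})$ away from $\bar{Y}_{\cdot\cdot}$, and
within this group, $Y_{ij}$ is $(Y_{ij} - \bar{Y}_{i\cdot})$ away from its group mean
So, $(Y_{ij} - \bar{Y}_{\cdot\cdot}) = (\bar{Y}_{i\cdot} - \bar{Y}_{\cdot\cdot}) + (Y_{ij} - \bar{Y}_{i\cdot})$
Considering all observations, we square both sides and sum
them up. With some algebra, we have:
$\sum_i \sum_j (Y_{ij} - \bar{Y}_{\cdot\cdot})^2 = \sum_i \sum_j (\bar{Y}_{i\cdot} - \bar{Y}_{\cdot\cdot})^2 + \sum_i \sum_j (Y_{ij} - \bar{Y}_{i\cdot})^2$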
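The algebra amounts to showing that the cross-product term vanishes. A short sketch in the notation above (a standard argument, not copied from the textbook's proof):

```latex
\begin{aligned}
\sum_{i}\sum_{j} (Y_{ij} - \bar{Y}_{\cdot\cdot})^2
&= \sum_{i}\sum_{j} (\bar{Y}_{i\cdot} - \bar{Y}_{\cdot\cdot})^2
 + \sum_{i}\sum_{j} (Y_{ij} - \bar{Y}_{i\cdot})^2
 + 2\sum_{i} (\bar{Y}_{i\cdot} - \bar{Y}_{\cdot\cdot}) \sum_{j} (Y_{ij} - \bar{Y}_{i\cdot}), \\
\sum_{j} (Y_{ij} - \bar{Y}_{i\cdot}) &= 0 \quad \text{for each } i .
\end{aligned}
```

The inner sum in the last term is zero for every group, so only the two sums of squares remain.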
ANOVA Table SS
SSTO = $\sum_i \sum_j (Y_{ij} - \bar{Y}_{\cdot\cdot})^2$: total variability of $Y_{ij}$;
The ANOVA model tells us that the total variability can be
due to:
treatments (factor levels):
SSTR = $\sum_i \sum_j (\bar{Y}_{i\cdot} - \bar{Y}_{\cdot\cdot})^2 = \sum_i n_i (\bar{Y}_{i\cdot} - \bar{Y}_{\cdot\cdot})^2$; and
random variation (error):
SSE = $\sum_i \sum_j (Y_{ij} - \bar{Y}_{i\cdot})^2$
Optional.
a mathematical proof of SSTO = SSTR + SSE can be
found on page 692.
this can be easily proved via matrix algebra (e.g., STP 526)
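The partition can be checked numerically on the package design data from the example on page 685. The following is a plain-Python sketch; the variable names are chosen here for illustration.

```python
# Sales (cases) by package design, from the example on page 685
sales = {
    1: [11, 17, 16, 14, 15],
    2: [12, 10, 15, 19, 11],
    3: [23, 20, 18, 17],
    4: [27, 33, 22, 26, 28],
}

all_obs = [y for group in sales.values() for y in group]
grand_mean = sum(all_obs) / len(all_obs)                     # mean of all observations
group_mean = {i: sum(g) / len(g) for i, g in sales.items()}  # group sample means

# Total, treatment, and error sums of squares, computed from their definitions
ssto = sum((y - grand_mean) ** 2 for y in all_obs)
sstr = sum(len(g) * (group_mean[i] - grand_mean) ** 2 for i, g in sales.items())
sse = sum((y - group_mean[i]) ** 2 for i, g in sales.items() for y in g)

print(f"SSTO = {ssto:.4f}")
print(f"SSTR = {sstr:.4f}")
print(f"SSE  = {sse:.4f}")
print(f"SSTR + SSE = {sstr + sse:.4f}")
```

The three sums of squares are computed independently, so agreement between SSTO and SSTR + SSE is a genuine check of the decomposition rather than a tautology.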
ANOVA Table d.f.
In addition to sums of squares, another component of an ANOVA
table is the degrees of freedom (d.f.):
The SSTO has $(n_T - 1)$ d.f. associated with it; here,
$n_T = \sum_{i=1}^{r} n_i$ is the total sample size.
Note. $\sum_i \sum_j (Y_{ij} - \bar{Y}_{\cdot\cdot}) = 0$
The SSE has $(n_T - r)$ d.f. associated with it
Note. $\sum_{j=1}^{n_i} (Y_{ij} - \bar{Y}_{i\cdot}) = 0$; $i = 1, \ldots, r$
Another way: there are $r$ parameters to be estimated
The SSTR has $(r - 1)$ d.f. associated with it
Note. $\sum_{i=1}^{r} n_i (\bar{Y}_{i\cdot} - \bar{Y}_{\cdot\cdot}) = 0$
decomposing the d.f.:
$(n_T - 1) = (r - 1) + (n_T - r)$
ANOVA Table MS
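Also, mean square = sum of squares / degrees of freedom:
$MSTR = \dfrac{SSTR}{r - 1}$; $MSE = \dfrac{SSE}{n_T - r}$
Now we have an ANOVA table:
Source of variation    SS      d.f.       MS
Between treatments     SSTR    r - 1      MSTR = SSTR / (r - 1)
Error                  SSE     n_T - r    MSE = SSE / (n_T - r)
Total                  SSTO    n_T - 1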
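The table entries can be assembled with a small helper. This is a sketch under stated assumptions: the function name `one_way_anova` and the dictionary keys are hypothetical, and the data used to exercise it are the package design sales from the example on page 685.

```python
def one_way_anova(groups):
    """Return d.f., sums of squares, and mean squares for a list of groups."""
    r = len(groups)                              # number of factor levels
    n_t = sum(len(g) for g in groups)            # total sample size
    grand_mean = sum(sum(g) for g in groups) / n_t
    means = [sum(g) / len(g) for g in groups]
    sstr = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    sse = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g)
    df_tr, df_e = r - 1, n_t - r
    return {
        "df_tr": df_tr, "df_e": df_e, "df_total": n_t - 1,
        "SSTR": sstr, "SSE": sse,
        "SSTO": sstr + sse,                      # uses SSTO = SSTR + SSE
        "MSTR": sstr / df_tr, "MSE": sse / df_e,
    }

table = one_way_anova([
    [11, 17, 16, 14, 15],
    [12, 10, 15, 19, 11],
    [23, 20, 18, 17],
    [27, 33, 22, 26, 28],
])
for name, value in table.items():
    print(f"{name:9s} {value:.4f}" if isinstance(value, float) else f"{name:9s} {value}")
```

With four designs and 19 observations, the degrees of freedom are 3 for treatments, 15 for error, and 18 in total.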
ANOVA Table back to example
Example (p. 685).
Consider again the Food Company problem:
Factor: package design (4 levels)
Experimental units: 20 stores, with approximately equal sales
volumes
Design: completely randomized design (CRD)
Data:
Design 1 11 17 16 14 15
Design 2 12 10 15 19 11
Design 3 23 20 18 17
Design 4 27 33 22 26 28
the SAS program used to obtain the ANOVA table:
Tab16 1.sas
ANOVA Table Example
SAS output:
                                 Sum of
Source             DF           Squares    Mean Square    F Value    Pr > F
Model               3       588.2210526    196.0736842      18.59    <.0001
Error              15       158.2000000     10.5466667
Corrected Total    18       746.4210526

Level of
design     N           Mean       Std Dev
1          5     14.6000000    2.30217289
2          5     13.4000000    3.64691651
3          4     19.5000000    2.64575131
4          5     27.2000000    3.96232255
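As a cross-check of the SAS output, the group summaries and the F value can be reproduced in Python. This sketch uses only the standard `statistics` module; the F value is taken as the ratio MSTR / MSE of the two mean squares printed in the table, and the variable names are chosen here for illustration. The p-value is not recomputed because it needs the F distribution, which the standard library does not provide.

```python
import statistics

# Sales (cases) by package design, from the example on page 685
sales = {
    1: [11, 17, 16, 14, 15],
    2: [12, 10, 15, 19, 11],
    3: [23, 20, 18, 17],
    4: [27, 33, 22, 26, 28],
}

# Group size, sample mean, and sample standard deviation (n - 1 denominator)
summary = {d: (len(y), statistics.mean(y), statistics.stdev(y))
           for d, y in sales.items()}
for design, (n, mean, sd) in summary.items():
    print(f"{design}  N={n}  mean={mean:.7f}  sd={sd:.8f}")

# F ratio from the mean squares reported in the SAS table
mstr, mse = 196.0736842, 10.5466667
f_value = mstr / mse
print(f"F = {f_value:.2f}")
```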
