


ST102 Elementary Statistical Theory
Part II: 7. Analysis of variance (ANOVA)

Dr James Abdey
Department of Statistics
London School of Economics and Political Science
LT 2014

Part II: Statistical inference

1. Statistical inference preliminaries
2. Point estimation
3. Interval estimation and sampling distributions
4. Hypothesis testing
5. Other statistical tests
6. Linear regression
7. Analysis of variance (ANOVA) (time permitting)

7. Analysis of variance (ANOVA)
7.1: Testing for 3 means: an introductory example
7.2: F tests for k Normal means with same variance
7.3: One-way ANOVA with Minitab
7.4: From one-way to two-way ANOVA
7.5: F tests for two-way ANOVA
7.6: Two-way ANOVA with Minitab

7.1 Testing for 3 means: an introductory example

Goal: To test the hypothesis that k population means are the same.

Example: To assess the teaching quality of class teachers, a random sample of 6 examination marks was selected from each of 3 classes. The examination marks for each class are listed in the table below.

Can we infer from those data that there is no significant difference in the examination marks among all 3 classes?

    Class 1   85   75   82   76   71   85
    Class 2   71   75   73   74   69   82
    Class 3   59   64   62   69   75   67

Suppose examination marks from Class j follow the distribution $N(\mu_j, \sigma^2)$, j = 1, 2, 3. We need to test the hypothesis

$$H_0: \mu_1 = \mu_2 = \mu_3.$$

Remark: Similar problems arise from practical situations:

    comparing the returns of 3 stocks
    comparing the sales using 3 advertising strategies
    comparing the effectiveness of 3 medicines
    ...

The data form a 6 × 3 array. Denote the data point at the (i, j)-th position as $X_{ij}$; we compute the column means first:

$$\bar{X}_{\cdot j} = (X_{1j} + X_{2j} + \cdots + X_{6j})/6,$$

leading to $\bar{X}_{\cdot 1} = 79$, $\bar{X}_{\cdot 2} = 74$, $\bar{X}_{\cdot 3} = 66$. Transposing, we get

    Observation   Class 1   Class 2   Class 3
    1                85        71        59
    2                75        75        64
    3                82        73        62
    4                76        74        69
    5                71        69        75
    6                85        82        67
    Mean             79        74        66

If $H_0$ is true, the three sample means $\bar{X}_{\cdot 1}$, $\bar{X}_{\cdot 2}$, $\bar{X}_{\cdot 3}$ should be very close to each other, i.e. all of them should be close to the overall mean

$$\bar{X} = (79 + 74 + 66)/3 = 73,$$

which is the mean value of all 18 observations.

One possible measure for the closeness is $\sum_{j=1}^{3} (\bar{X}_{\cdot j} - \bar{X})^2$. However, it is not scale-invariant.

A possible test statistic:

$$T = \frac{\sum_{j=1}^{3} (\bar{X}_{\cdot j} - \bar{X})^2}{\text{sum of the 3 sample variances}};$$

we reject $H_0$ for large values of T. (Note T = 0 if $\bar{X}_{\cdot 1} = \bar{X}_{\cdot 2} = \bar{X}_{\cdot 3}$.)

Question: What is the distribution of T under $H_0$?

7.2 F tests for k Normal means with same variance

A general setting: k independent samples are available from k Normal distributions $N(\mu_j, \sigma^2)$, j = 1, ..., k. Denote by $X_{1j}, X_{2j}, \ldots, X_{n_j j}$ the sample with the sample size $n_j$ from $N(\mu_j, \sigma^2)$, j = 1, ..., k.

Goal: Test the hypothesis

$$H_0: \mu_1 = \cdots = \mu_k$$

(against the alternative $H_1$: not all $\mu_j$ are the same).

The j-th sample mean:

$$\bar{X}_{\cdot j} = \frac{1}{n_j} \sum_{i=1}^{n_j} X_{ij}.$$

The overall sample mean:

$$\bar{X} = \frac{1}{n} \sum_{j=1}^{k} \sum_{i=1}^{n_j} X_{ij} = \frac{1}{n} \sum_{j=1}^{k} n_j \bar{X}_{\cdot j},$$

where $n = \sum_{j=1}^{k} n_j$ is the total number of observations.

ANOVA decomposition:

$$\sum_{j=1}^{k} \sum_{i=1}^{n_j} (X_{ij} - \bar{X})^2 = \sum_{j=1}^{k} \sum_{i=1}^{n_j} (X_{ij} - \bar{X}_{\cdot j})^2 + \sum_{j=1}^{k} n_j (\bar{X}_{\cdot j} - \bar{X})^2,$$

with (n − 1) d.f. on the left-hand side, and (n − k) d.f. and (k − 1) d.f., respectively, for the two terms on the right-hand side.

Total variation: $\sum_{j=1}^{k} \sum_{i=1}^{n_j} (X_{ij} - \bar{X})^2$, with n − 1 degrees of freedom.

Between-treatments variation: $B = \sum_{j=1}^{k} n_j (\bar{X}_{\cdot j} - \bar{X})^2$, with k − 1 degrees of freedom.

Within-treatments variation: $W = \sum_{j=1}^{k} \sum_{i=1}^{n_j} (X_{ij} - \bar{X}_{\cdot j})^2$, with $n - k = \sum_{j=1}^{k} (n_j - 1)$ degrees of freedom.

Remarks:

i. B and W are also called, respectively, between-groups variation and within-groups variation. In fact W is effectively a Residual (error) sum of squares, representing the variation which cannot be explained by the treatment or group factor.

ii. The decomposition follows from the identity

$$\sum_{i=1}^{m} (a_i - b)^2 = \sum_{i=1}^{m} (a_i - \bar{a})^2 + m(\bar{a} - b)^2.$$
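The identity in remark ii is stated without proof. A short derivation, sketched here under the assumption that $\bar{a} = \frac{1}{m}\sum_{i=1}^{m} a_i$ denotes the mean of the $a_i$ (the slides do not define $\bar{a}$ explicitly), adds and subtracts $\bar{a}$ inside the square:

```latex
\begin{align*}
\sum_{i=1}^{m} (a_i - b)^2
  &= \sum_{i=1}^{m} \bigl( (a_i - \bar{a}) + (\bar{a} - b) \bigr)^2 \\
  &= \sum_{i=1}^{m} (a_i - \bar{a})^2
     + 2(\bar{a} - b) \sum_{i=1}^{m} (a_i - \bar{a})
     + m(\bar{a} - b)^2 \\
  &= \sum_{i=1}^{m} (a_i - \bar{a})^2 + m(\bar{a} - b)^2 ,
\end{align*}
% the cross term vanishes because \sum_{i=1}^{m} (a_i - \bar{a}) = 0.
```

Applying this for each j with $m = n_j$, $a_i = X_{ij}$ and $b = \bar{X}$, and then summing over j = 1, ..., k, gives the ANOVA decomposition: total variation = W + B.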

iii. Formulae for computations: $n = \sum_{j=1}^{k} n_j$, $\bar{X}_{\cdot j} = \sum_{i=1}^{n_j} X_{ij}/n_j$, $\bar{X} = \sum_{j=1}^{k} n_j \bar{X}_{\cdot j}/n$, and

$$\text{Total variation} = \sum_{j=1}^{k} \sum_{i=1}^{n_j} X_{ij}^2 - n\bar{X}^2,$$

$$B = \sum_{j=1}^{k} n_j \bar{X}_{\cdot j}^2 - n\bar{X}^2,$$

$$\text{Residual (Error) SS} = W = \sum_{j=1}^{k} \sum_{i=1}^{n_j} X_{ij}^2 - \sum_{j=1}^{k} n_j \bar{X}_{\cdot j}^2 = \sum_{j=1}^{k} (n_j - 1) S_j^2.$$

Note: Total variation = Total SS = W + B.

Theorem:

i. $W = \sum_{j=1}^{k} \sum_{i=1}^{n_j} (X_{ij} - \bar{X}_{\cdot j})^2$ and $B = \sum_{j=1}^{k} n_j (\bar{X}_{\cdot j} - \bar{X})^2$ are independent of each other.

ii. Also,

$$\frac{W}{\sigma^2} = \frac{1}{\sigma^2} \sum_{j=1}^{k} \sum_{i=1}^{n_j} (X_{ij} - \bar{X}_{\cdot j})^2 \sim \chi^2_{n-k}.$$

iii. When $\mu_1 = \cdots = \mu_k$,

$$\frac{B}{\sigma^2} = \frac{1}{\sigma^2} \sum_{j=1}^{k} n_j (\bar{X}_{\cdot j} - \bar{X})^2 \sim \chi^2_{k-1}.$$

To test $H_0: \mu_1 = \cdots = \mu_k$, define the test statistic

$$F = \frac{\sum_{j=1}^{k} n_j (\bar{X}_{\cdot j} - \bar{X})^2/(k-1)}{\sum_{j=1}^{k} \sum_{i=1}^{n_j} (X_{ij} - \bar{X}_{\cdot j})^2/(n-k)} = \frac{B/(k-1)}{W/(n-k)}.$$

Under $H_0$, $F \sim F_{k-1,\, n-k}$. We reject $H_0$ at the 100α% significance level if

$$F > F_{\alpha,\, k-1,\, n-k},$$

where $P(F_{k-1,\, n-k} > F_{\alpha,\, k-1,\, n-k}) = \alpha$. The p-value of the test is

$$p = P(F_{k-1,\, n-k} > \text{observed value of } F).$$

It is clear that $F > F_{\alpha,\, k-1,\, n-k}$ if and only if p < α.

One-way ANOVA table:

Typically the ANOVA results are presented in a table as below:

    Source   DF      SS      MS          F                        P
    Factor   k − 1   B       B/(k − 1)   [B/(k − 1)]/[W/(n − k)]  p-value
    Error    n − k   W       W/(n − k)
    Total    n − 1   B + W

Example: (Continued) For the given data, k = 3, $n_1 = n_2 = n_3 = 6$, $n = n_1 + n_2 + n_3 = 18$, $\bar{X}_{\cdot 1} = 79$, $\bar{X}_{\cdot 2} = 74$, $\bar{X}_{\cdot 3} = 66$, and

$$\bar{X} = \frac{1}{n} \sum_{j=1}^{3} n_j \bar{X}_{\cdot j} = \frac{1}{3} \sum_{j=1}^{3} \bar{X}_{\cdot j} = 73.$$

$$B = \sum_{j=1}^{3} 6(\bar{X}_{\cdot j} - \bar{X})^2 = 6\left[(79 - 73)^2 + (74 - 73)^2 + (66 - 73)^2\right] = 516,$$

$$W = \sum_{j=1}^{3} \sum_{i=1}^{6} (X_{ij} - \bar{X}_{\cdot j})^2 = \sum_{j=1}^{3} \sum_{i=1}^{6} X_{ij}^2 - 6 \sum_{j=1}^{3} \bar{X}_{\cdot j}^2 = \sum_{j=1}^{3} 5 S_j^2 = 5(34 + 20 + 32) = 430,$$

where $S_j^2$ is the j-th sample variance.

Hence

$$F = \frac{B/(k-1)}{W/(n-k)} = \frac{516/2}{430/15} = 9.$$

Under $H_0: \mu_1 = \mu_2 = \mu_3$, $F \sim F_{k-1,\, n-k} = F_{2,\,15}$. Since $F_{0.01,\, 2,\, 15} = 6.36 < 9$, we reject $H_0$ at the 1% significance level. In fact the p-value is $P(F_{2,15} > 9) = 0.003$.

There is a significant difference among the examination marks from the 3 classes.

The ANOVA table is as follows:

    Source   DF    SS     MS      F      P
    Factor    2   516   258       9   0.003
    Error    15   430    28.67
    Total    17   946
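The hand calculation for the examination-mark example can be cross-checked numerically. The following is a minimal sketch in plain Python (the slides themselves use Minitab, so the function name `one_way_anova` and the layout of the data are illustrative assumptions, not part of the course material). It computes B, W and F directly from the definitions.

```python
# One-way ANOVA from raw data, following the definitions of B, W and F.
# The marks are those listed for the 3 classes in the example.
classes = {
    "Class 1": [85, 75, 82, 76, 71, 85],
    "Class 2": [71, 75, 73, 74, 69, 82],
    "Class 3": [59, 64, 62, 69, 75, 67],
}


def one_way_anova(samples):
    """Return (B, W, F, k - 1, n - k) for a list of samples (hypothetical helper)."""
    k = len(samples)
    n = sum(len(s) for s in samples)
    grand_mean = sum(sum(s) for s in samples) / n
    means = [sum(s) / len(s) for s in samples]
    # Between-treatments variation: sum of n_j * (group mean - overall mean)^2
    b = sum(len(s) * (m - grand_mean) ** 2 for s, m in zip(samples, means))
    # Within-treatments variation: squared deviations from each group's own mean
    w = sum((x - m) ** 2 for s, m in zip(samples, means) for x in s)
    f = (b / (k - 1)) / (w / (n - k))
    return b, w, f, k - 1, n - k


B, W, F, df1, df2 = one_way_anova(list(classes.values()))
print(f"B = {B:.0f}, W = {W:.0f}, F = {F:.2f} on ({df1}, {df2}) d.f.")
# prints: B = 516, W = 430, F = 9.00 on (2, 15) d.f.
```

The p-value is not computed here because the Python standard library has no F distribution; the comparison with $F_{0.01,\,2,\,15} = 6.36$ quoted in the example is enough to reject $H_0$ at the 1% level.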

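An old problem: Compare two Normal means with the same, but unknown, variance.

When k = 2, $n = n_1 + n_2$, and $\bar{X} = (n_1 \bar{X}_{\cdot 1} + n_2 \bar{X}_{\cdot 2})/n$. Hence

$$\bar{X}_{\cdot 1} - \bar{X} = n_2(\bar{X}_{\cdot 1} - \bar{X}_{\cdot 2})/n, \qquad \bar{X}_{\cdot 2} - \bar{X} = n_1(\bar{X}_{\cdot 2} - \bar{X}_{\cdot 1})/n.$$

Therefore

$$B = \sum_{j=1}^{2} n_j (\bar{X}_{\cdot j} - \bar{X})^2 = \left(\frac{n_1 n_2^2}{n^2} + \frac{n_1^2 n_2}{n^2}\right)(\bar{X}_{\cdot 1} - \bar{X}_{\cdot 2})^2 = \frac{n_1 n_2}{n}(\bar{X}_{\cdot 1} - \bar{X}_{\cdot 2})^2.$$

Now the ANOVA test statistic is

$$F = \frac{B/(2-1)}{W/(n-2)} = \frac{n_1 n_2 (n-2)}{n} \cdot \frac{(\bar{X}_{\cdot 1} - \bar{X}_{\cdot 2})^2}{\sum_{j=1}^{2} \sum_{i=1}^{n_j} (X_{ij} - \bar{X}_{\cdot j})^2} = \frac{n_1 + n_2 - 2}{1/n_1 + 1/n_2} \cdot \frac{(\bar{X}_{\cdot 1} - \bar{X}_{\cdot 2})^2}{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2} = T^2,$$

where $S_j^2 = \frac{1}{n_j - 1} \sum_{i=1}^{n_j} (X_{ij} - \bar{X}_{\cdot j})^2$, and

$$T = \sqrt{\frac{n_1 + n_2 - 2}{1/n_1 + 1/n_2}} \cdot \frac{\bar{X}_{\cdot 1} - \bar{X}_{\cdot 2}}{\sqrt{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}}$$

is a t statistic introduced for testing $H_0: \mu_1 = \mu_2$, and $T \sim t_{n_1 + n_2 - 2}$ under $H_0$.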
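The equality $F = T^2$ for k = 2 can be checked numerically. The sketch below is an illustration only: it reuses the Class 1 and Class 2 marks from the introductory example as the two samples (a choice made here, not in the slides) and evaluates both statistics from their definitions.

```python
import math

# Two samples: Class 1 and Class 2 marks from the introductory example.
x1 = [85, 75, 82, 76, 71, 85]
x2 = [71, 75, 73, 74, 69, 82]


def mean(xs):
    return sum(xs) / len(xs)


def sample_var(xs):
    """Sample variance with divisor n - 1."""
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)


n1, n2 = len(x1), len(x2)
n = n1 + n2
m1, m2 = mean(x1), mean(x2)
s1_sq, s2_sq = sample_var(x1), sample_var(x2)

# Pooled two-sample t statistic
T = (math.sqrt((n1 + n2 - 2) / (1 / n1 + 1 / n2))
     * (m1 - m2) / math.sqrt((n1 - 1) * s1_sq + (n2 - 1) * s2_sq))

# One-way ANOVA F statistic with k = 2
grand_mean = (n1 * m1 + n2 * m2) / n
B = n1 * (m1 - grand_mean) ** 2 + n2 * (m2 - grand_mean) ** 2
W = (n1 - 1) * s1_sq + (n2 - 1) * s2_sq
F = (B / (2 - 1)) / (W / (n - 2))

print(f"T = {T:.4f}, T^2 = {T ** 2:.4f}, F = {F:.4f}")
```

The two printed values of $T^2$ and F agree up to floating-point rounding, as the algebra predicts.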

It is known that if $T \sim t_q$, then $T^2 \sim F_{1,\, q}$.

7.3 One-way ANOVA with Minitab

Example:

A study performed by a Columbia University professor counted the number of times per minute professors from three different departments said "uh" or "ah" during lectures to fill gaps between words. The data listed in uhah.mtw were derived from observing 100 minutes from each of the three departments. If we assume that the more frequent use of "uh" or "ah" results in more boring lectures, can we conclude that some departments' professors are more boring than others?

(You may copy the file uhah.mtw from the ST102 Moodle site into your document folder, and double-click the file to start a Minitab session.)

The counts for English, Mathematics and Political Science departments are stored in c1, c2 and c3. As always in statistical analysis, we first look at the summary (descriptive) statistics of these data.

    MTB > desc c1 c2 c3

    Descriptive Statistics: English, Mathematics, Political Science

    Variable            N  N*   Mean  SE Mean  StDev      Minimum
    English           100   0  5.810    0.249  2.493  0.000000000
    Mathematics       100   0  5.300    0.201  2.013  0.000000000
    Political Scienc  100   0  5.330    0.197  1.975  0.000000000

    Variable             Q1  Median     Q3  Maximum
    English           4.000   5.000  8.000   11.000
    Mathematics       4.000   5.000  7.000    9.000
    Political Scienc  4.000   5.000  7.000    9.000

Surprisingly, professors in English say a bit more "uh" or "ah" than those in Mathematics and Political Science, but the difference seems small.

The Minitab command for one-way ANOVA is aovOneway.

    MTB > aovOneway c1 c2 c3

    One-way ANOVA: English, Mathematics, Political Science

    Source   DF       SS    MS     F      P
    Factor    2    16.38  8.19  1.73  0.178
    Error   297  1402.50  4.72
    Total   299  1418.88

    S = 2.173   R-Sq = 1.15%   R-Sq(adj) = 0.49%

                                     Individual 95% CIs For Mean Based on
                                     Pooled StDev
    Level               N   Mean  StDev  -+---------+---------+---------+--------
    English           100  5.810  2.493                 (-----------*-----------)
    Mathematics       100  5.300  2.013  (-----------*------------)
    Political Scienc  100  5.330  1.975   (-----------*------------)
                                         -+---------+---------+---------+--------
                                        4.90      5.25      5.60      5.95

    Pooled StDev = 2.173

Since the p-value for the F test is 0.178, we cannot reject the hypothesis

$$H_0: \mu_1 = \mu_2 = \mu_3,$$

i.e. that the mean numbers of "uh"s or "ah"s said by professors in the 3 departments are the same.
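The S, R-Sq and R-Sq(adj) lines in this Minitab output are functions of the SS and DF columns of the ANOVA table. A minimal Python sketch, using only the figures printed in the table (the variable names are illustrative), recomputes them:

```python
import math

# SS and DF columns copied from the one-way ANOVA table for the "uh/ah" data.
ss_factor, ss_error = 16.38, 1402.50
df_factor, df_error = 2, 297

ss_total = ss_factor + ss_error      # Total SS = B + W
df_total = df_factor + df_error      # n - 1

ms_factor = ss_factor / df_factor    # B / (k - 1)
ms_error = ss_error / df_error       # W / (n - k)

F = ms_factor / ms_error
S = math.sqrt(ms_error)              # pooled standard deviation
r_sq = ss_factor / ss_total          # R-Sq
r_sq_adj = 1 - ms_error / (ss_total / df_total)  # R-Sq(adj)

print(f"F = {F:.2f}, S = {S:.3f}, "
      f"R-Sq = {100 * r_sq:.2f}%, R-Sq(adj) = {100 * r_sq_adj:.2f}%")
# prints: F = 1.73, S = 2.173, R-Sq = 1.15%, R-Sq(adj) = 0.49%
```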

In addition to a one-way ANOVA table, the Minitab output also provides:

Estimate for σ:

$$\hat{\sigma} = S = \sqrt{W/(n-k)} = \sqrt{1{,}402.50/297} = \sqrt{4.72} = 2.173.$$

Squared correlation coefficients:

$$R^2 = \frac{B}{\text{Total SS}} = \frac{16.38}{1{,}418.88} = 0.0115 = 1.15\%,$$

$$R^2_{adj} = 1 - \frac{W/(n-k)}{(\text{Total SS})/(n-1)} = 1 - \frac{1{,}402.50/297}{1{,}418.88/299} = 0.0049 = 0.49\%.$$

95% confidence intervals for $\mu_j$:

$$\bar{X}_{\cdot j} \pm t_{0.025,\, n-k} \cdot \frac{S}{\sqrt{n_j}}, \qquad j = 1, \ldots, k.$$

One-way ANOVA worked example

Example: In early 2001, the American economy was slowing down and companies were laying off workers. A poll conducted during 9–11 February 2001 asked a random sample of workers how long (in months) it would be before they had significant financial hardship if they lost their jobs. They are classified into 4 groups according to their incomes. Below is a part of Minitab output of the descriptive statistics of the classified data. Can we infer that income has a significant impact on the length of time before facing financial hardship?

    MTB > desc c1-c4

    Variable      N    Mean  SE Mean   StDev
    Over $50K    39   22.21     1.77   11.03
    $30 to 50K  114  18.456    0.890   9.507
    $20 to 30K   81   15.49     1.03    9.23
    Under $20K   67   9.313    0.988   8.087

We apply one-way ANOVA to test whether the means in the k = 4 groups are the same, i.e. $H_0: \mu_1 = \mu_2 = \mu_3 = \mu_4$.

$n_1 = 39$, $n_2 = 114$, $n_3 = 81$, $n_4 = 67$, hence

$$n = \sum_{j=1}^{k} n_j = 39 + 114 + 81 + 67 = 301.$$

$\bar{X}_{\cdot 1} = 22.21$, $\bar{X}_{\cdot 2} = 18.456$, $\bar{X}_{\cdot 3} = 15.49$, $\bar{X}_{\cdot 4} = 9.313$, and

$$\bar{X} = \frac{1}{n} \sum_{j=1}^{k} n_j \bar{X}_{\cdot j} = \frac{39 \times 22.21 + 114 \times 18.456 + 81 \times 15.49 + 67 \times 9.313}{301} = 16.109.$$

Now

$$B = \sum_{j=1}^{k} n_j (\bar{X}_{\cdot j} - \bar{X})^2 = 39(22.21 - 16.109)^2 + 114(18.456 - 16.109)^2 + 81(15.49 - 16.109)^2 + 67(9.313 - 16.109)^2 = 5{,}205.097.$$

$S_1^2 = 11.03^2 = 121.661$, $S_2^2 = 9.507^2 = 90.383$, $S_3^2 = 9.23^2 = 85.193$, $S_4^2 = 8.087^2 = 65.400$, hence

$$W = \sum_{j=1}^{k} \sum_{i=1}^{n_j} (X_{ij} - \bar{X}_{\cdot j})^2 = \sum_{j=1}^{k} (n_j - 1) S_j^2 = 38 \times 121.661 + 113 \times 90.383 + 80 \times 85.193 + 66 \times 65.4 = 25{,}968.24.$$

Consequently,

$$F = \frac{B/(k-1)}{W/(n-k)} = \frac{5{,}205.097/3}{25{,}968.24/(301 - 4)} = 19.84.$$

Under $H_0$, $F \sim F_{k-1,\, n-k} = F_{3,\, 297}$. Since $F_{0.01,\, 3,\, 297} = 3.848 < 19.84$, we reject $H_0$ at the 1% significance level, i.e. income has a significant impact on the length of time before facing financial hardship.
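This worked example uses only the group sizes, means and standard deviations, not the raw data. The same calculation can be sketched in a few lines of Python (the tuple layout and variable names below are illustrative assumptions). Because the code keeps full floating-point precision rather than the rounded intermediate values used by hand, W differs from the hand calculation in the second decimal place.

```python
# Summary statistics (n_j, sample mean, sample standard deviation) per income
# group, copied from the Minitab descriptive statistics.
groups = [
    (39, 22.21, 11.03),    # Over $50K
    (114, 18.456, 9.507),  # $30 to 50K
    (81, 15.49, 9.23),     # $20 to 30K
    (67, 9.313, 8.087),    # Under $20K
]

k = len(groups)
n = sum(nj for nj, _, _ in groups)

# Overall mean as the n_j-weighted average of the group means
grand_mean = sum(nj * m for nj, m, _ in groups) / n

# Between-groups and within-groups sums of squares
B = sum(nj * (m - grand_mean) ** 2 for nj, m, _ in groups)
W = sum((nj - 1) * sd ** 2 for nj, _, sd in groups)

F = (B / (k - 1)) / (W / (n - k))

print(f"n = {n}, overall mean = {grand_mean:.3f}")
print(f"B = {B:.3f}, W = {W:.2f}, F = {F:.2f}")
```

The resulting F is about 19.84 on (3, 297) degrees of freedom, in line with the hand calculation.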

The pooled estimate for σ:

$$S = \sqrt{W/(n-k)} = \sqrt{25{,}968.24/(301 - 4)} = 9.351.$$

A 95% confidence interval for $\mu_j$:

$$\bar{X}_{\cdot j} \pm t_{0.025,\, 297} \cdot S/\sqrt{n_j} = \bar{X}_{\cdot j} \pm 1.968 \times 9.351/\sqrt{n_j} = \bar{X}_{\cdot j} \pm 18.403/\sqrt{n_j}.$$

Hence, for example, the confidence interval for $\mu_1$ is

$$22.21 \pm 18.403/\sqrt{39} = (19.26,\ 25.16)$$

and the confidence interval for $\mu_4$ is

$$9.313 \pm 18.403/\sqrt{67} = (7.06,\ 11.56).$$

The two confidence intervals are far away from each other.

The data are in the file gallupPoll.mtw.

    MTB > aovoneway c1-c4

    One-way ANOVA: Over $50K, $30 to 50K, $20 to 30K, Under $20K

    Source   DF       SS      MS      F      P
    Factor    3   5202.1  1734.0  19.83  0.000
    Error   297  25973.3    87.5
    Total   300  31175.4

    S = 9.352   R-Sq = 16.69%   R-Sq(adj) = 15.84%

                                     Individual 95% CIs For Mean Based on
                                     Pooled StDev
    Level         N    Mean   StDev  ------+---------+---------+---------+--
    Over $50K    39  22.205  11.029                           (----*-----)
    $30 to 50K  114  18.456   9.507                     (---*--)
    $20 to 30K   81  15.494   9.233               (---*---)
    Under $20K   67   9.313   8.087  (----*---)
                                     ------+---------+---------+---------+--
                                        10.0      15.0      20.0      25.0

    Pooled StDev = 9.352

Note: Minor differences were due to rounding errors in calculations.
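The hand-computed confidence intervals in this worked example, which are based on the pooled standard deviation, can be reproduced with a short Python sketch. The critical value $t_{0.025,\,297} = 1.968$ and $S = 9.351$ are taken as given from the text rather than computed; the function name `pooled_ci` is hypothetical.

```python
import math

T_CRIT = 1.968  # t_{0.025, 297}, as quoted in the worked example
S = 9.351       # pooled estimate of sigma from the hand calculation


def pooled_ci(mean, n_j):
    """95% confidence interval for one group mean using the pooled S."""
    half_width = T_CRIT * S / math.sqrt(n_j)
    return mean - half_width, mean + half_width


groups = {
    "Over $50K": (22.21, 39),
    "$30 to 50K": (18.456, 114),
    "$20 to 30K": (15.49, 81),
    "Under $20K": (9.313, 67),
}

intervals = {name: pooled_ci(m, nj) for name, (m, nj) in groups.items()}
for name, (lo, hi) in intervals.items():
    print(f"{name:<11} ({lo:.2f}, {hi:.2f})")
```

For the highest and lowest income groups this gives (19.26, 25.16) and (7.06, 11.56), matching the intervals derived by hand.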

7.4 From one-way to two-way ANOVA

One-way ANOVA: a revisit

We have independent observations $X_{ij} \sim N(\mu_j, \sigma^2)$ for i = 1, ..., $n_j$ and j = 1, ..., k. We are interested in testing

$$H_0: \mu_1 = \cdots = \mu_k.$$

Key idea: The variation of $X_{ij}$ is driven by a treatment factor at different levels $\mu_1, \ldots, \mu_k$, in addition to random fluctuations (i.e. random errors). We test if such a treatment effect exists or not.

We recast a one-way ANOVA problem as follows:

$$X_{ij} = \mu + \beta_j + \varepsilon_{ij}, \qquad i = 1, \ldots, n_j, \quad j = 1, \ldots, k,$$

where $\varepsilon_{ij} \sim N(0, \sigma^2)$ and are independent,

    μ: the average effect
    β_j: the treatment effect at the j-th level.

Note $\sum_{j=1}^{k} \beta_j = 0$. The null hypothesis (i.e. no treatment effect) is represented as $H_0: \beta_1 = \cdots = \beta_k = 0$.

7.5 F tests for two-way ANOVA

Two-way ANOVA deals with the observations:

$$X_{ij} = \mu + \gamma_i + \beta_j + \varepsilon_{ij}, \qquad i = 1, \ldots, r, \quad j = 1, \ldots, c,$$

where

    μ represents the average effect
    β_1, ..., β_c represent c different treatment (column) levels
    γ_1, ..., γ_r represent r different block (row) levels
    ε_ij ~ N(0, σ²) and are independent.

There are n = rc observations.

Conditions to make parameters μ, $\gamma_i$, $\beta_j$ identifiable:

$$\gamma_1 + \cdots + \gamma_r = 0, \qquad \beta_1 + \cdots + \beta_c = 0.$$

Hypotheses of interest:

    No treatment effect hypothesis, $H_0: \beta_1 = \cdots = \beta_c = 0$.
    No block effect hypothesis, $H_0: \gamma_1 = \cdots = \gamma_r = 0$.

We compute different types of mean:

    Mean at the i-th block level: $\bar{X}_{i\cdot} = \sum_{j=1}^{c} X_{ij}/c$, i = 1, ..., r
    Mean at the j-th treatment level: $\bar{X}_{\cdot j} = \sum_{i=1}^{r} X_{ij}/r$, j = 1, ..., c
    Overall mean: $\bar{X} = \bar{X}_{\cdot\cdot} = \sum_{i=1}^{r} \sum_{j=1}^{c} X_{ij}/(rc)$.

Similar to the original model

$$X_{ij} = \mu + \gamma_i + \beta_j + \varepsilon_{ij},$$

we decompose the observations as follows:

$$X_{ij} = \bar{X} + (\bar{X}_{i\cdot} - \bar{X}) + (\bar{X}_{\cdot j} - \bar{X}) + (X_{ij} - \bar{X}_{i\cdot} - \bar{X}_{\cdot j} + \bar{X})$$

for i = 1, ..., r and j = 1, ..., c.

Point estimators: $\hat{\mu} = \bar{X}$, $\hat{\gamma}_i = \bar{X}_{i\cdot} - \bar{X}$, $\hat{\beta}_j = \bar{X}_{\cdot j} - \bar{X}$.

Residuals: $\hat{\varepsilon}_{ij} = X_{ij} - \bar{X}_{i\cdot} - \bar{X}_{\cdot j} + \bar{X}$.
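The two-way decomposition of each observation into overall mean, block effect, treatment effect and residual can be illustrated numerically. The sketch below uses a small hypothetical 2 × 3 array (r = 2 blocks, c = 3 treatments) invented purely for illustration; it is not data from the course. It checks that the estimated effects satisfy the identifiability constraints and that the four pieces add back to the original observations.

```python
# Hypothetical data: r = 2 blocks (rows) by c = 3 treatments (columns).
X = [
    [4.0, 6.0, 8.0],
    [5.0, 9.0, 10.0],
]
r, c = len(X), len(X[0])

grand_mean = sum(sum(row) for row in X) / (r * c)
row_means = [sum(row) / c for row in X]
col_means = [sum(X[i][j] for i in range(r)) / r for j in range(c)]

# Point estimators from the decomposition
mu_hat = grand_mean
gamma_hat = [m - grand_mean for m in row_means]   # block (row) effects
beta_hat = [m - grand_mean for m in col_means]    # treatment (column) effects

# Residuals: X_ij - row mean - column mean + overall mean
resid = [[X[i][j] - row_means[i] - col_means[j] + grand_mean
          for j in range(c)] for i in range(r)]

# Rebuild each observation from its four components
rebuilt = [[mu_hat + gamma_hat[i] + beta_hat[j] + resid[i][j]
            for j in range(c)] for i in range(r)]

print("mu_hat    =", mu_hat)
print("gamma_hat =", gamma_hat)
print("beta_hat  =", beta_hat)
print("residuals =", resid)
```

By construction the estimated block effects sum to zero, as do the estimated treatment effects, and the residuals sum to zero along every row and every column.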

Two-way ANOVA decomposition:

$$\sum_{i=1}^{r} \sum_{j=1}^{c} (X_{ij} - \bar{X})^2 = c \sum_{i=1}^{r} (\bar{X}_{i\cdot} - \bar{X})^2 + r \sum_{j=1}^{c} (\bar{X}_{\cdot j} - \bar{X})^2 + \sum_{i=1}^{r} \sum_{j=1}^{c} (X_{ij} - \bar{X}_{i\cdot} - \bar{X}_{\cdot j} + \bar{X})^2.$$

Total variation: Total SS $= \sum_{i=1}^{r} \sum_{j=1}^{c} (X_{ij} - \bar{X})^2$, with rc − 1 degrees of freedom.

Between-blocks (rows) variation: $B_{row} = c \sum_{i=1}^{r} (\bar{X}_{i\cdot} - \bar{X})^2$, with r − 1 degrees of freedom.

Between-treatments (columns) variation: $B_{col} = r \sum_{j=1}^{c} (\bar{X}_{\cdot j} - \bar{X})^2$, with c − 1 degrees of freedom.

Residual (Error) variation: Residual SS $= \sum_{i=1}^{r} \sum_{j=1}^{c} (X_{ij} - \bar{X}_{i\cdot} - \bar{X}_{\cdot j} + \bar{X})^2$, with (r − 1)(c − 1) degrees of freedom.

To test no block (row) effect $H_0: \gamma_1 = \cdots = \gamma_r = 0$, the test statistic is defined as

$$F = \frac{B_{row}/(r-1)}{(\text{Residual SS})/[(r-1)(c-1)]} = \frac{(c-1) B_{row}}{\text{Residual SS}}.$$

Under $H_0$, $F \sim F_{r-1,\, (r-1)(c-1)}$. We reject $H_0$ at the 100α% significance level if $F > F_{\alpha,\, r-1,\, (r-1)(c-1)}$. The p-value is

$$P(F_{r-1,\, (r-1)(c-1)} > \text{observed value of } F).$$

To test no treatment (column) effect $H_0: \beta_1 = \cdots = \beta_c = 0$, the test statistic is defined as

$$F = \frac{B_{col}/(c-1)}{(\text{Residual SS})/[(r-1)(c-1)]} = \frac{(r-1) B_{col}}{\text{Residual SS}}.$$

Under $H_0$, $F \sim F_{c-1,\, (r-1)(c-1)}$. We reject $H_0$ at the 100α% significance level if $F > F_{\alpha,\, c-1,\, (r-1)(c-1)}$. The p-value is

$$P(F_{c-1,\, (r-1)(c-1)} > \text{observed value of } F).$$

Two-way ANOVA table:

    Source   DF              SS            MS                            F                           P
    Row      r − 1           B_row         B_row/(r − 1)                 (c − 1)B_row/Residual SS    p-value
    Column   c − 1           B_col         B_col/(c − 1)                 (r − 1)B_col/Residual SS    p-value
    Error    (r − 1)(c − 1)  Residual SS   Residual SS/[(r − 1)(c − 1)]
    Total    rc − 1          Total SS

Computational formulae:

Row means: $\bar{X}_{i\cdot} = \sum_{j=1}^{c} X_{ij}/c$, i = 1, ..., r

Column means: $\bar{X}_{\cdot j} = \sum_{i=1}^{r} X_{ij}/r$, j = 1, ..., c

Overall mean: $\bar{X} = \sum_{i=1}^{r} \sum_{j=1}^{c} X_{ij}/(rc) = \sum_{i=1}^{r} \bar{X}_{i\cdot}/r = \sum_{j=1}^{c} \bar{X}_{\cdot j}/c$

Total SS $= \sum_{i=1}^{r} \sum_{j=1}^{c} X_{ij}^2 - rc\bar{X}^2$

Row variation: $B_{row} = c \sum_{i=1}^{r} \bar{X}_{i\cdot}^2 - rc\bar{X}^2$

Column variation: $B_{col} = r \sum_{j=1}^{c} \bar{X}_{\cdot j}^2 - rc\bar{X}^2$

Residual SS = (Total SS) − $B_{row}$ − $B_{col}$ $= \sum_{i=1}^{r} \sum_{j=1}^{c} X_{ij}^2 - c \sum_{i=1}^{r} \bar{X}_{i\cdot}^2 - r \sum_{j=1}^{c} \bar{X}_{\cdot j}^2 + rc\bar{X}^2$.

F tests for two-way ANOVA: example

Example:
The table below lists the percentage annual returns (calculated four times per annum) of the Common Stock Index at the New York Stock Exchange during 1981–85.

            1st quarter   2nd quarter   3rd quarter   4th quarter
    1981        5.7           6.0           7.1           6.7
    1982        7.2           7.0           6.1           5.2
    1983        4.9           4.1           4.2           4.4
    1984        4.5           4.9           4.5           4.5
    1985        4.4           4.2           4.2           3.6

1. Are returns affected by the quarter of the year?

2. Is the variability in returns from year to year statistically significant?

Idea: Using two-way ANOVA, test the no column effect hypothesis to answer 1., and test the no row effect hypothesis to answer 2.

r = 5, c = 4.

Row means: $\bar{X}_{i\cdot} = \sum_{j=1}^{c} X_{ij}/c$ which are, respectively, 6.375, 6.375, 4.4, 4.6, 4.1 for i = 1, ..., 5.

Column means: $\bar{X}_{\cdot j} = \sum_{i=1}^{r} X_{ij}/r$ which are, respectively, 5.34, 5.24, 5.22, 4.88 for j = 1, ..., 4.

The overall mean $\bar{X} = \sum_{i=1}^{r} \bar{X}_{i\cdot}/r = 5.17$.

$\sum_{i=1}^{r} \sum_{j=1}^{c} X_{ij}^2 = 559.06$.

Hence Total SS $= \sum_{i=1}^{r} \sum_{j=1}^{c} X_{ij}^2 - rc\bar{X}^2 = 559.06 - 20 \times (5.17)^2 = 559.06 - 534.578 = 24.482$.

$B_{row} = c \sum_{i=1}^{r} \bar{X}_{i\cdot}^2 - rc\bar{X}^2 = 4 \times 138.6112 - 534.578 = 19.867$.

$B_{col} = r \sum_{j=1}^{c} \bar{X}_{\cdot j}^2 - rc\bar{X}^2 = 5 \times 107.036 - 534.578 = 0.602$.

Residual SS = Total SS − $B_{row}$ − $B_{col}$ = 24.482 − 19.867 − 0.602 = 4.013.

To test the no row effect hypothesis $H_0: \gamma_1 = \cdots = \gamma_5 = 0$, the test statistic is F = (c − 1)$B_{row}$/Residual SS = 3 × 19.867/4.013 = 14.852. Under $H_0$, $F \sim F_{r-1,\, (r-1)(c-1)} = F_{4,\,12}$. Since $F_{0.01,\, 4,\, 12} = 5.412 < 14.852$, we reject $H_0$ at the 1% significance level. We conclude that the return does depend on the year significantly.

To test the no column effect hypothesis $H_0: \beta_1 = \cdots = \beta_4 = 0$, the test statistic is F = (r − 1)$B_{col}$/Residual SS = 4 × 0.602/4.013 = 0.600. Under $H_0$, $F \sim F_{c-1,\, (r-1)(c-1)} = F_{3,\,12}$. Since $F_{0.1,\, 3,\, 12} = 2.606 > 0.600$, we cannot reject $H_0$ at the 10% significance level. Therefore there is no significant evidence indicating that the return depends on the quarter.

The results may be summarised in a two-way ANOVA table:

    Source    DF       SS      MS        F        P
    Year       4   19.867   4.967   14.852   < 0.01
    Quarter    3    0.602   0.201    0.600   > 0.10
    Error     12    4.013   0.334
    Total     19   24.482
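The sums of squares and F statistics in this two-way example can be verified from the raw returns. The following plain-Python sketch (variable names are illustrative; rows are years and columns are quarters, as in the data table) computes each quantity from its definition rather than from the computational shortcuts.

```python
# Percentage returns: one row per year (1981-85), one column per quarter.
returns = [
    [5.7, 6.0, 7.1, 6.7],  # 1981
    [7.2, 7.0, 6.1, 5.2],  # 1982
    [4.9, 4.1, 4.2, 4.4],  # 1983
    [4.5, 4.9, 4.5, 4.5],  # 1984
    [4.4, 4.2, 4.2, 3.6],  # 1985
]
r, c = len(returns), len(returns[0])

grand_mean = sum(sum(row) for row in returns) / (r * c)
row_means = [sum(row) / c for row in returns]
col_means = [sum(returns[i][j] for i in range(r)) / r for j in range(c)]

total_ss = sum((x - grand_mean) ** 2 for row in returns for x in row)
b_row = c * sum((m - grand_mean) ** 2 for m in row_means)   # between years
b_col = r * sum((m - grand_mean) ** 2 for m in col_means)   # between quarters
resid_ss = sum(
    (returns[i][j] - row_means[i] - col_means[j] + grand_mean) ** 2
    for i in range(r) for j in range(c)
)

f_row = (c - 1) * b_row / resid_ss   # compare with F_{4, 12}
f_col = (r - 1) * b_col / resid_ss   # compare with F_{3, 12}

print(f"Total SS = {total_ss:.3f}, B_row = {b_row:.3f}, "
      f"B_col = {b_col:.3f}, Residual SS = {resid_ss:.3f}")
print(f"F (year) = {f_row:.3f}, F (quarter) = {f_col:.3f}")
```

The residual sum of squares is computed directly from the residuals here, which gives an independent check of the identity Residual SS = Total SS − $B_{row}$ − $B_{col}$ used in the hand calculation.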

7.6 Two-way ANOVA with Minitab

Two-way ANOVA in Minitab is almost as easy as one-way ANOVA, except that the data set to be analysed needs to be in a special format: 3 columns, each of length rc:

    Data column (c1): stack the original r × c data points column over column.
    Block column (c2): 1, 2, ..., r, 1, 2, ..., r, ... indicating the block levels of the data points in the data column.
    Treatment column (c3): 1, 1, ..., 1, 2, 2, ..., 2, ..., c, c, ..., c indicating the treatment levels of the data points in the data column.

For the previous example, the data should be prepared into 3 columns each of length 20. See the data file newYorkStock.dat.

     c1   c2   c3
    5.7    1    1
    7.2    2    1
    4.9    3    1
    4.5    4    1
    4.4    5    1
    6.0    1    2
    7.0    2    2
    4.1    3    2
    4.9    4    2
    4.2    5    2
    7.1    1    3
    6.1    2    3
    4.2    3    3
    4.5    4    3
    4.2    5    3
    6.7    1    4
    5.2    2    4
    4.4    3    4
    4.5    4    4
    3.6    5    4
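The stacking rule described for the three Minitab columns can be sketched in Python as a way of checking the layout before entering the data. This is an illustration only: the list names `data_col`, `block_col` and `treat_col` are hypothetical, and the table is the r × c array of returns with years as blocks (rows) and quarters as treatments (columns).

```python
# r x c table of returns: rows are years (blocks), columns are quarters
# (treatments).
table = [
    [5.7, 6.0, 7.1, 6.7],  # 1981
    [7.2, 7.0, 6.1, 5.2],  # 1982
    [4.9, 4.1, 4.2, 4.4],  # 1983
    [4.5, 4.9, 4.5, 4.5],  # 1984
    [4.4, 4.2, 4.2, 3.6],  # 1985
]
r, c = len(table), len(table[0])

# Stack column over column: the outer loop runs over treatments (columns),
# the inner loop over blocks (rows).
data_col = [table[i][j] for j in range(c) for i in range(r)]   # c1
block_col = [i + 1 for j in range(c) for i in range(r)]        # c2: 1..r repeated
treat_col = [j + 1 for j in range(c) for i in range(r)]        # c3: 1..1, 2..2, ...

for x, b, t in zip(data_col, block_col, treat_col):
    print(f"{x:4.1f} {b:3d} {t:3d}")
```

Each of the three lists has length rc = 20, and printing them side by side reproduces the c1, c2, c3 listing shown for the stock-return data.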

We repeat the analysis now using Minitab.

    MTB > twoway c1-c3

    Two-way ANOVA: Return versus Year, Quarter

    Source   DF      SS       MS      F      P
    Year      4  19.867  4.96675  14.85  0.000
    Quarter   3   0.602  0.20067   0.60  0.627
    Error    12   4.013  0.33442
    Total    19  24.482

    S = 0.5783   R-Sq = 83.61%   R-Sq(adj) = 74.05%

The pooled estimator for σ:

$$S = \left(\frac{\text{Residual SS}}{(r-1)(c-1)}\right)^{1/2} = (\text{Residual MS})^{1/2}.$$

$$R^2 = \frac{B_{row} + B_{col}}{\text{Total SS}}, \qquad R^2_{adj} = 1 - \frac{\text{Residual MS}}{(\text{Total SS})/(rc - 1)}.$$

Note that a two-way ANOVA in effect fits a linear model:

$$X_{ij} = \mu + \gamma_i + \beta_j + \varepsilon_{ij}.$$

Hence we may also look at the residuals:

$$\hat{\varepsilon}_{ij} = X_{ij} - \hat{\mu} - \hat{\gamma}_i - \hat{\beta}_j, \qquad i = 1, \ldots, r, \quad j = 1, \ldots, c.$$

If the assumed model (structure) is correct, $\hat{\varepsilon}_{ij}$ should behave like independent $N(0, \sigma^2)$. Minitab may produce diagnostic plots as used in regression analysis.

It may also give interval estimates for each block and treatment level.

    MTB > twoway c1-c3;
    SUBC> means c2 c3;
    SUBC> gFourpack.

                  Individual 95% CIs For Mean Based on Pooled StDev
    Year   Mean   -----+---------+---------+---------+---
    1     6.375                         (------*-----)
    2     6.375                         (------*-----)
    3     4.400      (-----*-----)
    4     4.600        (-----*-----)
    5     4.100   (-----*-----)
                  -----+---------+---------+---------+---
                     4.0       5.0       6.0       7.0

                    Individual 95% CIs For Mean Based on Pooled StDev
    Quarter  Mean   --+---------+---------+---------+------
    1        5.34             (--------------*-------------)
    2        5.24            (-------------*-------------)
    3        5.22           (-------------*--------------)
    4        4.88   (-------------*-------------)
                    --+---------+---------+---------+------
                   4.40      4.80      5.20      5.60

[Figure: Residual Plots for Return. Four panels: Normal Probability Plot of the Residuals (Percent against Residual); Residuals Versus the Fitted Values (Residual against Fitted Value); Histogram of the Residuals (Frequency against Residual); Residuals Versus the Order of the Data (Residual against Observation Order).]
