
Statistics for Business and Economics
6th Edition

Chapter 17
Analysis of Variance

Statistics for Business and Economics, 6e, 2007, Pearson Education, Inc.

Chapter Goals

After completing this chapter, you should be able to:

- Recognize situations in which to use analysis of variance
- Understand different analysis of variance designs
- Perform a one-way and two-way analysis of variance and interpret the results
- Conduct and interpret a Kruskal-Wallis test
- Analyze two-factor analysis of variance tests with more than one observation per cell

One-Way Analysis of Variance

Evaluate the difference among the means of three or more groups.

Examples:
- Average production for 1st, 2nd, and 3rd shift
- Expected mileage for five brands of tires

Assumptions:
- Populations are normally distributed
- Populations have equal variances
- Samples are randomly and independently drawn

Hypotheses of One-Way ANOVA

H0: μ1 = μ2 = μ3 = … = μK
- All population means are equal
- i.e., no variation in means between groups

H1: μi ≠ μj for at least one i, j pair
- At least one population mean is different
- i.e., there is variation between groups
- Does not mean that all population means are different (some pairs may be the same)

One-Way ANOVA

H0: μ1 = μ2 = μ3 = … = μK
H1: Not all μi are the same

When all means are the same, the null hypothesis is true (no variation between groups).

[Figure: three identical population distributions centered at the common mean μ1 = μ2 = μ3]

One-Way ANOVA (continued)

When at least one mean is different, the null hypothesis is NOT true (variation is present between groups).

[Figure: population distributions with shifted centers, e.g., μ1 = μ2 ≠ μ3, or μ1 ≠ μ2 ≠ μ3]

Variability

The variability of the data is a key factor in testing the equality of means.

In each case below, the group means may look different, but the large variation within groups in the second case makes the evidence that the means are different weak.

[Figure: two panels plotting observations by group — one with small variation within groups, one with large variation within groups]

Partitioning the Variation

Total variation can be split into two parts:

SST = SSW + SSG

- SST = Total Sum of Squares
  Total variation = the aggregate dispersion of the individual data values across the various groups
- SSW = Sum of Squares Within Groups
  Within-group variation = the dispersion that exists among the data values within a particular group
- SSG = Sum of Squares Between Groups
  Between-group variation = the dispersion between the group sample means

Partition of Total Variation

Total Sum of Squares (SST)
= Variation due to random sampling (SSW)
+ Variation due to differences between groups (SSG)

Total Sum of Squares

SST = SSW + SSG

SST = Σ(i=1..K) Σ(j=1..ni) (xij − x̄)²

Where:
- SST = total sum of squares
- K = number of groups (levels or treatments)
- ni = number of observations in group i
- xij = jth observation from group i
- x̄ = overall sample mean

Total Variation (continued)

SST = (x11 − x̄)² + (x12 − x̄)² + … + (xKnK − x̄)²

[Figure: response X plotted for Groups 1–3, with each observation's deviation measured from the overall mean x̄]

Within-Group Variation

SST = SSW + SSG

SSW = Σ(i=1..K) Σ(j=1..ni) (xij − x̄i)²

Where:
- SSW = sum of squares within groups
- K = number of groups
- ni = sample size from group i
- x̄i = sample mean from group i
- xij = jth observation in group i

Within-Group Variation (continued)

SSW = Σ(i=1..K) Σ(j=1..ni) (xij − x̄i)²

Summing the variation within each group and then adding over all groups.

Mean Square Within = SSW / degrees of freedom:

MSW = SSW / (n − K)

Within-Group Variation (continued)

SSW = (x11 − x̄1)² + (x12 − x̄1)² + … + (xKnK − x̄K)²

[Figure: response X for Groups 1–3, with deviations measured from each group's own mean x̄1, x̄2, x̄3]

Between-Group Variation

SST = SSW + SSG

SSG = Σ(i=1..K) ni (x̄i − x̄)²

Where:
- SSG = sum of squares between groups
- K = number of groups
- ni = sample size from group i
- x̄i = sample mean from group i
- x̄ = grand mean (mean of all data values)

Between-Group Variation (continued)

SSG = Σ(i=1..K) ni (x̄i − x̄)²

Variation due to differences between groups.

Mean Square Between Groups = SSG / degrees of freedom:

MSG = SSG / (K − 1)

Between-Group Variation (continued)

SSG = n1(x̄1 − x̄)² + n2(x̄2 − x̄)² + … + nK(x̄K − x̄)²

[Figure: response X for Groups 1–3, showing each group mean's deviation from the grand mean x̄]

Obtaining the Mean Squares

MST = SST / (n − 1)
MSW = SSW / (n − K)
MSG = SSG / (K − 1)
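The SST = SSW + SSG partition and the mean squares can be checked numerically. This sketch uses three small made-up groups (values chosen only for illustration):

```python
# Sketch of the SST = SSW + SSG partition on illustrative data.
groups = [[1, 2, 3], [2, 3, 4], [6, 7, 8]]

all_values = [x for g in groups for x in g]
n = len(all_values)                     # total sample size
K = len(groups)                         # number of groups
grand_mean = sum(all_values) / n
group_means = [sum(g) / len(g) for g in groups]

# Total, within-group, and between-group sums of squares
SST = sum((x - grand_mean) ** 2 for x in all_values)
SSW = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
SSG = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))

assert abs(SST - (SSW + SSG)) < 1e-9    # the partition holds

MSG = SSG / (K - 1)                     # mean square between groups
MSW = SSW / (n - K)                     # mean square within groups
print(SST, SSW, SSG, MSG, MSW)          # 48.0 6.0 42.0 21.0 1.0
```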

One-Way ANOVA Table

Source of Variation   SS              df      MS (Variance)        F ratio
Between Groups        SSG             K − 1   MSG = SSG/(K − 1)    F = MSG/MSW
Within Groups         SSW             n − K   MSW = SSW/(n − K)
Total                 SST = SSG+SSW   n − 1

K = number of groups
n = sum of the sample sizes from all groups
df = degrees of freedom

One-Factor ANOVA F Test Statistic

H0: μ1 = μ2 = … = μK
H1: At least two population means are different

Test statistic:

F = MSG / MSW

- MSG is the mean square between groups
- MSW is the mean square within groups

Degrees of freedom:
- df1 = K − 1 (K = number of groups)
- df2 = n − K (n = sum of sample sizes from all groups)

Interpreting the F Statistic

The F statistic is the ratio of the between-groups estimate of variance to the within-groups estimate of variance.

- The ratio must always be positive
- df1 = K − 1 will typically be small
- df2 = n − K will typically be large

Decision rule: Reject H0 if F > F(K−1, n−K, α)

[Figure: F distribution with the α = .05 rejection region to the right of the critical value F(K−1, n−K, α)]

One-Factor ANOVA F Test Example

You want to see if three different golf clubs yield different distances. You randomly select five measurements from trials on an automated driving machine for each club. At the .05 significance level, is there a difference in mean distance?

Club 1   Club 2   Club 3
254      234      200
263      218      222
241      235      197
237      227      206
251      216      204

One-Factor ANOVA Example: Scatter Diagram

[Figure: distance (190–270) plotted by club, with each club's mean marked]

x̄1 = 249.2   x̄2 = 226.0   x̄3 = 205.8   x̄ = 227.0

One-Factor ANOVA Example: Computations

x̄1 = 249.2   n1 = 5
x̄2 = 226.0   n2 = 5
x̄3 = 205.8   n3 = 5
x̄ = 227.0    n = 15   K = 3

SSG = 5(249.2 − 227)² + 5(226 − 227)² + 5(205.8 − 227)² = 4716.4
SSW = (254 − 249.2)² + (263 − 249.2)² + … + (204 − 205.8)² = 1119.6

MSG = 4716.4 / (3 − 1) = 2358.2
MSW = 1119.6 / (15 − 3) = 93.3

F = 2358.2 / 93.3 = 25.275
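The hand calculations above can be reproduced in a few lines of Python (a sketch using the same golf-club data):

```python
# Reproducing the golf-club one-way ANOVA computations.
clubs = {
    "Club 1": [254, 263, 241, 237, 251],
    "Club 2": [234, 218, 235, 227, 216],
    "Club 3": [200, 222, 197, 206, 204],
}

all_x = [x for data in clubs.values() for x in data]
n, K = len(all_x), len(clubs)
grand = sum(all_x) / n                 # overall mean, 227.0

# Between-group and within-group sums of squares
SSG = sum(len(d) * (sum(d) / len(d) - grand) ** 2 for d in clubs.values())
SSW = sum((x - sum(d) / len(d)) ** 2 for d in clubs.values() for x in d)

MSG = SSG / (K - 1)                    # 4716.4 / 2  = 2358.2
MSW = SSW / (n - K)                    # 1119.6 / 12 = 93.3
F = MSG / MSW                          # ≈ 25.275, far above F(2,12,.05) = 3.89
print(round(SSG, 1), round(SSW, 1), round(F, 3))
```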

One-Factor ANOVA Example: Solution

H0: μ1 = μ2 = μ3
H1: μi not all equal
α = .05
df1 = 2, df2 = 12

Critical value: F(2, 12, .05) = 3.89

Test statistic:

F = MSG / MSW = 2358.2 / 93.3 = 25.275

[Figure: F distribution with rejection region beyond F(2, 12, .05) = 3.89; F = 25.275 falls well inside it]

Decision: Reject H0 at α = 0.05
Conclusion: There is evidence that at least one μi differs from the rest

ANOVA — Single Factor: Excel Output

EXCEL: tools | data analysis | ANOVA: single factor

SUMMARY
Groups   Count   Sum    Average   Variance
Club 1   5       1246   249.2     108.2
Club 2   5       1130   226       77.5
Club 3   5       1029   205.8     94.2

ANOVA
Source of Variation   SS       df   MS       F        P-value    F crit
Between Groups        4716.4   2    2358.2   25.275   4.99E-05   3.89
Within Groups         1119.6   12   93.3
Total                 5836.0   14

Kruskal-Wallis Test

Use when the normality assumption for one-way ANOVA is violated.

Assumptions:
- The samples are random and independent
- Variables have a continuous distribution
- The data can be ranked
- Populations have the same variability
- Populations have the same shape

Kruskal-Wallis Test Procedure

- Obtain relative rankings for each value
  - In the event of a tie, each of the tied values gets the average rank
- Sum the rankings for the data from each of the K groups
- Compute the Kruskal-Wallis test statistic
- Evaluate using the chi-square distribution with K − 1 degrees of freedom

Kruskal-Wallis Test Procedure (continued)

The Kruskal-Wallis test statistic (chi-square with K − 1 degrees of freedom):

W = [12 / (n(n + 1))] Σ(i=1..K) (Ri² / ni) − 3(n + 1)

where:
- n = sum of sample sizes in all groups
- K = number of samples
- Ri = sum of ranks in the ith group
- ni = size of the ith group
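The ranking step, with tied values receiving the average rank, can be sketched as follows. The `average_ranks` helper is a hypothetical illustration, not part of the text:

```python
# Assign ranks (1 = smallest); tied values share the average of the
# ranks they would jointly occupy. Hypothetical helper for illustration.
def average_ranks(values):
    sorted_vals = sorted(values)
    rank_of = {}
    for v in set(values):
        first = sorted_vals.index(v) + 1   # first 1-based position of v
        count = sorted_vals.count(v)       # how many times v appears
        rank_of[v] = first + (count - 1) / 2   # average of the tied positions
    return [rank_of[v] for v in values]

print(average_ranks([10, 20, 20, 30]))   # [1.0, 2.5, 2.5, 4.0]
```

The two tied 20s would occupy ranks 2 and 3, so each receives the average rank 2.5.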

Kruskal-Wallis Test Procedure (continued)

Complete the test by comparing the calculated W value to a critical χ² value from the chi-square distribution with K − 1 degrees of freedom.

Decision rule:
- Reject H0 if W > χ²(K−1, α)
- Otherwise do not reject H0

[Figure: chi-square distribution with the rejection region to the right of χ²(K−1, α)]

Kruskal-Wallis Example

Do different departments have different class sizes?

Class size   Class size     Class size
(Math, M)    (English, E)   (Biology, B)
23           55             30
45           60             40
54           72             18
78           45             34
66           70             44

Kruskal-Wallis Example (continued)

Class size   Ranking   Class size     Ranking   Class size     Ranking
(Math, M)              (English, E)             (Biology, B)
23           2         55             10        30             3
45           6         60             11        40             5
54           9         72             14        18             1
78           15        45             8         34             4
66           12        70             13        44             7
             Σ = 44                   Σ = 56                   Σ = 20

Kruskal-Wallis Example (continued)

H0: MeanM = MeanE = MeanB
H1: Not all population means are equal

The W statistic is:

W = [12 / (n(n + 1))] Σ(i=1..K) (Ri² / ni) − 3(n + 1)
  = [12 / (15 × 16)] (44²/5 + 56²/5 + 20²/5) − 3(16)
  = 6.72
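A quick numerical check of the worked example, plugging the rank sums from the table above into the W formula:

```python
# Kruskal-Wallis W statistic from the class-size example's rank sums.
rank_sums = [44, 56, 20]      # Math, English, Biology
sizes = [5, 5, 5]
n = sum(sizes)                # 15 observations in total

W = (12 / (n * (n + 1))) * sum(R ** 2 / ni for R, ni in zip(rank_sums, sizes)) \
    - 3 * (n + 1)
print(round(W, 2))            # 6.72, which exceeds the χ²(2, .05) cutoff 5.991
```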

Kruskal-Wallis Example (continued)

Compare W = 6.72 to the critical value from the chi-square distribution for 3 − 1 = 2 degrees of freedom and α = .05:

χ²(2, 0.05) = 5.991

Since W = 6.72 > χ²(2, 0.05) = 5.991, reject H0.

There is sufficient evidence to reject the claim that the population means are all equal.

Two-Way Analysis of Variance

Examines the effect of:
- Two factors of interest on the dependent variable
  - e.g., percent carbonation and line speed on a soft drink bottling process
- Interaction between the different levels of these two factors
  - e.g., does the effect of one particular carbonation level depend on the level at which the line speed is set?

Two-Way ANOVA (continued)

Assumptions:
- Populations are normally distributed
- Populations have equal variances
- Independent random samples are drawn

Randomized Block Design

Two factors of interest: A and B
- K = number of groups of factor A
- H = number of levels of factor B (sometimes called a blocking variable)

Block   Group 1   Group 2   …   Group K
1       x11       x21       …   xK1
2       x12       x22       …   xK2
…       …         …         …   …
H       x1H       x2H       …   xKH

Two-Way Notation

- Let xji denote the observation in the jth group and ith block
- Suppose that there are K groups and H blocks, for a total of n = KH observations
- Let the overall mean be x̄
- Denote the group sample means by x̄j (j = 1, 2, …, K)
- Denote the block sample means by x̄i (i = 1, 2, …, H)

Partition of Total Variation

SST = SSG + SSB + SSE

Total Sum of Squares (SST)
= Variation due to differences between groups (SSG)
+ Variation due to differences between blocks (SSB)
+ Variation due to random sampling / unexplained error (SSE)

The error terms are assumed to be independent, normally distributed, and to have the same variance.

Two-Way Sums of Squares

The sums of squares are:

                                                               Degrees of Freedom:
Total:            SST = Σ(j=1..K) Σ(i=1..H) (xji − x̄)²             n − 1
Between groups:   SSG = H Σ(j=1..K) (x̄j − x̄)²                      K − 1
Between blocks:   SSB = K Σ(i=1..H) (x̄i − x̄)²                      H − 1
Error:            SSE = Σ(j=1..K) Σ(i=1..H) (xji − x̄j − x̄i + x̄)²   (K − 1)(H − 1)

Two-Way Mean Squares

The mean squares are:

MST = SST / (n − 1)
MSG = SSG / (K − 1)
MSB = SSB / (H − 1)
MSE = SSE / ((K − 1)(H − 1))

Two-Way ANOVA: The F Test Statistic

H0: The K population group means are all the same
F test for groups: F = MSG / MSE — reject H0 if F > F(K−1, (K−1)(H−1), α)

H0: The H population block means are all the same
F test for blocks: F = MSB / MSE — reject H0 if F > F(H−1, (K−1)(H−1), α)
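The randomized-block sums of squares and F ratios can be sketched numerically; the 3-group, 2-block data here are invented purely for illustration:

```python
# Randomized-block two-way ANOVA on illustrative data (K=3 groups, H=2 blocks).
x = [[10, 14],      # group 1, blocks 1..2
     [12, 18],      # group 2
     [20, 22]]      # group 3
K, H = len(x), len(x[0])
n = K * H
grand = sum(sum(row) for row in x) / n

group_mean = [sum(row) / H for row in x]
block_mean = [sum(x[j][i] for j in range(K)) / K for i in range(H)]

SST = sum((x[j][i] - grand) ** 2 for j in range(K) for i in range(H))
SSG = H * sum((m - grand) ** 2 for m in group_mean)
SSB = K * sum((m - grand) ** 2 for m in block_mean)
SSE = sum((x[j][i] - group_mean[j] - block_mean[i] + grand) ** 2
          for j in range(K) for i in range(H))

assert abs(SST - (SSG + SSB + SSE)) < 1e-9   # the partition holds

MSE = SSE / ((K - 1) * (H - 1))
F_groups = (SSG / (K - 1)) / MSE
F_blocks = (SSB / (H - 1)) / MSE
print(SSG, SSB, SSE, F_groups, F_blocks)     # 84.0 24.0 4.0 21.0 12.0
```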

General Two-Way Table Format

Source of Variation   Sum of Squares   Degrees of Freedom   Mean Squares                   F Ratio
Between groups        SSG              K − 1                MSG = SSG / (K − 1)            MSG / MSE
Between blocks        SSB              H − 1                MSB = SSB / (H − 1)            MSB / MSE
Error                 SSE              (K − 1)(H − 1)       MSE = SSE / ((K − 1)(H − 1))
Total                 SST              n − 1

More than One Observation per Cell

A two-way design with more than one observation per cell allows one further source of variation: the interaction between groups and blocks can also be identified.

Let:
- K = number of groups
- H = number of blocks
- L = number of observations per cell
- n = KHL = total number of observations

More than One Observation per Cell (continued)

SST = SSG + SSB + SSI + SSE

                                                              Degrees of Freedom:
SST: total variation                                          n − 1
SSG: between-group variation                                  K − 1
SSB: between-block variation                                  H − 1
SSI: variation due to interaction between groups and blocks   (K − 1)(H − 1)
SSE: random variation (error)                                 KH(L − 1)

Sums of Squares with Interaction

                                                                Degrees of Freedom:
Total:            SST = Σj Σi Σl (xjil − x̄)²                        n − 1
Between groups:   SSG = HL Σ(j=1..K) (x̄j − x̄)²                      K − 1
Between blocks:   SSB = KL Σ(i=1..H) (x̄i − x̄)²                      H − 1
Interaction:      SSI = L Σj Σi (x̄ji − x̄j − x̄i + x̄)²                (K − 1)(H − 1)
Error:            SSE = Σj Σi Σl (xjil − x̄ji)²                      KH(L − 1)

Two-Way Mean Squares with Interaction

The mean squares are:

MST = SST / (n − 1)
MSG = SSG / (K − 1)
MSB = SSB / (H − 1)
MSI = SSI / ((K − 1)(H − 1))
MSE = SSE / (KH(L − 1))

Two-Way ANOVA: The F Test Statistic

H0: The K population group means are all the same
F test for group effect: F = MSG / MSE — reject H0 if F > F(K−1, KH(L−1), α)

H0: The H population block means are all the same
F test for block effect: F = MSB / MSE — reject H0 if F > F(H−1, KH(L−1), α)

H0: The interaction of groups and blocks is equal to zero
F test for interaction effect: F = MSI / MSE — reject H0 if F > F((K−1)(H−1), KH(L−1), α)

Two-Way ANOVA Summary Table

Source of Variation   Sum of Squares   Degrees of Freedom   Mean Squares                   F Statistic
Between groups        SSG              K − 1                MSG = SSG / (K − 1)            MSG / MSE
Between blocks        SSB              H − 1                MSB = SSB / (H − 1)            MSB / MSE
Interaction           SSI              (K − 1)(H − 1)       MSI = SSI / ((K − 1)(H − 1))   MSI / MSE
Error                 SSE              KH(L − 1)            MSE = SSE / (KH(L − 1))
Total                 SST              n − 1

Features of Two-Way ANOVA F Test

- Degrees of freedom always add up:
  n − 1 = KHL − 1 = (K − 1) + (H − 1) + (K − 1)(H − 1) + KH(L − 1)
  Total = groups + blocks + interaction + error
- The denominator of the F test is always the same (MSE), but the numerator differs
- The sums of squares always add up:
  SST = SSG + SSB + SSI + SSE
  Total = groups + blocks + interaction + error
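The degrees-of-freedom identity above can be spot-checked for a few arbitrary design sizes:

```python
# Verify n - 1 = (K-1) + (H-1) + (K-1)(H-1) + KH(L-1) for several designs.
for K, H, L in [(2, 2, 2), (3, 4, 5), (4, 2, 3)]:
    n = K * H * L
    parts = (K - 1) + (H - 1) + (K - 1) * (H - 1) + K * H * (L - 1)
    assert n - 1 == parts   # total df = groups + blocks + interaction + error
print("df identity holds")
```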

Examples: Interaction vs. No Interaction

[Figure: two panels of mean response vs. group for block levels 1–3. In the no-interaction panel the block lines are parallel; in the interaction panel the lines cross or diverge.]

Chapter Summary

- Described one-way analysis of variance
  - The logic of analysis of variance
  - Analysis of variance assumptions
  - F test for difference in K means
- Applied the Kruskal-Wallis test when the populations are not known to be normal
- Described two-way analysis of variance
  - Examined effects of multiple factors
  - Examined interaction between factors
