
Business Administration Department

Page 1 of 12
Quantitative Techniques 2
Chapter 6
Chapter Six: Analysis of Variance

When you have completed this chapter, you will be able to:
- list the characteristics of the F distribution;
- conduct a test of hypothesis to determine whether the variances of two populations are equal;
- discuss the general idea of analysis of variance;
- organize data into a one-way ANOVA table;
- conduct a test of hypothesis among three or more treatment means;
- appreciate the application of MS-Excel in ANOVA tests.

Reference(s): Lind Chapter 12
Exercise(s): Tutorial Exercise 9


Introduction

ANalysis Of VAriance (ANOVA) is a technique for comparing the means of several
populations, using samples drawn from those populations. An experiment that tests for
differences between the means of several levels of a single variable involves a one-way
classification, and one-way classification data are analysed using one-way ANOVA.

More complex experiments involving two or more classification variables are called two-way
and multi-way classification designs, and are analysed using two-way and multi-way ANOVA.

The probability distribution used in ANOVA is the F distribution.

Characteristics of the F distribution:

1. There is a family of F distributions

Each member of the family is determined by two
parameters: the numerator degrees of freedom
and the denominator degrees of freedom. Note
that the shape of the curve changes as the
degrees of freedom change.

2. The F distribution is continuous and cannot be negative.

As F increases towards infinity, the curve approaches the X-axis but never touches it
(it is asymptotic).

3. The F distribution is positively skewed.

As the numbers of degrees of freedom in both the numerator and the denominator increase,
the distribution approaches a normal distribution.
[Figure: F distribution curves for df = (6,6), (19,6) and (29,28)]
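Characteristics 2 and 3 can be checked numerically. The sketch below is a minimal, standard-library-only Python version of the F density; the helper names `f_pdf`, `f_mean` and `f_mode` are our own, and the mean $d_2/(d_2-2)$ and mode $\frac{d_1-2}{d_1}\cdot\frac{d_2}{d_2+2}$ are standard results that these notes do not state. For each df pair in the figure, the peak (mode) lies to the left of the mean, which is what a positively skewed curve looks like, and the gap between them narrows as the degrees of freedom grow.

```python
import math


def f_pdf(x, d1, d2):
    """Density of the F distribution with d1 (numerator) and d2 (denominator) df."""
    if x <= 0:
        return 0.0  # F is a ratio of variances, so it takes only positive values
    log_beta = math.lgamma(d1 / 2) + math.lgamma(d2 / 2) - math.lgamma((d1 + d2) / 2)
    log_density = (
        (d1 / 2) * math.log(d1 / d2)
        + (d1 / 2 - 1) * math.log(x)
        - ((d1 + d2) / 2) * math.log(1 + d1 * x / d2)
        - log_beta
    )
    return math.exp(log_density)


def f_mean(d2):
    """Mean of the F distribution; depends only on d2 and needs d2 > 2."""
    return d2 / (d2 - 2)


def f_mode(d1, d2):
    """Location of the peak of the F density; needs d1 > 2."""
    return (d1 - 2) / d1 * d2 / (d2 + 2)


for d1, d2 in [(6, 6), (19, 6), (29, 28)]:
    # A peak to the left of the mean indicates a long right tail (positive skew)
    print(f"df=({d1},{d2}): mode={f_mode(d1, d2):.3f}, mean={f_mean(d2):.3f}")
```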

Analysis of Variance (ANOVA) is used:

1. to compare the variances of two populations (refer to QT2 Chapter 4 Example 12);
2. to test the equality of the means of three or more populations.

Comparing Two Population Variances

The F distribution is used to test the hypothesis that the variance of one normal population
equals the variance of another normal population.

e.g.1 The mean rates of return on two types of common stock may be the same, but there
may be more variation in the rate of return of one than of the other. Samples of 10
IT stocks and 10 utility stocks are selected to test whether the variations in their
returns are equal.
$H_0: \sigma_1^2 = \sigma_2^2$
$H_1: \sigma_1^2 \neq \sigma_2^2$ (two-tailed test)

To conduct the test, we select a random sample of $n_1$ observations from one population,
and a sample of $n_2$ observations from the second population. The test statistic is:

$$F = \frac{s_1^2}{s_2^2}$$

where
$s_1^2$ = sample variance from the sample of $n_1$ observations
$s_2^2$ = sample variance from the sample of $n_2$ observations

df for numerator = $n_1 - 1$, df for denominator = $n_2 - 1$
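The statistic and its degrees of freedom are simple to compute. Below is a minimal Python sketch; the function name `variance_ratio_test` is our own, it takes sample standard deviations and squares them, and it leaves the critical value to the F-table because Python's standard library has no F quantile function.

```python
def variance_ratio_test(s1, n1, s2, n2):
    """F statistic s1^2 / s2^2 with its numerator and denominator degrees of freedom.

    s1, s2 are sample standard deviations; n1, n2 are the sample sizes.
    """
    f_stat = s1 ** 2 / s2 ** 2
    return f_stat, n1 - 1, n2 - 1


# Figures from e.g.2: s1 = 3.9 (n1 = 10 IT stocks), s2 = 3.5 (n2 = 8 utility stocks)
f_stat, df1, df2 = variance_ratio_test(3.9, 10, 3.5, 8)
f_crit = 3.68  # F(9, 7, 0.05, upper), read from the F-table, not computed here
print(f"F = {f_stat:.4f}, df = ({df1}, {df2}), reject H0: {f_stat > f_crit}")
```

With the e.g.2 figures this gives F = 1.2416 on (9, 7) degrees of freedom, below the table value, so $H_0$ is not rejected.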

e.g.2 The mean rate of return on a sample of 10 IT stocks was 12.6% with a standard deviation of
3.9%. The mean rate of return on a sample of 8 utility stocks was 10.9% with a
standard deviation of 3.5%.
At the 0.05 significance level, can we conclude that there is more variation in the rate
of return of IT stocks?

Sol: $n_1 = 10$, $\bar{X}_1 = 12.6$, $s_1 = 3.9$; $n_2 = 8$, $\bar{X}_2 = 10.9$, $s_2 = 3.5$

$H_0: \sigma_1^2 \le \sigma_2^2$
$H_1: \sigma_1^2 > \sigma_2^2$ (upper one-tailed test)

Test statistic: $F = \dfrac{s_1^2}{s_2^2} = \dfrac{3.9^2}{3.5^2} = 1.2416$
$df_1 = n_1 - 1 = 10 - 1 = 9$, $df_2 = n_2 - 1 = 8 - 1 = 7$

Critical value: $F_{9,7,0.05,\text{upper}} = 3.68$ (from the F-table with 5% tail area)
Decision rule: Reject $H_0$ if $F > F_{9,7,0.05,\text{upper}}$.

Since $F = 1.2416 < 3.68 = F_{9,7,0.05,\text{upper}}$, do not reject $H_0$ at the 5% significance level.

Conclusion: There is insufficient evidence to show more variation in the rate of return of IT stocks
at the 5% significance level.


e.g.3 David goes to IVE(CW) every day by either bus or MTR. He wants to compare the
travelling times taken by the two methods. Data (in minutes) are collected as
follows:

By Bus 52 67 56 45 70 54 64
By MTR 59 60 61 51 56 63 57 65

Using the 0.1 significance level, is there a difference in the variation of the
travelling times between the two methods?

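Sol: By Bus: $\bar{X}_1 = \dfrac{\sum X_1}{n_1} = \dfrac{408}{7} = 58.29$,
$s_1 = \sqrt{\dfrac{\sum (X_1 - \bar{X}_1)^2}{n_1 - 1}} = \sqrt{\dfrac{485.43}{7 - 1}} = 8.9947$

By MTR: $\bar{X}_2 = \dfrac{\sum X_2}{n_2} = \dfrac{472}{8} = 59$,
$s_2 = \sqrt{\dfrac{\sum (X_2 - \bar{X}_2)^2}{n_2 - 1}} = \sqrt{\dfrac{134}{8 - 1}} = 4.3753$

$s_1 = 8.9947$, $df_1 = 6$; $s_2 = 4.3753$, $df_2 = 7$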

$H_0: \sigma_1^2 = \sigma_2^2$ vs $H_1: \sigma_1^2 \neq \sigma_2^2$ (two-tailed test)

Test statistic: $F = \dfrac{s_1^2}{s_2^2} = \dfrac{8.9947^2}{4.3753^2} = 4.23$

Upper critical value: $F_{n_1-1,\,n_2-1,\,\alpha/2,\text{upper}} = F_{6,7,0.05,\text{upper}} = 3.87$

Lower critical value: $F_{n_1-1,\,n_2-1,\,\alpha/2,\text{lower}} = F_{6,7,0.05,\text{lower}} = \dfrac{1}{F_{7,6,0.05,\text{upper}}} = \dfrac{1}{4.21} = 0.2375$
(refer to QT2, unit 4, example 12, page 10)
(From F-table with 5% tail area)

Since $F = 4.23 > 3.87 = F_{6,7,0.05,\text{upper}}$, reject $H_0$ at the 10% significance level.

Conclusion: There is a difference in the variation of the travelling times between
the two travelling methods at the 10% significance level.
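The e.g.3 calculation can be reproduced from the raw travelling times. Below is a minimal Python sketch using `statistics.stdev` (sample standard deviation with divisor n - 1). The two critical values are the F-table figures $F_{6,7,0.05} = 3.87$ and $F_{7,6,0.05} = 4.21$, typed in rather than computed.

```python
import statistics

bus = [52, 67, 56, 45, 70, 54, 64]
mtr = [59, 60, 61, 51, 56, 63, 57, 65]

s1 = statistics.stdev(bus)  # sample standard deviation, divisor n - 1
s2 = statistics.stdev(mtr)
f_stat = s1 ** 2 / s2 ** 2
df1, df2 = len(bus) - 1, len(mtr) - 1

# Two-tailed test at alpha = 0.10, so each tail holds 5% (values from the F-table)
upper = 3.87        # F(6, 7, 0.05)
lower = 1 / 4.21    # 1 / F(7, 6, 0.05)

reject = f_stat > upper or f_stat < lower
print(f"s1={s1:.4f}, s2={s2:.4f}, F={f_stat:.2f}, df=({df1},{df2}), reject H0: {reject}")
```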


Class Exercise 1

The following hypotheses are given:

$H_0: \sigma_1^2 \le \sigma_2^2$ vs $H_1: \sigma_1^2 > \sigma_2^2$

A random sample of five observations from the first population showed a
standard deviation of 12. A random sample of seven observations from the second
population showed a standard deviation of 7.
At the 0.01 significance level, test whether the variance of population 1 is greater
than that of population 2.



Testing the Equality of Means of Three or More Populations

ANOVA is also used for testing whether three or more population means are equal. It
requires the following necessary assumptions:
1. Samples are selected randomly and independently from the populations;
2. All populations are normally distributed; and
3. The population variances are equal.

The null hypothesis and the alternative hypothesis are:
$H_0: \mu_1 = \mu_2 = \mu_3 = \dots = \mu_k$
$H_1$: Not all population means are equal (???-tailed test)


Suppose that you want to compare the means of three populations, namely X, Y and Z, and
a sample is randomly selected from each population. The sample values and the
respective sample means for two sample sets are shown below:

[Figure: dot plots of the x, y and z samples and their means $\bar{x}$, $\bar{y}$, $\bar{z}$ for sample set 1 and sample set 2]

It can be seen that the means of the two x-samples are equal, and the same applies to the y- and
z-samples.

We can be much more confident in saying that the three population means differ for the second
sample set than for the first. This is because the variation within
each sample in the second sample set is smaller than the variation within each sample in
the first sample set.

The above figure shows that the larger the variance within samples (groups), the more
likely it is that the population means are equal.


Similarly, consider the following sets of samples:

[Figure: dot plots of the x, y and z samples and their means $\bar{x}$, $\bar{y}$, $\bar{z}$ for sample set 1 and sample set 2]

In this case, the variation of the x-sample data in sample set 1 is the same as that in sample set
2, and the same applies to the y- and z-samples.

We can be much more confident that the three population means differ for the first sample set
than for the second, because the differences (variation) between the sample means
$\bar{x}$, $\bar{y}$ and $\bar{z}$ in sample set 1 are greater than those in sample set 2.

The above figure shows that the larger the variance between samples (groups), the less
likely it is that the population means are equal.

If the above two variances are considered together as a ratio:
$$\frac{\text{Variance between groups}}{\text{Variance within groups}}$$

We may conclude that:
The larger the fraction, the more likely the population means are NOT equal;
The smaller the fraction, the more likely the population means are equal.

Thus, we can determine the equality of the population means by analysing their variances.

Moreover, it is an upper one-tailed test because the null hypothesis that the population
means are equal will only be rejected if the fraction (test statistic) is greater than the
critical value.

Suppose that there are $k$ treatments ($i = 1, \dots, k$) and that the $i$th treatment has $n_i$
observations.

$X_{ij}$ is the $j$th observation on the $i$th treatment.

Treatment 1: $X_{11}, X_{12}, \dots, X_{1n_1}$   Mean $\bar{x}_1$
Treatment 2: $X_{21}, X_{22}, \dots, X_{2n_2}$   Mean $\bar{x}_2$
Treatment 3: $X_{31}, X_{32}, \dots, X_{3n_3}$   Mean $\bar{x}_3$
. . .
Treatment k: $X_{k1}, X_{k2}, \dots, X_{kn_k}$   Mean $\bar{x}_k$

Total number of observations, $n = n_1 + n_2 + \dots + n_k$


For the within groups variation
The within groups sum of squares, called the residual sum of squares (or error sum of
squares), is
$$SS_R = \sum_{j=1}^{n_1}(x_{1j}-\bar{x}_1)^2 + \sum_{j=1}^{n_2}(x_{2j}-\bar{x}_2)^2 + \dots + \sum_{j=1}^{n_k}(x_{kj}-\bar{x}_k)^2 = \sum_{i=1}^{k}\sum_{j=1}^{n_i}(x_{ij}-\bar{x}_i)^2$$
$$= (n_1-1)s_1^2 + (n_2-1)s_2^2 + (n_3-1)s_3^2 + \dots + (n_k-1)s_k^2 = \sum_{i=1}^{k}(n_i-1)s_i^2$$

Degrees of freedom = $n - k$
The residual mean square $= MS_R = \dfrac{SS_R}{n-k}$

This is the within groups estimate of variance.
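The two forms of $SS_R$ above should always agree. The minimal sketch below uses Python's standard `statistics` module to check this on the treatment results of e.g.4, reused here only as test data: one expression sums squared deviations from each treatment mean, the other pools the sample variances as $\sum (n_i - 1)s_i^2$.

```python
import math
import statistics

# Treatment results from e.g.4, used here only to check the identity
groups = [
    [12, 6, 5, 7, 10],
    [10, 15, 14, 13, 12, 12, 15],
    [3, 2, 7, 8, 3, 1],
    [7, 8, 7, 10],
    [16, 18, 21, 19, 21],
]

# Form 1: squared deviations of every observation from its own treatment mean
ss_r_direct = sum(sum((x - statistics.mean(g)) ** 2 for x in g) for g in groups)

# Form 2: pooled sample variances, sum of (n_i - 1) * s_i^2
ss_r_pooled = sum((len(g) - 1) * statistics.variance(g) for g in groups)

n, k = sum(len(g) for g in groups), len(groups)
ms_r = ss_r_direct / (n - k)  # within groups estimate of variance
print(ss_r_direct, ss_r_pooled, math.isclose(ss_r_direct, ss_r_pooled), n - k, round(ms_r, 4))
```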


For the between groups variation
Overall mean, $\bar{x} = \dfrac{\sum_{i=1}^{k}\sum_{j=1}^{n_i} x_{ij}}{n} = \dfrac{n_1\bar{x}_1 + n_2\bar{x}_2 + \dots + n_k\bar{x}_k}{n}$

The between groups sum of squares,
$$SS_B = n_1(\bar{x}_1-\bar{x})^2 + n_2(\bar{x}_2-\bar{x})^2 + \dots + n_k(\bar{x}_k-\bar{x})^2 = \sum_{i=1}^{k} n_i(\bar{x}_i-\bar{x})^2$$

Degrees of freedom = $k - 1$
The between groups mean square, $MS_B = \dfrac{SS_B}{k-1}$

This is the between groups estimate of variance.

The test statistic, F, is calculated as $F = \dfrac{MS_B}{MS_R}$, and this is referred to the F distribution
with $k-1$ and $n-k$ degrees of freedom, $F_{k-1,\,n-k}$.

The critical value is $F_{k-1,\,n-k,\,\alpha}$.
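The formulas above translate directly into a few lines of code. The sketch below is a minimal, standard-library Python version; the function name `one_way_anova` and its dictionary keys are our own. It computes $SS_B$ from the overall mean, $SS_R$ from the treatment means, then the mean squares and F; the critical value still comes from the F-table. Applied to the slimming data of e.g.4, it gives $SS_B \approx 714.6667$, $SS_R = 118$ and $F \approx 33.31$.

```python
def one_way_anova(groups):
    """Sums of squares, degrees of freedom, mean squares and F for k samples."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    means = [sum(g) / len(g) for g in groups]
    grand_mean = sum(sum(g) for g in groups) / n  # overall mean x-bar

    # Between groups: n_i * (treatment mean - overall mean)^2
    ss_b = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within groups: squared deviations from each treatment's own mean
    ss_r = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)

    df_b, df_r = k - 1, n - k
    ms_b, ms_r = ss_b / df_b, ss_r / df_r
    return {"SS_B": ss_b, "SS_R": ss_r, "df_B": df_b, "df_R": df_r,
            "MS_B": ms_b, "MS_R": ms_r, "F": ms_b / ms_r}


groups = [[12, 6, 5, 7, 10], [10, 15, 14, 13, 12, 12, 15], [3, 2, 7, 8, 3, 1],
          [7, 8, 7, 10], [16, 18, 21, 19, 21]]
result = one_way_anova(groups)
print({name: round(value, 4) for name, value in result.items()})
```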


The ANOVA Table

Source of Variation          Sum of Squares (SS)               Degrees of Freedom  Mean Square (MS)      F
Treatments (Between groups)  $SS_B$                            $k-1$               $MS_B = SS_B/(k-1)$   $MS_B/MS_R$
Errors (Within groups)       $SS_R$                            $n-k$               $MS_R = SS_R/(n-k)$
Total                        $SS(\text{total}) = SS_B + SS_R$  $n-1$
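Given only the two sums of squares, k and n, every other cell of the table follows. Below is a minimal Python sketch; the function names `anova_table` and `print_table` are our own, and the example inputs $SS_B = 714.6667$, $SS_R = 118$, $k = 5$, $n = 27$ are the values for the slimming data of e.g.4.

```python
def anova_table(ss_b, ss_r, k, n):
    """Rows (source, SS, df, MS, F) of a one-way ANOVA table."""
    df_b, df_r = k - 1, n - k
    ms_b, ms_r = ss_b / df_b, ss_r / df_r
    return [
        ("Treatments (Between groups)", ss_b, df_b, ms_b, ms_b / ms_r),
        ("Errors (Within groups)", ss_r, df_r, ms_r, None),
        ("Total", ss_b + ss_r, n - 1, None, None),  # df_b + df_r = n - 1
    ]


def print_table(rows):
    print(f"{'Source':<30}{'SS':>10}{'df':>5}{'MS':>10}{'F':>8}")
    for source, ss, df, ms, f in rows:
        ms_txt = f"{ms:.4f}" if ms is not None else ""
        f_txt = f"{f:.2f}" if f is not None else ""
        print(f"{source:<30}{ss:>10.4f}{df:>5}{ms_txt:>10}{f_txt:>8}")


rows = anova_table(714.6667, 118.0, k=5, n=27)
print_table(rows)
```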



e.g. 4 There are five treatments for reducing body weight. At the 5% significance level,
test whether there is any significant difference between the five treatments.
Each treatment is given to a different randomly chosen sample of people who
undertake the slimming exercise. The results, using suitable units, are as follows.
Treatment Results
1 12, 6, 5, 7, 10
2 10, 15, 14, 13, 12, 12, 15
3 3, 2, 7, 8, 3, 1
4 7, 8, 7, 10
5 16, 18, 21, 19, 21

Sol: $H_0: \mu_1 = \mu_2 = \mu_3 = \mu_4 = \mu_5$
$H_1$: Not all population means are equal

Treatment  Results                     Total  Average            Standard Deviation
1          12, 6, 5, 7, 10             40     $\bar{x}_1 = 8$    $s_1 = 2.9155$
2          10, 15, 14, 13, 12, 12, 15  91     $\bar{x}_2 = 13$   $s_2 = 1.8257$
3          3, 2, 7, 8, 3, 1            24     $\bar{x}_3 = 4$    $s_3 = 2.8284$
4          7, 8, 7, 10                 32     $\bar{x}_4 = 8$    $s_4 = 1.4142$
5          16, 18, 21, 19, 21          95     $\bar{x}_5 = 19$   $s_5 = 2.1213$

For these data, $k = 5$, $n_1 = 5$, $n_2 = 7$, $n_3 = 6$, $n_4 = 4$, $n_5 = 5$ and $n = 27$.

The treatment means are shown in the above table, and the overall mean is
$\bar{x} = 282/27 = 10.44$.




The necessary assumptions for this analysis are:
1) the samples were obtained randomly and independently from each of the five
treatments;
2) the results for each treatment follow a normal distribution; and
3) the population variances are equal.



For the between groups variation
$SS_B = n_1(\bar{x}_1-\bar{x})^2 + n_2(\bar{x}_2-\bar{x})^2 + \dots + n_k(\bar{x}_k-\bar{x})^2$
$= 5(8-10.44)^2 + 7(13-10.44)^2 + 6(4-10.44)^2 + 4(8-10.44)^2 + 5(19-10.44)^2$
$= 714.6667$

For the within groups variation
$SS_R = \sum_{j=1}^{n_1}(x_{1j}-\bar{x}_1)^2 + \sum_{j=1}^{n_2}(x_{2j}-\bar{x}_2)^2 + \dots + \sum_{j=1}^{n_5}(x_{5j}-\bar{x}_5)^2$
$= (n_1-1)s_1^2 + (n_2-1)s_2^2 + (n_3-1)s_3^2 + (n_4-1)s_4^2 + (n_5-1)s_5^2$
$= (5-1)\times 2.9155^2 + (7-1)\times 1.8257^2 + (6-1)\times 2.8284^2 + (4-1)\times 1.4142^2 + (5-1)\times 2.1213^2$
$= 117.9984$ (exactly 118 before the standard deviations are rounded)

SS(total) $= SS_B + SS_R = 714.6667 + 117.9984 = 832.6651$

The ANOVA table for this analysis follows:

Source                      SS                 df          MS                             F
Treatment (Between groups)  $SS_B = 714.6667$  $k-1 = 4$   $MS_B = 714.6667/4 = 178.67$   $MS_B/MS_R = 178.67/5.36 = 33.33$
Error (Within groups)       $SS_R = 117.9984$  $n-k = 22$  $MS_R = 117.9984/22 = 5.36$
Total                       832.6651           26

Test statistic: $F = 33.33$
Critical value: $F_{4,22,0.05,\text{upper}} = 2.82$ (from the F-table with 5% tail area)

Decision rule: Reject $H_0$ if $F > F_{4,22,0.05,\text{upper}}$.

Since 33.33 > 2.82, reject $H_0$ at the 5% significance level.

Conclusion: At the 5% significance level, there is a significant difference between the treatments.




e.g. 5 Redo example 4 with the function Tools/Data Analysis/Anova: Single Factor in
MS-Excel.

The computer output is shown below:

Anova: Single Factor

SUMMARY
Groups    Count  Sum  Average  Variance
Column 1  5      40   8        8.5
Column 2  7      91   13       3.333333333
Column 3  6      24   4        8
Column 4  4      32   8        2
Column 5  5      95   19       4.5

ANOVA
Source of Variation  SS           df  MS           F            P-value      F crit
Between Groups       714.6666667  4   178.6666667  33.31073446  4.83357E-09  2.81670834
Within Groups        118          22  5.363636364
Total                832.6666667  26

Based on the computer output, the test statistic, F = 33.31, is greater than the
critical value, 2.8167, so the null hypothesis ($H_0$) is rejected at the 5% significance level.

e.g.6 There are 4 classes of students studying the module QT2, and the lecturer wants to
compare the overall performance of students in different classes. Samples of students are
randomly selected from each class and their examination scores are listed as follows:

Class
A    B    C    D
94   75   70   68
90   68   73   70
85   77   76   72
80   83   78   65
     88   80   74
          68   65
          65

Use MS-Excel to test whether there is a significant difference in the mean scores of the
students in the four classes. Use the 0.01 significance level.

Sol: The computer output is shown below:

Anova: Single Factor

SUMMARY
Groups  Count  Sum  Average   Variance
A       4      349  87.25     36.91667
B       5      391  78.2      58.7
C       7      510  72.85714  30.14286
D       6      414  69        13.6

ANOVA
Source of Variation  SS          df  MS        F         P-value   F crit
Between Groups       890.683766  3   296.8946  8.990643  0.000743  5.09189
Within Groups        594.407143  18  33.02262
Total                1485.09091  21

Based on the computer output, the test statistic, F = 8.99, is greater than the critical value,
5.09, so the null hypothesis ($H_0$) is rejected at the 1% significance level.
Conclusion: There is a significant difference in the mean scores of the students in the
four classes at the 1% significance level.

From the outputs, what does the P-value mean?
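The Excel figures for e.g.6 can be cross-checked without Excel. The minimal sketch below uses Python's standard `statistics` module to reproduce the SUMMARY columns and the F statistic; the critical value 5.09 is taken from the Excel output because the standard library has no F quantile function, and the P-value is not computed.

```python
import statistics

# Examination scores of the sampled students in e.g.6
scores = {
    "A": [94, 90, 85, 80],
    "B": [75, 68, 77, 83, 88],
    "C": [70, 73, 76, 78, 80, 68, 65],
    "D": [68, 70, 72, 65, 74, 65],
}

# SUMMARY part: count, sum, average and sample variance of each class
for name, xs in scores.items():
    print(name, len(xs), sum(xs), round(statistics.mean(xs), 5), round(statistics.variance(xs), 5))

groups = list(scores.values())
n, k = sum(len(g) for g in groups), len(groups)
grand_mean = sum(sum(g) for g in groups) / n
ss_b = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
ss_r = sum((len(g) - 1) * statistics.variance(g) for g in groups)
f_stat = (ss_b / (k - 1)) / (ss_r / (n - k))
f_crit = 5.09  # F(3, 18, 0.01), copied from the Excel output
print(round(ss_b, 4), round(ss_r, 4), round(f_stat, 4), f_stat > f_crit)
```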

Confidence interval for the difference in treatment means

ANOVA allows us to decide whether to reject the null hypothesis that all the treatment
means are equal. When it is rejected, we may want to know which treatment means
differ. The simplest way to find out is through the use of confidence intervals.

A confidence interval for the difference between two population means is found by:

$$(\bar{x}_1 - \bar{x}_2) \pm t_{n-k,\,\alpha/2}\sqrt{MS_R\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$$

where:
$\bar{x}_1$ is the mean of the first sample.
$\bar{x}_2$ is the mean of the second sample.
$t$ is obtained from the t-table, with df = $n - k$.
$MS_R$ is obtained from the ANOVA table [$SS_R/(n-k)$].
$n_1$ is the number of observations in the first sample.
$n_2$ is the number of observations in the second sample.

If the confidence interval for the difference in treatment means includes zero, there is
no significant difference between the treatment means. On the other hand, if the endpoints of the
confidence interval have the same sign, this indicates that the treatment means differ.
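The interval and the zero-inclusion rule can be sketched in a few lines of Python. The function name `diff_ci` is our own, and the t value must still be read from the t-table, because Python's standard library has no t quantile function. The example inputs are the figures of e.g.7 (classes A and D of e.g.6).

```python
import math


def diff_ci(mean1, mean2, n1, n2, ms_r, t_crit):
    """Confidence interval for mu1 - mu2 using the pooled error mean square MS_R."""
    diff = mean1 - mean2
    margin = t_crit * math.sqrt(ms_r * (1 / n1 + 1 / n2))
    return diff - margin, diff + margin


# e.g.7 figures; t(18, 0.025) = 2.101 is taken from the t-table
low, high = diff_ci(87.25, 69.00, 4, 6, ms_r=33.02, t_crit=2.101)
means_differ = low > 0 or high < 0  # both endpoints share a sign, so zero is excluded
print(f"({low:.2f}, {high:.2f}), means differ: {means_differ}")
```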


e.g.7 Referring to e.g.6, check whether the mean score of class A differs from that of
class D at the 95% level of confidence.

Sol: $\bar{x}_1 = 87.25$, $\bar{x}_2 = 69.00$, $n_1 = 4$, $n_2 = 6$, $MS_R = 33.02$,
df $= n - k = 22 - 4 = 18$, $t_{18,\,0.025} = 2.101$

The 95% confidence interval:
$$(\bar{x}_1 - \bar{x}_2) \pm t_{n-k,\,\alpha/2}\sqrt{MS_R\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$$
$$= (87.25 - 69) \pm 2.101\sqrt{33.02\left(\frac{1}{4} + \frac{1}{6}\right)} = 18.25 \pm 7.79 = (10.46,\ 26.04)$$

Both endpoints are positive; hence, we can conclude that these treatment means differ
significantly.

That is, students in class A have a significantly higher mean examination score than
students in class D.



Class Exercise 2

A single-factor ANOVA table is shown in the following table with certain elements
deleted.

Source of variation         Sum of squares  Degrees of freedom  Mean square
Treatment (Between groups)  208
Error (Within groups)       232             18                  12.8889
Total                                       20

a) What is the total variation SS(total) within the sample as a whole?
b) What is the total sample size?
c) What percentage of the total variation is represented by variation between
samples?
d) What is the number of degrees of freedom for the between-treatment variation?
e) How many groups of data are being compared?
f) What is the value of $MS_B$?
g) What is the value of the F statistic?
h) State the degrees of freedom to be used in determining the critical value of
the F statistic.
i) What is the critical value of the test at the 5% level of significance?
j) Should the null hypothesis of no difference between the population means
be rejected at the 5% level of significance?






Class : Name : No. :

Class Exercise 1 (Solution)

$s_1 = 12$, $df_1 = 5 - 1 = 4$; $s_2 = 7$, $df_2 = 7 - 1 = 6$

$H_0: \sigma_1^2 \le \sigma_2^2$ vs $H_1: \sigma_1^2 > \sigma_2^2$

Test statistic: $F = \dfrac{s_1^2}{s_2^2} = \dfrac{12^2}{7^2} = 2.94$
Level of significance is 0.01, upper one-tailed test, i.e. $\alpha = 0.01$

Critical value: $F_{4,6,0.01} = 9.15$ (from the F-table)

Since $F = 2.94 < 9.15 = F_{4,6,0.01}$, $H_0$ is not rejected at $\alpha = 0.01$.

Conclusion: There is insufficient evidence to show that there is more variation in
the first population at the 0.01 significance level.


Class Exercise 2 (Solution)

Source of Variation          Sum of Squares  Degrees of Freedom  Mean Square  F
Treatments (Between groups)  208             2                   104          8.0690
Errors (Within groups)       232             18                  12.8889
Total                        440             20

(a) SS(Total) = 208 + 232 = 440
(b) 20 + 1 = 21
(c) $SS_B/SS(\text{Total}) = 208/440 = 47.27\%$
(d) 20 - 18 = 2
(e) $df_{\text{between}} + 1 = 3$
(f) $MS_B = SS_B/df_{\text{between}} = 208/2 = 104$
(g) $F = MS_B/MS_R = 104/12.89 = 8.07$
(h) $df_1 = 2$ and $df_2 = 18$
(i) $F_{2,18,0.05} = 3.55$
(j) Yes, the null hypothesis is rejected (since $F = 8.07 > 3.55 = F_{2,18,0.05}$).
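The reasoning in (a) to (j) amounts to a few lines of arithmetic on the given entries. Below is a minimal Python sketch (the variable names are our own); the critical value 3.55 is the table value quoted in (i), not computed.

```python
# Entries given in the partially completed table of Class Exercise 2
ss_b, ss_r = 208, 232
df_r, df_total = 18, 20

ss_total = ss_b + ss_r            # (a) total variation
n = df_total + 1                  # (b) total sample size, since df_total = n - 1
share_between = ss_b / ss_total   # (c) proportion of variation between samples
df_b = df_total - df_r            # (d) between-treatment degrees of freedom
k = df_b + 1                      # (e) number of groups, since df_b = k - 1
ms_b = ss_b / df_b                # (f) between groups mean square
ms_r = ss_r / df_r                # within groups mean square (12.8889)
f_stat = ms_b / ms_r              # (g) F statistic
f_crit = 3.55                     # (i) F(2, 18, 0.05) read from the F-table

print(f"SS(total)={ss_total}, n={n}, between share={share_between:.2%}")
print(f"df_B={df_b}, k={k}, MS_B={ms_b:.0f}, MS_R={ms_r:.4f}, F={f_stat:.2f}")
print("reject H0" if f_stat > f_crit else "do not reject H0")  # (j)
```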
