
Applied Statistics in

Business & Economics,


vu.vo@ueh.edu.vn
Chapter 11
Analysis of Variance

Chapter Contents

11.1 Overview of ANOVA
11.2 One-Factor ANOVA (Completely Randomized Model)
11.3 Multiple Comparisons
11.4 Tests for Homogeneity of Variances
11.5 Two-Factor ANOVA without Replication (Randomized Block Model)
11.6 Two-Factor ANOVA with Replication (Full Factorial Model)
11.7 Higher Order ANOVA Models (Optional)


Chapter Learning Objectives (LO’s)

LO1: Use basic ANOVA terminology correctly.
LO2: Recognize from data format when one-factor ANOVA is appropriate.
LO3: Interpret sums of squares and calculations in an ANOVA table.
LO4: Use Excel or other software for ANOVA calculations.
LO5: Use a table or Excel to find critical values for the F distribution.
LO6: Explain the assumptions of ANOVA and why they are important.


LO7: Understand and perform Tukey's test for paired means.
LO8: Use Hartley's test for equal variances in c treatment groups.
LO9: Recognize from data format when two-factor ANOVA is needed.
LO10: Interpret main effects and interaction effects in two-factor
ANOVA.
LO11: Recognize the need for experimental design and GLM
(optional).

LO1 11.1 Overview of ANOVA

LO1: Use basic ANOVA terminology correctly.

• Analysis of variance (ANOVA) is a comparison of means.
• ANOVA allows you to compare more than two means simultaneously.
• Proper experimental design efficiently uses limited data to draw the strongest possible inferences.


The Goal: Explaining Variation

• ANOVA seeks to identify sources of variation in a numerical dependent variable Y (the response variable).
• Variation in Y about its mean is explained by one or more categorical independent variables (the factors) or is unexplained (random error).


• Each possible value of a factor or combination of factors is a treatment.
• We test to see if each factor has a significant effect on Y using (for example) the hypotheses:
H0: μ1 = μ2 = μ3 = μ4 (e.g., mean defect rates are the same for all four plants)
H1: Not all the means are equal
• The test uses the F distribution.
• If we cannot reject H0, we conclude that observations within each treatment have a common mean μ.


One-Factor ANOVA Example


• For example, a one-factor ANOVA would test the hypothesis that the length of hospital stay (LOS) is affected by Type of Fracture:
Length of stay = f(type of fracture)
• A two-factor ANOVA would test the hypothesis that the length of hospital stay (LOS) is affected by Type of Fracture and Age Group:
Length of stay = f(type of fracture, age group)
• We can also test for interaction between factors.


Figure 11.3

LO6: Explain the assumptions of ANOVA and why they are important.

ANOVA Assumptions

• Analysis of Variance assumes that the
- observations on Y are independent,
- populations being sampled are normal,
- populations being sampled have equal variances.
• ANOVA is somewhat robust to departures from the normality and equal variance assumptions.


ANOVA Calculations
• Software (e.g., Excel, MegaStat, MINITAB, SPSS) is used to
analyze data.
• Large samples increase the power of the test,
but power also depends on the degree of variation in Y.
• Lowest power would be in a small sample with high variation in Y.

LO2 11.2 One-Factor ANOVA (Completely Randomized Design)

LO2: Recognize from data format when one-factor ANOVA is appropriate.


Data Format
• A one-factor ANOVA only compares the means of c groups
(treatments or factor levels).
• Consider the format for a one-factor ANOVA with c treatments,
denoted A1, A2, …, Ac.

Table 11.1

Data Format
• Sample sizes within each treatment do not need to be equal (i.e., the design need not be balanced).
• The total number of observations is n = n1 + n2 + … + nc

Hypothesis to Be Tested

• H0: μ1 = μ2 = … = μc
H1: Not all the means are equal
• ANOVA tests all means simultaneously and so does not inflate the type I error the way a series of separate two-sample tests would.
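As a rough illustration of the type I error point: with c groups there are c(c − 1)/2 possible pairs, and if each pair were tested separately at α = 0.05, the chance of at least one false rejection would grow with the number of tests. The sketch below assumes the pairwise tests are independent, which is only an approximation because they reuse the same samples; the function name is illustrative.

```python
def familywise_error(c, alpha=0.05):
    """Approximate chance of at least one false rejection over all pairwise tests.

    Assumes the k = c(c - 1)/2 tests are independent. This is a simplification,
    since pairwise comparisons share data, so treat the result as a rough guide.
    """
    k = c * (c - 1) // 2          # number of distinct pairs of means
    return 1 - (1 - alpha) ** k   # 1 - P(no false rejection in k tests)


for c in (2, 3, 4, 6):
    print(c, "groups:", round(familywise_error(c), 4))
```

With a single pair (c = 2) the result is just α, and it rises quickly as c increases, which is the motivation for one simultaneous F test.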


One-Factor ANOVA as a Linear Model


• An equivalent way to express the one-factor model is to say that observation i in treatment j came from a population with a common mean (μ) plus a treatment effect (Aj) plus random error (εij):

yij = μ + Aj + εij
j = 1, 2, …, c and i = 1, 2, …, nj

• Random error is assumed to be normally distributed with zero mean and the same variance for all treatments.
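One way to get a feel for the linear model is to generate data from it. The sketch below draws each observation as a common mean plus a treatment effect plus normal error with the same standard deviation in every treatment. The function name, parameter values, and seed are all illustrative assumptions, not values from the chapter.

```python
import random


def simulate_one_factor(mu, effects, sizes, sigma, seed=0):
    """Draw y_ij = mu + A_j + e_ij, with e_ij ~ Normal(0, sigma) for every treatment.

    effects[j] is the treatment effect A_j and sizes[j] is the sample size n_j.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    data = []
    for a_j, n_j in zip(effects, sizes):
        data.append([mu + a_j + rng.gauss(0.0, sigma) for _ in range(n_j)])
    return data


# Hypothetical settings: three treatments, unequal sample sizes.
sample = simulate_one_factor(mu=10.0, effects=[-1.0, 0.0, 1.0], sizes=[3, 3, 4], sigma=2.0)
for j, column in enumerate(sample, start=1):
    print(f"A{j}:", [round(y, 2) for y in column])
```

Setting every effect to zero reproduces the collapsed model in which all treatments share the common mean.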



• A fixed effects model only looks at what happens to the response for particular levels of the factor.
H0: A1 = A2 = … = Ac = 0
H1: Not all Aj are zero
• If H0 is true, then the ANOVA model collapses to yij = μ + εij


Group Means

• The mean of each group is calculated as:
ȳj = (1/nj) Σi yij (sum over the nj observations in group j)

• The overall sample mean (grand mean) can be calculated as:
ȳ = (1/n) Σj Σi yij = (1/n) Σj nj ȳj
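A small sketch of the group mean and grand mean calculations, using made-up data with unequal group sizes (the values and the names A1 to A3 are hypothetical). It also checks that the grand mean equals the size-weighted average of the group means.

```python
from statistics import mean

# Hypothetical observations for c = 3 treatments (unequal sample sizes).
groups = {"A1": [2, 4, 6], "A2": [5, 7, 9], "A3": [8, 10, 12, 14]}

# Mean of each group.
group_means = {name: mean(ys) for name, ys in groups.items()}

# Grand mean: all observations pooled together.
n = sum(len(ys) for ys in groups.values())
grand_mean = sum(sum(ys) for ys in groups.values()) / n

# Equivalent form: group means weighted by group size.
weighted = sum(len(ys) * group_means[name] for name, ys in groups.items()) / n

print(group_means)
print(grand_mean, weighted)
```

Note that with unequal sample sizes the grand mean is generally not the simple average of the group means.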


Partitioned Sum of Squares


• For a given observation yij, the following relationship must hold (ȳj is the mean of group j and ȳ is the grand mean):
(yij − ȳ) = (ȳj − ȳ) + (yij − ȳj)



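• This relationship also holds for sums of squared deviations, yielding the partitioned sum of squares:
Σj Σi (yij − ȳ)² = Σj nj (ȳj − ȳ)² + Σj Σi (yij − ȳj)²
(total variation = variation between treatments + variation within treatments)

• Simply put, SST = SSA + SSE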
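The partition SST = SSA + SSE can be checked numerically. The sketch below uses hypothetical data and an illustrative helper name; it computes each sum of squares directly from its definition and confirms that the two pieces add up to the total.

```python
from statistics import mean


def partition_sums_of_squares(groups):
    """Return (SSA, SSE, SST) for a list of treatment samples."""
    n = sum(len(ys) for ys in groups)
    grand = sum(sum(ys) for ys in groups) / n
    # Between-treatment variation: group means around the grand mean.
    ssa = sum(len(ys) * (mean(ys) - grand) ** 2 for ys in groups)
    # Within-treatment variation: observations around their own group mean.
    sse = sum((y - mean(ys)) ** 2 for ys in groups for y in ys)
    # Total variation: observations around the grand mean.
    sst = sum((y - grand) ** 2 for ys in groups for y in ys)
    return ssa, sse, sst


# Hypothetical observations for three treatments.
ssa, sse, sst = partition_sums_of_squares([[2, 4, 6], [5, 7, 9], [8, 10, 12, 14]])
print(round(ssa, 4), round(sse, 4), round(sst, 4))
```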



• SSA and SSE are used to test the hypothesis of equal treatment means by dividing each sum of squares by its degrees of freedom (c − 1 for SSA and n − c for SSE) to adjust for the number of groups and observations.
• These ratios are called Mean Squares (MSA and MSE).
• The resulting test statistic is F = MSA/MSE.
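A minimal sketch of the mean squares and the F statistic, assuming c − 1 degrees of freedom for treatments and n − c for error. The data and the function name are hypothetical.

```python
from statistics import mean


def one_factor_anova(groups):
    """Return the main entries of a one-factor ANOVA table as a dict."""
    c = len(groups)                      # number of treatments
    n = sum(len(ys) for ys in groups)    # total observations
    grand = sum(sum(ys) for ys in groups) / n
    ssa = sum(len(ys) * (mean(ys) - grand) ** 2 for ys in groups)
    sse = sum((y - mean(ys)) ** 2 for ys in groups for y in ys)
    msa = ssa / (c - 1)                  # mean square due to treatments
    mse = sse / (n - c)                  # mean square due to error
    return {"df1": c - 1, "df2": n - c, "MSA": msa, "MSE": mse, "F": msa / mse}


# Hypothetical observations for three treatments.
table = one_factor_anova([[2, 4, 6], [5, 7, 9], [8, 10, 12, 14]])
for key, value in table.items():
    print(key, round(value, 4))
```

The F value is then compared with a right-tail critical value using df1 and df2 as the numerator and denominator degrees of freedom.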


LO3: Interpret sums of squares and calculations in an ANOVA table.

Partitioned Sum of Squares



• The ANOVA calculations are mathematically simple but involve tedious sums.
• Excel’s Data Analysis tool (Anova: Single Factor) can be used to analyze the data.


Test Statistic
• The F distribution describes the ratio of two variances.
• The F statistic is the ratio of the variance due to treatments (MSA)
to the variance due to error (MSE).

• When F is near zero, there is little difference among treatments and we would not expect to reject the hypothesis of equal treatment means.

Decision Rule

• F cannot be negative and has no upper limit.
• For ANOVA, the F test is a right-tailed test.
• Use Appendix F or Excel to obtain the critical value of F for a given α.
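For readers without a table or spreadsheet at hand, the right-tail area and critical value of the F distribution can be approximated with the standard library alone. The sketch below is one possible approach, not the chapter's method: it evaluates the regularized incomplete beta function with a continued fraction and then finds the critical value by bisection. All function names are illustrative; in practice a table, Excel, or a statistics package would normally be used.

```python
import math


def _beta_cf(a, b, x, max_iter=300, eps=1e-14):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even and odd steps of the continued fraction.
        for aa in (m * (b - m) * x / ((qam + m2) * (a + m2)),
                   -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            delta = d * c
            h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log(1.0 - x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cf(a, b, x) / a
    return 1.0 - front * _beta_cf(b, a, 1.0 - x) / b


def f_right_tail(f, df1, df2):
    """P(F > f) for an F distribution with df1 and df2 degrees of freedom."""
    if f <= 0.0:
        return 1.0
    return reg_inc_beta(df2 / 2.0, df1 / 2.0, df2 / (df2 + df1 * f))


def f_critical(alpha, df1, df2):
    """Right-tail critical value: the f with P(F > f) = alpha, found by bisection."""
    lo, hi = 0.0, 1.0
    while f_right_tail(hi, df1, df2) > alpha:
        hi *= 2.0
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if f_right_tail(mid, df1, df2) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


print(round(f_critical(0.05, 2, 10), 2))  # about 4.10, matching standard F tables
```

The decision rule is then: reject the hypothesis of equal means when the calculated F exceeds `f_critical(alpha, c - 1, n - c)`, or equivalently when `f_right_tail(F, c - 1, n - c)` is below α.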


LO5: Use a table or Excel to find critical values for the F distribution.

Decision Rule for an F-test

Example: Carton Packing.
Is the variation among stations within the range attributable to
chance, or do these samples indicate actual differences in the
means?

As a preliminary step, we plot the data to check for any time
pattern and just to visualize the data. We see some potential
differences in means, but no obvious time pattern (otherwise we
would have to consider observation order as a second factor).

Figure 11.6


LO4: Use Excel or other software for ANOVA calculations.


LO7 11.3 Multiple Comparison Tests
LO7: Understand and perform Tukey's test for paired means.

Tukey’s Test

• After rejecting the hypothesis of equal means, we naturally want to know: Which means differ significantly?
• In order to maintain the desired overall probability of type I error, a simultaneous confidence interval for the difference of means must be obtained.
• For c groups, there are c(c − 1)/2 distinct pairs of means to be compared.
• These types of comparisons are called Multiple Comparison Tests.


• Tukey’s studentized range test (or HSD for “honestly significant difference” test) is a multiple comparison test that has good power and is widely used.
• Named for statistician John Wilder Tukey (1915 – 2000).
• This test is not available in Excel’s Tools > Data Analysis but is available in MegaStat and Minitab.

• Tukey’s is a two-tailed test for equality of paired means from c groups compared simultaneously.
• The hypotheses are:
H0: μj = μk
H1: μj ≠ μk

Decision Rule

• Reject H0 if Tcalc > Tc,n−c, where Tc,n−c is a critical value of the Tukey test statistic Tcalc for the desired level of significance.
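The pairwise comparisons can be sketched as follows. The formula for Tcalc is an assumption here, since it is not shown in the text above: the code takes Tcalc = |ȳj − ȳk| / sqrt(MSE (1/nj + 1/nk)), with MSE taken from the one-factor ANOVA. Under that scaling, the matching critical value is the studentized range quantile divided by √2. The data are hypothetical, and the critical value must still be looked up in a table or obtained from software.

```python
from itertools import combinations
from math import sqrt
from statistics import mean

# Hypothetical observations for c = 3 treatments.
groups = {"A1": [2, 4, 6], "A2": [5, 7, 9], "A3": [8, 10, 12, 14]}

c = len(groups)
n = sum(len(ys) for ys in groups.values())

# Pooled error variance (MSE) from the one-factor ANOVA.
sse = sum((y - mean(ys)) ** 2 for ys in groups.values() for y in ys)
mse = sse / (n - c)


def t_calc(name_j, name_k):
    """Assumed form of the Tukey statistic for one pair of group means."""
    yj, yk = groups[name_j], groups[name_k]
    return abs(mean(yj) - mean(yk)) / sqrt(mse * (1 / len(yj) + 1 / len(yk)))


# All c(c - 1)/2 distinct pairs of treatments.
pairs = list(combinations(groups, 2))
for name_j, name_k in pairs:
    print(name_j, "vs", name_k, round(t_calc(name_j, name_k), 3))
```

Each printed value would be compared with the same critical value Tc,n−c, which is what keeps the overall type I error at the chosen level.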

• For example, here is the upper 5% of studentized range:

LO8 11.4 Tests for Homogeneity of Variances

LO8: Use Hartley's test for equal variances in c treatment groups.

ANOVA Assumptions

• ANOVA assumes that observations on the response variable are from normally distributed populations that have the same variance.
• The one-factor ANOVA test is only slightly affected by inequality of variance when group sizes are equal.
• Test this assumption of homogeneous variances using Hartley’s Fmax Test.


Hartley’s Test

• The hypotheses are
H0: σ1² = σ2² = … = σc²
H1: Not all the σj² are equal
• The test statistic is the ratio of the largest sample variance to the smallest sample variance: Fmax = s²max / s²min


• The decision rule is: reject H0 if Fmax exceeds the critical value for the chosen level of significance.


• Assuming equal group sizes, critical values of Fmax are found using degrees of freedom:
Numerator df1 = c
Denominator df2 = n/c − 1
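A short sketch of Hartley’s statistic and its degrees of freedom for equal group sizes. The data are hypothetical, and the critical value itself is not computed: it would come from a table of Fmax critical values.

```python
from statistics import variance

# Hypothetical balanced data: c = 3 treatments with 4 observations each.
groups = [[2, 4, 6, 8], [5, 6, 7, 8], [1, 5, 9, 13]]

c = len(groups)
n = sum(len(ys) for ys in groups)

# Sample variance of each group (divisor n_j - 1).
variances = [variance(ys) for ys in groups]

# Hartley's statistic: largest sample variance over smallest sample variance.
f_max = max(variances) / min(variances)

# Degrees of freedom used to look up the critical value (equal group sizes).
df1 = c
df2 = n // c - 1

print(f_max, df1, df2)
```

A large ratio suggests unequal variances, but the test is sensitive to non-normality, so the result is best read alongside a plot of the data.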


Levene’s Test

• Levene’s test is a more robust alternative to Hartley’s Fmax test.
• Levene’s test does not assume a normal distribution.
• It is based on the distances of the observations from their sample medians rather than their sample means.
• A computer program (e.g., MINITAB) is needed to perform this test.
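The median-based version of Levene’s test described above can be sketched as a one-factor ANOVA on absolute deviations: replace each observation by its distance from its own group median, then compute the usual F ratio on those distances. The data and function name are hypothetical, and the statistic would be compared with an F critical value using c − 1 and n − c degrees of freedom.

```python
from statistics import mean, median


def levene_median_statistic(groups):
    """F ratio from a one-factor ANOVA on absolute deviations from group medians."""
    # Distance of each observation from the median of its own group.
    z = [[abs(y - median(ys)) for y in ys] for ys in groups]
    c = len(z)
    n = sum(len(zs) for zs in z)
    grand = sum(sum(zs) for zs in z) / n
    ssa = sum(len(zs) * (mean(zs) - grand) ** 2 for zs in z)
    sse = sum((v - mean(zs)) ** 2 for zs in z for v in zs)
    return (ssa / (c - 1)) / (sse / (n - c))


# Hypothetical data: the third group is visibly more spread out.
w = levene_median_statistic([[2, 4, 6, 8], [5, 6, 7, 8], [1, 5, 9, 13]])
print(round(w, 4))
```

Because it works with distances from medians, the statistic is less affected by outliers and skewed data than a ratio of raw variances.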
