
Topic 6: ANOVA

Introduction

ANOVA is short for ANalysis Of VAriance.


Developed by Sir Ronald Fisher, the noted
British statistician (the father of classical statistics)
Designed to test MEANS by utilizing different
estimates of the VARIANCES.
Used for THREE or more groups
(an extension of the two-independent-samples
t-test).
UEME 4393

Examples

A manufacturer of belts is considering using one of three
different types of leather. One consideration in deciding among
the three types is whether the leathers have the same resistance
to tearing under stress. Ten pieces of leather are randomly
selected from each type, and ANOVA is used to determine
whether the samples differ significantly in strength.
A company interested in controlling a chemical process is
investigating the effect on output of (1) a catalyst, (2) pressure,
and (3) temperature. An experiment is designed to study the
effect of letting each of these factors assume either a low or a
high value. The results of all eight combinations of these three
factors are analyzed using ANOVA.


Terminology

Treatment Level
Main Effect/ Significant Main Effect
Interaction Effect
Error
Sum of Squares (SS)/ Mean Square (MS)
F-Distribution/ F-Critical value
p-value
Degrees of freedom (DF)

Assumptions

Actual Value of Y = Value predicted by the model + Error Term
Assumptions for error term

The errors are normally distributed
The errors have a constant variance
The errors are independent of one another

Variability

Total variability = Variability explained by the model + Unexplained variability; or
SSTotal = SSModel + SSError
MS = SS/DF
F = MSModel/MSError
P-value is determined based on F-value


One Factor ANOVA


One-Way ANOVA

Variation

Variation is the sum of the squares of the
deviations between each value and the mean of the
values
Sum of Squares is abbreviated by SS and often
followed by a variable in parentheses such as
SS(B) or SS(W) so we know which sum of
squares we're talking about

One-Way ANOVA

Are all of the values identical?

No, so there is some variation in the data
This is called the total variation
Denoted SS(Total) for the total Sum of Squares
(variation)
Sum of Squares is another name for variation

One-Way ANOVA

Are all of the sample means identical?

No, so there is some variation between the groups
This is called the between group variation
Sometimes called the variation due to the factor
Denoted SS(B) for Sum of Squares (variation)
between the groups

One-Way ANOVA

Are all of the values within each group identical?

No, there is some variation within the groups
This is called the within group variation
Sometimes called the error variation
Denoted SS(W) for Sum of Squares (variation)
within the groups

One-Way ANOVA

There are two sources of variation

the variation between the groups, SS(B), or the
variation due to the factor
the variation within the groups, SS(W), or the
variation that can't be explained by the factor, so
it's called the error variation


One Factor ANOVA

SSTotal = SSBetween + SSWithin



One Factor ANOVA

$SSB = \sum (\text{number in the group}) \times (\text{Group mean} - \text{Grand mean})^2$
$SSW = \sum (\text{Sample value} - \text{Group mean})^2$
$SST = \sum (\text{Sample value} - \text{Grand mean})^2$
$SST = SSB + SSW$
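
The three sums of squares can be checked numerically. The following is a minimal Python sketch, not part of the original slides: the function name `sums_of_squares` and the toy data are hypothetical, chosen only so the arithmetic is easy to follow by hand.

```python
def sums_of_squares(groups):
    """Return (SSB, SSW, SST) for a list of groups of numeric values."""
    all_values = [x for g in groups for x in g]
    grand_mean = sum(all_values) / len(all_values)
    ssb = 0.0
    ssw = 0.0
    for g in groups:
        group_mean = sum(g) / len(g)
        # Between: group size times squared distance of group mean from grand mean.
        ssb += len(g) * (group_mean - grand_mean) ** 2
        # Within: squared distance of each value from its own group mean.
        ssw += sum((x - group_mean) ** 2 for x in g)
    # Total: squared distance of each value from the grand mean.
    sst = sum((x - grand_mean) ** 2 for x in all_values)
    return ssb, ssw, sst


# Hypothetical toy data (group means 2, 3, 7; grand mean 4).
toy_groups = [[1, 2, 3], [2, 3, 4], [6, 7, 8]]
ssb, ssw, sst = sums_of_squares(toy_groups)
print(ssb, ssw, sst)  # 42.0 6.0 48.0
```

For the toy data, SSB = 42 and SSW = 6 add up to SST = 48, which illustrates SST = SSB + SSW.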


One Factor ANOVA

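The ANOVA test statistic is:

$F_0 = \dfrac{MS_{\text{Between}}}{MS_{\text{Within}}}$

If $F_0$ is greater than the critical value $F_{\alpha,\,a-1,\,a(n-1)}$, then the null hypothesis of
equal treatment means is rejected. A P-value approach can also be used.
The P-value is the probability above $F_0$ in the $F_{a-1,\,a(n-1)}$ distribution.
Here a is the number of treatments (groups) and n is the number of observations per treatment.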
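
The rejection rule can be sketched in Python. This is an illustration under stated assumptions, not the slides' own method: it relies on a closed form that holds only when the numerator df is 2 (that is, a = 3 groups), where P(F > f) = (1 + 2f/df_den)^(-df_den/2). The function names and the values a = 3, n = 5 are hypothetical. For other numerator df, use an F table or a statistics package.

```python
def f_critical_two_numerator_df(alpha, df_den):
    """Upper-alpha critical value of F(2, df_den).

    Obtained by solving (1 + 2 f / df_den) ** (-df_den / 2) = alpha for f.
    Valid only when the numerator degrees of freedom equal 2.
    """
    return (df_den / 2.0) * (alpha ** (-2.0 / df_den) - 1.0)


def reject_equal_means(f0, alpha, df_den):
    """True if the observed statistic f0 exceeds the critical value."""
    return f0 > f_critical_two_numerator_df(alpha, df_den)


a, n = 3, 5               # hypothetical: 3 groups, 5 observations each
df_num = a - 1            # 2
df_den = a * (n - 1)      # 12
f_crit = f_critical_two_numerator_df(0.05, df_den)
print(df_num, df_den, round(f_crit, 2))
print(reject_equal_means(5.0, 0.05, df_den))
```

With these hypothetical values, the critical value is about 3.89, so an observed F0 of 5.0 would lead to rejecting the null hypothesis of equal means.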

Example: Manual

The statistics classroom is divided into three
rows: front, middle, and back
The instructor noticed that the farther the
students were from him, the more likely they
were to miss class or use WhatsApp during
class
He wanted to see if the students farther away
did worse on the exams

One-Way ANOVA

The ANOVA doesn't test that one mean is
less than another, only whether they're all
equal or at least one is different.

$H_0: \mu_{\text{Front}} = \mu_{\text{Middle}} = \mu_{\text{Back}}$
$H_1$: at least one mean is different

One-Way ANOVA

A random sample of the students in each row
was taken
The score for those students on the pop-quiz
was recorded

Front: 82, 83, 97, 93, 55, 67, 53
Middle: 83, 78, 68, 61, 77, 54, 69, 51, 63
Back: 38, 59, 55, 66, 45, 52, 52, 61
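
The per-row summary statistics can be computed from these scores with Python's standard `statistics` module. This is a minimal sketch; the dictionary names `scores` and `summary` are illustrative choices, not from the slides.

```python
import statistics

# Pop-quiz scores by row, as listed on the slide.
scores = {
    "Front": [82, 83, 97, 93, 55, 67, 53],
    "Middle": [83, 78, 68, 61, 77, 54, 69, 51, 63],
    "Back": [38, 59, 55, 66, 45, 52, 52, 61],
}

summary = {}
for row, values in scores.items():
    summary[row] = {
        "n": len(values),
        "mean": statistics.mean(values),
        # Sample variance and standard deviation (n - 1 in the denominator).
        "variance": statistics.variance(values),
        "st_dev": statistics.stdev(values),
    }

for row, s in summary.items():
    print(f"{row:6s} n={s['n']} mean={s['mean']:.2f} "
          f"sd={s['st_dev']:.2f} var={s['variance']:.2f}")
```

The sample (n - 1) variance is used because those are the variances that get pooled later in the within group variation.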


One-Way ANOVA
The summary statistics for the grades of each row are
shown in the table below

Row           Front    Middle   Back
Sample size   7        9        8
Mean          75.71    67.11    53.50
St. Dev       17.63    10.95    8.96
Variance      310.90   119.86   80.29

One-Way ANOVA

Here is the basic one-way ANOVA table

Source     SS     DF     MS
Between
Within
Total

One-Way ANOVA

Grand Mean

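The grand mean is the average of all the values
when the factor is ignored
It is a weighted average of the individual sample
means

$$\bar{\bar{x}} = \frac{\sum_{i=1}^{k} n_i \bar{x}_i}{\sum_{i=1}^{k} n_i} = \frac{n_1 \bar{x}_1 + n_2 \bar{x}_2 + \cdots + n_k \bar{x}_k}{n_1 + n_2 + \cdots + n_k}$$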
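
As a numerical check for the classroom example, the grand mean can be computed both ways: as a weighted average of the row means and as the plain average of all 24 scores. This is a minimal Python sketch; the variable names are illustrative only.

```python
# Pop-quiz scores by row, from the classroom example in these slides.
scores = {
    "Front": [82, 83, 97, 93, 55, 67, 53],
    "Middle": [83, 78, 68, 61, 77, 54, 69, 51, 63],
    "Back": [38, 59, 55, 66, 45, 52, 52, 61],
}
sizes = {row: len(v) for row, v in scores.items()}
means = {row: sum(v) / len(v) for row, v in scores.items()}

# Weighted average of the sample means, with the sample sizes as weights.
grand_mean = sum(sizes[r] * means[r] for r in scores) / sum(sizes.values())

# Plain average of all scores, ignoring the row factor.
all_scores = [x for v in scores.values() for x in v]
plain_mean = sum(all_scores) / len(all_scores)

print(round(grand_mean, 2), round(plain_mean, 2))  # 65.08 65.08
```

Both routes give about 65.08, which is why the grand mean can be described either way.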


One-Way ANOVA

Between Group Variation, SS(B)

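The between group variation is the variation between each
sample mean and the grand mean
Each individual variation is weighted by the sample size

$$SS(B) = \sum_{i=1}^{k} n_i \left(\bar{x}_i - \bar{\bar{x}}\right)^2 = n_1 \left(\bar{x}_1 - \bar{\bar{x}}\right)^2 + n_2 \left(\bar{x}_2 - \bar{\bar{x}}\right)^2 + \cdots + n_k \left(\bar{x}_k - \bar{\bar{x}}\right)^2$$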
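
For the classroom example, the between group variation can be computed directly from the scores. This is a minimal Python sketch with illustrative variable names. It works from the unrounded means, so the result can differ slightly from a hand calculation that uses rounded table values.

```python
# Pop-quiz scores by row, from the classroom example in these slides.
scores = {
    "Front": [82, 83, 97, 93, 55, 67, 53],
    "Middle": [83, 78, 68, 61, 77, 54, 69, 51, 63],
    "Back": [38, 59, 55, 66, 45, 52, 52, 61],
}
all_scores = [x for v in scores.values() for x in v]
grand_mean = sum(all_scores) / len(all_scores)

# Each squared deviation of a row mean from the grand mean
# is weighted by that row's sample size.
ss_between = 0.0
for values in scores.values():
    row_mean = sum(values) / len(values)
    ss_between += len(values) * (row_mean - grand_mean) ** 2

print(round(ss_between))  # 1902
```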


One-Way ANOVA

Within Group Variation, SS(W)

The Within Group Variation is the weighted total of the
individual variances
The weighting is done with the degrees of freedom
The df for each sample is one less than the sample size
for that sample.

One-Way ANOVA
Within Group Variation

$$SS(W) = \sum_{i=1}^{k} df_i \, s_i^2 = df_1 s_1^2 + df_2 s_2^2 + \cdots + df_k s_k^2$$
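
The within group variation for the classroom example can be sketched from the summary statistics alone (sample sizes 7, 9, 8 and variances 310.90, 119.86, 80.29). This is a minimal Python illustration; the list names are hypothetical.

```python
# Summary statistics from the classroom example (Front, Middle, Back).
sample_sizes = [7, 9, 8]
variances = [310.90, 119.86, 80.29]

# Each sample variance is weighted by its degrees of freedom (n - 1).
dfs = [n - 1 for n in sample_sizes]
ss_within = sum(df * s2 for df, s2 in zip(dfs, variances))

print(dfs, round(ss_within))  # [6, 8, 7] 3386
```

Because the variances in the table are rounded to two decimals, this agrees with a calculation from the raw scores only to about two decimal places.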

One-Way ANOVA

Degrees of Freedom, DF

A degree of freedom occurs for each value that can vary
before the rest of the values are predetermined
For example, if you had six numbers that had an average of
40, you would know that the total had to be 240. Five of the
six numbers could be anything, but once the first five are
known, the last one is fixed so the sum is 240. The df
would be 6-1=5
The df is often one less than the number of values
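
The six-number example can be made concrete with a short Python sketch. The five "free" values below are hypothetical. Any five numbers would do, and the sixth is then forced by the required average of 40.

```python
target_mean = 40
count = 6
required_total = target_mean * count      # 240

# Hypothetical choice of the five values that are free to vary.
free_values = [10, 55, 38, 47, 62]

# The last value is fixed once the other five are known.
last_value = required_total - sum(free_values)
all_values = free_values + [last_value]

print(last_value, sum(all_values) / count)  # 28 40.0
df = count - 1                              # 5 degrees of freedom
```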


One-Way ANOVA

The between group df is one less than the number of
groups
We have three groups, so df(B) = 2
The within group df is the sum of the individual df's
of each group
The sample sizes are 7, 9, and 8
df(W) = 6 + 8 + 7 = 21
The total df is one less than the total sample size
df(Total) = 24 - 1 = 23
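
The same bookkeeping in Python, as a minimal sketch with illustrative variable names. It also shows that the between and within df add up to the total df.

```python
sample_sizes = [7, 9, 8]                          # Front, Middle, Back

df_between = len(sample_sizes) - 1                # number of groups - 1
df_within = sum(n - 1 for n in sample_sizes)      # sum of per-group df
df_total = sum(sample_sizes) - 1                  # total sample size - 1

print(df_between, df_within, df_total)            # 2 21 23
```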


One-Way ANOVA

Variances

The variances are also called the Mean of the Squares and
abbreviated by MS, often with an accompanying variable
MS(B) or MS(W)
They are an average squared deviation from the mean and
are found by dividing the variation by the degrees of
freedom
MS = SS / df, that is, Variance = Variation / df

One-Way ANOVA

MS(B) = 1902 / 2 = 951.0
MS(W) = 3386 / 21 = 161.2
MS(T) = 5288 / 23 = 229.9

Notice that the MS(Total) is NOT the sum of
MS(Between) and MS(Within).
This works for the sum of squares SS(Total), but
not the mean square MS(Total)
The MS(Total) isn't usually shown
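
A short Python sketch of the mean squares, using the rounded sums of squares and df from the slides (variable names are illustrative). It confirms that the sums of squares add up while the mean squares do not.

```python
# Rounded values from the classroom example.
ss = {"between": 1902, "within": 3386, "total": 5288}
df = {"between": 2, "within": 21, "total": 23}

# MS = SS / df for each source of variation.
ms = {source: ss[source] / df[source] for source in ss}

print({source: round(value, 1) for source, value in ms.items()})
# {'between': 951.0, 'within': 161.2, 'total': 229.9}

# SS and df are additive; MS is not.
print(ss["between"] + ss["within"] == ss["total"])               # True
print(round(ms["between"] + ms["within"], 1), round(ms["total"], 1))  # 1112.2 229.9
```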

One-Way ANOVA

Special Variances

The MS(Within) is also known as the pooled
estimate of the variance since it is a weighted
average of the individual variances
Sometimes abbreviated $s_p^2$
The MS(Total) is the variance of the response
variable.
Not technically part of the ANOVA table, but useful
nonetheless

One-Way ANOVA

F test statistic

An F test statistic is the ratio of two sample
variances
The MS(B) and MS(W) are two sample variances
and that's what we divide to find F.
F = MS(B) / MS(W)

For our data, F = 951.0 / 161.2 = 5.9

One-Way ANOVA

The F test is a right tail test
The F test statistic has an F distribution with
df(B) numerator df and df(W) denominator df
The p-value is the area to the right of the test
statistic
$P(F_{2,21} > 5.9) = 0.009$
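
The right-tail area for this example can be reproduced in plain Python. The sketch below is an illustration under an explicit assumption: it uses a closed form for the F distribution that is valid only when the numerator df is 2, as it is here with three groups. The function name `f_right_tail_two_numerator_df` is hypothetical. For other numerator df, a statistics package or Excel's FDIST would be needed.

```python
def f_right_tail_two_numerator_df(f_value, df_den):
    """P(F > f_value) for an F distribution with 2 and df_den degrees of freedom.

    Closed form valid only for numerator df = 2:
    P(F > f) = (1 + 2 f / df_den) ** (-df_den / 2).
    """
    return (1.0 + 2.0 * f_value / df_den) ** (-df_den / 2.0)


f_stat = 951.0 / 161.2          # MS(B) / MS(W) from the classroom example
p_value = f_right_tail_two_numerator_df(f_stat, 21)

print(round(f_stat, 1), round(p_value, 3))  # 5.9 0.009
```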


One-Way ANOVA

The p-value is 0.009, which is less than the
significance level of 0.05, so we reject the
null hypothesis.
The null hypothesis is that the means of the
three rows in class were the same, but we
reject that, so at least one row has a different
mean.

One-Way ANOVA

There is enough evidence to support the
claim that there is a difference in the mean
scores of the front, middle, and back rows in
class.
The ANOVA doesn't tell which row is
different; you would need to look at
confidence intervals or run post hoc tests to
determine that.

Example: Excel


Excel

Three menu options for ANOVA

Anova: Single Factor
Anova: Two-Factor with Replication
Anova: Two-Factor without Replication

Excel

Perform a statistical analysis

On the Tools menu, click Data Analysis. If Data Analysis is not
available, load the Analysis ToolPak.
How to load the Analysis ToolPak:
On the Tools menu, click Add-Ins.
In the Add-Ins available list, select the Analysis ToolPak box, and then
click OK.
If necessary, follow the instructions in the setup program.

In the Data Analysis dialog box, click the name of the analysis tool
you want to use, then click OK.
In the dialog box for the tool you selected, set the analysis options you
want.
You can use the Help button on the dialog box to get more information
about the options.


Excel

To find p-value

FDIST(F calculated, d.f. numerator, d.f. denominator)

To find F Crit

FINV(α, d.f. numerator, d.f. denominator)

Two Factor ANOVA

Main effects are the influences on Y
attributed to the variables of interest in the
analysis.
Interaction effects represent systematic
influences on Y that cannot be explained by
the main effects.

Two Factor ANOVA

Three null hypotheses:

Ho: There is no main effect A
Ho: There is no main effect B
Ho: There is no interaction effect (i.e. A and B, in
combination, do not have an influence on Y that
is separate from the main effects)

The alternative hypothesis in each case specifies
that there is an effect.
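
Each of the three hypotheses gets its own F ratio, built from its own sum of squares. The following Python sketch shows one way the decomposition can be computed for a balanced design with replication. It is an illustration only: the data, the factor levels, and all variable names are hypothetical and do not come from the slides' example.

```python
from statistics import mean

# Hypothetical balanced data: (level of A, level of B) -> replicate responses Y.
cells = {
    ("low", "low"): [10, 12],
    ("low", "high"): [14, 16],
    ("high", "low"): [20, 22],
    ("high", "high"): [34, 36],
}
a_levels = sorted({a for a, _ in cells})
b_levels = sorted({b for _, b in cells})
n = 2  # replicates per cell (this sketch assumes every cell has the same n)

grand = mean([y for ys in cells.values() for y in ys])
cell_mean = {key: mean(ys) for key, ys in cells.items()}
a_mean = {a: mean([y for (ai, _), ys in cells.items() if ai == a for y in ys])
          for a in a_levels}
b_mean = {b: mean([y for (_, bi), ys in cells.items() if bi == b for y in ys])
          for b in b_levels}

# Main effects: level means against the grand mean (here ss_b is factor B).
ss_a = len(b_levels) * n * sum((a_mean[a] - grand) ** 2 for a in a_levels)
ss_b = len(a_levels) * n * sum((b_mean[b] - grand) ** 2 for b in b_levels)
# Interaction: what the cell means show beyond the two main effects.
ss_ab = n * sum(
    (cell_mean[(a, b)] - a_mean[a] - b_mean[b] + grand) ** 2
    for a in a_levels for b in b_levels
)
# Error: variation of replicates around their own cell mean.
ss_error = sum((y - cell_mean[key]) ** 2 for key, ys in cells.items() for y in ys)
ss_total = sum((y - grand) ** 2 for ys in cells.values() for y in ys)

df_a = len(a_levels) - 1
df_b = len(b_levels) - 1
df_ab = df_a * df_b
df_error = len(a_levels) * len(b_levels) * (n - 1)
ms_error = ss_error / df_error

# One F ratio per null hypothesis.
f_a = (ss_a / df_a) / ms_error
f_b = (ss_b / df_b) / ms_error
f_ab = (ss_ab / df_ab) / ms_error

print(ss_a, ss_b, ss_ab, ss_error, ss_total)
print(f_a, f_b, f_ab)
```

In this hypothetical data set the four pieces (450, 162, 50, and 8) add up to the total of 670. Each F ratio would then be compared with an F critical value using its own numerator df and the error df.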

Example: Excel


Excel

When interaction is present, any main effects must be
interpreted with caution. Interaction should be examined
before main-effect conclusions are drawn.
