2.16 Analysis of Variance (ANOVA) Rev DD 20100604


Six Sigma Black Belt and Green Belt Week 2 Revised 4th June 2010
Objectives
- To introduce ANOVA hypothesis testing
- Graphical methods for analysing differences between means obtained from two or more samples
- Analysis of Variance (ANOVA) methods for analysing the differences between means
- To understand the relationship between "within" subgroup estimates of variation and "between" subgroup estimates of variation
- To understand how to measure effect size
- To practice examples
- To introduce the Post Hoc test
2010-06-04 SKF Group Slide 1
SKF (Group Six Sigma)
2.16 Analysis of Variance (ANOVA)
SKF Six Sigma roadmap

Six Sigma methodology and roadmap for common tool usage
Validating key process inputs and outputs with ANOVA
Y = f(X1, X2, X3, ..., Xn)
Y: variable (continuous) data. Xs: variables with categories (attribute data).
By knowing and controlling the Xs, we reduce the variability in Y. We validate Xs and Ys with hypothesis testing.
What is ANalysis Of VAriance (ANOVA)?

- It compares means by using estimates of variance for each source of variation
- Methods for determining whether or not significant differences in variance exist between two or more samples
- A hypothesis test to validate P-FMEA key process input variables
- A test for validating improvements in the Ys
Analysis of variance (ANOVA) is used to investigate and model the relationship between a response variable and one or more predictor variables. In ANOVA, the predictor variables are qualitative (categorical).
Note: The underlying test is used in many Experimental Designs.
Method
- Uses sums of squared differences, just like a standard deviation, to evaluate the total variability of the system
- Calculates "standard deviations" for each source and subtracts their variability from the total
ANOVA graphical
- Between subgroup variation (signal)
- Within subgroup variation (error)
Degree of freedom introduction

Degrees of freedom (df) is the number of independent comparisons available to estimate a specific statistic. In ANOVA, the degrees of freedom are based on the total number of responses and the number of levels at which factors are tested. What is the minimum number of comparisons it would take to determine which person is the shortest?
The degree of freedom concept?

Example:
Consider a sample of n = 3 scores with a mean of X-bar = 5. The first score in the sample can be selected without any restrictions; the scores are independent of each other and can have any value. For this demonstration assume X = 2 is obtained for the first score and X = 9 for the second. At this point, however, the third score is no longer free: it is determined by the mean and the first two scores.

In this case the third score must be X = 4. The reason is that the entire sample of n = 3 scores has a mean of X-bar = 5, which means that the sum of the scores must be 3 x 5 = 15. The first two scores add up to 11 (= 9 + 2), so the third score must be X = 4. The first two out of three scores were free to have any value, but the final score was dependent on the values chosen for the first two. With a sample of n scores, the first n-1 scores are free to vary, but the final score is determined. As a result, the sample is said to have n-1 degrees of freedom (df). The degrees of freedom give the number of scores in the sample which are independent and free to vary.
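The same reasoning can be written as a few lines of Python. This is only an illustrative sketch of the example above; the variable names are not part of the course material.

```python
# Degrees-of-freedom illustration: once the mean is fixed,
# the last score is forced by the scores chosen before it.
n = 3
sample_mean = 5
free_scores = [2, 9]                 # the first n - 1 scores, chosen freely

required_total = n * sample_mean     # the scores must add up to 15
last_score = required_total - sum(free_scores)
degrees_of_freedom = n - 1

print(last_score)           # 4
print(degrees_of_freedom)   # 2
```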
Thus in other words ...

Degrees of freedom are "statistical cash":
- We "earn" a degree of freedom for every data point we collect
- We "spend" a degree of freedom for every parameter we estimate
Degrees of freedom (within groups):

- Earn a degree of freedom for each observation within each group
- Spend one degree of freedom to calculate the average for each group
- dfW = n - 1 for each group, where n = sample size per treatment
Degrees of freedom (between groups):

- Earn a degree of freedom for each group
- Spend one degree of freedom to calculate the overall average
- dfB = k - 1, where k = number of group averages (number of treatments)
Degree of freedom and ANOVA

The n-1 degrees of freedom for a sample is the same n-1 that is used in the formulas for sample variance and sample standard deviation. Remember, variance is defined as the mean squared deviation. A mean is computed by finding the sum and dividing by the number of scores:
Mean = Sum / Number of scores
To calculate sample variance (mean squared deviation), we find the sum of the squared deviations (SS) and divide by the number of scores that are free to vary. This number is n-1 = df.
\[ s^2 = \frac{\text{Sum of squared deviations}}{\text{Number of scores free to vary}} = \frac{SS}{df} \]
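As a hedged illustration of the SS / df formula, the sketch below applies it to the five Shift 1 values from the seal life example that follows and cross-checks the result against Python's `statistics.variance`, which also divides by n - 1. The variable names are illustrative only.

```python
import statistics

# Shift 1 seal life values, taken from the seal life example in this module.
shift_1 = [25.40, 26.31, 24.10, 23.74, 25.10]

mean = sum(shift_1) / len(shift_1)
ss = sum((x - mean) ** 2 for x in shift_1)   # sum of squared deviations (SS)
df = len(shift_1) - 1                        # scores free to vary
sample_variance = ss / df                    # s^2 = SS / df

print(f"mean = {mean:.2f}, SS = {ss:.4f}, df = {df}, s^2 = {sample_variance:.4f}")
print(f"statistics.variance gives {statistics.variance(shift_1):.4f}")
```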
Calculate the F statistic Example of seal life by shift (1)

Data collection:
Observation   Shift 1   Shift 2   Shift 3
1             25.40     23.40     20.00
2             26.31     21.80     22.20
3             24.10     23.50     19.75
4             23.74     22.75     20.60
5             25.10     21.60     20.40
Mean          24.93     22.61     20.59
df            (5 - 1)   (5 - 1)   (5 - 1)
df within, summed over the three shifts = (4) + (4) + (4) = 12
Overall average = 22.71
[Figure: the individual values per shift with Mean shift 1 (24.93), Mean shift 2 (22.61) and Mean shift 3 (20.59) plotted against the overall average of 22.71.]
The F-distribution
Variance = Sum of squared deviations / df. There are two variances (within and between); the F-statistic is the ratio of these two variances. The ratio follows an F-distribution. The F-distribution depends on two sets of degrees of freedom, one from each variance: df1 for the between and df2 for the within.
\[ F_{df_1, df_2} = \frac{s^2_{between}}{s^2_{within}} \]
\[ s^2_{between} = \frac{SS_{between}}{df_{between}} = \frac{47.164}{3 - 1} = 23.582 \]
where 3 is the number of shifts.
\[ s^2_{within} = \frac{SS_{within}}{df_{within}} = \frac{11.0532}{15 - 3} = 0.9211 \]
where 15 is the total number of data points available.
\[ F_{2, 12} = \frac{s^2_{between}}{s^2_{within}} = \frac{23.582}{0.9211} = 25.60 \]
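The calculation for the seal life example can be reproduced with a short script. This is a minimal sketch using only the Python standard library; the function name `one_way_anova` and the dictionary layout are illustrative choices, not part of the course material or of Minitab.

```python
# Seal life by shift, copied from the data collection slide.
shifts = {
    "Shift 1": [25.40, 26.31, 24.10, 23.74, 25.10],
    "Shift 2": [23.40, 21.80, 23.50, 22.75, 21.60],
    "Shift 3": [20.00, 22.20, 19.75, 20.60, 20.40],
}


def one_way_anova(groups):
    """Return the one-way ANOVA quantities for a dict of {label: values}."""
    values = [x for g in groups.values() for x in g]
    grand_mean = sum(values) / len(values)
    means = {name: sum(g) / len(g) for name, g in groups.items()}

    # Between: group means around the overall average, weighted by group size.
    ss_between = sum(len(g) * (means[name] - grand_mean) ** 2
                     for name, g in groups.items())
    # Within: each value around its own group mean.
    ss_within = sum((x - means[name]) ** 2
                    for name, g in groups.items() for x in g)

    df_between = len(groups) - 1             # k - 1
    df_within = len(values) - len(groups)    # N - k
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return {
        "ss_between": ss_between, "df_between": df_between,
        "ss_within": ss_within, "df_within": df_within,
        "ms_between": ms_between, "ms_within": ms_within,
        "f_ratio": ms_between / ms_within,
    }


result = one_way_anova(shifts)
for key, value in result.items():
    print(f"{key:>11}: {value:.4f}")
```

The printed values should agree with the hand calculation above (SS between 47.164, SS within 11.0532, F about 25.60) up to rounding.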
What is the distribution of the F-ratio?

This is the distribution of F-ratios that would occur if there were no difference in group means. For example, say I'm willing to take a 5% chance of being wrong by saying there is more between than within variation.
- The curve changes as a function of the numerator df and denominator df.
- The area to the right of Fcrit represents the amount of risk I'm willing to take of being wrong when I say that I've found this factor to have a significant effect.
- A calculated F-ratio > Fcrit gives me less than a 5% chance that the larger between variation occurred by chance alone.
- Fcritical at 5%: 5% of the total area lies to the right of this F value, Fcrit.
Remember: you choose the amount of risk to take, then find the corresponding Fcritical.
Evaluate the F statistic Example of seal life by shift (4)

We find the critical value of F in the F-distribution table for the desired significance level of 0.05 (with 2 degrees of freedom in the numerator and 12 degrees of freedom in the denominator).
Denominator df   Numerator df
                 1      2      3
11               4.84   3.98   3.59
12               4.75   3.89   3.49
13               4.67   3.81   3.41
Reject the Null hypothesis if F(2,12) > 3.89. Here 25.60 > 3.89, therefore reject the Null hypothesis. Conclude that the shifts have a significant effect on seal life.
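The decision rule can be sketched in Python as follows. The critical value 3.89 is the table value quoted above. The p-value line uses a closed form that holds only when the numerator df is exactly 2, as it is here; that shortcut is an addition for illustration and is not how Minitab or the slides compute it.

```python
f_ratio = 25.60       # calculated F from the seal life example
f_critical = 3.89     # table value for alpha = 0.05, df = (2, 12)
df_within = 12        # denominator degrees of freedom
alpha = 0.05

# Decision by comparing with the critical value from the table.
reject_h0 = f_ratio > f_critical

# Upper-tail probability of F(2, df_within).
# Assumption: this closed form is valid ONLY for numerator df = 2.
p_value = (1 + 2 * f_ratio / df_within) ** (-df_within / 2)

print("Reject H0:", reject_h0)
print(f"p-value = {p_value:.6f} (alpha = {alpha})")
```

As a sanity check on the shortcut, plugging the critical value 3.89 into the same expression gives a tail area of about 0.05, which is what the table promises.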
F-distribution table Probability points of the F-distribution

Rows: degrees of freedom for the numerator. Columns: degrees of freedom for the denominator.

alpha = 0.05
Num. df \ Den. df      1      2      3      4      5      6      7      8      9     10     15     20
 1                 161.4  18.51  10.13   7.71   6.61   5.99   5.59   5.32   5.12   4.96   4.54   4.35
 2                 199.5  19.00   9.55   6.94   5.79   5.14   4.74   4.46   4.26   4.10   3.68   3.49
 3                 215.7  19.16   9.28   6.59   5.41   4.76   4.35   4.07   3.86   3.71   3.29   3.10
 4                 224.6  19.25   9.12   6.39   5.19   4.53   4.12   3.84   3.63   3.48   3.06   2.87
 5                 230.2  19.30   9.01   6.26   5.05   4.39   3.97   3.69   3.48   3.33   2.90   2.71
 6                 234.0  19.33   8.94   6.16   4.95   4.28   3.87   3.58   3.37   3.22   2.79   2.60
 7                 236.8  19.35   8.89   6.09   4.88   4.21   3.79   3.50   3.29   3.14   2.71   2.51
 8                 238.9  19.37   8.85   6.04   4.82   4.15   3.73   3.44   3.23   3.07   2.64   2.45
 9                 240.5  19.38   8.81   6.00   4.77   4.10   3.68   3.39   3.18   3.02   2.59   2.39
10                 241.9  19.40   8.79   5.96   4.74   4.06   3.64   3.35   3.14   2.98   2.54   2.35
15                 245.9  19.43   8.70   5.86   4.62   3.94   3.51   3.22   3.01   2.85   2.40   2.20
20                 248.0  19.45   8.66   5.80   4.56   3.87   3.44   3.15   2.94   2.77   2.33   2.12

alpha = 0.01
Num. df \ Den. df      1      2      3      4      5      6      7      8      9     10     15     20
 1                  4052  98.50  34.12  21.20  16.26  13.75  12.25  11.26  10.56  10.04   8.68   8.10
 2                  5000  99.00  30.82  18.00  13.27  10.92   9.55   8.65   8.02   7.56   6.36   5.85
 3                  5403  99.17  29.46  16.69  12.06   9.78   8.45   7.59   6.99   6.55   5.42   4.94
 4                  5625  99.25  28.71  15.98  11.39   9.15   7.85   7.01   6.42   5.99   4.89   4.43
 5                  5764  99.30  28.24  15.52  10.97   8.75   7.46   6.63   6.06   5.64   4.56   4.10
 6                  5859  99.33  27.91  15.21  10.67   8.47   7.19   6.37   5.80   5.39   4.32   3.87
 7                  5928  99.36  27.67  14.98  10.46   8.26   6.99   6.18   5.61   5.20   4.14   3.70
 8                  5981  99.37  27.49  14.80  10.29   8.10   6.84   6.03   5.47   5.06   4.00   3.56
 9                  6022  99.39  27.35  14.66  10.16   7.98   6.72   5.91   5.35   4.94   3.89   3.46
10                  6056  99.40  27.23  14.55  10.05   7.87   6.62   5.81   5.26   4.85   3.80   3.37
15                  6157  99.43  26.87  14.20   9.72   7.56   6.31   5.52   4.96   4.56   3.52   3.09
20                  6209  99.45  26.69  14.02   9.55   7.40   6.16   5.36   4.81   4.41   3.37   2.94
Mean sum of squares (MS)

In ANOVA, we use the term Mean Square, or simply MS, instead of the term variance. Remember that variance is defined as the mean of the squared deviations. In the same way that we use SS to stand for the sum of the squared deviations, we now use MS to stand for the mean of the squared deviations. For the final F-ratio we need an MSbetween treatments for the numerator and an MSwithin treatments for the denominator.
MSbetween = SSbetween / dfbetween
MSwithin = SSwithin / dfwithin
F-ratio = MSbetween / MSwithin
Partition of variance and F-ratio Overview

Total variability splits into two parts:
- Between treatments variance (signal): Variance (MSbetween) = SSbetween / dfbetween. Measures differences due to treatment effects and chance.
- Within treatments variance (error): Variance (MSwithin) = SSwithin / dfwithin. Measures differences due to chance.
F-ratio = MSbetween / MSwithin
The p-value and ANOVA

Hypotheses:
- H0: There are no differences between subgroup means
- HA: There are differences between subgroup means
Low p-values suggest that there ARE differences between subgroup means.
Tip: P-value is low, H0 must go !
The R-squared value and ANOVA

To get an indication of how large the effect actually is, we check not only the p-value but also the R2 value when deciding whether the result is robust. For Analysis of Variance, the simplest and most direct way to measure effect size is to compute R2, the percentage of variance accounted for. In simpler terms, R2 measures how much of the difference between scores is accounted for by the differences between treatments. SSbetween measures the variability accounted for by the treatment differences, and SStotal measures the total variability.
\[ R^2 = \frac{SS_{between}}{SS_{total}} \]
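As a small worked illustration, R2 for the seal life example can be computed from the two sums of squares quoted earlier (47.164 between, 11.0532 within), using the fact that in a one-way ANOVA the total SS is the sum of the two. The variable names are illustrative.

```python
# Sums of squares from the seal life example.
ss_between = 47.164
ss_within = 11.0532

# One-way ANOVA partition: total = between + within.
ss_total = ss_between + ss_within
r_squared = ss_between / ss_total

print(f"SS total = {ss_total:.4f}")
print(f"R-squared = {r_squared:.1%}")
```

Under these figures, roughly four fifths of the variation in seal life is accounted for by the shift.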
The R-squared value and ANOVA - examples
[Figure: two bars splitting the sum of variance into the variance explained by the factor (treatment) and the error, i.e. the part of the variance not explained by the factor (Xs). In one model R2 = 90%, in the other R2 = 50%.]
Which model is more robust? A or B?

ANOVA assumptions
1. Normality
2. Homogeneity of variance (equal variances)
3. Independence of error
Independence of error
- Errors should be independent for each value and over time
- If not, then do not assume the test is valid
- Identify why the error is not independent and correct it
- We use control charts to check stability and detect special causes
Normality
- The values in each group are Normally distributed
- Like the t-test, the ANOVA method is robust against departures from normality, especially with large sample sizes. However, non-normal distributions where normality would be expected may indicate an area for investigation
- A Master Black Belt may be consulted when non-normal data is being analysed (non-parametric tests)
Homogeneity of variance
- The variance within each group is equal
- However, if the sample sizes are equal between groups, the F-test is robust enough for unequal variances
- Always try to have equal sample sizes
- If both normality and equal variances are violated, a Master Black Belt may be consulted
The p-value
For a classical hypothesis test, the p-value is the probability of obtaining an F-ratio (or test statistic) at least as large as the one calculated if only within subgroup noise were present, i.e. if H0 were true. Low p-values suggest that there ARE differences between subgroup means:
- H0: There are no differences between subgroup means
- HA: There are differences between subgroup means
Examples to practice !
One-way ANOVA
Stat > ANOVA > One-Way
- Data must be in one column and the subscripts in another
- Can be used with balanced and unbalanced designs
A one-way analysis of variance (ANOVA) tests the hypothesis that the means of several populations are equal. The method is an extension of the two-sample t-test, specifically for the case where the population variances are assumed to be equal. A one-way analysis of variance requires the following:
- A response, or measurement taken from the units sampled
- A factor, or discrete variable which is altered systematically
One-way ANOVA example: Tire brand test

Four cars: 1, 2, 3, 4. Four brands of tires: A, B, C, D.
Objective: To determine tread wear of tires after 30,000 km of driving.
Problem: How do we assign 16 tires to the 4 cars?
Assign each of the 16 tires at random to a wheel. (Large variability within brands.)
Car   Brand (tread wear)
1     C (12)   A (17)   D (13)   D (11)
2     A (14)   A (13)   B (14)   C (12)
3     D (10)   C (11)   B (14)   B (13)
4     A (13)   D (9)    B (8)    C (9)
Model: Tread wear = Overall mean + Brand effect + error

Ref.: "Fundamental Concepts in the Design of Experiments" by Hicks and Turner
Tread wear is the difference in tread thickness in mm.
Data of tread wear of tires (each of the 16 tires assigned at random to a wheel):
Car     Brand     Tread
One     C A D D   12 17 13 11
Two     A A B C   14 13 14 12
Three   D C B B   10 11 14 13
Four    A D B C   13  9  8  9
Open the file <ANOVA - Tire Brand.mtw> and check the different assumptions:

- Stability
- Normality
- Homogeneity of variance (equal variances)
One-way ANOVA Stat > ANOVA > One-way
You wish to compare the mean tread wear for the different brands of tires. H0 is that the mean tread wear is the same for all brands; any variation is caused by the random variation found within each brand. HA is that different brands have different mean tread wear.
Normality and stability One-way ANOVA Residual plots

[Figure: Residual Plots for Tread, four panels: Normal Probability Plot (Percent vs Residual; Normal?), Versus Fits (Residual vs Fitted Value), Histogram (Frequency vs Residual), Versus Order (Residual vs Observation Order; Stable?).]
Variance
Test for Equal Variances for Tread
Test              Test Statistic   P-Value
Bartlett's Test   1.52             0.677
Levene's Test     0.15             0.926
[Figure: 95% Bonferroni Confidence Intervals for StDevs, by Brand.]
Variances are equal?
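For readers who want to see where the Bartlett figures come from, the sketch below recomputes the statistic from the tire data grouped by brand, using the standard textbook form of Bartlett's test. The p-value uses a closed-form chi-square tail that is valid only for 3 degrees of freedom (4 brands minus 1); both helper functions are illustrative additions and not Minitab code.

```python
import math
import statistics

# Tread wear grouped by brand, read from the data listing for the one-way example.
tread_by_brand = {
    "A": [17, 14, 13, 13],
    "B": [14, 14, 13, 8],
    "C": [12, 12, 11, 9],
    "D": [13, 11, 10, 9],
}


def bartlett_statistic(groups):
    """Bartlett's test statistic for equal variances (textbook form)."""
    k = len(groups)
    sizes = [len(g) for g in groups]
    variances = [statistics.variance(g) for g in groups]
    n_total = sum(sizes)
    pooled = sum((n - 1) * v for n, v in zip(sizes, variances)) / (n_total - k)
    numerator = (n_total - k) * math.log(pooled) - sum(
        (n - 1) * math.log(v) for n, v in zip(sizes, variances))
    correction = 1 + (sum(1 / (n - 1) for n in sizes)
                      - 1 / (n_total - k)) / (3 * (k - 1))
    return numerator / correction


def chi2_upper_tail_df3(x):
    """Upper-tail chi-square probability; closed form valid ONLY for df = 3."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


statistic = bartlett_statistic(list(tread_by_brand.values()))
p_value = chi2_upper_tail_df3(statistic)
print(f"Bartlett statistic = {statistic:.2f}, p-value = {p_value:.3f}")
```

A high p-value here gives no evidence against equal variances, which matches the reading of the Minitab output above.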
One-way ANOVA example

1. Open the file <ANOVA - Tire Brand.mtw>
2. Select Stat > ANOVA > One-Way
3. Select Tread for the Response and Brand for the Factor
4. Click on OK
Interpreting the One-way ANOVA Output from the session window

One-way ANOVA: Tread versus Brand
Source   DF   SS      MS      F      P
Brand     3   30.69   10.23   2.44   0.115
Error    12   50.25    4.19
Total    15   80.94
The 1st row "Brand" gives the stats for the variation between the means of the factor levels. The 2nd row "Error" gives the stats for the variation due to random error. The 3rd row "Total" gives the stats for the overall variability in the data.
S = 2.046   R-Sq = 37.92%   R-Sq(adj) = 22.39%
MINITAB
Level   N   Mean     StDev
A       4   14.250   1.893
B       4   12.250   2.872
C       4   11.000   1.414
D       4   10.750   1.708
[Figure: individual 95% CIs for each mean, based on pooled StDev, on a scale from 10.0 to 16.0.]
Pooled StDev = 2.046
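The summary figures in the session window (S, R-Sq, R-Sq(adj)) follow directly from the ANOVA table. The sketch below recomputes them from the raw tire data in plain Python; it is an illustration of the arithmetic under the usual definitions, not a description of Minitab's internals.

```python
import math

# Tread wear grouped by brand, from the data listing for the one-way example.
tread_by_brand = {
    "A": [17, 14, 13, 13],
    "B": [14, 14, 13, 8],
    "C": [12, 12, 11, 9],
    "D": [13, 11, 10, 9],
}

values = [x for g in tread_by_brand.values() for x in g]
n_total, k = len(values), len(tread_by_brand)
grand_mean = sum(values) / n_total

ss_total = sum((x - grand_mean) ** 2 for x in values)
ss_brand = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2
               for g in tread_by_brand.values())
ss_error = ss_total - ss_brand

df_brand, df_error, df_total = k - 1, n_total - k, n_total - 1
ms_brand, ms_error = ss_brand / df_brand, ss_error / df_error
f_ratio = ms_brand / ms_error

s = math.sqrt(ms_error)                            # pooled standard deviation
r_sq = ss_brand / ss_total                         # share of variance explained
r_sq_adj = 1 - ms_error / (ss_total / df_total)    # adjusted for degrees of freedom

print(f"SS brand = {ss_brand:.2f}, SS error = {ss_error:.2f}, SS total = {ss_total:.2f}")
print(f"F = {f_ratio:.2f}, S = {s:.3f}, R-Sq = {r_sq:.2%}, R-Sq(adj) = {r_sq_adj:.2%}")
```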
Interpreting the One-way ANOVA Output from the session window

(Same session window output as above.)
1. What is your decision?
2. Is the result robust or not, and why?
3. Which Brand is best?
Two-way ANOVA Using a 2nd variable to block Car variation

Assign each tire at random, but under the condition that each brand occurs exactly once on each car. This reduces unexplained variability.
Car   Brand (tread wear)
1     B (14)   C (12)   A (17)   D (13)
2     D (11)   C (12)   B (14)   A (14)
3     A (13)   B (13)   D (11)   C (10)
4     C (9)    D (9)    B (8)    A (13)
Model: Tread wear = Overall mean + Brand effect + Car effect + error
Data of tread wear of tires (each brand occurs exactly once on each car):
Car     Brand     Tread
One     B C A D   14 12 17 13
Two     D C B A   11 12 14 14
Three   A B D C   13 13 11 10
Four    C D B A    9  9  8 13
Open the file <ANOVA - Tire Brand Car.mtw> and check the assumptions:

- Stability
- Normality
- Homogeneity of variance (equal variances)
Two-way ANOVA example: Tire brand test

1. Open the file <ANOVA - Tire Brand Car.mtw>
2. Select Stat > ANOVA > Two-Way
3. Select Tread for the Response, Brand for the Row factor and Car for the Column factor. Check Display means.
4. Click on OK
Interpreting the Two-way ANOVA Output from the session window

Two-way ANOVA: Tread versus Brand, Car
MINITAB
Source   DF   SS        MS        F       P
Brand     3   30.6875   10.2292    7.96   0.007
Car       3   38.6875   12.8958   10.04   0.003
Error     9   11.5625    1.2847
Total    15   80.9375
p-values are low for both Car and Brand, therefore: Brands are not the same, and tread loss for Cars is not the same.
S = 1.133   R-Sq = 85.71%   R-Sq(adj) = 76.19%
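The effect of blocking on Car can be checked by hand. The sketch below recomputes the two-way table (no interaction term) from the 16 observations. Obtaining the error SS by subtraction assumes a balanced layout in which every brand appears once on every car, as is the case here; the helper name `factor_ss` is illustrative.

```python
# (car, brand, tread) for the blocked layout: each brand once on each car.
data = [
    ("One", "B", 14), ("One", "C", 12), ("One", "A", 17), ("One", "D", 13),
    ("Two", "D", 11), ("Two", "C", 12), ("Two", "B", 14), ("Two", "A", 14),
    ("Three", "A", 13), ("Three", "B", 13), ("Three", "D", 11), ("Three", "C", 10),
    ("Four", "C", 9), ("Four", "D", 9), ("Four", "B", 8), ("Four", "A", 13),
]


def factor_ss(records, index, grand_mean):
    """Sum of squares for the factor stored at position `index` of each record."""
    levels = {}
    for record in records:
        levels.setdefault(record[index], []).append(record[-1])
    return sum(len(v) * (sum(v) / len(v) - grand_mean) ** 2
               for v in levels.values())


treads = [record[-1] for record in data]
grand_mean = sum(treads) / len(treads)
ss_total = sum((x - grand_mean) ** 2 for x in treads)
ss_car = factor_ss(data, 0, grand_mean)
ss_brand = factor_ss(data, 1, grand_mean)
ss_error = ss_total - ss_car - ss_brand      # valid for this balanced design

df_car = df_brand = 4 - 1
df_error = (len(data) - 1) - df_car - df_brand
ms_error = ss_error / df_error
f_brand = (ss_brand / df_brand) / ms_error
f_car = (ss_car / df_car) / ms_error

print(f"SS brand = {ss_brand:.4f}, SS car = {ss_car:.4f}, SS error = {ss_error:.4f}")
print(f"F brand = {f_brand:.2f}, F car = {f_car:.2f}")
```

Compared with the one-way analysis, the Brand SS is unchanged, but the error SS drops because the Car variation has been taken out of it, which is why the Brand F-ratio rises.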
Let's look at the residual plots ...
Two-way ANOVA Residual plots

[Figure: Residual Plots for Tread, four panels: Normal Probability Plot (Percent vs Residual), Versus Fits (Residual vs Fitted Value), Histogram (Frequency vs Residual), Versus Order (Residual vs Observation Order).]
The residual plots show no unusual observations. The histogram is not bell-shaped, but with only 16 observations it is hard to interpret.
Interpreting the Two-way ANOVA Output from the session window

Two-way ANOVA: Tread versus Brand, Car
MINITAB
Brand   Mean
A       14.25
B       12.25
C       10.75
D       11.00
[Figure: individual 95% CIs for each Brand mean, based on pooled StDev.]
Car     Mean
Four     9.75
One     14.00
Three   11.75
Two     12.75
[Figure: individual 95% CIs for each Car mean, based on pooled StDev.]
The confidence intervals show: Brands are not the same, and tread loss for Cars is not the same.
1. What is your decision?
2. Is the result robust or not, and why?
3. Which factor is significant?
Let's look at the data graphically
Graph > Chart > Values from a table, then Data View
Displaying the Two-way ANOVA design

[Figure: Chart of Tread, bars grouped by Car (One, Two, Three, Four) with Brands B, C, A, D within each car; Tread axis from 0 to 18.]
All 4 Brands recorded their highest (or joint-highest) tread values on Car One. This is an assignable difference due to Car. It also appears that Brand A shows the highest or joint-highest value on every Car.
Displaying the Two-way ANOVA design

[Figure: Chart of Tread, bars grouped by Brand (B, C, A, D) with Cars One, Two, Three, Four within each brand; Tread axis from 2 to 18.]
Here we are trying to discover which Brand of tires had the best tread wear characteristics. We included a blocking variable to explain some of the variability. Based on a comparison of the bar chart and the ANOVA table, which Brand should be selected?
Three-way ANOVA Using a Latin Square design

Each brand appears once in each position and only once on each car (2 restrictions on randomisation). Minimises variability.
Car   Position I   Position II   Position III   Position IV
1     C (12)       B (14)        A (17)         D (13)
2     D (11)       C (12)        B (14)         A (14)
3     A (13)       D (11)        C (10)         B (13)
4     B (8)        A (13)        D (9)          C (9)
Model: Tread wear = Overall mean + Brand effect + Car effect + Position effect + error
Data of tread wear of tires (each brand appears once in each position and on each car)

Worksheet columns: Car (One, Two, Three, Four), Position (Left Front, Right Front, Left Back, Right Back), Brand and Tread.
Brand: B C A D D C B A A B D C C D B A
Tread: 14 12 17 13 11 12 14 14 13 13 11 10 9 9 8 13
Open the file <ANOVA - Tire Brand Car Position.mtw> and check the assumptions:

- Stability
- Normality
- Homogeneity of variance (equal variances)
Three-way ANOVA Stat > ANOVA > General Linear Model

Fill out the dialog box as shown. Click OK.
Interpreting the General Linear Model Output from the session window
General Linear Model: Tread versus Car, Position, Brand
Factor     Type    Levels   Values
Car        fixed   4        Four, One, Three, Two
Position   fixed   4        Left Back, Left Front, Right Back, Right Front
Brand      fixed   4        A, B, C, D
The 1st half of the output lists the values for each level of each factor. The 2nd half is the ANOVA table.
MINITAB
Analysis of Variance for Tread, using Adjusted SS for Tests
Source     DF   Seq SS    Adj SS    Adj MS    F       P
Car         3   38.6875   38.6875   12.8958   14.40   0.004
Position    3    6.1875    6.1875    2.0625    2.30   0.177
Brand       3   30.6875   30.6875   10.2292   11.42   0.007
Error       6    5.3750    5.3750    0.8958
Total      15   80.9375
Two factors are statistically significant at the alpha = 0.05 level: Car and Brand. The factor Position does not appear to have a significant effect. The residual plots will confirm whether the basic assumptions about the error have been met. Let's look at the residual plots ...
S = 0.946485   R-Sq = 93.36%   R-Sq(adj) = 83.40%
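The Latin square analysis can be reproduced with the same main-effects arithmetic, now with three factors. The sketch below reads the data from the Car by Position table of the Latin square design; the position labels I to IV are taken from that table, and the record layout and the helper name `main_effect_ss` are illustrative choices. The error SS by subtraction relies on the balance of the Latin square.

```python
# Latin square from the design slide: rows are cars, columns are positions I to IV.
square = {
    1: [("C", 12), ("B", 14), ("A", 17), ("D", 13)],
    2: [("D", 11), ("C", 12), ("B", 14), ("A", 14)],
    3: [("A", 13), ("D", 11), ("C", 10), ("B", 13)],
    4: [("B", 8), ("A", 13), ("D", 9), ("C", 9)],
}
positions = ["I", "II", "III", "IV"]

records = [
    {"car": car, "position": pos, "brand": brand, "tread": tread}
    for car, cells in square.items()
    for pos, (brand, tread) in zip(positions, cells)
]


def main_effect_ss(rows, factor, response="tread"):
    """Sum of squares of one factor's level means around the overall mean."""
    grand = sum(r[response] for r in rows) / len(rows)
    levels = {}
    for r in rows:
        levels.setdefault(r[factor], []).append(r[response])
    return sum(len(v) * (sum(v) / len(v) - grand) ** 2 for v in levels.values())


grand_mean = sum(r["tread"] for r in records) / len(records)
ss_total = sum((r["tread"] - grand_mean) ** 2 for r in records)
ss = {f: main_effect_ss(records, f) for f in ("car", "position", "brand")}
ss_error = ss_total - sum(ss.values())       # valid because the square is balanced

df_factor = 4 - 1
df_error = (len(records) - 1) - 3 * df_factor
ms_error = ss_error / df_error
f_ratios = {f: (value / df_factor) / ms_error for f, value in ss.items()}

for factor in ss:
    print(f"{factor:>8}: SS = {ss[factor]:.4f}, F = {f_ratios[factor]:.2f}")
print(f"   error: SS = {ss_error:.4f}, df = {df_error}")
```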
General Linear Model Residual plots
[Figure: Residual Plots for Tread, four panels: Normal Probability Plot (Percent vs Residual), Versus Fits (Residual vs Fitted Value), Histogram (Frequency vs Residual), Versus Order (Residual vs Observation Order).]
Review the residual plots and state the conclusions about the assumptions regarding error, i.e. that the errors for each treatment level are independent, normally distributed with a mean = 0 and a constant variance.
ANOVA example with GLM

How to include the interaction in the model?
1. Select Stat > ANOVA > General Linear Model
2. Select Tread for Response, and Car and Brand for Model. For the interaction, add the term Car*Brand.
3. Click on OK
GLM with an unbalanced and nested design

Four chemical companies produce insecticides that can be used to kill mosquitoes, but the composition of the insecticides differs from company to company. An experiment is conducted to test the efficacy of the insecticides by placing 400 mosquitoes inside a glass container treated with a single insecticide and counting the live mosquitoes 4 hours later. Three replications are performed for each product. The goal is to compare the product effectiveness of the different companies. The factors are fixed because you are interested in comparing the particular brands. The factors are nested because each insecticide for each company is unique. You use GLM to analyse your data because the design is unbalanced:

- Company A: 3 types of product
- Company B: 2 types of product
- Company C: 2 types of product
- Company D: 4 types of product

1. Select Stat > ANOVA > General Linear Model
2. Select NMosquito for Response, and Company and Product for Model. For the nested design, add (Company) after Product, i.e. Product(Company).
3. Click on OK

ANOVA table in session window
1. What is your decision? 2. Which parameter is significant?
Multi-way ANOVA
Two-way, Balanced, General Linear Model.
Two-way ANOVA may also be used to analyse a design where there are two controllable factors, both of which are of interest. More than two factors can be analysed using Balanced ANOVA or General Linear Model. There may be more than one factor that has an effect on the response variable. This commonly occurs in manufacturing processes. It is often wise to include more than one factor in the analysis:
- Valuable resources can be used more efficiently by investigating several factors at one time.
- More error can be explained by including additional factors in the model.
- By including more factors, interactions can be studied.
What about the other ANOVA options? When are they appropriate?
- One-way ANOVA: Studies the effect of one factor at various levels on a response variable.
- Two-way ANOVA: Studies the effect of two factors and their interaction at various levels on a response variable.
- Balanced ANOVA: Studies the impact of 2 or more factors and their interactions at various levels on a response variable. The levels of factors are structured such that there are an equal number of levels and observations within each level for each factor.
- General Linear Model: Studies the impact of 2 or more factors and their interactions at various levels on a response variable. The number of levels and observations may vary. The factors may be in a mixture of nested and crossed relationships. The user must specify the factors, interactions and nested/crossed relationships of interest.
- Fully Nested ANOVA: Studies the impact of 2 or more factors. The factors are arranged in a hierarchical structure such that one factor is nested in (or unique to) the factor above it. No interactions are obtained.
Partitioning of sums of squares

Total SS
  - SS Between Brands
  - SS Within Brands
      - SS Between Cars
      - SS Within Cars
          - SS Between Positions
          - SS Within (Error)
Summary ANOVA
One Way ANOVA
To analyse the difference between means from 2 or more samples
Balanced ANOVA
To compare the means of populations that are classified in two or more ways (two or more factors)
General Linear Model

Similar to above
Last words
We reviewed:
- Graphical methods for analysing differences between means obtained from 2 or more samples.
- Analysis of Variance (ANOVA) methods for analysing the differences between means.
- Methods for determining whether or not significant differences in variance exist between two or more samples.
Appendix
Post Hoc tests

Definition: Post hoc tests are additional hypothesis tests that are done after an ANOVA to determine exactly which mean differences are significant and which are not. These tests are done when you reject H0 and there are three or more treatments. Rejecting H0 indicates that at least one difference exists among the treatments. With k = 3 or more, the problem is to find where the differences are. Note that when you have two treatments, rejecting H0 indicates that the two means are not equal; in this case there is no question about which means are different, and there is no need to do Post Hoc tests.
The first test we consider is Tukey's HSD test. Tukey's test allows you to compute a single value that determines the minimum difference between treatment means that is necessary for significance.
This value, called the Honestly Significant Difference (HSD), is then used to compare any two treatments (Xs). If the mean difference exceeds Tukey's HSD, you conclude that there is a significant difference between the treatments. The formula is:
\[ HSD = q \sqrt{\frac{MS_{within}}{n}} \]
n: number of data points for each treatment
The value of q, called a Studentised range statistic, is found in a table. To locate the appropriate value of q, you must know the number of treatments in the overall experiment (k) and the degrees of freedom for the Error, and select the Alpha-risk (0.05). Tukey's test requires that the sample size be the same for all treatments.
Post Hoc tests
Tukeys HSD test example

Example of seal life by shift
Observation   Shift 1   Shift 2   Shift 3
1             25.40     23.40     20.00
2             26.31     21.80     22.20
3             24.10     23.50     19.75
4             23.74     22.75     20.60
5             25.10     21.60     20.40
Mean          24.93     22.61     20.59
ANOVA result: the P-value is low, so the difference between the shifts is significant.
Now the question is: Which mean differences are significant and which are not?

Tukey's HSD calculation step 1: Determine the q value. In this example k = 3 and df for Error = 12. From the table, q = 3.77 with Alpha-risk = 0.05.
Tukey's HSD calculation step 2: Determine the MSwithin value: 0.921.
Tukey's HSD calculation step 3: Calculate the HSD value:
\[ HSD = 3.77 \sqrt{\frac{0.921}{5}} = 1.618 \]
The mean difference between any two samples must be at least 1.618 to be significant. Using this value, we can draw the following conclusions:
- Shift 1 is significantly different from Shift 2 (Mean S1 - Mean S2 = 2.32)
- Shift 1 is significantly different from Shift 3 (Mean S1 - Mean S3 = 4.34)
- Shift 2 is significantly different from Shift 3 (Mean S2 - Mean S3 = 2.02)
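The three steps can be sketched in Python as follows. The q value 3.77 is the table value quoted in step 1 and is entered by hand; the script does not look it up. The dictionary and variable names are illustrative.

```python
import math
from itertools import combinations

q = 3.77             # Studentised range value from the table (k = 3, df error = 12, alpha = 0.05)
ms_within = 0.921    # mean square within, from the seal life ANOVA
n_per_group = 5      # Tukey's HSD assumes the same sample size in every treatment
shift_means = {"Shift 1": 24.93, "Shift 2": 22.61, "Shift 3": 20.59}

hsd = q * math.sqrt(ms_within / n_per_group)
print(f"HSD = {hsd:.3f}")

# Compare every pair of shift means against the HSD threshold.
comparisons = {}
for (name_a, mean_a), (name_b, mean_b) in combinations(shift_means.items(), 2):
    difference = abs(mean_a - mean_b)
    comparisons[(name_a, name_b)] = (difference, difference > hsd)
    verdict = "significant" if difference > hsd else "not significant"
    print(f"{name_a} vs {name_b}: difference = {difference:.2f} -> {verdict}")
```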
Summary
- ANOVA is used as a hypothesis test, and we also use it for components of variation studies
- The X is attribute and the Y is variable: a very common type of data set
- ANOVA introduced us to 3 preliminary tests before deciding whether to reject the null hypothesis: normality, homogeneity of variance and independence of error
- All hypothesis tests require these or similar tests of assumptions
- Choose the appropriate design before calculating the ANOVA
- Use Tukey's HSD test to adjust the conclusion if needed
