Objectives
To introduce ANOVA hypothesis testing
Graphical method for analysing differences between means obtained from two or more samples
To understand how to measure effect size
To practice examples
To introduce the Post Hoc test
By knowing and controlling the Xs, we reduce the variability in Y. We validate Xs and Ys with hypothesis testing.
Method
Uses sums of squared differences, just like a standard deviation, to evaluate the total variability of the system
Calculates "standard deviations" for each source and subtracts their variability from the total
ANOVA graphical
Between subgroup variation (signal)
To calculate sample variance (mean squared deviation), we find the sum of the squared deviations (SS) and divide by the number of scores that are free to vary. This number is n-1 = df.
$s^2 = \dfrac{SS}{df}, \qquad df = n - 1$
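The calculation above can be sketched in a few lines of Python. This is only an illustration: the function name `sample_variance` is hypothetical, and the scores are the four Brand A tread-wear values from the tire example used later in this deck.

```python
import statistics

def sample_variance(scores):
    """Return (SS, df, s2) for a list of scores."""
    mean = sum(scores) / len(scores)
    ss = sum((x - mean) ** 2 for x in scores)  # sum of squared deviations
    df = len(scores) - 1                       # number of scores free to vary
    return ss, df, ss / df

# Brand A tread wear from the tire example in this deck
brand_a = [17, 14, 13, 13]
ss, df, s2 = sample_variance(brand_a)
print(f"SS = {ss:.2f}, df = {df}, s2 = {s2:.4f}")

# Cross-check against the standard library's sample variance
print(f"statistics.variance = {statistics.variance(brand_a):.4f}")
```

Dividing by n - 1 rather than n is what makes this the sample variance; the standard-library call is included only as a cross-check.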
The F-distribution
Variance = Sum of squared deviations / df
There are two variances (within and between); the F-statistic is the ratio of these two variances.
The ratio follows an F-distribution.
The F-distribution depends on two sets of degrees of freedom, one from each variance: df1 for the between and df2 for the within.
$F_{df_1, df_2} = \dfrac{s^2_{between}}{s^2_{within}}$
$s^2_{between} = \dfrac{SS_{between}}{df_{between}}, \qquad s^2_{within} = \dfrac{SS_{within}}{df_{within}}$
2010-06-04 SKF Group Slide 15 SKF (Group Six Sigma) 2.16 Analysis of Variance (ANOVA)
A calculated F-ratio greater than Fcrit means there is less than a 5% chance that a between variation this large occurred by chance alone.
Fcritical at 5%
Remember you choose the amount of risk to take, then find a corresponding Fcritical
Reject Null Hypothesis if F2,12 > 3.89, i.e. 25.60 > 3.89, therefore reject the Null hypothesis. Conclude that the shifts have a significant effect on seal life.
Critical values of F at the 5% level
Rows: denominator df. Columns: numerator df.

 df      2      3      4      5      6      7      8      9     10     15     20
  1  199.5  215.7  224.6  230.2  234.0  236.8  238.9  240.5  241.9  245.9  248.0
  2  19.00  19.16  19.25  19.30  19.33  19.35  19.37  19.38  19.40  19.43  19.45
  3   9.55   9.28   9.12   9.01   8.94   8.89   8.85   8.81   8.79   8.70   8.66
  4   6.94   6.59   6.39   6.26   6.16   6.09   6.04   6.00   5.96   5.86   5.80
  5   5.79   5.41   5.19   5.05   4.95   4.88   4.82   4.77   4.74   4.62   4.56
  6   5.14   4.76   4.53   4.39   4.28   4.21   4.15   4.10   4.06   3.94   3.87
  7   4.74   4.35   4.12   3.97   3.87   3.79   3.73   3.68   3.64   3.51   3.44
  8   4.46   4.07   3.84   3.69   3.58   3.50   3.44   3.39   3.35   3.22   3.15
  9   4.26   3.86   3.63   3.48   3.37   3.29   3.23   3.18   3.14   3.01   2.94
 10   4.10   3.71   3.48   3.33   3.22   3.14   3.07   3.02   2.98   2.85   2.77
 15   3.68   3.29   3.06   2.90   2.79   2.71   2.64   2.59   2.54   2.40   2.33
 20   3.49   3.10   2.87   2.71   2.60   2.51   2.45   2.39   2.35   2.20   2.12

Critical values of F at the 1% level
Rows: denominator df. Columns: numerator df.

 df      2      3      4      5      6      7      8      9     10     15     20
  1   5000   5403   5625   5764   5859   5928   5981   6022   6056   6157   6209
  2  99.00  99.17  99.25  99.30  99.33  99.36  99.37  99.39  99.40  99.43  99.45
  3  30.82  29.46  28.71  28.24  27.91  27.67  27.49  27.35  27.23  26.87  26.69
  4  18.00  16.69  15.98  15.52  15.21  14.98  14.80  14.66  14.55  14.20  14.02
  5  13.27  12.06  11.39  10.97  10.67  10.46  10.29  10.16  10.05   9.72   9.55
  6  10.92   9.78   9.15   8.75   8.47   8.26   8.10   7.98   7.87   7.56   7.40
  7   9.55   8.45   7.85   7.46   7.19   6.99   6.84   6.72   6.62   6.31   6.16
  8   8.65   7.59   7.01   6.63   6.37   6.18   6.03   5.91   5.81   5.52   5.36
  9   8.02   6.99   6.42   6.06   5.80   5.61   5.47   5.35   5.26   4.96   4.81
 10   7.56   6.55   5.99   5.64   5.39   5.20   5.06   4.94   4.85   4.56   4.41
 15   6.36   5.42   4.89   4.56   4.32   4.14   4.00   3.89   3.80   3.52   3.37
 20   5.85   4.94   4.43   4.10   3.87   3.70   3.56   3.46   3.37   3.09   2.94
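The decision rule for the seal-life example (F2,12 = 25.60 against Fcrit = 3.89) can be checked numerically. The sketch below relies on a special case: when the numerator df is 2, the upper-tail probability of the F-distribution has the closed form (1 + 2F/df2)^(-df2/2), so no statistics library is needed. It does not apply to other numerator df, and the function names are hypothetical.

```python
def f_upper_tail_num2(f, df2):
    """P(F > f) for an F-distribution with numerator df = 2 (special case only)."""
    return (1 + 2 * f / df2) ** (-df2 / 2)

def f_crit_num2(alpha, df2):
    """Critical F for numerator df = 2: the inverse of f_upper_tail_num2."""
    return (df2 / 2) * (alpha ** (-2 / df2) - 1)

f_ratio = 25.60   # calculated F-ratio from the seal-life example
df2 = 12          # within (denominator) degrees of freedom
alpha = 0.05      # chosen risk

f_crit = f_crit_num2(alpha, df2)
p_value = f_upper_tail_num2(f_ratio, df2)
reject_h0 = f_ratio > f_crit

print(f"Fcrit = {f_crit:.2f}")
print(f"p-value = {p_value:.6f}")
print(f"Reject H0: {reject_h0}")
```

The same two functions also give the numerator df = 2 entries of an F table, for example `f_crit_num2(0.05, 10)` for the 5% value with 10 denominator df.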
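$\text{Variance } (MS_{between}) = \dfrac{SS_{between}}{df_{between}}$
$\text{Variance } (MS_{within}) = \dfrac{SS_{within}}{df_{within}}$
$F\text{-ratio} = \dfrac{MS_{between}}{MS_{within}}$
$R^2 = \dfrac{SS_{between}}{SS_{total}}$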
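As a rough illustration of these formulas, the sketch below computes the mean squares, the F-ratio and R2 for a one-way layout in plain Python. The function name `one_way_anova` is hypothetical, and the data are the tread-wear values grouped by brand, as listed in the car-by-wheel grid of the tire example later in this deck.

```python
import math

def one_way_anova(groups):
    """groups: dict mapping a factor level to its list of observations."""
    values = [y for ys in groups.values() for y in ys]
    grand_mean = sum(values) / len(values)

    ss_total = sum((y - grand_mean) ** 2 for y in values)
    ss_between = sum(len(ys) * (sum(ys) / len(ys) - grand_mean) ** 2
                     for ys in groups.values())
    ss_within = ss_total - ss_between

    df_between = len(groups) - 1
    df_within = len(values) - len(groups)
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return {
        "SS_between": ss_between,
        "SS_within": ss_within,
        "F": ms_between / ms_within,
        "R2": ss_between / ss_total,
        "S": math.sqrt(ms_within),   # pooled standard deviation
    }

# Tread wear by brand, from the tire example in this deck
tread_by_brand = {
    "A": [17, 14, 13, 13],
    "B": [14, 14, 13, 8],
    "C": [12, 12, 10, 9],
    "D": [13, 11, 11, 9],
}

result = one_way_anova(tread_by_brand)
for name, value in result.items():
    print(f"{name} = {value:.4f}")
```

On these data the sketch gives R2 of about 37.92% and S of about 2.046, which agree with the R-Sq and S values in the one-way Minitab output shown later.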
[Figure: share of the total sum of variance explained by the factor (treatment), shown for two cases: R2 = 90% and R2 = 50%]
ANOVA assumptions
1. Normality
2. Homogeneity of variance (equal variances)
3. Independence of error
Independence of error
Errors should be independent for each value and over time
If not, then do not assume the test is valid
Identify why the error is not independent and correct it
We use control charts to check the stability and detect special causes
Normality
The values in each group are Normally distributed
The ANOVA method, like the t-test, is robust against departures from normality, especially with large sample sizes. However, a non-normal distribution where normality would be expected may indicate an area for investigation
A Master Black Belt may be consulted when non-normal data is being analysed (non-parametric tests)
Homogeneity of variance
The variance within each group is equal
However, if the sample sizes are equal between groups, the F-test is robust enough for unequal variances
Always try to have equal sample sizes
If both normality and equal variances are violated, a Master Black Belt may be consulted
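One way to look at this assumption numerically is to compare the group variances and compute a Levene-type statistic. The sketch below uses the median-centred variant (an F-ratio on absolute deviations from each group median). That choice is an assumption and may differ in detail from what Minitab reports. The function names are hypothetical and the data are the tread-wear values by brand from the tire example in this deck.

```python
import statistics

def mean(values):
    return sum(values) / len(values)

def f_ratio(groups):
    """One-way ANOVA F-ratio for a list of groups."""
    values = [v for g in groups for v in g]
    grand = mean(values)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

def levene_statistic(groups):
    """F-ratio on absolute deviations from each group's median."""
    abs_dev = [[abs(v - statistics.median(g)) for v in g] for g in groups]
    return f_ratio(abs_dev)

# Tread wear by brand, from the tire example in this deck
tread_by_brand = {
    "A": [17, 14, 13, 13],
    "B": [14, 14, 13, 8],
    "C": [12, 12, 10, 9],
    "D": [13, 11, 11, 9],
}

variances = {b: statistics.variance(v) for b, v in tread_by_brand.items()}
w = levene_statistic(list(tread_by_brand.values()))

for brand, var in variances.items():
    print(f"Brand {brand}: variance = {var:.4f}")
print(f"Levene-type statistic W = {w:.3f}")
```

A small W relative to the critical F for the same degrees of freedom gives no evidence against equal variances; with only four observations per group the test has little power, so it is best read alongside the equal-sample-size guidance above.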
The p-value
For a classical hypothesis test, use the p-value to evaluate the probability that an F-ratio (or test statistic) at least as large as the calculated one would arise from within-subgroup noise alone. Low p-values suggest that there ARE differences between subgroup means:
H0: There are no differences between subgroup means
HA: There are differences between subgroup means
Examples to practice !
One-way ANOVA
Stat > ANOVA > One-Way
Data must be in one column and the subscripts in another
Can be used with balanced and unbalanced designs
A one-way analysis of variance (ANOVA) tests the hypothesis that the means of several populations are equal. The method is an extension of the two-sample t-test, specifically for the case where the population variances are assumed to be equal.
A one-way analysis of variance requires the following:
A response, or measurement taken from the units sampled
A factor, or discrete variable which is altered systematically
Objective: To determine tread wear of tires after 30,000 km of driving.
Problem: How do we assign 16 tires to the 4 cars?
Assign each of the 16 tires at random to a wheel. (Large variability within brands.)

Car  Brand (tread wear)
1    C (12)  A (17)  D (13)  D (11)
2    A (14)  A (13)  B (14)  C (12)
3    C (10)  D (11)  B (14)  B (13)
4    A (13)  D (9)   B (8)   C (9)
Data of tread wear of tires
Each of the 16 tires is assigned at random to a wheel

Car    Brand  Tread
One    C      12
One    A      17
One    D      13
One    D      11
Two    A      14
Two    A      13
Two    B      14
Two    C      12
Three  C      10
Three  D      11
Three  B      14
Three  B      13
Four   A      13
Four   D       9
Four   B       8
Four   C       9
You wish to compare the mean tread wear for the different brands of tires. H0 is that the mean tread wear is the same for all brands; any variation is caused by the random variation found within each brand. HA is that different brands have different mean tread wear.
[Figure: residual plots for Tread. Panels: Histogram of residuals (Normal?), Versus Fits, Versus Order by Observation Order (Stable?)]
[Figure: Test for Equal Variances for Tread by Brand (Variance?), showing Bartlett's Test and Levene's Test, each reported with a Test Statistic and P-Value]
3. Select Tread for the Response and Brand for the Factor
4. Click on OK
The 1st row "Brand" gives the stats for the variation between the means of the factor levels. The 2nd row "Error" gives the stats for the variation due to random error. The 3rd row "Total" gives the stats for the overall variability in the data.
S = 2.046
R-Sq = 37.92%
R-Sq(adj) = 22.39%
MINITAB
Level  N
A      4
B      4
C      4
D      4
[Figure: Individual 95% CIs For Mean Based on Pooled StDev, one interval per Level, axis ticks at 10.0, 12.0, 14.0, 16.0]
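The intervals in this output can be approximated by hand as mean ± t × S / sqrt(n), using the pooled S = 2.046 from the output. In the sketch below the brand means are computed from the tread data listed earlier, and the t value 2.179 (two-sided 95%, 12 error degrees of freedom) is an assumption read from a standard t-table, not a value taken from the slides.

```python
import math

# Assumption: t critical value for 95% two-sided CI with 12 error df (t-table)
T_CRIT = 2.179
POOLED_SD = 2.046   # S from the Minitab one-way output

# Tread wear by brand, from the tire example in this deck
tread_by_brand = {
    "A": [17, 14, 13, 13],
    "B": [14, 14, 13, 8],
    "C": [12, 12, 10, 9],
    "D": [13, 11, 11, 9],
}

intervals = {}
for brand, values in tread_by_brand.items():
    n = len(values)
    mean = sum(values) / n
    half_width = T_CRIT * POOLED_SD / math.sqrt(n)
    intervals[brand] = (mean - half_width, mean, mean + half_width)

for brand, (low, mean, high) in intervals.items():
    print(f"{brand}: mean = {mean:.2f}, 95% CI = ({low:.2f}, {high:.2f})")
```

Because every brand has the same n and shares the pooled S, all four intervals have the same width; heavily overlapping intervals are a visual hint that the brand means may not differ significantly.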
1. What is your decision?
2. Is the result robust or not, and why?
3. Which Brand is best?
Cars
Model Tread wear = Overall mean + Brand effect + Car effect + error
Data of tread wear of tires
Each brand of tire occurs exactly once on each car

Car    Brand  Tread
One    B      14
One    C      12
One    A      17
One    D      13
Two    D      11
Two    C      12
Two    B      14
Two    A      14
Three  A      13
Three  B      13
Three  D      11
Three  C      10
Four   C       9
Four   D       9
Four   B       8
Four   A      13
3. Select Tread for the Response and Brand for the Row factor and Car for Column factor. Check Display means.
4. Click on OK
MINITAB
Source  DF  F      P
Brand    3   7.96  0.007
Car      3  10.04  0.003
Error    9
Total   15
p-values are low for Car and Brand, therefore: Brands are not the same, and Tread loss for Cars is not the same.
S = 1.133
R-Sq = 85.71%
R-Sq(adj) = 76.19%
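The two-way results can be reproduced by hand from the model Tread wear = Overall mean + Brand effect + Car effect + error. The sketch below is a plain-Python illustration for this balanced layout with one observation per Car and Brand combination; the helper name `factor_ss` is hypothetical, and the rows are the blocked tread data from this deck.

```python
import math
from collections import defaultdict

# (car, brand, tread) rows from the blocked tire data in this deck
rows = [
    ("One", "B", 14), ("One", "C", 12), ("One", "A", 17), ("One", "D", 13),
    ("Two", "D", 11), ("Two", "C", 12), ("Two", "B", 14), ("Two", "A", 14),
    ("Three", "A", 13), ("Three", "B", 13), ("Three", "D", 11), ("Three", "C", 10),
    ("Four", "C", 9), ("Four", "D", 9), ("Four", "B", 8), ("Four", "A", 13),
]

treads = [t for _, _, t in rows]
grand_mean = sum(treads) / len(treads)
ss_total = sum((t - grand_mean) ** 2 for t in treads)

def factor_ss(column):
    """Sum of squares between the levels of one factor (0 = car, 1 = brand)."""
    by_level = defaultdict(list)
    for row in rows:
        by_level[row[column]].append(row[2])
    return sum(len(v) * (sum(v) / len(v) - grand_mean) ** 2
               for v in by_level.values())

ss_car, ss_brand = factor_ss(0), factor_ss(1)
ss_error = ss_total - ss_car - ss_brand          # what the two factors leave unexplained
df_car = df_brand = 3
df_error = len(rows) - 1 - df_car - df_brand     # 15 - 3 - 3 = 9
ms_error = ss_error / df_error

f_brand = (ss_brand / df_brand) / ms_error
f_car = (ss_car / df_car) / ms_error
s = math.sqrt(ms_error)
r_sq = 1 - ss_error / ss_total

print(f"F(Brand) = {f_brand:.2f}, F(Car) = {f_car:.2f}")
print(f"S = {s:.3f}, R-Sq = {100 * r_sq:.2f}%")
```

Compared with the one-way analysis, the Brand sum of squares is unchanged; what shrinks is the error term, because the Car block absorbs variation that was previously counted as noise.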
[Figure: residual plots for Tread. Panels: Histogram of residuals, Versus Fits, Versus Order]
The residual plots show no unusual observations. With only 16 observations the histogram is not bell shaped, so it is hard to interpret.
Brand A B C D
MINITAB
why?
Graph > Chart > Values from a table > Data View
All 4 Brands performed better in Car One. This is an assignable difference due to Car. Also it appears that Brand A performs better at each Car than the other Brands.
[Figure: bar chart of Tread by Car (One, Two, Three, Four), one panel per Brand (B, C, A, D)]
Here we are trying to discover which Brand of Tires had the best Tread Wear characteristics. We included a blocking variable to explain some of the variability. Based on a comparison of the bar chart and the ANOVA table which Brand should be selected?
Position I II III IV
Model Tread wear = Overall mean + Brand effect + Car effect + Position effect + error
Tread 14 12 17 13 11 12 14 14 13 13 11 10 9 9 8 13
Interpreting the General Linear Model Output from the session window
General Linear Model: Tread versus Car, Position, Brand

Factor    Type   Levels  Values
Car       fixed  4       Four, One, Three, Two
Position  fixed  4       Left Back, Left Front, Right Back, Right Front
Brand     fixed  4       A, B, C, D
The 1st half of the table lists the value for each level of each factor. The 2nd half is the ANOVA table.
MINITAB
Analysis of Variance for Tread, using Adjusted SS for Tests

Source    DF  Seq SS   Adj SS   Adj MS   F      P
Car        3  38.6875  38.6875  12.8958  14.40  0.004
Position   3   6.1875   6.1875   2.0625   2.30  0.177
Brand      3  30.6875  30.6875  10.2292  11.42  0.007
Error      6   5.3750   5.3750   0.8958
Total     15  80.9375
Two factors are statistically significant at the α = 0.05 level: Car and Brand. Factor Position doesn't appear to have a significant effect. The residual plots will confirm whether the basic assumptions about the error have been met. Let's look at the residual plots ...
S = 0.946485
R-Sq = 93.36%
R-Sq(adj) = 83.40%
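The summary figures in this output follow directly from the sums of squares and degrees of freedom in the ANOVA table. The short sketch below redoes that arithmetic; the variable names are illustrative, and the only inputs are the Seq SS and DF values printed by Minitab above.

```python
import math

# Seq SS and DF copied from the General Linear Model output above
ss = {"Car": 38.6875, "Position": 6.1875, "Brand": 30.6875}
df = {"Car": 3, "Position": 3, "Brand": 3}
ss_error, df_error = 5.3750, 6
ss_total, df_total = 80.9375, 15

ms_error = ss_error / df_error

# F for each factor = its mean square divided by the error mean square
f_ratios = {name: (ss[name] / df[name]) / ms_error for name in ss}

s = math.sqrt(ms_error)                              # S
r_sq = 1 - ss_error / ss_total                       # R-Sq
r_sq_adj = 1 - ms_error / (ss_total / df_total)      # R-Sq(adj)

for name, f in f_ratios.items():
    print(f"F({name}) = {f:.2f}")
print(f"S = {s:.6f}")
print(f"R-Sq = {100 * r_sq:.2f}%  R-Sq(adj) = {100 * r_sq_adj:.2f}%")
```

R-Sq(adj) compares mean squares rather than sums of squares, so it penalises the degrees of freedom spent on extra factors such as Position.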
[Figure: residual plots for Tread. Panels: Histogram of residuals, Versus Fits, Versus Order]
Review the residual plots and state the conclusions about the assumptions regarding error, i.e. that the errors for each treatment level are independent, normally distributed with a mean = 0 and a constant variance.
2. Select Tread for Response, and Car and Brand for Model. For the interaction we create Car*Brand.
3. Click on OK
Multi-way ANOVA
Two-way, Balanced, General Linear Model.
Two-way ANOVA may also be used to analyse a design where there are two controllable factors, both of which are of interest. More than two factors can be analysed using Balanced ANOVA or General Linear Model. There may be more than one factor that has an effect on the response variable. This commonly occurs in manufacturing processes. It is often wise to include more than one factor in the analysis.
Valuable resources can be used more efficiently by investigating several factors at one time.
More error can be explained by including additional factors in the model.
By including more factors, interactions can be studied.
What about the other ANOVA options? When are they appropriate?
One-way ANOVA: Studies the effect of one factor at various levels on a response variable.
Two-way ANOVA: Studies the effect of two factors and their interaction at various levels on a response variable.
Balanced ANOVA: Studies the impact of 2 or more factors and their interactions at various levels on a response variable. The levels of factors are structured such that there are an equal number of levels and observations within each level for each factor.
General Linear Model: Studies the impact of 2 or more factors and their interactions at various levels on a response variable. The number of levels and observations may vary. The factors may be a mixture of nested and crossed relationships. The user must specify the factors, interactions and nested/crossed relationships of interest.
Fully Nested ANOVA: Studies the impact of 2 or more factors. The factors are arranged in a hierarchical structure such that one factor is nested within (or unique to) the factor above it. No interactions are obtained.
[Figure: partition of the sum of squares. Brands: SS Between Brands and SS Within Brands. Cars: SS Between Cars and SS Within Cars. Positions: SS Between Positions and SS Within (Error).]
Summary ANOVA
One Way ANOVA
To analyse the difference between means from 2 or more samples
Balanced ANOVA
To compare the means of populations that are classified in two or more ways (two or more factors)
Last words
We reviewed:
Graphical methods for analysing differences between means obtained from 2 or more samples.
Analysis of Variance (ANOVA) methods for analysing the differences between means.
Methods for determining whether or not significant differences in variance exist between two or more samples.
Appendix
Post Hoc tests are used when you reject H0 and there are three or more treatments. Rejecting H0 indicates that at least one difference exists among the treatments. With k = 3 or more, the problem is to find where the differences are. Note that when you have two treatments, rejecting H0 indicates that the two means are not equal; in this case there is no question about which means are different, and there is no need to do Post Hoc tests.
The first test we consider is Tukey's HSD test. Tukey's test allows you to compute a single value that determines the minimum difference between treatment means that is necessary for significance.
$HSD = q \sqrt{\dfrac{MS_{within}}{n}}$
n: number of observations in each treatment
The value of q is found in the table (next slide). To locate the appropriate value of q, you must know the number of treatments in the overall experiment (k) and the degrees of freedom for the Error, and select the Alpha-risk (0.05).
The q value used in this test is called a Studentised range statistic.
Tukey's test requires that the sample size be the same for all treatments.
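As an illustration of the HSD calculation, the sketch below applies it to the one-way tire example from this deck (k = 4 brands, n = 4 tires per brand, 12 error degrees of freedom, S = 2.046). The q value of 4.20 is an assumption read from a Studentised range table for k = 4, df = 12 and alpha = 0.05; it is not given in the slides. The function name `tukey_hsd` is hypothetical.

```python
import math
from itertools import combinations

def tukey_hsd(q, ms_within, n):
    """Minimum difference between two treatment means needed for significance."""
    return q * math.sqrt(ms_within / n)

# Assumption: Studentised range value for k = 4, error df = 12, alpha = 0.05
Q_VALUE = 4.20
MS_WITHIN = 2.046 ** 2   # S squared from the one-way Minitab output

# Tread wear by brand, from the tire example in this deck
tread_by_brand = {
    "A": [17, 14, 13, 13],
    "B": [14, 14, 13, 8],
    "C": [12, 12, 10, 9],
    "D": [13, 11, 11, 9],
}
means = {b: sum(v) / len(v) for b, v in tread_by_brand.items()}

hsd = tukey_hsd(Q_VALUE, MS_WITHIN, n=4)
print(f"HSD = {hsd:.2f}")

significant_pairs = []
for a, b in combinations(means, 2):
    diff = abs(means[a] - means[b])
    is_significant = diff >= hsd
    if is_significant:
        significant_pairs.append((a, b))
    print(f"{a} vs {b}: difference = {diff:.2f}, significant: {is_significant}")
```

Any pair whose mean difference reaches the HSD would be declared significantly different; the same comparison logic applies to the shift example, where the HSD is stated directly.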
ANOVA result: P-value is low, the difference is significant between the shifts.
Now the question is: Which mean differences are significant and which are not?
The mean difference between any two samples must be at least 1.618 to be significant. Using this value, we can make the following conclusions:
Shift 1 is significantly different from Shift 2 (Mean S1 - Mean S2 = 2.32)
Shift 1 is significantly different from Shift 3 (Mean S1 - Mean S3 = 4.34)
Shift 2 is significantly different from Shift 3 (Mean S2 - Mean S3 = 2.02)
Summary
ANOVA is used as a hypothesis test, and we also use it for components of variation studies
The X is attribute and the Y is variable: a very common type of data set
ANOVA introduced us to 3 preliminary tests before concluding whether or not to reject the null hypothesis:
Stability
Normality
Homogeneity of variance
All hypothesis tests require these or similar tests of assumptions
Choose the appropriate design before calculating the ANOVA
Use Tukey's HSD test to adjust the conclusion if needed