
2.16 Analysis of Variance (ANOVA)


Six Sigma Black Belt and Green Belt Week 2 Revised 4th June 2010

Objectives
To introduce ANOVA hypothesis testing:
Graphical methods for analysing differences between means obtained from two or more samples
Analysis of Variance (ANOVA) methods for analysing the differences between means

To understand the relationship between "within" subgroup estimates of variation and "between" subgroup estimates of variation

To understand how to measure effect size
To practise examples
To introduce the Post Hoc test

2010-06-04 SKF Group Slide 1

SKF (Group Six Sigma)

2.16 Analysis of Variance (ANOVA)

SKF Six Sigma roadmap


Six Sigma methodology and roadmap for common tool usage


Validating key process inputs and outputs with ANOVA

Y = f(X1, X2, X3, ..., Xn)


Variable (Continuous) Variables with categories (Attribute)

By knowing and controlling the Xs, we reduce the variability in Y. We validate Xs and Ys with hypothesis testing.


What is ANalysis Of VAriance (ANOVA)?


It compares means or other estimates of variance for each source of variation
Methods for determining whether or not significant differences in variance exist between two or more samples
A hypothesis test to validate P-FMEA key process input variables
A test for validating improvements in the Ys
Analysis of variance (ANOVA) is used to investigate and model the relationship between a response variable and one or more predictor variables. In ANOVA, however, the predictor variables are qualitative (categorical).
Note: The underlying test is used in many Experimental Designs.

Method
Uses sums of squared differences, just like a standard deviation, to evaluate the total variability of the system
Calculates "standard deviations" for each source and subtracts their variability from the total

ANOVA graphical
Between subgroup variation (signal)

Within subgroup variation (error)


Degree of freedom introduction


Degrees of freedom (df) is the number of independent comparisons available to estimate a specific statistic. In ANOVA, the degrees of freedom are based on the total number of responses and the number of levels at which factors are tested. What is the minimum number of comparisons it would take to determine which person is the shortest?


The degree of freedom concept?


Example:
Consider a sample of n = 3 scores with a mean of X-bar = 5. The first two scores in the sample can be selected without any restrictions; they are independent of each other and can have any value. For this demonstration, assume X = 2 is obtained for the first score and X = 9 for the second. At this point, however, the third score is determined.


The degree of freedom concept?


In this case the third score must be X = 4. The reason is that the entire sample of n = 3 scores has a mean of X-bar = 5, which means that the sum of the scores must be ΣX = 15. The first two scores add up to 11 (= 9 + 2), so the third score must be X = 4. In this case the first two out of three scores were free to have any value, but the final score was dependent on the values chosen for the first two. With a sample of n scores, the first n - 1 scores are free to vary, but the final score is determined. As a result, the sample is said to have n - 1 degrees of freedom (df). The degrees of freedom are the number of scores in the sample which are independent and free to vary.
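
The reasoning above can be checked with a few lines of Python. This is only an illustrative sketch; the helper name `last_score` is hypothetical and not part of the course material.

```python
def last_score(free_scores, n, mean):
    """Return the single score that is forced once n - 1 scores are chosen."""
    if len(free_scores) != n - 1:
        raise ValueError("exactly n - 1 scores can be chosen freely")
    required_total = n * mean  # the mean fixes the sum of all n scores
    return required_total - sum(free_scores)


# Slide example: n = 3, mean = 5, first two scores are 2 and 9.
third_score = last_score([2, 9], n=3, mean=5)
print(third_score)  # 4
```

Whatever values are chosen for the first two scores, the third is fixed by the mean, which is why only n - 1 scores count as free.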


Thus in other words ...


Degrees of freedom are "statistical cash" ...
We "earn" a degree of freedom for every data point we collect
We "spend" a degree of freedom for every parameter we estimate

Degrees of freedom (within groups):
Earn a degree of freedom for each observation within each group
Spend one degree of freedom to calculate the average for each group
dfW = n - 1 for each group, where n = sample size per treatment

Degrees of freedom (between groups):
Earn a degree of freedom for each group
Spend one degree of freedom to calculate the overall average
dfB = k - 1, where k = number of group averages (number of treatments)


Degree of freedom and ANOVA


The n - 1 degrees of freedom for a sample are the same n - 1 that is used in the formulas for sample variance and sample standard deviation. Remember, variance is defined as the mean squared deviation. A mean is computed by finding the sum and dividing by the number of scores:
Mean = Sum / Number of scores

To calculate sample variance (mean squared deviation), we find the sum of the squared deviations (SS) and divide by the number of scores that are free to vary. This number is n - 1 = df.

$s^2 = \dfrac{\text{Sum of squared deviations}}{\text{Number of scores free to vary}} = \dfrac{SS}{df}$
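
As a rough illustration of the SS / df definition, the sketch below computes the sample variance by hand and compares it with Python's standard `statistics.variance`. The scores 2, 9 and 4 are taken from the degrees of freedom example; the function name `sample_variance` is an assumption made for this sketch.

```python
import statistics


def sample_variance(scores):
    """Sample variance as SS / df, with df = n - 1."""
    n = len(scores)
    mean = sum(scores) / n
    ss = sum((x - mean) ** 2 for x in scores)  # sum of squared deviations
    df = n - 1                                 # scores that are free to vary
    return ss / df


scores = [2, 9, 4]
by_hand = sample_variance(scores)
from_library = statistics.variance(scores)
print(by_hand, from_library)
```

Both calculations divide by n - 1, so they should agree.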


Calculate the F statistic Example of seal life by shift (1)


Data collection:

          Shift 1    Shift 2    Shift 3
          25.40      23.40      20.00
          26.31      21.80      22.20
          24.10      23.50      19.75
          23.74      22.75      20.60
          25.10      21.60      20.40
Mean      24.93      22.61      20.59
df        (5 - 1)    (5 - 1)    (5 - 1)

dfwithin (total over the three shifts) = (4) + (4) + (4) = 12

Overall average = 22.71

Calculate the F statistic Example of seal life by shift (2)

Mean shift 1 (24.93) Mean shift 2 (22.61) Mean shift 3 (20.59)

Overall average = 22.71


The F-distribution
Variance = Sum of squared deviations / df
There are two variances (within and between); the F-statistic is the ratio of these two variances. The ratio follows an F-distribution.
The F-distribution depends on two sets of degrees of freedom, the df from each variance: df1 for the between and df2 for the within.

$F_{df_1, df_2} = \dfrac{s^2_{between}}{s^2_{within}}$


Calculate the F statistic Example of seal life by shift (3)


$s^2_{between} = \dfrac{SS_{between}}{df_{between}} = \dfrac{47.164}{3 - 1} = 23.582$ (3 = number of shifts)

$s^2_{within} = \dfrac{SS_{within}}{df_{within}} = \dfrac{11.0532}{15 - 3} = 0.9211$ (15 = total data available, 3 = number of shifts)

$F_{df_1, df_2} = \dfrac{s^2_{between}}{s^2_{within}} = \dfrac{23.582}{0.9211} = 25.60$

The F-distribution depends on two sets of degrees of freedom, the df from each variance: df1 for the between and df2 for the within.
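
The calculation above can be reproduced from the raw seal life data. The following is a minimal sketch in plain Python, not the Minitab procedure used in the course; the function name `one_way_anova` and the dictionary keys are assumptions made for illustration.

```python
def one_way_anova(groups):
    """Partition the variation of several samples into between and within parts."""
    all_values = [x for group in groups for x in group]
    grand_mean = sum(all_values) / len(all_values)
    group_means = [sum(group) / len(group) for group in groups]

    # Between: distance of each group mean from the overall average (signal).
    ss_between = sum(
        len(group) * (mean - grand_mean) ** 2
        for group, mean in zip(groups, group_means)
    )
    # Within: distance of each observation from its own group mean (error).
    ss_within = sum(
        (x - mean) ** 2
        for group, mean in zip(groups, group_means)
        for x in group
    )
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return {
        "ss_between": ss_between, "df_between": df_between, "ms_between": ms_between,
        "ss_within": ss_within, "df_within": df_within, "ms_within": ms_within,
        "f_ratio": ms_between / ms_within,
    }


shift_1 = [25.40, 26.31, 24.10, 23.74, 25.10]
shift_2 = [23.40, 21.80, 23.50, 22.75, 21.60]
shift_3 = [20.00, 22.20, 19.75, 20.60, 20.40]

result = one_way_anova([shift_1, shift_2, shift_3])
for name, value in result.items():
    print(f"{name}: {value:.4f}")
```

The printed sums of squares, degrees of freedom and F-ratio should agree with the hand calculation (47.164, 11.0532 and roughly 25.60).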

What is the distribution of the F-ratio?


This is the distribution of F-ratios that would occur if there were no difference in group means.
The curve changes as a function of the numerator df and the denominator df.
For example, say I'm willing to take a 5% chance of being wrong by saying there is more between than within variation. This represents the amount of risk I'm willing to take of being wrong when I say that I've found this factor to have a significant effect.

A calculated F-ratio > Fcrit means there is less than a 5% chance that the larger between variation occurred by chance alone.

Fcritical at 5%: 5% of the total area lies to the right of this F value, Fcrit.

Remember: you choose the amount of risk to take, then find the corresponding Fcritical.


Evaluate the F statistic Example of seal life by shift (4)


We look up the critical value of F in the F-distribution table for the desired significance level of 0.05, with 2 degrees of freedom in the numerator and 12 degrees of freedom in the denominator.

                  Numerator df
Denominator df    1       2       3
11                4.84    3.98    3.59
12                4.75    3.89    3.49
13                4.67    3.81    3.41

Reject the null hypothesis if F2,12 > 3.89. Here 25.60 > 3.89, therefore reject the null hypothesis and conclude that the shifts have a significant effect on seal life.
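
The table value can also be cross-checked numerically. The sketch below relies on a special case that is an addition to the course material: when the numerator has exactly 2 degrees of freedom, the upper tail of the F-distribution has the closed form P(F > x) = (1 + 2x/df2)^(-df2/2), which can be inverted directly. For other numerator df a statistics package or the table is needed. The function name is hypothetical.

```python
def f_critical_num_df_2(alpha, df_within):
    """Upper-tail critical value of F(2, df_within).

    Valid only for 2 numerator degrees of freedom, where
    P(F > x) = (1 + 2 * x / df_within) ** (-df_within / 2).
    """
    return (df_within / 2) * (alpha ** (-2 / df_within) - 1)


f_crit = f_critical_num_df_2(alpha=0.05, df_within=12)
f_calculated = 25.60  # F-ratio from the seal life example

print(round(f_crit, 2))       # 3.89
print(f_calculated > f_crit)  # True, so reject the null hypothesis
```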


F-distribution table Probability points of the F-distribution


α = 0.05: first row; α = 0.01: second row

                Degrees of freedom for numerator
Denominator df        1      2      3      4      5      6      7      8      9     10     15     20
1                 161.4  199.5  215.7  224.6  230.2  234.0  236.8  238.9  240.5  241.9  245.9  248.0
                   4052   5000   5403   5625   5764   5859   5928   5981   6022   6056   6157   6209
2                 18.51  19.00  19.16  19.25  19.30  19.33  19.35  19.37  19.38  19.40  19.43  19.45
                  98.50  99.00  99.17  99.25  99.30  99.33  99.36  99.37  99.39  99.40  99.43  99.45
3                 10.13   9.55   9.28   9.12   9.01   8.94   8.89   8.85   8.81   8.79   8.70   8.66
                  34.12  30.82  29.46  28.71  28.24  27.91  27.67  27.49  27.35  27.23  26.87  26.69
4                  7.71   6.94   6.59   6.39   6.26   6.16   6.09   6.04   6.00   5.96   5.86   5.80
                  21.20  18.00  16.69  15.98  15.52  15.21  14.98  14.80  14.66  14.55  14.20  14.02
5                  6.61   5.79   5.41   5.19   5.05   4.95   4.88   4.82   4.77   4.74   4.62   4.56
                  16.26  13.27  12.06  11.39  10.97  10.67  10.46  10.29  10.16  10.05   9.72   9.55
6                  5.99   5.14   4.76   4.53   4.39   4.28   4.21   4.15   4.10   4.06   3.94   3.87
                  13.75  10.92   9.78   9.15   8.75   8.47   8.26   8.10   7.98   7.87   7.56   7.40
7                  5.59   4.74   4.35   4.12   3.97   3.87   3.79   3.73   3.68   3.64   3.51   3.44
                  12.25   9.55   8.45   7.85   7.46   7.19   6.99   6.84   6.72   6.62   6.31   6.16
8                  5.32   4.46   4.07   3.84   3.69   3.58   3.50   3.44   3.39   3.35   3.22   3.15
                  11.26   8.65   7.59   7.01   6.63   6.37   6.18   6.03   5.91   5.81   5.52   5.36
9                  5.12   4.26   3.86   3.63   3.48   3.37   3.29   3.23   3.18   3.14   3.01   2.94
                  10.56   8.02   6.99   6.42   6.06   5.80   5.61   5.47   5.35   5.26   4.96   4.81
10                 4.96   4.10   3.71   3.48   3.33   3.22   3.14   3.07   3.02   2.98   2.85   2.77
                  10.04   7.56   6.55   5.99   5.64   5.39   5.20   5.06   4.94   4.85   4.56   4.41
15                 4.54   3.68   3.29   3.06   2.90   2.79   2.71   2.64   2.59   2.54   2.40   2.33
                   8.68   6.36   5.42   4.89   4.56   4.32   4.14   4.00   3.89   3.80   3.52   3.37
20                 4.35   3.49   3.10   2.87   2.71   2.60   2.51   2.45   2.39   2.35   2.20   2.12
                   8.10   5.85   4.94   4.43   4.10   3.87   3.70   3.56   3.46   3.37   3.09   2.94

Mean sum of squares (MS)


In ANOVA, we use the term Mean Square, or simply MS, instead of the term variance. Remember that variance is defined as the mean of the squared deviations. In the same way that we use SS to stand for the sum of the squared deviations, we will now use MS to stand for the mean of the squared deviations. For the final F-ratio we need an MSbetween treatments for the numerator and an MSwithin treatments for the denominator.

MSbetween = SSbetween / dfbetween
MSwithin = SSwithin / dfwithin

F-ratio = MSbetween / MSwithin


Partition of variance and F-ratio Overview


Total variability is partitioned into:

Between treatments variance (signal): Variance (MSbetween) = SSbetween / dfbetween
Measures differences due to: treatment effects and chance

Within treatments variance (error): Variance (MSwithin) = SSwithin / dfwithin
Measures differences due to: chance

F-ratio = MSbetween / MSwithin


The p-value and ANOVA


Assumptions ...
H0: There are no differences between subgroup means
HA: There are differences between subgroup means
Low p-values suggest that there ARE differences between subgroup means.

Tip: P-value is low, H0 must go !
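
To make the tip concrete, the p-value for the seal life example (F = 25.60 with 2 and 12 degrees of freedom) can be sketched as follows. This uses a closed form that holds only when the numerator has 2 degrees of freedom, P(F > x) = (1 + 2x/df2)^(-df2/2); it is an addition to the course material, and the function name is hypothetical. In general a statistics package reports the p-value.

```python
def p_value_num_df_2(f_ratio, df_within):
    """Upper-tail probability of F(2, df_within); valid for 2 numerator df only."""
    return (1 + 2 * f_ratio / df_within) ** (-df_within / 2)


alpha = 0.05
p_value = p_value_num_df_2(f_ratio=25.60, df_within=12)

print(f"p-value = {p_value:.6f}")
if p_value < alpha:
    print("P-value is low, H0 must go: the subgroup means differ.")
else:
    print("No evidence of a difference between the subgroup means.")
```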


The R-squared value and ANOVA


To indicate how large the effect actually is, we check not only the p-value but also the R2 value when deciding whether the result is robust. For Analysis of Variance, the simplest and most direct way to measure effect size is to compute R2, the percentage of variance accounted for. In simpler terms, R2 measures how much of the difference between scores is accounted for by the differences between treatments. SSbetween measures the variability accounted for by the treatment differences, and SStotal measures the total variability.

$R^2 = \dfrac{SS_{between}}{SS_{total}}$
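
As a small worked example, R2 can be computed for the seal life data from the sums of squares found earlier (SSbetween = 47.164, SSwithin = 11.0532). The sketch assumes a one-way layout, where SStotal = SSbetween + SSwithin; the function name is hypothetical.

```python
def r_squared(ss_between, ss_within):
    """Share of the total variability accounted for by the treatment."""
    ss_total = ss_between + ss_within  # one-way layout: total = between + within
    return ss_between / ss_total


r2 = r_squared(ss_between=47.164, ss_within=11.0532)
print(f"R-squared = {r2:.1%}")  # R-squared = 81.0%
```

Here about four fifths of the variation in seal life is accounted for by the shift.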

The R-squared value and ANOVA - examples

[Figure: two bars, each showing the sum of variance split into the variance explained by the factor (treatment) and the error, the part of the variance not explained by the factor (Xs). One model has R2 = 90%, the other R2 = 50%.]

Which model is more robust? A or B?



ANOVA assumptions
1. Normality
2. Homogeneity of variance (equal variances)
3. Independence of error

Independence of error
Errors should be independent for each value and over time
If not, then do not assume the test is valid
Identify why the error is not independent and correct it
We use control charts to check stability and detect special causes

Normality
The values in each group are Normally distributed
As with the t-test, the ANOVA method is robust against departures from normality, especially with large sample sizes; however, non-normal distributions where normality would be expected may indicate an area for investigation
A Master Black Belt may be consulted when non-normal data is being analysed (non-parametric tests)

Homogeneity of variance
The variance within each group is equal
However, if the sample sizes are equal between groups, the F-test is robust enough for unequal variances
Always try to have equal sample sizes
If both normality and equal variances are violated, a Master Black Belt may be consulted

The p-value
For a classical hypothesis test, use the p-value to evaluate the probability that an F-ratio (or test statistic) at least as large as the calculated one would arise from within-subgroup noise alone.
Low p-values suggest that there ARE differences between subgroup means:
H0: There are no differences between subgroup means
HA: There are differences between subgroup means

Examples to practice !


One-way ANOVA
Stat > ANOVA > One-Way
Data must be in one column and the subscripts in another
Can be used with balanced and unbalanced designs

A one-way analysis of variance (ANOVA) tests the hypothesis that the means of several populations are equal. The method is an extension of the two-sample t-test, specifically for the case where the population variances are assumed to be equal. A one-way analysis of variance requires the following:
A response, or measurement taken from the units sampled
A factor, or discrete variable which is altered systematically

One-way ANOVA example: Tire brand test


Four cars: 1, 2, 3, 4
Four brands of tires: A, B, C, D

Objective: To determine tread wear of tires after 30,000 km of driving.
Problem: How do we assign 16 tires to the 4 cars?
Assign each of the 16 tires at random to a wheel. (Large variability within brands.)

Car 1     Car 2     Car 3     Car 4
C (12)    A (14)    C (10)    A (13)
A (17)    A (13)    D (11)    D (9)
D (13)    B (14)    B (14)    B (8)
D (11)    C (12)    B (13)    C (9)

Model: Tread wear = Overall mean + Brand effect + error

Ref.: "Fundamental Concepts in the Design of Experiments" by Hicks and Turner

Difference in tread thickness in mm.

Data of tread wear of tires: each of the 16 tires assigned at random to a wheel

Car      Brand    Tread
One      C        12
One      A        17
One      D        13
One      D        11
Two      A        14
Two      A        13
Two      B        14
Two      C        12
Three    D        10
Three    C        11
Three    B        14
Three    B        13
Four     A        13
Four     D        9
Four     B        8
Four     C        9

Open the file <ANOVA - Tire Brand.mtw> and check the different assumptions:
Stability
Normality
Homogeneity of variance (equal variances)

One-way ANOVA Stat > ANOVA > One-way

You wish to compare the mean tread wear for the different brands of tires. H0 is that the mean tread wear is the same for all brands; any variation is caused by the random variation found within each brand. HA is that different brands have different mean tread wear.

Normality and stability One-way ANOVA Residual plots


[Figure: Residual Plots for Tread. Four panels: Normal Probability Plot (Percent vs. Residual), Versus Fits (Residual vs. Fitted Value), Histogram (Frequency vs. Residual), Versus Order (Residual vs. Observation Order).]

Normal?
Stable?

Variance
Test for Equal Variances for Tread

Bartlett's Test: Test Statistic = 1.52, P-Value = 0.677
Levene's Test: Test Statistic = 0.15, P-Value = 0.926

[Figure: 95% Bonferroni Confidence Intervals for StDevs, by Brand.]

Variances are equal?

One-way ANOVA example


1. Open the file <ANOVA - Tire Brand.mtw>

2. Select Stat > ANOVA > One-Way

3. Select Tread for the Response and Brand for the Factor

4. Click on OK


Interpreting the One-way ANOVA Output from the session window


One-way ANOVA: Tread versus Brand (MINITAB)

Source    DF       SS       MS       F        P
Brand      3    30.69    10.23    2.44    0.115
Error     12    50.25     4.19
Total     15    80.94

S = 2.046    R-Sq = 37.92%    R-Sq(adj) = 22.39%

The 1st row "Brand" gives the stats for the variation between the means of the factor levels.
The 2nd row "Error" gives the stats for the variation due to random error.
The 3rd row "Total" gives the stats for the overall variability in the data.

Level    N      Mean    StDev
A        4    14.250    1.893
B        4    12.250    2.872
C        4    11.000    1.414
D        4    10.750    1.708

[Figure: Individual 95% CIs For Mean Based on Pooled StDev, one interval per Brand, axis from 10.0 to 16.0.]

Pooled StDev = 2.046

Interpreting the One-way ANOVA Output from the session window


Based on the One-way ANOVA output for Tread versus Brand:

1. What is your decision?
2. Is the result robust or not, and why?
3. Which Brand is best?

Two-way ANOVA Using a 2nd variable to block Car variation


Assign each tire at random but under the condition that each brand occurs exactly once on each car. Reduces unexplained variability.

Car 1     Car 2     Car 3     Car 4
B (14)    D (11)    A (13)    C (9)
C (12)    C (12)    B (13)    D (9)
A (17)    B (14)    D (11)    B (8)
D (13)    A (14)    C (10)    A (13)

Model: Tread wear = Overall mean + Brand effect + Car effect + error

Difference in tread thickness in mm.

Data of tread wear of tires: each brand occurs exactly once on each car

Car      Brand    Tread
One      B        14
One      C        12
One      A        17
One      D        13
Two      D        11
Two      C        12
Two      B        14
Two      A        14
Three    A        13
Three    B        13
Three    D        11
Three    C        10
Four     C        9
Four     D        9
Four     B        8
Four     A        13

Open the file <ANOVA - Tire Brand Car.mtw> and check the assumptions:
Stability
Normality
Homogeneity of variance (equal variances)

Two-way ANOVA example: Tire brand test


1. Open the file <ANOVA - Tire Brand Car.mtw>

2. Select Stat > ANOVA > Two-Way

3. Select Tread for the Response, Brand for the Row factor and Car for the Column factor. Check Display means.

4. Click on OK

Interpreting the Two-way ANOVA Output from the session window


Two-way ANOVA: Tread versus Brand, Car (MINITAB)

Source    DF         SS         MS        F        P
Brand      3    30.6875    10.2292     7.96    0.007
Car        3    38.6875    12.8958    10.04    0.003
Error      9    11.5625     1.2847
Total     15    80.9375

The p-values are low for Car and Brand, therefore: Brands are not the same, and tread loss for Cars is not the same.

S = 1.133    R-Sq = 85.71%    R-Sq(adj) = 76.19%
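
The two-way table can be reproduced by hand from the blocked data set (each brand once per car). The sketch below is plain Python rather than the Minitab procedure; the helper `factor_ss` and the variable names are assumptions made for illustration.

```python
# (car, brand, tread wear) for the design where each brand occurs once on each car.
data = [
    ("One", "B", 14), ("One", "C", 12), ("One", "A", 17), ("One", "D", 13),
    ("Two", "D", 11), ("Two", "C", 12), ("Two", "B", 14), ("Two", "A", 14),
    ("Three", "A", 13), ("Three", "B", 13), ("Three", "D", 11), ("Three", "C", 10),
    ("Four", "C", 9), ("Four", "D", 9), ("Four", "B", 8), ("Four", "A", 13),
]


def factor_ss(rows, index, grand_mean):
    """Between-level sum of squares and df for the factor stored at `index`."""
    levels = {}
    for row in rows:
        levels.setdefault(row[index], []).append(row[-1])
    ss = sum(len(v) * (sum(v) / len(v) - grand_mean) ** 2 for v in levels.values())
    return ss, len(levels) - 1


values = [row[-1] for row in data]
grand_mean = sum(values) / len(values)
ss_total = sum((y - grand_mean) ** 2 for y in values)

ss_car, df_car = factor_ss(data, 0, grand_mean)
ss_brand, df_brand = factor_ss(data, 1, grand_mean)

# Whatever Brand and Car do not explain is left as error.
ss_error = ss_total - ss_brand - ss_car
df_error = (len(values) - 1) - df_brand - df_car
ms_error = ss_error / df_error

f_brand = (ss_brand / df_brand) / ms_error
f_car = (ss_car / df_car) / ms_error
r_sq = 1 - ss_error / ss_total

print(f"Brand: SS = {ss_brand:.4f}, F = {f_brand:.2f}")
print(f"Car:   SS = {ss_car:.4f}, F = {f_car:.2f}")
print(f"Error: SS = {ss_error:.4f}, df = {df_error}")
print(f"R-Sq = {r_sq:.2%}")
```

Blocking on Car moves 38.6875 out of the error term of the one-way analysis (50.25), which is why the Brand effect becomes significant here.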

Let's look at the residual plots ...

Two-way ANOVA Residual plots


[Figure: Residual Plots for Tread. Four panels: Normal Probability Plot (Percent vs. Residual), Versus Fits (Residual vs. Fitted Value), Histogram (Frequency vs. Residual), Versus Order (Residual vs. Observation Order).]

The residual plots show no unusual observations. The histogram is not bell shaped, but with only 16 observations it is hard to interpret.

Interpreting the Two-way ANOVA Output from the session window


Two-way ANOVA: Tread versus Brand, Car (MINITAB)

Brand    Mean
A        14.25
B        12.25
C        10.75
D        11.00
[Figure: Individual 95% CIs For Mean Based on Pooled StDev, one interval per Brand, axis from 9.6 to 14.4.]

Car      Mean
Four     9.75
One      14.00
Three    11.75
Two      12.75
[Figure: Individual 95% CIs For Mean Based on Pooled StDev, one interval per Car, axis from 10.0 to 16.0.]

The confidence intervals show: Brands are not the same, and tread loss for Cars is not the same.

1. What is your decision?
2. Is the result robust or not, and why?
3. Which factor is significant?

Let's look at the data graphically

Graph > Chart > Values from a table

Graph > Chart > Values from a table > Data View

Displaying the Two-way ANOVA design


[Figure: Chart of Tread. Bar chart of Tread (0 to 18) for Brands B, C, A, D, grouped by Car One, Two, Three, Four.]

All 4 Brands performed better in Car One. This is an assignable difference due to Car. It also appears that Brand A performs better in each Car than the other Brands.

Displaying the Two-way ANOVA design


[Figure: Chart of Tread. Bar chart of Tread (0 to 18) for Cars One, Two, Three, Four, grouped by Brand B, C, A, D.]

Here we are trying to discover which Brand of Tires had the best Tread Wear characteristics. We included a blocking variable to explain some of the variability. Based on a comparison of the bar chart and the ANOVA table, which Brand should be selected?

Three-way ANOVA Using a Latin Square design


Each brand appears only once in each position and only once on each car (2 restrictions on randomisation). Minimises variability.

Position    Car 1     Car 2     Car 3     Car 4
I           C (12)    D (11)    A (13)    B (8)
II          B (14)    C (12)    D (11)    A (13)
III         A (17)    B (14)    C (10)    D (9)
IV          D (13)    A (14)    B (13)    C (9)

Model: Tread wear = Overall mean + Brand effect + Car effect + Position effect + error
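
The Latin square model can be sketched in plain Python by partitioning the total sum of squares into Car, Position, Brand and error. The sketch assumes that in the square above the numbered groups 1 to 4 are the cars and the entries within each group are positions I to IV; the helper `factor_ss` and the variable names are hypothetical.

```python
# Rows are positions I-IV, columns are cars 1-4; each cell is (brand, tread wear).
square = [
    [("C", 12), ("D", 11), ("A", 13), ("B", 8)],   # position I
    [("B", 14), ("C", 12), ("D", 11), ("A", 13)],  # position II
    [("A", 17), ("B", 14), ("C", 10), ("D", 9)],   # position III
    [("D", 13), ("A", 14), ("B", 13), ("C", 9)],   # position IV
]

# Flatten to records of (position, car, brand, tread).
records = [
    (position, car, brand, tread)
    for position, row in enumerate(square)
    for car, (brand, tread) in enumerate(row)
]


def factor_ss(rows, index, grand_mean):
    """Between-level sum of squares and df for the factor stored at `index`."""
    levels = {}
    for row in rows:
        levels.setdefault(row[index], []).append(row[-1])
    ss = sum(len(v) * (sum(v) / len(v) - grand_mean) ** 2 for v in levels.values())
    return ss, len(levels) - 1


values = [r[-1] for r in records]
grand_mean = sum(values) / len(values)
ss_total = sum((y - grand_mean) ** 2 for y in values)

ss_position, df_position = factor_ss(records, 0, grand_mean)
ss_car, df_car = factor_ss(records, 1, grand_mean)
ss_brand, df_brand = factor_ss(records, 2, grand_mean)

ss_error = ss_total - ss_position - ss_car - ss_brand
df_error = (len(values) - 1) - df_position - df_car - df_brand
ms_error = ss_error / df_error

f_ratios = {
    "Car": (ss_car / df_car) / ms_error,
    "Position": (ss_position / df_position) / ms_error,
    "Brand": (ss_brand / df_brand) / ms_error,
}
for source, f_ratio in f_ratios.items():
    print(f"{source}: F = {f_ratio:.2f}")
print(f"Error: SS = {ss_error:.4f}, df = {df_error}")
```

Under this reading of the square, the sums of squares and F-ratios should match the General Linear Model output shown later for Tread versus Car, Position, Brand.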

Difference in tread thickness in mm.

Data of tread wear of tires


Position Left Front Right Front One Left Back Right Back Left Front Right Front Two Left Back Right Back Left Front Right Front Three Left Back Right Back Left Front Right Front Four Left Back Right Back

Each brand appears once in each position and each car


Car Brand B C A D D C B A A B D C C D B A

Tread 14 12 17 13 11 12 14 14 13 13 11 10 9 9 8 13

Open the file <ANOVA - Tire Brand Car Position.mtw> and check the assumptions:
Stability
Normality
Homogeneity of variance (equal variances)

Three-way ANOVA Stat > ANOVA > General Linear Model

Fill out the dialog box as shown. Click OK.

Interpreting the General Linear Model Output from the session window
General Linear Model: Tread versus Car, Position, Brand (MINITAB)

Factor      Type     Levels    Values
Car         fixed    4         Four, One, Three, Two
Position    fixed    4         Left Back, Left Front, Right Back, Right Front
Brand       fixed    4         A, B, C, D

The 1st half of the output lists the values for each level of each factor. The 2nd half is the ANOVA table.

Analysis of Variance for Tread, using Adjusted SS for Tests

Source      DF     Seq SS     Adj SS     Adj MS        F        P
Car          3    38.6875    38.6875    12.8958    14.40    0.004
Position     3     6.1875     6.1875     2.0625     2.30    0.177
Brand        3    30.6875    30.6875    10.2292    11.42    0.007
Error        6     5.3750     5.3750     0.8958
Total       15    80.9375

S = 0.946485    R-Sq = 93.36%    R-Sq(adj) = 83.40%

Two factors are statistically significant at the α = 0.05 level: Car and Brand. The factor Position doesn't appear to have a significant effect. The residual plots will confirm whether the basic assumptions about the error have been met. Let's look at the residual plots ...


General Linear Model Residual plots


[Figure: Residual Plots for Tread. Four panels: Normal Probability Plot (Percent vs. Residual), Versus Fits (Residual vs. Fitted Value), Histogram (Frequency vs. Residual), Versus Order (Residual vs. Observation Order).]

Review the residual plots and state your conclusions about the assumptions regarding the error, i.e. that the errors for each treatment level are independent and normally distributed with a mean of 0 and a constant variance.

ANOVA example with GLM


How to include the interaction in the model?
1. Select Stat > ANOVA > General Linear Model

2. Select Tread for Response, and Car and Brand for Model. For the interaction, add the term Car*Brand.

3. Click on OK

GLM with an unbalanced and nested design


Four chemical companies produce insecticides that can be used to kill mosquitoes, but the composition of the insecticides differs from company to company. An experiment is conducted to test the efficacy of the insecticides by placing 400 mosquitoes inside a glass container treated with a single insecticide and counting the live mosquitoes 4 hours later. Three replications are performed for each product. The goal is to compare the product effectiveness of the different companies. The factors are fixed because you are interested in comparing the particular brands. The factors are nested because each insecticide for each company is unique. You use GLM to analyse your data because the design is unbalanced:

Company A: 3 types of product
Company B: 2 types of product
Company C: 2 types of product
Company D: 4 types of product

GLM with an unbalanced and nested design


1. Select Stat > ANOVA > General Linear Model

2. Select NMosquito for Response, and Company and Product for Model. For the nested design, add (Company) after Product, i.e. Product(Company).

3. Click on OK

GLM with an unbalanced and nested design


ANOVA table in session window

1. What is your decision?
2. Which parameter is significant?

Multi-way ANOVA
Two-way, Balanced, General Linear Model.
Two-way ANOVA may also be used to analyse a design where there are two controllable factors, both of which are of interest. More than two factors can be analysed using Balanced ANOVA or General Linear Model.
There may be more than one factor that has an effect on the response variable. This commonly occurs in manufacturing processes. It is often wise to include more than one factor in the analysis:
Valuable resources can be used more efficiently by investigating several factors at one time.
More error can be explained by including additional factors in the model.
By including more factors, interactions can be studied.

What about the other ANOVA options? When are they appropriate?
One-way ANOVA: Studies the effect of one factor at various levels on a response variable.
Two-way ANOVA: Studies the effect of two factors and their interaction at various levels on a response variable.
Balanced ANOVA: Studies the impact of 2 or more factors and their interactions at various levels on a response variable. The levels of factors are structured such that there are an equal number of levels and observations within each level for each factor.
General Linear Model: Studies the impact of 2 or more factors and their interactions at various levels on a response variable. The number of levels and observations may vary. The factors may be a mixture of nested and crossed relationships. The user must specify the factors, interactions and nested/crossed relationships of interest.
Fully Nested ANOVA: Studies the impact of 2 or more factors. The factors are arranged in a hierarchical structure such that one factor is nested within (or unique to) the factor above it. No interactions are obtained.

Partitioning of sums of squares


Total SS = SS Between Brands + SS Within Brands
SS Within Brands = SS Between Cars + SS Within Cars
SS Within Cars = SS Between Positions + SS Within (Error)

Summary ANOVA
One Way ANOVA
To analyse the difference between means from 2 or more samples

Balanced ANOVA
To compare the means of populations that are classified in two or more ways (two or more factors)

General Linear Model


Similar to above


Last words
We reviewed:
Graphical methods for analysing differences between means obtained from 2 or more samples.
Analysis of Variance (ANOVA) methods for analysing the differences between means.
Methods for determining whether or not significant differences in variance exist between two or more samples.


Appendix


Post Hoc tests


Definition: Post hoc tests are additional hypothesis tests that are done after an ANOVA to determine exactly which mean differences are significant and which are not. These tests are done when:

You reject H0 and there are three or more treatments. Rejecting H0 indicates that at least one difference exists among the treatments. With k = 3 or more, the problem is to find where the differences are. Note that when you have two treatments, rejecting H0 indicates that the two means are not equal. In this case there is no question about which means are different, and there is no need to do Post Hoc tests.
The first test we consider is Tukey's HSD test. Tukey's test allows you to compute a single value that determines the minimum difference between treatment means that is necessary for significance.

Post Hoc tests


This value, called the Honestly Significant Difference (HSD), is then used to compare any two treatments (Xs). If the mean difference exceeds Tukey's HSD, you conclude that there is a significant difference between the treatments. The formula is:

$$\mathrm{HSD} = q \sqrt{\frac{MS_{\text{within}}}{n}}$$

n: number of observations in each treatment

The value of q is found in the table (next slide). To locate the appropriate value of q, you must know the number of treatments in the overall experiment (k) and the degrees of freedom for the Error, and select the Alpha-risk (0.05). The q value used in this test is called a Studentised range statistic. Tukey's test requires that the sample size be the same for all treatments.

Post Hoc tests


Tukey's HSD test example


Example of seal life by shift
        Shift 1    Shift 2    Shift 3
        25.40      23.40      20.00
        26.31      21.80      22.20
        24.10      23.50      19.75
        23.74      22.75      20.60
        25.10      21.60      20.40
Mean    24.93      22.61      20.59

ANOVA result: P-value is low, the difference is significant between the shifts.
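
The ANOVA result for the seal life data can be checked with a short calculation. The sketch below uses only the Python standard library and computes the F ratio from the between-shift and within-shift sums of squares. The function name `one_way_anova` is made up for this illustration, and the critical value 3.89 is an assumption taken from a standard F table for alpha = 0.05 with 2 and 12 degrees of freedom; it is not from the slides.

```python
from statistics import mean

# Seal life by shift, taken from the example table.
shifts = {
    "Shift 1": [25.40, 26.31, 24.10, 23.74, 25.10],
    "Shift 2": [23.40, 21.80, 23.50, 22.75, 21.60],
    "Shift 3": [20.00, 22.20, 19.75, 20.60, 20.40],
}


def one_way_anova(groups):
    """Return (ss_between, ss_within, df_between, df_within, f_ratio)."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    # Between-group variation: group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group variation: observations around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return ss_between, ss_within, df_between, df_within, f_ratio


ss_b, ss_w, df_b, df_w, f_ratio = one_way_anova(list(shifts.values()))

# Assumption: tabulated critical value F(0.05; 2, 12), not given in the slides.
F_CRITICAL = 3.89

print(f"SS between = {ss_b:.3f} (df = {df_b})")
print(f"SS within  = {ss_w:.3f} (df = {df_w})")
print(f"MS within  = {ss_w / df_w:.3f}")
print(f"F ratio    = {f_ratio:.2f}")
print("Significant at alpha = 0.05:", f_ratio > F_CRITICAL)
```

The F ratio comes out at roughly 25.6, far above the critical value, which agrees with the low P-value reported above. The MS within printed here is the quantity that Tukey's HSD formula needs.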

Now the question is: Which mean differences are significant and which are not?


Tukey's HSD test example


Tukey's HSD calculation step 1: Determine the q value. In this example k = 3 and df for Error = 12. From the table, q = 3.77 with Alpha-risk = 0.05.

Tukey's HSD calculation step 2: Determine the HSD value, with MS within = 0.921 and n = 5:

$$\mathrm{HSD} = 3.77 \sqrt{\frac{0.921}{5}} = 1.618$$

Tukey's HSD calculation step 3: The mean difference between any two samples must be at least 1.618 to be significant. Using this value, we can make the following conclusions:
Shift 1 is significantly different from Shift 2 (Mean S1 - Mean S2 = 2.32)
Shift 1 is significantly different from Shift 3 (Mean S1 - Mean S3 = 4.34)
Shift 2 is significantly different from Shift 3 (Mean S2 - Mean S3 = 2.02)
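
The three steps can be reproduced in a few lines of Python. This is a minimal sketch using only the standard library: the q value of 3.77 is taken from the slide rather than computed, and the helper names `ms_within` and `tukey_hsd` are made up for this illustration.

```python
from itertools import combinations
from math import sqrt
from statistics import mean

# Seal life by shift, from the example table (n = 5 per shift).
shifts = {
    "Shift 1": [25.40, 26.31, 24.10, 23.74, 25.10],
    "Shift 2": [23.40, 21.80, 23.50, 22.75, 21.60],
    "Shift 3": [20.00, 22.20, 19.75, 20.60, 20.40],
}

# Step 1: q from the Studentised range table (k = 3, df error = 12, alpha = 0.05).
Q_VALUE = 3.77


def ms_within(groups):
    """Mean square within: pooled within-group SS divided by error df."""
    ss = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_error = sum(len(g) for g in groups) - len(groups)
    return ss / df_error


def tukey_hsd(q, ms_w, n):
    """HSD = q * sqrt(MS within / n); assumes equal sample size n."""
    return q * sqrt(ms_w / n)


groups = list(shifts.values())
n = len(groups[0])

# Step 2: compute the HSD value.
hsd = tukey_hsd(Q_VALUE, ms_within(groups), n)
print(f"MS within = {ms_within(groups):.3f}, HSD = {hsd:.3f}")

# Step 3: compare every pair of shift means against the HSD.
results = {}
for (name_a, data_a), (name_b, data_b) in combinations(shifts.items(), 2):
    diff = abs(mean(data_a) - mean(data_b))
    results[(name_a, name_b)] = (diff, diff > hsd)
    verdict = "significant" if diff > hsd else "not significant"
    print(f"{name_a} vs {name_b}: difference = {diff:.2f} -> {verdict}")
```

With the slide data this gives MS within of about 0.921 and an HSD of about 1.618, and all three pairwise differences exceed the HSD.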


Summary
ANOVA is used as a hypothesis test, and we also use it for components of variation studies.
The X is attribute and the Y is variable: a very common type of data set.
ANOVA introduced us to 3 preliminary tests before concluding to accept or reject the null hypothesis:
Stability
Normality
Homogeneity of variance

All hypothesis tests require these or similar tests of assumptions.
Use the appropriate design before calculating the ANOVA.
Use Tukey's HSD test to adjust the conclusion if needed.
