
CVEN2002/2702

Week 12

This lecture

11.1 Introduction

11.2 One-way Analysis of Variance

11.3 Multiple pairwise comparisons

11.4 Adequacy of the ANOVA model

11.5 Blocking factor

CVEN2002/2702 (Statistics), Dr Justin Wishart

11. ANOVA


11.1 Introduction

Introduction

In Chapter 10, we introduced testing procedures for comparing the means of two different populations, having observed two random samples drawn from those populations (two-sample z- and t-tests).

However, in applications, it is common that we want to detect a difference in a set of more than two populations.

Imagine the following context: four groups of students were subjected to different teaching techniques and tested at the end of a specified period of time. Do the data shown in the table below present sufficient evidence to indicate a difference in mean achievement for the four teaching techniques?

Tech. 1: 65  87  73  79  81  69
Tech. 2: 75  69  83  81  72  79  90
Tech. 3: 59  78  67  62  83  76
Tech. 4: 94  89  80  88

11.1 Introduction

Introduction: randomisation

To answer this question, we should first note that the method used to divide the students into the 4 groups is of vital importance.

For instance, basic visual inspection of the data suggests that the members of group 4 scored higher than those in the other groups. Can we conclude from this that teaching technique 4 is superior? Perhaps the students in group 4 are simply better learners.

⇒ it is essential that we divide the students into the 4 groups in such a way that it is very unlikely that one of the groups is inherently superior to the others (regardless of the teaching technique it will be subjected to)

⇒ the only reliable way of doing this is to divide the students in a completely random fashion, to balance out the effect of any nuisance variable that may influence the variable of interest

This kind of consideration is part of a very important area of statistical modelling called experimental design, which is not addressed in this course (Chapter 10 in the textbook). In this course, we will always assume that the division of the individuals into the groups was indeed done at random.

11.1 Introduction

Introduction

Tech. 1: 65  87  73  79  81  69          ($\bar{x}_1$ = 75.67, $s_1$ = 8.17)
Tech. 2: 75  69  83  81  72  79  90      ($\bar{x}_2$ = 78.43, $s_2$ = 7.11)
Tech. 3: 59  78  67  62  83  76          ($\bar{x}_3$ = 70.83, $s_3$ = 9.58)
Tech. 4: 94  89  80  88                  ($\bar{x}_4$ = 87.75, $s_4$ = 5.80)

[Figure: side-by-side boxplots of the scores for the four teaching techniques, on a common scale from 60 to 95]

A graphical display of the samples is always useful: it shows both the variability within the groups and the variability between the groups

⇒ comparing the between-group with the within-group variability is the key to detecting any significant difference between the groups

Two artificial data sets illustrate the idea. In the first one, the observations within each group lie very close to their group mean, so that virtually all of the variability is between the groups:

Group 1: 5.90  5.92  5.91  5.89  5.88
Group 2: 5.51  5.50  5.50  5.49  5.50
Group 3: 5.01  5.00  4.99  4.98  5.02

(ratio of between-group to within-group variability = 5545)

In the second data set, the group means are exactly the same (5.90, 5.50 and 5.00), but the observations within each group are now highly variable, so the within-group variability swamps the between-group variability:

Group 1: 5.90  4.42  7.51  7.89  3.78
Group 2: 6.31  3.54  4.73  7.20  5.72
Group 3: 4.52  6.93  4.48  5.55  3.52

(ratio of between-group to within-group variability = 0.436)

⇒ the evidence for a difference between the group means is much weaker in the second case, even though the group means themselves are identical in both data sets

Analysis of Variance

Comparing (intelligently) the between-group variability and the within-group variability is the purpose of the Analysis of Variance

⇒ often shortened to the acronym ANOVA

Suppose that we have k different groups (k populations, or k sub-populations of a population) that we wish to compare. Often, each group is called a treatment or treatment level (general terms that can be traced back to the early applications of this methodology in the agricultural sciences).

The response for each of the k treatments is the random variable of interest, say X. Denote $X_{ij}$ the jth observation ($j = 1, \ldots, n_i$) taken under treatment i

⇒ we have k independent samples (one sample from each of the treatments)

ANOVA samples

The k random samples are often presented as:

Treatment 1: $X_{11}, X_{12}, \ldots, X_{1n_1}$   (mean $\bar{X}_1$, st. dev. $S_1$)
Treatment 2: $X_{21}, X_{22}, \ldots, X_{2n_2}$   (mean $\bar{X}_2$, st. dev. $S_2$)
...
Treatment k: $X_{k1}, X_{k2}, \ldots, X_{kn_k}$   (mean $\bar{X}_k$, st. dev. $S_k$)

where $\bar{X}_i$ and $S_i$ are the sample mean and standard deviation of the ith sample. The total number of observations is
$$ n = n_1 + n_2 + \ldots + n_k $$
and the grand mean of all the observations, usually denoted $\bar{\bar{X}}$, is
$$ \bar{\bar{X}} = \frac{1}{n}\sum_{i=1}^{k}\sum_{j=1}^{n_i} X_{ij} = \frac{n_1 \bar{X}_1 + n_2 \bar{X}_2 + \ldots + n_k \bar{X}_k}{n} $$
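To make this notation concrete, here is a small sketch (not from the slides; the variable names are mine) that computes each group's sample mean and standard deviation and the grand mean for the teaching-technique data, using NumPy.

```python
import numpy as np

# Teaching-technique scores from the introductory example
groups = {
    "Tech. 1": np.array([65, 87, 73, 79, 81, 69]),
    "Tech. 2": np.array([75, 69, 83, 81, 72, 79, 90]),
    "Tech. 3": np.array([59, 78, 67, 62, 83, 76]),
    "Tech. 4": np.array([94, 89, 80, 88]),
}

n = sum(len(x) for x in groups.values())                  # total number of observations
grand_mean = sum(x.sum() for x in groups.values()) / n

for name, x in groups.items():
    # ddof=1 gives the sample standard deviation S_i (divisor n_i - 1)
    print(f"{name}: n_i = {len(x)}, mean = {x.mean():.2f}, sd = {x.std(ddof=1):.2f}")

print(f"n = {n}, grand mean = {grand_mean:.2f}")          # should be close to 77.35
```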

ANOVA model

The ANOVA model is the following:
$$ X_{ij} = \mu_i + \epsilon_{ij} $$
where $\mu_i$ is the mean response for the ith treatment ($i = 1, 2, \ldots, k$) and $\epsilon_{ij}$ is an individual random error component ($j = 1, 2, \ldots, n_i$).

As usual for errors, we will assume that the random variables $\epsilon_{ij}$ are normally and independently distributed with mean 0 and variance $\sigma^2$:
$$ \epsilon_{ij} \overset{\text{i.i.d.}}{\sim} \mathcal{N}(0, \sigma^2) \qquad \text{for all } i, j $$
Equivalently, the observations $X_{ij}$ are independent and normally distributed with mean $\mu_i$ and variance $\sigma^2$:
$$ X_{ij} \overset{\text{ind.}}{\sim} \mathcal{N}(\mu_i, \sigma^2) $$

Important: the variance $\sigma^2$ is common to all treatments

ANOVA hypotheses

We are interested in detecting differences between the treatment means $\mu_i$, which are population parameters

⇒ hypothesis test!

The null hypothesis to be tested is
$$ H_0: \mu_1 = \mu_2 = \ldots = \mu_k $$
versus the general alternative
$$ H_a: \text{not all the means are equal} $$

Careful! The alternative hypothesis is that at least two of the means differ, not that they are all different!

As pointed out previously, the primary tool when testing for equality of the means is a comparison of the variances within the groups and between the groups

Variability decomposition

The ANOVA partitions the total variability in the sample data, described by the total sum of squares
$$ SS_{Tot} = \sum_{i=1}^{k}\sum_{j=1}^{n_i} (X_{ij} - \bar{\bar{X}})^2 \qquad (df = n - 1), $$
into a Treatment sum of squares
$$ SS_{Tr} = \sum_{i=1}^{k} n_i (\bar{X}_i - \bar{\bar{X}})^2 \qquad (df = k - 1) $$
and an Error sum of squares
$$ SS_{Er} = \sum_{i=1}^{k}\sum_{j=1}^{n_i} (X_{ij} - \bar{X}_i)^2 \qquad (df = n - k), $$
such that
$$ SS_{Tot} = SS_{Tr} + SS_{Er} \qquad \text{(Proof: exercise)} $$
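The decomposition can be checked numerically. The helper below is my own sketch, not part of the course material: it computes the three sums of squares for any list of samples and verifies the identity on the teaching-technique data.

```python
import numpy as np

def anova_sums_of_squares(samples):
    """Return (ss_tot, ss_tr, ss_er) for a list of 1-D samples."""
    samples = [np.asarray(x, dtype=float) for x in samples]
    all_obs = np.concatenate(samples)
    grand_mean = all_obs.mean()
    ss_tot = ((all_obs - grand_mean) ** 2).sum()
    ss_tr = sum(len(x) * (x.mean() - grand_mean) ** 2 for x in samples)
    ss_er = sum(((x - x.mean()) ** 2).sum() for x in samples)
    return ss_tot, ss_tr, ss_er

# Teaching-technique data again
samples = [
    [65, 87, 73, 79, 81, 69],
    [75, 69, 83, 81, 72, 79, 90],
    [59, 78, 67, 62, 83, 76],
    [94, 89, 80, 88],
]
ss_tot, ss_tr, ss_er = anova_sums_of_squares(samples)
print(ss_tot, ss_tr, ss_er)                 # roughly 1909.2, 712.6, 1196.6
print(np.isclose(ss_tot, ss_tr + ss_er))    # the identity SS_Tot = SS_Tr + SS_Er
```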

Variability decomposition

The total sum of squares $SS_{Tot}$ quantifies the total amount of variation contained in the global sample.

The Treatment sum of squares $SS_{Tr}$ quantifies the variation between the groups, that is, the variation between the means of the groups (giving more weight to groups with more observations).

The Error sum of squares $SS_{Er}$ quantifies the variation within the groups.

[Figure: the global sample, the treatment sample (group means) and the error samples (observations centred on their group means) for the teaching-technique data, plotted on a common scale from 60 to 95]

In sample i, the sample variance is given by $S_i^2 = \frac{1}{n_i - 1}\sum_{j=1}^{n_i}(X_{ij} - \bar{X}_i)^2$.

Since
$$ SS_{Er} = \sum_{i=1}^{k}\sum_{j=1}^{n_i} (X_{ij} - \bar{X}_i)^2 = \sum_{i=1}^{k} (n_i - 1)\, S_i^2, $$
we have
$$ E(SS_{Er}) = \sum_{i=1}^{k} (n_i - 1)\, E(S_i^2) = \sigma^2 \sum_{i=1}^{k} (n_i - 1) = (n - k)\,\sigma^2 $$

⇒ the Error Mean Square
$$ MS_{Er} = \frac{SS_{Er}}{n - k} $$
is an unbiased estimator of $\sigma^2$

⇒ the number of degrees of freedom for this error estimator of $\sigma^2$ is $n - k$

Now if $H_0$ is true, that is, if $\mu_1 = \mu_2 = \ldots = \mu_k = \mu$, we have
$$ \bar{X}_i \sim \mathcal{N}\!\left(\mu, \frac{\sigma^2}{n_i}\right), \quad \text{that is,} \quad \sqrt{n_i}\,(\bar{X}_i - \mu) \sim \mathcal{N}(0, \sigma^2), \qquad \text{for all } i = 1, \ldots, k $$

⇒ $\sqrt{n_1}(\bar{X}_1 - \mu),\ \sqrt{n_2}(\bar{X}_2 - \mu),\ \ldots,\ \sqrt{n_k}(\bar{X}_k - \mu)$ is a random sample whose sample variance
$$ \frac{1}{k-1}\sum_{i=1}^{k} n_i (\bar{X}_i - \bar{\bar{X}})^2 = \frac{SS_{Tr}}{k-1} $$
estimates $\sigma^2$

⇒ the Treatment Mean Square $MS_{Tr}$, defined by
$$ MS_{Tr} = \frac{SS_{Tr}}{k - 1}, $$
is another estimator of $\sigma^2$ (when $H_0$ is true)

⇒ the number of degrees of freedom for this treatment estimator of $\sigma^2$ is $k - 1$

ANOVA test

Thus we have two potential estimators of $\sigma^2$:
1. $MS_{Er}$, which always estimates $\sigma^2$
2. $MS_{Tr}$, which estimates $\sigma^2$ only when $H_0$ is true

Actually, if $H_0$ is not true, $MS_{Tr}$ tends to exceed $\sigma^2$, as we have
$$ E(MS_{Tr}) = \sigma^2 + \text{true variance between the groups} $$

⇒ the idea of the ANOVA test now takes shape.

Suppose we have observed k samples $x_{i1}, x_{i2}, \ldots, x_{in_i}$, for $i = 1, 2, \ldots, k$, from which we can compute the observed values $ms_{Tr}$ and $ms_{Er}$. Then:
if $ms_{Tr} \simeq ms_{Er}$, then $H_0$ is probably reasonable;
if $ms_{Tr} \gg ms_{Er}$, then $H_0$ should be rejected

⇒ this will thus be a one-sided hypothesis test.

We need to determine what "$ms_{Tr} \gg ms_{Er}$" means so as to obtain a hypothesis test at a given significance level

Sampling distribution

It can be shown that, if $H_0$ is true, the ratio
$$ F = \frac{MS_{Tr}}{MS_{Er}} = \frac{SS_{Tr}/(k-1)}{SS_{Er}/(n-k)} $$
follows Fisher's F-distribution with $k - 1$ and $n - k$ degrees of freedom, which is usually denoted by
$$ F \sim F_{k-1,\,n-k} $$

Note: Ronald A. Fisher (1890-1962) was an English statistician and biologist. Some say that he almost single-handedly created the foundations of modern statistical science; as a biologist, he has also been described as the greatest biologist since Charles Darwin.

A random variable, say X, is said to follow Fisher's F-distribution with $d_1$ and $d_2$ degrees of freedom, i.e.
$$ X \sim F_{d_1, d_2}, $$
if its probability density function is given by
$$ f(x) = \frac{\Gamma\!\left(\frac{d_1+d_2}{2}\right)\,(d_1/d_2)^{d_1/2}\, x^{d_1/2 - 1}}{\Gamma\!\left(\frac{d_1}{2}\right)\,\Gamma\!\left(\frac{d_2}{2}\right)\,\left(1 + (d_1/d_2)\,x\right)^{(d_1+d_2)/2}} \qquad \text{for } x > 0 $$

⇒ $S_X = [0, +\infty)$

Note: the Gamma function is given by
$$ \Gamma(y) = \int_0^{+\infty} x^{y-1} e^{-x}\, dx, \qquad \text{for } y > 0; $$
in particular, for any integer $n \geq 1$, $\Gamma(n) = (n-1)!$

There is usually no simple expression for the F-cdf
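As a sanity check on this density (a sketch of my own, not part of the slides), the following compares the Gamma-function formula above with scipy.stats.f.pdf for a few (d1, d2) pairs.

```python
import numpy as np
from scipy.stats import f
from scipy.special import gamma

def f_pdf(x, d1, d2):
    """F-density written directly from the Gamma-function formula above."""
    const = gamma((d1 + d2) / 2) / (gamma(d1 / 2) * gamma(d2 / 2)) * (d1 / d2) ** (d1 / 2)
    return const * x ** (d1 / 2 - 1) / (1 + (d1 / d2) * x) ** ((d1 + d2) / 2)

x = np.linspace(0.1, 5, 50)
for d1, d2 in [(3, 10), (4, 4), (100, 6)]:
    assert np.allclose(f_pdf(x, d1, d2), f.pdf(x, d1, d2))  # matches scipy's implementation
print("formula agrees with scipy.stats.f.pdf")
```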

Some F-distributions

[Figure: probability density functions f(x) and cumulative distribution functions F(x) of the F-distribution for (d1, d2) = (3, 10), (4, 4), (100, 6) and (4, 100)]

It can be shown that the mean and the variance of the F-distribution with $d_1$ and $d_2$ degrees of freedom are
$$ E(X) = \frac{d_2}{d_2 - 2} \quad \text{for } d_2 > 2 \qquad \text{and} \qquad \text{Var}(X) = \frac{2\, d_2^2\, (d_1 + d_2 - 2)}{d_1 (d_2 - 2)^2 (d_2 - 4)} \quad \text{for } d_2 > 4 $$

An F-distributed random variable only takes positive values (it is a ratio of two positive random quantities), and the distribution is highly skewed to the right.

Similarly to what we did for other distributions, we can define the quantiles of any F-distribution: $f_{d_1,d_2;\alpha}$ is the value such that
$$ P(X > f_{d_1,d_2;\alpha}) = 1 - \alpha, \qquad X \sim F_{d_1,d_2} $$

[Figure: F-density with the quantile $f_{d_1,d_2;\alpha}$ marked on the x-axis and the probability $1-\alpha$ shaded to its right]

It can also be shown that
$$ f_{d_1,d_2;\alpha} = \frac{1}{f_{d_2,d_1;1-\alpha}} $$

For any $d_1$ and $d_2$, the main quantiles of interest may be found in the F-distribution critical values tables
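A quick numerical check of this reciprocal relationship with scipy (my own sketch; `ppf` is scipy's quantile function):

```python
from scipy.stats import f

d1, d2, alpha = 3, 19, 0.05
q_low = f.ppf(alpha, d1, d2)                 # f_{d1,d2;alpha}
q_high_swapped = f.ppf(1 - alpha, d2, d1)    # f_{d2,d1;1-alpha}
print(q_low, 1 / q_high_swapped)             # the two numbers should coincide
```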

ANOVA test

The null hypothesis to test is $H_0: \mu_1 = \mu_2 = \ldots = \mu_k$ versus the general alternative $H_a$: not all the means are equal.

Evidence against $H_0$ is shown if $MS_{Tr} \gg MS_{Er}$, so we will reject $H_0$ whenever $MS_{Tr}$ is much larger than $MS_{Er}$, i.e.
$$ \frac{MS_{Tr}}{MS_{Er}} > c, $$
where the critical value c is fixed by the significance level:
$$ \alpha = P\!\left(\frac{MS_{Tr}}{MS_{Er}} > c \ \middle|\ H_0 \text{ is true}\right) $$

We know that, if $H_0$ is true, $F = \frac{MS_{Tr}}{MS_{Er}} \sim F_{k-1,\,n-k}$, so $c = f_{k-1,\,n-k;\,1-\alpha}$.

From the observed values $ms_{Tr}$ and $ms_{Er}$, the decision rule is:
$$ \text{reject } H_0 \text{ if } \frac{ms_{Tr}}{ms_{Er}} > f_{k-1,\,n-k;\,1-\alpha} $$

The observed value of the test statistic is
$$ f_0 = \frac{ms_{Tr}}{ms_{Er}}, $$
and the associated p-value is
$$ p = P(X > f_0), \qquad \text{where } X \sim F_{k-1,\,n-k} $$
(the probability that the test statistic will take on a value that is at least as extreme as the observed value when $H_0$ is true; see the definition on Slide 21, Week 9)

[Figure: $F_{k-1,\,n-k}$ density with the observed value $f_0$ marked and the p-value p shaded in the upper tail]

⇒ from the F-distribution table, only bounds can be found for this p-value (use software to get an exact value)

ANOVA table

The computations for this test are usually summarised in tabular form:

Source     | degrees of freedom | sum of squares | mean square              | F-statistic
Treatment  | df_Tr = k - 1      | ss_Tr          | ms_Tr = ss_Tr / (k - 1)  | f_0 = ms_Tr / ms_Er
Error      | df_Er = n - k      | ss_Er          | ms_Er = ss_Er / (n - k)  |
Total      | df_Tot = n - 1     | ss_Tot         |                          |

Note: this table is the usual computer output when an ANOVA procedure is run

ANOVA: example

Example
Consider the data shown on Slide 5. Test at significance level $\alpha = 0.05$ the null hypothesis that there is no difference in mean achievement for the four teaching techniques.

We have k = 4, $n_1 = 6$, $n_2 = 7$, $n_3 = 6$ and $n_4 = 4$, with $\bar{x}_1 = 75.67$, $\bar{x}_2 = 78.43$, $\bar{x}_3 = 70.83$, $\bar{x}_4 = 87.75$ and $s_1 = 8.17$, $s_2 = 7.11$, $s_3 = 9.58$, $s_4 = 5.80$. Besides,
$$ n = 6 + 7 + 6 + 4 = 23 \qquad \text{and} \qquad \bar{\bar{x}} = \frac{1}{n}\sum_{i=1}^{4} n_i \bar{x}_i = 77.35 $$

Then
$$ ss_{Er} = 5 \times 8.17^2 + 6 \times 7.11^2 + 5 \times 9.58^2 + 3 \times 5.80^2 = 1196.63 $$
and
$$ ss_{Tr} = 6\,(75.67 - 77.35)^2 + 7\,(78.43 - 77.35)^2 + 6\,(70.83 - 77.35)^2 + 4\,(87.75 - 77.35)^2 = 712.59 $$
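The same arithmetic, done from the summary statistics exactly as on this slide (a sketch of my own; results should match the quoted values up to rounding):

```python
import numpy as np

n_i = np.array([6, 7, 6, 4])
xbar_i = np.array([75.67, 78.43, 70.83, 87.75])
s_i = np.array([8.17, 7.11, 9.58, 5.80])

n = n_i.sum()
grand = (n_i * xbar_i).sum() / n                   # weighted grand mean, about 77.35
ss_er = ((n_i - 1) * s_i**2).sum()                 # about 1196.6
ss_tr = (n_i * (xbar_i - grand)**2).sum()          # about 712.6
ms_er, ms_tr = ss_er / (n - len(n_i)), ss_tr / (len(n_i) - 1)
print(grand, ss_er, ss_tr, ms_tr / ms_er)          # F-statistic close to 3.77
```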

ANOVA: example

From there, the ANOVA table can be easily completed:

Source     | degrees of freedom | sum of squares    | mean square      | F-statistic
Treatment  | df_Tr = 3          | ss_Tr = 712.59    | ms_Tr = 237.53   | f_0 = 3.77
Error      | df_Er = 19         | ss_Er = 1196.63   | ms_Er = 62.98    |
Total      | df_Tot = 22        | ss_Tot = 1909.22  |                  |

⇒ compare to the appropriate F-distribution critical value

ANOVA: example

According to MATLAB, $f_{3,19;0.95} = 3.1274$ (in the table: $f_{3,20;0.95} = 3.10$)

⇒ the decision rule is: reject $H_0$ if $\frac{ms_{Tr}}{ms_{Er}} > 3.1274$

Here, the observed value of the test statistic is
$$ f_0 = \frac{ms_{Tr}}{ms_{Er}} = 3.77 $$

⇒ reject $H_0$: we conclude that the teaching technique has a significant effect on the mean achievement of the students (with less than 5% chance of being wrong)

The associated p-value is
$$ p = P(X > 3.77) = 0.0281 \qquad \text{for } X \sim F_{3,19} $$
(MATLAB again)
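The same critical value and p-value can be obtained in Python (a sketch of mine; scipy's `f_oneway` runs the whole one-way ANOVA from the raw data and should agree with the hand calculation up to rounding):

```python
from scipy.stats import f, f_oneway

# Critical value and p-value for the hand-computed statistic f0 = 3.77
print(f.ppf(0.95, 3, 19))      # about 3.127, the 95% quantile of F(3, 19)
print(f.sf(3.77, 3, 19))       # upper-tail probability, about 0.028

# One-way ANOVA directly from the raw scores
tech1 = [65, 87, 73, 79, 81, 69]
tech2 = [75, 69, 83, 81, 72, 79, 90]
tech3 = [59, 78, 67, 62, 83, 76]
tech4 = [94, 89, 80, 88]
stat, pval = f_oneway(tech1, tech2, tech3, tech4)
print(stat, pval)              # F-statistic near 3.77, p-value near 0.028
```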

The ANOVA F-test will tell you whether the means are all equal or not, but nothing more.

When the null hypothesis of equal means is rejected, we will usually want to know which of the $\mu_i$'s are different from one another.

A first step in that direction is to build confidence intervals for the different means $\mu_i$. From our assumptions (normal populations, random samples, equal variance $\sigma^2$ in each group), we have
$$ \bar{X}_i \sim \mathcal{N}\!\left(\mu_i, \frac{\sigma^2}{n_i}\right) $$

The value of $\sigma^2$ is unknown, however we have (numerous!) estimators for it

For instance, $MS_{Er}$ is an unbiased estimator of $\sigma^2$ with $n - k$ degrees of freedom. This one is based on all the n observations from the global sample

⇒ it has smaller variance (i.e. it is more accurate) than any other (like e.g. $S_i^2$), and should always be used in the ANOVA framework!

Acting as usual, we can conclude that
$$ \frac{\bar{X}_i - \mu_i}{\sqrt{MS_{Er}/n_i}} \sim t_{n-k} $$
and directly write a $100(1-\alpha)\%$ two-sided confidence interval for $\mu_i$, from the observed values $\bar{x}_i$ and $ms_{Er}$:
$$ \left[\ \bar{x}_i - t_{n-k;1-\alpha/2}\sqrt{\frac{ms_{Er}}{n_i}}\ ,\ \ \bar{x}_i + t_{n-k;1-\alpha/2}\sqrt{\frac{ms_{Er}}{n_i}}\ \right] $$

⇒ these confidence intervals for each group will tell which means $\mu_i$ are much different from one another and which ones are close

In our running example (the four teaching techniques), we would find, with $t_{19;0.975} = 2.093$ (table) and $ms_{Er} = 62.98$:

95% CI for $\mu_1$: $\left[75.67 \pm 2.093\sqrt{62.98/6}\right] = [68.89,\ 82.45]$
95% CI for $\mu_2$: $\left[78.43 \pm 2.093\sqrt{62.98/7}\right] = [72.15,\ 84.71]$
95% CI for $\mu_3$: $\left[70.83 \pm 2.093\sqrt{62.98/6}\right] = [64.05,\ 77.61]$
95% CI for $\mu_4$: $\left[87.75 \pm 2.093\sqrt{62.98/4}\right] = [79.45,\ 96.06]$

[Figure: the four 95% confidence intervals plotted against teaching technique, on a scale from 60 to 100]
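A sketch reproducing these intervals with scipy's t quantile (my own code, not from the slides):

```python
import numpy as np
from scipy.stats import t

ms_er, df_er = 62.98, 19
n_i = np.array([6, 7, 6, 4])
xbar_i = np.array([75.67, 78.43, 70.83, 87.75])

half_width = t.ppf(0.975, df_er) * np.sqrt(ms_er / n_i)   # t_{19;0.975} is about 2.093
for i, (m, h) in enumerate(zip(xbar_i, half_width), start=1):
    print(f"95% CI for mu_{i}: [{m - h:.2f}, {m + h:.2f}]")
```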

It is also possible to build confidence intervals for the differences between two means $\mu_i$ and $\mu_j$. From observed values $\bar{x}_i$, $\bar{x}_j$ and $ms_{Er}$, a $100(1-\alpha)\%$ confidence interval for $\mu_i - \mu_j$ is
$$ \left[\ (\bar{x}_i - \bar{x}_j) - t_{n-k;1-\alpha/2}\sqrt{ms_{Er}\left(\frac{1}{n_i} + \frac{1}{n_j}\right)}\ ,\ \ (\bar{x}_i - \bar{x}_j) + t_{n-k;1-\alpha/2}\sqrt{ms_{Er}\left(\frac{1}{n_i} + \frac{1}{n_j}\right)}\ \right] $$
for any pair of groups (i, j) (compare Slide 26, Week 10).

Finding the value 0 in such an interval is an indication that $\mu_i$ and $\mu_j$ are not significantly different. On the other hand, if the interval does not contain 0, that is evidence that $\mu_i \neq \mu_j$.

However, these confidence intervals are sometimes misleading and must be carefully analysed, in particular when related to the global null hypothesis $H_0: \mu_1 = \mu_2 = \ldots = \mu_k$
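For instance, here is a sketch of this interval for $\mu_4 - \mu_3$, the largest observed difference; the choice of pair is purely my own illustration:

```python
import numpy as np
from scipy.stats import t

ms_er, df_er = 62.98, 19
x4, n4, x3, n3 = 87.75, 4, 70.83, 6

h = t.ppf(0.975, df_er) * np.sqrt(ms_er * (1 / n4 + 1 / n3))
print(f"95% CI for mu_4 - mu_3: [{x4 - x3 - h:.2f}, {x4 - x3 + h:.2f}]")
# an interval excluding 0 suggests mu_4 and mu_3 differ, but see the caveat above
```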

Suppose that for a pair (i, j), the $100(1-\alpha)\%$ confidence interval for $\mu_i - \mu_j$ does not contain 0

⇒ reject $H_0^{(i,j)}: \mu_i = \mu_j$ (Sl. 44 W9)

⇒ should you also reject $H_0: \mu_1 = \ldots = \mu_k$ at significance level $\alpha$? No!

Each individual test of $H_0^{(i,j)}$ keeps an $\alpha$% chance of being wrong. Successively testing $H_0^{(1,2)}: \mu_1 = \mu_2$, then $H_0^{(1,3)}: \mu_1 = \mu_3$, and so on up to $H_0^{(k-1,k)}: \mu_{k-1} = \mu_k$, that is,
$$ K = \binom{k}{2} = \frac{k!}{2!\,(k-2)!} \quad \text{pairwise comparisons,} $$
greatly increases the chance of making a wrong decision (look back at the Example on Slide 32, Week 3)

Suppose that $H_0: \mu_1 = \mu_2 = \ldots = \mu_k$ is true.

If the decisions about the different $H_0^{(i,j)}$ were independent (which they are not! why?), we would wrongly reject at least one null hypothesis with probability $1 - (1-\alpha)^K$ (why?)

If the decisions were perfectly dependent (which they are not either!), we would wrongly reject at least one null hypothesis with probability $\alpha$ (why?)

⇒ if we based our decision about $H_0: \mu_1 = \mu_2 = \ldots = \mu_k$ on the pairwise comparison tests, we would wrongly reject $H_0$ with a probability strictly between $\alpha$ and $1 - (1-\alpha)^K$, larger than $\alpha$!

To fix ideas, suppose k = 4 groups, which gives $K = \binom{4}{2} = 6$ pairwise comparisons, and $\alpha = 0.05$

⇒ the test based on pairwise comparisons would have an effective significance level between 0.05 and $1 - (1 - 0.05)^6 = 0.265$

It is usually not possible to determine exactly the significance level of such a test: it all depends on the exact level of dependence between the decisions about the different pairwise comparisons.

Several procedures have been proposed to overcome this difficulty, the simplest being the Bonferroni adjustment method.

It is based on the Bonferroni inequality (see Exercise 1, Tut. Week 5):
$$ P(A_1 \cup A_2 \cup \ldots \cup A_K) \leq P(A_1) + P(A_2) + \ldots + P(A_K) $$

Suppose that $A_q$ is the event that we wrongly reject $H_0$ for the qth pairwise comparison. Then, the event $B = (A_1 \cup A_2 \cup \ldots \cup A_K)$ is the event that we wrongly reject $H_0: \mu_1 = \ldots = \mu_k$

⇒ if we want $P(B) \leq \alpha$, it is enough to take $P(A_q) = \alpha/K$ for all q

⇒ the pairwise comparison tests must be carried out at significance level $\alpha/K$ (instead of $\alpha$), where $K = \binom{k}{2}$

In our running example, we have k = 4 groups, and we can run K = 6 pairwise two-sample t-tests. We find:

t-test for $H_0: \mu_1 = \mu_2$ ⇒ p-value = 0.5276
t-test for $H_0: \mu_1 = \mu_3$ ⇒ p-value = 0.3691
t-test for $H_0: \mu_1 = \mu_4$ ⇒ p-value = 0.0346
t-test for $H_0: \mu_2 = \mu_3$ ⇒ p-value = 0.1293
t-test for $H_0: \mu_2 = \mu_4$ ⇒ p-value = 0.0537
t-test for $H_0: \mu_3 = \mu_4$ ⇒ p-value = 0.0139

From this, can we reject $H_0: \mu_1 = \mu_2 = \mu_3 = \mu_4$ at level 5%? No!

⇒ we must compare the above p-values to $\alpha/K = 0.05/6 = 0.0083$

None are smaller than 0.0083 ⇒ do not reject $H_0: \mu_1 = \mu_2 = \mu_3 = \mu_4$!

The ANOVA test did reject $H_0$. Is that a contradiction?
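A sketch of these pairwise comparisons with scipy (assuming, as in Chapter 10, pooled-variance two-sample t-tests; the p-values should be close to those quoted above):

```python
from itertools import combinations
from scipy.stats import ttest_ind

groups = {
    1: [65, 87, 73, 79, 81, 69],
    2: [75, 69, 83, 81, 72, 79, 90],
    3: [59, 78, 67, 62, 83, 76],
    4: [94, 89, 80, 88],
}

alpha, K = 0.05, 6
for i, j in combinations(groups, 2):
    stat, p = ttest_ind(groups[i], groups[j], equal_var=True)  # pooled two-sample t-test
    # Bonferroni: compare each p-value with alpha / K rather than alpha
    print(f"mu_{i} = mu_{j}: p = {p:.4f} -> reject at adjusted level? {p < alpha / K}")
```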

The ANOVA model is based on several assumptions that should be carefully checked.

The central assumption here is that the random variables $\epsilon_{ij} = X_{ij} - \mu_i$, $i = 1, \ldots, k$ and $j = 1, \ldots, n_i$, are (1) independent and (2) normally distributed:
$$ \epsilon_{ij} \overset{\text{i.i.d.}}{\sim} \mathcal{N}(0, \sigma^2), $$
with (3) the same variance $\sigma^2$ in each group.

We do not have access to values for $\epsilon_{ij}$ (the $\mu_i$'s are unknown!), however we can approximate these values by the observed residuals
$$ e_{ij} = x_{ij} - \bar{x}_i $$
Note that these residuals are exactly the quantities arising in $ss_{Er}$

⇒ as for a regression model (see Slides 41-42, Week 11), the adequacy of the ANOVA model is established by examining the residuals

⇒ residual analysis

Residual analysis

The normality assumption can be checked by constructing a normal quantile plot of the residuals.

The assumption of equal variances in each group can be checked by plotting the residuals against the treatment level (that is, against the fitted values $\bar{x}_i$)

⇒ the spread in the residuals should not depend in any way on $\bar{x}_i$

A rule of thumb is that, if the ratio of the largest sample standard deviation to the smallest one is smaller than 2, the assumption of equal population variances is reasonable.

The assumption of independence can be checked by plotting the residuals against time, if this information is available

⇒ no pattern, such as long sequences of positive and negative residuals, should be observed

As for regression, the residuals are everything that the model does not explain ⇒ no systematic information should remain in the residuals; they should look like random noise
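A sketch of these two diagnostic plots for the teaching-technique data (my own code; it uses matplotlib and scipy's probplot for the normal quantile plot):

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import probplot

samples = [
    np.array([65, 87, 73, 79, 81, 69]),
    np.array([75, 69, 83, 81, 72, 79, 90]),
    np.array([59, 78, 67, 62, 83, 76]),
    np.array([94, 89, 80, 88]),
]

# residuals e_ij = x_ij - xbar_i, with the corresponding fitted values xbar_i
residuals = np.concatenate([x - x.mean() for x in samples])
fitted = np.concatenate([np.full(len(x), x.mean()) for x in samples])

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(9, 4))
probplot(residuals, plot=ax1)                 # normal quantile (QQ) plot
ax1.set_title("Normal QQ plot of residuals")
ax2.scatter(fitted, residuals)                # spread should not depend on the fitted value
ax2.axhline(0, linestyle="--")
ax2.set_xlabel("fitted value (group mean)")
ax2.set_ylabel("residual")
plt.tight_layout()
plt.show()

# rule of thumb for equal variances: largest sd / smallest sd < 2
sds = [x.std(ddof=1) for x in samples]
print(max(sds) / min(sds))
```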

For our running example, a normal quantile plot of the residuals and a plot of the residuals against the fitted values $\bar{x}_i$ are shown below:

[Figure: Normal QQ plot of the residuals (left) and residuals plotted against the fitted values, from 65 to 95 (right)]

⇒ the assumptions we made look valid

Example

To assess the reliability of timber structures, researchers have studied strength factors of structural lumber. Three species of Canadian softwood were analysed for bending strength (Douglas Fir, Hem-Fir and Spruce-Pine-Fir). Wood samples were selected from randomly selected sawmills. The results of the experiment are given below. Is there any significant difference in the mean bending parameters among the three types of wood?

Douglas (1): 370  150  372  145  374  365
Hem (2):     381  401  175  185  374  390
Spruce (3):  440  210  230  400  386  410

We test the null hypothesis $H_0: \mu_1 = \mu_2 = \mu_3$ versus the alternative $H_a$: not all the means are equal

We computed values for the ANOVA table:

Source     | degrees of freedom | sum of squares | mean square | F-statistic
Treatment  | 2                  | 7544           | 3772        | f_0 = 0.33
Error      | 15                 | 172929         | 11529       |
Total      | 17                 | 180474         |             |

⇒ here we have observed $f_0 = 0.33$ ⇒ do not reject $H_0$!

Associated p-value: $p = P(X > 0.33) = 0.726$ for $X \sim F_{2,15}$

⇒ we confidently claim that there is no significant difference in the mean bending parameters for the different wood types

Residual analysis:

[Figure: Normal QQ plot of the residuals (left) and residuals plotted against the fitted values, roughly 280 to 350 (right); the residuals range from about -150 to 100 and clearly do not look like random noise]

⇒ the above conclusion is certainly not reliable!

Blocking factor

[Figure: bending parameter (roughly 150 to 450) plotted against tree type, with observations labelled by sawmill (Mill 1 to Mill 6); the mill-to-mill differences are much larger than the differences between wood types]

Blocking factor

It is clear that, over and above the wood type, the mill from which the lumber was selected is another source of variability, in this example even more important than the main treatment of interest (wood type).

This kind of extra source of variability is known as a blocking factor, as it essentially groups some observations in blocks across the initial groups ⇒ the samples are not independent! (assumption violation)

⇒ a potential blocking factor must be taken into account!

When a blocking factor is present, the initial Error Sum of Squares, say $SS_{Er}^*$, that is, the whole amount of variability not due to the treatment, can in turn be partitioned into:
1. the variability due to the blocking factor, quantified by $SS_{Block}$
2. the true natural variability in the observations, $SS_{Er}$

We can write $SS_{Er}^* = SS_{Block} + SS_{Er}$, and thus
$$ SS_{Tot} = SS_{Tr} + SS_{Block} + SS_{Er} $$

Blocking factor

The ANOVA table becomes (with b the number of blocks):

Source     | degrees of freedom | sum of squares | mean square                        | F-statistic
Treatment  | k - 1              | ss_Tr          | ms_Tr = ss_Tr / (k - 1)            | f_0 = ms_Tr / ms_Er
Block      | b - 1              | ss_Block       | ms_Block = ss_Block / (b - 1)      |
Error      | n - k - b + 1      | ss_Er          | ms_Er = ss_Er / (n - k - b + 1)    |
Total      | n - 1              | ss_Tot         |                                    |

Note: the test statistic is again the ratio $\frac{ms_{Tr}}{ms_{Er}}$ (we have just removed the variability due to the blocking factor first), to be compared with the quantile of the $F_{k-1,\,n-k-b+1}$ distribution
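As a sketch of how such a table can be produced in practice, the following fits a treatment-plus-block model with statsmodels on synthetic data (the data are simulated with a strong block effect purely for illustration; the slides' lumber data are not reproduced here because the mill labels are not listed numerically):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

rng = np.random.default_rng(0)
treatments = ["A", "B", "C"]
blocks = [f"block{m}" for m in range(1, 7)]

rows = []
for blk in blocks:
    block_effect = rng.normal(0, 100)            # large block-to-block variability
    for t_idx, tr in enumerate(treatments):
        y = 300 + 10 * t_idx + block_effect + rng.normal(0, 15)
        rows.append({"y": y, "treatment": tr, "block": blk})
df = pd.DataFrame(rows)

# Randomised block ANOVA: the treatment effect is tested against the error
# that remains once the block-to-block variability has been removed
model = smf.ols("y ~ C(treatment) + C(block)", data=df).fit()
print(anova_lm(model))
```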

Blocking factor

In the previous example, we would have found:

Source     | degrees of freedom | sum of squares | mean square | F-statistic
Treatment  | 2                  | 7544           | 3772        | f_0 = 15.87
Block      | 5                  | 170552         | 34110       |
Error      | 10                 | 2378           | 238         |
Total      | 17                 | 180474         |             |

Here, we have observed $f_0 = 15.87$ ⇒ clearly reject $H_0$!

Associated p-value: $p = P(X > 15.87) = 0.0008$ for $X \sim F_{2,10}$

The $SS_{Er}$ in the first ANOVA (without block) was 172,929, of which 170,552 was due to mill-to-mill variability, and so was not natural variability at all!

The second ANOVA (with blocking factor) adjusts for this effect. The net effect is a substantial reduction in the genuine $MS_{Er}$, leading to a much larger F-statistic (increased from 0.33 to 15.87!)

⇒ with very little risk of being wrong ($p \simeq 0$), we can now conclude that there is a significant difference in the mean bending parameters for the three different wood types

An analysis of the residuals in this second ANOVA would not show anything peculiar ⇒ valid conclusion

Ignoring an existing blocking factor can thus completely invalidate the conclusion, and it should always be carefully assessed whether a blocking factor may exist or not (plot the data!)

Objectives

Now you should be able to:
- conduct engineering experiments involving a treatment with a certain number of levels
- understand how the ANOVA is used to analyse the data from these experiments
- assess the ANOVA model adequacy with residual plots
- understand the blocking principle and how it is used to isolate the effect of nuisance factors

Recommended exercises: Q3, Q6 p.406, Q9 p.407, Q10, Q11 p.412, Q13, Q15, Q17 p.413, Q19 p.414, Q22, Q23 p.415, Q35 p.428
