
Analysis of Variance and Design of Experiments - I

MODULE II
LECTURE - 6

GENERAL LINEAR HYPOTHESIS AND ANALYSIS OF VARIANCE


Dr. Shalabh, Department of Mathematics and Statistics, Indian Institute of Technology Kanpur

Tests of hypothesis in the linear regression model


First we discuss the development of the tests of hypothesis concerning the parameters of a linear regression model. These tests of hypothesis will be used later in the development of tests based on the analysis of variance.

Analysis of Variance

The technique in the analysis of variance involves the breaking down of total variation into orthogonal components. Each orthogonal component represents the variation due to a particular factor contributing to the total variation.

Model

Let $Y_1, Y_2, \ldots, Y_n$ be independently distributed following a normal distribution with mean $E(Y_i) = \sum_{j=1}^{p} \beta_j x_{ij}$ and variance $\sigma^2$. Denoting by $Y = (Y_1, Y_2, \ldots, Y_n)'$ an $n \times 1$ column vector, this assumption can be expressed in the form of a linear regression model

$$Y = X\beta + \varepsilon$$

where $X$ is an $n \times p$ matrix, $\beta$ is a $p \times 1$ vector and $\varepsilon$ is an $n \times 1$ vector of disturbances with

$$E(\varepsilon) = 0, \quad \text{Cov}(\varepsilon) = \sigma^2 I,$$

and $\varepsilon$ follows a normal distribution. This implies that

$$E(Y) = X\beta, \quad E\left[(Y - X\beta)(Y - X\beta)'\right] = \sigma^2 I.$$
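As an illustrative sketch (not part of the original notes), this model can be simulated in Python; the dimensions $n = 50$, $p = 3$ and the parameter values are hypothetical choices:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 3                       # hypothetical sample size and number of parameters
X = rng.normal(size=(n, p))        # n x p design matrix (full column rank with prob. 1)
beta = np.array([1.0, -2.0, 0.5])  # assumed true p x 1 coefficient vector
sigma = 1.5                        # assumed true error standard deviation

eps = rng.normal(scale=sigma, size=n)  # disturbances with E(eps) = 0, Cov(eps) = sigma^2 I
y = X @ beta + eps                     # Y = X beta + eps
```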
Now we consider four different types of tests of hypothesis. In the first two cases, we develop the likelihood ratio test for the null hypothesis related to the analysis of variance. Note that later we will derive the same test on the basis of the least squares principle also. An important idea behind the development of this test is to demonstrate that the test used in the analysis of variance can be derived using the least squares principle as well as the likelihood ratio test.

Case 1: Test of $H_0: \beta = \beta^0$

Consider the null hypothesis $H_0: \beta = \beta^0$, where $\beta = (\beta_1, \beta_2, \ldots, \beta_p)'$, $\beta^0 = (\beta_1^0, \beta_2^0, \ldots, \beta_p^0)'$ is specified and $\sigma^2$ is unknown. This null hypothesis is equivalent to

$$H_0: \beta_1 = \beta_1^0,\ \beta_2 = \beta_2^0,\ \ldots,\ \beta_p = \beta_p^0.$$

Assume that all $\beta_i$'s are estimable, i.e., $\operatorname{rank}(X) = p$ (full column rank). We now develop the likelihood ratio test. The $(p + 1)$-dimensional parametric space $\Omega$ is the collection of points

$$\Omega = \left\{(\beta, \sigma^2) : -\infty < \beta_i < \infty,\ \sigma^2 > 0,\ i = 1, 2, \ldots, p\right\}.$$

Under $H_0$, all $\beta_i$'s are known and equal to $\beta_i^0$, say, and $\Omega$ reduces to the one-dimensional space

$$\omega = \left\{(\beta^0, \sigma^2) : \sigma^2 > 0\right\}.$$
The likelihood function of $y_1, y_2, \ldots, y_n$ is

$$L(y \mid \beta, \sigma^2) = \left(\frac{1}{2\pi\sigma^2}\right)^{n/2} \exp\left[-\frac{1}{2\sigma^2}(y - X\beta)'(y - X\beta)\right].$$

The likelihood function is maximum over $\Omega$ when $\beta$ and $\sigma^2$ are substituted with their maximum likelihood estimators, i.e.,

$$\hat{\beta} = (X'X)^{-1}X'y, \quad \hat{\sigma}^2 = \frac{1}{n}(y - X\hat{\beta})'(y - X\hat{\beta}).$$

Substituting $\hat{\beta}$ and $\hat{\sigma}^2$ in $L(y \mid \beta, \sigma^2)$ gives

$$\max_{\Omega} L(y \mid \beta, \sigma^2) = \left(\frac{1}{2\pi\hat{\sigma}^2}\right)^{n/2} \exp\left[-\frac{1}{2\hat{\sigma}^2}(y - X\hat{\beta})'(y - X\hat{\beta})\right] = \left(\frac{n}{2\pi (y - X\hat{\beta})'(y - X\hat{\beta})}\right)^{n/2} \exp\left(-\frac{n}{2}\right).$$
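Continuing the simulated example above, the maximum likelihood estimates can be computed directly (a sketch; the variable names are ours, not the notes'):

```python
XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y      # beta_hat = (X'X)^{-1} X'y
resid = y - X @ beta_hat
sigma2_hat = (resid @ resid) / n  # MLE of sigma^2 (divides by n, not n - p)
```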
Under $H_0$, the maximum likelihood estimator of $\sigma^2$ is

$$\hat{\sigma}_\omega^2 = \frac{1}{n}(y - X\beta^0)'(y - X\beta^0).$$

The maximum value of the likelihood function under $H_0$ is

$$\max_{\omega} L(y \mid \beta, \sigma^2) = \left(\frac{1}{2\pi\hat{\sigma}_\omega^2}\right)^{n/2} \exp\left[-\frac{1}{2\hat{\sigma}_\omega^2}(y - X\beta^0)'(y - X\beta^0)\right] = \left(\frac{n}{2\pi (y - X\beta^0)'(y - X\beta^0)}\right)^{n/2} \exp\left(-\frac{n}{2}\right).$$

The likelihood ratio test statistic is

$$\lambda = \frac{\max_{\omega} L(y \mid \beta, \sigma^2)}{\max_{\Omega} L(y \mid \beta, \sigma^2)} = \left[\frac{(y - X\hat{\beta})'(y - X\hat{\beta})}{(y - X\beta^0)'(y - X\beta^0)}\right]^{n/2}$$

$$= \left[\frac{(y - X\hat{\beta})'(y - X\hat{\beta})}{\left[(y - X\hat{\beta}) + X(\hat{\beta} - \beta^0)\right]'\left[(y - X\hat{\beta}) + X(\hat{\beta} - \beta^0)\right]}\right]^{n/2}$$

$$= \left[1 + \frac{(\hat{\beta} - \beta^0)'X'X(\hat{\beta} - \beta^0)}{(y - X\hat{\beta})'(y - X\hat{\beta})}\right]^{-n/2} = \left(1 + \frac{q_1}{q_2}\right)^{-n/2},$$

where the cross-product terms in the expanded denominator vanish because $X'(y - X\hat{\beta}) = 0$, and

$$q_1 = (\hat{\beta} - \beta^0)'X'X(\hat{\beta} - \beta^0), \quad q_2 = (y - X\hat{\beta})'(y - X\hat{\beta}).$$

The expressions for $q_1$ and $q_2$ can be further simplified as follows. Consider

$$\begin{aligned}
q_1 &= (\hat{\beta} - \beta^0)'X'X(\hat{\beta} - \beta^0) \\
&= \left[(X'X)^{-1}X'y - \beta^0\right]'X'X\left[(X'X)^{-1}X'y - \beta^0\right] \\
&= \left[(X'X)^{-1}X'(y - X\beta^0)\right]'X'X\left[(X'X)^{-1}X'(y - X\beta^0)\right] \\
&= (y - X\beta^0)'X(X'X)^{-1}X'X(X'X)^{-1}X'(y - X\beta^0) \\
&= (y - X\beta^0)'X(X'X)^{-1}X'(y - X\beta^0),
\end{aligned}$$

$$\begin{aligned}
q_2 &= (y - X\hat{\beta})'(y - X\hat{\beta}) \\
&= \left[y - X(X'X)^{-1}X'y\right]'\left[y - X(X'X)^{-1}X'y\right] \\
&= y'\left[I - X(X'X)^{-1}X'\right]y \\
&= \left[(y - X\beta^0) + X\beta^0\right]'\left[I - X(X'X)^{-1}X'\right]\left[(y - X\beta^0) + X\beta^0\right] \\
&= (y - X\beta^0)'\left[I - X(X'X)^{-1}X'\right](y - X\beta^0).
\end{aligned}$$

The other two terms become zero using $\left[I - X(X'X)^{-1}X'\right]X = 0$.
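As a quick numerical check (a sketch continuing the simulation above, with a hypothetical null value `beta0`), both forms of $q_1$ and $q_2$ can be computed and compared:

```python
beta0 = np.zeros(p)    # hypothetical null value beta^0
H = X @ XtX_inv @ X.T  # projection matrix X(X'X)^{-1}X'
z = y - X @ beta0

q1_direct = (beta_hat - beta0) @ (X.T @ X) @ (beta_hat - beta0)
q1_proj = z @ H @ z    # simplified form (y - X beta0)' H (y - X beta0)

q2_direct = resid @ resid
q2_proj = z @ (np.eye(n) - H) @ z

assert np.allclose(q1_direct, q1_proj) and np.allclose(q2_direct, q2_proj)
```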

In order to find the decision rule for $H_0$ based on $\lambda$, we first need to determine whether $\lambda$ is a monotonic increasing or decreasing function of $q_1/q_2$. We proceed as follows. Let

$$g = \frac{q_1}{q_2}, \quad \text{so that} \quad \lambda = \left(1 + \frac{q_1}{q_2}\right)^{-n/2} = (1 + g)^{-n/2}.$$

Then

$$\frac{d\lambda}{dg} = -\frac{n}{2}(1 + g)^{-\left(\frac{n}{2} + 1\right)}.$$

So as $g$ increases, $\lambda$ decreases. Thus $\lambda$ is a monotonic decreasing function of $q_1/q_2$.
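A one-line symbolic check of the sign of this derivative (a sketch, assuming SymPy is available; not part of the original notes):

```python
import sympy as sp

g_sym, n_sym = sp.symbols('g n', positive=True)
lam = (1 + g_sym) ** (-n_sym / 2)
print(sp.diff(lam, g_sym))  # -n/2 * (1+g)^(-n/2 - 1): negative for g, n > 0
```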

The decision rule is to reject $H_0$ if $\lambda \leq \lambda_0$, where $\lambda_0$ is a constant to be determined on the basis of the size of the test. Let us simplify this in our context:

$$\lambda \leq \lambda_0$$

or

$$(1 + g)^{-n/2} \leq \lambda_0$$

or

$$1 + g \geq \lambda_0^{-2/n}$$

or

$$g \geq \lambda_0^{-2/n} - 1$$

or

$$g \geq C,$$

where $C$ is a constant to be determined by the size condition of the test. So reject $H_0$ whenever

$$\frac{q_1}{q_2} \geq C.$$

Note that the statistic $q_1/q_2$ can also be obtained by the least squares method, as follows (the least squares methodology will also be discussed in further lectures):

$$q_1 = (\hat{\beta} - \beta^0)'X'X(\hat{\beta} - \beta^0) = \min_{\omega}(y - X\beta)'(y - X\beta) - \min_{\Omega}(y - X\beta)'(y - X\beta),$$

so that

$$\frac{q_1}{q_2} = \frac{\text{sum of squares due to deviation from } H_0 \text{ (or sum of squares due to } H_0\text{)}}{\text{sum of squares due to error}},$$

where the total sum of squares $(y - X\beta^0)'(y - X\beta^0) = q_1 + q_2$ splits into the sum of squares due to $H_0$ and the sum of squares due to error.

Theorem

Let

$$Z = Y - X\beta^0, \quad Q_1 = Z'X(X'X)^{-1}X'Z, \quad Q_2 = Z'\left[I - X(X'X)^{-1}X'\right]Z.$$

Then $Q_1$ and $Q_2$ are independently distributed. Further, when $H_0$ is true,

$$\frac{Q_1}{\sigma^2} \sim \chi^2(p) \quad \text{and} \quad \frac{Q_2}{\sigma^2} \sim \chi^2(n - p),$$

where $\chi^2(m)$ denotes the $\chi^2$ distribution with $m$ degrees of freedom.

Proof: Under $H_0$,

$$E(Z) = X\beta^0 - X\beta^0 = 0, \quad \text{Var}(Z) = \text{Var}(Y) = \sigma^2 I.$$

Further, $Z$ is a linear function of $Y$, and $Y$ follows a normal distribution, so $Z \sim N(0, \sigma^2 I)$. The matrices $X(X'X)^{-1}X'$ and $\left[I - X(X'X)^{-1}X'\right]$ are idempotent. So

$$\text{tr}\left[X(X'X)^{-1}X'\right] = \text{tr}\left[(X'X)^{-1}X'X\right] = \text{tr}(I_p) = p,$$

$$\text{tr}\left[I - X(X'X)^{-1}X'\right] = \text{tr}(I_n) - \text{tr}\left[X(X'X)^{-1}X'\right] = n - p.$$

So, using Theorem 6, we can write that under $H_0$,

$$\frac{Q_1}{\sigma^2} \sim \chi^2(p) \quad \text{and} \quad \frac{Q_2}{\sigma^2} \sim \chi^2(n - p),$$

where the degrees of freedom $p$ and $(n - p)$ are obtained as the traces of $X(X'X)^{-1}X'$ and $I - X(X'X)^{-1}X'$ respectively.

Since $\left[I - X(X'X)^{-1}X'\right]X(X'X)^{-1}X' = 0$, using Theorem 7 the quadratic forms $Q_1$ and $Q_2$ are independent under $H_0$. Hence the theorem is proved.
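A small Monte Carlo sketch can illustrate the theorem (continuing the hypothetical simulation above): under $H_0$, $Z = Y - X\beta^0 \sim N(0, \sigma^2 I)$, so the sample means of $Q_1/\sigma^2$ and $Q_2/\sigma^2$ should be close to the $\chi^2$ means $p$ and $n - p$:

```python
reps = 5000
I_H = np.eye(n) - H
Zs = rng.normal(scale=sigma, size=(reps, n))  # draws of Z = Y - X beta^0 under H0
q1_sim = np.einsum('ij,jk,ik->i', Zs, H, Zs) / sigma**2    # Q1 / sigma^2 per draw
q2_sim = np.einsum('ij,jk,ik->i', Zs, I_H, Zs) / sigma**2  # Q2 / sigma^2 per draw
print(q1_sim.mean(), p)       # sample mean should be approx p
print(q2_sim.mean(), n - p)   # sample mean should be approx n - p
```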

Since $Q_1$ and $Q_2$ are independently distributed, under $H_0$ the ratio $\dfrac{Q_1/p}{Q_2/(n-p)}$ follows a central $F$-distribution, i.e.,

$$\frac{n - p}{p} \cdot \frac{Q_1}{Q_2} \sim F(p, n - p).$$

Hence the constant $C$ in the likelihood ratio test statistic is given by $C = F_{1-\alpha}(p, n - p)$, where $F_{1-\alpha}(n_1, n_2)$ denotes the upper $100\alpha\%$ point of the $F$-distribution with $n_1$ and $n_2$ degrees of freedom. The computations of this test of hypothesis can be represented in the form of an analysis of variance table.
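For instance (a sketch building on the snippets above; the level $\alpha = 0.05$ and the use of `scipy.stats` are our assumptions, not part of the notes):

```python
from scipy import stats

alpha = 0.05                                # hypothetical test size
F_stat = ((n - p) / p) * (q1_direct / q2_direct)
F_crit = stats.f.ppf(1 - alpha, p, n - p)   # upper alpha point F_{1-alpha}(p, n-p)
print(F_stat, F_crit, F_stat >= F_crit)     # reject H0 if F_stat >= F_crit
```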

ANOVA table for testing $H_0: \beta = \beta^0$

| Source of variation | Degrees of freedom | Sum of squares | Mean squares | $F$-value |
|---|---|---|---|---|
| Due to $H_0: \beta = \beta^0$ | $p$ | $q_1$ | $\dfrac{q_1}{p}$ | $\dfrac{n - p}{p} \cdot \dfrac{q_1}{q_2}$ |
| Error | $n - p$ | $q_2$ | $\dfrac{q_2}{n - p}$ | |
| Total | $n$ | $(y - X\beta^0)'(y - X\beta^0)$ | | |

Reject $H_0$ when the $F$-value exceeds $C = F_{1-\alpha}(p, n - p)$.
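Putting the pieces together, an end-to-end sketch of this test might look as follows (the function name and all numbers are hypothetical, building on the simulated data above):

```python
import numpy as np
from scipy import stats

def anova_beta0(X, y, beta0, alpha=0.05):
    """F test of H0: beta = beta0 in y = X beta + eps (illustrative sketch)."""
    n, p = X.shape
    beta_hat = np.linalg.solve(X.T @ X, X.T @ y)  # least squares / ML estimate
    e = y - X @ beta_hat
    q2 = float(e @ e)                             # sum of squares due to error
    d = beta_hat - beta0
    q1 = float(d @ (X.T @ X) @ d)                 # sum of squares due to H0
    F = ((n - p) / p) * (q1 / q2)
    crit = stats.f.ppf(1 - alpha, p, n - p)
    return {"q1": q1, "q2": q2, "total": q1 + q2,
            "F": F, "crit": crit, "reject_H0": F >= crit}

print(anova_beta0(X, y, np.zeros(X.shape[1])))
```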
