
Linear Statistical Models: Inference for the full rank model
Notes by Yao-ban Chan and Owen Jones

Topics: β = 0; the general linear hypothesis; splitting; corrected sums of squares; sequential testing.


In this section, we develop various forms of hypothesis testing on the full rank model. To recap, the full rank model is

y = Xβ + ε

where X is n × p, n ≥ p, r(X) = p, and the errors ε:

- have mean 0;
- have variance σ²I;
- (for some theorems) are normally distributed.


The first thing we want to test is for model relevance: does our model contribute anything at all?

If none of the x variables have any relevance for predicting y, then all the parameters will be 0. We test for this using the null hypothesis

H0: β = 0.

Alternatively, if at least some of the x variables are relevant to predicting y, then the corresponding parameters will be nonzero. So our alternative hypothesis is

H1: β ≠ 0.

To test these hypotheses, we assume throughout the section that the errors are normally distributed.

ANOVA

The method used to test the hypotheses is ANOVA.

If β = 0, then y = ε consists entirely of errors. In this case yᵀy, the sum of squares of the errors, measures the variability of the errors.

However, if β ≠ 0, then y = Xβ + ε. In this case yᵀy is not made up solely of the errors but also of the model predictions: some of yᵀy will come from the errors and some from the model predictions.

By separating yᵀy into the two parts, measuring variation due to the model and variation due to the errors, we can compare them to see how well the model is doing.

More precisely, the sum of squares of the residuals is (writing H = X(XᵀX)⁻¹Xᵀ for the hat matrix)

SSRes = (y − Xb)ᵀ(y − Xb)
      = (y − Hy)ᵀ(y − Hy)
      = yᵀy − 2yᵀHy + yᵀH²y
      = yᵀy − yᵀHy
      = yᵀy − yᵀX(XᵀX)⁻¹Xᵀy

which means that

yᵀy = yᵀX(XᵀX)⁻¹Xᵀy + SSRes.

We call yᵀX(XᵀX)⁻¹Xᵀy = ŷᵀy the regression sum of squares and denote it by SSReg. It reflects the variation in the response variable that is accounted for by the model. If we call the total variation in the response variable SSTotal = yᵀy, then we have divided it into:

SSTotal = SSReg + SSRes.

Example. Suppose that there is no error, so that y = Xβ (and b = β). We have

SSReg = yᵀX(XᵀX)⁻¹Xᵀy
      = βᵀXᵀX(XᵀX)⁻¹XᵀXβ
      = βᵀXᵀXβ
      = yᵀy = SSTotal

and SSRes = 0.


On the other hand, suppose that there is no signal, so that β = 0 and y = ε. If we put b = β = 0 then

SSRes = (y − Xb)ᵀ(y − Xb) = yᵀy = SSTotal

and SSReg = 0.

These are the two extremes of the spectrum.


Example. Recall our previous paint cracking example, in which the data had a strong linear relationship.

[Figure: scatterplot of the paint cracking data, showing a strong linear trend.]


The data matrices are

y = (1.9, 2.7, 4.2, 4.8, 4.8, 5.1)ᵀ

X =
  1 2
  1 3
  1 4
  1 5
  1 6
  1 7

and the sample variance is s² = 0.27. This means that

SSRes = (n − p)s² = (6 − 2)s² = 1.1.


Since

SSTotal = yᵀy = (1.9, 2.7, 4.2, 4.8, 4.8, 5.1)(1.9, 2.7, 4.2, 4.8, 4.8, 5.1)ᵀ = 100.63,

we get

SSReg = SSTotal − SSRes = 99.53.

Since 99.53 > 1.1, informally we would say that there is some linear signal in the data.
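These numbers are easy to reproduce; here is a minimal R sketch (ours, not part of the original example):

y <- c(1.9, 2.7, 4.2, 4.8, 4.8, 5.1)
X <- cbind(1, 2:7)
b <- solve(t(X) %*% X, t(X) %*% y)   # least squares estimate
SSTotal <- sum(y^2)                  # 100.63
SSRes <- sum((y - X %*% b)^2)        # about 1.1
SSReg <- SSTotal - SSRes             # about 99.53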


To create a formal test of β = 0, we compare SSReg against SSRes. If SSReg is large compared to SSRes, then we have evidence that β ≠ 0.

To know exactly how large, we must first derive the distributions of SSReg and SSRes.


Theorem
In the full rank linear model, SSRes/σ² has a χ² distribution with n − p degrees of freedom, SSReg/σ² has a noncentral χ² distribution with p degrees of freedom and noncentrality parameter

λ = βᵀXᵀXβ / (2σ²),

and they are independent.
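As a sanity check, here is a small simulation sketch (ours, not from the notes): with β = 0, the scaled sums of squares should average their χ² degrees of freedom, p and n − p.

set.seed(1)
n <- 50; p <- 3; sigma <- 2
X <- cbind(1, matrix(rnorm(n * (p - 1)), n))
H <- X %*% solve(t(X) %*% X) %*% t(X)   # hat matrix
sims <- replicate(10000, {
  y <- rnorm(n, 0, sigma)               # beta = 0, so y is pure error
  SSReg <- drop(t(y) %*% H %*% y)
  c(SSReg, sum(y^2) - SSReg)            # (SSReg, SSRes)
})
rowMeans(sims) / sigma^2                # approximately (p, n - p) = (3, 47)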


The test for β = 0 comes about when we observe that if the null hypothesis is true, the noncentrality parameter for SSReg/σ² must be 0.

Thus, under H0,

[SSReg/(pσ²)] / [SSRes/((n − p)σ²)] = [SSReg/p] / [SSRes/(n − p)] = MSReg/MSRes

has an F distribution with p and n − p degrees of freedom.


What happens if H0 is not true? The expected value of MSReg is

E[SSReg/p] = σ² + (1/p) βᵀXᵀXβ.

(Recall SSReg = yᵀHy and E[xᵀAx] = tr(AV) + μᵀAμ.)

The expected value of the denominator MSRes is

E[SSRes/(n − p)] = E[s²] = σ².


So if β = 0, then E[SSReg/p] = σ² and the statistic should be close to 1.

But if β ≠ 0, since XᵀX is positive definite, we get E[SSReg/p] > σ² and the statistic should be bigger than one.

Therefore, we should use a one-tailed test and reject H0 if the statistic is large.


To lay out the workings, we use a familiar ANOVA table.

Source of     Sum of                   degrees of    Mean            F
variation     squares                  freedom       square          ratio
Regression    yᵀX(XᵀX)⁻¹Xᵀy            p             SSReg/p         MSReg/MSRes
Residual      yᵀy − yᵀX(XᵀX)⁻¹Xᵀy      n − p         SSRes/(n − p)
Total         yᵀy                      n
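The whole table can be computed in a few lines of R. The helper below is a sketch of ours (the name anova_fullrank is not from the notes):

anova_fullrank <- function(y, X) {
  n <- length(y); p <- ncol(X)
  SSReg <- drop(t(y) %*% X %*% solve(t(X) %*% X, t(X) %*% y))
  SSRes <- sum(y^2) - SSReg
  Fstat <- (SSReg / p) / (SSRes / (n - p))
  c(SSReg = SSReg, SSRes = SSRes, F = Fstat,
    p.value = pf(Fstat, p, n - p, lower.tail = FALSE))
}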


Example: system cost

A data processing system uses three types of structural elements: files, flows and processes. Files are permanent records, flows are data interfaces, and processes are logical manipulations of the data. The costs of developing software for the system are based on the number of these three elements. A study is conducted with the following results:

Cost (y)    Files (x1)    Flows (x2)    Processes (x3)
22.6        4             44            18
15          2             33            15
78.1        20            80            80
28          6             24            21
80.5        6             227           50
24.5        3             20            18
20.5        4             41            13
147.6       16            187           137
4.2         4             19            15
48.2        6             50            21
20.5        5             48            17

The model we use is

yi = β0 + β1xi1 + β2xi2 + β3xi3 + εi.

We want to test the hypothesis

H0: β = 0 vs. H1: β ≠ 0.

Simple matrix calculations give us

SSReg = yᵀX(XᵀX)⁻¹Xᵀy = 38978
yᵀy = 39667
SSRes = yᵀy − SSReg = 689
MSReg = SSReg/4 = 9745
MSRes = SSRes/(11 − 4) = 98
F4,7 = MSReg/MSRes = 99


The F ratio is very large and we would expect H0 to be rejected based on it. Indeed, the critical value for α = 0.01 is 7.85, so we can say that β ≠ 0 with 99% confidence.

Variation     SS       d.f.    MS      F
Regression    38978    4       9745    99
Residual      689      7       98
Total         39667    11


Example: clover area

Recall the clover example

> clover <- read.csv("../data/clover.csv")
> clover <- log(clover)
> clover <- clover[-c(6,23,47,68,97,111,140),]
> y <- clover$area
> X <- cbind(1, clover$midrib, clover$estim)
> b <- solve(t(X) %*% X, t(X) %*% y)
> n <- length(y)
> p <- dim(X)[2]


We test H0: β = 0.
> (SS <- sum(y^2))
[1] 381.1864
> (SSRes <- sum((y - X %*% b)^2))
[1] 4.498653
> (SSReg <- SS - SSRes)
[1] 376.6877
> (SSReg <- t(y) %*% X %*% solve(t(X) %*% X) %*% t(X) %*% y)
[,1]
[1,] 376.6877
> (Fstat <- as.vector((SSReg/p)/(SSRes/(n-p))))
[1] 3768.005
> pf(Fstat, p, n-p, lower.tail=FALSE)
[1] 6.656806e-130

> basemodel <- lm(area ~ 0, data=clover)
> model <- lm(area ~ midrib + estim, data=clover)
> anova(basemodel, model)
Analysis of Variance Table

Model 1: area ~ 0
Model 2: area ~ midrib + estim
  Res.Df    RSS Df Sum of Sq    F    Pr(>F)
1    138 381.19
2    135   4.50  3    376.69 3768 < 2.2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1


The general linear hypothesis

We can now progress to testing the general linear hypothesis, which tests

H0: Cβ = δ vs. H1: Cβ ≠ δ

where C is an r × p matrix of rank r ≤ p and δ is an r × 1 vector of constants.

This hypothesis makes it possible to test for relationships among the parameters, as well as testing the individual parameters against a constant.


Example. Consider the null hypothesis of model relevance, H0: β = 0. We can express this in the form of the general linear hypothesis with C = Ip (which has rank p) and δ = 0.

Example. Consider the regression model with 4 parameters (3 predictors)

yi = β0 + β1xi1 + β2xi2 + β3xi3 + εi.

Let

C = [ 0  1  −1   0 ]      δ = [ 0 ]
    [ 0  0   1  −1 ],         [ 0 ].

Suppose we want to test H0: Cβ = δ. Then what we are really testing for is

β1 − β2 = 0
β2 − β3 = 0.

In other words, we are testing the hypothesis β1 = β2 = β3.

Test statistic

To develop a test statistic, we start with Cb − δ, the least squares estimator for Cβ − δ. Because it is a vector containing linear combinations of variables which have a joint normal distribution, it is a normal random vector, with mean and variance

E(Cb − δ) = Cβ − δ,
Var(Cb − δ) = C(XᵀX)⁻¹Cᵀσ².


Therefore, the quadratic form

(Cb − δ)ᵀ[C(XᵀX)⁻¹Cᵀ]⁻¹(Cb − δ) / σ²

has a noncentral χ² distribution with r degrees of freedom and noncentrality parameter

λ = (Cβ − δ)ᵀ[C(XᵀX)⁻¹Cᵀ]⁻¹(Cβ − δ) / (2σ²).

(How do we know C(XᵀX)⁻¹Cᵀ has an inverse?)


If the null hypothesis is true, then Cβ = δ and the quadratic form has a central χ² distribution.

Since the numerator depends (stochastically) only on b, and therefore is independent of s², under the null hypothesis the statistic

[(Cb − δ)ᵀ[C(XᵀX)⁻¹Cᵀ]⁻¹(Cb − δ)/r] / [SSRes/(n − p)]

has an F distribution with r and n − p degrees of freedom. We use this statistic to test the general linear hypothesis.
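In R the whole test takes a few lines. The helper below is a sketch of ours (glh_test is our name, not from the notes):

glh_test <- function(y, X, C, d) {
  n <- length(y); p <- ncol(X); r <- nrow(C)
  XtXinv <- solve(t(X) %*% X)
  b <- XtXinv %*% t(X) %*% y
  SSRes <- sum((y - X %*% b)^2)
  diff <- C %*% b - d
  Fstat <- drop(t(diff) %*% solve(C %*% XtXinv %*% t(C), diff) / r) /
    (SSRes / (n - p))
  c(F = Fstat, p.value = pf(Fstat, r, n - p, lower.tail = FALSE))
}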


To justify a one-tailed test, note that the expected value of the numerator is

E[(Cb − δ)ᵀ[C(XᵀX)⁻¹Cᵀ]⁻¹(Cb − δ)/r]
  = σ² + (1/r)(Cβ − δ)ᵀ[C(XᵀX)⁻¹Cᵀ]⁻¹(Cβ − δ),

where C(XᵀX)⁻¹Cᵀ is positive definite. (Why?)

If the null hypothesis is true, then the expectation is σ². However, if it is false, the expectation will be greater than σ². Therefore we reject H0 when the statistic is large.


Example: system cost

Consider the data processing system example we looked at earlier. Suppose we want to test the hypothesis H0: β = (2, 0, 0, 1)ᵀ (that is, C = I and δ = (2, 0, 0, 1)ᵀ).

The least squares estimate is

b = (XᵀX)⁻¹Xᵀy = (1.96, 0.12, 0.18, 0.8)ᵀ

so

b − δ = (−0.04, 0.12, 0.18, −0.2)ᵀ.

Our calculations proceed as follows (noting C = I):

(b − δ)ᵀXᵀX(b − δ) = 1110.18
SSRes = yᵀ[I − X(XᵀX)⁻¹Xᵀ]y = 668.63
p = 4
F4,7 = (1110.18/4) / (668.63/7) = 2.8.

The critical value at α = 0.05 is 4.12, so we cannot reject the null hypothesis. This doesn't mean that it is true, just that it is close!

Exercise: show that

(b − δ)ᵀXᵀX(b − δ) = (y − Xδ)ᵀ(y − Xδ) − (y − Xb)ᵀ(y − Xb).

That is, the left-hand side is the SSRes for the model under H0 minus the SSRes for the full model.

Suppose now we wish to test the hypothesis H0: Cβ = δ, where

C = [ 0  1  −1   0 ]      δ = [ 0 ]
    [ 0  0   1  −1 ],         [ 0 ].

Our least squares estimates are (still)

b = (XᵀX)⁻¹Xᵀy = (1.96, 0.12, 0.18, 0.8)ᵀ.


Therefore

Cb − δ = (−0.06, −0.62)ᵀ

C(XᵀX)⁻¹Cᵀ = [ 0.013    0.0024  ]
             [ 0.0024   0.00077 ]

(Cb − δ)ᵀ[C(XᵀX)⁻¹Cᵀ]⁻¹(Cb − δ) = 1138.35


Since SSRes was calculated earlier to be 668.63, our F statistic (with 2 and 7 degrees of freedom) is

(1138.35/2) / (668.63/7) = 5.79.

The corresponding p-value is 0.0328. Thus we can reject the null hypothesis at the 5% level, but not at the 1% level.

That is, there is evidence that the parameters β1, β2, and β3 are not identical, but not strong evidence.


Example: clover area

For the clover data, consider the null hypothesis H0: (β0, β1, β2) = (−1, 0.5, 1).

> bst <- as.vector(c(-1, 0.5, 1))
> ( Fstat <- ((t(b-bst) %*% t(X) %*% X %*% (b-bst))/p)/
+            (SSRes/(n-p)) )
         [,1]
[1,] 330.4352
> pf(Fstat, p, n-p, lower.tail=FALSE)
             [,1]
[1,] 5.661703e-62


> h0 <- X %*% bst
> basemodel <- lm(area ~ 0, data=clover, offset=h0)
> model <- lm(area ~ midrib + estim, data=clover)
> anova(basemodel, model)
Analysis of Variance Table

Model 1: area ~ 0
Model 2: area ~ midrib + estim
  Res.Df    RSS Df Sum of Sq      F    Pr(>F)
1    138 37.532
2    135  4.499  3    33.034 330.44 < 2.2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1


H0: β0 = −1, β1 = β2

> ( C <- matrix(c(1,0,0,1,0,-1),2,3) )
     [,1] [,2] [,3]
[1,]    1    0    0
[2,]    0    1   -1
> r <- 2  # rank C
> dst <- c(-1,0)
> ( Fstat <- (t(C %*% b - dst) %*%
+     solve(C %*% solve(t(X) %*% X) %*% t(C)) %*%
+     (C %*% b - dst)/r)/(SSRes/(n-p)) )
         [,1]
[1,] 18.87658
> pf(Fstat, 2, n-p, lower.tail=FALSE)
             [,1]
[1,] 5.905736e-08

Testing if part of β is 0

If we find that β ≠ 0, we cannot say which βi are nonzero, only that at least one is not.

If a particular βi is zero, then it is best to remove it from the model. Otherwise it will only serve to fit noise, and reduce the ability of the model to predict.

Thus, we need to find a way of testing whether parts of the parameter vector β are 0 or not.


We split the parameter vector as

β = (β0, ..., βr−1 | βr, ..., βk)ᵀ = (γ1 | γ2)ᵀ,

where γ1 = (β0, ..., βr−1)ᵀ and γ2 = (βr, ..., βk)ᵀ, and test the hypotheses

H0: γ1 = 0 vs. H1: γ1 ≠ 0.

By relabelling the indices, we can test the zero-ness of any subset of the parameters.


The important thing to note is that we are testing γ1 = 0 in the presence of the other parameters, not by itself.

In other words, we are comparing two models: in H1, the full model

y = Xβ + ε,

and in H0, the reduced model

y = X2γ2 + ε2

where X2 contains the last p − r columns of X = [X1|X2].


Let C = [Ir | 0] and δ = 0; then Cβ = δ iff γ1 = 0.

We define the regression sum of squares for γ1 in the presence of γ2 as

R(γ1|γ2) = (Cb − δ)ᵀ(C(XᵀX)⁻¹Cᵀ)⁻¹(Cb − δ) = γ̂1ᵀA11⁻¹γ̂1

where γ̂1 is the least squares estimator for γ1, and A11 is the r × r principal minor of (XᵀX)⁻¹.

Our test statistic is

[R(γ1|γ2)/r] / [SSRes/(n − p)].

Under the null hypothesis that γ1 = 0 this has an F distribution with r and n − p degrees of freedom, and we reject the null when it is too large.

Theorem
R(γ1|γ2) = R(β) − R(γ2),

where R(β) is the regression sum of squares for the full model

y = Xβ + ε = X1γ1 + X2γ2 + ε

and R(γ2) is the regression sum of squares for the reduced model

y = X2γ2 + ε.


We will content ourselves with showing


E 1 T A1
11 1 = E(R() R( 2 ))
= EyT [X (X T X )1 X T X2 (X2T X2 )1 X2T ]y

Lemma
Suppose that

A=

A11 A12
A21 A22

,A


=B =

B11 B12
B21 B22


,

1
and B22
exists. Then
1
A1
11 = B11 B12 B22 B21 .


We have

E yᵀ[X(XᵀX)⁻¹Xᵀ − X2(X2ᵀX2)⁻¹X2ᵀ]y
  = σ²[tr(X(XᵀX)⁻¹Xᵀ) − tr(X2(X2ᵀX2)⁻¹X2ᵀ)]
      + βᵀXᵀ[X(XᵀX)⁻¹Xᵀ − X2(X2ᵀX2)⁻¹X2ᵀ]Xβ
  = σ²(p − (p − r)) + βᵀ[XᵀX − XᵀX2(X2ᵀX2)⁻¹X2ᵀX]β
  = σ²r + βᵀ( [ X1ᵀX1  X1ᵀX2 ] − [ X1ᵀX2(X2ᵀX2)⁻¹X2ᵀX1  X1ᵀX2 ] ) β
            ( [ X2ᵀX1  X2ᵀX2 ]   [ X2ᵀX1                X2ᵀX2 ] )
  = σ²r + γ1ᵀ[X1ᵀX1 − X1ᵀX2(X2ᵀX2)⁻¹X2ᵀX1]γ1
  = σ² tr(A11⁻¹A11) + γ1ᵀA11⁻¹γ1
  = E[γ̂1ᵀA11⁻¹γ̂1].

Here we have applied the lemma with

A = (XᵀX)⁻¹,   B = XᵀX = [ X1ᵀX1  X1ᵀX2 ]
                         [ X2ᵀX1  X2ᵀX2 ].


We can again express the test calculations in an ANOVA table.

Source of                Sum of        degrees of    Mean             F
variation                squares       freedom       square           ratio
Regression
  Full model             R(β)          p
  Reduced model          R(γ2)         p − r
  γ1 in presence of γ2   R(γ1|γ2)      r             R(γ1|γ2)/r       [R(γ1|γ2)/r]/MSRes
Residual                 yᵀy − R(β)    n − p         SSRes/(n − p)
Total                    yᵀy           n
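By the theorem, R(γ1|γ2) can be computed as the difference of two ordinary regression sums of squares. A small helper sketch of ours (not from the notes):

R_given <- function(y, X, keep) {
  # keep = indices of the columns of X forming the reduced model (X2)
  ssreg <- function(M) drop(t(y) %*% M %*% solve(t(M) %*% M, t(M) %*% y))
  ssreg(X) - ssreg(X[, keep, drop = FALSE])   # R(beta) - R(gamma2)
}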


Example. Consider the data processing system example given above. We rejected the null hypothesis β = 0. But that is obvious, because the cost of a system is not going to have average 0.

The question we want to test is: does the cost depend on the files, flows or processes? In other words, is one of β1, β2, or β3 nonzero?

To do this, we re-arrange the parameter vector as

β = (β1, β2, β3 | β0)ᵀ = (γ1 | γ2)ᵀ.


To keep things in sync, we must rearrange the columns of X:

X = [ 4  44  18  1 ]
    [ 2  33  15  1 ]
    [ ⋮   ⋮   ⋮  ⋮ ]
    [ 5  48  17  1 ] = [X1 | X2].

We want to test H0: γ1 = 0 (the intercept alone is adequate) against H1: γ1 ≠ 0. The reduced model is

y = X2β0 + ε2.


The sum of squares of regression for the reduced model is

R(γ2) = yᵀX2(X2ᵀX2)⁻¹X2ᵀy
      = (X2ᵀy)ᵀ(n)⁻¹(X2ᵀy)
      = (1/11)(Σᵢ yᵢ)²
      = 21800.

From before,

R(β) = SSReg = 38978,   MSRes = 98,

so

R(γ1|γ2) = R(β) − R(γ2) = 38978 − 21800 = 17178.

Our F statistic is now

[R(γ1|γ2)/r] / [SSRes/(n − p)] = (17178/3)/98 = 58.2.

We check this against the F distribution with 3 and n − p = 7 degrees of freedom. The critical point for α = 0.01 is 8.45, so we can again say that H0 can be rejected.

In other words, the intercept alone does not explain the variation in the response variable adequately, and we are (reasonably) certain that we need at least one of the terms in the model.


Variation                SS       d.f.    MS      F
Regression
  Full                   38978    4
  Reduced                21800    1
  γ1 in presence of γ2   17178    3       5726    58.2
Residual                 689      7       98
Total                    39667    11


Corrected sum of squares

In general, we have the following ANOVA table for the test H0: β1 = ⋯ = βk = 0 versus the alternative that some βi ≠ 0, i ∈ {1, ..., k}.

Source of                Sum of          degrees of    Mean             F
variation                squares         freedom       square           ratio
Regression
  Full model             R(β) = yᵀHy     k + 1
  Reduced model          (Σᵢ yᵢ)²/n      1
  γ1 in presence of γ2   R(γ1|γ2)        k             R(γ1|γ2)/k       [R(γ1|γ2)/k]/MSRes
Residual                 yᵀy − R(β)      n − k − 1     SSRes/(n − p)
Total                    yᵀy             n

This ANOVA table is sometimes presented differently. Observe that

Σᵢ (yᵢ − ȳ)² = Σᵢ yᵢ² − (Σᵢ yᵢ)²/n = yᵀy − R(γ2).

This is called the corrected sum of squares, and R(γ2) the correction factor.


We break down the corrected sum of squares into R(γ1|γ2) and SSRes, and test using an F ratio. The end result is the same as before, but the table looks slightly different.

Source of     Sum of                 degrees of    Mean                 F
variation     squares                freedom       square               ratio
Regression    SSReg − (Σᵢ yᵢ)²/n     k             R(γ1|γ2)/k           [R(γ1|γ2)/k]/MSRes
Residual      SSRes                  n − k − 1     SSRes/(n − k − 1)
Total         yᵀy − (Σᵢ yᵢ)²/n       n − 1

Some computer outputs will use a corrected sum of squares layout instead of an uncorrected one, so you should be familiar with both.


Example. In the data processing example, we rejected the hypothesis that (β1 β2 β3)ᵀ = 0. The ANOVA table for a corrected sum of squares test is

Variation     SS       d.f.    MS      F
Regression    17178    3       5726    58.2
Residual      689      7       98
Total         17867    10

The actual test does not change: the F statistic and degrees of freedom are the same.


Clover example

Recall the clover example

> clover <- read.csv("../data/clover.csv")
> clover <- log(clover)
> clover <- clover[-c(6,23,47,68,97,111,140),]
> y <- clover$area
> X <- cbind(1, clover$midrib, clover$estim)
> b <- solve(t(X) %*% X, t(X) %*% y)
> n <- length(y)
> p <- dim(X)[2]


H0: β0 = 0
> X2 <- X[,-1]
> b2 <- solve(t(X2) %*% X2, t(X2) %*% y)
> (SSRes2 <- sum((y - X2 %*% b2)^2))
[1] 6.296183
> (Rg2 <- SS - SSRes2)
[1] 374.8902
> (Rg2 <- t(y) %*% X2 %*% solve(t(X2) %*% X2) %*% t(X2) %*% y)
[,1]
[1,] 374.8902
> (Rg1g2 <- as.vector(SSReg - Rg2))
[1] 1.79753


H0: β0 = 0

> r <- 1
> (Fstat <- (Rg1g2/r)/(SSRes/(n-p)))
[1] 53.94204
> pf(Fstat, r, n-p, lower.tail=FALSE)
[1] 1.761625e-11


H0: β0 = 0

> basemodel <- lm(area ~ 0 + midrib + estim, data=clover)
> anova(basemodel, model)
Analysis of Variance Table

Model 1: area ~ 0 + midrib + estim
Model 2: area ~ midrib + estim
  Res.Df    RSS Df Sum of Sq      F    Pr(>F)
1    136 6.2962
2    135 4.4987  1    1.7975 53.942 1.762e-11 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1


H0: β1 = 0

> X2 <- X[,-2]
> Rg2 <- t(y) %*% X2 %*% solve(t(X2) %*% X2) %*% t(X2) %*% y
> Rg2
         [,1]
[1,] 375.1498
> Rg1g2 <- SSReg - Rg2
> Rg1g2
        [,1]
[1,] 1.53792
> r <- 1


H0: β1 = 0

> Fstat <- (Rg1g2/r)/(SSRes/(n-p))
> Fstat
         [,1]
[1,] 46.15142
> pf(Fstat, r, n-p, lower.tail=FALSE)
             [,1]
[1,] 3.189909e-10


H0: β1 = 0

> basemodel <- lm(area ~ estim, data=clover)
> anova(basemodel, model)
Analysis of Variance Table

Model 1: area ~ estim
Model 2: area ~ midrib + estim
  Res.Df    RSS Df Sum of Sq      F   Pr(>F)
1    136 6.0366
2    135 4.4987  1    1.5379 46.151 3.19e-10 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1


H0: β2 = 0

> X2 <- X[,-3]
> Rg2 <- t(y) %*% X2 %*% solve(t(X2) %*% X2) %*% t(X2) %*% y
> Rg2
         [,1]
[1,] 373.5761
> Rg1g2 <- SSReg - Rg2
> Rg1g2
         [,1]
[1,] 3.111602
> r <- 1


H0: β2 = 0

> Fstat <- (Rg1g2/r)/(SSRes/(n-p))
> Fstat
         [,1]
[1,] 93.37601
> pf(Fstat, r, n-p, lower.tail=FALSE)
             [,1]
[1,] 4.114603e-17


H0: β2 = 0

> basemodel <- lm(area ~ midrib, data=clover)
> anova(basemodel, model)
Analysis of Variance Table

Model 1: area ~ midrib
Model 2: area ~ midrib + estim
  Res.Df    RSS Df Sum of Sq      F    Pr(>F)
1    136 7.6103
2    135 4.4987  1    3.1116 93.376 < 2.2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1


Corrected sum of squares, H0: β1 = β2 = 0

> X2 <- X[,1]
> Rg2 <- t(y) %*% X2 %*% solve(t(X2) %*% X2) %*% t(X2) %*% y
> Rg2
         [,1]
[1,] 311.9043
> Rg1g2 <- SSReg - Rg2
> Rg1g2
         [,1]
[1,] 64.78343
> r <- 2


H0: β1 = β2 = 0

> Fstat <- (Rg1g2/r)/(SSRes/(n-p))
> Fstat
         [,1]
[1,] 972.0423
> pf(Fstat, r, n-p, lower.tail=FALSE)
             [,1]
[1,] 6.936784e-81


H0: β1 = β2 = 0

> basemodel <- lm(area ~ 1, data=clover)
> anova(basemodel, model)
Analysis of Variance Table

Model 1: area ~ 1
Model 2: area ~ midrib + estim
  Res.Df    RSS Df Sum of Sq      F    Pr(>F)
1    137 69.282
2    135  4.499  2    64.783 972.04 < 2.2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1


> summary(model)

Call:
lm(formula = area ~ midrib + estim, data = clover)

Residuals:
     Min       1Q   Median       3Q      Max
-0.57603 -0.09824  0.01173  0.11355  0.51957

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) -1.58458    0.21575  -7.345 1.76e-11 ***
midrib       0.76731    0.11295   6.793 3.19e-10 ***
estim        0.62183    0.06435   9.663  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.1825 on 135 degrees of freedom
Multiple R-squared: 0.9351,  Adjusted R-squared: 0.9341
F-statistic: 972 on 2 and 135 DF,  p-value: < 2.2e-16


Sequential testing

Suppose we have a number of explanatory variables, and we would like a parsimonious model: that is, a model which explains the variation in the response y using a minimal number of explanatory variables. A parsimonious model is less likely to suffer from overfitting.

For such a model, if we were to test whether any parameter βi is 0, in the presence of the other model parameters, we should always reject the null.

How do we find such a minimal set of parameters?


Conceivably, with the help of a computer, we could test all the possible parameter sets to find the largest γ1 such that the hypothesis γ1 = 0 is not rejected.

The problem with this approach (apart from the time required) is that it can give inconsistent results. For example we might reject β1 = β2 = 0 given β3, but not reject β1 = 0 given β2 and β3, and also not reject β2 = 0 given β1 and β3.

This can happen when x1 and x2 are very strongly correlated, so that given one of them the other isn't needed, but you need to have at least one of them.


If we have p = k + 1 parameters β0, ..., βk we could consider p tests of the form H0: βi = 0, given that all the other parameters are in the model. Such tests are sometimes called partial tests.

The discussion above suggests that this could lead us to remove too many variables, because the partial tests are not independent.

In a partial test, acceptance or rejection of H0 does not mean that the parameter is useful or useless in the best model, just useful or useless in the full model.


To avoid the problem of dependence between partial tests we can consider a nested sequence of models.

That is, we can start with a simple model and sequentially add parameters until we reach a parsimonious model: that is, until adding parameters does not significantly improve the fit.

Alternatively we can start with a full model and sequentially remove parameters until we reach a parsimonious model: that is, until removing parameters significantly worsens the fit.


Consider the series of models (subject to relabelling)

y = β0 + ε(0)
y = β0 + β1x1 + ε(1)
⋮
y = β0 + β1x1 + ⋯ + βkxk + ε(k).

We denote the corresponding X matrices by X(j), which are the first j + 1 columns of X.

The regression sum of squares for each of these models is calculated in the usual way:

R(β0, β1, ..., βj) = yᵀX(j)((X(j))ᵀX(j))⁻¹(X(j))ᵀy.


Note that these are full regression sums of squares, i.e. we are looking at the total variation explained by the model in the presence of no other parameters.

Now by taking the difference between the sums of squares, we can get the extra variation explained as we add variables to the model one at a time:

R(β1|β0) = R(β0, β1) − R(β0)
R(β2|β0, β1) = R(β0, β1, β2) − R(β0, β1)
⋮
R(βk|β0, β1, ..., βk−1) = R(β) − R(β0, β1, ..., βk−1).
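These differences are easy to compute by looping over the nested designs. A sketch of ours (assuming y and X as set up in the clover example):

seq_ss <- function(y, X) {
  Rj <- sapply(seq_len(ncol(X)), function(j) {
    Xj <- X[, 1:j, drop = FALSE]
    drop(t(y) %*% Xj %*% solve(t(Xj) %*% Xj, t(Xj) %*% y))
  })
  diff(c(0, Rj))   # R(b0), R(b1|b0), R(b2|b0,b1), ...
}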


Theorem
Suppose y = Xβ + ε where X is full rank and ε ~ N(0, σ²I). Let Xj be the first j + 1 columns of X (the first column is all ones), and put

Hj = Xj(XjᵀXj)⁻¹Xjᵀ
Rj = yᵀHjy = R(β0, ..., βj)
R(βj|β0, ..., βj−1) = Rj − Rj−1 = yᵀ(Hj − Hj−1)y.

Then (supposing r(X) = p = k + 1)

yᵀy = SSRes + R(β)
    = SSRes + R(β0) + R(β1|β0) + R(β2|β0, β1) + ⋯ + R(βk|β0, β1, ..., βk−1)

and the quadratic forms on the right are all independent noncentral χ². SSRes has n − p d.f. and the rest 1 d.f. each.

Lemma
Suppose X = [X1|X2] is full rank, of size n × p with n ≥ p. Then X2 = X(XᵀX)⁻¹XᵀX2.

Lemma
For X as above, with X1 of size n × r and X2 of size n × (p − r), we have that

A2 := X(XᵀX)⁻¹Xᵀ − X2(X2ᵀX2)⁻¹X2ᵀ = H − H2

is symmetric and idempotent, of rank r.


Proof of lemmas


Proof of theorem


Each sequential regression sum of squares has 1 degree of freedom. Therefore under the hypothesis βj = 0, the test statistic

R(βj|β0, β1, ..., βj−1) / [SSRes/(n − p)]

has an F distribution with 1 and n − p degrees of freedom.

Note that this is still not entirely satisfactory, because the result will depend heavily on the order of the parameters considered. Different orderings can result in different sets of parameters being included in the final model.


Example. An experiment was conducted to study the size of squid. The response is the weight of the squid, and the predictors are

- x1: Beak length
- x2: Wing length
- x3: Beak to notch length
- x4: Notch to wing length
- x5: Width

A total of 22 squid are sampled.


The first thing we test is whether β = 0. The ANOVA table is

Variation     SS        d.f.    MS       F
Regression    595.16    6       99.19    200.47
Residual      7.92      16      0.49
Total         603.08    22

The null hypothesis β = 0 is rejected strongly (at α = 0.01 the critical value is 4.20).


Next we test to see which parameters should be in the model. The sequential sums of squares are:

R(β0) = 387.16
R(β1|β0) = 199.15
R(β2|β0, β1) = 0.127
R(β3|β0, β1, β2) = 4.12
R(β4|β0, β1, β2, β3) = 0.263
R(β5|β0, β1, β2, β3, β4) = 4.35

Note that these sum to the regression sum of squares for the full model, 595.16.


Each of these sums of squares should be compared against the critical F value with 1 and n − p degrees of freedom, multiplied by SSRes/(n − p). With α = 0.05, this is

(7.92/(22 − 6)) × 4.49 = 2.22.

So starting with a model with no parameters, we should definitely add β0 and then β1, but not β2.

The subsequent tests are harder to interpret. For example, if β0, β1, β2, and β3 are in the model, we should not add β4. But β2 is not in the model!

The tests for β3, β4 and β5 need to be repeated, supposing only that β0 and β1 are in the model.


Note that we use the SSRes (and residual degrees of freedom) of the full model in the denominator of our F statistics.

This is because we cannot assume that variables that are not in the model are irrelevant. This means that the SSRes of a reduced model may be disproportionately large, and more importantly may not conform to our distributional assumptions.

The only way to be safe about this is to use the SSRes of the full model, even if it means losing a few degrees of freedom to truly irrelevant variables.

Note: R does not do this! To test for βi in the presence of β0, ..., βi−1 it uses the residual sum of squares from the model using β0, ..., βi. This still gives a valid test, though in general not as powerful as the test using SSRes from the full model.

Forward selection

Forward selection starts off with an empty model, and adds variables which are deemed to be significant. Significance is measured in relation to the current model, so all tests are conducted in the presence of the already included parameters, but not the other parameters.

When no variables are significant enough to add, we stop and take the current model as the final model.


1. Start off with an empty model.
2. Calculate the F-values for the tests H0: βi = 0, for all parameters not in the model, in the presence of the parameters already in the model.
3. If none of the tests are significant (we do not reject any null hypotheses), then stop.
4. Otherwise add the most significant parameter (i.e. the parameter with the largest F-value).
5. Return to step 2.


Backward elimination

A method which is conceptually very similar to forward selection is backward elimination:

1. Start off with the full model.
2. Calculate the F-values for the tests H0: βi = 0, for all parameters in the model, in the presence of the other parameters in the model.
3. If all of the tests are significant (we reject all null hypotheses), then stop.
4. Otherwise, remove the least significant parameter (i.e. the parameter with the smallest F-value).
5. Return to step 2.


Backward elimination is complementary to forward selection: it starts from the full model and removes the least important variable at each step, until all remaining variables are important.

Forward selection and backward elimination are easy to understand and to apply, but do not always produce optimal results. One reason is the inability to remove an already added variable (or to add an already removed variable). This inflexibility is often limiting.


Example: forward selection

We model the hardening of cement.

> heat <- read.csv("../data/heat.csv")
> str(heat)
'data.frame': 13 obs. of 5 variables:
 $ x1: int 7 1 11 11 7 11 3 1 2 21 ...
 $ x2: int 26 29 56 31 52 55 71 31 54 47 ...
 $ x3: int 6 15 8 8 6 9 17 22 18 4 ...
 $ x4: int 60 52 20 47 33 22 6 44 22 26 ...
 $ y : num 78.5 74.3 104.3 87.6 95.9 ...
> basemodel <- lm(y ~ 1, data=heat)


> pairs(heat)

[Figure: scatterplot matrix of x1, x2, x3, x4 and y.]


> add1(basemodel, scope= ~ . + x1 + x2 + x3 + x4, test="F")
Single term additions

Model:
y ~ 1
       Df Sum of Sq     RSS    AIC F value    Pr(>F)
<none>              2715.76 71.444
x1      1   1450.08 1265.69 63.519 12.6025 0.0045520 **
x2      1   1809.43  906.34 59.178 21.9606 0.0006648 ***
x3      1    776.36 1939.40 69.067  4.4034 0.0597623 .
x4      1   1831.90  883.87 58.852 22.7985 0.0005762 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

> model2 <- lm(y ~ x4, data=heat)


> add1(model2, scope= ~ . + x1 + x2 + x3, test="F")
Single term additions

Model:
y ~ x4
       Df Sum of Sq    RSS    AIC  F value    Pr(>F)
<none>              883.87 58.852
x1      1    809.10  74.76 28.742 108.2239 1.105e-06 ***
x2      1     14.99 868.88 60.629   0.1725    0.6867
x3      1    708.13 175.74 39.853  40.2946 8.375e-05 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

> model3 <- lm(y ~ x1 + x4, data=heat)


> add1(model3, scope= ~ . + x2 + x3, test="F")
Single term additions

Model:
y ~ x1 + x4
       Df Sum of Sq    RSS    AIC F value  Pr(>F)
<none>              74.762 28.742
x2      1    26.789 47.973 24.974  5.0259 0.05169 .
x3      1    23.926 50.836 25.728  4.2358 0.06969 .
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

We use the model with variables x1 and x4.


Backward elimination

> fullmodel <- lm(y ~ x1+x2+x3+x4, data=heat)
> drop1(fullmodel, scope= ~ ., test="F")
Single term deletions

Model:
y ~ x1 + x2 + x3 + x4
       Df Sum of Sq    RSS    AIC F value  Pr(>F)
<none>              47.864 26.944
x1      1   25.9509 73.815 30.576  4.3375 0.07082 .
x2      1    2.9725 50.836 25.728  0.4968 0.50090
x3      1    0.1091 47.973 24.974  0.0182 0.89592
x4      1    0.2470 48.111 25.011  0.0413 0.84407
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

> model2 <- lm(y ~ x1+x2+x4, data=heat)


> drop1(model2, scope= ~ ., test="F")
Single term deletions

Model:
y ~ x1 + x2 + x4
       Df Sum of Sq    RSS    AIC  F value    Pr(>F)
<none>               47.97 24.974
x1      1    820.91 868.88 60.629 154.0076 5.781e-07 ***
x2      1     26.79  74.76 28.742   5.0259   0.05169 .
x4      1      9.93  57.90 25.420   1.8633   0.20540
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

> model3 <- lm(y ~ x1 + x2, data=heat)


> drop1(model3, scope = ~ ., test="F")
Single term deletions

Model:
y ~ x1 + x2
       Df Sum of Sq     RSS    AIC F value    Pr(>F)
<none>                57.90 25.420
x1      1    848.43  906.34 59.178  146.52 2.692e-07 ***
x2      1   1207.78 1265.69 63.519  208.58 5.029e-08 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

We use the model with variables x1 and x2.


Stepwise selection

Stepwise selection functions similarly to forward or backward selection, but with the possibility of either adding or eliminating a variable at each step. We give a procedure using a goodness-of-fit measure called the AIC, though it is trivial to adjust it to use any other goodness-of-fit statistic.

1. Start with any model.
2. Compute the AIC of all models which have either one extra variable or one less variable than the current model.
3. If the AIC of all such models is more than the AIC of the current model, stop.
4. Otherwise, change to the model with the lowest AIC.
5. Return to step 2.


Stepwise selection is generally better than forward or backward selection, because it avoids the problem that an already added variable can never be removed (or the opposite).

However, the final model does often depend on the starting model, so stepwise selection does not necessarily find a global minimum for the AIC (or whichever goodness-of-fit criterion you use). Instead it finds a local minimum.

For small numbers of variables it is possible to find a global minimum through an exhaustive search of all possible combinations. However, as the number of variables increases, this can take too long.


Space of models (hypercube)

[Figure: the space of models drawn as a hypercube.]


Goodness-of-fit measures
The F test is used to compare nested models, that is, it requires
the variable set of one model to be fully contained in the variable
set of the other model. Thus we cannot use an F test to compare
models which, for example, have replaced one variable with
another variable.
Also, use of the F test requires the somewhat arbitrary choice of a
significance level.
To overcome these problems many authors have proposed
goodness-of-fit measures, which try to give a measure of how good
a model is, independently of other models (though still dependent
on the data in question).


Residual sum of squares

The residual sum of squares, SSRes, is not a good goodness-of-fit measure. Although it measures how well the model fits the (training) data, it does not take into account model complexity, and thus cannot prevent overfitting.

One way to get around this is to simply use s² as a goodness-of-fit statistic. Whenever we add a variable to the model, SSRes always decreases. However, the degrees of freedom n − p also decreases, so s² will decrease only if the variable is good (in a sense).

Unfortunately, in practice using s² for model fitting does not discourage overfitting enough.


R²

A commonly reported goodness-of-fit statistic is the proportion of (corrected) total sums of squares that is explained by the model:

R² = 1 − SSRes / (SSTotal − (Σᵢ yᵢ)²/n).

R² lies between 0 and 1, and the larger it is, the more variation in y is explained by the model. (We are assuming that β0 is always in the model.)

However R² can never decrease when we add a variable to a model, as even an irrelevant variable will explain a small extra amount of variation. We would like to remove irrelevant variables, so, like the SSRes, R² is not appropriate for model selection.

Adjusted R²

The adjusted R² tries to account for model complexity, by introducing a penalty based on the number of parameters in the model:

adj R² = 1 − [(n − 1)/(n − 1 − k)] (1 − R²).

Here we are assuming that β0 is in the model, and k is the number of other parameters in the model.

The adjusted R² is better for model selection than s², but there are other more sophisticated goodness-of-fit measures that we can use, such as the AIC, BIC or Mallows' Cp statistic.
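As a quick check of the formulas (a sketch of ours, assuming y, X, n, p and SSRes from the clover example): summary(model) reported R² = 0.9351 and adjusted R² = 0.9341.

k <- p - 1                          # predictors, excluding beta0
SSTot_c <- sum(y^2) - sum(y)^2 / n  # corrected total sum of squares
R2 <- 1 - SSRes / SSTot_c
adjR2 <- 1 - (n - 1) / (n - 1 - k) * (1 - R2)
c(R2 = R2, adjR2 = adjR2)           # approximately (0.9351, 0.9341)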


AIC

A very popular goodness-of-fit statistic is Akaike's information criterion, or AIC. This is based on the likelihood of the observed values of the response:

AIC = −2 ln(likelihood) + 2p
    = n ln(SSRes/n) + 2p + const.

(Here the likelihood is the maximised likelihood.) A smaller value of AIC indicates a better model.

The form of the AIC can be justified using information theory.
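For lm fits, R's extractAIC() uses n log(SSRes/n) + 2(edf), dropping the constant, so it should match the formula above. A quick sketch of ours (assuming the clover data frame from earlier):

fit <- lm(area ~ midrib + estim, data = clover)
rss <- sum(resid(fit)^2)
n <- nrow(clover)
c(by.hand = n * log(rss / n) + 2 * 3,   # 3 parameters
  extractAIC = extractAIC(fit)[2])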


Note that any goodness-of-fit statistic should only be used to compare various models for the same data. For none of them is there an absolute measure of how good a model is.


Example: stepwise selection

Recall our cement hardening example

> heat <- read.csv("../data/heat.csv")
> str(heat)
'data.frame': 13 obs. of 5 variables:
 $ x1: int 7 1 11 11 7 11 3 1 2 21 ...
 $ x2: int 26 29 56 31 52 55 71 31 54 47 ...
 $ x3: int 6 15 8 8 6 9 17 22 18 4 ...
 $ x4: int 60 52 20 47 33 22 6 44 22 26 ...
 $ y : num 78.5 74.3 104.3 87.6 95.9 ...
> basemodel <- lm(y ~ 1, data=heat)


> model2 <- step(basemodel, scope=~.+x1+x2+x3+x4,
+                steps=1)
Start: AIC=71.44
y ~ 1

       Df Sum of Sq     RSS    AIC
+ x4    1   1831.90  883.87 58.852
+ x2    1   1809.43  906.34 59.178
+ x1    1   1450.08 1265.69 63.519
+ x3    1    776.36 1939.40 69.067
<none>              2715.76 71.444

Step: AIC=58.85
y ~ x4


> model3 <- step(model2, scope=~.+x1+x2+x3, steps=1)
Start: AIC=58.85
y ~ x4

       Df Sum of Sq     RSS    AIC
+ x1    1    809.10   74.76 28.742
+ x3    1    708.13  175.74 39.853
<none>               883.87 58.852
+ x2    1     14.99  868.88 60.629
- x4    1   1831.90 2715.76 71.444

Step: AIC=28.74
y ~ x4 + x1


> model4 <- step(model3, scope=~.+x2+x3, steps=1)
Start: AIC=28.74
y ~ x4 + x1

       Df Sum of Sq     RSS    AIC
+ x2    1     26.79   47.97 24.974
+ x3    1     23.93   50.84 25.728
<none>                74.76 28.742
- x1    1    809.10  883.87 58.852
- x4    1   1190.92 1265.69 63.519

Step: AIC=24.97
y ~ x4 + x1 + x2


> step(model4, scope=~.+x3)
Start: AIC=24.97
y ~ x4 + x1 + x2

       Df Sum of Sq    RSS    AIC
<none>              47.97 24.974
- x4    1      9.93  57.90 25.420
+ x3    1      0.11  47.86 26.944
- x2    1     26.79  74.76 28.742
- x1    1    820.91 868.88 60.629

Call:
lm(formula = y ~ x4 + x1 + x2, data = heat)

Coefficients:
(Intercept)           x4           x1           x2
    71.6483      -0.2365       1.4519       0.4161

> model2 <- step(fullmodel, scope=~., steps=1)
Start: AIC=26.94
y ~ x1 + x2 + x3 + x4

       Df Sum of Sq    RSS    AIC
- x3    1    0.1091 47.973 24.974
- x4    1    0.2470 48.111 25.011
- x2    1    2.9725 50.836 25.728
<none>              47.864 26.944
- x1    1   25.9509 73.815 30.576

Step: AIC=24.97
y ~ x1 + x2 + x4


> step(model2, scope=~.+x3)
Start: AIC=24.97
y ~ x1 + x2 + x4

       Df Sum of Sq    RSS    AIC
<none>              47.97 24.974
- x4    1      9.93  57.90 25.420
+ x3    1      0.11  47.86 26.944
- x2    1     26.79  74.76 28.742
- x1    1    820.91 868.88 60.629

Call:
lm(formula = y ~ x1 + x2 + x4, data = heat)

Coefficients:
(Intercept)           x1           x2           x4
    71.6483       1.4519       0.4161      -0.2365

t tests

We can also use a t test for a partial test of one parameter; that is, to test H0: βi = 0 against H1: βi ≠ 0 in the presence of all the other parameters.

Recall our confidence interval for βi:

bi ± t(α/2) s √cii

where cii is the (i, i)-th entry of (XᵀX)⁻¹, and we use a t distribution with n − p degrees of freedom. If this confidence interval includes 0, we do not reject H0; otherwise, we can reject it.


In other words, we use the t statistic (with n − p degrees of freedom)

bi / (s √cii).

Let us compare this with our existing partial F test. The statistic we use for this is

R(βi|β0, β1, ..., βi−1, βi+1, ..., βk) / [SSRes/(n − p)].

The denominator is of course s².


We saw previously that the numerator is

R(βi|β0, β1, ..., βi−1, βi+1, ..., βk) = γ̂1ᵀA11⁻¹γ̂1

where γ̂1 = bi, and A11 is the top left element of (XᵀX)⁻¹ after the columns have been re-arranged so that the i-th column comes first. In other words, A11 = cii and

R(βi|β0, β1, ..., βi−1, βi+1, ..., βk) = bi(cii)⁻¹bi = bi²/cii.

So the statistic (using an F distribution with 1 and n − p degrees of freedom) is

bi² / (cii s²).

This is exactly the square of the t statistic!
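A quick check of ours (not from the notes) for the clover model: the squared t value for midrib in summary(model) should equal the partial F statistic computed earlier for H0: β1 = 0.

tval <- summary(model)$coefficients["midrib", "t value"]
tval^2   # approximately 46.15, matching the partial F statistic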

This is actually not too surprising. A t variable can be expressed as a normal variable divided by the square root of a (scaled) χ² variable, and therefore when we square it we get the square of a normal variable divided by a χ² variable. But the square of a normal variable is a χ² variable with 1 d.f.

Therefore the square of a t variable with n d.f. is an F variable with 1 and n d.f.

This means that the t test and the F test are (nearly) identical; the t test is actually slightly more useful, because it also gives an indication of the sign of the parameter.


Example. In the estimation section, we modelled the amount of a chemical which dissolves in water, when held at a certain temperature. We found that the 95% confidence interval for β1 was

0.31 ± 2.78 × 0.86 × √0.00057 = [0.25, 0.36].

A t test would use the statistic

b1 / (s √c11) = 0.31 / (0.86 √0.00057) = 14.89

which, using a t distribution with n − p = 6 − 2 = 4 degrees of freedom, would reject the hypothesis β1 = 0 at the 0.05 level (critical value 2.78). We can also say that β1 is almost certainly positive.


On the other hand, if we use an F test then we find that

R(β) = yᵀX(XᵀX)⁻¹Xᵀy = 663.77
R(β0) = yᵀX1(X1ᵀX1)⁻¹X1ᵀy = 498.68
R(β1|β0) = 663.77 − 498.68 = 165.09

and the F statistic is

R(β1|β0)/s² = 165.09/0.74 = 221.7 = 14.89².


Based on the F distribution with 1 and 4 degrees of freedom, the critical value is 7.71 = 2.77². So we can again reject the null hypothesis that β1 = 0.

Variation                SS        d.f.    MS        F
Regression
  Full                   663.77    2
  Reduced                498.68    1
  β1 in presence of β0   165.09    1       165.09    221.7
Residual                 2.98      4       0.74
Total                    666.75    6


Shrinkage

Not all selection procedures employ sequential addition and/or deletion of variables. Some take a more holistic approach.

A common approach is to try to shrink all the fitted parameters toward 0, so that the irrelevant variables have very little effect on the model. Some of the fitted parameters might actually become 0, and the associated variables can then be removed.


For example, ridge regression uses a penalized least squares approach. Instead of simply minimising the residual sum of squares, we include a penalty term which is proportional to the sum of squares of the parameters. We choose b to minimise

Σᵢ eᵢ² + λ Σⱼ bⱼ².

The term λ controls the amount of shrinkage of the parameters. The greater it is, the more the parameters are shrunk. The penalized least squares estimators can be calculated to be

b = (XᵀX + λI)⁻¹Xᵀy.
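The closed form is one line of R. A minimal sketch of ours (note that in practice the intercept is usually left unpenalized and the predictors standardized first):

ridge <- function(y, X, lambda)
  solve(t(X) %*% X + lambda * diag(ncol(X)), t(X) %*% y)
ridge(y, X, lambda = 1)   # lambda = 0 recovers ordinary least squares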


Another approach is the LASSO, which minimises

Σᵢ eᵢ² + λ Σⱼ |bⱼ|.

The LASSO actually shrinks small parameters to 0, and can be used for variable selection by removing those variables.

Choosing an appropriate shrinking parameter λ is quite involved. A common method is cross-validation, which estimates the predictive power of the model.
