
MTH5120 Statistical Modelling I, 2015/2016

WEEK 2

Outline

- Assessing the Model
  - Analysis of Variance
  - F test
  - Estimating σ²
  - Coefficient of Determination
  - Minitab Example
- Residuals
  - Crude Residuals
  - Standardized/Studentized Residuals
- Residuals Diagnostics
- Inference about the regression parameters
  - Example: Overheads

Assessing the Model
Analysis of Variance

Parameter estimates obtained for the model

$$Y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$$

can be used to estimate the mean response corresponding to each variable $Y_i$. That is,

$$\widehat{E(Y_i)} = \hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i, \quad i = 1, \ldots, n.$$

Values of $\widehat{E(Y_i)}$ for a given data set $(x_i, y_i)$ are called fitted values and are denoted by $\hat{y}_i$.

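Computationally, the estimates and fitted values follow directly from the usual least squares formulas $\hat{\beta}_1 = S_{xy}/S_{xx}$ and $\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{x}$. A minimal sketch in Python (NumPy), on made-up illustrative data rather than any course data set:

import numpy as np

# Hypothetical illustrative data (not the course data)
x = np.array([2.0, 4.0, 6.0, 8.0, 10.0, 12.0])
y = np.array([1.1, 2.0, 2.7, 3.9, 4.8, 5.9])

Sxx = np.sum((x - x.mean()) ** 2)
Sxy = np.sum((x - x.mean()) * (y - y.mean()))

beta1_hat = Sxy / Sxx                        # slope estimate
beta0_hat = y.mean() - beta1_hat * x.mean()  # intercept estimate

y_hat = beta0_hat + beta1_hat * x            # fitted values on the line
print(beta0_hat, beta1_hat, y_hat)
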
Assessing the Model
Analysis of Variance

- They are points on the fitted regression line corresponding to the values $x_i$.
- The observed values $y_i$ usually do not fall exactly on the line and so are usually not equal to the fitted values $\hat{y}_i$, as is shown in the figure below.

[Figure: scatter plot of $y$ against $x$ with the fitted regression line.]

Assessing the Model
Analysis of Variance

The residuals (also called crude residuals) are defined as

$$e_i := Y_i - \hat{Y}_i, \quad i = 1, \ldots, n.$$

These are estimators of the random errors $\varepsilon_i$. Thus

$$e_i = Y_i - (\hat{\beta}_0 + \hat{\beta}_1 x_i) = Y_i - (\bar{Y} - \hat{\beta}_1 \bar{x} + \hat{\beta}_1 x_i) = Y_i - \bar{Y} - \hat{\beta}_1 (x_i - \bar{x}).$$

We shall use the following identity:

$$\sum_{i=1}^n e_i = \sum_{i=1}^n (Y_i - \bar{Y}) - \hat{\beta}_1 \sum_{i=1}^n (x_i - \bar{x}) = 0.$$

Assessing the Model
Analysis of Variance

- Note that the estimators $\hat{\beta}_0$ and $\hat{\beta}_1$ minimize the function $S(\beta_0, \beta_1)$.
- The minimum is called the Residual Sum of Squares and is denoted by $SS_E$.

That is,

$$SS_E = \sum_{i=1}^n [Y_i - (\hat{\beta}_0 + \hat{\beta}_1 x_i)]^2 = \sum_{i=1}^n (Y_i - \hat{Y}_i)^2 = \sum_{i=1}^n e_i^2.$$

Assessing the Model
Analysis of Variance

Consider the constant model

$$Y_i = \beta_0 + \varepsilon_i.$$

[Figure: scatter plot of $Y$ against $X$ with the fitted constant (horizontal) line.]

Assessing the Model
Analysis of Variance

For this model $\hat{\beta}_0 = \bar{Y}$ and we have

$$\hat{Y}_i = \bar{Y}, \qquad e_i = Y_i - \hat{Y}_i = Y_i - \bar{Y}$$

and

$$SS_E = SS_T = \sum_{i=1}^n (Y_i - \bar{Y})^2.$$

This sum is called the Total Sum of Squares and is denoted by $SS_T$. For the constant model $SS_E = SS_T$.

Assessing the Model
Analysis of Variance

When the model is not constant, the difference $Y_i - \bar{Y}$ can be split into two components: one due to the regression model fit and one due to the residuals. That is,

$$Y_i - \bar{Y} = (Y_i - \hat{Y}_i) + (\hat{Y}_i - \bar{Y}).$$

For a given data set this can be represented as follows.

[Figure: scatter plot with the fitted line, showing the observed $y_{(14)}$ and the fitted $\hat{y}_{(14)}$ at $x = 14$ and the split of the deviation from $\bar{y}$ into its two components.]

Assessing the Model
Analysis of Variance Identity

Theorem
In the simple linear regression model the total sum of squares is the sum of the regression sum of squares and the residual sum of squares, that is,

$$SS_T = SS_R + SS_E,$$

where

$$SS_T = \sum_{i=1}^n (Y_i - \bar{Y})^2, \qquad SS_R = \sum_{i=1}^n (\hat{Y}_i - \bar{Y})^2, \qquad SS_E = \sum_{i=1}^n (Y_i - \hat{Y}_i)^2.$$

Assessing the Model
Analysis of Variance Identity

Proof

$$SS_T = \sum_{i=1}^n (Y_i - \bar{Y})^2 = \sum_{i=1}^n [(Y_i - \hat{Y}_i) + (\hat{Y}_i - \bar{Y})]^2$$

$$= \sum_{i=1}^n [(Y_i - \hat{Y}_i)^2 + (\hat{Y}_i - \bar{Y})^2 + 2(Y_i - \hat{Y}_i)(\hat{Y}_i - \bar{Y})] = SS_E + SS_R + 2A,$$

where

Assessing the Model
Analysis of Variance Identity

$$A = \sum_{i=1}^n (Y_i - \hat{Y}_i)(\hat{Y}_i - \bar{Y}) = \sum_{i=1}^n (Y_i - \hat{Y}_i)\hat{Y}_i - \bar{Y}\sum_{i=1}^n (Y_i - \hat{Y}_i)$$

$$= \sum_{i=1}^n e_i \hat{Y}_i - \bar{Y}\underbrace{\sum_{i=1}^n e_i}_{=0} = \sum_{i=1}^n e_i(\hat{\beta}_0 + \hat{\beta}_1 x_i) = \hat{\beta}_0\underbrace{\sum_{i=1}^n e_i}_{=0} + \hat{\beta}_1\underbrace{\sum_{i=1}^n e_i x_i}_{=0}.$$

Hence $A = 0$. ∎

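The identity (and the fact that the residuals sum to zero) is easy to verify numerically. A minimal Python check on hypothetical data; any least squares fit satisfies it:

import numpy as np

x = np.array([2.0, 4.0, 6.0, 8.0, 10.0, 12.0])   # hypothetical data
y = np.array([1.1, 2.0, 2.7, 3.9, 4.8, 5.9])

b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
e = y - (b0 + b1 * x)                            # crude residuals

SST = np.sum((y - y.mean()) ** 2)
SSR = np.sum((b0 + b1 * x - y.mean()) ** 2)
SSE = np.sum(e ** 2)

print(np.isclose(SST, SSR + SSE))  # True: the ANOVA identity
print(np.isclose(np.sum(e), 0.0))  # True: residuals sum to zero
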
Assessing the Model
Analysis of Variance

For a given data set:

- $SS_R$ represents the variability in the observations $Y_i$ accounted for by the fitted model.
- $SS_E$ represents the variability in $Y_i$ accounted for by the differences between the observations and the fitted values.
- $SS_T$ represents the total variability in $Y_i$.

Assessing the Model
ANOVA Table

The split of the sources of variability is customarily presented in a table called the ANOVA table.

ANOVA table
Source of variation   d.f.         SS     MS                  VR
Regression            ν_R = 1      SS_R   MS_R = SS_R / ν_R   MS_R / MS_E
Residual              ν_E = n − 2  SS_E   MS_E = SS_E / ν_E
Total                 ν_T = n − 1  SS_T

The table shows the sources of variation, the sums of squares and the statistic, based on the sums of squares, for testing the significance of the regression slope.

Assessing the Model
ANOVA Table

The abbreviation d.f. is short for degrees of freedom, the number of independent pieces of information used for the estimation of each of the sums of squares.

The Mean Squares (MS) are measures of the average variation in each source.

The Variance Ratio

$$VR = \frac{MS_R}{MS_E}$$

measures the variation explained by the model fit relative to the variation due to the residuals.

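The table entries are straightforward to compute from a fit. A sketch in Python (NumPy/SciPy); the function name anova_table is just for illustration:

import numpy as np
from scipy import stats

def anova_table(x, y):
    """Return the ANOVA entries for a simple linear regression fit."""
    n = len(x)
    Sxx = np.sum((x - x.mean()) ** 2)
    b1 = np.sum((x - x.mean()) * (y - y.mean())) / Sxx
    b0 = y.mean() - b1 * x.mean()
    fitted = b0 + b1 * x
    SSR = np.sum((fitted - y.mean()) ** 2)   # regression SS, 1 d.f.
    SSE = np.sum((y - fitted) ** 2)          # residual SS, n - 2 d.f.
    MSR, MSE = SSR / 1.0, SSE / (n - 2)
    VR = MSR / MSE                           # the variance ratio
    p = stats.f.sf(VR, 1, n - 2)             # upper-tail F probability
    return {"SSR": SSR, "SSE": SSE, "SST": SSR + SSE,
            "MSR": MSR, "MSE": MSE, "VR": VR, "p": p}
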
Assessing the Model
F-test

The mean squares are functions of the random variables $Y_i$ and so is their ratio. We denote it by $F$. We will see later that if $\beta_1 = 0$, then

$$F = \frac{MS_R}{MS_E} \sim F_{1,n-2}.$$

Thus, to test the null hypothesis

$$H_0: \beta_1 = 0$$

versus the alternative

$$H_1: \beta_1 \neq 0,$$

we use the variance ratio $F$ as the test statistic. Under $H_0$ the ratio has an $F$ distribution with 1 and $n - 2$ degrees of freedom.

Assessing the Model
F-test

We reject $H_0$ at a significance level $\alpha$ if

$$F_{cal} > F_{\alpha;1,n-2},$$

where $F_{cal}$ denotes the value of the variance ratio $F$ calculated for a given data set and $F_{\alpha;1,n-2}$ is such that

$$P(F > F_{\alpha;1,n-2}) = \alpha.$$

There is no evidence to reject $H_0$ if $F_{cal} < F_{\alpha;1,n-2}$.

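Statistical software gives both the critical value and the p-value. A sketch with SciPy, using the F value and sample size from the overheads example later in these notes:

from scipy import stats

F_cal, n, alpha = 23.46, 16, 0.05          # overheads example values
F_crit = stats.f.ppf(1 - alpha, 1, n - 2)  # F_{alpha;1,n-2}
p_value = stats.f.sf(F_cal, 1, n - 2)      # P(F > F_cal) under H0

print(F_crit, F_cal > F_crit)  # reject H0 at level alpha if True
print(p_value)
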
Assessing the Model
F-test

Rejecting $H_0$ means that the slope $\beta_1 \neq 0$ and the full regression model

$$Y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$$

is better than the constant model

$$Y_i = \beta_0 + \varepsilon_i.$$

Assessing the Model
Estimating σ²

Theorem
In the full simple linear regression model we have

$$E(SS_E) = (n - 2)\sigma^2.$$

From the theorem we obtain

$$E(MS_E) = E\left(\frac{1}{n-2} SS_E\right) = \sigma^2,$$

so $MS_E$ is an unbiased estimator of $\sigma^2$. It is often denoted by $S^2$.

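This unbiasedness is easy to illustrate by simulation: generate many samples from a model with known $\sigma^2$ and average the resulting values of $MS_E$. A sketch with assumed (arbitrary) true parameter values:

import numpy as np

rng = np.random.default_rng(0)
n, beta0, beta1, sigma = 30, 2.0, 0.5, 1.5   # assumed true values
x = np.linspace(0.0, 10.0, n)

mse = []
for _ in range(10000):
    y = beta0 + beta1 * x + rng.normal(0.0, sigma, n)
    b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    b0 = y.mean() - b1 * x.mean()
    sse = np.sum((y - b0 - b1 * x) ** 2)
    mse.append(sse / (n - 2))

print(np.mean(mse), sigma ** 2)   # the average is close to sigma^2 = 2.25
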
Assessing the Model
Estimating σ²

Notice that in the full model $S^2$ is not the sample variance. We have

$$S^2 = MS_E = \frac{1}{n-2} \sum_{i=1}^n \left(Y_i - \widehat{E(Y_i)}\right)^2, \quad \text{where } \widehat{E(Y_i)} = \hat{\beta}_0 + \hat{\beta}_1 x_i.$$

It is the sample variance in the constant (null) model, where $\widehat{E(Y_i)} = \hat{\beta}_0 = \bar{Y}$ and $\nu_E = n - 1$. Then

$$S^2 = \frac{1}{n-1} \sum_{i=1}^n (Y_i - \bar{Y})^2.$$

Assessing the Model
Coefficient of Determination R²

$R^2$ is the percentage of the total variation in the data explained by the fitted model. That is,

$$R^2 = \frac{SS_R}{SS_T} \cdot 100\% = \frac{SS_T - SS_E}{SS_T} \cdot 100\% = \left(1 - \frac{SS_E}{SS_T}\right) \cdot 100\%.$$

A computational sketch is given after the notes below.

Note:
- $R^2 \in [0, 100]$.
- $R^2 = 0$ indicates that none of the variability in the response is explained by the regression model.
- $R^2 = 100$ indicates that $SS_E = 0$ and all observations fall on the fitted line exactly.
- A small value of $R^2$ does not always imply a poor relationship between $Y$ and $X$, which may, for example, follow another model.

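A direct computation of $R^2$ from a fit, sketched in Python (the helper name is illustrative):

import numpy as np

def r_squared_percent(y, fitted):
    """R^2 as a percentage: 100 * (1 - SSE/SST)."""
    SSE = np.sum((y - fitted) ** 2)
    SST = np.sum((y - y.mean()) ** 2)
    return 100.0 * (1.0 - SSE / SST)

# Usage: r_squared_percent(y, beta0_hat + beta1_hat * x) for a fitted model.
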
Assessing the Model
Example: Sparrows' Wings continued

[Figure: fitted line plot of wing length against age for the sparrows' wings data.]

Assessing the Model
Example: Sparrows' Wings continued

The regression equation is
yi = 0.550 + 0.303 xi

S = 0.119189   R-Sq = 99.1%   R-Sq(adj) = 99.1%

Analysis of Variance
Source          DF      SS      MS        F     P
Regression       1  82.216  82.216  5787.39  0.000
Residual Error  53   0.753   0.014
Total           54  82.969

Assessing the Model
Example: Sparrows' Wings continued

Comments:
- We fitted a simple linear model of the form
  $$Y_i = \beta_0 + \beta_1 x_i + \varepsilon_i, \quad i = 1, \ldots, 55, \quad \varepsilon_i \overset{iid}{\sim} N(0, \sigma^2).$$
- The estimated values of the parameters are
  - intercept: $\hat{\beta}_0 \approx 0.550$
  - slope: $\hat{\beta}_1 \approx 0.303$

Assessing the Model
Example: Sparrows' Wings continued

The ANOVA table shows the significance of the regression (slope); that is, the null hypothesis

$$H_0: \beta_1 = 0$$

versus the alternative

$$H_1: \beta_1 \neq 0$$

can be rejected at the significance level $\alpha = 0.001$ ($p \approx 0.000$).

- The test requires the assumptions of normality and of constant variance of the random errors.
- It should be checked whether the assumptions are approximately met.
- If not, the tests may not be valid.

Assessing the Model
Example: Sparrows' Wings continued

- The value of $R^2$ is very high: $R^2 = 99.1\%$.
- This means that the fitted model explains the variability in the observed responses very well.
- The graph shows that the observations lie along the fitted line and there are no strange points which are far from the line or which could strongly affect the slope.

Assessing the Model
Example: Sparrows' Wings continued

Final conclusions:
We can conclude that the data indicate that the length of sparrows' wings depends linearly on their age (within the range 3–18 days). The mean increase in wing length per day is estimated as $\hat{\beta}_1 \approx 0.303$ cm.

However, it might be wrong to predict the length or its increase per day outside the range of the observed time. We would expect the growth to slow down over time, so that the relationship becomes non-linear.

Residuals
Crude Residuals

We defined the residuals as

$$e_i = Y_i - \hat{Y}_i.$$

These are often called crude residuals. We have seen that

$$\sum_{i=1}^n e_i = 0.$$

What is the distribution of the crude residuals?

Residuals
Crude Residuals

Expectation:

$$E[e_i] = E[Y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i] = E[Y_i] - E[\hat{\beta}_0] - x_i E[\hat{\beta}_1] = \beta_0 + \beta_1 x_i - \beta_0 - \beta_1 x_i = 0.$$

Variance:

$$\operatorname{var}[e_i] = \sigma^2\left[1 - \left(\frac{1}{n} + \frac{(x_i - \bar{x})^2}{S_{xx}}\right)\right] =: \sigma^2(1 - h_{ii}).$$

The derivation is shown below.

Residuals
Crude Residuals

$$e_i = Y_i - \hat{Y}_i = Y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i = Y_i - \bar{Y} - \hat{\beta}_1 (x_i - \bar{x})$$

$$= Y_i - \bar{Y} - \sum_{j=1}^n c_j Y_j (x_i - \bar{x}) = Y_i - \frac{1}{n}\sum_{j=1}^n Y_j - \sum_{j=1}^n c_j Y_j (x_i - \bar{x})$$

$$= Y_i - \sum_{j=1}^n \left[\frac{1}{n} + (x_i - \bar{x}) c_j\right] Y_j = \left[1 - \frac{1}{n} - (x_i - \bar{x}) c_i\right] Y_i - \sum_{j \neq i} \left[\frac{1}{n} + (x_i - \bar{x}) c_j\right] Y_j,$$

where, as before, $c_j = (x_j - \bar{x})/S_{xx}$, so that $\hat{\beta}_1 = \sum_{j=1}^n c_j Y_j$.

Residuals
Crude Residuals

This is a linear combination of the independent random variables $Y_j$, so its variance is the sum of the variances of the $Y_j$ multiplied by the squared coefficients.

Furthermore, $\operatorname{var}(Y_j) = \sigma^2$, $j = 1, \ldots, n$.

Residuals
Crude Residuals

Hence,

$$\operatorname{var}(e_i) = \sigma^2\left[1 - \frac{1}{n} - (x_i - \bar{x}) c_i\right]^2 + \sigma^2 \sum_{j \neq i} \left[\frac{1}{n} + (x_i - \bar{x}) c_j\right]^2$$

$$= \sigma^2\left\{1 - 2\left[\frac{1}{n} + (x_i - \bar{x}) c_i\right] + \left[\frac{1}{n} + (x_i - \bar{x}) c_i\right]^2 + \sum_{j \neq i}\left[\frac{1}{n} + (x_i - \bar{x}) c_j\right]^2\right\}$$

$$= \sigma^2\left\{1 - 2\left[\frac{1}{n} + (x_i - \bar{x}) c_i\right] + \sum_{j=1}^n \left[\frac{1}{n} + (x_i - \bar{x}) c_j\right]^2\right\}$$

$$= \cdots = \sigma^2\left[1 - \left(\frac{1}{n} + \frac{(x_i - \bar{x})^2}{S_{xx}}\right)\right]$$

by $\sum_{j=1}^n c_j = 0$ and $\sum_{j=1}^n c_j^2 = \frac{1}{S_{xx}}$.

Residuals
Crude Residuals

- Note that the variance depends on $i$; that is, $\operatorname{var}(e_i)$ is not constant, unlike $\operatorname{var}(\varepsilon_i)$.
- Similarly, it can be shown that the covariance of two residuals $e_i$ and $e_j$ is
  $$\operatorname{cov}[e_i, e_j] = -\sigma^2\left(\frac{1}{n} + \frac{(x_i - \bar{x})(x_j - \bar{x})}{S_{xx}}\right) = -\sigma^2 h_{ij}.$$
- Also, the residuals are normally distributed (as linear combinations of normally distributed random variables), but they are not independent.
- We know that $\operatorname{var}[\varepsilon_i] = \sigma^2$ and $\operatorname{cov}[\varepsilon_i, \varepsilon_j] = 0$.
- So the crude residuals $e_i$ do not quite follow the properties of the $\varepsilon_i$.

Residuals
Standardized/Studentized Residuals

To standardize the residuals we calculate

$$d_i = \frac{e_i - E(e_i)}{\sqrt{\operatorname{var} e_i}} = \frac{e_i}{\sqrt{\sigma^2(1 - h_{ii})}}.$$

Then

$$d_i \sim N(0, 1).$$

They are not independent, though for large samples the correlation should be small.

Residuals
Standardized/Studentized Residuals

- However, we do not know $\sigma^2$.
- If we replace $\sigma^2$ by $S^2$ we get the so-called studentized residuals (in Minitab they are called standardized residuals; see the sketch after this list),
  $$r_i = \frac{e_i}{\sqrt{S^2(1 - h_{ii})}}.$$
- For large samples they approximate the standard $d_i$.

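Putting the pieces together, a computational sketch in Python; these are the internally studentized residuals, which Minitab labels "standardized":

import numpy as np

def studentized_residuals(x, y):
    """Internally studentized residuals r_i = e_i / sqrt(S^2 (1 - h_ii))."""
    n = len(x)
    Sxx = np.sum((x - x.mean()) ** 2)
    b1 = np.sum((x - x.mean()) * (y - y.mean())) / Sxx
    b0 = y.mean() - b1 * x.mean()
    e = y - (b0 + b1 * x)                      # crude residuals
    h = 1.0 / n + (x - x.mean()) ** 2 / Sxx    # leverages h_ii
    S2 = np.sum(e ** 2) / (n - 2)              # MSE, the estimate of sigma^2
    return e / np.sqrt(S2 * (1.0 - h))
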
Residuals Diagnostics
Residual Plots

To check for constant variance (homoscedasticity) and also for linearity, we plot $r_i$ against $x_i$, as shown below, or against $\hat{y}_i$, as shown in the next set of figures.

[Figures (a) and (b): studentized residuals plotted against $x_i$.]
(a) No problem apparent.
(b) Clear non-linearity.

Residuals Diagnostics
Residual Plots

[Figures (a) and (b): studentized residuals plotted against fitted values $\hat{y}_i$.]
(a) No problem apparent.
(b) Variance increases as the mean response increases.

Residuals Diagnostics
Histograms of Simulated Data from Four Different Distributions

[Figure: four histograms of data simulated from four different distributions.]

Residuals Diagnostics
Cumulative Distributions: Empirical and Predicted

- Various tests of normality are based on the comparison of the empirical distribution and the predicted normal distribution.
- For example, the Ryan-Joiner test is based on the correlation between the two.

Residuals Diagnostics
Normal Probability Plot

- To check whether the distribution of the residuals follows the symmetric shape of the normal distribution we can draw a so-called Normal Probability Plot.
- It plots each value of the ordered residuals vs. the percentage of values in the sample that are less than or equal to it, along a fitted distribution line.
- The scales are transformed so that the fitted distribution forms a straight line.
- A plot that departs substantially from linearity suggests that the error distribution is not normal. A small sketch for producing such a plot follows this list.

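Most packages draw this plot directly. A minimal sketch using SciPy's probplot, on simulated residuals standing in for model residuals:

import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

rng = np.random.default_rng(1)
resid = rng.normal(size=100)   # stand-in for the model residuals

# Ordered residuals are plotted against normal quantiles; points close to
# the fitted straight line support the normality assumption.
stats.probplot(resid, dist="norm", plot=plt)
plt.show()
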
Residuals Diagnostics
Normal Probability Plot: Data from Normal Distribution

[Figures (a) and (b).]
(a) Histogram of data simulated from the standard normal distribution.
(b) Normal Probability Plot; no problem apparent.

Residuals Diagnostics
Normal Probability Plot: Data from Log-Normal Distribution

[Figures (a) and (b).]
(a) Histogram of data simulated from the standard log-normal distribution.
(b) Normal Probability Plot indicates skewness of the distribution.

Residuals Diagnostics
Normal Probability Plot: Data from Beta Distribution

[Figures (a) and (b).]
(a) Histogram of data simulated from a Beta(0,1) distribution.
(b) Normal Probability Plot indicates light tails.

Residuals Diagnostics
Normal Probability Plot: Data from Student t-Distribution

[Figures (a) and (b).]
(a) Histogram of data simulated from a t-distribution.
(b) Normal Probability Plot indicates heavy tails.

Residuals Diagnostics
Normal Probability Plot: Sparrows' Wings

The Normal Probability Plot does not indicate any apparent problems with normality of the residuals.

MINITAB
Stat → Basic Statistics → Normality Test...

Inference about the regression parameters
Example: Overheads

A company builds custom electronic instruments and computer components. All jobs are manufactured to customer specifications. The firm wants to be able to estimate its overhead cost. As part of a preliminary investigation, the firm decides to focus on a particular department and investigates the relationship between total departmental overhead cost (Y) and total direct labor hours (X).

Inference about the regression parameters
Example: Overheads

Two objectives of this investigation are:

1. to summarize the relationship between total departmental overhead and total direct labor hours;
2. to estimate the expected and to predict the actual total departmental overhead from the total direct labor hours.

Inference about the regression parameters
Example: Overheads

The regression equation is
Ovhd = 16310 + 11.0 Labor

Predictor  Coef    SE Coef  T     P
Constant   16310   2421     6.74  0.000
Labor      10.982  2.268    4.84  0.000

S = 1645.61   R-Sq = 62.6%   R-Sq(adj) = 60.0%

Analysis of Variance
Source          DF  SS         MS        F      P
Regression       1  63517077   63517077  23.46  0.000
Residual Error  14  37912232   2708017
Total           15  101429309

MINITAB
Stat → Regression → Regression...

Inference about the regression parameters
Example: Overheads

[Figures (a) and (b).]
(a) The Residuals versus Fits plot does not contradict a constant variance nor the linearity of the model.
(b) The Normal Probability Plot does not contradict the normality assumption.

Inference about the regression parameters
Example: Overheads

Comments:
- The model fit is $\hat{y}_i = 16310 + 11 x_i$.
- There is a significant relationship between the overheads and the labor hours ($p < 0.001$ in the ANOVA).
- An increase of labor hours by 1 will increase the mean overheads by about £11.
- There is rather large variability in the data; moreover, the percentage of the total variation explained by the model is rather small ($R^2 = 62.6\%$).
- Hence the question is: how accurate is the estimate of the slope?

Inference about the regression parameters
Inference about β1

Theorem
In the full simple linear regression model (SLRM) the distribution of the LSE of $\beta_1$, $\hat{\beta}_1$, is normal with expectation $E(\hat{\beta}_1) = \beta_1$ and variance $\operatorname{var}(\hat{\beta}_1) = \frac{\sigma^2}{S_{xx}}$, that is,

$$\hat{\beta}_1 \sim N\left(\beta_1, \frac{\sigma^2}{S_{xx}}\right).$$

Remark
For large samples, where there is no assumption of normality of $Y_i$, the sampling distribution of $\hat{\beta}_1$ is approximately normal.

Inference about the regression parameters
Inference about β1

- The theorem allows us to derive a confidence interval (CI) for $\beta_1$ and a test of non-significance for $\beta_1$.
- After standardization of $\hat{\beta}_1$ we obtain
  $$\frac{\hat{\beta}_1 - \beta_1}{\sigma/\sqrt{S_{xx}}} \sim N(0, 1).$$
- However, the error variance is usually not known and it is replaced by its estimator.
- Then the normal distribution changes to a Student t-distribution.

Inference about the regression parameters
Inference about β1

Lemma
If $Z \sim N(0, 1)$ and $U \sim \chi^2_\nu$, and $Z$ and $U$ are independent, then

$$\frac{Z}{\sqrt{U/\nu}} \sim t_\nu.$$

Here we have

$$Z = \frac{\hat{\beta}_1 - \beta_1}{\sigma/\sqrt{S_{xx}}} \sim N(0, 1).$$

We will see later that

$$U = \frac{(n-2)S^2}{\sigma^2} \sim \chi^2_{n-2}$$

and that $S^2$ and $\hat{\beta}_1$ are independent.

Inference about the regression parameters
Inference about β1

It follows that

$$T = \frac{\dfrac{\hat{\beta}_1 - \beta_1}{\sigma/\sqrt{S_{xx}}}}{\sqrt{\dfrac{(n-2)S^2}{\sigma^2(n-2)}}} = \frac{\hat{\beta}_1 - \beta_1}{S/\sqrt{S_{xx}}} \sim t_{n-2}.$$

Inference about the regression parameters
Inference about β1: Confidence Interval

To find a CI for an unknown parameter $\theta$ means to find values of the boundaries $A$ and $B$ which satisfy

$$P(A < \theta < B) = 1 - \alpha$$

for some small $\alpha$, that is, for a high confidence level $(1 - \alpha)100\%$.

Here we have

$$P\left(-t_{\frac{\alpha}{2},n-2} < \frac{\hat{\beta}_1 - \beta_1}{S/\sqrt{S_{xx}}} < t_{\frac{\alpha}{2},n-2}\right) = 1 - \alpha,$$

where $t_{\frac{\alpha}{2},n-2}$ is such that $P(|T| < t_{\frac{\alpha}{2},n-2}) = 1 - \alpha$.

Inference about the regression parameters
Inference about β1: Confidence Interval

This gives

$$P\left(\hat{\beta}_1 - t_{\frac{\alpha}{2},n-2}\frac{S}{\sqrt{S_{xx}}} < \beta_1 < \hat{\beta}_1 + t_{\frac{\alpha}{2},n-2}\frac{S}{\sqrt{S_{xx}}}\right) = 1 - \alpha.$$

That is, the CI for $\beta_1$ is

$$[A, B] = \left[\hat{\beta}_1 - t_{\frac{\alpha}{2},n-2}\frac{S}{\sqrt{S_{xx}}},\; \hat{\beta}_1 + t_{\frac{\alpha}{2},n-2}\frac{S}{\sqrt{S_{xx}}}\right].$$

Inference about the regression parameters
Inference about β1: Confidence Interval

Example continued
For the given data we obtained the following values of $\hat{\beta}_1$, $S$ and $S_{xx}$ for the overhead costs:

$$\hat{\beta}_1 = 10.982, \quad S = 1645.61, \quad S_{xx} = 526656.9.$$

Also $t_{0.025,14} = 2.14479$. Hence, the 95% CI for $\beta_1$ is

$$\left[10.982 - 2.14479\frac{1645.61}{\sqrt{526656.9}},\; 10.982 + 2.14479\frac{1645.61}{\sqrt{526656.9}}\right] = [6.11851, 15.8455].$$

We would expect (with 95% confidence) that a one-hour increase in labour will increase the mean cost by between £6.12 and £15.85.

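The same interval can be reproduced in a couple of lines; a sketch with SciPy using the values quoted above:

import numpy as np
from scipy import stats

b1, S, Sxx, n = 10.982, 1645.61, 526656.9, 16  # overheads example values
t = stats.t.ppf(1 - 0.025, n - 2)              # t_{0.025,14} = 2.14479
half = t * S / np.sqrt(Sxx)

print(b1 - half, b1 + half)                    # about [6.12, 15.85]
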
Inference about the regression parameters
Inference about β1: Test of H0: β1 = 0

The null hypothesis $H_0: \beta_1 = 0$ means that the slope is zero and a better model is the constant model

$$Y_i = \beta_0 + \varepsilon_i, \quad \varepsilon_i \overset{iid}{\sim} N(0, \sigma^2),$$

showing no relationship between $Y$ and $X$. We have

$$T = \frac{\hat{\beta}_1 - \beta_1}{S/\sqrt{S_{xx}}} \sim t_{n-2}.$$

Hence, if $H_0$ is true, then

$$T = \frac{\hat{\beta}_1}{S/\sqrt{S_{xx}}} \overset{H_0}{\sim} t_{n-2}.$$

Inference about the regression parameters
Inference about β1: Test of H0: β1 = 0

We reject $H_0$ at a significance level $\alpha$ when, for a given data set, the calculated value of the test statistic, $T_{cal}$, is in the rejection region, that is,

$$|T_{cal}| > t_{\frac{\alpha}{2},n-2}.$$

This is equivalent to the F-test, since if the random variable $W \sim t_\nu$ then $W^2 \sim F_{1,\nu}$.

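A sketch of this test for the overheads data, using the coefficient and standard error from the Minitab output; squaring $T_{cal}$ recovers the $F$ statistic from the ANOVA table, illustrating the equivalence:

from scipy import stats

b1, se_b1, n = 10.982, 2.268, 16     # from the overheads Minitab output
T_cal = b1 / se_b1                   # = 4.84, as printed by Minitab
t_crit = stats.t.ppf(1 - 0.025, n - 2)
p_value = 2 * stats.t.sf(abs(T_cal), n - 2)

print(abs(T_cal) > t_crit, p_value)  # reject H0 at the 5% level
print(T_cal ** 2)                    # about 23.4, the ANOVA F statistic
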
Inference about the regression parameters
Inference about β1: Test of H0: β1 = 0

Remark
The square root of the variance $\operatorname{var}(\hat{\beta}_1)$ is called the standard error of $\hat{\beta}_1$ and is denoted by $se(\hat{\beta}_1)$. That is,

$$se(\hat{\beta}_1) = \sqrt{\frac{\sigma^2}{S_{xx}}}.$$

Its estimator is

$$\widehat{se}(\hat{\beta}_1) = \sqrt{\frac{S^2}{S_{xx}}}.$$

Often this estimated standard error is also simply called the standard error. You should be aware of the difference between the two.

Inference about the regression parameters
Inference about β1: Test of H0: β1 = 0

Remark
Note that the $(1 - \alpha)100\%$ CI for $\beta_1$ can be written as

$$\left[\hat{\beta}_1 - t_{\frac{\alpha}{2},n-2}\,\widehat{se}(\hat{\beta}_1),\; \hat{\beta}_1 + t_{\frac{\alpha}{2},n-2}\,\widehat{se}(\hat{\beta}_1)\right]$$

and the test statistic for $H_0: \beta_1 = 0$ as

$$T = \frac{\hat{\beta}_1}{\widehat{se}(\hat{\beta}_1)} \sim t_{n-2}.$$

Inference about the regression parameters
Inference about β0

Theorem
In the full SLRM the distribution of the LSE of $\beta_0$, $\hat{\beta}_0$, is normal with expectation $E(\hat{\beta}_0) = \beta_0$ and variance $\operatorname{var}(\hat{\beta}_0) = \left(\frac{1}{n} + \frac{\bar{x}^2}{S_{xx}}\right)\sigma^2$, that is,

$$\hat{\beta}_0 \sim N\left(\beta_0, \sigma^2\left(\frac{1}{n} + \frac{\bar{x}^2}{S_{xx}}\right)\right).$$

Inference about the regression parameters
Inference about β0

Corollary
Assuming the full simple linear regression model, we obtain:

CI for $\beta_0$:

$$\left[\hat{\beta}_0 - t_{\frac{\alpha}{2},n-2}\,\widehat{se}(\hat{\beta}_0),\; \hat{\beta}_0 + t_{\frac{\alpha}{2},n-2}\,\widehat{se}(\hat{\beta}_0)\right]$$

Test of the hypothesis $H_0: \beta_0 = \beta_0^\star$:

$$T = \frac{\hat{\beta}_0 - \beta_0^\star}{\widehat{se}(\hat{\beta}_0)} \overset{H_0}{\sim} t_{n-2},$$

where

$$\widehat{se}(\hat{\beta}_0) = \sqrt{S^2\left(\frac{1}{n} + \frac{\bar{x}^2}{S_{xx}}\right)}.$$

Inference about the regression parameters
Inference about β0

Example continued
The calculated values for the overhead costs are as follows:

$$\hat{\beta}_0 = 16310, \quad \widehat{se}(\hat{\beta}_0) = 2421.$$

Hence, the 95% CI for $\beta_0$ is

$$[a, b] = [16310 - 2.14479 \times 2421,\; 16310 + 2.14479 \times 2421] = [11117.5, 21502.5].$$

We would expect (with 95% confidence) that even with zero hours of labor, the overhead cost is between £11117.5 and £21502.5.

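As a check, this interval too can be reproduced from the printed output; a short sketch:

from scipy import stats

b0, se_b0, n = 16310.0, 2421.0, 16      # from the overheads output
t = stats.t.ppf(1 - 0.025, n - 2)       # t_{0.025,14}

print(b0 - t * se_b0, b0 + t * se_b0)   # about [11117.5, 21502.5]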
