
Estimation of Error Variance

A common estimate of $\sigma^2$ is
\[
S_e^2 = \frac{1}{n-2}\sum_{i=1}^n \left(y_i - (\hat\beta_0 + \hat\beta_1 x_i)\right)^2
      = \frac{1}{n-2}\sum_{i=1}^n e_i^2
      = \frac{SSE}{DF_{\text{error}}} = MS_{\text{error}}
\]
Dividing by $n-2$ makes $S_e^2$ an unbiased estimator for $\sigma^2$.
$(n-2)$ follows the general d.f. rule:
We estimate 2 parameters in the model.
The residuals satisfy two constraints imposed by the least-squares method.
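The estimator above can be sketched in plain Python. This is a minimal illustration with made-up data (the x and y values are not from the slides):

```python
import math

# Sketch: unbiased estimate of the error variance sigma^2 in simple
# linear regression, S_e^2 = SSE / (n - 2).  Toy data, illustrative only.

def fit_ls(x, y):
    """Least-squares estimates (b0, b1) for y = b0 + b1*x + e."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    return b0, b1

def error_variance(x, y):
    """S_e^2 = SSE / (n - 2), i.e. the mean squared error MS_error."""
    b0, b1 = fit_ls(x, y)
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    return sse / (len(x) - 2)

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
s_e2 = error_variance(x, y)
```

Note the divisor is $n-2$, not $n$: two degrees of freedom are spent estimating $\beta_0$ and $\beta_1$.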

Inference for $\beta_1$

We now discuss $\hat\beta_1$ in detail.
\[
\hat\beta_1 = \frac{\sum_{i=1}^n (x_i - \bar x)(y_i - \bar y)}{\sum_{i=1}^n (x_i - \bar x)^2}
            = \frac{\sum_{i=1}^n (x_i - \bar x)\,y_i}{\sum_{i=1}^n (x_i - \bar x)^2}
\]
$\hat\beta_1$ is a linear combination of normal random variables (the $y_i$'s), so $\hat\beta_1$ is normally distributed with
\[
E(\hat\beta_1) = \beta_1, \qquad
\sigma_{\hat\beta_1}^2 = \mathrm{Var}(\hat\beta_1) = \frac{\sigma^2}{\sum_{i=1}^n (x_i - \bar x)^2} = \frac{\sigma^2}{(n-1)S_X^2}
\]

Inference for $\beta_1$

$\sigma^2$ is unknown; plug in the estimate $S_e^2$.
The sample standard error of $\hat\beta_1$ is
\[
S_{\hat\beta_1} = \frac{S_e}{S_X\sqrt{n-1}}
\]
$(\hat\beta_1 - \beta_1)/S_{\hat\beta_1}$ follows a $t$-distribution with $(n-2)$ d.f.
To test $H_0: \beta_1 = 0$ vs. $H_1: \beta_1 \neq 0$ at level $\alpha$, evaluate
\[
T = \frac{\hat\beta_1 - 0}{S_{\hat\beta_1}} \sim t_{n-2},
\]
and reject $H_0$ if $|T| > t_{n-2,\,1-\alpha/2}$.
A $100(1-\alpha)\%$ C.I. for $\beta_1$ is $\hat\beta_1 \pm t_{n-2,\,1-\alpha/2}\, S_{\hat\beta_1}$.
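The slope test and interval can be sketched as follows. This uses made-up data, and the critical value $t_{3,0.975} = 3.182$ is hard-coded from a $t$-table since the Python standard library has no $t$ quantile function:

```python
import math

# Sketch: t-test and C.I. for the slope beta1.  Toy data, illustrative only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
sxx = sum((xi - xbar) ** 2 for xi in x)            # = (n-1) * S_X^2
b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
b0 = ybar - b1 * xbar
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
s_e = math.sqrt(sse / (n - 2))                     # S_e
se_b1 = s_e / math.sqrt(sxx)                       # = S_e / (S_X * sqrt(n-1))
t_stat = b1 / se_b1                                # test statistic for H0: beta1 = 0
t_crit = 3.182                                     # t_{n-2, 0.975} for n-2 = 3 d.f.
ci = (b1 - t_crit * se_b1, b1 + t_crit * se_b1)    # 95% C.I. for beta1
```

Here `t_stat` far exceeds `t_crit`, so $H_0: \beta_1 = 0$ is rejected, and the interval `ci` covers the slope value (about 2) used to generate the toy data.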

Inference for $\beta_0$

The point estimate of $\beta_0$ is $\hat\beta_0 = \bar y - \hat\beta_1 \bar x$.
\[
E(\hat\beta_0) = \beta_0, \qquad
\sigma_{\hat\beta_0}^2 = \mathrm{Var}(\hat\beta_0) = \sigma^2\left(\frac{1}{n} + \frac{\bar x^2}{(n-1)S_X^2}\right)
\]
\[
\hat\beta_0 \sim N(\beta_0,\, \sigma_{\hat\beta_0}^2)
\]
$\hat\beta_0$ has sample standard error $S_{\hat\beta_0} = S_e\sqrt{\dfrac{1}{n} + \dfrac{\bar x^2}{(n-1)S_X^2}}$.
To test $H_0: \beta_0 = 0$ vs. $H_1: \beta_0 \neq 0$ at level $\alpha$, evaluate
\[
T = \frac{\hat\beta_0 - 0}{S_{\hat\beta_0}} \sim t_{n-2},
\]
and reject $H_0$ if $|T| > t_{n-2,\,1-\alpha/2}$.
A $100(1-\alpha)\%$ C.I. for $\beta_0$ is $\hat\beta_0 \pm t_{n-2,\,1-\alpha/2}\, S_{\hat\beta_0}$.
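The intercept inference follows the same pattern; only the standard-error formula changes. A minimal sketch with the same made-up data and a hard-coded $t_{3,0.975} = 3.182$:

```python
import math

# Sketch: standard error, t statistic, and C.I. for the intercept beta0,
# using S_{beta0-hat} = S_e * sqrt(1/n + xbar^2 / ((n-1) S_X^2)).
# Toy data, illustrative only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
sxx = sum((xi - xbar) ** 2 for xi in x)            # = (n-1) * S_X^2
b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
b0 = ybar - b1 * xbar
s_e = math.sqrt(sum((yi - (b0 + b1 * xi)) ** 2
                    for xi, yi in zip(x, y)) / (n - 2))
se_b0 = s_e * math.sqrt(1 / n + xbar ** 2 / sxx)   # sample S.E. of beta0-hat
t_stat = b0 / se_b0                                # test statistic for H0: beta0 = 0
t_crit = 3.182                                     # t_{3, 0.975}
ci = (b0 - t_crit * se_b0, b0 + t_crit * se_b0)
```

With this toy data $|T| < t_{3,0.975}$, so the intercept is not significantly different from zero and the C.I. covers 0.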

Inference for Regression Line (or Conditional Means)
Inference for $E(Y \mid X = x) = \beta_0 + \beta_1 x$

For a chosen $x_0$, the estimate is $\hat y_0 = \hat\beta_0 + \hat\beta_1 x_0 = \bar y + \hat\beta_1 (x_0 - \bar x)$.
\[
E(\hat y_0) = \mu_{Y|X=x_0} = \beta_0 + \beta_1 x_0
\]
\[
\mathrm{Var}(\hat y_0) = \sigma^2\left(\frac{1}{n} + \frac{(x_0 - \bar x)^2}{(n-1)S_X^2}\right)
\]
The sample standard error is $S_{\hat y_0} = S_e\sqrt{\dfrac{1}{n} + \dfrac{(x_0 - \bar x)^2}{(n-1)S_X^2}}$.
To test $H_0: \mu_{Y|X=x_0} = \mu_0$ vs. $H_1: \mu_{Y|X=x_0} \neq \mu_0$ at level $\alpha$, evaluate
\[
T = \frac{\hat y_0 - \mu_0}{S_{\hat y_0}} \sim t_{n-2},
\]
and reject $H_0$ if $|T| > t_{n-2,\,1-\alpha/2}$.
A $100(1-\alpha)\%$ C.I. for $\mu_{Y|X=x_0}$ (i.e. $\beta_0 + \beta_1 x_0$) is
$(\hat\beta_0 + \hat\beta_1 x_0) \pm t_{n-2,\,1-\alpha/2}\, S_{\hat y_0}$.
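A sketch of the conditional-mean interval at a chosen $x_0$, again with made-up data and a hard-coded $t_{3,0.975} = 3.182$:

```python
import math

# Sketch: C.I. for the conditional mean E(Y | X = x0) at a chosen x0.
# Toy data, illustrative only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
sxx = sum((xi - xbar) ** 2 for xi in x)            # = (n-1) * S_X^2
b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
b0 = ybar - b1 * xbar
s_e = math.sqrt(sum((yi - (b0 + b1 * xi)) ** 2
                    for xi, yi in zip(x, y)) / (n - 2))

x0 = 4.0
y0_hat = b0 + b1 * x0                              # estimated conditional mean
# S_{y0-hat} = S_e * sqrt(1/n + (x0 - xbar)^2 / ((n-1) S_X^2))
se_y0 = s_e * math.sqrt(1 / n + (x0 - xbar) ** 2 / sxx)
t_crit = 3.182                                     # t_{3, 0.975}
ci = (y0_hat - t_crit * se_y0, y0_hat + t_crit * se_y0)
```

Note that `se_y0` grows with $(x_0 - \bar x)^2$: the interval is narrowest at $x_0 = \bar x$ and widens as $x_0$ moves away from the center of the data.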

Prediction

Predict the value of $Y$ at a given $x_0$:
\[
Y_{\text{new}} = \beta_0 + \beta_1 x_0 + e
\]
The estimate is still $\hat y_{\text{new}} = \hat\beta_0 + \hat\beta_1 x_0$.
The standard error is
\[
S_{y,\text{pred}} = \sqrt{S_e^2 + S_{\hat y_0}^2}
                  = S_e\sqrt{1 + \frac{1}{n} + \frac{(x_0 - \bar x)^2}{(n-1)S_X^2}}
\]
The $100(1-\alpha)\%$ prediction interval is
$(\hat\beta_0 + \hat\beta_1 x_0) \pm t_{n-2,\,1-\alpha/2}\, S_{y,\text{pred}}$.
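A sketch of the prediction interval, including a check of the identity $S_{y,\text{pred}}^2 = S_e^2 + S_{\hat y_0}^2$. Toy data again, with $t_{3,0.975} = 3.182$ hard-coded:

```python
import math

# Sketch: prediction interval for a new observation at x0, and the
# decomposition S_pred^2 = S_e^2 + S_{y0-hat}^2.  Toy data, illustrative only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
sxx = sum((xi - xbar) ** 2 for xi in x)
b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
b0 = ybar - b1 * xbar
s_e2 = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y)) / (n - 2)

x0 = 4.0
y_new = b0 + b1 * x0                                 # point prediction
se_mean2 = s_e2 * (1 / n + (x0 - xbar) ** 2 / sxx)   # estimated Var of fitted mean
se_pred = math.sqrt(s_e2 + se_mean2)                 # = S_e * sqrt(1 + 1/n + ...)
t_crit = 3.182                                       # t_{3, 0.975}
pi = (y_new - t_crit * se_pred, y_new + t_crit * se_pred)
```

The extra $S_e^2$ term accounts for the new observation's own error, so the prediction interval is always wider than the confidence interval for the mean at the same $x_0$.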

Example

Forbes Data

James D. Forbes collected data in the Scottish Alps in the 1840s and 1850s.

$n = 17$ locations (at different altitudes).

Objective: Predict barometric pressure (in inches of mercury) from the boiling point of water (X) in °F.

Use $Y = \log(\text{barometric pressure})$.

Motivation: Fragile barometers of the 1840s were difficult to transport.

      BOILING POINT   BAROMETRIC    NATURAL LOG OF
      OF WATER        PRESSURE      BAROMETRIC
Obs   (degrees F)     (inches Hg)   PRESSURE
1 194.3 20.79 3.034472
2 194.5 20.79 3.034472
3 197.9 22.40 3.109061
4 198.4 22.67 3.121042
5 199.4 23.15 3.141995
6 199.9 23.35 3.150597
7 200.9 23.89 3.173460
8 201.1 23.89 3.173460
9 201.3 24.01 3.178470
10 201.4 24.02 3.178887
11 203.6 25.14 3.224460
12 204.6 26.57 3.279783
13 208.6 27.76 3.323596
14 209.5 28.49 3.349553
15 210.7 29.04 3.368674
16 211.9 29.88 3.397189
17 212.2 30.06 3.403195

Forbes Data

[Scatterplot of the Forbes data: Log Pressure (3.0 to 3.5) versus Boiling point of water (190 to 215 degrees F).]

Analysis of Forbes Data
The proposed regression model is
\[
y_i = \beta_0 + \beta_1 x_i + e_i, \qquad e_i \overset{\text{i.i.d.}}{\sim} N(0, \sigma^2), \quad i = 1, \ldots, 17,
\]
where

$Y_i = \log(\text{pressure})$

$X_i$ = boiling point (°F)

$\beta_1$ is the increase in mean log(pressure) when the boiling point of water increases by 1°F.

$\beta_0$ is the mean log(pressure) when the boiling point of water is 0°F. (Is this extrapolation realistic?)

Analysis of Forbes Data

The estimated regression model is
\[
\hat y = \hat\beta_0 + \hat\beta_1 x = -0.970866 + 0.020622\,x
\]
Residuals: $e_i = y_i - \hat y_i$, $i = 1, \ldots, 17$.

The estimated mean log(pressure) at 212°F is
$\hat y_{212} = \hat\beta_0 + \hat\beta_1 \cdot 212 = 3.401074$.
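As a check, the least-squares fit can be reproduced in plain Python from the 17 observations tabulated earlier. This is a sketch; the computed coefficients agree with the slide's reported values to roughly three significant figures (small discrepancies can arise from rounding in the printed table):

```python
# Forbes data from the table above: boiling point (deg F) and
# natural log of barometric pressure.
bp = [194.3, 194.5, 197.9, 198.4, 199.4, 199.9, 200.9, 201.1, 201.3,
      201.4, 203.6, 204.6, 208.6, 209.5, 210.7, 211.9, 212.2]
logp = [3.034472, 3.034472, 3.109061, 3.121042, 3.141995, 3.150597,
        3.173460, 3.173460, 3.178470, 3.178887, 3.224460, 3.279783,
        3.323596, 3.349553, 3.368674, 3.397189, 3.403195]
n = len(bp)
xbar = sum(bp) / n
ybar = sum(logp) / n
sxx = sum((xi - xbar) ** 2 for xi in bp)           # = (n-1) * S_X^2
b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(bp, logp)) / sxx
b0 = ybar - b1 * xbar                              # intercept (negative)
y212 = b0 + b1 * 212                               # fitted mean log(pressure) at 212 F
```

The quantities `xbar` (about 202.953) and `sxx` (about 530.78) reappear in the interval calculations on the later slides.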

Analysis of Forbes Data
Inference on $\beta_1$:

Test $H_0: \beta_1 = 0$ ($Y_i = \beta_0 + e_i$) versus $H_1: \beta_1 \neq 0$ ($Y_i = \beta_0 + \beta_1 x_i + e_i$).
Evaluate
\[
T = \frac{\hat\beta_1 - 0}{S_{\hat\beta_1}} = \frac{0.0206220}{0.000379} = 54.42.
\]
The p-value is $\ll 0.0001$. Reject $H_0$ and conclude that the slope is positive.

A 95% C.I. for the slope indicates that the slope is very well estimated from these data:
\[
\hat\beta_1 \pm t_{15,0.975}\, S_{\hat\beta_1}
= 0.020622 \pm (2.131)(0.00037895)
= (0.0198,\ 0.0214)
\]

Analysis of Forbes Data
Inference on $\beta_0$:

Test $H_0: \beta_0 = 0$ ($Y_i = \beta_1 x_i + e_i$) versus $H_1: \beta_0 \neq 0$ ($Y_i = \beta_0 + \beta_1 x_i + e_i$).
Evaluate
\[
T = \frac{\hat\beta_0 - 0}{S_{\hat\beta_0}} = \frac{-0.9710}{0.0769} = -12.6.
\]
The p-value is $\ll 0.0001$. Reject $H_0$ and conclude that the intercept is negative. (Is there a practical motivation to do this test?)

A 95% C.I. for the intercept is
\[
\hat\beta_0 \pm t_{15,0.975}\, S_{\hat\beta_0}
= -0.971 \pm (2.131)(0.0769)
= (-1.135,\ -0.807)
\]

Analysis of Forbes Data

Construct a 95% C.I. for the mean of log-pressure measurements when the boiling point of water is $x = 209$°F.

The estimated mean is
\[
\hat y = \hat\beta_0 + \hat\beta_1 x = -0.9710 + (0.0206)(209) = 3.339
\]
Evaluate the sample standard error of this estimate:
\[
S_{\hat y} = \sqrt{0.0000762\left(\frac{1}{17} + \frac{(209 - 202.953)^2}{530.78}\right)} = 0.00312
\]
A 95% C.I. is $\hat y \pm t_{15,0.975}\, S_{\hat y} = (3.333,\ 3.346)$.

Analysis of Forbes Data
Computing a 95% C.I. at every point $x$ gives a confidence band for the regression line.

[Plot: fitted regression line with 95 percent C.I. band; Log Pressure (3.0 to 3.5) versus Boiling point of water (190 to 215 degrees F).]

Analysis of Forbes Data
Inference for prediction:

Construct a 95% prediction interval for a log-pressure value when the boiling point of water is $x = 209$°F.

The prediction is the estimated mean (because the estimate of the error is zero):
\[
\hat y = \hat\beta_0 + \hat\beta_1 x + \text{error} = -0.9710 + (0.0206)(209) + 0 = 3.339
\]
Evaluate the standard error of the prediction:
\[
S_{y,\text{pred}} = \sqrt{0.0000762\left(1 + \frac{1}{17} + \frac{(209 - 202.953)^2}{530.78}\right)} = 0.00927
\]

Analysis of Forbes Data

A 95% prediction interval is
\[
\hat y \pm t_{15,0.975}\, S_{y,\text{pred}}
= 3.339 \pm (2.131)(0.00927)
= (3.319,\ 3.359)
\]
The above inferences (estimation, testing, prediction) appear in the output of a SAS program. We will introduce SAS coding after we introduce one more thing: the ANOVA table for simple linear regression. Hold on.
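The interval calculations at $x_0 = 209$°F can likewise be reproduced from the tabulated data. A sketch, with $t_{15,0.975} = 2.131$ taken from the slides; agreement with the printed values is to rounding accuracy:

```python
import math

# Sketch: C.I. for the conditional mean and prediction interval at
# x0 = 209 deg F, from the Forbes data tabulated earlier.
bp = [194.3, 194.5, 197.9, 198.4, 199.4, 199.9, 200.9, 201.1, 201.3,
      201.4, 203.6, 204.6, 208.6, 209.5, 210.7, 211.9, 212.2]
logp = [3.034472, 3.034472, 3.109061, 3.121042, 3.141995, 3.150597,
        3.173460, 3.173460, 3.178470, 3.178887, 3.224460, 3.279783,
        3.323596, 3.349553, 3.368674, 3.397189, 3.403195]
n = len(bp)
xbar = sum(bp) / n
ybar = sum(logp) / n
sxx = sum((xi - xbar) ** 2 for xi in bp)
b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(bp, logp)) / sxx
b0 = ybar - b1 * xbar
s_e2 = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(bp, logp)) / (n - 2)

x0 = 209.0
y0 = b0 + b1 * x0                                # estimated mean / prediction
leverage = 1 / n + (x0 - xbar) ** 2 / sxx        # 1/n + (x0-xbar)^2 / ((n-1)S_X^2)
se_mean = math.sqrt(s_e2 * leverage)             # S.E. of the fitted mean
se_pred = math.sqrt(s_e2 * (1 + leverage))       # S.E. of a new prediction
t_crit = 2.131                                   # t_{15, 0.975}
ci = (y0 - t_crit * se_mean, y0 + t_crit * se_mean)   # C.I. for the mean
pi = (y0 - t_crit * se_pred, y0 + t_crit * se_pred)   # prediction interval
```

As the slides show, the prediction interval (3.319, 3.359) strictly contains the confidence interval (3.333, 3.346), reflecting the extra $S_e^2$ term in the prediction variance.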

