Chapter Eleven
Simple Linear Regression

McGraw-Hill/Irwin

Copyright 2003 by The McGraw-Hill Companies, Inc. All rights reserved.

Simple Linear Regression

11.1 The Simple Linear Regression Model
11.2 The Least Squares Point Estimates
11.3 Model Assumptions, Mean Squared Error, Std. Error
11.4 Testing Significance of Slope and y-Intercept
11.5 Confidence Intervals and Prediction Intervals
11.6 The Coefficient of Determination and Correlation
11.7 An F Test for the Simple Linear Regression Model
11.8* Checking Regression Assumptions by Residuals
11.9* Some Shortcut Formulas

11.1 The Simple Linear Regression Model

$y = \mu_{y|x} + \varepsilon = \beta_0 + \beta_1 x + \varepsilon$

$\mu_{y|x} = \beta_0 + \beta_1 x$ is the mean value of the dependent variable y when the value of the independent variable is x.
$\beta_0$ is the y-intercept, the mean of y when x is 0.
$\beta_1$ is the slope, the change in the mean of y per unit change in x.
$\varepsilon$ is an error term that describes the effect on y of all factors other than x.

Week   Average Hourly Temperature x (deg F)   Weekly Fuel Consumption y (MMcf)
1      28.0                                   12.4
2      28.0                                   11.7
3      32.5                                   12.4
4      39.0                                   10.8
5      45.9                                    9.4
6      57.8                                    9.5
7      58.1                                    8.0
8      62.5                                    7.5

The Simple Linear Regression Model Illustrated

[Figure: illustration of the simple linear regression model]

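11.2 The Least Squares Point Estimates

Estimation/Prediction Equation:

$\hat{y} = b_0 + b_1 x$

Least squares point estimate of the slope $\beta_1$:

$b_1 = \frac{SS_{xy}}{SS_{xx}}$

$SS_{xy} = \sum (x_i - \bar{x})(y_i - \bar{y}) = \sum x_i y_i - \frac{(\sum x_i)(\sum y_i)}{n}$

$SS_{xx} = \sum (x_i - \bar{x})^2 = \sum x_i^2 - \frac{(\sum x_i)^2}{n}$

Least squares point estimate of the y-intercept $\beta_0$:

$b_0 = \bar{y} - b_1 \bar{x}$, where $\bar{y} = \frac{\sum y_i}{n}$ and $\bar{x} = \frac{\sum x_i}{n}$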

Example: The Least Squares Point Estimates

            y        x         x^2         xy
         12.4     28.0      784.00     347.20
         11.7     28.0      784.00     327.60
         12.4     32.5     1056.25     403.00
         10.8     39.0     1521.00     421.20
          9.4     45.9     2106.81     431.46
          9.5     57.8     3340.84     549.10
          8.0     58.1     3375.61     464.80
          7.5     62.5     3906.25     468.75
Total    81.7    351.8    16874.76    3413.11

Slope b1

$SS_{xy} = \sum x_i y_i - \frac{(\sum x_i)(\sum y_i)}{n} = 3413.11 - \frac{(351.8)(81.7)}{8} = -179.6475$

$SS_{xx} = \sum x_i^2 - \frac{(\sum x_i)^2}{n} = 16874.76 - \frac{(351.8)^2}{8} = 1404.355$

$b_1 = \frac{SS_{xy}}{SS_{xx}} = \frac{-179.6475}{1404.355} = -0.1279$

y-Intercept b0

$\bar{y} = \frac{\sum y_i}{n} = \frac{81.7}{8} = 10.2125$

$\bar{x} = \frac{\sum x_i}{n} = \frac{351.8}{8} = 43.98$

$b_0 = \bar{y} - b_1 \bar{x} = 10.2125 - (-0.1279)(43.98) = 15.84$

Prediction (x = 40)

$\hat{y} = b_0 + b_1 x = 15.84 - 0.1279(40) = 10.72$ MMcf of gas
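The hand calculation of the least squares point estimates can be checked with a few lines of Python. This is a minimal sketch using only the standard library and the eight fuel consumption observations from the example; the function name `least_squares` is a hypothetical helper, not something from the slides.

```python
# Fuel consumption data from the example.
x = [28.0, 28.0, 32.5, 39.0, 45.9, 57.8, 58.1, 62.5]  # average hourly temperature (deg F)
y = [12.4, 11.7, 12.4, 10.8, 9.4, 9.5, 8.0, 7.5]      # weekly fuel consumption (MMcf)


def least_squares(x, y):
    """Return (b0, b1) computed from the SS_xy / SS_xx formulas."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    ss_xy = sum(xi * yi for xi, yi in zip(x, y)) - sum_x * sum_y / n
    ss_xx = sum(xi ** 2 for xi in x) - sum_x ** 2 / n
    b1 = ss_xy / ss_xx
    b0 = sum_y / n - b1 * sum_x / n
    return b0, b1


b0, b1 = least_squares(x, y)
y_hat_40 = b0 + b1 * 40  # point prediction at x = 40
print(f"b0 = {b0:.4f}, b1 = {b1:.4f}, prediction at x = 40: {y_hat_40:.2f}")
```

Because the code carries full precision, its results can differ from the slide values in the last displayed digit, where intermediate quantities were rounded.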

11.3 The Regression Model Assumptions

Model

$y = \mu_{y|x} + \varepsilon = \beta_0 + \beta_1 x + \varepsilon$

Assumptions about the model error terms, the $\varepsilon$'s

Mean Zero: The mean of the error terms is equal to 0.
Constant Variance: The variance of the error terms is $\sigma^2$, the same for all values of x.
Normality: The error terms follow a normal distribution for all values of x.
Independence: The values of the error terms are statistically independent of each other.

Regression Model Assumptions Illustrated

[Figure: illustration of the regression model assumptions]

Mean Square Error and Standard Error

Sum of Squared Errors:

$SSE = \sum e_i^2 = \sum (y_i - \hat{y}_i)^2$

Mean Square Error, point estimate of residual variance $\sigma^2$:

$s^2 = MSE = \frac{SSE}{n - 2}$

Standard Error, point estimate of residual standard deviation $\sigma$:

$s = \sqrt{MSE} = \sqrt{\frac{SSE}{n - 2}}$

Example 11.6 The Fuel Consumption Case

   y       x       pred       y - pred    (y - pred)^2
12.4    28.0    12.2588      0.1412       0.019937
11.7    28.0    12.2588     -0.5588       0.312257
12.4    32.5    11.6833      0.7168       0.513731
10.8    39.0    10.8519     -0.0519       0.002694
 9.4    45.9     9.9694     -0.5694       0.324205
 9.5    57.8     8.4474      1.0526       1.108009
 8.0    58.1     8.4090     -0.4090       0.167289
 7.5    62.5     7.8463     -0.3462       0.119889
                                   SSE =  2.568011

$s^2 = MSE = \frac{SSE}{n - 2} = \frac{2.568}{6} = 0.428$

$s = \sqrt{s^2} = \sqrt{0.428} = 0.6542$
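The SSE, MSE, and standard error for the fuel consumption data can be reproduced as follows. This is a sketch that refits the line from the raw data rather than using the rounded predictions in the table, so SSE comes out slightly different from the tabulated 2.568011 in the later decimals.

```python
import math

x = [28.0, 28.0, 32.5, 39.0, 45.9, 57.8, 58.1, 62.5]  # temperature (deg F)
y = [12.4, 11.7, 12.4, 10.8, 9.4, 9.5, 8.0, 7.5]      # fuel consumption (MMcf)
n = len(x)

# Least squares estimates b0 and b1.
x_bar, y_bar = sum(x) / n, sum(y) / n
ss_xx = sum((xi - x_bar) ** 2 for xi in x)
ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
b1 = ss_xy / ss_xx
b0 = y_bar - b1 * x_bar

# Residuals e_i = y_i - y_hat_i, then SSE, MSE = SSE / (n - 2), s = sqrt(MSE).
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
sse = sum(e ** 2 for e in residuals)
mse = sse / (n - 2)
s = math.sqrt(mse)
print(f"SSE = {sse:.4f}, MSE = {mse:.4f}, s = {s:.4f}")
```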

11.4 Significance Test and Estimation for Slope

If the regression assumptions hold, we can reject $H_0: \beta_1 = 0$ at the $\alpha$ level of significance (probability of Type I error equal to $\alpha$) if and only if the appropriate rejection point condition holds or, equivalently, if the corresponding p-value is less than $\alpha$.

Alternative              Reject H0 if:                                                      p-Value
$H_a: \beta_1 > 0$       $t > t_\alpha$                                                     Area under t distribution right of t
$H_a: \beta_1 < 0$       $t < -t_\alpha$                                                    Area under t distribution left of t
$H_a: \beta_1 \neq 0$    $|t| > t_{\alpha/2}$, that is, $t > t_{\alpha/2}$ or $t < -t_{\alpha/2}$    Twice area under t distribution right of |t|

Test Statistic

$t = \frac{b_1}{s_{b_1}}$ where $s_{b_1} = \frac{s}{\sqrt{SS_{xx}}}$

100(1 - $\alpha$)% Confidence Interval for $\beta_1$

$[b_1 \pm t_{\alpha/2} \, s_{b_1}]$

$t_\alpha$, $t_{\alpha/2}$ and p-values are based on n - 2 degrees of freedom.

Significance Test and Estimation for y-Intercept

If the regression assumptions hold, we can reject $H_0: \beta_0 = 0$ at the $\alpha$ level of significance (probability of Type I error equal to $\alpha$) if and only if the appropriate rejection point condition holds or, equivalently, if the corresponding p-value is less than $\alpha$.

Alternative              Reject H0 if:                                                      p-Value
$H_a: \beta_0 > 0$       $t > t_\alpha$                                                     Area under t distribution right of t
$H_a: \beta_0 < 0$       $t < -t_\alpha$                                                    Area under t distribution left of t
$H_a: \beta_0 \neq 0$    $|t| > t_{\alpha/2}$, that is, $t > t_{\alpha/2}$ or $t < -t_{\alpha/2}$    Twice area under t distribution right of |t|

Test Statistic

$t = \frac{b_0}{s_{b_0}}$ where $s_{b_0} = s \sqrt{\frac{1}{n} + \frac{\bar{x}^2}{SS_{xx}}}$

100(1 - $\alpha$)% Confidence Interval for $\beta_0$

$[b_0 \pm t_{\alpha/2} \, s_{b_0}]$

$t_\alpha$, $t_{\alpha/2}$ and p-values are based on n - 2 degrees of freedom.
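As a rough illustration, the test statistics and 95% confidence intervals for the slope and y-intercept can be computed for the fuel consumption data. The sketch below uses only the standard library, so the rejection point is hard-coded: `T_CRIT = 2.447` is an assumption of this sketch, namely the value of $t_{0.025}$ with n - 2 = 6 degrees of freedom read from a standard t table.

```python
import math

x = [28.0, 28.0, 32.5, 39.0, 45.9, 57.8, 58.1, 62.5]  # temperature (deg F)
y = [12.4, 11.7, 12.4, 10.8, 9.4, 9.5, 8.0, 7.5]      # fuel consumption (MMcf)
n = len(x)

# Least squares fit.
x_bar, y_bar = sum(x) / n, sum(y) / n
ss_xx = sum((xi - x_bar) ** 2 for xi in x)
ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
b1 = ss_xy / ss_xx
b0 = y_bar - b1 * x_bar

# Standard error s = sqrt(SSE / (n - 2)).
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
s = math.sqrt(sse / (n - 2))

# Standard errors of the estimates and the t statistics for H0: beta = 0.
s_b1 = s / math.sqrt(ss_xx)
s_b0 = s * math.sqrt(1 / n + x_bar ** 2 / ss_xx)
t_b1 = b1 / s_b1
t_b0 = b0 / s_b0

# Assumption: t_{0.025} with 6 degrees of freedom, taken from a t table.
T_CRIT = 2.447
ci_b1 = (b1 - T_CRIT * s_b1, b1 + T_CRIT * s_b1)
ci_b0 = (b0 - T_CRIT * s_b0, b0 + T_CRIT * s_b0)

print(f"slope:     t = {t_b1:.3f}, 95% CI = ({ci_b1[0]:.4f}, {ci_b1[1]:.4f})")
print(f"intercept: t = {t_b0:.3f}, 95% CI = ({ci_b0[0]:.3f}, {ci_b0[1]:.3f})")
# Two-sided test at alpha = 0.05: reject H0 when |t| exceeds the rejection point.
print("reject H0: beta1 = 0?", abs(t_b1) > T_CRIT)
```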

Example: Inferences About Slope and y-Intercept

Example 11.7 The Fuel Consumption Case
Excel Output

Regression Statistics
Multiple R           0.948413871
R Square             0.899488871
Adjusted R Square    0.882737016
Standard Error       0.654208646
Observations         8

ANOVA
              df    SS           MS           F            Significance F
Regression    1     22.980816    22.980816    53.694882    0.000330052
Residual      6      2.567934     0.427989
Total         7     25.548750

Tests
              Coefficients     Standard Error    t Stat          P-value
Intercept     15.83785741      0.801773385       19.75353349     0.000001092
Temp          -0.127921715     0.01745733        -7.327679169    0.000330052

Intervals
              Coefficients     Standard Error    Lower 95%       Upper 95%
Intercept     15.83785741      0.801773385       13.87598718     17.79972765
Temp          -0.127921715     0.01745733        -0.170638294    -0.085205136

11.5 Confidence and Prediction Intervals

Prediction (x = x0)

$\hat{y} = b_0 + b_1 x_0$

Distance Value

$\text{Distance value} = \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{SS_{xx}}$

If the regression assumptions hold,

100(1 - $\alpha$)% confidence interval for the mean value of y, $\mu_{y|x_0}$:

$[\hat{y} \pm t_{\alpha/2} \, s \sqrt{\text{Distance value}}]$

100(1 - $\alpha$)% prediction interval for an individual value of y:

$[\hat{y} \pm t_{\alpha/2} \, s \sqrt{1 + \text{Distance value}}]$

$t_{\alpha/2}$ is based on n - 2 degrees of freedom.

Example: Confidence and Prediction Intervals

Example 11.7 The Fuel Consumption Case
Minitab Output (predicted FuelCons when Temp, x = 40)

Predicted Values
Fit       StDev Fit    95.0% CI            95.0% PI
10.721    0.241        (10.130, 11.312)    (9.014, 12.428)
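The Minitab figures at x = 40 can be approximated directly from the distance value formulas. This is a sketch under one stated assumption: `T_CRIT = 2.447` is $t_{0.025}$ with 6 degrees of freedom from a standard t table, hard-coded because the Python standard library has no t distribution. The helper name `intervals_at` is hypothetical.

```python
import math

x = [28.0, 28.0, 32.5, 39.0, 45.9, 57.8, 58.1, 62.5]  # temperature (deg F)
y = [12.4, 11.7, 12.4, 10.8, 9.4, 9.5, 8.0, 7.5]      # fuel consumption (MMcf)
n = len(x)
T_CRIT = 2.447  # assumption: t_{0.025}, 6 degrees of freedom, from a t table

# Least squares fit and standard error s.
x_bar, y_bar = sum(x) / n, sum(y) / n
ss_xx = sum((xi - x_bar) ** 2 for xi in x)
ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
b1 = ss_xy / ss_xx
b0 = y_bar - b1 * x_bar
s = math.sqrt(sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y)) / (n - 2))


def intervals_at(x0):
    """Return (fit, stdev_fit, confidence_interval, prediction_interval) at x0."""
    fit = b0 + b1 * x0
    distance = 1 / n + (x0 - x_bar) ** 2 / ss_xx
    stdev_fit = s * math.sqrt(distance)
    ci_half = T_CRIT * stdev_fit
    pi_half = T_CRIT * s * math.sqrt(1 + distance)
    return fit, stdev_fit, (fit - ci_half, fit + ci_half), (fit - pi_half, fit + pi_half)


fit, stdev_fit, ci, pi = intervals_at(40)
print(f"Fit = {fit:.3f}, StDev Fit = {stdev_fit:.3f}")
print(f"95% CI = ({ci[0]:.3f}, {ci[1]:.3f}), 95% PI = ({pi[0]:.3f}, {pi[1]:.3f})")
```

The prediction interval is wider than the confidence interval because it adds the variability of an individual observation (the extra 1 under the square root) to the uncertainty in the estimated mean.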

11.6 The Simple Coefficient of Determination

The simple coefficient of determination $r^2$ is

$r^2 = \frac{\text{Explained variation}}{\text{Total variation}}$

$r^2$ is the proportion of the total variation in y explained by the simple linear regression model.

Total variation = Explained variation + Unexplained variation

Total variation = $\sum (y_i - \bar{y})^2$, the Total Sum of Squares (SSTO)

Explained variation = $\sum (\hat{y}_i - \bar{y})^2$, the Regression Sum of Squares (SSR)

Unexplained variation = $\sum (y_i - \hat{y}_i)^2$, the Error Sum of Squares (SSE)

The Simple Correlation Coefficient

The simple correlation coefficient measures the strength of the linear relationship between y and x and is denoted by r.

$r = +\sqrt{r^2}$ if $b_1$ is positive, and
$r = -\sqrt{r^2}$ if $b_1$ is negative,
where $b_1$ is the slope of the least squares line.

Example 11.15 Fuel Consumption
Excel Output

Regression Statistics
Multiple R           0.948413871
R Square             0.899488871
Adjusted R Square    0.882737016
Standard Error       0.654208646
Observations         8

ANOVA
              df    SS           MS           F            Significance F
Regression    1     22.980816    22.980816    53.694882    0.000330052
Residual      6      2.567934     0.427989
Total         7     25.548750

$r^2 = \frac{22.980816}{25.548750} = 0.899489$

$r = -\sqrt{0.899489} = -0.948414$ (negative because the slope $b_1 = -0.1279$ is negative)
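The decomposition of total variation, $r^2$, and the sign rule for r can be checked numerically on the fuel consumption data. A minimal standard-library sketch; `math.copysign` is used here to give r the sign of the slope.

```python
import math

x = [28.0, 28.0, 32.5, 39.0, 45.9, 57.8, 58.1, 62.5]  # temperature (deg F)
y = [12.4, 11.7, 12.4, 10.8, 9.4, 9.5, 8.0, 7.5]      # fuel consumption (MMcf)
n = len(x)

# Least squares fit.
x_bar, y_bar = sum(x) / n, sum(y) / n
ss_xx = sum((xi - x_bar) ** 2 for xi in x)
ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
b1 = ss_xy / ss_xx
b0 = y_bar - b1 * x_bar
fitted = [b0 + b1 * xi for xi in x]

ssto = sum((yi - y_bar) ** 2 for yi in y)               # total variation
ssr = sum((fi - y_bar) ** 2 for fi in fitted)           # explained variation
sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))  # unexplained variation

r2 = ssr / ssto
r = math.copysign(math.sqrt(r2), b1)  # r takes the sign of the slope b1
print(f"SSTO = {ssto:.4f}, SSR = {ssr:.4f}, SSE = {sse:.4f}")
print(f"r^2 = {r2:.6f}, r = {r:.6f}")
```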

Different Values of the Correlation Coefficient

[Figure: scatter plots for different values of the correlation coefficient]

11.7 F Test for Simple Linear Regression Model

To test $H_0: \beta_1 = 0$ versus $H_a: \beta_1 \neq 0$ at the $\alpha$ level of significance

Test Statistic:

$F(\text{model}) = \frac{\text{Explained variation}}{(\text{Unexplained variation})/(n - 2)}$

Reject $H_0$ if $F(\text{model}) > F_\alpha$ or p-value < $\alpha$.

$F_\alpha$ is based on 1 numerator and n - 2 denominator degrees of freedom.

Example: F Test for Simple Linear Regression

Example 11.17 The Fuel Consumption Case
Excel Output

ANOVA
              df    SS           MS           F            Significance F
Regression    1     22.980816    22.980816    53.694882    0.000330052
Residual      6      2.567934     0.427989
Total         7     25.548750

F-test at $\alpha$ = 0.05 level of significance

Test Statistic:

$F(\text{model}) = \frac{\text{Explained variation}}{(\text{Unexplained variation})/(n - 2)} = \frac{22.980816}{2.567934/(8 - 2)} = 53.695$

Reject $H_0$ at the $\alpha$ = 0.05 level of significance, since
$F(\text{model}) = 53.695 > 5.99 = F_{.05}$ and
p-value = 0.00033 < 0.05.

$F_{.05}$ is based on 1 numerator and 6 denominator degrees of freedom.
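The F statistic for the fuel consumption case can be recomputed from the raw data. The sketch below takes the rejection point 5.99 from the example rather than computing it. It also checks a standard fact that the slides do not state but that the Excel output is consistent with (the F test and the slope's t test share the p-value 0.000330052): in simple linear regression, F(model) equals the square of the t statistic for the slope.

```python
import math

x = [28.0, 28.0, 32.5, 39.0, 45.9, 57.8, 58.1, 62.5]  # temperature (deg F)
y = [12.4, 11.7, 12.4, 10.8, 9.4, 9.5, 8.0, 7.5]      # fuel consumption (MMcf)
n = len(x)
F_CRIT = 5.99  # F_.05 with 1 and 6 degrees of freedom, as given in the example

# Least squares fit.
x_bar, y_bar = sum(x) / n, sum(y) / n
ss_xx = sum((xi - x_bar) ** 2 for xi in x)
ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
b1 = ss_xy / ss_xx
b0 = y_bar - b1 * x_bar
fitted = [b0 + b1 * xi for xi in x]

explained = sum((fi - y_bar) ** 2 for fi in fitted)             # SSR
unexplained = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))  # SSE
f_model = explained / (unexplained / (n - 2))

# t statistic for the slope, for comparison with F(model).
s = math.sqrt(unexplained / (n - 2))
t_b1 = b1 / (s / math.sqrt(ss_xx))

print(f"F(model) = {f_model:.3f}, t^2 = {t_b1 ** 2:.3f}")
print("reject H0 at alpha = 0.05?", f_model > F_CRIT)
```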

*11.8 Checking the Regression Assumptions by Residual Analysis

For an observed value of y, the residual is

$e = y - \hat{y}$ (observed y - predicted y)

where the predicted value of y is calculated as $\hat{y} = b_0 + b_1 x$.

If the regression assumptions hold, the residuals should look like a random sample from a normal distribution with mean 0 and variance $\sigma^2$.

Residual Plots

Residuals versus independent variables
Residuals versus predicted y's
Residuals in time order (if the response is a time series)
Histogram of residuals
Normal plot of the residuals

Checking the Constant Variance Assumption

Example 11.18: The QHIC Case
Plot: Residual versus x and predicted responses

Checking the Normality Assumption

Example 11.18: The QHIC Case
Plots: Histogram and Normal Plot of Residuals

Checking the Independence Assumption

Plots: Residuals versus Fits (to check for functional form, not shown)
Residuals versus Time Order

Combination Residual Plots

Example 11.18: The QHIC Case
Minitab Output
Plots: Histogram and Normal Plot of Residuals, Residuals versus Order (I Chart), Residuals versus Fit.

[Figure: Residual Model Diagnostics, four panels: Normal Plot of Residuals (Residual vs. Normal Score); I Chart of Residuals (Residual vs. Observation Number, center line X = 0.000, limits 3.0SL = 396.3 and -3.0SL = -396.3); Histogram of Residuals (Frequency vs. Residual); Residuals vs. Fits (Residual vs. Fit)]

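*11.9 Some Shortcut Formulas

Total variation = SSTO = $SS_{yy}$

Explained variation = SSR = $\frac{SS_{xy}^2}{SS_{xx}}$

Unexplained variation = SSE = $SS_{yy} - \frac{SS_{xy}^2}{SS_{xx}}$

where

$SS_{xy} = \sum (x_i - \bar{x})(y_i - \bar{y}) = \sum x_i y_i - \frac{(\sum x_i)(\sum y_i)}{n}$

$SS_{yy} = \sum (y_i - \bar{y})^2 = \sum y_i^2 - \frac{(\sum y_i)^2}{n}$

$SS_{xx} = \sum (x_i - \bar{x})^2 = \sum x_i^2 - \frac{(\sum x_i)^2}{n}$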
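Applied to the fuel consumption data, the shortcut formulas give the three sums of squares without computing any fitted values. A minimal sketch; the results should agree with the SS column of the Excel ANOVA output (25.548750, 22.980816, 2.567934) up to rounding.

```python
x = [28.0, 28.0, 32.5, 39.0, 45.9, 57.8, 58.1, 62.5]  # temperature (deg F)
y = [12.4, 11.7, 12.4, 10.8, 9.4, 9.5, 8.0, 7.5]      # fuel consumption (MMcf)
n = len(x)

# Computational forms of SS_xy, SS_xx, and SS_yy.
ss_xy = sum(xi * yi for xi, yi in zip(x, y)) - sum(x) * sum(y) / n
ss_xx = sum(xi ** 2 for xi in x) - sum(x) ** 2 / n
ss_yy = sum(yi ** 2 for yi in y) - sum(y) ** 2 / n

ssto = ss_yy                # total variation
ssr = ss_xy ** 2 / ss_xx    # explained variation
sse = ss_yy - ssr           # unexplained variation
print(f"SSTO = {ssto:.6f}, SSR = {ssr:.6f}, SSE = {sse:.6f}")
```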

Simple Linear Regression

Summary:
11.1 The Simple Linear Regression Model
11.2 The Least Squares Point Estimates
11.3 Model Assumptions, Mean Squared Error, Std. Error
11.4 Testing Significance of Slope and y-Intercept
11.5 Confidence Intervals and Prediction Intervals
11.6 The Coefficient of Determination and Correlation
11.7 An F Test for the Simple Linear Regression Model
11.8* Checking Regression Assumptions by Residuals
11.9* Some Shortcut Formulas
