
Simple Linear Regression - II

EM521

Applied Statistics

METU, 2012
Outline
Adequacy of a regression model
Validity of error assumptions
Linear vs. nonlinear regression models
R² - coefficient of determination
Adequacy of the Regression Model
Fitting a regression model requires several
assumptions.
1. Errors are uncorrelated random variables with
mean zero;
2. Errors have constant variance; and,
3. Errors are normally distributed.
The analyst should always consider the validity of
these assumptions to be doubtful and conduct
analyses to examine the adequacy of the model.

Adequacy of the Regression Model
Residual Analysis
The residuals from a regression model are
$e_i = y_i - \hat{y}_i$
where $y_i$ is an actual observation and $\hat{y}_i$ is the
corresponding fitted value from the regression model.

Analysis of the residuals is frequently helpful in
checking the assumption that the errors are
approximately normally distributed with constant
variance, and in determining whether additional
terms in the model would be useful.
Adequacy of the Regression Model
Residual Analysis

Some patterns for residual plots:
(a) satisfactory
(b) funnel
(c) double bow
(d) nonlinear

[Adapted from Montgomery, Peck, and Vining (2006)]

Residual Analysis
If you have the following plot, try developing a
model for log(Y + a), where Y + a > 0.
Residual Analysis
If you have the following plot, try developing a
model for arcsin(Y^{1/2}).
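
The two response transformations suggested on these slides can be sketched as small helper functions. This is only an illustration: the function names and the sample values are hypothetical, and the domain check for the arcsine transform assumes Y is a proportion between 0 and 1.

```python
import math


def log_shift(y, a=0.0):
    """Return log(y + a); the transform is defined only when y + a > 0."""
    if y + a <= 0:
        raise ValueError("log(Y + a) needs Y + a > 0")
    return math.log(y + a)


def arcsin_sqrt(y):
    """Return arcsin(sqrt(y)); assumes y is a proportion in [0, 1]."""
    if not 0.0 <= y <= 1.0:
        raise ValueError("arcsin(sqrt(Y)) needs 0 <= Y <= 1")
    return math.asin(math.sqrt(y))


# Hypothetical response values, for illustration only (not from the slides).
counts = [1.0, 10.0, 100.0]
proportions = [0.0, 0.25, 1.0]

log_counts = [log_shift(y, a=1.0) for y in counts]
angles = [arcsin_sqrt(p) for p in proportions]
```

After transforming, the regression would be refitted with the transformed response and the residual plot checked again.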
Residual Analysis
This plot is an indication that the model is nonlinear in X.
Try adding higher-order X terms to the model, such as X², or
try using an appropriate transformation of X in the model
instead of X (see next slide).
Residual Analysis: Example of a Nonlinear
Relationship

Residual Analysis: Test Scores Example
Obs     X       Y     Fit  SE Fit  Residual  St Resid
  1  1.80  26.000  25.547   1.405     0.453      0.20
  2  2.30  31.000  30.132   0.947     0.868      0.35
  3  2.60  34.000  32.883   0.755     1.117      0.43
  4  2.40  30.000  31.049   0.872    -1.049     -0.41
  5  2.80  34.000  34.717   0.696    -0.717     -0.28
  6  3.00  38.000  36.550   0.711     1.450      0.56
  7  3.40  41.000  40.218   0.931     0.782      0.31
  8  3.20  42.000  38.384   0.796     3.616      1.41
  9  3.60  40.000  42.052   1.099    -2.052     -0.84
 10  3.80  41.000  43.886   1.286    -2.886     -1.23
 11  3.80  48.000  43.886   1.286     4.114      1.75
 12  2.40  31.000  31.049   0.872    -0.049     -0.02
 13  3.20  34.000  38.384   0.796    -4.384     -1.71
 14  2.30  26.000  30.132   0.947    -4.132     -1.65
 15  2.30  33.000  30.132   0.947     2.868      1.14
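
The Fit, SE Fit, Residual, and St Resid columns can be recomputed from the X and Y columns alone. The sketch below uses ordinary least squares and assumes that the standardized residual is the residual divided by s·sqrt(1 - h_i), where h_i is the leverage of observation i. The slides do not state that formula, and the variable names are illustrative.

```python
import math

# X and Y columns copied from the test scores table.
x = [1.80, 2.30, 2.60, 2.40, 2.80, 3.00, 3.40, 3.20,
     3.60, 3.80, 3.80, 2.40, 3.20, 2.30, 2.30]
y = [26, 31, 34, 30, 34, 38, 41, 42, 40, 41, 48, 31, 34, 26, 33]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

# Least squares estimates of the slope and intercept.
slope = s_xy / s_xx
intercept = y_bar - slope * x_bar

# Fitted values and residuals e_i = y_i - fitted value.
fits = [intercept + slope * xi for xi in x]
residuals = [yi - fi for yi, fi in zip(y, fits)]

# Residual standard deviation with n - 2 degrees of freedom.
sse = sum(e ** 2 for e in residuals)
s = math.sqrt(sse / (n - 2))

# Leverage of each observation (assumed form for simple linear regression).
leverage = [1 / n + (xi - x_bar) ** 2 / s_xx for xi in x]
se_fit = [s * math.sqrt(h) for h in leverage]
st_resid = [e / (s * math.sqrt(1 - h)) for e, h in zip(residuals, leverage)]

for i in range(n):
    print(f"{i + 1:3d}  {x[i]:.2f}  {fits[i]:7.3f}  {se_fit[i]:6.3f}  "
          f"{residuals[i]:7.3f}  {st_resid[i]:6.2f}")
```

Standardized residuals larger than about 2 in absolute value would deserve a closer look; none of the 15 observations reaches that level here.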
Residual Analysis: Test Scores Example
[Figure: Normal Probability Plot (response is Y); horizontal axis: Residual, vertical axis: Percent]

Normal probability plot of residuals, the aptitude test score example.
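
The coordinates behind a normal probability plot can be sketched as follows: sort the residuals, assign each a cumulative probability, and convert that probability to a standard normal quantile. The plotting positions (i - 0.5)/n are an assumption made for this sketch. Statistical packages use slightly different conventions, so the points may not match the original figure exactly.

```python
from statistics import NormalDist

# Residual column from the test scores table.
residuals = [0.453, 0.868, 1.117, -1.049, -0.717, 1.450, 0.782, 3.616,
             -2.052, -2.886, 4.114, -0.049, -4.384, -4.132, 2.868]
n = len(residuals)

ordered = sorted(residuals)

# Assumed plotting positions (i - 0.5) / n for i = 1, ..., n.
probs = [(i - 0.5) / n for i in range(1, n + 1)]

# Standard normal quantile matching each plotting position.
z_scores = [NormalDist().inv_cdf(p) for p in probs]

for e, p, z in zip(ordered, probs, z_scores):
    print(f"{e:7.3f}  {100 * p:5.1f}%  {z:6.3f}")
```

If the errors are approximately normal, the ordered residuals plotted against these quantiles should fall close to a straight line.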


Residual Analysis: Test Scores Example
[Figure: Versus Fits (response is Y); horizontal axis: Fitted Value, vertical axis: Residual]

Plot of residuals versus predicted test scores, $\hat{y}$.

Adequacy of the Regression Model
Coefficient of Determination (R²)
The quantity

$$R^2 = \frac{SSR}{SST} = 1 - \frac{SSE}{SST}$$

is called the coefficient of determination and is often
used to judge the adequacy of a regression model.
$0 \le R^2 \le 1$
We often refer (loosely) to R² as the amount of
variability in the data explained or accounted for by the
regression model.
Adequacy of the Regression Model
Coefficient of Determination (R²)

For the aptitude test score regression model,

R² = SSR/SST
   = 455.339/548.933
   = 0.829

Thus, the model accounts for 82.9% of the variability in the data.
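
The arithmetic can be checked in a few lines. The sketch below also computes R² from the error sum of squares, using the identity SST = SSR + SSE; the variable names are illustrative.

```python
ss_r = 455.339  # regression sum of squares (SSR)
ss_t = 548.933  # total sum of squares (SST)
ss_e = ss_t - ss_r  # error sum of squares (SSE), from SST = SSR + SSE

r2_from_ssr = ss_r / ss_t
r2_from_sse = 1 - ss_e / ss_t

print(round(r2_from_ssr, 3))  # 0.829
```

Both forms give the same value, so either SSR or SSE together with SST is enough to obtain R².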

Exercise
If SST = 100 and SSE = 15, the value of R² is:

A) 0.75
B) 0.90
C) 0.85
D) None of the above
Correlation
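Assume that the joint distribution of $X_i$ and $Y_i$ is the bivariate
normal distribution.
The correlation coefficient between Y and X is:

$$\rho = \frac{\sigma_{XY}}{\sigma_X \sigma_Y}$$

where $\sigma_{XY}$ is the covariance between Y and X, and $\sigma_X$ and
$\sigma_Y$ are the standard deviations of X and Y.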

We can show that

Correlation
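The estimator of $\rho$ is the sample correlation coefficient

$$r = \frac{S_{xy}}{(S_{xx}\, SST)^{1/2}}$$

Note that

$$\hat{\beta}_1 = \left(\frac{SST}{S_{xx}}\right)^{1/2} r$$

where $\hat{\beta}_1 = S_{xy}/S_{xx}$ is the least squares estimate of the slope.

We may also write

$$r^2 = \hat{\beta}_1^2 \frac{S_{xx}}{SST} = \frac{\hat{\beta}_1 S_{xy}}{SST} = \frac{SSR}{SST} = R^2$$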

Exercise
The correlation coefficient and the slope
provide equivalent information in simple
linear regression.

A) True
B) False
Correlation: Test Scores Example
[Figure: Marginal Plot of Y vs X; horizontal axis: X (GPA), vertical axis: Y (aptitude test score)]

Scatter plot of aptitude test score versus GPA.

Correlation: Test Scores Example
Minitab Output for the Example

The regression equation is
Y = 9.04 + 9.17 X

Predictor   Coef  SE Coef     T      P
Constant   9.043    3.369  2.68  0.019
X          9.169    1.153  7.95  0.000

S = 2.68320   R-Sq = 82.9%   R-Sq(adj) = 81.6%
PRESS = 128.528   R-Sq(pred) = 76.59%

Analysis of Variance

Source          DF      SS      MS      F      P
Regression       1  455.34  455.34  63.25  0.000
Residual Error  13   93.59    7.20
Total           14  548.93
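
Several entries in this output can be derived from the sums of squares and degrees of freedom in the analysis of variance table. The sketch below assumes the usual definitions: MS = SS/DF, F = MSR/MSE, S = sqrt(MSE), adjusted R² = 1 - MSE/(SST/(n - 1)), and predicted R² = 1 - PRESS/SST. The inputs are the rounded values printed above, so the results agree with the output only up to rounding.

```python
import math

n = 15  # number of observations

# Sums of squares and PRESS as printed in the output (rounded).
ss_reg, ss_err, ss_tot = 455.34, 93.59, 548.93
press = 128.528

# Degrees of freedom for simple linear regression.
df_reg, df_err, df_tot = 1, n - 2, n - 1

ms_reg = ss_reg / df_reg
ms_err = ss_err / df_err

f_stat = ms_reg / ms_err          # reported as F
s = math.sqrt(ms_err)             # reported as S

r_sq = ss_reg / ss_tot                       # reported as R-Sq
r_sq_adj = 1 - ms_err / (ss_tot / df_tot)    # reported as R-Sq(adj)
r_sq_pred = 1 - press / ss_tot               # reported as R-Sq(pred)

# t statistic for the slope: coefficient divided by its standard error.
t_slope = 9.169 / 1.153

print(f"F = {f_stat:.2f}, S = {s:.4f}, t = {t_slope:.2f}")
print(f"R-Sq = {r_sq:.4f}, adj = {r_sq_adj:.4f}, pred = {r_sq_pred:.4f}")
```

In simple linear regression the square of the slope's t statistic equals the F statistic, which the numbers here confirm to within rounding.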

Correlation: Test Scores Example

$S_{xx} = 5.416$ and $S_{xy} = 49.66$

The sample correlation coefficient is

$$r = \frac{S_{xy}}{(S_{xx}\, SST)^{1/2}} = \frac{49.66}{[(5.416)(548.93)]^{1/2}} = 0.9108$$

Note that $r^2 = (0.9108)^2 \approx 0.829$ (which is reported in the Minitab
output), or that approximately 82.9% of the variability in test
scores is explained by the linear relationship to GPA.
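
As a quick check, the correlation and its link to the slope can be recomputed from the three summary quantities. This is a minimal sketch with illustrative variable names. It uses the fact that the least squares slope is S_xy/S_xx, which matches the coefficient 9.169 in the Minitab output.

```python
import math

s_xx = 5.416   # corrected sum of squares of X
s_xy = 49.66   # corrected sum of cross products of X and Y
ss_t = 548.93  # total sum of squares of Y (SST)

# Sample correlation coefficient.
r = s_xy / math.sqrt(s_xx * ss_t)

# Least squares slope, computed directly and recovered from r.
slope = s_xy / s_xx
slope_from_r = r * math.sqrt(ss_t / s_xx)

print(f"r = {r:.4f}, r^2 = {r ** 2:.4f}, slope = {slope:.3f}")
```

The slope and r always share the same sign, but r is unit-free while the slope depends on the scales of X and Y.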

Summary
Adequacy of a regression model
Validity of error assumptions
Normal distribution
Constant variance
Uncorrelated errors
Linear vs. nonlinear regression models
R² - coefficient of determination
Correlation and R2
