EM521
Applied Statistics
METU, 2012
Outline
Adequacy of a regression model
Validity of error assumptions
Linear vs. nonlinear regression models
R² - coefficient of determination
Adequacy of the Regression Model
Fitting a regression model requires several
assumptions.
1. Errors are uncorrelated random variables with
mean zero;
2. Errors have constant variance; and,
3. Errors are normally distributed.
The analyst should always consider the validity of
these assumptions to be doubtful and conduct
analyses to examine the adequacy of the model.
Adequacy of the Regression Model
Residual Analysis
The residuals from a regression model are
$e_i = y_i - \hat{y}_i$
where $y_i$ is an actual observation and $\hat{y}_i$ is the
corresponding fitted value from the regression model.
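The residual definition above can be sketched in a few lines of Python. This is a minimal illustration, assuming the X and Y columns of the Test Scores Example as input; the helper name `fit_line` is hypothetical and not part of the course material.

```python
# Test-scores data copied from the X and Y columns of the example table.
X = [1.8, 2.3, 2.6, 2.4, 2.8, 3.0, 3.4, 3.2, 3.6, 3.8, 3.8, 2.4, 3.2, 2.3, 2.3]
Y = [26, 31, 34, 30, 34, 38, 41, 42, 40, 41, 48, 31, 34, 26, 33]


def fit_line(x, y):
    """Return (intercept, slope) of the least-squares line (hypothetical helper)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = s_xy / s_xx
    return y_bar - slope * x_bar, slope


b0, b1 = fit_line(X, Y)

# Fitted values and residuals e_i = y_i - y_hat_i.
fitted = [b0 + b1 * xi for xi in X]
residuals = [yi - fi for yi, fi in zip(Y, fitted)]

for obs, (fi, ei) in enumerate(zip(fitted, residuals), start=1):
    print(f"{obs:2d}  fit={fi:7.3f}  residual={ei:7.3f}")
```

With an intercept in the model, the residuals sum to zero up to rounding, which is a quick sanity check on the fit.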
[Adapted from
Montgomery, Peck, and
Vining (2006)]
Residual Analysis
If you have the following plot, try developing a
model for log(Y+a) where Y+a>0
Residual Analysis
If you have the following plot, try developing a
model for $\arcsin(Y^{1/2})$
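Both suggestions are transformations of the response that are commonly used to stabilize the error variance. A minimal sketch of the two transformations follows; the function names are hypothetical, and the domain checks reflect where each transformation is defined (Y + a > 0 for the logarithm, 0 ≤ Y ≤ 1 for the arcsine square root, which is typically applied to proportions).

```python
import math


def log_shift(y, a=0.0):
    """Return log(y + a); requires y + a > 0 (hypothetical helper)."""
    if y + a <= 0:
        raise ValueError("log(Y + a) needs Y + a > 0")
    return math.log(y + a)


def arcsin_sqrt(y):
    """Return arcsin(sqrt(y)); requires 0 <= y <= 1 (hypothetical helper)."""
    if not 0.0 <= y <= 1.0:
        raise ValueError("arcsin(sqrt(Y)) needs 0 <= Y <= 1")
    return math.asin(math.sqrt(y))


# Transform a response vector before refitting the regression model.
counts = [0, 3, 10, 40]            # illustrative values only
proportions = [0.0, 0.25, 0.5, 1.0]  # illustrative values only
print([round(log_shift(c, a=1.0), 3) for c in counts])
print([round(arcsin_sqrt(p), 3) for p in proportions])
```

After transforming, the model is refitted to the transformed response and the residual plot is examined again.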
Residual Analysis
This plot is an indication
that the model is
nonlinear in X.
Try adding higher order
X terms to the model
such as $X^2$, or
try using an appropriate
transformation of X in
the model instead of X
(see next slide)
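As a rough illustration of adding a higher-order term, the sketch below fits both a straight line and a quadratic by least squares and compares their error sums of squares. The data are hypothetical (generated exactly from y = 1 + 2x + 3x²), and the helpers `solve`, `fit_poly` and `sse` are illustrative names, not part of the course material.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_poly(x, y, degree):
    """Least-squares polynomial coefficients (lowest power first) via normal equations."""
    k = degree + 1
    xtx = [[sum(xi ** (i + j) for xi in x) for j in range(k)] for i in range(k)]
    xty = [sum(yi * xi ** i for xi, yi in zip(x, y)) for i in range(k)]
    return solve(xtx, xty)


def sse(x, y, coef):
    """Error sum of squares of a fitted polynomial."""
    return sum(
        (yi - sum(c * xi ** p for p, c in enumerate(coef))) ** 2
        for xi, yi in zip(x, y)
    )


# Hypothetical curved data: y = 1 + 2x + 3x^2 with no noise.
x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
y = [1 + 2 * xi + 3 * xi ** 2 for xi in x]

linear = fit_poly(x, y, 1)
quadratic = fit_poly(x, y, 2)
print("SSE linear   :", round(sse(x, y, linear), 3))
print("SSE quadratic:", round(sse(x, y, quadratic), 3))
```

A large drop in SSE after adding the X² term, together with a residual plot that no longer shows curvature, suggests the quadratic term was needed.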
Residual Analysis: Example of a Nonlinear
Relationship
Residual Analysis: Test Scores Example
Obs     X       Y     Fit  SE Fit  Residual  St Resid
  1  1.80  26.000  25.547   1.405     0.453      0.20
  2  2.30  31.000  30.132   0.947     0.868      0.35
  3  2.60  34.000  32.883   0.755     1.117      0.43
  4  2.40  30.000  31.049   0.872    -1.049     -0.41
  5  2.80  34.000  34.717   0.696    -0.717     -0.28
  6  3.00  38.000  36.550   0.711     1.450      0.56
  7  3.40  41.000  40.218   0.931     0.782      0.31
  8  3.20  42.000  38.384   0.796     3.616      1.41
  9  3.60  40.000  42.052   1.099    -2.052     -0.84
 10  3.80  41.000  43.886   1.286    -2.886     -1.23
 11  3.80  48.000  43.886   1.286     4.114      1.75
 12  2.40  31.000  31.049   0.872    -0.049     -0.02
 13  3.20  34.000  38.384   0.796    -4.384     -1.71
 14  2.30  26.000  30.132   0.947    -4.132     -1.65
 15  2.30  33.000  30.132   0.947     2.868      1.14
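The St Resid column can be reproduced from the Residual and SE Fit columns. The sketch below assumes the usual standardized (internally studentized) residual, $e_i / \sqrt{MSE\,(1 - h_{ii})}$, together with $SE\,Fit_i^2 = MSE \cdot h_{ii}$; the MSE of 7.20 is taken from the ANOVA table of this example, and the function name is hypothetical.

```python
import math

MSE = 7.20  # residual mean square from the ANOVA table of the example


def standardized_residual(residual, se_fit, mse):
    """e_i / sqrt(MSE * (1 - h_ii)), using SE_fit^2 = MSE * h_ii (assumed definition)."""
    return residual / math.sqrt(mse - se_fit ** 2)


# (Obs, Residual, SE Fit) for three rows of the table.
rows = [(1, 0.453, 1.405), (8, 3.616, 0.796), (13, -4.384, 0.796)]
for obs, e, se in rows:
    print(obs, round(standardized_residual(e, se, MSE), 2))
```

Standardized residuals with absolute value well above 2 are usually flagged for a closer look; none of the 15 observations in this table exceeds that level.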
Residual Analysis: Test Scores Example
[Figure: Normal probability plot of the residuals (response is Y); x-axis: Residual, y-axis: Percent]
[Figure: Residual plot; x-axis: Fitted Value, y-axis: Residual]
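The coordinates of a normal probability plot can be computed directly from the residuals of the example. This is a sketch under an assumption: it uses the plotting position (i - 0.5)/n, which is one common convention and may differ from the one the statistical package used to draw the plot; the function name is hypothetical.

```python
from statistics import NormalDist

# Residual column of the Test Scores Example.
residuals = [0.453, 0.868, 1.117, -1.049, -0.717, 1.450, 0.782, 3.616,
             -2.052, -2.886, 4.114, -0.049, -4.384, -4.132, 2.868]


def normal_plot_points(values):
    """Pair each ordered residual with a standard normal quantile.

    Plotting position (i - 0.5) / n is an assumed convention.
    """
    n = len(values)
    ordered = sorted(values)
    std_normal = NormalDist()
    quantiles = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(quantiles, ordered))


points = normal_plot_points(residuals)
for z, e in points:
    print(f"{z:7.3f}  {e:7.3f}")
```

If the errors are approximately normal, the printed pairs fall close to a straight line when plotted.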
Adequacy of the Regression Model
Coefficient of Determination (R²)
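The quantity
$R^2 = SSR/SST = 1 - SSE/SST$
is the coefficient of determination: the proportion of the variability
in Y that is explained by the regression model ($0 \le R^2 \le 1$).
For the test scores example:
$R^2 = 455.339/548.933 = 0.829$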
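The arithmetic can be checked with a short sketch that computes R² both from the regression sum of squares and from the error sum of squares. The function names are hypothetical; the SSR and SST values are the ones quoted for the test scores example, and SSE is obtained as SST - SSR.

```python
def r_squared_from_ssr(ss_r, ss_t):
    """R^2 = SSR / SST."""
    return ss_r / ss_t


def r_squared_from_sse(ss_e, ss_t):
    """R^2 = 1 - SSE / SST (equivalent, because SST = SSR + SSE)."""
    return 1.0 - ss_e / ss_t


SS_R = 455.339
SS_T = 548.933
SS_E = SS_T - SS_R  # error sum of squares by subtraction

r2_a = r_squared_from_ssr(SS_R, SS_T)
r2_b = r_squared_from_sse(SS_E, SS_T)
print(round(r2_a, 4), round(r2_b, 4))
```

Both routes give the same value because the total sum of squares partitions into the regression and error components.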
Exercise
If SST = 100 and SSE = 15, the value of R² is:
A) 0.75
B) 0.90
C) 0.85
D) None of the above
Correlation
Assume that the joint distribution of Xi and Yi is the bivariate
normal distribution.
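The correlation coefficient between Y and X is:
$\rho = \sigma_{XY} / (\sigma_X \sigma_Y)$
where $\sigma_{XY}$ is the covariance between Y and X, and $\sigma_X$ and
$\sigma_Y$ are their standard deviations.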
Correlation
The estimator of $\rho$ is the sample correlation coefficient
$r = S_{xy} / (S_{xx}\, SST)^{1/2}$
Note that
$\hat{\beta}_1 = (SST / S_{xx})^{1/2}\, r$ and $r^2 = R^2$
Exercise
The correlation coefficient and the slope
provide equivalent information in simple
linear regression.
A) True
B) False
Correlation: Test Scores Example
[Figure: Marginal plot of Y vs X; x-axis: X, y-axis: Y]
Correlation: Test Scores Example
Minitab Output for the Example
The regression equation is
Y = 9.04 + 9.17 X
Analysis of Variance
Source          DF      SS      MS      F      P
Regression       1  455.34  455.34  63.25  0.000
Residual Error  13   93.59    7.20
Total           14  548.93
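The mean squares and the F statistic in the table follow from the sums of squares and degrees of freedom. A minimal sketch of that arithmetic, using the SS and DF values from the Minitab output (the variable and function names are hypothetical):

```python
def mean_square(sum_of_squares, degrees_of_freedom):
    """MS = SS / DF."""
    return sum_of_squares / degrees_of_freedom


# Values from the Analysis of Variance table.
ss_regression, df_regression = 455.34, 1
ss_error, df_error = 93.59, 13

ms_regression = mean_square(ss_regression, df_regression)
ms_error = mean_square(ss_error, df_error)

# F statistic for testing whether the slope is zero.
f_statistic = ms_regression / ms_error
print(round(ms_error, 2), round(f_statistic, 2))
```

The P column is the upper-tail probability of this F value with 1 and 13 degrees of freedom; it is not computed here because the standard library has no F distribution.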
Correlation: Test Scores Example
$r = \dfrac{S_{xy}}{(S_{xx}\, SST)^{1/2}} = \dfrac{49.66}{[(5.416)(548.93)]^{1/2}} = 0.9108$
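The same value can be recomputed from the raw data. The sketch below takes the X and Y columns of the test scores table, forms $S_{xx}$, $S_{xy}$ and SST, and then checks two standard identities for simple linear regression: $r^2 = R^2$ and $\hat{\beta}_1 = (SST/S_{xx})^{1/2}\, r$. The function name is hypothetical.

```python
import math

# X and Y columns of the Test Scores Example.
X = [1.8, 2.3, 2.6, 2.4, 2.8, 3.0, 3.4, 3.2, 3.6, 3.8, 3.8, 2.4, 3.2, 2.3, 2.3]
Y = [26, 31, 34, 30, 34, 38, 41, 42, 40, 41, 48, 31, 34, 26, 33]


def corrected_sums(x, y):
    """Return (S_xx, S_xy, SST), the corrected sums of squares and cross-products."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    ss_t = sum((yi - y_bar) ** 2 for yi in y)
    return s_xx, s_xy, ss_t


s_xx, s_xy, ss_t = corrected_sums(X, Y)

r = s_xy / math.sqrt(s_xx * ss_t)       # sample correlation coefficient
slope = s_xy / s_xx                     # least-squares slope
slope_from_r = math.sqrt(ss_t / s_xx) * r

print(round(r, 4), round(r ** 2, 3), round(slope, 2))
```

The slope and r always share the same sign, but the slope also depends on the scales of X and Y, whereas r is unit-free.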
Summary
Adequacy of a regression model
Validity of error assumptions
Normal distribution
Constant variance
Uncorrelated errors
Linear vs. nonlinear regression models
R² - coefficient of determination
Correlation and R²