Model specification
Multiple Regression
Same concepts as bivariate regression
Interpretations
$R^2$: (1) the proportion of the variability in y explained by the X variables; (2) the square of the maximal correlation between y and a weighted (linear) combination of the X variables
Coefficients ($b_p$): the average change in y for a one-unit increase in $x_p$, HOLDING THE OTHER INDEPENDENT VARIABLES CONSTANT
$$Y_i = \beta_1 + \beta_2 X_{2i} + \beta_3 X_{3i} + u_i$$
$$\hat{Y}_i = b_1 + b_2 X_{2i} + b_3 X_{3i}$$
The regression coefficients are derived using the same least squares principle used in simple regression
analysis. The fitted value of Y in observation i depends on our choice of b1, b2, and b3.
$$e_i = Y_i - \hat{Y}_i = Y_i - b_1 - b_2 X_{2i} - b_3 X_{3i}$$
The residual ei in observation i is the difference between the actual and fitted values of Y.
We define RSS, the sum of the squares of the residuals, and choose b1, b2, and b3 so as to minimize it.
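$$RSS = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} \left(Y_i - b_1 - b_2 X_{2i} - b_3 X_{3i}\right)^2$$
$$\frac{\partial RSS}{\partial b_1} = 0, \qquad \frac{\partial RSS}{\partial b_2} = 0, \qquad \frac{\partial RSS}{\partial b_3} = 0$$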
First we expand RSS, and then we apply the first-order conditions for a minimum: each partial derivative of RSS with respect to b1, b2 and b3 is set to zero.
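$$b_1 = \bar{Y} - b_2 \bar{X}_2 - b_3 \bar{X}_3$$
$$b_2 = \frac{\sum (X_{2i}-\bar{X}_2)(Y_i-\bar{Y}) \sum (X_{3i}-\bar{X}_3)^2 - \sum (X_{3i}-\bar{X}_3)(Y_i-\bar{Y}) \sum (X_{2i}-\bar{X}_2)(X_{3i}-\bar{X}_3)}{\sum (X_{2i}-\bar{X}_2)^2 \sum (X_{3i}-\bar{X}_3)^2 - \left(\sum (X_{2i}-\bar{X}_2)(X_{3i}-\bar{X}_3)\right)^2}$$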
We thus obtain three equations in three unknowns. Solving for b1, b2, and b3, we obtain the expressions
shown above. (The expression for b3 is the same as that for b2, with the subscripts 2 and 3 interchanged
everywhere.)
The expression for b1 is a straightforward extension of the expression for it in simple regression analysis.
However, the expressions for the slope coefficients are considerably more complex than that for the slope
coefficient in simple regression analysis.
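As a numerical check on these expressions, here is a minimal Python sketch. The function name `fit_two_regressors` and the data are illustrative and do not come from the slides. It builds the centered sums of squares and cross-products, applies the formula for b2, obtains b3 by interchanging the roles of X2 and X3, and recovers b1 from the means. With data generated exactly from Y = 1 + 2X2 - 0.5X3 it should return approximately (1.0, 2.0, -0.5).

```python
def fit_two_regressors(y, x2, x3):
    """Least-squares b1, b2, b3 for Y = b1 + b2*X2 + b3*X3 (two regressors)."""
    n = len(y)
    ybar, x2bar, x3bar = sum(y) / n, sum(x2) / n, sum(x3) / n
    # Centered sums of squares and cross-products
    s22 = sum((a - x2bar) ** 2 for a in x2)
    s33 = sum((c - x3bar) ** 2 for c in x3)
    s23 = sum((a - x2bar) * (c - x3bar) for a, c in zip(x2, x3))
    s2y = sum((a - x2bar) * (v - ybar) for a, v in zip(x2, y))
    s3y = sum((c - x3bar) * (v - ybar) for c, v in zip(x3, y))
    denom = s22 * s33 - s23 ** 2  # zero when X2 and X3 are perfectly collinear
    b2 = (s2y * s33 - s3y * s23) / denom
    b3 = (s3y * s22 - s2y * s23) / denom  # b2's formula with 2 and 3 interchanged
    b1 = ybar - b2 * x2bar - b3 * x3bar
    return b1, b2, b3


x2 = [1, 2, 3, 4, 5]
x3 = [2, 1, 4, 3, 6]
y = [1 + 2 * a - 0.5 * c for a, c in zip(x2, x3)]  # exact linear relationship
print(fit_two_regressors(y, x2, x3))  # approximately (1.0, 2.0, -0.5)
```

The shared denominator vanishes when X2 and X3 are perfectly correlated, which is the algebraic root of the multicollinearity problem discussed later.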
For the general case when there are many explanatory variables, ordinary algebra is inadequate. It is
necessary to switch to matrix algebra.
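In matrix notation, with the observations stacked into a design matrix X (including a column of ones for the intercept) and a vector y, the first-order conditions become the normal equations $X^\top X\, b = X^\top y$, so $b = (X^\top X)^{-1} X^\top y$ whenever $X^\top X$ is invertible. The pure-Python sketch below (the function name `ols` is ours) forms $X^\top X$ and $X^\top y$ and solves the system by Gaussian elimination with partial pivoting rather than computing an explicit inverse; in practice a numerical library would be used. On data generated exactly from Y = 1 + 2X2 - 0.5X3 it should return approximately [1.0, 2.0, -0.5].

```python
def ols(X, y):
    """Solve the normal equations (X'X) b = X'y for the OLS coefficients b.

    X is a list of rows; put a leading 1 in each row for the intercept.
    """
    k = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * yi for row, yi in zip(X, y))]
         for i in range(k)]
    # Forward elimination with partial pivoting
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("X'X is singular: perfect multicollinearity")
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(col + 1, k):
            factor = A[r][col] / A[col][col]
            for c in range(col, k + 1):
                A[r][c] -= factor * A[col][c]
    # Back substitution
    b = [0.0] * k
    for i in range(k - 1, -1, -1):
        b[i] = (A[i][k] - sum(A[i][j] * b[j] for j in range(i + 1, k))) / A[i][i]
    return b


X = [[1, 1, 2], [1, 2, 1], [1, 3, 4], [1, 4, 3], [1, 5, 6]]  # columns: 1, X2, X3
y = [1 + 2 * row[1] - 0.5 * row[2] for row in X]
print(ols(X, y))  # approximately [1.0, 2.0, -0.5]
```

The singularity check uses a fixed absolute tolerance, which is adequate for a sketch but not scale-invariant.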
Interpretation
y: house price
x1: lot size (square feet)
x2: number of bedrooms
Holding the number of bedrooms constant, each additional square foot of lot size adds an average of $20 to the price.
Assumptions of OLS
The relationship between y and x is linear: there is an equation, $y = \beta_0 + \beta_1 x + \varepsilon$, that constitutes the population model.
The errors have mean zero and constant variance; that is, $E[\varepsilon] = 0$ and $V[\varepsilon] = \sigma^2$. The spread of the errors about the regression line does not vary with x; that is, $V[\varepsilon \mid x] = \sigma^2$ for every value of x.
The errors are independent: the value of one error is not affected by the value of another error.
For each value of x, the errors have a normal distribution about the regression line with mean 0 and variance $\sigma^2$. This normal distribution is centered on the regression line. This assumption may be written as $\varepsilon \sim N(0, \sigma^2)$.
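These assumptions can be made concrete by simulating data that satisfy them and checking that OLS recovers the population parameters. The sketch below uses only Python's standard library; the parameter values (β0 = 2, β1 = 0.5, σ = 1), the sample size and the seed are arbitrary choices for illustration, not values from the slides.

```python
import random


def simulate_and_fit(beta0=2.0, beta1=0.5, sigma=1.0, n=2000, seed=1):
    """Draw (x, y) from y = beta0 + beta1*x + e with e ~ N(0, sigma^2), i.i.d.,
    then fit a simple OLS regression and estimate sigma^2."""
    rng = random.Random(seed)
    x = [rng.uniform(0, 10) for _ in range(n)]
    # Independent normal errors with mean 0 and the same variance for every x
    y = [beta0 + beta1 * xi + rng.gauss(0, sigma) for xi in x]
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    b0 = ybar - b1 * xbar
    resid = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]
    s2 = sum(e * e for e in resid) / (n - 2)  # unbiased estimate of sigma^2
    return b0, b1, s2


print(simulate_and_fit())  # estimates close to (2.0, 0.5, 1.0)
```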
Exploratory Analysis
Is the dependent variable normally distributed?
Are there outliers in the dependent variable?
Is the relationship between the dependent variable and each explanatory variable linear?
Are pairs of explanatory variables independent of each other?
Are the residuals IID (independent and identically distributed)?
Homoscedasticity
Scatter plot (predicted values on the horizontal axis, residuals on the vertical axis)
Linearity
Scatter plot
Shape of Histogram
$$\text{skewness} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^3}{n s^3}$$
$$\text{kurtosis} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^4}{n s^4}$$
Variable transformation
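The skewness and kurtosis formulas translate directly into code. In this sketch s is taken as the standard deviation with divisor n (an assumption; some software uses n - 1, which changes the values slightly). For a normal distribution the skewness is 0 and this kurtosis is 3. The example data are hypothetical and right-skewed: a power transformation such as $y^{0.3}$ reduces the skewness, and for these particular values a log transformation removes it entirely.

```python
import math


def skewness_kurtosis(x):
    """Skewness and kurtosis as defined above, with s computed using divisor n."""
    n = len(x)
    mean = sum(x) / n
    s = math.sqrt(sum((v - mean) ** 2 for v in x) / n)
    skew = sum((v - mean) ** 3 for v in x) / (n * s ** 3)
    kurt = sum((v - mean) ** 4 for v in x) / (n * s ** 4)  # about 3 for normal data
    return skew, kurt


# Hypothetical right-skewed data (not from the slides)
y = [1, 2, 4, 8, 16, 32, 64, 128]
print(skewness_kurtosis(y))                          # strongly right-skewed
print(skewness_kurtosis([v ** 0.3 for v in y]))      # power transform reduces the skew
print(skewness_kurtosis([math.log(v) for v in y]))   # log makes these values symmetric
```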
Outliers: Boxplot
Data (sorted): 5, 5, 6, 9, 10, 11, 11, 12, 12, 14, 16, 17, 19, 21, 21, 21, 21, 21, 21, 22, 23, 24, 26, 26, 31, 31, 36, 42, 44, 77
[Figure: boxplot of these values, with the maximum labeled]
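A boxplot flags as outliers the points lying more than 1.5 interquartile ranges beyond the quartiles (Tukey's fences). The sketch below applies that rule to the data above using Python's `statistics.quantiles(..., method="inclusive")`; R's `boxplot()` computes its hinges with `fivenum()`, so its fences can differ slightly, although for these data both approaches flag only the value 77. The function name `tukey_outliers` is ours.

```python
import statistics

data = [5, 5, 6, 9, 10, 11, 11, 12, 12, 14, 16, 17, 19, 21, 21, 21, 21, 21, 21,
        22, 23, 24, 26, 26, 31, 31, 36, 42, 44, 77]


def tukey_outliers(values, k=1.5):
    """Return the fences (Q1 - k*IQR, Q3 + k*IQR) and the values outside them."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return (lower, upper), [v for v in values if v < lower or v > upper]


fences, outliers = tukey_outliers(data)
print(fences, outliers)  # (-8.25, 45.75) [77]
```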
Correlation matrix
GeoDa demo
Variable selection
Initial run
> names(police)
 [1] "AREA"       "PERIMETER"  "CNTY_"      "CNTY_ID"    "NAME"       "STATE_NAME"
 [7] "STATE_FIPS" "CNTY_FIPS"  "FIPS"       "FIPSNO"     "POLICE"     "POP"
[13] "TAX"        "TRANSFER"   "INC"        "CRIME"      "UNEMP"      "OWN"
[19] "COLLEGE"    "WHITE"      "COMMUTE"
> police_full <- lm(police$POLICE ~ police$INC + police$TAX + police$CRIME +
+     police$UNEMP + police$OWN + police$COLLEGE + police$WHITE + police$COMMUTE)
> summary(police_full)
Coefficients:
                 Estimate Std. Error t value Pr(>|t|)
(Intercept)    -3681.2602  1643.0551  -2.240 0.028102 *
police$INC         0.5707     0.1552   3.678 0.000447 ***
police$TAX        -0.1777     1.5500  -0.115 0.909063
police$CRIME       2.5243     0.5264   4.796 8.33e-06 ***
police$UNEMP     -47.6656    56.5926  -0.842 0.402394
police$OWN         1.5789    18.5719   0.085 0.932483
police$COLLEGE    39.7642    12.2447   3.247 0.001760 **
police$WHITE     -12.3306     8.3782  -1.472 0.145386
police$COMMUTE    -2.1177     9.8309  -0.215 0.830050
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 896 on 73 degrees of freedom
Multiple R-squared: 0.6797,  Adjusted R-squared: 0.6445
F-statistic: 19.36 on 8 and 73 DF,  p-value: 2.796e-15
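Adjusted R-squared and the overall F statistic in this summary follow from R², the number of observations n and the number of slope coefficients k. Here k = 8 and the 73 residual degrees of freedom imply n = 73 + 8 + 1 = 82. A short Python check (the function names are ours):

```python
def adjusted_r2(r2, n, k):
    """Adjusted R-squared with n observations and k slope coefficients."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


def overall_f(r2, n, k):
    """F statistic for H0: all k slope coefficients are zero."""
    return (r2 / k) / ((1 - r2) / (n - k - 1))


print(round(adjusted_r2(0.6797, 82, 8), 4), round(overall_f(0.6797, 82, 8), 2))  # 0.6446 19.36
```

The printed adjusted value is 0.6445; the difference in the last digit comes from using R² already rounded to four decimals.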
Diagnostics of multicollinearity
> colldiag(police_full)
Condition
Index    Variance Decomposition Proportions
           intercept police$INC police$TAX police$CRIME
1    1.000     0.000      0.000      0.002        0.004
2    3.625     0.000      0.000      0.011        0.471
3    5.423     0.000      0.001      0.071        0.408
4    6.807     0.000      0.002      0.422        0.035
5    7.785     0.001      0.000      0.156        0.001
6    9.589     0.000      0.000      0.302        0.003
7   22.199     0.031      0.104      0.026        0.010
8   34.797     0.007      0.601      0.006        0.058
9   55.595     0.960      0.293      0.005        0.010
           police$UNEMP police$OWN police$COLLEGE police$WHITE police$COMMUTE
1    1.000        0.001      0.000          0.001        0.001          0.002
2    3.625        0.000      0.000          0.002        0.001          0.071
3    5.423        0.000      0.000          0.039        0.000          0.325
4    6.807        0.035      0.001          0.011        0.055          0.057
5    7.785        0.192      0.001          0.064        0.002          0.231
6    9.589        0.023      0.002          0.456        0.082          0.063
7   22.199        0.425      0.045          0.148        0.756          0.008
8   34.797        0.076      0.416          0.265        0.028          0.235
9   55.595        0.248      0.535          0.014        0.074          0.008
> vif(police_full)
    police$INC     police$TAX   police$CRIME   police$UNEMP     police$OWN
      2.561469       1.344747       1.302452       1.865568       2.225621
police$COLLEGE   police$WHITE police$COMMUTE
      1.999341       2.360316       1.491455
> cor(police[,c(11,13:20)])
              POLICE         TAX    TRANSFER        INC       CRIME
POLICE    1.00000000  0.36371791  0.96694077  0.6691571  0.57533772
TAX       0.36371791  1.00000000  0.32001111  0.2891814  0.33857309
TRANSFER  0.96694077  0.32001111  1.00000000  0.6381079  0.43073136
INC       0.66915713  0.28918142  0.63810786  1.0000000  0.32158124
CRIME     0.57533772  0.33857309  0.43073136  0.3215812  1.00000000
UNEMP    -0.20729823  0.08596563 -0.21383170 -0.4679109  0.09645134
OWN      -0.27590473 -0.25053979 -0.24399134 -0.0610233 -0.23300136
COLLEGE   0.66062780  0.38021943  0.65734105  0.5998025  0.29997201
WHITE     0.04537959 -0.17779552  0.03903367  0.3905673 -0.03203264
               UNEMP        OWN     COLLEGE       WHITE
POLICE   -0.20729823 -0.2759047  0.66062780  0.04537959
TAX       0.08596563 -0.2505398  0.38021943 -0.17779552
TRANSFER -0.21383170 -0.2439913  0.65734105  0.03903367
INC      -0.46791092 -0.0610233  0.59980246  0.39056734
CRIME     0.09645134 -0.2330014  0.29997201 -0.03203264
UNEMP     1.00000000 -0.2846407 -0.22604117 -0.59108844
OWN      -0.28464074  1.0000000 -0.36191204  0.54571601
COLLEGE  -0.22604117 -0.3619120  1.00000000  0.01510858
WHITE    -0.59108844  0.5457160  0.01510858  1.00000000
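A variance inflation factor compares the variance of a coefficient with what it would be if that regressor were uncorrelated with the others: VIF_j = 1 / (1 - R_j²), where R_j² is the R-squared from regressing regressor j on all the other regressors. The sketch below (function names ours, data hypothetical) uses the fact that with only two regressors R_j² is simply their squared correlation, and it also inverts the formula for the reported VIF of INC. All the VIFs above are below 3, well under the value of 10 often quoted as a rule of thumb for serious multicollinearity.

```python
import math


def vif_from_r2(r2_aux):
    """VIF_j = 1 / (1 - R_j^2), with R_j^2 from regressing x_j on the other regressors."""
    return 1.0 / (1.0 - r2_aux)


def r2_from_vif(vif):
    """Invert the formula: share of x_j's variance explained by the other regressors."""
    return 1.0 - 1.0 / vif


def pearson_r(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    saa = sum((u - ma) ** 2 for u in a)
    sbb = sum((v - mb) ** 2 for v in b)
    return sab / math.sqrt(saa * sbb)


# With only two regressors, the auxiliary R^2 is their squared correlation.
x2 = [1, 2, 3, 4, 5]
x3 = [2, 1, 4, 3, 6]  # hypothetical data
print(vif_from_r2(pearson_r(x2, x3) ** 2))  # about 3.083

# The reported VIF of 2.561469 for INC implies an auxiliary R^2 of about 0.61.
print(round(r2_from_vif(2.561469), 2))  # 0.61
```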
[Figure: scatterplot matrix of POLICE, TAX, TRANSFER, INC, CRIME, UNEMP, OWN, COLLEGE, WHITE and COMMUTE]
>plot(police_out, which=1:4)
[Figure: four diagnostic plots for lm(police$POLICE ~ police$INC + police$TAX + police$CRIME + police$UNEMP + ...): Residuals vs Fitted, Normal Q-Q, Scale-Location and Cook's distance; the labeled points include observations 32, 52 and 81]
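Relevant R functions
# OLS_output_file stands for a fitted lm() object, e.g. police_full
library(car)      # provides vif()
library(perturb)  # provides colldiag()
cor(police[,c(11,13:20)])                    # correlation matrix
colldiag(OLS_output_file)                    # condition indices and variance decomposition proportions
vif(OLS_output_file)                         # variance inflation factors
pairs(police[,c(11,13:20)], pch=1, cex=0.2)  # scatterplot matrix
plot(OLS_output_file, which=1:4)             # four diagnostic plots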
Concepts to review
Use a dummy variable to isolate the impact of the outliers (observations 52, 32 and 81)
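A dummy (indicator) variable equal to 1 for the flagged observations lets the model shift the intercept for those cases, which reduces their influence on the other coefficients. The sketch below is illustrative: the helper names and the toy data are hypothetical, and it assumes the DUMMY used in the next models equals 1 for observations 32, 52 and 81 (`make_dummy(82, [32, 52, 81])`). In the toy data two observations are shifted up by 10; with the dummy included, the fit recovers the original intercept and slope, and the dummy's coefficient measures the shift.

```python
def make_dummy(n, flagged):
    """Indicator: 1 for the flagged observation numbers (1-based, as in R), else 0."""
    flagged = set(flagged)
    return [1 if i in flagged else 0 for i in range(1, n + 1)]


def fit_two_regressors(y, x2, x3):
    """OLS for y = b1 + b2*x2 + b3*x3 via the closed-form two-regressor formulas."""
    n = len(y)
    my, m2, m3 = sum(y) / n, sum(x2) / n, sum(x3) / n
    s22 = sum((a - m2) ** 2 for a in x2)
    s33 = sum((c - m3) ** 2 for c in x3)
    s23 = sum((a - m2) * (c - m3) for a, c in zip(x2, x3))
    s2y = sum((a - m2) * (v - my) for a, v in zip(x2, y))
    s3y = sum((c - m3) * (v - my) for c, v in zip(x3, y))
    den = s22 * s33 - s23 ** 2
    b2 = (s2y * s33 - s3y * s23) / den
    b3 = (s3y * s22 - s2y * s23) / den
    return my - b2 * m2 - b3 * m3, b2, b3


# Hypothetical data: y = 2 + 3x, but observations 3 and 6 are shifted up by 10.
x = list(range(1, 9))
y = [2 + 3 * xi for xi in x]
for obs in (3, 6):
    y[obs - 1] += 10

d = make_dummy(len(x), [3, 6])
print(fit_two_regressors(y, x, d))  # approximately (2.0, 3.0, 10.0)
```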
SUMMARY OF OUTPUT: ORDINARY LEAST SQUARES ESTIMATION
Data set            : police
Dependent Variable  : POLICE        Number of Observations: 82
Mean dependent var  : 927.768       Number of Variables   : 10
S.D. dependent var  : 1493.61       Degrees of Freedom    : 72

R-squared           : 0.796163      F-statistic           : 31.247
Adjusted R-squared  : 0.770683      Prob(F-statistic)     : 1.868e-21
Sum squared residual: 3.72884e+07   Log likelihood        : -650.479
Sigma-square        : 517894        Akaike info criterion : 1320.96
S.E. of regression  : 719.649       Schwarz criterion     : 1345.03
Sigma-square ML     : 454736
S.E of regression ML: 674.341

-----------------------------------------------------------------------
    Variable    Coefficient     Std.Error   t-Statistic   Probability
-----------------------------------------------------------------------
    CONSTANT      -3155.889      1322.244     -2.386767     0.0196264
         TAX      0.4208701      1.248475     0.3371073     0.7370149
         INC      0.6768542     0.1257428      5.382846     0.0000009
       CRIME       2.431794     0.4230122      5.748756     0.0000002
       UNEMP       -59.4707      45.49251     -1.307264     0.1952825
         OWN       3.643113      14.92046     0.2441689     0.8077919
     COLLEGE      -5.767071      12.12848    -0.4754983     0.6358709
       WHITE       -16.7632      6.764766     -2.478017     0.0155575
     COMMUTE      -1.972053       7.89626    -0.2497452     0.8034976
       DUMMY       3644.325      568.0769      6.415197     0.0000000
-----------------------------------------------------------------------
ORDINARY LEAST SQUARES ESTIMATION
Data set            : police
Dependent Variable  : POLICE        Number of Observations: 82
Mean dependent var  : 927.768       Number of Variables   : 5
S.D. dependent var  : 1493.61       Degrees of Freedom    : 77

R-squared           : 0.790559      F-statistic           : 72.6614
Adjusted R-squared  : 0.779679      Prob(F-statistic)     : 2.28191e-25
Sum squared residual: 3.83134e+07   Log likelihood        : -651.591
Sigma-square        : 497577        Akaike info criterion : 1313.18
S.E. of regression  : 705.391       Schwarz criterion     : 1325.22
Sigma-square ML     : 467237
S.E of regression ML: 683.547

-----------------------------------------------------------------------
    Variable    Coefficient     Std.Error   t-Statistic   Probability
-----------------------------------------------------------------------
    CONSTANT      -3962.951      588.2222     -6.737166     0.0000000
         INC      0.7053126    0.09277907      7.602066     0.0000000
       CRIME       2.312218     0.3912496      5.909829     0.0000001
       WHITE      -12.32904      4.757191     -2.591664     0.0114225
       DUMMY       3484.013      441.0964      7.898529     0.0000000
-----------------------------------------------------------------------
REGRESSION DIAGNOSTICS
MULTICOLLINEARITY CONDITION NUMBER   21.087093
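The log likelihood, Akaike info criterion and Schwarz criterion in these GeoDa summaries can be recomputed from the sum of squared residuals, assuming normal errors: log L = -(n/2)[ln(2π) + ln(SSR/n) + 1], AIC = -2 log L + 2k and SC = -2 log L + k ln n, where k counts all coefficients including the constant. The sketch below (function names ours) should reproduce the reported values (-650.479, 1320.96, 1345.03 and -651.591, 1313.18, 1325.22) to within rounding.

```python
import math


def gaussian_loglik(ssr, n):
    """Maximized log-likelihood of an OLS fit with normal errors."""
    return -n / 2 * (math.log(2 * math.pi) + math.log(ssr / n) + 1)


def aic(loglik, k):
    return -2 * loglik + 2 * k


def schwarz(loglik, k, n):
    return -2 * loglik + k * math.log(n)


n = 82
for label, ssr, k in [("with all variables + DUMMY", 3.72884e7, 10),
                      ("INC, CRIME, WHITE, DUMMY", 3.83134e7, 5)]:
    ll = gaussian_loglik(ssr, n)
    print(label, round(ll, 3), round(aic(ll, k), 2), round(schwarz(ll, k, n), 2))
```

Both models use the same 82 observations, so their criteria are directly comparable: the smaller model has both the lower AIC and the lower SC.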
Backward elimination
Use Akaike Information Criterion (AIC) for model selection
The smaller the AIC, the better
police2_bw <- step(lm(police2$POLICE ~ police2$INC + police2$TAX + police2$CRIME + police2$UNEMP +
    police2$OWN + police2$COLLEGE + police2$WHITE + police2$COMMUTE + police2$DUMMY),
    direction="backward")
Step:  AIC=1080.48
police2$POLICE ~ police2$INC + police2$CRIME + police2$WHITE +
    police2$DUMMY

                Df Sum of Sq      RSS    AIC
<none>                       38313426 1080.5
- police2$WHITE  1   3342086 41655513 1085.3
- police2$CRIME  1  17378415 55691841 1109.2
- police2$INC    1  28755675 67069102 1124.4
- police2$DUMMY  1  31042215 69355641 1127.1

Coefficients:
                Estimate Std. Error t value Pr(>|t|)
(Intercept)   -3.963e+03  5.882e+02  -6.737 2.60e-09 ***
police2$INC    7.053e-01  9.278e-02   7.602 5.92e-11 ***
police2$CRIME  2.312e+00  3.912e-01   5.910 8.81e-08 ***
police2$WHITE -1.233e+01  4.757e+00  -2.592   0.0114 *
police2$DUMMY  3.484e+03  4.411e+02   7.899 1.59e-11 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
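R's `step()` reports AIC on a different scale from GeoDa: for linear models it uses `extractAIC()`, which computes n ln(RSS/n) + 2k and leaves out the constant n(1 + ln 2π). That constant is the same for every model fitted to the same data, so both scales rank models identically; for this final model, 1080.48 + 82(1 + ln 2π) ≈ 1313.18, the AIC GeoDa reports for the same four-variable model. The sketch below (function names ours) recomputes the final step's table from its RSS column; dropping any remaining variable raises AIC, which is why backward elimination stops here. Its values should agree with the printed AIC column to within about 0.1.

```python
import math

N = 82  # observations in the police data


def step_aic(rss, k, n=N):
    """AIC on the scale printed by R's step()/extractAIC() for lm fits."""
    return n * math.log(rss / n) + 2 * k


def gaussian_aic(rss, k, n=N):
    """AIC from the full normal log-likelihood, the scale GeoDa reports."""
    loglik = -n / 2 * (math.log(2 * math.pi) + math.log(rss / n) + 1)
    return -2 * loglik + 2 * k


# Final backward step: INC + CRIME + WHITE + DUMMY plus intercept, so k = 5
current = step_aic(38313426, 5)
drops = {"WHITE": 41655513, "CRIME": 55691841, "INC": 67069102, "DUMMY": 69355641}
for name, rss in drops.items():
    # True means removing this variable would raise AIC
    print(name, round(step_aic(rss, 4), 2), step_aic(rss, 4) > current)

print(round(current, 2), round(gaussian_aic(38313426, 5), 2))  # 1080.48 1313.18
```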
Start:  AIC=1088.25
police2$POLICE ~ police2$INC + police2$TAX + police2$CRIME +
    police2$UNEMP + police2$OWN + police2$COLLEGE + police2$WHITE +
    police2$COMMUTE + police2$DUMMY

                  Df Sum of Sq      RSS    AIC
- police2$OWN      1     30876 37319263 1086.3
- police2$COMMUTE  1     32302 37320689 1086.3
- police2$TAX      1     58854 37347241 1086.4
- police2$COLLEGE  1    117095 37405482 1086.5
- police2$UNEMP    1    885049 38173436 1088.2
<none>                         37288387 1088.2
- police2$WHITE    1   3180164 40468551 1093.0
- police2$INC      1  15006004 52294391 1114.0
- police2$CRIME    1  17115469 54403856 1117.2
- police2$DUMMY    1  21313811 58602197 1123.3
Step:  AIC=1086.32
police2$POLICE ~ police2$INC + police2$TAX + police2$CRIME +
    police2$UNEMP + police2$COLLEGE + police2$WHITE + police2$COMMUTE +
    police2$DUMMY

                  Df Sum of Sq      RSS    AIC
- police2$COMMUTE  1     13542 37332805 1084.3
- police2$TAX      1     62287 37381549 1084.5
- police2$COLLEGE  1    154982 37474245 1084.7
- police2$UNEMP    1    902951 38222214 1086.3
<none>                         37319263 1086.3
- police2$WHITE    1   3720805 41040068 1092.1
- police2$INC      1  15062883 52382146 1112.1
- police2$CRIME    1  17143866 54463129 1115.3
- police2$DUMMY    1  21288736 58607999 1121.3
If OWN eliminated
Step:  AIC=1084.35
police2$POLICE ~ police2$INC + police2$TAX + police2$CRIME +
    police2$UNEMP + police2$COLLEGE + police2$WHITE + police2$DUMMY

                  Df Sum of Sq      RSS    AIC
- police2$TAX      1     62140 37394944 1082.5
- police2$COLLEGE  1    152099 37484904 1082.7
- police2$UNEMP    1    892960 38225764 1084.3
<none>                         37332805 1084.3
- police2$WHITE    1   3884963 41217768 1090.5
- police2$INC      1  16632077 53964882 1112.6
- police2$CRIME    1  17219408 54552212 1113.5
- police2$DUMMY    1  21306678 58639482 1119.4
Step:  AIC=1082.49
police2$POLICE ~ police2$INC + police2$CRIME + police2$UNEMP +
    police2$COLLEGE + police2$WHITE + police2$DUMMY

                  Df Sum of Sq      RSS    AIC
- police2$COLLEGE  1    121592 37516536 1080.8
- police2$UNEMP    1    857660 38252604 1082.3
<none>                         37394944 1082.5
- police2$WHITE    1   4167247 41562191 1089.2
- police2$INC      1  17415537 54810481 1111.8
- police2$CRIME    1  18275653 55670597 1113.1
- police2$DUMMY    1  21254605 58649549 1117.4
Step:  AIC=1080.75
police2$POLICE ~ police2$INC + police2$CRIME + police2$UNEMP +
    police2$WHITE + police2$DUMMY

                Df Sum of Sq      RSS    AIC
- police2$UNEMP  1    796891 38313426 1080.5
<none>                       37516536 1080.8
- police2$WHITE  1   4130655 41647191 1087.3
- police2$CRIME  1  18154115 55670651 1111.1
- police2$INC    1  21931399 59447935 1116.5
- police2$DUMMY  1  30589229 68105765 1127.7
Step:  AIC=1080.48
police2$POLICE ~ police2$INC + police2$CRIME + police2$WHITE +
    police2$DUMMY

                Df Sum of Sq      RSS    AIC
<none>                       38313426 1080.5
- police2$WHITE  1   3342086 41655513 1085.3
- police2$CRIME  1  17378415 55691841 1109.2
- police2$INC    1  28755675 67069102 1124.4
- police2$DUMMY  1  31042215 69355641 1127.1
Forward selection
> step(lm(DV ~ 1), direction="forward", scope= ~ IV1 + IV2 + ... + IVk)
> police2_fw <- step(lm(police2$POLICE ~ 1), direction="forward",
+     scope= ~ police2$INC + police2$TAX + police2$CRIME + police2$UNEMP + police2$OWN +
+     police2$COLLEGE + police2$WHITE + police2$COMMUTE + police2$DUMMY)
Start:  AIC=1200.67
police2$POLICE ~ 1

                  Df Sum of Sq       RSS    AIC
+ police2$INC      1  81911671 101020281 1154.0
+ police2$COLLEGE  1  79836825 103095128 1155.6
+ police2$DUMMY    1  76851690 106080263 1158.0
+ police2$CRIME    1  60552944 122379008 1169.7
+ police2$TAX      1  24200200 158731753 1191.0
+ police2$COMMUTE  1  19595871 163336082 1193.4
+ police2$OWN      1  13925406 169006546 1196.2
+ police2$UNEMP    1   7861053 175070899 1199.1
<none>                         182931953 1200.7
+ police2$WHITE    1    376713 182555239 1202.5
Step:  AIC=1153.98
police2$POLICE ~ police2$INC

                  Df Sum of Sq       RSS    AIC
+ police2$DUMMY    1  38628598  62391683 1116.5
+ police2$CRIME    1  26464464  74555817 1131.1
+ police2$COLLEGE  1  19206089  81814192 1138.7
+ police2$OWN      1  10146268  90874014 1147.3
+ police2$WHITE    1  10068483  90951799 1147.4
+ police2$TAX      1   5783456  95236825 1151.1
+ police2$UNEMP    1   2622045  98398237 1153.8
<none>                         101020281 1154.0
+ police2$COMMUTE  1   1802737  99217544 1154.5
Step:  AIC=1116.46
police2$POLICE ~ police2$INC + police2$DUMMY

                  Df Sum of Sq      RSS    AIC
+ police2$CRIME    1  20736170 41655513 1085.3
+ police2$WHITE    1   6699842 55691841 1109.2
+ police2$OWN      1   3444367 58947316 1113.8
+ police2$TAX      1   3333611 59058072 1114.0
+ police2$UNEMP    1   2027396 60364287 1115.8
<none>                         62391683 1116.5
+ police2$COMMUTE  1    857584 61534099 1117.3
+ police2$COLLEGE  1    582073 61809610 1117.7
Step:  AIC=1085.33
police2$POLICE ~ police2$INC + police2$DUMMY + police2$CRIME

                  Df Sum of Sq      RSS    AIC
+ police2$WHITE    1   3342086 38313426 1080.5
<none>                         41655513 1085.3
+ police2$OWN      1    845383 40810130 1085.7
+ police2$TAX      1    417622 41237891 1086.5
+ police2$COMMUTE  1    218384 41437129 1086.9
+ police2$COLLEGE  1     87241 41568272 1087.2
+ police2$UNEMP    1      8322 41647191 1087.3
Step:  AIC=1080.48
police2$POLICE ~ police2$INC + police2$DUMMY + police2$CRIME +
    police2$WHITE

                  Df Sum of Sq      RSS    AIC
<none>                         38313426 1080.5
+ police2$UNEMP    1    796891 37516536 1080.8
+ police2$COLLEGE  1     60823 38252604 1082.3
+ police2$OWN      1     52178 38261248 1082.4
+ police2$TAX      1     13488 38299939 1082.5
+ police2$COMMUTE  1      2848 38310578 1082.5
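Coefficients:
              Pr(>|t|)
(Intercept)   2.60e-09 ***
police2$INC   5.92e-11 ***
police2$DUMMY 1.59e-11 ***
police2$CRIME 8.81e-08 ***
police2$WHITE   0.0114 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1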
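Forward selection runs the same AIC comparison in the other direction: at each step it adds the candidate whose inclusion gives the lowest AIC, and it stops when no addition beats the current model. The sketch below (function names ours) performs a single step on the `step()` scale n ln(RSS/n) + 2k, using the RSS values printed in the first and last forward steps above.

```python
import math

N = 82  # observations in the police data


def step_aic(rss, k, n=N):
    """AIC on the scale printed by R's step() for lm fits: n*log(RSS/n) + 2k."""
    return n * math.log(rss / n) + 2 * k


def best_addition(current_rss, current_k, candidate_rss):
    """One forward step: return (name, AIC) of the best addition, or
    (None, current AIC) if no candidate lowers AIC."""
    best_name, best_aic = None, step_aic(current_rss, current_k)
    for name, rss in candidate_rss.items():
        aic = step_aic(rss, current_k + 1)
        if aic < best_aic:
            best_name, best_aic = name, aic
    return best_name, best_aic


# First forward step: intercept-only model (k = 1), RSS after adding each candidate
first_step = {"INC": 101020281, "COLLEGE": 103095128, "DUMMY": 106080263,
              "CRIME": 122379008, "TAX": 158731753, "COMMUTE": 163336082,
              "OWN": 169006546, "UNEMP": 175070899, "WHITE": 182555239}
print(best_addition(182931953, 1, first_step))  # INC is selected, AIC about 1153.98

# Last forward step: INC + DUMMY + CRIME + WHITE (k = 5)
last_step = {"UNEMP": 37516536, "COLLEGE": 38252604, "OWN": 38261248,
             "TAX": 38299939, "COMMUTE": 38310578}
print(best_addition(38313426, 5, last_step))  # None: no addition lowers AIC, so stop
```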
Stepwise selection
Start:  AIC=1088.25
police2$POLICE ~ police2$INC + police2$TAX + police2$CRIME +
    police2$UNEMP + police2$OWN + police2$COLLEGE + police2$WHITE +
    police2$COMMUTE + police2$DUMMY

                  Df Sum of Sq      RSS    AIC
- police2$OWN      1     30876 37319263 1086.3
- police2$COMMUTE  1     32302 37320689 1086.3
- police2$TAX      1     58854 37347241 1086.4
- police2$COLLEGE  1    117095 37405482 1086.5
- police2$UNEMP    1    885049 38173436 1088.2
<none>                         37288387 1088.2
- police2$WHITE    1   3180164 40468551 1093.0
- police2$INC      1  15006004 52294391 1114.0
- police2$CRIME    1  17115469 54403856 1117.2
- police2$DUMMY    1  21313811 58602197 1123.3
Step:  AIC=1086.32
police2$POLICE ~ police2$INC + police2$TAX + police2$CRIME +
    police2$UNEMP + police2$COLLEGE + police2$WHITE + police2$COMMUTE +
    police2$DUMMY

                  Df Sum of Sq      RSS    AIC
- police2$COMMUTE  1     13542 37332805 1084.3
- police2$TAX      1     62287 37381549 1084.5
- police2$COLLEGE  1    154982 37474245 1084.7
- police2$UNEMP    1    902951 38222214 1086.3
<none>                         37319263 1086.3
+ police2$OWN      1     30876 37288387 1088.2
- police2$WHITE    1   3720805 41040068 1092.1
- police2$INC      1  15062883 52382146 1112.1
- police2$CRIME    1  17143866 54463129 1115.3
- police2$DUMMY    1  21288736 58607999 1121.3
Step:  AIC=1084.35
police2$POLICE ~ police2$INC + police2$TAX + police2$CRIME +
    police2$UNEMP + police2$COLLEGE + police2$WHITE + police2$DUMMY

                  Df Sum of Sq      RSS    AIC
- police2$TAX      1     62140 37394944 1082.5
- police2$COLLEGE  1    152099 37484904 1082.7
- police2$UNEMP    1    892960 38225764 1084.3
<none>                         37332805 1084.3
+ police2$COMMUTE  1     13542 37319263 1086.3
+ police2$OWN      1     12115 37320689 1086.3
- police2$WHITE    1   3884963 41217768 1090.5
- police2$INC      1  16632077 53964882 1112.6
- police2$CRIME    1  17219408 54552212 1113.5
- police2$DUMMY    1  21306678 58639482 1119.4
Step:  AIC=1082.49
police2$POLICE ~ police2$INC + police2$CRIME + police2$UNEMP +
    police2$COLLEGE + police2$WHITE + police2$DUMMY

                  Df Sum of Sq      RSS    AIC
- police2$COLLEGE  1    121592 37516536 1080.8
- police2$UNEMP    1    857660 38252604 1082.3
<none>                         37394944 1082.5
+ police2$TAX      1     62140 37332805 1084.3
+ police2$OWN      1     14159 37380785 1084.5
+ police2$COMMUTE  1     13395 37381549 1084.5
- police2$WHITE    1   4167247 41562191 1089.2
- police2$INC      1  17415537 54810481 1111.8
- police2$CRIME    1  18275653 55670597 1113.1
- police2$DUMMY    1  21254605 58649549 1117.4
Step:  AIC=1080.75
police2$POLICE ~ police2$INC + police2$CRIME + police2$UNEMP +
    police2$WHITE + police2$DUMMY

                  Df Sum of Sq      RSS    AIC
- police2$UNEMP    1    796891 38313426 1080.5
<none>                         37516536 1080.8
+ police2$COLLEGE  1    121592 37394944 1082.5
+ police2$OWN      1     37398 37479137 1082.7
+ police2$TAX      1     31632 37484904 1082.7
+ police2$COMMUTE  1     10804 37505732 1082.7
- police2$WHITE    1   4130655 41647191 1087.3
- police2$CRIME    1  18154115 55670651 1111.1
- police2$INC      1  21931399 59447935 1116.5
- police2$DUMMY    1  30589229 68105765 1127.7
Step:  AIC=1080.48
police2$POLICE ~ police2$INC + police2$CRIME + police2$WHITE +
    police2$DUMMY

                  Df Sum of Sq      RSS    AIC
<none>                         38313426 1080.5
+ police2$UNEMP    1    796891 37516536 1080.8
+ police2$COLLEGE  1     60823 38252604 1082.3
+ police2$OWN      1     52178 38261248 1082.4
+ police2$TAX      1     13488 38299939 1082.5
+ police2$COMMUTE  1      2848 38310578 1082.5
- police2$WHITE    1   3342086 41655513 1085.3
- police2$CRIME    1  17378415 55691841 1109.2
- police2$INC      1  28755675 67069102 1124.4
- police2$DUMMY    1  31042215 69355641 1127.1
Coefficients:
                Estimate Std. Error t value Pr(>|t|)
(Intercept)   -3.963e+03  5.882e+02  -6.737 2.60e-09 ***
police2$INC    7.053e-01  9.278e-02   7.602 5.92e-11 ***
police2$CRIME  2.312e+00  3.912e-01   5.910 8.81e-08 ***
police2$WHITE -1.233e+01  4.757e+00  -2.592   0.0114 *
police2$DUMMY  3.484e+03  4.411e+02   7.899 1.59e-11 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
CASE II
Data
> str(housing)
'data.frame':   211 obs. of  17 variables:
 $ STATION: int  1 2 3 4 5 6 7 8 9 10 ...
 $ PRICE  : num  47 113 165 104.3 62.5 ...
 $ NROOM  : num  4 7 7 7 7 6 6 8 6 7 ...
 $ DWELL  : num  0 1 1 1 1 1 1 1 1 1 ...
 $ NBATH  : num  1 2.5 2.5 2.5 1.5 2.5 2.5 1.5 1 2.5 ...
 $ PATIO  : num  0 1 1 1 1 1 1 1 1 1 ...
 $ FIREPL : num  0 1 1 1 1 1 1 0 1 1 ...
 $ AC     : num  0 1 0 1 0 0 1 0 1 1 ...
 $ BMENT  : num  2 2 3 2 2 3 3 0 3 3 ...
 $ NSTOR  : num  3 2 2 2 2 3 1 3 2 2 ...
 $ GAR    : num  0 2 2 2 0 1 2 0 0 2 ...
 $ AGE    : num  148 9 23 5 19 20 20 22 22 4 ...
 $ CITCOU : num  0 1 1 1 1 1 1 1 1 1 ...
 $ LOTSZ  : num  5.7 279.5 70.6 174.6 107.8 ...
 $ SQFT   : num  11.2 28.9 30.6 26.1 22 ...
 $ X      : num  907 922 920 923 918 900 918 907 918 897 ...
 $ Y      : num  534 574 581 578 574 577 576 576 562 576 ...
 - attr(*, "data_types")= chr  "N" "N" "N" "N" ...
1. Data exploration
Dependent variable
Distribution: normality, skewness
Histogram, boxplot
Transformation
IVs
Linear relationship with the DV
Scatter plot
Transformation (bulging rule)
DV
Transformation of DV
              AGE        LOTSZ        SQFT
      -0.37345798   0.59965753  0.41033577
       0.07951349   0.30723963  0.68237255
      -0.10196667   0.31666268  0.58280075
      -0.10078529   0.03797552  0.06428611
       0.33689350  -0.15961499  0.54041230
       0.11216037   0.44348023  0.38045633
AGE    1.00000000  -0.13200062  0.10549370
LOTSZ -0.13200062   1.00000000  0.41380466
SQFT   0.10549370   0.41380466  1.00000000
Then use scatter plots to identify a proper transformation for each IV:
plot(NROOM, PRICE^0.3)
lines(lowess(PRICE^0.3 ~ NROOM))
Residuals:
 Median      3Q     Max
0.02056 0.14965 0.93086
4. Check multicollinearity
> colldiag(m4)
Condition
Index    Variance Decomposition Proportions
           intercept NROOM NBATH BMENT NSTOR   GAR   AGE
1    1.000     0.000 0.000 0.001 0.001 0.000 0.002 0.001
2    2.896     0.000 0.000 0.000 0.003 0.001 0.054 0.022
3    3.200     0.000 0.000 0.000 0.001 0.000 0.291 0.010
4    3.871     0.000 0.000 0.000 0.002 0.001 0.042 0.006
5    4.067     0.000 0.000 0.002 0.005 0.000 0.062 0.004
6    4.492     0.000 0.000 0.000 0.003 0.000 0.415 0.000
7    6.312     0.000 0.000 0.000 0.095 0.001 0.013 0.112
8    6.982     0.000 0.000 0.037 0.227 0.001 0.000 0.217
9    7.984     0.001 0.002 0.054 0.276 0.007 0.004 0.151
10  12.151     0.001 0.000 0.632 0.104 0.044 0.004 0.035
11  13.337     0.035 0.040 0.064 0.262 0.029 0.014 0.363
12  21.813     0.000 0.177 0.179 0.014 0.546 0.024 0.006
13  23.044     0.027 0.743 0.005 0.004 0.001 0.000 0.000
14  41.055     0.935 0.036 0.026 0.001 0.368 0.076 0.072
           log(LOTSZ)  SQFT DWELL PATIO FIREPL    AC CITCOU
1    1.000      0.000 0.000 0.001 0.002  0.002 0.002  0.001
2    2.896      0.000 0.000 0.002 0.164  0.064 0.083  0.002
3    3.200      0.000 0.000 0.001 0.000  0.060 0.149  0.023
4    3.871      0.000 0.000 0.014 0.709  0.067 0.046  0.013
5    4.067      0.000 0.000 0.070 0.014  0.186 0.166  0.072
6    4.492      0.000 0.000 0.031 0.008  0.389 0.221  0.002
7    6.312      0.000 0.002 0.188 0.001  0.059 0.179  0.219
8    6.982      0.000 0.001 0.068 0.011  0.038 0.040  0.224
9    7.984      0.000 0.106 0.033 0.025  0.046 0.030  0.001
10  12.151      0.001 0.141 0.043 0.034  0.000 0.052  0.022
11  13.337      0.030 0.129 0.000 0.016  0.017 0.015  0.350
12  21.813      0.113 0.196 0.357 0.006  0.071 0.010  0.010
13  23.044      0.216 0.072 0.116 0.000  0.000 0.001  0.060
14  41.055      0.640 0.352 0.075 0.009  0.002 0.006  0.002
> vif(m4)
     NROOM      NBATH      BMENT      NSTOR        GAR        AGE log(LOTSZ)
  2.167749   2.015631   1.174733   2.950392   1.390686   1.817996   4.186010
      SQFT      DWELL      PATIO     FIREPL         AC     CITCOU
  4.448950   3.025165   1.263142   1.481237   1.409244   1.918044
No action taken
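Condition indices come from the eigenvalues of the cross-product matrix of the regressors after each column, including the intercept column, is scaled to unit length without centering: the j-th index is sqrt(λ_max / λ_j). As far as we know this is the Belsley-Kuh-Welsch approach that `colldiag()` implements. The general case needs an eigenvalue or singular value decomposition; the sketch below (function name ours) handles only a design with an intercept and one regressor, where the scaled matrix is [[1, c], [c, 1]] with eigenvalues 1 ± c. It shows why a regressor whose values are large relative to their spread yields a large condition index that loads on the intercept, as in the last row of both `colldiag()` tables (intercept proportions 0.960 and 0.935). Indices above about 30 are commonly treated as a warning sign.

```python
import math


def condition_indices_intercept_plus_one(x):
    """Condition indices of the design matrix [1, x] (intercept plus one regressor).

    Both columns are scaled to unit length without centering, so the scaled
    cross-product matrix is [[1, c], [c, 1]] with eigenvalues 1 + |c| and 1 - |c|.
    Each index is sqrt(lambda_max / lambda_j). Undefined if x is constant (c = 1).
    """
    n = len(x)
    c = sum(x) / (math.sqrt(n) * math.sqrt(sum(v * v for v in x)))
    eigenvalues = [1 + abs(c), 1 - abs(c)]  # largest first
    return [math.sqrt(eigenvalues[0] / lam) for lam in eigenvalues]


print(condition_indices_intercept_plus_one([1, 2, 3, 4]))          # [1.0, about 4.69]
print(condition_indices_intercept_plus_one([101, 102, 103, 104]))  # second index far above 30
```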
5. Check outliers
Add indicators
> bltm$dummy <- 0       # start with 0 for every observation
> bltm$dummy[1] <- 1
> bltm$dummy[16] <- 1
> bltm$dummy[53] <- 1
Residuals:
 Median      3Q     Max
0.01454 0.14102 1.22804
6. Model with dummy
Residuals:
 Median      3Q     Max
0.01368 0.14424 1.26457
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  1.73060    0.13238  13.073  < 2e-16 ***
NROOM        0.02788    0.01985   1.405 0.161690
NBATH        0.11679    0.03540   3.299 0.001149 **
BMENT        0.08390    0.02025   4.144 5.04e-05 ***
GAR          0.05983    0.03408   1.756 0.080684 .
log(LOTSZ)   0.11580    0.03679   3.148 0.001898 **
DWELL        0.12976    0.05876   2.208 0.028377 *
PATIO        0.10592    0.05481   1.932 0.054729 .
FIREPL       0.18950    0.04896   3.870 0.000147 ***
AC           0.16165    0.04574   3.534 0.000509 ***
CITCOU       0.29745    0.04589   6.482 6.99e-10 ***
dummy       -0.41866    0.15292  -2.738 0.006746 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
7. Backward elimination (BE) and goodness of fit
6. Model diagnostics
REGRESSION DIAGNOSTICS
MULTICOLLINEARITY CONDITION NUMBER   27.991764
TEST ON NORMALITY OF ERRORS
TEST                DF        VALUE          PROB
Jarque-Bera          2     122.9635     0.0000000

VALUE    N/A    11.3346274    12.1628749    0.8921555    1.7204031    13.0550305
PROB     N/A     0.0007608     0.0004875    0.3448939    0.1896412     0.0014626
PROB     0.0000000    0.0000000    0.0000000
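The Jarque-Bera statistic combines the skewness S and kurtosis K of the residuals, JB = (n/6)(S² + (K - 3)²/4), and is referred to a chi-square distribution with 2 degrees of freedom. For 1 and 2 degrees of freedom the chi-square upper tail has a closed form, so PROB values can be checked with the standard library alone. The sketch below is ours; running it also suggests that the unlabeled VALUE/PROB pairs above are consistent with 1-df chi-square tails for the first four statistics (for example, 11.3346274 gives about 0.00076) and a 2-df tail for the last one.

```python
import math


def jarque_bera(skew, kurt, n):
    """JB = n/6 * (S^2 + (K - 3)^2 / 4); chi-square with 2 df under normality."""
    return n / 6 * (skew ** 2 + (kurt - 3) ** 2 / 4)


def chi2_upper_tail(x, df):
    """P(chi-square_df > x), closed form for df = 1 or 2 only."""
    if df == 1:
        return math.erfc(math.sqrt(x / 2))
    if df == 2:
        return math.exp(-x / 2)
    raise ValueError("this sketch only covers df = 1 or 2")


print(chi2_upper_tail(122.9635, 2))    # Jarque-Bera: effectively 0
print(chi2_upper_tail(11.3346274, 1))  # about 0.00076
print(chi2_upper_tail(13.0550305, 2))  # about 0.00146
```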
Residual plots
R-squared          : 0.737190     R-squared (BUSE)      :
Sq. Correlation    :              Log likelihood        : -5.040731
Sigma-square       : 0.0612524    Akaike info criterion : 34.0815
S.E of regression  : 0.247492     Schwarz criterion     : 74.3038
-----------------------------------------------------------------------
    Variable    Coefficient     Std.Error       z-value   Probability
-----------------------------------------------------------------------
    CONSTANT       1.751199     0.1312221      13.34531     0.0000000
       NROOM     0.02961356    0.01906336      1.553428     0.1203208
       NBATH      0.1185139     0.0343727      3.447908     0.0005650
       BMENT     0.08276084     0.0196993      4.201207     0.0000266
         GAR     0.05483605    0.03321603      1.650891     0.0987607
    LOGLOTSZ      0.1086438    0.03611104      3.008603     0.0026247
       DWELL      0.1378764    0.05676524      2.428887     0.0151452
       PATIO     0.09818663    0.05363735      1.830565     0.0671654
      FIREPL      0.1779353    0.04749288      3.746569     0.0001793
          AC      0.1578844    0.04443501      3.553152     0.0003807
      CITCOU       0.297427    0.04709394      6.315611     0.0000000
       DUMMY     -0.4276138     0.1484325     -2.880864     0.0039660
      LAMBDA       0.121354       0.10976       1.10563     0.2688867
-----------------------------------------------------------------------
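Because this model is estimated by maximum likelihood, the table reports z-values (coefficient divided by standard error) with two-sided probabilities from the standard normal distribution rather than t-statistics, as is usual for ML estimates. The short check below (function name ours) reproduces three rows of the table.

```python
import math


def z_test(coef, se):
    """z-value and two-sided p-value from the standard normal distribution."""
    z = coef / se
    return z, math.erfc(abs(z) / math.sqrt(2))


for name, coef, se in [("NROOM", 0.02961356, 0.01906336),
                       ("DUMMY", -0.4276138, 0.1484325),
                       ("LAMBDA", 0.121354, 0.10976)]:
    z, p = z_test(coef, se)
    print(name, round(z, 4), round(p, 4))
```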
REGRESSION DIAGNOSTICS
DIAGNOSTICS FOR HETEROSKEDASTICITY
RANDOM COEFFICIENTS
TEST                   DF        VALUE          PROB
Breusch-Pagan test     11     307.1502     0.0000000
PROB 0.3151928
7. Spatial error model