
Stat-UB.0003.

Regression and Forecasting Models


Prof. Frydman
March 2, 2012
Solutions to Multiple Linear Regression Review Questions
Q1) The regression model for Salary (in dollars) was developed from a salary survey of computer professionals in a large corporation. The following variables were used to predict Salary:
X = number of years of experience
M = management, coded as 1 for a person with management experience and 0 otherwise
E = education, coded as 1 if a person had a high school diploma, 2 if a person had a college diploma and 3 for an advanced degree. E was then recoded using the following two dummy variables:
HS = 1 if the person has a high school diploma, 0 otherwise
AD = 1 if the person has an advanced degree, 0 otherwise
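As a minimal sketch (plain Python, standard library only), the recoding of E into the two dummies can be written as a small function. The name `education_dummies` is hypothetical. College (E = 2) is the baseline category and maps to HS = AD = 0, which is why the coefficients of HS and AD in Models II and III are differences relative to college graduates.

```python
def education_dummies(e):
    """Map the education code E (1 = high school, 2 = college, 3 = advanced)
    to the dummy pair (HS, AD); college is the baseline, coded (0, 0)."""
    if e not in (1, 2, 3):
        raise ValueError("E must be 1, 2 or 3")
    hs = 1 if e == 1 else 0   # high school diploma
    ad = 1 if e == 3 else 0   # advanced degree
    return hs, ad

for e in (1, 2, 3):
    print(e, education_dummies(e))
```

Two dummies suffice for three categories; adding a third dummy for college would make the columns perfectly collinear with the intercept.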
Answer the questions below based on the Minitab output that follows.
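a) Is the estimated multiple regression model I (where Education is coded as 1, 2, 3) statistically significant at \(\alpha = 0.01\)? (State the hypothesis test and your conclusion)
\(H_0: \beta_{years} = \beta_M = \beta_E = 0\)
\(H_1:\) at least one \(\beta \neq 0\)
The p-value of the overall F-test (F = 179.63) is reported as 0.000, that is, below 0.0005. Since it is smaller than \(\alpha = 0.01\) (indeed smaller than any conventional significance level), we reject \(H_0\): the model is statistically significant.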
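The overall F statistic and the fit summaries in the Model I output all follow from the ANOVA table. A minimal sketch, assuming only the sums of squares and degrees of freedom copied from that table, recomputes them; the p-value of the F-test itself needs the F distribution, which the standard library lacks (SciPy's `scipy.stats.f.sf(f, dfn, dfd)` is one option if it is available).

```python
import math

# Model I ANOVA entries copied from the Minitab output
ss_reg, ss_res, ss_tot = 928714168, 72383410, 1001097577
df_reg, df_res = 3, 42

ms_reg = ss_reg / df_reg                      # mean square for regression
ms_res = ss_res / df_res                      # mean square error
f_stat = ms_reg / ms_res                      # overall F statistic
n = df_reg + df_res + 1                       # sample size (total DF + 1)
r_sq = ss_reg / ss_tot                        # R-Sq
r_sq_adj = 1 - ms_res / (ss_tot / (n - 1))    # R-Sq(adj)
s = math.sqrt(ms_res)                         # S, standard error of the regression

print(f"n = {n}, F = {f_stat:.2f}, R-Sq = {r_sq:.1%}, "
      f"R-Sq(adj) = {r_sq_adj:.1%}, S = {s:.2f}")
```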
b) Interpret the coefficients in the estimated regression model.
\(\hat{\beta}_{years} = 570\): all else equal, an additional year of experience increases the mean salary by $570.
\(\hat{\beta}_M = 6688\): all else equal, the difference in mean salary between professionals with management experience and those without is $6688.
\(\hat{\beta}_E = 1579\): all else equal, a one-level increase in education (for example, from a college to an advanced degree) increases the mean salary by $1579.
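To see what "all else equal" means in practice, a minimal sketch can evaluate the fitted Model I equation (coefficients taken from the Coef column of the output) for a hypothetical base profile and then change one predictor at a time. The function name and the profile (5 years, no management experience, college degree) are illustrative assumptions only.

```python
def salary_model_1(years, m, e):
    """Fitted Model I: Salary-hat = 6963.5 + 570.09*Years + 6688.1*M + 1578.8*E."""
    return 6963.5 + 570.09 * years + 6688.1 * m + 1578.8 * e

# Hypothetical base profile: 5 years, no management experience, college (E = 2)
base = salary_model_1(5, 0, 2)

# Changing one predictor while holding the others fixed recovers each coefficient
print(round(salary_model_1(6, 0, 2) - base, 2))  # one more year of experience
print(round(salary_model_1(5, 1, 2) - base, 2))  # management experience
print(round(salary_model_1(5, 0, 3) - base, 2))  # one education level higher
```

The third difference is the same whether E moves from 1 to 2 or from 2 to 3; Model I forces equal steps between education levels, which is what the dummy coding in Models II and III relaxes.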
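c) Is E a statistically significant variable at \(\alpha = 0.05\)? (State the hypothesis test, rejection rule and your conclusion)
\(H_0: \beta_E = 0\)
\(H_1: \beta_E \neq 0\)
Reject \(H_0\) if \(|t_{observ}| > z_{.025} = 1.96\).
Yes. Here \(t_{observ} = 1578.8/262.3 = 6.02 > 1.96\), and the p-value of this test is essentially zero, that is, \(2P(Z > |t_{observ}|) \approx 0\).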
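The t-ratio and its two-sided p-value under the standard normal approximation used in these solutions can be checked with the standard library alone. A minimal sketch follows; `two_sided_normal_p` is a hypothetical helper name. It relies on the identity \(2[1 - \Phi(|t|)] = \mathrm{erfc}(|t|/\sqrt{2})\).

```python
import math

def two_sided_normal_p(t):
    """Two-sided p-value 2*P(Z > |t|) for a standard normal Z."""
    # 2 * (1 - Phi(|t|)) equals erfc(|t| / sqrt(2))
    return math.erfc(abs(t) / math.sqrt(2))

# Coef and SE Coef of E, copied from the Model I output
t_e = 1578.8 / 262.3
print(f"t_observ = {t_e:.2f}, p-value = {two_sided_normal_p(t_e):.2e}")
```

Minitab reports p-values from the t distribution with 42 residual degrees of freedom; with a t-ratio this large both approaches round to 0.000.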
d) Now consider multiple regression model II, where Education was coded using the dummy variables defined above. Is there evidence at \(\alpha = 0.05\) that professionals with an advanced degree have, on average, a different salary than professionals with a college degree? To answer this question, formulate an appropriate hypothesis test.
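Because college graduates (HS = AD = 0) form the baseline category in Model II, \(\beta_{AD}\) is the difference in mean salary between professionals with an advanced degree and those with a college degree, all else equal.
\(H_0: \beta_{AD} = 0\)
\(H_1: \beta_{AD} \neq 0\)
Reject \(H_0\) if \(|t_{observ}| > z_{.025} = 1.96\). Since \(|t_{observ}| = 0.38 < 1.96\), we cannot reject \(H_0\).
All else equal, there is no evidence that professionals with an advanced degree have, on average, a different salary than professionals with a college degree.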
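A minimal sketch of the same rejection rule, with the AD coefficient and its standard error copied from the Model II output. The normal-approximation p-value it computes is close to, but not identical with, the t-based 0.705 printed by Minitab.

```python
import math

Z_025 = 1.96  # upper 2.5% point of the standard normal

# Coef and SE Coef of AD, copied from the Model II output
t_ad = -147.8 / 387.7
reject = abs(t_ad) > Z_025                        # rejection rule |t| > z_.025
p_normal = math.erfc(abs(t_ad) / math.sqrt(2))    # 2 * P(Z > |t|)
print(f"t_observ = {t_ad:.2f}, reject H0: {reject}, p-value = {p_normal:.3f}")
```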
e) Interpret the coefficient of HS in the final regression model III. Which model do you prefer: I, II or III? Explain fully.
All else equal, the mean salary of professionals with a high school diploma is $3089 lower than the mean salary of professionals with a college or advanced degree (the two groups combined). I would prefer Model III because all of its variables are statistically significant, it has the highest adjusted \(R^2\) (95.4%) and the lowest \(S\) (1016.93), and it is the most informative about the influence of the predictors on Salary.
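The comparison can be made mechanical. The sketch below is a hypothetical encoding of the criteria named in the answer (every predictor significant at 5%, then highest R-Sq(adj) and lowest S), applied to the values copied from the three Minitab outputs; it is an illustration, not a general model-selection procedure.

```python
# Predictor p-values (Constant excluded), R-Sq(adj) in % and S, copied from the outputs
models = {
    "I":   {"p_values": [0.000, 0.000, 0.000], "r_sq_adj": 92.3, "s": 1312.79},
    "II":  {"p_values": [0.000, 0.000, 0.000, 0.705], "r_sq_adj": 95.3, "s": 1027.44},
    "III": {"p_values": [0.000, 0.000, 0.000], "r_sq_adj": 95.4, "s": 1016.93},
}
alpha = 0.05

# Hypothetical screening rule: keep only models whose predictors are all significant
candidates = {name: m for name, m in models.items()
              if all(p < alpha for p in m["p_values"])}

# Among the remaining models, prefer the highest R-Sq(adj) and the lowest S
best_adj = max(candidates, key=lambda name: candidates[name]["r_sq_adj"])
best_s = min(candidates, key=lambda name: candidates[name]["s"])
print(sorted(candidates), best_adj, best_s)
```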
Q2 (Condos) A Florida real estate agent collected data on a number
of condominium units of similar size within a Florida development. Her
objective was to relate PRICE to other variables listed below.
PRICE = selling price of condo unit (in dollars)
FLOOR = floor (1 to 8)
DELEV = distance from the elevator (in yards)
VIEW = 1 if the unit has an ocean view, 0 otherwise
END = 1 if end unit, 0 otherwise
FURN = 1 if furnished, 0 otherwise
a) Consider a multiple regression model involving all explanatory variables. How many condominium units were used to construct this model? Is this a statistically significant model at \(\alpha = 0.05\)?
There were 60 condominiums, since the total degrees of freedom in the ANOVA table equal \(n - 1 = 59\). The model is statistically significant because the p-value of the overall F-test (F = 9.32) is 0.000 < 0.05.
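Both facts used in this answer can be read off the ANOVA table of the full condo regression. A minimal sketch with the degrees of freedom and mean squares copied from that table:

```python
# Degrees of freedom and mean squares from the full condo regression ANOVA table
df_reg, df_res, df_tot = 5, 54, 59
ms_reg, ms_res = 38185962, 4098136

n = df_tot + 1              # total DF is n - 1
f_stat = ms_reg / ms_res    # overall F statistic, MSR / MSE
print(f"n = {n}, DF check: {df_reg + df_res == df_tot}, F = {f_stat:.2f}")
```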
b) Compute the missing p-value associated with the test for the coefficient of FLOOR using the standard normal distribution. State the hypothesis test (\(H_0\) and \(H_1\)) for which this is a p-value and interpret the p-value.
\(H_0: \beta_{FLOOR} = 0\)
\(H_1: \beta_{FLOOR} \neq 0\)
The observed test statistic is \(|t_{observ}| = 110.3/113.6 = 0.97\), so
\(\text{p-value} = 2P(Z > |t_{observ}|) = 2P(Z > 0.97) = 2[0.5 - P(0 < Z < 0.97)] = 2(0.5 - 0.3340) = 0.332.\)
The p-value is the probability of obtaining a test statistic at least as extreme as \(|t_{observ}| = 0.97\) when \(H_0\) is true. This is a large p-value, so we do not have evidence to reject \(H_0\).
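The table lookup \(P(0 < Z < 0.97)\) can be replaced by the standard normal CDF. A minimal sketch, using the FLOOR coefficient (-110.3) and its SE Coef (113.6) as printed in the full condo regression output; `std_normal_cdf` is a hypothetical helper built on `math.erf`.

```python
import math

def std_normal_cdf(z):
    """Standard normal CDF, Phi(z), via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

# Coef and SE Coef of FLOOR from the full condo regression output
t_floor = -110.3 / 113.6
# p-value = 2[0.5 - P(0 < Z < |t|)] = 2[1 - Phi(|t|)]
p = 2 * (1 - std_normal_cdf(abs(t_floor)))
print(f"t_observ = {t_floor:.2f}, p-value = {p:.3f}")
```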
c) According to the best subsets regression, which is the best set of predictors to use? Explain your choice.
The best set of predictors to use is VIEW and END. This model has the highest adjusted \(R^2\) (42.5%), the smallest standard error of the regression (S = 2003.6) and the smallest Mallows Cp (1.8).
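Adjusted \(R^2\) penalizes extra predictors through \(R^2_{adj} = 1 - (1 - R^2)\frac{n-1}{n-k-1}\), where \(k\) is the number of predictors. A minimal sketch, assuming \(n = 60\) and the R-Sq and S values copied from the best-subsets output, recomputes the R-Sq(adj) column and picks the best subset size by both criteria. Because the printed R-Sq values are rounded, the recomputed R-Sq(adj) can differ from the printed one in the last digit.

```python
n = 60  # number of condo units

# (number of predictors k, R-Sq as a fraction, S) from the best-subsets output
rows = [(1, 0.411, 2045.8), (2, 0.445, 2003.6), (3, 0.453, 2006.8),
        (4, 0.458, 2015.3), (5, 0.463, 2024.4)]

def adjusted_r_sq(r_sq, k, n):
    """R-Sq(adj) = 1 - (1 - R-Sq) * (n - 1) / (n - k - 1)."""
    return 1 - (1 - r_sq) * (n - 1) / (n - k - 1)

for k, r_sq, s in rows:
    print(k, round(100 * adjusted_r_sq(r_sq, k, n), 2), s)

best_k = max(rows, key=lambda row: adjusted_r_sq(row[1], row[0], n))[0]
smallest_s_k = min(rows, key=lambda row: row[2])[0]
print(best_k, smallest_s_k)
```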
d) Subsequently, the real estate agent estimated two models: a simple linear regression of PRICE on VIEW, and a multiple linear regression of PRICE on VIEW and END. She chose the simple linear regression of PRICE on VIEW as the final model. Explain her reasoning.
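She wanted a model whose explanatory variables are all highly statistically significant. In the regression of PRICE on VIEW and END, END has a p-value of 0.068 > 0.05, so it is not significant at the 5% level. This led her to choose the model with one variable: VIEW.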
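A minimal sketch of the check behind this choice, with the END coefficient and SE Coef copied from the regression of PRICE on VIEW and END. It uses the same normal approximation as part b), so its p-value (about 0.06) is slightly smaller than the t-based 0.068 that Minitab prints; both exceed 0.05.

```python
import math

# Coef and SE Coef of END in the regression of PRICE on VIEW and END
t_end = -2732 / 1466
p_normal = math.erfc(abs(t_end) / math.sqrt(2))   # 2 * P(Z > |t|)
print(f"t_observ = {t_end:.2f}, p-value = {p_normal:.3f}, "
      f"significant at 5%: {p_normal < 0.05}")
```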
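e) Using the regression of PRICE on VIEW, what is the estimated average difference in price of apartments with and without an ocean view?
\(\hat{\beta}_{VIEW} = 3361.3\), so condos with an ocean view sell, on average, for about $3361 more than condos without one.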
Construct a 95% confidence interval for the actual difference in the average prices of condos with and without an ocean view and interpret it.
\(3361.3 \pm 1.96(528.2) = (2326.03, 4396.57)\)
We can be 95% confident that the actual difference in the average prices of condos with and without an ocean view lies in this interval. Since the entire interval lies above zero, a condo with a view sells, on average, for more than a condo without a view.
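The interval is the point estimate plus or minus \(z_{.025}\) times its standard error. A minimal sketch with the VIEW coefficient and SE Coef copied from the regression of PRICE on VIEW:

```python
coef, se = 3361.3, 528.2   # VIEW coefficient and SE Coef
z = 1.96                   # z_.025 for a 95% interval

margin = z * se
lower, upper = coef - margin, coef + margin
print(f"95% CI: ({lower:.2f}, {upper:.2f}); entirely above zero: {lower > 0}")
```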
MODEL I: Regression Analysis: Salary versus Years, M, E
The regression equation is
Salary = 6963 + 570 Years + 6688 M + 1579 E


Predictor     Coef  SE Coef      T      P
Constant    6963.5    665.7  10.46  0.000
Years       570.09    38.56  14.78  0.000
M           6688.1    398.3  16.79  0.000
E           1578.8    262.3   6.02  0.000


S = 1312.79   R-Sq = 92.8%   R-Sq(adj) = 92.3%


Analysis of Variance

Source          DF          SS         MS       F      P
Regression       3   928714168  309571389  179.63  0.000
Residual Error  42    72383410    1723415
Total           45  1001097577




MODEL II: Regression Analysis: Salary versus Years, M, HS, AD

The regression equation is
Salary = 11180 + 546 Years + 6884 M - 3144 HS - 148 AD


Predictor      Coef  SE Coef      T      P
Constant    11179.6    366.0  30.55  0.000
Years        546.18    30.52  17.90  0.000
M            6883.5    313.9  21.93  0.000
HS          -3144.0    362.0  -8.69  0.000
AD           -147.8    387.7  -0.38  0.705


S = 1027.44   R-Sq = 95.7%   R-Sq(adj) = 95.3%



MODEL III: Regression Analysis: Salary versus Years, M, HS

The regression equation is
Salary = 11112 + 549 Years + 6859 M - 3089 HS


Predictor      Coef  SE Coef      T      P
Constant    11112.1    317.0  35.06  0.000
Years        548.79    29.44  18.64  0.000
M            6859.5    304.4  22.54  0.000
HS          -3089.1    328.6  -9.40  0.000


S = 1016.93   R-Sq = 95.7%   R-Sq(adj) = 95.4%

Regression Analysis: PRICE versus FLOOR, DELEV, VIEW, END, FURN

The regression equation is
PRICE = 17676 - 110 FLOOR + 56.9 DELEV + 3442 VIEW - 2612 END + 409 FURN


Predictor     Coef  SE Coef      T      P
Constant   17676.4    850.0  20.79  0.000
FLOOR       -110.3    113.6
DELEV        56.86    64.68   0.88  0.383
VIEW        3442.0    542.6   6.34  0.000
END          -2612     1487  -1.76  0.085
FURN         409.0    574.9   0.71  0.480


S = 2024.39   R-Sq = 46.3%   R-Sq(adj) = 41.3%


Analysis of Variance

Source          DF          SS        MS     F      P
Regression       5   190929812  38185962  9.32  0.000
Residual Error  54   221299362   4098136
Total           59   412229173

Best Subsets Regression: PRICE versus FLOOR, DELEV, VIEW, END, FURN

Response is PRICE

                        Mallows
Vars  R-Sq  R-Sq(adj)       Cp       S  Predictors (FLOOR, DELEV, VIEW, END, FURN)
   1  41.1       40.1      3.2  2045.8  X
   2  44.5       42.5      1.8  2003.6  X X
   3  45.3       42.4      3.0  2006.8  X X X
   4  45.8       41.9      4.5  2015.3  X X X X
   5  46.3       41.3      6.0  2024.4  X X X X X

Regression Analysis: PRICE versus VIEW, END

The regression equation is
PRICE = 17689 + 3543 VIEW - 2732 END


Predictor     Coef  SE Coef      T      P
Constant   17688.7    365.8  48.36  0.000
VIEW        3543.5    526.5   6.73  0.000
END          -2732     1466  -1.86  0.068

S = 2003.58   R-Sq = 44.5%   R-Sq(adj) = 42.5%

Regression Analysis: PRICE versus VIEW
PRICE = 17689 + 3361 VIEW

Predictor     Coef  SE Coef      T      P
Constant   17688.7    373.5  47.36  0.000
VIEW        3361.3    528.2   6.36  0.000

S = 2045.81   R-Sq = 41.1%   R-Sq(adj) = 40.1%