Prof. Frydman
March 2, 2012

Solutions to Multiple Linear Regression Review Questions

Q1) The regression model for Salary (in dollars) was developed from the salary survey of computer professionals in a large corporation. The following variables were used to predict Salary:

X = number of years of experience
M = management, coded as 1 for a person with management experience and 0 otherwise
E = education, coded as 1 if a person had a high school diploma, 2 if a person had a college diploma, and 3 for an advanced degree

Then E was recoded using the following two dummy variables:

HS = 1 if the person has a high school diploma, 0 otherwise
AD = 1 if the person has an advanced degree, 0 otherwise

Answer the questions below based on the Minitab output below.

a) Is the estimated multiple regression model I (where Education is coded as 1, 2, 3) statistically significant at \( \alpha = 0.01 \)? (State the hypothesis test and your conclusion.)

\( H_0: \beta_{years} = \beta_M = \beta_E = 0 \)
\( H_1: \) at least one \( \beta \neq 0 \)

Since the p-value for this test is zero (0.000 to three decimal places), the model is statistically significant at any conventional level \( \alpha > 0 \), and in particular at \( \alpha = 0.01 \).

b) Interpret the coefficients in the estimated regression model.

\( b_{years} = 570 \): all else equal, an additional year of experience increases the mean salary by $570.
\( b_M = 6688 \): all else equal, the difference in mean salary between professionals with management experience and those without is $6688.
\( b_E = 1579 \): all else equal, a one-level increase in education increases the mean salary by $1579.

c) Is E a statistically significant variable at \( \alpha = 0.05 \)? (State the hypothesis test, rejection rule and your conclusion.)

\( H_0: \beta_E = 0 \)
\( H_1: \beta_E \neq 0 \)

Reject \( H_0 \) if the p-value is less than 0.05. Yes, E is significant, because the p-value of this test is zero, that is, \( 2P(Z > |t_{observ}|) \approx 0 \), where \( t_{observ} = 6.02 \).

d) Now consider the multiple regression model II, where Education was coded using the dummy variables defined above. Is there evidence at \( \alpha = 0.05 \) that professionals with an advanced degree have, on average, a different salary than professionals with a college degree? To answer this question, formulate an appropriate hypothesis test.

\( H_0: \beta_{AD} = 0 \)
\( H_1: \beta_{AD} \neq 0 \)

Reject \( H_0 \) if \( |t_{observ}| > z_{0.025} = 1.96 \). Since \( |t_{observ}| = 0.38 \), we cannot reject \( H_0 \). All else equal, there is no evidence that professionals with an advanced degree have, on average, a different salary than professionals with a college degree.

e) Interpret the coefficient of HS in the final regression model III. Which model do you prefer: I, II or III? Explain fully.

All else equal, the mean salary of professionals with a high school diploma is $3089 lower than the mean salary of professionals with a college degree or an advanced degree (the two groups combined).
I would prefer Model III because all of its variables are statistically significant, it has the highest adjusted \( R^2 \) and the lowest \( S \), and it is the most informative about the influence of the predictors on Salary.

Q2 (Condos) A Florida real estate agent collected data on a number of condominium units of similar size within a Florida development. Her objective was to relate PRICE to the other variables listed below.

PRICE = selling price of condo unit (in dollars)
FLOOR = floor (1 to 8)
DELEV = distance from the elevator (in yards)
VIEW = 1 if ocean view, 0 otherwise
END = 1 if end unit, 0 otherwise
FURN = 1 if furnished, 0 otherwise

a) Consider a multiple regression model involving all explanatory variables. How many condominium units were used to construct this model? Is this a statistically significant model at \( \alpha = 0.05 \)?

There were 60 condominiums, since the total degrees of freedom are \( n - 1 = 59 \). The model is statistically significant because the p-value of the overall F-test is equal to zero (0.000), which is below 0.05.

b) Compute the missing p-value associated with the test for the coefficient of FLOOR using the standard normal distribution. State the hypothesis test (\( H_0 \) and \( H_1 \)) for which this is a p-value and interpret the p-value.

\( H_0: \beta_{FLOOR} = 0 \)
\( H_1: \beta_{FLOOR} \neq 0 \)

Since \( |t_{observ}| = 110.3/113.6 = 0.97 \),

\( \text{p-value} = 2P(Z > |t_{observ}|) = 2P(Z > 0.97) = 2[0.5 - P(0 < Z < 0.97)] = 2(0.5 - 0.3340) = 0.332. \)

The p-value is the probability of obtaining \( |t_{observ}| = 0.97 \) or a more extreme value when \( H_0 \) is true. This is a large p-value, so we do not have evidence to reject \( H_0 \).

c) According to the best subsets regression, which is the best set of predictors to use? Explain your choice.

The best set of predictors to use is VIEW and END. This model has the highest adjusted \( R^2 \) and the smallest standard deviation of regression.

d) Subsequently the real estate agent estimated two models: a simple linear regression of PRICE on VIEW, and a linear regression of PRICE on VIEW and END. She chose the simple linear regression model of PRICE on VIEW as the final model.
Explain her reasoning.

She wanted a model in which every explanatory variable is highly statistically significant. In the two-variable model, END has a p-value of 0.068, which is not significant at the 5% level. This led her to choose the model with one variable, VIEW.

e) Using the regression of PRICE on VIEW, what is the estimated average difference in price of apartments with and without an ocean view?

\( b_{VIEW} = 3361.3 \), so the estimated average difference is $3361.3.

Construct a 95% confidence interval for the actual difference in the average prices of condos with and without an ocean view and interpret it.

\( 3361.3 \pm 1.96(528.2) = (2326.03,\ 4396.57) \)

We can be 95% confident that the actual difference in the average prices of condos with and without an ocean view lies in this interval. Since the whole interval lies above zero, a condo with a view sells, on average, for more than a condo without a view.

MODEL I: Regression Analysis: Salary versus Years, M, E

The regression equation is
Salary = 6963 + 570 Years + 6688 M + 1579 E
Predictor     Coef  SE Coef      T      P
Constant    6963.5    665.7  10.46  0.000
Years       570.09    38.56  14.78  0.000
M           6688.1    398.3  16.79  0.000
E           1578.8    262.3   6.02  0.000

S = 1312.79   R-Sq = 92.8%   R-Sq(adj) = 92.3%

Analysis of Variance

Source          DF          SS         MS       F      P
Regression       3   928714168  309571389  179.63  0.000
Residual Error  42    72383410    1723415
Total           45  1001097577
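The summary statistics reported for Model I can be recomputed from the analysis-of-variance table, which gives a quick check on the overall F-test used in Q1(a). The following is a minimal Python sketch under that assumption; the function name `anova_summary` and its argument names are illustrative choices and are not part of the Minitab output.

```python
import math

def anova_summary(ss_regression, ss_error, df_regression, df_error):
    """Recompute F, R-Sq, R-Sq(adj) and S from ANOVA sums of squares."""
    ss_total = ss_regression + ss_error
    df_total = df_regression + df_error
    ms_regression = ss_regression / df_regression  # mean square, regression
    ms_error = ss_error / df_error                 # mean square, residual error
    return {
        "F": ms_regression / ms_error,
        "R-Sq": ss_regression / ss_total,
        "R-Sq(adj)": 1 - ms_error / (ss_total / df_total),
        "S": math.sqrt(ms_error),
    }

# Sums of squares and degrees of freedom copied from the Model I table.
model_1 = anova_summary(928714168, 72383410, 3, 42)
for name, value in model_1.items():
    print(f"{name}: {value:.4f}")
```

The recomputed values should agree with the printed F, R-Sq, R-Sq(adj) and S up to rounding.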
MODEL II: Regression Analysis: Salary versus Years, M, HS, AD

The regression equation is
Salary = 11180 + 546 Years + 6884 M - 3144 HS - 148 AD

Predictor      Coef  SE Coef      T      P
Constant    11179.6    366.0  30.55  0.000
Years        546.18    30.52  17.90  0.000
M            6883.5    313.9  21.93  0.000
HS          -3144.0    362.0  -8.69  0.000
AD           -147.8    387.7  -0.38  0.705

S = 1027.44   R-Sq = 95.7%   R-Sq(adj) = 95.3%
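The test for AD in Q1(d) uses only the coefficient and its standard error from the Model II table. As a hedged illustration, the sketch below applies the rejection rule with the standard normal distribution, as the solutions do; the helper `two_sided_z_test` is a hypothetical name introduced here, and the printed P in the Minitab table is presumably based on a t distribution, so it can differ slightly.

```python
from statistics import NormalDist

def two_sided_z_test(coef, se_coef, alpha=0.05):
    """Test H0: beta = 0 against H1: beta != 0 using the standard normal."""
    t_observed = coef / se_coef
    critical = NormalDist().inv_cdf(1 - alpha / 2)   # z_{alpha/2}
    p_value = 2 * (1 - NormalDist().cdf(abs(t_observed)))
    reject = abs(t_observed) > critical
    return t_observed, critical, p_value, reject

# Coef and SE Coef for AD, copied from the Model II output.
t_ad, z_crit, p_ad, reject_ad = two_sided_z_test(-147.8, 387.7)
print(f"t = {t_ad:.2f}, critical value = {z_crit:.2f}, "
      f"p-value = {p_ad:.3f}, reject H0: {reject_ad}")
```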
MODEL III: Regression Analysis: Salary versus Years, M, HS

The regression equation is
Salary = 11112 + 549 Years + 6859 M - 3089 HS

Predictor      Coef  SE Coef      T      P
Constant    11112.1    317.0  35.06  0.000
Years        548.79    29.44  18.64  0.000
M            6859.5    304.4  22.54  0.000
HS          -3089.1    328.6  -9.40  0.000

S = 1016.93   R-Sq = 95.7%   R-Sq(adj) = 95.4%
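The comparison of Models I, II and III in Q1(e), and the meaning of the HS coefficient, can be sketched in a few lines of Python. The dictionary layout, the function `predict_model_3` and the example profile (5 years of experience, management experience) are assumptions made purely for illustration; the numbers themselves are copied from the three outputs.

```python
# R-Sq(adj) in percent and S, as reported for each salary model.
models = {
    "I": {"adj_r_sq": 92.3, "s": 1312.79},
    "II": {"adj_r_sq": 95.3, "s": 1027.44},
    "III": {"adj_r_sq": 95.4, "s": 1016.93},
}
best_by_adj = max(models, key=lambda name: models[name]["adj_r_sq"])
best_by_s = min(models, key=lambda name: models[name]["s"])
print("Highest R-Sq(adj):", best_by_adj, "| lowest S:", best_by_s)

def predict_model_3(years, m, hs):
    """Fitted mean salary from Model III (coefficients from the table)."""
    return 11112.1 + 548.79 * years + 6859.5 * m - 3089.1 * hs

# Hypothetical profile: 5 years of experience, management experience.
# Only HS changes, so the gap equals the HS coefficient.
gap = predict_model_3(5, 1, 1) - predict_model_3(5, 1, 0)
print(f"High school vs. college/advanced, all else equal: {gap:.1f}")
```

Because Model III has no interaction terms, the gap is the same for any choice of years and M.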
Regression Analysis: PRICE versus FLOOR, DELEV, VIEW, END, FURN

The regression equation is
PRICE = 17676 - 110 FLOOR + 56.9 DELEV + 3442 VIEW - 2612 END + 409 FURN

Predictor      Coef  SE Coef      T      P
Constant    17676.4    850.0  20.79  0.000
FLOOR        -110.3    113.6
DELEV         56.86    64.68   0.88  0.383
VIEW         3442.0    542.6   6.34  0.000
END           -2612     1487  -1.76  0.085
FURN          409.0    574.9   0.71  0.480

S = 2024.39   R-Sq = 46.3%   R-Sq(adj) = 41.3%

Analysis of Variance

Source          DF         SS        MS     F      P
Regression       5  190929812  38185962  9.32  0.000
Residual Error  54  221299362   4098136
Total           59  412229173
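The blank T and P entries for FLOOR, and the sample size asked for in Q2(a), follow directly from the numbers in this output. A minimal sketch, assuming the standard normal approximation that Q2(b) asks for; the variable names are illustrative only.

```python
from statistics import NormalDist

# Coef and SE Coef for FLOOR, copied from the table above.
coef_floor = -110.3
se_floor = 113.6

t_floor = coef_floor / se_floor                         # observed test statistic
p_floor = 2 * (1 - NormalDist().cdf(abs(t_floor)))      # two-sided p-value

# Total degrees of freedom equal n - 1, so n = 59 + 1.
n_units = 59 + 1

print(f"t = {t_floor:.2f}, p-value = {p_floor:.3f}, n = {n_units}")
```

A p-value this far above 0.05 gives no evidence against a zero FLOOR coefficient.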
Best Subsets Regression: PRICE versus FLOOR, DELEV, VIEW, END, FURN

Response is PRICE

                          Mallows           Included predictors
Vars  R-Sq  R-Sq(adj)          Cp       S   (of FLOOR, DELEV, VIEW, END, FURN)
   1  41.1       40.1         3.2  2045.8   X
   2  44.5       42.5         1.8  2003.6   X X
   3  45.3       42.4         3.0  2006.8   X X X
   4  45.8       41.9         4.5  2015.3   X X X X
   5  46.3       41.3         6.0  2024.4   X X X X X
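The choice in Q2(c) rests on two columns of the best subsets table: the highest R-Sq(adj) and the smallest S. As a hedged illustration, the sketch below picks the row that wins on each criterion; the list-of-dictionaries layout and the key names are assumptions made for this example, and rows are identified only by their number of predictors.

```python
# Rows of the best subsets table: Vars, R-Sq, R-Sq(adj), Mallows Cp, S.
best_subsets = [
    {"vars": 1, "r_sq": 41.1, "adj_r_sq": 40.1, "cp": 3.2, "s": 2045.8},
    {"vars": 2, "r_sq": 44.5, "adj_r_sq": 42.5, "cp": 1.8, "s": 2003.6},
    {"vars": 3, "r_sq": 45.3, "adj_r_sq": 42.4, "cp": 3.0, "s": 2006.8},
    {"vars": 4, "r_sq": 45.8, "adj_r_sq": 41.9, "cp": 4.5, "s": 2015.3},
    {"vars": 5, "r_sq": 46.3, "adj_r_sq": 41.3, "cp": 6.0, "s": 2024.4},
]

# R-Sq always rises as predictors are added, so it is not used for selection.
by_adj = max(best_subsets, key=lambda row: row["adj_r_sq"])
by_s = min(best_subsets, key=lambda row: row["s"])

print("Highest R-Sq(adj):", by_adj["vars"], "predictors")
print("Smallest S:", by_s["vars"], "predictors")
```

Both criteria point to the same row, which the solution to Q2(c) identifies as the model with VIEW and END.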
Regression Analysis: PRICE versus VIEW, END

The regression equation is
PRICE = 17689 + 3543 VIEW - 2732 END

Predictor      Coef  SE Coef      T      P
Constant    17688.7    365.8  48.36  0.000
VIEW         3543.5    526.5   6.73  0.000
END           -2732     1466  -1.86  0.068

S = 2003.58   R-Sq = 44.5%   R-Sq(adj) = 42.5%
Regression Analysis: PRICE versus VIEW

PRICE = 17689 + 3361 VIEW

Predictor      Coef  SE Coef      T      P
Constant    17688.7    373.5  47.36  0.000
VIEW         3361.3    528.2   6.36  0.000
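The 95% confidence interval in Q2(e) is the VIEW coefficient plus or minus 1.96 standard errors. A minimal sketch of that arithmetic, assuming the normal critical value 1.96 used in the solutions; the function name `z_confidence_interval` is a hypothetical helper introduced for this example.

```python
def z_confidence_interval(estimate, std_error, z=1.96):
    """Return (lower, upper) for estimate +/- z * std_error."""
    margin = z * std_error
    return estimate - margin, estimate + margin

# Coef and SE Coef for VIEW, copied from the table above.
lower, upper = z_confidence_interval(3361.3, 528.2)
print(f"95% CI for the VIEW effect: ({lower:.2f}, {upper:.2f})")

# An interval entirely above zero indicates a positive average price
# difference for condos with an ocean view.
interval_above_zero = lower > 0
print("Interval lies above zero:", interval_above_zero)
```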