
Chapter 13

Multiple Regression

True / False Questions

1. In regression the dependent variable is referred to as the response variable.

True False

2. If a regression model's F test statistic is Fcalc = 43.82, we could say that the
explained variance is approximately 44 percent.

True False

3. In a regression, the model with the best fit is preferred over all other models.

True False

4. A common misinterpretation of the principle of Occam's Razor is that a simple regression model (rather than a multiple regression model) is always best.

True False
5. A predictor whose pairwise correlation with Y is near zero can still have a
significant t-value in a multiple regression when other predictors are included.

True False

6. The F statistic in a multiple regression is significant if at least one of the predictors has a significant t statistic at a given α.

True False

7. R2adj can exceed R2 if there are several weak predictors.

True False

8. A binary (categorical) predictor should not be used along with nonbinary predictors.

True False

9. In a multiple regression with 3 predictors in a sample of 25 U.S. cities, we would use F3, 21 in a test of overall significance.

True False
10. Evans' Rule says that if n = 50 you need at least 5 predictors to have a good
model.

True False

11. The model Y = β0 + β1X + β2X² cannot be estimated by Excel because of the nonlinear term.

True False

12. The random error term in a regression model reflects all factors omitted from the
model.

True False

13. If the probability plot of residuals resembles a straight line, the residuals show a
fairly good fit to the normal distribution.

True False

14. Confidence intervals for Y may be unreliable when the residuals are not normally
distributed.

True False
15. A negative estimated coefficient in a regression usually indicates a weak
predictor.

True False

16. For a certain firm, the regression equation Bonus = 2,000 + 257 Experience +
0.046 Salary describes employee bonuses with a standard error of 125. John has 10
years' experience, earns $50,000, and earned a bonus of $7,000. John is an
outlier.

True False

17. There is one residual for each predictor in the regression model.

True False

18. If R2 and R2adj differ greatly, we should probably add a few predictors to improve
the fit.

True False

19. The effect of a binary predictor is to shift the regression intercept.

True False
20. A parsimonious model is one with many weak predictors but a few strong ones.

True False

21. The F statistic and its p-value give a global test of significance for a multiple
regression.

True False

22. In a regression model of student grades, we would code the nine categories of
business courses taken (ACC, FIN, ECN, MGT, MKT, MIS, ORG, POM, QMM) by
including nine binary (0 or 1) predictors in the regression.

True False

23. A disadvantage of Excel's Data Analysis regression tool is that it expects the
independent variables to be in a block of contiguous columns so you must delete
a column if you want to eliminate a predictor from the model.

True False

24. A disadvantage of Excel's regression is that it does not give as much accuracy in
the estimated regression coefficients as a package like MINITAB.

True False
25. Nonnormality of the residuals from a regression can best be detected by looking
at the residual plots against the fitted Y values.

True False

26. A high variance inflation factor (VIF) indicates a significant predictor in the
regression.

True False

27. Autocorrelation may be detected by looking at a plot of the residuals against time.

True False

28. A widening pattern of residuals as X increases would suggest heteroscedasticity.

True False

29. Plotting the residuals against a binary predictor (X = 0, 1) reveals nothing about
heteroscedasticity.

True False
30. The regression equation Bonus = 2,812 + 27 Experience + 0.046 Salary says that
Experience is the most significant predictor of Bonus.

True False

31. A multiple regression with 60 observations should not have 13 predictors.

True False

32. A regression of Y using four independent variables X1, X2, X3, X4 could also have
up to four nonlinear terms (Xj²) and six simple interaction terms (XjXk) if you have
enough observations to justify them.

True False

33. When autocorrelation is present, the estimates of the coefficients will be unbiased.

True False

34. If the residuals in your regression are nonnormal, a larger sample size might help
improve the reliability of confidence intervals for Y.

True False
35. Multicollinearity can be detected from t tests of the predictor variables.

True False

36. When multicollinearity is present, the regression model is of no use for making
predictions.

True False

37. Autocorrelation of the residuals may affect the reliability of the t values for the
estimated coefficients of the predictors X1, X2, . . . , Xk.

True False

38. The first differences transformation might be tried if autocorrelation is found in a time-series data set.

True False

39. Statisticians who work with cross-sectional data generally do not anticipate
autocorrelation.

True False
40. The ill effects of heteroscedasticity might be mitigated by redefining totals (e.g.,
total number of homicides) as relative values (e.g., homicide rate per 100,000
population).

True False

41. Nonnormal residuals lead to biased estimates of the coefficients in a regression model.

True False

42. A large VIF (e.g., 10 or more) would indicate multicollinearity.

True False

43. Heteroscedasticity exists when all the errors (residuals) have the same variance.

True False

44. Multicollinearity refers to relationships among the independent variables.

True False

45. A squared predictor is used to test for nonlinearity in the predictor's relationship
to Y.

True False
46. Nonnormality of residuals is not usually considered a major problem unless there
are outliers.

True False

47. In the fitted regression Y = 12 + 3X1 - 5X2 + 27X3 + 2X4 the most significant
predictor is X3.

True False

48. Given that the fitted regression is Y = 76.40 - 6.388X1 + 0.870X2, the standard error of b1 is 1.453, and n = 63. At α = .05, we can conclude that X1 is a significant predictor of Y.

True False

49. Unlike other predictors, a binary predictor has a t-value that is either 0 or 1.

True False

50. The t-test shows the ratio of an estimated coefficient to its standard error.

True False
51. In a multiple regression with five predictors in a sample of 56 U.S. cities, we would
use F5, 50 in a test of overall significance.

True False

Multiple Choice Questions

52. In a multiple regression with six predictors in a sample of 67 U.S. cities, what would be the critical value for an F-test of overall significance at α = .05?

A. 2.29

B. 2.25

C. 2.37

D. 2.18

53. In a multiple regression with five predictors in a sample of 56 U.S. cities, what would be the critical value for an F-test of overall significance at α = .05?

A. 2.45

B. 2.37

C. 2.40

D. 2.56
54. When predictor variables are strongly related to each other, the __________ of the
regression estimates is questionable.

A. logic

B. fit

C. parsimony

D. stability

55. A test is conducted in 22 cities to see if giving away free transit system maps will
increase the number of bus riders. In a regression analysis, the dependent variable
Y is the increase in bus riders (in thousands of persons) from the start of the test
until its conclusion. The independent variables are X1 = the number (in thousands)
of free maps distributed and a binary variable X2 = 1 if the city has free downtown
parking, 0 otherwise. The estimated regression equation is

In city 3, the observed Y value is 7.3 and X1 = 140 and X2 = 0. The residual for city
3 (in thousands) is:

A. 6.15.

B. 1.15.

C. 4.83.

D. 1.57.
56. If X2 is a binary predictor in Y = β0 + β1X1 + β2X2, then which statement is most nearly correct?

A. X2 = 1 should represent the most desirable condition.

B. X2 would be a significant predictor if b2 = 423.72.

C. X2 = 0, X2 = 1, X2 = 2 would be appropriate if three categories exist.

D. X2 will shift the estimated equation either by 0 units or by b2 units.

57. The unexplained sum of squares measures variation in the dependent variable Y
about the:

A. mean of the Y values.

B. estimated Y values.

C. mean of the X values.

D. Y-intercept.

58. Which of the following is not true of the standard error of the regression?

A. It is a measure of the accuracy of the prediction.

B. It is based on squared vertical deviations between the actual and predicted values of Y.

C. It would be negative when there is an inverse relationship in the model.

D. It is used in constructing confidence and prediction intervals for Y.


59. A multiple regression analysis with two independent variables yielded the
following results in the ANOVA table: SS(Total) = 798, SS(Regression) = 738,
SS(Error) = 60. The multiple correlation coefficient is:

A. .2742

B. .0752

C. .9248

D. .9617

60. A fitted multiple regression equation is Y = 12 + 3X1 - 5X2 + 7X3 + 2X4. When X1
increases 2 units and X2 increases 2 units as well, while X3 and X4 remain
unchanged, what change would you expect in your estimate of Y?

A. Decrease by 2

B. Decrease by 4

C. Increase by 2

D. No change in Y
61. A fitted multiple regression equation is Y = 28 + 5X1 - 4X2 + 7X3 + 2X4. When X1
increases 2 units and X2 increases 2 units as well, while X3 and X4 remain
unchanged, what change would you expect in your estimate of Y?

A. Increase by 2

B. Decrease by 4

C. Increase by 4

D. No change in Y

62. Which is not a name often given to an independent variable that takes on just two
values (0 or 1) according to whether or not a given characteristic is absent or
present?

A. Absent variable

B. Binary variable

C. Dummy variable
63. Using a sample of 63 observations, a dependent variable Y is regressed against two variables X1 and X2 to obtain the fitted regression equation Y = 76.40 - 6.388X1 + 0.870X2. The standard error of b1 is 3.453 and the standard error of b2 is 0.611. At α = .05, we could:

A. conclude that both coefficients differ significantly from zero.

B. reject H0: β1 ≥ 0 and conclude H1: β1 < 0.

C. reject H0: β2 ≤ 0 and conclude H1: β2 > 0.

D. conclude that Evans' Rule has been violated.

64. Refer to this ANOVA table from a regression:

Which statement is not accurate?

A. The F-test is significant at α = .05.

B. There were 50 observations.

C. There were 5 predictors.

D. There would be 50 residuals.


65. Refer to this ANOVA table from a regression:

For this regression, the R2 is:

A. .3995.

B. .6005.

C. .6654.

D. .8822.
66. Refer to the following regression results. The dependent variable is Abort (the
number of abortions per 1000 women of childbearing age). The regression was
estimated using data for the 50 U.S. states with these predictors: EdSpend =
public K-12 school expenditure per capita, Age = median age of population,
Unmar = percent of total births by unmarried women, Infmor = infant mortality
rate in deaths per 1000 live births.

Which statement is not supported by a two-tailed test?

A. Unmar is a significant predictor at α = .01.

B. EdSpend is a significant predictor at α = .20.

C. Infmor is not a significant predictor at α = .05.

D. Age is not a significant predictor at α = .05.


67. Refer to the following correlation matrix that was part of a regression analysis. The
dependent variable was Abort (the number of abortions per 1000 women of
childbearing age). The regression was estimated using data for the 50 U.S. states
with these predictors: EdSpend = public K-12 school expenditure per capita, Age
= median age of population, Unmar = percent of total births by unmarried
women, Infmor = infant mortality rate in deaths per 1000 live births.
Correlation Matrix

Using a two-tailed correlation test, which statement is not accurate?

A. Age and Infmor are not significantly correlated at α = .05.

B. Abort and Unmar are significantly correlated at α = .05.

C. Unmar and Infmor are significantly correlated at α = .05.

D. The first column of the table shows evidence of multicollinearity.


68. Part of a regression output is provided below. Some of the information has been
omitted.

The approximate value of F is:

A. 1605.7.

B. 0.9134.

C. 89.66.

D. impossible to calculate with the given information.


69. Part of a regression output is provided below. Some of the information has been
omitted.

The SS (residual) is:

A. 3177.17.

B. 301.19.

C. 17.71.

D. impossible to determine.
70. A Realtor is trying to predict the selling price of houses in Greenville (in thousands
of dollars) as a function of Size (measured in thousands of square feet) and
whether or not there is a fireplace (FP is 0 if there is no fireplace, 1 if there is a
fireplace). Part of the regression output is provided below, based on a sample of
20 homes. Some of the information has been omitted.

The estimated coefficient for Size is approximately:

A. 9.5.

B. 13.8.

C. 122.5.

D. 1442.6.
71. A Realtor is trying to predict the selling price of houses in Greenville (in thousands
of dollars) as a function of Size (measured in thousands of square feet) and
whether or not there is a fireplace (FP is 0 if there is no fireplace, 1 if there is a
fireplace). The regression output is provided below. Some of the information has
been omitted.

How many predictors (independent variables) were used in the regression?

A. 20

B. 18

C. 3

D. 2
72. A Realtor is trying to predict the selling price of houses in Greenville (in thousands
of dollars) as a function of Size (measured in thousands of square feet) and
whether or not there is a fireplace (FP is 0 if there is no fireplace, 1 if there is a
fireplace). The regression output is provided below. Some of the information has
been omitted.

Which of the following conclusions can be made based on the F-test?

A. The p-value on the F-test will be very high.

B. At least one of the predictors is useful in explaining Y.

C. The model is of no use in predicting selling prices of houses.

D. The estimates were based on a sample of 19 houses.


73. A Realtor is trying to predict the selling price of houses in Greenville (in thousands
of dollars) as a function of Size (measured in thousands of square feet) and
whether or not there is a fireplace (FP is 0 if there is no fireplace, 1 if there is a
fireplace). Part of the regression output is provided below, based on a sample of
20 homes. Some of the information has been omitted.

Which statement is supported by the regression output?

A. At α = .05, FP is not a significant predictor in a two-tailed test.

B. A fireplace adds around $6476 to the selling price of the average house.

C. A large house with no fireplace will sell for more than a small house with a
fireplace.

D. FP is a more significant predictor than Size.

74. A log transformation might be appropriate to alleviate which problem(s)?

A. Heteroscedastic residuals

B. Multicollinearity

C. Autocorrelated residuals
75. A useful guideline in determining the extent of collinearity in a multiple regression
model is:

A. Sturge's Rule.

B. Klein's Rule.

C. Occam's Rule.

D. Pearson's Rule.

76. In a multiple regression all of the following are true regarding residuals except:

A. their sum always equals zero.

B. they are the differences between observed and predicted values of the
response variable.

C. they may be used to detect multicollinearity.

D. they may be used to detect heteroscedasticity.


77. The residual plot below suggests which violation(s) of regression assumptions?

A. Autocorrelation

B. Heteroscedasticity

C. Nonnormality

D. Multicollinearity

78. Which is not a standard criterion for assessing a regression model?

A. Logic of causation

B. Overall fit

C. Degree of collinearity

D. Binary predictors
79. If the standard error is 12, a quick prediction interval for Y is:

A. 15.

B. 24.

C. 19.

D. impossible to determine without an F table.

80. Which is a characteristic of the variance inflation factor (VIF)?

A. It is insignificant unless the corresponding t-statistic is significant.

B. It reveals collinearity rather than multicollinearity.

C. It measures the degree of significance of each predictor.

D. It indicates the predictor's degree of multicollinearity.


81. Which statement best describes this regression (Y = highway miles per gallon in 91
cars)?

A. Statistically significant but large error in the MPG predictions

B. Statistically significant and quite small MPG prediction errors

C. Not quite significant, but predictions should be very good

D. Not a significant regression at any customary level of α


82. Based on these regression results, in your judgment which statement is most
nearly correct (Y = highway miles per gallon in 91 cars)?

A. The number of predictors is rather small.

B. Some predictors are not contributing much.

C. Prediction intervals would be fairly narrow in terms of MPG.

D. The overall model lacks significance and/or predictive power.


83. In the following regression, which are the three best predictors?

A. ManTran, Wheelbase, RearStRm

B. ManTran, Length, Width

C. NumCyl, HPMax, Length

D. Cannot be ascertained from given information


84. In the following regression, which are the two best predictors?

A. NumCyl, HpMax

B. Intercept, NumCyl

C. NumCyl, Domestic

D. ManTran, Width
85. In the following regression (n = 91), which coefficients differ from zero in a two-tailed test at α = .05?

A. NumCyl, HPMax

B. Intercept, ManTran

C. Intercept, NumCyl, Domestic

D. Intercept, Domestic
86. Based on the following regression ANOVA table, what is the R2?

A. 0.1336

B. 0.6005

C. 0.3995

D. Insufficient information to answer


87. In the following regression, which statement best describes the degree of
multicollinearity?

A. Very little evidence of multicollinearity.

B. Much evidence of multicollinearity.

C. Only NumCyl and HPMax are collinear.

D. Only ManTran and RearStRm are collinear.

88. The relationship of Y to four other variables was established as Y = 12 + 3X1 - 5X2
+ 7X3 + 2X4. When X1 increases 5 units and X2 increases 3 units, while X3 and X4
remain unchanged, what change would you expect in your estimate of Y?

A. Decrease by 15

B. Increase by 15

C. No change

D. Increase by 5
89. Does the picture below show strong evidence of heteroscedasticity against the
predictor Wheelbase?

A. Yes

B. No

C. Need a probability plot to answer

D. Need VIF statistics to answer

90. Which is not a correct way to find the coefficient of determination?

A. SSR/SSE

B. SSR/SST

C. 1 - SSE/SST
91. If SSR = 3600, SSE = 1200, and SST = 4800, then R2 is:

A. .5000

B. .7500

C. .3333

D. .2500

92. Which statement is incorrect?

A. Positive autocorrelation results in too many centerline crossings in the residual plot over time.

B. The R2 statistic can only increase (or stay the same) when you add more
predictors to a regression.

C. If the F-statistic is insignificant, the t-statistics for the predictors also are insignificant at the same α.

D. A regression with 60 observations and 5 predictors does not violate Evans' Rule.
93. Which statement about leverage is incorrect?

A. Leverage refers to an observation's distance from the mean of X.

B. If n = 40 and k = 4 predictors, a leverage statistic of .15 would indicate high leverage.

C. If n = 180 and k = 3 predictors, a leverage statistic of .08 would indicate high leverage.

94. Which statement is incorrect?

A. Binary predictors shift the intercept of the fitted regression.

B. If a qualitative variable has c categories, we would use only c - 1 binaries as predictors.

C. A binary predictor has the same t-test as any other predictor.

D. If there is a binary predictor (X = 0, 1) in the model, the residuals may not sum
to zero.

95. Heteroscedasticity of residuals in regression suggests that there is:

A. nonconstant variation in the errors.

B. multicollinearity among the predictors.

C. nonnormality in the errors.

D. lack of independence in successive errors.


96. If you rerun a regression, omitting a predictor X5, which would be unlikely?

A. The new R2 will decline if X5 was a relevant predictor.

B. The new standard error will increase if X5 was a relevant predictor.

C. The remaining estimated β's will change if X5 was collinear with other
predictors.

D. The numerator degrees of freedom for the F test will increase.

97. In a multiple regression, which is an incorrect statement about the residuals?

A. They may be used to test for multicollinearity.

B. They are differences between observed and estimated values of Y.

C. Their sum will always equal zero.

D. They may be used to detect heteroscedasticity.

98. Which of the following is not a characteristic of the F distribution?

A. It is a continuous distribution.

B. It uses a test statistic Fcalc that can never be negative.

C. Its degrees of freedom vary, depending on α.

D. It is used to test for overall significance in a regression.


99. Which of the following would be most useful in checking the normality
assumption of the errors in a regression model?

A. The t-statistics for the coefficients

B. The F-statistic from the ANOVA table

C. The histogram of residuals

D. The VIF statistics for the predictors

100.The regression equation Salary = 25,000 + 3200 YearsExperience + 1400 YearsCollege describes employee salaries at Axolotl Corporation. The standard
error is 2600. John has 10 years' experience and 4 years of college. His salary is
$66,500. What is John's standardized residual?

A. -1.250

B. -0.240

C. +0.870

D. +1.500
101.The regression equation Salary = 28,000 + 2700 YearsExperience + 1900
YearsCollege describes employee salaries at Ramjac Corporation. The standard
error is 2400. Mary has 10 years' experience and 4 years of college. Her salary is
$58,350. What is Mary's standardized residual (approximately)?

A. -1.150

B. +2.007

C. -1.771

D. +1.400

102.Which Excel function will give the p-value for overall significance if a regression
has 75 observations and 5 predictors and gives an F test statistic Fcalc = 3.67?

A. =F.INV(.05, 5, 75)

B. =F.DIST(3.67, 4, 74)

C. =F.DIST.RT(3.67, 5, 69)

D. =F.DIST(.05, 4, 70)
103.The ScamMore Energy Company is attempting to predict natural gas
consumption for the month of January. A random sample of 50 homes was used
to fit a regression of gas usage (in CCF) using as predictors Temperature = the
thermostat setting (degrees Fahrenheit) and Occupants = the number of
household occupants. They obtained the following results:

In testing each coefficient for a significant difference from zero (two-tailed test at α = .10), which is the most reasonable conclusion about the predictors?

A. Temperature is highly significant; Occupants is barely significant.

B. Temperature is not significant; Occupants is significant.

C. Temperature is less significant than Occupants.

D. Temperature is significant; Occupants is not significant.

104.In a regression with 60 observations and 7 predictors, there will be _____ residuals.

A. 60

B. 59

C. 52

D. 6
105.A regression with 72 observations and 9 predictors violates:

A. Evans' Rule.

B. Klein's Rule.

C. Doane's Rule.

D. Sturges' Rule.

106.The F-test for ANOVA in a regression model with 4 predictors and 47 observations would have how many degrees of freedom?

A. (3, 44)

B. (4, 46)

C. (4, 42)

D. (3, 43)

107.In a regression with 7 predictors and 62 observations, a t-test for each coefficient would use how many degrees of freedom?

A. 61

B. 60

C. 55

D. 54
Essay Questions
108.Using state data (n = 50) for the year 2000, a statistics student calculated a matrix
of correlation coefficients for selected variables describing state averages on the
two main scholastic aptitude tests (ACT and SAT). (a) In the spaces provided, write
the two-tailed critical values of the correlation coefficient for α = .05 and α = .01
respectively. Show how you derived these critical values. (b) Mark with * all
correlations that are significant at α = .05, and mark with ** those that are
significant at α = .01. (c) Why might you expect a negative correlation between
ACT% and SAT%? (d) Why might you expect a positive correlation between SATQ
and SATV? Explain your reasoning. (e) Why is the matrix empty above the
diagonal?
109.Using data for a large sample of cars (n = 93), a statistics student calculated a
matrix of correlation coefficients for selected variables describing each car. (a) In
the spaces provided, write the two-tailed critical values of the correlation
coefficient for α = .05 and α = .01 respectively. Show how you derived these
critical values. (b) Mark with * all correlations that are significant at α = .05, and
mark with ** those that are significant at α = .01. (c) Why might you expect a
negative correlation between Weight and HwyMPG? (d) Why might you expect a
positive correlation between HPMax and Length? Explain your reasoning. (e) Why
is the matrix empty above the diagonal?
110.Analyze the regression below (n = 50 U.S. states) using the concepts you have
learned about multiple regression. Circle things of interest and write comments in
the margin. Make a prediction for Poverty for a state with Dropout = 15,
TeenMom = 12, Unem = 4, and Age65% = 12 (show your work). The variables are
Poverty = percentage below the poverty level; Dropout = percent of adult
population that did not finish high school; TeenMom = percent of total births by
teenage mothers; Unem = unemployment rate, civilian labor force; and Age65% =
percent of population aged 65 and over.
111. Analyze the regression results below (n = 33 cars in 1993) using the concepts you
have learned about multiple regression. Circle things of interest and write
comments in the margin. Make a prediction for CityMPG for a car with EngSize =
2.5, ManTran = 1, Length = 184, Wheelbase = 104, Weight = 3000, and Domestic
= 0 (show your work). The variables are CityMPG = city MPG (miles per gallon by
EPA rating); EngSize = engine size (liters); ManTran = 1 if manual transmission
available, 0 otherwise; Length = vehicle length (inches); Wheelbase = vehicle
wheelbase (inches); Weight = vehicle weight (pounds); Domestic = 1 if U.S.
manufacturer, 0 otherwise.
Chapter 13 Multiple Regression Answer Key

True / False Questions

1. In regression the dependent variable is referred to as the response variable.

TRUE

Y is also sometimes called the dependent variable.

AACSB: Analytic
Blooms: Remember
Difficulty: 1 Easy
Learning Objective: 13-01 Use a fitted multiple regression equation to make predictions.
Topic: Multiple Regression

2. If a regression model's F test statistic is Fcalc = 43.82, we could say that the
explained variance is approximately 44 percent.

FALSE

The R2 statistic (not the F statistic) shows the percent of explained variation.

AACSB: Analytic
Blooms: Understand
Difficulty: 1 Easy
Learning Objective: 13-02 Interpret the R2 and perform an F test for overall significance.
Topic: Assessing Overall Fit
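As a worked illustration of this answer: R² comes from the ANOVA sums of squares, not from Fcalc. A minimal sketch, reusing the SS values given in question 59:

```python
# R-squared is the fraction of total variation explained by the regression:
# R2 = SSR/SST. The F statistic is a different quantity, not a percentage.
ss_regression = 738.0   # SS(Regression) from question 59
ss_total = 798.0        # SS(Total) from question 59

r_squared = ss_regression / ss_total
multiple_r = r_squared ** 0.5   # the multiple correlation coefficient

print(round(r_squared, 4))   # 0.9248
print(round(multiple_r, 4))  # 0.9617
```

The square root of R² reproduces the multiple correlation coefficient asked for in question 59.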

3. In a regression, the model with the best fit is preferred over all other models.

FALSE

Occam's Razor says that complexity is justified only if it is necessary for a good
model.

AACSB: Analytic
Blooms: Understand
Difficulty: 2 Medium
Learning Objective: 13-01 Use a fitted multiple regression equation to make predictions.
Topic: Multiple Regression

4. A common misinterpretation of the principle of Occam's Razor is that a simple regression model (rather than a multiple regression model) is always best.

TRUE

Occam's Razor says that complexity is justified only if it is necessary for a good model.

AACSB: Analytic
Blooms: Understand
Difficulty: 2 Medium
Learning Objective: 13-01 Use a fitted multiple regression equation to make predictions.
Topic: Multiple Regression
5. A predictor whose pairwise correlation with Y is near zero can still have a
significant t-value in a multiple regression when other predictors are included.

TRUE

The t-statistic for a predictor depends on which other predictors are in the
model.

AACSB: Analytic
Blooms: Understand
Difficulty: 2 Medium
Learning Objective: 13-03 Test individual predictors for significance.
Topic: Predictor Significance

6. The F statistic in a multiple regression is significant if at least one of the predictors has a significant t statistic at a given α.

TRUE

At least one predictor coefficient will differ from zero at the same α used in the F test.

AACSB: Analytic
Blooms: Understand
Difficulty: 1 Easy
Learning Objective: 13-02 Interpret the R2 and perform an F test for overall significance.
Topic: Assessing Overall Fit
7. R2adj can exceed R2 if there are several weak predictors.

FALSE

R2adj is smaller than R2 and a large difference suggests unnecessary predictors.

AACSB: Analytic
Blooms: Remember
Difficulty: 2 Medium
Learning Objective: 13-02 Interpret the R2 and perform an F test for overall significance.
Topic: Assessing Overall Fit
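The usual adjusted R² formula behind this answer is R²adj = 1 - (1 - R²)(n - 1)/(n - k - 1). A hedged sketch (the numbers below are hypothetical, chosen only to show that adding weak predictors can lower R²adj even as R² creeps up):

```python
# Adjusted R-squared penalizes each added predictor; with several weak
# predictors, R2adj falls below R2 (it cannot exceed it for k >= 1).
def adjusted_r2(r2, n, k):
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

base = adjusted_r2(0.50, 30, 3)    # 3 useful predictors (illustrative values)
padded = adjusted_r2(0.51, 30, 6)  # 3 weak predictors tacked on

print(round(base, 4))    # 0.4423
print(round(padded, 4))  # 0.3822
```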

8. A binary (categorical) predictor should not be used along with nonbinary predictors.

FALSE

Binary predictors behave like any other except they look weird on a scatter plot.

AACSB: Analytic
Blooms: Remember
Difficulty: 1 Easy
Learning Objective: 13-05 Incorporate a categorical variable into a multiple regression model.
Topic: Categorical Predictors
9. In a multiple regression with 3 predictors in a sample of 25 U.S. cities, we would
use F3, 21 in a test of overall significance.

TRUE

For the F-test we use d.f. = (k, n - k - 1).

AACSB: Analytic
Blooms: Apply
Difficulty: 2 Medium
Learning Objective: 13-02 Interpret the R2 and perform an F test for overall significance.
Topic: Assessing Overall Fit
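The degrees-of-freedom rule in this answer can be captured in a one-line helper (a sketch; the function name is illustrative):

```python
# The overall F test in multiple regression uses d.f. = (k, n - k - 1),
# where k = number of predictors and n = sample size.
def f_test_df(n, k):
    return (k, n - k - 1)

print(f_test_df(25, 3))   # question 9:  F(3, 21)
print(f_test_df(56, 5))   # question 51: F(5, 50)
```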

10. Evans' Rule says that if n = 50 you need at least 5 predictors to have a good
model.

FALSE

On the contrary, Evans' Rule is intended to prevent having too many predictors.

AACSB: Analytic
Blooms: Remember
Difficulty: 2 Medium
Learning Objective: 13-01 Use a fitted multiple regression equation to make predictions.
Topic: Assessing Overall Fit
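Evans' Rule, as applied in questions 92 and 105, limits the model to roughly one predictor per ten observations (n/k ≥ 10). A hedged sketch with an illustrative helper name:

```python
# Evans' Rule guards against too many predictors; it sets a ceiling on k,
# not a minimum (which is why the statement in question 10 is false).
def violates_evans_rule(n, k):
    return n / k < 10   # fewer than 10 observations per predictor

print(violates_evans_rule(60, 5))   # question 92: False (12 obs per predictor)
print(violates_evans_rule(72, 9))   # question 105: True (8 obs per predictor)
```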
11. The model Y = β0 + β1X + β2X² cannot be estimated by Excel because of the
nonlinear term.

FALSE

The X2 predictor is just a data column like any other.

AACSB: Analytic
Blooms: Remember
Difficulty: 2 Medium
Learning Objective: 13-09 Explain the role of data conditioning and data transformations.
Topic: Tests for Nonlinearity and Interaction

12. The random error term in a regression model reflects all factors omitted from
the model.

TRUE

The errors are assumed normally distributed with zero mean and constant
variance.

AACSB: Analytic
Blooms: Remember
Difficulty: 1 Easy
Learning Objective: 13-01 Use a fitted multiple regression equation to make predictions.
Topic: Multiple Regression
13. If the probability plot of residuals resembles a straight line, the residuals show a
fairly good fit to the normal distribution.

TRUE

The probability plot is easy to interpret in a general way (linearity suggests normality).

AACSB: Analytic
Blooms: Remember
Difficulty: 2 Medium
Learning Objective: 13-07 Analyze residuals to check for violations of residual assumptions.
Topic: Violations of Assumptions

14. Confidence intervals for Y may be unreliable when the residuals are not
normally distributed.

TRUE

If serious nonnormality exists and n is small, confidence intervals may be affected.

AACSB: Analytic
Blooms: Remember
Difficulty: 2 Medium
Learning Objective: 13-04 Interpret confidence intervals for regression coefficients.
Topic: Violations of Assumptions
15. A negative estimated coefficient in a regression usually indicates a weak
predictor.

FALSE

It is the t-statistic that indicates the strength of a predictor.

AACSB: Analytic
Blooms: Remember
Difficulty: 2 Medium
Learning Objective: 13-03 Test individual predictors for significance.
Topic: Predictor Significance

16. For a certain firm, the regression equation Bonus = 2,000 + 257 Experience +
0.046 Salary describes employee bonuses with a standard error of 125. John has
10 years' experience, earns $50,000, and earned a bonus of $7,000. John is an
outlier.

FALSE

John's standardized residual is (yactual - yestimated)/se = (7,000 - 6,870)/(125) = 1.04, which is not unusual.

AACSB: Analytic
Blooms: Apply
Difficulty: 3 Hard
Learning Objective: 13-08 Identify unusual residuals and high leverage observations.
Topic: Violations of Assumptions
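The arithmetic behind this answer can be sketched in Python (a minimal sketch; the helper name is illustrative and the values come from the question):

```python
# Standardized residual for Q16: (actual - fitted) / standard error.
def standardized_residual(y_actual, y_fitted, std_error):
    return (y_actual - y_fitted) / std_error

# John: 10 years' experience, $50,000 salary, $7,000 bonus, se = 125.
y_fitted = 2000 + 257 * 10 + 0.046 * 50000   # fitted bonus = 6,870
z = standardized_residual(7000, y_fitted, 125)  # 1.04, well under the usual |z| > 2 flag
```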
17. There is one residual for each predictor in the regression model.

FALSE

There are k predictors, but there are n residuals e1, e2, …, en.

AACSB: Analytic
Blooms: Remember
Difficulty: 1 Easy
Learning Objective: 13-01 Use a fitted multiple regression equation to make predictions.
Topic: Multiple Regression

18. If R2 and R2adj differ greatly, we should probably add a few predictors to
improve the fit.

FALSE

Evidence of unnecessary predictors can be seen when R2adj is much smaller than
R2.

AACSB: Analytic
Blooms: Remember
Difficulty: 2 Medium
Learning Objective: 13-02 Interpret the R2 and perform an F test for overall significance.
Topic: Assessing Overall Fit
19. The effect of a binary predictor is to shift the regression intercept.

TRUE

The omitted category becomes part of the intercept.

AACSB: Analytic
Blooms: Remember
Difficulty: 1 Easy
Learning Objective: 13-05 Incorporate a categorical variable into a multiple regression model.
Topic: Categorical Predictors

20. A parsimonious model is one with many weak predictors but a few strong
ones.

FALSE

On the contrary, a lean (parsimonious) model has strong predictors and no weak ones.

AACSB: Analytic
Blooms: Remember
Difficulty: 2 Medium
Learning Objective: 13-03 Test individual predictors for significance.
Topic: Multiple Regression
21. The F statistic and its p-value give a global test of significance for a multiple
regression.

TRUE

The F-test tells whether or not at least some predictors are significant.

AACSB: Analytic
Blooms: Remember
Difficulty: 1 Easy
Learning Objective: 13-02 Interpret the R2 and perform an F test for overall significance.
Topic: Assessing Overall Fit

22. In a regression model of student grades, we would code the nine categories of
business courses taken (ACC, FIN, ECN, MGT, MKT, MIS, ORG, POM, QMM) by
including nine binary (0 or 1) predictors in the regression.

FALSE

We can code c categories with c - 1 predictors (i.e., omit one).

AACSB: Analytic
Blooms: Apply
Difficulty: 2 Medium
Learning Objective: 13-05 Incorporate a categorical variable into a multiple regression model.
Topic: Categorical Predictors
23. A disadvantage of Excel's Data Analysis regression tool is that it expects the
independent variables to be in a block of contiguous columns so you must
delete a column if you want to eliminate a predictor from the model.

TRUE

This is why we might want to use MINITAB, MegaStat, SPSS, or Systat.

AACSB: Technology
Blooms: Apply
Difficulty: 1 Easy
Learning Objective: 13-03 Test individual predictors for significance.
Topic: Multiple Regression

24. A disadvantage of Excel's regression is that it does not give as much accuracy in
the estimated regression coefficients as a package like MINITAB.

FALSE

Excel's accuracy is good for most of the common regression statistics.

AACSB: Technology
Blooms: Understand
Difficulty: 1 Easy
Learning Objective: 13-03 Test individual predictors for significance.
Topic: Multiple Regression
25. Nonnormality of the residuals from a regression can best be detected by
looking at the residual plots against the fitted Y values.

FALSE

Use a probability plot to check for nonnormality (a residual plot tests for
heteroscedasticity).

AACSB: Analytic
Blooms: Understand
Difficulty: 2 Medium
Learning Objective: 13-07 Analyze residuals to check for violations of residual assumptions.
Topic: Violations of Assumptions

26. A high variance inflation factor (VIF) indicates a significant predictor in the
regression.

FALSE

A high VIF indicates that a predictor is related to the other predictors in the
model.

AACSB: Analytic
Blooms: Remember
Difficulty: 2 Medium
Learning Objective: 13-06 Detect multicollinearity and assess its effects.
Topic: Multicollinearity
27. Autocorrelation may be detected by looking at a plot of the residuals against
time.

TRUE

Too many or too few crossings of the zero axis suggest nonrandomness.

AACSB: Analytic
Blooms: Remember
Difficulty: 2 Medium
Learning Objective: 13-07 Analyze residuals to check for violations of residual assumptions.
Topic: Violations of Assumptions

28. A widening pattern of residuals as X increases would suggest heteroscedasticity.

TRUE

The absence of a pattern would be ideal (homoscedastic).

AACSB: Analytic
Blooms: Remember
Difficulty: 1 Easy
Learning Objective: 13-07 Analyze residuals to check for violations of residual assumptions.
Topic: Violations of Assumptions
29. Plotting the residuals against a binary predictor (X = 0, 1) reveals nothing about
heteroscedasticity.

FALSE

You can still spot wider or narrower spread at the two points X = 0 and X = 1.

AACSB: Analytic
Blooms: Remember
Difficulty: 3 Hard
Learning Objective: 13-07 Analyze residuals to check for violations of residual assumptions.
Topic: Violations of Assumptions

30. The regression equation Bonus = 2,812 + 27 Experience + 0.046 Salary says
that Experience is the most significant predictor of Bonus.

FALSE

You need a t-statistic to assess significance of a predictor.

AACSB: Analytic
Blooms: Apply
Difficulty: 2 Medium
Learning Objective: 13-03 Test individual predictors for significance.
Topic: Predictor Significance
31. A multiple regression with 60 observations should not have 13 predictors.

TRUE

Evans' Rule suggests no more than n/10 = 60/10 = 6 predictors.

AACSB: Analytic
Blooms: Remember
Difficulty: 1 Easy
Learning Objective: 13-02 Interpret the R2 and perform an F test for overall significance.
Topic: Assessing Overall Fit
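The rule used in this answer is simple enough to express directly (a sketch, using the usual n/10 form of Evans' Rule):

```python
# Evans' Rule sketch: a regression with n observations should use at most n/10 predictors.
def evans_rule_max_predictors(n):
    return n // 10

max_k = evans_rule_max_predictors(60)  # 6, so 13 predictors would be far too many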

32. A regression of Y using four independent variables X1, X2, X3, X4 could also have
up to four nonlinear terms (Xj²) and six simple interaction terms (XjXk) if you
have enough observations to justify them.

TRUE

We must count all the possible squares and two-way combinations of four
predictors.

AACSB: Analytic
Blooms: Apply
Difficulty: 3 Hard
Learning Objective: 13-09 Explain the role of data conditioning and data transformations.
Topic: Tests for Nonlinearity and Interaction
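The counting in this answer is a small combinatorics exercise; a sketch:

```python
import math

# Counting candidate quadratic and two-way interaction terms for k = 4 predictors.
k = 4
quadratic_terms = k                  # X1^2 ... X4^2
interaction_terms = math.comb(k, 2)  # distinct pairs XjXk with j < k
```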
33. When autocorrelation is present, the estimates of the coefficients will be
unbiased.

TRUE

There is no bias in the OLS estimates, though variances and t-tests may be
affected.

AACSB: Analytic
Blooms: Remember
Difficulty: 2 Medium
Learning Objective: 13-07 Analyze residuals to check for violations of residual assumptions.
Topic: Violations of Assumptions

34. If the residuals in your regression are nonnormal, a larger sample size might
help improve the reliability of confidence intervals for Y.

TRUE

Asymptotic normality and consistency of the OLS estimators may help.

AACSB: Analytic
Blooms: Remember
Difficulty: 2 Medium
Learning Objective: 13-07 Analyze residuals to check for violations of residual assumptions.
Topic: Violations of Assumptions
35. Multicollinearity can be detected from t tests of the predictor variables.

FALSE

The t-tests only indicate significance (we use VIFs to detect multicollinearity).

AACSB: Analytic
Blooms: Remember
Difficulty: 2 Medium
Learning Objective: 13-06 Detect multicollinearity and assess its effects.
Topic: Multicollinearity

36. When multicollinearity is present, the regression model is of no use for making
predictions.

FALSE

Multicollinearity makes it hard to assess each predictor's role, but predictions may be useful.

AACSB: Analytic
Blooms: Remember
Difficulty: 2 Medium
Learning Objective: 13-06 Detect multicollinearity and assess its effects.
Topic: Multicollinearity
37. Autocorrelation of the residuals may affect the reliability of the t values for the
estimated coefficients of the predictors X1, X2, . . . , Xk.

TRUE

Autocorrelation can affect the variances of the estimators, hence their t-values.

AACSB: Analytic
Blooms: Remember
Difficulty: 2 Medium
Learning Objective: 13-07 Analyze residuals to check for violations of residual assumptions.
Topic: Violations of Assumptions

38. The first differences transformation might be tried if autocorrelation is found in a time-series data set.

TRUE

First differences may help and are an easily understood transformation.

AACSB: Analytic
Blooms: Remember
Difficulty: 2 Medium
Learning Objective: 13-07 Analyze residuals to check for violations of residual assumptions.
Topic: Violations of Assumptions
39. Statisticians who work with cross-sectional data generally do not anticipate
autocorrelation.

TRUE

We are more likely to see autocorrelation in time-series data.

AACSB: Analytic
Blooms: Remember
Difficulty: 1 Easy
Learning Objective: 13-07 Analyze residuals to check for violations of residual assumptions.
Topic: Violations of Assumptions

40. The ill effects of heteroscedasticity might be mitigated by redefining totals (e.g.,
total number of homicides) as relative values (e.g., homicide rate per 100,000
population).

TRUE

Large magnitude ranges for X's and Y (the "size" problem) can induce
heteroscedasticity.

AACSB: Analytic
Blooms: Apply
Difficulty: 2 Medium
Learning Objective: 13-09 Explain the role of data conditioning and data transformations.
Topic: Violations of Assumptions
41. Nonnormal residuals lead to biased estimates of the coefficients in a regression
model.

FALSE

There is no bias in the estimated coefficients, though confidence intervals may be affected.

AACSB: Analytic
Blooms: Remember
Difficulty: 2 Medium
Learning Objective: 13-07 Analyze residuals to check for violations of residual assumptions.
Topic: Violations of Assumptions

42. A large VIF (e.g., 10 or more) would indicate multicollinearity.

TRUE

Some multicollinearity is inevitable, but very large VIFs suggest competing predictors.

AACSB: Analytic
Blooms: Remember
Difficulty: 2 Medium
Learning Objective: 13-06 Detect multicollinearity and assess its effects.
Topic: Multicollinearity
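The VIF-of-10 threshold mentioned here has a one-line formula; a hedged sketch (R_j² would come from an auxiliary regression of predictor j on the other predictors, which is not shown):

```python
# VIF sketch: VIF_j = 1 / (1 - R_j^2), where R_j^2 is the R-squared from
# regressing predictor X_j on the remaining predictors.
def vif(r_squared_j):
    return 1.0 / (1.0 - r_squared_j)

# R_j^2 = .90 gives VIF = 10, the common rule-of-thumb cutoff.
vif_at_threshold = vif(0.90)
```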
43. Heteroscedasticity exists when all the errors (residuals) have the same variance.

FALSE

The statement would be true if we change the first word to "homoscedasticity."

AACSB: Analytic
Blooms: Remember
Difficulty: 1 Easy
Learning Objective: 13-07 Analyze residuals to check for violations of residual assumptions.
Topic: Violations of Assumptions

44. Multicollinearity refers to relationships among the independent variables.

TRUE

When one predictor is predicted by the other predictors, we have multicollinearity.

AACSB: Analytic
Blooms: Remember
Difficulty: 1 Easy
Learning Objective: 13-06 Detect multicollinearity and assess its effects.
Topic: Multicollinearity
45. A squared predictor is used to test for nonlinearity in the predictor's
relationship to Y.

TRUE

Including a squared predictor is an easy way to test whether the relationship is nonlinear.

AACSB: Analytic
Blooms: Remember
Difficulty: 2 Medium
Learning Objective: 13-09 Explain the role of data conditioning and data transformations.
Topic: Tests for Nonlinearity and Interaction

46. Nonnormality of residuals is not usually considered a major problem unless there are outliers.

TRUE

Serious nonnormality can make the confidence intervals unreliable.

AACSB: Analytic
Blooms: Remember
Difficulty: 2 Medium
Learning Objective: 13-07 Analyze residuals to check for violations of residual assumptions.
Topic: Violations of Assumptions
47. In the fitted regression Y = 12 + 3X1 - 5X2 + 27X3 + 2X4 the most significant
predictor is X3.

FALSE

We must have the t-statistics (not just the coefficients) to assess each
predictor's significance.

AACSB: Analytic
Blooms: Apply
Difficulty: 2 Medium
Learning Objective: 13-03 Test individual predictors for significance.
Topic: Predictor Significance

48. Given that the fitted regression is Y = 76.40 - 6.388X1 + 0.870X2, the standard
error of b1 is 1.453, and n = 63. At α = .05, we can conclude that X1 is a
significant predictor of Y.

TRUE

tcalc = (-6.388)/(1.453) = -4.396, which is beyond -t.025 = -2.000 for d.f. = 60 in a two-tailed test.

AACSB: Analytic
Blooms: Apply
Difficulty: 2 Medium
Learning Objective: 13-03 Test individual predictors for significance.
Topic: Predictor Significance
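The test in Q48 can be checked in a few lines (a sketch; the critical value is taken from a t table, as in the answer):

```python
# Two-tailed t test for b1 in Q48, using the numbers stated in the question.
b1, se_b1 = -6.388, 1.453
t_calc = b1 / se_b1          # about -4.396
t_crit = 2.000               # t.025 for d.f. = 60, from a t table
x1_significant = abs(t_calc) > t_crit  # True: X1 is a significant predictor
```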
49. Unlike other predictors, a binary predictor has a t-value that is either 0 or 1.

FALSE

The t-value for a binary predictor is like any other t-value.

AACSB: Analytic
Blooms: Remember
Difficulty: 2 Medium
Learning Objective: 13-05 Incorporate a categorical variable into a multiple regression model.
Topic: Categorical Predictors

50. The t-test shows the ratio of an estimated coefficient to its standard error.

TRUE

In a test for zero coefficient (and in computer output) tcalc = bj/sbj.

AACSB: Analytic
Blooms: Remember
Difficulty: 1 Easy
Learning Objective: 13-03 Test individual predictors for significance.
Topic: Predictor Significance
51. In a multiple regression with five predictors in a sample of 56 U.S. cities, we
would use F5, 50 in a test of overall significance.

TRUE

F.05 = 2.40 for d.f. = (k, n - k - 1) = (5, 56 - 5 - 1) = (5, 50).

AACSB: Analytic
Blooms: Apply
Difficulty: 2 Medium
Learning Objective: 13-02 Interpret the R2 and perform an F test for overall significance.
Topic: Assessing Overall Fit

Multiple Choice Questions

52. In a multiple regression with six predictors in a sample of 67 U.S. cities, what
would be the critical value for an F-test of overall significance at α = .05?

A. 2.29

B. 2.25

C. 2.37

D. 2.18

F.05 = 2.25 for d.f. = (k, n - k - 1) = (6, 67 - 6 - 1) = (6, 60).

AACSB: Analytic
Blooms: Apply
Difficulty: 2 Medium
Learning Objective: 13-02 Interpret the R2 and perform an F test for overall significance.
Topic: Assessing Overall Fit

53. In a multiple regression with five predictors in a sample of 56 U.S. cities, what
would be the critical value for an F-test of overall significance at α = .05?

A. 2.45

B. 2.37

C. 2.40

D. 2.56

F.05 = 2.40 for d.f. = (k, n - k - 1) = (5, 56 - 5 - 1) = (5, 50).

AACSB: Analytic
Blooms: Apply
Difficulty: 2 Medium
Learning Objective: 13-02 Interpret the R2 and perform an F test for overall significance.
Topic: Assessing Overall Fit
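The degrees of freedom in questions 52-53 follow one formula; a sketch (the critical value itself still comes from an F table, or from scipy.stats.f.ppf if SciPy is available):

```python
# Degrees of freedom for the overall F test: (k, n - k - 1).
def f_test_df(n, k):
    return (k, n - k - 1)

df_q52 = f_test_df(67, 6)   # (6, 60), so F.05 = 2.25 from the F table
df_q53 = f_test_df(56, 5)   # (5, 50), so F.05 = 2.40 from the F table
# With SciPy installed: scipy.stats.f.ppf(0.95, 6, 60) returns the critical value.
```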
54. When predictor variables are strongly related to each other, the __________ of
the regression estimates is questionable.

A. logic

B. fit

C. parsimony

D. stability

High interpredictor correlation affects their variances, so coefficients are less certain.

AACSB: Analytic
Blooms: Remember
Difficulty: 1 Easy
Learning Objective: 13-06 Detect multicollinearity and assess its effects.
Topic: Multicollinearity
55. A test is conducted in 22 cities to see if giving away free transit system maps
will increase the number of bus riders. In a regression analysis, the dependent
variable Y is the increase in bus riders (in thousands of persons) from the start
of the test until its conclusion. The independent variables are X1 = the number
(in thousands) of free maps distributed and a binary variable X2 = 1 if the city
has free downtown parking, 0 otherwise. The estimated regression equation is
Y = 1.32 + 0.0345X1 - 1.45X2.

In city 3, the observed Y value is 7.3 and X1 = 140 and X2 = 0. The residual for
city 3 (in thousands) is:

A. 6.15.

B. 1.15.

C. 4.83.

D. 1.57.

yestimated = 1.32 + .0345(140) - 1.45(0) = 6.15, so the residual is (7.3 - 6.15) = 1.15.

AACSB: Analytic
Blooms: Apply
Difficulty: 2 Medium
Learning Objective: 13-01 Use a fitted multiple regression equation to make predictions.
Topic: Assessing Overall Fit
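The fitted value and residual for city 3 can be verified directly (a sketch, using the coefficients given in the answer):

```python
# City 3 in Q55: X1 = 140 (thousand maps), X2 = 0 (no free parking), observed Y = 7.3.
y_fitted_city3 = 1.32 + 0.0345 * 140 - 1.45 * 0   # 6.15
residual_city3 = 7.3 - y_fitted_city3             # 1.15 (thousand riders)
```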
56. If X2 is a binary predictor in Y = β0 + β1X1 + β2X2, then which statement is most
nearly correct?

A. X2 = 1 should represent the most desirable condition.

B. X2 would be a significant predictor if β2 = 423.72.

C. X2 = 0, X2 = 1, X2 = 2 would be appropriate if three categories exist.

D. X2 will shift the estimated equation either by 0 units or by β2 units.

If X2 = 0 then nothing is added to the equation, while if X2 = 1 we add β2 units.

AACSB: Analytic
Blooms: Apply
Difficulty: 3 Hard
Learning Objective: 13-05 Incorporate a categorical variable into a multiple regression model.
Topic: Categorical Predictors

57. The unexplained sum of squares measures variation in the dependent variable
Y about the:

A. mean of the Y values.

B. estimated Y values.

C. mean of the X values.

D. Y-intercept.

We are trying to explain variation in the response variable around its mean.

AACSB: Analytic
Blooms: Remember
Difficulty: 2 Medium
Learning Objective: 13-02 Interpret the R2 and perform an F test for overall significance.
Topic: Assessing Overall Fit

58. Which of the following is not true of the standard error of the regression?

A. It is a measure of the accuracy of the prediction.

B. It is based on squared vertical deviations between the actual and predicted values of Y.

C. It would be negative when there is an inverse relationship in the model.

D. It is used in constructing confidence and prediction intervals for Y.

The standard error is the square root of a sum of squares so it cannot be negative.

AACSB: Analytic
Blooms: Apply
Difficulty: 2 Medium
Learning Objective: 13-04 Interpret confidence intervals for regression coefficients.
Topic: Confidence Intervals for Y
59. A multiple regression analysis with two independent variables yielded the
following results in the ANOVA table: SS(Total) = 798, SS(Regression) = 738,
SS(Error) = 60. The multiple correlation coefficient is:

A. .2742

B. .0752

C. .9248

D. .9617

R2 = SSR/SST = 738/798 = .9248, so the multiple R = (R2)^(1/2) = .9617.

AACSB: Analytic
Blooms: Apply
Difficulty: 2 Medium
Learning Objective: 13-02 Interpret the R2 and perform an F test for overall significance.
Topic: Assessing Overall Fit
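The two-step calculation in this answer, sketched in Python:

```python
import math

# R-squared and the multiple correlation coefficient from the ANOVA sums of squares.
sst, ssr = 798.0, 738.0
r_squared = ssr / sst              # .9248
multiple_r = math.sqrt(r_squared)  # .9617
```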
60. A fitted multiple regression equation is Y = 12 + 3X1 - 5X2 + 7X3 + 2X4. When X1
increases 2 units and X2 increases 2 units as well, while X3 and X4 remain
unchanged, what change would you expect in your estimate of Y?

A. Decrease by 2

B. Decrease by 4

C. Increase by 2

D. No change in Y

The net effect is + 3X1 - 5X2 = 3(2) - 5(2) = 6 - 10 = -4.

AACSB: Analytic
Blooms: Apply
Difficulty: 1 Easy
Learning Objective: 13-01 Use a fitted multiple regression equation to make predictions.
Topic: Multiple Regression
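The net-effect arithmetic in Q60 is just a sum of coefficient-times-change terms; a sketch:

```python
# Q60: Y = 12 + 3X1 - 5X2 + 7X3 + 2X4; X1 and X2 each rise 2 units, X3 and X4 fixed.
b1, b2 = 3, -5
delta_y = b1 * 2 + b2 * 2   # 6 - 10 = -4, a decrease of 4
```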
61. A fitted multiple regression equation is Y = 28 + 5X1 - 4X2 + 7X3 + 2X4. When X1
increases 2 units and X2 increases 2 units as well, while X3 and X4 remain
unchanged, what change would you expect in your estimate of Y?

A. Increase by 2

B. Decrease by 4

C. Increase by 4

D. No change in Y

The net effect is + 5X1 - 4X2 = 5(2) - 4(2) = 10 - 8 = +2.

AACSB: Analytic
Blooms: Apply
Difficulty: 1 Easy
Learning Objective: 13-01 Use a fitted multiple regression equation to make predictions.
Topic: Multiple Regression
62. Which is not a name often given to an independent variable that takes on just
two values (0 or 1) according to whether or not a given characteristic is absent
or present?

A. Absent variable

B. Binary variable

C. Dummy variable

A two-valued predictor is a binary or dummy variable (special cases of categorical predictors).

AACSB: Analytic
Blooms: Remember
Difficulty: 1 Easy
Learning Objective: 13-05 Incorporate a categorical variable into a multiple regression model.
Topic: Categorical Predictors
63. Using a sample of 63 observations, a dependent variable Y is regressed against
two variables X1 and X2 to obtain the fitted regression equation Y = 76.40 -
6.388X1 + 0.870X2. The standard error of b1 is 3.453 and the standard error of b2
is 0.611. At α = .05, we could:

A. conclude that both coefficients differ significantly from zero.

B. reject H0: β1 ≥ 0 and conclude H1: β1 < 0.

C. reject H0: β2 ≤ 0 and conclude H1: β2 > 0.

D. conclude that Evans' Rule has been violated.

For β1 we have tcalc = (-6.388)/(3.453) = -1.849, which is beyond -t.05 = -1.671 for
d.f. = 60 in a left-tailed test. For β2 we have tcalc = (0.870)/(0.611) = +1.424, which
does not exceed t.05 = +1.671 for d.f. = 60 in a right-tailed test. For a two-tailed
test, t.025 = 2.000, so neither coefficient would differ significantly from zero at
α = .05. Evans' Rule is not violated because n/k = 63/3 = 21.

AACSB: Analytic
Blooms: Apply
Difficulty: 3 Hard
Learning Objective: 13-03 Test individual predictors for significance.
Topic: Predictor Significance
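The two-tailed comparison ruling out answer A can be sketched for both coefficients at once (values from the question; t_crit from a t table):

```python
# Two-tailed t tests for both coefficients in Q63 (t.025 = 2.000 for d.f. = 60).
coeffs = {"b1": (-6.388, 3.453), "b2": (0.870, 0.611)}
t_crit = 2.000
significant = {name: abs(b / se) > t_crit for name, (b, se) in coeffs.items()}
# Neither coefficient is significant in a two-tailed test at alpha = .05.
```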
64. Refer to this ANOVA table from a regression:

Which statement is not accurate?

A. The F-test is significant at α = .05.

B. There were 50 observations.

C. There were 5 predictors.

D. There would be 50 residuals.

d.f. = (k, n - k - 1) = (4, 45), so k = 4 predictors.

AACSB: Analytic
Blooms: Apply
Difficulty: 1 Easy
Learning Objective: 13-02 Interpret the R2 and perform an F test for overall significance.
Topic: Assessing Overall Fit
65. Refer to this ANOVA table from a regression:

For this regression, the R2 is:

A. .3995.

B. .6005.

C. .6654.

D. .8822.

R2 = SSR/SST = (1793.2356)/(4488.3352) = .3995.

AACSB: Analytic
Blooms: Apply
Difficulty: 1 Easy
Learning Objective: 13-02 Interpret the R2 and perform an F test for overall significance.
Topic: Assessing Overall Fit
66. Refer to the following regression results. The dependent variable is Abort (the
number of abortions per 1000 women of childbearing age). The regression was
estimated using data for the 50 U.S. states with these predictors: EdSpend =
public K-12 school expenditure per capita, Age = median age of population,
Unmar = percent of total births by unmarried women, Infmor = infant mortality
rate in deaths per 1000 live births.

Which statement is not supported by a two-tailed test?

A. Unmar is a significant predictor at α = .01.

B. EdSpend is a significant predictor at α = .20.

C. Infmor is not a significant predictor at α = .05.

D. Age is not a significant predictor at α = .05.

For Infmor, tcalc = (-3.7848)/(1.0173) = -3.720, which is beyond -t.025 = -2.014 for d.f. = 45.

AACSB: Analytic
Blooms: Apply
Difficulty: 2 Medium
Learning Objective: 13-03 Test individual predictors for significance.
Topic: Predictor Significance
67. Refer to the following correlation matrix that was part of a regression analysis.
The dependent variable was Abort (the number of abortions per 1000 women
of childbearing age). The regression was estimated using data for the 50 U.S.
states with these predictors: EdSpend = public K-12 school expenditure per
capita, Age = median age of population, Unmar = percent of total births by
unmarried women, Infmor = infant mortality rate in deaths per 1000 live births.
Correlation Matrix

Using a two-tailed correlation test, which statement is not accurate?

A. Age and Infmor are not significantly correlated at = .05.

B. Abort and Unmar are significantly correlated at = .05.

C. Unmar and Infmor are significantly correlated at = .05.

D. The first column of the table shows evidence of multicollinearity.

Use rcrit = t.025/(t.025² + n - 2)^(1/2) = (2.011)/(2.011² + 50 - 2)^(1/2) = .2788 for
d.f. = 50 - 2 = 48 for a two-tailed test at α = .05. Using this criterion, we see that
two pairs of variables, (Abort and Unmar) and (Unmar and Infmor), have
correlations that differ significantly from zero.

AACSB: Analytic
Blooms: Apply
Difficulty: 2 Medium
Learning Objective: 13-06 Detect multicollinearity and assess its effects.
Topic: Multicollinearity
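The critical-correlation formula used in this answer, as a sketch:

```python
import math

# Critical correlation for a two-tailed test: r_crit = t / sqrt(t^2 + n - 2).
def r_critical(t_crit, n):
    return t_crit / math.sqrt(t_crit ** 2 + n - 2)

r_crit_50 = r_critical(2.011, 50)   # about .2788 for d.f. = 48
```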

68. Part of a regression output is provided below. Some of the information has
been omitted.

The approximate value of F is:

A. 1605.7.

B. 0.9134.

C. 89.66.

D. impossible to calculate with the given information.

Fcalc = MSR/MSE = (1588.6)/(17.717) = 89.66.

AACSB: Analytic
Blooms: Apply
Difficulty: 1 Easy
Learning Objective: 13-02 Interpret the R2 and perform an F test for overall significance.
Topic: Assessing Overall Fit
69. Part of a regression output is provided below. Some of the information has
been omitted.

The SS (residual) is:

A. 3177.17.

B. 301.19.

C. 17.71.

D. impossible to determine.

SSE = SST - SSR = 3478.36 - 3177.17 = 301.19.

AACSB: Analytic
Blooms: Apply
Difficulty: 1 Easy
Learning Objective: 13-02 Interpret the R2 and perform an F test for overall significance.
Topic: Assessing Overall Fit
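The ANOVA bookkeeping behind questions 68-69 is sketched below (SS and MS values are those quoted in the answers; the full output is not shown):

```python
# ANOVA identities: SSE = SST - SSR, and F = MSR/MSE.
sst, ssr = 3478.36, 3177.17
sse = sst - ssr                # 301.19
msr, mse = 1588.6, 17.717      # mean squares from the (partial) output
f_calc = msr / mse             # about 89.66
```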
70. A Realtor is trying to predict the selling price of houses in Greenville (in
thousands of dollars) as a function of Size (measured in thousands of square
feet) and whether or not there is a fireplace (FP is 0 if there is no fireplace, 1 if
there is a fireplace). Part of the regression output is provided below, based on a
sample of 20 homes. Some of the information has been omitted.

The estimated coefficient for Size is approximately:

A. 9.5.

B. 13.8.

C. 122.5.

D. 1442.6.

Coefficient = (t Stat) × (Std Err) = (11.439)(1.2072436) = 13.81.

AACSB: Analytic
Blooms: Apply
Difficulty: 2 Medium
Learning Objective: 13-03 Test individual predictors for significance.
Topic: Predictor Significance
71. A Realtor is trying to predict the selling price of houses in Greenville (in
thousands of dollars) as a function of Size (measured in thousands of square
feet) and whether or not there is a fireplace (FP is 0 if there is no fireplace, 1 if
there is a fireplace). The regression output is provided below. Some of the
information has been omitted.

How many predictors (independent variables) were used in the regression?

A. 20

B. 18

C. 3

D. 2

d.f. = (k, n - k - 1) = (2, 17), so k = 2.

AACSB: Analytic
Blooms: Apply
Difficulty: 1 Easy
Learning Objective: 13-02 Interpret the R2 and perform an F test for overall significance.
Topic: Assessing Overall Fit
72. A Realtor is trying to predict the selling price of houses in Greenville (in
thousands of dollars) as a function of Size (measured in thousands of square
feet) and whether or not there is a fireplace (FP is 0 if there is no fireplace, 1 if
there is a fireplace). The regression output is provided below. Some of the
information has been omitted.

Which of the following conclusions can be made based on the F-test?

A. The p-value on the F-test will be very high.

B. At least one of the predictors is useful in explaining Y.

C. The model is of no use in predicting selling prices of houses.

D. The estimates were based on a sample of 19 houses.

Fcalc = MSR/MSE = (1588.6)/(17.717) = 89.66, which exceeds F.05 = 3.59 for d.f. =
(2, 17).

AACSB: Analytic
Blooms: Apply
Difficulty: 2 Medium
Learning Objective: 13-02 Interpret the R2 and perform an F test for overall significance.
Topic: Assessing Overall Fit
73. A Realtor is trying to predict the selling price of houses in Greenville (in
thousands of dollars) as a function of Size (measured in thousands of square
feet) and whether or not there is a fireplace (FP is 0 if there is no fireplace, 1 if
there is a fireplace). Part of the regression output is provided below, based on a
sample of 20 homes. Some of the information has been omitted.

Which statement is supported by the regression output?

A. At α = .05, FP is not a significant predictor in a two-tailed test.

B. A fireplace adds around $6476 to the selling price of the average house.

C. A large house with no fireplace will sell for more than a small house with a
fireplace.

D. FP is a more significant predictor than Size.

The estimated coefficient of FP is 6.476 (our home prices are in thousands).

AACSB: Analytic
Blooms: Apply
Difficulty: 2 Medium
Learning Objective: 13-03 Test individual predictors for significance.
Topic: Predictor Significance
74. A log transformation might be appropriate to alleviate which problem(s)?

A. Heteroscedastic residuals

B. Multicollinearity

C. Autocorrelated residuals

By reducing data magnitudes, the log transform may help equalize variances.

AACSB: Analytic
Blooms: Remember
Difficulty: 2 Medium
Learning Objective: 13-07 Analyze residuals to check for violations of residual assumptions.
Topic: Violations of Assumptions

75. A useful guideline in determining the extent of collinearity in a multiple regression model is:

A. Sturge's Rule.

B. Klein's Rule.

C. Occam's Rule.

D. Pearson's Rule.

Klein's Rule suggests severe collinearity if any r exceeds the multiple correlation
coefficient.

AACSB: Analytic
Blooms: Remember
Difficulty: 2 Medium
Learning Objective: 13-06 Detect multicollinearity and assess its effects.
Topic: Multicollinearity

76. In a multiple regression all of the following are true regarding residuals except:

A. their sum always equals zero.

B. they are the differences between observed and predicted values of the
response variable.

C. they may be used to detect multicollinearity.

D. they may be used to detect heteroscedasticity.

Residuals help in all these except to detect multicollinearity (we need VIFs for
that task).

AACSB: Analytic
Blooms: Remember
Difficulty: 2 Medium
Learning Objective: 13-08 Identify unusual residuals and high leverage observations.
Topic: Violations of Assumptions
77. The residual plot below suggests which violation(s) of regression assumptions?

A. Autocorrelation

B. Heteroscedasticity

C. Nonnormality

D. Multicollinearity

There seems to be a "fan-out" pattern (nonconstant residual variance).

AACSB: Analytic
Blooms: Apply
Difficulty: 2 Medium
Learning Objective: 13-07 Analyze residuals to check for violations of residual assumptions.
Topic: Violations of Assumptions
78. Which is not a standard criterion for assessing a regression model?

A. Logic of causation

B. Overall fit

C. Degree of collinearity

D. Binary predictors

Binary predictors may be a useful part of any regression model.

AACSB: Analytic
Blooms: Remember
Difficulty: 1 Easy
Learning Objective: 13-01 Use a fitted multiple regression equation to make predictions.
Topic: Multiple Regression

79. If the standard error is 12, a quick prediction interval for Y is:

A. 15.

B. 24.

C. 19.

D. impossible to determine without an F table.

Double the standard error to get the approximate half-width (Y ± 2s) of a prediction interval for Y.

AACSB: Analytic
Blooms: Remember
Difficulty: 2 Medium
Learning Objective: 13-04 Interpret confidence intervals for regression coefficients.
Topic: Confidence Intervals for Y

80. Which is a characteristic of the variance inflation factor (VIF)?

A. It is insignificant unless the corresponding t-statistic is significant.

B. It reveals collinearity rather than multicollinearity.

C. It measures the degree of significance of each predictor.

D. It indicates the predictor's degree of multicollinearity.

The larger the VIFs, the more we suspect that the predictors are multicollinear.

AACSB: Analytic
Blooms: Remember
Difficulty: 2 Medium
Learning Objective: 13-06 Detect multicollinearity and assess its effects.
Topic: Multicollinearity
81. Which statement best describes this regression (Y = highway miles per gallon in
91 cars)?

A. Statistically significant but large error in the MPG predictions

B. Statistically significant and quite small MPG prediction errors

C. Not quite significant, but predictions should be very good

D. Not a significant regression at any customary level of α

The p-value for the F-test indicates significance, but the quick prediction
interval is Y ± 2(4.019), or Y ± 8 mpg, which would not permit a very precise
prediction.

AACSB: Analytic
Blooms: Apply
Difficulty: 2 Medium
Learning Objective: 13-02 Interpret the R2 and perform an F test for overall significance.
Topic: Assessing Overall Fit
82. Based on these regression results, in your judgment which statement is most
nearly correct (Y = highway miles per gallon in 91 cars)?

A. The number of predictors is rather small.

B. Some predictors are not contributing much.

C. Prediction intervals would be fairly narrow in terms of MPG.

D. The overall model lacks significance and/or predictive power.

There is a gap between R2 and R2adj, which suggests some superfluous predictors were used.

AACSB: Analytic
Blooms: Apply
Difficulty: 2 Medium
Learning Objective: 13-02 Interpret the R2 and perform an F test for overall significance.
Topic: Assessing Overall Fit
83. In the following regression, which are the three best predictors?

A. ManTran, Wheelbase, RearStRm

B. ManTran, Length, Width

C. NumCyl, HPMax, Length

D. Cannot be ascertained from given information

The absolute t-statistics indicate a ranking.

AACSB: Analytic
Blooms: Apply
Difficulty: 2 Medium
Learning Objective: 13-03 Test individual predictors for significance.
Topic: Predictor Significance
84. In the following regression, which are the two best predictors?

A. NumCyl, HPMax

B. Intercept, NumCyl

C. NumCyl, Domestic

D. ManTran, Width

Absolute t-statistics indicate a ranking, so find tcalc = (Coef)/(Std Err) for each
predictor.

AACSB: Analytic
Blooms: Apply
Difficulty: 2 Medium
Learning Objective: 13-03 Test individual predictors for significance.
Topic: Predictor Significance
85. In the following regression (n = 91), which coefficients differ from zero in a two-
tailed test at α = .05?

A. NumCyl, HPMax

B. Intercept, ManTran

C. Intercept, NumCyl, Domestic

D. Intercept, Domestic

If the confidence interval includes zero, the predictor is not significant in a two-
tailed test.

AACSB: Analytic
Blooms: Apply
Difficulty: 2 Medium
Learning Objective: 13-03 Test individual predictors for significance.
Topic: Predictor Significance
86. Based on the following regression ANOVA table, what is the R2?

A. 0.1336

B. 0.6005

C. 0.3995

D. Insufficient information to answer

R2 = SSR/SST = (1793.2356)/(4488.3352) = .3995.

AACSB: Analytic
Blooms: Apply
Difficulty: 2 Medium
Learning Objective: 13-02 Interpret the R2 and perform an F test for overall significance.
Topic: Assessing Overall Fit
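The ratio in the explanation can be checked directly. A minimal sketch in Python, using the SSR and SST values quoted above:

```python
# R-squared from the ANOVA decomposition: R2 = SSR/SST.
ssr = 1793.2356  # regression (explained) sum of squares
sst = 4488.3352  # total sum of squares
r_squared = ssr / sst
print(round(r_squared, 4))  # 0.3995
```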
87. In the following regression, which statement best describes the degree of
multicollinearity?

A. Very little evidence of multicollinearity.

B. Much evidence of multicollinearity.

C. Only NumCyl and HPMax are collinear.

D. Only ManTran and RearStRm are collinear.

Many predictors have large VIFs.

AACSB: Analytic
Blooms: Apply
Difficulty: 2 Medium
Learning Objective: 13-06 Detect multicollinearity and assess its effects.
Topic: Multicollinearity
88. The relationship of Y to four other variables was established as Y = 12 + 3X1 -
5X2 + 7X3 + 2X4. When X1 increases 5 units and X2 increases 3 units, while X3
and X4 remain unchanged, what change would you expect in your estimate of
Y?

A. Decrease by 15

B. Increase by 15

C. No change

D. Increase by 5

The net effect is 3ΔX1 - 5ΔX2 = 3(5) - 5(3) = 15 - 15 = 0.

AACSB: Analytic
Blooms: Apply
Difficulty: 1 Easy
Learning Objective: 13-01 Use a fitted multiple regression equation to make predictions.
Topic: Predictor Significance
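The arithmetic above can be sketched as a coefficient-times-change sum (coefficients and changes taken from the question; the dictionary layout is just illustrative):

```python
# Change in predicted Y = sum of (coefficient * change in X), other X's held fixed.
coefs = {"X1": 3, "X2": -5, "X3": 7, "X4": 2}
deltas = {"X1": 5, "X2": 3, "X3": 0, "X4": 0}
delta_y = sum(coefs[x] * deltas[x] for x in coefs)
print(delta_y)  # 0
```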
89. Does the picture below show strong evidence of heteroscedasticity against the
predictor Wheelbase?

A. Yes

B. No

C. Need a probability plot to answer

D. Need VIF statistics to answer

Scatter appears random (no systematic difference in vertical spread).

AACSB: Analytic
Blooms: Apply
Difficulty: 2 Medium
Learning Objective: 13-07 Analyze residuals to check for violations of residual assumptions.
Topic: Violations of Assumptions
90. Which is not a correct way to find the coefficient of determination?

A. SSR/SSE

B. SSR/SST

C. 1 - SSE/SST

R2 = SSR/SST or R2 = 1 - SSE/SST.

AACSB: Analytic
Blooms: Remember
Difficulty: 2 Medium
Learning Objective: 13-02 Interpret the R2 and perform an F test for overall significance.
Topic: Assessing Overall Fit

91. If SSR = 3600, SSE = 1200, and SST = 4800, then R2 is:

A. .5000

B. .7500

C. .3333

D. .2500

R2 = SSR/SST = 3600/4800 = .7500.

AACSB: Analytic
Blooms: Apply
Difficulty: 2 Medium
Learning Objective: 13-02 Interpret the R2 and perform an F test for overall significance.
Topic: Assessing Overall Fit
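Both formulas for R2 can be verified to agree here, since SST = SSR + SSE; a quick sketch:

```python
ssr, sse, sst = 3600.0, 1200.0, 4800.0
r2_a = ssr / sst       # R2 = SSR/SST
r2_b = 1 - sse / sst   # R2 = 1 - SSE/SST, identical because SST = SSR + SSE
print(r2_a, r2_b)  # 0.75 0.75
```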
92. Which statement is incorrect?

A. Positive autocorrelation results in too many centerline crossings in the
residual plot over time.

B. The R2 statistic can only increase (or stay the same) when you add more
predictors to a regression.

C. If the F-statistic is insignificant, the t-statistics for the predictors also are
insignificant at the same α.

D. A regression with 60 observations and 5 predictors does not violate Evans'
Rule.

Positive autocorrelation results in too few crossings of the zero point on the
axis (cycles).

AACSB: Analytic
Blooms: Apply
Difficulty: 2 Medium
Learning Objective: 13-07 Analyze residuals to check for violations of residual assumptions.
Topic: Violations of Assumptions
93. Which statement about leverage is incorrect?

A. Leverage refers to an observation's distance from the mean of X.

B. If n = 40 and k = 4 predictors, a leverage statistic of .15 would indicate high
leverage.

C. If n = 180 and k = 3 predictors, a leverage statistic of .08 would indicate high
leverage.

2(k + 1)/n = 2(4 + 1)/40 = .25, so hi = .15 would not indicate high leverage.

AACSB: Analytic
Blooms: Apply
Difficulty: 3 Hard
Learning Objective: 13-08 Identify unusual residuals and high leverage observations.
Topic: Violations of Assumptions
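The quick rule used in the explanation can be sketched as a small helper (the function name is illustrative):

```python
def high_leverage(h, n, k):
    """Quick rule: flag a leverage statistic h_i that exceeds 2(k + 1)/n."""
    return h > 2 * (k + 1) / n

print(high_leverage(0.15, n=40, k=4))   # False: threshold is 2(5)/40 = .25
print(high_leverage(0.08, n=180, k=3))  # True: threshold is 2(4)/180, about .044
```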
94. Which statement is incorrect?

A. Binary predictors shift the intercept of the fitted regression.

B. If a qualitative variable has c categories, we would use only c - 1 binaries as
predictors.

C. A binary predictor has the same t-test as any other predictor.

D. If there is a binary predictor (X = 0, 1) in the model, the residuals may not
sum to zero.

Residuals always sum to zero using the OLS method.

AACSB: Analytic
Blooms: Remember
Difficulty: 2 Medium
Learning Objective: 13-05 Incorporate a categorical variable into a multiple regression model.
Topic: Categorical Predictors

95. Heteroscedasticity of residuals in regression suggests that there is:

A. nonconstant variation in the errors.

B. multicollinearity among the predictors.

C. nonnormality in the errors.

D. lack of independence in successive errors.

Heteroscedasticity is nonconstant residual variance.

AACSB: Analytic
Blooms: Remember
Difficulty: 1 Easy
Learning Objective: 13-07 Analyze residuals to check for violations of residual assumptions.
Topic: Violations of Assumptions

96. If you rerun a regression, omitting a predictor X5, which would be unlikely?

A. The new R2 will decline if X5 was a relevant predictor.

B. The new standard error will increase if X5 was a relevant predictor.

C. The remaining estimated β's will change if X5 was collinear with other
predictors.

D. The numerator degrees of freedom for the F test will increase.

Numerator df is the number of predictors, so omitting one would have the
opposite effect.

AACSB: Analytic
Blooms: Apply
Difficulty: 3 Hard
Learning Objective: 13-02 Interpret the R2 and perform an F test for overall significance.
Topic: Assessing Overall Fit
97. In a multiple regression, which is an incorrect statement about the residuals?

A. They may be used to test for multicollinearity.

B. They are differences between observed and estimated values of Y.

C. Their sum will always equal zero.

D. They may be used to detect heteroscedasticity.

To check for multicollinearity we would look at the VIFs or a correlation matrix.

AACSB: Analytic
Blooms: Remember
Difficulty: 2 Medium
Learning Objective: 13-06 Detect multicollinearity and assess its effects.
Topic: Multicollinearity

98. Which of the following is not a characteristic of the F distribution?

A. It is a continuous distribution.

B. It uses a test statistic Fcalc that can never be negative.

C. Its degrees of freedom vary, depending on α.

D. It is used to test for overall significance in a regression.

In ANOVA we use d.f. = (k, n - k - 1). The value of α does not affect the d.f.

AACSB: Analytic
Blooms: Remember
Difficulty: 2 Medium
Learning Objective: 13-02 Interpret the R2 and perform an F test for overall significance.
Topic: Assessing Overall Fit

99. Which of the following would be most useful in checking the normality
assumption of the errors in a regression model?

A. The t-statistics for the coefficients

B. The F-statistic from the ANOVA table

C. The histogram of residuals

D. The VIF statistics for the predictors

A histogram could reveal skewness or possibly outliers.

AACSB: Analytic
Blooms: Remember
Difficulty: 2 Medium
Learning Objective: 13-07 Analyze residuals to check for violations of residual assumptions.
Topic: Violations of Assumptions
100. The regression equation Salary = 25,000 + 3200 YearsExperience + 1400
YearsCollege describes employee salaries at Axolotl Corporation. The standard
error is 2600. John has 10 years' experience and 4 years of college. His salary is
$66,500. What is John's standardized residual?

A. -1.250

B. -0.240

C. +0.870

D. +1.500

John's predicted salary is 25,000 + 3200(10) + 1400(4) = 62,600, so his
standardized residual is (66,500 - 62,600)/(2600) = 1.500 (he is somewhat
overpaid according to the fitted regression).

AACSB: Analytic
Blooms: Apply
Difficulty: 3 Hard
Learning Objective: 13-08 Identify unusual residuals and high leverage observations.
Topic: Violations of Assumptions
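A sketch of the calculation, using the fitted equation and standard error from the question:

```python
# Standardized residual = (actual - predicted) / standard error.
predicted = 25_000 + 3_200 * 10 + 1_400 * 4   # John's predicted salary: 62,600
residual = (66_500 - predicted) / 2_600
print(residual)  # 1.5
```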
101. The regression equation Salary = 28,000 + 2700 YearsExperience + 1900
YearsCollege describes employee salaries at Ramjac Corporation. The standard
error is 2400. Mary has 10 years' experience and 4 years of college. Her salary is
$58,350. What is Mary's standardized residual (approximately)?

A. -1.150

B. +2.007

C. -1.771

D. +1.400

Mary's predicted salary is 28,000 + 2700 (10) + 1900 (4) = 62,600, so her
standardized residual is (58,350 - 62,600)/(2400) = -1.771 (she is somewhat
underpaid according to the fitted regression).

AACSB: Analytic
Blooms: Apply
Difficulty: 3 Hard
Learning Objective: 13-08 Identify unusual residuals and high leverage observations.
Topic: Violations of Assumptions
102. Which Excel function will give the p-value for overall significance if a regression
has 75 observations and 5 predictors and gives an F test statistic Fcalc = 3.67?

A. =F.INV(.05, 5, 75)

B. =F.DIST(3.67, 4, 74)

C. =F.DIST.RT(3.67, 5, 69)

D. =F.DIST(.05, 4, 70)

In pre-2010 versions of Excel the function was =FDIST(3.67, 5, 69) for d.f. = (k, n
- k - 1).

AACSB: Analytic
Blooms: Apply
Difficulty: 2 Medium
Learning Objective: 13-02 Interpret the R2 and perform an F test for overall significance.
Topic: Assessing Overall Fit
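Outside Excel, the same right-tail p-value can be approximated numerically. A sketch using only the standard library, integrating the F density by the trapezoid rule (scipy.stats.f.sf(3.67, 5, 69) would give the same answer directly):

```python
import math

def f_pdf(x, d1, d2):
    """Density of the F distribution with (d1, d2) degrees of freedom."""
    log_c = (math.lgamma((d1 + d2) / 2) - math.lgamma(d1 / 2)
             - math.lgamma(d2 / 2) + (d1 / 2) * math.log(d1 / d2))
    return math.exp(log_c + (d1 / 2 - 1) * math.log(x)
                    - ((d1 + d2) / 2) * math.log(1 + d1 * x / d2))

def f_sf(x, d1, d2, upper=400.0, steps=200_000):
    """Right-tail area P(F > x); the tail beyond `upper` is negligible here."""
    h = (upper - x) / steps
    total = 0.5 * (f_pdf(x, d1, d2) + f_pdf(upper, d1, d2))
    for i in range(1, steps):
        total += f_pdf(x + i * h, d1, d2)
    return total * h

p_value = f_sf(3.67, 5, 69)   # analogous to =F.DIST.RT(3.67, 5, 69)
print(round(p_value, 4))      # about .005, well below alpha = .01
```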
103. The ScamMore Energy Company is attempting to predict natural gas
consumption for the month of January. A random sample of 50 homes was
used to fit a regression of gas usage (in CCF) using as predictors Temperature
= the thermostat setting (degrees Fahrenheit) and Occupants = the number of
household occupants. They obtained the following results:

In testing each coefficient for a significant difference from zero (two-tailed test
at α = .10), which is the most reasonable conclusion about the predictors?

A. Temperature is highly significant; Occupants is barely significant.

B. Temperature is not significant; Occupants is significant.

C. Temperature is less significant than Occupants.

D. Temperature is significant; Occupants is not significant.

Find the test statistic tcalc = (Coef)/(StdErr) for each predictor and compare with
t.05 = 1.678 for d.f. = n - k - 1 = 50 - 2 - 1 = 47.

AACSB: Analytic
Blooms: Apply
Difficulty: 3 Hard
Learning Objective: 13-03 Test individual predictors for significance.
Topic: Predictor Significance
104. In a regression with 60 observations and 7 predictors, there will be _____
residuals.

A. 60

B. 59

C. 52

D. 6

There are 60 residuals e1, e2, . . . , e60 (one residual for each observation).

AACSB: Analytic
Blooms: Remember
Difficulty: 1 Easy
Learning Objective: 13-01 Use a fitted multiple regression equation to make predictions.
Topic: Assessing Overall Fit

105. A regression with 72 observations and 9 predictors violates:

A. Evans' Rule.

B. Klein's Rule.

C. Doane's Rule.

D. Sturges' Rule.

Evans' Rule suggests n/k ≥ 10, but in this example n/k = 72/9 = 8.

AACSB: Analytic
Blooms: Apply
Difficulty: 2 Medium
Learning Objective: 13-02 Interpret the R2 and perform an F test for overall significance.
Topic: Assessing Overall Fit
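The rule can be sketched as a one-line check (the function name is illustrative):

```python
def violates_evans_rule(n, k):
    """Evans' Rule: want at least 10 observations per predictor (n/k >= 10)."""
    return n / k < 10

print(violates_evans_rule(72, 9))  # True: 72/9 = 8
print(violates_evans_rule(60, 5))  # False: 60/5 = 12
```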

106. The F-test for ANOVA in a regression model with 4 predictors and 47
observations would have how many degrees of freedom?

A. (3, 44)

B. (4, 46)

C. (4, 42)

D. (3, 43)

d.f. = (k, n - k - 1) = (4, 47 - 4 - 1) = (4, 42).

AACSB: Analytic
Blooms: Apply
Difficulty: 2 Medium
Learning Objective: 13-02 Interpret the R2 and perform an F test for overall significance.
Topic: Assessing Overall Fit
107. In a regression with 7 predictors and 62 observations, degrees of freedom for a
t-test for each coefficient would use how many degrees of freedom?

A. 61

B. 60

C. 55

D. 54

d.f. = n - k - 1 = 62 - 7 - 1 = 54.

AACSB: Analytic
Blooms: Apply
Difficulty: 2 Medium
Learning Objective: 13-03 Test individual predictors for significance.
Topic: Predictor Significance
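The two degrees-of-freedom rules from this question and the previous one can be sketched together (helper names are illustrative):

```python
def f_test_df(n, k):
    """Degrees of freedom for the overall F test: (k, n - k - 1)."""
    return k, n - k - 1

def t_test_df(n, k):
    """Degrees of freedom for each coefficient's t test: n - k - 1."""
    return n - k - 1

print(f_test_df(47, 4))  # (4, 42)
print(t_test_df(62, 7))  # 54
```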

Essay Questions
108. Using state data (n = 50) for the year 2000, a statistics student calculated a
matrix of correlation coefficients for selected variables describing state averages
on the two main scholastic aptitude tests (ACT and SAT). (a) In the spaces
provided, write the two-tailed critical values of the correlation coefficient for
α = .05 and α = .01, respectively. Show how you derived these critical values. (b)
Mark with * all correlations that are significant at α = .05, and mark with **
those that are significant at α = .01. (c) Why might you expect a negative
correlation between ACT% and SAT%? (d) Why might you expect a positive
correlation between SATQ and SATV? Explain your reasoning. (e) Why is the
matrix empty above the diagonal?

(a) As explained in Chapter 12, for d.f. = n - 2 = 50 - 2 = 48, the critical values
of Student's t for a two-tailed test for zero correlation are t.025 = 2.011 and t.005
= 2.682. The critical values of the correlation coefficient are:
(b) No correlation in the first column (ACT) is significant at either α, but all the
other correlations differ significantly from zero at either value of α. (c) An
inverse correlation between ACT% and SAT% might be expected because
students in a given state usually take one or the other, but not both (depending
on what their state universities prefer). (d) If the tests measure general ability,
test-takers who score well on SATQ tend also to score well on SATV. (e) Entries
above the diagonal are redundant, so they are omitted.


AACSB: Reflective Thinking


Blooms: Evaluate
Difficulty: 3 Hard
Learning Objective: 13-06 Detect multicollinearity and assess its effects.
Topic: Multicollinearity
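The critical correlation values referred to above (the table itself is not reproduced here) follow from the standard conversion r = t / sqrt(t² + d.f.); a sketch:

```python
import math

def critical_r(t_crit, df):
    """Critical correlation from a critical t value: r = t / sqrt(t^2 + d.f.)."""
    return t_crit / math.sqrt(t_crit ** 2 + df)

# d.f. = n - 2 = 48 for the 50 states
print(round(critical_r(2.011, 48), 3))  # 0.279 for a two-tailed test at alpha = .05
print(round(critical_r(2.682, 48), 3))  # 0.361 for a two-tailed test at alpha = .01
```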
109. Using data for a large sample of cars (n = 93), a statistics student calculated a
matrix of correlation coefficients for selected variables describing each car. (a)
In the spaces provided, write the two-tailed critical values of the correlation
coefficient for α = .05 and α = .01, respectively. Show how you derived these
critical values. (b) Mark with * all correlations that are significant at α = .05, and
mark with ** those that are significant at α = .01. (c) Why might you expect a
negative correlation between Weight and HwyMPG? (d) Why might you expect
a positive correlation between HPMax and Length? Explain your reasoning. (e)
Why is the matrix empty above the diagonal?

(a) As explained in Chapter 12, for d.f. = n - 2 = 93 - 2 = 91, the critical values of
Student's t for a two-tailed test are t.025 = 1.986 and t.005 = 2.631. The critical
values of the correlation coefficient are:
Given the large sample, it would also be reasonable to use z.025 = 1.960 (giving
r.05 = .202) or z.005 = 2.576 (giving r.01 = .261). However, none of the sample
correlations is close to the decision point. (b) All the correlations are significant
at either value of α. (c) An inverse correlation between Weight and HwyMPG is
expected because larger cars have more mass that must be accelerated and
moved. (d) Longer cars require bigger engines, so HPMax and Length are
correlated. In fact, many measurable aspects of a car are correlated. (e) Entries
above the diagonal are redundant, so they are omitted.


AACSB: Reflective Thinking


Blooms: Evaluate
Difficulty: 3 Hard
Learning Objective: 13-06 Detect multicollinearity and assess its effects.
Topic: Multicollinearity
110. Analyze the regression below (n = 50 U.S. states) using the concepts you have
learned about multiple regression. Circle things of interest and write comments
in the margin. Make a prediction for Poverty for a state with Dropout = 15,
TeenMom = 12, Unem = 4, and Age65% = 12 (show your work). The variables
are Poverty = percentage below the poverty level; Dropout = percent of adult
population that did not finish high school; TeenMom = percent of total births
by teenage mothers; Unem = unemployment rate, civilian labor force; and
Age65% = percent of population aged 65 and over.
The regression is significant overall (F = 18.74, p < .0001). All the predictors are
significant at α = .05 (p-values less than .05). TeenMom and Unem are the best
predictors, while Age65% and DropOut are barely significant. The intercept is
not meaningful since no state would have all predictors equal to zero.
Regarding leverage, we can apply the quick rule to check for any leverage
greater than 2(k + 1)/n = 2(5)/50 = .20. By this criterion, only AK (leverage .434)
has unusual leverage. We would want to check each predictor to see which X
values are unusual for Alaska, but this is not possible without the raw data.
There are no outliers in the Studentized residual column, although there are
three unusual ones: AK (t = -2.251), IN (t = -2.129), and NM (t = +2.829).
Autocorrelation is not an issue since these are not time-series observations
(and, in any event, the residual plot against observation order crosses the zero
centerline 22 times, which is not far from what would be expected for 50
observations). The residual plot against predicted Y has no pattern (suggesting
homoscedasticity) and the residual probability plot is linear (suggesting
normality). Overall, there are no serious problems. The fitted (estimated)
regression equation is: Poverty = - 5.3546 + 0.2065 Dropout + 0.4238
TeenMom + 1.1081 Unem + 0.3469 Age65%, so the predicted value of the
dependent variable Poverty for a state with Dropout = 15, TeenMom = 12,
Unem = 4, and Age65% = 12 is: Poverty = -5.3546 + 0.2065(15) + 0.4238(12) +
1.1081(4) + 0.3469(12) = 11.42. This prediction question is to see whether the
student knows how to interpret the regression coefficients and use them
correctly. The given values of the predictors are very close to their respective
means, so the prediction actually corresponds well to an "average" state.


AACSB: Reflective Thinking


Blooms: Evaluate
Difficulty: 3 Hard
Learning Objective: 13-03 Test individual predictors for significance.
Topic: Predictor Significance
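The prediction in this answer can be reproduced directly from the reported coefficients (the dictionary layout is just illustrative):

```python
# Predicted Poverty from the fitted equation, coefficients as reported above.
coef = {"Intercept": -5.3546, "Dropout": 0.2065, "TeenMom": 0.4238,
        "Unem": 1.1081, "Age65%": 0.3469}
x = {"Dropout": 15, "TeenMom": 12, "Unem": 4, "Age65%": 12}
poverty = coef["Intercept"] + sum(coef[v] * x[v] for v in x)
print(round(poverty, 2))  # 11.42
```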
111. Analyze the regression results below (n = 33 cars in 1993) using the concepts
you have learned about multiple regression. Circle things of interest and write
comments in the margin. Make a prediction for CityMPG for a car with EngSize
= 2.5, ManTran = 1, Length = 184, Wheelbase = 104, Weight = 3000, and
Domestic = 0 (show your work). The variables are CityMPG = city MPG (miles
per gallon by EPA rating); EngSize = engine size (liters); ManTran = 1 if manual
transmission available, 0 otherwise; Length = vehicle length (inches); Wheelbase
= vehicle wheelbase (inches); Weight = vehicle weight (pounds); Domestic = 1 if
U.S. manufacturer, 0 otherwise.
The regression is significant overall (F = 20.09, p < .0001). There are four strong
predictors. Weight and Wheelbase are highly significant at α = .01 (p-values less
than .01), while EngSize and Domestic are significant at α = .05 (p-values less
than .05). The other two predictors, Length and ManTran, are not significant
at the customary levels, although their t-values (at least 1.00 in absolute
magnitude) suggest that they may be contributing to the regression (that is, if
they are omitted, the R2adj would probably decline). The intercept is not
meaningful since no car would have all these predictors equal to zero (e.g.,
Weight = 0 is impossible). Regarding leverage, we can apply the quick rule to
check for any leverage greater than 2(k + 1)/n = 2(7)/33 = .424. By this criterion,
only the Ford AeroStar (leverage .583) has unusual leverage. We would want to
check the values of each independent variable in the regression to see which
one(s) is(are) unusual. However, this is not possible without having the raw
data. There are no outliers in the Studentized residual column, although
observation 15 (Honda Civic, t = 2.862) is unusual. If we refer to the Studentized
deleted residual, observation 15 (Honda Civic, t = 3.392) is in fact an outlier. Its
actual mileage (42 mpg) is much better than predicted (34.1 mpg).
Autocorrelation is not an issue since these are not time-series observations. The
residual plot against predicted Y has no pattern (suggesting homoscedasticity)
and the residual probability plot is linear (suggesting normality). Regarding
multicollinearity, the VIFs are rather large, suggesting lack of independence
among predictors. Since none of the VIFs exceeds 10, most students will
conclude that there is no serious problem with multicollinearity. It is a fact that
many car measurements are correlated, which is a simple characteristic of the
data. However, experimentation might be needed to see whether their
contributions are truly necessary. The unexpected positive signs of EngineSize
and Wheelbase may be symptomatic of intercorrelation among the predictors.
Overall, there are no serious problems aside from one possible outlier. Nothing
should be done since this outlier is simply part of the data set. However, it
might be prudent to verify the MPG for observation 15 to make sure it is not a
typo. The fitted (estimated) regression equation is CityMPG = 34.27 + 3.824
EngSize - 2.014 ManTran - 0.08573 Length + 0.5420 Wheelbase - 0.01909
Weight - 4.285 Domestic, so the predicted value of the response variable
CityMPG for a car with EngSize = 2.5, ManTran = 1, Length = 184, Wheelbase =
104, Weight = 3000, and Domestic = 0 is CityMPG = 34.27 + 3.824(2.5) -
2.014(1) - 0.08573(184) + 0.5420(104) - 0.01909(3000) - 4.285(0) = 34.27 + 9.56 -
2.01 - 15.77 + 56.37 - 57.27 - 0 = 25.14. The given values of the predictors are
very close to their respective means, so the prediction actually corresponds well
to an "average" car. The prediction is strongly affected by the two terms
involving Wheelbase and Weight.


AACSB: Reflective Thinking


Blooms: Evaluate
Difficulty: 3 Hard
Learning Objective: 13-03 Test individual predictors for significance.
Topic: Predictor Significance
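As with the previous essay question, the prediction can be reproduced directly from the reported coefficients:

```python
# Predicted CityMPG from the fitted equation, coefficients as reported above.
coef = {"Intercept": 34.27, "EngSize": 3.824, "ManTran": -2.014,
        "Length": -0.08573, "Wheelbase": 0.5420, "Weight": -0.01909,
        "Domestic": -4.285}
x = {"EngSize": 2.5, "ManTran": 1, "Length": 184,
     "Wheelbase": 104, "Weight": 3000, "Domestic": 0}
city_mpg = coef["Intercept"] + sum(coef[v] * x[v] for v in x)
print(round(city_mpg, 2))  # 25.14
```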
