
Chapter 11
Multiple Regression and Model Building

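11.1
a.
$E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2$
b.
$E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_4$
c.
$E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_4 + \beta_5 x_5$
11.2
a.
$\hat{\beta}_0 = 506.346$, $\hat{\beta}_1 = -941.900$, $\hat{\beta}_2 = -429.060$
b.
$\hat{y} = 506.346 - 941.900x_1 - 429.060x_2$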
c.
SSE = 151,016, MSE = 8883, s = 94.251

We expect about 95% of the y-values to fall within $\pm 2s$ or $\pm 2(94.251)$ or $\pm 188.502$ units of the fitted
regression equation.
d.
$H_0$: $\beta_1 = 0$
$H_a$: $\beta_1 \neq 0$
The test statistic is $t = \dfrac{\hat{\beta}_1 - 0}{s_{\hat{\beta}_1}} = \dfrac{-941.900}{275.08} = -3.42$
The rejection region requires $\alpha/2 = .05/2 = .025$ in each tail of the t distribution with df = $n - (k + 1) = 20 - (2 + 1) = 17$. From Table V, Appendix B, $t_{.025} = 2.110$. The rejection region is $t < -2.110$ or $t > 2.110$.
Since the observed value of the test statistic falls in the rejection region ($t = -3.42 < -2.110$), $H_0$ is
rejected. There is sufficient evidence to indicate $\beta_1 \neq 0$ at $\alpha = .05$.
e.
For confidence coefficient .95, $\alpha = .05$ and $\alpha/2 = .025$. From Table V, Appendix B, with
df = $n - (k + 1) = 20 - (2 + 1) = 17$, $t_{.025} = 2.110$. The 95% confidence interval is:
$\hat{\beta}_2 \pm t_{.025}\, s_{\hat{\beta}_2} \Rightarrow -429.060 \pm 2.110(379.83) \Rightarrow -429.060 \pm 801.441 \Rightarrow (-1230.501,\ 372.381)$
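The arithmetic in parts d and e can be checked with a short sketch. It assumes the estimates carry negative signs ($\hat{\beta}_1 = -941.900$, $\hat{\beta}_2 = -429.060$), which is what the rejection-region comparison and the interval endpoints imply. The function names are illustrative only, and the critical value 2.110 is taken from the table lookup rather than computed.

```python
def t_statistic(estimate, std_error, null_value=0.0):
    """t = (estimate - hypothesized value) / standard error."""
    return (estimate - null_value) / std_error


def confidence_interval(estimate, std_error, t_crit):
    """Interval estimate +/- t_crit * standard error."""
    margin = t_crit * std_error
    return estimate - margin, estimate + margin


T_CRIT_17_DF = 2.110  # t_.025 with 17 df, as read from the table

# Part d: two-tailed test of H0: beta_1 = 0
t1 = t_statistic(-941.900, 275.08)
reject = abs(t1) > T_CRIT_17_DF

# Part e: 95% confidence interval for beta_2
ci_low, ci_high = confidence_interval(-429.060, 379.83, T_CRIT_17_DF)

print(f"t = {t1:.2f}, reject H0: {reject}")
print(f"95% CI for beta_2: ({ci_low:.3f}, {ci_high:.3f})")
```

Because the interval for $\beta_2$ contains 0, the same numbers would not reject $H_0$: $\beta_2 = 0$ in a two-tailed test at $\alpha = .05$.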
f.
$R^2$ = R-Sq = 45.9%. 45.9% of the total sample variation of the y values is explained by the model
containing $x_1$ and $x_2$.
$R^2_a$ = R-Sq(adj) = 39.6%. 39.6% of the total sample variation of the y values is explained by the
model containing $x_1$ and $x_2$, adjusted for the sample size and the number of parameters in the model.
Copyright 2011 Pearson Education, Inc. Publishing as Prentice Hall.
g.
To determine if at least one of the independent variables is significant in predicting y, we test:

$H_0$: $\beta_1 = \beta_2 = 0$
$H_a$: At least one $\beta_i \neq 0$
From the printout, the test statistic is F = 7.22

Since no $\alpha$ level was given, we will choose $\alpha = .05$. The rejection region requires $\alpha = .05$ in the
upper tail of the F-distribution with $\nu_1 = k = 2$ and $\nu_2 = n - (k + 1) = 20 - (2 + 1) = 17$. From Table
VIII, Appendix B, $F_{.05} = 3.59$. The rejection region is $F > 3.59$.
Since the observed value of the test statistic falls in the rejection region ($F = 7.22 > 3.59$), $H_0$ is
rejected. There is sufficient evidence to indicate at least one of the variables, $x_1$ or $x_2$, is significant in
predicting y at $\alpha = .05$.
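h.
The observed significance level of the test is p-value = 0.005. Since the p-value is so small, we will
reject $H_0$ for most reasonable values of $\alpha$. There is sufficient evidence to indicate at least one of the
variables, $x_1$ or $x_2$, is significant in predicting y for any $\alpha$ greater than 0.005.
11.3
a.
We are given $\hat{\beta}_2 = 2.7$, $s_{\hat{\beta}_2} = 1.86$, and n = 30.
$H_0$: $\beta_2 = 0$
$H_a$: $\beta_2 \neq 0$
The test statistic is $t = \dfrac{\hat{\beta}_2 - 0}{s_{\hat{\beta}_2}} = \dfrac{2.7}{1.86} = 1.45$
The rejection region for this two-tailed test at $\alpha = .05$ is $t < -2.056$ or $t > 2.056$.
Since the observed value of the test statistic does not fall in the rejection region ($t = 1.45 \not> 2.056$),
$H_0$ is not rejected. There is insufficient evidence to indicate $\beta_2 \neq 0$ at $\alpha = .05$.

b.
We are given $\hat{\beta}_3 = .93$, $s_{\hat{\beta}_3} = .29$, and n = 30.
We test:
$H_0$: $\beta_3 = 0$
$H_a$: $\beta_3 \neq 0$
The test statistic is $t = \dfrac{\hat{\beta}_3 - 0}{s_{\hat{\beta}_3}} = \dfrac{.93}{.29} = 3.21$
The rejection region is the same as part a, $t < -2.056$ or $t > 2.056$.
Since the observed value of the test statistic falls in the rejection region ($t = 3.21 > 2.056$), $H_0$ is
rejected. There is sufficient evidence to indicate $\beta_3 \neq 0$ at $\alpha = .05$.
c.
$\hat{\beta}_3$ has a smaller estimated standard error than $\hat{\beta}_2$. Therefore, the test statistic is larger for $\beta_3$ even
though $\hat{\beta}_3$ is smaller than $\hat{\beta}_2$.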
11.4
a.
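We are given $\hat{\beta}_1 = 3.1$, $s_{\hat{\beta}_1} = 2.3$, and n = 25.
$H_0$: $\beta_1 = 0$
$H_a$: $\beta_1 > 0$
The test statistic is $t = \dfrac{\hat{\beta}_1 - 0}{s_{\hat{\beta}_1}} = \dfrac{3.1}{2.3} = 1.35$
The rejection region requires $\alpha = .05$ in the upper tail of the t distribution with df =
$n - (k + 1) = 25 - (2 + 1) = 22$. From Table V, Appendix B, $t_{.05} = 1.717$. The rejection region is
$t > 1.717$.
Since the observed value of the test statistic does not fall in the rejection region ($t = 1.35 \not> 1.717$),
$H_0$ is not rejected. There is insufficient evidence to indicate $\beta_1 > 0$ at $\alpha = .05$.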

b.
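We are given $\hat{\beta}_2 = .92$, $s_{\hat{\beta}_2} = .27$, and n = 25.
$H_0$: $\beta_2 = 0$
$H_a$: $\beta_2 \neq 0$
The test statistic is $t = \dfrac{\hat{\beta}_2 - 0}{s_{\hat{\beta}_2}} = \dfrac{.92}{.27} = 3.41$
The rejection region requires $\alpha/2 = .05/2 = .025$ in each tail of the t distribution with df = 22. From
Table V, Appendix B, $t_{.025} = 2.074$. The rejection region is $t < -2.074$ or $t > 2.074$.
Since the observed value of the test statistic falls in the rejection region ($t = 3.41 > 2.074$), reject $H_0$.
There is sufficient evidence to indicate $\beta_2 \neq 0$ at $\alpha = .05$.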
c.
For confidence coefficient .90, $\alpha = 1 - .90 = .10$ and $\alpha/2 = .10/2 = .05$. From Table V, Appendix B,
with df = $n - (k + 1) = 25 - (2 + 1) = 22$, $t_{.05} = 1.717$. The confidence interval is:
$\hat{\beta}_1 \pm t_{.05}\, s_{\hat{\beta}_1} \Rightarrow 3.1 \pm 1.717(2.3) \Rightarrow 3.1 \pm 3.949 \Rightarrow (-.849,\ 7.049)$
We are 90% confident that $\beta_1$ falls between $-.849$ and 7.049.

d.
For confidence coefficient .99, $\alpha = 1 - .99 = .01$ and $\alpha/2 = .01/2 = .005$. From Table V, Appendix B,
with df = $n - (k + 1) = 25 - (2 + 1) = 22$, $t_{.005} = 2.819$. The confidence interval is:
$\hat{\beta}_2 \pm t_{.005}\, s_{\hat{\beta}_2} \Rightarrow .92 \pm 2.819(.27) \Rightarrow .92 \pm .761 \Rightarrow (.159,\ 1.681)$
We are 99% confident that $\beta_2$ falls between .159 and 1.681.

11.5
The number of degrees of freedom available for estimating $\sigma^2$ is $n - (k + 1)$, where k is the number of
independent variables in the regression model. Each additional independent variable placed in the model
decreases the degrees of freedom by one.
11.6
a.
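For $x_2 = 1$ and $x_3 = 3$,
$E(y) = 1 + 2x_1 + 1 - 3(3)$
$E(y) = 2x_1 - 7$
The graph is: [graph not reproduced; the line $E(y) = 2x_1 - 7$]
b.
For $x_2 = -1$ and $x_3 = 1$,
$E(y) = 1 + 2x_1 + (-1) - 3(1)$
$E(y) = 2x_1 - 3$
The graph is: [graph not reproduced; the line $E(y) = 2x_1 - 3$]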
c.
They are parallel, each with a slope of 2. They have different y-intercepts.
d.
The relationship will be parallel lines.
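A small sketch can confirm why the lines are parallel. It assumes the underlying model is $E(y) = 1 + 2x_1 + x_2 - 3x_3$, which is inferred from the substitutions worked in parts a and b (with $x_2 = -1$ in part b), not stated explicitly here; the function names are illustrative.

```python
def mean_response(x1, x2, x3):
    """Assumed first-order model: E(y) = 1 + 2*x1 + x2 - 3*x3."""
    return 1 + 2 * x1 + x2 - 3 * x3


def line_in_x1(x2, x3):
    """Slope and intercept of E(y) as a function of x1 with x2, x3 fixed."""
    intercept = mean_response(0, x2, x3)
    slope = mean_response(1, x2, x3) - intercept
    return slope, intercept


line_a = line_in_x1(x2=1, x3=3)    # part a
line_b = line_in_x1(x2=-1, x3=1)   # part b

print("part a (slope, intercept):", line_a)
print("part b (slope, intercept):", line_b)
```

Fixing $x_2$ and $x_3$ only moves the intercept. The slope on $x_1$ stays at 2 for any fixed pair, which is the general property of a first-order model with no interaction terms.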
11.7
a.
Yes. Since $R^2 = .92$ is close to 1, the model provides a good fit. Without knowledge of
the units of the dependent variable, the value of SSE cannot be used to determine how well the model
fits.
b.
$H_0$: $\beta_1 = \beta_2 = \cdots = \beta_5 = 0$
$H_a$: At least one of the parameters is not 0
The test statistic is $F = \dfrac{R^2/k}{(1 - R^2)/[n - (k + 1)]} = \dfrac{.92/5}{(1 - .92)/[30 - (5 + 1)]} = 55.2$
The rejection region requires $\alpha = .05$ in the upper tail of the F distribution with $\nu_1 = k = 5$ and $\nu_2 = n -
(k + 1) = 30 - (5 + 1) = 24$. From Table VIII, Appendix B, $F_{.05} = 2.62$. The rejection region is $F >
2.62$.
Since the observed value of the test statistic falls in the rejection region ($F = 55.2 > 2.62$), $H_0$ is
rejected. There is sufficient evidence to indicate the model is useful in predicting y at $\alpha = .05$.
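The global F statistic depends only on $R^2$, n, and k, so it is easy to recompute. The following is a minimal sketch; the helper name `f_from_r2` is hypothetical, and the critical value is the table value quoted above rather than something the code derives.

```python
def f_from_r2(r2, n, k):
    """Global F statistic: (R^2 / k) / ((1 - R^2) / (n - (k + 1)))."""
    df_error = n - (k + 1)
    return (r2 / k) / ((1 - r2) / df_error)


f_stat = f_from_r2(r2=0.92, n=30, k=5)
F_CRIT = 2.62  # F_.05 with (5, 24) df, as read from the table

print(f"F = {f_stat:.1f}, reject H0: {f_stat > F_CRIT}")
```

The same helper gives about 16.57 for $R^2 = .58$, n = 66, k = 5 and about 133.27 for $R^2 = .686$, n = 125, k = 2, matching the values reported in Exercises 11.12 and 11.13.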
11.8
No. There may be other independent variables that are important that have not been included in the model,
while there may also be some variables included in the model which are not important. The only
conclusion is that at least one of the independent variables is a good predictor of y.
11.9
a.
To determine if the model is useful, we test:

$H_0$: $\beta_1 = \beta_2 = \beta_3 = \beta_4 = 0$
$H_a$: At least one $\beta_i \neq 0$
From the problem, the test statistic is F = 4.74 and the p-value is less than .01.
Since the p-value is less than $\alpha = .05$ (p < .01), $H_0$ is rejected. There is sufficient evidence to indicate
the model is useful for predicting accountants' Mach scores at $\alpha = .05$.
b.
$R^2 = .13$. 13% of the total sample variation of the accountants' Mach scores around their mean is
explained by the model containing age, gender, education, and income.
c.
To determine if income is a useful predictor of Mach score, we test:

$H_0$: $\beta_4 = 0$
$H_a$: $\beta_4 \neq 0$
From the printout, t = 0.52 and the p-value is p > .10. Since the p-value is greater than $\alpha = .05$ (p >
.10), $H_0$ is not rejected. There is insufficient evidence to indicate that income is a useful predictor of
Mach score at $\alpha = .05$.
11.10
a.
The two properties are that the sum of the errors of prediction is 0 and the sum of the squares of the
errors of prediction is SSE.
b.
$\hat{\beta}_4 = .42$. For each unit increase in the betweenness centrality score, the mean lead-user rating is
estimated to increase by .42, holding all other variables constant.
c.
Since the p-value is less than $\alpha$ (p = .002 < .05), $H_0$ is rejected. There is sufficient evidence to indicate
that there is a significant linear relationship between betweenness centrality and lead-user rating,
holding all other variables constant.
11.11
a.
The least squares prediction equation is: $\hat{y} = 1.81231 + 0.10875x_1 + 0.00017x_2$
b.
$\hat{\beta}_0 = 1.81231$. Since $x_1 = 0$ and $x_2 = 0$ are not in the observed range, $\hat{\beta}_0$ has no meaningful interpretation.
$\hat{\beta}_1 = 0.10875$. For each additional mile of roadway length, the mean number of crashes per three
years is estimated to increase by .10875 when average annual daily traffic is held constant.
$\hat{\beta}_2 = 0.00017$. For each additional unit increase in average annual daily traffic, the mean number of
crashes per three years is estimated to increase by .00017 when miles of roadway length is held
constant.
c.
For confidence coefficient .99, $\alpha = .01$ and $\alpha/2 = .01/2 = .005$. From Table V,
Appendix B, with df = $n - (k + 1) = 100 - (2 + 1) = 97$, $t_{.005} \approx 2.63$. The 99% confidence interval is:
$\hat{\beta}_1 \pm t_{.005}\, s_{\hat{\beta}_1} \Rightarrow 0.10875 \pm 2.63(0.03166) \Rightarrow 0.10875 \pm 0.08327 \Rightarrow (0.02548,\ 0.19202)$
We are 99% confident that the increase in the mean number of crashes per three years will be
between 0.02548 and 0.19202 for each additional mile of roadway length, holding average annual
daily traffic constant.
d.
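The 99% confidence interval is:
$\hat{\beta}_2 \pm t_{.005}\, s_{\hat{\beta}_2} \Rightarrow 0.00017 \pm 2.63(0.00003) \Rightarrow 0.00017 \pm 0.00008 \Rightarrow (0.00009,\ 0.00025)$
We are 99% confident that the increase in the mean number of crashes per three years will be
between 0.00009 and 0.00025 for each additional unit increase in average annual daily traffic,
holding miles of roadway length constant.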
e.
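The least squares prediction equation is: $\hat{y} = 1.20785 + 0.06343x_1 + 0.00056x_2$
$\hat{\beta}_0 = 1.20785$. Since $x_1 = 0$ and $x_2 = 0$ are not in the observed range, $\hat{\beta}_0$ has no meaningful interpretation.
$\hat{\beta}_1 = 0.06343$. For each additional mile of roadway length, the mean number of crashes per three
years is estimated to increase by 0.06343 when average annual daily traffic is held constant.
$\hat{\beta}_2 = 0.00056$. For each additional unit increase in average annual daily traffic, the mean number of
crashes per three years is estimated to increase by 0.00056 when miles of roadway length is held
constant.
The 99% confidence interval for $\beta_1$ is:
$\hat{\beta}_1 \pm t_{.005}\, s_{\hat{\beta}_1} \Rightarrow 0.06343 \pm 2.63(0.01809) \Rightarrow 0.06343 \pm 0.04758 \Rightarrow (0.01585,\ 0.11101)$
We are 99% confident that the increase in the mean number of crashes per three years will be
between 0.01585 and 0.11101 for each additional mile of roadway length, holding average annual
daily traffic constant.
The 99% confidence interval for $\beta_2$ is:
$\hat{\beta}_2 \pm t_{.005}\, s_{\hat{\beta}_2} \Rightarrow 0.00056 \pm 2.63(0.00012) \Rightarrow 0.00056 \pm 0.00032 \Rightarrow (0.00024,\ 0.00088)$
We are 99% confident that the increase in the mean number of crashes per three years will be
between 0.00024 and 0.00088 for each additional unit increase in average annual daily traffic,
holding miles of roadway length constant.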
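11.12
a.
The first-order model is: $E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_4 + \beta_5 x_5$
b.
$R^2 = .58$. 58% of the total sample variation of the levels of trust is explained by the model containing
the 5 independent variables.
c.
To test $H_0$: $\beta_1 = \beta_2 = \cdots = \beta_5 = 0$ against $H_a$: at least one $\beta_i \neq 0$, the rejection region requires
$\alpha = .10$ in the upper tail of the F-distribution with $\nu_1 = k = 5$ and
$\nu_2 = n - (k + 1) = 66 - (5 + 1) = 60$. From Table VII, Appendix B, $F_{.10} = 1.95$. The rejection region
is $F > 1.95$.
d.
The test statistic is $F = \dfrac{R^2/k}{(1 - R^2)/[n - (k + 1)]} = \dfrac{.58/5}{(1 - .58)/[66 - (5 + 1)]} = 16.57$
Since the observed value of the test statistic falls in the rejection region ($F = 16.57 > 1.95$), $H_0$ is
rejected. There is sufficient evidence to indicate that at least one of the 5 independent variables is
useful in the prediction of level of trust at $\alpha = .10$.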
11.13
a.
$\hat{\beta}_1 = 2.006$. For each unit increase in the proportion of block with low-density residential areas, the
mean population density is estimated to increase by 2.006, holding proportion of block with high-density residential areas constant. Since $x_1$ is a proportion, it is unlikely that it can increase by one
unit. A better interpretation is: For each increase of .1 in the proportion of block with low-density
residential areas, the mean population density is estimated to increase by .2006, holding proportion of
block with high-density residential areas constant.
$\hat{\beta}_2 = 5.006$. For each unit increase in the proportion of block with high-density residential areas, the
mean population density is estimated to increase by 5.006, holding proportion of block with low-density residential areas constant. Since $x_2$ is a proportion, it is unlikely that it can increase by one
unit. A better interpretation is: For each increase of .1 in the proportion of block with high-density
residential areas, the mean population density is estimated to increase by .5006, holding proportion of
block with low-density residential areas constant.
b.
$R^2 = .686$. 68.6% of the total sample variation of the population densities is explained by the linear
relationship between population density and the independent variables proportion of block with low-density residential areas and proportion of block with high-density residential areas.
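c.
To determine if the overall model is adequate, we test:

$H_0$: $\beta_1 = \beta_2 = 0$
$H_a$: At least one $\beta_i \neq 0$
d.
The test statistic is $F = \dfrac{R^2/k}{(1 - R^2)/[n - (k + 1)]} = \dfrac{.686/2}{(1 - .686)/[125 - (2 + 1)]} = 133.27$
e.
The rejection region requires $\alpha = .01$ in the upper tail of the F distribution with $\nu_1 = k = 2$ and $\nu_2 = n -
(k + 1) = 125 - (2 + 1) = 122$. From Table X, Appendix B, $F_{.01} \approx 4.79$. The rejection region is
$F > 4.79$.
Since the observed value of the test statistic falls in the rejection region ($F = 133.27 > 4.79$), $H_0$ is
rejected. There is sufficient evidence to indicate the model is adequate at $\alpha = .01$.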
11.14
a.
The least squares prediction equation is:
$\hat{y} = 3.70 + .34x_1 + .49x_2 + .72x_3 + 1.14x_4 + 1.51x_5 + .26x_6 - .14x_7 - .10x_8 - .10x_9$
b.
$\hat{\beta}_0 = 3.70$. This is the estimate of the y-intercept. It has no other meaning because the point with all
independent variables equal to 0 is not in the observed range.
$\hat{\beta}_1 = 0.34$. For each additional walk, the mean number of runs scored is estimated to increase by
.34, holding all other variables constant.
$\hat{\beta}_2 = 0.49$. For each additional single, the mean number of runs scored is estimated to increase by
.49, holding all other variables constant.
$\hat{\beta}_3 = 0.72$. For each additional double, the mean number of runs scored is estimated to increase by
.72, holding all other variables constant.
$\hat{\beta}_4 = 1.14$. For each additional triple, the mean number of runs scored is estimated to increase by
1.14, holding all other variables constant.
$\hat{\beta}_5 = 1.51$. For each additional home run, the mean number of runs scored is estimated to increase
by 1.51, holding all other variables constant.
$\hat{\beta}_6 = 0.26$. For each additional stolen base, the mean number of runs scored is estimated to increase
by .26, holding all other variables constant.
$\hat{\beta}_7 = -0.14$. For each additional time a runner is caught stealing, the mean number of runs scored is
estimated to decrease by .14, holding all other variables constant.
$\hat{\beta}_8 = -0.10$. For each additional strikeout, the mean number of runs scored is estimated to decrease
by .10, holding all other variables constant.
$\hat{\beta}_9 = -0.10$. For each additional out, the mean number of runs scored is estimated to decrease by
.10, holding all other variables constant.
c.
$H_0$: $\beta_7 = 0$
$H_a$: $\beta_7 < 0$
The test statistic is $t = \dfrac{\hat{\beta}_7 - 0}{s_{\hat{\beta}_7}} = \dfrac{-.14 - 0}{.14} = -1.00$
The rejection region requires $\alpha = .05$ in the lower tail of the t-distribution with df = $n - (k + 1) = 234 -
(9 + 1) = 224$. From Table V, Appendix B, $t_{.05} = 1.645$. The rejection region is $t < -1.645$.
Since the observed value of the test statistic does not fall in the rejection region
($t = -1.00 \not< -1.645$), $H_0$ is not rejected. There is insufficient evidence to indicate that the mean
number of runs decreases as the number of runners caught stealing increases, holding all other
variables constant, at $\alpha = .05$.
d.
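For confidence level .95, $\alpha = .05$ and $\alpha/2 = .05/2 = .025$. From Table V, Appendix
B, with df = 224, $t_{.025} = 1.96$. The 95% confidence interval is:
$\hat{\beta}_5 \pm t_{\alpha/2}\, s_{\hat{\beta}_5} \Rightarrow 1.51 \pm 1.96(.05) \Rightarrow 1.51 \pm 0.098 \Rightarrow (1.412,\ 1.608)$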
We are 95% confident that the mean number of runs will increase by anywhere from 1.412 to 1.608
for each additional home run, holding all other variables constant.
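The direction of the alternative hypothesis decides which tail forms the rejection region, as in part c. A hedged sketch of that decision rule follows; the function `rejects_h0` and its string labels are illustrative and not part of any printout, and the critical value is supplied from the t table.

```python
def rejects_h0(t_stat, t_crit, alternative):
    """Compare t with a positive table value t_crit.

    alternative: 'less' (lower tail), 'greater' (upper tail),
    or 'two-sided' (both tails; t_crit must then be the alpha/2 value).
    """
    if alternative == "less":
        return t_stat < -t_crit
    if alternative == "greater":
        return t_stat > t_crit
    if alternative == "two-sided":
        return abs(t_stat) > t_crit
    raise ValueError(f"unknown alternative: {alternative}")


# Part c: H0: beta_7 = 0 versus Ha: beta_7 < 0
t7 = (-0.14 - 0) / 0.14
decision = rejects_h0(t7, t_crit=1.645, alternative="less")

print(f"t = {t7:.2f}, reject H0: {decision}")
```

With $t = -1.00$ the statistic lies above $-1.645$, so $H_0$ is not rejected, in agreement with the conclusion of part c.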
11.15
a.
The first order model would be

$E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_4$
b.
Since the p-value is less than $\alpha$ (p = .005 < .01), $H_0$ is rejected. There is sufficient evidence to
indicate that there is a negative linear relationship between change from routine and the number of
years played golf, holding number of rounds of golf per year, total number of golf vacations, and
average golf score constant.
c.
The statement would be correct if the independent variables are not correlated. However, if the
independent variables are correlated, then this interpretation would not necessarily hold.
d.
To determine if the overall first-order regression model is adequate, we test:

$H_0$: $\beta_1 = \beta_2 = \beta_3 = \beta_4 = 0$
$H_a$: At least one $\beta_i \neq 0$
e.
For all dependent variables, the rejection region requires $\alpha = .01$ in the upper tail of the
F-distribution with $\nu_1 = k = 4$ and $\nu_2 = n - (k + 1) = 393 - (4 + 1) = 388$. From Table X, Appendix
B, $F_{.01} \approx 3.32$. The rejection region is $F > 3.32$. Using MINITAB, the exact $F_{.01,\,4,\,388}$ is 3.37. The
true rejection region is $F > 3.37$.
f.
For Thrill: Since the observed value of the test statistic falls in the rejection region
($F = 5.56 > 3.37$), $H_0$ is rejected. There is sufficient evidence to indicate at least one of the 4
independent variables is linearly related to Thrill at $\alpha = .01$.
For Change from Routine: Since the observed value of the test statistic does not fall in the rejection
region ($F = 3.02 \not> 3.37$), $H_0$ is not rejected. There is insufficient evidence to indicate at least one of
the 4 independent variables is linearly related to Change from Routine at $\alpha = .01$.

For Surprise: Since the observed value of the test statistic does not fall in the rejection region ($F =
3.33 \not> 3.37$), $H_0$ is not rejected. There is insufficient evidence to indicate at least one of the 4
independent variables is linearly related to Surprise at $\alpha = .01$.
g.
For Thrill: Since the p-value is less than $\alpha$ (p < .001 < .01), $H_0$ is rejected. There is sufficient evidence
to indicate that at least one of the independent variables is linearly related to Thrill at $\alpha = .01$.
For Change from Routine: Since the p-value is not less than $\alpha$ (p = .018 > .01), $H_0$ is not rejected.
There is insufficient evidence to indicate that at least one of the independent variables is linearly
related to Change from Routine at $\alpha = .01$.
For Surprise: Since the p-value is not less than $\alpha$ (p = .011 > .01), $H_0$ is not rejected. There is
insufficient evidence to indicate that at least one of the independent variables is linearly related to
Surprise at $\alpha = .01$.
h.
For Thrill: R2 = .055. 5.5% of the total variability around the mean thrill values can be explained by
the model containing the 4 independent variables: x1 = number of rounds of golf per year, x2 = total
number of golf vacations taken, x3 = number of years played golf, and x4 = average golf score.
For Change from Routine: R2 = .030. 3.0% of the total variability around the mean change from
routine values can be explained by the model containing the 4 independent variables: x1 = number of
rounds of golf per year, x2 = total number of golf vacations taken, x3 = number of years played golf,
and x4 = average golf score.
For Surprise: R2 = .023. 2.3% of the total variability around the mean surprise values can be
explained by the model containing the 4 independent variables: x1 = number of rounds of golf per
year, x2 = total number of golf vacations taken, x3 = number of years played golf, and x4 = average
golf score.
11.16
a.
Let $x_1$ = latitude, $x_2$ = longitude, and $x_3$ = depth. The 1st-order model is

$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \varepsilon$.
b.
Using MINITAB, the results are:

Regression Analysis: ARSENIC versus LATITUDE, LONGITUDE, DEPTH-FT

The regression equation is
ARSENIC = - 86991 - 2220 LATITUDE + 1544 LONGITUDE - 0.349 DEPTH-FT

327 cases used, 1 cases contain missing values

Predictor       Coef  SE Coef      T      P
Constant      -86991    31218  -2.79  0.006
LATITUDE     -2220.1    526.8  -4.21  0.000
LONGITUDE     1543.9    373.0   4.14  0.000
DEPTH-FT     -0.3493   0.1566  -2.23  0.026

S = 103.295   R-Sq = 12.8%   R-Sq(adj) = 12.0%

Analysis of Variance
Source           DF       SS      MS      F      P
Regression        3   506196  168732  15.81  0.000
Residual Error  323  3446366   10670
Total           326  3952562

Source      DF  Seq SS
LATITUDE     1  132506
LONGITUDE    1  320624
DEPTH-FT     1   53066

The least squares model is: $\hat{y} = -86{,}991 - 2{,}220.1\,\text{latitude} + 1{,}543.9\,\text{longitude} - .3493\,\text{depth}$
c.
$\hat{\beta}_1 = -2{,}220.1$. For each unit increase in latitude, the mean arsenic level is estimated to decrease by
2,220.1, holding longitude and depth constant.
$\hat{\beta}_2 = 1{,}543.9$. For each unit increase in longitude, the mean arsenic level is estimated to increase by
1,543.9, holding latitude and depth constant.
$\hat{\beta}_3 = -.3493$. For each unit increase in depth, the mean arsenic level is estimated to decrease by
.3493, holding latitude and longitude constant.

d.
From the printout, s = 103.295. We would expect about 95% of all observations to fall within $\pm 2s$ =
$\pm 2(103.295)$ = $\pm 206.590$ units of their predicted values.
e.
From the printout, $R^2$ = 12.8%. 12.8% of the total sample variation of the arsenic levels is explained
by the model containing latitude, longitude, and depth.
From the printout, $R^2_{adj}$ = 12.0%. 12.0% of the total sample variation of the arsenic levels is explained
by the model containing latitude, longitude, and depth, adjusting for the sample size and number of
independent variables in the model.
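The summary figures quoted from the printout (s, $R^2$, adjusted $R^2$, and F) all follow from the sums of squares in the Analysis of Variance table. The sketch below recomputes them as a consistency check; `anova_summary` is a hypothetical helper, and the inputs are the Regression and Residual Error sums of squares with n = 327 cases and k = 3 predictors.

```python
def anova_summary(ss_regression, ss_error, n, k):
    """Recompute fit statistics from the ANOVA sums of squares."""
    ss_total = ss_regression + ss_error
    df_error = n - (k + 1)
    mse = ss_error / df_error              # estimate of sigma^2
    return {
        "s": mse ** 0.5,                   # standard deviation of the errors
        "r2": ss_regression / ss_total,
        "r2_adj": 1 - mse / (ss_total / (n - 1)),
        "f": (ss_regression / k) / mse,    # MSR / MSE
    }


summary = anova_summary(ss_regression=506196, ss_error=3446366, n=327, k=3)

for name, value in summary.items():
    print(f"{name} = {value:.3f}")
```

Rounded to the precision of the printout, the results agree with S = 103.295, R-Sq = 12.8%, R-Sq(adj) = 12.0%, and F = 15.81.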
f.
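To determine if the model is adequate, we test:

$H_0$: $\beta_1 = \beta_2 = \beta_3 = 0$
$H_a$: At least one $\beta_i \neq 0$
From the printout, the test statistic is F = 15.81 and the p-value is p = 0.000.
Since the p-value is less than $\alpha$ (p = 0.000 < .05), $H_0$ is rejected. There is sufficient evidence to indicate
the model is adequate at $\alpha = .05$.
g.
Although the model was found to be adequate for $\alpha = .05$, it is not a particularly good model. The $R^2$
value is only 12.8% and $R^2_{adj}$ = 12.0%. Only about 12% of the variation in arsenic values is explained
by the model.
11.17
a.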
The first-order model is: $E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2$
b.
Using MINITAB, the results of fitting the model are:

Regression Analysis: Earnings versus Age, Hours

The regression equation is
Earnings = - 20 + 13.4 Age + 244 Hours

Predictor    Coef  SE Coef      T      P
Constant    -20.4    652.7  -0.03  0.976
Age        13.350    7.672   1.74  0.107
Hours      243.71    63.51   3.84  0.002

S = 547.737   R-Sq = 58.2%   R-Sq(adj) = 51.3%

Analysis of Variance
Source          DF       SS       MS     F      P
Regression       2  5018232  2509116  8.36  0.005
Residual Error  12  3600196   300016
Total           14  8618428

Source  DF   Seq SS
Age      1   600498
Hours    1  4417734

Unusual Observations
Obs   Age  Earnings   Fit  SE Fit  Residual  St Resid
  4  18.0      1552  2657     205     -1105    -2.18R

R denotes an observation with a large standardized residual.

The least squares prediction equation is: $\hat{y} = -20.4 + 13.350x_1 + 243.71x_2$

c.
$\hat{\beta}_0 = -20.4$. This has no meaning since $x_1 = 0$ and $x_2 = 0$ are not in the observed range.
$\hat{\beta}_1 = 13.350$. For each additional year of age, the mean annual earnings is predicted to increase by
\$13.35, holding hours worked per day constant.
$\hat{\beta}_2 = 243.71$. For each additional hour worked per day, the mean annual earnings is predicted to
increase by \$243.71, holding age constant.
d.
To determine if age is a useful predictor of annual earnings, we test:

$H_0$: $\beta_1 = 0$
$H_a$: $\beta_1 \neq 0$
The test statistic is t = 1.74.
The p-value is p = .107. Since the p-value is greater than $\alpha = .01$ (p = .107 > .01), $H_0$ is not
rejected. There is insufficient evidence to indicate that age is a useful predictor of annual earnings,
adjusted for hours worked per day, at $\alpha = .01$.
e.
For confidence coefficient .95, $\alpha = .05$ and $\alpha/2 = .05/2 = .025$. From Table V, Appendix B, with
df = $n - (k + 1) = 15 - (2 + 1) = 12$, $t_{.025} = 2.179$. The 95% confidence interval is:
$\hat{\beta}_2 \pm t_{.025}\, s_{\hat{\beta}_2} \Rightarrow 243.71 \pm 2.179(63.51) \Rightarrow 243.71 \pm 138.388 \Rightarrow (105.322,\ 382.098)$
We are 95% confident that the change in the mean annual earnings for each additional hour worked
per day will be somewhere between \$105.322 and \$382.098, holding age constant.
f.
From the printout, $R^2$ = R-Sq = 58.2% or .582. 58.2% of the total sample variation of annual earnings
is explained by the model containing age and hours worked per day.
g.
$R^2_a$ = R-Sq(adj) = 51.3% or .513. 51.3% of the total sample variation of annual earnings is
explained by the model containing age and hours worked per day, adjusted for the sample size and
the number of parameters in the model.
h.
To determine if at least one of the variables is useful in predicting the annual earnings, we test:
$H_0$: $\beta_1 = \beta_2 = 0$
$H_a$: At least one $\beta_i \neq 0$
The test statistic is F = 8.36 and the p-value is p = .005. Since the p-value is less than
$\alpha = .01$ (p = .005 < .01), $H_0$ is rejected. There is sufficient evidence to indicate at least one of the
variables is useful in predicting the annual earnings at $\alpha = .01$.
11.18
a.
From MINITAB, the output is:

Regression Analysis: DDT versus Mile, Length, Weight

The regression equation is
DDT = - 108 + 0.0851 Mile + 3.77 Length - 0.0494 Weight

Predictor      Coef  SE Coef      T      P
Constant    -108.07    62.70  -1.72  0.087
Mile        0.08509  0.08221   1.03  0.302
Length        3.771    1.619   2.33  0.021
Weight     -0.04941  0.02926  -1.69  0.094

S = 97.48   R-Sq = 3.9%   R-Sq(adj) = 1.8%

Analysis of Variance
Source           DF       SS     MS     F      P
Regression        3    53794  17931  1.89  0.135
Residual Error  140  1330210   9501
Total           143  1384003

The least squares prediction equation is: $\hat{y} = -108.07 + 0.08509x_1 + 3.771x_2 - 0.04941x_3$

b.
s = 97.48. We would expect about 95% of the observed values of DDT level to fall within $\pm 2s$ or
$\pm 2(97.48)$ = $\pm 194.96$ units of their least squares predicted values.
c.
To determine if at least one of the variables is useful in predicting the DDT level, we test:
$H_0$: $\beta_1 = \beta_2 = \beta_3 = 0$
$H_a$: At least one $\beta_i \neq 0$
The test statistic is F = 1.89 and the p-value is p = .135. Since the p-value is not less than $\alpha = .05$
($p = .135 \not< .05$), $H_0$ is not rejected. There is insufficient evidence to indicate at least one of the
variables is useful in predicting the DDT level at $\alpha = .05$.
d.
To determine if DDT level increases as length increases, we test:

$H_0$: $\beta_2 = 0$
$H_a$: $\beta_2 > 0$
The test statistic is t = 2.33.
The p-value is p = .021/2 = .0105. Since the p-value is less than $\alpha$ (p = .0105 < .05), $H_0$ is rejected.
There is sufficient evidence to indicate that DDT level increases as length increases, holding the
other variables constant, at $\alpha = .05$.
The observed significance level is p = .0105.
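The printout reports two-tailed p-values, so the one-tailed test in part d halves the reported value. That shortcut is valid only when the sign of the t statistic agrees with the direction of $H_a$. A minimal sketch of the conversion, with a hypothetical helper name:

```python
def one_sided_p(two_sided_p, t_stat, alternative):
    """Convert a two-tailed p-value to a one-tailed p-value.

    alternative: 'greater' for Ha: beta > 0, 'less' for Ha: beta < 0.
    If the sign of t_stat disagrees with Ha, the one-tailed
    p-value is 1 - p/2 rather than p/2.
    """
    if alternative not in ("greater", "less"):
        raise ValueError(f"unknown alternative: {alternative}")
    agrees = t_stat > 0 if alternative == "greater" else t_stat < 0
    half = two_sided_p / 2
    return half if agrees else 1 - half


# Part d: Length has t = 2.33 with two-tailed p = 0.021; Ha: beta_2 > 0
p_length = one_sided_p(0.021, 2.33, "greater")

print(f"one-tailed p-value = {p_length:.4f}")
```

Had the alternative been $H_a$: $\beta_2 < 0$ with the same positive t, the one-tailed p-value would have been $1 - .0105 = .9895$.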
e.
For confidence coefficient .95, $\alpha = .05$ and $\alpha/2 = .05/2 = .025$. From Table V, Appendix B, with
df = $n - (k + 1) = 144 - (3 + 1) = 140$, $t_{.025} = 1.96$. The 95% confidence interval is:
$\hat{\beta}_3 \pm t_{\alpha/2}\, s_{\hat{\beta}_3} \Rightarrow -0.04941 \pm 1.96(0.02926) \Rightarrow -0.04941 \pm 0.05735 \Rightarrow (-0.10676,\ 0.00794)$
We are 95% confident that the change in the mean DDT level will be between $-0.10676$ and 0.00794 for each
additional unit increase in weight, holding length and mile constant. Since 0 is in the interval, there
is no evidence that weight and DDT level are linearly related.
11.19
a.
The 1st-order model is $E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_4 + \beta_5 x_5$.
b.

Regression Analysis: HEATRATE versus RPM, INLET-TEMP, ...

The regression equation is
HEATRATE = 13614 + 0.0888 RPM - 9.20 INLET-TEMP + 14.4 EXH-TEMP + 0.4 CPRATIO
- 0.848 AIRFLOW

Predictor       Coef  SE Coef      T      P
Constant     13614.5    870.0  15.65  0.000
RPM          0.08879  0.01391   6.38  0.000
INLET-TEMP    -9.201    1.499  -6.14  0.000
EXH-TEMP      14.394    3.461   4.16  0.000
CPRATIO         0.35    29.56   0.01  0.991
AIRFLOW      -0.8480   0.4421  -1.92  0.060

S = 458.828   R-Sq = 92.4%   R-Sq(adj) = 91.7%

Analysis of Variance
Source          DF         SS        MS       F      P
Regression       5  155055273  31011055  147.30  0.000
Residual Error  61   12841935    210524
Total           66  167897208

Source      DF     Seq SS
RPM          1  119598530
INLET-TEMP   1   26893467
EXH-TEMP     1    7784225
CPRATIO      1       4623
AIRFLOW      1     774427

Unusual Observations
Obs    RPM  HEATRATE      Fit  SE Fit  Residual  St Resid
 11  18000   14628.0  13214.0   117.9    1414.0     3.19R
 32  14950   10656.0  11663.0   132.5   -1007.0    -2.29R
 36   4473   13523.0  12489.5   195.1    1033.5     2.49R
 47   7280   11588.0  10533.0   154.7    1055.0     2.44R
 61  33000   16243.0  15758.0   246.5     485.0     1.25 X
 64   3600    8714.0   8415.2   340.9     298.8     0.97 X

R denotes an observation with a large standardized residual.
X denotes an observation whose X value gives it large influence.

The least squares prediction equation is: $\hat{y} = 13{,}614.5 + 0.0888x_1 - 9.201x_2 + 14.394x_3 + 0.35x_4 - 0.848x_5$

c.
$\hat{\beta}_0 = 13{,}614.5$. Since 0 is not within the observed range of all the independent variables, this value has no
meaning.
$\hat{\beta}_1 = 0.0888$. For each unit increase in RPM, the mean heat rate is estimated to increase by .0888,
holding all the other 4 variables constant.
$\hat{\beta}_2 = -9.201$. For each unit increase in inlet temperature, the mean heat rate is estimated to decrease
by 9.201, holding all the other 4 variables constant.
$\hat{\beta}_3 = 14.394$. For each unit increase in exhaust temperature, the mean heat rate is estimated to
increase by 14.394, holding all the other 4 variables constant.
$\hat{\beta}_4 = 0.35$. For each unit increase in cycle pressure ratio, the mean heat rate is estimated to increase
by 0.35, holding all the other 4 variables constant.
$\hat{\beta}_5 = -0.8480$. For each unit increase in air flow rate, the mean heat rate is estimated to decrease by
.848, holding all the other 4 variables constant.
d.
From the printout, s = 458.828. We would expect about 95% of the heat rate values to fall within $\pm 2s$ or
$\pm 2(458.828)$ = $\pm 917.656$ units of their least squares predicted values.
e.
To determine if at least one of the variables is useful in predicting the heat rate values, we test:
$H_0$: $\beta_1 = \beta_2 = \beta_3 = \beta_4 = \beta_5 = 0$
$H_a$: At least one $\beta_i \neq 0$
The test statistic is F = 147.30 and the p-value is p = .000. Since the p-value is less than $\alpha = .01$ (p =
.000 < .01), $H_0$ is rejected. There is sufficient evidence to indicate at least one of the variables is
useful in predicting the heat rate values at $\alpha = .01$.
f.
$R^2_a$ = R-Sq(adj) = 91.7% or .917. 91.7% of the total sample variation of the heat rate values is
explained by the model containing the 5 independent variables, adjusted for the sample size and the
number of parameters in the model.
g.
To determine if there is evidence to indicate heat rate is linearly related to inlet temperature, we test:
$H_0$: $\beta_2 = 0$
$H_a$: $\beta_2 \neq 0$
The test statistic is t = -6.14 and the p-value is p = 0.000. Since the p-value is less than $\alpha = .01$ (p =
.000 < .01), $H_0$ is rejected. There is sufficient evidence to indicate heat rate is linearly related to inlet
temperature at $\alpha = .01$.
11.20
a.
$E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_4 + \beta_5 x_5 + \beta_6 x_6 + \beta_7 x_7$
b.
Using MINITAB, the output is:

The regression equation is
y = 0.998 - 0.0224 x1 + 0.156 x2 - 0.0172 x3 - 0.00953 x4 + 0.421 x5 + 0.417 x6 - 0.155 x7

Predictor       Coef     StDev      T      P
Constant      0.9981    0.2475   4.03  0.002
x1         -0.022429  0.005039  -4.45  0.001
x2           0.15571   0.07429   2.10  0.060
x3          -0.01719   0.01186  -1.45  0.175
x4         -0.009527  0.009619  -0.99  0.343
x5            0.4214    0.1008   4.18  0.002
x6            0.4171    0.4377   0.95  0.361
x7           -0.1552    0.1486  -1.04  0.319

S = 0.4635   R-Sq = 77.1%   R-Sq(adj) = 62.5%

Analysis of Variance
Source          DF       SS      MS     F      P
Regression       7   7.9578  1.1368  5.29  0.007
Residual Error  11   2.3632  0.2148
Total           18  10.3210

Source  DF  Seq SS
x1       1  1.4016
x2       1  1.9263
x3       1  0.1171
x4       1  0.0446
x5       1  4.0771
x6       1  0.1565
x7       1  0.2345

Unusual Observations
Obs    x1      y     Fit  StDev Fit  Residual  St Resid
 14  80.0  0.120  -0.628      0.328     0.748     2.28R

The least squares model is $\hat{y} = .9981 - .0224x_1 + .1557x_2 - .0172x_3 - .0095x_4 + .4214x_5 + .4171x_6 - .1552x_7$
c.
$\hat{\beta}_0 = .9981$ = the estimate of the y-intercept.
$\hat{\beta}_1 = -.0224$. We estimate that the mean voltage will decrease by .0224 kw/cm for each additional
increase of 1% of $x_1$, the disperse phase volume (with all other variables held constant).
$\hat{\beta}_2 = .1557$. We estimate that the mean voltage will increase by .1557 kw/cm for each additional
increase of 1% of $x_2$, the salinity (with all other variables held constant).
$\hat{\beta}_3 = -.0172$. We estimate that the mean voltage will decrease by .0172 kw/cm for each additional
increase of 1 degree of $x_3$, the temperature in Celsius (with all other variables held constant).
$\hat{\beta}_4 = -.0095$. We estimate that the mean voltage will decrease by .0095 kw/cm for each additional
increase of 1 hour of $x_4$, the time delay (with all other variables held constant).
$\hat{\beta}_5 = .4214$. We estimate that the mean voltage will increase by .4214 kw/cm for each additional
increase of 1% of $x_5$, surfactant concentration (with all other variables held constant).
$\hat{\beta}_6 = .4171$. We estimate that the mean voltage will increase by .4171 kw/cm for each additional
increase of 1 unit of $x_6$, span: Triton (with all other variables held constant).
$\hat{\beta}_7 = -.1552$. We estimate that the mean voltage will decrease by .1552 kw/cm for each additional
increase of 1% of $x_7$, the solid particles (with all other variables held constant).
d.
To determine if at least one of the variables is useful in predicting voltage, we test:

$H_0$: $\beta_1 = \beta_2 = \beta_3 = \beta_4 = \beta_5 = \beta_6 = \beta_7 = 0$
$H_a$: At least one $\beta_i \neq 0$
The test statistic is F = 5.29 and the p-value is p = .007. Since the p-value is less than $\alpha = .10$ (p =
.007 < .10), $H_0$ is rejected. There is sufficient evidence to indicate at least one of the 7 variables is
useful in predicting voltage at $\alpha = .10$.
11.21
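a.
$R^2 = .362$. 36.2% of the variability in the AC scores can be explained by the model containing the
variables self-esteem score, optimism score, and group cohesion score.
b.
To test the utility of the model, we test:
$H_0$: $\beta_1 = \beta_2 = \beta_3 = 0$
$H_a$: At least one $\beta_i \neq 0$, i = 1, 2, 3
The test statistic is:
$F = \dfrac{R^2/k}{(1 - R^2)/[n - (k + 1)]} = \dfrac{.362/3}{(1 - .362)/[31 - (3 + 1)]} = 5.11$
The rejection region requires $\alpha = .05$ in the upper tail of the F distribution with $\nu_1 = k = 3$ and $\nu_2 = n - (k + 1) = 31 - (3 + 1) = 27$. From Table VIII, Appendix B, $F_{.05} = 2.96$. The rejection region is $F > 2.96$.
Since the observed value of the test statistic falls in the rejection region ($F = 5.11 > 2.96$), $H_0$ is
rejected. There is sufficient evidence that the model is useful in predicting AC score at $\alpha = .05$.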
11.22
$H_0$: $\beta_1 = \beta_2 = \cdots = \beta_{18} = 0$
$H_a$: At least one $\beta_i \neq 0$, i = 1, 2, ... , 18
The test statistic is $F = \dfrac{R^2/k}{(1 - R^2)/[n - (k + 1)]} = \dfrac{.95/18}{(1 - .95)/[20 - (18 + 1)]} = 1.06$
The rejection region requires $\alpha = .05$ in the upper tail of the F distribution with $\nu_1 = k = 18$ and $\nu_2 = n - (k
+ 1) = 20 - (18 + 1) = 1$. From Table VIII, Appendix B, $F_{.05} \approx 245.9$. The rejection region is $F > 245.9$.
Since the observed value of the test statistic does not fall in the rejection region ($F = 1.06 \not> 245.9$), $H_0$ is not
rejected. There is insufficient evidence to indicate the model is adequate at $\alpha = .05$.

Note: Although $R^2$ is large, there are so many variables in the model that $\nu_2$ is small.
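The effect described in the note can be seen numerically. The sketch below recomputes F and also evaluates the adjusted $R^2$, using the usual formula $R^2_a = 1 - \frac{n - 1}{n - (k + 1)}(1 - R^2)$. The adjusted value is not reported in this solution and is shown only as an illustration; both function names are hypothetical.

```python
def f_from_r2(r2, n, k):
    """Global F statistic: (R^2 / k) / ((1 - R^2) / (n - (k + 1)))."""
    return (r2 / k) / ((1 - r2) / (n - (k + 1)))


def adjusted_r2(r2, n, k):
    """Adjusted R^2 = 1 - [(n - 1) / (n - (k + 1))] * (1 - R^2)."""
    return 1 - (n - 1) / (n - (k + 1)) * (1 - r2)


# 18 predictors fitted to only 20 observations leave 1 error df
f_stat = f_from_r2(r2=0.95, n=20, k=18)
r2_adj = adjusted_r2(r2=0.95, n=20, k=18)

print(f"F = {f_stat:.2f}")
print(f"adjusted R^2 = {r2_adj:.2f}")
```

With a single error degree of freedom, an $R^2$ of .95 shrinks to an adjusted value of about .05, which is consistent with the very small F statistic.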
11.23
a.
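Model 1:
$H_0$: $\beta_1 = 0$
$H_a$: $\beta_1 \neq 0$
The test statistic is $t = \dfrac{\hat{\beta}_1 - 0}{s_{\hat{\beta}_1}} = \dfrac{.0354}{.0137} = 2.58$.
Since no $\alpha$ was given, we will use $\alpha = .05$. The rejection region requires $\alpha/2 = .05/2 = .025$ in each
tail of the t distribution. From Table V, Appendix B, with df = $n - (k + 1) = 12 - (1 + 1) = 10$, $t_{.025}$ =
2.228. The rejection region is $t < -2.228$ or $t > 2.228$.
Since the observed value of the test statistic falls in the rejection region ($t = 2.58 > 2.228$), $H_0$ is
rejected. There is sufficient evidence to indicate that there is a linear relationship between vintage
and the logarithm of price.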
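Model 2:
$H_0$: $\beta_1 = 0$
$H_a$: $\beta_1 \neq 0$
The test statistic is $t = \dfrac{\hat{\beta}_1 - 0}{s_{\hat{\beta}_1}} = \dfrac{.0238}{.00717} = 3.32$
Since no $\alpha$ was given, we will use $\alpha = .05$. The rejection region requires $\alpha/2 = .05/2 = .025$ in each
tail of the t distribution. From Table V, Appendix B, with df = $n - (k + 1) = 12 - (4 + 1) = 7$, $t_{.025}$ =
2.365. The rejection region is $t < -2.365$ or $t > 2.365$.
Since the observed value of the test statistic falls in the rejection region ($t = 3.32 > 2.365$), $H_0$ is
rejected. There is sufficient evidence to indicate that there is a linear relationship between vintage
and the logarithm of price, adjusting for all other variables.
$H_0$: $\beta_2 = 0$
$H_a$: $\beta_2 \neq 0$
The test statistic is $t = \dfrac{\hat{\beta}_2 - 0}{s_{\hat{\beta}_2}} = \dfrac{.616}{.0952} = 6.47$
The rejection region is $t < -2.365$ or $t > 2.365$.

Since the observed value of the test statistic falls in the rejection region ($t = 6.47 > 2.365$), $H_0$ is
rejected. There is sufficient evidence to indicate that there is a linear relationship between average
growing season temperature and the logarithm of price, adjusting for all other variables.
$H_0$: $\beta_3 = 0$
$H_a$: $\beta_3 \neq 0$
The test statistic is $t = \dfrac{\hat{\beta}_3 - 0}{s_{\hat{\beta}_3}} = \dfrac{.00386}{.00081} = 4.77$
The rejection region is $t < -2.365$ or $t > 2.365$.

Since the observed value of the test statistic falls in the rejection region ($t = 4.77 > 2.365$), $H_0$ is
rejected. There is sufficient evidence to indicate that there is a linear relationship between Sept./
Aug. rainfall and the logarithm of price, adjusting for all other variables.
$H_0$: $\beta_4 = 0$
$H_a$: $\beta_4 \neq 0$
The test statistic is $t = \dfrac{\hat{\beta}_4 - 0}{s_{\hat{\beta}_4}} = \dfrac{.0001173}{.000482} = 0.24$.
The rejection region is $t < -2.365$ or $t > 2.365$.

Since the observed value of the test statistic does not fall in the rejection region ($t = 0.24 \not> 2.365$),
$H_0$ is not rejected. There is insufficient evidence to indicate that there is a linear relationship between
rainfall in months preceding vintage and the logarithm of price, adjusting for all other variables.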
Model 3:
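$H_0: \beta_1 = 0$
$H_a: \beta_1 \neq 0$
The test statistic is $t = \frac{\hat{\beta}_1 - 0}{s_{\hat{\beta}_1}} = \frac{.0240}{.00747} = 3.21$.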
tail of the t distribution. From Table V, Appendix B, with df = n (k + 1) = 12 (5 + 1) + 7, t.025 =
and the logarithm of price, adjusting for all other variables.
$H_0: \beta_2 = 0$
$H_a: \beta_2 \neq 0$
The test statistic is $t = \frac{\hat{\beta}_2 - 0}{s_{\hat{\beta}_2}} = \frac{.608}{.116} = 5.24$.

rejected. There is sufficient evidence to indicate that there is a linear relationship between average
growing season temperature and the logarithm of price, adjusting for all other variables.
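$H_0: \beta_3 = 0$
$H_a: \beta_3 \neq 0$
The test statistic is $t = \frac{\hat{\beta}_3 - 0}{s_{\hat{\beta}_3}} = \frac{-.00380}{.00095} = -4.00$.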

rejected. There is sufficient evidence to indicate that there is a linear relationship between Sept./Aug.
rainfall and the logarithm of price, adjusting for all other variables.
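$H_0: \beta_4 = 0$
$H_a: \beta_4 \neq 0$
The test statistic is $t = \frac{\hat{\beta}_4 - 0}{s_{\hat{\beta}_4}} = \frac{.00115}{.000505} = 2.28$.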

rainfall in months preceding vintage and the logarithm of price, adjusting for all other variables.
$H_0: \beta_5 = 0$
$H_a: \beta_5 \neq 0$
The test statistic is $t = \frac{\hat{\beta}_5 - 0}{s_{\hat{\beta}_5}} = \frac{.00765}{.0565} = 0.14$.

average September temperature and the logarithm of price, adjusting for all other variables.
b.
Model 1:
$\hat{\beta}_1 = .0354$, $e^{.0354} - 1 = .036$
We estimate that the mean price will increase by 3.6% for each additional increase of 1 unit of x1,
vintage year.
Model 2:
$\hat{\beta}_1 = .0238$, $e^{.0238} - 1 = .024$
We estimate that the mean price will increase by 2.4% for each additional increase of 1 unit of x1,
vintage year (with all other variables held constant).
$\hat{\beta}_2 = .616$, $e^{.616} - 1 = .852$
We estimate that the mean price will increase by 85.2% for each additional increase of 1 unit of x2,
average growing season temperature in °C (with all other variables held constant).
$\hat{\beta}_3 = -.00386$, $e^{-.00386} - 1 = -.004$
We estimate that the mean price will decrease by .4% for each additional increase of 1 unit of x3,
Sept./Aug. rainfall in cm (with all other variables held constant).
$\hat{\beta}_4 = .0001173$, $e^{.0001173} - 1 = .0001$
We estimate that the mean price will increase by .01% for each additional increase of 1 unit of x4,
rainfall in months preceding vintage in cm (with all other variables held constant).
Model 3:
$\hat{\beta}_1 = .0240$, $e^{.0240} - 1 = .024$
We estimate that the mean price will increase by 2.4% for each additional increase of 1 unit of x1,
vintage year (with all other variables held constant).
$\hat{\beta}_2 = .608$, $e^{.608} - 1 = .837$
We estimate that the mean price will increase by 83.7% for each additional increase of 1 unit of x2,
average growing season temperature in °C (with all other variables held constant).
$\hat{\beta}_3 = -.00380$, $e^{-.00380} - 1 = -.004$
We estimate that the mean price will decrease by .4% for each additional increase of 1 unit of x3,
Sept./Aug. rainfall in cm (with all other variables held constant).
$\hat{\beta}_4 = .00115$, $e^{.00115} - 1 = .001$
We estimate that the mean price will increase by .1% for each additional increase of 1 unit of
x4, rainfall in months preceding vintage in cm (with all other variables held constant).
$\hat{\beta}_5 = .00765$, $e^{.00765} - 1 = .008$
We estimate that the mean price will increase by .8% for each additional increase of 1 unit of
x5, average Sept. temperature in °C (with all other variables held constant).
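Because the response in these models is the logarithm of price, each coefficient converts to a percentage change in price through e^β − 1. A small sketch of that conversion follows; the function name `percent_change` is illustrative only.

```python
import math


def percent_change(beta: float) -> float:
    """Percentage change in the response per unit increase in x,
    for a model fit to the natural log of the response."""
    return 100.0 * (math.exp(beta) - 1.0)


# Coefficients quoted for Models 1 and 2
for name, beta in [("vintage, Model 1", 0.0354),
                   ("growing season temperature, Model 2", 0.616),
                   ("Sept./Aug. rainfall, Model 2", -0.00386)]:
    print(f"{name}: {percent_change(beta):+.1f}%")
```

For small coefficients the percentage change is close to 100β, which is why .0238 converts to about 2.4%, while a large coefficient such as .616 converts to a noticeably larger figure than 61.6%.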
c.
I would recommend model 2. Model 1 has only 1 independent variable in the model and it is
significant at $\alpha = .05$. The $R^2$ for this model is $R^2 = .212$ and s = .575. Model 2 has 4 independent
variables in the model and all terms are significant at $\alpha = .05$ except one. This one variable is
significant at $\alpha = .10$. This model has $R^2 = .828$ and s = .287. Comparing model 2 to model 1, the $R^2$
for model 2 is much larger than that for model 1 and the estimate of the standard deviation is much
smaller. Model 3 contains all of the independent variables that model 2 has plus one additional
variable. This additional variable is not significant at $\alpha = .10$. In addition, the $R^2$ for this new model
is .828, the same as for model 2. However, the estimate of the standard deviation of model 3 is now
larger than that of model 2. This indicates that model 2 is better than model 3.
11.24
a.

Regression Analysis: Labor versus Pounds, Units, Weight

Labor = 132 + 2.73 Pounds + 0.0472 Units - 2.59 Weight

Predictor     Coef  SE Coef      T      P
Constant    131.92    25.69   5.13  0.000
Pounds       2.726    2.275   1.20  0.248
Units      0.04722  0.09335   0.51  0.620
Weight     -2.5874   0.6428  -4.03  0.001

S = 9.810   R-Sq = 77.0%   R-Sq(adj) = 72.7%

Source          DF      SS      MS      F      P
Regression       3  5158.3  1719.4  17.87  0.000
Residual Error  16  1539.9    96.2
Total           19  6698.2

Source  DF  Seq SS
Pounds   1  3400.6
Units    1   198.4
Weight   1  1559.3

The least squares equation is:
$\hat{y} = 131.92 + 2.726x_1 + .0472x_2 - 2.587x_3$
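The R-Sq and R-Sq(adj) values on the printout can be recovered from the sums of squares. This is a hedged sketch using the standard adjusted R² formula; the function names are illustrative.

```python
def r_squared(ss_error: float, ss_total: float) -> float:
    """R^2 = 1 - SSE / SS(Total)."""
    return 1.0 - ss_error / ss_total


def adjusted_r_squared(ss_error: float, ss_total: float, n: int, k: int) -> float:
    """Adjusted R^2 = 1 - [(n - 1) / (n - (k + 1))] * (SSE / SS(Total))."""
    return 1.0 - (n - 1) / (n - (k + 1)) * (ss_error / ss_total)


# From the printout: SSE = 1539.9, SS(Total) = 6698.2, n = 20, k = 3
print(round(100 * r_squared(1539.9, 6698.2), 1))
print(round(100 * adjusted_r_squared(1539.9, 6698.2, n=20, k=3), 1))
```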

b.
To test the usefulness of the model, we test:

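$H_0: \beta_1 = \beta_2 = \beta_3 = 0$
$H_a$: At least one $\beta_i \neq 0$, for i = 1, 2, 3
The test statistic is:
$$F = \frac{MSR}{MSE} = \frac{1719.4}{96.2} = 17.87$$
The rejection region requires $\alpha = .01$ in the upper tail of the F-distribution with $\nu_1 = k = 3$ and $\nu_2 = n - (k + 1) = 20 - (3 + 1) = 16$. The rejection region is F > 5.29.
Since the observed value of the test statistic falls in the rejection region (F = 17.87 > 5.29), $H_0$ is
rejected. There is sufficient evidence to indicate a relationship exists between hours of labor and at
least one of the independent variables at $\alpha = .01$.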
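The F statistic in the analysis of variance table is the ratio of the two mean squares, each obtained by dividing a sum of squares by its degrees of freedom. A minimal sketch under that definition follows; `anova_f` is an illustrative name.

```python
def anova_f(ss_regression: float, ss_error: float, k: int, n: int):
    """Return (MSR, MSE, F) for a regression with k predictors and n observations."""
    msr = ss_regression / k            # mean square for regression
    mse = ss_error / (n - (k + 1))     # mean square for error
    return msr, mse, msr / mse


# Sums of squares from the printout: SSR = 5158.3, SSE = 1539.9, k = 3, n = 20
msr, mse, f_stat = anova_f(5158.3, 1539.9, k=3, n=20)
print(round(msr, 1), round(mse, 1), round(f_stat, 2))
```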
c.
$H_0: \beta_2 = 0$
$H_a: \beta_2 \neq 0$
The test statistic is t = .51. The p-value = .620. We reject $H_0$ if p-value < $\alpha$. Since .620 > .05, do not
reject $H_0$. There is insufficient evidence to indicate a relationship exists between hours of labor and
percentage of units shipped by truck, all other variables held constant, at $\alpha = .05$.
d.
R² is printed as R-Sq. R² = .770. We conclude that 77% of the sample variation of the labor hours is
explained by the regression model, including the independent variables pounds shipped, percentage
of units shipped by truck, and weight.
e.
If the average number of pounds per shipment increases from 20 to 21, the estimated change in mean
number of hours of labor is −2.587. Thus, it will cost $7.50(2.587) = $19.4025 less, if the variables
x1 and x2 are held constant.
f.
Since s = Standard Error = 9.81, we can estimate approximately with ±2s precision, or ±2(9.81) =
±19.62 hours.
g.
No. Regression analysis only determines if variables are related. It cannot be used to determine
cause and effect.
11.25
a.
For x1 = 1, x2 = 10, x3 = 5, and x4 = 2, ŷ = 3.58 + .01(1) − .06(10) − .01(5) + .42(2) = 3.78
b.
For x1 = 0, x2 = 8, x3 = 10, and x4 = 4, ŷ = 3.58 + .01(0) − .06(8) − .01(10) + .42(4) = 4.68
11.26
You would look up the number of walks (x1), singles (x2), doubles (x3), triples (x4), home runs (x5), stolen
bases (x6), caught stealing (x7), strikeouts (x8), and outs (x9) for your favorite team. Then use the following
fitted regression line to predict the number of runs scored:
$\hat{y} = 3.70 + .34x_1 + .49x_2 + .72x_3 + 1.14x_4 + 1.51x_5 + .26x_6 - .14x_7 - .10x_8 - .10x_9$
11.27
a.
The 95% prediction interval is (1,759.75, 4,275.38). We are 95% confident that the
actual annual earnings for a vendor who is 45 years old and who works 10 hours per day are between
$1,759.75 and $4,275.38.
b.
The 95% confidence interval is (2,620.25, 3,414.87). We are 95% confident that the true mean
annual earnings for vendors who are 45 years old and who work 10 hours per day are between
$2,620.25 and $3,414.87.
c.
Yes. The prediction interval for the ACTUAL value of y is always wider than the confidence interval
for the MEAN value of y.
11.28
From the printout, the 90% prediction interval is (−143.218, 180.978). We are 90% confident that the
actual DDT level for a fish caught 300 miles upstream that is 40 centimeters long and weighs 800 grams
will be between −143.218 and 180.978. Since the DDT level cannot be negative, the interval is effectively
between 0 and 180.978.
11.29
a.
The 95% prediction interval is (11,599.6, 13,665.5). We are 95% confident that the actual heat rate
will be between 11,599.6 and 13,665.5 when the RPM is 7,500, the inlet temperature is 1,000, the
exhaust temperature is 525, the cycle pressure ratio is 13.5 and the air flow rate is 10.
b.
The 95% confidence interval is (12,157.9, 13,107.1). We are 95% confident that the mean heat rate
will be between 12,157.9 and 13,107.1 when the RPM is 7,500, the inlet temperature is 1,000, the
exhaust temperature is 525, the cycle pressure ratio is 13.5 and the air flow rate is 10.
c.
Yes. The confidence interval for the mean will always be narrower than the prediction interval for the
actual value. This is because there are 2 sources of error involved in predicting an actual value and only
one involved in estimating the mean. First, we have the error in locating the mean of the
distribution. Second, once the mean is located, the actual value can still vary around the mean.
Only the first source of error, the error in locating the mean, is involved when estimating the mean.
11.30
Yes, we agree. The fitted regression model is:
$\hat{y} = -80{,}991 - 2{,}220.1\,\text{latitude} + 1{,}543.9\,\text{longitude} - .3493\,\text{depth}$

Because the estimated coefficients for latitude and depth are negative, the higher levels of arsenic will occur
when latitude and depth are low. Because the estimated coefficient for longitude is positive, the higher levels of
arsenic will occur when longitude is high.
The lowest value of latitude is 23.755, the maximum longitude is 90.662 and the minimum depth is 25.

Predicted Values for New Observations

New
Obs     Fit  SE Fit       95% CI             95% PI
  1  232.43   23.23  (186.73, 278.14)  (24.14, 440.73)X

X denotes a point that is an outlier in the predictors.

Values of Predictors for New Observations

New
Obs  LATITUDE  LONGITUDE  DEPTH-FT
  1      23.8       90.7      25.0

From the printout, the 95% prediction interval is (24.14, 440.73). We are 95% confident that the actual
arsenic level will be between 24.14 and 440.73 when the latitude is 23.755, longitude is 90.662, and depth
is 25.
11.31
a.

Regression Analysis: PPRatio versus ARTenure, AR6Year, AveSal6

PPRatio = 0.70 + 0.180 ARTenure + 0.0729 AR6Year - 0.120 AveSal6

Predictor      Coef  SE Coef      T      P
Constant      0.704    1.192   0.59  0.556
ARTenure    0.17957  0.08876   2.02  0.045
AR6Year     0.07285  0.07379   0.99  0.325
AveSal6    -0.11981  0.04238  -2.83  0.005

S = 8.89248   R-Sq = 11.2%   R-Sq(adj) = 9.6%

Source           DF        SS      MS     F      P
Regression        3   1704.73  568.24  7.19  0.000
Residual Error  171  13522.03   79.08
Total           174  15226.76

Source    DF  Seq SS
ARTenure   1  994.79
AR6Year    1   78.02
AveSal6    1  631.93

The least squares prediction equation is: $\hat{y} = .704 + .180x_1 + .0729x_2 - .120x_3$
b.

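$H_0: \beta_1 = \beta_2 = \beta_3 = 0$
$H_a$: At least one $\beta_i \neq 0$, i = 1, 2, 3
From the printout, the test statistic is F = 7.19 and the p-value is p = .000.
Since the p-value is less than $\alpha$ (p = .000 < .05), $H_0$ is rejected. There is sufficient evidence to indicate
the model is adequate at $\alpha = .05$.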
c.

New
Obs     Fit  SE Fit       95% CI            95% PI
  1  10.098   1.830  (6.486, 13.710)  (-7.822, 28.019)

New
Obs  ARTenure  AR6Year  AveSal6
  1      40.0     32.0     1.00

The 95% prediction interval for the efficiency rating of a CEO with x1 = 40%, x2 = 32%, and x3 = $1
million is (−7.822, 28.019). We are 95% confident that the actual efficiency rating of a CEO with the
above values for the independent variables is between −7.822 and 28.019.
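The two intervals on the printout differ only in the standard error used: the confidence interval for the mean uses SE Fit, while the prediction interval adds the model variance s². The following is a rough sketch of that calculation; the critical value 1.974 for 171 degrees of freedom is an assumption supplied by hand, and `intervals` is an illustrative name.

```python
import math


def intervals(fit: float, se_fit: float, s: float, t_crit: float):
    """Return (confidence interval for the mean, prediction interval)."""
    ci_half = t_crit * se_fit                          # error in locating the mean
    pi_half = t_crit * math.sqrt(s ** 2 + se_fit ** 2)  # plus variation around the mean
    return (fit - ci_half, fit + ci_half), (fit - pi_half, fit + pi_half)


# Printout values: Fit = 10.098, SE Fit = 1.830, S = 8.89248
# t_crit = 1.974 is an assumed t_.025 value for 171 df
ci, pi = intervals(10.098, 1.830, 8.89248, t_crit=1.974)
print([round(v, 2) for v in ci], [round(v, 2) for v in pi])
```

Because s² is always added under the square root, the prediction interval is necessarily wider than the confidence interval at the same settings of the predictors.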
11.32
The first order model is:

$E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_5$
We want to find a 95% prediction interval for the actual voltage when the volume fraction of the disperse
phase is at the high level (x1 = 80), the salinity is at the low level (x2 = 1), and the amount of surfactant is at
the low level (x5 = 2).
y = 0.933 - 0.0243 x1 + 0.142 x2 + 0.385 x5

Predictor       Coef     StDev      T      P
Constant      0.9326    0.2482   3.76  0.002
x1         -0.024272  0.004900  -4.95  0.000
x2           0.14206   0.07573   1.88  0.080
x5           0.38457   0.09801   3.92  0.001

S = 0.4796   R-Sq = 66.6%   R-Sq(adj) = 59.9%

Source          DF       SS      MS     F      P
Regression       3   6.8701  2.2900  9.95  0.001
Residual Error  15   3.4509  0.2301
Total           18  10.3210

Source  DF  Seq SS
x1       1  1.4016
x2       1  1.9263
x5       1  3.5422

Obs    x1      y    Fit  StDev Fit  Residual  St Resid
  3  40.0  3.200  2.068      0.239     1.132     2.72R

R denotes an observation with a large standardized residual

Predicted Values

   Fit  StDev Fit        95.0% CI           95.0% PI
-0.098      0.232  (-0.592, 0.396)  (-1.233, 1.038)

The 95% prediction interval is (−1.233, 1.038). We are 95% confident that the actual voltage is between
−1.233 and 1.038 kw/cm when the volume fraction of the disperse phase is at the high level (x1 = 80), the
salinity is at the low level (x2 = 1), and the amount of surfactant is at the low level (x5 = 2).
11.33
a.

Regression Analysis: Man-Hours versus Capacity, Pressure, Type, Drum

Man-Hours = - 3783 + 0.00875 Capacity + 1.93 Pressure + 3444 Type + 2093 Drum

Predictor       Coef    SE Coef      T      P
Constant       -3783       1205  -3.14  0.004
Capacity   0.0087490  0.0009035   9.68  0.000
Pressure      1.9265     0.6489   2.97  0.006
Type          3444.3      911.7   3.78  0.001
Drum          2093.4      305.6   6.85  0.000

S = 894.6   R-Sq = 90.3%   R-Sq(adj) = 89.0%

Source          DF         SS        MS      F      P
Regression       4  230854854  57713714  72.11  0.000
Residual Error  31   24809761    800315
Total           35  255664615

Source    DF     Seq SS
Capacity   1  175007141
Pressure   1     490357
Type       1   17813091
Drum       1   37544266

New Obs   Fit  SE Fit      95.0% CI      95.0% PI
      1  1936     239  (1449, 2424)  (48, 3825)

New Obs  Capacity  Pressure  Type      Drum
      1    150000       500  1.00  0.000000

The fitted regression line is:
$\hat{y} = -3{,}783 + 0.00875x_1 + 1.9265x_2 + 3{,}444.3x_3 + 2{,}093.4x_4$

b.
To determine if the model is useful for predicting the number of man-hours needed, we test:
$H_0: \beta_1 = \beta_2 = \beta_3 = \beta_4 = 0$
$H_a$: At least one $\beta_i \neq 0$, i = 1, 2, 3, 4
The test statistic is F = 72.11 with p-value = .000. Since the p-value is less than $\alpha = .01$, we can
reject $H_0$. There is sufficient evidence that the model is useful for predicting man-hours at $\alpha = .01$.
c.
The confidence interval is (1449, 2424).

With 95% confidence, we can conclude that the mean number of man-hours for all boilers with
characteristics x1 = 150,000, x2 = 500, x3 = 1, x4 = 0 will fall between 1449 hours and 2424 hours.
11.34
a.
$E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2$
b.
$E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_1 x_2 + \beta_5 x_1 x_3 + \beta_6 x_2 x_3$
11.35
a.
The response surface is a twisted surface in three-dimensional space.
b.
For $x_1 = 0$, $E(y) = 3 + 0 + 2x_2 - 0x_2 = 3 + 2x_2$

For $x_1 = 1$, $E(y) = 3 + 1 + 2x_2 - 1x_2 = 4 + x_2$
For $x_1 = 2$, $E(y) = 3 + 2 + 2x_2 - 2x_2 = 5$
(Plot of the three lines, E(y) versus x2 for x1 = 0, 1, and 2.)
c.
The lines are not parallel because interaction between x1 and x2 is present. Interaction between x1 and
x2 means that the effect of x2 on y depends on what level x1 takes on.
d.
For x1 = 0, as x2 increases from 0 to 5, E(y) increases from 3 to 13.

For x1 = 1, as x2 increases from 0 to 5, E(y) increases from 4 to 9.
For x1 = 2, as x2 increases from 0 to 5, E(y) = 5.
e.
For x1 = 2 and x2 = 4, E(y) = 5

For x1 = 0 and x2 = 5, E(y) = 13
Thus, E(y) changes from 5 to 13.
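The values quoted in parts b, d, and e are consistent with the model E(y) = 3 + x1 + 2x2 − x1x2, which is inferred here from the three lines in part b rather than stated in the solution. A short sketch that evaluates it:

```python
def mean_response(x1: float, x2: float) -> float:
    """E(y) = 3 + x1 + 2*x2 - x1*x2 (model inferred from the lines in part b)."""
    return 3 + x1 + 2 * x2 - x1 * x2


def slope_in_x2(x1: float) -> float:
    """Slope of E(y) against x2 at a fixed x1; it changes with x1 because of the interaction."""
    return 2 - x1


for x1 in (0, 1, 2):
    print(x1, slope_in_x2(x1), mean_response(x1, 0), mean_response(x1, 5))
```

The slope in x2 falls from 2 to 1 to 0 as x1 goes from 0 to 2, which is the non-parallel pattern described in part c.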
11.36
a.
$$R^2 = 1 - \frac{SSE}{SS_{yy}} = 1 - \frac{21}{479} = .956$$
95.6% of the total variability of the y values is explained by this model.
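The same calculation as a one-line check, written as a small sketch (the function name is illustrative):

```python
def r_squared_from_sse(sse: float, ss_yy: float) -> float:
    """R^2 = 1 - SSE / SS_yy."""
    return 1.0 - sse / ss_yy


print(round(r_squared_from_sse(21, 479), 3))
```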

b.
To test the utility of the model, we test:

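$H_0: \beta_1 = \beta_2 = \beta_3 = 0$
$H_a$: At least one $\beta_i \neq 0$, i = 1, 2, 3
The test statistic is:
$$F = \frac{R^2/k}{(1 - R^2)/[n - (k + 1)]} = \frac{.956/3}{(1 - .956)/[32 - (3 + 1)]} = 202.8$$
The rejection region requires $\alpha = .05$ in the upper tail of the F distribution, with $\nu_1 = k = 3$ and $\nu_2 = n - (k + 1) = 32 - (3 + 1) = 28$. The rejection region is F > 2.95.
Since the observed value of the test statistic falls in the rejection region (F = 202.8 > 2.95), $H_0$ is
rejected. There is sufficient evidence that the model is adequate for predicting y at $\alpha = .05$.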
c.
The relationship between y and x1 depends on the level of x2.
d.
To determine if x1 and x2 interact, we test:

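$H_0: \beta_3 = 0$
$H_a: \beta_3 \neq 0$
The test statistic is $t = \frac{\hat{\beta}_3 - 0}{s_{\hat{\beta}_3}} = \frac{10}{4} = 2.5$.
The rejection region requires $\alpha/2 = .05/2 = .025$ in each tail of the t distribution with df = n − (k + 1) = 32 − (3 + 1) = 28. The rejection region is t < −2.048 or t
> 2.048.
Since the observed value of the test statistic falls in the rejection region (t = 2.5 > 2.048), $H_0$ is
rejected. There is sufficient evidence to indicate that x1 and x2 interact at $\alpha = .05$.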
11.37
a.
The prediction equation is:
$\hat{y} = -2.55 + 3.82x_1 + 2.63x_2 - 1.29x_1x_2$

b.
The response surface is a twisted plane, since the equation contains an interaction term.
c.
For $x_2 = 1$, $\hat{y} = -2.55 + 3.82x_1 + 2.63(1) - 1.29x_1(1) = .08 + 2.53x_1$

For $x_2 = 3$, $\hat{y} = -2.55 + 3.82x_1 + 2.63(3) - 1.29x_1(3) = 5.34 - .05x_1$
For $x_2 = 5$, $\hat{y} = -2.55 + 3.82x_1 + 2.63(5) - 1.29x_1(5) = 10.6 - 2.63x_1$
d.
If x1 and x2 interact, the effect of x1 on $\hat{y}$ is different at different levels of x2. When x2 = 1, as x1
increases, $\hat{y}$ also increases. When x2 = 5, as x1 increases, $\hat{y}$ decreases.
e.
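The hypotheses are:

$H_0: \beta_3 = 0$
$H_a: \beta_3 \neq 0$
f.
The test statistic is $t = \frac{\hat{\beta}_3}{s_{\hat{\beta}_3}} = \frac{-1.285}{.159} = -8.06$.
The rejection region requires $\alpha/2 = .01/2 = .005$ in each tail of the t distribution with df = n − (k + 1)
= 15 − (3 + 1) = 11. From Table V, Appendix B, $t_{.005} = 3.106$. The rejection region is t < −3.106 or
t > 3.106.
Since the observed value of the test statistic falls in the rejection region (t = −8.06 < −3.106), $H_0$ is
rejected. There is sufficient evidence to indicate that x1 and x2 interact at $\alpha = .01$.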
11.38
a.
To determine if the overall model is useful for predicting y, we test:

$H_0: \beta_1 = \beta_2 = \beta_3 = 0$
$H_a$: At least one $\beta_i$ is not 0
The test statistic is F = 226.35 and the p-value is p < .001. Since the p-value is less than $\alpha$
(p < .001 < .05), $H_0$ is rejected. There is sufficient evidence to indicate the overall model is useful
for predicting y, willingness of the consumer to shop at a retailer's store in the future, at $\alpha = .05$.
b.
To determine if consumer satisfaction and retailer interest interact to affect willingness to shop at the
retailer's shop in the future, we test:
$H_0: \beta_3 = 0$
$H_a: \beta_3 \neq 0$
The test statistic is t = -3.09 and the p-value is p < .01. Since the p-value is less than $\alpha$ (p < .01 <
.05), $H_0$ is rejected. There is sufficient evidence to indicate consumer satisfaction and retailer interest
interact to affect willingness to shop at the retailer's shop in the future at $\alpha = .05$.
c.
When $x_2 = 1$,

$\hat{y} = \hat{\beta}_0 + .426x_1 + .044x_2 - .157x_1x_2$
$= \hat{\beta}_0 + .426x_1 + .044(1) - .157x_1(1)$
$= \hat{\beta}_0 + .044 + (.426 - .157)x_1$
$= \hat{\beta}_0 + .044 + .269x_1$
Since no value is given for $\hat{\beta}_0$, we will use $\hat{\beta}_0 = 1$ for graphing purposes. Using MINITAB, a
graph might look like:
(Scatterplot of YHAT vs X1 when X2 = 1)
d.
When $x_2 = 7$,

$\hat{y} = \hat{\beta}_0 + .426x_1 + .044x_2 - .157x_1x_2$
$= \hat{\beta}_0 + .426x_1 + .044(7) - .157x_1(7)$
$= \hat{\beta}_0 + .308 + (.426 - 1.099)x_1$
$= \hat{\beta}_0 + .308 - .673x_1$
Since no value is given for $\hat{\beta}_0$, we will again use $\hat{\beta}_0 = 1$ for graphing purposes.
Using MINITAB, a graph might look like:
(Scatterplot of YHAT vs X1 when X2 = 7)
e.
Using MINITAB, both plots on the same graph would be:
(Scatterplot of YHAT vs X1, with one line for x2 = 1 and one line for x2 = 7)
Since the lines are not parallel, it indicates that interaction is present.
11.39
a.
A regression model incorporating interaction between x1 and x2 would be:
$E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2$
b.
If the slope of the relationship between number of defects (y) and turntable speed (x1) is steeper for
lower values of cutting blade speed, then the coefficient of the interaction term must be negative: as
cutting blade speed increases, the slope gets smaller. This
implies $\beta_3 < 0$.
11.40
a.
The hypothesized regression model including the interaction between x1 and x2 would be:
$E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2$
b.
If x1 and x2 interact to affect y then the effect of x1 on y depends on the level of x2. Also, the effect
of x2 on y depends on the level of x1.
c.
Since the p-value is not small (p = .25), $H_0$ is not rejected. There is insufficient evidence to indicate
x1 and x2 interact to affect y.
d.
$\beta_1$ corresponds to x1, the number ahead in line. If the negative feeling score gets larger as the
number of people ahead increases, then $\beta_1$ is positive. $\beta_2$ corresponds to x2, the number behind in
line. If the negative feeling score gets lower as the number of people behind increases, then $\beta_2$ is
negative.
11.41
a.
Using MINITAB, the results of fitting the interaction model are:

Regression Analysis: Earnings versus Age, Hours, A_H

Earnings = 1042 - 13.2 Age + 103 Hours + 3.62 A_H

Predictor    Coef  SE Coef      T      P
Constant     1042     1304   0.80  0.441
Age        -13.24    29.23  -0.45  0.659
Hours       103.3    162.0   0.64  0.537
A_H         3.621    3.840   0.94  0.366

S = 550.289   R-Sq = 61.4%   R-Sq(adj) = 50.8%

Source          DF       SS       MS     F      P
Regression       3  5287427  1762476  5.82  0.012
Residual Error  11  3331000   302818
Total           14  8618428

Source  DF   Seq SS
Age      1   600498
Hours    1  4417734
A_H      1   269196

$\hat{y} = 1042 - 13.24x_1 + 103.3x_2 + 3.621x_1x_2$

b.
When $x_2 = 10$, the least squares line is:
$\hat{y} = 1042 - 13.24x_1 + 103.3(10) + 3.621x_1(10)$

$= 1042 + 1033 - 13.24x_1 + 36.21x_1 = 2075 + 22.97x_1$
The estimated slope relating annual earnings to age is 22.97. When hours worked is equal to 10, for
each additional year of age, the mean annual earnings is estimated to increase by 22.97.
c.
When $x_1 = 40$, the least squares line is:
$\hat{y} = 1042 - 13.24(40) + 103.3x_2 + 3.621(40)x_2$

$= 1042 - 529.6 + 103.3x_2 + 144.84x_2 = 512.4 + 248.14x_2$
The estimated slope relating annual earnings to hours worked is 248.14. When age is equal to 40, for
each additional hour worked, the mean annual earnings is estimated to increase by 248.14.
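Fixing one variable in an interaction model collapses it to a straight line in the other variable, with both the intercept and the slope depending on the fixed value. A minimal sketch using the coefficients from the printout follows; the function names are illustrative.

```python
# Coefficients from the printout: Earnings = b0 + b1*Age + b2*Hours + b3*Age*Hours
B0, B1, B2, B3 = 1042.0, -13.24, 103.3, 3.621


def line_in_age(hours: float):
    """(intercept, slope) of the Earnings-versus-Age line at a fixed number of hours."""
    return B0 + B2 * hours, B1 + B3 * hours


def line_in_hours(age: float):
    """(intercept, slope) of the Earnings-versus-Hours line at a fixed age."""
    return B0 + B1 * age, B2 + B3 * age


print([round(v, 2) for v in line_in_age(10)])
print([round(v, 2) for v in line_in_hours(40)])
```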
d.
To determine if age and hours worked interact, we test:

$H_0: \beta_3 = 0$
$H_a: \beta_3 \neq 0$
e.
From the printout, the test statistic for the test for interaction is t = 0.94 and the
p-value is p = .366.
f.
Since the p-value is so large (p = .366), $H_0$ is not rejected. There is insufficient evidence to indicate
age and hours worked interact to affect annual earnings.
11.42
a.
If client credibility and linguistic delivery style interact, then the effect of client credibility on the
likelihood value depends on the level of linguistic delivery style.
b.
To determine the overall model adequacy, we test:

$H_0: \beta_1 = \beta_2 = \beta_3 = 0$
c.
The test statistic is F = 55.35 and the p-value is p < 0.0005.

Since the p-value is so small (p < 0.0005), $H_0$ is rejected for any reasonable value of $\alpha$. There is
sufficient evidence to indicate that the model is adequate for $\alpha > 0.0005$.
d.
To determine if client credibility and linguistic delivery style interact, we test:

$H_0: \beta_3 = 0$
$H_a: \beta_3 \neq 0$
e.
The test statistic is t = 4.008 and the p-value is p < 0.005.

Since the p-value is so small (p < 0.005), $H_0$ is rejected. There is sufficient evidence to indicate that
client credibility and linguistic delivery style interact for $\alpha > 0.005$.
f.
$\hat{y} = 15.865 + 0.037(22) - 0.678x_2 + 0.036x_2(22) = 16.679 + 0.114x_2$

The estimated slope of the Likelihood-Linguistic delivery style line when client credibility is 22 is
0.114. When client credibility is equal to 22, for each additional point increase in linguistic delivery
style, the mean likelihood is estimated to increase by 0.114.
g.
$\hat{y} = 15.865 + 0.037(46) - 0.678x_2 + 0.036x_2(46) = 17.567 + 0.978x_2$

The estimated slope of the Likelihood-Linguistic delivery style line when client credibility is 46 is
0.978. When client credibility is equal to 46, for each additional point increase in linguistic delivery
style, the mean likelihood is estimated to increase by 0.978.
11.43
a.
Let x1 = latitude, x2 = longitude, and x3 = depth. The model is

$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_1 x_3 + \beta_5 x_2 x_3 + \varepsilon$
b.

Regression Analysis: ARSENIC versus LATITUDE, LONGITUDE, ...

ARSENIC = 10845 - 1280 LATITUDE + 217 LONGITUDE - 1549 DEPTH-FT - 11.0 Lat_d
          + 20.0 Long_d

327 cases used, 1 cases contain missing values

Predictor     Coef  SE Coef      T      P
Constant     10845    67720   0.16  0.873
LATITUDE     -1280     1053  -1.22  0.225
LONGITUDE    217.4    814.5   0.27  0.790
DEPTH-FT   -1549.2    985.6  -1.57  0.117
Lat_D       -11.00    11.86  -0.93  0.355
Long_D       19.98    11.20   1.78  0.076

S = 103.072   R-Sq = 13.7%   R-Sq(adj) = 12.4%

Source           DF       SS      MS      F      P
Regression        5   542303  108461  10.21  0.000
Residual Error  321  3410258   10624
Total           326  3952562

Source     DF  Seq SS
LATITUDE    1  132448
LONGITUDE   1  320144
DEPTH-FT    1   53179
Lat_D       1    2756
Long_D      1   33777

The least squares model is:
$\hat{y} = 10{,}845 - 1{,}280\,\text{latitude} + 217.4\,\text{longitude} - 1{,}549.2\,\text{depth} - 11.00\,\text{lat\_d} + 19.98\,\text{long\_d}$
c.
To determine if latitude and depth interact to affect arsenic level, we test:

$H_0: \beta_4 = 0$
$H_a: \beta_4 \neq 0$
From the printout, the test statistic is t = -.93 and the p-value is p = .355.
Since the p-value is not less than $\alpha$ ($p = .355 \not< .05$), $H_0$ is not rejected. There is insufficient evidence
to indicate latitude and depth interact to affect arsenic level at $\alpha = .05$.
d.
To determine if longitude and depth interact to affect arsenic level, we test:

$H_0: \beta_5 = 0$
$H_a: \beta_5 \neq 0$
From the printout, the test statistic is t = 1.78 and the p-value is p = .076.
Since the p-value is not less than $\alpha$ ($p = .076 \not< .05$), $H_0$ is not rejected. There is insufficient evidence
to indicate longitude and depth interact to affect arsenic level at $\alpha = .05$.
e.
Because the interactions are not significant, the effect of latitude on the arsenic level
does not depend on the depth, and the effect of longitude on the arsenic level does not depend on the
depth.
11.44
a.
The model that incorporates the researchers' theories is:
$E(y) = \beta_0 + \beta_1 x_2 + \beta_2 x_3 + \beta_3 x_5 + \beta_4 x_2 x_5 + \beta_5 x_3 x_5$
b.
Using MINITAB, the results of fitting the model are:

Regression Analysis: HEATRATE versus INLET-TEMP, EXH-TEMP, ...

HEATRATE = 13945 - 15.1 INLET-TEMP + 28.8 EXH-TEMP - 0.69 AIRFLOW
           + 0.0228 IT_AFR - 0.0543 ET_AFR

Predictor        Coef   SE Coef       T      P
Constant        13945      1044   13.35  0.000
INLET-TEMP   -15.1379    0.7775  -19.47  0.000
EXH-TEMP       28.843     2.304   12.52  0.000
AIRFLOW        -0.689     3.628   -0.19  0.850
IT_AFR       0.022770  0.002999    7.59  0.000
ET_AFR       -0.05430   0.01053   -5.16  0.000

S = 425.072   R-Sq = 93.4%   R-Sq(adj) = 92.9%

Source          DF         SS        MS       F      P
Regression       5  156875371  31375074  173.64  0.000
Residual Error  61   11021838    180686
Total           66  167897208

$\hat{y} = 13{,}945 - 15.1379x_2 + 28.843x_3 - 0.689x_5 + 0.02277x_2x_5 - 0.0543x_3x_5$

c.
To determine if inlet temperature and air flow rate interact to affect heat rate, we test:
$H_0: \beta_4 = 0$
$H_a: \beta_4 \neq 0$
The test statistic is t = 7.59 with a p-value of p = 0.000. Since the p-value is less than
$\alpha = .05$, $H_0$ is rejected. There is sufficient evidence to indicate that inlet temperature and air flow rate
interact to affect heat rate at $\alpha = .05$.
d.
To determine if exhaust temperature and air flow rate interact to affect heat rate, we test:
$H_0: \beta_5 = 0$
$H_a: \beta_5 \neq 0$
The test statistic is t = −5.16 with a p-value of p = 0.000. Since the p-value is less than
$\alpha = .05$, $H_0$ is rejected. There is sufficient evidence to indicate that exhaust temperature and air flow
rate interact to affect heat rate at $\alpha = .05$.
e.
Since the interaction of inlet temperature and air flow rate is significant, the effect of
inlet temperature on the heat rate depends on the level of air flow rate. Also, since the interaction of
exhaust temperature and air flow rate is significant, the effect of exhaust temperature on
the heat rate also depends on the level of air flow rate.
11.45
a.
By including the interaction terms, it implies that the relationship between voltage and volume
fraction of the disperse phase depends on the levels of salinity and surfactant concentration.
A possible sketch of the relationship is:
b.

Regression Analysis: Voltage versus x1, x2, x5, x1x2, x1x5

Voltage = 0.906 - 0.0228 x1 + 0.305 x2 + 0.275 x5 - 0.00280 x1x2
          + 0.00158 x1x5

Predictor       Coef   SE Coef      T      P
Constant      0.9057    0.2855   3.17  0.007
x1         -0.022753  0.008318  -2.74  0.017
x2            0.3047    0.2366   1.29  0.220
x5            0.2747    0.2270   1.21  0.248
x1x2       -0.002804  0.003790  -0.74  0.473
x1x5        0.001579  0.003947   0.40  0.696

S = 0.5047   R-Sq = 67.9%   R-Sq(adj) = 55.6%

Source          DF       SS      MS     F      P
Regression       5   7.0103  1.4021  5.51  0.006
Residual Error  13   3.3107  0.2547
Total           18  10.3210

Source  DF  Seq SS
x1       1  1.4016
x2       1  1.9263
x5       1  3.5422
x1x2     1  0.0994
x1x5     1  0.0408

The fitted regression line is:
$\hat{y} = .906 - .023x_1 + .305x_2 + .275x_5 - .003x_1x_2 + .002x_1x_5$

$H_0: \beta_1 = \beta_2 = \beta_3 = \beta_4 = \beta_5 = 0$
$H_a$: At least one $\beta_i \neq 0$, for i = 1, 2, ..., 5
The test statistic is F = 5.51.

Since no $\alpha$ was given, $\alpha = .05$ will be used. The rejection region requires $\alpha = .05$ in the upper tail of
the F-distribution with $\nu_1 = k = 5$ and $\nu_2 = n - (k + 1) = 19 - (5 + 1) = 13$. From Table VII,
Appendix B, $F_{.05} = 3.03$. The rejection region is F > 3.03.
Since the observed value of the test statistic falls in the rejection region (F = 5.51 > 3.03), $H_0$ is
rejected. There is sufficient evidence to indicate the model is useful for predicting voltage at $\alpha = .05$.
R2 = .679. Thus, 67.9% of the sample variation of voltage is explained by the model containing the
three independent variables and two interaction terms.
The estimate of the standard deviation is s = .5047.
Comparing this model to that fit in Exercise 11.20, the model in Exercise 11.20 appears to fit the data
better. The model in Exercise 11.20 has a higher R2 (.771 vs .679) and a smaller estimate of the
standard deviation (.4365 vs .5047).
c.
$\hat{\beta}_0 = .906$. This is simply the estimate of the y-intercept.
$\hat{\beta}_1 = -.023$. For each unit increase in disperse phase volume, we estimate that the mean voltage
will decrease by .023 units, holding salinity and surfactant concentration at 0.
$\hat{\beta}_2 = .305$. For each unit increase in salinity, we estimate that the mean voltage will increase
by .305 units, holding disperse phase volume and surfactant concentration at 0.
$\hat{\beta}_3 = .275$. For each unit increase in surfactant concentration, we estimate that the mean
voltage will increase by .275 units, holding disperse phase volume and salinity at 0.
$\hat{\beta}_4 = -.003$. This estimates the difference in the slope of the relationship between voltage and
disperse phase volume for each unit increase in salinity, holding surfactant
concentration constant.
$\hat{\beta}_5 = .002$. This estimates the difference in the slope of the relationship between voltage and
disperse phase volume for each unit increase in surfactant concentration, holding
salinity constant.
11.46
a.
$E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_4 + \beta_5 x_5$
b.
$H_0: \beta_4 = 0$
c.
t = 4.408, p-value = .001
Since the p-value is so small, there is strong evidence to reject $H_0$. There is sufficient evidence to
indicate that the strength of client-therapist relationship contributes information for the prediction of
a client's reaction for any $\alpha > .001$.
d.
$R^2 = .2946$. 29.46% of the variability in the client's reaction scores can be explained by this model.
e.
Answers may vary.
11.47
a.
$E(y) = \beta_0 + \beta_1 x + \beta_2 x^2$
b.
$E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2 + \beta_4 x_1^2 + \beta_5 x_2^2$
c.
$E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_1 x_2 + \beta_5 x_1 x_3 + \beta_6 x_2 x_3 + \beta_7 x_1^2 + \beta_8 x_2^2 + \beta_9 x_3^2$
11.48
a.
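$H_0: \beta_2 = 0$
$H_a: \beta_2 \neq 0$
The test statistic is $t = \frac{\hat{\beta}_2 - 0}{s_{\hat{\beta}_2}} = \frac{.47 - 0}{.15} = 3.133$.
The rejection region requires $\alpha/2 = .05/2 = .025$ in each tail of the t distribution with df = 22. The rejection region is t < −2.074 or t
> 2.074.
Since the observed value of the test statistic falls in the rejection region (t = 3.133 > 2.074), $H_0$ is
rejected. There is sufficient evidence to indicate the quadratic term should be included in the model
at $\alpha = .05$.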
b.
$H_0: \beta_2 = 0$
$H_a: \beta_2 > 0$
The test statistic is the same as in part a, t = 3.133.
The rejection region requires $\alpha = .05$ in the upper tail of the t distribution with df = 22. From Table
V, Appendix B, $t_{.05} = 1.717$. The rejection region is t > 1.717.
Since the observed value of the test statistic falls in the rejection region (t = 3.133 > 1.717), $H_0$ is
rejected. There is sufficient evidence to indicate the quadratic curve opens upward at $\alpha = .05$.
11.49
a.
To determine if the model contributes information for predicting y, we test:

$H_0: \beta_1 = \beta_2 = 0$
$H_a$: At least one $\beta_i \neq 0$, i = 1, 2
The test statistic is:
$$F = \frac{R^2/k}{(1 - R^2)/[n - (k + 1)]} = \frac{.91/2}{(1 - .91)/[20 - (2 + 1)]} = 85.94$$
The rejection region requires $\alpha = .05$ in the upper tail of the F distribution, with $\nu_1 = k = 2$, and
$\nu_2 = n - (k + 1) = 20 - (2 + 1) = 17$. From Table VIII, Appendix B, $F_{.05} = 3.59$. The rejection region
is F > 3.59.
Since the observed value of the test statistic falls in the rejection region (F = 85.94 > 3.59), $H_0$ is
rejected. There is sufficient evidence that the model contributes information for predicting y at $\alpha =
.05$.
b.
To determine if upward curvature exists, we test:

$H_0: \beta_2 = 0$
$H_a: \beta_2 > 0$
c.
To determine if downward curvature exists, we test:

$H_0: \beta_2 = 0$
$H_a: \beta_2 < 0$
11.50
a.
b.
It moves the graph to the right ($-2x$) or to the left ($+2x$) compared to the graph of
$y = 1 + x^2$.
c.
It controls whether the graph opens up ($+x^2$) or down ($-x^2$). It also controls how steep the curvature
is, i.e., the larger the absolute value of the coefficient of $x^2$, the narrower the curve is.
11.51
a.
To determine if at least one of the parameters is nonzero, we test:

$H_0: \beta_1 = \beta_2 = \beta_3 = \beta_4 = \beta_5 = 0$
$H_a$: At least one $\beta_i \neq 0$, i = 1, 2, 3, 4, 5
The test statistic is F = 25.93, with p-value = 0.000. Since the p-value is less than $\alpha = .05$, $H_0$ is
rejected. There is sufficient evidence to indicate that at least one of the parameters $\beta_1$, $\beta_2$, $\beta_3$, $\beta_4$, and
$\beta_5$ is nonzero at $\alpha = .05$.
b.
$H_0: \beta_4 = 0$
$H_a: \beta_4 \neq 0$
The test statistic is t = 10.74 with p-value = 0.000. Since the p-value is less than
$\alpha = .01$, $H_0$ is rejected. There is sufficient evidence to indicate that $\beta_4 \neq 0$ at $\alpha = .01$.
c.
$H_0: \beta_5 = 0$
$H_a: \beta_5 \neq 0$
The test statistic is t = .60 with p-value = .550. Since the p-value is greater than $\alpha = .01$, $H_0$ is not
rejected. There is insufficient evidence to indicate that $\beta_5 \neq 0$ at $\alpha = .01$.
d.
Graphs may vary.
11.52
a.
$\hat{\beta}_0$ has no meaning because x = 0 would not be in the observed range of values.
b.
$\hat{\beta}_1 = 321.67$. Since the quadratic effect is included in the model, the linear term is
just a location parameter and has no meaning.
c.
$\hat{\beta}_2 = .0794$. Since the value of $\hat{\beta}_2$ is positive, the curvature is upward.
d.
Since no data could have been collected from 2009 to 2021, we have no idea if the relationship
between the two variables will remain the same until 2021.
11.53
a.
b.
If information were available only for x = 30, 31, 32, and 33, we would suggest a first-order model
where $\beta_1 > 0$. If information were available only for x = 33, 34, 35, and 36, we would again suggest a
first-order model, this time where $\beta_1 < 0$. If all the information were available, we would suggest a second-order model.
11.54
a.
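H0: β1 = β2 = 0
Ha: At least one βi ≠ 0
The test statistic is F = (R^2 / k) / {(1 − R^2) / [n − (k + 1)]} = (.12 / 2) / {(1 − .12) / [388 − (2 + 1)]} = 26.25
The rejection region requires α = .05 in the upper tail of the F distribution with ν1 = k = 2 and ν2 = n −
(k + 1) = 388 − (2 + 1) = 385. From Table VIII, Appendix B, F.05 ≈ 3.00. The rejection region is
F > 3.00.
Since the observed value of the test statistic falls in the rejection region (F = 26.25 > 3.00), H0 is
rejected. There is sufficient evidence to indicate the model is adequate at α = .05.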
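The global F statistic above depends only on R^2, n, and k, so the arithmetic can be checked with a short Python sketch. The function name global_f_statistic is an illustrative assumption and is not part of the original solution.

```python
def global_f_statistic(r_squared: float, n: int, k: int) -> float:
    """Global F statistic computed from R^2, sample size n, and k predictors."""
    numerator = r_squared / k                          # R^2 / k
    denominator = (1.0 - r_squared) / (n - (k + 1))    # (1 - R^2) / [n - (k + 1)]
    return numerator / denominator


# Values taken from the solution: R^2 = .12, n = 388, k = 2
f_value = global_f_statistic(r_squared=0.12, n=388, k=2)
print(round(f_value, 2))
```

The same function could be applied to any exercise that reports only R^2, n, and k.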
b.
To determine if leadership ability increases at a decreasing rate with assertiveness, we test:

H0: β2 = 0
Ha: β2 < 0
c.
11.55
From the table, the test statistic is t = -3.97 and the p-value is p < .01/2 = .005. Since the p-value is
less than α (p < .005 < .05), H0 is rejected. There is sufficient evidence to indicate leadership ability
increases at a decreasing rate with assertiveness at α = .05.
a.
The complete 2nd order model is:

E(y) = β0 + β1x1 + β2x2 + β3x1x2 + β4x1^2 + β5x2^2
b.
c.
d.
R^2 = .14. 14% of the total variation in the efficiency scores is explained by the complete 2nd order
model containing level of CEO leadership and level of congruence between the CEO and the VP.
If the β-coefficient for the x2^2 term is negative, then as the value of the level of congruence increases,
the efficiency will increase at a decreasing rate to some point and then the efficiency will decrease at
an increasing rate, holding level of CEO leadership constant.
that the level of CEO leadership and the level of congruence between the CEO and the VP interact to
affect efficiency. This means that the effect of CEO leadership on efficiency depends on the level of
congruence between the CEO and the VP.
11.57
a.
E(y) = β0 + β1x1 + β2x2 + β3x1x2 + β4x1^2 + β5x2^2
b.
11.56
β4x1^2 and β5x2^2
a.
A first order model is:

E(y) = β0 + β1x
b.
A second order model is:

E(y) = β0 + β1x + β2x^2
c.
Using MINITAB, a scattergram of these data is:

[Figure: Scatterplot of International vs Domestic; International axis 0 to 1200, Domestic axis 100 to 600]
From the plot, it appears that the first order model might fit the data better. There does not appear to
be much of a curve to the relationship.
d.

Regression Analysis: International versus Domestic, Dsq

International = 183 - 0.24 Domestic + 0.00262 Dsq

Predictor      Coef   SE Coef      T      P
Constant      182.9     301.0   0.61  0.554
Domestic     -0.243     1.849  -0.13  0.897
Dsq        0.002625  0.002523   1.04  0.317

S = 175.370   R-Sq = 65.4%   R-Sq(adj) = 60.1%

Source          DF       SS      MS      F      P
Regression       2   755320  377660  12.28  0.001
Residual Error  13   399811   30755
Total           15  1155131

Source    DF  Seq SS
Domestic   1  722025
Dsq        1   33295

To investigate the usefulness of the model, we test:

H0: β1 = β2 = 0
The p-value is p = 0.001. Since the p-value is so small, we reject H0. There is sufficient evidence to
indicate the model is useful for predicting foreign gross revenue.
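The summary figures on the printout (R-Sq, R-Sq(adj), and F) all follow from the sums of squares in the ANOVA table. The Python sketch below recomputes them as a consistency check; the function name anova_summary and the returned dictionary keys are illustrative assumptions.

```python
def anova_summary(ss_regression: float, ss_error: float, n: int, k: int) -> dict:
    """Recompute R^2, adjusted R^2, and the global F from ANOVA sums of squares."""
    ss_total = ss_regression + ss_error
    ms_regression = ss_regression / k            # MSR = SSR / k
    ms_error = ss_error / (n - (k + 1))          # MSE = SSE / [n - (k + 1)]
    return {
        "r_squared": ss_regression / ss_total,
        "adj_r_squared": 1.0 - ms_error / (ss_total / (n - 1)),
        "f": ms_regression / ms_error,
    }


# Sums of squares from the printout; n = 16 because the Total row has 15 df
summary = anova_summary(ss_regression=755320, ss_error=399811, n=16, k=2)
print({name: round(value, 4) for name, value in summary.items()})
```

Rounded to the precision of the printout, these agree with R-Sq = 65.4%, R-Sq(adj) = 60.1%, and F = 12.28.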
To determine if a curvilinear relationship exists between foreign and domestic gross revenues, we
test:
H0: β2 = 0
Ha: β2 ≠ 0
The p-value is p = .317. Since the p-value is greater than α = .05
(p = 0.317 > α = .05), H0 is not rejected. There is insufficient evidence to indicate that a curvilinear
relationship exists between foreign and domestic gross revenues at α = .05.
e.
From the analysis in part d, the first-order model better explains the variation in foreign gross
revenues. In part d, we concluded that the second-order term did not improve the model.
11.58
a.
Using MINITAB, a sketch of the least squares prediction equation is:

[Figure: plot of yhat vs Dose; yhat axis 0 to 12, Dose axis 0 to 800]
b.
For x = 500, ŷ = 10.25 + .0053(500) − .0000266(500^2) = 10.25 + 2.65 − 6.65 = 6.25
c.
For x = 0, ŷ = 10.25 + .0053(0) − .0000266(0^2) = 10.25
d.
For x = 100, ŷ = 10.25 + .0053(100) − .0000266(100^2) = 10.25 + .53 − .266 = 10.514

This value is slightly larger than that for the control group (10.25).
For x = 200, ŷ = 10.25 + .0053(200) − .0000266(200^2) = 10.25 + 1.06 − 1.064 = 10.246

This value is slightly smaller than that for the control group (10.25). So, the largest value of x which
yields an estimated weight change that is closest to, but just less than the estimated weight change for
the control group is x = 200.
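The evaluations in parts b through d can be reproduced with a few lines of Python. This is a minimal sketch using the coefficients quoted in the solution; the function name predicted_weight_change is an illustrative assumption.

```python
def predicted_weight_change(dose: float) -> float:
    """Evaluate y-hat = 10.25 + .0053x - .0000266x^2 at dose x."""
    return 10.25 + 0.0053 * dose - 0.0000266 * dose ** 2


# Doses used in parts b, c, and d of the solution
for dose in (0, 100, 200, 500):
    print(dose, round(predicted_weight_change(dose), 3))
```

Comparing each value with the control-group estimate (x = 0) shows where the fitted curve first drops below 10.25 among the doses checked.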
11.59
a.
Using MINITAB, the scattergram of the data is:

[Figure: Scatterplot of Time vs Temp; Time axis 0 to 10000, Temp axis 120 to 170]
The relationship appears to be curvilinear. As temperature increases, the value of time tends to
decrease but at a decreasing rate.
b.
Using MINITAB the results are:

Regression Analysis: Time versus Temp, Tempsq

Time = 154243 - 1909 Temp + 5.93 Tempsq

Predictor     Coef  SE Coef      T      P
Constant    154243    21868   7.05  0.000
Temp       -1908.9    303.7  -6.29  0.000
Tempsq       5.929    1.048   5.66  0.000

S = 688.137   R-Sq = 94.2%   R-Sq(adj) = 93.5%

Source          DF         SS        MS       F      P
Regression       2  144830280  72415140  152.93  0.000
Residual Error  19    8997107    473532
Total           21  153827386

Source  DF     Seq SS
Temp     1  129663987
Tempsq   1   15166293

The fitted regression line is: ŷ = 154,243 − 1,908.9temp + 5.929temp^2
c.
To determine if there is an upward curvature in the relationship between failure time and solder
temperature, we test:
H0: β2 = 0
Ha: β2 > 0
From the printout, the test statistic is t = 5.66 and the p-value is p = 0.000. Since the p-value is less
than α (p = 0.000 < .05), H0 is rejected. There is sufficient evidence to indicate an upward curvature
in the relationship between failure time and solder temperature at α = .05.
11.60
a.
Regression Analysis: RATE versus EST, Esq

RATE = - 288 + 1.39 EST + 0.000035 Esq

Predictor        Coef     SE Coef      T      P
Constant         -288        8049  -0.04  0.972
EST             1.395       3.651   0.38  0.706
Esq        0.00003509  0.00009724   0.36  0.722

S = 31901.1   R-Sq = 45.9%   R-Sq(adj) = 40.8%

Source          DF           SS          MS     F      P
Regression       2  18138955261  9069477631  8.91  0.002
Residual Error  21  21371254395  1017678781
Total           23  39510209656

Source  DF       Seq SS
EST      1  18006405335
Esq      1    132549926

To determine if the incidence rate is curvilinearly related to the estimated rate, we test:
H0: β2 = 0
Ha: β2 ≠ 0
From the printout, the test statistic is t = .36 and the p-value is p = .722. Since the p-value is not less
than α (p = .722 > .05), H0 is not rejected. There is insufficient evidence to indicate that the incidence
rate is curvilinearly related to the estimated rate at α = .05.
b.
Using MINITAB, the scatterplot of the data is:

[Figure: Scatterplot of RATE vs EST; RATE axis 0 to 200000, EST axis 0 to 40000]
The point for Botulism is in the lower right-hand corner of the graph. The estimated value is much
larger than the actual value.
c.

Regression Analysis: RATE2 versus EST2, Esq2

RATE2 = 735 - 0.081 EST2 + 0.000151 Esq2

Predictor        Coef     SE Coef      T      P
Constant        735.0       695.9   1.06  0.303
EST2          -0.0810      0.3167  -0.26  0.801
Esq2       0.00015052  0.00000868  17.34  0.000

S = 2756.80   R-Sq = 99.6%   R-Sq(adj) = 99.6%

Source          DF           SS           MS        F      P
Regression       2  39251490541  19625745270  2582.35  0.000
Residual Error  20    151998825      7599941
Total           22  39403489366

Source  DF       Seq SS
EST2     1  36967483627
Esq2     1   2284006914

To determine if the incidence rate is curvilinearly related to the estimated rate after eliminating the
point for Botulism, we test:
H0: β2 = 0
Ha: β2 ≠ 0
From the printout, the test statistic is t = 17.34 and the p-value is p = .000. Since the p-value is less
than α (p = .000 < .05), H0 is rejected. There is sufficient evidence to indicate that the incidence rate is
curvilinearly related to the estimated rate after omitting the Botulism point at α = .05.
Yes, the fit has improved. With all of the points, R^2 = 45.9%. When the Botulism point
is omitted, R^2 = 99.6%. Almost all of the variation in the incidence rates is explained by the
curvilinear relationship between incidence rate and estimated value.
11.61
The model would be E(y) = β0 + β1x + β2x^2. Since the value
of y is expected to increase and then decrease as x gets larger,
β2 will be negative. A sketch of the model would be:
11.62
a.
A scatterplot of the data is:
[Figure: character scatterplot of the data; vertical axis marked at 3500, 7000, and 10500, horizontal axis X from 0.0 to 40.0]
b.
From the plot, it looks like a second-order model would fit the data better than a first-order model.
There is little evidence that a third-order model would fit the data better than a second-order model.
c.
Using MINITAB, the output for fitting a first-order model is:

Y = 2752 + 122 X

Predictor    Coef  Stdev  t-ratio      p
Constant   2752.4  613.5     4.49  0.000
X          122.34  26.08     4.69  0.000

s = 1904   R-sq = 36.7%   R-sq(adj) = 35.0%

SOURCE      DF         SS        MS      F      p
Regression   1   79775688  79775688  22.01  0.000
Error       38  137726224   3624374
Total       39  217501920

Obs.     X      Y   Fit  Stdev.Fit  Residual  St.Resid
  27  27.0   2007  6056        345     -4049    -2.16R
  40  40.0  11520  7646        591      3874     2.14R

R denotes an obs. with a large st. resid.
To see if there is a significant linear relationship between day and demand, we test:
H0: β1 = 0
Ha: β1 ≠ 0
The p-value for the test is p = 0.000. Since the p-value is less than α = .05, H0 is rejected. There is
sufficient evidence to indicate that there is a linear relationship between day and demand at α = .05.
d.
Using MINITAB, the output for fitting a second-order model is:

Y = 5120 - 216 X + 8.25 XSQ

Predictor     Coef  Stdev  t-ratio      p
Constant    5120.2  816.9     6.27  0.000
X          -215.92  91.89    -2.35  0.024
XSQ          8.250  2.173     3.80  0.001

s = 1637   R-sq = 54.4%   R-sq(adj) = 52.0%

SOURCE      DF         SS        MS      F      p
Regression   2  118377056  59188528  22.09  0.000
Error       37   99124856   2679050
Total       39  217501920

SOURCE  DF    SEQ SS
X        1  79775688
XSQ      1  38601372

Obs.     X     Y   Fit  Stdev.Fit  Residual  St.Resid
  27  27.0  2007  5305        357     -3298    -2.06R

To see if there is a significant quadratic relationship between day and demand, we test:
H0: β2 = 0
Ha: β2 ≠ 0
The p-value for the test is p = 0.001. Since the p-value is less than α = .05, H0 is rejected. There is
sufficient evidence to indicate that there is a quadratic relationship between day and demand at
α = .05.
e.
Since the quadratic term is significant in the second-order model in part d, the second-order model is
better.
11.63
Let x = 1 if qualitative variable assumes 2nd level, 0 otherwise

The model is E(y) = β0 + β1x
β0 = mean value of y when the qualitative variable assumes the first level
β1 = difference in the mean values of y between levels 2 and 1 of the qualitative variable
11.64
The model is E(y) = β0 + β1x1 + β2x2

where
x1 = 1 if the variable is at level 2, 0 otherwise
x2 = 1 if the variable is at level 3, 0 otherwise

β0 = mean value of y when qualitative variable is at level 1.
β1 = difference in mean value of y between level 2 and level 1 of qualitative variable.
β2 = difference in mean value of y between level 3 and level 1 of qualitative variable.
11.65
a.
Level 1 implies x1 = x2 = x3 = 0. ŷ = 10.2
Level 2 implies x1 = 1 and x2 = x3 = 0. ŷ = 10.2 - 4(1) = 6.2
Level 3 implies x2 = 1 and x1 = x3 = 0. ŷ = 10.2 + 12(1) = 22.2
Level 4 implies x3 = 1 and x1 = x2 = 0. ŷ = 10.2 + 2(1) = 12.2
b.
The hypotheses are:

H0: β1 = β2 = β3 = 0
11.66
a.
ŷ = 80 + 16.8x1 + 40.4x2
b.
β̂1 estimates the difference in the mean value of the dependent variable between level 2 and level 1
of the independent variable.
β̂2 estimates the difference in the mean value of the dependent variable between level 3 and level 1
of the independent variable.
c.
The hypothesis H0: β1 = β2 = 0 is the same as H0: μ1 = μ2 = μ3.
The hypothesis Ha: At least one of the parameters β1 and β2 differs from 0 is the same as Ha: At
least one mean (μ1, μ2, or μ3) is different.
d.
The test statistic is F = MSR / MSE = 2059.5 / 83.3 = 24.72
Since no α was given, we will use α = .05. The rejection region requires α = .05 in the upper tail of
the F distribution with numerator df = k = 2 and denominator df = n − (k + 1) = 15 − (2 + 1) = 12.
From Table VIII, Appendix B, F.05 = 3.89. The rejection region is F > 3.89.
Since the observed value of the test statistic falls in the rejection region (F = 24.72 > 3.89), H0 is
rejected. There is sufficient evidence to indicate at least one of the means is different at α = .05.
11.67
a.
Let x1 = 1 if grape-picking method is manual, 0 otherwise
Let x2 = 1 if soil type is clay, 0 otherwise
Let x3 = 1 if soil type is gravel, 0 otherwise
Let x4 = 1 if slope orientation is East, 0 otherwise
Let x5 = 1 if slope orientation is South, 0 otherwise
Let x6 = 1 if slope orientation is West, 0 otherwise
Let x7 = 1 if slope orientation is Southeast, 0 otherwise
b.
The model is: E(y) = β0 + β1x1

β0 = mean wine quality for grape-picking method automated
β1 = difference in mean wine quality between grape-picking methods manual and automated
c.
The model is: E(y) = β0 + β1x2 + β2x3

β0 = mean wine quality for soil type sand
β1 = difference in mean wine quality between soil types clay and sand
β2 = difference in mean wine quality between soil types gravel and sand
d.
The model is: E(y) = β0 + β1x4 + β2x5 + β3x6 + β4x7

β0 = mean wine quality for slope orientation Southwest
β1 = difference in mean wine quality between slope orientations East and Southwest
β2 = difference in mean wine quality between slope orientations South and Southwest
β3 = difference in mean wine quality between slope orientations West and Southwest
β4 = difference in mean wine quality between slope orientations Southeast and Southwest
11.68
a.
Let x1 = 1 if race is black, 0 otherwise
Let x2 = 1 if availability is high, 0 otherwise
Let x3 = 1 if position is quarterback, 0 otherwise
Let x4 = 1 if position is running back, 0 otherwise
Let x5 = 1 if position is wide receiver, 0 otherwise
Let x6 = 1 if position is tight end, 0 otherwise
Let x7 = 1 if position is defensive lineman, 0 otherwise
Let x8 = 1 if position is linebacker, 0 otherwise
Let x9 = 1 if position is defensive back, 0 otherwise
b.
The model is: E(y) = β0 + β1x1
β0 = mean price for race white
β1 = difference in mean price between races black and white
c.
The model is: E(y) = β0 + β2x2
β0 = mean price for card availability low
β2 = difference in mean price between card availabilities high and low
d.
The model is: E(y) = β0 + β3x3 + β4x4 + β5x5 + β6x6 + β7x7 + β8x8 + β9x9
β0 = mean price for position offensive lineman
β3 = difference in mean price between player positions quarterback and offensive lineman
β4 = difference in mean price between player positions running back and offensive lineman
β5 = difference in mean price between player positions wide receiver and offensive lineman
β6 = difference in mean price between player positions tight end and offensive lineman
β7 = difference in mean price between player positions defensive lineman and offensive lineman
β8 = difference in mean price between player positions linebacker and offensive lineman
β9 = difference in mean price between player positions defensive back and offensive lineman
11.69
a.
Let x = 1 if Developer, 0 otherwise
Then the model would be: E(y) = β0 + β1x
β0 = mean accuracy for the Project Leader
β1 = difference in mean accuracy between the Developer and the Project Leader
b.
Let x1 = 1 if Low, 0 otherwise
Let x2 = 1 if Medium, 0 otherwise
Then the model would be: E(y) = β0 + β1x1 + β2x2
β0 = mean accuracy for the High task complexity
β1 = difference in mean accuracy between Low and High task complexity
β2 = difference in mean accuracy between Medium and High task complexity
c.
Let x = 1 if Fixed price, 0 otherwise
Then the model would be: E(y) = β0 + β1x
β0 = mean accuracy for the Hourly rate
β1 = difference in mean accuracy between the Fixed price and the Hourly rate
d.
Let x1 = 1 if Time-of-delivery, 0 otherwise
Let x2 = 1 if Cost, 0 otherwise
Then the model would be: E(y) = β0 + β1x1 + β2x2
β0 = mean accuracy for the Quality
β1 = difference in mean accuracy between Time-of-delivery and Quality
β2 = difference in mean accuracy between Cost and Quality
11.70
a.
The model would be: E(y) = β0 + β1x
b.
β0 = mean relative optimism for analysts who worked for sell-side firms
β1 = difference in mean relative optimism for analysts who worked for buy-side and sell-side firms
c.
Yes.
d.
Yes. If the buy-side analysts are less optimistic, then their estimates will be smaller than the sell-side
estimates. Thus, the estimate of β1 will be negative.
11.71
a.
R^2(adj) = .76. 76% of the total sample variation of SAT-Math scores is explained by the regression
model including score on PSAT and whether the student was coached or not, adjusting for the sample
size and the number of independent variables in the model.
b.
For confidence level .95, α = .05 and α/2 = .05/2 = .025. From Table V, Appendix B,
with df = n − (k + 1) = 3,492 − (2 + 1) = 3,489, t.025 = 1.96. The 95% confidence interval
is:
β̂2 ± t(α/2) s(β̂2) ⇒ 19 ± 1.96(3) ⇒ 19 ± 5.88 ⇒ (13.12, 24.88)
We are 95% confident that the mean SAT-Math score for those who were coached was anywhere
from 13.12 to 24.88 points higher than the mean for those who were not coached, holding PSAT
scores constant.
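The interval above has the form estimate ± (critical value)(standard error), which the following Python sketch reproduces. The function name confidence_interval is an illustrative assumption; the inputs 19, 3, and 1.96 are the values quoted in the solution.

```python
def confidence_interval(estimate: float, std_error: float, t_critical: float):
    """Return (lower, upper) for estimate +/- t_critical * std_error."""
    margin = t_critical * std_error
    return estimate - margin, estimate + margin


# Estimated coaching effect 19 with standard error 3 and t.025 = 1.96
lower, upper = confidence_interval(estimate=19, std_error=3, t_critical=1.96)
print(round(lower, 2), round(upper, 2))
```

Because the interval lies entirely above 0, the same calculation also supports the conclusion in part c.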
c.
Since 0 is not contained in the confidence interval for β2, we can conclude that the coaching effect
was present. Those who received coaching scored higher on the SAT-Math than those who did not,
holding PSAT scores constant.
11.72
a.
β̂4 = .296. The difference in the mean value of DTVA between when the operating earnings are
negative and lower than last year and when the operating earnings are not negative and lower than
last year is estimated to be .296, holding all other variables constant.
b.
To determine if the mean DTVA for firms with negative earnings and earnings lower than last year
exceed the mean DTVA of other firms, we test:
H0: β4 = 0
Ha: β4 > 0
The p-value for this test is p = .001 / 2 = .0005. Since the p-value is so small, we would reject H0 for
α = .05. There is sufficient evidence to indicate the mean DTVA for firms with negative earnings
and earnings lower than last year exceeds the mean DTVA of other firms at α = .05.
c.
R^2(adj) = .280. 28% of the variability in the DTVA scores is explained by the model containing the 5
independent variables, adjusted for the number of variables in the model and the sample size.
11.73
a.
To determine if there is a difference in the mean monthly rate of return for T-Bills between an
expansive Fed monetary policy and a restrictive Fed monetary policy, we test:
H0: β1 = 0
Ha: β1 ≠ 0
Since neither n nor α is given, we cannot determine the exact rejection region. However, we can assume
that n is greater than 2 since the data used are from 1972 and 1997. With α = .05, the critical value
of t for the rejection region will be smaller than 4.303. Thus, with α = .05, t = 8.14 will fall in the
rejection region. There is sufficient evidence to indicate a difference in the mean monthly rate of
return for T-Bills between an expansive Fed monetary policy and a restrictive Fed monetary policy at
α = .05.
However, the value of R^2 is .1818. The model used is explaining only 18.18% of the variability in
the monthly rate of return. This is not a particularly large value.
To determine if there is a difference in the mean monthly rate of return for Equity REIT between an
expansive Fed monetary policy and a restrictive Fed monetary policy, we test:
H0: β1 = 0
Ha: β1 ≠ 0
Since neither n nor α is given, we cannot determine the exact rejection region. However, we can assume
that n is greater than 4 since the data used are from 1972 and 1997. With α = .05, the critical value
of t for the rejection region will be smaller than 3.182. Thus, with α = .05, t = 3.46 will fall in the
rejection region. There is sufficient evidence to indicate a difference in the mean monthly rate of
return for Equity REIT between an expansive Fed monetary policy and a restrictive Fed monetary
policy at α = .05.
However, the value of R^2 is .0387. The model used is explaining only 3.87% of the variability in the
monthly rate of return. This is a very small value.
b.
For the first model, β1 is the difference in the mean monthly rate of return for T-Bills between an
expansive Fed monetary policy and a restrictive Fed monetary policy.
For the second model, β1 is the difference in the mean monthly rate of return for Equity REIT
between an expansive Fed monetary policy and a restrictive Fed monetary policy.
c.
The least squares prediction equation for the equity REIT index is:
ŷ = 0.01863 − 0.01582x.
When the Federal Reserve's monetary policy is restrictive, x = 1. The predicted mean monthly rate of
return for the equity REIT index is
ŷ = 0.01863 − 0.01582(1) = .00281

When the Federal Reserve's monetary policy is expansive, x = 0. The predicted mean monthly rate of
return for the equity REIT index is
ŷ = 0.01863 − 0.01582(0) = .01863.
11.74
a.
Let x1 = 1 if study group complete solution, 0 otherwise
Let x2 = 1 if study group check figures, 0 otherwise
A possible model would be: E(y) = β0 + β1x1 + β2x2

b.
The difference between the mean knowledge gains of students in the completed solution and no
help groups would be β1.
c.

Regression Analysis: IMPROVE versus X1, X2

IMPROVE = 2.43 - 0.483 X1 + 0.287 X2

Predictor     Coef  SE Coef      T      P
Constant    2.4333   0.4941   4.92  0.000
X1         -0.4833   0.7813  -0.62  0.538
X2          0.2867   0.7329   0.39  0.697

S = 2.70636   R-Sq = 1.2%   R-Sq(adj) = 0.0%

Source          DF       SS     MS     F      P
Regression       2    6.643  3.322  0.45  0.637
Residual Error  72  527.357  7.324
Total           74  534.000

Source  DF  Seq SS
X1       1   5.523
X2       1   1.121

The least squares prediction equation is: ŷ = 2.4333 − .4833x1 + .2867x2

d.

H0: β1 = β2 = 0
From the printout, the test statistic is F = .45 and the p-value is p = .637. Since the p-value is not less
than α (p = .637 > .05), H0 is not rejected. There is insufficient evidence to indicate that the model was useful
at α = .05.
e.
From Exercise 8.28, the test statistic was F = .45 and the p-value was p = .637. These are the same as
those in part d. Thus, the results agree.
11.75
a.
Let x = 1 if Lotion/cream, 0 otherwise
The model is E(y) = β0 + β1x.
b.

Regression Analysis: Cost/Use versus Type

Cost/Use = 0.778 + 0.109 Type

Predictor    Coef  SE Coef     T      P
Constant   0.7775   0.2975  2.61  0.023
Type       0.1092   0.4545  0.24  0.814

S = 0.8415   R-Sq = 0.5%   R-Sq(adj) = 0.0%

Source          DF      SS      MS     F      P
Regression       1  0.0409  0.0409  0.06  0.814
Residual Error  12  8.4973  0.7081
Total           13  8.5381

The fitted model is: ŷ = 0.7775 + .1092x

c.
To determine whether repellent type is a useful predictor of cost-per-use, we test:

H0: β1 = 0
d.
The alternative hypothesis is

Ha: β1 ≠ 0
The test statistic is t = 0.24 and the p-value is p = 0.814.
Since the p-value is greater than α (p = .814 > .10), H0 is not rejected. There is insufficient evidence
to indicate that repellent type is a useful predictor of cost-per-use at α = .10.
e.
The dummy variable will be defined the same way and the model will look the same (just the
dependent variable will be different).
Regression Analysis: MaxProt versus Type

MaxProt = 7.56 - 1.65 Type

Predictor    Coef  SE Coef      T      P
Constant    7.563    2.339   3.23  0.007
Type       -1.646    3.574  -0.46  0.653

S = 6.617   R-Sq = 1.7%   R-Sq(adj) = 0.0%

Source          DF      SS     MS     F      P
Regression       1    9.29   9.29  0.21  0.653
Residual Error  12  525.43  43.79
Total           13  534.71

The fitted model is: ŷ = 7.56 − 1.65x

To determine whether repellent type is a useful predictor of maximum number of hours of protection, we test:
H0: β1 = 0
Ha: β1 ≠ 0
The test statistic is t = −0.46 and the p-value is p = 0.653.
Since the p-value is greater than α (p = .653 > .10), H0 is not rejected. There is insufficient evidence
to indicate that repellent type is a useful predictor of maximum number of hours of protection at
α = .10.
11.76
a.
For no stock split, x1 = 0. For high discretionary accrual, x2 = 1. The mean buy-and-hold return rate is
E(y) = β0 + β1x1 + β2x2 + β3x1x2 = β0 + β1(0) + β2(1) + β3(0)(1) = β0 + β2.
b.
For no stock split, x1 = 0. For low discretionary accrual, x2 = 0. The mean buy-and-hold return rate is
E(y) = β0 + β1x1 + β2x2 + β3x1x2 = β0 + β1(0) + β2(0) + β3(0)(0) = β0.
c.
The difference would be β0 + β2 − β0 = β2.
d.
For stock split, x1 = 1. For high discretionary accrual, x2 = 1. The mean buy-and-hold return rate is
E(y) = β0 + β1x1 + β2x2 + β3x1x2 = β0 + β1(1) + β2(1) + β3(1)(1) = β0 + β1 + β2 + β3.
For stock split, x1 = 1. For low discretionary accrual, x2 = 0. The mean buy-and-hold return rate is
E(y) = β0 + β1x1 + β2x2 + β3x1x2 = β0 + β1(1) + β2(0) + β3(1)(0) = β0 + β1.
The difference would be β0 + β1 + β2 + β3 − (β0 + β1) = β2 + β3.
e.
When there is no stock split, the mean buy-and-hold return rate increases by β2 when discretionary
accrual goes from low to high. When there is a stock split, the mean buy-and-hold return rate
increases by β2 + β3 when discretionary accrual goes from low to high. Thus, the effect of
discretionary accrual on the mean buy-and-hold return rate depends on the level of stock split.
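The way the interaction term changes the effect of discretionary accrual can be illustrated numerically. The Python sketch below uses hypothetical coefficient values (they are not estimates from the exercise) purely to show that the low-to-high difference equals β2 without a stock split and β2 + β3 with one.

```python
def mean_return(beta, split: int, high_accrual: int) -> float:
    """E(y) = b0 + b1*x1 + b2*x2 + b3*x1*x2 for dummy variables x1 and x2."""
    b0, b1, b2, b3 = beta
    return b0 + b1 * split + b2 * high_accrual + b3 * split * high_accrual


# Hypothetical coefficients (b0, b1, b2, b3), chosen only for illustration
beta = (0.10, 0.02, -0.03, -0.04)

# Effect of moving from low to high accrual at each level of stock split
effect_no_split = mean_return(beta, 0, 1) - mean_return(beta, 0, 0)   # equals b2
effect_split = mean_return(beta, 1, 1) - mean_return(beta, 1, 0)      # equals b2 + b3
print(round(effect_no_split, 4), round(effect_split, 4))
```

If b3 were 0, the two effects would coincide, which is the no-interaction case.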
f.
Since the p-value is less than α (p = .027 < .05), H0 is rejected. There is sufficient evidence to indicate
that interaction between stock split and discretionary accrual exists at α = .05.
g.
Yes. For no stock split, the difference between high discretionary accrual and low discretionary
accrual is β2. Since β2 is negative, the performance of the high discretionary accrual acquirers is
worse than that of the low discretionary accrual acquirers.
For stock split, the difference between high discretionary accrual and low discretionary accrual is β2 +
β3. Since both β2 and β3 are negative, the performance of the high discretionary accrual acquirers
is worse than that of the low discretionary accrual acquirers, and even worse than for no stock split.
11.77
a.
Let x1 = 1 if Group V, 0 otherwise
Let x2 = 1 if Group S, 0 otherwise
The model would be: E(y) = β0 + β1x1 + β2x2

b.

Regression Analysis: Recall versus x1, x2

Recall = 3.17 - 1.08 x1 - 1.45 x2

Predictor     Coef  SE Coef      T      P
Constant    3.1667   0.1670  18.96  0.000
x1         -1.0833   0.2362  -4.59  0.000
x2         -1.4537   0.2362  -6.15  0.000

S = 1.73596   R-Sq = 11.3%   R-Sq(adj) = 10.7%

Source           DF        SS      MS      F      P
Regression        2   123.265  61.633  20.45  0.000
Residual Error  321   967.352   3.014
Total           323  1090.617

Source  DF   Seq SS
x1       1    9.150
x2       1  114.116

The least squares prediction equation is: ŷ = 3.1667 − 1.0833x1 − 1.4537x2.

c.
To determine if the overall model is useful, we test:

H0: β1 = β2 = 0
The test statistic is F = 20.45 and the p-value is p = 0.000. Since the p-value is less than α = .01, H0
is rejected. There is sufficient evidence to indicate the model is useful in predicting brand recall at
α = .01.
From the Chapter 8 SIA, the test statistic was F = 20.45 and the p-value was p = 0.000. These are
identical to those above. The model is useful in predicting recall. This is the same as the conclusion
that there is a difference in mean recall among the 3 groups.
d.
With the dummy variable coding in part a, β0 is the mean recall for Group N. Thus, the estimated
mean recall for Group N is 3.1667 or 3.17. β1 is the difference in mean recall between Group V and
Group N. Thus, the mean recall for Group V is β0 + β1 and is estimated to be 3.1667 − 1.0833 =
2.0834 or 2.08. β2 is the difference in mean recall between Group S and Group N. Thus, the mean
recall for Group S is β0 + β2 and is estimated to be 3.1667 − 1.4537 = 1.7130 or 1.71.
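The group means in part d come from plugging each group's dummy codes into the fitted equation. A minimal Python sketch of that bookkeeping follows; the dictionary layout and the function name estimated_mean are illustrative assumptions, while the coefficients are those on the printout.

```python
# Fitted coefficients from the printout: intercept, x1 (Group V), x2 (Group S)
coefficients = {"intercept": 3.1667, "x1": -1.0833, "x2": -1.4537}

# Dummy coding from part a: Group N is the base level (x1 = x2 = 0)
dummy_codes = {"N": (0, 0), "V": (1, 0), "S": (0, 1)}


def estimated_mean(group: str) -> float:
    """Estimated mean recall for a group under the dummy coding above."""
    x1, x2 = dummy_codes[group]
    return coefficients["intercept"] + coefficients["x1"] * x1 + coefficients["x2"] * x2


for group in ("N", "V", "S"):
    print(group, round(estimated_mean(group), 4))
```

The base-level mean is the intercept alone; each other mean is the intercept plus that group's coefficient.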
11.78
a.
The first-order model is E(y) = β0 + β1x1
b.
The new model is E(y) = β0 + β1x1 + β2x2 + β3x3
where x2 = 1 if level 2, 0 otherwise
      x3 = 1 if level 3, 0 otherwise
c.
To allow for interactions, the model is:

E(y) = β0 + β1x1 + β2x2 + β3x3 + β4x1x2 + β5x1x3
d.
The response lines will be parallel if β4 = β5 = 0
e.
There will be one response line if β2 = β3 = β4 = β5 = 0
11.79
a.
The complete second-order model is E(y) = β0 + β1x1 + β2x1^2
b.
The new model is E(y) = β0 + β1x1 + β2x1^2 + β3x2 + β4x3
where x2 = 1 if level 2, 0 otherwise
      x3 = 1 if level 3, 0 otherwise
c.
The model with the interaction terms is:

E(y) = β0 + β1x1 + β2x1^2 + β3x2 + β4x3 + β5x1x2 + β6x1x3 + β7x1^2x2 + β8x1^2x3
d.
The response curves will have the same shape if none of the interaction terms are present or if β5 = β6
= β7 = β8 = 0.
e.
The response curves will be parallel lines if the interaction terms as well as the second-order terms
are absent or if β2 = β5 = β6 = β7 = β8 = 0.
f.
The response curves will be identical if no terms involving the qualitative variable are present or β3 =
β4 = β5 = β6 = β7 = β8 = 0.
11.80
a.
When x2 = x3 = 0, E(y) = β0 + β1x1

When x2 = 1 and x3 = 0, E(y) = β0 + β1x1 + β2
When x2 = 0 and x3 = 1, E(y) = β0 + β1x1 + β3
b.
For level 1, ŷ = 44.8 + 2.2x1
For level 2, ŷ = 44.8 + 2.2x1 + 9.4 = 54.2 + 2.2x1
For level 3, ŷ = 44.8 + 2.2x1 + 15.6 = 60.4 + 2.2x1
11.81
a.
For x2 = 0 and x3 = 0, ŷ = 48.8 − 3.4x1 + .07x1^2
For x2 = 1 and x3 = 0, ŷ = 48.8 − 3.4x1 + .07x1^2 − 2.4(1) + 3.7x1(1) − .02x1^2(1)
= 46.4 + 0.3x1 + .05x1^2
For x2 = 0 and x3 = 1, ŷ = 48.8 − 3.4x1 + .07x1^2 − 7.5(1) + 2.7x1(1) − .04x1^2(1)
= 41.3 − 0.7x1 + 0.03x1^2
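Each level's curve is obtained by adding that level's dummy-variable and interaction coefficients to the base-level coefficients, term by term. The Python sketch below performs that addition; the tuple layout (constant, x1 coefficient, x1^2 coefficient) and the function name curve_for_level are illustrative assumptions.

```python
# Base-level curve (x2 = x3 = 0): constant, x1 coefficient, x1^2 coefficient
base = (48.8, -3.4, 0.07)

# Adjustments contributed by each dummy variable and its interaction terms
adjustments = {
    "x2 = 1": (-2.4, 3.7, -0.02),
    "x3 = 1": (-7.5, 2.7, -0.04),
}


def curve_for_level(adjustment):
    """Add a level's adjustments to the base coefficients, term by term."""
    return tuple(round(b + a, 2) for b, a in zip(base, adjustment))


for label, adjustment in adjustments.items():
    print(label, curve_for_level(adjustment))
```

The resulting triples are the coefficients of the two simplified equations shown above.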
b.
The plots of the lines are:
11.82
The model is E(y) = β0 + β1x1 + β2x1^2 + β3x2 + β4x3 + β5x4
where x1 is the quantitative variable and
x2 = 1 if level 2 of qualitative variable, 0 otherwise
x3 = 1 if level 3 of qualitative variable, 0 otherwise
x4 = 1 if level 4 of qualitative variable, 0 otherwise
11.83
a.
H0: β1 = β2 = β3 = . . . = β12 = 0
Ha: At least one βi ≠ 0
Using Tables VII, VIII, IX, and X, Appendix B, with ν1 = k = 12 and ν2 = n − (k + 1) = 148 − (12 + 1)
= 135, the p-value associated with F = 26.9 is less than .001. Since the p-value is so small, H0 is
rejected. There is sufficient evidence to indicate the model is adequate.
R^2 = .705. 70.5% of the total variation of the natural logarithm of card prices is explained by the
model with the 12 variables in the model.
Adj-R^2 = .681. 68.1% of the total variation of the natural logarithm of card prices is explained by the
model with the 12 variables in the model, adjusting for the sample size and the number of variables in
the model.
Since these R^2 values are fairly large, the model appears to fit the data well.
b.
To determine if race contributes to the price, we test:
H0: β1 = 0
Ha: β1 ≠ 0
The test statistic is t = 1.014 and the p-value is p = .312. Since the p-value is so large, H0 is not
rejected. There is insufficient evidence to indicate race has an impact on the value of professional
football players' rookie cards for any reasonable value of α, holding the other variables constant.
c.
To determine if card vintage contributes to the price, we test:
H0: β3 = 0
Ha: β3 ≠ 0
The test statistic is t = 10.92 and the p-value is p = .000. Since the p-value is so small, H0 is rejected.
There is sufficient evidence to indicate card vintage has an impact on the value of professional
football players' rookie cards for any reasonable value of α, holding the other variables constant.
d.
The first order model is: E(y) = β0 + β1x3 + β2x5 + β3x6 + β4x7 + β5x8 + β6x9 + β7x10 + β8x11
+ β9x12 + β10x5x3 + β11x6x3 + β12x7x3 + β13x8x3 + β14x9x3 + β15x10x3 + β16x11x3 + β17x12x3
11.84
a.
R^2 = .069. 6.9% of the total variation of the relative optimism of the analysts' 3-month horizon
forecasts is explained by the model containing type of firm, number of days between forecast and
fiscal year-end, and the natural logarithm of the number of quarters the analyst had worked with the
firm.
b.
H0: β1 = β2 = β3 = 0
Ha: At least one βi ≠ 0
The test statistic is F = (R^2 / k) / {(1 − R^2) / [n − (k + 1)]} = (.069 / 3) / {(1 − .069) / [11,121 − (3 + 1)]} = 274.64
The rejection region requires α = .01 in the upper tail of the F distribution with ν1 = k = 3 and
ν2 = n − (k + 1) = 11,121 − (3 + 1) = 11,117. From Table X, Appendix B, F.01 = 3.78. The rejection
region is F > 3.78.
Since the observed value of the test statistic falls in the rejection region (F = 274.64 > 3.78), H0 is
rejected. There is sufficient evidence to indicate the model is useful at α = .01.
c.
To determine if x1 contributes significantly to the prediction of y, we test:
H0: β1 = 0
Ha: β1 ≠ 0
The rejection region requires α/2 = .01/2 = .005 in each tail of the t distribution. From Table V,
Appendix B, with df = n − (k + 1) = 11,121 − (3 + 1) = 11,117, t.005 = 2.576. The rejection region is t >
2.576 or t < -2.576.
Since the observed value of the test statistic falls in the rejection region, H0 is
rejected. There is sufficient evidence to indicate x1 contributes significantly to the prediction of y at
α = .01, holding the other variables constant.
d.
Yes. In part c, we concluded that β1 is different from 0. Because the estimate of β1 is greater than 0,
we can conclude that β1 is positive. Therefore, the earnings forecasts by the analysts at buy-side firms
are more optimistic than forecasts made by analysts at sell-side firms, holding the other variables
constant.
11.85
a.
For obese smokers, x2 = 0. The equation of the hypothesized line relating mean REE to time after
smoking for obese smokers is:
E(y) = β0 + β1x1 + β2(0) + β3x1(0) = β0 + β1x1

The slope of the line is β1.
b.
For normal weight smokers, x2 = 1. The equation of the hypothesized line relating mean REE to time
after smoking for normal weight smokers is:
E(y) = β0 + β1x1 + β2(1) + β3x1(1) = (β0 + β2) + (β1 + β3)x1

The slope of the line is β1 + β3.
c.
The reported p-value is .044. Since the p-value is small, there is evidence to indicate that interaction
between time and weight is present for α > .044.
For α = .01, there is no evidence to indicate that interaction between time and weight is present.
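The conclusion in part c depends only on comparing the reported p-value with the chosen α. A minimal Python sketch of that decision rule follows; the function name reject_null is an illustrative assumption.

```python
def reject_null(p_value: float, alpha: float) -> bool:
    """Reject H0 when the observed p-value is smaller than alpha."""
    return p_value < alpha


# Reported p-value for the interaction term is .044
for alpha in (0.05, 0.01):
    print(alpha, reject_null(0.044, alpha))
```

The interaction is judged significant at α = .05 but not at α = .01, matching the two statements above.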
11.86
a.
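Let $x_2 = \begin{cases} 1 & \text{if perceived organizational support is low} \\ 0 & \text{otherwise} \end{cases}$
Let $x_3 = \begin{cases} 1 & \text{if perceived organizational support is neutral} \\ 0 & \text{otherwise} \end{cases}$
b. The model would be $E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3$.

c. The model would be $E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_1 x_2 + \beta_5 x_1 x_3$.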
d. If the effect of bullying on intention to leave is greater at the low level of POS than at the
high level of POS, this indicates that POS and bullying interact. Thus, the model in part c
supports these findings.
11.87
a.
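Let $x_1 = \begin{cases} 1 & \text{if Channel catfish} \\ 0 & \text{otherwise} \end{cases}$
$x_2 = \begin{cases} 1 & \text{if Largemouth bass} \\ 0 & \text{otherwise} \end{cases}$
b.
Let x3 = weight. The model would be: $E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3$
c.
The model would be: $E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_1 x_3 + \beta_5 x_2 x_3$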
d.
Regression Analysis: DDT versus x1, x2, Weight

DDT = 3.1 + 26.5 x1 - 4.1 x2 + 0.0037 Weight

Predictor       Coef   SE Coef       T       P
Constant        3.13     38.89    0.08   0.936
x1             26.51     21.52    1.23   0.220
x2             -4.09     37.91   -0.11   0.914
Weight       0.00371   0.02598    0.14   0.887

S = 98.57    R-Sq = 1.7%    R-Sq(adj) = 0.0%

Source            DF        SS      MS      F       P
Regression         3     23652    7884   0.81   0.490
Residual Error   140   1360351    9717
Total            143   1384003

Source    DF   Seq SS
x1         1    23041
x2         1      414
Weight     1      198

The least squares prediction equation is: $\hat{y} = 3.1 + 26.5x_1 - 4.1x_2 + 0.0037x_3$
e.
$\hat{\beta}_3 = 0.0037$. For each additional gram of weight, the mean level of DDT is expected to increase by 0.0037 units, holding species constant.
f.

Regression Analysis: DDT versus x1, x2, Weight, x1Weight, x2Weight

DDT = 3.5 + 25.6 x1 - 3.5 x2 + 0.0034 Weight + 0.0008 x1Weight
      - 0.0013 x2Weight

Predictor        Coef   SE Coef       T       P
Constant         3.50     54.69    0.06   0.949
x1              25.59     67.52    0.38   0.705
x2              -3.47     84.70   -0.04   0.967
Weight        0.00344   0.03843    0.09   0.929
x1Weight      0.00082   0.05459    0.02   0.988
x2Weight     -0.00129   0.09987   -0.01   0.990

S = 99.29    R-Sq = 1.7%    R-Sq(adj) = 0.0%

Source            DF        SS      MS      F       P
Regression         5     23657    4731   0.48   0.791
Residual Error   138   1360346    9858
Total            143   1384003

Source      DF   Seq SS
x1           1    23041
x2           1      414
Weight       1      198
x1Weight     1        4
x2Weight     1        2

The least squares prediction equation is: $\hat{y} = 3.5 + 25.6x_1 - 3.5x_2 + 0.0034x_3 + 0.0008x_1x_3 - .0013x_2x_3$
g.
For Channel catfish, x1 = 1 and x2 = 0. The least squares line is
$\hat{y} = 3.5 + 25.6(1) + 0.0034x_3 + 0.0008(1)x_3 = 29.1 + .0042x_3$

The estimated slope is .0042.
11.88
a.
The first-order model is:
$E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2$

b.
For the high-tech firms, x2 = 1. The model for the high-tech firm is:
$E(y) = \beta_0 + \beta_1 x_1 + \beta_2(1) = \beta_0 + \beta_2 + \beta_1 x_1$

The slope of the line would be $\beta_1$.
c.
The new model would include the interaction term:
$E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2$

d.
For the high-tech firms, x2 = 1. The model for the high-tech firm is:
$E(y) = \beta_0 + \beta_1 x_1 + \beta_2(1) + \beta_3 x_1(1) = \beta_0 + \beta_2 + (\beta_1 + \beta_3)x_1$

The slope of the line would be $\beta_1 + \beta_3$.
11.89
a.
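Let x1 = sales volume
$x_2 = \begin{cases} 1 & \text{if NW} \\ 0 & \text{if not} \end{cases}$
$x_3 = \begin{cases} 1 & \text{if S} \\ 0 & \text{if not} \end{cases}$
$x_4 = \begin{cases} 1 & \text{if W} \\ 0 & \text{if not} \end{cases}$
The complete second order model for the sales price of a single-family home is:
$E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_1^2 + \beta_3 x_2 + \beta_4 x_3 + \beta_5 x_4 + \beta_6 x_1 x_2 + \beta_7 x_1 x_3 + \beta_8 x_1 x_4$
$\quad + \beta_9 x_1^2 x_2 + \beta_{10} x_1^2 x_3 + \beta_{11} x_1^2 x_4$
b.
For the West, x2 = 0, x3 = 0, and x4 = 1. The equation would be:
$E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_1^2 + \beta_3(0) + \beta_4(0) + \beta_5(1) + \beta_6 x_1(0) + \beta_7 x_1(0)$
$\quad + \beta_8 x_1(1) + \beta_9 x_1^2(0) + \beta_{10} x_1^2(0) + \beta_{11} x_1^2(1)$
$= \beta_0 + \beta_1 x_1 + \beta_2 x_1^2 + \beta_5 + \beta_8 x_1 + \beta_{11} x_1^2$
$= \beta_0 + \beta_5 + \beta_1 x_1 + \beta_8 x_1 + \beta_2 x_1^2 + \beta_{11} x_1^2$
$= (\beta_0 + \beta_5) + (\beta_1 + \beta_8)x_1 + (\beta_2 + \beta_{11})x_1^2$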
c.
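For the Northwest, x2 = 1, x3 = 0, and x4 = 0. The equation would be:
$E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_1^2 + \beta_3(1) + \beta_4(0) + \beta_5(0) + \beta_6 x_1(1) + \beta_7 x_1(0)$
$\quad + \beta_8 x_1(0) + \beta_9 x_1^2(1) + \beta_{10} x_1^2(0) + \beta_{11} x_1^2(0)$
$= \beta_0 + \beta_1 x_1 + \beta_2 x_1^2 + \beta_3 + \beta_6 x_1 + \beta_9 x_1^2$
$= \beta_0 + \beta_3 + \beta_1 x_1 + \beta_6 x_1 + \beta_2 x_1^2 + \beta_9 x_1^2$
$= (\beta_0 + \beta_3) + (\beta_1 + \beta_6)x_1 + (\beta_2 + \beta_9)x_1^2$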
d.
The parameters $\beta_3$, $\beta_4$, and $\beta_5$ allow for the y-intercepts of the 4 regions to be different. The parameters $\beta_6$, $\beta_7$, and $\beta_8$ allow for the peaks of the curves to be at a different value of sales volume (x1) for the four regions. The parameters $\beta_9$, $\beta_{10}$, and $\beta_{11}$ allow for the shapes of the curves to be different for the four regions. Thus, all the parameters from $\beta_3$ through $\beta_{11}$ allow for differences in mean sales prices among the four regions.
e.
Using MINITAB, the printout is:
Regression Analysis: Price versus X1, X1SQ, ...

Price = 1904740 - 70.4 X1 + 0.000721 X1SQ + 159661 X2 + 5291908 X3 + 3663319 X4
        + 22.2 X1X2 - 23.9 X1X3 - 37 X1X4 - 0.000421 X1SQX2 - 0.000404 X1SQX3
        - 0.000181 X1SQX4

Predictor          Coef     SE Coef       T       P
Constant        1904740     1984278    0.96   0.351
X1               -70.44       72.09   -0.98   0.343
X1SQ          0.0007211   0.0006515    1.11   0.285
X2               159661     2069265    0.08   0.939
X3              5291908     4812586    1.10   0.288
X4              3663319     4478880    0.82   0.425
X1X2              22.25       73.74    0.30   0.767
X1X3             -23.86       92.09   -0.26   0.799
X1X4              -37.2       103.0   -0.36   0.723
X1SQX2       -0.0004210   0.0006589   -0.64   0.532
X1SQX3       -0.0004044   0.0006777   -0.60   0.559
X1SQX4       -0.0001810   0.0007333   -0.25   0.808

S = 24365.8    R-Sq = 85.0%    R-Sq(adj) = 74.6%

Source           DF            SS           MS      F       P
Regression       11   53633628997   4875784454   8.21   0.000
Residual Error   16    9499097458    593693591
Total            27   63132726455

Source    DF        Seq SS
X1         1       3591326
X1SQ       1      64275360
X2         1   11338642654
X3         1   10081000583
X4         1     241539024
X1X2       1   18258475317
X1X3       1    5579187440
X1X4       1    7566169810
X1SQX2     1     138146367
X1SQX3     1     326425228
X1SQX4     1      36175888

Unusual Observations
Obs      X1    Price      Fit   SE Fit   Residual   St Resid
  2   61025   235900   291659    18746     -55759     -3.58R
  5   60324   345300   279697    15712      65603      3.52R
  7   61025   240855   241084    24360       -229     -0.42 X

To determine if the model is useful for predicting sales price, we test:
H0: $\beta_1 = \beta_2 = \cdots = \beta_{11} = 0$
Ha: At least one of the coefficients is nonzero
The test statistic is $F = \dfrac{MS(Model)}{MSE} = 8.21$
The p-value is p = .000. Since the p-value is less than $\alpha = .01$ (p = .000 < .01), H0 is rejected. There is sufficient evidence to indicate the model is useful in predicting sales price at $\alpha = .01$.
11.90
a.
Let $x_2 = \begin{cases} 1 & \text{if Developed} \\ 0 & \text{otherwise} \end{cases}$
The model would be:
$E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2$
b.
Using MINITAB, the plot of the data is:

[Figure: Scatterplot of Volatility vs CredRat (x1), with separate plotting symbols for Market D and Market E]

From the plot, it appears that the model is appropriate. The two lines appear to have different slopes.
c.

Regression Analysis: y versus x1, x2, x1x2

y = 58.8 - 0.557 x1 - 18.7 x2 + 0.354 x1x2

Predictor       Coef   SE Coef        T       P
Constant      58.786     1.217    48.30   0.000
x1          -0.55743   0.03669   -15.19   0.000
x2           -18.718     5.572    -3.36   0.002
x1x2         0.35368   0.07615     4.64   0.000

S = 2.66123    R-Sq = 96.1%    R-Sq(adj) = 95.7%

Source           DF       SS       MS        F       P
Regression        3   4596.5   1532.2   216.34   0.000
Residual Error   26    184.1      7.1
Total            29   4780.6

Source   DF   Seq SS
x1        1   4388.0
x2        1     55.7
x1x2      1    152.8

The fitted regression model is:
$\hat{y} = 58.786 - .557x_1 - 18.718x_2 + .354x_1x_2$
For the emerging countries, x2 = 0. The fitted model is:
$\hat{y} = 58.786 - .557x_1 - 18.718(0) + .354x_1(0) = 58.786 - .557x_1$

For the developed countries, x2 = 1. The fitted model is:
$\hat{y} = 58.786 - .557x_1 - 18.718(1) + .354x_1(1) = 40.068 - .203x_1$
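The two group-specific lines follow mechanically from the fitted interaction model: the dummy variable shifts the intercept by its own coefficient and the slope by the interaction coefficient. A minimal sketch of that substitution, using the rounded coefficients quoted above (the function name `line_for_group` is an assumption made for illustration):

```python
def line_for_group(b0: float, b1: float, b2: float, b3: float, x2: int):
    """Return (intercept, slope) of y-hat versus x1 for a given dummy value x2.

    Model: y-hat = b0 + b1*x1 + b2*x2 + b3*x1*x2
    """
    intercept = b0 + b2 * x2   # the dummy shifts the intercept
    slope = b1 + b3 * x2       # the interaction term shifts the slope
    return intercept, slope


# Rounded estimates from the fitted model in part c
coefs = dict(b0=58.786, b1=-0.557, b2=-18.718, b3=0.354)

emerging = line_for_group(**coefs, x2=0)    # x2 = 0
developed = line_for_group(**coefs, x2=1)   # x2 = 1

print("emerging :", tuple(round(v, 3) for v in emerging))
print("developed:", tuple(round(v, 3) for v in developed))
```

The slopes differ by exactly the interaction estimate, which is why the test of the interaction coefficient in part e is a test of equal slopes.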

d.
The plot of the fitted lines is:

[Figure: Scatterplot of y vs x1 showing the fitted lines for Market D and Market E]

e.
To determine if the slope of the linear relationship between volatility and credit rating depends on
market type, we test:
H0: $\beta_3 = 0$
Ha: $\beta_3 \neq 0$
The p-value is 0.000. Since the p-value is less than $\alpha = .01$, H0 is rejected. There is sufficient evidence to indicate that the slope of the linear relationship between volatility and credit rating depends on market type at $\alpha = .01$.
11.91
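The models in parts a and b are nested:

The complete model is $E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2$
The reduced model is $E(y) = \beta_0 + \beta_1 x_1$
The models in parts a and d are nested.
The complete model is $E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2$
The reduced model is $E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2$
The models in parts a and e are nested.

The complete model is $E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2 + \beta_4 x_1^2 + \beta_5 x_2^2$
The reduced model is $E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2$
The models in parts b and c are nested.

The complete model is $E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_1^2$
The reduced model is $E(y) = \beta_0 + \beta_1 x_1$
The models in parts b and d are nested.

The complete model is $E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2$
The reduced model is $E(y) = \beta_0 + \beta_1 x_1$
The models in parts b and e are nested.

The complete model is $E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2 + \beta_4 x_1^2 + \beta_5 x_2^2$
The reduced model is $E(y) = \beta_0 + \beta_1 x_1$
The models in parts c and e are nested.

The complete model is $E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2 + \beta_4 x_1^2 + \beta_5 x_2^2$
The reduced model is $E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_1^2$
The models in parts d and e are nested.

The complete model is $E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2 + \beta_4 x_1^2 + \beta_5 x_2^2$
The reduced model is $E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2$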
11.92
a.
b.
The reduced model would be $E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2$
c.
The numerator df = $k - g = 5 - 2 = 3$ and the denominator df = $n - (k + 1) = 30 - (5 + 1) = 24$.
d.
H0: $\beta_3 = \beta_4 = \beta_5 = 0$
The test statistic is
$F = \dfrac{(SSE_R - SSE_C)/(k - g)}{SSE_C/[n - (k + 1)]} = \dfrac{(1250.2 - 1125.2)/(5 - 2)}{1125.2/[30 - (5 + 1)]} = \dfrac{41.6667}{46.8833} = .89$
The rejection region requires $\alpha = .05$ in the upper tail of the F distribution with numerator df = $k - g = 5 - 2 = 3$ and denominator df = $n - (k + 1) = 30 - (5 + 1) = 24$. From Table VIII, Appendix B, $F_{.05} = 3.01$. The rejection region is F > 3.01.
Since the observed value of the test statistic does not fall in the rejection region ($F = .89 \not> 3.01$), H0 is not rejected. There is insufficient evidence to indicate the second-order terms are useful at $\alpha = .05$.
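The same nested-model (partial) F computation recurs throughout these solutions, so it may help to see it once as a small function. This is only a sketch of the arithmetic; the name `nested_f` and its argument names are assumptions chosen to mirror the symbols SSE_R, SSE_C, n, k, and g used in the text.

```python
def nested_f(sse_reduced: float, sse_complete: float, n: int, k: int, g: int) -> float:
    """Partial F statistic for comparing nested regression models.

    sse_reduced  : SSE of the reduced model (g terms besides the intercept)
    sse_complete : SSE of the complete model (k terms besides the intercept)
    n            : sample size
    """
    numerator = (sse_reduced - sse_complete) / (k - g)   # drop in SSE per tested term
    denominator = sse_complete / (n - (k + 1))           # MSE of the complete model
    return numerator / denominator


# Values from part d: SSE_R = 1250.2, SSE_C = 1125.2, n = 30, k = 5, g = 2
f_stat = nested_f(1250.2, 1125.2, n=30, k=5, g=2)
print(round(f_stat, 2))
```

The critical value still has to come from an F table (or software) with k - g numerator and n - (k + 1) denominator degrees of freedom.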
11.93
a.
Including $\beta_0$, there are five parameters in the complete model and three in the reduced model.
b.
The hypotheses are:

H0: $\beta_3 = \beta_4 = 0$
c.
$F = \dfrac{(SSE_R - SSE_C)/(k - g)}{SSE_C/[n - (k + 1)]} = \dfrac{(160.44 - 152.66)/(4 - 2)}{152.66/[20 - (4 + 1)]} = \dfrac{3.89}{10.1773} = .38$
Since the observed value of the test statistic does not fall in the rejection region ($F = .38 \not> 3.68$), H0 is not rejected. There is insufficient evidence to indicate the complete model is better than the reduced model at $\alpha = .05$.
11.94
a.
Let variables x1 through x4 be the Demographic variables, variables x5 through x11 be the Diagnostic
variables, variables x12 through x15 be the Treatment variables, and variables x16 through x21 be the
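Community variables. The complete model is:
$E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_4 + \beta_5 x_5 + \beta_6 x_6 + \beta_7 x_7 + \beta_8 x_8 + \beta_9 x_9$
$\quad + \beta_{10} x_{10} + \beta_{11} x_{11} + \beta_{12} x_{12} + \beta_{13} x_{13} + \beta_{14} x_{14} + \beta_{15} x_{15} + \beta_{16} x_{16} + \beta_{17} x_{17}$
$\quad + \beta_{18} x_{18} + \beta_{19} x_{19} + \beta_{20} x_{20} + \beta_{21} x_{21}$
b.
To determine if the 7 Diagnostic variables contribute information for the prediction of y, we test:
H0: $\beta_5 = \beta_6 = \cdots = \beta_{11} = 0$
c.
The reduced model would be:

$E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_4 + \beta_{12} x_{12} + \beta_{13} x_{13} + \beta_{14} x_{14}$
$\quad + \beta_{15} x_{15} + \beta_{16} x_{16} + \beta_{17} x_{17} + \beta_{18} x_{18} + \beta_{19} x_{19} + \beta_{20} x_{20} + \beta_{21} x_{21}$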
d.
Since the p-value is so small (p < .0001), H0 is rejected. There is sufficient evidence to indicate at least one of the seven diagnostic variables contributes information for the prediction of y.
11.95
a.
To determine whether the quadratic terms in the model are statistically useful for predicting relative optimism, we test:
H0: $\beta_4 = \beta_5 = 0$
Ha: At least one $\beta_i \neq 0$
b.
The complete model is $E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2 + \beta_4 x_2^2 + \beta_5 x_1 x_2^2$ and the reduced model is $E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2$.
c.
To determine whether the interaction terms in the model are statistically useful for predicting relative optimism, we test:
H0: $\beta_3 = \beta_5 = 0$
Ha: At least one $\beta_i \neq 0$
d.
The reduced model is $E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_4 x_2^2$.
e.
To determine whether the dummy variable terms in the model are statistically useful for predicting relative optimism, we test:
H0: $\beta_1 = \beta_3 = \beta_5 = 0$
Ha: At least one $\beta_i \neq 0$
f.
The reduced model is $E(y) = \beta_0 + \beta_2 x_2 + \beta_4 x_2^2$.
11.96
a.
The model from part b of Exercise 11.86 is $E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3$. The model from part c of Exercise 11.86 is $E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_1 x_2 + \beta_5 x_1 x_3$. These two models are nested because all of the terms in the first model are contained in the second model. The first model is the reduced model and the second model is the complete model.
b.
The null hypothesis for comparing the two models is H0: $\beta_4 = \beta_5 = 0$.
c.
If we reject H0 in part b, we would conclude that at least one of the interaction terms is not 0. Thus, we would prefer the second model.
d.
If we fail to reject H0 in part b, then we would conclude that we have no evidence to indicate that the interaction terms were significant. Thus, we would prefer the first model.
11.97
a.
Let x1 = cycle speed and x2 = cycle pressure ratio. A complete second order model is:
$E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1^2 + \beta_4 x_2^2 + \beta_5 x_1 x_2$
b.
To determine whether the curvature terms in the complete 2nd order model are useful for predicting heat rate, we test:
H0: $\beta_3 = \beta_4 = 0$
Ha: At least one of the parameters $\beta_3$, $\beta_4$ differs from 0
c.
The complete model is: $E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1^2 + \beta_4 x_2^2 + \beta_5 x_1 x_2$
The reduced model is: $E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_5 x_1 x_2$

d.
From the printout, $SSE_R$ = 25,310,639, $SSE_C$ = 19,370,350, and $MSE_C$ = 317,547.
e.
$F = \dfrac{(SSE_R - SSE_C)/(k - g)}{SSE_C/[n - (k + 1)]} = \dfrac{(25,310,639 - 19,370,350)/(5 - 3)}{19,370,350/[67 - (5 + 1)]} = 9.35$
f.
The rejection region requires $\alpha = .10$ in the upper tail of the F-distribution with $\nu_1 = k - g = 5 - 3 = 2$ and $\nu_2 = n - (k + 1) = 67 - (5 + 1) = 61$. From Table VII, Appendix B,
g.
Since the observed value of the test statistic falls in the rejection region, H0 is rejected. There is sufficient evidence to indicate at least one of the curvature terms in the complete 2nd order model is useful for predicting heat rate at $\alpha = .10$.
11.98
a.
Model 1: $R^2 = .101$. 10.1% of the total variation in the supervisor-directed aggression score is explained by the terms in Model 1.
Model 2: $R^2 = .555$. 55.5% of the total variation in the supervisor-directed aggression score is explained by the terms in Model 2.
b.
To compare the fits of Model 1 and Model 2, we test:

H0: $\beta_5 = \beta_6 = \beta_7 = \beta_8 = 0$
Ha: At least one $\beta_i \neq 0$
c.
Yes. All of the terms in Model 1 are contained in Model 2.
d.
H0 would be rejected. There is sufficient evidence that at least one of the variables Self-esteem,
history of aggression, Interactional injustice at primary job, and Abusive supervisor at primary job is
significant in predicting supervisor-directed aggression score.
e.
Model 3: E(y) = $\beta_0$ + $\beta_1$(Age) + $\beta_2$(Gender) + $\beta_3$(Interaction injustice at 2nd job) +
$\beta_4$(Abusive supervisor at 2nd job) + $\beta_5$(Self-esteem) + $\beta_6$(History of aggression) +
$\beta_7$(Interactional injustice at primary job) + $\beta_8$(Abusive supervisor at primary job) +
$\beta_9$(Self-esteem)(History of aggression) + $\beta_{10}$(Self-esteem)(Interactional injustice at primary job) +
$\beta_{11}$(Self-esteem)(Abusive supervisor at primary job) +
$\beta_{12}$(History of aggression)(Interactional injustice at primary job) +
$\beta_{13}$(History of aggression)(Abusive supervisor at primary job) +
$\beta_{14}$(Interactional injustice at primary job)(Abusive supervisor at primary job).
f.
To compare Model 2 with Model 3, we test:

H0: $\beta_9 = \beta_{10} = \cdots = \beta_{14} = 0$
Ha: At least one $\beta_i \neq 0$
The p-value for the test is p > .10. Since the p-value > .10, H0 is not rejected. There is insufficient evidence to indicate any of the interaction terms are significant in predicting supervisor-directed aggression score for any reasonable value of $\alpha$.
11.99
a.
The hypothesized equation for E(y) is:

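$E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_4 + \beta_5 x_5 + \beta_6 x_6 + \beta_7 x_7 + \beta_8 x_8 + \beta_9 x_9 + \beta_{10} x_{10}$
b.
To determine if the initial model is sufficient, we test:

H0: $\beta_3 = \beta_4 = \cdots = \beta_{10} = 0$
Ha: At least one $\beta_i \neq 0$, i = 3, 4, ..., 10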
c.
Since the F was significant, we reject H0 at = .05. There is sufficient evidence to indicate that at
least one of the additional variables (student ethnicity, socio-economic status, school performance,
number of math courses taken in high school and overall GPA in the math courses) contributes to the
prediction of the SAT-math score.
d.
$R^2_{adj} = .79$. 79% of the sample variability of SAT-math scores is explained by the model containing the 10 independent variables, adjusted for the sample size and the number of variables.
e.
For confidence coefficient .95, $\alpha = .05$ and $\alpha/2 = .05/2 = .025$. From Table V, Appendix B, with df = $n - (k + 1) = 3,492 - (10 + 1) = 3,481$, $t_{.025} = 1.96$. The confidence interval is:
$\hat{\beta}_2 \pm t_{\alpha/2}\, s_{\hat{\beta}_2} \Rightarrow 14 \pm 1.96(3) \Rightarrow 14 \pm 5.88 \Rightarrow (8.12, 19.88)$
We are 95% confident that the mean SAT-Math score for those who were coached was anywhere
from 8.12 to 19.88 points higher than the mean for those who were not coached, holding all other
variables constant.
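The interval above is the usual "estimate plus or minus critical value times standard error" form. A short sketch of that arithmetic, with the critical value supplied by hand from the t table as in the solution (the helper name `coef_confidence_interval` is hypothetical):

```python
def coef_confidence_interval(estimate: float, std_error: float, t_crit: float):
    """Confidence interval for one regression coefficient.

    estimate  : least squares estimate of the coefficient
    std_error : estimated standard error of that estimate
    t_crit    : tabled t value for the chosen confidence level and df
    """
    margin = t_crit * std_error
    return estimate - margin, estimate + margin


# Values from part e: estimate 14, standard error 3, t_.025 = 1.96
low, high = coef_confidence_interval(14, 3, 1.96)
print(round(low, 2), round(high, 2))
```

Because the whole interval lies above 0, it agrees with the conclusion that coached students score higher on average, holding the other variables constant.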
f.
Yes. The value of $\hat{\beta}_2$ decreased from 19 to 14 when the additional variables were added to the model. Thus, the increase from coaching is not as great.
g.
The new model including all the interaction terms is:

$E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_4 + \beta_5 x_5 + \beta_6 x_6 + \beta_7 x_7 + \beta_8 x_8 + \beta_9 x_9 + \beta_{10} x_{10}$
$\quad + \beta_{11} x_1 x_2 + \beta_{12} x_3 x_2 + \beta_{13} x_4 x_2 + \beta_{14} x_5 x_2 + \beta_{15} x_6 x_2 + \beta_{16} x_7 x_2 + \beta_{17} x_8 x_2$
$\quad + \beta_{18} x_9 x_2 + \beta_{19} x_{10} x_2$
h.
To determine if the model with the interaction terms is better in predicting SAT-Math scores, we test:
H0: $\beta_{11} = \beta_{12} = \cdots = \beta_{19} = 0$
Ha: At least one $\beta_i \neq 0$, i = 11, 12, ..., 19
We would fit the complete model above. We would then compare it to the fitted model from part a (Reduced model). The test statistic would be:
$F = \dfrac{(SSE_R - SSE_C)/(k - g)}{SSE_C/[n - (k + 1)]}$
11.100 a.
Using MINITAB, the results for fitting the reduced model are:
Regression Analysis: Price versus X1, X2, X3, X4, X1X2, X1X3, X1X4

Price = - 286970 + 9.32 X1 + 578133 X2 + 60968 X3 - 575769 X4 - 10.4 X1X2
        - 6.52 X1X3 + 1.00 X1X4

Predictor       Coef   SE Coef       T       P
Constant     -286970    161003   -1.78   0.090
X1             9.317     2.900    3.21   0.004
X2            578133    183578    3.15   0.005
X3             60968    292823    0.21   0.837
X4           -575769    325699   -1.77   0.092
X1X2         -10.408     3.060   -3.40   0.003
X1X3          -6.522     3.300   -1.98   0.062
X1X4           1.000     3.903    0.26   0.800

S = 30785.9    R-Sq = 70.0%    R-Sq(adj) = 59.5%

Source           DF            SS           MS      F       P
Regression        7   44177277861   6311039694   6.66   0.000
Residual Error   20   18955448594    947772430
Total            27   63132726455

Source   DF        Seq SS
X1        1       3591326
X2        1    8414868549
X3        1    9294417537
X4        1    1463449502
X1X2      1   17344397940
X1X3      1    7594294303
X1X4      1      62258704

From Exercise 11.89, $SSE_C$ = 9,499,097,458, n = 28, and k = 11.

To determine if the quadratic terms are statistically useful for predicting sales price, we test:
H0: $\beta_2 = \beta_9 = \beta_{10} = \beta_{11} = 0$
Ha: At least one $\beta_i \neq 0$
The test statistic is
$F = \dfrac{(SSE_R - SSE_C)/(k - g)}{SSE_C/[n - (k + 1)]} = \dfrac{(18,955,448,594 - 9,499,097,458)/(11 - 7)}{9,499,097,458/[28 - (11 + 1)]} = 3.98$
The rejection region requires $\alpha = .05$ in the upper tail of the F distribution with $\nu_1 = k - g = 11 - 7 = 4$ and $\nu_2 = n - (k + 1) = 28 - (11 + 1) = 16$. From Table VIII, Appendix B, $F_{.05} = 3.01$. The rejection region is F > 3.01.
Since the observed value of the test statistic falls in the rejection region (F = 3.98 > 3.01), H0 is rejected. There is sufficient evidence to indicate at least one of the quadratic terms is statistically useful for predicting sales price at $\alpha = .05$.
b.
Since we rejected H0 in part a, the complete model is preferred. At least one of the quadratic terms is
significant.
c.
The preferred model from part b is the complete model. Using MINITAB, the results of fitting the
model without the interaction terms is:
Regression Analysis: Price versus X1, X1SQ, X2, X3, X4

Price = 289549 - 2.15 X1 + 0.000019 X1SQ - 57530 X2 - 203755 X3 - 24038 X4

Predictor         Coef      SE Coef       T       P
Constant        289549       138840    2.09   0.049
X1              -2.150        3.325   -0.65   0.524
X1SQ        0.00001888   0.00001621    1.16   0.257
X2              -57530        50653   -1.14   0.268
X3             -203755       113332   -1.80   0.086
X4              -24038        67099   -0.36   0.724

S = 43381.9    R-Sq = 34.4%    R-Sq(adj) = 19.5%

Source           DF            SS           MS      F       P
Regression        5   21729048947   4345809789   2.31   0.079
Residual Error   22   41403677509   1881985341
Total            27   63132726455

Source   DF        Seq SS
X1        1       3591326
X1SQ      1      64275360
X2        1   11338642654
X3        1   10081000583
X4        1     241539024

To determine whether region and sales volume interact to affect sales price, we test:
H0: $\beta_6 = \beta_7 = \beta_8 = \beta_9 = \beta_{10} = \beta_{11} = 0$
Ha: At least one $\beta_i \neq 0$
The test statistic is
$F = \dfrac{(SSE_R - SSE_C)/(k - g)}{SSE_C/[n - (k + 1)]} = \dfrac{(41,403,677,509 - 9,499,097,458)/(11 - 5)}{9,499,097,458/[28 - (11 + 1)]} = 8.96$
The rejection region requires $\alpha = .05$ in the upper tail of the F distribution with $\nu_1 = k - g = 11 - 5 = 6$ and $\nu_2 = n - (k + 1) = 28 - (11 + 1) = 16$. The rejection region is F > 2.74.
Since the observed value of the test statistic falls in the rejection region (F = 8.96 > 2.74), H0 is rejected. There is sufficient evidence to indicate region and sales volume interact to affect sales price at $\alpha = .05$.
d.
Since we rejected H0 in part c, the complete model is preferred. At least one of the interaction terms is
significant.
11.101 a.
The model would be:

$E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3$
b.
The model including the interaction terms is:

$E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_1 x_2 + \beta_5 x_1 x_3$
c.
For AL, x2 = x3 = 0. The model would be:

$E(y) = \beta_0 + \beta_1 x_1 + \beta_2(0) + \beta_3(0) + \beta_4 x_1(0) + \beta_5 x_1(0) = \beta_0 + \beta_1 x_1$
The slope of the line is $\beta_1$.
For TDS-3A, x2 = 1 and x3 = 0. The model would be:
$E(y) = \beta_0 + \beta_1 x_1 + \beta_2(1) + \beta_3(0) + \beta_4 x_1(1) + \beta_5 x_1(0) = (\beta_0 + \beta_2) + (\beta_1 + \beta_4)x_1$
For FE, x2 = 0 and x3 = 1. The model would be:
$E(y) = \beta_0 + \beta_1 x_1 + \beta_2(0) + \beta_3(1) + \beta_4 x_1(0) + \beta_5 x_1(1) = (\beta_0 + \beta_3) + (\beta_1 + \beta_5)x_1$
d.
To test for the presence of temperature-waste type interaction, we would fit the complete model listed in part b and the reduced model found in part a. The hypotheses would be:
H0: $\beta_4 = \beta_5 = 0$
Ha: At least one $\beta_i \neq 0$, for i = 4, 5
The test statistic would be $F = \dfrac{(SSE_R - SSE_C)/(k - g)}{SSE_C/[n - (k + 1)]}$, where k = 5, g = 3, $SSE_R$ is the SSE for the reduced model, and $SSE_C$ is the SSE for the complete model.
11.102 a.
To determine whether the rate of increase of emotional distress with experience is different for the
two groups, we test:
H0: $\beta_4 = \beta_5 = 0$
b.
To determine whether there are differences in mean emotional distress levels that are attributable to exposure group, we test:
H0: $\beta_3 = \beta_4 = \beta_5 = 0$
c.
To determine whether there are differences in mean emotional distress levels that are attributable to exposure group, we test:
H0: $\beta_3 = \beta_4 = \beta_5 = 0$
The test statistic is
$F = \dfrac{(SSE_R - SSE_C)/(k - g)}{SSE_C/[n - (k + 1)]} = \dfrac{(795.23 - 783.9)/(5 - 2)}{783.9/[200 - (5 + 1)]} = .93$
The rejection region requires $\alpha = .05$ in the upper tail of the F distribution with $\nu_1 = k - g = 5 - 2 = 3$ and $\nu_2 = n - (k + 1) = 200 - (5 + 1) = 194$. From Table VIII, Appendix B, $F_{.05} \approx 2.60$. The rejection region is F > 2.60.
Since the observed value of the test statistic does not fall in the rejection region ($F = .93 \not> 2.60$), H0 is not rejected. There is insufficient evidence to indicate that there are differences in mean emotional distress levels that are attributable to exposure group at $\alpha = .05$.
11.103 a.
Using MINITAB, the output from fitting a complete second-order model is:
* NOTE *   X1 is highly correlated with other predictor variables
* NOTE *   X2 is highly correlated with other predictor variables
* NOTE * X1X2 is highly correlated with other predictor variables

Y = 172788 - 10739 X1 - 499 X2 - 20.2 X1X2 + 198 X1SQ + 14.7 X2SQ

Predictor      Coef   Stdev   t-ratio       p
Constant     172788   97785      1.77   0.084
X1           -10739    2789     -3.85   0.000
X2             -499    1444     -0.35   0.731
X1X2         -20.20   21.36     -0.95   0.350
X1SQ         197.57   22.60      8.74   0.000
X2SQ         14.678   8.819      1.66   0.103

s = 13132    R-sq = 95.9%    R-sq(adj) = 95.5%

SOURCE       DF            SS            MS        F       p
Regression    5   1.70956E+11   34191134720   198.27   0.000
Error        42    7242915328     172450368
Total        47   1.78199E+11

SOURCE   DF        SEQ SS
X1        1   1.56067E+11
X2        1      13214024
X1X2      1    1686339840
X1SQ      1   12711371776
X2SQ      1     477704384

Unusual Observations
Obs.     X1        Y      Fit   Stdev.Fit   Residual   St.Resid
 14    62.9   203288   235455        6002     -32167     -2.75R
 22    45.4    27105    58567        3603     -31462     -2.49R
 34    28.2    28722    15156       11311      13566      2.03RX
 43    64.3   230329   248054        8790     -17725     -1.82 X
 47    63.9   212309   240469        4904     -28160     -2.31R

X denotes an obs. whose X value gives it large influence.
b.
To test the hypothesis H0: $\beta_4 = \beta_5 = 0$, we must fit the reduced model

$E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2$
Using MINITAB, the output from fitting the reduced model is:

* NOTE * X1X2 is highly correlated with other predictor variables

Y = - 476768 + 11458 X1 + 3404 X2 - 64.4 X1X2

Predictor       Coef    Stdev   t-ratio       p
Constant     -476768   100852     -4.73   0.000
X1             11458     1874      6.11   0.000
X2              3404     1814      1.88   0.067
X1X2          -64.35    33.77     -1.91   0.063

s = 21549    R-sq = 88.5%    R-sq(adj) = 87.8%

SOURCE       DF            SS            MS        F       p
Regression    3   1.57767E+11   52588867584   113.25   0.000
Error        44   20431990784     464363424
Total        47   1.78199E+11

SOURCE   DF        SEQ SS
X1        1   1.56067E+11
X2        1      13214024
X1X2      1    1686339840

Unusual Observations
Obs.     X1        Y      Fit   Stdev.Fit   Residual   St.Resid
 34    28.2    28722   -59713       11922      88435      4.93RX
 38    66.5   290411   250350        9553      40061      2.07R
 43    64.3   230329   202899       11574      27430      1.51 X

The test is:

H0: $\beta_4 = \beta_5 = 0$
Ha: At least one $\beta_i \neq 0$, for i = 4, 5

The test statistic is
$F = \dfrac{(SSE_R - SSE_C)/(k - g)}{SSE_C/[n - (k + 1)]} = \dfrac{(20,431,990,784 - 7,242,915,328)/(5 - 3)}{7,242,915,328/[48 - (5 + 1)]} = 38.24$
The rejection region requires $\alpha = .05$ in the upper tail of the F-distribution with $\nu_1 = k - g = 5 - 3 = 2$ and $\nu_2 = n - (k + 1) = 48 - (5 + 1) = 42$. The rejection region is F > 3.23.
Since the observed value of the test statistic falls in the rejection region (F = 38.24 > 3.23), H0 is rejected. There is sufficient evidence to indicate that at least one of the quadratic terms contributes to the prediction of monthly collision claims at $\alpha = .05$.
c.
From part b, we know at least one of the quadratic terms is significant. From part a, it appears that
none of the terms involving x2 may be significant.
Thus, we will fit the model with just $x_1$ and $x_1^2$. The MINITAB output is:

Y = 185160 - 11580 X1 + 196 X1SQ

Predictor      Coef   Stdev   t-ratio       p
Constant     185160   54791      3.38   0.002
X1           -11580    2182     -5.31   0.000
X1SQ         195.54   21.64      9.04   0.000

s = 13219    R-sq = 95.6%    R-sq(adj) = 95.4%

SOURCE       DF            SS            MS        F       p
Regression    2   1.70335E+11   85167357952   487.36   0.000
Error        45    7863868416     174752624
Total        47   1.78199E+11

SOURCE   DF        SEQ SS
X1        1   1.56067E+11
X1SQ      1   14267676672

Unusual Observations
Obs.     X1        Y      Fit   Stdev.Fit   Residual   St.Resid
 10    35.8    28957    21200        5825       7757      0.65 X
 14    62.9   203288   230397        4044     -27109     -2.15R
 22    45.4    27105    62456        2856     -35351     -2.74R
 34    28.2    28722    14099       11344      14623      2.15RX
 38    66.5   290411   279798        6189      10613      0.91 X
 47    63.9   212309   243611        4570     -31302     -2.52R

To see if any of the terms involving x2 are significant, we test:

H0: $\beta_2 = \beta_3 = \beta_5 = 0$
Ha: At least one $\beta_i \neq 0$, for i = 2, 3, 5

The test statistic is
$F = \dfrac{(SSE_R - SSE_C)/(k - g)}{SSE_C/[n - (k + 1)]} = \dfrac{(7,863,868,416 - 7,242,915,328)/(5 - 2)}{7,242,915,328/[48 - (5 + 1)]} = 1.20$
The rejection region requires $\alpha = .05$ in the upper tail of the F-distribution with $\nu_1 = k - g = 5 - 2 = 3$ and $\nu_2 = n - (k + 1) = 48 - (5 + 1) = 42$. The rejection region is F > 2.84.
Since the observed value of the test statistic does not fall in the rejection region ($F = 1.20 \not> 2.84$), H0 is not rejected. There is insufficient evidence to indicate that any of the terms involving x2 contribute to the model at $\alpha = .05$.
Thus, it appears that the best model is $E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_1^2$. The model does not support the analyst's claim. In the model above, the estimate for $\beta_2$ is positive. This would indicate that the higher claims are for both the young and the old. Also, there is no evidence to support the claim that there are more claims when the temperature goes down.
11.104 a.
The best one-variable predictor of y is the one whose t statistic has the largest absolute value. The t
statistics for each of the variables are:
Independent
Variable      t statistic
x1            t = 1.6/.42 = 3.81
x2            t = .9/.01 = 90
x3            t = 3.4/1.14 = 2.98
x4            t = 2.5/2.06 = 1.21
x5            t = 4.4/.73 = 6.03
x6            t = .3/.35 = .86
The variable x2 is the best one-variable predictor of y. The absolute value of the corresponding t
score is 90. This is larger than any of the others.
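The screening rule in part a (divide each estimate by its standard error and keep the largest absolute t) can be sketched in a few lines. The estimates and standard errors below are the magnitudes printed in the table above; any signs lost in the source are not reconstructed, and the dictionary layout is an assumption made for illustration.

```python
# (estimate, standard error) for each one-variable model, as printed above
estimates = {
    "x1": (1.6, 0.42),
    "x2": (0.9, 0.01),
    "x3": (3.4, 1.14),
    "x4": (2.5, 2.06),
    "x5": (4.4, 0.73),
    "x6": (0.3, 0.35),
}

# t statistic for each variable: estimate divided by its standard error
t_stats = {name: b / se for name, (b, se) in estimates.items()}

# The best one-variable predictor has the largest |t|
best = max(t_stats, key=lambda name: abs(t_stats[name]))

for name, t in t_stats.items():
    print(f"{name}: t = {t:.2f}")
print("best one-variable predictor:", best)
```

Note that x2 wins because of its very small standard error, not because of a large estimate; x5 has the largest estimate but only the second-largest t.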
b.
Yes. In the stepwise procedure, the first variable entered is the one which has the largest absolute
value of t, provided the absolute value of the t falls in the rejection region.
c.
Once x2 is entered, the next variable that is entered is the one that, in conjunction with x2, has the
largest absolute t value associated with it.
11.105 a.
In Step 1, all one-variable models are fit to the data. These models are of the form:
$E(y) = \beta_0 + \beta_1 x_i$
Since there are 7 independent variables, 7 models are fit. (Note: There are actually only 6 independent variables. One of the qualitative variables has three levels and thus two dummy variables. Some statistical packages will allow one to bunch these two variables together so that they are either both in or both out. In this answer, we are assuming that each $x_i$ stands by itself.)
b.
In Step 2, all two-variable models are fit to the data, where the variable selected in Step 1, say x1, is one of the variables. These models are of the form:
$E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_i$
Since there are 6 independent variables remaining, 6 models are fit.

c.
In Step 3, all three-variable models are fit to the data, where the variables selected in Step 2, say x1 and x2, are two of the variables. These models are of the form:
$E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_i$
Since there are 5 independent variables remaining, 5 models are fit.
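The counts in parts a through c follow a simple pattern: each step fits one model per variable not yet selected. A minimal sketch of that count, assuming exactly one variable is added at every step and none is removed (the function name `models_fit_per_step` is hypothetical):

```python
def models_fit_per_step(num_candidates: int, num_steps: int) -> list[int]:
    """Number of models fit at each step of forward stepwise selection.

    Assumes one variable enters per step and no variable is dropped,
    so step s (counting from 1) fits num_candidates - (s - 1) models.
    """
    return [num_candidates - step for step in range(num_steps)]


# Seven candidate terms, first three steps
counts = models_fit_per_step(7, 3)
print(counts)
```

Summing the list gives the total number of fits, and hence roughly the number of t-tests carried out, which is one of the reasons the final stepwise model is treated with caution.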
d.
The procedure stops adding independent variables when none of the remaining variables, when added to the model, have a p-value less than some predetermined value. This predetermined value is usually $\alpha = .05$.
e.
Two major drawbacks to using the final stepwise model as the "best" model are:
(1) An extremely large number of single parameter t-tests have been conducted. Thus, the probability is very high that one or more errors have been made in including or excluding variables.
(2) Often the variables selected to be included in a stepwise regression do not include the high-order terms. Consequently, we may have initially omitted several important terms from the model.
11.106 a.
In the first step, there are 8 one-variable models fit to the data.
b.
The best one-variable model is the model that contains the one variable with the largest absolute
value of the t-statistic. This would also correspond to the one variable with the smallest p-value.
c.
In step 2, there would be 7 two-variable models fit to the data.
d.
$\hat{\beta}_1 = -.28$. The mean relative error for developers is estimated to be .28 lower than the mean relative error for project leaders, holding previous accuracy constant.
$\hat{\beta}_8 = .27$. The mean relative error for previous accuracy more than 20% is estimated to be .27 higher than the mean relative error for previous accuracy less than 20%, holding company role of estimator constant.
e.
There are a couple of reasons for being wary of using this model as the final model. First, in stepwise regression, once a variable is in the model, it cannot be dropped. The best one-variable model might contain x1, but the best model may contain the variables x2 and x3. By including x1 in the model, we may never get to the best model. Another reason to be wary is that we have not considered any 2nd order terms in the model or any interactions. These higher order terms might be very important in the model.
11.107 a.
In step 1, all one-variable models are fit. Thus, there are a total of 11 models fit.
b.
In step 2, all two-variable models are fit, where one of the variables is the best one selected in step 1. Thus, a total of 10 two-variable models are fit.
c.
In the 11th step, only one model is fit: the model containing all the independent variables.
d.
The model would be:

$E(y) = \beta_0 + \beta_1 x_{11} + \beta_2 x_4 + \beta_3 x_2 + \beta_4 x_7 + \beta_5 x_{10} + \beta_6 x_1 + \beta_7 x_9 + \beta_8 x_3$
e.
67.7% of the total sample variability of overall satisfaction is explained by the model containing the
independent variables safety on bus, seat availability, dependability, travel time, convenience of
route, safety at bus stops, hours of service, and frequency of service.
f.
Using stepwise regression does not guarantee that the best model will be found. There may be better
combinations of the independent variables that are never found, because of the order in which the
independent variables are entered into the model. In addition, there are no squared or interaction
terms included. There is a high probability of making at least one Type 1 error.
11.108 a.
From the printout, the three variables that should be included in the model are: ST-DEPTH,
TGRSWT, and TI. They are all entered into the model using stepwise regression and all are retained.
b.
No. There may be other independent variables that were not included.
c.
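The model is $E(y) = \beta_0 + \beta_1 x_4 + \beta_2 x_5 + \beta_3 x_6 + \beta_4 x_4 x_5 + \beta_5 x_4 x_6 + \beta_6 x_5 x_6$
d.
He would test
H0: $\beta_4 = \beta_5 = \beta_6 = 0$ versus Ha: At least one $\beta_i \neq 0$, for i = 4, 5, 6
He would fit the first-order model and record $SSE_R$. He would then fit the model with the interaction terms and record $SSE_C$. The test statistic would be
$F = \dfrac{(SSE_R - SSE_C)/(k - g)}{SSE_C/[n - (k + 1)]}$
e.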
To improve the model, the marine biologist could try to find other independent variables that affect y,
the log of the number of marine animals present, or higher order terms of the already identified
independent variables.
11.109 Yes. x2 and x4 are highly correlated (.93), as well as x4 and x5 (.86). When highly correlated independent
variables are present in a regression model, the results can be confusing. The researcher may want to
include only one of the variables.
11.110 a.
The plot of the residuals reveals a nonrandom pattern. The residuals exhibit a curved shape. Such a
pattern usually indicates that curvature needs to be added to the model.
b.
The plot of the residuals reveals a nonrandom pattern. The residuals versus the predicted values shows a pattern where the range in values of the residuals increases as $\hat{y}$ increases. This indicates that the variance of the random error, $\varepsilon$, becomes larger as the estimate of E(y) increases in value. Since E(y) depends on the x-values in the model, this implies that the variance of $\varepsilon$ is not constant for all settings of the x's.
c.
This plot reveals an outlier, since all or almost all of the residuals should fall within 3 standard
deviations of their mean of 0.
d.
This frequency distribution of the residuals is skewed to the right. This may be due to outliers or
could indicate the need for a transformation of the dependent variable.
11.111 a.
Since the absolute value of the correlation coefficient is .983, this would imply there is a very high
potential for multicollinearity.
b.
Since the absolute value of the correlation coefficient is .074, this would imply there is a very low potential for multicollinearity.
c.
Since the absolute value of the correlation coefficient is .722, this would imply there is a moderate potential for multicollinearity.
d.
Since the absolute value of the correlation coefficient is .528, this would imply there is a moderate potential for multicollinearity.
11.112 a.
Since all the pairwise correlations are .45 or less in absolute value, there is little evidence of extreme multicollinearity.
b.
No. The overall model test is significant (p < .001). This implies that at least one variable contributes to the prediction of the urban/rural rating. Looking at the individual t-tests, there are several that are significant, namely x1, x3, and x5. There is no evidence that multicollinearity is present.
11.113 It is possible that company role of estimator and previous accuracy are correlated with each other.
This indicates that multicollinearity may be present.
11.114 First, we need to compute the value of the residual:
Residual $= y - \hat{y} = 87 - 29.63 = 57.37$

We are given that the standard deviation is s = 24.68. Thus, an observation with a residual of 57.37 is
57.37 / 24.68 = 2.32 standard deviations from the fitted regression line. Since this is less than 3 standard
deviations from the regression line, this point is not considered an outlier.
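The outlier check above can be sketched in a few lines of Python. This is only an illustration of the arithmetic in this solution; the function name `residual_in_sd_units` is a hypothetical helper, and the 3-standard-deviation cutoff is the rule of thumb used in the text.

```python
def residual_in_sd_units(y_obs, y_hat, s):
    """Return the residual and its size in units of the model standard deviation s."""
    residual = y_obs - y_hat
    return residual, residual / s


# Values taken from the solution: y = 87, predicted value = 29.63, s = 24.68
residual, z = residual_in_sd_units(87, 29.63, 24.68)

# Rule of thumb from the text: flag a point more than 3 standard deviations away
is_outlier = abs(z) > 3

print(round(residual, 2), round(z, 2), is_outlier)  # 57.37 2.32 False
```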
11.115 a.
The normal probability plot should be used to check for normal errors. The points in this plot are
fairly close to the straight line, so the assumption of normality appears to be satisfied.
b.
The graph of the residuals versus the fitted or predicted values should be used to check for unequal
variances. The spread of the residuals appears to be fairly constant in this graph. It appears that the
assumption of equal variances is satisfied.
11.116 a.
Regression Analysis: Food versus Income, Size

Food = 2.79 - 0.00016 Income + 0.383 Size

Predictor       Coef   SE Coef      T      P
Constant      2.7944    0.4363   6.40  0.000
Income     -0.000164  0.006564  -0.02  0.980
Size         0.38348   0.07189   5.33  0.000

S = 0.7188   R-Sq = 55.8%   R-Sq(adj) = 52.0%

Source          DF       SS      MS      F      P
Regression       2  15.0027  7.5013  14.52  0.000
Residual Error  23  11.8839  0.5167
Total           25  26.8865

Source  DF   Seq SS
Income   1   0.2989
Size     1  14.7037

Correlations: Income, Size

Pearson correlation of Income and Size = 0.137
P-Value = 0.506
No; Income and household size do not seem to be highly correlated. The correlation coefficient
between income and household size is .137.
b.
Using MINITAB, the residual plots are:
[Figure: Histogram of the Residuals (response is Food)]
[Figure: Residuals Versus the Fitted Values (response is Food)]
[Figure: Residuals Versus Income (response is Food)]
[Figure: Residuals Versus Size (response is Food)]
Yes. The plots of the residuals versus income and the residuals versus household size exhibit a curved
shape. Such a pattern could indicate that a second-order model may be more appropriate.
c.
No. The plot of the residuals versus the predicted values reveals varying spreads for different values
of $\hat{y}$. This implies that the variance of $\varepsilon$ is not constant for all settings of the x's.
d.
Yes; The outlier shows up in several plots and is the 26th household (Food consumption = $7500,
income = $7300 and household size = 5).
e.
No; The frequency distribution of the residuals shows that the outlier skews the frequency
distribution to the right.
11.117 Using MINITAB, the residual plots are:
[Figure: Residual Plots for ARSENIC. Panels include the Normal Probability Plot of the Residuals and Residuals Versus the Order of the Data; axes are labeled Standardized Residual, Percent, Fitted Value, Frequency, and Observation Order.]
[Figure: Scatterplot of SRES1 vs LATITUDE, LONGITUDE, DEPTH-FT]
a.
From the histogram of the standardized residuals, it appears that the mean of the residuals is close to
0. Thus, the assumption that the mean error is 0 appears to be met.
b.
From the plot of the standardized residuals versus the fitted values, it appears that the spread of the
residuals increases as the fitted values increase. Thus, it appears that the assumption of constant
variance is violated.
c.
From the plots of the standardized residuals versus the fitted values, it appears that there are some
outliers. There are several observations with standardized residuals of 4 or more.
d.
From the normal probability plot, the data do not form a straight line. Thus, it appears that the
assumption of normal error terms is violated.
e.
Using MINITAB, the correlations among the independent variables are:

Correlations: LATITUDE, LONGITUDE, DEPTH-FT

            LATITUDE  LONGITUDE
LONGITUDE      0.311
               0.000
DEPTH-FT       0.151     -0.328
               0.006      0.000

Cell Contents: Pearson correlation
               P-Value
None of the pairwise correlations are large in absolute value, so there is no evidence of
multicollinearity. In addition, the global test indicates that at least one of the independent variables is
significant and each of the independent variables is statistically significant. This also indicates that
multicollinearity does not exist.
11.118 Using MINITAB, the residual plots are:
[Figure: Residual Plots for DDT. Axes are labeled Percent, Frequency, Fitted Value, and Observation Order.]
[Figure: Residuals Versus WEIGHT (response is DDT)]
[Figure: Residuals Versus LENGTH (response is DDT)]
[Figure: Residuals Versus MILE (response is DDT)]
From the normal probability plot, the points do not fall on a straight line, indicating the residuals are not
normal. The histogram of the residuals indicates the residuals are skewed to the right, which also indicates
that the residuals are not normal. The plot of the residuals versus yhat indicates that there is at least one
outlier and the variance is not constant. One observation has a standardized residual of more than 10 and
several others have standardized residuals greater than 3. This is also evident in the plots of the residuals
versus each of the independent variables. Since the assumptions of normality and constant variance appear
to be violated, we could consider transforming the data. We should also check the outlying observations to
see if there are any errors connected with these observations.
11.119 a.

Regression Analysis: Time versus Temp

Time = 30856 - 192 Temp

Predictor     Coef  SE Coef       T      P
Constant     30856     2713   11.37  0.000
Temp       -191.57    18.49  -10.36  0.000

S = 1099.17   R-Sq = 84.3%   R-Sq(adj) = 83.5%

Source          DF         SS         MS       F      P
Regression       1  129663987  129663987  107.32  0.000
Residual Error  20   24163399    1208170
Total           21  153827386

The fitted regression line is $\hat{y} = 30{,}856 - 191.57(\text{Temp})$

b.
For temperature = 149, $\hat{y} = 30{,}856 - 191.57(149) = 2{,}312.07$. There are 2 observations with a
temperature of 149. The residuals for the microchips manufactured at a temperature of 149°C are
$r = y - \hat{y} = 1{,}100 - 2{,}312.07 = -1{,}212.07$ and $r = y - \hat{y} = 1{,}150 - 2{,}312.07 = -1{,}162.07$.
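As a quick check of the arithmetic above, the fitted value and both residuals can be recomputed from the printed coefficients. This is a minimal sketch; the function name `fitted_time` is a hypothetical helper, and the coefficients are the rounded values from the MINITAB printout.

```python
def fitted_time(temp, b0=30856, b1=-191.57):
    """Predicted time from the fitted line: y-hat = b0 + b1 * temp."""
    return b0 + b1 * temp


y_hat = fitted_time(149)

# Observed times for the two microchips manufactured at 149 degrees
residuals = [y - y_hat for y in (1100, 1150)]

print(round(y_hat, 2))                   # 2312.07
print([round(r, 2) for r in residuals])  # [-1212.07, -1162.07]
```

Both residuals are negative, so the straight-line model overpredicts the time at this temperature.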
c.
Using MINITAB, the plot of the residuals versus temperature is:
[Figure: Scatterplot of RESI1 vs Temp]
There appears to be a U-shaped trend to the data.

d.
Yes. Because there appears to be a U-shaped trend to the data, this indicates that there is a
curvilinear relationship between temperature and time.
11.120 Using MINITAB, the results of the regression are:

Regression Analysis: HEATRATE versus RPM, CPRATIO, RPM*CPR

HEATRATE = 12065 + 0.170 RPM - 146 CPRATIO - 0.00242 RPM*CPR

Predictor       Coef   SE Coef      T      P
Constant     12065.5     418.5  28.83  0.000
RPM          0.16969   0.03467   4.89  0.000
CPRATIO      -146.07     26.66  -5.48  0.000
RPM*CPR    -0.002425  0.003120  -0.78  0.440

S = 633.842   R-Sq = 84.9%   R-Sq(adj) = 84.2%

Source          DF         SS        MS       F      P
Regression       3  142586570  47528857  118.30  0.000
Residual Error  63   25310639    401756
Total           66  167897208

Source   DF     Seq SS
RPM       1  119598530
CPRATIO   1   22745478
RPM*CPR   1     242561

Obs    RPM  HEATRATE      Fit  SE Fit  Residual  St Resid
 11  18000   14628.0  12710.6   165.1    1917.4    3.13R
 28  22516   14796.0  14561.9   277.9     234.1    0.41 X
 36   4473   13523.0  11428.0   171.5    2095.0    3.43R
 61  33000   16243.0  16105.3   410.2     137.7    0.28 X
 62  30000   14628.0  15296.4   288.7    -668.4   -1.18 X
 64   3600    8714.0   7258.6   427.1    1455.4    3.11RX

The residual plots are:
[Figure: Residual Plots for HEATRATE. Axes are labeled Percent, Frequency, Fitted Value, and Observation Order.]
[Figure: Residuals Versus CPRATIO (response is HEATRATE)]
[Figure: Residuals Versus RPM (response is HEATRATE)]
From the normal probability plot, the points do not fall on a straight line, indicating the residuals are
not normal. The histogram of the residuals indicates the residuals are skewed to the right, which also
indicates that the residuals are not normal. The plot of the residuals versus yhat indicates that there
are potentially 3 outliers with standardized residuals of 3 or more. The variance appears to be
constant. On the graph of the residuals versus RPM, the spread of the residuals appears to decrease
as the value of RPM increases. This indicates the variance may not be constant for RPMs. Since the
assumptions of normality and constant variance appear to be violated, we could consider
transforming the data. We should also check the outlying observations to see if there are any errors
connected with these observations.
11.121 In multiple regression, as in simple regression, the confidence interval for the mean value of y is narrower
than the prediction interval of a particular value of y.
11.122 The error of prediction is smallest when the values of x1, x2, and x3 are equal to their sample means. The
further x1, x2, and x3 are from their means, the larger the error. When x1 = 60, x2 = .4, and x3 = 900, the
observed values are outside the observed ranges of the x values. When x1 = 30, x2 = .6, and x3 = 1300, the
observed values are within the observed ranges and consequently the x values are closer to their means.
Thus, when x1 = 30, x2 = .6, and x3 = 1300, the error of prediction is smaller.
11.123 The model-building step is the key to the success or failure of a regression analysis. If the model is a good
model, we will have a good predictive model for the dependent variable y. If the model is not a good
model, the predictive ability will not be of much use.
11.124 a.
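To determine if at least one of the parameters is not zero, we test:

H0: $\beta_1 = \beta_2 = \beta_3 = \beta_4 = 0$
Ha: At least one $\beta_i \neq 0$
The test statistic is $F = \dfrac{R^2/k}{(1 - R^2)/[n - (k + 1)]} = \dfrac{.83/4}{(1 - .83)/[25 - (4 + 1)]} = 24.41$
The rejection region requires $\alpha = .05$ in the upper tail of the F distribution with numerator df = k = 4
and denominator df = $n - (k + 1) = 25 - (4 + 1) = 20$. From Table VIII, Appendix B, $F_{.05} = 2.87$.
The rejection region is F > 2.87.
Since the observed value of the test statistic falls in the rejection region (F = 24.41 > 2.87), H0 is
rejected. There is sufficient evidence to indicate at least one of the parameters is nonzero at $\alpha = .05$.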
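The global F statistic in part a depends only on $R^2$, n, and k, so it is easy to verify numerically. The sketch below is illustrative; the function name `global_f_from_r2` is a hypothetical helper, not something from the text.

```python
def global_f_from_r2(r2, n, k):
    """Global F statistic for a model with k predictors fit to n observations."""
    numerator = r2 / k                       # explained variation per model df
    denominator = (1 - r2) / (n - (k + 1))   # unexplained variation per error df
    return numerator / denominator


f_stat = global_f_from_r2(r2=0.83, n=25, k=4)
print(round(f_stat, 2))  # 24.41
```

The computed value is then compared with the tabled critical value (2.87 here) to decide whether to reject H0.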
b.
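H0: $\beta_1 = 0$
Ha: $\beta_1 < 0$
The test statistic is $t = \dfrac{\hat{\beta}_1 - 0}{s_{\hat{\beta}_1}} = \dfrac{-2.43 - 0}{1.21} = -2.01$
The rejection region requires $\alpha = .05$ in the lower tail of the t distribution with df = $n - (k + 1) = 25 - (4 + 1) = 20$.
From Table V, Appendix B, $t_{.05} = 1.725$. The rejection region is t < -1.725.
Since the observed value of the test statistic falls in the rejection region (t = -2.01 < -1.725), H0 is
rejected. There is sufficient evidence to indicate $\beta_1$ is less than 0 at $\alpha = .05$.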
c.
H0: $\beta_2 = 0$
Ha: $\beta_2 > 0$
The test statistic is $t = \dfrac{\hat{\beta}_2 - 0}{s_{\hat{\beta}_2}} = \dfrac{.05 - 0}{.16} = .31$
The rejection region requires $\alpha = .05$ in the upper tail of the t distribution. From part b above, the
rejection region is t > 1.725.
Since the observed value of the test statistic does not fall in the rejection region ($t = .31 \not> 1.725$), H0
is not rejected. There is insufficient evidence to indicate $\beta_2$ is greater than 0 at $\alpha = .05$.

d.
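H0: $\beta_3 = 0$
Ha: $\beta_3 \neq 0$
The test statistic is $t = \dfrac{\hat{\beta}_3 - 0}{s_{\hat{\beta}_3}} = \dfrac{.62 - 0}{.26} = 2.38$
The rejection region requires $\alpha/2 = .05/2 = .025$ in each tail of the t distribution with df = 20. From
Table V, Appendix B, $t_{.025} = 2.086$. The rejection region is t < -2.086 or t > 2.086.
Since the observed value of the test statistic falls in the rejection region (t = 2.38 > 2.086), H0 is
rejected. There is sufficient evidence to indicate $\beta_3$ is different from 0 at $\alpha = .05$.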
11.125 a.
The least squares equation is $\hat{y} = 90.1 - 1.836x_1 + .285x_2$
b.
$R^2 = .916$. About 91.6% of the sample variability in the y's is explained by the model
$E(y) = \beta_0 + \beta_1x_1 + \beta_2x_2$
c.
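To determine if the model is useful for predicting y, we test:

H0: $\beta_1 = \beta_2 = 0$
Ha: At least one $\beta_i \neq 0$
The test statistic is $F = \dfrac{MSR}{MSE} = \dfrac{7400}{114} = 64.91$
The rejection region is F > 3.89.
Since the observed value of the test statistic falls in the rejection region (F = 64.91 > 3.89), H0 is
rejected. There is sufficient evidence to indicate the model is useful for predicting y at $\alpha = .05$.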
d.
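H0: $\beta_1 = 0$
Ha: $\beta_1 \neq 0$
The test statistic is $t = \dfrac{\hat{\beta}_1}{s_{\hat{\beta}_1}} = \dfrac{-1.836}{.367} = -5.01$
The rejection region is t < -2.179 or t > 2.179.
Since the observed value of the test statistic falls in the rejection region (t = -5.01 < -2.179), H0 is
rejected. There is sufficient evidence to indicate $\beta_1$ is not 0 at $\alpha = .05$.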
e.
The standard deviation is $s = \sqrt{MSE} = \sqrt{114} = 10.68$. We would expect about 95% of the
observations to fall within 2(10.68) = 21.36 units of the fitted regression line.
11.126 From the plot of the residuals for the straight line model, there appears to be a mound shape which implies
the quadratic model should be used.
11.127 $E(y) = \beta_0 + \beta_1x_1 + \beta_2x_2 + \beta_3x_3$
where $x_1 = 1$ if level 2, 0 otherwise
$x_2 = 1$ if level 3, 0 otherwise
$x_3 = 1$ if level 4, 0 otherwise
11.128 a.
$E(y) = \beta_0 + \beta_1x_1 + \beta_2x_2 + \beta_3x_3$
where $x_2 = 1$ if level 2, 0 otherwise
$x_3 = 1$ if level 3, 0 otherwise
b.
$E(y) = \beta_0 + \beta_1x_1 + \beta_2x_1^2 + \beta_3x_2 + \beta_4x_3 + \beta_5x_1x_2 + \beta_6x_1x_3 + \beta_7x_1^2x_2 + \beta_8x_1^2x_3$
where x1, x2, and x3 are as in part a.

11.129 The stepwise regression method is used to try to find the best model to describe a process. It is a screening
procedure that tries to select a small subset of independent variables from a large set of independent
variables that will adequately predict the dependent variable. This method is useful in that it can eliminate
some unimportant independent variables from consideration.
11.130 a.
$E(y) = \beta_0 + \beta_1x_1 + \beta_2x_2$
b.
$E(y) = \beta_0 + \beta_1x_1 + \beta_2x_1^2 + \beta_3x_2 + \beta_4x_2^2 + \beta_5x_1x_2$
11.131 Even though SSE = 0, we cannot estimate $\sigma^2$ because there are no degrees of freedom corresponding to
error. With three data points, there are only two degrees of freedom available. The degrees of freedom
corresponding to the model is k = 2 and the degrees of freedom corresponding to error is
$n - (k + 1) = 3 - (2 + 1) = 0$. Without an estimate for $\sigma^2$, no inferences can be made.

11.132 a.
H0: $\beta_4 = \beta_5 = 0$
Ha: At least one of $\beta_4$ and $\beta_5 \neq 0$
b.
The regression model
$E(y) = \beta_0 + \beta_1x_1 + \beta_2x_2 + \beta_3x_2^2 + \beta_4x_1x_2 + \beta_5x_1x_2^2$
is fit to the 35 data points, yielding a sum of squares for error, denoted $SSE_C$. The regression model
$E(y) = \beta_0 + \beta_1x_1 + \beta_2x_2 + \beta_3x_2^2$
is also fit to the data and its sum of squares for error is obtained, denoted $SSE_R$. Then the test statistic
is:
$F = \dfrac{(SSE_R - SSE_C)/(k - g)}{SSE_C/[n - (k + 1)]}$
where k = 5, g = 3, and n = 35.
c.
The numerator degrees of freedom is $k - g = 5 - 3 = 2$, and the denominator degrees of freedom is
$n - (k + 1) = 35 - (5 + 1) = 29$.
d.
The rejection region requires $\alpha = .05$ in the upper tail of the F distribution with numerator df = 2 and
denominator df = 29. From Table VIII, Appendix B, $F_{.05} = 3.33$. The rejection region is F > 3.33.
11.133 a.
A confidence interval for the difference of two population means, $(\mu_1 - \mu_2)$, could be used. Since
both sample sizes are over 30, the large sample confidence interval is used (with independent
samples).
b.
Let $x = 1$ if public college, 0 otherwise
The model is $E(y) = \beta_0 + \beta_1x$
c.
$\beta_1$ is the difference between the two population means. A point estimate for $\beta_1$ is $\hat{\beta}_1$. A confidence
interval for $\beta_1$ could be used to estimate the difference in the two population means.
11.134 a.
1. The "Quantitative GMAT score" is measured on a numerical scale, so it is a quantitative variable.
2. The "Verbal GMAT score" is measured on a numerical scale, so it is a quantitative variable.
3. The "Undergraduate GPA" is measured on a numerical scale, so it is a quantitative variable.
4. The "First-year graduate GPA" is measured on a numerical scale, so it is a quantitative variable.
5. The "Student cohort" has 3 categories, so it is a qualitative variable. Note that the numerical
scale is meaningless in this situation. (It is possible to consider this as a quantitative variable.
However, for this problem we will consider it as qualitative.)
b.
The quantitative variables GMAT score, verbal GMAT score, undergraduate GPA, and first-year
graduate GPA should all be positively correlated to final GPA.
c.
$x_5 = 1$ if student entered doctoral program in year 3, 0 otherwise
$x_6 = 1$ if student entered doctoral program in year 5, 0 otherwise
d.
$E(y) = \beta_0 + \beta_1x_1 + \beta_2x_2 + \beta_3x_3 + \beta_4x_4 + \beta_5x_5 + \beta_6x_6$
e.
$\beta_0$ = the y-intercept for students entering in year 1.

$\beta_1$ = the final GPA will increase by $\beta_1$ for each one-unit increase in GMAT score,
holding the remaining variables constant.
$\beta_2$ = the final GPA will increase by $\beta_2$ for each one-unit increase in verbal GMAT score,
holding the remaining variables constant.
$\beta_3$ = the final GPA will increase by $\beta_3$ for each one-point increase in undergraduate GPA,
holding the remaining variables constant.
$\beta_4$ = the final GPA will increase by $\beta_4$ for each one-point increase in first-year graduate GPA,
holding the remaining variables constant.
$\beta_5$ = difference in mean final GPA between student cohort year 2 and year 1.
$\beta_6$ = difference in mean final GPA between student cohort year 3 and year 1.
f.
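$E(y) = \beta_0 + \beta_1x_1 + \beta_2x_2 + \beta_3x_3 + \beta_4x_4 + \beta_5x_5 + \beta_6x_6 + \beta_7x_1x_5 + \beta_8x_1x_6$
$\quad + \beta_9x_2x_5 + \beta_{10}x_2x_6 + \beta_{11}x_3x_5 + \beta_{12}x_3x_6 + \beta_{13}x_4x_5 + \beta_{14}x_4x_6$
g.
For the year 1 cohort, $x_5 = x_6 = 0$. The model is:

$E(y) = \beta_0 + \beta_1x_1 + \beta_2x_2 + \beta_3x_3 + \beta_4x_4 + \beta_5(0) + \beta_6(0) + \beta_7x_1(0) + \beta_8x_1(0)$
$\quad + \beta_9x_2(0) + \beta_{10}x_2(0) + \beta_{11}x_3(0) + \beta_{12}x_3(0) + \beta_{13}x_4(0) + \beta_{14}x_4(0)$
$\quad = \beta_0 + \beta_1x_1 + \beta_2x_2 + \beta_3x_3 + \beta_4x_4$
The slopes for the four variables are $\beta_1$, $\beta_2$, $\beta_3$, and $\beta_4$, respectively.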
11.135 a.
The type of juice extractor is qualitative.
The size of the orange is quantitative.
b.
The model is $E(y) = \beta_0 + \beta_1x_1 + \beta_2x_2$
where
$x_1$ = diameter of orange
$x_2 = 1$ if Brand B, 0 if not
c.
To allow the lines to differ, the interaction term is added:

$E(y) = \beta_0 + \beta_1x_1 + \beta_2x_2 + \beta_3x_1x_2$
d.
For part b: [Figure: graph of the model in part b]
For part c: [Figure: graph of the model in part c]
e.
To determine whether the model in part c provides more information for predicting yield than does
the model in part b, we test:
H0: $\beta_3 = 0$
Ha: $\beta_3 \neq 0$
f.
The test statistic would be $F = \dfrac{(SSE_R - SSE_C)/(k - g)}{SSE_C/[n - (k + 1)]}$
To compute $SSE_R$: The model in part b is fit and $SSE_R$ is the sum of squares for error.
To compute $SSE_C$: The model in part c is fit and $SSE_C$ is the sum of squares for error.
$k - g$ = number of parameters in H0, which is 1
$n - (k + 1)$ = degrees of freedom for error in the complete model
11.136 a.
$R^2 = .31$. 31% of the total sample variation of the natural log of the level of CO2 emissions in 1996 is
explained by the model containing the 7 independent variables.
b.
H0: $\beta_1 = \beta_2 = \cdots = \beta_7 = 0$
Ha: At least one $\beta_i \neq 0$
The test statistic is $F = \dfrac{R^2/k}{(1 - R^2)/[n - (k + 1)]} = \dfrac{.31/7}{(1 - .31)/[66 - (7 + 1)]} = 3.72$
The rejection region requires $\alpha = .01$ in the upper tail of the F-distribution with $\nu_1 = k = 7$ and
$\nu_2 = n - (k + 1) = 66 - (7 + 1) = 58$. From Table VIII, Appendix B, $F_{.01} \approx 2.95$. The rejection region
is F > 2.95.
Since the observed value of the test statistic falls in the rejection region (F = 3.72 > 2.95), H0 is
rejected. There is sufficient evidence to indicate that at least one of the 7 independent variables is
useful in the prediction of natural log of the level of CO2 emissions in 1996 at $\alpha = .01$.
c.
To determine if foreign investments in 1980 is a useful predictor of CO2 emissions in 1996, we test:
H0: $\beta_1 = 0$
Ha: $\beta_1 \neq 0$
d.
The test statistic is t = 2.52 and the p-value is p < 0.05. Since the observed p-value is less than $\alpha$
(p < .05), H0 is rejected. There is sufficient evidence to indicate foreign investments in 1980 is a
useful predictor of CO2 emissions in 1996 at $\alpha = .05$.
11.137 Variables that are highly correlated with each other are x4 and x5 (r = -.84). When highly correlated
independent variables are present in a regression model, the results can be confusing. Possible problems
include:
1.
Global test indicates at least one independent variable is useful in the prediction of y, but none of the
individual tests for the independent variables is significant.
2.
The signs of the estimated beta coefficients are opposite from what is expected.
11.138 a.
The main effects model would be: $E(y) = \beta_0 + \beta_1x_1 + \beta_8x_8$
b.
$\hat{\beta}_1 = -.28$. The mean value for the relative error of the effort estimate for developers
is estimated to be .28 units below that of project leaders, holding previous accuracy constant.
$\hat{\beta}_8 = .27$. The mean value for the relative error of the effort estimate if previous accuracy is more
than 20% is estimated to be .27 units above that if previous accuracy is less than 20%, holding
company role of estimator constant.
c.
One possible reason for the sign of $\hat{\beta}_1$ being opposite from what is expected could be that company
role of estimator and previous accuracy could be correlated.
11.139 a.
$R^2 = .712$. 71.2% of the total sample variation in the fees charged by auditors is explained
by the model containing 7 independent variables.

b.
H0: $\beta_1 = \beta_2 = \beta_3 = \beta_4 = \beta_5 = \beta_6 = \beta_7 = 0$
Ha: At least one $\beta_i \neq 0$, i = 1, 2, 3, ..., 7
The test statistic is F = 111.1 (from table).
Since no $\alpha$ was given, we will use $\alpha = .05$. The rejection region requires $\alpha = .05$ in the upper tail of
the F-distribution with $\nu_1 = k = 7$ and $\nu_2 = n - (k + 1) = 268 - (7 + 1) = 260$. From Table VIII,
Appendix B, $F_{.05} \approx 2.01$. The rejection region is F > 2.01.
Since the observed value of the test statistic falls in the rejection region (F = 111.1 > 2.01), H0 is
rejected. There is sufficient evidence to indicate that the model is adequate for predicting the audit
fees at $\alpha = .05$.
c.
If new auditors charge less than incumbent auditors, then $\beta_1$ is negative. By definition, $x_1 = 1$ if new
auditor and 0 if incumbent. Therefore, we will be adding $\beta_1$ to the mean only for new auditors. If new
auditors charge less, we have to add a negative number.
11.140 a.
Let $x_1 = 1$ if no, 0 if yes
The model would be $E(y) = \beta_0 + \beta_1x_1$
In this model, $\beta_0$ is the mean job preference for those who responded yes to the question "Flextime
of the position applied for" and $\beta_1$ is the difference in the mean job preference between those who
responded no to the question and those who answered yes to the question.
b.
Let $x_1 = 1$ if referral, 0 if not
$x_2 = 1$ if on-premise, 0 if not
The model would be $E(y) = \beta_0 + \beta_1x_1 + \beta_2x_2$

In this model, $\beta_0$ is the mean job preference for those who responded none to level of day care
support required, $\beta_1$ is the difference in the mean job preference between those who responded
referral and those who responded none, and $\beta_2$ is the difference in the mean job preference
between those who responded on-premise and those who responded none.
c.
Let $x_1 = 1$ if counseling, 0 if not
$x_2 = 1$ if active search, 0 if not
The model would be $E(y) = \beta_0 + \beta_1x_1 + \beta_2x_2$

In this model, $\beta_0$ is the mean job preference for those who responded none to spousal transfer
support required, $\beta_1$ is the difference in the mean job preference between those who responded
counseling and those who responded none, and $\beta_2$ is the difference in the mean job preference
between those who responded active search and those who responded none.
d.
Let $x_1 = 1$ if not married, 0 if married
In this model, $\beta_0$ is the mean job preference for those who responded married to marital status and
$\beta_1$ is the difference in the mean job preference between those who responded not married and those
who answered married.
e.
Let $x_1 = 1$ if female, 0 if male
In this model, $\beta_0$ is the mean job preference for males and $\beta_1$ is the difference in the mean job
preference between females and males.
11.141 The correlation coefficient between Importance and Replace is .2682. This correlation coefficient is fairly
small and would not indicate a problem with multicollinearity between Importance and Replace. The
correlation coefficient between Importance and Support is .6991. This correlation coefficient is fairly large
and would indicate a potential problem with multicollinearity between Importance and Support. Probably
only one of these variables should be included in the regression model. The correlation coefficient
between Replace and Support is .0531. This correlation coefficient is very small and would not indicate a
problem with multicollinearity between Replace and Support. Thus, the model could probably include
Replace and one of the variables Support or Importance.
11.142 CEO income (x1) and stock percentage (x2) are said to interact if the effect of one variable, say CEO
income, on the dependent variable profit (y) depends on the level of the second variable, stock percentage.
11.143 a.
Let $x_2 = 1$ if intervention group, 0 otherwise
The first-order model would be:
$E(y) = \beta_0 + \beta_1x_1 + \beta_2x_2$

b.
For the control group, $x_2 = 0$. The first-order model is:
$E(y) = \beta_0 + \beta_1x_1 + \beta_2(0) = \beta_0 + \beta_1x_1$

For the intervention group, $x_2 = 1$. The first-order model is:
$E(y) = \beta_0 + \beta_1x_1 + \beta_2(1) = \beta_0 + \beta_1x_1 + \beta_2 = (\beta_0 + \beta_2) + \beta_1x_1$

In both models, the slope of the line is $\beta_1$.
c.
If pretest score and group interact, the first-order model would be:
$E(y) = \beta_0 + \beta_1x_1 + \beta_2x_2 + \beta_3x_1x_2$

d.
For the control group, $x_2 = 0$. The first-order model including the interaction is:
$E(y) = \beta_0 + \beta_1x_1 + \beta_2(0) + \beta_3x_1(0) = \beta_0 + \beta_1x_1$

For the intervention group, $x_2 = 1$. The first-order model including the interaction is:
$E(y) = \beta_0 + \beta_1x_1 + \beta_2(1) + \beta_3x_1(1) = \beta_0 + \beta_1x_1 + \beta_2 + \beta_3x_1$

$= (\beta_0 + \beta_2) + (\beta_1 + \beta_3)x_1$
The slope of the model for the control group is $\beta_1$. The slope of the model for the intervention group
is $\beta_1 + \beta_3$.
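To make the slope comparison in part d concrete, the sketch below evaluates the interaction model for each group. The coefficient values are hypothetical, chosen only for illustration (the exercise gives none), and the function names are likewise assumptions.

```python
def expected_y(x1, x2, b0, b1, b2, b3):
    """Interaction model: E(y) = b0 + b1*x1 + b2*x2 + b3*x1*x2."""
    return b0 + b1 * x1 + b2 * x2 + b3 * x1 * x2


def slope_in_x1(x2, b0, b1, b2, b3):
    """Change in E(y) for a one-unit increase in x1, with the group dummy x2 held fixed."""
    return expected_y(1, x2, b0, b1, b2, b3) - expected_y(0, x2, b0, b1, b2, b3)


# Hypothetical coefficients, not estimates from any data set in the text
coefs = dict(b0=10.0, b1=0.5, b2=3.0, b3=0.25)

print(slope_in_x1(0, **coefs))  # control group slope: b1 = 0.5
print(slope_in_x1(1, **coefs))  # intervention group slope: b1 + b3 = 0.75
```

Setting b3 to 0 makes the two slopes equal, which is the no-interaction model of part b.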
11.144 a.
The SAS output is:

DEP VARIABLE: Y
ANALYSIS OF VARIANCE

                     SUM OF         MEAN
SOURCE    DF        SQUARES       SQUARE    F VALUE    PROB>F
MODEL      3    25784705.01   8594901.67    241.758    0.0001
ERROR     16      568826.19  35551.63709
C TOTAL   19    26353531.20

ROOT MSE    188.5514    R-SQUARE    0.9784
DEP MEAN      3014.2    ADJ R-SQ    0.9744
C.V.        6.255438

PARAMETER ESTIMATES

                   PARAMETER       STANDARD    T FOR H0:
VARIABLE  DF        ESTIMATE          ERROR  PARAMETER=0  PROB > |T|
INTERCEP   1      1333.17830      290.99944        4.581      0.0003
X1         1     -0.15122302     0.37864583       -0.399      0.6949
X2         1     -2.62532461     5.34596285       -0.491      0.6300
X1X2       1      0.05195415    0.006863831        7.569      0.0001

The fitted model is $\hat{y} = 1333.18 - .151x_1 - 2.625x_2 + .052x_1x_2$
b.

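To determine if the overall model is useful, we test:
H0: $\beta_1 = \beta_2 = \beta_3 = 0$
Ha: At least one $\beta_i \neq 0$
The test statistic is $F = \dfrac{MSR}{MSE} = \dfrac{8{,}594{,}901.67}{35{,}551.637} = 241.758$
The rejection region requires $\alpha = .05$ in the upper tail of the F distribution with numerator df = k = 3
and denominator df = $n - (k + 1) = 20 - (3 + 1) = 16$. From Table VIII, Appendix B, $F_{.05} = 3.24$.
The rejection region is F > 3.24.
Since the observed value of the test statistic falls in the rejection region (F = 241.758 > 3.24), H0 is
rejected. There is sufficient evidence to indicate the model is useful at $\alpha = .05$.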
c.
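To determine if the interaction is present, we test:
H0: $\beta_3 = 0$
Ha: $\beta_3 \neq 0$
The test statistic is $t = \dfrac{\hat{\beta}_3 - 0}{s_{\hat{\beta}_3}} = 7.569$.
The rejection region requires $\alpha/2 = .025$ in each tail of the t distribution with df = 16. The rejection
region is t < -2.120 or t > 2.120.
Since the observed value of the test statistic falls in the rejection region (t = 7.569 > 2.120), H0 is
rejected. There is sufficient evidence to indicate the interaction between advertising expenditure and
shelf space is present at $\alpha = .05$.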
d.
Advertising expenditure and shelf space are said to interact if the effect of advertising expenditure on
sales is different at different levels of shelf space.
e.
If a first-order model was used, the effect of advertising expenditure on sales would be the same
regardless of the amount of shelf space. If interaction really exists, the effect of advertising
expenditure on sales would depend on which level of shelf space was present.
f.
Since the data collected are sequential, it is fairly unlikely that the error terms are independent.
11.145 a.
Not necessarily. If Nickel was highly correlated to several other variables, then it might be better to
keep Nickel and drop some of the other highly correlated variables.
b.
Using stepwise regression is a good start for selecting the best set of predictor variables. However,
one should use caution when looking at the model selected using stepwise regression. Sometimes
important variables are not selected to be entered into the model. Also, many t-tests have been run,
thus inflating the Type I and Type II error rates. One must also consider using higher order terms in
the model and interaction terms.
c.
No, further exploration should be used. One should consider using higher order terms for the
variables (i.e. squared terms) and also interaction terms.
11.146 a.
Using MINITAB, a scattergram of the data is:
[Figure: Scatterplot of Rate vs Time]
It appears that as the time increases, the rate decreases but at a decreasing rate.

b.

Regression Analysis: Rate versus Time, Tmsq

Rate = 1.01 - 1.17 Time + 0.290 Tmsq

Predictor     Coef  SE Coef      T      P
Constant   1.00705  0.07899  12.75  0.000
Time       -1.1671   0.1219  -9.57  0.000
Tmsq       0.28975  0.03937   7.36  0.000

S = 0.101142   R-Sq = 92.7%   R-Sq(adj) = 91.4%

Source          DF       SS       MS      F      P
Regression       2  1.54782  0.77391  75.65  0.000
Residual Error  12  0.12276  0.01023
Total           14  1.67057

Source  DF   Seq SS
Time     1  0.99365
Tmsq     1  0.55416

The least squares prediction equation is: $\hat{y} = 1.007 - 1.1671x + .2898x^2$

c.
To determine if there is an upward curvature in the relationship between surface production rate and
time after turnoff, we test:
H0: $\beta_2 = 0$
Ha: $\beta_2 > 0$
From the printout, the test statistic is t = 7.36 and the p-value is p = 0.000/2 = 0.000. Since the p-value
is less than $\alpha$ (p = 0.000 < .05), H0 is rejected. There is sufficient evidence to indicate there is an
upward curvature in the relationship between surface production rate and time after turnoff at $\alpha = .05$.
11.147 a.
Using MINITAB, the scattergram is:
[Figure: scattergram of the data]
b.
Let $x_2 = 1$ if I-35W, 0 if not
The complete second-order model would be
$E(y) = \beta_0 + \beta_1x_1 + \beta_2x_1^2 + \beta_3x_2 + \beta_4x_1x_2 + \beta_5x_1^2x_2$
c.

Regression Analysis

y = 776 + 0.104 x1 - 0.000002 x1sq + 232 x2 - 0.0091 x1x2 + 0.000000 x1sqx2

Predictor         Coef       StDev      T      P
Constant         776.4       144.5   5.37  0.000
x1             0.10418     0.01388   7.50  0.000
x1sq       -0.00000223  0.00000033  -6.73  0.000
x2                 232        1094   0.21  0.833
x1x2          -0.00914     0.09829  -0.09  0.926
x1sqx2      0.00000027  0.00000220   0.12  0.903

S = 15.58   R-Sq = 97.2%   R-Sq(adj) = 97.0%

Source          DF      SS      MS       F      P
Regression       5  555741  111148  457.73  0.000
Residual Error  66   16027     243
Total           71  571767

Source  DF  Seq SS
x1       1  254676
x1sq     1   21495
x2       1  279383
x1x2     1     183
x1sqx2   1       4

Obs     x1        y      Fit  StDev Fit  Residual  St Resid
 27  19062  1917.64  1953.27       2.51    -35.63   -2.32R
 48  26148  1982.02  1978.23       9.10      3.79    0.30 X
 53  26166  1972.92  1978.01       9.15     -5.09   -0.40 X
 55  20250  2120.00  2130.56      10.57    -10.56   -0.92 X
 56  20251  2140.00  2130.57      10.57      9.43    0.82 X
 63  24885  2160.02  2161.81      12.67     -1.79   -0.20 X

The fitted model is
$\hat{y} = 776 + .104x_1 - .000002x_1^2 + 232x_2 - .0091x_1x_2 + .00000027x_1^2x_2$.
To determine if the curvilinear relationship is different at the two locations, we test:

H0: $\beta_3 = \beta_4 = \beta_5 = 0$
Ha: At least one of the coefficients is nonzero
In order to test this hypothesis, we must fit the reduced model
$E(y) = \beta_0 + \beta_1x_1 + \beta_2x_1^2$
Using MINITAB, the printout from fitting the reduced model is:

Regression Analysis

y = 197 + 0.149 x1 - 0.000003 x1sq

Predictor         Coef       StDev      T      P
Constant         197.5       578.9   0.34  0.734
x1             0.14921     0.05551   2.69  0.009
x1sq       -0.00000295  0.00000132  -2.24  0.028

S = 65.45   R-Sq = 48.3%   R-Sq(adj) = 46.8%

Source          DF      SS      MS      F      P
Regression       2  276171  138085  32.23  0.000
Residual Error  69  295597    4284
Total           71  571767

Source  DF  Seq SS
x1       1  254676
x1sq     1   21495

Obs     x1        y      Fit  StDev Fit  Residual  St Resid
 30  16691  1916.13  1865.11      23.39     51.02    0.83 X
 48  26148  1982.02  2079.68      33.08    -97.66   -1.73 X
 53  26166  1972.92  2079.59      33.31   -106.67   -1.89 X
 56  20251  2140.00  2007.88      10.43    132.12    2.04R

The fitted regression line is $\hat{y} = 197 + .149x_1 - .000003x_1^2$
To determine if the curvilinear relationship is different at the two locations, we test:

H0: $\beta_3 = \beta_4 = \beta_5 = 0$
Ha: At least one of the coefficients is nonzero
The test statistic is
$F = \dfrac{(SSE_R - SSE_C)/(k - g)}{SSE_C/[n - (k + 1)]} = \dfrac{(295{,}597 - 16{,}027)/(5 - 2)}{16{,}027/[72 - (5 + 1)]} = 383.76$
Since no $\alpha$ was given, we will use $\alpha = .05$. The rejection region requires $\alpha = .05$ in the upper tail of
the F-distribution with $\nu_1 = (k - g) = (5 - 2) = 3$ and $\nu_2 = n - (k + 1) = 72 - (5 + 1) = 66$. From
Table VIII, Appendix B, $F_{.05} \approx 2.76$. The rejection region is
F > 2.76.
Since the observed value of the test statistic falls in the rejection region
(F = 383.76 > 2.76), H0 is rejected. There is sufficient evidence to indicate the curvilinear
relationship is different at the two locations at $\alpha = .05$.
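The nested-model F statistic used here can be recomputed directly from the two error sums of squares in the printouts. The following is a minimal sketch; the function name `nested_f` is a hypothetical helper, with k the number of terms in the complete model and g the number in the reduced model, as in the text.

```python
def nested_f(sse_reduced, sse_complete, n, k, g):
    """F statistic comparing a reduced model (g terms) with a complete model (k terms)."""
    numerator = (sse_reduced - sse_complete) / (k - g)  # drop in SSE per tested term
    denominator = sse_complete / (n - (k + 1))          # MSE of the complete model
    return numerator / denominator


# SSE values from the reduced and complete MINITAB printouts, n = 72 observations
f_stat = nested_f(295597, 16027, n=72, k=5, g=2)
print(round(f_stat, 2))  # 383.76
```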
d.
Using MINITAB, the plot of the residual versus x1 is:
From this plot, we notice that there is only one point more than 2 standard deviations from the mean
and no points that are more than 3 standard deviations from the mean. Thus, there do not appear to
be any outliers. There is no curve to the residuals, so we have the appropriate model.
A stem-and-leaf display of the residuals is:
Character Stem-and-Leaf Display
Stem-and-leaf of RESI1   N = 72
Leaf Unit = 1.0

   1   -3  5
   1   -3
   2   -2  5
   5   -2  210
  13   -1  99877755
  23   -1  4443221100
  29   -0  996655
 (10)  -0  4432111000
  33    0  03344
  28    0  5678899
  21    1  11222244
  13    1  577
  10    2  0012334
   3    2  556

The stem-and-leaf display looks fairly mound-shaped, so it appears that the assumption of normality
is valid.
A plot of the residuals versus the fitted values is:
From this plot, there is no cone-shape. Thus, it appears that the assumption of constant variance is
valid.
11.148 a.
The first order model for this problem is:

$E(y) = \beta_0 + \beta_1x_1 + \beta_2x_2 + \beta_3x_3 + \beta_4x_4$
b.

Regression Analysis

y = 28.9 - 0.000000 x1 + 0.844 x2 - 0.360 x3 - 0.300 x4

Predictor         Coef       StDev      T      P
Constant         28.87       12.67   2.28  0.034
x1         -0.00000011  0.00000028  -0.38  0.708
x2              0.8440      0.2326   3.63  0.002
x3             -0.3600      0.1316  -2.74  0.013
x4             -0.3003      0.1834  -1.64  0.117

S = 5.989   R-Sq = 51.2%   R-Sq(adj) = 41.5%

Source          DF       SS      MS     F      P
Regression       4   753.76  188.44  5.25  0.005
Residual Error  20   717.40   35.87
Total           24  1471.17

Source  DF  Seq SS
x1       1  129.96
x2       1  355.43
x3       1  172.19
x4       1   96.17

Obs        x1      y    Fit  StDev Fit  Residual  St Resid
  4  11940345  32.60  17.25       3.40     15.35    3.11R
 12   4905123  27.00  16.17       4.36     10.83    2.63R
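The least squares prediction line is $\hat{y} = 28.9 - .00000011x_1 + .844x_2 - .360x_3 - .300x_4$.
To determine if the model is useful for predicting percentage of problem mortgages, we test:
$H_0$: $\beta_1 = \beta_2 = \beta_3 = \beta_4 = 0$
$H_a$: At least one $\beta_i \neq 0$
The test statistic is $F = \frac{MS(Model)}{MSE} = 5.25$.
From the printout, the p-value is p = 0.005. Since the p-value is less than $\alpha = .05$, $H_0$ is rejected. There
is sufficient evidence to indicate the model is useful in predicting percentage of problem mortgages
at $\alpha = .05$.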
c.
$\hat{\beta}_0 = 28.9$. This is merely the y-intercept. It has no other meaning in this problem.
$\hat{\beta}_1 = -0.00000011$. For each unit increase in total mortgage loans, the mean percentage of problem
mortgages is estimated to decrease by 0.00000011, holding percentage of invested assets, percentage
of commercial mortgages, and percentage of residential mortgages constant.
$\hat{\beta}_2 = 0.844$. For each unit increase in percentage of invested assets, the mean percentage of problem
mortgages is estimated to increase by 0.844, holding total mortgage loans, percentage of commercial
mortgages, and percentage of residential mortgages constant.
$\hat{\beta}_3 = -0.360$. For each unit increase in percentage of commercial mortgages, the mean percentage of
problem mortgages is estimated to decrease by 0.360, holding total mortgage loans, percentage of
invested assets, and percentage of residential mortgages constant.
$\hat{\beta}_4 = -0.300$. For each unit increase in percentage of residential mortgages, the mean percentage of
problem mortgages is estimated to decrease by 0.300, holding total mortgage loans, percentage of
invested assets, and percentage of commercial mortgages constant.
d.
Using MINITAB, the scattergrams are:
From the scattergrams, it appears that possibly x2 and x4 might warrant inclusion in the model as
second order terms.
e.

Regression Analysis

y = 56.2 - 0.000000 x1 - 1.82 x2 - 0.449 x3 + 0.223 x4 + 0.0771 x2sq - 0.0189 x4sq

Predictor         Coef       StDev      T      P
Constant         56.17       13.81   4.07  0.001
x1         -0.00000008  0.00000025  -0.31  0.760
x2             -1.8177      0.9935  -1.83  0.084
x3             -0.4494      0.1127  -3.99  0.001
x4              0.2227      0.6079   0.37  0.718
x2sq           0.07707     0.02665   2.89  0.010
x4sq          -0.01887     0.02334  -0.81  0.429

S = 4.956   R-Sq = 69.9%   R-Sq(adj) = 59.9%

Source          DF       SS      MS     F      P
Regression       6  1029.03  171.51  6.98  0.001
Residual Error  18   442.13   24.56
Total           24  1471.17

Source  DF  Seq SS
x1       1  129.96
x2       1  355.43
x3       1  172.19
x4       1   96.17
x2sq     1  259.22
x4sq     1   16.05

Obs        x1       y     Fit  StDev Fit  Residual  St Resid
  4  11940345  32.600  26.777      4.038     5.823    2.03R
 10   5328142   7.500  16.105      2.599    -8.605   -2.04R
 12   4905123  27.000  16.559      3.607    10.441    3.07R
 20   2978628   3.200  11.759      2.679    -8.559   -2.05R
The least squares prediction equation is

$\hat{y} = 56.2 - .00000008x_1 - 1.82x_2 - .449x_3 + .223x_4 + .0771x_2^2 - .0189x_4^2$

To determine if the model is useful for predicting percentage of problem mortgages, we test:
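$H_0$: $\beta_1 = \beta_2 = \beta_3 = \beta_4 = \beta_5 = \beta_6 = 0$
$H_a$: At least one $\beta_i \neq 0$
The test statistic is $F = \frac{MS(Model)}{MSE} = 6.98$.
From the printout, the p-value is p = 0.001. Since the p-value is less than $\alpha = .05$, $H_0$ is rejected. There
is sufficient evidence to indicate the model is useful in predicting percentage of problem mortgages
at $\alpha = .05$.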
f.
To determine if one or more of the second-order terms of our model contribute information for the
prediction of the percentage of problem mortgages, we test:
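$H_0$: $\beta_5 = \beta_6 = 0$
$H_a$: At least one $\beta_i \neq 0$, $i = 5, 6$
The test statistic is

$$F = \frac{(SSE_R - SSE_C)/(k - g)}{SSE_C/[n - (k + 1)]} = \frac{(717.40 - 442.13)/(6 - 4)}{442.13/[25 - (6 + 1)]} = 5.60$$

The rejection region requires $\alpha = .05$ in the upper tail of the F-distribution with $\nu_1 = (k - g) = (6 - 4)
= 2$ and $\nu_2 = n - (k + 1) = 25 - (6 + 1) = 18$. From Table VIII, Appendix B, $F_{.05} = 3.55$. The
rejection region is $F > 3.55$.
Since the observed value of the test statistic falls in the rejection region ($F = 5.60 > 3.55$), $H_0$ is
rejected. There is sufficient evidence to indicate one or more of the second-order terms of our model
contribute information for the prediction of the percentage of problem mortgages at $\alpha = .05$.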
11.149 a.
The model is:

$E(y) = \beta_0 + \beta_1x_1 + \beta_2x_2 + \beta_3x_3$
where
y = market share
x1 = 1 if VH, 0 otherwise
x2 = 1 if H, 0 otherwise
x3 = 1 if M, 0 otherwise
We assume that the error terms ($\varepsilon_i$) or y's are normally distributed at each exposure level, with a
common variance. Also, we assume the $\varepsilon_i$'s have a mean of 0 and are independent.
b.
No interaction terms were included because we have only one independent variable, exposure level.
Even though we have 3 xi's in the model, they are dummy variables and correspond to different levels
of the one independent variable.
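To make the dummy-variable coding concrete, here is a small Python sketch. It is an illustration under stated assumptions: the label `"base"` for the fourth exposure level is hypothetical (the source only says "otherwise"), the function names are invented for this example, and the coefficient values are the estimates reported in the part c printout.

```python
# Hypothetical label for the fourth exposure level; the source only says "otherwise".
BASE = "base"
DUMMY_LEVELS = ("VH", "H", "M")  # one 0/1 variable per non-base level: x1, x2, x3


def dummies(level):
    """Return (x1, x2, x3) for one exposure level."""
    if level != BASE and level not in DUMMY_LEVELS:
        raise ValueError(f"unknown exposure level: {level}")
    return tuple(1 if level == name else 0 for name in DUMMY_LEVELS)


def mean_share(level, betas):
    """E(y) = b0 + b1*x1 + b2*x2 + b3*x3 for the given level."""
    b0, *slopes = betas
    return b0 + sum(b * x for b, x in zip(slopes, dummies(level)))


# Estimates as reported in the printout for part c.
betas = (10.2333, 0.5000, 2.0167, 0.6833)
for level in (BASE, "M", "H", "VH"):
    print(level, dummies(level), round(mean_share(level, betas), 4))
```

Because each observation belongs to exactly one level, at most one dummy equals 1, so each coefficient is simply the difference between that level's mean and the base-level mean. No product of two dummies can ever be nonzero, which is why no interaction terms arise.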
c.

Regression Analysis: y versus x1, x2, x3

y = 10.2 + 0.500 x1 + 2.02 x2 + 0.683 x3

Predictor     Coef  SE Coef      T      P
Constant   10.2333   0.1084  94.41  0.000
x1          0.5000   0.1533   3.26  0.004
x2          2.0167   0.1533  13.16  0.000
x3          0.6833   0.1533   4.46  0.000

S = 0.2655   R-Sq = 90.4%   R-Sq(adj) = 89.0%

Source          DF       SS      MS      F      P
Regression       3  13.3433  4.4478  63.09  0.000
Residual Error  20   1.4100  0.0705
Total           23  14.7533

Source  DF   Seq SS
x1       1   0.7200
x2       1  11.2225
x3       1   1.4008
The fitted model is $\hat{y} = 10.2 + .5x_1 + 2.02x_2 + .683x_3$

where
x1 = 1 if VH, 0 otherwise
x2 = 1 if H, 0 otherwise
x3 = 1 if M, 0 otherwise
d.
To determine if the firm's expected market share differs for different levels of advertising exposure,
we test:
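$H_0$: $\beta_1 = \beta_2 = \beta_3 = 0$
$H_a$: At least one $\beta_i \neq 0$
From the printout, the test statistic is F = 63.09.
The rejection region requires $\alpha = .05$ in the upper tail of the F-distribution with $\nu_1 = k = 3$ and $\nu_2 = n
- (k + 1) = 24 - (3 + 1) = 20$. From Table VIII, Appendix B, $F_{.05} = 3.10$. The rejection region is $F >
3.10$.
Since the observed value of the test statistic falls in the rejection region ($F = 63.09 > 3.10$), $H_0$ is
rejected. There is sufficient evidence to indicate the firm's expected market share differs for different
levels of advertising exposure at $\alpha = .05$.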
11.150 a.
Using SAS, the output for fitting the model is:

DEP VARIABLE: Y

                   SUM OF        MEAN
SOURCE    DF      SQUARES      SQUARE   F VALUE   PROB>F
MODEL      3   2396.36410   798.78803    99.394   0.0001
ERROR     16    128.58590     8.03662
C TOTAL   19   2524.95000

ROOT MSE    2.83489    R-SQUARE   0.9491
DEP MEAN   23.05000    ADJ R-SQ   0.9395
C.V.       12.29889

PARAMETER ESTIMATES

               PARAMETER     STANDARD     T FOR H0:
VARIABLE  DF    ESTIMATE        ERROR   PARAMETER=0   PROB > |T|
INTERCEP   1  -11.768830   3.05032146        -3.858       0.0014
X1         1   10.293782   1.43788129         7.159       0.0001
X1SQ       1   -0.417991   0.16132974        -2.591       0.0197
X2         1   13.244076   1.50325080         8.810       0.0001
The fitted model is: $\hat{y} = -11.8 + 10.3x_1 - .418x_1^2 + 13.2x_2$
b.
To determine if the second-order term is necessary, we test:
$H_0$: $\beta_2 = 0$
$H_a$: $\beta_2 \neq 0$
The p-value is p = .0197. Since the p-value is less than $\alpha$ (p = .0197 < .05), $H_0$ is rejected. There is
sufficient evidence to conclude that the second-order term in the model proposed by the operations
manager is necessary at $\alpha = .05$.
c.
The reduced model $E(y) = \beta_0 + \beta_3x_2$ was fit to the data. The SAS output is:
DEP VARIABLE: Y

                   SUM OF         MEAN
SOURCE    DF      SQUARES       SQUARE   F VALUE   PROB>F
MODEL      1   1.25000000   1.25000000     0.009   0.9258
ERROR     18   2523.70000    140.20556
C TOTAL   19   2524.95000

ROOT MSE   11.84084    R-SQUARE    0.0005
DEP MEAN   23.05       ADJ R-SQ   -0.0550
C.V.       51.37025

PARAMETER ESTIMATES

                PARAMETER     STANDARD     T FOR H0:
VARIABLE  DF     ESTIMATE        ERROR   PARAMETER=0   PROB > |T|
INTERCEP   1  23.30000000   3.74440323         6.223       0.0001
X2         1  -0.50000000   5.29538583        -0.094       0.9258
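The fitted model is $\hat{y} = 23.3 - .5x_2$.

The hypotheses are:
$H_0$: $\beta_1 = \beta_2 = 0$
$H_a$: At least one $\beta_i \neq 0$, $i = 1, 2$
The test statistic is

$$F = \frac{(SSE_R - SSE_C)/(k - g)}{SSE_C/[n - (k + 1)]} = \frac{(2523.7 - 128.586)/(3 - 1)}{128.586/[20 - (3 + 1)]} = \frac{1197.557}{8.036625} = 149.01$$

With $\alpha = .10$, $\nu_1 = k - g = 3 - 1 = 2$, and $\nu_2 = n - (k + 1) = 20 - (3 + 1) = 16$, the rejection region is $F > 2.67$.
Since the observed value of the test statistic falls in the rejection region ($F = 149.01
> 2.67$), $H_0$ is rejected. There is sufficient evidence to indicate the age of the machine contributes
information to the model at $\alpha = .10$.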
After adjusting for machine type, there is evidence that down time is related to age.
11.151 a.
$\hat{\beta}_0 = -105$ has no meaning because $x_3 = 0$ is not in the observable range. $\hat{\beta}_0$ is simply the y-intercept.
$\hat{\beta}_1 = 25$. The estimated difference in mean attendance between weekends and weekdays is 25, holding
temperature and weather constant.
$\hat{\beta}_2 = 100$. The estimated difference in mean attendance between sunny and overcast days is 100, holding
type of day (weekend or weekday) and temperature constant.
$\hat{\beta}_3 = 10$. The estimated change in mean attendance for each additional degree of temperature is 10, holding
type of day (weekend or weekday) and weather (sunny or overcast) constant.
b.
To determine if the model is useful for predicting daily attendance, we test:

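$H_0$: $\beta_1 = \beta_2 = \beta_3 = 0$
$H_a$: At least one $\beta_i \neq 0$
The test statistic is

$$F = \frac{R^2/k}{(1 - R^2)/[n - (k + 1)]} = \frac{.65/3}{(1 - .65)/[30 - (3 + 1)]} = 16.10$$

The rejection region requires $\alpha = .05$ in the upper tail of the F-distribution with numerator df = k = 3
and denominator df = n - (k + 1) = 30 - (3 + 1) = 26. From Table VIII, Appendix B, $F_{.05} \approx 2.98$.
The rejection region is $F > 2.98$.
Since the observed value of the test statistic falls in the rejection region ($F = 16.10 > 2.98$), $H_0$ is
rejected. There is sufficient evidence to indicate the model is useful for predicting daily attendance
at $\alpha = .05$.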
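When only $R^2$ is reported, the global F statistic can be recovered from it directly. The following Python sketch is a minimal illustration; the function name `f_from_r2` is an assumption made for this example, and the critical value is simply the tabled value quoted in the solution rather than something the code computes.

```python
def f_from_r2(r2, k, n):
    """Global F statistic computed from R-squared, k predictors, n observations."""
    return (r2 / k) / ((1 - r2) / (n - (k + 1)))


f_stat = f_from_r2(0.65, k=3, n=30)
F_CRIT = 2.98  # tabled F.05 value quoted in the solution (3 and 26 df)
reject_h0 = f_stat > F_CRIT
print(f"F = {f_stat:.2f}, reject H0: {reject_h0}")
```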
c.
To determine if mean attendance increases on weekends, we test:

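$H_0$: $\beta_1 = 0$
$H_a$: $\beta_1 > 0$
The test statistic is

$$t = \frac{\hat{\beta}_1 - 0}{s_{\hat{\beta}_1}} = \frac{25 - 0}{10} = 2.5$$

The rejection region requires $\alpha = .10$ in the upper tail of the t distribution with df = n - (k + 1) = 30
- (3 + 1) = 26. From Table V, Appendix B, $t_{.10} = 1.315$. The rejection region is $t > 1.315$.
Since the observed value of the test statistic falls in the rejection region ($t = 2.5 > 1.315$), $H_0$ is
rejected. There is sufficient evidence to indicate the mean attendance increases on weekends at $\alpha =
.10$.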
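The one-sided test above can be checked with a short Python sketch. The function name `t_statistic` is an assumption for illustration; the estimate (25), its standard error (10), and the tabled critical value (1.315) are the figures used in the solution.

```python
def t_statistic(estimate, std_error, null_value=0.0):
    """t statistic for testing a single regression coefficient."""
    return (estimate - null_value) / std_error


t_stat = t_statistic(25, 10)
T_CRIT = 1.315               # tabled t.10 value with 26 df, as quoted in the solution
reject_h0 = t_stat > T_CRIT  # upper-tailed test, so only large positive t rejects
print(t_stat, reject_h0)
```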
d.
Sunny: $x_2 = 1$; Weekday: $x_1 = 0$; Temperature 95: $x_3 = 95$

$\hat{y} = -105 + 25(0) + 100(1) + 10(95) = 945$
e.
We are 90% confident that the actual attendance for sunny weekdays with a temperature of 95 is
between 645 and 1245.
11.152 a.
For a sunny weekday, $x_1 = 0$ and $x_2 = 1$:
$x_3 = 70$: $\hat{y} = 250 - 700(0) + 100(1) + 5(70) + 15(0)(70) = 700$
$x_3 = 80$: $\hat{y} = 250 - 700(0) + 100(1) + 5(80) + 15(0)(80) = 750$
$x_3 = 90$: $\hat{y} = 800$
$x_3 = 100$: $\hat{y} = 850$
For a sunny weekend, $x_1 = 1$ and $x_2 = 1$:
$x_3 = 70$: $\hat{y} = 250 - 700(1) + 100(1) + 5(70) + 15(1)(70) = 1050$
$x_3 = 80$: $\hat{y} = 250 - 700(1) + 100(1) + 5(80) + 15(1)(80) = 1250$
$x_3 = 90$: $\hat{y} = 1450$
$x_3 = 100$: $\hat{y} = 1650$
For both sunny weekdays and sunny weekend days, as the predicted high temperature increases, so
does the predicted day's attendance. However, the predicted day's attendance on sunny weekend
days increases at a faster rate than on sunny weekdays. Also, the predicted day's attendance is higher
on sunny weekend days than on sunny weekdays.
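The pattern described above follows from the interaction term: the temperature slope is 5 on weekdays and 5 + 15 = 20 on weekends. A brief Python sketch, assuming the fitted equation used in the calculations above (the function names are hypothetical), tabulates the predictions:

```python
def predicted_attendance(x1, x2, x3):
    """x1 = 1 for weekend, x2 = 1 for sunny, x3 = predicted high temperature."""
    return 250 - 700 * x1 + 100 * x2 + 5 * x3 + 15 * x1 * x3


def temperature_slope(x1):
    """Change in predicted attendance per degree, by type of day."""
    return 5 + 15 * x1


for temp in (70, 80, 90, 100):
    weekday = predicted_attendance(0, 1, temp)
    weekend = predicted_attendance(1, 1, temp)
    print(temp, weekday, weekend)
```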
b.
To determine if the interaction term is a useful addition to the model, we test:

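$H_0$: $\beta_4 = 0$
$H_a$: $\beta_4 \neq 0$
The test statistic is

$$t = \frac{\hat{\beta}_4}{s_{\hat{\beta}_4}} = \frac{15}{3} = 5$$

The rejection region requires $\alpha/2 = .05/2 = .025$ in each tail of the t distribution with df = n - (k + 1)
= 30 - (4 + 1) = 25. From Table V, Appendix B, $t_{.025} = 2.06$. The rejection region is $t < -2.06$ or $t >
2.06$.
Since the observed value of the test statistic falls in the rejection region ($t = 5 > 2.06$), $H_0$ is rejected.
There is sufficient evidence to indicate the interaction term is a useful addition to the model at $\alpha =
.05$.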
c.
For $x_1 = 0$, $x_2 = 1$, and $x_3 = 95$,
$\hat{y} = 250 - 700(0) + 100(1) + 5(95) + 15(0)(95) = 825$
d.
The width of the interval in Exercise 11.151e is 1245 - 645 = 600, while the width is
850 - 800 = 50 for the model containing the interaction term. The smaller the width of the interval,
the smaller the variance. This implies that the interaction term is quite useful in predicting daily
attendance. It has reduced the unexplained error.
e.
Because an interaction term including x1 is in the model, the coefficient corresponding to x1 must be
interpreted with caution. For all observed values of x3 (temperature), the interaction term value is
greater than 700.
11.153 a.
$E(y) = \beta_0 + \beta_1x_1 + \beta_2x_6 + \beta_3x_7$

where
x6 = 1 if condition is good, 0 otherwise
x7 = 1 if condition is fair, 0 otherwise
b.
The model specified in part a seems appropriate. The points for E, F, and G cluster around three
parallel lines.
c.
Using MINITAB, the output is

y = 188875 + 15617 x1 - 103046 x6 - 152487 x7

Predictor     Coef  StDev      T      P
Constant    188875  28588   6.61  0.000
x1           15617   1066  14.66  0.000
x6         -103046  31784  -3.24  0.004
x7         -152487  39157  -3.89  0.001

S = 64624   R-Sq = 91.8%   R-Sq(adj) = 90.7%

Source          DF           SS           MS      F      P
Regression       3  9.86170E+11  3.28723E+11  78.71  0.000
Residual Error  21  87700442851   4176211564
Total           24  1.07387E+12

Source  DF        SeqSS
x1       1  9.15776E+11
x6       1   7061463149
x7       1  63332198206

Obs    x1       y      Fit  StDev Fit  Residual  St Resid
 10  62.0  950000  1054078      53911   -104078   -2.92RX
 23  14.0  573200   407512      26670    165688    2.81R

The fitted model is $\hat{y} = 188{,}875 + 15{,}617x_1 - 103{,}046x_6 - 152{,}487x_7$
For excellent condition, $\hat{y} = 188{,}875 + 15{,}617x_1$
For good condition, $\hat{y} = 85{,}829 + 15{,}617x_1$
For fair condition, $\hat{y} = 36{,}388 + 15{,}617x_1$
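The three lines above share one slope and differ only in their intercepts, which are obtained by adding the relevant dummy coefficient to the constant. A minimal Python sketch of that bookkeeping, with hypothetical names and the coefficients taken from the printout:

```python
# Coefficients from the MINITAB printout (constant, x1, x6 = good, x7 = fair).
B0, B1, B_GOOD, B_FAIR = 188_875, 15_617, -103_046, -152_487


def intercept(condition):
    """Intercept of the fitted line for a given physical condition."""
    shift = {"excellent": 0, "good": B_GOOD, "fair": B_FAIR}
    return B0 + shift[condition]


def predicted_price(x1, condition):
    """Predicted sale price for x1 apartment units in the given condition."""
    return intercept(condition) + B1 * x1


for condition in ("excellent", "good", "fair"):
    print(condition, intercept(condition))
```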

d.
e.
We must first fit a reduced model with just x1, number of apartments. Using MINITAB, the output
is:
y = 101786 + 15525 x1

Predictor    Coef  StDev      T      P
Constant   101786  23291   4.37  0.000
x1          15525   1345  11.54  0.000

S = 82908   R-Sq = 85.3%   R-Sq(adj) = 84.6%

Source          DF           SS           MS       F      P
Regression       1  9.15776E+11  9.15776E+11  133.23  0.000
Residual Error  23  1.58094E+11   6873656705
Total           24  1.07387E+12

Obs    x1       y      Fit  StDev Fit  Residual  St Resid
  4  26.0  676200   505433      24930    170757    2.16R
 10  62.0  950000  1064353      69058   -114353   -2.49RX
 23  14.0  573200   319140      16765    254060    3.13R

The fitted model is y = 101,786 + 15,525x1.

To determine if the relationship between sale price and number of units differs depending on the
physical condition of the apartments, we test:
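$H_0$: $\beta_2 = \beta_3 = 0$
$H_a$: At least one $\beta_i \neq 0$, $i = 2, 3$
The test statistic is

$$F = \frac{(SSE_R - SSE_C)/(k - g)}{SSE_C/[n - (k + 1)]} = \frac{(1.58094 \times 10^{11} - 87{,}700{,}442{,}851)/2}{4{,}176{,}211{,}564} = 8.43$$

The rejection region requires $\alpha = .05$ in the upper tail of the F-distribution with $\nu_1 = k - g = 3 - 1 = 2$
and $\nu_2 = n - (k + 1) = 25 - (3 + 1) = 21$. From Table VIII, Appendix B, $F_{.05} = 3.47$. The rejection
region is $F > 3.47$.
Since the observed value of the test statistic falls in the rejection region ($F = 8.43 > 3.47$), $H_0$ is
rejected. There is evidence to indicate that the relationship between sale price and number of units
differs depending on the physical condition of the apartments at $\alpha = .05$.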
f.
We will look for high pairwise correlations.
         x1      x2      x3      x4      x5      x6
x2   -0.014
x3    0.800  -0.188
x4    0.224  -0.363   0.166
x5    0.878   0.027   0.673   0.089
x6    0.175  -0.447   0.271   0.112   0.020
x7   -0.128   0.392  -0.118   0.050  -0.238  -0.564
When highly correlated independent variables are present in a regression model, the results are
confusing. The researchers may only want to include one of the variables. This may be the case for
the variables: x1 and x3, x1 and x5, x3 and x5
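Screening for high pairwise correlations can be automated. The Python sketch below is illustrative: the cutoff of 0.6 is an assumption chosen for this example (the solution does not state a threshold), and the dictionary holds the correlations as read from the table above.

```python
# Pairwise correlations as read from the correlation table.
PAIRWISE_R = {
    ("x1", "x2"): -0.014, ("x1", "x3"): 0.800, ("x1", "x4"): 0.224,
    ("x1", "x5"): 0.878, ("x1", "x6"): 0.175, ("x1", "x7"): -0.128,
    ("x2", "x3"): -0.188, ("x2", "x4"): -0.363, ("x2", "x5"): 0.027,
    ("x2", "x6"): -0.447, ("x2", "x7"): 0.392,
    ("x3", "x4"): 0.166, ("x3", "x5"): 0.673, ("x3", "x6"): 0.271,
    ("x3", "x7"): -0.118,
    ("x4", "x5"): 0.089, ("x4", "x6"): 0.112, ("x4", "x7"): 0.050,
    ("x5", "x6"): 0.020, ("x5", "x7"): -0.238,
    ("x6", "x7"): -0.564,
}

THRESHOLD = 0.6  # assumed cutoff for "highly correlated"; not from the source

# Keep every pair whose correlation is large in absolute value.
flagged = sorted(pair for pair, r in PAIRWISE_R.items() if abs(r) >= THRESHOLD)
for pair in flagged:
    print(pair, PAIRWISE_R[pair])
```

With this cutoff the flagged pairs match the three pairs named in the solution. A different threshold would change the list, so the cutoff should be treated as a judgment call.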
g.
Use the following plots to check the assumptions on $\varepsilon$:

residuals vs x1
residuals vs x2
residuals vs x3
residuals vs x4
residuals vs x5
residuals vs predicted values
frequency distribution of the standardized residuals
From the plots of the residuals, there do not appear to be any outliers - no standardized residuals are
larger than 2.38 in magnitude. In all the plots of the residuals vs xi, there is no trend that would
indicate non-constant variance (no funnel shape). In addition, there is no U or upside-down U shape
that would indicate that any of the variables should be squared. In the histogram of the residuals, the
plot is fairly mound-shaped, which would indicate the residuals are approximately normally
distributed. All of the assumptions appear to be met.
[Plot: Residuals Versus x1 (response is y)]
[Plot: Residuals Versus x2 (response is y)]
[Plot: Residuals Versus x3 (response is y)]
[Plot: Residuals Versus x4 (response is y)]
[Plot: Residuals Versus x5 (response is y)]
[Plot: Residuals Versus the Predicted Values (response is y)]
[Plot: frequency distribution of the standardized residuals (response is y)]
11.154 Let x1 = Length of operation and let x2 = 1 if C, 0 otherwise.
To allow for the relationship between Drop in Light Output and Length of Operation to be different for the
two different Bulb Surfaces, we will fit the model: $E(y) = \beta_0 + \beta_1x_1 + \beta_2x_2 + \beta_3x_1x_2$.
Using MINITAB, the results of fitting the model are:
Regression Analysis: DROP versus x1, x2, x1x2

DROP = 1.46 + 0.00473 x1 + 5.39 x2 + 0.00991 x1x2

Predictor      Coef   SE Coef     T      P
Constant      1.464     2.151  0.68  0.512
x1         0.004732  0.001492  3.17  0.010
x2            5.393     3.042  1.77  0.107
x1x2       0.009911  0.002109  4.70  0.001

S = 3.15719   R-Sq = 95.5%   R-Sq(adj) = 94.1%

Source          DF       SS      MS      F      P
Regression       3  2106.68  702.23  70.45  0.000
Residual Error  10    99.68    9.97
Total           13  2206.36

Source  DF   Seq SS
x1       1   840.88
x2       1  1045.79
x1x2     1   220.02
The fitted regression line is $\hat{y} = 1.464 + .0047x_1 + 5.393x_2 + .0099x_1x_2$

To determine if the model is adequate for predicting Drop in Light Output, we test:
$H_0$: $\beta_1 = \beta_2 = \beta_3 = 0$
$H_a$: At least one $\beta_i \neq 0$
From the printout, the test statistic is F = 70.45 and the p-value is p = 0.000. Since the p-value is so small,
$H_0$ is rejected. There is sufficient evidence to indicate the model is adequate for predicting Drop in Light
Output at any reasonable value of $\alpha$.
For this model, $R^2$ = 95.5%. 95.5% of the total variability of the Drop in Light Outputs is explained by the
model containing Length of Operation, Bulb Surface, and the Interaction of Bulb Surface and Length of
Operation.
To determine if the interaction between Bulb Surface and Length of Operation is significant, we test:
$H_0$: $\beta_3 = 0$
$H_a$: $\beta_3 \neq 0$
From the printout, the test statistic is t = 4.70 and the p-value is p = 0.001. Since the p-value is so small, $H_0$
is rejected. There is sufficient evidence to indicate Bulb Surface and Length of Operation interact at any
reasonable value of $\alpha$.
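A significant interaction means the two bulb surfaces have different slopes in Length of Operation. The Python sketch below, a hedged illustration with hypothetical names, uses the coefficients from the printout to recover the line for each surface and to recompute the interaction t statistic as Coef / SE Coef.

```python
# Estimates from the MINITAB printout: constant, x1, x2, x1x2.
B0, B1, B2, B3 = 1.464, 0.004732, 5.393, 0.009911
SE_B3 = 0.002109  # standard error of the interaction coefficient


def line_for_surface(x2):
    """Return (intercept, slope in x1) for x2 = 0 or x2 = 1."""
    return B0 + B2 * x2, B1 + B3 * x2


intercept_0, slope_0 = line_for_surface(0)
intercept_1, slope_1 = line_for_surface(1)
t_interaction = B3 / SE_B3

print(f"x2 = 0: y-hat = {intercept_0:.3f} + {slope_0:.6f} x1")
print(f"x2 = 1: y-hat = {intercept_1:.3f} + {slope_1:.6f} x1")
print(f"t for interaction = {t_interaction:.2f}")
```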
Using MINITAB, the residual plots are:
[Residual Plots for DROP: normal probability plot of the residuals, residuals versus fitted values, histogram of the residuals, residuals versus observation order]
From the histogram of the residuals, the residuals look somewhat mound-shaped. In addition, the normal
probability plot looks to be a fairly straight line. Thus, the assumption of normal errors appears to be valid.
From the plot of the residuals versus the fitted values, there is no funnel shape. The spread of the
residuals does not appear to increase or decrease as the fitted values increase. Thus, the assumption of
constant variance appears to be valid.
It appears that the model is a pretty good model for the prediction of the Drop in Light Output.

11.155 a.
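To determine whether the complete model contributes information for the prediction of y, we test:
$H_0$: $\beta_1 = \beta_2 = \beta_3 = \beta_4 = \beta_5 = 0$
$H_a$: At least one of the $\beta_i$'s is not 0, $i = 1, 2, 3, 4, 5$
b.
$MSR = \frac{SS(Model)}{k} = \frac{4{,}911.56}{5} = 982.31$

$MSE = \frac{SSE}{n - (k + 1)} = \frac{1{,}830.44}{40 - (5 + 1)} = 53.84$

The test statistic is $F = \frac{MSR}{MSE} = \frac{982.31}{53.84} = 18.24$

The rejection region requires $\alpha = .05$ in the upper tail of the F-distribution with numerator df = k = 5
and denominator df = n - (k + 1) = 40 - (5 + 1) = 34. From Table VIII, Appendix B, $F_{.05} \approx 2.53$.
The rejection region is $F > 2.53$.
Since the observed value of the test statistic falls in the rejection region ($F = 18.24 > 2.53$), $H_0$ is
rejected. There is sufficient evidence to indicate that the complete model contributes information for
the prediction of y at $\alpha = .05$.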
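The mean squares and global F statistic above come straight from the sums of squares. A short Python sketch (the function name `global_f` is an assumption for illustration) repeats the calculation without intermediate rounding, so its F value can differ from the hand calculation in the second decimal place.

```python
def global_f(ss_model, sse, k, n):
    """Return (MSR, MSE, F) for a model with k predictors fit to n observations."""
    msr = ss_model / k
    mse = sse / (n - (k + 1))
    return msr, mse, msr / mse


# Sums of squares quoted in the solution: SS(Model) = 4,911.56, SSE = 1,830.44.
msr, mse, f_stat = global_f(4911.56, 1830.44, k=5, n=40)
print(f"MSR = {msr:.2f}, MSE = {mse:.2f}, F = {f_stat:.3f}")
```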
c.
To determine whether a second-order model contributes more information than a first-order model
for the prediction of y, we test:
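$H_0$: $\beta_3 = \beta_4 = \beta_5 = 0$
$H_a$: At least one $\beta_i \neq 0$, $i = 3, 4, 5$
d.
The test statistic is

$$F = \frac{(SSE_R - SSE_C)/(k - g)}{SSE_C/[n - (k + 1)]} = \frac{(3197.16 - 1830.44)/(5 - 2)}{1830.44/[40 - (5 + 1)]} = \frac{455.5733}{53.8365} = 8.46$$

The rejection region requires $\alpha = .05$ in the upper tail of the F-distribution with numerator df = k - g = 5 - 2
= 3 and denominator df = n - (k + 1) = 40 - (5 + 1) = 34. From Table VIII, Appendix B, $F_{.05} \approx 2.92$.
The rejection region is $F > 2.92$.
Since the observed value of the test statistic falls in the rejection region ($F = 8.46 > 2.92$), $H_0$ is
rejected. There is sufficient evidence to indicate the second-order model contributes more
information than a first-order model for the prediction of y at $\alpha = .05$.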
e.
The second-order model, based on the test result in part d.

11.156 a.
The complete second-order model is:
$E(y) = \beta_0 + \beta_1x_1 + \beta_2x_1^2 + \beta_3x_2 + \beta_4x_1x_2 + \beta_5x_1^2x_2$
where x1 = age
x2 = 1 if current, 0 otherwise
b.
To determine if the quadratic terms are important, we test:
$H_0$: $\beta_2 = \beta_5 = 0$
c.
To determine if the interaction terms are important, we test:
$H_0$: $\beta_4 = \beta_5 = 0$
d.
From MINITAB, the outputs from fitting the three models are:
Regression Analysis: Value versus Age, AgeSq, Status, AgeSt, AgeSqSt

Value = 83 - 5.7 Age + 0.236 AgeSq - 62 Status + 5.4 AgeSt - 0.234 AgeSqSt

Predictor     Coef  SE Coef      T      P
Constant      83.4    316.3   0.26  0.793
Age          -5.74    18.68  -0.31  0.760
AgeSq       0.2361   0.2549   0.93  0.359
Status       -62.1    354.8  -0.18  0.862
AgeSt         5.36    24.81   0.22  0.830
AgeSqSt    -0.2337   0.4080  -0.57  0.570

S = 286.8   R-Sq = 24.7%   R-Sq(adj) = 16.1%

Source          DF       SS      MS     F      P
Regression       5  1186549  237310  2.89  0.024
Residual Error  44  3618994   82250
Total           49  4805542

Source   DF  Seq SS
Age       1  865746
AgeSq     1  138871
Status    1   77594
AgeSt     1   77342
AgeSqSt   1   26996
Regression Analysis: Value versus Age, Status, AgeSt

Value = - 176 + 11.2 Age + 196 Status - 11.4 AgeSt

Predictor     Coef  SE Coef      T      P
Constant    -176.1    145.0  -1.21  0.231
Age         11.166    3.902   2.86  0.006
Status       196.5    178.9   1.10  0.278
AgeSt      -11.432    6.763  -1.69  0.098

S = 283.2   R-Sq = 23.2%   R-Sq(adj) = 18.2%

Source          DF       SS      MS     F      P
Regression       3  1116017  372006  4.64  0.006
Residual Error  46  3689526   80207
Total           49  4805543

Source  DF  Seq SS
Age      1  865746
Status   1   21097
AgeSt    1  229174
Regression Analysis: Value versus Age, AgeSq, Status

Value = 166 - 8.8 Age + 0.253 AgeSq - 106 Status

Predictor    Coef  SE Coef      T      P
Constant    165.8    182.7   0.91  0.369
Age         -8.81    10.89  -0.81  0.423
AgeSq      0.2535   0.1632   1.55  0.127
Status     -105.6    107.9  -0.98  0.333

S = 284.5   R-Sq = 22.5%   R-Sq(adj) = 17.5%

Source          DF       SS      MS     F      P
Regression       3  1082210  360737  4.46  0.008
Residual Error  46  3723332   80942
Total           49  4805542

Source  DF  Seq SS
Age      1  865746
AgeSq    1  138871
Status   1   77594
Test for part b:

$$F = \frac{(SSE_R - SSE_C)/(k - g)}{SSE_C/[n - (k + 1)]} = \frac{(3{,}689{,}526 - 3{,}618{,}994)/2}{82{,}250} = .429$$

Since no $\alpha$ is given, we will use $\alpha = .05$. The rejection region requires $\alpha = .05$ in the upper tail of the
F distribution with $\nu_1 = 2$ numerator degrees of freedom and $\nu_2 = 44$ denominator degrees of
freedom. From Table VIII, Appendix B, $F_{.05} \approx 3.23$. The rejection region is $F > 3.23$.
Since the observed value of the test statistic does not fall in the rejection region ($F = .429 \ngtr 3.23$),
$H_0$ is not rejected. There is insufficient evidence to indicate the quadratic terms are important for
predicting market value at $\alpha = .05$.
Test for part c:
$$F = \frac{(SSE_R - SSE_C)/(k - g)}{SSE_C/[n - (k + 1)]} = \frac{(3{,}723{,}332 - 3{,}618{,}994)/(5 - 3)}{82{,}250} = .634$$

The rejection region is the same as in the previous test. Reject $H_0$ if $F > 3.23$.
Since the observed value of the test statistic does not fall in the rejection region
($F = .634 \ngtr 3.23$), $H_0$ is not rejected. There is insufficient evidence to indicate the interaction terms
are important for predicting market value at $\alpha = .05$.
11.157 First, we will fit the first-order model: $E(y) = \beta_0 + \beta_1x_1 + \beta_2x_2$

Regression Analysis: y versus x1, x2

y = - 1.57 + 0.0257 x1 + 0.0336 x2

Predictor      Coef   SE Coef      T      P
Constant    -1.5705    0.4937  -3.18  0.003
x1         0.025732  0.004024   6.40  0.000
x2         0.033615  0.004928   6.82  0.000

S = 0.4023   R-Sq = 68.1%   R-Sq(adj) = 66.4%

Source          DF       SS      MS      F      P
Regression       2  12.7859  6.3930  39.51  0.000
Residual Error  37   5.9876  0.1618
Total           39  18.7735

Source  DF  Seq SS
x1       1  5.2549
x2       1  7.5311

Obs   x1       y     Fit  SE Fit  Residual  St Resid
  4  100  1.5400  2.6498  0.1699   -1.1098   -3.04R
 32   39  1.2200  2.1558  0.1483   -0.9358   -2.50R
To determine if the model is useful in the prediction of y (GPA), we test:

$H_0$: $\beta_1 = \beta_2 = 0$
$H_a$: At least one $\beta_i \neq 0$
The test statistic is F = 39.51 and the p-value is p = 0.000. Since the p-value is so small, $H_0$ is rejected for
any reasonable value of $\alpha$. There is sufficient evidence to indicate at least one of the variables Verbal score
or Mathematics score is useful in predicting GPA.
To determine if Verbal score is useful in predicting GPA, controlling for Mathematics score, we test:
$H_0$: $\beta_1 = 0$
$H_a$: $\beta_1 \neq 0$
The test statistic is t = 6.40 and the p-value is p = 0.000. Since the p-value is so small, $H_0$ is rejected for
any reasonable value of $\alpha$. There is sufficient evidence to indicate Verbal score is useful in predicting
GPA, controlling for Mathematics score.
To determine if Mathematics score is useful in predicting GPA, controlling for Verbal score, we test:
$H_0$: $\beta_2 = 0$
$H_a$: $\beta_2 \neq 0$
The test statistic is t = 6.82 and the p-value is p = 0.000. Since the p-value is so small, $H_0$ is rejected for
any reasonable value of $\alpha$. There is sufficient evidence to indicate Mathematics score is useful in
predicting GPA, controlling for Verbal score.
Thus, both terms in the model are significant. The R-squared value is R2 = .681.
This indicates that 68.1% of the sample variance of the GPAs is explained by the model.
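Both $R^2$ and adjusted $R^2$ can be reproduced from the analysis-of-variance table. The Python sketch below is a minimal check using the printed sums of squares; the function names are assumptions for illustration, and the adjusted formula is the standard $1 - \frac{n-1}{n-(k+1)}(1 - R^2)$.

```python
def r_squared(ss_regression, ss_total):
    """Proportion of total sample variation explained by the model."""
    return ss_regression / ss_total


def adjusted_r_squared(r2, n, k):
    """R-squared adjusted for sample size n and number of predictors k."""
    return 1 - (n - 1) / (n - (k + 1)) * (1 - r2)


# From the printout: SS(Regression) = 12.7859, SS(Total) = 18.7735, n = 40, k = 2.
r2 = r_squared(12.7859, 18.7735)
r2_adj = adjusted_r_squared(r2, n=40, k=2)
print(f"R-Sq = {r2:.1%}, R-Sq(adj) = {r2_adj:.1%}")
```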
Now, we need to check the residuals. From MINITAB, the plots are:
[Residual Plots for y: normal probability plot of the residuals, residuals versus fitted values, histogram of the residuals, residuals versus observation order]
[Plot: Residuals Versus x1 (response is y)]
[Plot: Residuals Versus x2 (response is y)]
From the normal probability plot, it appears that the assumption of normality is valid. The points are very
close to a straight line except for the first 2 points. The histogram of the residuals implies that the residuals
are slightly skewed to the left. I would still consider the assumption to be valid. The plot of the residuals
versus y-hat indicates a random spread of the residuals between the two bands. This indicates that the
assumption of equal variances is probably valid. The plot of the residuals versus x1 indicates that the
relationship between GPA and Verbal score may not be linear, but quadratic because the points form a
somewhat upside down U shape. The plot of the residuals versus x2 indicates that the relationship between
GPA and Mathematics score may or may not be quadratic.
Since the plots indicate a possible 2nd-order model and the $R^2$ value is not very large, we will fit a complete
2nd-order model:
$E(y) = \beta_0 + \beta_1x_1 + \beta_2x_2 + \beta_3x_1^2 + \beta_4x_2^2 + \beta_5x_1x_2$

Regression Analysis: y versus x1, x2, x1sq, x2sq, x1x2

y = - 9.92 + 0.167 x1 + 0.138 x2 - 0.00111 x1sq - 0.000843 x2sq + 0.000241 x1x2

Predictor        Coef    SE Coef      T      P
Constant       -9.917      1.354  -7.32  0.000
x1            0.16681    0.02124   7.85  0.000
x2            0.13760    0.02673   5.15  0.000
x1sq       -0.0011082  0.0001173  -9.45  0.000
x2sq       -0.0008433  0.0001594  -5.29  0.000
x1x2        0.0002411  0.0001440   1.67  0.103

S = 0.187142   R-Sq = 93.7%   R-Sq(adj) = 92.7%

Source          DF       SS      MS       F      P
Regression       5  17.5827  3.5165  100.41  0.000
Residual Error  34   1.1908  0.0350
Total           39  18.7735

Source  DF  Seq SS
x1       1  5.2549
x2       1  7.5311
x1sq     1  3.6434
x2sq     1  1.0552
x1x2     1  0.0982

Obs   x1       y     Fit  SE Fit  Residual  St Resid
  2   68  2.8900  3.2820  0.1002   -0.3920   -2.48R
  4  100  1.5400  1.5806  0.1404   -0.0406   -0.33 X
 34   70  3.8200  3.3940  0.0753    0.4260    2.49R

To determine if the interaction between Verbal score and Mathematics score is useful in the prediction of y
(GPA), we test:
$H_0$: $\beta_5 = 0$
$H_a$: $\beta_5 \neq 0$
The test statistic is t = 1.67 and the p-value is p = 0.103. Since the p-value is not small, $H_0$ is not rejected
for any value of $\alpha < .10$. There is insufficient evidence to indicate the interaction between Verbal score
and Mathematics score is useful in predicting GPA.
Now, we will fit a model without the interaction term, but including the squared terms:
$E(y) = \beta_0 + \beta_1x_1 + \beta_2x_2 + \beta_3x_1^2 + \beta_4x_2^2$

Regression Analysis: y versus x1, x2, x1sq, x2sq

y = - 11.5 + 0.189 x1 + 0.159 x2 - 0.00114 x1sq - 0.000871 x2sq

Predictor        Coef    SE Coef       T      P
Constant      -11.458      1.019  -11.24  0.000
x1            0.18887    0.01709   11.05  0.000
x2            0.15874    0.02417    6.57  0.000
x1sq       -0.0011412  0.0001186   -9.62  0.000
x2sq       -0.0008705  0.0001626   -5.35  0.000

S = 0.191905   R-Sq = 93.1%   R-Sq(adj) = 92.3%

Source          DF       SS      MS       F      P
Regression       4  17.4845  4.3711  118.69  0.000
Residual Error  35   1.2890  0.0368
Total           39  18.7735

Source  DF  Seq SS
x1       1  5.2549
x2       1  7.5311
x1sq     1  3.6434
x2sq     1  1.0552

Obs   x1       y     Fit  SE Fit  Residual  St Resid
  2   68  2.8900  3.2921  0.1025   -0.4021   -2.48R
  4  100  1.5400  1.7059  0.1219   -0.1659   -1.12 X
 32   39  1.2200  1.3190  0.1240   -0.0990   -0.68 X
 34   70  3.8200  3.3954  0.0772    0.4246    2.42R

To determine if the relationship between Verbal score and GPA is quadratic, controlling for Mathematics
score, we test:
$H_0$: $\beta_3 = 0$
$H_a$: $\beta_3 \neq 0$
The test statistic is t = -9.62 and the p-value is p = 0.000. Since the p-value is so small, $H_0$ is rejected for
any reasonable value of $\alpha$. There is sufficient evidence to indicate the relationship between Verbal score
and GPA is quadratic, controlling for Mathematics score.
To determine if the relationship between Mathematics score and GPA is quadratic, controlling for Verbal
score, we test:
$H_0$: $\beta_4 = 0$
$H_a$: $\beta_4 \neq 0$
The test statistic is t = -5.35 and the p-value is p = 0.000. Since the p-value is so small, $H_0$ is rejected for
any reasonable value of $\alpha$. There is sufficient evidence to indicate the relationship between Mathematics
score and GPA is quadratic, controlling for Verbal score.
Thus, both quadratic terms in the model are significant. The R-squared value is $R^2 = .931$. This indicates
that 93.1% of the sample variance of the GPAs is explained by the model.
Now, we need to check the residuals. From MINITAB, the plots are:
[Residual Plots for y: normal probability plot of the residuals, residuals versus fitted values, histogram of the residuals, residuals versus observation order]
[Plot: Residuals Versus x1 (response is y)]
[Plot: Residuals Versus x2 (response is y)]
From the normal probability plot, it appears that the assumption of normality is valid. The points are very
close to a straight line. The histogram of the residuals also implies that the residuals are approximately
normal. The plot of the residuals versus y-hat indicates a random spread of the residuals between the two
bands. This indicates that the assumption of equal variances is probably valid. The plot of the residuals
versus x1 indicates a random spread of the residuals between the two bands. This indicates that the order of
x1 (2nd) is appropriate. The plot of the residuals versus x2 indicates a random spread of the residuals
between the two bands. This indicates that the order of x2 (2nd) is appropriate.
The model appears to be pretty good. All terms in the model are significant, the residual analysis indicates
the assumptions are met and the R-squared value is fairly close to 1. The fitted model is
$\hat{y} = -11.5 + 0.189x_1 + 0.159x_2 - 0.00114x_1^2 - 0.000871x_2^2$.
