c.
11.2
a.
b.
11.1
a.
b.
c.
d.
H0: \(\beta_1 = 0\)
Ha: \(\beta_1 \neq 0\)
The test statistic is \(t = \dfrac{\hat{\beta}_1 - 0}{s_{\hat{\beta}_1}} = \dfrac{-941.900}{275.08} = -3.42\)
The rejection region requires \(\alpha/2 = .05/2 = .025\) in each tail of the t distribution with df = \(n - (k + 1) = 20 - (2 + 1) = 17\). From Table V, Appendix B, \(t_{.025} = 2.110\). The rejection region is \(t < -2.110\) or \(t > 2.110\).
Since the observed value of the test statistic falls in the rejection region (\(t = -3.42 < -2.110\)), H0 is rejected. There is sufficient evidence to indicate \(\beta_1 \neq 0\) at \(\alpha = .05\).
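The arithmetic of this two-tailed test can be checked with a short Python sketch. The function name `two_tailed_t_test` is hypothetical, the critical value is the table value quoted above rather than a computed quantile, and the negative sign on the estimate is an assumption inferred from the conclusion that t falls in the lower rejection region.

```python
def two_tailed_t_test(estimate, std_error, t_critical, null_value=0.0):
    """Return (t, reject) for H0: beta = null_value vs. Ha: beta != null_value."""
    t = (estimate - null_value) / std_error
    # Reject H0 when t lies beyond the critical value in either tail.
    return t, abs(t) > t_critical

# Estimate (sign assumed) and standard error from the solution; t_.025 with 17 df.
t, reject = two_tailed_t_test(-941.900, 275.08, t_critical=2.110)
print(round(t, 2), reject)
```

The same function applies to the other two-tailed tests in this chapter by swapping in the estimate, its standard error, and the tabled critical value.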
e.
For confidence coefficient .95, \(\alpha = .05\) and \(\alpha/2 = .025\). From Table V, Appendix B, with
df = \(n - (k + 1) = 20 - (2 + 1) = 17\), \(t_{.025} = 2.110\). The 95% confidence interval is:
\((-1230.501, -372.381)\)
f.
\(R^2\) = R-Sq = 45.9%. 45.9% of the total sample variation of the y values is explained by the model
containing \(x_1\) and \(x_2\).
\(R_a^2\) = R-Sq(adj) = 39.6%. 39.6% of the total sample variation of the y values is explained by the
model containing \(x_1\) and \(x_2\), adjusted for the sample size and the number of parameters in the model.
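The link between the two values can be sketched with the standard adjusted R-squared formula, \(R_a^2 = 1 - \frac{n - 1}{n - (k + 1)}(1 - R^2)\). The function name below is hypothetical; n = 20 and k = 2 are taken from the degrees-of-freedom statement earlier in this exercise. Because the input is the rounded 45.9%, the result agrees with the printed 39.6% only up to rounding.

```python
def adjusted_r_squared(r_squared, n, k):
    """Adjust R^2 for sample size n and number of independent variables k."""
    return 1 - (n - 1) / (n - (k + 1)) * (1 - r_squared)

r2_adj = adjusted_r_squared(0.459, n=20, k=2)
print(round(r2_adj, 3))
```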
Copyright 2011 Pearson Education, Inc. Publishing as Prentice Hall.
g.
a.
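H0: \(\beta_2 = 0\)
Ha: \(\beta_2 \neq 0\)
The test statistic is \(t = \dfrac{\hat{\beta}_2 - 0}{s_{\hat{\beta}_2}} = \dfrac{2.7}{1.86} = 1.45\)
The rejection region requires \(\alpha/2 = .05/2 = .025\) in each tail of the t distribution with df = \(n - (k + 1) = 30 - (3 + 1) = 26\). From Table V, Appendix B, \(t_{.025} = 2.056\). The rejection region is \(t < -2.056\) or \(t > 2.056\).
Since the observed value of the test statistic does not fall in the rejection region (\(t = 1.45 \not> 2.056\)), H0 is not rejected. There is insufficient evidence to indicate \(\beta_2 \neq 0\) at \(\alpha = .05\).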
Test
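H0: \(\beta_3 = 0\)
Ha: \(\beta_3 \neq 0\)
The test statistic is \(t = \dfrac{\hat{\beta}_3 - 0}{s_{\hat{\beta}_3}} = \dfrac{.93}{.29} = 3.21\)
The rejection region is the same as part a, \(t < -2.056\) or \(t > 2.056\).
Since the observed value of the test statistic falls in the rejection region (\(t = 3.21 > 2.056\)), H0 is
rejected. There is sufficient evidence to indicate \(\beta_3 \neq 0\) at \(\alpha = .05\).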
11.3
The observed significance level of the test is p-value = 0.005. Since the p-value is so small, we will
reject H0 for most reasonable values of \(\alpha\). There is sufficient evidence to indicate at least one of the
variables, \(x_1\) or \(x_2\), is significant in predicting y for any \(\alpha\) greater than 0.005.
c.
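\(\hat{\beta}_3\) has a smaller estimated standard error than \(\hat{\beta}_2\). Therefore, the test statistic is larger for \(\beta_3\) even though \(\hat{\beta}_3\) is smaller than \(\hat{\beta}_2\).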
11.4
Chapter 11
a.
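H0: \(\beta_1 = 0\)
Ha: \(\beta_1 > 0\)
The test statistic is \(t = \dfrac{\hat{\beta}_1 - 0}{s_{\hat{\beta}_1}} = \dfrac{3.1}{2.3} = 1.35\)
The rejection region requires \(\alpha = .05\) in the upper tail of the t distribution with df =
\(n - (k + 1) = 25 - (2 + 1) = 22\). From Table V, Appendix B, \(t_{.05} = 1.717\). The rejection region is
\(t > 1.717\).
Since the observed value of the test statistic does not fall in the rejection region (\(t = 1.35 \not> 1.717\)), H0 is not rejected. There is insufficient evidence to indicate \(\beta_1 > 0\) at \(\alpha = .05\).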
H0: \(\beta_2 = 0\)
Ha: \(\beta_2 \neq 0\)
The test statistic is \(t = \dfrac{\hat{\beta}_2 - 0}{s_{\hat{\beta}_2}} = \dfrac{.92}{.27} = 3.41\)
The rejection region requires \(\alpha/2 = .05/2 = .025\) in each tail of the t distribution with df = \(n - (k + 1) = 25 - (2 + 1) = 22\). From Table V, Appendix B, \(t_{.025} = 2.074\). The rejection region is \(t < -2.074\) or \(t > 2.074\).
Since the observed value of the test statistic falls in the rejection region (\(t = 3.41 > 2.074\)), reject H0.
There is sufficient evidence to indicate \(\beta_2 \neq 0\) at \(\alpha = .05\).
c.
For confidence coefficient .90, \(\alpha = 1 - .90 = .10\) and \(\alpha/2 = .10/2 = .05\). From Table V, Appendix B,
with df = \(n - (k + 1) = 25 - (2 + 1) = 22\), \(t_{.05} = 1.717\). The confidence interval is:
For confidence coefficient .99, \(\alpha = 1 - .99 = .01\) and \(\alpha/2 = .01/2 = .005\). From Table V, Appendix B,
with df = \(n - (k + 1) = 25 - (2 + 1) = 22\), \(t_{.005} = 2.819\). The confidence interval is:
The number of degrees of freedom available for estimating \(\sigma^2\) is \(n - (k + 1)\), where k is the number of
independent variables in the regression model. Each additional independent variable placed in the model
causes a corresponding decrease in the degrees of freedom.
11.6
a.
For \(x_2 = 1\) and \(x_3 = 3\),
\(E(y) = 1 + 2x_1 + 1 - 3(3)\)
\(E(y) = 2x_1 - 7\)
The graph is:
b.
For \(x_2 = -1\) and \(x_3 = 1\),
\(E(y) = 1 + 2x_1 + (-1) - 3(1)\)
\(E(y) = 2x_1 - 3\)
The graph is:
c.
They are parallel, each with a slope of 2. They have different y-intercepts.
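The parallel-lines conclusion can be verified with a small Python sketch. It assumes the model implied by the substitutions in parts a and b, namely \(E(y) = 1 + 2x_1 + x_2 - 3x_3\), and that part b uses \(x_2 = -1\); the function names are hypothetical.

```python
def expected_y(x1, x2, x3):
    # Model assumed from the substitutions in parts a and b.
    return 1 + 2 * x1 + x2 - 3 * x3

def line_for(x2, x3):
    """Return (intercept, slope) of E(y) as a function of x1 with x2 and x3 fixed."""
    intercept = expected_y(0, x2, x3)
    slope = expected_y(1, x2, x3) - intercept
    return intercept, slope

line_a = line_for(x2=1, x3=3)    # part a
line_b = line_for(x2=-1, x3=1)   # part b
print(line_a, line_b)
```

Both lines share the slope on \(x_1\) and differ only in the intercept, which is what makes them parallel.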
d.
a.
Yes. Since \(R^2 = .92\) is close to 1, this indicates the model provides a good fit. Without knowledge of
the units of the dependent variable, the value of SSE cannot be used to determine how well the model
fits.
b.
11.7
H0: \(\beta_1 = \beta_2 = \cdots = \beta_5 = 0\)
Ha: At least one of the parameters is not 0
The test statistic is \(F = \dfrac{R^2/k}{(1 - R^2)/[n - (k + 1)]} = \dfrac{.92/5}{(1 - .92)/[30 - (5 + 1)]} = 55.2\)
The rejection region requires \(\alpha = .05\) in the upper tail of the F distribution with \(\nu_1 = k = 5\) and \(\nu_2 = n - (k + 1) = 30 - (5 + 1) = 24\). From Table VIII, Appendix B, \(F_{.05} = 2.62\). The rejection region is \(F > 2.62\).
Since the observed value of the test statistic falls in the rejection region (\(F = 55.2 > 2.62\)), H0 is
rejected. There is sufficient evidence to indicate the model is useful in predicting y at \(\alpha = .05\).
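The global F statistic depends only on \(R^2\), n, and k, so it is easy to recompute. The following Python sketch uses a hypothetical function name and takes the critical value from the table lookup quoted above rather than computing it.

```python
def global_f_statistic(r_squared, n, k):
    """F statistic for H0: all k slope parameters equal 0, computed from R^2."""
    numerator = r_squared / k
    denominator = (1 - r_squared) / (n - (k + 1))
    return numerator / denominator

f_stat = global_f_statistic(0.92, n=30, k=5)
reject = f_stat > 2.62   # tabled F_.05 with 5 and 24 degrees of freedom
print(round(f_stat, 1), reject)
```

The same function reproduces the other \(R^2\)-based F statistics in this chapter when given the corresponding \(R^2\), n, and k.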
11.8
No. There may be other independent variables that are important that have not been included in the model,
while there may also be some variables included in the model which are not important. The only
conclusion is that at least one of the independent variables is a good predictor of y.
11.9
a.
b.
\(R^2 = .13\). 13% of the total sample variation of the accountants' Mach scores around their mean is
explained by the model containing age, gender, education, and income.
c.
11.10
a.
The two properties are that the sum of the errors of prediction is 0 and the sum of the squares of the
errors of prediction is SSE.
b.
\(\hat{\beta}_4 = .42\). For each unit change in the betweenness centrality score, the mean lead-user rating is
estimated to increase by .42, holding all other variables constant.
c.
Since the p-value is less than \(\alpha\) (p = .002 < .05), H0 is rejected. There is sufficient evidence to indicate
that there is a significant linear relationship between betweenness centrality and lead-user rating,
holding all other variables constant.
11.11
a.
b.
\(\hat{\beta}_0 = 1.81231\). Since \(x_1 = 0\) and \(x_2 = 0\) are not in the observed range, \(\hat{\beta}_0\) has no meaning.
\(\hat{\beta}_1 = 0.10875\). For each additional mile of roadway length, the mean number of crashes per three
years is estimated to increase by .10875 when average annual daily traffic is held constant.
\(\hat{\beta}_2 = 0.00017\). For each additional unit increase in average annual daily traffic, the mean number of
crashes per three years is estimated to increase by .00017 when miles of roadway length is held
constant.
c.
For confidence coefficient .99, \(\alpha = .01\) and \(\alpha/2 = .01/2 = .005\). From Table V,
Appendix B, with df = \(n - (k + 1) = 100 - (2 + 1) = 97\), \(t_{.005} \approx 2.63\). The 99% confidence interval is:
(0.02548, 0.19202)
We are 99% confident that the increase in the mean number of crashes per three years will be
between 0.02548 and 0.19202 for each additional mile of roadway length, holding average annual
daily traffic constant.
d.
(0.00009, 0.00025)
We are 99% confident that the increase in the mean number of crashes per three years will be
between 0.00009 and 0.00025 for each additional unit increase in average annual daily traffic,
holding mile of roadway length constant.
e.
\(\hat{\beta}_0 = 1.20785\). Since \(x_1 = 0\) and \(x_2 = 0\) are not in the observed range, \(\hat{\beta}_0\) has no meaning.
\(\hat{\beta}_1 = 0.06343\). For each additional mile of roadway length, the mean number of crashes per three
years is estimated to increase by 0.06343 when average annual daily traffic is held constant.
\(\hat{\beta}_2 = 0.00056\). For each additional unit increase in average annual daily traffic, the mean number of
crashes per three years is estimated to increase by 0.00056 when miles of roadway length is held
constant.
The 99% confidence interval is:
(0.01585, 0.11101)
We are 99% confident that the increase in the mean number of crashes per three years will be
between 0.01585 and 0.11101 for each additional mile of roadway length, holding average annual
daily traffic constant.
The 99% confidence interval is:
(0.00024, 0.00088)
We are 99% confident that the increase in the mean number of crashes per three years will be
between 0.00024 and 0.00088 for each additional unit increase in average annual daily traffic,
holding mile of roadway length constant.
a.
b.
R2 = .58. 58% of the total sample variation of the levels of trust is explained by the model containing
the 5 independent variables.
c.
d.
11.12
The rejection region requires \(\alpha = .10\) in the upper tail of the F-distribution with \(\nu_1 = k = 5\) and
\(\nu_2 = n - (k + 1) = 66 - (5 + 1) = 60\). From Table VII, Appendix B, \(F_{.10} = 1.90\). The rejection region
is \(F > 1.96\).
The test statistic is \(F = \dfrac{R^2/k}{(1 - R^2)/[n - (k + 1)]} = \dfrac{.58/5}{(1 - .58)/[66 - (5 + 1)]} = 16.57\)
Since the observed value of the test statistic falls in the rejection region (\(F = 16.57 > 1.96\)), H0 is
rejected. There is sufficient evidence to indicate that at least one of the 5 independent variables is
useful in the prediction of level of trust at \(\alpha = .10\).
11.13
a.
\(\hat{\beta}_1 = 2.006\). For each unit increase in the proportion of block with low-density residential areas, the
mean population density is estimated to increase by 2.006, holding proportion of block with high-density residential areas constant. Since \(x_1\) is a proportion, it is unlikely that it can increase by one
unit. A better interpretation is: For each increase of .1 in the proportion of block with low-density
residential areas, the mean population density is estimated to increase by .2006, holding proportion of
block with high-density residential areas constant.
\(\hat{\beta}_2 = 5.006\). For each unit increase in the proportion of block with high-density residential areas, the
mean population density is estimated to increase by 5.006, holding proportion of block with low-density residential areas constant. Since \(x_2\) is a proportion, it is unlikely that it can increase by one
unit. A better interpretation is: For each increase of .1 in the proportion of block with high-density
residential areas, the mean population density is estimated to increase by .5006, holding proportion of
block with low-density residential areas constant.
b.
\(R^2 = .686\). 68.6% of the total sample variation of the population densities is explained by the linear
relationship between population density and the independent variables proportion of block with low-density residential areas and proportion of block with high-density residential areas.
c.
d.
The test statistic is \(F = \dfrac{R^2/k}{(1 - R^2)/[n - (k + 1)]} = \dfrac{.686/2}{(1 - .686)/[125 - (2 + 1)]} = 133.27\)
The rejection region requires \(\alpha = .01\) in the upper tail of the F distribution with \(\nu_1 = k = 2\) and \(\nu_2 = n - (k + 1) = 125 - (2 + 1) = 122\). From Table X, Appendix B, \(F_{.01} \approx 4.79\). The rejection region is
\(F > 4.79\).
Since the observed value of the test statistic falls in the rejection region (\(F = 133.27 > 4.79\)), H0 is
rejected. There is sufficient evidence to indicate the model is adequate at \(\alpha = .01\).
e.
11.14
a.
\(\hat{y} = 3.70 + .34x_1 + .49x_2 + .72x_3 + 1.14x_4 + 1.51x_5 + .26x_6 - .14x_7 - .10x_8 - .10x_9\).
b.
\(\hat{\beta}_0 = 3.70\). This is the estimate of the y-intercept. It has no other meaning because the point with all
independent variables equal to 0 is not in the observed range.
\(\hat{\beta}_1 = 0.34\). For each additional walk, the mean number of runs scored is estimated to increase by
.34, holding all other variables constant.
\(\hat{\beta}_2 = 0.49\). For each additional single, the mean number of runs scored is estimated to increase by
.49, holding all other variables constant.
\(\hat{\beta}_3 = 0.72\). For each additional double, the mean number of runs scored is estimated to increase by
.72, holding all other variables constant.
\(\hat{\beta}_4 = 1.14\). For each additional triple, the mean number of runs scored is estimated to increase by
1.14, holding all other variables constant.
\(\hat{\beta}_5 = 1.51\). For each additional home run, the mean number of runs scored is estimated to increase
by 1.51, holding all other variables constant.
\(\hat{\beta}_6 = 0.26\). For each additional stolen base, the mean number of runs scored is estimated to increase
by .26, holding all other variables constant.
\(\hat{\beta}_7 = -0.14\). For each additional time a runner is caught stealing, the mean number of runs scored is
estimated to decrease by .14, holding all other variables constant.
\(\hat{\beta}_8 = -0.10\). For each additional strikeout, the mean number of runs scored is estimated to decrease
by .10, holding all other variables constant.
\(\hat{\beta}_9 = -0.10\). For each additional out, the mean number of runs scored is estimated to decrease by
.10, holding all other variables constant.
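To see how these estimates combine into a prediction, the fitted equation can be sketched in Python. The dictionary keys and the function name `predict_runs` are hypothetical, the negative signs on the last three coefficients are taken from the "decrease" interpretations above, and the example input is made up for illustration.

```python
# Estimated coefficients from parts a and b of this exercise.
INTERCEPT = 3.70
COEFFICIENTS = {
    "walks": 0.34,
    "singles": 0.49,
    "doubles": 0.72,
    "triples": 1.14,
    "home_runs": 1.51,
    "stolen_bases": 0.26,
    "caught_stealing": -0.14,  # sign assumed from the "decrease" wording
    "strikeouts": -0.10,       # sign assumed from the "decrease" wording
    "outs": -0.10,             # sign assumed from the "decrease" wording
}

def predict_runs(team_stats):
    """Return the predicted number of runs for a dict of counts (missing keys count as 0)."""
    return INTERCEPT + sum(
        COEFFICIENTS[name] * team_stats.get(name, 0) for name in COEFFICIENTS
    )

# Hypothetical input, not data from the exercise.
example = {"walks": 3, "singles": 6, "home_runs": 1, "strikeouts": 7, "outs": 20}
print(round(predict_runs(example), 2))
```

Adding one home run to any input raises the prediction by exactly 1.51, which is the "holding all other variables constant" interpretation in code form.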
c.
H0: \(\beta_7 = 0\)
Ha: \(\beta_7 < 0\)
The test statistic is \(t = \dfrac{\hat{\beta}_7 - 0}{s_{\hat{\beta}_7}} = \dfrac{-.14 - 0}{.14} = -1.00\)
The rejection region requires \(\alpha = .05\) in the lower tail of the t-distribution with df = \(n - (k + 1) = 234 - (9 + 1) = 224\). From Table V, Appendix B, \(t_{.05} = 1.645\). The rejection region is \(t < -1.645\).
Since the observed value of the test statistic does not fall in the rejection region
(\(t = -1.00 \not< -1.645\)), H0 is not rejected. There is insufficient evidence to indicate that the mean
number of runs decreases as the number of runners caught stealing increases, holding all other
variables constant, at \(\alpha = .05\).
d.
For confidence level .95, \(\alpha = .05\) and \(\alpha/2 = .05/2 = .025\). From Table V, Appendix
B, with df = 224, \(t_{.025} = 1.96\). The 95% confidence interval is:
We are 95% confident that the mean number of runs will increase by anywhere from 1.412 to 1.608
for each additional home run, holding all other variables constant.
11.15
a.
b.
Since the p-value is less than \(\alpha\) (p = .005 < .01), H0 is rejected. There is sufficient evidence to
indicate that there is a negative linear relationship between change from routine and the number of
years played golf, holding number of rounds of golf per year, total number of golf vacations, and
average golf score constant.
c.
The statement would be correct if the independent variables are not correlated. However, if the
independent variables are correlated, then this interpretation would not necessarily hold.
d.
e.
For all dependent variables, the rejection region requires \(\alpha = .01\) in the upper tail of the
F-distribution with \(\nu_1 = k = 4\) and \(\nu_2 = n - (k + 1) = 393 - (4 + 1) = 388\). From Table X, Appendix
B, \(F_{.01} \approx 3.32\). The rejection region is \(F > 3.32\). Using MINITAB, the exact \(F_{.01,\,4,\,388}\) is 3.67. The
true rejection region is \(F > 3.67\).
f.
For Thrill: Since the observed value of the test statistic falls in the rejection region
(\(F = 5.56 > 3.67\)), H0 is rejected. There is sufficient evidence to indicate at least one of the 4
independent variables is linearly related to Thrill at \(\alpha = .01\).
For Change from Routine: Since the observed value of the test statistic does not fall in the rejection
region (\(F = 3.02 \not> 3.67\)), H0 is not rejected. There is insufficient evidence to indicate at least one of
the 4 independent variables is linearly related to Change from Routine at \(\alpha = .01\).
g.
For Thrill: Since the p-value is less than \(\alpha\) (p < .001 < .01), H0 is rejected. There is sufficient evidence
to indicate that at least one of the independent variables is linearly related to Thrill at \(\alpha = .01\).
For Change from Routine: Since the p-value is not less than \(\alpha\) (p = .018 > .01), H0 is not rejected.
There is insufficient evidence to indicate that at least one of the independent variables is linearly
related to Change from Routine at \(\alpha = .01\).
For Surprise: Since the p-value is not less than \(\alpha\) (p = .011 > .01), H0 is not rejected. There is
insufficient evidence to indicate that at least one of the independent variables is linearly related to
Surprise at \(\alpha = .01\).
h.
For Thrill: R2 = .055. 5.5% of the total variability around the mean thrill values can be explained by
the model containing the 4 independent variables: x1 = number of rounds of golf per year, x2 = total
number of golf vacations taken, x3 = number of years played golf, and x4 = average golf score.
For Change from Routine: R2 = .030. 3.0% of the total variability around the mean change from
routine values can be explained by the model containing the 4 independent variables: x1 = number of
rounds of golf per year, x2 = total number of golf vacations taken, x3 = number of years played golf,
and x4 = average golf score.
For Surprise: R2 = .023. 2.3% of the total variability around the mean surprise values can be
explained by the model containing the 4 independent variables: x1 = number of rounds of golf per
year, x2 = total number of golf vacations taken, x3 = number of years played golf, and x4 = average
golf score.
11.16
a.
b.
Predictor       Coef  SE Coef      T      P
Constant      -86991    31218  -2.79  0.006
LATITUDE     -2220.1    526.8  -4.21  0.000
LONGITUDE     1543.9    373.0   4.14  0.000
DEPTH-FT     -0.3493   0.1566  -2.23  0.026

S = 103.295   R-Sq = 12.8%   R-Sq(adj) = 12.0%

Analysis of Variance
Source            DF       SS      MS      F      P
Regression         3   506196  168732  15.81  0.000
Residual Error   323  3446366   10670
Total            326  3952562

Source      DF  Seq SS
LATITUDE     1  132506
LONGITUDE    1  320624
DEPTH-FT     1   53066

The least squares model is: \(\hat{y} = -86{,}991 - 2{,}220.1\,\text{latitude} + 1{,}543.9\,\text{longitude} - .3493\,\text{depth}\)
c.
\(\hat{\beta}_1 = -2{,}220.1\). For each unit increase in latitude, the mean arsenic level is estimated to decrease by
2,220.1, holding longitude and depth constant.
\(\hat{\beta}_2 = 1{,}543.9\). For each unit increase in longitude, the mean arsenic level is estimated to increase by
1,543.9, holding latitude and depth constant.
\(\hat{\beta}_3 = -.3493\). For each unit increase in depth, the mean arsenic level is estimated to decrease by
.3493, holding latitude and longitude constant.
d.
From the printout, s = 103.295. We would expect about 95% of all observations to fall within 2s =
2(103.295) = 206.590 units of their predicted values.
e.
From the printout, \(R^2\) = 12.8%. 12.8% of the total sample variation of the arsenic levels is explained
by the model containing latitude, longitude, and depth.
From the printout, \(R_a^2\) = 12.0%. 12.0% of the total sample variation of the arsenic levels is explained
by the model containing latitude, longitude, and depth, adjusting for the sample size and number of
independent variables in the model.
f.
g.
11.17
Although the model was found to be adequate for \(\alpha = .05\), it is not a particularly good model. The \(R^2\)
value is only 12.8% and \(R_a^2\) = 12.0%. Only about 12% of the variation in arsenic values is explained
by the model.
a.
b.
Predictor     Coef  SE Coef      T      P
Constant     -20.4    652.7  -0.03  0.976
Age         13.350    7.672   1.74  0.107
Hours       243.71    63.51   3.84  0.002

R-Sq = 58.2%   R-Sq(adj) = 51.3%

Analysis of Variance
Source           DF       SS       MS     F      P
Regression        2  5018232  2509116  8.36  0.005
Residual Error   12  3600196   300016
Total            14  8618428

Source  DF   Seq SS
Age      1   600498
Hours    1  4417734

Unusual Observations
Obs   Age  Earnings   Fit  SE Fit  Residual  St Resid
  4  18.0      1552  2657     205     -1105    -2.18R
\(\hat{\beta}_0 = -20.4\). This has no meaning since \(x_1 = 0\) and \(x_2 = 0\) are not in the observed range.
\(\hat{\beta}_1 = 13.350\). For each additional year of age, the mean annual earnings is predicted to increase by
$13.350, holding hours worked per day constant.
\(\hat{\beta}_2 = 243.71\). For each additional hour worked per day, the mean annual earnings is predicted to
increase by $243.71, holding age constant.
d.
e.
For confidence coefficient .95, \(\alpha = .05\) and \(\alpha/2 = .05/2 = .025\). From Table V, Appendix B, with
df = \(n - (k + 1) = 15 - (2 + 1) = 12\), \(t_{.025} = 2.179\). The 95% confidence interval is:
(105.322, 382.098)
We are 95% confident that the change in the mean annual earnings for each additional hour worked
per day will be somewhere between $105.322 and $382.098, holding age constant.
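The interval above follows the usual form, estimate plus or minus the critical t value times the standard error. A minimal Python sketch, with a hypothetical function name and with the estimate (243.71) and standard error (63.51) read from the Hours row of the printout, reproduces the endpoints:

```python
def coefficient_confidence_interval(estimate, std_error, t_critical):
    """Return (lower, upper) for estimate +/- t_critical * std_error."""
    margin = t_critical * std_error
    return estimate - margin, estimate + margin

# t_.025 with 12 degrees of freedom is the tabled value quoted in the text.
lower, upper = coefficient_confidence_interval(243.71, 63.51, t_critical=2.179)
print(round(lower, 3), round(upper, 3))
```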
f.
From the printout, \(R^2\) = R-Sq = 58.2% or .582. 58.2% of the total sample variation of annual earnings
is explained by the model containing age and hours worked per day.
g.
\(R_a^2\) = R-Sq(adj) = 51.3% or .513. 51.3% of the total sample variation of annual earnings is
explained by the model containing age and hours worked per day, adjusted for the sample size and
the number of parameters in the model.
h.
To determine if at least one of the variables is useful in predicting the annual earnings, we test:
H0: \(\beta_1 = \beta_2 = 0\)
Ha: At least one \(\beta_i \neq 0\)
The test statistic is F = 8.36 and the p-value is p = .005. Since the p-value is less than
\(\alpha = .01\) (p = .005 < .01), H0 is rejected. There is sufficient evidence to indicate at least one of the
variables is useful in predicting the annual earnings at \(\alpha = .01\).
11.18
a.
     Coef  SE Coef      T      P
  -108.07    62.70  -1.72  0.087
  0.08509  0.08221   1.03  0.302
    3.771    1.619   2.33  0.021
 -0.04941  0.02926  -1.69  0.094

S = 97.48   R-Sq = 3.9%   R-Sq(adj) = 1.8%

Analysis of Variance
Source           DF       SS     MS     F      P
Regression        3    53794  17931  1.89  0.135
Residual Error  140  1330210   9501
Total           143  1384003
s = 97.48. We would expect about 95% of the observed values of DDT level to fall within 2s or
2(97.48) = 194.96 units of their least squares predicted values.
c.
To determine if at least one of the variables is useful in predicting the DDT level, we test:
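H0: \(\beta_1 = \beta_2 = \beta_3 = 0\)
Ha: At least one \(\beta_i \neq 0\)
The test statistic is F = 1.89 and the p-value is p = .135. Since the p-value is not less than \(\alpha = .05\)
(\(p = .135 \not< .05\)), H0 is not rejected. There is insufficient evidence to indicate at least one of the
variables is useful in predicting the DDT level at \(\alpha = .05\).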
d.
e.
For confidence coefficient .95, \(\alpha = .05\) and \(\alpha/2 = .05/2 = .025\). From Table V, Appendix B, with
df = \(n - (k + 1) = 144 - 4 = 140\), \(t_{.025} = 1.96\). The 95% confidence interval is:
\((-0.10676, 0.00794)\)
We are 95% confident that the change in the mean DDT level will be between \(-0.10676\) and 0.00794 for each
additional unit increase in weight, holding length and mile constant. Since 0 is in the interval, there
is no evidence that weight and DDT level are linearly related.
11.19
a.
b.
Predictor       Coef  SE Coef      T      P
Constant     13614.5    870.0  15.65  0.000
RPM          0.08879  0.01391   6.38  0.000
INLET-TEMP    -9.201    1.499  -6.14  0.000
EXH-TEMP      14.394    3.461   4.16  0.000
CPRATIO         0.35    29.56   0.01  0.991
AIRFLOW      -0.8480   0.4421  -1.92  0.060

S = 458.828   R-Sq = 92.4%   R-Sq(adj) = 91.7%

Analysis of Variance
Source           DF         SS        MS       F      P
Regression        5  155055273  31011055  147.30  0.000
Residual Error   61   12841935    210524
Total            66  167897208

Source      DF     Seq SS
RPM          1  119598530
INLET-TEMP   1   26893467
EXH-TEMP     1    7784225
CPRATIO      1       4623
AIRFLOW      1     774427

Unusual Observations
Obs    RPM  HEATRATE      Fit  SE Fit  Residual  St Resid
 11  18000   14628.0  13214.0   117.9    1414.0    3.19R
 32  14950   10656.0  11663.0   132.5   -1007.0   -2.29R
 36   4473   13523.0  12489.5   195.1    1033.5    2.49R
 47   7280   11588.0  10533.0   154.7    1055.0    2.44R
 61  33000   16243.0  15758.0   246.5     485.0    1.25 X
 64   3600    8714.0   8415.2   340.9     298.8    0.97 X
\(\hat{\beta}_0 = 13{,}614.5\). Since 0 is not within the range of all the independent variables, this value has no
meaning.
\(\hat{\beta}_1 = 0.0888\). For each unit increase in RPM, the mean heat rate is estimated to increase by .0888,
holding all the other 4 variables constant.
\(\hat{\beta}_2 = -9.201\). For each unit increase in inlet temperature, the mean heat rate is estimated to decrease
by 9.201, holding all the other 4 variables constant.
\(\hat{\beta}_3 = 14.394\). For each unit increase in exhaust temperature, the mean heat rate is estimated to
increase by 14.394, holding all the other 4 variables constant.
\(\hat{\beta}_4 = 0.35\). For each unit increase in cycle pressure ratio, the mean heat rate is estimated to increase
by 0.35, holding all the other 4 variables constant.
\(\hat{\beta}_5 = -0.8480\). For each unit increase in air flow rate, the mean heat rate is estimated to decrease by
.848, holding all the other 4 variables constant.
d.
From the printout, s = 458.828. We would expect to see most of the heat rate values within 2s or
2(458.828) = 917.656 units of their least squares predicted values.
e.
To determine if at least one of the variables is useful in predicting the heat rate values, we test:
H0: \(\beta_1 = \beta_2 = \beta_3 = \beta_4 = \beta_5 = 0\)
Ha: At least one \(\beta_i \neq 0\)
The test statistic is F = 147.30 and the p-value is p = .000. Since the p-value is less than \(\alpha = .01\) (p =
.000 < .01), H0 is rejected. There is sufficient evidence to indicate at least one of the variables is
useful in predicting the heat rate values at \(\alpha = .01\).
f.
\(R_a^2\) = R-Sq(adj) = 91.7% or .917. 91.7% of the total sample variation of the heat rate values is
explained by the model containing the 5 independent variables, adjusted for the sample size and the number of parameters in the model.
g.
To determine if there is evidence to indicate heat rate is linearly related to inlet temperature, we test:
H0: \(\beta_2 = 0\)
Ha: \(\beta_2 \neq 0\)
The test statistic is t = -6.14 and the p-value is p = 0.000. Since the p-value is less than \(\alpha = .01\) (p =
.000 < .01), H0 is rejected. There is sufficient evidence to indicate heat rate is linearly related to inlet
temperature at \(\alpha = .01\).
11.20
a.
b.
Predictor       Coef     StDev      T      P
Constant      0.9981    0.2475   4.03  0.002
x1         -0.022429  0.005039  -4.45  0.001
x2           0.15571   0.07429   2.10  0.060
x3          -0.01719   0.01186  -1.45  0.175
x4         -0.009527  0.009619  -0.99  0.343
x5            0.4214    0.1008   4.18  0.002
x6            0.4171    0.4377   0.95  0.361
x7           -0.1552    0.1486  -1.04  0.319

S = 0.4365   R-Sq = 77.1%   R-Sq(adj) = 62.5%

Analysis of Variance
Source           DF       SS      MS     F      P
Regression        7   7.9578  1.1368  5.29  0.007
Residual Error   11   2.3632  0.2148
Total            18  10.3210

Source  DF  Seq SS
x1       1  1.4016
x2       1  1.9263
x3       1  0.1171
x4       1  0.0446
x5       1  4.0771
x6       1  0.1565
x7       1  0.2345

Unusual Observations
Obs    x1      y
 14  80.0  0.120
\(\hat{\beta}_1 = -.0224\). We estimate that the mean voltage will decrease by .0224 kw/cm for each additional
increase of 1% of \(x_1\), the disperse phase volume (with all other variables held constant).
\(\hat{\beta}_2 = .1557\). We estimate that the mean voltage will increase by .1557 kw/cm for each additional
increase of 1% of \(x_2\), the salinity (with all other variables held constant).
\(\hat{\beta}_3 = -.0172\). We estimate that the mean voltage will decrease by .0172 kw/cm for each additional
increase of 1 degree of \(x_3\), the temperature in Celsius (with all other variables held constant).
\(\hat{\beta}_4 = -.0095\). We estimate that the mean voltage will decrease by .0095 kw/cm for each additional
increase of 1 hour of \(x_4\), the time delay (with all other variables held constant).
\(\hat{\beta}_5 = .4214\). We estimate that the mean voltage will increase by .4214 kw/cm for each additional
increase of 1% of \(x_5\), surfactant concentration (with all other variables held constant).
\(\hat{\beta}_6 = .4171\). We estimate that the mean voltage will increase by .4171 kw/cm for each additional
increase of 1 unit of \(x_6\), span: Triton (with all other variables held constant).
\(\hat{\beta}_7 = -.1552\). We estimate that the mean voltage will decrease by .1552 kw/cm for each additional
increase of 1% of \(x_7\), the solid particles (with all other variables held constant).
d.
11.21
a.
b.
R2 = .362. 36.2% of the variability in the AC scores can be explained by the model containing the
variables self-esteem score, optimism score, and group cohesion score.
To test the utility of the model, we test:
H0: \(\beta_1 = \beta_2 = \beta_3 = 0\)
Ha: At least one \(\beta_i \neq 0\), i = 1, 2, 3
The test statistic is:
\(F = \dfrac{R^2/k}{(1 - R^2)/[n - (k + 1)]} = \dfrac{.362/3}{(1 - .362)/[31 - (3 + 1)]} = 5.11\)
The rejection region requires \(\alpha = .05\) in the upper tail of the F distribution with \(\nu_1 = k = 3\) and \(\nu_2 = n - (k + 1) = 31 - (3 + 1) = 27\). From Table VIII, Appendix B, \(F_{.05} = 2.96\). The rejection region is \(F > 2.96\).
Since the observed value of the test statistic falls in the rejection region (\(F = 5.11 > 2.96\)), H0 is
rejected. There is sufficient evidence that the model is useful in predicting AC score at \(\alpha = .05\).
11.22
The test statistic is \(F = \dfrac{R^2/k}{(1 - R^2)/[n - (k + 1)]} = \dfrac{.95/18}{(1 - .95)/[20 - (18 + 1)]} = 1.06\)
The rejection region requires \(\alpha = .05\) in the upper tail of the F distribution with \(\nu_1 = k = 18\) and \(\nu_2 = n - (k + 1) = 20 - (18 + 1) = 1\). From Table VIII, Appendix B, \(F_{.05} \approx 245.9\). The rejection region is \(F > 245.9\).
Since the observed value of the test statistic does not fall in the rejection region (\(F = 1.06 \not> 245.9\)), H0 is not rejected.
a.
Model 1:
H0: \(\beta_1 = 0\)
Ha: \(\beta_1 \neq 0\)
The test statistic is \(t = \dfrac{\hat{\beta}_1 - 0}{s_{\hat{\beta}_1}} = \dfrac{.0354}{.0137} = 2.58\).
Since no \(\alpha\) was given, we will use \(\alpha = .05\). The rejection region requires \(\alpha/2 = .05/2 = .025\) in each
tail of the t distribution. From Table V, Appendix B, with df = \(n - (k + 1) = 12 - (1 + 1) = 10\), \(t_{.025} = 2.228\). The rejection region is \(t < -2.228\) or \(t > 2.228\).
Since the observed value of the test statistic falls in the rejection region (\(t = 2.58 > 2.228\)), H0 is
rejected. There is sufficient evidence to indicate that there is a linear relationship between vintage
and the logarithm of price.
Model 2:
H0: \(\beta_1 = 0\)
Ha: \(\beta_1 \neq 0\)
The test statistic is \(t = \dfrac{\hat{\beta}_1 - 0}{s_{\hat{\beta}_1}} = \dfrac{.0238}{.00717} = 3.32\)
Since no \(\alpha\) was given, we will use \(\alpha = .05\). The rejection region requires \(\alpha/2 = .05/2 = .025\) in each
tail of the t distribution. From Table V, Appendix B, with df = \(n - (k + 1) = 12 - (4 + 1) = 7\), \(t_{.025} = 2.365\). The rejection region is \(t < -2.365\) or \(t > 2.365\).
Since the observed value of the test statistic falls in the rejection region (\(t = 3.32 > 2.365\)), H0 is
rejected. There is sufficient evidence to indicate that there is a linear relationship between vintage
and the logarithm of price, adjusting for all other variables.
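H0: \(\beta_2 = 0\)
Ha: \(\beta_2 \neq 0\)
The test statistic is \(t = \dfrac{\hat{\beta}_2 - 0}{s_{\hat{\beta}_2}} = \dfrac{.616}{.0952} = 6.47\)
For \(\beta_3\), the test statistic is \(t = \dfrac{\hat{\beta}_3 - 0}{s_{\hat{\beta}_3}} = \dfrac{-.00386}{.00081} = -4.77\)
For \(\beta_4\), the test statistic is \(t = \dfrac{\hat{\beta}_4 - 0}{s_{\hat{\beta}_4}} = \dfrac{.0001173}{.000482} = 0.24\).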
H0 is not rejected. There is insufficient evidence to indicate that there is a linear relationship between
rainfall in months preceding vintage and the logarithm of price, adjusting for all other variables.
Model 3:
H0: \(\beta_1 = 0\)
Ha: \(\beta_1 \neq 0\)
The test statistic is \(t = \dfrac{\hat{\beta}_1 - 0}{s_{\hat{\beta}_1}} = \dfrac{.0240}{.00747} = 3.21\)
Since no \(\alpha\) was given, we will use \(\alpha = .05\). The rejection region requires \(\alpha/2 = .05/2 = .025\) in each
tail of the t distribution. From Table V, Appendix B, with df = \(n - (k + 1) = 12 - (5 + 1) = 6\), \(t_{.025} = 2.447\). The rejection region is \(t < -2.447\) or \(t > 2.447\).
Since the observed value of the test statistic falls in the rejection region (\(t = 3.21 > 2.447\)), H0 is
rejected. There is sufficient evidence to indicate that there is a linear relationship between vintage
and the logarithm of price, adjusting for all other variables.
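H0: \(\beta_2 = 0\)
Ha: \(\beta_2 \neq 0\)
The test statistic is \(t = \dfrac{\hat{\beta}_2 - 0}{s_{\hat{\beta}_2}} = \dfrac{.608}{.116} = 5.24\).
For \(\beta_3\), the test statistic is \(t = \dfrac{\hat{\beta}_3 - 0}{s_{\hat{\beta}_3}} = \dfrac{-.00380}{.00095} = -4.00\)
For \(\beta_4\), the test statistic is \(t = \dfrac{\hat{\beta}_4 - 0}{s_{\hat{\beta}_4}} = \dfrac{.00115}{.000505} = 2.28\)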
H0 is not rejected. There is insufficient evidence to indicate that there is a linear relationship between
rainfall in months preceding vintage and the logarithm of price, adjusting for all other variables.
H0: \(\beta_5 = 0\)
Ha: \(\beta_5 \neq 0\)
The test statistic is \(t = \dfrac{\hat{\beta}_5 - 0}{s_{\hat{\beta}_5}} = \dfrac{.00765}{.0565} = 0.14\).
H0 is not rejected. There is insufficient evidence to indicate that there is a linear relationship between
average September temperature and the logarithm of price, adjusting for all other variables.
b.
Model 1:
\(\hat{\beta}_1 = .0354\), \(e^{.0354} - 1 = .036\)
We estimate that the mean price will increase by 3.6% for each additional increase of 1 unit of \(x_1\),
vintage year.
Model 2:
\(\hat{\beta}_1 = .0238\), \(e^{.0238} - 1 = .024\)
We estimate that the mean price will increase by 2.4% for each additional increase of 1 unit of \(x_1\),
vintage year (with all other variables held constant).
\(\hat{\beta}_2 = .616\), \(e^{.616} - 1 = .852\)
We estimate that the mean price will increase by 85.2% for each additional increase of 1 unit of \(x_2\),
average growing season temperature in °C (with all other variables held constant).
\(\hat{\beta}_3 = -.00386\), \(e^{-.00386} - 1 = -.004\)
We estimate that the mean price will decrease by .4% for each additional increase of 1 unit of \(x_3\),
Sept./Aug. rainfall in cm (with all other variables held constant).
\(\hat{\beta}_4 = .0001173\), \(e^{.0001173} - 1 = .0001\)
We estimate that the mean price will increase by .01% for each additional increase of 1 unit of \(x_4\),
rainfall in months preceding vintage in cm (with all other variables held constant).
Model 3:
\(\hat{\beta}_1 = .0240\), \(e^{.0240} - 1 = .024\)
We estimate that the mean price will increase by 2.4% for each additional increase of 1 unit of \(x_1\),
vintage year (with all other variables held constant).
\(\hat{\beta}_2 = .608\), \(e^{.608} - 1 = .837\)
We estimate that the mean price will increase by 83.7% for each additional increase of 1 unit of \(x_2\),
average growing season temperature in °C (with all other variables held constant).
\(\hat{\beta}_3 = -.00380\), \(e^{-.00380} - 1 = -.004\)
We estimate that the mean price will decrease by .4% for each additional increase of 1 unit of \(x_3\),
Sept./Aug. rainfall in cm (with all other variables held constant).
\(\hat{\beta}_4 = .00115\), \(e^{.00115} - 1 = .001\)
We estimate that the mean price will increase by .1% for each additional increase of 1 unit of
\(x_4\), rainfall in months preceding vintage in cm (with all other variables held constant).
\(\hat{\beta}_5 = .00765\), \(e^{.00765} - 1 = .008\)
We estimate that the mean price will increase by .8% for each additional increase of 1 unit of
\(x_5\), average Sept. temperature in °C (with all other variables held constant).
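Because the dependent variable is the logarithm of price, each estimate is converted to a percentage change in mean price through \(e^{\hat{\beta}} - 1\). A minimal Python sketch of that conversion follows; the function name is hypothetical, and the negative sign on the rainfall coefficient is assumed from the "decrease" interpretation above.

```python
import math

def percent_change(beta_hat):
    """Percentage change in mean price per 1-unit increase in x when ln(price) is modeled."""
    return (math.exp(beta_hat) - 1) * 100

# Model 2 estimates from part b; the sign of the x3 estimate is assumed negative.
for name, beta_hat in [("x1", 0.0238), ("x2", 0.616), ("x3", -0.00386), ("x4", 0.0001173)]:
    print(name, round(percent_change(beta_hat), 2))
```

For small estimates the percentage change is close to 100 times the estimate itself, which is why the x1, x3, and x4 results nearly match their coefficients.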
c.
11.24
I would recommend model 2. Model 1 has only 1 independent variable in the model and it is
significant at \(\alpha = .05\). The \(R^2\) for this model is \(R^2 = .212\) and s = .575. Model 2 has 4 independent
variables in the model and all terms are significant at \(\alpha = .05\) except one. This one variable is
significant at \(\alpha = .10\). This model has \(R^2 = .828\) and s = .287. Comparing model 2 to model 1, the \(R^2\)
for model 2 is much larger than that for model 1 and the estimate of the standard deviation is much
smaller. Model 3 contains all of the independent variables that model 2 has plus one additional
variable. This additional variable is not significant at \(\alpha = .10\). In addition, the \(R^2\) for this new model
is .828, the same as for model 2. However, the estimate of the standard deviation of model 3 is now
larger than that of model 2. This indicates that model 2 is better than model 3.
a.
Predictor   Coef      SE Coef   T       P
Constant    131.92    25.69     5.13    0.000
Pounds      2.726     2.275     1.20    0.248
Units       0.04722   0.09335   0.51    0.620
Weight      -2.5874   0.6428    -4.03   0.001

S = 9.810   R-Sq = 77.0%   R-Sq(adj) = 72.7%

Analysis of Variance
Source           DF   SS       MS       F       P
Regression       3    5158.3   1719.4   17.87   0.000
Residual Error   16   1539.9   96.2
Total            19   6698.2

Source   DF   Seq SS
Pounds   1    3400.6
Units    1    198.4
Weight   1    1559.3
$F = \frac{MSR}{MSE} = \frac{1719.4}{96.2} = 17.87$
Chapter 11
The rejection region requires $\alpha = .01$ in the upper tail of the F-distribution with $\nu_1 = k = 3$ and $\nu_2 = n - (k + 1)$
$= 20 - (3 + 1) = 16$. From Table VIII, Appendix B, $F_{.01} = 5.29$. The rejection region is F >
5.29.
Since the observed value of the test statistic falls in the rejection region (F = 17.87 > 5.29), H0 is
rejected. There is sufficient evidence to indicate a relationship exists between hours of labor and at
least one of the independent variables at $\alpha = .01$.
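The F statistic and R-Sq reported in the printout can be recomputed from the sums of squares. A minimal Python sketch, assuming the ANOVA values from the printout (SSR = 5158.3, SSE = 1539.9, n = 20, k = 3); the function name `anova_summary` is hypothetical:

```python
def anova_summary(ssr: float, sse: float, n: int, k: int) -> dict:
    """Global F statistic and R^2 from the regression and error sums of squares."""
    msr = ssr / k                    # mean square for regression
    mse = sse / (n - (k + 1))        # mean square for error
    return {"MSR": msr, "MSE": mse, "F": msr / mse, "R2": ssr / (ssr + sse)}

result = anova_summary(ssr=5158.3, sse=1539.9, n=20, k=3)
print({key: round(value, 3) for key, value in result.items()})
```

The critical value $F_{.01} = 5.29$ still has to come from a table or a statistics library; the sketch only reproduces the test statistic.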
c.
H0: $\beta_2 = 0$
Ha: $\beta_2 \neq 0$
The test statistic is t = .51. The p-value = .620. We reject H0 if p-value < $\alpha$. Since .620 > .05, do not
reject H0. There is insufficient evidence to indicate a relationship exists between hours of labor and
percentage of units shipped by truck, all other variables held constant, at $\alpha = .05$.
d.
e.
If the average number of pounds per shipment increases from 20 to 21, the estimated change in the mean
number of hours of labor is -2.587. Thus, it will cost $7.50(2.587) = $19.4025 less, if the variables
x1 and x2 are held constant.
f.
Since s = Standard Error = 9.81, we can estimate with a precision of approximately ±2s, or ±2(9.81) =
±19.62 hours.
g.
11.25
$R^2$ is printed as R-Sq. $R^2 = .770$. We conclude that 77% of the sample variation of the labor hours is
explained by the regression model, including the independent variables pounds shipped, percentage
of units shipped by truck, and weight.
No. Regression analysis only determines if variables are related. It cannot be used to determine
cause and effect.
a.
b.
11.26
You would look up the number of walks (x1), singles (x2), doubles (x3), triples (x4), home runs (x5), stolen
bases (x6), caught stealing (x7), strikeouts (x8), and outs (x9) for your favorite team. Then use the following
fitted regression line to predict the number of runs scored:
$\hat{y} = 3.70 + .34x_1 + .49x_2 + .72x_3 + 1.14x_4 + 1.51x_5 + .26x_6 - .14x_7 - .10x_8 - .10x_9$
11.27
The 95% prediction interval is (1,759.75, 4,275.38). We are 95% confident that the
actual annual earnings for a vendor who is 45 years old and who works 10 hours per day are between
$1,759.75 and $4,275.38.
b.
The 95% confidence interval is (2,620.25, 3,414.87). We are 95% confident that the true mean
annual earnings for vendors who are 45 years old and who work 10 hours per day is between
$2,620.25 and $3,414.87.
c.
11.28
a.
Yes. The prediction interval for the ACTUAL value of y is always wider than the confidence interval
for the MEAN value of y.
From the printout, the 90% prediction interval is (-143.218, 180.978). We are 90% confident that the
actual DDT level for a fish caught 300 miles upstream that is 40 centimeters long and weighs 800 grams
will be between -143.218 and 180.978. Since the DDT level cannot be negative, the interval would be
between 0 and 180.978.
11.29
The 95% prediction interval is (11,599.6, 13,665.5). We are 95% confident that the actual heat rate
will be between 11,599.6 and 13,665.5 when the RPM is 7,500, the inlet temperature is 1,000, the
exhaust temperature is 525, the cycle pressure ratio is 13.5 and the air flow rate is 10.
b.
The 95% confidence interval is (12,157.9, 13,107.1). We are 95% confident that the mean heat rate
will be between 12,157.9 and 13,107.1 when the RPM is 7,500, the inlet temperature is 1,000, the
exhaust temperature is 525, the cycle pressure ratio is 13.5 and the air flow rate is 10.
c.
11.30
a.
Yes. The confidence interval for the mean will always be narrower than the prediction interval for the
actual value. This is because two sources of error are involved in predicting an actual value and only
one is involved in estimating the mean. First, there is the error in locating the mean of the
distribution. Second, once the mean is located, the actual value can still vary around the mean.
Only the first source, the error in locating the mean, is involved when estimating the mean.
Fit      SE Fit   95% CI             95% PI
232.43   23.23    (186.73, 278.14)   (24.14, 440.73)X

LATITUDE   LONGITUDE   DEPTH-FT
23.8       90.7        25.0
From the printout, the 95% prediction interval is (24.14, 440.73). We are 95% confident that the actual
arsenic level will be between 24.14 and 440.73 when the latitude is 23.755, longitude is 90.662, and depth
is 25.
11.31
a.
Predictor   Coef       SE Coef   T       P
Constant    0.704      1.192     0.59    0.556
ARTenure    0.17957    0.08876   2.02    0.045
AR6Year     0.07285    0.07379   0.99    0.325
AveSal6     -0.11981   0.04238   -2.83   0.005

S = 8.89248   R-Sq = 11.2%   R-Sq(adj) = 9.6%

Analysis of Variance
Source           DF    SS         MS       F      P
Regression       3     1704.73    568.24   7.19   0.000
Residual Error   171   13522.03   79.08
Total            174   15226.76

Source     DF   Seq SS
ARTenure   1    994.79
AR6Year    1    78.02
AveSal6    1    631.93
The least squares prediction equation is: $\hat{y} = .704 + .180x_1 + .0729x_2 - .120x_3$
b.
c.
Fit      SE Fit   95% CI            95% PI
10.098   1.830    (6.486, 13.710)   (-7.822, 28.019)

ARTenure   AR6Year   AveSal6
40.0       32.0      1.00
The 95% prediction interval for the efficiency rating of a CEO with x1 = 40%, x2 = 32%, and x3 = $1
million is (-7.822, 28.019). We are 95% confident that the actual efficiency rating of a CEO with the
above values for the independent variables is between -7.822 and 28.019.
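The two intervals in the printout differ only in the standard error used. A minimal Python sketch, assuming the usual forms $\hat{y} \pm t \cdot SE_{fit}$ for the confidence interval and $\hat{y} \pm t\sqrt{s^2 + SE_{fit}^2}$ for the prediction interval. Fit, SE Fit, and S are taken from the printout; the critical value 1.974 is an assumed approximation of $t_{.025}$ with 171 df, and the function name is illustrative:

```python
import math

def ci_and_pi(fit: float, se_fit: float, s: float, t_crit: float):
    """Return (confidence interval for E(y), prediction interval for y)."""
    ci_half = t_crit * se_fit
    # The prediction interval adds the variance of an individual value around its mean.
    pi_half = t_crit * math.sqrt(s ** 2 + se_fit ** 2)
    return (fit - ci_half, fit + ci_half), (fit - pi_half, fit + pi_half)

# t_crit = 1.974 is an assumption, not a value given in the solution text.
ci, pi = ci_and_pi(fit=10.098, se_fit=1.830, s=8.89248, t_crit=1.974)
print("95% CI:", tuple(round(v, 3) for v in ci))
print("95% PI:", tuple(round(v, 3) for v in pi))
```

Because the prediction interval always carries the extra $s^2$ term, it is wider than the confidence interval at the same x-values.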
11.32
Predictor   Coef        StDev      T       P
Constant    0.9326      0.2482     3.76    0.002
x1          -0.024272   0.004900   -4.95   0.000
x2          0.14206     0.07573    1.88    0.080
x5          0.38457     0.09801    3.92    0.001

S = 0.4796   R-Sq = 66.6%   R-Sq(adj) = 59.9%

Analysis of Variance
Source           DF   SS        MS       F      P
Regression       3    6.8701    2.2900   9.95   0.001
Residual Error   15   3.4509    0.2301
Total            18   10.3210

Source   DF   Seq SS
x1       1    1.4016
x2       1    1.9263
x5       1    3.5422

Unusual Observations
Obs   x1     y       Fit     StDev Fit   Residual   St Resid
3     40.0   3.200   2.068   0.239       1.132      2.72R

StDev Fit   95.0% CI          95.0% PI
0.232       (-0.592, 0.396)   (-1.233, 1.038)
The 95% prediction interval is (-1.233, 1.038). We are 95% confident that the actual voltage is between
-1.233 and 1.038 kw/cm when the volume fraction of the disperse phase is at the high level (x1 = 80), the
salinity is at the low level (x2 = 1), and the amount of surfactant is at the low level (x5 = 2).
11.33
a.
Predictor   Coef        SE Coef     T       P
Constant    -3783       1205        -3.14   0.004
Capacity    0.0087490   0.0009035   9.68    0.000
Pressure    1.9265      0.6489      2.97    0.006
Type        3444.3      911.7       3.78    0.001
Drum        2093.4      305.6       6.85    0.000

S = 894.6   R-Sq = 90.3%   R-Sq(adj) = 89.0%

Analysis of Variance
Source           DF   SS          MS         F       P
Regression       4    230854854   57713714   72.11   0.000
Residual Error   31   24809761    800315
Total            35   255664615

Source     DF   Seq SS
Capacity   1    175007141
Pressure   1    490357
Type       1    17813091
Drum       1    37544266

Fit    SE Fit   95.0% CI       95.0% PI
1936   239      (1449, 2424)   (48, 3825)

Capacity   Pressure   Type   Drum
150000     500        1.00   0.000000
To determine if the model is useful for predicting the number of man-hours needed, we test:
H0: $\beta_1 = \beta_2 = \beta_3 = \beta_4 = 0$
Ha: At least one $\beta_i \neq 0$, i = 1, 2, 3, 4
The test statistic is F = 72.11 with p-value = .000. Since the p-value is less than $\alpha = .01$, we can
reject H0. There is sufficient evidence that the model is useful for predicting man-hours at $\alpha = .01$.
c.
11.35
a.
b.
11.34
a.
b.
c.
The lines are not parallel because interaction between x1 and x2 is present. Interaction between x1 and
x2 means that the effect of x2 on y depends on what level x1 takes on.
d.
e.
11.36
a.
$R^2 = 1 - \frac{SSE}{SS_{yy}} = 1 - \frac{21}{479} = .956$
$F = \frac{R^2/k}{(1 - R^2)/[n - (k + 1)]} = \frac{.956/3}{(1 - .956)/[32 - (3 + 1)]} = 202.8$
The rejection region requires $\alpha = .05$ in the upper tail of the F distribution, with $\nu_1 = k = 3$ and $\nu_2 = n - (k + 1)$
$= 32 - (3 + 1) = 28$. From Table VIII, Appendix B, $F_{.05} = 2.95$. The rejection region is F >
2.95.
Since the observed value of the test statistic falls in the rejection region (F = 202.8 > 2.95), H0 is
rejected. There is sufficient evidence that the model is adequate for predicting y at $\alpha = .05$.
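The global F statistic above depends only on $R^2$, n, and k, so the arithmetic is easy to check. A minimal Python sketch (the function name `f_from_r2` is illustrative only):

```python
def f_from_r2(r2: float, n: int, k: int) -> float:
    """Global F statistic computed from R^2, sample size n, and k predictors."""
    return (r2 / k) / ((1.0 - r2) / (n - (k + 1)))

# Values from this exercise: R^2 = .956, n = 32, k = 3.
print(round(f_from_r2(0.956, n=32, k=3), 1))
```

The same function reproduces the other F statistics in this chapter that are derived from $R^2$, for example `f_from_r2(0.91, 20, 2)` and `f_from_r2(0.12, 388, 2)`.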
c.
d.
$t = \frac{\hat{\beta}_3 - 0}{s_{\hat{\beta}_3}} = \frac{10}{4} = 2.5$
The rejection region requires $\alpha/2 = .05/2 = .025$ in each tail of the t distribution with df = n - (k + 1)
= 32 - (3 + 1) = 28. From Table V, Appendix B, $t_{.025} = 2.048$. The rejection region is t < -2.048 or t
> 2.048.
Since the observed value of the test statistic falls in the rejection region (t = 2.5 > 2.048), H0 is
rejected. There is sufficient evidence to indicate that x1 and x2 interact at $\alpha = .05$.
11.37
a.
The response surface is a twisted plane, since the equation contains an interaction term.
c.
d.
e.
f.
$t = \frac{\hat{\beta}_3}{s_{\hat{\beta}_3}} = \frac{-1.285}{.159} = -8.06$
The rejection region requires $\alpha/2 = .01/2 = .005$ in each tail of the t distribution with df = n - (k + 1)
= 15 - (3 + 1) = 11. From Table V, Appendix B, $t_{.005} = 3.106$. The rejection region is t < -3.106 or
t > 3.106.
Since the observed value of the test statistic falls in the rejection region (t = -8.06 < -3.106), H0 is
rejected. There is sufficient evidence to indicate that x1 and x2 interact at $\alpha = .01$.
11.38
a.
b.
To determine if consumer satisfaction and retailer interest interact to affect willingness to shop at
the retailer's shop in the future, we test:
H0: $\beta_3 = 0$
Ha: $\beta_3 \neq 0$
The test statistic is t = -3.09 and the p-value is p < .01. Since the p-value is less
than $\alpha$ (p < .01 < .05), H0 is rejected. There is sufficient evidence to indicate consumer satisfaction and retailer interest
interact to affect willingness to shop at the retailer's shop in the future at $\alpha = .05$.
c.
When $x_2 = 1$,
$\hat{y} = \hat{\beta}_0 + .426x_1 + .044x_2 - .157x_1x_2$
$= \hat{\beta}_0 + .044 + .269x_1$
Since no value is given for $\hat{\beta}_0$, we will use $\hat{\beta}_0 = 1$ for graphing purposes. Using MINITAB, a
graph might look like:
[Graph: YHAT versus X1 for x2 = 1]
When $x_2 = 7$,
$\hat{y} = \hat{\beta}_0 + .426x_1 + .044x_2 - .157x_1x_2$
$= \hat{\beta}_0 + .308 - .673x_1$
Since no value is given for $\hat{\beta}_0$, we will again use $\hat{\beta}_0 = 1$ for graphing purposes.
[Graph: YHAT versus X1 for x2 = 7]
d.
[Graph: YHAT versus X1 with both lines, x2 = 1 and x2 = 7]
Since the lines are not parallel, it indicates that interaction is present.
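The non-parallel lines can be checked numerically by collapsing the interaction model to a line in x1 for each fixed x2. A minimal Python sketch; the signs of the coefficients (+.426, +.044, -.157) are an assumption reconstructed from the reduced equations in part c, the intercept of 1 is the placeholder value used for graphing, and the function name is hypothetical:

```python
def line_for_x2(b0: float, b1: float, b2: float, b3: float, x2: float):
    """Intercept and slope (in x1) of y = b0 + b1*x1 + b2*x2 + b3*x1*x2 at fixed x2."""
    return b0 + b2 * x2, b1 + b3 * x2

# Assumed coefficients; b0 = 1 is only the graphing placeholder from the text.
for x2 in (1, 7):
    intercept, slope = line_for_x2(1.0, 0.426, 0.044, -0.157, x2)
    print(f"x2 = {x2}: yhat = {intercept:.3f} + ({slope:.3f}) x1")
```

The slope changes with x2 (it even changes sign under these assumed values), which is exactly what an interaction term implies.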
11.39
a.
$E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2$
b.
11.40
If the slope of the relationship between number of defects (y) and turntable speed (x1) is steeper for
lower values of cutting blade speed, then the interaction term must be negative. As the value of
cutting speed increases, the slope becomes less steep, so the interaction term must reduce it. This
implies $\beta_3 < 0$.
a.
The hypothesized regression model including the interaction between x1 and x2 would be:
$E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2$
b.
If x1 and x2 interact to affect y then the effect of x1 on y depends on the level of x2. Also, the effect
of x2 on y depends on the level of x1.
c.
Since the p-value is not small (p = .25), H0 is not rejected. There is insufficient evidence to indicate
x1 and x2 interact to affect y.
d.
$\beta_1$ corresponds to x1, the number ahead in line. If the negative feeling score gets larger as the
number of people ahead increases, then $\beta_1$ is positive. $\beta_2$ corresponds to x2, the number behind in
line. If the negative feeling score gets lower as the number of people behind increases, then $\beta_2$ is
negative.
11.41
a.
Predictor   Coef     SE Coef   T       P
Constant    1042     1304      0.80    0.441
Age         -13.24   29.23     -0.45   0.659
Hours       103.3    162.0     0.64    0.537
A_H         3.621    3.840     0.94    0.366

S = 550.289   R-Sq = 61.4%   R-Sq(adj) = 50.8%

Analysis of Variance
Source           DF   SS        MS        F      P
Regression       3    5287427   1762476   5.82   0.012
Residual Error   11   3331000   302818
Total            14   8618428

Source   DF   Seq SS
Age      1    600498
Hours    1    4417734
A_H      1    269196
c.
d.
e.
From the printout, the test statistic for the test for interaction is t = 0.94 and the
p-value is p = .366.
f.
11.42
Since the p-value is so large (p = .366), H0 is not rejected. There is insufficient evidence to indicate
age and hours worked interact to affect annual earnings.
a.
If client credibility and linguistic delivery style interact, then the effect of client credibility on the
likelihood value depends on the level of linguistic delivery style.
b.
c.
d.
e.
f.
g.
11.43
a.
b.
Predictor   Coef      SE Coef   T       P
Constant    10845     67720     0.16    0.873
LATITUDE    -1280     1053      -1.22   0.225
LONGITUDE   217.4     814.5     0.27    0.790
DEPTH-FT    -1549.2   985.6     -1.57   0.117
Lat_D       -11.00    11.86     -0.93   0.355
Long_D      19.98     11.20     1.78    0.076

S = 103.072   R-Sq = 13.7%   R-Sq(adj) = 12.4%

Analysis of Variance
Source           DF    SS        MS       F       P
Regression       5     542303    108461   10.21   0.000
Residual Error   321   3410258   10624
Total            326   3952562

Source      DF   Seq SS
LATITUDE    1    132448
LONGITUDE   1    320144
DEPTH-FT    1    53179
Lat_D       1    2756
Long_D      1    33777
$\hat{y} = 10,845 - 1,280\,\text{latitude} + 217.4\,\text{longitude} - 1,549.2\,\text{depth} - 11.00\,\text{lat\_d} + 19.98\,\text{long\_d}$
c.
d.
e.
11.44
a.
Because the interactions are not significant, this means that the effect of latitude on the arsenic levels
does not depend on the depth and the effect of longitude on the arsenic levels does not depend on the
depth.
The model that incorporates the researchers' theories is:
$E(y) = \beta_0 + \beta_1 x_2 + \beta_2 x_3 + \beta_3 x_5 + \beta_4 x_2 x_5 + \beta_5 x_3 x_5$
b.
Predictor   Coef       SE Coef    T        P
Constant    13945      1044       13.35    0.000
x2          -15.1379   0.7775     -19.47   0.000
x3          28.843     2.304      12.52    0.000
x5          -0.689     3.628      -0.19    0.850
x2x5        0.022770   0.002999   7.59     0.000
x3x5        -0.05430   0.01053    -5.16    0.000

S = 425.072   R-Sq = 93.4%   R-Sq(adj) = 92.9%

Analysis of Variance
Source           DF   SS          MS         F        P
Regression       5    156875371   31375074   173.64   0.000
Residual Error   61   11021838    180686
Total            66   167897208
To determine if inlet temperature and air flow rate interact to affect heat rate, we test:
H0: $\beta_4 = 0$
Ha: $\beta_4 \neq 0$
The test statistic is t = 7.59 with a p-value of p = 0.000. Since the p-value is less than
$\alpha = .05$, H0 is rejected. There is sufficient evidence to indicate that inlet temperature and air flow rate
interact to affect heat rate at $\alpha = .05$.
d.
To determine if exhaust temperature and air flow rate interact to affect heat rate, we test:
H0: $\beta_5 = 0$
Ha: $\beta_5 \neq 0$
The test statistic is t = -5.16 with a p-value of p = 0.000. Since the p-value is less than
$\alpha = .05$, H0 is rejected. There is sufficient evidence to indicate that exhaust temperature and air flow
rate interact to affect heat rate at $\alpha = .05$.
e.
11.45
Since the interaction of inlet temperature and air flow rate is significant, it means that the effect of
inlet temperature on the heat rate depends on the level of air flow rate. Also, since the interaction of
exhaust temperature and air flow rate is significant, it means that the effect of exhaust temperature on
the heat rate also depends on the level of air flow rate.
a.
By including the interaction terms, it implies that the relationship between voltage and volume
fraction of the disperse phase depends on the levels of salinity and surfactant concentration.
A possible sketch of the relationship is:
b.
Predictor   Coef        SE Coef    T       P
Constant    0.9057      0.2855     3.17    0.007
x1          -0.022753   0.008318   -2.74   0.017
x2          0.3047      0.2366     1.29    0.220
x5          0.2747      0.2270     1.21    0.248
x1x2        -0.002804   0.003790   -0.74   0.473
x1x5        0.001579    0.003947   0.40    0.696

S = 0.5047   R-Sq = 67.9%   R-Sq(adj) = 55.6%

Analysis of Variance
Source           DF   SS        MS       F      P
Regression       5    7.0103    1.4021   5.51   0.006
Residual Error   13   3.3107    0.2547
Total            18   10.3210

Source   DF   Seq SS
x1       1    1.4016
x2       1    1.9263
x5       1    3.5422
x1x2     1    0.0994
x1x5     1    0.0408
$\hat{\beta}_0 = .906$.
$\hat{\beta}_1 = -.023$.
For each unit increase in disperse phase volume, we estimate that the mean voltage
will decrease by .023 units, holding salinity and surfactant concentration at 0.
$\hat{\beta}_2 = .305$.
For each unit increase in salinity, we estimate that the mean voltage will increase
by .305 units, holding disperse phase volume and surfactant concentration at 0.
$\hat{\beta}_3 = .275$.
For each unit increase in surfactant concentration, we estimate that the mean
voltage will increase by .275 units, holding disperse phase volume and salinity at 0.
$\hat{\beta}_4 = -.003$.
This estimates the difference in the slope of the relationship between voltage and
disperse phase volume for each unit increase in salinity, holding surfactant
concentration constant.
$\hat{\beta}_5 = .002$.
This estimates the difference in the slope of the relationship between voltage and
disperse phase volume for each unit increase in surfactant concentration, holding
salinity constant.
a.
b.
H0: $\beta_4 = 0$
c.
11.46
Since the p-value is so small, there is strong evidence to reject H0. There is sufficient evidence to
indicate that the strength of client-therapist relationship contributes information for the prediction of
a client's reaction for any $\alpha > .001$.
d.
e.
a.
$E(y) = \beta_0 + \beta_1 x + \beta_2 x^2$
$E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2 + \beta_4 x_1^2 + \beta_5 x_2^2$
c.
11.48
$R^2 = .2946$. 29.46% of the variability in the client's reaction scores can be explained by this model.
b.
11.47
$E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_1 x_2 + \beta_5 x_1 x_3 + \beta_6 x_2 x_3 + \beta_7 x_1^2 + \beta_8 x_2^2 + \beta_9 x_3^2$
a.
H0: $\beta_2 = 0$
Ha: $\beta_2 \neq 0$
$t = \frac{\hat{\beta}_2 - 0}{s_{\hat{\beta}_2}} = \frac{.47 - 0}{.15} = 3.133$
The rejection region requires $\alpha/2 = .05/2 = .025$ in each tail of the t distribution with df = n - (k + 1)
= 25 - (2 + 1) = 22. From Table V, Appendix B, $t_{.025} = 2.074$. The rejection region is t < -2.074 or t
> 2.074.
Since the observed value of the test statistic falls in the rejection region (t = 3.133 > 2.074), H0 is
rejected. There is sufficient evidence to indicate the quadratic term should be included in the model
at $\alpha = .05$.
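The test statistic and the rejection decision above can be sketched in a few lines of Python. The estimate (.47), its standard error (.15), and the critical values (2.074 two-tailed here, 1.717 for the one-tailed test in part b) come from the solution text; the function names are illustrative only:

```python
def t_statistic(beta_hat: float, se: float, beta_null: float = 0.0) -> float:
    """t statistic for testing a single regression coefficient."""
    return (beta_hat - beta_null) / se

def reject_two_tailed(t: float, t_crit: float) -> bool:
    """Reject H0 when t falls in either tail beyond the critical value."""
    return abs(t) > t_crit

def reject_upper_tailed(t: float, t_crit: float) -> bool:
    """Reject H0 only when t exceeds the upper-tail critical value."""
    return t > t_crit

t = t_statistic(0.47, 0.15)
print(round(t, 3), reject_two_tailed(t, 2.074), reject_upper_tailed(t, 1.717))
```

The one-tailed test uses the same statistic but a smaller critical value, since all of $\alpha$ sits in one tail.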
b.
H0: $\beta_2 = 0$
Ha: $\beta_2 > 0$
The test statistic is the same as in part a, t = 3.133.
The rejection region requires $\alpha = .05$ in the upper tail of the t distribution with df = 22. From Table
V, Appendix B, $t_{.05} = 1.717$. The rejection region is t > 1.717.
Since the observed value of the test statistic falls in the rejection region (t = 3.133 > 1.717), H0 is
rejected. There is sufficient evidence to indicate the quadratic curve opens upward at $\alpha = .05$.
11.49
a.
$F = \frac{R^2/k}{(1 - R^2)/[n - (k + 1)]} = \frac{.91/2}{(1 - .91)/[20 - (2 + 1)]} = 85.94$
The rejection region requires $\alpha = .05$ in the upper tail of the F distribution, with $\nu_1 = k = 2$, and
$\nu_2 = n - (k + 1) = 20 - (2 + 1) = 17$. From Table VIII, Appendix B, $F_{.05} = 3.59$. The rejection region
is F > 3.59.
Since the observed value of the test statistic falls in the rejection region (F = 85.94 > 3.59), H0 is
rejected. There is sufficient evidence that the model contributes information for predicting y at $\alpha =
.05$.
b.
c.
11.50
a.
b.
c.
11.51
It moves the graph to the right ($-2x$) or to the left ($+2x$) compared to the graph of
$y = 1 + x^2$.
It controls whether the graph opens up ($+x^2$) or down ($-x^2$). It also controls how steep the curvature
is, i.e., the larger the absolute value of the coefficient of $x^2$, the narrower the curve is.
a.
b.
H0: $\beta_4 = 0$
Ha: $\beta_4 \neq 0$
The test statistic is t = 10.74 with p-value = 0.000. Since the p-value is less than
$\alpha = .01$, H0 is rejected. There is sufficient evidence to indicate that $\beta_4 \neq 0$ at $\alpha = .01$.
c.
H0: $\beta_5 = 0$
Ha: $\beta_5 \neq 0$
The test statistic is t = .60 with p-value = .550. Since the p-value is greater than $\alpha = .01$, H0 is not
rejected. There is insufficient evidence to indicate that $\beta_5 \neq 0$ at $\alpha = .01$.
d.
a.
b.
11.52
$\hat{\beta}_1 = 321.67$. Since the quadratic effect is included in the model, the linear term is
just a location parameter and has no meaning.
c.
d.
11.53
a.
b.
11.54
a.
If information were available only for x = 30, 31, 32, and 33, we would suggest a first-order model
where $\beta_1 > 0$. If information were available only for x = 33, 34, 35, and 36, we would again suggest a
first-order model, but with $\beta_1 < 0$. If all the information were available, we would suggest a second-order model.
To determine if the model is adequate, we test:
H0: $\beta_1 = \beta_2 = 0$
Ha: At least one $\beta_i \neq 0$
$F = \frac{R^2/k}{(1 - R^2)/[n - (k + 1)]} = \frac{.12/2}{(1 - .12)/[388 - (2 + 1)]} = 26.25$
The rejection region requires $\alpha = .05$ in the upper tail of the F distribution with $\nu_1 = k = 2$ and $\nu_2 = n - (k + 1)$
$= 388 - (2 + 1) = 385$. From Table VIII, Appendix B, $F_{.05} \approx 3.00$. The rejection region is
F > 3.00.
Since the observed value of the test statistic falls in the rejection region (F = 26.25 > 3.00), H0 is
rejected. There is sufficient evidence to indicate the model is adequate at $\alpha = .05$.
b.
c.
11.55
From the table, the test statistic is t = -3.97 and the p-value is p < .01/2 = .005. Since the p-value is
less than $\alpha$ (p < .005 < .05), H0 is rejected. There is sufficient evidence to indicate leadership ability
increases at a decreasing rate with assertiveness at $\alpha = .05$.
a.
b.
c.
d.
$R^2 = .14$. 14% of the total variation in the efficiency scores is explained by the complete 2nd order
model containing level of CEO leadership and level of congruence between the CEO and the VP.
If the $\beta$-coefficient for the $x_2^2$ term is negative, then as the value of the level of congruence increases,
the efficiency will increase at a decreasing rate to some point and then the efficiency will decrease at
an increasing rate, holding level of CEO leadership constant.
Since the p-value is less than $\alpha$ (p = .02 < .05), H0 is rejected. There is sufficient evidence to indicate
that the level of CEO leadership and the level of congruence between the CEO and the VP interact to
affect efficiency. This means that the effect of CEO leadership on efficiency depends on the level of
congruence between the CEO and the VP.
11.57
a.
$E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2 + \beta_4 x_1^2 + \beta_5 x_2^2$
b.
11.56
$\beta_4 x_1^2$ and $\beta_5 x_2^2$
a.
b.
c.
[Scatterplot: International (vertical axis) versus Domestic (horizontal axis)]
From the plot, it appears that the first order model might fit the data better. There does not appear to
be much of a curve to the relationship.
d.
Predictor   Coef       SE Coef    T       P
Constant    182.9      301.0      0.61    0.554
Domestic    -0.243     1.849      -0.13   0.897
Dsq         0.002625   0.002523   1.04    0.317

S = 175.370   R-Sq = 65.4%   R-Sq(adj) = 60.1%

Analysis of Variance
Source           DF   SS        MS       F       P
Regression       2    755320    377660   12.28   0.001
Residual Error   13   399811    30755
Total            15   1155131

Source     DF   Seq SS
Domestic   1    722025
Dsq        1    33295
To determine if a curvilinear relationship exists between foreign and domestic gross revenues, we
test:
H0: $\beta_2 = 0$
Ha: $\beta_2 \neq 0$
The test statistic is t = 1.04.
The p-value is p = .317. Since the p-value is greater than $\alpha = .05$
(p = .317 > .05), H0 is not rejected. There is insufficient evidence to indicate that a curvilinear
relationship exists between foreign and domestic gross revenues at $\alpha = .05$.
e.
a.
11.58
From the analysis in part d, the first-order model better explains the variation in foreign gross
revenues. In part d, we concluded that the second-order term did not improve the model.
[Graph: yhat (vertical axis) versus Dose (horizontal axis)]
b.
c.
d.
11.59
a.
[Scatterplot: Time (vertical axis) versus Temp (horizontal axis)]
The relationship appears to be curvilinear. As temperature increases, the value of time tends to
decrease but at a decreasing rate.
b.
Predictor   Coef      SE Coef   T       P
Constant    154243    21868     7.05    0.000
Temp        -1908.9   303.7     -6.29   0.000
Tempsq      5.929     1.048     5.66    0.000

S = 688.137   R-Sq = 94.2%   R-Sq(adj) = 93.5%

Analysis of Variance
Source           DF   SS          MS         F        P
Regression       2    144830280   72415140   152.93   0.000
Residual Error   19   8997107     473532
Total            21   153827386

Source   DF   Seq SS
Temp     1    129663987
Tempsq   1    15166293
c.
To determine if there is an upward curvature in the relationship between failure time and solder
temperature, we test:
H0: $\beta_2 = 0$
Ha: $\beta_2 > 0$
From the printout, the test statistic is t = 5.66 and the p-value is p = 0.000. Since the p-value is less
than $\alpha$ (p = 0.000 < .05), H0 is rejected. There is sufficient evidence to indicate an upward curvature
in the relationship between failure time and solder temperature at $\alpha = .05$.
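The upward curvature can also be seen by evaluating the fitted second-order model directly. A minimal Python sketch, assuming the coefficients from the printout (154243, -1908.9, 5.929); the function names and the evaluation temperatures are illustrative choices, not part of the original solution:

```python
def predict_time(temp: float, b0: float = 154243.0, b1: float = -1908.9, b2: float = 5.929) -> float:
    """Fitted failure time from the second-order model b0 + b1*temp + b2*temp^2."""
    return b0 + b1 * temp + b2 * temp ** 2

def vertex(b1: float = -1908.9, b2: float = 5.929) -> float:
    """Temperature at which the fitted parabola reaches its minimum (b2 > 0)."""
    return -b1 / (2.0 * b2)

for temp in (130, 140, 150, 160):
    print(temp, round(predict_time(temp), 1))
print("minimum near temp =", round(vertex(), 2))
```

Successive drops in predicted time shrink as temperature rises, which matches the description of time decreasing at a decreasing rate over the observed range.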
11.60
a.
Predictor   Coef         SE Coef      T       P
Constant    -288         8049         -0.04   0.972
EST         1.395        3.651        0.38    0.706
Esq         0.00003509   0.00009724   0.36    0.722

S = 31901.1   R-Sq = 45.9%   R-Sq(adj) = 40.8%

Analysis of Variance
Source           DF   SS            MS           F      P
Regression       2    18138955261   9069477631   8.91   0.002
Residual Error   21   21371254395   1017678781
Total            23   39510209656

Source   DF   Seq SS
EST      1    18006405335
Esq      1    132549926
To determine if the incidence rate is curvilinearly related to the estimated rate, we test:
H0: $\beta_2 = 0$
Ha: $\beta_2 \neq 0$
From the printout, the test statistic is t = .36 and the p-value is p = .722. Since the p-value is not less
than $\alpha$ (p = .722 > .05), H0 is not rejected. There is insufficient evidence to indicate that the incidence
rate is curvilinearly related to the estimated rate at $\alpha = .05$.
b.
[Scatterplot: RATE (vertical axis) versus EST (horizontal axis)]
The point for Botulism is in the lower right-hand corner of the graph. The estimated value is much
larger than the actual value.
c.
Predictor   Coef         SE Coef      T       P
Constant    735.0        695.9        1.06    0.303
EST2        -0.0810      0.3167       -0.26   0.801
Esq2        0.00015052   0.00000868   17.34   0.000

S = 2756.80   R-Sq = 99.6%   R-Sq(adj) = 99.6%

Analysis of Variance
Source           DF   SS            MS            F         P
Regression       2    39251490541   19625745270   2582.35   0.000
Residual Error   20   151998825     7599941
Total            22   39403489366

Source   DF   Seq SS
EST2     1    36967483627
Esq2     1    2284006914
To determine if the incidence rate is curvilinearly related to the estimated rate after eliminating the
point for Botulism, we test:
H0: $\beta_2 = 0$
Ha: $\beta_2 \neq 0$
From the printout, the test statistic is t = 17.34 and the p-value is p = .000. Since the p-value is less
than $\alpha$ (p = .000 < .05), H0 is rejected. There is sufficient evidence to indicate that the incidence rate is
curvilinearly related to the estimated rate after omitting the Botulism point at $\alpha = .05$.
Yes, the fit has improved. With all of the points, $R^2 = 45.9\%$. When the Botulism point
has been omitted, $R^2 = 99.6\%$. Almost all of the variation in the incidence rates is explained by the
curvilinear relationship between incidence rate and estimated value.
11.61
11.62
a.
[Scatterplot: Y (vertical axis, 3500 to 10500) versus X (horizontal axis, 0.0 to 40.0)]
b.
From the plot, it looks like a second-order model would fit the data better than a first-order model.
There is little evidence that a third-order model would fit the data better than a second-order model.
c.
Predictor   Coef     Stdev   t-ratio   p
Constant    2752.4   613.5   4.49      0.000
X           122.34   26.08   4.69      0.000

s = 1904   R-sq = 36.7%   R-sq(adj) = 35.0%

Analysis of Variance
SOURCE       DF   SS          MS         F       p
Regression   1    79775688    79775688   22.01   0.000
Error        38   137726224   3624374
Total        39   217501920

Unusual Observations
Obs.   X      Y       Fit    Stdev.Fit   Residual   St.Resid
27     27.0   2007    6056   345         -4049      -2.16R
40     40.0   11520   7646   591         3874       2.14R
To see if there is a significant linear relationship between day and demand, we test:
H0: $\beta_1 = 0$
Ha: $\beta_1 \neq 0$
The test statistic is t = 4.69.
The p-value for the test is p = 0.000. Since the p-value is less than $\alpha = .05$, H0 is rejected. There is
sufficient evidence to indicate that there is a linear relationship between day and demand at $\alpha = .05$.
d.
Predictor   Coef      Stdev   t-ratio   p
Constant    5120.2    816.9   6.27      0.000
X           -215.92   91.89   -2.35     0.024
XSQ         8.250     2.173   3.80      0.001

R-sq = 54.4%   R-sq(adj) = 52.0%

Analysis of Variance
SOURCE       DF   SS          MS         F       p
Regression   2    118377056   59188528   22.09   0.000
Error        37   99124856    2679050
Total        39   217501920

SOURCE   DF   SEQ SS
X        1    79775688
XSQ      1    38601372

Unusual Observations
Obs.   X      Y      Fit    Stdev.Fit   Residual   St.Resid
27     27.0   2007   5305   357         -3298      -2.06R
To see if there is a significant quadratic relationship between day and demand, we test:
H0: $\beta_2 = 0$
Ha: $\beta_2 \neq 0$
The test statistic is t = 3.80.
The p-value for the test is p = 0.001. Since the p-value is less than $\alpha = .05$, H0 is rejected. There is
sufficient evidence to indicate that there is a quadratic relationship between day and demand at
$\alpha = .05$.
e.
11.63
Since the quadratic term is significant in the second-order model in part d, the second order model is
better.
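The adjusted R-sq values in the two printouts give a second way to compare the models, since they penalize the extra term. A minimal Python sketch, assuming the error and total sums of squares from the printouts in parts c and d (n = 40); the function name `adjusted_r2` is illustrative only:

```python
def adjusted_r2(sse: float, sst: float, n: int, k: int) -> float:
    """Adjusted R^2 = 1 - [SSE / (n - (k + 1))] / [SST / (n - 1)]."""
    return 1.0 - (sse / (n - (k + 1))) / (sst / (n - 1))

sst = 217501920.0
first_order = adjusted_r2(sse=137726224.0, sst=sst, n=40, k=1)
second_order = adjusted_r2(sse=99124856.0, sst=sst, n=40, k=2)
print(f"first-order:  {first_order:.3f}")
print(f"second-order: {second_order:.3f}")
```

The second-order model keeps a clearly higher adjusted value even after the penalty for the added quadratic term, consistent with the conclusion from the t test.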
$\beta_0$ = mean value of y when the qualitative variable assumes the first level
$\beta_1$ = difference in the mean values of y between levels 2 and 1 of the qualitative variable
11.64
11.65
a.
b.
11.66
a.
$\hat{y} = 80 + 16.8x_1 + 40.4x_2$
b.
$\hat{\beta}_1$ estimates the difference in the mean value of the dependent variable between level 2 and level 1
of the independent variable.
$\hat{\beta}_2$ estimates the difference in the mean value of the dependent variable between level 3 and level 1
c.
d.
$F = \frac{MSR}{MSE} = \frac{2059.5}{83.3} = 24.72$
Since no $\alpha$ was given, we will use $\alpha = .05$. The rejection region requires $\alpha = .05$ in the upper tail of
the F distribution with numerator df = k = 2 and denominator df = n - (k + 1) = 15 - (2 + 1) = 12.
From Table VIII, Appendix B, $F_{.05} = 3.89$. The rejection region is F > 3.89.
Since the observed value of the test statistic falls in the rejection region (F = 24.72 > 3.89), H0 is
rejected. There is sufficient evidence to indicate at least one of the means is different at $\alpha = .05$.
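With dummy variables, the fitted equation from part a translates directly into one estimated mean per level. A minimal Python sketch, assuming the coding implied by part b (level 1 is the base level, x1 = 1 for level 2, x2 = 1 for level 3); the function name and the dictionary of codings are illustrative:

```python
def predicted_mean(x1: int, x2: int) -> float:
    """Estimated mean from yhat = 80 + 16.8*x1 + 40.4*x2."""
    return 80.0 + 16.8 * x1 + 40.4 * x2

# Assumed dummy coding: level 1 is the base level with both dummies equal to 0.
coding = {"level 1": (0, 0), "level 2": (1, 0), "level 3": (0, 1)}
for level, (x1, x2) in coding.items():
    print(level, round(predicted_mean(x1, x2), 1))
```

The intercept is the estimated mean for the base level, and each slope is the difference between one level's mean and the base level's mean.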
11.67
a.
b.
c.
d.
11.68
Let $x_2 = 1$ if availability is high, 0 otherwise
Let $x_3 = 1$ if position is quarterback, 0 otherwise
a.
Let $x_8 = 1$ if position is linebacker, 0 otherwise
c.
d.
The model is: $E(y) = \beta_0 + \beta_3 x_3 + \beta_4 x_4 + \beta_5 x_5 + \beta_6 x_6 + \beta_7 x_7 + \beta_8 x_8 + \beta_9 x_9$
$\beta_0$ = mean price for position offensive lineman
$\beta_3$ = difference in mean price between player positions quarterback and offensive lineman
$\beta_4$ = difference in mean price between player positions running back and offensive lineman
$\beta_5$ = difference in mean price between player positions wide receiver and offensive lineman
$\beta_6$ = difference in mean price between player positions tight end and offensive lineman
$\beta_7$ = difference in mean price between player positions defensive lineman and offensive lineman
$\beta_8$ = difference in mean price between player positions linebacker and offensive lineman
$\beta_9$ = difference in mean price between player positions defensive back and offensive lineman
11.69
a.
Let $x = 1$ if Developer, 0 otherwise
Then the model would be: $E(y) = \beta_0 + \beta_1 x$
Let $x_1 = 1$ if Low, 0 otherwise
Let $x_2 = 1$ if Medium, 0 otherwise
Let $x = 1$ if Fixed price, 0 otherwise
Then the model would be: $E(y) = \beta_0 + \beta_1 x$
$\beta_0$ = mean accuracy for the Hourly rate
$\beta_1$ = difference in mean accuracy between the Fixed price and the Hourly rate
d.
Let $x_1 = 1$ if Time-of-delivery, 0 otherwise
Let $x_2 = 1$ if Cost, 0 otherwise
a.
b.
$\beta_0$ = mean relative optimism for analysts who worked for sell-side firms
$\beta_1$ = difference in mean relative optimism for analysts who worked for buy-side and sell-side firms
c.
Yes.
d.
Yes. If the buy-side analysts are less optimistic, then their estimates will be smaller than the sell-side
estimates. Thus, the estimate of $\beta_1$ will be negative.
11.71
a.
$R^2_{adj} = .76$. 76% of the total sample variation of SAT-Math scores is explained by the regression
model including score on PSAT and whether the student was coached or not, adjusting for the sample
size and the number of independent variables in the model.
b.
For confidence level .95, $\alpha = .05$ and $\alpha/2 = .05/2 = .025$. From Table V, Appendix B,
with df = n - (k + 1) = 3,492 - (2 + 1) = 3,489, $t_{.025} = 1.96$. The 95% confidence interval
is:
$\hat{\beta}_2 \pm t_{.025} s_{\hat{\beta}_2} \Rightarrow (13.12, 24.88)$
We are 95% confident that the mean SAT-Math score for those who were coached was anywhere
from 13.12 to 24.88 points higher than the mean for those who were not coached, holding PSAT
scores constant.
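The interval has the usual form of estimate plus or minus critical value times standard error. A minimal Python sketch; the estimate 19 and standard error 3 are not stated in this solution and are assumptions inferred from the reported endpoints (midpoint 19, half-width 5.88 = 1.96 × 3), and the function name is hypothetical:

```python
def coefficient_ci(beta_hat: float, se: float, t_crit: float):
    """Confidence interval for a regression coefficient: beta_hat +/- t_crit * se."""
    half_width = t_crit * se
    return beta_hat - half_width, beta_hat + half_width

# beta_hat = 19 and se = 3 are inferred from the interval, not quoted from the exercise.
lower, upper = coefficient_ci(beta_hat=19.0, se=3.0, t_crit=1.96)
print(round(lower, 2), round(upper, 2))
```

Because the interval excludes 0, it supports the same conclusion as a two-tailed test of the coaching coefficient at $\alpha = .05$.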
c.
11.72
Since 0 is not contained in the confidence interval for $\beta_2$, we can conclude that the coaching effect
was present. Those who received coaching scored higher on the SAT-Math than those who did not,
holding PSAT scores constant.
a.
$\hat{\beta}_4 = .296$. The difference in the mean value of DTVA between when the operating earnings are
negative and lower than last year and when the operating earnings are not negative and lower than
last year is estimated to be .296, holding all other variables constant.
b.
To determine if the mean DTVA for firms with negative earnings and earnings lower than last year
exceed the mean DTVA of other firms, we test:
H0: $\beta_4 = 0$
Ha: $\beta_4 > 0$
The p-value for this test is p = .001 / 2 = .0005. Since the p-value is so small, we would reject H0 for
$\alpha = .05$. There is sufficient evidence to indicate the mean DTVA for firms with negative earnings
and earnings lower than last year exceeds the mean DTVA of other firms at $\alpha = .05$.
c.
$R^2_a = .280$. 28% of the variability in the DTVA scores is explained by the model containing the 5
independent variables, adjusted for the number of variables in the model and the sample size.
11.73
a.
To determine if there is a difference in the mean monthly rate of return for T-Bills between an
expansive Fed monetary policy and a restrictive Fed monetary policy, we test:
H0: $\beta_1 = 0$
Ha: $\beta_1 \neq 0$
The test statistic is t = 8.14.
Since neither n nor $\alpha$ is given, we cannot determine the exact rejection region. However, we can assume
that n is greater than 2 since the data used are from 1972 and 1997. With $\alpha = .05$, the critical value
of t for the rejection region will be smaller than 4.303. Thus, with $\alpha = .05$, t = 8.14 will fall in the
rejection region. There is sufficient evidence to indicate a difference in the mean monthly rate of
return for T-Bills between an expansive Fed monetary policy and a restrictive Fed monetary policy at
$\alpha = .05$.
However, the value of $R^2$ is .1818. The model used is explaining only 18.18% of the variability in
the monthly rate of return. This is not a particularly large value.
To determine if there is a difference in the mean monthly rate of return for Equity REIT between an
expansive Fed monetary policy and a restrictive Fed monetary policy, we test:
$H_0: \beta_1 = 0$
$H_a: \beta_1 \neq 0$
The test statistic is t = 3.46.
Since neither n nor $\alpha$ is given, we cannot determine the exact rejection region. However, we can assume
that n is greater than 4 since the data used are from 1972 and 1997. With $\alpha = .05$, the critical value
of t for the rejection region will be smaller than 3.182. Thus, with $\alpha = .05$, t = 3.46 will fall in the
rejection region. There is sufficient evidence to indicate a difference in the mean monthly rate of
return for Equity REIT between an expansive Fed monetary policy and a restrictive Fed monetary
policy at $\alpha = .05$.
However, the value of $R^2$ is .0387. The model used is explaining only 3.87% of the variability in the
monthly rate of return. This is a very small value.
b.
For the first model, $\beta_1$ is the difference in the mean monthly rate of return for T-Bills between an
expansive Fed monetary policy and a restrictive Fed monetary policy.
For the second model, $\beta_1$ is the difference in the mean monthly rate of return for Equity REIT
between an expansive Fed monetary policy and a restrictive Fed monetary policy.
c.
The least squares prediction equation for the equity REIT index is:
$\hat{y} = 0.01863 - 0.01582x$.
When the Federal Reserve's monetary policy is restrictive, x = 1. The predicted mean monthly rate of
return for the equity REIT index is
11.74
a.
The difference between the mean knowledge gains of students in the completed solution and no
help groups would be $\beta_1$.
c.
Predictor      Coef   SE Coef      T      P
Constant     2.4333    0.4941   4.92  0.000
X1          -0.4833    0.7813  -0.62  0.538
X2           0.2867    0.7329   0.39  0.697

S = 2.70636   R-Sq = 1.2%   R-Sq(adj) = 0.0%

Analysis of Variance
Source          DF       SS     MS     F      P
Regression       2    6.643  3.322  0.45  0.637
Residual Error  72  527.357  7.324
Total           74  534.000

Source  DF  Seq SS
X1       1   5.523
X2       1   1.121
e.
From Exercise 8.28, the test statistic was F = .45 and the p-value was p = .637. These are the same as
those in part d. Thus, the results agree.
11.75
a.
Let $x = 1$ if Lotion/cream, 0 otherwise
The model is $E(y) = \beta_0 + \beta_1 x$.
b.
Predictor      Coef   SE Coef     T      P
Constant     0.7775    0.2975  2.61  0.023
x            0.1092    0.4545  0.24  0.814

R-Sq = 0.5%   R-Sq(adj) = 0.0%

Analysis of Variance
Source          DF      SS      MS     F      P
Regression       1  0.0409  0.0409  0.06  0.814
Residual Error  12  8.4973  0.7081
Total           13  8.5381
d.
e.
The dummy variable will be defined the same way and the model will look the same (just the
dependent variable will be different).
From MINITAB, the output is:
Regression Analysis: MaxProt versus Type

The regression equation is
MaxProt = 7.56 - 1.65 Type

Predictor     Coef  SE Coef      T      P
Constant     7.563    2.339   3.23  0.007
Type        -1.646    3.574  -0.46  0.653

S = 6.617   R-Sq = 1.7%   R-Sq(adj) = 0.0%

Analysis of Variance
Source          DF      SS     MS     F      P
Regression       1    9.29   9.29  0.21  0.653
Residual Error  12  525.43  43.79
Total           13  534.71
a.
For no stock split, $x_1 = 0$. For high discretionary accrual, $x_2 = 1$. The mean buy-and-hold return rate is
$E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2 = \beta_0 + \beta_1(0) + \beta_2(1) + \beta_3(0)(1) = \beta_0 + \beta_2$.
b.
For no stock split, $x_1 = 0$. For low discretionary accrual, $x_2 = 0$. The mean buy-and-hold return rate is
$E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2 = \beta_0 + \beta_1(0) + \beta_2(0) + \beta_3(0)(0) = \beta_0$.
c.
d.
For stock split, $x_1 = 1$. For high discretionary accrual, $x_2 = 1$. The mean buy-and-hold return rate is
$E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2 = \beta_0 + \beta_1(1) + \beta_2(1) + \beta_3(1)(1) = \beta_0 + \beta_1 + \beta_2 + \beta_3$.
For stock split, $x_1 = 1$. For low discretionary accrual, $x_2 = 0$. The mean buy-and-hold return rate is
$E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2 = \beta_0 + \beta_1(1) + \beta_2(0) + \beta_3(1)(0) = \beta_0 + \beta_1$.
The difference would be $(\beta_0 + \beta_1 + \beta_2 + \beta_3) - (\beta_0 + \beta_1) = \beta_2 + \beta_3$.
e.
When there is no stock split, the mean buy-and-hold return rate increases by $\beta_2$ when discretionary
accrual goes from low to high. When there is a stock split, the mean buy-and-hold return rate
increases by $\beta_2 + \beta_3$ when discretionary accrual goes from low to high. Thus, the effect of
discretionary accrual on the mean buy-and-hold return rate depends on the level of stock split.
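The four cell means of the interaction model can be checked numerically. The sketch below uses hypothetical coefficient values (the estimates are not given in this solution) purely to show that the effect of discretionary accrual equals the x2 coefficient without a split and the sum of the x2 and interaction coefficients with a split.

```python
def mean_return(b0, b1, b2, b3, x1, x2):
    """E(y) for the interaction model b0 + b1*x1 + b2*x2 + b3*x1*x2."""
    return b0 + b1 * x1 + b2 * x2 + b3 * x1 * x2


# Hypothetical coefficients, chosen only for illustration.
b0, b1, b2, b3 = 0.10, 0.05, -0.02, -0.03

# Effect of moving from low (x2 = 0) to high (x2 = 1) discretionary accrual
effect_no_split = mean_return(b0, b1, b2, b3, 0, 1) - mean_return(b0, b1, b2, b3, 0, 0)
effect_split = mean_return(b0, b1, b2, b3, 1, 1) - mean_return(b0, b1, b2, b3, 1, 0)

print(effect_no_split, effect_split)
```

Because the two effects differ by exactly the interaction coefficient, testing whether that coefficient is zero is the test for interaction.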
f.
Since the p-value is less than $\alpha$ (p = .027 < .05), $H_0$ is rejected. There is sufficient evidence to indicate
that interaction between stock split and discretionary accrual exists at $\alpha = .05$.
g.
Yes. For no stock split, the difference between high discretionary accrual and low discretionary
accrual is $\beta_2$. Since $\hat{\beta}_2$ is negative, the performance of the high discretionary accrual acquirers is
worse than low discretionary accrual acquirers.
For stock split, the difference between high discretionary accrual and low discretionary accrual is $\beta_2 + \beta_3$.
Since both $\hat{\beta}_2$ and $\hat{\beta}_3$ are negative, the performance of the high discretionary accrual acquirers
is worse than low discretionary accrual acquirers, and even worse than for no stock split.
11.77
a.
Let $x_1 = 1$ if Group V, 0 otherwise
Let $x_2 = 1$ if Group S, 0 otherwise
Predictor      Coef   SE Coef      T      P
Constant     3.1667    0.1670  18.96  0.000
x1          -1.0833    0.2362  -4.59  0.000
x2          -1.4537    0.2362  -6.15  0.000

S = 1.73596   R-Sq = 11.3%   R-Sq(adj) = 10.7%

Analysis of Variance
Source           DF        SS      MS      F      P
Regression        2   123.265  61.633  20.45  0.000
Residual Error  321   967.352   3.014
Total           323  1090.617

Source  DF   Seq SS
x1       1    9.150
x2       1  114.116
d.
With the dummy variable coding in part a, $\beta_0$ is the mean recall for Group N. Thus, the estimated
mean recall for Group N is 3.1667 or 3.17. $\beta_1$ is the difference in mean recall between Group V and
Group N. Thus, the mean recall for Group V is $\beta_0 + \beta_1$ and is estimated to be 3.1667 - 1.0833 =
2.0834 or 2.08. $\beta_2$ is the difference in mean recall between Group S and Group N. Thus, the mean
recall for Group S is $\beta_0 + \beta_2$ and is estimated to be 3.1667 - 1.4537 = 1.7130 or 1.71.
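The group means implied by the dummy-variable coding can be recovered with a few lines of arithmetic. This is a small sketch using the three coefficient estimates quoted above; the dictionary name `means` is only illustrative.

```python
# Coefficient estimates quoted in the solution (base level: Group N)
b0, b1, b2 = 3.1667, -1.0833, -1.4537

# With 0/1 dummies, each group mean is the intercept plus that group's coefficient.
means = {
    "N": b0,        # x1 = 0, x2 = 0
    "V": b0 + b1,   # x1 = 1, x2 = 0
    "S": b0 + b2,   # x1 = 0, x2 = 1
}

for group, value in means.items():
    print(group, round(value, 2))
```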
a.
b.
11.78
where $x_2 = 1$ if level 2, 0 otherwise, and $x_3 = 1$ if level 3, 0 otherwise
c.
d.
e.
a.
The complete second-order model is $E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_1^2$
b.
11.79
The new model is $E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_1^2 + \beta_3 x_2 + \beta_4 x_3$
where $x_2 = 1$ if level 2, 0 otherwise, and $x_3 = 1$ if level 3, 0 otherwise
c.
d.
e.
The response curves will be parallel lines if the interaction terms as well as the second-order terms
are absent or if $\beta_2 = \beta_5 = \beta_6 = \beta_7 = \beta_8 = 0$.
f.
11.80
The response curves will have the same shape if none of the interaction terms are present or if
$\beta_5 = \beta_6 = \beta_7 = \beta_8 = 0$.
The response curves will be identical if no terms involving the qualitative variable are present or
$\beta_3 = \beta_4 = \beta_5 = \beta_6 = \beta_7 = \beta_8 = 0$.
a.
b.
11.81
a.
For $x_2 = 1$ and $x_3 = 0$, $\hat{y} = 48.8 - 3.4x_1 + .07x_1^2 - 2.4(1) + 3.7x_1(1) - .02x_1^2(1)$
$= 46.4 + 0.3x_1 + .05x_1^2$
For $x_2 = 0$ and $x_3 = 1$, $\hat{y} = 48.8 - 3.4x_1 + .07x_1^2 - 7.5(1) + 2.7x_1(1) - .04x_1^2(1)$
$= 41.3 - 0.7x_1 + 0.03x_1^2$
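Collapsing the fitted model for a given level of the qualitative variable amounts to adding that level's shifts to the base intercept, linear, and quadratic coefficients. The sketch below assumes the signs implied by the simplified equations (48.8 - 2.4 = 46.4, -3.4 + 3.7 = 0.3, and so on); the names `BASE`, `SHIFT_X2`, `SHIFT_X3`, and `curve` are illustrative.

```python
# (intercept, coefficient of x1, coefficient of x1^2)
BASE = (48.8, -3.4, 0.07)       # base level: x2 = 0, x3 = 0
SHIFT_X2 = (-2.4, 3.7, -0.02)   # added when x2 = 1
SHIFT_X3 = (-7.5, 2.7, -0.04)   # added when x3 = 1


def curve(x2, x3):
    """Return the quadratic's coefficients for the given dummy values."""
    return tuple(
        b + x2 * s2 + x3 * s3
        for b, s2, s3 in zip(BASE, SHIFT_X2, SHIFT_X3)
    )


for dummies in [(1, 0), (0, 1)]:
    print(dummies, [round(c, 2) for c in curve(*dummies)])
```

Calling `curve(0, 0)` returns the base-level curve unchanged, which is a quick check that the dummy coding is applied correctly.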
b.
11.82
The model is $E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_1^2 + \beta_3 x_2 + \beta_4 x_3 + \beta_5 x_4$
where $x_1$ is the quantitative variable and
a.
$H_0: \beta_1 = \beta_2 = \beta_3 = \cdots = \beta_{12} = 0$
$H_a$: At least one $\beta_i \neq 0$
The test statistic is F = 26.9.
Using Tables VII, VIII, IX, and X, Appendix B, with $\nu_1 = k = 12$ and $\nu_2 = n - (k + 1) = 148 - (12 + 1)$
= 135, the p-value associated with F = 26.9 is less than .001. Since the p-value is so small, $H_0$ is
rejected. There is sufficient evidence to indicate the model is adequate.
$R^2 = .705$. 70.5% of the total variation of the natural logarithm of card prices is explained by the
model with the 12 variables in the model.
Adj-$R^2$ = .681. 68.1% of the total variation of the natural logarithm of card prices is explained by the
model with the 12 variables in the model, adjusting for the sample size and the number of variables in
the model.
Since these $R^2$ values are fairly large, the model appears to fit the data well.
b.
$H_0: \beta_1 = 0$
$H_a: \beta_1 \neq 0$
The test statistic is t = 1.014 and the p-value is p = .312. Since the p-value is so large, $H_0$ is not
rejected. There is insufficient evidence to indicate race has an impact on the value of professional
football players' rookie cards for any reasonable value of $\alpha$, holding the other variables constant.
c.
$H_0: \beta_3 = 0$
$H_a: \beta_3 \neq 0$
The test statistic is t = 10.92 and the p-value is p = .000. Since the p-value is so small, $H_0$ is rejected.
There is sufficient evidence to indicate card vintage has an impact on the value of professional
football players' rookie cards for any reasonable value of $\alpha$, holding the other variables constant.
d.
11.84
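The first order model is: $E(y) = \beta_0 + \beta_1 x_3 + \beta_2 x_5 + \beta_3 x_6 + \beta_4 x_7 + \beta_5 x_8 + \beta_6 x_9 + \beta_7 x_{10} + \beta_8 x_{11}$
$+ \beta_9 x_{12} + \beta_{10} x_5 x_3 + \beta_{11} x_6 x_3 + \beta_{12} x_7 x_3 + \beta_{13} x_8 x_3 + \beta_{14} x_9 x_3 + \beta_{15} x_{10} x_3 + \beta_{16} x_{11} x_3 + \beta_{17} x_{12} x_3$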
a.
$R^2 = .069$. 6.9% of the total variation of the relative optimism of the analysts' 3-month horizon
forecasts is explained by the model containing type of firm, number of days between forecast and
fiscal year-end, and the natural logarithm of the number of quarters the analyst had worked with the
firm.
b.
$H_0: \beta_1 = \beta_2 = \beta_3 = 0$
$H_a$: At least one $\beta_i \neq 0$
The test statistic is $F = \frac{R^2/k}{(1 - R^2)/[n - (k + 1)]} = \frac{.069/3}{(1 - .069)/[11{,}121 - (3 + 1)]} = 274.64$
The rejection region requires $\alpha = .01$ in the upper tail of the F distribution with $\nu_1 = k = 3$ and
$\nu_2 = n - (k + 1) = 11{,}121 - (3 + 1) = 11{,}117$. From Table X, Appendix B, $F_{.01} = 3.78$. The rejection
region is F > 3.78.
Since the observed value of the test statistic falls in the rejection region (F = 274.64 > 3.78), $H_0$ is
rejected. There is sufficient evidence to indicate the model is useful at $\alpha = .01$.
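The global F statistic can be reproduced directly from the coefficient of determination, the sample size, and the number of predictors. A minimal sketch using the values quoted in this solution (the function name `global_f` is illustrative, and the critical value 3.78 is the tabled value cited above, not computed here):

```python
def global_f(r2, n, k):
    """Global F statistic computed from R^2, sample size n, and k predictors."""
    return (r2 / k) / ((1 - r2) / (n - (k + 1)))


F = global_f(0.069, 11121, 3)
print(round(F, 2))

# Compare with the tabled critical value quoted in the solution.
reject = F > 3.78
print(reject)
```

The example also shows why a very small R-squared can still give a large F: with n = 11,121 the denominator degrees of freedom are enormous.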
c.
$H_0: \beta_1 = 0$
$H_a: \beta_1 \neq 0$
The test statistic is t = 4.3.
The rejection region requires $\alpha/2 = .01/2 = .005$ in each tail of the t distribution. From Table V,
Appendix B, with df = $n - (k + 1) = 11{,}121 - (3 + 1) = 11{,}117$, $t_{.005} = 2.576$. The rejection region is t >
2.576 or t < -2.576.
Since the observed value of the test statistic falls in the rejection region (t = 4.3 > 2.576), $H_0$ is
rejected. There is sufficient evidence to indicate $x_1$ contributes significantly to the prediction of y at
$\alpha = .01$, holding the other variables constant.
d.
11.85
a.
Yes. In part c, we concluded that $\beta_1$ is different from 0. Because the estimate of $\beta_1$ is greater than 0,
we can conclude that $\beta_1$ is positive. Therefore, the earnings forecasts by the analysts at buy-side firms
are more optimistic than forecasts made by analysts at sell-side firms, holding the other variables
constant.
For obese smokers, x2 = 0. The equation of the hypothesized line relating mean REE to time after
smoking for obese smokers is:
For normal weight smokers, x2 = 1. The equation of the hypothesized line relating mean REE to time
after smoking for normal smokers is:
The reported p-value is .044. Since the p-value is small, there is evidence to indicate that interaction
between time and weight is present for $\alpha > .044$.
For $\alpha = .01$, there is no evidence to indicate that interaction between time and weight is present.
11.86
a.
a.
Let $x_1 = 1$ if Channel catfish, 0 otherwise
$x_2 = 1$ if Largemouth bass, 0 otherwise
b.
c.
d.
Predictor      Coef   SE Coef      T      P
Constant       3.13     38.89   0.08  0.936
x1            26.51     21.52   1.23  0.220
x2            -4.09     37.91  -0.11  0.914
Weight      0.00371   0.02598   0.14  0.887

S = 98.57   R-Sq = 1.7%   R-Sq(adj) = 0.0%

Analysis of Variance
Source           DF       SS    MS     F      P
Regression        3    23652  7884  0.81  0.490
Residual Error  140  1360351  9717
Total           143  1384003

Source  DF  Seq SS
x1       1   23041
x2       1     414
Weight   1     198
The least squares prediction equation is: $\hat{y} = 3.1 + 26.5x_1 - 4.1x_2 + 0.0037x_3$
e.
$\hat{\beta}_3 = 0.0037$. For each additional gram of weight, the mean level of DDT is expected to increase by
0.0037 units, holding species constant.
f.
Predictor       Coef   SE Coef      T      P
Constant        3.50     54.69   0.06  0.949
x1             25.59     67.52   0.38  0.705
x2             -3.47     84.70  -0.04  0.967
Weight       0.00344   0.03843   0.09  0.929
x1Weight     0.00082   0.05459   0.02  0.988
x2Weight    -0.00129   0.09987  -0.01  0.990

S = 99.29   R-Sq = 1.7%   R-Sq(adj) = 0.0%

Analysis of Variance
Source           DF       SS    MS     F      P
Regression        5    23657  4731  0.48  0.791
Residual Error  138  1360346  9858
Total           143  1384003

Source    DF  Seq SS
x1         1   23041
x2         1     414
Weight     1     198
x1Weight   1       4
x2Weight   1       2
g.
11.88
a.
For the high-tech firms, x2 = 1. The model for the high-tech firm is:
For the high-tech firms, x2 = 1. The model for the high-tech firm is:
a.
$x_2 = 1$ if NW, 0 if not
$x_3 = 1$ if S, 0 if not
$x_4 = 1$ if W, 0 if not
The complete second order model for the sales price of a single-family home is:
b.
c.
The parameters $\beta_3$, $\beta_4$, and $\beta_5$ allow for the y-intercepts of the 4 regions to be different. The
parameters $\beta_6$, $\beta_7$, and $\beta_8$ allow for the peaks of the curves to be at a different value of sales volume ($x_1$)
for the four regions. The parameters $\beta_9$, $\beta_{10}$, and $\beta_{11}$ allow for the shapes of the curves to be different
for the four regions. Thus, all the parameters from $\beta_3$ through $\beta_{11}$ allow for differences in mean sales
prices among the four regions.
e.
Predictor          Coef     SE Coef      T      P
Constant        1904740     1984278   0.96  0.351
X1               -70.44       72.09  -0.98  0.343
X1SQ          0.0007211   0.0006515   1.11  0.285
X2               159661     2069265   0.08  0.939
X3              5291908     4812586   1.10  0.288
X4              3663319     4478880   0.82  0.425
X1X2              22.25       73.74   0.30  0.767
X1X3             -23.86       92.09  -0.26  0.799
X1X4              -37.2       103.0  -0.36  0.723
X1SQX2       -0.0004210   0.0006589  -0.64  0.532
X1SQX3       -0.0004044   0.0006777  -0.60  0.559
X1SQX4       -0.0001810   0.0007333  -0.25  0.808

R-Sq = 85.0%   R-Sq(adj) = 74.6%

Analysis of Variance
Source          DF           SS          MS     F      P
Regression      11  53633628997  4875784454  8.21  0.000
Residual Error  16   9499097458   593693591
Total           27  63132726455

Source  DF       Seq SS
X1       1      3591326
X1SQ     1     64275360
X2       1  11338642654
X3       1  10081000583
X4       1    241539024
X1X2     1  18258475317
X1X3     1   5579187440
X1X4     1   7566169810
X1SQX2   1    138146367
X1SQX3   1    326425228
X1SQX4   1     36175888

Unusual Observations
Obs     X1   Price     Fit  SE Fit  Residual  St Resid
  2  61025  235900  291659   18746    -55759    -3.58R
  5  60324  345300  279697   15712     65603     3.52R
  7  61025  240855  241084   24360      -229    -0.42 X
$H_0: \beta_1 = \beta_2 = \cdots = \beta_{11} = 0$
$H_a$: At least one of the coefficients is nonzero
The test statistic is $F = \frac{MS(Model)}{MSE} = 8.21$
The p-value is p = .000. Since the p-value is less than $\alpha = .01$ (p = .000 < .01), $H_0$ is rejected. There
is sufficient evidence to indicate the model is useful in predicting sales price at $\alpha = .01$.
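The F ratio on the printout can be verified from the sums of squares and degrees of freedom in the Analysis of Variance table. A short sketch, using the Regression and Residual Error rows reported for this model (the function name `anova_f` is illustrative):

```python
def anova_f(ss_model, df_model, ss_error, df_error):
    """Return (MS(Model), MSE, F) from ANOVA sums of squares and df."""
    ms_model = ss_model / df_model
    mse = ss_error / df_error
    return ms_model, mse, ms_model / mse


# Regression and Residual Error rows from the MINITAB printout
ms_model, mse, F = anova_f(53_633_628_997, 11, 9_499_097_458, 16)
print(round(ms_model), round(mse), round(F, 2))
```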
11.90
a.
Let $x_2 = 1$ if Developing, 0 otherwise
The model would be:
b.
[Plot omitted; horizontal axis x1, 0 to 90; vertical axis 20 to 60]
From the plot, it appears that the model is appropriate. The two lines appear to have different slopes.
c.
Predictor       Coef   SE Coef       T      P
Constant      58.786     1.217   48.30  0.000
x1          -0.55743   0.03669  -15.19  0.000
x2           -18.718     5.572   -3.36  0.002
x1x2         0.35368   0.07615    4.64  0.000

S = 2.66123   R-Sq = 96.1%   R-Sq(adj) = 95.7%

Analysis of Variance
Source          DF      SS      MS       F      P
Regression       3  4596.5  1532.2  216.34  0.000
Residual Error  26   184.1     7.1
Total           29  4780.6

Source  DF  Seq SS
x1       1  4388.0
x2       1    55.7
x1x2     1   152.8
[Plot omitted; horizontal axis x1, 0 to 90; vertical axis 20 to 60]
e.
To determine if the slope of the linear relationship between volatility and credit rating depends on
market type, we test:
$H_0: \beta_3 = 0$
$H_a: \beta_3 \neq 0$
The test statistic is t = 4.64.
The p-value is 0.000. Since the p-value is less than $\alpha = .01$, $H_0$ is rejected. There is sufficient
evidence to indicate that the slope of the linear relationship between volatility and credit rating
depends on market type at $\alpha = .01$.
11.91
11.92
a.
b.
c.
d.
$H_0: \beta_3 = \beta_4 = \beta_5 = 0$
$H_a$: At least one $\beta_i \neq 0$, i = 3, 4, 5
The test statistic is $F = \frac{(SSE_R - SSE_C)/(k - g)}{SSE_C/[n - (k + 1)]}$, where the denominator is
$1125.2/[30 - (5 + 1)] = 46.8833$, giving F = .89.
The rejection region requires $\alpha = .05$ in the upper tail of the F distribution with numerator df = $k - g$
= 5 - 2 = 3 and denominator df = $n - (k + 1) = 30 - (5 + 1) = 24$. From Table VIII, Appendix B,
$F_{.05} = 3.01$. The rejection region is F > 3.01.
Since the observed value of the test statistic does not fall in the rejection region ($F = .89 \not> 3.01$), $H_0$
is not rejected. There is insufficient evidence to indicate the second-order terms are useful at $\alpha = .05$.
a.
Including $\beta_0$, there are five parameters in the complete model and three in the reduced model.
b.
11.93
c.
The test statistic is $F = \frac{(SSE_R - SSE_C)/(k - g)}{SSE_C/[n - (k + 1)]}$, where the denominator is
$152.66/[20 - (4 + 1)] = 10.1773$, giving F = .38.
The rejection region requires $\alpha = .05$ in the upper tail of the F distribution with numerator df = $k - g$
= 4 - 2 = 2 and denominator df = $n - (k + 1) = 20 - (4 + 1) = 15$. From Table VIII, Appendix B,
$F_{.05} = 3.68$. The rejection region is F > 3.68.
Since the observed value of the test statistic does not fall in the rejection region ($F = .38 \not> 3.68$), $H_0$
is not rejected. There is insufficient evidence to indicate the complete model is better than the
reduced model at $\alpha = .05$.
11.94
a.
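Let variables $x_1$ through $x_4$ be the Demographic variables, variables $x_5$ through $x_{11}$ be the Diagnostic
variables, variables $x_{12}$ through $x_{15}$ be the Treatment variables, and variables $x_{16}$ through $x_{21}$ be the
Community variables. The complete model is:
$E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_4 + \beta_5 x_5 + \beta_6 x_6 + \beta_7 x_7 + \beta_8 x_8 + \beta_9 x_9$
$+ \beta_{10} x_{10} + \beta_{11} x_{11} + \beta_{12} x_{12} + \beta_{13} x_{13} + \beta_{14} x_{14} + \beta_{15} x_{15} + \beta_{16} x_{16} + \beta_{17} x_{17}$
$+ \beta_{18} x_{18} + \beta_{19} x_{19} + \beta_{20} x_{20} + \beta_{21} x_{21}$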
b.
To determine if the 7 Diagnostic variables contribute information for the prediction of y, we test:
$H_0: \beta_5 = \beta_6 = \cdots = \beta_{11} = 0$
c.
d.
11.95
Since the p-value is so small (p < .0001), H0 is rejected. There is sufficient evidence to indicate at
least one of the seven diagnostic variables contributes information for the prediction of y.
a.
To determine whether the quadratic terms in the model are statistically useful for predicting relative
optimism, we test:
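$H_0: \beta_4 = \beta_5 = 0$
$H_a$: At least one $\beta_i \neq 0$
b.
The complete model is $E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2 + \beta_4 x_2^2 + \beta_5 x_1 x_2^2$ and the reduced model is
$E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2$.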
c.
To determine whether the interaction terms in the model are statistically useful for predicting relative
optimism, we test:
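$H_0: \beta_3 = \beta_5 = 0$
$H_a$: At least one $\beta_i \neq 0$
d.
The complete model is $E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2 + \beta_4 x_2^2 + \beta_5 x_1 x_2^2$ and the reduced model is
$E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_4 x_2^2$.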
To determine whether the dummy variable terms in the model are statistically useful for predicting
relative optimism, we test:
e.
f.
a.
The model from part b of Exercise 11.86 is $E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3$. The model from part c of
Exercise 11.86 is $E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_1 x_2 + \beta_5 x_1 x_3$. These two models are nested
because all of the terms in the first model are contained in the second model. The first model is the
reduced model and the second model is the complete model.
The null hypothesis for comparing the two models is $H_0: \beta_4 = \beta_5 = 0$.
c.
If we reject H0 in part b, we would conclude that at least one of the interaction terms is not 0. Thus,
we would prefer the second model.
d.
11.97
The complete model is $E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2 + \beta_4 x_2^2 + \beta_5 x_1 x_2^2$ and the reduced model is
$E(y) = \beta_0 + \beta_2 x_2 + \beta_4 x_2^2$.
b.
11.96
$H_0: \beta_1 = \beta_3 = \beta_5 = 0$
$H_a$: At least one $\beta_i \neq 0$
If we fail to reject H0 in part b, then we would conclude that we have no evidence to indicate that the
interaction terms were significant. Thus, we would prefer the first model.
a.
Let $x_1$ = cycle speed and $x_2$ = cycle pressure ratio. A complete second order model is:
$E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1^2 + \beta_4 x_2^2 + \beta_5 x_1 x_2$
b.
To determine whether the curvature terms in the complete 2nd order model are useful for predicting
heat rate, we test:
$H_0: \beta_3 = \beta_4 = 0$
$H_a$: At least one of the parameters $\beta_3$, $\beta_4$ differs from 0
c.
The complete model is: $E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1^2 + \beta_4 x_2^2 + \beta_5 x_1 x_2$
From the printout, $SSE_R$ = 25,310,639, $SSE_C$ = 19,370,350, and $MSE_C$ = 317,547.
e.
The test statistic is $F = \frac{(SSE_R - SSE_C)/(k - g)}{SSE_C/[n - (k + 1)]} = \frac{(25{,}310{,}639 - 19{,}370{,}350)/(5 - 3)}{19{,}370{,}350/[67 - (5 + 1)]} = 9.35$
f.
The rejection region requires $\alpha = .10$ in the upper tail of the F-distribution with
$\nu_1 = k - g = 5 - 3 = 2$ and $\nu_2 = n - (k + 1) = 67 - (5 + 1) = 61$. From Table VII, Appendix B,
$F_{.10} = 2.39$. The rejection region is F > 2.39.
g.
11.98
a.
Since the observed value of the test statistic falls in the rejection region (F = 9.35 > 2.39), $H_0$ is
rejected. There is sufficient evidence to indicate at least one of the curvature terms in the complete
2nd order model is useful for predicting heat rate at $\alpha = .10$.
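The nested-model (partial) F statistic used in this exercise follows the same recipe every time: compare the drop in SSE per removed term with the complete model's mean square error. A minimal sketch with the heat rate values quoted from the printout (the function name `nested_f` is illustrative; 2.39 is the tabled critical value cited above):

```python
def nested_f(sse_reduced, sse_complete, k, g, n):
    """Partial F statistic for comparing a reduced model (g terms)
    with a complete model (k terms) fit to n observations."""
    numerator = (sse_reduced - sse_complete) / (k - g)
    denominator = sse_complete / (n - (k + 1))
    return numerator / denominator


F = nested_f(25_310_639, 19_370_350, k=5, g=3, n=67)
print(round(F, 2))
print(F > 2.39)  # critical value F.10 with 2 and 61 df, from the table
```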
Model 1: $R^2 = .101$. 10.1% of the total variation in the supervisor-directed aggression score is
explained by the terms in Model 1.
Model 2: $R^2 = .555$. 55.5% of the total variation in the supervisor-directed aggression score is
explained by the terms in Model 2.
b.
c.
d.
H0 would be rejected. There is sufficient evidence that at least one of the variables Self-esteem,
history of aggression, Interactional injustice at primary job, and Abusive supervisor at primary job is
significant in predicting supervisor-directed aggression score.
e.
f.
11.99
a.
b.
c.
Since the F was significant, we reject $H_0$ at $\alpha = .05$. There is sufficient evidence to indicate that at
least one of the additional variables (student ethnicity, socio-economic status, school performance,
number of math courses taken in high school and overall GPA in the math courses) contributes to the
prediction of the SAT-math score.
d.
$R_{adj}^2 = .79$. 79% of the sample variability of SAT-math scores is explained by the model containing
the 10 independent variables, adjusted for the sample size and the number of variables.
e.
For confidence coefficient .95, $\alpha = .05$ and $\alpha/2 = .05/2 = .025$. From Table V, Appendix B, with df =
$n - (k + 1) = 3{,}492 - (10 + 1) = 3{,}481$, $t_{.025} = 1.96$. The confidence interval is:
We are 95% confident that the mean SAT-Math score for those who were coached was anywhere
from 8.12 to 19.88 points higher than the mean for those who were not coached, holding all other
variables constant.
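The interval has the usual form estimate ± t × (standard error). The sketch below reproduces the reported endpoints under two stated assumptions: the coaching estimate is 14 (the value mentioned in this solution for the expanded model), and the standard error is 3, which is not shown in the extracted text and is inferred here only because it reproduces the interval 8.12 to 19.88. The function name `conf_interval` is illustrative.

```python
def conf_interval(estimate, std_error, critical):
    """Two-sided confidence interval: estimate +/- critical * std_error."""
    margin = critical * std_error
    return estimate - margin, estimate + margin


# Assumed inputs: estimate 14, standard error 3 (inferred), t.025 = 1.96
low, high = conf_interval(14, 3, 1.96)
print(round(low, 2), round(high, 2))
```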
f.
Yes. The value of $\hat{\beta}_2$ decreased from 19 to 14 when the additional variables were added to the
model. Thus, the increase from coaching is not as great.
g.
h.
To determine if the model with the interaction terms is better in predicting SAT-Math scores, we test:
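$H_0: \beta_{11} = \beta_{12} = \cdots = \beta_{19} = 0$
$H_a$: At least one $\beta_i \neq 0$, i = 11, 12, ..., 19
We would fit the complete model above. We would then compare it to the fitted model from part a
(Reduced model). The test statistic would be:
$F = \frac{(SSE_R - SSE_C)/(k - g)}{SSE_C/[n - (k + 1)]}$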
11.100 a.
Using MINITAB, the results for fitting the reduced model are:
Regression Analysis: Price versus X1, X2, X3, X4, X1X2, X1X3, X1X4

The regression equation is
Price = - 286970 + 9.32 X1 + 578133 X2 + 60968 X3 - 575769 X4 - 10.4 X1X2
- 6.52 X1X3 + 1.00 X1X4

Predictor      Coef  SE Coef      T      P
Constant    -286970   161003  -1.78  0.090
X1            9.317    2.900   3.21  0.004
X2           578133   183578   3.15  0.005
X3            60968   292823   0.21  0.837
X4          -575769   325699  -1.77  0.092
X1X2        -10.408    3.060  -3.40  0.003
X1X3         -6.522    3.300  -1.98  0.062
X1X4          1.000    3.903   0.26  0.800

S = 30785.9   R-Sq = 70.0%   R-Sq(adj) = 59.5%

Analysis of Variance
Source          DF           SS          MS     F      P
Regression       7  44177277861  6311039694  6.66  0.000
Residual Error  20  18955448594   947772430
Total           27  63132726455

Source  DF       Seq SS
X1       1      3591326
X2       1   8414868549
X3       1   9294417537
X4       1   1463449502
X1X2     1  17344397940
X1X3     1   7594294303
X1X4     1     62258704
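The test statistic is $F = \frac{(SSE_R - SSE_C)/(k - g)}{SSE_C/[n - (k + 1)]} = \frac{(18{,}955{,}448{,}594 - 9{,}499{,}097{,}458)/(11 - 7)}{9{,}499{,}097{,}458/[28 - (11 + 1)]} = 3.98$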
The rejection region requires $\alpha = .05$ in the upper tail of the F distribution with $\nu_1 = k - g = 11 - 7 = 4$
and $\nu_2 = n - (k + 1) = 28 - (11 + 1) = 16$. From Table VIII, Appendix B, $F_{.05} = 3.01$. The rejection
region is F > 3.01.
Since the observed value of the test statistic falls in the rejection region (F = 3.98 > 3.01), $H_0$ is
rejected. There is sufficient evidence to indicate at least one of the quadratic terms is statistically
useful for predicting sales price at $\alpha = .05$.
b.
Since we rejected H0 in part a, the complete model is preferred. At least one of the quadratic terms is
significant.
d.
The preferred model from part b is the complete model. Using MINITAB, the results of fitting the
model without the interaction terms is:
Regression Analysis: Price versus X1, X1SQ, X2, X3, X4

The regression equation is
Price = 289549 - 2.15 X1 + 0.000019 X1SQ - 57530 X2 - 203755 X3 - 24038 X4

Predictor         Coef     SE Coef      T      P
Constant        289549      138840   2.09  0.049
X1              -2.150       3.325  -0.65  0.524
X1SQ        0.00001888  0.00001621   1.16  0.257
X2              -57530       50653  -1.14  0.268
X3             -203755      113332  -1.80  0.086
X4              -24038       67099  -0.36  0.724

S = 43381.9   R-Sq = 34.4%   R-Sq(adj) = 19.5%

Analysis of Variance
Source          DF           SS          MS     F      P
Regression       5  21729048947  4345809789  2.31  0.079
Residual Error  22  41403677509  1881985341
Total           27  63132726455

Source  DF       Seq SS
X1       1      3591326
X1SQ     1     64275360
X2       1  11338642654
X3       1  10081000583
X4       1    241539024
To determine whether region and sales volume interact to affect sales price, we test:
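$H_0: \beta_6 = \beta_7 = \beta_8 = \beta_9 = \beta_{10} = \beta_{11} = 0$
$H_a$: At least one $\beta_i \neq 0$
The test statistic is $F = \frac{(SSE_R - SSE_C)/(k - g)}{SSE_C/[n - (k + 1)]} = \frac{(41{,}403{,}677{,}509 - 9{,}499{,}097{,}458)/(11 - 5)}{9{,}499{,}097{,}458/[28 - (11 + 1)]} = 8.96$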
The rejection region requires $\alpha = .05$ in the upper tail of the F distribution with $\nu_1 = k - g = 11 - 5 = 6$
and $\nu_2 = n - (k + 1) = 28 - (11 + 1) = 16$. From Table VIII, Appendix B, $F_{.05} = 2.74$. The rejection
region is F > 2.74.
Since the observed value of the test statistic falls in the rejection region (F = 8.96 > 2.74), $H_0$ is
rejected. There is sufficient evidence to indicate region and sales volume interact to affect sales price
at $\alpha = .05$.
d.
Since we rejected H0 in part c, the complete model is preferred. At least one of the interaction terms is
significant.
11.101 a.
b.
c.
d.
To test for the presence of temperature-waste type interaction, we would fit the complete model listed
in part b and the reduced model found in part a. The hypotheses would be:
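$H_0: \beta_4 = \beta_5 = 0$
$H_a$: At least one $\beta_i \neq 0$, for i = 4, 5
The test statistic would be $F = \frac{(SSE_R - SSE_C)/(k - g)}{SSE_C/[n - (k + 1)]}$, where $SSE_R$ is the SSE for the
reduced model, and $SSE_C$ is the SSE for the complete model.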
11.102 a.
To determine whether the rate of increase of emotional distress with experience is different for the
two groups, we test:
$H_0: \beta_4 = \beta_5 = 0$
$H_a$: At least one $\beta_i \neq 0$, i = 4, 5
b.
To determine whether there are differences in mean emotional distress levels that are attributable to
exposure group, we test:
$H_0: \beta_3 = \beta_4 = \beta_5 = 0$
$H_a$: At least one $\beta_i \neq 0$, i = 3, 4, 5
c.
To determine whether there are differences in mean emotional distress levels that are attributable to
exposure group, we test:
$H_0: \beta_3 = \beta_4 = \beta_5 = 0$
$H_a$: At least one $\beta_i \neq 0$, i = 3, 4, 5
The test statistic is F =
The rejection region requires $\alpha = .05$ in the upper tail of the F distribution with $\nu_1 = k - g = 5 - 2 = 3$
and $\nu_2 = n - (k + 1) = 200 - (5 + 1) = 194$. From Table VIII, Appendix B, $F_{.05} \approx 2.60$. The rejection
region is F > 2.60.
Since the observed value of the test statistic does not fall in the rejection region
($F = .93 \not> 2.60$), $H_0$ is not rejected. There is insufficient evidence to indicate that there are
differences in mean emotional distress levels that are attributable to exposure group at $\alpha = .05$.
11.103 a.
Using MINITAB, the output from fitting a complete second-order model is:
* NOTE *   ... predictor variables
* NOTE *   ... predictor variables
* NOTE *   ... predictor variables

Predictor     Coef   Stdev  t-ratio      p
Constant    172788   97785     1.77  0.084
X1          -10739    2789    -3.85  0.000
X2            -499    1444    -0.35  0.731
X1X2        -20.20   21.36    -0.95  0.350
X1SQ        197.57   22.60     8.74  0.000
X2SQ        14.678   8.819     1.66  0.103

R-sq = 95.9%   R-sq(adj) = 95.5%

Analysis of Variance
SOURCE      DF           SS           MS       F      p
Regression   5  1.70956E+11  34191134720  198.27  0.000
Error       42   7242915328    172450368
Total       47  1.78199E+11

SOURCE  DF       SEQ SS
X1       1  1.56067E+11
X2       1     13214024
X1X2     1   1686339840
X1SQ     1  12711371776
X2SQ     1    477704384

Unusual Observations
Obs.    X1       Y     Fit  Stdev.Fit  Residual  St.Resid
 14   62.9  203288  235455       6002    -32167    -2.75R
 22   45.4   27105   58567       3603    -31462    -2.49R
 34   28.2   28722   15156      11311     13566     2.03RX
 43   64.3  230329  248054       8790    -17725    -1.82 X
 47   63.9  212309  240469       4904    -28160    -2.31R
b.
Using MINITAB, the output from fitting the reduced model is:
* NOTE *   ... predictor variables

Predictor      Coef   Stdev  t-ratio      p
Constant    -476768  100852    -4.73  0.000
X1            11458    1874     6.11  0.000
X2             3404    1814     1.88  0.067
X1X2         -64.35   33.77    -1.91  0.063

s = 21549   R-sq = 88.5%   R-sq(adj) = 87.8%

Analysis of Variance
SOURCE      DF           SS           MS       F      p
Regression   3  1.57767E+11  52588867584  113.25  0.000
Error       44  20431990784    464363424
Total       47  1.78199E+11

SOURCE  DF       SEQ SS
X1       1  1.56067E+11
X2       1     13214024
X1X2     1   1686339840

Unusual Observations
Obs.    X1       Y     Fit  Stdev.Fit  Residual  St.Resid
 34   28.2   28722  -59713      11922     88435     4.93RX
 38   66.5  290411  250350       9553     40061     2.07R
 43   64.3  230329  202899      11574     27430     1.51 X
The rejection region requires $\alpha = .05$ in the upper tail of the F-distribution with $\nu_1 = k - g = 5 - 3 = 2$
and $\nu_2 = n - (k + 1) = 48 - (5 + 1) = 42$. From Table VIII, Appendix B, $F_{.05} \approx 3.23$. The rejection
region is F > 3.23.
Since the observed value of the test statistic falls in the rejection region (F = 38.24 > 3.23), $H_0$ is
rejected. There is sufficient evidence to indicate that at least one of the quadratic terms contributes to
the prediction of monthly collision claims at $\alpha = .05$.
c.
From part b, we know at least one of the quadratic terms is significant. From part a, it appears that
none of the terms involving x2 may be significant.
Thus, we will fit the model with just $x_1$ and $x_1^2$. The MINITAB output is:
Predictor     Coef  Stdev  t-ratio      p
Constant    185160  54791     3.38  0.002
X1          -11580   2182    -5.31  0.000
X1SQ        195.54  21.64     9.04  0.000

s = 13219   R-sq = 95.6%   R-sq(adj) = 95.4%

Analysis of Variance
SOURCE      DF           SS           MS       F      p
Regression   2  1.70335E+11  85167357952  487.36  0.000
Error       45   7863868416    174752624
Total       47  1.78199E+11

SOURCE  DF       SEQ SS
X1       1  1.56067E+11
X1SQ     1  14267676672

Unusual Observations
Obs.    X1       Y     Fit  Stdev.Fit  Residual  St.Resid
 10   35.8   28957   21200       5825      7757     0.65 X
 14   62.9  203288  230397       4044    -27109    -2.15R
 22   45.4   27105   62456       2856    -35351    -2.74R
 34   28.2   28722   14099      11344     14623     2.15RX
 38   66.5  290411  279798       6189     10613     0.91 X
 47   63.9  212309  243611       4570    -31302    -2.52R
The rejection region requires $\alpha = .05$ in the upper tail of the F-distribution with $\nu_1 = k - g = 5 - 2 = 3$
and $\nu_2 = n - (k + 1) = 48 - (5 + 1) = 42$. From Table VIII, Appendix B, $F_{.05} \approx 2.84$. The rejection
region is F > 2.84.
Since the observed value of the test statistic does not fall in the rejection region ($F = 1.20 \not> 2.84$),
$H_0$ is not rejected. There is insufficient evidence to indicate that any of the terms involving $x_2$
contribute to the model at $\alpha = .05$.
Thus, it appears that the best model is $E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_1^2$. The model does not support the
analyst's claim. In the model above, the estimate for $\beta_2$ is positive. This would indicate that the
higher claims are for both the young and the old. Also, there is no evidence to support the claim that
there are more claims when the temperature goes down.
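The two model comparisons in this exercise can be run side by side from the Error sums of squares on the three printouts. The sketch below is illustrative only: the comparison labels and variable names are made up, and the critical values 3.23 and 2.84 are the approximate tabled values quoted in the solution rather than computed ones.

```python
SSE_COMPLETE = 7_242_915_328   # complete second-order model, k = 5
N, K = 48, 5

# label: (SSE of reduced model, number of terms g, tabled F.05)
comparisons = {
    "drop quadratic terms": (20_431_990_784, 3, 3.23),
    "drop all x2 terms": (7_863_868_416, 2, 2.84),
}

mse_complete = SSE_COMPLETE / (N - (K + 1))
results = {}
for label, (sse_reduced, g, critical) in comparisons.items():
    f_stat = ((sse_reduced - SSE_COMPLETE) / (K - g)) / mse_complete
    results[label] = (round(f_stat, 2), f_stat > critical)

for label, (f_stat, reject) in results.items():
    print(f"{label}: F = {f_stat}, reject H0: {reject}")
```

Rejecting in the first comparison and failing to reject in the second is what leads to the model containing only the linear and squared terms in x1.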
11.104 a.
The best one-variable predictor of y is the one whose t statistic has the largest absolute value. The t
statistics for each of the variables are:
Independent Variable    t statistic
x1                      t = 1.6/.42 = 3.81
x2                      t = .9/.01 = 90
x3                      t = 3.4/1.14 = 2.98
x4                      t = 2.5/2.06 = 1.21
x5                      t = 4.4/.73 = 6.03
x6                      t = .3/.35 = .86
The variable x2 is the best one-variable predictor of y. The absolute value of the corresponding t
score is 90. This is larger than any of the others.
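The first step of the stepwise procedure can be sketched as a ranking by absolute t statistic. The estimates and standard errors below are the values quoted in the table above, taken as printed; the variable names are illustrative.

```python
# (coefficient estimate, standard error) for each one-variable model
estimates = {
    "x1": (1.6, 0.42),
    "x2": (0.9, 0.01),
    "x3": (3.4, 1.14),
    "x4": (2.5, 2.06),
    "x5": (4.4, 0.73),
    "x6": (0.3, 0.35),
}

# t statistic = estimate / standard error
t_stats = {name: est / se for name, (est, se) in estimates.items()}

# The best one-variable predictor has the largest |t|.
best = max(t_stats, key=lambda name: abs(t_stats[name]))

for name, t in t_stats.items():
    print(name, round(t, 2))
print("best:", best)
```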
b.
Yes. In the stepwise procedure, the first variable entered is the one which has the largest absolute
value of t, provided the absolute value of the t falls in the rejection region.
c.
Once x2 is entered, the next variable that is entered is the one that, in conjunction with x2, has the
largest absolute t value associated with it.
11.105 a.
In Step 1, all one-variable models are fit to the data. These models are of the form:
$E(y) = \beta_0 + \beta_1 x_i$
Since there are 7 independent variables, 7 models are fit. (Note: There are actually only 6
independent variables. One of the qualitative variables has three levels and thus two dummy
variables. Some statistical packages will allow one to bunch these two variables together so that they
are either both in or both out. In this answer, we are assuming that each $x_i$ stands by itself.)
b.
In Step 2, all two-variable models are fit to the data, where the variable selected in Step 1, say $x_1$, is
one of the variables. These models are of the form:
$E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_i$
Since there are 6 independent variables remaining, 6 models are fit.
In Step 3, all three-variable models are fit to the data, where the variables selected in Step 2, say x1
and x2, are two of the variables. These models are of the form:
$E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_i$
Since there are 5 independent variables remaining, 5 models are fit.
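The count of models fit at each step follows a simple pattern: one model per candidate variable not yet selected. A small sketch, assuming one variable enters at every step and none is removed (the function name `models_fit_per_step` is illustrative):

```python
def models_fit_per_step(num_candidates, steps):
    """Number of models fit at steps 1..steps when one variable
    enters at each step and none is removed."""
    return [num_candidates - step for step in range(steps)]


# Seven candidate variables, first three steps
print(models_fit_per_step(7, 3))

# With eleven candidates, only one model is left to fit at the 11th step
print(models_fit_per_step(11, 11)[-1])
```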
d.
The procedure stops adding independent variables when none of the remaining variables, when
added to the model, has a p-value less than some predetermined value. This predetermined value is
usually $\alpha = .05$.
e.
Two major drawbacks to using the final stepwise model as the "best" model are:
(1) An extremely large number of single parameter t-tests have been conducted. Thus, the
probability is very high that one or more errors have been made in including or excluding
variables.
(2) Often the variables selected to be included in a stepwise regression do not include the high-order
terms. Consequently, we may have initially omitted several important terms from the
model.
11.106 a.
In the first step, there are 8 one-variable models fit to the data.
b.
The best one-variable model is the model that contains the one variable with the largest absolute
value of the t-statistic. This would also correspond to the one variable with the smallest p-value.
c.
d.
$\hat{\beta}_1 = -.28$. The mean relative error for developers is estimated to be .28 lower than the mean relative
error for project leaders, holding previous accuracy constant.
$\hat{\beta}_8 = .27$. The mean relative error for previous accuracy more than 20% is estimated to be .27 higher
than the mean relative error for previous accuracy less than 20%, holding company role of estimator
constant.
e.
There are a couple of reasons for being wary of using this model as the final model. First, in stepwise
regression, once a variable is in the model, it cannot be dropped. The best one-variable model might
contain $x_1$, but the best model may contain the variables $x_2$ and $x_3$. By including $x_1$ in the model, we
may never get to the best model. Another reason to be wary is that we have not considered any
second-order terms or interactions in the model. These higher-order terms might be very important in the
model.
11.107 a.
In step 1, all one-variable models are fit. Thus, there are a total of 11 models fit.
b.
In step 2, all two-variable models are fit, where one of the variables is the best one selected in step 1.
Thus, a total of 10 two-variable models are fit.
c.
In the 11th step, only one model is fit: the model containing all the independent variables.
d.
e.
67.7% of the total sample variability of overall satisfaction is explained by the model containing the
independent variables safety on bus, seat availability, dependability, travel time, convenience of
route, safety at bus stops, hours of service, and frequency of service.
f.
Using stepwise regression does not guarantee that the best model will be found. There may be better
combinations of the independent variables that are never found, because of the order in which the
independent variables are entered into the model. In addition, there are no squared or interaction
terms included. There is a high probability of making at least one Type 1 error.
11.108 a.
From the printout, the three variables that should be included in the model are: ST-DEPTH,
TGRSWT, and TI. They are all entered into the model using stepwise regression and all are retained.
b.
No. There may be other independent variables that were not included.
c.
d.
He would test
$H_0$: $\beta_4 = \beta_5 = \beta_6 = 0$ versus
$H_a$: At least one $\beta_i \neq 0$, $i = 4, 5, 6$
He would fit the first-order model and record SSER. He would then fit the model with the interaction
terms and record SSEC.
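The test statistic is $F = \frac{(SSE_R - SSE_C)/(k - g)}{SSE_C/[n - (k + 1)]}$, where $k - g = 3$ is the number of interaction terms tested.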
e.
To improve the model, the marine biologist could try to find other independent variables that affect y,
the log of the number of marine animals present, or higher order terms of the already identified
independent variables.
11.109 Yes. x2 and x4 are highly correlated (.93), as well as x4 and x5 (.86). When highly correlated independent
variables are present in a regression model, the results can be confusing. The researcher may want to
include only one of the variables.
11.110 a.
The plot of the residuals reveals a nonrandom pattern. The residuals exhibit a curved shape. Such a
pattern usually indicates that curvature needs to be added to the model.
b.
The plot of the residuals reveals a nonrandom pattern. The plot of the residuals versus the predicted values
shows a pattern where the range in values of the residuals increases as $\hat{y}$ increases. This indicates
that the variance of the random error, $\varepsilon$, becomes larger as the estimate of E(y) increases in value.
Since E(y) depends on the x-values in the model, this implies that the variance of $\varepsilon$ is not constant
for all settings of the x's.
c.
This plot reveals an outlier, since all or almost all of the residuals should fall within 3 standard
deviations of their mean of 0.
d.
This frequency distribution of the residuals is skewed to the right. This may be due to outliers or
could indicate the need for a transformation of the dependent variable.
11.111 a.
Since the absolute value of the correlation coefficient is .983, this would imply there is a very high
potential for multicollinearity.
b.
Since the absolute value of the correlation coefficient is .074, this would imply there is a very low
potential for multicollinearity.
c.
Since the absolute value of the correlation coefficient is .722, this would imply there is a moderate
potential for multicollinearity.
d.
Since the absolute value of the correlation coefficient is .528, this would imply there is a moderate
potential for multicollinearity.
11.112 a.
Since all the pairwise correlations are .45 or less in absolute value, there is little evidence of extreme
multicollinearity.
b.
No. The overall model test is significant (p < .001). This implies that at least one variable contributes
to the prediction of the urban/rural rating. Looking at the individual t-tests, there are several that are
significant, namely $x_1$, $x_3$, and $x_5$. There is no evidence that multicollinearity is present.
11.113 It is possible that company role of estimator and previous accuracy could be correlated with each other.
This indicates multicollinearity may be present.
11.114 First, we need to compute the value of the residual:
b.
11.116 a.
The normal probability plot should be used to check for normal errors. The points in this plot are
fairly close to the straight line, so the assumption of normality appears to be satisfied.
The graph of the residuals versus the fitted or predicted values should be used to check for unequal
variances. The spread of the residuals appears to be fairly constant in this graph. It appears that the
assumption of equal variances is satisfied.
From MINITAB, the output is:
Regression Analysis: Food versus Income, Size

The regression equation is
Food = 2.79 - 0.00016 Income + 0.383 Size

Predictor        Coef    SE Coef       T      P
Constant       2.7944     0.4363    6.40  0.000
Income      -0.000164   0.006564   -0.02  0.980
Size          0.38348    0.07189    5.33  0.000

S = 0.7188   R-Sq = 55.8%   R-Sq(adj) = 52.0%

Analysis of Variance
Source           DF       SS      MS      F      P
Regression        2  15.0027  7.5013  14.52  0.000
Residual Error   23  11.8839  0.5167
Total            25  26.8865

Source   DF   Seq SS
Income    1   0.2989
Size      1  14.7037
No; Income and household size do not seem to be highly correlated. The correlation coefficient
between income and household size is .137.
[Residual plots: histogram of the residuals (Frequency versus Residual); Residual versus Fitted Value]
b.
[Residual plots: Residual versus Income; Residual versus Size]
Yes; the plots of the residuals versus income and the residuals versus household size exhibit a curved shape. Such a
pattern could indicate that a second-order model may be more appropriate.
c.
No; the plot of the residuals versus the predicted values reveals varying spreads for different values of $\hat{y}$. This
implies that the variance of $\varepsilon$ is not constant for all settings of the x's.
d.
Yes; The outlier shows up in several plots and is the 26th household (Food consumption = $7500,
income = $7300 and household size = 5).
e.
No; The frequency distribution of the residuals shows that the outlier skews the frequency
distribution to the right.
[Residual plots: normal probability plot and histogram of the standardized residuals; Standardized Residual versus Fitted Value; Standardized Residual versus Observation Order; standardized residuals (SRES1) versus the independent variables, including LONGITUDE]
a.
From the histogram of the standardized residuals, it appears that the mean of the residuals is close to
0. Thus, the assumption that the mean error is 0 appears to be met.
b.
From the plot of the standardized residuals versus the fitted values, it appears that the spread of the
residuals increases as the fitted values increase. Thus, it appears that the assumption of constant
variance is violated.
c.
From the plots of the standardized residuals versus the fitted values, it appears that there are some
outliers. There are several observations with standardized residuals of 4 or more.
d.
From the normal probability plot, the data do not form a straight line. Thus, it appears that the
assumption of normal error terms is violated.
e.
LATITUDE     0.311
             0.000
LONGITUDE    0.151   -0.328
             0.006    0.000
None of the pairwise correlations are large in absolute value, so there is no evidence of
multicollinearity. In addition, the global test indicates that at least one of the independent variables is
significant and each of the independent variables is statistically significant. This also indicates that
multicollinearity does not exist.
[Residual plots: normal probability plot and histogram of the standardized residuals; Standardized Residual versus Fitted Value; Standardized Residual versus Observation Order; Standardized Residual versus WEIGHT, LENGTH, and MILE]
From the normal probability plot, the points do not fall on a straight line, indicating the residuals are not
normal. The histogram of the residuals indicates the residuals are skewed to the right, which also indicates
that the residuals are not normal. The plot of the residuals versus yhat indicates that there is at least one
outlier and the variance is not constant. One observation has a standardized residual of more than 10 and
several others have standardized residuals greater than 3. This is also evident in the plots of the residuals
versus each of the independent variables. Since the assumptions of normality and constant variance appear
to be violated, we could consider transforming the data. We should also check the outlying observations to
see if there are any errors connected with these observations.
11.119 a.
Predictor      Coef   SE Coef        T      P
Constant      30856      2713    11.37  0.000
Temp        -191.57     18.49   -10.36  0.000

S = 1099.17   R-Sq = 84.3%   R-Sq(adj) = 83.5%

Analysis of Variance
Source           DF         SS         MS       F      P
Regression        1  129663987  129663987  107.32  0.000
Residual Error   20   24163399    1208170
Total            21  153827386
For temperature = 149, $\hat{y} = 30{,}856 - 191.57(149) = 2{,}312.07$. There are 2 observations with a
temperature of 149. The residuals for the microchips manufactured at a temperature of 149°C are
c.
[Plot of the residuals (RESI1) versus Temp]
Yes. Because there appears to be a U-shaped trend to the data, this indicates that there is a
curvilinear relationship between temperature and time.
Predictor        Coef    SE Coef       T      P
Constant      12065.5      418.5   28.83  0.000
RPM           0.16969    0.03467    4.89  0.000
CPRATIO       -146.07      26.66   -5.48  0.000
RPM*CPR     -0.002425   0.003120   -0.78  0.440

S = 633.842   R-Sq = 84.9%   R-Sq(adj) = 84.2%

Analysis of Variance
Source           DF         SS        MS       F      P
Regression        3  142586570  47528857  118.30  0.000
Residual Error   63   25310639    401756
Total            66  167897208

Source     DF     Seq SS
RPM         1  119598530
CPRATIO     1   22745478
RPM*CPR     1     242561

Unusual Observations
Obs    RPM  HEATRATE      Fit  SE Fit  Residual  St Resid
 11  18000   14628.0  12710.6   165.1    1917.4     3.13R
 28  22516   14796.0  14561.9   277.9     234.1     0.41 X
 36   4473   13523.0  11428.0   171.5    2095.0     3.43R
 61  33000   16243.0  16105.3   410.2     137.7     0.28 X
 62  30000   14628.0  15296.4   288.7    -668.4    -1.18 X
 64   3600    8714.0   7258.6   427.1    1455.4     3.11RX
[Residual plots (response is HEATRATE): normal probability plot and histogram of the standardized residuals; Standardized Residual versus Fitted Value; Standardized Residual versus Observation Order; Standardized Residual versus CPRATIO; Standardized Residual versus RPM]
From the normal probability plot, the points do not fall on a straight line, indicating the residuals are
not normal. The histogram of the residuals indicates the residuals are skewed to the right, which also
indicates that the residuals are not normal. The plot of the residuals versus yhat indicates that there
are potentially 3 outliers with standardized residuals of 3 or more. The variance appears to be
constant. On the graph of the residuals versus RPM, the spread of the residuals appears to decrease
as the value of RPM increases. This indicates the variance may not be constant for RPMs. Since the
assumptions of normality and constant variance appear to be violated, we could consider
transforming the data. We should also check the outlying observations to see if there are any errors
connected with these observations.
11.121 In multiple regression, as in simple regression, the confidence interval for the mean value of y is narrower
than the prediction interval of a particular value of y.
11.122 The error of prediction is smallest when the values of x1, x2, and x3 are equal to their sample means. The
further x1, x2, and x3 are from their means, the larger the error. When x1 = 60, x2 = .4, and x3 = 900, the
observed values are outside the observed ranges of the x values. When x1 = 30, x2 = .6, and x3 = 1300, the
observed values are within the observed ranges and consequently the x values are closer to their means.
Thus, when x1 = 30, x2 = .6, and x3 = 1300, the error of prediction is smaller.
11.123 The model-building step is the key to the success or failure of a regression analysis. If the model is a good
model, we will have a good predictive model for the dependent variable y. If the model is not a good
model, the predictive ability will not be of much use.
11.124 a.
$F = \frac{R^2/k}{(1 - R^2)/[n - (k + 1)]} = \frac{.83/4}{(1 - .83)/[25 - (4 + 1)]} = 24.41$
The rejection region requires $\alpha = .05$ in the upper tail of the F distribution with numerator df = k = 4
and denominator df = n - (k + 1) = 25 - (4 + 1) = 20. From Table VIII, Appendix B, $F_{.05} = 2.87$.
The rejection region is F > 2.87.
Since the observed value of the test statistic falls in the rejection region (F = 24.41 > 2.87), $H_0$ is
rejected. There is sufficient evidence to indicate at least one of the parameters is nonzero at $\alpha = .05$.
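The global F statistic in part a depends only on $R^2$, n, and k, so the arithmetic is easy to check. The sketch below is illustrative only; the function name `global_f_from_r2` is hypothetical, and the critical value is still read from the F table rather than computed.

```python
def global_f_from_r2(r_squared, n, k):
    """Global F statistic: (R^2 / k) / ((1 - R^2) / (n - (k + 1)))."""
    numerator = r_squared / k
    denominator = (1 - r_squared) / (n - (k + 1))
    return numerator / denominator


# Values from part a: R^2 = .83, n = 25, k = 4.
f_stat = global_f_from_r2(0.83, n=25, k=4)
print(round(f_stat, 2))  # 24.41

# Compare with the tabled critical value F.05 = 2.87 (4 and 20 df).
print(f_stat > 2.87)     # True, so H0 is rejected
```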
b.
$H_0$: $\beta_1 = 0$
$H_a$: $\beta_1 < 0$
The test statistic is $t = \frac{\hat{\beta}_1 - 0}{s_{\hat{\beta}_1}} = \frac{-2.43 - 0}{1.21} = -2.01$
The rejection region requires $\alpha = .05$ in the lower tail of the t distribution with df = n - (k + 1) = 25 -
(4 + 1) = 20. From Table V, Appendix B, $t_{.05} = 1.725$. The rejection region is t < -1.725.
Since the observed value of the test statistic falls in the rejection region (t = -2.01 < -1.725), $H_0$ is
rejected. There is sufficient evidence to indicate $\beta_1$ is less than 0 at $\alpha = .05$.
c.
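$H_0$: $\beta_2 = 0$
$H_a$: $\beta_2 > 0$
The test statistic is $t = \frac{\hat{\beta}_2 - 0}{s_{\hat{\beta}_2}} = \frac{.05 - 0}{.16} = .31$
The rejection region requires $\alpha = .05$ in the upper tail of the t distribution. From part b above, the
rejection region is t > 1.725.
Since the observed value of the test statistic does not fall in the rejection region ($t = .31 \not> 1.725$), $H_0$
is not rejected. There is insufficient evidence to indicate $\beta_2$ is greater than 0 at $\alpha = .05$.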
d.
$H_0$: $\beta_3 = 0$
$H_a$: $\beta_3 \neq 0$
The test statistic is $t = \frac{\hat{\beta}_3 - 0}{s_{\hat{\beta}_3}} = \frac{.62 - 0}{.26} = 2.38$
The rejection region requires $\alpha/2 = .05/2 = .025$ in each tail of the t distribution with df = 20. From
Table V, Appendix B, $t_{.025} = 2.086$. The rejection region is t < -2.086 or t > 2.086.
Since the observed value of the test statistic falls in the rejection region (t = 2.38 > 2.086), $H_0$ is
rejected. There is sufficient evidence to indicate $\beta_3$ is different from 0 at $\alpha = .05$.
11.125 a.
b.
$R^2 = .916$. About 91.6% of the sample variability in the y's is explained by the model
$E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2$.
c.
$F = \frac{MSR}{MSE} = \frac{7400}{114} = 64.91$
The rejection region requires $\alpha = .05$ in the upper tail of the F distribution with $\nu_1 = k = 2$ and $\nu_2 = n -
(k + 1) = 15 - (2 + 1) = 12$. From Table VIII, Appendix B, $F_{.05} = 3.89$. The rejection region is F >
3.89.
Since the observed value of the test statistic falls in the rejection region (F = 64.91 > 3.89), $H_0$ is
rejected. There is sufficient evidence to indicate the model is useful for predicting y at $\alpha = .05$.
d.
$H_0$: $\beta_1 = 0$
$H_a$: $\beta_1 \neq 0$
The test statistic is $t = \frac{\hat{\beta}_1}{s_{\hat{\beta}_1}} = \frac{-1.836}{.367} = -5.01$
The rejection region requires $\alpha/2 = .05/2 = .025$ in each tail of the t distribution with df = n - (k + 1)
= 15 - (2 + 1) = 12. From Table V, Appendix B, $t_{.025} = 2.179$. The rejection region is t < -2.179 or t
> 2.179.
Since the observed value of the test statistic falls in the rejection region (t = -5.01 < -2.179), $H_0$ is
rejected. There is sufficient evidence to indicate $\beta_1$ is not 0 at $\alpha = .05$.
e.
The standard deviation is $s = \sqrt{MSE} = \sqrt{114} = 10.68$. We would expect about 95% of the
observations to fall within 2(10.68) = 21.36 units of the fitted regression line.
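The F statistic in part c and the standard deviation in part e both come from the same two mean squares. A short sketch of that arithmetic follows; the variable names are illustrative, and s is rounded to two decimals before doubling, as in the text.

```python
import math

msr = 7400  # mean square for regression, from the ANOVA table
mse = 114   # mean square for error, from the ANOVA table

# Global F statistic is the ratio of the two mean squares.
f_stat = msr / mse
print(round(f_stat, 2))  # 64.91

# Estimated standard deviation of the error term.
s = round(math.sqrt(mse), 2)
print(s)                 # 10.68

# About 95% of observations should fall within 2s of the fitted line.
print(round(2 * s, 2))   # 21.36
```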
11.126 From the plot of the residuals for the straight line model, there appears to be a mound shape which implies
the quadratic model should be used.
11.127 $E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3$
where $x_1$ = 1 if level 2, 0 otherwise; $x_2$ = 1 if level 3, 0 otherwise; $x_3$ = 1 if level 4, 0 otherwise
11.128 a.
0, otherwise
b.
$E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_1^2 + \beta_3 x_2 + \beta_4 x_3 + \beta_5 x_1 x_2 + \beta_6 x_1 x_3 + \beta_7 x_1^2 x_2 + \beta_8 x_1^2 x_3$
11.131 Even though SSE = 0, we cannot estimate $\sigma^2$ because there are no degrees of freedom corresponding to
error. With three data points, there are only two degrees of freedom available. The degrees of freedom
corresponding to the model is k = 2 and the degrees of freedom corresponding to error is n - (k + 1) = 3 -
(2 + 1) = 0. Without an estimate for $\sigma^2$, no inferences can be made.
is also fit to the data and its sum of squares for error is obtained, denoted SSER. Then the test statistic
is:
$F = \frac{(SSE_R - SSE_C)/2}{SSE_C/29}$
d.
The rejection region requires $\alpha = .05$ in the upper tail of the F distribution with numerator df = 2 and
denominator df = 29. From Table VIII, Appendix B, $F_{.05} = 3.33$. The rejection region is F > 3.33.
11.133 a.
b.
A confidence interval for the difference of two population means, $(\mu_1 - \mu_2)$, could be used. Since
both sample sizes are over 30, the large sample confidence interval is used (with independent
samples).
Let x = 1 if public college, 0 otherwise
The model is $E(y) = \beta_0 + \beta_1 x$
c.
$\beta_1$ is the difference between the two population means. A point estimate for $\beta_1$ is $\hat{\beta}_1$. A confidence
interval for $\beta_1$ could be used to estimate the difference in the two population means.
11.134 a.
1.
2.
3.
4.
5.
b.
The quantitative variables GMAT score, verbal GMAT score, undergraduate GPA, and first-year
graduate GPA should all be positively correlated to final GPA.
c.
d.
e.
$\beta_2$ = the final GPA will increase by $\beta_2$ for each additional increase of one unit of verbal GMAT score,
holding the remaining variables constant.
$\beta_3$ = the final GPA will increase by $\beta_3$ for each additional increase of one undergraduate GPA point,
holding the remaining variables constant.
$\beta_4$ = the final GPA will increase by $\beta_4$ for each additional increase of one first-year graduate GPA
point, holding the remaining variables constant.
$\beta_5$ = difference in mean final GPA between student cohort year 2 and year 1.
$\beta_6$ = difference in mean final GPA between student cohort year 3 and year 1.
f.
g.
11.135 a.
b.
$x_1$ = diameter of orange
$x_2$ = 1 if Brand B, 0 if not
c.
d.
For part b:
For part c:
e.
To determine whether the model in part c provides more information for predicting yield than does
the model in part b, we test:
$H_0$: $\beta_3 = 0$
$H_a$: $\beta_3 \neq 0$
f.
To compute SSER: The model in part b is fit and SSER is the sum of squares for error.
To compute SSEC: The model in part c is fit and SSEC is the sum of squares for error.
b.
R2 = .31. 31% of the total sample variation of the natural log of the level of CO2 emissions in 1996 is
explained by the model containing the 7 independent variables.
The test statistic is $F = \frac{R^2/k}{(1 - R^2)/[n - (k + 1)]} = \frac{.31/7}{(1 - .31)/[66 - (7 + 1)]} = 3.72$
The rejection region requires $\alpha = .01$ in the upper tail of the F-distribution with $\nu_1 = k = 7$ and
$\nu_2 = n - (k + 1) = 66 - (7 + 1) = 58$. From Table VIII, Appendix B, $F_{.01} \approx 2.95$. The rejection region
is F > 2.95.
Since the observed value of the test statistic falls in the rejection region (F = 3.72 > 2.95), $H_0$ is
rejected. There is sufficient evidence to indicate that at least one of the 7 independent variables is
useful in the prediction of natural log of the level of CO2 emissions in 1996 at $\alpha = .01$.
c.
To determine if foreign investments in 1980 is a useful predictor of CO2 emissions in 1996, we test:
$H_0$: $\beta_1 = 0$
$H_a$: $\beta_1 \neq 0$
d.
The test statistic is t = 2.52 and the p-value is p < 0.05. Since the observed p-value is less than $\alpha$
(p < .05), $H_0$ is rejected. There is sufficient evidence to indicate foreign investments in 1980 is a
useful predictor of CO2 emissions in 1996 at $\alpha = .05$.
11.137 Variables that are highly correlated with each other are x4 and x5 (r = -.84). When highly correlated
independent variables are present in a regression model, the results can be confusing. Possible problems
include:
1.
Global test indicates at least one independent variable is useful in the prediction of y, but none of the
individual tests for the independent variables is significant.
2.
The signs of the estimated beta coefficients are opposite from what is expected.
11.138 a.
b.
$\hat{\beta}_1 = -.28$. The mean value for the relative error of the effort estimate for developers
is estimated to be .28 units below that of project leaders, holding previous accuracy constant.
$\hat{\beta}_8 = .27$. The mean value for the relative error of the effort estimate if previous accuracy is more
than 20% is estimated to be .27 units above that if previous accuracy is less than 20%, holding
company role of estimator constant.
c.
One possible reason for the sign of $\hat{\beta}_1$ being opposite from what is expected could be that company
role of estimator and previous accuracy could be correlated.
11.139 a.
$R^2 = .712$. 71.2% of the total sample variation in the fees charged by auditors is explained
$H_0$: $\beta_1 = \beta_2 = \beta_3 = \beta_4 = \beta_5 = \beta_6 = \beta_7 = 0$
$H_a$: At least one $\beta_i \neq 0$, i = 1, 2, 3, ..., 7
The test statistic is F = 111.1 (from table).
Since no $\alpha$ was given, we will use $\alpha = .05$. The rejection region requires $\alpha = .05$ in the upper tail of
the F-distribution with $\nu_1 = k = 7$ and $\nu_2 = n - (k + 1) = 268 - (7 + 1) = 260$. From Table VIII,
Appendix B, $F_{.05} \approx 2.01$. The rejection region is F > 2.01.
Since the observed value of the test statistic falls in the rejection region (F = 111.1 > 2.01), $H_0$ is
rejected. There is sufficient evidence to indicate that the model is adequate for predicting the audit
fees at $\alpha = .05$.
c.
If new auditors charge less than incumbent auditors, then $\beta_1$ is negative. By definition, $x_1$ = 1 if new
auditor and 0 if incumbent. Therefore, we will be adding $\beta_1$ to the mean only for new auditors. If new
auditors charge less, we have to add a negative number.
11.140 a.
Let $x_1$ = 1 if no, 0 if yes
The model would be $E(y) = \beta_0 + \beta_1 x_1$
In this model, $\beta_0$ is the mean job preference for those who responded yes to the question "Flextime
of the position applied for" and $\beta_1$ is the difference in the mean job preference between those who
responded no to the question and those who answered yes to the question.
b.
Let $x_1$ = 1 if referral, 0 if not
$x_2$ = 1 if on-premise, 0 if not
c.
Let $x_1$ = 1 if counseling, 0 if not
$x_2$ = 1 if active search, 0 if not
d.
Let $x_1$ = 1 if not married, 0 if married
The model would be $E(y) = \beta_0 + \beta_1 x_1$
In this model, $\beta_0$ is the mean job preference for those who responded married to marital status and
$\beta_1$ is the difference in the mean job preference between those who responded not married and those
who answered married.
e.
Let $x_1$ = 1 if female, 0 if male
The model would be $E(y) = \beta_0 + \beta_1 x_1$
In this model, $\beta_0$ is the mean job preference for males and $\beta_1$ is the difference in the mean job
preference between females and males.
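The interpretation used throughout this exercise (the intercept is the base-level mean and the dummy coefficient is the difference in means) can be checked numerically. The sketch below fits a one-dummy model by ordinary least squares on a small made-up data set; the data values and the function name `fit_single_dummy` are hypothetical and serve only as an illustration.

```python
def fit_single_dummy(y, x):
    """Least squares fit of E(y) = b0 + b1*x for a single 0/1 dummy x."""
    n = len(y)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


# Hypothetical job-preference scores (not from the exercise data).
gender = ["male", "female", "female", "male", "female", "male"]
score = [4.0, 6.0, 7.0, 5.0, 8.0, 3.0]

# Dummy coding from part e: x1 = 1 if female, 0 if male.
x1 = [1 if g == "female" else 0 for g in gender]

b0, b1 = fit_single_dummy(score, x1)
print(b0)  # mean score for males (base level)
print(b1)  # mean score for females minus mean score for males
```

With these made-up scores the male mean is 4.0 and the female mean is 7.0, so the fit returns an intercept of 4.0 and a slope of 3.0.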
11.141 The correlation coefficient between Importance and Replace is .2682. This correlation coefficient is fairly
small and would not indicate a problem with multicollinearity between Importance and Replace. The
correlation coefficient between Importance and Support is .6991. This correlation coefficient is fairly large
and would indicate a potential problem with multicollinearity between Importance and Support. Probably
only one of these variables should be included in the regression model. The correlation coefficient
between Replace and Support is .0531. This correlation coefficient is very small and would not indicate a
problem with multicollinearity between Replace and Support. Thus, the model could probably include
Replace and one of the variables Support or Importance.
11.142 CEO income (x1) and stock percentage (x2) are said to interact if the effect of one variable, say CEO
income, on the dependent variable profit (y) depends on the level of the second variable, stock percentage.
11.143 a.
Let $x_2$ = 1 if intervention group, 0 otherwise
The first-order model would be:
If pretest score and group interact, the first-order model would be:
For the control group, x2 = 0. The first-order model including the interaction is:
11.144 a.
SOURCE     DF   SUM OF SQUARES   MEAN SQUARE   F VALUE   PROB>F
MODEL       3      25784705.01    8594901.67   241.758   0.0001
ERROR      16        568826.19   35551.63709
C TOTAL    19      26353531.20

ROOT MSE   188.5514    R-SQUARE   0.9784
DEP MEAN     3014.2    ADJ R-SQ   0.9744
C.V.       6.255438

PARAMETER ESTIMATES
VARIABLE   DF   PARAMETER ESTIMATE   STANDARD ERROR   T FOR H0: PARAMETER=0   PROB > |T|
INTERCEP    1           1333.17830        290.99944                   4.581       0.0003
X1          1          -0.15122302       0.37864583                  -0.399       0.6949
X2          1          -2.62532461       5.34596285                  -0.491       0.6300
X1X2        1           0.05195415      0.006863831                   7.569       0.0001
b.
$H_0$: $\beta_1 = \beta_2 = \beta_3 = 0$
$H_a$: At least one $\beta_i \neq 0$, i = 1, 2, 3
The test statistic is $F = \frac{MSR}{MSE} = \frac{8{,}594{,}901.67}{35{,}551.637} = 241.758$
The rejection region requires $\alpha = .05$ in the upper tail of the F distribution with numerator df = k = 3
and denominator df = n - (k + 1) = 20 - (3 + 1) = 16. From Table VIII, Appendix B, $F_{.05} = 3.24$.
The rejection region is F > 3.24.
Since the observed value of the test statistic falls in the rejection region (F = 241.758 > 3.24), $H_0$ is
rejected. There is sufficient evidence to indicate the model is useful at $\alpha = .05$.
c.
$H_0$: $\beta_3 = 0$
$H_a$: $\beta_3 \neq 0$
The test statistic is $t = \frac{\hat{\beta}_3 - 0}{s_{\hat{\beta}_3}} = 7.569$.
The rejection region requires $\alpha/2 = .05/2 = .025$ in each tail of the t distribution with df = n - (k + 1)
= 20 - (3 + 1) = 16. From Table V, Appendix B, $t_{.025} = 2.120$. The rejection region is t < -2.120 or t
> 2.120.
Since the observed value of the test statistic falls in the rejection region (t = 7.569 > 2.120), $H_0$ is
rejected. There is sufficient evidence to indicate the interaction between advertising expenditure and
shelf space is present at $\alpha = .05$.
d.
Advertising expenditure and shelf space are said to interact if the effect of advertising expenditure on
sales is different at different levels of shelf space.
e.
If a first-order model was used, the effect of advertising expenditure on sales would be the same
regardless of the amount of shelf space. If interaction really exists, the effect of advertising
expenditure on sales would depend on which level of shelf space was present.
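The meaning of interaction in parts d and e can be illustrated with the fitted interaction model from the printout in part a: the slope for X1 is the X1 estimate plus the X1X2 estimate times X2, so it changes with X2. In the sketch below the coefficient values are taken from that printout, while the X2 levels (5 and 10) and the function name are hypothetical; this section does not say which of X1 and X2 is advertising expenditure, so the code keeps the printout's labels.

```python
# Estimates copied from the PARAMETER ESTIMATES table.
B_X1 = -0.15122302    # coefficient of X1
B_X1X2 = 0.05195415   # coefficient of the X1*X2 interaction


def slope_of_x1(x2):
    """Estimated change in E(y) per unit increase in X1, at a given X2."""
    return B_X1 + B_X1X2 * x2


# Hypothetical X2 levels, chosen only to show that the slope changes.
for x2_level in (5, 10):
    print(x2_level, round(slope_of_x1(x2_level), 4))
```

Under a first-order model the interaction coefficient would be absent and the printed slope would be the same at every X2 level.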
f.
Since the data collected are sequential, it is fairly unlikely that the error terms are independent.
11.145 a.
Not necessarily. If Nickel was highly correlated to several other variables, then it might be better to
keep Nickel and drop some of the other highly correlated variables.
b.
Using stepwise regression is a good start for selecting the best set of predictor variables. However,
one should use caution when looking at the model selected using stepwise regression. Sometimes
important variables are not selected to be entered into the model. Also, many t-tests have been run,
thus inflating the Type I and Type II error rates. One must also consider using higher order terms in
the model and interaction terms.
c.
No, further exploration should be used. One should consider using higher order terms for the
variables (i.e. squared terms) and also interaction terms.
11.146 a.
Using MINITAB, a scattergram of the data is:
[Scatterplot of Rate vs Time]
It appears that as the time increases, the rate decreases but at a decreasing rate.
Predictor      Coef   SE Coef       T      P
Constant    1.00705   0.07899   12.75  0.000
Time        -1.1671    0.1219   -9.57  0.000
Tmsq        0.28975   0.03937    7.36  0.000

S = 0.101142   R-Sq = 92.7%   R-Sq(adj) = 91.4%

Analysis of Variance
Source           DF       SS       MS      F      P
Regression        2  1.54782  0.77391  75.65  0.000
Residual Error   12  0.12276  0.01023
Total            14  1.67057

Source   DF   Seq SS
Time      1  0.99365
Tmsq      1  0.55416
To determine if there is an upward curvature in the relationship between surface production rate and
time after turnoff, we test:
$H_0$: $\beta_2 = 0$
$H_a$: $\beta_2 > 0$
From the printout, the test statistic is t = 7.36 and the p-value is p = 0.000/2 = 0.000. Since the p-value
is less than $\alpha$ (p = 0.000 < .05), $H_0$ is rejected. There is sufficient evidence to indicate there is an
upward curvature in the relationship between surface production rate and time after turnoff at $\alpha = .05$.
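The printout reports a two-tailed p-value, so the answer halves it for the one-tailed test. Halving is appropriate only when the sign of the test statistic agrees with the alternative hypothesis; otherwise the one-tailed p-value is one minus half the two-tailed value. A small sketch of that rule, with a hypothetical function name:

```python
def one_tailed_p(two_tailed_p, t_stat, alternative):
    """Convert a two-tailed p-value to a one-tailed p-value.

    alternative is "greater" for Ha: beta > 0 or "less" for Ha: beta < 0.
    """
    if alternative == "greater":
        sign_agrees = t_stat > 0
    else:
        sign_agrees = t_stat < 0
    half = two_tailed_p / 2
    return half if sign_agrees else 1 - half


# Values from the printout: t = 7.36, reported two-tailed p = 0.000.
p_value = one_tailed_p(0.000, 7.36, "greater")
print(p_value < 0.05)  # True, so H0 is rejected at alpha = .05
```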
11.147 a.
b.
Let $x_2$ = 1 if I-35W, 0 if not
The complete second-order model would be
$E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_1^2 + \beta_3 x_2 + \beta_4 x_1 x_2 + \beta_5 x_1^2 x_2$
c.
Predictor        StDev       T      P
Constant         144.5    5.37  0.000
x1             0.01388    7.50  0.000
x1sq        0.00000033   -6.73  0.000
x2                1094    0.21  0.833
x1x2           0.09829   -0.09  0.926
x1sqx2      0.00000220    0.12  0.903

R-Sq = 97.2%   R-Sq(adj) = 97.0%

Analysis of Variance
Source           DF      SS      MS       F      P
Regression        5  555741  111148  457.73  0.000
Residual Error   66   16027     243
Total            71  571767

Source   DF  Seq SS
x1        1  254676
x1sq      1   21495
x2        1  279383
x1x2      1     183
x1sqx2    1       4

Unusual Observations
Obs     x1        y      Fit  StDev Fit  Residual  St Resid
 27  19062  1917.64  1953.27       2.51    -35.63    -2.32R
 48  26148  1982.02  1978.23       9.10      3.79     0.30 X
 53  26166  1972.92  1978.01       9.15     -5.09    -0.40 X
 55  20250  2120.00  2130.56      10.57    -10.56    -0.92 X
 56  20251  2140.00  2130.57      10.57      9.43     0.82 X
 63  24885  2160.02  2161.81      12.67     -1.79    -0.20 X
Predictor        StDev       T      P
Constant         578.9    0.34  0.734
x1             0.05551    2.69  0.009
x1sq        0.00000132   -2.24  0.028

R-Sq = 48.3%   R-Sq(adj) = 46.8%

Analysis of Variance
Source           DF      SS      MS      F      P
Regression        2  276171  138085  32.23  0.000
Residual Error   69  295597    4284
Total            71  571767

Source   DF  Seq SS
x1        1  254676
x1sq      1   21495

Unusual Observations
Obs     x1        y      Fit  StDev Fit  Residual  St Resid
 30  16691  1916.13  1865.11      23.39     51.02     0.83 X
 48  26148  1982.02  2079.68      33.08    -97.66    -1.73 X
 53  26166  1972.92  2079.59      33.31   -106.67    -1.89 X
 56  20251  2140.00  2007.88      10.43    132.12     2.04R
The test statistic is $F = \frac{(SSE_R - SSE_C)/(k - g)}{SSE_C/[n - (k + 1)]} = \frac{(295{,}597 - 16{,}027)/3}{16{,}027/66} = 383.76$
Since no $\alpha$ was given, we will use $\alpha = .05$. The rejection region requires $\alpha = .05$ in the upper tail of
the F-distribution with $\nu_1 = (k - g) = (5 - 2) = 3$ and $\nu_2 = n - (k + 1) = 72 - (5 + 1) = 66$. From
Table VIII, Appendix B, $F_{.05} \approx 2.76$. The rejection region is
F > 2.76.
Since the observed value of the test statistic falls in the rejection region
(F = 383.76 > 2.76), $H_0$ is rejected. There is sufficient evidence to indicate the curvilinear
relationship is different at the two locations at $\alpha = .05$.
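The nested-model F statistic of 383.76 can be reproduced from the error sums of squares in the two printouts (SSE = 295597 for the reduced model with x1 and x1sq only, SSE = 16027 for the complete model). A minimal sketch, with a hypothetical function name:

```python
def nested_f(sse_reduced, sse_complete, num_tested, n, k):
    """Partial F statistic for comparing nested regression models.

    num_tested is k - g, the number of beta parameters set to 0 under H0;
    k is the number of predictors in the complete model.
    """
    numerator = (sse_reduced - sse_complete) / num_tested
    denominator = sse_complete / (n - (k + 1))
    return numerator / denominator


# Residual Error SS values read from the two ANOVA tables.
f_stat = nested_f(sse_reduced=295597, sse_complete=16027,
                  num_tested=3, n=72, k=5)
print(round(f_stat, 2))  # 383.76
```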
d.
From this plot, we notice that there is only one point more than 2 standard deviations from the mean
and no points that are more than 3 standard deviations from the mean. Thus, there do not appear to
be any outliers. There is no curve to the residuals, so we have the appropriate model.
A stem-and-leaf display of the residuals is:
Character Stem-and-Leaf Display
Stem-and-leaf of RESI1   N = 72
Leaf Unit = 1.0

    1   -3  5
    1   -3
    2   -2  5
    5   -2  210
   13   -1  99877755
   23   -1  4443221100
   29   -0  996655
  (10)  -0  4432111000
   33    0  03344
   28    0  5678899
   21    1  11222244
   13    1  577
   10    2  0012334
    3    2  556
The stem-and-leaf display looks fairly mound-shaped, so it appears that the assumption of normality
is valid.
A plot of the residuals versus the fitted values is:
From this plot, there is no cone-shape. Thus, it appears that the assumption of constant variance is
valid.
11.148 a.
Predictor        StDev       T      P
Constant         12.67    2.28  0.034
x1          0.00000028   -0.38  0.708
x2              0.2326    3.63  0.002
x3              0.1316   -2.74  0.013
x4              0.1834   -1.64  0.117

R-Sq = 51.2%   R-Sq(adj) = 41.5%

Analysis of Variance
Source           DF       SS      MS     F      P
Regression        4   753.76  188.44  5.25  0.005
Residual Error   20   717.40   35.87
Total            24  1471.17

Source   DF  Seq SS
x1        1  129.96
x2        1  355.43
x3        1  172.19
x4        1   96.17

Unusual Observations
Obs        x1      y    Fit  StDev Fit  Residual  St Resid
  4  11940345  32.60  17.25       3.40     15.35     3.11R
 12   4905123  27.00  16.17       4.36     10.83     2.63R

The least squares prediction line is $\hat{y} = 28.9 - .00000011x_1 + .844x_2 - .360x_3 - .300x_4$.
b.
To determine if the model is useful for predicting percentage of problem mortgages, we test:
$H_0$: $\beta_1 = \beta_2 = \beta_3 = \beta_4 = 0$
$H_a$: At least one of the coefficients is nonzero
The test statistic is $F = \frac{MS(Model)}{MSE} = 5.25$
The p-value is p = .005. Since the p-value is less than $\alpha = .05$ (p = .005 < .05), $H_0$ is rejected. There
is sufficient evidence to indicate the model is useful in predicting percentage of problem mortgages
at $\alpha = .05$.
c.
$\hat{\beta}_0 = 28.9$. This is merely the y-intercept. It has no other meaning in this problem.
$\hat{\beta}_1 = -0.00000011$. For each unit increase in total mortgage loans, the mean percentage of problem
mortgages is estimated to decrease by 0.00000011, holding percentage of invested assets, percentage
of commercial mortgages, and percentage of residential mortgages constant.
$\hat{\beta}_2 = 0.844$. For each unit increase in percentage of invested assets, the mean percentage of problem
mortgages is estimated to increase by 0.844, holding total mortgage loans, percentage of commercial
mortgages, and percentage of residential mortgages constant.
$\hat{\beta}_3 = -0.360$. For each unit increase in percentage of commercial mortgages, the mean percentage of
problem mortgages is estimated to decrease by 0.360, holding total mortgage loans, percentage of
invested assets, and percentage of residential mortgages constant.
$\hat{\beta}_4 = -0.300$. For each unit increase in percentage of residential mortgages, the mean percentage of
problem mortgages is estimated to decrease by 0.300, holding total mortgage loans, percentage of
invested assets, and percentage of commercial mortgages constant.
d.
From the scattergrams, it appears that possibly x2 and x4 might warrant inclusion in the model as
second order terms.
e.
Predictor       StDev        T       P
Constant        13.81     4.07   0.001
x1         0.00000025    -0.31   0.760
x2             0.9935    -1.83   0.084
x3             0.1127    -3.99   0.001
x4             0.6079     0.37   0.718
x2sq          0.02665     2.89   0.010
x4sq          0.02334    -0.81   0.429

R-Sq = 69.9%   R-Sq(adj) = 59.9%

Analysis of Variance
Source           DF        SS       MS      F      P
Regression        6   1029.03   171.51   6.98  0.001
Residual Error   18    442.13    24.56
Total            24   1471.17

Source   DF   Seq SS
x1        1   129.96
x2        1   355.43
x3        1   172.19
x4        1    96.17
x2sq      1   259.22
x4sq      1    16.05

Unusual Observations
Obs        x1        y      Fit  StDev Fit  Residual  St Resid
  4  11940345   32.600   26.777      4.038     5.823     2.03R
 10   5328142    7.500   16.105      2.599    -8.605    -2.04R
 12   4905123   27.000   16.559      3.607    10.441     3.07R
 20   2978628    3.200   11.759      2.679    -8.559    -2.05R

The test statistic is $F = \dfrac{MS(\text{Model})}{MSE} = 6.98$.
The p-value is p = .001. Since the p-value is less than $\alpha = .05$ (p = .001 < .05), $H_0$ is rejected. There
is sufficient evidence to indicate the model is useful in predicting percentage of problem mortgages
at $\alpha = .05$.
f.
To determine if one or more of the second-order terms of our model contribute information for the
prediction of the percentage of problem mortgages, we test:
$H_0: \beta_5 = \beta_6 = 0$
$H_a$: At least one of the coefficients is nonzero
The test statistic is
$F = \dfrac{(SSE_R - SSE_C)/(k - g)}{SSE_C/[n - (k + 1)]} = \dfrac{(717.40 - 442.13)/(6 - 4)}{442.13/[25 - (6 + 1)]} = 5.60$
The rejection region requires $\alpha = .05$ in the upper tail of the F-distribution with $\nu_1 = (k - g) = (6 - 4)$
= 2 and $\nu_2 = n - (k + 1) = 25 - (6 + 1) = 18$. From Table VIII, Appendix B, $F_{.05} = 3.55$. The
rejection region is F > 3.55.
Since the observed value of the test statistic falls in the rejection region (F = 5.60 > 3.55), $H_0$ is
rejected. There is sufficient evidence to indicate one or more of the second-order terms of our model
contribute information for the prediction of the percentage of problem mortgages at $\alpha = .05$.
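The nested-model F statistic used here compares the SSE of the reduced (first-order) model with the SSE of the complete model. The following sketch recomputes it from the two printouts; the function name `nested_f` and its argument names are hypothetical, chosen only for illustration.

```python
def nested_f(sse_reduced: float, sse_complete: float,
             k: int, g: int, n: int) -> float:
    """Nested F: [(SSE_R - SSE_C)/(k - g)] / [SSE_C/(n - (k + 1))].

    k = number of terms in the complete model (excluding the intercept),
    g = number of terms in the reduced model, n = sample size.
    """
    numerator = (sse_reduced - sse_complete) / (k - g)
    denominator = sse_complete / (n - (k + 1))
    return numerator / denominator


# SSE values taken from the two Analysis of Variance tables above.
f_stat = nested_f(sse_reduced=717.40, sse_complete=442.13, k=6, g=4, n=25)
print(f"F = {f_stat:.2f}")
```

The same function applies to any complete/reduced pair in this chapter, provided the reduced model's terms are a subset of the complete model's terms.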
11.149 a.
The model is $E(y) = \beta_0 + \beta_1x_1 + \beta_2x_2 + \beta_3x_3$, where
y = market share
$x_1$ = 1 if VH, 0 otherwise
$x_2$ = 1 if H, 0 otherwise
$x_3$ = 1 if M, 0 otherwise
We assume that the error terms ($\varepsilon_i$) or y's are normally distributed at each exposure level, with a
common variance. Also, we assume the $\varepsilon_i$'s have a mean of 0 and are independent.
b.
No interaction terms were included because we have only one independent variable, exposure level.
Even though we have 3 xi's in the model, they are dummy variables and correspond to different levels
of the one independent variable.
c.
Predictor     Coef  SE Coef      T      P
Constant   10.2333   0.1084  94.41  0.000
x1          0.5000   0.1533   3.26  0.004
x2          2.0167   0.1533  13.16  0.000
x3          0.6833   0.1533   4.46  0.000

S = 0.2655   R-Sq = 90.4%   R-Sq(adj) = 89.0%

Analysis of Variance
Source           DF       SS      MS      F      P
Regression        3  13.3433  4.4478  63.09  0.000
Residual Error   20   1.4100  0.0705
Total            23  14.7533

Source   DF   Seq SS
x1        1   0.7200
x2        1  11.2225
x3        1   1.4008
To determine if the firm's expected market share differs for different levels of advertising exposure,
we test:
$H_0: \beta_1 = \beta_2 = \beta_3 = 0$
$H_a$: At least one $\beta_i \neq 0$, i = 1, 2, 3
The test statistic is F = 63.09.
The rejection region requires $\alpha = .05$ in the upper tail of the F-distribution with $\nu_1 = k = 3$ and
$\nu_2 = n - (k + 1) = 24 - (3 + 1) = 20$. From Table VIII, Appendix B, $F_{.05} = 3.10$. The rejection region is
F > 3.10.
Since the observed value of the test statistic falls in the rejection region (F = 63.09 > 3.10), $H_0$ is
rejected. There is sufficient evidence to indicate the firm's expected market share differs for different
levels of advertising exposure at $\alpha = .05$.
11.150 a.
SOURCE     DF   SUM OF SQUARES   MEAN SQUARE   F VALUE   PROB>F
MODEL       3       2396.36410     798.78803    99.394   0.0001
ERROR      16        128.58590       8.03662
C TOTAL    19       2524.95000

ROOT MSE    2.83489    R-SQUARE   0.9491
DEP MEAN   23.05000    ADJ R-SQ   0.9395
C.V.       12.29889

PARAMETER ESTIMATES
VARIABLE   DF   PARAMETER ESTIMATE   STANDARD ERROR   T FOR H0: PARAMETER=0   PROB > |T|
INTERCEP    1           -11.768830       3.05032146                  -3.858       0.0014
X1          1            10.293782       1.43788129                   7.159       0.0001
X1SQ        1            -0.417991       0.16132974                  -2.591       0.0197
X2          1            13.244076       1.50325080                   8.810       0.0001
b.
$H_0: \beta_2 = 0$
$H_a: \beta_2 \neq 0$
The test statistic is t = -2.591.
The p-value is p = .0197. Since the p-value is less than $\alpha = .05$ (p = .0197 < .05), $H_0$ is rejected. There is
sufficient evidence to conclude that the second-order term in the model proposed by the operations
manager is necessary at $\alpha = .05$.
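Each t value in the PARAMETER ESTIMATES table is the estimate divided by its standard error, which gives a simple way to confirm both the magnitude and the sign of the test statistic. A minimal sketch follows; the helper name `t_stat` is hypothetical, and the numbers are copied from the SAS printout.

```python
def t_stat(estimate: float, std_error: float, null_value: float = 0.0) -> float:
    """t statistic for H0: beta = null_value."""
    return (estimate - null_value) / std_error


# (estimate, standard error) pairs from the PARAMETER ESTIMATES table.
printout = {
    "INTERCEP": (-11.768830, 3.05032146),
    "X1": (10.293782, 1.43788129),
    "X1SQ": (-0.417991, 0.16132974),
    "X2": (13.244076, 1.50325080),
}

for name, (estimate, std_error) in printout.items():
    print(f"{name:8s} t = {t_stat(estimate, std_error):.3f}")
```

The value for X1SQ is negative, matching the sign of its estimated coefficient.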
c.
The reduced model $E(y) = \beta_0 + \beta_3x_2$ was fit to the data. The SAS output is:
DEP VARIABLE: Y
ANALYSIS OF VARIANCE
SOURCE     DF   SUM OF SQUARES   MEAN SQUARE   F VALUE   PROB>F
MODEL       1       1.25000000    1.25000000     0.009   0.9258
ERROR      18       2523.70000     140.20556
C TOTAL    19       2524.95000

ROOT MSE   11.84084    R-SQUARE    0.0005
DEP MEAN   23.05       ADJ R-SQ   -0.0550
C.V.       51.37025

PARAMETER ESTIMATES
VARIABLE   DF   PARAMETER ESTIMATE   STANDARD ERROR   T FOR H0: PARAMETER=0   PROB > |T|
INTERCEP    1          23.30000000       3.74440323                   6.223       0.0001
X2          1          -0.50000000       5.29538583                  -0.094       0.9258
$H_0: \beta_1 = \beta_2 = 0$
$H_a$: At least one $\beta_i \neq 0$, i = 1, 2
The test statistic is
$F = \dfrac{(SSE_R - SSE_C)/(k - g)}{SSE_C/[n - (k + 1)]} = \dfrac{(2523.7 - 128.586)/(3 - 1)}{128.586/[20 - (3 + 1)]} = \dfrac{1197.557}{8.036625} = 149.01$
The rejection region requires $\alpha = .10$ in the upper tail of the F distribution with numerator df = k - g
= 3 - 1 = 2 and denominator df = n - (k + 1) = 20 - (3 + 1) = 16. From Table VIII, Appendix B,
$F_{.10} = 2.67$. The rejection region is F > 2.67.
Since the observed value of the test statistic falls in the rejection region (F = 149.01
> 2.67), $H_0$ is rejected. There is sufficient evidence to indicate the age of the machine contributes
information to the model at $\alpha = .10$.
After adjusting for machine type, there is evidence that down time is related to age.
11.151 a.
$\hat{\beta}_0 = -105$ has no meaning because $x_3 = 0$ is not in the observable range. $\hat{\beta}_0$ is simply the y-intercept.
$\hat{\beta}_1 = 25$. The estimated difference in mean attendance between weekends and weekdays is 25,
holding temperature and weather constant.
$\hat{\beta}_2 = 100$. The estimated difference in mean attendance between sunny and overcast days is 100,
holding type of day (weekend or weekday) and temperature constant.
$\hat{\beta}_3 = 10$. The estimated change in mean attendance for each additional degree of temperature is 10,
holding type of day (weekend or weekday) and weather (sunny or overcast) constant.
b.
The test statistic is
$F = \dfrac{R^2/k}{(1 - R^2)/[n - (k + 1)]} = \dfrac{.65/3}{(1 - .65)/[30 - (3 + 1)]} = 16.10$
The rejection region requires $\alpha = .05$ in the upper tail of the F distribution with numerator df = k = 3
and denominator df = n - (k + 1) = 30 - (3 + 1) = 26. From Table VIII, Appendix B, $F_{.05} \approx 2.98$.
The rejection region is F > 2.98.
Since the observed value of the test statistic falls in the rejection region (F = 16.10 > 2.98), $H_0$ is
rejected. There is sufficient evidence to indicate the model is useful for predicting daily attendance
at $\alpha = .05$.
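When only $R^2$ is reported, the global F statistic can be computed from $R^2$, k, and n alone. The sketch below reproduces the arithmetic for this exercise; the function name `f_from_r2` is hypothetical.

```python
def f_from_r2(r2: float, n: int, k: int) -> float:
    """Global F statistic from R^2: (R^2/k) / ((1 - R^2)/(n - (k + 1)))."""
    return (r2 / k) / ((1.0 - r2) / (n - (k + 1)))


# R^2 = .65 with k = 3 independent variables and n = 30 observations.
f_stat = f_from_r2(r2=0.65, n=30, k=3)
print(f"F = {f_stat:.2f}")
```

This form is algebraically equivalent to MS(Model)/MSE, since $R^2 = 1 - SSE/SS_{yy}$.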
c.
The test statistic is
$t = \dfrac{\hat{\beta}_1 - 0}{s_{\hat{\beta}_1}} = \dfrac{25 - 0}{10} = 2.5$
The rejection region requires $\alpha = .10$ in the upper tail of the t distribution with df = n - (k + 1) = 30
- (3 + 1) = 26. From Table V, Appendix B, $t_{.10} = 1.315$. The rejection region is t > 1.315.
Since the observed value of the test statistic falls in the rejection region (t = 2.5 > 1.315), $H_0$ is
rejected. There is sufficient evidence to indicate the mean attendance increases on weekends at
$\alpha = .10$.
d.
e.
We are 90% confident that the actual attendance for sunny weekdays with a temperature of 95 is
between 645 and 1245.
11.152 a.
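For a sunny weekday, $x_1 = 0$ and $x_2 = 1$:
$x_3 = 90 \Rightarrow \hat{y} = 800$
$x_3 = 100 \Rightarrow \hat{y} = 850$
For a sunny weekend, $x_1 = 1$ and $x_2 = 1$:
$x_3 = 90 \Rightarrow \hat{y} = 1450$
$x_3 = 100 \Rightarrow \hat{y} = 1650$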
For both sunny weekdays and sunny weekend days, as the predicted high temperature increases, so
does the predicted day's attendance. However, the predicted day's attendance on sunny weekend
days increases at a faster rate than on sunny weekdays. Also, the predicted day's attendance is higher
on sunny weekend days than on sunny weekdays.
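The comparison of rates can be made concrete by computing the change in predicted attendance per degree from the four predictions listed for this part. This is a small illustrative sketch; the helper name `rate_of_change` is hypothetical.

```python
def rate_of_change(x_low: float, y_low: float,
                   x_high: float, y_high: float) -> float:
    """Change in predicted attendance per one-degree increase in temperature."""
    return (y_high - y_low) / (x_high - x_low)


# Predicted attendance at 90 and 100 degrees, taken from the values above.
weekday_rate = rate_of_change(90, 800, 100, 850)
weekend_rate = rate_of_change(90, 1450, 100, 1650)

print(f"sunny weekday: {weekday_rate:.0f} per degree")
print(f"sunny weekend: {weekend_rate:.0f} per degree")
```

The weekend rate is larger than the weekday rate, which is what the interaction between type of day and temperature implies.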
b.
The test statistic is
$t = \dfrac{\hat{\beta}_4 - 0}{s_{\hat{\beta}_4}} = \dfrac{15}{3} = 5$
The rejection region requires $\alpha/2 = .05/2 = .025$ in each tail of the t distribution with df = n - (k + 1)
= 30 - (4 + 1) = 25. From Table V, Appendix B, $t_{.025} = 2.06$. The rejection region is t < -2.06 or t >
2.06.
Since the observed value of the test statistic falls in the rejection region (t = 5 > 2.06), $H_0$ is rejected.
There is sufficient evidence to indicate the interaction term is a useful addition to the model at
$\alpha = .05$.
c.
d.
The width of the interval in Exercise 11.151e is 1245 - 645 = 600, while the width is
850 - 800 = 50 for the model containing the interaction term. The smaller the width of the interval,
the smaller the variance. This implies that the interaction term is quite useful in predicting daily
attendance. It has reduced the unexplained error.
e.
Because an interaction term including x1 is in the model, the coefficient corresponding to x1 must be
interpreted with caution. For all observed values of x3 (temperature), the interaction term value is
greater than 700.
11.153 a.
b.
The model specified in part a seems appropriate. The points for E, F, and G cluster around three
parallel lines.
c.
Predictor      Coef   StDev      T      P
Constant     188875   28588   6.61  0.000
x1            15617    1066  14.66  0.000
x6          -103046   31784  -3.24  0.004
x7          -152487   39157  -3.89  0.001

R-Sq = 91.8%   R-Sq(adj) = 90.7%

Analysis of Variance
Source           DF            SS            MS      F      P
Regression        3   9.86170E+11   3.28723E+11  78.71  0.000
Residual Error   21   87700442851    4176211564
Total            24   1.07387E+12

Source   DF         SeqSS
x1        1   9.15776E+11
x6        1    7061463149
x7        1   63332198206

Unusual Observations
Obs    x1        y      Fit  StDev Fit  Residual  St Resid
 10  62.0   950000  1054078      53911   -104078    -2.92RX
 23  14.0   573200   407512      26670    165688     2.81R
e.
We must first fit a reduced model with just x1, number of apartments. Using MINITAB, the output
is:
The regression equation is
y = 101786 + 15525 x1

Predictor    Coef   StDev      T      P
Constant   101786   23291   4.37  0.000
x1          15525    1345  11.54  0.000

S = 82908   R-Sq = 85.3%   R-Sq(adj) = 84.6%

Analysis of Variance
Source           DF            SS            MS       F      P
Regression        1   9.15776E+11   9.15776E+11  133.23  0.000
Residual Error   23   1.58094E+11    6873656705
Total            24   1.07387E+12

Unusual Observations
Obs    x1        y      Fit  StDev Fit  Residual  St Resid
  4  26.0   676200   505433      24930    170757     2.16R
 10  62.0   950000  1064353      69058   -114353    -2.49RX
 23  14.0   573200   319140      16765    254060     3.13R
The test statistic is
$F = \dfrac{(SSE_R - SSE_C)/(k - g)}{SSE_C/[n - (k + 1)]} = \dfrac{(1.58094 \times 10^{11} - 87{,}700{,}442{,}851)/(3 - 1)}{4{,}176{,}211{,}564} = 8.43$
The rejection region requires $\alpha = .05$ in the upper tail of the F distribution with $\nu_1 = k - g = 3 - 1 = 2$
and $\nu_2 = n - (k + 1) = 25 - (3 + 1) = 21$. From Table VIII, Appendix B, $F_{.05} = 3.47$. The rejection
region is F > 3.47.
Since the observed value of the test statistic falls in the rejection region (F = 8.43 > 3.47), $H_0$ is
rejected. There is evidence to indicate that the relationship between sale price and number of units
differs depending on the physical condition of the apartments at $\alpha = .05$.
f.
         x1      x2      x3      x4      x5      x6
x2   -0.014
x3    0.800  -0.188
x4    0.224  -0.363   0.166
x5    0.878   0.027   0.673   0.089
x6    0.175  -0.447   0.271   0.112   0.020
x7   -0.128   0.392  -0.118   0.050  -0.238  -0.564
When highly correlated independent variables are present in a regression model, the results can be
confusing. The researchers may want to include only one variable from each highly correlated pair. This
may be the case for the pairs x1 and x3, x1 and x5, and x3 and x5.
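One way to screen for such pairs is to scan the correlation matrix for entries that are large in absolute value. The sketch below does this for the correlations in the printout. Two things are assumptions and not part of the original solution: the assignment of each value to a variable pair follows a lower-triangular reading of the printout, and the cutoff of 0.6 is an arbitrary illustrative choice.

```python
# Pairwise correlations keyed by variable pair (assumed lower-triangular
# reading of the printout: each row lists correlations with earlier x's).
correlations = {
    ("x1", "x2"): -0.014, ("x1", "x3"): 0.800, ("x1", "x4"): 0.224,
    ("x1", "x5"): 0.878, ("x1", "x6"): 0.175, ("x1", "x7"): -0.128,
    ("x2", "x3"): -0.188, ("x2", "x4"): -0.363, ("x2", "x5"): 0.027,
    ("x2", "x6"): -0.447, ("x2", "x7"): 0.392,
    ("x3", "x4"): 0.166, ("x3", "x5"): 0.673, ("x3", "x6"): 0.271,
    ("x3", "x7"): -0.118,
    ("x4", "x5"): 0.089, ("x4", "x6"): 0.112, ("x4", "x7"): 0.050,
    ("x5", "x6"): 0.020, ("x5", "x7"): -0.238,
    ("x6", "x7"): -0.564,
}


def flag_pairs(corr, cutoff=0.6):
    """Return the variable pairs whose |r| meets or exceeds the cutoff."""
    return sorted(pair for pair, r in corr.items() if abs(r) >= cutoff)


for pair in flag_pairs(correlations):
    print(pair, correlations[pair])
```

With this cutoff, the flagged pairs are the same three pairs named in the solution.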
g.
[Residual plots: Residuals Versus x1, Residuals Versus x2, Residuals Versus x3, Residuals Versus x4, and Residuals Versus x5 (response is y)]
11.154 Let $x_1$ = Length of operation and let $x_2$ = 1 if C, 0 otherwise.
To allow for the relationship between Drop in Light Output and Length of Operation to be different for the
two different Bulb Surfaces, we will fit the model: $E(y) = \beta_0 + \beta_1x_1 + \beta_2x_2 + \beta_3x_1x_2$.
Using MINITAB, the results of fitting the model are:
Regression Analysis: DROP versus x1, x2, x1x2
The regression equation is
DROP = 1.46 + 0.00473 x1 + 5.39 x2 + 0.00991 x1x2

Predictor       Coef   SE Coef     T      P
Constant       1.464     2.151  0.68  0.512
x1          0.004732  0.001492  3.17  0.010
x2             5.393     3.042  1.77  0.107
x1x2        0.009911  0.002109  4.70  0.001

S = 3.15719   R-Sq = 95.5%   R-Sq(adj) = 94.1%

Analysis of Variance
Source           DF       SS      MS      F      P
Regression        3  2106.68  702.23  70.45  0.000
Residual Error   10    99.68    9.97
Total            13  2206.36

Source   DF   Seq SS
x1        1   840.88
x2        1  1045.79
x1x2      1   220.02
From the printout, the test statistic is F = 70.45 and the p-value is p = 0.000. Since the p-value is so small,
$H_0$ is rejected. There is sufficient evidence to indicate the model is adequate for predicting Drop in Light
Output at any reasonable value of $\alpha$.
For this model, $R^2$ = 95.5%. 95.5% of the total variability of the Drop in Light Outputs is explained by the
model containing Length of Operation, Bulb Surface, and the Interaction of Bulb Surface and Length of
Operation.
To determine if the interaction between Bulb Surface and Length of Operation is significant, we test:
$H_0: \beta_3 = 0$
$H_a: \beta_3 \neq 0$
From the printout, the test statistic is t = 4.70 and the p-value is p = 0.001. Since the p-value is so small, $H_0$
is rejected. There is sufficient evidence to indicate Bulb Surface and Length of Operation interact at any
reasonable value of $\alpha$.
Using MINITAB, the residual plots are:
[Residual Plots for DROP: normal probability plot of the residuals, residuals versus fitted values, histogram of the residuals, and residuals versus observation order]
From the histogram of the residuals, the residuals look somewhat mound-shaped. In addition, the normal
probability plot looks to be a fairly straight line. Thus, the assumption of normal errors appears to be valid.
From the plot of the residuals versus the fitted values, there is no funnel shape. It does not appear that the
error terms increase or decrease as the fitted values increase. Thus, the assumption of
constant variance appears to be valid.
It appears that the model is a pretty good model for the prediction of the Drop in Light Output.
To determine whether the complete model contributes information for the prediction of y, we test:
$H_0: \beta_1 = \beta_2 = \beta_3 = \beta_4 = \beta_5 = 0$
$H_a$: At least one of the $\beta_i$'s is not 0, i = 1, 2, 3, 4, 5
b.
The test statistic is F = MSR/MSE, where
$MSR = \dfrac{SS(\text{Model})}{k} = \dfrac{4{,}911.56}{5} = 982.31$
$MSE = \dfrac{SSE}{n - (k + 1)} = \dfrac{1{,}830.44}{40 - (5 + 1)} = 53.84$
$F = \dfrac{MSR}{MSE} = \dfrac{982.31}{53.84} = 18.24$
The rejection region requires $\alpha = .05$ in the upper tail of the F distribution with numerator df = k = 5
and denominator df = n - (k + 1) = 40 - (5 + 1) = 34. From Table VIII, Appendix B, $F_{.05} \approx 2.53$.
The rejection region is F > 2.53.
Since the observed value of the test statistic falls in the rejection region (F = 18.24 > 2.53), $H_0$ is
rejected. There is sufficient evidence to indicate that the complete model contributes information for
the prediction of y at $\alpha = .05$.
c.
To determine whether a second-order model contributes more information than a first-order model
for the prediction of y, we test:
$H_0: \beta_3 = \beta_4 = \beta_5 = 0$
$H_a$: At least one $\beta_i \neq 0$, i = 3, 4, 5
d.
The test statistic is
$F = \dfrac{(SSE_R - SSE_C)/(k - g)}{SSE_C/[n - (k + 1)]} = \dfrac{(SSE_R - SSE_C)/(k - g)}{1830.44/[40 - (5 + 1)]} = \dfrac{455.5733}{53.8365} = 8.46$
The rejection region requires $\alpha = .05$ in the upper tail of the F distribution with numerator df = k - g
= 3 and denominator df = n - (k + 1) = 40 - (5 + 1) = 34. From Table VIII, Appendix B, $F_{.05} \approx 2.92$.
The rejection region is F > 2.92.
Since the observed value of the test statistic falls in the rejection region (F = 8.46 > 2.92), $H_0$ is
rejected. There is sufficient evidence to indicate the second-order model contributes more
information than a first-order model for the prediction of y at $\alpha = .05$.
e.
11.156 a.
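$E(y) = \beta_0 + \beta_1x_1 + \beta_2x_1^2 + \beta_3x_2 + \beta_4x_1x_2 + \beta_5x_1^2x_2$
where $x_1$ = age and $x_2$ = 1 if current, 0 otherwise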
b.
To determine if the quadratic terms are important, we test:
$H_0: \beta_2 = \beta_5 = 0$
c.
To determine if the interaction terms are important, we test:
$H_0: \beta_4 = \beta_5 = 0$
d.
From MINITAB, the outputs from fitting the three models are:
Regression Analysis: Value versus Age, AgeSq, Status, AgeSt, AgeSqSt
The regression equation is
Value = 83 - 5.7 Age + 0.236 AgeSq - 62 Status + 5.4 AgeSt - 0.234 AgeSqSt

Predictor      Coef  SE Coef      T      P
Constant       83.4    316.3   0.26  0.793
Age           -5.74    18.68  -0.31  0.760
AgeSq        0.2361   0.2549   0.93  0.359
Status        -62.1    354.8  -0.18  0.862
AgeSt          5.36    24.81   0.22  0.830
AgeSqSt     -0.2337   0.4080  -0.57  0.570

S = 286.8   R-Sq = 24.7%   R-Sq(adj) = 16.1%

Analysis of Variance
Source           DF       SS      MS     F      P
Regression        5  1186549  237310  2.89  0.024
Residual Error   44  3618994   82250
Total            49  4805542

Source    DF  Seq SS
Age        1  865746
AgeSq      1  138871
Status     1   77594
AgeSt      1   77342
AgeSqSt    1   26996

Predictor      Coef  SE Coef      T      P
Constant     -176.1    145.0  -1.21  0.231
Age          11.166    3.902   2.86  0.006
Status        196.5    178.9   1.10  0.278
AgeSt       -11.432    6.763  -1.69  0.098

S = 283.2   R-Sq = 23.2%   R-Sq(adj) = 18.2%

Analysis of Variance
Source           DF       SS      MS     F      P
Regression        3  1116017  372006  4.64  0.006
Residual Error   46  3689526   80207
Total            49  4805543

Source    DF  Seq SS
Age        1  865746
Status     1   21097
AgeSt      1  229174

Predictor      Coef  SE Coef      T      P
Constant      165.8    182.7   0.91  0.369
Age           -8.81    10.89  -0.81  0.423
AgeSq        0.2535   0.1632   1.55  0.127
Status       -105.6    107.9  -0.98  0.333

S = 284.5   R-Sq = 22.5%   R-Sq(adj) = 17.5%

Analysis of Variance
Source           DF       SS      MS     F      P
Regression        3  1082210  360737  4.46  0.008
Residual Error   46  3723332   80942
Total            49  4805542

Source    DF  Seq SS
Age        1  865746
AgeSq      1  138871
Status     1   77594
Test for part b:
The test statistic is:
$F = \dfrac{(SSE_R - SSE_C)/(k - g)}{SSE_C/[n - (k + 1)]} = \dfrac{(3{,}689{,}526 - 3{,}618{,}994)/(5 - 3)}{82{,}250} = .429$
Since no $\alpha$ is given, we will use $\alpha = .05$. The rejection region requires $\alpha = .05$ in the upper tail of the
F distribution with $\nu_1 = 2$ numerator degrees of freedom and $\nu_2 = 44$ denominator degrees of
freedom. From Table VIII, Appendix B, $F_{.05} \approx 3.23$. The rejection region is F > 3.23.
Since the observed value of the test statistic does not fall in the rejection region (F = .429 $\ngtr$ 3.23),
$H_0$ is not rejected. There is insufficient evidence to indicate the quadratic terms are important for
predicting market value at $\alpha = .05$.
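Test for part c:
The test statistic is:
$F = \dfrac{(SSE_R - SSE_C)/(k - g)}{SSE_C/[n - (k + 1)]} = \dfrac{(3{,}723{,}332 - 3{,}618{,}994)/(5 - 3)}{82{,}250} = .634$
The rejection region is the same as in the previous test. Reject $H_0$ if F > 3.23.
Since the observed value of the test statistic does not fall in the rejection region
(F = .634 $\ngtr$ 3.23), $H_0$ is not rejected. There is insufficient evidence to indicate the interaction terms
are important for predicting market value at $\alpha = .05$.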
Predictor       Coef   SE Coef      T      P
Constant     -1.5705    0.4937  -3.18  0.003
x1          0.025732  0.004024   6.40  0.000
x2          0.033615  0.004928   6.82  0.000

S = 0.4023   R-Sq = 68.1%   R-Sq(adj) = 66.4%

Analysis of Variance
Source           DF       SS      MS      F      P
Regression        2  12.7859  6.3930  39.51  0.000
Residual Error   37   5.9876  0.1618
Total            39  18.7735

Source   DF  Seq SS
x1        1  5.2549
x2        1  7.5311

Unusual Observations
Obs   x1       y     Fit  SE Fit  Residual  St Resid
  4  100  1.5400  2.6498  0.1699   -1.1098    -3.04R
 32   39  1.2200  2.1558  0.1483   -0.9358    -2.50R
Thus, both terms in the model are significant. The R-squared value is R2 = .681.
This indicates that 68.1% of the sample variance of the GPAs is explained by the model.
Now, we need to check the residuals. From MINITAB, the plots are:
[Residual plots for y: normal probability plot of the residuals, residuals versus fitted values, histogram of the residuals, and residuals versus observation order]
[Residuals Versus x1 (response is y)]
[Residuals Versus x2 (response is y)]
From the normal probability plot, it appears that the assumption of normality is valid. The points are very
close to a straight line except for the first 2 points. The histogram of the residuals implies that the residuals
are slightly skewed to the left. I would still consider the assumption to be valid. The plot of the residuals
versus y-hat indicates a random spread of the residuals between the two bands. This indicates that the
assumption of equal variances is probably valid. The plot of the residuals versus x1 indicates that the
relationship between GPA and Verbal score may not be linear, but quadratic because the points form a
somewhat upside down U shape. The plot of the residuals versus x2 indicates that the relationship between
GPA and Mathematics score may or may not be quadratic.
Since the plots indicate a possible 2nd order model and the $R^2$ value is not very large, we will fit a complete
2nd order model:
$E(y) = \beta_0 + \beta_1x_1 + \beta_2x_2 + \beta_3x_1^2 + \beta_4x_2^2 + \beta_5x_1x_2$
Predictor         Coef    SE Coef      T      P
Constant        -9.917      1.354  -7.32  0.000
x1             0.16681    0.02124   7.85  0.000
x2             0.13760    0.02673   5.15  0.000
x1sq        -0.0011082  0.0001173  -9.45  0.000
x2sq        -0.0008433  0.0001594  -5.29  0.000
x1x2         0.0002411  0.0001440   1.67  0.103

S = 0.187142   R-Sq = 93.7%   R-Sq(adj) = 92.7%

Analysis of Variance
Source           DF       SS      MS       F      P
Regression        5  17.5827  3.5165  100.41  0.000
Residual Error   34   1.1908  0.0350
Total            39  18.7735

Source   DF  Seq SS
x1        1  5.2549
x2        1  7.5311
x1sq      1  3.6434
x2sq      1  1.0552
x1x2      1  0.0982

Unusual Observations
Obs   x1       y     Fit  SE Fit  Residual  St Resid
  2   68  2.8900  3.2820  0.1002   -0.3920    -2.48R
  4  100  1.5400  1.5806  0.1404   -0.0406    -0.33 X
 34   70  3.8200  3.3940  0.0753    0.4260     2.49R
To determine if the interaction between Verbal score and Mathematics score is useful in the prediction of y
(GPA), we test:
$H_0: \beta_5 = 0$
$H_a: \beta_5 \neq 0$
The test statistic is t = 1.67 and the p-value is p = 0.103. Since the p-value is not small, $H_0$ is not rejected
for any value of $\alpha$ < .10. There is insufficient evidence to indicate the interaction between Verbal score
and Mathematics score is useful in predicting GPA.
Now, we will fit a model without the interaction term, but including the squared terms:
$E(y) = \beta_0 + \beta_1x_1 + \beta_2x_2 + \beta_3x_1^2 + \beta_4x_2^2$
Predictor         Coef    SE Coef       T      P
Constant       -11.458      1.019  -11.24  0.000
x1             0.18887    0.01709   11.05  0.000
x2             0.15874    0.02417    6.57  0.000
x1sq        -0.0011412  0.0001186   -9.62  0.000
x2sq        -0.0008705  0.0001626   -5.35  0.000

S = 0.191905   R-Sq = 93.1%   R-Sq(adj) = 92.3%

Analysis of Variance
Source           DF       SS      MS       F      P
Regression        4  17.4845  4.3711  118.69  0.000
Residual Error   35   1.2890  0.0368
Total            39  18.7735

Source   DF  Seq SS
x1        1  5.2549
x2        1  7.5311
x1sq      1  3.6434
x2sq      1  1.0552

Unusual Observations
Obs   x1       y     Fit  SE Fit  Residual  St Resid
  2   68  2.8900  3.2921  0.1025   -0.4021    -2.48R
  4  100  1.5400  1.7059  0.1219   -0.1659    -1.12 X
 32   39  1.2200  1.3190  0.1240   -0.0990    -0.68 X
 34   70  3.8200  3.3954  0.0772    0.4246     2.42R
To determine if the relationship between Verbal score and GPA is quadratic, controlling for Mathematics
score, we test:
$H_0: \beta_3 = 0$
$H_a: \beta_3 \neq 0$
The test statistic is t = -9.62 and the p-value is p = 0.000. Since the p-value is so small, $H_0$ is rejected for
any reasonable value of $\alpha$. There is sufficient evidence to indicate the relationship between Verbal score
and GPA is quadratic, controlling for Mathematics score.
To determine if the relationship between Mathematics score and GPA is quadratic, controlling for Verbal
score, we test:
$H_0: \beta_4 = 0$
$H_a: \beta_4 \neq 0$
The test statistic is t = -5.35 and the p-value is p = 0.000. Since the p-value is so small, $H_0$ is rejected for
any reasonable value of $\alpha$. There is sufficient evidence to indicate the relationship between Mathematics
score and GPA is quadratic, controlling for Verbal score.
Thus, both quadratic terms in the model are significant. The R-squared value is $R^2$ = .931. This indicates
that 93.1% of the sample variance of the GPAs is explained by the model.
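The R-Sq and R-Sq(adj) values in the printout for the model without the interaction term can be recomputed from its Analysis of Variance table. The sketch below is illustrative only; the function names are hypothetical, and n = 40 is inferred from Total DF = 39.

```python
def r_squared(ss_regression: float, ss_total: float) -> float:
    """R^2 = SS(Regression) / SS(Total)."""
    return ss_regression / ss_total


def adjusted_r_squared(sse: float, ss_total: float, n: int, k: int) -> float:
    """Adjusted R^2 = 1 - [SSE/(n - (k + 1))] / [SS(Total)/(n - 1)]."""
    return 1.0 - (sse / (n - (k + 1))) / (ss_total / (n - 1))


# Sums of squares from the ANOVA table for the model with x1, x2, x1sq, x2sq.
r2 = r_squared(ss_regression=17.4845, ss_total=18.7735)
r2_adj = adjusted_r_squared(sse=1.2890, ss_total=18.7735, n=40, k=4)

print(f"R-Sq      = {100 * r2:.1f}%")
print(f"R-Sq(adj) = {100 * r2_adj:.1f}%")
```

Both values match the R-Sq = 93.1% and R-Sq(adj) = 92.3% lines of the MINITAB output.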
Now, we need to check the residuals. From MINITAB, the plots are:
[Residual plots for y: normal probability plot of the residuals, residuals versus fitted values, histogram of the residuals, and residuals versus observation order]
[Residuals Versus x1 (response is y)]
[Residuals Versus x2 (response is y)]
From the normal probability plot, it appears that the assumption of normality is valid. The points are very
close to a straight line. The histogram of the residuals also implies that the residuals are approximately
normal. The plot of the residuals versus y-hat indicates a random spread of the residuals between the two
bands. This indicates that the assumption of equal variances is probably valid. The plot of the residuals
versus x1 indicates a random spread of the residuals between the two bands. This indicates that the order of
x1 (2nd) is appropriate. The plot of the residuals versus x2 indicates a random spread of the residuals
between the two bands. This indicates that the order of x2 (2nd) is appropriate.
The model appears to be pretty good. All terms in the model are significant, the residual analysis indicates
the assumptions are met and the R-squared value is fairly close to 1. The fitted model is
2
2