CHAPTER 8
THE COMPARISON OF TWO POPULATIONS
8-1.  n = 25   D̄ = 19.08   s_D = 30.67
H0: μ_D = 0    H1: μ_D ≠ 0
t(24) = (D̄ − D0)/(s_D/√n) = (19.08 − 0)/(30.67/√25) = 3.11
Reject H0 at α = 0.01.
Template (Paired Difference Test): Size = 25; Assumption: Populations Normal.
8-2.
n = 40
D = 5 s D = 2.3
H0: D = 0
t(39) =
At an of
5%
Reject
H1: D 0
50
2.3 / 40
= 13.75
8-3.  n = 9   D̄ = 3.67   s_D = 2.44949
H0: μ_D = 0    H1: μ_D ≠ 0
t(8) = (3.67 − 0)/(2.44949/√9) = 4.4907
Template (Paired Difference Test; difference defined as movie viewers − commercial viewers; the paired raw data are not reproduced here):
Size = 9; Assumption: Populations Normal; Stdev. of Difference = 2.44949; Test Statistic t = 4.4907; df = 8.
Null Hypothesis        p-value   At α = 5%
H0: μ1 − μ2 = 0        0.0020    Reject
H0: μ1 − μ2 ≥ 0        0.9990
H0: μ1 − μ2 ≤ 0        0.0010    Reject
At α = 0.05, we reject H0. There are more viewers for movies than commercials.
8-4.  n = 60   D̄ = 0.2   s_D = 1
H0: μ_D ≤ 0    H1: μ_D > 0
t(59) = (0.2 − 0)/(1/√60) = 1.549. At α = 0.05, we cannot reject H0.
Template (Paired Difference Test): Size = 60; Assumption: Populations Normal; Average Difference = 0.2; Stdev. of Difference = 1.
8-5.  n = 15   D̄ = 3.2   s_D = 8.436   (D = After − Before)
H0: μ_D ≤ 0    H1: μ_D > 0
t(14) = (3.2 − 0)/(8.436/√15) = 1.469
At an α of 5%, we cannot reject H0.
8-6.  n = 12   D̄ = 37.08   s_D = 43.99
H0: μ_D = 0    H1: μ_D ≠ 0
t(11) = (37.08 − 0)/(43.99/√12) = 2.92
Template (Paired Difference Test; the paired hotel-price data are not reproduced here):
Size = 12; Assumption: Populations Normal; Stdev. of Difference = 43.9927; Test Statistic t = 2.9200; df = 11.
Null Hypothesis        p-value   At α = 5%
H0: μ1 − μ2 = 0        0.0139    Reject
H0: μ1 − μ2 ≥ 0        0.9930
H0: μ1 − μ2 ≤ 0        0.0070    Reject
Reject H0. There is strong evidence that hotels in Spain are cheaper than those in France, based on this small sample. p-value = 0.0139.
8-7.  Power at μ_D = 0.1:
n = 60   σ_D = 1.0   α = 0.01
H0: μ_D ≤ 0    H1: μ_D > 0
C = 0 + 2.326(σ/√n) = 0.30029
We need:
P(D̄ > C | μ_D = 0.1) = P(D̄ > 0.30029 | μ_D = 0.1)
= P(Z > (0.30029 − 0.1)/(1/√60)) = P(Z > 1.552) = 0.0604
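The power calculation above can be sketched in Python under the stated assumptions (n = 60, σ_D = 1, α = 0.01, true μ_D = 0.1):

```python
# Sketch of the Problem 8-7 power computation: find the rejection
# cutoff C for H0: mu_D <= 0 at alpha = 0.01, then the probability
# of exceeding C when the true mean difference is 0.1.
from math import sqrt

from scipy import stats

n, sigma, alpha, mu_true = 60, 1.0, 0.01, 0.1
se = sigma / sqrt(n)
C = stats.norm.ppf(1 - alpha) * se          # cutoff, about 0.3003
power = stats.norm.sf((C - mu_true) / se)   # P(D-bar > C | mu_D = 0.1)
```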
8-8.  n = 20   D̄ = 1.25   s_D = 42.896
H0: μ_D = 0    H1: μ_D ≠ 0
t(19) = (1.25 − 0)/(42.896/√20) = 0.13
Do not reject H0.
Template: Size = 20; Assumption: Populations Normal.
8-9.  Sample 1: n1 = 100, x̄1 = 76.5, s1 = 38;   Sample 2: n2 = 100, x̄2 = 88.1, s2 = 40
H0: μ2 − μ1 ≤ 0    H1: μ2 − μ1 > 0
Assumptions: Populations Normal.
z = (88.1 − 76.5)/√(38²/100 + 40²/100) = 2.10
At an α of 5%, reject H0: there is evidence that μ2 > μ1.
8-10.  n1 = n2 = 30
Nikon (1): x̄1 = 8.5, s1 = 2.1;   (2): x̄2 = 7.8
H0: μ1 − μ2 = 0    H1: μ1 − μ2 ≠ 0
z = (8.5 − 7.8)/√(s1²/30 + s2²/30) = 1.386
Do not reject H0. There is no evidence of a difference in the average ratings of the two cameras.
8-11.  (1): n1 = 32, x̄1 = 2.5M, s1 = 0.41M;   Marin (2): n2 = 35, x̄2 = 4.32M, s2 = 0.87M
H0: μ1 − μ2 = 0    H1: μ1 − μ2 ≠ 0
Assumptions: Populations Normal.
Template:
          Sample 1  Sample 2
n         32        35
x̄        2.5       4.32
s         0.41      0.87
Null Hypothesis        p-value   At α = 5%
H0: μ1 − μ2 = 0        0.0000    Reject
H0: μ1 − μ2 ≥ 0        0.0000    Reject
H0: μ1 − μ2 ≤ 0        1.0000
Reject H0. There is evidence that the average Bel Air price is lower.
8-12.  H0: μ_J − μ_SP = 0    H1: μ_J − μ_SP ≠ 0
Assumptions: Populations Normal.
Template:
          Sample 1  Sample 2
n         40        40
x̄        15        6.2
s         3         3.5
p-values: H0: μ1 − μ2 ≥ 0: 1.0000; H0: μ1 − μ2 ≤ 0: 0.0000 (Reject).
Reject the null hypothesis: the global equities outperform the U.S. market.
8-13.
Music:
n1 = 128 x1 = 23.5
s1 = 12.2
Verbal: n 2 = 212
x 2 = 18.0
H0: 1 2 = 0
H1: 1 2 0
z=
23.5 18.0
(12.2 / 128) (10.5 2 / 212)
2
s 2 = 10.5
= 4.24
8-6
Evidence
Sample1 Sample2
n
Size
128
212
x-bar
Mean 23.5
18
Popn. Std. Devn.
Popn. 1
12.2
Popn. 2
10.5
Hypothesis Testing
Test Statistic 4.2397 z
At an of
p-value
5%
Reject
0.0000
Null Hypothesis
H0: 1 2 =0
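The z statistic above needs only the summary statistics; a one-line check in Python (standard library only):

```python
# The Problem 8-13 two-sample z statistic from summary statistics.
from math import sqrt

z = (23.5 - 18.0) / sqrt(12.2**2 / 128 + 10.5**2 / 212)  # ≈ 4.24
```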
8-14.  n1 = 13   n2 = 13   x̄1 = 20.385   x̄2 = 10.385   s1 = 7.622   s2 = 4.292   α = .05
H0: μ1 = μ2    H1: μ1 ≠ μ2
s_p² = [(13 − 1)(7.622)² + (13 − 1)(4.292)²]/(13 + 13 − 2) = 38.2581
t(24) = (20.385 − 10.385)/√[38.2581(1/13 + 1/13)] = 4.1219
df = 24. Use a critical value of 2.064 for a two-tailed test. Reject H0. The two methods do differ.
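The pooled-variance computation above can be sketched as follows; `pooled_t` is our own helper, not a library function:

```python
# Pooled-variance two-sample t test of Problem 8-14 from summary
# statistics. `pooled_t` is defined here for illustration only.
from math import sqrt

def pooled_t(n1, x1, s1, n2, x2, s2):
    """Return (pooled variance, t statistic)."""
    sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
    t = (x1 - x2) / sqrt(sp2 * (1 / n1 + 1 / n2))
    return sp2, t

sp2, t_stat = pooled_t(13, 20.385, 7.622, 13, 10.385, 4.292)
# sp2 ≈ 38.258, t ≈ 4.122 — compare with the 2.064 critical value
```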
8-15.  Liz (1): n1 = 32, x̄1 = 4,238, s1 = 1,002.5;   Calvin (2): n2 = 37, x̄2 = 3,888.72, s2 = 876.05
a. One-tailed test: H0: μ1 − μ2 ≤ 0    H1: μ1 − μ2 > 0
b. z = (4,238 − 3,888.72 − 0)/√(1,002.5²/32 + 876.05²/37) = 1.53
c. At α = 0.05, the critical point is 1.645. Do not reject H0: there is no evidence that Liz Claiborne models get more money, on average.
d. p-value = .5 − .437 = .063. (It is the probability of committing a Type I error if we choose to reject and H0 happens to be true.)
e. With the smaller samples n1 = 10 and n2 = 11:
s_p² = [(10 − 1)(1,002.5)² + (11 − 1)(876.05)²]/(10 + 11 − 2) = 879,983.804
t(19) = (4,238 − 3,888.72)/√[879,983.804(1/10 + 1/11)] = 0.8522
df = 19.
8-16.  Template:
          Sample 1  Sample 2
n         28        28
x̄        0.19      0.72
s         5.72      5.1
Assumptions: Populations Normal.
p-values: H0: μ1 − μ2 ≥ 0: 0.3579; H0: μ1 − μ2 ≤ 0: 0.6421.
Do not reject the null hypothesis. Pre-earnings announcements have no impact on stock investment returns.
8-17.  Non-research (1): n1 = 255, s1 = 0.64;   Research (2): n2 = 300, s2 = 0.85
x̄2 − x̄1 = 2.54
z = (x̄2 − x̄1)/√(s1²/255 + s2²/300) = 2.54/√(0.64²/255 + 0.85²/300) = 40.1
Reject H0: the difference is highly significant.
8-18.  Audio (1): n1 = 25, x̄1 = 87, s1 = 12;   Video (2): n2 = 20, x̄2 = 64, s2 = 23
H0: μ1 − μ2 = 0    H1: μ1 − μ2 ≠ 0
t(43) = (x̄1 − x̄2 − 0)/√{[(n1 − 1)s1² + (n2 − 1)s2²]/(n1 + n2 − 2) × (1/n1 + 1/n2)} = 4.326
Reject H0 (p-value = 0.0001 at α = 5%). Audio is probably better (higher average purchase intent). Waldenbooks should concentrate on audio.
8-19.
n1 = 13, x̄1 = 55, s1 = 8;   n2 = 15, x̄2 = 48, s2 = 6   (figures in $1,000s)
H0: μ1 − μ2 ≤ 4
t(26) = [(55 − 48) − 4]/√{[(12)(8)² + (14)(6)²]/26 × (1/13 + 1/15)} = 1.132
The critical value at α = .05 for t(26) in a right-hand-tailed test is 1.706. Since 1.132 < 1.706, there is no evidence at α = .05 that the program executives get an average of $4,000 per year more than other executives of comparable levels.
8-20.  H0: μ_P − μ_L = 0    H1: μ_P − μ_L ≠ 0
Assumptions: Populations Normal.
Template:
          Sample 1  Sample 2
n         20        20
x̄        1         6
s         1.1       2.5
Reject the null hypothesis: the average cost of beer is cheaper in Prague. Londoners save between $3.74 and $6.26.
8-21.  Template:
          US    China
n         15    18
x̄        3.8   6.1
s         2.2   5.3
Assumptions: Populations Normal.
p-values: H0: μ1 − μ2 = 0: 0.1073; H0: μ1 − μ2 ≥ 0: 0.0536; H0: μ1 − μ2 ≤ 0: 0.9464.
Do not reject the null hypothesis (p-value = 0.1073): investment returns are the same in China and the US.
99% C.I.: −2.3 ± 3.85252 = [−6.1525, 1.55252]
8-22.  Old (1): n1 = 19, x̄1 = 8.26, s1 = 1.43;   New (2): n2 = 23, x̄2 = 9.11, s2 = 1.56
H0: μ2 − μ1 ≤ 0    H1: μ2 − μ1 > 0
t(40) = (9.11 − 8.26 − 0)/√{[18(1.43)² + 22(1.56)²]/40 × (1/19 + 1/23)} = 1.82
There is some evidence to reject H0 (p-value = 0.038 for the t distribution with df = 40, in a one-tailed test).
8-23.  Take the proposed route as population 1 and the alternate route as population 2. Assume equal variances for the two populations.
H0: μ1 − μ2 ≤ 0    H1: μ1 − μ2 > 0
p-value from the template = 0.8674. We cannot reject H0.
8-24.  Template:
          Sample 1  Sample 2
n         20        20
x̄        3.56      4.84
s         2.8       3.2
Assumptions: Populations Normal.
p-values: H0: μ1 − μ2 = 0: 0.1862; H0: μ1 − μ2 ≥ 0: 0.0931; H0: μ1 − μ2 ≤ 0: 0.9069.
At an α of 5%, do not reject the null hypothesis. Neither investment outperforms the other.
8-25.  Yes (1): n1 = 25, x̄1 = 12, s1 = 2.5;   No (2): n2 = 25, x̄2 = 13.5, s2 = 1
Assume independent random sampling from normal populations with equal population variances.
H0: μ2 − μ1 ≤ 0    H1: μ2 − μ1 > 0
t(48) = (13.5 − 12)/√{[24(2.5)² + 24(1)²]/48 × (1/25 + 1/25)} = 2.785
At an α of 5%: one-tailed p-value = 0.0038 (two-tailed 0.0076). Reject H0.
8-26.  H0: μ1 − μ2 = 0    H1: μ1 − μ2 ≠ 0
z = (.1331 − .105 − 0)/√{[20(.09)² + 27(.122)²]/47 × (1/21 + 1/28)} = 0.8887
Do not reject H0. There is no evidence of a difference in average stock returns for the two periods.
8-27.  Template:
          Sample 1  Sample 2
n         8         10
x̄        3         2.3
s         2         2.1
Assumptions: Populations Normal.
p-values: H0: μ1 − μ2 = 0: 0.4834; H0: μ1 − μ2 ≥ 0: 0.7583; H0: μ1 − μ2 ≤ 0: 0.2417.
At an α of 5%, do not reject the null hypothesis (p-value = 0.2417). The new advertising firm has not resulted in significantly higher sales.
8-28.  95% C.I. for μ2 − μ1, using the data of Problem 8-25 (x̄2 = 13.5, x̄1 = 12, s1 = 2.5, s2 = 1):
(13.5 − 12) ± 2.011√{[24(2.5)² + 24(1)²]/48 × (1/25 + 1/25)} = 1.5 ± 1.08 = [0.42, 2.58]
8-29.  Before (1): x1 = 85, n1 = 100;   After (2): x2 = 68, n2 = 100
H0: p1 − p2 ≤ 0    H1: p1 − p2 > 0
z = (p̂1 − p̂2)/√[p̄(1 − p̄)(1/n1 + 1/n2)] = (.85 − .68)/√[(.765)(.235)(1/100 + 1/100)] = 2.835
Reject H0. The on-time departure percentage has probably declined after NW's merger with Republic. p-value = 0.0023.
Template (Comparing Two Population Proportions):
            Sample 1  Sample 2
Size        100       100
#Successes  85        68
Proportion  0.8500    0.6800
Hypothesized Difference: Zero;  Pooled p̂ = 0.7650;  Test Statistic z = 2.8351
Null Hypothesis        p-value   At α = 5%
H0: p1 − p2 = 0        0.0046    Reject
H0: p1 − p2 ≥ 0        0.9977
H0: p1 − p2 ≤ 0        0.0023    Reject
8-30.  n1 = 1,000, x1 = 850;   n2 = 2,500, x2 = 1,950
H0: p1 − p2 ≤ 0    H1: p1 − p2 > 0
z = (850/1,000 − 1,950/2,500)/√[p̄(1 − p̄)(1/1,000 + 1/2,500)], where p̄ = (850 + 1,950)/3,500
  = 4.677
Reject H0. There is strong evidence that the percentage of word-of-mouth recommendations in small towns is greater than it is in large metropolitan areas.
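The pooled two-proportion z statistic used in Problems 8-29 and 8-30 can be sketched as follows; `two_prop_z` is a helper we define here, not a library routine:

```python
# Pooled two-proportion z test, as in Problems 8-29 and 8-30.
from math import sqrt

def two_prop_z(x1, n1, x2, n2):
    """z statistic for H0: p1 - p2 = 0 with a pooled proportion."""
    p1, p2 = x1 / n1, x2 / n2
    p_pool = (x1 + x2) / (n1 + n2)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se

z_829 = two_prop_z(85, 100, 68, 100)        # ≈ 2.835
z_830 = two_prop_z(850, 1000, 1950, 2500)   # ≈ 4.677
```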
8-31.  n1 = 31, x1 = 11;   n2 = 50, x2 = 19
H0: p1 − p2 = 0    H1: p1 − p2 ≠ 0
z = (p̂1 − p̂2)/√[p̄(1 − p̄)(1/n1 + 1/n2)] = −0.228
Do not reject H0. There is no evidence that one corporate raider is more successful than the other.
8-32.  n1 = 2,060, p̂1 = 0.13;   n2 = 5,000, p̂2 = 0.19
H0: p2 − p1 ≤ .05    H1: p2 − p1 > .05
z = (p̂2 − p̂1 − D0)/√[p̂1(1 − p̂1)/n1 + p̂2(1 − p̂2)/n2] = (0.19 − 0.13 − .05)/√[(.13)(.87)/2,060 + (.19)(.81)/5,000] = 1.08
There is no evidence to reject H0; we cannot conclude that the campaign has increased the proportion of people who prefer California wines by more than 0.05.
8-33.  (p̂2 − p̂1) ± 1.96√[p̂1(1 − p̂1)/n1 + p̂2(1 − p̂2)/n2]
= 0.06 ± 1.96√[(.13)(.87)/2,060 + (.19)(.81)/5,000] = [0.0419, 0.0781]
We are 95% confident that the increase in the proportion of the population preferring California wines is anywhere from 4.19% to 7.81%.
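The interval above uses the unpooled standard error; a quick numeric check:

```python
# The Problem 8-33 95% C.I. for p2 - p1, unpooled standard error.
from math import sqrt

p1, n1 = 0.13, 2060
p2, n2 = 0.19, 5000
se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
lo = (p2 - p1) - 1.96 * se   # ≈ 0.0419
hi = (p2 - p1) + 1.96 * se   # ≈ 0.0781
```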
Template (95% Confidence Interval): 0.0600 ± 0.0181 = [0.0419, 0.0782]
8-34.  n2 = 480   x2 = 20
z = (p̂1 − p̂2)/√[p̄(1 − p̄)(1/n1 + 1/n2)]
Reject H0. p-value = 0.0122.
8-35.  n1 = 120, x1 = 34;   n2 = 200, x2 = 41
H0: p1 − p2 ≤ 0    H1: p1 − p2 > 0
z = (.283 − .205)/√[(.234)(1 − .234)(1/120 + 1/200)] = 1.601
At α = 0.05, there is no evidence to conclude that the proportion of American executives who prefer the A380 is greater than that of European executives. (p-value = 0.0547.)
Template:
            Sample 1  Sample 2
Size        120       200
#Successes  34        41
Proportion  0.2833    0.2050
Hypothesized Difference: Zero;  Pooled p̂ = 0.2344;  Test Statistic z = 1.6015
Null Hypothesis        p-value
H0: p1 − p2 = 0        0.1093
H0: p1 − p2 ≥ 0        0.9454
H0: p1 − p2 ≤ 0        0.0546
At an α of 5%, do not reject.
8-36.
(1): n1 = 1,000, x1 = 75, p̂1 = .075;   Chicago (2): n2 = 1,000, x2 = 72, p̂2 = .072
H0: p1 − p2 = 0    H1: p1 − p2 ≠ 0
z = (p̂1 − p̂2)/√[p̄(1 − p̄)(1/n1 + 1/n2)] = 0.257
Do not reject H0.
8-37.  Template:
            Sample 1  Sample 2
Size        100       100
#Successes  18        6
Proportion  0.1800    0.0600
Hypothesized Difference: Zero;  Pooled p̂ = 0.1200;  Test Statistic z = 2.6112
H0: p1 − p2 = 0: p-value = 0.0090; at α = 5%, Reject.
Reject the null hypothesis: the new accounting method is more effective.
8-38.  Template:
            Sample 1  Sample 2
Size        100       100
#Successes  32        19
Proportion  0.3200    0.1900
Hypothesized Difference: Zero;  Pooled p̂ = 0.2550;  Test Statistic z = 2.1090
H0: p1 − p2 = 0: p-value = 0.0349; at α = 1%, do not reject.
Do not reject the null hypothesis: the proportions are not significantly different.
8-39.  Motorola (1): n1 = 120, x1 = 101, p̂1 = .842;   Blaupunkt (2): n2 = 200, x2 = 110, p̂2 = .550
H0: p1 ≤ p2    H1: p1 > p2
z = (.842 − .550)/√[(.659)(1 − .659)(1/120 + 1/200)] = 5.33
Reject H0.
8-40.  n1 = 40, s1² = 1,288;   n2 = 15, s2² = 1,112
H0: σ1² ≤ σ2²    H1: σ1² > σ2²,  using α = .05
F(39,14) = s1²/s2² = 1,288/1,112 = 1.158
The critical point at α = .05 is F(39,14) = 2.27 (using approximate df in the table). Do not reject H0. There is no evidence that the variance of the new production method is smaller.
Template (F-Test for Equality of Variances):
          Sample 1  Sample 2
Size      40        15
Variance  1288      1112
p-values (at α = 5%): σ1² − σ2² = 0: 0.7977; ≥ 0: 0.6012; ≤ 0: 0.3988.
8-41.
H0: σ1² = σ2²    H1: σ1² ≠ σ2²
F = 1.1025
Template: Assumptions: Populations Normal; H0: Population Variances Equal; F ratio = 1.1025, p-value = 0.9186. Do not reject H0.
8-42.  n1 = 25, s1 = 2.5;   n2 = 25, s2 = 1
H0: σ1 = σ2    H1: σ1 ≠ σ2
Put the larger s² in the numerator and use 2α:
F(24,24) = (2.5)²/(1)² = 6.25
From the F table using α = .01, the critical point is F(24,24) = 2.66. Therefore, reject H0. The population variances are not equal at α = 2(.01) = 0.02.
Template (F-Test for Equality of Variances): sizes 25 and 25; Test Statistic F = 6.25; df1 = 24, df2 = 24; p-values: σ1² − σ2² = 0: 0.0000 (Reject at α = 5%); ≥ 0: 1.0000; ≤ 0: 0.0000 (Reject).
8-43.  n1 = 21, s1 = .09;   n2 = 28, s2 = .122
H0: σ1² = σ2²    H1: σ1² ≠ σ2²
F(27,20) = (.122)²/(.09)² = 1.84
Do not reject H0: the equal-variance assumption made in Problem 8-26 appears acceptable.
8-44.  Before (1): n1 = 12, s1² = 16,390.545;   After (2): n2 = 11, s2² = 86,845.764
H0: σ1 = σ2    H1: σ1 ≠ σ2
F(10,11) = 86,845.764/16,390.545 = 5.298
The critical point from the table, using α = .01, is F(10,11) = 4.54. Therefore, reject H0. The population variances are probably not equal. p-value < .02 (double the α).
Template p-values: 0.0109 (two-tailed); 0.0055 and 0.9945 (one-tailed). Reject at α = 1%.
8-45.  n1 = 25, s1 = 2.5;   n2 = 25, s2 = 3.1
H0: σ1 = σ2    H1: σ1 ≠ σ2,  α = .02
F(24,24) = (3.1)²/(2.5)² = 1.538
From the table: F.01(24,24) = 2.66. Do not reject H0. There is no evidence that the variances in the two waiting lines are unequal.
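The F test above can be checked numerically (larger sample variance in the numerator; doubling the one-tail area gives a two-tailed p-value):

```python
# F test of Problem 8-45 for equality of two variances.
from scipy import stats

F = 3.1**2 / 2.5**2                    # ≈ 1.538
p_two_tailed = 2 * stats.f.sf(F, 24, 24)  # well above 0.02
```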
8-46.  nA = 25, sA² = 6.52;   nB = 22, sB² = 3.47
H0: σA = σB    H1: σA ≠ σB,  α = .01
F(24,21) = 6.52/3.47 = 1.879
The critical point for α = .01 is F(24,21) = 2.80. Do not reject H0. There is no evidence that stock A is riskier than stock B.
Template (F-Test for Equality of Variances): sizes 25 and 22; variances 6.52 and 3.47; p-values (at α = 1%): σ1² − σ2² = 0: 0.1485; ≥ 0: 0.9258; ≤ 0: 0.0742.
8-47.
The assumptions we need are: independent random sampling from the populations in question,
and normal population distributions. The normality assumption is not terribly crucial as long as
no serious violations of this assumption exist. In time series data, the assumption of random
sampling is often violated when the observations are dependent on each other through time. We
must be careful.
8-48.  Template:
          Sample 1  Sample 2
Size      200       200
Mean      10402     11359
Std. Dev. 8500      9100
Assumptions: Populations Normal. H0: Population Variances Equal: F ratio = 1.14616, p-value = 0.3367.
p-values: H0: μ1 − μ2 = 0: 0.2778; H0: μ1 − μ2 ≥ 0: 0.1389; H0: μ1 − μ2 ≤ 0: 0.8611.
At an α of 5%, do not reject the null hypothesis. The average costs of the two procedures are similar.
8-49.
The C.I. contains zero as expected from the results of Problem 8-48.
8-50.  Σd = 51   d̄ = 4.636   s_d = 7.593   (n = 11)
t(10) = (4.636 − 0)/(7.593/√11) = 2.03
8-51.
8-52.  Template (Comparing Two Population Proportions):
            Sample 1  Sample 2
Size        200       200
#Successes  96        52
Proportion  0.4800    0.2600
Hypothesized Difference: Zero;  Pooled p̂ = 0.3700;  Test Statistic z = 4.5567
Null Hypothesis        p-value   At α = 5%
H0: p1 − p2 = 0        0.0000    Reject
H0: p1 − p2 ≥ 0        1.0000
H0: p1 − p2 ≤ 0        0.0000    Reject
Reject H0. There is evidence that NFL viewers watch more commercials than those viewing Survivor.
8-53.  99% C.I. for pNFL − pSCI (the difference between the proportions viewing commercials for NFL viewers vs. Survivor viewers):
0.2200 ± 0.1211 = [0.0989, 0.3411]
8-54.  Template:
          Sample 1  Sample 2
Size      15        15
Mean      1242      1240
Std. Dev. 50        50
Assumptions: Populations Normal.
p-values: H0: μ1 − μ2 = 0: 0.9136; ≥ 0: 0.5432; ≤ 0: 0.4568.
At an α of 5%, do not reject the null hypothesis. The number of roses imported from the two countries is about the same.
8-55.  x1 = 60, n1 = 80;   x2 = 65, n2 = 100
p̄ = 125/180 = .6944
H0: p1 − p2 = 0    H1: p1 − p2 ≠ 0
z = (p̂1 − p̂2 − 0)/√[p̄(1 − p̄)(1/n1 + 1/n2)] = (.75 − .65)/√[(.6944)(1 − .6944)(1/80 + 1/100)] = 1.447
Do not reject H0. There is no evidence that one movie will be more successful than the other (p-value = 0.1478).
Template: sizes 80 and 100; successes 60 and 65; proportions 0.7500 and 0.6500; pooled p̂ = 0.6944; z = 1.4473; H0: p1 − p2 = 0: p-value = 0.1478; at α = 5%, do not reject.
8-56.  95% C.I. for the difference between the two population proportions:
(p̂1 − p̂2) ± 1.96√[p̂1(1 − p̂1)/n1 + p̂2(1 − p̂2)/n2] = 0.10 ± 1.96√[(.75)(.25)/80 + (.65)(.35)/100] = [−0.0332, 0.2332]
8-57.  K: nK = 12, x̄K = 12.55, sK = .7342281;   L: nL = 12, x̄L = 11.925, sL = .3078517
H0: μK − μL = 0    H1: μK − μL ≠ 0
t(22) = (12.55 − 11.925)/√{[11(.7342281)² + 11(.3078517)²]/22 × (1/12 + 1/12)} = 2.719
Reject H0. The critical points for t(22) at α = .02 are ±2.508; at α = .01 they are ±2.819. So .01 < p-value < .02. The L-boat is probably faster.
Template: sizes 12 and 12; means 12.55 and 11.925; std. devs. 0.73423 and 0.30785; p-value = 0.0125; at α = 5%, Reject.
8-58.  Do Problem 8-57 with the data treated as paired. The differences K − L have:
n = 12   D̄ = .625   s_D = .7723929
t(11) = (.625 − 0)/(.7723929/√12) = 2.803
2.718 < 2.803 < 3.106 (between the critical points of t(11) for α = .02 and α = .01).
Hence .01 < p-value < .02, which is as before, in Problem 8-57 (the pairing did not help much here; we reach the same conclusion).
Template (Paired Difference Test): Size = 12; Assumption: Populations Normal.
8-59.  Template (Comparing Two Population Proportions):
            Sample 1  Sample 2
Size        1000      1000
#Successes  49.5      67.9
Proportion  0.0495    0.0679
Hypothesized Difference: Zero;  Pooled p̂ = 0.0587;  Test Statistic z = −1.7503
H0: p1 − p2 = 0: p-value = 0.0801; at α = 5%, do not reject.
Do not reject the null hypothesis: the delinquency rates are the same.
8-60.  IIT (1): n1 = 100, p̂1 = 0.94;   Competitor (2): n2 = 125, p̂2 = 0.92
H0: p1 − p2 = 0    H1: p1 − p2 ≠ 0
p̄ = .9288
z = .02/√[(.9288)(1 − .9288)(1/100 + 1/125)] = 0.58
There is no evidence that one program is more successful than the other.
8-61.  Design (1): n1 = 15, x̄1 = 2.17333, s1 = .3750555;   Design (2): n2 = 13, x̄2 = 2.5153846, s2 = .3508232
H0: μ2 − μ1 = 0    H1: μ2 − μ1 ≠ 0
t(26) = (2.5153846 − 2.173333)/√{[14(.3750555)² + 12(.3508232)²]/26 × (1/15 + 1/13)} = 2.479
Reject H0 at α = .05 (the critical points for t(26) are ±2.056): the average for design 2 is higher.
8-62.  Test the equal-variance assumption for Problem 8-61:
H0: σ1² = σ2²    H1: σ1² ≠ σ2²
F(14,12) = (.3750555)²/(.3508232)² = 1.143
Do not reject H0 at α = 0.10 (since 1.143 < 2.62; also 1.143 < 2.10, so the p-value > 0.20). The solution of Problem 8-61 is valid with respect to the equal-variance requirement.
8-63.  A = After: nA = 16, x̄A = 91.75, sA = 5.0265959;   B = Before: nB = 15, x̄B = 84.7333, sB = 5.3514573
H0: μA − μB ≤ 5    H1: μA − μB > 5
t(29) = (91.75 − 84.733 − 5)/√{[15(5.0265959)² + 14(5.3514573)²]/29 × (1/16 + 1/15)} = 1.08
Do not reject H0 at α = .05 (t.05(29) = 1.699): no evidence that the average improvement exceeds 5.
8-64.  Test the equal-variance assumption for Problem 8-63:
H0: σ1² = σ2²    H1: σ1² ≠ σ2²
F(14,15) = (5.3514573)²/(5.0265959)² = 1.133
Do not reject H0 at α = 0.10. There is no evidence that the population variances are unequal.
Template (F-Test for Equality of Variances): sizes 15 and 16; variances 28.6381 and 25.26667; p-values (at α = 10%): σ1² − σ2² = 0: 0.8100; ≥ 0: 0.5950; ≤ 0: 0.4050.
8-65.  H0: σL² = σK²    H1: σL² ≠ σK²
Template:
          Sample 1  Sample 2
Size      200       200
Mean      10402     11359
Std. Dev. 8500      9100
Assumptions: Populations Normal. H0: Population Variances Equal.
F = 1.146, p-value = 0.34
Do not reject the null hypothesis of equal variances.
8-66.  H0: σK² = σL²    H1: σK² ≠ σL²
df = (n1 − 1, n2 − 1)
t.02(14) = 2.624 < 2.719 < 2.977 = t.01(14), hence 0.01 < p-value < 0.02. Reject H0.
8-67.  Differences A − B (16 paired observations; the raw differences are not reproduced here):
n = 16   D̄ = 2.375   s_D = 9.7425185
H0: μ_D = 0    H1: μ_D ≠ 0
t(15) = (2.375 − 0)/(9.7425185/√16) = 0.9751
Do not reject H0. There is no evidence that one package is better liked than the other.
Template (Paired Difference Test): Size = 16; Assumption: Populations Normal; p-values: = 0: 0.3450; ≥ 0: 0.1725; ≤ 0: 0.8275. At α = 5%, do not reject.
8-68.  Supplier A: nA = 200, xA = 12;   Supplier B: nB = 250, xB = 38
H0: pA − pB = 0    H1: pA − pB ≠ 0
z = (p̂A − p̂B − 0)/√[p̄(1 − p̄)(1/nA + 1/nB)] = (.06 − .152)/√[(.1111)(.8889)(1/200 + 1/250)] = −3.086
Reject H0. p-value = .002. Supplier A is probably more reliable, as its proportion of defective components is lower.
8-69.  95% C.I. for the difference in the proportion of defective items for the two suppliers:
(p̂B − p̂A) ± 1.96√[p̂A(1 − p̂A)/nA + p̂B(1 − p̂B)/nB] = 0.0920 ± 0.0554 = [0.0366, 0.1474]
8-70.
90% C.I. for the difference in average occupancy rate at the Westin Plaza Hotel before and after the advertising:
(x̄A − x̄B) ± 1.699√{[15(5.0265959)² + 14(5.3514573)²]/29 × (1/16 + 1/15)} = 7.017 ± 3.167 = [3.85, 10.18]
8-71.  Template:
          Sample 1  Sample 2
Size      25        20
Mean      60        65
Std. Dev. 14        8
Assumptions: Populations Normal.
p-values: = 0: 0.1404; ≥ 0: 0.0702; ≤ 0: 0.9298.
At an α of 5%, do not reject the null hypothesis. The price of the two virtual dolls is about the same.
8-72.  Template:
          Sample 1  Sample 2
Size      74        65
Mean      28        22
Std. Dev. 6         6
Assumptions: Populations Normal.
p-values: H0: μ1 − μ2 = 0: 0.0000 (Reject at α = 5%); ≥ 0: 1.0000; ≤ 0: 0.0000 (Reject).
8-73.  Template:
          Sample 1  Sample 2
Size      74        65
Mean      50        14
Std. Dev. 20        8
Assumptions: Populations Normal.
Null hypotheses tested: μ1 − μ2 = 0; μ1 − μ2 ≥ 0; μ1 − μ2 ≤ 0.
8-74.  a.  n1 = 2500, x̄1 = 39;   n2 = 2500, x̄2 = 35;   s1 = s2 = 2;   α = .05
H0: μ1 = μ2    H1: μ1 ≠ μ2
z = (39 − 35)/√(2²/2500 + 2²/2500) = 70.711
Reject H0.
8-75.  Template:
          Sample 1  Sample 2
Size      25        25
Mean      1.7       1.5
Std. Dev. 0.4       0.7
Assumptions: Populations Normal.
At an α of 5%, do not reject the null hypothesis. The mean catches are about the same. p-value = 0.2225.
8-76.  Yes. Lower-income households are less likely to have internet access (p-value = 0.0038).
Template (Comparing Two Population Proportions):
            Sample 1  Sample 2
Size        500       500
#Successes  350       310
Proportion  0.7000    0.6200
Hypothesized Difference: Zero;  Pooled p̂ = 0.6600;  Test Statistic z = 2.6702
Null Hypothesis        p-value   At α = 5%
H0: p1 − p2 = 0        0.0076    Reject
H0: p1 − p2 ≥ 0        0.9962
H0: p1 − p2 ≤ 0        0.0038    Reject
8-77.  The 95% C.I. contains 0, which supports the results of Problem 8-75.
Confidence Interval for the difference in population means, 95%:
0.2 ± 0.32642 = [−0.1264, 0.52642]
8-78.  The ratio of the variances is 3.18. The degrees of freedom for both samples are 10 − 1 = 9. Using the F table with 9 degrees of freedom in both the numerator and the denominator, we find a critical value of 3.18 at α = 0.05. Therefore, the probability of observing a variance ratio this large when the population variances are equal is 5%.
8-79.
1. Test whether the average charges are the same (the raw charge data are not reproduced here):
Sample 1: n = 11, x̄ = 2623.18, s = 174.087;   Sample 2: n = 9, x̄ = 2342.22, s = 393.55
p-values: H0: μ1 − μ2 = 0: 0.0467 (Reject at α = 5%); H0: μ1 − μ2 ≥ 0: 0.9766; H0: μ1 − μ2 ≤ 0: 0.0234 (Reject)
At the 0.05 level of significance, reject the null hypothesis that the charges are the same.
2. Test the assumption of equal variances:
H0: σ1² = σ2²    H1: σ1² ≠ σ2²
F ratio = 5.11054, p-value = 0.0193: at α = 5%, reject the hypothesis of equal variances.
(Allowing unequal variances, the template reports p-values 0.0748, 0.9626, and 0.0374 for the three null hypotheses; the one-tailed result still rejects at 5%.)
New-process comparison (the paired raw data are not reproduced here):
Sample 1: n = 40, x̄ = 2742.5, s = 32.8883;   Sample 2: n = 40, x̄ = 2729.35, s = 38.3189
Assumptions: Populations Normal.
1) H0: μ1 − μ2 ≤ 0: p-value = 0.0518; at α = 5%, we cannot (quite) reject.
   95% C.I.: 13.15 ± 15.8956
2) Increasing α would decrease β. Increasing α to any value above 5.18% will cause the null hypothesis to be rejected.
3) Paired difference test (difference defined as Sample1 − Sample2; Size = 40; Average Difference = 13.15; Assumption: Populations Normal): reject the null hypothesis (p-value = 0.0471) at α = 5%.
4) Reducing the variance of the new process will decrease the chances of a Type I error.
CHAPTER 9
ANALYSIS OF VARIANCE
9-1.
H0: μ1 = μ2 = μ3 = μ4
H1: not all four means are equal
Possible configurations when H0 is false:
all 4 are different;
2 equal, 2 different;
3 equal, 1 different;
2 equal, other 2 equal but different from the first 2.
9-2.
ANOVA assumptions: normal populations with equal variance. Independent random sampling
from the r populations.
9-3.
Series of paired t-test are dependent on each other. There is no control over the probability of a
Type I error for the joint series of tests.
9-4.
r = 5   n1 = n2 = … = n5 = 21   n = 105
The df of F are 4 and 100. Computed F = 3.6. The p-value is close to 0.01. Reject H0. There is evidence that not all 5 plants have equal average output.
F distribution, one-tail critical values: 10%: 2.0019; 5%: 2.4626; 1%: 3.5127; 0.5%: 3.9634.
9-5.
r = 4   n1 = 52   n2 = 38   n3 = 43   n4 = 47
Computed F = 12.53. Reject H0: the average price per lot is not equal in all 4 cities. We can feel very strongly about rejecting the null hypothesis, since the critical point of F(3,176) for α = .01 is approximately 3.8.
F distribution, one-tail critical values: 10%: 2.1152; 5%: 2.6559; 1%: 3.8948; 0.5%: 4.4264.
9-6.
Originally, "treatments" referred to the different types of agricultural experiments being performed on a crop; today the term is used interchangeably to refer to the different populations in the study. Errors are the differences between the data points and their sample means.
9-7.
9-8.
9-9.
9-10.
An error is any deviation from a sample mean that is not explained by differences among populations. An error may be due to a host of factors not studied in the experiment.
9-11.
Both MSTR and MSE are sample statistics, subject to natural variation about their own means. (Similarly, in the single-sample case, having x̄ differ from μ0 does not by itself let us reject H0.)
9-12.
The main principle of ANOVA is that if the r population means are not all equal then it is likely
that the variation of the data points about their sample means will be small compared to the
variation of the sample means about the grand mean.
9-13.
Distances among populations means manifest themselves in treatment deviations that are large
relative to error deviations. When these deviations are squared, added, and then divided by dfs,
they give two variances. When the treatment variance is (significantly) greater than the error
variance, population mean differences are likely to exist.
9-14.
9-15.
SST = SSTR + SSE, but this does not equal MSTR + MSE. A counterexample:
Let n = 21, r = 6, SST = 100, SSTR = 85, SSE = 15.
Then SST/(n − 1) = 100/20 = 5, while SSTR/(r − 1) + SSE/(n − r) = 85/5 + 15/15 = 18.
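The counterexample can be checked numerically:

```python
# Numerical check of the Problem 9-15 counterexample: the sums of
# squares add, but the corresponding mean squares do not.
n, r = 21, 6
SST, SSTR, SSE = 100, 85, 15
MST = SST / (n - 1)      # 5.0
MSTR = SSTR / (r - 1)    # 17.0
MSE = SSE / (n - r)      # 1.0
```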
9-16.
When the null hypothesis of ANOVA is false, the ratio MSTR/MSE is not the ratio of two
independent, unbiased estimators of the common population variance 2 , hence this ratio does
not follow an F distribution.
9-17.
x_ij − x̄ = (x̄_i − x̄) + (x_ij − x̄_i)
Now square both sides and sum over all observations (all treatments i = 1, …, r; and within treatment i, all observations j = 1, …, n_i):
Σ_i Σ_j (x_ij − x̄)² = Σ_i Σ_j (x̄_i − x̄)² + Σ_i Σ_j 2(x̄_i − x̄)(x_ij − x̄_i) + Σ_i Σ_j (x_ij − x̄_i)²
The first sum on the right-hand side equals Σ_i n_i(x̄_i − x̄)², since the summand doesn't vary over the n_i values of j. In the second (cross-product) sum, Σ_j (x_ij − x̄_i) = 0 for each i, because it is the sum of all deviations from the mean within treatment i. Thus the whole second sum on the long R.H.S. above is 0, and the equation is now
Σ_i Σ_j (x_ij − x̄)² = Σ_i n_i(x̄_i − x̄)² + Σ_i Σ_j (x_ij − x̄_i)²
that is, SST = SSTR + SSE.
9-18.  (From Minitab):
Source      df   SS       MS       F
Treatment    2   381127   190563   20.71
Error       27   248460     9202
Total       29   629587
The critical point for F(2,27) at α = 0.01 is 5.49. Therefore, reject H0. The average ranges of the 3 prototype planes are probably not all equal.
Template (ANOVA table, α = 5%):
Source    SS      df  MS         F        F-critical  p-value
Between   381127   2  190563.33  20.7084  3.3541      0.0000  Reject
Within    248460  27    9202.22
Total     629587  29
9-19.
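The F ratio and p-value in the Problem 9-18 table can be reproduced from the sums of squares:

```python
# Reproducing the Problem 9-18 F ratio from the ANOVA table.
from scipy import stats

MSTR = 381127 / 2      # 190563.5
MSE = 248460 / 27      # ≈ 9202.2
F = MSTR / MSE         # ≈ 20.71
p = stats.f.sf(F, 2, 27)  # far below 0.01
```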
Template (ANOVA table, α = 5%):
Source    SS      df  MS      F       F-critical  p-value
Between   187.70   3  62.565  11.494  2.9467      0.0000  Reject
Within    152.41  28   5.4433
Total     340.11  31
MINITAB output — One-way ANOVA: UK, Mex, UAE, Oman
Source  DF  SS      MS     F      P
Factor   3  187.70  62.57  11.49  0.000
Error   28  152.41   5.44
Total   31  340.11
S = 2.333   R-Sq = 55.19%   R-Sq(adj) = 50.39%
Level  N  Mean    StDev
UK     8  60.160  2.535
Mex    8  58.390  2.405
UAE    8  55.190  2.224
Oman   8  54.124  2.149
(Minitab also plots individual 95% CIs for each mean based on the pooled StDev.)
The critical point F(3,28) for α = 0.05 is 2.9467. Therefore we reject H0. There is evidence of differences in the average price per barrel of oil from the four sources. The Rotterdam oil market may not be efficient. The conclusion is valid only for Rotterdam, and only for Arabian Light. We need to assume independent random samples from these populations, and normal populations with equal population variance. Observations are time-dependent (days during February), so the assumptions could be violated. This is a limitation of the study. Another limitation is that February may be different from other months.
9-20.
An F(.05,2,101) = 3.61 result, relative to a critical value of 3.08637, indicates a significant difference
in their perceptions on the roles played by African American models in commercials.
9-21.
(From Minitab):
Source      df  SS       MS       F
Treatment    2   91.0426  45.5213  12.31
Error       38  140.529    3.69812
Total       40  231.571
p-value = .0001. The critical point for F(2,38) at α = .05 is 3.245. Therefore, reject H0. There is a difference in the length of time it takes to make a decision.
Template (ANOVA table, α = 5%):
Source    SS       df  MS       F        F-critical  p-value
Between    91.0426  2  45.5213  12.3093  3.2448      0.0001  Reject
Within    140.529  38   3.6981
Total     231.571  40
9-22.
An F(.05,2,55) = 52.787 result, relative to a critical value of 3.165, indicates a significant difference
in the monetary-economic reaction to the three inflation fighting policies.
9-23.
The test results exceed the critical value of F(.01,3,236) = 3.866. The results indicate that the
performances of the four different portfolios are significantly different.
9-24.
9-25.
Where do differences exist in the circle-square-triangle populations from Table 9-1, using the Tukey procedure? From the text:
MSE = 2.125
triangles: n1 = 4, x̄1 = 6;   squares: n2 = 4, x̄2 = 11.5;   circles: n3 = 3, x̄3 = 2
Pairwise results: triangles vs. squares: sig.; squares vs. circles: sig.; triangles vs. circles: n.s.
9-26.
x̄C = 4,135
Tukey pairwise comparisons: A vs. B: sig.; A vs. C: sig.; |x̄B − x̄C| = 95 < 106.475: n.s.
Prototype A is shown to have higher average range than both B and C. Prototypes B and C have no significant difference in average range (all conclusions are at α = 0.05).
Template (Tukey test for pairwise comparison of group means): r = 3, n − r = 27, q0 = 3.51, T = 106.476; B: Sig; C: Sig.
9-27.
9-28.
9-29.
9-30.
9-31.
We cannot extend the results to planes built after the analysis. We used fixed effects here, not
random effects. The 3 prototypes were not randomly chosen from a population of levels as would
be required for the random effects model.
9-32.
A randomized complete block design is a design with restricted randomization. Each block of
experimental units is assigned to treatments with randomization of treatments within the block.
9-33.
Fly all 3 planes on the same route every time. The route (flown by the 3 planes) is the block.
9-34.
Look at the residuals. If the spread of the residuals is not equal, we probably have unequal 2 ,
the assumption of equal variances is violated. A histogram of the residuals will reveal normality
violations.
9-35.
Otherwise you are not randomly sampling from a population of treatments, and inference is not
valid for the entire population.
9-36.
9-37.
If the locations and the artists are chosen randomly, we have a random effects model.
9-38.
9-39.
Limitations and problems: (1) We don't know the overall significance level of the 3 tests; (2) if we have 1 observation per cell, then there are 0 degrees of freedom for error. Also, for a fixed sample size there is a reduction of the df for error.
9-40.
9-41.
Since there are interactions, there are differences in emotions averaged over all levels of
advertisements.
9-42.
At α = 0.05:  Location: F = 50.6, significant;   Job type: F = 50.212, significant;   Interaction: F = 2.14, n.s.
ANOVA table:
Source       SS        df  MS        F        F-critical (5%)  p-value
Location     2520.988   2  1260.49   50.645   3.1239           0.0000  Reject
Job Type     2499.432   2  1249.72   50.212   3.1239           0.0000  Reject
Interaction   212.716   4    53.179   2.1367  2.4989           0.0850
Error        1792      72    24.8889
Total        7025.136  80
9-43.
Fifty observations per cell:
       Morning  Evening  Late Night
ABC    50       50       50
CBS    50       50       50
NBC    50       50       50
Source       SS    df   MS     F
Network       145    2  72.5   5.16
Newstime      160    2  80     5.69
Interaction   240    4  60     4.27
Error        6200  441  14.06
Total        6745  449
From the table: F0.01(2,400) = 4.66 and F0.01(4,400) = 3.36. Therefore, all effects are significant at α = 0.01. There are interactions. There are Network main effects averaged over Newstime levels. There are Newstime main effects averaged over Network levels.
9-44.
a. Levels of task difficulty: a − 1 = 1; therefore a = 2.
b. Levels of effort: b − 1 = 1; therefore b = 2.
c. There are no task-difficulty main effects, because the p-value = 0.5357.
d. There are effort main effects, because the p-value < 0.0001.
e. There are no significant interactions, as the p-value = 0.1649.
9-45.
t²(144) = F(1,144)
9-46.
Since there are interactions but neither of the main factors have significant F-tests, a likely
conclusion is that the two factors work in opposite directions, i.e., inverse to each other.
9-47.
Advantages: reduced experimental errors (the effects of extraneous factors) and greater economy
of sample sizes.
9-48.
Use blocking by firm, to reduce the error contributions arising from differences between firms.
9-49.
Could use a randomized blocking design: 4 observations, UK, Mexico, UAE, Oman at 4
locations and 4 different dates.
9-50.
A good blocking variable would be size of firm in terms of total assets or total sales, etc.
9-51.
Yes. Have people of the same occupation/age/demographics use sweaters of the 3 kinds under study. Each group of 3 people forms a block.
9-52.
As stated in 9-23, a good blocking variable would be some measure of diversity in the portfolio.
9-53.
We could group the executives into blocks according to some choice of common characteristics
such as age, sex, years employed at current firm, etc. The different blocks for the chosen attribute
would then form a third variable beyond Location and Type to use in a 3-way ANOVA.
9-54.
9-55.
SSTR = 3,233  SSE = 12,386  r = 3 sweeteners  n = 100 blocks
df treatment = r − 1 = 2  df error = (n − 1)(r − 1) = 99(2) = 198
F = MSTR/MSE = (3,233/2)/(12,386/198) = 25.84
Reject H0; the p-value is very small. There are differences among the 3 sweeteners, and we should be very confident of the results. Blocking reduces experimental error here, as people of the same weight/age/sex will tend to behave homogeneously with respect to losing weight.
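The 9-55 F statistic can be reproduced directly from the printed sums of squares; a Python sketch:

```python
from scipy import stats

# Randomized block design: r = 3 treatments (sweeteners), n = 100 blocks.
sstr, sse = 3233, 12386
r, n = 3, 100
df_tr = r - 1                  # 2
df_err = (n - 1) * (r - 1)     # 198

f = (sstr / df_tr) / (sse / df_err)     # MSTR / MSE, about 25.84
p = stats.f.sf(f, df_tr, df_err)        # upper-tail p-value
```

The tiny p-value matches the "reject H0, p-value is very small" conclusion above.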
9-56.
n = 70 blocks  r = 4
SSTR = 9,875  SSBL = 1,445  SST = 22,364
SSE = 22,364 − 1,445 − 9,875 = 11,044
MSE = 11,044/[(69)(3)] = 53.35
MSTR = 9,875/3 = 3,291.67
F = MSTR/MSE = 3,291.67/53.35 = 61.70
9-57.
SSTR = 7,102  SSE = 10,511  r = 8  ni = 20 for all i
MSTR = SSTR/(r − 1) = 7,102/7 = 1,014.57
MSE = SSE/(n − r) = 10,511/(160 − 8) = 69.15
F(7,152) = 14.67 > 2.76 (critical point for α = 0.01). Therefore, reject H0. Not all tapes are equally appealing. The p-value is very small.
9-58.
n1 = 32
n2 = 30
n3 = 38
n4 = 41
n =141
MSTR = SSTR/(r 1) = 4,537/3 = 1,512.33
F (3,137) = MSTR/MSE = 1,512.33/412 = 3.67
(at = 0.05) 2.67 < 3.67 < 3.92
(at = 0.01)
We can reject H0 at = 0.05. There is some evidence that the four names are not all equally well
liked.
9-59.
Software packages: 3  Computers: 4  n = 60
SS software = 77,645  SS computer = 54,521  SS int. = 88,699  SSE = 434,557

Source       SS       df   MS          F
Software      77,645    2  38,822.5    63.25
Computer      54,521    3  18,173.667  29.60
Interaction   88,699    6  14,783.167  24.09
Error        434,557  708     613.78
Total        655,422  719
9-60.
Total sample size was 225, so Total df = 224; Block df = 74; Treatment df = r − 1 = 2
Error df = (n − 1)(r − 1) = (74)(2) = 148
Critical value of F(0.05, 2, 148) = 3.0572, which is less than F = 13.65. The results are significant.
9-61.
Source       SS       df   MS        F
Pet           22,245    3   7,415    1.93
Location      34,551    3  11,517    2.99
Interaction   31,778    9   3,530.89 0.92
Error        554,398  144   3,849.99
Total        642,972  159
9-62.
ANOVA Table (at α = 5%)

Source    SS       df  MS         F           Fcritical  p-value
Between    3203.12  2  1601.56    4.54708749  3.1239     0.0138  Reject
Within   25359.6   72   352.21667
Total    28562.7   74

95% confidence intervals for the group means (half-width 7.4824354):
Placebo: 27.8 ± 7.4824354
No-Treatment: 39.48 ± 7.4824354

9-63.
a. Blocking (repeated measures) is more efficient, as every person serves as his or her own control, reducing experimental error. Limitations: there may be carryover effects from trial to trial.
9-64.
b. SSTR = 44,572  SSE = 112,672  r = 3  n = 30
MSTR = 44,572/2 = 22,286
MSE = 112,672/[(29)(2)] = 1,942.62
F(2,58) = 11.47. Reject H0.
n1 = n2 = n3 = 15  r = 3
A one-way ANOVA gives an F-value of 22.21, which is significant even at α < 0.001; hence we reject the hypothesis of no differences among the three models.
MSE = 48.1, so at α = 0.01 we use the critical point q = 4.37 (closest tabulated value to the required df = 3, 42), giving the Tukey criterion T = q √(MSE/ni) = 7.83. Observed means:
x̄GI = 124.73  x̄P = 121.40  x̄Z = 108.733
|x̄GI − x̄P| = 3.33  |x̄GI − x̄Z| = 16.00*  |x̄P − x̄Z| = 12.67*
Using T = 7.83, we reject the hypotheses μGI = μZ and μP = μZ (at the 0.01 level of significance), but not the μGI = μP hypothesis.

ANOVA Table (at α = 5%)

Source   SS       df  MS         F           Fcritical    p-value
Between  2137.78   2  1068.8889  22.2082976  3.219938094  0.0000  Reject
Within   2021.47  42    48.130159
Total    4159.24  44

95% confidence intervals for the group means (half-width 3.6149467):
GI: 124.73 ± 3.6149467
Phillips: 121.4 ± 3.6149467
Zenith: 108.733 ± 3.6149467
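The Tukey comparison above can be sketched in Python, using the book's table value q = 4.37 rather than recomputing it:

```python
import math

# Tukey (HSD) criterion with the book's studentized-range value
# q = 4.37 (alpha = 0.01, r = 3 treatments, df about 42).
q, mse, n_i = 4.37, 48.1, 15
T = q * math.sqrt(mse / n_i)    # criterion, about 7.83

means = {"GI": 124.73, "P": 121.40, "Z": 108.733}
pairs = [("GI", "P"), ("GI", "Z"), ("P", "Z")]
# A pair differs significantly when its mean difference exceeds T.
significant = {f"{a}-{b}": abs(means[a] - means[b]) > T for a, b in pairs}
```

Only the two comparisons against Zenith exceed T, matching the starred differences above.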
9-65.
n = 50  r = 3  SSTR = 128,899  SSE = 42,223,987
F(2,98) = (128,899/2)/(42,223,987/98) = 0.14958

t²(df) = F(1,df)
9-67.
Rents are equal on average. There is no evidence of differences among the four cities.
9-68.
9-69.
A one-way ANOVA strongly rejects H0. For the three levels of Store, 95% confidence intervals for the means are calculated, as shown; they do not overlap at all.
Case 11: Rating Wines
(Template: ANOVA.xls, sheet: 1-Way)
Data:

Chard (n = 11): 89, 88, 89, 78, 80, 86, 87, 88, 88, 89, 88
(group sizes for the four grapes: 11, 10, 13, 11)

1) Do not reject the null hypothesis: there is no difference in the average ratings due to the type of grape.

ANOVA Table (at α = 5%)

Source   SS       df  MS      F       Fcritical  p-value
Between   411.617  3  137.21  0.8594  2.8327     0.4698
Within   6545.63  41  159.65
Total    6957.24  44
1.
ANOVA (n = 10 in each of the three groups)

ANOVA Table (at α = 5%)

Source   SS     df  MS      F       Fcritical  p-value
Between   20.6   2  10.3    3.4893  3.3541     0.0449  Reject
Within    79.7  27   2.9519
Total    100.3  29

ANOVA Table (at α = 5%)

Source       SS        df  MS       F       Fcritical  p-value
Row           20.76667  4   5.19167  2.1239  2.5787    0.0934
Column        90.7      2  45.35    18.552   3.2043    0.0000  Reject
Interaction   14.13333  8   1.76667  0.7227  2.1521    0.6705
Error        110       45   2.44444
Total        235.6     59

Reject the null hypothesis of an equal number of scans per minute (columns).
Do not reject the null hypothesis that the clerks are equally efficient.
There are no interaction effects present.
CHAPTER 10
SIMPLE LINEAR REGRESSION AND CORRELATION
(The template for this chapter is: Simple Regression.xls.)
10-1.
A statistical model is a set of mathematical formulas and assumptions that describe some real-world situation.
10-2.
Steps in statistical model building: 1) Hypothesize a statistical model; 2) Estimate the model
parameters; 3) Test the validity of the model; and 4) Use the model.
10-3.
Assumptions of the simple linear regression model: 1) a straight-line relationship between X and Y; 2) the values of X are fixed; 3) the regression errors, ε, are identically normally distributed random variables, uncorrelated with each other through time.
10-4.
β0 is the Y-intercept of the regression line, and β1 is the slope of the line.
10-5.
10-6.
The regression model is used for understanding the relationship between the two variables, X and
Y; for prediction of Y for given values of X; and for possible control of the variable Y, using the
variable X.
10-7.
The error term ε captures the randomness in the process. Since X is assumed nonrandom, the addition of ε makes the result (Y) a random variable. The error term captures the effects on Y of a host of unknown random components not accounted for by the simple linear regression model.
10-8.
The equation represents a simple linear regression model without an intercept (constant) term.
10-9.
The least-squares procedure produces the best estimated regression line in the sense that the line lies "inside" the data set. The line is the best unbiased linear estimator of the true regression line, as the estimators b0 and b1 have the smallest variance of all linear unbiased estimators of the line parameters. The least-squares line is obtained by minimizing the sum of the squared deviations of the data points from the line.
10-10. Least squares is less useful when outliers exist. Outliers tend to have a greater influence on the
determination of the estimators of the line parameters because the procedure is based on
minimizing the squared distances from the line. Since outliers have large squared distances they
exert undue influence on the line. A more robust procedure may be appropriate when outliers
exist.
Wealth data (X, Y, Error):
X: 1, 2, 3, 4, 5
Y: 17.3, 23.6, 40.2, 45.8, 56.8
Error: 0.8, −3.02, 3.46, −1.06, −0.18
95% Confidence Interval for Slope: 10.12 ± 2.77974
95% Confidence Interval for Intercept: −1.19025 ± 2.8317
Normal-quantile pairs for the residual plot: (0.167, −0.967), (0.333, −0.431), (0.500, 0.000), (0.667, 0.431), (0.833, 0.967)

b1 = 0.187

ANOVA Table

Source  SS        df  MS        F        Fcritical  p-value
Regn.   128.332    1  128.332   129.525  4.84434    0.0000
Error    10.8987  11    0.99079
Total   139.231   12
Return data (9 observations):
X values include: −3, 0.51, 2.03, −1.8, 5.79, 5.87
Y: 12.6, −10.3, 36, 12, −8, 53, −2, 18, 32, 24
Errors: −20.0642, 17.9677, −16.294, −14.1247, 36.4102, −20.0613, 3.64648, 10.2987, 2.22121
95% Confidence Interval for Slope: 0.96809 ± 2.7972

ANOVA Table

Source  SS       df  MS       F        Fcritical  p-value
Regn.    291.134  1  291.134  0.66974  5.59146    0.4401
Error   3042.87   7  434.695
Total   3334      8

[Scatter plot of Y versus X with fitted line y = 0.9681x + 16.096]

There is a weak linear relationship (r) and the regression is not significant (r², F, p-value).
10-16.
Simple Regression
Year (X): 1960, 1970, 1980, 1990, 2000
Value (Y): 180000, 40000, 60000, 160000, 200000
Errors: 84000, −72000, −68000, 16000, 40000
95% Confidence Interval for Slope: 1600 ± 7949.76

ANOVA Table

Source  SS       df  MS       F        Fcritical  p-value
Regn.   2.6E+09   1  2.6E+09  0.41026  10.128     0.5674
Error   1.9E+10   3  6.2E+09
Total   2.1E+10   4

[Scatter plot of Value versus Year with fitted line y = 1600x − 3E+06]

There is a weak linear relationship (r) and the regression is not significant (r², F, p-value).
Limitations: the sample size is very small.
Hidden variables: the '70s and '80s models have a different valuation than other decades, possibly due to a different model or style.
10-17. The regression equation is:
Credit Card Transactions = 39.6717 + 0.06129 (Debit Card Transactions)
95% C.I. half-width: 0.17018

ANOVA Table

Source  SS       df  MS       F        Fcritical  p-value
Regn.   332366    1  332366   102.389  7.70865    0.0005
Error    12984.5  4    3246.12
Total   345351    5

There is no implication of causality. A third-variable influence could be increases in per capita income or GDP growth.
10-18. SSE = Σ(y − b0 − b1x)². Setting the partial derivatives with respect to b0 and b1 equal to zero:
Σ(y − b0 − b1x) = 0 and Σx(y − b0 − b1x) = 0.
Expanding, we get:
Σy − nb0 − b1Σx = 0 and Σxy − b0Σx − b1Σx² = 0.
Solving the above two equations simultaneously for b0 and b1 gives the required results.
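The normal equations above can be checked numerically; a Python sketch on a small made-up data set (the x and y values here are illustrative, not from the text):

```python
import numpy as np

# Hypothetical data, for illustration only.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])
n = len(x)

# Normal equations:  n*b0 + (Sum x)*b1 = Sum y
#                    (Sum x)*b0 + (Sum x^2)*b1 = Sum xy
A = np.array([[n, x.sum()], [x.sum(), (x**2).sum()]])
rhs = np.array([y.sum(), (x * y).sum()])
b0, b1 = np.linalg.solve(A, rhs)

# Cross-check against numpy's built-in least-squares fit.
b1_np, b0_np = np.polyfit(x, y, 1)
```

Both routes give the same line, as the derivation requires.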
10-19. 99% C.I. for β1: 10.12 ± t0.005(3) s(b1) = 10.12 ± 5.841(0.87346) = [5.018, 15.222]

ANOVA Table

Source  df  MS
Regn.    1  1024.14
Error    3     7.62933
Total    4
10-23. s(b0) = 0.971, s(b1) = 0.016; the estimate of the error variance is MSE = 0.991. 95% C.I. for β1: 0.187 ± 2.201(0.016) = [0.1518, 0.2222]. Zero is not a plausible value at α = 0.05.
From the template:
95%: 0.18663 ± 0.03609
95%: 1.55176 ± 0.42578
95%: −255.943 ± 237.219
10-25. s 2 gives us information about the variation of the data points about the computed regression line.
10-26. In correlation analysis, the two variables, X and Y, are viewed symmetrically: neither one is designated dependent and the other independent, as is the case in regression analysis. In correlation analysis we are interested in the relation between two random variables, both assumed normally distributed.
10-27. From the regression results for problem 10-11: r = 0.9890 (coefficient of correlation).
10-28. r = 0.960 (template: r = 0.9601, coefficient of correlation).
10-29. t(3) = r/√[(1 − r²)/(n − 2)] = 0.3468/√[(1 − 0.1203)/3] = 0.640

t(8) = r/√[(1 − r²)/(n − 2)] = 0.875/√[(1 − 0.875²)/8] = 5.11
There is statistical evidence of a correlation between the prices of gold and of copper.
Limitations: the data are time-series data, hence not independent random samples. Also, the data set contains only 10 points.
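The t statistic for testing H0: ρ = 0 used in the last two solutions can be wrapped as a small helper; a Python sketch:

```python
import math

def corr_t(r: float, n: int) -> float:
    """t = r / sqrt((1 - r^2)/(n - 2)), with n - 2 degrees of freedom."""
    return r / math.sqrt((1 - r**2) / (n - 2))

t_gold_copper = corr_t(0.875, 10)   # about 5.11
```

The same helper reproduces the 10-34 value, corr_t(0.37, 65), which is about 3.16.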
10-34. n = 65  r = 0.37
t(63) = 0.37/√[(1 − 0.37²)/63] = 3.16
10-35. z′ = arctanh(r) = 0.3884  ζ0 = arctanh(0.22) = 0.2237  σz′ = 1/√(n − 3) = 1/√62 = 0.127
z = (z′ − ζ0)/σz′ = (0.3884 − 0.2237)/0.127 = 1.297
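The 10-35 computation as a Python sketch (math.atanh is the Fisher z′ transformation):

```python
import math

r, rho0, n = 0.37, 0.22, 65
z_prime = math.atanh(r)          # Fisher transform of r, about 0.3884
zeta0 = math.atanh(rho0)         # transform of the hypothesized rho, about 0.2237
sigma = 1 / math.sqrt(n - 3)     # 1/sqrt(62), about 0.127
z = (z_prime - zeta0) / sigma    # about 1.297
```

Since |z| < 1.96, H0: ρ = 0.22 is not rejected at α = 0.05.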
10-36. Using the TINV(α, df) function in Excel, where df = n − 2 = 52: TINV(0.05, 52) = 2.006645 and TINV(0.01, 52) = 2.6737.
Reject H0 at α = 0.05 but not at α = 0.01. There is evidence of a linear relationship at α = 0.05 only.
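Excel's TINV is two-tailed; the SciPy equivalent is the upper 1 − α/2 quantile (an assumption of this sketch: TINV(α, df) = t.ppf(1 − α/2, df)):

```python
from scipy import stats

df = 52
t_05 = stats.t.ppf(1 - 0.05 / 2, df)   # about 2.0066, matches TINV(0.05, 52)
t_01 = stats.t.ppf(1 - 0.01 / 2, df)   # about 2.6737, matches TINV(0.01, 52)
```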
10-37. t(16) = b1/s(b1) = 3.1/2.89 = 1.0727.
Do not reject H0. There is no evidence of a linear relationship at any α.
10-38. Using the regression results for problem 10-11:
critical value of t: t(0.05, 3) = 3.182
computed value of t: t = b1/s(b1) = 10.12/0.87346 = 11.586
Reject H0. There is strong evidence of a linear relationship.
10-48. In Problem 10-13, r² = 0.922. Thus, 92.2% of the variation in the dependent variable is explained by the regression relationship.
10-49. r² in Problem 10-16: r² = 0.1203
10-50. Reading directly from the MINITAB output: r² = 0.962
10-51. Based on the coefficient of determination values for the five countries, the UK model explains
31.7% of the variation in long-term bond yields relative to the yield spread. This is the best
predictive model of the five. The next best model is the one for Germany, which explains 13.3%
of the variation. The regression models for Canada, Japan, and the US do not predict long-term
yields very well.
10-52. From the information provided, the slope coefficient of the equation is equal to -14.6. Since its
value is not close to zero (which would indicate that a change in bond ratings has no impact on
yields), it would indicate that a linear relationship exists between bond ratings and bond yields.
This is in line with the reported coefficient of determination of 61.56%.
10-53. r² in Problem 10-15: r² = 0.0873
10-54.
Σ(y − ȳ)² = Σ[(y − ŷ) + (ŷ − ȳ)]² = Σ(y − ŷ)² + 2Σ(y − ŷ)(ŷ − ȳ) + Σ(ŷ − ȳ)²
But: 2Σ(y − ŷ)(ŷ − ȳ) = 2Σŷ(y − ŷ) − 2ȳΣ(y − ŷ) = 0,
because the first term on the right is the sum of the weighted regression residuals, which sum to zero, and the second term is the sum of the residuals, which is also zero. This establishes the result:
Σ(y − ȳ)² = Σ(y − ŷ)² + Σ(ŷ − ȳ)².
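The identity just derived (SST = SSE + SSR) can be verified numerically on any least-squares fit with an intercept; a Python sketch with made-up data:

```python
import numpy as np

# Hypothetical data, for illustration only.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([3.1, 4.4, 7.0, 7.9, 10.2, 11.8])

b1, b0 = np.polyfit(x, y, 1)
y_hat = b0 + b1 * x

sst = ((y - y.mean())**2).sum()     # total sum of squares
sse = ((y - y_hat)**2).sum()        # error sum of squares
ssr = ((y_hat - y.mean())**2).sum() # regression sum of squares
```

The cross-term vanishes exactly as in the derivation, so sst equals sse + ssr to machine precision.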
Fcritical = 10.128, p-value = 0.0014
t(11) = 11.381; Fcritical = 4.84434
t(4) = 10.119; t² = F: (10.119)² = 102.39 (F = 102.389, p-value = 0.0005, Fcritical = 7.70865)
F(1,102) = (87,691/1)/(12,745/102) = 701.8
There is extremely strong evidence of a linear relationship between the two variables.
10-61. t²(k) = F(1,k). Thus, F(1,20) = [b1/s(b1)]² = (2.556/4.122)² = 0.3845
Do not reject H0. There is no evidence of a linear relationship.
10-62.
t²(k) = [b1/s(b1)]² = [(SSXY/SSX)/(s/√SSX)]² = SS²XY/(SSX · MSE) = (SSR/1)/MSE = MSR/MSE = F(1,k)
Residual Analysis
Durbin-Watson statistic: d = 3.39862
[Residual plot of Error versus X]
The residual variance fluctuates; with only 5 data points, the residuals appear to be normally distributed.
[Normal-scores plot of the residuals]
[MINITAB residual plot versus Quality (30 to 80)]
No apparent inadequacy.

Residual Analysis
Durbin-Watson statistic: d = 2.0846

10-68.
Residual Analysis
Durbin-Watson statistic: d = 1.70855
10-69. In the American Express example, give a 95% prediction interval for x = 5,000:
ŷ = 274.85 + 1.2553(5,000) = 6,551.35
P.I. = 6,551.35 ± (2.069)(318.16)√[1 + 1/25 + (5,000 − 3,177.92)²/40,947,557.84] = [5,854.4, 7,248.3]
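The 10-69 interval as a Python sketch, plugging in the quantities quoted above (t = 2.069 with 23 df, s = 318.16, n = 25, x̄ = 3,177.92, SSx = 40,947,557.84):

```python
import math

t, s, n = 2.069, 318.16, 25
x, x_bar, ss_x = 5000, 3177.92, 40947557.84

y_hat = 274.85 + 1.2553 * x                       # point prediction, 6,551.35
half = t * s * math.sqrt(1 + 1/n + (x - x_bar)**2 / ss_x)
pi = (y_hat - half, y_hat + half)                 # about (5854.4, 7248.3)
```

Swapping in t = 2.807 reproduces the wider 99% interval of 10-71.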
10-70. Given that the slope of the equation in 10-52 is −14.6, if the rating falls by 3 the yield should increase by 43.8 basis points.
10-71. For a 99% P.I.: t.005(23) = 2.807
6,551.35 ± (2.807)(318.16)√[1 + 1/25 + (5,000 − 3,177.92)²/40,947,557.84] = [5,605.75, 7,496.95]
10-72. Point prediction: ŷ = 6.38 + 10.12(4) = 46.86
The 99% P.I.: 46.86 ± 18.3946 = [28.465, 65.255]
For X = 5: 99% P.I.: 56.98 ± 20.407
For the problem 10-16 regression:
X = 1990: 95% P.I.: 144000 ± 286633
X = 2000: 95% P.I.: 160000 ± 317990
From the templates:
b0 = −0.284157, b1 = 2.779337
b0 = 0, b1 = 2.741537; prediction at X = 10: Y = 27.41537
b0 = −1.12783, b1 = 2.825566; predictions: Y(5) = 13, Y(10) = 27.12783
b0 = 4.236
b1 = 0.051, s(b1) = 0.02404, t = 2.12487
b) Using α = 0.05, we would reject the null hypothesis of no relationship between the response variable and the predictor, based on the reported p-value of 0.034.
10-79. Given the reported p-value, we would reject the null hypothesis of no relationship between
neuroticism and job performance. Given the reported coefficient of determination, 19% of the
variation in job performance can be explained by neuroticism.
10-80. The t-statistic for the reported information is:
t = b1/s(b1) = 0.233/0.055 = 4.236
Predictor   Coef     Stdev    t-ratio  p
Constant    67.62    12.32     5.49    0.000
Oper Inc    0.40725  0.03579  11.38    0.000

s = 9.633  R-sq = 89.0%  R-sq(adj) = 88.3%

Analysis of Variance

SOURCE      DF  SS     MS     F       p
Regression   1  12016  12016  129.49  0.000
Error       16   1485     93
Total       17  13500

Stock close based on an operating income of $305M is ŷ = $56.24.

Predictor   Coef       Stdev      t-ratio  p
Constant    2.3153     0.1077     21.50    0.000
Oper Inc    0.0055201  0.0003129  17.64    0.000

R-sq = 95.1%  R-sq(adj) = 94.8%

Analysis of Variance

SOURCE      DF  SS      MS      F       p
Regression   1  2.2077  2.2077  311.25  0.000
Error       16  0.1135  0.0071
Total       17  2.3212

Unusual Observations
Obs.  x    y       Fit     Stdev.Fit  Residual  St.Resid
1     240  3.8067  3.6401  0.0366     0.1666    2.20R

The regression using the log of monthly stock closings is a better fit: Operating Income explains over 95% of the variation in the log of monthly stock closings, versus 89% for the non-transformed Y.
10-82. a) The calculated t-value for the slope coefficient is:
t = b1/s(b1) = 0.92/0.01 = 92.00
Y = 3.820133X + 52.273036
(b0 = 52.273036, b1 = 3.820133)
b)
c) 90% C.I. for the slope: 3.82013 ± 0.4531
r² = 0.9449, very high; F = 222.931 (p-value = 0.000): both indicate that X affects Y.
d) Since the 99% C.I. for the slope, 3.82013 ± 0.77071, does not contain the value 0, the slope is not 0.
e) Prediction at X = 10: Y = 90.47436
f) X = 12.49354
g)
h) Residual Analysis: Durbin-Watson statistic d = 2.56884
Y = 1.166957X − 1.090724
(b0 = −1.090724, b1 = 1.166957)
2)
3) 95% C.I. for the slope: 1.16696 ± 0.37405
4) 95% P.I. at X = 10: 10.5788 ± 5.35692
5) Durbin-Watson statistic: d = 0.83996
6) [Normal-scores plot of the residuals]
7) Y = 1.157559X − 0.945353
(b0 = −0.945353, b1 = 1.157559)
Prediction at X = 6: Y = 6
The risk has dropped a little, but it is still above average since b1 > 1.
CHAPTER 11
MULTIPLE REGRESSION
(The template for this chapter is: Multiple Regression.xls.)
11-1.
The assumptions of the multiple regression model are that the errors are normally and independently distributed with mean zero and common variance σ². We also assume that the Xi are fixed quantities rather than random variables; at any rate, they are independent of the error terms. The assumption of normality of the errors is needed for conducting tests about the regression model.
11-2.
Holding advertising expenditures constant, sales volume increases by 1.34 units, on average, per
increase of 1 unit in promotional experiences.
11-3.
In a correlational analysis, we are interested in the relationships among the variables. On the
other hand, in a regression analysis with k independent variables, we are interested in the effects
of the k variables (considered fixed quantities) on the dependent variable only (and not on one
another).
11-4.
A response surface is a generalization to higher dimensions of the regression line of simple linear regression. For example, when 2 independent variables are used, each in the first order only, the response surface is a plane in 3-dimensional Euclidean space. When 7 independent variables are used, each in the first order, the response surface is a 7-dimensional hyperplane in 8-dimensional Euclidean space.
11-5.
8 equations.
11-6.
The least-squares estimators of the parameters of the multiple regression model, obtained as
solutions of the normal equations.
11-7.
ΣY = nb0 + b1ΣX1 + b2ΣX2
ΣX1Y = b0ΣX1 + b1ΣX1² + b2ΣX1X2
ΣX2Y = b0ΣX2 + b1ΣX1X2 + b2ΣX2²
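In matrix form the normal equations are X′Xb = X′Y; a Python sketch on made-up data, checked against numpy's least-squares routine:

```python
import numpy as np

# Hypothetical data for two predictors, for illustration only.
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 5.0])
y = np.array([5.0, 6.0, 11.0, 11.5, 16.0])

# Design matrix with a leading column of ones for the intercept b0.
X = np.column_stack([np.ones_like(x1), x1, x2])
b = np.linalg.solve(X.T @ X, X.T @ y)        # [b0, b1, b2]

# Cross-check: lstsq minimizes the same sum of squared errors.
b_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
```

Solving the three equations directly and calling lstsq give identical coefficients.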
11-8.
b1 = 0.0487011  b2 = 10.897682
Using SYSTAT:

DEP VAR: VALUE  N: 9  MULTIPLE R: .909  ADJUSTED SQUARED MULTIPLE R: .769
STANDARD ERROR OF ESTIMATE: 59.477

VARIABLE  COEFFICIENT  STD ERROR  STD COEF  TOLERANCE  T      P(2 TAIL)
CONSTANT   9.800       80.763     0.000                0.121  0.907
SIZE       0.173        0.040     0.753     0.9614430  4.343  0.005
DISTANCE  31.094       14.132     0.382     0.9614430  2.200  0.070

ANALYSIS OF VARIANCE
SOURCE      SUM-OF-SQUARES  DF  MEAN-SQUARE  F-RATIO  P
REGRESSION  101032.867       2  50516.433    14.280   0.005
RESIDUAL     21225.133       6   3537.522

From the template:

         Size     Distance
b        0.17331  31.094
s(b)     0.0399   14.132
t        4.34343   2.2002
p-value  0.0049    0.0701
VIF      1.0401    1.0401

ANOVA Table

Source  SS        df  MS      F      FCritical  p-value
Regn.   101033     2  50516   14.28  5.1432     0.0052
Error    21225.1   6   3537.5
Total   122258     8  15282

R² = 0.8264  Adjusted R² = 0.7685  s = 59.477
11-9.
With no advertising and no spending on in-store displays, sales are b0 = 47.165 (thousand) on average. For each unit (thousand) increase in advertising expenditure, keeping in-store display expenditure constant, there is an average increase in sales of b1 = 1.599 (thousand). Similarly, for each unit (thousand) increase in in-store display expenditure, keeping advertising constant, there is an average increase in sales of b2 = 1.149 (thousand).
11-10. We test whether there is a linear relationship between Y and any of the Xi variables (that is, with at least one of the Xi). If the null hypothesis is not rejected, there is nothing more to do, since there is no evidence of a regression relationship. If H0 is rejected, we need to conduct further analyses to determine which of the variables have a linear relationship with Y and which do not, and we need to develop the regression model.
11-11. Degrees of freedom for error = n − 13.
11-12. k = 2  n = 82  SSE = 8,650  SSR = 988
MSR = SSR/k = 988/2 = 494
SST = SSR + SSE = 988 + 8,650 = 9,638
MSE = SSE/[n − (k + 1)] = 8,650/79 = 109.4937
F = MSR/MSE = 494/109.4937 = 4.5116
Using the Excel function FDIST(F, dfN, dfD) to return the p-value, where the dfs refer to the degrees of freedom in the numerator and denominator, respectively: FDIST(4.5116, 2, 79) = 0.013953.
Yes, there is evidence of a linear regression relationship at α = 0.05, but not at α = 0.01.
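The 11-12 computation as code; scipy.stats.f.sf plays the role of Excel's FDIST here:

```python
from scipy import stats

k, n = 2, 82
ssr, sse = 988, 8650

msr = ssr / k                         # 494
mse = sse / (n - (k + 1))             # 8650/79, about 109.49
f = msr / mse                         # about 4.5116
p = stats.f.sf(f, k, n - (k + 1))     # upper-tail p-value, about 0.0140
```

Since 0.01 < p < 0.05, the relationship is significant at α = 0.05 but not at α = 0.01.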
11-13. F = (7,768/4)/[(15,673 − 7,768)/40] = 1,942/197.625 = 9.827
Yes, there is evidence of a linear regression relationship between Y and at least one of the independent variables.
11-14.
Source      SS       df  MS        F
Regression  7,474.0   3  2,491.33  48.16
Error         672.5  13     51.73
Total       8,146.5  16

Since the F-ratio is highly significant, there is evidence of a linear regression relationship between overall appeal score and at least one of the three variables: prestige, comfort, and economy.
11-15. When the sample size is small and the degrees of freedom for error are relatively small, so that adding a variable, and thus losing a degree of freedom for error, is substantial.
11-16. R² = SSR/SST. As we add a variable, SSR cannot decrease. Since SST is constant, R² cannot decrease.
11-17. No. The adjusted coefficient is used in evaluating the importance of new variables in the presence of old ones. It does not apply in the case where all we consider is a single independent variable.
11-18. By the definition of the adjusted coefficient of determination, Equation (11-13):
R̄² = 1 − [SSE/(n − (k + 1))]/[SST/(n − 1)] = 1 − (SSE/SST)·(n − 1)/[n − (k + 1)]
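The formula above as a helper, checked against the 11-23 numbers (R² = 0.918 with n = 17, k = 4) and the 11-24 numbers (R² = 0.769 with n = 242, k = 6):

```python
def adjusted_r2(r2: float, n: int, k: int) -> float:
    """Adjusted R-squared: 1 - (1 - R^2)(n - 1)/(n - (k + 1))."""
    return 1 - (1 - r2) * (n - 1) / (n - (k + 1))

r2_adj_23 = adjusted_r2(0.918, 17, 4)    # about 0.8907
r2_adj_24 = adjusted_r2(0.769, 242, 6)   # about 0.7631
```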
11-19. The mean square error gives a good indication of the variation of the errors in regression. However, other measures, such as the coefficient of multiple determination and the adjusted coefficient of multiple determination, are useful in evaluating the proportion of the variation in the dependent variable explained by the regression, thus giving us a more meaningful measure of the regression fit.
11-20. Given an adjusted R² = 0.021, only 2.1% of the variation in the stock return is explained by the four independent variables.
Using the Excel function FDIST(F, dfN, dfD): FDIST(2.27, 4, 433) = 0.06093.
There is evidence of a linear regression relationship at α = 0.10 only.
s = √MSE = √51.73 = 7.192

11-23. R̄² = 1 − (1 − R²)(n − 1)/[n − (k + 1)] = 1 − (1 − 0.918)(16/12) = 0.8907
11-24. R̄² = 1 − (1 − R²)(n − 1)/[n − (k + 1)] = 1 − (1 − 0.769)(241/235) = 0.7631
Since R̄² = 76.31%, approximately 76% of the variation in the information price is characterized by the 6 independent marketing variables.
Using the Excel function FDIST(F, dfN, dfD): FDIST(44.8, 6, 235) = 2.49E-36.
There is evidence of a linear regression relationship at all α's.
11-25. a. The regression expresses stock returns as a plane in space, with firm size ranking and stock price ranking as the two horizontal axes:
RETURN = 0.484 − 0.030(SIZRNK) − 0.017(PRCRNK)
The t-test for a linear relationship between returns and firm size ranking is highly significant, but not for returns against stock price ranking.
b. R² = 1 − (1 − R̄²)[n − (k + 1)]/(n − 1) = 1 − (1 − 0.093)(47/49) = 0.130
Thus, 13% of the variation is due to the two independent variables.
c. The adjusted R² is quite low, indicating that the regression on both variables is not a good model. They should try regressing on size alone.
11-26. R̄² = 1 − (1 − R²)(n − 1)/[n − (k + 1)] = 1 − (1 − 0.72)(712/710) = 0.719
11-27. n = 500  SSE = 6,179  SST = 23,108

Source  SS      df   MS         F
Regn.   16,929    8  2,116.125  168.153
Error    6,179  491     12.5845
Total   23,108  499

Using the Excel function FDIST(F, dfN, dfD): FDIST(168.153, 8, 491) = 0.00 approximately.
There is evidence of a linear regression relationship at all α's.
R² = SSR/SST = 0.7326
R̄² = 1 − [SSE/(n − (k + 1))]/[SST/(n − 1)] = 0.7282
MSE = 12.5845
11-28. A joint confidence region for both parameters is a set of pairs of likely values of β1 and β2 at 95%. This region accounts for the mutual dependency of the estimators and hence is elliptical rather than rectangular. This is why the region may not contain a bivariate point included in the separate univariate confidence intervals for the two parameters.
11-29. Assuming a very large sample size, we use the following formula for testing the significance of each of the slope parameters: z = bi/s(bi), and use α = 0.05. Critical value of |z| = 1.96.
11-32. Use the following formula for testing the significance of each of the slope parameters: z = bi/s(bi), and use α = 0.05. Critical value of |z| = 1.96.
Sl.No.  Profits (Y)  Employees (X1)  Revenues (X2)
1       −1221        96400           17440
2       −2808        63000           13724
3        −773        70600           13303
4         248        39100            9510
5          38        37680            8870
6        1461        31700            6846
7         442        32847            5937
8          14        12867            2445
9          57        11475            2254
10        108         6000            1311

ANOVA Table

Source  SS            df  MS           F      FCritical  p-value
Regn.    4507008.861   2  2253504.43   2.166  4.737      0.1852
Error    7281731.539   7  1040247.363
Total   11788740.4     9  1309860.044

R² = 0.3823  Adjusted R² = 0.2058  s = 1019.925

Correlation matrix:

            Employees  Revenues
Employees    1.0000
Revenues     0.9831    1.0000
Profits     −0.5994   −0.6171

Regression Equation:
Profits = 834.95 + 0.009 Employees − 0.174 Revenues

The regression equation is not significant (F value), and there is a large amount of multicollinearity present between the two independent variables (0.9831). There is so much multicollinearity present that the negative partial correlations between the independent variables and profits are not maintained in the regression results (both of the parameters of the independent variables should be negative). None of the values of the parameters are significant.
11-40. The residual plot exhibits both heteroscedasticity and a curvature apparently not accounted for in
the model.
11-41.
a) residuals appear to be normally distributed
b) residuals are not normally distributed
11-42. An outlier is an observation far from the others.
11-43. A plot of the data or a plot of the residuals will reveal outliers. Also, most computer packages
(e.g., MINITAB) will automatically report all outliers and suspected outliers.
11-44. Outliers, unless they are due to errors in recording the data, may contain important information
about the process under study and should not be blindly discarded. The relationship of the true
data may well be nonlinear.
11-45. An outlier tends to tilt the regression surface toward it, because of the high influence of a large
squared deviation in the least-squares formula, thus creating a possible bias in the results.
11-46. An influential observation is one that exerts relatively strong influence on the regression surface.
For example, if all the data lie in one region in X-space and one observation lies far away in X, it
may exert strong influence on the estimates of the regression parameters.
11-47. This creates a bias. In any case, there is no reason to force the regression surface to go through
the origin.
11-48. The residual plot in Figure 11-16 exhibits strong heteroscedasticity.
11-49. The regression relationship may be quite different in a region where we have no observations
from what it is in the estimation-data region. Thus predicting outside the range of available data
may create large errors.
11-50. y = 47.165 + 1.599(8) + 1.149(12) = 73.745 (thousands), i.e., $73,745.
11-51. In Problem 11-8, X2 (distance) is not a significant variable, but we use the complete original regression relationship given in that problem anyway (since this problem calls for it):
ŷ = 9.800 + 0.173X1 + 31.094X2
ŷ(1800, 2.0) = 9.800 + (0.173)(1800) + (31.094)(2.0) = 363.78
Use z = bi/s(bi), with α = 0.05; critical value of |z| = 1.96.
For the dummy variable: z = −0.003/0.29 = −0.0103, which is not significant. A firm's being regulated or not does not affect its leverage level.
11-59. Two-way ANOVA.
11-60. Use analysis of covariance. Run it as a regression; Length of Stay is the concomitant variable.
11-61. Early investment is not statistically significant (or may be collinear with another variable). Rerun the regression without it. The dummy variables are both significant: there is a distinct line (or plane, if you do include the insignificant variable) for each type of firm.
11-62. This is a second-order regression model in three independent variables with cross-terms.
11-63. The STEPWISE routine chooses Price and M1 * Price as the best set of explanatory variables.
This gives the estimated regression relationship:
Exports = 1.39 + 0.0229Price + 0.00248M1 * Price
The t-statistics are: 2.36, 4.57, 9.08, respectively. R 2 = 0.822.
11-64. The STEPWISE routine chooses the three original variables: Prod, Prom, and Book, with no
squares. Thus the original regression model of Example 11-3 is better than a model with squared
terms.
Example 11-3 with production costs squared: higher s than the original model.

Multiple Regression Results

           Intercept  prod     promo   book    prod^2
b          7.04103    3.10543  2.2761  7.1125  −0.017
s(b)       5.82083    1.76478  0.262   1.9099   0.1135
t          1.20963    1.75967  8.6887  3.7241  −0.15
p-value    0.2451     0.0988   0.0000  0.0020   0.8827

ANOVA Table

Source   SS       df   MS       F        FCritical  p-value
Regn.    —         4   1581.4   109.07   3.0556     0.0000
Error    —        15     14.498
Total    6542.95  19    344.37

R² = 0.9668  Adjusted R² = 0.9579  s = 3.8076
Example 11-3 with production and promotion costs squared: higher s and slightly higher R².

Multiple Regression Results

           Intercept  prod     promo   book    prod^2   promo^2
b          5.30825    4.29943  1.2803  6.7046  −0.0948  0.0731
s(b)       5.84748    1.95614  0.8094  1.8942   0.1262  0.0564
t          0.90778    2.19792  1.5817  3.5396  −0.7511  1.297
p-value    0.3794     0.0453   0.1360  0.0033   0.4651  0.2156

ANOVA Table

Source   SS       df   MS       F        FCritical  p-value
Regn.    —         5   1269.8   91.564   2.9582     0.0000
Error    —        14     13.867
Total    6542.95  19    344.37

R² = 0.9703  Adjusted R² = 0.9597  s = 3.7239
Example 11-3 with promotion costs squared: slightly lower s, slightly higher R².

Multiple Regression Results

           Intercept  prod     promo   book    promo^2
b          9.21031    2.86071  1.5635  7.0476  0.053
s(b)       2.64412    0.39039  0.7057  1.8114  0.0489
t          3.48332    7.3279   2.2157  3.8908  1.0844
p-value    0.0033     0.0000   0.0426  0.0014  0.2953

ANOVA Table

Source   SS        df   MS      F       FCritical  p-value
Regn.    6340.98    4   1585.2  117.74  3.0556     0.0000
Error     201.967  15     13.464
Total    6542.95   19    344.37

R² = 0.9691  Adjusted R² = 0.9609  s = 3.6694
11-65. Use the following formula for testing the significance of each of the slope parameters: z = bi/s(bi), and use α = 0.05. Critical value of |z| = 1.96.
11-70. A transformed model may be more parsimonious, when the model describes the process well.
11-71. Try the transformation logY.
11-72. A good model is log(Exports) versus log(M1) and log(Price). This model has R² = 0.8652. It implies a multiplicative relation.
11-73. A logarithmic model.
11-74. This dataset fits an exponential model, so use a logarithmic transformation to linearize it.
11-75. A multiplicative relation (Equation (11-26)) with multiplicative errors. The reported error term, ε, is the logarithm of the multiplicative error term. The transformed error term is assumed to satisfy the usual model assumptions.
11-76. An exponential model: Y = e^(β0 + β1x1 + β2x2) = e^(3.79 + 1.66x1 + 2.91x2)
11-77. No. We cannot find a transformation that will linearize this model.
11-78. Take logs of both sides of the equation, giving:
log Q = log β0 + β1 log C + β2 log K + β3 log L + log ε
11-79. Taking reciprocals of both sides of the equation.
11-80. The square-root transformation Y' = √Y.
11-81. No. They minimize the sum of the squared deviations relevant to the estimated, transformed
model.
11-82. It is possible that the relation between a firm's total assets and bank equity is not linear. Including the logarithm of a firm's total assets is an attempt to linearize that relationship.
11-83. Correlation matrix:
          Prod   Prom   Book
  Earn    .867   .882   .547
  Prod           .638   .402
  Prom                  .319
11-84. The VIFs are: 1.82, 1.70, 1.20. No severe multicollinearity is present.
11-85. The sample correlation is 0.740. VIF = 2.2: a minor multicollinearity problem.
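The VIFs quoted in these answers can be computed directly from auxiliary regressions; a sketch (the function name is ours, for illustration):

```python
import numpy as np

# Variance inflation factor: VIF_j = 1 / (1 - R_j^2), where R_j^2 is the R^2
# from regressing predictor j on all the other predictors (with an intercept).
def vif(X):
    n, k = X.shape
    out = []
    for j in range(k):
        y = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        yhat = others @ np.linalg.lstsq(others, y, rcond=None)[0]
        r2 = 1.0 - np.sum((y - yhat) ** 2) / np.sum((y - y.mean()) ** 2)
        out.append(1.0 / (1.0 - r2))
    return out

# With only two predictors, R_j^2 is the squared sample correlation, so a
# correlation of 0.740 corresponds to VIF = 1/(1 - 0.740^2), about 2.2.
print(1.0 / (1.0 - 0.740**2))
```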
11-86.
a) Y = 11.031 + 0.41869 X1 - 7.2579 X2 + 37.181 X3

Multiple Regression Results
  Variable    b         s(b)      t         p-value   VIF
  Intercept   11.031    20.9905   0.52552   0.6107
  X1          0.41869   0.28418   1.47334   0.1714    1.0561
  X2         -7.2579    5.3287   -1.362     0.2031    557.7
  X3          37.181    26.545    1.4007    0.1916    557.9

ANOVA Table
  Source   SS        df   MS       F        FCritical   p-value
  Regn.    2459.78    3   819.93   1.3709   3.7083      0.3074
  Error    5981.02   10   598.1
  Total    8440.8    13   649.29

  R^2 = 0.2914    s = 24.456    Adjusted R^2 = 0.0788

b) Multiple Regression Results
  Variable    b         s(b)      t         p-value   VIF
  X1          0.29454   0.29945   0.98361   0.3485    1.0262
  X2          16.583    23.96     0.6921    0.5046    9867.0
  X3         -81.717    119.5    -0.6838    0.5096    9867.4

ANOVA Table
  Source   SS        df   MS       F        FCritical   p-value
  Regn.    1605.98    3   535.33   0.7832   3.7083      0.5300
  Error    6834.82   10   683.48
  Total    8440.8    13   649.29

  R^2 = 0.1903    s = 26.143    Adjusted R^2 = -0.0527
c) All parameters of the equation change values and some change signs. X2 and X3 are correlated (0.9999). Solution: use either X2 or X3, but not both.
d) Yes, the correlation matrix indicates that X2 and X3 are correlated:
        X1        X2       X3
  X1    1.0000
  X2   -0.0137   1.0000
  X3   -0.0237   0.9991   1.0000
11-87. Artificially high variances of regression coefficient estimators; unexpected magnitudes of some
coefficient estimates; sometimes wrong signs of these coefficients. Large changes in coefficient
estimates and standard errors as a variable or a data point is added or deleted.
11-88. Perfect collinearity exists when at least one variable is a linear combination of other variables. This causes the determinant of the X'X matrix to be zero and thus the matrix is non-invertible. The estimation procedure breaks down in such cases. (Other, less technical, explanations based on the text will suffice.)
11-89. Not true. Predictions may be good when carried out within the same region of the
multicollinearity as used in the estimation procedure.
11-90. No. There are probably no relationships between Y and any of the two independent variables.
11-91. X2 and X3 are probably collinear.
11-92. Delete one of the variables X2, X3, X4 to check for multicollinearity among a subset of these three variables, or whether they are all insignificant.
11-93. Drop some of the other variables one at a time and see what happens to the suspected sign of the
estimate.
11-94. The purpose of the test is to check for a possible violation of the assumption that the regression
errors are uncorrelated with each other.
11-95. Autocorrelation is correlation of a variable with itself, lagged back in time. Third-order
autocorrelation is a correlation of a variable with itself lagged 3 periods back in time.
11-96. First-order autocorrelation is a correlation of a variable with itself lagged one period back in
time. Not necessarily: a partial fifth-order autocorrelation may exist without a first-order
autocorrelation.
11-97. 1) The test checks only for first-order autocorrelation. 2) The test may not be conclusive.
3) The usual limitations of a statistical test owing to the two possible types of errors.
11-98. DW = 0.93   n = 21   k = 2
  d_L = 1.13    d_U = 1.54    4 - d_L = 2.87    4 - d_U = 2.46
At the 0.10 level, there is some evidence of a positive first-order autocorrelation.
11-99. DW = 2.13   n = 20   k = 3
  d_L = 1.00    d_U = 1.68    4 - d_L = 3.00    4 - d_U = 2.32
At the 0.10 level, there is no evidence of a first-order autocorrelation.
Durbin-Watson d = 2.125388
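The statistic itself is easy to compute from the regression residuals; a sketch with made-up residuals (the function name is ours):

```python
import numpy as np

# Durbin-Watson statistic: d = sum((e_t - e_{t-1})^2) / sum(e_t^2).
# Values near 2 suggest no first-order autocorrelation; values well below
# d_L suggest positive autocorrelation.
def durbin_watson(residuals):
    e = np.asarray(residuals, dtype=float)
    return np.sum(np.diff(e) ** 2) / np.sum(e ** 2)

# Illustrative: slowly drifting residuals give a small d (positive autocorrelation).
e = np.array([1.0, 1.1, 1.2, 1.1, 1.0, 0.9, -1.0, -1.1, -1.2, -1.1])
print(round(durbin_watson(e), 3))
```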
11-100. DW = 1.79   n = 10   k = 2
Since the table does not list values for n = 10, we will use the closest table values, those for n = 15 and k = 2:
  d_L = 0.95    d_U = 1.54    4 - d_L = 3.05    4 - d_U = 2.46
At the 0.10 level, there is no evidence of a first-order autocorrelation. Note that the table values decrease as n decreases, and thus our conclusion would probably also hold if we knew the actual critical points for n = 10 and used them.
11-101. Suppose that we have time-series data and that it is known that, if the data are autocorrelated, by the nature of the variables the correlation can only be positive. In such cases, where the hypothesis is made before looking at the actual data, a one-sided DW test may be appropriate. (And similarly for a negative autocorrelation.)
11-102. DW analysis on results from problem 11-39:
Durbin-Watson d = 1.552891
k = 2 independent variables
n = 10 for the sample size.
Table 7 for the critical values of the DW statistic begins with sample sizes of 15, which is a little
larger than our sample. Using the values for size 15 as an approximation, we have:
for = 0.05, dl = 0.95 and du = 1.54
the value for d is slightly larger than du
indicating no autocorrelation.
[Residual plot: residuals ranging from about -2000 to 2000, plotted against values from 0 to 120,000.]
11-103. F(r, n-(k+1)) = [(SSE_R - SSE_F)/r] / MSE_F = [(6.996 - 6.9898)/2] / 0.1127 = 0.0275
Cannot reject H0. The two variables should definitely be dropped; they add nothing to the model.
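The arithmetic can be checked with a one-line helper (helper name ours; input values are the ones from the solution above):

```python
# Partial F statistic for testing whether r variables can be dropped:
#   F(r, n-(k+1)) = ((SSE_R - SSE_F) / r) / MSE_F
def partial_f(sse_reduced, sse_full, r, mse_full):
    return (sse_reduced - sse_full) / r / mse_full

F = partial_f(6.996, 6.9898, 2, 0.1127)
print(round(F, 4))  # 0.0275
```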
11-104. Y = 47.16 + 1.599 X1 + 1.1149 X2. The STEPWISE regression routine selects both variables for the equation. R^2 = 0.961.
11-105. The STEPWISE procedure selects all three variables. R^2 = 0.9667.
11-106. All-possible-regressions is the best procedure because it evaluates every possibility. It is expensive in computer time; however, as computing power and speed increase, it becomes a very viable option. Forward selection is limited by the fact that once a variable is in, there is no way it can come out once it becomes insignificant in the presence of new variables. Backward elimination is similarly limited. Stepwise regression is an excellent method that enjoys very wide use and that has stood the test of time. It has the advantages of both the forward and the backward methods, without their limitations.
11-107. Because a variable may lose explanatory power and become insignificant once other variables
are added to the model.
11-108. Highest adjusted R^2; lowest MSE; highest R^2 for a given number of variables and the assessment of the increase in R^2 as we increase the number of variables; Mallows's C_p.
11-109. No. There may be several different best models. A model may be best using one criterion, and
not the best using another criterion.
Data (Australia):
  Year   Real GDP   Defense Exp % GDP   Population   Grain Yields
  1970     171           2.3               14.6         1,219
  1980     238           2.7               17.0         1,052
  1990     328           2.2               17.3         1,670
  1992     330           2.3               17.5         1,800
  1993     342           2.6               17.7         2,000
  1994     359           2.5               17.9         1,230
  1995     369           2.7               18.1         1,800
  1996     382           2.6               18.3         2,090
  1997     394           2.5               18.4         1,790

Coefficient estimates (from the template): 1.3387, 2.0253, 1.6331

ANOVA Table
  Source   SS        df   MS       F        FCritical   p-value
  Regn.    40654.3    3   13551    33.218   5.4094      0.0010
  Error    2039.75    5   407.95
  Total    42694      8   5336.8

  R^2 = 0.9522    s = 20.198    Adjusted R^2 = 0.9236

Correlation matrix
                 Defense   Population   Grain Yields
  Defense        1.0000
  Population     0.4444    1.0000
  Grain Yields   0.0689    0.5850       1.0000
  Real GDP       0.2573    0.9484       0.7023

Partial F Calculations
  #Independent variables in full model:          k = 3
  #Independent variables dropped from the model: r = 2
  SSE_F = 2039.748    SSE_R = 39867.85
  Partial F = 46.36369    p-value = 0.0010
Predictor         Coef      SE Coef   T       P
Constant         -36.49     24.27    -1.50    0.171
Sincerity          0.0983    0.3021   0.33    0.753
Excitement         1.9859    0.2063   9.63    0.000
Ruggedness         0.5071    0.7540   0.67    0.520
Sophistication    -0.3664    0.3643  -1.01    0.344

S = 3.68895    R-Sq = 94.6%    R-Sq(adj) = 91.8%
Based on the p-values for the estimated coefficients, only the assessed excitement variable is
significant. The adjusted R-square indicates that 91.8% of the variation in commercial
effectiveness is explained by the model. The ANOVA test indicates that a linear relation exists
between the dependent and independent variables.
Analysis of Variance
  Source           DF   SS        MS       F       P
  Regression        4   1890.36   472.59   34.73   0.000
  Residual Error    8    108.87    13.61
  Total            12   1999.23
1. Chapter 11 Case - ROC
(Indicator variables I1, I2, I3 code the sectors Banking, Computers, Construction, Energy.)

Multiple Regression Results
  Variable    b          s(b)       t          p-value   VIF
  Intercept   14.6209    2.51538    5.81259    0.0000
  Sales       2.30E-05   2.60E-05   0.88781    0.3770    1.2472
  Oper M      0.0824     0.0553     1.4905     0.1396    1.2212
  Debt/C     -0.0919     0.0444    -2.0692     0.0414    1.6224
  I1          10.051     2.0249     4.9636     0.0000    1.8560
  I2          2.8059     2.2756     1.2331     0.2208    1.8219
  I3         -1.6419     1.8725    -0.8769     0.3829    1.9096
Based on the regression coefficients of I1, I2, I3, the ranking of the sectors from highest return to lowest
will be:
Computers, Construction, Banking, Energy
2. From "Partial F" sheet, the p-value is almost zero. Hence the type of industry is significant.
  Banking        12.9576 ± 12.977
  Computers      23.0082 ± 13.295
  Construction   15.7635 ± 13.139
  Energy         11.3157 ± 12.864
CHAPTER 12
TIME SERIES, FORECASTING, AND INDEX NUMBERS
12-1.
Trend analysis is a quick method of determining in which general direction the data are moving
through time. The method lacks, however, the theoretical justification of regression analysis
because of the inherent autocorrelations and the intended use of the method in extrapolation
beyond the estimation data set.
12-2. Forecast for t = 24:
  t    Z-hat
  24   12.0553
  25   11.3607
  26   10.666
  27   9.9713
  28   9.27668

Regression Statistics: r^2 = 0.5111, MSE = 22.24426, Slope = -0.69466, Intercept = 28.72727
12-3. r^2 = 0.9858
(Using the template: Trend Forecast.xls)

  t    Zt         t    Z-hat
  1    53        13    198.182
  2    65        14    210.748
  3    74        15    223.315
  4    85        16    235.881
  5    92        17    248.448
  6    105       18    261.014
  7    120       19    273.58
  8    128       20    286.147
  9    144       21    298.713
  10   158       22    311.28
  11   179       23    323.846
  12   195       24    336.413

Regression Statistics: r^2 = 0.9858, MSE = 32.51189, Slope = 12.56643, Intercept = 34.81818

Forecast for 2008 (t = 13) = 198.182 and for 2009 (t = 14) = 210.748
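The template's slope, intercept, and forecast can be reproduced with an ordinary least-squares trend line fitted to the 12 observations above:

```python
import numpy as np

# Fit the trend Z_t = b0 + b1*t by least squares, then extrapolate to t = 13.
z = np.array([53, 65, 74, 85, 92, 105, 120, 128, 144, 158, 179, 195], dtype=float)
t = np.arange(1, len(z) + 1)
slope, intercept = np.polyfit(t, z, 1)
forecast_13 = intercept + slope * 13
print(round(slope, 5), round(intercept, 5), round(forecast_13, 3))
# 12.56643 34.81818 198.182
```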
12-4. r^2 = 0.8961, forecast for t = 12:
  t    Z-hat
  12   39.0545
  13   42.3818
  14   45.7091
  15   49.0364
  16   52.3636

Regression Statistics: r^2 = 0.8961, MSE = 15.68081, Slope = 3.327273, Intercept = -0.87273
12-6.
12-7.
The term "seasonal variation" is reserved for variation with a cycle of one year.
12-8.
12-9.
The weather, for one thing, changes from year to year. Thus sales of winter clothing, as an
example, would have a variable seasonal component.
12-10. Using MINITAB to conduct a multiple regression with a time variable and 11 dummy variables:

Regression Analysis: profit versus t, jan, ...

The regression equation is
profit = 0.163 + 0.0521 t + 0.123 jan + 0.121 feb + 0.319 mar + 0.567 apr
         + 0.615 may + 0.413 jun + 0.510 jul + 0.758 aug + 0.856 sep
         + 0.904 oct + 0.602 nov

Predictor   Coef      SE Coef   T      P
Constant    0.1625    0.3104    0.52   0.611
t           0.05208   0.01129   4.61   0.001
jan         0.1229    0.3543    0.35   0.735
feb         0.1208    0.3505    0.34   0.737
mar         0.3188    0.3470    0.92   0.378
apr         0.5667    0.3439    1.65   0.128
may         0.6146    0.3411    1.80   0.099
jun         0.4125    0.3387    1.22   0.249
jul         0.5104    0.3366    1.52   0.158
aug         0.7583    0.3349    2.26   0.045
sep         0.8563    0.3336    2.57   0.026
oct         0.9042    0.3326    2.72   0.020
nov         0.6021    0.3320    1.81   0.097

The adjusted R-square is reasonable. Setting t = 25, Jan = 1, and the rest of the months = 0, we get a forecasted value for Jan 2007:

Predicted Values for New Observations
New Obs   Fit      SE Fit   95% CI             95% PI
1         1.5875   0.3104   (0.9043, 2.2707)   (0.5874, 2.5876)

Values of Predictors for New Observations
New Obs   t      jan    feb   mar   apr   may   jun   jul   aug   sep   oct   nov
1         25.0   1.00   0     0     0     0     0     0     0     0     0     0

Forecast
  Z-hat: 8728083, 8768252, 8808422, 8848592, 8888761
Regression Statistics: r^2 = 0.9715, MSE = 7.82E+08, Slope = 40169.72, Intercept = 8165707
Data and ratio-to-moving-average decomposition:

  t    Month   Z(t)   Trend     CMA    C(t) =     Ratio-to-   Seasonal   Deseasonalized
               Zhat(t)                 CMA/Zhat   MA           Index S    Z(t)/S%
  1    Jul     7.40   7.18                                     95.68      7.73
  2    Aug     6.80   7.17                                     92.25      7.37
  3    Sep     6.40   7.15                                     90.57      7.07
  4    Oct     6.60   7.13                                     97.57      6.76
  5    Nov     6.50   7.11                                     95.96      6.77
  6    Dec     6.00   7.09                                     92.22      6.51
  7    Jan     7.00   7.07      7.02   0.993       99.76      102.47      6.83
  8    Feb     6.70   7.05      7.01   0.995       95.54       98.21      6.82
  9    Mar     8.20   7.03      7.05   1.002      116.38      114.41      7.17
  10   Apr     7.80   7.01      7.10   1.012      109.92      110.59      7.05
  11   May     7.70   6.99      7.15   1.022      107.76      109.60      7.03
  12   Jun     7.30   6.97      7.20   1.032      101.45      100.45      7.27
  13   Jul     7.00   6.95      7.25   1.043       96.55       95.68      7.32
  14   Aug     7.10   6.93      7.30   1.052       97.32       92.25      7.70
  15   Sep     6.90   6.91      7.30   1.057       94.47       90.57      7.62
  16   Oct     7.30   6.89      7.29   1.057      100.17       97.57      7.48
  17   Nov     7.00   6.87      7.28   1.059       96.16       95.96      7.29
  18   Dec     6.70   6.86      7.25   1.058       92.41       92.22      7.27
  19   Jan     7.60   6.84      7.20   1.053      105.62      102.47      7.42
  20   Feb     7.20   6.82      7.11   1.043      101.29       98.21      7.33
  21   Mar     7.90   6.80      7.00   1.029      112.92      114.41      6.90
  22   Apr     7.70   6.78      6.89   1.017      111.73      110.59      6.96
  23   May     7.60   6.76      6.79   1.005      111.90      109.60      6.93
  24   Jun     6.70   6.74      6.71   0.996       99.88      100.45      6.67
  25   Jul     6.30   6.72      6.62   0.985       95.21       95.68      6.58
  26   Aug     5.70   6.70      6.51   0.971       87.58       92.25      6.18
  27   Sep     5.60   6.68      6.43   0.963       87.05       90.57      6.18
  28   Oct     6.10   6.66      6.40   0.960       95.37       97.57      6.25
  29   Nov     5.80   6.64                                     95.96      6.04
  30   Dec     5.90   6.62                                     92.22      6.40
  31   Jan     6.20   6.60                                    102.47      6.05
  32   Feb     6.00   6.58                                     98.21      6.11
  33   Mar     7.30   6.56                                    114.41      6.38
  34   Apr     7.40   6.54                                    110.59      6.69
  Month    Y      Deseasonalized
  11 Nov   0.38   0.40913
  12 Dec   0.38   0.41684
  1 Jan    0.44   0.45224
  2 Feb    0.42   0.42406
  3 Mar    0.44   0.48048
  4 Apr    0.46   0.49272
  5 May    0.48   0.45687
  6 Jun    0.49   0.45687
  7 Jul    0.51   0.4539
  8 Aug    0.52   0.44922
  9 Sep    0.45   0.44242
  10 Oct   0.4    0.43222
  11 Nov   0.39   0.4199
  12 Dec   0.37   0.40587
  1 Jan    0.38   0.39057
  2 Feb    0.37   0.37357
  3 Mar    0.33   0.36036
  4 Apr    0.33   0.35347
  5 May    0.32   0.30458
  6 Jun    0.32   0.29837
  7 Jul    0.32   0.2848
  8 Aug    0.31   0.26781

Slope = -0.00818

Forecasts (t = 23 to 26, year 2006):
  Month    Y
  9 Sep    0.33587
  10 Oct   0.29803
  11 Nov   0.29152
  12 Dec   0.27867
Data
  Year   Month    Y    Deseasonalized
  2005   1 Jan    14   16.8856
  2005   2 Feb    10   22.2728
  2005   3 Mar    50   54.0922
  2005   4 Apr    24   24.6668
  2005   5 May    16   15.3033
  2005   6 Jun    15   15.8805
  2005   7 Jul    20   22.3533
  2005   8 Aug    42   22.5141
  2005   9 Sep    18   21.3884
  2005   10 Oct   26   20.2627
  2005   11 Nov   21   20.6647
  2005   12 Dec   20   21.4286
  2006   1 Jan    18   21.71
  2006   2 Feb    10   22.2728
  2006   3 Mar    22   23.8006
  2006   4 Apr    24   24.6668
  2006   5 May    26   24.8678
  2006   6 Jun    24   25.4087
  2006   7 Jul    18   20.1179
  2006   8 Aug    58   31.0909
  2006   9 Sep    40   47.5297

Slope = -0.00694

Forecasts (t = 22 to 25):
  Year   Month    Y
  2006   10 Oct   28.73718
  2006   11 Nov   22.75217
  2006   12 Dec   20.88982
  2007   1 Jan    18.55136
The forecast for October is considerably less than the actual percents recorded for August and
September. The forecast reflects the historical percentage of negative stories instead of the
recent past history.
12-15. (Using the template: Trend+Season Forecasting.xls)
Forecasting with Trend and Seasonality (quarterly)
  t   Year   Q   Y     Deseasonalized
  1   2005   1   3.4   3.869621
  2   2005   2   4.5   4.150717
  3   2005   3   4     4.258289
  4   2005   4   5     4.554288
  5   2006   1   4.2   4.78012
  6   2006   2   5.4   4.98086
  7   2006   3   4.9   5.216404
  8   2006   4   5.7   5.191888
  9   2007   1   4.6   5.23537

Forecasts
  t    Year   Q   Y
  10   2007   2   6.20676
  11   2007   3   5.56327
  12   2007   4   6.71894

Seasonal Indices
  Q   Index
  1    87.86
  2   108.42
  3    93.93
  4   109.79
      400 (total)
  Actual   Forecast
  27       27.6959
  26       27.4175
  27       26.8505
  28       26.9103
           27.3462
Zhat(1) = Z(1) = 57

w = 0.3:
Zhat(2)  = 0.3(57.00) + 0.7(57.00) = 57.00
Zhat(3)  = 0.3(58.00) + 0.7(57.00) = 57.30
Zhat(4)  = 0.3(60.00) + 0.7(57.30) = 58.11
Zhat(5)  = 0.3(54.00) + 0.7(58.11) = 56.88
Zhat(6)  = 0.3(56.00) + 0.7(56.88) = 56.61
Zhat(7)  = 0.3(53.00) + 0.7(56.61) = 55.53
Zhat(8)  = 0.3(55.00) + 0.7(55.53) = 55.37
Zhat(9)  = 0.3(59.00) + 0.7(55.37) = 56.46
Zhat(10) = 0.3(62.00) + 0.7(56.46) = 58.12
Zhat(11) = 0.3(57.00) + 0.7(58.12) = 57.79
Zhat(12) = 0.3(50.00) + 0.7(57.79) = 55.45
Zhat(13) = 0.3(48.00) + 0.7(55.45) = 53.21
Zhat(14) = 0.3(52.00) + 0.7(53.21) = 52.85
Zhat(15) = 0.3(55.00) + 0.7(52.85) = 53.50
Zhat(16) = 0.3(58.00) + 0.7(53.50) = 54.85
Zhat(17) = 0.3(61.00) + 0.7(54.85) = 56.69

w = 0.8:
Zhat(2)  = 0.8(57.00) + 0.2(57.00) = 57.00
Zhat(3)  = 0.8(58.00) + 0.2(57.00) = 57.80
Zhat(4)  = 0.8(60.00) + 0.2(57.80) = 59.56
Zhat(5)  = 0.8(54.00) + 0.2(59.56) = 55.11
Zhat(6)  = 0.8(56.00) + 0.2(55.11) = 55.82
Zhat(7)  = 0.8(53.00) + 0.2(55.82) = 53.56
Zhat(8)  = 0.8(55.00) + 0.2(53.56) = 54.71
Zhat(9)  = 0.8(59.00) + 0.2(54.71) = 58.14
Zhat(10) = 0.8(62.00) + 0.2(58.14) = 61.23
Zhat(11) = 0.8(57.00) + 0.2(61.23) = 57.85
Zhat(12) = 0.8(50.00) + 0.2(57.85) = 51.57
Zhat(13) = 0.8(48.00) + 0.2(51.57) = 48.71
Zhat(14) = 0.8(52.00) + 0.2(48.71) = 51.34
Zhat(15) = 0.8(55.00) + 0.2(51.34) = 54.27
Zhat(16) = 0.8(58.00) + 0.2(54.27) = 57.25
Zhat(17) = 0.8(61.00) + 0.2(57.25) = 60.25
The w = .8 forecasts follow the raw data much more closely. This makes sense because the raw
data jump back and forth fairly abruptly, so we need a high w for the forecasts to respond to
those oscillations sooner.
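The hand computations above follow the recursion Zhat(t+1) = w·Z(t) + (1 − w)·Zhat(t) with Zhat(1) = Z(1); a small sketch (function name ours):

```python
# One-step-ahead exponential smoothing forecasts.
def exp_smooth(z, w):
    """Return forecasts Zhat(1..n), with Zhat(1) = Z(1)."""
    zhat = [float(z[0])]
    for t in range(1, len(z)):
        zhat.append(w * z[t - 1] + (1 - w) * zhat[t - 1])
    return zhat

z = [57, 58, 60, 54, 56, 53, 55, 59, 62, 57, 50, 48, 52, 55, 58, 61]
print([round(v, 2) for v in exp_smooth(z, 0.3)])
print([round(v, 2) for v in exp_smooth(z, 0.8)])
```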
Exponential Smoothing   w = 0.7
MAE = 4.8241   MAPE = 2.52%   MSE = 34.8155

  t    Zt    Forecast   |Error|    %Error   Error^2
  1    195   195
  2    193   195
  3    190   193.6      3.6        1.89%    12.96
  4    185   191.08     6.08       3.29%    36.9664
  5    180   186.824    6.824      3.79%    46.567
  6    190   182.047    7.9528     4.19%    63.247
  7    185   187.614    2.61416    1.41%    6.83383
  8    186   185.784    0.21575    0.12%    0.04655
  9    184   185.935    1.93527    1.05%    3.74529
  10   185   184.581    0.41942    0.23%    0.17591
  11   198   184.874    13.1258    6.63%    172.287
  12   199   194.062    4.93775    2.48%    24.3814
  13   200   197.519    2.48132    1.24%    6.15697
  14   201   199.256    1.7444     0.87%    3.04292
  15   199   200.477    1.47668    0.74%    2.18059
  16   187   199.443    12.443     6.65%    154.828
  17   186   190.733    4.7329     2.54%    22.4004
  18   191   187.42     3.58013    1.87%    12.8173
  19   195   189.926    5.07404    2.60%    25.7459
  20   200   193.478    6.52221    3.26%    42.5392
  21   200   198.043    1.95666    0.98%    3.82853
  22   190   199.413    9.413      4.95%    88.6046
  23   186   192.824    6.8239     3.67%    46.5656
  24   196   188.047    7.95283    4.06%    63.2475
  25   198   193.614    4.38585    2.22%    19.2357
  26   200   196.684    3.31575    1.66%    10.9942
  27   200   199.005    0.99473    0.50%    0.98948
  28         199.702
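The error measures reported by the template are averages of |e|, |e|/Z, and e² over the periods that have both an actual value and a forecast; a sketch with a few illustrative values (helper name ours):

```python
# MAE, MAPE and MSE for a set of forecasts.
def error_measures(actual, forecast):
    errs = [a - f for a, f in zip(actual, forecast)]
    n = len(errs)
    mae = sum(abs(e) for e in errs) / n
    mape = sum(abs(e) / a for e, a in zip(errs, actual)) / n
    mse = sum(e * e for e in errs) / n
    return mae, mape, mse

# Illustrative inputs (a short excerpt, not the full series above):
actual = [193, 190, 185, 180]
forecast = [195.0, 193.6, 191.08, 186.824]
mae, mape, mse = error_measures(actual, forecast)
```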
w = 0.9
  t   Zt        Forecast
  1   2565942   2565942
  2   2724292   2565942
  3   3235231   2708457
  4   3863508   3182554
  5   4819747   3795413
  6   5371689   4717314
  7   6119114   5306251
  8             6037828
Now note that all the terms on the right side of the equation above are identical to all the terms in Equation (12-11) after the term wZ(t). Hence we can substitute, in Equation (12-11), the left-hand side of our last equation, (1 - w)Zhat(t), for all the terms past the first. This gives us:
  Zhat(t+1) = wZ(t) + (1 - w)Zhat(t)
which can be rewritten as:
  Zhat(t+1) = Zhat(t) + w(Z(t) - Zhat(t))
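The identity between the recursive form and the expanded weighted sum can be checked numerically (made-up data):

```python
# Recursive form Zhat(t+1) = w*Z(t) + (1-w)*Zhat(t) versus the expanded
# weighted sum w*Z(t) + w(1-w)*Z(t-1) + ... + (1-w)^t * Zhat(1).
w = 0.4
z = [18, 17, 15, 14, 15, 11, 8, 5]

# recursive form, starting from Zhat(1) = Z(1)
zhat = z[0]
for obs in z:
    zhat = w * obs + (1 - w) * zhat

# expanded weighted-sum form
expanded = (1 - w) ** len(z) * z[0]
for i, obs in enumerate(z):
    expanded += w * (1 - w) ** (len(z) - 1 - i) * obs

print(abs(zhat - expanded) < 1e-12)  # True
```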
The new CPI equals the old CPI multiplied by 100/289.1; thus:
  old CPI:   72.1   77.8   79.5   80.1   ...
  new CPI:   24.9   26.9   27.5   27.7   ...
a) Price Index, Base Year 1988 (base price = 100):
  Year   Price   Index
  1984   175     175
  1985   190     190
  1986   132     132
  1987    96      96
  1988   100     100
  1989    78      78
  1990   131     131
  1991   135     135
  1992   154     154
  1993   163     163
  1994   178     178
  1995   170     170
  1996   145     145
  1997   133     133
c) Price Index, Base Year 1993 (base price = 163):
  Year   Price   Index
  1984   175     107.36
  1985   190     116.56
  1986   132      80.982
  1987    96      58.896
  1988   100      61.35
  1989    78      47.853
  1990   131      80.368
  1991   135      82.822
  1992   154      94.479
  1993   163     100
  1994   178     109.2
  1995   170     104.29
  1996   145      88.957
  1997   133      81.595
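Rebasing an index is a single division; a sketch that reproduces the 1993-base table above:

```python
# Change the base year of a price index: divide every price by the price in
# the new base year (163 for 1993) and multiply by 100.
prices = {1984: 175, 1985: 190, 1986: 132, 1987: 96, 1988: 100, 1989: 78,
          1990: 131, 1991: 135, 1992: 154, 1993: 163, 1994: 178, 1995: 170,
          1996: 145, 1997: 133}
base = prices[1993]
index = {year: round(p / base * 100, 2) for year, p in prices.items()}
print(index[1984], index[1993], index[1994])  # 107.36 100.0 109.2
```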
Jan.2004value
1.44 :
100
12-29. Since a yearly cycle has 12 months and there are only 18 data points, a seasonal/cyclical decomposition isn't feasible. Simple linear regression, with the successive months numbered 1, 2, ..., gives SALES = 4.23987 - 0.03870(MONTH); thus for July 1995 (month #19), the forecast is 3.5046.
(Using the template: Trend Forecast.xls)
(Using the template: Trend Forecast.xls)
  t    Month   Zt
  1    jan     4.4
  2    feb     4.2
  3    mar     3.8
  4    apr     4.1
  5    may     4.1
  6    jun     4
  7    jul     4
  8    aug     3.9
  9    sep     3.9
  10   oct     3.8
  11   nov     3.7
  12   dec     3.7
  13   jan     3.8
  14   feb     3.9
  15   mar     3.8
  16   apr     3.7
  17   may     3.5
  18   jun     3.4

Forecast
  t    Z-hat
  19   3.50458
  20   3.46588
  21   3.42718

Regression Statistics: r^2 = 0.7285, MSE = 0.016906, Slope = -0.0387, Intercept = 4.239869
12-30. Trend analysis is a quick, if sometimes inaccurate, method that can give good results. The additive and multiplicative TSCI models are sometimes useful, although they lack a firm theoretical framework. Exponential smoothing methods are good models; the ones described in this book do not handle seasonality, but extensions are possible. This author believes that Box-Jenkins ARIMA models are the way to go. One limitation of these models is the need for large data sets.
12-31. Exponential smoothing models smooth out sharp variations in the data and produce forecasts that follow a type of average movement in the data. The greater the weighting factor w, the more closely the exponential smoothing series follows the data, and the more closely the forecasts tend to follow the variations in the data.
Accuracy Measures: MAPE = 1.69534, MAD = 1.75000, MSD = 3.66964
Forecasts: Period 13, Forecast = 103.375, Lower = 99.6204, Upper = 107.130
w = 0.4
  t    Zt   Forecast
  1    18   18
  2    17   18
  3    15   17.6
  4    14   16.56
  5    15   15.536
  6    11   15.3216
  7     8   13.593
  8     5   11.3558
  9     4   8.81347
  10    3   6.88808
  11    5   5.33285
  12    4   5.19971
  13    6   4.71983
  14    5   5.2319
  15    7   5.13914
  16    8   5.88348
  17        6.73009

y(2007) = 6.73009
12-34. a) Raised the seasonal index for April to 99.38 from 99.29. We would expect to see the April index change by a significant amount; the reason it did not is the moving-average calculations involved.
b) Raised the seasonal index for April to 122.27 from 99.29.
c) Raised the seasonal index for December to 100.16 from 100.09. We would expect the December index to change by a significant amount; it did not, due to the moving-average calculations.
d) Very high or low values for data points at the beginning or end of a series have little impact on the seasonal index due to their limited influence in the moving-average computations.
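The muting effect comes from the centered 12-month moving average: each observation enters 13 overlapping windows, each time divided by 12 (with half weight at the ends). A sketch of that computation (function name ours):

```python
# Centered moving average for an even period (default 12 months):
# the average of two adjacent 12-term means, i.e. weights
# 0.5, 1, ..., 1, 0.5 over 13 consecutive terms, divided by 12.
def centered_ma(z, period=12):
    n = len(z)
    half = period // 2
    out = []
    for t in range(half, n - half):
        window = 0.5 * z[t - half] + sum(z[t - half + 1: t + half]) + 0.5 * z[t + half]
        out.append(window / period)
    return out
```

On a constant series the CMA reproduces the constant; on a linear trend it reproduces the central value, which is why it isolates the trend-cycle before the seasonal ratios are formed.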
12-35. (Using the template: Trend Forecast.xls)
Forecasting with Trend
  Period   t   Zt
  1998     1   6.3
  1999     2   6.6
  2000     3   7.3
  2001     4   7.4
  2002     5   7.8
  2003     6   6.9
  2004     7   7.8

Forecast
  t    Z-hat
  8    7.95714
  9    8.15714
  10   8.35714

Regression Statistics: r^2 = 0.5552, MSE = 0.179429, Slope = 0.2, Intercept = 6.357143

  Q   Y
  1   $85,455,550.30
  2   $108,706,616.14
  3   $97,706,824.92
  4   $105,724,455.54
Using Excel's regression tool with the Centered Moving Average (col. G of the template) as our Y and the values under t (col. B of the template) as our X, we get the following supporting detail for the Trend+Seasonal model:

Regression Statistics
  Multiple R          0.89727
  R Square            0.805093
  Adjusted R Square   0.785602
  Standard Error      1.558112
  Observations        12

              Coefficients   Standard Error   t Stat     P-value
  Intercept   152.2638       1.195366         127.3785   2.18E-17
  time        -0.83741       0.130296         -6.42701   7.57E-05

(Note: the coefficient values are identical to those generated by the template.)

ANOVA
  Source       df   SS         MS         F          Significance F
  Regression    1   100.2802   100.2802   41.30642   7.57E-05
  Residual     10   24.27713   2.427713
  Total        11   124.5573
Multiple Regression Results
  Variable                  b              s(b)           t              p-value
  M2 Index                  -8445234.547   101021547.4    -0.083598349   0.9348
  Non Farm Activity Index   82447357.24    38350031.1      2.149864156   0.0527
  Oil Price                 -3768891       1263314.066    -2.983336528   0.0114

ANOVA Table
  Source   df   MS            F           FCritical   p-value
  Regn.     3   1.25831E+15   18.243631   3.4902996   0.0001
  Error    12   6.89725E+13
  Total    15   3.0684E+14    (Total SS = 4.6026E+15)

  R^2 = 0.8202    s = 8304970.102    Adjusted R^2 = 0.77521656

4) Forecast
  Quarter   Forecast
  2002/Q1   $81,337,085.11
  2002/Q2   $55,574,874.53
  2002/Q3   $60,903,732.58
  2002/Q4   $59,868,829.41
  X1 Sales   X2 Ones   X3 M2 Index   X4 Non Farm Activity Index   X5 Oil Price   Q2   Q3   Q4
  35452300   1         2.356464      34.2                         19.15          0    0    0
  41469361   1         2.357643      34.27                        16.46          1    0    0
  40981634   1         2.364126      34.3                         18.83          0    1    0
  42777164   1         2.379493      34.33                        19.75          0    0    1
  43491652   1         2.373544      34.4                         18.53          0    0    0
  57669446   1         2.387192      34.33                        17.61          1    0    0
  59476149   1         2.403903      34.37                        17.95          0    1    0
  76908559   1         2.42073       34.43                        15.84          0    0    1
  63103070   1         2.431623      34.37                        14.28          0    0    0
  84457560   1         2.441958      34.5                         13.02          1    0    0
  67990330   1         2.447452      34.5                         15.89          0    1    0
  68542620   1         2.445616      34.53                        16.91          0    0    1
  73457391   1         2.45601       34.6                         16.29          0    0    0
  89124339   1         2.48364       34.7                         17             1    0    0
  85891854   1         2.532692      34.67                        18.2           0    1    0
  69574971   1         2.564984      34.73                        17             0    0    1
Multiple Regression Results
  Variable                  b              s(b)           t              p-value   VIF
  Intercept                 -2655354679    1219227600     -2.177899088   0.0574
  M2 Index                  -12780153.29   118142020      -0.108176187   0.9162    9.9367
  Non Farm Activity Index   81566233.8     43101535.65     1.892420596   0.0910    9.0506
  Oil Price                 -3827527.175   1534592.501    -2.494165177   0.0342    1.4058
  Q2                        5802059        6575281.3       0.8824047     0.4005    1.6411
  Q3                        7127252.8      6653402.9       1.0712192     0.3120    1.6803
  Q4                        3211850.1      6716387.2       0.478211      0.6439    1.7123

ANOVA Table
  Source   SS            df   MS            F           FCritical   p-value
  Regn.    3.89129E+15    6   6.48548E+14   8.2058616   3.3737564   0.0031
  Error    7.11312E+14    9   7.90347E+13
  Total    4.6026E+15    15   3.0684E+14

  R^2 = 0.8455    s = 8890145.586    Adjusted R^2 = 0.742423692

Regression Equation:
Sales = -2655354679 - 12780153.29 M2 + 81566233.8 NFAI - 3827527.175 Oil P
        + 5802059 Q2 + 7127252.8 Q3 + 3211850.1 Q4
5. Forecast for next four quarters:
  Quarter   Sales
  02 Q1     76344324
  02 Q2     56495768
  02 Q3     62878143
  02 Q4     57771809
6. Partial F-test:
H0: 4 = 5 = 6 = 0
H1: not all are zero
(Remember, to drop the three indicator variables, they must be the last three independent
variables in the data sheet of the template.)
Partial F Calculations
  #Independent variables in full model:          6
  #Independent variables dropped from the model: 3
  SSE_F = 7.11E+14
  SSE_R = 8.28E+14
  Partial F = 0.490747    p-value = 0.6973

p-value = 0.6973, very high. Do not reject the null hypothesis: the indicator variables are not significant.
7. Comparing the three model forecasts:
It would be ideal to have the values for 2004 to compare the forecasts to the actual values.
However, these values are not available. The next step is to compare the three models relative to
R2, F and standard error of the models.
  Model              R^2     F        Std. error
  Trend + Seasonal   0.805   41.306   1.558
  MR (part 2)        0.820   18.244   8,304,970.1
  MR (part 4)        0.846    8.206   8,890,145.6
8,890,145.6
Clearly, the best model is the Trend + Seasonal Model with the smallest Std. Error and highest Fvalue. The only significant independent variable in the multiple regression models is oil prices,
and a regression of sales on oil prices only yields an R2 of 0.33 and a very high std. error.