
ANSWERS TO PROBLEMS AND CASES

1.  Descriptive Statistics

    Variable    N    Mean  Median  StDev  SE Mean
    Orders     28   21.32   17.00  13.37     2.53

    Variable    Min    Max     Q1     Q3
    Orders     5.00  54.00  11.25  28.75

    a.  X̄ = 21.32

    b.  S = 13.37

    c.  S² = 178.76

    d.  If the policy is successful, smaller orders will be eliminated and the mean will increase.

    e.  If the change causes all customers to consolidate a number of small orders into large orders, the standard deviation will probably decrease. Otherwise, it is very difficult to tell how the standard deviation will be affected.

    f.

2.  Descriptive Statistics

    Variable    N    Mean  Median  StDev  SE Mean
    Prices     12  176654  180000  39440    11385

    Variable     Min     Max      Q1      Q3
    Prices    121450  253000  138325  205625

    X̄ = 176,654 and S = 39,440

3.  a.  X̄ = 10.76, S = 13.71

    b.  X̄ ± 1.96 S/√30 = 10.76 ± 4.91 → (5.85%, 15.67%)

    c.  X̄ ± 2.045 S/√30 = 10.76 ± 5.12 → (5.64%, 15.88%)

    d.  We see that the 95% confidence intervals in b and c are not much different because the multipliers 1.96 and 2.045 are nearly the same magnitude. This explains why a sample of size n = 30 is often taken as the cutoff between large and small samples.

4.  a.  Point estimate: X̄ = (23.41 + 102.59)/2 = 63

    b.  1 − α = .90 → Z = 1.645. With X̄ = 63 and S/√n = 20.2:

        X̄ ± 1.645(S/√n) = 63 ± 1.645(20.2) = 63 ± 33.23 → (29.77, 96.23)

5.  H0: μ = 12.1
    H1: μ > 12.1

    α = .05, X̄ = 13.5, n = 100, S = 1.7

    Z = (13.5 − 12.1)/(1.7/√100) = 8.235

    Reject H0 since the computed Z (8.235) is greater than the critical Z (1.645). The mean has increased.

6.  Interval estimate: 8.1 ± 1.96 (5.7/√49) = 8.1 ± 1.6

    Forecast 8.1 empty seats per flight; very likely the mean number of empty seats will lie between 6.5 and 9.7.

7.  H0: μ = 5.9
    H1: μ ≠ 5.9

    Two-sided test, α = .05, critical value: |Z| = 1.96

    Test statistic: Z = (X̄ − 5.9)/(S/√n) = (5.60 − 5.9)/(.87/√60) = −2.67

    Since |−2.67| = 2.67 > 1.96, reject H0 at the 5% level. The mean satisfaction rating is different from 5.9.

    p-value: P(Z < −2.67 or Z > 2.67) = 2 P(Z > 2.67) = 2(.0038) = .0076, very strong evidence against H0.
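For readers who want to verify these hand calculations numerically, the two-sided z test is easy to script. Here is a minimal Python sketch (standard library only; the function name and rounding are ours, not part of the text):

    import math
    from statistics import NormalDist

    def two_sided_z_test(xbar, mu0, s, n, alpha=0.05):
        # Z statistic for H0: mu = mu0 against H1: mu != mu0
        z = (xbar - mu0) / (s / math.sqrt(n))
        critical = NormalDist().inv_cdf(1 - alpha / 2)   # 1.96 when alpha = .05
        p_value = 2 * (1 - NormalDist().cdf(abs(z)))     # two-tailed p-value
        return z, critical, p_value

    z, crit, p = two_sided_z_test(xbar=5.60, mu0=5.9, s=0.87, n=60)
    print(round(z, 2), round(crit, 2), round(p, 4))      # -2.67 1.96 0.0076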

8.  H0: μ = 4
    H1: μ > 4

    df = n − 1 = 14 − 1 = 13, X̄ = 4.31, S = .52

    Test statistic: t = (X̄ − 4)/(S/√n) = (4.31 − 4)/(.52/√14) = 2.23

    Since 2.23 > 1.771, reject H0 at the 5% level. The medium-size serving contains an average of more than 4 ounces of yogurt.

    p-value: P(t > 2.23) = .022, strong evidence against H0.

9.  H0: μ = 700
    H1: μ ≠ 700

    n = 50, S = 50, α = .05, X̄ = 715

    Z = (715 − 700)/(50/√50) = 2.12

    Since the calculated Z is greater than the critical Z (2.12 > 1.96), reject the null hypothesis. The forecast does not appear to be reasonable.

    p-value: P(Z < −2.12 or Z > 2.12) = 2 P(Z > 2.12) = 2(.017) = .034, strong evidence against H0.

10. This problem can be used to illustrate how a random sample is selected with Minitab. In order to generate 30 random numbers from a population of 200, click the following menus:

    Calc>Random Data>Integer

    The Integer Distribution dialog box shown in the figure below appears. The number of random digits desired, 30, is entered in the Number of rows of data to generate space. C1 is entered for Store in column(s), and 1 and 200 are entered as the Minimum and Maximum values. OK is clicked, and the 30 random numbers appear in Column 1 of the worksheet.

    The null hypothesis that the mean is still 2.9 is true, since the actual mean of the population of data is 2.91 with a standard deviation of 1.608; however, a few students may reject the null hypothesis, committing a Type I error.
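The same sampling step can be reproduced outside Minitab; here is a short Python equivalent (the seed is an arbitrary choice made only so the draw is reproducible):

    import random

    random.seed(1)  # arbitrary; fixed only for reproducibility

    # Draw 30 random item numbers from a population numbered 1 to 200,
    # mirroring the Calc>Random Data>Integer dialog (sampling with replacement).
    sample_ids = [random.randint(1, 200) for _ in range(30)]
    print(sorted(sample_ids))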

11. a.
    b.
    c.

    ΣY = 6058, ΣX² = 513, ΣY² = 4,799,724, ΣXY = 48,665, ΣX = 59

    r = .938

12. a.
    b.
    c.

    ΣY = 2312, ΣY² = 515,878, ΣX² = 282.55, ΣXY = 12,029.3, ΣX = 53.7

    r = .95

    Ŷ = 32.5 + 36.4X

    Ŷ = 32.5 + 36.4(5.2) = 222

13.

This is a good population for showing how random samples are taken. If three-digit

random numbers are generated from Minitab as demonstrated in Problem 10, the selected

items for the sample can be easily found. In this population, = 0.06 so most

students will get a sample correlation coefficient r close to 0. The least squares line will, in

most cases, have a slope coefficient close to 0, and students will not be able to reject the

null hypothesis H0: 1 = 0 (or, equivalently, = 0) if they carry out the hypothesis test.

14. a.
    b.
    c.  foot of space.
    d.

15. Point estimate: X̄ = 45.2

    98% confidence interval: 1 − α = .98 → Z = 2.33

    X̄ ± 2.33 S/√n = 45.2 ± 2.33(10.3/√175) → (43.4, 47.0)

    Hypothesis test:

    H0: μ = 44
    H1: μ ≠ 44

    Test statistic: Z = (X̄ − 44)/(S/√n) = (45.2 − 44)/(10.3/√175) = 1.54

    Since |Z| = 1.54 < 2.33, H0 is not rejected. As expected, the results of the hypothesis test are consistent with the confidence interval for μ; μ = 44 is not ruled out by either procedure.

16. a.  H0: μ = 63,700
        H1: μ > 63,700

    b.  H0: μ = 4.3
        H1: μ ≠ 4.3

    c.  H0: μ = 1300
        H1: μ < 1300

17. X̄ ± 1.96 S/√n = −1.10 ± 1.96(5.99/√39) = −1.10 ± 1.88 → (−2.98, .78)

    μ = .94 (%) is not a realistic value for the mean monthly return of the client's account, since it falls outside the 95% confidence interval. The client may have a case.

18. a.

    b.  Other variables affecting wages may be size of bank and previous experience.

    c.  WAGES = 324.3 + 1.006 (80) = 405

CASE 2-1: ALCAM ELECTRONICS

In our consulting work, business people sometimes tell us that business schools teach a risk-taking attitude that is too conservative. This is often reflected, we are told, in students choosing too low a significance level: such a choice requires extreme evidence to move one from the status quo. This case can be used to generate a discussion on this point, as David chooses α = .01 and ends up "accepting" the null hypothesis that the mean lifetime is 5000 hours.

Alice's point is valid: the company may be put in a bad position if it insists on very dramatic evidence before abandoning the notion that its components last 5000 hours. In fact, the indifference level (p-value) is about .0375; at any higher significance level the null hypothesis of 5000 hours is rejected.

CASE 2-2: MR. TUX

In this case, John Mosby tries some primitive ways of forecasting his monthly sales. The things he tries make some sort of sense, at least for a first cut, given that he has had no formal training in forecasting methods. Students should have no trouble finding flaws in his efforts, such as:

1.  The mean value for each year, if projected into the future, is of little value since month-to-month variability is missing.

2.  His free-hand method of fitting a regression line through his data can be improved upon using the least squares method, a technique now found on inexpensive hand calculators. The large standard deviation for his monthly data suggests considerable month-to-month variability and, perhaps, a strong seasonal effect, a factor not accounted for when the values for a year are averaged.

Both the hand-fit regression line and John's interest in dealing with the monthly seasonal factor suggest techniques to be studied in later chapters. His efforts also point out the value of learning about well-established formal forecasting methods rather than relying on intuition and very simple methods in the absence of knowledge about forecasting. We hope students will begin to appreciate the value of formal forecasting methods after learning about John's initial efforts.

CASE 2-3: ALOMEGA FOOD STORES

Julie's initial look at her data using regression analysis is a good start. She found that the r-squared value of 36% is not very high. Using more predictor variables, along with examining their significance in the equation, seems like a good next step. The case suggests that other techniques may prove even more valuable, techniques to be discussed in the chapters that follow.

Examining the residuals of her equation might prove useful. About how large are these errors? Are forecast errors in this range acceptable to her? Do the residuals seem to remain in the same range over time, or do they increase over time? Is a string of negative residuals followed by a string of positive residuals, or vice versa? These questions involve a deeper understanding of forecasting using historical values, and these matters will be discussed more fully in later chapters.

CHAPTER 3

EXPLORING DATA PATTERNS AND
CHOOSING A FORECASTING TECHNIQUE

ANSWERS TO PROBLEMS AND CASES

1.  Qualitative forecasting techniques rely on judgment and experience; quantitative forecasting techniques rely more on manipulation of historical data.

2.  A time series consists of data that are collected, recorded, or observed over successive increments of time.

3.  The secular trend of a time series is the long-term component that represents the growth or decline in the series over an extended period of time. The cyclical component is the wavelike fluctuation around the trend. The seasonal component is a pattern of change that repeats itself year after year. The irregular component is that part of the time series remaining after the other components have been removed.

4.  Autocorrelation is the correlation between a variable, lagged one or more periods, and itself.

5.  The autocorrelation coefficient measures the correlation between a variable, lagged one or more periods, and itself.

6.  The correlogram is a useful graphical tool for displaying the autocorrelations for various lags of a time series. Typically, the time lags are shown on a horizontal scale and the autocorrelation coefficients, the correlations between Yt and Yt-k, are displayed as vertical bars at the appropriate time lags. The lengths and directions (from 0) of the bars indicate the magnitude and sign of the autocorrelation coefficients. The lags at which significant autocorrelations occur provide information about the nature of the time series.

7.  a.  nonstationary series
    b.  stationary series
    c.  nonstationary series
    d.  stationary series

8.  a.  stationary series
    b.  random series
    c.  trending or nonstationary series
    d.  seasonal series
    e.  stationary series
    f.  trending or nonstationary series

9.  Naive methods, simple averaging methods, moving averages, and Box-Jenkins methods. Examples are: the number of breakdowns per week on an assembly line having a uniform production rate; the unit sales of a product or service in the maturation stage of its life cycle; and the number of sales resulting from a constant level of effort.

10. Holt's exponential smoothing, simple regression, growth curves, and Box-Jenkins methods. Examples are: sales revenues of consumer goods, demand for energy consumption, and use of raw materials. Other examples include: salaries, production costs, and prices during the growth period of the life cycle of a new product.

11. Classical decomposition, Census II, Winters' exponential smoothing, time series multiple regression, and Box-Jenkins methods. Examples are: electrical consumption, summer/winter activities (sports like skiing), clothing, agricultural growing seasons, and retail sales influenced by holidays, three-day weekends, and school calendars.

12. and Box-Jenkins methods. Examples are: fashions, music, and food.

13. Year    Yt      Yt − Yt−1
    1985    2,413
    1986    2,407     -6
    1987    2,403     -4
    1988    2,396     -7
    1989    2,403      7
    1990    2,443     40
    1991    2,371    -72
    1992    2,362     -9
    1993    2,334    -28
    1994    2,362     28
    1995    2,336    -26
    1996    2,344      8
    1997    2,384     40
    1998    2,244   -140
    1999    2,358    114
    2000    2,329    -29
    2001    2,345     16
    2002    2,254    -91
    2003    2,245     -9
    2004    2,279     34

14.

0 1.96 ( 1

15.

a.

b.

c.

MPE

MAPE

MSE or RMSE

16.

17. a.  r1 = .895

        H0: ρ1 = 0
        H1: ρ1 ≠ 0

        Reject H0 if t < -2.069 or t > 2.069.

        SE(rk) = √[(1 + 2 Σ ri², i = 1 to k−1)/n], so for k = 1 the sum is empty and

        SE(r1) = √(1/24) = .204

        t = (r1 − 0)/SE(r1) = (.895 − 0)/.204 = 4.39

        Since the computed t (4.39) is greater than the critical t (2.069), reject the null hypothesis.

        r2 = .788

        H0: ρ2 = 0
        H1: ρ2 ≠ 0

        Reject H0 if t < -2.069 or t > 2.069.

        SE(r2) = √[(1 + 2(.895)²)/24] = √(2.6/24) = .33

        t = (r2 − 0)/SE(r2) = (.788 − 0)/.33 = 2.39

        Since the computed t (2.39) is greater than the critical t (2.069), reject the null hypothesis.

    b.
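For readers working outside Minitab, the rk, SE(rk), and t calculations in part a can be scripted directly. A minimal Python sketch (function names are ours; supply the n = 24 observations as the list y):

    import math

    def autocorr(y, k):
        # Lag-k sample autocorrelation r_k
        ybar = sum(y) / len(y)
        num = sum((y[t] - ybar) * (y[t - k] - ybar) for t in range(k, len(y)))
        den = sum((v - ybar) ** 2 for v in y)
        return num / den

    def t_stat(y, k):
        # t = r_k / SE(r_k), with SE(r_k) = sqrt((1 + 2*sum_{i<k} r_i^2) / n)
        n = len(y)
        se = math.sqrt((1 + 2 * sum(autocorr(y, i) ** 2 for i in range(1, k))) / n)
        return autocorr(y, k) / se

    # y = [...]  # the 24 observations from the problem
    # print(t_stat(y, 1), t_stat(y, 2))  # compare with 4.39 and 2.39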

18. a.  r1 = .376
    b.

19. Figure 3-19: The data are random.
    Figure 3-20: The data are seasonal (monthly data).
    Figure 3-21: The data are stationary and have a pattern that could be modeled.

20. The data have a quarterly seasonal pattern, as shown by the significant autocorrelation at time lag 4. First quarter earnings tend to be high; third quarter earnings tend to be low.

    a.  t    Yt    Ŷt    et     |et|   et²     |et|/Yt  et/Yt
        1    .40   —     —      —      —       —        —
        2    .29   .40   -.11   .11    .0121   .3793    -.3793
        3    .24   .29   -.05   .05    .0025   .2083    -.2083
        4    .32   .24    .08   .08    .0064   .2500     .2500
        5    .47   .32    .15   .15    .0225   .3191     .3191
        6    .34   .47   -.13   .13    .0169   .3824    -.3824
        7    .30   .34   -.04   .04    .0016   .1333    -.1333
        8    .39   .30    .09   .09    .0081   .2308     .2308
        9    .63   .39    .24   .24    .0576   .3810     .3810
        10   .43   .63   -.20   .20    .0400   .4651    -.4651
        11   .38   .43   -.05   .05    .0025   .1316    -.1316
        12   .49   .38    .11   .11    .0121   .2245     .2245
        13   .76   .49    .27   .27    .0729   .3553     .3553
        14   .51   .76   -.25   .25    .0625   .4902    -.4902
        15   .42   .51   -.09   .09    .0081   .2143    -.2143
        16   .61   .42    .19   .19    .0361   .3115     .3115
        17   .86   .61    .25   .25    .0625   .2907     .2907
        18   .51   .86   -.35   .35    .1225   .6863    -.6863
        19   .47   .51   -.04   .04    .0016   .0851    -.0851
        20   .63   .47    .16   .16    .0256   .2540     .2540
        21   .94   .63    .31   .31    .0961   .3298     .3298
        22   .56   .94   -.38   .38    .1444   .6786    -.6786
        23   .50   .56   -.06   .06    .0036   .1200    -.1200
        24   .65   .50    .15   .15    .0225   .2308     .2308
        25   .95   .65    .30   .30    .0900   .3158     .3158
        26   .42   .95   -.53   .53    .2809   1.2619   -1.2619
        27   .57   .42    .15   .15    .0225   .2632     .2632
        28   .60   .57    .03   .03    .0009   .0500     .0500
        29   .93   .60    .33   .33    .1089   .3548     .3548
        30   .38   .93   -.55   .55    .3025   1.4474   -1.4474
        31   .37   .38   -.01   .01    .0001   .0270    -.0270
        32   .57   .37    .20   .20    .0400   .3509     .3509

    b.  MAD = 5.85/31 = .189

    c.  MSE = 1.6865/31 = .0544, RMSE = √.0544 = .2332

    d.  MAPE = 11.2227/31 = .3620 or 36.2%

    e.  MPE = -2.1988/31 = -.0709
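These error measures are easy to script; the Python sketch below (the function name is ours) reproduces MAD, MSE, RMSE, MAPE, and MPE for the naive forecasts Ŷt = Yt-1:

    import math

    def naive_error_measures(y):
        # One-step naive forecast errors e_t = y_t - y_{t-1}
        pairs = [(actual, actual - prev) for prev, actual in zip(y, y[1:])]
        n = len(pairs)
        mad  = sum(abs(e) for _, e in pairs) / n
        mse  = sum(e * e for _, e in pairs) / n
        mape = sum(abs(e) / actual for actual, e in pairs) / n
        mpe  = sum(e / actual for actual, e in pairs) / n
        return mad, mse, math.sqrt(mse), mape, mpe

    y = [.40, .29, .24, .32, .47, .34, .30, .39, .63, .43, .38, .49,
         .76, .51, .42, .61, .86, .51, .47, .63, .94, .56, .50, .65,
         .95, .42, .57, .60, .93, .38, .37, .57]
    print(naive_error_measures(y))
    # MAD = .189, MSE = .0544, RMSE = .2332, MAPE = .362, MPE = -.071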

21. a.

    b.  The sales time series appears to vary about a fixed level, so it is stationary.

    c.  The sample autocorrelations die out rapidly. This behavior is consistent with a stationary series. Note that the sales data are not random. Sales in adjacent weeks tend to be positively correlated.

22. a.

    b.  Since, in this case, the residuals differ from the original observations by the constant Ȳ = 2460.05, the residual autocorrelations will be the same as the autocorrelations for the sales numbers. There is significant residual autocorrelation at lag 1, and the autocorrelations die out in an exponential fashion. The random model is not adequate for these data.

23. a.

    b.  The earnings series has a quarterly seasonal pattern, since 2nd and 3rd quarter earnings tend to be relatively large and 1st and 4th quarter earnings tend to be relatively small.

    c.  The autocorrelation function for the first 10 lags follows. The autocorrelations are consistent with the choice in part b. The autocorrelations fail to die out rapidly, consistent with nonstationary behavior. In addition, there are relatively large autocorrelations at lags 4 and 8, indicating a quarterly seasonal pattern.

24. The series appears to vary about a fixed level.

25. a.  98/99Inc  98/99For  98/99Err  98/99AbsErr  98/99Err^2  98/99AbE/Inc
        70.01     50.87     19.14     19.14        366.34      0.273390
        133.39    93.83     39.56     39.56        1564.99     0.296574
        129.64    92.51     37.13     37.13        1378.64     0.286409
        100.38    80.55     19.83     19.83        393.23      0.197549
        95.85     70.01     25.84     25.84        667.71      0.269588
        157.76    133.39    24.37     24.37        593.90      0.154475
        126.98    129.64    -2.66     2.66         7.08        0.020948
        93.80     100.38    -6.58     6.58         43.30       0.070149
        Sum                           175.11       5015.17     1.5691

    b.  MAPE = 1.5691/8 = .196 or 19.6%

    c.  The autocorrelation function for the fourth differences suggests they are not random. The error measures suggest the naive method is not very accurate. In particular, on average, there is about a 20% error. However, the naive method does pretty well for 1999. It is hard to think of another naive method that will do better.

CASE 3-1: MURPHY BROTHERS FURNITURE

1.  The retail sales series has a trend and a monthly seasonal pattern.

2.  Yes! Julie has determined that her data have a trend and should be first differenced. She has also found out that the first differenced data are seasonal.

3.  Winters' exponential smoothing, time series multiple regression, and Box-Jenkins methods.

4.  She will know which technique works best by comparing error measurements such as MAD, MSE or RMSE, MAPE, and MPE.

1.  The retail sales series has a trend and a monthly seasonal pattern.

2.  The patterns appear to be somewhat similar. More actual data are needed in order to reach a definitive conclusion.

3.  This question should create a lively discussion. There are good reasons to use either set of data. The retail sales series should probably be used until more actual sales data are available.

CASE 3-2: MR. TUX

1.  This case affords students an opportunity to learn about the use of autocorrelation functions, and to continue following John Mosby's quest to find a good forecasting method for his data. With the use of Minitab, the concept of first differencing data is also illustrated. The summary should conclude that the sales data have both a trend and a seasonal component.

2.  The trend is upward. Since there are significant autocorrelation coefficients at time lags 12 and 24, the data have a monthly seasonal pattern.

3.  There is a 49% random component. That is, about half the variability in John's monthly sales is not accounted for by trend and seasonal factors. John, and the students analyzing these results, should realize that finding an accurate method of forecasting these data could be very difficult.

4.  Yes, the first differences have a seasonal component. Given the autocorrelations at lags 12 and 24, the monthly changes are related 12, 24, ... months apart. This information should be used in developing a forecasting model for changes in monthly sales.

CASE 3-3: CONSUMER CREDIT COUNSELING

1.  First, Dorothy used Minitab to compute the autocorrelation function for the number of new clients.

    [Figure: Autocorrelation Function for Clients — Minitab correlogram for lags 1-24 with correlations, t-values, and LBQ statistics. The early autocorrelations are large (r1 = 0.49) and die out slowly.]

    Since the autocorrelations failed to die out rapidly, Dorothy concluded her series was trending or nonstationary. She then decided to difference her time series.

    [Figure: Autocorrelations for Differenced Data — Minitab correlogram for lags 1-24; r1 = -0.42 (t = -4.11), r12 = 0.20, with most other autocorrelations small.]

2.  The differences appear to be stationary and are correlated in consecutive time periods. Given the somewhat large autocorrelations at lags 12 and 24, a monthly seasonal pattern should be considered.

3.  Dorothy would recommend that various seasonal techniques such as Winters' method of exponential smoothing (Chapter 4), classical decomposition (Chapter 5), time series multiple regression (Chapter 8), and Box-Jenkins methods (ARIMA models, Chapter 9) be considered.

CASE 3-4: ALOMEGA FOOD STORES

The sales data from Chapter 1 for the Alomega Food Stores case are reprinted in Case 3-4. The case suggests that Julie look at the data pattern for her sales data. The autocorrelation function for sales follows.

The autocorrelations suggest an up-and-down pattern that is very regular. If one month is relatively high, the next month tends to be relatively low, and so forth. A very regular pattern is suggested by the persistence of autocorrelations at relatively large lags. The changing of the sign of the autocorrelations from one lag to the next is consistent with an up-and-down pattern in the time series. If high sales tend to be followed by low sales or low sales by high sales, autocorrelations at odd lags will be negative and autocorrelations at even lags positive.

The relatively large autocorrelation at lag 12, 0.53, suggests there may also be a seasonal pattern. This issue is explored in Case 5-6.

CASE 3-5: SURTIDO COOKIES

1.  A time series plot and the autocorrelation function for Surtido Cookies sales follow. The graphical evidence suggests Surtido Cookies sales vary about a fixed level with a strong monthly seasonal component. Sales are typically high near the end of the year and low during the beginning of the year.

2.  03Sales  NaiveFor  Err      AbsErr   AbsE/03Sales
    1072617  681117    391500   391500   0.364995
    510005   549689    -39684   39684    0.077811
    579541   497059    82482    82482    0.142323
    771350   652449    118901   118901   0.154147
    590556   636358    -45802   45802    0.077557
    Sum                         678369   0.816833

    MAPE = .816833/5 = .163 or 16.3%

    MAD = 678369/5 = 135674. MAD appears large because of the big numbers for sales. MAPE is fairly large but perhaps tolerable. In any event, Jame is convinced he can do better.

CHAPTER 4

MOVING AVERAGES AND SMOOTHING METHODS

ANSWERS TO PROBLEMS AND CASES

1.  Exponential smoothing

2.  Naive

3.  Moving average

4.

5.

6.  a.  t    Yt      Ŷt      et     |et|   et²      |et|/Yt  et/Yt
        1    19.39   19.00   .39    .39    .1521    .020     .020
        2    18.96   19.39   -.43   .43    .1849    .023     -.023
        3    18.20   18.96   -.76   .76    .5776    .042     -.042
        4    17.89   18.20   -.31   .31    .0961    .017     -.017
        5    18.43   17.89   .54    .54    .2916    .029     .029
        6    19.98   18.43   1.55   1.55   2.4025   .078     .078
        7    19.51   19.98   -.47   .47    .2209    .024     -.024
        8    20.63   19.51   1.12   1.12   1.2544   .054     .054
        9    19.78   20.63   -.85   .85    .7225    .043     -.043
        10   21.25   19.78   1.47   1.47   2.1609   .069     .069
        11   21.18   21.25   -.07   .07    .0049    .003     -.003
        12   22.14   21.18   .96    .96    .9216    .043     .043
        Sums                        8.92   8.99     .445     .141

    b.  MAD = 8.92/12 = .74

    c.  MSE = 8.99/12 = .75

    d.  MAPE = .445/12 = .0371

    e.  MPE = .141/12 = .0118

    f.  22.14

7.  Price   AVER1     FITS1     RESI1
    19.39   *         *         *
    18.96   *         *         *
    18.20   18.8500   *         *
    17.89   18.3500   18.8500   -0.96000
    18.43   18.1733   18.3500   0.08000
    19.98   18.7667   18.1733   1.80667
    19.51   19.3067   18.7667   0.74333
    20.63   20.0400   19.3067   1.32333
    19.78   19.9733   20.0400   -0.26000
    21.25   20.5533   19.9733   1.27667
    21.18   20.7367   20.5533   0.62667
    22.14   21.5233   20.7367   1.40333

    Accuracy Measures:  MAPE: 4.6319  MAD: 0.9422  MSE: 1.1728
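A three-month moving average like the AVER1/FITS1/RESI1 output above takes only a few lines to replicate. This Python sketch (names are ours, not Minitab's) reproduces the fitted values and residuals:

    def moving_average_fits(y, k=3):
        # k-period moving averages; the fit (forecast) for period t is the
        # average of the k observations ending at period t-1.
        avg = [sum(y[i - k + 1:i + 1]) / k if i >= k - 1 else None
               for i in range(len(y))]
        fits = [None] + avg[:-1]
        resid = [yt - f if f is not None else None for yt, f in zip(y, fits)]
        return avg, fits, resid

    price = [19.39, 18.96, 18.20, 17.89, 18.43, 19.98,
             19.51, 20.63, 19.78, 21.25, 21.18, 22.14]
    avg, fits, resid = moving_average_fits(price)
    print(round(fits[3], 4), round(resid[3], 4))   # 18.85 -0.96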

8.  a.  Yt    Avg     Fits    Res
        200   *       *       *
        210   *       *       *
        215   *       *       *
        216   *       *       *
        219   212     *       *
        220   216     212     8
        225   219     216     9
        226   221.2   219     7

        Accuracy Measures:  MAPE: 3.5779  MAD: 8.0000  MSE: 64.6667

    b. & c.
        Yt    Smoothed   Forecast
        200   200.000    200.000
        210   204.000    200.000
        215   208.400    204.000
        216   211.440    208.400
        219   214.464    211.440
        220   216.678    214.464
        225   220.007    216.678
        226   222.404    220.007

        Accuracy Measures:  MAPE: 3.2144  MAD: 7.0013  MSE: 58.9657

        Here simple exponential smoothing with α = .4 is used, and in Minitab a number of initial values can be averaged to obtain the starting smoothed value. If one value is averaged, the initial value is 200, the forecast for period 4 is 208.4, and the forecast for period 9 is 222.404. The forecast error for time period 3 is 215 − 204 = 11.
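The smoothing recursion behind the table in parts b and c is a single state update. A minimal Python sketch (initializing with the first observation, as in this problem):

    def simple_exponential_smoothing(y, alpha):
        # S_t = S_{t-1} + alpha * (y_t - S_{t-1}), with S_1 = y_1;
        # S_t is also the forecast for period t+1.
        smoothed = [y[0]]
        for obs in y[1:]:
            smoothed.append(smoothed[-1] + alpha * (obs - smoothed[-1]))
        return smoothed

    y = [200, 210, 215, 216, 219, 220, 225, 226]
    print([round(s, 3) for s in simple_exponential_smoothing(y, alpha=0.4)])
    # [200, 204.0, 208.4, 211.44, 214.464, 216.678, 220.007, 222.404]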

9.  a. & c., d., e., f.  Three-month moving average:

    Month   Yield   MA       Forecast   Error
    1       9.29    *        *          *
    2       9.99    *        *          *
    3       10.16   9.813    *          *
    4       10.25   10.133   9.813      0.437
    5       10.61   10.340   10.133     0.477
    6       11.07   10.643   10.340     0.730
    7       11.52   11.067   10.643     0.877
    8       11.09   11.227   11.067     0.023
    9       10.80   11.137   11.227     -0.427
    10      10.50   10.797   11.137     -0.637
    11      10.86   10.720   10.797     0.063
    12      9.97    10.443   10.720     -0.750

    Accuracy Measures:  MAPE: 4.5875  MAD: 0.4911  MSE: 0.3193  MPE: .6904

    b. & c., d., e., f.  Five-month moving average:

    Month   Yield   MA       Forecast   Error
    1       9.29    *        *          *
    2       9.99    *        *          *
    3       10.16   *        *          *
    4       10.25   *        *          *
    5       10.61   10.060   *          *
    6       11.07   10.416   10.060     1.010
    7       11.52   10.722   10.416     1.104
    8       11.09   10.908   10.722     0.368
    9       10.80   11.018   10.908     -0.108
    10      10.50   10.996   11.018     -0.518
    11      10.86   10.954   10.996     -0.136
    12      9.97    10.644   10.954     -0.984

    Accuracy Measures:  MAPE: 5.5830  MAD: 0.6040  MSE: 0.5202  MPE: .7100

    Forecast for month 13 (Jan.) is 10.644.

    g.

10. MAPE: 5.8926  MAD: 0.6300  MSE: 0.5568  MPE: 5.0588

11. No! The accuracy measures favor the three-month moving average procedure, but the values of the forecasts are not much different. See plot below.

    t    Yt    Smoothed   Forecast   Error
    1    205   205.000    205.000    0.0000
    2    251   228.000    205.000    46.0000
    3    304   266.000    228.000    76.0000
    4    284   275.000    266.000    18.0000
    5    352   313.500    275.000    77.0000
    6    300   306.750    313.500    -13.5000
    7    241   273.875    306.750    -65.7500
    8    284   278.938    273.875    10.1250
    9    312   295.469    278.938    33.0625
    10   289   292.234    295.469    -6.4688
    11   385   338.617    292.234    92.7656
    12   256   297.309    338.617    -82.6172

    Accuracy Measures:  MAPE: 14.67  MAD: 43.44  MSE: 2943.24

12. Naive method:
    MAPE = 8.622  MAD = 1.916  MSE = 5.852

    5-month moving average — forecast for 1996 Q2: 24.244 (Actual: 26.47):
    MAPE: 9.791  MAD: 2.249  MSE: 7.402

    Exponential smoothing with a smoothing constant of α = 0.696 (Actual: 26.47):
    MAPE: 8.425  MAD: 1.894  MSE: 5.462

    Based on the error measures and the forecast for Q2 of 1996, the naive method and simple exponential smoothing are comparable. Either method could be used.

13. a.  α = .4
        Accuracy Measures:  MAPE: 14.05  MAD: 24.02  MSE: 1174.50

    b.  α = .6
        Accuracy Measures:  MAPE: 14.68  MAD: 24.56  MSE: 1080.21

    c.  Looking at the error measures, there is not much difference between the two choices of smoothing constant. The error measures for α = .4 are slightly better. The forecasts for the two choices of smoothing constant are also not much different.

    d.  The residual autocorrelations for α = .4 are shown below. The residual autocorrelations for α = .6 are similar. There are significant residual autocorrelations; no significant residual autocorrelations would be desirable.

14. None of the techniques does much better than the naive method. Simple exponential smoothing with α close to 1, say α = .95, is essentially the naive method.

    Accuracy Measures for Naive Method:  MAPE: 42.57  MAD: 1.685  MSD: 4.935

    Using the naive method, the forecast for 2000 would be 6.85.

15. A time series plot of quarterly Revenues and the autocorrelation function show that the data are seasonal with a trend. After some experimentation, Winters' multiplicative smoothing with smoothing constants α (level) = 0.8, β (trend) = 0.1, and γ (seasonal) = 0.1 is used to forecast future Revenues. See plot below.

    Accuracy Measures:  MAPE: 3.8  MAD: 69.1  MSE: 11146.4

    Forecasts

    Quarter   Forecast   Lower     Upper
    71        2444.63    2275.34   2613.92
    72        1987.98    1773.84   2202.12
    73        2237.98    1969.23   2506.72
    74        1887.74    1559.46   2216.01
    75        2456.18    2065.70   2846.65
    76        1997.36    1543.10   2451.62

    The residual autocorrelations from Winters' multiplicative smoothing, shown below, indicate that none of them are significantly different from zero.
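The same Winters fit can be reproduced outside Minitab. The sketch below uses statsmodels' Holt-Winters implementation (assuming a recent statsmodels in which the fit keywords are smoothing_level, smoothing_trend, and smoothing_seasonal; the synthetic series is only a stand-in for the quarterly Revenues data, which are not reproduced here):

    import numpy as np
    import pandas as pd
    from statsmodels.tsa.holtwinters import ExponentialSmoothing

    # Stand-in for 70 quarters of Revenues: a linear trend times a
    # multiplicative quarterly seasonal -- replace with the actual data.
    t = np.arange(70)
    season = np.tile([1.10, 0.90, 1.00, 1.00], 18)[:70]
    revenues = pd.Series((1500 + 10 * t) * season,
                         index=pd.period_range("1988Q1", periods=70, freq="Q"))

    fit = ExponentialSmoothing(revenues, trend="add", seasonal="mul",
                               seasonal_periods=4).fit(
        smoothing_level=0.8, smoothing_trend=0.1, smoothing_seasonal=0.1)
    print(fit.forecast(6))   # next six quarters, analogous to quarters 71-76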

16. a.  The data appear to be seasonal, with relatively large sales in August, September, October, and November, and relatively small sales in July and December.

    b. & c.  The Excel spreadsheet for calculating MAPE for the naive forecasts and the simple exponential smoothing forecasts is shown below. MAPE is calculated with a divisor of 23 (since the first smoothed value is set equal to the first observation). Using a divisor of 24 gives MAPE2 = 7.69%, the value reported by Minitab.

    d.  Neither model is likely to generate accurate forecasts of future monthly sales, since neither model allows for seasonality.

    e.  The results of Winters' multiplicative smoothing are shown in the Minitab plot below.

    f.  Winters' method is preferred, since it allows for seasonality and has the smallest MAPE of the three models considered.

    g.  The residual autocorrelation function for Winters' method is shown below. The residual autocorrelations suggest Winters' method works well for these data since they are all insignificantly different from 0.

17. a.  The four-week moving average seems to represent the data a little better. Compare the error measures for the four-week moving average in the figure below with the five-week moving average results in Figure 4-4.

    b.  Simple exponential smoothing does a better job of smoothing the data than a four-week moving average, as judged by the uniformly smaller error measures shown in the plot below.

18. a.  As the order of the moving average increases, the smoothed data become more wavelike. Looking at the results for orders k = 10 and k = 15, and counting the number of years from one peak to the next, it appears as if the number of severe earthquakes is on about a 30-year cycle.

    b.  The results of simple exponential smoothing are shown in the plot below. The forecast for the number of severe earthquakes for 2000 is 20. There are no significant residual autocorrelations. Simple exponential smoothing seems to provide a good fit to the earthquake data.

    c.  There is no seasonal component, since these data are recorded on an annual basis.

19.

    a.  The results of Holt's linear smoothing applied to Southwest Airlines quarterly income are shown below. A plot of the residual autocorrelation function follows. It appears as if Holt's procedure represents the data well, but the residual autocorrelations have significant spikes at the seasonal lags of 4 and 8, suggesting a seasonal component is not captured by Holt's method.

    b.  Winters' multiplicative smoothing was applied to the quarterly income data, and the results are shown in the plot below. The forecasts for the four quarters of 2000 are:

        Quarter   Forecast
        49        88.960
        50        184.811
        51        181.464
        52        117.985

        The forecasts seem reasonable, but the residual autocorrelation function below has a significant spike at lag 1. So although Winters' procedure captures the trend and seasonality, there is still some association in consecutive observations not accounted for by Winters' method.

20. This time series is trending upward and has a seasonal pattern, with third and fourth quarter Gap sales relatively large. Moreover, the variability in this series is increasing with the level, suggesting a multiplicative Winters' smoothing procedure or a transformation of the data (say, logarithms of sales) to stabilize the variability.

    The results of Winters' multiplicative smoothing with smoothing constants α = β = γ = .2 are shown in the plot below.

    Forecasts

    Quarter   Forecast   Lower     Upper
    101       3644.18    3423.79   3864.57
    102       3775.78    3551.94   3999.62
    103       4269.27    4041.58   4496.96
    104       5267.82    5035.90   5499.74

    The residual autocorrelations shown below indicate there is still some autocorrelation at low lags, including the seasonal lag S = 4, that is not accounted for by Winters' method. A better model is needed. This issue is explored in later chapters of the text.

CASE 4-1: THE SOLAR ALTERNATIVE COMPANY

This case provides the student with an opportunity to deal with a frequent real-world problem: small data sets. A plot of the two years of data shows both an upward trend and a seasonal pattern. The forecasting model that is selected must do an accurate job for at least three months into the future.

Averaging methods are not appropriate for this data set because they do not work when the data have a trend, seasonality, or some other systematic pattern. Moving average models tend to smooth out the seasonal pattern of the data instead of making use of it to forecast.

A naive model that takes into account both the trend and the seasonality of the data might work. Since the seasonal pattern appears to be strong, a good forecast might take the same value it did in the corresponding month one year ago, or Yt+1 = Yt-11. However, as it stands, this forecast ignores the trend. One approach to estimate trend is to calculate the increase from each month in 2005 to the same month in 2006. As an example, the increase from January 2005 to January 2006 is 17 - 5 = 12.

After the increases for all 12 months are calculated, they can be summed and then divided by 12. The forecast for each month of 2007 could then be calculated as the value for the same month in 2006 plus the average increase for each of the 12 months from 2005 to the same month in 2006. Consequently, the forecast for January, 2007 is

Y25 = 17 + [(17 - 5) + (14 - 6) + (20 - 10) + (23 - 13) + (30 - 18) + (38 - 15) + (44 - 23) + (41 - 26) + (33 - 21) + (23 - 15) + (26 - 12) + (17 - 14)]/12

Y25 = 17 + 148/12 ≈ 17 + 12 = 29

The 2007 forecasts from this naive model are:

Jan  Feb  Mar  Apr  May  Jun  Jul  Aug  Sep  Oct  Nov  Dec
29   26   32   35   42   50   56   53   45   35   38   29

Winters' multiplicative smoothing also appears to represent the data fairly well (see plot below) and produces the forecasts:

Month      Forecast
Jan/2007   19.8
Feb/2007   18.0
Mar/2007   26.8
Apr/2007   32.0
May/2007   42.4
Jun/2007   45.8
Jul/2007   58.4
Aug/2007   58.9
Sep/2007   47.6
Oct/2007   33.7
Nov/2007   33.5
Dec/2007   28.0

The naive forecasts are not unreasonable, but the Winters' forecasts seem to have captured the seasonal pattern a little better, particularly for the first 3 months of the year. Notice that if the trend and seasonal pattern are strong, Winters' smoothing procedure can work well even with only two years of monthly data.
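The naive seasonal-plus-trend rule worked out above reduces to two lines of arithmetic. A Python sketch using the 2005 and 2006 monthly sales exactly as they appear in the Y25 computation:

    sales_2005 = [5, 6, 10, 13, 18, 15, 23, 26, 21, 15, 12, 14]
    sales_2006 = [17, 14, 20, 23, 30, 38, 44, 41, 33, 23, 26, 17]

    # Average year-over-year increase, rounded as in the case (148/12 = 12).
    avg_increase = round(sum(b - a for a, b in zip(sales_2005, sales_2006)) / 12)

    # Each 2007 month = same month in 2006 plus the average increase.
    forecast_2007 = [y + avg_increase for y in sales_2006]
    print(forecast_2007)   # [29, 26, 32, 35, 42, 50, 56, 53, 45, 35, 38, 29]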

CASE 4-2: MR. TUX

This case shows how several exponential smoothing methods can be applied to the Mr. Tux data. John Mosby tries simple exponential smoothing and exponential smoothing with adjustments for trend and seasonal factors, along with a three-month moving average.

Students can begin to see that several forecasting methods are typically tried when an important variable must be forecast. Some method of comparing them must be used, such as the three accuracy measures discussed in this case. Students should be asked their opinions of John's progress in his forecasting efforts given these accuracy values. It should be apparent to most that the degree of accuracy achieved is not sufficient and that further study is needed. Students should be reminded that they are looking at actual data, and that the problems faced by John Mosby really occurred.

1.  Of the methods attempted, Winters' multiplicative smoothing was the best method John found. Each forecast was typically off by about 25,825. The error in each forecast was about 22% of the value of the variable being forecast.

2.  There are other choices for the smoothing constants that lead to smaller error measures. For example, with α = β = γ = .1, MAD = 22,634 and MAPE = 20.

3.  John should examine plots of the residuals and the residual autocorrelations. If Winters' procedure is adequate, the residuals should appear to be random. In addition, John can examine the forecasts for the next 12 months to see if they appear to be reasonable.

4.  The ideal value for MPE is 0. If MPE is negative, then, on average, the predicted values are too high (larger than the actual values).

CASE 4-3: CONSUMER CREDIT COUNSELING

1.  Students should realize immediately that simply using the basic naive approach of using last period to predict this period will not allow for forecasts for the rest of 1993. Since the autocorrelation coefficients presented in Case 3-3 indicate some seasonality, a naive model using April 1992 to predict April 1993, May 1992 to predict May 1993, and so forth might be tried. This approach produces the error measures

    MAD = 23.39  MSE = 861.34  MAPE = 18.95

    over the data region, and these are not particularly attractive given the magnitudes of the new client numbers.

2.  A moving average model of any order cannot be defended, since any moving average will produce flat-line forecasts for the rest of 1993. That is, the forecasts will lie along a horizontal line whose level is the last value for the moving average. The seasonal pattern will be ignored.

3.  Winters' multiplicative smoothing procedure with smoothing constants α = β = γ = .2 was tried. For these choices: MAD = 19.29, MSE = 545.41, and MAPE = 16.74. For smoothing constants α = .5, β = γ = .1: MAD = 16.94, MSE = 451.26, and MAPE = 14.30.

4.  Winters' smoothing with smoothing constants α = .5, β = γ = .1 is best based on MAD, MSE, and MAPE.

5.  Using Winters' procedure in 4, the forecasts for the remainder of 1993 are:

    Month      Forecast
    Apr/1993   148
    May/1993   141
    Jun/1993   148
    Jul/1993   141
    Aug/1993   143
    Sep/1993   136
    Oct/1993   159
    Nov/1993   146
    Dec/1993   126

6.  The residual autocorrelations indicate Winters' multiplicative smoothing seems adequate.

CASE 4-4: MURPHY BROTHERS FURNITURE

1.  Winters' multiplicative smoothing with α = .3, β = .2, and γ = .1 was deemed the best, but there was still some significant residual autocorrelation.

2.  A seasonal naive model was found to be adequate. Also, a naive model that combined seasonal and trend estimates (similar to Equation 4.5) was found to be adequate. The trend and seasonal pattern in actual Murphy Brothers sales are consistent and pronounced, so a naive model is likely to work well.

3.  Based on the forecasting methods tested, actual Murphy Brothers sales data should be used. A plot of the results for the best Winters' procedure follows. An examination of the autocorrelation coefficients for the residuals from this Winters' model, shown below, indicates that none of them are significantly different from zero. However, Julie decided to use the naive model because it was very simple and she could explain it to her father.

48

This case is designed to emphasize the use of subjective probability estimates in a

forecasting situation. The methodology used to generate revenue forecasts is both appropriate

and accurately employed. The key to answering the question concerning the accuracy of the

projections hinges on the accuracy of the assumptions made and estimates used. Examination

of the report indicates that the analysts were conservative each time they made an assumption or

computed an estimate. This is probably one of the major reasons why the Professional

Marketing Associates (PMA) forecast is considerably lower. Since we do not know how the

accountant projected the number of procedures, it is difficult to determine why his revenue projections

were higher. However, it is reasonable to assume that his forecast of the number

of cases for each type of procedure was not nearly as sophisticated or thorough as PMAs.

Therefore, the recommendation to management should indicate that the PMA forecast, while

probably on the conservative side, is more likely to be accurate.

Downtown Radiology evidently agreed with PMA's forecast. They decided not to

purchase a 9,800 series CT scanner. They also decided to purchase a less expensive MRI.

Finally, they decided to obtain outside funding and did not resort to any type of public offering.

They built their new imaging center, purchased an MRI and have created a very successful

imaging center.

CASE 4-6: WEB RETAILER

1.  The time series plot for Orders shows a slight upward trend and a seasonal pattern with peaks in December. Because of the relatively small data set, the autocorrelations are only computed for a limited number of lags, 6 in this case. Consequently, with monthly data, the seasonality does not show up in the autocorrelation function. There is significant positive autocorrelation at lag 1, so Orders in consecutive months are correlated.

    The time series plot for CPO shows a downward trend, but a seasonal component is not readily apparent. There is significant positive autocorrelation at lag 1, and the autocorrelations die out relatively slowly. The CPO series is nonstationary, and observations in consecutive time periods are correlated.

2.  Forecasts of Orders for the next 4 months follow. The residual autocorrelation function below has no significant autocorrelations.

    Month      Forecast   Lower     Upper
    Jul/2003   3524720    3072265   3977174
    Aug/2003   3885780    3431589   4339972
    Sep/2003   3656581    3200544   4112618
    Oct/2003   4141277    3683287   4599266

3.  Simple exponential smoothing with α = .77 (the optimal α in Minitab) represents the CPO data well but, like any averaging procedure, produces flat-line forecasts. Forecasts of CPO for the next 4 months are:

    Month      Forecast   Lower    Upper
    Jul/2003   0.1045     0.0787   0.1303
    Aug/2003   0.1045     0.0787   0.1303
    Sep/2003   0.1045     0.0787   0.1303
    Oct/2003   0.1045     0.0787   0.1303

    The results for simple exponential smoothing are pictured below. There are no significant residual autocorrelations (see plot below).

4.  Contacts forecasts:

    Month      Forecast
    Jul/2003   368333
    Aug/2003   406064
    Sep/2003   382113
    Oct/2003   432763

5.  Multiplying a forecast of Orders by a forecast of CPO to get a forecast of Contacts has the potential for introducing additional error (uncertainty) into the process.

6.  It may or may not be better to focus on the number of units and contacts per unit to get a forecast of contacts. It depends on the nature of the data (ease of modeling) and the amount of relevant data available.

CASE 4-7: SOUTHWEST MEDICAL CENTER

1.  The Total Visits series is trending (since the autocorrelations are slow to die out) and seasonal (relatively large autocorrelation at lag 12).

2.  Winters' multiplicative smoothing with α = β = .5 and γ = .2 seems to do as well as any smoothing procedure (see error measures in plot below). Forecasts for the remainder of FY 2003-04 generated by Winters' procedure follow.

    Month      Forecast   Lower    Upper
    Mar/2004   1465.8     1249.9   1681.7
    Apr/2004   1490.5     1252.6   1728.4
    May/2004   1453.7     1189.3   1718.1
    Jun/2004   1465.4     1171.2   1759.6
    Jul/2004   1568.7     1242.3   1895.1
    Aug/2004   1552.7     1192.4   1913.0

    The residual autocorrelations indicate some remaining significant autocorrelation not captured by Winters' method.

3.  If a technique can be found that captures the remaining association in the Total Visits data, it is likely to produce better forecasts. This issue is explored in subsequent cases.

4.  The forecasts from Winters' smoothing show an upward trend. If they are to be believed, perhaps additional medical staff are required to handle the expected increased demand. At this point, however, further study is required.

CASE 4-8: SURTIDO COOKIES

1.  Jame learned that Surtido Cookie sales have a strong seasonal pattern (sales are relatively high during the last two months of the year, low during the spring) with very little, if any, trend (see Case 3-5).

2.  The autocorrelation function for sales (see Case 3-5) is consistent with the time series plot. The autocorrelations die out (consistent with no trend) and have a spike at the seasonal lag 12 (consistent with a seasonal component).

3.  Winters' smoothing procedures fit the data fairly well and produce reasonable forecasts (see plot below). However, there is still some significant residual autocorrelation at low lags.

    Month      Forecast   Lower     Upper
    Jun/2003   653254     91351     1215157
    Jul/2003   712159     141453    1282865
    Aug/2003   655889     75368     1236411
    Sep/2003   1532946    941647    2124245
    Oct/2003   1710520    1107533   2313507
    Nov/2003   2133888    1518354   2749421
    Dec/2003   1903589    1274702   2532476

4.  Month      Forecast
    Jun/2003   618914
    Jul/2003   685615
    Aug/2003   622795
    Sep/2003   1447864
    Oct/2003   1630271
    Nov/2003   2038257
    Dec/2003   1817989

    These forecasts have the same pattern as the forecasts generated by Winters' method but are uniformly lower. Winters' forecasts seem more consistent with recent history.

CHAPTER 5

TIME SERIES AND THEIR COMPONENTS

ANSWERS TO PROBLEMS AND CASES

1.  The purpose of decomposing a time series variable is to observe its various elements in isolation. By doing so, insights into the causes of the variability of the series are frequently gained. A second important reason for isolating time series components is to facilitate the forecasting process.

2.  The multiplicative components model works best when the variability of the time series increases with the level. That is, the values of the series spread out as the trend increases, and the set of observations has the appearance of a megaphone or funnel.

3.  The basic forces that affect and help explain the trend-cycle of a series are population growth, price inflation, technological change, and productivity increases.

4.  a.  Exponential
    b.
    c.  Linear

5.  Weather and calendar events such as holidays affect the seasonal component.

6.  a. & b.
    c.  23.89 billion
    d.  648.5 billion

7.  a. & b.
    c.
    d.  There is an apparent cyclical movement in the observations about the fitted straight line. However, if there is a cyclical effect, it is very slight.

    e.
    f.  Inflation, population growth, and new technology affect the trend of capital spending.

8.

9.

10. Ŷ = T·S = 850(1.12) = $952

11. Ŷ = T·S = 900(.827) = $744.30

12. Month   Sales ($ Thousands)   Seasonal Index (%)   Deseasonalized Data
    Jan     125                   51                   245
    Feb     113                   50                   226
    Mar     189                   87                   217
    Apr     201                   93                   216
    May     206                   95                   217
    Jun     241                   99                   243
    Jul     230                   96                   240
    Aug     245                   89                   275
    Sep     271                   103                  263
    Oct     291                   120                  243
    Nov     320                   131                  244
    Dec     419                   189                  222

    The statement is not true. When the data are deseasonalized, they show that business is about the same.
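Deseasonalizing is one division per observation. A short Python sketch of the table above (indices in percent, with round-half-up to match the table's rounding):

    sales   = [125, 113, 189, 201, 206, 241, 230, 245, 271, 291, 320, 419]
    indices = [51, 50, 87, 93, 95, 99, 96, 89, 103, 120, 131, 189]   # percent

    # Deseasonalized value = sales / (index/100)
    deseasonalized = [int(s / (i / 100) + 0.5) for s, i in zip(sales, indices)]
    print(deseasonalized)
    # [245, 226, 217, 216, 217, 243, 240, 275, 263, 243, 244, 222]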

13. a. & b.  Would use both the trend and the seasonal indices to forecast, although the seasonal component is not strong in this example (see plot and seasonal indices below).

    Fitted Trend Equation: Yt = 2268.0 + 22.1*t

    Seasonal Indices
    Period   Index
    1        0.969
    2        1.026
    3        1.000
    4        1.005

    Forecasts
    Period    Forecast
    Q3/1996   3305.39
    Q4/1996   3343.02

    c.  The third quarter forecast is close to Value Line's forecast (3,305 versus 3,340). The forecast for fourth quarter is a bit high compared to Value Line's (3,343 versus 3,300).

    Additional plots associated with the decomposition follow.

14. a.  Multiplicative Model

    Data: Cavanaugh Sales
    Length: 77
    NMissing: 0

    Fitted Trend Equation: Yt = 72.6 + 6.01*t

    Seasonal Indices
    Period   Index
    1        1.278
    2        0.907
    3        0.616
    4        0.482
    5        0.426
    6        0.467
    7        0.653
    8        0.863
    9        1.365
    10       1.790
    11       1.865
    12       1.288

    b.  Use both the trend and the seasonal indices for forecasting.

    c.  Month      Forecast
        Jun/2006   253
        Jul/2006   358
        Aug/2006   478
        Sep/2006   764
        Oct/2006   1012
        Nov/2006   1066
        Dec/2006   744
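A classical multiplicative decomposition similar to this Minitab output can be approximated with statsmodels (a sketch only: the series below is a synthetic stand-in for the 77 months of Cavanaugh Sales, and seasonal_decompose uses centered moving averages, so its indices will differ somewhat from Minitab's):

    import numpy as np
    import pandas as pd
    from statsmodels.tsa.seasonal import seasonal_decompose

    rng = np.random.default_rng(0)
    t = np.arange(77)
    season = np.tile([1.3, .9, .6, .5, .4, .5, .7, .9, 1.4, 1.8, 1.9, 1.3], 7)[:77]
    sales = pd.Series((70 + 6 * t) * season * rng.normal(1, 0.05, 77),
                      index=pd.date_range("2000-01-01", periods=77, freq="MS"))

    result = seasonal_decompose(sales, model="multiplicative", period=12)
    print(result.seasonal[:12].round(3))   # one full cycle of seasonal indices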

15. a.  Additive Model (for the logarithms of sales)

    Data: LnSales
    Length: 77
    NMissing: 0

    Fitted Trend Equation: Yt = 4.6462 + 0.0215*t

    Seasonal Indices
    Period   Index
    1        0.335
    2        -0.018
    3        -0.402
    4        -0.637
    5        -0.714
    6        -0.571
    7        -0.273
    8        -0.001
    9        0.470
    10       0.723
    11       0.747
    12       0.342

    b.

    c. & d.

    e.  Forecasts

        Month      Forecast of LnSales   Forecast of Sales
        Jun/2006   5.75297               315
        Jul/2006   6.07205               434
        Aug/2006   6.36535               581
        Sep/2006   6.85802               951
        Oct/2006   7.13248               1252
        Nov/2006   7.17779               1310
        Dec/2006   6.79477               893

        Forecasts of Cavanaugh sales developed from the additive decomposition of the logarithms are higher (for all months June 2006 through December 2006) than those developed from the multiplicative decomposition. Forecasts from the multiplicative … Cavanaugh sales time series.

16. a.  Multiplicative Model

    Data: Disney Sales
    Length: 63
    NMissing: 0

    Fitted Trend Equation: Yt = -302.9 + 44.9*t

    Seasonal Indices
    Period   Index
    1        0.957
    2        1.022
    3        1.046
    4        0.975

    b.  There is a significant trend, but it is not a linear trend. First quarter sales tend to be relatively low and third quarter sales tend to be relatively high. However, the plot in part a indicates a multiplicative decomposition with a linear trend is not an adequate representation of Disney sales. It would perhaps be better to do a multiplicative decomposition with a quadratic trend. Even better, in this case, is to do an additive decomposition with the logarithms of Disney sales.

    c.  With the right decomposition, would use both the trend and seasonal components to generate forecasts.

    d.  Forecasts

        Quarter   Forecast
        Q4/1995   2506
        Q1/1996   2502
        Q2/1996   2719
        Q3/1996   2830
        Q4/1996   2681

        However, the plot in part a indicates that forecasts generated from a multiplicative decomposition with a linear trend are likely to be too low.

17. a.  The variability in this series increases with the level, so a multiplicative decomposition may be appropriate, or an additive decomposition with the logarithms of demand.

    b.  Neither the multiplicative decomposition nor a linear trend works well for this series. This time series is best modeled with other methods. The multiplicative decomposition is pictured below.

    c.  Seasonal Indices

        Period   Index    Period   Index    Period   Index
        1        0.947    5        1.004    9        1.045
        2        0.950    6        1.007    10       0.982
        3        0.961    7        1.022    11       0.995
        4        0.998    8        1.070    12       1.019

    d.  Forecasts (see plot below):

        Month      Forecast
        Oct/1996   171.2
        Nov/1996   174.9
        Dec/1996   180.5

Multiplicative Model

Data

U.S. Retail Sales

Length 84

NMissing 0

Fitted Trend Equation

Yt = 128.814 + 0.677*t

Seasonal Indices

Period

1

2

3

Index

0.880

0.859

0.991

65

4

5

6

7

8

9

10

11

12

0.986

1.031

1.021

1.007

1.035

0.973

0.991

1.015

1.210

Period

Jan/1995

Feb/1995

Mar/1995

Apr/1995

May/1995

Jun/1995

Jul/1995

Aug/1995

Sep/1995

Oct/1995

Nov/1995

Dec/1995

Forecast

164.0

160.7

186.1

185.8

194.9

193.6

191.8

197.7

186.6

190.6

196.1

234.5

Actual

167.0

164.0

192.1

187.5

201.4

202.6

194.9

204.2

192.8

194.0

202.4

238.0

Forecasts maintain the seasonal pattern but are uniformly below the actual

retail sales for 1995. However, MPE = MAPE = 2.49% is relatively small.

66

19. a.  Jan: 600/1.2 = 500

    b.  T = 140 + 5(72) = 500

    c.  Jan   Ŷ = 500(1.20) = 600
        Feb   Ŷ = (140 + 5(73))(1.37) = 692
        Mar   Ŷ = (140 + 5(74))(1.00) = 510
        Apr   Ŷ = (140 + 5(75))(0.33) = 170
        May   Ŷ = (140 + 5(76))(0.47) = 244
        Jun   Ŷ = (140 + 5(77))(1.25) = 656
        Jul   Ŷ = (140 + 5(78))(1.53) = 811
        Aug   Ŷ = (140 + 5(79))(1.51) = 808
        Sep   Ŷ = (140 + 5(80))(0.95) = 513
        Oct   Ŷ = (140 + 5(81))(0.60) = 327
        Nov   Ŷ = (140 + 5(82))(0.82) = 451
        Dec   Ŷ = (140 + 5(83))(0.97) = 538

22. Deflating a time series removes the effects of dollar inflation and permits the analyst to examine the series in constant dollars.

23. 1289.73(2.847) = 3,671.86

24. Jan   303,589
    Feb   251,254
    Mar   303,556
    Apr   317,872
    May   329,551
    Jun   261,362
    Jul   336,417
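Deflation itself is also one division per observation: real value = nominal value / (price index/100). A tiny Python sketch (the nominal series and index values here are hypothetical placeholders, since the problem's raw data are not reproduced above):

    nominal = [320_000, 270_000, 335_000]   # hypothetical monthly sales
    cpi     = [105.4, 107.5, 110.4]         # hypothetical price index, 100 = base
    real = [round(v / (p / 100)) for v, p in zip(nominal, cpi)]
    print(real)   # [303605, 251163, 303442]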

25. Multiplicative Model

    Data: Employed Men
    Length: 130
    NMissing: 0

    Fitted Trend Equation: Yt = 65355 + 72.7*t

    Seasonal Indices
    Month   Index    Month   Index
    1       0.981    7       1.019
    2       0.985    8       1.014
    3       0.990    9       1.002
    4       0.995    10      1.004
    5       1.002    11      0.999
    6       1.014    12      0.995

    Forecasts
    Month      Forecast
    Nov/2003   74791.4
    Dec/2003   74581.7
    Jan/2004   73607.8
    Feb/2004   73954.0
    Mar/2004   74393.4
    Apr/2004   74887.2
    May/2004   75454.0
    Jun/2004   76419.5
    Jul/2004   76894.1
    Aug/2004   76564.4
    Sep/2004   75757.2
    Oct/2004   76005.6

    A multiplicative decomposition with a default linear trend is not quite right for these data. There is some curvature in the time series, as the plot of the seasonally adjusted data indicates. Not surprisingly, there is a strong seasonal component, with employment relatively high in the summer and relatively low in the winter. In spite of the not-quite-linear trend, the forecasts seem reasonable.

26. A linear trend is not appropriate for the employed men data. The plot below shows a quadratic trend fit to the data of Table P-25. Although better than a linear trend, the quadratic trend is not quite right. Employment for the years 2000-2003 seems to have leveled off. No simple trend curve is likely to provide an excellent fit to these data. The residual autocorrelation function below indicates a prominent seasonal component, since there are large autocorrelations at the seasonal lag S = 12 and its multiples.

27. Multiplicative Model

    Data: Wal-Mart Sales
    Length: 56
    NMissing: 0

    Fitted Trend Equation: Yt = 1157 + 1088*t

    Seasonal Indices
    Quarter   Index
    Q1        0.923
    Q2        0.986
    Q3        0.958
    Q4        1.133

    Quarter   Forecast   Actual
    Q1/2004   58328      65443
    Q2/2004   63346      70466
    Q3/2004   62607      69261
    Q4/2004   75278      82819

    There is a slight upward curvature in the Wal-Mart sales data, so a linear trend is not quite right. Not surprisingly, there is a strong seasonal component, with 4th quarter sales relatively high and 1st quarter sales relatively low. The forecasts for 2004 are uniformly below the actuals (primarily the result of the linear trend assumption), although the seasonal pattern is maintained. Here MPE = MAPE = 9.92%. Multiplicative decomposition is better than additive decomposition, but any decomposition that assumes a linear trend will not forecast sales for 2004 well.

28. A linear trend fit to the Wal-Mart sales data of Table P-27 is shown below. A linear trend misses the upward curvature in the data. A quadratic trend provides a better fit to the Wal-Mart sales data (see plot below). The autocorrelation function for the residuals from the quadratic trend fit suggests a prominent seasonal component, since there are large autocorrelations at the seasonal lag S = 4 and its multiples.

CASE 5-1: THE SMALL ENGINE DOCTOR

1.

2.

3.  SEASONAL ADJUSTMENT FACTORS AND FORECASTS (T*S)

    MONTH   FACTOR   2005    2006    2007
    Jan     0.693    8.68    17.32   25.97
    Feb     0.707    9.59    18.41   27.23
    Mar     0.935    13.66   25.34   30.01
    Apr     1.142    17.87   32.13   46.38
    May     1.526    25.48   44.52   63.57
    Jun     1.940    34.39   58.61   82.82
    Jul     1.479    27.77   46.23   64.69
    Aug     0.998    19.77   32.23   44.68
    Sep     0.757    15.78   25.22   34.67
    Oct     0.373    8.17    12.83   17.49
    Nov     0.291    6.68    10.32   13.95
    Dec     1.290    30.94   47.06   63.17

4.

5.  Trend*Seasonality (T*S):  MAD = 1.52
    Linear Trend Model:       MAD = 9.87

6.  If you had to limit your choices to the models in 2 and 4, the linear trend model is better (judged by MAD and MSE) than any of the Holt smoothing procedures. However, the Trend*Seasonality (T*S) model is best. This procedure is the only one that takes account of the trend and seasonality in Small Engine Doctor sales.

CASE 5-2: MR. TUX

At last, John is able to deal directly with the strong seasonal effect in his monthly data. Students find it interesting that in addition to using these data to forecast, John's banker wants them to justify variable loan payments.

To forecast using decomposition, students see that both the C and I components must be estimated. We like to emphasize that studying the C column in the computer printout is helpful, but that other study is needed to estimate the course of the economy over the next several months. The computer is not able to make such forecasts with accuracy, as anyone who follows economic news well knows.

Thinking about John's efforts to balance his seasonal business to achieve a more uniform sales picture can generate a good class discussion. This is usually the goal of any business; examples such as boats/skis or bikes/skis illustrate this effort in many seasonal businesses. In fact, John Mosby put a great deal of effort into expanding his Seattle business in order to balance his seasonal effect. Along with his shirt-making business, he has achieved a rather uniform monthly sales volume.

1.  The two sentences might look something like this: "A computer analysis of John Mosby's monthly sales data clearly shows the strong variation by month. I think we are justified in letting him make variable monthly loan payments based on the seasonal indices shown in the computer printout."

2.  Since John expects to do twice as much business in Seattle as Spokane, the Seattle indices he should try to achieve will be only half as far from 100 as the Spokane indices, and on the opposite side of 100:

          Spokane   Seattle
    Jan   31.4      134.3
    Feb   47.2      126.4
    Mar   88.8      105.6
    Apr   177.9     61.1
    May   191.8     54.1
    Jun   118.6     90.7
    Jul   102.9     98.6
    Aug   128.7     85.7
    Sep   93.8      103.1
    Oct   81.5      109.3
    Nov   60.4      119.8
    Dec   77.1      111.5

3.  Using the sales figures for January and February of 2005, to get average (100%) sales dollars, divide the actual sales by the corresponding seasonal index:

    Jan: 71,043/.314 = 226,252
    Feb: 152,930/.472 = 324,004

    Now subtract the actual sales from these target values to get the sales necessary from the shirt-making machine:

    Jan: 226,252 - 71,043 = 155,209
    Feb: 324,004 - 152,930 = 171,074

CASE 5-3: CONSUMER CREDIT COUNSELING

Both the trend and seasonal components are important. The trend explains about 34% of the total variance.

Multiplicative Model

Data: Clients
Length: 99

Fitted Trend Equation: Yt = 89.88 + 0.638*t

Seasonal Indices
Month   Index
1       1.177
2       1.168
3       1.246
4       0.997
5       0.940
6       1.020
7       0.916
8       0.951
9       0.878
10      1.055
11      0.868
12      0.783

The number of new clients tends to be relatively large during the first three months of the year.

Forecasts
Month      Forecast
Apr/2003   153.207
May/2003   145.121
Jun/2003   158.062
Jul/2003   142.440
Aug/2003   148.560
Sep/2003   137.749
Oct/2003   166.161
Nov/2003   137.261
Dec/2003   124.277

There are one, possibly two, large positive residuals (irregularities) at the beginning of the series, but there are no significant residual autocorrelations.

1.

Smoothing Constants
Alpha (level) 0.980
Gamma (trend) 0.025

Accuracy Measures
MAPE        1.1
MAD        76.2
MSD     11857.8

Forecasts

Month       Forecast
Jan/2002    8127.8
Feb/2002    8165.1
Mar/2002    8202.4
Apr/2002    8239.7
May/2002    8277.0
Jun/2002    8314.2
Jul/2002    8351.5
Aug/2002    8388.8
Sep/2002    8426.1

2.

Month       Forecast   Actual
Jan/2002    7453.2     7120
Feb/2002    7462.5     7124
Mar/2002    8058.7     7817
Apr/2002    7873.1     7538
May/2002    8223.5     7921
Jun/2002    8140.9     7757
Jul/2002    8308.8     7816
Aug/2002    8611.1     8208
Sep/2002    8368.2     7828

Holt's linear smoothing was adequate for the seasonally adjusted data, but the forecasts above are uniformly above the actual values for the first nine months of 2002.

3.

Using the same procedure as in 2, the forecast for October 2002 is 8609.2.

4.

The pattern for the three sets of data shows a trend and monthly seasonality.

CASE 5-5: AAA WASHINGTON

1.

Multiplicative Model

Data: Calls
Length: 60
NMissing: 0

Fitted Trend Equation
Yt = 21851 - 17.0437*t

Seasonal Indices

Month   Index
  1     0.937
  2     0.922
  3     0.972
  4     0.963
  5     0.925
  6     1.016
  7     1.063
  8     1.094
  9     1.094
 11     1.025
 12     0.936

Accuracy Measures
MAD        814
MSD    1276220

2.

Decomposition analysis works pretty well for the AAA Washington data. There is a slight downward trend in emergency road service call volume with a pronounced seasonal component. Volume tends to be relatively high in the summer and early fall. There is significant residual autocorrelation at lag 1 (see plot below), so not all of the association in the data has been accounted for by the decomposition.

CASE 5-6: ALOMEGA FOOD STORES

The sales data for the Alomega Food Stores case is subjected to a multiplicative decomposition procedure in this case. A trend line is first calculated with the actual data plotted around it (using MINITAB). Students can project this line into future months for sales forecasts, although, as the case suggests, accurate forecasts will not result: the MAPE using only the trend line is 28%.

A plot of the seasonal indices from the MINITAB output is shown below. Students can summarize the managerial benefits to Julie from studying these values. As noted in the case, the MAPE drops to 12% when the seasonal indices along with the trend are used.

Finally, a 12-month forecast is generated using both the trend line and the seasonal indices. The forecasts seem reasonable.

Month       Forecast
Jan/2007     785348
Feb/2007     326276
Mar/2007     585307
Apr/2007     391827
May/2007     558299
Jun/2007     453257
Jul/2007     520615
Aug/2007     319029
Sep/2007     614997
Oct/2007     394599
Nov/2007     377580
Dec/2007     235312

Although more a management concern than a forecasting one, the attitude of Jackson Tilson in the case might generate a discussion that ties the computer-assisted forecasting process into the real-life personalities of business associates. Although increasingly unlikely in the business setting, there are still those whose backgrounds do not include familiarity with computer-based data analysis. Students whose careers will be spent in business might benefit from a discussion of the human element in the management process.

CASE 5-7: SURTIDO COOKIES

1.

Multiplicative Model

Data: SurtidoSales
Length: 41
NMissing: 0

Fitted Trend Equation
Yt = 907625 + 4736*t

Seasonal Indices

Month   Index
  1     0.696
  2     0.546
  3     0.517
  4     0.678
  5     0.658
  6     0.615
  7     0.716
  8     0.567
  9     1.527
 10     1.664
 11     1.988
 12     1.829

2.

Month       Forecast
Jun/2003     680763
Jul/2003     795362
Aug/2003     633209
Sep/2003    1710846
Oct/2003    1872289
Nov/2003    2246745
Dec/2003    2076183

3.

The linear trend in sales has a slight upward slope. The seasonal indices show that cookie sales are relatively high the last four months of the year, with a peak in November, and relatively low the rest of the year.

The residual autocorrelation function is shown below. There are no significant residual autocorrelations.

The multiplicative decomposition adequately accounts for the trend and seasonality in the data. The forecasts are very reasonable. Jame should change his thinking about the value of decomposition analysis.

CASE 5-8: SOUTHWEST MEDICAL CENTER

1.

A decomposition analysis identifies and separates the components that make up the time series. These components are the trend or trend/cycle (long term growth or decline), the seasonal (consistent within-year variation typically related to the calendar) and the irregular (unexplained variation).

2.

The results from a multiplicative decomposition and an additive decomposition are nearly the same (apart from the seasonal indices being either multiplicative or additive). For the purposes of this case, either can be considered. The results from a multiplicative decomposition follow.

Multiplicative Model

Data: Total Visits
Length: 114
NMissing: 0

Fitted Trend Equation
Yt = 955.6 + 4.02*t

Seasonal Indices

Month   Index
  1     0.972
  2     1.039
  3     0.943
  4     0.884
  5     1.039
  6     0.935
  7     1.043
  8     1.033
  9     0.995
 10     1.007
 11     1.091
 12     1.019

Forecasts

Month       Forecast   Month       Forecast
Mar/2004    1479       Sep/2004    1401
Apr/2004    1469       Oct/2004    1502
May/2004    1419       Nov/2004    1367
Jun/2004    1440       Dec/2004    1284
Jul/2004    1564       Jan/2005    1514
Aug/2004    1464       Feb/2005    1367

3.

There is a distinct upward trend in total visits. The seasonal indices show that visits in December (4th month of the fiscal year) tend to be relatively low and visits in July (11th month of the fiscal year) tend to be relatively high.

4.

The residual autocorrelation function is shown below.

There are significant residual autocorrelations; the residuals are far from random. The forecasts may be reasonable given the last three fiscal years of data. However, looking at the time series decomposition plot in 2, it is clear a decomposition analysis is not able to describe the middle two or three fiscal years of data. For some reason, visits for these fiscal years, in general, appear to be unusually high. A decomposition analysis does not adequately describe Mary's data and leaves her perplexed.

CHAPTER 6

REGRESSION ANALYSIS

ANSWERS TO PROBLEMS AND CASES

1.

Option b is inconsistent because the regression coefficient and the correlation coefficient must have the same sign.

2.

a. When GNP increases by 1 (billion dollars), earnings increase by an average of .06 billion dollars.

b. If GNP is equal to zero, we expect earnings to be .078 billion dollars.

3.

The regression equation is
Sales = 828 + 10.8 AdvExpend

Predictor    Coef    SE Coef   T     P
Constant     828.1   136.1     6.08  0.000
AdvExpend    10.787  2.384     4.52  0.002

S = 67.1945  R-Sq = 71.9%  R-Sq(adj) = 68.4%

Analysis of Variance

Source          DF      SS     MS      F      P
Regression       1   92432  92432  20.47  0.002
Residual Error   8   36121   4515
Total            9  128552

a. Yes, the regression is significant. Reject H0: β1 = 0 using either the t value 4.52 and its p value .002, or the F ratio 20.47 and its p value .002.

b. Y = 828 + 10.8X

c. Y = 828 + 10.8(50) = $1368

d. 72% since r² = .719

e. Unexplained variation (SSE) = 36,121

4.

The regression equation is
Time = 0.620 + 0.109 Value

Predictor    Coef      SE Coef   T      P
Constant     0.6202    0.2501    2.48   0.038
Value        0.10919             10.75  0.000

Analysis of Variance

Source          DF      SS      MS       F       P
Regression       1  25.622  25.622  115.52   0.000
Residual Error   8   1.774   0.222
Total            9  27.396

a. Yes, the regression is significant. Reject H0: β1 = 0 using either the t value 10.75 and its p value .000, or the F ratio 115.52 and its p value .000.

b. Y = .620 + .109X

e. Unexplained variation (SSE) = 1.774

f. Total variation (TSS) = 27.396

Point forecast: Y = .620 + .1092(3) = 0.948

99% interval forecast: Y ± t·sf, where

sf = sy.x * sqrt(1 + 1/n + (X - X̄)²/Σ(X - X̄)²)
   = .471 * sqrt(1 + 1/10 + (3 - 19.78)²/2148.9)
   = .471 * sqrt(1 + .1 + .131)
   = .471 * sqrt(1.231)
   = .471(1.110) = .523
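A short Python sketch of this interval computation, using the summary numbers from the solution above (scipy is used only for the t multiplier):

import numpy as np
from scipy import stats

n, x_bar, Sxx, s_yx = 10, 19.78, 2148.9, 0.471
x_new, fit = 3, 0.948

sf = s_yx * np.sqrt(1 + 1/n + (x_new - x_bar)**2 / Sxx)   # forecast standard error
t99 = stats.t.ppf(0.995, df=n - 2)                        # two-sided 99%, df = n - 2
print(round(sf, 3), (round(fit - t99*sf, 3), round(fit + t99*sf, 3)))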

5.

The interval forecast is very wide because of the small sample size and the large confidence coefficient. Not useful.

a, b and d.

The regression equation is
Cost = 208.2 + 70.92 Age

Analysis of Variance

Source          DF      SS      MS      F      P
Regression       1  634820  634820  50.96  0.000
Error            7   87197   12457
Total            8  722017

e. Reject H0: β1 = 0 at the 5% level since F = 50.96 and its p value = .000 < .05. Could also use t = 7.14, the t value associated with the slope coefficient, and its p value = .000. The correlation coefficient is significantly different from 0 since the slope coefficient is significantly different from 0.

f. Y = 208.20 + 70.92(5) = 562.80 or $562.80

6.

a, b and d.

The regression equation is
Books = 32.46 + 36.41 Feet

Analysis of Variance

Source          DF       SS       MS      F      P
Regression       1  27032.3  27032.3  83.74  0.000
Error            9   2905.4    322.8
Total           10  29937.6

e. Reject H0: β1 = 0 at the 10% level since F = 83.74 and its p value = .000 < .10. Could also use t = 9.15, the t value associated with the slope coefficient, and its p value = .000. The correlation coefficient is significantly different from 0 since the slope coefficient is significantly different from 0.

f. Based on the residuals versus the fitted values plot, there is no reason to doubt the adequacy of the simple linear regression model.


7.

a, b, c & d.

The regression equation is
Orders = 15.8 + 1.11 Catalogs

Predictor    Coef    SE Coef   T     P
Constant     15.846  3.092     5.13  0.000
Catalogs     1.1132  0.3596    3.10  0.011

R-Sq = 48.9% (percentage of variation in Orders explained by Catalogs)
R-Sq(adj) = 43.8%

Analysis of Variance (ANOVA Table)

Source          DF      SS      MS     F     P
Regression       1  317.53  317.53  9.58  0.011
Residual Error  10  331.38   33.14
Total           11  648.92

New
Obs   Fit    SE Fit   90% CI           90% PI
1     26.98  1.93     (23.47, 30.48)   (15.97, 37.98)

e. Do not reject H0: β1 = 0 at the 1% level since t = 3.10 and its p value = .011 > .01. However, would reject H0: β1 = 0 at the, say, 5% level.

f. Do not reject H0: β1 = 0 at the 1% level since F = 9.58 and its p value = .011 > .01. Result is consistent with the result in e, as it should be.

g. See Fit and 90% PI at the end of the computer printout above. A 90% prediction interval for mail orders when 10(000) catalogs are distributed is (16, 38) --- 16,000 to 38,000.

8.

The regression equation is
Dollars = 3538 - 418 Rate

Predictor    Coef    SE Coef   T      P
Constant     3538.1  744.4     4.75   0.001
Rate         -418.3  150.8     -2.77  0.024

Analysis of Variance

Source          DF       SS      MS     F     P
Regression       1   978986  978986  7.69  0.024
Residual Error   8  1017824  127228
Total            9  1996810

a. There is a significant (at the 5% level) negative relationship between these variables. Reject H0: β1 = 0 at the 5% level since t = -2.77 and its p value = .024 < .05.

b. The data set is small. Moreover, r² = .49, so only 49% of the variation in investment dollars is explained by interest rate. Finally, the last observation (6.2, 1420) has a large influence on the location of the fitted straight line. If this observation is deleted, there is a considerable change in the slope (and intercept) of the fitted line. Using the original straight line equation for prediction is suspect.

c. A forecast can be calculated. It is 1865. However, the 95% prediction interval is wide. The forecast is unlikely to be useful without additional information. See comments in b.

d. See answer to b.

e. It seems reasonable to say movements in interest rate cause changes in the level of investment.

9.

a. The firms seem to be using very similar rationale since r = .959. Also, from the fitted line plot below, notice the fitted line is not far from the 45° line through the origin (with intercept 0 and slope 1).

b. If ABC bids 101, the predicted competitor's bid is 101.212. A 95% prediction interval (PI) is given below.

New
Obs   Fit      SE Fit   95% CI               95% PI
101   101.212  0.164    (100.872, 101.552)   (99.637, 102.786)

c. Assume normally distributed errors about the population regression line and treat the least squares line as if it were the population regression line (n is reasonably large in this case). Then at an ABC bid of 101, possible competitor bids are normally distributed about the fitted value 101.212 with a standard deviation estimated by sy.x = .743. Consequently, the probability that ABC will have the bid is P(Z ≥ (101 - 101.212)/.743) = P(Z ≥ -.285) = .61.
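A sketch of the probability calculation in part c, using the fit and sy.x from the output above:

from scipy import stats

fit, s_yx, abc_bid = 101.212, 0.743, 101.0
z = (abc_bid - fit) / s_yx             # standardize ABC's bid
p_win = 1 - stats.norm.cdf(z)          # P(competitor bids above ABC's bid)
print(round(z, 3), round(p_win, 2))    # about -0.285 and .61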

10.

a. Only if the sample size is large enough. The t statistic associated with the

slope coefficient or the F ratio should be consulted to determine if the population

regression line slope is significantly different from a horizontal line with zero

slope.

b. It will typically produce significant results, not necessarily useful results.

The coefficient of determination, r2, might be small, so forecasting using the fitted

line is unlikely to produce a useful result.

11.

The regression equation is
Permits = 2217 - 145 Rate

Predictor    Coef     SE Coef   T      P
Constant     2217.4   316.2     7.01   0.000
Rate         -144.95  27.96     -5.18  0.001

Analysis of Variance

Source          DF      SS      MS     F      P
Regression       1  559607  559607  26.88  0.001
Residual Error   7  145753   20822
Total            8  705360

c. Reject H0: β1 = 0 at the 5% level since t = -5.18 and its p value = .001 < .05.

d. If the interest rate increases by 1%, on average the number of building permits will decrease by 145.

e. From the computer output above, r² = .793.

f. Interest rate explains about 79% of the variation in the number of building permits issued.

12.

This exercise again concerns the relationship between interest rates and building permits issued. The population for this problem contains X-Y data points whose correlation coefficient is .846 (ρ = .846). Each student will have a different answer; however, most will conclude that Y is linearly related to X, that r is around .846, and that r-squared is around .72. Any student who fails to find a meaningful relationship between X and Y will be the victim of a Type II error.

13.

The regression equation is
Defectives = -17.7 + 0.355 BatchSize

Predictor    Coef      SE Coef   T      P
Constant     -17.731   4.626     -3.83  0.003
BatchSize    0.35495   0.02332   15.22  0.000

Analysis of Variance

Source          DF     SS     MS       F       P
Regression       1  14331  14331  231.77  0.000
Residual Error  11    680     62
Total           12  15011

c. Reject H0: β1 = 0 at the 5% level since t = 15.22 and its p value = .000 < .05

d. The Residuals Versus Fits plot shows curvature in the scatter not captured by the straight line fit.

e. A model with a quadratic term in Batch Size fits well. Results with Size**2 as the predictor variable follow.

The regression equation is
Defectives = 4.70 + 0.00101 Size**2

Predictor    Coef        SE Coef      T      P
Constant     4.6973      0.9997       4.70   0.001
Size**2      0.00100793  0.00001930   52.22  0.000

Analysis of Variance

Source          DF     SS     MS        F       P
Regression       1  14951  14951  2727.00  0.000
Residual Error  11     60      5
Total           12  15011

f. Reject H0: β1 = 0 at the 5% level since t = 52.22 and its p value = .000 < .05

New
Obs   Fit     SE Fit   95% CI             95% PI
1     95.411  1.173    (92.829, 97.993)   (89.647, 101.175)

j. Memo to Harry showing the value of transforming the independent (predictor) variable.
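The transformation Harry's memo describes can be sketched in Python. The batch sizes and defect counts below are simulated, hypothetical values; only the idea of regressing Defectives on the squared predictor is from the solution:

import numpy as np

rng = np.random.default_rng(1)
batch = np.array([100, 125, 150, 175, 200, 225, 250, 275, 300], dtype=float)
defects = 4.7 + 0.001 * batch**2 + rng.normal(0, 2, batch.size)  # simulated data

b1, b0 = np.polyfit(batch**2, defects, 1)   # straight line fit in Size**2
print(f"Defectives = {b0:.2f} + {b1:.5f} Size**2")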

14.

a.

b. Market values (as the response variable) are regressed on assessed values (as predictor variable). There is a considerable amount of unexplained variation.

d. F = 16.87, p value = .000. Regression is highly significant.

e. An assessed value this far from the others is outside the region covered by the data (see scatter diagram). The linear relation may no longer hold. Y = 98.1

Unusual Observations

Obs   Assessed   Market   Fit      SE Fit   Residual   St Resid
3     64.6       87.200   87.423   1.199    -0.223     -0.10 X
26    72.0       97.200   90.483   0.578     6.717      2.83R

X denotes an observation whose X value gives it large leverage.

15.

b. r² = .751. About 75% of the variation in operating expenses is explained by player costs.

c. F = 72.6, p value = .000 < .10. The regression is clearly significant at the α = .10 level.

d. Coefficient on X = player costs is 1.30. Is H0: β1 = 2 reasonable?

t = (1.30 - 2.0)/.153 = -4.58 (p value = .000) suggests β1 = 2 is not supported by the data. It appears that operating expenses have a fixed cost component represented by the intercept b0 = 18.88, and are then about 1.3 times player costs.

e. Y = 58.6, Y ± 2.064 sf

f. Unusual Observations

Obs   PlayCosts   OpExpens   Fit     SE Fit   Residual   St Resid
7     18.0        60.00      42.31   1.64     17.69      3.45R

Team 7 has unusually low player costs relative to operating expenses.

16.

The regression equation is
Consumption = -811 + 0.226 Families

Predictor    Coef      SE Coef   T      P
Constant     -811.0    553.6     -1.47  0.158
Families     0.22596   0.05622    4.02  0.001

Analysis of Variance

Source          DF        SS        MS       F      P
Regression       1  10855642  10855642  16.15  0.001
Residual Error  21  14113925    672092
Total           22  24969567

Although the regression is significant, the residual versus fit plot indicates the magnitudes of the residuals increase with the level. This behavior and the scatter diagram in a suggest that consumption is not evenly distributed about the regression line. That is, the data have a megaphone-like appearance. A straight line regression model for these data is not adequate.

c & d. The response variable is converted to the natural log of newsprint consumption (LnConsum).

The regression equation is
LnConsum = 5.70 + 0.000134 Families

Predictor    Coef        SE Coef     T      P
Constant     5.6987      0.3302      17.26  0.000
Families     0.00013413  0.00003353   4.00  0.001

S = 0.488968  R-Sq = 43.2%  R-Sq(adj) = 40.5%

Analysis of Variance

Source          DF      SS      MS     F      P
Regression       1  3.8252  3.8252  16.00  0.001
Residual Error  21  5.0209  0.2391
Total           22  8.8461

The regression is significant (F = 16, p value = .001), although only 43% of the variation in ln(consumption) is explained by families. The residual plots above suggest the straight line regression of ln(consumption) on families is adequate. This simple linear regression model with ln(consumption) is better than the same model with consumption as the response.

e. Using the results in c, a forecast of ln(consumption) with 10,000 families is 7.040, so a forecast of consumption is 1,141.

f. Other variables that will influence newsprint consumption include number of

papers published and retail sales (influencing newspaper advertising).
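A minimal sketch of the forecast in part e, using the fitted coefficients reported above for the log-consumption model:

import numpy as np

b0, b1 = 5.6987, 0.00013413        # from the LnConsum output above
families = 10000
ln_forecast = b0 + b1 * families   # forecast on the log scale (7.040)
print(round(ln_forecast, 3), round(np.exp(ln_forecast)))   # about 1,141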

17.

a. Can see from fitted line plot below that growth in number of steakhouses is

exponential, not linear.

b. A straight line fit to the logged data estimates the (constant) growth rate.

The regression equation is
LnLocations = 0.348 + 0.820 Year

Predictor    Coef     SE Coef   T     P
Constant     0.3476   0.3507    0.99  0.378
Year         0.81990  0.09004   9.11  0.001

Analysis of Variance

Source          DF      SS      MS     F      P
Regression       1  11.764  11.764  82.91  0.001
Residual Error   4   0.568   0.142
Total            5  12.332

c. Forecast of ln(locations) for 2007 is .348 + .820(20) = 16.748. Hence a forecast of the number of Outback Steakhouse locations for 2007 is e^16.748 or 18,774,310, an absurd number. This example illustrates the danger of extrapolating a trend (growth) curve far into the future.
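A sketch of the extrapolation in part c, using the fitted coefficients from the output above:

import numpy as np

b0, b1 = 0.348, 0.820            # LnLocations trend coefficients
t_2007 = 20                      # time index for 2007
ln_locations = b0 + b1 * t_2007  # 16.748 on the log scale
print(f"{np.exp(ln_locations):,.0f}")   # roughly 18.8 million -- absurd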

18.

a, Can see from fitted line plot below that growth in number of copy centers is

exponential, not linear.

b. A straight line fit to the logged data estimates the (constant) growth rate.

The regression equation is
LnCenters = -0.305 + 0.483 Time

Predictor    Coef     SE Coef   T      P
Constant     -0.3049  0.1070    -2.85  0.015
Time         0.48302  0.01257   38.42  0.000

Analysis of Variance

Source          DF      SS      MS        F       P
Regression       1  53.078  53.078  1476.38  0.000
Residual Error  12   0.431   0.036
Total           13  53.509

c. Forecast of ln(centers) for 2012 is -.305 + .483(20) = 9.355. Hence a forecast of the number of On The Double copy centers for 2012 is e^9.355 or 11,556, an unlikely number. This example illustrates the possible danger of extrapolating a trend (growth) curve some distance into the future.

19.

a. Intercept b0 = 17.954, Slope b1 = .2715

b. Cannot reject H0 at the 10% level since the t value associated with the slope coefficient, 1.57, has a p value of .138 > .10. The regression is not significant. There does not appear to be a relationship between profits per employee and number of employees.

c. r2 = .15. Only 15% of the variation in profits per employee is explained by the

number of employees.

d. The regression is not significant. There is no point in using the fitted function to

generate forecasts for profits per employee for a given number of employees.

20.

The regression equation is
Profits = 25.0 - 0.713 Employees

Predictor    Coef     SE Coef   T      P
Constant     25.013   5.679     4.40   0.001
Employees    -0.7125  0.2912    -2.45  0.029

Analysis of Variance

Source          DF       SS      MS     F     P
Regression       1   579.40  579.40  5.99  0.029
Residual Error  13  1258.40   96.80
Total           14  1837.80

The regression is now significant at the 5% level (t value = -2.45, p value = .029 < .05). r² has increased from 15% to 31.5%. These results suggest there is a linear relationship between profits per employee and number of employees. A single observation can have a large influence on the regression analysis, particularly when the number of observations is relatively small. However, the relatively small r² of 31.5% indicates there will be a fair amount of uncertainty associated with any forecast of profits per employee. Dun and Bradstreet should not be thrown out unless there is some good (non-numerical) reason not to include this firm with the others.

21.

The regression equation is
Actual = 0.68 + 0.922 Estimate

Predictor    Coef     SE Coef   T      P
Constant     0.683    1.691     0.40   0.690
Estimate     0.92230  0.08487   10.87  0.000

Analysis of Variance

Source          DF      SS      MS       F       P
Regression       1  3833.4  3833.4  118.09  0.000
Residual Error  24   779.1    32.5
Total           25  4612.5

b. The regression is significant (t value = 10.87, p value = .000; F ratio = 118.09, p value = .000).

c. r² = .831, or 83.1% of the variation in actual costs is explained by estimated costs.

d. If estimated costs are a perfect predictor of actual costs, then β0 = 0, β1 = 1. The estimated intercept coefficient, .683, is consistent with β0 = 0. With the t value = .40 and its p value = .69, cannot reject the null hypothesis H0: β0 = 0. To check the hypothesis H0: β1 = 1, compute t = (.922 - 1)/.0849 = -.92, which is not in the rejection region for a two-sided test at any reasonable significance level. The estimated slope coefficient, .922, is consistent with β1 = 1.

e. The plot of the residuals versus the fitted values has a megaphone-like appearance. The residuals are numerically smaller for smaller projects than for larger projects. Estimated costs are more accurate predictors of actual costs for inexpensive (smaller) projects than for expensive (larger) projects.

22.

a. The regression is significant (t value = 14.71, p value = .000).

b. r² = .90, or 90% of the variation in ln(actual costs) is explained by ln(estimated costs).

c. If ln(estimated costs) are a perfect predictor of ln(actual costs), then β0 = 0, β1 = 1. The estimated intercept coefficient, .003, is consistent with β0 = 0. With the t value = .02 and its p value = .987, cannot reject the null hypothesis H0: β0 = 0. To check the hypothesis H0: β1 = 1, compute t = (.968 - 1)/.0658 = -.49, which is not in the rejection region for a two-sided test at any reasonable significance level. The estimated slope coefficient, .968, is consistent with β1 = 1.

d. ln(24) = 3.178, so forecast of ln(actual cost) = .0026 + .968(3.178) = 3.079. Forecast of actual cost is e^3.079 = 21.737.
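A sketch of the test of H0: β1 = 1, using the slope estimate and standard error from part c above. The degrees of freedom are an assumption here (n = 26 is taken from problem 21's ANOVA table; problem 22's sample size is not shown in the output):

from scipy import stats

b1, se_b1, n = 0.968, 0.0658, 26     # n assumed; see lead-in
t_stat = (b1 - 1) / se_b1            # about -0.49
p_value = 2 * stats.t.cdf(-abs(t_stat), df=n - 2)
print(round(t_stat, 2), round(p_value, 2))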

CASE 6-1: TIGER TRANSPORT

This case asks students to summarize the analysis in a report to management. We find this a useful

exercise since it requires students to put the application and results of a statistical procedure into their

own words. If they are able to do this, they understand the technique.

This case illustrates the use of regression analysis in a situation where determining a good

regression equation is only the first step. The results must then be priced out in order to

arrive at a rational decision regarding a pricing policy. This situation can generate a discussion regarding

the general nature of quantitative techniques: they aid in the decision-making

process rather than replace it. Possible policies regarding the small-load charge can be

discussed after the cost of such loads is determined. One approach would be to take small loads

at company cost, which is low. The resultant goodwill might pay off in increased regular

business. Another would be to charge a low cost for small loads but only if the customer agrees to

book a certain number of large loads.

The low out-of-pocket cost involved in adding small loads can focus management attention

in other directions. Since no significant costs need to be recovered by the small load charge,

a policy based on other considerations is appropriate.

CASE 6-2: BUTCHER PRODUCTS, INC.

1.

The 89 degree temperature is 24 degrees off ideal (89 - 65 = 24). This value is placed into

the regression equation yielding a forecast number of units per day of 338.

2.

Once again, the temperature is 24 degrees from ideal (65 - 41 = 24). For X = 24, a forecast

of 338 units is calculated from the regression equation.

3.

Since there is a fairly strong relationship between output and deviation from ideal

temperature (r = -.80), higher output may well result from efforts to control the

temperature in the work area so that it is close to 65 degrees. Gene should consider ways

to do this.

4.

Gene has made a decent start towards finding an effective forecasting tool. However,

since about 36% of the variation in output is unexplained, he should look for additional

important predictor variables.

1.

The correlation coefficient is r = .927. The corresponding t = 8.9 for testing H0: ρ = 0 has a p value of .000. We reject H0 and conclude the correlation between days absent and employee age holds for the population.

2.

Y = -4.28 + .254X


3.

r2 = .859. About 86% of Y's (absent days) variability can be explained through

knowledge of X (employee age).

4.

The null hypothesis H0: β1 = 0 is rejected using either t = 8.9, p value = .000 or the F = 79.3 with p value = .000. There is a significant relation between absent days and employee age.

5.

Placing X = 24 into the prediction equation yields a Y forecast of 1.8 absent days per year.

6.

If time and cost are not factors, it might be helpful to take a larger sample to see if these

small sample results hold. If results hold, a larger sample will very likely produce

more precise interval forecasts.

7.

The fitted function is likely to produce useful forecasts, although 95% prediction

intervals can be fairly wide because of the small sample size.

1.

After John uses simple regression analysis to forecast his monthly sales volume, he is

not satisfied with the results. The low r-squared value (56.3%) disappoints him.

The high seasonal variation should be discussed as a cause of his poor fit

when using only the month number to forecast sales. The possibility of using

dummy variables to account for the monthly effect is a possibility. After this topic

is covered in Chapter 7, you can have the students return to this case.

2.

Not adequate.

3.

The idea of serial correlation can be mentioned at this point. The possibility of

autocorrelated residuals can be introduced based on John's Durbin-Watson statistic.

In fact, the DW is low, indicating definite autocorrelation. A class discussion about

this problem and what might be done about it is useful. After this topic is covered

in Chapter 8, you can have the students return to this case. We hope that by this time students appreciate the difficulties involved in real-life forecasting. Forecasting compromises and multiple attempts are the norm, not the exception.

1.

The regression of Clients on Stamps is significant but not very useful.

The regression equation is
Clients = 32.7 + 0.00349 Stamps

Predictor    Coef      SE Coef    T     P
Constant     32.68     31.94      1.02  0.312
Stamps       0.003487  0.001076   3.24  0.002

Analysis of Variance

Source          DF       SS      MS      F     P
Regression       1   5891.9  5891.9  10.51  0.002
Residual Error  46  25791.4   560.7
Total           47  31683.2

The correlation of Clients and Index = 0.752. The relation is significant (see below).

The regression equation is
Clients = -199 + 2.94 Index

Predictor    Coef     SE Coef   T      P
Constant     -198.65  28.64     -6.94  0.000
Index        2.9400   0.2619    11.23  0.000

Analysis of Variance

Source          DF     SS     MS       F       P
Regression       1  49993  49993  126.04  0.000
Residual Error  97  38475    397
Total           98  88468

2.

The regression equation is Clients = -199 + 2.94 BI

Jan 1993: Clients = -199 + 2.94(125) = 168.5
Feb 1993: Clients = -199 + 2.94(125) = 168.5
Mar 1993: Clients = -199 + 2.94(130) = 183.2

Note: Students might develop a new equation that leaves out the first three months of data for 1993. This is a better way to determine whether the model works, and the results are:

The regression equation is
Clients = -204 + 2.99 Index

Predictor    Coef     SE Coef   T      P
Constant     -203.85  31.37     -6.50  0.000
Index        2.9898   0.2883    10.37  0.000

S = 20.0046  R-Sq = 53.4%  R-Sq(adj) = 52.9%

Analysis of Variance

Source          DF     SS     MS       F       P
Regression       1  43028  43028  107.52  0.000
Residual Error  94  37617    400
Total           95  80645

Feb 1993: Clients = -204 + 2.99(125) = 169.8
Mar 1993: Clients = -204 + 2.99(130) = 184.7

Regressing Clients on the reciprocal of Index produces a little better straight line fit. The results for this transformed predictor variable follow.

The regression equation is
Clients = 470 - 37719 RecipIndex

Predictor     Coef    SE Coef   T       P
Constant      469.58  32.07     14.64   0.000
RecipIndex    -37719  3461      -10.90  0.000

Analysis of Variance

Source          DF     SS     MS       F       P
Regression       1  45015  45015  118.76  0.000
Residual Error  94  35630    379
Total           95  80645

3.

            Actual   Forecast   Forecast   Forecast (RecipIndex predictor)
Jan 1993    152      169        170        168
Feb 1993    151      169        170        168
Mar 1993    199      183        185        180

4.

Only if the business activity index could itself be forecasted accurately. Otherwise, it is

not a viable predictor because the values for the business activity index are not

available in a timely fashion.

5.

6.

If a good regression equation can be developed in which the changes in the predictor

variable lead the response, it might be possible to accurately forecast the rest of 1993.

However, if the regression equation is based on coincident changes in the predictor

variable and response, forecasts for the rest of 1993 could not be developed since values

for the predictor variable are not known in advance.

1.

The four linear regression models are shown below. Both temperature and rainfall are potential predictor variables.

The regression equation is
Calls = 18366 + 467 Rate

Predictor    Coef    SE Coef   T      P
Constant     18366   1129      16.27  0.000
Rate         467.4   174.2     2.68   0.010

S = 1740.10  R-Sq = 11.0%  R-Sq(adj) = 9.5%

The regression equation is
Calls = 28582 - 137 Temp

Predictor    Coef     SE Coef   T      P
Constant     28582.2  956.0     29.90  0.000
Temp         -137.44  18.06     -7.61  0.000

The regression equation is
Calls = 20069 + 400 Rain

Predictor    Coef     SE Coef   T      P
Constant     20068.9  351.7     57.07  0.000
Rain         400.30   84.20     4.75   0.000

The regression equation is
Calls = 27980 - 0.0157 Members

49 cases used, 3 cases contain missing values

Predictor    Coef       SE Coef    T      P
Constant     27980      3769       7.42   0.000
Members      -0.015670  0.008703   -1.80  0.078

2. & 3. Sixty-five degrees was subtracted from the temperature variable. The variable used was the absolute value of the difference, with relative zero at 65 degrees Fahrenheit, labeled NewTemp.

The correlation coefficient between Calls and NewTemp is .724, indicating a fairly strong positive linear relationship. However, examination of the fitted line plot below suggests there is a curvilinear relation between Calls and NewTemp.

4.

The quadratic model below provides a better fit. The residual plots also indicate an adequate fit.

The regression equation is
Calls = 20044 + 5.38 NewTemp**2

Predictor     Coef     SE Coef   T      P
Constant      20044.4  203.1     98.68  0.000
NewTemp**2    5.3817   0.5462    9.85   0.000

Analysis of Variance

Source          DF         SS         MS       F      P
Regression       1  119870408  119870408  97.08  0.000
Residual Error  55   67910916    1234744
Total           56  187781324
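The NewTemp transformation and the squared-predictor fit can be sketched in Python. The temperatures and call volumes below are hypothetical; only the transformation NewTemp = |Temp - 65| and the regression of Calls on NewTemp**2 are from the solution:

import numpy as np

temp = np.array([30, 40, 50, 60, 65, 70, 80, 90], dtype=float)
calls = np.array([26500, 24200, 21800, 20300, 20000, 20200, 21500, 23600], dtype=float)

new_temp = np.abs(temp - 65)                  # relative zero at 65 degrees
b1, b0 = np.polyfit(new_temp**2, calls, 1)    # straight line fit in NewTemp**2
print(f"Calls = {b0:.0f} + {b1:.2f} NewTemp**2")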

CHAPTER 7

MULTIPLE REGRESSION

ANSWERS TO PROBLEMS AND CASES

1.

A good predictor variable is highly related to the dependent variable but not too

highly related to other predictor variables.

2.

The population of Y values is normally distributed about E(Y), the plane formed by the

regression equation. The variance of the Y values around the regression plane is

constant. The residuals are independent of each other, implying a random sample. A linear

relationship exists between Y and each predictor variable.

3.

The net regression coefficient measures the average change in the dependent variable per

unit change in the relevant independent variable, holding the other independent variables

constant.

4.

5.

6.

a. A correlation matrix displays the simple correlation coefficients for every possible pair of variables in the analysis.

b. The proportion of Y's variability that can be explained by the predictor

variables is given by R2. It is also referred to as the coefficient of

determination.

c. Collinearity results when predictor variables are highly correlated among

themselves.

d. A residual is the difference between an actual Y value and Y , the value

predicted using the sample regression plane.

e. A dummy variable is used to determine the relationship between a qualitative

independent variable and a dependent variable.

f. Step-wise regression is a procedure for selecting the best regression

function by adding or deleting a single independent variable at different

stages of its development.

7.

b. The entries in a correlation matrix reflected about the main diagonal are the

same. For example, r32 = r23.

c. Variables 5 and 6 with correlation coefficients of .79 and .70, respectively.

d. The r14 = -.51 indicates a negative linear relationship.

e. Yes. Variables 5 and 6 are to some extent collinear, r56 = .69.

f. Models that include variables 4 and 6 or variables 2 and 5 are possibilities. The

predictor variables in these models are related to the dependent variable and not

too highly related to each other.

g. Variable 5.

8.

a. Correlations:

         Time    Amount
Amount   0.959
Items    0.876   0.923

The regression equation is
Time = 0.422 + 0.0871 Amount - 0.039 Items

Predictor    Coef     SE Coef   T      P      VIF
Constant     0.4217   0.5864    0.72   0.483
Amount       0.08715  0.01611   5.41   0.000  6.756
Items        -0.0386  0.1131    -0.34  0.737  6.756

Analysis of Variance

Source          DF       SS      MS      F      P
Regression       2  128.988  64.494  87.71  0.000
Residual Error  15   11.030   0.735
Total           17  140.018

Amount and Items are highly collinear (correlation = .923, VIF = 6.756). Both variables are not needed in the regression function. Deleting Items, with its non-significant t value, gives the best regression below.

Time = 0.263 + 0.0821 Amount

Predictor    Coef      SE Coef    T      P
Constant     0.2633    0.3488     0.75   0.461
Amount       0.082068  0.006025   13.62  0.000

S = 0.833503  R-Sq = 92.1%  R-Sq(adj) = 91.6%

Analysis of Variance

Source          DF      SS      MS       F       P
Regression       1  128.90  128.90  185.54  0.000
Residual Error  16   11.12    0.69
Total           17  140.02

b. From the full model, when one item is added (amount held constant), checkout time decreases by .039, which does not make sense.

c. Using the best model:
Time = .2633 + .0821(28) = 2.5621
e = Y - Y = 2.4 - 2.5621 = -.1621

d. Using the best model, sy.x = .8335

e. The standard deviation of Y is estimated by .8335.

f. Using the best model, the number of Items is not relevant.

g. Using the best model, the 95% prediction interval (interval forecast) for Amount = $70 is given below, based on the single predictor variable Amount.

New
Obs   Fit    SE Fit   95% CI           95% PI
1     6.008  0.238    (5.504, 6.512)   (4.171, 7.845)
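With only two predictors, the VIF reported above follows directly from the correlation between them, VIF = 1/(1 - r²). A one-line check in Python, using r = .923 from the correlation matrix:

r = 0.923
print(round(1 / (1 - r**2), 3))   # about 6.75, matching the reported VIF of 6.756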

9.

a. Correlations:

         Food    Income
Income   0.884
Size     0.737   0.867

Income is highly correlated with Food (expenditures) and, to a lesser extent, so is Size. However, the predictor variables Income and Size are themselves highly correlated, indicating there is a potential multicollinearity problem.

b. The regression equation is
Food = 3.52 + 2.28 Income - 0.41 Size

Predictor    Coef    SE Coef   T      P      VIF
Constant     3.519   3.161     1.11   0.302
Income       2.2776  0.8126    2.80   0.026  4.016
Size         -0.411  1.236     -0.33  0.749  4.016

When income is increased by one thousand dollars holding family size constant, the average increase in annual food expenditures is 228 dollars. When family size is increased by one person holding income constant, the average decrease in annual food expenditures is 41 dollars. Since family size is positively related to food expenditures, r = .737, it doesn't make sense that a decrease in expenditures would occur.

c. Multicollinearity is a problem as indicated by VIFs of about 4.0. Size should be dropped from the regression function and the analysis redone with only Income as the predictor variable.

10.

a. Both high temperature and traffic count are positively related to the number of six-packs sold and have potential as good predictor variables. There is some collinearity (r = .68) between the predictor variables, but perhaps not enough to limit their value.

b. Reject H0: β1 = 0 if |t| > 2.898

t = b1/sb1 = .78207/.22694 = 3.45

Reject H0 because 3.45 > 2.898 and conclude that the regression coefficient for the high temperature variable is unequal to zero in the population.

Reject H0: β2 = 0 if |t| > 2.898

t = b2/sb2 = .06795/.02026 = 3.35

Reject H0 because 3.35 > 2.898 and conclude that the regression coefficient for the traffic count variable is unequal to zero in the population.

c. Y = -26.706 + .78207(60) + .06795(500) = 54 (six-packs)

d. R² = 1 - Σ(Y - Ŷ)²/Σ(Y - Ȳ)² = 1 - 2727.9/14316.9 = .81

We are able to explain 81% of the variation in the number of six-packs sold using knowledge of daily high temperature and daily traffic count.

e. sy.x's = sqrt(Σ(Y - Ŷ)²/(n - k - 1)) = sqrt(2727.9/(20 - 3)) = sqrt(160.46) = 12.67

f. If there is an increase of one degree in high temperature while the traffic count is held constant, beer sales increase on an average of .78 six-packs.

g. The predictor variables explain 81% of the variation in six-packs sold. Both predictor variables are significant. It would be prudent to examine the residuals (not available in the problem) before deciding to use the fitted regression function for forecasting, however.

11.

a. Scatter diagram follows. Female drivers indicated by solid circles, male drivers by diamonds.

b. For a given age of car, female drivers expect to get about 1.2 more miles per gallon than male drivers.

c. Fitted line for female drivers has equation: Y = 26.21 - 1.04 X1
Fitted line for male drivers has equation: Y = 25.5 - 1.04 X1
(Parallel lines with different intercepts)

d. A single straight line fit (ignoring gender) passes between the points representing female drivers and the points representing male drivers. The straight line equation over-predicts mileage for male drivers and under-predicts mileage for female drivers. It is important to include the gender variable in this regression function.

12.

a. Correlations: Sales, Outlets, Auto

          Sales   Outlets
Outlets   0.739
Auto      0.548   0.670

Number of retail outlets is positively related to annual sales, r12 = .74, and is potentially a good predictor variable. Number of automobiles registered is moderately related to annual sales, r13 = .55, and is positively correlated with number of retail outlets, r23 = .67. Given number of retail outlets in the regression function, number of automobiles registered may not be required.

b. The regression equation is
Sales = 10.1 + 0.0110 Outlets + 0.195 Auto

Predictor    Coef      SE Coef    T     P      VIF
Constant     10.109    7.220      1.40  0.199
Outlets      0.010989  0.005200   2.11  0.068  1.813
Auto         0.1947    0.6398     0.30  0.769  1.813

Analysis of Variance

Source          DF      SS     MS     F     P
Regression       2  1043.7  521.8  4.91  0.041
Residual Error   8   849.6  106.2
Total           10  1893.2

New
Obs   Fit    SE Fit   95% CI           95% PI
1     37.00  7.15     (20.50, 53.49)   (8.07, 65.93)

As can be seen from the regression output, it appears as if each predictor variable is not significant (at the 5% level); however, the regression is significant at the 5% level. This is one of the things that can happen when the predictor variables are collinear.

c. The forecast for region 1 is 37, with a prediction error of 52.3 - 37 = 15.3. However, it is not a good idea to use this fitted function for forecasting. If the regression is rerun after deleting Auto, Outlets (and the regression) is significant at the 1% level and R² is virtually unchanged at 55%.

d. The standard error of estimate is 10.3, which is quite large. As explained in part b, the fitted function with both predictor variables should not be used to forecast. Even if the regression is rerun with the single predictor Outlets, R² = 55% and the relatively large standard error of the estimate suggest there will be a lot of uncertainty associated with any forecast.

e. sy.x's = sqrt(Σ(Y - Ŷ)²/(n - k - 1)) = sqrt(849.6/(11 - 3)) = sqrt(106.2) = 10.3

f. If one retail outlet is added while the number of automobiles registered remains constant, sales will increase by an average of .011 million or $11,000 dollars. If one million more automobiles are registered while the number of retail outlets remains constant, sales will increase by an average of .195 million or $195,000 dollars. However, these regression coefficients are suspect due to collinearity between the predictor variables.

g. New predictor variables should be tried.

13.

a. Correlations:

          Sales   Outlets   Auto
Outlets   0.739
Auto      0.548   0.670
Income    0.936   0.556     0.281

The regression equation is
Sales = -3.92 + 0.00238 Outlets + 0.457 Auto + 0.401 Income

Predictor    Coef      SE Coef    T      P      VIF
Constant     -3.918    2.290      -1.71  0.131
Outlets      0.002384  0.001572   1.52   0.173  2.473
Auto         0.4574    0.1675     2.73   0.029  1.854
Income       0.40058   0.03779    10.60  0.000  1.481

S = 2.66798  R-Sq = 97.4%  R-Sq(adj) = 96.2%

Analysis of Variance

Source          DF       SS      MS      F      P
Regression       3  1843.40  614.47  86.32  0.000
Residual Error   7    49.83    7.12
Total           10  1893.23

Adding Income to the regression function results in an increase in R² from 55% to 97%. In addition, the t value and corresponding p value for Income indicate the coefficient of this variable in the population is different from 0, given the other predictor variables. Notice, however, the regression should be rerun after deleting the insignificant predictor variable Outlets. The correlation matrix and the VIF numbers suggest Outlets is multicollinear with Auto and Income.

b. Predicted Values for New Observations

New
Obs   Fit     SE Fit   95% CI              95% PI
1     27.306  1.878    (22.865, 31.746)    (19.591, 35.020)

New
Obs   Outlets   Auto   Income
1     2500      20.2   40.0

Annual sales for region 12 are predicted to be 27.306 million.

c. The standard error of estimate has been reduced to 2.67 from 10.3 and R² has increased to 97%. The 95% PI in part b is fairly narrow. The forecast for region 12 sales in part b should be accurate.

d. The best choice is to drop Outlets from the regression function. If this is done, the regression equation is

Predictor    Coef      SE Coef    T      P      VIF
Constant     -4.027    2.468      -1.63  0.141
Auto         0.6209    0.1382     4.49   0.002  1.086
Income       0.43017   0.03489    12.33  0.000  1.086

Measures of fit are nearly the same as those for the full model and there is no longer a multicollinearity problem.

14.

a. Reject H0: β1 = 0 if |t| > 3.1.

t = .65/.05 = 13

Reject H0 and conclude that the regression coefficient for the aptitude test variable is significantly different from zero in the population.

Similarly, reject H0: β2 = 0 if |t| > 3.1.

t = 20.6/1.69 = 12.2

Reject H0 and conclude that the regression coefficient for the effort index variable is significantly different from zero in the population.

b. If the effort index increases one point while the aptitude test score remains constant, sales performance increases by an average of $20,600.

c. Y = 16.57 + .65(75) + 20.6(.5) = 75.62

d. Σ(Y - Ŷ)² = s²y.x's (n - 3) = (3.56)²(14 - 3) = 139.4

e. Σ(Y - Ȳ)² = s²y (n - 1) = (16.57)²(14 - 1) = 3569.3

f. R² = 1 - Σ(Y - Ŷ)²/Σ(Y - Ȳ)² = 1 - 139.4/3569.3 = 1 - .039 = .961

About 96% of the variation in sales performance can be explained through knowledge of the aptitude test score and the effort index.

g. R̄² = 1 - [SSE/(n - k - 1)]/[SST/(n - 1)] = 1 - (134.90/11)/(3569.3/13) = .955

15.

a. Scatter plot for cash purchases versus number of items (rectangles) and credit card purchases versus number of items (solid circles) follows.

b. Notice that for a given number of items, sales from cash purchases are estimated to be about $18.60 less than gross sales from credit card purchases.

c. The regression in part b is significant. The number of items sold and whether the purchases were cash or credit card explain approximately 83% of the variation in gross sales. The predictor variable Items is clearly significant. The coefficient of the dummy variable X2 is significantly different from 0 at the 10% level but not at the 5% level. From the residual plots below we see that there are a few large residuals (see, in particular, cash sales for day 25 and credit card sales for day 1); but overall, the plots do not indicate any serious departures from the usual regression assumptions.

e. sy.x's = 30.98, df = 47

145 ± 1.96(30.98) = ($84, $206)

f. The fitted function in part b is effectively two parallel straight lines given by the equations (see the sketch after this problem):

Cash purchases: Y = 13.61 + 5.99 Items - 18.6(1) = -4.98 + 5.99 Items
Credit card purchases: Y = 13.61 + 5.99 Items

If we fit separate straight lines to the two types of purchases we get:

Cash purchases: Y = -.60 + 5.78 Items (R² = 90.5%)
Credit card purchases: Y = 10.02 + 6.46 Items (R² = 66.0%)

Predictions for cash sales and credit card sales will not be too much different for the two procedures (one prediction equation or two individual equations). In terms of R², the single equation model falls between the fits of the separate models for cash purchases and credit card purchases, but closer to the higher number for cash purchases. For convenience and overall good fit, prefer the single equation with the dummy variable.
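The dummy-variable model in part f can be sketched with a least squares fit in Python. The daily data below are simulated; only the model form (X2 = 1 for cash, 0 for credit card) is from the solution:

import numpy as np

rng = np.random.default_rng(7)
items = rng.integers(5, 40, size=50).astype(float)
cash = rng.integers(0, 2, size=50).astype(float)     # dummy variable X2
sales = 13.6 + 6.0*items - 18.6*cash + rng.normal(0, 30, 50)

X = np.column_stack([np.ones(50), items, cash])      # design matrix with intercept
b, *_ = np.linalg.lstsq(X, sales, rcond=None)        # least squares estimates
print(f"Sales = {b[0]:.2f} + {b[1]:.2f} Items {b[2]:+.2f} X2")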

16.

a. Correlations:

        WINS     ERA      SO       BA       RUNS     HR
ERA    -0.494
SO      0.049   -0.393
BA      0.446    0.015   -0.007
RUNS    0.627    0.279   -0.209    0.645
HR      0.209    0.490   -0.215    0.154    0.664
SB      0.190   -0.404   -0.062   -0.207   -0.162   -0.305

SO is essentially uncorrelated with WINS.

BA is moderately positively correlated with WINS and is also correlated with the predictor variable RUNS.

RUNS is the predictor variable most highly correlated with WINS and will be the first variable to enter the regression function in a stepwise program. RUNS is fairly highly correlated with BA, so once RUNS is in the regression function, BA is unlikely to be needed.

HR is essentially not related to WINS.

SB is essentially not related to WINS.

b. The stepwise results are the same for an alpha-to-enter = alpha-to-remove = .05 or .15 (the Minitab default) or F-to-remove = F-to-enter = 4.

Response is WINS on 6 predictors, with N = 26

Step          1       2
Constant    20.40   71.23

RUNS        0.087   0.115
T-Value      3.94   10.89
P-Value     0.001   0.000

ERA                 -18.0
T-Value             -9.52
P-Value             0.000

S            7.72    3.55
R-Sq        39.28   87.72

The fitted function from the stepwise program is:

WINS = 71.23 + .115 RUNS - 18 ERA with R² = 88%

17.

a. View will enter the stepwise regression function first since it has the largest

correlation with Price. After that the order of entry is difficult to determine from

the correlation matrix alone. Several of the predictor variable pairs are fairly highly

correlated so multicollinearity could be a problem. For example, once View is in the

model, Elevation may not enter (be significant). Slope and Area are correlated so

it may be only one of these predictors is required.

b. As pointed out in part a, it is difficult to determine the results of a stepwise program.

However, a two predictor model will probably work as well as any in this case.

Potential two predictor models include View and Area or View and Slope.


18.

The regression equation is
Y = -43.2 + 0.372 X1 + 0.352 X2 + 19.1 X3

Predictor    Coef     SE Coef   T      P      VIF
Constant     -43.15   31.67     -1.36  0.192
X1           0.3716   0.3397    1.09   0.290  1.473
X2           0.3515   0.2917    1.21   0.246  1.445
X3           19.12    11.04     1.73   0.103  1.481

Analysis of Variance

Source          DF      SS      MS     F     P
Regression       3  3071.1  1023.7  5.29  0.010
Residual Error  16  3096.7   193.5
Total           19  6167.8

Unusual Observations

Obs   X1   Y      Fit    SE Fit   Residual   St Resid
20    95   57.00  84.43  4.73     -27.43     -2.10R

R denotes an observation with a large standardized residual.

Predicted Values for New Observations

New
Obs   Fit    SE Fit   95% CI           95% PI
1     80.88  3.36     (73.77, 88.00)   (50.55, 111.22)

F = 5.29 with a p value = .010, so the regression is significant at the 1% level. The predicted final exam score for within-term exam scores of 86 and 77 and a GPA of 3.4 is Y = 81.

The variance inflation factors (VIFs) are all small (near 1); however, the t ratios and corresponding p values suggest that each of the predictor variables could be dropped from the regression equation. Since the F ratio was significant, we conclude that multicollinearity is a problem.

d. Mean leverage = (3+1)/20 = .20. None of the observations are high leverage points.

e. From the regression output above, observation 20 has a large standardized residual. The fitted model over-predicts the response (final exam score) for this student.

19.

Stepwise regression results, with significance level .05 to enter and leave the regression function, follow.

Response is Y on 3 predictors, with N = 20

Step          1
Constant   -26.24

X3           31.4
T-Value      3.30
P-Value     0.004

S            14.6
R-Sq        37.71
R-Sq(adj)   34.25

The best regression model relates final exam score to the single predictor variable grade point average.

All possible regression results are summarized in the following table.

Predictor Variables    R²
X1                    .295
X2                    .301
X3                    .377
X1, X2                .404
X1, X3                .452
X2, X3                .460
X1, X2, X3            .498

The R² criterion would suggest using all three predictor variables. However, the results in problem 7.18 suggest there is a multicollinearity problem with three predictors. The best two independent variable model uses predictors X2 and X3. When this model is fit, X2 is not required. We end up with a model involving the single predictor X3, the model selected by the stepwise procedure.

20.

The regression equation is
LnComp = 5.69 - 0.505 Educate + 0.255 LnSales - 0.0246 PctOwn

Predictor    Coef     SE Coef   T      P      VIF
Constant     5.6865   0.6103    9.32   0.000
Educate      -0.5046  0.1170    -4.31  0.000  1.0
LnSales      0.2553   0.0725    3.52   0.001  1.0
PctOwn       -0.0246  0.0130    -1.90  0.064  1.0

S = 0.4953  R-Sq = 42.8%  R-Sq(adj) = 39.1%

The negative coefficient on Educate implies that, everything else equal, as education increases, compensation decreases. The positive coefficient on LnSales implies as sales increase, compensation increases, everything else equal. Finally, for fixed education and sales, as percent ownership increases, compensation decreases.

Unusual Observations

Obs   Educate   LnComp   Fit      SE Fit   Residual   St Resid
31    2.00      6.5338   5.9055   0.4386    0.6283     2.73RX
33    0.00      6.3969   7.0645   0.2624   -0.6676    -1.59 X

X denotes an observation whose X value gives it large influence.

Observation 31 has a large standardized residual and is influential. Observation 33 is also influential. The CEOs for companies 31 and 33 own relatively large percentages of their company's stock, 34% and 17% respectively. They are outliers in this respect. The large residual for company 31 results from under-predicting compensation for this CEO. This CEO receives very adequate compensation in addition to owning a large percentage of the company's stock.

All in all, this k = 3 predictor model appears to be better than the k = 2 predictor model of Example 7.12.

21.

The regression equation is
Assets = 7.61 - 0.0046 Accounts + 0.000034 Accounts**2

Predictor      Coef        SE Coef     T      P      VIF
Constant       7.608       8.503       0.89   0.401
Accounts       -0.00457    0.02378     -0.19  0.853  25.965
Accounts**2    0.00003361  0.00000893  3.76   0.007  25.965

Analysis of Variance

Source          DF     SS     MS       F       P
Regression       2  51130  25565  165.95  0.000
Residual Error   7   1078    154
Total            9  52208

b. In the quadratic model, Accounts**2 is significant (t value = 3.76, p value = .007). Here Accounts could be dropped from the regression function and the analysis repeated with only Accounts**2 as the predictor variable. If this is done, R² and the coefficient of Accounts**2 remain virtually unchanged.

c. Dropping Accounts**2 from the model gives:

Assets = -17.1 + 0.0832 Accounts

Predictor    Coef      SE Coef    T      P
Constant     -17.121   8.778      -1.95  0.087
Accounts     0.083205  0.007592   10.96  0.000

S = 20.1877  R-Sq = 93.8%  R-Sq(adj) = 93.0%

The coefficient of Accounts changes from the quadratic model to the straight line model because, not surprisingly, Accounts and Accounts**2 are highly collinear (VIF = 25.965 in the quadratic model).

22.

The regression equation is
Taste = -30.7 + 4.20 H2S + 17.5 Lactic

Predictor    Coef     SE Coef   T      P      VIF
Constant     -30.733  9.146     -3.36  0.006
H2S          4.202    1.049     4.01   0.002  2.019
Lactic       17.526   8.412     2.08   0.059  2.019

Analysis of Variance

Source          DF      SS      MS      F      P
Regression       2  2777.0  1388.5  32.57  0.000
Residual Error  12   511.6    42.6
Total           14  3288.7

Although Lactic is not a significant predictor at the 5% level, it is at the 6% level (t = 2.08, p value = .059) and we have chosen to keep it in the model. R² indicates about 84% of the variation in Taste is explained by H2S and Lactic. The residual plots below indicate the fitted function is adequate. There is no reason to doubt the usual regression assumptions.

23.

Using the final model from problem 22 with H2S = 7.3 and Lactic = 1.85:

Predicted Values for New Observations

New
Obs   Fit    SE Fit   95% CI           95% PI
1     32.36  3.02     (25.78, 38.95)   (16.69, 48.04)

Since sy.x's = 6.53 and t.025 = 2.179, a large-sample 95% prediction interval is:

32.36 ± 2.179(6.53) = (18.13, 46.59)

Notice the large-sample 95% prediction interval is not too much different than the actual 95% prediction interval (PI) above.

Although the fit in this case is relatively good, the standard error of the estimate is somewhat large, so there is a fair amount of uncertainty associated with any forecast. It may be a good idea to collect more data and, perhaps, investigate additional predictor variables.
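A sketch of the large-sample interval above, using the reported fit, standard error of the estimate, and df = n - k - 1 = 12:

from scipy import stats

fit, s, df = 32.36, 6.53, 12
t = stats.t.ppf(0.975, df)       # 2.179
print(round(fit - t*s, 2), round(fit + t*s, 2))   # about (18.13, 46.59)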

24.

a. Correlations:

           GtReceit  MediaRev  StadRev  TotRev  PlayerCt  OpExpens  OpIncome
MediaRev    0.304
StadRev     0.587     0.348
TotRev      0.771     0.792     0.753
PlayerCt    0.423     0.450     0.269    0.499
OpExpens    0.636     0.554     0.623    0.766   0.867
OpIncome    0.562     0.672     0.547    0.785  -0.075     0.203
FranValu    0.655     0.780     0.701    0.925   0.397     0.635     0.797

Total Revenue is likely to be a good predictor of Franchise Value. The correlation between these two variables is .925.

b. Stepwise Regression: FranValu versus GtReceit, MediaRev, ...

Alpha-to-Enter: 0.05  Alpha-to-Remove: 0.05

Response is FranValu on 7 predictors, with N = 26

Step          1
Constant    2.928

TotRev       1.96
T-Value     11.94
P-Value     0.000

S            13.7
R-Sq        85.59
R-Sq(adj)   84.99

Results from the stepwise program are not surprising given the definitions of the variables and the strong (and in some cases perfect) multicollinearity.

c. The coefficient of TotRev from the stepwise program is 1.96 and the constant is relatively small and, in fact, insignificant. Consequently, Franchise Value is, on average, about twice Total Revenue.

d. The regression equation is
OpExpens = 18.9 + 1.30 PlayerCt

Predictor    Coef    SE Coef   T     P
Constant     18.883  4.138     4.56  0.000
PlayerCt     1.3016  0.1528    8.52  0.000

Analysis of Variance

Source          DF      SS      MS     F      P
Regression       1  2101.7  2101.7  72.56  0.000
Residual Error  24   695.2    29.0
Total           25  2796.9

Unusual Observations

Obs   PlayerCt   OpExpens   Fit     SE Fit   Residual   St Resid
7     18.0       60.00      42.31   1.64     17.69      3.45R

The linear relation between Operating expenses and Player costs is fairly strong. About 75% of the variation in Operating expenses is explained by Player costs. Observation 7 (Chicago White Sox) has relatively low Player costs as a component of Operating expenses.

e. Clearly Total revenue, Operating expenses and Operating income are multicollinear since, by definition, Operating income = Total revenue - Operating expenses. Also, Total revenue = Gate receipts + Media revenue + Stadium revenue, so this group of variables will be highly multicollinear.

CASE 7-1: THE BOND MARKET

The actual data for this case is supplied in Appendix A. Students can either be asked to respond to the questions at the end of the case or they can be assigned to run and analyze the data. One approach that I have used successfully is to assign one group of students the role of asking Judy Johnson's questions and another group the responsibility for Ron's answers.

1.

What questions do you think Judy will have for Ron? The students always seem to come up with questions that Ms. Johnson will ask. The key is that Ron should be able to answer them. Possible issues include:

Are all the predictor variables in the final model required? Is a simpler model with fewer predictor variables feasible?

Do the estimated regression coefficients in the final model make sense and are they reliable?

Four observations have large standardized residuals. Is this a cause for concern?

Is the final model a good one and can it be confidently used to forecast the utility's bond interest rate at the time of issuance?

Is multiple regression the appropriate statistical method to use for this situation?

CASE 7-2: AAA WASHINGTON

1.

The multiple regression model that includes both unemployment rate and average monthly temperature is shown below. Temperature is the only good predictor variable.

2.

Yes.

3.

Unemployment rate lagged 11 months is a useful predictor of emergency road service calls. Unemployment rate lagged 3 months is not a good predictor. The Minitab output with Temp and Lag11Rate is given below.

The regression equation is
Calls = 21405 - 88.4 Temp + 756 Lag11Rate

Predictor    Coef    SE Coef   T      P
Constant     21405   1830      11.70  0.000
Temp         -88.36  19.21     -4.60  0.000
Lag11Rate    756.3   172.0     4.40   0.000

Analysis of Variance
Source          DF         SS        MS      F      P
Regression       2  120430208  60215104  48.28  0.000
Residual Error  54   67351116   1247243
Total           56  187781324

The regression is significant. The signs on the coefficients of the independent variables
make sense. The coefficient of each independent variable is significantly different
from 0 (t = -4.60 and t = 4.40, respectively, each with p-value = .000).

4.

The regression results with unemployment rate lagged 11 months (Lag11Rate),
transformed average temperature (NewTemp), and NewTemp**2 as predictors are
given below.

The regression equation is
Calls = 17060 + 635 Lag11Rate - 112 NewTemp + 7.59 NewTemp**2

Predictor        Coef  SE Coef      T      P
Constant      17060.2    847.0  20.14  0.000
Lag11Rate       635.4    146.5   4.34  0.000
NewTemp       -112.00    47.70  -2.35  0.023
NewTemp**2      7.592    1.657   4.58  0.000

Analysis of Variance
Source          DF         SS        MS      F      P
Regression       3  140771801  46923934  52.90  0.000
Residual Error  53   47009523    886972
Total           56  187781324

Unusual Observations
Obs  Lag11R  Calls    Fit  SE Fit  Residual  St Resid
 11    6.10  24010  22101     193      1909     2.07R
 29    5.60  17424  20346     191     -2922    -3.17R
 32    6.19  24861  24854     487         7     0.01 X
 34    5.72  19205  21157     201     -1952    -2.12R

X denotes an observation whose X value gives it large leverage.

The residual autocorrelations are not significant at any lag.

Apart from a couple of large residuals, the residual plots indicate an adequate

model. There is no indication any of the usual regression assumptions have been

violated. A good model has been developed.

1.

The regression is significant. The R2 of 78.1% looks good. The t statistic for each
of the predictor variables is large with a very small p-value. The VIFs are relatively
small for the three predictors, indicating that multicollinearity is not a problem. The
residual plots shown in Figure 7-4 indicate that this model is valid. Dr. Hanke has
developed a good model to forecast ERA.

2.

The matrix plot below of ERA versus each of five potential predictor variables does

not show any obvious nonlinear relationships. There does not appear to be any

reason to develop a new model.


3.

The regression results with WHIP replacing OBA as a predictor variable follow.

The residual plots are very similar to those in Figure 7-4.

The regression equation is
ERA = - 2.81 + 4.43 WHIP + 0.101 CMD + 0.862 HR/9

Predictor      Coef  SE Coef      T      P    VIF
Constant    -2.8105   0.4873  -5.77  0.000
WHIP         4.4333   0.3135  14.14  0.000  1.959
CMD         0.10076  0.04254   2.37  0.019  1.793
HR/9         0.8623   0.1195   7.22  0.000  1.135

Analysis of Variance
Source          DF       SS      MS       F      P
Regression       3   91.167  30.389  157.48  0.000
Residual Error 134   25.859   0.193
Total          137  117.026

The fit and the adequacy of this model are virtually indistinguishable from the

corresponding model with OBA instead of WHIP as a predictor. The estimated

coefficients of CMD and HR/9 are nearly the same in both models. Both models are

good. The original model with OBA as a predictor has a slightly higher R2 and a

slightly smaller standard error of the estimate. Using these criteria, it is the preferred

model.


The project may not be doomed to failure. A lot can be learned from investigating the

influence of the various independent variables on WINS. However, the best regression model

does not explain a large percentage of the variation in WINS, R2 = 34%, so the experts have

a point. There will be a lot of uncertainty associated with any forecast of WINS. The stepwise

selection of the best predictor variables and the subsequent full regression output follow.

Stepwise Regression: WINS versus THROWS, ERA, ...

Alpha-to-Enter: 0.05 Alpha-to-Remove: 0.05

Response is WINS on 10 predictors, with N = 138

Step             1        2
Constant    20.531    5.543

ERA          -2.16    -2.01
T-Value      -7.00    -6.80
P-Value      0.000    0.000

RUNS                0.0182
T-Value               3.86
P-Value              0.000

S             3.33     3.17
R-Sq         26.51    33.83
R-Sq(adj)    25.97    32.85

WINS = 5.54 - 2.01 ERA + 0.0182 RUNS

Predictor       Coef   SE Coef      T      P    VIF
Constant       5.543     4.108   1.35  0.179
ERA          -2.0110    0.2959  -6.80  0.000  1.017
RUNS        0.018170  0.004702   3.86  0.000  1.017

S = 3.17416  R-Sq = 33.8%  R-Sq(adj) = 32.8%

Analysis of Variance
Source          DF       SS      MS      F      P
Regression       2   695.31  347.66  34.51  0.000
Residual Error 135  1360.17   10.08
Total          137  2055.48

CHAPTER 8

REGRESSION WITH TIME SERIES DATA


1.

If not properly accounted for, serial correlation can lead to false inferences under the
usual regression assumptions. Regressions can be judged significant when, in fact,
they are not; coefficient standard errors can be under- (or over-) estimated, so individual
terms in the regression function may be judged significant (or insignificant) when they
are not (or are); and so forth.

2.

Serial correlation often arises naturally in time series data. Series, like employment,

whose magnitudes are naturally related to the seasons of the year will be autocorrelated.

Series, like sales, that arise because of a consistently applied mechanism, like advertising

or effort, will be related from one period to the next (serially correlated). In the analysis

of time series data, autocorrelated residuals arise because of a model specification error
or an incorrect functional form: the autocorrelation in the series is not properly accounted
for.

3.

The assumption of independent errors (no serial correlation) is the regression
assumption that is most frequently violated with time series data.

4.

Durbin-Watson statistic

5.

Reject H0 if DW < 1.10. Since 1.0 < 1.10, reject and conclude that the errors are

positively autocorrelated.

6.

Reject H0 if DW < 1.55, Do not reject H0 if DW > 1.62. Since 1.6 falls between 1.55

and 1.62, the test is inconclusive.

7.

The goal is to develop a regression model (with the best predictor variables) consistent
with the usual regression assumptions. This can

often be accomplished by using variables defined in terms of percentage changes rather

than magnitudes, or autoregressive models, or regression models involving first

differenced or generalized differenced variables.
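A minimal sketch of this construction in Python with pandas (the sales values and
column names are hypothetical, chosen only to make the example self-contained):

import pandas as pd

# Hypothetical series standing in for the Y variable.
df = pd.DataFrame({"sales": [112, 118, 132, 129, 121, 135, 148, 148]})

# Lag Y one period to create the predictor; shift(2) would lag two periods.
df["sales_lag1"] = df["sales"].shift(1)

# The first observation has no lagged value, so drop it before fitting.
print(df.dropna())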

8.

A predictor variable is generated by using the Y variable lagged one or more periods.

9.

Fuel = 113 - 8.63 Price - 0.137 Pop

Predictor       Coef   SE Coef      T      P
Constant      113.01     16.67   6.78  0.000
Price         -8.630     2.798  -3.08  0.009
Pop         -0.13684   0.08054  -1.70  0.113

Analysis of Variance
Source          DF      SS      MS      F      P
Regression       2  223.39  111.69  21.29  0.000
Residual Error  13   68.19    5.25
Total           15  291.58

The null and alternative hypotheses are:

H0: ρ = 0
H1: ρ > 0

Using the .05 significance level for a sample size of 16 with 2 predictor variables,
dL = .98. Since DW = .61 < .98, reject H0 and conclude the observations are positively
serially correlated.
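The Durbin-Watson statistic itself is easy to reproduce from the least squares residuals.
A minimal sketch in Python with statsmodels follows; the data are simulated stand-ins
(the problem's Fuel, Price, and Pop values are not reproduced here):

import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson

rng = np.random.default_rng(1)
X = sm.add_constant(rng.normal(size=(16, 2)))  # two predictors, n = 16
y = X @ np.array([113.0, -8.6, -0.14]) + rng.normal(size=16)

res = sm.OLS(y, X).fit()
# DW = sum of (e_t - e_{t-1})^2 divided by sum of e_t^2; values near 2
# suggest no first order serial correlation, values near 0 suggest
# positive serial correlation.
print(durbin_watson(res.resid))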

10.

Visitors = 309899 + 24431 Time - 193331 Price + 217138 Celeb.

Predictor      Coef  SE Coef      T      P
Constant     309899    59496   5.21  0.000
Time          24431     7240   3.37  0.007
Price       -193331    97706  -1.98  0.076
Celeb.       217138    47412   4.58  0.001

Analysis of Variance
Source          DF           SS           MS      F      P
Regression       3  2.20854E+11  73617995859  15.02  0.000
Residual Error  10  49008480079   4900848008
Total           13  2.69862E+11

Durbin-Watson statistic = 1.14430

11.

With n = 14, k = 3 and α = .05, DW = 1.14 gives an indeterminate test for serial

correlation.

Serial correlation is not a problem. However, it is interesting to see whether the students

realize that collinearity is a likely problem since Customer and Charge are highly correlated.

Correlation matrix:

            Revenue     Use  Charge
Use           0.187
Charge        0.989   0.109
Customer      0.918   0.426   0.891

The regression equation is
Revenue = - 65.6 + 0.00173 Use + 29.5 Charge + 0.000197 Customer

Predictor         Coef    SE Coef      T      P     VIF
Constant        -65.63      14.83  -4.43  0.000
Use           0.001730   0.001483   1.17  0.255   2.151
Charge          29.496      2.406  12.26  0.000   8.515
Customer     0.0001968  0.0001367   1.44  0.163  10.280

S = 6.90038  R-Sq = 98.5%  R-Sq(adj) = 98.4%

Analysis of Variance
Source          DF     SS     MS       F      P
Regression       3  77037  25679  539.30  0.000
Residual Error  24   1143     48
Total           27  78180

Durbin-Watson statistic = 2.20656 (Cannot reject H0: ρ = 0 at any reasonable
significance level)

Deleting Customer from the regression function gives:

The regression equation is
Revenue = - 57.6 + 0.00328 Use + 32.7 Charge

Predictor       Coef   SE Coef      T      P    VIF
Constant      -57.60     14.03  -4.11  0.000
Use         0.003284  0.001039   3.16  0.004  1.012
Charge       32.7488    0.8472  38.66  0.000  1.012

S = 7.04695  R-Sq = 98.4%  R-Sq(adj) = 98.3%

Analysis of Variance
Source          DF     SS     MS       F      P
Regression       2  76938  38469  774.66  0.000
Residual Error  25   1241     50
Total           27  78180

Durbin-Watson statistic = 1.82064 (Cannot reject H0: ρ = 0 at any reasonable
significance level)

12. a.

            Share  Earnings  Dividend
Earnings    0.565
Dividend    0.719     0.712
Payout      0.435    -0.049     0.662

The best model, after taking account of the initial multicollinearity, uses the predictor

variables Earnings and Payout (ratio).

The regression equation is
Share = 4749 + 6651 Earnings + 171 Payout

Predictor     Coef  SE Coef     T      P    VIF
Constant      4749     5844  0.81  0.424
Earnings      6651     1546  4.30  0.000  1.002
Payout      171.40    50.49  3.39  0.002  1.002

S = 3922.16  R-Sq = 53.4%  R-Sq(adj) = 49.7%

Analysis of Variance
Source          DF         SS         MS      F      P
Regression       2  440912859  220456429  14.33  0.000
Residual Error  25  384584454   15383378
Total           27  825497313

b. With n = 28, k = 2 and α = .01, DW = .29 < dL = 1.04, so there is strong evidence of
positive serial correlation.

c. An autoregressive model with lagged Shareholders as a predictor might be a viable
option. A regression using generalized differences with ρ = .8 is another possibility.

13.

a.


b. No. The residual autocorrelation function for the residuals from the straight line fit

indicates significant positive autocorrelation. The independent errors assumption

is not viable.

c. The fitted line plot with the natural logarithms of Passengers as the dependent variable

and the residual autocorrelation function follow.


The residual autocorrelation function looks a little better than that in part b,

but there is still significant positive autocorrelation at lag 1.

d. Exponential trend plot for Passengers follows along with residual autocorrelation

function.


e. Models in parts c and d are equivalent. If you take the natural logarithms of the
fitted exponential growth model you get the fitted model in part c.

f. As we have pointed out, the errors for either of the models in parts c and d are

not independent. Using a model that assumes the errors are independent can

lead to inaccurate forecasts and, in this case, unwarranted precision.

14.

a. The best model lags permits by 2 quarters (Lg2Permits). The regression equation is

Sales = 20.2 + 9.23 Lg2Permits

Predictor       Coef  SE Coef      T      P
Constant       20.24    27.06   0.75  0.467
Lg2Permits    9.2328   0.8111  11.38  0.000

S = 66.2883  R-Sq = 90.2%  R-Sq(adj) = 89.6%

b. DW = 1.47. No evidence of autocorrelation.

c. The regression equation is
Sales = 16.6 + 8.80 Lg2Permits + 30.0 Season

Predictor       Coef  SE Coef     T      P
Constant       16.61    27.99  0.59  0.563
Lg2Permits     8.801    1.020  8.63  0.000
Season         30.02    41.67  0.72  0.484

d. No. For Season: t = .72, p value = .484.

e. No. DW = 1.44. No evidence of autocorrelation.

f. 2007 2nd quarter forecast: 113

Forecasts for the 3rd and 4th quarters can be done using several different
approaches. This is best left to the student with a discussion of why they
used a particular method. One method is to average the past values
of Permits for the 1st and 2nd quarters and use these averages in the model,
as sketched below. This will result in forecasts: 3rd quarter 514; 4th quarter 235.
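A sketch of the averaging method in Python with pandas (the permit values are
hypothetical; the intercept and slope are the estimates from part a):

import pandas as pd

# Hypothetical quarterly permits series.
permits = pd.Series([20, 55, 48, 22, 24, 60, 50, 25],
                    index=pd.period_range("2005Q1", periods=8, freq="Q"))

# Average the historical permit values by quarter.
q_avg = permits.groupby(permits.index.quarter).mean()

# Plug the averages, lagged two quarters, into the fitted equation.
b0, b1 = 20.24, 9.2328            # coefficients from part a
forecast_q3 = b0 + b1 * q_avg[1]  # Q3 sales uses average Q1 permits
forecast_q4 = b0 + b1 * q_avg[2]  # Q4 sales uses average Q2 permits
print(forecast_q3, forecast_q4)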

15.

Quarter  Sales  S2  S3  S4
   1      16.3   0   0   0
   2      17.7   1   0   0
   3      28.1   0   1   0
   4      34.3   0   0   1

Sales = 19.3 - 1.43 S2 + 11.2 S3 + 33.3 S4

Predictor     Coef  SE Coef      T      P
Constant    19.292    2.074   9.30  0.000
S2          -1.425    2.933  -0.49  0.630
S3          11.163    2.999   3.72  0.001
S4          33.254    2.999  11.09  0.000

Analysis of Variance
Source          DF       SS      MS      F      P
Regression       3   8726.5  2908.8  56.36  0.000
Residual Error  42   2167.6    51.6
Total           45  10894.1

Durbin-Watson statistic = 1.544

1996 (3rd Qt): Ŷ = 19.3 - 1.43(0) + 11.2(1) + 33.3(0) = 30.5
1996 (4th Qt): Ŷ = 19.3 - 1.43(0) + 11.2(0) + 33.3(1) = 52.6

The regression is significant. The model explains 80.1% of the variation in Sales.

There is no lag 1 autocorrelation but a significant residual autocorrelation at lag 4.
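A minimal sketch of the same seasonal dummy variable regression in Python with
statsmodels (the second year of sales values is made up so the example runs on its own):

import numpy as np
import statsmodels.api as sm

# Quarterly sales; the first four values are from the table above, the
# rest are hypothetical.
sales = np.array([16.3, 17.7, 28.1, 34.3, 15.8, 18.9, 30.2, 35.1])
quarter = np.tile([1, 2, 3, 4], 2)

# S2, S3, S4 dummies with quarter 1 as the base season.
X = np.column_stack([(quarter == q).astype(float) for q in (2, 3, 4)])
X = sm.add_constant(X)  # the intercept plays the role of quarter 1

res = sm.OLS(sales, X).fit()
print(res.params)  # constant, then the S2, S3, S4 seasonal shifts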

16.

Dickson = - 6.40 + 2.84 Industry

Predictor      Coef  SE Coef       T      P
Constant    -6.4011   0.8435   -7.59  0.000
Industry    2.83585  0.02284  124.14  0.000

S = 0.319059  R-Sq = 99.9%  R-Sq(adj) = 99.9%

Durbin-Watson statistic = 0.8237, consistent with positive autocorrelation.
See also the plot of residuals versus time and the residual autocorrelation function.


c. Form the generalized differences Y't = Yt - .585Yt-1 and X't = Xt - .585Xt-1, and fit
the model given in equation (8.5). The result is Ŷ't = 2.31 + 2.81X't with Durbin-Watson
statistic = 1.74. In this case, the estimate of β1, b1 = 2.81, is nearly the same as the
estimate of β1 in part a. Here the autocorrelation in the data is not strong enough to
have much effect on the least squares estimate of the slope coefficient.

d. The standard error of b1 is larger in the regression involving generalized differences.
The standard error in the initial regression is underestimated because of the positive
serial correlation. The standard error in the regression with generalized differences,
although larger, is the one to be trusted.
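A sketch of the generalized differencing computation in Python (the series are
simulated; ρ = .585 is taken as given, e.g., from the initial residuals via ρ ≈ 1 - DW/2):

import numpy as np
import statsmodels.api as sm

# Simulated x and y with AR(1) errors, standing in for the case data.
rng = np.random.default_rng(2)
x = np.cumsum(rng.normal(size=60)) + 50
e = np.zeros(60)
for t in range(1, 60):
    e[t] = 0.585 * e[t - 1] + rng.normal()
y = -6.4 + 2.84 * x + e

rho = 0.585
y_gd = y[1:] - rho * y[:-1]  # generalized differences Y't
x_gd = x[1:] - rho * x[:-1]  # and X't

res = sm.OLS(y_gd, sm.add_constant(x_gd)).fit()
print(res.params)  # the slope estimate should be close to beta1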

17.

DiffSales = 149 + 9.16 DiffIncome

20 cases used, 1 cases contain missing values

Predictor        Coef  SE Coef     T      P
Constant       148.92    97.70  1.52  0.145
DiffIncome      9.155    2.034  4.50  0.000

Analysis of Variance
Source          DF       SS       MS      F      P
Regression       1  1164598  1164598  20.27  0.000
Residual Error  18  1034389    57466
Total           19  2198987

Durbin-Watson statistic = 1.1237

Here DiffSales = Y't = Yt - Yt-1 and DiffIncome = X't = Xt - Xt-1. The results
involving simple differences are close to the results obtained by the method of
generalized differences in Example 8.5. The estimated slope coefficient is 9.16
versus an estimated slope coefficient of 9.26 obtained with generalized differences.
The intercept coefficient 149 is also somewhat consistent with the intercept coefficient
54483(1 - .997) = 163 for the generalized differences procedure. We would expect the
two methods to produce similar results since ρ̂ = .997 is nearly 1.

18.

Savings = 4.98 + 0.0577 Income

Predictor      Coef  SE Coef     T      P
Constant      4.978    5.149  0.97  0.346
Income      0.05767  0.02804  2.06  0.054

S = 10.0803  R-Sq = 19.0%   (2) 19% of the variation in Savings is explained by Income.

Analysis of Variance
Source          DF      SS     MS     F      P
Regression       1   430.0  430.0  4.23  0.054   (1) Regression is not significant at the .01 level.
Residual Error  18  1829.0  101.6
Total           19  2259.0

Durbin-Watson statistic = 0.4135. With α = .05, dL = 1.20, so positive
autocorrelation is indicated.

Can improve the model by allowing for autocorrelated observations (errors).

b. The regression equation is
Savings = - 3.14 + 0.0763 Income + 20.2 War Year

Predictor      Coef  SE Coef      T      P
Constant     -3.141    2.504  -1.25  0.227
Income      0.07632  0.01279   5.97  0.000
War Year     20.165    2.375   8.49  0.000   (1) Given Income, War Year makes a
                                                 significant contribution at the .01 level.

S = 4.53134  R-Sq = 84.5%  R-Sq(adj) = 82.7%

Analysis of Variance
Source          DF       SS      MS      F      P
Regression       2  1909.94  954.97  46.51  0.000
Residual Error  17   349.06   20.53
Total           19  2259.00

Durbin-Watson statistic = 2.010   (2) No significant autocorrelation of any kind is indicated.

Using all the usual criteria for judging the adequacy of a regression model, this model

is much better than the simple linear regression model in part a.

19.

a.


The data are clearly seasonal with fourth quarter sales large and sales for the

remaining quarters relatively small. Seasonality is confirmed by the

autocorrelation function with significant autocorrelation at the seasonal

lag 4.

b. From the autocorrelation function observations 4 periods apart are highly

positively correlated. Therefore an autoregressive model with sales lagged 4

time periods as the predictor variable might be appropriate.

c. The regression equation is
Sales = 421 + 0.853 Lg4Sales

24 cases used, 4 cases contain missing values

Predictor      Coef  SE Coef     T      P
Constant      421.4    230.0  1.83  0.081
Lg4Sales    0.85273  0.09286  9.18  0.000

Analysis of Variance
Source          DF       SS       MS      F      P
Regression       1  4767638  4767638  84.32  0.000
Residual Error  22  1243890    56540
Total           23  6011528

d. May 31 (2003)                                      compared to 2150
   Aug 31 (2003): 421.4 + .85273(2221) = 2315.3       compared to 2350
   Nov 30 (2003): 421.4 + .85273(2422) = 2486.7       compared to 2600
   Feb 28 (2004): 421.4 + .85273(3239) = 3183.4       compared to 3400

Forecasts are not bad but they are below the Value Line estimates for the

last 3 quarters and the difference becomes increasingly larger.

e. Value line estimates for the last 3 quarters of 2003-04 seem increasingly optimistic.

f. Model in part c can be improved by allowing for significant lag 1 residual

autocorrelation. One approach is to include sales lagged 1 quarter as an additional

predictor variable.

20.

              ChickConsum  Income  ChickPrice  PorkPrice
Income              0.922
ChickPrice          0.794   0.932
PorkPrice           0.871   0.957       0.970
BeefPrice           0.913   0.986       0.928      0.941

            LnChickC  LnIncome  LnChickP  LnPorkP
LnIncome       0.952
LnChickP       0.761     0.907
LnPorkP        0.890     0.972     0.947
LnBeefP        0.912     0.979     0.933    0.954

The correlations are similar for both the original and natural log transformed data.

Correlations among the potential predictor variables are large implying a

multicollinearity problem. Chicken consumption is most highly correlated with

Income and BeefPrice for both the original and log transformed data. Must be

careful interpreting correlations with time series data since autocorrelation in the

individual series can result in apparent linear association.

b. Stepwise Regression: ChickConsum versus Income, ChickPrice, ...

Alpha-to-Enter: 0.05 Alpha-to-Remove: 0.05

Response is ChickConsum on 4 predictors, with N = 23

Step               1        2
Constant       28.86    37.72

Income       0.00970  0.01454
T-Value        10.90     6.54
P-Value        0.000    0.000

ChickPrice             -0.29
T-Value                -2.34
P-Value                0.030

S               2.58     2.34
R-Sq           84.98    88.21
R-Sq(adj)      84.27    87.03

c. There is high multicollinearity among the predictor variables so the final model

depends on which non-significant predictor variable is deleted first. If BeefPrice is

deleted, the final model is the one selected by stepwise regression (using a .05 level

for determining significance of individual terms) with significant lag 1 residual

autocorrelation. If Income is deleted first, then the final model involves the three

Price predictor variables as shown below. There is no significant residual

autocorrelation but large VIFs, although the coefficients of the predictor variables

have the right signs. In this data set, Income is essentially a proxy for the three

price variables.

The regression equation is
ChickConsum = 37.9 - 0.665 ChickPrice + 0.195 PorkPrice + 0.123 BeefPrice

Predictor       Coef  SE Coef      T      P     VIF
Constant      37.859    3.672  10.31  0.000
ChickPrice   -0.6646   0.1702  -3.90  0.001  17.649
PorkPrice    0.19516  0.05874   3.32  0.004  21.109
BeefPrice    0.12291  0.02625   4.68  0.000   9.011

Analysis of Variance
Source          DF      SS      MS      F      P
Regression       3  844.44  281.48  63.08  0.000
Residual Error  19   84.78    4.46
Total           22  929.22

Durbin-Watson statistic = 1.2392

21. Stepwise Regression: LnChickC versus LnIncome, LnChickP, ...

Alpha-to-Enter: 0.05 Alpha-to-Remove: 0.05

Response is LnChickC on 4 predictors, with N = 23

Step             1       2
Constant     1.729   2.375

LnIncome     0.283   0.440
T-Value      14.32   15.40
P-Value      0.000   0.000

LnChickP            -0.445
T-Value              -6.06
P-Value              0.000

S           0.0528  0.0321
R-Sq         90.71   96.72
R-Sq(adj)    90.27   96.40

residual autocorrelation.

The regression equation is
LnChickC = 2.37 + 0.440 LnIncome - 0.445 LnChickP

Predictor       Coef  SE Coef      T      P    VIF
Constant      2.3748   0.1344  17.67  0.000
LnIncome     0.43992  0.02857  15.40  0.000  5.649
LnChickP    -0.44491  0.07342  -6.06  0.000  5.649

Analysis of Variance
Source          DF       SS       MS       F      P
Regression       2  0.61001  0.30500  295.30  0.000
Residual Error  20  0.02066  0.00103
Total           22  0.63067

The coefficient of .44 on LnIncome implies that as Income increases by 1%, chicken
consumption increases by .44%, chicken price held constant. Similarly, the
coefficient of -.44 on LnChickP implies that as chicken price increases by 1%,
chicken consumption decreases by .44%, income held constant.

To obtain a forecast of chicken consumption for the following year, forecasts

of income and chicken price for the following year would be required. After taking

logarithms, these values would be used in the final regression equation to get a

forecast of LnChickC. A forecast of chicken consumption is then generated by taking

the antilog.
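A sketch of that forecasting step in Python (the income and price forecasts are
hypothetical placeholders; the coefficients are the ones estimated above):

import numpy as np

b0, b_inc, b_price = 2.3748, 0.43992, -0.44491  # fitted coefficients

# Hypothetical forecasts of next year's income and chicken price.
income_next, chick_price_next = 850.0, 42.0

ln_forecast = (b0 + b_inc * np.log(income_next)
               + b_price * np.log(chick_price_next))
print(np.exp(ln_forecast))  # antilog returns the forecast to original units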

22. The regression equation is
DiffChickC = 1.10 + 0.00075 DiffIncome - 0.145 DiffChickP

22 cases used, 1 cases contain missing values

Predictor        Coef   SE Coef      T      P    VIF
Constant       1.0967    0.4158   2.64  0.016
DiffIncome   0.000746  0.003477   0.21  0.832  1.029
DiffChickP   -0.14473   0.06218  -2.33  0.031  1.029

S = 1.21468  R-Sq = 22.3%  R-Sq(adj) = 14.1%

Analysis of Variance
Source          DF      SS     MS     F      P
Regression       2   8.039  4.020  2.72  0.091
Residual Error  19  28.033  1.475
Total           21  36.073

Durbin-Watson statistic = 1.642

There is very little explanatory power in the predictor variables. If the non-significant
DiffIncome is dropped from the model, the resulting regression is significant at the .05
level, R2 is virtually unchanged, and the standard error of the estimate decreases slightly.
The residual plots look good and there is no evidence of autocorrelation. With the very
low R2, the fitted function is not useful for forecasting the change (difference) in
chicken consumption.

23. The regression equation is
ChickConsum = 1.94 + 0.975 LagChickC

22 cases used, 1 cases contain missing values

Predictor       Coef  SE Coef      T      P
Constant       1.945    1.823   1.07  0.299
LagChickC    0.97493  0.04687  20.80  0.000

S = 1.33349  R-Sq = 95.6%  R-Sq(adj) = 95.4%

Analysis of Variance
Source          DF      SS      MS       F      P
Regression       1  769.45  769.45  432.71  0.000
Residual Error  20   35.56    1.78
Total           21  805.01

This year's chicken consumption is a very good predictor of next year's chicken
consumption. The coefficient on lagged chicken consumption (LagChickC) is almost 1.
The intercept is not significant. Chicken consumption is essentially a random walk:
next year's chicken consumption is this year's chicken consumption plus a random
amount with mean 0. The residual plots look good and there is no residual
autocorrelation.

We cannot infer the effect of a change in chicken price on chicken consumption with
this model since chicken price does not appear as a predictor variable.

24.

Yt - Yt-1 = (Xt - Xt-1) + (εt - εt-1) = at + εt - εt-1 = ηt, say
Xt - Xt-1 = at

Here the error ηt has mean 0 and variance 3σ². So the first differences of
both Yt and Xt are stationary, and X and Y are cointegrated of order 1. The cointegrating
linear combination is: Yt - Xt = εt.

CASE 8-2: BUSINESS ACTIVITY INDEX FOR SPOKANE COUNTY

1.

Answer: Autocorrelation must be solved for first to create data (or model) consistent

with the usual regression assumptions.

2.

Would it have been better to eliminate multicollinearity first and then tackle

autocorrelation?

Answer: No. In order to solve the autocorrelation problem, the nature of the data was

changed (first differenced). If multicollinearity were solved first, one or more important

variables may have been eliminated. Autocorrelation must be accounted for first so the

usual regression assumptions apply; then multicollinearity can be tackled.


3.

Answer: A sample size of 15 is small for a model that uses three independent

variables (ideally, n should be in the neighborhood of 30 or more). A larger sample

size would almost certainly be helpful.

4.

Should the regression done on the first differences have been through the origin?

Answer: Perhaps. An intercept can be included in the regression model and then
checked for significance. Ordinarily, regressions with first differenced data do
not require an intercept term.

5.

Answer: Perhaps, although using lagged dependent and independent variables would
make constructing an index more difficult. Since the first differenced data work well
in this case, there is no real need to consider lagged variables.

6.

What conclusions can be drawn from a comparison of the Spokane County business

activity index and the GNP?

Answer: The Spokane business activity seems to be extremely stable. It was not

affected by the national recessions of 1970 and 1974. The large peak in 1974 was

caused by Expo 74 (a world fair). It would be inappropriate in this case to expect

the Spokane economy to follow national patterns.

1.

Answer: Jim's use of a dummy variable to represent periods when Marquette
was in session or out of session seems very reasonable. A good use of a dummy
variable.

2.

Answer: Jim's use of lagged sales as a predictor variable was eminently sensible.
This independent variable is likely to be a good predictor, and it can also account
for autocorrelation.

3.

Answer: Yes. Model 6 is the best. However, there may be other predictor variables

that would improve this model; the number of students enrolled at Marquette during

a particular quarter or semester is an example.

4.

Would another type of forecasting model be more effective for forecasting weekly sales?

Answer: Possibly! Jim will investigate Box-Jenkins ARIMA models in Chapter 9.

John is correct to be disappointed with the model run with seasonal dummy variables since
the residual autocorrelations have a spike at lag 12. From a forecasting perspective, the
autoregressive model is better. The intercept term allows for a time trend, seasonality is
accounted for by sales lagged 12 months as the predictor variable, R2 is large (91%), and
there is no residual autocorrelation. However, this model does not include predictor
variables directly under John's control, like price, so he would not be able to determine
how a change in price (or changes in other operational variables) might affect future sales.

Nonseasonal model:

The regression equation is
Clients = - 292 + 3.38 Index + 0.370 Bankrupt - 0.0656 Permits

Predictor       Coef  SE Coef      T      P
Constant     -292.27    41.23  -7.09  0.000
Index         3.3783   0.3404   9.93  0.000
Bankrupt     0.37001  0.09740   3.80  0.000
Permits     -0.06559  0.02882  -2.28  0.026

Analysis of Variance
Source          DF     SS     MS      F      P
Regression       3  34630  11543  41.62  0.000
Residual Error  80  22187    277
Total           83  56816

Durbin-Watson statistic = 1.605

The best nonseasonal regression model used the business activity index, number of

bankruptcies filed, and number of building permits to forecast number of clients seen. The

Durbin-Watson test for serial correlation is inconclusive at the .05 level. The residual

autocorrelation function shows some significant autocorrelation around lag 4.

Best seasonal model:

The regression equation is
Clients = - 135 + 2.51 Index - 3.79 S2 + 5.69 S3 - 15.9 S4 - 21.1 S5
          - 13.6 S6 - 20.6 S7 - 19.6 S8 - 25.9 S9 - 6.87 S10 - 19.0 S11
          - 33.1 S12

Predictor      Coef  SE Coef      T      P
Constant    -135.08    26.96  -5.01  0.000
Index        2.5099   0.2421  10.37  0.000
S2           -3.793    8.443  -0.45  0.655
S3            5.686    8.469   0.67  0.504
S4          -15.869    8.445  -1.88  0.064
S5          -21.146    8.441  -2.51  0.015
S6          -13.580    8.443  -1.61  0.112
S7          -20.641    8.441  -2.45  0.017
S8          -19.650    8.443  -2.33  0.023
S9          -25.857    8.441  -3.06  0.003
S10          -6.869    8.445  -0.81  0.419
S11         -19.014    8.448  -2.25  0.027
S12         -33.143    8.441  -3.93  0.000

Analysis of Variance
Source          DF       SS      MS      F      P
Regression      12  39111.7  3259.3  13.07  0.000
Residual Error  71  17704.7   249.4
Total           83  56816.3

The best seasonal model uses Index and 11 seasonal dummy variables to represent

the months Feb through Dec. We retain all the seasonal dummy variables for forecasting

purposes even though some are non-significant. The Durbin-Watson test is inconclusive at the

.05 level. The residual autocorrelations have a just significant spike at lag 6 but are otherwise

non-significant. Forecasts for the first three months of 1993 follow.

           Forecast  Actual
Jan 1993        179     151
Feb 1993        175     152
Mar 1993        197     199

Forecasts for Jan and Feb 1993 are high compared to the actual numbers of clients, but
the forecast for Mar 1993 is very close to the actual number of new clients.

Autoregressive model:

Autoregressive models with number of new clients lagged 1, 4 and 12 months were

tried. None of these models proved to be useful for forecasting. The best model had number of

new clients lagged 1 month. The results are displayed below.

The regression equation is
Client = 61.4 + 0.487 LagClients

95 cases used, 1 cases contain missing values

Predictor        Coef  SE Coef     T      P
Constant        61.41    10.91  5.63  0.000
LagClients    0.48678  0.08796  5.53  0.000


Analysis of Variance
Source          DF     SS     MS      F      P
Regression       1  19035  19035  30.62  0.000
Residual Error  93  57805    622
Total           94  76840

There is just significant residual autocorrelation at lag 12 but the remaining

residual autocorrelations are small. The best model of the ones attempted is the final

seasonal model with predictor variables Index and the seasonal dummies.

CASE 8-6: AAA WASHINGTON

1.

The results for the best model are shown below (see also solution to Case 7-2). Each of

the independent variables is significantly different from 0 at the .05 level. The signs of

the coefficients are what we would expect them to be.

The regression equation is
Calls = 17060 + 635 Lg11Rate - 112 NewTemp + 7.59 NewTemp**2

Predictor        Coef  SE Coef      T      P
Constant      17060.2    847.0  20.14  0.000
Lg11Rate        635.4    146.5   4.34  0.000
NewTemp       -112.00    47.70  -2.35  0.023
NewTemp**2      7.592    1.657   4.58  0.000

S = 941.792  R-Sq = 75.0%  R-Sq(adj) = 73.5%

Analysis of Variance
Source          DF         SS        MS      F      P
Regression       3  140771801  46923934  52.90  0.000
Residual Error  53   47009523    886972
Total           56  187781324

2.

Serial correlation is not a problem. The value of the Durbin-Watson statistic (1.62)

would not reject the null hypothesis of no serial correlation. There are no

significant residual autocorrelations. Restricting attention to integer powers, 2 is the

best choice for the exponential transformation. Allowing other choices for powers,

e.g. 2.4, may improve the fit a bit but is not as nice as an integer power.

3.

The memo to Mr. DeCoria should use all the usual inferential and descriptive summaries

to defend the model in part 1. A residual analysis should also be included.


1.

Additional significant explanatory variables may be available, but there is not much
variation left to explain. However, it is good to have students search for a good

equation using the full range of available variables. Along with the R-squared value,

they should check the t values for the variables in their final equation, and the F value

and the residual autocorrelations. Their results can be used effectively as individual

or team presentations to the class, or as a hand-in writeup or even a small term paper.

2.

Selling the final regression model to management, including the irascible Jackson

Tilson, ties the statistical exercise in the Alomega case to the real world of business

management. The idea of selling the statistical results to management can be

the focus of team presentations to the class with the instructor playing the role of

Tilson. Working through the presentation of results to the class adds an important

real world element to the statistical analysis.

3.

As noted in the case, the advertising predictor variables are under the control of

Alomega management. Students can demonstrate the usefulness of this result by

choosing reasonable future values for these advertising variables and generating forecasts.

However, students must recognize the regression equation does not necessarily

imply a cause and effect relationship between advertising expenditures and sales. In

addition, conditions under which the model was developed may change in the future.

4.

All forecasts, including the ones using Julie's regression equation, assume a future
that is identical to the past except for the identified predictor variables. If her

model is used to generate forecasts for Alomega, she should check the model

accuracy on a regular basis. The errors encountered as the future unfolds should

be compared to those in the data used to generate the model. If significant

changes or trends are observed, the model should be updated to include the most

recent data, along with possibly discarding some of the oldest data. Alternatively,

a different approach to the forecasting problem can be sought if the forecasting errors

suggest that the current regression model is inadequate.

1.

The positive coefficient on November makes sense because cookie sales are seasonal,
with sales relatively high each year in November, the month before the Christmas holidays.

2.

James' model looks good. Almost 94% of the variation in cookie sales is explained

by the model. The residual analysis indicates the usual regression assumptions are

tenable, including the independence assumption.

3.

Forecasts:

June 2003          733,122
July 2003          799,823
August 2003        737,002
September 2003   1,562,070
October 2003     1,744,477
November 2003    2,152,463
December 2003    1,932,194

4.

The regression equation is
SurtidoSales = 115672 + 0.950 Lg12Sales

29 cases used, 12 cases contain missing values

Predictor       Coef  SE Coef      T      P
Constant      115672    91884   1.26  0.219
Lg12Sales    0.94990  0.08732  10.88  0.000

Analysis of Variance
Source          DF           SS           MS       F      P
Regression       1  7.03141E+12  7.03141E+12  118.35  0.000
Residual Error  27  1.60415E+12  59412957997
Total           28  8.63556E+12

This regression model is very reasonable. About 81% of the variation in cookie

sales is explained with the single predictor variable, sales lagged 12 months

(Lg12Sales). The usual residual plots look good and there is no significant residual

autocorrelation.

Forecasts:

June 2003          717,956
July 2003          632,126
August 2003        681,996
September 2003   1,642,130
October 2003     1,801,762
November 2003    2,113,392
December 2003    1,844,434

5.

Both models fit the data well. Apart from July 2003, the forecasts generated by the
two models are very close to one another. The dummy variable regression explains more
of the variation in cookie sales, but the autoregression is simpler. A case could be
made for either model.


1.

The regression results along with residual plots and the residual autocorrelation

function follow.

The regression equation is
Total Visits = 997 + 3.98 Time - 81.4 Sep + 5.3 Oct - 118 Nov - 149 Dec
               - 24.2 Jan - 116 Feb + 23.8 Mar + 18.2 Apr - 30.5 May - 39.4 Jun
               + 35.2 Jul

Predictor      Coef  SE Coef      T      P
Constant     996.97    58.42  17.06  0.000
Time         3.9820   0.4444   8.96  0.000
Sep          -81.38    71.69  -1.14  0.259
Oct            5.34    71.67   0.07  0.941
Nov         -118.34    71.66  -1.65  0.102
Dec         -148.62    71.66  -2.07  0.041
Jan          -24.21    71.65  -0.34  0.736
Feb         -116.39    71.65  -1.62  0.107
Mar           23.80    73.55   0.32  0.747
Apr           18.15    73.53   0.25  0.806
May          -30.50    73.53  -0.41  0.679
Jun          -39.37    73.52  -0.54  0.593
Jul           35.20    73.51   0.48  0.633

Analysis of Variance
Source           DF       SS      MS     F      P
Regression       12  2353707  196142  8.07  0.000
Residual Error  101  2456198   24319
Total           113  4809905

Mary has a right to be disappointed. This regression model does not fit well. Even

allowing for seasonality, only the Dec seasonal dummy variable is significant at the

.05 level. The residual plots clearly show a poor fit in the middle of the series and

there is a considerable amount of significant residual autocorrelation.

2.

Mary might try an autoregression with different choices of lags of total visits

as predictor variable(s). She might try to fit a Box-Jenkins ARIMA model to

be discussed in Chapter 9. Regardless, finding an adequate model for this

time series will be challenging.

CHAPTER 9


ANSWERS TO PROBLEMS AND CASES

1.

a. 0 ± .196

b. Series is random

c. Series could be a stationary autoregressive process or series could be non-stationary.

Interpretation depends on how fast the autocorrelations decay to 0.

d. Seasonal series with period of 4

2.

t    Yt      Ŷt       et
1   32.5  35.000  -2.500
2   36.6  34.375   2.225
3   33.3  36.306  -3.006
4   31.9  33.581  -1.681

Ŷ6 = 35 + .25(0) - .3(-1.681) = 35.504
Ŷ7 = 35

3.

a. Ŷ61 = 75.65   Ŷ62 = 84.04   Ŷ63 = 87.82

b. Ŷ62 = 76.55   Ŷ63 = 84.45

c. 75.65 ± 23.2

4.

a. Model   Autocorrelations   Partial Autocorrelations
   AR      die out            cut off
   MA      cut off            die out
   ARIMA   die out            die out

5.

a. MA(2)
b. AR(1)
c. ARIMA(1,0,1)

6.

b. Q = 44.3, df = 11, α = .05

Reject H0 if χ² > 19.675.

Since Q = 44.3 > 19.675, reject H0 and conclude the model is not adequate. Also,
there is a significant residual autocorrelation at lag 2. Add an MA term to the
model at lag 2 and fit an ARIMA(1,1,2) model.
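The Q statistic can be reproduced with statsmodels. A minimal sketch on simulated
residuals follows; model_df is set to the number of estimated ARMA parameters so the
degrees of freedom match the hand calculation:

import numpy as np
from statsmodels.stats.diagnostic import acorr_ljungbox

rng = np.random.default_rng(3)
resid = rng.normal(size=120)  # stand-in for ARIMA model residuals

# Ljung-Box Q over the first 12 residual autocorrelations; with
# model_df=1 the chi-square comparison uses 12 - 1 = 11 df.
print(acorr_ljungbox(resid, lags=[12], model_df=1, return_df=True))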

7.

a. The autocorrelations for the demand series are slow to die out, suggesting the series is
non-stationary. Autocorrelations for first differences of demand do die out (cut off
relative to standard error limits), suggesting the series of first differences is stationary.
Low lag autocorrelations of the series of second differences increase in magnitude,
suggesting second differencing is too much. A plot of the demand series shows the
series is increasing linearly in time with almost a perfect (deterministic) straight line
pattern. In fact, a straight line time trend fit to the demand data represents the data
well, as shown in the plot below.

The autocorrelation functions, along with plots of the original series and the series of
first differences, suggest an ARIMA(0,1,1) model with a constant term might be a good
starting point. The first order moving average term is suggested by the significant
autocorrelation at lag 1 for the first differenced series.

b. The Minitab output from fitting an ARIMA(0,1,1) model with a constant is
shown below.

The least squares estimate of the constant term, .7127, is virtually the same as
the least squares slope coefficient in the straight line fit shown in part a. Also,
the first order moving average coefficient is essentially 1. These two results
are consistent with a straight line time trend regression model for the original data.
Suppose Yt is demand in time period t. The straight line time trend regression
model is: Yt = β0 + β1t + εt. Thus Yt-1 = β0 + β1(t - 1) + εt-1 and
Yt - Yt-1 = β1 + εt - εt-1. The latter is an ARIMA(0,1,1) model with a constant
term (the slope coefficient in the straight line model) and a first order moving
average coefficient of 1.

There is some residual autocorrelation (particularly at lag 2) for both the straight
line fit and the ARIMA(0,1,1) fit, but the usual residual plots indicate no other
problems.

c. Prediction equations for period 53:

Straight line model: Ŷ53 = 19.97 + .71(53)

ARIMA model: Ŷ53 = Y52 + .71 - 1.00 ε̂52

d. The forecasts for the next four periods from forecast origin t = 52 for the
ARIMA model are essentially the forecasts obtained by
extrapolating the fitted straight line in part a.
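The equivalence can be illustrated in Python with statsmodels on a simulated linear
trend series (all values hypothetical). With d = 1, trend='t' supplies the drift that
plays the role of the constant in the differenced series:

import numpy as np
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(4)
t = np.arange(1, 53)
y = 19.97 + 0.71 * t + rng.normal(scale=0.5, size=52)  # near-linear demand

res = ARIMA(y, order=(0, 1, 1), trend="t").fit()
print(res.params)             # drift near .71, MA(1) coefficient near 1
print(res.forecast(steps=4))  # next four periods from origin t = 52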

8.

Since the autocorrelation coefficients drop off after one time lag and the partial
autocorrelation coefficients trail off, an MA(1) model should be adequate. The best
model is

Ŷt = 56.1853 - (-0.7064)ε̂t-1

Ŷ127 = 56.1853 + 0.7064 ε̂126 = 52.37

The critical 5% chi-square value for 10 df is 18.31. Since the calculated chi-square

Q for the residual autocorrelations equals 7.4, the model is deemed adequate.

The autocorrelation and partial autocorrelation plots for the original series follow.

(Autocorrelation function and partial autocorrelation function plots for Yt)

Final Estimates of Parameters
Type          Coef   StDev       T
MA 1       -0.7064  0.0638  -11.07
Constant   56.1853  0.5951   94.42
Mean       56.1853  0.5951

Number of observations: 126
Residuals: SS = 1910.10 (backforecasts excluded)
           MS = 15.40 DF = 124

Modified Box-Pierce (Ljung-Box) Chi-Square statistic
Lag                 12           24           36           48
Chi-Square  7.4(DF=10)  36.4(DF=22)  64.8(DF=34)  80.5(DF=46)

Period  Forecast  95 Percent Limits
                    Lower    Upper
127      52.3696  44.6754  60.0637
128      56.1853  46.7651  65.6054
129      56.1853  46.7651  65.6054

9.

Since the autocorrelation coefficients trail off and the partial autocorrelation
coefficients cut off after one time lag, an AR(1) model should be adequate.
The best model is

Ŷt = 109.628 - 0.9377 Yt-1

Ŷ81 = 109.628 - 0.9377 Y80

(Autocorrelation function and partial autocorrelation function plots for Yt)

Final Estimates of Parameters
Type          Coef   StDev       T
AR 1       -0.9377  0.0489  -19.17
Constant   109.628   0.611  179.57
Mean       56.5763  0.3151

Number of observations: 80
Residuals: SS = 2325.19 (backforecasts excluded)
           MS = 29.81 DF = 78

Modified Box-Pierce (Ljung-Box) Chi-Square statistic
Lag                  12           24           36           48
Chi-Square  24.8(DF=10)  39.4(DF=22)  74.0(DF=34)  83.9(DF=46)

Period  Forecast  95 Percent Limits
                    Lower    Upper
81       29.9234  19.2199  40.6269
82       81.5688  66.8957  96.2419
83       33.1408  15.7088  50.5728

The critical 5% chi-square value for 10 df is 18.31. Since the calculated chi-square
Q for the residual autocorrelations equals 24.8, the model is deemed inadequate. An
examination of the individual residual autocorrelations suggests it might be possible to
improve the model by adding an MA term at lag 2.


10.

As can be seen below, the autocorrelations for the original series are slow to die out. This

behavior indicates the series may be non-stationary. The autocorrelations for the

differenced data cut off after lag 1 and the partial autocorrelations die out. This suggests

an ARIMA(0,1,1) model. When this model is fit (see the computer output below), there

are no significant residual autocorrelations and the residual plots look good. The

forecasting equation from the fitted model is

Ŷt = Yt-1 - (-0.3714)ε̂t-1

Ŷ81 = Y80 - (-0.3714)ε̂80 = 266.9 - (-0.3714)(3.4647) = 268.19

(Autocorrelation function plot for the first differences)

Final Estimates of Parameters
Type      Coef   StDev      T
MA 1   -0.3714  0.1052  -3.53

Number of observations: Original series 80, after differencing 79
Residuals: SS = 10637.3 (backforecasts excluded)
           MS = 136.4 DF = 78

Modified Box-Pierce (Ljung-Box) Chi-Square statistic
Lag                 12           24           36           48
Chi-Square  9.2(DF=11)  14.1(DF=23)  28.6(DF=35)  39.2(DF=47)

Period  Forecast  95 Percent Limits
                     Lower    Upper
81       268.741  245.848  291.635
82       268.741  229.885  307.597
83       268.741  218.787  318.695

The critical 5% chi-square value for 11 df is 19.68. Since the calculated
chi-square Q for the residual autocorrelations equals 9.2, the model is deemed adequate.

11.

The slow decline in the early, non-seasonal lags indicates the need for regular

differencing.

(Autocorrelation function plots for the original and regularly differenced series)

At the seasonal lags, the autocorrelation coefficients seem to be decaying slowly.
Seasonal differencing is necessary.

The autocorrelation coefficient and partial autocorrelation coefficient plots for the
regular and seasonal differenced data follow.

(Autocorrelation function and partial autocorrelation function plots for the regular
and seasonal differenced data)

At the low lags, the autocorrelation coefficients cut off after one time lag and the
partial autocorrelation coefficients trail off, so a regular moving average term of
order 1 is indicated. Concentrating on the seasonal lags (12 and 24), the
autocorrelation coefficients cut off after lag 12 and the partial autocorrelation
coefficients trail off, so a seasonal moving average term of order 12
is suggested. An ARIMA(0,1,1)(0,1,1)12 model for Yt is identified.

Final Estimates of Parameters
Type       Coef   StDev      T
MA 1     0.7486  0.0742  10.09
SMA 12   0.8800  0.0893   9.85

Number of observations: Original series 96, after differencing 83
Residuals: SS = 5744406210 (backforecasts excluded)
           MS = 70918595 DF = 81

Modified Box-Pierce (Ljung-Box) Chi-Square statistic
Lag                 12           24           36           48
Chi-Square  3.0(DF=10)  19.3(DF=22)  23.0(DF=34)  25.1(DF=46)

Period  Forecast  95 Percent Limits
                    Lower    Upper
 97      163500   146991   180009
 98      158300   141277   175322
 99      177084   159562   194606
100      178792   160785   196798
101      188706   170227   207185
102      184846   165907   203785
103      191921   172532   211310
104      188746   168918   208574
105      185194   164936   205451
106      187669   166991   208348
107      188084   166993   209175
108      221521   200025   243016

The critical 5% chi-square value for 10 df is 18.31. Since the calculated
chi-square Q for the residual autocorrelations equals 3.0, the model is deemed adequate.
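Fitting the identified ARIMA(0,1,1)(0,1,1)12 model is a one-liner in most packages.
A sketch in Python with statsmodels, using a simulated monthly series in place of the
problem's data:

import numpy as np
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(5)
t = np.arange(96)
y = (150000 + 400 * t + 15000 * np.sin(2 * np.pi * t / 12)
     + rng.normal(scale=5000, size=96))  # trend plus monthly seasonality

res = ARIMA(y, order=(0, 1, 1), seasonal_order=(0, 1, 1, 12)).fit()
print(res.summary().tables[1])   # MA 1 and SMA 12 estimates
print(res.forecast(steps=12))    # twelve months ahead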

12.

a. See part b.

b. The autocorrelation coefficient plot below indicates that the data are

non-stationary. Therefore, the data should be first differenced. The

autocorrelation coefficient and partial autocorrelation coefficient plots for

the first differenced data are also shown.

(Autocorrelation function plots for the original and first differenced data, and the
partial autocorrelation function plot for the first differenced data)

c. Both the autocorrelations and the partial autocorrelations have a single significant
spike at lag 1 for the differenced series. One could make a case for a first order AR
term in a model for the differenced data. The results from fitting an ARIMA(1,1,0)
model are shown below. Successive changes are not random if an ARIMA(1,1,0)
model is appropriate.

ARIMA model for IBM

Final Estimates of Parameters
Type     Coef   StDev     T     P
AR 1   0.3780  0.1496  2.53  .015

Number of observations: Original series 52, after differencing 51
Residuals: SS = 1710.50 (backforecasts excluded)
           MS = 34.21 DF = 50

Modified Box-Pierce (Ljung-Box) Chi-Square statistic
Lag                 12           24           36           48
Chi-Square  7.3(DF=11)  15.8(DF=23)  28.5(DF=35)  38.1(DF=47)

Period  Forecast  95 Percent Limits
                    Lower    Upper
53       311.560  300.094  323.026
54       314.418  294.895  333.941

d. The residual plots look good and there are no significant residual autocorrelations.


e. Ŷt = Yt-1 + .378(Yt-1 - Yt-2), so Ŷ53 = Y52 + .378(Y52 - Y51) = 311.56.

The naïve forecast is Ŷ53 = Y52 = 304.

13.

53

One question that might arise is should the student use the first 145 observations or

all 150 observations. With this many observations, it will not make much difference.

The autocorrelation function using all the data below is slow to die out and suggests

the DEF time series is non-stationary. Therefore, the differenced data should be investigated.

The autocorrelation coefficient and partial autocorrelation coefficient plots for the first

differenced data follow.

It appears that the autocorrelations for the differenced data cut off after lag one
and that the partial autocorrelations die out. This suggests a regular MA term in a model
for the differenced data, so an ARIMA(0,1,1) model is identified. If 145 observations
are used, the forecasting equation from the fitted model is

Ŷt = Yt-1 - 0.7179 ε̂t-1

Type        Coef  SE Coef      T      P
MA 1      0.7179   0.0582  12.34  0.000
Constant -0.00049  0.06024  -0.01  0.994

Differencing: 1 regular difference
Number of observations: Original series 145, after differencing 144
Residuals: SS = 917.134 (backforecasts excluded)
           MS = 6.459 DF = 142

Modified Box-Pierce (Ljung-Box) Chi-Square statistic
Lag            12     24     36     48
Chi-Square   12.3   29.5   57.2   66.1
DF             10     22     34     46
P-Value     0.266  0.131  0.008  0.028

Forecasts from period 145

Period  Forecast   95% Limits          Actual
                    Lower    Upper
146      133.815  128.832  138.797     135.2
147      133.814  128.637  138.991     139.2
148      133.814  128.450  139.178     136.8
149      133.813  128.268  139.358     136.0
150      133.813  128.092  139.533     134.4

This model fits well. The usual residual analysis indicates no model inadequacies.

Comparing the forecasts with the actuals for the five days from forecast origin t = 145

using MAPE gives MAPE = 1.82%.

14.

The sample autocorrelation and partial autocorrelation functions below suggest an
AR(2) or, equivalently, an ARIMA(2,0,0) model. The computer output follows along
with the residual autocorrelation function.

Final Estimates of Parameters
Type         Coef  SE Coef       T      P
AR 1       1.4837   0.0732   20.26  0.000
AR 2      -0.7619   0.0729  -10.45  0.000
Constant   17.181    1.381   12.44  0.000
Mean       61.757    4.965

Number of observations: 90
Residuals: SS = 14914.5 (backforecasts excluded)
           MS = 171.4 DF = 87

Modified Box-Pierce (Ljung-Box) Chi-Square statistic
Lag            12     24     36     48
Chi-Square   19.9   25.9   41.7   55.9
DF              9     21     33     45
P-Value     0.018  0.209  0.142  0.128

Forecasts from period 90

Period  Forecast   95% Limits         Actual
                    Lower    Upper
91       110.333   84.665  136.001

The forecast of 110 accidents for the 91st week seems reasonable given the history

of the series near that point.

There is no evidence of annual seasonality in these data but since there is less

than two years of weekly observations, seasonality, if it exists, would be virtually

impossible to detect.

15.

The time series plot that follows suggests the Price series is non-stationary. This

is corroborated by the autocorrelations which are slow to die out. The differenced

series should be investigated.

The autocorrelation function for the differenced data below suggests the
differenced series is random. The partial autocorrelation function for the
differenced data has a similar appearance.

An ARIMA(0,1,0) model is identified for the price of corn. For this model,
a forecast of the next observation at forecast origin t is given by Ŷt+1 = Yt. Forecasts
two steps ahead are the same, similarly for three steps ahead, and so forth. In other
words, this model produces flat line forecasts whose level is given by Yt.
So, forecasts of the price of corn for the next 12 months are all given by the last
observation, 251 cents per bushel.
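A two-line sketch of the point, with a hypothetical tail of the price series:

import numpy as np

price = np.array([238, 242, 247, 244, 249, 251])  # hypothetical, in cents
print(np.repeat(price[-1], 12))  # flat forecasts at 251 for 12 months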

16.

The variation in the Cavanaugh sales series increases with the level, so a log
transformation seems appropriate. Let Yt be the natural log of sales and
Wt = Yt - Yt-12 be the seasonally differenced series. Two ARIMA models that
represent the data reasonably well are given by the representations
ARIMA(0,0,2)(0,1,0)12 and ARIMA(1,0,0)(0,1,1)12. Both models contain a
constant term. Another possibility is the ARIMA(0,1,1)(0,1,1)12 model (without
a constant), but the latter doesn't fit quite as well as the former models. The results
for the ARIMA(1,0,0)(0,1,1)12 process are displayed below.

Fitted model: Wt = .54Wt-1 + .119 + εt - .81εt-12

Final Estimates of Parameters
Type        Coef  SE Coef      T      P
AR 1      0.5400   0.1080   5.00  0.000
SMA 12    0.8076   0.1162   6.95  0.000
Constant  0.1187   0.0060  19.70  0.000

Differencing: 0 regular, 1 seasonal of order 12
Number of observations: Original series 77, after differencing 65

Forecasts:

Date       ForecastLnSales  ForecastSales
Jun. 2000          5.76675            320
Jul. 2000          6.11484            453
Aug. 2000          6.40039            602
Sep. 2000          6.80928            906
Oct. 2000          7.09153           1202
Nov. 2000          7.14969           1274
Dec. 2000          6.85211            946

The residual autocorrelation at lag 2 can be ignored or, alternatively, one can fit the
ARIMA(0,0,2)(0,1,1)12 model.

17.

The variation in Disney sales increases with the level, so a log transformation
seems appropriate. Let Yt be the natural log of sales and Wt = Yt - Yt-4 be the
seasonally differenced series. Two ARIMA models that represent the data
reasonably well are given by the representations ARIMA(1,0,0)(0,1,1)4 and
ARIMA(0,1,1)(0,1,1)4. The former model contains a constant. The results for
the ARIMA(1,0,0)(0,1,1)4 process are displayed below.

Fitted model: Wt = .50Wt-1 + .089 + εt - .49εt-4

Final Estimates of Parameters
Type        Coef  SE Coef      T      P
AR 1      0.4991   0.1164   4.29  0.000
SMA 4     0.4863   0.1196   4.07  0.000
Constant  0.0886   0.0063  14.07  0.000

Differencing: 0 regular, 1 seasonal of order 4
Number of observations: Original series 63, after differencing 59

Forecasts:

Date     ForecastLnSales  ForecastSales
Q4 1995       8.25008          3828
Q1 1996       8.12423          3375
Q2 1996       8.11642          3349
Q3 1996       8.24372          3804
Q4 1996       8.43698          4615
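The ForecastSales column is just the exponential of the ForecastLnSales column; a quick check in Python with the values copied from the output above:

import math

ln_fc = [8.25008, 8.12423, 8.11642, 8.24372, 8.43698]
print([round(math.exp(v)) for v in ln_fc])  # [3828, 3375, 3349, 3804, 4615]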

18.

The data were transformed by taking natural logs; however, an ARIMA model may be fit to the original observations. Let Y_t be the natural log of demand and let W_t = ∇∇_{12}Y_t = Y_t - Y_{t-1} - Y_{t-12} + Y_{t-13} be the series after taking one seasonal difference followed by a regular difference. An ARIMA(0,1,1)(0,1,1)12 model represents the log demand series well. The results follow.

Fitted model: W_t = ε_t - .63ε_{t-1} - .57ε_{t-12} + (.63)(.57)ε_{t-13}

Final Estimates of Parameters

Type      Coef  SE Coef     T      P
MA 1    0.6309   0.0724  8.71  0.000
SMA 12  0.5735   0.0849  6.75  0.000

Differencing: 1 regular, 1 seasonal of order 12
Number of observations: Original series 129, after differencing 116

Forecasts:

Date       ForecastLnDemand  ForecastDemand
Oct. 1996       5.23761           188
Nov. 1996       5.29666           200
Dec. 1996       5.33704           208
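The double difference W_t can be computed either as two successive differences or directly from the four-term expression above. A small check that the two agree, using an arbitrary stand-in series (the demand data are not reproduced here):

import numpy as np
import pandas as pd

# Stand-in log-demand series; any series works for the identity check.
y = pd.Series(np.log(100 + np.arange(40, dtype=float)))

w1 = y.diff(1).diff(12)                          # (1 - B)(1 - B^12)Y_t
w2 = y - y.shift(1) - y.shift(12) + y.shift(13)  # expanded form
print(np.allclose(w1.dropna(), w2.dropna()))     # True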

19.

Let W_t = ∇∇_{12}Y_t be the series after taking one seasonal difference followed by a regular difference. Examination of the autocorrelation function for W_t leads to the identification of an ARIMA(0,1,0)(0,1,1)12 model. The results follow.

Fitted model: W_t = ε_t - .84ε_{t-12}

Final Estimates of Parameters

Type      Coef  SE Coef      T      P
SMA 12  0.8438   0.0733  11.51  0.000

Differencing: 1 regular, 1 seasonal of order 12
Number of observations: Original series 130, after differencing 117
Residuals: MS = 61770 DF = 116

Lag            12     24     36     48
Chi-Square   13.2   19.3   26.2   52.6
DF             11     23     35     47
P-Value     0.280  0.681  0.858  0.266

                       95% Limits
Period  Forecast     Lower     Upper
131      73653.4   73166.1   74140.6
132      73448.7   72759.7   74137.8
133      72571.8   71727.9   73415.7
134      72904.3   71929.8   73878.7
135      73200.8   72111.4   74290.3
136      73711.5   72518.1   74905.0
137      74218.7   72929.6   75507.7
138      75021.6   73643.5   76399.7
139      75459.7   73998.0   76921.4
140      75114.5   73573.8   76655.3
141      74519.0   72903.0   76134.9
142      74681.4   72993.6   76369.2

20.

The variation in Wal-Mart sales increases with the level, so a log transformation seems appropriate. Let Y_t be the natural log of sales and W_t = Y_t - Y_{t-4} be the seasonally differenced series. Examination of the autocorrelation function for W_t leads to the identification of an ARIMA(0,1,0)(0,1,1)4 model. The results follow.

Fitted model: W_t = ε_t - .52ε_{t-4}

Final Estimates of Parameters

Type     Coef  SE Coef     T      P
SMA 4  0.5249   0.1185  4.43  0.000

Differencing: 1 regular, 1 seasonal of order 4
Number of observations: Original series 60, after differencing 55
Residuals: MS = 0.0005984 DF = 54

Lag            12     24     36     48
Chi-Square   12.5   20.3   30.3   47.7
DF             11     23     35     47
P-Value     0.327  0.626  0.697  0.445

Forecasts from period 60

Period   LnSales    Sales
Q1/05    11.1671   70,764
Q2/05    11.2514   76,988
Q3/05    11.2408   76,176
Q4/05    11.4233   91,427
Q1/06    11.2660   78,120
Q2/06    11.3503   84,991
Q3/06    11.3397   84,095
Q4/06    11.5223  100,942

21.

Examination of the autocorrelation and partial autocorrelation functions leads to the identification of an AR(1) model.

Summary of the model fit and forecasts follow.

Final Estimates of Parameters

Type        Coef  SE Coef      T      P
AR 1      0.5486   0.0845   6.49  0.000
Constant  9.0295   0.6082  14.85  0.000
Mean      20.001    1.347

Number of observations: 100
Residuals: MS = 36.99 DF = 98
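In Minitab's AR(1) output the Constant and Mean rows are linked by Constant = Mean × (1 - φ̂1); a quick arithmetic check of the values above:

phi1, mean = 0.5486, 20.001
print(mean * (1 - phi1))  # about 9.03, matching the Constant row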

Lag            12     24     36     48
Chi-Square   11.0   23.2   30.7   45.9
DF             10     22     34     46
P-Value     0.358  0.388  0.631  0.477

                      95% Limits
Period  Forecast    Lower    Upper
101      21.6463   9.7236  33.5691
102      20.9037   7.3049  34.5026
103      20.4964   6.4323  34.5605
104      20.2729   6.0718  34.4741
105      20.1504   5.9082  34.3925
106      20.0831   5.8287  34.3376

22.

Since the variation in the series increases with the level, a log transformation is indicated. An examination of the autocorrelations and partial autocorrelations for LnGapSales leads to the identification of an ARIMA(0,1,0)(0,1,1)4 model. Summary of model fit and forecasts for the next 8 quarters follow.

Final Estimates of Parameters

Type     Coef  SE Coef     T      P
SMA 4  0.2780   0.1004  2.77  0.007

Differencing: 1 regular, 1 seasonal of order 4
Number of observations: Original series 100, after differencing 95
Residuals: MS = 0.003316 DF = 94

Lag          12     24     36     48
DF           11     23     35     47
P-Value   0.194  0.659  0.916  0.990

Forecasts from period 100

Period  LnGapSales  GapSales
101        8.19679     3,629
102        8.23373     3,766
103        8.30268     4,035
104        8.51475     4,988
105        8.21496     3,696
106        8.25189     3,835
107        8.32085     4,109
108        8.53291     5,079
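A sketch of the identification step described above, assuming `gap_sales` is a placeholder quarterly pandas Series of Gap sales:

import numpy as np
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf

# One regular and one seasonal (lag-4) difference of the logged series.
w = np.log(gap_sales).diff(1).diff(4).dropna()

plot_acf(w, lags=24)   # a lone spike at lag 4 points to a seasonal MA(1)
plot_pacf(w, lags=24)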

23.

The long strings of 0s (no Influenza A positive cases) of uneven lengths might create identification and fitting problems for ARIMA modeling. On the other hand, a simple AR(1) model with an AR coefficient of about .8 and no constant term might provide reasonable one week ahead forecasts for the number of positive cases. These forecasts can be generated with the understanding that any non-integer forecast less than 1 is set to 0 and any non-integer forecast greater than 1 is rounded to the closest integer.
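The cleanup rule just described could be coded as a small helper; this is a hypothetical convention for presenting the forecasts, not part of any forecasting package:

def clean_count_forecast(f):
    # Forecasts below 1 become 0; the rest round to the nearest integer.
    return 0 if f < 1 else round(f)

print([clean_count_forecast(x) for x in (0.4, 0.9, 1.2, 3.6)])  # [0, 0, 1, 4]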

1. & 2. & 3.

Type        Coef  SE Coef      T      P
AR 1      0.5997
Constant  1921.7    100.2  19.18  0.000
Mean      4800.8    250.3

Residuals: MS = 1038870 DF = 102

Lag            12     24     36     48
Chi-Square    8.9   24.5   36.8   48.5
DF             10     22     34     46
P-Value     0.545  0.322  0.342  0.372

Forecasts and actuals for first four weeks in January 1983:

                      95% Limits
Period  Forecast    Lower     Upper   Actual
105      3249.49  1251.36   5247.62     2431
106      3870.48  1540.58   6200.38     2796
107      4242.89  1804.68   6681.10     4432
108      4466.23  1990.23   6942.23     5714

Forecasts are too high for the first two weeks of January 1983 and too low for the next two weeks. Note, however, that actual sales fall within the 95% prediction interval limits for each of the four weeks.
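A quick containment check in Python with the values copied from the table above:

rows = [(1251.36, 5247.62, 2431),
        (1540.58, 6200.38, 2796),
        (1804.68, 6681.10, 4432),
        (1990.23, 6942.23, 5714)]
print(all(lo <= actual <= hi for lo, hi, actual in rows))  # True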

4.

The best model in Chapter 8 for the original Restaurant Sales data is an autoregressive model with an added dummy variable to represent the period during the year when Marquette University is in session. So, because of the additional dummy variable, this model fits the data better than the AR(1) model in part 1. If the dummy variable were not present, the two models would be the same. Consequently, we would expect better forecasts with the AR + dummy variable model than with the simple AR model. Regardless, however, if forecasts are compared to actuals from forecast origin 104 (last week in 1982), the usual measures of forecast accuracy (RMSE, MAPE, etc.) are likely to be relatively large since a large portion of the variation in sales is not accounted for by the AR + dummy variable model.

5.

At the very least, the parameters in the AR(1) model should be re-estimated if the new data are combined with the old data. A better approach is to combine the data and then go through the usual ARIMA model-building process again. It may be that the combined data suggest the form of the ARIMA model has changed. In this case, an AR(1) is still appropriate when the new data are combined with the old data.

1.

Box-Jenkins ARIMA models account for the autocorrelation in the observed series using possibly differenced data, lagged dependent variables, and current and previous errors. There are no potential causal (exogenous) independent variables in these models, so they are often difficult to explain to management. It is best to demonstrate the results.

2.

Autocorrelation and partial autocorrelation plots for the regular and seasonally differenced data suggest a non-seasonal AR(2) term (the partial autocorrelations cut off after lag 2 and the autocorrelations die out). No seasonal MA or AR terms should be included. However, here is a case where, say, the ARIMA(2,1,0)(0,1,0)12 model is more complex than necessary and a much simpler model works well. A time series plot of the seasonally differenced Mr. Tux data is shown below along with the sample autocorrelation function for these differences. These plots suggest a simple model for the seasonal differences will provide a good fit to the Mr. Tux data.

3.

To fit the model W_t = Y_t - Y_{t-12} = θ_0 + ε_t to the Mr. Tux data, simply set θ_0 equal to W̄, the mean of the seasonal differences. Here θ_0 = 32,174. Since the residuals from this model differ from the seasonal differences by the constant θ_0 = 32,174, the residual autocorrelation function will be identical to the autocorrelation function for the seasonal differences shown in part 2. The forecasting equation is simply Ŷ_t = 32,174 + Y_{t-12}. Setting t = 97 through t = 108, we have the forecasts for the 12 months of 2006:

Ŷ_97 = 32,174 + 71,043 = 103,217
Ŷ_98 = 185,104
Ŷ_99 = 282,733
Ŷ_100 = 441,741
Ŷ_101 = 426,921
Ŷ_102 = 305,048
Ŷ_103 = …
Ŷ_104 = 407,576
Ŷ_105 = 227,583
Ŷ_106 = 205,692
Ŷ_107 = 213,876
Ŷ_108 = 290,887

The sales forecasts for 2006 are obtained by adding 32,174 to the sales for each of the 12 months of 2005.
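The forecasting equation is simple enough to evaluate directly; as a sketch, with `sales_2005` a placeholder list of the 12 monthly sales for 2005:

theta0 = 32174  # mean of the seasonal differences

# Y_hat_t = 32,174 + Y_{t-12}: add the drift to each month of 2005.
forecasts_2006 = [theta0 + y for y in sales_2005]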

CASE 9-3: CONSUMER CREDIT COUNSELING

2.

The autocorrelation function plot below indicates that the data are non-stationary. The autocorrelations are slow to die out. In addition, there is a spike at lag 12 and a smaller spike at lag 24 indicating some seasonality.

The autocorrelation functions for the differenced series (DiffClients), the seasonally differenced series (Diff12Clients) and the series with one regular and one seasonal difference (DiffDiff12Clients) follow.

The autocorrelations for DiffDiff12Clients are much more pronounced, indicating one regular difference and one seasonal difference is too much. The autocorrelations for Diff12Clients are the cleanest, with a significant spike at lag 12 and a slightly smaller spike at lag 24. This autocorrelation pattern suggests an ARIMA(0,0,0)(0,1,1)12 or an ARIMA(0,0,0)(1,1,0)12 model. The former model is the better choice. Summary results and forecasts follow.

Final Estimates of Parameters

Type      Coef  SE Coef     T      P
SMA 12  0.4614   0.1055  4.37  0.000

Number of observations: Original series 99, after differencing 87
Residuals: MS = 710.4 DF = 86

Modified Box-Pierce (Ljung-Box) Chi-Square statistic

Lag            12     24     36     48
Chi-Square   10.9   20.3   30.8   37.4
DF             11     23     35     47
P-Value     0.452  0.623  0.669  0.842

Forecasts from period March 1993

                        95% Limits
Period     Forecast    Lower     Upper
Apr 1993    123.181   70.931   175.431
May 1993    122.960   70.710   175.210
Jun 1993    140.803   88.553   193.053
Jul 1993    150.944   98.694   203.194
Aug 1993    140.056   87.806   192.306
Sep 1993    134.285   82.035   186.535
Oct 1993    146.517   94.267   198.767
Nov 1993    146.953   94.703   199.203
Dec 1993    126.243   73.993   178.493
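A sketch of the comparison of candidate differencings described above, assuming `clients` is a placeholder monthly pandas Series of new clients:

from statsmodels.graphics.tsaplots import plot_acf

d1    = clients.diff(1).dropna()            # DiffClients
d12   = clients.diff(12).dropna()           # Diff12Clients
d1_12 = clients.diff(1).diff(12).dropna()   # DiffDiff12Clients

# Compare the sample autocorrelations; Diff12Clients gave the
# cleanest pattern here.
for w in (d1, d12, d1_12):
    plot_acf(w, lags=36)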

1.

The forecast for 1961 using the AR(2) model is 1290. The revised error measures are: MAD = 114, MAPE = 7.1%.

2.

The results from fitting an ARIMA(1,1,0) model, one step ahead forecasts, and actuals follow.

Final Estimates of Parameters

Type    Coef  SE Coef     T      P
AR 1  0.4551   0.1408  3.23  0.002

Number of observations: Original series 42, after differencing 41
Residuals: SS = 2139107 (backforecasts excluded)
           MS = 53478 DF = 40

Modified Box-Pierce (Ljung-Box) Chi-Square statistic

Lag            12     24     36   48
Chi-Square    5.5   22.1   25.3    *
DF             11     23     35    *
P-Value     0.905  0.514  0.885    *

Period  Actual  Forecast  Error
1949      1984      1905     79
1950      1787      2018   -231
1951      1689      1697     -8
1952      1866      1644    222
1953      1896      1947    -51
1954      1684      1910   -226
1955      1633      1588     45
1956      1657      1610     47
1957      1569      1668    -99
1958      1390      1529   -139
1959      1397      1309     88
1960      1289      1400   -111

Comparing the forecasting equation for the ARIMA(1,1,0) model with the forecasting equation for the AR(2) model given in the case, we see the two equations are very similar and, consequently, would expect the one step ahead forecasts and forecast errors to be similar. The error measures for the ARIMA(1,1,0) forecasts are MAD = 112 and MAPE = 6.9%, essentially the same as those for the AR(2) model. The choice of one model over the other depends upon whether one believes the sales series is non-stationary or nearly non-stationary.
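The error measures can be reproduced directly from the Actual and Error columns of the table above:

actual = [1984, 1787, 1689, 1866, 1896, 1684, 1633, 1657, 1569, 1390, 1397, 1289]
error  = [79, -231, -8, 222, -51, -226, 45, 47, -99, -139, 88, -111]

mad  = sum(abs(e) for e in error) / len(error)
mape = 100 * sum(abs(e) / a for e, a in zip(error, actual)) / len(error)
print(round(mad), round(mape, 1))  # 112 6.9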

3.

Answers will vary. Students might distinguish between evolving products (such as the automobile) whose changes could affect sales versus fairly standard products (such as copper) whose demand may be impacted by technological advances which require them (such as wiring). There are no right or wrong answers here--just some that are better than others.

1. & 2.

Fitted model: Y_t = Y_{t-12} + 50.479 + ε_t - .792ε_{t-12}

CASE 9-6: UPS AIR FINANCE DIVISION

1.

A constant term is not required with a regular and a seasonal difference.

2.

The residuals display no significant autocorrelation. The residual autocorrelations and residual plots below confirm the model is adequate. (There is one large residual in period 49.)

3.

The results from fitting an ARIMA(0,0,1)(0,1,1)12 model and forecasts follow. The residual autocorrelations are shown below. The residual plots look good. The model is adequate.

1.

Final Estimates of Parameters

Type       Coef  SE Coef      T      P
SMA 12  -0.6376   0.3022  -2.11  0.046

Number of observations: Original series 25, after differencing 24
Residuals: SS = 202359762350 (backforecasts excluded)
           MS = 8798250537 DF = 23

Modified Box-Pierce (Ljung-Box) Chi-Square statistic

Lag           12   24   36   48
Chi-Square  10.2    *    *    *
DF            11    *    *    *
P-Value    0.513    *    *    *

This model was suggested by an examination of the plots of the autocorrelation and partial autocorrelation functions for the original series and the first differenced series. Another potential model is an ARIMA(1,0,0)(0,0,1)12 model. But if this model is fit to the data, the estimate of the autoregressive parameter turns out to be very nearly 1, confirming the choice of the initial ARIMA(0,1,0)(0,0,1)12.
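A sketch of that diagnostic fit, assuming `y` is a placeholder monthly pandas Series; statsmodels parameter names are used and the estimates will not match Minitab's exactly:

from statsmodels.tsa.statespace.sarimax import SARIMAX

fit = SARIMAX(y, order=(1, 0, 0), seasonal_order=(0, 0, 1, 12)).fit(disp=False)

# An AR(1) estimate near 1 argues for a regular difference instead,
# i.e., the ARIMA(0,1,0)(0,0,1)12 model adopted above.
print(fit.params["ar.L1"])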

2.

The model in part 1 is adequate. There is no residual autocorrelation and the residual plots that follow look good.

3.

                       95% Limits
Period  Forecast     Lower      Upper
26        426280    242397     610163
27        492809    232759     752859
28        527275    208780     845770
29        535656    167890     903422
30        545614    134439     956789
31        692161    241741    1142580
32        554640     68131    1041149
33        494570    -25530    1014669
34        484265    -67384    1035914
35        471355   -110135    1052844
36        462995   -146876    1072867
37        491232   -145757    1128222

The pattern of the forecasts is reasonable but the forecast of the seasonal peak in

December (recall this series starts in June) is very likely to be much too low. The

actual December peak may be captured by the 95% prediction limits but, because of

the small sample size, these limits are wide. The lower prediction limit is even

negative for some lead times.

4.

The sample size in this case is small. With only two years of monthly data, it is

difficult to estimate the seasonality precisely. Although an ARIMA model

does provide some insights into the nature of this series, another modeling approach

may produce more readily acceptable forecasts.

1.

Final Estimates of Parameters

Type      Coef  SE Coef     T      P
SMA 12  0.7150   0.1910  3.74  0.001

Number of observations: Original series 41, after differencing 29
Residuals: SS = 650837704391 (backforecasts excluded)
           MS = 23244203728 DF = 28

Modified Box-Pierce (Ljung-Box) Chi-Square statistic

Lag           12     24   36   48
Chi-Square   7.4   14.2    *    *
DF            11     23    *    *
P-Value    0.770  0.921    *    *

Cookie sales have a strong and quite consistent seasonal component but little or no growth. Following the usual pattern of looking at autocorrelations and partial autocorrelations for the original series and its various differences, the best patterns for model identification appear to be those for the original series and the seasonally differenced series. In either case, a seasonal moving average term of order 12 is included in the model to accommodate seasonality and can be deleted if non-significant. Fitting an ARIMA(1,0,0)(0,0,1)12 model gives an estimated autoregressive coefficient of about .9, suggesting perhaps a model with a regular difference; but that route leads to residual autocorrelations and unattractive forecasts, so this line of inquiry is not useful. The ARIMA model above involving the seasonally differenced data fits well and, as we shall see, produces reasonable forecasts.

2.

Judging from the residual autocorrelations and residual plots below, the ARIMA(0,0,0)(0,1,1)12 model is adequate.

3.

The forecasts for the next 12 months follow. Judging from the time series plot, they seem very reasonable.

Forecasts from period 41

                        95% Limits
Period   Forecast     Lower      Upper
42         627865    328983     926748
43         721336    422453    1020219
44         658579    359696     957461
45        1533503   1234620    1832386
46        1628889   1330007    1927772
47        2070440   1771557    2369323
48        1805503   1506620    2104385
49         778148    479265    1077031
50         534265    235382     833148
51         525169    226286     824052
52         697168    398285     996051
53         624876    325994     923759

1.

Various plots follow. Given these plots, Mary's initial model seems reasonable.

2.

The ARIMA(0,1,1)(0,1,1)12 model fit, residual analysis, and forecasts for the next 12 months follow.

Final Estimates of Parameters

Type      Coef  SE Coef      T      P
MA 1    0.3568   0.0931   3.83  0.000
SMA 12  0.8646   0.0800  10.80  0.000

Differencing: 1 regular, 1 seasonal of order 12
Number of observations: Original series 114, after differencing 101
Residuals: SS = 988551 (backforecasts excluded)
           MS = 9985 DF = 99

Modified Box-Pierce (Ljung-Box) Chi-Square statistic

Lag            12     24     36     48
Chi-Square   21.2   53.0   72.9   88.1
DF             10     22     34     46
P-Value     0.020  0.000  0.000  0.000

                       95% Limits
Period  Forecast     Lower     Upper
115      1419.59   1223.70   1615.49
116      1438.07   1205.16   1670.99
117      1386.09   1121.28   1650.90
118      1376.53   1083.27   1669.79
119      1459.48   1140.30   1778.66
120      1431.27   1088.12   1774.41
121      1365.43    999.88   1730.98
122      1456.48   1069.83   1843.14
123      1324.46    917.79   1731.12
124      1303.44    877.70   1729.17
125      1442.69    998.71   1886.68
126      1350.23    888.71   1811.74

Collectively, the residual autocorrelations are larger than they would be for random

errors; however, they suggest no obvious additional terms to add to the ARIMA model.

Apart from the large residual at month 68, the residual plots look good. The forecasts

seem reasonable but the 95% prediction limits are fairly wide.

3.

Total visits for fiscal years 4, 5 and 6 seem somewhat removed from the rest of the data. Total visits for these fiscal years are, as a group, somewhat larger than the remaining observations. Did something unusual happen during these years? Was total visits defined differently? This particular feature makes modeling difficult.

CHAPTER 10

JUDGMENTAL FORECASTING AND FORECAST ADJUSTMENTS

ANSWERS TO PROBLEMS AND CASES

1.

The Delphi method can be used in any forecasting situation where there is little or no historical data and there is expert opinion (experience) available. One example might be full capacity employment at a new plant. Potential difficulties with the method include:

Assembling the right group of experts.
Overcoming individual biases or agendas.
Not being able to arrange timely feedback.

2.

a.

Month  Averaged Forecast
  1         4924.5
  2         5976.0
  3         6769.0
  4         4708.0
  5         4964.0
  6         6102.0
  7         8212.5
  8         6178.5
  9         4806.5
 10         4228.5

b.

Month  WtAvg Forecast
  1         4721.4
  2         5956.8
  3         6731.2
  4         4601.2
  5         4991.6
  6         6385.2
  7         8362.2
  8         6320.4
  9         4596.8
 10         4474.2

d. Averaged forecasts MAPE = 7.4%; weighted averaged forecasts MAPE = 8.8%. So, based on MAPE, the forecasts created by taking a simple average of the individual forecasts are better than the weighted average forecasts.
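A sketch of the combination arithmetic, with `f1` and `f2` placeholder lists holding the two individual sets of forecasts and `actual` the observed values; the weights in the weighted average are assumptions for illustration, not the case's:

def mape(actual, forecast):
    return 100 * sum(abs(a - f) / a for a, f in zip(actual, forecast)) / len(actual)

avg  = [(a + b) / 2 for a, b in zip(f1, f2)]        # part (a): simple average
wavg = [0.6 * a + 0.4 * b for a, b in zip(f1, f2)]  # part (b): weighted average (assumed weights)

print(mape(actual, avg), mape(actual, wavg))        # part (d): compare accuracy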

CASE 10-1: GOLDEN GARDENS RESTAURANT

1. & 2. Sue and Bill have tackled a very tough business project: designing a restaurant

that will succeed. Restaurants seem to come and go on a regular basis so their

planning efforts prior to opening are important.

They have already tried focus groups and have some ideas to add to their own.

Since they have a number of "expert" friends, some way must be found to use this

expertise. The Delphi method suggests itself as a way to utilize their friends'

knowledge. A written description of the project along with the question of proper

motif could be supplied to each of their friends, along with a request to design the

restaurant. These descriptions would then be mailed back to each participant with

a request to re-design the business based on all the written replies. This process

could be continued until changes are no longer generated.

An optional step would then be to bring the participants together for a discussion.

This expert focus group could argue their cases and respond to Sue and Bill's

objections or insights. At the end of this process Sue and Bill would probably have

a better idea of how a successful restaurant would look and could begin their project

with more confidence. Also, financial backers would probably be more enthusiastic

after reviewing the extensive planning that Sue and Bill have undertaken prior to

opening their business.

CASE 10-2: ALOMEGA FOOD STORES

1.

The naïve forecasting model is not very accurate. The MSE equals 8,648,047,253.

2.

The MSE for the multiple regression model (from the regression output) equals 2,097,765,646, which is quite a bit less than that of the naïve model.
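The naïve model uses last month's sales as this month's forecast, so its MSE comes directly from successive observations; a sketch, with `sales` a placeholder list of Alomega monthly sales:

errors = [sales[t] - sales[t - 1] for t in range(1, len(sales))]
mse = sum(e * e for e in errors) / len(errors)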

3.

If the naïve approach had been more accurate, combining methods would have been worth a try.

4.

If Julie did combine forecasts, she should use a weighted average that definitely

favored the multiple regression model.

1.

These articles are more abundant than many realize. More "popular" journals, particularly financial markets titles such as Technical Analysis of Stocks & Commodities, often carry them. In addition, the proceedings from the neural network conferences (published by IEEE) will usually have some business applications. Finally, this approach is beginning to appear in more scholarly journals such as Management Science and Decision Sciences.

1.

The interested student with access to a neural network simulator should enjoy

this assignment. In addition to the "backpropagation" approach, students might

try radial basis functions and least mean squares if they are available.

3.

Neural networks might be tried on a series like the annual sales of Case 9-4, where the choice between ARIMA(1,1,0) and AR(2) models is not clear-cut. Neural networks, however, do not require the analyst to specify the form of the model -- they have been called "model free" function approximators (see Bart Kosko, Neural Networks and Fuzzy Systems: A Dynamical Systems Approach to Machine Intelligence, Prentice-Hall, 1992, for example).

CHAPTER 11

MANAGING THE FORECASTING PROCESS

ANSWERS TO PROBLEMS AND CASES


1.

a. One response: Forecasts may not be right, but they improve the odds of being close to right. More importantly, if there is no agreed-upon set of forecasts to drive planning, then different groups may develop their own procedures to guide planning, with potential chaos as the result.

b. One response: Analogy: If you think education is expensive, try ignorance. Having a good set of forecasts is like walking while looking ahead instead of at your shoes. Planning without forecasts will lead to inefficient operations, suboptimal returns on investment, poor customer service, and so forth.

c. One response: Good forecasts require not only good quantitative skills; they also require an in-depth understanding of the business or, more generally, the forecasting environment and, ultimately, good communication skills to sell forecasts to management.

1.

This case invites students to think about how to use some of the forecasting techniques

discussed in Chapter 11. Guy Preston is trying to get his managers to think about the

long-range position of the company, as opposed to the short range thinking that most

managers are involved in on a daily basis. The case might generate a class discussion

about the tendency of managers to shorten their planning horizons too much in the

daily press of business.

Guy has asked his managers to write scenarios for the future: a worst case, a status quo,

and a most likely scenario. His next task might be to discuss each of these three

possibilities, and to discuss any differences of opinion that might emerge. A second

round of written scenarios by each participant could then follow this.

2.

The instructor should point out that the purpose of Guy's retreat is to expand the

planning horizon of his managers. He should be prepared to continue this effort after

the first round of written scenarios: it is quite possible that his team is still caught

up in the affairs of the day and is not really engaged in long-range thinking. He should encourage expanded thinking after the discussion phase and try to sustain it throughout the day.

3.

There are two possible benefits from Guy's retreat. First, he may gain valuable insights

into the company's future to use in his own long range thinking. Second, and

probably more important, his managers may come away with an increased

awareness of the importance of expanding their planning horizons. If this is true,

the company will probably be in a better position to face the future.

1.

Since simple exponential smoothing is a special case of Holt's procedure (with the trend-smoothing constant set to 0), one would expect Holt's procedure to fit and forecast better here. Therefore, there is no reason to consider a combination of forecasts. Combining forecasts is best considered when the sets of forecasts are produced by different procedures.

2.

Jill should definitely update her historical data as new data points arrive. Since she

is using a computer program to do the forecasting, there would be very little effort

involved in this process. Why not update and re-run every quarter for a while?

3.

After the results for a few additional quarters (say 4) become available, the analysis

can be re-done to see if the current model is still viable. Model parameters can be

re-estimated after each new observation if appropriate computer software is available.

4.

Box-Jenkins ARIMA methodology is not well suited for small sample sizes and

can be difficult to explain to a non-statistician.

This case illustrates the practical problems that are typically encountered when

attempting to forecast a time series in a business setting. Among the problems Jill

encounters are:

She chooses to forecast a national variable for which data values are available

in the Survey of Current Business. Will this variable correlate well with the

actual Y value of interest (her firm's export sales)?

Her initial sample size is only 13.

When she attempts to gather more data, she finds that the series underwent a

definition change during the recent past, resulting in inconsistent data. She must

shift her focus to another surrogate variable.

Her data plot indicates a bump in the data and she decides a more

consistent series would result if she dropped the first few data points.

A real-life forecasting project could very likely involve difficulties such as those

Jill encountered in this case, or perhaps even more. For this reason this case is a "good

read" for forecasting students as they finish their studies since it shows that judgment

and skill must be involved in the forecasting effort: forecasting problems are not usually

as clean and straightforward as textbook problems.

CASE 11-3: CONSUMER CREDIT COUNSELING

Students should summarize the results of the analyses of these data in the cases at the ends of Chapters 4 (smoothing), 5 (decomposition), 6 (simple linear regression), 8 (regression with time series data) and 9 (Box-Jenkins methods). Fits, residual analyses, and forecasts can be compared. Regardless of the method, there is a fair amount of unexplained variation in the number of new clients. This may be a situation where combining forecasts makes sense.

CASE 11-4: MR. TUX

We collected the data from the Mr. Tux rental shop so that real data could be used at the end of each chapter instead of contrived data. We didn't know what would happen when we tried to forecast this variable, but we think it turned out well because no one method was superior.

The case in Chapter 11 summarizes the different ways John used to forecast his monthly

sales, and asks students to comment on his efforts. We think a key point is that a lot of real data

sets do not lend themselves to accurate forecasting, and that continually trying different methods is

required. For the Mr. Tux data, there are fairly simple seasonal models (see the cases in Chapters

8 and 9) that represent the data well and provide reasonable forecasts.

What advice should we give to John Mosby for the future? Some suggestions to offer

might include:

1. Update the data set as future monthly values become available and re-run the

most promising analyses to see if the current forecasting model is still viable.

2. Consider combining forecasts from two different methods.

3. Try to develop a useful relationship between monthly sales and regional

economic variables. Perhaps the area unemployment rate or an economic activity

index would correlate well with John's sales. Perhaps some demographic

variables would correlate well. If several variables were collected over the

months of John's sales data, a good regression equation might result.

This would allow John to understand how his sales are tied to the local

environment.

CASE 11-5: ALOMEGA FOOD STORES

1.

Julie has to choose between two different methods of forecasting her company's

monthly sales. Students should review the results of these two efforts and decide

which offers the better choice. We find that class presentations by student teams

are valuable as they move the analysis beyond the computer results to simulate

implementing these results in a real situation.

2.

Having student teams submit a written report summarizing their analysis and choice of forecasting method is an alternative to class presentations. Again, the results of this case do not point to a right answer, but rather to the necessity of choosing a forecasting method and justifying its use. Nonquantitative considerations should come into play: the fact that Julie is the first female president of Alomega, that she jumped over several qualified candidates for the job, and that one of her subordinates (Jackson Tilson) seems to be unimpressed with both her and any computer analysis.

3.

One way to extend the case beyond a consideration of choosing between decomposition and multiple regression would be to find a superior forecasting method using any available software. Again, the qualitative considerations should be weighed, including the necessity of balancing the complexity and accuracy of a forecasting method with its acceptance and use by the management team.


4.

Students should summarize the results of Mary's forecasting efforts, describing the fits, residual analyses, and forecasts. Moreover, they should point out the apparent difficulty in finding an adequate model for Mary's total visits series. If Mary's data is accurate (there is no reason for the apparent inconsistency in her time series), then it would probably be wise to collect another year or so of data and attempt to model the entire data set or, perhaps, just the data following fiscal year 6. In the interim, she may have to settle for the forecasts from the best ARIMA model developed in Case 9-10.
