
Over the past 25 years a chain of discount women's clothing stores has increased market share by
increasing the number of locations in the chain. A systematic approach to site selection was never
used. Site selection was primarily based on what was considered to be a great location or a great
lease. This year, with a strategic plan for opening several new stores, the director of special
projects and planning is being asked to develop an approach for forecasting annual sales for all
new stores.
To develop this forecasting approach, suppose that the director decided to examine the
relationship between the size (i.e., square footage) of a store and its annual sales by selecting a
sample of 14 stores. The results for these 14 stores are summarized in Table 1.
Table 1
Square footage and annual sales ($000) for a sample of 14 branches of a women's clothing store chain

STORE    SQUARE FEET    ANNUAL SALES ($000)
  1         1726              3681
  2         1642              3895
  3         2816              6653
  4         5555              9543
  5         1292              3418
  6         2208              5563
  7         1313              3660
  8         1102              2694
  9         3151              5468
 10         1516              2898
 11         5161             10674
 12         4567              7585
 13         5841             11760
 14         3008              4085

The scatter diagram for the data in Table 1 is shown in Figure 1. An examination of Figure 1
indicates a clearly increasing relationship between square feet (X) and annual sales (Y). As the
size of the store as measured by its square footage increases, annual sales increase approximately
as a straight line. On this basis, if we assume that a straight line provides a useful mathematical
model of this relationship, the question in regression analysis becomes which particular
straight-line model best fits these data.
From Figure 2 we observe that \(b_1 = 1.686\) and \(b_0 = 901.247\). Thus the equation for the best
straight line for these data is
\[ \hat{Y}_i = 901.247 + 1.686 X_i \]

The slope \(b_1\) was computed as +1.686. This means that for each increase of 1 unit in X, the
average value of Y is estimated to increase by 1.686 units. In other words, for each increase of 1
square foot in the size of the store, expected annual sales are estimated to increase by 1.686
thousand dollars, or $1,686.

The Y intercept \(b_0\) was computed to be +901.247 (thousand dollars). The Y intercept represents
the average value of Y when X equals 0. Because the square footage of a store cannot be 0,
this Y intercept can be viewed as representing the portion of the annual sales that varies with
factors other than the size of the store.
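The slope and intercept quoted above can be reproduced directly from the Table 1 data. The following is a minimal sketch using the usual least-squares formulas in plain Python; the function name `fit_line` and the variable names are illustrative choices, not part of the original analysis, which was carried out in Microsoft Excel.

```python
# Table 1 data: store size (square feet) and annual sales ($000).
square_feet = [1726, 1642, 2816, 5555, 1292, 2208, 1313,
               1102, 3151, 1516, 5161, 4567, 5841, 3008]
annual_sales = [3681, 3895, 6653, 9543, 3418, 5563, 3660,
                2694, 5468, 2898, 10674, 7585, 11760, 4085]


def fit_line(x, y):
    """Return (b0, b1) for the least-squares line y = b0 + b1 * x.

    Hypothetical helper written for this illustration.
    """
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sum of squared deviations of X and sum of cross-products of X and Y.
    ssx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / ssx
    b0 = y_bar - b1 * x_bar
    return b0, b1


b0, b1 = fit_line(square_feet, annual_sales)
print(f"b0 = {b0:.3f}, b1 = {b1:.3f}")  # b0 = 901.247, b1 = 1.686
```

The printed values agree with the Intercept and Square Feet coefficients reported in Figure 2.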

Figure 1 Scatter diagram obtained from Microsoft Excel for the site selection data
Regression Analysis for the Site Selection

Regression Statistics
Multiple R            0.953824159
R Square              0.909780526
Adjusted R Square     0.902262236
Standard Error        936.8500077
Observations          14

ANOVA
               df    SS             MS             F              Significance F
Regression      1    106208119.7    106208119.7    121.0089774    1.26653E-07
Residual       12    10532255.24    877687.9369
Total          13    116740374.9

               Coefficients    Standard Error    t Stat         P-value        Lower 95%       Upper 95%
Intercept      901.2465701     513.0227603       1.756737985    0.10442398     -216.5339829    2019.027123
Square Feet    1.68613497      0.153279311       11.00040805    1.26653E-07    1.352168046     2.020101895

Confidence Interval Estimate

Data
X Value                              4000
Confidence Level                     95%

Intermediate Calculations
Sample Size                          14
Degrees of Freedom                   12
t Value                              2.178813
Sample Mean                          2921.286
Sum of Squared Difference            37357091
Standard Error of the Estimate       936.85
h Statistic                          0.102577
Average Predicted Y (YHat)           7645.786

For Average Predicted Y (YHat)
Interval Half Width                  653.7558
Confidence Interval Lower Limit      6992.031
Confidence Interval Upper Limit      8299.542

For Individual Response Y
Interval Half Width                  2143.357
Prediction Interval Lower Limit      5502.43
Prediction Interval Upper Limit      9789.143

Figure 2 Microsoft Excel output for the site selection problem

Suppose that we would like to use the fitted model to predict the average annual sales for a store
with 4000 square feet. From the Excel output above, the predicted annual sales for a store with
4000 square feet is 7,645.786 thousand dollars, or $7,645,786.
Note from Table 1 that the square footage varies from 1,102 to 5,841. Therefore, predictions of
annual sales should be made only for stores that are between 1,102 and 5,841 square feet in size.
Any prediction of annual sales outside this range presumes that the fitted relationship holds
outside the 1,102 to 5,841 range.
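The prediction and the caution about extrapolation can be combined in a small helper. This is only a sketch: the function name `predict_sales` is hypothetical, the coefficients are copied from the Excel output in Figure 2, and raising an error outside the sampled range is one possible design choice among others (a warning would also be reasonable).

```python
# Coefficients copied from the Excel output (Figure 2).
B0 = 901.2465701   # intercept, in $000
B1 = 1.68613497    # slope, in $000 per square foot

# Smallest and largest store sizes in Table 1 (square feet).
X_MIN, X_MAX = 1102, 5841


def predict_sales(square_feet):
    """Predicted annual sales ($000) for a store of the given size.

    Hypothetical helper: refuses to extrapolate beyond the sampled range.
    """
    if not X_MIN <= square_feet <= X_MAX:
        raise ValueError(
            f"{square_feet} sq ft is outside the sampled range "
            f"{X_MIN} to {X_MAX}; the fitted line may not hold there"
        )
    return B0 + B1 * square_feet


print(round(predict_sales(4000), 3))  # 7645.786
```

A call such as `predict_sales(6000)` raises `ValueError`, because 6,000 square feet is larger than any store in the sample.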
From the output above, R-square equals 0.91; therefore, 91% of the variation in annual sales can
be explained by the variability in the size of the store as measured by square footage. This is an
example of a strong positive linear relationship between two variables because the use of a
regression model has reduced the variability in predicting annual sales by 91%. Only 9% of the
sample variability in annual sales can be explained by factors other than what is accounted for by
the linear regression model that uses only square footage.
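As a check on this figure, R-square can be recomputed from the sums of squares in the ANOVA table of Figure 2. The labels SSR (regression sum of squares) and SST (total sum of squares) are notation assumed here; the Excel output names only the rows.

```latex
\[
r^2 = \frac{SSR}{SST} = \frac{106{,}208{,}119.7}{116{,}740{,}374.9} \approx 0.9098,
\qquad
r = +\sqrt{0.9098} \approx 0.9538
\]
```

The positive root is taken because the slope is positive, and the result matches the Multiple R entry in Figure 2.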

Figure 3 Plot of residuals against square footage of store obtained from MS Excel for the
site selection problem
To determine whether the linear model is appropriate for these data, the residuals have been
plotted against the independent variable (store size in square feet) in Figure 3. We observe that
although there is a widespread scatter in the residual plot, there is no apparent pattern or
relationship between the residuals and \(X_i\). The residuals appear to be evenly spread above and
below 0 for differing values of X. This result leads us to conclude that the fitted straight-line
model is appropriate for the site selection sales data.
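The residuals behind such a plot are simply the observed sales minus the fitted sales. The sketch below recomputes them from Table 1 using the coefficients in Figure 2; the variable names are illustrative, and the listing stands in for the Excel chart rather than reproducing it.

```python
import math

# Table 1 data: store size (square feet) and annual sales ($000).
square_feet = [1726, 1642, 2816, 5555, 1292, 2208, 1313,
               1102, 3151, 1516, 5161, 4567, 5841, 3008]
annual_sales = [3681, 3895, 6653, 9543, 3418, 5563, 3660,
                2694, 5468, 2898, 10674, 7585, 11760, 4085]

# Coefficients copied from the Excel output (Figure 2).
B0, B1 = 901.2465701, 1.68613497

# Residual = observed Y minus fitted Y for each store.
residuals = [y - (B0 + B1 * x) for x, y in zip(square_feet, annual_sales)]

# List residuals in order of store size, as they would appear on the plot.
for x, e in sorted(zip(square_feet, residuals)):
    print(f"{x:>5} sq ft   residual = {e:>9.1f}")

# Least-squares residuals sum to approximately zero.
residual_sum = sum(residuals)

# Standard error of the estimate: sqrt(SSE / (n - 2)).
sse = sum(e ** 2 for e in residuals)
std_error = math.sqrt(sse / (len(residuals) - 2))
print(f"sum = {residual_sum:.3f}, standard error = {std_error:.2f}")
```

The standard error computed this way should agree with the Standard Error entry (936.85) in Figure 2.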

Figure 4 Histogram of residuals obtained from MS Excel for site selection data
It is difficult to evaluate the normality assumption for a sample of only 14 observations
regardless of whether a histogram, stem-and-leaf display, box-and-whisker plot, or normal
probability plot is obtained. We can see from Figure 4 that although the residuals do not appear
to be normally distributed, they are also not extremely skewed. The robustness of regression
analysis to modest departures from normality, along with the small sample size, leads us to
conclude that we should not be overly concerned about departures from this normality assumption
in these site selection data.
Now we test whether there is a significant relationship between the size of the store and the
annual sales at the 0.05 level of significance. From Figure 2, we have
\(t = 11.00 > t_{12} = 2.1788\) and \(p = 0.000 < 0.05\). Hence, we can conclude that there is a
significant linear relationship between average annual sales and the size of the store. We could
also use the F-test, where \(F = 121.01 > 4.75\) and \(p = 0.000 < 0.05\).
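Both test statistics follow from entries in Figure 2: t is the slope divided by its standard error, and F is the regression mean square divided by the residual mean square. The sketch below recomputes them; the critical values are the ones quoted in the text rather than values derived in code, and the variable names are illustrative.

```python
# Values copied from the Excel output (Figure 2).
B1 = 1.68613497        # slope
SE_B1 = 0.153279311    # standard error of the slope
MSR = 106208119.7      # regression mean square
MSE = 877687.9369      # residual mean square

# Critical values quoted in the text for the 0.05 level of significance.
T_CRIT = 2.1788        # t with 12 degrees of freedom, two-tailed
F_CRIT = 4.75          # F with 1 and 12 degrees of freedom

t_stat = B1 / SE_B1
f_stat = MSR / MSE

print(f"t = {t_stat:.2f}, reject H0: {abs(t_stat) > T_CRIT}")  # t = 11.00, reject H0: True
print(f"F = {f_stat:.2f}, reject H0: {f_stat > F_CRIT}")       # F = 121.01, reject H0: True

# In simple linear regression the two tests are equivalent: t squared equals F.
print(f"t^2 = {t_stat ** 2:.2f}")                               # t^2 = 121.01
```

Because there is only one independent variable, the t-test and the F-test necessarily lead to the same conclusion.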

From Figure 2, the population slope is estimated with 95% confidence to be between 1.352 and
2.020 (i.e., $1,352 to $2,020 of annual sales per additional square foot). Because these values
are above 0, we conclude that there is a significant linear relationship between annual sales and
the square footage of the store.
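The limits of this interval can be recovered from the slope, its standard error, and the t value with 12 degrees of freedom, all taken from Figure 2. The symbols \(S_{b_1}\) for the standard error of the slope and \(\beta_1\) for the population slope are notation assumed for this worked step.

```latex
\[
b_1 \pm t_{12}\, S_{b_1} = 1.6861 \pm (2.1788)(0.15328) = 1.6861 \pm 0.3340
\quad\Longrightarrow\quad
1.352 \le \beta_1 \le 2.020
\]
```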
If we want to set up a 95% confidence interval estimate of the average annual sales for all stores
that contain 4,000 square feet, Figure 2 shows that the average annual sales are estimated to be
between 6,992.031 and 8,299.542 (thousands of dollars) for all stores with 4,000 square feet of
space.
If we want to set up a 95% prediction interval estimate of the annual sales for an individual store
that contains 4,000 square feet, Figure 2 shows that the annual sales for an individual store with
4,000 square feet of space are estimated to be between 5,502.430 and 9,789.143 (thousands of
dollars).
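Both intervals can be rebuilt from the intermediate calculations listed in Figure 2. The sketch below assumes the standard simple-regression formulas, in which the h statistic is 1/n plus the squared distance of X from the sample mean divided by the sum of squared differences; the function name `intervals_at` is hypothetical, and all inputs are copied from the Excel output.

```python
import math

# Inputs copied from the Excel output (Figure 2).
N = 14                 # sample size
T_VALUE = 2.178813     # t with 12 degrees of freedom, 95% confidence
X_BAR = 2921.286       # sample mean of square feet
SSX = 37357091         # sum of squared differences of X from its mean
S_YX = 936.8500077     # standard error of the estimate
B0, B1 = 901.2465701, 1.68613497


def intervals_at(x):
    """Return (y_hat, confidence interval, prediction interval) at size x.

    Hypothetical helper; all values are in thousands of dollars.
    """
    y_hat = B0 + B1 * x
    h = 1 / N + (x - X_BAR) ** 2 / SSX
    ci_half = T_VALUE * S_YX * math.sqrt(h)        # for the average response
    pi_half = T_VALUE * S_YX * math.sqrt(1 + h)    # for an individual store
    return (y_hat,
            (y_hat - ci_half, y_hat + ci_half),
            (y_hat - pi_half, y_hat + pi_half))


y_hat, ci, pi = intervals_at(4000)
print(f"predicted sales: {y_hat:.3f}")
print(f"95% confidence interval: {ci[0]:.2f} to {ci[1]:.2f}")
print(f"95% prediction interval: {pi[0]:.2f} to {pi[1]:.2f}")
```

The prediction interval is wider than the confidence interval because it also accounts for the variation of an individual store around the average for its size.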
