Académique Documents
Professionnel Documents
Culture Documents
11/8/06
4:23 PM
Page 548
Chapter
19
Predictive Analysis
in Marketing Research
Learning Objectives:
To understand the basic
concept of prediction
To learn how marketing
researchers use regression
analysis
To learn how marketing
researchers use bivariate
regression analysis
To see how multiple regression
differs from bivariate regression
To appreciate various types of
stepwise regression, how they are
applied, and the interpretation of
their findings
To learn how to obtain and
interpret regression analyses
with SPSS
Practitioner Viewpoint
We often use regression analysis both to examine simple relationships and
as the starting point in investigating more complex relationships. For example, we might look at the relationship between a B2B customers ratings of
your companys sales support and the customers overall satisfaction with
your company. We seldom expect a high correlation between one measured
item and overall satisfaction.
Why dont we expect a high correlation between any one item and overall satisfaction? If you look at the cross-tabulation that follows you will see
a typical data set.
Satisfaction Rating
Sales
Support
Rating
1
2
3
4
5
3
3
1
4
6
4
2
1
2
5
5
4
2
1
4
5
14
8
0
3
5
8
11
The correlation between satisfaction and sales support is .55. By looking at those ratings on the highlighted diagonal, you can observe that,
although there is a relationship, it is not strong. Every person whose ratings
were off the diagonal reduced the correlation. This leads us to using multiple regression to better explain factors (such as price, product quality,
customer service, and so forth) that are related to customer satisfaction.
In this chapter, you will learn more about simple and multiple regression
analysis using SPSS.
Ronald L. Tatham
548
Marketing Research: Online Research Applications, Fourth Edition, by Alvin C. Burns and Ronald F. Bush. Copyright 2003 by Pearson Prentice Hall.
2009934199
6160811_CH19
11/8/06
4:23 PM
Page 549
2009934199
Where We Are:
1. Establish the need for
marketing research
2. Define the problem
3. Establish research objectives
4. Determine research design
5. Identify information types and
sources
6. Determine methods of
accessing data
7. Design data collection forms
8. Determine sample plan and size
9. Collect data
10. Analyze data
11. Prepare and present the final
research report
Predictive analysis reveals that Las Vegas and Atlantic City do not compete for
the same U.S. gamblers.
549
Marketing Research: Online Research Applications, Fourth Edition, by Alvin C. Burns and Ronald F. Bush. Copyright 2003 by Pearson Prentice Hall.
6160811_CH19
550
11/8/06
4:23 PM
Page 550
Income
Education
Distance to Las Vegas
analysis, the researcher answered the question, Do Las Vegas and Atlantic City
compete for U.S. gamblers? Table 19.1 lists his findings.
The highlighted cells are the ones that distinguish the market segment profiles
that differentiate Las Vegas from Atlantic City gamblers. Specifically, both Las Vegas
and Atlantic City are drawing gamblers who live closer to their respective locations,
and they both are attracting higher income and higher education groups. In addition,
Las Vegas gamblers are more likely to be (1) homeowners, (2) Midwesterners, (3)
retired, or (4) students, and (5) Asian, and not Northeasterners, Southerners, or
Blacks. Atlantic City, in contrast, is attractive to gamblers who are Northeasterners
and Blacks, but it is definitely not attracting gamblers who are Midwesterners or
Southerners. Compared to Las Vegas, Atlantic City is not attracting homeowners,
retirees, students, or Asians. From this set of findings, the two great American gambling destinations do not compete for the same gamblers.
This chapter is the last one in which we discuss statistical procedures frequently used
Marketing Research: Online Research Applications, Fourth Edition, by Alvin C. Burns and Ronald F. Bush. Copyright 2003 by Pearson Prentice Hall.
2009934199
6160811_CH19
11/8/06
4:23 PM
Page 551
Understanding Prediction
551
UNDERSTANDING PREDICTION
A prediction is a statement of what is believed will happen in the future made on the
basis of past experience or prior observation. We are confronted with the need to make
predictions on a daily basis. For example, you must predict whether it will rain to
decide whether to carry an umbrella. You must predict how difficult an examination
will be in order to properly study. You must predict how heavy the traffic will be in
order to decide what time to start driving to make your dentist appointment on time.
Marketing managers are also constantly faced with the need to make predictions,
and the stakes are much higher than in the three examples just cited. That is, instead of
getting wet, receiving a grade of C rather than a B, or being late for a dentist appointment, the marketing manager has to worry about competitors reactions, changes in
sales, wasted resources, and whether profitability objectives will be achieved. Making
accurate predictions is a vital part of the marketing managers workaday world.
2009934199
Marketing Research: Online Research Applications, Fourth Edition, by Alvin C. Burns and Ronald F. Bush. Copyright 2003 by Pearson Prentice Hall.
6160811_CH19
552
11/8/06
4:23 PM
Page 552
The easiest way would be to compare the predictions for each days popcorn
sales to the actual amount sold. We have done this in Table 19.2. When you look at
the table, you will see that we have calculated the difference between your brothers
prediction and the actual sales for each day. Notice that for some days, the predictions were high, whereas for others, the predictions were low. When you compare
how far the predicted values are from the actual or observed values, you are performing analysis of residuals. Stated differently, assessment of the goodness of a prediction requires you to compare the pattern of errors in the predictions to the actual
data. Analysis of residuals underlies all assessments of the accuracy of a forecasting
method, and because researchers cannot wait a month, a quarter, or a year to compare a prediction with what actually happens, they fall back on past data. In other
words, they select a predictive model and apply it to the past data. Then they examine the residuals to assess the models predictive accuracy.
There are many ways to examine residuals. For example, in the case of your little brothers forecast, you could judge it either on a total basis or an individual basis.
On a total basis, you might compute the average as we have done in the table, or
you could sum all of the daily residuals. Of course, you would need to square the
daily residuals or use the absolute values to avoid cancellation of the positive differ-
ACTUAL SALES
RESIDUAL
(DIFFERENCE)
TYPE OF ERROR
Monday
Tuesday
Wednesday
Thursday
Friday
Saturday
Sunday
Averages
$100
$110
$120
$125
$260
$300
$275
$185
$125
$130
$135
$125
$225
$250
$235
$175
25
20
15
0
+35
+50
+40
+10
Very low
Low
Low
Exact
Very high
Very high
Very high
High
Marketing Research: Online Research Applications, Fourth Edition, by Alvin C. Burns and Ronald F. Bush. Copyright 2003 by Pearson Prentice Hall.
2009934199
DAY OF WEEK
6160811_CH19
11/8/06
4:23 PM
Page 553
553
ences by the negative differences. (You have seen the necessary squaring operation
before, for instance, in the formula for a standard deviation or the sums of squares
formulas we described in Chapter 15.) For the individual error, you might look for
some pattern.2 On an individual basis, you might notice a pattern: Your little
brother tends to underestimate how much popcorn will be bought on weekdays,
which are low-sales days, whereas he overestimates it for Friday through Sunday,
which are high-sales days. As you can see, the goodness of a prediction approach
depends on how closely it predicts a set of representative values judged by examining the residuals (or errors).
Now that you have a basic understanding of prediction and how you determine
the goodness of your predictions, we turn our attention to regression analysis.
Figure 19.1 The General Equation for a Straight Line in Graph Form
Marketing Research: Online Research Applications, Fourth Edition, by Alvin C. Burns and Ronald F. Bush. Copyright 2003 by Pearson Prentice Hall.
6160811_CH19
554
11/8/06
4:23 PM
Page 554
You should recall the straight-line relationship we described underlying the correlation coefficient: When the scatter diagram for two variables appears as a thin
ellipse, there is a high correlation between them. Regression is directly related to
correlation. In fact, we use one of our correlation examples to illustrate the application of bivariate regression shortly.
MARKETING RESEARCH
Additional Insights
INSIGHT
19.1
In the following example we are using the Novartis pharmaceuticals company sales territory and number of salespersons
data found in Table 19.3. Intermediate regression calculations
are included in Table 19.3.
The formula for computing the regression parameter b is:
n
n
n
n
xi yi
xi
yi
Formula for b,
i =1 i =1
i =1
the slope, in
b =
2
n
n
bivariate regression
n
x i2
xi
i =1
i =1
n
b =
i =1
n
n
xi yi
xi
yi
i =1 i =1
n
xi
i =1
i =1
20 58,603 251 4,325
n
Calculation of b,
the slope, in
bivariate regression
using Novartis sales
territory data
x i2
20 3,469 2512
1172
, ,060 1085
, ,575
=
380 63,001
69,380
86,485
=
6,379
= 13.56
Marketing Research: Online Research Applications, Fourth Edition, by Alvin C. Burns and Ronald F. Bush. Copyright 2003 by Pearson Prentice Hall.
2009934199
where
xi = an x variable value
yi = a y value paired with each xi value
n = the number of pairs
6160811_CH19
11/8/06
4:23 PM
Page 555
555
SALES
($ MILLIONS)
(y)
NUMBER OF
SALESPERSONS
(x)
102
125
150
155
160
168
180
220
210
205
230
255
250
260
250
275
280
240
300
310
4,325
(Average = 216.25)
7
5
9
9
9
8
10
10
12
12
12
15
14
15
16
16
17
18
18
19
251
(Average = 12.55)
2009934199
xy
714
625
1,350
1,395
1,440
1,344
1,800
2,200
2,520
2,460
2,760
3,825
3,500
3,900
4,320
4,400
4,760
4,320
5,400
5,890
58,603
49
25
81
81
81
64
100
100
144
144
144
225
196
225
256
256
289
324
324
361
3,469
y = 46.10 + 13.56 x
When SPSS or any other statistical analysis program computes the intercept and
the slope in a regression analysis, it does so on the basis of the least squares criterion. The least squares criterion is a way of guaranteeing that the straight line that
runs through the points on the scatter diagram is positioned so as to minimize the
vertical distances away from the line of the various points. In other words, if you
draw a line where the regression line is calculated and measure the vertical distances
of all the points away from that line, it would be impossible to draw any other line
that would result in a lower total of all of those vertical distances. Or, to state the
least squares criterion using residuals analysis, the line is the one with the lowest
total squared residuals.
Marketing Research: Online Research Applications, Fourth Edition, by Alvin C. Burns and Ronald F. Bush. Copyright 2003 by Pearson Prentice Hall.
6160811_CH19
556
11/8/06
4:23 PM
Page 556
RETURN TO
SPSS
Now let us illustrate bivariate regression with SPSS using The Hobbits Choice Restaurant
survey data with which you are well acquainted. Our purpose is to help you learn the basic
SPSS commands for bivariate regression and to familiarize you with SPSS output and various regression statistics found on it.
The first step in bivariate regression analysis is to identify the dependent and independent variables. For our example, we will use the amount spent in restaurants per month
as our dependent variable. Logically, we would expect expenditures to be related to
income, so the before-tax household income level is a logical independent variable.
However, we have used a code system for income level: The code numbers are not in dollars or dollar units (such as thousands). In any regression, it is best to use realistic values
because realistic values are easiest to interpret. Consequently, we will recode the income
values to represent the midpoints of the income ranges on the questionnaire. For example, we will recode the less than $15,000 to 7.5 meaning $7,500, so our recoded units
are in thousands of dollars. The recoded income values are $7.5, $20.0, $37.5, $62.5,
$87.5, $125, and $175. Notice that the highest income level of $150,000 or higher did
not have an upper limit, so we arbitrarily use the range of the income level just below it.
As you can see in Figure 19.2, the SPSS menu commands clickstream to run bivariate
regression is ANALYZE-REGRESSION-LINEAR. This opens up the linear regression selection window where you would indicate which variable is the dependent variable and the
independent variable. In our example, we are investigating what you would expect to be a
linear relationship between the amount spent per month in restaurants (dependent vari-
Marketing Research: Online Research Applications, Fourth Edition, by Alvin C. Burns and Ronald F. Bush. Copyright 2003 by Pearson Prentice Hall.
2009934199
6160811_CH19
11/8/06
4:23 PM
Page 557
557
2009934199
able) and household income (independent variable). When these variables are clicked into
their respective locations on the SPSS Linear Regression setup window, clicking on OK will
generate the SPSS output we are about to describe.
There are several pieces of information provided with a regression analysis such as this.
First, there is information on Variables Entered/Removed, which indicates that the regression analysis used a method designated as Enter. This designation refers to the regression
method we are using. (There are several methods, but most are beyond the scope of our
textbook.)
The annotated SPSS linear regression output is shown in Figure 19.3. In the Model
Summary table, three types of Rs are indicated. For bivariate regression, R Square (.738
on the output) is the square of the correlation coefficient of 0.859. The Adjusted R Square
(.737) reduces the R2 by taking into account the sample size and number of parameters
estimated. This R Square value is very important, because it reveals how well the straightline model fits the scatter of points. Because a correlation coefficient ranges from 1.0 to
+1.0, its square will range from 0 to +1.0. The higher the R Square value, the better is the
straight lines fit to the elliptical scatter of points. A standard error value is reported, and we
explain its use later.
Next, an Analysis of Variance (ANOVA) section is provided. As you can see, regression is related to analysis of variance.3 We must determine whether the straight-line
model we are attempting to apply to describe these two variables is appropriate. The F
value is significant (.000) so we reject the null hypothesis that a straight-line model does
not fit the data we are analyzing. Just as in ANOVA, this test is a flag, and the flag has
now been raised, making it justifiable to continue inspecting the output for more significant results. If the ANOVA F test is not significant, we would have to abandon our regression analysis attempts with these two variables. Finally, you can see in the Coefficients
Table that the values of b and a are listed under Unstandardized Coefficients. The
SPSS
SPSS Student Assistant Online
Running and Interpreting Bivariate
Marketing Research: Online Research Applications, Fourth Edition, by Alvin C. Burns and Ronald F. Bush. Copyright 2003 by Pearson Prentice Hall.
Regression
6160811_CH19
558
11/8/06
4:23 PM
Page 558
To relate this finding to our regression line in Figure 19.1, it says that the regression line
will intercept the dollars spent in restaurants per month (the y-axis) at $35.46, and the
line will increase $1.50 per month for each $1,000 unit increase in the income level (the
x-axis).
You must always test the regression model, intercept, and slope
for statistical significance.
puting the values for a and b is not sufficient for regression analysis, because the two
values must be tested for statistical significance. The intercept and slope that are
computed are sample estimates of population parameters of the true intercept,
(alpha), and the true slope, (beta). The tests for statistical significance are tests as
to whether the computed intercept and computed slope are significantly different
from zero (the null hypothesis). To determine statistical significance, regression
analysis requires that a t test be undertaken for each parameter estimate. The interpretation of these t tests is identical to other significance tests you have seen. We
describe what these t tests mean next.
In our example, you would look at the Sig. column in the Coefficients table.
This is where the slope and intercept t test results are reported. Both of our tests have
significance levels of .000, which are below our standard significance level cutoff of
.05, so our computed intercept and slope are valid estimates of the population intercept
and slope. If x and y do not share a linear relationship, the population regression slope
will equal zero and the t test result will support the null hypothesis. However, if a systematic linear relationship exists, the t test result will force rejection of the null hypothesis, and the researcher can be confident that the calculated slope estimates the true one
that exists in the population. Remember that we are dealing with a statistical concept,
and you must be assured that the straight-line parameters and really exist in the
population before you can use your regression analysis findings as a prediction device.
Making a Prediction and Accounting for Error Now, there is one more step to
relate, and it is the most important one. How do you make a prediction? The fact
that the line is a best-approximation representation of all the points means we must
account for a certain amount of error when we use the line for our predictions. The
true advantage of a significant bivariate regression analysis result lies in the ability
of the marketing researcher to use that information gained about the regression line
through the points on the scatter diagram and to estimate the value or amount of the
dependent variable based on some level of the independent variable. For example,
with our regression result calculated for the relationship between total monthly
restaurant purchases and income level, it is now possible to estimate the dollar
amount of restaurant purchases predicted to be associated with a specific income
level. However, we know that the scatter of points does not describe a perfectly
straight line because the correlation is .859, not 1.0. So our regression prediction
can only be an estimate.
Generating a regression prediction is conceptually identical to estimating a population mean. That is, it is necessary to express the amount of error by estimating a
range rather than stipulating an exact estimate for your prediction. Regression
analysis provides for a standard error of the estimate, which is a measure of the
Marketing Research: Online Research Applications, Fourth Edition, by Alvin C. Burns and Ronald F. Bush. Copyright 2003 by Pearson Prentice Hall.
2009934199
Testing for Statistical Significance of the Intercept and the Slope Simply com-
6160811_CH19
11/8/06
4:23 PM
Page 559
559
Figure 19.4 Regression Assumes That Data Points Form a Bell-Shaped Curve
2009934199
accuracy of the predictions of the regression equation. This standard error value is
listed in the top half of the SPSS output and just beside the Adjusted R Square in
Figure 19.3. It is analogous to the standard error of the mean you used in estimating
a population mean from a sample, but it is based on the residuals, or how far away
each predicted value is from the actual value. Do you recall the popcorn sales example of residuals we described earlier in this chapter? SPSS does the same comparison
by using the regression equation it computed to predict the monthly restaurant purchases dollar amount for each respondent, and this predicted value is compared to
the actual amount given by the respondent. The differences, or residuals, are translated into a standard error of estimate value. In our Hobbits Choice Restaurant survey example, the standard error of the estimate was found to be $47.54 (rounded to
the nearest cent).
One of the assumptions of regression analysis is that the plots on the scatter diagram will be spread uniformly and in accord with the normal curve assumptions
over the regression line. Figure 19.4 illustrates how this assumption might be
depicted graphically. The points are congregated close to the line and then become
more diffuse as they move away from the line. In other words, a greater percentage
of the points is found on or close to the line than is found further away. The great
advantage of this assumption is that it allows the marketing researcher to use his or
her knowledge of the normal curve to specify the range in which the dependent variable is predicted to fall. For example, if the researcher used the predicted dependent
value result 1.96 times the standard error of the estimate, he or she would be stipulating a range with a 95 percent level of confidence; whereas if the researcher uses
2.58 times the standard error of the estimate, he or she would be stipulating a
range with a 99 percent level of confidence. The interpretation of these confidence
intervals is identical to interpretations for previous confidence intervals: Were the
prediction made many times and an actual result determined each time, the actual
results would fall within the range of the predicted value 95 percent or 99 percent of
these times.
Figure 19.5 illustrates how you can envision a regression prediction. Let us use
the regression equation to make a prediction about the dollar amount of monthly
Marketing Research: Online Research Applications, Fourth Edition, by Alvin C. Burns and Ronald F. Bush. Copyright 2003 by Pearson Prentice Hall.
6160811_CH19
560
11/8/06
4:23 PM
Page 560
Predicted
y values
Next, to reflect the imperfect aspects of the predictive tool being used, we must
apply confidence intervals. If the 95 percent level of confidence were applied, the
computations would be:
Predicted y z Standard error of the estimate
Calculation of 95% confidence
intervals for the predicted
monthly restaurant purchases
Marketing Research: Online Research Applications, Fourth Edition, by Alvin C. Burns and Ronald F. Bush. Copyright 2003 by Pearson Prentice Hall.
2009934199
As you can see, the predicted y is the monthly dollar amount of restaurant purchases we just computed for an income level of $75,000, the 1.96 pertains to 95 percent level of confidence, and the standard error of the estimate is the value indicated
in the regression analysis output. The interpretation of these three numbers is as follows. For a typical individual in The Hobbits Choice Restaurant survey population,
if that persons household income before taxes was $75,000, the monthly restaurant
purchases amount that would be expected would be about $148, but because there
are differences between income ranges and monthly restaurant purchases, the
restaurant purchases would not be exactly that amount. Consequently, the 95 percent confidence interval reveals that the sales figure should fall between $55 and
$241. Finally, the prediction is valid only if conditions remain the same as they were
for the time period from which the original data were collected.4
6160811_CH19
11/8/06
4:23 PM
Page 561
561
2009934199
Marketing Research: Online Research Applications, Fourth Edition, by Alvin C. Burns and Ronald F. Bush. Copyright 2003 by Pearson Prentice Hall.
6160811_CH19
562
11/8/06
4:23 PM
Page 562
example, the dependent variable was territory sales. If Dell computers commissioned
a survey, it might want information on those who intend to purchase a Dell computer,
or it might want information on those who intend to buy a competing brand as a
means of understanding these consumers and perhaps dissuading them. The dependent variable would be purchase intentions for Dell computers. If Maxwell House
Coffee was considering a line of gourmet iced coffee, it would want to know how coffee drinkers feel about gourmet iced coffee, that is, their attitudes toward buying,
preparing, and drinking it would be the dependent variable.
Figure 19.6 provides a general conceptual model that fits many marketing
research situations, particularly those that are investigating consumer behavior. A
general conceptual model identifies independent and dependent variables and shows
their basic relationships to one another. In Figure 19.6, you can see that purchases,
intentions to purchase, and preferences are in the center, meaning they are dependent. The surrounding concepts are possible independent variables. That is, any one
could be used to predict any dependent variable. For example, ones intentions to
purchase an expensive automobile like a Lexus could depend on ones income. It
could also depend on friends recommendations (word of mouth), ones opinions
about how a Lexus would enhance ones self-image, or experiences riding in or driving a Lexus.
In truth, consumers preferences, intentions, and actions are potentially influenced by a great number of factors as would be very evident if you listed all of the
subconcepts that make up each concept in Figure 19.6. For example, there are probably a dozen different demographic variables; there could be dozens of lifestyle
dimensions, and a person is exposed to a great many types of advertising media
every day. Of course, in the problem definition stage, the researcher and manager
slice the myriad of independent variables down to a manageable number to be
included on the questionnaire. That is, they have the general model structure in
Figure 19.6 in mind, but they identify and measure specific variables that pertain to
the problem at hand.
Our underlying conceptual model example is one of many different conceptual
models that researchers have available to them. In truth, every research project has
a unique conceptual model that depends entirely on the research objectives. As an
example of how regression analysis can be applied to how consumers from two different countries view certain questionable sales tactics, read Marketing Research
Insight 19.2. We have also provided Marketing Research Insight 19.3 on page 563
Media
Exposure,
Word of
Mouth
Attitudes,
Opinions,
Feelings
Past Behavior,
Experience,
Knowledge
Purchases;
Intentions to
Purchase;
Preferences;
or Satisfaction
Demographics,
Lifestyle
Marketing Research: Online Research Applications, Fourth Edition, by Alvin C. Burns and Ronald F. Bush. Copyright 2003 by Pearson Prentice Hall.
2009934199
6160811_CH19
11/8/06
4:23 PM
Page 563
563
MARKETING RESEARCH
A Global Perspective
INSIGHT
19.2
MARKETING RESEARCH
A Global Perspective
2009934199
INSIGHT
19.3
rior Japanese bikes, lesser Japanese bikes, superior
American bikes, and lesser American bikes. Which mountain bike will you buy?
These questions are vital ones for global marketers
because the ways consumers make decisions about products with foreign country origins have important implications
about the marketing strategies a global marketer should use
to gain competitive advantage in each of the countries where
its products are sold.
Two researchers6 investigated the exact situations we
have just posed for you: Japanese consumers considering
superior and lesser Japanese and American mountain bike
brands and American consumers considering superior and
lesser American and Japanese mountain bike brands. Using
various techniques including regression analysis, they found
that Japanese consumers favored Japanese-made products
regardless of superiority. In other words, they always chose
their own countrys brand. American consumers, on the
other hand, chose the superior brand, regardless of the
country of origin. The researchers concluded that any global
marketer must be aware of the strong home country product
favoritism they will face in countries where the culture is collectivist. In these markets, the global marketer may want to
(box continues)
Marketing Research: Online Research Applications, Fourth Edition, by Alvin C. Burns and Ronald F. Bush. Copyright 2003 by Pearson Prentice Hall.
6160811_CH19
564
11/8/06
4:23 PM
Page 564
downplay its foreign country status. Similarly, a global marketer competing in a country where the culture is individualistic must demonstrate superior product performance even if
it is the marketers home country. The researchers caution
that illustrates how regression analysis was used to assess the success of the slogan
Buy American.
Bivariate regression analysis treats only dependentindependent pairs, and it
would take a great many bivariate regression analyses to account for possible relevant dependentindependent pairs of variables in a survey. Fortunately, there is no
need to perform a great many bivariate regressions, as there is a much better tool
called multiple regression analysis, the last analysis technique described in this textbook. While hearing that you have only one more statistical technique to learn in
this textbook, you should be aware that there are a number of statistical techniques
that are beyond the scope of this textbook. A market researcher and author of a
book on these techniques is Dr. Ronald Tatham, who is also the CEO of Burke
Marketing Research, Inc., a world leader in marketing research. You can meet Dr.
Tatham in Marketing Research Insight 19.4.
Basic Assumptions in Multiple Regression Consider our example with the number
of salespeople as the independent variable and territory sales as the dependent variable. A second independent variable such as advertising levels can be added to the
equation (Figure 19.7). You will note that the advertising dimension (x2) is included
MARKETING RESEARCH
INSIGHT
19.4
Marketing Research: Online Research Applications, Fourth Edition, by Alvin C. Burns and Ronald F. Bush. Copyright 2003 by Pearson Prentice Hall.
2009934199
6160811_CH19
11/8/06
4:23 PM
Page 565
565
X2
X1
at a right angle to the axis representing the number of salespeople (x1) , which creates a three-dimensional display. At the same time, the addition of a second variable
turns the regression line into a regression plane. A regression plane is the shape of
the dependent variable in multiple regression analysis. If other independent variables are added to the regression analysis, it would be necessary to envision each one
as a new and separate axis existing at right angles to all other axes. Obviously, it is
impossible to draw more than three dimensions at right angles. In fact, it is difficult
to even conceive of a multiple-dimension diagram, but the assumptions of multiple
regression analysis require this conceptualization.
Everything about multiple regression is essentially equivalent to bivariate
regression except you are working with more than one independent variable. The
terminology is slightly different in places, and some statistics are modified to take
into account the multiple aspect, but for the most part, concepts in multiple regression are analogous to those in the simple bivariate case. We note these similarities in
our description of multiple regression.
The regression equation in multiple regression has the following form:
2009934199
Marketing Research: Online Research Applications, Fourth Edition, by Alvin C. Burns and Ronald F. Bush. Copyright 2003 by Pearson Prentice Hall.
6160811_CH19
566
11/8/06
4:23 PM
Page 566
This multiple regression equation says that you can predict a consumers intention
to buy a Lexus level if you know three variables: (1) attitude toward Lexus, (2) friends
negative recommendations or other negative comments about Lexus, and (3) income
level using a scale with 10 income grades. Furthermore, we can see the impact of each
of these variables on Lexus purchase intentions. Here is how to interpret the equation.
First, the average person has a 2 intention level, or some small propensity to want
to buy a Lexus. Attitude toward Lexus is measured on a 15 scale, and with each attitude scale point, intention goes up 1 point. That is, an individual with a strong positive attitude of 5 will have a greater intention than one with a strong negative attitude of 1. With friends objections to the Lexus (negative word of mouth) such as
A Lexus is overpriced, the intention decreases by .5 for each level on the 5-point
scale. Finally, the intention increases by 1 with each increasing income grade.
Here is a numerical example for a potential Lexus buyer whose attitude is 4,
negative word of mouth is 3, and income is 5.
Calculation of Lexus
purchase intention
using the multiple
regression equation
Multiple R indicates how well the
independent variables can predict
the dependent variable in multiple
regression.
Intention to
purchase a Lexus = 2
+ 1.0 4
.5 3
+ 1.0 5
= 9.5
Marketing Research: Online Research Applications, Fourth Edition, by Alvin C. Burns and Ronald F. Bush. Copyright 2003 by Pearson Prentice Hall.
2009934199
6160811_CH19
11/8/06
4:23 PM
Page 567
centage. For example, a multiple R of .75 means that the regression findings will
explain 75 percent of the dependent variable. The greater the explanatory power of
the multiple regression finding, the better and more useful it is for the researcher.
Let us issue a caution before we show you how to run a multiple regression
analysis using SPSS. The independence assumption stipulates that the independent
variables must be statistically independent and uncorrelated with one another. The
independence assumption is portrayed by the right-angle alignment of each additional independent variable in Figure 19.7. The presence of moderate or stronger correlations among the independent variables is termed multicollinearity and will violate
the independence assumption of multiple regression analysis results when it occurs.7
It is up to the researcher to test for and remove multicollinearity if it is present.
There are statistics issued to identify this problem. One commonly used statistic
is the variance inflation factor (VIF). The VIF is a single number, and a rule of
thumb is that as long as VIF is less than 10, multicollinearity is not a concern. With
a VIF of 10 or more associated with any independent variable in the multiple regression equation, it is prudent to remove that variable from consideration or to otherwise reconstitute the set of independent variables. In other words, when examining
the output of any multiple regression, the researcher should inspect the VIF number
associated with each independent variable that is retained in the final multiple
regression equation by the procedure. If the VIF is greater than 10, the researcher
should remove that variable from the independent variable set and rerun the multiple regression. This iterative process is used until only independent variables that are
statistically significant and that have acceptable VIFs are in the final multiple regression equation. SPSS calculates all VIFs.
RETURN TO
567
Marketing Research: Online Research Applications, Fourth Edition, by Alvin C. Burns and Ronald F. Bush. Copyright 2003 by Pearson Prentice Hall.
6160811_CH19
568
11/8/06
4:23 PM
Page 568
much people spend on restaurants per month. We already know from our bivariate
regression analysis work with The Hobbits Choice Restaurant data set that income predicts this dependent variable. Another predictor could be family size as larger families will
order more entres because there are more family members. So, we will add this independent variable. A third independent variable could be preferences. People who prefer
an elegant dcor surely will pay for this atmosphere as elegant dcors are typically found
in the most expensive restaurants. To summarize, we have determined our conceptual
model: The average amount paid at restaurants per month may be predicted by (1)
household income level, (2) family size, and (3) preference for restaurants with elegant
dcors.
Just as with bivariate regression, the ANALYZE-REGRESSION-LINEAR command
sequence is used to run a multiple regression analysis, and the variable, dollars spent in
restaurants per month, is selected as the dependent variable, while the other three are
specified as the independent variables. You will find this annotated SPSS clickstream in
Figure 19.8.
As the computer output in Figure 19.9 shows, the Adjusted R Square value (Model
Summary table) indicating the strength of relationship between the independent variables
and the dependent variable is .749, signifying that there is some linear relationship present. Next, the printout reveals that the ANOVA F is significant, signaling that the null
hypothesis of no linear relationship is rejected, and it is justifiable to use a straight-line relationship to model the variables in this case.
Just as we did with bivariate regression, it is necessary in multiple regression analysis to
test for statistical significance of the bis (betas) determined for the independent variables.
Once again, you must determine whether sampling error is influencing the results and giving a false reading. You should recall that this test is a test for significance from zero (the
null hypothesis) and is achieved through the use of separate t tests for each bi. The SPSS
output in Figure 19.9 indicates the levels of statistical significance. In this particular example, it is apparent the recoded income level and the preference for an elegant dining dcor
are significant as both have significance levels of .000. The constant (a) also is significant
as it, too, has a significance level of .000. However, the size of family independent variable
Marketing Research: Online Research Applications, Fourth Edition, by Alvin C. Burns and Ronald F. Bush. Copyright 2003 by Pearson Prentice Hall.
2009934199
6160811_CH19
11/8/06
4:23 PM
Page 569
569
is not significant because its significance level is .200 and, therefore, above our standard
cutoff value of .05.
What do you do with the mixed significance results we have just found in our dollars
spent on restaurants per month multiple regression example? Before we answer this question, you should be aware that this mixed result is very likely, so how to handle it is vital to
your understanding of how to perform multiple regression analysis successfully. Here is the
answer: It is standard practice in multiple regression analysis to systematically eliminate
those independent variables that are shown to be insignificant through a process called
trimming. You then rerun the trimmed model and inspect the significance levels again.
This series of eliminations or iterations helps to achieve the simplest model by eliminating
the nonsignificant independent variables. The trimmed multiple regression model with all
significant independent variables is found in Figure 19.10. Notice that the VIF diagnostics
were not examined because they were examined on the untrimmed SPSS output and
found to be acceptable.
This additional run enables the marketing researcher to think in terms of fewer dimensions within which the dependent variable relationship operates. Generally, successive iterations sometimes cause the Adjusted R to decrease somewhat, and it is advisable to scrutinize this value after each run. You can see that the new Adjusted R is still .749, so in our
example, there has been no decrease. Iterations will also cause the beta values and the
intercept value to shift slightly; consequently, it is necessary to inspect all significance levels of the betas once again. Through a series of iterations, the marketing researcher finally
arrives at the final regression equation expressing the salient independent variables and
their linear relationships with the dependent variable. A concise predictive model has been
found.
2009934199
SPSS
SPSS Student Assistant Online
Running and Interpreting Multiple
Regression
Marketing Research: Online Research Applications, Fourth Edition, by Alvin C. Burns and Ronald F. Bush. Copyright 2003 by Pearson Prentice Hall.
6160811_CH19
570
11/8/06
4:23 PM
Page 570
predict the dependent variable based on assumed or known values of the independent
variables that are found to have significant relationships within the multiple regression
equation. The standard error of the estimate is provided on all regression analysis programs, and it is possible to apply this value to forecast the ranges in which the dependent
variable will fall, given levels of the independent variables.
Making a prediction with multiple regression is identical to making one with bivariate
regression except you use the multiple regression equation. For a numerical example, let
us assume that we are interested in upscale restaurant patrons so we can give Jeff Dean
an estimate of how much they spend monthly on restaurants. Well specify an upscale
restaurant patron as someone with household income of $100K and who very strongly
prefers an elegant dcor. To very strongly prefer means a 5 on the 5-point scale of preferences for various restaurant features.
Using our SPSS trimmed multiple regression findings, we can predict the amount of
dollars spent on restaurants each month by upscale consumers. The calculations follow.
Remember the constant and betas are from the trimmed multiple regression output in
Figure 19.10.
Calculation of monthly restaurant
purchases predicted with an
income level of $100,000
and preference for elegant
dcor of 5
y =
=
=
=
a + b1x 1 + b 2 x 2
$29.39 + $1.15 100 + $14.14 5
$29.39 + $115, 000 $70.70
$215.09
The calculated prediction is about $232; however, we must take into consideration the
sample error and variability of the data with a confidence interval, that is, the predicted
dollars spent on restaurants per month 1.96 times the standard error of the estimate.
Predicted y 1.96 standard error of the estimate
$215.09 1.96 $46.45
$215.09 91.04
$124.05 to $306.13
Marketing Research: Online Research Applications, Fourth Edition, by Alvin C. Burns and Ronald F. Bush. Copyright 2003 by Pearson Prentice Hall.
2009934199
6160811_CH19
11/8/06
4:23 PM
Page 571
571
The interpretation would be that those individuals with a household income before taxes
of $100,000 and who very strongly prefer an elegant dcor when they dine can be
expected to spend between about $124 and $306 monthly on restaurants, averaging about
$215. Again, the confidence interval range is quite large, but it is perfectly reflective of the
variability in the data and in no way a flaw in the multiple regression analysis.
2009934199
Marketing Research: Online Research Applications, Fourth Edition, by Alvin C. Burns and Ronald F. Bush. Copyright 2003 by Pearson Prentice Hall.
6160811_CH19
572
11/8/06
4:23 PM
Page 572
SPSS
Marketing Research: Online Research Applications, Fourth Edition, by Alvin C. Burns and Ronald F. Bush. Copyright 2003 by Pearson Prentice Hall.
2009934199
A researcher executes stepwise multiple regression by using the ANALYZEREGRESSION-LINEAR command sequence precisely as was described for multiple
regression. The dependent variable and many independent variables are clicked into
their respective windows as before. To direct SPSS to perform stepwise multiple
regression, one uses the Method selection menu to select Stepwise. The findings
will be the same as those arrived at by a researcher who uses iterative trimmed multiple regressions. Of course, with stepwise multiple regression output, there will be
6160811_CH19
11/8/06
4:23 PM
Page 573
573
MARKETING RESEARCH
A Global Perspective
INSIGHT
19.5
2009934199
KOREA
THAILAND
SRI LANKA
INDONESIA
PHILIPPINES
Wifes age
Births
Children at home
Young children
Rural location
Wifes college
education
Husbands high
school education
+
+
+
+
+
+
+
+
+
+
Marketing Research: Online Research Applications, Fourth Edition, by Alvin C. Burns and Ronald F. Bush. Copyright 2003 by Pearson Prentice Hall.
6160811_CH19
574
11/8/06
4:23 PM
Page 574
Before leaving our description of multiple regression analysis, we must issue a warning about your interpretation of regression. We all have a natural tendency to think
in terms of causes and effects, and regression analysis invites us to think in terms of
a dependent variables resulting or being caused by an independent variables
actions. This line of thinking is absolutely incorrect: Regression analysis is nothing
more than a statistical tool that assumes a linear relationship between two variables.
It springs from correlation analysis, which is, as you will recall, a measure of the linear association and not the causal relationship between two variables. Consequently,
even though two variables, such as sales and advertising, are logically connected, a
regression analysis does not permit the marketing researcher to formulate causeand-effect statements.
The second warning we have is that you should not apply regression analysis to
predict outside of the boundaries of the data used to develop your regression model.
That is, you may use the regression model to interpolate within the boundaries set
by the range (lowest value to highest value) of your independent variable, but if you
use it to predict for independent values outside those limits, you have moved into an
area that is not accounted for by the raw data used to compute your regression line.
For this reason, you are not assured that the regression equation findings are valid.
For example, it would not be correct to use our income-dollars spent on restaurants
regression equation findings on individuals who are wealthy and have annual
incomes in the millions of dollars because these individuals were not represented in
The Hobbits Choice Restaurant survey.
Marketing Research: Online Research Applications, Fourth Edition, by Alvin C. Burns and Ronald F. Bush. Copyright 2003 by Pearson Prentice Hall.
2009934199
There is a great deal more to multiple regression analysis but it is beyond the scope
of this textbook to delve deeper into this topic. The coverage in this chapter introduces you to regression analysis, and it provides you with enough information
about it to run uncomplicated regression analyses on SPSS, identify the relevant
aspects of the SPSS output, and to interpret the findings. As you will see when you
work with the SPSS regression analysis procedures, we have only scratched the surface of this topic. There are many more options, statistics, and considerations
involved. In fact, there is so much material that whole textbooks on regression exist.
Our purpose has been to teach you the basic concepts and to help you interpret the
statistics associated with these concepts as you encounter them in statistical analysis
program output. Our descriptions are merely an introduction to multiple regression
analysis to help you comprehend the basic notions, common uses, and interpretations involved with this predictive technique.10
Despite our simple treatment of it, we fully realize that even simplified regression analysis is very complicated and difficult to learn and that we have showered
you with a great many regression statistical terms and concepts in this chapter.
6160811_CH19
11/8/06
4:23 PM
Page 575
Summary
575
EXPLANATION
Regression analysis
Intercept
Slope
Dependent variable
Independent variable(s)
Least squares criterion
R Square
Trimming
Standardized beta coefficients
Dummy independent variable
Stepwise multiple regression
Seasoned researchers are intimately knowledgeable with them and very comfortable
in using them. However, as a student encountering them for the first time, you
undoubtedly feel very intimidated. Although we may not be able to reduce your
anxiety, we have created Table 19.5 that lists all of the regression analysis concepts
we have described in this chapter, and it provides an explanation of each one. At
least, you will not need to search through the chapter to find these concepts when
you are trying to learn or use them.
2009934199
SUMMARY
Predictive analyses are methods used to forecast the level of a variable such as sales.
Model building and extrapolation are two general options available to market
researchers. In either case, it is important to assess the goodness of the prediction.
This assessment is typically performed by comparing the predictions against the
actual data with procedures called residuals analyses.
Market researchers use regression analysis to make predictions. The basis of this
technique is an assumed straight-line relationship existing between the variables. With
bivariate regression, one independent variable, x, is used to predict the dependent
Marketing Research: Online Research Applications, Fourth Edition, by Alvin C. Burns and Ronald F. Bush. Copyright 2003 by Pearson Prentice Hall.
6160811_CH19
576
11/8/06
4:23 PM
Page 576
KEY TERMS
Prediction (p. 551)
Extrapolation (p. 551)
Predictive model (p. 551)
Analysis of residuals (p. 552)
Bivariate regression analysis (p. 553)
Intercept (p. 553)
Slope (p. 553)
Dependent variable (p. 554)
Independent variable (p. 554)
Least squares criterion (p. 555)
Standard error of the estimate (p. 558)
General conceptual model (p. 562)
REVIEW QUESTIONS/APPLICATIONS
Marketing Research: Online Research Applications, Fourth Edition, by Alvin C. Burns and Ronald F. Bush. Copyright 2003 by Pearson Prentice Hall.
2009934199
1. Construct and explain a reasonably simple predictive model for each of the
following cases:
a. What is the relationship between gasoline prices and distance traveled for
family automobile touring vacations?
b. How do hurricane-force warnings relate to purchases of flashlight batteries
in the expected landfall area?
c. What do florists do with regard to inventory of flowers for the week prior
to and the week following Mothers Day?
2. Indicate what the scatter diagram and probable regression line would look like
for two variables that are correlated in each of the following ways (in each
instance, assume a negative intercept): (a) 0.89 (b) +0.48 and (c) 0.10.
3. Circle K runs a contest, inviting customers to fill out a registration card. In
exchange, they are eligible for a grand prize drawing of a trip to Alaska. The card
asks for the customers age, education, gender, estimated weekly purchases (in
dollars) at the Circle K, and approximate distance the Circle K is from his or her
home. Identify each of the following if a multiple regression analysis were to be
performed: (a) independent variable, (b) dependent variable, (c) dummy variable.
4. Explain what is meant by the independence assumption in multiple regression.
How can you examine your data for independence, and what statistic is issued
by most statistical analysis programs? How is this statistic interpreted? That is,
6160811_CH19
11/8/06
4:23 PM
Page 577
Review Questions/Applications
what would indicate the presence of multicollinearity, and what would you do
to eliminate it?
5. What is multiple regression? Specifically, what is multiple about it, and how
does the formula for multiple regression appear? In your indication of the formula, identify the various terms and also indicate the signs (positive or negative) that they may take on.
6. If one uses the Enter method for multiple regression analysis, what statistics
on an SPSS output should be examined to assess the result? Indicate how you
would determine each of the following:
a. Variance explained in the dependent variable by the independent variables.
b. Statistical significance of each of the independent variables.
c. Relative importance of the independent variables in predicting the dependent variable.
7. Explain what is meant by the notion of trimming a multiple regression result.
Use the following example to illustrate your understanding of this concept.
A bicycle manufacturer maintains records over 20 years of the following:
retail price in dollars, cooperative advertising amount in dollars, competitors
average retail price in dollars, number of retail locations selling the bicycle manufacturers brand, and whether or not the winner of the Tour de France was riding
the manufacturers brand (coded as a dummy variable where 0 = no, and 1 = yes).
The initial multiple regression result determines the following:
VARIABLE
SIGNIFICANCE LEVEL
.001
.202
.028
.591
.032
2009934199
Using the Enter method, what would be the trimming steps you would
expect to undertake to identify the significant multiple regression result?
Explain your reasoning.
8. Using the bicycle example in question 7, what do you expect would be the
elimination of variables sequence using stepwise multiple regression? Explain
your reasoning with respect to the operation of this technique.
9. Using SPSS graphical capabilities, diagram the regression plane for the following variables.
NUMBER OF GALLONS
OF GASOLINE
USED PER WEEK
MILES COMPUTED
FOR WORK PER WEEK
NUMBER OF
RIDERS IN CARPOOL
5
10
15
20
25
50
125
175
250
300
4
3
2
0
0
SPSS
Marketing Research: Online Research Applications, Fourth Edition, by Alvin C. Burns and Ronald F. Bush. Copyright 2003 by Pearson Prentice Hall.
577
6160811_CH19
578
11/8/06
4:23 PM
Page 578
INTERACTIVE LEARNING
Visit the Web site at www.prenhall.com/burnsbush. For this chapter, work through
the Self-Study Quizzes, and get instant feedback on whether you need additional
studying. On the Web site, you can review the chapter outlines and case information
for Chapter 19.
Segmentation Variable
Compact Automobile
Buyer
Luxury Automobile
Buyer
+.59
Demographics
.28
.15
.12
+.38
Family Size
+.39
.35
Income
.15
+.25
+.68
Marketing Research: Online Research Applications, Fourth Edition, by Alvin C. Burns and Ronald F. Bush. Copyright 2003 by Pearson Prentice Hall.
2009934199
Age
Education
6160811_CH19
11/8/06
4:23 PM
Page 579
Case 19.2
Segmentation Variable
Compact Automobile
Buyer
Luxury Automobile
Buyer
Lifestyle/Values
+.59
Active
American Pride
+.30
Bargain Hunter
+.45
Cosmopolitan
.33
.38
Conservative
.40
+.68
+.65
Embraces Change
.30
Family Values
+.69
Financially Secure
.28
Optimistic
.39
+.24
+.54
+.21
+.21
+.52
+.71
+.37
2. What are the segmentation variables that distinguish compact automobile buyers and
in what ways?
3. What are the segmentation variables that distinguish sports car buyers and in what ways?
4. What are the segmentation variables that distinguish luxury automobile buyers and in
what ways?
5. Contrast the segmentation variable classification differences among the three types of
automobile buyers.
CASE 19.2
SPSS
2009934199
Jeff Dean was a very happy camper. He had learned that his dream of The Hobbits Choice
Restaurant could be a reality. Through the research conducted under Cory Rogerss expert
supervision and Celeste Browns SPSS analysis, Jeff knew the approximate size of the upscale
restaurant market, and he had a good idea of what features were desired, where it should be
located, and even what advertising media to use to promote it. He had all the information he
needed to return to his friend, Walker Stripling, the banker, to obtain the financing necessary
to design and build The Hobbits Choice Restaurant.
Jeff called Cory on Friday morning and said, Cory, I am very excited about everything that your
work has found about the good prospects of The Hobbits Choice. I want to set up a meeting with
Walker Stripling next week to pitch it to him for the funding. Can you get me the final report by then?
Cory was silent for a moment and then he said, Celeste is doing the final figures and
dressing up the tables so we can paste them into the report document. But I think you have
forgotten about the last research objective. We still need to address the target market definition
with a final set of analyses. I know that Celeste just finished some exams at school, and she
has been asking if there is any work she can do over the weekend. Ill give her this task. Why
dont you plan on coming over at 11:00 A.M. on Monday? Celeste and I will show you what we
have found, and then we can take Celeste to lunch for giving up her weekend.
Your task in Case 19.2 is to take Celestes role, use The Hobbits Choice Restaurant SPSS
data set, and perform the proper analysis. You will also need to interpret the findings.
1. What is the demographic target market definition for The Hobbits Choice Restaurant?
2. What is the restaurant spending behavior target market definition for The Hobbits
Choice Restaurant?
3. Develop a general conceptual model of market segmentation for The Hobbits Choice
Restaurant. Test it using multiple regression analysis and interpret your findings for
Jeff Dean.
Marketing Research: Online Research Applications, Fourth Edition, by Alvin C. Burns and Ronald F. Bush. Copyright 2003 by Pearson Prentice Hall.
579