
Determining and Interpreting Associations Among Variables

Pet Ownership: A New Market Segmentation Basis?

Marketing managers have a host of bases for segmenting their markets, including demographics, benefits, lifestyles, and product usage, not to mention a number of commercially developed segmentation systems that are available. The concept of market segmentation holds that if a market segment can be identified by some basis, then it will differ from other market segments on key factors such as purchases, beliefs, or some other aspect useful to a marketer seeking to target that segment. To state the concept somewhat differently, the members of a market segment will be uniquely associated with predispositions, and the marketing manager can use those predispositions in a marketing strategy.

Seeking to break new ground, researchers investigated the market segmentation potential of pet ownership. They surveyed a large number of American adults regarding their attitudes, interests, and opinions across a number of topics, and they compared these respondents' answers to their pet ownership (or nonownership) with a technique called cross-tabulation, which is described in this chapter.

The researchers discovered that pet ownership (cat, dog, or both) was associated or related to a certain lifestyle profile. Namely, relative to nonowners of pets, pet owners were found to:

- Be adventurous
- Be independent
- Enjoy life

On the other hand, relative to pet owners, nonowners of pets were found to be:

- Conservative
- Fatalistic
- Health-conscious
- Concerned for the environment

Taking a closer look at the nature of pet ownership, the researchers then investigated the relationships of traits and demographic factors associated with dog versus cat ownership. Traits and demographics associated with owning a dog were found to be:

Traits: Conservatism, Traditionalism
Demographics: 35-54 years old, Less educated, Married

Being a cat owner was associated with the following traits and demographics:

Traits: Adventurous, Health-conscious, Concerned for the environment
Demographics: More educated, Metropolitan dweller

While dog-and-cat segmentation is obvious for a pet food company such as Whiskas (for cats) or Alpo (for dogs), there are more subtle uses of these associations. For example, if an advertiser were addressing financial matters, a middle-aged married couple would lend more credence to the ad if there was a family dog in the ad. On the other hand, if an air purification system was being advertised, it would be appropriate to show the family cat reclining in front of the purification unit.

Associative analyses determine whether stable relationships exist between two variables.

This chapter illustrates the usefulness of statistical analyses beyond simple descriptive measures and statistical inference. Often, as we have described in our pet ownership segmentation example, marketers are interested in relationships among variables. For example, Frito-Lay wants to know what kinds of people and under what circumstances these people choose to buy Doritos, Fritos, and any of the other items in the Frito-Lay line. The Pontiac Division of General Motors wants to know what types of individuals would respond favorably to the various style changes proposed for the Firebird. A newspaper wants to understand the lifestyle characteristics of its prospective readers so that it is able to modify or change sections in the newspaper to better suit its audience. Furthermore, the newspaper desires information about various types of subscribers so as to communicate this information to its advertisers, helping them in copy design and advertisement placement within the various newspaper sections. For all of these cases, there are statistical procedures available, termed "associative analyses," which determine answers to these questions. Associative analyses determine whether stable relationships exist between two variables; they are the central topic of this chapter. We begin the chapter by describing the four different types of relationships possible between two variables. Then, we describe cross-tabulations and indicate how a cross-tabulation can be used to compute a chi-square value, which, in turn, can be assessed to determine whether or not a statistically significant association exists between the two variables. From cross-tabulations, we move to a general discussion of correlation coefficients, and we illustrate the use of Pearson product moment correlations. As in our previous analysis chapters, we show you SPSS steps to perform these analyses and the resulting output.

TYPES OF RELATIONSHIPS BETWEEN TWO VARIABLES

In order to describe a relationship between two variables, we must first remind you of the scale characteristic called "description" that we introduced to you in Chapter 10. Every scale has unique descriptors, sometimes called "labels" or "amounts," that identify the different positions of that scale. The term amounts implies that the scale is metric, namely, interval or ratio, while the term labels implies that the scale is not metric, typically nominal. A simple label is a "yes" or "no" one, for instance, if a respondent is labeled as a buyer (yes) or nonbuyer (no) of a particular product or service. Of course, if the researcher measured how many times a respondent bought a product, the amount would be the number of times, and the scale would be metric because this scale would satisfy the assumptions of a ratio scale.

A relationship is a consistent and systematic linkage between the labels or amounts for two variables. This linkage is statistical, not necessarily causal. A causal linkage is one in which you are certain one variable affected the other one, but with a statistical linkage you cannot be certain because some other variable might have had some influence. Nonetheless, statistical linkages or relationships often provide us with insights that lead to understanding even though they are not cause-and-effect relationships. For example, if we found a relationship that 9 out of 10 bottled water buyers purchased flavored water, we understand that the flavorings are important to these buyers. Associative analysis procedures are useful because they determine if there is a consistent and systematic relationship between the presence (label) or amount of one variable and the presence (label) or amount of another variable. There are four basic types of relationships between two variables: nonmonotonic, monotonic, linear, and curvilinear. A discussion of each follows.

Nonmonotonic Relationships

A nonmonotonic relationship is one in which the presence (or absence) of one variable is systematically associated with the presence (or absence) of another variable. The term nonmonotonic means essentially that there is no discernible direction to the relationship, but a relationship exists. For example, McDonald's knows from experience that morning customers typically purchase coffee, whereas noon customers typically purchase soft drinks. The relationship is in no way exclusive: there is no guarantee that a morning customer will always order coffee or that an afternoon customer will always order a soft drink. In general, though, this relationship exists, as can be seen in Figure 18.1. The nonmonotonic relationship is simply that the morning customers tend to purchase breakfast foods such as eggs, biscuits, and coffee, and the afternoon customers tend to purchase lunch items such as burgers, fries, and soft drinks. In other words, with a nonmonotonic relationship, when you find the presence of one label for a variable, you will tend to find the presence of another specific label of another variable: breakfast diners typically order coffee. Here are some other examples of nonmonotonic relationships: (1) people who live in apartments do not buy lawn mowers, but homeowners do; (2) tourists in Daytona Beach, Florida, during "bike week" are likely to be motorcycle owners, not college students; and (3) PlayStation game players are typically children, not adults. Again, each example reports that the presence (absence) of one aspect of some object tends to be joined to the presence (absence) of an aspect of some other object. But the association is very general, and we must state each one by spelling it out verbally. In other words, we know only the general pattern of presence or nonpresence with a nonmonotonic relationship.

A nonmonotonic relationship means two variables are related, but only in a general sense.


Monotonic Relationships
Monotonic relationships are ones in which the researcher can assign a general direction to the association between the two variables. There are two types of monotonic relationships: increasing and decreasing. Monotonic increasing relationships are those in which one variable increases as the other variable increases. As you would guess, monotonic decreasing relationships are those in which one variable increases as the other variable decreases. You should note that in neither case is there any indication of the exact amount of change in one variable as the other changes. "Monotonic" means that the relationship can be described only in a general directional sense. Beyond this, precision in the description is lacking.

A monotonic relationship means you know the general direction of the relationship between two variables.

FIGURE 18.1 McDonald's Example: Nonmonotonic Relationship for the Type of Drink Ordered at Breakfast and at Lunch

A relationship is a consistent and systematic linkage between the labels or amounts for two variables.

The following example should help to explain this concept. The owner of a shoe store knows that older children tend to require larger shoe sizes than do younger children, but there is no way to equate a child's age with the right shoe size. No universal rule exists as to the rate of growth of a child's foot or to the final shoe size he or she will attain. There is, however, a monotonic increasing relationship between a child's age and shoe size. At the same time, a monotonic decreasing relationship exists between a child's age and the amount of involvement of his or her parents in the purchase of his or her shoes. As Figure 18.2 illustrates, very young children often have virtually no input into the purchase decision, whereas older children tend to gain more and more control over the purchase decision process until they ultimately become adults and have complete control over the decision. Once again, no universal rule operates as to the amount of parental influence or the point in time at which the child becomes independent and gains complete control over the decision-making process. It is simply known that younger children have less influence in the decision-making process, and older children have more influence in the shoe purchase decision. The relationship is therefore monotonic.

Generally, as a child grows, he or she has more influence on his or her shoe purchases, but the relationship is not precise.

Monotonic relationships can be increasing or decreasing.

Linear Relationships

Now we will turn to a more precise relationship. Certainly the easiest association to envision between two variables is a linear relationship. A linear relationship is a "straight-line association" between two variables. Here, knowledge of the amount of one variable will automatically yield knowledge of the amount of the other variable as a consequence of applying the linear or straight-line formula that is known to exist between them. In its general form, a straight-line formula is as follows:

Formula for a Straight Line

y = a + bx

where:
y = the dependent variable being estimated or predicted
a = the intercept
b = the slope
x = the independent variable used to predict the dependent variable

A linear relationship means two variables have a "straight-line" relationship.

The terms intercept and slope should be familiar to you, but if they are a bit hazy, do not be concerned, as we describe the straight-line formula in detail in the next chapter. We also clarify the terms independent and dependent in Chapter 19. It should be apparent to you that a linear relationship is much more precise and contains a great deal more information than does a monotonic relationship. By simply substituting the values of a and b, an exact amount can be determined for y given any value of x. For example, if Jack-in-the-Box estimates that every customer will spend about $5 per lunch visit, it is easy to use a linear relationship to estimate how many dollars of revenue will be associated with the number of customers for any given location. The following equation would be used:

Straight-Line Formula Example

y = $0 + $5 × x

where x is the number of customers. So if 100 customers come to a Jack-in-the-Box location, the associated expected total revenues would be $0 plus $5 times 100, or $500. If 200 customers were expected to visit the location, the expected total revenue would be $0 plus $5 times 200, or $1,000. To be sure, the Jack-in-the-Box location would not derive exactly $1,000 for 200 customers, but the linear relationship shows what is expected to happen, on average.

Linear relationships are precise.

FIGURE 18.2 A Child's Control of His or Her Shoe Purchases: A Monotonic Increasing Relationship. A child gains control of his/her shoe purchases with development, but the relationship is not precise. (Horizontal axis: stage in child's development, from newborn to college; vertical axis: amount of control, from none to complete.)
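The straight-line computation above can be sketched in a few lines of Python; the function name and default parameter values are illustrative, not from the text:

```python
# Straight-line relationship y = a + b*x, using the Jack-in-the-Box
# figures from the example: intercept a = $0, slope b = $5 per customer.
def estimated_revenue(customers, intercept=0.0, slope=5.0):
    """Return the revenue predicted by the linear formula y = a + b*x."""
    return intercept + slope * customers

print(estimated_revenue(100))  # 100 customers -> 500.0
print(estimated_revenue(200))  # 200 customers -> 1000.0
```

As the text notes, the location would not derive exactly $1,000 from 200 customers; the formula gives the expected value, on average.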

Curvilinear Relationships
Now we turn to the last type of relationship. Curvilinear relationships are those in which one variable is associated with another variable, but the relationship is described by a curve rather than a straight line. In other words, the formula for a curved relationship is used rather than the formula for a straight line. Many curvilinear patterns are possible. For example, the relationship may be an S-shape, a J-shape, or some other curved-shape pattern. An example of a curvilinear relationship with which you should be familiar is the product life-cycle curve that describes the sales pattern of a new product over time: sales grow slowly during the introduction stage, spurt upward rapidly during the growth stage, and finally plateau or slow down considerably as the market becomes saturated. Curvilinear relationships are beyond the scope of this book; nonetheless, it is important to list them as a type of relationship that can be investigated through the use of special-purpose statistical procedures.

A curvilinear relationship means some smooth curve pattern describes the association.

CHARACTERIZING RELATIONSHIPS BETWEEN VARIABLES

Depending on its type, a relationship can usually be characterized in three ways: by its presence, direction, and strength of association. We need to describe these before taking up specific statistical analyses of associations between two variables.

Presence

The presence of a relationship between two variables is determined by a statistical test.

Presence refers to the finding that a systematic relationship exists between the two variables of interest in the population. Presence is a statistical issue. By this statement, we mean that the marketing researcher relies on statistical significance tests to determine whether there is sufficient evidence in the sample to support that a particular association is present in the population. Chapter 17 on statistical inference introduced the concept of a null hypothesis. With associative analysis, the null hypothesis states there is no association present in the population, and the appropriate statistical test is applied to test this hypothesis. If the test results reject the null hypothesis, then we can state that an association is present in the population (at a certain level of confidence). We describe the statistical tests used in associative analysis later in this chapter.

Direction (or Pattern)

Direction means that you know if the relationship is positive or negative, while pattern means you know the general nature of the relationship.

You have seen that in the cases of monotonic and linear relationships, associations may be described with regard to direction. As we indicated earlier, a monotonic relationship may be increasing or decreasing. For a linear relationship, if b (slope) is positive, then the linear relationship is increasing; and if b is negative, then the linear relationship is decreasing. So the direction of the relationship is straightforward with linear and monotonic relationships. For nonmonotonic relationships, positive or negative direction is inappropriate, because we can only describe the pattern verbally. It will soon become clear to you that the scaling assumptions of variables having a nonmonotonic association negate the directional aspects of the relationship. Nevertheless, we can verbally describe the pattern of the association as we have in our examples, and that statement substitutes for direction. Finally, with curvilinear relationships, we can use a formula; however, the formula will define a pattern, such as an S-shape, that we refer to in characterizing the nature of the relationship.

Strength of Association

Strength means you know how consistent the relationship is.

Finally, when present (that is, statistically significant), the association between two variables can be envisioned as to its strength, commonly using words such as "strong," "moderate," "weak," or some similar characterization; that is, when a consistent and systematic association is found to be present between two variables, it is then up to the researcher to ascertain the strength of the association. Strong associations are those in which there is a high probability of the two variables' exhibiting a dependable relationship, regardless of the type of relationship being analyzed. A low degree of association, on the other hand, is one in which there is a low probability of the two variables' exhibiting a dependable relationship. The relationship exists between the variables, but it is less evident.

There is an orderly procedure for determining presence, direction, and strength of a relationship. First, you must decide what type of relationship can exist between the two variables of interest. The answer to this question depends on the scaling assumptions of the variables; as we illustrated earlier, low-level (nominal) scales can embody only imprecise, pattern-like relationships, but high-level (interval or ratio) scales can incorporate very precise and linear relationships. Once you identify the appropriate relationship type as either nonmonotonic, monotonic, or linear, the next step is to determine whether that relationship actually exists in the population you are analyzing. This step requires a statistical test, and, again, we describe the proper test for each of these three relationship types beginning with the next section of this chapter. When you determine that a true relationship does exist in the population by means of the correct statistical test, you then establish its direction or pattern. Again, the type of relationship dictates how you describe its direction. You might have to inspect the relationship in a table or graph, or you might need only to look for a positive or negative sign before the computed statistic. Finally, the strength of the relationship remains to be judged. Some associative analysis statistics indicate the strength in a very straightforward manner, that is, just by their absolute size. With nominal-scaled variables, however, you must inspect the pattern to judge the strength. We describe this procedure next.

Based on scaling assumptions, first determine the type of relationship, and then perform the appropriate statistical test.

CROSS-TABULATIONS

Cross-tabulation and the associated chi-square value that we are about to explain are used to assess if a nonmonotonic relationship exists between two nominal-scaled variables. Remember that nonmonotonic relationships are those in which the presence of one variable coincides with the presence of another variable, such as lunch buyers ordering soft drinks with their meals.

Inspecting the Relationship with a Bar Chart

Bar charts can be used to "see" a nonmonotonic relationship.

A handy graphical tool that illustrates a nonmonotonic relationship is a stacked bar chart. With a stacked bar chart, two variables are accommodated simultaneously in the same bar graph. Each bar in the stacked bar chart stands for 100%, and it is divided proportionately by the amount of relationship that one variable shares with the other variable. For instance, you can see in Figure 18.3 that there are two variables: buyer type and occupational category. The two bars are made up of two types of individuals: buyers of Michelob Light beer and nonbuyers of Michelob Light. There are two types of occupations: professional workers, who might be called "white collar" employees, and manual workers, who are sometimes referred to as "blue collar" workers. You can see that a large percent of the white collar bar is accounted for by the Michelob buyers, while a smaller percent of Michelob buyers is apparent on the blue collar workers bar.
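As a sketch of how such a cross-classification is tallied from raw data, the following Python uses a small set of hypothetical survey records; the records and counts are invented for illustration only:

```python
from collections import Counter

# Hypothetical raw survey records of (occupation, buyer status); in
# practice these would come from the questionnaire data file.
responses = [
    ("white collar", "buyer"), ("white collar", "buyer"),
    ("white collar", "nonbuyer"), ("blue collar", "buyer"),
    ("blue collar", "nonbuyer"), ("blue collar", "nonbuyer"),
]

# Tally each (row label, column label) pair into a cross-tabulation cell.
cells = Counter(responses)
rows = sorted({occ for occ, _ in responses})
cols = sorted({buy for _, buy in responses})

for r in rows:
    counts = [cells[(r, c)] for c in cols]
    print(r, counts, "row total:", sum(counts))
```

Each distinct pair of labels becomes one cell count; the row and column totals fall out by summation, exactly as in the cross-tabulation tables discussed next.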

FIGURE 18.3 Michelob Light Purchases and Occupational Status (stacked bar chart; vertical axis: percent; one bar each for white collar and blue collar occupational status, divided into Michelob buyers and nonbuyers)

You should remember that we described a nonmonotonic relationship as an identifiable association in which the presence of one variable is paired with the presence (or absence) of another. This pattern is apparent in the stacked bar chart in Figure 18.3: buyers tend to be professional workers, while nonbuyers tend to be manual workers. Alternatively, nonbuyers tend not to be white collar workers, while buyers tend not to have blue collar occupations.

Cross-Tabulation Table

A cross-tabulation consists of rows and columns defined by the categories classifying each variable.

While a stacked bar chart provides a way of visualizing nonmonotonic relationships, the most common method of presentation for these situations is through the use of a cross-tabulation table, defined as a table in which data are compared using a row-and-column format. A cross-tabulation table is sometimes referred to as an "r x c" (r-by-c) table because it comprises rows and columns. The intersection of a row and a column is called a cross-tabulation cell. A cross-tabulation table for the stacked bar chart that we have been working with is presented in Table 18.1. Notice that we have identified the four cells with lines for the rows and columns. The columns are in vertical alignment and are indicated in this table as either "Buyer" or "Nonbuyer" of Michelob Light, whereas the rows are indicated as "White Collar" or "Blue Collar" for occupation.

TABLE 18.1 Types of Frequencies and Percentages in a Cross-Tabulation Table

Frequencies Table
                    Buyer     Nonbuyer    Totals
White Collar          152            8       160
Blue Collar            14           26        40
Totals                166           34       200

Raw Percentages Table
                    Buyer     Nonbuyer    Totals
White Collar      76% (152)    4% (8)    80% (160)
Blue Collar        7% (14)    13% (26)   20% (40)
Totals            83% (166)   17% (34)  100% (200)

Column Percentages Table (e.g., 152/166 = 92%)
                    Buyer     Nonbuyer    Totals
White Collar      92% (152)   24% (8)    80% (160)
Blue Collar        8% (14)    76% (26)   20% (40)
Totals           100% (166)  100% (34)  100% (200)

Row Percentages Table (e.g., 152/160 = 95%)
                    Buyer     Nonbuyer    Totals
White Collar      95% (152)    5% (8)   100% (160)
Blue Collar       35% (14)    65% (26)  100% (40)
Totals            83% (166)   17% (34)  100% (200)

Types of Frequencies and Percentages in a Cross-Tabulation Table

A cross-classification table can have four types of numbers in each cell: frequency, raw percentage, column percentage, and row percentage.

Look at the Frequencies Table section in Table 18.1. The upper left-hand cell number identifies people in the sample who are both white-collar workers and buyers of Michelob Light (152), and the cell to its right identifies the number of individuals who are white-collar workers who do not buy Michelob Light (8). These cell numbers represent frequencies; that is, the number of respondents who possess the quality indicated by the row label as well as the quality indicated by the column label. Table 18.1 illustrates how at least four different sets of numbers can be computed for cells in the table. These four sets are the frequencies table, the raw percentages table, the column percentages table, and the row percentages table. The frequencies table contains the raw numbers determined from the preliminary tabulation. The lower right-hand number of 200 refers to the total sample size, sometimes called the "grand total." Just above it are the totals for the number of white-collar (160) and blue-collar (40) occupation respondents in the sample. Going to the left of the grand total are the totals for Michelob Light nonbuyers (34) and buyers (166) in the sample. The four cells are the totals for the intersection points: 152 white-collar Michelob Light buyers, 8 white-collar nonbuyers, 14 blue-collar Michelob Light buyers, and 26 blue-collar nonbuyers.

These raw frequencies can be converted to raw percentages by dividing each by the grand total. The second cross-tabulation table, the raw percentages table, contains the percentages of the raw frequency numbers just discussed. The grand total location now has 100 percent (or 200/200) of the grand total. Above it are 80% and 20% for the raw percentages of white-collar occupational respondents and blue-collar occupational respondents, respectively, in the sample. Divide a couple of the cells just to verify that you understand how they are derived. For instance, 152/200 = 76 percent.

Two additional cross-tabulation tables can be presented, and these are more valuable in revealing underlying relationships. The column percentages table divides the raw frequency by its column total raw frequency. The formula is as follows:

Formula for a Column Cell Percent

Column cell percent = Cell frequency / Total of cells in that column

For instance, it is apparent that of the nonbuyers, 24% were white-collar and 76% were blue-collar respondents. Note the reverse pattern for the buyers group: 92% were white-collar respondents and 8% were blue-collar. You are beginning to see the nonmonotonic relationship.

The row percentages table presents the data with the row totals as the 100 percent base for each. That is, a row cell percentage is computed as follows:

Formula for a Row Cell Percent

Row cell percent = Cell frequency / Total of cells in that row

Raw percentages are cell frequencies divided by the grand total. Row (column) percentages are row (column) cell frequencies divided by the row (column) total.

Now it is possible to see that, of the white-collar respondents, 95% were buyers and 5% were nonbuyers. As you compare the Row Percentages Table to the Column Percentages Table, you should detect the relationship between occupational status and Michelob Light beer preference. Can you state it at this time?

Unequal percentage concentrations of individuals in a few cells, as we have in this example, illustrate the possible presence of a nonmonotonic association. If we had found that approximately 25% of the sample had fallen in each of the four cells, no relationship would be found to exist: it would be equally probable for any person to be a Michelob Light buyer or nonbuyer and a white- or a blue-collar worker. However, the large concentrations of individuals in two particular cells here suggest that there is a high probability that a buyer of Michelob Light beer is also a white-collar worker, and there is also a tendency for nonbuyers to work in blue-collar occupations. In other words, there is probably an association between occupational status and the beer-buying behavior of individuals in the population represented by this sample. We must test the statistical significance of the apparent relationship before we can say anything more about it.
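A minimal Python sketch, using the Table 18.1 frequencies, shows how the raw, column, and row percentages are each derived from the same four cell counts (the variable names are ours, not the book's):

```python
# Frequencies from Table 18.1: rows = occupation, columns = buyer status.
freq = {
    ("white collar", "buyer"): 152, ("white collar", "nonbuyer"): 8,
    ("blue collar",  "buyer"): 14,  ("blue collar",  "nonbuyer"): 26,
}
grand = sum(freq.values())  # grand total: 200
row_tot = {r: sum(v for (rr, _), v in freq.items() if rr == r)
           for r in ("white collar", "blue collar")}   # 160 and 40
col_tot = {c: sum(v for (_, cc), v in freq.items() if cc == c)
           for c in ("buyer", "nonbuyer")}             # 166 and 34

# Divide each cell by the grand total, its column total, or its row total.
raw_pct = {k: 100 * v / grand for k, v in freq.items()}
col_pct = {k: 100 * v / col_tot[k[1]] for k, v in freq.items()}
row_pct = {k: 100 * v / row_tot[k[0]] for k, v in freq.items()}

print(round(col_pct[("white collar", "buyer")]))  # 92: 92% of buyers are white collar
print(round(row_pct[("white collar", "buyer")]))  # 95: 95% of white-collar respondents buy
```

The contrast between the two printed figures is exactly the row-versus-column comparison the text asks you to make.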

CHI-SQUARE ANALYSIS

Chi-square analysis assesses nonmonotonic associations in cross-tabulation tables. Associations between product preferences and demographic characteristics help marketers identify their target markets.

Chi-square (χ²) analysis is the examination of frequencies for two nominal-scaled variables in a cross-tabulation table to determine whether the variables have a nonmonotonic relationship. The formal procedure for chi-square analysis begins when the researcher formulates a statistical null hypothesis that the two variables under investigation are not associated in the population. Actually, it is not necessary for the researcher to state this hypothesis in a formal sense, for chi-square analysis always explicitly takes this hypothesis into account. In other words, whenever we use chi-square analysis with a cross-tabulation, we always begin with the assumption that no association exists between the two nominal-scaled variables under analysis.

Observed and Expected Frequencies


The statistical procedure is as follows. The first cross-tabulation table in Table 18. J contains observed frequencies, which are the actual cell counts in the cross-tabulation tabl,,, These observed freque nco-s are compared In expected fr~uencies, which are rl~B.Ded as the theoretica' frp~lJ,=ncics iha: are derived ::-:::;~n this hyp()the~~i.s-ofno association between the two var iables. The degree to which the observed frequencies depart from the expected frequencies is expressed in a single number called the "chi-square statistic." The computed chi-square statistic is then compared to a table chi-square value (at a chosen level of significance) to determine whether the computed value is s.ignificantly different from zero. Here's a simple example to help you understand what we just stated. Suppose you perform a blind taste test with 10 of your friends. First, you pour Diet Pepsi in J 0 paper cups with no identification on the cup, Next, you assemble your 10 friends, and you let each one try a taste frorn his or her paper cup. Then, you ask each friend to guess whether it is Diet Pepsi or Diet Coke. If your friends guessed randomly, you would expect five to guess Diet Pepsi and five to guess Diet Coke. This is your null hypothesis: There is no relationship between the Diet Coke being tested and the guess. But you find Ih,1 9 nf vour fripnrls r or rr-rtlv ~lle" "Diel Pe ris i " ;1l1r1 I inc orrt-ci lv \Jll("SS("S"DiPI

Observed frequencies are the counts for each cell found in the sample. Expected frequencies are calculated based on the null hypothesis of no association between the two variables under investigation.

Chapter 18: Determining and Interpreting Associations Among Variables
Chi-Square Analysis

Coke." In other words, you have found a departure in your observed frequencies from the expected frequencies. It looks like your friends can correctly identify Diet Pepsi about 90% of the time. There seems to be a relationship, but we are not certain of its statistical significance, because we have not done any significance tests. The chi-square statistic is used to perform such a test. We will describe the chi-square test and then apply it to your blind taste test using Diet Pepsi.

The expected frequencies are those that would be found if there were no association between the two variables. Remember, this is the null hypothesis. About the only "difficult" part of chi-square analysis is in the computation of the expected frequencies. The computation is accomplished using the following equation:

Formula for an Expected Cell Frequency

Expected cell frequency = (Cell column total x Cell row total) / Grand total

The application of this equation generates a number for each cell that would have occurred if the study had taken place and no associations existed. Returning to our Michelob Light beer example, you were told that 160 white-collar and 40 blue-collar consumers had been sampled, and it was found that there were 166 buyers and 34 nonbuyers of Michelob Light. The expected frequency for each cell, assuming no association, is calculated with the expected cell frequency formula as follows:

White-collar buyer = (160 x 166) / 200 = 132.8
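To make the arithmetic concrete, here is a short Python sketch (the function name and layout are ours, not from the text) that applies the expected-frequency formula to every cell of the Michelob cross-tabulation:

```python
def expected_frequencies(row_totals, col_totals, grand_total):
    """Expected cell counts under the null hypothesis of no association:
    (cell row total x cell column total) / grand total, for every cell."""
    return [[row * col / grand_total for col in col_totals] for row in row_totals]

# Michelob Light example: rows are white-collar/blue-collar consumers,
# columns are buyers/nonbuyers; 200 consumers were sampled in total.
expected = expected_frequencies([160, 40], [166, 34], 200)
print(expected)  # [[132.8, 27.2], [33.2, 6.8]]
```

Note that the four expected counts still sum to the grand total of 200, as they must.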

cancellation effect. This value is divided by the expected frequency to adjust for cell size differences, and these amounts are summed across all of the cells. If there are many large deviations of observed frequencies from the expected frequencies, the computed chi-square value will increase; but if there are only a few slight deviations from the expected frequencies, the computed chi-square number will be small. In other words, the computed chi-square value is really a summary indication of how far away from the expected frequencies the observed frequencies are found to be. As such, it expresses the departure of the sample findings from the null hypothesis of no association.

Some researchers think of an expected-to-observed comparison analysis as a "goodness of fit" test. It assesses how closely the actual frequencies fit the pattern of expected frequencies. We have provided Marketing Research Insight 18.1 as an illustration of the goodness-of-fit notion.

Let us apply this equation to the example of your 10 friends guessing about Diet Pepsi or Diet Coke. We already agreed that if they guessed randomly, you would find five guessing for each brand, or a 50-50 split. But if we found a 90-10 vote for Diet Pepsi, you would be inclined to conclude that they could recognize Diet Pepsi; that is, most recognized the cola taste, so they gave the name, Diet Pepsi, that is related to it. Let's use the chi-square formula with observed and expected frequencies to see if the relationship is statistically significant.

Chi-square analysis is sometimes referred to as a "goodness-of-fit" test.


Calculations of Expected Cell Frequencies Using the Michelob Beer Example

White-collar buyer = (160 x 166) / 200 = 132.8
White-collar nonbuyer = (160 x 34) / 200 = 27.2
Blue-collar buyer = (40 x 166) / 200 = 33.2
Blue-collar nonbuyer = (40 x 34) / 200 = 6.8

The Computed χ² Value

The computed chi-square value compares observed to expected frequencies.

Next, compare the observed frequencies to these expected frequencies. The chi-square formula is as follows:

Chi-Square Formula

χ² = Σ (Observed_i - Expected_i)² / Expected_i   (summed over i = 1 to n)

where
Observed_i = observed frequency in cell i
Expected_i = expected frequency in cell i
n = number of cells

MARKETING RESEARCH INSIGHT 18.1
"Zeroing In" on Goodness-of-Fit

Can you guess the next number based on the apparent pattern of 1, 3, 5? Okay, what about this series: 1, 6, 11, 16? In the first series, you realize that 2 was added to determine the next number (1, 3, 5, 7, 9, and so on). You looked at the series and noticed the equal intervals of 2. You then erected a mental expectation of the series based on your suspected pattern. Let us take the second series because it is a bit more difficult. Suppose your first intuition was to add a 3 to the previous number. Here is your expected series and the actual one compared:

Expected:   1   4   7   10
Actual:     1   6   11  16
Difference: 0   2   4   6

Oops, not much of a match here. So let's try a 4.

Expected:   1   5   9   13
Actual:     1   6   11  16
Difference: 0   1   2   3

Getting closer, but still not there. Now try a 5.

Expected:   1   6   11  16
Actual:     1   6   11  16
Difference: 0   0   0   0

You have been performing "goodness-of-fit" tests. Notice that the differences became smaller as you zeroed in on the true pattern. (Catch the pun?) In other words, when the actual numbers are equal to the expected numbers, there is no difference, and the fit is perfect. This is the concept used in chi-square analysis. When the differences are small, you have a good fit to the expected values. When the differences are larger, you have a poor fit, and your hypothesis (the expected number sequence) is incorrect.

The chi-square statistic summarizes how far away from the expected frequencies the observed cell frequencies are found to be.

Applied to our Michelob beer example:

Calculation of Chi-Square Value (Michelob Example)

χ² = (152 - 132.8)²/132.8 + (8 - 27.2)²/27.2 + (14 - 33.2)²/33.2 + (26 - 6.8)²/6.8 = 81.64

You can see from the equation that the difference between each observed frequency and its expected frequency is squared to adjust for any negative values and to avoid the cancellation effect.
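The summation is easy to verify in a few lines of Python (a quick check of the Michelob figures; the variable names are ours):

```python
# Observed and expected counts for the four Michelob Light cells, in the order:
# white-collar buyer, white-collar nonbuyer, blue-collar buyer, blue-collar nonbuyer.
observed = [152, 8, 14, 26]
expected = [132.8, 27.2, 33.2, 6.8]

# Chi-square: each squared deviation is scaled by its expected count, then summed.
chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
print(round(chi_square, 2))  # 81.64
```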

The computed chi-square value is compared to a table value to determine statistical significance.

To determine the chi-square value, we calculate as follows:

Calculation of Chi-Square Value Using the Diet Pepsi Taste Test

χ² = (9 - 5)²/5 + (1 - 5)²/5 = 6.4

Remember, you need to use the frequencies, not the percentages.
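Here is the same computation in Python, with a hedged significance check added. The 3.841 critical value is the standard chi-square table entry for 1 degree of freedom at the .05 level, and for the special case of 1 degree of freedom the p-value can be obtained from the complementary error function, so no statistics library is needed:

```python
import math

observed = [9, 1]
expected = [5, 5]  # null hypothesis: random 50-50 guessing among 10 friends

chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
print(chi_square)  # 6.4

# Critical value for df = 1 at the .05 significance level (standard table value).
CRITICAL_05_DF1 = 3.841
print(chi_square > CRITICAL_05_DF1)  # True: reject the null hypothesis

# For df = 1 only, P(chi-square > x) = erfc(sqrt(x / 2)).
p_value = math.erfc(math.sqrt(chi_square / 2))
print(round(p_value, 3))  # about 0.011, comfortably below .05
```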

The Chi-Square Distribution

The chi-square distribution's shape changes depending on the number of degrees of freedom.

Now that you've learned how to calculate a chi-square value, you need to know if it is statistically significant. In Chapter 17, we described how the normal curve, or z distribution, the F distribution, and Student's t distribution, all of which exist in tables, are used by a computer statistical program to determine the level of significance. Chi-square analysis requires the use of a different distribution. The chi-square distribution is skewed to the right, and the rejection region is always at the right-hand tail of the distribution. It differs from the normal and t distributions in that it changes its shape depending on the situation at hand, but it does not have negative values. Figure 18.4 shows examples of two chi-square distributions. The chi-square distribution's shape is determined by the number of degrees of freedom. The figure shows that the more the degrees of freedom, the more the curve's tail is pulled to the right. In other words, the more the degrees of freedom, the larger the chi-square value must be to fall in the rejection region for the null hypothesis. It is a simple matter to determine the number of degrees of freedom. In a cross-tabulation table, the degrees of freedom are found through the formula below, where r is the number of rows and c is the number of columns.

A table of chi-square values contains critical points that determine the break between acceptance and rejection regions at various levels of significance. It also takes into account the number of degrees of freedom associated with each curve; that is, a computed chi-square value says nothing by itself. You must consider the number of degrees of freedom in the cross-tabulation table because more degrees of freedom are indicative of higher critical chi-square table values for the same level of significance. The logic of this situation stems from the number of cells: with more cells, there is more opportunity for departure from the expected values. The higher table values adjust for potential inflation due to chance alone. After all, we want to detect real nonmonotonic relationships, not phantom ones.

SPSS and virtually all computer statistical analysis programs have chi-square tables in memory and print out the probability of the null hypothesis. Let us repeat this point: The program itself will take into account the number of degrees of freedom and determine the probability of support for the null hypothesis. This probability is the percentage of the area under the chi-square curve that lies to the right of the computed chi-square value. When rejection of the null hypothesis occurs, we have found that a statistically significant nonmonotonic association exists between the two variables. As an example of the use of cross-tabulations and chi-square, we have prepared Marketing Research Insight 18.2, which illustrates how cross-tabulation can be used with qualitative data. In this case, the researchers judged the models and the wording of Seventeen magazine advertisements and used a nominal classification system. Thus, the only way to analyze the data is with cross-tabulations.
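A sketch of the lookup logic such a program applies (the critical values below are standard .05-level chi-square table entries for 1 through 5 degrees of freedom; the function name is ours):

```python
# Upper-tail critical values of the chi-square distribution at the .05
# significance level, indexed by degrees of freedom (standard table values).
CHI_SQUARE_CRIT_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070}

def is_significant(chi_square, rows, cols, crit_table=CHI_SQUARE_CRIT_05):
    """Reject the null hypothesis of no association when the computed
    chi-square exceeds the critical value for (r - 1)(c - 1) degrees of freedom."""
    df = (rows - 1) * (cols - 1)
    return chi_square > crit_table[df]

print(is_significant(6.4, 2, 2))   # True: the taste-test result is significant
print(is_significant(2.5, 2, 2))   # False: no departure beyond chance
```

Notice that the same computed value of 6.4 would not be significant in a 3 x 3 table, where four degrees of freedom push the critical value up to 9.488.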

Computer statistical programs look up table chi-square values and print out the probability of support for the null hypothesis.

COMPUTE CHI-SQUARE VALUES

We have described the concepts of observed frequencies, expected frequencies, and the computed chi-square value. Plus, we have provided formulas for the latter two concepts. Your task in this Active Learning exercise will be to compute the expected frequencies and chi-square values for two different cross-tabulation tables. Marketing Research Insight 18.2 has a cross-tabulation for the Visual and a cross-tabulation for the Verbal judged "girlish" advertisements in Seventeen magazine. Compute the expected frequencies and chi-square values for each.
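If you want a way to check your Active Learning answers, the following sketch computes the expected frequencies and chi-square value for the Visual (pictures) table directly from the frequencies reported in Marketing Research Insight 18.2:

```python
# Frequencies for the Visual (pictures) table in Marketing Research Insight 18.2:
# rows are girlish / not girlish; columns are Japanese / U.S. Seventeen.
observed = [[73, 64], [31, 95]]

row_totals = [sum(row) for row in observed]        # [137, 126]
col_totals = [sum(col) for col in zip(*observed)]  # [104, 159]
n = sum(row_totals)                                # 263

expected = [[r * c / n for c in col_totals] for r in row_totals]
chi_square = sum(
    (observed[i][j] - expected[i][j]) ** 2 / expected[i][j]
    for i in range(2)
    for j in range(2)
)
print(round(chi_square, 2))  # about 22.58, far beyond the .05 critical value of 3.841
```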


Formula for Chi-Square Degrees of Freedom

Degrees of freedom = (r - 1)(c - 1)

FIGURE 18.4: The Chi-Square Curve's Shape Depends on Its Degrees of Freedom (chi-square curves are shown for different numbers of degrees of freedom, including 6; the rejection region is the right-hand end of the curve)

How to Interpret a Chi-Square Result

How does one interpret a chi-square result? Chi-square analysis yields the amount of support for the null hypothesis if the researcher repeated the study many, many times with independent samples. By now, you should be well acquainted with the concept of many, many independent samples. For example, if the chi-square analysis yielded a 0.02 significance level for the null hypothesis, the researcher would conclude that only 2% of the time would he or she find evidence to support the null hypothesis. Since the null hypothesis is not supported, this means there is a significant association. It must be pointed out that chi-square analysis is simply a method to determine whether a nonmonotonic association exists between two variables. Chi-square does not indicate the nature of the association, and it indicates only roughly the strength of association by its size. It is best interpreted as a prerequisite to looking more closely at the two variables to discern the nature of the association that exists between them. That is, the chi-square test is another one of our "flags" that tells us whether or not it is worthwhile to inspect all those row and column percentages.


Occasionally a researcher must work with purely qualitative data; cross-tabulation analysis is suited to this situation.

If a relationship with a significance level of .05 or less exists between the two variables (the flag is waving), the researcher can be assured that he or she is not wasting time and is actually pursuing an association that truly exists in the population. In our Diet Pepsi blind taste test, the computed chi-square value was 6.4. Since the computed value is larger than the critical table value of 3.8 at the 95% level of significance, the relationship is statistically significant. If we used SPSS, the significance level would be reported for us (about .011 in this case), indicating a statistically significant relationship.

MARKETING RESEARCH INSIGHT 18.2
Cross-Tabulations Reveal Cultural Differences Between Japanese and American Seventeen Magazine Ads

Since consumers' self-identities are shaped in part by mass media advertising, researchers were interested in comparing how teenaged girls are portrayed in the Japanese and the English (American) versions of Seventeen magazine. The Japanese culture emphasizes shared identity and deemphasizes individuality, while the American culture emphasizes individuality and even rebellion. The researchers examined all of the relevant ads in four successive issues of each version. They used a classification system in which judges decided whether or not the visual components (the pictured models) and the verbal components (the words) of each advertisement were "girlish," that is, indicative of child-like meaning. The cross-tabulation tables follow.

                 VISUAL (PICTURES)
                 Japanese Seventeen    U.S. Seventeen
Girlish                 73                   64
Not girlish             31                   95

                 VERBAL (WORDS)
                 Japanese Seventeen    U.S. Seventeen
Girlish                 45                   39
Not girlish             59                  120

The Hobbit's Choice Restaurant Survey: Analyzing Cross-Tabulations for Significant Associations by Performing Chi-Square Analysis with SPSS

We are going to use data from our Hobbit's Choice Restaurant survey to demonstrate how to perform and interpret cross-tabulation analysis with SPSS. You may recall that we used subscription to City Magazine as a grouping variable when we performed an independent-samples t test in Chapter 16. We found that City Magazine subscribers were more likely to intend to patronize the Hobbit's Choice Restaurant, which indicated the magazine's potential as an advertising medium.
In both cases, the computed chi-square values were large and statistically significant, meaning that there was a relationship between the country's culture and the portrayal of teenaged females in the Seventeen magazine advertisements. The following column percentage tables vividly depict the nature of how the advertisements communicate cultural norms: the Japanese version of Seventeen magazine communicates Japanese cultural norms to Japanese teenaged females, and the American version communicates American cultural norms to its American teenaged readers.

                 VISUAL (PICTURES)
                 Japanese Seventeen    U.S. Seventeen
Girlish                 70%                  40%
Not girlish             30%                  60%
Total                  100%                 100%

                 VERBAL (WORDS)
                 Japanese Seventeen    U.S. Seventeen
Girlish                 43%                  25%
Not girlish             57%                  75%
Total                  100%                 100%

To get a better picture of the effectiveness of City Magazine as an advertising medium, we can categorize the respondents based on how likely they are to patronize the Hobbit's Choice Restaurant. Subscription to City Magazine is a nominal variable because respondents can categorize themselves as "yes" or "no." By taking the respondents who are "very likely" or "somewhat likely" to patronize the Hobbit's Choice Restaurant and creating a "probable patron" versus "not probable patron" grouping variable, we can perform a chi-square test with SPSS.

The clickstream sequence is ANALYZE-DESCRIPTIVE STATISTICS-CROSSTABS, which leads to a dialog box in which you can select the variables for chi-square analysis. In our example in Figure 18.5, we have selected Subscribe to City Magazine as the row variable and Probable patron of the Hobbit's Choice Restaurant as the column variable. There are three option buttons at the bottom of the box. The Cells... button leads to the specification of observed frequencies, expected frequencies, row percentages, column percentages, and so forth. We have opted for just the observed frequencies (raw counts) and the percentages. The Statistics... button opens up a menu of statistics that can be computed from cross-tabulation tables. Of course, the only one we want is the chi-square. The resulting output is found in Figure 18.6.
In the first table of Figure 18.6, you can see that we have variable and value labels, and the table contains the raw frequency count as the first entry in each cell. Also, the row percentages are reported along with each row and column total. In the second table, there is information on the chi-square analysis result. For our purposes, the only relevant statistic is the Pearson chi-square, which has been computed to be 112.878. The df pertains to the number of degrees of freedom, which in this example is 1, and the Asymp. Sig. corresponds to the probability of support for the null hypothesis that subscription and patronage are not associated. SPSS reports this significance to be .000, which means that there is practically no support for the hypothesis that subscription to City Magazine and probable patronage of the Hobbit's Choice Restaurant are not associated. So, they are related. (Actually, the probability is not exactly equal to zero; SPSS reports to only three decimal places, and if it reported, say, 10 places, you would see a number somewhere past the third decimal place.)

A significant chi-square means the researcher should look at the cross-tabulation row and column percentages to "see" the association pattern.

Chi-square analysis of independence is generally the first step in determining whether a meaningful relationship exists between two variables. It is not worth the researcher's time to focus on nonsignificant associations, because they are more than likely a function of sampling error; when the computed chi-square value is small, the null hypothesis of no association between the two variables is assumed to be true. Through chi-square analysis, SPSS has effected the first step in the identification of meaningful relationships: it has signaled that a statistically significant nonmonotonic association actually exists.
\ 11''III\.:1Its: uelcrnuIllug

__-. . . . '!!II j;T.G"! :~.


).'

anc11nterpretln.g

Associations

Among Variables

hi-Squ.m A",dl'~"

Figure 18.5 annotations: (1) Use the ANALYZE-DESCRIPTIVE STATISTICS-CROSSTABS menu sequence. (2) In the Crosstabs window, select the Row Variable and the Column Variable to be analyzed. (3) Click on Cells... and select the Counts and Percentages desired. (4) Click on Statistics... and select Chi-square. (5) After selecting the Cells and Statistics options, click OK.

The next step is to fathom the nature of the association. Remember that with a nonmonotonic relationship, you must inspect the pattern and describe it verbally. When we looked for the pattern in our example, we converted the frequencies into row and column percentages. Now you have a cross-tabulation table with row percentages in their
respective cells. The row percentages show that about 88% of the City Magazine subscribers are Probable patrons of the Hobbit's Choice Restaurant. At the same time, 71% of the nonsubscribers are Not probable patrons. You can interpret this finding in the following way: if Jeff places an ad in City Magazine for the Hobbit's Choice Restaurant, almost 90% of the readers of the magazine will be members of his target market. In other words, because the significance was less than .05, it was worthwhile to inspect and interpret the percentages in the cross-tabulation table. By doing this, we can discern the pattern or nature of the association, and the percentages indicate its relative strength. More importantly, because the relationship was determined to be statistically significant, you can be assured that this association and the relationship you have observed will hold for the population that this sample represents.
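The SPSS result can be double-checked by hand. This sketch (pure Python; the counts are taken from Figure 18.6) reproduces the Pearson chi-square of 112.878 and the two row percentages behind the interpretation:

```python
# Observed counts from Figure 18.6: rows = subscribe to City Magazine (yes, no),
# columns = probable patron of the Hobbit's Choice Restaurant (yes, no).
observed = [[97, 13], [84, 206]]

row_totals = [sum(r) for r in observed]            # [110, 290]
col_totals = [sum(c) for c in zip(*observed)]      # [181, 219]
n = sum(row_totals)                                # 400

chi_square = sum(
    (observed[i][j] - row_totals[i] * col_totals[j] / n) ** 2
    / (row_totals[i] * col_totals[j] / n)
    for i in range(2)
    for j in range(2)
)
print(round(chi_square, 3))       # 112.878, matching the SPSS Pearson chi-square

# Row percentages behind the interpretation:
print(round(100 * 97 / 110, 1))   # 88.2 (% of subscribers who are probable patrons)
print(round(100 * 206 / 290, 1))  # 71.0 (% of nonsubscribers who are not probable patrons)
```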

With chi-square analysis, interpret the SPSS significance as the amount of support for no association between the variables being analyzed.

Presentation of Cross-Tabulation Findings

FIGURE 18.5: The SPSS Clickstream to Create a Cross-Tabulation with Chi-Square Analysis

The Crosstabulations table in Figure 18.6 has the raw count as the first entry in each cell, with row percentages alongside.
When we introduced the notion of relationship or association analysis, we noted that summarizing the direction and strength of nonmonotonic relationships that are detected in cross-tabulations with chi-square analysis is not possible because nominal scales are involved. Nominal scales do not have order or magnitude: they are simply categories or labels that uniquely identify the data. To reveal the nonmonotonic relationships found significant in cross-tabulation tables, researchers often turn to graphical presentations, as pictures will show the relationships very adequately. Read Marketing Research Insight 18.3 for an example of how graphical presentations can effectively summarize and communicate these relationships.

FIGURE 18.6: SPSS Output for Cross-Tabulations with Chi-Square Analysis

Do you subscribe to City Magazine? by Probable Patron of Hobbit's Choice? (row percentages shown with each count)

                     Probable patron: Yes    Probable patron: No    Total
Subscribe: Yes          97 (88.2%)              13 (11.8%)          110 (100.0%)
Subscribe: No           84 (29.0%)             206 (71.0%)          290 (100.0%)
Total                  181 (45.3%)             219 (54.8%)          400 (100.0%)

Chi-Square Tests: Pearson Chi-Square = 112.878, df = 1, Asymp. Sig. (2-sided) = .000, N of Valid Cases = 400. The significance level of the chi-square test is on the first (Pearson Chi-Square) row; you can ignore the rows below it (Continuity Correction, Likelihood Ratio, Fisher's Exact Test, and Linear-by-Linear Association = 112.596).

MARKETING RESEARCH INSIGHT 18.3
Use of Cross-Tabulations to Test and Graphical Presentations to Show Cross-Tabulation Relationships for Online versus Non-online Shoppers

The frequencies found in cross-tabulations, when converted to percentage tables, are quite amenable to graphical presentations that are very useful in depicting the nature of the relationships found in a survey. With this Marketing Research Insight, we are using the cross-tabulations reported in a Web-based survey that compared online shoppers with individuals who had never made an online purchase. In the survey, these two types of purchasers were measured on a number of demographic characteristics such as gender, age, education, ethnicity, marital status, and income. They were also measured on self-reports of computer competency and how they prefer to search for information about marketplace alternatives. The following three relationships were found to be statistically significant, meaning that the relationships exist in the population. The finding of significance with a cross-tabulation allows the researcher to examine and describe the relationship, as it is a nonmonotonic pattern and not one that can be characterized by direction or strength from the chi-square results alone. We have included a statement of the apparent relationship in each graph. This graphical approach is an appropriate one for nonmonotonic relationships. (continued)




CORRELATION COEFFICIENTS AND COVARIATION


The correlation coefficient is an index number, constrained to fall between the range of -1.0 and +1.0, that communicates both the strength and the direction of a linear relationship between two variables. The strength of association between two variables is communicated by the absolute size of the correlation coefficient, whereas its sign communicates the direction of the association. Stated in a slightly different manner, a correlation coefficient indicates the degree of "covariation" between two variables. Covariation is defined as the amount of change in one variable systematically associated with a change in another variable. The greater the absolute size of the correlation coefficient, the greater is the covariation between the two variables, or the stronger is their relationship.

Let us take up the statistical significance of a correlation coefficient first. Regardless of its absolute value, a correlation that is not statistically significant has no meaning at all. This is because of the null hypothesis, which states that the population correlation coefficient is equal to zero. If this null hypothesis is rejected (statistically significant correlation), then you can be assured that a correlation other than zero will be found in the population. But if the sample correlation is found to be not significant, the population correlation will be zero. Here is a question. If you can answer it correctly, you understand the statistical significance of a correlation. If you repeated a correlational survey many, many times and computed the average for a correlation that was not significant across all of these surveys, what would be the result? (The answer is zero, because if the correlation is not significant, the null hypothesis is true, and the population correlation is zero.)
How do you determine the statistical significance of a correlation coefficient? Tables exist that give the lowest value of the significant correlation coefficients for given sample sizes. However, most computer statistical programs will indicate the statistical significance level of the computed correlation coefficient. Your SPSS program provides the significance in the form of the probability that the null hypothesis is supported. In SPSS, this is a "Sig." value that we will identify for you when we show you SPSS correlation output. In addition, SPSS will also allow you to indicate a directional hypothesis about the size of the expected correlation, just as with a directional means hypothesis test.
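To demystify what a statistical program computes, here is a minimal Pearson correlation coefficient in Python. The education/reading figures are invented for illustration; in practice you would rely on SPSS or a statistics library, which also reports the significance:

```python
import math

def pearson_r(x, y):
    """Pearson correlation: the covariation of x and y divided by the product
    of their dispersions, standardizing the result to fall between -1.0 and +1.0."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical data: years of education and weekly hours of magazine reading.
education = [10, 12, 14, 16, 18]
reading_hours = [1, 2, 4, 5, 8]
print(round(pearson_r(education, reading_hours), 2))  # 0.98: a strong positive correlation
```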

The correlation coefficient standardizes the covariation between the two variables to a number ranging between -1.0 and +1.0.

Online Shopping and Age. Relationship: Online shoppers are younger than Non-online shoppers.

To use a correlation, you must first establish that it is statistically significantly different from zero.

Online Shopping and Computer Competence (self-rated as Novice, Intermediate, or Expert). Relationship: Online shoppers have more computer competence than do Non-online shoppers.

Rules of Thumb for Correlation Strength

Rules of thumb exist concerning the strength of a correlation based on its absolute size.

Online Shopping and Information Search for Marketplace Alternatives. Relationship: Online shoppers prefer to use the Internet to search for information about marketplace alternatives more than do Non-online shoppers.

After we have established that a correlation coefficient is statistically significant, we can talk about some general rules of thumb concerning the strength of association. Correlation coefficients that fall between +1.00 and +.81 or between -1.00 and -.81 are generally considered to be "strong." Correlations that fall between +.80 and +.61 or -.80 and -.61 generally indicate a "moderate" association. Those that fall between +.60 and +.41 or -.60 and -.41 are typically considered to be "low," and they denote a weak association. Any correlation that falls in the range of .21 to .40 (positive or negative) is usually considered indicative of a very weak association between the variables. Finally, any correlation that is equal to or less than .20 in absolute size is typically uninteresting to marketing researchers because it rarely identifies a meaningful association between two variables. We have provided Table 18.2 as a reference on these rules of thumb. As you use these guidelines, remember two things: First, we are assuming that the statistical significance of the correlation has been established. Second, researchers make up their own rules of thumb, so you may encounter someone whose guidelines differ slightly from those in the table. In any case, it is helpful to think in terms of the closeness of the correlation coefficient to zero or to ±1.00. Statistically significant correlation coefficients that are close to


TABLE 18.2: Rules of Thumb About Correlation Coefficient Size

Coefficient Range    Strength of Association*
.81 to 1.00          Strong
.61 to .80           Moderate
.41 to .60           Weak
.21 to .40           Very weak
.00 to .20           None

*Assuming the correlation coefficient is statistically significant; the ranges apply to the absolute size of the coefficient, positive or negative.

FIGURE 18.7: A Scatter Diagram Showing Covariation, Using Novartis Sales Data (territory sales on the vertical axis plotted against the number of salespersons, 2 through 20, on the horizontal axis)

zero show that there is no systematic association between the two variables, whereas those that are closer to +1.00 or -1.00 express that there is some systematic association between the variables.
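The rules of thumb translate directly into code. This sketch (our own wording of the labels) assumes the coefficient has already been found statistically significant:

```python
def correlation_strength(r):
    """Classify a statistically significant correlation coefficient
    using the Table 18.2 rules of thumb on its absolute size."""
    size = abs(r)
    if size > 1.0:
        raise ValueError("correlation coefficients cannot exceed 1.0 in absolute size")
    if size >= 0.81:
        return "strong"
    if size >= 0.61:
        return "moderate"
    if size >= 0.41:
        return "weak"
    if size >= 0.21:
        return "very weak"
    return "none"

print(correlation_strength(0.83))   # strong
print(correlation_strength(-0.52))  # weak
print(correlation_strength(0.10))   # none
```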

The Correlation Sign

The sign indicates the direction of the association between the two variables: a positive sign indicates a positive association, and a negative sign indicates a negative association.

But what about the sign of the correlation coefficient? The sign indicates the direction of the association. A positive sign indicates a positive direction; a negative sign indicates a negative direction. For instance, if you found a significant correlation of 0.83 between years of education and hours spent reading National Geographic, it would mean that people with more education spend more hours reading this magazine. But if you found a significant negative correlation between education and cigarette smoking, it would mean that more educated people smoke less.

Graphing Covariation Using Scatter Diagrams

We addressed the concept of covariation between two variables in our introductory comments on correlations. It is now time to present covariation in a slightly different manner. Here is an example: A marketing researcher is investigating the possible relationship between total company sales for Novartis, a leading pharmaceutical company, in a particular territory and the number of salespeople assigned to that territory. At the researcher's fingertips are the sales figures and the number of salespeople assigned for each of 20 different Novartis territories in the United States. It is possible to depict the raw data for these two variables on a scatter diagram such as the one in Figure 18.7. A scatter diagram plots the points corresponding to each matched pair of x and y variables. In this figure, the vertical axis is Novartis sales for the territory, and the horizontal axis contains the number of salespeople in that territory. The arrangement or scatter of points appears to fall in a long ellipse. Any two variables that exhibit systematic covariation will form an ellipse-like pattern on a scatter diagram. Of course, this particular scatter diagram portrays the information gathered by the marketing researcher on sales and the number of salespeople in each territory, and only that information. In actuality, the scatter diagram could have taken any shape, depending on the relationship between the points plotted for the two variables concerned. A number of different types of scatter diagram results are portrayed in Figure 18.8. Each of these scatter diagram results is indicative of a different degree of covariation. For instance, you can see that the scatter diagram depicted in Figure 18.8(a) is one in which

there is no apparent association or relationship between the two variables; the points fail to create an identifiable pattern. Instead, they are clumped into a large, formless shape. The points in Figure 18.8(b) indicate a negative relationship between variable x and variable y; higher values of x tend to be associated with lower values of y. The points in Figure 18.8(c) are fairly similar to those in Figure 18.8(b), but the angle or the slope of the ellipse is different. This slope indicates a positive relationship between x and y, because larger values of x tend to be associated with larger values of y.

What is the connection between scatter diagrams and correlation coefficients? The answer to this question lies in the linear relationship described earlier in this chapter. Look at Figures 18.7, 18.8(b), and 18.8(c). All form ellipses. Imagine taking an ellipse and pulling on both ends. It would stretch out and become thinner until all of its points fall on a straight line. If you happened to find some data that formed an ellipse with all of its points falling on the axis line and you computed a correlation, you would find it to be exactly 1.0 (+1.0 if the ellipse went up to the right and -1.0 if it went down to the right). Now imagine pushing the ends of the ellipse until it became the pattern in Figure 18.8(a). There would be no identifiable straight line. Similarly, there would be no systematic covariation. The correlation for a ball-shaped scatter diagram is zero because there is no discernible linear relationship. In other words, a correlation coefficient indicates the degree of covariation between two variables, and you can envision this relationship as a scatter diagram. The form and angle of the scatter pattern is revealed by the size and sign, respectively, of the correlation coefficient.

Two highly correlated variables will yield a scatter diagram pattern of a tight ellipse. Your Student Assistant movie shows graphical representations of covariation.
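The patterns in Figure 18.8 can also be simulated numerically. The following Python sketch is not from the text; the data are synthetic and the coefficients are arbitrary. It generates three hypothetical y variables against a common x and computes the Pearson correlation for each, reproducing the "no association," negative, and positive patterns the figure illustrates:

```python
import numpy as np

rng = np.random.default_rng(42)
x = rng.normal(size=500)
noise = rng.normal(size=500)

# Three hypothetical y variables mimicking Figure 18.8's three panels.
patterns = {
    "no association": rng.normal(size=500),  # formless cloud, r near 0
    "negative": -0.9 * x + 0.4 * noise,      # downward-sloping ellipse
    "positive": 0.9 * x + 0.4 * noise,       # upward-sloping ellipse
}

# Pearson r for each pattern
rs = {name: np.corrcoef(x, y)[0, 1] for name, y in patterns.items()}
for name, r in rs.items():
    print(f"{name:>15}: r = {r:+.2f}")
```

Plotting each (x, y) pair as a scatter diagram would show the ellipse shapes the chapter describes; the tighter the ellipse, the closer |r| is to 1.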

FIGURE 18.8 Scatter Diagrams Illustrating Various Relationships: (a) No Association, (b) Negative Association, (c) Positive Association


The formula for calculating a Pearson product moment correlation is complicated, and researchers never compute it by hand, as they invariably find it on computer output. However, some instructors believe that students should understand the workings of the correlation coefficient formula. We have described this formula and provided an example in Marketing Research Insight 18.4.

The Pearson product moment correlation and other linear association correlation coefficients indicate not only the degree of association but the direction as well because, as we learned in our introductory comments on correlations, the sign of the correlation coefficient indicates the direction of the relationship. Negative correlation coefficients reveal that the relationship is opposite: as one variable increases, the other variable decreases. Positive correlation coefficients reveal that the relationship is increasing: larger quantities of one variable are associated with larger quantities of another variable. It is important to note that the angle or the slope of the ellipse has nothing to do with the size of the correlation coefficient. Everything hinges on the width of the ellipse. (The slope will be considered in Chapter 19 on regression analysis.)


THE PEARSON PRODUCT MOMENT CORRELATION COEFFICIENT

The Pearson product moment correlation coefficient measures the degree of linear association between two variables.

The Pearson product moment correlation measures the linear relationship between two interval- and/or ratio-scaled variables such as those depicted conceptually by scatter diagrams. The correlation coefficient that can be computed between the two variables is a measure of the "tightness" of the scatter points to the straight line. You already know that in a case in which all of the points fall exactly on the straight line, the correlation coefficient indicates this as a +1 or a -1. In the case in which it is impossible to discern an ellipse, such as in the scatter diagram in Figure 18.8(a), the correlation coefficient approximates zero. Of course, it is extremely unlikely that you will find perfect 1.0 or 0.0 correlations. Usually, you will find some value in between that could be interpreted as "high," "moderate," or "weak" correlation using the rules of thumb given earlier.

A positive correlation signals an increasing linear relationship between two variables; a negative correlation signals a decreasing one.
MARKETING RESEARCH INSIGHT

DATE.NET: MALE USERS' CHAT-ROOM PHOBIA

MARKETING RESEARCH INSIGHT 18.4

How to Compute a Pearson Product Moment Correlation Coefficient

Marketing researchers almost never compute statistics such as chi-square or correlation by hand, but it is useful to learn about the computation. The computational formula for the Pearson product moment correlation is as follows:

Formula for Pearson Product Moment Correlation

r_xy = [sum over i = 1 to n of (x_i - x̄)(y_i - ȳ)] / (n s_x s_y)

where
x_i = each x value
x̄ = mean of the x values
y_i = each y value
ȳ = mean of the y values
n = number of paired cases
s_x, s_y = standard deviations of x and y, respectively

We briefly describe the components of this formula so you can see how the concepts we just discussed fit in. In statistician's terminology, the numerator represents the cross-products sum and indicates the covariation, or "covariance," between x and y. The cross-products sum is divided by n to scale it down to an average per pair of x and y values. This average covariation is then divided by both standard deviations to adjust for differences in units. The result constrains r_xy to fall between -1.0 and +1.0.

Here is a simple computational example. You have some data on population and retail sales by county for 10 counties in your state. Is there a relationship between population and retail sales? You do a quick calculation and find that the average number of people per county is 690,000, and the average retail sales is $9.54 million. The standard deviations are 384.3 and 7.8, respectively, and the cross-products sum is 25,154. The computations to find the correlation are:

Calculation of a Correlation Coefficient

r_xy = 25,154 / (10 × 7.8 × 384.3) = 25,154 / 29,975.4 = .84

A correlation of 0.84 is a high positive correlation coefficient for the relationship. This value reveals that the greater the number of citizens living in a county, the greater the county's retail sales.

Date.net is an online meeting service. Its purpose is to operate a virtual meeting place for men seeking women and women seeking men. Internal analysis has revealed that female chat-room users greatly outnumber male chat-room users. This is frustrating to Date.net principals, as they know that the number of "men seeking women" is about the same as the number of "women seeking men." Men seem to have a chat-room phobia. They commissioned an online marketing research company to design a questionnaire that was posted on the date.net Web site for 15 days. The survey is a success, as over 5,000 date.net users fill it out in this time period. Date.net executives request a separate analysis of "men seeking women" user respondents to look into the chat-room-related questions. The research company decided to report all correlations that are significant at the 0.01 level. Here is a summary of the correlation analysis findings.

Correlation with Amount of date.net Chat-Room Use

Demographics:
  Age                         -.68
  Income                      -.76
  Education                   -.78
  Number of years divorced    +.57
  Number of children          +.68
  Years at present address    -.90
  Years at present job        -.81

Satisfaction with:
  Relationships               -.16
  Job/career                  -.86
  Personal appearance         -.72
  Life in general             -.50

Online behavior:
  Minutes online daily        +.90
  Online purchases            -.65
  Other chatting time/month   +.86
  Number of e-mail accounts   +.77

Use of date.net (1 = not important and 5 = very important):
  Meet new people             +.38
  Only way to talk to women   +.68
  Looking for a life partner  -.72
  Not much else to do         +.59
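The hand computation in Marketing Research Insight 18.4 can be checked in a few lines of Python. This is a sketch, not part of the text: the county figures below are hypothetical stand-ins, and the function follows the textbook formula (cross-products sum divided by n times the two population standard deviations):

```python
import numpy as np

def pearson_r(x, y):
    # Textbook formula: cross-products sum / (n * s_x * s_y),
    # where s_x and s_y divide by n (population standard deviations).
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    n = len(x)
    cross_products = np.sum((x - x.mean()) * (y - y.mean()))
    return cross_products / (n * x.std() * y.std())

# Hypothetical data: county population (thousands) and retail sales ($ millions)
pop = [350, 820, 410, 1200, 640, 980, 275, 730, 560, 935]
sales = [4.1, 11.2, 5.0, 16.8, 8.3, 13.5, 3.2, 9.9, 7.1, 12.4]

r = pearson_r(pop, sales)
print(round(r, 2))
# The hand formula agrees with NumPy's built-in calculation
assert abs(r - np.corrcoef(pop, sales)[0, 1]) < 1e-12
```

As in the insight's example, a value near +1 says that counties with more people generate more retail sales.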


For each factor, use your knowledge of correlations and provide a statement of how it characterizes the typical date.net male chat-room user. Given your findings, what tactics do you recommend to Date.net to combat the male chat-room phobia problem?

The Hobbit's Choice Restaurant Survey: How to Obtain Pearson Product Moment Correlations with SPSS
With SPSS, it takes only a few clicks to compute correlation coefficients. Once again, we will use the Hobbit's Choice Restaurant survey case study because you are familiar with it. If you recall, we have determined, from previous analysis, that a waterfront location is generally preferred. Remembering this, you probably feel very confident about recommending this location to Jeff. But let's take a closer look using correlation analysis. Correlation analysis can be used to find out what people who prefer the waterfront view want; that is, high positive correlations would signal that they wanted those items as well, and high negative correlations would indicate that they did not want those items. Recall that there were several menu, decor, and atmosphere questions being mulled over by Jeff Dean. Correlation analysis is very powerful, as it can reveal to what extent people prefer (or do not prefer) these items as they prefer the waterfront view. We will only do a few of the items here, and you can do the rest in your SPSS integrated case analysis at the end of the chapter.

So, we need to perform a correlation analysis with the waterfront location preference variable and the other variables that will determine the Hobbit's Choice Restaurant's "personality." In SPSS, correlations are computed with the ANALYZE-CORRELATE-BIVARIATE clickstream sequence, which leads, as can be seen in Figure 18.9, to a selection menu. Use the Bivariate Correlations window to specify which variables are to be correlated. Note that we have selected the waterfront view item and several other items related to decor, atmosphere, and menu. Different types of correlations are optional, so we have selected Pearson's, which is the default, along with the two-tailed test of significance.

FIGURE 18.9
The SPSS Clickstream to Obtain Correlations

The output generated by this command is provided in Figure 18.10. Whenever SPSS computes correlations, its output is a symmetric correlation matrix composed of rows and columns that pertain to each of the variables. Each cell in the matrix contains three items: (1) the correlation coefficient, (2) the significance level, and (3) the sample size. As you can see in Figure 18.10, the computed correlations between "prefer waterfront location" and three of Jeff's questions (Prefer simple decor? Prefer unusual entrees? Prefer unusual desserts?) are +.780, -.782, and -.810, respectively. They all have a "Sig" value of .000, which translates into a .001 or less probability that the null hypothesis of zero correlation is supported. If you look at our correlation printout, you will also notice that a correlation of 1.000 is reported whenever a variable is correlated with itself. This reporting may seem strange, but it serves the purpose of reminding you that the correlation matrix generated with this procedure is symmetric. In other words, the correlations in the matrix above the diagonal 1s are identical to those correlations below the diagonal. With only a few variables, this fact is obvious; however, sometimes several variables are compared in a single run, and the 1s on the diagonal are handy reference points.

[Figure 18.10 reproduces the SPSS correlation matrix for Prefer Waterfront View, Prefer Simple Decor, Prefer Unusual Entrees, and Prefer Unusual Desserts; each cell reports the Pearson correlation, its two-tailed significance, and n = 400, and every correlation is flagged as significant at the 0.01 level (2-tailed).]
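Outside SPSS, the same correlation-plus-significance output can be sketched in Python with SciPy. This is an illustration on synthetic data, not the actual Hobbit's Choice survey file; the variable names are hypothetical stand-ins:

```python
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(7)
n = 400                          # same sample size as the SPSS output
waterfront = rng.normal(size=n)  # stand-in for "prefer waterfront view" scores
# A positively related item, built to correlate about +0.8 with waterfront
simple_decor = 0.8 * waterfront + rng.normal(scale=0.6, size=n)

# pearsonr returns the coefficient and its two-tailed significance level,
# the same two numbers SPSS prints in each cell of the matrix
r, p = pearsonr(waterfront, simple_decor)
print(f"r = {r:+.3f}, two-tailed Sig. = {p:.4f}, n = {n}")
```

With n = 400, even modest correlations produce very small p-values, which is why every "Sig" entry in Figure 18.10 reads .000.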

SPSS Student Assistant Online: Integrated Case: Working with Correlations

Since we now know that the correlations are statistically significant, or significantly different from zero, we can assess their strengths. They hover around .80, which, according to our rules of thumb on correlation size, indicates a moderately strong association. In other words, we have some relationships that are stable and fairly strong. Last, we can use the signs to interpret the associations. What is your interpretation?

FIGURE 18.10
SPSS Output for Correlations



Here is what we have found. People who prefer to eat at a restaurant with a waterfront view also prefer a simple decor. At the same time, they do not want unusual entrees or unusual desserts. Apparently, when folks go to a waterfront restaurant, they want to kick back, be comfortable, and not be bothered with choosing from a variety of curious dishes or an array of exotic desserts. They probably want seafood. An upscale Hobbit's Choice Restaurant with unusual entrees and desserts would definitely not fit the preferences of these people. So, now how do you feel about your previous recommendation to locate the Hobbit's Choice Restaurant on some expensive waterfront property?

TABLE 18.3 Special Considerations in Linear Correlation Procedures

- Use correlations only for metric variables (interval or ratio scaling).
- Correlation assumes that only the two variables involved are relevant; all other variables are considered to be constant, and only covariance between the two variables is being analyzed.
- Correlation does not indicate cause-and-effect.
- Correlation expresses only the linear relationship between two variables.
USING SPSS TO COMPUTE CORRELATIONS

You have just seen the correlation analysis findings for one set of preferences for a restaurant. Now let's take a specific aspect of the restaurant, namely, the fact that it could be within a 30-minute drive from patrons' homes. Use SPSS to determine the correlation of the preference for this location with preferences for string quartet music, jazz combo music, formal waitstaff wearing tuxedos, and unusual entrees on the menu. When you inspect the correlation matrix that results, what have you discovered about the combination of restaurant attributes that these patrons prefer for a restaurant that is within a 30-minute drive?

Special Considerations in Linear Correlation Procedures


We have prepared Table 18.3 to summarize and remind you of four considerations to keep in mind when working with correlations. We will discuss each of these in turn. To begin, the scaling assumptions underlying linear correlation should be apparent to you.

When dining on seafood, restaurant patrons often like to keep it simple.

It does not hurt to reiterate that the correlation coefficients discussed in this section assume that both variables share interval-scaling assumptions at minimum. If the two variables have nominal scaling assumptions, the researcher would use cross-tabulation analysis, and if the two variables have ordinal scaling assumptions, the researcher would opt to use a rank order correlation procedure. (We do not discuss rank order correlation in this chapter, as its use is relatively rare.)

Next, the correlation coefficient takes into consideration only the relationship between two variables. It does not take into consideration interactions with any other variables. In fact, it explicitly assumes that they do not have any bearing on the relationship between the two variables of interest. All other factors are considered to be constant, or "frozen," in their bearing on the two variables under analysis.

Second, the correlation coefficient explicitly does not assume a cause-and-effect relationship, which is a condition of one variable bringing about the other variable. Although you might be tempted to believe that more company salespeople cause more company sales or that an increase in the competitor's sales force in a territory takes away sales, correlation should not be interpreted to demonstrate such cause-and-effect relationships. Just think of all of the other factors that affect sales: price, product quality, service policies, population, advertising, and more. It would be a mistake to assume that just one factor causes sales. Instead, a correlation coefficient merely investigates the presence, strength, and direction of a linear relationship
Correlation does not demonstrate cause and effect.

between two variables.

Third, the Pearson product moment correlation expresses only linear relationships. Consequently, a correlation coefficient result of approximately zero does not necessarily mean that the scatter diagram that could be drawn from the two variables defines a formless ball of points. Instead, it means that the points do not fall in a well-defined elliptical pattern. Any number of alternative, curvilinear patterns, such as an S-shaped or a J-shaped pattern, are possible, and the linear correlation coefficient would not be able to communicate the existence of these patterns to the marketing researcher. Any one of several other systematic but nonlinear patterns is entirely possible and would not be indicated by a linear correlation statistic. Only those cases of linear or straight-line relationships between two variables are identified by the Pearson product moment correlation. In fact, when a researcher does not find a significant or strong correlation, but still believes some relationship exists between two variables, he or she may resort to running a scatter plot. This procedure allows the researcher to visually inspect the plotted points and possibly to spot a systematic nonlinear relationship. You already know that your SPSS program has a scatter plot option that will provide a scatter diagram that you can use to obtain a sense of the relationship, if any, between two variables.

Correlation will not detect nonlinear relationships between variables.
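A quick numerical illustration of this limitation (a sketch, not from the text): a perfect U-shaped relationship yields a linear correlation of essentially zero, even though y is completely determined by x.

```python
import numpy as np

x = np.linspace(-3, 3, 201)
y = x ** 2               # perfect curvilinear (U-shaped) relationship
r = np.corrcoef(x, y)[0, 1]
print(f"r = {r:.6f}")    # essentially zero; the linear statistic misses the pattern
```

This is exactly why the chapter recommends running a scatter plot when a relationship is suspected but the computed correlation is weak.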


Review Questions/Applications

CONCLUDING COMMENTS ON ASSOCIATIVE ANALYSES



The scaling assumptions of the data being analyzed are the key to understanding associative analysis. Sometimes a marketing researcher must use categorical measurements (nominal scale). As you know, nominal measurement provides the least amount of information about an object, whereas ratio measures provide the greatest amount of information. The amount of information in scales directly impacts the amount of information yielded by their appropriate associative test. So the chi-square statistic, which uses two nominal-scaled variables, cannot have as much information as the Pearson product moment correlation, which may be used on two interval- or ratio-scaled variables. Similarly, the underlying relationships reflect the differences in information: chi-square describes a nonmonotonic relationship, a Pearson product moment correlation describes a linear relationship, and a rank order correlation's relationship falls between these two with a monotonic relationship.

Finally, throughout our descriptions of various statistical tests, we have referred to the "null hypothesis." For example, with chi-square analysis, there is the null hypothesis of no association between the two nominal-scaled variables, and with correlation analysis there is the null hypothesis of no correlation. But marketing managers really want to find strong evidence of an association that exists and can be used to their advantage; that is, they really want to find support for the "alternative hypothesis" that an association does exist. So why do we always test the null hypothesis? Here is an example that answers this question.

The Tree-Free Company of Medford, Massachusetts, makes paper products from 100 percent recycled paper. As a strategy to entice Kleenex to buy its tissue boxes, Tree-Free might conduct a survey asking, "If you learned that a company used recycled paper boxes, would that fact influence your decision to purchase a particular brand of tissue?" and "Do you typically buy Kleenex, or do you buy some other brand of facial tissue?" Of course, what Tree-Free management would love to discover is that buyers of some brand other than Kleenex are sensitive to the recycled paper issue. Then, they could make a persuasive argument that Kleenex should use Tree-Free tissue boxes, advertise how it is helping the environment, and increase its market share over tissue brands that do not use recycled paper boxes. The "hypothesis of interest" is a strong association between a "yes" answer to the first question and a "some other brand" answer to the second question.
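To see how the two-step logic plays out in practice, here is a hedged sketch (the counts are hypothetical, not real survey data) of the chi-square test a researcher would run on the cross-tabulation of Tree-Free's two questions:

```python
import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical 2 x 2 cross-tabulation:
# rows = "recycled packaging would influence me" (yes / no)
# columns = brand usually bought (Kleenex / other brand)
observed = np.array([[40, 110],
                     [80,  70]])

# chi2_contingency computes expected frequencies under the null hypothesis
# of no association and the resulting chi-square statistic and p-value
chi2, p, dof, expected = chi2_contingency(observed)
print(f"chi-square = {chi2:.1f}, df = {dof}, p = {p:.4f}")
```

A small p rejects the null hypothesis of no association (step one); only then would the researcher inspect the cell percentages to see whether "other brand" buyers are the recycled-sensitive group (step two, the hypothesis of interest).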

This chapter dealt with instances in which a marketing researcher wants to see if there is a relationship between the responses to one question and the responses to another question in the same survey. Four different types of relationship are possible. First, there is a nonmonotonic relationship, in which the presence (or absence) of one variable is systematically associated with the presence (or absence) of another. Second, a monotonic relationship indicates the direction of one variable relative to the direction of the other variable. Third, a linear relationship is characterized by a straight-line appearance if the variables are plotted against one another on a graph. Fourth, a curvilinear relationship means the pattern has a definite curved shape. Associative analyses are used to assess these relationships statistically. Associations can be characterized by presence, direction, and strength, depending on the scaling assumptions of the questions being compared. With chi-square analysis, a cross-tabulation table is prepared for two nominal-scaled questions, and the chi-square statistic is computed to determine whether the observed frequencies (those found in the survey) differ significantly from what would be expected if there were no nonmonotonic relationship between the two. If the null hypothesis of no relationship is rejected, the researcher then looks at the cell percentages to identify the underlying pattern of association. A correlation coefficient is an index number, constrained to fall in the range of +1.0 to -1.0, that communicates both the strength and the direction of association between two variables. The sign indicates the direction of the relationship, and the absolute size indicates the strength of the association. Normally, correlations in excess of .8 are considered high.

With two questions that are interval and/or ratio in their scaling assumptions, the Pearson product moment correlation coefficient is appropriate as the means of determining the underlying linear relationship. A scatter diagram can be used to inspect the pattern.

When the null hypothesis is rejected, the researcher may find a managerially important relationship to share with the manager.

SPSS Student Assistant Online: Your Integrated Case; Genie in the Bottle: SPSS Statistics Coach

In truth, marketing managers and researchers typically have hypotheses of interest in mind. But statistical tests do not exist that can assess these hypotheses conveniently. Instead, the researcher must use the two-step process we have described. First, the existence of an association must be demonstrated. If there is no association, there is no sense in looking for evidence of the hypothesis of interest. However, when the null hypothesis is rejected, an association does exist in the population, and the researcher is then justified in looking at the direction of the association. When the second step takes place, the researcher is in pursuit of the hypothesis of interest. Now, the strength and direction aspects of the relationship are assessed to see if they correspond with the suspicions held by the marketing manager who wants to turn this association into managerial action. Just think of the millions of boxes used by the facial tissue industry for packaging. Now do you see why Tree-Free has such a strong interest in a hypothesis other than the null? To do competent work, a researcher must ferret out all of the hypotheses of interest during the problem definition stage.

Associative analyses (p. 522)
Relationship (p. 522)
Nonmonotonic relationship (p. 523)
Monotonic relationships (p. 523)
Linear relationship (p. 524)
Straight-line formula (p. 524)
Curvilinear relationship (p. 520)
Cross-tabulation table (p. 528)
Cross-tabulation cell (p. 528)
Frequencies table (p. 528)
Raw percentages table (p. 530)
Column percentages table (p. 530)
Row percentages table (p. 531)
Chi-square (χ²) analysis (p. 531)
Observed frequencies (p. 531)
Expected frequencies (p. 531)
Chi-square formula (p. 532)
Chi-square distribution (p. 534)
Correlation coefficient (p. 541)
Covariation (p. 541)
Scatter diagram (p. 542)
Pearson product moment correlation (p. 544)
Cause-and-effect relationship (p. 549)

1. Explain the distinction between a statistical relationship and a causal relationship.
2. Define and provide an example for each of the following types of relationship: (a) nonmonotonic, (b) monotonic, (c) linear, and (d) curvilinear.


Regression Analysis in Marketing Research

independent variables that yield a significant predictive model. The standard error of the estimate is used to compute a confidence interval range for a regression prediction. Seasoned researchers may opt to use stepwise multiple regression when faced with a large number of candidate independent variables such as several demographic, lifestyle, and buyer behavior characteristics. With stepwise multiple regression, independent variables are entered into the multiple regression equation so that it contains only statistically significant independent variables.

Multiple regression analysis: A powerful form of regression in which more than one x variable is in the regression equation.

Additivity: A statistical assumption that allows the use of more than one x variable in a multiple regression equation: y = a + b1x1 + b2x2 + ... + bmxm.

Independence assumption: A statistical requirement that when more than one x variable is used, no pair of x variables has a high correlation.

Multiple R: Also called the coefficient of determination, a number that ranges from 0 to 1.0 and indicates the strength of the overall linear relationship in a multiple regression; the higher, the better.

Multicollinearity: The term used to denote a violation of the independence assumption that causes regression results to be in error.

Variance inflation factor (VIF): A statistical value that identifies what x variable(s) contribute to multicollinearity and should be removed from the analysis to eliminate it. Any variable with a VIF of 10 or greater should be removed.

Trimming: Removing an x variable in multiple regression because it is not statistically significant, rerunning the regression, and repeating until all remaining x variables are significant.

Standardized beta coefficients: Slopes (b values) that are normalized so they can be compared directly to determine their relative importance in y's prediction.

Dummy independent variable: An x variable that has a 0,1 or similar coding, used sparingly when nominal variables must be in the independent variables set.

Stepwise multiple regression: A specialized multiple regression that is appropriate when there is a large number of x variables that need to be trimmed down to a small, significant set and the researcher wishes the statistical program to do this automatically.

Prediction (p. 560)
Extrapolation (p. 561)
Predictive model (p. 561)
Analysis of residuals (p. 562)
Bivariate regression analysis (p. 563)
Intercept (p. 563)
Slope (p. 563)
Dependent variable (p. 564)
Independent variable (p. 564)
Least squares criterion (p. 564)
Standard error of the estimate (p. 570)
Outlier (p. 572)
General conceptual model (p. 573)
Multiple regression analysis (p. 575)
Regression plane (p. 575)
Additivity (p. 576)
Coefficient of determination (p. 577)
Independence assumption (p. 577)
Multicollinearity (p. 577)
Variance inflation factor (VIF) (p. 577)
Dummy independent variable (p. 581)
Standardized beta coefficient (p. 582)
Screening device (p. 583)
Stepwise multiple regression (p. 585)

Predictive analyses are methods used to forecast the levels of a variable such as sales. Model building and extrapolation are two general options available to marketing researchers. In either case, it is important to assess the goodness of the prediction. This assessment is typically performed by comparing the predictions against the actual data with procedures called "residuals analyses." Market researchers use regression analysis to make predictions. The basis of this technique is an assumed straight-line relationship existing between the variables. With bivariate regression, one independent variable, x, is used to predict the dependent variable, y, using the straight-line formula of y = a + bx. A high R-square and a statistically significant slope indicate that the linear model is a good fit. With multiple regression, the underlying conceptual model specifies that several independent variables are to be used, and it is necessary to determine which ones are significant. By systematically eliminating the nonsignificant independent variables in an iterative manner, a process called "trimming," a researcher will ultimately derive a set of significant independent variables.
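The bivariate case described above can be sketched in a few lines of NumPy: fit y = a + bx by least squares, then compute R-square as a goodness-of-fit check (the function name `bivariate_fit` is illustrative, not from any package).

```python
import numpy as np

def bivariate_fit(x, y):
    """Least-squares fit of the straight line y = a + b*x, returning
    the intercept a, slope b, and R-square as a goodness-of-fit check."""
    b, a = np.polyfit(x, y, 1)   # polyfit returns [slope, intercept]
    resid = y - (a + b * x)
    r2 = 1 - (resid @ resid) / ((y - y.mean()) ** 2).sum()
    return a, b, r2
```

A high R-square (near 1.0) together with a slope that tests significant indicates the straight-line model fits the data well; a low R-square suggests the linear model is a poor predictor.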

1. Construct and explain a reasonably simple predictive model for each of the following cases:
a. What is the relationship between gasoline prices and distance traveled for family automobile touring vacations?
b. How do hurricane warnings relate to purchases of flashlight batteries in the expected landfall area?
c. What do florists do with regard to their inventory of flowers for the week prior to and the week following Mother's Day?
2. Indicate what the scatter diagram and probable regression line would look like for two variables that are correlated in each of the following ways (in each instance, assume a negative intercept): (a) -0.89, (b) +0.48, and (c) -0.10.
3. Circle K runs a contest, inviting customers to fill out a registration card. In exchange, they are eligible for a grand prize drawing of a trip to Alaska. The card asks for the customer's age, education, gender, estimated weekly purchases (in dollars) at that Circle K, and approximate distance the Circle K is from his or her home. Identify each of the following if a multiple regression analysis were to be performed: (a) independent variables, (b) dependent variable, and (c) dummy variable.
4. Explain what is meant by the independence assumption in multiple regression. How can you examine your data for independence, and what statistic is issued by most statistical analysis programs? How is this statistic interpreted? In other words, what would indicate the presence of multicollinearity, and what would you do to eliminate it?
