
Welcome to Powerpoint slides for

Chapter 8

Simple Tabulation and Cross Tabulation

Marketing Research Text and Cases by Rajendra Nargundkar

Slide 1

1. In a questionnaire-based marketing research project, each question usually represents a variable under study. The basic form of analysis of one variable in a questionnaire is Simple Tabulation of the answers. This could be a simple count of the frequencies (for example, how many people answered Yes and how many answered No), along with percentages.

2. Two different questions in a questionnaire may represent two variables, and if we count these two together, this is called a cross-tabulation. An example could be "10 people from Income Group 1 said they liked Brand A". Here, the two variables are INCOME GROUP and LIKING FOR BRANDS A TO E, measured separately in two different questions on the questionnaire.

3. Simple and cross tabulation are very useful forms of analysis for all nominally and ordinally scaled variables. For these two scales, calculations such as average (mean) and standard deviation are not permitted. Therefore, frequencies and percentages are used to analyse such variables. We will see further examples in this chapter of how this is done.

4. The case studies at the end of the chapter also illustrate the uses of cross tabulation together with a chi-squared test.

Slide 2

Dependent and Independent Variables


1. If two or more variables are analysed together, it may be necessary to spell out the relationship between them. The concept of dependent and independent variables is useful in spelling out this relationship. Two variables are said to be independent of each other if a change in one does not influence or cause a change in the other. But if a change in one variable causes a change in the other, the first one is called an independent variable, and the second one is called a dependent variable (dependent on the first).

2. A common example of a dependent variable in marketing is Sales. Annual sales of a brand usually depend on several factors or variables. One of the independent variables on which annual sales depend could be the quantum of advertising (in rupees) done for the brand. A second variable on which sales may depend could be the number of retailers stocking the brand.
3. In a consumer research questionnaire, the dependent variable could be satisfaction with the brand, which may depend on taste (if it is a food brand), and easy availability. Another example is the quantity of a product bought, a dependent variable, which depends on family size and household income.

Slide 3

Demographic Variables

1. Many demographic variables such as age, location, income, occupation, sex, education are generally independent variables for the purposes of most marketing studies. This is because other variables depend on them.

2. Attitude towards a brand, or the brand purchased, or intention to buy, are usually treated as dependent variables in many marketing studies. For a marketing researcher, these variables or similar ones are the real variables of interest, as they help in arriving at strategies for increasing sales or market share.

3. The other major types of independent variables are the elements of the four Ps of marketing. The marketing effort of a company can be measured in terms of its promotional efforts, price variations and distribution changes. It can also be gauged from new product launches, or repositioning or repackaging of existing brands.

4. Therefore, we could measure sales as the dependent variable with any of the marketing Ps as independent variables.

Slide 4

First Stage Analysis: Simple Tabulation

In a questionnaire-based survey, the first stage of analysis is called simple tabulation. This consists of every question being treated separately and tabulated. For every question, the number of responses in each category of answers is counted. Assuming the sample size is 500, and all 500 have answered the question, the simple tabulation of the respondents' gender may look like the following:

1. Male       300
2. Female     200
              ---
   Total      500
              ---

The simple tabulation for another question on the questionnaire may look like this:

1. Regular Users of Brand X       200
2. Occasional Users of Brand X    150
3. Non-users of Brand X           150
                                  ---
   Total                          500
                                  ---

A title can be included for each table, and at the top of each column, to explain the variable name through a label. For example, the above simple table can be titled "Frequency of Usage", or "Number of Users and Non-users of Brand X".

Slide 5

Computer Tabulation

If codes were used to input the data into the computer for tabulation, the numbers 1, 2 and 3 could have also been the numerical codes for the three categories of responses to the above question. The descriptions Regular Users of Brand X, Occasional Users of Brand X and Nonusers of Brand X are called Value Labels in most of the computer packages such as SPSS, and can be defined by the user. They will appear on the table whenever the table is printed as output.
The Variable Label is usually the title of a column of data in the package. In this case, the column could have been labeled with a Variable Label Usage of Brand X or some similar title.

Slide 6

Percentages

In addition to the number of respondents who fall into each category, we usually also compute the percentage of respondents in each category. This appears as one more column on the table, and is automatically printed out in most computer packages when you request a table to be printed. For example, the above table would look like the following, with percentages added:

Usage of Brand X                  Number     %
1. Regular Users of Brand X       200      ( 40)
2. Occasional Users of Brand X    150      ( 30)
3. Non-users of Brand X           150      ( 30)
                                  ---      -----
   Total                          500      (100)
                                  ---      -----

Please note that the percentage is based on the total number of respondents who answered this question.
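The counting and percentage steps described above can be sketched in a few lines of Python. This is only an illustration, not the output of SPSS or any other package: the list `responses` and the dictionary `value_labels` are hypothetical names, and the coded data are generated to match the counts given in the Slide 4 example (200, 150 and 150).

```python
from collections import Counter

# Hypothetical coded answers: 1 = regular user, 2 = occasional user, 3 = non-user.
# The counts (200, 150, 150) follow the Slide 4 example.
responses = [1] * 200 + [2] * 150 + [3] * 150

# Value labels attached to the numerical codes (assumed wording from the slides).
value_labels = {
    1: "Regular Users of Brand X",
    2: "Occasional Users of Brand X",
    3: "Non-users of Brand X",
}

counts = Counter(responses)
total = sum(counts.values())  # respondents who answered this question

# One row per category: (label, frequency, percentage of respondents)
table = [
    (value_labels[code], counts[code], 100 * counts[code] / total)
    for code in sorted(value_labels)
]

for label, n, pct in table:
    print(f"{label:<30}{n:>5}  ({pct:.0f})")
print(f"{'Total':<30}{total:>5}  (100)")
```

The percentage base is `total`, the number of respondents who answered this particular question, so a question answered by fewer people would use its own smaller base.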

Slide 6 contd.

If the number of respondents is different for some of the questions in a questionnaire, the percentage will be calculated with respect to the total number of respondents for the respective questions. For example, there may be a question for non-users only, after the above question has identified them. Since there are only 150 non-users of Brand X, the sample size of respondents for that question will be 150. Another question for users (both occasional and regular) may have 200+150=350 as the total number of respondents. So, the percentages will be calculated on different totals for these two subsequent questions.

Slide 7

Totals of Percentages

If the categories of answers to a question are such that multiple choices can be ticked by respondents, the percentages may not add up to 100. For example, the question may ask respondents which brand or brands of toothpaste they have used before, and the answer categories may be:

1. Colgate
2. Pepsodent
3. Close Up
4. Promise
5. Any Other (specify)

In such a case, people may tick more than one brand. Therefore, the percentages may add up to more than 100. For example, 30% of the respondents may choose Colgate, 40% may say Pepsodent, 50% may say Close Up, 10% may tick Promise, and 20% may pick other brands. The total percentage would then add up to 30+40+50+10+20, or 150. This total percentage is not meaningful if multiple options can be ticked by respondents. But the individual percentages for each brand do hold meaning.

These types of simple tables are also known as Frequency Tables. Many computer packages provide graphics capabilities to print out a variety of graphs and charts to represent the data in addition to the tables. One popular chart is the Bar Chart. Another is the Pie Chart, in the form of a circle with segments representing different categories of answers to a question. Please consult the help menu of your computer package to learn how to produce the various graphs.
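A small sketch may help show why multiple-response percentages can exceed 100. The ten respondents below are invented so that the brand shares match the figures in the example (30, 40, 50, 10 and 20 percent); the variable names are illustrative only. Each percentage is computed on the number of respondents, not on the number of ticks.

```python
from collections import Counter

# Hypothetical data: the set of brands ticked by each of ten respondents.
respondents = [
    {"Colgate", "Pepsodent"},
    {"Colgate", "Close Up"},
    {"Colgate"},
    {"Pepsodent", "Close Up"},
    {"Pepsodent"},
    {"Pepsodent", "Any Other"},
    {"Close Up"},
    {"Close Up", "Promise"},
    {"Close Up"},
    {"Any Other"},
]

# Count how many respondents ticked each brand.
ticks = Counter(brand for answer in respondents for brand in answer)

# Percentage base is the number of respondents, not the number of ticks.
base = len(respondents)
percentages = {brand: 100 * n / base for brand, n in ticks.items()}

total_percentage = sum(percentages.values())  # exceeds 100 with multiple ticks
```

Here there are 15 ticks from 10 respondents, which is why the percentages total 150 even though no single brand exceeds 50 percent.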

Slide 8

Simple Tabulation for Ranking Type Questions

Suppose we had ordinally scaled questions in our questionnaire. Then, we may have a complex answer to tabulate. For example, the question could have been:

Q. Rank the 5 brands of refrigerators shown below on a scale of 1 to 5 (1=Best and 5=Worst), according to your opinion.

BRAND         RANK
Whirlpool     ___
Kelvinator    ___
Godrej        ___
Samsung       ___
Videocon      ___

Slide 9

The tabulation of this question will end up with an output table that looks like this:

Table 1

BRAND         RANK1   RANK2   RANK3   RANK4   RANK5
Whirlpool     x       x       x       x       x
Kelvinator    x       x       x       x       x
Godrej        x       x       x       x       x
Samsung       x       x       x       x       x
Videocon      x       x       x       x       x

The x values in the table represent the number of respondents who gave a particular rank to each brand. This is actually a bivariate table, because Brand of Refrigerator and Rank are the two variables.
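As a rough sketch of how the x values in such a table could be counted, the Python fragment below tallies a brand-by-rank table from individual answers. The three respondents in `answers` are invented purely for illustration, and `rank_table` is a hypothetical name; a real survey would have one entry per respondent.

```python
brands = ["Whirlpool", "Kelvinator", "Godrej", "Samsung", "Videocon"]

# Hypothetical answers: each dict maps a brand to the rank one respondent gave it.
answers = [
    {"Whirlpool": 1, "Kelvinator": 2, "Godrej": 3, "Samsung": 4, "Videocon": 5},
    {"Whirlpool": 2, "Kelvinator": 1, "Godrej": 3, "Samsung": 5, "Videocon": 4},
    {"Whirlpool": 1, "Kelvinator": 3, "Godrej": 2, "Samsung": 4, "Videocon": 5},
]

# rank_table[brand][rank - 1] = number of respondents who gave that rank to the brand
rank_table = {brand: [0] * 5 for brand in brands}
for answer in answers:
    for brand, rank in answer.items():
        rank_table[brand][rank - 1] += 1

print(f"{'BRAND':<12}" + "".join(f"RANK{r:<4}" for r in range(1, 6)))
for brand in brands:
    print(f"{brand:<12}" + "".join(f"{n:<8}" for n in rank_table[brand]))
```

If every respondent ranks all five brands, each row and each column of the table sums to the number of respondents, which is a quick consistency check on the tabulation.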

Slide 10

If we want to construct univariate tables out of the above data, we can take up one column at a time from Table 1 and do separate frequency tables or charts. If we assume some numbers, one of the univariate tables may look as follows:

BRAND         No. of People who Ranked it No.1
Whirlpool      90
Kelvinator     60
Godrej         70
Samsung        32
Videocon       45
TOTAL         297

This is a univariate table, and if we wish to, we can calculate the percentage for each brand on the column total. For example, 90/297 works out to .303, so 30.3% of these respondents ranked Whirlpool as No.1. Similar calculations can be done for the other brands in the column. We can construct similar tables for Ranks 2, 3, 4 and 5 if we want to look separately at the frequencies of people who gave those ranks to the brands. But the overall picture is already available from Table 1.

Slide 11

Tabulating Ratings

Commonly used rating scales are of the following type:

Q. Rate the following attributes of LIRIL soap on a scale of 1 to 5 (1=Very Unsatisfactory, 2=Unsatisfactory, 3=Neither Satisfactory nor Unsatisfactory, 4=Satisfactory, 5=Very Satisfactory).

Lather        1   2   3   4   5
Fragrance     1   2   3   4   5

For each attribute, the number of people who rated it as 1, 2, 3, 4 or 5 can be tabulated in separate tables, one of which will look as follows:

RATING     1    2    3    4    5    TOTAL
Lather     30   25   50   76   22   203

Slide 12

Alternatively, we can tabulate ratings for all attributes in one table as follows:

RATING   LATHER   FRAGRANCE   ATR.3   ATR.4   ATR.5
1        x        x           x       x       x
2        x        x           x       x       x
3        x        x           x       x       x
4        x        x           x       x       x
5        x        x           x       x       x

This table enables us to look at both columns and rows simultaneously.

Slide 13

Second Stage Analysis: Cross Tabulation

After the simple frequency and percentage tabulation for every question on the questionnaire comes the second stage: the cross-tabulations. A cross-tabulation can be done by combining any two of the questions and tabulating the data together. This is a 2-variable cross tabulation. An example could be a cross-tabulation between Brand Preference for brands of tea and Region to which Respondent belongs. Assuming we have the data on these two variables from a study, the cross tabulation may look like this:

              Regionwise Buyers (No.)
BRAND         North   South   East   West   Total
BrookeBond    25      20      20     15     80
Lipton        10      15      20     5      50
Tata          15      15      10     30     70
Total         50      50      50     50     200

This is a cross-tabulation of two variables. An extension of this could be adding percentages.
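To make the counting step concrete, the sketch below rebuilds the tea cross-tabulation from respondent-level records. The records are generated from the cell counts on the slide, since the raw survey data are not given; `records`, `crosstab` and the other names are illustrative assumptions, not part of any package.

```python
from collections import Counter

regions = ["North", "South", "East", "West"]
brands = ["BrookeBond", "Lipton", "Tata"]

# Cell counts from the slide, used here only to generate one
# (brand, region) record per respondent.
cell_counts = {
    "BrookeBond": [25, 20, 20, 15],
    "Lipton": [10, 15, 20, 5],
    "Tata": [15, 15, 10, 30],
}
records = [
    (brand, region)
    for brand in brands
    for region, n in zip(regions, cell_counts[brand])
    for _ in range(n)
]

# The cross-tabulation is simply a count of each (brand, region) combination.
crosstab = Counter(records)

row_totals = {b: sum(crosstab[(b, r)] for r in regions) for b in brands}
col_totals = {r: sum(crosstab[(b, r)] for b in brands) for r in regions}
grand_total = sum(row_totals.values())
```

Counting pairs in this way is all a 2-variable cross-tabulation does; the row and column totals are the simple tabulations of the two questions taken one at a time.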

Slide 14

Calculating Percentages in a Cross Tabulation

For computing percentages in a cross-tab, however, there is a problem which needs to be addressed. There are three different ways in which percentages can be calculated. In the above example, we can compute percentages row-wise, column-wise or on the total sample of 200. The interpretation of percentages is different in each of the three cases. So which way is right?

The general rule for percentage calculation is to calculate it across the dependent variable. In the above example, we may assume that brand preference depends on the region to which respondents belong. In other words, Brand is the dependent variable, and Region is the independent variable. The rule says that percentages must be calculated across the Brand categories within each region, that is, column-wise. This is the more useful interpretation, because it reads: "Out of 50 respondents from the Northern Region, 50% buy Brooke Bond, 20% buy Lipton, and 30% buy Tata Tea."

Slide 15

All these percentages can be displayed in a table form separately, or in brackets along with the number of respondents. The table of percentages along with numbers will look like this:

              Regionwise Buyers - Numbers and Percentage
BRAND         North       South       East        West        Total
BrookeBond    25(50%)     20(40%)     20(40%)     15(30%)     80(40%)
Lipton        10(20%)     15(30%)     20(40%)     5(10%)      50(25%)
Tata          15(30%)     15(30%)     10(20%)     30(60%)     70(35%)
Total         50(100%)    50(100%)    50(100%)    50(100%)    200(100%)

Note: The format of the figures is No(%)

The above table can be interpreted according to the column (region) we are looking at. The first four columns represent findings for each region, and the fifth column (Total) represents the overall findings for all the regions combined. For example, from column 4, 30% of buyers in the West prefer Brooke Bond, 10% Lipton, and 60% prefer Tata tea. From column 5, out of the total 200 respondents, across all regions, 40% prefer Brooke Bond, 25% Lipton and 35% Tata tea.
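The column-wise rule can be sketched as follows, using the counts from the tea example. The row-wise version is included only to show how the interpretation changes; the names `col_pct` and `row_pct` are illustrative.

```python
regions = ["North", "South", "East", "West"]

# Observed counts from the tea example: brand -> buyers per region.
counts = {
    "BrookeBond": [25, 20, 20, 15],
    "Lipton": [10, 15, 20, 5],
    "Tata": [15, 15, 10, 30],
}

# Column totals: number of respondents in each region.
col_totals = [sum(row[j] for row in counts.values()) for j in range(len(regions))]

# Column-wise percentages: share of each brand within a region
# (Brand treated as dependent, Region as independent).
col_pct = {
    brand: [100 * n / t for n, t in zip(row, col_totals)]
    for brand, row in counts.items()
}

# Row-wise percentages, for contrast: share of each region within a brand's buyers.
row_pct = {brand: [100 * n / sum(row) for n in row] for brand, row in counts.items()}

for brand in counts:
    cells = "  ".join(f"{n}({p:.0f}%)" for n, p in zip(counts[brand], col_pct[brand]))
    print(f"{brand:<12}{cells}")
```

Column-wise, 50% of Northern respondents buy Brooke Bond; row-wise, the same cell says that 31.25% of Brooke Bond buyers come from the North. Both are arithmetically valid, but only the first answers the question of how brand preference varies by region.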

Slide 16

Cross Tabulation of More than 2 Variables


It is possible to have cross-tabulations of 3 or more variables in a table. But most people find it difficult to assimilate the information contained in 3-variable cross-tabulations. For most normal uses, a 2-variable cross-tabulation is quite adequate. A series of 2-variable cross-tabulations can be performed on the important variables in the questionnaire.

Caution: Do only those Cross-tabs which are necessary or useful

It is for the researcher to decide which variables need to be cross-tabulated. It is very easy to overdo the cross-tabulations, and too many of these may end up confusing the researcher or his client. It is a good idea to do only those cross-tabs which are likely to help in the analysis and to draw useful conclusions.

Slide 17

Lack of Causal Inference in Cross Tabulations

It must be mentioned here that any two variables can be cross-tabulated. Even if the cross-tabulation shows a significant association between the two variables, it does not necessarily mean that one of them (the independent) causes the other (the dependent). Causality or direct effect is more of an assumption made by the researcher based on his expectation or experience. The mere existence of a statistically significant association does not necessarily imply a cause-and-effect relationship between the (presumed) independent and the (presumed) dependent variable.

The Chi-squared Test for Cross Tabulations

In the case of cross-tabulations featuring two variables, a test of significance called the chi-squared test can be used to test whether the two variables are significantly associated with each other. The user who is analysing the data on the computer with a statistical package can request a chi-squared test along with any cross-tabulation. Commands such as CROSSTABS or CROSSTABULATION on most statistical packages have the option of doing a chi-squared test. In the manual technique, a chi-squared statistic had to be calculated from the numbers in the cross-tabulation. This had to be compared with the chi-squared value from the chi-squared tables for the given degrees of freedom and a given confidence level. But in the computer user's case, none of these manual steps are needed. An illustration will explain how to do and interpret the chi-squared test on cross-tabs using a computer.

Slide 18

Chi-squared Test: An Illustration

Let us assume that we have conducted a consumer survey for a brand of detergent. One of the questions dealt with income category of the respondent. Another asked the respondent to rate his purchase intention. These two variables are listed in Table 1.

Slide 18 contd...
S. No.   INCOME            INCOME CODE   INTENT       INTENT CODE
1        Less Than 5000    1             NONE         1
2        Less Than 5000    1             LOW          2
3        Less Than 5000    1             LOW          2
4        Less Than 5000    1             NONE         1
5        Less Than 5000    1             HIGH         3
6        5001-10000        2             LOW          2
7        5001-10000        2             HIGH         3
8        5001-10000        2             VERY HIGH    4
9        5001-10000        2             HIGH         3
10       5001-10000        2             LOW          2
11       10001-20000       3             HIGH         3
12       10001-20000       3             VERY HIGH    4
13       10001-20000       3             CERTAIN      5
14       10001-20000       3             HIGH         3
15       10001-20000       3             VERY HIGH    4
16       Above 20000       4             HIGH         3
17       Above 20000       4             CERTAIN      5
18       Above 20000       4             VERY HIGH    4
19       Above 20000       4             CERTAIN      5
20       Above 20000       4             CERTAIN      5

Slide 19

Both variables are coded. Income codes and their equivalent incomes are:

Code   Income in Rs. per Month
1      Less than 5000
2      5001 to 10,000
3      10,001 to 20,000
4      Above 20,000

Purchase Intention codes are as follows:

Code   Explanation (Value Labels for the Variable)
1      None: No intention to buy
2      Low: Low intention to buy
3      High: High intention
4      Very High: Very high intention
5      Certain: Certain to buy

These two variables were cross-tabulated from a sample of 20 respondents for the sake of this illustration. A cross-tabulation with a chi-squared test was requested from the computer package. The output is shown in Table 2.

Slide 20

Table 2: INCOME Per Month by PURCHASE INTENTION

                      INCOME PER MONTH in RS.
PURCHASE              Less than   5000-    10000-   Above
INTENT       CODE     5000        10000    20000    20000    TOTAL
NONE         1        2           0        0        0        2
LOW          2        2           2        0        0        4
HIGH         3        1           2        2        1        6
V. HIGH      4        0           1        2        1        4
CERTAIN      5        0           0        1        3        4
TOTAL                 5           5        5        5        20

Chi-Square    Value       DF    Significance
----------    --------    --    ------------
Pearson       18.66667    12    .09690

The cross-tabulation shows the number of respondents falling into each cell (a cell is the combination of one INCOME category with one PURCHASE INTENTION category). For example, 2 respondents with income less than Rs. 5,000 per month said they had no intention of buying this detergent brand.

Slide 21

Is there a Significant Association Between Respondent Income and Purchase Intention?

The chi-squared test basically answers the above question. At the lower part of Table 2, we have the results of the chi-squared test. The first line of the chi-squared test (Pearson) reports a significance level of 0.09690. Because this is below 0.10, the test shows a significant association between these two variables at a 90 percent confidence level (equivalent to a significance level of (100-90)/100, or 0.10). It would not be significant at a 95 percent confidence level, since 0.09690 is above 0.05. Thus, we conclude that at the 90 percent confidence level, PURCHASE INTENTION and INCOME are associated significantly with each other. This may lead us to conclude that the price of the detergent is important in its purchase.

As we said earlier, it is possible to do a cross-tabulation (and a chi-squared test) for any two nominal variables in the survey. But it is a good idea to use the cross-tabulation only for those variables where the association makes some sense theoretically.
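For readers who want to see what the package is doing, the manual calculation can be sketched in plain Python from the cell counts of Table 2. This is a hand computation under stated assumptions, not the routine any particular package uses: the helper `chi2_sf_even_df` is a hypothetical name, and it relies on a closed-form expression for the upper-tail probability that is valid only when the degrees of freedom are even (as they are here, with 12).

```python
import math

# Observed counts from Table 2: rows = purchase intention (codes 1-5),
# columns = income group (codes 1-4).
observed = [
    [2, 0, 0, 0],  # None
    [2, 2, 0, 0],  # Low
    [1, 2, 2, 1],  # High
    [0, 1, 2, 1],  # Very high
    [0, 0, 1, 3],  # Certain
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Pearson chi-squared: sum over all cells of (observed - expected)^2 / expected,
# where expected = row total * column total / n if there is no association.
chi2 = 0.0
for i, row in enumerate(observed):
    for j, obs in enumerate(row):
        expected = row_totals[i] * col_totals[j] / n
        chi2 += (obs - expected) ** 2 / expected

# Degrees of freedom = (rows - 1) * (columns - 1)
df = (len(observed) - 1) * (len(observed[0]) - 1)


def chi2_sf_even_df(x, df):
    """Upper-tail probability of a chi-squared variable; valid only for even df."""
    if df % 2 != 0:
        raise ValueError("this closed form requires an even number of degrees of freedom")
    half = x / 2
    return math.exp(-half) * sum(half ** k / math.factorial(k) for k in range(df // 2))


p_value = chi2_sf_even_df(chi2, df)
print(f"chi2 = {chi2:.5f}, df = {df}, p = {p_value:.5f}")
```

Run on the Table 2 counts, this gives the same Pearson value (18.66667), degrees of freedom (12) and significance (about .0969) as the package output shown on Slide 20.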

Slide 22

Measures of the Strength of Association Between Variables

In our discussion of the chi-squared test so far, we have only looked at statistical significance, by looking at the p-value (probability value) reported on the computer output. This does not tell us the strength of the association between the two variables in the crosstab. If we want a measure of the strength, we have to request the package to give us one of the following (these measures are called the indexes of agreement):

1. Contingency Coefficient C
2. Cramer's V
3. The Phi Correlation Coefficient
4. Goodman and Kruskal's Lambda Asymmetric Coefficient

We will briefly discuss each of these indexes of agreement.

Slide 23

1. The Contingency Coefficient lies between 0 and 1, and can be used for any crosstab with any number of rows (R) and any number of columns (C), provided R and C are equal (symmetric crosstab). However, it cannot attain the maximum value of 1. The maximum value of the Contingency Coefficient depends on the number of rows and columns in the crosstab. For instance, it can be a maximum of .707 in a 2x2 table, and a maximum of .87 in a 4x4 table.

2. Cramer's V is a variation of the Phi Correlation Coefficient, but it is not restricted to 2x2 tables. It can have a maximum value of 1.

3. Phi Correlation Coefficient is used mainly for 2x2 contingency tables (crosstabs) because otherwise its value can go beyond the 0-1 range, which becomes difficult to interpret.

Slide 23 contd...

4. Lambda Asymmetric Coefficient measures the error reduction in predicting the value (category) of one variable (say, the row variable), if we know the category (or value) of the other (say, column) variable. Thus, if Lambda (for the Row Variable, given the Column Variable) is 0.43, the reduction in error in predicting the row variable value, given the column variable value, is 0.43, or 43 percent. Similarly, we could compute Lambda Asymmetric for the Column Variable, given knowledge of the Row Variable. Lambda Symmetric could also be computed as a weighted average of the above two Lambda Asymmetric values (for the row and the column variables).

5. All these indexes of agreement can be requested on SPSS or other computer packages. Generally one or two of them are sufficient to find out if the association between the row and column variable in the crosstab is weak (close to 0) or strong (close to 1).
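As an illustration of how these indexes relate to the chi-squared value, the sketch below applies the commonly used textbook formulas to the detergent example (chi-squared of 18.66667 on 20 respondents in a 5x4 table). The formulas are stated here as assumptions, since the slides do not give them: C = sqrt(chi2 / (chi2 + n)), V = sqrt(chi2 / (n * min(R-1, C-1))), Phi = sqrt(chi2 / n), and Lambda = (sum of column maxima - largest row total) / (n - largest row total). All function and variable names are illustrative.

```python
import math

# Table 2 counts: rows = purchase intention, columns = income group.
observed = [
    [2, 0, 0, 0],
    [2, 2, 0, 0],
    [1, 2, 2, 1],
    [0, 1, 2, 1],
    [0, 0, 1, 3],
]
chi2 = 56 / 3  # Pearson chi-squared for Table 2 (18.66667)
n = 20
n_rows, n_cols = len(observed), len(observed[0])

# Contingency Coefficient C (assumed formula: sqrt(chi2 / (chi2 + n)))
contingency_c = math.sqrt(chi2 / (chi2 + n))

# Cramer's V (assumed formula: sqrt(chi2 / (n * min(R - 1, C - 1))))
cramers_v = math.sqrt(chi2 / (n * (min(n_rows, n_cols) - 1)))

# Phi (sqrt(chi2 / n)); shown only for comparison, since Table 2 is not 2x2
phi = math.sqrt(chi2 / n)


def max_contingency_c(k):
    """Largest value C can take in a k x k table."""
    return math.sqrt((k - 1) / k)


def lambda_rows_given_cols(table):
    """Lambda Asymmetric for predicting the row category from the column category."""
    row_totals = [sum(row) for row in table]
    total = sum(row_totals)
    best_without = max(row_totals)                    # always guess the modal row
    best_with = sum(max(col) for col in zip(*table))  # guess the modal row within each column
    return (best_with - best_without) / (total - best_without)


lam = lambda_rows_given_cols(observed)
print(f"C = {contingency_c:.3f}, V = {cramers_v:.3f}, phi = {phi:.3f}, lambda = {lam:.3f}")
```

Under these formulas the maximum of C works out to about .707 for a 2x2 table and .87 for a 4x4 table, in line with the figures quoted above, and Lambda for Table 2 comes to 3/14: knowing the income group cuts the errors in predicting purchase intention by roughly 21 percent.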
