
Marketing Research


[Figure: the role of marketing research]
Controllable marketing variables: product, price, promotion
Uncontrollable environmental factors: economy, technology, competition, regulations, political factors, social & cultural factors
Marketing research: assessing information needs, providing information, marketing decision making
Marketing managers: market segmentation, target market
• Research is a process (a series of iterative steps), often followed when
management faces a “problem” and/or an “opportunity” and needs further
information in order to make a decision. Whether market(ing) research is
needed is itself an issue that is likely to need addressing.

The question is

“when to conduct market(ing) research?”

When to Conduct Market(ing) Research
1. Time constraints: Is sufficient time available? (Yes: continue. No: stop.)
2. Availability of data: Is the information on hand inadequate? (Yes: continue. No: stop.)
3. Nature of decision: Is the decision of considerable importance? (Yes: continue. No: stop.)
4. Benefits vs. costs: Does the value of the research exceed the cost? (Yes: conduct market research. No: stop.)

A "No" at any step: do not conduct market research!
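The four screening questions above can be read as a simple decision chain. The following is a minimal sketch, not part of the original slides; the class and function names are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class ResearchSituation:
    """Answers to the four screening questions (all names are illustrative)."""
    sufficient_time: bool          # Time constraints
    information_inadequate: bool   # Availability of data
    decision_important: bool       # Nature of decision
    value_exceeds_cost: bool       # Benefits vs. costs


def should_conduct_research(situation):
    """Return (decision, failed_check); failed_check is None when all checks pass."""
    checks = [
        ("Time constraints", situation.sufficient_time),
        ("Availability of data", situation.information_inadequate),
        ("Nature of decision", situation.decision_important),
        ("Benefits vs. costs", situation.value_exceeds_cost),
    ]
    for name, passed in checks:
        if not passed:
            # A single "No" is enough to stop.
            return False, name
    return True, None
```

For example, a situation where the information already on hand is adequate stops at the second check, whatever the answers to the other questions.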

Example issues: (1) What is our market share?

(2) Will people drink tomato soup from a plastic jar?
(3) Whose machine tools do our potential customers buy?
(4) Which medicine is preferred for a given disease?
When Research Should be
•If it clarifies problems or investigates
changes in the marketplace that can
directly impact your product
•If it resolves your selection of
alternative courses of marketing action
to achieve key marketing objectives
•If it helps you gain a meaningful
competitive advantage
•If it allows you to stay abreast of
Questions addressing the various
stages of the Research Process
Stage in the Process: Typical Questions
1. Formulate problem: What is the purpose of the study - solve a problem? Identify an opportunity? Is background info necessary? What info is needed to make the decision? How will the info be utilized? Should research be
2. Determine research design (exploratory / conclusive: descriptive and causal): How much is already known? Can a hypothesis be formulated? What types of questions need to be answered? What type of study best addresses the research questions?
3. Determine data collection: Can existing data be used to
4. Design data collection forms: Should structured or unstructured items be used in collecting data? Should the purpose of the study be made known to respondents? Should a rating scale be used? What type of rating scale would be most
5. Design sample & collect data: Who is the target population? Is a list of elements available? Is a sample necessary? Is a sample desirable? How large should the sample be? What operational procedures will be followed? What methods will be used to ensure quality of data
The research process
Is a set of iterative steps and
The Concept of Total Error
All research has error and this impacts on the research outcome – its
usability and accuracy

Sources of error:
Poorly written research report
Poor logic
Poor problem definition / formulation
Improper use of procedures
Poor data collection
Inadequate sample size
Inadequate sample design
Problem definition

Management problem
definition process

Research problem definition process

Research question or research problem
Please note that the research question is sometimes called the
“research problem”, and that research questions are objectives that fit
underneath the research problem.
Problem Definition
• Management problem:
– Focuses on the decision that management has to make and is action oriented (i.e. once the information is obtained a course of action will be required). The management problem may
– Symptoms of failure to achieve an objective. Must select course of action to regain it.
– Symptoms of likelihood of achieving objective. Must decide how to seize opportunity (opportunity
Formulate Formulate
Management Problem Research Problem
• The research problem: How to provide relevant, accurate, and unbiased
information that managers can use to solve their marketing management problems.
• The research problem is information oriented, and researchers need to do some
investigation (e.g., ask questions, read information) before defining the
research problem. Researchers should ask themselves: is the issue that
management is seeking answers to merely a symptom of X?
– Remember the iceberg principle
• The symptoms are what we can see (e.g. falling sales)
• The issues (causes) are generally what we can't see, and generally the issue
(below the surface) is what needs addressing and therefore forms the research problem.
Examples of Management Problems and Research Problems
Management problem: Develop package for new product.
Research problem: Evaluate effectiveness of alternative package designs.

Management problem: Increase store traffic.
Research problem: Measure current image of the store.
Ok, so we have a problem,
how do we write the problem
So you think you have a
problem – how do you
write it????
Management Problem (decision / action oriented) vs. Research Problem (information oriented)

Management problem: Should a new product be introduced?
Research problem: To determine consumer preferences and purchase intentions for the proposed new product

Management problem: Should the advertising campaign be changed?
Research problem: To determine the effectiveness of the current advertising campaign

Management problem: Should the price of the brand be increased?
Research problem: To determine the price elasticity of demand and the impact on sales and profits of various levels of price changes
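To make the last research problem concrete, here is a hedged sketch of one common way to estimate price elasticity of demand from two observed price/quantity points, the midpoint (arc) formula. The slides do not say which estimator is intended, so the formula choice, the function names, and the numbers are all illustrative assumptions.

```python
def arc_price_elasticity(p1, q1, p2, q2):
    """Midpoint (arc) elasticity: % change in quantity / % change in price."""
    pct_change_q = (q2 - q1) / ((q1 + q2) / 2)
    pct_change_p = (p2 - p1) / ((p1 + p2) / 2)
    return pct_change_q / pct_change_p


def revenue(price, quantity):
    """Sales revenue at a given price level."""
    return price * quantity


# Hypothetical test-market data: price raised from 10 to 12, units sold fall from 1000 to 800.
elasticity = arc_price_elasticity(10, 1000, 12, 800)
revenue_change = revenue(12, 800) - revenue(10, 1000)
```

With these made-up figures the elasticity is below -1 (demand is elastic), so the price increase lowers revenue. That is the kind of information the manager needs before deciding on the price change.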

To help you develop and write the research problem and research objectives,
you should consult other sources of information: ask questions, and search
industry info and academic journals (theory). This is an iterative and
The problem definition process
How much is this information worth?????? Estimate the value of
Marketing Research

Problem identification research:
Market Potential Research
Market Share Research
Image Research
Market Characteristics Research
Sales Analysis Research
Forecasting Research
Business Trends Research

Problem solving research:
Segmenting Research
Product Research
Pricing Research
Promotion Research
Distribution Research
Problem solving research
Segmenting Research: basis of segmentation, find out response of segments, selection of target segment
Product Research: test, design, packaging, modification, positioning and repositioning
Pricing Research: price policy, line policy, price elasticity, customer response
Promotion Research: promotion budget, relationship with other tools, media decision, testing,
2nd Session
Marketing Research Defined
“Marketing research is the function
which links consumers and the
consumer to the organization
through information: information
used to identify and define marketing
problems; generate, refine, and
evaluate marketing actions; monitor
marketing performance; and improve
our understanding of marketing as a
The role of marketing research within the
marketing system




a) specifying
b) collecting
c) analyzing
d) interpreting

a) planning
b) problem-solving
c) control

Applied/Problem solving research

Often based on cost-benefit analysis
Vital for implementation of marketing
Value of information declines with time
Dynamic (ongoing)
Shift from production to customer-
Declining cost of unit information
(digital age)
Increase intensity of competition
Technology and commercialization
Factors shaping the Marketing Research
[Figure: factors shaping the nature and future of marketing research]
Competitor intelligence
Customer analytics
Low cost survey providers
Surveys to generate sales & PR
Internet, e.g. online
‘Value for money’
‘Strategic’
‘Respondent’ rewards
Reasons for Doing Marketing
Research: The Five Cs
• Customers: To determine how well customer needs are being met, investigate new target markets, and assess and test new services and facilities.
• Competition: To identify primary competitors and pinpoint their strengths and
• Confidence: To reduce the perceived risk in making marketing decisions.
• Credibility: To increase the believability of promotional messages among
• Change: To keep updated with changes in
Reasons for Not Doing
Marketing Research
• Timing: It will take too much time.
• Cost: The cost of the research is too high.
• Reliability: There is no reliable research method available for doing the research.
• Competitive intelligence: There is a fear competitors will learn about the organization’s
Five Key Requirements of
Marketing Research
• Utility: Can we use it? Does it apply to
• Timeliness: Will it be available in
• Cost-effectiveness: Do the benefits outweigh the costs?
• Accuracy: Is it accurate?
Classification of marketing research
Examples of problem-solving research
Problem Definition Process

Environmental context of the problem

Tasks involved in problem definition:
Discussion with decision makers
Interviews with experts
Secondary data analysis
Qualitative research

Management decision problem

Marketing research problem

Factors to Consider -
Environmental Context
•Past information and forecasts
•Resources and constraints
•Objectives (organizational &
decision maker)
•Buyer behavior
•Legal environment
•Economic environment
•Marketing and technological skills
Defining the Research
Allow the researcher to obtain all the information needed to address the management decision problem
Guide the researcher in formulating the research design
A broad definition does not provide clear guidelines for the subsequent steps involved in the project, e.g. “Developing a marketing strategy for the brand”

Define Research Design

A framework or blueprint for conducting the marketing research
Details the procedures necessary for obtaining the information needed to structure or solve marketing research problems
A Classification of Marketing Research Designs

Research Design
  Exploratory Research Design
  Conclusive Research Design
    Descriptive Research
      Cross-Sectional Design
      Longitudinal Design
    Causal Research

Differences Between Exploratory and Conclusive Research

Objective
Exploratory: To provide insights, understandings.
Conclusive: Test hypothesis/examine relationships.

Characteristics
Exploratory: Information needed is defined loosely. Research process is flexible and unstructured. Sample is small and non-representative. Analysis of primary data is qualitative.
Conclusive: Information needed is clearly defined. Research process is formal and structured. Sample is large and representative. Data analysis is quantitative.

Findings
Exploratory: Tentative.
Conclusive: Conclusive.

Outcome
Exploratory: Followed by conclusive research.
Conclusive: Findings input into decision making.
Exploratory Research:
Characteristics : Overview
flexible, versatile, but not conclusive
Useful for :
discovery of ideas and insights,
Formulating problems more precisely,
Identifying alternative courses of action,
Establishing priorities for further research
Methods Used :
case studies
secondary data
focus groups
qualitative research
When done?
Generally initial research conducted to clarify and define the
nature of a problem
Does not provide conclusive evidence :
Subsequent research expected
Descriptive Research:
Characteristics : Overview
Describes characteristics of a population or phenomenon
Some understanding of the nature of the problem
preplanned, structured, conclusive
Useful for :
describing market characteristics or functions
Methods Used :
Surveys (primary data)
scanner data (secondary data)
When Used:
Often a follow-up to exploratory research
Examples include:
Market segmentation studies, i.e., describe characteristics of
various groups
Determining perceptions of product characteristics
Price and promotion elasticity studies
Examples of Descriptive Studies
• Market studies that describe the size of the market, buying power of the consumers, availability of distributors, and consumer profiles
• Market share studies that determine the proportion of total sales received by a company and its competitors
• Sales analysis studies that describe sales by geographic region, product line, type of account, and size of account
• Image studies that determine consumer perceptions of the firm and its products
• Product usage studies that describe consumption patterns
• Distribution studies that determine traffic flow patterns and the number and location of distributors
• Pricing studies that describe the range and frequency of price changes and probable response to proposed price changes
• Advertising studies that describe media consumption habits and audience profiles for specific television programs and magazines
A Comparison of Basic Research Designs

Exploratory
Objective: Discovery of ideas
Characteristics: Flexible, versatile. Front end of the research design
Methods: Secondary data

Descriptive
Objective: Describes market characteristics
Characteristics: Prior formulation of hypothesis. Planned, structured design
Methods: Surveys

Causal
Objective: Determine cause and effect
Characteristics: Manipulate independent variables. Control of other variables
Classification of Marketing Research Data

Research Data
  Secondary Data
  Primary Data
    Qualitative Data
    Quantitative Data
      Descriptive: Survey Data; Observational & Other Data
      Causal: Experimental Data
Relationship among Exploratory,
Descriptive and causal Research
3rd Session
Sampling Design
[Figure: sampling design within the research process. Recoverable labels: problem definition, research design (descriptive), data collection, recommendations, information systems; non-probability vs. probability sampling]
A population is the aggregate of all the
elements that share some common set of
characteristics, and that comprise the
universe for the purpose of the marketing
research problem.

The population parameters are typically

numbers, such as the proportion of
consumers who are loyal to a particular
brand of toothpaste.
Sample or Census
A census involves a complete enumeration
of the elements of a population. The
population parameters can be calculated
directly in a straightforward way after the
census is enumerated (specify

A sample is a subgroup of the population

selected for participation in the study.
Sample characteristics, called statistics,
are then used to make inferences about
the population parameters. The inferences
that link sample characteristics and
Sample Versus Census
Condition favoring the
use of
Budget Small
Time Available Short
Population Small
Variance in Characteristics Small
Sampling is the process of selecting a sufficient number of elements from the
population so that, by studying the sample and understanding the properties or
characteristics of the sample subjects, it would be possible to generalise the
properties or characteristics to the population.

The more representative the sample is of the population, the more generalisable
are the findings of the research.
Sampling design – key
Population – entire group of people, events or things of interest that the researcher wishes to investigate
Population element – single member of the population
Sampling frame – list of all elements in the population from which the sample is drawn
Sample – subset of the population selected for the specific research study (size n)
Sample unit (subject) – single element selected in the sample; could be a group (could be a two stage
Census – an investigation of all individual elements that make up the population
Why sample?

population may be difficult to
greater depth of information
Managerial objectives of

efficient as time permits
Errors associated with
Sampling frame error – an error that occurs when certain sample elements are not listed or are not accurately represented in a sampling frame (occurs between the population and sampling frame)
Random sampling error – occurs between the sampling frame and the planned sample for study
Non-response error – the statistical difference between a survey that includes only those who responded and a perfect survey that would also include those who failed to respond (occurs between the planned sample and the respondents)
Sampling design process
Step 1: Define Population
Entire group under study as defined by research objectives

Step 2: Establish Sampling Frame

list of sampling units from which a sample will be drawn;
the list could consist of geographic areas, institutions,
individuals or other units

Step 3: Choose sampling technique/method

method of selecting the sampling units
Probability (random) vs. non probability (non-random)

Step 4: Determine sample size

if non-probability sampling method –involves some
judgement based on time, cost, analysis required
if probability sampling – based on statistical determination
of sample size

Step 5: Identify and select sample unit (subject)

follow procedures based on sampling technique selected
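Step 4 mentions a "statistical determination of sample size" without giving a formula. As a hedged illustration, one widely used textbook formula for estimating a proportion is n = z^2 * p * (1 - p) / e^2, optionally with a finite population correction. The slides do not specify which formula the course uses, so the function below and its parameter names are assumptions.

```python
import math
from statistics import NormalDist


def sample_size_for_proportion(margin_of_error, confidence=0.95, p=0.5,
                               population_size=None):
    """Sample size needed to estimate a proportion under simple random sampling.

    p=0.5 is the most conservative assumption (it maximises p * (1 - p)).
    """
    # Two-sided critical value of the standard normal distribution.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    n = z ** 2 * p * (1 - p) / margin_of_error ** 2
    if population_size is not None:
        # Finite population correction for small populations.
        n = n / (1 + (n - 1) / population_size)
    return math.ceil(n)


n_large_population = sample_size_for_proportion(0.05)  # +/- 5 points at 95% confidence
n_small_population = sample_size_for_proportion(0.05, population_size=2000)
```

For a non-probability sample no such calculation applies; as the step above notes, the size is then a judgement based on time, cost, and the analysis required.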
Classification of Sampling
Sampling Techniques
  Nonprobability Sampling Techniques
    Convenience Sampling
    Judgmental Sampling
    Quota Sampling
    Snowball Sampling
  Probability Sampling Techniques
    Simple Random Sampling
    Systematic Sampling
    Stratified Sampling
    Cluster Sampling
    Other Sampling Techniques
Non Probability Sampling
each sampling unit of the population being studied does not have an equal chance of being included in the study (due to the way the sample is selected)
non-random (selection process is subjective)
researchers rely heavily on personal judgement
projecting the findings beyond the sample is statistically inappropriate
use non-probability sampling when generalisability is less of a concern and other factors are more important (time, preliminary information)
Non Probability Sampling

Common sampling

Convenience Sample
Also known as haphazard or accidental sampling
based on convenient availability of sampling units
sample units happen to be in a certain place at certain
time – high traffic locations – shopping malls;
pedestrian areas

Acceptable only in pre - test/exploration phase when

further research will use probability sampling

Representativeness highly uncertain

Quota sampling can reduce some of the sample

selection error
Judgement Sampling

An experienced individual (could be the researcher) selects the sample based on personal judgement about some appropriate characteristics suited to the study

Focus group studies use this

Quota Samples

Various subgroups in a population are represented based on pertinent
Haphazard selection of respondents may introduce bias
Similar to stratified random sampling

Snowball Sampling
Judgement sample that relies on the researcher's ability to locate an initial set of respondents with the desired characteristics; these individuals are then used as informants to identify others with the desired characteristics

Acceptable when sample units are difficult to locate

Advantages: reduced sample size and costs

Probability Sampling
In a probability sample each element in the population has some known chance or probability of being included in the sample
Used when the representativeness of the sample is important for generalisability of results
Random selection of the sample, thus eliminating selection bias
Probability Sampling cont.

statistical efficiency
same sample size and smaller
standard error of the mean is

economic efficiency
precision refers to the level of
uncertainty about the characteristics
being measured
precision is inversely related to
sampling error
precision is positively related to cost
Types of probability
Simple random sample
Systematic sampling
Stratified sampling
Cluster sampling
Area sampling
Simple Random Sampling
Assures each element in the population of an equal chance of being included in the sample
Blind draw - putting all names in a hat and drawing out a sample of 100 (size has been statistically calculated)
Random numbers
Need to begin with a complete list of the population – sometimes difficult to obtain
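The "blind draw" can be sketched in a few lines using Python's standard library. This is an illustrative sketch only; the sampling frame of customer IDs and the function name are hypothetical.

```python
import random


def simple_random_sample(frame, n, seed=None):
    """Draw n elements without replacement; every element has an equal chance."""
    rng = random.Random(seed)  # a seed makes the draw reproducible for auditing
    return rng.sample(frame, n)


# Hypothetical sampling frame: a complete list of 5,000 customer IDs.
frame = [f"customer-{i:04d}" for i in range(5000)]
sample = simple_random_sample(frame, 100, seed=42)
```

The sketch makes the practical requirement above visible: `frame` must be a complete list of the population before any draw can happen.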
Systematic Sampling
A starting point is selected by a random
process and then every nth number on
the list is selected
Calculate skip interval = population list
size/ sample size (size has been statistically
Danger of periodicity – if list has a
systematic pattern
Can be more representative than a
simple random sample
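A minimal sketch of the procedure described above: compute the skip interval, pick a random start within the first interval, then take every k-th element. The function name and the example frame are illustrative assumptions.

```python
import random


def systematic_sample(frame, n, seed=None):
    """Random start, then every k-th element, where k = len(frame) // n."""
    k = len(frame) // n  # skip interval = population list size / sample size
    if k < 1:
        raise ValueError("sample size cannot exceed the size of the list")
    rng = random.Random(seed)
    start = rng.randrange(k)  # random starting point within the first interval
    return [frame[start + i * k] for i in range(n)]


# Hypothetical list of 1,000 numbered accounts, sample of 100: skip interval is 10.
accounts = list(range(1000))
picked = systematic_sample(accounts, 100, seed=7)
```

If the list has a repeating pattern whose period matches the skip interval (the periodicity danger noted above), every selected element falls at the same point in the cycle, so the ordering of the list should be checked first.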
Stratified Sampling
Simple random sub samples are drawn
from within each stratum in the
population that are more or less equal
on some characteristic
Greater degree of representativeness
Two types
proportionate - sample size of each stratum
is relative to the size of each stratum in the
disproportionate –sample size of each
stratum does not reflect their relative
proportions in the population
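Proportionate stratified sampling can be sketched as follows: allocate the total sample across strata in proportion to stratum size, then draw a simple random sub-sample within each stratum. The names and the example strata are hypothetical, and the largest-remainder rounding is one possible design choice, not something the slides prescribe.

```python
import random


def proportionate_allocation(stratum_sizes, n):
    """Split a total sample of n across strata in proportion to stratum size."""
    total = sum(stratum_sizes.values())
    exact = {name: n * size / total for name, size in stratum_sizes.items()}
    alloc = {name: int(share) for name, share in exact.items()}
    # Hand out units lost to rounding down, largest fractional remainder first.
    leftover = n - sum(alloc.values())
    by_remainder = sorted(exact, key=lambda s: exact[s] - alloc[s], reverse=True)
    for name in by_remainder[:leftover]:
        alloc[name] += 1
    return alloc


def stratified_sample(strata, n, seed=None):
    """Simple random sub-sample within each stratum, proportionately allocated."""
    rng = random.Random(seed)
    alloc = proportionate_allocation({k: len(v) for k, v in strata.items()}, n)
    return {name: rng.sample(members, alloc[name]) for name, members in strata.items()}


# Hypothetical strata: customers grouped by account type.
strata = {
    "retail": list(range(600)),
    "wholesale": list(range(600, 900)),
    "export": list(range(900, 1000)),
}
allocation = proportionate_allocation({k: len(v) for k, v in strata.items()}, 50)
result = stratified_sample(strata, 50, seed=1)
```

A disproportionate design would replace `proportionate_allocation` with allocations chosen on other grounds, for example over-sampling a small but important stratum.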
Cluster Sampling
divides the population into groups (clusters), any one of which can be considered a representative sample
an economically efficient technique in which the primary sampling unit is not the individual element but a large cluster of elements
clusters are selected randomly
random sample from within each cluster
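A hedged sketch of the two stages described above: first select clusters at random, then either take every element in the chosen clusters or draw a random sample within each. The function name, parameters, and the city-block example are illustrative assumptions.

```python
import random


def cluster_sample(clusters, n_clusters, n_per_cluster=None, seed=None):
    """Randomly select clusters, then elements within each selected cluster.

    n_per_cluster=None keeps every element of a chosen cluster (one-stage);
    otherwise a random sample is drawn within each chosen cluster (two-stage).
    """
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)  # stage 1: pick clusters
    selected = {}
    for name in chosen:
        members = clusters[name]
        if n_per_cluster is None:
            selected[name] = list(members)
        else:
            # stage 2: random sample within the cluster
            selected[name] = rng.sample(members, min(n_per_cluster, len(members)))
    return selected


# Hypothetical clusters: households grouped by city block.
blocks = {f"block-{b}": [f"hh-{b}-{i}" for i in range(40)] for b in range(20)}
two_stage = cluster_sample(blocks, n_clusters=5, n_per_cluster=8, seed=3)
```

Only the five selected blocks need to be visited, which is where the economic efficiency comes from.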

Strengths and Weaknesses of Sampling Techniques

Nonprobability sampling

Convenience sampling
Strengths: Least expensive, least time-consuming, most convenient
Weaknesses: Selection bias, sample not representative, not recommended for descriptive or causal research

Judgmental sampling
Strengths: Low cost, convenient, not time-consuming
Weaknesses: Does not allow generalization, subjective

Quota sampling
Strengths: Sample can be controlled for certain characteristics
Weaknesses: Selection bias, no assurance of representativeness

Snowball sampling
Strengths: Can estimate rare characteristics
Weaknesses: Time-consuming

Probability sampling

Simple random sampling (SRS)
Strengths: Easily understood, results projectable
Weaknesses: Difficult to construct sampling frame, expensive, lower precision, no assurance of representativeness

Systematic sampling
Strengths: Can increase representativeness, easier to implement than SRS, sampling frame not necessary
Weaknesses: Can decrease representativeness

Stratified sampling
Strengths: Include all important subpopulations, precision
Weaknesses: Difficult to select relevant stratification variables, not feasible to stratify on many variables, expensive

Cluster sampling
Strengths: Easy to implement, cost effective
Weaknesses: Imprecise, difficult to compute and interpret results
Choosing probability vs. non-
probability sampling
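Evaluation criteria and the conditions favoring each approach:
Nature of research: probability sampling if conclusive; non-probability sampling if exploratory
Relative magnitude of sampling and non-sampling error: probability sampling if sampling errors are larger; non-probability sampling if non-sampling errors are larger
Population variability: probability sampling if high (heterogeneous); non-probability sampling if low (homogeneous)
Statistical considerations: probability sampling if favorable; non-probability sampling if unfavorable
Sophistication needed: high for probability sampling; low for non-probability sampling
Time: relatively longer for probability sampling; relatively shorter for non-probability sampling
Budget needed: high for probability sampling; low for non-probability sampling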

Selecting an Appropriate

degree of accuracy
advance knowledge of the
national versus local projects
need for statistical analysis
Session - 4
Measurement and
Measurement means assigning numbers
or other symbols to characteristics of
objects according to certain pre-specified
One-to-one correspondence between
the numbers and the characteristics
being measured.
The rules for assigning numbers should
be standardized and applied uniformly.
Rules must not change over objects or
Measurement and
Scaling involves creating a continuum
upon which measured objects are

Consider an attitude scale from 1 to 100.

Each respondent is assigned a number
from 1 to 100, with 1 = Extremely
Unfavorable, and 100 = Extremely
Favorable. Measurement is the actual
assignment of a number from 1 to 100 to
each respondent. Scaling is the process of
placing the respondents on a continuum
with respect to their attitude toward
Primary Scales of
[Figure: three runners at the finish line, measured on each primary scale]
Nominal: numbers assigned to runners: 7, 8, 3
Ordinal: rank order of winners: third place, second place, first place
Interval: performance rating on a 0 to 10 scale: 8.2, 9.1, 9.6
Ratio: time to finish, in seconds: 15.2, 14.1, 13.4
Primary Scales of
Nominal Scale
numbers serve only as labels or tags for
identifying and classifying objects.
When used for identification, there is a strict one-to-
one correspondence between the numbers and the
The numbers do not reflect the amount of the
characteristic possessed by the objects.
The only permissible operation on the numbers in a
nominal scale is counting.
Only a limited number of statistics, all of which are
based on frequency counts, are permissible, e.g.,
Illustration of Primary Scales of
Columns: nominal scale (No., Store); ordinal scale (preference rankings, two equivalent numberings); interval scale (preference ratings on 1-7 and on 11-17); ratio scale ($ spent last 3 months)

No.  Store                Rankings     Ratings 1-7   Ratings 11-17   $ spent
1.   Lord & Taylor         7    79          5             15             0
2.   Macy’s                2    25          7             17           200
3.   Kmart                 8    82          4             14             0
4.   Rich’s                3    30          6             16           100
5.   J.C. Penney           1    10          7             17           250
6.   Neiman Marcus         5    53          5             15            35
7.   Target                9    95          4             14             0
8.   Saks Fifth Avenue     6    61          5             15           100
9.   Sears                 4    45          6             16             0
10.  Wal-Mart             10   115          2             12            10
Primary Scales of Measurement -
Ordinal Scale
• A ranking scale in which numbers are assigned
to objects to indicate the relative extent to which
the objects possess some characteristic.
• Can determine whether an object has more or
less of a characteristic than some other object,
but not how much more or less.
• Any series of numbers can be assigned that
preserves the ordered relationships between the
• In addition to the counting operation allowable
for nominal scale data, ordinal scales permit
the use of statistics based on centiles, e.g.,
percentile, quartile, median.
Primary Scales of Measurement -

Interval Scale
• Numerically equal distances on the scale
represent equal values in the characteristic being
• It permits comparison of the differences
between objects.
• The location of the zero point is not fixed. Both
the zero point and the units of measurement are
• Any positive linear transformation of the form y
= a + bx will preserve the properties of the scale.

• It is not meaningful to take ratios of scale

• Statistical techniques that may be used include
Primary Scales of
Measurement -
Ratio Scale
• Possesses all the properties of the nominal, ordinal, and interval scales.
• It has an absolute zero point.
• It is meaningful to compute ratios of scale values.
• Only proportionate transformations of the form y = bx, where b is a positive constant, are allowed.
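The difference between the transformations allowed on interval and ratio scales can be checked numerically. The sketch below reuses three preference ratings (5, 7, 4 on the 1-7 scale, which become 15, 17, 14 on the 11-17 scale) and two spending figures (200 and 100) from the department store illustration earlier. The function names and the conversion factor 0.5 are illustrative assumptions.

```python
def positive_linear(x, a, b):
    """Interval-scale transformation y = a + b*x, with b > 0."""
    if b <= 0:
        raise ValueError("b must be positive")
    return a + b * x


def proportionate(x, b):
    """Ratio-scale transformation y = b*x, with b > 0 (no shift of the zero point)."""
    return positive_linear(x, 0, b)


def difference_ratio(values):
    """Compare two differences: (second - first) / (first - third)."""
    return (values[1] - values[0]) / (values[0] - values[2])


ratings_1_to_7 = [5, 7, 4]
ratings_11_to_17 = [positive_linear(r, 10, 1) for r in ratings_1_to_7]

# Comparisons of differences survive the shift from 1-7 to 11-17 ...
same_differences = difference_ratio(ratings_1_to_7) == difference_ratio(ratings_11_to_17)
# ... but ratios of the ratings themselves do not (7/5 is not 17/15).
same_ratios = ratings_1_to_7[1] / ratings_1_to_7[0] == ratings_11_to_17[1] / ratings_11_to_17[0]

# On a ratio scale, y = b*x (e.g. a currency conversion) keeps ratios intact.
dollars = [200, 100]
converted = [proportionate(d, 0.5) for d in dollars]
```

This is why "twice as much" is a meaningful statement about dollars spent but not about a preference rating.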
Primary Scales of
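Nominal
Basic characteristics: Numbers identify & classify objects
Common examples: Social Security nos., numbering of football players
Marketing examples: Brand nos., store types
Permissible statistics (descriptive): Percentages, mode
Permissible statistics (inferential): Chi-square, binomial test

Ordinal
Basic characteristics: Nos. indicate the relative positions of objects but not the magnitude of differences between them
Common examples: Quality rankings, rankings of teams in a tournament
Marketing examples: Preference rankings, market position, social class
Permissible statistics (descriptive): Percentile, median
Permissible statistics (inferential): Rank-order correlation, Friedman ANOVA

Interval
Basic characteristics: Differences between objects
Common examples: Temperature (Fahrenheit)
Marketing examples: Attitudes, opinions, index
Permissible statistics (descriptive): Range, mean, standard deviation
Permissible statistics (inferential): Product-moment correlation

Ratio
Basic characteristics: Zero point is fixed, ratios of scale values can be computed
Common examples: Length, weight
Marketing examples: Age, sales, income, costs
Permissible statistics (descriptive): Geometric mean, harmonic mean
Permissible statistics (inferential): Coefficient of variation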
A Classification of Scaling
Scaling Techniques
  Comparative Scales
    Paired Comparison
    Rank Order
    Constant Sum
    Q-Sort and Other
  Noncomparative Scales
    Continuous Rating Scales
    Itemized Rating Scales
      Semantic Differential
      Stapel
A Comparison of Scaling
• Comparative scales involve the
direct comparison of stimulus objects.
Comparative scale data must be
interpreted in relative terms and
have only ordinal or rank order
• In non-comparative scales, each
object is scaled independently of the
others in the stimulus set. The resulting
data are generally assumed to be
Relative Advantages of
Comparative Scales
• Small differences between stimulus
objects can be detected.
• Same known reference points for
all respondents.
• Easily understood and can be
• Involve fewer theoretical
• Tend to reduce halo or carryover
Relative Disadvantages of
Comparative Scales

Ordinal nature of the data

Inability to generalize beyond the

stimulus objects scaled.
Comparative Scaling

Paired Comparison Scaling
A respondent is presented with two
objects and asked to select one according
to some criterion.
• The data obtained are ordinal in nature.
• Paired comparison scaling is the most
widely-used comparative scaling
• Under the assumption of transitivity, it is
possible to convert paired comparison data
to a rank order.
Obtaining Shampoo Preferences

Using Paired Comparisons

Instructions: We are going to present you with ten pairs of shampoo
brands. For each pair, please indicate which one of the two brands of shampoo
you would prefer for personal use.
Recording Form:

                          Jhirmack   Finesse   Vidal Sassoon   Head & Shoulders   Pert
Jhirmack                     -          0            0                1             0
Finesse                      1 (a)      -            0                1             0
Vidal Sassoon                1          1            -                1             1
Head & Shoulders             0          0            0                -             0
Pert                         1          1            0                1             -
Number of times
preferred (b)                3          2            0                4             1

(a) A 1 in a particular box means that the brand in that column was preferred over
the brand in the corresponding row. A 0 means that the row brand was preferred
over the column brand.
(b) The number of times a brand was preferred is obtained
by summing the 1s in each column.
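The column sums and the resulting rank order can be reproduced from the recording form. This is a small sketch with illustrative names; the matrix copies the 0/1 entries of the form, with `None` on the diagonal where a brand would be compared with itself.

```python
brands = ["Jhirmack", "Finesse", "Vidal Sassoon", "Head & Shoulders", "Pert"]

# preferred[row][col] == 1 means the column brand was preferred over the row brand.
preferred = [
    [None, 0, 0, 1, 0],
    [1, None, 0, 1, 0],
    [1, 1, None, 1, 1],
    [0, 0, 0, None, 0],
    [1, 1, 0, 1, None],
]


def number_of_pairs(n):
    """n brands produce n(n - 1)/2 paired comparisons."""
    return n * (n - 1) // 2


def times_preferred(matrix):
    """Sum the 1s in each column, skipping the empty diagonal."""
    size = len(matrix)
    return [sum(matrix[r][c] for r in range(size) if r != c) for c in range(size)]


def rank_order(names, matrix):
    """Order brands from most to least often preferred (assumes transitivity)."""
    counts = times_preferred(matrix)
    return [name for _, name in sorted(zip(counts, names), reverse=True)]


counts = times_preferred(preferred)
ranking = rank_order(brands, preferred)
```

Five brands give the ten pairs mentioned in the instructions. Because the five column totals are all different (4, 3, 2, 1, 0), this respondent's choices are transitive and convert cleanly to a rank order.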
Paired Comparison Scaling
The most common method of taste testing is paired comparison.
The consumer is asked to sample two different products and
select the one with the most appealing taste. The test is done in
private and a minimum of 1,000 responses is considered an
adequate sample. A blind taste test for a soft drink, where
imagery, self-perception and brand reputation are very
important factors in the consumer’s purchasing decision, may
not be a good indicator of performance in the marketplace. The
introduction of New Coke illustrates this point. New Coke was
heavily favored in blind paired comparison taste tests, but its
introduction was less than successful, because image plays a
major role in the purchase of Coke.
Comparative Scaling Techniques
Rank Order Scaling
Respondents are presented with several
objects simultaneously and asked to order
or rank them according to some criterion.
It is possible that the respondent may
dislike the brand ranked 1 in an absolute
Furthermore, rank order scaling also results
in ordinal data.
Only (n - 1) scaling decisions need be made
in rank order scaling.
Preference for Toothpaste Brands

Using Rank Order Scaling

Instructions: Rank the various brands of toothpaste in
order of preference. Begin by picking out the one brand
that you like most and assign it a number 1. Then find the
second most preferred brand and assign it a number 2.
Continue this procedure until you have ranked all the
brands of toothpaste in order of preference. The least
preferred brand should be assigned a rank of 10.

No two brands should receive the same rank number.

The criterion of preference is entirely up to you. There is

no right or wrong answer. Just try to be consistent.
Preference for Toothpaste
Using Rank Order Scaling
Brand Rank Order
1. Crest _________
2. Colgate _________
3. Aim _________
4. Gleem _________
5. Sensodyne _________
6. Ultra Brite _________
7. Close Up _________
8. Pepsodent _________
9. Plus White _________
10. Stripe _________
Comparative Scaling Techniques
Constant Sum Scaling
Respondents allocate a constant sum of
units, such as 100 points to attributes of a
product to reflect their importance.
If an attribute is unimportant, the respondent
assigns it zero points.
If an attribute is twice as important as some
other attribute, it receives twice as many points.
The sum of all the points is 100. Hence, the
name of the scale.
Importance of Bathing Soap
Attributes Using a Constant Sum

On the next slide, there are eight attributes of
bathing soaps. Please allocate 100 points among
the attributes so that your allocation reflects the
relative importance you attach to each attribute.
The more points an attribute receives, the more
important the attribute is. If an attribute is not at
all important, assign it zero points. If an attribute
is twice as important as some other attribute, it
should receive twice as many points.
Importance of Bathing Soap
Using a Constant Sum Scale
Average Responses of Three Segments

Attribute            Segment I   Segment II   Segment III
1. Mildness               8           2             4
2. Lather                 2           4            17
3. Shrinkage              3           9             7
4. Price                 53          17             9
5. Fragrance              9           0            19
6. Packaging              7           5             9
7. Moisturizing           5           3            20
8. Cleaning Power        13          60            15
Sum                     100         100           100
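A constant sum response is only valid if the points add up to the stated total, which is easy to check in code. The sketch below uses the three segment columns from the table above, read so that each column sums to 100 (Mildness = 8, 2, 4 and so on down to Cleaning Power = 13, 60, 15). The function names are illustrative assumptions.

```python
attributes = ["Mildness", "Lather", "Shrinkage", "Price",
              "Fragrance", "Packaging", "Moisturizing", "Cleaning Power"]

segments = {
    "Segment I": [8, 2, 3, 53, 9, 7, 5, 13],
    "Segment II": [2, 4, 9, 17, 0, 5, 3, 60],
    "Segment III": [4, 17, 7, 9, 19, 9, 20, 15],
}


def is_valid_allocation(points, total=100):
    """A constant sum response must use non-negative points that add up to the total."""
    return all(p >= 0 for p in points) and sum(points) == total


def most_important(names, points):
    """Attribute that received the most points."""
    return names[points.index(max(points))]


valid = {name: is_valid_allocation(points) for name, points in segments.items()}
top = {name: most_important(attributes, points) for name, points in segments.items()}
```

Because the points are on a ratio scale, statements such as "Segment II gives cleaning power 60 of its 100 points" can be compared directly across attributes.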
Q – Sort Scaling

A comparative scaling technique that uses a rank order procedure to sort objects based on similarity with respect to some criterion.
Session - 5
Non - comparative Scaling

Respondents evaluate only one object

at a time, and for this reason
noncomparative scales are often
referred to as monadic scales.

Non-comparative techniques consist
of continuous and itemized rating
scales.
Continuous Rating Scale
Respondents rate the objects by placing a mark at the appropriate position on a line
that runs from one extreme of the criterion variable to the other.

The form of the continuous scale may vary considerably.

How would you rate Sears as a department store?
Version 1
Probably the worst - - - - - - -I - - - - - - - - - - - - - - - - - - - - - - - - - - Probably the best
Version 2
Probably the worst - - - - - - -I - - - - - - - - - - - - - - - - - - - - - - - - - - Probably the best
                   0 10 20 30 40 50 60 70 80 90
Version 3
                   Very bad          Neither good nor bad          Very good
Probably the worst - - - - - - -I - - - - - - - - - - - - - - - - - - - - - - - - - - Probably the best
RATE: Rapid Analysis and Testing
A relatively new research tool, the perception analyzer, provides continuous measurement of “gut
reaction.” A group of up to 400 respondents is presented with TV or radio spots or advertising
copy. The measuring device consists of a dial that contains a 100-point range. Each participant
is given a dial and instructed to continuously record his or her reaction to the material being

As the respondents turn the dials, the information

is fed to a computer, which tabulates second-by-
second response profiles. As the results are
recorded by the computer, they are superimposed
on a video screen, enabling the researcher to view
the respondents' scores immediately. The
responses are also stored in a permanent data file
for use in further analysis. The response scores
can be broken down by categories, such as age,
income, sex, or product usage.
Itemized Rating Scales
The respondents are provided with a scale
that has a number or brief description
associated with each category.

The categories are ordered in terms of scale

position, and the respondents are required to
select the specified category that best
describes the object being rated.

The commonly used itemized rating scales
are the Likert, semantic differential, and
Stapel scales.
Likert Scale
The Likert scale requires the respondents to indicate a degree of
agreement or disagreement with each of a series of statements about the stimulus
objects.
                                            Strongly             Neither agree           Strongly
                                            disagree   Disagree  nor disagree    Agree   agree
1. Sears sells high quality merchandise.       1          2X          3            4       5
2. Sears has poor in-store service.            1          2X          3            4       5
3. I like to shop at Sears.                    1          2           3X           4       5
The analysis can be conducted on an item-by-item basis (profile analysis),
or a total (summated) score can be calculated.

When arriving at a total score, the categories assigned to the negative

statements by the respondents should be scored by reversing the scale.
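The reversal rule for negative statements can be sketched in a few lines. This is a minimal illustration, assuming a 1 to 5 scale and the three Sears responses marked on the slide (2, 2 and 3); treating statement 2 as the negative item and the function names are assumptions of the sketch.

```python
def reverse_score(score, low=1, high=5):
    """Mirror a score on the scale: 1 becomes 5, 2 becomes 4, and so on."""
    return low + high - score


def summated_score(responses, negative_items):
    """Add up item scores, reversing the items worded negatively."""
    total = 0
    for item, score in responses.items():
        total += reverse_score(score) if item in negative_items else score
    return total


# Item number -> marked category for one respondent.
responses = {1: 2, 2: 2, 3: 3}
negative_items = {2}  # "Sears has poor in-store service."

total = summated_score(responses, negative_items)
print(total)  # 2 + (6 - 2) + 3 = 9
```

After reversal, a higher total consistently indicates a more favorable attitude toward the store.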
Semantic Differential
The semantic differential is a seven-point rating scale with end
points associated with bipolar labels that have semantic
meaning.
Powerful --:--:--:--:-X-:--:--: Weak
Unreliable --:--:--:--:--:-X-:--: Reliable
Modern --:--:--:--:--:--:-X-: Old-fashioned

The negative adjective or phrase sometimes appears at the left

side of the scale and sometimes at the right.
This controls the tendency of some respondents, particularly
those with very positive or very negative attitudes, to mark the
right- or left-hand sides without reading the labels.
Individual items on a semantic differential scale may be scored
on either a -3 to +3 or a 1 to 7 scale.
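Because the favorable adjective switches sides from item to item, marks are usually recoded before scoring. The sketch below assumes positions are counted from the left (1 to 7) and that "Powerful", "Reliable" and "Modern" are the favorable poles; both choices, and the function names, are assumptions made for illustration.

```python
def favorable_score(position, positive_on_left, points=7):
    """Convert a mark position (1 = leftmost) so that higher = more favorable."""
    return points + 1 - position if positive_on_left else position


def centered(score, points=7):
    """Map a 1..7 score onto the -3..+3 scoring convention."""
    return score - (points + 1) // 2


# Marks from the Sears example: 5th, 6th and 7th position from the left.
items = [
    ("Powerful / Weak", 5, True),
    ("Unreliable / Reliable", 6, False),
    ("Modern / Old-fashioned", 7, True),
]
for label, position, positive_on_left in items:
    score = favorable_score(position, positive_on_left)
    print(label, score, centered(score))
```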
A Semantic Differential Scale for
Measuring Self- Concepts, Person
Concepts, and Product Concepts
1) Rugged :---:---:---:---:---:---:---: Delicate

2) Excitable :---:---:---:---:---:---:---: Calm

3) Uncomfortable :---:---:---:---:---:---:---: Comfortable

4) Dominating :---:---:---:---:---:---:---: Submissive

5) Thrifty :---:---:---:---:---:---:---: Indulgent

6) Pleasant :---:---:---:---:---:---:---: Unpleasant

7) Contemporary :---:---:---:---:---:---:---: Obsolete

8) Organized :---:---:---:---:---:---:---: Unorganized

9) Rational :---:---:---:---:---:---:---: Emotional

10) Youthful :---:---:---:---:---:---:---: Mature

Stapel Scale
The Stapel scale is a unipolar rating scale with ten categories numbered
from -5 to +5, without a neutral point (zero). This scale is usually
presented vertically.
+5 +5
+4 +4
+3 +3
+2 +2X
+1 +1
-1 -1
-2 -2
-3 -3
-4X -4
-5 -5
The data obtained by using a Stapel scale can be analyzed in the same
way as semantic differential data.
Basic Non-comparative Scales

Continuous Rating Scale
  Basic characteristics: Place a mark on a continuous line
  Examples: Reaction to TV commercials
  Advantages: Easy to construct
  Disadvantages: Scoring can be cumbersome unless ...

Itemized Rating Scales

Likert Scale
  Basic characteristics: Degrees of agreement on a 1 (strongly disagree) to 5 (strongly agree) scale
  Examples: Measurement of attitudes
  Advantages: Easy to construct, administer, and understand
  Disadvantages: More time-consuming

Semantic Differential
  Basic characteristics: Seven-point scale with bipolar labels
  Examples: Brand, product, and ...
  Advantages: Versatile
  Disadvantages: ... to whether the ...

Stapel Scale
  Basic characteristics: Unipolar ten-point scale, -5 to +5, without a neutral point (zero)
  Examples: Measurement of attitudes and images
  Advantages: Easy to construct, administer over telephone
  Disadvantages: Confusing and ...
Itemized Scale Decisions
1) Number of categories: Although there is no single, optimal number,
   traditional guidelines suggest that there should be between five and nine
   categories.
2) Balanced vs. unbalanced: In general, the scale should be balanced to
   obtain objective data (Next Slide).
3) Odd/even no. of categories: If a neutral or indifferent scale response is
   possible from at least some of the respondents, an odd number of categories
   should be used.
4) Forced vs. non-forced: In situations where respondents are expected to have
   no opinion, the accuracy of the data may be improved by a non-forced scale.
5) Verbal description: An argument can be made for
Balanced and Unbalanced Scales

Balanced Scale            Unbalanced Scale
Jovan Musk for Men is     Jovan Musk for Men is
Extremely good            Extremely good
Very good                 Very good
Good                      Good
Bad                       Somewhat good
Very bad                  Bad
Extremely bad             Very bad
Rating Scale Configurations
A variety of scale configurations may be employed to measure the gentleness of
Cheer detergent. Some examples include:

Cheer detergent is:

1) Very harsh --- --- --- --- --- --- --- Very gentle

2) Very harsh 1 2 3 4 5 6 7 Very gentle

3) . Very harsh
. Neither harsh nor gentle
. Very gentle

4) ____    ____    ____       ____            ____       ____     ____
   Very    Harsh   Somewhat   Neither harsh   Somewhat   Gentle   Very
   harsh           harsh      nor gentle      gentle              gentle

5) Very harsh          Neither harsh nor gentle          Very gentle
Measurement Error –
Difference between
observed score and true
score
Measurement Accuracy
The true score model provides a framework for
understanding the accuracy of measurement.

XO = XT + XS + XR

where
XO = the observed score or measurement
XT = the true score of the characteristic
XS = systematic error (it affects the
observed score in the same way each
time the measurement is made)
XR = random error
Potential Sources of Error on Measurement
1) Other relatively stable characteristics of the individual
that influence the test score, such as intelligence, social
desirability, and education.
2) Short-term or transient personal factors, such as
health, emotions,
and fatigue.
3) Situational factors, such as the presence of other
people, noise, and distractions.
4) Sampling of items included in the scale: addition,
deletion, or changes in the scale items.
5) Lack of clarity of the scale, including the instructions
or the items themselves.
6) Mechanical factors, such as poor printing,
overcrowding items in the questionnaire, and poor
7) Administration of the scale, such as differences among

Reliability can be defined as the

extent to which measures are free from
random error, XR. If XR = 0, the
measure is perfectly reliable. Random
error produces inconsistency leading
to lower reliability

The validity of a scale may be defined as the

extent to which differences in observed scale
scores reflect true differences among objects
on the characteristic being measured, rather
than systematic or random error. Perfect
validity requires that there be no measurement
error (XO = XT, XR = 0, XS = 0).
Relationship Between Reliability and Validity

If a measure is perfectly valid, it is also

perfectly reliable. In this case XO = XT, XR =
0, and XS = 0. If a measure is unreliable, it
cannot be perfectly valid, since at a minimum
XO = XT + XR. Furthermore, systematic error
may also be present, i.e., XS≠0. Thus,
unreliability implies invalidity.
If a measure is perfectly reliable, it may or
may not be perfectly valid, because
systematic error may still be present (XO = XT
+ XS). Reliability is a necessary, but not
sufficient, condition for validity.
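The distinction between reliability and validity in the true score model can be made concrete with a toy calculation. The numbers below are hypothetical (a true score of 50 and a scale that always reads 2 points too high); the function names are assumptions of the sketch.

```python
def observed_score(true_score, systematic_error=0.0, random_error=0.0):
    """True score model: XO = XT + XS + XR."""
    return true_score + systematic_error + random_error


def is_perfectly_reliable(random_error):
    """Perfect reliability means no random error (XR = 0)."""
    return random_error == 0


def is_perfectly_valid(systematic_error, random_error):
    """Perfect validity means no measurement error at all (XS = 0 and XR = 0)."""
    return systematic_error == 0 and random_error == 0


# Three repeated measurements with XS = 2 and XR = 0.
readings = [observed_score(50, systematic_error=2) for _ in range(3)]
print(readings)                      # identical readings: consistent
print(is_perfectly_reliable(0))      # True
print(is_perfectly_valid(2, 0))      # False: consistent but biased
```

The repeated readings agree with each other yet all miss the true score, which is exactly the "reliable but not valid" case.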
Session - 6

Data Collection and

Collection of Data
Data can be obtained :

Secondary Source
Internal Records
Primary source
Collection of Data
Primary Data :

Questionnaire : Schedule, Interview form

(telephone and personal
Observation :
Questionnaire Definition

A questionnaire is a formalized
set of questions for obtaining
information from respondents.
Questionnaire Objectives

It must translate the information needed into a set of
specific questions that the respondents can and will
answer.

A questionnaire must uplift, motivate, and encourage

the respondent to become involved in the interview, to
cooperate, and to complete the interview.

A questionnaire should minimize response error.

Questionnaire Design Process
Specify the Information Needed

Specify the Type of Interviewing Method

Determine the Content of Individual Questions

Design the Question to Overcome the Respondent’s Inability and

Unwillingness to Answer

Decide the Question Structure

Determine the Question Wording

Arrange the Questions in Proper Order

Identify the Form and Layout

Reproduce the Questionnaire

Eliminate Bugs by Pre-testing

Individual Question Content ─
1. Is the Question Necessary?

If there is no satisfactory use

for the data resulting from a
question, that question should
be eliminated.
Individual Question Content ─
2. Are Several Questions Needed
Instead of One?
Sometimes, several questions are needed to obtain the
required information in an unambiguous manner. Consider
the question:

“Do you think Coca-Cola is a tasty and refreshing soft

drink?” (Incorrect)

Such a question is called a double-barreled question,

because two or more questions are combined into one. To
obtain the required information, two distinct questions should
be asked:  

“Do you think Coca-Cola is a tasty soft drink?” and

“Do you think Coca-Cola is a refreshing soft drink?”
Overcoming Inability To
Answer –
1. Is the Respondent Informed?
In situations where not all respondents are
likely to be informed about the topic of
interest, filter questions that measure
familiarity and past experience should be
asked before questions about the topics

A “don't know” option appears to reduce

uninformed responses without reducing the
Overcoming Inability To
Answer –
2. Can the Respondent Remember?
How many gallons of soft drinks did you
consume during the last four weeks? (Incorrect)

How often do you consume soft drinks in a
typical week? (Correct)
1.                  ___ Less than once a week
2.                  ___ 1 to 3 times per week
3.                  ___ 4 to 6 times per week
4.                  ___ 7 or more times per week
Overcoming Inability To Answer –
3. Can the Respondent Articulate?
Respondents may be unable to
articulate certain types of responses,
e.g., describe the atmosphere of a
department store.

Respondents should be given aids,

such as pictures, maps, and
descriptions to help them articulate
their responses.
Overcoming Unwillingness To
Answer – Effort Required of the
Respondents

Most respondents are unwilling to
devote a lot of effort to provide
information.
Overcoming Unwillingness To Answer
Context
Respondents are unwilling to respond to questions which
they consider to be inappropriate for the given context.
The researcher should manipulate the context so that the
request for information seems appropriate.
Legitimate Purpose
Explaining why the data are needed can make the request
for the information seem legitimate and increase the
respondents' willingness to answer.
Sensitive Information
Respondents are unwilling to disclose, at least accurately,
sensitive information because this may cause
embarrassment or threaten the respondent's prestige or
Overcoming Unwillingness To
Answer – Increasing the Willingness
of Respondents
Place sensitive topics at the end of the questionnaire.

Preface the question with a statement that the behavior of

interest is common.

Ask the question using the third-person technique : phrase

the question as if it referred to other people.

Hide the question in a group of other questions which

respondents are willing to answer. The entire list of
questions can then be asked quickly.

Provide response categories rather than asking for specific

Use randomized techniques.
Choosing Question
Structure –
Unstructured Questions
Unstructured questions are open-
ended questions that respondents
answer in their own words.

What is your occupation?

Who is your favorite actor?
What do you think about people
who shop at high-end
department stores?
Choosing Question Structure
– Structured Questions

Structured questions specify the

set of response alternatives and
the response format. A structured
question may be multiple-choice,
dichotomous, or a scale.
Choosing Question
Structure –
Multiple-Choice Questions
In multiple-choice questions, the researcher provides a
choice of answers and respondents are asked to select one
or more of the alternatives given.

Do you intend to buy a new car within the next six months?

____ Definitely will not buy
____ Probably will not buy
____ Undecided
____ Probably will buy
____ Definitely will buy
____ Other (please specify)
Choosing Question
Structure –
Dichotomous Questions
A dichotomous question has only two response
alternatives: yes or no, agree or disagree, and so
Often, the two alternatives of interest are
supplemented by a neutral alternative, such as
“no opinion,” “don't know,” “both,” or “none.”

Do you intend to buy a new car within the next six months?

_____ Yes
_____ No
_____ Don't know
Choosing Question Structure –
Scales

Do you intend to buy a new car within the next six months?

Definitely      Probably       Undecided    Probably    Definitely
will not buy    will not buy                will buy    will buy
     1               2             3           4            5
Choosing Question
Wording –
Define the Issue
Define the issue in terms of who, what, when, where, why,
and way (the six Ws). Who, what, when, and where are
particularly important.

Which brand of shampoo do you use? (Incorrect)


Which brand or brands of shampoo have you personally used

at home during the last month?
In case of more than one brand, please list all the brands that
apply. (Correct)
Choosing Question
Wording –
Use Unambiguous Words
In a typical month, how often do you shop in department
stores? (Incorrect)
_____ Never
_____ Occasionally
_____ Sometimes
_____ Often
_____ Regularly

In a typical month, how often do you shop in

department stores?
_____ Less than once
_____ 1 or 2 times
_____ 3 or 4 times
_____ More than 4 times (Correct)
Choosing Question Wording –
Avoid Leading or Biasing Questions
A leading question is one that clues the respondent to what
the answer should be, as in the following:
Do you think that patriotic Americans should buy
imported automobiles when that would put American labor out
of work? (Incorrect)
_____ Yes
_____ No
_____ Don't know

Do you think that Americans should buy imported
automobiles? (Correct)
_____ Yes
_____ No
_____ Don't know
Choosing Question
Wording –
Avoid Implicit Alternatives
An alternative that is not explicitly expressed in the options
is an implicit alternative.
1. Do you like to fly when traveling short
distances? (Incorrect)

2. Do you like to fly when traveling short
distances, or would you rather drive? (Correct)

Choosing Question
Wording –
Avoid Implicit Assumptions
Questions should not be worded so that the
answer is dependent upon implicit assumptions
about what will happen as a consequence.
1. Are you in favor of a balanced budget?

2. Are you in favor of a balanced budget

if it would result in an increase in
the personal income tax?

Determining the Order of Questions
Opening Questions
The opening questions should be interesting,
simple, and non-threatening.
Type of Information
As a general guideline, basic information should
be obtained first, followed by classification, and,
finally, identification information.
Difficult Questions
Difficult questions or questions which are
sensitive, embarrassing, complex, or dull, should
be placed late in the sequence.
Determining the Order of Questions
Effect on Subsequent Questions
General questions should precede the specific
questions (funnel approach).
Q1: “What considerations are important to
you in selecting a department store?”

Q2: “In selecting a department store, how

important is convenience of location?”

Form and Layout
Divide a questionnaire into several parts.

The questions in each part should be

numbered, particularly when branching
questions are used.

The questionnaires should preferably be
precoded.

The questionnaires themselves should be

numbered serially.
Example of a Precoded Questionnaire
The American Lawyer
A Confidential Survey of Our Subscribers

(Please ignore the numbers alongside the answers. They are only to help
us in data processing.)

1. Considering all the times you pick it up, about how much time, in total, do
you spend reading or looking through a typical issue of THE AMERICAN
LAWYER?

Less than 30 minutes.....................-1
30 to 59 minutes............................-2
1 hour to 1 hour 29 minutes..........-3
1 1/2 hours to 1 hour 59 minutes.........-4
2 hours to 2 hours 59 minutes...........-5
3 hours or more.................................-6

Reproduction of the Questionnaire
The questionnaire should be reproduced on good-quality paper
and have a professional appearance.
Questionnaires should take the form of a booklet rather than a
number of sheets of paper clipped or stapled together.
Each question should be reproduced on a single page (or
double-page spread).
Vertical response columns should be used for individual
questions.
Grids are useful when there are a number of related questions
that use the same set of response categories.
The tendency to crowd questions together to make the
questionnaire look shorter should be avoided.
Directions or instructions for individual questions should be
placed as close to the questions as possible.
Pretesting
Pretesting refers to the testing of the questionnaire on a
small sample of respondents to identify and eliminate potential
problems.

A questionnaire should not be used in the field survey without

adequate pretesting.

All aspects of the questionnaire should be tested, including

question content, wording, sequence, form and layout,
question difficulty, and instructions.

The respondents for the pretest and for the actual survey
should be drawn from the same population.

Pretests are best done by personal interviews, even if the
actual survey is to be conducted by mail, telephone, or
electronic means.
After the necessary changes have been made,
another pretest could be conducted by mail,
telephone, or electronic means if those methods
are to be used in the actual survey.

A variety of interviewers should be used for
pretests.

The pretest sample size varies from 15 to 30

respondents for each wave.

Protocol analysis and debriefing are two commonly

used procedures in pretesting.

Finally, the responses obtained from the pretest

should be coded and analyzed.
Measurement of Central Tendency

Session - 7
Classification of Data
Geographic i.e. Area wise classification – cities , districts

Chronological i.e. on the basis of time – year wise

Qualitative i.e. according to some attribute – Male and
Female

Quantitative i.e . In terms of magnitude – some

characteristics- income
Formation of Frequency Distribution
e.g. Refrigerator sold each day in Oct.
Classification according to class

Class Limits
Class intervals
Class frequency

Simple Tables or one way


Two way Tables

Frequency Distribution
In a frequency distribution, one
variable is considered at a time.

A frequency distribution for a

variable produces a table of
frequency counts, percentages, and
cumulative percentages for all the
values associated with that variable.
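A frequency distribution of this kind (counts, percentages and cumulative percentages) can be tabulated as follows. The daily sales figures are hypothetical, loosely following the "refrigerators sold each day" example, and `frequency_table` is a name made up for this sketch.

```python
from collections import Counter


def frequency_table(values):
    """Return rows of (value, count, percent, cumulative percent)."""
    counts = Counter(values)
    n = len(values)
    rows = []
    cumulative = 0
    for value in sorted(counts):
        cumulative += counts[value]
        rows.append((value, counts[value],
                     100 * counts[value] / n, 100 * cumulative / n))
    return rows


# Hypothetical number of refrigerators sold on each of ten days.
daily_sales = [2, 3, 2, 4, 3, 2, 5, 3, 2, 4]

for value, count, pct, cum_pct in frequency_table(daily_sales):
    print(f"{value:>5} {count:>5} {pct:>6.1f}% {cum_pct:>6.1f}%")
```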
Measures of central tendency
Mean, median, mode, etc.
Measure of variation
Range, interquartile range,
variance and standard deviation,
coefficient of variation
Symmetric, skewed, using box-
and-whisker plots
Coefficient of correlation
Summary Measures
Central Tendency: Mean, Median, Mode, Geometric Mean
Quartile
Variation: Range, Standard Deviation, Coefficient of Variation

Data:100, 78, 65, 43, 94, 58

Mean: The sum of a collection of data values
divided by the number of values
Mean is 73
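The arithmetic behind this value can be checked directly. The sketch uses the six data values from the slide; the variable names are illustrative only.

```python
from statistics import mean

scores = [100, 78, 65, 43, 94, 58]

# Sum of the values divided by how many there are: 438 / 6.
manual_mean = sum(scores) / len(scores)

# The standard library gives the same result.
library_mean = mean(scores)

print(manual_mean, library_mean)
```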
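Sample Mean (n = sample size)

\[ \bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} = \frac{X_1 + X_2 + \cdots + X_n}{n} \]

Population Mean (N = population size)

\[ \mu = \frac{\sum_{i=1}^{N} X_i}{N} = \frac{X_1 + X_2 + \cdots + X_N}{N} \]
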
Direct Method : X
• The most common measure of
central tendency
• Acts as ‘Balance Point’
• Affected by extreme values
[Figure: two dot plots on number lines. Left panel: Mean = 5. Right panel: Mean = 6.]
Robust measure of central tendency
Not affected by extreme values

[Figure: two dot plots on number lines. Left panel: Median = 5. Right panel: Median = 5.]
In an ordered array, the median is
the “middle” number
If n or N is odd, the median is the
middle number
If n or N is even, the median is the
average of the two middle numbers
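The odd/even rule for the median, and its robustness to extreme values, can be sketched as below. The two small data sets are hypothetical; they were chosen so that the means are 5 and 6 while the median stays at 5.

```python
from statistics import mean, median


def median_of(values):
    """Middle value of the ordered data, or the average of the two middle values."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


base = [1, 3, 5, 7, 9]
with_extreme = [1, 3, 5, 7, 14]  # largest value replaced by an extreme one

print(mean(base), median_of(base))                  # mean and median agree
print(mean(with_extreme), median_of(with_extreme))  # mean shifts, median does not
print(median_of([1, 3, 5, 7]))                      # even n: average of 3 and 5
```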
A measure of central tendency
Value that occurs most often
Not affected by extreme values
Used for either numerical or
categorical data
There may be no mode or several modes
[Figure: two dot plots on number lines. One panel: Mode = 9. Other panel: No Mode.]
Q1, the first quartile, is the value such
that 25% of the observations are smaller,
corresponding to (n+1)/4 ordered
observation
Q2, the second quartile, is the median,
50% of the observations are smaller,
corresponding to 2(n+1)/4 = (n+1)/2
ordered observation
Q3, the third quartile, is the value such
that 75% of the observations are smaller,
corresponding to 3(n+1)/4 ordered
observation
Split Ordered Data into 4 Quarters

25% | Q1 | 25% | Q2 | 25% | Q3 | 25%

Position of ith Quartile:

\[ (Q_i) = \frac{i(n+1)}{4} \]

Data in Ordered Array: 11 12 13 16 16 17 17 18 21

\[ \text{Position of } Q_1 = \frac{1(9+1)}{4} = 2.5, \qquad Q_1 = \frac{12 + 13}{2} = 12.5 \]

Q2 = Median = 16, Q3 = 17.5
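The (n+1)-based position rule corresponds to the "exclusive" method of Python's `statistics.quantiles`, so the slide's numbers can be reproduced with the standard library. The helper `quartile_position` is a hypothetical name added for illustration.

```python
from statistics import quantiles

data = [11, 12, 13, 16, 16, 17, 17, 18, 21]


def quartile_position(i, n):
    """Position of the i-th quartile in the ordered array: i(n+1)/4."""
    return i * (n + 1) / 4


# The "exclusive" method interpolates at positions i(n+1)/4.
q1, q2, q3 = quantiles(data, n=4, method="exclusive")
iqr = q3 - q1

print(quartile_position(1, len(data)))  # 2.5, halfway between the 2nd and 3rd values
print(q1, q2, q3, iqr)
```

Other software may use different interpolation rules, so quartiles computed elsewhere can differ slightly from these values.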
Measures of Variation
Range, Interquartile Range, Variance (population and sample), Standard Deviation, Coefficient of Variation
Measure of variation
Difference between the largest and the
smallest observations:

\[ \text{Range} = X_{\text{Largest}} - X_{\text{Smallest}} \]

Ignores the way in which data are
distributed
[Figure: two dot plots on a 7 to 12 axis with different distributions. Both panels: Range = 12 - 7 = 5.]
Interquartile Range
Measure of variation
Also known as midspread
Spread in the middle 50%
Difference between the first and
third quartiles
Data in Ordered Array: 11 12 13 16 16 17 17 18 21

Interquartile Range = Q3 - Q1 = 17.5 - 12.5 = 5

Not affected by extreme values

•Important measure of variation
•Shows variation about the mean

Sample variance:

\[ S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1} \]

Population variance:

\[ \sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N} \]

Standard Deviation
Most important measure of variation
Shows variation about the mean
Has the same units as the original
data

Sample standard deviation:

\[ S = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}} \]

Population standard deviation:

\[ \sigma = \sqrt{\frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}} \]

Comparing Standard Deviations
[Figure: three dot plots on an 11 to 21 axis]
Data A: Mean = 15.5, s = 3.338
Data B: Mean = 15.5, s = .9258
Data C: Mean = 15.5, s = 4.57
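The difference between the sample (n - 1) and population (N) versions of variance and standard deviation can be seen on a small data set. The sketch reuses the ordered array from the quartile example as an assumption, since the raw values behind Data A, B and C are not given; `sample_variance` is a hypothetical helper.

```python
import math
from statistics import pstdev, pvariance, stdev, variance

data = [11, 12, 13, 16, 16, 17, 17, 18, 21]


def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    m = sum(values) / n
    return sum((x - m) ** 2 for x in values) / (n - 1)


print(sample_variance(data), variance(data))  # sample variance (divide by n - 1)
print(pvariance(data))                        # population variance (divide by N)
print(stdev(data), pstdev(data))              # square roots of the two variances
```

The sum of squared deviations for this array is 80, so the sample variance is 80 / 8 and the population variance is 80 / 9.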
Coefficient of Variation
Measure of Relative Dispersion
Always in %
Shows Variation Relative to Mean
Used to Compare 2 or More Groups
Formula (Sample Coefficient of Variation):

\[ CV = \left( \frac{S}{\bar{X}} \right) \cdot 100\% \]
Session - 8

Skewness and Kurtosis

Review of Previous Session
Range
The difference between the largest and smallest
observations
Interquartile range
The difference between the 25th and 75th
percentiles
Variance
The sum of squared deviations from the mean divided by the population size
or by the sample size minus one
Standard deviation
The square root of the variance
•Another Measure of Dispersion

•Coefficient of Variation (CV)


Measures of Dispersion –
Coefficient of Variation
Coefficient of variation (CV)
measures the spread of a set of data
as a proportion of its mean.
It is the ratio of the sample standard
deviation to the sample mean
\[ CV = \frac{S}{\bar{X}} \times 100\% \]
It is sometimes expressed as a
percentage
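A short sketch shows why the coefficient of variation is useful for comparing groups measured at different levels. The two price lists are hypothetical and have the same standard deviation; `coefficient_of_variation` is a name chosen for this example, and it assumes a non-zero mean.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Sample standard deviation as a percentage of the sample mean."""
    return stdev(values) / mean(values) * 100


# Hypothetical prices of two products; both have a standard deviation of 2.
product_a = [10, 12, 14]
product_b = [100, 102, 104]

print(stdev(product_a), stdev(product_b))
print(coefficient_of_variation(product_a))  # about 16.7%
print(coefficient_of_variation(product_b))  # about 2.0%
```

Relative to its mean, product A varies far more than product B even though the absolute spread is identical.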
Measures of Skewness and
Kurtosis
A fundamental task in many
statistical analyses is to characterize
the location and variability of a
data set (Measures of central
tendency vs. measures of
dispersion)
Both measures tell us nothing about
the shape of the distribution
A further characterization of the
data includes skewness and kurtosis
Skewness measures the degree of
asymmetry exhibited by the data
Positive skewness
There are more observations below the
mean than above it
When the mean is greater than the median
Negative skewness
There are a small number of low
observations and a large number of high
ones
When the median is greater than the mean
Shape of a Distribution
Describes how data is distributed
Measures of shape
Mean > median: right-skewness
Mean < median: left-skewness
Mean = median: symmetric
[Figure: three distribution curves]
Left-Skewed: Mean < Median < Mode
Symmetric: Mean = Median = Mode
Right-Skewed: Mode < Median < Mean
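The mean-versus-median comparison can be turned into a quick check. This is a rule-of-thumb sketch with hypothetical data, not a formal skewness statistic; in particular, equal mean and median do not strictly guarantee a symmetric distribution.

```python
from statistics import mean, median


def skew_direction(values):
    """Classify the shape by comparing the mean with the median."""
    m, md = mean(values), median(values)
    if m > md:
        return "right-skewed"
    if m < md:
        return "left-skewed"
    return "symmetric"


print(skew_direction([1, 2, 3, 4, 20]))     # a few high values pull the mean up
print(skew_direction([1, 17, 18, 19, 20]))  # a few low values pull the mean down
print(skew_direction([1, 2, 3, 4, 5]))
```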
Kurtosis measures how peaked the
histogram is

\[ \text{kurtosis} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^4}{n s^4} - 3 \]

The kurtosis of a normal

distribution is 0
Kurtosis characterizes the relative
peakedness or flatness of a
distribution
Platykurtic – When the kurtosis < 0,
the frequencies throughout the curve
are closer to being equal (i.e., the curve is
flatter and wider)
Thus, negative kurtosis indicates a
relatively flat distribution
Leptokurtic– When the kurtosis > 0,
there are high frequencies in only a
small part of the curve (i.e, the curve is
more peaked)
Thus, positive kurtosis indicates a
relatively peaked distribution



• Kurtosis is based on the size of a distribution's tails.
• Negative kurtosis (platykurtic) – distributions with short tails
• Positive kurtosis (leptokurtic) – distributions with relatively long tails
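One common moment-based definition of excess kurtosis (fourth central moment divided by the squared second central moment, minus 3, so that a normal distribution scores 0) can be sketched as follows. Whether the slides use exactly this normalization is an assumption; the two data sets are hypothetical.

```python
def excess_kurtosis(data):
    """Fourth central moment over squared variance, minus 3 (population moments)."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n
    m4 = sum((x - mean) ** 4 for x in data) / n
    return m4 / m2 ** 2 - 3


flat = [1, 2, 3, 4, 5]                         # evenly spread, short tails
peaked = [0, 0, 0, 0, 0, 0, 0, 0, -10, 10]     # concentrated center, long tails

print(excess_kurtosis(flat))    # negative: platykurtic
print(excess_kurtosis(peaked))  # positive: leptokurtic
```

The function divides by the variance, so it is undefined for data in which every value is identical.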
Statistical data that are collected,
observed or recorded at successive
intervals of time are
referred to as a TIME SERIES:
-It helps in understanding the past
-It helps in planning future operations
-It helps in evaluating current
Components of Time Series:
-Secular trends – General movement
persisting over
long term
-Seasonal variations - pattern year after year
-Cyclical variations – Fluctuations
moving up and
down every few years
-Irregular variations- Variations in
Methods of Measurement

-Moving Avg. Method

-Method of least squares
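Both trend-measurement methods can be sketched on a small series. The sales figures are hypothetical, time is indexed 0, 1, 2, ... as an assumption, and the function names are made up for this illustration.

```python
def moving_average(series, window):
    """Average of each run of `window` consecutive observations."""
    if window < 1 or window > len(series):
        raise ValueError("window must be between 1 and len(series)")
    return [sum(series[i:i + window]) / window
            for i in range(len(series) - window + 1)]


def least_squares_trend(series):
    """Fit y = intercept + slope * t by least squares, with t = 0, 1, 2, ..."""
    n = len(series)
    if n < 2:
        raise ValueError("need at least two observations")
    t_mean = (n - 1) / 2
    y_mean = sum(series) / n
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(series))
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    slope = sxy / sxx
    return y_mean - slope * t_mean, slope


sales = [10, 12, 14, 16, 18]  # hypothetical yearly sales
print(moving_average(sales, 3))
print(least_squares_trend(sales))
```

The moving average smooths short-term fluctuations, while the fitted line summarizes the secular trend with a single slope.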

Correlation Analysis
If two quantities vary in such a way that
movements in one are accompanied by
movements in the other, these quantities
are said to be correlated. The statistical
tool for measuring such a relationship is
known as correlation, and its coefficient is
denoted by r.

Types of correlation

- Positive and Negative;
- Simple, partial and multiple;
- Linear and Non - linear
Scatter Plots and
Correlation
A scatter plot (or scatter diagram) is used
to show the relationship between two
variables
Correlation analysis is used to measure
strength of the association (linear
relationship) between two variables
Only concerned with strength of the
relationship
No causal effect is implied
Scatter Plot Examples
[Figure: scatter plots of y against x showing linear relationships and curvilinear relationships]
[Figure: scatter plots of y against x showing strong relationships and weak relationships]
[Figure: scatter plot of y against x showing no relationship]

Correlation Coefficient
The population correlation
coefficient ρ (rho) measures the
strength of the association between
the variables
The sample correlation coefficient r
is an estimate of ρ and is used to
measure the strength of the linear
relationship in the sample
Features of r

Range between -1 and 1

The closer to -1, the stronger the
negative linear relationship
The closer to 1, the stronger the
positive linear relationship
The closer to 0, the weaker the linear
relationship
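Calculating the Correlation
Coefficient

\[ r = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\left[\sum (x - \bar{x})^2\right]\left[\sum (y - \bar{y})^2\right]}} \]

or the algebraic equivalent:

\[ r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n(\sum x^2) - (\sum x)^2\right]\left[n(\sum y^2) - (\sum y)^2\right]}} \]

where: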

r = Sample correlation coefficient
n = Sample size
x = Value of the independent variable
y = Value of the dependent variable
For Example
Tree Height (y)   Trunk Diameter (x)   xy   y²   x²
35 8 280 1225 64
49 9 441 2401 81
27 7 189 729 49
33 6 198 1089 36
60 13 780 3600 169
21 7 147 441 49
45 11 495 2025 121
51 12 612 2601 144
Σ=321 Σ=73 Σ=3142 Σ=14111 Σ=713
\[ r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n(\sum x^2) - (\sum x)^2\right]\left[n(\sum y^2) - (\sum y)^2\right]}} = \frac{8(3142) - (73)(321)}{\sqrt{\left[8(713) - (73)^2\right]\left[8(14111) - (321)^2\right]}} = 0.886 \]

r = 0.886 → relatively strong positive
linear association between x and y
[Figure: scatter plot of Tree Height, y against Trunk Diameter, x]
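The tree example can be recomputed from the raw data with the algebraic form of the formula. The sketch below uses the eight height and diameter pairs from the table; `pearson_r` is a name chosen for illustration.

```python
import math

heights = [35, 49, 27, 33, 60, 21, 45, 51]   # tree height, y
diameters = [8, 9, 7, 6, 13, 7, 11, 12]      # trunk diameter, x


def pearson_r(x, y):
    """Sample correlation coefficient via the algebraic (sums-based) formula."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


r = pearson_r(diameters, heights)
print(round(r, 3))
```

Swapping the two arguments gives the same value, since the correlation coefficient is symmetric in x and y.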
Calculations of Correlation when
deviations are taken from Assumed
Mean
Rank Correlation