
CHAPTER 6

CORRELATION ANALYSIS

CONTENTS
6.1 Introduction


6.2 Types of Correlation
6.2.1 Positive or Negative Correlation
6.2.2 Simple or Multiple Correlations
6.2.3 Partial or Total Correlation
6.2.4 Linear and Non-linear Correlation
6.3  Methods of Calculating Correlation
6.4 Scatter Diagram Method
6.5 Co-variance Method – The Karl Pearson’s Correlation Coefficient
6.5.1 Assumptions Underlying Karl Pearson’s Correlation Coefficient
6.5.2 Interpretation of R
6.5.3 Estimation of Probable Error
6.6 Rank Correlation Method
6.6.1 Rank Correlation when Ranks are given
6.6.2 Rank Correlation when Ranks are not given
6.6.3 Rank Correlation when Equal Ranks are given
6.7  Correlation Coefficient using Concurrent Deviation
6.8 Summary
6.9 Descriptive Questions
6.10 Answers and Hints
6.11 Suggested Readings for Reference

NMIMS Global Access – School for Continuing Education


172  BUSINESS STATISTICS

INTRODUCTORY CASELET

RBI’S BALANCING ACT AMID SHAKY CURRENCY MARKET

The correlation between the Sensex and the rupee has been drifting
away from its historical averages, following RBI’s interventions
in the currency market. The central bank has been intervening
in the forex market in order to cap the significant upside in the
rupee as well as to build forex reserves. The 120-day correlation
between the Sensex and the rupee has fallen to –0.36.
Interestingly, such correlation levels were not seen before the
global financial crisis in September 2008.

A correlation is a measure of how two variables are related to
each other, and it can range from plus one to minus one. The
prime reason for the rupee not moving in tandem with the equity
gauges is the change in RBI’s focus. Of late, RBI has been focusing
on building the foreign reserves in order to be able to hedge against
any potential outflows of funds in case the yield increases in the
US markets.


After studying this chapter, you should be able to:

• Understand the concept of correlation
• Distinguish between the different types of correlation
• Describe various methods of calculating correlation, such as the
scatter diagram method
• Discuss various types of correlation coefficients, viz. the Karl
Pearson correlation coefficient, the rank correlation coefficient and
the coefficient based on concurrent deviations

6.1 INTRODUCTION
We often encounter situations where data appear as pairs of
figures relating to two variables, for example, price and demand of a
commodity, money supply and inflation, industrial growth and GDP,
advertising expenditure and market share, etc. Examples of correlation
problems are found in the study of the relationship between IQ and
aggregate percentage marks obtained in a mathematics examination, or
between blood pressure and metabolism. In these examples, both variables
are observed as they naturally occur, since neither variable can be fixed
at predetermined levels.
Some important definitions of correlation are given below.
Croxton and Cowden say, “When the relationship is of a
quantitative nature, the appropriate statistical tool for discovering
and measuring the relationship and expressing it in a brief formula
is known as correlation”.
A.M. Tuttle says, “Correlation is an analysis of the covariation between
two or more variables.”
W.A. Neiswanger says, “Correlation analysis contributes to the
understanding of economic behavior, aids in locating the critically
important variables on which others depend, may reveal to the
economist the connections by which disturbances spread and suggest
to him the paths through which stabilizing forces may become
effective.”
L.R. Conner says, “If two or more quantities vary in sympathy so
that the movement in one tends to be accompanied by corresponding
movements in others, then they are said to be correlated.”
Correlation is the degree of linear association between two random
variables. We do not designate the two variables as
dependent and independent. It may be the case that one
is the cause and the other is the effect, i.e. independent and dependent
variables respectively. On the other hand, both may depend
on a third variable. In some cases there may not be any
cause-effect relationship at all. Therefore, if we do not consider and
study the underlying economic or physical relationship, correlation
may sometimes give absurd results. For example, take the case of global
average temperature and the Indian population. Both have been increasing
over the past 50 years, but they are obviously not causally related.
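The temperature and population example can be illustrated with a small sketch. The two series below are invented figures that merely drift upward over 50 periods (they are not real temperature or population data), and `pearson_r` is a helper name introduced here for illustration only.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

years = range(50)
# Hypothetical series: each one simply drifts upward with time, plus a
# small repeating wiggle so that the points are not perfectly collinear.
temperature = [14.0 + 0.02 * t + 0.05 * ((t * 7) % 5 - 2) for t in years]
population = [500.0 + 15.0 * t + 10.0 * ((t * 3) % 7 - 3) for t in years]

r = pearson_r(temperature, population)
print(f"r = {r:.3f}")  # a high positive value, with no causal link built in
```

The coefficient comes out high simply because both series share a time trend, which is why the underlying relationship has to be examined before the number is interpreted.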
Correlation is an analysis of the degree to which two or more variables
fluctuate with reference to each other. Correlation is expressed by a
coefficient ranging between –1 and +1. A positive (+ve) sign indicates
movement of the variables in the same direction. E.g. the quantity of
fertilizer used on a farm and the yield show a positive relationship
within technological limits. A negative (–ve) coefficient
indicates movement of the variables in opposite directions, i.e.
when one variable decreases, the other increases. E.g. the price
and demand of a commodity have an inverse relationship. Absence of
linear correlation is indicated if the coefficient is close to zero. A value
of the coefficient close to ±1 denotes a very strong linear relationship.
The study of correlation helps managers in the following ways:
• To identify the relationship between various factors and decision variables.
• To estimate the value of one variable for a given value of the other if both
are correlated, e.g. estimating sales for a given advertising and
promotion expenditure.
• To understand economic behaviour and market forces.
• To reduce uncertainty in decision-making to a large extent.
In business, correlation analysis often helps managers take
decisions by estimating the effects of changing the values of
decision variables such as promotion, advertising, price and production
processes on objective parameters such as costs, sales, market share,
consumer satisfaction and competitive price. The decision becomes
more objective because subjectivity is removed to a certain extent. However,
it must be understood that correlation analysis only tells us whether
two or more variables in a data set fluctuate together or not. The
co-movement is not necessarily due to a cause-and-effect relationship.
Whether fluctuations in one of the variables actually affect the other has
to be established through a logical understanding of the business
environment.
Some correlations could be completely nonsensical, like the
increase in I.T. jobs and the reduction in wheat production over the past 3
years in India, or the share market bull run of 2004 to 2007 and the increase
in suicides by farmers in India. There are many reasons for such
spurious correlations. Hence, before we use correlation analysis, we
must check a few factors that may be responsible for the apparent relationship.
Firstly, the fluctuation may be a chance coincidence. In this case we
could look at the data over different periods and also study whether one
factor affects the other through a third factor that we have not considered.
Secondly, even when correlation exists, logical analysis may tell
us that one variable is independent and the other is dependent on it. E.g.
the surface temperature of the Pacific Ocean (El Niño) affects monsoons
in India, but monsoons do not affect the temperature of the Pacific Ocean.
Thirdly, in some cases both variables under study may be fluctuating
together due to variation in a third variable. Thus both variables
under correlation analysis may be dependent variables, and hence
not causally related to each other. In such a case, the manager cannot vary
one of them and expect the other to vary. For example, the correlation
between rising share prices and a stronger rupee against the dollar may be due
to an increase in Foreign Direct Investment (FDI). In this case, expecting
to control falling share prices through the sale of dollars by the Reserve
Bank is incorrect. To control these two variables we need to control
FDI. Further, if the falling share prices are due to market sentiment or an
overheated market, controlling FDI may not help. Thus, the manager
needs to analyze the problem in its business environment before he/she
can apply correlation analysis in decision-making.

Fill in the blanks:

1. Correlation is an analysis of the ................... between two or
more variables.
2. Correlation is a ................... of linear association between two
random variables.
3. ................... analysis helps to identify relationship of various
factors and decision variables.

A correlation considers the joint variation of two measurements
with no distinction between independent and dependent variables. It is
a measure of the linear relationship between them. In correlation, we
do not restrict or set the values of either measurement; we observe them
as they vary over different levels. It only indicates whether the
two variables move together linearly. On the other hand, the
regression problem considers the frequency distribution of one
variable when another is set at each of several possible levels.

6.2 TYPES OF CORRELATION


Correlation can be classified as positive or negative, simple
or multiple, partial or total, and linear or non-linear. Further,
correlation can be studied by plotting graphs on the x-y axes or by
algebraic calculation of a coefficient of correlation. The graphs are usually
scatter diagrams or line diagrams. Correlation coefficients have
been defined in different ways; of these, Karl Pearson’s correlation
coefficient, Spearman’s rank correlation coefficient and the coefficient
of determination are the most popular.

In managerial decision-making, it is good practice to draw the scatter
diagram first, and then study the logical relationship to identify the
type of correlation and the cause-effect relation. Only then should the
manager calculate the coefficient of correlation for further mathematical
analysis. The types of correlation that need to be differentiated before
using the correlation coefficient for managerial decision-making are
given below.

6.2.1 POSITIVE OR NEGATIVE CORRELATION


In positive correlation, both factors increase or decrease together.
Positive or direct correlation refers to the movement of variables in
the same direction.

The correlation is said to be positive when an increase (decrease) in
the value of one variable is accompanied by an increase (decrease)
in the value of the other variable.
Negative or inverse correlation refers to the movement of the
variables in opposite directions. Correlation is said to be negative if
an increase (decrease) in the value of one variable is accompanied
by a decrease (increase) in the value of the other.
When the correlation is perfect, the scatter diagram shows a
linear (straight line) plot with all points falling on a straight line. If we
choose an appropriate scale, the inclination of the straight line can be
adjusted to 45°, although this is not necessary as long as the inclination is
not 0° or 90°, where there is no correlation at all because the value of one
variable changes without any change in the value of the other variable. In
the case of negative correlation, when one variable increases the other
decreases, and vice versa. If the scatter diagram shows the points distributed
closely around an imaginary line, we say there is a high degree of correlation.
On the other hand, if we can hardly see any unique imaginary line
around which the observations are scattered, we say correlation
does not exist. Even when the imaginary line is parallel to one
of the axes, we say no correlation exists between the variables. If the
imaginary line is a straight line, we say the correlation is linear.
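The three situations described above (a perfect upward line, a perfect downward line, and a line parallel to an axis) can be checked numerically. The sketch below uses made-up points; `pearson_r` is an illustrative helper, and returning `None` for a line parallel to an axis is a choice made here because the coefficient is then a 0/0 expression.

```python
import math

def pearson_r(xs, ys):
    """Return Pearson's r, or None when either variable does not vary."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return None  # line parallel to an axis: r cannot be computed
    return sxy / math.sqrt(sxx * syy)

x = [1, 2, 3, 4, 5]
rising = [2 * v + 1 for v in x]    # all points on an upward-sloping line
falling = [10 - 3 * v for v in x]  # all points on a downward-sloping line
flat = [7 for _ in x]              # line parallel to the X-axis

r_rising = pearson_r(x, rising)
r_falling = pearson_r(x, falling)
r_flat = pearson_r(x, flat)
print(r_rising, r_falling, r_flat)
```

The slope of the line (2 or –3 here) does not matter: any non-horizontal, non-vertical straight line gives a coefficient of +1 or –1, which matches the remark that the inclination need not be 45°.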

6.2.2 SIMPLE OR MULTIPLE CORRELATIONS


In simple correlation, the variation is between only the two variables under
study and is hardly influenced by any external factor.
In other words, if one of the variables remains the same, there will not be
any change in the other variable. For example, the variation in sales against
price changes for a price-sensitive product under stable market
conditions shows a negative correlation. In multiple correlation,
more than two variables affect one another. In such a case, we need to
study the correlation between all the pairs that affect each other
and the extent of their influence.

6.2.3 PARTIAL OR TOTAL CORRELATION
In multiple correlation analysis there are two approaches to
studying the correlation. In partial correlation, we study the variation
of two variables while excluding the effects of the other variables by
keeping them under controlled conditions. In a ‘total correlation’ study, we
allow all relevant variables to vary with respect to each other and find
the combined effect. With few variables, it is feasible to study ‘total
correlation’. As the number of variables increases, it becomes impractical
to study the ‘total correlation’. For example, the coefficient of correlation
between the yield of wheat and chemical fertilizers, excluding the effects of
pesticides and manures, is called partial correlation. Total correlation
is based upon all the variables.
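As a rough illustration of partial correlation with a single controlled variable, the sketch below uses the standard first-order formula r(xy.z) = (r_xy − r_xz r_yz) / sqrt((1 − r_xz²)(1 − r_yz²)). This formula is not given in the text above, and the yield, fertilizer and pesticide figures are invented for the example.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def partial_r(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y with z held constant."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

# Hypothetical plot-level figures, invented for illustration only.
yield_wheat = [20, 24, 23, 30, 33, 35, 41, 44]
fertilizer = [10, 12, 14, 16, 18, 20, 22, 24]
pesticide = [2, 3, 2, 4, 5, 4, 6, 7]

r_yf = pearson_r(yield_wheat, fertilizer)  # total (simple) correlation
r_yp = pearson_r(yield_wheat, pesticide)
r_fp = pearson_r(fertilizer, pesticide)
r_yf_given_p = partial_r(r_yf, r_yp, r_fp)  # pesticide effect excluded

print(f"simple r = {r_yf:.3f}, partial r = {r_yf_given_p:.3f}")
```

Comparing the simple and the partial coefficient shows how much of the apparent yield-fertilizer association is shared with the controlled variable.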

6.2.4 LINEAR AND NON-LINEAR CORRELATION

When the amount of change in one variable tends to keep a
constant ratio to the amount of change in the other variable,
the correlation is said to be linear.
But if the amount of change in one variable does not bear a
constant ratio to the amount of change in the other variable,
the correlation is said to be non-linear.
The distinction between linear and non-linear is based upon the
consistency of the ratio of change between the variables. The manager
must be careful in analyzing correlation using coefficients because
most of the coefficients are based on the assumption of linearity. Hence
plotting a scatter diagram is good practice. In the case of linear correlation,
the derivative of the relationship is constant, and the graph
of the data is a straight line. In the case of non-linear correlation, the
rate of variation changes as the values increase or decrease. A non-linear
relationship could be approximated by a polynomial (parabolic, cubic,
etc.), exponential or sinusoidal function, among others. In such cases, using
correlation coefficients based on the linear assumption will be misleading
unless they are used over a very short data range. Using computers, we could
analyze a non-linear correlation to a certain extent, with some simplifying
assumptions.
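A small sketch shows how a linear coefficient can mislead for a non-linear relationship. The data are an assumed exact parabola y = x², chosen only for illustration; `pearson_r` is a helper defined here.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

x = [-3, -2, -1, 0, 1, 2, 3]
y = [v * v for v in x]  # y is completely determined by x, but not linearly

r_full = pearson_r(x, y)           # whole range, symmetric about zero
r_right = pearson_r(x[3:], y[3:])  # restricted range: x = 0, 1, 2, 3

print(f"full range r = {r_full:.3f}, restricted range r = {r_right:.3f}")
```

Over the full range the coefficient is zero even though y depends entirely on x, while over the shorter range the curve is close enough to a line for the coefficient to be high.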

Fill in the blanks:

4. The correlation is said to be ................... when the increase
(decrease) in the value of one variable is accompanied by an
increase (decrease) in the value of other variable also.
5. Correlation is said to be ..................., if an increase (decrease)
in the value of one variable is accompanied by a decrease
(increase) in the value of other.
6. When the amount of change in one variable tends to keep a
constant ratio to the amount of change in the other variable,
then the correlation is said to be ................... .
7. In the case of ................... correlation the rate of variation changes
as values increase or decrease.

Give practical examples from your life of the different types of
correlation which you have studied above.

A scatter diagram tells us not only about linearity or non-linearity but
also whether the data are cyclic. When the values of two variables have a
constant rate of change, the correlation is linear.

6.3 METHODS OF CALCULATING CORRELATION
Simple linear correlation is a statistical tool applied in many business
situations to find the degree to which two variables vary linearly with
one another. Even in situations where more than
two variables are involved, two of them may be dominant. In such a case,
correlation analysis between these two variables helps us to measure
the degree of association between them. For example,
the demand for a particular product depends on a number of factors.
However, the association of demand with price may be dominant.
Correlation analysis may also be necessary to eliminate a variable
which shows low or hardly any correlation with the variable of
interest. In statistics, there are a number of measures to describe the degree
of association between variables.
These are Karl Pearson’s correlation coefficient, Spearman’s rank
correlation coefficient, the coefficient of determination, Yule’s coefficient
of association, the coefficient of colligation, etc.
There are different methods which help us to find out whether the
variables are related or not:
• Scatter diagram method
• Karl Pearson’s coefficient of correlation
• Rank method
• Concurrent deviation method
We shall discuss these methods one by one.


State whether the following statements are true/false:

8. Correlation analysis may also be necessary to eliminate a
variable which shows low or hardly any correlation with the
variable of our interest.
9. Simple linear correlation is a statistical tool applied in many
business situations to find the degree to which two variables
vary linearly to one another.

Suppose you have some achievement test results collected in
a project on which you had worked years ago. The achievement
test had four scales: vocabulary, reading, math concepts, and math
problem solving. How will you find the correlation between your scores in
different subjects and interpret which was your strongest subject?

6.4 SCATTER DIAGRAM METHOD
The scatter diagram is the most fundamental graph plotted to show the
relationship between two variables. It is a simple way to represent a
bivariate distribution, i.e. the distribution of two
random variables. The two variables are plotted against the X
and Y axes. Thus, every data pair (xi, yi) is represented by a point on
the graph, x being the abscissa and y being the ordinate of the point. From
a scatter diagram we can find whether there is any relationship between
x and y and, if so, what type of relationship. The scatter diagram thus
indicates the nature and strength of the correlation.

The pattern of points obtained by plotting the observed points is
known as a scatter diagram.

It gives us two types of information:
• Whether the variables are related or not.
• If so, what kind of relationship or estimating equation
describes the relationship.
If the dots cluster around a line, the correlation is called linear
correlation. If the dots cluster around a curve, the correlation is called
non-linear or curvilinear correlation.
A scatter diagram is drawn to visualize the relationship between two
variables. The values of the more important variable are plotted on the
X-axis while the values of the other variable are plotted on the Y-axis.
On the graph, dots are plotted to represent different pairs of data.
When dots are plotted to represent all the pairs, we get a scatter

diagram. The way the dots scatter gives an indication of the kind of
relationship which exists between the two variables. While drawing a
scatter diagram, it is not necessary to place the zero values of the X and Y
variables at the origin; the minimum values of the variables
considered may be taken instead.
When there is a positive correlation between the variables, the dots
on the scatter diagram run from the bottom left-hand corner to the top
right-hand corner. In the case of perfect positive correlation, all the dots
lie on a straight line.
When a negative correlation exists between the variables, the dots on the
scatter diagram run from the upper left-hand corner to the bottom
right-hand corner. In the case of perfect negative correlation, all the dots
lie on a straight line.
If a scatter diagram is drawn and no path is formed, there is no
correlation.

Example: Figures on advertisement expenditure (X) and sales (Y) of
a firm for the last ten years are given below. Draw a scatter diagram.

Advertisement cost (in ‘000 ₹)   40  65  60  90  85  75  35  90  34  76
Sales (in lakh ₹)                45  56  58  82  65  70  64  85  50  85
Solution:

[Figure: Scatter diagram showing the correlation between advertisement cost (X-axis, in ‘000 ₹) and sales (Y-axis, in lakh ₹)]
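The plotting procedure can be sketched in code using the advertisement and sales figures of this example. The function `ascii_scatter` and its grid size are assumptions made here for illustration; it draws a rough text-only scatter diagram, with each axis starting at the minimum observed value rather than at zero.

```python
def ascii_scatter(xs, ys, width=40, height=12):
    """Render (x, y) pairs as text rows; the top row holds the largest y."""
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    grid = [[" "] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        # Scale each value to a column/row index (assumes x and y both vary).
        col = round((x - x_min) / (x_max - x_min) * (width - 1))
        row = round((y - y_min) / (y_max - y_min) * (height - 1))
        grid[height - 1 - row][col] = "*"
    return ["".join(line) for line in grid]

advertisement = [40, 65, 60, 90, 85, 75, 35, 90, 34, 76]  # in '000 rupees
sales = [45, 56, 58, 82, 65, 70, 64, 85, 50, 85]          # in lakh rupees

rows = ascii_scatter(advertisement, sales)
for line in rows:
    print("|" + line)
print("+" + "-" * 40)
```

The stars drift from the lower left towards the upper right, which is the visual signature of a positive correlation.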
Example: Draw a scatter diagram for the following data of eight years
on income (X) and expenditure (Y).

Income (X) (₹)        100  110  113  120  125  130  130  140
Expenditure (Y) (₹)    85   90   91  100  110  125  125  130

Solution:

[Figure: Scatter diagram of income (X-axis, in ₹) against expenditure (Y-axis, in ₹)]

Fill in the blanks:
10. Scatter diagram is the most fundamental graph plotted to
show relationship between ................... variables.
11. The pattern of points obtained by plotting the observed points
is known as ...................
12. In case of perfect positive correlation all the dots will lie on a
................... line.

Collect the data on income and expenditure of ten households in
your locality. Draw a scatter diagram to plot the correlation between
income and expenditure. Interpret the results and prepare a short
report.

6.5 CO-VARIANCE METHOD – THE KARL PEARSON’S CORRELATION COEFFICIENT

The correlation coefficient measures the degree of association
between two variables X and Y.

Karl Pearson’s formula for the correlation coefficient is given as,

$$ r = \frac{\mathrm{Cov}(X, Y)}{\sigma_X \sigma_Y} = \frac{\frac{1}{n}\sum (X - \bar{X})(Y - \bar{Y})}{\sigma_X \sigma_Y} \qquad (1) $$

where r is the ‘Correlation Coefficient’ or ‘Product Moment Correlation
Coefficient’ between X and Y, $\sigma_X$ and $\sigma_Y$ are the standard deviations
of X and Y respectively, and n is the number of pairs of values of X
and Y in the given data. The expression $\frac{1}{n}\sum (X - \bar{X})(Y - \bar{Y})$ is known
as the covariance between the variables X and Y. It is denoted as
Cov(X, Y). The correlation coefficient r is a dimensionless number whose
value lies between +1 and –1. Positive values of r indicate positive (or
direct) correlation between the two variables X and Y, i.e. both X and
Y increase or decrease together. Negative values of r indicate negative
(or inverse) correlation, meaning that an increase in one
variable is accompanied by a decrease in the value of the other variable.
A zero correlation means that there is no linear association between the two
variables.

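The formula can be modified as,

$$ r = \frac{\frac{1}{n}\sum (X - \bar{X})(Y - \bar{Y})}{\sigma_X \sigma_Y} = \frac{\frac{1}{n}\sum (XY - \bar{X}Y - X\bar{Y} + \bar{X}\bar{Y})}{\sigma_X \sigma_Y} $$

$$ = \frac{\frac{\sum XY}{n} - \frac{\sum X}{n} \times \frac{\sum Y}{n}}{\sqrt{\frac{\sum X^2}{n} - \left(\frac{\sum X}{n}\right)^2}\,\sqrt{\frac{\sum Y^2}{n} - \left(\frac{\sum Y}{n}\right)^2}} \qquad (2) $$

$$ = \frac{E[XY] - E[X]\,E[Y]}{\sqrt{E[X^2] - (E[X])^2}\,\sqrt{E[Y^2] - (E[Y])^2}} \qquad (3) $$

Equations (2) and (3) are alternate forms of equation (1). They have the
advantage that we don’t have to subtract each value from the mean.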
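That the deviation form (1) and the raw-sum form (2) agree can be checked with a short sketch. The function names and the six data pairs below are assumptions made for illustration.

```python
import math

def r_definition(xs, ys):
    """Equation (1): covariance over the product of standard deviations."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    return cov / (sd_x * sd_y)

def r_raw_sums(xs, ys):
    """Equation (2): uses only raw sums, no deviations from the mean."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = sum_xy / n - (sum_x / n) * (sum_y / n)
    denominator = (math.sqrt(sum_x2 / n - (sum_x / n) ** 2)
                   * math.sqrt(sum_y2 / n - (sum_y / n) ** 2))
    return numerator / denominator

# Hypothetical paired observations.
xs = [2.5, 3.1, 4.7, 5.0, 6.3, 7.8]
ys = [11.0, 12.4, 15.1, 14.9, 18.2, 21.0]

r1 = r_definition(xs, ys)
r2 = r_raw_sums(xs, ys)
print(f"equation (1): {r1:.6f}, equation (2): {r2:.6f}")
```

The two results match up to floating-point rounding. Form (2) needs only running totals, which is convenient for hand calculation, although for values that are large relative to their spread the subtraction of nearly equal sums can lose precision on a computer.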
Example: The data on advertisement expenditure (X) and sales (Y)
of a company for the past 10 years are given below. Determine the
correlation coefficient between these variables and comment on the
correlation.

X    50   50   50   40   30   20   20   15   10    5
Y   700  650  600  500  450  400  300  250  210  200

Solution:

$$ \bar{X} = \frac{\sum X}{n} = \frac{290}{10} = 29, \qquad \bar{Y} = \frac{\sum Y}{n} = \frac{4260}{10} = 426 $$

In the table below, $x = X - \bar{X}$ and $y = Y - \bar{Y}$.

S.No.     X      Y      x      y     x²       y²      xy
1        50    700     21    274    441    75076    5754
2        50    650     21    224    441    50176    4704
3        50    600     21    174    441    30276    3654
4        40    500     11     74    121     5476     814
5        30    450      1     24      1      576      24
6        20    400     -9    -26     81      676     234
7        20    300     -9   -126     81    15876    1134
8        15    250    -14   -176    196    30976    2464
9        10    210    -19   -216    361    46656    4104
10        5    200    -24   -226    576    51076    5424
Total   290   4260      0      0   2740   306840   28310

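Now,

$$ r = \frac{\frac{1}{n}\sum (X - \bar{X})(Y - \bar{Y})}{\sigma_X \sigma_Y} = \frac{\frac{1}{n}\sum xy}{\sqrt{\frac{\sum x^2}{n}}\,\sqrt{\frac{\sum y^2}{n}}} = \frac{\sum xy}{\sqrt{\sum x^2 \sum y^2}} $$

$$ r = \frac{28310}{\sqrt{2740 \times 306840}} = 0.976 $$

This value of Karl Pearson’s coefficient, r = 0.976, indicates a high
degree of positive association between the variables X and Y.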
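The column totals and the final coefficient of this example can be reproduced with a few lines of code. The variable names are chosen here for illustration; the data are those of the example.

```python
import math

X = [50, 50, 50, 40, 30, 20, 20, 15, 10, 5]
Y = [700, 650, 600, 500, 450, 400, 300, 250, 210, 200]
n = len(X)

mean_x = sum(X) / n  # 29.0
mean_y = sum(Y) / n  # 426.0

# Deviations from the means: the x and y columns of the table.
x = [xi - mean_x for xi in X]
y = [yi - mean_y for yi in Y]

sum_x2 = sum(d * d for d in x)               # total of the x² column
sum_y2 = sum(d * d for d in y)               # total of the y² column
sum_xy = sum(a * b for a, b in zip(x, y))    # total of the xy column

r = sum_xy / math.sqrt(sum_x2 * sum_y2)
print(sum_x2, sum_y2, sum_xy, round(r, 3))
```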


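Effect of shifting origin and change of scale on correlation coefficient
The values of $\bar{X}$ and $\bar{Y}$ may not be integers. In such a case, the calculations
become tedious. We can expand the formula as,

$$ r = \frac{\frac{1}{n}\sum (X - \bar{X})(Y - \bar{Y})}{\sigma_X \sigma_Y} = \frac{\sum XY - \frac{1}{n}\sum X \sum Y}{\sqrt{\sum X^2 - \frac{1}{n}\left(\sum X\right)^2}\,\sqrt{\sum Y^2 - \frac{1}{n}\left(\sum Y\right)^2}} $$
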
Further simplification in computations can be achieved by calculating
the deviations of the observations from an assumed mean rather than
the actual mean, and also scaling these deviations conveniently.
Here we use the property that the correlation coefficient does not change
with a shift of origin, i.e. adding or subtracting any constant from
the two variables (X, Y) leaves the correlation coefficient unchanged. It also
remains unchanged if we change the scales by dividing or multiplying
the variables by positive constants. Let X and Y be the two variables with
values x1, x2, ...., xn and y1, y2, ...., yn. Let us define another two variables
obtained by transformation as,

$$ U = \frac{X - a}{g} \quad \text{and} \quad V = \frac{Y - b}{h} $$

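where a, b, g and h are constants, with g and h positive.
In this case, we have defined the variables U and V through a shift of origin
from (0, 0) to (a, b) and a change of the X and Y scales by factors ‘g’ and
‘h’ respectively. Thus for every observation pair (xi, yi) there is a
corresponding pair (ui, vi) such that,

$$ u_i = \frac{x_i - a}{g} \quad \text{and} \quad v_i = \frac{y_i - b}{h} $$

Now,
$$ \bar{X} = \frac{\sum x_i}{n} = \frac{\sum (g \times u_i + a)}{n} = \frac{g \times \sum u_i + n \times a}{n} = g\bar{U} + a $$
Similarly,
$$ \bar{Y} = h\bar{V} + b $$
Now,
$$ x_i - \bar{X} = (g \times u_i + a) - (g\bar{U} + a) = g(u_i - \bar{U}) $$
and
$$ y_i - \bar{Y} = h(v_i - \bar{V}) $$
Hence,
$$ \sigma_X^2 = \frac{\sum (x_i - \bar{X})^2}{n} = \frac{g^2 \times \sum (u_i - \bar{U})^2}{n} = g^2 \sigma_U^2 $$
and
$$ \sigma_Y^2 = h^2 \sigma_V^2 $$
Now,
$$ r_{XY} = \frac{\frac{1}{n}\sum (x_i - \bar{X})(y_i - \bar{Y})}{\sigma_X \sigma_Y} = \frac{\sum g \times (u_i - \bar{U}) \times h \times (v_i - \bar{V})}{n \times (g \times \sigma_U)(h \times \sigma_V)} = \frac{\frac{1}{n}\sum (u_i - \bar{U})(v_i - \bar{V})}{\sigma_U \sigma_V} = r_{UV} $$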
This result is very useful for manual calculations. We can select
arbitrary constants a, b, g and h (with g and h positive) so as to simplify the
data and then find rUV, which equals rXY. Thus, if any constant is added to or
subtracted from the variables, or the variables are multiplied or divided by
any positive constant, the correlation coefficient between the two variables
does not change.
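The invariance property can be verified numerically, along with one caveat: if exactly one of the scale factors is negative, the magnitude of r is unchanged but its sign flips. The data and the helper names `pearson_r` and `transform` below are assumptions made for illustration.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def transform(values, shift, scale):
    """Shift the origin to `shift` and divide by `scale`."""
    return [(v - shift) / scale for v in values]

# Hypothetical paired observations.
X = [3, 7, 8, 12, 15]
Y = [10, 14, 19, 21, 30]

r_xy = pearson_r(X, Y)
# Positive scale factors: the coefficient is unchanged.
r_uv = pearson_r(transform(X, 8, 2), transform(Y, 20, 5))
# One negative scale factor: same magnitude, opposite sign.
r_flipped = pearson_r(transform(X, 8, -2), transform(Y, 20, 5))

print(f"{r_xy:.4f} {r_uv:.4f} {r_flipped:.4f}")
```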
Example: The data on advertisement expenditure (X) and sales (Y)
of a company for the past 10 years are given below. Determine the
correlation coefficient between these variables and comment on the
correlation.

X    50   50   50   40   30   20   20   15   10    5
Y   700  650  600  500  450  400  300  250  210  200

Solution: We shall take U to be the deviation of the X values from the
assumed mean of 30, divided by 5. Similarly, V represents the deviation
of the Y values from the assumed mean of 400, divided by 10.

Short cut procedure for calculation of correlation coefficient

Sl. No.   X = xi   Y = yi   U = ui   V = vi   ui vi   ui²    vi²
1           50      700       4       30      120     16     900
2           50      650       4       25      100     16     625
3           50      600       4       20       80     16     400
4           40      500       2       10       20      4     100
5           30      450       0        5        0      0      25
6           20      400      -2        0        0      4       0
7           20      300      -2      -10       20      4     100
8           15      250      -3      -15       45      9     225
9           10      210      -4      -19       76     16     361
10           5      200      -5      -20      100     25     400
Total                        -2       26      561    110    3136

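$$ r = \frac{\sum_{i=1}^{n} u_i v_i - \frac{1}{n}\sum_{i=1}^{n} u_i \sum_{i=1}^{n} v_i}{\sqrt{\sum_{i=1}^{n} u_i^2 - \frac{1}{n}\left(\sum_{i=1}^{n} u_i\right)^2}\,\sqrt{\sum_{i=1}^{n} v_i^2 - \frac{1}{n}\left(\sum_{i=1}^{n} v_i\right)^2}} $$

$$ = \frac{561 - \frac{(-2)(26)}{10}}{\sqrt{110 - \frac{4}{10}}\,\sqrt{3136 - \frac{676}{10}}} = \frac{561 + 5.2}{\sqrt{109.6}\,\sqrt{3068.4}} = 0.976 $$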
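The short-cut calculation can be reproduced as follows, using the assumed means and scale factors of the example (a = 30, g = 5, b = 400, h = 10). The variable names are chosen here for illustration.

```python
import math

X = [50, 50, 50, 40, 30, 20, 20, 15, 10, 5]
Y = [700, 650, 600, 500, 450, 400, 300, 250, 210, 200]
n = len(X)

a, g = 30, 5     # assumed mean and scale factor for X
b, h = 400, 10   # assumed mean and scale factor for Y

U = [(x - a) / g for x in X]
V = [(y - b) / h for y in Y]

# Column totals of the short-cut table.
sum_u = sum(U)
sum_v = sum(V)
sum_uv = sum(u * v for u, v in zip(U, V))
sum_u2 = sum(u * u for u in U)
sum_v2 = sum(v * v for v in V)

numerator = sum_uv - sum_u * sum_v / n
denominator = (math.sqrt(sum_u2 - sum_u ** 2 / n)
               * math.sqrt(sum_v2 - sum_v ** 2 / n))
r = numerator / denominator
print(sum_u, sum_v, sum_uv, sum_u2, sum_v2, round(r, 3))
```

The result agrees with the value obtained earlier from deviations about the actual means, as the invariance property predicts.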

Correlation of Grouped Data

Many times the observations are grouped into a ‘two way’ frequency
distribution table. This is called a bivariate frequency distribution.
It is a matrix in which the rows are grouped for one variable and the columns
are grouped for the other. Each cell (i, j) holds the frequency,
or count, of the observations that fall in both the i-th class of one variable
and the j-th class of the other. In this case the correlation coefficient is
given by:

$$ r = \frac{\sum f \times m_x \times m_y - \frac{1}{n}\sum (f \times m_x) \sum (f \times m_y)}{\sqrt{\sum f \times m_x^2 - \frac{(\sum f \times m_x)^2}{n}}\,\sqrt{\sum f \times m_y^2 - \frac{(\sum f \times m_y)^2}{n}}} $$

where $m_x$ and $m_y$ are the class marks of the frequency distributions of X
and Y, n is the total frequency, the sums involving only $m_x$ or only $m_y$ use
the marginal frequencies fx and fy of X and Y, and the sum involving
$m_x \times m_y$ uses the joint frequencies fxy. As explained earlier, to make
the calculations easier, we can use the property that shifting the origin
and changing the scale do not affect the correlation coefficient. Hence we
could use the transformation,

$$ d_x = \frac{m_x - a}{g} \quad \text{and} \quad d_y = \frac{m_y - b}{h} $$

This is explained in the following example.
Example: Calculate the coefficient of correlation for the following data.
The columns are the classes of X and the rows are the classes of Y.

Y \ X       0-500   500-1000   1000-1500   1500-2000   2000-2500   Total
0-200         12        6          -           -           -         18
200-400        2       18          4           2           1         27
400-600        -        4          7           3           -         14
600-800        -        1          -           2           1          4
800-1000       -        -          1           2           3          6
Total         14       29         12           9           5         69
Solution: Let the assumed mean for X be a = 1250 and the scaling
factor g = 500. Therefore, we can calculate f × dx and f × dx² from the
marginal distribution of X as,

X Class      Class mark mx   dx = (mx − a)/g   Frequency f   f × dx   f × dx²
0-500             250              -2               14         -28       56
500-1000          750              -1               29         -29       29
1000-1500        1250               0               12           0        0
1500-2000        1750               1                9           9        9
2000-2500        2250               2                5          10       20
Total                                               69         -38      114


Similarly, let the assumed mean for Y be b = 500 and the scaling
factor h = 200. Therefore, we can calculate f × dy and f × dy² from the
marginal distribution of Y as,

Y Class      Class mark my   dy = (my − b)/h   Frequency f   f × dy   f × dy²
0-200             100              -2               18         -36       72
200-400           300              -1               27         -27       27
400-600           500               0               14           0        0
600-800           700               1                4           4        4
800-1000          900               2                6          12       24
Total                                               69         -47      127
From the values of dx, dy and the joint frequencies given in the table, we can
find the value (cells with zero frequency, or with dx = 0 or dy = 0, contribute
nothing and are omitted),

$$ \sum f \times d_x \times d_y = (-2)(-2)(12) + (-1)(-2)(6) + (-2)(-1)(2) + (-1)(-1)(18) + (-1)(1)(2) + (-1)(2)(1) + (1)(-1)(1) + (1)(1)(2) + (1)(2)(1) + (2)(1)(2) + (2)(2)(3) $$

$$ = 48 + 12 + 4 + 18 - 2 - 2 - 1 + 2 + 2 + 4 + 12 = 97 $$

Hence,

$$ r = \frac{\sum f \times d_x \times d_y - \frac{1}{n}\sum (f \times d_x) \sum (f \times d_y)}{\sqrt{\sum f \times d_x^2 - \frac{(\sum f \times d_x)^2}{n}}\,\sqrt{\sum f \times d_y^2 - \frac{(\sum f \times d_y)^2}{n}}} $$

$$ = \frac{97 - \frac{1}{69} \times (-38)(-47)}{\sqrt{114 - \frac{1}{69} \times (-38)^2}\,\sqrt{127 - \frac{1}{69} \times (-47)^2}} = \frac{71.1159}{9.647 \times 9.746} = 0.76 $$
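The grouped-data calculation can be reproduced from the joint frequency table. In the sketch below the table is stored with one row per Y class and one column per X class; this layout and the variable names are choices made for illustration.

```python
import math

x_marks = [250, 750, 1250, 1750, 2250]  # class marks of X (columns)
y_marks = [100, 300, 500, 700, 900]     # class marks of Y (rows)

# Joint frequencies: one row per Y class, one column per X class.
freq = [
    [12, 6, 0, 0, 0],
    [2, 18, 4, 2, 1],
    [0, 4, 7, 3, 0],
    [0, 1, 0, 2, 1],
    [0, 0, 1, 2, 3],
]

a, g = 1250, 500  # assumed mean and scale factor for X
b, h = 500, 200   # assumed mean and scale factor for Y
dx = [(m - a) / g for m in x_marks]
dy = [(m - b) / h for m in y_marks]

n = sum(sum(row) for row in freq)
fx = [sum(row[j] for row in freq) for j in range(len(x_marks))]  # column totals
fy = [sum(row) for row in freq]                                  # row totals

sum_fdx = sum(f * d for f, d in zip(fx, dx))
sum_fdx2 = sum(f * d * d for f, d in zip(fx, dx))
sum_fdy = sum(f * d for f, d in zip(fy, dy))
sum_fdy2 = sum(f * d * d for f, d in zip(fy, dy))
sum_fdxdy = sum(freq[i][j] * dx[j] * dy[i]
                for i in range(len(y_marks)) for j in range(len(x_marks)))

r = (sum_fdxdy - sum_fdx * sum_fdy / n) / (
    math.sqrt(sum_fdx2 - sum_fdx ** 2 / n)
    * math.sqrt(sum_fdy2 - sum_fdy ** 2 / n))
print(n, sum_fdx, sum_fdy, sum_fdxdy, round(r, 2))
```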

6.5.1 ASSUMPTIONS UNDERLYING KARL PEARSON’S CORRELATION COEFFICIENT

The assumptions underlying Karl Pearson’s correlation coefficient
are as follows:
• Your data on both variables are measured on either an Interval
Scale or a Ratio Scale. Interval Scales have equal intervals
between points on the scale but they do not have a true zero
point. Ratio Scales have both equal intervals between points on
the scale and a true zero point.
• The traits you are measuring are normally distributed in the
population. In other words, even though the data in your sample
may not be normally distributed (if you plot them in a histogram
they do not form a bell-shaped curve), you are pretty sure that
if you could collect data from the entire population the results
would be normally distributed.
• The relationship, if there is any, between the two variables
is best characterized by a straight line. This is called a “linear
relationship”. The best way to check this is to plot the variables
on a scatter plot and see if there is a clear trend from lower left to
upper right (a positive relationship) or from upper left to
lower right (a negative relationship). If the relationship seems
to change direction somewhere in the scatter plot, you
do not have a linear relationship. Instead, it is
curvilinear, and Pearson’s r is not the best type of correlation
coefficient to use. There are others, but they are beyond the
scope of this book and will not be discussed. Mild departures
from linearity are tolerable, although there is no precise threshold
for how much is too much.
• Homoscedasticity: A fancy term that says the scores on the Y variable
have roughly the same spread (variance) at each value of the X variable.
Again, one of the easiest ways to assess homoscedasticity is to plot
the variables on a scatter plot and make sure the “spread” of the dots
is approximately equal along the entire length of the distribution.

6.5.2 INTERPRETATION OF R
The correlation coefficient r ranges from −1 to 1. A value of 1 implies
that a linear equation describes the relationship between X and Y
perfectly, with all data points lying on a line for which Y increases
as X increases. A value of −1 implies that all data points lie on a line
for which Y decreases as X increases. A value of 0 implies that there
is no linear correlation between the variables.
More generally, note that $(X_i - \bar{X})(Y_i - \bar{Y})$ is positive if and only
if Xi and Yi lie on the same side of their respective means. Thus the
correlation coefficient is positive if Xi and Yi tend to be simultaneously
greater than, or simultaneously less than, their respective means.
• The correlation coefficient is negative if Xi and Yi tend to lie on
opposite sides of their respective means.
• The coefficient of correlation r lies between –1 and +1, inclusive
of those values.
• When r is positive, the variables x and y increase or decrease
together.
• r = +1 implies that there is a perfect positive correlation between
variables x and y.
• When r is negative, the variables x and y move in opposite
directions.
• When r = –1, there is a perfect negative correlation.
• When r = 0, the two variables are linearly uncorrelated.
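
The sign rule described above (products of deviations from the means) can be sketched in Python. This is a minimal illustration, not a prescribed implementation: the function name `pearson_r` and the small data sets are made up for the example.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r computed from deviations about the two means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Each product is positive when x and y lie on the same side of their means
    sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sum_xx = sum((x - mean_x) ** 2 for x in xs)
    sum_yy = sum((y - mean_y) ** 2 for y in ys)
    return sum_xy / math.sqrt(sum_xx * sum_yy)

# Hypothetical data: y rises exactly in step with x, then falls in step with x
x = [1, 2, 3, 4, 5]
r_up = pearson_r(x, [2, 4, 6, 8, 10])    # perfect positive correlation
r_down = pearson_r(x, [10, 8, 6, 4, 2])  # perfect negative correlation
print(r_up, r_down)
```

The common factor 1/n cancels between numerator and denominator, so the sketch works with plain sums of deviations.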

6.5.3 ESTIMATION OF PROBABLE ERROR


The probable error (P. E.) is used to judge the reliability of Karl
Pearson’s coefficient of correlation ‘r’, because the value of ‘r’
computed from a sample depends on random sampling and its conditions.
It is given by

$$ \text{P.E.} = 0.6745 \times \frac{1 - r^2}{\sqrt{n}} $$

If the value of r is less than P. E., then there is no evidence of correlation,
i.e. r is not significant.
If r is more than 6 times the P. E., ‘r’ is practically certain, i.e. significant.
By adding P. E. to ‘r’ and subtracting it from ‘r’, we get the upper and lower
limits within which the correlation coefficient of the population can be
expected to lie.

Symbolically, ρ = r ± P. E.
where ρ = correlation coefficient of the population.
Example: If r = 0.6 and n = 64, find the probable error of the
coefficient of correlation.

Solution:

$$ \text{P.E.} = 0.6745 \times \frac{1 - r^2}{\sqrt{n}} = 0.6745 \times \frac{1 - (0.6)^2}{\sqrt{64}} = 0.6745 \times \frac{0.64}{8} = 0.054 $$

Since r = 0.6 is more than 6 × P. E. = 0.324, r is significant.
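
As a cross-check, the probable error formula P. E. = 0.6745 (1 − r²)/√n and the two decision rules stated above can be sketched in Python. The function names and the label "inconclusive" for the range between the two rules are assumptions made for this illustration; the text itself only defines the two extreme cases.

```python
import math

def probable_error(r, n):
    """Probable error of r for a sample of n pairs."""
    return 0.6745 * (1 - r ** 2) / math.sqrt(n)

def significance(r, n):
    """Apply the two rules from the text; anything in between is left open."""
    pe = probable_error(r, n)
    if abs(r) < pe:
        return "not significant"
    if abs(r) > 6 * pe:
        return "significant"
    return "inconclusive"  # assumed label, not defined in the text

pe = probable_error(0.6, 64)
print(round(pe, 3))            # 0.054
print(significance(0.6, 64))   # significant
```

For r = 0.6 and n = 64 the formula evaluates to about 0.054, so the limits r ± P. E. are roughly 0.546 and 0.654.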

Fill in the blanks:

13. The correlation ................... measures the degree of association
between two variables X and Y.
14. The expression $\frac{1}{n}\sum (X - \bar{X})(Y - \bar{Y})$ is known as a ...................
between the variables X and Y.
15. Correlation coefficient does not change with shifting of
................... i.e. by adding or subtracting any constant from the
two variables (X, Y) correlation coefficient remains same.
16. If the value of r is ................... than P. E., then there is no
evidence of correlation i.e. r is not significant.
17. If r is ................... than 6 times the P. E. ‘r’ is practically certain
i.e. significant.

Suppose you are analysing data to understand whether there is any
correlation between 5 different product categories of FMCG products
in terms of customers’ buying attitude. Collect the data and apply
Karl Pearson’s correlation coefficient to find the correlation between
the 5 product categories.

The coefficient of determination, r², is useful because it gives the
proportion of the variance (fluctuation) of one variable that is
predictable from the other variable. It is a measure that allows us
to determine how certain one can be in making predictions from a
certain model/graph.

6.6 RANK CORRELATION METHOD


Quite often the data are available in the form of rankings for different
variables. There are also occasions where it is difficult to measure the
cause-effect variables. For example, while selecting a candidate, there
are a number of factors on which the experts base their assessment. It
is not possible to measure many of these parameters in physical units,
e.g. sincerity, loyalty, integrity, tactfulness, initiative, etc. The same is
true of beauty contests. However, in these cases the experts may
rank the candidates. It is then necessary to find out whether the two
sets of ranks are in agreement with each other. This is measured by the
Rank Correlation Coefficient. The purpose of computing a correlation
coefficient in such situations is to determine the extent to which the
two sets of rankings are in agreement. The coefficient that is determined
from these ranks is known as Spearman’s rank coefficient, rs.
This is defined by the following formula:

$$ r_S = 1 - \frac{6 \sum_{i=1}^{n} d_i^2}{n(n^2 - 1)} $$

Where, n = Number of observation pairs
di = Xi – Yi
Xi = Values (ranks) of variable X and Yi = values (ranks) of variable Y

6.6.1 RANK CORRELATION WHEN RANKS ARE GIVEN

Example: Ranks obtained by a set of ten students in a mathematics
test (variable X) and a physics test (variable Y) are shown below:

Rank for Variable X   1   2   3   4   5   6   7   8   9   10
Rank for Variable Y   3   1   4   2   6   9   8   10  5   7

Determine the coefficient of rank correlation, rs.

Solution: The computation of Spearman’s rank correlation is shown
below:

Individual   Rank in Maths (X = xi)   Rank in Physics (Y = yi)   di = xi – yi   di²
1            1                        3                          -2             4
2            2                        1                          +1             1
3            3                        4                          -1             1
4            4                        2                          +2             4
5            5                        6                          -1             1
6            6                        9                          -3             9
7            7                        8                          -1             1
8            8                        10                         -2             4
9            9                        5                          +4             16
10           10                       7                          +3             9
Total                                                                           50
Now, n = 10 and $\sum_{i=1}^{n} d_i^2 = 50$.

Using the formula,

$$ r_S = 1 - \frac{6 \sum_{i=1}^{n} d_i^2}{n(n^2 - 1)} = 1 - \frac{6 \times 50}{10(100 - 1)} = 0.697 $$

We can say that there is a high degree of correlation between the
performance in mathematics and physics.
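
The same calculation can be written as a short Python sketch. The function name `spearman_from_ranks` is illustrative only; the ranks are those of the ten students in the example, and the sketch assumes there are no tied ranks.

```python
def spearman_from_ranks(rank_x, rank_y):
    """Spearman's rank coefficient for two lists of untied ranks."""
    n = len(rank_x)
    sum_d2 = sum((rx - ry) ** 2 for rx, ry in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

maths = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
physics = [3, 1, 4, 2, 6, 9, 8, 10, 5, 7]

r_s = spearman_from_ranks(maths, physics)
print(round(r_s, 3))  # 0.697
```

Because only the squared differences enter the formula, the sign convention chosen for di does not affect the result.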

6.6.2 RANK CORRELATION WHEN RANKS ARE NOT GIVEN


Example: Find the rank correlation coefficient for the following data.

X:   75   88   95   70   60   80   81   50
Y:  120  134  150  115  110  140  142  100

Solution: Let R1 and R2 denote the ranks in X and Y respectively.

X     Y     R1   R2   d = R1 – R2   d²
75    120   5    5     0            0
88    134   2    4    -2            4
95    150   1    1     0            0
70    115   6    6     0            0
60    110   7    7     0            0
80    140   4    3     1            1
81    142   3    2     1            1
50    100   8    8     0            0
Total                               6

Coefficient of rank correlation:

$$ r_S = 1 - \frac{6 \sum d^2}{n(n^2 - 1)} = 1 - \frac{6 \times 6}{8(64 - 1)} = +0.93 $$

In this method the biggest item gets the first rank, the next biggest
second rank and so on.

Example: Calculate the coefficient of rank correlation of the following
data:

X:   87   22   35   75   37
Y:   29   63   52   46   48

Solution:

X     Y     R1   R2   d = R1 – R2   d²
87    29    1    5    -4            16
22    63    5    1     4            16
35    52    4    2     2            4
75    46    2    4    -2            4
37    48    3    3     0            0
Total                               40

Coefficient of rank correlation:

$$ r_S = 1 - \frac{6 \sum d^2}{n(n^2 - 1)} = 1 - \frac{6 \times 40}{5 \times 24} = -1 $$

This shows a perfect negative (inverse) correlation.
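
When only raw values are given, the ranking step can be automated. The sketch below is a minimal illustration that assumes no tied values; the helper names `descending_ranks` and `spearman` are made up for the example, and the data are the five (X, Y) pairs from the example above.

```python
def descending_ranks(values):
    """Rank 1 for the biggest item, 2 for the next biggest, and so on.

    Assumes all values are distinct (no ties).
    """
    ordered = sorted(values, reverse=True)
    return [ordered.index(v) + 1 for v in values]

def spearman(xs, ys):
    """Spearman's rank coefficient from raw, untied values."""
    r1, r2 = descending_ranks(xs), descending_ranks(ys)
    n = len(xs)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(r1, r2))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

x = [87, 22, 35, 75, 37]
y = [29, 63, 52, 46, 48]

print(descending_ranks(x))  # [1, 5, 4, 2, 3]
print(descending_ranks(y))  # [5, 1, 2, 4, 3]
print(spearman(x, y))       # -1.0
```

Ranking both series in the same direction (biggest first here) is what matters; ranking both smallest first would give the same coefficient.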

6.6.3 RANK CORRELATION WHEN EQUAL RANKS ARE GIVEN


When two or more items have the same rank, a correction has to be
applied to $\sum d_i^2$. For example, if the ranks of X are 1, 2, 3, 3, 5, …,
showing that there are two items with the same 3rd rank and that the fourth
rank is skipped, then instead of writing 3 we write 3½ for both. Thus
the sum of these ranks, which is 7 (3 + 4 = 3½ + 3½ = 7), remains the same,
keeping the mean of the ranks unaffected. But in such cases the standard
deviation is affected. Therefore, a correction is required for the Rank
Correlation Coefficient. For this, $\sum d_i^2$ is increased by $\frac{m^3 - m}{12}$ for
each tie, where m is the number of items in that tie. If there is more
than one group of items with a common rank, this correction factor is
added once for each group.
Example: Twelve salesmen are ranked for efficiency and length of
service as below:

Salesman                A   B   C   D   E   F   G   H   I   J    K    L
Efficiency (X)          1   2   3   4   4   4   7   8   9   10   11   12
Length of Service (Y)   2   1   5   3   9   7   7   6   4   11   10   11

Find the value of Spearman’s Rank Coefficient.

Solution:
The computation of Spearman’s rank correlation is shown below:

Individual   Efficiency (X = xi)   Length of Service (Y = yi)   di = xi – yi   di²
A            1                     2                            -1             1
B            2                     1                             1             1
C            3                     5                            -2             4
D            (4+5+6)/3 = 5         3                             2             4
E            (4+5+6)/3 = 5         9                            -4             16
F            (4+5+6)/3 = 5         (7+8)/2 = 7.5                -2.5           6.25
G            7                     (7+8)/2 = 7.5                -0.5           0.25
H            8                     6                             2             4
I            9                     4                             5             25
J            10                    (11+12)/2 = 11.5             -1.5           2.25
K            11                    10                            1             1
L            12                    (11+12)/2 = 11.5              0.5           0.25
Total                                                                          65

Now, n = 12 and $\sum_{i=1}^{n} d_i^2 = 65$.

Using the formula,

$$ r_S = 1 - \frac{6 \left\{ \sum_{i=1}^{n} d_i^2 + \frac{1}{12}(3^3 - 3) + \frac{1}{12}(2^3 - 2) + \frac{1}{12}(2^3 - 2) \right\}}{n(n^2 - 1)} = 1 - \frac{6 \times \{65 + 2 + 0.5 + 0.5\}}{12(144 - 1)} = 0.762 $$

We can conclude that there is a high degree of correlation between
efficiency and length of service.
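
The tie handling used in this example (average ranks plus the (m³ − m)/12 correction for each tied group) can be sketched in Python. The helper names below are illustrative assumptions; the inputs are the ranks of the twelve salesmen as given, with ties still unresolved.

```python
from collections import Counter

def average_ranks(values):
    """Give tied values the mean of the positions they occupy (ascending)."""
    positions = {}
    for pos, v in enumerate(sorted(values), start=1):
        positions.setdefault(v, []).append(pos)
    return [sum(positions[v]) / len(positions[v]) for v in values]

def tie_correction(values):
    """Sum of (m^3 - m) / 12 over groups of m equal values (0 when m = 1)."""
    return sum((m ** 3 - m) / 12 for m in Counter(values).values())

def spearman_with_ties(xs, ys):
    n = len(xs)
    rx, ry = average_ranks(xs), average_ranks(ys)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    adjusted = sum_d2 + tie_correction(xs) + tie_correction(ys)
    return 1 - 6 * adjusted / (n * (n ** 2 - 1))

efficiency = [1, 2, 3, 4, 4, 4, 7, 8, 9, 10, 11, 12]
service = [2, 1, 5, 3, 9, 7, 7, 6, 4, 11, 10, 11]

print(average_ranks(service))  # tied 7s become 7.5, tied 11s become 11.5
print(round(spearman_with_ties(efficiency, service), 3))  # 0.762
```

Here the corrections are 2 for the three-way tie in efficiency and 0.5 for each two-way tie in length of service, giving 65 + 3 = 68 in the numerator.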
Example: An investigation was conducted by a company on the value of
educational and aptitude tests as assessment methods for recruiting
employees. It is the present practice of the company to give recruits
such tests when they apply for posts. The following data give the
educational and aptitude test scores, together with the assessment score
given by the Personnel department for their ability one year after joining
the company. 1 is a low score and 20 is a high score.

Employee   Educational test   Aptitude test   Assessment by officer
A          9                  17              12
B          10                 14              14
C          15                 12              16
D          14                 13              15
E          16                 10              17
F          11                 15              10
G          12                 12              11
H          17                 16              18

• Rank each set of the data
• Calculate appropriate rank correlation coefficients
Solution: Let X denote the score in the educational test, Y the score in
the aptitude test and Z the assessment by the Personnel office.
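Here d1 = Rx – Rz and d2 = Ry – Rz.

Employee   X    Y    Z    Rx   Ry    Rz   d1   d2     d1²   d2²
A          9    17   12   8    1     6     2   -5     4     25
B          10   14   14   7    4     5     2   -1     4     1
C          15   12   16   3    6.5   3     0    3.5   0     12.25
D          14   13   15   4    5     4     0    1     0     1
E          16   10   17   2    8     2     0    6     0     36
F          11   15   10   6    3     8    -2   -5     4     25
G          12   12   11   5    6.5   7    -2   -0.5   4     0.25
H          17   16   18   1    2     1     0    1     0     1
Total                                                 16    101.5

Rank correlation between educational test and assessment:

$$ r_S(X, Z) = 1 - \frac{6 \sum d_1^2}{N(N^2 - 1)} = 1 - \frac{6 \times 16}{8 \times 63} = 0.81 $$

Rank correlation between aptitude test and assessment. Ry contains one tie
of m = 2 items, so the correction is m(m² − 1)/12 = 0.5:

$$ r_S(Y, Z) = 1 - \frac{6 \left\{ \sum d_2^2 + \sum \frac{m(m^2 - 1)}{12} \right\}}{N(N^2 - 1)} = 1 - \frac{6 \times (101.5 + 0.5)}{8 \times 63} = -0.214 $$

The rank correlation coefficient between educational test and
assessment score is positive and high, and therefore a high educational
test score will correspond to high ability in performance of the job.
The coefficient between aptitude test and assessment score is negative
and low, so the aptitude test is a poor indicator of later performance.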

Fill in the blanks:

18. The coefficient that is determined from these ranks is known
as ................... rank coefficient, rs.
19. When two or more items have the same rank, a correction has
to be applied to ................... .

Collect the marks of all the students of your class in any
two subjects. Convert them into ranks and find the rank correlation
between the two subjects.

6.7 CORRELATION COEFFICIENT USING CONCURRENT DEVIATION
This is the easiest method of finding the correlation between two
variables. Although the method is effective in giving the direction of
the correlation as positive or negative, it fails to give the accurate
strength of the correlation. In this method we mark the change in each
data series from one observation to the next as increasing (+),
decreasing (–) or equal (=). Then we count the number of items for which
both series increase, decrease or remain equal concurrently and denote
this count as c. The correlation coefficient is then calculated as

$$ r = \pm \sqrt{\pm \left( \frac{2c - n}{n} \right)} $$

Where, n = total number of pairs of deviations (one less than the number
of observations)
c = Number of concurrent changes
Example: The data on advertisement expenditure (X) and sales (Y)
of a company for the past 10 years are given below. Determine the
correlation coefficient between these variables and comment on the
correlation.

X   50    50    50    40    30    20    20    15    10    5
Y   700   650   600   500   450   400   300   250   210   200

Solution:

S.No.   X    Deviation Sign   Y     Deviation Sign   Concurrent Deviation
1       50   ……               700   ……               ……
2       50   =                650   -                -
3       50   =                600   -                -
4       40   -                500   -                +
5       30   -                450   -                +
6       20   -                400   -                +
7       20   =                300   -                -
8       15   -                250   -                +
9       10   -                210   -                +
10      5    -                200   -                +
Total                                                c = 6

Therefore, with n = 9 pairs of deviations,

$$ r = \pm \sqrt{\pm \left( \frac{2c - n}{n} \right)} = + \sqrt{+ \left( \frac{2 \times 6 - 9}{9} \right)} = 0.577 $$

The result indicates that there is positive correlation between
advertisement expenditure (X) and sales (Y).
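
The counting procedure in this example can be sketched in Python. This is an illustration under stated assumptions: the function names are made up, a pair counts as concurrent when both series move the same way (both up, both down or both unchanged), and n is the number of deviation pairs, as in the worked solution.

```python
import math

def direction(delta):
    """+1 for an increase, -1 for a decrease, 0 for no change."""
    return (delta > 0) - (delta < 0)

def concurrent_deviation_r(xs, ys):
    dx = [direction(b - a) for a, b in zip(xs, xs[1:])]
    dy = [direction(b - a) for a, b in zip(ys, ys[1:])]
    n = len(dx)                                      # pairs of deviations
    c = sum(1 for a, b in zip(dx, dy) if a == b)     # concurrent changes
    inner = (2 * c - n) / n
    sign = 1 if inner >= 0 else -1                   # same sign inside and outside the root
    return sign * math.sqrt(sign * inner), c, n

x = [50, 50, 50, 40, 30, 20, 20, 15, 10, 5]
y = [700, 650, 600, 500, 450, 400, 300, 250, 210, 200]

r, c, n = concurrent_deviation_r(x, y)
print(c, n, round(r, 3))  # 6 9 0.577
```

The three rows where X is unchanged while Y falls are not concurrent, which is why c is 6 rather than 9.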

Fill in the blank:

20. We count the number of items that increase or decrease or
remain equal ................... and denote as c.

Collect the data of heights and weights of all the boys in your class.
Find the correlation coefficient using concurrent deviation method
between the variables height and weight.

1. The sign ± is selected to make the value of $\left( \frac{2c - n}{n} \right)$ positive. The
same sign is used outside the radical.
2. This method does not give the strength of correlation. The
method is ad hoc and is used only to reduce the effort of tedious
calculations.


6.8 SUMMARY
• In this chapter the concept of correlation, or the association
between two variables, has been discussed. A scatter plot of the
variables may suggest that the two variables are related, but
the value of the Pearson correlation coefficient r quantifies this
association.
• Correlation is a degree of linear association between two random
variables. We do not differentiate these two variables
as dependent and independent variables. It may be the case
that one is the cause and the other is an effect, i.e. independent and
dependent variables respectively. On the other hand, both may
depend on a third variable.
• In business, correlation analysis often helps managers to take
decisions by estimating the effects of changing the values of
decision variables such as promotion, advertising, price and production
processes on objective parameters such as costs, sales, market
share, consumer satisfaction and competitive price. The decision
becomes more objective because subjectivity is removed to a certain
extent.
• The correlation coefficient r may assume values between –1 and
1. The sign indicates whether the association is direct (+ve) or
inverse (–ve). A numerical value of r equal to unity indicates
perfect association, while a value of zero indicates no linear association.
• The correlation is said to be positive when an increase
(decrease) in the value of one variable is accompanied by an
increase (decrease) in the value of the other variable. Negative
or inverse correlation refers to the movement of the variables
in opposite directions. Correlation is said to be negative if an
increase (decrease) in the value of one variable is accompanied
by a decrease (increase) in the value of the other.
• In simple correlation the variation is between only two variables
under study and the variation is hardly influenced by any external
factor. In other words, if one of the variables remains the same, there
will not be any change in the other variable.
• In multiple correlation analysis there are two approaches
to studying the correlation: partial and total. In partial correlation,
we study the variation of two variables while excluding the effects of
other variables by keeping them under controlled conditions.
• When the amount of change in one variable tends to keep a
constant ratio to the amount of change in the other variable, the
correlation is said to be linear. But if the amount of change
in one variable does not bear a constant ratio to the amount of
change in the other variable, the correlation is said to be
non-linear.
• Correlation analysis may also be necessary to eliminate a variable
which shows low or hardly any correlation with the variable
of our interest. In statistics, there are a number of measures to
describe the degree of association between variables. These are Karl
Pearson’s Correlation Coefficient, Spearman’s rank correlation
coefficient, coefficient of determination, Yule’s coefficient of
association, coefficient of colligation, etc.
• The correlation coefficient measures the degree of association
between two variables X and Y.
• Karl Pearson’s formula for the correlation coefficient is given as

$$ r = \frac{\operatorname{Cov}(X, Y)}{\sigma_X \sigma_Y} = \frac{\frac{1}{n} \sum (X - \bar{X})(Y - \bar{Y})}{\sigma_X \sigma_Y} $$
• When the data are available as ranks, the purpose of computing a
correlation coefficient is to determine the extent to which the two sets of
rankings are in agreement. The coefficient that is determined
from these ranks is known as Spearman’s rank coefficient, rs.
This is defined by the following formula:

$$ r_S = 1 - \frac{6 \sum_{i=1}^{n} d_i^2}{n(n^2 - 1)} $$

Where, n = Number of observation pairs
    di = Xi – Yi
    Xi = Values (ranks) of variable X and Yi = values (ranks) of variable Y
• Although the concurrent deviation method is effective in giving
the direction of the correlation as positive or negative, it fails
to give the accurate strength of the correlation. In this method
we mark the change in each data series as increasing (+),
decreasing (–) or equal. Then we count the number of
items that increase, decrease or remain equal concurrently
and denote it as c. The correlation coefficient is then calculated as

$$ r = \pm \sqrt{\pm \left( \frac{2c - n}{n} \right)} $$

Where, n = total number of pairs of deviations.
c = Number of concurrent changes

• Correlation: Correlation is a degree of linear association
between two random variables. We do not differentiate these two
variables as dependent and independent variables.
• Positive Correlation: The correlation is said to be positive
when an increase (decrease) in the value of one variable is
accompanied by an increase (decrease) in the value of the other
variable.
• Negative Correlation: Correlation is said to be negative if an
increase (decrease) in the value of one variable is accompanied
by a decrease (increase) in the value of the other.
• Linear Correlation: When the amount of change in one
variable tends to keep a constant ratio to the amount of change
in the other variable, the correlation is said to be linear.
• Non-linear Correlation: When the amount of change in one variable
does not bear a constant ratio to the amount of change in the
other variable, the correlation is said to be non-linear.
• Coefficient of Correlation: The correlation coefficient
measures the degree of association between two variables X
and Y.
• Scatter Diagram: The pattern of points obtained by plotting
the observed points is known as a scatter diagram.

6.9 DESCRIPTIVE QUESTIONS


1. Define correlation. Explain the meaning with the help of an
example.
2. How can a study of correlation help managers in business?
3. What are different types of correlation? Explain with example.
4. How will you find out correlation between two variables by
scatter diagram method?
5. What are different methods of calculating correlation?
6. How do you calculate Karl Pearson’s correlation coefficient?
Give its different formulas. What is the effect of shifting origin
and change of scale on correlation coefficient?
7. What are the assumptions underlying Karl Pearson’s correlation
coefficient?
8. How do you interpret coefficient of correlation?
9. How do you calculate rank correlation coefficient when ranks are
given and when equal ranks are given? Explain with examples.
10. What is the formula for finding out correlation coefficient using
concurrent deviations?

EXERCISE FOR PRACTICE

1. Calculate the coefficient of correlation between advertisement cost
and sales as per the data given below:

Advertisement cost in ’000 ₹   39   65   62   90   82   75   25   98   36   78
Sales in Lakh ₹                47   53   58   86   62   68   60   91   51   84

2. The means and standard deviations of marks in statistics and
economics are given in the table below:

                     Marks in Statistics   Marks in Economics
Mean                 55                    48
Standard Deviation   4                     5

The correlation coefficient between marks in statistics and
economics is 0.8. Estimate the marks in
statistics of a student who scored 50 marks in economics.
3. Calculate the coefficient of correlation between X and Y as per the
data given below:
X   14   16   20   22   28   30   34   40   45
Y   97   89   68   65   56   50   37   18   12
4. Ten competitors in a beauty contest are ranked by three judges
in the following order. Determine which pair of judges has the
nearest approach to common taste in beauty.
Judge 1:   1   6   5   10   3   2    4   9    7   8
Judge 2:   3   5   8   4    7   10   2   1    6   9
Judge 3:   6   4   9   8    1   2    3   10   5   7
5. Ten candidates obtained the following marks in examinations in
Statistics and Mathematics. Find the rank correlation coefficient
to determine whether these results support the suggestion that
ability in one subject is associated with ability in the other.

Candidate    A    B    C    D    E    F    G    H    I    J
Statistics   40   65   61   49   53   42   68   57   58   46
Maths        51   58   67   55   76   45   69   56   73   63
NM

6.10 ANSWERS AND HINTS

ANSWERS FOR SELF ASSESSMENT QUESTIONS
Topic                                                              Q. No.   Answers
Introduction                                                       1.       Covariation
                                                                   2.       Degree
                                                                   3.       Correlation
Types of Correlation                                               4.       Positive
                                                                   5.       Negative
                                                                   6.       Linear
                                                                   7.       Nonlinear
Methods of Calculating Correlation                                 8.       True
                                                                   9.       True
Scatter Diagram Method                                             10.      Two
                                                                   11.      Scatter diagram
                                                                   12.      Straight
Co-variance Method – The Karl Pearson’s Correlation Coefficient    13.      Coefficient
                                                                   14.      Covariance
                                                                   15.      Origin
                                                                   16.      Less
                                                                   17.      More
Rank Correlation Method                                            18.      Spearman’s
                                                                   19.      $\sum d_i^2$
Correlation coefficient using concurrent deviation                 20.      Concurrently
HINTS FOR DESCRIPTIVE QUESTIONS
1. Refer Section 6.1
Correlation is a degree of linear association between two random
variables. In these two variables, we do not differentiate them
as dependent and independent variables. It may be the case
that one is the cause and other is an effect i.e. independent and
dependent variables respectively. On the other hand, both may
be dependent variables on a third variable.
2. Refer Section 6.1
The study of correlation helps managers in following ways:
(a) To identify relationship of various factors and decision
variables.
(b) To estimate value of one variable for a given value of other
if both are correlated. E.g. estimating sales for a given
advertising and promotion expenditure.
3. Refer Section 6.2
The correlation can be studied as positive and negative, simple
and multiple, partial and total, linear and non linear. Further the
method to study the correlation is plotting graphs on x-y axis or
by algebraic calculation of coefficient of correlation.
4. Refer Section 6.4
Scatter diagram is the most fundamental graph plotted to show
relationship between two variables. It is a simple way to represent
bivariate distribution. Bivariate distribution is the distribution of
two random variables. The two variables are plotted one on each
of the X and Y axes. Thus, every data pair (xi, yi) is represented
by a point on the graph, x being abscissa and y being the ordinate
of the point.
5. Refer Section 6.3
Correlation analysis may also be necessary to eliminate a variable
which shows low or hardly any correlation with the variable
of our interest. In statistics, there are number of measures to
describe degree of association between variables.
These are Karl Pearson’s Correlation Coefficient, Spearman’s
rank correlation coefficient, coefficient of determination, Yule’s
coefficient of association, coefficient of colligation, etc.

6. Refer Section 6.5
Karl Pearson’s formula for correlation coefficient is given as

$$ r = \frac{\operatorname{Cov}(X, Y)}{\sigma_X \sigma_Y} = \frac{\frac{1}{n} \sum (X - \bar{X})(Y - \bar{Y})}{\sigma_X \sigma_Y} $$

Where r is the ‘Correlation Coefficient’ or ‘Product Moment
Correlation Coefficient’ between X and Y. σX and σY are the
standard deviations of X and Y respectively. ‘n’ is the number of
the pairs of variables X and Y in the given data.
7. Refer Section 6.5.1
The assumptions underlying Karl Pearson’s correlation
coefficient are as follows:
(a) Your data on both variables is measured on either an Interval
Scale or a Ratio Scale.
(b) The traits you are measuring are normally distributed in the
population.
8. Refer Section 6.5.2
The correlation coefficient r ranges from −1 to 1. A value
of 1 implies that a linear equation describes the relationship
between X and Y perfectly, with all data points lying on a line for
which Y increases as X increases. A value of −1 implies that all
data points lie on a line for which Y decreases as X increases. A
value of 0 implies that there is no linear correlation between the
variables.
9. Refer Section 6.6
The purpose of computing a rank correlation coefficient is to
determine the extent to which the two sets of
rankings are in agreement. The coefficient that is determined
from these ranks is known as Spearman’s rank coefficient, rs.
This is defined by the following formula:

$$ r_S = 1 - \frac{6 \sum_{i=1}^{n} d_i^2}{n(n^2 - 1)} $$

Where, n = Number of observation pairs
di = Xi – Yi
Xi = Values (ranks) of variable X and Yi = values (ranks) of variable Y
10. Refer Section 6.7
In this method we mark the change in each data series as
increasing (+), decreasing (–) or equal. Then we count
the number of items that increase, decrease or remain equal
concurrently and denote it as c. The correlation coefficient is then
calculated as

$$ r = \pm \sqrt{\pm \left( \frac{2c - n}{n} \right)} $$

Where, n = total number of pairs of deviations.
c = Number of concurrent changes

ANSWERS FOR EXERCISE FOR PRACTICE

1. 0.78041
2. 56.28
3. -0.99863
4. The first and third judges have the nearest approach to common
taste in beauty because the coefficient of correlation is highest
between them.
5. 0.6

6.11 SUGGESTED READINGS FOR REFERENCE
SUGGESTED READINGS
• Gupta, S.P. and Gupta, M.P., Business Statistics, Sultan Chand &
Sons, New Delhi, 1987.
• Loomba, M.P., Management – A Quantitative Perspective,
MacMillan Publishing Company, New York, 1978.
• Levin, R.I., Statistics for Management, Prentice-Hall of India,
New Delhi, 1979.
• Shenoy, G.V., Srivastava, U.K. and Sharma, S.C., Quantitative
Techniques for Managerial Decision Making, Wiley Eastern, New
Delhi, 1985.
• Venkata Rao, K., Management Science, McGraw-Hill Book
Company, Singapore, 1986.
• Bhardwaj, R.S., Business Statistics, 2nd Edition, Excel Books,
New Delhi.
• Kothari, C.R., Quantitative Techniques, Vikas Publication.

E-REFERENCES
• http://www.pinkmonkey.com/
• https://www.tutorsland.com/
• http://www.jstor.org/
