MTH-520: UNIT-II

Correlation
Lecture
Introduction
Often we obtain information not on just one variable, but on two or more related variables.
For example:
Height and weight of a group of people
Temperature and incidence of pests
Size of landholding and income from farm
Plant height and seed yield per plant
Price and demand of a commodity
When we have two or more variables (a bivariate or multivariate distribution), we may be interested in:
Seeing whether a change in one variable is accompanied by a change in another variable
Quantifying the strength of the relationship between the variables (magnitude and direction)
For this, the statistical tool of correlation is used.
Correlation
If changes in one variable are accompanied by changes in another variable, the variables are said to be correlated.
If an increase in one variable is accompanied by a corresponding increase in another variable, the variables are said to be positively correlated.
e.g. Height and weight of a group of people
Investment in agriculture and agricultural production
Amount of rainfall and yield of paddy
Correlation
If an increase in one variable is accompanied by a corresponding decrease in another variable, the variables are said to be negatively or inversely correlated.
e.g. Price of goods and their demand
Mother's education and prevalence of malnutrition among children
Pest incidence and crop yield
How to spot any relationship between two variables?
By representing the data graphically: Scatter Plot
[Figure: scatter plot of height of sons (inches) against height of fathers (inches); both axes run from 64 to 73]
Scatter Plots
How to measure this relationship?
Karl Pearson's Coefficient of Correlation: it measures the strength of the linear relationship between two variables.
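The slides name the coefficient without writing it out. In its usual textbook form, for n paired observations (x_i, y_i) with sample means x-bar and y-bar, it can be written as below. The symbols n, x_i and y_i are notation assumed here for illustration; they do not appear in the slides.

```latex
r(X,Y) = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
              {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \; \sum_{i=1}^{n} (y_i - \bar{y})^2}}
```

The numerator measures how the two variables vary together, and the denominator rescales it so that the result carries no units.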
Example: Correlation coefficient between seed yield per plant and plant height of sesamum

Seed yield per plant (g) (Y)    Plant height (cm) (X)
5.22                            94.2
8.13                            69.3
6.52                            115.3
4.16                            83.3
8.98                            85.4
3.05                            68.1
3.49                            50.7
5.40                            96.2
2.39                            76.1
2.71                            52.0
3.97                            82.1
7.56                            81.3

[Figure: scatter plot for seed yield and plant height]
r = 0.3944
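As a rough check on this example, the sketch below computes Pearson's r from the twelve pairs exactly as printed, using sums of squares and products of deviations from the means. The function name `pearson_r` and the variable names are hypothetical, chosen only for this illustration. Recomputed this way, the twelve listed pairs give a value of about 0.43 rather than the 0.3944 quoted, so the quoted figure may rest on data or rounding not shown here.

```python
import math

# Data as printed in the example: plant height X (cm), seed yield per plant Y (g).
plant_height = [94.2, 69.3, 115.3, 83.3, 85.4, 68.1,
                50.7, 96.2, 76.1, 52.0, 82.1, 81.3]
seed_yield = [5.22, 8.13, 6.52, 4.16, 8.98, 3.05,
              3.49, 5.40, 2.39, 2.71, 3.97, 7.56]


def pearson_r(x, y):
    """Pearson's r from deviations about the means (hypothetical helper)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of products of deviations and the two sums of squared deviations.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


r = pearson_r(plant_height, seed_yield)
print(f"r(X, Y) = {r:.4f}")
```

Either value is positive and well below 1, which points to a weak-to-moderate positive linear relationship between plant height and seed yield.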
Limits of correlation coefficient

The value of r(X,Y) always lies between -1 and +1.
If the sign of r(X,Y) is positive => positive or direct correlation between the variables
If the sign of r(X,Y) is negative => negative or inverse correlation between the variables
If the value of r(X,Y) is close to 1 => very high (strong) positive correlation
If the value of r(X,Y) is close to -1 => very high (strong) negative correlation
If the value of r(X,Y) is close to 0 => very weak correlation
If the value of r(X,Y) = 0 => the variables are uncorrelated
Scatter Plots and Corresponding Values of r(X,Y)
[Figure: five scatter plots showing r > 0 (positive), r < 0 (negative), r = 0 (zero, uncorrelated), r = -1 (perfect negative) and r = 1 (perfect positive)]
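The cases pictured in these scatter plots can be reproduced with a few small data sets. The sketch below is only an illustration: the numbers are made up for the purpose and the helper `pearson_r` is a hypothetical name, not something taken from the slides.

```python
import math


def pearson_r(x, y):
    """Pearson's r from deviations about the means (hypothetical helper)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Made-up illustrative data, not from the lecture.
x = [1, 2, 3, 4, 5]

r_perfect_pos = pearson_r(x, [2 * v + 1 for v in x])   # points exactly on a rising line
r_perfect_neg = pearson_r(x, [10 - 3 * v for v in x])  # points exactly on a falling line
r_positive = pearson_r(x, [2, 1, 4, 3, 5])             # rising trend with some scatter
r_negative = pearson_r(x, [5, 3, 4, 1, 2])             # falling trend with some scatter

for label, value in [("perfect positive", r_perfect_pos),
                     ("perfect negative", r_perfect_neg),
                     ("positive", r_positive),
                     ("negative", r_negative)]:
    print(f"{label:>16}: r = {value:+.2f}")
```

Points lying exactly on a straight line give r = +1 or r = -1 depending on the slope's sign; adding scatter pulls the value toward 0 without changing its sign.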
Rank Correlation
Used to find the correlation between two variables where the data are in the form of ranks
Especially useful for qualitative data measured on an ordinal scale
E.g. Correlation between intelligence and honesty of a group of people
Correlation between the ranks obtained by competitors in a music competition from two judges
Spearman's Rank Correlation Coefficient
Rank correlation
Ranks by A    Ranks by B
1             3
6             5
5             8
10            4
3             7
2             10
4             2
9             1
7             6
8             9
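The slides do not show the calculation for these ranks. A minimal sketch, assuming the standard formula for untied ranks, 1 - 6 * (sum of d^2) / (n * (n^2 - 1)), where d is the difference between the two ranks given to the same item, is shown below. The names `spearman_rho`, `ranks_a` and `ranks_b` are hypothetical.

```python
# Ranks as listed in the table: one pair per ranked item.
ranks_a = [1, 6, 5, 10, 3, 2, 4, 9, 7, 8]
ranks_b = [3, 5, 8, 4, 7, 10, 2, 1, 6, 9]


def spearman_rho(rank_x, rank_y):
    """Spearman's rank correlation for untied ranks (hypothetical helper)."""
    n = len(rank_x)
    # d is the difference between the two ranks of the same item.
    sum_d_squared = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * sum_d_squared / (n * (n ** 2 - 1))


sum_d_squared = sum((a - b) ** 2 for a, b in zip(ranks_a, ranks_b))
rho = spearman_rho(ranks_a, ranks_b)
print(f"sum of d^2 = {sum_d_squared}")
print(f"rho = {rho:.4f}")
```

For these ten pairs the sum of squared differences is 200, giving a coefficient of about -0.21: the two sets of ranks show only a weak, slightly inverse agreement. The shortcut formula assumes no tied ranks; with ties, a correction or Pearson's r applied to the ranks would be needed.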
Important to remember
The correlation coefficient measures the strength of the LINEAR relationship between two variables.
If the variables depend on each other through a quadratic, cubic or any other non-linear relationship, r(X,Y) fails to capture the strength of the relationship between them.
Hence, r(X,Y) = 0 means only that there is no LINEAR relationship between the variables.
r(X,Y) does not tell us anything about causality between the variables X and Y.
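The point about non-linear relationships can be checked numerically. In the sketch below, Y is completely determined by X through Y = X^2, yet the coefficient comes out as zero because the x values are symmetric about 0. The data are made up for illustration and `pearson_r` is a hypothetical helper name.

```python
import math


def pearson_r(x, y):
    """Pearson's r from deviations about the means (hypothetical helper)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Made-up data: y depends exactly on x, but through a quadratic, not a line.
x = [-3, -2, -1, 0, 1, 2, 3]
y = [v ** 2 for v in x]

r_quadratic = pearson_r(x, y)
print(f"r for y = x^2 on symmetric x: {r_quadratic:.4f}")
```

A zero coefficient here reflects the absence of a linear trend, not the absence of a relationship, which is why a scatter plot is worth inspecting alongside r.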