INTRODUCTORY STATISTICS

CORRELATION AND REGRESSION

Relationships between Variables

There are frequently occasions in business when changes in one factor appear to be
related in some way to movements in one or several other factors. For example, a
marketing manager may observe that sales increase when there has been a change in
advertising expenditure. A transportation manager may notice that as vans and lorries
cover more miles, the need for maintenance becomes more frequent.

Certain questions that may arise in the mind of the manager or analyst are:

a) Are the movements in the same or opposite direction?


b) Could changes in one phenomenon be causing or be caused by movements in the
other variable?

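When changes in one variable cause changes in the other, the relationship is known as a
CAUSAL RELATIONSHIP.

The existence or non-existence of a relationship between two variables may be
inferred from a scatter graph, although a scatter graph alone cannot show that the
relationship is causal.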

Types of relationships

Simple Linear Regression

Simple linear regression is a statistical technique employed to estimate the magnitude
and direction of any apparent relationship that exists between two variables.

Variables
Any quantity that can take different values is a variable. For example, the heights of
blades of grass and the numbers of people waiting at a bus stop are both variables.

The two types of variable used in regression are the dependent variable and the
independent variable.

Dependent variable: The variable that is being predicted or estimated.
Independent variable: A variable that provides the basis for estimation. It is the predictor
variable.
e.g. 1 The area of a circle, $A = \pi r^2$, where r represents the radius.
The independent variable is the radius, r, and the dependent variable is the area, A.

e.g. 2 The Bradford Electric Illuminating Company is studying the relationship between
kilowatt-hours (thousands) and the number of rooms in a private single-family
residence.
The independent variable is the number of rooms and the dependent variable is
kilowatt-hours.

Correlation

Correlation is a statistical tool used to measure the strength of the relationship
between two or more variables. Two measures of correlation are:

1. Product moment coefficient of correlation, “r”

2. The rank correlation coefficient, “R”

e.g. In the Bradford Electric Illuminating Company study described earlier, the
independent variable is the number of rooms and the dependent variable is
kilowatt-hours. In this study the company may want to find out the strength and the
direction of the relationship between these two variables.

The Product Moment Coefficient of Correlation, “r”, provides a measure of the strength
of the LINEAR relationship between an independent variable and a dependent variable.
It is such that $-1 \le r \le +1$ and is given by the formula:

$$r = \frac{n\sum XY - \sum X \sum Y}{\sqrt{\left[n\sum X^2 - \left(\sum X\right)^2\right]\left[n\sum Y^2 - \left(\sum Y\right)^2\right]}}$$

COEFFICIENT OF DETERMINATION “$r^2$”

This is the square of the correlation coefficient. It measures the proportion of the total
variation in y which is accounted for by changes in the value of x.

$$r^2 = \frac{\text{explained variation}}{\text{total variation}}$$

COEFFICIENT OF NON-DETERMINATION

This measures the proportion of the total variation in Y that is not accounted for by
changes in X.

$$\frac{\text{unexplained variation}}{\text{total variation}} = 1 - r^2$$

Example

The manager of a company with ten operating plants of similar size producing small
components has observed the following pattern of expenditure on inspection and
defective parts delivered to the customer.

Observation    X (independent variable)    Y (dependent variable)
               Inspection expenditure      Defective parts
               per 1,000 units             per 1,000 units

1              25                          50
2              30                          35
3              15                          60
4              75                          15
5              40                          46
6              65                          20
7              45                          28
8              24                          45
9              35                          42
10             70                          22

Managers are wondering how strong the relationship is between Inspection Expenditure
and the Number of Faulty items delivered and to what extent they may predict the
number of faulty parts delivered from knowledge of expenditure on inspection.
Find:

(i) The product moment coefficient of correlation, r. Interpret the answer.
(ii) The least squares line (or the regression equation).
(iii) The coefficient of determination, $r^2$. Interpret the answer.

SOLUTION

Observation    X      Y      X^2      Y^2      XY
1              25     50     625      2500     1250
2              30     35     900      1225     1050
3              15     60     225      3600     900
4              75     15     5625     225      1125
5              40     46     1600     2116     1840
6              65     20     4225     400      1300
7              45     28     2025     784      1260
8              24     45     576      2025     1080
9              35     42     1225     1764     1470
10             70     22     4900     484      1540

$n = 10$, $\sum X = 424$, $\sum Y = 363$, $\sum X^2 = 21{,}926$, $\sum Y^2 = 15{,}123$, $\sum XY = 12{,}815$

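(i)

$$
\begin{aligned}
r &= \frac{n\sum XY - \sum X \sum Y}{\sqrt{\left[n\sum X^2 - \left(\sum X\right)^2\right]\left[n\sum Y^2 - \left(\sum Y\right)^2\right]}} \\
  &= \frac{10 \times 12{,}815 - 424 \times 363}{\sqrt{\left(10 \times 21{,}926 - 424^2\right)\left(10 \times 15{,}123 - 363^2\right)}} \\
  &= \frac{128{,}150 - 153{,}912}{\sqrt{\left(219{,}260 - 179{,}776\right)\left(151{,}230 - 131{,}769\right)}} \\
  &= \frac{-25{,}762}{\sqrt{39{,}484 \times 19{,}461}} \\
  &= \frac{-25{,}762}{27{,}719.99502} \\
  &\approx -0.93
\end{aligned}
$$

Since r is close to -1, there is a strong negative linear relationship: plants that spend
more on inspection tend to deliver fewer defective parts.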

(ii) Least Squares Line

Equation of a straight line: y = a + bx
y is the dependent variable (one which depends on another variable for its value)
x is the independent variable (one which does not depend on another variable for
its value)
b is the gradient, which is constant
a is the intercept on the y axis, which is constant.

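For the least squares line y = a + bx:

$$b = \frac{n\sum XY - \sum X \sum Y}{n\sum X^2 - \left(\sum X\right)^2}, \qquad a = \frac{\sum Y - b\sum X}{n} \quad \text{or} \quad a = \bar{Y} - b\bar{X}$$

$n = 10$, $\sum X = 424$, $\sum Y = 363$, $\sum X^2 = 21{,}926$, $\sum Y^2 = 15{,}123$, $\sum XY = 12{,}815$

$$
\begin{aligned}
b &= \frac{n\sum XY - \sum X \sum Y}{n\sum X^2 - \left(\sum X\right)^2} \\
  &= \frac{10 \times 12{,}815 - 424 \times 363}{10 \times 21{,}926 - 424^2} \\
  &= \frac{-25{,}762}{39{,}484} \\
  &\approx -0.65 \text{ (to 2 d.p.)}
\end{aligned}
$$

$$
\begin{aligned}
a &= \frac{\sum Y - b\sum X}{n} \\
  &= \frac{363 - (-0.65 \times 424)}{10} \\
  &= \frac{638.6}{10} \\
  &= 63.86
\end{aligned}
$$

Y = 63.86 - 0.65X (line of best fit)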

INTERPRETATION OF a

If x is equal to 0 then y = a.
In this case a = 63.86
If inspection expenditure (per 1000 units) were equal to 0 ($), the number of defective
parts (per 1000 units) delivered would be 63.86.

INTERPRETATION OF b

On average, for every unit increase in x, y increases by b if b is positive, or
decreases by the magnitude of b if b is negative.
In this case b = -0.65.
For every $1 increase in inspection expenditure (per 1000 units), the number of
defective parts (per 1000 units) delivered would decrease by 0.65, on average.

The line of “best fit” can be superimposed on the scatter graph.


(iii) COEFFICIENT OF DETERMINATION “$r^2$”

This is the square of the correlation coefficient. It measures the proportion of the total
variation in y which is accounted for by changes in the value of x.

$$r^2 = \frac{\text{explained variation}}{\text{total variation}}$$

In this case r = -0.93, so $r^2 = (-0.93)^2 = 0.8649$.

$r^2 \approx 0.86$

INTERPRETATION: 86% of the variation in the number of defective parts (per 1000 units)
delivered is accounted for by changes in the inspection expenditure (per 1000 units).

COEFFICIENT OF NON-DETERMINATION

This measures the proportion of the total variation in Y that is not accounted for by
changes in X.

$$\frac{\text{unexplained variation}}{\text{total variation}} = 1 - r^2$$

In this case, 1 - 0.86 = 0.14, so 14% of the variation in Y is not accounted for by
changes in X.
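The explained and unexplained shares can also be obtained directly from the definitions, by comparing each observed Y with its fitted value on the least squares line. The sketch below is illustrative only; the function name `variation_shares` is an assumption, and the line is fitted at full precision rather than with the rounded coefficients.

```python
def variation_shares(xs, ys):
    """Return (explained share, unexplained share) of the total variation in ys."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    b = s_xy / s_xx              # least squares slope
    a = mean_y - b * mean_x      # least squares intercept
    fitted = [a + b * x for x in xs]
    total = sum((y - mean_y) ** 2 for y in ys)                 # total variation
    unexplained = sum((y - f) ** 2 for y, f in zip(ys, fitted))  # residual variation
    explained = total - unexplained
    return explained / total, unexplained / total

# Inspection expenditure (X) and defective parts (Y) from the example.
X = [25, 30, 15, 75, 40, 65, 45, 24, 35, 70]
Y = [50, 35, 60, 15, 46, 20, 28, 45, 42, 22]

explained_share, unexplained_share = variation_shares(X, Y)
print(round(explained_share, 2), round(unexplained_share, 2))  # 0.86 0.14
```

At full precision the explained share is about 0.8637 rather than 0.8649, because the notes square the already rounded value r = -0.93; both agree to two decimal places.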

Using the results of the regression analysis

The manager wishes to know the likely number of defects if $50 (per 1000 units) was
spent on inspection.
Y = 63.86 - 0.65X (“line of best fit”)
When X = 50, Y = 63.86 - 0.65(50) = 31.36
Thus the manager would conclude that, on average, 31.36 defects (per 1000 units) would
be delivered if $50 (per 1000 units) was spent on inspection.
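The prediction step can be written as a one-line function. This is a minimal sketch using the rounded coefficients from the calculation above (a = 63.86, b = -0.65); the name `predict_defects` is hypothetical.

```python
def predict_defects(expenditure, a=63.86, b=-0.65):
    """Predicted defective parts per 1,000 units from the fitted line y = a + b*x.

    The default a and b are the rounded values from the worked example.
    """
    return a + b * expenditure

print(round(predict_defects(50), 2))  # 31.36
```

Predictions of this kind are most trustworthy for expenditure values inside the observed range of X (15 to 75 in this data set).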

Rank Correlation Coefficient (R)

This provides a measure of the association between two sets of ranked or ordered data,
where $-1 \le R \le 1$.

Example
The table below gives the scores obtained by 10 students in two different subjects.

X Y

25 50
30 35
15 60
75 15
40 46
65 20
45 28
24 45
35 42
70 22

Find the rank correlation coefficient (or Spearman rank correlation coefficient).

SOLUTION

X      Rank X    Y      Rank Y    d      d^2
25     3         50     9         -6     36
30     4         35     5         -1     1
15     1         60     10        -9     81
75     10        15     1         9      81
40     6         46     8         -2     4
65     8         20     2         6      36
45     7         28     4         3      9
24     2         45     7         -5     25
35     5         42     6         -1     1
70     9         22     3         6      36

$\sum d^2 = 310$

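$$
\begin{aligned}
R &= 1 - \frac{6\sum d^2}{n\left(n^2 - 1\right)} \\
  &= 1 - \frac{6 \times 310}{10\left(10^2 - 1\right)} \\
  &= 1 - \frac{1{,}860}{990} \\
  &\approx 1 - 1.88 \\
  &= -0.88
\end{aligned}
$$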
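The ranking and the formula for R can be sketched in Python as follows. The helper names `ranks` and `spearman_r` are assumptions made for this illustration, and the simple ranking used here assumes there are no tied values, which holds for this data set.

```python
def ranks(values):
    """Rank values from smallest (1) to largest (n). Assumes no tied values."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]

def spearman_r(xs, ys):
    """Rank correlation coefficient R = 1 - 6*sum(d^2) / (n*(n^2 - 1))."""
    n = len(xs)
    rank_x, rank_y = ranks(xs), ranks(ys)
    sum_d2 = sum((p - q) ** 2 for p, q in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

# Scores of the 10 students in the two subjects.
X = [25, 30, 15, 75, 40, 65, 45, 24, 35, 70]
Y = [50, 35, 60, 15, 46, 20, 28, 45, 42, 22]

print(ranks(X))                      # [3, 4, 1, 10, 6, 8, 7, 2, 5, 9]
print(round(spearman_r(X, Y), 2))    # -0.88
```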