
ALTERNATIVE METHOD OF COMPUTING
CORRELATION COEFFICIENT USING THE
COMPUTATIONAL VERSION OF THE
PEARSON PRODUCT-MOMENT CORRELATION COEFFICIENT
FORMULA

A Term Paper
Presented to:

Dr. Lucila Fineza-Tibigar
(Professor)

In Partial Fulfillment
of the Requirements of the Course
Statistics Applied to Educational Research II
(EdAd 600)

Tryon R. Gabriel
April, 2005

Background of the Study

The Pearson Product-Moment Correlation Coefficient is the most widely used measure of correlation, or association, between two variables. It is named after Karl Pearson, who developed it in his work on heredity and evolution. The "product-moment" part of the name comes from the way it is calculated: by summing the products of the deviations of the scores from their means.

The symbol for the correlation coefficient is a lowercase r. Textbooks describe it as the sum of the products of the Z-scores for the two variables, divided by the number of pairs of scores:

$$ r = \frac{\sum z_X z_Y}{N} $$

where $z_X = (X - \bar{X})/s_X$ and $z_Y = (Y - \bar{Y})/s_Y$.

If we substitute the formulas for the Z-scores into this formula, we get the following formula for the Pearson Product-Moment Correlation Coefficient, which we will use as the definitional formula:

$$ r = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{N s_X s_Y} $$

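The numerator of this formula says that, for each subject, we multiply the deviation of the subject's X score from the mean of the X's by the deviation of the subject's Y score from the mean of the Y's, and then sum these products over all subjects. This sum of products of deviation scores is divided by the number of subjects times the standard deviation of the X variable times the standard deviation of the Y variable. For this formula to agree with the Z-score version, both standard deviations must be computed with divisor N, that is, $s_X = \sqrt{\sum (X - \bar{X})^2 / N}$, and likewise for $s_Y$.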
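To make the two formulas above concrete, here is a minimal Python sketch using only the standard library. It assumes, as the formulas require, population standard deviations (divisor N); the function names `pearson_r_zscores` and `pearson_r_definitional` are illustrative and not from the paper. It uses the twelve (X, Y) pairs from the Findings section, so both functions should reproduce, up to rounding, the reported r = 0.264201335.

```python
import math


def _mean(values):
    return sum(values) / len(values)


def _sd_population(values):
    """Standard deviation with divisor N, as the definitional formula requires."""
    m = _mean(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / len(values))


def pearson_r_zscores(xs, ys):
    """r = (sum of z_X * z_Y) / N."""
    mx, my = _mean(xs), _mean(ys)
    sx, sy = _sd_population(xs), _sd_population(ys)
    zx = [(x - mx) / sx for x in xs]
    zy = [(y - my) / sy for y in ys]
    return sum(a * b for a, b in zip(zx, zy)) / len(xs)


def pearson_r_definitional(xs, ys):
    """r = sum of (X - mean X)(Y - mean Y), divided by N * s_X * s_Y."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least two pairs")
    n = len(xs)
    mx, my = _mean(xs), _mean(ys)
    sum_products = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sum_products / (n * _sd_population(xs) * _sd_population(ys))


# The twelve (X, Y) pairs from the Findings section.
X = [26, 42, 37, 82, 66, 44, 24, 39, 55, 61, 77, 58]
Y = [37, 90, 48, 90, 88, 100, 95, 120, 95, 76, 89, 100]

r_z = pearson_r_zscores(X, Y)
r_def = pearson_r_definitional(X, Y)
print(f"{r_z:.6f} {r_def:.6f}")  # 0.264201 0.264201
```

Note how every score must first be turned into a deviation (and, for the Z-score form, divided by a standard deviation) before any product is formed; this is the work the computational formula avoids.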


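As you can see, calculating the correlation coefficient with the definitional formula is tedious, because every score must first be converted into a deviation from its mean. In practice we use another formula that is mathematically equivalent but much easier to apply. This is the computational, or raw-score, formula for the correlation coefficient, which works directly with the raw scores X and Y. The computational formula for the Pearsonian r is

$$ r = \frac{N\sum XY - \sum X \sum Y}{\sqrt{\left[N\sum X^{2} - \left(\sum X\right)^{2}\right]\left[N\sum Y^{2} - \left(\sum Y\right)^{2}\right]}} $$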
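A minimal sketch of the raw-score formula follows. The function name `pearson_r_computational` is illustrative, and the five-pair data set is made up so that the arithmetic is easy to follow by hand: the sums are ΣX = 15, ΣY = 20, ΣX² = 55, ΣY² = 86, ΣXY = 66, giving r = 30 / √(50 × 30).

```python
import math


def pearson_r_computational(xs, ys):
    """Pearson r from raw scores:

    r = (N*SumXY - SumX*SumY) / sqrt((N*SumX2 - SumX**2) * (N*SumY2 - SumY**2))
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least two pairs")
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


# Hypothetical data, chosen only to keep the hand arithmetic small.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
r = pearson_r_computational(xs, ys)
print(round(r, 4))  # 0.7746
```

With integer scores every sum is an exact integer, so the only non-integer steps are the final square root and division. That is what makes the raw-score formula convenient for manual computation.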

To interpret the correlation coefficient properly, one must understand the basic properties of r:

• The value of r measures the strength and direction of the linear relationship between X and Y, and it always lies between -1 and +1.
• The closer r is to either -1 or +1, the stronger the linear relationship between X and Y. In fact, points that fall exactly on a straight line have a correlation of +1 if the line has a positive slope and -1 if it has a negative slope.
• If r is zero, then X and Y are not linearly related. They may still be related, but the relationship is not a straight line.
• The value of r does not change when the units of measurement are changed (for example, from inches to centimeters).
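The properties listed above can be checked numerically. The sketch below reuses the computational formula under the illustrative name `pearson_r`; all data sets in it are hypothetical, including the heights and weights used for the change-of-units check.

```python
import math


def pearson_r(xs, ys):
    """Computational (raw-score) formula for Pearson r."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    return (n * sxy - sx * sy) / math.sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))


x = [1, 2, 3, 4, 5]
r_up = pearson_r(x, [3 * v + 2 for v in x])     # points exactly on a line, positive slope
r_down = pearson_r(x, [-2 * v + 7 for v in x])  # points exactly on a line, negative slope

u = [-2, -1, 0, 1, 2]
r_parabola = pearson_r(u, [v * v for v in u])   # exact relationship, but not a straight line

heights_in = [60, 62, 65, 68, 71]               # hypothetical heights (inches)
weights_lb = [115, 120, 140, 155, 170]          # hypothetical weights (pounds)
r_in = pearson_r(heights_in, weights_lb)
r_cm = pearson_r([h * 2.54 for h in heights_in], weights_lb)  # same heights in cm

print(r_up, r_down, r_parabola)  # 1.0 -1.0 0.0
print(math.isclose(r_in, r_cm))  # True
```

The parabola case illustrates the third property: Y is completely determined by X, yet r = 0 because the relationship is not linear.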

Even with the computational formula, finding the correlation coefficient is laborious, especially when there are many subjects, and in practice we would probably use a computer. The aim of this paper is to present a modified method of computing the correlation coefficient with the computational version of the Pearson Product-Moment Correlation Coefficient formula.

As mentioned above, the data for each variable may involve numbers too large to handle comfortably by hand. In the absence of a computer, this difficulty can lead to computational errors, and an erroneous r can seriously mislead any decision based on it. In this paper, the author presents a way of reducing this difficulty: subtracting from every value of a variable a corresponding assumed mean, that is, a convenient round number chosen from within the range of the data.

Statement of the Problem

The purpose of this paper is to present an alternative method of computing the correlation coefficient with the computational version of the Pearson Product-Moment Correlation Coefficient formula and to determine its validity. Specifically, this paper seeks to answer the question: Does subtracting an assumed mean from the values of a variable change the computed correlation coefficient?

Procedure

To determine the validity of this alternative method, the author considered all the possible cases in which an assumed mean is subtracted from the values of a variable (X or Y): (i) an assumed mean subtracted from the values of X alone; (ii) an assumed mean subtracted from the values of Y alone; and (iii) the corresponding assumed means subtracted from the values of both X and Y. For each case, the correlation coefficient is computed with the computational version of the Pearson Product-Moment Correlation Coefficient formula and compared with the result of the usual computation on the raw scores.
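The procedure above can be sketched in a few lines of Python. This is an illustration, not the author's own tool: it uses the twelve (X, Y) pairs and the assumed means 40 (for X) and 80 (for Y) that appear in the Findings, and the names `pearson_r`, `A_X`, and `A_Y` are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Computational (raw-score) formula for Pearson r."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    return (n * sxy - sx * sy) / math.sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))


X = [26, 42, 37, 82, 66, 44, 24, 39, 55, 61, 77, 58]
Y = [37, 90, 48, 90, 88, 100, 95, 120, 95, 76, 89, 100]
A_X, A_Y = 40, 80  # assumed means used in the Findings section

cases = {
    "usual method": (X, Y),
    "(i) X alone": ([x - A_X for x in X], Y),
    "(ii) Y alone": (X, [y - A_Y for y in Y]),
    "(iii) X and Y": ([x - A_X for x in X], [y - A_Y for y in Y]),
}

results = {name: pearson_r(xs, ys) for name, (xs, ys) in cases.items()}
for name, r in results.items():
    print(f"{name:<14} r = {r:.6f}")  # every line shows r = 0.264201
```

Because the scores are integers, every quantity before the final square root is an exact integer, so the four results are not merely close but identical.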


Findings

The following table shows the usual method of computing the correlation coefficient between the variables X and Y with the computational version of the Pearson Product-Moment Correlation Coefficient formula, for N = 12 pairs of scores.

       X      Y     X²     Y²     XY
      26     37    676   1369    962
      42     90   1764   8100   3780
      37     48   1369   2304   1776
      82     90   6724   8100   7380
      66     88   4356   7744   5808
      44    100   1936  10000   4400
      24     95    576   9025   2280
      39    120   1521  14400   4680
      55     95   3025   9025   5225
      61     76   3721   5776   4636
      77     89   5929   7921   6853
      58    100   3364  10000   5800
Σ    611   1028  34961  93764  53580
r = 0.264201335
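As a check on the table, the sketch below recomputes its column totals from the twelve pairs and then applies the computational formula to those totals alone. The dictionary keys are illustrative labels for the table's columns.

```python
import math

# The twelve (X, Y) pairs from the first table.
X = [26, 42, 37, 82, 66, 44, 24, 39, 55, 61, 77, 58]
Y = [37, 90, 48, 90, 88, 100, 95, 120, 95, 76, 89, 100]
n = len(X)

totals = {
    "X": sum(X),
    "Y": sum(Y),
    "X2": sum(x * x for x in X),
    "Y2": sum(y * y for y in Y),
    "XY": sum(x * y for x, y in zip(X, Y)),
}
print(totals)  # {'X': 611, 'Y': 1028, 'X2': 34961, 'Y2': 93764, 'XY': 53580}

# Computational formula applied to the column totals.
numerator = n * totals["XY"] - totals["X"] * totals["Y"]
denominator = math.sqrt((n * totals["X2"] - totals["X"] ** 2)
                        * (n * totals["Y2"] - totals["Y"] ** 2))
r = numerator / denominator
print(numerator, f"{r:.6f}")  # 14852 0.264201
```

Even this small example requires products such as 12 × 93,764 = 1,125,168 and 611 × 1,028 = 628,108, which illustrates why the raw-score totals are awkward by hand.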

The table shows that r = 0.264201335, but the numbers involved are very large (ΣY² alone is 93,764) and difficult to handle in manual computation. The author presents this table for comparison with the results obtained for the three cases described under Procedure:


1. Assumed mean subtracted from the values of X alone:

In the table below, each X value is the original value minus the assumed mean 40.

       X      Y     X²     Y²     XY
     -14     37    196   1369   -518
       2     90      4   8100    180
      -3     48      9   2304   -144
      42     90   1764   8100   3780
      26     88    676   7744   2288
       4    100     16  10000    400
     -16     95    256   9025  -1520
      -1    120      1  14400   -120
      15     95    225   9025   1425
      21     76    441   5776   1596
      37     89   1369   7921   3293
      18    100    324  10000   1800
Σ    131   1028   5281  93764  12460
r = 0.264201335


The result shows that subtracting the assumed mean (40) from the values of X leaves the correlation coefficient unchanged. Notice also that the entries in the X, X², and XY columns are smaller than in the first table (ΣX² falls from 34,961 to 5,281, and ΣXY from 53,580 to 12,460), so they are easier to handle in manual computation.
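Case 1 gives the same r because the three quantities that enter the computational formula are themselves unchanged, not just their ratio. The sketch below (the helper name `formula_parts` is hypothetical) computes the numerator and the two bracketed factors of the denominator before and after subtracting 40 from X.

```python
def formula_parts(xs, ys):
    """Return the three integer quantities of the computational formula:
    N*SumXY - SumX*SumY, N*SumX2 - (SumX)**2, N*SumY2 - (SumY)**2."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    return (n * sxy - sx * sy, n * sxx - sx ** 2, n * syy - sy ** 2)


X = [26, 42, 37, 82, 66, 44, 24, 39, 55, 61, 77, 58]
Y = [37, 90, 48, 90, 88, 100, 95, 120, 95, 76, 89, 100]
X_shifted = [x - 40 for x in X]  # case 1: assumed mean 40 subtracted from X

# Column totals of the case 1 table: SumX, SumX2, SumXY.
print(sum(X_shifted), sum(x * x for x in X_shifted),
      sum(x * y for x, y in zip(X_shifted, Y)))  # 131 5281 12460

print(formula_parts(X, Y))          # (14852, 46211, 68384)
print(formula_parts(X_shifted, Y))  # (14852, 46211, 68384)
```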

2. Assumed mean subtracted from the values of Y alone:

In the table below, each Y value is the original value minus the assumed mean 80.

       X      Y     X²     Y²     XY
      26    -43    676   1849  -1118
      42     10   1764    100    420
      37    -32   1369   1024  -1184
      82     10   6724    100    820
      66      8   4356     64    528
      44     20   1936    400    880
      24     15    576    225    360
      39     40   1521   1600   1560
      55     15   3025    225    825
      61     -4   3721     16   -244
      77      9   5929     81    693
      58     20   3364    400   1160
Σ    611     68  34961   6084   4700
r = 0.264201335

The result shows that subtracting the assumed mean (80) from the values of Y leaves the correlation coefficient unchanged. Notice also that the Y² and XY column totals are much smaller than in the first table (ΣY² falls from 93,764 to 6,084, and ΣXY from 53,580 to 4,700), which makes manual computation easier.
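The case 2 totals need not be recomputed row by row: expanding (Y - b)² and X(Y - b) expresses them through the totals of the first table. A minimal sketch, with b = 80 as in the Findings and a hypothetical helper `shift_y_totals`, which then cross-checks the shortcut against a direct computation:

```python
def shift_y_totals(n, sum_x, sum_y, sum_y2, sum_xy, b):
    """Totals after replacing every Y by Y - b:
    Sum(Y - b)    = SumY - n*b
    Sum(Y - b)^2  = SumY2 - 2*b*SumY + n*b^2
    Sum X(Y - b)  = SumXY - b*SumX
    """
    return (
        sum_y - n * b,
        sum_y2 - 2 * b * sum_y + n * b * b,
        sum_xy - b * sum_x,
    )


# Totals from the first table (N = 12 pairs).
n, sum_x, sum_y, sum_y2, sum_xy = 12, 611, 1028, 93764, 53580
shifted = shift_y_totals(n, sum_x, sum_y, sum_y2, sum_xy, b=80)
print(shifted)  # (68, 6084, 4700)

# Direct computation from the raw pairs, for comparison.
X = [26, 42, 37, 82, 66, 44, 24, 39, 55, 61, 77, 58]
Y = [37, 90, 48, 90, 88, 100, 95, 120, 95, 76, 89, 100]
direct = (
    sum(y - 80 for y in Y),
    sum((y - 80) ** 2 for y in Y),
    sum(x * (y - 80) for x, y in zip(X, Y)),
)
print(direct == shifted)  # True
```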

3. Corresponding assumed means for X and Y subtracted from their values:

In the table below, each X value is the original value minus 40, and each Y value is the original value minus 80.

       X      Y     X²     Y²     XY
     -14    -43    196   1849    602
       2     10      4    100     20
      -3    -32      9   1024     96
      42     10   1764    100    420
      26      8    676     64    208
       4     20     16    400     80
     -16     15    256    225   -240
      -1     40      1   1600    -40
      15     15    225    225    225
      21     -4    441     16    -84
      37      9   1369     81    333
      18     20    324    400    360
Σ    131     68   5281   6084   1980
r = 0.264201335

The result shows that subtracting the corresponding assumed means (40 for X and 80 for Y) from the values of both variables still leaves the correlation coefficient unchanged. Notice also that the X², Y², and XY column totals (5,281, 6,084, and 1,980) are now all much smaller than in the first table (34,961, 93,764, and 53,580), so this version is the easiest of the three to handle in manual computation.

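In case (iii), the hand computation needs only the column totals of the last table, together with ΣX = 131 and ΣY = 68 (the sums of its first two columns). The sketch below, with a hypothetical helper `r_from_totals`, applies the computational formula to those small totals and to the raw-score totals of the first table.

```python
import math


def r_from_totals(n, sum_x, sum_y, sum_x2, sum_y2, sum_xy):
    """Computational formula applied to column totals only."""
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


# Column totals of the case (iii) table: X - 40 and Y - 80, N = 12.
r_small = r_from_totals(12, 131, 68, 5281, 6084, 1980)
# Column totals of the first table (raw scores).
r_large = r_from_totals(12, 611, 1028, 34961, 93764, 53580)

print(r_small == r_large)  # True
print(f"{r_small:.6f}")    # 0.264201
```

The totals fed into the formula are much smaller, but the numerator and the two bracketed quantities under the square root (14,852, 46,211, and 68,384) are necessarily the same as for the raw scores; the saving lies in building and summing the table.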
Conclusion

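On the basis of the above results, the author concludes that subtracting an assumed mean from the values of X, of Y, or of both does not alter the correlation coefficient computed with the computational version of the Pearson Product-Moment Correlation Coefficient formula. The result is not peculiar to these data: subtracting a constant from every value of a variable leaves each deviation from the mean unchanged, and r depends on the scores only through these deviations.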
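The conclusion can also be derived algebraically from the computational formula. In the sketch below, $a$ and $b$ denote the assumed means of X and Y, with $X' = X - a$ and $Y' = Y - b$ (notation introduced here, not in the original paper):

```latex
\begin{aligned}
\sum X' &= \sum X - Na, \qquad \sum Y' = \sum Y - Nb,\\
\sum X'Y' &= \sum XY - b\sum X - a\sum Y + Nab,\\[4pt]
N\sum X'Y' - \sum X'\sum Y'
  &= \left(N\sum XY - Nb\sum X - Na\sum Y + N^{2}ab\right)\\
  &\quad - \left(\sum X\sum Y - Nb\sum X - Na\sum Y + N^{2}ab\right)\\
  &= N\sum XY - \sum X\sum Y.
\end{aligned}
```

Taking Y = X and b = a in the same expansion gives $N\sum X'^{2} - (\sum X')^{2} = N\sum X^{2} - (\sum X)^{2}$, and likewise for Y, so the numerator and both bracketed factors of the denominator are unchanged. Cases (i) and (ii) are the special cases $b = 0$ and $a = 0$. For the data in the Findings these three quantities are 14,852, 46,211, and 68,384 in every case.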

Recommendation

1. In view of these satisfactory results, the author recommends subtracting an assumed mean from the values of each variable when computing the correlation coefficient with the computational version of the Pearson Product-Moment Correlation Coefficient formula. The method leaves r unchanged while greatly reducing the size of the numbers involved, which makes them easier to handle in manual computation.

2. If the assumed mean does not reduce the size of the numbers sufficiently, the author also recommends dividing them by a multiple of ten (such as 10 or 100) before computing the correlation coefficient with the computational version of the Pearson Product-Moment Correlation Coefficient formula. This also leaves r unchanged, because dividing every value of a variable by a positive constant is simply a change in its units of measurement.
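Recommendation 2 can be checked on the Findings data with a short sketch. It subtracts the assumed means 40 and 80 and then divides by 10; the helper name `pearson_r` is illustrative. The last lines add a caution that is not in the paper: the divisor must be positive, since a negative divisor reverses the sign of r.

```python
import math


def pearson_r(xs, ys):
    """Computational (raw-score) formula for Pearson r."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    return (n * sxy - sx * sy) / math.sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))


X = [26, 42, 37, 82, 66, 44, 24, 39, 55, 61, 77, 58]
Y = [37, 90, 48, 90, 88, 100, 95, 120, 95, 76, 89, 100]
r_raw = pearson_r(X, Y)

# Subtract the assumed means, then divide by 10 (recommendation 2).
X_coded = [(x - 40) / 10 for x in X]
Y_coded = [(y - 80) / 10 for y in Y]
r_coded = pearson_r(X_coded, Y_coded)
print(math.isclose(r_raw, r_coded))  # True

# Caution (not from the paper): a negative divisor reverses the sign of r.
r_flipped = pearson_r([x / -10 for x in X], Y)
print(math.isclose(r_flipped, -r_raw))  # True
```

Dividing introduces decimals, so in a hand computation it pays off mainly when the divisor turns most values into small whole numbers or short decimals.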

References

Kitchens, L. J. (1998). Exploring Statistics: A Modern Introduction to Data Analysis and Inference (2nd ed.). CA 93950: Brooks/Cole Publishing Co.

Bernstein, S., & Bernstein, R. (1999). Schaum’s Outline of Theory and Problems of Elements of Statistics I: Descriptive Statistics and Probability (International ed.). Singapore: McGraw-Hill Book Co.

Bernstein, S., & Bernstein, R. (1999). Schaum’s Outline of Theory and Problems of Elements of Statistics II: Inferential Statistics (International ed.). Singapore: McGraw-Hill Book Co.

Dougherty, E. R. (1990). Probability and Statistics for the Engineering, Computing, and Physical Sciences. New Jersey 07632: Prentice-Hall, Inc.
