A Term Paper
Presented to:
In Partial Fulfillment
of the Requirements of the Course
Statistics Applied to Educational Research II
(EdAd 600)
Tryon R. Gabriel
April, 2005
Background of the Study
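The Pearson Product Moment Correlation Coefficient is the most widely used measure of correlation or association. It is named after Karl Pearson, who developed the correlational method to do agricultural research. The "product moment" part of the name comes from the way in which it is calculated: by summing up the products of the deviations of the scores from their means. It is often described in textbooks as the sum of the products of the Z-scores for the two variables, divided by the number of subjects:
$$ r = \frac{\sum z_X z_Y}{N} $$
If we substitute the formulas for the Z-scores into this formula, we get the following formula for the Pearson Product Moment Correlation Coefficient, which we will use as a definitional formula:
$$ r = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{N \, s_X \, s_Y} $$
Here N is the number of subjects, and s_X and s_Y are the standard deviations of X and Y computed with N in the denominator.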
The numerator of this formula says that we sum up the products of the deviation of a subject's X score from the mean of the X's and the deviation of the subject's Y score from the mean of the Y's. This sum of the products of the deviation scores is divided by the number of subjects times the standard deviation of the X variable times the standard deviation of the Y variable.
Computing r by hand using the definitional formula is tedious. In real practice we use another formula that is mathematically identical but much easier to use: the computational, or raw score, formula for the correlation coefficient. The computational formula for the Pearsonian r is
$$ r = \frac{N \sum XY - \sum X \sum Y}{\sqrt{\left[ N \sum X^2 - \left( \sum X \right)^2 \right] \left[ N \sum Y^2 - \left( \sum Y \right)^2 \right]}} $$
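As a rough check that the definitional and computational formulas agree, the following Python sketch computes r both ways. The function names `pearson_definitional` and `pearson_computational` are hypothetical, introduced only for illustration. The data are the twelve (X, Y) pairs tabulated in the Procedure section, and the standard deviations are assumed to be taken with N (not N - 1) in the denominator.

```python
import math

def pearson_definitional(xs, ys):
    """r = sum((X - mean_X)(Y - mean_Y)) / (N * s_X * s_Y)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Assumption: population standard deviations (divide by N, not N - 1).
    s_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    s_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return cross / (n * s_x * s_y)

def pearson_computational(xs, ys):
    """r from the raw-score sums: N, sum X, sum Y, sum X^2, sum Y^2, sum XY."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt(
        (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
    )
    return numerator / denominator

# The twelve (X, Y) pairs from the paper's first table.
X = [26, 42, 37, 82, 66, 44, 24, 39, 55, 61, 77, 58]
Y = [37, 90, 48, 90, 88, 100, 95, 120, 95, 76, 89, 100]

r_def = pearson_definitional(X, Y)
r_comp = pearson_computational(X, Y)
print(f"definitional:  {r_def:.9f}")
print(f"computational: {r_comp:.9f}")
```

The computational version needs only five running sums, which is why it is the more convenient form for work by hand.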
The following are the basic properties of r:
The value r measures the strength of the linear relationship between X and Y and always lies between -1 and +1.
The closer r is to either -1 or +1, the stronger the linear relationship between X and Y. In fact, points that fall exactly on a straight line have a correlation of +1 if the line has positive slope and -1 if the line has negative slope.
If r is zero, then X and Y are not linearly related. They may be related, but the relationship is not linear.
The value of r does not change when the units of measurement are changed.
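The properties listed above can be illustrated with a small Python sketch. The helper `pearson_r` is a hypothetical name, and the small data sets (including the heights and weights) are made up purely for illustration; none of them come from the paper.

```python
import math

def pearson_r(xs, ys):
    """Pearson r via the raw-score (computational) formula."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt(
        (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
    )
    return numerator / denominator

# Points exactly on a line: +1 for positive slope, -1 for negative slope.
xs = [1, 2, 3, 4, 5]
r_up = pearson_r(xs, [2 * x + 3 for x in xs])
r_down = pearson_r(xs, [-2 * x + 3 for x in xs])

# A perfect but nonlinear relationship (y = x^2 on symmetric x) gives r = 0.
sym = [-2, -1, 0, 1, 2]
r_curve = pearson_r(sym, [x * x for x in sym])

# Changing units (inches to centimetres) leaves r unchanged.
heights_in = [60, 62, 65, 70, 72]    # illustrative values only
weights = [110, 120, 140, 155, 180]  # illustrative values only
r_in = pearson_r(heights_in, weights)
r_cm = pearson_r([2.54 * h for h in heights_in], weights)

print(r_up, r_down, r_curve)
print(abs(r_in - r_cm) < 1e-9)
```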
In practice, one would probably use a computer to calculate the correlation coefficient. The aim of this paper is to examine how r can be computed more easily by hand when the data values are too large to handle comfortably.
As mentioned above, it is possible that the data obtained for each variable are too large to handle in manual computation. In the absence of a computer, this difficulty could lead to computational errors, giving results that greatly affect decision making. In this paper, the author presents a method of reducing this difficulty by subtracting an assumed mean from the values of a variable. Specifically, the paper sought to answer the question: Is there a difference in the computed correlation coefficient when an assumed mean for a given variable is subtracted from its values?
Procedure
The author presented all the possible cases where the assumed mean for a given variable (say, X or Y) is subtracted from its values. The cases are the following: (i) assumed mean subtracted from the values of X alone; (ii) assumed mean subtracted from the values of Y alone; and (iii) corresponding assumed means for X and Y subtracted from their values. For each case, the correlation coefficient is computed using the computational version of the formula.
The table below shows the computation of the correlation coefficient between the variables X and Y, from their original values, using the computational version of the formula.
   X     Y      X²      Y²      XY
  26    37     676    1369     962
  42    90    1764    8100    3780
  37    48    1369    2304    1776
  82    90    6724    8100    7380
  66    88    4356    7744    5808
  44   100    1936   10000    4400
  24    95     576    9025    2280
  39   120    1521   14400    4680
  55    95    3025    9025    5225
  61    76    3721    5776    4636
  77    89    5929    7921    6853
  58   100    3364   10000    5800
Σ =          34961   93764   53580
r = 0.264201335
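For readers who want to trace the arithmetic, the substitution into the computational formula can be written out as follows. The column totals ΣX = 611 and ΣY = 1028 do not appear in the Σ row; they are obtained here by adding the X and Y columns of the table, with N = 12.

```latex
\begin{aligned}
r &= \frac{N \sum XY - \sum X \sum Y}
          {\sqrt{\left[ N \sum X^2 - (\sum X)^2 \right]
                 \left[ N \sum Y^2 - (\sum Y)^2 \right]}} \\[4pt]
  &= \frac{12(53580) - (611)(1028)}
          {\sqrt{\left[ 12(34961) - 611^2 \right]
                 \left[ 12(93764) - 1028^2 \right]}} \\[4pt]
  &= \frac{642960 - 628108}
          {\sqrt{(419532 - 373321)(1125168 - 1056784)}} \\[4pt]
  &= \frac{14852}{\sqrt{(46211)(68384)}} \approx 0.2642
\end{aligned}
```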
The table above shows that the correlation coefficient is r = 0.264201335 and that the squares and products involved are very large and difficult to handle in manual computation. It is presented for the purpose of comparison with the results that follow.
After subtracting the assumed mean (= 40) from the values of X, the computation still yields the same correlation coefficient. Notice also that the values of X and X² become smaller than their original values shown in the first table and are easier to handle in manual computation.
   X   Y-80      X²   (Y-80)²   X(Y-80)
  26    -43     676      1849     -1118
  42     10    1764       100       420
  37    -32    1369      1024     -1184
  82     10    6724       100       820
  66      8    4356        64       528
  44     20    1936       400       880
  24     15     576       225       360
  39     40    1521      1600      1560
  55     15    3025       225       825
  61     -4    3721        16      -244
  77      9    5929        81       693
  58     20    3364       400      1160
Σ =           34961      6084      4700
r = 0.264201335
The above result shows that after subtracting the assumed mean (= 80) from the values of Y, the computation still yields the same correlation coefficient. Notice also that the values of Y and Y² become smaller than their original values shown in the first table and are easier to handle in manual computation.
  X-40   Y-80   (X-40)²   (Y-80)²  (X-40)(Y-80)
   -14    -43       196      1849           602
     2     10         4       100            20
    -3    -32         9      1024            96
    42     10      1764       100           420
    26      8       676        64           208
     4     20        16       400            80
   -16     15       256       225          -240
    -1     40         1      1600           -40
    15     15       225       225           225
    21     -4       441        16           -84
    37      9      1369        81           333
    18     20       324       400           360
Σ =                5281      6084          1980
r = 0.264201335
The above result shows that after subtracting the corresponding assumed means from the values of X and Y, the computation still yields the same correlation coefficient. Notice also that the values of X, X², Y, and Y² become smaller than their original values shown in the first table, and again they are now easier to handle in manual computation.
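The three cases can be reproduced with a short Python sketch. The helper `pearson_r` and the constants `A_X` and `A_Y` are hypothetical names; the assumed mean of 80 for Y is stated in the text, while the value 40 for X is read off from the shifted X values in the last table (for example, 26 becomes -14).

```python
import math

def pearson_r(xs, ys):
    """Pearson r via the raw-score (computational) formula."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt(
        (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
    )
    return numerator / denominator

X = [26, 42, 37, 82, 66, 44, 24, 39, 55, 61, 77, 58]
Y = [37, 90, 48, 90, 88, 100, 95, 120, 95, 76, 89, 100]
A_X, A_Y = 40, 80  # assumed means (A_X inferred from the last table)

X_shifted = [x - A_X for x in X]
Y_shifted = [y - A_Y for y in Y]

cases = {
    "original values": (X, Y),
    "(i) X shifted": (X_shifted, Y),
    "(ii) Y shifted": (X, Y_shifted),
    "(iii) X and Y shifted": (X_shifted, Y_shifted),
}

results = {name: pearson_r(xs, ys) for name, (xs, ys) in cases.items()}
for name, r in results.items():
    print(f"{name}: r = {r:.9f}")
```

Because the data and assumed means are integers, every intermediate sum is an exact integer, so the four results should agree to the last printed digit.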
Conclusion
On the basis of the above results, the author of this paper inferred that subtracting an assumed mean from the values of the variables X and Y does not alter the result of the computation of the correlation coefficient using the computational version of the formula.
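The agreement observed in the tables is not peculiar to this data set. The following derivation is offered as a supplementary sketch, not as part of the original analysis. The symbols a and b denote the assumed means for X and Y and are introduced here for illustration; setting a = 0 or b = 0 recovers the cases where only one variable is shifted.

```latex
\text{Let } X' = X - a \text{ and } Y' = Y - b. \text{ Then }
\sum X' = \sum X - Na, \qquad \sum Y' = \sum Y - Nb.

\begin{aligned}
N \sum X'Y' - \sum X' \sum Y'
  &= N \left[ \sum XY - b \sum X - a \sum Y + Nab \right]
     - \left( \sum X - Na \right) \left( \sum Y - Nb \right) \\
  &= N \sum XY - \sum X \sum Y, \\[6pt]
N \sum X'^2 - \left( \sum X' \right)^2
  &= N \left[ \sum X^2 - 2a \sum X + Na^2 \right]
     - \left[ \left( \sum X \right)^2 - 2Na \sum X + N^2 a^2 \right] \\
  &= N \sum X^2 - \left( \sum X \right)^2 .
\end{aligned}
```

The same cancellation holds for Y, so the numerator and both factors under the square root are unchanged, and therefore r is unchanged.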
Recommendation
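The author recommends subtracting an assumed mean from the values of X and Y when the data are too large to handle in manual computation.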
Reference
Dougherty, E. R. (1990). Probability and Statistics for the Engineering, Computing, and Physical Sciences. Englewood Cliffs, NJ: Prentice-Hall, Inc.