• number of observations;
• strength of relationship (slope);
• strength of correlation (scatter).
• significance
• confidence
How many points are in each quadrant?
2 5
6 2
Simple case: centred on (0,0)
Upper-right quadrant: X is positive, Y is positive: X × Y is positive
Upper-left quadrant: X is negative, Y is positive: X × Y is negative
Lower-left quadrant: X is negative, Y is negative: X × Y is positive
Lower-right quadrant: X is positive, Y is negative: X × Y is negative
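The sign rule for the four quadrants can be checked with a few lines of Python. This is only an illustrative sketch: the helper name `product_sign` and the four sample points are made up for the example and do not come from the slides.

```python
def product_sign(x, y):
    """Return +1, -1 or 0 according to the sign of the product x * y."""
    p = x * y
    return (p > 0) - (p < 0)


# One illustrative point per quadrant (values are invented for this sketch).
quadrants = {
    "upper right": (2.0, 3.0),
    "upper left": (-2.0, 3.0),
    "lower left": (-2.0, -3.0),
    "lower right": (2.0, -3.0),
}

signs = {name: product_sign(x, y) for name, (x, y) in quadrants.items()}

for name, sign in signs.items():
    label = "positive" if sign > 0 else "negative"
    print(f"{name}: X x Y is {label}")
```

Points in the upper-right and lower-left quadrants give a positive product; points in the other two give a negative one.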
Porosity, φ, and permeability, K, are both always positive:
[Scatter plot: K on the vertical axis against φ on the horizontal axis]
But the difference between φ and mean(φ) plotted against the difference between K and mean(K) centres the plot on (0,0):
[Scatter plot: (K – K̄) on the vertical axis against (φ – φ̄) on the horizontal axis]
So the difference between φ and mean(φ) and the difference between K and mean(K) give us the basis for the measure of correlation we want:
[Scatter plot of (K – K̄) against (φ – φ̄): 4 points in the upper-right quadrant, 1 point in the upper-left quadrant, 1 point in the lower-right quadrant, 4 points in the lower-left quadrant]
8 points for which (K – K̄)(φ – φ̄) is positive versus 2 points for which it is negative.
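The centring-and-counting idea can be sketched in Python as follows. The function name `count_product_signs` is hypothetical, and the porosity and permeability values are invented for illustration, chosen only so that they reproduce an 8-versus-2 split like the one on the slide. They are not the data behind the plot.

```python
def count_product_signs(xs, ys):
    """Centre both variables on their means and count the points whose
    product of deviations is positive and the points where it is negative."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    positive = 0
    negative = 0
    for x, y in zip(xs, ys):
        product = (x - mean_x) * (y - mean_y)
        if product > 0:
            positive += 1
        elif product < 0:
            negative += 1
    return positive, negative


# Invented porosity (fraction) and permeability values, for illustration only.
phi = [0.05, 0.08, 0.10, 0.12, 0.14, 0.16, 0.18, 0.20, 0.22, 0.25]
k = [2.0, 4.0, 3.0, 6.0, 40.0, 12.0, 30.0, 45.0, 60.0, 80.0]

positive, negative = count_product_signs(phi, k)
print(positive, "positive products,", negative, "negative products")
```

A large excess of positive products over negative ones suggests a positive association. Counting alone ignores how far each point lies from the means, which is what summing the products themselves takes into account.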
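A formula for calculating the correlation:
$$ r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n \, s_x \, s_y} $$
where s_x and s_y are the standard deviations of x and y, computed with divisor n.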
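As a rough sketch, the deviation-based calculation might look like this in Python. It assumes the formula intended here is the sum of the products of deviations divided by n times the two standard deviations, with each standard deviation computed using divisor n. The function name `pearson_r` is an illustrative choice.

```python
import math


def pearson_r(xs, ys):
    """Correlation from deviations about the means (two passes over the data).

    Assumes r = sum((x - mean_x) * (y - mean_y)) / (n * s_x * s_y),
    with s_x and s_y computed using divisor n.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long lists with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    s_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    if s_x == 0 or s_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    sum_of_products = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sum_of_products / (n * s_x * s_y)
```

Note that the means must be known before any deviation can be formed, so the data are read twice: once for the means and once for the deviations.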
Notation
∑X = ∑xi = x1 + x2 + … + xn
∑Y = ∑yi = y1 + y2 + … + yn
But a better formula for calculating the correlation is:
$$ r = \frac{\sum x_i y_i - \sum x_i \sum y_i / n}{\sqrt{\left(\sum x_i^2 - (\sum x_i)^2 / n\right)\left(\sum y_i^2 - (\sum y_i)^2 / n\right)}} $$
Why is this a better formula for
calculating the correlation?
∑xi = SUM(M7:M226)
∑xi² = SUMSQ(M7:M226)
∑xiyi = SUMPRODUCT(J2:J4,I2:I4)
Notation: ∑xi
∑xi = x1 + x2 + x3 + x4 + … + xn
∑yi = y1 + y2 + … + yn
∑xiyi = SUMPRODUCT(M7:M232,P7:P232)
$$ r = \frac{\sum x_i y_i - \sum x_i \sum y_i / n}{\sqrt{\left(\sum x_i^2 - (\sum x_i)^2 / n\right)\left(\sum y_i^2 - (\sum y_i)^2 / n\right)}} $$
• ∑xi = SUM(M7:M232)
• ∑xi² = SUMSQ(M7:M232)
• ∑xiyi = SUMPRODUCT(M7:M232,P7:P232)
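The sums-based calculation can be sketched in Python, with each running total playing the role of one spreadsheet function (SUM, SUMSQ, SUMPRODUCT). The function name `pearson_r_from_sums` and the small data set are invented for illustration.

```python
import math


def pearson_r_from_sums(xs, ys):
    """Correlation computed from five simple sums, in one pass over the data."""
    n = len(xs)
    sum_x = sum(xs)                                # like SUM over the x column
    sum_y = sum(ys)                                # like SUM over the y column
    sum_x2 = sum(x * x for x in xs)                # like SUMSQ over the x column
    sum_y2 = sum(y * y for y in ys)                # like SUMSQ over the y column
    sum_xy = sum(x * y for x, y in zip(xs, ys))    # like SUMPRODUCT of both columns
    numerator = sum_xy - sum_x * sum_y / n
    denominator = math.sqrt(
        (sum_x2 - sum_x ** 2 / n) * (sum_y2 - sum_y ** 2 / n)
    )
    return numerator / denominator


# Small invented data set, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 1, 4, 3, 5]
r = pearson_r_from_sums(xs, ys)
print(r)  # 0.8
```

Here ∑xi = 15, ∑yi = 15, ∑xi² = 55, ∑yi² = 55 and ∑xiyi = 53, so the numerator is 53 − 45 = 8 and the denominator is √(10 × 10) = 10. Because only sums are needed, the means never have to be computed first. One thing to watch in practice is that the subtractions can lose floating-point precision when the values are large relative to their spread.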
Calculating the correlation coefficient (3)
Cautions:
2. Causality:
Just because two variables are correlated does not mean
there is a causal relationship between them, even once we
have ruled out the "false positive" effect. The statistical
literature abounds with counter-examples, mostly
accidental and some hilarious, at least to those who
weren't involved.
Does this show what the authors
think it shows?