
Correlation

We take two measurements of two different physical properties; are they related?
What affects the degree (or amount)
of correlation?

• number of observations;
• strength of relationship (slope);
• strength of correlation (scatter);
• significance;
• confidence.
How many points are in each quadrant?

[Scatter plot divided into four quadrants; counts per quadrant:]

  2 | 5
  6 | 2
Simple case: centred on (0, 0)

  X negative, Y positive:  |  X positive, Y positive:
  X × Y is negative        |  X × Y is positive
  -------------------------+-------------------------
  X negative, Y negative:  |  X positive, Y negative:
  X × Y is positive        |  X × Y is negative
Porosity, φ, and permeability, K, are both always positive:

[Scatter plot of K against φ: every point lies in the all-positive quadrant]
But the difference between φ and mean(φ) plotted against the difference between K and mean(K) centres the plot on (0, 0):

[Scatter plot of (K − K̄) against (φ − φ̄), centred on (0, 0)]
So the difference between φ and mean(φ) and the difference between K and mean(K) give us the basis for the measure of correlation we want:

[Scatter plot of (K − K̄) against (φ − φ̄): 4 points in each of the two quadrants where the product is positive, 1 point in each of the two quadrants where it is negative]

8 points for which (K − K̄)(φ − φ̄) is positive versus 2 points for which it is negative.
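This 8-versus-2 count can be reproduced in a few lines of Python. The φ and K values below are made up purely for illustration, chosen to give the same split as the slide's example:

```python
# Hypothetical data: phi in %, K in mD, invented so that 8 of the
# mean-centred products are positive and 2 are negative.
phi = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
K = [2, 1, 4, 3, 10, 5, 12, 9, 14, 20]

phi_mean = sum(phi) / len(phi)   # 5.5
K_mean = sum(K) / len(K)         # 8.0

# Count points whose mean-centred product (phi - mean)(K - mean) is positive.
positive = sum(1 for p, k in zip(phi, K)
               if (p - phi_mean) * (k - K_mean) > 0)
negative = len(phi) - positive
print(positive, "positive,", negative, "negative")
```

A clear excess of positive products over negative ones is what signals a positive correlation.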
A formula for calculating the correlation:

  r = ∑(xi − x̄)(yi − ȳ) / √( ∑(xi − x̄)² ∑(yi − ȳ)² )
Notation
∑X = ∑xi = x1 + x2 + … + xn
∑Y = ∑yi = y1 + y2 + … + yn
But a better formula for calculating the correlation is:

  r = ( ∑xiyi − ∑xi ∑yi / n ) / √( (∑xi² − (∑xi)² / n) (∑yi² − (∑yi)² / n) )
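The same computation using only the running sums can be sketched as follows (names are illustrative):

```python
import math

def pearson_r_sums(x, y):
    """Computational form of r: five running sums, combined at the
    end -- exactly the terms a spreadsheet cell would hold."""
    n = len(x)
    s_x, s_y = sum(x), sum(y)
    s_xy = sum(xi * yi for xi, yi in zip(x, y))
    s_xx = sum(xi * xi for xi in x)
    s_yy = sum(yi * yi for yi in y)
    num = s_xy - s_x * s_y / n
    den = math.sqrt((s_xx - s_x ** 2 / n) * (s_yy - s_y ** 2 / n))
    return num / den

print(pearson_r_sums([1, 2, 3, 4], [8, 6, 4, 2]))  # exactly anti-linear data
```

No means are needed until the end, so each sum can be accumulated in a single pass over the data.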
Why is this a better formula for calculating the correlation?

Because each of the terms can be calculated relatively simply.
Calculating the correlation coefficient (1)

Set up one cell of a spreadsheet for each of the terms in the equation for the correlation coefficient, r:

  ∑xiyi
  ∑xi ∑yi / n
  ∑xi²   ∑yi²
  (∑xi)² / n   (∑yi)² / n
Notation: ∑xi
∑xi = x1 + x2 + x3 + x4 + … + xn

In Excel, if the x data are in rows 7 – 232 (n = 226) of column M and the y data in column P, then:

∑xi = SUM(M7:M232)

∑xi² = SUMSQ(M7:M232)

∑xiyi = SUMPRODUCT(M7:M232,P7:P232)
Notation: ∑xi
∑xi = x1 + x2 + x3 + x4 + … + xn
∑yi = y1 + y2 + … + yn

In Excel, if the data are in rows 7 – 232 (n = 226) of columns M and P, then:

∑xiyi = SUMPRODUCT(M7:M232,P7:P232)
Calculating the correlation coefficient (2)

  r = ( ∑xiyi − ∑xi ∑yi / n ) / √( (∑xi² − (∑xi)² / n) (∑yi² − (∑yi)² / n) )

• ∑xi = SUM(M7:M232)
• ∑xi² = SUMSQ(M7:M232)
• ∑xiyi = SUMPRODUCT(M7:M232,P7:P232)
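Outside Excel, the same assembly of sums can be checked in Python. The data below are randomly generated stand-ins for columns M and P, not the USGS values:

```python
import math
import random

random.seed(42)
n = 226                                      # rows 7-232
x = [random.random() for _ in range(n)]      # stand-in for column M
y = [2 * xi + random.random() for xi in x]   # stand-in for column P

# The spreadsheet building blocks (SUM, SUMSQ, SUMPRODUCT):
s_x, s_y = sum(x), sum(y)
s_xx = sum(xi * xi for xi in x)
s_yy = sum(yi * yi for yi in y)
s_xy = sum(xi * yi for xi, yi in zip(x, y))

r = (s_xy - s_x * s_y / n) / math.sqrt((s_xx - s_x ** 2 / n)
                                       * (s_yy - s_y ** 2 / n))

# Cross-check against the definition (mean-centred) form:
xm, ym = s_x / n, s_y / n
r_def = (sum((xi - xm) * (yi - ym) for xi, yi in zip(x, y))
         / math.sqrt(sum((xi - xm) ** 2 for xi in x)
                     * sum((yi - ym) ** 2 for yi in y)))
print(r, r_def)
```

The two forms agree to floating-point precision, as the algebra says they must.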
Calculating the correlation coefficient (3)

• Calculate the correlation coefficient for the porosity and permeability data in USGS_poroperm_data\37-Lindquist-1988.xls
Alternatively, the whole expression can be evaluated in a single function call: PEARSON takes 2 arguments, the arrays of x and of y. Hence in the above example:

  = PEARSON(M7:M232, P7:P232)
Calculating the correlation coefficient (4)

• Calculate the correlation coefficient for the porosity and permeability data in USGS_poroperm_data\37-Lindquist-1988.xls

• Calculate the correlation coefficient for the porosity and log10(permeability) data in USGS_poroperm_data\37-Lindquist-1988.xls

• Calculate the correlation coefficient for the porosity and ln(permeability) data in USGS_poroperm_data\37-Lindquist-1988.xls

• Calculate the correlation coefficient for the porosity and (cube root of permeability) data in USGS_poroperm_data\37-Lindquist-1988.xls
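The effect of these transformations can be previewed with hypothetical poro-perm pairs (the exercises themselves use the Lindquist spreadsheet; the numbers below are invented):

```python
import math

def corr(x, y):
    """Pearson correlation from the definition form."""
    n = len(x)
    xm, ym = sum(x) / n, sum(y) / n
    num = sum((a - xm) * (b - ym) for a, b in zip(x, y))
    den = math.sqrt(sum((a - xm) ** 2 for a in x)
                    * sum((b - ym) ** 2 for b in y))
    return num / den

# Hypothetical data: porosity in %, permeability in mD, with K growing
# roughly exponentially in phi (a common poro-perm pattern).
phi = [5, 8, 10, 12, 15, 18, 20, 22, 25]
K = [0.1, 0.5, 1.2, 3.0, 8.0, 20.0, 45.0, 110.0, 260.0]

for name, f in [("K", lambda k: k),
                ("log10(K)", math.log10),
                ("ln(K)", math.log),
                ("K^(1/3)", lambda k: k ** (1 / 3))]:
    print(f"{name:>8}: r = {corr(phi, [f(k) for k in K]):+.3f}")
```

Note that log10(K) and ln(K) give identical correlations, because one is a constant multiple of the other and r is unchanged by linear rescaling.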
Cautions:
1. False positives:
A high sample correlation coefficient does not, by itself, mean the variables are genuinely correlated. It is the nature of random processes that, if we take a number of uncorrelated variables and plot them against each other, the computed correlations will spread around zero and, if we take enough pairs, the highest and lowest among them will appear significant at any pre-defined percentage point.
Experiment
• Create 10 sets of 10 pairs of random numbers.
• Tabulate each set.
• Calculate the correlation coefficient for each set.
A larger number of pairs reduces this risk: two data sets, both with r = 0.84:

[Two scatter plots, each with r = 0.84: one with only a few points, the other with many more]
Cautions:

2. Causality:
Just because two variables are correlated does not mean there is a causal relationship between them, even once we have ruled out the "false positive" effect. The statistical literature abounds with counter-examples, mostly accidental and some hilarious, at least to those who weren't involved.
Does this show what the authors
think it shows?
