
Quantitative Reasoning

Association
Key Concepts

Sudarshan Narasimhan
(Dash)

pvosn@nus.edu.sg
Tutor
Provost office, QR team

Outline

Relationship between 2 numerical variables

Deterministic Non-deterministic

Scatter plot Association between x & y variables

Linear regression Correlation coefficient

Ecological correlation Attenuation effect

Ecological fallacy Atomistic fallacy


Deterministic relationships
• E.g., degrees Celsius to Fahrenheit

• We have a formula: given an x value, you can compute the exact value of y, and vice versa.

• The story ends there.
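
As a small illustration of a deterministic relationship, the Celsius/Fahrenheit conversion can be sketched as below. The function names are illustrative choices, not part of the slides.

```python
def celsius_to_fahrenheit(celsius: float) -> float:
    """Deterministic: each Celsius value gives exactly one Fahrenheit value."""
    return celsius * 9 / 5 + 32


def fahrenheit_to_celsius(fahrenheit: float) -> float:
    """The inverse formula: each Fahrenheit value gives exactly one Celsius value."""
    return (fahrenheit - 32) * 5 / 9


print(celsius_to_fahrenheit(100))  # boiling point of water in Fahrenheit
print(fahrenheit_to_celsius(32))   # freezing point of water in Celsius
```

Because the formula is exact in both directions, no data or fitting is needed.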


Non-deterministic relationships
• For each x value, there can exist multiple y values and vice versa

• We WANT to arrive at SOME kind of formula.

• But whatever formula we arrive at, we must understand what it IS and what it is NOT.

Simple Linear Regression
 Regression line (or line of best fit to data)

[Figure: two scatter plots of Son's Height vs Father's Height; x-axis: Father's height (55 to 80), y-axis: Son's Height (55 to 80)]


What does the regression equation mean?
• Suppose the line in the previous slide has the regression equation y = 1.01x + 1.02.

• So what does it mean if I input a value of x = 60?

• Does the value obtained for y correspond to the son’s height, assuming the father’s height is 60 inches?

• What if a father’s height is 80 inches? Can I use the regression equation to predict what the son’s height will be?

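
The questions above can be explored with a minimal sketch that evaluates y = 1.01x + 1.02. The slope and intercept come from the slide; the observed range of fathers' heights (59 to 75 inches) is a hypothetical assumption, since the slides do not state the range of the data set. The function name is also illustrative.

```python
SLOPE = 1.01      # from the regression equation on the slide
INTERCEPT = 1.02  # from the regression equation on the slide

# Hypothetical range of fathers' heights (inches) in the data set.
# The slides do not state the actual range; these bounds are assumptions.
OBSERVED_MIN = 59.0
OBSERVED_MAX = 75.0


def predict_average_son_height(father_height: float) -> float:
    """Predicted AVERAGE height of sons whose fathers have the given height."""
    if not OBSERVED_MIN <= father_height <= OBSERVED_MAX:
        # Outside the observed range the line has not been checked against data.
        raise ValueError("father's height is outside the range of the data set")
    return SLOPE * father_height + INTERCEPT


print(round(predict_average_son_height(60), 2))  # average over such sons, not one son

try:
    predict_average_son_height(80)
except ValueError as error:
    print("Cannot extrapolate:", error)
```

The value returned for x = 60 describes the average height of sons of 60-inch fathers, not the height of any particular son.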
Simple Linear Regression (Exam Point!)

 Regression line (or line of best fit to data): 𝑌 = 𝑚𝑋 + 𝑐, where 𝑚 ≠ 𝑟 (in general)

[Figure: two scatter plots of Son's Height vs Father's Height with the regression line; x-axis: Father's height, y-axis: Son's Height]

 Can only predict the son’s AVERAGE height!

 CANNOT predict the son’s average height if the father’s height is beyond the range used in the data set! (i.e., can’t simply extrapolate)

 The GRADIENT is not the same as the r value in general.
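
To see why the gradient and r differ in general, here is a minimal sketch on hypothetical data (the heights below are invented for illustration, not taken from the slides). For a least-squares line, the gradient equals r multiplied by (SD of y / SD of x), so the two coincide only when the SDs are equal.

```python
from math import sqrt

# Hypothetical fathers' and sons' heights in inches (illustrative only).
fathers = [60, 62, 64, 66, 68, 70]
sons = [63, 64, 66, 66, 69, 71]


def mean(values):
    return sum(values) / len(values)


def sum_of_products(xs, ys):
    """Sum of (x - mean_x) * (y - mean_y) over all points."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys))


sxx = sum_of_products(fathers, fathers)
syy = sum_of_products(sons, sons)
sxy = sum_of_products(fathers, sons)

gradient = sxy / sxx                      # m in Y = mX + c (least squares)
intercept = mean(sons) - gradient * mean(fathers)
r = sxy / sqrt(sxx * syy)                 # correlation coefficient
sd_ratio = sqrt(syy / sxx)                # SD of y divided by SD of x

print("gradient:", round(gradient, 4))
print("r:", round(r, 4))
print("r * SD_y / SD_x:", round(r * sd_ratio, 4))  # matches the gradient
```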


Correlation Coefficient, 𝒓 (Exam Point!)
1. measures linear association between 2 variables (NOT causation!)

2. ranges between -1 and 1 (no units)

3. 𝑟 > 0 → positive linear association
   𝑟 < 0 → negative linear association
   𝑟 = 0 → no linear association

4. What can you say about the graphs below?

5. Computing 𝑟 (Exam Point!)
   e.g. point (65, 71): -1.1 × 0.71

[Figure: scatter plot of Son's Height vs Father's Height divided into regions labelled 1 to 4, with the point (65, 71) marked; x-axis: Father's height, y-axis: Son's Height]

6. 𝑟 is not affected by change of scale
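
Points 5 and 6 can be sketched as below, assuming that the product -1.1 × 0.71 on the slide is the product of the standard units of the point (65, 71), and that r is the average of such products over all points. The data are hypothetical, and the use of the population SD (dividing by n) is also an assumption.

```python
from math import sqrt

# Hypothetical heights in inches (illustrative only, not from the slides).
fathers_in = [60, 62, 64, 66, 68, 70]
sons_in = [63, 64, 66, 66, 69, 71]


def standard_units(values):
    """Convert each value to (value - mean) / SD, using the population SD."""
    m = sum(values) / len(values)
    sd = sqrt(sum((v - m) ** 2 for v in values) / len(values))
    return [(v - m) / sd for v in values]


def correlation(xs, ys):
    """r as the average of the products of standard units."""
    products = [zx * zy for zx, zy in zip(standard_units(xs), standard_units(ys))]
    return sum(products) / len(products)


r_inches = correlation(fathers_in, sons_in)

# Change of scale: convert both variables from inches to centimetres.
fathers_cm = [2.54 * h for h in fathers_in]
sons_cm = [2.54 * h for h in sons_in]
r_cm = correlation(fathers_cm, sons_cm)

print("r (inches):", round(r_inches, 4))
print("r (cm):", round(r_cm, 4))  # same value: r has no units
```

Standard units remove both the mean and the unit of measurement, which is why rescaling the variables leaves r unchanged.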
Ecological Correlation (Exam Point!)
 Correlation computed based on aggregated data, e.g., group averages

 E.g., groups: school, business organization, country, race, etc.

 Why use it?


But beware
 ecological fallacy: inferring the correlation between individuals from aggregated data (Exam Point!)

 atomistic fallacy: generalizing the correlation between individuals to the aggregate-level correlation (Exam Point!)
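
To illustrate why neither correlation can stand in for the other, here is a minimal sketch with hypothetical scores for nine students in three schools (all numbers are invented). It computes r once for individual students and once for school averages.

```python
from math import sqrt

# Hypothetical (math score, chemistry score) pairs per school; illustrative only.
schools = {
    "A": [(50, 60), (60, 50), (55, 55)],
    "B": [(60, 70), (70, 60), (65, 65)],
    "C": [(70, 80), (80, 70), (75, 75)],
}


def correlation(points):
    """Pearson correlation coefficient for a list of (x, y) pairs."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxy = sum((x - mx) * (y - my) for x, y in points)
    sxx = sum((x - mx) ** 2 for x, _ in points)
    syy = sum((y - my) ** 2 for _, y in points)
    return sxy / sqrt(sxx * syy)


# Individual level: every student is one point.
students = [pair for pairs in schools.values() for pair in pairs]
r_individual = correlation(students)

# Aggregate level: every school is one point (its average scores).
averages = [
    (sum(x for x, _ in pairs) / len(pairs), sum(y for _, y in pairs) / len(pairs))
    for pairs in schools.values()
]
r_ecological = correlation(averages)

print("r between students:", round(r_individual, 2))
print("r between school averages:", round(r_ecological, 2))
```

In this made-up data set the school averages fall exactly on a line while the students do not, so reading the ecological correlation as the student-level correlation would overstate the association.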
Another way to think of ecological fallacy and atomistic fallacy

[Figure: two scatter plots. "Ecological Correlation": Average Math score vs Average Chemistry score, each point represents a school. "Correlation": Math score vs Chemistry score, each point represents a student.]

Direction of conclusion:
 Ecological fallacy: from the ecological correlation (schools) to the correlation (students)
 Atomistic fallacy: from the correlation (students) to the ecological correlation (schools)

Neither can prove that the other one is true. Anyone who draws the wrong conclusion has committed a fallacy.

Attenuation Effect (Exam Point!)

 Attenuation Effect: Due to range restriction in one variable, the correlation coefficient obtained understates the strength of association between the two variables

[Figure: two scatter plots of Son's Height vs Father's Height; one over the full range of Father's height (55 to 80), the other restricted to Father's height 66-70 in]

Is this the attenuation effect?
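
The attenuation effect can be sketched on hypothetical data (the heights below are invented, not read from the plots). The same points are used twice: once over the full range, and once restricted to fathers between 66 and 70 inches, mirroring the restriction shown in the figure.

```python
from math import sqrt

# Hypothetical (father's height, son's height) pairs in inches; illustrative only.
heights = [
    (60, 63), (62, 59), (64, 67), (66, 63), (68, 71),
    (70, 67), (72, 75), (74, 71), (76, 79),
]


def correlation(points):
    """Pearson correlation coefficient for a list of (x, y) pairs."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxy = sum((x - mx) * (y - my) for x, y in points)
    sxx = sum((x - mx) ** 2 for x, _ in points)
    syy = sum((y - my) ** 2 for _, y in points)
    return sxy / sqrt(sxx * syy)


r_full = correlation(heights)

# Range restriction: keep only fathers whose height is 66 to 70 inches.
restricted = [(x, y) for x, y in heights if 66 <= x <= 70]
r_restricted = correlation(restricted)

print("r over the full range:", round(r_full, 3))
print("r with fathers restricted to 66-70 in:", round(r_restricted, 3))
```

With the range of fathers' heights narrowed, the scatter in sons' heights stays the same while the spread in fathers' heights shrinks, so r drops even though the underlying relationship is unchanged.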
