Looking at Data: Relationships
Chapter 2
Question
In a study to determine whether surgery or
chemotherapy results in higher survival
rates for a certain type of cancer, whether
or not the patient survived is one variable,
and whether they received surgery or
chemotherapy is the other. Which is the
explanatory variable and which is the
response variable?
Scatterplot
Graphs the relationship between two
quantitative (numerical) variables
measured on the same individuals.
If a distinction exists, plot the
explanatory variable on the horizontal (x)
axis and plot the response variable on
the vertical (y) axis.
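As a dependency-free illustration of this axis convention, the sketch below prints a plain-text scatterplot instead of using a plotting library (an assumption made only so the example runs anywhere). The helper name `text_scatterplot` is hypothetical. It places the explanatory variable (per capita GDP, from this chapter's Western Europe case study) on the horizontal axis and the response variable (life expectancy) on the vertical axis.

```python
# A minimal text scatterplot (hypothetical helper, not a library function).
# Explanatory variable -> horizontal (x) axis; response variable -> vertical (y) axis.

def text_scatterplot(xs, ys, width=40, height=12, mark="*"):
    """Return the plot as a list of strings; index 0 is the top row (largest y)."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("xs and ys must be non-empty and the same length")
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    x_span = (x_max - x_min) or 1.0  # guard against a constant variable
    y_span = (y_max - y_min) or 1.0
    grid = [[" "] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        col = round((x - x_min) / x_span * (width - 1))
        row = round((y - y_min) / y_span * (height - 1))
        grid[height - 1 - row][col] = mark  # flip rows so larger y is higher up
    # "|" marks the y axis on the left, "+---" the x axis along the bottom.
    return ["|" + "".join(cells) for cells in grid] + ["+" + "-" * width]


# Case study data: per capita GDP (explanatory) and life expectancy (response).
gdp = [21.4, 23.2, 20.0, 22.7, 20.8, 18.6, 21.5, 22.0, 23.8, 21.2]
life = [77.48, 77.53, 77.32, 78.63, 77.17, 76.39, 78.51, 78.15, 78.99, 77.37]

for line in text_scatterplot(gdp, life):
    print(line)
```

Points that round to the same grid cell overlap, so a coarse text plot can show fewer marks than there are individuals.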
Scatterplot
[Figure: relationship between mean SAT verbal score and percent of high school graduates taking the SAT]
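Scatterplot
Look for the overall pattern and for striking deviations from that pattern.
Describe the overall pattern by its direction, form, and strength.
Look for outliers.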
Linear Relationship
Some relationships are such that the points of a scatterplot tend to fall along a straight line. Such a relationship is called a linear relationship.
Direction
Positive association
above-average values of one variable tend
to accompany above-average values of the
other variable, and below-average values
tend to occur together
Negative association
above-average values of one variable tend
to accompany below-average values of the
other variable, and vice versa
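The definitions above can be turned into a numerical check: for each individual, multiply its deviation from the mean of x by its deviation from the mean of y. The product is positive when both values are above average or both are below, and negative otherwise, so the sign of the total indicates the direction. This is a sketch with hypothetical data and a hypothetical helper name; the total has the same sign as the correlation coefficient r.

```python
def association_direction(xs, ys):
    """Classify direction of association (hypothetical helper).

    Each pair contributes (x - x_bar) * (y - y_bar):
      positive if both values are above average or both below average,
      negative if one is above average and the other below.
    """
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    total = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    if total > 0:
        return "positive"
    if total < 0:
        return "negative"
    return "no linear direction"


# Hypothetical data, for illustration only.
hours_studied = [1, 2, 3, 4, 5]
exam_score = [52, 60, 61, 70, 75]   # tends to rise with hours studied
errors_made = [9, 7, 6, 4, 2]       # tends to fall with hours studied

print(association_direction(hours_studied, exam_score))   # positive
print(association_direction(hours_studied, errors_made))  # negative
```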
Examples
From a scatterplot of college students,
there is a positive association between
verbal SAT score and GPA.
Examples of Relationships
Extensions
Adding a categorical variable to the scatterplot.
Taking a log transformation when the points are bunched together (for example, crowded near small values with a few very large ones).
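A small sketch of why a log transformation helps, using hypothetical right-skewed values (not from the source). The helper `relative_positions` is hypothetical; it shows where each value would fall along an axis running from the minimum (0) to the maximum (1). On the original scale most points crowd against one end, while after taking base-10 logarithms they spread out evenly. Logarithms require all values to be positive.

```python
import math

# Hypothetical, strongly right-skewed values (illustration only; all must be > 0).
values = [1, 10, 100, 1000, 10000]

def relative_positions(data):
    """Position of each value along the axis, from 0 (minimum) to 1 (maximum)."""
    lo, hi = min(data), max(data)
    return [(v - lo) / (hi - lo) for v in data]

raw_pos = relative_positions(values)                       # crowded near 0
log_pos = relative_positions([math.log10(v) for v in values])  # evenly spaced

print([round(p, 3) for p in raw_pos])
print([round(p, 3) for p in log_pos])
```

Here four of the five raw values sit in the first tenth of the axis, while their logarithms are equally spaced.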
Scatterplot
To add a categorical variable, use a different plot color or symbol for each category.
[Figure: scatterplot with the Southern states highlighted]
Correlation Coefficient
Examples of Correlations
[Figure: scatterplots with r = 0.94, r = 0.36, and r = -0.94]
Linear relationship?
Correlation is close to zero.
Curved relationship.
Correlation is misleading: r measures only the direction and strength of a linear relationship.
Correlation Calculation
$$r = \frac{1}{n-1}\sum_{i=1}^{n}\left(\frac{x_i-\bar{x}}{s_x}\right)\left(\frac{y_i-\bar{y}}{s_y}\right)$$
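The correlation formula translates directly into code: standardize each x and y value, multiply the standardized pairs, add them up, and divide by n - 1. This is a sketch; `correlation` here is a hypothetical hand-written helper, and `statistics.stdev` is used because it divides by n - 1, matching s_x and s_y in the formula.

```python
from statistics import mean, stdev

def correlation(xs, ys):
    """Sample correlation r from the standardized-product formula:
    r = 1/(n - 1) * sum of ((x_i - x_bar)/s_x) * ((y_i - y_bar)/s_y).
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)  # sample standard deviations (n - 1 divisor)
    if s_x == 0 or s_y == 0:
        raise ValueError("r is undefined when either variable is constant")
    z_products = (((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys))
    return sum(z_products) / (n - 1)


# Hypothetical points on a rising straight line give r = 1 (up to rounding).
print(correlation([1, 2, 3, 4], [3, 5, 7, 9]))
```

Because both variables are standardized, r has no units: rescaling y (for example, 10y + 3) leaves r unchanged, and swapping x and y gives the same value.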
Case Study
Per Capita Gross Domestic Product
and Average Life Expectancy for
Countries in Western Europe
Case Study
Country           Per capita GDP (x)   Life expectancy (y, years)
Austria           21.4                 77.48
Belgium           23.2                 77.53
Finland           20.0                 77.32
France            22.7                 78.63
Germany           20.8                 77.17
Ireland           18.6                 76.39
Italy             21.5                 78.51
Netherlands       22.0                 78.15
Switzerland       23.8                 78.99
United Kingdom    21.2                 77.37
Case Study
x_i     y_i      (x_i - x̄)/s_x    (y_i - ȳ)/s_y    product
21.4    77.48    -0.078           -0.345            0.027
23.2    77.53     1.097           -0.282           -0.309
20.0    77.32    -0.992           -0.546            0.542
22.7    78.63     0.770            1.102            0.849
20.8    77.17    -0.470           -0.735            0.345
18.6    76.39    -1.906           -1.716            3.271
21.5    78.51    -0.013            0.951           -0.012
22.0    78.15     0.313            0.498            0.156
23.8    78.99     1.489            1.555            2.315
21.2    77.37    -0.209           -0.483            0.101
x̄ = 21.52, s_x = 1.532
ȳ = 77.754, s_y = 0.795
sum of products = 7.285
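The table can be reproduced from the raw case-study data with the sketch below, which standardizes each country's GDP and life expectancy, prints the products, and sums them. Because it keeps every intermediate value unrounded, individual entries may differ from the table in the last decimal place.

```python
from statistics import mean, stdev

# Western Europe case study data (per capita GDP and life expectancy).
countries = ["Austria", "Belgium", "Finland", "France", "Germany",
             "Ireland", "Italy", "Netherlands", "Switzerland", "United Kingdom"]
gdp = [21.4, 23.2, 20.0, 22.7, 20.8, 18.6, 21.5, 22.0, 23.8, 21.2]
life = [77.48, 77.53, 77.32, 78.63, 77.17, 76.39, 78.51, 78.15, 78.99, 77.37]

x_bar, y_bar = mean(gdp), mean(life)
s_x, s_y = stdev(gdp), stdev(life)  # sample standard deviations (n - 1 divisor)

total = 0.0
for name, x, y in zip(countries, gdp, life):
    z_x = (x - x_bar) / s_x  # standardized GDP
    z_y = (y - y_bar) / s_y  # standardized life expectancy
    total += z_x * z_y
    print(f"{name:<15} {z_x:7.3f} {z_y:7.3f} {z_x * z_y:7.3f}")

r = total / (len(gdp) - 1)
print(f"x_bar = {x_bar:.2f}, s_x = {s_x:.3f}, y_bar = {y_bar:.3f}, s_y = {s_y:.3f}")
print(f"sum = {total:.3f}, r = {r:.3f}")
```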
Case Study
$$r = \frac{1}{n-1}\sum_{i=1}^{n}\left(\frac{x_i-\bar{x}}{s_x}\right)\left(\frac{y_i-\bar{y}}{s_y}\right) = \frac{1}{10-1}(7.285) = 0.809$$
Linear Regression
Objective:
We
Linear Regression
Case Study
Number of new birds and Percent returning
One of nature's patterns connects the percent of adult birds in a colony that return from the previous year and the number of new adults that join the colony.
Least Squares
Least-squares regression equation:
$$\hat{y} = a + bx$$
with slope and intercept
$$b = r\frac{s_y}{s_x}, \qquad a = \bar{y} - b\bar{x}$$
where s_x and s_y are the standard deviations of the two variables, and r is their correlation.
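The two formulas can be sketched as a small function (the name `least_squares_line` is hypothetical). It computes r from standardized values, sets the slope to b = r·s_y/s_x, and chooses the intercept so that the line passes through the point (x̄, ȳ). The usage check uses hypothetical points that lie exactly on ŷ = 2 + 3x, so the fit should recover that line.

```python
from statistics import mean, stdev

def least_squares_line(xs, ys):
    """Return (a, b) for the least-squares line y-hat = a + b*x,
    using b = r * s_y / s_x and a = y_bar - b * x_bar."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)  # sample standard deviations
    r = sum((x - x_bar) / s_x * (y - y_bar) / s_y for x, y in zip(xs, ys)) / (n - 1)
    b = r * s_y / s_x       # slope
    a = y_bar - b * x_bar   # intercept: the line passes through (x_bar, y_bar)
    return a, b


# Hypothetical points lying exactly on y = 2 + 3x.
a, b = least_squares_line([0, 1, 2, 3], [2, 5, 8, 11])
print(round(a, 6), round(b, 6))
```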
Regression Calculation
Case Study
Per Capita Gross Domestic Product
and Average Life Expectancy for
Countries in Western Europe
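Linear regression equation:
$$\bar{x} = 21.52,\quad s_x = 1.532,\quad \bar{y} = 77.754,\quad s_y = 0.795,\quad r = 0.809$$
$$b = r\frac{s_y}{s_x} = (0.809)\frac{0.795}{1.532} = 0.420$$
$$a = \bar{y} - b\bar{x} = 77.754 - (0.420)(21.52) = 68.716$$
$$\hat{y} = 68.716 + 0.420x$$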
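The regression calculation can be checked from the raw case-study data with the sketch below. It keeps every intermediate value unrounded, so its intercept may differ slightly from 68.716 in the third decimal place (the calculation above rounds b to 0.420 before computing a).

```python
from statistics import mean, stdev

# Western Europe case study: x = per capita GDP, y = life expectancy (years).
gdp = [21.4, 23.2, 20.0, 22.7, 20.8, 18.6, 21.5, 22.0, 23.8, 21.2]
life = [77.48, 77.53, 77.32, 78.63, 77.17, 76.39, 78.51, 78.15, 78.99, 77.37]

n = len(gdp)
x_bar, y_bar = mean(gdp), mean(life)
s_x, s_y = stdev(gdp), stdev(life)
r = sum((x - x_bar) * (y - y_bar) for x, y in zip(gdp, life)) / ((n - 1) * s_x * s_y)

b = r * s_y / s_x       # slope
a = y_bar - b * x_bar   # intercept
print(f"r = {r:.3f}, b = {b:.3f}, a = {a:.3f}")
print(f"y-hat = {a:.3f} + {b:.3f}x")
```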
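Residuals
A residual is the difference between an observed value of the response variable and the value predicted by the regression line:
residual = y - ŷ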
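As a worked example, the sketch below computes residuals for the Western Europe case study using the fitted line ŷ = 68.716 + 0.420x from the regression calculation. The helper name `predict` is hypothetical. Residuals from a least-squares line sum to zero; because these coefficients are rounded, the sum here is only close to zero.

```python
# Case study data: per capita GDP (x) and life expectancy (y).
countries = ["Austria", "Belgium", "Finland", "France", "Germany",
             "Ireland", "Italy", "Netherlands", "Switzerland", "United Kingdom"]
gdp = [21.4, 23.2, 20.0, 22.7, 20.8, 18.6, 21.5, 22.0, 23.8, 21.2]
life = [77.48, 77.53, 77.32, 78.63, 77.17, 76.39, 78.51, 78.15, 78.99, 77.37]

def predict(x, a=68.716, b=0.420):
    """Predicted life expectancy from the rounded least-squares line."""
    return a + b * x

residuals = []
for name, x, y in zip(countries, gdp, life):
    resid = y - predict(x)  # residual = observed y - predicted y-hat
    residuals.append(resid)
    print(f"{name:<15} observed {y:6.2f}  predicted {predict(x):6.3f}  residual {resid:+.3f}")

# Close to 0 rather than exactly 0, because a and b are rounded.
print(f"sum of residuals = {sum(residuals):.3f}")
```

A positive residual means the country's life expectancy is higher than the line predicts for its GDP; a negative residual means it is lower.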
Residuals
A residual
Residual Plot:
Case Study
Number of new birds and percent returning
Outliers:
Case Study
Gesell Adaptive Score and Age at First Word
From all the data: r² = 41%
After removing child 18: r² = 11%
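The same effect can be reproduced with a small hypothetical data set (not the Gesell data, which are not listed here): five points with a weak relationship, plus one point far out in the x direction that lies in line with the overall trend. The sketch computes r² as the square of the correlation r, which is the r² reported for a least-squares line with one explanatory variable; the helper name `r_squared` is hypothetical.

```python
from statistics import mean

def r_squared(xs, ys):
    """Square of the correlation r, computed from sums of deviations."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy * sxy / (sxx * syy)  # r = sxy / sqrt(sxx * syy)

# Hypothetical data: a weak cloud of five points ...
xs = [1, 2, 3, 4, 5]
ys = [3, 1, 4, 2, 5]
# ... plus one point far from the others in the x direction.
xs_all = xs + [15]
ys_all = ys + [15]

print(f"r^2 with the extra point:    {r_squared(xs_all, ys_all):.0%}")
print(f"r^2 without the extra point: {r_squared(xs, ys):.0%}")
```

As with child 18, a single point can make the relationship look much stronger than the rest of the data support.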
Cautions
Beware of extrapolation: using the regression line to predict y for values of x outside the range of the data used to fit the line.
Caution:
Beware of Extrapolation
Regression line: ŷ = 71.95 + 0.383x, where x is age in months
Height at age 42 months? ŷ = 88
Height at age 30 years (x = 360 months)? ŷ = 209.8
She is predicted to be 6 ft 10.5 in tall at age 30.
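Plugging both ages into the regression line shows how the extrapolated prediction arises. Because x is measured in months, 30 years enters the equation as 360 months. The function name `predicted_height` is hypothetical.

```python
def predicted_height(age_months):
    """Regression line from the slide: y-hat = 71.95 + 0.383x, x = age in months."""
    return 71.95 + 0.383 * age_months

print(round(predicted_height(42), 1))       # age 42 months
print(round(predicted_height(30 * 12), 1))  # age 30 years = 360 months: extrapolation
```

The arithmetic is the same in both cases; only the second uses an x far beyond the ages the line describes, which is why its prediction is absurd.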
Caution:
Correlation Does Not Imply Causation
Even very strong correlations may not correspond to a real causal relationship (changes in x actually causing changes in y). The correlation may instead be explained by a lurking variable.
Caution:
Correlation Does Not Imply Causation
Social Relationships and Health
House, J., Landis, K., and Umberson, D. "Social Relationships and Health." Science, Vol. 241 (1988), pp. 540-545.