Name
Line Review
Recall a linear equation is of the form: y = mx + b
Example 1 (a)
Eight batches of plastic are made. From each batch one test item is molded and its hardness,
y, is measured at time x. The following are the 8 measurements and times:
Time (x)    Hardness (y)
   32           230
   72           323
   64           298
   48           255
   16           199
   40           248
   80           359
   56           305
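We want to minimize:

$$\sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \sum_{i=1}^{n} (y_i - b_0 - b_1 x_i)^2$$

Setting the partial derivatives with respect to $b_0$ and $b_1$ equal to zero:

$$0 = \frac{\partial}{\partial b_0} \sum_{i=1}^{n} (y_i - b_0 - b_1 x_i)^2 = -2 \sum_{i=1}^{n} (y_i - b_0 - b_1 x_i)$$

and

$$0 = \frac{\partial}{\partial b_1} \sum_{i=1}^{n} (y_i - b_0 - b_1 x_i)^2 = -2 \sum_{i=1}^{n} (x_i)(y_i - b_0 - b_1 x_i)$$

That is,

$$0 = \sum_{i=1}^{n} (y_i - b_0 - b_1 x_i) \quad \text{and} \quad 0 = \sum_{i=1}^{n} (y_i x_i - b_0 x_i - b_1 x_i^2)$$

Solving these two equations gives the slope:

$$b_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2} = \frac{\sum x_i y_i - \frac{\sum x_i \sum y_i}{n}}{\sum x_i^2 - \frac{(\sum x_i)^2}{n}}$$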
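The two conditions in the derivation can be checked numerically. The following is a minimal sketch, not part of the original notes, that uses the Example 1 data to confirm that both forms of the slope formula agree and that the two sums set to zero in the derivation do vanish at the least squares estimates (up to floating-point rounding). The variable names are illustrative choices.

```python
# Example 1 data: time (x) and hardness (y).
x = [32, 72, 64, 48, 16, 40, 80, 56]
y = [230, 323, 298, 255, 199, 248, 359, 305]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n

# Slope, centered form: sum((xi - x_bar)(yi - y_bar)) / sum((xi - x_bar)^2).
b1_centered = (
    sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    / sum((xi - x_bar) ** 2 for xi in x)
)

# Slope, computational form using raw sums.
sum_xy = sum(xi * yi for xi, yi in zip(x, y))
sum_x2 = sum(xi ** 2 for xi in x)
b1_raw = (sum_xy - sum(x) * sum(y) / n) / (sum_x2 - sum(x) ** 2 / n)

# Intercept from the first condition: b0 = y_bar - b1 * x_bar.
b0 = y_bar - b1_raw * x_bar

# First-order conditions: both sums should be zero up to rounding error.
residuals = [yi - b0 - b1_raw * xi for xi, yi in zip(x, y)]
cond_b0 = sum(residuals)
cond_b1 = sum(xi * ei for xi, ei in zip(x, residuals))

print(b1_centered, b1_raw)
print(cond_b0, cond_b1)
```

Both printed slopes should match, and the two condition sums should be numerically indistinguishable from zero.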
Example 1 (b)
Compute the least squares line
          x       y        xy       x^2
         32     230      7360      1024
         72     323     23256      5184
         64     298     19072      4096
         48     255     12240      2304
         16     199      3184       256
         40     248      9920      1600
         80     359     28720      6400
         56     305     17080      3136
Total   408    2217    120832     24000
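Then we calculate:

$$b_1 = \frac{\sum x_i y_i - \frac{\sum x_i \sum y_i}{n}}{\sum x_i^2 - \frac{(\sum x_i)^2}{n}} = \frac{120832 - \frac{(408)(2217)}{8}}{24000 - \frac{(408)^2}{8}} = \frac{7765}{3192} \approx 2.4326$$

$$b_0 = \bar{y} - b_1 \bar{x} = \frac{2217}{8} - (2.4326)\left(\frac{408}{8}\right) \approx 153.06$$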
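The hand calculation can be reproduced in a few lines. This is a minimal sketch, assuming only the Example 1 data; it rebuilds the xy and x^2 columns, totals each column, and applies the formulas for b1 and b0. The variable names are illustrative.

```python
# Example 1 data: time (x) and hardness (y).
x = [32, 72, 64, 48, 16, 40, 80, 56]
y = [230, 323, 298, 255, 199, 248, 359, 305]
n = len(x)

# One row per observation: x, y, xy, x^2 (the columns of the worksheet table).
rows = [(xi, yi, xi * yi, xi ** 2) for xi, yi in zip(x, y)]

# Column totals.
sum_x, sum_y, sum_xy, sum_x2 = (sum(col) for col in zip(*rows))

# Least squares slope and intercept from the column totals.
b1 = (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x ** 2 / n)
b0 = sum_y / n - b1 * sum_x / n

print(sum_x, sum_y, sum_xy, sum_x2)  # 408 2217 120832 24000
print(round(b1, 4), round(b0, 2))    # 2.4326 153.06
```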
Now we have the fitted line: $\hat{y} = 153.06 + 2.4326x$
We can use this to get interpretations of estimates and compute a predicted/fitted value
for a given x.
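What is the predicted hardness for time x = 24? Using the least squares line for these data, $\hat{y} = 153.06 + 2.4326(24) \approx 211.44$.
Intercept: $b_0 \approx 153.06$ is the predicted hardness at time x = 0. Because x = 0 lies outside the observed times (16 to 80), read this value with caution.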
Don't Extrapolate!
When making predictions, don't extrapolate.
Extrapolation: using a value of x beyond the range of our actual observations to find a predicted value for y. We don't know the behavior of the line beyond our collected data.
Interpolation: using a value of x within the range of our observations to find a predicted value for y.
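The distinction can be made explicit in code. The sketch below, which is an illustration rather than anything prescribed by these notes, fits the Example 1 line and labels each prediction as interpolation or extrapolation by comparing the new x with the smallest and largest observed times. The function name `predict` is a hypothetical helper.

```python
# Example 1 data: time (x) and hardness (y).
x = [32, 72, 64, 48, 16, 40, 80, 56]
y = [230, 323, 298, 255, 199, 248, 359, 305]
n = len(x)

# Least squares slope and intercept.
x_bar = sum(x) / n
y_bar = sum(y) / n
b1 = (
    sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    / sum((xi - x_bar) ** 2 for xi in x)
)
b0 = y_bar - b1 * x_bar


def predict(x_new):
    """Return (fitted value, kind), where kind says whether x_new is
    inside the observed range of x (interpolation) or outside it."""
    inside = min(x) <= x_new <= max(x)
    kind = "interpolation" if inside else "extrapolation"
    return b0 + b1 * x_new, kind


print(predict(24))   # x = 24 is between 16 and 80
print(predict(100))  # x = 100 is beyond the largest observed time
```

A prediction labeled "extrapolation" is still computed, but it relies on the line continuing to hold outside the data, which the data cannot confirm.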
Correlation
Correlation gives the strength and direction of the linear relationship.
Sample correlation:
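$$r = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum (x_i - \bar{x})^2 \sum (y_i - \bar{y})^2}} = \frac{\sum x_i y_i - \frac{\sum x_i \sum y_i}{n}}{\sqrt{\sum x_i^2 - \frac{(\sum x_i)^2}{n}} \sqrt{\sum y_i^2 - \frac{(\sum y_i)^2}{n}}}$$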
Example 1 (c)
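Compute and interpret the sample correlation for Example 1.
Recall: $\sum x = 408$; $\sum y = 2217$; $\sum xy = 120832$; $\sum x^2 = 24000$; $\sum y^2 = 634069$

$$r = \frac{\sum x_i y_i - \frac{\sum x_i \sum y_i}{n}}{\sqrt{\sum x_i^2 - \frac{(\sum x_i)^2}{n}} \sqrt{\sum y_i^2 - \frac{(\sum y_i)^2}{n}}} = \frac{120832 - \frac{(408)(2217)}{8}}{\sqrt{24000 - \frac{(408)^2}{8}} \sqrt{634069 - \frac{(2217)^2}{8}}} = \frac{7765}{\sqrt{3192}\sqrt{19682.875}} \approx 0.9796$$

Interpretation: r is positive and close to 1, so there is a strong, positive linear relationship between time and hardness.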
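As a check on the arithmetic, the sample correlation can be computed directly from the five sums recalled above. This is a minimal sketch; the names `sxy`, `sxx`, and `syy` are shorthand introduced here for the three bracketed quantities in the formula.

```python
import math

# Sums for Example 1, as recalled in the notes.
n = 8
sum_x, sum_y = 408, 2217
sum_xy, sum_x2, sum_y2 = 120832, 24000, 634069

# Numerator and the two quantities under the square roots.
sxy = sum_xy - sum_x * sum_y / n
sxx = sum_x2 - sum_x ** 2 / n
syy = sum_y2 - sum_y ** 2 / n

r = sxy / (math.sqrt(sxx) * math.sqrt(syy))

print(sxy, sxx, syy)  # 7765.0 3192.0 19682.875
print(round(r, 4))    # 0.9796
```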
$R^2$ (Coefficient of Determination)
Total amount of variation in response:

$$Var(y) = \frac{1}{n-1} \sum (y_i - \bar{y})^2$$
Sum of squares total (SST): the total amount of variation in the response can be divided into the amount of variation due to the model (SSM) and the amount of variation due to the error (SSE).

$$SST = \sum (y_i - \bar{y})^2 = \sum (y_i - \hat{y}_i)^2 + \sum (\hat{y}_i - \bar{y})^2 = SSE + SSM$$
Coefficient of Determination: the proportion of variation in the response that is explained by the model.

$$R^2 = \frac{\sum (y_i - \bar{y})^2 - \sum (y_i - \hat{y}_i)^2}{\sum (y_i - \bar{y})^2} = \frac{SST - SSE}{SST} = \frac{SSM}{SST}$$
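$R^2$ is used to assess the fit of other types of relationships as well (not just linear).
Interpretation: fraction of raw variation in y accounted for by the fitted equation (i.e., the amount of variability in y explained by the model).
$0 \le R^2 \le 1$
The closer $R^2$ is to 1, the more of the variation in y the fitted equation accounts for.
For simple linear regression, $R^2 = r^2$.
Example 1: $R^2 = r^2 \approx 0.96$, so about 96% of the variation in hardness is accounted for by the linear relationship with time.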
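The decomposition SST = SSE + SSM and the identity between the coefficient of determination and the squared correlation can be verified on the Example 1 data. The following is a minimal sketch under that assumption; the variable names are illustrative.

```python
import math

# Example 1 data: time (x) and hardness (y).
x = [32, 72, 64, 48, 16, 40, 80, 56]
y = [230, 323, 298, 255, 199, 248, 359, 305]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
sxx = sum((xi - x_bar) ** 2 for xi in x)

# Least squares line and fitted values.
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar
y_hat = [b0 + b1 * xi for xi in x]

# Sums of squares: total, error, and model.
sst = sum((yi - y_bar) ** 2 for yi in y)
sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
ssm = sum((fi - y_bar) ** 2 for fi in y_hat)

r_squared = (sst - sse) / sst
r = sxy / math.sqrt(sxx * sst)

print(sst, sse + ssm)           # the two totals agree up to rounding
print(round(r_squared, 4))      # 0.9597
print(round(r ** 2, 4))         # 0.9597
```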
Residual Plots
Residuals: $e = y - \hat{y}$
Residual Plot: A plot of the residuals (e) vs. x or $\hat{y}$.
If the fitted equation is a good one, the residuals should be patternless (cloud-like,
random scatter), and centered around 0.
It should have been obvious from a plot of the data that a linear fit is not appropriate; the data looked more quadratic.
Notice that the residual plot shows an above-below-above pattern, which is indicative of quadratic data (the same is true of a below-above-below pattern).
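The above-below-above pattern is easy to reproduce. The sketch below uses hypothetical data, y = x^2 at x = 0, 1, ..., 6 (chosen here for illustration and not taken from the notes), fits a straight line by least squares, and lists the residuals in order of x.

```python
# Hypothetical quadratic data (an assumption for illustration only).
x = [0, 1, 2, 3, 4, 5, 6]
y = [xi ** 2 for xi in x]
n = len(x)

# Least squares line.
x_bar = sum(x) / n
y_bar = sum(y) / n
b1 = (
    sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    / sum((xi - x_bar) ** 2 for xi in x)
)
b0 = y_bar - b1 * x_bar

# Residuals e = y - y_hat, in order of increasing x.
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

print(b0, b1)     # -5.0 6.0
print(residuals)  # [5.0, 0.0, -3.0, -4.0, -3.0, 0.0, 5.0]
```

The residuals are positive at both ends and negative in the middle, so the data sit above the line, then below it, then above it again. A patternless residual plot would not show this kind of systematic run of signs.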
Precautions
Precautions about Simple Linear Regression (SLR)
r only measures linear relationships (recall quadratic relationship)
$R^2$ and r can be drastically affected by a few unusual data points (for an example see pg 137)
Using Computers
JMP: available on almost all on-campus computers
(http://www.it.iastate.edu/labsdb/search.php)
or as a free download
(http://www.stat.iastate.edu/resources/software/jmp/)
Curve Fitting
Not all bivariate relationships are linear.
$$y = \beta_0 + \beta_1 x + \beta_2 x^2 + \cdots + \beta_k x^k$$
The actual computations to get $b_0, b_1, \ldots, b_k$ are done by computer.
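To give a sense of what the computer does, here is a minimal sketch of a polynomial least squares fit in plain Python. It is one possible approach (solving the normal equations by Gaussian elimination), not necessarily what JMP or any other package does internally. The function names `solve` and `polyfit` and the data are hypothetical.

```python
def solve(a, b):
    """Solve the square linear system a * coef = b by Gaussian
    elimination with partial pivoting."""
    size = len(b)
    # Augmented matrix [a | b], copied so the inputs are not modified.
    m = [list(map(float, row)) + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(size):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, size), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, size):
            factor = m[r][col] / m[col][col]
            for c in range(col, size + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution.
    coef = [0.0] * size
    for r in range(size - 1, -1, -1):
        known = sum(m[r][c] * coef[c] for c in range(r + 1, size))
        coef[r] = (m[r][size] - known) / m[r][r]
    return coef


def polyfit(x, y, k):
    """Least squares estimates [b0, b1, ..., bk] for a degree-k polynomial."""
    # Normal equations: for p = 0..k,
    #   sum_j b_j * sum_i x_i^(p + j) = sum_i y_i * x_i^p
    a = [[sum(xi ** (p + j) for xi in x) for j in range(k + 1)]
         for p in range(k + 1)]
    rhs = [sum(yi * xi ** p for xi, yi in zip(x, y)) for p in range(k + 1)]
    return solve(a, rhs)


# Hypothetical data generated exactly from y = 1 + 2x + 3x^2.
x = [0, 1, 2, 3, 4, 5, 6]
y = [1 + 2 * xi + 3 * xi ** 2 for xi in x]

coef = polyfit(x, y, 2)
print([round(c, 6) for c in coef])
```

Because the data follow the quadratic exactly, the estimates should recover the coefficients 1, 2, and 3 up to rounding. For high-degree polynomials the normal equations become poorly conditioned, which is one reason statistical software uses more careful numerical methods.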
Surface Fitting
For when we have more than 1 predictor variable (x) for a single response y.
$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k$$
Again, the estimates $b_0, b_1, \ldots, b_k$ come from a computer.
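For reference, the least squares criterion carries over unchanged from the one-predictor case. The block below is a sketch in matrix notation, which these notes do not otherwise use; the symbols $X$, $\mathbf{y}$, and $\mathbf{b}$ are assumptions introduced here. Here $x_{ij}$ denotes the value of predictor $j$ for observation $i$.

```latex
% Minimize the sum of squared residuals over b_0, b_1, ..., b_k:
\[
  \sum_{i=1}^{n} \left( y_i - b_0 - b_1 x_{i1} - b_2 x_{i2} - \cdots - b_k x_{ik} \right)^2
\]
% Stack the data: X has a leading column of ones, then one column per predictor.
\[
  X =
  \begin{pmatrix}
    1 & x_{11} & \cdots & x_{1k} \\
    \vdots & \vdots & & \vdots \\
    1 & x_{n1} & \cdots & x_{nk}
  \end{pmatrix},
  \qquad
  \mathbf{y} = \begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix},
  \qquad
  \mathbf{b} = \begin{pmatrix} b_0 \\ \vdots \\ b_k \end{pmatrix}
\]
% Setting each partial derivative to zero gives the normal equations:
\[
  X^{\top} X \, \mathbf{b} = X^{\top} \mathbf{y}
  \quad \Longrightarrow \quad
  \mathbf{b} = (X^{\top} X)^{-1} X^{\top} \mathbf{y}
  \quad \text{(when } X^{\top} X \text{ is invertible)}
\]
```

With a single predictor ($k = 1$) these equations reduce to the two conditions used to obtain $b_0$ and $b_1$ for the least squares line.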