
Stat 305, Fall 2014

Name

Chapter 4: Linear Regression


Fitting a Line by Least Squares
Goal: Identify a potential relationship between two quantitative variables. We would like to
use an equation to describe how a dependent (response) variable, y, changes in response to
a change in one or more independent (experimental) variable(s), x.

Line Review
Recall a linear equation is of the form: y = mx + b

In statistics, we use: y = \beta_0 + \beta_1 x + \epsilon, where we assume \beta_0 and \beta_1 are unknown
parameters, and \epsilon is some error.
The goal is to find estimates b_0 and b_1 for the parameters (also called \hat{\beta}_0 and \hat{\beta}_1).

Example 1 (a)
Eight batches of plastic are made. From each batch one test item is molded and its hardness,
y, is measured at time x. The following are the 8 measurements and times:
Time (x)       32    72    64    48    16    40    80    56
Hardness (y)   230   323   298   255   199   248   359   305

Step 1: Look at a scatter plot to determine if a linear relationship seems appropriate.
Describe the strength, direction, and form.
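One quick way to produce such a scatter plot is a short script; matplotlib is an assumed tool here (the handout itself uses JMP later on), so this is only a sketch.

```python
# A minimal sketch for plotting the Example 1 data to judge
# strength, direction, and form of the relationship.
import matplotlib.pyplot as plt

time = [32, 72, 64, 48, 16, 40, 80, 56]               # x: molding time
hardness = [230, 323, 298, 255, 199, 248, 359, 305]   # y: measured hardness

plt.scatter(time, hardness)
plt.xlabel("Time (x)")
plt.ylabel("Hardness (y)")
plt.title("Example 1: hardness vs. time")
plt.show()   # the points should fall close to a straight, increasing line
```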

How do we find an equation for the line that best fits the data?
A straight line will not pass through every data point, so when we estimate the line we have
\hat{y} (the predicted y value) instead of y (the observed y value).

The fitted equation then is \hat{y} = b_0 + b_1 x.
We choose the line that has the smallest residuals.
Residual: the vertical distance between the actual point and the line, e = y - \hat{y}.

Principle of Least Squares

The best line minimizes the sum of squared vertical distances from the data points
to the line (sum of squared residuals).

So choose b_0 and b_1 to minimize \sum_{i=1}^{n} (y_i - \hat{y}_i)^2.

We want to minimize:

\sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \sum_{i=1}^{n} (y_i - b_0 - b_1 x_i)^2

So we take derivatives and set them to zero:

0 = \frac{\partial}{\partial b_0} \sum_{i=1}^{n} (y_i - b_0 - b_1 x_i)^2 = -2 \sum_{i=1}^{n} (y_i - b_0 - b_1 x_i)

and

0 = \frac{\partial}{\partial b_1} \sum_{i=1}^{n} (y_i - b_0 - b_1 x_i)^2 = -2 \sum_{i=1}^{n} x_i (y_i - b_0 - b_1 x_i)

which give the normal equations

0 = \sum_{i=1}^{n} (y_i - b_0 - b_1 x_i)

0 = \sum_{i=1}^{n} (x_i y_i - b_0 x_i - b_1 x_i^2)

Solving these for b_0 and b_1 we get:

b_0 = \bar{y} - b_1 \bar{x}

b_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}
    = \frac{\sum x_i y_i - \frac{\sum x_i \sum y_i}{n}}{\sum x_i^2 - \frac{(\sum x_i)^2}{n}}
Example 1 (b)
Compute the least squares line for the data in Example 1.

  x      y      xy       x^2     y^2
  32     230    7360     1024    52900
  72     323    23256    5184    104329
  64     298    19072    4096    88804
  48     255    12240    2304    65025
  16     199    3184     256     39601
  40     248    9920     1600    61504
  80     359    28720    6400    128881
  56     305    17080    3136    93025
  ------------------------------------
  408    2217   120832   24000   634069   (column totals)
Then we calculate:

b_1 = \frac{\sum x_i y_i - \frac{\sum x_i \sum y_i}{n}}{\sum x_i^2 - \frac{(\sum x_i)^2}{n}} =

b_0 = \bar{y} - b_1 \bar{x} =

Now we have the fitted line:
We can use this to get interpretations of estimates and compute a predicted/fitted value
for a given x.
What is the predicted hardness for time x = 24?
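As a check on the hand computation, here is a short Python sketch (not part of the original handout) that evaluates the formulas directly from the data; the printed values are rounded.

```python
# Least squares slope and intercept for Example 1, computed from the sums
# in the table above. Pure Python; no external libraries needed.
x = [32, 72, 64, 48, 16, 40, 80, 56]
y = [230, 323, 298, 255, 199, 248, 359, 305]
n = len(x)

sum_x, sum_y = sum(x), sum(y)                      # 408, 2217
sum_xy = sum(xi * yi for xi, yi in zip(x, y))      # 120832
sum_x2 = sum(xi ** 2 for xi in x)                  # 24000

b1 = (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x ** 2 / n)
b0 = sum_y / n - b1 * sum_x / n

print(round(b1, 3), round(b0, 1))   # roughly 2.433 and 153.1
print(round(b0 + b1 * 24, 1))       # predicted hardness at x = 24, about 211.4
```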

Interpreting Slope and Intercept

Slope:
Bad: Rise over run, change in y with change in x.
Good: For every 1 (unit) increase in (x), we expect a (b_1) increase in (y).
Intercept:
Bad: Where the line crosses the y axis, value of y when x = 0.
Good: When (x) is 0 (units), we expect (y) to be (b_0).

Interpreting the intercept is nonsense when . . .

A value of 0 for x is not practical (e.g.: measuring heights of adult humans).
Extrapolation would have to be used to get the predicted value of y (e.g.: getting a
negative intercept for a measurement that cannot be negative).
Note: this doesn't mean the intercept is wrong! It's just not interpretable.
Always put it in context! i.e.: replace everything in parentheses.
Example 1 Interpretations
Slope:

Intercept:

Is the interpretation of the intercept reasonable?

Don't Extrapolate!
When making predictions, don't extrapolate.
Extrapolation: using a value of x beyond the range of our actual observations to find a
predicted value for y. We don't know the behavior of the line beyond our collected data.
Interpolation: using a value of x within the range of our observations to find a
predicted value for y.

Correlation
Correlation gives the strength and direction of the linear relationship.
Sample correlation:

r = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum (x_i - \bar{x})^2 \sum (y_i - \bar{y})^2}}
  = \frac{\sum x_i y_i - \frac{\sum x_i \sum y_i}{n}}{\sqrt{\left(\sum x_i^2 - \frac{(\sum x_i)^2}{n}\right)\left(\sum y_i^2 - \frac{(\sum y_i)^2}{n}\right)}}

Properties of sample correlation:

-1 \le r \le 1
r = 1 or r = -1 if all points lie exactly on the fitted line.
The closer r is to 0, the weaker the linear relationship; the closer it is to 1 or -1, the
stronger the linear relationship.
Negative r indicates a negative linear relationship.
Positive r indicates a positive linear relationship.
Interpretation: always needs 3 things:
Strength (strong, moderate, weak)
Direction (positive or negative)
Linear
Examples:
There is a strong, positive linear correlation between (x) and (y).
There is a fairly weak, negative linear correlation between (x) and (y).

Guess the correlation:

Example 1 (c)
Compute and interpret the sample correlation for Example 1.
Recall: \sum x = 408; \sum y = 2217; \sum xy = 120832; \sum x^2 = 24000; \sum y^2 = 634069

r = \frac{\sum x_i y_i - \frac{\sum x_i \sum y_i}{n}}{\sqrt{\left(\sum x_i^2 - \frac{(\sum x_i)^2}{n}\right)\left(\sum y_i^2 - \frac{(\sum y_i)^2}{n}\right)}} =

Interpretation:
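A quick numerical check of this computation (again a sketch, not part of the handout):

```python
# Sample correlation for Example 1, computed from the recalled sums.
from math import sqrt

n = 8
sum_x, sum_y = 408, 2217
sum_xy, sum_x2, sum_y2 = 120832, 24000, 634069

num = sum_xy - sum_x * sum_y / n
den = sqrt((sum_x2 - sum_x ** 2 / n) * (sum_y2 - sum_y ** 2 / n))
r = num / den
print(round(r, 3))   # roughly 0.980: a strong, positive linear correlation
```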

R^2 (Coefficient of Determination)
Total amount of variation in response:

Var(y) = \frac{1}{n-1} \sum (y_i - \bar{y})^2

Sum of squares total (SST): the total amount of variation in the response can be divided
into the amount of variation due to the model (SSM) and the amount of variation due to the
error (SSE):

SST = \sum (y_i - \bar{y})^2 = \sum (y_i - \hat{y}_i)^2 + \sum (\hat{y}_i - \bar{y})^2 = SSE + SSM

Coefficient of Determination: the proportion of variation in the response that's explained
by the model:

R^2 = \frac{\sum (y_i - \bar{y})^2 - \sum (y_i - \hat{y}_i)^2}{\sum (y_i - \bar{y})^2} = \frac{SST - SSE}{SST} = \frac{SSM}{SST}

Used to assess the fit of other types of relationships as well (not just linear).
Interpretation: fraction of raw variation in y accounted for by the fitted equation (i.e.:
amount of variability in y explained by the model).
0 \le R^2 \le 1
The closer R^2 is to 1, the better the model.
R^2 = (r)^2
Example 1:
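As a worked value for Example 1 (a sketch using r from part (c); this blank is left for the student on the original handout):

R^2 = r^2 \approx (0.980)^2 \approx 0.96

so roughly 96% of the variation in hardness is explained by the fitted line on time.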

Residual Plots
Residuals: e = y - \hat{y}
Residual Plot: a plot of the residuals (e) vs. x or \hat{y}.
If the fitted equation is a good one, the residuals should be patternless (cloud-like,
random scatter) and centered around 0.

Situation 1: The ideal residual plot

Situation 2: An obvious pattern (BAD!!)

Should have been obvious from plot of data that a linear fit is not appropriate;
data looked more quadratic.
Notice residual plot shows above-below-above pattern indicative of quadratic
data (also true for below-above-below pattern)

Situation 3: Another common pattern

As x increases, so does the variance in y.


Evident by fan-shaped appearance in residual plot (more spread for larger
fitted values).
Solutions:
May want to investigate measurement process to see if there is an issue with
measuring device
Transform the data! Use a log transformation
Normal Residuals: Recall from Ch. 3 that the best way to check whether a data set is normal
is the normal probability plot (normal Q-Q plot). These are the plots from Example 1.

Normal probability plot close to a straight line


No clear pattern in the residual plot
No obvious problem with the linear model!
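For Example 1, the residuals and a residual-vs-fitted plot can be produced with a short script; as before, the use of matplotlib here is an assumption about available tools, not what the handout used (its plots come from JMP).

```python
# Residuals for Example 1 and a residual-vs-fitted plot.
import matplotlib.pyplot as plt

x = [32, 72, 64, 48, 16, 40, 80, 56]
y = [230, 323, 298, 255, 199, 248, 359, 305]
n = len(x)

# Fitted line from Example 1 (b): b1 ~ 2.433, b0 ~ 153.1.
b1 = (sum(xi * yi for xi, yi in zip(x, y)) - sum(x) * sum(y) / n) / \
     (sum(xi ** 2 for xi in x) - sum(x) ** 2 / n)
b0 = sum(y) / n - b1 * sum(x) / n

fitted = [b0 + b1 * xi for xi in x]
residuals = [yi - fi for yi, fi in zip(y, fitted)]   # e = y - y_hat

plt.scatter(fitted, residuals)
plt.axhline(0)                     # residuals should scatter randomly about 0
plt.xlabel("Fitted values")
plt.ylabel("Residuals")
plt.show()
```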

Precautions
Precautions about Simple Linear Regression (SLR)
r only measures linear relationships (recall quadratic relationship)
R^2 and r can be drastically affected by a few unusual data points (for an example see
pg. 137)

Using Computers
JMP: available on almost all on campus computers
(http://www.it.iastate.edu/labsdb/search.php)
or download for free
(http://www.stat.iastate.edu/resources/software/jmp/)

Multiple Linear Regression (MLR)

Fitting Curves and Surfaces by Least Squares

Curve Fitting
Not all bivariate relationships are linear.
y = \beta_0 + \beta_1 x + \beta_2 x^2 + \cdots + \beta_k x^k
The actual computations to get b_0, b_1, \ldots, b_k are done by computer.

Surface Fitting
For when we have more than one predictor variable (x) for a single response y.
y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k
Again, the estimates b_0, b_1, \ldots, b_k come from the computer.
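The handout leaves these computations to JMP. As an illustration only, here is one way such fits could be carried out with numpy (an assumed tool, not the course's), using the same least squares principle; the data values are made up purely for the example.

```python
# Least squares fits for a quadratic curve and a two-predictor surface,
# via numpy. All data below are illustrative, not from the handout.
import numpy as np

# Curve fitting: y = b0 + b1*x + b2*x^2
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 4.9, 10.2, 17.1, 26.3])
b2, b1, b0 = np.polyfit(x, y, deg=2)              # coefficients, highest power first

# Surface fitting: y = b0 + b1*x1 + b2*x2
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
x2 = np.array([0.5, 1.0, 0.0, 1.5, 2.0])
X = np.column_stack([np.ones_like(x1), x1, x2])   # design matrix with intercept column
coef, *_ = np.linalg.lstsq(X, y, rcond=None)      # coef = [b0, b1, b2]
```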

Interpretations for MLR

Interpretations of parameter estimates for the model: y = \beta_0 + \beta_1 x_1 + \beta_2 x_2
b_0 represents the value of y when x_1 = 0 and x_2 = 0.
b_1 represents the increase/decrease in y for every one unit increase in x_1 when x_2 is
held constant.
b_2 represents the increase/decrease in y for every one unit increase in x_2 when x_1 is
held constant.
Don't forget to put into context!


SLR vs. MLR

Curve and surface fitting go together into Multiple Linear Regression (MLR).
The by-hand formulas from SLR don't work for MLR.
The interpretations are not the same either.
Residual plots
Still use the computer to generate plots.
Still need:
Normal probability plot of residuals.
Plot of residuals vs. fitted values.
Plot of residuals against each of the x variables.
Back to R^2
R^2 can be used to pick a better model; however, it gets inflated the more parameters you
add to the model.
For MLR, it's generally better to use R^2_{adj}.
Don't JUST use R^2. You need to check residuals too.
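The handout does not give the formula for R^2_{adj}; for reference, a commonly used form (with n observations and k predictors) is

R^2_{adj} = 1 - \frac{(1 - R^2)(n - 1)}{n - k - 1}

which penalizes R^2 for each extra parameter, so adding a useless predictor no longer automatically increases the statistic.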

