
Name: _______________________________________________Per__________Date_______________

AP Statistics Linear Regression Review



1) In a statistics course, a linear regression equation was computed to predict the final exam score based on
the score on the first test of the term. The equation was $\hat{y} = 25 + 0.7x$, where $\hat{y}$ is the predicted
final exam score and $x$ is the score on the first test. George scored 80 on the first test. On the final exam
George scored 85. What is the value of his residual?

Residual = $y - \hat{y}$

$\hat{y} = 25 + 0.7(80) = 81$, so residual $= 85 - 81 = 4$
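
The residual arithmetic can be checked with a short Python sketch. The function name `predict_final` is ours, introduced only for illustration; the coefficients come from the problem statement.

```python
def predict_final(first_test_score: float) -> float:
    """Predicted final exam score from the line y-hat = 25 + 0.7x."""
    return 25 + 0.7 * first_test_score

observed = 85                     # George's actual final exam score
predicted = predict_final(80)     # George scored 80 on the first test
residual = observed - predicted   # residual = observed - predicted
print(round(predicted, 1), round(residual, 1))  # prints 81.0 4.0
```

A positive residual means the line under-predicted George's final exam score.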

2) A student was interested in the relationship between the weight of a
car and gas consumption measured in mpg. He selected 16 different
automobiles and recorded their weights along with their advertised
mpg. The regression plot is shown to the right.

What effect would the addition of the point (4,300 lbs, 15.63 mpg)
have on the value of $r^2$?

The new point would lie almost exactly on the line. It would also be
the lowest point on the graph and farther away from the majority of
the data, making it an influential point with high leverage. Because
it is consistent with the existing linear trend, it would cause the
value of $r^2$ in this case to increase.


3) The correlation between height and weight among men between the ages of 18 and 70 in the United States is
approximately 0.42. Which of the following conclusions does not follow from the data?
a) Taller men tend to be heavier
b) Changing the units of weight and height would still yield the same correlation value.
c) Heavier men tend to be taller.
d) If a man in this group changes his diet and gains 10 pounds, he is likely to get taller.
e) There is a moderate association between a man's height and weight.

4) There is a linear relationship between the duration x (in seconds) of an eruption of a geyser and the interval
of time y (in minutes) until the next eruption. A LSRL of data collected by a geologist is represented by
$\hat{y} = 41.9 + 0.18x$. What is the estimated increase in the interval of time until the next eruption that
corresponds to a 60-second increase in the duration?
The slope is the change in the predicted interval per additional second of duration, so the estimated increase
is 0.18(60) = 10.8 minutes.


5) Which of the following statements is true?
a) Values of r near 0 indicate a strong linear relationship.
b) Changing the measurement units of x and y may affect the correlation between x and y.
c) A strong correlation means that there is a definite cause-and-effect relationship between x and y.
d) Correlation changes when the x and y variables are reversed.
e) The correlation can be strongly affected by a few outlying observations.

6) Data are obtained from a group of high school seniors comparing age and the number of hours spent on the
telephone. The resulting regression equation is:
Predicted number of hours = 0.123(AGE) + 2.57, where r = 0.866

What percentage of variation in the number of hours spent on the telephone can be explained by the least
squares regression line model?
Since $r^2$ is the proportion of the variation in the response that can be attributed to the linear relationship
between the variables, and $r^2 = (0.866)^2 \approx 0.75$, about 75% of the variation in the number of hours
spent on the telephone can be attributed to the linear relationship between age and number of hours.
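
As a quick check of the step from r to the percentage of variation explained, here is a minimal Python sketch (variable names are ours):

```python
r = 0.866                   # correlation reported with the regression equation
r_squared = r ** 2          # coefficient of determination
print(f"{r_squared:.1%}")   # prints 75.0%
```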

7) Consider the following three scatterplots to the right. Put
them in order by the value of their correlation coefficient
from smallest to greatest.

r2 < r1 < r3

8) A linear model was constructed for a set of bivariate data using least
squares regression techniques. Given the residual plot shown, what conclusion
should be drawn?

Because there is an obvious pattern in the residual plot, a linear model is not an
appropriate fit.


9) The following output was generated from a random sample of 40
companies on the Forbes 500 list, where sales (in hundreds of thousands of dollars) and profits (in hundreds of
thousands of dollars) were investigated using linear regression. Here is the output computed in Minitab:

Dependent variable is Profits
R-squared = 66.2%

Variable    Coef        S.E.      P-value
Constant    -176.644    61.16     0.0050
Sales       0.092498    0.0106    <0.001

Interpret the value of the slope for this output.

On average, for every $100,000 increase in sales, there is a $9,249.80 increase in the profit.
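
The dollar figures in this interpretation follow from the units of the two variables. A minimal Python sketch of the conversion, with variable names of our own choosing:

```python
UNIT_DOLLARS = 100_000   # sales and profits are both in hundreds of thousands of dollars
slope = 0.092498         # coefficient on Sales from the output

# A one-unit increase in sales is $100,000. The predicted change in profit
# is `slope` units, converted back to dollars here.
profit_change_dollars = slope * UNIT_DOLLARS
print(f"${profit_change_dollars:,.2f}")   # prints $9,249.80
```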

10) A fisheries research report gives the following regression equation for the relationship between the length
(L) in cm and weight (W) in grams, of the gracile lizardfish, a small marine fish that lives in the Indian Ocean:

$\ln W = -5.36 + 3.216 \ln L$
What is the predicted weight of a lizardfish that was 12 cm long, based on this model?
$\ln W = -5.36 + 3.216 \ln(12) \approx 2.631$, so $W \approx e^{2.631} \approx 13.89$ grams
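
Because the model is fitted on the natural-log scale, the prediction has to be back-transformed with the exponential function. The sketch below shows that step; the function name `predicted_weight` is ours, and only the two coefficients come from the problem.

```python
import math

def predicted_weight(length_cm: float) -> float:
    """Back-transform ln(W) = -5.36 + 3.216 ln(L) to a weight in grams."""
    ln_w = -5.36 + 3.216 * math.log(length_cm)   # math.log is the natural log
    return math.exp(ln_w)

weight_12cm = predicted_weight(12)
print(round(weight_12cm, 2))
```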

11) All but one of these statements is false. Which one could be true?
a) The correlation between a football player's weight and the position he plays is 0.54.
b) The correlation between a car's length and its fuel efficiency is 0.71 miles per gallon.
c) There is a high correlation (1.09) between height of a corn stalk and its age in weeks.
d) The correlation between the amounts of fertilizer used and quantity of beans harvested is 0.42.
e) There is a correlation of 0.63 between gender and political party.
a and e are wrong because correlation is defined only between quantitative variables. c is wrong
because the correlation cannot be greater than 1. b is wrong because correlation has no units.
That leaves d as the only statement that could be true.

Free Responses.

The statistics department at a large university is trying to determine if it is possible to predict whether an
applicant will successfully complete the Ph. D. program or will leave before completing the program. The
department is considering whether GPA (grade point average) in undergraduate statistics and mathematics
courses (a measure of performance) and mean number of credit hours per semester (a measure of workload)
would be helpful measures.








Successfully Completed Ph.D. Program


a) What is the LSRL?

Predicted Doctors = 23.514 - 2.756(GPA)


b) Interpret the values of r, r-squared, and the slope in context of the problem.

r = -0.872: There is a strong negative linear correlation between doctors and GPA. (r takes the sign of the slope.)
$r^2$ = 76%: Approximately 76% of the variation in doctors can be explained by the linear relationship between
doctors and GPA.
slope = -2.7555: On average, for every one-point increase in GPA there is a predicted decrease of approximately
2.7555 in the number of students completing their doctorate.

c) Is a linear model appropriate for this data? Justify your answer.

Yes, the linear model is appropriate, since there is no apparent pattern in the residuals and approximately 76%
of the variation in doctors can be explained by the linear relationship between doctors and GPA.

The dependent variable is DOCTORS.
Predictor   Coef       StDev     T        P
Constant    23.514     1.684     13.95    0.000
GPA         -2.7555    0.4668    -5.90    0.000
S = 0.5658   R-Sq = 76.0%

Assume that the following data were collected for a chemical reaction where reactants A and B are reacting to
form products C and D. As the products are formed, we measure their masses at two-minute increments.

Time (min)    Amt of product (g)
 2             3
 6             5
 7             7
 8            10
10            13
12            17
14            21
16            26
18            34
20            50

a) What is the equation for the model of best fit?
Illustrate your process carefully. Give a rough
sketch of the residual plot.



b) Is a linear model the most appropriate? If not,
what would be a better model?

No, the linear model is not the most
appropriate, because there is a definite
pattern in the residual plot. Appropriate model:

$\log(\widehat{\text{amt. of product}}) = 0.3853 + 0.0662(\text{time})$
$r = 0.9901$, $r^2 = 0.9803$

The exponential model proved to be the most
appropriate model because there was no
apparent pattern in the residuals and
approximately 98.03% of the variation in the
log(amount of product) can be explained by the
linear relationship between the
log(amount of product) and time.


c) What does your model predict would have
been the amount present at 5 minutes?


$\log(\hat{y}) = 0.3853 + 0.0662(5)$
$\log(\hat{y}) = 0.7163$
$\hat{y} = 10^{0.7163} \approx 5.2036$
At 5 minutes there would be approximately 5.2036 grams of product present.
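
The back-transformation in part (c) can be reproduced with a few lines of Python. This is a sketch that assumes the logarithm in the model is base 10 (which is consistent with the worked answer); the function name `predicted_amount` is ours.

```python
intercept, slope = 0.3853, 0.0662   # coefficients of the fitted log10 model

def predicted_amount(time_min: float) -> float:
    """Undo the base-10 log: amount = 10 ** (intercept + slope * time)."""
    return 10 ** (intercept + slope * time_min)

print(round(predicted_amount(5), 2))   # prints 5.2
```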
[Figure: four residual plots, each showing Residuals versus Time (5 to 20 minutes), one for each of the candidate models listed below.]

Linear Model:

$\widehat{\text{amt. of product}} = -8.8467 + 2.429(\text{time})$
$r = 0.945$, $r^2 = 0.894$

Pattern in the residuals = NOT appropriate

Exponential Model:

$\log(\widehat{\text{amt. of product}}) = 0.3853 + 0.0662(\text{time})$
$r = 0.9901$, $r^2 = 0.9803$

No pattern in the residuals = Appropriate

Logarithmic Model:

$\widehat{\text{amt. of product}} = -20.6026 + 39.949\log(\text{time})$
$r = 0.8046$, $r^2 = 0.6473$

Pattern in the residuals = NOT appropriate

Power Model:

$\log(\widehat{\text{amt. of product}}) = -0.0764 + 1.233\log(\text{time})$
$r = 0.9541$, $r^2 = 0.9104$

Pattern in the residuals = NOT appropriate
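
One way to reproduce this model comparison without a calculator is sketched below in plain Python. It assumes base-10 logarithms and refits each candidate model as a straight line on the (possibly transformed) data from the table; the helper `least_squares` and the dictionary `models` are our own names, not part of the original worksheet.

```python
import math

time = [2, 6, 7, 8, 10, 12, 14, 16, 18, 20]      # minutes
amount = [3, 5, 7, 10, 13, 17, 21, 26, 34, 50]   # grams of product

def least_squares(x, y):
    """Return (intercept, slope, r) of the least-squares line of y on x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope, sxy / math.sqrt(sxx * syy)

log_time = [math.log10(t) for t in time]
log_amount = [math.log10(a) for a in amount]

# Each model is a straight-line fit after taking logs of neither, one, or both variables.
models = {
    "linear":      least_squares(time, amount),
    "exponential": least_squares(time, log_amount),
    "logarithmic": least_squares(log_time, amount),
    "power":       least_squares(log_time, log_amount),
}
for name, (a, b, r) in models.items():
    print(f"{name:12s} intercept={a:.4f} slope={b:.4f} r^2={r * r:.4f}")
```

A high $r^2$ alone does not settle the choice; the residual plot for each fit still needs to be checked for a pattern, as the answers above do.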


d) At what time would 25.1 grams remain, according to your model?

$\log(25.1) = 0.3853 + 0.0662x$
$1.0144 = 0.0662x$
$x \approx 15.3229$
With 25.1 grams remaining, the time would be approximately 15.3229 minutes.
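
Solving the fitted model for time can also be sketched in Python. As before, the base-10 logarithm is an assumption consistent with the worked numbers, and `time_for_amount` is a name we introduce for illustration.

```python
import math

intercept, slope = 0.3853, 0.0662   # coefficients of the fitted log10 model

def time_for_amount(amount_g: float) -> float:
    """Solve log10(amount) = intercept + slope * time for time (minutes)."""
    return (math.log10(amount_g) - intercept) / slope

print(round(time_for_amount(25.1), 2))   # prints 15.32
```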
