AP Statistics

Review: Bivariate Data: Regression Analysis and Two-Way Tables
Key Terms and Concepts
Before taking the quiz, you need to be able to explain the meanings (and recognize
symbols in cases where there is an associated symbol) of each of these terms or
concepts. You should also know when and how to use them in statistics problems.
Unless otherwise noted (by a note such as "see lesson 2") these terms and concepts are
defined in the glossary.
categorical variables
cause and effect
conditional distributions
data transformation
explanatory variable
finding residuals on the graphing calculator (see lessons 3 and 4 and your calculator manual)
influential points
interpreting MINITAB for regression (see lesson 3)
joint frequencies
least-squares regression line (line of best fit)
LinReg (a + bx) on the graphing calculator (On the TI83/84 it is STAT CALC 8. On the TI-89 it
is [F4] 3:Regressions)
marginal frequencies
negative association
non-linear bivariate data
numerical variables
outliers (in bivariate data)
positive association
relation between r and the slope of the regression line
resid on the graphing calculator
residual
response variable
row and column percents
scatterplot
Simpson's Paradox
sum of squared residuals
the coefficient of determination (r^2)
the correlation coefficient (r)
two-way table
use the graphing calculator to transform data to achieve linearity (see lesson 5)
using residuals to test a linear model (see lessons 3, 4, and 5)

______________________________
Copyright 2011 Apex Learning Inc. (See Terms of Use at www.apexvs.com/TermsOfUse)
TI-83 screens are used with the permission of the publisher. Copyright 1996, Texas Instruments, Incorporated.
TI-89 screens are used with the permission of the publisher. Copyright 2011, Texas Instruments, Incorporated.

Objectives, Example Problems, and Study Tips
Introduction to Bivariate Data
Objective 1
Distinguish between quantitative and categorical data.
Examples
1. Which of the following statistics or variables are derived from quantitative data and
which are derived from categorical data?
A. Your G.P.A
B. The political party your father belongs to
C. The cities of residence of 300 people
D. The populations of 10 different cities
Tips

Categorical data are also called qualitative data. Quantitative data are also called
numerical data.
A categorical variable (which holds categorical data) tells which of several groups an
individual belongs to. A quantitative variable has a numerical value that can be
manipulated mathematically.
Categorical variables can be thought of as labels, or names; quantitative variables can
be thought of as numerical values, or quantities.
Some categorical data appear as numbers, but they are really just names, or labels, for
categories. For example, the variable "favorite radio station" may have the value 102.3
or 104.1, but these numbers are meant as labels and not as quantities. It would be
meaningless to add 102.3 and 104.1 to get the average radio station.

Answers
1.
A. Quantitative. Your G.P.A. is the arithmetic mean of the grades you received in many
classes.
B. Categorical. "Political party" is a label, not a quantity.
C. Categorical
D. Quantitative
Objective 2
Distinguish between explanatory and response variables.
Examples
1. You want to be able to predict class rank from number of hours a student spends on
homework. Which is the explanatory variable, and which is the response variable?
2. True or False: There is always a cause-and-effect relationship between the explanatory
and response variables.
Tips

Two variables can be associated even when we don't know which, if either, causes
the other.
The explanatory variable is sometimes called the independent variable, and the
response variable is sometimes called the dependent variable.

Answers
1. The variable you're using to predict from is the explanatory variable and the variable you're
trying to predict is the response variable. Therefore, hours spent on homework is the
explanatory variable, and class rank is the response variable.
2. False. We might suspect a cause-and-effect relationship between two variables if they're
strongly related, but association alone does not prove cause and effect. For example, it's
well known that success on the SAT predicts success in college (that's one reason why
many colleges use SAT scores to help decide on admissions). In no way, however, does it
mean that a high score on the SAT causes success in college.
Objective 3
Construct a scatterplot when given a set of paired data.
Examples
1. The number of calories and the number of grams of fat in 25 common fast foods
(hamburgers, pie, french fries, onion rings, etc.) are given in the following table.
Construct a scatterplot of the data where grams of fat is the explanatory
(independent) variable: (Data taken from Landwehr & Watkins, Exploring Data, Dale
Seymour Publications, Palo Alto, 1995, pg. 21)
Grams of Fat   Calories
31             570
38             660
48             800
55             890
14             300
19             350
10             260
15             300
28             470
33             530
25             450
10             280
32             620
28             450
13             236
9              178
4              142
5              95
0              25
20             372
19             339
14             320
13             360
8              290
14             220


Answer
1. [Figure: scatterplot of calories (response) versus grams of fat (explanatory)]
Objective 4
Identify instances of positive and negative association.
Examples
1. As age increases, the average number of years left to live decreases. Is this an instance of
positive or negative association?
2. Does the following scatterplot demonstrate positive or negative association?
[Figure: scatterplot]
3. Would you expect hours of exercise per week and weight to be positively or negatively
associated?

Tip

Associations are considered to be positive if the response variable increases as the
explanatory variable increases, and negative if the response variable decreases
as the explanatory variable increases.

Answers
1. Negative. As the explanatory variable (age) increases, the response variable (number of
years to live) decreases.
2. Negative
3. Negative, since typically people who exercise weigh less.
The Least-Squares Regression Line
Objective 1
Calculate the linear regression line from a bivariate data set, interpret the correlation
coefficient, and use the line to predict values of the response variable when given values
for the explanatory variable.
Examples
1. In what sense is the linear regression line also the "line of best fit?"
2. Define the least-squares regression line.
3. Consider again the fat vs. calories data you saw earlier:
Fat        31   38   48   55   14   19   10   15   28   33   25   10
Calories  570  660  800  890  300  350  260  300  470  530  450  280

Fat        28   13    9    4    5    0   20   19   14   13    8   14   32
Calories  450  236  178  142   95   25  372  339  320  360  290  220  620

Calculate the line of best fit of calories on fat (that is, use fat as the explanatory
variable and use calories as the response variable) and interpret the regression
coefficient. (Do this on your calculator, not by hand!)
Tips

The line of "best fit" for any set of points is the line that comes closest to containing all
the ordered pairs in the data.
When you interpret a regression coefficient, you simply make a statement that tells the
predicted change in the response variable for every unit increase (an increase of 1) in
the explanatory variable. That change is the regression coefficient, which is the slope
of the regression line.
On the TI-83/TI-84, you can get to LinReg(a+bx) as follows:
Press STAT
Choose [CALC]
Choose [LinReg(a+bx)] (or just press 8)
On the TI-89, you can get to LinReg(a+bx) from the Stats/List Editor by pressing
[F4] 3:Regressions [ENTER].

Note: In statistics, do NOT use LinReg(ax+b), which you can get by following the
instructions above and choosing [LinReg(ax+b)] rather than [LinReg(a+bx)].

Answers
1. The linear regression line is also called the line of best fit. It's the one line, of all possible
lines, that comes closest to the set of points.
2. The least-squares regression line, or the line of best fit, is the line that minimizes the sum
of squared residuals from the regression line. A residual is defined as the vertical distance
between the actual y-value of the point and its predicted y-value ($\hat{y}$). Symbolically, the
regression line is defined as the line that minimizes the expression $\sum (y - \hat{y})^2$.
3. $\hat{y} = 78.03 + 14.96x$ (rounded to two decimal places).
You can display the scatterplot with the line on the TI-83/TI-84 by pressing 2nd STAT PLOT
and setting up a scatterplot with L1 as the Xlist and L2 as the Ylist.
You can display the scatterplot with the line on the TI-89 by pressing [F1] and setting up
plot1 with x as list1 and y as list2.
The regression coefficient can be interpreted as: "For every 1 gram increase in the variable
"fat content," there tends to be an increase in the variable "calories" of 14.96."
Note: Save your data for the next parts of this review.
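The calculator result in answer 3 can be cross-checked in code. The following is a minimal Python sketch, not part of the original review: the function name least_squares_line is illustrative, and the sketch assumes the usual formulas b = Sxy/Sxx and a = (mean of y) - b(mean of x) for the line y-hat = a + bx.

```python
# Fat (grams) and calories for the 25 fast foods, paired in table order.
fat = [31, 38, 48, 55, 14, 19, 10, 15, 28, 33, 25, 10, 32,
       28, 13, 9, 4, 5, 0, 20, 19, 14, 13, 8, 14]
calories = [570, 660, 800, 890, 300, 350, 260, 300, 470, 530, 450, 280, 620,
            450, 236, 178, 142, 95, 25, 372, 339, 320, 360, 290, 220]


def least_squares_line(x, y):
    """Return (a, b) for the least-squares line y-hat = a + b*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sum of cross-products and sum of squares about the means.
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b = s_xy / s_xx          # slope (regression coefficient)
    a = y_bar - b * x_bar    # intercept
    return a, b


a, b = least_squares_line(fat, calories)
print(f"y-hat = {a:.2f} + {b:.2f}x")  # compare with the LinReg(a+bx) screen
```

The printed intercept and slope should agree with the calculator's a and b to two decimal places.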
Objective 2
Calculate a set of residuals from a linear regression.
Examples
1. Using the data from Lesson 3, calculate a complete set of residuals and determine
A. The sum of the residuals, $\sum (y - \hat{y})$.

B. The sum of the squares of the residuals, $\sum (y - \hat{y})^2$.

Here's the data again:


Fat        31   38   48   55   14   19   10   15   28   33   25   10
Calories  570  660  800  890  300  350  260  300  470  530  450  280

Fat        28   13    9    4    5    0   20   19   14   13    8   14   32
Calories  450  236  178  142   95   25  372  339  320  360  290  220  620

Tip

You should be able to calculate the residuals without using the built-in function in your
calculator; it teaches you a lot about just what residuals are. However, you should
remember that when you do a linear regression on the graphing calculator, a set of
residuals is created and stored under the list name [RESID] on the TI-83/TI-84 which
can be found under the [NAMES] menu; and [statvars\resid] on the TI-89.

Answer on the TI-83/TI-84:


1. Enter the fat data in L1 and the calorie data in L2.
2. Press STAT CALC 8 (to get LinReg(a+bx)).
3. Press 2nd [L1], [L2], VARS Y-VARS 1 1 (to get LinReg(a+bx) L1,L2,Y1).
4. Press ENTER (the regression equation is stored in Y1).
5. Press STAT 1 and move the cursor on top of L3 (clear L3 if it has numbers in it).
6. Press CLEAR 2nd [L2] - VARS Y-VARS 1 1 ( 2nd [L1] ) ENTER.
7. This will place the residuals in L3. The expression L2-Y1(L1) is equivalent to subtracting the
predicted y-value from the actual y-value.
8. Press STAT CALC 1 2nd [L3].

9. The sum of the residuals is $\sum (y - \hat{y})$ and should be very close to 0. The sum of the
residuals squared is $\sum (y - \hat{y})^2$ and is something like 44942.81.
10. Just to be sure you've done it right in L3, do the following:
2nd [LIST], select MATH, press 5, press 2nd [LIST], select RESID and press ENTER, press x2.
(Your screen should end up with this expression: sum(LRESID^2)).
11. Press ENTER again, and your answer should be 44942.81, which is the sum of all of the
residuals squared.
Note: Save the data for fat, calories, and residuals for the next part of this review.
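The same residual arithmetic can be sketched outside the calculator. The Python below is an illustrative sketch, not part of the original review; the variable names are assumptions. It mirrors the expression L2-Y1(L1): each residual is the actual y-value minus the predicted y-value.

```python
# Fat (grams) and calories for the 25 fast foods, paired in table order.
fat = [31, 38, 48, 55, 14, 19, 10, 15, 28, 33, 25, 10, 32,
       28, 13, 9, 4, 5, 0, 20, 19, 14, 13, 8, 14]
calories = [570, 660, 800, 890, 300, 350, 260, 300, 470, 530, 450, 280, 620,
            450, 236, 178, 142, 95, 25, 372, 339, 320, 360, 290, 220]

# Least-squares line y-hat = a + b*x (same quantities LinReg(a+bx) reports).
n = len(fat)
x_bar = sum(fat) / n
y_bar = sum(calories) / n
b = (sum((x - x_bar) * (y - y_bar) for x, y in zip(fat, calories))
     / sum((x - x_bar) ** 2 for x in fat))
a = y_bar - b * x_bar

# Residual = actual y minus predicted y, the same arithmetic as L2 - Y1(L1).
residuals = [y - (a + b * x) for x, y in zip(fat, calories)]

sum_resid = sum(residuals)                      # should be essentially 0
sum_sq_resid = sum(e ** 2 for e in residuals)   # sum of squared residuals
print(f"sum of residuals         = {sum_resid:.6f}")
print(f"sum of squared residuals = {sum_sq_resid:.2f}")
```

Because the line is fitted by least squares, the residuals sum to zero up to floating-point rounding, which is why the squared residuals are what the fit minimizes.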
Answer on the TI-89:
1. Go to the Stats/List Editor and clear list1. Enter the Fat data there. Clear list2 and put the
Calorie data there.
2. Compute the least-squares regression line for the data while you are in the lists, by
pressing [F4] 3:Regressions->1:LinReg(a+bx) [ENTER] (your list names of list1 and
list2 should already be there; but, if they are not, type them in). Choose a function slot
(such as y1(x)) to store your regression equation. [ENTER].
3. The calculator displays the regression information on the screen.
4. You can also write down your equation and press [F1] to access the Y= Editor. Scroll
to y1 [ENTER] and enter your equation.
5. When you press [2ND] [APPS], you should be back in the lists again.
6. Scroll over to the top of list3 and press [CLEAR] [ENTER].
7. Type the formula list2-y1(list1) [ENTER].
8. You now see the residual values entered in list3.
9. The sum of the residuals is $\sum (y - \hat{y})$ and should be very close to 0. On the [HOME] screen
press [2nd] [5] 3:List [ENTER] 6:sum( [ENTER] list3 [)] [ENTER], and you will see if your
sum is indeed close to 0.
10. The sum of the residuals squared is $\sum (y - \hat{y})^2$, something close to 44942.81.
Note: Save the data for Fat, Calories, and residuals for the next part of this review.
Just to be sure you've done it right in list3, do the following on the TI-89:
On the home screen editor press [CLEAR] [2nd] [5] 3:List [ENTER] 6:sum( [ENTER]
statvars\resid^2 [)] [ENTER]. Your answer should be 44942.8, which is the sum of
all of the residuals squared.
Objective 3
Use residuals to discuss the adequacy of a linear regression model.
Objective 4
Use the graphing calculator to create residual plots and determine whether a linear regression
gives an acceptable model for the relationship between the explanatory and response
variables.
Example
1. You should have saved the data from the previous section in lists L1 (TI-83/TI-84) or list1
(TI-89) for the Fat data, L2 (TI-83/TI-84) or list2 (TI-89) for the Calorie data, and L3
(TI-83/TI-84) or list3 (TI-89) for the residuals. If not, return to the previous section and
follow the steps now. Note that, if you have not done any new regressions, the same data
will be in the list named RESID as in L3 on the TI-83/TI-84, and in statvars\resid on the
TI-89. You may use either list for this section.

Use the data in your calculator to discuss the extent to which a line is a good model for
these data.
Tip

A residual plot should show points more-or-less randomly distributed about the
average residual value of zero (the average of the residuals from a least-squares
regression is always 0). Distinctive patterns, such as a parabola, about the y = 0 line
would indicate that a line is not the best possible model for the data.
Answer
1. First, use the data in L1 and L2 on the TI-83/TI-84 (list1 and list2 on the TI-89) to do a
linear regression of Calories on Fat. (If you've forgotten how to do this, return to the
previous section and do it now.) Graph the scatterplot of the data and the regression line.
It should look like this (note that the regression line and one point on the line also show
on the screen):
[Figure: scatterplot of Calories versus Fat with the regression line]
Our visual impression is that this is a pretty good fit. Now, to do a residual plot:
On the TI-83/TI-84:
Press Y= and turn off Y1 by moving your cursor over the = and pressing ENTER. (If the
function is turned off, the = will not be highlighted.)
Turn 2nd STAT PLOT 1 on and be sure the other STAT PLOTs are off. Set up the STAT
PLOT screen the following way, with L3 as your Ylist:
[Figure: STAT PLOT setup screen]
Then press [ZOOM 9], and you should get:
[Figure: residual plot]
On the TI-89:
Press Y= and turn on your data plot and the best-fit line in y1. To see your plot,
press [F2] 9.
On the home screen press [F4] 8: FnOff [ENTER] to turn off all functions on the TI-89.
Press [F1] and scroll to plot1. Turn it on by pressing [F4], which puts a check
mark in front of it. Make sure that all other plots are off.
With your cursor on plot1, press [ENTER], and for Plot Type choose "Scatter." Choose
the mark you like to use; x is list1 and y is list3. Press [ENTER][ENTER] to save and
return to the plot1. Press [F2] 9 (ZoomData) to see your residual plot.
[Figure: residual plot]
There is a more or less random pattern about the line y = 0 (also known as $y - \hat{y} = 0$),
so we would therefore conclude that the line is a reasonable model for the data. There
is some tendency for the pattern to get closer to y = 0 at the right of the graph, so we
might expect the model to predict somewhat better for higher numbers of grams of fat.

The Correlation Coefficient
Objective 1
Calculate Pearson's correlation coefficient r for a set of paired data and explain its meaning.
Examples
1. Define the correlation coefficient by writing out the formula.
2. The following hypothetical data represent a set of IQ scores and grade point averages:
IQ (x)    100  120  110  105   85   95  130  100  105   90
GPA (y)   3.0  3.8  3.1  2.9  2.6  2.9  3.6  2.8  3.1  2.4

A. Draw a scatterplot of these data (graphing calculator).
B. Find the correlation coefficient (graphing calculator).
C. What does r tell you in terms of the strength of a linear relationship between IQ and
GPA?
D. How would the correlation coefficient change if we let GPA be the x-variable (the
explanatory variable) and IQ the y-variable (the response variable)?
Tips

The correlation coefficient can be defined in terms of raw scores for the x- and
y-variables or in terms of the standardized z-scores for each data set.
The correlation coefficient on your TI-83 appears when you do a linear regression,
assuming you've remembered to turn Diagnostics on in the CATALOG menu: 2nd
[CATALOG] Alpha [D], scroll to DiagnosticOn, and press ENTER ENTER. Diagnostics is
always on for the TI-89, and r is displayed when you perform a regression.

Answers
1. $r = \frac{1}{n-1} \sum \left( \frac{x_i - \bar{x}}{s_x} \right) \left( \frac{y_i - \bar{y}}{s_y} \right) = \frac{1}{n-1} \sum z_x z_y$
2.
A. [Figure: scatterplot of GPA versus IQ]
B. r = .91
C. r is quite strong (quite close to -1 or +1). The visual image of the scatterplot in part A
and the correlation coefficient in part B lead us to believe that there is a strong linear
relationship between IQ and GPA. It says nothing about cause and effect, however,
or about the direction of prediction.
D. The correlation coefficient would not change. Correlation is simply a measure of
linear relationship. The regression coefficient (the slope of the least-squares line)
talks about a direction of prediction, but correlation is simply a measure of the
strength of the linear relationship between two variables.
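The z-score form of the formula in answer 1, and the symmetry claimed in answer D, can be checked with a short Python sketch. This is illustrative only and not part of the original review; the helper names sample_sd, z_scores, and correlation are assumptions.

```python
from math import sqrt

# IQ and GPA values from example 2, paired in table order.
iq = [100, 120, 110, 105, 85, 95, 130, 100, 105, 90]
gpa = [3.0, 3.8, 3.1, 2.9, 2.6, 2.9, 3.6, 2.8, 3.1, 2.4]


def sample_sd(values):
    """Sample standard deviation (divides by n - 1)."""
    n = len(values)
    mean = sum(values) / n
    return sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))


def z_scores(values):
    """Standardize each value: (value - mean) / standard deviation."""
    mean = sum(values) / len(values)
    sd = sample_sd(values)
    return [(v - mean) / sd for v in values]


def correlation(x, y):
    """r = (1 / (n - 1)) * sum of the products of paired z-scores."""
    zx, zy = z_scores(x), z_scores(y)
    return sum(p * q for p, q in zip(zx, zy)) / (len(x) - 1)


r_xy = correlation(iq, gpa)   # IQ as explanatory variable
r_yx = correlation(gpa, iq)   # GPA as explanatory variable
print(f"r (IQ explanatory)  = {r_xy:.2f}")
print(f"r (GPA explanatory) = {r_yx:.2f}")
```

Swapping the roles of the two variables only swaps the factors in each product, so both calls return the same r.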
Objective 2
Explain the relationship between the correlation coefficient and the slope of the regression
line.
Examples
1. Consider a data set with r = -.88, s_x = 9.17, and s_y = 10.54. Find the slope of the regression
line.
2. A set of data points has r = .56. A complete set of z-scores for the x- and y-values is
computed and a regression line fitted to these data. If there's enough information to tell,
what is the slope of the regression line fitted to the scatterplot of the z-scores?
Tips
s_x is the standard deviation of the values in the variable x, and s_y is the standard
deviation of the values in the variable y.

$b = r \frac{s_y}{s_x}$

where
b = the regression coefficient
r = the correlation coefficient
s_x = the standard deviation of the values in the variable X
s_y = the standard deviation of the values in the variable Y

If you standardize the data in x and y (if you convert the numbers to z-scores), s_x and
s_y will both be one, and the formula is simplified to:

$b = r \cdot \frac{1}{1}$
$b = r$

Therefore, for standardized data, the correlation coefficient and the regression coefficient
will have the same value.

Answer
1. $b = r \frac{s_y}{s_x} = -.88 \cdot \frac{10.54}{9.17} = -1.011$

2. The slope of the regression line for the scatterplot of standardized scores will be the
same as the correlation coefficient for the raw data: r = .56. This happens because
when you standardize scores, you end up with a standard deviation of 1. Thus the
ratio of the standard deviations in the formula given in example 1 is 1, and you're left
with b = r.
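Both answers are direct applications of b = r(s_y/s_x). As a minimal Python sketch (not part of the original review; the function name slope_from_r is illustrative):

```python
def slope_from_r(r, s_x, s_y):
    """Slope of the least-squares line: b = r * s_y / s_x."""
    return r * s_y / s_x


# Example 1: r = -.88, s_x = 9.17, s_y = 10.54
b = slope_from_r(-0.88, 9.17, 10.54)
print(f"b = {b:.3f}")

# Example 2: after standardizing, both standard deviations equal 1,
# so the slope of the z-score regression is r itself.
b_standardized = slope_from_r(0.56, 1.0, 1.0)
print(f"slope for z-scores = {b_standardized:.2f}")
```

Note that the slope always takes the sign of r, because standard deviations are never negative.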
Objective 3
Calculate the coefficient of determination (r^2) for a set of paired data and explain its
meaning.
Examples
1. Given the following data set:
X
Y

114
14

87
12

93
10

74
9

50
7

A. Find the line of best fit, and the correlation coefficient, of Y on X.


B. Find the value of r^2.
C. Interpret r^2 in the context of the data set.
2. Describe what's meant by the phrase, "r^2 is the amount of variation in Y explained by
X."
Tip

The "amount of variation in Y" refers to the total variation in Y measured from the
average y-value, that is, from $\bar{y}$ (as opposed to $\hat{y}$). The total variation is often referred
to as the "total sum of squares" (SST) and equals $\sum (y_i - \bar{y})^2$.

Answers
1.
A. $\hat{y} = 1.51 + .11x$, r = .93
B. r^2 = .87
C. 87% of the variation in Y (as measured from $\bar{y}$) is attributable to variation in X.
2. The short explanation:
Some of the variation in Y is tied to the trend shown by the least-squares line; as the
x-variable increases, the y-variable increases or decreases by a certain amount. That's the
variation explained by X. But other components of the variation in Y aren't related to
changes in X. That's the variation not explained by X. The coefficient of determination (r^2)
is simply the proportion of the variation in Y that is explained by X.
The longer but more precise explanation:
Imagine that we had a set of y-values with no knowledge of x-values they might be paired
with. With no ability to predict a value of y from x, our best guess at any value of Y would
have to be $\bar{y}$, the average value; graphically, this guess is simply the horizontal line
$y = \bar{y}$. The distance from any point to $\bar{y}$ can be considered an "error" in our prediction.
You can compute the sum of the squares of all such "errors" in the data set of y-values and
call it the total sum of squares. Now find the line of best fit, and consider an "error" to be the
distance from the actual y-value to the predicted y-value $\hat{y}$ (that's right, it's a residual).
Find the sum of the squares of those residuals. The difference (total sum of squares) - (sum
of squares of residuals) is the amount of error eliminated by basing our predictions on the
regression line rather than on the average y-value. The fraction of the total error from $\bar{y}$
that this represents is the amount of variation "explained" by the variable X.
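The longer explanation can be followed step by step in code. The Python below is a minimal sketch, not part of the original review; it uses the five (X, Y) pairs from example 1, and the names sst and sse are illustrative labels for the total sum of squares and the sum of squared residuals.

```python
# Five (X, Y) pairs from example 1.
x = [114, 87, 93, 74, 50]
y = [14, 12, 10, 9, 7]

# Least-squares line y-hat = a + b*x.
n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
b = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
     / sum((xi - x_bar) ** 2 for xi in x))
a = y_bar - b * x_bar

# Total sum of squares: squared "errors" when every prediction is y-bar.
sst = sum((yi - y_bar) ** 2 for yi in y)
# Sum of squared residuals: squared "errors" around the regression line.
sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))

# Fraction of the total error eliminated by using the regression line.
r_squared = (sst - sse) / sst
print(f"y-hat = {a:.2f} + {b:.2f}x")
print(f"SST = {sst:.2f}, SSE = {sse:.2f}, r^2 = {r_squared:.2f}")
```

The ratio (SST - SSE)/SST is the same number you get by squaring the correlation coefficient r.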
Objective 4
Interpret MINITAB output for regression and correlation.
Examples
Consider this printout:

[Figure: MINITAB regression printout]
1. Identify the correlation coefficient.
2. Identify the regression equation.
Answers
1. r^2 ("R-Sq" in the printout) = .776, and r = -.881. NOTE: r is negative since the slope of the
regression line is negative.
2. $\hat{y} = 64.2 - 1.01x$
Influential Points and Outliers
Objective 1
Identify and describe the influence of outliers and influential points in a regression
setting.
Objective 2
Distinguish between an outlier and an influential point.
Examples
1. Describe the differences between outliers and influential points.
2. Consider the following graph:
[Figure: scatterplot with one point shown as a box]
Is the point shown as a box an outlier or an influential point?
3. If the "box" point were removed from the above drawing, would the slope of the
regression line increase or decrease?
Answers:
1. Outliers are points that have large residuals; they're far from the linear pattern in the
vertical direction. They usually aren't extreme values of the explanatory (X). Influential
points are points whose removal would have a significant effect on the regression line.
Influential points are typically on the extreme ends of the range of x-values. Influential
points can also be outliers.
Outliers tend to have a large effect on the correlation coefficient (r) and a relatively
smaller effect on the regression coefficient (slope of the regression line).
Influential points tend to have a large effect on the regression coefficient (slope of the
regression line); they have a relatively smaller effect on the correlation coefficient (r).
2. It's an influential point and an outlier, since it's far from the linear pattern in a vertical
direction and its removal would have a large effect on the slope of the regression line. It
has a large effect on both the regression coefficient and the correlation coefficient.
3. The slope would increase. For these data, the regression equation with the "box" included
is $\hat{y} = 6.03 - .5x$. (Note that the effect of the point is so severe that it makes the regression
line have a negative slope when the rest of the pattern is clearly positive.) The equation
with the influential point removed is $\hat{y} = 1.9 + 1.5x$.

Transformations to Achieve Linearity
Objective 1
Recognize whether bivariate data may be linearly related.
Objective 2
Use power, logarithmic, and polynomial transformations to linearize non-linear data where
appropriate and analyze the results to determine if the transformation has resulted in data that
can be appropriately modeled with a straight line.
Objective 3
Use the graphing calculator to do transformation of data and analyze the results to determine
if the transformation has resulted in data that can be appropriately modeled with a straight
line.
Examples
1. Consider the following set of observations:
Obs.   1     2     3     4     5     6     7     8     9     10    11    12    13    14
x      13.5  13.5  14    15    17.5  19    20    21    22    23    25    25    26    27
y      5     15    35    25    25    70    80    140   75    125   190   300   240   315

A. Enter the data in L1 and L2 in your TI-83/84 or list1 and list2 in your TI-89; find the
regression line, and construct a scatterplot with the regression line included. Does a
line appear to be a good model for these data? Why not?
B. What is r^2 for this model?
C. Find the natural logarithms (ln) of the y-data. Put these values in L3 (L3=ln(L2)) or list3
(list3 = ln(list2)).
D. Draw a scatterplot of x vs. ln y. Find the regression equation of ln y on x and include it
on the graph. Does this appear to be a better linear fit? What is r^2 for this model?
E. Use the regression equation you found in part D to predict the value of y when x = 16
(remember to "back transform" to the original data!)
Tips

To transform a data set (for instance, to find ln y), follow these steps:
1. Go to your lists.
2. Place your cursor over the top of L3 and clear the data in L3 (assuming that your
data are in L1 and L2). On the TI-89, place your cursor over the top of list3 and clear
the data in list3 (assuming that your data are in list1 and list2).
3. With your cursor over the top of L3 (or list3), press [ENTER] so that your cursor is
blinking at the bottom of the screen, and you see the expression L3= (or list3= on
the TI-89).
4. Press LN 2nd L2 ENTER (or [2nd] LN list2 [ENTER] on the TI-89).
5. You can now do bivariate statistics and make plots of L1 vs. L3 (list1 vs. list3). Be
sure to recalculate the linear regression equation for L1 vs. L3 (list1 vs. list3) and
change your STAT PLOT to L1 vs. L3 (list1 vs. list3 for plots on the TI-89).
Important note: Your graphing calculator has options for quadratic regressions, cubic
regressions, and other nonlinear regressions. Do not use these, as they will yield
incorrect residuals. Instead, transform the data first as taught in the tutorial.
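
If you want to check your calculator work off the calculator, the same steps can be sketched in Python. This is only an illustration, not part of the tutorial: it uses the observations from Example 1, and the helper name linreg is made up for this sketch. It fits a line to the raw data, fits a line to x vs. ln y (the equivalent of L3 = ln(L2)), and back-transforms the prediction at x = 16.

```python
import math

# Observations from Example 1 (x in L1, y in L2).
x = [13.5, 13.5, 14, 15, 17.5, 19, 20, 21, 22, 23, 25, 25, 26, 27]
y = [5, 15, 35, 25, 25, 70, 80, 140, 75, 125, 190, 300, 240, 315]


def linreg(xs, ys):
    """Return (a, b, r2) for the least-squares line y-hat = a + b*x.

    Hypothetical helper for this sketch; it mirrors what LinReg(a+bx) reports.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((xi - mean_x) ** 2 for xi in xs)
    syy = sum((yi - mean_y) ** 2 for yi in ys)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(xs, ys))
    b = sxy / sxx                   # slope
    a = mean_y - b * mean_x         # intercept
    r2 = sxy ** 2 / (sxx * syy)     # coefficient of determination
    return a, b, r2


# Linear model on the raw data.
a_raw, b_raw, r2_raw = linreg(x, y)

# Transform the response (L3 = ln(L2)), then regress ln y on x.
ln_y = [math.log(yi) for yi in y]
a_ln, b_ln, r2_ln = linreg(x, ln_y)

# Predict ln y at x = 16, then back-transform to the original units.
ln_pred = a_ln + b_ln * 16
y_pred = math.exp(ln_pred)

print(f"raw data:    y-hat    = {a_raw:.3f} + {b_raw:.3f}x, r2 = {r2_raw:.3f}")
print(f"transformed: ln y-hat = {a_ln:.3f} + {b_ln:.3f}x, r2 = {r2_ln:.3f}")
print(f"prediction at x = 16: ln y = {ln_pred:.3f}, y = {y_pred:.1f}")
```

A higher r2 for the transformed model is only part of the evidence; you should still look at the residual plot for each model before deciding which one is appropriate.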

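Answers
1.
A. The regression equation is shown at the top of the calculator screen. A line is not a
good fit for these data; the points show a definite curved pattern.
[Calculator screen: scatterplot with regression line]
B. r2 = [...]
C. Part of the conversions are shown below.
[Calculator screen: list of ln y values]
D. The regression equation of ln y on x is shown at the top of the screen below.
[Calculator screen: scatterplot of x vs. ln y with regression line]
It does appear to be a better fit: r = [...] and r2 = [...].
E. For this model, y-hat = [...] + [...]x = [...] + [...](16) = [...] = ln y.
Then y = e^[...] = [...]. Note that this was completed by back transforming.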

Categorical Bivariate Data: Two-Way Tables
Objective 1
Recognize situations where using a two-way table is an appropriate method for
analyzing bivariate data and construct a two-way table from the data.
Examples
1. In a recent survey at a public high school, a random sample from each class was
asked whether they considered themselves Democrats or Republicans. [...] of [...]
freshmen said they were Democrats; [...] of [...] sophomores said they were
Democrats; [...] of [...] juniors said they were Democrats. [...] seniors were interviewed.
Altogether, [...] students, including seniors, said they considered themselves
Democrats.
A. Present the results of the survey in a two-way contingency table (let class be
the row variable).
B. Give the conditional distribution for Republicans.
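Answers
1.
A.
             Party Preference
Class        Dem      Rep
FR           [...]    [...]
SO           [...]    [...]
JR           [...]    [...]
SR           [...]    [...]

B. Conditional distribution for Republicans: freshmen are [...] or [...] Republican;
sophomores are [...] or [...] Republican; juniors are [...] or [...]
Republican; and seniors are [...] or [...] Republican.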
Objective 2
Identify and interpret the different types of frequencies and distributions in a two-way
table, including marginal frequencies, marginal distributions, and conditional
distributions (of columns by rows and rows by columns).
Example
2. Consider the following two-way table:

                  Variable B
Variable A        E        F        G
C                 [...]    [...]    [...]
D                 [...]    [...]    [...]

A. Identify the row and column variables.
B. Identify the marginal frequencies.
C. Find the marginal distributions for each variable.
D. Identify the conditional distribution for Variable B, value E.

Tips
Conditional distributions are proportions of a value within a given variable.
Conditional distributions can go in two directions: row by column or column by
row. In this case, an example of a row-by-column distribution would be the
proportion of values in each row of Variable A that fall in the columns E, F, or
G.
When conditional distributions are named, the first name is the denominator. For
example, if you want the conditional distribution of Whites who are female (race by
gender), you're looking for the proportion:

    (number of Whites who are female) / (total number of Whites)

If you're looking for the conditional distribution of females who are White (gender by race),
you're looking for the proportion:

    (number of females who are White) / (total number of females)
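Answer
2.
A. Row variable: A; column variable: B.
B. For Variable A: [...]; for Variable B: [...].
C. For Variable A: [...] = [...] and [...] = [...].
For Variable B: [...] = [...], [...] = [...], and [...] = [...].
D. For value E of Variable B, values C and D of Variable A have, respectively,
[...] = [...] and [...] = [...].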

Objective 3
Recognize and interpret Simpson's Paradox.
Example
3. In [...], Andre Dawson batted [...] against right-handed pitchers and [...] against
left-handed pitchers. That same year, Lee Lacy batted [...] against right-handed
pitchers and [...] against left-handed pitchers. That is, Andre Dawson batted better
than Lee Lacy against both right- and left-handed pitchers. However, for the year,
Lee Lacy batted [...] to Andre Dawson's [...]. How is it possible that one batter
could be better against both right- and left-handed pitchers, yet have a lower
average overall?
Tip
There is probably a lurking variable in here somewhere!
Answer
3. This is an example of Simpson's Paradox. Andre Dawson had many more at-bats, and
his at-bats were concentrated at the lower average. Lee Lacy's at-bats, on the other
hand, were more concentrated at the higher average. The exact data follow:
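Andre Dawson    vs. RHP    vs. LHP    Totals
Hits            [...]      [...]      [...]
At-Bats         [...]      [...]      [...]
Average         [...]      [...]      [...]

Lee Lacy        vs. RHP    vs. LHP    Totals
Hits            [...]      [...]      [...]
At-Bats         [...]      [...]      [...]
Average         [...]      [...]      [...]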
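
To see how the reversal can happen, it may help to work it through with numbers. The Python sketch below uses made-up hit and at-bat counts for two generic batters (they are NOT the real season statistics for either player) chosen so that Batter A is better against each type of pitcher, yet worse overall.

```python
# Hypothetical (hits, at-bats) counts chosen to reproduce the pattern.
# These are NOT the actual statistics for Andre Dawson or Lee Lacy.
batters = {
    "Batter A": {"vs RHP": (120, 480), "vs LHP": (12, 40)},
    "Batter B": {"vs RHP": (9, 40), "vs LHP": (52, 180)},
}


def average(hits, at_bats):
    """Batting average for one split."""
    return hits / at_bats


def overall(splits):
    """Overall average: total hits divided by total at-bats."""
    hits = sum(h for h, _ in splits.values())
    at_bats = sum(ab for _, ab in splits.values())
    return hits / at_bats


a = batters["Batter A"]
b = batters["Batter B"]

# Within each split, Batter A has the higher average.
a_better_rhp = average(*a["vs RHP"]) > average(*b["vs RHP"])   # .250 vs .225
a_better_lhp = average(*a["vs LHP"]) > average(*b["vs LHP"])   # .300 vs about .289

# Overall, Batter B has the higher average, because most of Batter A's
# at-bats came against right-handers, where averages are lower.
b_better_overall = overall(b) > overall(a)

for name, splits in batters.items():
    for split, (hits, at_bats) in splits.items():
        print(f"{name} {split}: {average(hits, at_bats):.3f}")
    print(f"{name} overall: {overall(splits):.3f}")
```

The overall average is a weighted average of the two splits, weighted by at-bats. The type of pitcher faced is the lurking variable: the two batters have very different weights.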
Unit Wrap-Up
Objective 1
Define each key term and concept in this Unit.
Objective 2
Explain the significance of each key term and concept in this Unit.
Objective 3
Apply concepts you learned in this Unit to specific problems.
Tips
As you review the key terms, think about how they relate to one another. Play a
game in which you pick a term at random, define it, then pick another term and
think how it relates to the first term. The two terms may not seem related at first,
but all of the terms you'll learn in this course are somehow related (some are
distant cousins rather than brothers and sisters).
Think about how the concepts you learned could be combined and applied to new
situations. The questions you see on the Quiz and on the AP Exam may look new
on the surface, and they may combine concepts in new ways, but you'll be able to
analyze and solve them by applying techniques taught in this Unit.
Sample Free-Response Question
There are many ways the concepts in this Unit can be combined into a free-response
question requiring you to use many of the concepts you've learned in order to solve it.
Here's just one example.
The following represents the population of the United States (in thousands) in ten-year
intervals from [...] to [...] (Source: U.S. Census Bureau).

Year                  [...]
Population (x1000)    [...]
1. Find the least-squares regression of population on year and the correlation
coefficient.
2. Draw the scatterplot and regression line of the data. Does a line appear to be a good
model for the data?
3. Construct the residual plot for the regression just completed. Now does the line
appear to be a good model?
4. Discuss a better model for the data. Use good statistical reasoning to justify your
answer.

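Answers
1. Population = [...]x; r = [...].
2. The line does appear to fit the data reasonably well.
[Calculator screen: scatterplot with regression line]
3. The residual plot shows a definite pattern, so even though the line visually
appeared to be a good fit and there was a very high correlation coefficient, a line
certainly isn't the best possible model for the data.
[Calculator screen: residual plot]
4. We know that populations often grow exponentially, so let's try taking the log
of the population values and doing a regression on the transformed data.

Year                  [...]
Population (x1000)    [...]
Ln(Population)        [...]

The regression of Ln(Population) on year looks like this:
[Calculator screen: scatterplot of year vs. Ln(Population) with regression line]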

The regression equation is ln(population) = [...]x; r = [...]. The picture does
look like a somewhat better fit, and the correlation coefficient is slightly higher than for
the raw data. To be sure, however, we need to look at the residual plot for these data.
[Calculator screen: residual plot for the transformed data]
Although not completely random (we should worry a bit about the trend for the residuals
to be declining during the last [...] years), it certainly has less of a pattern than the
residual plot for the raw data. We conclude that the data are more likely exponential than
linear, and we note that the rate of increase seems to be declining over the last several
decades.
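
The point that a very high r can hide a curved pattern is easy to check numerically. The Python sketch below is an illustration under stated assumptions: it does NOT use the Census figures from the question, but a made-up quantity that grows by exactly 20% per decade. The helper names fit_line and residuals are hypothetical. It compares the residuals of a line fit to the raw values with the residuals of a line fit to their natural logs.

```python
import math

# Hypothetical data (NOT the Census figures): a quantity that starts at 100
# and grows by exactly 20% per decade for 10 decades.
decade = list(range(11))
pop = [100 * 1.2 ** d for d in decade]


def fit_line(xs, ys):
    """Return (a, b, r) for the least-squares line y-hat = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((xi - mean_x) ** 2 for xi in xs)
    syy = sum((yi - mean_y) ** 2 for yi in ys)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    r = sxy / math.sqrt(sxx * syy)
    return a, b, r


def residuals(xs, ys, a, b):
    """Observed minus predicted for each point."""
    return [yi - (a + b * xi) for xi, yi in zip(xs, ys)]


# Line fit to the raw values: r is high, but the residuals are curved.
a_raw, b_raw, r_raw = fit_line(decade, pop)
res_raw = residuals(decade, pop, a_raw, b_raw)

# Line fit to ln(pop): for exactly exponential data the residuals vanish.
ln_pop = [math.log(p) for p in pop]
a_ln, b_ln, r_ln = fit_line(decade, ln_pop)
res_ln = residuals(decade, ln_pop, a_ln, b_ln)

signs = "".join("+" if e > 0 else "-" for e in res_raw)
print(f"raw fit: r = {r_raw:.3f}, residual signs by decade: {signs}")
print(f"ln fit:  r = {r_ln:.3f}, largest |residual| = {max(abs(e) for e in res_ln):.2e}")
```

With real data the residuals of the transformed model will not be exactly zero, so what you look for is the absence of a systematic pattern, which is the judgment made in the answer above.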
