Académique Documents
Professionnel Documents
Culture Documents
P2: FXS
9780521740517c06.xml
CUAU031-EVANS
12:58
C H A P T E R
CORE
Data transformation
What do we mean by data transformation?
What is the effect of applying a log, squared or reciprocal transformation to the x
variable?
What is the effect of applying a log, squared or reciprocal transformation to the y
variable?
How do I choose which data transformation to apply?
How do I carry out a regression analysis with transformed data?
166
ISBN: 9781107655904
Peter Jones, Michael Evans, Kay Lipson 2012
Photocopying is restricted under law and this material must not be transferred to another party
P1: FXS/ABE
P2: FXS
9780521740517c06.xml
CUAU031-EVANS
12:58
167
Generally, it is only luck if a data set reveals all its hidden information when analysed in the
form in which it was initially gathered and/or reported. It is part of the analysts role to search
out different ways of looking at the data in order to enhance our understanding of that data.
One of the most powerful tools available to help achieve this task is data transformation.
How do these transformations affect the values to which they are applied? Consider the
following table of numbers.
value
(value)2
log(value)
1/value
0.2
0.04
0.699
5
0.4
0.16
0.398
2.5
0.6
0.36
0.222
1.667
1
1
0
1
2
4
0.301
0.5
3
9
0.477
0.333
4
16
0.602
0.25
From the table we can see that the transformations have the following effects on the data values:
The squared transformation has the effect of decreasing values less than 1, and
increasing values greater than 1. Large values are increased the most. For example, 22 = 4,
and 202 = 400, so that while the values 2 and 20 are 18 units apart, the values 4 and 400
are 396 units apart. That is, the effect of the square transformation is to stretch the values.
The log transformation reduces all values, and values between 0 and 1 become negative.
Large values are reduced much more than small values. For example, log 2 = 0.301, and
log 20 = 1.301, so that while the values 2 and 20 are 18 units apart, the values 0.301 and
1.303 are only 1 unit apart. That is, the effect of the log transformation is to compress the
values. Note that the log function can be applied only to values that are greater than 0.
The reciprocal transformation again reduces all values greater than one. Large values
1
= 0.05, so that
are reduced much more than small values. For example, 12 = 0.5, and 20
while the values 2 and 20 are 18 units apart, the values 0.5 and 0.05 are only 0.45 units
apart. That is, the effect of the reciprocal transformation is to compress the large values to
an even greater extent than the log transformation.
Thus it can be seen that all transformations have a greater effect on the larger values, but this
effect varies for each transformation.
Exercise 6A
1 a Copy and complete the table.
value
1 2 3 4 5
b Use the information in the table to
2
(value)
complete the following statements
log(value)
by deleting the incorrect term.
1/value
i The squared transformation
stretches/compresses the scale.
ii The log transformation stretches/compresses the larger values.
iii The reciprocal transformation stretches/compresses the larger values.
ISBN: 9781107655904
Peter Jones, Michael Evans, Kay Lipson 2012
Photocopying is restricted under law and this material must not be transferred to another party
P1: FXS/ABE
P2: FXS
9780521740517c06.xml
CUAU031-EVANS
168
12:58
10
100
1000
10000
100000
b Use the information in the table to complete the following statements by deleting the
incorrect term.
i The squared transformation stretches/compresses the larger values.
ii The log transformation stretches/compresses the larger values.
iii The reciprocal transformation stretches/compresses the larger values.
4 a Copy and complete the table.
value
(value)2
log(value)
1/value
20
10
2.5
1.25
0.625
b Use the information in the table to complete the following statements by deleting the
incorrect term.
i The squared transformation stretches/compresses the larger values.
ii The log transformation stretches/compresses the larger values.
iii The reciprocal transformation stretches/compresses the larger values.
5 a Copy and complete the table.
value
(value)2
log(value)
1/value
20
200
2000
20000
200000
b Use the information in the table to complete the following statements by deleting the
incorrect term.
i The squared transformation stretches/compresses the larger values.
ii The log transformation stretches/compresses the larger values.
iii The reciprocal transformation stretches/compresses the larger values.
ISBN: 9781107655904
Peter Jones, Michael Evans, Kay Lipson 2012
Photocopying is restricted under law and this material must not be transferred to another party
P1: FXS/ABE
P2: FXS
9780521740517c06.xml
CUAU031-EVANS
12:58
169
6.2
Outcome
Graph
log x
1
x
The following examples show the effect on the relationship between x and y when the squared,
log and reciprocal transformations are applied to the x values.
ISBN: 9781107655904
Peter Jones, Michael Evans, Kay Lipson 2012
Photocopying is restricted under law and this material must not be transferred to another party
P1: FXS/ABE
P2: FXS
9780521740517c06.xml
CUAU031-EVANS
170
November 2, 2011
7:52
x
y
0
2
1
3
2
6
3
11
4
18
b Apply a squared transformation to the x values (x 2 ), again plot the data, and comment on
the form of the relationship between y and x 2 .
c Fit a line to the transformed data with y as the DV and x 2 as the IV. Write its equation.
d Use the equation determined from the transformed data to predict the value of y when x = 5.
Solution
y
20
15
10
5
0
20
15
10
5
0
10
15
20
x2
5
y
20
15
10
5
0
10
15
20
x2
2
d Use the non-linear equation y = 2 + x 2 to determine When x = 5, y = 2 + 5 = 27
the value of y when x = 5.
ISBN: 9781107655904
Peter Jones, Michael Evans, Kay Lipson 2012
Photocopying is restricted under law and this material must not be transferred to another party
171
0
2
1
3
2
6
3
11
4
18
3 Press
+ and select Add Data &
Statistics. Construct a scatterplot of y
against x. Let x be the independent variable
and y the dependent variable. The plot is
clearly non-linear.
ISBN 978-1-107-65590-4
Peter Jones, Michael Evans, Kay Lipson 2012
Photocopying is restricted under law and this material must not be transferred to another party.
172
Eqn: y = 2 + x 2
when x = 5, y = 2 + 52 = 27
0
2
1
3
2
6
3
11
4
18
ISBN 978-1-107-65590-4
Peter Jones, Michael Evans, Kay Lipson 2012
Photocopying is restricted under law and this material must not be transferred to another party.
173
Steps
1 Open the Statistics application and
enter the data into the columns
named x and y. Your screen should
look like the one shown.
2 Construct a scatterplot of y against
x. Let x be the independent variable
and y the dependent variable. The
plot is clearly non-linear.
3 To calculate the values of x 2 and
store them in a list named xsq
(i.e. x-squared):
a Tap to highlight the cell at the
top of the next empty list.
Rename by typing xsq and
.
pressing
b Tap to highlight the cell at the
bottom of the newly named
xsq column (in the row titled
Cal ). Type x 2 and press
to calculate and list the x 2
values.
4 Construct a scatterplot of y against
xsq (i.e. x 2 ). The plot is clearly
linear.
ISBN 978-1-107-65590-4
Peter Jones, Michael Evans, Kay Lipson 2012
Photocopying is restricted under law and this material must not be transferred to another party.
P1: FXS/ABE
P2: FXS
9780521740517c06.xml
CUAU031-EVANS
174
12:58
Exercise 6B
The x squared transformation
These exercises are expected to be completed with the aid of a graphics calculator.
1 a Plot the data in the table, and comment on the
x 0 1 2 3 4
form of the relationship between y and x.
y 16 15 12 7 0
b Apply a squared transformation to the x values
(x 2 ), again plot the data, and comment on the form of the relationship between y and x 2 .
c Fit a line to the transformed data and write down its equation. Use the equation to predict
the value of y when x = 2.
2 a Plot the data in the table, and comment
on the form of the relationship between
y and x.
x
y
1 2 3
4 5
3 9 19 33 51
b Apply a squared transformation to the x values (x 2 ), again plot the data, and comment on
the form of the relationship between y and x 2 .
c Fit a line to the transformed data and write down its equation. Use the equation to predict
the value of y when x = 6.
3 a Plot the data in the following table, and
comment on the form of the relationship
between y and x.
x
y
1 2 3
4 5
30 27 22 15 6
b Apply a squared transformation to the x values (x 2 ), again plot the data, and comment on
the form of the relationship between y and x 2 .
c Fit a line to the transformed data and write down its equation. Use the equation to predict
the value of y when x = 6.
Data transformation is a process best performed totally within a graphics calculator
environment and we will adopt this approach for the log x and 1/x transformations.
1
0
10
10
100
20
400
25
600
28
1000
30
b Apply a log transformation to the x values (log (x)) and replot the data. Again
comment on the relationship between x and y.
c Fit a line to the transformed data with y as the DV and log x as the IV. Write its equation.
d Use the equation determined from the transformed data to predict the value of y when
x = 800.
ISBN: 9781107655904
Peter Jones, Michael Evans, Kay Lipson 2012
Photocopying is restricted under law and this material must not be transferred to another party
Steps
a 1 Start a new document by pressing
2 Select Add Lists & Spreadsheet.
Enter the data into lists named
x and y, as shown opposite.
175
3 Press
+ and select Add Data &
Statistics.
Construct a scatterplot of y against x.
Let x be the independent variable
and y the dependent variable. The plot is
clearly non-linear.
ISBN 978-1-107-65590-4
Peter Jones, Michael Evans, Kay Lipson 2012
Photocopying is restricted under law and this material must not be transferred to another party.
176
c Press b>Analyze>Regression>Show
Linear (a + bx) to plot the regression
line on the scatterplot.
Note that, simultaneously, the equation
of the regression line is shown.
Note: The x on the screen corresponds to the
transformed variable lx (log x), so we can rewrite
the equations as y = 0.01 + 9.92 lx or
y = 0.01 + 9.92 log x (correct to two decimal places).
1
0
10
10
100
20
400
25
600
28
1000
30
b Apply a log transformation to the x values (log (x)) and replot the data. Again
comment on the relationship between x and y.
c Fit a line to the transformed data with y as the DV and log x as the IV. Write its equation.
d Use the equation determined from the transformed data to predict the value of y
when x = 800.
Steps
a 1 Open the Statistics application
and enter the data into the
columns named x and y. Your
screen should look like the one
shown opposite.
2 Construct a scatterplot of y
against x. Let x be the
independent variable and y the
dependent variable. The plot is
clearly non-linear.
ISBN 978-1-107-65590-4
Peter Jones, Michael Evans, Kay Lipson 2012
Photocopying is restricted under law and this material must not be transferred to another party.
177
When x = 800,
y = 0.01 + 9.92 log 800
= 28.8 (to 1d.p.).
ISBN 978-1-107-65590-4
Peter Jones, Michael Evans, Kay Lipson 2012
Photocopying is restricted under law and this material must not be transferred to another party.
P1: FXS/ABE
P2: FXS
9780521740517c06.xml
CUAU031-EVANS
178
12:58
Exercise 6C
The log x transformation
1 a Plot the data in the table, and comment on the
form of the relationship between y and x.
x
y
x
y
x
y
10
44 132 436 981
15.0 11.8 9.4 6.8 5.0
b Apply a log transformation to the x values (log x), again plot the data, and comment on
the form of the relationship between y and log x.
c Fit a line to the transformed data and write down its equation. Use the equation to predict
the value of y when x = 1000.
1
x
a Plot the data presented in the table below. Comment on the form of the relationship
between x and y.
x
y
1
30
2
15
3
10
4
7.5
5
6
b Apply a reciprocal transformation to the x values (1/x) and replot the data. Again
comment on the form of the relationship between x and y.
c Fit a line to the transformed data with y as the DV and 1/x as the IV. Write its equation.
d Use the equation determined from the transformed data to predict the value
of y when x = 4.
ISBN: 9781107655904
Peter Jones, Michael Evans, Kay Lipson 2012
Photocopying is restricted under law and this material must not be transferred to another party
179
Steps
+ .
a 1 Start a new document by pressing
2 Select Add Lists & Spreadsheet. Enter the
data into lists named x and y, as shown
opposite.
3 Press
+ and select Add Data &
Statistics.
Construct a scatterplot of y against x. Let x
be the independent variable and y the
dependent
variable. The plot is clearly non-linear.
ISBN 978-1-107-65590-4
Peter Jones, Michael Evans, Kay Lipson 2012
Photocopying is restricted under law and this material must not be transferred to another party.
180
c Press b>Analyze>Regression>Show
Linear (a + bx) to plot the regression
line on the scatterplot. Note that,
simultaneously, the equation of the
regression line is shown.
Note: The x on the screen corresponds to the
transformed variable recx (1/x), so we can rewrite
the equations as y = 5.E-14 + 30 recx or
y = 30/x (we can ignore 5.E-14 because
5.E-14 = 5 1014 0).
Eqn: y = 30/ X
When X = 4, y = 30/4 = 7.5
1
30
2
15
3
10
4
7.5
5
6
b Apply a reciprocal transformation to the x values (1/x) and replot the data. Again
comment on the relationship between x and y.
c Fit a line to the transformed data with y as the DV and 1/x as the IV. Write its equation.
d Use the equation determined from the transformed data to predict the value
of y when x = 4.
Steps
a 1 Open the Statistics application
and enter the data into the
columns named x and y. Your
screen should look like the one
shown opposite.
2 Construct a scatterplot of
y against x. Let x be the
independent variable and y the
dependent variable. The plot is
clearly non-linear.
ISBN 978-1-107-65590-4
Peter Jones, Michael Evans, Kay Lipson 2012
Photocopying is restricted under law and this material must not be transferred to another party.
181
y = 30/x
d Use this equation to evaluate y
when x = 4.
ISBN 978-1-107-65590-4
Peter Jones, Michael Evans, Kay Lipson 2012
Photocopying is restricted under law and this material must not be transferred to another party.
P1: FXS/ABE
P2: FXS
9780521740517c06.xml
CUAU031-EVANS
182
12:58
Exercise 6D
The reciprocal (1/x) transformation
1 a Plot the data in the table, and comment on the
form of the relationship between y and x.
x
y
2 4 6 8 10
60 30 20 15 12
b Apply a reciprocal transformation to the x values (1/x), again plot the data, and comment
on the form of the relationship between y and 1/x.
c Fit a line to the transformed data and write down its equation. Use the equation to predict
the value of y when x = 5.
2 a Plot the data in the table, and comment on the
form of the relationship between y and x.
x
y
1 2 3 4 5
61 31 21 16 13
b Apply a reciprocal transformation to the x values (1/x), again plot the data, and comment
on the form of the relationship between y and 1/x.
c Fit a line to the transformed data and write down its equation. Use the equation to predict
the value of y when x = 6.
Which transformation?
What sorts of non-linear relationships can we linearise using the x2 transformation?
The x 2 transformation has the effect of stretching out the upper end of the x scale. As a guide,
relationships that have scatterplots which look like those shown below can often (but not
always) be linearised using the x to x 2 transformation. Note that for the x 2 transformation to
apply, the scatterplot should peak or bottom around x = 0.
y
What sorts of non-linear relationships can we linearise using the log x transformation?
The log x transformation has the effect of compressing the upper end of the x scale. As a guide,
relationships that have scatterplots which look like those shown below can often (but not
always) be linearised using the x to log x transformation.
y
x
ISBN: 9781107655904
Peter Jones, Michael Evans, Kay Lipson 2012
Photocopying is restricted under law and this material must not be transferred to another party
x
Cambridge University Press
P1: FXS/ABE
P2: FXS
9780521740517c06.xml
CUAU031-EVANS
12:58
183
1
What sorts of non-linear relationships can we linearise using the transformation?
x
As a guide, relationships that have scatterplots which look like those shown below can often
1
(but not always) be linearised using the x to transformation.
x
y
Exercise 6E
1 The table below shows the diameter (in m) of a number of umbrellas, along with the number
of people each umbrella is designed to keep dry.
Diameter
Number of people
0.50
1
0.70
2
0.85
3
1.00
4
1.10
5
a Construct a scatterplot showing the relationship between number of people and umbrella
diameter, and comment on the form.
b Apply a squared transformation to the x values (x 2 ), again plot the data, and comment on
the form of the relationship between y and x 2 .
c Fit a line to the transformed data and write down its equation. Give coefcients correct to
one decimal place. Use the equation to predict the number of people that can be kept dry
with an umbrella of diameter 1.30 m. Give answer correct to the nearest whole number.
2 The table shows the horsepower and fuel consumption in kilometres/litre of several cars.
Fuel consumption 5.2 7.3 12.6 7.1 6.3 10.1 10.5 14.6 10.9 7.7
Horsepower
155 125 75 110 138 88 80 70 100 103
a Construct a scatterplot showing the relationship between horsepower and fuel
consumption, and comment on the form. The IV is fuel consumption.
b Apply a reciprocal transformation to the x values (1/x), again plot the data and comment
on the form of the relationship between y and 1/x.
c Fit a line to the transformed data and write down its equation. Use the equation to predict
the horsepower of a car with fuel consumption of 9 kilometres per litre.
6.3
ISBN: 9781107655904
Peter Jones, Michael Evans, Kay Lipson 2012
Photocopying is restricted under law and this material must not be transferred to another party
P1: FXS/ABE
P2: FXS
9780521740517c06.xml
CUAU031-EVANS
184
November 2, 2011
8:2
Outcome
Graph
x
y
log y
x
y
1
y
Example 2
a Plot the data in this table, and comment on the form of the relationship between y and x.
x
y
0
0
1
3.2
2
4.5
3
5.5
4
6.3
5
7.1
b Apply a squared transformation to the y values (y 2 ), again plot the data, and comment on
the form of the relationship between y 2 and x.
c Fit a line to the transformed data with y 2 as the DV and x as the IV. Write its equation and
determine the value of y when x = 1.6.
Solution
a 1 Plot the values of y against x.
2 Decide if the form of the
relationship is linear or non-linear.
y
8
6
4
2
0
ISBN: 9781107655904
Peter Jones, Michael Evans, Kay Lipson 2012
Photocopying is restricted under law and this material must not be transferred to another party
P1: FXS/ABE
P2: FXS
9780521740517c06.xml
CUAU031-EVANS
November 2, 2011
9:40
185
x
Y2
10.2
20.3
30.3
39.7
50.4
y2
60
50
40
30
20
10
y 2=10x
y
100
80
60
40
20
0
x 0
1
2
3
4
5
log y 2.00 1.57 1.15 0.70 0.30 0.00
log y
2.0
log y = 2 - 0.4x
1.5
1.0
0.5
0
ISBN: 9781107655904
Peter Jones, Michael Evans, Kay Lipson 2012
Photocopying is restricted under law and this material must not be transferred to another party
P1: FXS/ABE
P2: FXS
9780521740517c06.xml
CUAU031-EVANS
186
November 2, 2011
8:13
Solution
10
8
6
4
2
x
1/y
1
0.1
1
against x.
y
3 Decide if the form of the
relationship is linear or non-linear.
1
The relationship between and x is
y
linear, so that we can t a line to the data.
2
0.2
3
0.3
5
0.5
0.5
0.4
0.3
1 = 0.1x
y
0.2
0.1
4
0.4
1
y
1
and x is linear.
y
1
= 0.1x.
y
When x = 8,
1
= 0.1 8 = 0.8 or y = 1.25.
y
ISBN: 9781107655904
Peter Jones, Michael Evans, Kay Lipson 2012
Photocopying is restricted under law and this material must not be transferred to another party
P1: FXS/ABE
P2: FXS
9780521740517c06.xml
CUAU031-EVANS
12:58
187
Which transformation?
What sorts of non-linear relationships can we linearise using the y 2 transformation?
The y 2 transformation has the effect of stretching out the upper end of the y scale. As a guide,
relationships that have scatterplots which look like those shown below can often (but not
always) be linearised using the y to y 2 transformation. Note that for the y 2 transformation to
apply, the scatterplot should peak or bottom around y = 0.
y
What sorts of non-linear relationships can we linearise using the log y transformation?
The log y transformation has the effect of compressing the upper end of the y scale. As a guide,
relationships that have scatterplots which look like those shown below can often (but not
always) be linearised using the y to log y transformation.
y
What sorts of non-linear relationships can we linearise using the 1/y transformation?
As a guide, relationships that have scatterplots which look like those shown below can often
(but not always) be linearised using the y to 1/y transformation.
y
Exercise 6F
The y2 transformation
1 a Plot the data in the table. Comment on the
x 0
2
4
6
8 10
form of the relationship between y and x.
y 1.2 2.8 3.7 4.5 5.1 5.7
b Apply a squared transformation to the y values
(y 2 ). Plot the data, and comment on the form of the relationship between y 2 and x.
ISBN: 9781107655904
Peter Jones, Michael Evans, Kay Lipson 2012
Photocopying is restricted under law and this material must not be transferred to another party
P1: FXS/ABE
P2: FXS
9780521740517c06.xml
CUAU031-EVANS
188
12:58
c Fit a line to the transformed data with y 2 as the DV and x as the IV. Write its equation.
Find the value of y when x = 9.
2 a Plot the data in the table. Comment on
the relationship between y and x.
x
y
5
10
15
20
25
30
13.2 12.2 11.2 10.0 8.7 7.1
b Apply a squared transformation to the y values (y 2 ). Plot the data, and comment on the
form of the relationship between y 2 and x.
c Fit a line to the transformed data with y 2 as the DV and x as the IV. Write its equation.
Find the value of y when x = 7.
The log y transformation
3 a Plot the data in the table. Comment on the
x
0.1
0.2
0.3
0.4
0.5
form of the relationship between y and x.
y 15.8 25.1 39.8 63.1 100.0
b Apply a log transformation to the y values
(log y). Plot the data and comment on the form of the relationship between log y and x.
c Fit a line to the transformed data with log y as the DV and x as the IV. Write its equation.
Find the value of log y when x = 0.6.
4 a Plot the data in the table. Comment on the
x 2
4
6
8
10
form of the relationship between y and x.
y 7.94 6.31 5.01 3.98 3.16
b Apply a log transformation to the y values
(log y). Plot the data and comment on the form of the relationship between log y and x.
c Fit a line to the transformed data with log y as the DV and x as the IV. Write its equation.
Find the value of log y when x = 5.
5 a Plot the data in the table. Comment on the
x
1
3
5
7
9
form of the relationship between y and x.
y
7
32 147 681 3162
b Apply a log transformation to the y values
(log y). Plot the data, and comment on the form of the relationship between log y and x.
c Fit a line to the transformed data with log y as the DV and x as the IV. Write its equation.
Find the value of log y when x = 2.
The 1/x transformation
6 a Plot the data in the table. Comment on the
x
1
2
3
4
5
form of the relationship between y and x.
y
1
0.5 0.33 0.25 0.20
b Apply a reciprocal transformation to the
y values (1/y). Plot the data and comment on the relationship between 1/y and x.
c Fit a line to the transformed data with 1/y as the DV and x as the IV. Write its equation.
Find the value of y when x = 2.5.
7 a Plot the data in the table. Comment
on the form of the relationship
between y and x.
x
y
11
0.43
14
0.34
ISBN: 9781107655904
Peter Jones, Michael Evans, Kay Lipson 2012
Photocopying is restricted under law and this material must not be transferred to another party
26
0.19
35
0.14
41
0.12
P1: FXS/ABE
P2: FXS
9780521740517c06.xml
CUAU031-EVANS
12:58
189
b Apply a reciprocal transformation to the y values (1/y). Plot the data and comment on
the form of the relationship between 1/y and x.
c Fit a line to the transformed data with 1/y as the DV and x as the IV. Write its equation.
Find the value of y when x = 20.
d Name a y-axis transformation that should also work for the data. Try it and see.
e Name a y-axis transformation that should not work for the data. Try it and see.
Practical examples
8 The time (in minutes) taken for a local anaesthetic to take effect is related to the dose given.
To investigate this relationship a researcher collected the data shown.
Dose
Time
0.5
3.67
0.6
3.55
0.7
3.42
0.8
3.29
0.9
3.15
1.0
3.00
1.1
2.85
1.2
2.68
1.3
2.51
1.4
2.32
1.5
2.12
a Construct a scatterplot showing the relationship between the dose of anaesthetic and
time taken for it to take effect, and comment on the form.
b Apply a squared transformation to the time values (y), again plot the data, and comment
on the form of the relationship between time squared (y 2 ) and dose (x).
c Fit a line to the transformed data with time2 as the DV and dose as the IV. Write its
equation. Predict the time for the anaesthetic to take effect when the dose is 0.4 units.
9 The table below shows the number of internet users signing up with a new internet service
provider for each of the rst nine months of their rst year of operation.
Number
Month
24
1
32
2
35
3
44
4
60
5
61
6
78
7
92
8
118
9
1
14
2
9
2
11
4
5
5
4
6
4
7
3
7
3
9
2
11
2
Photocopying is restricted under law and this material must not be transferred to another party
P1: FXS/ABE
P2: FXS
9780521740517c06.xml
CUAU031-EVANS
190
12:58
transformation
Putting together the information in Sections 6.2 and 6.3, we can see that there may be more
than one transformation which linearises the scatterplot. The forms of the scatterplots that can
be transformed by the squared, log or reciprocal transformations can be largely classied into
one of four categories, shown as the circle of transformations.
The circle of transformations
Possible transformations
Possible transformations
y2
log x
y2
x2
1
x
log y
1
y
log x
log y
1
y
x2
1
x
Note that the transformations we have introduced in this chapter are able to linearise only
those relationships that are consistently increasing or decreasing.
The advantage of having alternatives is that, in practice, we can always try each of them to
see which gives us the best result. How do we decide which transformation is the best? The
best transformation is the one that results in the best linear model. To choose the best linear
model we will consider, for each transformation applied:
the residual plot, in order to evaluate the linearity of the transformed relationship
the value of the coefcient of determination (r 2 ): a higher value indicates a better t.
This procedure is illustrated in Example 5.
ISBN: 9781107655904
Peter Jones, Michael Evans, Kay Lipson 2012
Photocopying is restricted under law and this material must not be transferred to another party
P1: FXS/ABE
P2: FXS
9780521740517c06.xml
CUAU031-EVANS
12:58
191
Example 5
The data in this table gives life expectancy in years and gross national product, GNP, in dollars
for 24 countries in 1982. Using an appropriate transformation, nd a regression model for the
relationship between life expectancy in years and GNP.
Country
Nicaragua
Paraguay
Venezuela
France
West Germany
Greece
Norway
Czechoslovakia
Austria
Jordan
Sri Lanka
Brunei
Country
GNP Life expectancy
Indonesia
550
50
North Korea
930
66
Mongolia
940
64
Taiwan
2 670
72
Australia
11 220
74
Congo
1 420
48
Ethiopia
150
41
Guinea
330
44
Mauritania
520
44
Nigeria
940
49
Togo
350
48
Zaire
180
48
Source: Modern Data Analysis: A First Course in Applied Statistics, L.C. Hamilton 1990, p. 537
West Germany is now part of Germany; Czechoslovakia is now the Czech Republic and Slovakia.
Solution
Life expectancy
GNP
ISBN: 9781107655904
Peter Jones, Michael Evans, Kay Lipson 2012
Photocopying is restricted under law and this material must not be transferred to another party
P1: FXS/ABE
P2: FXS
9780521740517c06.xml
CUAU031-EVANS
12:58
192
GNP
Residual
GNP
Life expectancy
log GNP
ISBN: 9781107655904
Peter Jones, Michael Evans, Kay Lipson 2012
Photocopying is restricted under law and this material must not be transferred to another party
P1: FXS/ABE
P2: FXS
9780521740517c06.xml
CUAU031-EVANS
12:58
193
Residual
log GNP
Life expectancy
1/GNP
Residual
1/GNP
ISBN: 9781107655904
Peter Jones, Michael Evans, Kay Lipson 2012
Photocopying is restricted under law and this material must not be transferred to another party
P1: FXS/ABE
P2: FXS
9780521740517c06.xml
CUAU031-EVANS
194
12:58
Some comments
It might seem unnatural to talk about the wealth of a country in terms of log (GNP), yet when
we are comparing the relative wealth of countries, log (GNP) is probably a more useful measure
than GNP. For instance, knowing that the difference in GNP between Australia and Sri Lanka
is $10 900 is less informative than knowing that the difference in log (GNP) is 1.5448, which
tells us that Australias GNP is 101.5884 or 35 times that of Sri Lanka. Natural units of
measurement are more often those that are familiar rather than those that are most useful!
Exercise 6G
1 The following scatterplots show non-linear relationships. For each scatterplot, state which
of the transformations x 2 , log x, 1/x, y 2 , log y or 1/y, if any, you would apply to linearise
the relationship.
b 5
a 5
4
3
y
y
2
0
1 2 3 4 5 6 7 8 9 10
x
1 2 3 4 5 6 7 8 9 10
x
5
4
3
y
y
2
1 2 3 4 5 6 7 8 9 10
x
1 2 3 4 5 6 7 8 9 10
x
ISBN: 9781107655904
Peter Jones, Michael Evans, Kay Lipson 2012
Photocopying is restricted under law and this material must not be transferred to another party
P1: FXS/ABE
P2: FXS
9780521740517c06.xml
CUAU031-EVANS
12:58
195
2 The data below give the yield in kilograms and length in metres of 12 commercial potato
plots.
Yield (kilograms) 346 1798 152 86 436 968 686 257 2435 287 1850 1320
Length (metres) 12.1 27.4 8.3 5.5 15.7 21.5 19.3 9.0 34.2 14.7 31.9 25.3
a Construct a scatterplot showing the relationship between yield and length of plot and
comment on the form.
b Using an appropriate transformation, nd a regression model for the relationship
between yield in kilograms and length of plot in metres.
3 A recent study in Canada showed that cigarette consumption (per day) is related to cost per
pack. Some data drawn from that study is shown below.
Cost ($)
4.00 4.50 4.80 5.50 6.00 6.50 7.50
Cigarette consumption 8.0 7.4 7.0 6.4 5.9 5.5 5.0
a Construct a scatterplot showing the relationship between the cost of cigarettes and
cigarette consumption, and comment on the form.
b Using an appropriate transformation, nd a regression model for the relationship
between the cost of cigarettes and cigarette consumption.
4 The population of a large town increased over a 13-year period, as shown in the table.
a Construct a scatterplot showing the
annual population growth of the town,
and comment on the form.
b Using an appropriate transformation,
nd a regression model for the annual
population growth of the town.
Year
1
2
3
4
5
6
7
Population
58860
57770
58206
59513
59983
60123
59763
ISBN: 9781107655904
Peter Jones, Michael Evans, Kay Lipson 2012
Photocopying is restricted under law and this material must not be transferred to another party
Year
8
9
10
11
12
13
Population
61726
60387
61646
62347
64185
67158
P1: FXS/ABE
P2: FXS
9780521740517c06.xml
CUAU031-EVANS
196
12:58
5 The monthly average exchange rate (to the nearest cent) between the Australian dollar and
the US dollar over a period of 18 months in the 1990s is given in the table below.
a Construct a scatterplot showing
Exchange rate
Exchange rate
the exchange rate over the 18-month Month
(US $)
Month
(US $)
period, and comment on the form.
1
0.77
10
0.75
b Using an appropriate transformation,
2
0.77
11
0.75
nd a regression model for the
3
0.77
12
0.72
exchange rate over that 18-month
4
0.76
13
0.72
period.
5
0.78
14
0.69
6
0.78
15
0.69
7
0.77
16
0.68
7
0.76
17
0.71
8
0.76
18
0.70
9
0.76
6 The table below shows the percentage of people who can read (literacy rate) and the gross
domestic product (GDP) for a selection of 14 countries.
a Construct a scatterplot showing the
relationship between literacy rate
and GDP, and comment on the form.
b Using an appropriate transformation,
nd a regression model for the
relationship between literacy rate
and GDP for this group of countries.
Country
Botswana
Cambodia
Canada
Ethiopia
France
Georgia
Germany
Honduras
Japan
Liberia
Pakistan
Saudi Arabia
Switzerland
Syria
ISBN: 9781107655904
Peter Jones, Michael Evans, Kay Lipson 2012
Photocopying is restricted under law and this material must not be transferred to another party
Literary
rate (%)
72
35
97
24
99
99
99
73
99
40
35
62
99
64
Gross domestic
product/capita
2677
260
19904
122
18944
4500
17539
1030
19860
409
406
6651
22384
2436
P1: FXS/ABE
P2: FXS
9780521740517c06.xml
CUAU031-EVANS
12:58
197
Data transformation
x2 or y2 transformation
1
1
or transformation
x
y
Review
Residual plots
Skills check
Having completed this chapter you should be able to:
1
1
recognise which of the x 2 , log x, , y 2 , log y or transformations might be used to
x
y
linearise a bivariate relationship
apply each of these transformations to a data set
use residual plots and the coefcient of determination, r 2 , to decide which
transformation gives the best model for the relationship
use the transformed variable as part of a regression analysis to give a model for the
relationship.
Multiple-choice questions
1 The missing data values, a and b, in the table are:
value
(value)2
log(value)
1
a
0
A a = 0, b = 0.5
D a = 1, b = 0.602
2
4
b
3
4
9
16
0.477 0.602
B a = 1, b = 0.5
E a = 1, b = 0.693
C a = 1, b = 0.301
ISBN: 9781107655904
Peter Jones, Michael Evans, Kay Lipson 2012
Photocopying is restricted under law and this material must not be transferred to another party
P1: FXS/ABE
P2: FXS
9780521740517c06.xml
CUAU031-EVANS
12:58
14
12
Weight loss
Review
198
10
8
6
4
2
0
2
3
4
5
6
7
Number of weeks on a diet
4.00
6
5
4
2.00
0.00
2.00
4.00
2 3 4 5 6 7
Number of weeks on a diet
E
14
12
10
8
6
4
2
0
7
6
5
4
3
2
0 2 4 6 8 10 12 14
Weight loss
4.00
Residual
Weight loss
Residual
The residual plot for this least squares line would look like:
2.00
0.00
2.00
4.00
2 3 4 5 6 7
Number of weeks on a diet
2 3 4 5 6 7
Number of weeks on a diet
y
5
4
3
2
1
0
ISBN: 9781107655904
Peter Jones, Michael Evans, Kay Lipson 2012
Photocopying is restricted under law and this material must not be transferred to another party
x
1 2 3 4 5 6 7 8 9 10
P1: FXS/ABE
P2: FXS
9780521740517c06.xml
CUAU031-EVANS
12:58
5
4
3
2
Review
199
1
x
1 2 3 4 5 6 7 8 9 10
1 2 3 4 5 6 7 8 9 10
7 The following data was collected for two related variables x and y.
x
y
1
7
2
8.6
3
8.9
4
8.8
5
9.9
6
9.7
7
10.4
8
10.5
9
10.7
10
11.2
11
11.1
y
12
10
8
6
4
2
0
x
0
10
12
8 Brian has determined from a scatterplot of his data that the appropriate
transformations for his data are log x, 1/x and y 2 . After applying each of these
transformations to the data, he obtains the results shown below.
Model
y vs x
y vs log x
y vs 1/x
y 2 vs x
Residuals
Curved
Random
Random
Random
r2
79.6%
80.8%
81.9%
88.4%
ISBN: 9781107655904
Peter Jones, Michael Evans, Kay Lipson 2012
Photocopying is restricted under law and this material must not be transferred to another party
P1: FXS/ABE
P2: FXS
9780521740517c06.xml
CUAU031-EVANS
Review
200
12:58
Based on the information in the table, which transformation would you suggest
Brian use?
B a y 2 transformation
C a log x transformation
A an x 2 transformation
1
D a transformation
E no transformation
x
9 When investigating the relationship between the weight of the strawberries picked
from a strawberry patch, and the width of the patch, Suzie decides that an x 2
transformation is appropriate. After transforming the data, she ts a least squares
regression line to the data and determines that the intercept is 10 and the slope is 5.
Based on this information, the model that Suzie has tted to the data can be written
as:
B weight = 5 + 10 (width)2
A (weight)2 = 10 + 5 width
C weight = 10 + 5 (width)2
D (weight)2 = 10 + 5 (width)2
2
E (weight) = 5 + 10 width
10 Suppose that the model that describes the relationship between the hours spent
studying for an exam and the mark achieved can be modelled by the equation:
Mark = 20 + 40 log (Hours)
From this model, we would predict that a student who studies for 20 hours would
score a mark (to the nearest whole number) of:
A 80
B 78
C 180
D 72
E 140
Extended-response questions
1 Measurements of distance travelled in metres and time taken in seconds were made
on a falling body. The data are given in the table below.
Time
Distance
Time2
0
0
1
5.2
2
18.0
3
42.0
4
5
6
79.0 128.0 168.0
a
b
c
d
e
f
2 The data in the table below show the marks obtained by students on a test and the
amount of time they reported studying for the test:
Mark
62 74 79
Time (hours) 1.5 2.25 3.0
80
2.5
56
0.8
86
3.5
92 87 64
6.0 2.75 1.0
ISBN: 9781107655904
Peter Jones, Michael Evans, Kay Lipson 2012
Photocopying is restricted under law and this material must not be transferred to another party
88
4.5
48
0.5
32
0.1
P1: FXS/ABE
P2: FXS
9780521740517c06.xml
CUAU031-EVANS
12:58
201
Review
a We want to predict a students mark from the time they reported studying for the
test. In this situation, which is the dependent variable and which is the
independent variable?
b Construct a scatterplot and comment on the relationship between test mark and
time spent studying in terms of direction, outliers, form and strength.
c i Fit a linear model to the data and record its equation. Interpret the slope in
terms of the problem at hand.
ii Calculate the coefcient of determination and interpret.
iii Construct a residual plot and use it to comment on the suitability of modelling
the relationship between Mark and Time spent studying with a straight line.
d Apply a log transformation to Time. Then:
i construct a scatterplot for the transformed data
ii nd the equation of the least squares regression line for the transformed data
iii use the equation to predict the mark obtained after 5 hours of study
iv calculate the coefcient of determination and interpret
v construct a residual plot and use it to comment on the linearity of the
transformed model
y = 31.9 0.015x
r2 = 0.662
40
0
60
0
80
0
10
00
12
00
14
00
16
00
30
28
26
24
22
20
18
16
14
12
10
Testosterone level
5
4
3
2
1
0
1
2
3
4
5
40
0
60
0
80
0
10
00
12
00
14
00
16
00
Residual
Testosterone
1305
1000
1175
1495
1060
800
1005
710
1150
605
690
700
625
610
450
Age at rst
conviction
11
12
13
14
15
16
16
17
18
20
21
23
24
27
30
Age
3 The following are the testosterone levels and the age at rst conviction for violent
and aggressive crimes collected on a sample of young male prisoners. It is believed
that the higher the testosterone level in a male prisoner, the earlier they are likely to
be convicted of a violent and aggressive crime. A correlation and regression
analysis is also given.
Testosterone level
ISBN: 9781107655904
Peter Jones, Michael Evans, Kay Lipson 2012
Photocopying is restricted under law and this material must not be transferred to another party
P1: FXS/ABE
P2: FXS
9780521740517c06.xml
CUAU031-EVANS
Review
202
12:58
ISBN: 9781107655904
Peter Jones, Michael Evans, Kay Lipson 2012
Photocopying is restricted under law and this material must not be transferred to another party
P1: FXS/ABE
P2: FXS
9780521740517c06.xml
CUAU031-EVANS
12:58
203
Review
The data below show the age (in years) and diameter at chest height (in cm) of a
sample of trees of the same species taken from a commercial plantation.
Age
(years)
4
5
8
8
8
10
10
12
13
14
Diameter
(centimetres)
2.0
2.0
2.5
5.1
7.5
5.1
8.9
12.4
9.0
6.4
Age
(years)
16
18
22
25
29
30
34
38
40
Diameter
(centimetres)
11.4
11.7
14.7
16.5
15.2
15.2
17.8
17.8
19.1
a We wish to predict the age of a tree from its diameter at chest height. In this
situation, which is the dependent variable and which is the independent variable?
b Construct a scatterplot and comment on the relationship between age and
diameter in terms of direction, outliers, form and strength.
c i Fit a linear model to the data and record its equation. Interpret the slope in
terms of the problem at hand.
ii Calculate the coefcient of determination and interpret.
iii Form a residual plot and use it to comment on the suitability of modelling the
relationship between age and diameter with a straight line.
d Use the x 2 transformation to linearise the data. Then:
i construct a scatterplot of age against diameter squared
ii nd the equation of the least squares regression line for the transformed data
iii calculate the coefcient of determination and interpret
iv form a residual plot and use it to comment on the suitability of modelling the
relationship between age and diameter squared with a straight line
6 The table below shows the performance level recorded by a number of people on
completion of a task, along with the time spent (in minutes) in practising the task.
Time spent on practice
Level of performance
0.5
1.0
1.0
1.5
1.5
2.0
2.0
3.0
3.0
3.0
4.0
3.5
5.0
4.0
6.0
3.5
7.0
3.9
7.0
3.6