
P1: FXS/ABE

P2: FXS
9780521740517c06.xml
CUAU031-EVANS

October 22, 2011

12:58

CHAPTER 6

CORE

Data transformation
What do we mean by data transformation?
What is the effect of applying a log, squared or reciprocal transformation to the x
variable?
What is the effect of applying a log, squared or reciprocal transformation to the y
variable?
How do I choose which data transformation to apply?
How do I carry out a regression analysis with transformed data?

6.1 Data transformation


There are methods for fitting curves to non-linear relationships using non-linear regression.
However, this procedure is mathematically complicated and the results are difficult to interpret.
The method of dealing with a non-linear relationship favoured in practice is to apply a
mathematical function to one of the variables, so that the relationship between the variables
becomes closer to a straight line. By appropriate choice of the function, the scale of
measurement is stretched or compressed. There are many functions that can be used to
transform the data, but here we will consider only three. These are:
the squared transformation
the logarithmic transformation
the reciprocal transformation.
When first confronted with data transformation, many people tend to be suspicious. However,
when we think about it from the point of view of analysing a set of data, there is nothing
special about the units of measurement used when gathering the data. In general, units used are
chosen because they are convenient for recording and reporting the data. Natural units tend
to be used; for example, seconds when recording time, or metres when recording length. But
what is the natural unit for measuring the fuel economy of a car: kilometres per litre (x) or litres
per kilometre (x^-1)? In measurement, 'natural' often tends to mean 'familiar'. For example, to a
chemist, it is natural to measure acidity in terms of pH and the logarithm of hydrogen ion
concentration (log x) rather than the hydrogen ion concentration (x).


ISBN: 9781107655904
Peter Jones, Michael Evans, Kay Lipson 2012
Photocopying is restricted under law and this material must not be transferred to another party

Cambridge University Press


Chapter 6 Data transformation


Generally, it is only luck if a data set reveals all its hidden information when analysed in the
form in which it was initially gathered and/or reported. It is part of the analyst's role to search
out different ways of looking at the data in order to enhance our understanding of that data.
One of the most powerful tools available to help achieve this task is data transformation.
How do these transformations affect the values to which they are applied? Consider the
following table of numbers.
value        0.2     0.4     0.6     1   2      3      4
(value)^2    0.04    0.16    0.36    1   4      9      16
log(value)   -0.699  -0.398  -0.222  0   0.301  0.477  0.602
1/value      5       2.5     1.667   1   0.5    0.333  0.25

From the table we can see that the transformations have the following effects on the data values:
The squared transformation has the effect of decreasing values between 0 and 1, and
increasing values greater than 1. Large values are increased the most. For example, 2^2 = 4
and 20^2 = 400, so that while the values 2 and 20 are 18 units apart, the values 4 and 400
are 396 units apart. That is, the effect of the squared transformation is to stretch the values.
The log transformation (here log means the logarithm to base 10) reduces all values, and
values between 0 and 1 become negative. Large values are reduced much more than small
values. For example, log 2 = 0.301 and log 20 = 1.301, so that while the values 2 and 20 are
18 units apart, the values 0.301 and 1.301 are only 1 unit apart. That is, the effect of the log
transformation is to compress the values. Note that the log function can be applied only to
values that are greater than 0.
The reciprocal transformation again reduces all values greater than 1. Large values
are reduced much more than small values. For example, 1/2 = 0.5 and 1/20 = 0.05, so that
while the values 2 and 20 are 18 units apart, the values 0.5 and 0.05 are only 0.45 units
apart. That is, the effect of the reciprocal transformation is to compress the large values to
an even greater extent than the log transformation.
Thus it can be seen that all transformations have a greater effect on the larger values, but this
effect varies for each transformation.
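The effects described above can also be checked numerically. The following is a minimal sketch in Python (an illustration only; the chapter itself works with a graphics calculator). The function name transform_table is hypothetical, and log is taken to base 10 to match the table.

```python
import math

def transform_table(values):
    """Return one row per value: (value, value squared, log10(value), 1/value)."""
    # Every value must be greater than 0, because log and 1/value are undefined at 0.
    return [(v, v ** 2, math.log10(v), 1 / v) for v in values]

values = [0.2, 0.4, 0.6, 1, 2, 3, 4]

print(f"{'value':>6} {'squared':>8} {'log':>8} {'1/value':>8}")
for v, squared, log_v, reciprocal in transform_table(values):
    print(f"{v:>6} {squared:>8.2f} {log_v:>8.3f} {reciprocal:>8.3f}")
```

Calling transform_table([2, 20]) gives the figures used in the discussion: the squares are 4 and 400, the logs are about 0.301 and 1.301, and the reciprocals are 0.5 and 0.05.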

Exercise 6A
1 a Copy and complete the table.
value        1  2  3  4  5
(value)^2
log(value)
1/value
b Use the information in the table to complete the following statements
by deleting the incorrect term.
i The squared transformation stretches/compresses the scale.
ii The log transformation stretches/compresses the larger values.
iii The reciprocal transformation stretches/compresses the larger values.


Essential Further Mathematics Core

2 a Copy and complete the table.
value        1  2  4  8  16  32  64
(value)^2
log(value)
1/value
b Use the information in the table to complete the following statements
by deleting the incorrect term.
i The squared transformation stretches/compresses the larger values.
ii The log transformation stretches/compresses the larger values.
iii The reciprocal transformation stretches/compresses the larger values.
3 a Copy and complete the table.
value        10  100  1000  10000  100000
(value)^2
log(value)
1/value

b Use the information in the table to complete the following statements by deleting the
incorrect term.
i The squared transformation stretches/compresses the larger values.
ii The log transformation stretches/compresses the larger values.
iii The reciprocal transformation stretches/compresses the larger values.
4 a Copy and complete the table.
value        20  10  2.5  1.25  0.625
(value)^2
log(value)
1/value

b Use the information in the table to complete the following statements by deleting the
incorrect term.
i The squared transformation stretches/compresses the larger values.
ii The log transformation stretches/compresses the larger values.
iii The reciprocal transformation stretches/compresses the larger values.
5 a Copy and complete the table.
value        20  200  2000  20000  200000
(value)^2
log(value)
1/value

b Use the information in the table to complete the following statements by deleting the
incorrect term.
i The squared transformation stretches/compresses the larger values.
ii The log transformation stretches/compresses the larger values.
iii The reciprocal transformation stretches/compresses the larger values.


6.2 Transforming the x-axis


We are interested in linearising the relationship between two variables, x and y, and the
transformations discussed in the previous section can be applied to either x or y (but not both
here). We will examine the effect of transforming the x-axis and the y-axis separately.
Transforming the x-axis will have the effect of moving the x values on the plot horizontally,
and leave the y values unaltered. The square, log and reciprocal transformations can be applied
to the x-axis with the following effects:
Transformation   Outcome
x^2              Spreads out the high x values relative to the smaller x values.
log x            Compresses large x values relative to the smaller data values.
1/x              Also compresses large x values relative to the smaller data values, to a greater
                 extent than log x. Note that values of x less than 1 become greater than 1, and
                 values of x greater than 1 become less than 1, so that the order of the data
                 values is reversed.

The following examples show the effect on the relationship between x and y when the squared,
log and reciprocal transformations are applied to the x values.


The squared transformation


Example 1

Linearising the relationship with a squared transformation

a Plot the data in the table, and comment on the form of the relationship between x and y.
x  0  1  2  3   4
y  2  3  6  11  18
b Apply a squared transformation to the x values (x^2), again plot the data, and comment on
the form of the relationship between y and x^2.
c Fit a line to the transformed data with y as the DV and x^2 as the IV. Write its equation.
d Use the equation determined from the transformed data to predict the value of y when x = 5.
Solution
a 1 Plot the values of y against x.
2 Decide if the form of the relationship is linear or non-linear.
3 Write down your conclusion.
[Scatterplot of y against x]
The relationship between y and x is non-linear.
b 1 Construct a new table of values.
x^2  0  1  4  9   16
y    2  3  6  11  18
2 Plot the values of y against x^2.
3 Decide if the form of the relationship is linear or non-linear.
4 Write down your conclusion.
[Scatterplot of y against x^2]
The relationship between y and x^2 is linear.
c 1 Fit a straight line to the transformed data.
2 Write down the equation of the line noting that
i the y-intercept is 2 and the slope is 1
ii the independent variable on this graph is x^2 not x.
[Scatterplot of y against x^2 with the fitted line]
The equation of the line is:
y = 2 + x^2
d Use the non-linear equation y = 2 + x^2 to determine the value of y when x = 5.
When x = 5, y = 2 + 5^2 = 27
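The same steps (transform x, fit a line, then predict) can be sketched in Python. This is an illustration under stated assumptions rather than the chapter's method: the helper fit_line is a hypothetical name for an ordinary least-squares fit, and the data are those of Example 1.

```python
def fit_line(xs, ys):
    """Least-squares line y = a + b*x; returns (a, b) as (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

x = [0, 1, 2, 3, 4]
y = [2, 3, 6, 11, 18]

# Apply the squared transformation to x, then fit y against x^2.
xsq = [v ** 2 for v in x]
a, b = fit_line(xsq, y)

# Predict at x = 5 by substituting x^2 = 25 into the fitted line.
predicted = a + b * 5 ** 2
print(f"y = {a:.2f} + {b:.2f} x^2")
print(f"When x = 5, y = {predicted:.1f}")
```

Note that the prediction substitutes the transformed value 5^2, not 5, because the fitted line has x^2 as its independent variable.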


How to apply the squared transformation using the TI-Nspire CAS


Plot the data presented in the table below.
x  0  1  2  3   4
y  2  3  6  11  18
Use an x^2-transformation to linearise the data, then:
fit a line to the transformed data with y as the DV and x^2 as the IV. Write its equation.
use the equation determined from the transformed data to predict the value of y when x = 5.
Steps
1 Start a new document by pressing [key].
2 Select Add Lists & Spreadsheet.
Enter the data into lists named x and y, as shown.
3 Press [key] + [key] and select Add Data & Statistics. Construct a scatterplot of y
against x. Let x be the independent variable and y the dependent variable. The plot is
clearly non-linear.
4 Return to the Lists & Spreadsheet application (by pressing [key] + [key]).
To calculate the values of x^2 and store them in a list named xsq (short for x-squared), do
the following:
a Move the cursor to the top of column C and type xsq. Press [key].
b Move the cursor to the grey cell immediately below the xsq heading. We need to enter the
expression = x^2. To do this, press [key] then VAR ([key]), highlight the variable x and
then press [key] to paste x into the formula line. Finally, type ^2 (or press [key]) to
complete the formula. Press [key] to calculate and display the x-squared values.
Note: The dash in front of the x is automatically added when a list name is pasted from the
VAR menu.
Note: You can also type in the variable x and then select Variable Reference when prompted.
This avoids using the VAR menu.


5 Construct a scatterplot of y against x^2.
Press [key] + [key] to return to the scatterplot created earlier and change the independent
variable to xsq as follows:
a Press tab until the list of variables is displayed near the x-axis. Select the variable,
xsq. Press [key] to paste the variable to the x-axis.
b A scatterplot of y against xsq (x^2) is then displayed, as shown. The plot is clearly linear.
6 Press menu>Analyze>Regression>Show Linear (a + bx) to plot the regression line on the
scatterplot.
Note that, simultaneously, the equation of the regression line is shown.
Note: The x on the screen corresponds to the transformed variable xsq, so we can rewrite
the equation as y = 2 + xsq or y = 2 + x^2.
7 Write down the equation for y in terms of x^2 and evaluate y when x = 5.
Eqn: y = 2 + x^2
When x = 5, y = 2 + 5^2 = 27

How to apply the squared transformation using the ClassPad


Plot the data presented in the table below.
x  0  1  2  3   4
y  2  3  6  11  18
Use an x^2-transformation to linearise the data, then:
fit a line to the transformed data with y as the DV and x^2 as the IV. Write its equation.
use the equation determined from the transformed data to predict the value of y when x = 5.


Steps
1 Open the Statistics application and enter the data into the columns named x and y. Your
screen should look like the one shown.
2 Construct a scatterplot of y against x. Let x be the independent variable and y the
dependent variable. The plot is clearly non-linear.
3 To calculate the values of x^2 and store them in a list named xsq (i.e. x-squared):
a Tap to highlight the cell at the top of the next empty list. Rename by typing xsq and
pressing [key].
b Tap to highlight the cell at the bottom of the newly named xsq column (in the row titled
Cal). Type x^2 and press [key] to calculate and list the x^2 values.
4 Construct a scatterplot of y against xsq (i.e. x^2). The plot is clearly linear.
5 Fit a line to the transformed data and plot it on the scatterplot. Write down its equation
using y as the DV and xsq (i.e. x^2) as the IV.
y = 2 + x^2
Use this equation to evaluate y when x = 5.
When x = 5, y = 2 + 5^2 = 27


Exercise 6B
The x squared transformation
These exercises are expected to be completed with the aid of a graphics calculator.
1 a Plot the data in the table, and comment on the form of the relationship between y and x.
x  0   1   2   3  4
y  16  15  12  7  0
b Apply a squared transformation to the x values (x^2), again plot the data, and comment on
the form of the relationship between y and x^2.
c Fit a line to the transformed data and write down its equation. Use the equation to predict
the value of y when x = 2.
2 a Plot the data in the table, and comment on the form of the relationship between y and x.
x  1  2  3   4   5
y  3  9  19  33  51
b Apply a squared transformation to the x values (x^2), again plot the data, and comment on
the form of the relationship between y and x^2.
c Fit a line to the transformed data and write down its equation. Use the equation to predict
the value of y when x = 6.
3 a Plot the data in the following table, and comment on the form of the relationship
between y and x.
x  1   2   3   4   5
y  30  27  22  15  6
b Apply a squared transformation to the x values (x^2), again plot the data, and comment on
the form of the relationship between y and x^2.
c Fit a line to the transformed data and write down its equation. Use the equation to predict
the value of y when x = 6.
Data transformation is a process best performed totally within a graphics calculator
environment and we will adopt this approach for the log x and 1/x transformations.

The log transformation


How to apply the log transformation using the TI-Nspire CAS
a Plot the data presented in the table below. Comment on the form of the relationship
between x and y.
x  1  10  100  400  600  1000
y  0  10  20   25   28   30

b Apply a log transformation to the x values (log (x)) and replot the data. Again
comment on the relationship between x and y.
c Fit a line to the transformed data with y as the DV and log x as the IV. Write its equation.
d Use the equation determined from the transformed data to predict the value of y when
x = 800.


Steps
a 1 Start a new document by pressing [key].
2 Select Add Lists & Spreadsheet.
Enter the data into lists named x and y, as shown opposite.
3 Press [key] + [key] and select Add Data & Statistics.
Construct a scatterplot of y against x. Let x be the independent variable and y the
dependent variable. The plot is clearly non-linear.
b Return to the Lists & Spreadsheet application (by pressing [key] + [key]).
To calculate the values of log x and store them in a list named lx (short for log x),
complete the following:
1 Move the cursor to the top of column C and type lx. Press [key].
2 Move the cursor to the grey cell immediately below the lx heading and type = log(. Then
press VAR ([key]), highlight the variable x, press [key] to paste x into the formula line,
then type ) to complete the command. Press [key] to calculate and display the log values.
Note: If your answers are not given as decimals, refer to the Appendix to change Mode
settings to Approximate.
3 Construct a scatterplot of y against log x.
Use [key] + [key] to return to the scatterplot created earlier and change the independent
variable to lx.
A scatterplot of y against lx (i.e. the log of x) is displayed, as shown. The plot is clearly
linear.


c Press menu>Analyze>Regression>Show Linear (a + bx) to plot the regression line on the
scatterplot.
Note that, simultaneously, the equation of the regression line is shown.
Note: The x on the screen corresponds to the transformed variable lx (log x), so we can
rewrite the equation as y = 0.01 + 9.92 lx or y = 0.01 + 9.92 log x (correct to two
decimal places).
d Write down the equation for y in terms of log x and evaluate y when x = 800.
Eqn: y = 0.01 + 9.92 log x
When x = 800,
y = 0.01 + 9.92 log 800
= 28.8 (to 1 d.p.)
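For readers working outside the calculator, the log x procedure can be sketched in Python. This is a minimal sketch, not the calculator's own routine: fit_line is a hypothetical helper for an ordinary least-squares fit, and log is taken to base 10 as in the worked steps.

```python
import math

def fit_line(xs, ys):
    """Least-squares line y = a + b*x; returns (a, b) as (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

x = [1, 10, 100, 400, 600, 1000]
y = [0, 10, 20, 25, 28, 30]

# Store log x in a new list (called lx in the calculator steps), then fit y against lx.
lx = [math.log10(v) for v in x]
a, b = fit_line(lx, y)

# Predict at x = 800 by substituting log 800 into the fitted line.
predicted = a + b * math.log10(800)
print(f"y = {a:.2f} + {b:.2f} log x")
print(f"When x = 800, y = {predicted:.1f}")
```

Rounded to two decimal places, the fitted coefficients should agree with the equation y = 0.01 + 9.92 log x obtained on the calculator.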

How to apply the log transformation using the ClassPad


a Plot the data presented in the table below. Comment on the form of the relationship
between x and y.
x  1  10  100  400  600  1000
y  0  10  20   25   28   30

b Apply a log transformation to the x values (log (x)) and replot the data. Again
comment on the relationship between x and y.
c Fit a line to the transformed data with y as the DV and log x as the IV. Write its equation.
d Use the equation determined from the transformed data to predict the value of y
when x = 800.
Steps
a 1 Open the Statistics application
and enter the data into the
columns named x and y. Your
screen should look like the one
shown opposite.
2 Construct a scatterplot of y
against x. Let x be the
independent variable and y the
dependent variable. The plot is
clearly non-linear.


b To calculate the values of log x and store them in a list named lx (short for log x):
1 Tap to highlight the cell at the top of the next empty list (in this case, list3). Rename
by typing lx and pressing [key].
2 Tap to highlight the cell at the bottom of the newly named lx column (in the row titled
Cal). Typing log(x) and pressing [key] calculates and lists the values of log x.
Note: To ensure decimal values are displayed, Decimal should be visible in the status bar
(at the bottom). If Standard is visible, tap Standard and it will change to Decimal.
3 Construct a scatterplot of y against lx (i.e. log x). The plot is clearly linear.
c Fit a line to the transformed data and plot it on the scatterplot. Write down its equation
using y as the DV and lx (log x) as the IV.
y = 0.01 + 9.92 log x
d Use this equation to evaluate y when x = 800.
When x = 800,
y = 0.01 + 9.92 log 800
= 28.8 (to 1 d.p.)


Exercise 6C
The log x transformation
1 a Plot the data in the table, and comment on the form of the relationship between y and x.
x  1   10  100  400  600  1000
y  30  20  10   5    2    0
b Apply a log transformation to the x values (log x), again plot the data, and comment on
the form of the relationship between y and log x.
c Fit a line to the transformed data and write down its equation. Use the equation to predict
the value of y when x = 500. Give your answer correct to one decimal place.
2 a Plot the data in the table, and comment on the form of the relationship between y and x.
x  5    10   150  500  1000
y  3.1  4.0  7.5  9.1  10.0
b Apply a log transformation to the x values (log x), again plot the data, and comment on
the form of the relationship between y and log x.
c Fit a line to the transformed data and write down its equation. Use the equation to predict
the value of y when x = 100.
3 a Plot the data in the following table, and comment on the form of the relationship
between y and x.
x  10    44    132  436  981
y  15.0  11.8  9.4  6.8  5.0
b Apply a log transformation to the x values (log x), again plot the data, and comment on
the form of the relationship between y and log x.
c Fit a line to the transformed data and write down its equation. Use the equation to predict
the value of y when x = 1000.

The reciprocal (1/x ) transformation


How to apply the reciprocal transformation (1/x) using the TI-Nspire CAS

a Plot the data presented in the table below. Comment on the form of the relationship
between x and y.
x  1   2   3   4    5
y  30  15  10  7.5  6

b Apply a reciprocal transformation to the x values (1/x) and replot the data. Again
comment on the form of the relationship between x and y.
c Fit a line to the transformed data with y as the DV and 1/x as the IV. Write its equation.
d Use the equation determined from the transformed data to predict the value
of y when x = 4.


Steps
a 1 Start a new document by pressing [key] + [key].
2 Select Add Lists & Spreadsheet. Enter the data into lists named x and y, as shown
opposite.
3 Press [key] + [key] and select Add Data & Statistics.
Construct a scatterplot of y against x. Let x be the independent variable and y the
dependent variable. The plot is clearly non-linear.
b Return to the Lists & Spreadsheet application (by pressing [key] + [key]).
To calculate the values of 1/x, complete the following:
1 Move the cursor to the top of column C and type recx (short for the reciprocal of x).
Press [key].
2 Move the cursor to the grey cell immediately below the recx heading and type = 1/, then
press VAR ([key]) and highlight the variable x and press [key] to paste into the formula
line. Press [key] to calculate and display the 1/x values.
Note: If your answers are not presented as decimals, refer to the Appendix to change
Mode settings to Approximate.
3 Construct a scatterplot of y against 1/x (i.e. recx).
Use [key] + [key] to return to the scatterplot created earlier and change the independent
variable to recx.
A scatterplot of y against recx (the reciprocal of x) is displayed as shown.
The plot is clearly linear.


c Press menu>Analyze>Regression>Show Linear (a + bx) to plot the regression line on the
scatterplot. Note that, simultaneously, the equation of the regression line is shown.
Note: The x on the screen corresponds to the transformed variable recx (1/x), so we can
rewrite the equation as y = 5.E-14 + 30 recx or y = 30/x (we can ignore 5.E-14 because
5.E-14 = 5 × 10^-14 ≈ 0).
d Write down the equation for y in terms of 1/x and evaluate y when x = 4.
Eqn: y = 30/x
When x = 4, y = 30/4 = 7.5
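The reciprocal procedure can be sketched the same way in Python. As before, this is an illustrative sketch: fit_line is a hypothetical least-squares helper, and the intercept it returns may be a tiny non-zero number caused by rounding in floating-point arithmetic, much like the 5.E-14 shown on the calculator.

```python
def fit_line(xs, ys):
    """Least-squares line y = a + b*x; returns (a, b) as (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

x = [1, 2, 3, 4, 5]
y = [30, 15, 10, 7.5, 6]

# Store 1/x in a new list (called recx in the calculator steps), then fit y against recx.
recx = [1 / v for v in x]
a, b = fit_line(recx, y)

# Predict at x = 4 by substituting 1/4 into the fitted line.
predicted = a + b * (1 / 4)
print(f"intercept = {a:.3g}, slope = {b:.3g}")
print(f"When x = 4, y = {predicted:.2f}")
```

Because these data satisfy y = 30/x exactly, the slope comes out as 30 and the intercept is zero to within rounding error.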

How to apply the reciprocal transformation using the ClassPad


a Plot the data presented in the table below. Comment on the relationship
between x and y.
x  1   2   3   4    5
y  30  15  10  7.5  6

b Apply a reciprocal transformation to the x values (1/x) and replot the data. Again
comment on the relationship between x and y.
c Fit a line to the transformed data with y as the DV and 1/x as the IV. Write its equation.
d Use the equation determined from the transformed data to predict the value
of y when x = 4.
Steps
a 1 Open the Statistics application
and enter the data into the
columns named x and y. Your
screen should look like the one
shown opposite.
2 Construct a scatterplot of
y against x. Let x be the
independent variable and y the
dependent variable. The plot is
clearly non-linear.


b To calculate the values of 1/x and store them in a list named recx (short for the
reciprocal of x):
1 Tap to highlight the cell at the top of the next empty list (in this case, list3). Rename
by typing recx and pressing [key].
2 Tap to highlight the cell at the bottom of the newly named recx column (in the row titled
Cal). Typing 1/x and pressing [key] calculates and lists the 1/x values.
3 Construct a scatterplot of y against recx (i.e. 1/x). The plot is clearly linear.
c Fit a line to the transformed data and plot it on the scatterplot. Write down its equation
using y as the DV and recx (1/x) as the IV.
y = 30/x
d Use this equation to evaluate y when x = 4.
When x = 4, y = 30/4 = 7.5


Exercise 6D
The reciprocal (1/x) transformation
1 a Plot the data in the table, and comment on the form of the relationship between y and x.
x  2   4   6   8   10
y  60  30  20  15  12
b Apply a reciprocal transformation to the x values (1/x), again plot the data, and comment
on the form of the relationship between y and 1/x.
c Fit a line to the transformed data and write down its equation. Use the equation to predict
the value of y when x = 5.
2 a Plot the data in the table, and comment on the form of the relationship between y and x.
x  1   2   3   4   5
y  61  31  21  16  13
b Apply a reciprocal transformation to the x values (1/x), again plot the data, and comment
on the form of the relationship between y and 1/x.
c Fit a line to the transformed data and write down its equation. Use the equation to predict
the value of y when x = 6.

Which transformation?
What sorts of non-linear relationships can we linearise using the x^2 transformation?
The x^2 transformation has the effect of stretching out the upper end of the x scale. As a guide,
relationships that have scatterplots which look like those shown below can often (but not
always) be linearised using the x to x^2 transformation. Note that for the x^2 transformation to
apply, the scatterplot should peak or bottom around x = 0.
[Figure: example scatterplots, y against x]

What sorts of non-linear relationships can we linearise using the log x transformation?
The log x transformation has the effect of compressing the upper end of the x scale. As a guide,
relationships that have scatterplots which look like those shown below can often (but not
always) be linearised using the x to log x transformation.
[Figure: example scatterplots, y against x]

What sorts of non-linear relationships can we linearise using the 1/x transformation?
As a guide, relationships that have scatterplots which look like those shown below can often
(but not always) be linearised using the x to 1/x transformation.
[Figure: example scatterplots, y against x]
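The guides above are visual. As a rough numerical cross-check (an addition for illustration, not a method given in this section), one could compare how close to linear each transformed plot is by computing the squared correlation coefficient r^2 between y and each transformed version of x. The sketch below is in Python; the names pearson_r and compare_transformations are hypothetical, all x values are assumed to be greater than 0, and the data are those of the log transformation example.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def compare_transformations(x, y):
    """Return r^2 between y and each candidate transformation of x (x must be > 0)."""
    candidates = {
        "x": list(x),
        "x^2": [v ** 2 for v in x],
        "log x": [math.log10(v) for v in x],
        "1/x": [1 / v for v in x],
    }
    return {name: pearson_r(vals, y) ** 2 for name, vals in candidates.items()}

x = [1, 10, 100, 400, 600, 1000]
y = [0, 10, 20, 25, 28, 30]

scores = compare_transformations(x, y)
best = max(scores, key=scores.get)
for name, r_squared in scores.items():
    print(f"{name:>6}: r^2 = {r_squared:.4f}")
print("Closest to linear:", best)
```

A high r^2 alone does not prove that a transformation is appropriate, so the transformed scatterplot should still be inspected, as the chapter does throughout.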

Exercise 6E
1 The table below shows the diameter (in m) of a number of umbrellas, along with the number
of people each umbrella is designed to keep dry.
Diameter          0.50  0.70  0.85  1.00  1.10
Number of people  1     2     3     4     5
a Construct a scatterplot showing the relationship between number of people and umbrella
diameter, and comment on the form.
b Apply a squared transformation to the x values (x^2), again plot the data, and comment on
the form of the relationship between y and x^2.
c Fit a line to the transformed data and write down its equation. Give coefficients correct to
one decimal place. Use the equation to predict the number of people that can be kept dry
with an umbrella of diameter 1.30 m. Give your answer correct to the nearest whole number.
2 The table shows the horsepower and fuel consumption in kilometres/litre of several cars.
Fuel consumption  5.2  7.3  12.6  7.1  6.3  10.1  10.5  14.6  10.9  7.7
Horsepower        155  125  75    110  138  88    80    70    100   103
a Construct a scatterplot showing the relationship between horsepower and fuel
consumption, and comment on the form. The IV is fuel consumption.
b Apply a reciprocal transformation to the x values (1/x), again plot the data and comment
on the form of the relationship between y and 1/x.
c Fit a line to the transformed data and write down its equation. Use the equation to predict
the horsepower of a car with fuel consumption of 9 kilometres per litre.

6.3 Transforming the y-axis


Another way to linearise the relationship between x and y is to apply these transformations to
the y-axis. Transforming the y-axis will have the effect of moving the y values on the plot
vertically, and leave the x values unaltered. The square, log and reciprocal transformations can
be applied to the y-axis with the following effects:



Transformation   Outcome
y^2              Spreads out the large y values relative to the smaller data values.
log y            Compresses large y values relative to the smaller data values.
1/y              Also compresses large y values relative to the smaller data values, to a greater
                 extent than log y. Note that values of y less than 1 become greater than 1, and
                 values of y greater than 1 become less than 1, so that the order of the data
                 values is reversed.

Example 2

Linearising the relationship with a y-squared transformation

a Plot the data in this table, and comment on the form of the relationship between y and x.
x  0  1    2    3    4    5
y  0  3.2  4.5  5.5  6.3  7.1
b Apply a squared transformation to the y values (y^2), again plot the data, and comment on
the form of the relationship between y^2 and x.
c Fit a line to the transformed data with y^2 as the DV and x as the IV. Write its equation and
determine the value of y when x = 1.6.
Solution
a 1 Plot the values of y against x.
2 Decide if the form of the
relationship is linear or non-linear.

The relationship between y and x is non-linear.

[Scatterplot: y against x]


b 1 Construct a new table of values.
x    0   1      2      3      4      5
y²   0   10.2   20.3   30.3   39.7   50.4

2 Plot the values of y² against x.
3 Decide if the form of the relationship is linear or non-linear.
The relationship between y² and x is linear, so that we can fit a line to the data.

[Scatterplot: y² against x, with the fitted line y² = 10x]
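The transformation and line fit in this example can also be checked numerically. The following is a minimal sketch, assuming a least squares fit (the example itself reads the slope and intercept from the plot); the helper `least_squares` is a hypothetical name introduced here for illustration.

```python
def least_squares(xs, ys):
    """Return (intercept, slope) of the least squares line ys = intercept + slope * xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Data from Example 2.
x = [0, 1, 2, 3, 4, 5]
y = [0, 3.2, 4.5, 5.5, 6.3, 7.1]

# Apply the squared transformation to the y values, then fit y^2 against x.
y_squared = [value ** 2 for value in y]
intercept, slope = least_squares(x, y_squared)
print(f"y^2 = {intercept:.2f} + {slope:.2f}x")
```

The fitted slope should come out close to 10 and the intercept close to zero, in line with the values read from the plot.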

c The slope of the line is approximately 10 and the intercept is approximately zero. Thus, remembering that the DV is y² and the IV is x, the equation of the line is y² = 10x.
The equation is y² = 10x.
When x = 1.6, y² = 10 × 1.6 = 16 or y = 4.

Example 3

Linearising the relationship with the log transformation

a Plot the data in this table, and comment on the form of the relationship between y and x.
x   0     1    2    3   4   5
y   100   37   14   5   2   1
b Apply a log transformation to the y values (log y), again plot the data, and comment on the form of the relationship between log y and x.
c Fit a line to the transformed data with log y as the DV and x as the IV. Write its equation and find the value of log y when x = 1.5.
Solution
a 1 Plot the values of y against x.
2 Decide if the form of the relationship is linear or non-linear.
The relationship between y and x is non-linear.
[Scatterplot: y against x]

b 1 Construct a new table of values.
x       0      1      2      3      4      5
log y   2.00   1.57   1.15   0.70   0.30   0.00
2 Plot the values of log y against x.
3 Decide if the form of the relationship is linear or non-linear.
The relationship between log y and x is linear, so that we can fit a line to the data.
[Scatterplot: log y against x, with the fitted line log y = 2 − 0.4x]


c The slope of the line is approximately −0.4 and the intercept is approximately 2. Thus, remembering that the DV is log y and the IV is x, the equation of the line is log y = 2 − 0.4x.
The equation is log y = 2 − 0.4x.
When x = 1.5, log y = 2 − 0.4 × 1.5 = 1.4 or y = 10^1.4 ≈ 25.
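The step from a predicted value of log y back to y can be written out explicitly. The working below is a sketch that assumes log means the base 10 logarithm, which is consistent with the use of a power of 10 in the example.

```latex
\begin{align*}
\log y &= 2 - 0.4x \\
y &= 10^{\,2 - 0.4x} = 100 \times 10^{-0.4x} \\
\text{When } x = 1.5:\quad y &= 10^{\,2 - 0.4 \times 1.5} = 10^{1.4} \approx 25.1
\end{align*}
```

Written this way, a straight line for log y against x corresponds to a curve for y that starts at 100 when x = 0 and is multiplied by the same factor for every unit increase in x.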
Example 4

Linearising the relationship with the 1/y transformation

a Plot the data in this table, and comment on the form of the relationship between y and x.
x   1      2     3     4     5
y   10.0   5.0   3.3   2.5   2.0
b Apply a reciprocal transformation to the y values (1/y), again plot the data, and comment on the form of the relationship between x and 1/y.
c Fit a line to the transformed data with 1/y as the DV and x as the IV. Write its equation and find the value of y when x = 8.
Solution
a 1 Plot the values of y against x.
2 Decide if the form of the relationship is linear or non-linear.
The relationship between y and x is non-linear.
[Scatterplot: y against x]
b 1 Construct a new table of values.
x     1     2     3     4     5
1/y   0.1   0.2   0.3   0.4   0.5
2 Plot the values of 1/y against x.
3 Decide if the form of the relationship is linear or non-linear.
The relationship between 1/y and x is linear, so that we can fit a line to the data.
[Scatterplot: 1/y against x, with the fitted line 1/y = 0.1x]
c Thus, remembering that the DV is 1/y and the IV is x, the equation of the line is 1/y = 0.1x.
The equation is 1/y = 0.1x.
When x = 8, 1/y = 0.1 × 8 = 0.8 or y = 1.25.
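The reciprocal example, including the final step of turning a predicted 1/y back into y, can be sketched in code. This is a minimal illustration that assumes a least squares fit rather than a line drawn by eye; `least_squares` is a hypothetical helper defined here, not a library function.

```python
def least_squares(xs, ys):
    """Return (intercept, slope) of the least squares line ys = intercept + slope * xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

# Data from Example 4.
x = [1, 2, 3, 4, 5]
y = [10.0, 5.0, 3.3, 2.5, 2.0]

# Apply the reciprocal transformation to y, then fit 1/y against x.
reciprocal_y = [1 / value for value in y]
intercept, slope = least_squares(x, reciprocal_y)

# The line predicts 1/y, so invert the prediction to recover y at x = 8.
predicted_reciprocal = intercept + slope * 8
predicted_y = 1 / predicted_reciprocal
print(f"1/y = {intercept:.3f} + {slope:.3f}x, so y is about {predicted_y:.2f} when x = 8")
```

The important design point is the last step: the model is fitted on the transformed scale, so any prediction must be transformed back before it is reported as a value of y.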


Which transformation?
What sorts of non-linear relationships can we linearise using the y² transformation?
The y² transformation has the effect of stretching out the upper end of the y scale. As a guide, relationships that have scatterplots which look like those shown below can often (but not always) be linearised using the y to y² transformation. Note that for the y² transformation to apply, the scatterplot should peak or bottom around y = 0.
[Scatterplots not reproduced]

What sorts of non-linear relationships can we linearise using the log y transformation?
The log y transformation has the effect of compressing the upper end of the y scale. As a guide, relationships that have scatterplots which look like those shown below can often (but not always) be linearised using the y to log y transformation.
[Scatterplots not reproduced]

What sorts of non-linear relationships can we linearise using the 1/y transformation?
As a guide, relationships that have scatterplots which look like those shown below can often (but not always) be linearised using the y to 1/y transformation.
[Scatterplots not reproduced]

Exercise 6F
The y² transformation
1 a Plot the data in the table. Comment on the form of the relationship between y and x.
x   0     2     4     6     8     10
y   1.2   2.8   3.7   4.5   5.1   5.7
b Apply a squared transformation to the y values (y²). Plot the data, and comment on the form of the relationship between y² and x.


c Fit a line to the transformed data with y² as the DV and x as the IV. Write its equation. Find the value of y when x = 9.
2 a Plot the data in the table. Comment on the relationship between y and x.
x   5      10     15     20     25    30
y   13.2   12.2   11.2   10.0   8.7   7.1
b Apply a squared transformation to the y values (y²). Plot the data, and comment on the form of the relationship between y² and x.
c Fit a line to the transformed data with y² as the DV and x as the IV. Write its equation. Find the value of y when x = 7.
The log y transformation
3 a Plot the data in the table. Comment on the form of the relationship between y and x.
x   0.1    0.2    0.3    0.4    0.5
y   15.8   25.1   39.8   63.1   100.0
b Apply a log transformation to the y values (log y). Plot the data and comment on the form of the relationship between log y and x.
c Fit a line to the transformed data with log y as the DV and x as the IV. Write its equation. Find the value of log y when x = 0.6.
4 a Plot the data in the table. Comment on the form of the relationship between y and x.
x   2      4      6      8      10
y   7.94   6.31   5.01   3.98   3.16
b Apply a log transformation to the y values (log y). Plot the data and comment on the form of the relationship between log y and x.
c Fit a line to the transformed data with log y as the DV and x as the IV. Write its equation. Find the value of log y when x = 5.
5 a Plot the data in the table. Comment on the form of the relationship between y and x.
x   1   3    5     7     9
y   7   32   147   681   3162
b Apply a log transformation to the y values (log y). Plot the data, and comment on the form of the relationship between log y and x.
c Fit a line to the transformed data with log y as the DV and x as the IV. Write its equation. Find the value of log y when x = 2.
The 1/y transformation
6 a Plot the data in the table. Comment on the form of the relationship between y and x.
x   1   2     3      4      5
y   1   0.5   0.33   0.25   0.20
b Apply a reciprocal transformation to the y values (1/y). Plot the data and comment on the relationship between 1/y and x.
c Fit a line to the transformed data with 1/y as the DV and x as the IV. Write its equation. Find the value of y when x = 2.5.
7 a Plot the data in the table. Comment on the form of the relationship between y and x.
x   11     14     26     35     41
y   0.43   0.34   0.19   0.14   0.12
b Apply a reciprocal transformation to the y values (1/y). Plot the data and comment on the form of the relationship between 1/y and x.
c Fit a line to the transformed data with 1/y as the DV and x as the IV. Write its equation. Find the value of y when x = 20.
d Name a y-axis transformation that should also work for the data. Try it and see.
e Name a y-axis transformation that should not work for the data. Try it and see.
Practical examples
8 The time (in minutes) taken for a local anaesthetic to take effect is related to the dose given.
To investigate this relationship a researcher collected the data shown.
Dose   0.5    0.6    0.7    0.8    0.9    1.0    1.1    1.2    1.3    1.4    1.5
Time   3.67   3.55   3.42   3.29   3.15   3.00   2.85   2.68   2.51   2.32   2.12
a Construct a scatterplot showing the relationship between the dose of anaesthetic and time taken for it to take effect, and comment on the form.
b Apply a squared transformation to the time values (y), again plot the data, and comment on the form of the relationship between time squared (y²) and dose (x).
c Fit a line to the transformed data with (time)² as the DV and dose as the IV. Write its equation. Predict the time for the anaesthetic to take effect when the dose is 0.4 units.
9 The table below shows the number of internet users signing up with a new internet service provider for each of the first nine months of their first year of operation.
Number   24   32   35   44   60   61   78   92   118
Month    1    2    3    4    5    6    7    8    9
a Construct a scatterplot showing the relationship between number of users signing up and month, and comment on the form. Month is the independent variable.
b Apply a log transformation to the number of users (y), again plot the data, and comment on the form of the relationship between log (number) and month (x).
c Fit a line to the transformed data with log number as the DV and month as the IV. Write its equation. Predict the value of log number for month 10.
10 A group of ten students was given an opportunity to practise a complex matching task as
often as they liked before they were assessed on the task. The number of times they
practised the task and the number of errors made when assessed are given in the table.
Number of practices   1    2   2    4   5   6   7   7   9   11
Number of errors      14   9   11   5   4   4   3   3   2   2
a Construct a scatterplot showing the relationship between number of practices and number of errors (y), and comment on the form.
b Apply a reciprocal (1/y) transformation to the number of errors values. Plot the data, and comment on the form of the relationship between 1/(number of errors) (1/y) and number of practices (x).
c Fit a line to the transformed data with 1/(number of errors) as the DV and number of practices as the IV. Write its equation. Predict the number of errors when the task is practised 12 times.

6.4 Choosing and applying the appropriate transformation
Putting together the information in Sections 6.2 and 6.3, we can see that there may be more than one transformation which linearises the scatterplot. The forms of the scatterplots that can be transformed by the squared, log or reciprocal transformations can be largely classified into one of four categories, shown as the circle of transformations.
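The circle of transformations
[Figure: the circle of transformations, divided into four quadrants, each listing the possible transformations for scatterplots of that shape]
Top-left quadrant       Possible transformations: y², log x, 1/x
Top-right quadrant      Possible transformations: y², x²
Bottom-left quadrant    Possible transformations: log y, 1/y, log x, 1/x
Bottom-right quadrant   Possible transformations: log y, 1/y, x²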

Note that the transformations we have introduced in this chapter are able to linearise only those relationships that are consistently increasing or decreasing.
The advantage of having alternatives is that, in practice, we can always try each of them to see which gives us the best result. How do we decide which transformation is the best? The best transformation is the one that results in the best linear model. To choose the best linear model we will consider, for each transformation applied:
the residual plot, in order to evaluate the linearity of the transformed relationship
the value of the coefficient of determination (r²): a higher value indicates a better fit.
This procedure is illustrated in Example 5.
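For reference, the two quantities used to compare models can be written as follows. These are the standard definitions for a least squares line rather than formulas quoted from this section; the symbols \hat{y}_i (the value predicted by the fitted line) and \bar{y} (the mean of the observed values) are notation introduced here, and y stands for whichever variable is the DV of the model being assessed, transformed or not.

```latex
\begin{align*}
\text{residual}_i &= y_i - \hat{y}_i \\
r^2 &= 1 - \frac{\sum_i \left(y_i - \hat{y}_i\right)^2}{\sum_i \left(y_i - \bar{y}\right)^2}
\end{align*}
```

A residual plot with no clear pattern suggests that the transformed relationship is linear, and a value of r² closer to 1 means that the residuals are small relative to the overall variation in the DV.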


Example 5
The data in this table give life expectancy in years and gross national product, GNP, in dollars for 24 countries in 1982. Using an appropriate transformation, find a regression model for the relationship between life expectancy in years and GNP.
Country          GNP      Life expectancy
Nicaragua        950      58
Paraguay         1670     65
Venezuela        4250     68
France           11 520   74
West Germany     12 280   73
Greece           4170     73
Norway           14 300   75
Czechoslovakia   5540     71
Austria          9830     72
Jordan           1680     61
Sri Lanka        320      67
Brunei           22 260   66
Indonesia        550      50
North Korea      930      66
Mongolia         940      64
Taiwan           2670     72
Australia        11 220   74
Congo            1420     48
Ethiopia         150      41
Guinea           330      44
Mauritania       520      44
Nigeria          940      49
Togo             350      48
Zaire            180      48
Source: Modern Data Analysis: A First Course in Applied Statistics, L.C. Hamilton 1990, p. 537

West Germany is now part of Germany; Czechoslovakia is now the Czech Republic and Slovakia.

Solution
1 Decide which of the variables is the independent variable, and which is the dependent variable.
The independent variable is GNP. The dependent variable is Life expectancy.
2 Plot the values of y against x, decide if the form of the relationship is linear or non-linear, and find the value of the coefficient of determination (r²).
[Scatterplot: Life expectancy against GNP]
3 Write down your conclusion.
The relationship between y and x is non-linear: r² = 36.7%.
4 Compare the shape of this plot to those in the circle of transformations (page 190). The scatterplot is similar to the plot in the top left-hand corner. Thus, y², log x and 1/x are the transformations to investigate.
Suitable transformations are y², log x and 1/x.


Try the y² transformation
5 Calculate the values of (Life expectancy)² and plot these against GNP. Comment on the linearity of the plot.
[Scatterplot: (Life expectancy)² against GNP]
6 Fit a regression line, and find the value of the coefficient of determination (r²). Produce a residual plot, and use this to comment on the form of the relationship.
r² = 38.4%. The relationship between (Life expectancy)² and x is still non-linear. This is confirmed by the residual plot.
[Residual plot: Residual against GNP]
7 Comment on the effect of the transformation.
The y² transformation has not really helped.

Try the log x transformation
8 Calculate the values of log GNP and plot Life expectancy against these.
[Scatterplot: Life expectancy against log GNP]
9 Fit a regression line, and find the value of r². Produce a residual plot, and use this to comment on the form of the relationship.
r² = 66.0%. The relationship between Life expectancy and log GNP is closer to linear. This is confirmed by the residual plot.


[Residual plot: Residual against log GNP]
10 Comment on the effect of the transformation.
The log x transformation has linearised the relationship quite well.

Try the 1/x transformation
11 Calculate the values of 1/GNP and plot Life expectancy against 1/GNP.
[Scatterplot: Life expectancy against 1/GNP]
12 Fit a regression line, and find the value of r². Produce a residual plot and use this to comment on the form of the relationship.
r² = 51.5%. The relationship between Life expectancy and 1/GNP is reasonably linear. This is confirmed by the residual plot.
[Residual plot: Residual against 1/GNP]
13 Comment on the effect of the transformation.
The 1/x transformation has done a reasonable job in linearising the relationship.
14 Decide which transformation is the most appropriate for this relationship. Choose the transformation that gives the most linear relationship (from the residual plots) and the highest value of r².
The most appropriate transformation to use here is the log x transformation, as the residual plot shows that the relationship between log GNP and Life expectancy is linear, and this model has the highest coefficient of determination, r² = 66.0%.


15 As the relationship between Life expectancy and log (GNP) appears to be linear and there are no obvious outliers, we can use the least squares method to fit a line to the data. Using a calculator, find the equation of the least squares regression line and write it in terms of the transformed variables.
Using the log x transformation gives a regression model for the relationship:
Life expectancy = 14.3 + 14.5 log (GNP)
Note: The independent variable (IV) is now log GNP, and the dependent variable (DV) is Life expectancy.
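The comparison carried out in this example can also be sketched in code. The example itself uses a calculator; the version below is an illustration under stated assumptions: log is taken to base 10, and `least_squares` and `r_squared` are hypothetical helper names defined here, not library functions.

```python
import math

def least_squares(xs, ys):
    """Return (intercept, slope) of the least squares line ys = intercept + slope * xs."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def r_squared(xs, ys):
    """Coefficient of determination: the square of Pearson's correlation coefficient."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy ** 2 / (sxx * syy)

# Data from the table in Example 5, in the order listed there.
gnp = [950, 1670, 4250, 11520, 12280, 4170, 14300, 5540, 9830, 1680, 320, 22260,
       550, 930, 940, 2670, 11220, 1420, 150, 330, 520, 940, 350, 180]
life = [58, 65, 68, 74, 73, 73, 75, 71, 72, 61, 67, 66,
        50, 66, 64, 72, 74, 48, 41, 44, 44, 49, 48, 48]

log_gnp = [math.log10(g) for g in gnp]

# Each candidate model is a pair (IV values, DV values) after transformation.
models = {
    "y vs x": (gnp, life),
    "y^2 vs x": (gnp, [v ** 2 for v in life]),
    "y vs log x": (log_gnp, life),
    "y vs 1/x": ([1 / g for g in gnp], life),
}
for name, (xs, ys) in models.items():
    print(f"{name:<12} r^2 = {r_squared(xs, ys):.1%}")

intercept, slope = least_squares(log_gnp, life)
print(f"Life expectancy = {intercept:.1f} + {slope:.1f} log(GNP)")
```

The sketch compares only r²; the residual plots still need to be inspected, since a high r² on its own does not show that the transformed relationship is linear.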

Some comments
It might seem unnatural to talk about the wealth of a country in terms of log (GNP), yet when we are comparing the relative wealth of countries, log (GNP) is probably a more useful measure than GNP. For instance, knowing that the difference in GNP between Australia and Sri Lanka is $10 900 is less informative than knowing that the difference in log (GNP) is 1.5448, which tells us that Australia's GNP is 10^1.5448 or 35 times that of Sri Lanka. Natural units of measurement are more often those that are familiar rather than those that are most useful!
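The link between a difference in log (GNP) and a ratio of GNP values follows from the rule for subtracting logarithms. The working below uses the GNP values for Australia and Sri Lanka from the table in Example 5 and assumes base 10 logarithms.

```latex
\begin{align*}
\log(11\,220) - \log(320) &\approx 4.0500 - 2.5051 \approx 1.5448 \\
\log(11\,220) - \log(320) &= \log\!\left(\frac{11\,220}{320}\right) \\
\frac{11\,220}{320} &= 10^{1.5448} \approx 35
\end{align*}
```

In other words, equal differences on the log scale correspond to equal ratios on the original scale, which is why log (GNP) is convenient for comparing relative wealth.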

Exercise 6G
1 The following scatterplots show non-linear relationships. For each scatterplot, state which of the transformations x², log x, 1/x, y², log y or 1/y, if any, you would apply to linearise the relationship.
[Four scatterplots of y against x, with x from 1 to 10; plots not reproduced]


2 The data below give the yield in kilograms and length in metres of 12 commercial potato
plots.
Yield (kilograms) 346 1798 152 86 436 968 686 257 2435 287 1850 1320
Length (metres) 12.1 27.4 8.3 5.5 15.7 21.5 19.3 9.0 34.2 14.7 31.9 25.3
a Construct a scatterplot showing the relationship between yield and length of plot and
comment on the form.
b Using an appropriate transformation, find a regression model for the relationship between yield in kilograms and length of plot in metres.
3 A recent study in Canada showed that cigarette consumption (per day) is related to cost per
pack. Some data drawn from that study is shown below.
Cost ($)                4.00   4.50   4.80   5.50   6.00   6.50   7.50
Cigarette consumption   8.0    7.4    7.0    6.4    5.9    5.5    5.0
a Construct a scatterplot showing the relationship between the cost of cigarettes and cigarette consumption, and comment on the form.
b Using an appropriate transformation, find a regression model for the relationship between the cost of cigarettes and cigarette consumption.
4 The population of a large town increased over a 13-year period, as shown in the table.
Year   Population
1      58860
2      57770
3      58206
4      59513
5      59983
6      60123
7      59763
8      61726
9      60387
10     61646
11     62347
12     64185
13     67158
a Construct a scatterplot showing the annual population growth of the town, and comment on the form.
b Using an appropriate transformation, find a regression model for the annual population growth of the town.


5 The monthly average exchange rate (to the nearest cent) between the Australian dollar and the US dollar over a period of 18 months in the 1990s is given in the table below.
Month   Exchange rate (US $)   Month   Exchange rate (US $)
1       0.77                   10      0.75
2       0.77                   11      0.75
3       0.77                   12      0.72
4       0.76                   13      0.72
5       0.78                   14      0.69
6       0.78                   15      0.69
7       0.77                   16      0.68
7       0.76                   17      0.71
8       0.76                   18      0.70
9       0.76
a Construct a scatterplot showing the exchange rate over the 18-month period, and comment on the form.
b Using an appropriate transformation, find a regression model for the exchange rate over that 18-month period.
6 The table below shows the percentage of people who can read (literacy rate) and the gross domestic product (GDP) for a selection of 14 countries.
Country        Literacy rate (%)   Gross domestic product/capita
Botswana       72                  2677
Cambodia       35                  260
Canada         97                  19904
Ethiopia       24                  122
France         99                  18944
Georgia        99                  4500
Germany        99                  17539
Honduras       73                  1030
Japan          99                  19860
Liberia        40                  409
Pakistan       35                  406
Saudi Arabia   62                  6651
Switzerland    99                  22384
Syria          64                  2436
a Construct a scatterplot showing the relationship between literacy rate and GDP, and comment on the form.
b Using an appropriate transformation, find a regression model for the relationship between literacy rate and GDP for this group of countries.


Key ideas and chapter summary

Data transformation
This means changing the scale on either the x- or y-axis. It is performed when a residual plot shows that the underlying relationship in a set of bivariate data is clearly non-linear.

x² or y² transformation
The squared transformation stretches out the upper end of the scale on an axis.

log x or log y transformation
The log transformation compresses the upper end of the scale on an axis.

1/x or 1/y transformation
The reciprocal transformation compresses the upper end of the scale on an axis to a greater extent than the log transformation.

Residual plots
Residual plots are used to assess the effectiveness of each data transformation.

Coefficient of determination (r²)
The transformation that results in a linear relationship and that has the highest value of the coefficient of determination is considered to be the best transformation.

The circle of transformations
The circle of transformations provides guidance in choosing the transformations that can be used to linearise various types of scatterplots. See page 190.

Skills check
Having completed this chapter you should be able to:
recognise which of the x², log x, 1/x, y², log y or 1/y transformations might be used to linearise a bivariate relationship
apply each of these transformations to a data set
use residual plots and the coefficient of determination, r², to decide which transformation gives the best model for the relationship
use the transformed variable as part of a regression analysis to give a model for the relationship.

Multiple-choice questions
1 The missing data values, a and b, in the table are:
value        1   2   3       4
(value)²     a   4   9       16
log(value)   0   b   0.477   0.602
A a = 0, b = 0.5
B a = 1, b = 0.5
C a = 1, b = 0.301
D a = 1, b = 0.602
E a = 1, b = 0.693


2 Select the statement that correctly completes the sentence:


The effect of a log transformation is to . . .
A stretch the high values in the data
B maintain the distance between values
C stretch the low values in the data
D compress the high values in the data
E reverse the order of the values in the data
3 The scatterplot opposite shows the relationship between the number of weeks each person has been on a diet program and their weight loss in kilograms. A least squares regression line has been fitted to the data.
[Scatterplot: Weight loss against Number of weeks on a diet, with the least squares regression line]
The residual plot for this least squares line would look like:
[Options A to E: five candidate plots; not reproduced]

4 The relationship between two variables y and x, as shown in the scatterplot, is non-linear. In an attempt to transform the relationship to linearity, a student would be advised to:
[Scatterplot: y against x]
A leave out the first four points
B use a y² transformation
C use a log y transformation
D use a 1/y transformation
E use a least squares regression line


5 The relationship between two variables y and x, as shown in the scatterplot, is non-linear.
[Scatterplot: y against x]
Which of the following sets of transformations could possibly linearise this relationship?
A log y, 1/y, log x, 1/x
B y², x²
C y², log x, 1/x
D log y, 1/y, x²
E ax + b

6 The relationship between two variables y and x, as shown in the scatterplot, is non-linear.
[Scatterplot: y against x]
Which of the following transformations is most likely to linearise the relationship?
A a 1/x transformation
B a y² transformation
C a log y transformation
D a 1/y transformation
E a log x transformation

7 The following data was collected for two related variables x and y.
x   1   2     3     4     5     6     7      8      9      10     11
y   7   8.6   8.9   8.8   9.9   9.7   10.4   10.5   10.7   11.2   11.1
[Scatterplot: y against x]
The scatterplot indicates a non-linear relationship. The data is linearised using a log x transformation. A least squares line is then fitted to the transformed data. The equation of this line is closest to:
A y = 7.52 + 0.37 log x
B y = 0.37 + 7.52 log x
C y = 1.71 + 0.25 log x
D y = 3.86 + 7.04 log x
E y = 7.04 + 3.86 log x

8 Brian has determined from a scatterplot of his data that the appropriate
transformations for his data are log x, 1/x and y 2 . After applying each of these
transformations to the data, he obtains the results shown below.
Model        Residuals   r²
y vs x       Curved      79.6%
y vs log x   Random      80.8%
y vs 1/x     Random      81.9%
y² vs x      Random      88.4%


Based on the information in the table, which transformation would you suggest Brian use?
A an x² transformation
B a y² transformation
C a log x transformation
D a 1/x transformation
E no transformation
9 When investigating the relationship between the weight of the strawberries picked from a strawberry patch, and the width of the patch, Suzie decides that an x² transformation is appropriate. After transforming the data, she fits a least squares regression line to the data and determines that the intercept is 10 and the slope is 5. Based on this information, the model that Suzie has fitted to the data can be written as:
A (weight)² = 10 + 5 × width
B weight = 5 + 10 × (width)²
C weight = 10 + 5 × (width)²
D (weight)² = 10 + 5 × (width)²
E (weight)² = 5 + 10 × width
10 Suppose that the relationship between the hours spent studying for an exam and the mark achieved can be modelled by the equation:
Mark = 20 + 40 × log (Hours)
From this model, we would predict that a student who studies for 20 hours would
score a mark (to the nearest whole number) of:
A 80
B 78
C 180
D 72
E 140

Extended-response questions
1 Measurements of distance travelled in metres and time taken in seconds were made
on a falling body. The data are given in the table below.
Time       0   1     2      3      4      5       6
Distance   0   5.2   18.0   42.0   79.0   128.0   168.0
Time²
a Construct a scatterplot of the data and comment on its form.
b Determine the values of (Time)² and complete the table.
c Construct a scatterplot of Distance against (Time)².
d Obtain a residual plot for the new model and comment on the linearity.
e Determine the value of r² for the new model.
f Write down the regression equation for the new model in terms of the variables in the question.
g Use the regression equation to predict the distance travelled in seven seconds.

2 The data in the table below show the marks obtained by students on a test and the
amount of time they reported studying for the test:
Mark           62    74     79    80    56    86    92    87     64    88    48    32
Time (hours)   1.5   2.25   3.0   2.5   0.8   3.5   6.0   2.75   1.0   4.5   0.5   0.1


a We want to predict a student's mark from the time they reported studying for the
test. In this situation, which is the dependent variable and which is the
independent variable?
b Construct a scatterplot and comment on the relationship between test mark and
time spent studying in terms of direction, outliers, form and strength.
c i Fit a linear model to the data and record its equation. Interpret the slope in
terms of the problem at hand.
ii Calculate the coefficient of determination and interpret.
iii Construct a residual plot and use it to comment on the suitability of modelling
the relationship between Mark and Time spent studying with a straight line.
d Apply a log transformation to Time. Then:
i construct a scatterplot for the transformed data
ii find the equation of the least squares regression line for the transformed data
iii use the equation to predict the mark obtained after 5 hours of study
iv calculate the coefficient of determination and interpret
v construct a residual plot and use it to comment on the linearity of the
transformed model
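
As an illustration of the steps in part d, the following is a minimal Python sketch rather than a prescribed method. It assumes a base-10 logarithm, and the helper name least_squares is our own. It fits Mark against Time and against log(Time) using the data in the table above, then predicts the mark for 5 hours of study.

```python
import math

def least_squares(x, y):
    """Return (intercept, slope, r_squared) for the least squares line y = a + b*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)
    return intercept, slope, r_squared

# Data from the table in Question 2
mark = [62, 74, 79, 80, 56, 86, 92, 87, 64, 88, 48, 32]
time = [1.5, 2.25, 3.0, 2.5, 0.8, 3.5, 6.0, 2.75, 1.0, 4.5, 0.5, 0.1]

# Log transformation of the x variable (base 10 is an assumption here)
log_time = [math.log10(t) for t in time]

a_lin, b_lin, r2_lin = least_squares(time, mark)
a_log, b_log, r2_log = least_squares(log_time, mark)

# To predict, substitute log(5) into the transformed model, not 5 itself
predicted_mark = a_log + b_log * math.log10(5)

print(f"Mark = {a_lin:.1f} + {b_lin:.1f} * Time,       r^2 = {r2_lin:.3f}")
print(f"Mark = {a_log:.1f} + {b_log:.1f} * log(Time),  r^2 = {r2_log:.3f}")
print(f"Predicted mark after 5 hours of study: {predicted_mark:.0f}")
```

Comparing the two values of r² printed by the sketch gives one numerical indication of whether the transformation has improved the fit. The residual plot is still needed to judge linearity.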

3 The following are the testosterone levels and the age at first conviction for violent
and aggressive crimes collected on a sample of young male prisoners. It is believed
that the higher the testosterone level in a male prisoner, the earlier they are likely to
be convicted of a violent and aggressive crime. A correlation and regression
analysis is also given.

Testosterone level   Age at first conviction
1305                 11
1000                 12
1175                 13
1495                 14
1060                 15
 800                 16
1005                 16
 710                 17
1150                 18
 605                 20
 690                 21
 700                 23
 625                 24
 610                 27
 450                 30

y = 31.9 − 0.015x
r² = 0.662

[Figure: scatterplot of Age (vertical axis, 10 to 30) against Testosterone level (horizontal axis, 400 to 1600), with the least squares regression line]

[Figure: residual plot of Residual (vertical axis, −5 to 5) against Testosterone level (horizontal axis, 400 to 1600)]

a What is the value of Pearson's correlation coefficient, r?
b Write the equation of the least squares regression line in terms of Testosterone
level and Age.
c Interpret the value of r² in terms of Testosterone level and Age.
d Use the residual plot to comment on the linearity of the relationship.
e Construct a scatterplot of Age against log (Testosterone).
f Obtain a residual plot for the new model and comment on the linearity.
g Determine the value of r² for the new model.
h Write down the regression equation for the new model in terms of the variables
in the question.
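
The residuals needed for the residual plot in part f can be calculated as sketched below. This is a sketch only: it assumes a base-10 logarithm and that the Testosterone and Age values are paired in the order in which they are listed in the table. Each residual is the actual Age minus the Age predicted by the line fitted to log (Testosterone).

```python
import math

# Data from the table in Question 3 (pairing by table order is assumed)
testosterone = [1305, 1000, 1175, 1495, 1060, 800, 1005, 710,
                1150, 605, 690, 700, 625, 610, 450]
age = [11, 12, 13, 14, 15, 16, 16, 17, 18, 20, 21, 23, 24, 27, 30]

# Log transformation of the x variable (base 10 is an assumption)
log_t = [math.log10(t) for t in testosterone]

# Least squares line: Age = intercept + slope * log(Testosterone)
n = len(age)
mean_x = sum(log_t) / n
mean_y = sum(age) / n
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(log_t, age))
sxx = sum((x - mean_x) ** 2 for x in log_t)
slope = sxy / sxx
intercept = mean_y - slope * mean_x

# residual = actual value - predicted value
residuals = [y - (intercept + slope * x) for x, y in zip(log_t, age)]

print("log(Testosterone)  Residual")
for x, r in zip(log_t, residuals):
    print(f"{x:17.3f}  {r:+8.2f}")
```

Plotting the residuals against log (Testosterone) gives the residual plot. A random scatter about zero, with no clear pattern, supports the assumption of linearity.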
4 Are infant mortality rates in a country related to the number of doctors in a
country? The data below give infant mortality rates (deaths per 1000 births) and
doctor numbers (per 100 000 people) for 17 countries.
Infant mortality   No. of doctors   Infant mortality   No. of doctors
 12                192               15                270
 13                222               85                  9
 12                154               20                357
 14                294               21                250
 10                182               54                 79
 10                179               75                 59
  7                204              121                 27
 10                271               71                 52
111                 61
a Construct a scatterplot of Infant mortality against Number of doctors and
comment on the relationship between infant mortality rate and doctor numbers
in terms of direction, outliers, form and strength.
b Construct a scatterplot of Infant mortality against log (Number of doctors).
c Obtain a residual plot for the new model and comment on the linearity.
d Determine the value of r² for the new model.
e Write down the regression equation for the new model in terms of the variables
in the question.
f Use the regression equation to predict the infant mortality rate when there are
100 doctors (per 100 000).
5 Tree ages can be determined by cutting down a tree and counting the number of
rings on the stump of its trunk. This, however, is a destructive process and it would
be useful to have a method of working out the approximate age of a tree without
having to cut it down. Noting the obvious, that trees tend to get bigger as they get
older, we might be able to use some external measurement of size to help us
estimate the age of a tree.

The data below show the age (in years) and diameter at chest height (in cm) of a
sample of trees of the same species taken from a commercial plantation.
Age (years)   Diameter (cm)   Age (years)   Diameter (cm)
 4             2.0            16            11.4
 5             2.0            18            11.7
 8             2.5            22            14.7
 8             5.1            25            16.5
 8             7.5            29            15.2
10             5.1            30            15.2
10             8.9            34            17.8
12            12.4            38            17.8
13             9.0            40            19.1
14             6.4

a We wish to predict the age of a tree from its diameter at chest height. In this
situation, which is the dependent variable and which is the independent variable?
b Construct a scatterplot and comment on the relationship between age and
diameter in terms of direction, outliers, form and strength.
c i Fit a linear model to the data and record its equation. Interpret the slope in
terms of the problem at hand.
ii Calculate the coefficient of determination and interpret.
iii Form a residual plot and use it to comment on the suitability of modelling the
relationship between age and diameter with a straight line.
d Use the x² transformation to linearise the data. Then:
i construct a scatterplot of age against diameter squared
ii find the equation of the least squares regression line for the transformed data
iii calculate the coefficient of determination and interpret
iv form a residual plot and use it to comment on the suitability of modelling the
relationship between age and diameter squared with a straight line
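
The x² transformation in part d can be sketched in Python as follows. This is an illustrative sketch under our own naming (fit_line is a hypothetical helper), not the only way to carry out the analysis. Each diameter is squared before the least squares line is fitted, so the resulting equation must be written in terms of (Diameter)².

```python
def fit_line(x, y):
    """Return (intercept, slope, r_squared) for the least squares line y = a + b*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope, sxy ** 2 / (sxx * syy)

# Data from the table in Question 5
age = [4, 5, 8, 8, 8, 10, 10, 12, 13, 14,
       16, 18, 22, 25, 29, 30, 34, 38, 40]
diameter = [2.0, 2.0, 2.5, 5.1, 7.5, 5.1, 8.9, 12.4, 9.0, 6.4,
            11.4, 11.7, 14.7, 16.5, 15.2, 15.2, 17.8, 17.8, 19.1]

# Squared transformation of the x variable (diameter)
diameter_sq = [d ** 2 for d in diameter]

a_lin, b_lin, r2_lin = fit_line(diameter, age)
a_sq, b_sq, r2_sq = fit_line(diameter_sq, age)

print(f"Age = {a_lin:.2f} + {b_lin:.2f} * Diameter,      r^2 = {r2_lin:.3f}")
print(f"Age = {a_sq:.2f} + {b_sq:.3f} * (Diameter)^2,  r^2 = {r2_sq:.3f}")
```

To use the transformed model for prediction, square the diameter first and then substitute the squared value into the equation.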
6 The table below shows the performance level recorded by a number of people on
completion of a task, along with the time spent (in minutes) in practising the task.
Time spent on practice (minutes)   0.5  1.0  1.5  2.0  3.0  4.0  5.0  6.0  7.0  7.0
Level of performance               1.0  1.5  2.0  3.0  3.0  3.5  4.0  3.5  3.9  3.6

a Construct a scatterplot showing the relationship between time spent on practice
and level of performance, and comment on the form. The independent variable (IV) is time.
b Apply a log transformation to the x values (log x), again plot the data and
comment on the form of the relationship between y and log x.
c Fit a line to the transformed data and write down its equation. Use the equation
to predict the expected level of performance (correct to one decimal place) for a
person who spends 2.5 minutes practising the task.