
CORE

Data transformation

What do we mean by data transformation? What is the effect of applying a log, squared or reciprocal transformation to the x variable? What is the effect of applying a log, squared or reciprocal transformation to the y variable? How do I choose which data transformation to apply? How do I carry out a regression analysis with transformed data?

There are methods for fitting curves to non-linear relationships, using non-linear regression. However, this procedure is mathematically complicated and the results are difficult to interpret. The method of dealing with a non-linear relationship favoured in practice is to apply a mathematical function to one of the variables, so that the relationship between the variables becomes closer to a straight line. By appropriate choice of the function, the scale of measurement is stretched or compressed. There are many functions that can be used to transform the data, but here we will consider only three. These are:
the squared transformation
the logarithmic transformation
the reciprocal transformation

When first confronted with data transformation, many people tend to be suspicious. However, when we think about it from the point of view of analysing a set of data, there is nothing special about the units of measurement used when gathering the data. In general, units used are chosen because they are convenient for recording and reporting the data. 'Natural' units tend to be used, for example, seconds when recording time, or metres when recording length. But what is the natural unit for measuring fuel economy of a car: kilometres per litre (x) or litres per kilometre (1/x)? In measurement, 'natural' often tends to mean 'familiar'. For example, to a chemist, it is natural to measure acidity in terms of pH, which is based on the logarithm of the hydrogen ion concentration (log x), rather than the hydrogen ion concentration itself (x).


Generally, it is only luck if a data set reveals all its hidden information when analysed in the form in which it was initially gathered and/or reported. It is part of the analyst's role to search out different ways of looking at the data in order to enhance our understanding of that data. One of the most powerful tools available to help achieve this task is data transformation.

How do these transformations affect the values to which they are applied? Consider the following table of numbers:

value   (value)²   log(value)   1/value
0.2     0.04       -0.699       5
0.4     0.16       -0.398       2.5
0.6     0.36       -0.222       1.667
1       1           0           1
2       4           0.301       0.5
3       9           0.477       0.333
4       16          0.602       0.25

From the table we can see that the transformations have the following effects on the data values:

The squared transformation has the effect of decreasing values less than 1, and increasing values greater than 1. Large values are increased the most. For example, 2² = 4 and 20² = 400, so that while the values 2 and 20 are 18 units apart, the values 4 and 400 are 396 units apart. That is, the effect of the squared transformation is to stretch the values.

The log transformation reduces all values, and values between 0 and 1 become negative. Large values are reduced much more than small values. For example, log 2 = 0.301 and log 20 = 1.301, so that while the values 2 and 20 are 18 units apart, the values 0.301 and 1.301 are only 1 unit apart. That is, the effect of the log transformation is to compress the values. Note that the log function can be applied only to values which are greater than 0.

The reciprocal transformation again reduces all values greater than 1. Large values are reduced much more than small values. For example, 1/2 = 0.5 and 1/20 = 0.05, so that while the values 2 and 20 are 18 units apart, the values 0.5 and 0.05 are only 0.45 units apart. That is, the effect of the reciprocal transformation is to compress the large values to an even greater extent than the log transformation.

Thus it can be seen that all transformations have a greater effect on the larger values, but this effect varies for each transformation.
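The table of transformed values can also be reproduced in a few lines of code. The following is a minimal sketch in Python (the chapter itself works with a graphics calculator); the names `values`, `transform_row` and `table` are chosen here for illustration only, and log means log to base 10, as in the table.

```python
import math

# The values used in the table above.
values = [0.2, 0.4, 0.6, 1, 2, 3, 4]

def transform_row(v):
    """Return (value, value squared, log10 of value, reciprocal of value)."""
    return (v, v ** 2, math.log10(v), 1 / v)

table = [transform_row(v) for v in values]

print(f"{'value':>6} {'squared':>8} {'log':>8} {'1/value':>8}")
for v, sq, lg, rc in table:
    print(f"{v:>6} {sq:>8.2f} {lg:>8.3f} {rc:>8.3f}")
```

Running the sketch shows the negative logs for values below 1 and the reversal of order produced by the reciprocal.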

Exercise 6A

1 a Copy and complete the table.

value        1   2   3   4   5   6   7
(value)²
log(value)
1/value

b Use the information in the table to complete the following statements by deleting the incorrect term.
i The squared transformation stretches/compresses the scale.
ii The log transformation stretches/compresses the larger values.
iii The reciprocal transformation stretches/compresses the larger values.

2 a Copy and complete the table.

value        1   2   4   8   16   32   64
(value)²
log(value)
1/value

b Use the information in the table to complete the following statements by deleting the incorrect term.
i The squared transformation stretches/compresses the larger values.
ii The log transformation stretches/compresses the larger values.
iii The reciprocal transformation stretches/compresses the larger values.

3 a Copy and complete the table.

value        1   10   100   1000   10000   100000
(value)²
log(value)
1/value

b Use the information in the table to complete the following statements by deleting the incorrect term.
i The squared transformation stretches/compresses the larger values.
ii The log transformation stretches/compresses the larger values.
iii The reciprocal transformation stretches/compresses the larger values.

4 a Copy and complete the table.

value        20   10   5   2.5   1.25   0.625
(value)²
log(value)
1/value

b Use the information in the table to complete the following statements by deleting the incorrect term.
i The squared transformation stretches/compresses the larger values.
ii The log transformation stretches/compresses the larger values.
iii The reciprocal transformation stretches/compresses the larger values.

5 a Copy and complete the table.

value        2   20   200   2000   20000   200000
(value)²
log(value)
1/value

b Use the information in the table to complete the following statements by deleting the incorrect term.
i The squared transformation stretches/compresses the larger values.
ii The log transformation stretches/compresses the larger values.
iii The reciprocal transformation stretches/compresses the larger values.


6.2

We are interested in linearising the relationship between two variables, x and y, and the transformations discussed in the previous section can be applied to either x or y (but not both here). We will examine the effect of transforming the x-axis and the y-axis separately. Transforming the x-axis will have the effect of moving the x values on the plot horizontally, and leave the y values unaltered. The square, log and reciprocal transformations can be applied to the x-axis with the following effects:

Transformation   Outcome
x²               Spreads out the large x values relative to the smaller data values.
log x            Compresses the large x values relative to the smaller data values.
1/x              Also compresses large x values relative to the smaller data values, to a greater extent than log x. Note that values of x less than 1 become greater than 1, and values of x greater than 1 become less than 1, so that the order of the data values is reversed.

The following examples show the effect on the relationship between x and y when the squared, log and reciprocal transformations are applied to the x values.


Example 1

a Plot the data in the table, and comment on the form of the relationship between x and y.

x   0   1   2   3    4
y   2   3   6   11   18

b Apply a squared transformation to the x values (x²), again plot the data, and comment on the form of the relationship between y and x².

Solution
a 1 Plot the values of y against x.
2 Decide if the form of the relationship is linear or non-linear.

[Scatterplot of y against x]

The relationship between y and x is non-linear.

b 1 Construct a new table of values.

x²   0   1   4   9    16
y    2   3   6   11   18

2 Plot the values of y against x².
3 Decide if the form of the relationship is linear or non-linear.

[Scatterplot of y against x²]

The relationship between y and x² is linear.
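The visual judgement in Example 1 can be backed up numerically. The sketch below, written in Python rather than on a calculator, compares the coefficient of determination (r²) for y against x with that for y against x². The helper name `r_squared` is an assumption made for this illustration and is not part of the original text.

```python
def r_squared(xs, ys):
    """Coefficient of determination for a straight-line fit of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy ** 2 / (sxx * syy)

# Data from Example 1.
x = [0, 1, 2, 3, 4]
y = [2, 3, 6, 11, 18]

# Squared transformation applied to the x values.
xsq = [v ** 2 for v in x]

r2_raw = r_squared(x, y)
r2_sq = r_squared(xsq, y)
print(f"r^2 for y against x:   {r2_raw:.3f}")
print(f"r^2 for y against x^2: {r2_sq:.3f}")
```

A high r² alone does not prove linearity (the untransformed data already give a value above 0.9), which is why the plot should always be inspected as well.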

Data transformation is very conveniently carried out with the aid of a graphics calculator, and in practice, this is how you will do it in future. Note that, throughout this chapter, you will find it useful to enter the data into named lists because you will need to keep track of the various lists of transformed data as you work through the problems.


How to apply the squared transformation using the TI-Nspire CAS

Plot the data presented in the table below.

x   0   1   2   3    4
y   2   3   6   11   18

Apply a squared transformation to the x values (x²) and replot the data.

Steps
1 Start a new document.
2 Select Add Lists & Spreadsheet. Enter the data into lists named x and y.
3 Add a Data & Statistics page. Construct a scatterplot of y against x. Let x be the independent variable and y the dependent variable. The plot is clearly non-linear.
4 Return to the Lists & Spreadsheet application. To calculate the values of x² and store them in a list named xsq (short for x-squared), do the following:
a Move the cursor to the top of column C, type xsq and press enter.
b Move the cursor to the grey cell immediately below the xsq heading. We need to enter the expression = x². To do this, type =, press VAR, highlight the variable x and press enter to paste x into the formula line. Finally, type ^2 to complete the formula. Press enter to calculate and display the x-squared values.

Note: The dash in front of the x in the formula line is automatically added when a list name is pasted from the VAR menu.
Note: You can also type in the variable x and then select Variable Reference when prompted. This avoids using the VAR menu.

5 Construct a scatterplot of y against x². Return to the scatterplot created earlier and change the independent variable to xsq as follows:
a Display the list of variables near the x-axis, select the variable xsq and press enter to paste it to the x-axis.
b A scatterplot of y against xsq (x²) is then displayed. The plot is clearly linear.

Note: If you wish to keep the original plot of y against x you can create a new Data & Statistics page to plot the transformed data.

How to apply the squared transformation using the ClassPad

Plot the data presented in the table below.

x   0   1   2   3    4
y   2   3   6   11   18

Apply a squared transformation to the x values (x²) and replot the data.

Steps
1 Open the Statistics application and enter the data into the columns named x and y.
2 Construct a scatterplot of y against x. Let x be the independent variable and y the dependent variable. The plot is clearly non-linear.
3 To calculate the values of x² and store them in a list named xsq (i.e. x-squared):
a Tap to highlight the cell at the top of the next empty list. Rename by typing xsq and pressing enter.
b Tap to highlight the cell at the bottom of the newly named xsq column (in the row titled Cal). Type x^2 and press enter to calculate and list the x² values.
4 Construct a scatterplot of y against xsq (i.e. x²). The plot is clearly linear.

Example 2

Linearising the relationship with the log transformation

a Plot the data in the table, and comment on the form of the relationship between x and y.

x   1   10   100   400   600   1000
y   0   10   20    25    28    30

b Apply a log transformation to the x values (log x), again plot the data, and comment on the form of the relationship between y and log x.

Solution
a 1 Plot the values of y against x.
2 Decide if the form of the relationship is linear or non-linear.

[Scatterplot of y against x]

The relationship between y and x is non-linear.

b 1 Construct a new table of values.

log x   0   1    2    2.6   2.8   3
y       0   10   20   25    28    30

2 Plot the values of y against log x.
3 Decide if the form of the relationship is linear or non-linear.

[Scatterplot of y against log x]

The relationship between y and log x is linear.
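Once the x values have been replaced by log x, a straight line can be fitted to the transformed data in the usual way. The following Python sketch fits a least-squares line to y against log x for the data of Example 2. The helper `fit_line` is a hypothetical name introduced here, and the ordinary least-squares formulas are assumed as the fitting method.

```python
import math

def fit_line(xs, ys):
    """Least-squares line ys = intercept + slope * xs; returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

# Data from Example 2.
x = [1, 10, 100, 400, 600, 1000]
y = [0, 10, 20, 25, 28, 30]

# Log (base 10) transformation applied to the x values.
lx = [math.log10(v) for v in x]

intercept, slope = fit_line(lx, y)
print(f"y = {intercept:.2f} + {slope:.2f} * log x")
```

For these data the fitted line is close to y = 10 log x, which agrees with the transformed table.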

Once again, this transformation is very conveniently carried out with the aid of a graphics calculator.

How to apply the log transformation using the TI-Nspire CAS

Plot the data presented in the table below.

x   1   10   100   400   600   1000
y   0   10   20    25    28    30

Apply a log transformation to the x values (log x) and replot the data.

Steps
1 Start a new document.
2 Select Add Lists & Spreadsheet. Enter the data into lists named x and y.
3 Add a Data & Statistics page. Construct a scatterplot of y against x. Let x be the independent variable and y the dependent variable. The plot is clearly non-linear.
4 Return to the Lists & Spreadsheet application. To calculate the values of log x and store them in a list named lx (short for log x), complete the following:
a Move the cursor to the top of column C, type lx and press enter.
b Move the cursor to the grey cell immediately below the lx heading and type = log(. Then press VAR, highlight the variable x and press enter to paste x into the formula line, then type ) to complete the command. Press enter to calculate and display the log values.
5 Construct a scatterplot of y against log x. Return to the scatterplot created earlier and change the independent variable to lx. A scatterplot of y against lx (i.e. the log of x) is displayed. The plot is clearly linear.

Note: If your answers are not given as decimals, refer to the Appendix to change Mode settings to APPRX.


How to apply the log transformation using the ClassPad

Plot the data presented in the table below.

x   1   10   100   400   600   1000
y   0   10   20    25    28    30

Apply a log transformation to the x values (log x) and replot the data.

Steps
1 Open the Statistics application and enter the data into the columns named x and y.
2 Construct a scatterplot of y against x. Let x be the independent variable and y the dependent variable. The plot is clearly non-linear.
3 To calculate the values of log x and store them in a list named lx (short for log x):
a Tap to highlight the cell at the top of the next empty list (in this case, list3). Rename by typing lx and pressing enter.
b Tap to highlight the cell at the bottom of the newly named lx column (in the row titled Cal). Typing log(x) and pressing enter calculates and lists the values of log x.

Note: To ensure decimal values are displayed, Decimal should be visible in the status bar.

4 Construct a scatterplot of y against lx (i.e. log x). The plot is clearly linear.

Example 3

Linearising the relationship with the reciprocal transformation

a Plot the data in the table, and comment on the form of the relationship between x and y.

x   1    2    3    4     5
y   30   15   10   7.5   6

b Apply a reciprocal transformation to the x values (1/x), again plot the data, and comment on the form of the relationship between y and 1/x.

Solution
a 1 Plot the values of y against x.
2 Decide if the form of the relationship is linear or non-linear.

[Scatterplot of y against x]

The relationship between y and x is non-linear.

b 1 Construct a new table of values.

1/x   1.0   0.5   0.33   0.25   0.2
y     30    15    10     7.5    6

2 Plot the values of y against 1/x.
3 Decide if the form of the relationship is linear or non-linear.

[Scatterplot of y against 1/x]

The relationship between y and 1/x is linear.
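For the data of Example 3 there is a quick arithmetic check that the reciprocal transformation is the right one: if y divided by 1/x is the same for every point, then y against 1/x is a straight line through the origin. The Python sketch below illustrates this; the list names `recx` and `ratios` are chosen here for illustration.

```python
# Data from Example 3.
x = [1, 2, 3, 4, 5]
y = [30, 15, 10, 7.5, 6]

# Reciprocal transformation applied to the x values.
recx = [1 / v for v in x]

# If y is proportional to 1/x, every ratio y / (1/x) is the same number.
ratios = [yv / rv for yv, rv in zip(y, recx)]

for rv, yv, ratio in zip(recx, y, ratios):
    print(f"1/x = {rv:.2f}   y = {yv:>4}   y / (1/x) = {ratio:.1f}")
```

Here every ratio equals 30 (up to floating-point rounding), so the transformed points lie on the line y = 30 × (1/x).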

Once again, this transformation is very conveniently carried out with the aid of a graphics calculator.

How to apply the reciprocal transformation using the TI-Nspire CAS

Plot the data presented in the table below.

x   1    2    3    4     5
y   30   15   10   7.5   6

Apply a reciprocal transformation to the x values (1/x) and replot the data.

Steps
1 Start a new document.
2 Select Add Lists & Spreadsheet. Enter the data into lists named x and y.
3 Add a Data & Statistics page. Construct a scatterplot of y against x. Let x be the independent variable and y the dependent variable. The plot is clearly non-linear.
4 Return to the Lists & Spreadsheet application. To calculate the values of 1/x, complete the following:
a Move the cursor to the top of column C, type recx (short for the reciprocal of x) and press enter.
b Move the cursor to the grey cell immediately below the recx heading and type = 1/, then press VAR, highlight the variable x and press enter to paste x into the formula line. Press enter to calculate and display the 1/x values.
5 Construct a scatterplot of y against 1/x (i.e. recx). Return to the scatterplot created earlier and change the independent variable to recx. A scatterplot of y against recx (the reciprocal of x) is displayed. The plot is clearly linear.

Note: If your answers are not presented as decimals, refer to the Appendix to change Mode settings to APPRX.


How to apply the reciprocal transformation using the ClassPad

Plot the data presented in the table below.

x   1    2    3    4     5
y   30   15   10   7.5   6

Steps
1 Open the Statistics application and enter the data into the columns named x and y.
2 Construct a scatterplot of y against x. Let x be the independent variable and y the dependent variable. The plot is clearly non-linear.
3 To calculate the values of 1/x and store them in a list named recx (short for the reciprocal of x):
a Tap to highlight the cell at the top of the next empty list (in this case, list3). Rename by typing recx and pressing enter.
b Tap to highlight the cell at the bottom of the newly named recx column (in the row titled Cal). Typing 1/x and pressing enter calculates and lists the 1/x values.
4 Construct a scatterplot of y against recx (i.e. 1/x). The plot is clearly linear.


What sorts of non-linear relationships can we linearise using the x² transformation?
The x² transformation has the effect of stretching out the upper end of the x scale. As a guide, relationships that have scatterplots which look like those shown below can often (but not always) be linearised using the x to x² transformation. Note that for the x² transformation to apply, the scatterplot should peak or bottom around x = 0.

[Figure: example scatterplots of y against x]

What sorts of non-linear relationships can we linearise using the log x transformation?
The log x transformation has the effect of compressing the upper end of the x scale. As a guide, relationships that have scatterplots which look like those shown below can often (but not always) be linearised using the x to log x transformation.

[Figure: example scatterplots of y against x]

What sorts of non-linear relationships can we linearise using the 1/x transformation?
As a guide, relationships that have scatterplots which look like those shown below can often (but not always) be linearised using the x to 1/x transformation.

[Figure: example scatterplots of y against x]

Exercise 6B

These exercises are expected to be completed with the aid of a graphics calculator.

1 a Plot the data in the table, and comment on the form of the relationship between y and x.

x   0    1    2    3   4
y   16   15   12   7   0

b Apply a squared transformation to the x values (x²), again plot the data, and comment on the form of the relationship between y and x².

2 a Plot the data in the table, and comment on the form of the relationship between y and x.

x   1   2   3    4    5
y   3   9   19   33   51

b Apply a squared transformation to the x values (x²), again plot the data, and comment on the form of the relationship between y and x².

3 a Plot the data in the following table, and comment on the form of the relationship between y and x.

x   1    2    3    4    5
y   30   27   22   15   6

b Apply a squared transformation to the x values (x²), again plot the data, and comment on the form of the relationship between y and x².

4 a Plot the data in the following table, and comment on the form of the relationship between y and x.

x   1    10   100   400   600   1000
y   30   20   10    5     2     0

b Apply a log transformation to the x values (log x), again plot the data, and comment on the form of the relationship between y and log x.

5 a Plot the data in the table, and comment on the form of the relationship between y and x.

x   5     10    150   500   1000
y   3.1   4.0   7.5   9.1   10.0

b Apply a log transformation to the x values (log x), again plot the data, and comment on the form of the relationship between y and log x.

6 a Plot the data in the table, and comment on the form of the relationship between y and x.

x   10     44     132   436   981
y   15.0   11.8   9.4   6.8   5.0

b Apply a log transformation to the x values (log x), again plot the data, and comment on the form of the relationship between y and log x.

7 a Plot the data in the table, and comment on the form of the relationship between y and x.

x   2    4    6    8    10
y   60   30   20   15   12

b Apply a reciprocal transformation to the x values (1/x), again plot the data and comment on the form of the relationship between y and 1/x.

8 a Plot the data in the table, and comment on the form of the relationship between y and x.

x   1    2    3    4    5
y   61   31   21   16   13

b Apply a reciprocal transformation to the x values (1/x), again plot the data and comment on the form of the relationship between y and 1/x.

9 a Plot the data in the following table, and comment on the form of the relationship between y and x.

x   2    4    6    8     10
y   10   70   90   100   106

b Apply a reciprocal transformation to the x values (1/x), again plot the data and comment on the form of the relationship between y and 1/x.


c Name an x-axis transformation that should also work for the data. Try it and see.
d Name an x-axis transformation that should not work for the data. Try it and see.

10 The table below shows the diameter (in cm) of a number of umbrellas, along with the number of people each umbrella is designed to keep dry.

Diameter           50   70   85   100   110
Number of people   1    2    3    4     5

a Construct a scatterplot showing the relationship between number of people and umbrella diameter, and comment on the form.
b Apply a squared transformation to the x values (x²), again plot the data, and comment on the form of the relationship between y and x².

11 The table below shows the performance level on a task of a number of people, along with the time spent (in minutes) in practising the task.

Time spent on practice   0.5   1.0   1.5   2.0   3.0   4.0   5.0   6.0   7.0   7.0
Level of performance     1.0   1.5   2.0   3.0   3.0   3.5   4.0   3.5   3.9   3.6

a Construct a scatterplot showing the relationship between the time spent on practice and level of performance, and comment on the form.
b Apply a log transformation to the x values (log x), again plot the data, and comment on the form of the relationship between y and log x.

12 The table below shows the horsepower of several cars, along with their fuel consumption in kilometres/litre.

Fuel consumption   5.2   7.3   12.6   7.1   6.3   10.1   10.5   14.6   10.9   7.7
Horsepower         155   125   75     110   138   88     80     70     100    103

a Construct a scatterplot showing the relationship between horsepower and fuel consumption, and comment on the form.
b Apply a reciprocal transformation to the x values (1/x), again plot the data and comment on the form of the relationship between y and 1/x.

6.3

Another way to linearise the relationship between x and y is to apply these transformations to the y-axis. Transforming the y-axis will have the effect of moving the y values on the plot vertically, and leave the x values unaltered. The square, log and reciprocal transformations can be applied to the y-axis with the following effects:


Transformation   Outcome
y²               Spreads out the large y values relative to the smaller data values.
log y            Compresses the large y values relative to the smaller data values.
1/y              Also compresses large y values relative to the smaller data values, to a greater extent than log y. Note that values of y less than 1 become greater than 1, and values of y greater than 1 become less than 1, so that the order of the data values is reversed.

The following examples show the effect on the relationship between x and y when the squared, log and reciprocal transformations are applied to the y values. Once again, all these data transformations can be very conveniently carried out with the aid of a graphics calculator.

Example 4

Linearising the relationship with a squared transformation

a Plot the data in this table, and comment on the form of the relationship between y and x.

x   0   1     2     3     4     5
y   0   3.2   4.5   5.5   6.3   7.1

b Apply a squared transformation to the y values (y²), again plot the data, and comment on the form of the relationship between y² and x.

Solution
a 1 Plot the values of y against x.
2 Decide if the form of the relationship is linear or non-linear.

[Scatterplot of y against x]

b 1 Construct a new table of values.

x    0   1      2      3      4      5
y²   0   10.2   20.3   30.3   39.7   50.4

2 Plot the values of y² against x.
3 Decide if the form of the relationship is linear or non-linear.

[Scatterplot of y² against x]
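Because the x values in Example 4 are equally spaced, one simple way to see the effect of the y² transformation is to look at the step from each value to the next: on a straight line these steps are all about the same size. The Python sketch below makes that comparison; the helper name `steps` is an assumption made for this illustration.

```python
# Data from Example 4 (x increases in equal steps of 1).
x = [0, 1, 2, 3, 4, 5]
y = [0.0, 3.2, 4.5, 5.5, 6.3, 7.1]

# Squared transformation applied to the y values.
ysq = [v ** 2 for v in y]

def steps(values):
    """Differences between neighbouring values in a list."""
    return [b - a for a, b in zip(values, values[1:])]

y_steps = steps(y)
ysq_steps = steps(ysq)

print("y^2 values:  ", [round(v, 2) for v in ysq])
print("steps in y:  ", [round(d, 2) for d in y_steps])
print("steps in y^2:", [round(d, 2) for d in ysq_steps])
```

The steps in y shrink steadily as x increases, while the steps in y² all stay close to 10, which is what a linear relationship between y² and x looks like.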

Example 5

Linearising the relationship with a log transformation

a Plot the data in this table, and comment on the form of the relationship between y and x.

x   0     1    2    3   4   5
y   100   37   14   5   2   1

b Apply a log transformation to the y values (log y), again plot the data, and comment on the form of the relationship between log y and x.

Solution
a 1 Plot the values of y against x.
2 Decide if the form of the relationship is linear or non-linear.

[Scatterplot of y against x]

The relationship between y and x is non-linear.

b 1 Construct a new table of values.

x       0      1      2      3      4      5
log y   2.00   1.57   1.15   0.70   0.30   0.00

2 Plot the values of log y against x.
3 Decide if the form of the relationship is linear or non-linear.

[Scatterplot of log y against x]
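When the transformation is applied to the y values, the regression line is fitted with the transformed variable as the response. The Python sketch below fits a least-squares line to log y against x for these data; `fit_line` is a hypothetical helper written for this illustration, and ordinary least squares is assumed as the fitting method.

```python
import math

def fit_line(xs, ys):
    """Least-squares line ys = intercept + slope * xs; returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

# Data from the log y example above.
x = [0, 1, 2, 3, 4, 5]
y = [100, 37, 14, 5, 2, 1]

# Log (base 10) transformation applied to the y values.
log_y = [math.log10(v) for v in y]

intercept, slope = fit_line(x, log_y)
print("log y values:", [round(v, 2) for v in log_y])
print(f"log y = {intercept:.2f} + ({slope:.2f}) * x")
```

Note that the fitted equation predicts log y, not y, so the left-hand side of the regression equation must be written as log y.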


Example 6

Linearising the relationship with the reciprocal transformation

a Plot the data in this table, and comment on the form of the relationship between y and x.

x   1      2     3     4     5
y   10.0   5.0   3.3   2.5   2.0

b Apply a reciprocal transformation to the y values (1/y), again plot the data, and comment on the form of the relationship between 1/y and x.

Solution
a 1 Plot the values of y against x.
2 Decide if the form of the relationship is linear or non-linear.

[Scatterplot of y against x]

3 Write down your conclusion.

b 1 Construct a new table of values.

x     1     2     3     4     5
1/y   0.1   0.2   0.3   0.4   0.5

2 Plot the values of 1/y against x.
3 Decide if the form of the relationship is linear or non-linear.

[Scatterplot of 1/y against x]

The relationship between 1/y and x is linear.

What sorts of non-linear relationships can we linearise using the y² transformation?
The y² transformation has the effect of stretching out the upper end of the y scale. As a guide, relationships that have scatterplots which look like those shown below can often (but not always) be linearised using the y to y² transformation. Note that for the y² transformation to apply, the scatterplot should peak or bottom around y = 0.

[Figure: example scatterplots of y against x]

What sorts of non-linear relationships can we linearise using the log y transformation?
The log y transformation has the effect of compressing the upper end of the y scale. As a guide, relationships that have scatterplots which look like those shown below can often (but not always) be linearised using the y to log y transformation.

[Figure: example scatterplots of y against x]

What sorts of non-linear relationships can we linearise using the 1/y transformation?
As a guide, relationships that have scatterplots which look like those shown below can often (but not always) be linearised using the y to 1/y transformation.

[Figure: example scatterplots of y against x]

Exercise 6C

These exercises are expected to be completed with the aid of a graphics calculator.

1 a Plot the data in the table. Comment on the form of the relationship between y and x.

x   0     2     4     6     8     10
y   1.2   2.8   3.7   4.5   5.1   5.7

b Apply a squared transformation to the y values (y²). Plot the data, and comment on the form of the relationship between y² and x.

2 a Plot the data in the table. Comment on the form of the relationship between y and x.

x   5      10     15     20     25    30
y   13.2   12.2   11.2   10.0   8.7   7.1

b Apply a squared transformation to the y values (y²). Plot the data, and comment on the form of the relationship between y² and x.

3 a Plot the data in the table. Comment on the form of the relationship between y and x.

x   2     6     11    12    21    40
y   5.1   6.2   7.3   7.5   9.1   11.8

b Apply a squared transformation to the y values (y²). Plot the data and comment on the form of the relationship between y² and x.

4 a Plot the data in the table. Comment on the form of the relationship between y and x.

x   0.1    0.2    0.3    0.4    0.5
y   15.8   25.1   39.8   63.1   100.0

b Apply a log transformation to the y values (log y). Plot the data and comment on the form of the relationship between log y and x.

5 a Plot the data in the table. Comment on the form of the relationship between y and x.

x   2      4      6      8      10
y   7.94   6.31   5.01   3.98   3.16

b Apply a log transformation to the y values (log y). Plot the data and comment on the form of the relationship between log y and x.

6 a Plot the data in the table. Comment on the form of the relationship between y and x.

x   1   3    5     7     9
y   7   32   147   681   3162

b Apply a log transformation to the y values (log y). Plot the data, and comment on the form of the relationship between log y and x.

7 a Plot the data in the table. Comment on the form of the relationship between y and x.

x   1   2     3      4      5
y   1   0.5   0.33   0.25   0.20

b Apply a reciprocal transformation to the y values (1/y). Plot the data and comment on the form of the relationship between 1/y and x.

8 a Plot the data in the table. Comment on the form of the relationship between y and x.

x   0.2    0.4    0.6    0.8    1.0
y   0.71   0.56   0.45   0.38   0.33

b Apply a reciprocal transformation to the y values (1/y). Plot the data and comment on the form of the relationship between 1/y and x.

9 a Plot the data in the table. Comment on the form of the relationship between y and x.

x   11     14     26     35     41
y   0.43   0.34   0.19   0.14   0.12

b Apply a reciprocal transformation to the y values (1/y). Plot the data and comment on the form of the relationship between 1/y and x.
c Name a y-axis transformation that should also work for the data. Try it and see.
d Name a y-axis transformation that should not work for the data. Try it and see.

10 The time taken for a local anaesthetic to take effect is related to the dose given. To investigate this relationship a researcher collected the data shown.

Dose   0.5    0.6    0.7    0.8    0.9    1.0    1.1    1.2    1.3    1.4    1.5
Time   3.67   3.55   3.42   3.29   3.15   3.00   2.85   2.68   2.51   2.32   2.12

a Construct a scatterplot showing the relationship between the dose of anaesthetic and time taken for it to take effect, and comment on the form.
b Apply a squared transformation to the time values (y), again plot the data, and comment on the form of the relationship between time squared (y²) and dose (x).

11 The table below shows the number of internet users signing up with a new internet service provider for each of the first nine months of their first year of operation.

Number   24   32   35   44   60   61   78   92   118
Month    1    2    3    4    5    6    7    8    9

a Construct a scatterplot showing the relationship between number of users signing up and month, and comment on the form. Month is the independent variable.
b Apply a log transformation to the number of users (y), again plot the data, and comment on the form of the relationship between log (number) and month (x).

12 A group of ten students was given an opportunity to practise a complex matching task as often as they liked before they were assessed on the task. The number of times they practised the task and the number of errors made when assessed are given in the table below.

Number   1    2   2    4   5   6   7   7   9   11
Errors   14   9   11   5   4   4   3   3   2   2

a Construct a scatterplot showing the relationship between number of practices (x) and number of errors (y), and comment on the form.
b Apply a reciprocal transformation to the number of errors values (1/y), again plot the data, and comment on the form of the relationship between the reciprocal of the number of errors (1/y) and number of practices (x).

6.4

Putting together the information in Sections 6.2 and 6.3, we can see that there may be more than one transformation which linearises the scatterplot. The forms of the scatterplots that can be transformed by the squared, log or reciprocal transformations can be largely classified into one of four categories, shown as the circle of transformations.

[Figure: the circle of transformations. Possible transformations for the scatterplot shape in each quadrant:
top left: y², log x, 1/x
top right: y², x²
bottom left: log y, 1/y, log x, 1/x
bottom right: log y, 1/y, x²]

Note that the transformations we have introduced in this chapter are able to linearise only those relationships that are consistently increasing or decreasing. The advantage of having alternatives is that, in practice, we can always try each of them to see which gives us the best result.

How do we decide which transformation is the best?
The best transformation is the one that results in the best linear model. To choose the best linear model we will consider for each transformation applied:
the residual plot, in order to evaluate the linearity of the transformed relationship
the value of the coefficient of determination (r²): a higher value indicates a better fit

This procedure is illustrated in Example 7.
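The two criteria above can be sketched in code. The following Python example computes the residuals and r² for a straight-line fit, using the small data set from Example 1 (y against x, then y against x²) as an illustration. The helper names `fit_line`, `residuals` and `r_squared` are assumptions made here, and the residuals are simply listed rather than plotted.

```python
def fit_line(xs, ys):
    """Least-squares line ys = intercept + slope * xs; returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def residuals(xs, ys):
    """Actual value minus value predicted by the fitted line, for each point."""
    intercept, slope = fit_line(xs, ys)
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]

def r_squared(xs, ys):
    """Coefficient of determination: 1 - (residual variation / total variation)."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum(e ** 2 for e in residuals(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

# Data from Example 1, with and without the squared transformation of x.
x = [0, 1, 2, 3, 4]
y = [2, 3, 6, 11, 18]
candidates = {"x": x, "x^2": [v ** 2 for v in x]}

for name, xs in candidates.items():
    res = [round(e, 3) for e in residuals(xs, y)]
    print(f"y against {name}: r^2 = {r_squared(xs, y):.3f}, residuals = {res}")
```

For y against x the residuals are positive, then negative, then positive again, a curved pattern that signals non-linearity even though r² is fairly high. After the x² transformation the residuals are all essentially zero.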


Example 7
The data in this table give life expectancy in years and gross national product, GNP, in dollars for 24 countries in 1982. Using an appropriate transformation, find a regression model for the relationship between life expectancy in years and GNP.

Country          GNP       Life expectancy
Nicaragua        950       58
Paraguay         1670      65
Venezuela        4250      68
France           11 520    74
West Germany     12 280    73
Greece           4170      73
Norway           14 300    75
Czechoslovakia   5540      71
Austria          9830      72
Jordan           1680      61
Sri Lanka        320       67
Brunei           22 260    66
Indonesia        550       50
North Korea      930       66
Mongolia         940       64
Taiwan           2670      72
Australia        11 220    74
Congo            1420      48
Ethiopia         150       41
Guinea           330       44
Mauritania       520       44
Nigeria          940       49
Togo             350       48
Zaire            180       48

Source: Modern Data Analysis: A First Course in Applied Statistics, L.C. Hamilton 1990, p. 537
West Germany is now part of Germany; Czechoslovakia is now the Czech Republic and Slovakia.

Solution
1 Decide which of the variables is the independent variable, and which is the dependent variable.
2 Plot the values of y against x, decide if the form of the relationship is linear or non-linear, and find the value of the coefficient of determination (r²).
[Figure: scatterplot of Life expectancy against GNP]
3 Write down your conclusion.
The relationship between y and x is non-linear: r² = 36.7%.
4 Compare the shape of this plot to those in the circle of transformations (page 190). The scatterplot is similar to the plot in the top left-hand corner. Thus, y², log x and 1/x are the transformations to investigate.
Suitable transformations are y², log x and 1/x.


Try the y² transformation
5 Calculate the values of (Life expectancy)² and plot these against GNP. Comment on the linearity of the plot.
[Figure: scatterplot of (Life expectancy)² against GNP]
6 Fit a regression line, and find the value of the coefficient of determination (r²). Produce a residual plot, and use this to comment on the form of the relationship.
r² = 38.4%. The relationship between (Life expectancy)² and x is still non-linear. This is confirmed by the residual plot.
[Figure: residual plot, Residual against GNP]
7 Comment on the effect of the transformation.

Try the log x transformation
5 Calculate the values of log GNP and plot Life expectancy against log GNP.
[Figure: scatterplot of Life expectancy against log GNP]
6 Fit a regression line, and find the value of r². Produce a residual plot, and use this to comment on the form of the relationship.
r² = 66.0%. The relationship between Life expectancy and log GNP is closer to linear. This is confirmed by the residual plot.
[Figure: residual plot, Residual against log GNP]

Try the 1/x transformation
8 Calculate the values of 1/GNP and plot Life expectancy against 1/GNP.
[Figure: scatterplot of Life expectancy against 1/GNP]
9 Fit a regression line, and find the value of r². Produce a residual plot and use this to comment on the form of the relationship.
r² = 51.5%. The relationship between Life expectancy and 1/GNP is reasonably linear. This is confirmed by the residual plot.
[Figure: residual plot, Residual against 1/GNP]
10 Comment on the effect of the transformation.
The 1/x transformation has done a reasonable job in linearising the relationship.
11 Decide which transformation is the most appropriate for this relationship. Choose the transformation which gives the most linear relationship (from the residual plots) and the highest value of r².
The most appropriate transformation to use here is the log x transformation, as the residual plot shows that the relationship between log GNP and Life expectancy is linear, and this model has the highest coefficient of determination, r² = 66.0%.


12 As the relationship between Life expectancy and log (GNP) appears to be linear and there are no obvious outliers, we can use the least squares method to fit a line to the data. Using a calculator, find the equation of the least squares regression line and write it in terms of the transformed variables.

Note: The independent variable (IV) is now log GNP, and the dependent variable (DV) is Life expectancy.

Using the log x transformation gives a regression model for the relationship:

Life expectancy = 14.3 + 14.5 log (GNP)
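For readers working with software rather than a calculator, the comparison in Example 7 can be reproduced with a short Python sketch. It assumes NumPy is available and that log means the base-10 logarithm (an assumption, though it agrees with the fitted coefficients above); the names `r_squared` and `r2` are illustrative only.

```python
import numpy as np

# GNP (dollars) and life expectancy (years) for the 24 countries in Example 7,
# in table order (Nicaragua ... Brunei, then Indonesia ... Zaire).
gnp = np.array([950, 1670, 4250, 11520, 12280, 4170, 14300, 5540, 9830, 1680,
                320, 22260, 550, 930, 940, 2670, 11220, 1420, 150, 330,
                520, 940, 350, 180], dtype=float)
life = np.array([58, 65, 68, 74, 73, 73, 75, 71, 72, 61,
                 67, 66, 50, 66, 64, 72, 74, 48, 41, 44,
                 44, 49, 48, 48], dtype=float)

def r_squared(x, y):
    """Coefficient of determination for the least squares line of y on x."""
    return np.corrcoef(x, y)[0, 1] ** 2

# r^2 for the untransformed data and for each candidate transformation.
r2 = {
    "y vs x": r_squared(gnp, life),
    "y^2 vs x": r_squared(gnp, life**2),
    "y vs log x": r_squared(np.log10(gnp), life),
    "y vs 1/x": r_squared(1 / gnp, life),
}
for model, value in r2.items():
    print(f"{model:12s} r^2 = {100 * value:.1f}%")

# Least squares line for the chosen model: Life expectancy on log10(GNP).
slope, intercept = np.polyfit(np.log10(gnp), life, 1)
print(f"Life expectancy = {intercept:.1f} + {slope:.1f} log (GNP)")
```

The sketch covers only the r² criterion and the final fit. The residual plots used in steps 6 and 9 would still need to be drawn to judge linearity.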

Some comments
It might seem unnatural to talk about the wealth of a country in terms of log (GNP), yet when we are comparing the relative wealth of countries, log (GNP) is probably a more useful measure than GNP. For instance, knowing that the difference in GNP between Australia and Sri Lanka is $10 900 is less informative than knowing that the difference in log (GNP) is 1.5448, which tells us that Australia's GNP is 10^1.5448, or about 35 times, that of Sri Lanka. 'Natural' units of measurement are more often those that are familiar rather than those that are most useful!
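The step from a difference in logs to a ratio can be written out as follows, assuming base-10 logarithms and using the GNP values from the table in Example 7 (11 220 for Australia and 320 for Sri Lanka):

```latex
\begin{align*}
\log_{10}(11\,220) - \log_{10}(320)
  &= \log_{10}\!\left(\frac{11\,220}{320}\right)
   = \log_{10}(35.0625) \approx 1.5448 \\[4pt]
\frac{\text{GNP (Australia)}}{\text{GNP (Sri Lanka)}}
  &= 10^{\,\log_{10}(11\,220) - \log_{10}(320)}
   \approx 10^{1.5448} \approx 35
\end{align*}
```

In general, a difference of d on a base-10 log scale corresponds to a ratio of 10^d on the original scale.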

Exercise 6D

1 The following scatterplots show non-linear relationships. For each scatterplot, state which of the transformations x², log x, 1/x, y², log y, 1/y, if any, you would apply to linearise the relationship.
[Figure: four scatterplots, each showing y (0 to 5) against x (1 to 10); plots not reproduced]


2 The data below give the yield in kilograms and length in metres of 12 commercial potato plots.

Yield (kilograms)   Length (metres)
346                 12.1
1798                27.4
152                 8.3
86                  5.5
436                 15.7
968                 21.5
686                 19.3
257                 9.0
2435                34.2
287                 14.7
1850                31.9
1320                25.3

a Construct a scatterplot showing the relationship between yield and length of plot and comment on the form.
b Using an appropriate transformation, find a regression model for the relationship between yield in kilograms and length of plot in metres.

3 A recent study in Canada showed that cigarette consumption (per day) is related to cost per pack. Some data drawn from that study are shown below.

Cost ($)   Cigarette consumption
4.00       8.0
4.50       7.4
4.80       7.0
5.50       6.4
6.00       5.9
6.50       5.5
7.50       5.0

a Construct a scatterplot showing the relationship between the cost of cigarettes and cigarette consumption, and comment on the form.
b Using an appropriate transformation, find a regression model for the relationship between the cost of cigarettes and cigarette consumption.

4 The population of a large town increased over a 13-year period, as shown in the table.

Year   Population
1      58860
2      57770
3      58206
4      59513
5      59983
6      60123
7      59763
8      61726
9      60387
10     61646
11     62347
12     64185
13     67158

a Construct a scatterplot showing the annual population growth of the town, and comment on the form.
b Using an appropriate transformation, find a regression model for the annual population growth of the town.


5 The monthly average exchange rate (to the nearest cent) between the Australian dollar and the US dollar over a period of 18 months in the 1990s is given in the table below.

Month   Exchange rate (US $)
1       0.77
2       0.77
3       0.77
4       0.76
5       0.78
6       0.78
7       0.77
7       0.76
8       0.76
9       0.76
10      0.75
11      0.75
12      0.72
13      0.72
14      0.69
15      0.69
16      0.68
17      0.71
18      0.70

a Construct a scatterplot showing the exchange rate over the 18-month period, and comment on the form.
b Using an appropriate transformation, find a regression model for the exchange rate over that 18-month period.

6 The table below shows the percentage of people who can read (literacy rate) and the gross domestic product (GDP) for a selection of 14 countries.

Country        Literacy rate (%)   Gross domestic product/capita
Botswana       72                  2677
Cambodia       35                  260
Canada         97                  19904
Ethiopia       24                  122
France         99                  18944
Georgia        99                  4500
Germany        99                  17539
Honduras       73                  1030
Japan          99                  19860
Liberia        40                  409
Pakistan       35                  406
Saudi Arabia   62                  6651
Switzerland    99                  22384
Syria          64                  2436

a Construct a scatterplot showing the relationship between literacy rate and GDP, and comment on the form.
b Using an appropriate transformation, find a regression model for the relationship between literacy rate and GDP for this group of countries.


Review

Data transformation
This means changing the scale on either the x- or y-axis. It is performed when a residual plot shows that the underlying relationship in a set of bivariate data is clearly non-linear.
- The squared transformation stretches out the upper end of the scale on an axis.
- The log transformation compresses the upper end of the scale on an axis.
- The reciprocal transformation compresses the upper end of the scale on an axis to a greater extent than the log transformation.
- Residual plots are used to assess the effectiveness of each data transformation.
- The transformation which results in a linear relationship and which has the highest value of the coefficient of determination is considered to be the best transformation.
- The circle of transformations provides guidance in choosing the transformations that can be used to linearise various types of scatterplots. See page 190.
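The stretching and compressing described in this summary can be seen by transforming equally spaced values and looking at the gaps between neighbours. This is a minimal illustrative sketch in Python, assuming NumPy and base-10 logs; the values 1 to 5 are arbitrary, and the absolute value is taken for the reciprocal because that transformation also reverses the order of the values.

```python
import numpy as np

values = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # equally spaced on the original scale

# Size of the gap between neighbouring values after each transformation.
gaps = {
    "squared": np.diff(values**2),
    "log": np.diff(np.log10(values)),
    "reciprocal": np.abs(np.diff(1 / values)),
}

for name, g in gaps.items():
    # last/first > 1: upper end stretched; < 1: upper end compressed
    print(f"{name:10s} gaps = {np.round(g, 3)}   last/first = {g[-1] / g[0]:.2f}")
```

For the squared transformation the gaps grow towards the upper end of the scale, while for the log and reciprocal transformations they shrink, with the reciprocal shrinking them more.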

Skills check

Having completed this chapter you should be able to:
- recognise which of the x², log x, 1/x, y², log y or 1/y transformations might be used to linearise a bivariate relationship
- apply each of these transformations to a data set
- use residual plots and the coefficient of determination, r², to decide which transformation gives the best model for the relationship
- use the transformed variable as part of a regression analysis to give a model for the relationship

Multiple-choice questions

1 The missing data values, a and b, in the table are:

value   (value)²   log(value)
1       a          0
2       4          b
3       9          0.477
4       16         0.602

A a = 0, b = 0.5
B a = 1, b = 0.5
C a = 1, b = 0.301
D a = 1, b = 0.602
E a = 1, b = 0.693


2 Select the statement which correctly completes the sentence: 'The effect of a log transformation is to . . .'
A stretch the high values in the data
B maintain the distance between values
C stretch the low values in the data
D compress the high values in the data
E reverse the order of the values in the data

3 The scatterplot opposite shows the relationship between the number of weeks each person has been on a diet program and their weight loss in kilograms for a group of subjects. A least squares regression line has been fitted to the data.
[Figure: scatterplot of Weight loss (0 to 14) against Number of weeks on a diet (2 to 7)]
The residual plot for this least squares line would look like:
[Figure: five candidate plots, options A to E; plots not reproduced]

4 The relationship between two variables y and x as shown in the scatterplot is non-linear. In an attempt to transform the relationship to linearity, a student would be advised to:
A leave out the first four points
B use a y² transformation
C use a log y transformation
D use a 1/y transformation
E use a least squares regression line
[Figure: scatterplot of y (0 to 5) against x (1 to 10)]


5 The relationship between two variables y and x as shown in the scatterplot is non-linear. Which of the following sets of transformations could possibly linearise this relationship?
A log y, 1/y, log x, 1/x
B y², x²
C y², log x, 1/x
D log y, 1/y, x²
E ax + b
[Figure: scatterplot of y (0 to 5) against x (1 to 10)]

6 The relationship between two variables y and x as shown in the scatterplot is non-linear. Which of the following transformations is most likely to linearise the relationship?
A a 1/x transformation
B a y² transformation
C a log y transformation
D a 1/y transformation
E a log x transformation
[Figure: scatterplot of y (0 to 5) against x (1 to 10)]

7 The relationship between two variables y and x as shown in the scatterplot is clearly non-linear. In an attempt to transform the relationship to linearity, a student would be advised to apply:
A an x² transformation
B a y² transformation
C a log y transformation
D a 1/y transformation
E none of these
[Figure: scatterplot of y (0 to 5) against x (1 to 10)]

8 Brian has determined from a scatterplot of his data that the appropriate transformations for his data are log x, 1/x and y². After applying each of these transformations to the data, he obtains the results shown below.

Model        Residuals   r²
y vs x       Curved      79.6%
y vs log x   Random      80.8%
y vs 1/x     Random      81.9%
y² vs x      Random      88.4%

Based on the information in the table, which transformation would you suggest Brian use?
A an x² transformation
B a y² transformation
C a log x transformation
D a 1/x transformation
E no transformation

9 When investigating the relationship between the weight of the strawberries picked from a strawberry patch, and the width of the patch, Suzie decides that an x² transformation is appropriate. After transforming the data, she fits a least squares regression line to the data and determines that the intercept is 10 and the slope is 5. Based on this information, the model that Suzie has fitted to the data can be written as:
A (weight)² = 10 + 5 width
B weight = 5 + 10 (width)²
C weight = 10 + 5 (width)²
D (weight)² = 10 + 5 (width)²
E (weight)² = 5 + 10 width

10 Suppose that the relationship between the hours spent studying for an exam and the mark achieved can be modelled by the equation:
Mark = 20 + 40 log (Hours)
From this model, we would predict that a student who studies for 20 hours would score a mark (to the nearest whole number) of:
A 80
B 78
C 180
D 72
E 140

Extended-response questions

1 Measurements of distance travelled in metres and time taken in seconds were made on a falling body. The data are given in the table below.

Time       0    1     2      3      4      5       6
Distance   0    5.2   18.0   42.0   79.0   128.0   168.0
Time²

a Construct a scatterplot of the data and comment on its form.
b Determine the values of (Time)² and complete the table.
c Construct a scatterplot of Distance against (Time)².
d Obtain a residual plot for the new model and comment on the linearity.
e Determine the value of r² for the new model.
f Write down the regression equation for the new model in terms of the variables in the question.
g Use the regression equation to predict the distance travelled in seven seconds.

2 The data in the table below show the marks obtained by students on a test and the amount of time they reported studying for the test:

Mark   Time (hours)
62     1.5
74     2.25
79     3.0
80     2.5
56     0.8
86     3.5
92     6.0
87     2.75
64     1.0
88     4.5
48     0.5
32     0.1


a We want to predict a student's mark from the time they reported studying for the test. In this situation, which is the dependent variable and which is the independent variable?
b Construct a scatterplot and comment on the relationship between test mark and time spent studying in terms of direction, outliers, form and strength.
c i Fit a linear model to the data and record its equation. Interpret the slope in terms of the problem at hand.
  ii Calculate the coefficient of determination and interpret.
  iii Construct a residual plot and use it to comment on the suitability of modelling the relationship between Mark and Time spent studying with a straight line.
d Apply a log transformation to Time. Then:
  i construct a scatterplot for the transformed data
  ii find the equation of the least squares regression line for the transformed data
  iii use the equation to predict the mark obtained after 5 hours of study
  iv calculate the coefficient of determination and interpret
  v construct a residual plot and use it to comment on the linearity of the transformed model

3 The following are the testosterone levels and the age at first conviction for violent and aggressive crimes collected on a sample of young male prisoners. It is believed that the higher the testosterone level in a male prisoner, the earlier they are likely to be convicted of a violent and aggressive crime. A correlation and regression analysis is also given.

Testosterone   Age at first conviction
1305           11
1000           12
1175           13
1495           14
1060           15
800            16
1005           16
710            17
1150           18
605            20
690            21
700            23
625            24
610            27
450            30

[Figure: scatterplot of Age (10 to 30) against Testosterone level (400 to 1600), and residual plot of Residual (-5 to 5) against Testosterone level]


a What is the value of Pearson's correlation coefficient, r?
b Write the equation of the least squares regression line in terms of Testosterone level and Age.
c Interpret the value of r² in terms of Testosterone level and Age.
d Use the residual plot to comment on the linearity of the relationship.
e Construct a scatterplot of Age against log (Testosterone).
f Obtain a residual plot for the new model and comment on the linearity.
g Determine the value of r² for the new model.
h Write down the regression equation for the new model in terms of the variables in the question.

4 Are infant mortality rates in a country related to the number of doctors in a country? The data below give infant mortality rates (deaths per 1000 births) and doctor numbers (per 100 000 people) for 17 countries.

Infant mortality   No. of doctors
12                 192
13                 222
12                 154
14                 294
10                 182
10                 179
7                  204
10                 271
111                61
15                 270
85                 9
20                 357
21                 250
54                 79
75                 59
121                27
71                 52

a Construct a scatterplot of Infant mortality against Number of doctors and comment on the relationship between infant mortality rate and doctor numbers in terms of direction, outliers, form and strength.
b Construct a scatterplot of Infant mortality against log (Number of doctors).
c Obtain a residual plot for the new model and comment on the linearity.
d Determine the value of r² for the new model.
e Write down the regression equation for the new model in terms of the variables in the question.
f Use the regression equation to predict the infant mortality rate when there are 100 doctors (per 100 000).

5 Tree ages can be determined by cutting down a tree and counting the number of rings on the stump of its trunk. This, however, is a destructive process and it would be useful to have a method of working out the approximate age of a tree without having to cut it down. Noting the obvious, that trees tend to get bigger as they get older, we might be able to use some external measurement of size to help us estimate the age of a tree.


The data below show the age (in years) and diameter at chest height (in cm) of a sample of trees of the same species taken from a commercial plantation.

Age (years)   Diameter (centimetres)
4             2.0
5             2.0
8             2.5
8             5.1
8             7.5
10            5.1
10            8.9
12            12.4
13            9.0
14            6.4
16            11.4
18            11.7
22            14.7
25            16.5
29            15.2
30            15.2
34            17.8
38            17.8
40            19.1

a We wish to predict the age of a tree from its diameter at chest height. In this situation, which is the dependent variable and which is the independent variable?
b Construct a scatterplot and comment on the relationship between age and diameter in terms of direction, outliers, form and strength.
c i Fit a linear model to the data and record its equation. Interpret the slope in terms of the problem at hand.
  ii Calculate the coefficient of determination and interpret.
  iii Form a residual plot and use it to comment on the suitability of modelling the relationship between age and diameter with a straight line.
d Use the x² transformation to linearise the data. Then:
  i construct a scatterplot of age against diameter squared
  ii find the equation of the least squares regression line for the transformed data
  iii calculate the coefficient of determination and interpret
  iv form a residual plot and use it to comment on the suitability of modelling the relationship between age and diameter squared with a straight line
