

Session 3

Crosstabulation and Recode

Crosstabulation in SPSS
Recoding
Recoding into the same variable
Changing the value labels
Recoding into Different Variables
Other forms of recode
Practical session 3


SESSION 3: Crosstabulation and Recode
Crosstabulation in SPSS

The crosstabs procedure allows you to explore the relationship between
(normally just two) categorical variables. It will show you a table of the
joint frequency distributions of the two variables. Accompanying statistics
will also tell you if there is a significant association between the two
variables.

As an example of crosstabulation, I retrieved H:\My Documents\spss
data\sample.sav, the SPSS data file saved in the previous exercise, and
crosstabulated the two variables hincdiff and srinc.

In order to carry out a crosstabulation, I selected:

Analyze
Descriptive Statistics
Crosstabs...

Then hincdiff was selected as the row variable and srinc as the column
variable. As a rule of thumb, you should place the dependent variable as
the row variable and the independent variable as the column variable. In
this example it is assumed, if anything, that it is high income or lack of it
which affects how people feel about whether they are managing, not that
how they feel they are managing affects their income.

As with all SPSS procedures, the variables are selected by highlighting
them and then clicking on the arrow button (see Figure 3.1).


Figure 3.1

To get SPSS to start processing, I clicked on OK.

This resulted in the following:

Figure 3.2

In each cell of the table is the cell count. In order for these to be
expressed as column percents, the Cells option in the Crosstabs: Cell
Display dialog box needs to be selected and Column Percentages ticked
(see Figure 3.3):


Figure 3.3
Note: Cases which have a missing value in either variable are not
included in the table.

After that the resulting table looks like:

hincdiff If managing on income * srinc Self assessed income group Crosstabulation

(% = % within srinc Self assessed income group)

                                 srinc Self assessed income group
hincdiff If managing on income   2 Middle income   3 Low income    Total
1 Very well         Count                1               0            1
                    %                 8.3%             .0%         4.2%
2 Quite well        Count               10               7           17
                    %                83.3%           58.3%        70.8%
3 Not very well     Count                1               3            4
                    %                 8.3%           25.0%        16.7%
4 Not at all well   Count                0               2            2
                    %                  .0%           16.7%         8.3%
Total               Count               12              12           24
                    %               100.0%          100.0%       100.0%

Figure 3.4

We now can say that about 83% of those who said they were on a middle
income are managing quite well.
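
For readers who want to check the column percentages by hand, the following is a minimal Python sketch, not anything SPSS itself runs. It uses the cell counts from Figure 3.4; the function name column_percentages is made up for this illustration.

```python
# Cell counts copied from Figure 3.4 (rows: hincdiff, columns: srinc).
counts = {
    "Very well":       {"Middle income": 1,  "Low income": 0},
    "Quite well":      {"Middle income": 10, "Low income": 7},
    "Not very well":   {"Middle income": 1,  "Low income": 3},
    "Not at all well": {"Middle income": 0,  "Low income": 2},
}


def column_percentages(table):
    """Return each cell as a percentage of its column total (illustrative helper)."""
    columns = list(next(iter(table.values())))
    totals = {c: sum(row[c] for row in table.values()) for c in columns}
    return {
        label: {c: round(100 * row[c] / totals[c], 1) for c in columns}
        for label, row in table.items()
    }


percents = column_percentages(counts)
print(percents["Quite well"]["Middle income"])  # 83.3
```

Each column sums to 100%, which is why column percentages let you compare the middle and low income groups directly even if the groups differ in size.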

Recoding

However, the table has two empty cells and the table could be collapsed
across some categories of the variable hincdiff, i.e. 'Very well' and 'Quite
well' could be collapsed into one category called 'Well', while 'Not very
well' and 'Not at all well' could also be collapsed into one 'Not well'
category. For this we use the recoding facility of SPSS.

Other reasons to recode:

Altering an existing coding scheme,
  e.g. to regroup a continuous variable like age
During editing to correct coding errors,
  e.g. to change any wild (i.e. erroneous) codes to a missing value

There are two ways to recode in SPSS for Windows:

1. Recode into the Same Variable
2. Recode into a Different Variable

(1) has the advantage of being slightly simpler, but (2) has the advantage
that you retain the old values and create a new variable to hold the
recoded values.

Recoding into the same variable

To illustrate this, I selected:

Transform
Recode into Same Variable...


I then selected the variable to recode, hincdiff (see Figure 3.5).


Figure 3.5

Then I clicked on the Old and New Values... button.

Essentially we need to change the following values:

Original values 1 and 2 become 1
and original values 3 and 4 become 2


Step by step recode

1. Click on Range in the old value box
2. Type in 1 in the first box and 2 in the second box, after 'through'
3. Then click on Value in the new value box and type in 1 in the box
4. Click on Add
5. Once again click on Range in the old value box
6. Type in 3 in the first box and 4 in the second box
7. Then click on Value in the new value box and type in 2 in the box
8. Click on Add

When you click on Add, a record of the recodes so far appears in the
Old --> New sub box.


Figure 3.6

9. Click on Continue

This returns you to the Recode into Same Variables dialog box.

10. Click on OK to perform the Recode

This changes the hincdiff data column: where there was a 1 or a 2,
there is now just a 1; where there was a 3 or a 4, there is now a 2. Any
values which were not specified in the Recode remain the same (notice
the value 9 for case six).
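
The logic of this recode can be sketched in a few lines of Python. This is only an illustration of the rule, not what SPSS does internally; the function name recode_same and the data values are hypothetical (chosen so that case six holds the unspecified code 9).

```python
def recode_same(values):
    """Collapse codes 1-2 into 1 and 3-4 into 2; leave all other codes unchanged."""
    mapping = {1: 1, 2: 1, 3: 2, 4: 2}
    # dict.get(v, v) falls back to the original code when v is not in the mapping.
    return [mapping.get(v, v) for v in values]


# Hypothetical column of hincdiff codes; the sixth case has the code 9.
hincdiff = [2, 1, 3, 2, 4, 9]
hincdiff = recode_same(hincdiff)  # the old values are overwritten
print(hincdiff)  # [1, 1, 2, 1, 2, 9]
```

Because the result is assigned back to the same name, the original codes are lost, which mirrors the drawback of recoding into the same variable.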


Changing the value labels

Usually, since the values have changed, it will be necessary to change the
value labels as well. This is performed as in the previous exercise by
selecting the Values cell of the variable hincdiff in the Variable View,
clicking on the button to get the Value Labels dialogue box, and then
making changes so that 1 has the label 'Well' and 2 has the label 'Not
well' (labels for values 3 and 4 can be removed).

Repeating the crosstabulation

By selecting

Analyze
Descriptive Statistics
Crosstabs...

as before, the crosstabulation specification should still be present so it is
only necessary to click on OK.

The table (with column percentages) should look like:
hincdiff If managing on income * srinc Self assessed income group Crosstabulation

(% = % within srinc Self assessed income group)

                                 srinc Self assessed income group
hincdiff If managing on income   2 Middle income   3 Low income    Total
1 Very well         Count               11               7           18
                    %                91.7%           58.3%        75.0%
2 Quite well        Count                1               5            6
                    %                 8.3%           41.7%        25.0%
Total               Count               12              12           24
                    %               100.0%          100.0%       100.0%

Figure 3.7
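
As a quick check that the collapsed table is consistent with the original one, the counts in Figure 3.4 can be summed under the recode rule. The Python below is a minimal sketch for that purpose only; the variable names are made up for the illustration.

```python
# Counts from Figure 3.4: original hincdiff code -> (middle income, low income).
original_counts = {1: (1, 0), 2: (10, 7), 3: (1, 3), 4: (0, 2)}

# The recode rule used in this session: 1 and 2 become 1, 3 and 4 become 2.
recode_rule = {1: 1, 2: 1, 3: 2, 4: 2}

collapsed = {}
for code, (middle, low) in original_counts.items():
    new_code = recode_rule[code]
    old_middle, old_low = collapsed.get(new_code, (0, 0))
    collapsed[new_code] = (old_middle + middle, old_low + low)

print(collapsed)  # {1: (11, 7), 2: (1, 5)}
```

The summed counts match the cells of Figure 3.7, and the column totals are still 12 and 12.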

Recoding into Different Variables

As hincdiff now has only 2 categories, we need to start again with H:\My
Documents\spss data\sample.sav.

This procedure is essentially the same as recoding into the same
variable, but you have an extra box into which to type the name of the
new variable.

I shall illustrate this by recoding hincdiff into a new variable called incdiff.

I selected

Transform
Recode into Different Variables...

and put hincdiff in the Numerical Variable --> Output Variable box:

Figure 3.8

I then typed in the name of the new variable (incdiff) in the Output
Variable box, clicked on the Change button and then on the Old
and New Values... button.

The Old and New Values screen appeared as in Figure 3.9. This screen
is very similar to that when recoding into the same variable (see Figure
3.6) and the recoding process is the same as in the step-by-step list for
'Recoding into the same variable'.

Figure 3.9
NB: However, this time all values not included in the Old --> New box will
become system missing. If you wish to keep the other values of hincdiff
as they were in the new variable incdiff, you must click on All other
values in the Old Value part and Copy old value(s) in the New Value
part. This gives ELSE --> Copy (see Figure 3.9), and the result is seen in
Figure 3.10:
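
The difference between leaving other values out and copying them can be sketched in Python as follows. This is an illustration under stated assumptions, not SPSS's own code: None stands in for system missing, and the function name recode_into_new, its copy_others parameter and the data values are all hypothetical.

```python
def recode_into_new(values, mapping, copy_others=False):
    """Build a new list of recoded values without touching the original.

    Codes not in the mapping become None (standing in for system missing)
    unless copy_others is True, in which case they are copied unchanged.
    """
    return [mapping.get(v, v if copy_others else None) for v in values]


mapping = {1: 1, 2: 1, 3: 2, 4: 2}
hincdiff = [2, 1, 3, 2, 4, 9]  # hypothetical codes; 9 is not in the mapping

incdiff_default = recode_into_new(hincdiff, mapping)
incdiff_copy = recode_into_new(hincdiff, mapping, copy_others=True)

print(incdiff_default)  # [1, 1, 2, 1, 2, None]
print(incdiff_copy)     # [1, 1, 2, 1, 2, 9]
print(hincdiff)         # [2, 1, 3, 2, 4, 9]
```

The original list is left as it was in both cases, which is the advantage of recoding into a different variable.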

Figure 3.10

Other forms of recode

As well as recoding specific values into other values, you can also recode
system and missing values into either values or system missing values.

System missing values

These are missing values assigned by the system, where no legal value
can be assigned, i.e. if SPSS meets either a blank or a letter in numeric
data, or when SPSS is asked to perform an illegal calculation such as
division by zero. In a frequency listing, system missing values are
represented by a period.

Missing values

Just a reminder that these are values assigned by you, the user, to values
that may require special treatment in calculations or in data
transformation, e.g. the value 99 for age, which is given to those people
who failed to respond to the question and needs to be ignored in any
calculation.

System and Missing Values in the Recode Command

The following recodes are possible:

Value or range of values  -->  Single value
Value                     -->  System missing
Missing value             -->  System missing
System missing            -->  Single value
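
Two of these forms are sketched below in Python, purely as an illustration of the idea. None stands in for system missing; the function names and the age values are hypothetical and are not taken from any data set in these notes.

```python
def user_missing_to_sysmis(values, user_missing):
    """Replace user-defined missing codes with None (standing in for system missing)."""
    return [None if v in user_missing else v for v in values]


def sysmis_to_value(values, new_value):
    """Replace None (standing in for system missing) with a single value."""
    return [new_value if v is None else v for v in values]


# Hypothetical ages: 99 is a user-defined missing code, None is system missing.
ages = [34, 99, None, 51]

print(user_missing_to_sysmis(ages, {99}))  # [34, None, None, 51]
print(sysmis_to_value(ages, 99))           # [34, 99, 99, 51]
```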


Practical session 3

Income and perception of living standards

In this exercise, you will start by re-running the above crosstabs table, but
this time using a subset of the 1991 data set containing data for 2836
respondents. Then you will be using one of the SPSS data transformation
commands to Recode some of the variables from that dataset.

Load H:\My Documents\spss data\bsas91b.sav

Crosstabulate hincdiff with srinc.

Recode hincdiff into incdiff as in my example (i.e. into a different
variable).

Add appropriate value labels to incdiff. You will find the new variable is at
the right hand edge of the data in the Data Editor window.

Crosstabulate incdiff with srinc. Remember to select Column
Percentages under the Cells options.

Is it the case that richer respondents are likely to think that they are coping
better than poor ones? While you should have had some idea about an
answer to this question from the tiny sample used previously, you should
now be able to answer the question with some confidence.


Political identification and age

The variable PARTYID1 records the political identification of the
respondent (note that the variable is spelt with ID (the letters I and D) and
a final digit, 1). The variable shows respondents' answers to the question:

What political party do you support, or feel a little closer to, or if there was
a general election tomorrow, which one would you most likely support?

Note: recoded values can be declared 'missing' after the recode if
required.
We want to see how party identification varies with age. To do this, carry
out the following steps:

Run a FREQUENCIES command on the variable PARTYID1 to see the
range of parties and the distribution of respondents between them.

Using your frequencies table, or the Value Labels, or the bsas91b code
book (attached at the end of the notes), identify the values of those
who did not answer 'Conservative', 'Labour', or
'Democrat/SLD/Liberal'.

RECODE PARTYID1 into a new variable POLPART, which has just four
categories:
1 = 'Conservative'
2 = 'Labour'
3 = 'Democrat/SLD/Liberal'
9 = all the other original answers (declare this as a missing value)

(Hint: The values '6 through highest' become value 9, while everything
ELSE is a COPY)
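
The rule in the hint can be sketched in Python as follows. This only illustrates the logic; the function name recode_partyid and the sample codes are hypothetical, and the real codes should be checked against the frequencies table or the code book.

```python
def recode_partyid(value):
    """Codes 6 and above become 9; every other value is copied unchanged."""
    if value is not None and value >= 6:
        return 9
    return value


# Hypothetical PARTYID1 codes, for illustration only.
partyid1 = [1, 2, 3, 6, 8, 10]
polpart = [recode_partyid(v) for v in partyid1]
print(polpart)  # [1, 2, 3, 9, 9, 9]
```

The new code 9 would still need to be declared as a missing value afterwards.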


Recode RAGE into the different variable AGEGP by dichotomizing it into 2
groups; those aged 40 or over and those under 40. You will need to
decide what to do with No response, coded 99 (a missing value).
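
One possible way to think about the dichotomy is sketched in Python below. It is only one option among several: here the code 99 is turned into None (standing in for system missing), and the group codes 1 and 2, the function name age_group and the sample ages are all assumptions made for the illustration.

```python
def age_group(age, missing_codes=(99,)):
    """Return 1 for under 40, 2 for 40 or over, and None for missing ages."""
    if age is None or age in missing_codes:
        return None
    return 2 if age >= 40 else 1


# Hypothetical RAGE values; 99 is the No response code.
rage = [18, 39, 40, 67, 99]
agegp = [age_group(a) for a in rage]
print(agegp)  # [1, 1, 2, 2, None]
```

Note that the missing code has to be handled before the cut at 40, otherwise 99 would be counted as '40 or over'.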

Add appropriate variable and value labels to POLPART and AGEGP.

(Remember to indicate MISSING VALUES.)


Crosstabulate the new political identification variable with age group.
Choose appropriate percentages.

Are older respondents more likely to vote Conservative than younger
ones? Where was the 'Democrat/SLD/Liberal' support concentrated?

Save your output as EXER3.SPO (and print it if you wish). Do not change
bsas91b.sav.
