
Using SPSS

Prepared by Pam Schraedley


January 2002

This document was prepared using SPSS versions 8 and 10 for the PC. Mac versions or other
PC versions may look slightly different, but most instructions here should still work.

Table of contents
1. Getting Started
1.1 Entering data from scratch
Defining Variables (SPSS version 8)
Defining Variables (SPSS version 10)
1.2 Importing data from Excel
2. Getting your data in shape
2.1 Calculating variables
2.2 The If button
2.3 Recoding Variables
Recoding into Same Variables
Recoding into Different Variables
Special case: Median (or tertile or quartile) splits
2.4 Select cases
2.5 Merging files
Adding cases
Adding variables
3. Analyzing your data
3.1 Independent Samples t-test
3.2 Paired t-test
3.3 Oneway simple ANOVA
3.4 Chi square contingency test
3.5 Correlations (simple and partial)
3.6 Regression
3.7 ANOVA models and GLM
Repeated Measures
3.8 Reliability
4. Taking a look at your data
4.1 Checking the numbers
Frequencies
Tables
4.2 Graphing and plotting
Scatterplots
Histograms
Bar charts
5. Output
5.1 Organizing your output
5.2 Results Coach
6. Using Syntax
6.1 The Paste function
6.2 Creating a Session Journal
7. For more information

1. Getting started
1.1 Entering data from scratch:
You will first want to create a template into which to enter data by defining variables. This is
done differently in SPSS 8 and SPSS 10, and is the most commonly used feature that differs
between the 2 versions.
Defining variables (SPSS version 8)
Important note about entering data in SPSS:
SPSS likes it best when all of the data for one subject are on one line. For doing paired
t-tests, repeated measures ANOVAs and complicated ANOVA designs, your life will be easier if
you enter your data this way. If you generally have a huge matrix of data for each subject in
which this would be prohibitive, maybe SPSS is not the stats package for you. One way to make
these huge datasets more palatable is to enter the data into several smaller datasets and then
merge the files later. Instructions on how to do this are in section 2.5.

Under the Data tab, click Define Variable. This will bring up the following window:

Type your variable name into the Variable Name box (circled in red above). Variable names
must have 8 characters or fewer. Specify the variable Type by clicking on Type (circled in green
above). Numeric is the default, but date or string are other common types. If numeric, you can
specify the number of decimal places here. Specify whether your variable is scale, ordinal, or
nominal (circled in purple above). It has to be scale if you want to do things like add and
average it or to do typical statistics like t-tests. Specify labels for your variables by clicking on
Labels (circled in blue above). I strongly recommend that you do this. Many a grad student
has come back to their data a year later and had no idea what boms47 meant. Here is the Labels
dialog box:

Specify a Variable label (i.e. tell yourself that boms47 is the Brat-o-meter Scale, question #47,
hairpulling). Enter this information in the Variable label box (circled in red). Then specify
value labels if appropriate. For example, entering 1 may mean the person responded with "I
never pull people's hair"; 2 means "I pull people's hair occasionally"; 3 means "I pull people's
hair often", etc. In that case, you would enter 1 in the Value box above (circled in green), enter
"I never pull people's hair" into the Value label box (circled in blue), then click Add (red arrow).
Your new value label will appear in the box circled in purple. Do that for value=2, 3, and so on
until you have all of your values entered. Then click Continue to return to the Define Variable
dialog box.
Click OK in the Define Variable dialog box, and that variable will be created. If you want to
create a whole slew of similar variables (e.g. boms1 - boms50), there may be easier ways. You can
do one, and then copy and paste the syntax to create all of your variables. I'll explain how to do
this in the Using Syntax section below.
Defining variables (SPSS version 10)
The good news is that defining variables got much easier in SPSS version 10.
At the opening screen, you will see two tabs at the bottom of the grid (circled in red below): You
start out in the Data View tab. You can click on the Variable View tab to define variables.

Once in Variable View, you can enter a variable name in the first column, labeled Name. Here,
I have entered our old boms47 into the first column, and all of the defaults have filled
themselves in:

At the red arrow, you can see all of the characteristics of your variable that can be specified,
including our old Label (meaning variable label) and Values (meaning value labels). Again, I
strongly recommend that you use variable and value labels. If you click in the Values box for
your variable, you will see 3 little dots in a box (circled in red below). Clicking on those dots
will pop up the Value Labels dialog box (circled in green below). You can add value labels using
this dialog box in the same way you did in version 8.

1.2 Importing data from Excel


Importing data from Excel is easy. You can type your variable names (again, 8 characters or
fewer) in the first row (see red arrow below) and enter data below that. Once you are done, save
the file as a Microsoft Excel 4.0 Worksheet. Most versions of SPSS cannot read anything
newer than Excel version 4.0. Excel will prompt you to OK some things (for example, that you are
only saving the active worksheet, not the whole book, and that some features may be lost). If you
are dealing with ordinary data, this should be fine. When in doubt, save as the current Excel
version first as a backup.

Once you have your data in Excel 4.0 format, open SPSS. Click on File → Open → Data (red
arrow below), which will open the Open File dialog box. Under Files of type, choose Excel
(*.xls) (green arrow below) to show your Excel 4.0 file. Choose your file (note: it must not be
currently open in Excel or you will not be able to open it in SPSS).

Once you choose the file, the Opening File Options box (right) will pop up. If you put variable
names in your Excel file (which I do recommend), make sure the Read variable names box is
checked (red arrow right). Then click OK.
This will read in your data (green arrow below) and pop up an output window (outlined in red
below) that will show a Log that tells you how many variables and how many cases were read in.
Check to make sure this is correct. Variable names longer than 8 characters will be truncated.
Errors may result from funny characters in your variable names, or duplicates. In that case,
SPSS will give some dummy variable name and will report an error in the Log. You should still
go in and give Variable and Value labels as before.

2. Getting your data in shape


2.1 Calculating variables
If you have a questionnaire or a scale, and need to combine variables or do any calculations on
them, under the Transform tab, click on Compute (circled in red to right). This will pop up the
Compute Variable dialog box (below). Type in the name of your new variable in the Target
Variable box (circled in red below). You can click on the Type&Label button below that to
include a Type and label (as you did for your entered variables). Then you create the calculated
expression in the Numeric Expression box (where the green arrow is). Here, I have taken the sum
of three of our variables and divided by three, that is, taken the mean. I sent those variable
names over to the Numeric Expression box by highlighting them in the variable list (where the
red arrow is) and clicking the arrow button (circled below in green). Then you can use the keypad
buttons in the dialog box (or on your keyboard) to include arithmetic operators, parentheses, etc.
Instead of adding and then dividing, I could have used the MEAN() function in the function box
(blue arrow below). In this case, the same variable could have been created by typing
MEAN(boms45, boms46, boms47). This function box also has useful functions like LN() (taking
the natural log), MIN() (taking the minimum of the variables in its parentheses) and others.
In some cases, you may want to compute a variable one way for one set of subjects and another
way for another set. In that case you would use the If button (purple arrow below) to specify
the conditions under which you want to compute this variable in the way that you are specifying.
I'll describe that below in section 2.2. When you're done and click OK, your new variable will
appear at the end of your dataset (i.e. as the last variable).
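
One practical difference between the two approaches is how they treat missing values. The sketch below is in Python rather than SPSS syntax, uses made-up item scores, and uses None to stand in for a missing value. It assumes the usual SPSS behavior, which is worth checking in your own version: MEAN() averages whatever valid values are present, while adding and dividing gives a missing result as soon as one item is missing.

```python
def spss_style_mean(*values):
    """Mean of the non-missing values; None stands in for a missing value."""
    valid = [v for v in values if v is not None]
    return sum(valid) / len(valid) if valid else None


def sum_then_divide(*values):
    """(a + b + c) / n: missing as soon as any input is missing."""
    if any(v is None for v in values):
        return None
    return sum(values) / len(values)


# Hypothetical cases, each holding (boms45, boms46, boms47).
cases = [(1, 2, 3), (4, None, 2)]

means = [spss_style_mean(*c) for c in cases]   # second case averages 4 and 2
sums = [sum_then_divide(*c) for c in cases]    # second case stays missing
print(means, sums)
```

If a subject skipped one question, the two formulas will therefore not give the same answer, so it is worth deciding which behavior you want before computing scale scores.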

2.2 The If button


The If button will show up in quite a few dialog boxes, as it does in the Compute Variable box
above. Clicking it yields the If Cases dialog box below. The first thing to do is make sure the
Include if case satisfies condition radio button is selected (red arrow below). Then place your
condition in the box below that (where the expression is circled in green). Here, I have asked
that only cases where id <= 8 be included (circled in green), so only subjects with ID numbers
less than or equal to 8 will be included. The buttons circled in blue below are the operators you
can use to build your conditions. The table below gives definitions for the operators. Right
clicking on them will also give you their definitions.

=     Is equal to
~=    Is not equal to
<     Less than
<=    Less than or equal to
>     Greater than
>=    Greater than or equal to
&     AND
|     OR
~     NOT

For example, right clicking on ** will tell you that it is the exponentiation operator. So 5**2 is
5 squared, or 25. As another example, if you wanted to limit cases to females (where sex=1 means
female) who were also at least 18 years old, you would enter sex=1 & age >= 18 in the box.

Once you are finished with your If condition, clicking Continue will return you to the Compute
Variable dialog box, or whatever box you were in prior to clicking the If button.
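
For readers who find a condition easier to read as code, here is a rough Python equivalent of the sex=1 & age >= 18 condition. This is only an illustration of the logic, not SPSS syntax, and the four cases are invented.

```python
# Hypothetical cases; sex is coded 1 = female, as in the text.
cases = [
    {"id": 1, "sex": 1, "age": 17},
    {"id": 2, "sex": 1, "age": 18},
    {"id": 3, "sex": 2, "age": 30},
    {"id": 4, "sex": 1, "age": 25},
]


def satisfies(case):
    # SPSS condition: sex=1 & age >= 18
    # (& is AND, | is OR, ~ is NOT, ~= is "not equal to")
    return case["sex"] == 1 and case["age"] >= 18


selected_ids = [c["id"] for c in cases if satisfies(c)]
squared = 5 ** 2  # ** means exponentiation in Python as well

print(selected_ids, squared)
```

Only the cases for which the condition is true are passed on to whatever operation you launched the If box from.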

2.3 Recoding Variables


Once you have your variables in, you may decide that you need to recode. For example, you
may need to reverse code certain variables. Clicking on Transform → Recode (as below) gives
you two options: Into Same Variables or Into Different Variables (circled in red below).
Recoding into different variables leaves your initial variable intact. Recoding into the same
variable does not.

For example, let's say that boms46 is a reverse coded item (e.g. 1 is "I am a big brat"; 2 is "I am
a medium sized brat"; etc.) so 1 becomes 4, 2 becomes 3, and so on.
Recoding into Same Variables
To the left is the Recode into Same Variables dialog box. I have clicked boms46 over into the
Numeric Variables box to be recoded. You can send over more than one variable at a time if they
will need the same recoding operation (e.g. do all of your reverse coded items at once). I will
then click the Old and New Values button (circled in red). You will also see our old friend the
If button, which will let you specify the conditions under which you want to recode.
Clicking on Old and New Values (above) brings up the Old and New Values dialog box below.
Entering a 2 into the Old Value box (red arrow) and a 3 into the New Value box (green arrow)
and then clicking the Add button (circled in red) will make all 2s in the boms46 column change
to 3s. Once you add them, they will appear in the Old → New box where the 1 → 4 already
appears. You can also change a range of numbers using the three Range options (outlined in
purple) or change all remaining values to some value. For example, you could recode all other
values to system missing by clicking All other values on the left (inside the purple square) and
System missing on the right (blue arrow), then clicking Add. Or change missing values to zeros
by clicking System missing on the left (above the purple box) and entering zero in the New Value
box on the right (green arrow), then clicking Add. Don't forget to click Add. (It's easy to forget.)
When you're done, click Continue to go back to the Recode dialog box. Then click OK.
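
The Old → New list is really just a lookup table. As a rough illustration in Python (not SPSS syntax), with invented responses and None standing in for a missing value, reverse coding a 4-point item looks like this:

```python
# Old -> New pairs for a 4-point item, as entered in the dialog box.
reverse = {1: 4, 2: 3, 3: 2, 4: 1}

# Hypothetical boms46 responses; None stands in for a missing value.
boms46 = [1, 4, 2, None, 3]

# Values with no Old -> New pair come out missing in this sketch,
# similar to recoding "All other values" to system missing.
boms46r = [reverse.get(v) for v in boms46]

# For a 1-to-4 scale the same result is (max + min) - value, i.e. 5 - value.
shortcut = [None if v is None else 5 - v for v in boms46]

print(boms46r)
```

The subtraction shortcut is handy in the Compute box when the scale has many points, since you avoid typing every Old → New pair.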

Recoding into Different Variables


If instead you want to keep your original variable and just make a new one that recodes the
original, use Recode into Different Variables. That dialog box is shown below. In this example,
I have clicked boms46 over to the Numeric Variable → Output Variable box (circled in red
below) and typed boms46r (my new variable name) into the Output Variable Name box (circled
in green below). Notice I have also entered a variable label into the Label box (green arrow
below). Once you do this, click the Change button (red arrow below) and the question mark in
the red circle below will change to boms46r. If you want to do more than one variable at this
stage, click over the next variable into the Numeric Variable → Output Variable box and repeat
this process of typing in the new variable name and label. You have to click the Change button
between each variable that you want to change. Once you are done, click the Old and New
Values button (circled in blue to left). This will send you to a dialog box identical to the one
for Recode into Same Variables (above). Follow those instructions to recode your variable(s),
then click Continue and OK. Your new variable will be at the end of your dataset.
Special case: Median (or tertile or quartile) splits
One common form of recoding is to divide your variable values into two groups, split at the
median (or into four quartile groups, etc.). To do this in SPSS, there is a secret function in Rank
Cases. Click on Transform → Rank Cases (circled in red below). This will bring up the Rank
Cases dialog box (below). Click over the variable(s) you want to recode (circled in green below)
then click the Rank Types button (circled in blue below).

You should leave the default of Assign Rank 1 to Smallest Value (red arrow above) unless you
want your highest values to be assigned a value of 1 in your new recoded variable. Clicking on
Rank Types (circled in blue above) will get you to the Rank Cases: Types dialog box below.

To do a median split, check the Ntiles box (red arrow to left) then enter 2 in the box (green
arrow to left). Entering 3 would give you tertiles, 4 would give you quartiles, etc. You can also
do a simple ranking by checking the Rank box (blue arrow to left), but that is not what is needed
for the median split, so it is not checked. Click Continue to go back to the Rank Cases dialog box.
Clicking on the Ties button (circled in purple above) will give you options for how to deal with
ties. The default is to give them the mean of the tied ranks, but you can also put ties into the
lower or higher category if that is what you need. Once you are done, click OK in the Rank Cases
dialog box. This will create a new variable that has the same name as your old variable with an
n at the beginning. So in this example, we created nboms45, which takes on the value 1 if
boms45 was below the median and 2 if boms45 was above the median. This variable will be
placed at the end of your dataset.
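
To make the idea of an Ntiles split concrete, here is a small Python sketch (not SPSS syntax). The grouping rule used, floor(rank * groups / (N + 1)) + 1 with tied values sharing their mean rank, is an assumption chosen for illustration; SPSS's exact rule and tie handling may differ, so check a few cases against your own output. The scores are invented.

```python
import math


def ntile_groups(values, n_groups):
    """Assign each value to one of n_groups rank-based groups (1 = lowest)."""
    ordered = sorted(values)
    n = len(values)
    groups = []
    for v in values:
        # Tied values share the mean of the 1-based positions they occupy.
        first = ordered.index(v) + 1
        last = first + ordered.count(v) - 1
        mean_rank = (first + last) / 2
        groups.append(math.floor(mean_rank * n_groups / (n + 1)) + 1)
    return groups


# Hypothetical boms45 scores.
boms45 = [1, 2, 3, 4, 2, 3]
nboms45 = ntile_groups(boms45, 2)  # 2 groups = median split
print(nboms45)
```

Passing 3 or 4 as the second argument gives tertile or quartile groups in the same way.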
2.4 Select Cases
If you want, for example, to limit an analysis to only female subjects 18 or older, or to only Time
2 data, you can do that using the Select Cases function in SPSS. Click Data → Select Cases
(circled in red below). That brings up the Select Cases dialog box. To select a subset of cases
based on some condition, click the If condition is satisfied radio button (red arrow below) then
click the If button (blue arrow below). The default is that cases that do not meet your
condition are merely filtered (I'll show you what that looks like in a minute). But you can also
change that so that unselected cases are deleted (see green arrow below). Clicking the If
button takes you to our old friend the If dialog box, which you already know how to use. Set
your condition (e.g. sex=1 & age >= 18, time=2, etc.) then click Continue, then OK. Cases will
be filtered (or deleted). At this Select Cases dialog box, you can also take a random sample of
cases. When you are done with your specific analysis and want all of your data again (assuming
you filtered and did not delete), just go to Data → Select Cases again and click the All cases
radio button (purple arrow below) then OK.

When you filter cases, a diagonal line will go through the case number, as shown to the right
(column indicated by the red arrow), for cases that are being filtered out, that is, for cases that
are NOT selected. In this case, I selected cases if boms45 <= 2, so all 3s and 4s are filtered. Any
analyses I do at this point will not include any subjects who scored a 3 or 4 on boms45. Don't
forget to select all cases again when you are done. Incidentally, this also creates a variable
called FILTER_$ in your dataset that takes a value of 1 if the case is selected and 0 if it is
filtered out. You can ignore that variable if you like, but sometimes it can be useful.

2.5 Merging Files


Sometimes it is easier to enter data into multiple separate data files (Time 1 data and Time 2 data
for example, or each questionnaire in a separate data file) to keep file size more manageable.
But at some point, you may need to look at the data all together, that is, you need to merge your
data files. There are two ways to merge data files: adding cases (or adding subjects) and adding
variables.

Adding cases
To add cases to an existing data file, go to Data → Merge Files → Add Cases (circled in red
below). That will pop up the Add Cases: Read File window shown below. Click on the file that
contains the cases you need to add. In this case, that is boms3.sav.

Once you choose the data file (boms3 in this case), the Add Cases dialog box will come up (to
left). The variables with the same variable name are assumed to be paired up and will appear in
the new file (in green box). Any variables that do not have the same name will be dropped from
the new data file. In this case, however, we can see that ID# was called id in one data file and
idnum in the other (circled in red to left). If we want to include those in the new data file, we
would click on both id and idnum (which will highlight both of them) and then click the Pair
button (red arrow), which will then include id and idnum in the new data file as one variable.
When you're done pairing variables, click OK. You will get a new dataset that includes all of the
cases from both of your datasets.
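
Conceptually, adding cases stacks the rows of one file under the rows of the other, keeping only the variables that pair up. The Python sketch below (not SPSS syntax) illustrates that logic with two tiny invented files; the add_cases helper and the variable called extra are hypothetical.

```python
# Hypothetical contents of the two files; the ID variable is named differently.
file_a = [{"id": 1, "boms45": 2}, {"id": 2, "boms45": 4}]
file_b = [{"idnum": 3, "boms45": 1, "extra": 9}]

# The equivalent of clicking Pair: treat idnum in the second file as id.
pairs = {"idnum": "id"}


def add_cases(first, second, pairs):
    """Stack the rows of both files, keeping only variables present in both."""
    renamed = [{pairs.get(k, k): v for k, v in row.items()} for row in second]
    shared = set(first[0]) & set(renamed[0])  # unpaired variables are dropped
    keep = [k for k in first[0] if k in shared]
    return [{k: row[k] for k in keep} for row in first + renamed]


merged = add_cases(file_a, file_b, pairs)
print(merged)
```

Note that extra disappears from the result because it has no partner in the first file, which mirrors what happens to unpaired variables in the dialog box.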


Adding variables
Adding variables is a little trickier. You will need a variable with the same name in both files
(for example, id). Before you start, you have to sort BOTH data files in ascending order by that
variable, which SPSS calls a key variable. For example, you will see in the case to the right that
the variable id (our key variable) is NOT sorted (red arrow to right). Merging to add variables
will not work in this case. To sort by id, click on Data → Sort Cases (circled in green below).
This will pop up the Sort Cases dialog box. As you can see, I have clicked over id into the Sort
by box (red arrow below), and it is sorted in ascending order (the default). Once you do this to
both data files, you are ready to merge and add variables.

Go into one of your data files, and click Data → Merge Files → Add Variables (circled in red
below). This will pop up the Add Variables: Read File window. Choose the (sorted) file that has
the additional variables you want to add to your current (sorted) data file. In this case that is
boms2 (green arrow below). Click OK.


This will pop up the Add Variables dialog box to the right. The red arrow shows that one of the
id variables is being excluded, while the rest of our variables are in the New Working Data File
box (blue box). The Key Variables box is currently empty (outlined in green). This is NOT what
you want; it is just how the dialog box pops up by default. You will want to check the Match
cases on key variables in sorted files box (green arrow to right), then click on id (red arrow to
right) and send the id variable over to the Key Variables box (in green above) using the arrow
button (circled in red above). Once you have done this, it will look like the dialog box to left.
Then click OK. SPSS will warn you once again that your key variables must be sorted. Click OK
on that and your new dataset will be formed. You can also exclude extraneous variables at this
stage. For example, if you calculated a total score from a questionnaire and don't need all of the
individual items, you can click on them in the New Working Data File box and send them (using
the little arrow button) over to the Excluded Variables box. This is a good way to clean up your
dataset so you are only looking at the variables you need. But make sure you keep your original
data somewhere so you don't have to re-enter it.
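
Matching on a key variable means lining up each subject's row in one file with the same subject's row in the other. The Python sketch below (not SPSS syntax) shows the idea with invented Time 1 and Time 2 files. It is a simplification in two ways, both assumptions of this sketch: it looks rows up by id instead of walking through two sorted files, and it keeps only the cases found in the first file.

```python
# Hypothetical Time 1 and Time 2 files, matched on the key variable id.
time1 = [{"id": 1, "bomstot": 10}, {"id": 2, "bomstot": 14}, {"id": 3, "bomstot": 9}]
time2 = [{"id": 3, "bomstot2": 11}, {"id": 1, "bomstot2": 12}]


def add_variables(left, right, key):
    """Attach the variables in right to the matching rows of left."""
    left = sorted(left, key=lambda r: r[key])
    lookup = {r[key]: r for r in right}          # find rows by key value
    new_vars = [k for k in right[0] if k != key]
    result = []
    for row in left:
        match = lookup.get(row[key], {})
        # A subject with no match gets None (missing) for the new variables.
        result.append({**row, **{k: match.get(k) for k in new_vars}})
    return result


combined = add_variables(time1, time2, "id")
print(combined)
```

Subject 2 has no Time 2 row, so bomstot2 comes out missing for that subject rather than borrowing someone else's score, which is exactly the mistake that matching on a key is meant to prevent.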

3. Analyzing your data


Yay! Your data are all neat and tidy and ready to be analyzed. I have created a dataset called
bomsclean.sav to use as an example. It contains: ID#, gender, family (only child vs. firstborn vs.
laterborn), age, bomstot (the total score on our Brat-O-Meter Scale), rbomstot (a median split on
the bomstot variable), and bomstot2 (another Brat-O-Meter scale taken a week later). You see
why labels become important! By the way, these are completely made-up data, so you should
not take any results reported here as representing anything other than unconscious biases in
random data creation on the part of... well, me.
3.1 Independent Samples t-test
OK, let's start simple: a t-test. Are men more bratty than women? A t-test on bomstot by
gender.
Go to Analyze → Compare Means → Independent Samples T-Test (circled in red to right). Note:
In SPSS 8 this menu is called Statistics, not Analyze, but everything else is the same. This will
pop up the Independent Samples T-Test dialog box (to right). Click your dependent measure(s)
(here, bomstot; red arrow to right) into the Test Variable(s) box. You can do a bunch at once.
Send your binary variable (here, gender; green arrow to right) into the Grouping Variable box.
You will see 2 little question marks in parentheses next to your grouping variable. This means
you need to define your groups. Click the Define Groups button (blue arrow to right). This will
bring up the Define Groups dialog box (below). Simply enter the values of your grouping variable
(here 1=female; 2=male; circled in red below). You could also specify a cut point and do your
t-test that way (e.g. compare people who scored above 10 to those who scored below 10 on some
scale) by using the Cut point radio button (green arrow to left). I tend not to use this option.
Click Continue and then click OK in the T-Test dialog box. The Options button in the T-Test
dialog box (above) doesn't do much of interest. It does allow you to change the confidence level
of the confidence intervals that the t-test spits out. The default is a 95% confidence interval,
which is what most people want.
Here is the t-test output:

The window to the left above shows an outline of all of your output. I like to rename the tests so
I can see what I've done. For example, I would call this "T-test of bomstot by gender" (rather
than just "T-test"). I'll show you how to do that later in the output section. You can see that
SPSS has spit out the two categories (female and male; red arrow above), the N for each group
(green arrow above) and the mean for each group (blue arrow above) as well as the standard
deviation and the standard error. Woohoo, boys are brattier than girls according to the means,
but is it significant? Levene's test for equality of variances (outlined in red above) is not
significant, so the variances can be assumed to be equal. In that case, you use the first line of
results (in purple above). If Levene's test had been significant, we would use the lower line of
results (in orange above). You can see the t value, degrees of freedom, and p value in the green
box above, and the 95% confidence interval for the difference in the blue box. In this case, men
and women are not significantly different on the Brat-O-Meter Scale, t(28)=-1.529, p=.137.
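
If you are curious where the t value on the equal-variances line comes from, the Python sketch below (not SPSS syntax) computes the standard pooled-variance t statistic by hand. The two small groups are invented and are not the bomsclean.sav data, and the sketch does not compute a p value.

```python
import math
from statistics import mean, variance


def pooled_t(group1, group2):
    """Equal-variances t statistic and its degrees of freedom."""
    n1, n2 = len(group1), len(group2)
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean(group1) - mean(group2)) / se, df


# Hypothetical scores: group 1 = female, group 2 = male.
female = [2, 4, 6]
male = [5, 7, 9]

t, df = pooled_t(female, male)
print(round(t, 3), df)
```

The sign of t simply reflects which group is entered first: a negative value here means the first group (female) has the lower mean.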
3.2 Paired t-test
To do a paired t-test in SPSS, we will use the Time 1 vs. Time 2 bomstot variables. This will test
whether people were brattier at the first time point (let's say, right before a visit to see parents)
than at the second (right after the same visit). Go to Analyze → Compare Means → Paired
Samples T-Test (circled in red below). This will pop up the Paired Samples T-Test dialog box
below. Click on the 2 variables that you want to compare (here bomstot and bomstot2; green
arrows below) then click the arrow button (circled in blue below). This will pair those two
variables. Again, the Options button only allows you to change the percentage on your
confidence interval, and the default is 95%.

This will give you your paired variables in the Paired Variables box (see example to right). You
can pair up as many variables as you want and do the t-tests at the same time. This is not a
multivariate test; it simply saves you trouble and does individual t-tests in a batch. Click OK
when you're done. You can see from the output (below) that there is a very low correlation
between Time 1 and Time 2 (outlined in red). This would indicate that the boms scale has low
test-retest reliability. You can also see that there is not a significant effect of time (or not a
significant difference between Time 1 and Time 2 boms score), t(29)=.499, p=.622 (outlined in
green). The blue box, again, shows the confidence interval of the difference.
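
A paired t-test is just a one-sample t-test on each subject's difference score, which is why the data for one subject need to be on one line. The Python sketch below (not SPSS syntax) shows the calculation with four invented subjects; it does not compute a p value.

```python
import math
from statistics import mean, stdev


def paired_t(first, second):
    """t statistic and degrees of freedom for paired observations."""
    diffs = [a - b for a, b in zip(first, second)]  # one difference per subject
    n = len(diffs)
    se = stdev(diffs) / math.sqrt(n)
    return mean(diffs) / se, n - 1


# Hypothetical Time 1 and Time 2 totals for four subjects.
bomstot = [10, 12, 9, 14]
bomstot2 = [9, 12, 10, 11]

t, df = paired_t(bomstot, bomstot2)
print(round(t, 3), df)
```

The degrees of freedom are the number of pairs minus one, which is why 30 subjects give t(29) in the output described above.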

3.3 Oneway simple ANOVA


The oneway ANOVA works pretty much like an independent samples t-test. We'll do an
ANOVA to determine whether birth order has an effect on brattiness. Go to Analyze → Compare
Means → One-Way ANOVA (circled in red below). This will pop up the One-Way ANOVA dialog
box. You can see I have clicked over birth order into the Factor box and bomstot into the
Dependent List box (again, you can analyze more than one dependent measure at once). Here,
you do get more options. Clicking the Options button (blue arrow below) takes you to the
Options dialog box (to right). I generally check the Descriptive box (red arrow to right) to get
descriptive statistics of the dependent measures for my groups. You can also check the
Homogeneity of variance box (green arrow to right) to check that assumption of the ANOVA.


You can also click on the Post Hoc button (green arrow to left), which will bring up the Post Hoc
Multiple Comparisons window below. There are many post-hoc techniques to choose from.
Simply check the box or boxes you want. You can change the familywise error rate by changing
the value in the Significance level box (circled in red below). As for choosing a post-hoc? I
generally use Tukey; it seems like a good mix of controlling error and not being too conservative
(Bonferroni is among the most conservative).

Clicking on the Contrasts button (red arrow above) will take you to the Contrasts box below.
Enter the coefficients for your linear contrast one at a time into the Coefficients box (red arrow
below) then click the Add button after each one (green arrow below). In this case, we will
compare only children with people who have siblings. So only children get a coefficient of -2
and first and laterborns each get a coefficient of +1, so that the coefficients sum to zero. I
have already entered the -2 and the first +1. Now the second +1 is in the box. When I click Add
it will appear below the other two (blue arrow below). The order of the coefficients is important.
For the family variable, 1=only, 2=firstborn, and 3=laterborn, so the first coefficient in the box
will go to only children and so forth. You can also check for linear or quadratic (or other
polynomial) trends by checking the Polynomial box (purple arrow below) and then choosing
linear, quadratic, etc. from the pulldown menu beside it. That doesn't make sense for this
example, but might make sense if your factor was something like increasing dosages of a
medicine. When you are done, click Continue, and then click OK in the ANOVA box.
On the next page you will see the results of this analysis. It starts with the descriptive statistics
that we selected in the Options window. It gives you the N, mean, standard deviation, etc. (red
arrow below), as well as the minimum and maximum scores for each group (green arrow below).
The Ns and mins and maxes are good numbers to double check to make sure there are no errors
in your dataset (or 99s that someone entered as a missing value). Next, Levene's test for
homogeneity of variances is not significant (circled in red above), so equal variances can be
assumed. Next is a typical ANOVA table (outlined in green above) including SS, df, Mean
Squares, F, and p. This analysis is not significant (probably because the data are completely
random). Because you do not have a significant main effect, you should stop here, but we will
look at the output from the contrasts and post-hocs anyway as a learning exercise. In real life,
you do not look at these tests if your main ANOVA is not significant. The blue box above shows
the contrast coefficients; this is just a double-check. Next you have the contrast tests.
Because Levene's test above was not significant, you can use the first row of numbers (assume
equal variances). This table includes the contrast value (blue arrow above), the t value (purple
arrow above), df (orange arrow above), and significance (pink arrow above). In this case, the
contrast value was -3.30 and was not significant, t(27)=-.774, p=.446. Finally we come to the
multiple comparisons. In the blue box above, you can see the mean difference for each pairwise
comparison and the significance value. When a difference is significant, the mean difference is
starred. The purple box above shows the confidence intervals for the differences. These all
include zero, confirming that our differences are not significant.
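
To connect the ANOVA table and the contrast value to the raw numbers, here is a small Python sketch (not SPSS syntax) that computes F and a contrast value by hand. The three tiny groups are invented and are not the bomsclean.sav data; the coefficients -2, +1, +1 compare only children with the two sibling groups, and no p values are computed.

```python
from statistics import mean


def oneway_f(groups):
    """F ratio with its between-groups and within-groups degrees of freedom."""
    all_values = [v for g in groups for v in g]
    grand = mean(all_values)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within), df_between, df_within


def contrast_value(groups, coefficients):
    """Weighted sum of group means; the coefficients should sum to zero."""
    return sum(c * mean(g) for c, g in zip(coefficients, groups))


# Hypothetical scores, in the order 1=only, 2=firstborn, 3=laterborn.
only, first, later = [1, 2, 3], [2, 3, 4], [3, 4, 5]

f, df1, df2 = oneway_f([only, first, later])
value = contrast_value([only, first, later], [-2, 1, 1])
print(f, df1, df2, value)
```

Because the coefficients are applied in group order, swapping their order changes which groups are being compared, which is why the order in the Coefficients box matters.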
3.4 Chi square contingency test
This is the SPSS question that I have fielded more than any other. This oft-used test is just not
where you would think. As an example, we can examine whether gender is associated with
scoring above or below the median on the bomstot variable (using our median split nbomstot).
Go to Analyze → Descriptive Statistics → Crosstabs (circled in red below). Click your two
categorical variables into the Row and Column boxes (it doesn't matter which goes into which).
Then click the Statistics button (green arrow to right). This will pop up the Crosstabs: Statistics
box below. Check the Chi-square box (red arrow below) to perform the Chi-square test on your
contingency table. Click Continue, then OK.


Also notice that the Crosstabs: Statistics box is where you would go to perform a Kappa
reliability test (blue arrow above). Kappa is the reliability statistic used when two raters make
categorical judgments rather than continuous ratings.
Below is the Chi-square test output. First, you'll see a Case Processing Summary (circled in
green to left). This will pop up in many of the statistics you do. It's good to check that you have
the expected number of cases included and are not missing large portions of data. Next is the
crosstab, or contingency table (red arrow to left). Finally the Chi-square test is reported (in blue
box to left). The Pearson Chi-square on the first line is the typical test used for data of this sort.
Notice that SPSS will warn you if you have expected cell counts lower than 5 (purple arrow to
left). This test should not be used in that case.
Note that the Chi-square model fit test is under Analyze → Nonparametric Tests → Chi-square.
This is a different test, one in which you assign expected values to cells and test the goodness of
fit of that model. The fact that these are called the same thing has tricked many an SPSS user.
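
For the contingency-table version of the test, the expected count in each cell is (row total × column total) / N, and the Pearson Chi-square sums (observed - expected)² / expected over the cells. The Python sketch below (not SPSS syntax) works through an invented 2 × 2 table and also reports the smallest expected count, which is the number behind the "lower than 5" warning.

```python
def pearson_chi_square(table):
    """Pearson chi-square, degrees of freedom, and expected counts for a table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    chi2 = sum(
        (obs - exp) ** 2 / exp
        for obs_row, exp_row in zip(table, expected)
        for obs, exp in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(col_totals) - 1)
    return chi2, df, expected


# Hypothetical counts: rows = gender, columns = below / above the median.
table = [[10, 5], [5, 10]]

chi2, df, expected = pearson_chi_square(table)
smallest_expected = min(min(row) for row in expected)
print(round(chi2, 3), df, smallest_expected)
```

Here every expected count is 7.5, so the usual rule of thumb about expected counts below 5 is not triggered.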
3.5 Correlations (simple and partial)
Simple correlations are a piece of cake in SPSS. You can do a whole slew of 'em if you want.
Go to Analyze → Correlate → Bivariate (circled in red below). Click over all of the variables that
you want to correlate. In this case, we have age, bomstot and bomstot2 (Time 1 and Time 2
brattiness). SPSS will compute all pairwise correlations. That's it, just click OK.

SPSS will spit out a nice table (see below). Each cell has a correlation coefficient, 2-tailed
significance, and N. Each correlation appears twice in the symmetrical table, and there are 1s (as
expected) on the diagonal. Easy as pie. Nothing significant here, as usual.
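If you want to see why the table is symmetrical with 1s on the diagonal, the following Python
sketch builds the same kind of matrix by hand. The scores are hypothetical (six invented
subjects, not the guide's dataset), pearson_r is my own helper name, and the sketch assumes no
missing values. It prints only the coefficients, not the significance or N that SPSS adds.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation of two equal-length lists with no missing values."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical scores for six subjects (not the guide's data)
data = {
    "age":      [8, 9, 10, 11, 12, 13],
    "bomstot":  [30, 28, 35, 31, 29, 33],
    "bomstot2": [32, 27, 36, 30, 31, 34],
}
names = list(data)
for row in names:
    # One row of the matrix: this variable against every variable, itself included
    cells = [f"{pearson_r(data[row], data[col]):6.3f}" for col in names]
    print(f"{row:9s}", " ".join(cells))
```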

SPSS will also do partial correlations, in which you can examine the relationship between two
variables controlling for a third. For example, we can look at the effect of age on Time 2
brattiness controlling for Time 1 brattiness. Go to Analyze → Correlate → Partial (circled in red
below). Send the variables of interest into the Variables box, and the control variable(s) into the
Controlling for box and click OK.

Below you will see the results of this analysis. The (symmetrical) table reports the correlation,
degrees of freedom, and the 2-tailed p-value (outlined in green below). You can see that the
partial correlation of age and bomstot2, controlling for bomstot, is a whopping -.0335.
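With a single control variable, a partial correlation can be worked out from the three ordinary
(zero-order) correlations. The Python sketch below shows that standard first-order formula. The
three input correlations are invented for illustration and will not reproduce the -.0335 above,
and partial_r is my own function name.

```python
from math import sqrt


def partial_r(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y, controlling for z."""
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Hypothetical zero-order correlations (not taken from the guide's output)
r_age_boms2 = 0.10    # age with Time 2 brattiness
r_age_boms1 = 0.20    # age with Time 1 brattiness (the control variable)
r_boms1_boms2 = 0.60  # Time 1 with Time 2 brattiness
print(f"partial r = {partial_r(r_age_boms2, r_age_boms1, r_boms1_boms2):.4f}")
```

With one control variable the degrees of freedom are N - 3, which is why the partial correlation
table reports df rather than N.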

3.6 Regression
The linear regression function in SPSS covers a lot of ground. Go to Analyze → Regression →
Linear (circled in red below). That will pop up the Linear Regression dialog box shown below.
Enter your dependent measure (here we used bomstot) into the Dependent box (red arrow
below). Enter your independent variable(s) (here age) into the Independent(s) box (green arrow
below). You can enter more than one independent variable here. If you are using more than one
independent variable, choose a regression method using the pulldown menu (blue arrow below).
Enter is the default and is standard linear regression, but you can also use stepwise regression,
either forward or backward, enter (and remove) variables in blocks using the Previous and Next
buttons, etc. This is a very versatile dialog box. Of more common use are the Statistics, Save,
and Options buttons.
The Statistics button (outlined in green to left) brings up the Statistics window below.
Checking the Estimates box (red arrow below) gives you estimates of your regression
coefficients (or betas). Checking the Model fit box (green arrow below) gives you an R² for the
regression model. Checking R squared change will tell you the change in R² if each variable
(when you have more than one independent variable) is removed. Finally, checking Casewise
diagnostics (purple arrow below) will give you information on outliers outside a range that you
specify (here 2 standard deviations).

Clicking the Save button (outlined in purple above) allows you to save residuals of various kinds
from your regression in a column in your dataset (outlined in red to left below). This is useful in
examining residuals to look for patterns and in computing corrected means. Finally, clicking
the Options button (outlined in orange above) allows you to remove the constant from your
regression (forcing it to go through zero) by unchecking the Include constant box (orange arrow
to right below). It also gives some options for Stepwise regression.

There are clearly far too many regression options for this guide to explicate all of them, but
again, right clicking on most options in SPSS will give you more information.
To the right is output from a simple but typical SPSS regression analysis. The R and R² are
reported in the Model Summary (red and green arrows to right, respectively). An ANOVA table
for the regression is also reported (outlined in blue to right). This tells whether your regression
model as a whole is predicting a significant amount of variance. Finally, the Beta coefficients
and t-tests for them are reported in the orange box to right. Here, the only thing that is
significant is the Constant (or the intercept). Don't get excited, boys and girls, that doesn't
help you get published.
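For the one-predictor case, the Constant, the slope, and R² can be computed by hand, which
helps when reading the output. The Python sketch below does ordinary least squares for a single
independent variable. The ages and scores are hypothetical, simple_regression is my own function
name, and the slope it returns is the unstandardized coefficient, not a standardized Beta.

```python
def simple_regression(x, y):
    """Ordinary least squares with one predictor: (intercept, slope, r_squared)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    # R squared = 1 - residual sum of squares / total sum of squares
    ss_total = sum((b - my) ** 2 for b in y)
    ss_resid = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    return intercept, slope, 1 - ss_resid / ss_total


age = [8, 9, 10, 11, 12, 13]        # hypothetical independent variable
bomstot = [30, 28, 35, 31, 29, 33]  # hypothetical dependent measure
b0, b1, r2 = simple_regression(age, bomstot)
print(f"constant = {b0:.3f}, slope for age = {b1:.3f}, R squared = {r2:.3f}")
```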
For logistic regression (in which the dependent variable is categorical instead of continuous),
use Analyze → Regression → Binary Logistic (for a two-category DV) or Analyze → Regression →
Multinomial Logistic (for a multi-category DV). Inputs look much the same, except one can use
categorical independent variables as well as continuous. Enter all independent variables into
the Covariates box, then click the Categorical button, which allows you to assign some of your
covariates as categorical. Output will also include a Chi-square goodness of fit test (to test the
goodness of your prediction) and a table of predicted values. A full treatment of logistic
regression is beyond the scope of this guide, but it is fairly straightforward to use the SPSS
functionality if you read and understand a chapter or so on the statistical test that you are
performing.
3.7 ANOVA models and GLM
SPSS offers pretty much any kind of ANOVA model you can think of. Let's start with a univariate
ANOVA. Actually, the univariate GLM encompasses ANCOVA as well. Go to Analyze → General
Linear Model → Univariate (circled in red to right). Click your dependent measure (continuous)
into the Dependent Variable box (outlined in green to right). Click over any fixed factors
(ordinary ANOVA factors, i.e. categorical variables) into the Fixed Factor(s) box (outlined in
blue to right). Enter any random effects factors (such as region, classroom, etc.; check a
statistics textbook if you are not sure) into the Random Factor(s) box (outlined in purple to
right). Finally, enter any continuous predictors, or covariates, into the Covariate(s) box
(outlined in orange to right). There is generally some confusion about the meaning of the word
covariate. Many people use covariate to mean "a variable I don't care about," as in "I'll just
covary out SES." But in statistical and SPSS terms, a covariate is simply a continuous predictor.
You CAN use this method to "covary out" age in the above example, but you would use the exact
same technique if you were interested in the effect of age as well as your factor effects. Whew,
now we have all of our factors and covariates in place, but there's more. Click on the Model
button (red arrow above) to specify anything less than a fully crossed model. For example, let's
say that we are interested in main effects of gender, birth order, and age, as well as the
interaction of gender and age, but no other interactions. We click on Model, which pops up the
Univariate: Model dialog box below to left. Click on the Custom radio button (red arrow below to
left) to specify a custom model. You will see that I have already sent over main effects for
gender and family and am about to send over the main effect of age (green arrow below to left).
Simply click on the effect you want to send over, then click the arrow button (outlined in purple
below to left). On the panel below and to the right, you can see I have sent over the main effect
of age, and also the interaction effect of age by gender (orange arrow below to right). To do this,
just click on both age and gender, then while both are highlighted, click the arrow button
(outlined in purple below to left). Once you have the custom model you want, click Continue.

Going all the way back up to the Univariate dialog box on the previous page, clicking on the
Contrasts button (green arrow on previous page) allows you to specify contrasts on your factors.
Below you will see I have assigned Simple contrasts to the gender variable (red arrow to right).
This is actually less than fascinating, because the gender variable only has 2 levels to begin
with. But for the family variable, which has three levels, you can use simple (in which each
level is compared to either the first or last level), deviation (in which each level except for one
is compared to the overall effect of the variable), repeated (in which each level is compared to
the one previous to it), Helmert, reverse Helmert (a.k.a. difference), or polynomial contrasts
that examine linear and quadratic effects. Highlight the variable for which you want to assign a
contrast in the Factors box, choose a type of contrast from the pulldown menu (green arrow to
right), then click the Change button (blue arrow to right).
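To get a feel for what the contrast types compare, here is a rough Python sketch that applies
several of them to three group means. Everything in it is an assumption for illustration: the
means are invented, the function names are mine, the groups are treated as equal in size, and
the sign of each difference is my choice (SPSS may report a difference in the opposite
direction). Helmert is taken in its usual sense of each level against the mean of the later
levels.

```python
def mean(values):
    return sum(values) / len(values)


def simple_contrasts(m, ref=0):
    """Each level minus the reference level (the first level by default)."""
    return [m[i] - m[ref] for i in range(len(m)) if i != ref]


def deviation_contrasts(m):
    """Each level except the last minus the unweighted mean of all level means."""
    grand = mean(m)
    return [level - grand for level in m[:-1]]


def repeated_contrasts(m):
    """Each level minus the level just before it."""
    return [m[i] - m[i - 1] for i in range(1, len(m))]


def helmert_contrasts(m):
    """Each level except the last minus the mean of all later levels."""
    return [m[i] - mean(m[i + 1:]) for i in range(len(m) - 1)]


# Hypothetical bomstot means: only child, firstborn, later born
family_means = [30.0, 32.0, 37.0]
print("simple   ", simple_contrasts(family_means))
print("deviation", deviation_contrasts(family_means))
print("repeated ", repeated_contrasts(family_means))
print("Helmert  ", helmert_contrasts(family_means))
```

SPSS also tests each of these differences for significance, which this sketch does not attempt.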
Finally, the Options button in the Univariate window (blue arrow on previous page) allows you
to examine multiple comparisons in your factors, request homogeneity tests (green arrow below),
etc. Here, we have requested descriptive statistics (red arrow below) and LSD multiple
comparisons for the family variable (purple arrow below). By using the pulldown menu (blue
arrow below) you can change the comparison technique to Bonferroni or Sidak. This window
also allows you to do such things as report observed power, effect size estimates, etc.
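The difference between LSD, Bonferroni, and Sidak comes down to how the p-value of each
pairwise comparison is adjusted for the number of comparisons. The Python sketch below shows the
textbook versions of the two adjustments. The unadjusted p-values are made up, and the function
names are my own. LSD means no adjustment at all.

```python
def bonferroni(p, m):
    """Bonferroni-adjusted p-value for m comparisons (capped at 1)."""
    return min(1.0, p * m)


def sidak(p, m):
    """Sidak-adjusted p-value for m comparisons."""
    return 1.0 - (1.0 - p) ** m


# Hypothetical unadjusted (LSD) p-values for the three pairwise birth order comparisons
lsd_p = {
    "only child vs firstborn": 0.030,
    "only child vs later born": 0.200,
    "firstborn vs later born": 0.500,
}
m = len(lsd_p)
for pair, p in lsd_p.items():
    print(f"{pair:26s} LSD {p:.3f}  Bonferroni {bonferroni(p, m):.3f}  Sidak {sidak(p, m):.3f}")
```

Sidak is always slightly less conservative than Bonferroni, but for small numbers of comparisons
the two rarely disagree.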

Below and on the following page, you will see the output from this large analysis. First, below
and to the left, the output simply reports your between-subjects factors (red arrow below to
left). You can double-check your Ns here. Next, you have the descriptive statistics that you
requested in the Options box to left (green arrow below to left). This presents a nice table of
means, suitable for later graphing. Next, below to the right, you have Levene's test for equality
of variances (outlined in blue below to right) that you also requested in Options. Because this
test is not significant, you can assume your equal variance assumption was met. Next you have an
ANOVA table (outlined below to right in orange) that reports F, df, p-value, etc. for all of the
main and interaction effects in your custom model.

To the left, you'll see the results of the contrast we requested on the birth order variable.
Level 1 (only child) is not different from Level 2 (firstborn) (see red arrow to left) and Level 1
is not different from Level 3 (later born) (see green arrow to left). The overall test results for
this contrast indicate it is not useful (see nonsignificant p circled in orange to left). Next, we
have the estimated marginal means for birth order controlling for our covariate, age (outlined in
pink to left). SPSS did also spit out pairwise comparisons for the birth order variable, but that
output looks identical to the pairwise comparisons we produced in the simple oneway ANOVA
example, so we will not go through them in detail here.
As you can see, quite a bit of output is generated in response to all of these extra tests. There
is, luckily, some help for you with output that we will explore in the Output section of this
guide.
You can also see that the output and examples get more complicated as the statistics get more
complicated. I strongly urge you not to use any statistics in SPSS that you are not quite familiar
with. It is very easy to point and click your way to mistaken conclusions, and this guide is not
meant to substitute for strong knowledge of the statistics you wish to use.

Repeated measures
To give a full example of the functionality of the Repeated measures GLM, I have added 4 new
variables to our dataset. They are: bomsfam1, bomsfam2, bomsfrd1, and bomsfrd2. These
assess the family and friend subscales of the BOMS scale at Time 1 and Time 2. These will help
me to show an example of a fully crossed within-subjects design.
To run a repeated measures ANOVA, go to Analyze → General Linear Model → Repeated
Measures (circled in red to right). This will pop up the Repeated Measures: Define Factor(s)
dialog box below. Here, you enter each within-subjects factor in your design (saving your
between-subjects factors for later). I have already entered the subscale (family vs. friends)
factor (pink arrow to right). To enter the time factor (Time 1 vs. Time 2), enter time in the
Within-subject factor name box (purple arrow to right), then enter the number of levels for this
factor (blue arrow to right), then click Add (green arrow to right). Once you have added all of
your within-subjects factors, click the Define button (orange arrow to right).
This will pop up the Repeated Measures dialog box below. Here you can enter your
between-subjects factors (here, birth order, blue arrow below) and covariates (here, age, orange
arrow below). You also need to define your within-subjects variables at this point.

I have already defined 3 of the four cells needed. You need to look carefully at the order of your
crossed variables (see red box to left). Here, subscale is the first number in parentheses and
time is the second. So (1,2) would be Family, Time 2. We still need to enter the last cell (2,2)
(see green arrow to left) by clicking over BOMS Friend scale Time 2. The Model, Contrasts, and
Options buttons work the same way as those in the Univariate GLM example above. Once you
have specified all of those to your liking, click OK.

To the left is the first page of output from the repeated measures GLM. This output can be very
confusing. First you have a table of your within-subjects factors (red arrow to left). Next you
see your between-subjects factor(s) (green arrow to left). Next is a large and scary-looking table
of Multivariate tests (orange arrow to left). In most cases, you can actually ignore this table.
The multivariate tests are not necessarily the tests you need to look at, although they are often
equivalent to the within- and between-subjects tests later. Next, something called Mauchly's
Test of Sphericity will print out. In this example, there were not sufficient degrees of freedom
to do this test. If Mauchly's test is significant, you should NOT use the Sphericity Assumed row
in your ANOVA table (red arrow to right); use one of the corrected rows printed beneath it (such
as Greenhouse-Geisser) instead. Otherwise, in most cases, you can assume Sphericity. In fact, in
most cases, all rows within a cell of this table will look the same. This table also gives
information on the error terms for each group of tests, most importantly the MSE for these tests
(green arrows to right). Next, SPSS prints out tests of within-subjects contrasts (red arrow on
next page). It does this even if you don't request it, and uses linear trend contrasts as a
default. These tend not to be useful to most people. You can ignore this table too. Finally, you
get to your between-subjects effects ANOVA table (purple arrow on next page).

You can see that Repeated measures GLM outputs quite a bit of material. You will probably
want to tidy this output up a little, which will be demonstrated in the Output section of this guide.
You can also see that we have a significant 3-way interaction in these data (subscale*time*family
above), thus showing that Type I error will give you a significant result every so often even when
nothing is going on.

3.8 Reliability
Another common analysis is to determine alpha reliability, either for scale or questionnaire
items or among raters or coders. In either case, the items (or people) to be compared must be
entered in columns and the subjects or observations must be entered in the rows. If you have
your data entered backwards, there is a transpose function in Excel's Paste Special window. In
this case, we will use our old BOMS items and determine reliability. Here we will look at
boms1-boms10. Go to Analyze → Scale → Reliability Analysis (circled in red below). This will
pop up the Reliability Analysis dialog box below. Click over all of your items or coders (here,
boms1-boms10) into the Items box. Make sure your Model is set to Alpha (orange arrow
below). You can also set this Model to split-half or some other forms of reliability. If you like,
you can press the Statistics button (outlined in green below). That will take you to a dialog box
in which you can do item analysis (e.g. get the alpha with each item of the scale deleted to see if
any items are pulling your alpha down, etc.). Otherwise, just press OK to see your alpha.

Easy as pie. You can see the Alpha in the simple output below (red arrow below). Generally an
alpha of .7 or higher is considered acceptable.
All things being equal, alpha does tend to get higher as more items (or more coders) are
introduced.
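For the curious, alpha is a short calculation once the items are in columns. The Python sketch
below uses the standard Cronbach's alpha formula on three invented items from five invented
subjects (not the BOMS data), and then recomputes alpha with each item left out, which is the
idea behind the item analysis mentioned above. The function name cronbach_alpha is my own, and
the sketch assumes no missing values.

```python
from statistics import variance


def cronbach_alpha(items):
    """items: one list of scores per item, with the same subjects in the same order."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]  # each subject's scale total
    item_variance_sum = sum(variance(column) for column in items)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))


# Hypothetical responses: three items, five subjects
items = [
    [1, 2, 3, 4, 5],
    [2, 1, 4, 3, 5],
    [1, 3, 2, 5, 4],
]
print(f"alpha = {cronbach_alpha(items):.3f}")

# Alpha with each item deleted in turn
for i in range(len(items)):
    remaining = items[:i] + items[i + 1:]
    print(f"alpha without item {i + 1} = {cronbach_alpha(remaining):.3f}")
```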

4. Taking a look at your data

4.1 Checking the numbers


One way to get a simple look at your data is to look at frequencies or tables. Tables can give you
an idea of means or medians, etc. for your groups. Frequencies can alert you to outliers or data
entry errors. Perhaps this section should have come before data analysis, but I can never resist
getting a peek at significance levels before I tease myself with means and pretty graphs. I'm
weird that way.
Frequencies
Go to Analyze → Descriptive Statistics → Frequencies (circled in red to right). This will pop up
the Frequencies dialog box. Click over the variable(s) you are interested in. Click on the
Statistics button (green arrow to right) to get the box below. There you can ask for quartiles,
mean, median, mode, and other descriptive measures.

You can click on the Charts button (blue arrow to right) to request, for example, a histogram
(red arrow below).
Finally, the Format button (purple arrow above) allows you to do such things as switch your
frequency table order from ascending to descending order by variable values, or to ascending or
descending order by frequency count. The next page has a sample frequency output, with a
histogram requested using the Charts button as indicated by the red arrow to left.
The output is fairly straightforward. It gives the observed values of your variable (red arrow to
right), the observed frequency (green arrow to right), and the percentage of observations with
that value (blue arrow to right). The Valid percent column (purple arrow to right) gives the
percentages based on only non-missing observations (in this case that is the same). Finally you
get the cumulative percent (orange arrow to right). Then you can see the histogram that we
requested, clearly showing one outlier.
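The difference between the Percent and Valid percent columns only shows up when data are
missing, so here is a small Python sketch with one missing value. The scores are invented, None
stands in for a missing observation, and frequency_table is my own function name. As in the SPSS
table, the cumulative percent is built from the valid percents.

```python
from collections import Counter


def frequency_table(values):
    """Rows of (value, count, percent, valid percent, cumulative percent); None = missing."""
    n_total = len(values)
    valid = [v for v in values if v is not None]
    counts = Counter(valid)
    rows, cumulative = [], 0.0
    for value in sorted(counts):
        percent = 100 * counts[value] / n_total            # out of all cases
        valid_percent = 100 * counts[value] / len(valid)   # out of non-missing cases
        cumulative += valid_percent
        rows.append((value, counts[value], percent, valid_percent, cumulative))
    return rows


# Hypothetical scores; 50 would stand out as a possible outlier or entry error
scores = [3, 4, 4, 5, 5, 5, 6, None, 50]
print("value  count  percent  valid %  cumulative %")
for value, count, percent, valid_percent, cumulative in frequency_table(scores):
    print(f"{value:5d}  {count:5d}  {percent:7.1f}  {valid_percent:7.1f}  {cumulative:12.1f}")
```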
Note: Dealing with outliers and transformations
In order to eliminate outliers from analyses, you would use the Select cases function described
earlier.
If your histogram showed you that you needed to transform your data, you would use the
Compute function described earlier to take the square root, inverse, cosine, or whatever
transformation is necessary.

Tables
Tables are also a good way to get a quick look at what's going on in your data in preparation for
graphing. Go to Analyze → Reports → Case Summaries (circled in red below). Click over the
variable you want statistics for in your table (see green arrow below), and click over any
grouping variables (see blue arrow below). Here, we will look at means and standard errors for
bomstot by birth order. I prefer to uncheck the Display cases box (orange arrow below) because I
don't want a frequency table, I just want the summaries, but you could leave that checked if you
wanted a frequency table at the same time.

Click the Statistics button (outlined in green to left) to choose which statistics will go into the
table. Below you can see we have selected mean and standard error of the mean.

Click the Options button (outlined in blue to left) if you want to change the title of your table
or exclude the Total category in your tables (see red arrow below). Below is sample output from
the table we have created.

The output shows the means and standard errors for the three groups sorted by birth order (see
green arrows to right) as well as for the whole sample (red arrow to right).
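The standard error of the mean in this kind of table is just the standard deviation divided by
the square root of the group size. The Python sketch below rebuilds such a summary from
hypothetical bomstot scores grouped by birth order, with a Total row for the whole sample. The
data and the function name mean_and_sem are my own inventions for illustration.

```python
from math import sqrt
from statistics import mean, stdev


def mean_and_sem(scores):
    """Mean and standard error of the mean (sample SD / square root of n)."""
    return mean(scores), stdev(scores) / sqrt(len(scores))


# Hypothetical bomstot scores by birth order
groups = {
    "only child": [28, 31, 30, 33],
    "firstborn":  [32, 35, 29, 34],
    "later born": [36, 38, 35, 39],
}
for name, scores in groups.items():
    m, sem = mean_and_sem(scores)
    print(f"{name:11s} n = {len(scores)}  mean = {m:.2f}  SE = {sem:.2f}")

# Total row: every subject pooled together
everyone = [score for scores in groups.values() for score in scores]
m, sem = mean_and_sem(everyone)
print(f"{'Total':11s} n = {len(everyone)}  mean = {m:.2f}  SE = {sem:.2f}")
```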

4.2 Graphing and plotting


OK, it's pretty picture time. You can use scatterplots to get an idea about the relationship
between two variables, histograms to get an idea about the distribution of your variables, and bar
charts to help interpret interactions or to show your results to your friends and family (I include
grant reviewers in this category).
Scatterplots
To create a scatterplot, go to Graphs → Scatter (circled in red below). This will pop up the
Scatterplot dialog box in which you select a style of scatterplot. A simple scatterplot (red arrow
below) will serve most people's purposes. Choose your style, then click Define. This will pop up
the Simple Scatterplot dialog box below. Choose your X and Y axes from your variable list
(green arrows below). Click on Titles (outlined in blue below) to add titles to your scatterplot.
You will probably not need to click on Options (outlined in orange below).

Once you are finished, click OK to get your scatterplot. You can see the very straightforward
output to right.

Histograms
We saw one way to create histograms using the Frequencies function in the last section. You can
also create them another way. Go to Graphs → Histogram (circled in red below). This pops up
the Histogram dialog box (to left). Click over the variable you want to graph. You can click on
the Titles button to add titles. You can check the Display normal curve box (green arrow to left)
if you want a normal curve superimposed on your histogram.

To right you can see the output from a sample histogram on bomstot2.
Bar Charts
Bar charts are also fairly easy to create in SPSS. Personally, I tend to create my bar charts in
Excel because they are easier to format, and you can add error bars to Excel bar charts. As far
as I know, there is no way to add error bars to SPSS bar charts. This is another of those
frequently asked questions.
To create a bar chart, go to Graphs → Bar (circled in red below). This will pop up the Bar Charts
dialog box below. I tend to use clustered bar charts most often (green arrow below) because they
help to understand what is going on in an interaction. Choose your bar chart style and then
click Define. This will pop up the Define Clustered Bar Charts dialog box below. Select the two
grouping variables by which you want to cluster your data (blue arrows below). These would be
the two variables that interact on the dependent measure. Then click over the continuous
variable that you want to graph into the Variable box (orange arrow below). Note that the Other
summary function radio button must be clicked in order to create this kind of bar chart. You
could, instead, do a bar chart on number of cases, or percentage, using one of the other radio
buttons. You can change the summary function from mean (the default) to median or some other
function by clicking the Change summary button (purple arrow below). Again, you can add titles
by using the Titles button. In this case, I do generally click the Options button and deselect
(uncheck) the Display groups defined by missing values checkbox. If you don't do this, you will
get an extra group for anyone who is missing values in your dataset and it gets in the way, in my
opinion. Once you are done, click OK to see your bar chart.

And here is your completed bar chart. Line charts and other types of graphs are equally simple
to create, so I will leave it to you to play around with the rest of those.

5. Output
5.1 Organizing your output
As we mentioned before, some of these analyses spit out large amounts of output that you don't
really need. In addition, a happy day of data analysis can leave you with more tests than you can
handle, so keeping things organized is the goal of this section.
We have been kind of ignoring the left-hand side of the output window, the organizational part.
You can see in the output to right that it is hard from the output window to know exactly what
analyses were done. The first big help is to rename the tests. Instead of T-test, report WHAT the
t-test was on. You can also click on the little minuses to temporarily hide analyses. Finally, you
can see in the output window above that there are Notes whose icons look like a closed book
rather than an open book. These are hidden sections of output. They will remind you exactly
what analysis you are looking at, whether a filter was in place, etc. You can unhide these notes
(or hide any visible output component) by double clicking on it. To rename a component, do not
double click on it. Rather, click on it twice, slowly, to highlight the name so that you can
change it. Below you will see a much tidier example of the same output, in which we have hidden
the Graph and renamed all of the components.
Still, if you print out your output, none of these pretty organizational things will show up. So
you need to incorporate some organization into the right side of the results window. Double
clicking on any element in the results window allows you to edit it. For example, double
clicking on the t-test title (red arrow above) will allow you to edit that title to read T-test of
bomstot by gender, or whatever is helpful to you. Double clicking on charts and graphs will give
you options to change them, add titles, change the axes, etc. Double clicking on output tables
will allow you to go in and change numbers, or copy and paste the cells out into Excel or some
other program.
To add text to your output, click on Insert → New Text (circled in red to right). This will allow
you to incorporate text notes into your output to help remind you what you did.
5.2 Results Coach
Another way to help you plow through mountains of output that may not make sense is to use
the Results Coach. Double click on a table or component in your results that you want
explained (in this case, we'll use the Tests of between-subjects effects in our ANOVA, red arrow
below). Then go to Help → Results Coach (circled in red below). This will pop up a window that
helps you understand the results you are seeing.

To the left is the Results Coach. Simply hit the Next button (green arrow to left) to cycle
through the information given by the coach. This is a very helpful feature.

6. Using syntax
There are two simple ways to start using syntax. Either you can save a specific analysis by using
the Paste function, or you can log your entire session in a Session Journal.
6.1 The Paste function
All of the functions that you can use in SPSS to compute variables, do statistics, and create
graphs have a little button near the Cancel and OK buttons called Paste. Here is an example
from the Univariate ANOVA case (red arrow below). Hitting the Paste button instead of the OK
button will paste the syntax associated with the action you are about to perform into a syntax
window (which will pop up automatically). Below you will see the syntax associated with this
analysis. To run this syntax, highlight the part you wish to run (all of it in this case) and then
hit the Arrow button (orange arrow below).

If you do this each time you are about to run an analysis, you will have a record of the
statistics you have done. You can go in and edit just as you would text if you make a mistake. You can
also copy and paste and then just make small adjustments in the pasted syntax if, for example,
you need to do something very similar many times. Once you have a syntax file you are happy
with, just save it using the File menu as you would any other file.
6.2 Creating a Session Journal
Actually, SPSS has been creating a session journal, a kind of log file, every time you use SPSS.
But it has been putting it in a temporary directory and probably overwriting it. Go to Edit →
Options (circled in red below). In the Options window, go to the General tab (it will probably
come up by default). Outlined below in green, you will see the Session Journal options. If the
box for Record syntax in journal is not checked, check it. You will see that right now, my syntax
has been recorded in C:\WINNT\TEMP\spss.jnl. You can click the Browse button in the green
box to choose a file or directory that you'd like to save your syntax into. Decide whether you
want to append to the file each time or overwrite it each time you begin a new session. Then
click OK to have this option take effect. This will save all of the syntax for your entire session
into a file that you choose. You can then go in and highlight parts to run them again at a later
time, or else simply keep the syntax as a record of your analyses.

7. For more information


For more information, the SPSS manuals that came with your software are good references.
They are not so hot at getting you started using SPSS, which is why I created this guide, but once
you know what you're doing, they can help you with specific questions. Even easier, though, are
the help files included with SPSS. Right clicking on most things will give you an option to
choose "What's this?" or may simply pop up an explanation. If those don't work,
Help → Topics will bring up a window that has Contents, as well as an Index and a Find tab that
can help you to find more information on specific kinds of analyses. Finally, you can contact
technical support assuming you are using a licensed copy of SPSS. Go to whoever handled the
licensing and ask for the tech support number, or else seek out the technical support person in
your organization. If you have any questions about this guide, please e-mail me at
pam@psych.stanford.edu.
