
March 2011

CAPACITY BUILDING WORKSHOP - DATA MANAGEMENT SOFTWARE

Training on Data Software: PSPP
Introduction

Dakar, March 2011

ACP Observatory on Migration 20, rue Belliardstraat (7th floor) 1040 Brussels Tel: +32 (0)2 894 92 30 Fax: +32 (0)2 894 92 49 mrfbrusselsacp@iom.int www.acpmigration-obs.org

An ACP Secretariat initiative, implemented by IOM, funded by the European Union and with the financial support of Switzerland

This publication has been produced with the financial assistance of the European Union. Prepared by Brahim El Mouaatamid, Research Assistant, ACP Observatory on Migration. The contents of this publication are the sole responsibility of the author and can in no way be taken to reflect the views of the Secretariat of the African, Caribbean and Pacific Group of States (ACP), the International Organization for Migration (IOM) and the other members of the Consortium of the ACP Observatory on Migration, the European Union nor the Swiss Confederation.


Table of contents

Presentation of PSPP program
1. Windows in PSPP
2. Menus in PSPP
3. Preparing Data for Analysis
4. Data Entry
5. Descriptive Data Analysis
6. PSPP file transformations
7. Merging data files in PSPP
8. Data analysis
9. PSPP Output
10. Further Data Analysis
References and Further Reading

Draft prepared by Brahim El Mouaatamid, Research Assistant at the ACP Observatory on Migration. This draft is to be reviewed according to the training progress. For any comments, please contact mrfbrusselsacp@iom.int.

Presentation of PSPP program


PSPP is a program for statistical analysis of sampled data. It is meant to be a free replacement for the proprietary program SPSS, and is therefore a system for statistical analysis and data management. It reads a syntax file and a data file, analyzes the data, and writes the results to a listing file or to standard output. The language accepted by PSPP is similar to that accepted by SPSS statistical products.

PSPP is a work in progress and its development is ongoing. The current version, 0.7.6-g26ff6f, is still incomplete in terms of its statistical procedure support, but that support is growing and PSPP already supports a large subset of SPSS's syntax. PSPP can take data from SPSS files and use them to generate tabulated reports, plots of distributions and trends, and descriptive statistics, and to conduct complex statistical analyses. At your option, PSPP will produce statistical reports in ASCII, PostScript, PDF, HTML, SVG, or OpenDocument formats.

PSPP provides a user interface that makes statistical analysis more intuitive for all levels of users. Simple menus and dialog box selections make it possible to perform complex analyses without typing many lines of command syntax. The built-in PSPP Data Editor offers a simple and efficient spreadsheet-like utility for entering data and browsing the working data file. You can handle the output with greater flexibility by saving it in other file formats such as HTML or OpenDocument.

The main PSPP web site is <http://www.gnu.org/software/pspp/>. New versions are available for download at <http://pspp.awardspace.com/>; the latest version has been available since March 13, 2011. The manual can be accessed at <http://sunet.dl.sourceforge.net/project/pspp4windows/pspp-master.pdf>.

1. Windows in PSPP
There are 3 different types of windows that you will see in PSPP:

1.1. Data Editor Window

This window displays the contents of the data file. You may create new data files, or modify existing ones, with the Data Editor. The Data Editor window opens automatically when you start a PSPP session.

1.1.1. Variables sub-window of the Editor window

1.2. Viewer window

The Viewer window displays the statistical results and tables from the analysis you performed (e.g., descriptive statistics, correlations). A Viewer window opens automatically when you run a procedure that generates output. In the Viewer window, you can edit and copy your results.

1.3. Syntax Editor Window


You can paste your dialog box choices into a Syntax Editor window, where your selections appear in the form of command syntax. You can then edit the command syntax to use special features of PSPP that are not available through dialog boxes. You can also open a Syntax Editor window, type PSPP commands directly, and run them. The commands can be saved in a file for use in subsequent PSPP sessions.

2. Menus in PSPP

Many of the tasks you may want to perform with PSPP start with menu selections. Each window in PSPP has its own menu bar with menu selections appropriate for that window type. The Data Editor window, for example, has its own menu bar with an associated toolbar. Most menus are common to all windows, while some are found only in certain types of windows.

2.1. Common menus


File

Use the File menu to create a new PSPP system file, open an existing system file, or read in spreadsheet or database files created by other software programs. From the Data Editor, it can be used to read in an external ASCII data file; from the Syntax Editor, to create a command file or retrieve an existing PSPP command file; and from the Viewer, to open and save output files.

Edit

Use the Edit menu to cut, copy, and paste data values from the Data Editor; modify or copy text from the Viewer or Syntax Editor; etc. In the Data editor, this menu can be used to insert cases or variables.

View

Use the View menu to turn toolbars and the status bar on and off, and turn grid lines on and off from all window types; and control the display of value labels and data values in the Data Editor.

Analyze

This menu is selected for various statistical procedures such as tabulation, crosstabulation, correlation, linear regression, factor analysis, etc.

Utilities

Use the Utilities menu to display information about variables in the working data file.

Help

The Help menu is not fully operational. It should open the reference manual, which contains information on how to use the many features of PSPP. Context-sensitive help is not yet available through the dialog boxes, although the icon is there.

2.2. Data Editor specific menus


Data

Use the Data menu to make global changes to PSPP data files, such as transposing variables and cases or creating subsets of cases for analysis. These changes are only temporary and do not affect the permanent file unless you save the file with the changes.

Transform

Use the Transform menu to make changes to selected variables in the data file and to compute new variables based on the values of existing ones. These changes are temporary and do not affect the permanent file unless you save the file with changes.

2.3. Syntax Editor specific menu


Run

Use the Run menu to run the selected commands.

Toolbars in PSPP

The Data Editor window has a toolbar that provides quick and easy access to common tasks. Tool tips provide a brief description of each tool when you put the mouse pointer on it. For example, the Insert Variable tool shows the tool tip "create a new variable at the current position" when the mouse pointer is placed on its icon. All these tasks are also available under various menus (Edit, Data, etc.).

Status Bar in PSPP for Windows

A status bar at the bottom right of the PSPP application window indicates the current status of the PSPP session. The status bar provides information such as command status, filter status, weight status, and split file status.


3. Preparing Data for Analysis

3.1. Organizing Data for Analysis


Suppose you have three remittance values collected for a group of 10 migrants (5 males and 5 females) over a limited period of time (18 months, with one value registered at the end of each semester for that semester). Each migrant was assigned an identification number. The information you have for each migrant is the identification number, the gender, and the values for remittance one, remittance two, and remittance three (the full data set is displayed toward the end of this section). Your first task is to present the data in a form acceptable to PSPP for processing.

PSPP uses data organized in rows and columns. Cases are represented in rows and variables are represented in columns.

                Variable
        Name    Rem1    Rem2    Rem3
Case    Mig1    20      23      24
        Mig2    21      26      28

A case contains information for one unit of analysis (e.g., a person, a country, a region). Variables are information collected for each case, such as name, sex, age, income, country of birth, or educational level. In the above chart, there are two cases and four variables.

Attributes of Variables

In PSPP, each variable has a number of attributes: Name, Type, Width, Decimals, Label, Values, Missing, Columns, Align, and Measure.

Name: an identifier, up to 64 bytes long. Each variable must have a different name; names of no more than eight characters are recommended. Names must begin with a letter; the remaining characters can be any letter, any digit, a period, or the symbols @, #, _, or $. Variable names cannot end with a period (.), because such an identifier will be misinterpreted when it is the final token on a line: the period will be mistaken for the end-of-command marker. Variable names that end with an underscore (_) should be avoided. Some system variable names begin with $, but user-defined variable names may not begin with $. Blanks and special characters such as &, !, ?, ', and * cannot be used in a variable name. Variable names are not case sensitive.

Type: most variables are either numeric (e.g., 12, 93.23) or string, also called character or alphanumeric (e.g., F, f, Ousmane). Numeric values are accurate to 16 digits only, so the maximum number of decimal positions depends on the number of digits before the decimal point. String variables with a defined width of eight or fewer characters are short strings; those with more than eight characters are long strings. Short string variables can be used in many PSPP procedures. You may leave a blank for any missing numeric value or enter a user-defined missing value (e.g., 9, 99, 999). For string variables, however, a blank is considered a valid value. You may choose to enter a user-defined missing value (e.g., x, xxx, na) for short string variables, but long string variables cannot have user-missing values.

Following the conventions above, let us assign names to the variables in our data set: id, sex, Rem1, Rem2, and Rem3. Once the variables are named according to PSPP conventions, it is good practice to prepare a code book with details of the data layout. A code book for the data under discussion follows. Note that this step only serves to present your data in an organized fashion; it is not mandatory for data analysis. A code book becomes especially handy when dealing with a large number of variables. A short sample data set like this one may not need a code book, but it is included for illustration.

Name    Type       Width    Label                        Columns
id      Numeric    8        identification no.           2
sex     String     8        migrant gender (f, m)        1
Rem1    Numeric    8        Remittance value 1st sem.    2
Rem2    Numeric    8        Remittance value 2nd sem.    2
Rem3    Numeric    8        Remittance value 3rd sem.    2

In the above code book, width indicates the length of a variable measured in digits or characters. For example, the value for variable id takes a maximum of two fields, since the highest identification number in our example is going to be 10. The value for variable sex takes a maximum of one field, and so on. Columns affects only the display of values in the Data Editor; changing the column width does not change the defined width of a variable. Type specifies the data type (numeric, comma, dot, scientific notation, date, dollar, custom currency, or string). In our example, sex is the only string variable, coded as f for female and m for male.
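The labels in the code book can also be attached to the variables with command syntax once the data have been read into PSPP. The lines below are a minimal sketch, not part of the original training material. The value labels for sex and the missing-value code 999 are assumptions chosen for illustration only.

```pspp
* Sketch only: attach the code book labels to the variables.
VARIABLE LABELS
  id 'identification no.'
  sex 'migrant gender (f, m)'
  rem1 'Remittance value 1st sem.'
  rem2 'Remittance value 2nd sem.'
  rem3 'Remittance value 3rd sem.'.
* Assumed value labels for the string variable sex.
VALUE LABELS sex 'f' 'female' 'm' 'male'.
* Assumption: 999 was entered wherever a remittance value is unknown.
MISSING VALUES rem1 rem2 rem3 (999).
* Show the resulting variable attributes in the Viewer.
DISPLAY DICTIONARY.
```

Because variable names are not case sensitive, rem1 and Rem1 refer to the same variable.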


4. Data Entry
The next issue is entering your data into the computer. There are several options. You may create a data file using a text editor, a word processing package (e.g., MS-Word), or a spreadsheet (e.g., Excel) and read it directly into PSPP for Windows. Files created using word processing software or a spreadsheet should be saved in text format (.txt) before you try to read them into a PSPP session. Alternatively, you may enter the data directly into the spreadsheet-like Data Editor of PSPP. In this document we examine two of these data entry methods: using a text editor or word processor, and using the Data Editor of PSPP.

4.1. Using an Editor/Word Processor to Enter Data


Let us first look into the steps for using a text editor or word processor for entering data. Note that if you have a data set with a limited number of variables, you may want to use the PSPP Data Editor to enter your data; this example is for illustration purposes. Open your editor or word processor session and enter the variable values into the appropriate columns as outlined in the code book. If you are using a word processor, make sure to save your data in text format (.txt). Your completed data file will appear as follows.

01f838591
02f657268
03f909490
04f878082
05f788680
06m607464
07m889692

Save the data as a text file named Remit.txt. Notice that in the above data layout no blank space is left after each variable. The case of blank space left between variables is discussed later. It is optional whether to leave a space between variable values. Whichever style (format) you choose, as long as you convey the format correctly to PSPP, it should not have any impact on the analysis.


4.2. Creating a Command file to read in your data


In many instances, you may have an external ASCII data file made available to you for analysis, just like the file Remit.txt discussed earlier. In such a situation, you do not have to enter your data again into the Data Editor. You can direct PSPP to read the file from the PSPP Syntax Editor window (see footnote 2). Suppose you want to read the file Remit.txt into PSPP from a Syntax Editor window and create a system file. Creating a command file is a faster way to define your variables, especially if you have a large number of them. You may create a command file using your favorite editor or word processor and then read it into a Syntax Editor window, or you may open a Syntax Editor window and type in the command lines.

To read an already created command file into a Syntax Editor window, select File > Open > Syntax..., choose the syntax file (with the .sps extension) you want to read, and click Open.

In the following example we open a new Syntax Editor window. Select File > New > Syntax. When the Syntax Editor window appears, type:
GET DATA /TYPE=TXT
  /FILE='C:\PSPP\Remit.txt'
  /ARRANGEMENT=FIXED
  /FIRSTCASE=1
  /VARIABLES=
    id 0-1 F
    sex 2 A
    rem1 3-4 F
    rem2 5-6 F
    rem3 7-8 F.

Alternatively, to save the new file immediately in PSPP data format, you type:
GET DATA /TYPE=TXT
  /FILE='C:\PSPP\Remit.txt'
  /ARRANGEMENT=FIXED
  /FIRSTCASE=1
  /VARIABLES=
    id 0-1 F
    sex 2 A
    rem1 3-4 F
    rem2 5-6 F
    rem3 7-8 F.
EXECUTE.
SAVE /OUTFILE='C:\PSPP\remit.sav'.

Click and drag with your mouse to highlight the lines entered, then click Run and choose Selection. This assumes that the data file is located at C:\PSPP\Remit.txt; if it is elsewhere, change the path accordingly. The command file reads the specified variable values from the data file Remit.txt in C:\PSPP and creates a system file, remit.sav, in C:\PSPP. Make sure you specify the path names correctly, indicating both the location of the external data file and where the newly created file is to be written. You do not have to save a system file to do the analysis, so the SAVE line is optional. Every time you run the above lines, PSPP creates an active file stored in the computer's memory. For large data sets, however, it will save processing time if you save the data as a system file and access that file for analysis.

In the above command lines, VARIABLES defines the raw data file by assigning names and formats to each variable in the file. The data can be in fixed format (values for the same variable are always entered in the same location on the same record for each case) or in free format (values for consecutive variables are not in particular columns but are entered one after the other, separated by blanks or commas). In our example the data are in fixed format, so we used ARRANGEMENT=FIXED. With this command, columns are numbered starting from 0, so id occupies columns 0-1.

FIRSTCASE=1 is the default if the data start in the first row. In our example we therefore did not have to use the FIRSTCASE=1 keyword, but it is included for the sake of illustration. The only string variable in the data is sex, which is identified with an A after the variable name and column location. The others are identified with an F as numeric variables.

Footnote 2: You can also use the menu File > Import Delimited Text Data. If this ends with an error, use the Syntax Editor and create an appropriate command file.
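For comparison, the same fixed-column file can also be described with the DATA LIST command, which this manual uses later for the Personnel.dat example. The lines below are a minimal sketch under the assumption that Remit.txt is laid out exactly as shown earlier. Unlike GET DATA, DATA LIST numbers columns starting from 1.

```pspp
* Sketch only: read Remit.txt with DATA LIST instead of GET DATA.
* DATA LIST counts columns from 1, so id is in columns 1-2.
DATA LIST FIXED FILE='C:\PSPP\Remit.txt'
  /id 1-2 sex 3 (A) rem1 4-5 rem2 6-7 rem3 8-9.
* Print the cases to check that the columns were split correctly.
LIST.
```

Running LIST after either command is a quick way to confirm that each value landed in the intended variable before any analysis is done.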

4.3. Reading delimited data


If a blank space is left after each variable, the syntax is different. For example, you may choose to enter and save the data in text (.txt) format as follows:

id sex rem1 rem2 rem3
01 f 83 85 91
02 f 65 72 68
03 f 90 94 90
04 f 87 80 82
05 f 78 86 80
06 m 60 74 64
07 m 88 96 92
08 m 84 79 82
09 m 90 87 93
10 m 76 73 70

In this case, when the Syntax Editor window appears, type:
GET DATA /TYPE=TXT
  /FILE='C:\PSPP\Remit2.txt'
  /DELIMITERS=' '
  /FIRSTCASE=2
  /VARIABLES=id F2 sex A1 rem1 F2 rem2 F2 rem3 F2.


If we want to save the new PSPP data file immediately, we add the following two lines:
EXECUTE.
SAVE /OUTFILE='C:\PSPP\Remit2.sav'.

Most database and spreadsheet programs are able to read or save data in a delimited text format. Any character or sequence of characters may be used to separate the values, but the most common delimiters are the comma, tab, and colon. The vertical bar (also referred to as pipe) and space are also sometimes used. PSPP can read data files that use delimiter-separated values; you adapt the syntax by indicating the appropriate delimiter in the DELIMITERS subcommand. You can also save the command file you have created: in the Syntax Editor window, select File > Save... and save the syntax file (with the .sps extension).

Using Text Import Wizard to Read Text Data (see footnote 3)

Using the Text Import Wizard is another way to direct PSPP to read an external ASCII data file. Suppose you want to read the file Remit.txt into PSPP with the Text Import Wizard. Select File > Import Delimited Text Data, choose the data file Remit.txt in C:\PSPP, click Open, and follow the steps in the wizard to specify how the data should be read. The data file is then read into PSPP and can be saved as remit.sav.
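As an illustration of adapting the syntax to another delimiter, the sketch below reads the same five variables from a comma-separated file. The file name Remit2.csv is hypothetical; the file is assumed to contain a header row followed by one case per line.

```pspp
* Sketch only: the same data saved with commas between the values.
* The file name Remit2.csv is hypothetical.
GET DATA /TYPE=TXT
  /FILE='C:\PSPP\Remit2.csv'
  /ARRANGEMENT=DELIMITED
  /DELIMITERS=','
  /FIRSTCASE=2
  /VARIABLES=id F2 sex A1 rem1 F2 rem2 F2 rem3 F2.
* Print the cases to check the result.
LIST.
```

Only the DELIMITERS value changes compared with the space-delimited example; FIRSTCASE=2 still skips the header row.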

4.4. Using the PSPP Data Editor for entering data


Suppose you want to use the PSPP features for data entry. In that case, you enter data directly into the PSPP spreadsheet-like Data Editor. This is convenient if you have only a small number of variables. The first step is to open a PSPP session and enter the data into the Data Editor window. You define your variables, the variable type (e.g., numeric, string), the number of decimal places, and any other necessary attributes while you are entering the data. In this mode of data entry, you must define each variable in the Data Editor. You cannot define a group of variables (e.g., Q1 to Q10) using the Data Editor. To define a group of variables without specifying them individually, use the Syntax window.

Let us start a PSPP session to enter the above data set. Start Windows and launch PSPP. This opens the PSPP Data Editor window (titled Untitled). The Data Editor window contains the menu bar, which you use to open files, choose statistical procedures, etc. When you start a PSPP session, the Data Editor window always opens first.

You are ready to enter your data once the Data Editor window appears. The first step is to enter the variable names that will appear as the top row of the data file. When you start the session, the top row of the Data Editor window contains a dimmed var as the title of every column, indicating that no data are present. In our sample data set, discussed above, there are five variables, named earlier as id, sex, rem1, rem2, and rem3. Let us now enter these variable names into the Data Editor. To define the variables, click on the Variable View tab at the lower left corner of the Data Editor window, then:
Type in the variable name, id, in the first row under the column Name.
Press the Tab key to fill in the variable's attributes with default settings.

Footnote 3: This option still has bugs in the current version of PSPP (March 2011).

PSPP considers all variables as numeric variables by default. Since id is a numeric variable, you do not have to redefine the variable type for id. However, you may want to change the current format for decimal places: enter 0 for Decimals. Now let us define the second variable, sex. Type in the variable name, sex, in the second row under the column Name. Press the Tab key to fill in the variable's attributes with default settings. To modify the variable type, click on the grey vertical rectangle icon in the Type column and select String by clicking on the circle to its left. Define the remaining three numeric variables, rem1, rem2, and rem3, the same way the variable id was defined. Once you have finished, the Variable View screen should look like:

Click on the Data View tab. Now enter the data, pressing [Tab] or the right arrow key after each entry. After entering the last variable value for case number one, use the arrow keys to move the cursor to the beginning of the next line. Continue the process until all the data are entered.
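The same variable definitions and data entry can be expressed in the Syntax Editor with inline data. The lines below are a minimal sketch, not taken from the original material; only the first three cases of the sample data are shown.

```pspp
* Sketch only: define the five variables and enter cases in syntax.
DATA LIST LIST /id (F2.0) sex (A1) rem1 rem2 rem3 (F2.0).
BEGIN DATA.
01 f 83 85 91
02 f 65 72 68
03 f 90 94 90
END DATA.
* Print the cases to check the entries.
LIST.
```

This approach is useful when a group of variables has to be defined at once, which the Data Editor does not allow.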

4.5. Saving PSPP Data


After you have entered or read the data into the Data Editor, save it to C:\PSPP, a flash drive, or any other location. Select Save... or Save As... from the File menu. A dialog box appears.

In the box below File Name, type C:\PSPP\remit.sav and click OK. The data will be saved as a PSPP-format file. Note that the data file remit.txt you saved earlier and the file remit.sav you saved now are in different formats.

4.6. Read ASCII data and save as a PSPP data file


The purpose of this example is to illustrate:
1. Data input from the data file Personnel.dat via the DATA LIST command.
2. Saving data and a data dictionary (including all labels and missing value codes) as a PSPP data file via the SAVE OUTFILE command. Note that you can also create a PSPP data file by selecting Save from the File menu in the PSPP Data Editor.

The files in this example are:
Personnel.dat - ASCII data file
Personnel.sav - PSPP data file created

Both data files are located in the PSPP folder on the C: drive. To run these syntax commands, select Run All from the Run menu in the PSPP Syntax Editor.
data list file='C:\PSPP\Personnel.dat' records=2
  /1 name 1-24(A) employid 26-30
  /2 yrhired 3-4 age 6-7 race 9 sex 11 locatn82 13 dept82 15 jobcat 17
     promo82 19 salary82 21-25 raise82 27-31 eeo82 33-33.
variable labels
  name "Employee's Name"
  employid "Employee's Badge Number"
  yrhired "Year of First Hiring"
  age "Employee's Age in 1980"
  race "Employee's Race"
  sex "Employee's Sex"
  locatn82 "City Where Employed"
  dept82 "Department Code in 1982"
  jobcat "Job Category"
  promo82 "Was Emp Promoted in 1982?"
  salary82 "Yearly Salary in 1982"
  raise82 "Increase in Salary over 1981".
value labels
  race 1 'Black' 2 'A.Indian' 3 'Oriental' 4 'Latino' 5 'White'
  /sex 1 'Male' 2 'Female'
  /locatn82 0 'Not Employed' 1 'Chicago' 5 'St. Louis'
  /dept82 0 'Not Employed' 1 'Administrative' 2 'Project Directors'
          3 'Chicago Operations' 4 'St. Louis Operations'
  /jobcat 1 'Officials & Managers' 2 'Professionals' 3 'Technicians'
          4 'Office and Clerical' 5 'Craftsmen' 8 'Service Workers'
  /promo82 0 'No' 1 'Yes' 9 'Not Employed'.
missing values yrhired to dept82 salary82 (0) /promo82 (9) /raise82 (-999).
execute.
save outfile='C:\PSPP\Personnel.sav' /compressed.


5. Descriptive Data Analysis


Suppose that you have the data set, remit.sav, still displayed on the screen. For the data processing in this section, let us use an extended file containing data for 40 cases, remit3.sav. If it is not open, select File > Open in the PSPP Data Editor, choose C:\PSPP\remit3.sav, and click Open.

The next step is to run some basic statistical analysis with the data you entered. The commands you use to perform statistical analysis are built by simply pointing and clicking the mouse on the appropriate menu options. This frees you from typing in your command lines. However, you may paste the command selections you made into a Syntax Editor window. The command lines you paste into the Syntax Editor window may be edited and used for subsequent analysis, or saved for later use.

Use the Paste pushbutton to paste your dialog box selections into a Syntax Editor window. If you do not have an open Syntax Editor window, one opens automatically the first time you paste from a dialog box. Click the Paste button only if you want to view the command lines you generated. Once you click the Paste pushbutton, the dialog selections are pasted into the Syntax Editor window, and this window becomes active. To execute the pasted command lines, highlight them and click Run > Selection, or click Run > All to run everything in the window. You can always get back to the Data Editor window by selecting remit3.sav - PSPP Data Editor from the Window menu.


6. PSPP file transformations


6.1. Generating a New Variable
Before computing the descriptive statistics, we want to calculate the mean score from the three remittance values for each migrant. To compute the mean score, select Compute... from the Transform menu. A dialog box appears.

In the box titled Target Variable, type average as the name you want to assign to the mean score. Move the pointer to the box titled Numeric Expression and type: mean (rem1, rem2, rem3). Click OK. A new column titled average will be displayed in the Data Editor window with the mean score for each case. The number of decimal places in a newly created variable can be adjusted by selecting the Variable View and changing the format settings there.
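The same transformation can be written as command syntax. The lines below are a minimal sketch of what the Compute dialog does; the display format F8.2 is an assumption chosen for illustration.

```pspp
* Sketch only: syntax equivalent of the Compute dialog.
COMPUTE average = MEAN(rem1, rem2, rem3).
* Assumed display format: two decimal places for the new variable.
FORMATS average (F8.2).
EXECUTE.
```

The MEAN function averages the values that are present, so a case with one missing remittance still receives an average of the other two.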


7. Merging data files in PSPP

7.1. Add variables: Command Match files


The purpose of this example is to illustrate adding variables from separate PSPP data files via the MATCH FILES command. Consider the PSPP data file Personnel.sav located in C:\PSPP\. It is a file with 13 variables and 50 cases. To this file, we want to add variables from a separate PSPP data file named Rating.sav, also located in C:\PSPP\, which contains additional information with 8 variables and 40 cases. By merging these two files, we want to obtain a new PSPP data file, PersonnelRating.sav, in C:\PSPP\: a combined PSPP data file with 20 variables and 50 cases.

Note that the PSPP data files Personnel.sav and Rating.sav have only the variable employid in common. Both PSPP data files must be sorted by the same key variable(s) used in the MATCH FILES command. For this example, the data file Personnel.sav is already sorted in ascending order. The data file Rating.sav must be sorted by employid in ascending order.

The following syntax is available in the file C:\PSPP\AddVariables.sps. To run these syntax commands, select Run All from the Run menu of the PSPP Syntax Editor.
get file='C:\PSPP\Rating.sav'.
sort cases by employid.
match files file=*
  /file='C:\PSPP\Personnel.sav'
  /by employid.
save outfile='C:\PSPP\PersonnelRating.sav' /compressed.

The subcommand FILE=* refers to the data currently in the PSPP Data Editor (the active file). The combined file will have as many cases as there are unique values of employid. Each case will have employid plus the variables from both files.
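When it is not certain that both files are already sorted, each one can be sorted by the key variable before merging. The lines below are a minimal sketch under that assumption. The intermediate file name PersonnelSorted.sav is hypothetical.

```pspp
* Sketch only: sort both files by the key before merging.
* PersonnelSorted.sav is a hypothetical intermediate file name.
GET FILE='C:\PSPP\Personnel.sav'.
SORT CASES BY employid.
SAVE OUTFILE='C:\PSPP\PersonnelSorted.sav'.
GET FILE='C:\PSPP\Rating.sav'.
SORT CASES BY employid.
MATCH FILES FILE=*
  /FILE='C:\PSPP\PersonnelSorted.sav'
  /BY employid.
EXECUTE.
```

Sorting a file that is already in order does no harm, so this pattern is a safe default when the origin of a file is unknown.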

7.2. Add cases: Command add files


The purpose of this example is to illustrate:
1. Adding cases from separate PSPP data files via the ADD FILES command.
2. Keeping some of the variables via the KEEP subcommand. The keyword TO is used to refer to consecutive variables as seen in the PSPP Data Editor. The variables kept are employid, age, race, sex, locatn82, and jobcat.

The files in this example are:
Dept1.sav - PSPP data file with 14 cases
Dept2.sav - PSPP data file with 20 cases
Dept3.sav - PSPP data file with 16 cases
Dept123.sav - Combined PSPP data file with 50 cases

The following syntax is available in the file C:\PSPP\AddCases.sps. To run these syntax commands, select Run All from the Run menu in the PSPP Syntax Editor.
add files file='H:\PSPP\Dept1.sav'
  /file='H:\PSPP\Dept2.sav'
  /file='H:\PSPP\Dept3.sav'
  /keep=employid age to locatn82 jobcat.
save outfile='H:\PSPP\Dept123.sav' /compressed.


8. Data analysis

8.1. Frequencies
To run the FREQUENCIES procedure:
Select Descriptive Statistics from the Analyze menu.
Choose Frequencies...
A dialog box appears. Names of all the variables in the data set appear on the left side of the dialog box.
Click the variable you want to analyze (for example, sex) and click the arrow button. The selected variable now appears in the box on the right and disappears from the left box.
Note that when a variable is highlighted in the left box, the arrow button points right so that you can complete the selection. When a variable is highlighted in the right box, the arrow button points left so that you can deselect the variable (by clicking the button) if necessary.
If you need additional statistics besides the frequency count, go to the Statistics... area below and make the appropriate selections (or deselections). In this instance, we are interested only in frequency counts.
Click OK.

The output appears on the Viewer screen
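The dialog selections above correspond roughly to the following command syntax. This is a minimal sketch, assuming sex is the variable of interest; the second command illustrates requesting additional statistics for a numeric variable and is not part of the original example.

```pspp
* Sketch only: frequency counts for the variable sex.
FREQUENCIES /VARIABLES=sex.
* Summary statistics without the frequency table.
FREQUENCIES /VARIABLES=rem1
  /FORMAT=NOTABLE
  /STATISTICS=MEAN MEDIAN STDDEV.
```

Suppressing the table with FORMAT=NOTABLE is convenient for variables with many distinct values.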

8.2. Descriptives
Our next task is to run the DESCRIPTIVES procedure on the four continuous variables in the data set (rem1, rem2, rem3, and average).
Select Descriptive Statistics from the Analyze menu.
Choose Descriptives...

A dialog box appears. Names of all the numeric variables in the data set appear on the left side of the dialog box.

Click the variable average and click the arrow button to the right of the selected variable.
Do the same for the variables rem1 through rem3.
The selected variables now appear in the box on the right and disappear from the box on the left. The mean, standard deviation, minimum, and maximum are displayed by default. The variables are displayed, by default, in the order in which you selected them.
Click OK.
The output is displayed in the Viewer window.
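In command syntax, the same request can be sketched as follows. The STATISTICS subcommand simply spells out the four defaults mentioned above.

```pspp
* Sketch only: descriptive statistics for the four numeric variables.
DESCRIPTIVES /VARIABLES=average rem1 rem2 rem3
  /STATISTICS=MEAN STDDEV MIN MAX.
```

The variables are listed in the order in which they should appear in the output table.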


8.3. Means
Suppose you want to obtain the above results for males and females separately. The MEANS procedure displays means, standard deviations, and group counts for dependent variables based on grouping variables. In our data set, sex is the grouping variable and rem1, rem2, rem3, and average are the dependent variables. To run the MEANS procedure:
Select Analyze > Compare Means > Means...
Select rem1, rem2, rem3, and average as the dependent variables.
Select sex as the independent variable.

Select Mean, Number of cases, and Standard Deviation. Normally these options are selected by default. If any other options are selected, deselect them by clicking them.
Click Continue.
Click OK.
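The corresponding command syntax can be sketched as below. This is an illustration only; whether the MEANS command is available depends on the PSPP version installed.

```pspp
* Sketch only: group means of the remittance variables by sex.
MEANS TABLES=rem1 rem2 rem3 average BY sex
  /CELLS=MEAN COUNT STDDEV.
```

The CELLS subcommand lists the three statistics selected in the dialog box.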


There may be other situations in which you want to select a specific category of cases from a grouping variable (e.g., ethnic background, socio-economic status, education). To do so, choose Data > Select Cases... to select the cases you want and then run the analysis (e.g., from the grouping variable educat, select cases with less than a university degree). However, make sure you reset your data if you want to include all the cases in subsequent data analysis. If you do not, only the selected cases will be used in subsequent analyses. To reset your data, choose Data > Select Cases... > All Cases, and click OK.
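Selecting and resetting cases can also be sketched in command syntax with a filter variable. The example below uses the variable sex from the sample data rather than educat, whose coding is not given here; the helper variable female is hypothetical.

```pspp
* Sketch only: analyze female migrants, then restore all cases.
* The helper variable female is hypothetical.
COMPUTE female = (sex = 'f').
FILTER BY female.
DESCRIPTIVES /VARIABLES=rem1 rem2 rem3.
* Turn the filter off so that later analyses use all cases.
FILTER OFF.
```

A filter hides cases without deleting them, which is why FILTER OFF is enough to bring every case back.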


9. PSPP Output
9.1. Working with Output
When you run a procedure in PSPP, the results are displayed in the Viewer window in the order in which the procedures were run. In this window, you can easily navigate to whichever part of the output you want to see. You can also manipulate the output and create a document that contains precisely the output you want. The document can be arranged and formatted appropriately. The Viewer is divided into two panes. The left pane contains an outline view of the output contents. The right pane contains syntax, statistical tables, and text output. You can use the scroll bars to browse the results.

Suppose you want to see the output in another Windows application such as a word processing program (MS-Word). Select File > Export. A dialog box appears.

Select the format (extension) of the file to be saved (e.g., HTML (*.html)) in the Infer file from extension box below.
Type the name of the file (e.g., Outpt) in the Name area at the top.
Click Save.
You can then open the saved file in MS-Word.

10. Further Data Analysis


So far, we have developed a basic idea of how PSPP for Windows works. The next step is to examine a few other data analysis techniques (CORRELATIONS, REGRESSION, T-TEST, ANOVA). Refer to the PSPP Users Guide and other documentation for the most complete information (see References and Further Reading).

10.1. Sample Data Set


Now we will turn to another data set with more variables and cases. In this example, you will read an ASCII data file, remit4.txt, created with a word processor and saved as a text file, into the PSPP session. The data, collected from 40 middle school students, contain 26 variables including the following:
id (student identification number)
sex (gender of the student)
exp (previous computer experience in months/yrs)
school (name of school system)
C1 thru C10 (10 scores on the computer anxiety scale)
M1 thru M10 (10 scores on the math anxiety scale)
mathscor (math score for the same testing period)
compscor (computer test score for a given testing period)

The first four variables (id, sex, exp, school) are background variables. The variable sex has two levels (M=male, F=female). Exp (prior computer experience) has three levels (1=less than one year, 2=1-2 years, 3=more than 2 years). School (type of school system) has three levels (1=rural, 2=suburban school, 3=urban school). The next 20 variables (C1...C10, M1...M10) are Likert-type responses to computer opinion and math anxiety surveys. The remaining variables (mathscor, compscor) are scores on the math test and computer test.

10.2. Creating a Program to Read the Data File


Let us assume that the data file, remit4.txt, is in C:\PSPP\. At this point, the fastest way to read this data into PSPP for Windows is to use the Syntax window. You may open a Syntax Editor window (File > New > Syntax) and type in the following lines, or create a command file with the following lines using a word processor or editor and then read it into the Syntax Editor window (File > Open, choosing Syntax (*.sps) as the file type in the Open File dialog box). Suppose the following command lines are stored in a file, remit4.sps, in C:\PSPP\.

A copy of the sample data file is available from the Stat/Math Web home page <http://www.indiana.edu/~statmath>.


GET DATA /TYPE=TXT /FILE='C:\PSPP\Remit4.txt'
  /ARRANGEMENT=FIXED
  /FIRSTCASE=1
  /VARIABLES=
    ID 0-1 F
    SEX 2 A
    EXP 3 F
    SCHOOL 4 F
    C1 5 F C2 6 F C3 7 F C4 8 F C5 9 F
    C6 10 F C7 11 F C8 12 F C9 13 F C10 14 F
    M1 15 F M2 16 F M3 17 F M4 18 F M5 19 F
    M6 20 F M7 21 F M8 22 F M9 23 F M10 24 F
    MATHSCOR 25-26 F
    COMPSCOR 27-28 F.
MISSING VALUES MATHSCOR COMPSCOR (99).
RECODE C3 C5 C6 C10 M3 M7 M8 M9 (1=5) (2=4) (3=3) (4=2) (5=1)
  INTO C3 C5 C6 C10 M3 M7 M8 M9.
RECODE SEX ('M'=1) ('F'=2) INTO NSEX. /* CHAR VAR INTO NUMERIC
COMPUTE COMPOPI=SUM(C1 TO C10). /* FIND SUM OF 10 ITEMS USING SUM FUNCTION
COMPUTE MATHATTI=M1+M2+M3+M4+M5+M6+M7+M8+M9+M10. /* ADDING EACH ITEM
VARIABLE LABELS
  ID 'STUDENT IDENTIFICATION'
  SEX 'STUDENT GENDER'
  EXP 'YRS OF COMP EXPERIENCE'
  SCHOOL 'SCHOOL REPRESENTING'
  MATHSCOR 'SCORE IN MATHEMATICS'
  COMPSCOR 'SCORE IN COMPUTER SCIENCE'
  COMPOPI 'TOTAL FOR COMP SURVEY'
  MATHATTI 'TOTAL FOR MATH ATTI SCALE'.
VALUE LABELS
  SEX 'M' 'MALE' 'F' 'FEMALE'/
  EXP 1 'UPTO 1 YR' 2 '2 YEARS' 3 '3 OR MORE'/
  SCHOOL 1 'RURAL' 2 'CITY' 3 'SUBURBAN'/
  C1 TO C10 1 'STRONGLY DISAGREE' 2 'DISAGREE' 3 'UNDECIDED' 4 'AGREE' 5 'STRONGLY AGREE'/
  M1 TO M10 1 'STRONGLY DISAGREE' 2 'DISAGREE' 3 'UNDECIDED' 4 'AGREE' 5 'STRONGLY AGREE'/
  NSEX 1 'MALE' 2 'FEMALE'/.

Use the mouse and click Run > All. The command lines will be executed and an active PSPP file will be created. Select Window > Untitled - PSPPIRE Data Editor to see the data file you just read in. Save the data file as a PSPP system file in C:\PSPP\ or elsewhere on a hard drive:
Select File > Save
Type in a filename (e.g., remit4.sav)

A copy of the file will now be saved in PSPP format. Now you are ready for further data analysis.
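
The file can also be saved from the Syntax window. This is a minimal sketch, assuming the same folder and filename as in the example above.

```spss
* Save the active data file as a system file.
SAVE OUTFILE='C:\PSPP\remit4.sav'.
* In a later session, reopen it without reading the text file again.
GET FILE='C:\PSPP\remit4.sav'.
```

Reopening the system file keeps the variable labels, value labels, and missing-value definitions created by the program above.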


Use the following syntax to obtain examples of further statistical analysis of the data:

Frequencies
Crosstabs
Compare Means
Correlation Analysis
Simple Linear Regression
T-test
One-way Analysis of Variance

PRINT FORMATS COMPOPI MATHATTI (F2.0). /* SPECIFYING THE PRINT FORMAT
LIST VARIABLES=SEX EXP SCHOOL MATHSCOR COMPSCOR COMPOPI MATHATTI/
  FORMAT=NUMBERED /CASES=10. /* ONLY THE FIRST 10 OBS
FREQUENCIES VARIABLES=SEX,EXP,SCHOOL/
  STATISTICS=ALL.
TEMPORARY.
SELECT IF SEX EQ 'F'.
FREQUENCIES VARIABLES=EXP SCHOOL NSEX/
  STATISTICS=ALL.
CROSSTABS TABLES=SEX BY EXP SCHOOL.
DESCRIPTIVES COMPOPI MATHATTI MATHSCOR COMPSCOR.
T-TEST GROUP=NSEX(1,2)
  /VARIABLES = MATHSCOR COMPSCOR.
REGRESSION /VARIABLES=MATHSCOR
  /DEPENDENT= COMPSCOR.
REGRESSION /VARIABLES=EXP NSEX
  /DEPENDENT= COMPOPI
  /STATISTICS=ALL, DEFAULTS, R, COEFF, ANOVA, BCOV
  /SAVE=PRED, RESID.
ONEWAY /VARIABLES= MATHSCOR COMPSCOR BY SEX
  /STATISTICS= DESCRIPTIVE.


The Output viewer shows the following results:


PRINT FORMATS COMPOPI MATHATTI (F2.0). /* SPECIFYING THE PRINT FORMAT

LIST VARIABLES=SEX EXP SCHOOL MATHSCOR COMPSCOR COMPOPI MATHATTI/ FORMAT=NUMBERED /CASES=10. /* ONLY THE FIRST 10 OBS

Data List
Case Number  SEX  EXP  SCHOOL  MATHSCOR  COMPSCOR  COMPOPI  MATHATTI
1            M    1    3       39        44        31       30
2            F    2    2       30        28        13       19
3            F    1    1       28        25        19       18
4            F    2    1       45        31        19       48
5            F    1    2       42        43        34       47
6            F    1    1       48        44        36       45
7            F    2    2       50        40        19       45
8            F    2    1       40        33        21       48
9            F    2    1       25        24        15       17
10           F    2    2       99        25        15       47

FREQUENCIES VARIABLES=SEX,EXP,SCHOOL/ STATISTICS=ALL.

STUDENT GENDER
Value Label  Value  Frequency  Percent  Valid Percent  Cum Percent
FEMALE       F      22         55,00    55,00          55,00
MALE         M      18         45,00    45,00          100,00
Total               40         100,0    100,0

YRS OF COMP EXPERIENCE
Value Label  Value  Frequency  Percent  Valid Percent  Cum Percent
UPTO 1 YR    1      15         37,50    37,50          37,50
2 YEARS      2      14         35,00    35,00          72,50
3 OR MORE    3      11         27,50    27,50          100,00
Total               40         100,0    100,0

YRS OF COMP EXPERIENCE
N Valid 40; Missing 0; Mean 1,90; Std Dev ,81; Minimum 1,00; Maximum 3,00

SCHOOL REPRESENTING
Value Label  Value  Frequency  Percent  Valid Percent  Cum Percent
RURAL        1      13         32,50    32,50          32,50
CITY         2      13         32,50    32,50          65,00
SUBURBAN     3      14         35,00    35,00          100,00
Total               40         100,0    100,0

SCHOOL REPRESENTING
N Valid 40; Missing 0; Mean 2,02; Std Dev ,83; Minimum 1,00; Maximum 3,00

TEMPORARY.
SELECT IF SEX EQ 'F'.
FREQUENCIES VARIABLES=EXP SCHOOL NSEX/ STATISTICS=ALL.

YRS OF COMP EXPERIENCE
Value Label  Value  Frequency  Percent  Valid Percent  Cum Percent
UPTO 1 YR    1      7          31,82    31,82          31,82
2 YEARS      2      7          31,82    31,82          63,64
3 OR MORE    3      8          36,36    36,36          100,00
Total               22         100,0    100,0

YRS OF COMP EXPERIENCE
N Valid 22; Missing 0; Mean 2,05; S.E. Mean ,18; Mode 3,00; Std Dev ,84; Variance ,71; Kurtosis -1,61; S.E. Kurt ,95; Skewness -,09; S.E. Skew ,49; Range 2,00; Minimum 1,00; Maximum 3,00; Sum 45,00; Percentiles 50 (Median) 2

SCHOOL REPRESENTING
Value Label  Value  Frequency  Percent  Valid Percent  Cum Percent
RURAL        1      7          31,82    31,82          31,82
CITY         2      7          31,82    31,82          63,64
SUBURBAN     3      8          36,36    36,36          100,00
Total               22         100,0    100,0

SCHOOL REPRESENTING
N Valid 22; Missing 0; Mean 2,05; S.E. Mean ,18; Mode 3,00; Std Dev ,84; Variance ,71; Kurtosis -1,61; S.E. Kurt ,95; Skewness -,09; S.E. Skew ,49; Range 2,00; Minimum 1,00; Maximum 3,00; Sum 45,00; Percentiles 50 (Median) 2

NSEX
Value Label  Value  Frequency  Percent  Valid Percent  Cum Percent
FEMALE       2,00   22         100,00   100,00         100,00
Total               22         100,0    100,0

NSEX
N Valid 22; Missing 0; Mean 2,00; S.E. Mean ,00; Mode 2,00; Std Dev ,00; Variance ,00; Kurtosis .; S.E. Kurt ,95; Skewness .; S.E. Skew ,49; Range ,00; Minimum 2,00; Maximum 2,00; Sum 44,00; Percentiles 50 (Median) 2,00

CROSSTABS TABLES=SEX BY EXP SCHOOL.

Summary
                                         Valid          Missing       Total
                                         N   Percent    N  Percent    N   Percent
STUDENT GENDER * YRS OF COMP EXPERIENCE  40  100.0%     0  0.0%       40  100.0%
STUDENT GENDER * SCHOOL REPRESENTING     40  100.0%     0  0.0%       40  100.0%

SEX * EXP [count]
SEX     UPTO 1 YR  2 YEARS  3 OR MORE  Total
FEMALE  7,0        7,0      8,0        22,0
MALE    8,0        7,0      3,0        18,0
Total   15,0       14,0     11,0       40,0

SEX * SCHOOL [count]
SEX     RURAL  CITY  SUBURBAN  Total
FEMALE  7,0    7,0   8,0       22,0
MALE    6,0    6,0   6,0       18,0
Total   13,0   13,0  14,0      40,0

DESCRIPTIVES COMPOPI MATHATTI MATHSCOR COMPSCOR.

Valid cases = 40; cases with missing value(s) = 5.
Variable  N   Mean   Std Dev  Minimum  Maximum
COMPOPI   40  27,93  11,53    13,00    46,00
MATHATTI  40  38,83  12,55    15,00    50,00
MATHSCOR  37  40,65  7,57     20,00    50,00
COMPSCOR  38  35,95  6,57     24,00    48,00

T-TEST GROUP=NSEX(1,2) /VARIABLES = MATHSCOR COMPSCOR.

Group Statistics
          NSEX    N   Mean   Std. Deviation  S.E. Mean
MATHSCOR  MALE    17  39,47  5,09            1,23
          FEMALE  20  41,65  9,19            2,05
COMPSCOR  MALE    16  37,31  4,94            1,23
          FEMALE  22  34,95  7,50            1,60

Independent Samples Test

Levene's Test for Equality of Variances
                                   F     Sig.
MATHSCOR  Equal variances assumed  5,98  ,02
COMPSCOR  Equal variances assumed  4,83  ,03

t-test for Equality of Means (Lower and Upper are the bounds of the 95% Confidence Interval of the Difference)
                                       t     df     Sig. (2-tailed)  Mean Difference  Std. Error Difference  Lower  Upper
MATHSCOR  Equal variances assumed      -,87  35,00  ,39              -2,18            2,40                   -7,05  2,69
          Equal variances not assumed  -,91  30,47  ,37              -2,18            2,40                   -7,07  2,71
COMPSCOR  Equal variances assumed      1,09  36,00  ,28              2,36             2,02                   -1,74  6,45
          Equal variances not assumed  1,17  35,72  ,25              2,36             2,02                   -1,74  6,46

REGRESSION /VARIABLES=MATHSCOR /DEPENDENT= COMPSCOR.

Model Summary
R    R Square  Adjusted R Square  Std. Error of the Estimate
,55  ,30       ,30                5,50

ANOVA
            Sum of Squares  df  Mean Square  F      Significance
Regression  424,06          1   424,06       14,00  ,00
Residual    999,54          33  30,29
Total       1423,60         34

Coefficients
                      B      Std. Error  Beta  t     Significance
(Constant)            17,77  5,01        ,00   3,54  ,00
SCORE IN MATHEMATICS  ,45    ,12         ,55   3,74  ,00

REGRESSION /VARIABLES=EXP NSEX /DEPENDENT= COMPOPI /STATISTICS=ALL, DEFAULTS, R, COEFF, ANOVA, BCOV /SAVE=PRED, RESID.

Model Summary
R    R Square  Adjusted R Square  Std. Error of the Estimate
,18  ,03       ,01                11,65

ANOVA
            Sum of Squares  df  Mean Square  F    Significance
Regression  161,91          2   80,96        ,60  ,56
Residual    5022,86         37  135,75
Total       5184,77         39

Coefficients
                        B      Std. Error  Beta  t     Significance
(Constant)              30,87  6,87        ,00   4,50  ,00
YRS OF COMP EXPERIENCE  1,51   2,35        ,11   ,64   ,52
NSEX                    -3,76  3,78        -,16  -,99  ,33

Coefficient Correlations
Model  Covariances  YRS OF COMP EXPERIENCE  NSEX
       NSEX         5,53

ONEWAY /VARIABLES= MATHSCOR COMPSCOR BY SEX /STATISTICS= DESCRIPTIVE.

Descriptives (Lower Bound and Upper Bound refer to the 95% Confidence Interval for Mean)
                                   N   Mean   Std. Deviation  Std. Error  Lower Bound  Upper Bound  Minimum  Maximum
SCORE IN MATHEMATICS       FEMALE  20  41,65  9,19            2,05        37,35        45,95        20       50
                           MALE    17  39,47  5,09            1,23        36,85        42,09        28       50
                           Total   37  40,65  7,57            1,24        38,13        43,17        20       50
SCORE IN COMPUTER SCIENCE  FEMALE  22  34,95  7,50            1,60        31,63        38,28        24       48
                           MALE    16  37,31  4,94            1,23        34,68        39,94        30       45
                           Total   38  35,95  6,57            1,07        33,79        38,11        24       48

ANOVA
                                           Sum of Squares  df  Mean Square  F     Significance
SCORE IN MATHEMATICS       Between Groups  43,65           1   43,65        ,76   ,39
                           Within Groups   2018,79         35  57,68
                           Total           2062,43         36
SCORE IN COMPUTER SCIENCE  Between Groups  51,50           1   51,50        1,20  ,28
                           Within Groups   1546,39         36  42,96
                           Total           1597,89         37
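
Correlation analysis is named among the techniques in this section, but none of the commands run above requests it. The lines below are a hedged sketch of how such a request might look, assuming that the CORRELATIONS command is available in your PSPP build and that the active file still contains the variables created by the program in section 10.2.

```spss
* Pearson correlations among the two test scores and the two scale totals.
CORRELATIONS
  /VARIABLES=MATHSCOR COMPSCOR COMPOPI MATHATTI.
```

Cases in which MATHSCOR or COMPSCOR equals 99 are treated as missing, because of the MISSING VALUES command in the program that read the data.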


References and Further Reading


The material covered in this document illustrates some of the basic features of PSPP. Examining additional features of PSPP is beyond the scope of this document. For further help, refer to the PSPP documents. If you need assistance in using PSPP, contact Brahim El Mouaatamid, Research Assistant, ACP Observatory on Migration: mrfbrusselsacp@iom.int

The basic documents for PSPP are:

PSPP Users Guide. Accessible here: http://netcologne.dl.sourceforge.net/project/pspp4windows/pspp-master.pdf The financial support in the production of this manual comes from Network Theory Ltd, http://www.network-theory.co.uk

A French version (version 0.4.0), by Julie Sgula (with the help of Joseph Saint Pierre), is accessible here: http://cict.fr/~stpierre/doc-pspp.pdf

The main PSPP website: http://www.gnu.org/software/pspp/

Newest versions and documentation are available for free download here: http://pspp.awardspace.com/

Peter Browne, IT Services, December 2009. SPSS (Predictive Analytics SoftWare): Getting started. Accessible here: http://www.its.qmul.ac.uk/training/manuals/SPSSIntro.pdf

Parts of this guide are adapted for PSPP from Getting Started with SPSS for Windows by John Samuel, Spring 2010. Accessible here: http://www.indiana.edu/~statmath/stat/spss/win/spss_win_printable.pdf
