
INTRODUCTION TO SPSS

SPSS stands for Statistical Package for the Social Sciences.

As its name implies, SPSS is a statistical package originally designed to handle data generated in social science studies. It is now widely used in other areas as well: by governments, businesses, law enforcement agencies, health care providers and academics, in both experimental and observational studies.

So, what is SPSS? SPSS is a simple package to use. Its user interface is a spreadsheet made up of cells, columns and rows. The columns represent variables and the rows represent cases. Cases and variables are the two main components in statistics. A case is the subject of analysis; it could be an animal in a scientific experiment or a person replying to a questionnaire. Variables are the measurements obtained on the various characteristics of each case. Data so obtained can be analyzed with descriptive, bivariate or multivariate statistical methods.

SPSS differs from other spreadsheets in that analysis is run through commands chosen from pull-down menus rather than within the spreadsheet itself. The output does not appear in the spreadsheet either, as is common in other spreadsheets, but in a separate window. The output of SPSS is comprehensive; that is, the package may give additional output to augment what was requested. For example, in addition to a graph it may give a histogram, the mean and the standard deviation. SPSS can take data from almost any type of file and use them to generate tabulated reports, charts, plots of distributions and trends, and descriptive statistics, and to conduct complex statistical analyses.

DIFFERENT TYPES OF WINDOWS IN A TYPICAL STATISTICAL SOFTWARE

There are a number of different types of windows in typical statistical software:

1. Data Editor Window / Object Window / Variable Window
This is the default window, with a blank data sheet ready for analysis. It displays the contents of the data file. You can create new data files or modify existing ones with the Data Editor. The Data Editor window opens automatically when you start an SPSS session.

2. Viewer Window / Output Window / Log file / Results Window
The Viewer window displays the statistical results, tables, and charts from the analysis you performed (e.g., descriptive statistics, correlations, plots, charts). A Viewer window opens automatically when you run a procedure that generates output. In the Viewer window you can edit, move, delete and copy your results. Whenever a procedure is run, the output is directed to a separate window. You can also keep multiple Output windows open to organize the various analyses you conduct. Later, these results can be saved and/or printed.

3. Syntax Editor Window / Do File
You can paste your dialog box choices into a Syntax Editor window in SPSS, where your selections appear in the form of command syntax. You can then edit the command syntax to use special features of SPSS that are not available through dialog boxes, and save the commands in a file for use in subsequent SPSS sessions. Similarly, in some software such as STATA you can enter several lines of commands in the Do-file Editor. You can either run only the selected lines or run the file from the first line to the last. A do file can also be created from the event history window.

4. Chart Editor Window
You can modify and save high-resolution charts and plots in chart windows. You can change the colors, select different type fonts or sizes, switch the horizontal and vertical axes, rotate 3-D scatter plots, and even change the chart type.

5. Script Editor Window
Scripting allows you to customize and automate many tasks in a typical statistical software. Use the Script Editor to create and modify basic scripts.

6. Command Window
The Command Window is where you type your commands as syntax. To send a command to the software, press the "Return" or "Enter" key.

7. Event History Window
The STATA Review Window lists all of the STATA commands that have been executed since STATA opened. These can be repeated by double-clicking them, or by clicking them into the Command Window and pressing Enter. The Review Window records your commands. The Results window displays your output. The Variables window lists the variables in the data set you are using. The Results window is the Log Window.

A BRIEF NOTE ABOUT THE DATA SET

Before discussing the types of files and the various menus in SPSS, it is useful to understand the data set that is used as an example throughout.

A Sample Research Problem: The Employee Income study

The SPSS file name of the data set used with this manual is Employee Data.sav; it stands for Employee Income Data. It is based on a sample data file supplied with SPSS. The current data set is a sample of 474 employees drawn randomly from the larger employee population. It contains several kinds of personal, social and occupational data, such as Gender, Date of Birth, Minority Classification, Educational Level (years), Employment Category, Current Salary, Beginning Salary, Months Since Hire and Previous Experience (months). The variables are described below.

Variable Description:

Variable Name    Variable Description/Label
id               Employee Code
gender           Gender (M/F)
bdate            Date of Birth
educ             Educational Level (years)
jobcat           Employment Category (Clerical, Custodial, Manager)
salary           Current Salary
salbegin         Beginning Salary
jobtime          Months Since Hire
prevexp          Previous Experience (months)
minority         Minority Classification (yes/no)

DIFFERENT WINDOWS AND FILES / FILE EXTENSIONS:

Each window corresponds to a separate type of file in the statistical software.

Data Files

There are two basic types of files in SPSS. The first is the data file (.sav). This is where all the data for your analysis resides. When you open a data file, it appears in the Data Editor window. The format is similar to a spreadsheet, with a grid of rows and columns. The columns represent variables and the rows represent observations. You can place the cursor on a column heading to get a lengthier description of that variable. To get complete information on any variable, go to the Utilities menu and click on Variables.

Data view>Utilities>Variables

Note: Data view (column headings are variable names; row numbers are observations/case numbers) and Variable view (each row holds the characteristics of one variable; the variables are listed down the first column)

Output files

The second type of file is the output file (.spv). When a statistical procedure is run, output is produced, and the Viewer window opens automatically to show it. The left pane contains an outline view of the output. The right pane contains the contents of the output, which include tables, charts, and text. There are book icons in the outline view next to the various objects of output. If the book is open, the output is visible. If the book is closed, the output is hidden.

Analyze> Descriptive statistics>Frequencies>Employment Category>OK

DIFFERENT MENUS IN A TYPICAL STATISTICAL SOFTWARE

Many of the tasks you may want to perform with a typical statistical software start with menu selections. In many statistical packages each window has its own menu bar with different options. The Data Editor window in SPSS, for example, has the following menus.

Most menus in this window are similar to those found in other Windows applications, and some are specific to the tasks of the Data Editor. The Data Editor window has eleven main menus, which are described in more detail below:

1. File
The File menu has options to create a new SPSS system file; open an existing system file; read in spreadsheet or database files created by other software programs; read in an external ASCII/Excel data file from the Data Editor; create a command file or retrieve an existing SPSS command file into the Syntax Editor; open, save, and print output files from the Viewer and Pivot Table Editor; and save chart templates and export charts in external formats from the Chart Editor.

2. Edit
The Edit menu has options to cut, copy, and paste data values in the Data Editor; modify or copy text in the Viewer or Syntax Editor; and copy charts from the Chart Editor for pasting into other applications.

3. View
The View menu has options to turn toolbars, the status bar and grid lines on and off in all window types, and to control the display of value labels and data values in the Data Editor.

4. Data
The Data menu has options to make global changes to SPSS data files, such as transposing variables and cases, creating subsets of cases for analysis, and merging files. These changes are only temporary and do not affect the permanent file unless you save the file with the changes.

5. Transform
The Transform menu has options to make changes to selected variables in the data file and to compute new variables based on the values of existing ones. These changes are temporary and do not affect the permanent file unless you save the file with the changes.

6. Analyze
The Analyze menu is the most important menu; it contains all the statistical procedures in SPSS. This menu will be discussed later in detail.

7. Graphs
The Graphs menu has options to create bar charts, pie charts, histograms, scatterplots, and other full-color, high-resolution graphs. Some statistical procedures also generate graphs. All graphs can be customized with the Chart Editor.

8. Utilities
The Utilities menu has options to display information about variables in the working data file, control the list of variables shown in all window types, and change the designated Viewer and Syntax Editor.

9. Add-ons
The Add-ons menu has options to view information about add-on modules.

10. Window
The Window menu has options to switch between SPSS windows or to minimize all open SPSS windows.

11. Help
The Help menu opens a standard Microsoft Help window containing information on how to use the various features of SPSS. Context-sensitive help is available through the dialog boxes.


TOOLBAR IN SPSS

Each SPSS window has a toolbar that provides quick and easy access to the common tasks of that window. Each icon in the toolbar has a Tool Tip, which shows a brief description of the tool when you put the mouse pointer on the icon.

The Main Toolbar

File Buttons
The first three buttons represent the three most common commands from the File menu: Open an Existing File, Save the Current File and Print the Current File, respectively.

Dialog Recall
This button gives you quick access to the previous 12 dialog boxes you were working with. This is particularly useful when you are building up an analysis and frequently going back and forth to the same box to change or modify options.

Go to Chart
This button opens the Chart Editor.

Go to Data
When you are in a window other than the Data Editor window, this button takes you back to the Data Editor.

Go to Case
This button takes you quickly to a specific case in the Data Editor. It is helpful when editing data, for example when abnormal data points or outliers are found in your analysis and you want to check the source data.

Variables
This button opens a dialog box containing a list of all the variables defined in the data file. Selecting a variable from this list displays its properties, namely its name, label, type, missing-value information and value labels. This box can be kept open while you work with the data file so that you can check a variable's information as you examine the results of an analysis.

Find
This button carries out a simple search to find a value.

Insert Case / Insert Variable
It is not unusual to want to add a case or a variable in the middle of data entry. These two buttons add a blank row or column to your data set.

Split File / Weight Cases / Select Cases
These buttons run three of the Data Menu commands: Split File, Weight Cases and Select Cases, respectively. (The Data Menu is discussed later in this section.)

Value Labels
This button displays the value labels in the Data Editor so that you don't have to remember what the numbers mean. Switching the button off displays the numbers again.

Use Sets
In the Data Editor window you can group variables into sets so that they can be analysed together. This button lets you specify which of the sets you have defined you want to use.

SPECIAL (STATISTICS) MENUS

Every statistical software has some menus that are special and distinct. The special menus of SPSS are called the Statistics Menus. They are:

1. Data Menu
2. Transform Menu
3. Analyze Menu

DATA MENU IN SPSS

The Data Menu provides procedures to define variables, insert variables or cases, sort cases, merge files, split files, select cases and use a variable to weight cases. Some of its items, such as sorting, merging and transposing data sets, selecting subsets of cases and splitting files by variables, are explained below.

a) Define variable properties:

SPSS offers a wizard-type tool that helps you set all variable properties through an interactive interface. Although it can be used for all types of variables, it is especially useful for categorical variables, as it scans the actual data for all distinct values. From the menu select Data > Define Variable Properties. A first panel appears that lets you select the variables for which you want to set or change properties:

Data>Define variable properties> take gender>Variable to scan

Select the variables. You can also limit the number of cases to scan (useful with very large files) and, as the tool is best used with categorical variables, limit the number of values (codes) that are displayed. When you click Continue, the next panel pops up.

b) Copy Data Properties

The Copy Data Properties Wizard lets you use an external SPSS Statistics data file as a template for defining file and variable properties in the active dataset. You can also use variables in the active dataset as templates for other variables in the active dataset. You can:

- Copy selected file properties from an external data file or open dataset to the active dataset. File properties include documents, file labels, multiple response sets, variable sets, and weighting.
- Copy selected variable properties from an external data file or open dataset to matching variables in the active dataset. Variable properties include value labels, missing values, level of measurement, variable labels, print and write formats, alignment, and column width (in the Data Editor).
- Copy selected variable properties from one variable in an external data file, open dataset, or the active dataset to many variables in the active dataset.
- Create new variables in the active dataset based on selected variables in an external data file or open dataset.

When copying data properties, the following general rules apply:

- If you use an external data file as the source data file, it must be a data file in SPSS Statistics format.
- If you use the active dataset as the source data file, it must contain at least one variable. You cannot use a completely blank active dataset as the source data file.
- Undefined (empty) properties in the source dataset do not overwrite defined properties in the active dataset.
- Variable properties are copied from the source variable only to target variables of a matching type: string (alphanumeric) or numeric (including numeric, date, and currency).

From the menus in the Data Editor window choose Data > Copy Data Properties. Select the data file with the file and/or variable properties that you want to copy. This can be a currently open dataset, an external SPSS Statistics data file, or the active dataset. Follow the step-by-step instructions in the Copy Data Properties Wizard.

Data>Copy Data Properties> follow the instructions of the wizard

Define Dates

The Define Dates dialog box allows you to generate date variables that can be used to establish the periodicity of a time series and to label output from time series analysis. The new variables are named and labeled as follows:

Name        Label
YEAR_       YEAR, not periodic
QUARTER_    QUARTER, period 4
MONTH_      MONTH, period 12
DATE_       DATE. FORMAT: "MMM YYYY"

The following is a partial listing of the new variables:

YEAR_    QUARTER_    MONTH_    DATE_
1950     2           4         APR 1950
1950     2           5         MAY 1950
1950     2           6         JUN 1950
1950     3           7         JUL 1950
1950     3           8         AUG 1950
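The logic behind these date variables can be sketched in a few lines of Python. This is only an illustration, not SPSS syntax: the function name define_dates and its arguments are hypothetical, and it assumes monthly data with one case per month, starting at a given year and month.

```python
MONTHS = ("JAN", "FEB", "MAR", "APR", "MAY", "JUN",
          "JUL", "AUG", "SEP", "OCT", "NOV", "DEC")

def define_dates(start_year, start_month, n_cases):
    """Build YEAR_, QUARTER_, MONTH_ and DATE_ for monthly data (hypothetical helper)."""
    rows = []
    for i in range(n_cases):
        offset = (start_month - 1) + i        # months elapsed since January of start_year
        year = start_year + offset // 12
        month = offset % 12 + 1
        quarter = (month - 1) // 3 + 1        # months 1-3 -> 1, 4-6 -> 2, and so on
        rows.append({"YEAR_": year, "QUARTER_": quarter, "MONTH_": month,
                     "DATE_": f"{MONTHS[month - 1]} {year}"})
    return rows

rows = define_dates(1950, 4, 5)               # five cases starting in April 1950
for row in rows:
    print(row["YEAR_"], row["QUARTER_"], row["MONTH_"], row["DATE_"])
```

Starting the sketch at April 1950 gives rows for April through August 1950, the same months as in the partial listing.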

Define Multiple Response Sets

To define multiple response sets:

1. From the menus, choose Data > Define Multiple Response Sets.
2. Select two or more variables. If your variables are coded as dichotomies, indicate which value you want to have counted.
3. Enter a unique name for each multiple response set. The name can be up to 63 bytes long. A dollar sign is automatically added to the beginning of the set name.
4. Enter a descriptive label for the set. (This is optional.)
5. Click Add to add the multiple response set to the list of defined sets.

Identify Duplicate Cases

Duplicate cases may occur in your data for many reasons, including:

- Data entry errors in which the same case is accidentally entered more than once.
- Multiple cases that share a common primary ID value but have different secondary ID values, such as family members who all live in the same house.
- Multiple cases that represent the same case but have different values for variables other than those that identify the case, such as multiple purchases made by the same person or company for different products or at different times.

Note: Use employee id and current salary as the matching variables to check for duplicate cases.
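As a rough sketch of what duplicate identification does, the Python fragment below groups cases by the matching variables and flags every case in a group of more than one. The four cases are invented for illustration. Treating the first case in each group as the primary one is an assumption made for this sketch, not necessarily what SPSS does by default.

```python
from collections import Counter

# Hypothetical cases; the values are invented for illustration.
cases = [
    {"id": 1, "salary": 50000},
    {"id": 2, "salary": 40000},
    {"id": 2, "salary": 40000},   # the same case entered twice
    {"id": 3, "salary": 30000},
]

key_vars = ("id", "salary")       # matching variables that define a duplicate
counts = Counter(tuple(c[v] for v in key_vars) for c in cases)

seen = set()
for c in cases:
    key = tuple(c[v] for v in key_vars)
    c["in_duplicate_group"] = counts[key] > 1
    c["primary"] = key not in seen    # assumption: first case in a group is primary
    seen.add(key)

for c in cases:
    print(c)
```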

Sort Cases

The Sort Cases procedure reorders the sequence of cases based on the values of one or more variables. You can sort cases in ascending or descending order, or use combinations of ascending and descending order for different variables. For example, if you select gender as the first sorting variable and minority as the second sorting variable, cases will be sorted by minority classification within each gender category.

Note: Sort cases by jobcat in descending order
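The idea of sorting on a first and a second variable can be illustrated with a small Python sketch. The cases below are invented, and the code only mimics the ordering logic; it is not SPSS syntax.

```python
# Hypothetical cases; the values are invented for illustration.
cases = [
    {"id": 1, "gender": "m", "minority": 1},
    {"id": 2, "gender": "f", "minority": 0},
    {"id": 3, "gender": "m", "minority": 0},
    {"id": 4, "gender": "f", "minority": 1},
]

# gender ascending, then minority ascending within each gender
by_gender_minority = sorted(cases, key=lambda c: (c["gender"], c["minority"]))

# Mixed directions: gender ascending, minority descending within gender.
# Python's sort is stable, so sort on the last key first, then on the first key.
mixed = sorted(cases, key=lambda c: c["minority"], reverse=True)
mixed = sorted(mixed, key=lambda c: c["gender"])

print([c["id"] for c in by_gender_minority])
print([c["id"] for c in mixed])
```

Sorting on the last key first and relying on a stable sort is one common way to combine ascending and descending orders across several variables.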


Sort Variables

You can sort the variables in the active dataset based on the values of any of the variable attributes (e.g., variable name, data type, measurement level), including custom variable attributes. Values can be sorted in ascending or descending order. You can save the original (pre-sorted) variable order in a custom variable attribute.

Note: Sort variables by width in ascending order


Transpose

The Transpose procedure creates a new data file in which the rows and columns of the original data file are transposed, so that cases (rows) become variables and variables (columns) become cases. Transpose automatically creates new variable names and displays a list of them. A new Untitled file is created with the transposed data set.

Ex: Flip variables = bdate gender salary salbegin by id
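A minimal Python sketch of the same idea, using three invented cases and two numeric variables. The column names CASE_LBL and K_1, K_2, ... are modeled on the names SPSS generates when a numeric id supplies the new variable names; treat that naming as an assumption of this sketch.

```python
# Hypothetical cases; the values are invented for illustration.
cases = [
    {"id": 1, "salary": 50000, "salbegin": 25000},
    {"id": 2, "salary": 40000, "salbegin": 20000},
    {"id": 3, "salary": 30000, "salbegin": 15000},
]
variables = ["salary", "salbegin"]    # variables that become rows

transposed = []
for var in variables:
    row = {"CASE_LBL": var}           # keeps the original variable name
    for case in cases:
        row[f"K_{case['id']}"] = case[var]   # one new column per original case
    transposed.append(row)

for row in transposed:
    print(row)
```

Three cases and two variables become two rows and three value columns, which is the row/column swap described above.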


Note: Variable view after transpose

Note: Data view after transpose

Please note: the values of gender are strings and hence are converted to system-missing (SYSMIS) in the transposed file.

Restructuring Data

The Restructure Data Wizard can help replace the current file with a new, restructured file. The wizard can:

- Restructure selected variables into cases.
- Restructure selected cases into variables.
- Transpose all data.

There are 7 steps to complete the restructuring. You just need to supply the variables in each step, following the instructions given by SPSS. Screen shots for selected steps are given below.


Merging Data Files

This procedure helps you merge data from two files in two different ways. You can:

- Merge the active dataset with another open dataset or SPSS-format data file containing the same variables but different cases. Take the employee data file Employee_MergeCase1 and merge Empolyee_MergeCase2 with it. Verify the number of cases and variables once the data is merged.
- Merge the active dataset with another open dataset or SPSS-format data file containing the same cases but different variables. Take the employee data file Employee_MergeVariable1 and merge Empolyee_MergeVariable2 with it. Verify the number of cases and variables once the data is merged.
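The two kinds of merge can be sketched in Python with small invented tables (lists of dictionaries). The data and the use of id as the matching key are assumptions for illustration only.

```python
# Hypothetical data; the values are invented for illustration.

# 1) Add cases: same variables, different cases -> stack the rows.
cases_file1 = [{"id": 1, "salary": 50000}, {"id": 2, "salary": 40000}]
cases_file2 = [{"id": 3, "salary": 30000}]
added_cases = cases_file1 + cases_file2

# 2) Add variables: same cases, different variables -> match rows on a key (id).
vars_file1 = [{"id": 1, "salary": 50000}, {"id": 2, "salary": 40000}]
vars_file2 = [{"id": 1, "educ": 15}, {"id": 2, "educ": 16}]
lookup = {row["id"]: row for row in vars_file2}
added_vars = [{**row, **lookup.get(row["id"], {})} for row in vars_file1]

print(len(added_cases), "cases after adding cases")
print(added_vars)
```

After the first merge the number of cases grows while the variables stay the same; after the second the number of variables grows while the cases stay the same, which is what the verification step checks.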


Note: Merge data file containing the same variables but different cases


Note: Merge data file containing different variables but the same cases

Aggregate Data

The Aggregate Data procedure aggregates groups of cases in the dataset into single cases. It either creates a new, aggregated file or adds new variables containing the aggregated data to the active dataset. Cases are aggregated based on the value of one or more break (grouping) variables. If you create a new, aggregated data file, it contains one case for each group defined by the break variables. For example, if there is one break variable with two values, the new data file will contain only two cases.

Exercise: From the Employee database, take gender as the break variable and education, job category, salary, beginning salary and previous experience as the aggregated variables.
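The break-variable logic can be sketched in Python as follows. The four cases are invented, the mean is used as the summary function purely as an example, and the output names (salary_mean, educ_mean, N_BREAK) are illustrative choices rather than guaranteed SPSS names.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical cases; the values are invented for illustration.
cases = [
    {"gender": "f", "salary": 30000, "educ": 12},
    {"gender": "m", "salary": 50000, "educ": 16},
    {"gender": "f", "salary": 40000, "educ": 16},
    {"gender": "m", "salary": 60000, "educ": 18},
]

# Collect the cases that share each value of the break variable.
groups = defaultdict(list)
for c in cases:
    groups[c["gender"]].append(c)

# One aggregated case per group.
aggregated = [
    {
        "gender": g,
        "salary_mean": mean(r["salary"] for r in rows),
        "educ_mean": mean(r["educ"] for r in rows),
        "N_BREAK": len(rows),         # number of cases in the group
    }
    for g, rows in sorted(groups.items())
]

for row in aggregated:
    print(row)
```

With one break variable that takes two values, the aggregated result has exactly two cases.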


Note: Employee dataset before aggregate


Variables aggregate by gender

Note: New variable after aggregate

Copy Dataset

With a click on this option, SPSS creates a complete duplicate of the dataset.

Split File

The Split File procedure splits the data file into separate groups for analysis based on the values of one or more grouping variables. If you select multiple grouping variables, cases are grouped by each variable within categories of the preceding variable. Depending on the purpose, the file may be split in two ways:

- Compare groups: The file is split and the statistical procedures are computed according to the groups defined. The results are presented together for comparison purposes.
- Organize output by groups: All results from each statistical procedure are displayed separately for each split-file group.
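The difference between the two layouts can be sketched in Python. The cases are invented and the mean salary stands in for "a statistical procedure"; the printed layouts only imitate the idea of one combined table versus one block per group.

```python
from itertools import groupby
from statistics import mean

# Hypothetical cases; the values are invented for illustration.
cases = [
    {"gender": "m", "salary": 50000},
    {"gender": "f", "salary": 30000},
    {"gender": "m", "salary": 60000},
    {"gender": "f", "salary": 40000},
]

# groupby needs the cases sorted by the grouping variable first.
ordered = sorted(cases, key=lambda c: c["gender"])
mean_salary = {
    g: mean(c["salary"] for c in grp)
    for g, grp in groupby(ordered, key=lambda c: c["gender"])
}

# "Compare groups": one table with a row per group.
print("gender  mean salary")
for g, m in mean_salary.items():
    print(f"{g:<7} {m}")

# "Organize output by groups": a separate block of output per group.
for g, m in mean_salary.items():
    print(f"\ngender = {g}")
    print(f"mean salary: {m}")
```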


Select Cases

The Select Cases procedure provides several methods for selecting a subgroup of cases based on criteria that include variables and complex arithmetical or logical expressions. You can also select a random sample of cases. The criteria used to define a subgroup can include:

- Variable values and ranges
- Date and time ranges
- Case (row) numbers
- Arithmetic expressions
- Logical expressions
- Functions
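Three of these selection methods (a logical expression, a range of case numbers and a random sample) can be sketched in Python. The cases, the 60000 threshold and the seed are invented for illustration.

```python
import random

# Hypothetical cases; the values are invented for illustration.
cases = [
    {"id": 1, "jobcat": 3, "salary": 50000},
    {"id": 2, "jobcat": 1, "salary": 40000},
    {"id": 3, "jobcat": 1, "salary": 30000},
    {"id": 4, "jobcat": 3, "salary": 70000},
]

# Logical expression: jobcat = 3 AND salary > 60000
selected = [c for c in cases if c["jobcat"] == 3 and c["salary"] > 60000]

# Case (row) numbers 2 through 3 (Python slices are zero-based)
rows_2_to_3 = cases[1:3]

# Random sample of exactly two cases; the seed only makes the sketch repeatable
rng = random.Random(0)
sample = rng.sample(cases, k=2)

print([c["id"] for c in selected])
print([c["id"] for c in rows_2_to_3])
print(len(sample), "cases sampled")
```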


Weight Cases

The Weight Cases procedure gives cases different weights (equivalent to frequencies) for statistical analysis. This option helps the researcher work with sampling schemes other than simple random sampling.

- The values of the weighting variable should indicate the number of observations represented by single cases in your data file.
- Cases with zero, negative, or missing values for the weighting variable are excluded from analysis.
- Fractional values are valid; they are used exactly where this is meaningful and most likely where cases are tabulated.

Once you apply a weight variable, it remains in effect until you select another weight variable or turn off weighting. If you save a weighted data file, the weighting information is saved with the data file. You can turn off weighting at any time, even after the file has been saved in weighted form.
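The effect of a frequency weight on a simple statistic can be sketched in Python. The rows and the weight variable name freq are invented; the sketch only shows the exclusion rule and the weighted mean, not how every SPSS procedure handles weights.

```python
# Hypothetical rows; "freq" is an invented weighting variable.
rows = [
    {"salary": 30000, "freq": 3},      # stands for three observations
    {"salary": 60000, "freq": 1},
    {"salary": 25000, "freq": 0},      # zero weight: excluded
    {"salary": 40000, "freq": None},   # missing weight: excluded
]

# Keep only cases with a positive, non-missing weight.
valid = [r for r in rows if r["freq"] is not None and r["freq"] > 0]

weighted_n = sum(r["freq"] for r in valid)
weighted_mean = sum(r["freq"] * r["salary"] for r in valid) / weighted_n
unweighted_mean = sum(r["salary"] for r in valid) / len(valid)

print("weighted N:", weighted_n)
print("weighted mean:", weighted_mean)
print("unweighted mean of the same cases:", unweighted_mean)
```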

Note: Output with no weights and with weighting on

TRANSFORM MENU IN SPSS

This menu helps to change, or transform, the values associated with the variables. A number of data transformation procedures are provided in the Transform Menu. The following procedures are available.


Computing Variables

The Compute procedure opens a dialog box that computes values for a variable based on arithmetic computations defined over other variables.

- You can compute values for numeric or string (alphanumeric) variables.
- You can create new variables or replace the values of existing variables. For new variables, you can also specify the variable type and label.
- You can compute values selectively for subsets of data based on logical conditions.
- You can use a large variety of built-in functions, including arithmetic functions, statistical functions, distribution functions, and string functions.

Exercise: From the Employee database, create a new variable called NewSalary. Add 2000 to the current salary.


NewSalary= current Salary+ 20000

Exercise: From the Employee database, create a new variable called Salary_Difference. Find the difference between current salary and beginning salary.

Salary_Difference = Current Salary - Beginning Salary
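As a plain-Python illustration of computing a new variable from existing ones, the sketch below derives Salary_Difference for a few invented cases. Returning a missing value when either input is missing is an assumption that mirrors how arithmetic on missing values usually behaves.

```python
# Hypothetical cases; the values are invented for illustration.
cases = [
    {"salary": 50000, "salbegin": 25000},
    {"salary": 40000, "salbegin": 20000},
    {"salary": 30000, "salbegin": None},   # beginning salary missing
]

for c in cases:
    if c["salary"] is None or c["salbegin"] is None:
        c["Salary_Difference"] = None      # missing input gives a missing result
    else:
        c["Salary_Difference"] = c["salary"] - c["salbegin"]

print([c["Salary_Difference"] for c in cases])
```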

Count Values within Cases

This dialog box creates a variable that counts the occurrences of the same value(s) in a list of variables for each case. For example, a survey might contain a list of magazines with yes/no check boxes to indicate which magazines each respondent reads. You could count the number of yes responses for each respondent to create a new variable that contains the total number of magazines read.

Exercise: From the employee data set, create a new variable empcat_Minority that counts, for each case, the occurrences of job category = 3 and minority = 0.


Empcat_Minority= 2 where job category is 3 and minority is 0
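The counting rule can be sketched in Python: for each case, count how many of the listed variables hold their target value. The four cases are invented for illustration.

```python
# Hypothetical cases; the values are invented for illustration.
cases = [
    {"jobcat": 3, "minority": 0},
    {"jobcat": 3, "minority": 1},
    {"jobcat": 1, "minority": 0},
    {"jobcat": 1, "minority": 1},
]

# Value to count in each variable.
targets = {"jobcat": 3, "minority": 0}

for c in cases:
    c["empcat_Minority"] = sum(
        1 for var, value in targets.items() if c[var] == value
    )

print([c["empcat_Minority"] for c in cases])
```

A case scores 2 only when both conditions hold, 1 when exactly one holds, and 0 when neither does.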

Shift Values

This procedure creates a new variable from an existing variable, containing either lead or lag values of the existing variable. It can also simply copy the existing variable under a new name.

Example: From the employee data set, take the variable Salary_Difference and create Salary_Gap from it with a lag of 1.


Salary_Gap with one lag from Salary_Difference
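Lag and lead can be sketched in Python with a small helper. The function shift and the four values are hypothetical; cases with no earlier (or later) case to draw from get a missing value.

```python
def shift(values, n):
    """Positive n gives a lag (value from n cases earlier); negative n gives a lead."""
    size = len(values)
    shifted = []
    for i in range(size):
        j = i - n                      # index of the case the value comes from
        shifted.append(values[j] if 0 <= j < size else None)
    return shifted

# Hypothetical values; invented for illustration.
salary_difference = [25000, 20000, 15000, 10000]

salary_gap = shift(salary_difference, 1)     # lag of 1
salary_lead = shift(salary_difference, -1)   # lead of 1

print(salary_gap)
print(salary_lead)
```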

Recode into same variable

The Recode into Same Variables dialog box allows you to reassign the values of existing variables or collapse ranges of existing values into new values. The recoded values overwrite the original ones in the same variable. For example, you could collapse salaries into salary-range categories. You can recode numeric and string variables. If you select multiple variables, they must all be the same type. You cannot recode numeric and string variables together.

Exercise: Take the variable Gender from the Employee database. Recode male as 1 and female as 0.


Gender takes the value as m for male and f for female before recoding

Gender variable has been recoded: m is recoded as 1 and f as 0


Recode into different variable

The Recode into Different Variables dialog box allows you to reassign the values of existing variables or collapse ranges of existing values into new values for a new variable. For example, you could collapse salaries into a new variable containing salary-range categories. You can recode numeric and string variables. You can recode numeric variables into string variables and vice versa. If you select multiple variables, they must all be the same type. You cannot recode numeric and string variables together. This option is less risky than recoding into the same variable because the original variable is retained.

Exercise: Convert the values of the variable gender, which is in the numeric form 1 and 0, to Male and Female. Create a new variable called Gender_String for the result.


Gender_string is a new variable with values in strings. The old variable gender has been retained
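The recode can be sketched in Python as a lookup from old values to new ones, written to a new variable so that the original is kept. The cases are invented, and mapping unmatched or missing values to an empty string is an assumption of this sketch.

```python
# Hypothetical cases; gender has already been recoded to 1 (male) and 0 (female).
cases = [{"gender": 1}, {"gender": 0}, {"gender": 1}, {"gender": None}]

old_to_new = {1: "Male", 0: "Female"}

for c in cases:
    # Write to a new variable; values without a rule become an empty string.
    c["Gender_String"] = old_to_new.get(c["gender"], "")

print(cases)
```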

Automatic Recode

The Automatic Recode dialog box converts string and numeric values into consecutive integers. When category codes are not sequential, the resulting empty cells reduce performance and increase memory requirements for many procedures. Additionally, some procedures cannot use string variables, and some require consecutive integer values for factor levels.

- The new variable(s) created by Automatic Recode retain any defined variable and value labels from the old variable. For any values without a defined value label, the original value is used as the label for the recoded value. A table displays the old and new values and value labels.
- String values are recoded in alphabetical order, with uppercase letters preceding their lowercase counterparts.
- Missing values are recoded into missing values higher than any nonmissing values, with their order preserved. For example, if the original variable has 10 nonmissing values, the lowest missing value would be recoded to 11, and the value 11 would be a missing value for the new variable.


Use the same recoding scheme for all variables. This option allows you to apply a single autorecoding scheme to all the selected variables, yielding a consistent coding scheme for all the new variables.

Exercise: Take the variable Beginning Salary and apply an automatic recode. Name the new variable Recoded_BeginSalary. Count how many people drew the lowest beginning salary.


The beginning salary has been ranked in ascending order.
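For numeric values, the recoding into consecutive integers can be sketched in Python as below. The five salaries are invented, and the sketch leaves out labels, string values and missing values.

```python
# Hypothetical beginning salaries; invented for illustration.
salbegin = [12000, 9000, 27000, 9000, 12000]

# Each distinct value gets a consecutive integer, in ascending order of the value.
codes = {value: code for code, value in enumerate(sorted(set(salbegin)), start=1)}
recoded = [codes[v] for v in salbegin]

# Code 1 is the lowest beginning salary, so counting 1s answers the exercise.
lowest_count = recoded.count(1)

print(codes)
print(recoded)
print("cases with the lowest beginning salary:", lowest_count)
```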

Visual Binning

Visual Binning is designed to assist in creating new variables by grouping the continuous values of existing variables into a limited number of distinct categories. We can use Visual Binning to:

- Create categorical variables from continuous scale variables. For example, we could use a scale income variable to create a new categorical variable that contains income ranges.
- Collapse a large number of ordinal categories into a smaller set of categories. For example, we could collapse a nine-point rating scale into three categories representing low, medium, and high.

In the first step, we select the numeric scale and/or ordinal variables for which we want to create new categorical (binned) variables. Optionally, we can limit the number of cases to scan. For data files with a large number of cases, limiting the number of cases scanned can save time, but we should avoid this if possible because it affects the distribution of values used in subsequent calculations in Visual Binning.

Note: String variables and nominal numeric variables are not displayed in the source variable list. Visual Binning requires numeric variables, measured on either a scale or ordinal level, since it assumes that the data values represent some logical order that can be used to group values in a meaningful fashion. We can change the defined measurement level of a variable in Variable View in the Data Editor.

Example: Take Educational Level, which is treated here as a continuous variable, and convert it into a categorical variable with the help of binning.


Education level with categories
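Binning with fixed cut points can be sketched in Python with the standard bisect module. The cut points 14 and 17 (upper endpoints included) and the sample values are assumptions chosen for illustration; in Visual Binning you set the cut points yourself.

```python
from bisect import bisect_left

# Assumed cut points: each is the inclusive upper end of a bin.
cut_points = [14, 17]
labels = ["14 or less", "15 - 17", "18+"]

def bin_value(value):
    """Return the 1-based bin number for a value."""
    return bisect_left(cut_points, value) + 1

# Hypothetical years of education; invented for illustration.
educ = [8, 12, 14, 15, 17, 18, 21]
binned = [bin_value(v) for v in educ]

for value, b in zip(educ, binned):
    print(value, "->", b, labels[b - 1])
```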

Optimal Binning

The Optimal Binning procedure discretizes one or more scale variables (referred to henceforth as binning input variables) by distributing the values of each variable into bins. Bin formation is optimal with respect to a categorical guide variable that "supervises" the binning process. Bins can then be used instead of the original data values for further analysis.

Reducing the number of distinct values a variable takes has a number of uses, including:

- Data requirements of other procedures. Discretized variables can be treated as categorical for use in procedures that require categorical variables. For example, the Crosstabs procedure requires that all variables be categorical.
- Data privacy. Reporting binned values instead of actual values can help safeguard the privacy of your data sources. The Optimal Binning procedure can guide the choice of bins.
- Speed performance. Some procedures are more efficient when working with a reduced number of distinct values. For example, the speed of Multinomial Logistic Regression can be improved using discretized variables.
- Uncovering complete or quasi-complete separation of data.

Optimal versus Visual Binning. The Visual Binning dialog boxes offer several automatic methods for creating bins without the use of a guide variable. These "unsupervised" rules are useful for producing descriptive statistics, such as frequency tables, but Optimal Binning is superior when the end goal is to produce a predictive model.

Output. The procedure produces tables of cut points for the bins and descriptive statistics for each binning input variable. Additionally, we can save new variables containing the binned values of the binning input variables to the active dataset, and save the binning rules as command syntax for use in discretizing new data.

Exercise: Group Current Salary with respect to Educational level binned.


Output: Current salary binned with respect to Educational Level group


Current Salary

                                    Number of Cases by Level of
         End Point                  Educational Level (years) (Binned)
Bin      Lower       Upper          12 - 14    15 - 17    18+    Total
1        a           $31,050        220        68         2      290
2        $31,050     $43,000        28         64         2      94
3        $43,000     $59,375        0          33         8      41
4        $59,375     a              1          10         38     49
Total                               249        175        50     474

Each bin is computed as Lower <= Current Salary < Upper.
a. Unbounded
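As a rough illustration of how the rule Lower <= Current Salary < Upper applies, the sketch below assigns salaries to the four bins using the cut points from the table and then counts cases per bin and education group. The case data and the function name salary_bin are made up for this sketch; it does not reproduce how SPSS chooses the cut points.

```python
from bisect import bisect_right
from collections import Counter

# Cut points read from the table above; bins are Lower <= salary < Upper.
CUT_POINTS = [31050, 43000, 59375]

def salary_bin(salary):
    """Return the bin number (1 to 4) for a salary."""
    # bisect_right counts the cut points that are <= salary, so a salary
    # equal to a cut point lands in the higher bin, matching the rule above.
    return bisect_right(CUT_POINTS, salary) + 1

# Made-up (salary, education group) cases, for illustration only.
cases = [(27000, "12 - 14"), (31050, "15 - 17"), (45000, "15 - 17"),
         (103750, "18+"), (30000, "12 - 14")]

# Count cases per (bin, education group), like the cells of the table.
counts = Counter((salary_bin(s), group) for s, group in cases)
for (bin_no, group), n in sorted(counts.items()):
    print(bin_no, group, n)
```

Note that a salary of exactly $31,050 falls in bin 2, because the lower end point is inclusive and the upper end point is not.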

Rank Cases

The Rank Cases dialog box creates new variables containing ranks, normal and Savage scores, and percentile values for numeric variables.

New variable names and descriptive variable labels are automatically generated, based on the original variable name and the selected measure(s). A summary table lists the original variables, the new variables, and the variable labels.

Optionally, you can:

Rank cases in ascending or descending order.

Organize rankings into subgroups by selecting one or more grouping variables for the By list. Ranks are computed within each group. Groups are defined by the combination of values of the grouping variables. For example, if you select gender and minority as grouping variables, ranks are computed for each combination of gender and minority.

Exercise: Rank Beginning salary in descending order.


Ranks are assigned to Beginning salary in descending order
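The idea of ranking in descending order can be sketched as below. The salary values are made up, the function name rank_descending is invented, and the treatment of ties (tied values share the mean of their ranks) is an assumption of this sketch rather than something stated in the text.

```python
def rank_descending(values):
    """Rank values so the largest gets rank 1; tied values share the mean rank."""
    # Indices of the values sorted from largest to smallest.
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        # Find the run of sorted positions that hold the same value.
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        # Positions pos..end correspond to ranks pos+1..end+1; use their mean.
        mean_rank = (pos + 1 + end + 1) / 2
        for k in range(pos, end + 1):
            ranks[order[k]] = mean_rank
        pos = end + 1
    return ranks

beginning_salary = [27000, 18750, 12000, 18750, 21000]  # made-up values
print(rank_descending(beginning_salary))  # [1.0, 3.5, 5.0, 3.5, 2.0]
```

Ranking within subgroups would apply the same function separately to the cases of each combination of grouping values.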

Create Time Series

Several data transformations that are useful in time series analysis are provided in this procedure:

Generate date variables to establish periodicity and to distinguish between historical, validation, and forecasting periods.

Create new time series variables as functions of existing time series variables.

Replace system- and user-missing values with estimates based on one of several methods.

A time series is obtained by measuring a variable (or set of variables) regularly over a period of time. Time series data transformations assume a data file structure in which each case (row) represents a set of observations at a different time, and the length of time between cases is uniform.

Exercise: Create a time series from the variable Current salary using the cumulative sum.

Cumulative sum of current salary
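A cumulative sum replaces each value with the running total of all values up to and including that case. A minimal sketch with made-up salary values:

```python
from itertools import accumulate

current_salary = [57000, 40200, 21450, 21900]  # made-up values, one per case

# Each element is the sum of the series up to and including that case.
cumulative = list(accumulate(current_salary))
print(cumulative)  # [57000, 97200, 118650, 140550]
```

The result depends on the order of the cases, which is why the procedure assumes rows are arranged in time order.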

Replace Missing Values

Missing observations can be problematic in analysis, and some time series measures cannot be computed if there are missing values in the series. Sometimes the value for a particular observation is simply not known. In addition, missing data can result from any of the following:

Each degree of differencing reduces the length of a series by 1.

Each degree of seasonal differencing reduces the length of a series by one season.

If you create new series that contain forecasts beyond the end of the existing series (by clicking a Save button and making suitable choices), the original series and the generated residual series will have missing data for the new observations.

Some transformations (for example, the log transformation) produce missing data for certain values of the original series.

Missing data at the beginning or end of a series pose no particular problem; they simply shorten the useful length of the series. Gaps in the middle of a series (embedded missing data) can be a much more serious problem. The extent of the problem depends on the analytical procedure you are using.

The Replace Missing Values dialog box allows you to create new time series variables from existing ones, replacing missing values with estimates computed with one of several methods. The default name of each new variable is the first six characters of the existing variable used to create it, followed by an underscore and a sequential number. For example, for the variable price, the new variable name would be price_1. The new variables retain any defined value labels from the original variables.

Exercise: Replace missing values in the variable Salary Difference by using the series mean.


Missing values have been substituted with series mean
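Replacement by the series mean can be sketched as follows: the mean is computed from the observed values only and then substituted for every missing value. The data are made up, missing values are represented by None, and the function name is invented for this sketch.

```python
def replace_with_series_mean(series):
    """Return a copy of series with None entries replaced by the mean of the rest."""
    observed = [x for x in series if x is not None]
    if not observed:
        raise ValueError("series has no observed values")
    mean = sum(observed) / len(observed)
    # The original list is left unchanged, like saving to a new variable.
    return [mean if x is None else x for x in series]

salary_difference = [1500.0, None, 2500.0, 2000.0, None]  # made-up values
print(replace_with_series_mean(salary_difference))
# [1500.0, 2000.0, 2500.0, 2000.0, 2000.0]
```

Returning a new list mirrors the dialog box's behaviour of writing the estimates to a new variable instead of overwriting the original one.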

Random Number Seed

The Random Number Seed dialog box lets you select the random number generator and set the seed value so that a sequence of random numbers can be reproduced. Two different random number generators are available:

Version 12 Compatible. The random number generator used in version 12 and previous releases. If you need to reproduce randomized results generated in previous releases based on a specified seed value, use this random number generator.

Mersenne Twister. A newer random number generator that is more reliable for simulation purposes. If reproducing randomized results from version 12 or earlier is not an issue, use this random number generator.

The random number seed changes each time a random number is generated for use in transformations (such as random distribution functions), random sampling, or case weighting. To replicate a sequence of random numbers, set the initialization starting point value prior to each analysis that uses the random numbers. The value must be a positive integer.
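The effect of fixing a seed can be illustrated outside SPSS. Python's random module also uses a Mersenne Twister generator, but the numbers it produces should not be expected to match those from SPSS; the sketch only shows that the same seed reproduces the same sequence. The seed value and the function name draw are arbitrary choices for this illustration.

```python
import random

def draw(seed, n=5):
    """Return n pseudo-random numbers from a generator started at the given seed."""
    rng = random.Random(seed)  # separate generator, so global state is untouched
    return [rng.random() for _ in range(n)]

first = draw(2000000)
second = draw(2000000)
print(first == second)  # True: same seed, same sequence
```

Because the generator's internal state advances with every number drawn, the seed has to be set again before each analysis that should be reproducible.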


ANALYZE MENU IN SPSS

The Analyze menu is the workhorse of SPSS. Nearly all procedures that generate output are located on this menu. Only the most important of these procedures are discussed here.

1. Descriptive Statistics

a. Frequencies Statistics

The Frequencies procedure provides statistics and graphical displays that are useful for describing many types of variables. It reports a frequency table along with selected statistics and graphs, viz., percentile values, central tendency, dispersion, skewness and kurtosis, and basic charts.

b. Descriptives Options

One or more of the following subgroup statistics may be computed for the variables within each category of each grouping variable: sum, number of cases, mean, median, grouped median, standard error of the mean, minimum, maximum, range, variable value of the first category of the grouping variable, variable value of the last category of the grouping variable, standard deviation, variance, kurtosis, standard error of kurtosis, skewness, standard error of skewness, percentage of total sum, percentage of total N, percentage of sum in, percentage of N in, geometric mean, and harmonic mean. You can change the order in which the subgroup statistics appear. The order in which the statistics appear in the Cell Statistics list is the order in which they are displayed in the output. Summary statistics are also displayed for each variable across all categories.

c. Explore Statistics

Along with descriptive statistics, M-estimators, including Huber's M-estimator, are displayed. Outliers displays the five largest and five smallest values with case labels.

d. Crosstabs

The Crosstabs procedure forms two-way and multiway tables and provides a variety of tests and measures of association for two-way tables.

2. Compare Means

The following set of procedures helps to compare the differences in means among two or more groups.

a. Independent-Samples T Test

The Independent-Samples T Test procedure compares means for two groups of cases. Ideally, for this test, the subjects should be randomly assigned to two groups, so that any difference in response is due to the treatment (or lack of treatment) and not to other factors. This is not the case if you compare average income for males and females, since gender is not randomly assigned.

b. Paired-Samples T Test

The Paired-Samples T Test procedure compares the means of two variables for a single group. The procedure computes the differences between values of the two variables for each case and tests whether the average difference differs from 0.

c. One-Way ANOVA

The One-Way ANOVA procedure produces a one-way analysis of variance for a quantitative dependent variable by a single factor (independent) variable. Analysis of variance is used to test the hypothesis that several means are equal. This technique is an extension of the two-sample t test. In addition to determining that differences exist among the means, you may want to know which means differ. There are two types of tests for comparing means: a priori contrasts and post hoc tests. Contrasts are tests set up before running the experiment, and post hoc tests are run after the experiment has been conducted. You can also test for trends across categories.
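The paired-samples comparison described above (take the difference for each case, then test whether the mean difference is 0) can be sketched as follows. The sketch computes only the t statistic and its degrees of freedom, not the significance level; the data and the function name paired_t are made up for illustration.

```python
from math import sqrt
from statistics import mean, stdev

def paired_t(first, second):
    """Return (t statistic, degrees of freedom) for a paired-samples t test."""
    if len(first) != len(second):
        raise ValueError("both variables need one value per case")
    diffs = [a - b for a, b in zip(first, second)]
    n = len(diffs)
    # t = mean difference divided by its standard error (sample SD / sqrt(n)).
    t = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t, n - 1

current = [57000, 40200, 21450, 21900, 45000]    # made-up values
beginning = [27000, 18750, 12000, 13200, 21000]  # made-up values
t, df = paired_t(current, beginning)
print(round(t, 3), df)
```

The statistic is undefined when all differences are identical, because their standard deviation is then 0; a fuller implementation would need to handle that case.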

3. GLM Model

The GLM procedure provides regression analysis and analysis of variance for one dependent variable by one or more factors and/or variables. The factor variables divide the population into groups. Using this General Linear Model procedure, you can test null hypotheses about the effects of other variables on the means of various groupings of a single dependent variable. You can investigate interactions between factors as well as the effects of individual factors, some of which may be random. In addition, the effects of covariates and covariate interactions with factors can be included. For regression analysis, the independent (predictor) variables are specified as covariates.

4. Bivariate Correlations Options

The Bivariate Correlations procedure computes Pearson's correlation coefficient, Spearman's rho, and Kendall's tau-b with their significance levels. Correlations measure how variables or rank orders are related. Before calculating a correlation coefficient, screen your data for outliers (which can cause misleading results) and evidence of a linear relationship. Pearson's correlation coefficient is a measure of linear association. Two variables can be perfectly related, but if the relationship is not linear, Pearson's correlation coefficient is not an appropriate statistic for measuring their association.

5. Partial Correlations

The Partial Correlations procedure computes partial correlation coefficients that describe the linear relationship between two variables while controlling for the effects of one or more additional variables. Correlations are measures of linear association. Two variables can be perfectly related, but if the relationship is not linear, a correlation coefficient is not an appropriate statistic for measuring their association.

6. Linear Regression Variable Selection Methods

Method selection allows you to specify how independent variables are entered into the analysis. Using different methods, you can construct a variety of regression models from the same set of variables.

7. Discriminant Analysis

Discriminant analysis builds a predictive model for group membership. The model is composed of a discriminant function (or, for more than two groups, a set of discriminant functions) based on linear combinations of the predictor variables that provide the best discrimination between the groups. The functions are generated from a sample of cases for which group membership is known; the functions can then be applied to new cases that have measurements for the predictor variables but have unknown group membership.

8. Factor Analysis

Factor analysis attempts to identify underlying variables, or factors, that explain the pattern of correlations within a set of observed variables. Factor analysis is often used in data reduction to identify a small number of factors that explain most of the variance that is observed in a much larger number of manifest variables. Factor analysis can also be used to generate hypotheses regarding causal mechanisms or to screen variables for subsequent analysis (for example, to identify collinearity prior to performing a linear regression analysis).
