
SPSS is a computer program used for survey authoring and deployment (IBM SPSS Data Collection), data mining (IBM SPSS Modeler), text analytics, statistical analysis, and collaboration and deployment (batch and automated scoring services).
SPSS (originally, Statistical Package for the Social Sciences) was released in its first version in 1968 after being developed by Norman H. Nie and C. Hadlai Hull. Norman Nie was then a political science postgraduate at Stanford University, and is now Research Professor in the Department of Political Science at Stanford and Professor Emeritus of Political Science at the University of Chicago.[1] SPSS is among the most widely used programs for statistical analysis in social science. It is used by market researchers, health researchers, survey companies, government, education researchers, marketing organizations and others.

In addition to statistical analysis, data management (case selection, file reshaping, creating derived data) and data documentation (a metadata dictionary is stored in the datafile) are features of the base software.
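
As a rough sketch of what these data management features can look like in command syntax, the fragment below creates a derived variable, recodes a variable into groups, and selects cases. The variable names (income, hhsize, age, income_pc, agegrp) are hypothetical and do not belong to any dataset described here; the recode also assumes that age is recorded in whole years.

```spss
* Hypothetical variables: income, hhsize and age are assumed to exist.
* Derived data: income per household member.
COMPUTE income_pc = income / hhsize.
* Recode age into three groups stored in a new variable.
RECODE age (LO THRU 17=1) (18 THRU 64=2) (65 THRU HI=3) INTO agegrp.
* Case selection: keep adults only (other cases are dropped from the active dataset).
SELECT IF (age >= 18).
EXECUTE.
```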

Statistics included in the base software:


- Descriptive statistics: Cross tabulation, Frequencies, Descriptives, Explore, Descriptive Ratio Statistics
- Bivariate statistics: Means, t-test, ANOVA, Correlation (bivariate, partial, distances), Nonparametric tests
- Prediction for numerical outcomes: Linear regression
- Prediction for identifying groups: Factor analysis, cluster analysis (two-step, K-means, hierarchical), Discriminant

SPSS datasets have a two-dimensional table structure in which the rows typically represent cases (such as individuals or households) and the columns represent measurements (such as age, sex or household income). Only two data types are defined: numeric and text (or "string"). All data processing occurs sequentially, case by case, through the file. Files can be matched one-to-one and one-to-many, but not many-to-many.

The graphical user interface has two views, which can be toggled by clicking on one of the two tabs in the bottom left of the SPSS window. The 'Data View' shows a spreadsheet view of the cases (rows) and variables (columns). Unlike spreadsheets, the data cells can only contain numbers or text, and formulas cannot be stored in these cells. The 'Variable View' displays the metadata dictionary, where each row represents a variable and shows the variable name, variable label, value label(s), print width, measurement type and a variety of other characteristics. Cells in both views can be manually edited, defining the file structure and allowing data entry without using command syntax. This may be sufficient for small datasets.

Larger datasets such as statistical surveys are more often created in data entry software, or entered during computer-assisted personal interviewing, by scanning and using optical character recognition and optical mark recognition software, or by direct capture from online questionnaires. These datasets are then read into SPSS.
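
The dictionary attributes that the 'Variable View' displays can also be set with command syntax rather than by editing cells. The sketch below is only an illustration: age and sex are hypothetical variables, and the labels and formats are arbitrary choices.

```spss
* Hypothetical variables: age (numeric) and sex (numeric codes 1 and 2).
VARIABLE LABELS age 'Age in years' /sex 'Sex of respondent'.
VALUE LABELS sex 1 'Male' 2 'Female'.
VARIABLE LEVEL age (SCALE) /sex (NOMINAL).
FORMATS age (F3.0).
* Print the dictionary to check the result.
DISPLAY DICTIONARY.
```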

SPSS can read and write data from ASCII text files (including hierarchical files), other statistics packages, spreadsheets and databases. SPSS can read and write to external relational database tables via ODBC and SQL.

SPSS Server is a version of SPSS with a client/server architecture. It had some features not available in the desktop version, such as scoring functions (scoring functions are included in the desktop version from version 19 onward).

FUNCTIONS:

1. Introduction

SPSS has a wide variety of functions you can use for creating and recoding variables. We will explore four kinds of functions: mathematical functions, statistical functions, string functions, and random number functions. These functions have the same general syntax:
function_name(argument1, argument2, etc.)

We will illustrate some functions using the following data file that includes name, x, test1, test2, and test3.
DATA LIST FREE / name (A14) x test1 test2 test3.
BEGIN DATA.
"John Smith" 4.2 86.5 84.55 81
"Samuel Adams" 9.0 -99 82.37 -99
"Ben Johnson" -6.2 82.1 84.81 87
"Chris Adraktas" 9.5 94.2 -99 93
"John Brown" -999 79.7 79.07 72
END DATA.
LIST.

The output of the LIST command is shown below.


NAME                  X    TEST1    TEST2    TEST3
John Smith         4.20    86.50    84.55    81.00
Samuel Adams       9.00   -99.00    82.37   -99.00
Ben Johnson       -6.20    82.10    84.81    87.00
Chris Adraktas     9.50    94.20   -99.00    93.00
John Brown      -999.00    79.70    79.07    72.00

The variable x uses -999 to indicate missing values, and test1, test2 and test3 use -99 to indicate missing values. Below we tell SPSS about these missing values and list out the data again.

MISSING VALUES x (-999) /test1 test2 test3 (-99).
LIST.

The output is shown below. Note that the data really does not look any different after we have defined the missing values. But, as we will see below, SPSS does know to treat these values as missing rather than treating them as though they were -99 and -999.
NAME                  X    TEST1    TEST2    TEST3
John Smith         4.20    86.50    84.55    81.00
Samuel Adams       9.00   -99.00    82.37   -99.00
Ben Johnson       -6.20    82.10    84.81    87.00
Chris Adraktas     9.50    94.20   -99.00    93.00
John Brown      -999.00    79.70    79.07    72.00
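
One way to confirm that SPSS now treats -99 and -999 as missing is to ask it directly. The sketch below uses the MISSING and NMISS functions; xmiss and nmiss are names introduced here purely for illustration.

```spss
* Flag cases where x is missing (1 = missing, 0 = valid).
COMPUTE xmiss = MISSING(x).
* Count how many of the three test scores are missing for each case.
COMPUTE nmiss = NMISS(test1, test2, test3).
LIST name x xmiss nmiss.
```

With the MISSING VALUES command above in effect, John Brown should show xmiss equal to 1, and Samuel Adams should show nmiss equal to 2.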

2. Math functions

Now let's try some basic math functions. The trunc function (short for truncate) takes a number and converts it to a whole number (integer) by removing all the decimal places; for example, 6.99 and 6.49 would both become 6. By contrast, the rnd function (short for round) rounds numbers to the nearest whole number using conventional rounding rules; for example, 6.99 would become 7, but 6.49 would become 6.
COMPUTE t1tr = TRUNC(test1).
COMPUTE t2tr = TRUNC(test2).
COMPUTE t1rnd = RND(test1).
COMPUTE t2rnd = RND(test2).
LIST name test1 t1tr t1rnd test2 t2tr t2rnd.

The results below are as we would expect.


NAME             TEST1    T1TR   T1RND   TEST2    T2TR   T2RND
John Smith       86.50   86.00   87.00   84.55   84.00   85.00
Samuel Adams    -99.00       .       .   82.37   82.00   82.00
Ben Johnson      82.10   82.00   82.00   84.81   84.00   85.00
Chris Adraktas   94.20   94.00   94.00  -99.00       .       .
John Brown       79.70   79.00   80.00   79.07   79.00   79.00
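
Both functions also accept negative numbers, and rnd can be combined with multiplication and division to round to a given number of decimal places. A small sketch using x and test2 from the data above (xtr, xrnd and t2r1 are illustrative names):

```spss
* TRUNC drops the decimals, so -6.2 becomes -6.
COMPUTE xtr = TRUNC(x).
* RND rounds to the nearest whole number, so -6.2 also becomes -6.
COMPUTE xrnd = RND(x).
* Round test2 to one decimal place by scaling up, rounding, and scaling back.
COMPUTE t2r1 = RND(test2 * 10) / 10.
LIST x xtr xrnd test2 t2r1.
```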

SPSS has other mathematical functions. Below we illustrate functions for getting the square root (sqrt), natural log (ln), log to the base 10 (lg10) and exponential (exp). Note that the sqrt, ln and lg10 functions do not work with negative numbers (for example you cannot take the square root of a negative number). SPSS will generate missing values in such cases, as we will see below.
COMPUTE xsqrt = SQRT(x).
COMPUTE xln = LN(x).
COMPUTE xlg10 = LG10(x).
COMPUTE xexp = EXP(x).
EXECUTE.

LIST x xsqrt xln xlg10 xexp.

The results are shown below. We expected SPSS to generate missing values for xsqrt, xln and xlg10 when x was negative and we see below that those values are displayed as a single decimal point. This is the way that SPSS shows a system missing value. Also, we see that xsqrt, xln, xlg10 and xexp were all assigned system missing values when x was -999.
        X   XSQRT     XLN   XLG10      XEXP
     4.20    2.05    1.44     .62     66.69
     9.00    3.00    2.20     .95   8103.08
    -6.20       .       .       .       .00
     9.50    3.08    2.25     .98  13359.73
  -999.00       .       .       .         .

The results also included warnings like the one shown below. The one below is telling us that you cannot take the square root of a negative number and that SPSS is going to set the result to the system missing value.
Warning # 603
The argument to the square root function is less than zero. The result has been set to the system-missing value.
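
If such warnings are unwanted, one option is to compute the result only for cases where the function is defined. The sketch below uses the IF command for this; xsqrt2 and xln2 are illustrative names. Cases that fail the condition, or that have a missing x, are left as system missing.

```spss
* Take the square root only when x is zero or positive.
IF (x >= 0) xsqrt2 = SQRT(x).
* Take the natural log only when x is strictly positive.
IF (x > 0) xln2 = LN(x).
LIST x xsqrt2 xln2.
```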

3. Statistical functions

SPSS also has statistical functions that operate on one or more variables. For example, we might want to compute the average of the three test scores. SPSS has the MEAN function that can do that for you, as shown below.
COMPUTE avg = MEAN(test1, test2, test3).
LIST name test1 test2 test3 avg.

We see the results below. Note that SPSS computed the mean of the non-missing values. For Samuel Adams, that meant that his average was the same as his score on test2, since that was his only non-missing value. We could tell SPSS to give anyone a missing value if they have fewer than 2 valid test scores by using the mean.2 function. Likewise, we could tell SPSS that we want the mean to be missing if any of the scores are missing by using the mean.3 function. These are illustrated below.
COMPUTE avg2 = MEAN.2(test1, test2, test3).
COMPUTE avg3 = MEAN.3(test1, test2, test3).
LIST name test1 test2 test3 avg avg2 avg3.

As you see below, avg2 is missing for Samuel Adams, and avg3 is also missing for Samuel Adams and Chris Adraktas because they both had some missing test scores.
NAME             TEST1   TEST2   TEST3     AVG    AVG2    AVG3
John Smith       86.50   84.55   81.00   84.02   84.02   84.02
Samuel Adams    -99.00   82.37  -99.00   82.37       .       .
Ben Johnson      82.10   84.81   87.00   84.64   84.64   84.64
Chris Adraktas   94.20  -99.00   93.00   93.60   93.60       .
John Brown       79.70   79.07   72.00   76.92   76.92   76.92
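
Other row-wise statistical functions follow the same pattern as mean, including the optional .n suffix that sets a minimum number of valid values. A brief sketch, with variable names (nvalid, total, best, total3) chosen only for illustration:

```spss
* Number of non-missing test scores for each person.
COMPUTE nvalid = NVALID(test1, test2, test3).
* Sum of the non-missing scores.
COMPUTE total = SUM(test1, test2, test3).
* Highest non-missing score.
COMPUTE best = MAX(test1, test2, test3).
* Sum that is set to missing unless all three scores are valid.
COMPUTE total3 = SUM.3(test1, test2, test3).
LIST name nvalid total best total3.
```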

In addition to the mean function, SPSS also has sum, sd, variance, min and max functions.

4. String functions

Now let's illustrate some of the SPSS string functions. Below we create up, which will be the name converted to upper case; lo, which will be the name converted to lower case; and sub, which will be the third through eighth characters of the person's name. The third argument of substr is the number of characters to extract, so six characters starting at position 3 gives characters three through eight. Note that we first had to use the string command to tell SPSS that up and lo are string variables with a length of up to 14 characters and that sub is a string variable with a length of up to 6 characters. Had we omitted the string command, these would have been treated as numeric variables, and when SPSS tried to assign a character value to the numeric variables, it would have generated an error. We also create len, which is the length of the name variable, and len2, which is the length of the person's name.
STRING up lo (A14) /sub (A6).
COMPUTE up = UPCASE(name).
COMPUTE lo = LOWER(name).
COMPUTE sub = SUBSTR(name,3,6).
COMPUTE len = LENGTH(name).
COMPUTE len2 = LENGTH(RTRIM(name)).

LIST name up lo sub len len2.

The results are shown below. The results for up, lo and sub are all as we would expect. The result for len may be a bit confusing. The variable len does not hold the length of the person's name; it holds the defined width of the variable name. When we read the data we entered name (A14), giving the variable a width of 14, and that is why len is always 14. By contrast, len2 uses the rtrim function to strip off any trailing blanks and then takes the length of the result. In the end, len2 returns the length of the person's name; for example, John Smith has a length of 10.
NAME            UP              LO              SUB       LEN   LEN2
John Smith      JOHN SMITH      john smith      hn Smi  14.00  10.00
Samuel Adams    SAMUEL ADAMS    samuel adams    muel A  14.00  12.00
Ben Johnson     BEN JOHNSON     ben johnson     n John  14.00  11.00
Chris Adraktas  CHRIS ADRAKTAS  chris adraktas  ris Ad  14.00  14.00
John Brown      JOHN BROWN      john brown      hn Bro  14.00  10.00

Let's use SPSS string functions to get the first name and last name out of the name variable. We start by using the index function to determine the position of the first blank space in the name. We then use the substr function to extract the part of the name before the blank to be the first name, and the part after the blank to be the last name.
STRING fname lname (A10).
COMPUTE blank = INDEX(name,' ').
COMPUTE fname = SUBSTR(name,1,blank-1).
COMPUTE lname = SUBSTR(name,blank+1).
LIST name blank fname lname.

The results below show that this was successful. For example, for John Smith, the substr function extracted the first name by taking the substring from the 1st to 4th character of name, and the last name by taking the 6th character and onward.
NAME            BLANK  FNAME       LNAME
John Smith       5.00  John        Smith
Samuel Adams     7.00  Samuel      Adams
Ben Johnson      4.00  Ben         Johnson
Chris Adraktas   6.00  Chris       Adraktas
John Brown       5.00  John        Brown
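
The extracted pieces can be put back together in a different order with the concat function. The sketch below builds a "Last, First" version of the name. It assumes fname and lname were created as above; revname is an illustrative variable, and its width of 16 is chosen to fit the longest name plus the comma and space.

```spss
* Declare the new string variable before assigning to it.
STRING revname (A16).
* RTRIM removes the trailing blanks that pad lname to its full width.
COMPUTE revname = CONCAT(RTRIM(lname), ', ', fname).
LIST name revname.
```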

5. Random number functions

Random numbers are more useful than you might imagine. They are used extensively in Monte Carlo studies, but they are also frequently used in many other situations. We will look at two of SPSS's random number functions:

- uniform(n) - generates a random number from a uniform distribution that is 0 or greater and less than n.
- rv.binomial(n,p) - generates a value from the binomial distribution with n trials and a probability of success equal to p.

Below we generate a random number that is greater than or equal to 0, but less than 1.
COMPUTE rannum = UNIFORM(1).
LIST name rannum.

We see the results below.


NAME            RANNUM
John Smith         .14
Samuel Adams       .43
Ben Johnson        .61
Chris Adraktas     .29
John Brown         .16

Below we generate a random number that is greater than or equal to 0, but less than 10.
COMPUTE ran10 = UNIFORM(10).
LIST name ran10.

And the results are shown below.


NAME             RAN10
John Smith        7.00
Samuel Adams      3.46
Ben Johnson       4.46
Chris Adraktas     .52
John Brown        1.03

The example below generates a whole number (integer) from 1 to 100. The trunc function is used to convert the result of uniform(100) into a whole number from 0 to 99, and then 1 is added to make it range from 1 to 100.
COMPUTE ran100 = TRUNC(UNIFORM(100)) + 1.
LIST name ran100.

As we see below, these values are all whole numbers.

NAME            RAN100
John Smith       15.00
Samuel Adams      5.00
Ben Johnson      63.00
Chris Adraktas   16.00
John Brown       72.00
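
The same idea works for any range of whole numbers. As an illustration, a six-sided die could be simulated as follows (die is a name introduced here, not part of the original example):

```spss
* UNIFORM(6) is at least 0 and less than 6, so TRUNC gives 0 through 5.
* Adding 1 shifts the range to 1 through 6.
COMPUTE die = TRUNC(UNIFORM(6)) + 1.
LIST name die.
```

More generally, a whole number from lo to hi can be obtained with TRUNC(UNIFORM(hi - lo + 1)) + lo.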

Below we use the rv.binomial function to simulate a coin flip. It is like a coin flip since the number of trials is 1 and the probability of success is .5 (like flipping a coin once, with a probability of .5 that it comes up heads). Let's treat a 1 as coming up heads, and a 0 as coming up tails. As we see below, Ben Johnson and John Brown each got a head, and the others got tails.
COMPUTE flip = RV.BINOMIAL(1, .5).
LIST name flip.

NAME              FLIP
John Smith         .00
Samuel Adams       .00
Ben Johnson       1.00
Chris Adraktas     .00
John Brown        1.00

Below, we change the number of flips to 10, and count the number of heads each person gets. John Brown got the most heads (7) and Ben Johnson got the fewest (4).
COMPUTE flip10 = RV.BINOMIAL(10, .5).
LIST name flip10.

NAME            FLIP10
John Smith        6.00
Samuel Adams      6.00
Ben Johnson       4.00
Chris Adraktas    5.00
John Brown        7.00

The next example changes the flips to 100. It also sets the seed for the random number generator. The seed determines the sequence of random numbers that will be generated. John Smith got the fewest heads (49 out of 100) and Samuel Adams got the most (58 out of 100).
SET SEED = 149238.
COMPUTE flip100 = RV.BINOMIAL(100, .5).
LIST name flip100.

NAME            FLIP100
John Smith        49.00
Samuel Adams      58.00
Ben Johnson       52.00
Chris Adraktas    53.00
John Brown        52.00

If we repeat the example from above using the exact same seed, we will get the same results. This is very useful for being able to replicate results of a simulation study or Monte Carlo style study. Indeed, using the same seed did generate the same results (see below).
SET SEED = 149238.
COMPUTE flip100 = RV.BINOMIAL(100, .5).
LIST name flip100.

NAME            FLIP100
John Smith        49.00
Samuel Adams      58.00
Ben Johnson       52.00
Chris Adraktas    53.00
John Brown        52.00

6. Random number functions, advanced

In the examples above, we used the rv.binomial function to simulate coin flips, but it gave us only the total number of heads across all of the flips. Perhaps you would like to do a simulation study where you generate each of the flips as a separate observation. SPSS can do this, as we illustrate below.
SET SEED=943785.
INPUT PROGRAM.
+ LOOP id = 1 TO 25.
+   COMPUTE cointoss = RV.BINOMIAL(1, .5).
+   END CASE.
+ END LOOP.
+ END FILE.
END INPUT PROGRAM.
LIST CASES.

The program above creates 25 observations, each having a variable called id, which is the trial number, and a variable called cointoss, which will be either 1 or 0. Even if this program does not make much sense to you, you could use it as a template to make your own simulation. You can change the number of trials by changing 25 to the number of trials you want. You can change the probability of success by changing the value of .5 to the value you would like. Or you could choose an entirely different random number generating function; instead of rv.binomial, you might choose uniform. The results of the program above are shown below.
    ID  COINTOSS
  1.00       .00
  2.00      1.00
  3.00      1.00
  4.00       .00
  5.00       .00
  6.00      1.00
  7.00       .00
  8.00       .00
  9.00       .00
 10.00       .00
 11.00       .00
 12.00      1.00
 13.00       .00
 14.00       .00
 15.00       .00
 16.00      1.00
 17.00       .00
 18.00      1.00
 19.00      1.00
 20.00      1.00
 21.00      1.00
 22.00      1.00
 23.00       .00
 24.00      1.00
 25.00       .00
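
As suggested above, the template can be adapted to another random number function. The sketch below generates 10 values with uniform instead of rv.binomial and then summarizes them. The seed, the number of cases, and the variable name u are arbitrary choices made for illustration.

```spss
* Arbitrary seed so that the run can be repeated.
SET SEED = 12345.
INPUT PROGRAM.
+ LOOP id = 1 TO 10.
+   COMPUTE u = UNIFORM(1).
+   END CASE.
+ END LOOP.
+ END FILE.
END INPUT PROGRAM.
* Summarize the generated values.
DESCRIPTIVES VARIABLES = u.
```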
