By Dr. R. RAVANAN
Associate Professor, Department of Statistics, Presidency College, Chennai 600 005
What is SPSS?
Statistical Package for the Social Sciences
General-purpose statistical software
Consists of three components:
Data Window - data entry and database (.sav)
Output Window - all output from any SPSS session (.lst)
Syntax Window - command lines (.sps)
Data Definition
Purpose:
Give meanings to the numbers for ease of reading the output
Involves
Data Format
Variable Name
Value Labels
Missing Values
Command: Data > Data Definition
Data Manipulation
Recoding
To give new values to old values (especially reversing negatively worded questions)
To form a nominal variable from continuous data
Variable Development
To form new variables as combinations or functions of old ones
Command: Transform > Recode / Compute
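The two manipulations above can be illustrated outside SPSS with a minimal Python sketch. The item names (`q1`, `q2_neg`, `age`), the 1-5 scale, and the age cut points are all hypothetical and chosen only for illustration; they do not come from the slides.

```python
# Hypothetical responses on a 1-5 Likert scale; names and values are illustrative only.
responses = [
    {"q1": 4, "q2_neg": 2, "age": 23},
    {"q1": 5, "q2_neg": 1, "age": 41},
    {"q1": 2, "q2_neg": 4, "age": 67},
]

def reverse_code(value, low=1, high=5):
    """Recode a negatively worded item: low becomes high, high becomes low."""
    return low + high - value

def age_group(age):
    """Recode a continuous variable into a nominal one (cut points are assumed)."""
    if age < 30:
        return "young"
    if age < 60:
        return "middle"
    return "senior"

for row in responses:
    row["q2"] = reverse_code(row["q2_neg"])      # analogue of Transform > Recode
    row["age_grp"] = age_group(row["age"])       # recode continuous data into groups
    row["score"] = (row["q1"] + row["q2"]) / 2   # analogue of Transform > Compute

print([(r["q2"], r["age_grp"], r["score"]) for r in responses])
```

Writing the recoded value to a new key (`q2`) rather than overwriting `q2_neg` keeps the original data available for checking.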
Means of variables by subgroups defined by one or more nominal variables
Command
Analyze > Compare Means > Means (Use of Levels)
Command
Analyze > Compare Means > Independent t-test / Paired t-test / One-way ANOVA
Command
Analyze > Non-parametric > 2 independent samples / 2 related samples / k independent samples / k related samples
Command
Analyze > General Linear Model > Simple
Note: Fixed Factor Effect
Bivariate Relationship
When
Covariation between two variables
Correlation:
When both are continuous or ordinal
Command
Analyze > Correlate > Bivariate (with option for Spearman if both are ordinal)
Regression Analysis
When
To establish the relationship between one continuous dependent variable and a number of continuous independent variables
Command
Analyze > Regression > Linear (use the Statistics and Save options)
Issues:
Assumptions of regression: normality; constant variance; independence of the independent variables (no multicollinearity); independence of error terms
Issues (cont.)
Outliers and leverage values
Choice of selection method for independent variables: Enter, Backward, Forward, Stepwise
Dummy independent variables
Options
Residual Analysis, Influence Statistics, Collinearity Diagnostics, Normality Plots
Interpretation
Goodness of Model: R², F-statistic, Adjusted R², standard error
Strength of Influence of Independent Variables: beta and standardized beta
Reliability Analysis
When
Before forming a composite index for a variable from a number of items
Command
Analyze > Scale > Reliability Analysis (with Descriptives options for item, scale, scale if item deleted)
Interpretation
alpha value greater than 0.7 is good; more than 0.5 is acceptable; delete some items if necessary
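Measures of Reliability
Internal Consistency (of items in a scale):
1. Average inter-item correlation. If the average inter-item correlation is > 0.6, standardize the items and add them together as an index.
2. Cronbach's alpha, which measures "internal consistency of items in a scale" (Garson, G.D., 1999) and is commonly computed as
$\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_{i=1}^{k} s_i^2}{s_T^2}\right)$
where $k$ is the number of items, $s_i^2$ is the variance of item $i$, and $s_T^2$ is the variance of the total (summed) score.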
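As a rough illustration of how Cronbach's alpha can be computed by hand, the sketch below uses the common variance-based form, alpha = k/(k-1) * (1 - sum of item variances / variance of total score). The item scores are hypothetical, and the function name `cronbach_alpha` is an illustrative helper, not an SPSS command.

```python
from statistics import variance

def cronbach_alpha(items):
    """items: list of k lists, each holding one item's scores for the same respondents."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]       # composite score per respondent
    item_var_sum = sum(variance(item) for item in items)   # sum of sample variances of the items
    return (k / (k - 1)) * (1 - item_var_sum / variance(totals))

# Hypothetical scores: 3 items answered by 5 respondents.
items = [
    [4, 3, 5, 2, 4],
    [5, 3, 4, 2, 5],
    [4, 2, 5, 1, 4],
]
alpha = cronbach_alpha(items)
print(f"alpha = {alpha:.3f}")
```

The result would then be read against the rule of thumb on the slide (above 0.7 good, above 0.5 acceptable).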
Factor Analysis
When
To reduce the number of variables to underlying dimensions
Command
Analyze > Data Reduction > Factor (Options: rotation, save factor scores)
Issues
Assumptions: sufficient correlations between the variables (Bartlett's test; anti-image matrix; KMO measure of sampling adequacy)
Discriminant Analysis
When
Dependent Variable is Nominal and the Purpose is to predict group membership on the basis of independent variables
Command
Analyze > Classify > Discriminant (Options: Classify by summary tables; Select for holdout and analysis samples)
Issues
Similar to Regression
Interpretation
Goodness of Analysis: Hit ratio compared to maximum chance, proportional chance and Press's Q.
Univariate Results: To establish the discriminating variables
[Data table for this exercise not recovered; surviving fragments: rows 10, 11, 12 and five entries "HS".]
Test whether the level of satisfaction is above the average level at the 1% level of significance.
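Solution:
1. Null Hypothesis: The level of satisfaction of employees is equal to the average level.
2. Alternate Hypothesis: The level of satisfaction of employees is not equal to the average level.
3. Test Statistic: the t test for a single mean is
$t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}$
where $\bar{x}$ is the sample mean, $\mu_0$ is the hypothesized mean, $s$ is the sample standard deviation (divisor $n - 1$) and $n$ is the sample size; $t$ has $n - 1$ degrees of freedom.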
Exercise 2: t-TEST FOR DIFFERENCE OF TWO MEANS (INDEPENDENT SAMPLES)
Problem: The marks obtained by a group of 9 regular students and another group of 11 part-time course students in a test are given below:
Regular Part-Time 70 78 75 71 73 59 78 69 62 70 71 62 60 56 69 64 72 72 68 66
Examine whether the marks obtained by regular and part-time students differ significantly at 5% level of significance.
Solution:
1. Null Hypothesis: There is no significant difference between the average marks obtained by regular and part-time students.
2. Alternate Hypothesis: There is a significant difference between the average marks obtained by regular and part-time students.
Exercise 3: PAIRED t TEST FOR DIFFERENCE OF TWO MEANS (DEPENDENT SAMPLES)
Problem: A company arranged an intensive training course for its team of salesmen. A random sample of 10 salesmen was selected, and the values (in '000) of their sales made in the weeks immediately before and after the course are shown in the following table:
Salesman   Sales Before   Sales After
1          12             18
2          23             22
3          5              15
4          18             21
5          10             13
6          21             22
7          19             17
8          15             19
9          8              12
10         14             16
Solution:
1. Null Hypothesis: There is no significant difference in mean sales before and after the training course.
2. Alternate Hypothesis: There is a significant difference in mean sales before and after the training course.
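The paired t statistic for this exercise can be checked with a short Python sketch. The before/after lists below are an assumed reading of the flattened sales table (salesmen 1 to 10 in order); if the table is read differently the numbers change. The statistic is the mean difference divided by its standard error, with n - 1 degrees of freedom.

```python
from math import sqrt
from statistics import mean, stdev

# Assumed reading of the sales table (in thousands), salesmen 1..10.
before = [12, 23, 5, 18, 10, 21, 19, 15, 8, 14]
after = [18, 22, 15, 21, 13, 22, 17, 19, 12, 16]

def paired_t(x, y):
    """Return (t, df) for the paired t test on the differences y - x."""
    d = [b - a for a, b in zip(x, y)]
    n = len(d)
    t = mean(d) / (stdev(d) / sqrt(n))   # stdev uses the n - 1 divisor
    return t, n - 1

t_stat, df = paired_t(before, after)
print(f"t = {t_stat:.3f}, df = {df}")
```

The computed t would be compared with the tabulated t value for 9 degrees of freedom at the chosen significance level.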
Method II: 27, 33, 42, 35, 32, 34, 38
Test whether there is any significant difference between the variances of the time distributions.
Solution:
1. Null Hypothesis: There is no significant difference between the variances of Method I and Method II with regard to time distribution.
2. Alternate Hypothesis: There is a significant difference between the variances of Method I and Method II with regard to time distribution.
3. Test Statistic: the F test for equality of variances is
$F = \frac{s_1^2}{s_2^2}$
where $s_1^2$ and $s_2^2$ are the sample variances (divisor $n - 1$), with the larger variance taken as $s_1^2$; $F$ has $(n_1 - 1, n_2 - 1)$ degrees of freedom.
The following table gives the yields of 15 sample plots under three varieties of seed.
Variety A   Variety B   Variety C
20          18          25
20          20          28
23          17          22
16          15          28
20          25          32
Test whether there is a significant difference in the average yield of the three varieties of seed.
1. Null Hypothesis: There is no significant difference between the average yields of the three varieties of seed.
2. Alternate Hypothesis: There is a significant difference between the average yields of the three varieties of seed.
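A minimal Python sketch of the one-way ANOVA F ratio for these yields follows. It assumes the table is laid out with one column per variety (so each variety has five plots); the function name `one_way_anova` is illustrative. F is the between-group mean square divided by the within-group mean square.

```python
from statistics import mean

# Yields read column-wise from the table (assumed layout: one column per variety).
yields = {
    "A": [20, 20, 23, 16, 20],
    "B": [18, 20, 17, 15, 25],
    "C": [25, 28, 22, 28, 32],
}

def one_way_anova(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    all_values = [v for g in groups for v in g]
    grand = mean(all_values)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    df_b = len(groups) - 1
    df_w = len(all_values) - len(groups)
    return (ss_between / df_b) / (ss_within / df_w), df_b, df_w

f_stat, df_b, df_w = one_way_anova(list(yields.values()))
print(f"F = {f_stat:.3f} with ({df_b}, {df_w}) degrees of freedom")
```

In SPSS the same quantities appear in the ANOVA table produced by Analyze, Compare Means, One-way ANOVA.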
Blocks   Variety A   Variety B   Variety C
1        52          43          39
2        56          41          39
3        48          45          41
4        44          38          41
1. Null Hypothesis: There is no significant difference between the mean yields of varieties or of blocks.
2. Alternate Hypothesis: There is a significant difference between the mean yields of varieties as well as of blocks.
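The two F ratios for a two-way layout with one observation per cell can be sketched as below. The orientation of the table is an assumption (rows taken as blocks 1 to 4, columns as varieties A, B, C), and `two_way_anova` is an illustrative helper name. Sums of squares are obtained with the usual correction-factor method.

```python
# Assumed layout: rows are blocks 1-4, columns are varieties A, B, C.
table = [
    [52, 43, 39],
    [56, 41, 39],
    [48, 45, 41],
    [44, 38, 41],
]

def two_way_anova(rows):
    """Two-way ANOVA, one observation per cell. Returns (F_columns, F_rows)."""
    r, c = len(rows), len(rows[0])
    n = r * c
    total = sum(sum(row) for row in rows)
    cf = total ** 2 / n                                        # correction factor
    ss_total = sum(v ** 2 for row in rows for v in row) - cf
    ss_rows = sum(sum(row) ** 2 for row in rows) / c - cf      # blocks
    ss_cols = sum(sum(col) ** 2 for col in zip(*rows)) / r - cf  # varieties
    ss_error = ss_total - ss_rows - ss_cols
    ms_error = ss_error / ((r - 1) * (c - 1))
    return (ss_cols / (c - 1)) / ms_error, (ss_rows / (r - 1)) / ms_error

f_varieties, f_blocks = two_way_anova(table)
print(f"F (varieties) = {f_varieties:.3f}, F (blocks) = {f_blocks:.3f}")
```

Under the assumed layout, the variety F ratio has (2, 6) degrees of freedom and the block F ratio has (3, 6).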
Problem: A company keeps records of accidents. During a recent safety review, a random sample of 60 accidents was selected and classified by the day of the week on which they occurred.
Day         No. of accidents
Monday      8
Tuesday     12
Wednesday   9
Thursday    14
Friday      17
Test whether there is any evidence that accidents are more likely on some days than others.
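Solution:
1. Null Hypothesis: Accidents are equally distributed over the days of the week.
2. Alternate Hypothesis: Accidents are not equally distributed over the days of the week.
3. Test Statistic: the chi-square test for goodness of fit is
$\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}$
where $O_i$ and $E_i$ are the observed and expected frequencies for category $i$; $\chi^2$ has $k - 1$ degrees of freedom.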
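The goodness-of-fit statistic for the accident data can be verified with a few lines of Python. Under the null hypothesis each of the five days has the same expected count, 60 / 5 = 12. The function name `chi_square_gof` is an illustrative helper.

```python
observed = {"Monday": 8, "Tuesday": 12, "Wednesday": 9, "Thursday": 14, "Friday": 17}

def chi_square_gof(counts):
    """Chi-square goodness of fit against equal expected frequencies. Returns (chi2, df)."""
    total = sum(counts)
    expected = total / len(counts)
    chi2 = sum((o - expected) ** 2 / expected for o in counts)
    return chi2, len(counts) - 1

chi2, df = chi_square_gof(list(observed.values()))
print(f"chi-square = {chi2:.2f}, df = {df}")
```

The slide does not state a significance level; if 5% were used, the tabulated value for 4 degrees of freedom is 9.488, so a statistic of about 4.5 would not lead to rejection of the null hypothesis.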
Exercise 8: CHI-SQUARE TEST FOR INDEPENDENCE OF ATTRIBUTES
Problem: The following table gives the data relating to the condition of child and condition of home. Test whether the two attributes are independent.
Condition of Child   Condition of Home
                     Clean   Dirty
Clean                70      50
Fairly clean         80      20
Dirty                35      45
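Solution:
1. Null Hypothesis: There is no association between condition of child and condition of home.
2. Alternate Hypothesis: There is an association between condition of child and condition of home.
3. Test Statistic: the chi-square test for independence of attributes is
$\chi^2 = \sum_{i}\sum_{j} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}, \quad E_{ij} = \frac{R_i C_j}{N}$
where $O_{ij}$ is the observed frequency in cell $(i, j)$, $R_i$ and $C_j$ are the row and column totals and $N$ is the grand total; $\chi^2$ has $(r - 1)(c - 1)$ degrees of freedom.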
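A short Python sketch of the independence test follows. The 3 x 2 table is an assumed reading of the flattened data (rows: condition of child; columns: condition of home), and `chi_square_independence` is an illustrative helper. Each expected count is row total times column total divided by the grand total.

```python
# Assumed reading of the table: rows = condition of child, columns = condition of home.
observed = [
    [70, 50],   # child clean:        home clean, home dirty
    [80, 20],   # child fairly clean: home clean, home dirty
    [35, 45],   # child dirty:        home clean, home dirty
]

def chi_square_independence(table):
    """Return (chi2, df) for an r x c contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n   # expected count under independence
            chi2 += (o - e) ** 2 / e
    df = (len(table) - 1) * (len(col_totals) - 1)
    return chi2, df

chi2_ind, df_ind = chi_square_independence(observed)
print(f"chi-square = {chi2_ind:.3f}, df = {df_ind}")
```

In SPSS the corresponding output comes from Analyze, Descriptive Statistics, Crosstabs with the Chi-square statistic selected.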
Exercise 9: TEST FOR SIGNIFICANCE OF CORRELATION COEFFICIENT
Problem: Find the correlation coefficient between income and expenditure of the family from the following data. Also test whether the correlation coefficient is significant.
Income (in hundreds) Expenditure (in hundreds) 60 55 58 50 45 40 65 60 56 62 38 45 70 63
1. Null Hypothesis: There is no relationship between income and expenditure of the family.
2. Alternate Hypothesis: There is a relationship between income and expenditure of the family.
3. Test Statistic: the t test for the coefficient of correlation is
$t = \frac{r\sqrt{n - 2}}{\sqrt{1 - r^2}}$
where $r$ is the sample correlation coefficient and $n$ is the number of pairs; $t$ has $n - 2$ degrees of freedom.
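The correlation coefficient and its t statistic, t = r * sqrt(n - 2) / sqrt(1 - r^2), can be sketched in Python as below. The pairing of the data is an assumption: the flattened numbers are read as seven (income, expenditure) pairs in order. If the original table was laid out differently, the lists would need to change. The helper names `pearson_r` and `t_for_r` are illustrative.

```python
from math import sqrt

# Assumed pairing: the flattened data read as (income, expenditure) pairs, in hundreds.
income = [60, 58, 45, 65, 56, 38, 70]
expenditure = [55, 50, 40, 60, 62, 45, 63]

def pearson_r(x, y):
    """Pearson product-moment correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def t_for_r(r, n):
    """t statistic and degrees of freedom for testing that the correlation is zero."""
    return r * sqrt(n - 2) / sqrt(1 - r ** 2), n - 2

r = pearson_r(income, expenditure)
t_r, df_r = t_for_r(r, len(income))
print(f"r = {r:.3f}, t = {t_r:.3f}, df = {df_r}")
```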
No.   (1)    (2)   (3)
1     5.2    28    3
2     5.1    26    3
3     5.6    32    2
4     4.6    24    1
5     11.3   54    4
6     8.1    29    2
7     7.8    44    3
8     5.8    30    2
9     5.1    40    1
10    18.0   82    6
Non-Parametric Test
One sample test:
Binomial test
Chi-square test for goodness of fit
Kolmogorov-Smirnov one-sample test
Two dependent samples:
McNemar test
Sign test
Wilcoxon matched-pairs signed-rank test
Walsh test
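Mann-Whitney U test
The Mann-Whitney U statistic is
$U = n_1 n_2 + \frac{n_1 (n_1 + 1)}{2} - R_1$
where $n_1$ and $n_2$ are the two sample sizes and $R_1$ is the sum of the ranks of the first sample in the combined ranking.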
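Wilcoxon test
The Wilcoxon matched-pairs signed-rank statistic is
$T = \min(T^+, T^-)$
where $T^+$ and $T^-$ are the sums of the ranks of the positive and negative differences. For large $n$ the statistic is referred to
$z = \frac{T - \mu_T}{\sigma_T}$
where $\mu_T = \frac{n(n + 1)}{4}$ and $\sigma_T = \sqrt{\frac{n(n + 1)(2n + 1)}{24}}$, with $n$ the number of non-zero differences.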