Variables: A quantity that changes its value from time to time, place to place, and person to person is called a variable; if probabilities are attached to the values of a variable, it is called a random variable. For example, if we say x = 1, x = 7, or x = −6, then x is a variable, but if a variable appears with probabilities attached to its values, as in the following table, then it is a random variable.
x      x1    x2    x3    x4
P(x)   0.2   0.3   0.1   0.4
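A random variable of this kind can be sketched in a few lines of Python. The probabilities are the ones from the table above; the x values are hypothetical, since the source table lists only P(x).

```python
# A discrete random variable as a value/probability table. The probabilities
# come from the table above; the x values are hypothetical placeholders.
x = [1, 2, 3, 4]
p = [0.2, 0.3, 0.1, 0.4]

# For a valid random variable the probabilities must sum to 1.
assert abs(sum(p) - 1.0) < 1e-9

# Expected value E[X] = sum over x of x * P(x)
expected_value = sum(xi * pi for xi, pi in zip(x, p))
print(expected_value)
```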
Population:
The whole collection of objects or individuals under study is called a population. A population may be finite or infinite: if its elements are countable, it is a finite population; if they are uncountable, it is an infinite population.
For example: Population of MBA students at IUGC (Finite Population)
Statistical Applications through SPSS

Population of the university teachers in Pakistan (Finite Population)
Population of trees (Infinite Population)
Population of sea life (Infinite Population)
Populations are also categorized in two ways:
1. Homogeneous population
2. Heterogeneous population
Homogeneous Population:
If all the population elements have the same properties, the population is known as a homogeneous population. For example: a population of shops, a population of houses, a population of boys, a population of rice in a box, etc.
Heterogeneous Population:
If all the population elements do not have the same properties, the population is known as a heterogeneous population. For example: a population of MBA students (male and female), a population of plants, etc.
Parameter:
A constant computed from the population, or a population characteristic, is known as a parameter. For example: the population mean μ, the population standard deviation σ, and the coefficients of skewness and kurtosis for the population.
Statistic:
A constant computed from the sample, or a sample characteristic, is known as a statistic. For example: the sample mean x̄, the sample standard deviation s, and the coefficients of skewness and kurtosis for the sample.
Estimator:
A sample statistic used to estimate a population parameter is known as an estimator.
The sample mean is used to estimate the population mean, so the sample mean is called an estimator of the population mean. Similarly, the sample variance is used to estimate the population variance, so the sample variance is called an estimator of the population variance.
Hypothesis:
An assumption about a population parameter that is tested on the basis of sample information is called a hypothesis, and the procedure is called hypothesis testing. The assumptions are framed as two complementary statements, the null and the alternative hypothesis, set up in such a manner that if one statement is found wrong, the other is automatically accepted as correct.
As is clear from the above statements, there are two types of null hypothesis:
1. Simple null hypothesis
2. Composite null hypothesis
For example:
The average rainfall in the United States of America during 1999 was 200 mm.
The average concentrations of two substances are the same.
The IQ levels of MBA and BBA students are the same.
IQ level is independent of education level.
2) Alternative Hypothesis:
An automatically generated statement against the established null hypothesis is called an alternative hypothesis. For example:
Null Hypothesis: H0: μ = μ0
Alternative Hypotheses: H1: μ ≠ μ0, H1: μ > μ0, or H1: μ < μ0
It is clear from the above-stated alternatives that there are two different types:
1. One-tailed (one-sided) alternative hypothesis
2. Two-tailed (two-sided) alternative hypothesis
For example:
Average rainfall in Pakistan is greater than average rainfall in Jakarta.
Inzamam is a more consistent player than Shahid Afridi.
Waseem Akram is a better bowler than McGrath.
Gold prices are dependent on oil prices.
These are one-sided alternatives of the form H1: μ > μ0 or H1: μ < μ0.
Type I and Type II Errors

Decision       H0 true (True Population)    H0 false (Other Population)
Reject H0      Type I error (α)             Correct decision (1 − β)
Accept H0      Correct decision (1 − α)     Type II error (β)
It is clear from the above that both errors cannot be minimized at the same time: when the Type I error is reduced, an increase is observed in the Type II error.
P-Value:
It is the minimum value of alpha (α) at which a given null hypothesis can be rejected. Since it is a value of α, it can be explained as the minimum Type I error risk associated with the test being performed. It is therefore used in two ways: for decision making, and to report the probability of a Type I error associated with the test.
If the p-value for a test is 0.01, the null hypothesis is rejected, and there is only a 1% chance of rejecting a true null hypothesis. Put another way, we are 99% confident in the rejection: the null hypothesis can be rejected at α = 1%, i.e. at the 99% confidence level.

If the p-value for a test is 0.21, the null hypothesis is accepted, because rejecting it would carry a 21% chance of rejecting a true null hypothesis. Such a true null hypothesis could only be rejected at α = 21%, which is far above any conventional significance level.
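The decision rule described above can be sketched as a tiny helper function (a minimal illustration, not SPSS output):

```python
# A sketch of the decision rule: reject H0 when the p-value falls below the
# chosen significance level (alpha).
def decide(p_value: float, alpha: float = 0.05) -> str:
    return "reject H0" if p_value < alpha else "accept H0"

# The two cases discussed in the text:
print(decide(0.01))  # p = 0.01 < 0.05, so H0 is rejected
print(decide(0.21))  # p = 0.21 > 0.05, so H0 is accepted
```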
T-test: A t-test is a statistical hypothesis test in which the test statistic has a Student's t distribution if the null hypothesis is true. It is applied when the population is assumed to be normally distributed but the sample size is small enough that the statistic on which inference is based is not normally distributed, because it relies on an uncertain estimate of the standard deviation rather than on a precisely known value.
Common uses include:
A test of whether the mean of a normally distributed population has the value specified in a null hypothesis.
A test of the null hypothesis that the means of two normally distributed populations are equal.

Given two data sets, each characterized by its mean, standard deviation, and number of data points, we can use some form of t-test to determine whether the means are distinct, provided the underlying distributions can be assumed to be normal. There are different versions of the t-test depending on whether the two samples are:
o Unpaired and independent of each other (e.g., individuals randomly assigned into two groups, measured after an intervention and compared with the other group), or
o Paired, so that each member of one sample has a unique relationship with a particular member of the other sample (e.g., the same people measured before and after an intervention).
Interpretation of the results: If the calculated p-value is below the threshold chosen for statistical significance (usually the 0.10, 0.05, or 0.01 level), then the null hypothesis, which usually states that the two groups do not differ, is rejected in favor of an alternative hypothesis, which typically states that the groups do differ.
The formula for the t-test is a ratio. The top part of the ratio is just the difference between the two means or averages. The bottom part is a measure of the variability or dispersion of the scores. This formula is essentially another example of the signal-to-noise metaphor in research: the difference between the means is the signal that, in this case, we think our program or treatment introduced into the data; the bottom part of the formula is a measure of variability that is essentially noise that may make it harder to see the group difference. The figure shows the formula for the t-test and how the numerator and denominator are related to the distributions.
The top part of the formula is easy to compute: just find the difference between the means. The bottom part is called the standard error of the difference. To compute it, we take the variance for each group, divide it by the number of people in that group, add these two values, and then take the square root.

The t-value will be positive if the first mean is larger than the second and negative if it is smaller. Once you compute the t-value, you have to look it up in a table of significance to test whether the ratio is large enough to say that the difference between the groups is unlikely to have been a chance finding. To test the significance, we need to set a risk level (called the alpha level). In most social research, the rule of thumb is to set the alpha level at .05. This means that five times out of a hundred we would find a statistically significant difference between the means even if there was none (i.e., by "chance"). We also need to determine the degrees of freedom (df) for the test: in this t-test, the degrees of freedom is the total number of persons in both groups minus 2. Given the alpha level, the df, and the t-value, we can look the t-value up in a standard table of significance (available as an appendix in the back of most statistics texts) to determine whether it is large enough to be significant. If it is, we can conclude that the difference between the means of the two groups is real (even given the variability).
Calculations:

a) One-sample t-test:
In testing the null hypothesis that the population mean is equal to a specified value μ0, one uses the statistic

t = (x̄ − μ0) / (s / √n)

where x̄ is the sample mean, s is the sample standard deviation, and n is the sample size. The degrees of freedom used in this test is n − 1.
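This statistic is easy to verify numerically. The sketch below uses a small hypothetical sample (not the SPSS salary data) and checks the hand computation against `scipy.stats.ttest_1samp`:

```python
import math
from scipy import stats

# Hypothetical sample (illustration only, not the SPSS salary data)
sample = [28.0, 31.5, 30.2, 27.8, 32.1, 29.4, 30.9, 28.6]
mu0 = 30.0  # hypothesized population mean

n = len(sample)
xbar = sum(sample) / n
s = math.sqrt(sum((v - xbar) ** 2 for v in sample) / (n - 1))  # sample SD

# t = (x̄ − μ0) / (s / √n), with n − 1 degrees of freedom
t_manual = (xbar - mu0) / (s / math.sqrt(n))

# scipy computes the same statistic (and a two-tailed p-value)
t_scipy, p_value = stats.ttest_1samp(sample, mu0)
assert abs(t_manual - t_scipy) < 1e-9
```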
b) Independent two-sample t-test:

A) Equal sample sizes, equal variance
This test is used only when both of the following hold: the two sample sizes (that is, the number of participants in each group) are equal, and it can be assumed that the two distributions have the same variance. Violations of these assumptions are discussed below. The t statistic to test whether the means are different can be calculated as

t = (x̄1 − x̄2) / (s_p · √(2/n))

where s_p = √((s1² + s2²) / 2) is the grand (pooled) standard deviation, subscript 1 denotes group one, and subscript 2 denotes group two. The denominator of t is the standard error of the difference between the two means. For significance testing, the degrees of freedom for this test is n1 + n2 − 2, where n1 and n2 are the numbers of participants in groups 1 and 2.

B) Unequal sample sizes, unequal variance
This test (also known as Welch's t-test) is used when the two sample sizes are unequal and the variances are assumed to be different. The t statistic to test whether the means are different can be calculated as

t = (x̄1 − x̄2) / √(s1²/n1 + s2²/n2)

where n1 and n2 are the numbers of participants in groups 1 and 2. In this case the variance is not a pooled variance. For use in significance testing, the distribution of the test statistic is approximated as an ordinary Student's t distribution with the degrees of freedom calculated using

df = (s1²/n1 + s2²/n2)² / [ (s1²/n1)² / (n1 − 1) + (s2²/n2)² / (n2 − 1) ]

This is called the Welch–Satterthwaite equation. Note that the true distribution of the test statistic actually depends (slightly) on the two unknown variances. This test can be used as either a one-tailed or two-tailed test.

c) Dependent t-test for paired samples:
This test is used when the samples are dependent; that is, when there is only one sample that has been tested twice (repeated measures) or when there are two samples that have been matched or "paired".
For this test, the differences between all pairs must be calculated. The pairs are either one person's pre-test and post-test scores, or pairs of persons matched into meaningful groups (for instance, drawn from the same family or age group: see table). The average of the differences (X̄D) and their standard deviation (sD) are used in the statistic

t = (X̄D − μ0) / (sD / √n)

The constant μ0 is non-zero if you want to test whether the average difference is significantly different from some value other than 0. The degrees of freedom used is n − 1.
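The three variants above can be checked side by side. The sketch below uses small hypothetical groups (illustration only) and verifies each hand formula against the corresponding scipy function:

```python
import math
from scipy import stats

# Toy data (hypothetical, for illustration only)
g1 = [12.1, 13.4, 11.8, 12.9, 13.1, 12.5]
g2 = [11.2, 11.9, 12.3, 11.5, 12.0, 11.7]
n = len(g1)  # equal sample sizes

def svar(xs):
    """Unbiased sample variance."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

m1, m2 = sum(g1) / n, sum(g2) / n
v1, v2 = svar(g1), svar(g2)

# (a) pooled, equal n: t = (x̄1 − x̄2) / (s_p √(2/n)), s_p = √((s1² + s2²)/2)
sp = math.sqrt((v1 + v2) / 2)
t_pooled = (m1 - m2) / (sp * math.sqrt(2 / n))
assert abs(t_pooled - stats.ttest_ind(g1, g2)[0]) < 1e-9

# (b) Welch: t = (x̄1 − x̄2) / √(s1²/n1 + s2²/n2), Satterthwaite df
t_welch = (m1 - m2) / math.sqrt(v1 / n + v2 / n)
df_welch = (v1 / n + v2 / n) ** 2 / (
    (v1 / n) ** 2 / (n - 1) + (v2 / n) ** 2 / (n - 1)
)
assert abs(t_welch - stats.ttest_ind(g1, g2, equal_var=False)[0]) < 1e-9

# (c) paired: t = x̄_D / (s_D / √n) on the within-pair differences
d = [a - b for a, b in zip(g1, g2)]
t_paired = (sum(d) / n) / (math.sqrt(svar(d)) / math.sqrt(n))
assert abs(t_paired - stats.ttest_rel(g1, g2)[0]) < 1e-9
```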
A) One-sample t-test: SPSS needs:
1) The data should be numerical (i.e., a numerical variable).
2) A test value: the hypothetical value against which we are going to test.
To illustrate the one-sample t-test, I have used the salaries of the employees of an organization, based on a sample of 474 employees of the company. The hypotheses are:
a) The null hypothesis states that the average salary of the employees is equal to 30,000.
H0: μ = 30,000

b) The alternative hypothesis states that the average salary of the employees is not equal to 30,000.
HA: μ ≠ 30,000
Method:
Enter the data in the data editor; the variable is labeled as employee's current salary. Now click on Analyze, which will produce a drop-down menu; choose Compare Means from it and click on One-Sample T Test. A dialogue box appears in which all the input variables are listed on the left-hand side. From this box select the variable to be analyzed, in our case the current salaries of the employees, and transfer it to the Test Variable(s) box by clicking the arrow. Next, change the value in the Test Value box, which originally appears as 0, to the one against which you are testing the sample mean; in this case, 30,000. Now click on OK to run the analysis.
Pictorial Representation
Analyze → Compare Means → One-Sample T Test → drag the test variable (Scale) → set Test Value → OK
One-Sample Statistics

                 N     Mean         Std. Deviation   Std. Error Mean
Current Salary   474   $34,419.57   $17,075.661      $784.311

Interpretation: In the above table, N shows the total number of observations. The average salary of the employees is 34,419.57, the standard deviation of the data is 17,075.661, and the standard error of the mean is 784.311.
One-Sample Test (Test Value = 30000)

                 t       df    Sig. (2-tailed)   Mean Difference   95% CI Lower   95% CI Upper
Current Salary   5.635   473   .000              $4,419.568        $2,878.40      $5,960.73
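As a quick arithmetic check, the t value in the table is just the mean difference divided by the standard error of the mean, both of which follow from the figures already quoted:

```python
import math

# Figures taken from the SPSS output above
n = 474
mean, sd = 34419.57, 17075.661
test_value = 30000.0

se = sd / math.sqrt(n)           # standard error of the mean (≈ 784.311)
mean_diff = mean - test_value    # mean difference (≈ 4,419.57)
t = mean_diff / se               # reproduces t ≈ 5.635 from the table
```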
i) The t value is positive, which shows that the sample mean is greater than the hypothesized mean.
ii) The degrees of freedom are (N − 1) = 473.
iii) The p-value is 0.000, which is less than 0.05.
iv) The difference between the sample mean and the hypothesized mean is 4,419.568.
v) The confidence interval has lower and upper limits of 2,878.4 and 5,960.73 respectively; it does not contain zero.
Decision: On the basis of the following observations I reject the null hypothesis and accept the alternative hypothesis, and I am almost 100% sure of this decision:
i) The p-value is 0.000, which is less than 0.05.
ii) The confidence interval does not contain zero.
Comments: The average salary of the employees is not equal to 30,000.
Example # 02
B) Independent t-test: SPSS needs:
1) Two variables are required: one numerical and one categorical with two levels.
To illustrate the independent t-test, I have used the salaries of the employees of an organization, based on a sample of 474 employees of the company containing both males and females. In my analysis I coded males as m and females as f. The hypotheses are:
a) The null hypothesis states that the average salary of the male employees is equal to the average salary of the female employees.
H0: μM = μF

b) The alternative hypothesis states that the average salary of the male employees is not equal to the average salary of the female employees.
HA: μM ≠ μF

Method:
Enter the data in the data editor; the variables are labeled as employee's current salary and gender respectively. Click on Analyze, which will produce a drop-down menu; choose Compare Means from it and click on Independent-Samples T Test. A dialogue box appears in which all the input variables are listed on the left-hand side. To perform the independent-samples t-test, transfer the dependent variable into the Test Variable(s) box and the variable that identifies the groups into the Grouping Variable box. In this case, the current salary of the employees is the dependent variable to be analyzed and should be transferred into the Test Variable(s) box by clicking on the first arrow between the two boxes. Gender is the variable that identifies the groups of employees and should be transferred into the Grouping Variable box. Once the grouping variable is transferred, the Define Groups button, which was earlier inactive, turns active. Click on it to define the two groups: here group 1 represents the male employees (m) and group 2 the female employees (f). Enter these codes against group 1 and group 2, click Continue, and then click OK to run the analysis.
Pictorial Representation
Analyze → Compare Means → Independent-Samples T Test → drag Test & Grouping Variable → Define Groups → OK
SPSS output: Group Statistics

Current Salary   N     Mean         Std. Deviation   Std. Error Mean
Male             258   $41,441.78   $19,499.214      $1,213.968
Female           216   $26,031.92   $7,558.021       $514.258
Interpretation: From the above table we can observe that:
i) The total number of males is 258 and of females is 216.
ii) The mean salary of the male employees is 41,441.78 and of the female employees is 26,031.92.
iii) The standard deviation of the male employees' salaries is 19,499.214 and of the female employees' salaries is 7,558.021.
iv) The standard error of the mean is 1,213.968 for male employees and 514.258 for female employees.
Independent Samples Test: Current Salary

Levene's Test for Equality of Variances: F = 119.669, Sig. = .000

t-test for Equality of Means:
                              t        df        Sig. (2-tailed)   Mean Difference   Std. Error Difference   95% CI Lower   95% CI Upper
Equal variances assumed       10.945   472       .000              $15,409.862       $1,407.906              $12,643.322    $18,176.401
Equal variances not assumed   11.688   344.262   .000              $15,409.862       $1,318.400              $12,816.728    $18,002.996
Interpretation: The above output has two parts, (a) Levene's F-test for equality of variances and (b) the t-test, from which we can observe that:
i) The F value is 119.669 with a significance value of 0.00, which is less than 0.05.
ii) On the basis of the p-value of the F-test we conclude that the variances of the two populations are not equal, so the "equal variances not assumed" results are used.
iii) The t value is positive, which shows that the mean salary of the male employees is greater than that of the female employees.
iv) The degrees of freedom are 344.262.
v) The p-value is 0.000, which is less than 0.05.
vi) The difference between the two group means is 15,409.862.
vii) The standard error of the difference is 1,318.400.
viii) The confidence interval has lower and upper limits of 12,816.728 and 18,002.996 respectively; it does not contain zero.

Decision: On the basis of the following observations I reject the null hypothesis and accept the alternative hypothesis, and I am almost 100% sure of this decision:
i) The p-value is 0.000, which is less than 0.05.
ii) The confidence interval does not contain zero.
Comments: The average salaries of male and female employees are not equal.
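The Welch figures in this output can be reproduced from the group statistics alone, using the formulas given earlier (a hand check, not SPSS itself):

```python
import math

# Group statistics from the SPSS output above
n1, s1 = 258, 19499.214   # males: N and standard deviation
n2, s2 = 216, 7558.021    # females
mean_diff = 41441.78 - 26031.92

# Welch t statistic and Welch–Satterthwaite degrees of freedom
u1, u2 = s1 ** 2 / n1, s2 ** 2 / n2
t_welch = mean_diff / math.sqrt(u1 + u2)   # ≈ 11.688
df_welch = (u1 + u2) ** 2 / (u1 ** 2 / (n1 - 1) + u2 ** 2 / (n2 - 1))  # ≈ 344.26
```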
Example # 03
C) Paired t-test: SPSS needs:
1) Two numerical variables measured on the same cases (here, the beginning and current salaries of each employee).

To illustrate the paired t-test, I used the beginning and current salaries of the employees of an organization, based on a sample of 474 employees. The hypotheses are:
a) The null hypothesis states that the average current salary of the employees is equal to their average beginning salary.
H0: μcurrent = μbeginning

b) The alternative hypothesis states that the average current salary of the employees is not equal to their average beginning salary.
HA: μcurrent ≠ μbeginning

Method:
Enter the data in the data editor; the variables are labeled as employee's current and beginning salary respectively. Click on Analyze, which will produce a drop-down menu; choose Compare Means from it and click on Paired-Samples T Test. A dialogue box appears in which all the input variables are listed on the left-hand side. From this box we have to select the variables to be compared, in our case the current and beginning salaries. Select them together and they immediately appear in the box at the bottom labeled Current Selection, while remaining highlighted in the box in which they originally appeared. Once the variables are selected, the arrow at the center becomes active; clicking it transfers them to the Paired Variables box, where they appear as Current–Beginning. Now click on OK to run the analysis.
Pictorial Representation
Analyze → Compare Means → Paired-Samples T Test → drag Paired Variables (Scale) → OK
Paired Samples Statistics

Pair 1             N     Mean         Std. Deviation   Std. Error Mean
Current Salary     474   $34,419.57   $17,075.661      $784.331
Beginning Salary   474   $17,016.09   $7,870.638       $361.510
Interpretation: From the above table we can observe that:
i) The mean values of the current and beginning salary are 34,419.57 and 17,016.09 respectively.
ii) The total number of observations in each group is 474.
iii) The standard deviations of the current and beginning salary are 17,075.661 and 7,870.638 respectively.
iv) The standard errors of the mean of the current and beginning salary are 784.331 and 361.510 respectively.
Paired Samples Correlations

Pair 1                              N     Correlation   Sig.
Current Salary & Beginning Salary   474   .880          .000
Interpretation: From the above table we can observe that:
i) The total number of pairs is 474.
ii) The correlation of 0.88 shows that the two variables are highly correlated, which indicates that employees with a higher beginning salary also have a higher current salary.
iii) The p-value is 0.00, which is less than 0.05.
Paired Samples Test (Current Salary − Beginning Salary)

Mean          Std. Deviation   Std. Error Mean   95% CI Lower   95% CI Upper   t        df    Sig. (2-tailed)
$17,403.481   $10,814.620      $496.732          $16,427.407    $18,379.555    35.036   473   .000
Interpretation: From the above table we can observe that:
i) The mean of the paired differences is 17,403.481.
ii) The standard deviation of the differences is 10,814.620.
iii) The standard error of the mean difference is 496.732.
iv) The confidence interval has lower and upper limits of 16,427.407 and 18,379.555 respectively; it does not contain zero.
v) The t value is 35.036.
vi) The degrees of freedom are (N − 1) = 473.
vii) The p-value is 0.00, which is less than 0.05.
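Here too the table's t value follows directly from the quoted mean and standard deviation of the differences, using the paired-samples formula given earlier:

```python
import math

# Figures from the paired-samples output above
n = 474
mean_diff, sd_diff = 17403.481, 10814.620

se = sd_diff / math.sqrt(n)   # standard error of the mean difference (≈ 496.732)
t = mean_diff / se            # reproduces t ≈ 35.036 from the table
df = n - 1                    # 473
```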
Decision: On the basis of the following observations I reject the null hypothesis and accept the alternative hypothesis, and I am almost 100% sure of this decision:
i) The p-value is 0.000, which is less than 0.05.
ii) The confidence interval does not contain zero.
Comments: The mean difference between the two paired variables, i.e. current and beginning salary, is significant; the two means are not the same.
One-Way ANOVA:
ANOVA is a commonly used statistical method for making simultaneous comparisons between two or more population means; it yields values that can be tested to determine whether a significant relationship exists between the variables. Its simplest form is one-way ANOVA, which involves one dependent variable and a single independent (factor) variable.
Data Source:
C:\SPSSEVAL\Employee Data
Hypothesis:
H0: μ1 = μ2 = μ3
HA: at least one mean is not equal to the others.
SPSS Need:
SPSS needs two types of variables for analyzing one-way ANOVA:
1. A numerical variable (scale).
2. A categorical variable (with more than two categories).
Method:
First of all, enter the data in the data editor; the variables are labeled as employee's current salary and employment category respectively. Click on Analyze, which will produce a drop-down menu; choose Compare Means from it and click on One-Way ANOVA. A dialogue box appears in which all the input variables are listed on the left-hand side. To perform one-way ANOVA, transfer the dependent variable into the box labeled Dependent List and the factoring variable into the box labeled Factor. In our case Current Salary is the dependent variable and should be transferred to the Dependent List box by clicking on the first arrow between the two boxes; Employment Category is the factoring variable and should be transferred to the Factor box by clicking on the second arrow. Then click OK to run the analysis. If the null hypothesis is rejected, ANOVA only tells us that the population means are not all equal. Multiple comparisons are then used to assess which group means differ from which others, once the overall F-test shows that at least one difference exists. Many tests are listed under Post Hoc in SPSS; among them, LSD (Least Significant Difference) and Tukey's test are commonly used, Tukey's being one of the most conservative.
Pictorial Representation
Analyze → Compare Means → One-Way ANOVA → drag Dependent List & Factor → Post Hoc (optional) → OK
Output:
ANOVA: Current Salary

                 Sum of Squares     df    Mean Square       F         Sig.
Between Groups   89438483925.943    2     44719241962.972   434.481   .000
Within Groups    48478011510.397    471   102925714.459
Total            137916495436.340   473
The above table gives the test results for the analysis of one-way ANOVA. The results are given in three rows. The first row labeled between groups gives the variability due to the different designations of the employees (known reasons). The second row labeled within groups gives the variability due to random error (unknown reasons), and the third row gives the total variability. In this case, F-value is 434.481, and the corresponding p-value is less than 0.05. Therefore we can safely reject the null hypothesis and conclude that the average salary of the employees is not the same in all three categories.
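The internal arithmetic of the ANOVA table is easy to check: each mean square is its sum of squares divided by its degrees of freedom, and F is their ratio.

```python
# Entries from the ANOVA table above
ss_between, df_between = 89438483925.943, 2
ss_within, df_within = 48478011510.397, 471

ms_between = ss_between / df_between   # mean square between groups
ms_within = ss_within / df_within      # mean square within groups
f_ratio = ms_between / ms_within       # reproduces F ≈ 434.481
```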
Multiple Comparisons (LSD): Current Salary

(I) Employment Category   (J) Employment Category   Sig.
Clerical                  Custodial                 .126
Clerical                  Manager                   .000
Custodial                 Manager                   .000
(and the mirror-image comparisons)
The Post-Hoc test presents the results of the comparisons between all possible pairs. Since we have three groups, a total of six pairs is possible, of which three are mirror images. The p-value for the Clerical–Manager and Custodial–Manager comparisons is shown as 0.000, whereas it is 0.126 for the Clerical–Custodial comparison. This means that the average current salaries of the employees differ significantly between Clerical and Manager as well as between Manager and Custodial, whereas the difference between Clerical and Custodial is not significant.
Conclusion: Our null hypothesis is rejected, and we conclude that the three means are not all the same. To identify which mean differs from the others we used the LSD test and conclude that the mean salary of managers is significantly different from the other two means, whereas the other two means do not differ significantly from each other.
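An LSD-style follow-up amounts to running an unadjusted two-sample t-test on every pair of groups. The sketch below uses small hypothetical data (the group names mirror the SPSS example; the numbers do not) and reproduces the same qualitative pattern: the manager group differs from the other two, which do not differ from each other.

```python
from itertools import combinations
from scipy import stats

# Hypothetical data for three job categories (illustration only)
groups = {
    "Clerical":  [27, 29, 31, 28, 30, 26],
    "Custodial": [30, 28, 32, 29, 31, 27],
    "Manager":   [45, 48, 44, 47, 46, 49],
}

# LSD-style follow-up: an unadjusted two-sample t-test for every pair
results = {}
for (name_a, a), (name_b, b) in combinations(groups.items(), 2):
    t, p = stats.ttest_ind(a, b)
    results[(name_a, name_b)] = p
    print(f"{name_a} vs {name_b}: t = {t:.3f}, p = {p:.4f}")
```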
Two-Way ANOVA
In two-way analysis of variance, we have two independent variables (known factors) and we are interested in knowing their effect on the same dependent variable.
Data Source:
C:\SPSSEVAL\Carpet
Variables: Here we analyze two categorical variables together with a numerical variable by two-way ANOVA, i.e.:
A) Preference (Numerical)
B) Package design (Categorical)
C) Brand (Categorical)
Hypothesis:
For Brand:
H0: all brand means are equal (μi = μj for all i and j)
HA: μi ≠ μj for at least one pair i, j

For Package:
H0': all package-design means are equal (μi = μj for all i and j)
HA': μi ≠ μj for at least one pair i, j
SPSS Need:
SPSS needs two types of variables for analyzing two-way ANOVA:
1. A numerical variable (scale).
2. Two categorical variables (each with two or more levels).
Method:
First of all, enter the data in the data editor; the variables are labeled as preference, brand, and package design respectively. Click on Analyze, which will produce a drop-down menu; choose General Linear Model from it and click on Univariate. A dialogue box appears in which all the input variables are listed on the left-hand side. To perform two-way ANOVA, transfer the dependent variable (Preference) into the box labeled Dependent Variable and the factor variables (Brand and Package) into the box labeled Fixed Factor(s). After defining all variables, click OK to run the analysis. If the null hypothesis is rejected, multiple comparisons are used to assess which group means differ from which others, once the overall F-test shows that at least one difference exists. Many tests are listed under Post Hoc in SPSS; among them, LSD (Least Significant Difference) and Tukey's test are commonly used, Tukey's being one of the most conservative.
Pictorial Representation
Analyze → General Linear Model → Univariate → drag Dependent Variable & Fixed Factors → Post Hoc → OK
Output:
Between-Subjects Factors

                 Value   Label     N
Package design   1.00    A*        9
                 2.00    B*        6
                 3.00    C*        7
Brand name       1.00    K2R       7
                 2.00    Glory     7
                 3.00    Bissell   8
This table shows the value labels under each category and the frequency of each value label. In total there are six value labels across package design and brand name.
Tests of Between-Subjects Effects
Dependent Variable: Preference

Source    Type III Sum of Squares   df   F        Sig.
package   —                         2    16.883   —
brand     —                         2    1.135    —
Error     —                         13
Total     —                         22
The above table gives the test results of the two-way ANOVA. The results are given in four rows. The first row, labeled package, gives the variability due to the different package designs of the carpets, which may affect the customers' preferences (known reason). The second row, labeled brand, gives the variability due to the different brand names (known reason). The third row, labeled Error, gives the variability due to random error, which also affects the customers' preferences (unknown reasons). The fourth row gives the total variability in the customers' preferences due to both known and unknown reasons. In this case, the F-value for package design is 16.883, and the corresponding p-value is less than 0.05. Therefore we can safely reject the null hypothesis for package design and conclude that the average preferences for the packages are not all the same.
The F-value for brand name is 1.135, and the corresponding p-value is greater than 0.05, so we accept the null hypothesis for brand and conclude that the average preferences for all brands are approximately the same.
Multiple Comparisons (Package design): 95% Confidence Intervals of the mean differences

Comparison   Lower Bound   Upper Bound
A* − B*      7.0139        16.0972
A* − C*      4.9272        13.6125
B* − A*      −16.0972      −7.0139
B* − C*      −7.0799       2.5085
C* − A*      −13.6125      −4.9272
C* − B*      −2.5085       7.0799

Based on observed means. *. The mean difference is significant at the .05 level.
As our null hypothesis for package design is rejected, multiple comparisons are used to assess which group mean differs from the others. The above table gives the results of multiple comparisons between the value labels under the package-design category. The Post-Hoc test presents the results of the comparisons between all possible pairs; with three groups, six pairs are possible, of which three are mirror images. The p-value for the A*–B* and A*–C* comparisons is shown as 0.000, whereas it is 0.322 for the B*–C* comparison. This means that the average preference differs significantly between A* and B* as well as between A* and C*, whereas the difference between B* and C* is not significant.
Conclusion: As our null hypothesis for package design is rejected, we conclude that the mean preferences for the package designs are not all the same. Using the LSD test we conclude that the mean for A* is significantly different from the other two means, whereas the other two means do not differ significantly from each other. In the case of brand name, however, our null hypothesis is accepted and we conclude that all mean brand preferences are the same, so there is no need for multiple comparisons for brand.
Chi-Square Test
The chi-square test is commonly used to test hypotheses regarding:
1. Goodness of fit
2. Association / independence of attributes

It is denoted by χ², and its degrees of freedom are n − 1, where n is the number of categories. It follows a positively skewed distribution, so it has a one-tailed critical region on the right tail of the curve, and the value of χ² is always positive.
Data Source:
C:\SPSSEVAL\Carpet
SPSS Need:
SPSS needs a categorical or a numerical variable for analyzing the chi-square goodness-of-fit test.
Graphical Representation:

[Histogram of Price, with frequency on the y-axis: Mean = 2.00, Std. Dev. = 0.87287, N = 22]
Explanation of Graph
The above graph shows the numerical variable (price) on the x-axis and its frequency on the y-axis. The mean and standard deviation of the 22 observations are 2.00 and 0.87287 respectively. The graph clearly shows that the selected numerical variable, price, does not follow a normal distribution, so we use the chi-square goodness-of-fit test to determine whether the sample under investigation has been drawn from a population that follows some specified distribution.
Method:
First of all enter the data in the data editor and label the variable as price. Click on Analyze, which will produce a drop down menu; choose Nonparametric Tests from that and click on Chi-Square, and a dialogue box appears in which all the input variables appear on the left-hand side of the box. Select the variable you want to analyze. When you select the test variable, the arrow between the two boxes becomes active and you can transfer the variable to the box labeled Test Variable List by clicking on the arrow. In this case our test variable is price and it should be transferred to the Test Variable List box. You can also click on the Options button if you are interested in the descriptive statistics of the tested variable. Now click on OK to run the analysis.
Pictorial Representation
Analyze → Nonparametric Tests → Chi-Square → Define Test Variable List → OK
Output
Price

        Observed N   Expected N   Residual
$1.19   8            7.3          .7
$1.39   6            7.3          -1.3
$1.59   8            7.3          .7
Total   22
First column of the above table shows the three categories in price variable. The column labeled Observed N gives the actual number of cases falling in different categories of test variable, which is directly obtained from the data given. The column labeled Expected N gives the expected number of cases that should fall in each category of the test variable. The column labeled Residual gives the difference between observed and expected frequencies of each category, and it is commonly known as Error.
Test Statistics

              Price
Chi-Square    .364(a)
df            2
Asymp. Sig.   .834

a. 0 cells (.0%) have expected frequencies less than 5. The minimum expected cell frequency is 7.3.
The above table gives the test results for the Chi-Square Goodness of Fit Test. In this case the chi-square value is 0.364 with 2 degrees of freedom. The p-value for the test is 0.834, which is greater than 0.05, so we accept our null hypothesis that the fit is good. Conclusion: The test results are not statistically significant at the 5% level of significance; the data do not provide sufficient evidence to reject the null hypothesis, so we conclude that our test variable (price) follows a uniform distribution. Rejecting the null hypothesis here would carry an 83.4% chance of rejecting a true null hypothesis, so rejection is not justified.
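The same goodness-of-fit result can be cross-checked outside SPSS; a minimal sketch using Python's scipy (an assumed aid, not part of the SPSS procedure) with the observed frequencies from the table above:

```python
from scipy.stats import chisquare

# Observed counts for the three price categories ($1.19, $1.39, $1.59).
observed = [8, 6, 8]

# chisquare() assumes equal expected frequencies (22/3 each) by default,
# which matches the uniform-distribution null hypothesis.
statistic, p_value = chisquare(observed)
print(round(statistic, 3), round(p_value, 3))  # 0.364 0.834
```

The statistic and p-value agree with the SPSS output to rounding.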
Data Source:
C:\SPSSEVAL\Employee Data
Hypothesis:
H0: Designation is independent of Sex.
HA: Designation is not independent of Sex.
SPSS Need:
SPSS needs two categorical variables for analyzing the Chi-Square test for independence.
Method:
First of all enter the data in the data editor, with the variables labeled as Gender and Designation respectively. Click on Analyze, which will produce a drop down menu; choose Descriptive Statistics from that and click on Crosstabs, and a dialogue box appears in which all the input variables appear on the left-hand side of the box. Select the variable you want to form the rows of your contingency table and transfer it to the box labeled Row(s); transfer the other variable to the box labeled Column(s). In this case we transfer gender to the box labeled Row(s) and designation to the box labeled Column(s). Next, click on the Statistics button, which brings up a dialogue box. Here tick the first box, labeled Chi-Square, and click Continue to return to the previous screen. Click on OK to run the analysis.
Pictorial Representation
Analyze → Descriptive Statistics → Crosstabs → Drag Row and Column Variables → Statistics → Tick Chi-Square → OK
Output
Gender * Employment Category Crosstabulation (Count)

         Clerical   Custodial   Manager   Total
Female   206        0           10        216
Male     157        27          74        258
Total    363        27          84        474
Cross tabulation is used to examine the variation in categorical data; it is a cross-measuring analysis. Above we cross-examine the gender and designation of the employees. We take the designation of the employees in the columns and the gender of the employees in the rows, and we have a total of 474 observations. The results are given in two rows; the first row shows the number of female employees in each employment category, and the second row shows the number of male employees in each employment category.
Chi-Square Tests

                     Value       df   Asymp. Sig. (2-sided)
Pearson Chi-Square   79.277(a)   2    .000
N of Valid Cases     474
a. 0 cells (.0%) have expected count less than 5. The minimum expected count is 12.30.
The above table gives the test results for the chi-square test for independence. The first row, labeled Pearson Chi-Square, shows that the value of χ2 is 79.277 with 2 degrees of freedom. The two-tailed p-value is shown as 0.000, which is less than 0.05, so we can reject our null hypothesis and conclude that Designation is not independent of Sex. Conclusion: The test results are statistically significant at the 5% level of significance and the data provide sufficient evidence to conclude that the designation of the employees is not independent of their sex, and we can be almost 100% confident in our decision to reject the null hypothesis.
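The same independence test can be reproduced from the crosstabulation counts alone; a minimal sketch using Python's scipy (an assumed aid, not part of the SPSS procedure):

```python
from scipy.stats import chi2_contingency

# Counts from the Gender * Employment Category crosstabulation:
# rows = Female, Male; columns = Clerical, Custodial, Manager.
table = [[206, 0, 10],
         [157, 27, 74]]

# correction=False matches the uncorrected Pearson chi-square SPSS reports.
chi2, p_value, df, expected = chi2_contingency(table, correction=False)
print(round(chi2, 3), df)  # 79.277 2
```

The expected-counts array returned here also reproduces the footnote's minimum expected count of 12.30 (Female, Custodial).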
Second Approach
Consider a case in which the data is not available and only the table labeled Gender * Employment Category Crosstabulation in the above output is given. On the basis of the output table you can easily obtain the same result as above by using the SPSS Weight Cases option. Below we briefly explain how to enter the data on the basis of the table and find the desired results.
Method
First of all, in the Variable View of SPSS define three variables and label them as Gender, Employment Category, and Value. Now in the Data View of SPSS, enter the data in a different manner. We see that the table contains two rows and three columns. In the rows we have two categories, i.e. Female and Male; similarly in the columns we have three categories, i.e. Clerical, Custodial, and Manager. The female and male employees both fall into the three employment categories. So in the Data View we simply define the row data, i.e. Gender, and opposite it define the column data, i.e. Employment Category, with the corresponding frequencies in the Value column. The resulting Data View is shown in the picture below.
After defining the data, click on Data, which will produce a drop down menu; choose Weight Cases from that, and a dialogue box appears in which all the variables are on the left-hand side of the box. Tick Weight cases by and drag Value into the box labeled Frequency Variable by clicking on the arrow between the two boxes. Now click OK to return to the previous window.
The further process is the same as described above. Just define Gender in Row and Employment Category in Column, and tick Chi-square by clicking on the Statistics button. Now click OK to run the analysis. When the output appears, you will see that SPSS gives the same result as we found earlier through the data.
Regression Analysis
Regression is the relationship between selected values of an independent variable and observed values of a dependent variable, from which the most probable value of the dependent variable can be predicted for any value of the independent variable. The use of regression to make quantitative predictions of one variable from the values of another variable is called regression analysis. There are several types of regression which may be used by the researcher:
Linear regression
Multiple linear regression
Quadratic / Curvilinear regression
Logistic / Binary logistic regression
Multivariate logistic regression
Linear Regression
When one dependent variable depends on a single independent variable, their dependency is called linear regression and its model is given by

y = a + bx

where:
y is the dependent variable
x is the independent variable
a is the regression constant
b is the regression coefficient
Regression Coefficient
Regression coefficient is a measure of how strongly the independent variable predicts the dependent variable. There are two types of regression coefficient. Un-standardized coefficients Standardized coefficients commonly known as Beta.
The un-standardized coefficients can be used in the equation as coefficients of the different independent variables, along with the constant term, to predict the value of the dependent variable. The standardized coefficient is, however, measured in standard deviations. A beta value of 2 associated with a particular independent variable indicates that a change of 1 standard deviation in that independent variable will result in a change of 2 standard deviations in the dependent variable.
Data Source:
C:\SPSSEVAL\Employee Data
Variables:
Here we are interested to analyze two numerical variables i.e.
Current salary (Numerical)
Beginning salary (Numerical)
Hypothesis:
H0: Regression coefficient is zero.
HA: Regression coefficient is not zero.
SPSS Need:
SPSS needs two numerical (scale) variables for analyzing linear regression.
Method:
The given data is entered in the data editor and the variables are labeled as current salary and beginning salary. Click on Analyze which will produce a drop down menu, choose Regression from that and click on Linear, a dialogue box appears, in which all the input variables appear in the left-hand side of that box. Transfer the dependent variable into the right-hand side box labeled Dependent. Transfer the independent variable into the box labeled Independent(s). In our case, current salary is a dependent variable and beginning salary is an independent variable. Next we have to select the method for analysis in the box labeled Method. SPSS gives five options here: Enter, Stepwise, Remove, Forward, and Backward. In the absence of a strong theoretical reason for using a particular method, Enter should be used. The box labeled Selection variable is used if we want to restrict the analysis to cases satisfying particular selection criteria. The box labeled Case labels is used for designating a variable to identify points on plots. After making the appropriate selections click on Statistics button. This will produce a dialogue box labeled Linear regression: Statistics. Tick against the statistics you want in the output. The Estimates option gives the estimate of regression coefficient. The Model fit option gives the fit indices for the overall model. Besides these the R-Squared change option is used to get the incremental R-square value when the models change. Other options are not commonly used. Click on the Continue button to return to the main dialogue box. The Plots button in the main dialogue box may be used for producing histograms and normal probability plots of residual. The Save button can be used to save statistics like predicted values, residuals, and distances. The options button can be used to specify the criteria for stepwise regression. Now click on OK in the main dialogue box to run the analysis.
Pictorial Representation
Analyze → Regression → Linear → Define DV & IV → Plots → Tick Histogram & Normal Probability Plot → OK
OUTPUT
Variables Entered/Removed(b)

Model   Variables Entered     Variables Removed   Method
1       Beginning Salary(a)   .                   Enter
The above table tells us about the independent variable and the regression method used. Here we see that the independent variable i.e. beginning salary is entered for the analysis as we selected the Enter method.
Model Summary(b)

Model   R       R Square   Adjusted R Square   Std. Error of the Estimate
1       .880a   .775       .774                $8,115.356
This table gives us the R-value, which represents the correlation between the observed and predicted values of the dependent variable. R-Square is called the coefficient of determination and it gives the adequacy of the model. Here the value of R-Square is 0.775, which means the independent variable in the model can predict 77.5% of the variance in the dependent variable. Adjusted R-Square adjusts this value for the number of predictors in the model, giving a more accurate picture of model fit when models with different numbers of predictors are compared.
ANOVA(b)

Model 1        df    F          Sig.
Regression     1     1622.118   .000(a)
Residual       472
Total          473
The above table gives the ANOVA test results for the regression model. The results are given in three rows. The first row, labeled Regression, gives the variability in the model due to known reasons. The second row, labeled Residual, gives the variability due to random error or unknown reasons. The F-value in this case is 1622.118 and the p-value is 0.000, which is less than 0.05, so we reject our null hypothesis and conclude that the regression model explains a significant part of the variation in current salary, i.e. the regression coefficient is not zero.
Coefficients(a)

Model 1            B          Std. Error   t        Sig.
(Constant)         1928.206   888.680      2.170    .031
Beginning Salary   1.909      .047         40.276   .000
The above table gives the regression constant and coefficient and their significance. These can be used to construct an ordinary least squares (OLS) equation and also to test the hypothesis about the independent variable. Using the regression coefficient and the constant term given under the column labeled B, one can construct the OLS equation for predicting the current salary i.e. Current salary = 1928.206 + (1.909) (Beginning salary). Now we test our hypothesis: we see that the p-value for the regression coefficient of beginning salary is 0.000, which is less than 0.05, so we can reject our null hypothesis and conclude that the regression coefficient is not zero.
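Once the coefficients are known, the OLS equation can be applied directly. A small sketch in Python (the function name and the example salary are ours, for illustration):

```python
# Prediction from the fitted OLS equation reported by SPSS:
# Current salary = 1928.206 + 1.909 * Beginning salary
def predict_current_salary(beginning_salary):
    return 1928.206 + 1.909 * beginning_salary

# An employee who started at $20,000 is predicted to currently earn:
print(round(predict_current_salary(20000), 2))  # 40108.21
```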
Charts
Histogram

[Histogram of regression standardized residuals. Dependent Variable: Current Salary. Mean = -3.17E-16, Std. Dev. = 0.999, N = 474. X-axis: Regression Standardized Residual; y-axis: Frequency.]
The above histogram of standardized residuals shows the mean and standard deviation of the residuals in the model. The mean and standard deviation are approximately 0 and 1 respectively, which suggests that the residuals are close to a standard normal distribution and the chances of error are small.

Normal P-P Plot of Regression Standardized Residual
[Normal P-P plot of regression standardized residuals: Observed Cum Prob plotted against Expected Cum Prob.]
The above normal probability plot of the regression standardized residual shows the reference line touching the maximum number of points present in the model, which also indicates the accuracy of the fitted model.

Scatter Plot
[Scatterplot of Current Salary against the regression standardized residual.]
The above scatter plot also shows the adequacy of the fitted model: the points are scattered and do not follow any particular pattern, so we can say that the fitted model has a minimal chance of error.
Data Source:
C:\SPSSEVAL\Employee Data
Variables:
Here we are interested to analyze four numerical variables i.e.
Current salary (Numerical)
Beginning salary (Numerical)
Educational Level (Numerical)
Months since Hire (Numerical)
Hypothesis:
H0: Regression coefficients are zero.
HA: Regression coefficients are not zero.
SPSS Need:
SPSS needs more than two numerical variables, which should be scale variables, for analyzing multiple regression.
Method:
The method for analyzing multiple regression is the same as discussed earlier for linear regression. The only change in the case of multiple regression is that we have one dependent variable along with three independent variables. Here Current salary is the dependent variable, whereas Beginning salary, Educational Level, and Months since Hire are the independent variables. So here we transfer current salary into the box labeled Dependent and beginning salary, educational level, and months since hire into the box labeled Independent(s). The further procedure and the use of advanced options for extra results were discussed earlier for linear regression. After making appropriate selections, click on OK to run the analysis.
OUTPUT
Variables Entered/Removed(b)

Model   Variables Entered              Variables Removed   Method
1       Beginning Salary(a)            .                   Enter
2       Educational Level (years)(a)   .                   Enter
3       Months since Hire(a)           .                   Enter
The above table shows that beginning salary was entered in model one followed by educational level in model two followed by months since hire in model three. Note that model one includes only beginning salary as independent variable. Whereas model two includes beginning salary and educational level as independent variables, and so on model three includes beginning salary, educational level, and months since hire as independent variables. Enter method is used to assess all three models.
Model Summary

                                                                  Change Statistics
Model   R       R Square   Adjusted   Std. Error of    R Square   F Change   df1   df2   Sig. F
                           R Square   the Estimate     Change                            Change
1       .880a   .775       .774       $8,115.356       .775       1622.118   1     472   .000
2       .890b   .792       .792       $7,796.524       .018       40.393     1     471   .000
3       .895c   .801       .800       $7,645.998       .008       19.728     1     470   .000
a. Predictors: (Constant), Beginning Salary b. Predictors: (Constant), Beginning Salary, Educational Level (years) c. Predictors: (Constant), Beginning Salary, Educational Level (years), Months since Hire
The above table shows the different R-values along with change statistics for the three models in different rows. In this table we get some additional statistics under the column Change Statistics. Under Change Statistics, the first column, labeled R Square Change, gives the change in the R-square value between the three models. The last column, labeled Sig. F Change, tests whether there is a significant improvement in the model as we introduce additional independent variables. In other words, it tells us whether the inclusion of additional independent variables in different steps helps in explaining significant additional variance in the dependent variable. We can see that the R Square Change value in row three is 0.008. This means that the inclusion of months since hire after beginning salary and educational level helps in explaining an additional 0.8% of the variance in the current salary of the employees. The p-value for all three models shows that our value falls in the critical region, so we can reject our null hypothesis, which means the regression coefficients are not zero.
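The F Change statistic can be reproduced from the R-square values themselves. A sketch (the helper name is ours; results computed from the rounded R-squares in the table differ slightly from SPSS's exact figures, which use unrounded values):

```python
# F statistic for adding one predictor to a regression model:
# F = (R2_full - R2_reduced) / ((1 - R2_full) / (n - k_full - 1))
def f_change(r2_reduced, r2_full, n, k_full):
    df2 = n - k_full - 1  # residual degrees of freedom of the full model
    return (r2_full - r2_reduced) / ((1 - r2_full) / df2)

# Model 1 -> Model 2 (adding Educational Level), n = 474, 2 predictors:
print(round(f_change(0.775, 0.792, 474, 2), 1))  # 38.5 (SPSS: 40.393, from unrounded R-squares)
```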
Coefficients(a)

Model                              B           Std. Error   Sig.
1   (Constant)                     1928.206    888.680      .031
    Beginning Salary               1.909       .047         .000
2   (Constant)                     -7808.714   1753.860     .000
    Beginning Salary               1.673       .059         .000
    Educational Level (years)      1020.390    160.550      .000
3   (Constant)                     -19986.5    3236.616     .000
    Beginning Salary               1.689       .058         .000
    Educational Level (years)      966.107     157.924      .000
    Months since Hire              155.701     35.055       .000
The above table gives the regression coefficients and related statistics for the three models separately in different rows. These regression coefficients and constants can be used to construct an ordinary least squares (OLS) equation and also to test the hypothesis about the independent variables. Using the regression coefficients and the constant terms given under the column labeled B, one can construct the OLS equations for predicting the current salary of the employees for the three models i.e.
MODEL 1   CS = 1928.206 + (1.909) (BS)
MODEL 2   CS = -7808.714 + (1.673) (BS) + (1020.390) (EL)
MODEL 3   CS = -19986.50 + (1.689) (BS) + (966.107) (EL) + (155.701) (MSH)
Now we test our hypothesis: we see that the p-value for the regression coefficients in all three models is less than 0.05, so we can reject our null hypothesis and conclude that the regression coefficients are not zero. Conclusion: By using the hierarchical (stepwise) method for multiple regression we conclude that model adequacy increases as each independent variable is introduced, but the increase in adequacy from including educational level is greater than the increase from introducing months since hire. Since our p-values lie in the critical region, we reject our null hypothesis and conclude that the regression coefficients for all three models are not equal to zero.
Charts:
This model also produces three diagrams for the standardized residuals, i.e. Histogram, Normal Probability Plot, and Scatter Plot. The charts and their interpretations are almost the same as discussed under linear regression, so we are not describing these charts and their interpretations again.
Data Source:
C:\SPSSEVAL\Employee Data
Variables:
Current salary (Numerical)
Beginning salary (Numerical)
Educational level (Numerical)
Hypothesis:
H0: Regression coefficient is zero.
HA: Regression coefficient is not zero.
SPSS Need:
SPSS needs numerical (scale) variables for analyzing curve estimation.
Method:
The given data is entered in the data editor and the variables are labeled as current salary, beginning salary and Educational level. Click on Analyze which will produce a drop down menu, choose Regression from that and click on Curve Estimation, a dialogue box appears, in which all the input variables appear in the left-hand side of that box. Transfer the dependent variable into the right-hand side box labeled Dependent(s). Transfer the independent variable into the box labeled Independent. In our case, current salary and Beginning salary are dependent variables and Educational level is an independent variable. Now choose an appropriate model you want by ticking its box appearing below the window labeled Curve Estimation. In this case we choose Quadratic model by ticking its corresponding box. The Save button can be used to save statistics like predicted values, residuals, and predicted intervals. Now click on OK in the main dialogue box to run the analysis.
Pictorial Representation
Analyze → Regression → Curve Estimation → Define DVs and IV → Tick Quadratic → OK
OUTPUT
Model Description

Model Name                                           MOD_2
Dependent Variable   1                               Current Salary
                     2                               Beginning Salary
Equation             1                               Quadratic
Independent Variable                                 Educational Level (years)
Constant                                             Included
Variable Whose Values Label Observations in Plots    Unspecified
Tolerance for Entering Terms in Equations            .0001
The above table gives the description of the model. In this case we have two dependent variables i.e. Current salary and Beginning salary along with one independent variable i.e. Educational level (years).
Case Processing Summary

                      N
Total Cases           474
Excluded Cases(a)     0
Forecasted Cases      0
Newly Created Cases   0
a. Cases with a missing value in any variable are excluded from the analysis.
The above table shows the number of cases falling in the selected model. In our case the total number of cases is 474, with no excluded or missing cases.
Model Summary and Parameter Estimates
Dependent Variable: Current Salary

            Model Summary                                Parameter Estimates
Equation    R Square   F         df1   df2   Sig.        Constant    b1         b2
Quadratic   .589       337.246   2     471   .000        85438.237   -12428.5   612.950
The independent variable is Educational Level (years). The above table gives the test results for the quadratic regression. R-Square is the coefficient of determination and gives the adequacy of the model. In this case the F-value is 337.246, with a significance level of 0.000, which is less than 0.05. This means that our value falls in the critical region, so we can reject our null hypothesis and conclude that the regression coefficients are not zero.
Scatter Plots
[Chart of observed values and the fitted quadratic curve: Current Salary against Educational Level (years).]
[Chart of observed values and the fitted quadratic curve: Beginning Salary against Educational Level (years).]
The above charts for the residuals of the dependent variables clearly show that the residual values are not randomly scattered; they follow a particular pattern, which means that the fitted model is not good.
Data Source:
\\temp\temp\Ali Raza\Mateen.sav
Hypothesis:
H0: Male = Female
HA: Male ≠ Female
Variables:
Here we are interested to analyze two categorical variables i.e.
Gender (Categorical)
Preference of cellular service with respect to network coverage (Categorical but treated as Numerical)
Here we consider the preference of cellular service as a numerical variable and statistically test the hypothesis that the mean preference of males and females for cellular service with respect to network coverage is the same. The method we use to test the above hypothesis is the independent samples t-test.
Method:
Enter the data in the data editor and the variables are labeled as Gender and preference. Click on Analyze which will produce a drop down menu, choose Compare means from that and click on independent samples t-test, a dialogue box appears, in which all the input variables appear in the left-hand side of that box. To perform the independent samples t-test, transfer the dependent variable into the test variable box and transfer the variable that identifies the groups into the grouping variable box. In this case, the Preference is the dependent variable to be analyzed and should be transferred into test variable box. Gender is the variable which will identify the groups and it should be transferred into the grouping variable box. Once the grouping variable is transferred, the define groups button which was earlier inactive turns active. Click on it to define the two groups. In this case group1 represents Male and group2 represents female. Therefore put 1 in the box against group1 and 2 in the box against group2 and click continue. Now click on OK to run the analysis.
Pictorial Representation
Analyze → Compare Means → Independent-Samples T Test → Drag Test & Grouping Variable → Define Groups → OK
OUTPUT
Group Statistics
(Wide network coverage motivates the individual to prefer a particular cellular service)

Gender   N     Mean   Std. Deviation   Std. Error Mean
Male     100   4.12   .891             .089
Female   100   4.32   .618             .062
This table contains the descriptive statistics for both groups. We have taken 200 observations for the independent samples t-test, of which 100 belong to the male category and 100 to the female category. The column labeled Mean shows that the mean preferences of cellular service with respect to network coverage for both groups are approximately 4. This means that both groups agree that wide network coverage motivates the individual to prefer a particular cellular service.
Independent Samples Test

                                Levene's Test for
                                Equality of Variances     t-test for Equality of Means
                                F        Sig.             t         df
Equal variances assumed         2.730    .100             -1.845    198
Equal variances not assumed                               -1.845    176.31

The above table contains the test statistics for the independent samples t-test, for the item "Wide network coverage motivates the individual to prefer a particular cellular service". Levene's Test: The table contains two sets of analysis, the first assuming equal variances in the two groups and the second assuming unequal variances. Levene's test tells us which statistic to consider when analyzing the equality of means. The p-value for Levene's test is 0.100, which is greater than 0.05; therefore, the statistic associated with equal variances assumed should be used for the t-test for equality of means of two independent populations. P-Value: the value of our test statistic does not fall in the critical region, i.e. 0.067 > 0.05, so we can accept our null hypothesis, i.e. Male = Female. Conclusion: The test results are not statistically significant at the 5% level of significance, and the data provide sufficient evidence to conclude that the mean preference of cellular service with respect to network coverage for males and females is the same; there is only a 6.7% chance of rejecting a true null hypothesis, and we are 93.3% confident in our decision.
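The t statistic can also be recovered from the summary statistics alone; a minimal sketch using Python's scipy (an assumed aid, not part of the SPSS procedure):

```python
from scipy.stats import ttest_ind_from_stats

# Equal-variances t-test computed from the Group Statistics above:
# Male: mean 4.12, sd .891, n 100; Female: mean 4.32, sd .618, n 100.
t, p = ttest_ind_from_stats(mean1=4.12, std1=0.891, nobs1=100,
                            mean2=4.32, std2=0.618, nobs2=100,
                            equal_var=True)
# t ≈ -1.844, p ≈ 0.067 (SPSS reports -1.845, computed from unrounded data)
```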
Reliability Analysis
Reliability analysis is applied to check the reliability of the data, i.e. whether the conclusions and the analyses performed on the data are reliable for understanding and forecasting. One way to ideally measure reliability is the test-retest method. However, establishing reliability through test-retest is practically very difficult. Some of the commonly used techniques for assessing reliability include Cohen's Kappa coefficient for categorical data and Cronbach's Alpha for the internal reliability of a data set.
Here we are interested to check the reliability of a data set, which includes five numerical variables i.e.
Appraised Land Value
Appraised Value of Improvements
Total Appraised Value
Sale Price
Ratio of Sale Price to Total Appraised Value
Note that the data contains one String variable, labeled Neighborhood; we deleted this variable because SPSS does not run the reliability analysis if the data contains any String or blank variable.
SPSS Need:
For reliability analysis through SPSS, one can use any variable of any nature except the String and Blank Variables.
Method:
Enter the data in the data editor and labeled them. Click on Analyze which will produce a drop down menu, choose Scale from that and click on Reliability Analysis, a dialogue box appears, in which all the input variables appear in the left-hand side of that box. To perform the reliability analysis, transfer the variables in the box labeled Items by clicking on the arrow between the two boxes. In this case, we have five numerical variables in the data set that should be transferred to the Items box.
Choose appropriate Model by clicking on that box, here we choose Alpha as model. Now click on Statistics button, a dialogue box appears. Tick the corresponding box which you want to analyze in the output. Now click Continue to return to the main dialogue box. Click on OK to run the analysis.
Pictorial Representation
Analyze → Scale → Reliability Analysis → Drag Items → Choose Model → Give Statistics → OK
OUTPUT
Case Processing Summary

              N      %
Valid         2440   100.0
Excluded(a)   0      .0
Total         2440   100.0
The above table shows the total number of cases falling in the data set. We have 2440 observations with no missing or excluded cases.
Reliability Statistics

Cronbach's Alpha   N of Items
.576               5

The above table shows the test results for the reliability analysis. The value of Cronbach's Alpha is 0.576 and the number of items in the data set is 5. This value of Alpha is considered Poor, and the conclusions drawn from this data are not reliable for understanding and forecasting.
Item-Total Statistics

                                  Scale Mean if   Scale Variance if   Corrected Item-     Cronbach's Alpha
                                  Item Deleted    Item Deleted        Total Correlation   if Item Deleted
Appraised Land Value              164151.7603     5533196618          .688                .480
Appraised Value of Improvements   140212.2148     4646009801          .505                .438
Total Appraised Value             132761.3423     5160815037          .314                .533
Sale Price                        106454.2587     1928523141          .565                .477
Ratio of Sale Price to
Total Appraised Value             181191.6111     6801537138          -.032               .615
The above table shows the statistics associated with each item. The last column of the table shows the value of Alpha if the corresponding item is deleted from the data set. The value associated with each of the top four items is less than the current value of Alpha (0.576), which means that if one of these items is deleted, the value of Cronbach's Alpha becomes worse. But the value associated with the item labeled Ratio of Sale Price to Total Appraised Value is 0.615. This means that if this item is deleted from the analysis and the reliability of the remaining data is retested, the value of Cronbach's Alpha becomes 0.615. So, in order to improve the value of Alpha and make our data set more reliable, we delete the last item and retest our Cronbach's Alpha.
Reliability Statistics

Cronbach's Alpha   N of Items
.615               4
Here we retest our data after the deletion of one item and our new value of Alpha is 0.615. Now the total number of items in the data set is 4. This value of Alpha is considered Acceptable, and the conclusions drawn from this data are reliable for understanding and forecasting.
Item-Total Statistics

                                  Scale Mean if   Scale Variance if   Corrected Item-     Cronbach's Alpha
                                  Item Deleted    Item Deleted        Total Correlation   if Item Deleted
Appraised Land Value              164150.57       5533198039          .688                .540
Appraised Value of Improvements   140211.03       4646008210          .505                .493
Total Appraised Value             132760.16       5160813863          .314                .599
Sale Price                        106453.07       1928530335          .565                .536
This table shows that if we delete any other item from the data set and retest the reliability, our value of Alpha becomes Poor, because all the values associated with the remaining four items in the last column of the above table are less than the current value of our Cronbach's Alpha, i.e. 0.615. So we don't need to retest the reliability of the data set any further, which means the data is reliable at the current value of our Cronbach's Alpha.
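Cronbach's Alpha itself is straightforward to compute from the item scores; a minimal sketch in Python (the function and the tiny data set are ours, for illustration):

```python
import numpy as np

# Cronbach's Alpha: k/(k-1) * (1 - sum of item variances / variance of total)
def cronbach_alpha(scores):
    scores = np.asarray(scores, dtype=float)  # rows = cases, columns = items
    k = scores.shape[1]
    item_variances = scores.var(axis=0, ddof=1).sum()
    total_variance = scores.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_variances / total_variance)

# Two perfectly consistent items give a high alpha:
data = [[1, 2], [2, 4], [3, 6]]
print(round(cronbach_alpha(data), 3))  # 0.889
```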
Correlation Analysis
Correlation refers to the degree of relation between two numerical variables. It is denoted by "r", which is typically known as Correlation Coefficient.
Correlation Coefficient
The correlation coefficient gives a mathematical value for measuring the strength of the linear relation between two variables. Mathematically, the value of "r" always lies between -1 and +1, with:
(a) +1 representing a perfect positive linear relationship (as X increases, Y increases).
(b) 0 representing no linear relationship (X and Y have no pattern).
(c) -1 representing a perfect inverse relationship (as X increases, Y decreases).
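These three cases can be illustrated with a short sketch; the small arrays below are invented illustration data.

```python
# Minimal sketch of the correlation coefficient "r" using numpy.
# The arrays are illustrative only, not data from the Employee file.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y_up = 2 * x          # perfect positive linear relation
y_down = 10 - 3 * x   # perfect inverse linear relation

r_up = np.corrcoef(x, y_up)[0, 1]
r_down = np.corrcoef(x, y_down)[0, 1]
print(r_up, r_down)   # 1.0  -1.0
```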
Bivariate Correlation
Bivariate correlation tests the strength of relationship between two variables without giving any consideration to the interference some other variables might cause to the relationship between the two variables being tested. For example, while testing the correlation between the Current and Beginning salary of the employees, bivariate correlation will not consider the impact of some other variables like Educational Level and Previous Experience of the employees. In such cases, a bivariate analysis may show us a strong relationship between Current and Beginning salary; but in reality, this strong relationship could be the result of some other extraneous factors like Educational Level and Previous Experience etc.
Data Source:
C:\SPSSEVAL\Employee data
Hypothesis:
H0: There is no correlation between the variables (r = 0)
HA: There is some correlation between the variables (r ≠ 0)
Variables:
Here we are interested in analyzing three numerical variables:

Current salary (Numerical)
Beginning salary (Numerical)
Educational Level (years) (Numerical)
Technically correlation analysis can be run with any kind of data, but the output will be of no use if a correlation is run on a categorical variable with more than two categories. For example, in a data set, if the respondents are categorized according to nationalities and religions, correlation between these variables is meaningless.
SPSS Need:
SPSS needs two or more numerical variables to perform Correlation Analysis.
Method:
Firstly the data is entered in the data editor and the variables are labeled as Current salary, Beginning salary, and Educational Level. Click on Analyze, which will produce a drop-down menu; choose Correlate from that and click on Bivariate. A dialogue box appears, in which all the input variables appear on the left-hand side. To perform the bivariate correlation, choose the variables whose correlation is to be studied from the left-hand box and move them to the right-hand box labeled Variables. Once any two variables are transferred to the Variables box, the OK button becomes active. In our case we will transfer the three numerical variables, i.e. Current salary, Beginning salary, and Educational Level, to the right-hand box labeled Variables. There are some default selections at the bottom of the window that can be changed by clicking on the appropriate boxes. For our purpose, we will use the most commonly used Pearson's coefficient. Next, while choosing between the one-tailed and two-tailed test of significance, we have to see whether we are making any directional prediction. The one-tailed test is appropriate if we are making predictions about a positive or negative relationship between the variables; the two-tailed test should be used if there is no prediction about the direction of the relationship between the variables to be tested. Finally, Flag Significant Correlations asks SPSS to print an asterisk next to each correlation that is significant at the 0.05 significance level and two asterisks next to each correlation that is significant at the 0.01 significance level, so that the output can be read easily. The default selections will serve the purpose for the problem at hand. We may choose Means and Standard Deviations from the Options button if we wish to compute these figures for the given data. After making appropriate selections, click on OK to run the analysis.
Pictorial Representation
OUTPUT
Correlations

                                         Current    Beginning   Educational
                                         Salary     Salary      Level (years)
Current Salary     Pearson Correlation   1          .880**      .661**
                   Sig. (2-tailed)                  .000        .000
                   N                     474        474         474
Beginning Salary   Pearson Correlation   .880**     1           .633**
                   Sig. (2-tailed)       .000                   .000
                   N                     474        474         474
Educational Level  Pearson Correlation   .661**     .633**      1
(years)            Sig. (2-tailed)       .000       .000
                   N                     474        474         474

**. Correlation is significant at the 0.01 level (2-tailed).

The above table gives the correlation for all pairs of variables; each correlation appears twice in the matrix. So here we get the following 3 correlations for the given data:

Current salary and Beginning salary
Current salary and Educational level
Beginning salary and Educational level
The value of the correlation coefficient is 1 in the cells where SPSS compares a variable with itself (Current salary and Current salary, and so on), indicating a perfect positive correlation. In each cell of the correlation matrix, we get Pearson's correlation coefficient, the p-value for the two-tailed test of significance, and the sample size. From the output we can see that the correlation coefficient between Current salary and Beginning salary is 0.88 and the p-value for the two-tailed test of significance is less than 0.05. From these figures we can conclude that there is a strong positive correlation between Current salary and Beginning salary and that this correlation is significant at the significance level of 0.01. Similarly, the correlation coefficient for Current salary and Educational level is 0.661, so there is a moderate positive correlation between these variables. The correlation coefficient for Beginning salary and Educational level is 0.633 and its p-value is given by 0.000, so we can reject our null hypothesis and conclude that there is some correlation between these two variables.

Conclusion: At the 1% level of significance, all variables are significantly correlated with each other. Our null hypothesis that there is no correlation between the variables is rejected for all pairs of variables, and we conclude that there is some correlation present between all variables in the given data.
Partial Correlation
Partial correlation allows us to examine the correlation between two variables while controlling for the effects of one or more additional variables, without throwing out any of the data. In other words, it is the degree of relation between the dependent variable and one of the independent variables while controlling the effect of the other independent variables; recall that, in a multiple regression model, one dependent variable depends on two or more independent variables.
Data Source:
C:\SPSSEVAL\Employee data
Hypothesis:
H0: There is no correlation between the variables (r = 0)
HA: There is some correlation between the variables (r ≠ 0)
Variables:
Here we are interested in analyzing two numerical variables while controlling for one additional variable:

Current salary (Numerical)
Beginning salary (Numerical)
Educational level (Control variable)
SPSS Need:
SPSS needs two or more numerical variables to perform Partial Correlation.
Method:
Enter the data in the data editor and label the variables. Click on Analyze, which will produce a drop-down menu; choose Correlate from that and click on Partial. A dialogue box appears, in which all the input variables appear on the left-hand side. To perform the partial correlation, transfer the variables whose correlation you want to know into the box labeled Variables, and control the effect of one or more additional variables by transferring them to the box labeled Controlling for.
In our case, we want to find the correlation between Current salary and Beginning salary of the employees, so these variables should be transferred to the box labeled Variables, while controlling for the effect of Educational level of the employees by transferring it to the box labeled Controlling for. Now click on OK to run the analysis.
Pictorial Representation
Analyze → Correlate → Partial → Drag Variables → Drag Controlling Variables → OK
OUTPUT
Correlations

Control Variable: Educational Level (years)

                                            Current Salary   Beginning Salary
Current Salary     Correlation              1.000            .795
                   Significance (2-tailed)  .                .000
                   df                       0                471
Beginning Salary   Correlation              .795             1.000
                   Significance (2-tailed)  .000             .
                   df                       471              0
The above table shows the test results for the partial correlation between Current salary and Beginning salary of the employees. The variable we are controlling for in the analysis is Educational level, and it is shown on the left-hand side of the table. We can see that the correlation coefficient between Current salary and Beginning salary is 0.795, which is smaller than the 0.88 obtained in the bivariate case. This means that the variables still have a positive correlation, but the value of the correlation coefficient decreases once we control for the Educational level of the employees.

Conclusion: The test results are significant at the 5% level of significance, and the data provide sufficient evidence to conclude that there is some correlation present between the Current salary and Beginning salary of the employees, but it is smaller in the case of partial correlation than in the case of bivariate correlation.
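The partial correlation reported by SPSS can be cross-checked from the three bivariate coefficients reported earlier, using the standard first-order partial correlation formula:

```python
# Sketch reproducing the partial correlation from the bivariate output:
# r_xy.z = (r_xy - r_xz * r_yz) / sqrt((1 - r_xz**2) * (1 - r_yz**2))
import math

r_xy = 0.880  # Current salary vs Beginning salary
r_xz = 0.661  # Current salary vs Educational level
r_yz = 0.633  # Beginning salary vs Educational level

r_partial = (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz**2) * (1 - r_yz**2))
print(round(r_partial, 3))  # 0.795, matching the SPSS partial-correlation output
```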
Logistic Regression
Logistic regression has its roots in the logistic function studied since the nineteenth century. If a categorical variable depends on numerical or categorical variables, their dependency may be modeled by logistic regression. It is used to predict a discrete outcome based on variables that may be discrete, continuous, or mixed. Thus, when the dependent variable is categorical with two or more discrete outcomes, logistic regression is a commonly used technique. It has the following two types:

Binary logistic regression / Logit
Multinomial logistic regression
Data Source:
C:\SPSSEVAL\AML Survival
Hypothesis:
H0: The regression coefficients are zero
HA: The regression coefficients are not zero
Variables:
Here we are interested in analyzing three different variables:

Status (Categorical)
Time (Numerical)
Chemotherapy (Categorical)
Here Status is our dependent variable depending on Time and Chemotherapy. As in this case our dependent variable is categorical having only two levels i.e. Censored and Relapsed, so we use binary logistic regression to analyze the dependency between the variables.
SPSS Need:
SPSS needs one dependent variable, which must be categorical (with exactly two levels for binary logistic regression), while the independent variables can be categorical as well as numerical.
Method:
Firstly the data is entered in the data editor and the variables are labeled as Status, Time, and Chemotherapy. Click on Analyze, which will produce a drop-down menu; choose Regression from that and click on Binary Logistic. A dialogue box appears, in which all the input variables appear on the left-hand side. To perform the binary logistic regression, transfer the dependent variable into the box labeled Dependent and the independent variables into the box labeled Covariates. In our case, Status is the only dependent variable and should be transferred to the box labeled Dependent; Time and Chemotherapy are independent variables and should be transferred to the box labeled Covariates. Next we have to select the method of analysis in the box labeled Method. SPSS gives seven options, of which the Enter method is most commonly used. For common purposes one does not need to use the Save and Options buttons; advanced users may experiment with these. The Save button can be used to save statistics like predicted values, residuals, and distances. The Options button can be used to specify the criteria for stepwise regression. After making appropriate selections, click on OK to run the analysis.
Pictorial Representation
Analyze → Regression → Binary Logistic → Drag Dependent → Drag Covariates → OK
OUTPUT
Case Processing Summary

Unweighted Cases(a)                        N
Selected Cases     Included in Analysis    23
                   Missing Cases           0
                   Total                   23
Unselected Cases                           0
Total                                      23

a. If weight is in effect, see classification table for the total number of cases.
The above table gives the description of the cases selected for the analysis. We have a total of 23 cases included in the analysis, with no missing and no unselected cases.
The above table shows how the two outcomes or two levels of Status, i.e. Censored and Relapsed, have been coded by SPSS.
Classification Table (Step 0): Censored / Relapsed
The above table shows the observed or actual number of cases falling in each category of the dependent variable. The last column, labeled Percentage Correct, shows that this baseline model predicts the status of 0% of the censored patients and 100% of the relapsed patients. Overall, it predicts the status of 78.3% of the patients.
The above table reports significance levels by the traditional chi-square method. It tests whether the model with the predictors is significantly different from the null (intercept-only) model. The omnibus test may be interpreted as a test of the capability of all predictors in the model jointly to predict the response (dependent) variable. A finding of significance, as in the illustration above, corresponds to the research conclusion that there is adequate fit of the data to the model, meaning that at least one of the predictors is significantly related to the response variable. Here the Enter method is used (all model terms are entered in one step), so there is no difference between Step, Block, and Model, but in a stepwise procedure one would see results for each step.
Model Summary

Step   -2 Log likelihood   Cox & Snell R Square   Nagelkerke R Square
1      19.476(a)           .182                   .280

a. Estimation terminated at iteration number 5 because parameter estimates changed by less than .001.
The above table gives the Cox & Snell R-Square value, which approximates how much variance in the dependent variable can be explained by the hypothesized model. In this case, Time and Chemotherapy can explain 18.2% of the variance in the patients' current Status.
Classification Table (Step 1): Censored / Relapsed
The above Classification table summarizes the results of our predictions about patient's Status based on Time and Chemotherapy. We can see that our model can correctly predict 20% status of censored patients and 100% status of the relapsed patients. Overall, our model predicts 82.6% status of the patients.
Variables in the Equation

Step 1(a)   B        S.E.    Wald    df   Sig.   Exp(B)
chemo       -1.498   1.262   1.409   1    .235   .224
time        -.024    .024    1.055   1    .304   .976
Constant    2.962    1.207   6.025   1    .014   19.332
The above table gives the beta coefficients for the independent variables along with their significance. The negative beta coefficients for time and chemotherapy mean that as chemotherapy and treatment time increase, the odds of the patient having a relapsed status decrease. As with multiple linear regression models, we can construct an equation from the regression constant and coefficients above; here it predicts the log-odds (logit) of the patient's status rather than the status itself. The expression is given by: Logit(Status) = 2.962 + (-1.498)(Chemotherapy) + (-0.024)(Time)
The last column, labeled Exp(B), takes a value of more than one if the beta coefficient is positive and less than one if it is negative. In our case, the beta coefficients for Chemotherapy and Time are negative, so both have values of less than one in the column labeled Exp(B). A value of 0.976 for Time indicates that for a 1-week increase in the treatment, the odds of a patient having a relapsed status change by a factor of 0.976, i.e. decrease slightly. These values can also be used to construct an equation for the predicted probability for a patient, given by:

P = e^(2.962 - 1.498*Chemotherapy - 0.024*Time) / (1 + e^(2.962 - 1.498*Chemotherapy - 0.024*Time))
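The Exp(B) column and the predicted probability can be reproduced directly from the fitted coefficients. The chemo/time input values in the sketch below are illustrative, not taken from any particular patient in the data set.

```python
# Sketch cross-checking the logistic-regression output: Exp(B) is simply
# exp() of each coefficient, and the predicted probability is the logistic
# transform of the fitted logit.
import math

b_const, b_chemo, b_time = 2.962, -1.498, -0.024

# Reproduce the Exp(B) column from the coefficients
print(round(math.exp(b_chemo), 3))  # 0.224
print(round(math.exp(b_time), 3))   # 0.976

def predicted_probability(chemo: float, time: float) -> float:
    logit = b_const + b_chemo * chemo + b_time * time
    return math.exp(logit) / (1 + math.exp(logit))

# Illustrative patient: chemotherapy given, 20 weeks of treatment
p = predicted_probability(chemo=1, time=20)
print(0 < p < 1)  # True
```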
Non-Parametric Tests
Non-parametric tests are used to test hypotheses regarding population parameters when the data are non-normal or the sample size is small (less than 30). These tests are sometimes also referred to as "Distribution-Free tests".
Binomial Test
Binomial tests are used to test the hypothesis regarding the population proportion. It runs on a categorical variable having two levels only.
Data Source:
C:\SPSSEVAL\Carpet
Hypothesis:
H0: P = 0.5
HA: P ≠ 0.5
Variables:
Here we are interested to analyze a categorical variable i.e. House keeping Seal. In our case a superstore owner claims that 50% of their customers got house keeping seal on the purchase of the product.
SPSS Need:
SPSS needs one categorical variable (2 levels only).
Method:
Firstly the data is entered in the data editor and the variable is labeled as House keeping seal. Click on Analyze which will produce a drop down menu, choose Non-Parametric Tests from that and click on Binomial, a dialogue box appears, in which all the input variables appear in the left-hand side of that box. To perform the Binomial test, transfer the test variable in the box labeled Test variable list. In our case House keeping seal is a test variable, so it should be transferred to the box labeled Test variable list by clicking on the arrow between the two boxes. Now give the test value in the box below labeled as Test Proportion. In our case the test value is 0.50.
Pictorial Representation
Analyze → Non-Parametric Tests → Binomial → Drag test variable → Give test proportion → OK
The above table gives the test results for the Binomial Non-parametric test.
The first column labeled Category gives the two categories (Yes or No) of the test variable i.e. Good House keeping seal.
The second column, labeled N, gives the total number of cases analyzed, as well as the number of cases falling in each category of our test variable. In this case we have selected a sample of 22 persons, out of which 8 persons say Yes, they got the housekeeping seal, and the remaining say No.
The third column, labeled Observed Proportion, gives the percentage of persons saying Yes or No: 36% of individuals say Yes, they got the housekeeping seal, while 64% say No.
The last column gives the p-value for the 2-tailed test, which is 0.286. Since this is greater than 0.05, we accept our null hypothesis and conclude that the claim of the superstore owner is correct: the proportion is 0.50.
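The same exact binomial p-value can be reproduced with scipy from the counts reported above (8 Yes out of 22, against a claimed proportion of 0.50):

```python
# Hedged sketch of the binomial test with scipy, using the counts from
# the output above: 8 "Yes" responses out of 22 against p = 0.50.
from scipy.stats import binomtest

result = binomtest(k=8, n=22, p=0.5, alternative='two-sided')
print(round(result.pvalue, 3))  # 0.286, matching the SPSS output
```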
Runs Test
The Runs test is used to test the randomness of the data. This test is best run if the test variable is numerical. The word RUNS refers to the number of times the sign changes.
Data Source:
C:\SPSSEVAL\Carpet
Hypothesis:
H0: HA: Data is random Data is not random
Variables:
Here we are interested to analyze a numerical variable i.e. Preference.
SPSS Need:
SPSS needs a numerical variable with a small sample size.
Method:
Firstly the data is entered in the data editor and the variable is labeled as Preference. Click on Analyze which will produce a drop down menu, choose Non-Parametric Tests from that and click on Runs, a dialogue box appears, in which all the input variables appear in the left-hand side of that box. To perform the Run test, transfer the test variable in the box labeled Test variable list. In our case Preference is a test variable, so it should be transferred to the box labeled Test variable list by clicking on the arrow between the two boxes. Now in our case the test variable i.e. Preference is a numerical variable, so in the Section labeled Cut Point, we tick the box Median, but if the test variable is categorical, it is appropriate to calculate its Mean by ticking its corresponding box.
Pictorial Representation
Analyze → Non-Parametric Tests → Runs → Drag test variable → Tick box (Median) → OK
OUTPUT
Runs Test
Test Value(a)            (median of Preference)
Cases < Test Value       11
Cases >= Test Value      11
Total Cases              22
Number of Runs           13
Asymp. Sig. (2-tailed)   .827

a. Median
The above table gives the test results for the Runs test. The first row, labeled Test Value, gives the median of the data. In this case, 11 of the 22 observations are less than the median (in other words, those values carry a negative sign), while the remaining values carry a positive sign. The row labeled Number of Runs gives a value of 13; this means that in the given data the sign changes 13 times. The last row gives the p-value for the Runs test, 0.827 > 0.05, so we accept our null hypothesis and conclude that the data is random.
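The p-value above can be reproduced from the counts alone, using the large-sample normal approximation with a 0.5 continuity correction (which is how the value 0.827 arises):

```python
# Sketch of the runs test computed from the counts reported above
# (11 cases below the median, 11 at/above, 13 runs), using the normal
# approximation with a continuity correction.
import math
from scipy.stats import norm

def runs_test_from_counts(n1: int, n2: int, runs: int) -> float:
    n = n1 + n2
    mu = 2 * n1 * n2 / n + 1                      # expected number of runs
    var = (2 * n1 * n2 * (2 * n1 * n2 - n)) / (n * n * (n - 1))
    z = (abs(runs - mu) - 0.5) / math.sqrt(var)   # continuity-corrected Z
    return 2 * norm.sf(z)                         # two-tailed p-value

p = runs_test_from_counts(11, 11, 13)
print(round(p, 3))  # 0.827, matching the output above
```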
One-Sample Kolmogorov-Smirnov (K-S) Test

The one-sample K-S test checks whether the data follow a specified theoretical distribution (here, the Poisson distribution).

Data Source:
C:\SPSSEVAL\Carpet
Hypothesis:
H0: Fit is good (the data follow the fitted distribution)
HA: Fit is not good (the data do not follow the fitted distribution)
Variables:
Here we are interested to analyze a numerical variable i.e. Price.
SPSS Need:
SPSS needs a numerical variable with a small sample size.
Method:
Firstly the data is entered in the data editor and the variable is labeled as Price. Click on Analyze which will produce a drop down menu, choose Non-Parametric Tests from that and click on 1-sample K-S, a dialogue box appears, in which all the input variables appear in the left-hand side of that box. To perform the K-S test, transfer the test variable in the box labeled Test variable list. In our case Price is a test variable, so it should be transferred to the box labeled Test variable list by clicking on the arrow between the two boxes. Now tick the box in the section labeled Test Distribution at the bottom of the dialogue box. In our case the fitted distribution is Poisson, so we tick the corresponding box labeled Poisson. After making appropriate selections, click on OK to run the analysis.
Pictorial Representation
Analyze → Non-Parametric Tests → 1-Sample K-S → Drag test variable → Tick box (Poisson) → OK
OUTPUT
One-Sample Kolmogorov-Smirnov Test

                                        Price
N                                       22
Poisson Parameter(a,b)    Mean          2.0000
Most Extreme Differences  Absolute      .143
                          Positive      .143
                          Negative      -.135
Kolmogorov-Smirnov Z                    .670
Asymp. Sig. (2-tailed)                  .760

a. Test distribution is Poisson. b. Calculated from data.
The above table gives the test results for the one-sample K-S test. We have taken 22 observations for the analysis, and the mean of the Poisson distribution calculated from the data is 2. The row labeled Absolute gives the largest absolute difference between the empirical (observed) distribution function and the fitted Poisson distribution function, and it is 0.143.

The row labeled Positive gives the largest positive deviation, i.e. the largest amount by which the empirical distribution function exceeds the fitted one, and it is 0.143.

The row labeled Negative gives the largest negative deviation, i.e. the largest amount by which the empirical distribution function falls below the fitted one, and it is -0.135.
The Kolmogorov-Smirnov Z value is 0.67, calculated as Z = √N × D, where D is the Absolute difference: √22 × 0.143 ≈ 0.67.
The last row gives the p-value for the analysis. In our case the p-value is 0.76, which is greater than 0.05, so we accept our null hypothesis and conclude that the fit is good and the data follow the Poisson distribution.
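The Z statistic and its asymptotic p-value can be reproduced from the reported N and D values:

```python
# Sketch reproducing the K-S output above: Z = sqrt(N) * D, where D is the
# largest absolute CDF difference; the asymptotic p-value comes from the
# Kolmogorov limiting distribution (scipy's kstwobign).
import math
from scipy.stats import kstwobign

n, d_abs = 22, 0.143
z = math.sqrt(n) * d_abs
p = kstwobign.sf(z)
print(round(z, 2), round(p, 2))  # 0.67 0.76, matching the SPSS output
```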
TABLE OF CONTENTS
Non-Parametric Tests
One-Way Chi-Square Test
The Median Test
The Mann-Whitney U Test
Kruskal-Wallis One-Way Analysis of Variance
Matrix of Nonparametric Statistics
Nonparametric Tests
1. One-Way Chi-Square test and its assumptions
2. Observed and expected frequencies
3. Degrees of freedom in a One-Way Chi-Square test
4. Two-Way Chi-Square test
5. Expected frequencies and the rule of independent probabilities
6. Degrees of freedom in a Two-Way Chi-Square test
7. Chi-Square and the problem of small expected frequencies
8. Remedies for small expected frequencies
9. Yates's Correction for Continuity
10. Fisher's Exact Probability Test
11. Assumptions of the Two-Way Chi-Square test
12. The Median Test
13. Dealing with tied ranks in the Median Test
14. Degrees of freedom in the Median Test
15. Assumptions of the Median Test
16. The Mann-Whitney U Test
17. The normal distribution and the Mann-Whitney U Test with large samples
18. Assumptions of the Mann-Whitney U Test
19. Kruskal-Wallis One-Way ANOVA
20. Dealing with tied ranks in a Kruskal-Wallis One-Way ANOVA
21. Assumptions of the Kruskal-Wallis One-Way ANOVA
22. Matrix of nonparametric statistical tests
Nonparametric Statistic:- A group of statistical procedures that have two things in common:
Designed to be used with nominal and/or ordinal variables Make few or no assumptions about population parameters
Statistical Power:- All other things being equal, nonparametric techniques are less powerful tests of significance than their parametric counterparts. Assuming that the null hypothesis is false, nonparametric tests have less power to discover significant relationships than parametric tests.
1)The One-Way Chi-Square Test:- A test that compares the observed frequency of a variable in a single group with what would be expected by chance.

Example:- Felony cases are assigned at random to four district criminal courts. The annual observed frequency of drug cases is given below. Since the number of cases in each court is not the same, is the case assignment system non-random?

District Court   132nd   189th   205th   264th
Observed cases   225     264     211     196

Total number of cases = 896. If the assignment process is truly random, we would expect that (896 / 4) = 224 cases would be assigned to each court by chance. How much do the observed frequencies (fo) differ from the expected frequencies (fe = 224)?
Calculating a One-Way Chi-Square

District Court   132nd   189th   205th   264th
Observed         225     264     211     196
Expected         224     224     224     224
χ² = Σ [ (fo - fe)² / fe ]

χ² = (225 - 224)²/224 + (264 - 224)²/224 + (211 - 224)²/224 + (196 - 224)²/224
χ² = 0.0045 + 7.1429 + 0.7545 + 3.50
χ² = 11.40
Interpretation of Chi-Square:- Chi-square is a family of probability distributions that vary with degrees of freedom.
A one-way chi-square has df = (k - 1). For the court example, df = (4 - 1) = 3 (k = number of courts). The critical value of χ² for 3 df at α = 0.05 is 7.815. Since 11.40 > 7.815, the null hypothesis is rejected and it is concluded that the assignment system is not random.
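The hand calculation above can be reproduced with scipy, which assumes equal expected frequencies when none are supplied:

```python
# Sketch of the one-way chi-square with scipy; with no expected
# frequencies given, scipy.stats.chisquare assumes equal expected counts
# (896 / 4 = 224 per court), exactly as in the hand calculation.
from scipy.stats import chisquare, chi2

observed = [225, 264, 211, 196]
stat, pvalue = chisquare(observed)
critical = chi2.ppf(0.95, df=3)

print(round(stat, 2))      # 11.4
print(round(critical, 3))  # 7.815
print(stat > critical)     # True -> reject H0: assignment is not random
```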
Analysis through SPSS:- Frequency of 70 Felony Cases by Pretrial Status

In Jail   On Bail   ROR
32        21        17
If the null hypothesis assumes equal numbers of cases in each status, do the observed frequencies differ significantly from this assumption?
a. 0 cells (.0%) have expected frequencies less than 5. The minimum expected cell frequency is 23.3.
Interpretation:-
The distribution of cases by status is not significantly different from the assumption of equal numbers of cases in each pretrial status category; the differences between the observed and expected frequencies are due to sampling error.
2)The Median Test:- Used to determine whether two random samples come from populations with the same medians.

Assessment Center Rating by Two Teams: Officer Candidates Randomly Assigned to Assessment Teams

Team A                  Team B
Subject     Rating      Subject     Rating
Goldin      72          Olsen       97
Jesani      67          Smither     76
Pritchard   87          Trantham    83
Birdwell    46          Gordon      69
Chavez      58          Graham      56
O'Neal      63          Andel       68
Johnson     84          Hutton      92
Tate        53          Paul        88
Bird        62          McGuire     74
Zuni        77          Costo       73
Compton     82          Raines      65
Lewis       89          Battan      54
                        Litzmann    43
Step 1:- Determine the median rating for the two assessment groups combined
Midpoint = (N + 1) / 2 = (25 + 1) / 2 = 13

Ranking of the combined ratings of the two groups (highest to lowest):
97 92 89 88 87 84 83 82 77 76 74 73 72 69 68 67 65 63 62 58 56 54 53 46 43

The 13th rating is 72, so the combined median is 72.
Step 2:- Determine the number of officers in either group whose ratings were equal to or above the median, and the number below the median.

            Team A   Team B
Above Mdn   6        7
Below Mdn   6        6
Step 3:- Run a two-way chi-square to determine whether there is an association between assessment team and the ratings. (Expected frequencies are in parentheses.)

Position    Team A     Team B     Total
Above Mdn   6 (6.24)   7 (6.76)   13
Below Mdn   6 (5.76)   6 (6.24)   12
Total       12         13         25
Interpretation:- The critical value of χ² for 1 df at α = 0.05 is 3.841. Since 0.0367 < 3.841, the null hypothesis is accepted. It is concluded that there is no significant difference in the median ratings given by the two assessment teams.
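The final chi-square step can be sketched from the above/below-median counts (tiny rounding differences from the hand calculation are expected):

```python
# Sketch of the median test's final step: a 2x2 chi-square computed from
# the above/below-median counts for the two assessment teams.
observed = {('above', 'A'): 6, ('above', 'B'): 7,
            ('below', 'A'): 6, ('below', 'B'): 6}

row_totals = {'above': 13, 'below': 12}
col_totals = {'A': 12, 'B': 13}
n = 25

chi_sq = 0.0
for (row, col), fo in observed.items():
    fe = row_totals[row] * col_totals[col] / n   # expected frequency
    chi_sq += (fo - fe) ** 2 / fe

print(round(chi_sq, 2))  # 0.04
print(chi_sq < 3.841)    # True -> no significant team difference
```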
Analysis through SPSS:- Is gender related to the seriousness of the offenses charged by the prosecutor? Crime seriousness was graded on a 7-point Likert scale where 1 = very minor crime and 7 = very serious crime.
Interpretation:Male and female offenders do not differ significantly in the median seriousness of the crime with which they have been charged.
3)The Mann-Whitney U Test:Used to compare the ranks of two independent groups, comparable to the purpose of the t test. It is more powerful than the median test since the latter only considers the number of cases above & below the median, not the rank order of the cases. U = N1N2 + [N1(N1 + 1) / 2] - R1 U = N1N2 + [ N2(N2 + 1) / 2 ] - R2
and
R1 = sum of the ranks for the smaller group R2 = sum of the ranks for the larger group N1 = size of the smaller group N2 = size of the larger group
Calculating a Mann-Whitney U:- Assessment Center Rating by Two Teams: Officers Randomly Assigned to Teams

Team A               Team B
Score   Rank (R1)    Score   Rank (R2)
72      13           97      25
67      10           76      16
87      21           83      19
46      2            69      12
58      6            56      5
63      8            68      11
84      20           92      24
53      3            88      22
62      7            74      15
77      17           73      14
82      18           65      9
89      23           54      4
                     43      1
Step 1:- Rank the ratings from lowest to highest regardless of assessment team.

Step 2:- Sum the ranks in each group:

Σ(R1) = 148
Σ(R2) = 177
Interpretation:- The critical value of U for N1 = 12 and N2 = 13, two-tailed α = 0.05, is 41. From the formulas above, U1 = (12)(13) + 78 - 148 = 86 and U2 = (12)(13) + 91 - 177 = 70. Since the smaller obtained value of U (U = 70) is larger than the table value, the null hypothesis is accepted: there is no difference in the ratings given by the two assessment teams.
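The rank sums and the smaller U can be cross-checked with scipy, using the two teams' ratings from the table above:

```python
# Sketch cross-checking the Mann-Whitney hand calculation with scipy.
# The two U values are complements: U1 + U2 = N1 * N2 = 156, so the
# smaller U is min(U, 156 - U).
from scipy.stats import mannwhitneyu

team_a = [72, 67, 87, 46, 58, 63, 84, 53, 62, 77, 82, 89]
team_b = [97, 76, 83, 69, 56, 68, 92, 88, 74, 73, 65, 54, 43]

res = mannwhitneyu(team_a, team_b, alternative='two-sided')
u_smaller = min(res.statistic, len(team_a) * len(team_b) - res.statistic)
print(u_smaller)  # 70.0
```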
4)Kruskal-Wallis
Used to compare three or more independent samples with respect to an ordinal variable.

H = [ 12 / (N(N + 1)) ] × Σ [ (ΣR)² / n ] - 3(N + 1)

where:
N = the total number of cases
n = the number of cases in a given group
(ΣR)² = the squared sum of the ranks for a given group of subjects
An Example:- A state court administrator asked the 24 court coordinators in the state's three largest counties to rate their relative need for training in case-flow management on a Likert scale (1 to 7), where 1 = no training need and 7 = critical training need.

Training Need of Court Coordinators

County A: 3 1 3 1 5 4 4 2
County B: 7 6 5 7 3 1 6 4 4 5
County C: 4 2 5 1 6 7
Step 1:- Rank order the total group's Likert scores from lowest to highest. If tied scores are encountered, sum the tied positions and divide by the number of tied scores; assign this rank to each of the tied scores.

Scores and Ranks Across the Three Counties

Ratings: 1   1   1   1   2   2   3  3  3  4   4   4   4   4   5     5     5     5     6   6   6   7   7   7
Ranks:   2.5 2.5 2.5 2.5 5.5 5.5 8  8  8  12  12  12  12  12  16.5  16.5  16.5  16.5  20  20  20  23  23  23
Calculating the ranks of tied scores. Example:- Three court administrators rated their need for training as a 3. These three scores occupy the rank positions 7, 8, and 9, so each receives the rank (7 + 8 + 9) / 3 = 8.
Step 2 Sum the ranks for each group and square the sums
County A Rating 3 1 3 1 5 Rank 8 2.5 8 2.5 16.5 Rating 7 6 5 7 3 County B Rank 23 20 16.5 23 8 Rating 4 2 5 1 6 County C Rank 12 5.5 16.5 2.5 20
Page 98
4 4 2
12 12 5.5
1 6 4 4 5
23
R ( R)2
67.0 4489
79.5 6320.25
Interpretation:- The critical chi-square table value of H for α = 0.05 and df = 2 is 5.991. Since 4.42 < 5.991, the null hypothesis is accepted: there is no difference in the training needs of the court coordinators in the three counties.
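The hand calculation can be sketched with scipy's rank function and the H formula without the tie correction, as in the text (the exact value is 4.4145, which the text rounds to 4.42; note that scipy.stats.kruskal applies a tie correction and would report a slightly larger H of about 4.52):

```python
# Sketch of the Kruskal-Wallis hand calculation: average ranks for ties
# via scipy's rankdata, then H = 12/(N(N+1)) * sum((sum R)^2 / n) - 3(N+1)
# without the tie correction, matching the worked example.
from scipy.stats import rankdata

county_a = [3, 1, 3, 1, 5, 4, 4, 2]
county_b = [7, 6, 5, 7, 3, 1, 6, 4, 4, 5]
county_c = [4, 2, 5, 1, 6, 7]

scores = county_a + county_b + county_c
ranks = rankdata(scores)               # average ranks for tied scores
n = len(scores)

sizes = [len(county_a), len(county_b), len(county_c)]
h, start = 0.0, 0
for size in sizes:
    r_sum = ranks[start:start + size].sum()
    h += r_sum ** 2 / size
    start += size

h = 12 / (n * (n + 1)) * h - 3 * (n + 1)
print(round(h, 2))   # 4.41
print(h < 5.991)     # True -> accept H0
```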
SPSS Results
White = 1, African American = 2, Hispanic = 3
NPar Tests Kruskal-Wallis Test
Ranks

SER_INDX   RACE    N    Mean Rank
           1.00    25   31.86
           2.00    22   41.48
           3.00    23   33.74
           Total   70
Interpretation
No significant relationship found between race and crime seriousness.
Matrix of Nonparametric Statistics

Ordinal:
Median Test
Mann-Whitney U Test
Robust Rank-Order Test
Kolmogorov-Smirnov Two-Sample Test
Siegel-Tukey Test for Scale Differences