SGTBIMIT 01290201817
GGSIPU SGTBIMIT
TABLE OF CONTENTS
LIST OF FIGURES
4. Box Plot
5. Sales Diagram
DATA SET 1: FREQUENCY DISTRIBUTION
Description: Workers
This data set consists of workers employed in small & medium scale enterprises in a city of India.
Objective:
a) Frequency Distribution
b) Bar chart
c) Pie chart
d) Cross tabs
The data set of workers employed in small & medium scale enterprises in a city of India is shown in the table below.
S.No Gender Age group Religion Education S.No Gender Age group Religion Education
1 1 1 3 2 26 1 5 3 2
2 1 4 2 1 27 1 1 1 2
3 1 3 3 4 28 1 5 2 2
4 1 3 1 3 29 1 1 2 4
5 2 4 1 1 30 1 5 2 2
6 1 4 1 1 31 1 2 3 5
7 2 2 1 1 32 1 3 2 1
8 1 2 3 1 33 2 2 2 2
9 1 2 2 1 34 1 5 2 1
10 2 2 2 2 35 2 5 1 2
11 1 3 1 2 36 2 5 2 3
12 1 3 1 3 37 2 2 3 4
13 1 4 1 4 38 2 5 2 3
14 2 1 2 3 39 1 3 3 3
15 1 5 2 2 40 1 5 2 2
16 2 2 2 2 41 1 2 1 1
17 1 1 1 5 42 1 2 3 1
18 1 5 1 5 43 1 3 2 1
19 1 5 2 2 44 1 5 2 5
20 1 2 2 5 45 1 2 1 2
21 2 5 2 2 46 2 5 2 3
22 1 2 2 1 47 2 2 1 2
23 1 2 3 1 48 1 3 3 4
24 1 2 1 5 49 1 4 2 4
25 2 5 2 5 50 2 1 1 1
Table 1.1: Data set of workers working in small & medium scale enterprises in city of India.
The coding details of different variables in the dataset are shown below in table 1.2
STEPS
STEP 1: In VARIABLE VIEW, define the heads (columns) of DATA VIEW. In this case the first head is ‘Gender’; then choose its type, measure, values, labels, etc. as done below.
STEP 2: Similarly, define every other variable in Variable View as per the data given above.
STEP 3: Final output of variable view.
STEP 4: To display the value labels in place of the codes, click the converter icon on the toolbar (the button showing ‘A’, an arrow and ‘1’).
STEP 5: From the new dialogue box choose any one label and transfer it to
the variable box.
RESULT
FREQUENCIES
Statistics
EDUCATION
                          Frequency  Percent  Valid Percent  Cumulative Percent
Valid  BELOW TENTH GRADE  14         28.0     28.0           28.0
       HIGH SCHOOL        16         32.0     32.0           60.0
       INTERMEDIATE       7          14.0     14.0           74.0
       TECHNICAL DIPLOMA  6          12.0     12.0           86.0
       DEGREE LEVEL       7          14.0     14.0           100.0
       Total              50         100.0    100.0
Table 1.3: Shows the frequency distribution
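The frequency table can also be reproduced outside SPSS from the Education column of Table 1.1. The following is a minimal sketch in Python using only the standard library. The mapping of codes 1 to 5 onto the education labels is an assumption, because the coding table (Table 1.2) is not reproduced here; it was chosen because it yields the same counts as Table 1.3.

```python
from collections import Counter

# Education codes for workers 1-50, copied from Table 1.1.
EDUCATION = [
    2, 1, 4, 3, 1, 1, 1, 1, 1, 2, 2, 3, 4, 3, 2, 2, 5, 5, 2, 5, 2, 1, 1, 5, 5,  # S.No 1-25
    2, 2, 2, 4, 2, 5, 1, 2, 1, 2, 3, 4, 3, 3, 2, 1, 1, 1, 5, 2, 3, 2, 4, 4, 1,  # S.No 26-50
]

# Assumed code-to-label mapping (hypothetical; the coding table is not shown).
LABELS = {1: "BELOW TENTH GRADE", 2: "HIGH SCHOOL", 3: "INTERMEDIATE",
          4: "TECHNICAL DIPLOMA", 5: "DEGREE LEVEL"}


def frequency_table(codes):
    """Return rows of (code, frequency, percent, cumulative percent)."""
    counts = Counter(codes)
    total = len(codes)
    rows, running = [], 0
    for code in sorted(counts):
        running += counts[code]
        rows.append((code, counts[code],
                     round(100 * counts[code] / total, 1),
                     round(100 * running / total, 1)))
    return rows


table = frequency_table(EDUCATION)
for code, freq, pct, cum in table:
    print(f"{LABELS[code]:<18} {freq:>3} {pct:>6.1f} {cum:>6.1f}")
```

Because there are no missing values, Percent and Valid Percent coincide, so the sketch computes only one percentage column.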
STEP 7: From the gallery of new dialogue box select the type of graph
(here bar graph is selected).
STEP 8: Drag Education to the x-axis; Count will appear on the y-axis.
The bar graph will appear like this.
Fig 1.2: Pie chart of education in count and percentage.
STEP 11: Select the two variables.
RESULT
CONCLUSION
Hence, the frequency distribution shows that the largest group of workers was educated up to high school level: 32% (16 of the total of 50 workers).
DATA SET 2: MEAN, MODE & RANGE
Introduction: Ice Melt
In this data set we estimate when the ice on the river will melt next year by observing and understanding past data.
Objectives:
1. To determine mean and mode hour of the day in which ice melts using variable like
2. To determine the hour range for the ice melts and to determine in which month ice
1. Mean
2. Mode
3. Bar chart
4. Pie chart
5. Range
STEPS
STEP 2: To find the mean and mode hour of the day: Analyze < Descriptive Statistics < Frequencies.
STEP 3: Select Mean and Mode to find the central tendency of the hour of the day.
RESULT
Statistics
hour of the day
N Valid 90
N Missing 0
Mean 14.60
Mode 13
Table 2.1: Shows the mean & mode hour of the day at which the ice melts.
Descriptive Statistics
                    N   Minimum  Maximum  Mean   Std. Deviation
hour of the day     90  5        23       14.60  4.069
Valid N (listwise)  90
Table 2.2: Shows the descriptive statistics of hour of day of ice melt.
STEP 4: To determine the hour range using a bar graph: Graphs < Legacy Dialogs < Bar.
STEP 5: For a clustered bar graph, choose hour for the category axis and month for the clusters.
RESULT
Fig 2.1: Clustered bar diagram for hour of ice melt corresponding to months.
STEP 6: To determine in which month the ice melts most: Graphs < Legacy Dialogs < Pie.
Pie chart for the hour of the day at which the ice melts, in count rather than percentage.
Fig 2.2: Shows the pie chart for hours of ice melt in count.
Key Findings
DATA SET 3: OUTLIERS
Description: Sports
This data set consists of males and females of different age categories and the time (hours) they spend playing their favourite outdoor sport.
Outlier:
Objective:
a) To find the outlier (if any)
b) To understand the effect of the outlier on the measurements and replace it so that the distorted data can be rectified.
c) Correcting the data.
STEPS
STEP 1: In VARIABLE VIEW, define the heads (columns) of DATA VIEW. In this case the first head is ‘Gender’; then choose its type, measure, values, labels, etc. as done below. The values given are 1 for male and 2 for female.
RESULT as per data view.
STEP 4: For outliers,
Analyze < Descriptive Statistics < Explore
STEP 5: From the new dialogue box, drag any variable (say, hours spent for playing) into the Dependent List. Click Statistics.
STEP 6: From the new dialogue box, click OUTLIERS. Click Continue.
Result of outliers.
Table 3.1: Shows the summary of data
Descriptives
hours spent for playing
                                              Statistic  Std. Error
Mean                                          3.033      .4082
95% Confidence Interval for Mean Lower Bound  2.198
                                 Upper Bound  3.868
5% Trimmed Mean                               2.759
Median                                        2.750
Variance                                      4.999
Std. Deviation                                2.2358
Minimum                                       .5
Maximum                                       13.0
Range                                         12.5
Interquartile Range                           2.1
Skewness                                      3.146      .427
Kurtosis                                      13.662     .833
Table 3.2: Shows the descriptive statistics of the data.
Extreme Values
hours spent for playing
            Case Number  Value
Highest  1  22           13.0
         2  29           5.0
         3  4            4.5
         4  26           4.5
         5  27           4.5
Lowest   1  8            .5
         2  30           1.0
         3  11           1.0
         4  10           1.0
         5  15           1.5a
Table 3.3: Shows the extreme values.
Hours spent for playing (stem-and-leaf plot: frequency, stem, leaves)
1.00  0 . 5
6.00  1 . 000555
8.00  2 . 00000555
7.00  3 . 0000055
6.00  4 . 000555
1.00  5 . 0
1.00  Extremes (>=13.0)
BOX PLOT
Fig 3.1: Box plot
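The box plot flags a point as an outlier when it lies more than 1.5 times the interquartile range beyond the quartiles. The sketch below applies that rule in Python (standard library only) to the 30 values read back from the stem-and-leaf display. It assumes that each leaf digit stands for a tenth of an hour, and it simply drops the flagged case to show its effect on the mean; the value actually used to replace the outlier is not given in the text.

```python
from statistics import mean, quantiles

# Hours spent playing, read back from the stem-and-leaf display
# (30 cases; e.g. stem 1 with leaves 000555 gives 1.0, 1.0, 1.0, 1.5, 1.5, 1.5).
HOURS = ([0.5] + [1.0] * 3 + [1.5] * 3 + [2.0] * 5 + [2.5] * 3
         + [3.0] * 5 + [3.5] * 2 + [4.0] * 3 + [4.5] * 3 + [5.0] + [13.0])


def iqr_outliers(values, k=1.5):
    """Return (q1, q3, outliers) using the k * IQR fence rule."""
    q1, _, q3 = quantiles(values, n=4)  # default 'exclusive' method
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return q1, q3, [v for v in values if v < low or v > high]


q1, q3, outliers = iqr_outliers(HOURS)
cleaned = [v for v in HOURS if v not in outliers]

print(f"Q1 = {q1}, Q3 = {q3}, IQR = {q3 - q1}")
print("outliers:", outliers)
print(f"mean with outlier:    {mean(HOURS):.3f}")
print(f"mean without outlier: {mean(cleaned):.3f}")
```

The reconstructed values give the same mean (3.033) and median (2.750) as Table 3.2, which suggests the reading of the display is right. Only case 22 (13.0 hours) falls outside the fences.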
Key findings
DATA SET 4: NORMALITY TEST
Objective:
Normality test
Assumptions/ Hypothesis:
STEPS
STEP 1: Define the variable in Variable View and enter the data; the result is shown in Data View.
STEP 2: For the normality test, click Analyze < Descriptive Statistics < Explore.
STEP 3: Drag Monthly Sales to the Dependent List.
STEP 4: Explore < Plots. Select ‘Normality plots with tests’ < Continue < OK.
RESULT
OUTPUT MONTHLY_SALES.sav
Descriptives
monthly sales
                                              Statistic  Std. Error
Mean                                          61.34      4.519
95% Confidence Interval for Mean Lower Bound  52.26
                                 Upper Bound  70.42
5% Trimmed Mean                               59.99
Median                                        55.00
Variance                                      1021.290
Std. Deviation                                31.958
Minimum                                       8
Maximum                                       150
Range                                         142
Interquartile Range                           35
Skewness                                      .761       .337
Kurtosis                                      .240       .662
Table 4.2: Shows all the statistical values
Tests of Normality
               Kolmogorov-Smirnov(a)   Shapiro-Wilk
               Statistic  df  Sig.     Statistic  df  Sig.
Monthly Sales  .133       50  .027     .951       50  .039
a. Lilliefors Significance Correction
Table 4.3: Test of normality
KEY FINDINGS
Table 4.3 shows that the significance values are 0.027 (Kolmogorov-Smirnov) & 0.039 (Shapiro-Wilk) respectively, both of which are less than 0.05.
CONCLUSION
That means the monthly sales data are not normally distributed, so we cannot apply parametric tests (t-test, F-test, Z-test, ANOVA); a non-parametric test (e.g. chi-square) should be applied instead.
DATA SET 5: ONE SAMPLE t-TEST
Introduction: Healthcare
A healthcare provider claims that on average its customers lose 5 kg of weight in a month after joining its weight loss programme. To test the validity of the claim, an independent researcher collects data on the weight lost by 50 customers a month after joining the programme. The researcher has decided to apply a one-sample t-test to test the validity of the claim.
Hypothesis:
Null hypothesis (Ho): the mean of the population is 5.
Alternative hypothesis (Ha/H1): the mean of the population is ≠ 5, i.e. it is either < 5 or > 5.
Objective:
To check whether the claim of the healthcare provider is right, i.e. whether the null hypothesis is accepted or not. If the null hypothesis is rejected, find the correct mean of the population.
STEPS
STEP 1: Fill the data of the weight loss in the variable view.
STEP 3: Analyze < Compare Means < One-Sample T Test
RESULT
t- Test
One-Sample Statistics
One-Sample Test
Test Value = 5
                                           t       df  Sig. (2-tailed)  Mean Difference  95% CI Lower  95% CI Upper
Loss in Weight During Weight Loss Program  -6.212  49  .000             -.980            -1.30         -.66
95% CI = 95% Confidence Interval of the Difference.
Table 5.2: Shows the result of one sample t-test
Observation 1:
As the p-value (0.000) is not greater than the alpha value (0.05), we reject the null hypothesis, which means the healthcare provider's claim of 5 kg weight loss is wrong.
RESULT
t-TEST
One-Sample Statistics
                                           N   Mean  Std. Deviation  Std. Error Mean
Loss in Weight During Weight Loss Program  50  4.02  1.116           .158
Table 5.3: Shows the result for one sample statistics
One-Sample Test
Test Value = 4
                                           t     df  Sig. (2-tailed)  Mean Difference  95% CI Lower  95% CI Upper
Loss in Weight During Weight Loss Program  .127  49  .900             .020             -.30          .34
95% CI = 95% Confidence Interval of the Difference.
Table 5.4: Shows the (revised) result of one sample t-test
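Both one-sample results can be checked by hand from the summary statistics in Table 5.3, using t = (sample mean - test value) / (standard deviation / sqrt(N)). The sketch below is an illustration in Python, not SPSS's own computation. The critical value 2.0096 (two-tailed, 5%, 49 degrees of freedom) is taken from standard t tables and hard-coded as an assumption, and the function name is hypothetical.

```python
import math


def one_sample_t(sample_mean, sample_sd, n, test_value):
    """Return (t, standard error) for a one-sample t-test from summary statistics."""
    se = sample_sd / math.sqrt(n)
    return (sample_mean - test_value) / se, se


# Summary statistics from Table 5.3.
MEAN, SD, N = 4.02, 1.116, 50
T_CRIT_49 = 2.0096  # two-tailed 5% critical value of t with 49 df (from tables)

for test_value in (5, 4):
    t, se = one_sample_t(MEAN, SD, N, test_value)
    diff = MEAN - test_value
    lower, upper = diff - T_CRIT_49 * se, diff + T_CRIT_49 * se
    verdict = "reject Ho" if abs(t) > T_CRIT_49 else "fail to reject Ho"
    print(f"test value {test_value}: t = {t:.3f}, "
          f"95% CI of difference = ({lower:.2f}, {upper:.2f}) -> {verdict}")
```

Because the inputs are rounded, the t value for test value 5 comes out at about -6.21 rather than exactly the -6.212 reported by SPSS; the decision is unaffected.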
Observation 2
Here the p-value (0.900) is greater than the alpha value (0.05), so we fail to reject the null hypothesis that the population mean is 4. This means that if the healthcare provider had claimed 4 kg as the mean weight loss of the population, the claim would have been consistent with the data.
Key findings
a) Table 5.2 shows that the p-value is 0.000, which is not greater than the alpha value of 0.05. We reject the null hypothesis, which means the healthcare provider's claim of 5 kg weight loss is wrong.
b) Table 5.4 shows that the p-value is 0.900, which is greater than the alpha value. So we conclude that 4 kg is a plausible value for the mean of the population.
c) The null hypothesis is rejected at test value 5 because the p-value is not greater than the alpha value (0.05).
d) The null hypothesis is not rejected at test value 4, where the p-value (0.900) is greater than the alpha value (0.05).
CONCLUSION
Therefore, there is no significant difference between the sample mean and the population mean at test value 4.
The claim of the healthcare provider is not true. On average, a customer loses about 4 kg of weight instead of 5 kg in a month after joining its weight loss program.
DATA SET 6: PAIRED t-TEST
Introduction: Training program
The HR manager of a business firm wants to analyze the impact of a training program conducted for 30 employees. The purpose of conducting the training program was to improve the performance of employees. The performance scores of employees are noted before and after the training program. He wants to observe the performance of the same respondents in the pre sample and post sample, i.e. bivariate.
Assumption:
Null hypothesis (Ho): There is no difference between the pre-training and post-training performance of employees.
Alternative hypothesis (Ha): There is a difference between the pre-training and post-training performance of employees.
Objective:
To record the performance scores of the employees before & after training.
To improve the performance of employees.
To perform paired t-test.
STEPS
STEP 1: Define the variables (name, type, label, measure) and fill it with values.
STEP 2: For the paired-samples t-test: Analyze < Compare Means < Paired-Samples T Test.
STEP 3: Drag both Pre Training Score and Post Training Score to Paired Variables.
RESULT
Output of “Paired t-test.”
Paired Samples Test
Paired Differences (Mean to 95% CI Upper), followed by t, df and Sig. (2-tailed)
                                                  Mean     Std. Deviation  Std. Error Mean  95% CI Lower  95% CI Upper  t       df  Sig. (2-tailed)
Pair 1  Pre_training_Score - Post_training_Score  -17.367  9.565           1.746            -20.938       -13.795       -9.945  29  .000
95% CI = 95% Confidence Interval of the Difference.
Table 6.3: Shows paired sample test
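The paired t statistic in Table 6.3 is the mean of the pre-minus-post differences divided by its standard error, with df = n - 1. The following sketch recomputes it in Python from the summary figures in the table. The critical value 2.0452 (two-tailed, 5%, 29 degrees of freedom) is taken from standard t tables and is an assumption of the sketch, as is the function name.

```python
import math


def paired_t(mean_diff, sd_diff, n):
    """Return (t, standard error, df) for a paired-samples t-test
    from the mean and standard deviation of the pairwise differences."""
    se = sd_diff / math.sqrt(n)
    return mean_diff / se, se, n - 1


# Mean and standard deviation of the (pre - post) differences, from Table 6.3.
MEAN_DIFF, SD_DIFF, N = -17.367, 9.565, 30
T_CRIT_29 = 2.0452  # two-tailed 5% critical value of t with 29 df (from tables)

t, se, df = paired_t(MEAN_DIFF, SD_DIFF, N)
lower = MEAN_DIFF - T_CRIT_29 * se
upper = MEAN_DIFF + T_CRIT_29 * se

print(f"SE = {se:.3f}, t = {t:.3f}, df = {df}")
print(f"95% CI of the difference = ({lower:.3f}, {upper:.3f})")
```

The negative mean difference indicates that post-training scores are higher than pre-training scores, and the confidence interval excludes zero, in line with the significance value of .000.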
Observation
Since the p-value (0.000) is NOT GREATER than the alpha value (0.05), we will not accept the null hypothesis.
Key Findings
Table 6.3 shows that the significance value 0.000 is less than the alpha value 0.05, from which we conclude that we will reject the null hypothesis.
1. The null hypothesis is rejected because the p-value is less than the alpha value, which means that the alternative hypothesis is accepted. Therefore, there is a significant difference between the means of pre-sample and post-sample performance of employees.
So, the training program is effective in improving the performance of the employees: the mean post-training score is 17.367 points higher than the mean pre-training score.
DATA SET 7: INDEPENDENT SAMPLES t-TEST
Objective:
Assumptions
For Levene’s homogeneity test
Ho = There is no significant difference between the variances of the two independent samples (equality of variance).
Ha = There is a significant difference between the variances of the two independent samples.
STEPS
STEP 1: Define the variables (name, type, label, measure) and fill it with values.
STEP 3: Define the variable list. Drag Performance Score to Test Variable(s) & Gender to Grouping Variable. Click “Define Groups”, write “Male” & “Female” in Group 1 & Group 2 respectively < click Continue, then OK.
RESULT
Group Statistics
Gender N Mean Std. Deviation Std. Error Mean
Independent Samples Test
Levene's Test for Equality of Variances; t-test for Equality of Means
Table 7.2: Shows Levene's test and the independent samples t-test
The first two columns show the result of Levene's homogeneity test, whose p-value (0.956) is compared with the alpha value of 0.05; the latter columns give the t-test results.
If Levene's null hypothesis (Ho) is rejected, i.e. if the test fails, then go for the latter value (the row for equal variances not assumed).
In this case, p-value (0.937) > 0.05, so accept Ho.
STEP 5: Define the groups on the basis of age and set the cut point to 40.
Group Statistics
                   Age    N   Mean   Std. Deviation  Std. Error Mean
Performance Score  >= 40  22  68.86  19.075          4.067
                   < 40   28  55.07  16.777          3.171
Independent Samples Test
Performance_Score
Levene's Test for Equality of Variances: F, Sig. t-test for Equality of Means: t to 95% CI Upper.
                             F      Sig.  t      df      Sig. (2-tailed)  Mean Difference  Std. Error Difference  95% CI Lower  95% CI Upper
Equal variances assumed      1.408  .241  2.717  48      .009             13.792           5.077                  3.585         23.999
Equal variances not assumed               2.675  42.170  .011             13.792           5.157                  3.387         24.197
95% CI = 95% Confidence Interval of the Difference.
KEY FINDINGS
1. According to Table 7.2, the p-value of Levene's test on the basis of gender is 0.956, which is greater than 0.05. So the variances of performance for male & female employees are equal (σ1² = σ2²).
2. According to Table 7.4, the p-value of Levene's test on the basis of age is 0.241, which is greater than 0.05. So the variances of performance for employees below and above 40 years of age are equal (σ1² = σ2²).
3. According to Table 7.2, for the performance of employees on the basis of gender, the p-value is 0.843, which is greater than 0.05, which implies that Ho will be accepted.
4. For the performance of employees on the basis of age, the p-value is 0.009, which is less than 0.05, which implies that Ho will be rejected & Ha will be accepted.
So, there is no significant difference in the average performance of male & female employees.
So, there is a significant difference in the average performance of employees below and above 40 years of age.
DATA SET 8: ANOVA
Assumption:
Objective :
Apply one-way ANOVA to find out whether the mean sales of the three companies are equal.
STEPS
STEP 1: Fill the data in the variable view.
STEP 3: Analyze < Compare Means < One-Way ANOVA
STEP 4: Take sales to the Dependent List and company to the Factor list.
STEP 5: Post Hoc < Tukey < Continue
STEP 6: Click Options. From Options select Descriptive, Homogeneity of variance test and Means plot. Click Continue.
RESULT
Descriptive
Sales
Company  N   Mean   Std. Deviation  Std. Error  Lower Bound  Upper Bound  Minimum  Maximum
1        18  17.11  15.335          3.615       9.49         24.74        6        76
2        15  22.53  20.525          5.299       11.17        33.90        5        76
3        17  44.47  15.895          3.855       36.30        52.64        20       89
Total    50  28.04  20.767          2.937       22.14        33.94        5        89
Lower Bound and Upper Bound refer to the 95% Confidence Interval for Mean.
Table 8.1: Shows descriptive data.
ANOVA
Sales
                Sum of Squares  df  Mean Square  F       Sig.
Between Groups  7194.174        2   3597.087     12.130  .000
Within Groups   13937.746       47  296.548
Total           21131.920       49
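The F ratio in the ANOVA table is the between-groups mean square divided by the within-groups mean square. The sketch below rebuilds both sums of squares in Python from the group sizes, means and standard deviations in Table 8.1. It is an approximation under the assumption that only these rounded summaries are available, so the sums of squares agree with the SPSS output only to within rounding.

```python
# Group summaries (n, mean, standard deviation) from Table 8.1.
GROUPS = {1: (18, 17.11, 15.335), 2: (15, 22.53, 20.525), 3: (17, 44.47, 15.895)}


def one_way_anova(groups):
    """Return (SS between, SS within, df between, df within, F) from group summaries."""
    total_n = sum(n for n, _, _ in groups.values())
    grand_mean = sum(n * m for n, m, _ in groups.values()) / total_n
    ss_between = sum(n * (m - grand_mean) ** 2 for n, m, _ in groups.values())
    ss_within = sum((n - 1) * s**2 for n, _, s in groups.values())
    df_between, df_within = len(groups) - 1, total_n - len(groups)
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return ss_between, ss_within, df_between, df_within, f_ratio


ssb, ssw, dfb, dfw, f = one_way_anova(GROUPS)
print(f"Between groups: SS = {ssb:.1f}, df = {dfb}")
print(f"Within groups:  SS = {ssw:.1f}, df = {dfw}")
print(f"F = {f:.2f}")
```

A large F means the company means are spread far apart relative to the variation within each company, which is why the significance value is so small.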
Multiple Comparisons
Sales
Tukey HSD(a,b)
Company  N   Subset for alpha = 0.05
             1      2
1        18  17.11
2        15  22.53
3        17         44.47
Sig.         .639   1.000
Table 8.5: Shows Tukey comparison of the three companies
Means for groups in homogeneous subsets are displayed.
a. Uses Harmonic Mean Sample Size = 16.570.
b. The group sizes are unequal. The harmonic mean of the group sizes is used. Type I error levels are not guaranteed.
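The harmonic mean sample size quoted in footnote a can be checked directly from the three group sizes: it is k / (1/n1 + 1/n2 + ... + 1/nk). A minimal check in Python, using the standard library's `statistics.harmonic_mean`:

```python
from statistics import harmonic_mean

GROUP_SIZES = [18, 15, 17]  # companies 1, 2 and 3, from Table 8.1

n_h = harmonic_mean(GROUP_SIZES)  # 3 / (1/18 + 1/15 + 1/17)
print(f"harmonic mean sample size = {n_h:.3f}")
```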
Fig 8.1: Shows the diagrammatic representation of sales of companies
CONCLUSION
Table 8.2 (Homogeneity of Variances in ANOVA using Levene's test) shows a significance value of 0.134, which is more than the alpha value of 0.05; hence the null hypothesis for Levene's test is accepted, i.e. the variances of sales of all three companies are equivalent or similar.
Table 8.3: In the ANOVA table the significance value is 0.000, which means that the null hypothesis (Ho) is rejected and the mean sales of the three companies are not all equivalent.
Hence, we accept the alternative hypothesis (Ha). Now, to check which of the 4 cases is to be selected, we perform a post hoc test.
Table 8.4: Shows the significance value of each company with respect to the others. For example, company 1 with company 2 has a significance value of 0.643, so we accept Ho here, and the significance value of company 1 with company 3 is 0.000, so here we reject Ho.
Fig 8.1: Shows that the sales of company 1 are equivalent to those of company 2 and vice versa, but the sales of company 3 are different from both.
Lastly, we conclude that the mean sales of Company 1 and Company 2 are equivalent but the mean sales of Company 3 are significantly different.