This tutorial continues and extends the work of tutorial 1.
On completion of this tutorial you should be able to do the following.
Explain and find the standard deviation and variance for grouped and ungrouped data.
D.J.Dunn www.freestudy.co.uk
1.
Data such as people's heights varies about a mean value. If the data is random then
there would be, for example, equal numbers of short and tall people. We covered the mean value in
the last tutorial. The STANDARD DEVIATION is a measure that tells us whether the data is
concentrated close to the mean or spread out over a wide range. Plots of data like this give us the
characteristic bell shaped curve shown.
The STANDARD DEVIATION uses the symbol σ (sigma). If σ is small, the data is concentrated close to
the mean. If σ is large, the data is spread out. If we have the same number of samples, then small
values of σ produce tall graphs and large values of σ produce short graphs as shown.
acc. f = $\int f \, dx$
2.
Ungrouped data is presented in a table listing the value of each sample. If the number of samples is
large, this becomes a large table but it is probably best to use this method with small numbers of
samples.
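DEFINITION of σ

$$S = \sigma^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$$

S is the variance, σ is the standard deviation, x̄ is the mean and n = number of samples.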
You might find it better to arrange your tables in columns rather than rows. Let's look at an
example.
The following is a table of resistance measurements in Ohms for ten samples. Calculate the
mean and the standard deviation.
Sample            Resistance (Ohms)   Differences squared
1                 119                 3.24
2                 120                 0.64
3                 120                 0.64
4                 121                 0.04
5                 122                 1.44
6                 119                 3.24
7                 119                 3.24
8                 122                 1.44
9                 123                 4.84
10                123                 4.84
Totals (n = 10)   1208                23.6
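Mean x̄ = 1208/10 = 120.8 Ohms

$$S = \sigma^2 = \frac{\sum (x - \bar{x})^2}{n - 1} = \frac{23.6}{9} = 2.622$$

Finally, the square root of the variance provides the standard deviation:

$$\sigma = \sqrt{2.622} = 1.619 \text{ Ohms}$$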
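The arithmetic of this worked example can be checked with a short Python sketch. The variable names are illustrative only, and the cross-check assumes the standard library's `statistics.stdev`, which also divides by n - 1.

```python
import math
import statistics

# Resistance readings (Ohms) from the worked example
resistance = [119, 120, 120, 121, 122, 119, 119, 122, 123, 123]

n = len(resistance)
mean = sum(resistance) / n                                # 1208 / 10
sum_sq_diff = sum((x - mean) ** 2 for x in resistance)    # sum of differences squared
variance = sum_sq_diff / (n - 1)                          # n - 1 on the bottom line
sigma = math.sqrt(variance)

print(f"mean = {mean:.1f} Ohms")
print(f"variance = {variance:.3f}")
print(f"standard deviation = {sigma:.3f} Ohms")

# Cross-check against the standard library (stdev also uses n - 1)
assert math.isclose(sigma, statistics.stdev(resistance))
```

Using `statistics.pstdev` instead would divide by n and give a slightly smaller value.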
1. The hardness of ten steel samples was measured and the results were as follows.
Sample     1    2    3    4    5    6     7    8    9    10
Hardness   90   92   95   91   98   102   97   92   95   99
Sample   1      2      3      4      5      6      7      8      9      10
         19.8   19.9   19.9   20.1   20.1   19.9   20.2   19.7   19.7   19.9
Calculate the mean and the standard deviation. Answer 19.92 and 0.168
We need to discuss what x means. If you were throwing a die over and over again you would get
a score of exactly 1, 2, 3, 4, 5 or 6. Hence x can only take these exact values, and if you throw the
die repeatedly you can count how many times a particular number comes up.
In the case of continuous variables such as height, weight, size and so on, you can get values of x
with as many decimal places as required to express the accuracy of the measurement. In order to do
anything meaningful, we have to count the number of samples f that fall within a specified range
of each x value used in the plot. Grouped data is presented in tables showing the bands and the
frequency and is more likely to be used with large numbers of samples.
For example suppose the strength of spot welds is measured and the numbers falling within a band
of 10N are plotted and we get a graph as shown. (This is a fictional table) x is the variable
representing the strength in Newtons at the middle of each band and f the number in each band.
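Counting the samples that fall in each band can be sketched in Python as follows. The readings are hypothetical values made up for illustration, and the rule that a reading exactly on a boundary goes to the upper band is an assumption of this sketch.

```python
from collections import Counter

# Hypothetical strength readings in Newtons (illustrative values only)
readings = [203.4, 211.9, 198.2, 225.0, 207.7, 214.3, 191.5, 209.8]
band_width = 10  # width of each band in Newtons

def band_mid(value, width):
    """Return the mid value x of the band that contains value."""
    lower = (value // width) * width   # lower edge of the band
    return lower + width / 2

# f = number of readings falling in the band centred on each x
freq = Counter(band_mid(r, band_width) for r in readings)

for x in sorted(freq):
    print(f"x = {x:5.1f} N   f = {freq[x]}")
```

Each reading is counted once, so the frequencies always add up to the number of samples.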
If the data is truly random and no factors exist to make the results biased to one extreme or the
other, the plots usually compare well with the normal distribution curve.
Consider a bell shaped distribution curve. The mean occurs at or near the middle. The deviation
from the mean at any point is d. Next consider the graph of d plotted against f and further the graph
of d² plotted against f. On this last graph we find the mean d² as follows.

The mean height of the graph is the variance

$$S = \frac{d_1^2 + d_2^2 + d_3^2 + \dots + d_n^2}{n}$$

In general

$$S = \frac{\sum d^2}{n}$$

For reasons not explained here, n - 1 is often used instead of n on the bottom line. Substitute d = x - x̄:

$$S = \sigma^2 = \frac{\sum f (x - \bar{x})^2}{n - 1}$$

σ is the standard deviation.
$$\sigma = \sqrt{\frac{\sum f x^2}{\sum f} - \left(\frac{\sum f x}{\sum f}\right)^2}$$
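The step from the definition in terms of deviations to this form is not shown in the text. A sketch of it, assuming n = Σf on the bottom line and x̄ = Σfx/Σf, is:

$$\sum f (x - \bar{x})^2 = \sum f x^2 - 2\bar{x} \sum f x + \bar{x}^2 \sum f = \sum f x^2 - \frac{\left(\sum f x\right)^2}{\sum f}$$

$$\sigma^2 = \frac{\sum f (x - \bar{x})^2}{\sum f} = \frac{\sum f x^2}{\sum f} - \left(\frac{\sum f x}{\sum f}\right)^2$$

This form is convenient for tables because it needs only the column totals Σf, Σfx and Σfx².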
The following is a grouped set of data for visits made to the doctor by a sample of children.
Visits to Doctor   No. of Children   Total Visits   Cumulative
x                  f                 fx
0                  2                 0              2
1                  8                 8              10
2                  27                54             64
3                  45                135            199
4                  38                152            351
5                  15                75             426
6                  4                 24             450
7                  1                 7              457
Totals             Σf = n = 140      Σfx = 455
Mean number of visits = 455/140 = 3.25.
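The standard deviation is found from

$$\sigma = \sqrt{\frac{\sum f x^2}{\sum f} - \left(\frac{\sum f x}{\sum f}\right)^2}$$

x        x²    fx²
0        0     0
1        1     8
2        4     108
3        9     405
4        16    608
5        25    375
6        36    144
7        49    49
Totals         Σfx² = 1697

$$\sigma = \sqrt{\frac{1697}{140} - \left(\frac{455}{140}\right)^2} = \sqrt{1.56} = 1.25$$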
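If n - 1 is used on the bottom line instead of n, expanding Σf(x - x̄)² gives

$$\sigma = \sqrt{\frac{\sum f x^2 - \frac{\left(\sum f x\right)^2}{\sum f}}{\sum f - 1}} = \sqrt{\frac{1697 - \frac{455^2}{140}}{139}} = \sqrt{1.57} = 1.25$$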
This does not make much difference so long as the total number of samples is large.
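The grouped calculation for the doctor visits can be sketched in Python. The function name `grouped_stats` is hypothetical, and the n - 1 variant is written in the form obtained by expanding Σf(x - x̄)²/(n - 1), which is an assumption about how that formula is meant to be applied.

```python
import math

def grouped_stats(x, f):
    """Return (mean, sd with n, sd with n - 1) for values x with frequencies f."""
    n = sum(f)
    sum_fx = sum(fi * xi for xi, fi in zip(x, f))
    sum_fx2 = sum(fi * xi ** 2 for xi, fi in zip(x, f))
    mean = sum_fx / n
    var_n = sum_fx2 / n - mean ** 2                  # n on the bottom line
    var_n1 = (sum_fx2 - sum_fx ** 2 / n) / (n - 1)   # n - 1 on the bottom line
    return mean, math.sqrt(var_n), math.sqrt(var_n1)

visits = [0, 1, 2, 3, 4, 5, 6, 7]            # x
children = [2, 8, 27, 45, 38, 15, 4, 1]      # f

mean, sd_n, sd_n1 = grouped_stats(visits, children)
print(f"mean = {mean:.2f}, sd (n) = {sd_n:.2f}, sd (n - 1) = {sd_n1:.2f}")
```

With 140 samples the two standard deviations agree to two decimal places.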
WORKED EXAMPLE No. 3
The hardness of 143 samples of steel is measured and grouped into bands as shown. Calculate
the mean and standard deviation. The figures of 17.5 and 21.5 result from one sample being
exactly 91 units and so half is allocated to each band.
Range     x (mid)   f      fx      acc f   x²      f x²
89-91     90        17.5   1575    17.5    8100    141750
91-93     92        21.5   1978    39      8464    181976
93-95     94        32     3008    71      8836    282752
95-97     96        38     3648    109     9216    350208
97-99     98        17     1666    126     9604    163268
99-101    100       17     1700    143     10000   170000
Totals              143    13575                   1289954
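Mean x̄ = 13575/143 = 94.93

$$\sigma = \sqrt{\frac{\sum f x^2}{\sum f} - \left(\frac{\sum f x}{\sum f}\right)^2} = \sqrt{\frac{1289954}{143} - \left(\frac{13575}{143}\right)^2} = \sqrt{8.939} = 2.99$$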
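It is of interest to note that with this many samples, the other formula (n - 1 on the bottom
line) gives almost the same answer.

$$\sigma = \sqrt{\frac{\sum f x^2 - \frac{\left(\sum f x\right)^2}{\sum f}}{\sum f - 1}} = \sqrt{\frac{1289954 - \frac{13575^2}{143}}{142}} = \sqrt{9.002} = 3.00$$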
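The columns of the table in this worked example can be rebuilt from the band limits and frequencies alone. The following is a minimal Python sketch with illustrative names. It takes x as the middle of each band and uses n on the bottom line.

```python
import math
from itertools import accumulate

# Band limits and frequencies from the hardness example
bands = [(89, 91), (91, 93), (93, 95), (95, 97), (97, 99), (99, 101)]
freq = [17.5, 21.5, 32, 38, 17, 17]

mid = [(lo + hi) / 2 for lo, hi in bands]          # x, the middle of each band
fx = [f * x for f, x in zip(freq, mid)]            # f x column
fx2 = [f * x ** 2 for f, x in zip(freq, mid)]      # f x^2 column
acc_f = list(accumulate(freq))                     # accumulative frequency

n = sum(freq)
mean = sum(fx) / n
sigma = math.sqrt(sum(fx2) / n - mean ** 2)

print(f"n = {n:g}, sum fx = {sum(fx):g}, sum fx2 = {sum(fx2):g}")
print(f"mean = {mean:.2f}, standard deviation = {sigma:.2f}")
```

The last entry of `acc_f` should always equal the total number of samples, which is a quick check that no band has been missed.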
1. The accuracy of 100 instruments was measured as a percentage and the results were grouped
into bands of 1% as shown. Calculate the mean and the standard deviation.
Range       Mid   freq
61.5-62.5   62    1
62.5-63.5   63    2
63.5-64.5   64    3
64.5-65.5   65    4
65.5-66.5   66    8
66.5-67.5   67    12
67.5-68.5   68    13
68.5-69.5   69    18
69.5-70.5   70    14
70.5-71.5   71    10
71.5-72.5   72    5
72.5-73.5   73    4
73.5-74.5   74    3
74.5-75.5   75    2
75.5-76.5   76    1
Answers 68.88 and 2.74%
2. The breaking strengths of 150 spot welds were measured in Newtons and grouped into bands of
20 N as shown.
Range     f
160-180   2
180-200   6
200-220   10
220-240   28
240-260   50
260-280   31
280-300   15
300-320   8
Calculate the mean and the standard deviation. (Answers 251.47 N and 29.04 N)
4.
Many examples of data distribution produce a bell shaped curve when plotted that is symmetrical
about the mean value. This is a natural event since we expect most things to have a lot of values
near the mean and very few far away from the mean. Mathematicians have produced various
models of the bell shaped curve and the one most widely used is the normal distribution curve given
by the equation.
$$y = \frac{1}{\sigma \sqrt{2\pi}} \, e^{-\frac{(x - \bar{x})^2}{2\sigma^2}}$$

x̄ is the mean value of x and σ is the standard deviation. These two parameters define the shape of
the curve. The plots show that the smaller the value of σ the taller the graph becomes. x̄ is at the
centre and corresponds to the median.
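Putting z = (x - x̄)/σ gives the standardised normal curve, which has a mean of zero and a
standard deviation of 1:

$$y = \frac{1}{\sqrt{2\pi}} \, e^{-\frac{z^2}{2}}$$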
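The two curves can be evaluated directly. This is a minimal Python sketch in which the function names are illustrative and the values of σ are chosen only to show how the peak height changes.

```python
import math

def normal_pdf(x, mean, sigma):
    """Height y of the normal distribution curve at x."""
    return math.exp(-((x - mean) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def standard_pdf(z):
    """Height y of the standardised normal curve at z."""
    return math.exp(-z ** 2 / 2) / math.sqrt(2 * math.pi)

# Peak height (at x equal to the mean) for a narrow and a wide distribution
peak_narrow = normal_pdf(0.0, 0.0, 0.5)
peak_wide = normal_pdf(0.0, 0.0, 2.0)
print(f"peak with sigma 0.5 = {peak_narrow:.4f}")
print(f"peak with sigma 2.0 = {peak_wide:.4f}")

# The general curve is the standardised curve scaled by 1/sigma
x, mean, sigma = 3.0, 1.0, 2.0
z = (x - mean) / sigma
print(math.isclose(normal_pdf(x, mean, sigma), standard_pdf(z) / sigma))
```

The peak height is 1/(σ√(2π)), so halving σ doubles the height while the area under the curve stays at 1.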
The area under the standardised normal curve between z = -∞ and z can be put into the
table form below. This is mostly used for probability problems and so the data in the table below is
called the probability content. The tabled values give the blue area of the diagram for any value of
z. These areas have important use for solving statistical and probability problems. The total area is
1.0 so the yellow area = 1 - blue area. Because the graph is symmetrical, the areas between any two
values of z are easily found from the table.
z     0.00    0.01    0.02    0.03    0.04    0.05    0.06    0.07    0.08    0.09
0.0   0.5000  0.5040  0.5080  0.5120  0.5160  0.5199  0.5239  0.5279  0.5319  0.5359
0.1   0.5398  0.5438  0.5478  0.5517  0.5557  0.5596  0.5636  0.5675  0.5714  0.5753
0.2   0.5793  0.5832  0.5871  0.5910  0.5948  0.5987  0.6026  0.6064  0.6103  0.6141
0.3   0.6179  0.6217  0.6255  0.6293  0.6331  0.6368  0.6406  0.6443  0.6480  0.6517
0.4   0.6554  0.6591  0.6628  0.6664  0.6700  0.6736  0.6772  0.6808  0.6844  0.6879
0.5   0.6915  0.6950  0.6985  0.7019  0.7054  0.7088  0.7123  0.7157  0.7190  0.7224
0.6   0.7257  0.7291  0.7324  0.7357  0.7389  0.7422  0.7454  0.7486  0.7517  0.7549
0.7   0.7580  0.7611  0.7642  0.7673  0.7704  0.7734  0.7764  0.7794  0.7823  0.7852
0.8   0.7881  0.7910  0.7939  0.7967  0.7995  0.8023  0.8051  0.8078  0.8106  0.8133
0.9   0.8159  0.8186  0.8212  0.8238  0.8264  0.8289  0.8315  0.8340  0.8365  0.8389
1.0   0.8413  0.8438  0.8461  0.8485  0.8508  0.8531  0.8554  0.8577  0.8599  0.8621
1.1   0.8643  0.8665  0.8686  0.8708  0.8729  0.8749  0.8770  0.8790  0.8810  0.8830
1.2   0.8849  0.8869  0.8888  0.8907  0.8925  0.8944  0.8962  0.8980  0.8997  0.9015
1.3   0.9032  0.9049  0.9066  0.9082  0.9099  0.9115  0.9131  0.9147  0.9162  0.9177
1.4   0.9192  0.9207  0.9222  0.9236  0.9251  0.9265  0.9279  0.9292  0.9306  0.9319
1.5   0.9332  0.9345  0.9357  0.9370  0.9382  0.9394  0.9406  0.9418  0.9429  0.9441
1.6   0.9452  0.9463  0.9474  0.9484  0.9495  0.9505  0.9515  0.9525  0.9535  0.9545
1.7   0.9554  0.9564  0.9573  0.9582  0.9591  0.9599  0.9608  0.9616  0.9625  0.9633
1.8   0.9641  0.9649  0.9656  0.9664  0.9671  0.9678  0.9686  0.9693  0.9699  0.9706
1.9   0.9713  0.9719  0.9726  0.9732  0.9738  0.9744  0.9750  0.9756  0.9761  0.9767
2.0   0.9772  0.9778  0.9783  0.9788  0.9793  0.9798  0.9803  0.9808  0.9812  0.9817
2.1   0.9821  0.9826  0.9830  0.9834  0.9838  0.9842  0.9846  0.9850  0.9854  0.9857
2.2   0.9861  0.9864  0.9868  0.9871  0.9875  0.9878  0.9881  0.9884  0.9887  0.9890
2.3   0.9893  0.9896  0.9898  0.9901  0.9904  0.9906  0.9909  0.9911  0.9913  0.9916
2.4   0.9918  0.9920  0.9922  0.9925  0.9927  0.9929  0.9931  0.9932  0.9934  0.9936
2.5   0.9938  0.9940  0.9941  0.9943  0.9945  0.9946  0.9948  0.9949  0.9951  0.9952
2.6   0.9953  0.9955  0.9956  0.9957  0.9959  0.9960  0.9961  0.9962  0.9963  0.9964
2.7   0.9965  0.9966  0.9967  0.9968  0.9969  0.9970  0.9971  0.9972  0.9973  0.9974
2.8   0.9974  0.9975  0.9976  0.9977  0.9977  0.9978  0.9979  0.9979  0.9980  0.9981
2.9   0.9981  0.9982  0.9982  0.9983  0.9984  0.9984  0.9985  0.9985  0.9986  0.9986
3.0   0.9987  0.9987  0.9987  0.9988  0.9988  0.9989  0.9989  0.9989  0.9990  0.9990
ACCUMULATIVE DATA
The area under the normal distribution curve between z = 0 and any other value is $A = \int_0^z y \, dz$.
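The tabulated areas do not have to be read from a table. As a sketch, the area from z = -∞ up to z can be written with the error function from Python's standard `math` module. The function names `phi` and `area_from_zero` are illustrative.

```python
import math

def phi(z):
    """Area under the standardised normal curve from minus infinity to z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def area_from_zero(z):
    """Area A under the standardised normal curve between 0 and z."""
    return phi(z) - 0.5

for z in (0.0, 1.0, 1.3):
    print(f"z = {z:.2f}  table value = {phi(z):.4f}  A = {area_from_zero(z):.4f}")
```

For positive z the table value is therefore 0.5 + A, the half of the curve below z = 0 plus the area between 0 and z.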
The strength of spot welds is measured and the numbers falling within a band of 10N are
shown. (This is the example on the previous page).
Strength (N)   130  150  170  190  210  230  250  270  290  310  330  350  370
Number         1    1    2    6    18   26   28   24   17   8    2    1    1
Determine the mean and standard deviation. Using the normalised distribution data, find the
probability of a spot weld having a strength less than 200 N.
SOLUTION
mid      freq.
x        f     fx      acc f   x²       f x²
130      1     130     1       16900    16900
150      1     150     2       22500    22500
170      2     340     4       28900    57800
190      6     1140    10      36100    216600
210      18    3780    28      44100    793800
230      26    5980    54      52900    1375400
250      28    7000    82      62500    1750000
270      24    6480    106     72900    1749600
290      17    4930    123     84100    1429700
310      8     2480    131     96100    768800
330      2     660     133     108900   217800
350      1     350     134     122500   122500
370      1     370     135     136900   136900
Totals   135   33790                    8658300
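$$\bar{x} = \frac{\sum f x}{\sum f} = \frac{33790}{135} = 250.3 \text{ N}$$

$$\sigma = \sqrt{\frac{\sum f x^2}{\sum f} - \left(\frac{\sum f x}{\sum f}\right)^2} = \sqrt{\frac{8658300}{135} - \left(\frac{33790}{135}\right)^2} = \sqrt{1487} = 38.56 \text{ N}$$

$$z = \frac{200 - 250.3}{38.56} = -1.304$$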
Because the curve is symmetrical, the area below z = -1.30 equals the area above z = +1.30.
If we look up the table value for 1.30 we get 0.9032. The answer we need is 1 - 0.9032 = 0.0968
so the probability of getting a spot weld weaker than 200 N is 9.68% or, put another way, 9.68%
of the samples are likely to be weaker.
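The table lookup can be compared with a direct calculation. The sketch below assumes Python 3.8 or later for `statistics.NormalDist` and uses the rounded mean and standard deviation from the solution. The exact figure differs slightly from the table figure because the table is read at z = 1.30 rather than 1.304.

```python
from statistics import NormalDist

mean, sigma = 250.3, 38.56      # values found in the solution (N)
limit = 200.0                   # strength limit (N)

z = (limit - mean) / sigma
p_table = 1 - 0.9032            # table value for z = 1.30, using symmetry
p_exact = NormalDist(mean, sigma).cdf(limit)

print(f"z = {z:.3f}")
print(f"probability from table  = {p_table:.4f}")
print(f"probability from cdf    = {p_exact:.4f}")
```

Either way roughly one weld in ten is expected to be weaker than 200 N.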
The resistors must have a range of 22 kΩ ± 5% so they must fall within a band of 20.9 kΩ to
23.1 kΩ.

The z values are

$$z = \frac{23.1 - 21.8}{0.8} = 1.625 \quad \text{and} \quad z = \frac{20.9 - 21.8}{0.8} = -1.125$$

From the table the probability of being less than 23.1 corresponds to z = 1.625 and is 0.9479
(half way between two values).
To find the probability of being less than 20.9 look up z = 1.125 and subtract from 1.0. Hence we
get 1 - 0.8697 = 0.1303
The quantity or probability of falling within the required limits is 0.9479 - 0.1303 = 0.8176
81.8% are acceptable, the rest are too high or too low.
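Reading the table half way between two entries is linear interpolation. The sketch below repeats that step in Python and compares it with a direct calculation. The mean of 21.8 and standard deviation of 0.8 are taken from the z calculation above, the helper `interpolate` is a hypothetical name, and `statistics.NormalDist` assumes Python 3.8 or later.

```python
from statistics import NormalDist

def interpolate(z, z_lo, p_lo, z_hi, p_hi):
    """Linear interpolation between two neighbouring table entries."""
    return p_lo + (p_hi - p_lo) * (z - z_lo) / (z_hi - z_lo)

mean, sigma = 21.8, 0.8         # from the z calculation (kilo-ohms)
lower, upper = 20.9, 23.1       # 22 kilo-ohms plus or minus 5%

z_upper = (upper - mean) / sigma    # about 1.625
z_lower = (lower - mean) / sigma    # about -1.125

# Table entries for 1.62/1.63 and 1.12/1.13
p_upper = interpolate(z_upper, 1.62, 0.9474, 1.63, 0.9484)
p_lower = 1 - interpolate(-z_lower, 1.12, 0.8686, 1.13, 0.8708)
p_table = p_upper - p_lower

dist = NormalDist(mean, sigma)
p_exact = dist.cdf(upper) - dist.cdf(lower)

print(f"from table = {p_table:.4f}, direct = {p_exact:.4f}")
```

The two results agree to about three decimal places, which is as much as a four-figure table can be expected to give.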
1. A machine tool producing ground pins must produce a diameter of 12 mm ± 0.05 mm.
Continuous monitoring of the size by gauging equipment shows that the mean is 12.01 with a
standard deviation of 0.03. Assuming a normal distribution, what is the % rejected?
(11.4%)
2. The lifespan in hours of a mass produced light optical device is normally distributed and has a
mean of 1400 with a standard deviation of 300.
What is the probability of one taken at random having a lifespan between 1400 and 1850 hours?
(43.3%)
What is the percentage that will last longer than 2100 hours?
(1%)
If the guarantee is for 1000 hours, what percentage will fail to meet the guarantee?
(9.1%)
What lifespan should be guaranteed if 95% must attain it? (907 hours)
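Answers to problems of this kind can be checked numerically. The sketch below assumes Python 3.8 or later for `statistics.NormalDist` and uses the mean of 1400 hours and standard deviation of 300 hours from the lifespan question. The last part works backwards from a probability to a lifespan with `inv_cdf`, which replaces a reverse lookup in the table.

```python
from statistics import NormalDist

life = NormalDist(mu=1400, sigma=300)   # lifespan in hours

p_between = life.cdf(1850) - life.cdf(1400)   # between 1400 and 1850 hours
p_longer = 1 - life.cdf(2100)                 # longer than 2100 hours
p_fail = life.cdf(1000)                       # fails a 1000 hour guarantee
guarantee = life.inv_cdf(0.05)                # lifespan exceeded by 95%

print(f"between 1400 and 1850 hours: {p_between:.1%}")
print(f"longer than 2100 hours:      {p_longer:.1%}")
print(f"fail a 1000 hour guarantee:  {p_fail:.1%}")
print(f"guarantee met by 95%:        {guarantee:.0f} hours")
```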