
8: Statistical Distributions

The Uniform Distribution
The Normal Distribution
The Student Distribution
Sample Calculations
The Central Limit Theorem
Calculations with Samples
Histograms & the Normal Distribution

This Unit covers some of the topics of Chapters 8 and 9 of Quantitative Approaches in
Business Studies. The reader may wish to download the files PROBABILITY.XLS,
NORMALDISTA.XLS, NORMALDISTB.XLS and NORMALDISTC.XLS from the web site
containing this supplement. In this Unit we will use the Data Analysis tool to generate
random numbers and we will explore the behaviour of these numbers.

The Uniform Distribution


One of the fundamental tenets of probability theory states that experimental probability
approaches theoretical probability as the number of experiments becomes very large. Most
people will agree the probability of a randomly spun coin landing heads up is 0.5 (or 50%).
This is the theoretical probability. We are not surprised if the same coin spun 10 times
gives, say, 7 heads and 3 tails. If we spun it 100 times we would not be surprised by 47
heads and 53 tails. What about a result of 30 heads and 70 tails?
The theory about experimental and theoretical probabilities is difficult to prove.
Whenever we get a result that disagrees with the theory we can say (a) well that can
happen statistically or (b) the coin (dice, or whatever) is not random. However, the theory
has stood the test of time.
The worksheet on Sheet1 of PROBABILITY.XLS (see Figure 1) has 1000 random
numbers in column A. These were generated by using Tools|Data Analysis, selecting
Random Number Generation and completing the dialog box as shown in Figure 1. We
have asked for a uniform distribution of numbers in the range 0 to 6. By uniform we mean
that every number in the range has an equal probability of being generated. In B4 we use
the formula =INT(A4)+1 and copy this down to B1003 by double clicking on the cell's fill
handle. The numbers in column B range from 1 to 6, the face values of a die.
In D3:J14 we compute the experimental probability of each die face value for various
numbers of throws. The formula in E5 is =COUNTIF($B$4:$B8,E$4)/COUNT($B$4:$B8).
The numerator counts the number of times the value 1 (that is, the value in E4) occurs in
the range B4:B8, while the denominator counts all the values in those five throws. The result is
the experimental probability of finding the face value 1 in 5 throws.
The dollar signs in the formula enable us to copy it throughout E5:J14 and get the
correct formula in every other cell, since Excel adjusts the relative references. For example, in G6 the
formula is =COUNTIF($B$4:$B13,G$4) / COUNT($B$4:$B13). This gives the experimental
probability of finding the face value 3 in ten throws.
The plot in Figure 1 shows how the experimental probabilities for two face values vary
with the number of throws. The chart was made by selecting D5:D14, holding down Ctrl
and selecting G4:G14 and J4:J14. The data in D16:E17 was used to draw the horizontal
line using the Edit and Paste Special method introduced in Unit 4 under the topic Adding
a New Series.

We can see that the probabilities for the face values 3 and 6 (these were chosen
arbitrarily) do seem to tend to the theoretical value of 1/6 or 0.166667 as the number of
throws increases. We may use this experiment to indicate that Excel has indeed generated
a more-or-less uniform set of random numbers.
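
For readers who want to repeat the experiment outside Excel, a few lines of Python give the same behaviour. This is only an illustrative sketch, not part of the workbook, and it assumes the NumPy library is installed.

    import numpy as np

    rng = np.random.default_rng()
    throws = rng.integers(1, 7, size=1000)      # uniform integers 1..6, like column B

    for n in (5, 10, 50, 100, 1000):
        p = np.mean(throws[:n] == 3)            # experimental probability of face value 3
        print(f"n = {n:4d}   P(face = 3) = {p:.3f}")
    print("theoretical value:", 1 / 6)

As n grows, the printed probabilities wander towards 1/6 in just the way the chart in Figure 1 shows.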

[Worksheet extract, Sheet1 of PROBABILITY.XLS: column A holds the 1000 uniform random numbers and column B the corresponding die face values (=INT(A)+1). The table in D3:J14 holds the experimental probability of each face value after n throws:

n       1      2      3      4      5      6
5     0.000  0.400  0.200  0.000  0.400  0.000
10    0.200  0.200  0.200  0.100  0.300  0.000
20    0.150  0.200  0.200  0.150  0.150  0.150
30    0.167  0.233  0.167  0.167  0.133  0.133
40    0.175  0.200  0.175  0.175  0.175  0.100
50    0.160  0.200  0.160  0.140  0.220  0.120
100   0.160  0.150  0.180  0.150  0.200  0.160
200   0.175  0.145  0.140  0.170  0.190  0.180
500   0.162  0.144  0.174  0.180  0.176  0.164
1000  0.166  0.149  0.162  0.181  0.164  0.178

D16:E17 hold the points (1, 0.166667) and (1000, 0.166667) used to draw the horizontal line at the theoretical probability. The embedded chart plots the probabilities for face values 3 and 6 against n on a logarithmic axis.]

Figure 1

Figure 2

The Normal Distribution


The worksheet in NORMALDISTA.XLS contains three sheets that allow the reader to
experiment with some properties of the normal distribution curve.

Figure 3 shows the worksheet on the TwoCurve sheet. The blue curve shows a standard
normal distribution (mean = 0 and standard deviation = 1). On the y-axis we have
probability values and on the x-axis we have z (measurement) values. Each point on the
curve corresponds to the probability p (strictly, the probability density) that a measurement
will yield a particular z value on the x-axis. The probability is expressed as a number from 0 to 1.
Of course, we could also talk about percentage probabilities; just multiply p by 100. It can be
shown that the area under the curve must be one, since a measurement must result in some value.

Note how the probability is essentially zero for any z value more than 3 standard
deviations from the mean, on either side.

The two parameters of the red curve may be changed by using the spinner. You will see
the shape and position of the red curve alter. Just click on a spinner arrow to increase or
decrease the Mean and/or StdDev.

If you set the mean for the adjustable curve to zero and experiment with the standard
deviation (s), you will see that as s increases the curve gets wider while its height
decreases. The area, of course, remains constant.

Figure 3

On the second sheet (AboutM) you can select z1 and z2, one from each side of the mean,
and find the probability that a measurement z will lie within that range; see Figure 4. You
will see this probability written in textbooks as P(z1 < z < z2).
The slider objects are used in one of three ways: (1) drag the slider bar, (2) for large
jumps, click on the spaces either side of the slider bar, and (3) for more precise control,
click either arrow on the slider object.

Figure 4

As shown in Figure 4, 95.45% of all observations lie within two standard deviations of the
mean. What are the corresponding percentages for 1 standard deviation of the mean and
for -3 < z < 3?
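
A quick way to check these figures without the worksheet is to evaluate the difference of two cumulative probabilities directly; the sketch below is only an aside and assumes the SciPy library is available.

    from scipy.stats import norm

    for k in (1, 2, 3):
        p = norm.cdf(k) - norm.cdf(-k)          # P(-k < z < k) for the standard normal
        print(f"within {k} standard deviation(s) of the mean: {p:.4%}")

This prints approximately 68.27%, 95.45% and 99.73%, the first and last being the answers to the questions above.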

The sheet AnyP (see Figure 5) is similar to the previous sheet except that the z values may
take any value. To create a different visual effect, the area is plotted as a series of
columns.

Figure 5

The Student Distribution

Figure 6

Also in the workbook NORMALDISTA you will find the sheet Student, which lets you
compare the standard normal distribution with the Student distribution for varying degrees
of freedom; see Figure 6.
The next Unit has some calculations using the Student distribution.

Sample Calculations
We will show how to perform some simple calculations involving the normal distribution.
These will help the reader become familiar with the Excel function NORMDIST and its
inverse NORMINV. The syntax for the former is NORMDIST(x, mean, standard deviation,
cumulative), where x is the measured value, mean and standard deviation have obvious
meanings, and cumulative is a logical value (i.e. you may use TRUE or FALSE, or 1 or 0,
for its value). A TRUE value returns the cumulative probability while a FALSE value returns
the value of the probability density function.
There are also the functions NORMSDIST and NORMSINV, which are used only for the
standard normal distribution. These, of course, do not require the mean and standard
deviation arguments, since for the standard normal distribution these are fixed at 0 and
1, respectively.

Each problem will be solved in three ways: (1) using the worksheet AnyProb or AboutM,
(2) using Appendix 5 in Quantitative Approaches in Business Studies , and (3) using the
NORMDIST or NORMINV function. In this way the use of the two Excel functions should
become clearer.
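
For readers who like to verify the worksheet answers programmatically, the scipy.stats.norm object plays the same role as NORMDIST and NORMINV. The sketch below is merely an aside (it assumes SciPy is installed) and is not part of the workbook.

    from scipy.stats import norm

    # NORMDIST(x, mean, sd, TRUE)  -> cumulative probability P(X < x)
    print(norm.cdf(1.84, loc=0, scale=1))       # about 0.9671

    # NORMDIST(x, mean, sd, FALSE) -> value of the probability density function
    print(norm.pdf(1.84, loc=0, scale=1))

    # NORMINV(p, mean, sd)         -> the x value whose cumulative probability is p
    print(norm.ppf(0.25, loc=0, scale=1))       # about -0.674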

a) For a standard normal distribution, find the area under the curve to the left of z = 1.84.
i) The worksheet AnyProb may be used to determine the answer; see Figure 7. You may
be concerned that this computes the area from -4 to 1.84, so part of the left tail is
missing. We will see that this is insignificant. This method yields 96.71%.
[Worksheet extract (AnyProb sheet): z1 = -4, z2 = 1.84; the probability that z lies between z1 and z2 is shown as 96.71%, and the chart shades the corresponding area under the standard normal curve.]
Figure 7

ii) Set up a worksheet as shown in Figure 8 and you can answer all questions of this type
by entering the appropriate z value in B4. For z = 1.84 we get 0.9671, which agrees with
the result above.

A B C D E
3 Question (a)
4 z 1.84
5 NORMDIST 0.967116 =NORMDIST(B4,0,1,TRUE)
6
Figure 8

iii) Look up the value 1.84 in Appendix 5 of Quantitative Approaches in Business Studies
and you should get 0.0329. Do not panic! The difference is explainable. The Appendix
lists values for areas to the right, as shown in the diagram in its heading. Since the total
area under the curve is 1, it follows that for any given x value: Area(left of x) +
Area(right of x) = 1. So the area to the left is 1 - 0.0329 = 0.9671. So we do get the same
answer!

It should be apparent by now that when we find an area to the right of x, we are finding the
probability of an observation being greater than x. Conversely, an area to the left is the
probability of an observation being less than x.

b) How different would our solutions be if z was negative, say -1.84? The problem is now:
what is the area to the left of z = -1.84?
i) The AnyProb worksheet gives 3.29% or 0.0329. Does that value look familiar?
ii) The worksheet function NORMDIST gives the same value, i.e. 0.032884 with the
default format.
iii) Appendix 5 in Quantitative Approaches in Business Studies does not differentiate
between positive and negative values since the normal distribution is symmetrical
about the mean. So again we get 0.0329.

c) Find the area between z = -1.97 and z = 0.86.


i) The AnyProb worksheet gives the result 78.07%.
ii) The function NORMDIST may be used once we recognize that we need the area to the
left of 0.86 less the area to the left of -1.97. Figure 9 shows how to set up a worksheet
to solve problems of this type. So that the user does not need to remember to place the
two z values in order, the formula in B10 is =ABS(B9-D9).

A B C D
7 Question (c)
8 z1 0.86 z2 -1.97
9 NORMDIST 0.805106 NORMDIST 0.024419
10 Difference 0.780686
Figure 9

iii) If you look up the two z values in Appendix 5 the two areas are 0.1949 and 0.0244. The
result we need is 1 - (the sum of these two), or 1 - (0.2193) = 0.7807. If necessary,
draw a diagram to convince yourself that this is the way to proceed.
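
The same subtraction can be checked in one line of Python (again a sketch assuming SciPy):

    from scipy.stats import norm

    z1, z2 = -1.97, 0.86
    area = abs(norm.cdf(z2) - norm.cdf(z1))     # the order of z1 and z2 does not matter
    print(round(area, 4))                       # about 0.7807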

Note: If you try to enter the text (c) in an Excel cell, it is most likely that the copyright
symbol will be displayed. To overcome this use Tools|AutoCorrect, select the
appropriate entry and click the Delete button.

d) Given a normal distribution with μ = 400 and σ = 50, what is the probability that x will
have a value greater than 469?
i) To use Appendix 5 of Quantitative Approaches in Business Studies we must convert
the x value to a z-score using z = (x - μ)/σ, giving z = (469 - 400)/50 = 1.38. Then we look up
the z value in the Appendix to get 0.0838.
ii) The worksheet solution is shown in Figure 10. Remember that NORMDIST finds the
area to the left of the x value or the probability that the observation will be less than x.
We need the probability of it being greater so we use 1 - NORMDIST.

A B C D E F
12 Question (d)
13 mean 400
14 stdev 50
15 critical value 469
16 P(x<value) 0.916207 =NORMDIST(B15,B13,B14,TRUE)
17 P(x>value) 0.083793 =1-B16
Figure 10
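
As an optional check of Figure 10, the same calculation in Python (a sketch assuming SciPy):

    from scipy.stats import norm

    mean, sd, x = 400, 50, 469
    p_less = norm.cdf(x, loc=mean, scale=sd)    # area to the left of x
    print(round(1 - p_less, 4))                 # P(X > 469), about 0.0838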

In the problems above we computed P knowing x. Sometimes we need to find x knowing
P. Not surprisingly, the Excel function for this is called NORMINV and it has the syntax
NORMINV(probability, mean, standard deviation).

e) For a normal distribution with a mean of 120 and a standard deviation of 12, determine
the value of x below which the first 25% of observations lie.
i) The worksheet solution is shown in Figure 11, which gives an answer of 111.91.

A B C D E F
20 Question (e)
21 mean 120
22 stdev 12
23 P 25%
24 x 111.9061 =NORMINV(B23,B21,B22)

Figure 11
ii) To solve this with the AnyProb worksheet we first find what z value will include the first
25%. We do this by setting z1 to -4 and varying z2 until the probability reads 25%, or
as close to that value as we can get. We find that a z value of -0.67 gives P = 25.14%. Now
we must convert the z value to an x. We have already met the relationship
z = (x - μ)/σ, so we may write x = zσ + μ. Thus x = -0.67 × 12 + 120 = 111.96. This is
not exactly the same as the first solution because we did not find the z corresponding to
exactly 25%.
iii) To solve the problem with Appendix 5 we use a similar hunting process. Look in the
table until you find an area value close to 0.250. Did you find 0.2514 with a z value of
0.67? We complete the solution as in (ii) above. However, you must realise that you
need the left tail of the curve so use -0.67 as the z value.

In methods (ii) and (iii) interpolation could be used. We have the two data points
P(z=0.67) = 0.2514 and P(z=0.68) = 0.2483, so the midpoint will be
approximately P(z=0.675) = 0.24985, which is closer to the required 0.25 value. With
-0.675 as the value of z, we find x = 111.90. This agrees with the worksheet approach.
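
The inverse calculation can also be checked with the percent-point function, the programmatic counterpart of NORMINV; a sketch assuming SciPy:

    from scipy.stats import norm

    x = norm.ppf(0.25, loc=120, scale=12)       # x below which 25% of observations fall
    print(round(x, 2))                          # about 111.91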

The Central Limit Theorem


The worksheet in NORMALDISTB.XLS is a demonstration of the Central Limit Theorem,
which states: as the sample size (n, the number of observations in each sample) increases,
the distribution of the sample mean x̄ approximates a normal distribution with
mean μ_x̄ = μ and standard deviation σ_x̄ = σ/√n.
On sheet NormData (see Figure 12) the first 9 columns each contain 100 random numbers
taken from a normal distribution with a mean of 0 and a standard deviation of 1. In column
J the averages of the nine values in each row are computed. On the sheet called CLT
(Figure 13), the distributions of the values in column A and in column J have been found using
the FREQUENCY function and the results charted. It can be seen that as n goes from 1
to 9, the distribution does become approximately normal, and noticeably narrower. It is left
as an exercise for the reader to compute the mean and standard deviation of the data in
column J of the NormData sheet and show that the standard deviation is approximately 1/√9 = 1/3.
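
Readers without the workbook to hand can reproduce the demonstration with a short simulation; the sketch below (assuming NumPy) draws 100 samples of size 9 and compares the spread of the sample means with σ/√n.

    import numpy as np

    rng = np.random.default_rng()
    data = rng.standard_normal((100, 9))        # 100 rows of 9 standard normal values
    row_means = data.mean(axis=1)               # like column J of the NormData sheet

    print("mean of the sample means:", round(row_means.mean(), 3))       # close to 0
    print("std dev of the sample means:", round(row_means.std(ddof=1), 3))
    print("theoretical value 1/sqrt(9):", round(1 / np.sqrt(9), 3))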

[Worksheet extract (NormData sheet, "Random Normal Distribution Samples"): columns A to I hold the random normal values and column J, headed "average", holds the mean of each row.]
Figure 12

[Worksheet extract (CLT sheet): column A holds the bins (-1.8 to 1.8 in steps of 0.2) and the next two columns hold the frequencies for sample sizes 1 and 9, found with FREQUENCY. Two column charts, labelled n=1 and n=9, display the distributions; the n=9 distribution is much narrower and more sharply peaked.]
Figure 13

The reader may wish to generate a new data set with the Random Number Generator
found in the Data Analysis tool (see Figure 14).

Figure 14

Calculations with Samples


a) Coots Plc makes light bulbs whose lifetimes are distributed normally with a mean of
800 hours and a standard deviation of 40. What is the probability that a random sample
of 16 bulbs from Coots will have a mean lifetime of less than 775 hours?

Solution: From the Central Limit Theorem, the sampling distribution of the average x̄ will
be approximately normal with μ_x̄ = 800 and σ_x̄ = 40/√16 = 10. To find the z corresponding
to x̄ = 775, use z = (775 - 800)/10 = -2.5.
i) Again we can use the AnyProb worksheet. Strictly speaking we need the cumulative
probability for the range -∞ to -2.5, but we will settle for -4 to -2.5. Set the spinners to
these values to get an answer of 0.62% or 0.0062.
ii) Look up the absolute z in Appendix 5 and you find the value 0.0062. This is the
probability P(z>2.5) but because of the symmetry of the curve it is also P(z<-2.5).
iii) The answer may be readily found with the NORMDIST function as shown in Figure 15.
The result for P(<x) in E5 is 0.00621 or 0.62%.
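
For comparison, the whole calculation in Python (a sketch assuming SciPy):

    import math
    from scipy.stats import norm

    mean, sd, n = 800, 40, 16
    se = sd / math.sqrt(n)                      # standard error of the mean = 10
    print(round(norm.cdf(775, loc=mean, scale=se), 4))   # P(sample mean < 775), about 0.0062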

Worksheet layout (Figure 15): Calculations with samples.

Population values:        mean 800 (B4), std dev 40 (B5)
Sample values:            Size 16 (B8), Error of mean 10 (B9) =B5/SQRT(B8)
Probability calculation:  critical value 775 (E4)
                          P(<x)  0.62%  (E5) =NORMDIST(E4,B4,B9,TRUE)
                          P(>x) 99.38%  (E6) =1-E5
Interval calculation:     P 95% (E10)
                          Tail size 2.50% (E11) =(1-E10)/2
                          x(low)  780.40 (E12) =NORMINV(E11,B4,B9)
                          x(high) 819.60 (E13) =NORMINV(1-E11,B4,B9)
Figure 15

b) If we measure a number of averages for samples of size 16, what interval around the
population mean will include 95% of the sample means?

i) If we want 95% distributed about the mean then there will be 47.5% on each side. We
can use the worksheet AboutM or AnyProb to find that a z value of 1.96 yields this
percentage; see Figure 16. Of course, on the other side of the mean, a z of -1.96 will
encompass the other 47.5%. The corresponding x values are obtained from:

x_U = μ + zσ/√n   and   x_L = μ - zσ/√n

These give x_U = 819.6 and x_L = 780.4.

[Worksheet extract: z1 = 0, z2 = 1.96; the probability that z lies between z1 and z2 is shown as 47.50%, with the corresponding area shaded on the standard normal curve.]
Figure 16

ii) Recall that Appendix 5 gives us areas to the right of a z value (the white area on the
right side of the curve in Figure 16). So we search the table for an area value of
0.5 - 0.475 = 0.025, which gives z = 1.96, and we finish the problem as before.
iii) A spreadsheet solution is shown in Figure 15 above where we use the NORMINV
function. The same results are obtained.
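
The interval from Figure 15 can be checked the same way; a sketch assuming SciPy:

    import math
    from scipy.stats import norm

    mean, sd, n, p = 800, 40, 16, 0.95
    se = sd / math.sqrt(n)                      # standard error of the mean
    tail = (1 - p) / 2                          # 2.5% in each tail
    low = norm.ppf(tail, loc=mean, scale=se)
    high = norm.ppf(1 - tail, loc=mean, scale=se)
    print(round(low, 1), round(high, 1))        # about 780.4 and 819.6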

Histograms & the Normal Distribution


The worksheet in the file NORMALDISTC.XLS demonstrates three ways to find the mean
and the standard deviation from experimental frequency data. We will use this to add a
normal curve to a histogram.

Suppose you have 100 items for which it is possible to measure a quantity (weight,
diameter, etc.) with one of three progressively coarser devices. On the sheet Tables (see
Figure 17) the range A7:B39 is a table listing the frequency of specific x values in a
measured sample when the increment for the bins is 0.005 units. The ranges F7:G23 and
K7:L15 are similar tables when the bin increments are 0.01 and 0.02, respectively. Note
that the maxima for these three tables are approximately 8, 16 and 32, respectively. It is
not surprising, therefore, that we shall need to normalize the curve produced by the
NORMDIST function.

We will assume that the distribution of these measurements is normal (i.e. Gaussian).
The mean and standard deviation can be computed from such tables using the
relationships:

μ = Σᵢ xᵢPᵢ   and   σ = √( Σᵢ (xᵢ - μ)²Pᵢ )

where Pᵢ is the probability for measurement xᵢ and the sums run over all N bins.

Our data is expressed in terms of percentage frequency rather than probability but we can
use the simple relationship that Pi = fi /100.

The mean is computed in B3 with the formula =SUMPRODUCT(A8:A39,B8:B39)/100. The
factor of 100 comes from Pᵢ = fᵢ/100. The formulas in G3 and L3 are analogous:
=SUMPRODUCT(F8:F23,G8:G23)/100 and =SUMPRODUCT(K8:K15,L8:L15)/100,
respectively.

To compute the standard deviation we need the value of (xᵢ - μ)²Pᵢ; this is the purpose
of the third column in each table. The formula in C8 is =(A8-$B$3)^2*B8 and this is copied
down to row 39. The data in the third column is summed, divided by 100 and square-rooted
to give the standard deviation, so in B4 we use =SQRT(SUM(C8:C39)/100). Analogous
formulas are used in the other tables.
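
The SUMPRODUCT recipe translates directly into code. The sketch below (assuming NumPy, and using a short made-up frequency table purely for illustration) computes the mean and standard deviation from percentage frequencies in the same way.

    import numpy as np

    # Illustrative data only: bin centres x and percentage frequencies f
    x = np.array([0.46, 0.48, 0.50, 0.52, 0.54])
    f = np.array([10.0, 25.0, 32.0, 24.0, 9.0])

    p = f / 100.0                               # convert % frequency to probability
    mean = np.sum(x * p)                        # like =SUMPRODUCT(...)/100
    std = np.sqrt(np.sum((x - mean) ** 2 * p))  # like =SQRT(SUM(...)/100)
    print(round(mean, 4), round(std, 4))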

The fourth column in each table is used to compute the normal distribution values so as
to be able to display a histogram with a superimposed normal curve. In D8 we have
=NORMDIST(A8,$B$3,$B$4,FALSE)*$D$3. Carefully note the use of absolute cell
references for the mean $B$3, standard deviation $B$4 and normalization factor $D$3.
This formula is copied down to row 39.

The user may adjust the value of the normalization factor to give a total in D40 of
approximately 100. There is no merit in attempting great precision here. Now we may
construct a combination chart with the data from columns A, B and D. Similar methods are
used with the other two tables.
[Worksheet extract (Tables sheet, "Normal Gaussian Distribution"): three frequency tables with bin increments of 0.005, 0.01 and 0.02. Each table has columns for x, freq (%), the (xᵢ - μ)² fᵢ terms and the scaled NORMDIST values. The computed results are: mean 0.494961, std 0.024480, norm 0.5 for the first table; mean 0.497482, std 0.024531, norm 1 for the second; mean 0.502378, std 0.025249, norm 2 for the third. The NORMDIST columns total approximately 100 (99.83, 99.84 and 99.82). Below the tables, three combination charts show each histogram with its superimposed normal curve.]
Figure 17

One will find in the Excel literature two methods to fit histogram data to a normal curve that
reportedly improve on the method shown above. This author has serious doubts about the
supposed improvement in the results. It is doubtful if the apparent improvements in
precision have any statistical significance. However, we will look at the two methods
briefly.

The sheet Solver1 is shown in Figure 18. The table A8:B24 contains the same data as the
middle table on the previous sheet. As before, we use the NORMDIST function to produce
the normal curve. We wish to use Solver (see Unit 3) to vary the three quantities mean,
standard deviation and normalization factor in such a way as to make the normal curve
agree with the frequency data. If for a given measured value xᵢ the experimental frequency
is fᵢ and the predicted frequency is gᵢ, then minimizing the quantity Σ(fᵢ - gᵢ)² gives the so-called
least-squares fit. We may compute this sum of squares of residuals (SSR) in B6 with the
SUMXMY2 function, as shown in Figure 18.
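
The same least-squares idea can be expressed outside Excel with a general-purpose optimiser. The sketch below (assuming SciPy, and using a shortened, made-up frequency table) minimises the sum of squared residuals over the mean, standard deviation and normalization factor, much as Solver does; it is an illustration, not the workbook's method.

    import numpy as np
    from scipy.optimize import minimize
    from scipy.stats import norm

    # Illustrative bin centres and percentage frequencies
    x = np.array([0.45, 0.47, 0.49, 0.51, 0.53, 0.55])
    f = np.array([3.1, 9.0, 15.1, 13.6, 7.6, 2.0])

    def ssr(params):
        mean, sd, scale = params
        g = norm.pdf(x, mean, sd) * scale       # predicted frequencies
        return np.sum((f - g) ** 2)             # sum of squared residuals (SSR)

    result = minimize(ssr, x0=[0.50, 0.03, 1.0],
                      bounds=[(None, None), (0.001, None), (None, None)])
    print(result.x)                             # fitted mean, std dev and norm factor

The lower bound of 0.001 on the standard deviation plays the same role as the Solver constraint B4>=0.001 described below.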
[Worksheet extract (Solver1 sheet): B3 mean 0.497124, B4 std 0.025795, B5 norm 1.012054, B6 SSR 4.937021 computed with =SUMXMY2(B9:B24,C9:C24). The table in A8:C24 lists x, freq (%) and the fitted normdist values, where C9 holds =NORMDIST(A9,$B$3,$B$4,FALSE)*$B$5 copied down to row 24; the column totals are 100 and 100.93. A chart plots the experimental and fitted frequencies against x.]
Figure 18

Solver is set up to minimize the SSR value in B6 by varying the mean, standard deviation
and normalization factor, i.e. cells B3, B4 and B5. As a precaution, the constraint
B4>=0.001 is used to ensure that Solver never tries to make the standard deviation zero,
because the NORMDIST functions would then return error values and terminate Solver's
activity. The settings for Solver are shown in Figure 19. To aid Solver in finding a solution,
you may wish to start with the values found on the Tables sheet.

Figure 19

A further refinement of the worksheet for use with Solver is given in Solver2 (see Figure
20). The formulas in column C are somewhat more complicated. This approach may give
better results than the others when the increments of the x values are large. As before,
Solver is used to minimize the SSR in B6 by varying the mean and the standard deviation.
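
The idea behind Solver2 is that the expected frequency for a bin is the probability of an observation falling inside the bin, i.e. a difference of two cumulative values, rather than the density at the bin centre. A sketch of that calculation (assuming SciPy, with hypothetical bin edges chosen only for illustration):

    from scipy.stats import norm

    mean, sd, scale = 0.492, 0.0255, 100.0
    edges = [0.455, 0.465, 0.475, 0.485]        # hypothetical bin boundaries

    for lo, hi in zip(edges[:-1], edges[1:]):
        p = norm.cdf(hi, mean, sd) - norm.cdf(lo, mean, sd)
        print(f"expected frequency in [{lo}, {hi}): {p * scale:.2f}")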
[Worksheet extract (Solver2 sheet): B3 mean 0.49216, B4 std 0.02552, B5 norm 100.95882, B6 SSR 5.2485449; cells B3, B4 and B5 have been named mean, std and norm. The table in A8:C24 lists x, freq (%) and the fitted values. The first fitted value uses =NORMDIST(A8,mean,std,TRUE)*norm, the body of the column uses =(NORMDIST(A9,mean,std,TRUE)-NORMDIST(A8,mean,std,TRUE))*norm copied down, and the last uses =(1-NORMDIST(A22,mean,std,TRUE))*norm; the column totals are 100 and 101.]
Figure 20
