
Problem Statement

1. Find the precise data and mathematical expression needed for calculation of one standard deviation from a number of values.

2. Comparison of sample and population standard deviation.

Solution :
1.

Standard deviation: First, we work out the expression needed to calculate the standard deviation ($\sigma$) of an entire discrete population. The expression (eq. 1) is:

$$\sigma = \sqrt{\frac{\sum (X-\mu)^2}{N}}$$

where
$X$ = observation value (could be $X_1$, $X_2$, etc.)
$N$ = number of observations
$\mu$ = population average, i.e. the sum of the observations divided by their number, $(X_1 + X_2 + \dots + X_N)/N$

For example (E1), for the values 12, 20, 16, 18, 19: $\mu = 17$, $\sigma^2 = 8$, $\sigma = 2.828$. If the data are grouped into intervals with known frequencies, $X$ can be replaced by the mean value of the interval and $\sum (X-\mu)^2$ by $\sum f(M-\mu)^2$, where $f$ is the frequency of occurrence of the interval and $M$ is the interval's mean value.
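As a minimal sketch of eq. 1 and of its grouped-data variant, the following Python computes the population standard deviation for E1. The function names and the grouped example (interval means 10, 20, 30 with frequencies 2, 5, 3) are illustrative assumptions, not data from the problem; the grouped mean is assumed to be the frequency-weighted mean of the interval means.

```python
import math

def population_sd(values):
    """Population standard deviation: sqrt(sum((X - mu)^2) / N)."""
    n = len(values)
    mu = sum(values) / n
    return math.sqrt(sum((x - mu) ** 2 for x in values) / n)

def grouped_population_sd(midpoints, freqs):
    """Grouped form: sqrt(sum(f * (M - mu)^2) / N), with N = sum(f)."""
    n = sum(freqs)
    # Assumption: mu is the frequency-weighted mean of the interval means.
    mu = sum(f * m for m, f in zip(midpoints, freqs)) / n
    return math.sqrt(sum(f * (m - mu) ** 2 for m, f in zip(midpoints, freqs)) / n)

e1 = [12, 20, 16, 18, 19]
print(population_sd(e1))  # sqrt(8), about 2.828

# Hypothetical grouped data (assumed values, not from the problem statement).
print(grouped_population_sd([10, 20, 30], [2, 5, 3]))  # 7.0
```

With every frequency set to 1, the grouped form reduces to the ungrouped one, which is a quick way to check that the two expressions agree.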

Similarly, for the sample standard deviation, the expression changes to (eq. 2):

$$s = \sqrt{\frac{\sum (X-\bar{X})^2}{N-1}}$$

where
$\bar{X}$ = sample mean (not the population mean)
$N$ = number of values in the sample
$s$ = sample standard deviation

Replacing $N$ by $N-1$ is Bessel's correction. It is needed because dividing by $N$ gives a biased estimate of the population variance $\sigma^2$ when the mean is itself estimated from the sample. Let us take another example, E2, in which a large population of unknown size contains the elements of E1 only as a sample. Now our standard deviation increases to $\sqrt{10} = 3.162$, which reflects the fact that the sample taken in E2 carries comparatively less information. The method used so far is the crudest and most fundamental way of finding the standard deviation. So, in the next part we work out the general expression for confidence intervals based on sample size.

2.

Construction of confidence intervals: For simplicity, take $\nu = N-1$. Let $z$ be the amount by which the sample standard deviation deviates from the population standard deviation, measured in units of $\sigma$. So, our relation becomes

$$\sigma(1-z) < s < \sigma(1+z) \;\Rightarrow\; \sigma^2(1-z)^2 < s^2 < \sigma^2(1+z)^2$$

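A quantity $\chi^2_\nu$ (chi-square with $\nu$ degrees of freedom) is defined as the expression

$$\chi^2_\nu = \frac{\nu s^2}{\sigma^2}$$

so that $\sigma^2(1-z)^2 < s^2 < \sigma^2(1+z)^2$ becomes $\nu(1-z)^2 < \chi^2_\nu < \nu(1+z)^2$.

We define the confidence level as $\gamma$ (so the significance level is $\alpha = 1-\gamma$). Assuming a two-tailed test, and using the distribution table given at http://www.statsoft.com/textbook/distributiontables/#chi, the tail area $P$ to look up is $(1-\gamma)/2$ for the upper limit, i.e. $(1+z)^2$, and $(1+\gamma)/2$ for the lower limit, i.e. $(1-z)^2$. Using the lower limit, we have $(1-z)^2 = \chi^2_\nu/\nu$, which gives $z = 1 - \sqrt{\chi^2_\nu/\nu}$. Using this formula, a two-dimensional table of $z$ can be created from the $\chi^2$ tables:

P        ν = 10      ν = 20      ν = 30
.995     0.535688    0.390335    0.322093
.990     0.494212    0.357334    0.293991
.975     0.430178    0.307512    0.251874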
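The lower-limit relation $z = 1 - \sqrt{\chi^2/\nu}$ behind the table can also be approximated without a lookup table. The sketch below is a stand-in under stated assumptions: instead of the exact $\chi^2$ table it uses the Wilson-Hilferty approximation to the chi-square quantile (a technique swapped in here, not one used in the original), so for small $\nu$ and extreme tails it differs from the table in the second or third decimal place. The function name `z_from_lower_limit` is hypothetical.

```python
import math
from statistics import NormalDist

def z_from_lower_limit(nu, confidence):
    """Approximate z = 1 - sqrt(chi2 / nu) for the lower limit.

    chi2 is the chi-square value whose upper-tail area is (1 + confidence) / 2.
    It is estimated with the Wilson-Hilferty approximation, which is an
    assumption of this sketch and not the table lookup described in the text.
    """
    lower_tail = (1 - confidence) / 2        # area to the left of chi2
    zq = NormalDist().inv_cdf(lower_tail)    # matching standard-normal quantile
    c = 2 / (9 * nu)
    chi2 = nu * (1 - c + zq * math.sqrt(c)) ** 3
    return 1 - math.sqrt(chi2 / nu)

for nu in (10, 20, 30, 1000):
    print(nu, round(z_from_lower_limit(nu, 0.95), 4))
```

For $\nu = 30$ at 95% confidence this gives a value close to 0.25, in line with the P = .975 entry for $\nu = 30$. The approximation breaks down for very small $\nu$, where an exact quantile routine would be the safer choice.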

For example, for $\nu = 30$ and a significance level $\alpha = 0.05$, we get $z \approx 0.25$, which means the sample standard deviation would lie between $0.75\sigma$ and $1.25\sigma$ at the 95% confidence level. As the sample becomes bigger, the value of $z$ decreases. E.g., when $\nu$ becomes 1000, $z \approx 0.044$ at the same confidence level, i.e. $0.956\sigma < s < 1.044\sigma$. Because the functions involved are defined for every $\nu \ge 1$ and every confidence level between 0 and 1, the calculation can be carried out throughout the domain of the parameters.

Explanation of doubts:

In the expression for $\sigma$, $\sum (X-\mu)^2$ means $(X_1-\mu)^2 + (X_2-\mu)^2 + (X_3-\mu)^2 + (X_4-\mu)^2 + (X_5-\mu)^2$. Using that, we get $(12-17)^2 + (20-17)^2 + (16-17)^2 + (18-17)^2 + (19-17)^2$. Divide that by 5 and take the square root to get $\sqrt{(25+9+1+1+4)/5} = \sqrt{8} = 2.828$.
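The arithmetic above can be replayed step by step. This sketch also divides the same sum of squares by $N-1$ instead of $N$, which gives the E2 value from eq. 2; the variable names are illustrative.

```python
import math

values = [12, 20, 16, 18, 19]
n = len(values)
mean = sum(values) / n                    # 17.0

squared_devs = [(x - mean) ** 2 for x in values]
total = sum(squared_devs)                 # 25 + 9 + 1 + 1 + 4 = 40

sigma = math.sqrt(total / n)              # population form (eq. 1): sqrt(8)
s = math.sqrt(total / (n - 1))            # sample form (eq. 2): sqrt(10)

print(squared_devs)                       # [25.0, 9.0, 1.0, 1.0, 4.0]
print(round(sigma, 3), round(s, 3))       # 2.828 3.162
```

The only difference between the two results is the divisor, which is why the sample value is always the larger of the two for the same data.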
1.

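$\sum$ stands for the summation operator. $\sum (a_i + b_i) = a_1 + b_1 + a_2 + b_2 + \dots + a_n + b_n$. $\sum (a_i + b) = a_1 + a_2 + \dots + a_n + n \cdot b$, where $a_i$, $b_i$ are the $i$th terms of $a$, $b$, with $i$ varying from 1 to $n$. In the second example, $b$ is constant. $\sum n \cdot f(x) = n \cdot \sum f(x)$, where $n$ is a constant multiplier.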
2.

For regression analysis, I could not derive a method that relates the sample standard deviation to the population standard deviation.
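As a quick numerical check of the three summation rules listed under point 1 of the explanation, the sketch below evaluates both sides of each rule on small lists. The lists and the constant are arbitrary illustrative values.

```python
a = [3, 1, 4, 1, 5]
b = [2, 7, 1, 8, 2]
const = 6
n = len(a)

# Sum of (a_i + b_i) equals the sum of a_i plus the sum of b_i.
assert sum(ai + bi for ai, bi in zip(a, b)) == sum(a) + sum(b)

# Sum of (a_i + b) with b constant equals the sum of a_i plus n * b.
assert sum(ai + const for ai in a) == sum(a) + n * const

# A constant multiplier can be moved outside the sum.
assert sum(const * ai for ai in a) == const * sum(a)

print("all three summation rules hold for this example")
```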