
Statistical methods depend upon probability theory.

Probability, P = number of observations of a specific outcome / total number of observations

or, to put it another way:

P = number of specific outcomes / total number of possible outcomes

Example
In a group of mice there are 200 white mice and 50 brown mice.

The proportional frequency of brown mice is 50/250 = 1/5 = 0.2. If we randomly take one mouse there is a 1/5 chance (20%) of it being brown. The probability of picking a brown mouse as a single random sample is equal to the proportional frequency of brown mice in the group (population).

If instead all 250 mice were white, the probability of selecting a brown mouse would be 0/250 = 0 and the probability of selecting a white mouse would be 250/250 = 1 (100%).

P is normally written as a decimal, e.g. P = 0.5. All probabilities lie between 0 and 1 (0% and 100%).
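The definition of P can be checked with a few lines of Python. This is only a minimal sketch: the helper name `probability` is hypothetical, and exact fractions are used so that 50/250 reduces to 1/5 without rounding.

```python
from fractions import Fraction


def probability(specific: int, total: int) -> Fraction:
    """P = number of specific outcomes / total number of possible outcomes."""
    if total <= 0 or not 0 <= specific <= total:
        raise ValueError("need total > 0 and 0 <= specific <= total")
    return Fraction(specific, total)


white, brown = 200, 50
p_brown = probability(brown, white + brown)   # 50/250 reduces to 1/5
p_white = probability(white, white + brown)   # 200/250 reduces to 4/5

print(p_brown, float(p_brown))
print(p_white, float(p_white))
```

The two probabilities sum to 1 because every mouse is either white or brown.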

Example
Replacing versus not replacing selections

If we replace the first selection before making a second selection, the probability of making a given selection is unaltered. Thus, the probability of picking a brown mouse is still 50/250 = 1/5 = 0.2.

If we do not replace the first selection, the probability changes for the second selection. Suppose, in the situation above, that selection 1 was a brown mouse. If it is not replaced there are now 249 mice (not 250) and only 49 brown mice (not 50). The probability of picking a brown mouse in the second selection is 49/249. Similarly, the probability of selecting a white mouse is now 200/249 rather than 200/250 as it would have been in the first selection.
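The effect of not replacing the first mouse can be sketched as follows. The dictionary layout and the helper `draw_probability` are illustrative assumptions, not part of the original notes.

```python
from fractions import Fraction


def draw_probability(counts: dict, colour: str) -> Fraction:
    """Probability of drawing one mouse of the given colour from the counts."""
    return Fraction(counts[colour], sum(counts.values()))


mice = {"white": 200, "brown": 50}

# With replacement: the population is unchanged for the second draw.
p_second_brown_with = draw_probability(mice, "brown")         # 50/250

# Without replacement: the first draw (a brown mouse) is removed.
remaining = dict(mice)
remaining["brown"] -= 1
p_second_brown_without = draw_probability(remaining, "brown")  # 49/249
p_second_white_without = draw_probability(remaining, "white")  # 200/249

print(p_second_brown_with, p_second_brown_without, p_second_white_without)
```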

Example
A population of 50 brown mice and 200 white mice, selections with replacement:
a) Probability of 3 brown mice in 3 selections = (50/250) * (50/250) * (50/250) = (1/5) * (1/5) * (1/5) = 0.008
b) Probability of selecting, in order, brown, brown and then white = (50/250) * (50/250) * (200/250) = (1/5) * (1/5) * (4/5) = 0.032
c) If, however, we are not interested in the order (i.e. brown, brown, white) but just the overall outcome (i.e. 2 brown, 1 white), the probability is different. Three orderings give this outcome (brown-brown-white, brown-white-brown, white-brown-brown), each with probability 0.032, so the probability is 3 * 0.032 = 0.096
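The three cases can be verified by multiplying the per-draw probabilities and, for the unordered case, summing over every ordering that contains two brown mice and one white mouse. This is a sketch under the example's assumption of independent draws with replacement; the function name is hypothetical.

```python
from fractions import Fraction
from itertools import product

# Per-draw probabilities with replacement (50 brown, 200 white).
P = {"brown": Fraction(50, 250), "white": Fraction(200, 250)}


def sequence_probability(sequence) -> Fraction:
    """Probability of one specific ordered sequence of independent draws."""
    result = Fraction(1)
    for colour in sequence:
        result *= P[colour]
    return result


p_a = sequence_probability(["brown", "brown", "brown"])   # case a
p_b = sequence_probability(["brown", "brown", "white"])   # case b, ordered

# Case c: any order, exactly two brown mice in three draws.
p_c = sum(
    sequence_probability(seq)
    for seq in product(P, repeat=3)
    if seq.count("brown") == 2
)

print(float(p_a), float(p_b), float(p_c))
```

Case c comes out three times as large as case b because three distinct orderings lead to the same overall outcome.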

Normal distribution

In probability theory and statistics, the normal distribution or Gaussian distribution is a continuous probability distribution that describes data that clusters around a mean or average.

The normal distributions are a very important class of statistical distributions. All normal distributions are symmetric and have bell-shaped density curves with a single peak.

Kurtosis (from the Greek word κυρτός, kyrtos or kurtos, meaning bulging) is a measure of the "peakedness" of the probability distribution of a real-valued random variable, that is, of whether the data are peaked or flat relative to a normal distribution. Higher kurtosis means more of the variance is due to infrequent extreme deviations, as opposed to frequent modestly-sized deviations. Data sets with high kurtosis tend to have a distinct peak near the mean, decline rather rapidly, and have heavy tails.
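As an illustration, kurtosis can be computed from the second and fourth central moments. The sketch below assumes the moment-based (Pearson) definition m4 / m2^2, under which a normal distribution scores 3; other conventions subtract 3 ("excess kurtosis"). The function name and the two small data sets are made up for the example.

```python
def kurtosis(data) -> float:
    """Moment-based kurtosis m4 / m2**2 (a normal distribution gives 3)."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n   # second central moment
    m4 = sum((x - mean) ** 4 for x in data) / n   # fourth central moment
    return m4 / m2 ** 2


flat = [1, 2, 3, 4, 5]                        # evenly spread values
heavy = [0, 0, 0, 0, 0, 0, 0, 0, -10, 10]     # mostly central, two extremes

print(kurtosis(flat))
print(kurtosis(heavy))
```

The second data set scores higher because nearly all of its variance comes from the two extreme values.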

Skewness is a measure of the asymmetry of the probability distribution of a real-valued random variable.

Skewness is a measure of symmetry, or more precisely, the lack of symmetry. A distribution, or data set, is symmetric if it looks the same to the left and right of the center point.
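Skewness can be illustrated in the same way from the second and third central moments. This sketch assumes the moment-based definition m3 / m2^1.5, which is 0 for perfectly symmetric data and positive when the longer tail is on the right; the function name and data are hypothetical.

```python
def skewness(data) -> float:
    """Moment-based skewness m3 / m2**1.5 (0 for symmetric data)."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n   # second central moment
    m3 = sum((x - mean) ** 3 for x in data) / n   # third central moment
    return m3 / m2 ** 1.5


symmetric = [1, 2, 3, 4, 5]
right_tailed = [1, 1, 1, 1, 6]
left_tailed = [-x for x in right_tailed]

print(skewness(symmetric), skewness(right_tailed), skewness(left_tailed))
```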

Density of the t-distribution (red and green) for 1, 2, 3, 5, 10, and 30 df compared to normal distribution (blue)

Student's t-distribution (or simply the t-distribution) is a probability distribution that arises in the problem of estimating the mean of a normally distributed population when the sample size is small.
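To see how the t-distribution compares with the normal distribution, the two densities can be evaluated directly. The sketch uses the standard closed-form density of Student's t with `df` degrees of freedom and the standard normal density; the function names are illustrative.

```python
import math


def t_pdf(t: float, df: float) -> float:
    """Density of Student's t-distribution with df degrees of freedom."""
    coefficient = math.gamma((df + 1) / 2) / (
        math.sqrt(df * math.pi) * math.gamma(df / 2)
    )
    return coefficient * (1 + t * t / df) ** (-(df + 1) / 2)


def normal_pdf(x: float) -> float:
    """Density of the standard normal distribution."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)


for df in (1, 2, 3, 5, 10, 30):
    # Peak height at 0 and tail height at 3 for each df.
    print(df, round(t_pdf(0.0, df), 4), round(t_pdf(3.0, df), 4))
print("normal", round(normal_pdf(0.0), 4), round(normal_pdf(3.0), 4))
```

With few degrees of freedom the t density has a lower peak and heavier tails than the normal density; as df grows the two become close.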

In the figure below, the red curve shows the distribution of chi-square values computed from all possible samples of size 3, where degrees of freedom is n - 1 = 3 - 1 = 2. Similarly, the green curve shows the distribution for samples of size 5 (degrees of freedom equal to 4), and the blue curve shows it for samples of size 11 (degrees of freedom equal to 10).
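The link between sample size and degrees of freedom can be explored by simulation. The sketch below assumes the chi-square statistic in question is (n - 1) * s^2 / sigma^2 computed from normally distributed samples, which the notes do not state explicitly; the sample counts, seed, and function names are arbitrary choices. A chi-square distribution has a mean equal to its degrees of freedom, so the simulated averages should land near n - 1.

```python
import random


def chi_square_statistic(sample, sigma: float) -> float:
    """(n - 1) * s^2 / sigma^2, with s^2 the sample variance."""
    n = len(sample)
    mean = sum(sample) / n
    sample_variance = sum((x - mean) ** 2 for x in sample) / (n - 1)
    return (n - 1) * sample_variance / sigma ** 2


def simulate(sample_size: int, repetitions: int = 5000,
             sigma: float = 1.0, seed: int = 0):
    """Chi-square statistics from repeated normal samples of a given size."""
    rng = random.Random(seed)
    return [
        chi_square_statistic(
            [rng.gauss(0.0, sigma) for _ in range(sample_size)], sigma
        )
        for _ in range(repetitions)
    ]


for n in (3, 5, 11):
    values = simulate(n)
    print("n =", n, "df =", n - 1, "simulated mean =",
          round(sum(values) / len(values), 2))
```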

Determining Sample Size: "What size sample do I need?"

Three criteria usually will need to be specified to determine the appropriate sample size: 1. the level of precision, 2. the level of confidence or risk, and 3. the degree of variability in the attributes being measured.

The Level Of Precision


The level of precision, sometimes called sampling error, is the range in which the true value of the population is estimated to be. This range is often expressed in percentage points (e.g., ±5 percent). Thus, if a researcher finds that 60% of farmers in the sample have adopted a recommended practice with a precision rate of ±5%, then he or she can conclude that between 55% and 65% of farmers in the population have adopted the practice.

The Confidence Level

The confidence or risk level is based on ideas encompassed under the Central Limit Theorem. When a population is repeatedly sampled, the average value of the attribute obtained by those samples is equal to the true population value. The values obtained by these samples are distributed normally about the true value, with some samples having a higher value and some a lower value than the true population value.

If a 95% confidence level is selected, 95 out of 100 samples will have the true population value within the range of precision specified earlier.
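The "95 out of 100 samples" reading can be checked by simulation. This sketch makes several assumptions that are not in the notes: a true population proportion of 0.6, samples of 400, and the usual normal-approximation interval p_hat ± 1.96 * sqrt(p_hat * (1 - p_hat) / n). Function names and the seed are arbitrary.

```python
import math
import random


def interval_covers(true_p: float, n: int, rng: random.Random,
                    z: float = 1.96) -> bool:
    """Draw one sample of size n; report whether its interval holds true_p."""
    hits = sum(rng.random() < true_p for _ in range(n))
    p_hat = hits / n
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin <= true_p <= p_hat + margin


def coverage(true_p: float = 0.6, n: int = 400,
             repetitions: int = 2000, seed: int = 1) -> float:
    """Fraction of repeated samples whose interval contains the true value."""
    rng = random.Random(seed)
    covered = sum(interval_covers(true_p, n, rng) for _ in range(repetitions))
    return covered / repetitions


print(coverage())
```

The printed fraction should be close to 0.95, though the normal approximation means it will not be exact.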

Degree Of Variability

The degree of variability in the attributes being measured refers to the distribution of attributes in the population. The more heterogeneous a population, the larger the sample size required to obtain a given level of precision. The less variable (more homogeneous) a population, the smaller the sample size. Note that a proportion of 50% indicates a greater level of variability than either 20% or 80%. This is because 20% and 80% indicate that a large majority do not or do, respectively, have the attribute of interest.
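One common way to quantify this is the variance of a yes/no attribute, p * (1 - p), which is largest at p = 0.5. Using that quantity here is an assumption for illustration, and the helper name is hypothetical.

```python
from fractions import Fraction


def variability(percent: int) -> Fraction:
    """Variance p * (1 - p) of a yes/no attribute held by percent% of people."""
    p = Fraction(percent, 100)
    return p * (1 - p)


for percent in (20, 50, 80):
    print(percent, float(variability(percent)))
```

Proportions of 20% and 80% give the same, smaller value, while 50% gives the maximum, which is why P = .5 is the conservative choice when planning a sample.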

STRATEGIES FOR DETERMINING SAMPLE SIZE


1. Using A Census For Small Populations: take all persons
2. Using A Sample Size Of A Similar Study
3. Using Published Tables
4. Using Formulas To Calculate A Sample Size

Table 1. Sample size for 3%, 5%, 7% and 10% Precision Levels Where Confidence Level is 95% and P=.5.

Size of         Sample Size (n) for Precision (e) of:
Population      3%        5%        7%        10%
500             a         222       145       83
600             a         240       152       86
700             a         255       158       88
800             a         267       163       89
900             a         277       166       90
1,000           a         286       169       91
2,000           714       333       185       95
3,000           811       353       191       97
4,000           870       364       194       98
5,000           909       370       196       98
6,000           938       375       197       98
7,000           959       378       198       99
8,000           976       381       199       99
9,000           989       383       200       99
10,000          1,000     385       200       99
15,000          1,034     390       201       99
20,000          1,053     392       204       100
25,000          1,064     394       204       100
50,000          1,087     397       204       100
100,000         1,099     398       204       100
>100,000        1,111     400       204       100

a = Assumption of normal population is poor (Yamane, 1967). The entire population should be sampled.
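For the fourth strategy, one formula often attributed to Yamane (1967) is n = N / (1 + N * e^2), where N is the population size and e the precision. The notes do not say which formula produced Table 1, so treating it as the source is an assumption; the sketch below reproduces several of the tabulated values (for example 222 for N = 500 at 5%) but has not been checked against every entry. The function name is hypothetical.

```python
def yamane_sample_size(population: int, precision: float) -> int:
    """Simplified formula n = N / (1 + N * e^2), rounded to a whole person.

    Assumes a 95% confidence level and P = .5, as in Table 1.
    """
    if population <= 0 or not 0 < precision < 1:
        raise ValueError("need population > 0 and 0 < precision < 1")
    return round(population / (1 + population * precision ** 2))


for population in (500, 1000, 2000, 10000):
    print(population,
          [yamane_sample_size(population, e) for e in (0.05, 0.07, 0.10)])
```

For very large populations the result approaches 1 / e^2, for example 400 at 5% precision.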
