
SAMPLING THEORY

Random Sampling: The process of selecting a sample from a population such that each item has an equal
chance of being selected.

Sampling distribution (of means): Suppose we take all possible random samples of size n from a
population and calculate the means of each of these samples. The frequency distribution of these means is
called the sampling distribution of means.

Notation:

            Population    Sample       Sampling distribution
Size        $N$           $n$          -
Mean        $\mu$         $\bar{x}$    $\mu_{\bar{x}}$
Variance    $\sigma^2$    $s^2$        $\sigma_{\bar{x}}^2$

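Case i) Random sampling with replacement:
No. of samples = $N^n$; $\mu_{\bar{x}} = \mu$, $\sigma_{\bar{x}}^2 = \dfrac{\sigma^2}{n}$

Case ii) Random sampling without replacement:
No. of samples = $^{N}C_{n}$; $\mu_{\bar{x}} = \mu$, $\sigma_{\bar{x}}^2 = \dfrac{\sigma^2}{n}\left[\dfrac{N-n}{N-1}\right]$

$C = \dfrac{N-n}{N-1}$ is called the correction factor. If $N$ is large compared with $n$,
$C \to 1$ and the factor can be omitted.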
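
As a rough illustration of the two cases, the sketch below computes the S.D of the sampling distribution of means with and without the correction factor. The function name `standard_error_of_mean` is hypothetical, and the figures (N = 1500, n = 36, S.D 1.36) are borrowed from Problem 1 below purely as sample input.

```python
import math
from typing import Optional


def standard_error_of_mean(sigma: float, n: int, N: Optional[int] = None) -> float:
    """S.D of the sampling distribution of means.

    If N is given, sampling is taken to be without replacement and the
    correction factor (N - n) / (N - 1) is applied to the variance.
    """
    se = sigma / math.sqrt(n)
    if N is not None:
        se *= math.sqrt((N - n) / (N - 1))
    return se


se_with = standard_error_of_mean(1.36, 36)             # with replacement
se_without = standard_error_of_mean(1.36, 36, N=1500)  # without replacement
print(round(se_with, 4), round(se_without, 4))
```

The two values differ only slightly here because N is much larger than n, which is the situation in which the correction factor is usually dropped.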

The S.D $\sigma_{\bar{x}}$ of the sampling distribution is called the standard error (SE) of the means.
If $n \geq 30$ the sample is considered large; otherwise, small.
The sampling distribution of means of large samples is assumed to be normal. The random variable is the
mean $\bar{x}$ and hence the standard normal variate for the sampling distribution of means is given by
$z = \dfrac{\bar{x} - \mu_{\bar{x}}}{\sigma_{\bar{x}}} = \dfrac{\bar{x} - \mu}{\sigma_{\bar{x}}}$
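
A minimal sketch of how the standard normal variate is used: the probability that a sample mean falls in a given range is the area under the standard normal curve between the two z values. The function name and the numbers (mean 50, S.D 12, n = 36) are hypothetical; `statistics.NormalDist` from the Python standard library stands in for the normal tables.

```python
import math
from statistics import NormalDist


def prob_mean_between(lo: float, hi: float, mu: float, sigma: float, n: int) -> float:
    """P(lo < sample mean < hi), assuming the sample mean is normally distributed."""
    se = sigma / math.sqrt(n)   # standard error (sampling with replacement)
    z_lo = (lo - mu) / se
    z_hi = (hi - mu) / se
    std_normal = NormalDist()   # mean 0, S.D 1
    return std_normal.cdf(z_hi) - std_normal.cdf(z_lo)


# Hypothetical population: mean 50, S.D 12; samples of size 36, so SE = 2.
p = prob_mean_between(48, 52, mu=50, sigma=12, n=36)
print(round(p, 4))  # area between z = -1 and z = +1
```

Multiplying such a probability by the number of samples drawn gives the expected number of samples whose mean lies in the range.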
Problems:

1. The weights of 1500 ball bearings are normally distributed with a mean of 635 gms and S.D of 1.36
gms. If 300 random samples of size 36 are drawn from the population, determine the expected mean
and S.D of the sampling distribution of means if sampling is done (a) with replacement (b) without
replacement.
If random sampling is done with replacement, how many samples would have their mean greater
than 635.5 gms?
2. Certain tubes manufactured by a company have a mean lifetime of 800 hours and a standard deviation
of 60 hours. Find the probability that a random sample of 16 tubes taken from the group will have a
mean lifetime (a) between 790 and 810 hours, (b) less than 785 hours, (c) more than 820 hours, (d)
between 770 and 830 hours.

Sampling distribution of proportions


𝒑: population proportion of an item of interest (or binomial probability of success)
𝒑′ : sample proportion (or observed probability of success)

The distribution of possible values of $p'$ is approximately normal, with

Mean = $p$
S.D = $\sqrt{\dfrac{pq}{n}}$, where $q = 1 - p$

$z = \dfrac{p' - p}{\sqrt{pq/n}} = \dfrac{np' - np}{\sqrt{npq}} = \dfrac{x - np}{\sqrt{npq}}$

where $x$ is the number of items of interest in the sample and $np$
is the expected number of items of interest in a sample of size $n$.
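
The normal approximation above can be sketched as follows. The function name and the figures (p = 0.2, n = 400, threshold 23%) are hypothetical, and no continuity correction is applied, so the result is only approximate.

```python
import math
from statistics import NormalDist


def prob_proportion_at_least(threshold: float, p: float, n: int) -> float:
    """P(sample proportion >= threshold) under the normal approximation."""
    q = 1 - p
    sd = math.sqrt(p * q / n)    # S.D of the sample proportion
    z = (threshold - p) / sd
    return 1 - NormalDist().cdf(z)


# Hypothetical case: p = 0.2, n = 400, so S.D = 0.02 and z = 1.5 for 23%.
prob = prob_proportion_at_least(0.23, p=0.2, n=400)
expected_samples = 1000 * prob   # expected count among 1000 such samples
print(round(prob, 4), round(expected_samples, 1))
```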

Problems:

1. Find the probability that in 100 tosses of a fair coin, between 45% and 55% of the outcomes are
heads.
2. Out of 1000 samples of 200 children each, in how many would you expect to find that 55% or more
are girls?
Confidence Interval: The purpose of taking a random sample from a lot or population and computing a
statistic such as the mean from the data, is to approximate the mean of the population. A confidence
interval provides an interval estimate of the population parameter. That is, it provides a range of values
within which the true value of the population parameter is expected to lie.

Confidence intervals are constructed at a confidence level, such as 95% or 99%, selected by the user. A
95% level means that if the same population is sampled on numerous occasions and an interval estimate is
made on each occasion, the resulting intervals would bracket the true population parameter in
approximately 95% of the cases.

In most general terms, for a 95% CI, we say “we are 95% confident that the true population parameter is
between the lower and upper calculated values”.
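
The repeated-sampling interpretation can be checked with a small simulation. This is only a sketch under stated assumptions: a hypothetical normal population (mean 10, S.D 2), samples of size 40, a known S.D, and the usual 95% multiplier 1.96. The function name `coverage` is made up for the illustration.

```python
import math
import random


def coverage(mu=10.0, sigma=2.0, n=40, trials=2000, z=1.96, seed=0):
    """Fraction of simulated 95% intervals that contain the true mean mu."""
    rng = random.Random(seed)                 # fixed seed for repeatability
    half_width = z * sigma / math.sqrt(n)     # z * SE, with sigma assumed known
    hits = 0
    for _ in range(trials):
        sample_mean = sum(rng.gauss(mu, sigma) for _ in range(n)) / n
        if sample_mean - half_width <= mu <= sample_mean + half_width:
            hits += 1
    return hits / trials


print(coverage())  # should come out close to 0.95
```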

Confidence intervals for means

The standard normal variate $z = \dfrac{\bar{x} - \mu}{\sigma/\sqrt{n}}$

99% confidence interval:
We know that 99% of $z$ values lie between $-2.58$ and $+2.58$ (since $2 \times \varphi(2.58) = 0.99$).
Thus, $-2.58 \leq \dfrac{\bar{x} - \mu}{\sigma/\sqrt{n}} \leq 2.58$

This gives the 99% confidence interval for $\mu$ as $\left(\bar{x} - 2.58\dfrac{\sigma}{\sqrt{n}},\ \bar{x} + 2.58\dfrac{\sigma}{\sqrt{n}}\right)$
The 99% confidence limits for $\mu$ are $\bar{x} \pm 2.58\dfrac{\sigma}{\sqrt{n}}$
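95% confidence interval:
Similarly, 95% of $z$ values lie between $-1.96$ and $+1.96$, so the 95% confidence limits for $\mu$
are $\bar{x} \pm 1.96\dfrac{\sigma}{\sqrt{n}}$

Confidence intervals for proportions are obtained in a similar manner from $z = \dfrac{p' - p}{\sqrt{pq/n}}$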
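
A minimal sketch of both kinds of interval. The function names and the data are hypothetical. Two assumptions are made that the notes do not state: the multiplier is taken from `statistics.NormalDist` rather than from tables (so 99% gives about 2.576 instead of the rounded 2.58), and for proportions the unknown $p$ in the S.D is replaced by the sample proportion $p'$.

```python
import math
from statistics import NormalDist


def z_critical(confidence: float) -> float:
    """Two-sided multiplier, e.g. about 1.96 for 0.95."""
    return NormalDist().inv_cdf(0.5 + confidence / 2)


def mean_ci(xbar: float, sd: float, n: int, confidence: float = 0.95):
    """Confidence limits xbar +/- z * sd / sqrt(n) for a large sample."""
    half = z_critical(confidence) * sd / math.sqrt(n)
    return xbar - half, xbar + half


def proportion_ci(x: int, n: int, confidence: float = 0.95):
    """Confidence limits for a proportion, using p' in place of p (assumption)."""
    p_hat = x / n
    half = z_critical(confidence) * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half, p_hat + half


# Hypothetical data: sample mean 50, S.D 10, n = 100; 40 successes in 200 trials.
mean_limits = mean_ci(50, 10, 100)
prop_limits = proportion_ci(40, 200)
print(mean_limits, prop_limits)
```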

Problems:

1. A random sample of 400 items chosen from an infinite population is found to have a mean of 82 and
S.D of 18. Find the 95% confidence limits for the mean of the population from which the sample is
drawn.
2. A sample of 100 days is taken from the meteorological records of a certain district and 10 of them
were found to be foggy. What are the probable limits for the percentage of foggy days?
3. A biased coin is tossed 500 times and head turns up 120 times. Find the 95% confidence limits for the
proportion of heads that turn up in infinitely many tosses.
4. A random sample of 500 apples was taken from a large consignment and 65 were found to be rotten.
Estimate the proportion of bad apples in the consignment.
Testing of Hypothesis
A quantitative statement about the population, formed for the purpose of testing, is called a statistical
hypothesis.

Null Hypothesis ($H_0$): A proposition that undergoes verification to see whether it can be rejected
(nullified) in favour of an alternative hypothesis ($H_A$).
(Alt: Null Hypothesis is a hypothesis framed for the purpose of rejection under the assumption that it is
true.)
Eg.: To test whether population mean = some target value.
$H_0: \mu = \mu_0$
$H_A: \mu \neq \mu_0$ (two-tailed test)
$H_A: \mu < \mu_0$ (left-tailed test)
$H_A: \mu > \mu_0$ (right-tailed test)

Errors:
Type I error: rejecting $H_0$ when it is true.
Type II error: accepting $H_0$ when it is false.

Significance Level: It is the probability of rejecting a null hypothesis when it is true, that is, the
probability of a Type I error. It is usually fixed at 5% or 1%. Rejecting a null hypothesis at the 1% level
of significance is stronger evidence against it than rejecting it at the 5% level.

Steps in hypothesis testing:


1. Formulate H0 and HA.
2. Note the level of significance (and degrees of freedom for t and chi square tests).
3. Compute test statistic using appropriate formula.
4. Decide: Compare the computed value of the test statistic with the critical value (the value
corresponding to the level of significance, obtained from tables). Reject H0 if the computed value
falls in the rejection region.
5. State conclusion in words.
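
The five steps can be sketched for a large-sample test of a mean as below. The function name `z_test_mean`, its parameters and the data are hypothetical; the critical value is taken from `statistics.NormalDist` as a stand-in for the tables.

```python
import math
from statistics import NormalDist


def z_test_mean(xbar, mu0, sigma, n, alpha=0.05, tail="two"):
    """Large-sample z-test of H0: mu = mu0, following the steps above."""
    # Step 1: H0: mu = mu0; HA is chosen through `tail` ("two", "left", "right").
    # Step 2: the level of significance is alpha.
    # Step 3: compute the test statistic.
    z = (xbar - mu0) / (sigma / math.sqrt(n))
    # Step 4: find the critical value and decide.
    std_normal = NormalDist()
    if tail == "two":
        critical = std_normal.inv_cdf(1 - alpha / 2)
        reject = abs(z) > critical
    elif tail == "right":
        critical = std_normal.inv_cdf(1 - alpha)
        reject = z > critical
    elif tail == "left":
        critical = std_normal.inv_cdf(alpha)
        reject = z < critical
    else:
        raise ValueError("tail must be 'two', 'left' or 'right'")
    # Step 5: state the conclusion in words.
    conclusion = "reject H0" if reject else "do not reject H0"
    return z, critical, conclusion


# Hypothetical data: n = 64, sample mean 52, target mean 50, S.D 8.
z, critical, conclusion = z_test_mean(52, 50, 8, 64, alpha=0.05)
print(round(z, 2), round(critical, 2), conclusion)
```

With the same hypothetical data the hypothesis is rejected at the 5% level but not at the 1% level, since the computed value 2.0 lies between the two critical values.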
Large Samples (n ≥ 𝟑𝟎):

Problems
Test of significance of sample mean
Test statistic: $z = \dfrac{\bar{x} - \mu}{\sigma/\sqrt{n}}$

1. The mean life of a sample of 100 tube lights manufactured by a company is found to be 1570 hours with
an S.D of 120 hours. Test the hypothesis that the mean life of all tube lights is 1600 hours.
2. A sample of 900 items is found to have mean 3.4. Can it be reasonably regarded as a truly random
sample from a large population with mean 3.25 and S.D 1.61?
3. A random sample of 100 recorded deaths in a year showed an average life-span of 71.8 years. Assuming
a population S.D of 8.9 years, does the data indicate that the average life-span is greater than 70 years?

Test of significance of proportions

Test statistic: $z = \dfrac{p' - p}{\sqrt{pq/n}}$
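
A short sketch of the test statistic for a proportion, written in the equivalent count form $(x - np)/\sqrt{npq}$. The function name and the coin data are hypothetical.

```python
import math


def z_for_proportion(x: int, n: int, p: float) -> float:
    """z = (x - n p) / sqrt(n p q), where x is the observed count in n trials."""
    q = 1 - p
    return (x - n * p) / math.sqrt(n * p * q)


# Hypothetical data: 216 heads in 400 tosses, testing H0: p = 0.5.
z_coin = z_for_proportion(216, 400, 0.5)
print(round(z_coin, 2))  # compared with 1.96 (5%) or 2.58 (1%) for a two-tailed test
```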

1. A die was thrown 1200 times and the number 6 was obtained 236 times. Can the die be considered fair
at 0.01 level of significance?
2. In 324 throws of a die, an odd number turned up 181 times. Is it reasonable to assume that the die is
unbiased?
3. Of the total number of babies born in a hospital, 230 were girls and 270 were boys. Do these figures
confirm the hypothesis that boys and girls are in equal proportion?
4. A manufacturer claimed that at least 95% of the equipment that he supplied to a factory conformed to
specifications. An examination of a sample of 200 pieces of equipment revealed that 18 of them were
faulty. Test his claim at level of significance 1%.
5. In a sample of 400 parts manufactured by a company, the number of defective parts was found to be 30.
The company however claims that at most 5% of their product is defective. Is the claim tenable?
Test of significance of difference of means and proportions

Problems:

1. To test the equality of two population means, i.e., is $\mu_1 = \mu_2$?

We test the significance of the difference between two independent sample means.

$H_0: \mu_1 = \mu_2$ / The difference $\bar{x}_1 - \bar{x}_2$ is not significant

$z = \dfrac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$, where $\bar{x}_1$ and $\bar{x}_2$ are the means and $s_1$ and $s_2$ are the S.Ds of the two samples

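2. To test whether two independent samples have come from the same population.
If $\sigma$ is known, $z = \dfrac{\bar{x}_1 - \bar{x}_2}{\sigma\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$, taking $s_1 = s_2 = \sigma$

If not, estimate $\sigma$ from the two samples as $\sigma = \sqrt{\dfrac{n_1 s_1^2 + n_2 s_2^2}{n_1 + n_2}}$ and use this value in the formula for $z$.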
3. To test whether the difference in sample proportions of some attribute indicates a difference in the
population proportions w.r.t. that attribute.

$H_0: p_1 = p_2$ / The difference $p_1 - p_2$ is not significant

$z = \dfrac{p_1 - p_2}{\sqrt{PQ\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$, where $P = \dfrac{n_1 p_1 + n_2 p_2}{n_1 + n_2}$ or $\dfrac{x_1 + x_2}{n_1 + n_2}$ and $Q = 1 - P$
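
The two large-sample statistics for differences can be sketched as follows. The function names and all the figures are hypothetical, chosen only so that the arithmetic is easy to follow.

```python
import math


def z_diff_means(x1, s1, n1, x2, s2, n2):
    """z for the difference of two independent sample means (large samples)."""
    se = math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)
    return (x1 - x2) / se


def z_diff_proportions(count1, n1, count2, n2):
    """z for the difference of two sample proportions, using the pooled P."""
    p1, p2 = count1 / n1, count2 / n2
    P = (count1 + count2) / (n1 + n2)   # same as (n1 p1 + n2 p2) / (n1 + n2)
    Q = 1 - P
    se = math.sqrt(P * Q * (1 / n1 + 1 / n2))
    return (p1 - p2) / se


# Hypothetical samples: means 52 and 50, S.Ds 6 and 8, sizes 100 each.
z_means = z_diff_means(52, 6, 100, 50, 8, 100)
# Hypothetical counts: 60 out of 200 against 50 out of 200.
z_props = z_diff_proportions(60, 200, 50, 200)
print(round(z_means, 3), round(z_props, 3))
```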

1. In a school exam, the mean grade of 32 boys is 72 with SD of 8 while the mean grade of 36 girls
was 75 with SD of 6. Test the hypothesis that the performance of girls is better than that of boys.
2. A random sample of 1000 workers in a company has mean wage ₹500 per day and SD ₹150.
Another sample of 1500 workers from another company has mean wage of ₹450 per day and SD of
₹200. Is there a significant difference between the mean wages of the two companies? Find the 95%
confidence limits for the difference in mean wages of the two companies.
3. The mean heights of two large samples of 1000 and 2000 workers are 168.75 cm and 170 cm
respectively. Can the samples be regarded as drawn from the same population of SD 6.25 cm?
4. One type of aircraft is found to develop engine trouble in 5 flights out of a total of 100 and another
type, in 7 flights out of a total of 200 flights. Is there a significant difference in the 2 types of aircraft
with regard to engine defects?
5. In an exit poll enquiry, it was revealed that 600 voters in one locality and 400 voters from
another locality favoured 55% and 48% respectively a particular party. Test the hypothesis that the
difference in opinion is significant.
Student’s t-test
Significance of mean of small samples:
The procedure is the same as that for large samples.
Test statistic (for mean): $t = \dfrac{\bar{x} - \mu}{s/\sqrt{n}}$

The mean and S.D of the sample are calculated using
$\bar{x} = \dfrac{\sum x}{n}$
$s = \sqrt{\dfrac{\sum (x - \bar{x})^2}{n - 1}}$
t-scores corresponding to $\nu = n - 1$ degrees of freedom for specified levels of significance are found
from tables.

Confidence limits for mean:

The 99% confidence limits are $\bar{x} \pm t_{0.01}\dfrac{s}{\sqrt{n}}$ and the 95% confidence limits are $\bar{x} \pm t_{0.05}\dfrac{s}{\sqrt{n}}$

Significance of difference between sample means:

Test statistic $t = \dfrac{\bar{x} - \bar{y}}{s\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$ where $s^2 = \dfrac{1}{n_1 + n_2 - 2}\left[\sum (x - \bar{x})^2 + \sum (y - \bar{y})^2\right]$
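
A minimal sketch of the two t statistics, using the heights of Problem 1 and the diet data of Problem 6 below as sample input. The function names are hypothetical. The Python standard library has no t tables, so the code only computes the statistic; the critical value still has to be read from tables for the appropriate degrees of freedom.

```python
import math
from statistics import mean, stdev


def t_one_sample(data, mu0):
    """t = (xbar - mu0) / (s / sqrt(n)), with s computed using divisor n - 1."""
    n = len(data)
    return (mean(data) - mu0) / (stdev(data) / math.sqrt(n))


def t_two_sample(x, y):
    """Pooled two-sample t for the difference between sample means."""
    n1, n2 = len(x), len(y)
    xbar, ybar = mean(x), mean(y)
    ss = sum((v - xbar) ** 2 for v in x) + sum((v - ybar) ** 2 for v in y)
    s = math.sqrt(ss / (n1 + n2 - 2))   # pooled S.D
    return (xbar - ybar) / (s * math.sqrt(1 / n1 + 1 / n2))


heights = [63, 63, 66, 67, 68, 69, 70, 70, 71, 71]
t_heights = t_one_sample(heights, 66)   # compare with the table value for 9 d.f.

diet_a = [5, 6, 8, 1, 12, 4, 3, 9, 6, 10]
diet_b = [2, 3, 6, 8, 10, 1, 2, 8]
t_diets = t_two_sample(diet_a, diet_b)  # compare with the table value for 16 d.f.
print(round(t_heights, 2), round(t_diets, 2))
```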

Problems:
1. Ten individuals are chosen at random from a population. Their heights in inches are found to be
63,63,66,67,68,69,70,70,71,71. Test the hypothesis that the mean height of the population is 66 inches.
2. A certain stimulus administered to each of 12 patients resulted in the following change of blood
pressure: 5,2,8,-1,3,0,6,-2,1,5,0,4. Can it be concluded that the stimulus has affected the blood
pressure?
3. Eleven school boys were given a test in drawing. They were given a month’s coaching in drawing and
a second test of equal difficulty was held at the end of the month. Do the marks indicate that coaching
has had an impact on performance?
4. A sample of 10 measurements of the diameter of a sphere gave a mean of 12 cm and S.D of 0.15 cm.
Find 95% confidence limits for the actual diameter.
5. Two types of batteries were tested for their length of life and the following results were obtained:
Battery A: $n_1 = 10$, $\bar{x}_1 = 500$ hrs, $s_1^2 = 100$
Battery B: $n_2 = 10$, $\bar{x}_2 = 560$ hrs, $s_2^2 = 121$
Test whether there is a significant difference in the two means.
6. A group of 10 boys were given Diet A and another group of 8 boys were given Diet B for 6 months.
The following increase in weights was recorded:
Diet A: 5 6 8 1 12 4 3 9 6 10
Diet B: 2 3 6 8 10 1 2 8
Test whether the diets differ significantly.
Chi-square test for goodness of fit

Test statistic $\chi^2 = \sum_{i=1}^{n} \dfrac{(O_i - E_i)^2}{E_i}$, where $O_i$ and $E_i$ are the observed and expected frequencies.
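
As a sketch of the computation, the block below evaluates the statistic for the blood-type figures of Problem 2 below (300 children, observed 30%, 45% and 25%, expected ratio 1:2:1). The function names are hypothetical. The result still has to be compared with the tabulated chi-square value for the appropriate degrees of freedom (here taken as the number of classes minus one, i.e. 2).

```python
def expected_from_ratio(total, ratio):
    """Split `total` into expected frequencies in the given ratio."""
    parts = sum(ratio)
    return [total * r / parts for r in ratio]


def chi_square(observed, expected):
    """Sum of (O - E)^2 / E over all classes."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


observed = [90, 135, 75]                          # types M, MN, N out of 300
expected = expected_from_ratio(300, [1, 2, 1])    # 75, 150, 75
chi2 = chi_square(observed, expected)
print(chi2)
```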
1. Five dice were thrown 96 times. The number of dice showing 1, 2 or 3 in each throw has the
following frequency distribution:
No. of dice showing 1, 2 or 3 : 5   4   3   2   1   0
Frequency                     : 7   19  35  24  8   3
Test the hypothesis that the data follow a binomial distribution.
2. Genetic theory states that children having one parent of blood type M and the other of blood type N
will always be of one of the three types M, MN and N and the proportions of these types will on an
average, be 1:2:1. A report states that out of 300 children having one M parent and one N parent, 30%
were found to be of type M, 45% of type MN and the remainder of Type N. Test the theory by chi-
square test.