The sampling units are chosen without replacement in the sense that the units once chosen
are not placed back in the population.
The sampling units are chosen with replacement in the sense that the chosen units are
placed back in the population.
2.
numbers.
3. Choose the sampling unit whose serial number corresponds to the random number drawn.
In case of SRSWR, all the random numbers are accepted even if repeated more than once.
In case of SRSWOR, if any random number is repeated, then it is ignored and more
numbers are drawn.
Such a process can be implemented through programming using the discrete uniform distribution.
Any number between 1 and N can be generated from this distribution, and the corresponding unit can be
selected into the sample by associating an index with each sampling unit. Many statistical software
packages like R, SAS, etc. have inbuilt functions for drawing a sample using SRSWOR or SRSWR.
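As an illustration of the index-based selection described above, here is a minimal Python sketch using the standard `random` module. The function names `srswor` and `srswr` and the ten-unit population are hypothetical and chosen only for this example.

```python
import random

def srswor(units, n, rng=random):
    """Draw n distinct units: simple random sampling without replacement."""
    return rng.sample(list(units), n)

def srswr(units, n, rng=random):
    """Draw n units with replacement: repeated units are accepted."""
    units = list(units)
    N = len(units)
    # randint(1, N) is a discrete uniform draw on the serial numbers 1..N
    return [units[rng.randint(1, N) - 1] for _ in range(n)]

population = list(range(1, 11))   # hypothetical units with serial numbers 1..10
rng = random.Random(0)            # seeded only to make the run repeatable
wor_sample = srswor(population, 4, rng)
wr_sample = srswr(population, 4, rng)
```

`random.sample` never returns the same position twice, which matches the rule of ignoring repeated random numbers under SRSWOR.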
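Notations:
The following notations will be used in further notes:
N: number of sampling units in the population (population size)
n: number of sampling units in the sample (sample size)
Y: the characteristic under consideration
$Y_i$: value of the characteristic for the ith unit of the population
$\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i$ : sample mean
$\bar{Y} = \frac{1}{N}\sum_{i=1}^{N} Y_i$ : population mean
$$S^2 = \frac{1}{N-1}\sum_{i=1}^{N}(Y_i - \bar{Y})^2 = \frac{1}{N-1}\Big(\sum_{i=1}^{N} Y_i^2 - N\bar{Y}^2\Big)$$
$$\sigma^2 = \frac{1}{N}\sum_{i=1}^{N}(Y_i - \bar{Y})^2 = \frac{1}{N}\Big(\sum_{i=1}^{N} Y_i^2 - N\bar{Y}^2\Big)$$
$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(y_i - \bar{y})^2 = \frac{1}{n-1}\Big(\sum_{i=1}^{n} y_i^2 - n\bar{y}^2\Big)$$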
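1. SRSWOR
When n units are selected with SRSWOR, the total number of possible samples is $\binom{N}{n}$. The
probability of drawing a sample is
$$\frac{1}{\binom{N}{n}}.$$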
Note that a unit can be selected at any one of the n draws. Let $u_i$ be the ith unit selected in the
sample. This unit can be selected in the sample either at the first draw, second draw, ..., or nth draw.
Let $P_j(i)$ denote the probability of selection of $u_i$ at the jth draw, j = 1, 2, ..., n. Then the probability that $u_i$ is included in the sample is
$$\sum_{j=1}^{n} P_j(i) = P_1(i) + P_2(i) + \dots + P_n(i) = \underbrace{\frac{1}{N} + \frac{1}{N} + \dots + \frac{1}{N}}_{n \text{ times}} = \frac{n}{N}.$$
Now if $u_1, u_2, \dots, u_n$ are the n units selected in the sample, then the probability of their selection is
$$P(u_1, u_2, \dots, u_n) = P(u_1) \cdot P(u_2) \cdots P(u_n).$$
Note that when the second unit is to be selected, there are (n - 1) units left to be selected in the
sample from the population of (N - 1) units. Similarly, when the third unit is to be selected,
there are (n - 2) units left to be selected in the sample from the population of (N - 2) units, and so on.
If $P(u_1) = \frac{n}{N}$, then
$$P(u_2) = \frac{n-1}{N-1}, \; \dots, \; P(u_n) = \frac{1}{N-n+1}.$$
Thus
$$P(u_1, u_2, \dots, u_n) = \frac{n}{N} \cdot \frac{n-1}{N-1} \cdot \frac{n-2}{N-2} \cdots \frac{1}{N-n+1} = \frac{1}{\binom{N}{n}}.$$
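Alternative approach:
The probability of drawing a sample in SRSWOR can alternatively be found as follows:
Let $u_{i(k)}$ denote the ith unit drawn at the kth draw. Note that the ith unit can be any unit out of the N
units. Then $s_o = (u_{i(1)}, u_{i(2)}, \dots, u_{i(n)})$ is an ordered sample in which the order in which the units
are drawn, i.e., $u_{i(1)}$ drawn at the first draw, $u_{i(2)}$ drawn at the second draw and so on, is also
considered. The probability of selection of such an ordered sample is
$$P(s_o) = P(u_{i(1)})\, P(u_{i(2)} \mid u_{i(1)})\, P(u_{i(3)} \mid u_{i(1)} u_{i(2)}) \cdots P(u_{i(n)} \mid u_{i(1)} u_{i(2)} \dots u_{i(n-1)}).$$
Here $P(u_{i(k)} \mid u_{i(1)} u_{i(2)} \dots u_{i(k-1)})$ is the probability of drawing $u_{i(k)}$ at the kth draw given that
$u_{i(1)}, u_{i(2)}, \dots, u_{i(k-1)}$ have already been drawn in the first (k - 1) draws. Since N - k + 1 units remain in the population at the kth draw, this probability is
$$P(u_{i(k)} \mid u_{i(1)} u_{i(2)} \dots u_{i(k-1)}) = \frac{1}{N-k+1}.$$
So
$$P(s_o) = \prod_{k=1}^{n} \frac{1}{N-k+1} = \frac{(N-n)!}{N!}.$$
A given set of n units can be drawn in n! different orders, and each such ordered sample has probability $\frac{(N-n)!}{N!}$.
So the probability of drawing a sample in which the order in which the units are drawn is
irrelevant is
$$n!\,\frac{(N-n)!}{N!} = \frac{1}{\binom{N}{n}}.$$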
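The two counting arguments can be checked with exact arithmetic. The following is a small sketch, with hypothetical helper names and hypothetical sizes N = 10 and n = 3, that multiplies the conditional probabilities 1/(N - k + 1) and compares the result with the closed forms.

```python
from fractions import Fraction
from math import comb, factorial

def prob_ordered_sample(N, n):
    """Probability of one particular ordered SRSWOR sample: prod of 1/(N-k+1), k=1..n."""
    p = Fraction(1)
    for k in range(1, n + 1):
        p *= Fraction(1, N - k + 1)
    return p

def prob_unordered_sample(N, n):
    """The n! orderings of the same n units all give the same unordered sample."""
    return factorial(n) * prob_ordered_sample(N, n)

N, n = 10, 3  # hypothetical population and sample sizes
p_ordered = prob_ordered_sample(N, n)
p_unordered = prob_unordered_sample(N, n)
```

With these values, `p_ordered` equals (N-n)!/N! and `p_unordered` equals 1 over the binomial coefficient `comb(N, n)`.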
2. SRSWR
When n units are selected with SRSWR, the total number of possible samples is $N^n$. The
probability of drawing a sample is
$$\frac{1}{N^n}.$$
Alternatively, let $u_i$ be the ith unit selected in the sample. This unit can be selected in the sample
either at the first draw, second draw, ..., or nth draw. At any stage, there are always N units in the
population in case of SRSWR, so the probability of selection of $u_i$ at any stage is 1/N for all i =
1, 2, ..., n. Then the probability of selection of n units $u_1, u_2, \dots, u_n$ in the sample is
$$P(u_1, u_2, \dots, u_n) = P(u_1) \cdot P(u_2) \cdots P(u_n) = \frac{1}{N} \cdot \frac{1}{N} \cdots \frac{1}{N} = \frac{1}{N^n}.$$
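1. SRSWOR
For a unit $u_j$ to be selected at the kth draw, it must not be selected at any of the first (k - 1) draws and must then be selected at the kth draw. So
$$P[\text{selection of } u_j \text{ at kth draw}] = \Big(1 - \frac{1}{N}\Big)\Big(1 - \frac{1}{N-1}\Big)\Big(1 - \frac{1}{N-2}\Big) \cdots \Big(1 - \frac{1}{N-k+2}\Big)\frac{1}{N-k+1}$$
$$= \frac{N-1}{N} \cdot \frac{N-2}{N-1} \cdots \frac{N-k+1}{N-k+2} \cdot \frac{1}{N-k+1} = \frac{1}{N}.$$
2. SRSWR
$$P[\text{selection of } u_j \text{ at kth draw}] = \frac{1}{N}.$$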
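The SRSWOR result can be verified by brute force. In this sketch (the function name and the sizes N = 5, n = 3 are hypothetical), all ordered samples are enumerated, using the fact that each ordered SRSWOR sample is equally likely.

```python
from fractions import Fraction
from itertools import permutations

def prob_unit_at_draw(N, n, unit, k):
    """Exact P[unit is selected at the kth draw] under SRSWOR, by enumeration."""
    # Every ordered sample of n distinct units has the same probability (N-n)!/N!
    ordered = list(permutations(range(1, N + 1), n))
    hits = sum(1 for s in ordered if s[k - 1] == unit)
    return Fraction(hits, len(ordered))

# Probability that unit 2 is picked at draw 1, 2 and 3 when N = 5, n = 3
probs = [prob_unit_at_draw(5, 3, unit=2, k=k) for k in (1, 2, 3)]
```

Each entry of `probs` is 1/5, the same value that SRSWR gives at every draw.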
Consider the sample mean $\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i$ as an estimator of the population mean $\bar{Y} = \frac{1}{N}\sum_{i=1}^{N} Y_i$.
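SRSWOR
Let $t_i = \sum_{i=1}^{n} y_i$. Then
$$E(\bar{y}) = \frac{1}{n} E\Big(\sum_{i=1}^{n} y_i\Big) = \frac{1}{n} E(t_i) = \frac{1}{n} \cdot \frac{1}{\binom{N}{n}} \sum_{i=1}^{\binom{N}{n}} t_i = \frac{1}{n} \cdot \frac{1}{\binom{N}{n}} \sum_{i=1}^{\binom{N}{n}} \Big(\sum_{i=1}^{n} y_i\Big).$$
When n units are sampled from N units without replacement, each unit of the population
can occur with other units selected out of the remaining (N - 1) units in the population, and each unit
occurs in $\binom{N-1}{n-1}$ of the $\binom{N}{n}$ possible samples. So
$$\sum_{i=1}^{\binom{N}{n}} \Big(\sum_{i=1}^{n} y_i\Big) = \binom{N-1}{n-1} \sum_{i=1}^{N} y_i.$$
Now
$$E(\bar{y}) = \frac{(N-1)!}{(n-1)!\,(N-n)!} \cdot \frac{n!\,(N-n)!}{n\,N!} \sum_{i=1}^{N} y_i = \frac{1}{N} \sum_{i=1}^{N} y_i = \bar{Y}.$$
Thus $\bar{y}$ is an unbiased estimator of $\bar{Y}$. Alternatively, the following approach can also be adopted to
show the unbiasedness property.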
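$$E(\bar{y}) = \frac{1}{n} \sum_{j=1}^{n} E(y_j) = \frac{1}{n} \sum_{j=1}^{n} \Big[\sum_{i=1}^{N} Y_i P_j(i)\Big] = \frac{1}{n} \sum_{j=1}^{n} \Big[\sum_{i=1}^{N} Y_i \cdot \frac{1}{N}\Big] = \frac{1}{n} \sum_{j=1}^{n} \bar{Y} = \bar{Y}.$$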
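SRSWR
$$E(\bar{y}) = \frac{1}{n} E\Big(\sum_{i=1}^{n} y_i\Big) = \frac{1}{n} \sum_{i=1}^{n} E(y_i) = \frac{1}{n} \sum_{i=1}^{n} (Y_1 P_1 + \dots + Y_N P_N) = \frac{1}{n} \sum_{i=1}^{n} \bar{Y} = \bar{Y},$$
where $P_i = \frac{1}{N}$ for all i = 1, 2, ..., N is the probability of selection of a unit. Thus $\bar{y}$ is an unbiased
estimator of the population mean under SRSWR also.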
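Unbiasedness under both schemes can be confirmed on a small example by averaging the sample mean over every possible sample. This is a sketch only; the five population values and the function names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations, product

def mean(values):
    return Fraction(sum(values), len(values))

def expected_sample_mean(population, n, with_replacement):
    """Average the sample mean over all equally likely samples of size n."""
    if with_replacement:
        samples = list(product(population, repeat=n))   # N**n ordered samples
    else:
        samples = list(combinations(population, n))     # C(N, n) samples
    return sum(mean(s) for s in samples) / len(samples)

Y = [3, 7, 8, 12, 20]  # hypothetical population values
e_wor = expected_sample_mean(Y, 2, with_replacement=False)
e_wr = expected_sample_mean(Y, 2, with_replacement=True)
```

Both `e_wor` and `e_wr` coincide exactly with the population mean `mean(Y)`.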
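$$V(\bar{y}) = E(\bar{y} - \bar{Y})^2 = E\Big[\frac{1}{n} \sum_{i=1}^{n} (y_i - \bar{Y})\Big]^2$$
$$= E\Big[\frac{1}{n^2} \sum_{i=1}^{n} (y_i - \bar{Y})^2 + \frac{1}{n^2} \sum_{i \neq j}^{n} (y_i - \bar{Y})(y_j - \bar{Y})\Big]$$
$$= \frac{1}{n^2} \sum_{i=1}^{n} E(y_i - \bar{Y})^2 + \frac{1}{n^2} \sum_{i \neq j}^{n} E(y_i - \bar{Y})(y_j - \bar{Y})$$
$$= \frac{1}{n^2} \sum_{i=1}^{n} \sigma^2 + \frac{K}{n^2} = \frac{N-1}{Nn} S^2 + \frac{K}{n^2},$$
where $K = \sum_{i \neq j}^{n} E(y_i - \bar{Y})(y_j - \bar{Y})$.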
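SRSWOR
$$K = \sum_{i \neq j}^{n} E(y_i - \bar{Y})(y_j - \bar{Y}).$$
Consider
$$E(y_i - \bar{Y})(y_j - \bar{Y}) = \frac{1}{N(N-1)} \sum_{k \neq \ell}^{N} (y_k - \bar{Y})(y_\ell - \bar{Y}).$$
Since
$$\Big[\sum_{k=1}^{N} (y_k - \bar{Y})\Big]^2 = \sum_{k=1}^{N} (y_k - \bar{Y})^2 + \sum_{k \neq \ell}^{N} (y_k - \bar{Y})(y_\ell - \bar{Y}),$$
$$0 = (N-1) S^2 + \sum_{k \neq \ell}^{N} (y_k - \bar{Y})(y_\ell - \bar{Y}),$$
we have
$$\frac{1}{N(N-1)} \sum_{k \neq \ell}^{N} (y_k - \bar{Y})(y_\ell - \bar{Y}) = \frac{1}{N(N-1)} \big[-(N-1) S^2\big] = -\frac{S^2}{N}.$$
Thus $K = -n(n-1)\frac{S^2}{N}$, and substituting the value of K, the variance of $\bar{y}$ under SRSWOR is
$$V(\bar{y}_{WOR}) = \frac{N-1}{Nn} S^2 - \frac{1}{n^2}\, n(n-1) \frac{S^2}{N} = \frac{N-n}{Nn} S^2.$$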
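SRSWR
$$K = \sum_{i \neq j}^{n} E(y_i - \bar{Y})(y_j - \bar{Y}) = \sum_{i \neq j}^{n} E(y_i - \bar{Y})\, E(y_j - \bar{Y}) = 0$$
because the ith and jth draws ($i \neq j$) are independent.
Thus the variance of $\bar{y}$ under SRSWR is
$$V(\bar{y}_{WR}) = \frac{N-1}{Nn} S^2.$$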
For an infinite (sufficiently large) population the variance of $\bar{y}$ is $\frac{S^2}{n}$, so the factor $\frac{N-n}{N}$ is responsible for changing the
variance of $\bar{y}$ when the sample is drawn from a finite population in comparison to an infinite
population. This is why $\frac{N-n}{N}$ is called the finite population correction (fpc). It may be noted that
$\frac{N-n}{N} = 1 - \frac{n}{N}$, so $\frac{N-n}{N}$ is close to 1 if the ratio of sample size to population size, $\frac{n}{N}$, is very small or
negligible. The term $\frac{n}{N}$ is called the sampling fraction. In practice, the fpc can be ignored whenever
$\frac{n}{N} < 5\%$ and for many purposes even if it is as high as 10%. Ignoring the fpc will result in the
overestimation of the variance of $\bar{y}$.
Thus
$$V(\bar{y}_{WR}) > V(\bar{y}_{WOR})$$
and so SRSWOR is more efficient than SRSWR.
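The two variance formulas and their ordering can be checked exactly by enumerating all samples from a small population. This is a sketch under stated assumptions: the population values and the function name `sampling_variance` are hypothetical.

```python
from fractions import Fraction
from itertools import combinations, product

def sampling_variance(population, n, with_replacement):
    """Exact variance of the sample mean over all equally likely samples."""
    N = len(population)
    Ybar = Fraction(sum(population), N)
    samples = (list(product(population, repeat=n)) if with_replacement
               else list(combinations(population, n)))
    return sum((Fraction(sum(s), n) - Ybar) ** 2 for s in samples) / len(samples)

Y = [3, 7, 8, 12, 20]  # hypothetical population values
N, n = len(Y), 3
Ybar = Fraction(sum(Y), N)
S2 = sum((y - Ybar) ** 2 for y in Y) / (N - 1)   # population S^2 (divisor N - 1)
v_wor = sampling_variance(Y, n, with_replacement=False)
v_wr = sampling_variance(Y, n, with_replacement=True)
```

Here `v_wor` equals (N-n)/(Nn) times `S2` and `v_wr` equals (N-1)/(Nn) times `S2`, so `v_wr` exceeds `v_wor` whenever n > 1.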
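Consider
$$s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (y_i - \bar{y})^2 = \frac{1}{n-1} \sum_{i=1}^{n} \big[(y_i - \bar{Y}) - (\bar{y} - \bar{Y})\big]^2 = \frac{1}{n-1} \Big[\sum_{i=1}^{n} (y_i - \bar{Y})^2 - n(\bar{y} - \bar{Y})^2\Big].$$
Taking expectation,
$$E(s^2) = \frac{1}{n-1} \Big[\sum_{i=1}^{n} E(y_i - \bar{Y})^2 - nE(\bar{y} - \bar{Y})^2\Big] = \frac{1}{n-1} \Big[\sum_{i=1}^{n} Var(y_i) - n\,Var(\bar{y})\Big] = \frac{1}{n-1} \big[n\sigma^2 - n\,Var(\bar{y})\big].$$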
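In case of SRSWOR
$$V(\bar{y}_{WOR}) = \frac{N-n}{Nn} S^2$$
and so
$$E(s^2) = \frac{n}{n-1} \Big[\sigma^2 - \frac{N-n}{Nn} S^2\Big] = \frac{n}{n-1} \Big[\frac{N-1}{N} S^2 - \frac{N-n}{Nn} S^2\Big] = S^2.$$
In case of SRSWR
$$V(\bar{y}_{WR}) = \frac{N-1}{Nn} S^2$$
and so
$$E(s^2) = \frac{n}{n-1} \Big[\sigma^2 - \frac{N-1}{Nn} S^2\Big] = \frac{n}{n-1} \Big[\frac{N-1}{N} S^2 - \frac{N-1}{Nn} S^2\Big] = \frac{N-1}{N} S^2 = \sigma^2.$$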
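Hence
$$E(s^2) = \begin{cases} S^2 & \text{in SRSWOR} \\ \sigma^2 & \text{in SRSWR.} \end{cases}$$
An unbiased estimate of $Var(\bar{y})$ is
$$\widehat{Var}(\bar{y}_{WOR}) = \frac{N-n}{Nn}\, s^2 \quad \text{in case of SRSWOR}$$
and
$$\widehat{Var}(\bar{y}_{WR}) = \frac{N-1}{Nn} \cdot \frac{N}{N-1}\, s^2 = \frac{s^2}{n} \quad \text{in case of SRSWR.}$$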
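The expectations of $s^2$ under the two schemes can be confirmed by exact enumeration on a small population. The values and function names in this sketch are hypothetical.

```python
from fractions import Fraction
from itertools import combinations, product

def sample_variance(sample):
    """s^2 with divisor n - 1."""
    n = len(sample)
    ybar = Fraction(sum(sample), n)
    return sum((y - ybar) ** 2 for y in sample) / (n - 1)

def expected_s2(population, n, with_replacement):
    """Average s^2 over all equally likely samples of size n."""
    samples = (list(product(population, repeat=n)) if with_replacement
               else list(combinations(population, n)))
    return sum(sample_variance(s) for s in samples) / len(samples)

Y = [3, 7, 8, 12, 20]  # hypothetical population values
N = len(Y)
Ybar = Fraction(sum(Y), N)
S2 = sum((y - Ybar) ** 2 for y in Y) / (N - 1)
sigma2 = Fraction(N - 1, N) * S2
e_s2_wor = expected_s2(Y, 3, with_replacement=False)
e_s2_wr = expected_s2(Y, 3, with_replacement=True)
```

The enumeration gives `e_s2_wor` equal to `S2` and `e_s2_wr` equal to `sigma2`.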
Standard errors
The standard error of $\bar{y}$ is defined as $\sqrt{Var(\bar{y})}$.
In order to estimate the standard error, one simple option is to consider the square root of the estimate of
the variance of the sample mean.
Under SRSWOR, a possible estimator is $\hat{\sigma}(\bar{y}) = \sqrt{\frac{N-n}{Nn}}\; s$.
Under SRSWR, a possible estimator is $\hat{\sigma}(\bar{y}) = \sqrt{\frac{N-1}{Nn}}\; s$.
It is to be noted that this estimator does not possess the same properties as $\widehat{Var}(\bar{y})$.
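A minimal sketch of the SRSWOR standard error estimate follows. The function name, the sample values and the population size N = 100 are hypothetical; only the SRSWOR formula is coded.

```python
import math
import statistics

def estimated_se_wor(sample, N):
    """sqrt((N - n) / (N n)) * s for a sample drawn by SRSWOR from N units."""
    n = len(sample)
    s = statistics.stdev(sample)  # sample standard deviation, divisor n - 1
    return math.sqrt((N - n) / (N * n)) * s

sample = [12.0, 15.0, 11.0, 18.0]  # hypothetical observations
se = estimated_se_wor(sample, N=100)
```

Because of the finite population correction, `se` is slightly smaller than the familiar `s / sqrt(n)`.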
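Consider s as an estimator of S.
Let
$$s^2 = S^2 + \varepsilon \quad \text{with } E(\varepsilon) = 0, \; E(\varepsilon^2) = Var(s^2).$$
Write
$$s = (S^2 + \varepsilon)^{1/2} = S\Big(1 + \frac{\varepsilon}{S^2}\Big)^{1/2} = S\Big(1 + \frac{\varepsilon}{2S^2} - \frac{\varepsilon^2}{8S^4} + \dots\Big)$$
assuming $\varepsilon$ will be small as compared to $S^2$; as n becomes large, the probability of such an
event approaches one. Neglecting the powers of $\varepsilon$ higher than two and taking expectation, we have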
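$$E(s) = \Big[1 - \frac{Var(s^2)}{8S^4}\Big] S$$
where
$$Var(s^2) = \frac{2S^4}{n-1} \Big[1 + \frac{n-1}{2n} (\beta_2 - 3)\Big] \quad \text{for large } N,$$
$$\mu_j = \frac{1}{N} \sum_{i=1}^{N} (Y_i - \bar{Y})^j,$$
$$\beta_2 = \frac{\mu_4}{S^4} : \text{coefficient of kurtosis.}$$
Thus
$$E(s) = S \Big[1 - \frac{1}{4(n-1)} - \frac{\beta_2 - 3}{8n}\Big]$$
$$Var(s) = S^2 - S^2 \Big[1 - \frac{1}{8} \frac{Var(s^2)}{S^4}\Big]^2 \approx \frac{Var(s^2)}{4S^2} = \frac{S^2}{2(n-1)} \Big[1 + \frac{n-1}{2n} (\beta_2 - 3)\Big].$$
For a normal population, $\beta_2 = 3$ and
$$Var(s) = \frac{S^2}{2(n-1)}.$$
Both $Var(s)$ and $Var(s^2)$ are inflated due to nonnormality to the same extent, by the inflation factor
$$\Big[1 + \frac{n-1}{2n} (\beta_2 - 3)\Big].$$
This is an important result to be kept in mind while determining the sample size in which it is
assumed that S is known.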
Alternative approach:
The results for the unbiasedness property and the variance of the sample mean can also be proved in an
alternative way as follows:
(i) SRSWOR
With the ith unit of the population, we associate a random variable $a_i$ defined as follows:
$$a_i = \begin{cases} 1, & \text{if the ith unit occurs in the sample} \\ 0, & \text{if the ith unit does not occur in the sample} \end{cases} \quad (i = 1, 2, \dots, N).$$
Then,
$$E(a_i) = 1 \times \text{Probability that the ith unit is included in the sample} = \frac{n}{N}, \quad i = 1, 2, \dots, N,$$
$$E(a_i^2) = 1 \times \text{Probability that the ith unit is included in the sample} = \frac{n}{N}, \quad i = 1, 2, \dots, N,$$
$$E(a_i a_j) = 1 \times \text{Probability that the ith and jth units are included in the sample} = \frac{n(n-1)}{N(N-1)}, \quad i \neq j = 1, 2, \dots, N.$$
$$Var(a_i) = E(a_i^2) - \big(E(a_i)\big)^2 = \frac{n(N-n)}{N^2}, \quad i = 1, 2, \dots, N,$$
$$Cov(a_i, a_j) = E(a_i a_j) - E(a_i)E(a_j) = -\frac{n(N-n)}{N^2(N-1)}, \quad i \neq j = 1, 2, \dots, N.$$
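We can rewrite the sample mean as
$$\bar{y} = \frac{1}{n} \sum_{i=1}^{N} a_i y_i.$$
Then
$$E(\bar{y}) = \frac{1}{n} \sum_{i=1}^{N} E(a_i)\, y_i = \bar{Y}$$
and
$$Var(\bar{y}) = \frac{1}{n^2} Var\Big(\sum_{i=1}^{N} a_i y_i\Big) = \frac{1}{n^2} \Big[\sum_{i=1}^{N} Var(a_i)\, y_i^2 + \sum_{i \neq j}^{N} Cov(a_i, a_j)\, y_i y_j\Big].$$
Substituting the values of $Var(a_i)$ and $Cov(a_i, a_j)$ in the expression of $Var(\bar{y})$ and simplifying, we
get
$$Var(\bar{y}) = \frac{N-n}{Nn} S^2.$$
To show that $E(s^2) = S^2$, consider
$$s^2 = \frac{1}{n-1} \Big[\sum_{i=1}^{n} y_i^2 - n\bar{y}^2\Big] = \frac{1}{n-1} \Big[\sum_{i=1}^{N} a_i y_i^2 - n\bar{y}^2\Big].$$
Hence, taking expectation,
$$E(s^2) = \frac{1}{n-1} \Big[\sum_{i=1}^{N} E(a_i)\, y_i^2 - n\big\{Var(\bar{y}) + \bar{Y}^2\big\}\Big].$$
Substituting the values of $E(a_i)$ and $Var(\bar{y})$ in this expression and simplifying, we get $E(s^2) = S^2$.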
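(ii) SRSWR
Let a random variable $a_i$ associated with the ith unit of the population denote the number of times
the ith unit occurs in the sample, i = 1, 2, ..., N.
So $a_i$ takes the values 0, 1, ..., n, and the joint distribution of $a_1, a_2, \dots, a_N$ is the multinomial distribution
$$P(a_1, a_2, \dots, a_N) = \frac{n!}{\prod_{i=1}^{N} a_i!} \cdot \frac{1}{N^n}, \quad \text{where } \sum_{i=1}^{N} a_i = n.$$
For this distribution,
$$E(a_i) = \frac{n}{N},$$
$$Var(a_i) = \frac{n(N-1)}{N^2}, \quad i = 1, 2, \dots, N,$$
$$Cov(a_i, a_j) = -\frac{n}{N^2}, \quad i \neq j = 1, 2, \dots, N.$$
We rewrite the sample mean as
$$\bar{y} = \frac{1}{n} \sum_{i=1}^{N} a_i y_i.$$
Hence, taking expectation of $\bar{y}$ and substituting the value of $E(a_i) = n/N$, we obtain
$$E(\bar{y}) = \bar{Y}.$$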
Further,
$$Var(\bar{y}) = \frac{1}{n^2} \Big[\sum_{i=1}^{N} Var(a_i)\, y_i^2 + \sum_{i \neq j}^{N} Cov(a_i, a_j)\, y_i y_j\Big].$$
Substituting the values $Var(a_i) = n(N-1)/N^2$ and $Cov(a_i, a_j) = -n/N^2$ and simplifying, we get
$$Var(\bar{y}) = \frac{N-1}{Nn} S^2.$$
To prove that $E(s^2) = \frac{N-1}{N} S^2 = \sigma^2$ in SRSWR, consider
$$(n-1)\, s^2 = \sum_{i=1}^{n} y_i^2 - n\bar{y}^2 = \sum_{i=1}^{N} a_i y_i^2 - n\bar{y}^2,$$
$$(n-1)\, E(s^2) = \sum_{i=1}^{N} E(a_i)\, y_i^2 - n\big\{Var(\bar{y}) + \bar{Y}^2\big\} = \frac{n}{N} \sum_{i=1}^{N} y_i^2 - n \cdot \frac{N-1}{nN} S^2 - n\bar{Y}^2 = \frac{(n-1)(N-1)}{N} S^2,$$
$$E(s^2) = \frac{N-1}{N} S^2 = \sigma^2.$$
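The population total is
$$Y_T = \sum_{i=1}^{N} Y_i = N\bar{Y},$$
which can be estimated by $\hat{Y}_T = N\bar{y}$.
Obviously
$$E(\hat{Y}_T) = N E(\bar{y}) = N\bar{Y},$$
$$Var(\hat{Y}_T) = N^2\, Var(\bar{y}) = \begin{cases} N^2 \dfrac{N-n}{Nn} S^2 = \dfrac{N(N-n)}{n} S^2 & \text{for SRSWOR} \\[2ex] N^2 \dfrac{N-1}{Nn} S^2 = \dfrac{N(N-1)}{n} S^2 & \text{for SRSWR,} \end{cases}$$
and the estimates of the variance of $\hat{Y}_T$ are
$$\widehat{Var}(\hat{Y}_T) = \begin{cases} \dfrac{N(N-n)}{n}\, s^2 & \text{for SRSWOR} \\[2ex] \dfrac{N^2}{n}\, s^2 & \text{for SRSWR.} \end{cases}$$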
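If the population is assumed to be normally distributed, then
$$\frac{\bar{y} - \bar{Y}}{\sqrt{Var(\bar{y})}}$$
follows N(0,1) when $\sigma^2$ is known. If $\sigma^2$ is unknown and is estimated from the sample, then
$$\frac{\bar{y} - \bar{Y}}{\sqrt{\widehat{Var}(\bar{y})}}$$
follows a t-distribution with (n - 1) degrees of freedom. When $\sigma^2$ is known, the
$100(1-\alpha)\%$ confidence interval is given by
$$P\Big[-Z_{\alpha/2} \le \frac{\bar{y} - \bar{Y}}{\sqrt{Var(\bar{y})}} \le Z_{\alpha/2}\Big] = 1 - \alpha$$
$$\text{or } P\Big[\bar{y} - Z_{\alpha/2} \sqrt{Var(\bar{y})} \le \bar{Y} \le \bar{y} + Z_{\alpha/2} \sqrt{Var(\bar{y})}\Big] = 1 - \alpha,$$
and the confidence limits are
$$\Big(\bar{y} - Z_{\alpha/2} \sqrt{Var(\bar{y})}, \; \bar{y} + Z_{\alpha/2} \sqrt{Var(\bar{y})}\Big),$$
where $Z_{\alpha/2}$ denotes the upper $\alpha/2$ point of the N(0,1) distribution. Similarly, when $\sigma^2$ is unknown, the $100(1-\alpha)\%$ confidence interval is
$$P\Big[-t_{\alpha/2} \le \frac{\bar{y} - \bar{Y}}{\sqrt{\widehat{Var}(\bar{y})}} \le t_{\alpha/2}\Big] = 1 - \alpha$$
$$\text{or } P\Big[\bar{y} - t_{\alpha/2} \sqrt{\widehat{Var}(\bar{y})} \le \bar{Y} \le \bar{y} + t_{\alpha/2} \sqrt{\widehat{Var}(\bar{y})}\Big] = 1 - \alpha,$$
and the confidence limits are
$$\Big(\bar{y} - t_{\alpha/2} \sqrt{\widehat{Var}(\bar{y})}, \; \bar{y} + t_{\alpha/2} \sqrt{\widehat{Var}(\bar{y})}\Big),$$
where $t_{\alpha/2}$ denotes the upper $\alpha/2$ point of the t-distribution with (n - 1) degrees of freedom.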
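For the known-variance case, the confidence limits can be computed with the standard library. This sketch assumes SRSWOR, so that the variance of the sample mean is (N-n)/(Nn) times S^2; the function name and the numerical inputs are hypothetical.

```python
import math
from statistics import NormalDist

def ci_mean_known_variance(ybar, S2, n, N, alpha=0.05):
    """100(1 - alpha)% limits ybar -/+ Z * sqrt(Var(ybar)) under SRSWOR."""
    var_ybar = (N - n) / (N * n) * S2          # variance of the sample mean
    z = NormalDist().inv_cdf(1 - alpha / 2)    # upper alpha/2 point of N(0, 1)
    half_width = z * math.sqrt(var_ybar)
    return ybar - half_width, ybar + half_width

# Hypothetical inputs: sample mean 50, S^2 = 25, n = 25 drawn from N = 1000
lower, upper = ci_mean_known_variance(ybar=50.0, S2=25.0, n=25, N=1000)
```

The t-based interval has the same shape with the normal quantile replaced by a t quantile, which the standard library does not provide.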
An important constraint in determining the sample size is that the population standard deviation S
must be known for these criteria. The reason for this
will be clear when we derive the sample size in the next section. A question arises about how to
have information about S beforehand. One possible solution is to conduct a pilot
survey, collect a preliminary sample of small size, estimate S, and use the estimate as the known value of S.
Alternatively, such information can also be collected from past data, past experience, long
association of the experimenter with the experiment, prior information, etc.
Now we find the sample size under different criteria assuming that the samples have been drawn
using SRSWOR. The case for SRSWR can be derived similarly.
1. Prespecified variance
The sample size is to be determined such that the variance of $\bar{y}$ should not exceed a given value, say
V. In this case, find n such that
$$Var(\bar{y}) \le V$$
$$\text{or } \frac{N-n}{Nn} S^2 \le V$$
$$\text{or } \frac{1}{n} - \frac{1}{N} \le \frac{V}{S^2}$$
$$\text{or } \frac{1}{n} - \frac{1}{N} \le \frac{1}{n_e}$$
$$\text{or } n \ge \frac{n_e}{1 + \dfrac{n_e}{N}}$$
where $n_e = \dfrac{S^2}{V}$.
It may be noted here that $n_e$ can be known only when $S^2$ is known. This compels us to assume
that S is known. The same reason will also be seen in other cases.
The smallest sample size needed in this case is
$$n_{smallest} = \frac{n_e}{1 + \dfrac{n_e}{N}}.$$
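A short sketch of this rule in Python follows. The function name and the numbers are hypothetical, and rounding up to the next integer is an assumption added here, since a sample size must be a whole number.

```python
import math

def n_for_variance(S2, V, N):
    """Smallest integer n with (N - n) / (N n) * S2 <= V under SRSWOR."""
    n_e = S2 / V
    return math.ceil(n_e / (1 + n_e / N))  # rounding up is an added assumption

def var_ybar(S2, n, N):
    """Variance of the sample mean under SRSWOR."""
    return (N - n) / (N * n) * S2

n_req = n_for_variance(S2=100.0, V=2.0, N=1000)  # hypothetical inputs
```

With these inputs $n_e = 50$ and the required size is 48; one unit fewer would push the variance above V.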
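2. Prespecified estimation error
It may be required that the sample mean $\bar{y}$ should not differ from the population mean $\bar{Y}$ by more than a specified amount of absolute
estimation error e, which is a small quantity. Such a requirement can be satisfied by associating a
probability $(1-\alpha)$ with it and can be expressed as
$$P\big[\,|\bar{y} - \bar{Y}| \le e\,\big] = 1 - \alpha.$$
Since $\bar{y}$ follows $N\big(\bar{Y}, \frac{N-n}{Nn} S^2\big)$ assuming the normal distribution for the population, we can write
$$P\Big[\frac{|\bar{y} - \bar{Y}|}{\sqrt{Var(\bar{y})}} \le \frac{e}{\sqrt{Var(\bar{y})}}\Big] = 1 - \alpha,$$
which implies that
$$\frac{e}{\sqrt{Var(\bar{y})}} = Z_{\alpha/2}$$
$$\text{or } Z_{\alpha/2}^2\, Var(\bar{y}) = e^2$$
$$\text{or } Z_{\alpha/2}^2\, \frac{N-n}{Nn} S^2 = e^2$$
$$\text{or } n = \frac{\Big(\dfrac{Z_{\alpha/2} S}{e}\Big)^2}{1 + \dfrac{1}{N} \Big(\dfrac{Z_{\alpha/2} S}{e}\Big)^2}.$$
If N is large, then
$$n = \Big(\frac{Z_{\alpha/2} S}{e}\Big)^2.$$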
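The formula can be evaluated as follows. This is a sketch: the function name and inputs are hypothetical, and rounding up to an integer is an assumption added here because the derivation gives a real-valued n.

```python
import math
from statistics import NormalDist

def n_for_error(S, e, N, alpha=0.05):
    """Sample size so that |ybar - Ybar| <= e with probability 1 - alpha (SRSWOR)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)   # upper alpha/2 point of N(0, 1)
    n0 = (z * S / e) ** 2                     # large-N sample size
    return math.ceil(n0 / (1 + n0 / N))       # rounding up is an added assumption

n_finite = n_for_error(S=10.0, e=2.0, N=1000)     # hypothetical inputs
n_large = n_for_error(S=10.0, e=2.0, N=10**9)     # effectively infinite population
```

For these inputs the finite population needs 88 units, against 97 when N is very large.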
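3. Prespecified width of confidence interval
If the width of the confidence interval of $\bar{y}$ with confidence coefficient $(1-\alpha)$ should not exceed a prespecified amount W, then the sample size n is to be determined such that
$$2 Z_{\alpha/2} \sqrt{Var(\bar{y})} \le W$$
$$\text{or } 2 Z_{\alpha/2} \sqrt{\frac{N-n}{Nn}}\; S \le W$$
$$\text{or } 4 Z_{\alpha/2}^2 S^2 \Big(\frac{1}{n} - \frac{1}{N}\Big) \le W^2$$
$$\text{or } \frac{1}{n} \le \frac{1}{N} + \frac{W^2}{4 Z_{\alpha/2}^2 S^2}$$
$$\text{or } n \ge \frac{\dfrac{4 Z_{\alpha/2}^2 S^2}{W^2}}{1 + \dfrac{4 Z_{\alpha/2}^2 S^2}{N W^2}}.$$
The minimum sample size required is
$$n_{smallest} = \frac{\dfrac{4 Z_{\alpha/2}^2 S^2}{W^2}}{1 + \dfrac{4 Z_{\alpha/2}^2 S^2}{N W^2}}.$$
If N is large then
$$n \ge \frac{4 Z_{\alpha/2}^2 S^2}{W^2}$$
and the minimum sample size needed is
$$n_{smallest} = \frac{4 Z_{\alpha/2}^2 S^2}{W^2}.$$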
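4. Prespecified coefficient of variation
If it is desired that the coefficient of variation of $\bar{y}$ should not exceed a given or prespecified
value of the coefficient of variation, say $C_0$, then the required sample size n is to be determined such
that
$$CV(\bar{y}) \le C_0$$
$$\text{or } \frac{\sqrt{Var(\bar{y})}}{\bar{Y}} \le C_0$$
$$\text{or } \frac{\dfrac{N-n}{Nn} S^2}{\bar{Y}^2} \le C_0^2$$
$$\text{or } \frac{1}{n} - \frac{1}{N} \le \frac{C_0^2}{C^2}$$
$$\text{or } n \ge \frac{\dfrac{C^2}{C_0^2}}{1 + \dfrac{C^2}{N C_0^2}}$$
is the required sample size, where $C = \dfrac{S}{\bar{Y}}$ is the population coefficient of variation.
The smallest sample size needed in this case is
$$n_{smallest} = \frac{\dfrac{C^2}{C_0^2}}{1 + \dfrac{C^2}{N C_0^2}}.$$
If N is large, then
$$n \ge \frac{C^2}{C_0^2} \quad \text{and} \quad n_{smallest} = \frac{C^2}{C_0^2}.$$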
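A sketch of the coefficient-of-variation rule follows. The function names and inputs are hypothetical, and rounding up to an integer is an assumption added here.

```python
import math

def n_for_cv(C, C0, N):
    """Smallest integer n with CV(ybar) <= C0 under SRSWOR, where C = S / Ybar."""
    n0 = (C / C0) ** 2                       # large-N sample size
    return math.ceil(n0 / (1 + n0 / N))      # rounding up is an added assumption

def cv_ybar(C, n, N):
    """Coefficient of variation of the sample mean under SRSWOR."""
    return math.sqrt((N - n) / (N * n)) * C

n_cv = n_for_cv(C=0.5, C0=0.05, N=2000)  # hypothetical inputs
```

Here the large-N value would be 100, while the finite population of 2000 units needs 96.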
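5. Prespecified relative error
When $\bar{y}$ is used for estimating the population mean $\bar{Y}$, the relative estimation error is defined as
$\dfrac{\bar{y} - \bar{Y}}{\bar{Y}}$. If it is required that such relative estimation error should not exceed a prespecified value
R with probability $(1-\alpha)$, then the requirement can be expressed as
$$P\Big[\Big|\frac{\bar{y} - \bar{Y}}{\bar{Y}}\Big| \le R\Big] = 1 - \alpha$$
$$\text{or } P\Big[\frac{|\bar{y} - \bar{Y}|}{\sqrt{Var(\bar{y})}} \le \frac{R\bar{Y}}{\sqrt{Var(\bar{y})}}\Big] = 1 - \alpha.$$
Assuming the population to be normally distributed, $\bar{y}$ follows $N\Big(\bar{Y}, \dfrac{N-n}{Nn} S^2\Big)$.
So it can be written that
$$\frac{R\bar{Y}}{\sqrt{Var(\bar{y})}} = Z_{\alpha/2}$$
$$\text{or } Z_{\alpha/2}^2\, \frac{N-n}{Nn} S^2 = R^2 \bar{Y}^2$$
$$\text{or } \frac{1}{n} - \frac{1}{N} = \frac{R^2}{C^2 Z_{\alpha/2}^2}$$
$$\text{or } n = \frac{\Big(\dfrac{Z_{\alpha/2} C}{R}\Big)^2}{1 + \dfrac{1}{N} \Big(\dfrac{Z_{\alpha/2} C}{R}\Big)^2}$$
where $C = \dfrac{S}{\bar{Y}}$ is the population coefficient of variation and should be known.
If N is large, then
$$n = \Big(\frac{Z_{\alpha/2} C}{R}\Big)^2.$$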
6. Prespecified cost
Let an amount of money C be designated for a sample survey to collect n observations, let $C_0$ be
the overhead cost and let $C_1$ be the cost of collecting one unit in the sample. Then the total cost C
can be expressed as
$$C = C_0 + n C_1$$
$$\text{or } n = \frac{C - C_0}{C_1}.$$
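The cost constraint translates directly into code. In this sketch the function name and the amounts are hypothetical, and rounding down to a whole number of units is an assumption added here so that the budget is not exceeded.

```python
def n_for_budget(C, C0, C1):
    """Largest whole number of units affordable when C = C0 + n * C1."""
    if C < C0:
        raise ValueError("budget does not cover the overhead cost")
    return int((C - C0) // C1)  # rounding down is an added assumption

# Hypothetical amounts: budget 10000, overhead 2000, cost per unit 25
n_affordable = n_for_budget(C=10000, C0=2000, C1=25)
```

With these amounts the survey can afford 320 observations.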