
Journal of Applied Statistics

ISSN: 0266-4763 (Print) 1360-0532 (Online) Journal homepage: http://www.tandfonline.com/loi/cjas20

On stratified bivariate ranked set sampling for regression estimators
Daniel F. Linder, Hani Samawi, Lili Yu, Arpita Chatterjee, Yisong Huang &
Robert Vogel
To cite this article: Daniel F. Linder, Hani Samawi, Lili Yu, Arpita Chatterjee, Yisong Huang &
Robert Vogel (2015) On stratified bivariate ranked set sampling for regression estimators,
Journal of Applied Statistics, 42:12, 2571-2583, DOI: 10.1080/02664763.2015.1043868
To link to this article: http://dx.doi.org/10.1080/02664763.2015.1043868

Published online: 12 May 2015.


Journal of Applied Statistics, 2015
Vol. 42, No. 12, 2571–2583, http://dx.doi.org/10.1080/02664763.2015.1043868


On stratified bivariate ranked set sampling for regression estimators
Daniel F. Linder^a, Hani Samawi^a, Lili Yu^a, Arpita Chatterjee^b, Yisong Huang^a and Robert Vogel^a
^a Department of Biostatistics, Jiann-Ping Hsu College of Public Health, Georgia Southern University, Statesboro, GA 30460, USA; ^b Department of Mathematical Sciences, Georgia Southern University, Statesboro, GA 30460, USA
(Received 19 August 2014; accepted 20 April 2015)

We investigate the relative performance of stratified bivariate ranked set sampling (SBVRSS), with respect
to stratified simple random sampling (SSRS) for estimating the population mean with regression methods.
The mean and variance of the proposed estimators are derived with the mean being shown to be unbiased.
We perform a simulation study to compare the relative efficiency of SBVRSS to SSRS under various
data-generating scenarios. We also compare the two sampling schemes on a real data set from trauma
victims in a hospital setting. The results of our simulation study and the real data illustration indicate that
using SBVRSS for regression estimation provides more efficiency than SSRS in most cases.
Keywords: bivariate ranked set sampling; ranked set sampling; ratio estimator; regression estimator;
stratified sampling

1. Introduction

It is typically advantageous if the number of measured observations used in an analysis can be reduced from a larger set of available units to a smaller set, with the unmeasured units still contributing some degree of information. If exact measurement of an observation is expensive in terms of cost (e.g. genome sequencing experiments) or risk (e.g. invasive medical procedures), it may be possible and desirable to extract information on each unit by an alternative method (e.g. visual inspection, expert opinion or ranking), allowing experimenters to take exact measurements on only a subset of the available units. By far the most common sampling scheme employed is the simple random sample (SRS), in which each observation in the sample is measured and contributes equally and independently to inferences made on the underlying population structure. Ranked set sampling (RSS) was first introduced by McIntyre [7] and provides an alternative sampling scheme in which collections of SRS are taken, typically with one unit from each of these SRS being measured after some form of ranking has been performed. Measured
*Corresponding author. Email: hsamawi@georgiasouthern.edu
© 2015 Taylor & Francis



observations under the RSS scheme contribute independently to inferences on the underlying
population structure, similar to SRS. In contrast to SRS, measured observations in RSS do not
contribute identically to inference due to the additional structure that ranking imposes, allowing
observations to capture different aspects of the population and in turn leading to possible gains
in efficiency. Provided the ranking mechanism is reliable and more cost-effective than actual
unit measurement, RSS can be superior to SRS for estimating certain population attributes (i.e.
parameters) in terms of cost for information.
The literature on regression estimators is quite extensive beginning with Gauss and extending
to many volumes of work on linear models. When an additional variable is correlated with some
response variable of interest, the total information on the joint measurements may be used to
increase the precision in inferences made on the population structure of the variable of interest. Most regression analysis seeks to improve inference on the mean of the response variable
through this auxiliary information. Regression estimators using RSS were proposed by Yu and
Lam [20] with Al-Saleh and Al-Kadiri [1] improving mean estimation in terms of efficiency
using a double-ranked set sampling scheme. A double extreme ranked set sampling scheme was
used in [13] where mean estimation was improved in regression methods. The impact of RSS
on inference is discussed in [19]. Also, see Hatefi and Jafari Jozani [4] for Fisher information in
different types of perfect and imperfect ranked set samples from finite mixture models.
For multiple characteristics estimation, Patil et al. [9,10] and Norris et al. [8] used a bivariate
ranked set sampling (BVRSS) procedure, ranking only on one of the characteristics (X or Y ).
However, BVRSS, or ranking on both characteristics (X and Y ), introduced by Al-Saleh and
Zheng [2] can improve the performance of ratio and regression estimators as shown in [14] where
BVRSS for ratio and regression estimators were discussed. More details about RSS are available
in [6,11]. Stratified ranked set sampling (SRSS) was introduced in [12]. Furthermore, Samawi
and Siam [16] used SRSS to improve the performance of the ratio estimator. Also, Samawi and
Saeid [15] use stratified extreme RSS and apply it to ratio estimators.
In this paper, the performance of stratified bivariate ranked set sampling (SBVRSS) in comparison to stratified SRS for estimating the population means using regression estimators is
considered. The paper is outlined as follows. In Section 2 we describe the aspects of RSS and
give standard results about means and variances. In Section 3 we derive properties of SBVRSS
in regression estimation. A simulation study designed to evaluate the bias and efficiency of the
competing methods under different bivariate populations is conducted in Section 4. Finally, we
provide an illustration of SBVRSS on a data set from trauma patients in a hospital setting, with
concluding remarks in Section 5.

2. RSS procedure and standard results

2.1 Univariate population

We begin by briefly introducing the univariate RSS scheme for completeness. The balanced RSS scheme begins by selecting $r$ SRS sets, each of size $r$, from the target population. The set size $r$ will typically be 2, 3 or 4, although any set size is allowable. The elements of each set are ranked by a method that is more cost-effective than actual measurement; we mention a few examples of such methods below. Let $X_{ij}$ denote the jth element from the ith set and $X_{i(j)}$ the jth order statistic from the ith set. After ordering the sets, instead of measuring every unit from each set we directly measure only $X_{1(1)}, X_{2(2)}, \ldots, X_{r(r)}$. Hence, we measure a single order statistic from each set, giving one cycle of RSS. For a simple scenario, consider a researcher interested in the average height of a population. In an RSS scheme, the researcher could rank individuals in the SRS sets by visual inspection, noting which individual is the shortest, then the second shortest, and so on until the tallest individual is identified and given the highest rank. In a clinical setting, the expert opinion of the physician may be used to rank patients' medical conditions from least severe to most severe. We may repeat the procedure $m$ times to obtain a sample of size $n = mr$ [18].
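The balanced univariate scheme can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function name `ranked_set_sample`, the `draw` callback and the normal height distribution are assumptions made for the example, and ranking is done on the true values, which corresponds to perfect ranking.

```python
import numpy as np

def ranked_set_sample(draw, r, m, rng):
    """Balanced RSS under perfect ranking: m cycles, each using r SRS sets of size r."""
    measured = np.empty((m, r))
    for k in range(m):
        for i in range(r):
            srs_set = draw(rng, r)        # one SRS set of size r (hypothetical callback)
            ordered = np.sort(srs_set)    # ranking by true value, i.e. perfect ranking
            measured[k, i] = ordered[i]   # measure only the (i+1)-th order statistic of set i
    return measured

# Illustrative population of heights (assumed normal, mean 170, sd 10).
rng = np.random.default_rng(0)
sample = ranked_set_sample(lambda g, size: g.normal(170.0, 10.0, size), r=3, m=4, rng=rng)
print(sample.shape)   # m rows (cycles) by r columns (ranks), so n = m * r measured units
```

Each cycle draws $r^2$ units but measures only $r$ of them, which is where the cost saving comes from when ranking is cheap.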
2.2 Bivariate population

The convention for bivariate ranked set sampling was developed in [2] as follows. We assume $(X, Y)$ is a bivariate random vector with a distribution function which is absolutely continuous with respect to Lebesgue product measure and hence admits the joint probability density function (p.d.f.) $f(x, y)$. The following steps to collect a BVRSS are given in [14]; we repeat them for clarity.
Step 1 Randomly sample $r^4$ units from the population. Distribute the units randomly into $r^2$ pools of size $r^2$. Each pool corresponds to a square matrix of dimension $r$.
Step 2 Rank each set (row) of pool 1 with respect to (w.r.t.) $X$ and identify the unit with the smallest rank in each row.
Step 3 Rank the $r$ units identified in Step 2, now w.r.t. $Y$. Measure the unit with the smallest rank w.r.t. $Y$. This pair of measurements is the first element of the BVRSS sample, which can be labeled (1, 1).
Step 4 Repeat Steps 2 and 3 for pool 2. However, the pair corresponding to the second smallest rank w.r.t. $Y$ is chosen for actual measurement (quantification), with corresponding label (1, 2).
Step 5 Repeat the above steps until the label $(r, r)$ is assigned to the $r^2$th (last) pool.
The above procedure produces a BVRSS of size $r^2$. The procedure can be repeated $m$ times to obtain a sample of size $n = m r^2$.
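Steps 1 to 5 can be sketched as follows for one cycle. This is an illustration under stated assumptions rather than a reference implementation: the names `bvrss_cycle` and `draw_pairs` are hypothetical, the bivariate normal population with correlation 0.8 is chosen only for the example, and both rankings use the true values (perfect ranking).

```python
import numpy as np

def bvrss_cycle(draw_pairs, r, rng):
    """One BVRSS cycle of size r**2; uses r**2 pools of r**2 units each (r**4 units in total)."""
    sample = np.empty((r, r, 2))
    for i in range(r):              # rank on X used within each row
        for j in range(r):          # rank on Y used among the r units kept from the rows
            pool = draw_pairs(rng, r * r).reshape(r, r, 2)    # r rows of r (x, y) units
            # From each row keep the unit with the (i+1)-th smallest X.
            x_order = np.argsort(pool[:, :, 0], axis=1)[:, i]
            picked = pool[np.arange(r), x_order]              # shape (r, 2)
            # Among those r units, measure the one with the (j+1)-th smallest Y.
            sample[i, j] = picked[np.argsort(picked[:, 1])[j]]
    return sample

def draw_pairs(rng, size, rho=0.8):
    """Hypothetical population: standard bivariate normal with correlation rho."""
    cov = [[1.0, rho], [rho, 1.0]]
    return rng.multivariate_normal([0.0, 0.0], cov, size=size)

rng = np.random.default_rng(1)
cycle = bvrss_cycle(draw_pairs, r=3, rng=rng)
print(cycle.shape)   # (r, r, 2): one measured (x, y) pair per label (i, j)
```

Repeating the call $m$ times and stacking the results gives a sample of size $n = m r^2$.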
In sampling notation, assume that a random sample of size $m r^4$ is identified (no measurements are taken) from a bivariate p.d.f., say $f(x, y)$, $(x, y) \in \mathbb{R}^2$, with means $\mu_x$, $\mu_y$, variances $\sigma_x^2$, $\sigma_y^2$ and correlation coefficient $\rho$. Following the definition of BVRSS in Al-Saleh and Zheng [2], $[(X_{[i](j)k}, Y_{(i)[j]k}),\ i = 1, 2, \ldots, r;\ j = 1, 2, \ldots, r;\ k = 1, 2, \ldots, m]$ denotes the BVRSS. Now, let $f_{X_{[i](j)}, Y_{(i)[j]}}(x, y)$ be the joint p.d.f. of $[(X_{[i](j)k}, Y_{(i)[j]k}),\ k = 1, 2, \ldots, m]$. Al-Saleh and Zheng [2], with $m = 1$, showed that

(1) $\frac{1}{r^2} \sum_{j=1}^{r} \sum_{i=1}^{r} f_{X_{[i](j)}, Y_{(i)[j]}}(x, y) = f(x, y)$,
(2) $\frac{1}{r^2} \sum_{j=1}^{r} \sum_{i=1}^{r} f_{X_{[i](j)}}(x) = f_X(x)$, and
(3) $\frac{1}{r^2} \sum_{j=1}^{r} \sum_{i=1}^{r} f_{Y_{(i)[j]}}(y) = f_Y(y)$.

2.3 Stratified BVRSS

As in Samawi [12], SBVRSS is a sampling plan, analogous to stratified simple random sampling (SSRS), in which a population is divided into $L$ mutually exclusive and exhaustive strata, and a bivariate ranked set sample (BVRSS) with set size $r_h$ is quantified within each stratum, $h = 1, 2, \ldots, L$. The stratification is typically done in some natural way (i.e. according to the population structure and not randomly) and is usually done to make sampling and inference more efficient, such as when it is easier to take random samples within a stratum. The process can be repeated $m$ times within each stratum, with sampling performed independently across the strata. Therefore, we can think of an SBVRSS scheme as a collection of $L$ separate bivariate ranked set samples.
For $m = 1$, the following notation and results will be used throughout this paper. For all $i, j = 1, 2, \ldots, r_h$ and $h = 1, 2, \ldots, L$, let $n = \sum_{h=1}^{L} r_h^2$, $\mu_{hx} = E(X_{hij})$, $\sigma_{hx}^2 = \mathrm{Var}(X_{hij})$, $\mu_{hy} = E(Y_{hij})$, $\sigma_{hy}^2 = \mathrm{Var}(Y_{hij})$, $\rho_h = \mathrm{Corr}(X_{hij}, Y_{hij})$, $\mu_{hx[i](j)} = E(X_{h[i](j)})$, $\mu_{hy(i)[j]} = E(Y_{h(i)[j]})$, $\mathrm{var}(Y_{h(i)[j]}) = \sigma^2_{hy(i)[j]}$ and $\mathrm{var}(X_{h[i](j)}) = \sigma^2_{hx[i](j)}$.

3. Population mean estimators

3.1 Naïve estimator of means

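Using SBVRSS, $[(X_{h[i](j)}, Y_{h(i)[j]}),\ h = 1, 2, \ldots, L;\ i, j = 1, 2, \ldots, r_h]$, the naïve estimators of $\mu_x$ and $\mu_y$ are, respectively,

$$\bar{X}_{SBVRSS} = \sum_{h=1}^{L} W_h \sum_{i=1}^{r_h} \sum_{j=1}^{r_h} \frac{X_{h[i](j)}}{r_h^2} = \sum_{h=1}^{L} W_h \bar{X}_{hBVRSS}$$

and

$$\bar{Y}_{SBVRSS} = \sum_{h=1}^{L} W_h \sum_{i=1}^{r_h} \sum_{j=1}^{r_h} \frac{Y_{h(i)[j]}}{r_h^2} = \sum_{h=1}^{L} W_h \bar{Y}_{hBVRSS},$$

where $\sum_{h=1}^{L} W_h = 1$. Using a similar argument as in [2], we can show that $E(\bar{X}_{SBVRSS}) = \sum_{h=1}^{L} W_h \mu_{hx} = \mu_x$ and $E(\bar{Y}_{SBVRSS}) = \sum_{h=1}^{L} W_h \mu_{hy} = \mu_y$, with

$$\mathrm{var}(\bar{X}_{SBVRSS}) = \sum_{h=1}^{L} W_h^2 \left[ \frac{\sigma_{hx}^2}{r_h^2} - \frac{1}{r_h^4} \sum_{i=1}^{r_h} \sum_{j=1}^{r_h} (\mu_{hx[i](j)} - \mu_{hx})^2 \right]$$

and

$$\mathrm{var}(\bar{Y}_{SBVRSS}) = \sum_{h=1}^{L} W_h^2 \left[ \frac{\sigma_{hy}^2}{r_h^2} - \frac{1}{r_h^4} \sum_{i=1}^{r_h} \sum_{j=1}^{r_h} (\mu_{hy(i)[j]} - \mu_{hy})^2 \right].$$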
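The naïve estimators are weighted sums of the within-stratum BVRSS means. A short sketch, assuming the quantified values of each stratum are available as a list or array (the function name `stratified_mean` and the toy numbers are illustrative only):

```python
import numpy as np

def stratified_mean(samples, weights):
    """Return sum_h W_h * (mean of the quantified values in stratum h)."""
    weights = np.asarray(weights, dtype=float)
    if not np.isclose(weights.sum(), 1.0):
        raise ValueError("stratum weights must sum to 1")
    means = np.array([np.mean(s) for s in samples])
    return float(weights @ means)

# Two hypothetical strata with weights 0.25 and 0.75.
estimate = stratified_mean([[1.0, 2.0, 3.0], [10.0, 20.0]], [0.25, 0.75])
print(estimate)   # 0.25 * 2 + 0.75 * 15
```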


3.2 Regression estimator

If an auxiliary variable X is correlated with the response of interest Y we may use the information
contained in X to improve inference on the mean of Y. Given a linear relationship, not passing
through the origin, an estimate of the population mean based on the linear regression is preferred
over ratio estimation. Sukhatme and Sukhatme [17] and Yu and Lam [20] explore the properties
of these estimators under SRS and RSS.
We use the two-phase regression estimator with SBVRSS. Within each stratum $h = 1, 2, \ldots, L$, suppose that $(X, Y)$ is a bivariate random vector with joint p.d.f. $f(x, y)$. In the first stage, randomly sample $r_h^4$ units from each stratum and allocate the units into $r_h^2$ pools of size $r_h^2$, where each pool is a square matrix of dimension $r_h$:
Step (a):
(1) Rank each set (row) in each of the first $r_h$ pools w.r.t. the first characteristic ($X$). From each row, identify the unit with the smallest rank w.r.t. $X$ and obtain its measurement.
(2) Repeat Step (1) for the second $r_h$ pools, but identify the second minimum w.r.t. the first characteristic ($X$), obtaining the measurement. Repeat until the $r_h$th smallest units (maxima) have been identified and measured from each row of each of the last $r_h$ pools.
Note that this will produce $r_h$ pools, each of size $r_h^2$, or $r_h^3$ quantified units (w.r.t. the variable $X$), which are used to estimate $\mu_x$. The scheme continues as follows:
Step (b):
(1) From the ith pool produced in Step (a), identify the ith minimum w.r.t. the second characteristic ($Y$), from the ith row of that pool, and measure $Y$ on this unit. Note that $X$ has already been measured on these units.
The procedure gives an SBVRSS of size $r_h^2$. A similar description is given in [14]. We give an example below detailing how to select a BVRSS in each stratum. Suppose that we are interested in students' heights ($X$) and weights ($Y$). To obtain a BVRSS of size 4 with a set size of 2, we need an SRS of 16 students, allocated randomly into 4 pools of 4 students each. For illustration purposes, we give the actual height and weight measurements.



Pool units (one row per set)     Units selected w.r.t. X     Final BVRSS unit

(56, 111) (60, 130)              (56, 111)                   $(X_{[1](1)} = 56,\ Y_{(1)[1]} = 111)$
(65, 171) (67, 168)              (65, 171)

(70, 190) (62, 150)              (62, 150)                   $(X_{[2](1)} = 67,\ Y_{(2)[1]} = 170)$
(67, 170) (71, 195)              (67, 170)

(68, 180) (60, 150)              (68, 180)                   $(X_{[1](2)} = 68,\ Y_{(1)[2]} = 180)$
(72, 210) (73, 200)              (73, 200)

(73, 181) (60, 160)              (73, 181)                   $(X_{[2](2)} = 71,\ Y_{(2)[2]} = 215)$
(71, 215) (70, 205)              (71, 215)
where the notation ( ) indicates that the ranking is perfect and [ ] indicates that the ranking is imperfect. The right column shows the final BVRSS at the last stage. The middle column shows the stage at which $r_h^3$ units are used to estimate $\mu_{hx}$. In this particular example, we have 8 units with which to estimate the mean of the first characteristic ($X$).
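The selection in this example can be checked mechanically. The sketch below takes the four pools exactly as listed and applies, per pool, a rank on $X$ within each row followed by a rank on $Y$ among the selected units. The helper name `select` and the `ranks` list are illustrative; the rank pairs are read off from which units the example keeps.

```python
# The four pools of the example; each pool has two rows (sets) of two (height, weight) units.
pools = [
    [[(56, 111), (60, 130)], [(65, 171), (67, 168)]],
    [[(70, 190), (62, 150)], [(67, 170), (71, 195)]],
    [[(68, 180), (60, 150)], [(72, 210), (73, 200)]],
    [[(73, 181), (60, 160)], [(71, 215), (70, 205)]],
]
# (rank on X within each row, rank on Y among the selected units), 1-based, one pair per pool.
ranks = [(1, 1), (1, 2), (2, 1), (2, 2)]

def select(pool, x_rank, y_rank):
    """Return the units kept after ranking on X, and the single unit kept after ranking on Y."""
    by_x = [sorted(row, key=lambda unit: unit[0])[x_rank - 1] for row in pool]
    final = sorted(by_x, key=lambda unit: unit[1])[y_rank - 1]
    return by_x, final

x_measured, finals = [], []
for pool, (x_rank, y_rank) in zip(pools, ranks):
    by_x, final = select(pool, x_rank, y_rank)
    x_measured.extend(by_x)
    finals.append(final)

print(finals)            # [(56, 111), (67, 170), (68, 180), (71, 215)]
print(len(x_measured))   # 8 units with X measured
```

The four final units agree with the right column of the example, and the eight units kept after the $X$ ranking are the ones available for estimating the mean of $X$.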
3.2.1 Separate regression estimator

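Using the SBVRSS sample $[(X_{h[i](j)}, Y_{h(i)[j]}),\ h = 1, 2, \ldots, L;\ i, j = 1, 2, \ldots, r_h]$, assume that

$$Y_{h(i)[j]} = \mu_{hy} + \beta_h (X_{h[i](j)} - \mu_{hx}) + \varepsilon_{hij}, \quad i, j = 1, 2, \ldots, r_h,\ h = 1, 2, \ldots, L, \tag{1}$$

where $E(\varepsilon_{hij}) = 0$, $\mathrm{var}(\varepsilon_{hij}) = \sigma_{he}^2 = \sigma_{hy}^2 (1 - \rho_h^2)$, $\mathrm{Cov}(\varepsilon_{hij}, \varepsilon_{hlk}) = 0$, $i \neq l$, $j \neq k$, $h = 1, 2, \ldots, L$. Then the separate regression estimator is given by

$$\bar{Y}_{SSBVRSS} = \bar{Y}_{SBVRSS} + \sum_{h=1}^{L} W_h \hat{B}_h (\bar{X}_{hRSS} - \bar{X}_{hBVRSS}), \tag{2}$$

where

$$\bar{Y}_{SBVRSS} = \sum_{h=1}^{L} W_h \bar{Y}_{hBVRSS}, \qquad \hat{B}_h = \frac{\sum_{i=1}^{r_h} \sum_{j=1}^{r_h} (X_{h[i](j)} - \bar{X}_{hBVRSS})(Y_{h(i)[j]} - \bar{Y}_{hBVRSS})}{\sum_{i=1}^{r_h} \sum_{j=1}^{r_h} (X_{h[i](j)} - \bar{X}_{hBVRSS})^2},$$

$$\bar{X}_{hRSS} = \frac{1}{r_h^3} \sum_{k=1}^{r_h} \sum_{j=1}^{r_h} \sum_{i=1}^{r_h} X_{hi(j)k}, \qquad \bar{X}_{hBVRSS} = \frac{1}{r_h^2} \sum_{i=1}^{r_h} \sum_{j=1}^{r_h} X_{h[i](j)}$$

and

$$\bar{Y}_{hBVRSS} = \frac{1}{r_h^2} \sum_{i=1}^{r_h} \sum_{j=1}^{r_h} Y_{h(i)[j]}.$$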
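A minimal numerical sketch of Equation (2) follows. It assumes that, for each stratum, the $r_h^2$ quantified pairs and the $r_h^3$ first-phase $X$ measurements are already available as arrays; the function name and the toy data are hypothetical and serve only to show the arithmetic.

```python
import numpy as np

def separate_regression_estimate(x_bv, y_bv, x_rss, weights):
    """Sum over strata of W_h * [ybar_h + B_h * (xbar_hRSS - xbar_hBVRSS)]."""
    total = 0.0
    for xb, yb, xr, w in zip(x_bv, y_bv, x_rss, weights):
        xb, yb, xr = (np.asarray(a, dtype=float).ravel() for a in (xb, yb, xr))
        dx = xb - xb.mean()
        slope = np.sum(dx * (yb - yb.mean())) / np.sum(dx ** 2)   # within-stratum slope B_h
        total += w * (yb.mean() + slope * (xr.mean() - xb.mean()))
    return float(total)

# Toy data with r_h = 2 in both strata: 4 quantified pairs and 8 first-phase X values each.
x_bv = [[1, 2, 3, 4], [0, 1, 2, 3]]
y_bv = [[5, 8, 11, 14], [1, 0, -1, -2]]            # exactly linear: y = 2 + 3x and y = 1 - x
x_rss = [[1, 2, 3, 4, 2, 3, 4, 5], [0, 1, 2, 1, 0, 1, 2, 1]]
estimate = separate_regression_estimate(x_bv, y_bv, x_rss, weights=[0.5, 0.5])
print(estimate)
```

Because the toy responses are exactly linear in $X$ within each stratum, the estimate equals the weighted sum of the fitted lines evaluated at the first-phase means, here $0.5 \times 11 + 0.5 \times 0 = 5.5$.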

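Using similar arguments as in [14] and under the assumption of Equation (1), $\bar{Y}_{SSBVRSS}$ is an unbiased estimator of $\mu_y$.
Note that by Al-Saleh and Zheng [2]

$$E(\bar{X}_{hRSS}) = \mu_{hx} \quad \text{and} \quad \mathrm{var}(\bar{X}_{hRSS}) = \frac{1}{r_h^4} \sum_{j=1}^{r_h} \sigma^2_{hx(j)} = \frac{\sigma^2_{hx}}{r_h^3} - \frac{1}{r_h^4} \sum_{j=1}^{r_h} (\mu_{hx(j)} - \mu_{hx})^2.$$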

Using the basic properties of conditional moments the following results can easily be proven.

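Proposition 3.2.1. Under the assumptions of Equation (1), $E(\hat{B}_h) = \beta_h$.

Proposition 3.2.2.

$$\mathrm{Var}(\hat{B}_h \mid X) = \frac{\sigma_{he}^2}{\sum_{i=1}^{r_h} \sum_{j=1}^{r_h} (X_{h[i](j)k} - \bar{X}_{hBVRSS})^2}.$$

Theorem 3.2.1. Under the assumptions of Equation (1):

$$\mathrm{var}(\bar{Y}_{SSBVRSS}) = \sum_{h=1}^{L} W_h^2 \left\{ \frac{\sigma_{hy}^2 (1 - \rho_h^2)}{r_h^2} \left[ 1 + E\left( \frac{(\bar{Z}_{hRSS} - \bar{Z}_{hBVRSS})^2}{S^2_{hBVRSS}} \right) \right] + \frac{\beta_h^2}{r_h^4} \sum_{j=1}^{r_h} \sigma^2_{hx(j)} \right\},$$

where

$$Z_{h[i](j)k} = \frac{X_{h[i](j)k} - \mu_{hx}}{\sigma_{hx}}, \qquad S^2_{hBVRSS} = \frac{1}{n} \sum_{i=1}^{r_h} \sum_{j=1}^{r_h} (Z_{h[i](j)k} - \bar{Z}_{hBVRSS})^2,$$

$$\bar{Z}_{hBVRSS} = \frac{\bar{X}_{hBVRSS} - \mu_{hx}}{\sigma_{hx}} \quad \text{and} \quad \bar{Z}_{hRSS} = \frac{\bar{X}_{hRSS} - \mu_{hx}}{\sigma_{hx}}.$$

3.2.2 Combined regression estimator using SBVRSS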

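The combined regression estimator of $\mu_y$ is based on the following model:

$$Y_{h(i)[j]} = \mu_y + \beta (X_{h[i](j)} - \mu_x) + \varepsilon_{hij}, \quad i, j = 1, 2, \ldots, r_h, \tag{3}$$

where $E(\varepsilon_{hij}) = 0$, $\mathrm{var}(\varepsilon_{hij}) = \sigma_e^2 = \sigma_y^2 (1 - \rho^2)$, $\mathrm{Cov}(\varepsilon_{hij}, \varepsilon_{hlk}) = 0$, $i \neq l$, $j \neq k$, $h = 1, 2, \ldots, L$.
As in [3], the combined regression estimator of $\mu_y$ using SBVRSS is

$$\bar{Y}_{CSBVRSS} = \bar{Y}_{SBVRSS} + \hat{B}_{SBVRSS} (\bar{X}_{SRSS} - \bar{X}_{SBVRSS}), \tag{4}$$

where

$$\bar{Y}_{SBVRSS} = \sum_{h=1}^{L} W_h \bar{Y}_{hBVRSS}, \qquad \bar{X}_{SBVRSS} = \sum_{h=1}^{L} W_h \bar{X}_{hBVRSS}, \qquad \bar{X}_{SRSS} = \sum_{h=1}^{L} W_h \sum_{i=1}^{r_h} \sum_{j=1}^{r_h} \sum_{k=1}^{r_h} \frac{X_{hi(j)k}}{r_h^3},$$

and the optimal choice of $\hat{B}_{SBVRSS}$ [3] is given by

$$\hat{B}_{SBVRSS} = \frac{\sum_{h=1}^{L} \dfrac{W_h^2}{r_h^2 (r_h^2 - 1)} \sum_{i=1}^{r_h} \sum_{j=1}^{r_h} (Y_{h(i)[j]} - \bar{Y}_{hBVRSS})(X_{h[i](j)} - \bar{X}_{hBVRSS})}{\sum_{h=1}^{L} \dfrac{W_h^2}{r_h^2 (r_h^2 - 1)} \sum_{i=1}^{r_h} \sum_{j=1}^{r_h} (X_{h[i](j)} - \bar{X}_{hBVRSS})^2}.$$

When the stratification is proportional and we can replace $r_h^2 - 1$ by $r_h^2$, then $\hat{B}_{SBVRSS}$ can be simplified to

$$\hat{B}^{*}_{SBVRSS} = \frac{\sum_{h=1}^{L} \sum_{i=1}^{r_h} \sum_{j=1}^{r_h} (Y_{h(i)[j]} - \bar{Y}_{hBVRSS})(X_{h[i](j)} - \bar{X}_{hBVRSS})}{\sum_{h=1}^{L} \sum_{i=1}^{r_h} \sum_{j=1}^{r_h} (X_{h[i](j)} - \bar{X}_{hBVRSS})^2}$$

[3]. The estimator becomes

$$\bar{Y}^{*}_{CSBVRSS} = \bar{Y}_{SBVRSS} + \hat{B}^{*}_{SBVRSS} (\bar{X}_{SRSS} - \bar{X}_{SBVRSS}), \tag{5}$$

where $\bar{Y}_{SBVRSS}$, $\bar{X}_{SBVRSS}$ and $\bar{X}_{SRSS}$ are as defined for Equation (4).

Proposition 3.3.1.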

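Under the assumption of Equation (3),

(1) $E(\hat{B}_{SBVRSS}) = \beta$ and $E(\hat{B}^{*}_{SBVRSS}) = \beta$.
(2) $\mathrm{var}(\hat{B}_{SBVRSS} \mid X) = \sigma_e^2 \sum_{h=1}^{L} \sum_{i=1}^{r_h} \sum_{j=1}^{r_h} C_{hij}^2$ and

$$\mathrm{var}(\hat{B}^{*}_{SBVRSS} \mid X) = \frac{\sigma_e^2}{\sum_{h=1}^{L} \sum_{i=1}^{r_h} \sum_{j=1}^{r_h} (X_{h[i](j)} - \bar{X}_{hBVRSS})^2},$$

where

$$C_{hij} = \frac{\dfrac{W_h^2}{r_h^2 (r_h^2 - 1)} (X_{h[i](j)} - \bar{X}_{hBVRSS})}{\sum_{h=1}^{L} \dfrac{W_h^2}{r_h^2 (r_h^2 - 1)} \sum_{i=1}^{r_h} \sum_{j=1}^{r_h} (X_{h[i](j)} - \bar{X}_{hBVRSS})^2}.$$

Theorem 3.3.1. Under the assumption of Equation (3),

(1) $E(\bar{Y}_{CSBVRSS}) = \mu_y$ and $E(\bar{Y}^{*}_{CSBVRSS}) = \mu_y$ (unbiased estimators).
(2)
$$\mathrm{var}(\bar{Y}_{CSBVRSS}) = \sigma_y^2 (1 - \rho^2) \left[ \sum_{h=1}^{L} \frac{W_h^2}{r_h^2} + E\left( (\bar{X}_{SBVRSS} - \bar{X}_{SRSS})^2 \sum_{h=1}^{L} \sum_{i=1}^{r_h} \sum_{j=1}^{r_h} C_{hij}^2 \right) \right] + \beta^2 \sum_{h=1}^{L} \frac{W_h^2}{r_h^4} \sum_{j=1}^{r_h} \sigma^2_{hx(j)},$$
and, when the stratification is proportional and we can replace $r_h^2 - 1$ by $r_h^2$,
(3)
$$\mathrm{var}(\bar{Y}^{*}_{CSBVRSS}) = \sigma_y^2 (1 - \rho^2) \left[ \frac{1}{n} + E\left( \frac{(\bar{X}_{SBVRSS} - \bar{X}_{SRSS})^2}{\sum_{h=1}^{L} \sum_{i=1}^{n_h} (X_{h(i)} - \bar{X}_{hRSS})^2} \right) \right] + \frac{\beta^2}{n^2} \sum_{h=1}^{L} \sum_{j=1}^{r_h} \sigma^2_{hx(j)},$$

where $n = \sum_{h=1}^{L} r_h^2$.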

The proof of the theorem is straightforward using the concept of conditional expectation.
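The combined estimator in Equations (4) and (5) can be sketched in the same spirit. The function below is an illustration, not the authors' implementation: its name, its `simplified` flag and the toy data are assumptions, and each stratum is taken to supply $r_h^2$ quantified pairs plus $r_h^3$ first-phase $X$ values.

```python
import numpy as np

def combined_regression_estimate(x_bv, y_bv, x_rss, weights, simplified=False):
    """Combined regression estimate of the mean of Y with one pooled slope across strata."""
    x_bv = [np.asarray(a, dtype=float).ravel() for a in x_bv]
    y_bv = [np.asarray(a, dtype=float).ravel() for a in y_bv]
    x_rss = [np.asarray(a, dtype=float).ravel() for a in x_rss]
    num = den = 0.0
    for xb, yb, w in zip(x_bv, y_bv, weights):
        n_h = xb.size                                           # r_h**2 quantified pairs
        c = 1.0 if simplified else w ** 2 / (n_h * (n_h - 1))   # W_h^2 / (r_h^2 (r_h^2 - 1))
        dx = xb - xb.mean()
        num += c * np.sum(dx * (yb - yb.mean()))
        den += c * np.sum(dx ** 2)
    slope = num / den                                           # pooled slope
    y_st = sum(w * yb.mean() for w, yb in zip(weights, y_bv))
    x_st_bv = sum(w * xb.mean() for w, xb in zip(weights, x_bv))
    x_st_rss = sum(w * xr.mean() for w, xr in zip(weights, x_rss))
    return float(y_st + slope * (x_st_rss - x_st_bv))

# Toy data with a common slope of 2 in both strata (y = 1 + 2x and y = 5 + 2x).
x_bv = [[1, 2, 3, 4], [0, 1, 2, 3]]
y_bv = [[3, 5, 7, 9], [5, 7, 9, 11]]
x_rss = [[1, 2, 3, 4, 2, 3, 4, 5], [1, 2, 3, 2, 1, 2, 3, 2]]
full = combined_regression_estimate(x_bv, y_bv, x_rss, [0.5, 0.5])
simple = combined_regression_estimate(x_bv, y_bv, x_rss, [0.5, 0.5], simplified=True)
print(full, simple)
```

With a common slope and equal stratum sizes, the weighted and simplified slopes coincide, so both calls return the same value; they generally differ when the within-stratum slopes or the allocation differ.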
4. Simulation study

To gain insight into the efficiency of SBVRSS for regression estimators, we conduct a computer simulation under various bivariate data-generating scenarios. First, we generate bivariate random samples, namely SSRS and SBVRSS, from bivariate normal distributions with parameters $(\mu_{hx}, \mu_{hy}, \sigma_{hx}, \sigma_{hy}, \rho_h)$, $h = 1, 2, \ldots, L$. Three or four strata are used with different stratum sample sizes $n_h$ and weights $W_h$. To assess departures from the normality assumption, we investigate the performance of the proposed estimators under Plackett's class of bivariate distributions. Plackett's joint c.d.f. is given by

$$H(x, y) = \begin{cases} \dfrac{S(x, y) - [S^2(x, y) - 4\psi(\psi - 1)F(x)G(y)]^{1/2}}{2(\psi - 1)} & \text{if } \psi \neq 1, \\ F(x)G(y) & \text{if } \psi = 1, \end{cases}$$

where $S(x, y) = 1 + (\psi - 1)[F(x) + G(y)]$ and the parameter $\psi$ governs the dependence between $X$ and $Y$. The reason for choosing this class of bivariate distributions is that it covers a

Table 1. Separate estimator efficiency for bivariate normal distributions.

| $L$ | $r_h$ | $\mu_{hx}$ | $\mu_{hy}$ | $\sigma_{hx}$ | $\sigma_{hy}$ | $\rho_h$ | MSE SRS | MSE RSS | SBVSRS bias | SBVRSS bias | Efficiency of estimator |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 3 | 3, 3, 3 | 7.0, 7.1, 7.2 | 16.2, 16.3, 16.4 | 0.3, 0.4, 0.5 | 1.4, 1.5, 1.6 | 0.5, 0.6, 0.7 | 0.0694 | 0.0408 | 0.0014 | 0.0005 | 1.6992 |
| 3 | 3, 3, 4 | 6, 5, 7 | 12.2, 16.3, 18.4 | 0.3, 0.4, 0.5 | 1.4, 1.5, 1.6 | 0.5, 0.6, 0.7 | 0.0539 | 0.0294 | 0.001 | 0.0016 | 1.8368 |
| 3 | 3, 3, 5 | 6.2, 7.3, 8.4 | 16.2, 16.3, 16.4 | 0.2, 0.7, 0.9 | 1.4, 1.5, 1.6 | 0.05, 0.06, 0.07 | 0.0599 | 0.0262 | 0.0031 | 0.0010 | 2.2886 |
| 3 | 3, 4, 4 | 6.2, 7.3, 8.4 | 16.2, 16.3, 16.4 | 0.2, 0.7, 0.9 | 1.4, 1.5, 1.6 | 0.05, 0.06, 0.07 | 0.0631 | 0.0265 | 0.0033 | 0.0014 | 2.374659 |
| 3 | 3, 4, 5 | 6.9, 7.1, 8.8 | 17.2, 15.3, 18.4 | 0.9, 0.7, 0.8 | 2.4, 1.9, 1.2 | 0.9, 0.8, 0.07 | 0.0374 | 0.0185 | 0.0033 | 0.0020 | 2.0160 |
| 3 | 3, 5, 5 | 7.9, 6.1, 8.8 | 20.2, 15.3, 18.4 | 0.9, 0.7, 1.0 | 2.4, 1.9, 1.2 | 0.9, 0.8, 0.7 | 0.0265 | 0.0147 | 0.0030 | 0.0001 | 1.8044 |
| 4 | 3, 3, 3, 3 | 7.9, 6.1, 8.8, 9.2 | 20.2, 15.3, 18.4, 24.3 | 0.9, 0.7, 1.0, 2.1 | 2.4, 1.9, 1.2, 2.2 | 0.9, 0.8, 0.7, 0.5 | 0.0730 | 0.0439 | 0.0077 | 0.0024 | 1.6619 |
| 4 | 3, 3, 3, 4 | 6.9, 6.1, 8.8, 9.2 | 20.2, 15.3, 19.4, 24.3 | 0.3, 0.7, 1.0, 1.1 | 2.4, 1.9, 1.2, 2.4 | 0.09, 0.08, 0.07, 0.05 | 0.1115 | 0.0500 | 0.0034 | 0.0025 | 2.2317 |
| 4 | 3, 3, 3, 5 | 6.9, 7.1, 9.8, 10.3 | 20.2, 17.3, 19.4, 25.6 | 0.3, 0.7, 1.0, 2.4 | 2.4, 1.9, 2.2, 3.5 | 0.03, 0.02, 0.01, 0.04 | 0.1708 | 0.0686 | 0.0003 | 0.0024 | 2.4918 |
| 4 | 3, 3, 4, 5 | 5.9, 6.1, 9.8, 10.1 | 18.6, 17.3, 19.4, 25.6 | 0.9, 0.7, 1.0, 2.1 | 2.1, 1.9, 2.2, 3.0 | 0.03, 0.02, 0.01, 0.04 | 0.1164 | 0.0465 | 0.0019 | 0.0002 | 2.5030 |
| 4 | 3, 4, 4, 4 | 6.9, 6.1, 9.8, 10.1 | 18.6, 17.3, 19.4, 25.6 | 0.9, 0.7, 1.0, 2.1 | 2.1, 1.9, 2.2, 3.0 | 0.3, 0.2, 0.1, 0.4 | 0.0988 | 0.0444 | 0.0024 | 0.0002 | 2.2266 |
| 4 | 3, 4, 4, 5 | 7.9, 6.2, 9.7, 10.3 | 18.5, 17.2, 19.6, 25.7 | 1.9, 0.7, 1.0, 1.1 | 2.2, 1.8, 2.5, 3.3 | 0.2, 0.3, 0.4, 0.5 | 0.0959 | 0.0405 | 0.003 | 0.0010 | 2.3667 |
| 4 | 3, 4, 5, 5 | 8.1, 6.3, 9.6, 10.9 | 18.4, 17.2, 19.6, 25.7 | 2.9, 0.7, 1.2, 1.9 | 2.2, 1.8, 2.5, 3.3 | 0.2, 0.3, 0.4, 0.5 | 0.0814 | 0.0341 | 0.0004 | 0.0023 | 2.3845 |
| 3 | 3, 3, 3 | 7, 7, 7 | 16, 16, 16 | 3, 4, 5 | 4, 5, 6 | 0.01, 0.01, 0.01 | 1.075 | 0.5252 | 0.0181 | 0.0058 | 2.0466 |
| 3 | 3, 3, 4 | 7, 7, 7 | 16, 16, 16 | 3, 4, 5 | 4, 5, 6 | 0.01, 0.01, 0.01 | 0.8726 | 0.3969 | 0.0165 | 0.0024 | 2.1988 |
| 4 | 5, 5, 5, 5 | 7, 7, 7, 7 | 16, 16, 16, 16 | 3, 4, 5, 1 | 4, 5, 6, 3 | 0.9, 0.9, 0.9, 0.9 | 0.0768 | 0.0435 | 0.0022 | 0.0014 | 1.7678 |
| 4 | 3, 4, 5, 5 | 7, 7, 7, 7 | 16, 16, 16, 16 | 3, 4, 5, 1 | 4, 5, 6, 3 | 0.9, 0.9, 0.9, 0.9 | 0.1117 | 0.0634 | 0.0068 | 0.001 | 1.7624 |

Table 2. Combined estimator efficiency for bivariate normal distributions.

| $L$ | $r_h$ | $\mu_{hx}$ | $\mu_{hy}$ | $\sigma_{hx}$ | $\sigma_{hy}$ | $\rho_h$ | MSE SRS | MSE RSS | SBVSRS bias | SBVRSS bias | Efficiency of estimator |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 3 | 3, 3, 3 | 7, 7.1, 7.2 | 16.2, 16.3, 16.4 | 0.3, 0.4, 0.5 | 1.4, 1.5, 1.6 | 0.5, 0.6, 0.7 | 0.0649 | 0.0391 | 0.0012 | 0.0002 | 1.6606 |
| 3 | 3, 3, 4 | 6, 5, 7 | 12.2, 16.3, 18.4 | 0.3, 0.4, 0.5 | 1.4, 1.5, 1.6 | 0.5, 0.6, 0.7 | 0.0513 | 0.0286 | 0.0008 | 0.0016 | 1.7969 |
| 3 | 3, 3, 5 | 6.2, 7.3, 8.4 | 16.2, 16.3, 16.4 | 0.2, 0.7, 0.9 | 1.4, 1.5, 1.6 | 0.05, 0.06, 0.07 | 0.0578 | 0.0254 | 0.0030 | 0.0005 | 2.2696 |
| 3 | 3, 4, 4 | 6.2, 7.3, 8.4 | 16.2, 16.3, 16.4 | 0.2, 0.7, 0.9 | 1.4, 1.5, 1.6 | 0.05, 0.06, 0.07 | 0.0602 | 0.0256 | 0.0038 | 0.0013 | 2.3510 |
| 3 | 3, 4, 5 | 6.9, 7.1, 8.8 | 17.2, 15.3, 18.4 | 0.9, 0.7, 0.8 | 2.4, 1.9, 1.2 | 0.9, 0.8, 0.07 | 0.0477 | 0.0190 | 0.0050 | 0.0025 | 2.5066 |
| 3 | 3, 5, 5 | 7.9, 6.1, 8.8 | 20.2, 15.3, 18.4 | 0.9, 0.7, 1.0 | 2.4, 1.9, 1.2 | 0.9, 0.8, 0.7 | 0.0314 | 0.0148 | 0.0034 | 0.0004 | 2.1179 |
| 4 | 3, 3, 3, 3 | 7.9, 6.1, 8.8, 9.2 | 20.2, 15.3, 18.4, 24.3 | 0.9, 0.7, 1.0, 2.1 | 2.4, 1.9, 1.2, 2.2 | 0.9, 0.8, 0.7, 0.5 | 0.0859 | 0.0453 | 0.0091 | 0.0015 | 1.8946 |
| 4 | 3, 3, 3, 4 | 6.9, 6.1, 8.8, 9.2 | 20.2, 15.3, 19.4, 24.3 | 0.3, 0.7, 1.0, 1.1 | 2.4, 1.9, 1.2, 2.4 | 0.09, 0.08, 0.07, 0.05 | 0.1043 | 0.0472 | 0.0036 | 0.0024 | 2.2097 |
| 4 | 3, 3, 3, 5 | 6.9, 7.1, 9.8, 10.3 | 20.2, 17.3, 19.4, 25.6 | 0.3, 0.7, 1.0, 2.4 | 2.4, 1.9, 2.2, 3.5 | 0.03, 0.02, 0.01, 0.04 | 0.1651 | 0.0663 | 0.0009 | 0.0026 | 2.4891 |
| 4 | 3, 3, 4, 5 | 5.9, 6.1, 9.8, 10.1 | 18.6, 17.3, 19.4, 25.6 | 0.9, 0.7, 1.0, 2.1 | 2.1, 1.9, 2.2, 3.0 | 0.03, 0.02, 0.01, 0.04 | 0.1131 | 0.0451 | 0.0018 | 0.0001 | 2.5091 |
| 4 | 3, 4, 4, 4 | 6.9, 6.1, 9.8, 10.1 | 18.6, 17.3, 19.4, 25.6 | 0.9, 0.7, 1.0, 2.1 | 2.1, 1.9, 2.2, 3.0 | 0.3, 0.2, 0.1, 0.4 | 0.0954 | 0.0432 | 0.0036 | 0.0002 | 2.2077 |
| 4 | 3, 4, 4, 5 | 7.9, 6.2, 9.7, 10.3 | 18.5, 17.2, 19.6, 25.7 | 1.9, 0.7, 1.0, 1.1 | 2.2, 1.8, 2.5, 3.3 | 0.2, 0.3, 0.4, 0.5 | 0.0963 | 0.0389 | 0.0036 | 0.0001 | 2.4747 |
| 4 | 3, 4, 5, 5 | 8.1, 6.3, 9.6, 10.9 | 18.4, 17.2, 19.6, 25.7 | 2.9, 0.7, 1.2, 1.9 | 2.2, 1.8, 2.5, 3.3 | 0.2, 0.3, 0.4, 0.5 | 0.0822 | 0.0334 | 0.0004 | 0.0034 | 2.4630 |
| 3 | 3, 3, 3 | 7, 7, 7 | 16, 16, 16 | 3, 4, 5 | 4, 5, 6 | 0.01, 0.01, 0.01 | 1.0048 | 0.495 | 0.0152 | 0.003 | 2.0298 |
| 3 | 3, 3, 3 | 7, 7, 7 | 16, 16, 16 | 3, 4, 5 | 4, 5, 6 | 0.01, 0.01, 0.01 | 0.8340 | 0.3836 | 0.0195 | 0.0018 | 2.1743 |
| 4 | 5, 5, 5, 5 | 7, 7, 7, 7 | 16, 16, 16, 16 | 3, 4, 5, 1 | 4, 5, 6, 3 | 0.9, 0.9, 0.9, 0.9 | 0.0815 | 0.0433 | 0.0011 | 0.0019 | 1.881 |
| 4 | 3, 4, 5, 5 | 7, 7, 7, 7 | 16, 16, 16, 16 | 3, 4, 5, 1 | 4, 5, 6, 3 | 0.9, 0.9, 0.9, 0.9 | 0.1202 | 0.0625 | 0.0062 | 0.0022 | 1.9215 |

Table 3. Separate estimator efficiency for Plackett's distributions.

| $L$ | $r_h$ | Marginal parameter ($hx$) | Marginal parameter ($hy$) | Dependence parameter | MSE SRS | MSE RSS | SBVSRS bias | SBVRSS bias | Efficiency of estimator |
|---|---|---|---|---|---|---|---|---|---|
| 3 | 3, 3, 3 | 0.5, 0.6, 0.7 | 0.5, 0.6, 0.7 | 0.5, 0.6, 0.7 | 0.1197 | 0.0741 | 0.2519 | 0.0565 | 1.6158 |
| 3 | 3, 3, 4 | 0.5, 0.6, 0.4 | 1.2, 1.7, 1.6 | 0.05, 0.06, 0.07 | 0.0525 | 0.0317 | 0.0217 | 0.0009 | 1.6578 |
| 3 | 3, 3, 5 | 0.3, 0.2, 0.4 | 1.9, 1.7, 1.3 | 0.1, 0.06, 0.07 | 0.0432 | 0.026 | 0.0119 | 0.0022 | 1.6631 |
| 3 | 3, 4, 4 | 0.1, 0.2, 0.5 | 1.9, 1.7, 1.3 | 0.1, 0.2, 0.3 | 0.0561 | 0.0338 | 0.0111 | 0.0001 | 1.6583 |
| 3 | 3, 4, 5 | 0.9, 0.8, 0.7 | 1.9, 1.7, 1.8 | 0.4, 0.2, 0.3 | 0.0611 | 0.0322 | 0.0075 | 0.0004 | 1.898 |
| 3 | 3, 5, 5 | 0.9, 0.8, 0.7 | 1.6, 1.7, 1.8 | 0.4, 0.5, 0.3 | 0.0489 | 0.0241 | 0.0045 | 0.0012 | 2.0319 |
| 4 | 3, 3, 3, 3 | 0.5, 0.6, 0.4, 0.3 | 1.2, 1.7, 1.6, 1.8 | 0.05, 0.06, 0.07, 0.08 | 0.055 | 0.0384 | 0.0276 | 0.0004 | 1.4318 |
| 4 | 3, 3, 3, 4 | 0.5, 0.6, 0.4, 0.3 | 1.2, 1.7, 1.6, 1.8 | 1.03, 1.06, 1.07, 1.08 | 0.0682 | 0.0354 | 0.0002 | 0.0008 | 1.9277 |
| 4 | 3, 3, 3, 5 | 0.5, 0.6, 0.4, 0.7 | 1.9, 1.7, 1.6, 1.8 | 1.99, 1.8, 1.7, 2.08 | 0.063 | 0.0334 | 0.0025 | 0.0042 | 1.8867 |
| 4 | 3, 3, 4, 5 | 0.5, 0.6, 0.8, 0.7 | 2.1, 1.7, 1.6, 2.5 | 2.50, 1.8, 1.7, 2.8 | 0.0773 | 0.0379 | 0.0034 | 0.0045 | 2.041 |
| 4 | 3, 4, 4, 4 | 0.9, 0.6, 0.8, 0.7 | 2.1, 2.7, 2.6, 2.5 | 2.50, 3.8, 3.6, 2.9 | 0.1096 | 0.0586 | 0.0089 | 0.0012 | 1.8693 |
| 4 | 3, 4, 4, 5 | 0.9, 0.6, 0.8, 0.7 | 2.1, 2.7, 2.6, 2.5 | 3.50, 4.8, 3.6, 2.9 | 0.0897 | 0.0470 | 0.0053 | 0.0057 | 1.9105 |
| 4 | 3, 4, 5, 5 | 0.9, 0.6, 0.8, 0.7 | 2.1, 2.7, 2.6, 2.5 | 10, 9, 7, 8 | 0.0707 | 0.0388 | 0.0131 | 0.0012 | 1.8226 |
| 3 | 3, 3, 3 | 0.9, 0.8, 0.9 | 2.7, 2.7, 2.6 | 0, 0, 0 | 0.1394 | 0.0807 | 0.0843 | 0.0042 | 1.7273 |
| 3 | 3, 4, 5 | 0.9, 0.8, 0.9 | 2.7, 2.7, 2.6 | 10, 10, 10 | 0.1125 | 0.0639 | 0.0192 | 0.0062 | 1.7602 |
| 4 | 5, 5, 5, 5 | 0.9, 0.8, 0.9, 0.8 | 2.7, 2.7, 2.6, 2.5 | 1, 1, 1, 1 | 0.0698 | 0.0330 | 0.0018 | 0.0004 | 2.1177 |
| 4 | 3, 4, 5, 5 | 0.9, 0.8, 0.9, 0.8 | 2.7, 2.7, 2.6, 2.5 | 0, 0, 0, 0 | 0.0397 | 0.0162 | 0.0436 | 0.0019 | 2.4456 |

Table 4. Combined estimator efficiency for Plackett's distributions.

| $L$ | $r_h$ | Marginal parameter ($hx$) | Marginal parameter ($hy$) | Dependence parameter | MSE SRS | MSE RSS | SBVSRS bias | SBVRSS bias | Efficiency of estimator |
|---|---|---|---|---|---|---|---|---|---|
| 3 | 3, 3, 3 | 0.5, 0.6, 0.7 | 0.5, 0.6, 0.7 | 0.5, 0.6, 0.7 | 0.1241 | 0.0588 | 0.242 | 0.0417 | 2.113 |
| 3 | 3, 3, 4 | 0.5, 0.6, 0.4 | 1.2, 1.7, 1.6 | 0.05, 0.06, 0.07 | 0.0505 | 0.0312 | 0.0078 | 0.0011 | 1.6208 |
| 3 | 3, 3, 5 | 0.3, 0.2, 0.4 | 1.9, 1.7, 1.3 | 0.1, 0.06, 0.07 | 0.0436 | 0.0251 | 0.0007 | 0.0015 | 1.7382 |
| 3 | 3, 4, 4 | 0.1, 0.2, 0.5 | 1.9, 1.7, 1.3 | 0.1, 0.2, 0.3 | 0.0586 | 0.0320 | 0.0025 | 0.0002 | 1.8296 |
| 3 | 3, 4, 5 | 0.9, 0.8, 0.7 | 1.9, 1.7, 1.8 | 0.4, 0.2, 0.3 | 0.0586 | 0.0316 | 0.0038 | 0.0006 | 1.8549 |
| 3 | 3, 5, 5 | 0.9, 0.8, 0.7 | 1.6, 1.7, 1.8 | 0.4, 0.5, 0.3 | 0.0480 | 0.0236 | 0.0028 | 0.0028 | 2.037 |
| 4 | 3, 3, 3, 3 | 0.5, 0.6, 0.4, 0.3 | 1.2, 1.7, 1.6, 1.8 | 0.05, 0.06, 0.07, 0.08 | 0.0538 | 0.0377 | 0.006 | 0.0018 | 1.4282 |
| 4 | 3, 3, 3, 4 | 0.5, 0.6, 0.4, 0.3 | 1.2, 1.7, 1.6, 1.8 | 1.03, 1.06, 1.07, 1.08 | 0.0645 | 0.0339 | 0.0012 | 0.0010 | 1.8994 |
| 4 | 3, 3, 3, 5 | 0.5, 0.6, 0.4, 0.7 | 1.9, 1.7, 1.6, 1.8 | 1.99, 1.8, 1.7, 2.08 | 0.0599 | 0.0322 | 0.0012 | 0.0046 | 1.8613 |
| 4 | 3, 3, 4, 5 | 0.5, 0.6, 0.8, 0.7 | 2.1, 1.7, 1.6, 2.5 | 2.50, 1.8, 1.7, 2.8 | 0.0741 | 0.0369 | 0.005 | 0.0041 | 2.0045 |
| 4 | 3, 4, 4, 4 | 0.9, 0.6, 0.8, 0.7 | 2.1, 2.7, 2.6, 2.5 | 2.50, 3.8, 3.6, 2.9 | 0.1065 | 0.0577 | 0.0014 | 0.0013 | 1.8461 |
| 4 | 3, 4, 4, 5 | 0.9, 0.6, 0.8, 0.7 | 2.1, 2.7, 2.6, 2.5 | 3.50, 4.8, 3.6, 2.9 | 0.0873 | 0.0459 | 0.0003 | 0.0062 | 1.9021 |
| 4 | 3, 4, 5, 5 | 0.9, 0.6, 0.8, 0.7 | 2.1, 2.7, 2.6, 2.5 | 10, 9, 7, 8 | 0.0686 | 0.0379 | 0.0036 | 0.0009 | 1.8129 |
| 3 | 3, 3, 3 | 0.9, 0.8, 0.9 | 2.7, 2.7, 2.6 | 0, 0, 0 | 0.1323 | 0.0795 | 0.0229 | 0.0019 | 1.6650 |
| 3 | 3, 4, 5 | 0.9, 0.8, 0.9 | 2.7, 2.7, 2.6 | 10, 10, 10 | 0.1107 | 0.0634 | 0.0056 | 0.0053 | 1.7470 |
| 4 | 5, 5, 5, 5 | 0.9, 0.8, 0.9, 0.8 | 2.7, 2.7, 2.6, 2.5 | 1, 1, 1, 1 | 0.0679 | 0.0326 | 0.0012 | 0.0004 | 2.0835 |
| 4 | 3, 4, 5, 5 | 0.9, 0.8, 0.9, 0.8 | 2.7, 2.7, 2.6, 2.5 | 0, 0, 0, 0 | 0.0378 | 0.0162 | 0.0121 | 0.0005 | 2.3390 |


wide range of dependency. For example, in the case of U(0, 1) marginal distributions, $X = F(X)$ and $Y = G(Y)$, we have the following:
(a) $\psi \to 0 \Rightarrow X = 1 - Y$.
(b) $\psi = 1 \Rightarrow X$ and $Y$ are independent.
(c) $\psi \to \infty \Rightarrow X = Y$.
We control dependency with the parameter $\psi$ = 0.05, 0.3, 3, 10, ranging from strongly negative to strongly positive dependence, respectively, and consider two marginal distributions, uniform and exponential. See Johnson [5] (pp. 191–197) for more detail. Results from the simulations, each based on 5000 simulated replications and reported in terms of means and mean-squared errors, are given in Tables 1–4. We see from the results that SBVRSS performs similarly to SSRS in terms of bias in estimating the population mean for all combinations of parameters and model values for both types of bivariate distributions. The efficiency of SBVRSS relative to SSRS varies from roughly 1.4 to 2.5, indicating that SBVRSS is superior in all cases of simulated data, which agrees with the theoretical results above.
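For uniform marginals, Plackett's c.d.f. can be evaluated directly, which makes the limiting cases easy to check numerically. The sketch below is illustrative only: the function name `plackett_cdf` is an assumption, `psi` stands for the dependence parameter, and `u`, `v` play the roles of $F(x)$ and $G(y)$.

```python
import numpy as np

def plackett_cdf(u, v, psi):
    """Plackett joint c.d.f. with U(0, 1) marginals, so that F(x) = u and G(y) = v."""
    u, v = np.asarray(u, dtype=float), np.asarray(v, dtype=float)
    if np.isclose(psi, 1.0):
        return u * v                                   # independence case
    s = 1.0 + (psi - 1.0) * (u + v)
    root = np.sqrt(s ** 2 - 4.0 * psi * (psi - 1.0) * u * v)
    return (s - root) / (2.0 * (psi - 1.0))

print(float(plackett_cdf(0.3, 0.5, 1.0)))   # product of the marginals when psi = 1
print(float(plackett_cdf(0.3, 1.0, 3.0)))   # H(u, 1) recovers the marginal value u
print(float(plackett_cdf(0.3, 0.5, 1e6)))   # close to min(u, v) for very large psi
```

The second call illustrates that the marginal distribution is recovered when one argument is at its upper limit, and the third that very large `psi` approaches the case $X = Y$.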

5. Application to real data set

In order to investigate the performance of the methods introduced in this paper in a real data setting, we compare the SBVRSS regression estimates to SSRS regression estimates on data collected from trauma victims in a hospital setting. Each observation consists of the patient's age, bd score, and gender. The bd score is a measure indicating the level of blunt force trauma as reported by the administering doctor. The data are stratified by gender, with population stratum sizes of $N_1 = 1480$ for males and $N_2 = 669$ for females. For the analysis, we treat the data as the population and resample it 5000 times under each sampling mechanism (SBVRSS and SSRS) to estimate the mean bd score using the covariate age under stratification by gender. For the SBVRSS, we take $r = 5$ in the female stratum and $r = 6$ in the male stratum, resulting in samples of sizes 25 and 36 from the female and male strata, respectively. We take the same sample sizes for the SSRS as well, which results in an allocation that is close to proportional. Again, we emphasize that we are treating the data as the population, allowing for the calculation of the population values.
For females, it was found that the mean age $\mu_{1x} = 35.44$ with variance $\sigma_{1x}^2 = 412.58$, the mean bd score $\mu_{1y} = 2.25$ with variance $\sigma_{1y}^2 = 12.19$, and $\rho_1 = 0.21$.
For males, it was found that the mean age $\mu_{2x} = 34.55$ with variance $\sigma_{2x}^2 = 314.88$, the mean bd score $\mu_{2y} = 2.06$ with variance $\sigma_{2y}^2 = 12.96$, and $\rho_2 = 0.13$.
The overall mean age is $\mu_x = 34.83$, the overall mean bd score is $\mu_y = 2.12$ with variance $\sigma^2 = 12.72$, and $\rho = 0.15$, with weights $w_1 = 0.4$, $w_2 = 0.6$.
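As a quick arithmetic check of these summary values, the overall means can be recomputed from the stratum means and the stratum sizes reported above. The variable names are illustrative; all numbers are taken from the text.

```python
# Stratum sizes and stratum means as reported in the text.
n_female, n_male = 669, 1480
total = n_female + n_male
w_female, w_male = n_female / total, n_male / total    # population shares of each stratum

mean_age = w_female * 35.44 + w_male * 34.55
mean_bd = w_female * 2.25 + w_male * 2.06
sample_share_female = 25 / (25 + 36)                   # share of the 61 sampled units

print(round(w_female, 3), round(w_male, 3))     # 0.311 0.689
print(round(mean_age, 2), round(mean_bd, 2))    # 34.83 2.12
print(round(sample_share_female, 2))            # 0.41
```

The population shares reproduce the reported overall means of 34.83 and 2.12, while the quoted weights of 0.4 and 0.6 are close to the sample shares 25/61 and 36/61.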
Using SSRS, the separate estimator (see Section 3.2.1) has mean 2.74 with variance 1.62, and the combined estimator (see Section 3.2.2) has mean 2.41 with variance 0.21. Using SBVRSS, the separate estimator has mean 2.48 with variance 0.51, and the combined estimator has mean 2.63 with variance 0.22. From these results, we conclude that both sampling techniques exhibit similar performance in terms of bias, with SBVRSS performing better in terms of variance for the separate estimator and about the same for the combined estimator.
Our conclusions from the derivations as well as from the simulation study indicate that SBVRSS is more efficient than SSRS when the parameter of interest is a population mean. We observed this gain in efficiency in nearly all situations investigated, the only exception being the combined estimator in the real data example, for which the performance was similar. The applications that could benefit from this type of sampling and regression are wide, ranging from clinical trials to agriculture. In light of the results presented, it seems fruitful to explore the theoretical properties of the RSS scheme when using a correlated auxiliary variable to improve inference on various other population characteristics (e.g. quantile regression, characteristic function estimation). We hope that the theoretical results presented here, coupled with the simulation study, will lead to further interest in RSS and its ability to improve inference.


References
[1] M.F. Al-Saleh and M.A. Al-Kadiri, Double ranked set sampling, Stat. Probab. Lett. 48(2) (2000), pp. 205–212.
[2] M.F. Al-Saleh and G. Zheng, Estimation of bivariate characteristics using ranked set sampling, Aust. N.Z. J. Stat. 44 (2002), pp. 221–232.
[3] W.G. Cochran, Sampling Techniques, 3rd ed., John Wiley & Sons, New York, 1977.
[4] A. Hatefi and M. Jafari Jozani, Fisher information in different types of perfect and imperfect ranked set samples from finite mixture models, J. Multivariate Anal. 119 (2013), pp. 16–31.
[5] M.E. Johnson, Multivariate Statistical Simulation, Wiley, New York, 1987.
[6] A. Kaur, G.P. Patil, A.K. Sinha, and C. Taillie, Ranked set sampling: An annotated bibliography, Environ. Ecol. Stat. 2 (1995), pp. 25–54.
[7] G.A. McIntyre, A method for unbiased selective sampling using ranked set, Aust. J. Agric. Res. 3 (1952), pp. 385–390.
[8] R.C. Norris, G.P. Patil, and A.K. Sinha, Estimation of multiple characteristics by ranked set sampling methods, COENOSES 10(2/3) (1995), pp. 95–111.
[9] G.P. Patil, A.K. Sinha, and C. Taillie, Relative efficiency of ranked set sampling: Comparison with regression estimator, Environmetrics 4 (1993), pp. 399–412.
[10] G.P. Patil, A.K. Sinha, and C. Taillie, Ranked set sampling for multiple characteristics, Int. J. Ecol. Environ. Sci. 20 (1994), pp. 94–109.
[11] G.P. Patil, A.K. Sinha, and C. Taillie, Ranked set sampling: A bibliography, Environ. Ecol. Statist. 6 (1999), pp. 91–98.
[12] H.M. Samawi, Stratified ranked set sample, Pak. J. Stat. 12(1) (1996), pp. 9–16.
[13] H.M. Samawi, On double extreme ranked set sample with application to regression estimator, Metron LX(1/2) (2002), pp. 53–66.
[14] H.M. Samawi and M.F. Al-Saleh, On bivariate ranked set sampling for ratio and regression estimators, Int. J. Model. Simul. 27(4) (2007), pp. 1–7.
[15] H.M. Samawi and L.J. Saeid, Stratified extreme ranked set sample with application to ratio estimators, J. Modern Appl. Statist. Methods 3(1) (2004), pp. 117–133.
[16] H.M. Samawi and M.I. Siam, Ratio estimation using stratified ranked set sample, Metron LXI(1) (2003), pp. 75–90.
[17] P.V. Sukhatme and B.V. Sukhatme, Sampling Theory of Surveys with Applications, Iowa State University Press, Ames, 1970.
[18] K. Takahasi and K. Wakimoto, On unbiased estimates of the population mean based on the stratified sampling by means of ordering, Ann. Inst. Statist. Math. 20 (1968), pp. 1–31.
[19] D.A. Wolfe, Ranked set sampling: Its relevance and impact on statistical inference, Probab. Stat. (2012). doi:10.5402/2012/568385.
[20] L.H. Yu and K. Lam, Regression estimator in ranked set sampling, Biometrics 53 (1997), pp. 1070–1080.
