The Annals of Mathematical Statistics
Snowball Sampling
Leo A. Goodman Source: Ann. Math. Statist. Volume 32, Number 1 (1961), 148-170.
Abstract
An $s$ stage $k$ name snowball sampling procedure is defined as follows: A random sample of individuals is drawn from a given finite population. (The kind of random sample will be discussed later in this section.) Each individual in the sample is asked to name $k$ different individuals in the population, where $k$ is a specified integer; for example, each individual may be asked to name his "$k$ best friends," or the "$k$ individuals with whom he most frequently associates," or the "$k$ individuals whose opinions he most frequently seeks," etc. (For the sake of simplicity, we assume throughout that an individual cannot include himself in his list of $k$ individuals.) The individuals who were not in the random sample but were named by individuals in it form the first stage. Each of the individuals in the first stage is then asked to name $k$ different individuals. (We assume that the question asked of the individuals in the random sample and of those in each stage is the same and that $k$ is the same.) The individuals who were not in the random sample nor in the first stage but were named by individuals who were in the first stage form the second stage. Each of the individuals in the second stage is then asked to name $k$ different individuals.
projecteuclid.org/DPubS?service=UI&version=1.0&verb=Display&handle=euclid.aoms/1177705148
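The individuals who were not in the random sample nor in the first or second stages but were named by individuals who were in the second stage form the third stage. Each of the individuals in the third stage is then asked to name $k$ different individuals. This procedure is continued until each of the individuals in the $s$th stage has been asked to name $k$ different individuals.

The data obtained using an $s$ stage $k$ name snowball sampling procedure can be utilized to make statistical inferences about various aspects of the relationships present in the population. The relationships present, in the hypothetical situation where each individual in the population is asked to name $k$ different individuals, can be described by a matrix with rows and columns corresponding to the members of the population, rows for the individuals naming and columns for the individuals named, where the entry $\theta_{ij}$ in the $i$th row and $j$th column is 1 if the $i$th individual in the population includes the $j$th individual among the $k$ individuals he would name, and it is 0 otherwise. While the matrix of the $\theta_{ij}$'s cannot be known in general unless every individual in the population is interviewed (i.e., asked to name $k$ different individuals), it will be possible to make statistical inferences about various aspects of this matrix from the data obtained using an $s$ stage $k$ name snowball sampling procedure. For example, when $s = k = 1$, the number, $M_{11}$, of mutual relationships present in the population (i.e., the number of values $i$ with $\theta_{ij} = \theta_{ji} = 1$ for some value of $j > i$) can be estimated.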
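The staged procedure and the naming matrix can be made concrete with a small sketch. The function names (`matrix_from_names`, `random_naming_matrix`, `mutual_pairs`, `snowball`) and the five-person example are hypothetical and not taken from the paper; `mutual_pairs` counts unordered pairs who name each other, which is the quantity of interest when each individual names just one other individual.

```python
import random

def matrix_from_names(names, n):
    """Build the 0/1 naming matrix: theta[i][j] = 1 if individual i names j."""
    theta = [[0] * n for _ in range(n)]
    for i, named in names.items():
        for j in named:
            if j == i:
                raise ValueError("an individual cannot name himself")
            theta[i][j] = 1
    return theta

def random_naming_matrix(n, k, rng):
    """Assumption for illustration: each individual names k others at random."""
    names = {i: rng.sample([j for j in range(n) if j != i], k) for i in range(n)}
    return matrix_from_names(names, n)

def mutual_pairs(theta):
    """Count unordered pairs {i, j} with theta[i][j] = theta[j][i] = 1."""
    n = len(theta)
    return sum(1 for i in range(n) for j in range(i + 1, n)
               if theta[i][j] == 1 and theta[j][i] == 1)

def snowball(theta, initial_sample, s):
    """Return [stage 0, stage 1, ..., stage s]; stage 0 is the initial sample."""
    n = len(theta)
    seen = set(initial_sample)
    current = sorted(seen)
    stages = [current]
    for _ in range(s):
        # Everyone named by the current stage who has not appeared earlier.
        named = {j for i in current for j in range(n) if theta[i][j] == 1}
        current = sorted(named - seen)
        stages.append(current)
        seen.update(current)
    return stages

# Hypothetical population of 5 with k = 1: 0 and 1 name each other.
theta = matrix_from_names({0: [1], 1: [0], 2: [0], 3: [2], 4: [3]}, 5)
print(mutual_pairs(theta))      # 1
print(snowball(theta, [4], 2))  # [[4], [3], [2]]
```

Only the rows of the matrix belonging to interviewed individuals (the initial sample and stages 1 through $s$) would be observed in practice; the sketch holds the full matrix only so that the stages can be traced.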
The methods of statistical inference applied to the data obtained from an $s$ stage $k$ name snowball sample will of course depend on the kind of random sample drawn as the initial step. In most of the present paper, we shall suppose that a random sample (i.e., the "zero stage" in snowball sample) is drawn so that the probability, $p$, that a given individual in the population will be in the sample is independent of whether a different given individual has appeared. This kind of sampling has been called binomial sampling; the specified value of $p$ has been called the sampling fraction [4]. This sampling scheme might also be described by saying that a given individual is included in the sample just when a coin, which has a probability $p$ of "heads," comes up "heads," where the tosses of the coin from individual to individual are independent. (To each individual there corresponds an independent Bernoulli trial determining whether he will or will not be included in the sample.) This sampling scheme differs in some respects from the more usual models where the sample size is fixed in advance or where the ratio of the sample size to the population size (i.e., the sample size-population size ratio) is fixed. For binomial sampling, this ratio is a random variable whose expected value is $p$. (The variance of this ratio approaches zero as the population becomes infinite.) In some situations (where, for example, the variance of this ratio is near zero), mathematical results obtained for binomial sampling are sometimes quite similar to results obtained using some of the more usual sampling models (see [4], [7]; compare the variance formulas in [3] and [5]); in such cases it will often not make much difference, from a practical point of view, which sampling model is utilized. (In Section 6 of the present paper some results for snowball sampling based on an initial sample of the more usual kind are obtained and compared with results presented in the earlier sections of this paper obtained for snowball sampling based on an initial binomial sample.)
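The behaviour of the sample size-population size ratio under binomial sampling can be checked numerically. The sketch below is illustrative only: `binomial_sample` and `ratio_variance` are hypothetical names, and the population size, sampling fraction, and seed are arbitrary choices. It relies on the standard fact that the sample size is binomially distributed, so the ratio has mean $p$ and variance $p(1 - p)/N$ for a population of size $N$.

```python
import random

def binomial_sample(population_size, p, rng):
    """Include each individual independently with probability p (one coin toss each)."""
    return [i for i in range(population_size) if rng.random() < p]

def ratio_variance(population_size, p):
    """Variance of (sample size / population size) under binomial sampling."""
    return p * (1 - p) / population_size

rng = random.Random(1961)  # arbitrary seed, for reproducibility only
N, p = 1000, 0.2           # hypothetical population size and sampling fraction

# The ratio is a random variable; its average over repeated draws is close to p.
ratios = [len(binomial_sample(N, p, rng)) / N for _ in range(200)]
mean_ratio = sum(ratios) / len(ratios)
print(round(mean_ratio, 3))

# The variance of the ratio shrinks toward zero as the population grows.
for size in (100, 10_000, 1_000_000):
    print(size, ratio_variance(size, p))
```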
For snowball sampling based on an initial binomial sample, and with $s = k = 1$, so that each individual asked names just one other individual and there is just one stage beyond the initial sample, Section 2 of this paper discusses unbiased estimation of $M_{11}$, the number of pairs of individuals in the population who would name each other. One of the unbiased estimators considered (among a certain specified class of estimators) has uniformly smallest variance when the population characteristics are unknown; this one is based on a sufficient statistic for a simplified summary of the data and is the only unbiased estimator of $M_{11}$ based on that sufficient statistic (when the population characteristics are unknown). This estimator (when $s = k = 1$) has a smaller variance than a comparable minimum variance unbiased estimator computed from a larger random sample when $s = 0$ and $k = 1$ (i.e., where only the individuals in the random sample are interviewed) even where the expected number of individuals in the larger random sample ($s = 0$, $k = 1$) is equal to the maximum expected number of individuals studied when $s = k = 1$ (i.e., the sum of the expected number of individuals in the initial sample and the maximum expected number of individuals in the first stage). In fact, the variance of the estimator when $s = 0$ and $k = 1$ is at least twice as large as the variance of the comparable estimator when $s = k = 1$ even where the expected number of individuals studied when $s = 0$ and $k = 1$ is as large as the maximum expected number of individuals studied when $s = k = 1$. Thus, for estimating $M_{11}$, the sampling procedure with $s = k = 1$ is preferable to the conventional random sampling procedure with $s = 0$ and $k = 1$. Furthermore, the estimator based on the simplified summary of the data having minimum variance when the population characteristics are unknown can be improved upon in cases where certain population characteristics are known, or where additional data not included in the simplified summary are available. Several improved estimators are derived and discussed.

Some of the results for the special case of $s = k = 1$ are generalized in Sections 3 and 4 to deal with cases where $s$ and $k$ are any specified positive integers. In Section 5, results are presented about $s$ stage $k$ name snowball sampling procedures, where each individual asked to name $k$ different individuals chooses $k$ individuals at random from the population. (Except in Section 5, the numbers $\theta_{ij}$, which form the matrix referred to earlier, are assumed to be fixed (i.e., to be population parameters); in Section 5, they are random variables. A variable response error is not considered except in so far as Section 5 deals with an extreme case of this.) For social science literature that discusses problems related to snowball sampling, see [2], [8], and the articles they cite. This literature indicates, among other things, the importance of studying "social structure and...the relations among individuals" [2].
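The comparison between the $s = k = 1$ design and the $s = 0$, $k = 1$ design can be illustrated with simple inclusion-probability estimators. These are an assumption made for this sketch and are not claimed to be the exact estimators of Section 2. With $k = 1$ and initial binomial sampling at fraction $p$, a mutual pair is detected under $s = k = 1$ when at least one member is in the initial sample (probability $1 - (1 - p)^2$), and under $s = 0$ only when both members are (probability $p^2$). All function names and the six-person population are hypothetical.

```python
from itertools import product

def mutual_pairs(names):
    """names[i] is the one individual that i names; return pairs who name each other."""
    return [(i, j) for i, j in enumerate(names) if i < j and names[j] == i]

def estimate_snowball(names, sample, p):
    """s = k = 1: count pairs with at least one member in the initial sample."""
    chosen = set(sample)
    seen = sum(1 for i, j in mutual_pairs(names) if i in chosen or j in chosen)
    return seen / (1 - (1 - p) ** 2)

def estimate_random(names, sample, p):
    """s = 0, k = 1: count pairs with both members in the random sample."""
    chosen = set(sample)
    seen = sum(1 for i, j in mutual_pairs(names) if i in chosen and j in chosen)
    return seen / p ** 2

def exact_mean(estimator, names, p):
    """Exact expectation over all binomial samples (feasible for tiny populations)."""
    total = 0.0
    for flags in product([0, 1], repeat=len(names)):
        prob = 1.0
        for f in flags:
            prob *= p if f else 1 - p
        sample = [i for i, f in enumerate(flags) if f]
        total += prob * estimator(names, sample, p)
    return total

def var_snowball(m11, p):
    """Variance of estimate_snowball; mutual pairs are disjoint when k = 1."""
    pi = 1 - (1 - p) ** 2
    return m11 * (1 - pi) / pi

def var_random(m11, p):
    """Variance of estimate_random for sampling fraction p."""
    return m11 * (1 - p ** 2) / p ** 2

names = [1, 0, 3, 2, 0, 4]  # hypothetical population: two mutual pairs
m11 = len(mutual_pairs(names))
p = 0.1
# Both estimators are unbiased for m11 (up to floating-point rounding).
print(exact_mean(estimate_snowball, names, p), exact_mean(estimate_random, names, p))
# Give the s = 0 design the larger fraction 1 - (1 - p)^2, which matches the
# maximum expected number of individuals studied when s = k = 1.
p_large = 1 - (1 - p) ** 2
print(var_random(m11, p_large) / var_snowball(m11, p))
```

Under these assumptions the variance ratio works out to $(1 + \pi)/\pi$ with $\pi = 1 - (1 - p)^2$, which is never below 2; this agrees with the "at least twice as large" statement, though the paper's own derivation may proceed differently.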