
Slide 1 Sample Size

SURVEY DESIGN: SAMPLING, PART 2

Now, we will discuss sample size.

Slide 2 Considerations Driving Sample Size


• Parameter being estimated
• Level of precision desired
• Acceptable confidence level
• Variability in the population
• Size of the target population
• Resources (cost, labor, time, and materials)

The sample size chosen for a survey should be based on how precise the final
estimates need to be. In practice, trade-offs are usually made between the ideal
sample size and the expected cost of the survey. (By increasing the size of a
sample you will get more precise estimates, but this requires increased
resources and could be wasteful.)

To decide on whether a larger or smaller sample size is needed, you should
consider the items listed in this slide.
Now, before I proceed, we often say that
these items are the “determinants” of
sample size. This makes it sound as if
there is one “right” answer with respect
to the sample size. But as you will see,
some of the items included in the list are
unknowns (for which the investigator
must venture a guess) and some are set
by the investigator based on the goals of
the study and resources. As a result,
some investigators will select certain
values for these items and other
investigators will select other values,
leading to different estimates of the
required sample size. In other words, no
one right answer exists.
Slide 3 Parameter Being Estimated

• Different formula used to calculate sample size depending on parameter being estimated

So let’s talk about the “determinants” of sample size. The most straightforward
consideration in determining the sample size is the parameter (or statistic) being
estimated in the study. Is the goal of the
study to estimate what proportion of the
population has some characteristic (e.g.,
what proportion smoke, what proportion
have had their blood pressure checked in
the last two years, what proportion have
been told by a health care provider that
they have diabetes)? Or is the goal to
estimate the mean or average value for
some characteristic (e.g., salary, weight,
cholesterol measurement)? A different
formula is used to calculate the required
sample size depending on the parameter
being estimated (and the method for
selecting the sample), so keep that in
mind.

Slide 4 Level of Precision


• Uncertainty associated with the quantity being estimated
• Sampling error reported with results (e.g., 28 ± 3%)
• Narrower range, more precise estimate
• To increase precision, increase sample size

The second item affecting the sample size is the desired level of precision. The level
of precision is the uncertainty associated with the quantity being estimated
through the sample. (It is the range of estimates consistent with the survey
findings.) The level of precision is the sampling error (i.e., the plus/minus) you
see reported with the results of a survey. For example, if the percentage of people
who report that they smoke in a survey is estimated to be 28 ± 3%, that means that,
based on the sample data, the range of values between 25% and 31% is likely to
contain the true parameter value for the target population. The narrower the
range, the more precise the estimate. The wider the range, the less precise the
estimate. To increase the precision of the estimate, you must increase the sample
size.
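
As a rough illustration of how the plus/minus shrinks as the sample grows, here is a minimal sketch. It assumes a simple random sample and the usual normal approximation for a proportion, with a multiplier of 1.96 for 95% confidence; the function name `margin_of_error` is hypothetical and not part of the talk.

```python
import math

def margin_of_error(p, n, t=1.96):
    """Approximate plus/minus for an estimated proportion p from a
    simple random sample of size n (normal approximation, assumed)."""
    return t * math.sqrt(p * (1 - p) / n)

# Same estimate (28% smokers), increasingly large samples.
for n in (100, 400, 1600):
    moe = margin_of_error(0.28, n)
    print(f"n={n:5d}: 28% +/- {100 * moe:.1f}%")
```

Under these assumptions, quadrupling the sample size only halves the plus/minus, which is why very precise estimates become expensive quickly.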

Now, let me digress for a moment. How precise do you need to be when
estimating a quantity in a survey?
This is a tough question, and your
decision should be based on how the
survey results will be used. You must ask
yourself “How will the precision of the
estimate (or lack of precision) affect the
conclusions (and actions based on those
conclusions)?”

Slide 5 Imprecise Estimates - Good


Interpretation of Prevalence of Malnutrition*

Prevalence    Interpretation
<5%           Acceptable
5-9%          Poor
10-14%        Serious
>14%          Critical

*Based on Z-scores < -2.0 or 80% of median weight-for-height

Believe it or not, sometimes imprecise estimates are good enough. For example,
let’s say you determine that 30 ± 10% of children in a developing country are
malnourished. That means that the data are consistent with 20% to 40% of
children being malnourished. Although that is a relatively imprecise estimate, it is
useful. Based on guidelines set by
international agreement, a prevalence of
malnutrition in a community between 10
and 14% is a serious sign. And a
prevalence above 14% is considered
critical, demanding immediate
remediation. Therefore, every value
within the 20-40% range points to a
critical problem that needs to be
addressed immediately.
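
The reasoning above can be sketched in a few lines: classify both ends of the range and see whether the interpretation changes. The function name `interpret_prevalence` is hypothetical, and the handling of values that fall between the slide's whole-number bands (for example 9.5%) is an assumption.

```python
def interpret_prevalence(percent):
    """Map a malnutrition prevalence (in percent) to the slide's categories.
    Boundary handling between bands is an assumption."""
    if percent < 5:
        return "Acceptable"
    if percent < 10:
        return "Poor"
    if percent <= 14:
        return "Serious"
    return "Critical"

estimate, plus_minus = 30, 10
low, high = estimate - plus_minus, estimate + plus_minus

# Both ends of the 20-40% range land in the same category,
# so the imprecision does not change the conclusion.
print(low, interpret_prevalence(low))
print(high, interpret_prevalence(high))
```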

Slide 6 Imprecise Estimates - Bad


Estimates of Cholesterol Screening*
United States
Healthy People 2010 objective = 80%

Year    Estimate    Range**
1991    67.6%       67.2-68.1%
2003    73.1%       72.7-73.4%

*Based on CDC’s Behavioral Risk Factor Surveillance System (BRFSS)
**Based on 95% confidence level

On the other hand, if you wish to examine trends in cholesterol screening in the U.S.
to determine if national health objectives (e.g., Healthy People 2010) are being met
(or whether progress toward those goals is occurring), it might be a different story.
To appreciate whether there has been a change in the rate of cholesterol
screening over time, the estimates have to be relatively precise because the
estimates for each year are not that different. Based on the Behavioral Risk
Factor Surveillance System, in 1991, about 68% of the U.S. adult population
had their cholesterol screened in the previous 5 years. In 2003, about 73% had
their cholesterol screened.
Because the estimates from this survey
are fairly precise, we can say that
progress has been made on the rate of
cholesterol screening in the U.S. The rate
has gone up (although clearly, national
health objectives have not been met).
Had the sampling error in the survey been greater, let’s say ±3%, the two
ranges (64.6-70.6% and 70.1-76.1%) would have overlapped, and it would have
been impossible to say whether the estimate for 1991 was different from the
estimate for 2003.
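
The comparison can be sketched as a simple overlap check on the two ranges. This is only an illustration of the reasoning in the talk, not a formal statistical test; the helper names `ranges_overlap` and `with_error` are hypothetical.

```python
def ranges_overlap(a, b):
    """True if two (low, high) ranges share at least one value."""
    return a[0] <= b[1] and b[0] <= a[1]

def with_error(estimate, plus_minus):
    """Build a (low, high) range from an estimate and its plus/minus."""
    return (estimate - plus_minus, estimate + plus_minus)

# Ranges as reported on the slide (95% confidence level).
reported_1991 = (67.2, 68.1)
reported_2003 = (72.7, 73.4)
print(ranges_overlap(reported_1991, reported_2003))

# Same estimates with a hypothetical sampling error of +/- 3 points.
wide_1991 = with_error(67.6, 3.0)
wide_2003 = with_error(73.1, 3.0)
print(ranges_overlap(wide_1991, wide_2003))
```

With the reported ranges there is no overlap, so the increase is clear; with the wider hypothetical ranges the two years can no longer be told apart.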

Slide 7 Acceptable Confidence Interval


• Generally 90% or 95%
• Survey repeated under similar conditions: would expect 95% of the confidence intervals to contain the true value for the parameter in the target population
• Increase sample size for higher level of confidence for same level of precision

The third item affecting the sample size is the desired level of confidence. The level
of confidence is set by investigators, most of whom set the level at 90% or 95%.
A confidence level of 95% means that if the survey were repeated under
similar conditions an infinite number of times, you would expect 95% of the
confidence intervals calculated to contain the true value for the parameter in
the target population. To have a higher level of confidence for the same
level of precision for the estimate being examined, you need to increase the
sample size. Of course, if it is acceptable to have a less precise estimate (one
with a wider range of values that has a stated likelihood of containing the true
value), you can increase your confidence without increasing the sample size.
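
To make the link between confidence level and sample size a little more concrete, the sketch below computes the multiplier that a two-sided confidence level corresponds to under a normal approximation. Treating the confidence level this way is an assumption about the method; the function name `confidence_multiplier` is hypothetical.

```python
from statistics import NormalDist

def confidence_multiplier(confidence):
    """Two-sided standard normal multiplier for a confidence level
    such as 0.95 (normal approximation, assumed)."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)

for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence -> multiplier {confidence_multiplier(level):.3f}")
```

A larger multiplier widens the plus/minus for a given sample, so holding precision fixed while raising confidence requires more participants.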
Slide 8 Variability in the Population

• Diversity among members of population with respect to parameter being studied
• Less homogeneous, larger sample size
• More homogeneous, smaller sample size

The next item affecting the sample size is the population variability. The variability
in the population is the diversity among the members of the population with
respect to the parameter being studied.
(If there is not much diversity in the
population and the members are very
much alike with respect to the parameter
of interest, the population is considered
homogeneous. If the members of the
population are very different with respect
to the parameter, the population is
considered heterogeneous.) The less
homogeneous the population is (i.e., the more
variability it has), the larger the sample
size that is needed to achieve the same
precision and level of confidence for the
estimated parameter. The more
homogeneous a population is, the smaller
the sample size that is needed. Let’s face
it, if every person in a population were
exactly the same on the parameter being
studied (and the investigators knew it),
the sample size could be quite low … like
one. At the other extreme (i.e., where
there are large differences among
individuals and/or subgroups), calculating
a single overall estimate for the total
target population may be meaningless—
regardless of the sample size. In these
cases, it may make more sense to use a
sampling method (e.g., stratified random
sampling) that permits calculating precise
estimates for multiple subgroups within
the target population which contain more
homogeneous members.
Slide 9 Size of Target Population

• Bigger the population, bigger sample
• At certain level, increase in population size no longer affects sample size
• Sample size for certain level of precision and confidence same for population of one million as twice or five times size

The size of the target population also has an impact on the sample size. To a
certain extent, the bigger the population, the bigger the sample needed. But once
you reach a certain level, an increase in population size no longer affects the
required sample size. For instance, the necessary sample size to achieve a certain
level of precision and confidence will be about the same for a population of one
million as for a population twice or even five times that size.
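
The plateau can be illustrated with the two formulas this talk presents on Slides 12 and 13 (the first estimate for proportions and the adjustment for population size). As an assumption made purely for illustration, the sketch applies the adjustment at every population size rather than only when the first estimate reaches 10% of the population, and it rounds up to whole participants. The function names are hypothetical.

```python
import math

def first_estimate(t, p, d):
    """First estimate of sample size for a proportion: t^2 * p * q / d^2."""
    return t**2 * p * (1 - p) / d**2

def adjusted(n, N):
    """Adjustment for target population size N: n / (1 + n/N)."""
    return n / (1 + n / N)

# 95% confidence, maximal variability (p = 0.5), precision of +/- 5%.
n = first_estimate(1.96, 0.5, 0.05)

for N in (1_000, 10_000, 1_000_000, 2_000_000, 5_000_000):
    print(f"N={N:>9,}: {math.ceil(adjusted(n, N))}")
```

Under these assumptions the required size grows with the population at first and then stops changing: populations of one, two, and five million all come out at the same number of participants.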

Slide 10 Resources

• More resources available, larger possible sample size
• Often do not have adequate resources

Finally, resources to perform the survey will also impact the sample size. Some
might say it is the major factor in deciding sample size! The more resources you
have available, the larger the possible sample size. Sadly, we often do not have
adequate resources to really do the job we want to do and must compromise.
That is, we must lower our expectations and alter the goals of our survey.

Slide 11 Impact on Sample Size


Increase in:              Effect on sample size*
Precision                 Increase
Confidence level          Increase
Population variability    Increase
Size of population        Increase (up to a point)
Resources                 Increase possible

*Holding all other parameters constant

Briefly, let’s summarize how each of these considerations affects the sample size. To
increase the precision we must increase the sample size. To increase our
confidence in the results, we must increase the sample size. If the
population is highly variable (i.e., heterogeneous), we must increase the
sample size or, in some cases, select a sample that represents separate, more
internally homogeneous subgroups. If the size of the target population is large,
we must increase the sample size (up to a point). And if we have lots of resources,
we can increase our sample size and do all of the above … and still have enough
money left over for the celebration afterward.

Okay, now to illustrate how one might
calculate sample size, and how changes in
the above items affect the sample size, I
will give you a formula that is used to
determine the required sample size when
estimating proportions from a random
sample. I do not expect you to memorize
this formula. But it is worth working
through it to help crystallize the thought
processes involved in determining sample
size.

Slide 12 Sample Size Calculation


Sample Size Calculation for Estimating Proportions

$$ n = \frac{t^{2} p q}{d^{2}} $$

Where:
n = first estimate of sample size
t = confidence
d = precision
p = proportion of population with characteristic being measured
q = proportion of population without characteristic being measured (1-p)

Here is the formula: “Small n” equals “t”-squared times “p” times “q” divided by
“d”-squared.

“Small n” is the first estimate of the sample size. It will become clear
momentarily why this is called the “first estimate”. “t” represents the level of
confidence. For 90% confidence, you should use 1.645 for “t”, for 95%
confidence, you should use 1.96, and for 99% confidence, you should use 2.58.
Trust me. “d” represents the level of
precision (i.e., the plus/minus on either
side of the estimate). As “d” goes down,
the level of precision goes up and vice
versa. Most investigators use 0.05
(meaning a precision of ±5%) or 0.10
(meaning a precision of ±10%) for “d”. “p”
is the proportion of the population with
the characteristic being measured. And
“q” is the proportion of the population
without the characteristic being
measured. “p” and “q” reflect the
variability in the population. By
definition, q=1-p. 100% of the population
minus the proportion of the population
with the characteristic will give you the
proportion of the population without the
characteristic. Obviously, you are doing
the survey to determine “p”, so you won’t
have an exact value for this. But, it is
likely that you can make an educated
guess based on previous studies or other
information.
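
If the proportion of the population with the characteristic is completely
unknown, then set p=0.5. A “p” of 0.5 (or approaching 0.5) is indicative of a
population with maximal variability: the product of “p” and “q” is largest (0.25)
when p=0.5, so this choice gives the largest, most conservative sample size.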
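
The formula can be written as a small function to see how the guess for “p” drives the result. This is a minimal sketch of the formula as stated on the slide; the function name `first_estimate` and the particular values of “p” tried are illustrative assumptions.

```python
def first_estimate(t, p, d):
    """First estimate of sample size for a proportion: n = t^2 * p * q / d^2."""
    q = 1 - p  # proportion without the characteristic
    return t**2 * p * q / d**2

# 95% confidence (t = 1.96) and a precision of +/- 5% (d = 0.05).
for p in (0.06, 0.15, 0.30, 0.50, 0.70):
    print(f"p={p:.2f}: n = {first_estimate(1.96, p, 0.05):.1f}")
```

The estimate peaks at p = 0.5 and is symmetric around it, which is why 0.5 is the safe choice when nothing is known about the population.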

Slide 13 Sample Size Calculation


Sample Size Calculation for Estimating Proportions

If n ≥ 10% of N, then:

$$ n_f = \frac{n}{1 + n/N} $$

Where:
n_f = final sample size
N = size of target population

Now, you aren’t quite finished yet. If the first estimate of the sample size is equal
to 10% or more of the size of the target population, you can adjust the final
sample size by dividing the first estimate by one plus the first estimate divided by
the size of the target population. Note: Because you are dividing by a number
greater than one, this adjustment serves to lower the final sample size.
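
A minimal sketch of this adjustment, including the 10% rule, might look like the following. The function name `final_sample_size` and the `threshold` parameter are hypothetical; the second call uses the maximal-variability first estimate (p = 0.5, d = 0.05, t = 1.96) purely as an illustration.

```python
import math

def final_sample_size(n, N, threshold=0.10):
    """Apply n_f = n / (1 + n/N) when the first estimate n is at least
    `threshold` (10% by default) of the target population size N."""
    if n >= threshold * N:
        return n / (1 + n / N)
    return n

# First estimate well under 10% of the population: left unchanged.
print(final_sample_size(87, 2200))

# First estimate above 10% of the population: adjusted downward.
print(math.ceil(final_sample_size(384.16, 2200)))
```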

Slide 14 Sample Size Calculation Example 1

Sample Size Calculation for Estimating Proportions

$$ n = \frac{t^{2} p q}{d^{2}} = \frac{(1.96)^{2}(0.06)(0.94)}{(0.05)^{2}} \approx \frac{(3.84)(0.0564)}{0.0025} \approx \frac{0.217}{0.0025} \approx 87 $$

t = confidence = 1.96
d = precision = 0.05
p = proportion of population with characteristic = 0.06
q = proportion of population without characteristic = 1-p = 0.94

Okay, let’s try an example (or actually a series of examples) to drive this point
home. An investigator wishes to determine the required sample size for a
study that estimates the prevalence of adverse reactions to latex products
among nurses in the operating room. The target population is the members of the
Association of Operating Room Nurses who attend the annual meeting, who usually
number about 2,200. From a study published in FDA consumer reports, it is thought
that approximately 6% of nurses have latex sensitivities. If the investigator sets the
level of confidence as 95%, “t” equals 1.96. If the investigator wants a relatively
precise estimate, “d” is set at 0.05. Based on the FDA report, the prevalence of
adverse reactions is 6%, so p=0.06. That means that “q”, which equals 1 minus 0.06,
is 0.94. And “N” (big N) is 2,200. Plugging all of the numbers into the formula we
get 87. Since 87 is less than 10% of 2,200 (that is, less than 220), the required
sample size estimate is not adjusted.

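
The example can be checked with a few lines of code. This is a sketch using the numbers from the talk; rounding the result up to a whole participant is an assumption, and the variable names are illustrative.

```python
import math

t, d = 1.96, 0.05      # 95% confidence, precision of +/- 5%
p = 0.06               # suspected prevalence of latex sensitivity
q = 1 - p
N = 2200               # approximate size of the target population

n = t**2 * p * q / d**2
print(f"first estimate: {n:.2f} -> {math.ceil(n)} participants")

# The adjustment for population size applies only if n is at least 10% of N.
needs_adjustment = n >= 0.10 * N
print("adjust for population size:", needs_adjustment)
```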
Slide 15 Sample Size Calculation Example 2

Sample Size Calculation for Estimating Proportions

$$ n = \frac{t^{2} p q}{d^{2}} = \frac{(1.96)^{2}(0.15)(0.85)}{(0.05)^{2}} \approx \frac{(3.84)(0.128)}{0.0025} \approx \frac{0.49}{0.0025} = 196 $$

t = confidence = 1.96
d = precision = 0.05
p = proportion of population with characteristic = 0.15
q = proportion of population without characteristic = 1-p = 0.85

What if some of the parameters were changed? Let’s change only one at a
time so we can appreciate how the required sample size changes. What if
the suspected prevalence of latex sensitivity was higher, like 15%? 15%
suggests increased variability in the population compared to 6%. The required
sample size goes up and 196 (as opposed to 87) survey participants are needed.

Slide 16 Sample Size Calculation Example 3

Sample Size Calculation for Estimating Proportions

$$ n = \frac{t^{2} p q}{d^{2}} = \frac{(1.96)^{2}(0.15)(0.85)}{(0.10)^{2}} \approx \frac{(3.84)(0.128)}{0.01} \approx \frac{0.49}{0.01} = 49 $$

t = confidence = 1.96
d = precision = 0.10
p = proportion of population with characteristic = 0.15
q = proportion of population without characteristic = 1-p = 0.85

What if we are willing to have a less precise estimate? Let’s set “d” at 0.10. A
higher “d” means a less precise estimate. Okay, before I show you the calculations,
would you expect the required sample size to go up or down if we were
accepting of a less precise estimate? If you said the required sample size would
go down, you are right. And here are the calculations. Only 49 survey participants
would be needed.
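
Putting the three examples side by side shows the effect of changing one input at a time. This is a sketch using the values from the talk; the function name `required_sample_size` is hypothetical, and rounding up to a whole participant is an assumption.

```python
import math

def required_sample_size(t, p, d):
    """First estimate n = t^2 * p * (1 - p) / d^2, rounded up."""
    return math.ceil(t**2 * p * (1 - p) / d**2)

scenarios = [
    ("Example 1: p=0.06, d=0.05", 1.96, 0.06, 0.05),
    ("Example 2: p=0.15, d=0.05", 1.96, 0.15, 0.05),
    ("Example 3: p=0.15, d=0.10", 1.96, 0.15, 0.10),
]

for label, t, p, d in scenarios:
    print(f"{label}: n = {required_sample_size(t, p, d)}")
```

Because “d” is squared in the denominator, doubling it from 0.05 to 0.10 cuts the required sample size to a quarter (196 to 49).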

Slide 17 Size of Sample Obtained


• Percentage of participants can be lower than those selected
• Typically increase desired sample size proportionally
• Low response rates can damage credibility of survey results
• Biased: less likely to represent overall target population

Of course, the size of the sample obtained in a survey (i.e., the number of
participants who actually participate in and complete the survey or for whom
information is collected) may differ from the required sample size estimate. The
number of people who actually participate in a survey can be much lower than
the number initially selected; the percentage who participate is known as the
response rate. As a result, investigators will typically increase the desired
sample size proportionally, knowing that a certain percentage of potential survey
participants will not respond (or not be eligible).
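
One common way to make this proportional increase is to divide the required sample size by the expected response rate. The sketch below does that; the function name `participants_to_select` and the 80% and 50% response rates are hypothetical values chosen for illustration, not figures from the talk.

```python
import math

def participants_to_select(required, response_pct):
    """Number of people to select so that, at the expected response rate
    (given in percent), about `required` of them complete the survey."""
    return math.ceil(required * 100 / response_pct)

# Required sample sizes from the worked examples, with assumed response rates.
print(participants_to_select(196, 80))
print(participants_to_select(49, 50))
```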

NOTE: Although increasing the number of participants selected for a survey might
allow investigators to increase the total number of respondents (and thereby
increase the precision of the estimates and level of confidence), low response
rates can damage the credibility of a survey's results, because the sample is
more likely to be biased in some way and, therefore, is less likely to represent the
overall target population. So just increasing the number of participants
selected to participate is not always the answer. For further reading on sample
size calculation, please refer to the Herold and Peavy reference at the end of this
presentation.

Slide 18 Summary

• Surveys can provide invaluable information
• Surveys can provide useful information and direction on how to improve the health and well-being of our communities
• To provide useful information, a survey takes proper forethought and planning

In conclusion, surveys can provide invaluable information about the target
population being studied. Surveys can tell us what proportion of the population has
a particular health condition or disease, risk factor, or exposure that may lead to
the disease. They can tell us what
members of the population think, feel,
and do about issues that affect their
health. As a result, surveys can provide
public health practitioners, such as
ourselves, with useful information and
direction on how we can improve the
health and well-being of our
communities. But to provide useful
information, a survey takes proper
forethought and planning.

Slide 19 References

• Herold JM and Peavy JV. Surveys and Sampling. Field Epidemiology, 2nd ed. Ed. Gregg M. New York: Oxford University Press, 2002.

• Hoshaw-Woodard S. Description and comparison of the methods of cluster sampling and lot quality assurance sampling to assess immunization coverage. World Health Organization: Geneva, Switzerland, 2001.
