
# Sampling: Surveys and Questions

Chapter 4
The Beauty of Sampling
Sample Survey: a subgroup of a large
population questioned on a set of topics.
A special type of observational study.
Less costly and less time-consuming than a census.
With proper methods, a sample of
1500 can almost certainly gauge the
percentage in the entire population who
have a certain trait or opinion to within
3%.
The Margin of Error:
The Accuracy of Sample Surveys
The sample proportion and the population
proportion with a certain trait or opinion
differ by less than the margin of error in at
least 95% of all random samples.
Conservative margin of error = $\frac{1}{\sqrt{n}} \times 100\%$, where n is the sample size.
Add and subtract the margin of error to create an
approximate 95% confidence interval.
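
As a rough illustration, the conservative margin of error can be computed directly from the sample size. This is a minimal sketch; the function name `conservative_margin_of_error` and the sample sizes 100 and 400 are assumptions for illustration, and only n = 1500 comes from the slides.

```python
import math

def conservative_margin_of_error(n: int) -> float:
    """Return the conservative 95% margin of error, in percentage points (1/sqrt(n) x 100)."""
    if n <= 0:
        raise ValueError("sample size n must be positive")
    return 100.0 / math.sqrt(n)

# 100 and 400 are illustrative sample sizes; 1500 is the size quoted in the slides.
for n in (100, 400, 1500):
    moe = conservative_margin_of_error(n)
    print(f"n = {n:>4}: margin of error is about {moe:.1f} percentage points")
```

Because the margin shrinks with the square root of n, quadrupling the sample size only halves the margin of error.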
The Margin of Error:
The Accuracy of Sample Surveys
A confidence interval is an interval of
values computed from sample data
that is likely to include the true
population value.

Approximate 95% confidence interval: sample percentage ± $\frac{1}{\sqrt{n}} \times 100\%$

Example: The Importance of Religion
Poll of n = 1003 adult Americans: “How important
would you say religion is in your own life?”
Very important 65%
Fairly important 23%
Not very important 12%
No opinion 0%
Conservative margin of error is 3%:
Approx. 95% confidence interval for the percent of
all adult Americans who say religion is very important:
65% ± 3% or 62% to 68%
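
As a worked check of the 3% figure, using the conservative formula $\frac{1}{\sqrt{n}} \times 100\%$ with n = 1003 (values rounded):

$$
\frac{1}{\sqrt{1003}} \times 100\% \approx \frac{1}{31.67} \times 100\% \approx 3.2\% \approx 3\%,
\qquad 65\% \pm 3\% \;\Rightarrow\; 62\% \text{ to } 68\%.
$$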
Interpreting Confidence Interval
The interval 62% to 68% may or may not
capture the percent of adult Americans who
considered religion to be very important in
their lives.
But, in the long run this procedure will
produce intervals that capture the unknown
population values about 95% of the time
=> called the 95% confidence level.
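
The long-run interpretation can be checked with a small simulation. This is only a sketch under stated assumptions: the "true" population value of 65% is hypothetical (in practice it is unknown), each poll is modeled as n independent yes/no draws, and the function name `coverage_rate` is made up for this example.

```python
import math
import random

def coverage_rate(true_p: float, n: int, n_samples: int, seed: int = 0) -> float:
    """Fraction of simulated polls whose interval p_hat +/- 1/sqrt(n) captures true_p."""
    rng = random.Random(seed)
    margin = 1.0 / math.sqrt(n)  # conservative margin of error, as a proportion
    hits = 0
    for _ in range(n_samples):
        # One simulated poll: count respondents who have the trait.
        yes = sum(rng.random() < true_p for _ in range(n))
        p_hat = yes / n
        if abs(p_hat - true_p) <= margin:
            hits += 1
    return hits / n_samples

# Hypothetical true value of 65%, polls of n = 1003, repeated 2000 times.
rate = coverage_rate(true_p=0.65, n=1003, n_samples=2000)
print(f"Share of intervals that captured the true value: {rate:.1%}")
```

Any single interval either captures the true value or misses it; the simulated share describes the procedure, not one particular interval.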
Confidence Level
Confidence Level describes the chance
(probability) that an interval computed by this
procedure contains the true population value.
Most of the time (quantified by the confidence
level) intervals computed in this way will capture
the truth about the population, but occasionally
they will not. In any given instance, the interval
either captures the truth or it does not, but we
will never know which is the case. Therefore, our
confidence is in the procedure (it works most of the
time) and the confidence level (or level of
confidence) is the percent of the time we expect it to
work.
Advantages of a Sample Survey
over a Census
Sometimes a Census Isn’t Possible
when measurements destroy units

Speed
especially if population is large

Accuracy
devote resources to getting accurate sample results
Bias: How Surveys Can Go Wrong
Results based on a survey are biased if the method used to
obtain those results would consistently produce values
that are either too high or too low.

Selection bias occurs if the method for selecting
participants produces a sample that does not represent
the population of interest.
Nonresponse bias occurs when a representative
sample is chosen but a subset cannot be contacted
or doesn’t respond.
Response bias occurs when participants respond
differently from how they truly feel.
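
To see how nonresponse alone can push a result consistently too low or too high, here is a minimal sketch with made-up numbers. The function name `expected_respondent_share` and the response rates are assumptions for illustration, not figures from any survey.

```python
def expected_respondent_share(p_yes: float, rate_yes: float, rate_no: float) -> float:
    """Expected share of 'yes' answers among respondents when response rates differ by opinion."""
    yes_responders = p_yes * rate_yes          # fraction of population: says yes AND responds
    no_responders = (1.0 - p_yes) * rate_no    # fraction of population: says no AND responds
    return yes_responders / (yes_responders + no_responders)

# Hypothetical: half the population would say yes, but those who would say no
# respond twice as often (40% versus 20%).
share = expected_respondent_share(p_yes=0.50, rate_yes=0.20, rate_no=0.40)
print(f"True share: 50.0%; expected share among respondents: {share:.1%}")
```

The estimate is off no matter how many people are contacted, which is what distinguishes bias from ordinary sampling variability.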
Simple Random Sampling
and Randomization
Probability Sampling Plan: everyone in the
population has a specified chance of making it
into the sample.

Simple Random Sample: every conceivable
group of units of the required size has the
same chance of being the selected sample.

Choosing a Simple Random Sample
You Need:
• List of the units in the population.
• Source of random numbers.
Portion of a Table of Random Digits:
ROW
0 00157 37071 79553 31062 42411 79371 25506 69135
1 38354 03533 95514 03091 75324 40182 17302 64224
2 59785 46030 63753 53067 79710 52555 72307 10223
3 27475 10484 24616 13466 41618 08551 18314 57700
4 28966 35427 09495 11567 56534 60365 02736 32700
5 98879 34072 04189 31672 33357 53191 09807 85796
6 50735 87442 16057 02883 22656 44133 90599 91793
7 16332 40139 64701 46355 62340 22011 47257 74877
8 83845 41159 67120 56273 67519 93389 83590 12944
9 12522 20743 28607 63013 60346 71005 90348 86615

Simple Random Sample of Students
Class of 270 students.
Want a simple random sample of 10 students.
1. Number the units: students are numbered 001 to 270.
2. Choose a starting point: Row 3, 2nd column (10484…)
3. Read off consecutive numbers (3-digit labels here):
   104, 842, 461, 613, 466, 416, 180, 855, 118, 314, 577, 002, 896, …
4. If a number corresponds to a label, select that unit.
   If not, skip it. Continue until the desired sample size is obtained.
ROW
0 00157 37071 79553 31062 42411 79371 25506 69135
1 38354 03533 95514 03091 75324 40182 17302 64224
2 59785 46030 63753 53067 79710 52555 72307 10223
3 27475 10484 24616 13466 41618 08551 18314 57700
4 28966 35427 09495 11567 56534 60365 02736 32700
5 98879 34072 04189 31672 33357 53191 09807 85796
6 50735 87442 16057 02883 22656 44133 90599 91793
7 16332 40139 64701 46355 62340 22011 47257 74877
Simple Random Sample of Students
5. Step 4 is very inefficient.
Can give each unit in the population multiple labels,
e.g. use 001 to 270, then 301 to 570 and 601 to 870,
so the second 3-digit number, 842, would
correspond to the unit with label 842 – 600 = 242.

Using the method in Step 4, the selected units would be:
104, 180, 118, 002, etc.
Using the method in Step 5, the selected units would be
found more efficiently as:
104, 242, 161, 013, 166, 116, 180, 255, 118, 014.
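
The two ways of reading the table can be sketched in code. This is an illustrative sketch, not part of the original procedure: the helper names `three_digit_labels` and `select_units` are made up, only rows 3 to 5 of the table are typed in, and skipping a unit that comes up twice is an assumption (the slides do not say how repeats are handled).

```python
# Rows 3 to 5 of the table of random digits shown earlier.
ROWS = {
    3: "27475 10484 24616 13466 41618 08551 18314 57700",
    4: "28966 35427 09495 11567 56534 60365 02736 32700",
    5: "98879 34072 04189 31672 33357 53191 09807 85796",
}

def three_digit_labels(rows, start_row, start_col):
    """Yield consecutive 3-digit numbers from a starting row and 1-based column."""
    first = rows[start_row].split()[start_col - 1:]
    rest = [group for r in sorted(rows) if r > start_row for group in rows[r].split()]
    digits = "".join(first + rest)
    for i in range(0, len(digits) - 2, 3):
        yield int(digits[i:i + 3])

def select_units(labels, size, population=270, use_multiple_labels=False):
    """Select `size` distinct units from the stream of labels.

    Step 4: keep only labels 001..population.
    Step 5: also accept 301..570 and 601..870, mapped back by subtracting 300 or 600.
    """
    offsets = (0, 300, 600) if use_multiple_labels else (0,)
    chosen = []
    for label in labels:
        unit = None
        for offset in offsets:
            if 1 <= label - offset <= population:
                unit = label - offset
        # Assumption: a unit that has already been selected is skipped.
        if unit is not None and unit not in chosen:
            chosen.append(unit)
        if len(chosen) == size:
            break
    return chosen

step4 = select_units(three_digit_labels(ROWS, 3, 2), size=10)
step5 = select_units(three_digit_labels(ROWS, 3, 2), size=10, use_multiple_labels=True)
print("Step 4:", ", ".join(f"{u:03d}" for u in step4))
print("Step 5:", ", ".join(f"{u:03d}" for u in step5))
```

With multiple labels per unit, ten units are found within the first ten numbers read, whereas the Step 4 rule has to read well into the following rows.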

Using a Table of Random Digits
in a Randomized Experiment

Randomization plays a key role in designing
experiments to compare treatments.
Completely randomized design = all units
are randomly assigned to treatment conditions.
Matched-pairs / Randomized Block design =
randomize the order in which treatments are assigned
within each pair/block.
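
Both designs can be sketched with a pseudorandom generator standing in for the table of random digits. The function names, the unit labels, and the treatment names below are hypothetical, chosen only for illustration.

```python
import random

def completely_randomized(units, treatments, seed=None):
    """Shuffle all units, then deal them out to the treatments in turn."""
    rng = random.Random(seed)
    shuffled = list(units)
    rng.shuffle(shuffled)
    groups = {t: [] for t in treatments}
    for i, unit in enumerate(shuffled):
        groups[treatments[i % len(treatments)]].append(unit)
    return groups

def matched_pairs_plan(pairs, treatments=("A", "B"), seed=None):
    """Within each pair, randomize which member receives which treatment."""
    rng = random.Random(seed)
    plan = []
    for first, second in pairs:
        order = list(treatments)
        rng.shuffle(order)
        plan.append({first: order[0], second: order[1]})
    return plan

# Hypothetical units 1..20 split between two conditions.
groups = completely_randomized(range(1, 21), ["treatment", "control"], seed=1)
# Hypothetical matched pairs, e.g. twins or subjects with similar baselines.
plan = matched_pairs_plan([("p1", "p2"), ("p3", "p4"), ("p5", "p6")], seed=1)
print(groups)
print(plan)
```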

Other Sampling Methods
It is not always practical to take a simple random sample;
it can be difficult to get a numbered list of all units.
Example: College administration would like to
survey a sample of students living in dormitories.

[Figure: a simple random sample of 30 rooms.]

Stratified Random Sampling
Divide the population of units into groups (called strata)
and take a simple random sample from each of the strata.
College survey: Two strata = undergrad and graduate dorms.
Take a simple random sample of 15 rooms from each of the
strata for a total of 30 rooms.
Ideal: stratify so that there is little variability in
responses within each of the strata.
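
A minimal sketch of the college example, assuming hypothetical room labels and stratum sizes (120 undergraduate and 60 graduate rooms; the slides give only the 15-per-stratum sample size). The function name `stratified_sample` is also an assumption.

```python
import random

def stratified_sample(strata, per_stratum, seed=None):
    """Take a simple random sample of `per_stratum` units from each stratum."""
    rng = random.Random(seed)
    return {name: sorted(rng.sample(units, per_stratum))
            for name, units in strata.items()}

# Hypothetical room lists for the two strata.
strata = {
    "undergrad": [f"U{i:03d}" for i in range(1, 121)],
    "graduate": [f"G{i:03d}" for i in range(1, 61)],
}
sample = stratified_sample(strata, per_stratum=15, seed=42)
for name, rooms in sample.items():
    print(name, rooms)
```

Sampling each stratum separately guarantees that both kinds of dorm appear in the sample, which a single simple random sample of 30 rooms would not.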
Cluster Sampling
Divide the population of units into groups (called clusters),
take a random sample of clusters and
measure only those items in these clusters.
College survey: Each floor of each dorm is a cluster.
Take a random sample of 5 floors; all rooms on those
floors are surveyed.
Requires only a list of the clusters instead of a list of
all individuals.
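
The floor-by-floor example might look like this in code. The dorm layout (4 dorms, 5 floors each, 12 rooms per floor) and the function name `cluster_sample` are assumptions made for the sketch; only the sample of 5 floors comes from the slides.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly choose whole clusters and return every unit inside them."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    units = [unit for cluster in chosen for unit in clusters[cluster]]
    return chosen, units

# Hypothetical layout: dorms A-D, floors 1-5, rooms 01-12 on each floor.
clusters = {
    (dorm, floor): [f"{dorm}-{floor}{room:02d}" for room in range(1, 13)]
    for dorm in "ABCD"
    for floor in range(1, 6)
}
floors, rooms = cluster_sample(clusters, n_clusters=5, seed=7)
print("Floors chosen:", floors)
print("Rooms surveyed:", len(rooms))
```

Only the list of floors is needed to draw the sample; room lists are required just for the floors that were chosen.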
Systematic Sampling
Order the population of units in some way, select one of
the first k units at random and then every kth unit thereafter.
College survey: Order list of rooms starting at top floor of 1st
undergrad dorm. Pick one of the first 11 rooms at random =>
room 3, then pick every 11th room after that.

Note: often a good alternative to random sampling,
but can lead to a biased sample.
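
A short sketch of the selection rule, assuming a hypothetical ordered list of 330 rooms (so that every 11th room yields 30 rooms). The function name `systematic_sample` and its `start` parameter are illustrative choices.

```python
import random

def systematic_sample(units, k, start=None, seed=None):
    """Pick one of the first k units at random (or use a given 1-based start), then every kth."""
    if start is None:
        start = random.Random(seed).randint(1, k)
    if not 1 <= start <= k:
        raise ValueError("start must be between 1 and k")
    return units[start - 1::k]

rooms = list(range(1, 331))  # hypothetical ordered list of 330 rooms
sample = systematic_sample(rooms, k=11, start=3)  # start at room 3, as in the example
print(sample[:5], "...", len(sample), "rooms")
```

If the ordering has a pattern with the same period as k (for instance, every 11th room being a corner room), the sample inherits that pattern, which is how the bias mentioned in the note can arise.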

Difficulties and
Disasters in Sampling
Some problems occur even when a
sampling plan has been well designed.

• Using the wrong sampling frame
• Not reaching the individuals selected
• Self-selected sample
• Convenience/Haphazard sample

Using the Wrong Sampling Frame
The sampling frame is the list of units from which the
sample is selected. This list may or may not be the same
as the list of all units in the desired “target” population.

Example: using a telephone directory to survey the general
population excludes those who move often, those with
unlisted home numbers, and those who cannot afford a
telephone. Solution: use random-digit dialing.

Not Reaching the Individuals Selected

Failing to contact or measure the individuals
who were selected in the sampling plan leads
to nonresponse bias.

• Telephone surveys tend to reach more women.
• Some people are rarely home.
• Others screen calls or may refuse to answer.
• Quickie polls: almost impossible to get a
random sample in one night.

Nonresponse or Volunteer Response
“In 1993 the GSS (General Social Survey) achieved its
highest response rate ever, 82.4%. This is five percentage
points higher than our average over the last four years.”
GSS News, Sept 1993

• The lower the response rate, the less the results
can be generalized to the population as a whole.
• Response to a survey is voluntary. Those who
respond are likely to have stronger opinions than
those who don’t.
• Surveys often use reminders and follow-up calls to
decrease the nonresponse rate.
Disasters in Sampling
Responses from a self-selected group, convenience
sample or haphazard sample are rarely representative
of any larger group.
Example: A Meaningless Poll
“Do you support the President’s economic plan?”
Results from TV quickie poll and proper study:

Those dissatisfied were more likely to respond to the TV
poll, and it did not give the “not sure” option.
Example: The Infamous Literary
Digest Poll of 1936
Election of 1936: Democratic incumbent
Franklin D. Roosevelt and Republican Alf Landon

Literary Digest Poll:
• Sent questionnaires to 10 million people from magazine
subscriber lists, phone directories, and lists of car owners, who
were more likely to be wealthy and unhappy with Roosevelt.
• Only 2.3 million responses, for a 23% response rate.
Those with strong feelings, the Landon supporters
wanting a change, were more likely to respond.
• (Incorrectly) predicted a 3-to-2 victory for Landon.

Survey Questions
Possible Sources of Response Bias in Surveys
• Deliberate bias: The wording of a question can
deliberately bias the responses toward a desired answer.
• Unintentional bias: Questions can be worded such
that the meaning is misinterpreted by a large percentage
of the respondents.
• Desire to Please: Respondents have a desire to please
the person who is asking the question. They tend to understate
their response to a socially undesirable habit or opinion.

Possible Sources of
Response Bias in Surveys (cont)
• Asking the Uninformed: People do not like to
admit that they don’t know what you are talking about
when you ask them a question.
• Unnecessary Complexity: If questions are to be
understood, they must be kept simple. Some questions
ask more than one question at once.
• Ordering of Questions: If one question requires
respondents to think about something that they may not
have otherwise considered, then the order in which
questions are presented can change the results.
Possible Sources of
Response Bias in Surveys (cont)
• Confidentiality and Anonymity: People will
often answer questions differently based on the degree
to which they believe they are anonymous.
It is easier to ensure confidentiality (a promise not to release
identifying information) than anonymity (the researcher
does not know the identity of the respondents).

• Be Sure You Understand What Was Measured:
Words can have different meanings. Important to get
a precise definition of what was actually asked or
measured. E.g. Who is really unemployed?

• Some Concepts Are Hard to Precisely Define:
E.g. How to measure intelligence?
• Measuring Attitudes and Emotions:
E.g. How to measure self-esteem and happiness?

• Open or Closed Questions:
Should Choices Be Given?

Open question = respondents are allowed to answer
in their own words.
Closed question = respondents are given a list of alternatives,
usually with a choice of “other” where they can fill in a blank.
If closed questions are preferred, they should first be presented
as open questions (in a pilot survey) to establish the
list of choices.
Results can be difficult to summarize with open
questions.

Example: No Opinion of Your Own?
Let Politics Decide

1978 Poll, Cincinnati, Ohio: people were asked whether they
“favored or opposed repealing the 1975 Public Affairs Act.”
No such act existed, yet about one-third expressed an opinion.

1995 Washington Post Poll: 1000 randomly selected
people were asked “Some people say the 1975 Public Affairs
Act should be repealed. Do you agree or disagree that
it should be repealed?”
43% expressed an opinion,
24% agreeing that it should be repealed.

Example: No Opinion of Your Own?
Let Politics Decide (cont)
Second 1995 Washington Post Poll: polled two
separate groups of 500 randomly selected adults.
Group 1: “President Clinton [a Democrat] said that the 1975
Public Affairs Act should be repealed. Do you agree or
disagree?” Of those expressing an opinion:
36% of the Democrats agreed it should be repealed,
16% of the Republicans agreed it should be repealed.
Group 2: “The Republicans in Congress said that the 1975
Public Affairs Act should be repealed. Do you agree or
disagree?” Of those expressing an opinion:
36% of the Republicans agreed it should be repealed,
19% of the Democrats agreed it should be repealed.