
# Sampling

## Real Image of the Population

A sample is a subset of a larger population of objects, individuals, households, businesses, organizations and so forth. Sampling enables researchers to estimate unknown characteristics of the population in question. A finite group is called a population, whereas a non-finite (infinite) group is called a universe. A census is an investigation of all the individual elements of a population.

[Figure: a sample drawn as a subset of the population.]

## Reasons for Sampling

- Budget and time constraints (in the case of large populations).
- A high degree of accuracy and reliability (if the sample is representative of the population).
- Sampling may sometimes produce more accurate results than a census: a census carries more risk of interviewer and other errors because of the high volume of persons contacted and the number of census takers, some of whom may not be well trained.
- Industrial production: destructive tests.

## The Sampling Process

[Flowchart, partially recovered: define the target population; determine whether a probability or non-probability sampling method will be chosen; plan the procedure for selecting sampling units; conduct fieldwork.]

## Defining the Target Population

- The target population is the complete group whose relevant characteristics are to be determined through the sampling.
- A target population may be, for example, all faculty members in the Department of Industrial Engineering, all housewives in Istanbul, all pre-college students in Kadıköy, or all medical doctors in Beşiktaş.
- The target group should be clearly delineated where possible. For example, do "all pre-college students" include only primary and secondary students, or also students in other specialized educational institutions?

## The Sampling Frame

- The sampling frame is the list of population elements from which the sample will be drawn.
- Examples of sampling frames are a student telephone directory (for the student population), the list of companies on the stock exchange, the directory of medical doctors and specialists, and the yellow pages (for businesses).
- Often, the list does not include the entire population. This discrepancy is a source of error associated with the selection of the sample (sampling frame error).
- Information relating to sampling frames can be obtained from commercial organizations.

Sampling Units
- The sampling unit is a single element or group of elements subject to selection in a sample. Examples:
  - Every student at the university whose first name begins with the letter F
  - All child passengers under 18 years of age who are traveling on a train from destination X to destination Y
  - All jewelry shops in Kapalıçarşı in Istanbul

Sample Size
How you sample is as important as how many you sample.
- How: probability samples or non-probability samples
- How many: statistical precision, industry standards

## HOW - Types of Sampling Methods

Nonprobability sampling

Probability sampling

## Types of Sampling Methods

- Non-probability sampling: an arbitrary means of selecting sampling units based on subjective considerations, such as personal judgment or convenience. It is generally less preferred than probability sampling.
- Probability sampling: every element in the population under study has a known, non-zero probability of selection into the sample. The probabilities need not be equal; equal probability for every member is a special case, as in simple random sampling.

## Types of sampling methods

- Nonprobability: convenience sampling, judgment sampling, quota sampling, snowball sampling
- Probability: simple random sampling, systematic random sampling, stratified random sampling, cluster sampling

## Nonprobability sampling methods

- Convenience sampling relies upon convenience and access.
- Judgment sampling relies upon the belief that participants fit certain characteristics.
- Quota sampling emphasizes representation of specific characteristics.
- Snowball sampling relies upon respondent referrals of others with like characteristics.

Convenience sampling
Elements are selected for convenience because they're available or easy to find. Selection is based on one's convenience, by accident, or in a haphazard way. Often, respondents are selected because they happen to be in the right place at the right time. This sampling method is therefore also known as haphazard, accidental, or availability sampling. Examples:
- Interviewing people on a street corner or at the mall
- Surveying students in a classroom
- Magazine surveys
- Observing conversations in an online chat room

Judgement Sampling
- This is a sampling technique in which the business researcher selects the sample based on judgment about some appropriate characteristic of the sample members.
- Example 1: The Consumer Price Index (CPI) is based on a judgment sample of market-based items, housing costs, and other selected goods and services that are representative of the consumption of most of the overall population.
- Example 2: Selection of certain voting districts that serve as indicators of the national voting trend.

Quota Sampling
- This is a sampling technique in which the business researcher ensures that certain characteristics of a population are represented in the sample to the extent that he or she desires.
- Example: A business researcher wants to determine, through interviews, how the demand for Product X in a district depends on gender. If the sample is to consist of 100 units, the number of individuals interviewed from each gender should correspond to that group's percentage share of the total population of the district.

## Quota Sampling, continued

The problem is that even when we know that a quota sample is representative of the particular characteristics for which quotas have been set, we have no way of knowing whether the sample is representative in terms of any other characteristics.
For example, suppose quotas have been set for gender only. Under the circumstances, it's no surprise if the sample is representative of the population only in terms of gender, and not in terms of race. Interviewers are only human.

## Quota Sampling, continued

- Suppose a quota sample of 3000 persons is to be selected in country X using three control characteristics: gender, age and religion.
- Here, the three control characteristics are considered independently of one another. To calculate the desired number of sample elements for each attribute of a control characteristic, the distribution of the general population of the country in terms of each control characteristic is examined.

| Control characteristic | Population group | Distribution | Sample elements |
|---|---|---|---|
| Gender | Male | 50.7% | 3000 × 50.7% = 1521 |
| Gender | Female | 49.3% | 3000 × 49.3% = 1479 |
| Age | 20-29 years | 13.4% | 3000 × 13.4% = 402 |
| Age | 30-39 years | 52.3% | 3000 × 52.3% = 1569 |
| Age | 40 years & over | 34.3% | 3000 × 34.3% = 1029 |
| Religion | Christianity | 76.4% | 3000 × 76.4% = 2292 |
| Religion | Islam | 14.8% | 3000 × 14.8% = 444 |
| Religion | Hinduism | 6.6% | 3000 × 6.6% = 198 |
| Religion | Others | 2.2% | 3000 × 2.2% = 66 |
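The quota arithmetic above can be sketched in a few lines of Python. This is only an illustration: the function name `quotas` is made up for the example, and the age shares used are the ones that appear in the sample-element calculations (52.3% and 34.3%).

```python
# Quota allocation: quota = sample size x population share of each group.
SAMPLE_SIZE = 3000

# Percentage shares per control characteristic (taken from the calculations above).
distribution = {
    "gender": {"male": 50.7, "female": 49.3},
    "age": {"20-29": 13.4, "30-39": 52.3, "40+": 34.3},
    "religion": {"Christianity": 76.4, "Islam": 14.8, "Hinduism": 6.6, "Others": 2.2},
}


def quotas(sample_size, shares):
    """Return the number of sample elements per group for one characteristic."""
    return {group: round(sample_size * pct / 100) for group, pct in shares.items()}


# Each characteristic is treated independently, so each one sums to 3000.
allocation = {name: quotas(SAMPLE_SIZE, shares) for name, shares in distribution.items()}
for name, groups in allocation.items():
    print(name, groups, "total:", sum(groups.values()))
```

Because the characteristics are handled independently, the quotas say nothing about combinations such as "female, 30-39 years, Islam"; interlocking quotas would need the joint distribution.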

## Quota Sampling continued

- Advantages include the speed of data collection, lower cost, convenience, and representativeness (if the subgroups in the sample are selected properly).
- Disadvantages include the element of subjectivity: within each quota, units are chosen by convenience rather than by a probability-based method, which can lead to improper selection of sampling units.

Snowball sampling
In snowball sampling, an initial group of respondents is selected, usually at random. After being interviewed, these respondents are asked to identify others who belong to the target population of interest. Subsequent respondents are selected based on the referrals. It rarely leads to a representative sample, but it is useful when the population is inaccessible or hard to find, e.g.:
- the homeless
- forced-sale properties
- wound-up companies

## Snowball sampling continued..

Snowball sampling:

- Involves building a sample through referrals.
- Once an initial respondent is identified, you ask them to identify others who meet the study criteria. Each of those individuals is then asked for further recommendations.
- Often used when working with populations that are not easily identified or accessed. For example, a population of homeless persons can be hard to identify, but by using referrals a sample can build quite quickly.
- Snowballing does not guarantee representativeness. One option is to develop a population profile from the literature and assess representativeness by comparing your sample with that profile.

## Snowball Sampling continued

More systematic versions of snowball sampling can reduce the potential for bias. For example, respondent-driven sampling gives respondents financial incentives to recruit peers.

## Probability Sampling Methods

Probability sampling methods:
- Simple random sampling
- Systematic random sampling
- Stratified random sampling
- Cluster sampling

## Simple Random Sampling

For the sample to be representative, it must be obtained randomly. It is a simple random sample if each item in the population has an equal chance of being selected. Each possible sample of a given size (n) has a known and equal probability of being the sample actually selected. This implies that every element is selected independently of every other element.

## Simple random sampling

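[Figure: a population of lettered elements (A, B, S, T, P, C, G, N, Y, G, K, Q, L, W, E) from which the sample B, G, T, K is drawn.]

Probability of selection for each element: $n/N$, where $n$ is the sample size and $N$ the population size.

- Use when the population is rather uniform (e.g. school/college students, low-cost houses).
- Simplest, fastest, cheapest.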

## Simple Random Sampling

1. Select a suitable sampling frame.
2. Assign each element a number from 1 to N (the population size).
3. Generate n (the sample size) different random numbers between 1 and N.
4. The numbers generated denote the elements that should be included in the sample.
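The four steps can be sketched with Python's standard `random` module. The frame of 100 labeled units and the function name `simple_random_sample` are assumptions made for illustration only.

```python
import random


def simple_random_sample(frame, n, seed=None):
    """Draw n elements from the sampling frame without replacement."""
    rng = random.Random(seed)
    N = len(frame)
    # Steps 2-3: elements are numbered 1..N; draw n distinct numbers in that range.
    numbers = rng.sample(range(1, N + 1), n)
    # Step 4: the drawn numbers identify the sampled elements.
    return [frame[i - 1] for i in numbers]


# Step 1: a hypothetical sampling frame of N = 100 units.
frame = [f"unit-{i}" for i in range(1, 101)]
sample = simple_random_sample(frame, n=10, seed=42)
print(sample)
```

`random.sample` draws without replacement, so every subset of size n is equally likely, which is the defining property of a simple random sample.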

## Random Number Table

| Row | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 96268 | 11860 | 83699 | 38631 | 90045 | 69696 | 48572 | 05917 | 51905 | 10052 |
| 2 | 03550 | 59144 | 59468 | 37984 | 77892 | 89766 | 86489 | 46619 | 50236 | 91136 |
| 3 | 22188 | 81205 | 99699 | 84260 | 19693 | 36701 | 43233 | 62719 | 53117 | 71153 |
| 4 | 63759 | 61429 | 14043 | 44095 | 84746 | 22018 | 19014 | 76781 | 61086 | 90216 |
| 5 | 55006 | 17765 | 15013 | 77707 | 54317 | 48862 | 53823 | 52905 | 70754 | 68212 |
| 6 | 81972 | 45644 | 12600 | 01951 | 72166 | 52682 | 37598 | 11955 | 73018 | 23528 |
| 7 | 06344 | 50136 | 33122 | 31794 | 86723 | 58037 | 36065 | 32190 | 31367 | 96007 |
| 8 | 92363 | 99784 | 94169 | 03652 | 80824 | 33407 | 40837 | 97749 | 18361 | 72666 |
| 9 | 96083 | 16943 | 89916 | 55159 | 62184 | 86206 | 09764 | 20244 | 88388 | 98675 |
| 10 | 92993 | 10747 | 08985 | 44999 | 35785 | 65036 | 05933 | 77378 | 92339 | 96151 |
| 11 | 95083 | 70292 | 50394 | 61947 | 65591 | 09774 | 16216 | 63561 | 59751 | 78771 |
| 12 | 77308 | 60721 | 96057 | 86031 | 83148 | 34970 | 30892 | 53489 | 44999 | 18021 |
| 13 | 11913 | 49624 | 28519 | 27311 | 61586 | 28576 | 43092 | 69971 | 44220 | 80410 |
| 14 | 70648 | 47484 | 05095 | 92335 | 55299 | 27161 | 64486 | 71307 | 85883 | 69610 |
| 15 | 92771 | 99203 | 37786 | 81142 | 44271 | 36433 | 31726 | 74879 | 89384 | 76886 |

## How to Use a Random Number Table

1. Number each member of the population.
2. Determine the population size (N).
3. Determine the sample size (n).
4. Determine a starting point in the table by randomly picking a page and dropping your finger on the page with your eyes closed.
5. Choose a direction in which to read (up to down, left to right, or right to left).
6. Select the first n numbers read from the table whose last X digits are between 1 and N, where X is the number of digits in N (if N is a two-digit number, X is 2; if it is a four-digit number, X is 4; etc.).
7. Once a number is chosen, do not use it again.
8. If you reach the end of the table before obtaining your n numbers, pick another starting point, read in a different direction, use the first X digits, and continue until done.

Example: N = 300; n = 50; the starting point is column 3, row 2 of the Random Number Table (first page); read down. You would select population numbers 43, 13, 122, 169, etc., until you had 50 unique numbers.
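The worked example can be checked with a short sketch. It assumes that column 3 of the table is the run of fifteen values that follows the label 3, and the function name `read_table` is invented for the illustration.

```python
# Column 3 of the random number table, rows 1 to 15 (top to bottom).
COLUMN_3 = ["83699", "59468", "99699", "14043", "15013", "12600", "33122", "94169",
            "89916", "08985", "50394", "96057", "28519", "05095", "37786"]


def read_table(entries, N, n, start_row):
    """Read entries downward from start_row, keeping unique numbers in 1..N."""
    digits = len(str(N))              # X = number of digits in N
    chosen = []
    for entry in entries[start_row - 1:]:
        number = int(entry[-digits:])  # last X digits of the table entry
        if 1 <= number <= N and number not in chosen:
            chosen.append(number)
        if len(chosen) == n:
            break
    return chosen


# N = 300, n = 50, start at row 2 and read down; one column alone
# cannot supply 50 numbers, so this returns only the first few picks.
picked = read_table(COLUMN_3, N=300, n=50, start_row=2)
print(picked)
```

The first four picks match the example (43, 13, 122, 169); in practice reading would continue into further columns until 50 unique numbers are found.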

## Simple Random Sample Another example: Sample Members

- 01 Alaska Airlines
- 02 Alcoa
- 03 Ashland
- 04 Bank of America
- 05 BellSouth
- 06 Chevron
- 07 Citigroup
- 08 Clorox
- 09 Delta Air Lines
- 10 Disney
- 11 DuPont
- 12 Exxon Mobil
- 13 General Dynamics
- 14 General Electric
- 15 General Mills
- 16 Halliburton
- 17 IBM
- 18 Kellogg
- 19 KMart
- 20 Lowe's
- 21 Lucent
- 22 Mattel
- 23 Mead
- 24 Microsoft
- 25 Occidental Petroleum
- 26 JCPenney
- 27 Procter & Gamble
- 28 Ryder
- 29 Sears
- 30 Time Warner

N = 30, n = 6

9 9 4 3 7 8 7 9 6 1 4 5 7 3 7 3 7 5 5 2 9 7 9 6 9 3 9 0 9 4 3 4 4 7 5 3 1 6 1 8

5 0 6 5 6 0 0 1 2 7 6 8 3 6 7 6 6 8 8 2 0 8 1 5 6 8 0 0 1 6 7 8 2 2 4 5 8 3 2 6

8 0 8 8 0 6 3 1 7 1 4 2 8 7 7 6 6 8 3 5 6 0 5 1 5 7 0 2 9 6 5 0 0 2 6 4 5 5 8 7

8 6 4 2 0 4 0 8 5 3 5 3 7 9 8 8 9 4 5 4 6 8 1 3 0 9 1 2 5 3 8 8 1 0 4 7 4 3 1 9

6 0 0 9 7 8 6 4 3 6 0 1 8 6 9 4 7 7 5 8 8 9 5 3 5 9 9 4 0 0 4 8 2 6 8 3 0 6 0 6

5 2 5 8 7 7 1 9 6 5 8 5 4 5 3 4 6 8 3 4 0 0 9 9 1 9 9 7 2 9 7 6 9 4 8 1 5 9 4 1

8 9 1 5 5 9 0 5 5 3 9 0 6 8 9 4 8 6 3 7 0 7 9 5 5 4 7 0 6 2 7 1 1 8 2 6 4 4 9 3
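A hedged sketch of how six of the thirty companies might be drawn from the digit grid. It assumes the grid is a random number table of seven rows, with each extracted line being one column, and that the table is read row by row, left to right, two digits at a time; that reading is an interpretation of the layout, not something the slide states. Only the first two rows are needed.

```python
# First two rows of the digit grid under the assumed layout.
ROW_1 = "9943787961457373755297969390943447531618"
ROW_2 = "5065600127683676688208156800167822458326"


def two_digit_picks(rows, N, n):
    """Read each row two digits at a time; keep unique numbers in 1..N."""
    picks = []
    for row in rows:
        for i in range(0, len(row) - 1, 2):
            number = int(row[i:i + 2])
            if 1 <= number <= N and number not in picks:
                picks.append(number)
            if len(picks) == n:
                return picks
    return picks


selected = two_digit_picks([ROW_1, ROW_2], N=30, n=6)
print(selected)
```

Under these assumptions the picks are 16, 18, 01, 27, 08 and 15, which map to entries 16, 18, 01, 27, 08 and 15 of the company list.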

## Simple Random Sampling

Advantages:
- Simple
- Sampling error easily measured

Disadvantages:
- Needs a complete list of units
- Does not always achieve the best representativeness
- Units may be scattered and poorly accessible
- Less suitable for a heterogeneous population (minorities)

## Systematic random sampling

- Can be simple or stratified in nature.
- Elements are picked systematically, e.g. every 5th visitor, every 10th house, every 15th minute.

Steps:

1. Number the population (1, ..., N).
2. Decide on the sample size, n.
3. Decide on the interval size, k = N/n.
4. Randomly select an integer between 1 and k as the starting point.
5. Take every kth unit from that starting point.
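A minimal sketch of the systematic procedure, assuming N is at least n and using the integer part of N/n as the interval. The function name and the values N = 1000, n = 50 are illustrative only.

```python
import random


def systematic_sample(N, n, seed=None):
    """Return the unit numbers (1..N) chosen by systematic sampling."""
    rng = random.Random(seed)
    k = N // n                    # interval size (integer part of N/n)
    start = rng.randint(1, k)     # random start between 1 and k, inclusive
    return [start + i * k for i in range(n)]


units = systematic_sample(N=1000, n=50, seed=1)
print(units[:5], "...", units[-1])
```

Only the starting point is random; every later unit is fixed by the interval, which is why a cyclical ordering of the list can bias the result.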

## Systematic random sampling

Example

In a face-to-face consumer survey, a sample of 500 shoppers is planned for a 7-day (Mon.–Sun.) period at a shopping complex. The sampling is planned for 3 time blocks: 12-3 p.m., 3-6 p.m., and 6-9 p.m. Respondents are sub-divided into 4 ethnic groups: Malays (30%), Chinese (30%), Indian (30%), and Others (10%). Finally, they are categorized into Family and Single. Repeat persons are not allowed in the sampling. Determine your sampling plan and the pick-up interval for respondents.

- 500/7 ≈ 72 shoppers per day
- 72/3 = 24 shoppers per time block
- 24/3 = 8 shoppers per hour
- 8/4 = 2 shoppers per ethnic group per hour (on average)
- 60/8 = 7.5-minute pick-up interval
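The same arithmetic, written out as a small sketch so each division is explicit. Variable names are chosen for the example; the daily figure is rounded up, as in the slide.

```python
import math

total_sample = 500
days, blocks_per_day, hours_per_block, groups = 7, 3, 3, 4

per_day = math.ceil(total_sample / days)      # 500/7 = 71.4, rounded up to 72
per_block = per_day // blocks_per_day         # shoppers per 3-hour time block
per_hour = per_block // hours_per_block       # shoppers per hour
per_group_per_hour = per_hour // groups       # average per ethnic group per hour
interval_minutes = 60 / per_hour              # minutes between pick-ups

print(per_day, per_block, per_hour, per_group_per_hour, interval_minutes)
```

Rounding up to 72 per day gives 504 interviews over the week, slightly more than the 500 required.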

## Systematic random sampling

Advantages:
- Ensures representativeness across the list
- Easy to implement

Disadvantages:
- Needs a complete list of units
- If the ordering of the elements produces a cyclical pattern, systematic sampling may decrease the representativeness of the sample.

## Stratified Random Sampling

When the population is heterogeneous overall, but within it there are homogeneous populations (strata) the population is stratified.

## Stratified random sampling

When to use
- Population with distinct subgroups

Procedure
- Divide (stratify) the sampling frame into homogeneous subgroups (strata), e.g. age group, urban/rural areas, regions, occupations.
- Draw a random sample in each stratum.
- If stratum sizes are unequal, sample the same proportion of subjects from each stratum (the same sampling fraction is used, so each stratum's sample size is proportional to its size).

## Stratified random sampling

[Figure: the population 1, 3, 10, 7, 4, 6, 14, 20, 11, 8, 13, 15, 16, 12, 2 is split into stratum 1 (odd numbers) and stratum 2 (even numbers); the sample drawn is 3 and 7 from stratum 1, and 10 and 16 from stratum 2.]

Break the population into meaningful strata and take a random sample from each stratum. Sampling can be proportionate or disproportionate within strata. Use when:
- the population is not very uniform (e.g. shoppers, houses)
- key sub-groups need to be represented with more precision
- variability within groups affects research results
- sub-group inferences are needed

## Stratified Random Sampling

- Proportionate stratified sample: the size of the sample selected from each subgroup is proportional to the size of that subgroup in the entire population (self-weighting).
- Disproportionate stratified sample: the size of the sample selected from each subgroup is disproportional to the size of that subgroup in the population (needs weights).

## Stratified random sampling Proportionate

Suppose a sample of 180 companies is required for research on strategic planning practices among managers. The total company population is 600.

[Figure: strata by company type; the labels Sole Proprietor and Partnership and the values 300 and 120 survive.]
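A sketch of proportionate allocation for this example. The stratum sizes are assumptions: the figure only preserves the labels Sole Proprietor and Partnership and the numbers 300 and 120, so pairing them, and adding a third "Corporation" stratum of 180 to reach 600, is hypothetical.

```python
# Hypothetical stratum sizes summing to the stated population of 600.
strata_sizes = {"Sole Proprietor": 300, "Partnership": 120, "Corporation": 180}
sample_size = 180

population = sum(strata_sizes.values())
# Same sampling fraction (180/600 = 30%) applied to every stratum.
allocation = {name: round(size * sample_size / population)
              for name, size in strata_sizes.items()}

print(allocation, "total:", sum(allocation.values()))
```

Because every stratum uses the same sampling fraction, the sample is self-weighting and no adjustment is needed at the analysis stage.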

## Stratified random sampling Disproportionate

Suppose a sample of 180 companies is required for research on strategic planning practices among managers. The total company population is 600. The researcher decides to take equal numbers from each subgroup.

[Figure: strata by company type; the labels Sole Proprietor and Partnership survive.]
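A sketch of the disproportionate case, showing why weights are needed. The three stratum sizes (300, 120, 180) are hypothetical values chosen to sum to the stated population of 600; the slide's own figures are not recoverable.

```python
# Hypothetical stratum sizes summing to the stated population of 600.
strata_sizes = {"Sole Proprietor": 300, "Partnership": 120, "Corporation": 180}
sample_size = 180

# Equal allocation: the same number of companies from every stratum.
per_stratum = sample_size // len(strata_sizes)

# Design weight = stratum population / stratum sample size,
# i.e. how many population units each sampled company represents.
weights = {name: size / per_stratum for name, size in strata_sizes.items()}

print(per_stratum, weights)
```

With equal allocation, a sampled sole proprietor stands for more companies than a sampled partnership, so unweighted estimates for the whole population would be biased toward the smaller strata.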

## Stratified Random Sampling

Advantages:
- Can acquire information about the whole population and about individual strata
- Precision is increased if variability within strata is smaller (homogeneous) than between strata

Disadvantages:
- Sampling error is difficult to measure
- Different strata can be difficult to identify
- Loss of precision if there are small numbers in individual strata (resolved by sampling proportional to stratum population)

Cluster sampling
Principle
- The whole population is divided into groups, e.g. neighbourhoods.
- A random sample of these groups (clusters) is taken.
- Within the selected clusters, either all units (e.g. households) are included or a random sample of these units is taken.
- Provides a logistical advantage.

## Types of Cluster Sampling

Cluster sampling:
- One-stage sampling
- Two-stage sampling
- Multistage sampling

## Types of Cluster Sampling

- One-stage cluster sampling: randomly select clusters and sample all members of the selected clusters.
- Two-stage cluster sampling:
  - First stage: randomly select clusters, or weight the clusters using probability proportionate to size (PPS) sampling.
  - Second stage: random sampling within the selected clusters.
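The difference between the two designs can be sketched as follows. The five sections of twenty households each, and the function names, are made up for the illustration; clusters are selected with equal probability here rather than by PPS.

```python
import random


def one_stage(clusters, n_clusters, rng):
    """Select clusters at random and keep every unit in them."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {c: list(clusters[c]) for c in chosen}


def two_stage(clusters, n_clusters, n_units, rng):
    """Select clusters at random, then sample units within each one."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {c: rng.sample(clusters[c], min(n_units, len(clusters[c])))
            for c in chosen}


# Hypothetical population: 5 sections with 20 households each.
clusters = {f"Section {s}": [f"S{s}-H{h}" for h in range(1, 21)]
            for s in range(1, 6)}

rng = random.Random(7)
stage1 = one_stage(clusters, 2, rng)
stage2 = two_stage(clusters, 2, 5, rng)
print({c: len(u) for c, u in stage1.items()})
print(stage2)
```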

## Reasons for weighting:

- Not all clusters are the same size.
- Clusters can be weighted to compensate for the difference.
- The chance of a cluster being selected can be weighted.

## Example: 1-stage cluster sampling

[Figure: an area divided into Sections 1 to 5.]

## Example: 2-stage cluster sampling

[Figure: an area divided into Sections 1 to 5.]

## Probability Proportionate to Size Sampling

Selecting first-stage units with probability proportional to size.

Scenario:
- 7 villages in the region
- 3 villages to be sampled (first-stage units)
- Total: 6000 individuals
- List the cumulative frequency of all individuals
- Draw 3 random numbers between 1 and 6000, e.g. 985, 3830, 4457

[Table: villages 1 to 7 with their population counts; the values are not recoverable.]
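A sketch of the cumulative-frequency lookup. The village sizes below are hypothetical (the table's values did not survive); they are chosen only so that they sum to the stated total of 6000. The three random numbers are the ones given in the scenario.

```python
from bisect import bisect_left
from itertools import accumulate

# Hypothetical sizes for villages 1..7, summing to 6000 individuals.
village_sizes = [1000, 400, 200, 300, 1200, 1000, 1900]

# Cumulative frequency: village i covers individuals
# (cumulative[i-1], cumulative[i]] on the list of all 6000 people.
cumulative = list(accumulate(village_sizes))

draws = [985, 3830, 4457]  # random numbers between 1 and 6000
# The first cumulative total that reaches the draw identifies the village.
selected = [bisect_left(cumulative, d) + 1 for d in draws]

print(cumulative)
print(selected)
```

Larger villages cover a longer stretch of the cumulative list and are therefore more likely to be hit. Independent draws can land in the same village twice, in which case another number would be drawn.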

## Multistage Cluster Sampling

1. Lay out primary clusters, then sample randomly.
2. Lay out secondary clusters, then sample randomly.
3. Lay out tertiary clusters, then sample randomly.
4. Etc.

Multistage - An Example
The president of Supermarkets, Inc. decided to sample purchases at 150 stores in the US. The first stage is to select, on the basis of clustering (save travel time), 15 of the 150 stores. The researcher recommends that cash register files be randomly selected at each of the 15 stores. [second stage] Then select every 20th purchase in a file using a random start. [final stage]

## Multi-stage Cluster Sampling

[Figure: primary clusters 1 to 10 and secondary clusters 1 to 15, with simple random sampling within the secondary clusters.]

## Multi-stage Sample Designs

Many surveys use complex sample designs that combine several of the above elements in a multi-stage sampling framework. For example, face-to-face in-home surveys of people often employ three stages:

1. Systematic PPS sampling of areas
2. Cluster samples of households within areas
3. Random selection of one person from each household (unequal sampling probabilities)

Cluster sampling

Advantages:
- Simple, as a complete list of sampling units within the population is not required
- Much more efficient; less costly
- Less travel and fewer resources required

Disadvantages:
- Members of a cluster may be more alike than members of different clusters (homogeneous). This dependence needs to be taken into account in the sample size and in the analysis (design effect).

[Figure: a population of $L$ strata, where stratum $l$ contains $n_l$ units, contrasted with a population of $C$ clusters.]

## Stratified and Cluster Sampling

| | Stratified | Cluster |
|---|---|---|
| Number of subgroups | Population divided into few subgroups | Population divided into many subgroups |
| Within subgroups | Homogeneity | Heterogeneity |
| Between subgroups | Heterogeneity | Homogeneity |
| Selection | Choice of elements from within each subgroup | Random choice of subgroups |

Summary
- Simple random sample: select every unit randomly, e.g. by using a random number table.
- Systematic random sample: pick a random case from the first k cases of the sampling frame; select every kth case after that one.
- Stratified random sample: divide the population into groups (strata), then select a simple random sample from each stratum.
- Cluster sampling: divide the population into groups called clusters or primary sampling units (PSUs); take a random sample of the clusters.
- Multistage sampling: several levels of nested clusters, often including both stratified and cluster sampling techniques.


## HOW MANY - Systematic and Random Errors

- Error: the difference between a calculated or observed value and the true value.
- Systematic errors: errors that occur reproducibly, from faulty calibration of equipment or observer bias. Statistical analysis is generally not useful here; instead, corrections must be made based on the experimental conditions.
- Random errors: errors that result from fluctuations in observations. Experiments must be repeated a sufficient number of times to establish the precision of the measurement.

## Sampling Errors (1)

- Random (sampling) error: the difference between the sample result and the result of a census conducted using identical procedures. It is the result of chance variation in the selection of sampling units.
- If samples are selected properly (e.g. through randomization), the sample is usually deemed a good approximation of the population and thus capable of delivering an accurate result.
- Usually, the random sampling error arising from statistical fluctuation is small, but sometimes the margin of error can be significant.

## Sampling Errors (2)

- Systematic (non-sampling) errors: these result from factors such as an improper research design that causes response error, or from errors committed in the execution of the research, such as errors in recording responses and non-response from individuals who were not contacted or who refused to participate.
- Both random sampling errors and systematic (non-sampling) errors reduce the representativeness of a sample and consequently the value of the information that business researchers derive from it.

## Graphical Depiction of Sampling Errors

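[Figure: nested sets running from the total population to the sampling frame, the planned sample, and the respondents (actual sample). The gaps between successive sets are labeled sampling frame error, random sampling error, and non-response error.]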

## Systematic and Random Errors

- Systematic error (or bias): representativeness (validity), information bias
- Random error (sampling error): precision

Representativeness (validity)
- The sample should accurately reflect the distribution of the relevant variables in the population:
  - Person (age, sex)
  - Place, e.g. urban vs. rural
  - Time, e.g. seasonality
- Representativeness is essential in order to generalise.

Information bias
- A systematic problem in collecting information
- Inaccurate measuring: scales, ultrasound, lab tests
- Badly asked questions: ambiguous, not offering the right options

Precision
- No sample is an exact mirror image of the population.
- The size of the random difference between a sample and the population from which it is drawn determines the precision.
- Sampling error depends upon:
  - the size of the sample
  - the distribution of the characteristic of interest in the population
- The size of the error can be measured in probability samples.

## Accuracy vs. Precision

- Systematic error causes a lack of accuracy: a measure of how close an experimental result is to the true value.
- Random error causes a lack of precision: a measure of how exactly the result is determined, and also of how reproducible the result is.

[Figure: target diagrams; the surviving labels are "No precision" and "Random error".]

Continuous Data

## Calculating Sample Size: How Big is Big Enough?

- Sample results are almost never identical to those for the entire population.
- The larger the sample of clients, the greater the likelihood that the statistical analysis will yield significant results that closely resemble the entire client population, but:
  - Too large: wastes time, resources and money
  - Too small: inaccurate results

## Factors affecting the sample size..

- Variability of the population characteristic under investigation
- Level of confidence desired in the estimate (z)
- Degree of precision desired in estimating the population characteristic (e)

## Factors affecting the sample size..

- Precision
- Confidence level
- Size of interval estimate
- Population dispersion

## Calculating Sample Size

When estimating a population mean:

$$n = \frac{z^2 \sigma^2}{e^2}$$

where $e$, the sampling error, is the difference between the sample mean and the population mean.

When estimating a population proportion:

$$n = \frac{z^2 \, p(1 - p)}{e^2}$$

where $e$, the sampling error, is the difference between the sample proportion and the population proportion.

We need the normal distribution and the central limit theorem to understand these formulas.
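The two formulas translate directly into code. This is a minimal sketch: the function names are invented, the input values are hypothetical, and results are rounded up because a sample size must be a whole number at least as large as the computed value.

```python
import math


def n_for_mean(z, sigma, e):
    """Sample size for estimating a mean: n = z^2 * sigma^2 / e^2."""
    return math.ceil(z**2 * sigma**2 / e**2)


def n_for_proportion(z, p, e):
    """Sample size for estimating a proportion: n = z^2 * p(1-p) / e^2."""
    return math.ceil(z**2 * p * (1 - p) / e**2)


# Hypothetical inputs: 95% confidence (z = 1.96).
n_mean = n_for_mean(z=1.96, sigma=10, e=2)
n_prop = n_for_proportion(z=1.96, p=0.5, e=0.05)
print(n_mean, n_prop)
```

Setting p = 0.5 maximizes p(1 - p), so it gives the most conservative sample size when no prior estimate of the proportion is available.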

## Central limit theorem



The central limit theorem states that, given a variable with mean $\mu$ and variance $\sigma^2$, the sampling distribution of the sample mean approaches a normal distribution with mean $\mu$ and variance $\sigma^2/n$ as the sample size $n$ increases. This is true even when the distribution of the variable itself is not normal. The normal distribution is widely used. Part of its appeal is that it is well behaved and mathematically tractable.
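A small simulation can make the theorem concrete. This sketch draws repeated samples from an exponential distribution (mean 1, variance 1, clearly non-normal); the choice of distribution, sample size and number of replicates is arbitrary and only for illustration.

```python
import random
import statistics

rng = random.Random(0)
n, replicates = 30, 5000

# Each replicate: the mean of n draws from an exponential distribution (rate 1).
sample_means = [
    statistics.fmean(rng.expovariate(1.0) for _ in range(n))
    for _ in range(replicates)
]

mean_of_means = statistics.fmean(sample_means)   # should be close to mu = 1
var_of_means = statistics.variance(sample_means)  # should be close to sigma^2/n

print(round(mean_of_means, 3), round(var_of_means, 4), round(1 / n, 4))
```

The sample means cluster around 1 with a variance near 1/30, even though individual exponential draws are strongly skewed.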

## The beauty of the normal curve:

No matter what $\mu$ and $\sigma$ are, the area between $\mu - \sigma$ and $\mu + \sigma$ is about 68%; the area between $\mu - 2\sigma$ and $\mu + 2\sigma$ is about 95%; and the area between $\mu - 3\sigma$ and $\mu + 3\sigma$ is about 99.7%. Almost all values fall within 3 standard deviations.
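These areas can be checked with `statistics.NormalDist` from the Python standard library. The helper name `area_within` and the second parameter pair (mean 50, standard deviation 12) are arbitrary choices for the illustration.

```python
from statistics import NormalDist


def area_within(k, mu=0.0, sigma=1.0):
    """Area under the normal curve between mu - k*sigma and mu + k*sigma."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(mu + k * sigma) - dist.cdf(mu - k * sigma)


# The areas are the same for any mean and standard deviation.
for mu, sigma in [(0, 1), (50, 12)]:
    print(mu, sigma, [round(area_within(k, mu, sigma), 4) for k in (1, 2, 3)])
```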

## Standard Normal Distribution (Z)

All normal distributions can be converted into the standard normal curve by using the following formula:

$$Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}}$$

Somebody calculated all the integrals for the standard normal and put them in a table, so we never have to integrate. Even better, computers now do all the integration.

## What is the area to the left of Z=1.51 in a standard normal curve?

Area is 93.45%
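The table lookup can be reproduced with the standard library's `statistics.NormalDist`, whose `cdf` method returns the area to the left of a given value.

```python
from statistics import NormalDist

# Area to the left of Z = 1.51 under the standard normal curve.
area = NormalDist().cdf(1.51)
print(f"{area:.4f}")
```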

Problem 1
A study is to be performed to determine a certain parameter in a community. A previous study obtained a standard deviation of 46. If a sampling error of up to 4 is to be accepted, how many subjects should be included in this study at the 99% level of confidence?

$$n = \frac{z^2 \sigma^2}{e^2} = \frac{2.58^2 \times 46^2}{4^2} \approx 880.3$$

so $n = 881$ after rounding up.
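A quick check of this calculation, as a sketch. It also shows how the answer shifts slightly if the z-value is taken from the inverse normal CDF instead of the rounded table value 2.58; the variable names are chosen for the example.

```python
import math
from statistics import NormalDist

sigma, e = 46, 4

z_rounded = 2.58                               # table value used in the solution
z_exact = NormalDist().inv_cdf(1 - 0.01 / 2)   # two-sided 99%, about 2.5758

n_rounded = math.ceil(z_rounded**2 * sigma**2 / e**2)
n_exact = math.ceil(z_exact**2 * sigma**2 / e**2)

print(n_rounded, n_exact)
```

The rounded z gives 881, matching the worked solution; the unrounded z gives a slightly smaller figure, so the table value errs on the safe side.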

Problem 2
The aim is to estimate the proportion of anaemic children in a certain preparatory school. A similar study at another school found a proportion of 30%. Compute the minimum sample size required at a confidence level of 95%, accepting a difference of up to 4% from the true population proportion.

$$n = \frac{z^2 \, p(1 - p)}{e^2}$$

$n =$