
Sampling

Real Image of the Population

Sampling
A sample is a subset of a larger population of objects, individuals, households, businesses, organizations and so forth. Sampling enables researchers to estimate unknown characteristics of the population in question. A finite group is called a population, whereas a non-finite (infinite) group is called a universe. A census is an investigation of all the individual elements of a population.

Population

Sample

Reasons for Sampling


* Budget and time constraints (in the case of large populations)
* High degree of accuracy and reliability (if the sample is representative of the population)
* Sampling may sometimes produce more accurate results than a census: in a census there is more risk of interviewer and other errors because of the high volume of persons contacted and the number of census takers, some of whom may not be well trained
* Industrial production: destructive tests

The Sampling Process


1. Define the target population
2. Select a sampling frame
3. Determine if a probability or non-probability sampling method will be chosen
4. Plan procedure for selecting sampling units
5. Determine sample size
6. Select actual sampling units
7. Conduct fieldwork

Defining the Target Population


* The target population is the complete group whose relevant characteristics are to be determined through the sampling
* A target population may be, for example, all faculty members in the Department of Industrial Engineering, all housewives in Istanbul, all pre-college students in Kadıköy, or all medical doctors in Beşiktaş
* The target group should be clearly delineated if possible. For example, do all pre-college students include only primary and secondary students, or also students in other specialized educational institutions?

The Sampling Frame


* The sampling frame is a list of the population elements from which the sample will be drawn
* Examples of sampling frames are a student telephone directory (for the student population), the list of companies on the stock exchange, the directory of medical doctors and specialists, and the yellow pages (for businesses)
* Often, the list does not include the entire population. This discrepancy is a common source of error associated with the selection of the sample (sampling frame error)
* Information relating to sampling frames can be obtained from commercial organizations

Sampling Units
* The sampling unit is a single element or group of elements subject to selection in a sample. Examples:
  * Every student at the university whose first name begins with the letter F
  * All child passengers under 18 years of age who are traveling in a train from destination X to destination Y
  * All jeweler shops in Kapalıçarşı in Istanbul

Sample Size
How you sample is as important as how many you sample.
* How: probability samples or non-probability samples
* How many: statistical precision, industry standards

HOW - Types of Sampling Methods

Nonprobability sampling

Probability sampling

Types of Sampling Methods


* Non-Probability Sampling: an arbitrary means of selecting sampling units based on subjective considerations, such as personal judgment or convenience. It is generally less preferred than probability sampling
* Probability Sampling: every element in the population under study has a known, non-zero probability of being selected for the sample. In the simplest designs, such as simple random sampling, every member of the population has an equal probability of being selected

Types of sampling methods


Nonprobability
* Convenience sampling
* Judgment sampling
* Quota sampling
* Snowball sampling

Probability
* Simple random sampling
* Systematic random sampling
* Stratified random sampling
* Cluster sampling

Nonprobability sampling methods


* Convenience sampling relies upon convenience and access
* Judgment sampling relies upon the belief that participants fit the characteristics of interest
* Quota sampling emphasizes representation of specific characteristics
* Snowball sampling relies upon respondent referrals of others with like characteristics

Convenience sampling
Elements are selected for convenience because they're available or easy to find. Selection is based on one's convenience, by accident, or in a haphazard way. Often, respondents are selected because they happen to be in the right place at the right time. Thus this sampling method is also known as a haphazard, accidental, or availability sample. Examples:
* Interviewing people on a street corner or at the mall
* Surveying students in a classroom
* Magazine surveys
* Observing conversations in an on-line chat room

Judgement Sampling
* This is a sampling technique in which the business researcher selects the sample based on judgment about some appropriate characteristic of the sample members
* Example 1: The Consumer Price Index (CPI) is based on a judgment sample of market-based items, housing costs, and other selected goods and services which are representative of the consumption of most of the overall population
* Example 2: Selection of certain voting districts which serve as indicators for the national voting trend

Quota Sampling
* This is a sampling technique in which the business researcher ensures that certain characteristics of a population are represented in the sample to the extent that he or she desires
* Example: A business researcher wants to determine, through interviews, the demand for Product X in a district by gender. If the sample is to consist of 100 units, the number of individuals interviewed from each gender should correspond to that group's percentage of the total population of the district

Quota Sampling, continued


The problem is that even when we know that a quota sample is representative of the particular characteristics for which quotas have been set, we have no way of knowing if the sample is representative in terms of any other characteristics.
For example, suppose quotas have been set for gender only. Under the circumstances, it's no surprise that the sample is representative of the population only in terms of gender, not in terms of race. Interviewers are only human.

Quota Sampling, continued


* Example: select a quota sample comprising 3000 persons in country X using three control characteristics: gender, age and religion.
* Here, the three control characteristics are considered independently of one another. To calculate the desired number of sample elements possessing the various attributes of the specified control characteristics, the distribution pattern of the general population of the country in terms of each control characteristic is examined.

Control characteristic   Attribute         Population distribution   Sample elements
Gender                   Male              50.7%                     3000 x 50.7% = 1521
                         Female            49.3%                     3000 x 49.3% = 1479
Age                      20-29 years       13.4%                     3000 x 13.4% = 402
                         30-39 years       52.3%                     3000 x 52.3% = 1569
                         40 years & over   34.3%                     3000 x 34.3% = 1029
Religion                 Christianity      76.4%                     3000 x 76.4% = 2292
                         Islam             14.8%                     3000 x 14.8% = 444
                         Hinduism          6.6%                      3000 x 6.6% = 198
                         Others            2.2%                      3000 x 2.2% = 66
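The quota arithmetic can be sketched in a few lines of Python. This is only an illustration, assuming the gender and religion shares from the table; the names `distribution` and `quota_sizes` are made up for the sketch.

```python
# Quota sizes for a sample of 3000, one control characteristic at a time.
SAMPLE_SIZE = 3000

# Population shares (in percent) for two of the control characteristics.
distribution = {
    "Gender": {"Male": 50.7, "Female": 49.3},
    "Religion": {"Christianity": 76.4, "Islam": 14.8, "Hinduism": 6.6, "Others": 2.2},
}

def quota_sizes(shares, sample_size):
    """Return the number of sample elements required for each attribute."""
    return {name: round(sample_size * pct / 100) for name, pct in shares.items()}

quotas = {char: quota_sizes(shares, SAMPLE_SIZE) for char, shares in distribution.items()}
for characteristic, sizes in quotas.items():
    print(characteristic, sizes)
```

Because each characteristic is treated independently, the quotas within each characteristic add up to the full sample of 3000.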

Quota Sampling continued


Quota sampling has advantages and disadvantages:
* Advantages include the speed of data collection, lower cost, convenience, and representativeness (if the subgroups in the sample are selected properly)
* Disadvantages include the element of subjectivity (units within each quota are chosen by convenience rather than by a probability-based procedure, which can lead to improper selection of sampling units)

Snowball sampling
In snowball sampling, an initial group of respondents is selected, usually at random. After being interviewed, these respondents are asked to identify others who belong to the target population of interest. Subsequent respondents are selected based on the referrals. This rarely leads to a representative sample, but it is useful when the population is inaccessible or hard to find, e.g.:
* the homeless
* forced sales properties
* wound-up companies


Snowball sampling

* Involves building a sample through referrals.
* Once an initial respondent is identified, you ask them to identify others who meet the study criteria. Each of those individuals is then asked for further recommendations.
* Often used when working with populations that are not easily identified or accessed. For example, a population of homeless persons can be hard to identify, but by using referrals a sample can be built quite quickly.
* Snowballing does not guarantee representativeness. An option here is to develop a population profile from the literature, and assess representativeness by comparing your sample to your profile.

Snowball Sampling continued


More systematic versions of snowball sampling can reduce the potential for bias. For example, respondent-driven sampling gives financial incentives to respondents to recruit peers

Probability Sampling Methods


Probability
* Simple random sampling
* Systematic random sampling
* Stratified random sampling
* Cluster sampling

Simple Random Sampling


For the sample to be representative, it must be obtained randomly. It is a simple random sample if each item in the population has an equal chance of being selected. Each possible sample of a given size (n) has a known and equal probability of being the sample actually selected. This implies that every element is selected independently of every other element.

Simple random sampling


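[Figure: a population of lettered elements (A, B, S, T, P, C, G, N, Y, G, K, Q, L, W, E) from which the sample B, G, T, K is drawn]

Probability that an element is selected = n/N (sample size divided by population size)

* Suitable when the population is rather uniform (e.g. school/college students, low-cost houses)
* Simplest, fastest, cheapest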

Procedures for Drawing Probability Samples

Simple Random Sampling

1. Select a suitable sampling frame
2. Assign each element a number from 1 to N (population size)
3. Generate n (sample size) different random numbers between 1 and N
4. The numbers generated denote the elements that should be included in the sample
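The four steps can be sketched with Python's standard `random` module. This is a minimal illustration, not a prescribed implementation; the frame of 16 elements and the function name `simple_random_sample` are assumptions made for the example.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Draw n distinct elements from the sampling frame, each with equal chance."""
    rng = random.Random(seed)
    N = len(frame)
    # Steps 2-3: number the elements 1..N and draw n different random numbers.
    chosen_numbers = rng.sample(range(1, N + 1), n)
    # Step 4: the drawn numbers identify the sampled elements.
    return [frame[i - 1] for i in chosen_numbers]

frame = [f"element_{i}" for i in range(1, 17)]  # hypothetical frame of 16 elements
sample = simple_random_sample(frame, n=4, seed=1)
print(sample)
```

Passing a seed only makes the draw repeatable; omit it to get a fresh sample each time.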

Random Number Table


        Column
Row     1      2      3      4      5      6      7      8      9      10
1       96268  11860  83699  38631  90045  69696  48572  05917  51905  10052
2       03550  59144  59468  37984  77892  89766  86489  46619  50236  91136
3       22188  81205  99699  84260  19693  36701  43233  62719  53117  71153
4       63759  61429  14043  44095  84746  22018  19014  76781  61086  90216
5       55006  17765  15013  77707  54317  48862  53823  52905  70754  68212
6       81972  45644  12600  01951  72166  52682  37598  11955  73018  23528
7       06344  50136  33122  31794  86723  58037  36065  32190  31367  96007
8       92363  99784  94169  03652  80824  33407  40837  97749  18361  72666
9       96083  16943  89916  55159  62184  86206  09764  20244  88388  98675
10      92993  10747  08985  44999  35785  65036  05933  77378  92339  96151
11      95083  70292  50394  61947  65591  09774  16216  63561  59751  78771
12      77308  60721  96057  86031  83148  34970  30892  53489  44999  18021
13      11913  49624  28519  27311  61586  28576  43092  69971  44220  80410
14      70648  47484  05095  92335  55299  27161  64486  71307  85883  69610
15      92771  99203  37786  81142  44271  36433  31726  74879  89384  76886

How to Use a Random Number Table


1. Number each member of the population.
2. Determine population size (N).
3. Determine sample size (n).
4. Determine the starting point in the table by randomly picking a page and dropping your finger on the page with your eyes closed.
5. Choose a direction in which to read (up to down, left to right, or right to left).
6. Select the first n numbers read from the table whose last X digits are between 1 and N, where X is the number of digits in N (if N is a two-digit number, X is 2; if it is a four-digit number, X is 4; etc.).
7. Once a number is chosen, do not use it again.
8. If you reach the end of the table before obtaining your n numbers, pick another starting point, read in a different direction, use the first X digits, and continue until done.

Example: N = 300; n = 50; starting point is column 3, row 2 of the Random Number Table (first page); read down. The entries 14043, 15013, 33122 and 94169 give population numbers 43, 13, 122 and 169 (entries such as 59468, whose last three digits exceed 300, are skipped). Continue until you have 50 unique numbers.
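The reading rule in step 6 can be checked with a short Python sketch. It assumes the entries of column 3 of the Random Number Table from row 2 downward; the function name `select_from_table` is invented for the illustration.

```python
# Column 3 of the random number table, rows 2 to 15, read downward.
column_3_from_row_2 = [
    "59468", "99699", "14043", "15013", "12600", "33122", "94169",
    "89916", "08985", "50394", "96057", "28519", "05095", "37786",
]

def select_from_table(entries, N, n):
    """Keep the last X digits of each entry (X = number of digits in N),
    accept values between 1 and N, and skip values already chosen."""
    x = len(str(N))
    chosen = []
    for entry in entries:
        value = int(entry[-x:])
        if 1 <= value <= N and value not in chosen:
            chosen.append(value)
        if len(chosen) == n:
            break
    return chosen

picked = select_from_table(column_3_from_row_2, N=300, n=50)
print(picked)
```

This column alone does not yield 50 numbers, so in practice the reading would continue from another starting point, as step 8 describes.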

Simple Random Sample Another example: Sample Members


01 Alaska Airlines      11 DuPont              21 Lucent
02 Alcoa                12 Exxon Mobil         22 Mattel
03 Ashland              13 General Dynamics    23 Mead
04 Bank of America      14 General Electric    24 Microsoft
05 BellSouth            15 General Mills       25 Occidental Petroleum
06 Chevron              16 Halliburton         26 JCPenney
07 Citigroup            17 IBM                 27 Procter & Gamble
08 Clorox               18 Kellogg             28 Ryder
09 Delta Air Lines      19 Kmart               29 Sears
10 Disney               20 Lowe's              30 Time Warner

N = 30, n = 6

Simple Random Sampling: Random Number Table

9588658  9006029  4684051  3582985  7600775  8064879  7030610  9118495
6275365  1713653  4645089  5823150  7387846  3679658  7778939  3668444
7669768  5884786  5835533  2254847  9066800  7808907  9151599  6513395
9650515  3879994  9001997  0022470  9195026  4663092  3758477  4808861
4201291  7220648  5464882  3547316  1854054  6353694  1281049  8679613

Simple Random Sampling


Advantages
* Simple
* Sampling error is easily measured

Disadvantages
* Needs a complete list of units
* Does not always achieve the best representativeness
* Units may be scattered and poorly accessible
* In a heterogeneous population, small subgroups (e.g. minorities) may be under-represented

Systematic random sampling


* Can be simple or stratified in nature
* Systematic in the picking of elements, e.g. every 5th visitor, every 10th house, every 15th minute

Steps:
* Number the population (1, ..., N)
* Decide on the sample size, n
* Decide on the interval size, k = N/n
* Randomly select an integer between 1 and k
* Starting from that unit, take every kth unit
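A minimal Python sketch of these steps follows. It assumes a hypothetical population numbered 1 to 100 and uses integer division for the interval, which is a simplification when N is not an exact multiple of n.

```python
import random

def systematic_sample(population, n, seed=None):
    """Pick a random start in 1..k, then take every k-th unit (k = N // n)."""
    rng = random.Random(seed)
    N = len(population)
    k = N // n                      # interval size
    start = rng.randint(1, k)       # random integer between 1 and k
    positions = range(start, N + 1, k)
    return [population[p - 1] for p in positions][:n]

visitors = list(range(1, 101))      # hypothetical population numbered 1..100
sample = systematic_sample(visitors, n=20, seed=7)
print(sample)
```

Only the starting point is random; every later unit is fixed by the interval, which is why a cyclical ordering of the list can bias the result.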


Example

In a face-to-face consumer survey, a sample of 500 shoppers is planned for a 7-day (Mon.-Sun.) period at a shopping complex. The sampling is planned for 3 time blocks: 12-3 p.m., 3-6 p.m. and 6-9 p.m. Respondents are sub-divided into 4 ethnic groups: Malays (30%), Chinese (30%), Indian (30%) and Others (10%). Finally, they are categorized into Family and Single. Repeat persons are not allowed in the sampling. Determine your sampling plan and the pick-up interval for respondents.

* 500/7 = 71.4, so about 72 shoppers per day
* 72/3 = 24 shoppers per time block
* 24/3 = 8 shoppers per hour
* 8/4 = 2 shoppers per ethnic group per hour (on average)
* 60/8 = 7.5, so one respondent is picked up every 7.5 minutes

Systematic random sampling


Advantages
* Ensures representativeness across the list
* Easy to implement

Disadvantages
* Needs a complete list of units
* If the ordering of the elements produces a cyclical pattern, systematic sampling may decrease the representativeness of the sample

Stratified Random Sampling


When the population is heterogeneous overall but contains homogeneous subpopulations (strata), the population is stratified.

Stratified random sampling


When to use
Population with distinct subgroups

Procedure
* Divide (stratify) the sampling frame into homogeneous subgroups (strata), e.g. age groups, urban/rural areas, regions, occupations
* Draw a random sample in each stratum
* If the strata are of unequal size, sample the same proportion of subjects from each stratum (the same sampling fraction is used, so each stratum's sample size is proportional to its size)

Stratified random sampling


[Figure: population 1, 3, 10, 7, 4, 6, 14, 20, 11, 8, 13, 15, 16, 12, 2 split into Stratum 1 (odd numbers) and Stratum 2 (even numbers); sample: 3 and 7 from Stratum 1, 10 and 16 from Stratum 2]

* Break the population into meaningful strata and take a random sample from each stratum
* Can be proportionate or disproportionate within strata

When:
* the population is not very uniform (e.g. shoppers, houses)
* key sub-groups need to be represented with more precision
* variability within groups affects research results
* sub-group inferences are needed

Stratified Random Sampling


* Proportionate stratified sample: the size of the sample selected from each subgroup is proportional to the size of that subgroup in the entire population (self-weighting)
* Disproportionate stratified sample: the size of the sample selected from each subgroup is not proportional to the size of that subgroup in the population (needs weights)

Stratified random sampling Proportionate


Suppose a sample of 180 companies is required for research on strategic planning practices among managers. The total company population is 600.

Type of company    Population   Sample stratum    Sample
Sole Proprietor    180          180/600 x 180     54
Partnership        300          300/600 x 180     90
Private Limited    120          120/600 x 180     36
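The proportionate allocation can be reproduced with a short Python sketch. The stratum labels below are placeholders chosen for the illustration; only the three population counts (300, 120 and 180) and the total sample of 180 come from the example.

```python
def proportionate_allocation(strata_sizes, sample_size):
    """Allocate the sample to strata in proportion to stratum size."""
    total = sum(strata_sizes.values())
    return {name: round(size / total * sample_size) for name, size in strata_sizes.items()}

# Stratum labels are illustrative; the sizes are the three population counts in the example.
strata = {"stratum_A": 300, "stratum_B": 120, "stratum_C": 180}
allocation = proportionate_allocation(strata, sample_size=180)
print(allocation)
```

With other numbers, rounding can make the allocated sizes add up to slightly more or less than the intended total, so the result should be checked and adjusted by hand.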

Stratified random sampling Disproportionate


Suppose a sample of 180 companies is required for research on strategic planning practices among managers. The total company population is 600. The researcher decides to take equal numbers from each subgroup.

Type of company    Population   Sample stratum      Sample
Sole Proprietor    180          33.3/100 x 180      60
Partnership        300          20/100 x 300        60
Private Limited    120          50/100 x 120        60
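A disproportionate sample needs weights before strata are combined. The slides do not give a formula, so the sketch below assumes the common choice of weighting each stratum by the inverse of its sampling fraction (population size divided by sample size); the stratum labels are placeholders.

```python
def design_weights(strata_sizes, strata_samples):
    """Weight for a stratum = population size / sample size (inverse sampling fraction)."""
    return {name: strata_sizes[name] / strata_samples[name] for name in strata_sizes}

strata_sizes = {"stratum_A": 180, "stratum_B": 300, "stratum_C": 120}
strata_samples = {name: 60 for name in strata_sizes}   # equal allocation of 60 per stratum
weights = design_weights(strata_sizes, strata_samples)
print(weights)

# Each sampled company "stands for" this many companies in its stratum.
represented = sum(weights[name] * strata_samples[name] for name in strata_sizes)
print(represented)
```

Under this assumption the weighted sample adds back up to the full population of 600 companies.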

Stratified random sampling

Advantages
* Can acquire information about the whole population and individual strata
* Precision is increased if variability within strata is smaller (homogeneous) than between strata

Disadvantages
* Sampling error is difficult to measure
* Different strata can be difficult to identify
* Loss of precision if numbers in individual strata are small (resolved by sampling proportional to stratum population)

Cluster sampling
Principle
* Whole population divided into groups, e.g. neighbourhoods
* Random sample taken of these groups (clusters)
* Within selected clusters: all units (e.g. households) included, or a random sample of these units
* Provides logistical advantage

Types of Cluster Sampling


[Diagram: cluster sampling branches into one-stage sampling, two-stage sampling (simple cluster sampling or probability proportionate to size sampling) and multistage sampling]


* One-stage cluster sampling: randomly select clusters and sample all members of the selected clusters
* Two-stage cluster sampling:
  * For the first stage: randomly select clusters, or weight the clusters using probability proportionate to size (PPS) sampling
  * For the second stage: random sampling within the clusters

Reasons for weighting:
* Not all clusters are the same size
* Can weight the clusters to equate the difference
* Can weight the chances of a cluster being selected

Example: 1-stage cluster sampling
[Figure: an area divided into Sections 1 to 5]

Example: 2-stage cluster sampling
[Figure: an area divided into Sections 1 to 5]

Probability Proportionate to Size Sampling


Selecting first-stage units with probability proportional to size

Scenario
- 7 villages in the region
- 3 villages to be sampled (first-stage units)
- Total: 6000 individuals
- List the cumulative frequency of all individuals
- Draw 3 random numbers between 1 and 6000, e.g. 985, 3830, 4457

Village   Individuals   Cumulative frequency
1         30            30
2         400           430
3         1100          1530
4         500           2030
5         2000          4030
6         100           4130
7         1870          6000

With these draws, 985 falls in village 3 (431-1530), 3830 in village 5 (2031-4030) and 4457 in village 7 (4131-6000), so villages 3, 5 and 7 are selected.
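The lookup from a random number to a village can be sketched in Python with the standard `bisect` module. The village sizes and the three draws are those of the scenario; the helper name `village_for` is invented for the illustration.

```python
from bisect import bisect_left
from itertools import accumulate

village_sizes = [30, 400, 1100, 500, 2000, 100, 1870]   # villages 1..7
cumulative = list(accumulate(village_sizes))             # 30, 430, ..., 6000

def village_for(number):
    """Return the village (1-based) whose cumulative range contains the number."""
    return bisect_left(cumulative, number) + 1

draws = [985, 3830, 4457]            # the random numbers from the scenario
selected = [village_for(d) for d in draws]
print(selected)   # [3, 5, 7]
```

Larger villages cover a wider stretch of the cumulative list, so they are more likely to be hit by a draw. Two draws can land in the same village, in which case a replacement draw is one possible way to proceed.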

Multistage Cluster Sampling


* Lay out primary clusters, then sample randomly
* Lay out secondary clusters, then sample randomly
* Lay out tertiary clusters, then sample randomly
* etc.

Multistage - An Example
The president of Supermarkets, Inc. decided to sample purchases at 150 stores in the US. The first stage is to select, on the basis of clustering (save travel time), 15 of the 150 stores. The researcher recommends that cash register files be randomly selected at each of the 15 stores. [second stage] Then select every 20th purchase in a file using a random start. [final stage]

Multi-stage Cluster Sampling


[Figure: primary clusters 1 to 10; secondary clusters 1 to 15 within the selected primary clusters; simple random sampling within the secondary clusters]

Multi-stage Sample Designs


Many surveys use complex sample designs that combine several of the above elements in a multi-stage sampling framework. For example, face-to-face in-home surveys of people often employ three stages:
* Systematic PPS sampling of areas
* Cluster samples of households within areas
* Random selection of one person from each household (unequal sampling probabilities)

Cluster sampling
Advantages
* Simple, as a complete list of sampling units within the population is not required
* Much more efficient; less costly
* Less travel/resources required

Disadvantages
* Members of a cluster may be more alike than members of different clusters (homogeneous), and this dependence needs to be taken into account in the sample size and in the analysis (design effect)

Difference Between Cluster and Stratified Sampling

* Stratified sampling: population of L strata, where stratum l contains n_l units. Take a simple random sample in every stratum.
* Cluster sampling: population of C clusters. Take a simple random sample (srs) of clusters and sample every unit in the chosen clusters.

Stratified and Cluster Sampling


Stratified
* Population divided into few subgroups
* Homogeneity within subgroups
* Heterogeneity between subgroups
* Choice of elements from within each subgroup

Cluster
* Population divided into many subgroups
* Heterogeneity within subgroups
* Homogeneity between subgroups
* Random choice of subgroups

Summary
* Simple random sample: select every unit randomly, for example by using a random number table.
* Systematic random sample: pick a random case from the first k cases of the sampling frame; select every kth case after that one.
* Stratified random sample: divide a population into groups (strata), then select a simple random sample from each stratum.
* Cluster sampling: divide the population into groups called clusters or primary sampling units (PSUs); take a random sample of the clusters.
* Multistage sampling: several levels of nested clusters, often including both stratified and cluster sampling techniques.


HOW MANY - Systematic and Random Errors


* Error: the difference between a calculated or observed value and the true value
* Systematic errors: errors that occur reproducibly from faulty calibration of equipment or observer bias. Statistical analysis is generally not useful; rather, corrections must be made based on experimental conditions
* Random errors: errors that result from fluctuations in observations. Experiments must be repeated a sufficient number of times to establish the precision of measurement

Sampling Errors (1)


* Random (Sampling) Error: the difference between the sample result and the result of a census conducted using identical procedures. It is the result of chance variation in the selection of sampling units
* If samples are selected properly (e.g. through randomization), the sample is usually deemed to be a good approximation of the population and thus capable of delivering an accurate result
* Usually, the random sampling error arising from statistical fluctuation is small, but sometimes the margin of error can be significant

Sampling Errors (2)


* Systematic (Non-Sampling) Errors: these errors result from factors such as an improper research design that causes response error, errors committed in the execution of the research, errors in recording responses, and non-responses from individuals who were not contacted or who refused to participate
* Both random sampling errors and systematic (non-sampling) errors reduce the representativeness of a sample and consequently the value of the information which business researchers derive from it

Graphical Depiction of Sampling Errors


[Figure: nested boxes showing the total population, the sampling frame, the planned sample and the respondents (actual sample), with sampling frame error, random sampling error and non-response error marked between them]

Systematic and Random Errors

* Systematic error (or bias): representativeness (validity), information bias
* Random error (sampling error): precision

Representativeness (validity)
Sample should accurately reflect the distribution of the relevant variable in the population
* Person, e.g. age, sex
* Place, e.g. urban vs. rural
* Time, e.g. seasonality
Representativeness is essential to generalise

Information bias
Systematic problem in collecting information
* Inaccurate measuring: scales, ultrasound, lab tests
* Badly asked questions: ambiguous, not offering the right options

Precision
* No sample is an exact mirror image of the population
* The random difference between a sample and the population from which it is drawn is the sampling error; the smaller it is, the higher the precision
* Sampling error depends upon the size of the sample and the distribution of the characteristic of interest in the population
* The size of the error can be measured in probability samples

Accuracy vs. Precision


* Systematic error causes lack of accuracy. Accuracy is a measure of how close an experimental result is to the true value.
* Random error causes lack of precision. Precision is a measure of how exactly the result is determined; it is also a measure of how reproducible the result is.

Quality of a sampling estimate

[Figure: target diagrams with the panels "Precision & accuracy", "No precision" (random error) and "Precision but no accuracy" (systematic error, or bias)]

Accuracy and Precision


Continuous Data

Calculating Sample Size: How Big is Big Enough?


* Sample results are almost never identical to those of the entire population
* The larger the sample of clients, the greater the likelihood that the statistical analysis will yield results that closely resemble the entire client population, but:
  * Too large: wastes time, resources and money
  * Too small: inaccurate results

Minimum sample size needed to estimate a population parameter.

Factors affecting the sample size..

* Variability (dispersion) of the population characteristic under investigation
* Level of confidence desired in the estimate (z)
* Degree of precision desired in estimating the population characteristic (e), i.e. the size of the interval estimate

Calculating Sample Size


* When estimating a population mean: $n = z^2 \sigma^2 / e^2$, where e, the sampling error, is the maximum acceptable difference between the sample mean and the population mean
* When estimating a population proportion: $n = z^2 p(1 - p) / e^2$, where e, the sampling error, is the maximum acceptable difference between the sample proportion and the population proportion

To understand these formulas, we need the normal distribution and the central limit theorem.

Central limit theorem




The central limit theorem states that, given a variable with mean $\mu$ and variance $\sigma^2$, the sampling distribution of the sample mean approaches a normal distribution with mean $\mu$ and variance $\sigma^2/n$ as the sample size n increases. This is true even when the distribution of the variable is not normal. The normal distribution is widely used. Part of its appeal is that it is well behaved and mathematically tractable.
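A small simulation can make the theorem concrete. The sketch below is only an illustration under assumed settings: it draws 5000 samples of size 30 from an exponential distribution (mean 1, variance 1, clearly not normal) and checks that the sample means centre on 1 with a variance close to 1/30.

```python
import random
import statistics

rng = random.Random(42)
n = 30               # size of each sample
replications = 5000  # number of samples drawn

# Exponential distribution with rate 1: mean 1, variance 1, clearly non-normal.
sample_means = [
    statistics.fmean(rng.expovariate(1.0) for _ in range(n))
    for _ in range(replications)
]

mean_of_means = statistics.fmean(sample_means)
variance_of_means = statistics.pvariance(sample_means)
print(round(mean_of_means, 3), round(variance_of_means, 4))
```

The two printed values should be close to 1 and to 1/30 (about 0.033), in line with the mean and variance stated by the theorem.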

The beauty of the normal curve:


No matter what $\mu$ and $\sigma$ are, the area between $\mu - \sigma$ and $\mu + \sigma$ is about 68%; the area between $\mu - 2\sigma$ and $\mu + 2\sigma$ is about 95%; and the area between $\mu - 3\sigma$ and $\mu + 3\sigma$ is about 99.7%. Almost all values fall within 3 standard deviations.

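Standard Normal Distribution (Z)


All normal distributions can be converted into the standard normal curve by using the following formula (written here for a sample mean $\bar{X}$ based on n observations):

$$Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}}$$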
Somebody calculated all the integrals for the standard normal and put them in a table! So we never have to integrate! Even better, computers now do all the integration.

The Standard Normal table

What is the area to the left of Z=1.51 in a standard normal curve?

Area is 93.45%
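The table lookup can be reproduced with `statistics.NormalDist` from the Python standard library. The sketch also recomputes the areas within 1, 2 and 3 standard deviations of the mean.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0.0, sigma=1.0)

# Area to the left of Z = 1.51.
area_left = standard_normal.cdf(1.51)
print(f"P(Z < 1.51) = {area_left:.4f}")

# Areas within 1, 2 and 3 standard deviations of the mean.
areas = {k: standard_normal.cdf(k) - standard_normal.cdf(-k) for k in (1, 2, 3)}
for k, area in areas.items():
    print(f"within {k} sd: {area:.4f}")
```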

Problem 1
A study is to be performed to determine a certain parameter in a community. A previous study obtained a standard deviation of 46. If a sampling error of up to 4 is acceptable, how many subjects should be included in this study at the 99% level of confidence?

$$n = \frac{z^2 \sigma^2}{e^2} = \frac{2.58^2 \times 46^2}{4^2} = 880.3 \approx 881$$

Problem 2
The aim is to estimate the proportion of anaemic children in a certain preparatory school. A similar study at another school found a proportion of 30%. Compute the minimum sample size required at a confidence level of 95%, accepting a difference of up to 4% from the true population proportion.

$$n = \frac{z^2 p(1 - p)}{e^2} = \frac{1.96^2 \times 0.3(1 - 0.3)}{0.04^2} = 504.21 \approx 505$$
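Both problems can be checked with a short Python sketch of the two sample size formulas. The function names are invented for the illustration, and results are rounded up because a fraction of a subject cannot be sampled.

```python
import math

def sample_size_mean(z, sigma, e):
    """Minimum n to estimate a mean: n = z^2 * sigma^2 / e^2, rounded up."""
    return math.ceil(z**2 * sigma**2 / e**2)

def sample_size_proportion(z, p, e):
    """Minimum n to estimate a proportion: n = z^2 * p * (1 - p) / e^2, rounded up."""
    return math.ceil(z**2 * p * (1 - p) / e**2)

n_problem_1 = sample_size_mean(z=2.58, sigma=46, e=4)              # Problem 1
n_problem_2 = sample_size_proportion(z=1.96, p=0.30, e=0.04)       # Problem 2
print(n_problem_1, n_problem_2)
```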