
Sampling

Population

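• All people or items with the characteristic one wishes to understand. It should not be confused with the sampling frame, which is the list or source from which the sample is actually drawn; ideally the frame covers the whole population.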

• For example, a manufacturer needs to decide whether a batch of material from production is of high enough quality to be released to the customer, or should be sentenced for scrap or rework due to poor quality. In this case, the batch is the population.

Probability sampling

• Every unit in the population has a chance (greater than zero) of being selected in the sample,
and this probability can be accurately determined.

Non-probability sampling

• Any sampling method where some elements of the population have no chance of selection (these are sometimes referred to as 'out of coverage' or 'undercovered'), or where the probability of selection cannot be accurately determined.

• It involves the selection of elements based on assumptions regarding the population of interest, which form the criteria for selection.

• Example: We visit every household in a given street and interview the first person to answer the door. In any household with more than one occupant, this is a non-probability sample, because some people are more likely to answer the door (e.g. an unemployed person who spends most of their time at home is more likely to answer than an employed housemate who might be at work when the interviewer calls) and it is not practical to calculate these probabilities.

Types of probability sampling

• Simple random sampling (random point, area or line)
• Systematic sampling
• Stratified sampling
• Cluster sampling
Simple random sampling
• Each element of the frame has an equal probability of selection.
• The frame is not subdivided or partitioned.
• Any given pair of elements has the same chance of selection as any other such pair (and similarly for triples, and so on).
• This minimises bias and simplifies analysis of the results.
Random sampling
• Involves the use of random numbers.
• Each member of the population is given a number, and numbers are then chosen at random.
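The numbering-and-drawing procedure above can be sketched with Python's standard `random` module. This is a minimal illustration under stated assumptions, not a prescribed method: the frame of 500 numbered members, the sample size and the seed are all hypothetical. `random.sample` draws without replacement, so every subset of the chosen size is equally likely.

```python
import random


def simple_random_sample(frame, n, seed=None):
    """Draw n distinct members of the frame, each equally likely to be chosen."""
    rng = random.Random(seed)  # separate, seedable generator so a draw can be reproduced
    return rng.sample(frame, n)  # sampling without replacement


# Hypothetical frame: population members numbered 1 to 500
frame = list(range(1, 501))
sample = simple_random_sample(frame, 20, seed=1)
print(sorted(sample))
```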
Advantages and disadvantages
Advantages
• Highly representative if all subjects participate;
• it avoids selection bias, since each member of the population has an equal chance of being selected.
Disadvantages
• Not possible without complete list of population members;
• potentially uneconomical to achieve;
• can be disruptive to isolate members from a group;
• time-scale may be too long, data/sample could change
Stratified random sampling
• Used where the population embraces a number of distinct categories; the frame can be organized by these categories into separate "strata".
• Each stratum is then sampled as an independent sub-population, out of which individual
elements can be randomly selected. There are several potential benefits to stratified
sampling.
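A minimal sketch of stratified random sampling, assuming proportional allocation (each stratum's share of the sample matches its share of the frame). Proportional allocation is one common choice; the notes do not prescribe an allocation rule. The urban/rural frame, the stratum function and the sample size of 50 are hypothetical.

```python
import random


def stratified_sample(frame, stratum_of, n, seed=None):
    """Proportional stratified random sample.

    Units are grouped into strata with stratum_of(unit); each stratum receives
    a share of n proportional to its size (rounded, so the total can differ
    slightly from n), and a simple random sample is drawn within each stratum.
    """
    rng = random.Random(seed)
    strata = {}
    for unit in frame:
        strata.setdefault(stratum_of(unit), []).append(unit)

    sample = {}
    for name, units in strata.items():
        k = round(n * len(units) / len(frame))  # proportional allocation
        sample[name] = rng.sample(units, k)     # independent random draw within the stratum
    return sample


# Hypothetical frame: 600 urban and 400 rural households
frame = [("urban", i) for i in range(600)] + [("rural", i) for i in range(400)]
result = stratified_sample(frame, stratum_of=lambda unit: unit[0], n=50, seed=2)
print({name: len(units) for name, units in result.items()})  # {'urban': 30, 'rural': 20}
```

Because each stratum is sampled separately, a different technique or sampling fraction could be used per stratum simply by changing the inner loop.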
A stratified sampling approach is most effective when three conditions are met:
• Variability within strata is minimized.
• Variability between strata is maximized.
• The variables upon which the population is stratified are strongly correlated with the desired
dependent variable.
Advantages over other sampling methods
• Focuses on important subpopulations and ignores irrelevant ones.
• Allows use of different sampling techniques for different subpopulations.
• Improves the accuracy/efficiency of estimation.
• Permits greater balancing of statistical power of tests of differences between strata by
sampling equal numbers from strata varying widely in size.
Disadvantages
• Requires selection of relevant stratification variables which can be difficult.
• Is not useful when there are no homogeneous subgroups.
• Can be expensive to implement.
Systematic sampling
• Relies on arranging the target population according to some ordering scheme and then
selecting elements at regular intervals through that ordered list.
• Systematic sampling involves a random start and then proceeds with the selection of every
kth element from then onwards.
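The random-start, every-kth rule can be written in a few lines. This sketch assumes the frame is already ordered and that k divides the list evenly; the house numbers 1 to 1000 and the seed are illustrative only.

```python
import random


def systematic_sample(ordered_frame, k, seed=None):
    """Random start among the first k positions, then every kth element."""
    rng = random.Random(seed)
    start = rng.randrange(k)        # random start: an index from 0 to k-1
    return ordered_frame[start::k]  # every kth element from the start onwards


# Hypothetical ordered list: house numbers 1 to 1000
houses = list(range(1, 1001))
sample = systematic_sample(houses, 10, seed=3)
print(sample[:5], len(sample))
```

Only one random number is needed (the start); everything after it is fixed by the interval, which is what makes the method easy to apply to long lists and databases.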
Advantages
• It is easy to implement, and the stratification it induces can make it efficient if the variable by which the list is ordered is correlated with the variable of interest. 'Every 10th' sampling is especially useful for efficient sampling from databases.
• If the list is ordered by a grouping variable (e.g., gender), it can ensure that each group is represented in the sample(s), roughly in proportion to its size.
• For example, suppose we wish to sample people from a long street that starts in a poor area
(house No. 1) and ends in an expensive district (house No. 1000).
• A simple random selection of addresses from this street could easily end up with too many
from the high end and too few from the low end (or vice versa), leading to an
unrepresentative sample. Selecting (e.g.) every 10th street number along the street ensures
that the sample is spread evenly along the length of the street, representing all of these
districts. (Note that if we always start at house #1 and end at #991, the sample is slightly biased towards the low end; by randomly selecting the start between #1 and #10, this bias is eliminated.)
Disadvantages
• However, systematic sampling is especially vulnerable to periodicities in the list.
• If periodicity is present and the period is a multiple or factor of the interval used, the sample
is especially likely to be unrepresentative of the overall population, making the scheme less
accurate than simple random sampling.
Example
• Consider a street where the odd-numbered houses are all on the north (expensive) side of
the road, and the even-numbered houses are all on the south (cheap) side. Under the
sampling scheme given above, it is impossible to get a representative sample; either the
houses sampled will all be from the odd-numbered, expensive side, or they will all be from
the even-numbered, cheap side.
• All elements have the same probability of selection (in the example given, one in ten). It is
not 'simple random sampling' because different subsets of the same size have different
selection probabilities - e.g. the set {4,14,24,...,994} has a one-in-ten probability of selection,
but the set {4,13,24,34,...} has zero probability of selection.
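Both points can be checked by listing every possible sample for the street example: houses 1 to 1000, interval 10, random start between 1 and 10. The sketch assumes, as in the example, that odd numbers are on the north side and even numbers on the south side. It shows that each house lies in exactly one of the ten possible samples (a one-in-ten chance), and that every possible sample falls entirely on one side of the road.

```python
def possible_systematic_samples(n_units, k):
    """Every systematic sample with interval k from units numbered 1..n_units."""
    return [list(range(start, n_units + 1, k)) for start in range(1, k + 1)]


samples = possible_systematic_samples(1000, 10)

# Count how many possible samples contain each house (1 means a one-in-ten chance)
appearances = {}
for s in samples:
    for house in s:
        appearances[house] = appearances.get(house, 0) + 1

# Odd numbers on the north side, even numbers on the south side
for s in samples:
    sides = {"north" if house % 2 else "south" for house in s}
    print(s[:3], "...", sides)
```

Each printed line shows a single side, because the interval (10) is a multiple of the period (2) in the list.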
Disadvantages of stratified sampling
• More complex,
• requires greater effort than simple random;
• strata must be carefully defined
Cluster sampling
Sometimes it is more cost-effective to select respondents in groups ('clusters'). Sampling is often
clustered by geography, or by time periods. (Nearly all samples are in some sense 'clustered' in
time - although this is rarely taken into account in the analysis.) For instance, if surveying
households within a city, we might choose to select 100 city blocks and then interview every
household within the selected blocks.
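The block-then-household procedure can be sketched as a one-stage cluster sample. The city of 400 blocks with 25 households each, and the identifiers used, are hypothetical; the point is that the random draw needs only the list of block IDs, and household lists are needed only for the chosen blocks.

```python
import random


def cluster_sample(clusters, n_clusters, seed=None):
    """One-stage cluster sample: pick whole clusters at random, keep every element.

    clusters: dict mapping a cluster ID (e.g. a city block) to its list of elements.
    """
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)  # random draw of cluster IDs only
    return {c: clusters[c] for c in chosen}            # every household in each chosen block


# Hypothetical city: 400 blocks, each with 25 households
city = {f"block-{b}": [f"block-{b}/house-{h}" for h in range(25)] for b in range(400)}
selected = cluster_sample(city, 100, seed=7)
households = [h for block in selected.values() for h in block]
print(len(selected), len(households))
```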
Advantages
• Clustering can reduce travel and administrative costs.
• In the example above, an interviewer can make a single trip to visit several households in
one block, rather than having to drive to a different block for each household.
• It also means that one does not need a sampling frame listing all elements in the target
population.
• Instead, clusters can be chosen from a cluster-level frame, with an element-level frame
created only for the selected clusters. In the example above, the sample only requires a
block-level city map for initial selections, and then a household-level map of the 100
selected blocks, rather than a household-level map of the whole city.
Cluster sampling generally increases the variability of sample estimates above that of simple
random sampling, depending on how the clusters differ between themselves, as compared with
the within-cluster variation. For this reason, cluster sampling requires a larger sample than simple random sampling (SRS) to achieve the same level of accuracy, but the cost savings from clustering might still make it the cheaper option.
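One common way to put a number on this loss of precision is Kish's design effect. It is not given in these notes and is stated here as a standard approximation that assumes clusters of roughly equal size m, with ρ the intraclass correlation (how alike elements in the same cluster are). The worked figures assume a hypothetical ρ = 0.05 and blocks of 25 households.

```latex
\mathrm{deff} \approx 1 + (m - 1)\rho, \qquad n_{\text{eff}} = \frac{n}{\mathrm{deff}}

m = 25,\ \rho = 0.05: \quad \mathrm{deff} \approx 1 + 24 \times 0.05 = 2.2,
\qquad n = 100 \times 25 = 2500 \;\Rightarrow\; n_{\text{eff}} \approx \frac{2500}{2.2} \approx 1136
```

Under these assumptions, 2,500 clustered households carry roughly the information of about 1,136 households chosen by SRS; whether that trade is worthwhile depends on how much cheaper the clustered fieldwork is.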
Quota sampling
• In quota sampling, the population is first segmented into mutually exclusive sub-groups, just as in stratified sampling.
• Then judgement is used to select the subjects or units from each segment based on a specified proportion. For example, an interviewer may be told to sample 200 females and 300 males between the ages of 45 and 60.
• It is this second step which makes the technique one of non-probability sampling.
• In quota sampling the selection of the sample is non-random.
• For example, interviewers might be tempted to interview those who look most helpful. The problem is that these samples may be biased because not everyone gets a chance of selection. This non-random element is its greatest weakness, and quota versus probability sampling has been a matter of controversy for many years.
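The quota-filling step can be sketched as "accept whoever comes next until each quota is full". The quotas of 200 females and 300 males come from the example above (the age criterion is ignored for brevity); the stream of passers-by is hypothetical. Who ends up in the sample depends entirely on the order in which people are met, which is the non-random step that makes this a non-probability method.

```python
def fill_quotas(arrivals, quotas, group_of):
    """Accept people in the order they are met until every group's quota is full.

    arrivals: people in the order the interviewer meets them (not random).
    quotas:   dict mapping each group to the number of people required.
    group_of: function returning the group a person belongs to.
    Returns the sample and whatever is left of each quota.
    """
    remaining = dict(quotas)
    sample = []
    for person in arrivals:
        group = group_of(person)
        if remaining.get(group, 0) > 0:  # this group's quota is not yet full
            sample.append(person)
            remaining[group] -= 1
        if not any(remaining.values()):  # all quotas filled: stop interviewing
            break
    return sample, remaining


# Hypothetical stream of passers-by: every third person is female
arrivals = [("female" if i % 3 == 0 else "male", i) for i in range(1000)]
sample, unfilled = fill_quotas(arrivals, {"female": 200, "male": 300}, group_of=lambda p: p[0])
print(len(sample), unfilled)
```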
Accidental sampling
• Accidental sampling (sometimes known as grab, convenience or opportunity sampling) is a type of non-probability sampling in which the sample is drawn from the part of the population that is close to hand. That is, a population is selected because it is readily available and convenient. People may be included simply because the researcher happens to meet them, or because they were found through technological means such as the internet or the telephone.
Snowball sampling
• Existing study subjects are used to recruit more subjects into the sample.
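Snowball recruitment behaves like a breadth-first walk through a referral network: each recruited subject names others, who are recruited in turn. The sketch assumes a hypothetical network of who names whom and a cap on sample size. Person "G", whom nobody names, can never be reached, which illustrates why the representativeness of a snowball sample cannot be judged.

```python
from collections import deque


def snowball_sample(referrals, seeds, max_size):
    """Recruit the seed subjects, then the people they name, wave by wave.

    referrals: dict mapping each person to the people they name (hypothetical network).
    seeds:     the first subjects the researcher finds directly.
    """
    sample, seen = [], set(seeds)
    queue = deque(seeds)
    while queue and len(sample) < max_size:
        person = queue.popleft()
        sample.append(person)                   # recruit this subject
        for named in referrals.get(person, []):
            if named not in seen:               # never recruit the same person twice
                seen.add(named)
                queue.append(named)
    return sample


referrals = {"A": ["B", "C"], "B": ["D"], "C": ["D", "E"], "E": ["F"], "G": ["A"]}
print(snowball_sample(referrals, ["A"], 10))
```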

What to do when planning for a survey

• Request permission to conduct the survey from the responsible authorities.
• Once permission is granted, obtain a large base map of the area, which will help you identify the places or points where you want to conduct your study.
• Conduct a pre-survey of the area to see whether it is feasible for you to conduct the survey alone.
• Look for helpers.
• Design a questionnaire/interview schedule.
• Read around the topic where necessary.
• Prepare adequate equipment, e.g. notebooks, pens, safety clothing, cameras.
• Decide on a sampling technique.

What to do during the survey

• Observe
• Collect statistical data
• Collect samples where possible
• Draw sketch maps where possible
• Administer questionnaires
• Ask questions/interview people
• Record answers in notebooks
• Count traffic or people
• Phone people where possible
• Take photos and videos where possible

Problems you are likely to encounter

• Illiteracy
• Falsehood/lies
• Biased information
• Confidentiality/secrecy
• Ignorance
• Rudeness/hostility/lack of cooperation
• Inaccessibility due to tight security
• Financial constraints
• Unfavourable weather conditions
• Dangerous animals

Please note that the points raised above will vary according to the type of survey
that one wants to conduct.
Sampling errors and biases
• Sampling errors and biases are induced by the sample design. They include:
• Selection bias: When the true selection probabilities differ from those
assumed in calculating the results.
• Random sampling error: Random variation in the results due to the
elements in the sample being selected at random.
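Random sampling error can be made visible by drawing many simple random samples from the same population and watching the estimate change from one sample to the next. The population below (10,000 households, 30% owning a car), the sample size of 100 and the seed are hypothetical.

```python
import random
import statistics

# Hypothetical population of 10,000 households: 3,000 own a car (1), 7,000 do not (0)
population = [1] * 3000 + [0] * 7000
rng = random.Random(42)

# Draw 1,000 simple random samples of 100 households; record each sample proportion
estimates = [sum(rng.sample(population, 100)) / 100 for _ in range(1000)]

print(min(estimates), max(estimates))  # the estimate changes from sample to sample
print(statistics.stdev(estimates))     # spread of the estimates: the random sampling error
```

The spread should come out close to the textbook value sqrt(p(1 − p)/n) = sqrt(0.3 × 0.7 / 100) ≈ 0.046. Unlike selection bias, this error shrinks as the sample size grows.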
Non-sampling error
• Non-sampling errors are other errors which can impact the final survey
estimates, caused by problems in data collection, processing, or sample
design. They include:
• Overcoverage: Inclusion of data from outside of the population.
• Undercoverage: The sampling frame does not include some elements of the population.
• Measurement error: e.g. when respondents misunderstand a question, or find
it difficult to answer.
• Processing error: Mistakes in data coding.
• Non-response: Failure to obtain complete data from all selected individuals.

Sampling techniques: Advantages and disadvantages

Simple random
• Description: Random sample from the whole population.
• Advantages: Highly representative if all subjects participate; the ideal.
• Disadvantages: Not possible without a complete list of population members; potentially uneconomical to achieve; can be disruptive to isolate members from a group; time-scale may be too long, and data/sample could change.

Stratified random
• Description: Random sample from identifiable groups (strata), subgroups, etc.
• Advantages: Can ensure that specific groups are represented, even proportionally, in the sample(s) (e.g., by gender), by selecting individuals from the strata list.
• Disadvantages: More complex; requires greater effort than simple random sampling; strata must be carefully defined.

Cluster
• Description: Random samples of successive clusters of subjects (e.g., by institution) until small groups are chosen as units.
• Advantages: Possible to select randomly when no single list of population members exists but local lists do; data collected on groups may avoid introducing confounding by isolating members.
• Disadvantages: Clusters at a given level must be equivalent, and some natural clusters are not equivalent in essential characteristics (e.g., geographic areas may have equal numbers of people but different unemployment rates).

Stage (multi-stage)
• Description: Combination of cluster sampling (randomly selecting clusters) and random or stratified random sampling of individuals.
• Advantages: Can build up a probability sample by random selection at each stage and within groups; possible to select a random sample when population lists are very localized.
• Disadvantages: Complex; combines the limitations of cluster and stratified random sampling.

Purposive
• Description: Hand-pick subjects on the basis of specific characteristics.
• Advantages: Ensures balance of group sizes when multiple groups are to be selected.
• Disadvantages: Samples are not easily defensible as being representative of populations, due to the potential subjectivity of the researcher.

Quota
• Description: Select individuals as they come, to fill a quota by characteristics proportional to the population.
• Advantages: Ensures selection of adequate numbers of subjects with appropriate characteristics.
• Disadvantages: Not possible to prove that the sample is representative of the designated population.

Snowball
• Description: Subjects with desired traits or characteristics give names of further appropriate subjects.
• Advantages: Possible to include members of groups where no lists or identifiable clusters even exist (e.g., drug abusers, criminals).
• Disadvantages: No way of knowing whether the sample is representative of the population.

Volunteer, accidental, convenience
• Description: Either asking for volunteers, or the consequence of not all those selected finally participating, or a set of subjects who just happen to be available.
• Advantages: Inexpensive way of ensuring sufficient numbers for a study.
• Disadvantages: Can be highly unrepresentative.
Questionnaire surveys
• A questionnaire consists of a set of open-ended, closed or multiple-choice questions to which the respondent has to respond.
• Open-ended questions offer the respondent freedom to respond using his or her own words and thoughts.
• Closed or multiple-choice questions offer a set of answers from which the respondent chooses the one that matches his or her response.
General guidelines when designing a questionnaire
• The questionnaire should be kept anonymous.
• Questions must be non-threatening.
• Each question should address only one dimension (e.g. "Do you like the texture and flavour of the snack?" If a respondent answers "no", the researcher will not know whether the respondent dislikes the texture, the flavour, or both.)
• A good question asks for only one "bit" of information.
• Ask questions that accommodate all the possible responses.
• For example, consider the question:
• What type of drink do you like?
a) Coke
b) Fanta
c) Sprite
Clearly, there are many problems with this question. What if the respondent doesn't drink any of these drinks? What if he or she prefers a different brand? What if he or she drinks all of them?
There are two ways to correct this kind of problem.
• The first way is to make each response a separate dichotomous item on the questionnaire. For example:
• Do you drink Coke? (circle: Yes or No)
• Do you drink Fanta? (circle: Yes or No)
• Another way to correct the problem is to add the necessary response categories (e.g. "Other (please specify)" and "None") and allow multiple responses, for example: Which ones do you drink? ___________________ (list in order of preference).
• Do not ask ambiguous questions.
• Transitions between questions should be smooth.
• Do not ask leading questions.
• Questions should be short and specific.
• Ask sensitive questions in a socially acceptable way.
• Design your questionnaire to be respondent-friendly, avoiding technical jargon and abbreviations.
• Where the answer is obvious, fill it in.
• Do not ask questions that rely heavily on the respondent's memory.
• Start the questionnaire with questions about the respondent's background, such as age, marital status and education level.
• Attach a separate introductory page to the questionnaire explaining the purpose of the study and requesting the respondent's consent and cooperation.
• Assure respondents of the confidentiality of the data obtained.
• The questionnaire should have a heading and a space to insert the questionnaire number and date.
General guidelines when administering a questionnaire
Send pre-notification letters (pre-letters):
• They are an excellent (but expensive) way to increase response.
• The researcher needs to weigh the additional cost of sending out a pre-
letter against the probability of a lower response rate.
• When sample sizes are small, every response really counts and a pre-letter
is highly recommended.
• Briefly describe why the study is being done and identify the sponsors; this lends credibility to the study.
• Explain why the person receiving the pre-letter was chosen to receive the questionnaire.
• Justify why the respondent should complete the questionnaire.
• The justification must be something that will benefit the respondent.
• If an incentive will be included with the questionnaire, mention the inclusion of a free gift without saying specifically what it will be.
• Explain how the results will be used.
• Response rate is the single most important indicator of how much confidence
can be placed in the results of a survey. A low response rate can be
devastating to the reliability of a study.
• One of the most powerful tools for increasing response is to use follow-ups or reminders. Traditionally, between 10 and 60 percent of those sent questionnaires respond without follow-up reminders. These rates are too low to yield confident results, so the need to follow up on non-respondents is clear.
• Researchers can increase the response from follow-up attempts by including
another copy of the questionnaire. When designing the follow-up procedure, it
is important for the researcher to keep in mind the unique characteristics of
the people in the sample. The most successful follow-ups have been
achieved by phone calls.
Types of questionnaires
• Self-administered: questionnaires that the researcher administers directly to the respondents.
Advantages
• High response rate
• They provide a chance to clarify questions.
• Allows adjustments to be made using the feedback that one gets from the respondents.
Disadvantages
• Time consuming
• May be inconvenient for some respondents
• Postal: questionnaires that are sent by mail to the respondents.
Advantages
• It cuts down on travelling.
• There is no interviewer bias.
• The respondent has more time to respond.
• Can be used when more personal information is required.
Disadvantages
• A lot of time is consumed in designing such a questionnaire.
• Answers cannot be rechecked with the respondents
• One can never be really sure who exactly completed the questionnaire.
• Respondents can read through the questionnaire, see the line of thinking and then tailor their responses to suit the line of questioning, biasing the responses.
General advantages of questionnaires
• Questionnaires are very cost effective when compared to face-to-face
interviews.
• Standard questions are asked to all respondents
• Answers can be quantified
• Caters for confidentiality especially when posted
• They can be stored for records and for comparisons
• Allows several questions to be asked in one document
• Ensures direct contact with the respondent (when the questionnaire is self-administered)
• It is a primary data source, meaning that the data are original
• It is a fast method of data collection. This is especially true for studies involving large sample sizes and large geographic areas.
Written questionnaires become even more cost effective as the number of
research questions increases.
Other advantages of using a questionnaire
• Questionnaires are easy to analyze.
• Data entry and tabulation for nearly all surveys can be easily done with
many computer software packages.
• Questionnaires are familiar to most people. Nearly everyone has had some
experience completing questionnaires and they generally do not make people
apprehensive.
• Questionnaires reduce bias.
• There is uniform question presentation and no middle-man bias.
• The researcher's own opinions will not influence the respondent to answer
questions in a certain manner.
• There are no verbal or visual clues to influence the respondent.
• Questionnaires are less intrusive than telephone or face-to-face surveys.
• When a respondent receives a questionnaire in the mail, he is free to
complete the questionnaire on his own time-table.
• Unlike other research methods, the respondent is not interrupted by the
research instrument.
Disadvantages of questionnaires in general
• Some respondents may fail to post the questionnaire back
• Some may be lost in transit if it is a postal questionnaire
• Can only be completed by literate people
• Language barriers can be a problem
• Respondents may decide to ignore the questionnaire
• Closed questions limit the respondent's answers
• There is plenty of room for lying
• Information obtained may be biased, as people might be hesitant to tell you about their habits if asked
• Closed questions leave no room for explanations
Note that here speculation and negatives are allowed.
Interviewing as a method of data collection
• This refers to the purposeful oral conversation between the researcher
(interviewer) and the respondent(s).
• The researcher provides both the subject matter and direction of the
interview while the respondent can also have some opportunity to elaborate
on views regarding the topic.
Types of interviews
• Personal
• Telephone
Personal interview
• This involves the face to face conversation between the respondent and the
researcher
Procedure
• Ensure full cooperation of the respondent
• A professional appearance and a brief explanation of the objective of the
study will achieve full cooperation.
• Record whatever the respondent says using either a pen and a notebook or
using a tape recorder.
• Ask the required questions in order. Where an answer is not clear, probe further.
Advantages of a personal interview
• High degree of flexibility
• Lower non-response error
• It allows the researcher to gather a lot of information in a very short time.
Disadvantages of a personal interview
• It is costly
• Greater response error, as the respondent may try not to disappoint the researcher.
Telephone interview
• This is a voice to voice type of interview
Advantages of a telephone interview
• Saves time since calls are made from one place
• They convey a sense of importance, since many people respond more readily to a telephone call than to a person in front of them.
• They are less costly
Disadvantages of a telephone interview
• Good telephone manners required
• Respondent has little time to think

• Visual aids cannot be used
• Not everyone has a telephone
• Repeat calls are inevitable
• Straightforward questions are required
Data presentation
Methods of presenting data
• Tabulation: this involves arranging data in rows and columns. A table can be a simple one-way table, or a cross-tabulation in which the relationship between two variables is recorded.
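The difference between a one-way table and a cross-tabulation can be sketched in Python using only the standard library. The survey records below (each respondent's area and mode of transport) are hypothetical, invented purely to show the layout.

```python
from collections import Counter

# Hypothetical survey records: (area, mode of transport) for each respondent.
records = [
    ("urban", "bus"), ("urban", "car"), ("rural", "car"),
    ("urban", "bus"), ("rural", "walk"), ("rural", "car"),
]

# Simple one-way table: frequency of a single variable (mode of transport).
one_way = Counter(mode for _, mode in records)

# Cross-tabulation: frequency of each combination of the two variables.
cross_tab = Counter(records)

areas = sorted({area for area, _ in records})
modes = sorted({mode for _, mode in records})

# Print the cross-tabulation with areas as rows and modes as columns.
print("area".ljust(8) + "".join(m.ljust(6) for m in modes))
for area in areas:
    row = "".join(str(cross_tab[(area, m)]).ljust(6) for m in modes)
    print(area.ljust(8) + row)
```

Combinations that never occur (such as urban respondents who walk) simply show a count of 0 in the cross-tabulation.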
Graphical representation of data
• A graph is a pictorial representation of the characteristics of any set of given
variables.
• Graphs allow for quicker interpretation of data
Types of graphs
• Histograms
• Frequency polygons
• Cumulative frequency curves( ogives)
• Line graphs
• Scatter graphs and Regression lines
• Bar graphs
• Circular graphs and pie charts
Histograms
• It is a graph of a frequency distribution
• It uses bars to represent changes in a distribution
• The bars touch each other.
Method of construction
• Construct the horizontal axis using a continuous scale running from one extreme of the data to the other, and label it.
• Find a suitable scale for frequency on the vertical axis (y-axis).
• Draw a vertical rectangle for each class in the distribution, with its base on the horizontal axis extending from the lower to the upper boundary of the class.
• Do not have gaps between the rectangles
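Before the rectangles can be drawn, the data must be grouped into classes. A minimal Python sketch of that grouping step follows; the class boundaries, the convention that each class includes its lower boundary but not its upper one, and the sample data are all assumptions made for illustration. It prints a rough text histogram with one `#` per item.

```python
def frequency_table(values, lower, width, n_classes):
    """Group values into equal-width classes for a histogram.

    Class k covers [lower + k*width, lower + (k+1)*width): the lower boundary
    is included and the upper boundary is not. Values outside all classes are
    ignored.
    """
    counts = [0] * n_classes
    for v in values:
        k = int((v - lower) // width)
        if 0 <= k < n_classes:
            counts[k] += 1
    bounds = [(lower + k * width, lower + (k + 1) * width) for k in range(n_classes)]
    return list(zip(bounds, counts))

# Hypothetical data, for illustration only.
data = [12, 15, 21, 22, 27, 30, 34, 38, 39, 41]
for (lo, hi), count in frequency_table(data, lower=10, width=10, n_classes=4):
    print(f"{lo}-{hi}: {'#' * count}")
```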

Frequency polygon
• It is a graph which is closely related to the histogram.
• In such a polygon, straight lines are drawn linking the midpoints of the tops of the histogram's rectangles.
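The points of a frequency polygon are simply each class midpoint paired with its frequency. A short Python sketch, assuming a grouped frequency table stored as `((lower, upper), frequency)` tuples; the table values are hypothetical.

```python
def polygon_points(classes):
    """Return (midpoint, frequency) pairs for a frequency polygon.

    `classes` is a list of ((lower, upper), frequency) tuples, i.e. a
    grouped frequency table.
    """
    return [((lower + upper) / 2, freq) for (lower, upper), freq in classes]

# Hypothetical grouped frequency table, for illustration only.
classes = [((10, 20), 2), ((20, 30), 3), ((30, 40), 4), ((40, 50), 1)]
points = polygon_points(classes)
print(points)
```

Joining these points in order with straight lines gives the polygon.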
Ogive/cumulative frequency curve
• This is a graph of a cumulative frequency distribution.
• To obtain the values, add the frequency of each class to the frequencies of all the classes above it in the table, giving a running total.
Method of constructing an ogive
• Prepare a cumulative frequency table using the data available.
• Decide on a suitable scale for the data.
• Draw the axes using this scale.
• Plot the points using the values from the table.
• Join the points using a pencil.
• Insert a title and key.
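The running total in a cumulative frequency table can be sketched with Python's `itertools.accumulate`. The table below is hypothetical, and plotting each cumulative frequency against the upper boundary of its class follows the usual "less than" ogive convention, which is assumed here.

```python
from itertools import accumulate

# Hypothetical grouped frequency table: (class label, upper boundary, frequency).
table = [("10-20", 20, 2), ("20-30", 30, 3), ("30-40", 40, 4), ("40-50", 50, 1)]

# Running total: each class's frequency plus those of all the classes above it.
cumulative = list(accumulate(freq for _, _, freq in table))

# Points to plot: cumulative frequency against the upper class boundary.
ogive_points = [(upper, cf) for (_, upper, _), cf in zip(table, cumulative)]

for (label, _, freq), cf in zip(table, cumulative):
    print(f"{label}: frequency {freq}, cumulative {cf}")
```

The last cumulative value always equals the total number of items, which is a quick check on the arithmetic.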
Pie chart
• It is a circle which is divided into sectors so that the area of each sector is
proportional to the quantity being represented.

Method of construction
• Draw a circle of any size
• Convert the component parts to percentages and then to degrees (each percentage multiplied by 3.6, since 100% corresponds to 360 degrees).
• Draw a vertical line from the centre of the circle to the top.
• Draw the segments using a protractor to measure the degrees.
• Allocate a shade to each sector and shade it.
• Complete the graph by adding a title and a key (note that you can label the segments directly).
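The conversion from component values to percentages and degrees can be sketched in Python. The land-use figures are hypothetical, chosen so the arithmetic is easy to follow.

```python
def pie_angles(parts):
    """Convert component values to (percentage, degrees) for each sector."""
    total = sum(parts.values())
    return {
        name: (100 * value / total, 360 * value / total)
        for name, value in parts.items()
    }

# Hypothetical land-use areas (hectares), for illustration only.
land_use = {"farmland": 45, "forest": 30, "settlement": 15, "water": 10}
angles = pie_angles(land_use)
for name, (pct, deg) in angles.items():
    print(f"{name}: {pct:.1f}% -> {deg:.1f} degrees")
```

The degrees should add up to 360 (and the percentages to 100); if rounding makes them differ slightly, the largest sector is usually adjusted.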

Advantages of a pie chart


• Present a clear visual impression of how the data are distributed.
• Display the relative proportions of multiple classes of data.
• The size of the circle can be made proportional to the total quantity it represents.
• Summarise a large data set in visual form.
• Are visually simpler than other types of graph.
• Permit a visual check of the reasonableness or accuracy of calculations.
• Require minimal additional explanation.
• Are easily understood due to their widespread use in business and the media.
• Comparative
• Quantitative
Disadvantages of a pie chart
• Do not easily reveal exact values.
• Many pie charts may be needed to show changes over time.
• Fail to reveal key assumptions, causes, effects, or patterns.
• Can easily be manipulated to yield false impressions.
• Rounding off results in loss of information.

Bar graph
• Bar graphs are good for showing how data change over time.
How to make bar graphs
• Consider the range of the data and decide on a suitable scale.
• Draw the axes using the scale and label them.
• Draw the bars in proportion to the quantities being represented.
• Leave equal spaces between the bars, and make all the bars the same width.
• Shade the bars.
• Add a title and key. See diagram.

Try to modify this diagram so that it has a title and a key; remember that these are worth 2 marks in an examination.
Advantages of bar graphs
• Clear visual impression of the data

• Show each data category in a frequency distribution
• Display relative numbers or proportions of multiple categories
• Summarise a large data set in visual form
• Clarify trends better than tables do
• Allow key values to be estimated at a glance
• Permit a visual check of the accuracy and reasonableness of calculations
• Are easily understood due to their widespread use in business and the media
• Comparative
• Quantitative
Disadvantages of bar graphs
• Loss of information when rounding off
• Require additional explanation
• Can easily be manipulated to yield false impressions
• Fail to reveal key assumptions, causes, effects, or patterns
Types of bar graphs
• Simple: individual bars are used, where the length of each bar represents the size of the figure being represented.
• Component: the length of each component part represents the size of the component being represented.
• Divergent: used where there is a need to show two opposite or contrasting sets of data.
• Multiple: quantities belonging to a common source are represented by bars adjoining each other.
Component bar graph

Divergent/Bi-polar bar graph

Multiple bar graph

The advantages and disadvantages are the same as those under simple bar
graphs.
Scatter Graphs
• Scatter graphs are used to investigate the relationship between two variables (or aspects) for a set of paired data. The pattern of the scatter describes the relationship, as shown in the examples below. Best-fit or trend lines should:
• Follow the trend of the data
• Pass through as many points as possible
• Leave an equal number of points on either side of the line.
Method of construction
• Draw the vertical and horizontal axes
• Label the axes
• Plot the points
• Draw a line of best fit
• Label the places on the regression line
• Insert a title
• Insert a key.
• See diagram

The line of best fit
• This is a line which summarises the pattern of dots on scatter graphs
• The closer the points are to the line, the stronger the relationship between the two variables.
• Can be used to estimate other values not given in the data.
Disadvantages of scatter graphs
• They do not incorporate time.
• The eye is used to place the line of best fit and this can lead to errors
• When the line of best fit is instead drawn mathematically, it is independent of subjective judgement; such a line is called a regression line.
Method of drawing a regression line
• Calculate the mean of the values on each axis and mark them on the graph.
• Mark the point where the two means meet.
• Draw a line parallel to the y-axis through this point.
• Calculate the mean point of the data points to the left of this line and the mean point of those to the right.
• Draw a line which passes through all three points.
• This line represents the regression line.
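The three-point method above can be sketched in Python. It relies on a simple fact: if no point lies exactly on the mean of x, the overall mean point is the weighted average of the two half-means, so it always falls on the line joining them and the three points are collinear. The function name is hypothetical, and the method is not the least-squares regression used by statistical software, which can give a different line. The data are the El Raval transect prices from the example that follows.

```python
def semi_average_line(xs, ys):
    """Regression line by the three-point method described in the notes.

    The points are split at the mean of x. Any point lying exactly on the
    mean is left out of both halves (an assumption: the notes do not say).
    Returns (slope, intercept) of the line y = slope * x + intercept.
    """
    def mean_point(points):
        n = len(points)
        return sum(x for x, _ in points) / n, sum(y for _, y in points) / n

    pairs = list(zip(xs, ys))
    x_bar, _ = mean_point(pairs)
    left = [(x, y) for x, y in pairs if x < x_bar]
    right = [(x, y) for x, y in pairs if x > x_bar]
    (x1, y1), (x2, y2) = mean_point(left), mean_point(right)
    slope = (y2 - y1) / (x2 - x1)
    return slope, y1 - slope * x1

# Transect data from the El Raval example (distance, price in euros).
distance = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
price = [1.80, 1.20, 2.00, 1.00, 1.00, 1.20, 0.80, 0.60, 1.00, 0.85]
slope, intercept = semi_average_line(distance, price)
print(f"price = {intercept:.3f} + ({slope:.3f}) * distance")
```

Here the overall mean point is (5.5, 1.145), the left half-mean is (3, 1.40) and the right half-mean is (8, 0.89), so the line falls by about 0.10 euros per unit of distance, consistent with the hypothesis that prices decrease away from the museum.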
Example:
• Price changes of a convenience item along an environmental gradient in El
Raval, Barcelona.
• The hypothesis tested is that prices should decrease with distance from the
key area of gentrification surrounding the Contemporary Art Museum.
• The line followed is Transect 2 in the map below, with continuous sampling of the price of a small bottle of water at every convenience store.
Map to show the location of environmental gradients for transect lines in El Raval,
Barcelona

Distance along transect from Contemporary Art Museum | Price of a small bottle of water (euros)
1  | 1.80
2  | 1.20
3  | 2.00
4  | 1.00
5  | 1.00
6  | 1.20
7  | 0.80
8  | 0.60
9  | 1.00
10 | 0.85

Scatter graph showing changes in prices of a convenience item

Choropleth Maps
• These are maps in which areas are shaded according to a prearranged key, each shading or colour type representing a range of values.
• Population density information, expressed as 'per km²,' is appropriately
represented using a choropleth map.
• Choropleth maps are also appropriate for indicating differences in land use,
like the amount of recreational land or type of forest cover.
Method of construction of a choropleth map
• Calculate the range of the data by subtracting the lowest value from the highest value, as this will help you to work out the class interval.
• Decide on the method of progression to use: arithmetic for evenly distributed data, or geometric for unevenly distributed data.
• Decide on the number of classes.
• Decide on a systematic way of shading.
• The shading should highlight the highest values by using the most intense shades.
• Shade
• Insert a key and a title.
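The two methods of progression can be compared with a short Python sketch. Arithmetic classes have equal widths; geometric classes grow by a constant ratio, which suits data bunched at the low end. The densities below are hypothetical, the number of classes (4) is an arbitrary choice, and the geometric version assumes the lowest value is positive.

```python
def arithmetic_classes(low, high, n):
    """Equal-width class boundaries (arithmetic progression)."""
    step = (high - low) / n
    return [low + k * step for k in range(n + 1)]

def geometric_classes(low, high, n):
    """Class boundaries growing by a constant ratio (geometric progression)."""
    if low <= 0:
        raise ValueError("geometric classes need a positive lowest value")
    ratio = (high / low) ** (1 / n)
    return [low * ratio ** k for k in range(n + 1)]

# Hypothetical population densities (people per km²), for illustration only.
densities = [12, 35, 48, 90, 150, 400, 1200]
low, high = min(densities), max(densities)
print("range:", high - low)
print("arithmetic:", [round(b) for b in arithmetic_classes(low, high, 4)])
print("geometric:", [round(b) for b in geometric_classes(low, high, 4)])
```

With these values, arithmetic classes put five of the seven densities into the lowest class, while geometric classes spread them more evenly.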
Example of a choropleth map

Advantages of choropleth maps
• Give a good visual impression of change over space.
• Can be used with other methods.
• Fairly easy to construct, as they do not require lengthy and difficult calculations.
• Can be used even when the range of values has extremes; in this case uneven class intervals are used.
Disadvantages of Choropleth Maps
• They give a false impression of abrupt change at the boundaries of shaded
units.
• Choropleths are often not suitable for showing total values. Proportional
symbols overlays (included on the choropleth map above) are one solution to
this problem.
• It can be difficult to distinguish between different shades.
• Variations within map units are hidden, and for this reason smaller units are
better than large ones.
• Uniformity of shading within a class assumes that there is uniform
distribution which is not always the case.
• Where there is a wide range of data, classes may have to be made large and uneven, and the net effect is a higher level of generalisation.
• When used for a large area such as the whole world, the shading process becomes time-consuming.
Sample question

Isopleth maps
• Lines of equal value are drawn such that all values on one side are higher
than the "isoline" value and all values on the other side are lower, or
• Ranges of similar value are filled with similar colours or patterns.
• This type of map is ideal for showing gradual change over space and avoids
the abrupt changes which boundary lines produce on choropleth maps.
Temperature, for example, is a phenomenon that should be mapped using
isopleths, since temperature exists at every point (is continuous), yet does
not change abruptly at any point (like population density may do as you
cross into another census zone). Relief maps should always be in isopleth
form for this reason.
Method of construction
• Subtract the lowest value from the highest value, as this will help you to work out an isoline interval.
• Decide on the isoline interval by looking at the values and their possible multiples.
• Draw the isolines, working from the lowest value to the highest.
• Join the points with the same values; where the values for a particular isoline are not indicated, use interpolation.
Procedure for interpolation

• Measure the distance between two adjacent points and divide the measurement by 2 (this assumes the isoline value lies midway between the two point values).
• Mark the centre with an X.
• Do the same for all such points.
• Join the marked points with a smooth line or curve, leaving a break in it for the label.
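For isoline values that are not midway between two points, the position can be found by linear interpolation, which gives exactly the halfway point in the midway case. A minimal Python sketch; the function name and the example values are hypothetical, and it makes the same assumption of uniform change between points that the notes list as a disadvantage below.

```python
def isoline_position(value_a, value_b, iso_value):
    """Fraction of the distance from point A to point B where the isoline falls.

    Assumes the value changes uniformly (linearly) between the two points.
    """
    if value_a == value_b:
        raise ValueError("both points have the same value")
    if not min(value_a, value_b) <= iso_value <= max(value_a, value_b):
        raise ValueError("isoline value does not lie between the two points")
    return (iso_value - value_a) / (value_b - value_a)

# Points valued 20 and 40, 8 cm apart on the map: where does the 25 isoline go?
fraction = isoline_position(20, 40, 25)
print(fraction, "of the way, i.e.", fraction * 8, "cm from the point valued 20")
```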
Advantages of an isoline map
• Easy to construct once the points of equal value have been established.
• Isolines show the gradualness of change in a distribution
• Isolines allow one to interpolate thus allowing values which are not on the
map to be represented.
• They are also easy to interpret
• They give a clear visual representation of data by putting some order in the
way the data is changing.
• They deal with individual values and not averages.
• Any value can be found on the map due to interpolation
• Isolines can be combined with other methods. See diagram

Sample question

Disadvantages of an isoline map
• Interpolation is difficult and can also make the map inaccurate, as it is based on assumptions.
• It also assumes uniformity between isolines which might not always be the
case.
• The method cannot accommodate extreme values.
The dot map
• This is a map which uses dots to represent values, e.g. the distribution and density of population.
Considerations
• Dot value: low enough to avoid empty-looking areas, and high enough to avoid so many dots that they merge.
• Dot size: must be uniform throughout and should suit the scale of the base map.
• Dot location: to depict the actual distribution, no dots should be placed in areas such as mountains, lakes or swamps.
Method of construction
• Obtain a large base map and subdivide it into smaller units.
• Obtain the figures for the sub-areas and a physical map of the area, as this will help in the placement of dots.
• Consider the range of the data, as this will help you to decide on the dot value.
• Calculate the number of dots for each area; where this gives fractions, round them off.

• Trace the base map on plain paper, noting areas such as mountains and swamps so that you do not place dots in areas which cannot be inhabited.
• Place the dots on the map using a pencil, avoiding uninhabitable areas, so that you can rub out any dots placed in the wrong areas.
• Add a key and a title.
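The dot-count step is a division by the dot value followed by rounding off. A minimal Python sketch; the area names, populations and the dot value of 1,000 people are hypothetical. It also works backwards from the dot counts to show the information lost by rounding.

```python
import math

def dots_per_area(figures, dot_value):
    """Number of dots for each sub-area, rounded to the nearest whole dot.

    Halves are rounded up (Python's built-in round() would round 2.5 to 2).
    """
    return {area: math.floor(value / dot_value + 0.5) for area, value in figures.items()}

# Hypothetical population figures, with one dot representing 1,000 people.
population = {"Area A": 12_400, "Area B": 3_600, "Area C": 750}
dots = dots_per_area(population, 1_000)
print(dots)

# Working backwards shows the information lost by rounding off.
print({area: n * 1_000 for area, n in dots.items()})
```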
Advantages of a dot map
• It produces a clear visual impression of the data
• Allows comparison to be made
• Easy to use
Disadvantages of a dot map
• Dots can merge
• Even distribution of dots suggests that the population is evenly distributed
which cannot always be the case.
• Rounding off causes loss of information, in this case working backwards
becomes difficult.
• It is difficult to place the dots in areas with low population densities.
• It is difficult to copy the map for other uses.
Proportional circles
• This is a method which uses circles which are proportional to the quantity
they represent.
Method of construction of proportional circles
• Identify on a map the areas where the circles will be drawn
• Obtain information about the numbers of people who are in each of the
areas.
• Divide the numbers by a constant, then find the square root of the results to determine the radius of the circle for each centre.
• If the radii are still too large, divide by a further constant; if they become too small, multiply by a constant.
• Draw the proportional circles with their centres on the centres of the towns.
• Complete the map by inserting a key and a title.
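Taking the square root makes the circle areas, rather than the radii, proportional to the values, because the area of a circle grows with the square of its radius. A minimal Python sketch; the town populations, the constant of 10,000 and the choice of millimetres as the unit are hypothetical.

```python
import math

def circle_radii(values, constant):
    """Radius for each place: the square root of (value / constant)."""
    return {place: math.sqrt(v / constant) for place, v in values.items()}

# Hypothetical town populations, for illustration only.
towns = {"Town A": 250_000, "Town B": 90_000, "Town C": 10_000}
radii = circle_radii(towns, 10_000)  # radii in millimetres (assumed unit)
print(radii)
```

Town A has 25 times the population of Town C but only 5 times the radius, so its circle has 25 times the area. If the radii come out too large or too small for the map, they can all be divided or multiplied by a further constant, as described above.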
Advantages
• Easy to estimate the original values by reading off the varying circle sizes (it is quantitative)
• Produces a clear visual/pictorial impression of the data
• Shows the spread of data on the map
• Different circle sizes solve the problem of a very wide range of figures
• Allows places/cities/towns to be compared
• Has a wide range of uses/is versatile [3/2]
Disadvantages
• Leaves blank areas which appear empty
• Rounding off figures causes loss of information
• Circles merge, which results in loss of information and also makes interpretation difficult
• There is a need to refer constantly to the key, which is time-consuming
• The circles disregard administrative boundaries by overlapping them
• Shading obscures base map features. [3/2] max [5]

Flow line maps (routed flow maps)


• These are maps which show movement along routes to and from a focal
point.
• The width of the flow lines will be proportional to the volume being
represented.
Method of construction of a flow line map
• Trace the route system from the base map.
• Choose an appropriate scale relative to the scale of the map.
• Use the highest and lowest values to decide on the scale.
• Divide the scaled flow-line width by two.
• Draw the flow lines using the scale.
• Add a title and a key.
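The width calculation can be sketched in Python. The route names, traffic volumes and the scale of 100 vehicles per millimetre are hypothetical, and the half-width output reflects one reading of the "divide by two" step (half of the width drawn on each side of the route), which is an assumption.

```python
def flow_line_widths(volumes, units_per_mm):
    """Width of each flow line in mm, proportional to the volume it carries.

    Returns (full width, half width) per route. The half width assumes the
    line is drawn with half of its width on each side of the route.
    """
    widths = {}
    for route, volume in volumes.items():
        width = volume / units_per_mm
        widths[route] = (width, width / 2)
    return widths

# Hypothetical daily vehicle counts; scale of 1 mm per 100 vehicles.
traffic = {"Route 1": 1200, "Route 2": 300, "Route 3": 60}
for route, (width, half) in flow_line_widths(traffic, 100).items():
    print(f"{route}: {width:.1f} mm wide ({half:.2f} mm each side)")
```

Checking the widest and narrowest results against the map is how the highest and lowest values guide the choice of scale: here the busiest route gives a 12 mm line and the quietest a 0.6 mm line.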
Triangular Graphs
• Used where three variables have to be shown.
• Each axis is divided into 100, representing percentages.
• From each 100-0% axis, lines are drawn at angles of 60 degrees to carry
the values.
• The data used must be in the form of three components, each component
representing a percentage value, and the three component percentage values
must add up to 100 per cent.
• The position of the plots indicates the relative dominance of each of the
three components and the value of the graph arises in giving a quick visual
comparison of contrasting component dominance for different areas.

• It is particularly useful in identifying changes over time, since a position on
the graph will change as the relative dominance of the components change.
• The graph can be used to show contrasting service structures for 4 locations
in El Raval, an inner-city area of Barcelona which has been the subject of
radical urban reform.
• The choice of the three graph components is important and must be in the
context of the investigation. An example of data from one location (El Raval
Site 2) is shown in map 1 below, and this has been used along with data
from three other sites (1,3 and 4) to compile the triangular graph.
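Plotting a point on a triangular graph amounts to converting its three percentages into ordinary x, y coordinates. A minimal Python sketch, assuming an equilateral triangle of side 1 with the 100% corners of components A, B and C at (0, 0), (1, 0) and (0.5, √3/2); the orientation and the sample percentages are arbitrary choices for illustration, not data from El Raval.

```python
import math

def ternary_xy(a, b, c):
    """Convert three percentages (summing to 100) to x, y on the triangle.

    Corner A is at (0, 0), corner B at (1, 0) and corner C at (0.5, sqrt(3)/2).
    """
    if abs(a + b + c - 100) > 1e-6:
        raise ValueError("the three components must add up to 100 per cent")
    x = (b + c / 2) / 100
    y = (c * math.sqrt(3) / 2) / 100
    return x, y

# Hypothetical site with components of 50%, 30% and 20%.
print(ternary_xy(50, 30, 20))
```

A point dominated by one component plots close to that component's corner, which is what makes contrasting sites easy to compare visually.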

Measures of Central Tendency


MEAN
Use the mean to describe the middle of a set of data that does not have an
outlier. It can be used to make predictions or to gauge people's standard of
living, e.g. the average income in a country gives an idea of purchasing power.
• Advantages:
• Most popular measure in fields such as business, engineering and
computer science.

• It is unique - there is only one answer.
• Useful when comparing sets of data.
• Disadvantages
• Affected by extreme values (outliers)
MEDIAN
Use the median to describe the middle of a set of data that does have an outlier.
An equal number of values lies above and below it.
• Advantages:
• Extreme values (outliers) do not affect the median as strongly as they
do the mean.
• Useful when comparing sets of data.
• It is unique - there is only one answer.

Disadvantages:
• Not as popular as mean.
MODE
It shows the most frequent item in a distribution. It can be used for both
numeric and non-numeric data, for example the most common shoe size in a
class.
• Advantages:
• Extreme values (outliers) do not affect the mode.
• Disadvantages:
• Not as popular as mean and median.
• Not necessarily unique - may be more than one answer
• When no values repeat in the data set, the mode is every value and
is useless.
• When there is more than one mode, it is difficult to interpret and/or
compare.
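The effect of an outlier on the three averages can be seen with Python's standard `statistics` module (`multimode` needs Python 3.8 or later). The data set is hypothetical, with 40 as the outlier.

```python
import statistics

# Hypothetical data set with one outlier (40).
data = [3, 4, 4, 5, 6, 40]

print(statistics.mean(data))       # about 10.33: pulled upwards by the outlier
print(statistics.mean(data[:-1]))  # 4.4 without the outlier
print(statistics.median(data))     # 4.5: barely affected by the outlier
print(statistics.multimode(data))  # [4]: the most frequent value(s)

# With no repeated values, every value is returned as a mode.
print(statistics.multimode([7, 8, 9]))  # [7, 8, 9]
```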
Measures of dispersion
• The range is the difference between the highest and the lowest values. It is affected by extreme values, and it fails to indicate the degree of clustering within a distribution.
The quartile deviation
• Quartiles are numbers that divide a distribution into 4 parts.
• There are three quartiles i.e. The lower quartile Q1 , the middle quartile Q2
and the upper quartile Q3.
• The inter quartile range is derived from subtracting Q1 from Q3 (Q3-Q1)

How to calculate the quartiles
• Arrange the values in ascending order.
• Find the median-it is equal to the middle quartile Q2.
• Find the middle point of the values below Q2 this is Q1.
• Find the middle point of the values above Q2 this is Q3.
• Subtract Q1 from Q3.
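The steps above can be sketched in Python. With an odd number of values, the median itself is left out of both halves, which reproduces the worked example that follows; other textbooks and software use slightly different quartile conventions, so results can differ for some data sets. The function names are hypothetical.

```python
def median(values):
    """Middle value of the sorted values (mean of the two middle ones if even)."""
    s = sorted(values)
    mid = len(s) // 2
    return s[mid] if len(s) % 2 else (s[mid - 1] + s[mid]) / 2

def quartiles(values):
    """Q1, Q2 and Q3 as the medians of the lower half, whole set and upper half."""
    if len(values) < 2:
        raise ValueError("need at least two values")
    s = sorted(values)
    n = len(s)
    lower, upper = s[: n // 2], s[(n + 1) // 2 :]
    return median(lower), median(s), median(upper)

q1, q2, q3 = quartiles([4, 5, 6, 7, 8, 9, 10])
print("Q1 =", q1, "Q2 =", q2, "Q3 =", q3)
print("interquartile range =", q3 - q1)
print("quartile deviation =", (q3 - q1) / 2)
```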
The significance of the quartiles
• The measure of dispersion that uses quartiles is called the quartile deviation, calculated as (Q3 − Q1) / 2.
• E.g. for the values 4, 5, 6, 7, 8, 9, 10:
• The median Q2 is 7, Q1 is 5 and Q3 is 9.
• The quartile deviation is (9 − 5) / 2 = 4 / 2 = 2.
• This indicates that the middle half of the items lies within about 2 of the median, i.e. between 5 and 9 when the median is 7.
Advantages of the quartile deviation
• It is not affected by extreme values.
• Can be calculated even with open-ended classes.
• It gives a good description of the half of the population that occurs between
the lower and the upper quartiles.
Disadvantages of the quartile deviation
• It gives no real indication of the degree of clustering.
• The values derived cannot be used for further mathematical calculations.
• Does not make use of all the values in the distribution.

Standard Deviation
• Standard deviation is a number that tells you approximately how far the
values in a data set deviate from the mean (the average).
• The larger the standard deviation, the more spread out the values are around the mean.
• The smaller the standard deviation, the more closely the values cluster around the mean.
• If all of the values are equal, the standard deviation is equal to zero.
You should also look at the mean when you interpret the standard deviation.
• There are 100 pirates on the ship. In statistical terms this means we have a
population of 100.
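• If we know the number of gold coins each of the 100 pirates has, we use the standard deviation equation for an entire population:
$$\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(x_i - \mu)^2}$$
where $x_i$ is each pirate's number of coins, $\mu$ is the population mean and $N = 100$.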
• What if we don't know the number of gold coins each of the 100 pirates has? For example, we only had enough time to ask 5 pirates how many gold coins they have. In statistical terms this means we have a sample size of 5, and in this case we use the standard deviation equation for a sample of a population:
$$s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2}$$
where $\bar{x}$ is the sample mean and $n = 5$.
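The population and sample formulas differ only in the divisor (N for a whole population, n − 1 for a sample). A minimal Python sketch computes both by hand and checks them against the standard library's `statistics.pstdev` and `statistics.stdev`; the five gold-coin counts are hypothetical.

```python
import math
import statistics

def population_sd(values):
    """sigma = sqrt(sum((x - mu)^2) / N) for a whole population."""
    mu = sum(values) / len(values)
    return math.sqrt(sum((x - mu) ** 2 for x in values) / len(values))

def sample_sd(values):
    """s = sqrt(sum((x - x_bar)^2) / (n - 1)) for a sample."""
    x_bar = sum(values) / len(values)
    return math.sqrt(sum((x - x_bar) ** 2 for x in values) / (len(values) - 1))

coins = [4, 2, 5, 8, 6]  # hypothetical answers from 5 pirates
print(population_sd(coins), statistics.pstdev(coins))
print(sample_sd(coins), statistics.stdev(coins))
```

Dividing by n − 1 makes the sample value slightly larger (about 2.24 here, against 2.0), which compensates for the sample mean being closer to the sampled values than the unknown population mean would be.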
