Population
• All people or items with the characteristic one wishes to understand. The list of population
members from which a sample is actually drawn is called the sampling frame.
• For example, a manufacturer needs to decide whether a batch of material from production
is of high enough quality to be released to the customer, or should be sentenced for scrap or
rework due to poor quality. In this case, the batch is the population.
Probability sampling
• Every unit in the population has a chance (greater than zero) of being selected in the sample,
and this probability can be accurately determined.
Nonprobability sampling
• Any sampling method where some elements of the population have no chance of selection
(these are sometimes referred to as 'out of coverage'/'under covered'), or where the
probability of selection can't be accurately determined.
• Example: We visit every household in a given street, and interview the first person to answer
the door. In any household with more than one occupant, this is a non-probability sample,
because some people are more likely to answer the door (e.g. an unemployed person who
spends most of their time at home is more likely to answer than an employed housemate
who might be at work when the interviewer calls) and it's not practical to calculate these
probabilities.
Types of probability sampling
• Simple Random Sampling
• Systematic Sampling
• Stratified Sampling
• Cluster Sampling
Simple random sampling
• Each element of the frame has an equal probability of selection: the frame is not subdivided
or partitioned.
• any given pair of elements has the same chance of selection as any other such pair (and
similarly for triples, and so on).
• This minimises bias and simplifies analysis of results
Random sampling
• involves the use of random numbers
• Each member within the population is given a number and the numbers are then chosen at
random.
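The numbering procedure just described can be sketched in Python. This is only an illustration: the population size of 500, the sample size of 20 and the function name `simple_random_sample` are assumptions made for the example, not values from these notes.

```python
import random

def simple_random_sample(population_size, sample_size, seed=None):
    """Number the members 1..population_size and draw numbers at random.

    Sampling is without replacement, so every member has the same
    chance of selection and nobody is picked twice.
    """
    rng = random.Random(seed)  # seed is optional; it only makes the draw repeatable
    numbers = range(1, population_size + 1)
    return sorted(rng.sample(numbers, sample_size))

# Hypothetical population of 500 numbered members, sample of 20.
chosen = simple_random_sample(population_size=500, sample_size=20, seed=1)
print(chosen)
```

The same idea works with a table of random numbers on paper; the code simply replaces the table.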
Advantages and disadvantages
Advantages
• Highly representative if all subjects participate;
• it reduces selection bias, since each member of the population has an equal chance of being selected
Disadvantages
• Not possible without complete list of population members;
• potentially uneconomical to achieve;
• can be disruptive to isolate members from a group;
• time-scale may be too long, data/sample could change
Stratified random
• Used where the population embraces a number of distinct categories: the frame can be
organized by these categories into separate "strata".
• Each stratum is then sampled as an independent sub-population, out of which individual
elements can be randomly selected. There are several potential benefits to stratified
sampling.
A stratified sampling approach is most effective when three conditions are met, i.e.
• Variability within strata is minimized
• Variability between strata is maximized
• The variables upon which the population is stratified are strongly correlated with the desired
dependent variable.
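A minimal sketch of stratified random sampling follows. It assumes proportional allocation (the same sampling fraction in every stratum), which is only one possible design, and the names `stratified_sample`, `stratum_of` and the urban/rural frame are hypothetical.

```python
import random

def stratified_sample(frame, stratum_of, fraction, seed=None):
    """Split the frame into strata, then draw a simple random sample in each.

    fraction is the share of every stratum to select (proportional
    allocation, an assumption made for this sketch).
    """
    rng = random.Random(seed)
    strata = {}
    for unit in frame:
        strata.setdefault(stratum_of(unit), []).append(unit)

    sample = {}
    for name, members in strata.items():
        n = max(1, round(len(members) * fraction))  # at least one unit per stratum
        sample[name] = rng.sample(members, n)       # independent draw per stratum
    return sample

# Hypothetical frame: 80 urban and 20 rural households.
frame = [("urban", i) for i in range(80)] + [("rural", i) for i in range(20)]
picked = stratified_sample(frame, stratum_of=lambda unit: unit[0], fraction=0.1, seed=1)
for name, units in picked.items():
    print(name, len(units))
```

With a 10% fraction the sketch selects 8 urban and 2 rural households, so both categories are guaranteed to appear in the sample.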
Advantages over other sampling methods
• Focuses on important subpopulations and ignores irrelevant ones.
• Allows use of different sampling techniques for different subpopulations.
• Improves the accuracy/efficiency of estimation.
• Permits greater balancing of statistical power of tests of differences between strata by
sampling equal numbers from strata varying widely in size.
Disadvantages
• Requires selection of relevant stratification variables which can be difficult.
• Is not useful when there are homogeneous subgroups.
• Can be expensive to implement.
Systematic sampling
• Relies on arranging the target population according to some ordering scheme and then
selecting elements at regular intervals through that ordered list.
• Systematic sampling involves a random start and then proceeds with the selection of every
kth element from then onwards.
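The random start followed by every kth element can be sketched as below. The list of 1000 house numbers and the interval k = 10 are taken from the street example used in these notes; the function name `systematic_sample` is an assumption.

```python
import random

def systematic_sample(ordered_list, k, seed=None):
    """Pick a random start among the first k elements, then every kth element."""
    rng = random.Random(seed)
    start = rng.randrange(k)   # index 0..k-1, i.e. one of the first k elements
    return ordered_list[start::k]

houses = list(range(1, 1001))  # house No. 1 to house No. 1000
sample = systematic_sample(houses, k=10, seed=3)
print(sample[:5], "...", len(sample), "houses selected")
```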
Advantages
• It is easy to implement and the stratification induced can make it efficient, if the variable
by which the list is ordered is correlated with the variable of interest. 'Every 10th' sampling
is especially useful for efficient sampling from databases.
• Can ensure that specific groups are represented, even proportionally, in the sample(s) (e.g.,
by gender), by selecting individuals from strata list
• For example, suppose we wish to sample people from a long street that starts in a poor area
(house No. 1) and ends in an expensive district (house No. 1000).
• A simple random selection of addresses from this street could easily end up with too many
from the high end and too few from the low end (or vice versa), leading to an
unrepresentative sample. Selecting (e.g.) every 10th street number along the street ensures
that the sample is spread evenly along the length of the street, representing all of these
districts. (Note that if we always start at house #1 and end at #991, the sample is slightly
biased towards the low end; by randomly selecting the start between #1 and #10, this bias is
eliminated.)
Disadvantages
• However, systematic sampling is especially vulnerable to periodicities in the list.
• If periodicity is present and the period is a multiple or factor of the interval used, the sample
is especially likely to be unrepresentative of the overall population, making the scheme less
accurate than simple random sampling.
Example
• Consider a street where the odd-numbered houses are all on the north (expensive) side of
the road, and the even-numbered houses are all on the south (cheap) side. Under the
sampling scheme given above, it is impossible to get a representative sample; either the
houses sampled will all be from the odd-numbered, expensive side, or they will all be from
the even-numbered, cheap side.
• All elements have the same probability of selection (in the example given, one in ten). It is
not 'simple random sampling' because different subsets of the same size have different
selection probabilities - e.g. the set {4,14,24,...,994} has a one-in-ten probability of selection,
but the set {4,13,24,34,...} has zero probability of selection.
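The periodicity problem in the street example can be checked with a short sketch. It assumes the layout described above (odd numbers on the north side, even numbers on the south side) and tries every possible random start from #1 to #10 with an interval of 10; the helper name `side_of_street` is hypothetical.

```python
def side_of_street(house_number):
    """Odd-numbered houses are on the north side, even-numbered on the south."""
    return "north" if house_number % 2 == 1 else "south"

houses = list(range(1, 1001))

# For each possible start, record which sides of the street end up in the sample.
sides_by_start = {}
for start in range(1, 11):
    sample = houses[start - 1::10]  # every 10th house from this start
    sides_by_start[start] = {side_of_street(h) for h in sample}

for start, sides in sides_by_start.items():
    print(f"start #{start}: sides sampled = {sorted(sides)}")
```

Because the period of the pattern (2) is a factor of the interval (10), every start yields houses from one side only, which is why no start can give a representative sample here.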
Disadvantages of stratified sampling
• More complex and requires greater effort than simple random sampling;
• strata must be carefully defined
Cluster sampling
Sometimes it is more cost-effective to select respondents in groups ('clusters'). Sampling is often
clustered by geography, or by time periods. (Nearly all samples are in some sense 'clustered' in
time - although this is rarely taken into account in the analysis.) For instance, if surveying
households within a city, we might choose to select 100 city blocks and then interview every
household within the selected blocks.
Advantages
• Clustering can reduce travel and administrative costs.
• In the example above, an interviewer can make a single trip to visit several households in
one block, rather than having to drive to a different block for each household.
• It also means that one does not need a sampling frame listing all elements in the target
population.
• Instead, clusters can be chosen from a cluster-level frame, with an element-level frame
created only for the selected clusters. In the example above, the sample only requires a
block-level city map for initial selections, and then a household-level map of the 100
selected blocks, rather than a household-level map of the whole city.
Cluster sampling generally increases the variability of sample estimates above that of simple
random sampling, depending on how the clusters differ between themselves, as compared with
the within-cluster variation. For this reason, cluster sampling requires a larger sample than SRS
to achieve the same level of accuracy - but cost savings from clustering might still make this a
cheaper option.
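The city-block example can be sketched as a one-stage cluster sample: blocks are drawn at random and every household in a drawn block is interviewed. The city of 500 blocks with 5 households each is invented purely for illustration, as is the function name `cluster_sample`.

```python
import random

def cluster_sample(households_by_block, n_blocks, seed=None):
    """Select n_blocks clusters at random and keep every household in them."""
    rng = random.Random(seed)
    # Only a block-level frame (the dictionary keys) is needed for the draw.
    chosen_blocks = rng.sample(sorted(households_by_block), n_blocks)
    return {block: households_by_block[block] for block in chosen_blocks}

# Hypothetical city: 500 blocks with 5 households each.
city = {
    f"block-{b}": [f"block-{b}/house-{h}" for h in range(1, 6)]
    for b in range(1, 501)
}
selected = cluster_sample(city, n_blocks=100, seed=7)
interviews = [house for houses in selected.values() for house in houses]
print(len(selected), "blocks,", len(interviews), "households to interview")
```

Note that the household lists are only consulted for the selected blocks, which mirrors the point that an element-level frame is needed only for the chosen clusters.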
Quota sampling
• In quota sampling, the population is first segmented into mutually exclusive sub-groups, just
as in stratified sampling.
• Then judgement is used to select the subjects or units from each segment based on a
specified proportion. For example, an interviewer may be told to sample 200 females and
300 males between the age of 45 and 60.
• It is this second step which makes the technique one of non-probability sampling.
• In quota sampling the selection of the sample is non-random.
• For example, interviewers might be tempted to interview those who look most helpful. The
problem is that these samples may be biased because not everyone gets a chance of
selection. This non-random element is its greatest weakness, and quota versus probability
sampling has been a matter of controversy for many years.
Accidental sampling
• Accidental sampling (sometimes known as grab, convenience or opportunity sampling) is a
type of non-probability sampling in which the sample is drawn from that part of the
population which is close to hand. That is, a sample is selected because it is readily
available and convenient. People may be included simply because the researcher happens to
meet them, or because they can be found through technological means such as the internet
or the telephone.
Snowball sampling
• Existing study subjects are used to recruit more subjects into the sample.
What to do when planning for a survey
• Request permission to conduct the survey from the responsible authorities.
• Once permission is granted, obtain a large base map of the area, which will help you
identify the places or points where you want to conduct your study.
• Conduct a pre-survey of the area to see whether it is feasible for you to conduct the
survey alone.
• Look for helpers.
• Design a questionnaire/interview schedule.
• Read around the topic where necessary.
• Prepare adequate equipment, e.g. notebooks, pens, safety clothing, cameras.
• Decide on a sampling technique.
What to do during the survey
• Observe.
• Collect statistical data.
• Collect samples where possible.
• Draw sketch maps where possible.
• Administer questionnaires.
• Ask questions/interview people.
• Record answers in notebooks.
• Count traffic or people.
• Phone people where possible.
• Take photos and videos where possible.
Problems likely to be encountered during a survey
• Illiteracy
• Falsehood/lies
• Biased information
• Confidentiality/secretiveness
• Ignorance
• Rudeness/hostility/lack of cooperation
• Inaccessibility due to tight security
• Financial constraints
• Unfavourable weather conditions
• Dangerous animals
Please note that the points raised above will vary according to the type of survey
that one wants to conduct.
Sampling errors and biases
• Sampling errors and biases are induced by the sample design. They include:
• Selection bias: When the true selection probabilities differ from those
assumed in calculating the results.
• Random sampling error: Random variation in the results due to the
elements in the sample being selected at random.
Non-sampling error
• Non-sampling errors are other errors which can impact the final survey
estimates, caused by problems in data collection, processing, or sample
design. They include:
• Overcoverage: Inclusion of data from outside of the population.
• Undercoverage: Sampling frame does not include elements in the population.
• Measurement error: e.g. when respondents misunderstand a question, or find
it difficult to answer.
• Processing error: Mistakes in data coding.
• Non-response: Failure to obtain complete data from all selected individuals.
Questionnaire surveys
• A questionnaire consists of a set of open-ended, closed or multiple-choice
questions which the respondent has to answer.
• Open-ended questions are those that offer the respondent freedom to
respond using his or her own words and thoughts.
• Closed or multiple-choice questions are those where there is a
set of answers from which the respondent chooses the one that matches
his or her response.
General guidelines when designing a questionnaire
• Should be kept anonymous
• Questions must be non-threatening.
• Questions should not ask about more than one dimension (e.g. "Do you like the
texture and flavour of the snack?" If a respondent answers "no", then the
researcher will not know if the respondent dislikes the texture or the flavour,
or both.)
• A good question asks for only one "bit" of information.
• Ask questions that accommodate all the possible responses
• For example, consider the question:
• What type of drink do you like?
a) Coke
b) Fanta
c) Sprite
Clearly, there are many problems with this question. What if the respondent doesn't
drink any of these drinks? What if he or she prefers a different brand of drink? What if
he or she drinks all of them?
There are two ways to correct this kind of problem.
• The first way is to make each response a separate dichotomous item on the
questionnaire. For example:
• Do you drink soft drinks? (circle: Yes or No)
• Another way to correct the problem is to add the necessary response
categories and allow multiple responses, e.g. "Which ones ___________________
(list in order of preference)."
• Do not ask ambiguous questions.
• Transitions between questions should be smooth.
• Do not ask leading questions.
• Questions should be short and specific.
• Ask sensitive questions in a socially acceptable way
• Design your questionnaire such that it is respondent friendly avoiding the use
of technical jargon and abbreviations.
• Where the answer is obvious fill it in
• Do not ask questions which rely on one’s memory
• Sequence your questionnaire starting with questions which might concern a
person’s background in terms of age, marital status, education level
• Ensure that a separate introductory page is attached to the questionnaire
explaining the purpose of the study, requesting the respondent’s consent and
cooperation.
• Assure confidentiality of the data obtained.
• Your questionnaire should have a heading and a space to insert the number,
date
General guidelines when administering a questionnaire
Administer pre-notification letters
• They are an excellent (but expensive) way to increase response.
• The researcher needs to weigh the additional cost of sending out a pre-
letter against the probability of a lower response rate.
• When sample sizes are small, every response really counts and a pre-letter
is highly recommended.
• Briefly describe why the study is being done and identify the sponsors. This makes
a good impression and lends credibility to the study.
• Explain why the person receiving the pre-letter was chosen to receive the
questionnaire.
• Justify why the respondent should complete the questionnaire.
• The justification must be something that will benefit the respondent
• If an incentive will be included with the questionnaire, mention the inclusion
of a free gift without specifically telling what it will be.
• Explain how the results will be used.
• Response rate is the single most important indicator of how much confidence
can be placed in the results of a survey. A low response rate can be
devastating to the reliability of a study.
• One of the most powerful tools for increasing response is to use follow-ups or
reminders. Traditionally, between 10 and 60 percent of those sent
questionnaires respond without follow-up reminders. These rates are too low
to yield confident results, so the need to follow up on non-respondents is
clear.
• Researchers can increase the response from follow-up attempts by including
another copy of the questionnaire. When designing the follow-up procedure, it
is important for the researcher to keep in mind the unique characteristics of
the people in the sample. The most successful follow-ups have been
achieved by phone calls.
Types of questionnaires
• Self-administered: these are questionnaires which are administered by the
researcher directly to the respondents.
Advantages
• High response rate
• They provide a chance to clarify questions.
• Allows adjustments to be made using the feedback that one gets from the
respondents.
Disadvantages
• Time consuming
• May be inconvenient for some respondents
• Postal: these are questionnaires which are sent by mail to the respondents.
Advantages
• It cuts down on travelling.
• There is no interviewer bias.
• The respondent has more time to respond.
• Can be used when more personal information is required.
Disadvantages
• A lot of time is consumed in designing such a questionnaire.
• Answers cannot be rechecked with the respondents
• One can never be really sure of who exactly completed the questionnaire.
• The respondents can read through the questionnaire, see the line of
thinking, and then tailor their responses to suit the line of questioning,
which biases the responses.
In general the advantages of questionnaires
• Questionnaires are very cost effective when compared to face-to-face
interviews.
• Standard questions are asked to all respondents
• Answers can be quantified
• Caters for confidentiality especially when posted
• They can be stored for records and for comparisons
• Allows several questions to be asked in one document
• Ensures direct contact with the respondent
• It’s a primary data source meaning that it is original
• It is a fast method of data collection. This is especially true for studies
involving large sample sizes and large geographic areas.
Written questionnaires become even more cost effective as the number of
research questions increases.
Other advantages of using a questionnaire
• Questionnaires are easy to analyze.
• Data entry and tabulation for nearly all surveys can be easily done with
many computer software packages.
• Questionnaires are familiar to most people. Nearly everyone has had some
experience completing questionnaires and they generally do not make people
apprehensive.
• Questionnaires reduce bias.
• There is uniform question presentation and no middle-man bias.
• The researcher's own opinions will not influence the respondent to answer
questions in a certain manner.
• There are no verbal or visual clues to influence the respondent.
• Questionnaires are less intrusive than telephone or face-to-face surveys.
• When a respondent receives a questionnaire in the mail, he is free to
complete the questionnaire on his own time-table.
• Unlike other research methods, the respondent is not interrupted by the
research instrument.
Disadvantages of questionnaires in general
• Some respondents may fail to post the questionnaire back
• Some may be lost in transit if it is a postal questionnaire
• Can only be completed by literate people
• Language barrier can be a problem
• Respondents may decide to ignore the questionnaire
• Closed questions limit the respondent's answer
• There is plenty of room for lying
• Information obtained may be biased, as people might be hesitant to tell you
about their habits if asked
• Closed questions leave no room for explanations
Note that here speculation and negatives are allowed.
Interviewing as a method of data collection
• This refers to the purposeful oral conversation between the researcher
(interviewer) and the respondent(s).
• The researcher provides both the subject matter and direction of the
interview while the respondent can also have some opportunity to elaborate
on views regarding the topic.
Types of interviews
• Personal
• Telephone
Personal interview
• This involves the face to face conversation between the respondent and the
researcher
Procedure
• Ensure full cooperation of the respondent
• A professional appearance and a brief explanation of the objective of the
study will achieve full cooperation.
• Record whatever the respondent says, using either a pen and notebook or
a tape recorder.
• Ask the required questions following the order. Where the answer is not
clear probe further.
Advantages of a personal interview
• High degree of flexibility
• Has a lower non-response error
• It allows the researcher to gather a lot of information in a very short time.
Disadvantages of a personal interview
• It is costly
• Greater response error as the respondent will be trying not to disappoint the
researcher.
Telephone interview
• This is a voice to voice type of interview
Advantages of a telephone interview
• Saves time since calls are made from one place
• They convey a sense of importance, since many people are quicker to respond to a
telephone call than to a person approaching them directly.
• They are less costly
Disadvantages of a telephone interview
• Good telephone manners required
• Respondent has little time to think
• Visual aids cannot be used
• Not everyone has a telephone
• Repeat calls are inevitable
• Straight forward questions are required
Data presentation
Methods of presenting data
• Tabulation – this involves the arrangement of data in rows and columns. The
tabulation can be a simple one-way table or a cross tabulation, where the
relationship between two variables is recorded.
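The difference between a one-way tabulation and a cross tabulation can be illustrated with a short sketch. The five (gender, answer) responses are made up for the example and do not come from any real survey.

```python
from collections import Counter

# Hypothetical survey responses: (gender, answer to a yes/no question).
responses = [
    ("female", "yes"), ("male", "no"), ("female", "no"),
    ("female", "yes"), ("male", "yes"),
]

# One-way tabulation: counts for a single variable.
one_way = Counter(answer for _, answer in responses)

# Cross tabulation: counts for each combination of the two variables.
cross_tab = Counter(responses)

print("answer  count")
for answer in ("yes", "no"):
    print(f"{answer:<7} {one_way[answer]}")

print("\ngender  yes  no")
for gender in ("female", "male"):
    print(f"{gender:<7} {cross_tab[(gender, 'yes')]:<4} {cross_tab[(gender, 'no')]}")
```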
Graphical representation of data
• A graph is a pictorial representation of the characteristics of any set of given
variables.
• Graphs allow for quicker interpretation of data
Types of graphs
• Histograms
• Frequency polygons
• Cumulative frequency curves (ogives)
• Line graphs
• Scatter graphs and Regression lines
• Bar graphs
• Circular graphs and pie charts
Histograms
• It is a graph of frequency distribution
• It uses bars to represent changes in a distribution
• The bars touch each other.
Method of construction
• Construct the horizontal axis using a continuous scale running from
one extreme to the other, and label it
• Find a suitable scale for frequency on the vertical scale (y- axis)
• Draw a vertical rectangle for each class in the distribution with the base on
the horizontal axis extending from one class to another.
• Do not have gaps between the rectangles
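Before the rectangles can be drawn, the data have to be grouped into classes and counted. A minimal sketch of that step is shown below, assuming equal class widths; the marks, the class width of 20 and the function name `class_frequencies` are hypothetical.

```python
def class_frequencies(values, lower, width, n_classes):
    """Count how many values fall in each class [lower + i*width, lower + (i+1)*width)."""
    counts = [0] * n_classes
    for v in values:
        index = int((v - lower) // width)
        if 0 <= index < n_classes:   # ignore values outside the axis range
            counts[index] += 1
    return counts

# Hypothetical test marks grouped into classes 0-20, 20-40, ..., 80-100.
marks = [12, 25, 37, 41, 44, 58, 63, 67, 71, 89]
freq = class_frequencies(marks, lower=0, width=20, n_classes=5)

# Text version of the histogram: one bar per class, no gaps between classes.
for i, count in enumerate(freq):
    lo, hi = i * 20, (i + 1) * 20
    print(f"{lo:>3}-{hi:<3} {'#' * count}")
```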
Frequency polygon
• It is a graph which has a close relationship with the histogram.
• In such a polygon, straight lines are drawn linking the midpoints of the tops
of the rectangles of the histogram.
Ogive/cumulative frequency curve
• This is a graph of a cumulative frequency distribution
• To obtain such a graph, add the frequency of any given class to the
frequencies of all the classes listed above it in the table.
Method of constructing an ogive
• Prepare a cumulative frequency table using the data available
• Decide on a suitable scale depending on the data available
• Prepare the axis using a suitable scale
• Draw the axis
• Insert the points using the values from the table
• Plot the points
• Join the points using a pencil
• Insert title and key
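The first step, preparing the cumulative frequency table, can be sketched as follows. The class boundaries and frequencies are invented for illustration, and plotting against the upper class boundary (a "less than" ogive) is an assumption about the intended convention.

```python
from itertools import accumulate

# Hypothetical frequency table: upper class boundaries and class frequencies.
upper_bounds = [20, 40, 60, 80, 100]
frequencies = [1, 2, 3, 3, 1]

# Running total: each class frequency added to all the frequencies before it.
cumulative = list(accumulate(frequencies))

# Points to plot: (upper class boundary, cumulative frequency).
points = list(zip(upper_bounds, cumulative))

print("upper bound  frequency  cumulative")
for bound, f, c in zip(upper_bounds, frequencies, cumulative):
    print(f"{bound:>11}  {f:>9}  {c:>10}")
```

The last cumulative value always equals the total number of observations, which is a quick check on the table.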
Pie chart
• It is a circle which is divided into sectors so that the area of each sector is
proportional to the quantity being represented.
Method of construction
• Draw a circle of any size
• Convert the component parts to percentages and then to degrees.
• Draw a vertical line from the centre of the circle to the top.
• Draw the segments using a protractor to measure the degrees.
• Allocate shades for each sector and shade
• Complete the graph by showing a title and a key (note that you can label
the segments directly).
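The conversion of component parts to percentages and then to degrees can be sketched as below. Since the whole circle is 360 degrees, each 1% corresponds to 3.6 degrees. The crop figures and the function name `pie_angles` are hypothetical.

```python
def pie_angles(components):
    """Return {name: (percentage, degrees)} for each component of the total."""
    total = sum(components.values())
    return {
        name: (value / total * 100, value / total * 360)
        for name, value in components.items()
    }

# Hypothetical data: hectares under each crop.
crops = {"maize": 50, "tobacco": 30, "cotton": 20}
angles = pie_angles(crops)

for name, (percent, degrees) in angles.items():
    print(f"{name:<8} {percent:5.1f}%  {degrees:6.1f} degrees")
```

The degrees should add up to 360; if rounding to whole degrees leaves a small gap, it is usually absorbed by the largest sector.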
Bar graph
• Bar graphs are good for showing how data change over time.
How to make bar graphs
• Consider the range of data and decide on a suitable scale
• Draw the axes using the scale and label the axes
• Draw the bars in proportion to the quantity being represented.
• Leave equal spaces between the bars; the bars should all be of the same thickness
• Shade the bars
• Add title and key. See diagram
Try to modify this diagram so that it has a title and a key; remember that those
are worth 2 marks in an examination.
Advantages of bar graphs
• Clear visual impression of the data
• show each data category in a frequency distribution
• display relative numbers or proportions of multiple categories
• summarize a large data set in visual form
• clarify trends better than do tables
• estimate key values at a glance
• permit a visual check of the accuracy and reasonableness of calculations
• be easily understood due to widespread use in business and the media.
• Comparative
• quantitative
Disadvantages of bar graphs
• Loss of information when rounding off
• May require additional explanation
• Can be easily manipulated to yield false impressions
• May fail to reveal key assumptions, causes, effects, or patterns
Types of bar graphs
• Simple: individual bars are used, where the length of each bar represents the
size of the figure being represented
• Component: the length of each component part represents the size of the
component being represented
• Divergent: used where there is a need to show two opposite or contrasting
sets of data
• Multiple: quantities belonging to a common source are represented by
bars adjoining each other.
Component bar graph
Multiple bar graph
The advantages and disadvantages are the same as those under simple bar
graphs.
Scatter Graphs
• Scatter graphs are used to investigate the relationship between two variables
(or aspects) for a set of paired data. The pattern of the scatter describes the
relationship as shown in the examples below. Best-fit or trend lines should:
• Follow the trend of the data
• Pass through or close to as many points as possible
• Leave roughly equal numbers of points on either side of the line.
Method of construction
• Draw the vertical and horizontal axes
• Label the axes
• Plot the points
• Draw a line of best fit
• Label the places on the regression line
• Insert a title
• Insert a key.
See diagram
The line of best fit
• This is a line which summarises the pattern of dots on scatter graphs
• The closer the points are to the line, the stronger the relationship between the
two variables
• Can be used to estimate other values not given in the data.
Disadvantages of scatter graphs
• They do not incorporate time.
• The eye is used to place the line of best fit and this can lead to errors
• When the line of best fit is drawn mathematically it is independent of
subjective judgements and such a line is called a regression line.
Method of drawing a regression line
• Calculate the averages for the data on each axis and mark them on the
graph
• Mark the point where the two averages meet
• Draw a line parallel to the y-axis to pass through this point.
• Calculate the mean of the points to the left of this line and the mean of the
points to the right of it
• Draw a line which passes through all three points.
• This line represents the regression line
Example:
• Price changes of a convenience item along an environmental gradient in El
Raval, Barcelona.
• The hypothesis tested is that prices should decrease with distance from the
key area of gentrification surrounding the Contemporary Art Museum.
• The line followed is Transect 2 in the map below, with continuous sampling
of the price of a small bottle of water at every convenience store.
Map to show the location of environmental gradients for transect lines in El Raval,
Barcelona
Distance along transect from      Price of a small bottle
Contemporary Art Museum           of water (euros)
1                                 1.80
2                                 1.20
3                                 2.00
4                                 1.00
5                                 1.00
6                                 1.20
7                                 0.80
8                                 0.60
9                                 1.00
10                                0.85
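The method of drawing a regression line described earlier can be applied to the transect data in this table. The Python sketch below follows those steps (overall mean point, then the mean of the points on each side of it); this approach is sometimes called the semi-average method and is not the same as a least-squares regression line. The variable and function names are assumptions made for illustration.

```python
# Data from the transect table: distance from the museum and price (euros).
distance = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
price = [1.80, 1.20, 2.00, 1.00, 1.00, 1.20, 0.80, 0.60, 1.00, 0.85]

def mean(values):
    return sum(values) / len(values)

# Step 1: the point where the two averages meet.
mean_x, mean_y = mean(distance), mean(price)

# Step 2: split the points at the vertical line through mean_x.
pairs = list(zip(distance, price))
left = [(x, y) for x, y in pairs if x < mean_x]
right = [(x, y) for x, y in pairs if x >= mean_x]

# Step 3: the mean point of each half.
left_pt = (mean([x for x, _ in left]), mean([y for _, y in left]))
right_pt = (mean([x for x, _ in right]), mean([y for _, y in right]))

# Step 4: the line through the two half-means; it also passes through
# the overall mean point, giving the three points named in the method.
slope = (right_pt[1] - left_pt[1]) / (right_pt[0] - left_pt[0])
intercept = left_pt[1] - slope * left_pt[0]

def predict(x):
    """Estimated price at distance x, read off the fitted line."""
    return intercept + slope * x

print(f"slope = {slope:.3f} euros per unit distance")
print(f"estimated price at distance 5.5 = {predict(5.5):.3f} euros")
```

A negative slope is consistent with the hypothesis that prices decrease with distance from the museum, although the scatter of individual points around the line should still be inspected.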
Choropleth Maps
• These are maps, where areas are shaded according to a prearranged key,
each shading or colour type representing a range of values.
• Population density information, expressed as 'per km²,' is appropriately
represented using a choropleth map.
• Choropleth maps are also appropriate for indicating differences in land use,
like the amount of recreational land or type of forest cover.
Method of construction of a choropleth map
• Calculate the range of data by subtracting the lowest value from the highest
value, as this will help you to work out the class interval.
• Decide on the method of progression to use: arithmetic for evenly
distributed data, or geometric for unevenly distributed data.
• Decide on the number of classes
• Decide on a systematic way of shading.
• The shading should highlight the highest values by using the most
intense shades
• Shade
• Insert a key and a title.
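The choice between arithmetic and geometric progression for the class intervals can be sketched as follows. The density figures are hypothetical, invented for illustration, and the two function names are assumptions; the sketch only works out class boundaries and does not do any shading.

```python
def arithmetic_breaks(lowest, highest, n_classes):
    """Class boundaries with a constant width (range / number of classes)."""
    width = (highest - lowest) / n_classes
    return [lowest + i * width for i in range(n_classes + 1)]

def geometric_breaks(lowest, highest, n_classes):
    """Class boundaries that grow by a constant ratio; needs lowest > 0."""
    if lowest <= 0:
        raise ValueError("geometric progression requires positive values")
    ratio = (highest / lowest) ** (1 / n_classes)
    return [lowest * ratio ** i for i in range(n_classes + 1)]

# Hypothetical population densities (people per km²), invented for illustration.
densities = [2, 5, 9, 20, 48, 110, 260, 512]
lowest, highest = min(densities), max(densities)

arithmetic = arithmetic_breaks(lowest, highest, 4)
geometric = geometric_breaks(lowest, highest, 4)

print("arithmetic:", [round(b, 1) for b in arithmetic])
print("geometric: ", [round(b, 1) for b in geometric])
```

With unevenly distributed data like these, equal-width classes would place most areas in the lowest class, whereas the geometric boundaries spread the areas more evenly across the classes.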
Example of a choropleth map
Advantages of choropleth maps
• Give a good visual impression of change over space.
• Can be used with other methods
• Fairly easy to construct as they do not require lengthy and difficult
calculations
• Can be used even when the range of values has extremes. In this case
uneven class intervals are used.
Disadvantages of Choropleth Maps
• They give a false impression of abrupt change at the boundaries of shaded
units.
• Choropleths are often not suitable for showing total values. Proportional
symbol overlays (included on the choropleth map above) are one solution to
this problem.
• It can be difficult to distinguish between different shades.
• Variations within map units are hidden, and for this reason smaller units are
better than large ones.
• Uniformity of shading within a class assumes that there is uniform
distribution which is not always the case.
• Where there is a wide range of data, classes can become large and
uneven, and the net effect is a higher level of generalisation.
• When used for a large area such as the world, the shading process becomes
time consuming.
Sample question
Isopleth maps
• Lines of equal value are drawn such that all values on one side are higher
than the "isoline" value and all values on the other side are lower, or
• Ranges of similar value are filled with similar colours or patterns.
• This type of map is ideal for showing gradual change over space and avoids
the abrupt changes which boundary lines produce on choropleth maps.
Temperature, for example, is a phenomenon that should be mapped using
isoplething, since temperature exists at every point (is continuous), yet does
not change abruptly at any point (like population density may do as you
cross into another census zone). Relief maps should always be in isopleth
form for this reason.
Method of construction
• Subtract the lowest value from the highest value, as this will help you to
work out an isoline interval.
• Decide on the isoline interval by looking at the values and the possible
multiples.
• Draw the isolines, starting from the lowest value and working up to the highest.
• Join the points with the same values and where the values for a particular
isoline are not indicated use interpolation.
Procedure for interpolation
• Measure the distance between two adjacent points and divide the
measurement by 2.
• Mark the centre with an X
• Do the same for all such points
• Join the marked points with a smooth line/curve breaking it to enable
labeling.
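The procedure above marks the midpoint, which places the isoline correctly when its value lies halfway between the two point values. A more general sketch, assuming the value changes evenly between the two points, is given below; the function name and the temperature figures are hypothetical.

```python
def isoline_crossing(value_a, value_b, isoline_value, distance_ab):
    """Distance from point A at which the isoline crosses the line A to B.

    Assumes the value changes evenly between the two points.
    """
    low, high = min(value_a, value_b), max(value_a, value_b)
    if value_a == value_b or not (low <= isoline_value <= high):
        raise ValueError("isoline value must lie between two different point values")
    fraction = (isoline_value - value_a) / (value_b - value_a)
    return fraction * distance_ab

# Hypothetical spot temperatures (°C) at two points 30 mm apart on the map.
halfway = isoline_crossing(14, 18, 16, 30)  # isoline value midway between the points
quarter = isoline_crossing(14, 18, 15, 30)  # isoline value nearer to point A

print(halfway, quarter)
```

For the 16 °C isoline the mark falls at 15 mm, the midpoint, exactly as in the procedure above; for the 15 °C isoline it falls at 7.5 mm from point A.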
Advantages of an isoline map
• Easy to construct once the points of equal value have been established.
• Isolines show the gradualness of change in a distribution
• Isolines allow one to interpolate thus allowing values which are not on the
map to be represented.
• They are also easy to interpret
• They give a clear visual representation of data by putting some order in the
way the data is changing.
• They deal with individual values and not averages.
• Any value can be found on the map due to interpolation
• Isolines can be combined with other methods. See diagram
Sample question
Disadvantages of an isoline map
• Interpolation is difficult and also makes the map inaccurate as it is based on
assumptions.
• It also assumes uniformity between isolines which might not always be the
case.
• The method cannot accommodate extreme values.
The dot map
• This is a map which uses dots to represent values, e.g. the distribution and
density of population
Considerations
• Dot value: low enough to avoid emptiness and high enough to avoid too many
dots and merging
• Dot size: must be uniform throughout and should conform to the scale of the
base map.
• Dot location: to depict the actual distribution, no dots should be placed in areas
such as mountains, lakes or swamps.
Method of construction
• Obtain a large base map and sub divide the map into smaller units
• Obtain the figures of the sub areas and a physical map of the areas as this
will help in the placement of dots.
• Consider the range of data as this will help in coming up with the dot value.
• Where dots give fractions these are rounded off.
• Calculate the number of dots for each area
• Trace the base map on plain paper noting areas such as mountains and
swamps so that you do not place the dots in areas which cannot be
inhabited.
• Place the dots on the map in pencil, avoiding the uninhabitable areas, so
that you can rub them out if the dots are placed in the wrong areas.
• Put a key and a title.
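The calculation of the number of dots for each area, including the rounding off of fractions, can be sketched as below. The district names, populations and the dot value of 1,000 are all hypothetical, chosen only to illustrate the arithmetic.

```python
def dots_needed(value, dot_value):
    """Number of dots for one area, rounded to the nearest whole dot."""
    return int(value / dot_value + 0.5)

# Hypothetical district populations, invented for illustration.
populations = {"North": 12400, "Central": 48750, "South": 3100}
dot_value = 1000  # assumed: one dot represents 1,000 people

dots = {area: dots_needed(pop, dot_value) for area, pop in populations.items()}

# What the dots represent once drawn, to show the effect of rounding.
represented = {area: n * dot_value for area, n in dots.items()}

for area in populations:
    print(area, dots[area], "dots, representing", represented[area])
```

Comparing `represented` with the original figures shows the information lost through rounding: working backwards from the dots does not recover the exact populations.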
Advantages of a dot map
• It produces a clear visual impression of the data
• Allows comparison to be made
• Easy to use
Disadvantages of a dot map
• Dots can merge
• Even distribution of dots suggests that the population is evenly distributed,
which is not always the case.
• Rounding off causes loss of information, so working backwards to the
original figures becomes difficult.
• It is difficult to place dots in areas with low population densities.
• It is difficult to copy the map for other uses.
Proportional circles
• This is a method which uses circles which are proportional to the quantity
they represent.
Method of construction of proportional circles
• Identify on a map the areas where the circles will be drawn
• Obtain information about the numbers of people who are in each of the
areas.
• Divide the numbers by a constant then find the square root of the figures so
as to determine the radius of the circle for each centre.
• If the numbers remain high, divide further by another constant; if they become
too small, multiply by a constant.
• Draw the proportional circles with the centres of the circles on the centres of
the towns
• Complete the map by inserting a key and a title.
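The radius calculation in the method above (divide by a constant, then take the square root) can be sketched as follows. The town names, populations and the constant are hypothetical, invented for illustration.

```python
import math

def circle_radius(value, constant):
    """Radius in map units: the square root of (value / constant)."""
    return math.sqrt(value / constant)

# Hypothetical town populations, invented for illustration.
towns = {"Town A": 10000, "Town B": 40000, "Town C": 90000}
constant = 10000  # assumed scaling constant; adjust until the circles fit the map

radii = {town: circle_radius(pop, constant) for town, pop in towns.items()}

for town, radius in radii.items():
    print(f"{town}: radius {radius:.1f}")
```

Taking the square root means that the area of each circle, not its radius, is proportional to the quantity: a town with four times the population gets a circle with twice the radius.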
Advantages
• Easy to estimate the original values by reading off the varying
circle sizes (it is quantitative)
• Produces a clear visual/pictorial impression of the data
• Shows the spread of data on the map
• Different circle sizes solve the problem of a very wide range of figures
• It allows comparison of places/cities/towns
• It is versatile and has a wide range of uses
Disadvantages
• Leaves blank areas which appear empty
• Rounding off of figures causes loss of information
• Circles merge, which results in loss of information and makes
interpretation difficult
• There is a need to refer constantly to the key, which is time consuming
• The circles disregard administrative boundaries by overlapping
• Shading obscures base map features
• A triangular graph is particularly useful in identifying changes over time, since a
position on the graph will change as the relative dominance of the components changes.
• The graph can be used to show contrasting service structures for 4 locations
in El Raval, an inner-city area of Barcelona which has been the subject of
radical urban reform.
• The choice of the three graph components is important and must be in the
context of the investigation. An example of data from one location (El Raval
Site 2) is shown in map 1 below, and this has been used along with data
from three other sites (1,3 and 4) to compile the triangular graph.
• It is unique - there is only one answer.
• Useful when comparing sets of data.
• Disadvantages
• Affected by extreme values (outliers)
MEDIAN
Use the median to describe the middle of a set of data that does have an outlier.
Equal numbers occur above and below it.
• Advantages:
• Extreme values (outliers) do not affect the median as strongly as they
do the mean.
• Useful when comparing sets of data.
• It is unique - there is only one answer.
Disadvantages:
• Not as popular as mean.
MODE
It shows the most common item in a distribution. It can be used when the data is
numeric or non-numeric, for example the most common shoe size in a
class.
• Advantages:
• Extreme values (outliers) do not affect the mode.
• Disadvantages:
• Not as popular as mean and median.
• Not necessarily unique - may be more than one answer
• When no values repeat in the data set, the mode is every value and
is useless.
• When there is more than one mode, it is difficult to interpret and/or
compare.
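The effect of an outlier on the mean, median and mode can be checked with Python's standard `statistics` module. The values below are hypothetical, invented for illustration, with 30 added as a deliberate outlier.

```python
import statistics

# Hypothetical values, invented for illustration; 30 is a deliberate outlier.
values = [5, 6, 6, 7, 8]
with_outlier = values + [30]

mean_before = statistics.mean(values)
mean_after = statistics.mean(with_outlier)
median_before = statistics.median(values)
median_after = statistics.median(with_outlier)

# multimode returns every most-common value, so ties are visible.
modes = statistics.multimode(with_outlier)
no_repeats = statistics.multimode([4, 5, 6])

print("mean:  ", mean_before, "->", round(mean_after, 2))
print("median:", median_before, "->", median_after)
print("modes: ", modes, "| no repeated values:", no_repeats)
```

The outlier pulls the mean up sharply while the median barely moves and the mode does not change. When no value repeats, every value is returned as a mode, which is why the mode is of little use for such data.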
Measures of dispersion
• The range is the difference between the highest and the lowest values. It is
affected by extreme values and fails to indicate the degree of clustering within
a distribution.
The quartile deviation
• Quartiles are numbers that divide a distribution into 4 parts.
• There are three quartiles, i.e. the lower quartile Q1, the middle quartile Q2
and the upper quartile Q3.
• The interquartile range is obtained by subtracting Q1 from Q3 (Q3 - Q1)
How to calculate the quartiles
• Arrange the values in ascending order.
• Find the median; it is equal to the middle quartile Q2.
• Find the middle point of the values below Q2; this is Q1.
• Find the middle point of the values above Q2; this is Q3.
• Subtract Q1 from Q3.
The significance of the quartiles
• The measure of dispersion that uses quartiles is called the quartile deviation
and is calculated using the formula (Q3 - Q1) / 2
• E.g. 4, 5, 6, 7, 8, 9, 10.
• Median Q2 is 7, Q1 is 5, Q3 is 9.
• Quartile deviation is (9 - 5) / 2 = 4 / 2 = 2.
• The answer obtained indicates how far the middle half of the items spreads on
either side of the median, e.g. 2 either side if the median is 7.
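The steps for calculating the quartiles and the quartile deviation can be sketched as below, using the worked example 4, 5, 6, 7, 8, 9, 10. The sketch follows the method in these notes (the middle of the values below and above Q2); other conventions for quartiles exist and can give slightly different answers. The function names are assumptions.

```python
def median(sorted_values):
    """Middle value of an already sorted list."""
    n = len(sorted_values)
    mid = n // 2
    if n % 2:
        return sorted_values[mid]
    return (sorted_values[mid - 1] + sorted_values[mid]) / 2

def quartiles(values):
    """Return (Q1, Q2, Q3) using the middle of the values below and above Q2."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    q2 = median(data)
    lower = data[:half]
    # With an odd number of values the median itself is left out of both halves.
    upper = data[half + 1:] if n % 2 else data[half:]
    return median(lower), q2, median(upper)

q1, q2, q3 = quartiles([4, 5, 6, 7, 8, 9, 10])
interquartile_range = q3 - q1
quartile_deviation = interquartile_range / 2

print(q1, q2, q3, interquartile_range, quartile_deviation)
```

This reproduces the figures in the example: Q1 = 5, Q2 = 7, Q3 = 9, an interquartile range of 4 and a quartile deviation of 2.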
Advantages of the quartile deviation
• It is not affected by extreme values.
• Can be calculated even with open ended classes.
• It gives a good description of the half of the population that occurs between
the lower and the upper quartiles.
Disadvantages of the quartile deviation
It gives no real indication about the degree of clustering.
The values derived cannot be used for further mathematical calculations.
Does not make use of all the values in the distribution.
Standard Deviation
• Standard deviation is a number that tells you approximately how far the
values in a data set deviate from the mean (the average).
• The larger the standard deviation, the more widely the values are spread around
the mean.
• The smaller the standard deviation, the more closely the values cluster around
the mean.
• If all of the values are equal, the standard deviation is equal to zero.
You should also look at the mean when you interpret the standard deviation.
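These properties can be checked with a short Python sketch. The three data sets are hypothetical, invented so that they share the same mean (5) but differ in spread; `statistics.pstdev` is the standard library function that treats the list as a whole population.

```python
import statistics

# Hypothetical data sets with the same mean (5) but different spread.
equal = [5, 5, 5, 5]
tight = [4, 5, 5, 6]
wide = [1, 3, 7, 9]

sd_equal = statistics.pstdev(equal)
sd_tight = statistics.pstdev(tight)
sd_wide = statistics.pstdev(wide)

print(f"equal: {sd_equal:.3f}  tight: {sd_tight:.3f}  wide: {sd_wide:.3f}")
```

All three sets have a mean of 5, so the mean alone cannot tell them apart; the standard deviation is zero when all values are equal and grows as the values move further from the mean.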
• There are 100 pirates on the ship. In statistical terms this means we have a
population of 100.
• If we know the number of gold coins each of the 100 pirates has, we use
the standard deviation equation for an entire population:
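The standard form of this equation is given below. The symbols are assumed, since the notes do not define them: N is the population size (here 100), x_i is the number of coins held by pirate i, and μ is the population mean.

```latex
\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(x_i - \mu\right)^2}
```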
• What if we don't know the number of gold coins each of the 100 pirates
has? For example, suppose we only had enough time to ask 5 pirates how many
gold coins they have. In statistical terms this means we have a sample size
of 5, and in this case we use the standard deviation equation for a sample
of a population:
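The standard form of this equation is given below. The symbols are assumed, since the notes do not define them: n is the sample size (here 5), x_i is the number of coins held by the i-th pirate asked, and x̄ is the sample mean. Dividing by n - 1 rather than n makes up for the fact that a small sample tends to underestimate the spread of the whole population.

```latex
s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2}
```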