Stat10010 1

STAT10010
Introductory Statistics
Dr. Patrick Murphy
School of Mathematical Sciences
Textbook
Seeing Through Statistics
by
Jessica Utts
OR
STATISTICS
by
UTTS AND HECKARD
DISCOUNT PRICE IN CAMPUS BOOKSHOP
Course Website available on
Monday
WWW.UCD.IE/mathsci
Click on classpages
Then Click on STAT10010
Other Requirements
New Cambridge Statistical
Tables
Calculator with Statistics Mode
IMPORTANT
MY PART OF THIS COURSE
DOES NOT USE
BLACKBOARD for NOTES
I couldn't find the notes on Blackboard
Assessment
60% Final Exam
10% Laboratory
20% In Class Tests
10% Participation in Lectures/Labs/Tutorials
Participation
Arriving Late
Leaving Early
Talking
Eating
Texting
Not Paying Attention
Sleeping
Whole Class Loses Marks
Individuals May Gain Marks
LECTURES 2 per week
Tutorials 1 per week
Tutorials Start in Week 4
Computer Labs Start in Week 4
What do you know
about statistics?
It's boring
There are three kinds of lies:
Lies
Damned Lies
and
Statistics
- Benjamin Disraeli
Simpsons episode:
Homer is questioned about his newly formed
vigilante group
Newscaster: Since your group started up, petty
crime is down 20%, but other crimes are up.
Such as heavy sack beating, which is up 800%. So
you're actually increasing crime.
Homer: You can make up statistics to prove
anything.
43% of people know that.
Misuse of Statistics
INTRO STATS
PART 1
Section A : DATA COLLECTION
1. Introduction and Terminology
2. Seven Critical Components of a Study
3. Questionnaire Design
4. Survey Design
5. Design of Experiments and Observational Studies.
Chapter 1:
Terminology
Statistics is the science of data. This involves collecting,
analysing and interpreting information.
STATISTICS is the Science of Variability.
Descriptive Statistics uses graphical and numerical
techniques to summarise and display the information
contained in a dataset.
Inferential Statistics uses sample data to make
decisions or predictions about a larger population of
data
Chapter 1
The Beginning
Sample Survey
Observational Study
Designed Experiment
DATA
More Definitions
Population: The entire collection of individuals or objects
about which information is desired.
Sample: A part (subset) of the population selected in some
prescribed manner.
Variable: A characteristic or property of an individual unit in
the population.
Representative Sample: A selection of data chosen from the
target population which exhibits characteristics typical of
the population.
Representative samples should give unbiased estimates
CHAPTER 2:
7 Critical Components: To believe or not to
believe the results of a study.
The importance of not always believing what
we read in the papers cannot be overstated.
This lesson applies not just to newspapers but
to academic research papers in journals also.
In fact it applies everywhere someone presents
us with a conclusion based on a study.
When we are presented with a study we should examine 7
components of the study:
1. The source of the research and of the funding.
2. The researchers who had contact with the participants.
3. The individuals or objects studied and how they were selected.
4. The exact nature of the measurements made or the questions
asked.
5. The setting in which the measurements were taken.
6. The extraneous differences in groups being compared.
7. The magnitude of any claimed effects or differences.
Research costs money: researchers need to be paid,
equipment has to be bought, subjects need to be found.
We should always be aware of this fact when we examine
the claims made by researchers.
We should look differently on research conducted by
Independent agencies e.g. CSO, Eurostat, ESRI?
Academics (who funds them?)
Companies trying to convince consumers to buy their
product instead of a competitor's.
Journalists
When a study is conducted, the responses of an individual
may depend on who asks the questions.
Question: How much money do you earn?
The response to a Revenue Commissioner will probably be
different from the response to a friend or a date whom you are trying to
impress.
Would you trust a study on the use of illegal drugs which was
carried out by Gardai knocking on people's doors?
3. The individuals or objects studied and how they were
selected.
THE INDIVIDUALS STUDIED
Can we apply the results of a study conducted on men to
women?
Does a study conducted on Americans apply to Irish
people?
HOW THEY WERE SELECTED
Many studies rely on volunteers. Is this wise?
Is there not a difference between the kind of people who
volunteer for a study and those who don't?
Consider what would happen if someone came up to you in
the street with a questionnaire.
4. The exact nature of the measurements made or the
questions asked.
You should be aware that the wording and the ordering of
questions can influence responses.
What is your opinion on the plight of refugees forced by war
in Syria to flee their homes and come to Ireland?
How do you feel about all those foreigners coming over here
taking our jobs?
5. The setting in which measurements were taken.
When and where was the study conducted?
Studies conducted at certain times of the day may exclude
certain elements of the target population.
At 3.00pm many employed people are at work, so a study
conducted on Grafton St. at that time will probably not be
representative.
Phone surveys conducted during the day will also probably
under-represent the views of workers.
How you would reply to certain questions posed in a police
interrogation room would probably differ from how you
would answer those same questions in a pub.
Capilano Suspension Bridge
6. The extraneous differences in groups being compared:
Confounding.
CONFOUNDING FACTORS
A study shows that exam scores among marijuana smokers
are lower than among non-smokers.
A conclusion is drawn that marijuana impairs exam
performance.
We should however consider that the type of person who
smokes dope may be the kind of unmotivated slacker who
doesn't do enough study for their exams irrespective of
whether they smoked or not.
Newspapers seldom say how large the effects of a
statistical study are.
Are the results STATISTICALLY SIGNIFICANT?
Is Statistically Significant the same as Meaningfully
Significant?
UK & IRISH General Elections.
Taking aspirin reduces heart attacks.
We really should be told that the reduction is from 17 per 1000 to 9.4
per 1000.
But we should also be told that this increases strokes and stomach
ulcers.
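As a rough illustration of why the size of an effect matters, the aspirin figures above can be expressed both as an absolute and as a relative reduction. This is only a minimal Python sketch: the variable names are illustrative, and it assumes that 17 per 1000 is the heart attack rate without aspirin and 9.4 per 1000 the rate with it.

```python
# Heart attack rates per 1000 people, taken from the figures quoted above.
# Assumption: 17 per 1000 is the rate without aspirin, 9.4 per 1000 with it.
rate_without_aspirin = 17.0
rate_with_aspirin = 9.4

# Absolute reduction: how many fewer heart attacks per 1000 people.
absolute_reduction = rate_without_aspirin - rate_with_aspirin

# Relative reduction: the same difference as a percentage of the original rate.
relative_reduction = 100 * absolute_reduction / rate_without_aspirin

print(f"Absolute reduction: {absolute_reduction:.1f} per 1000")
print(f"Relative reduction: {relative_reduction:.1f}%")
```

A headline quoting only the relative reduction and one quoting only the number of heart attacks avoided per 1000 people describe the same result, but they can leave very different impressions.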
Review: Seven Critical Components of a Study
3. The individuals or objects studied and how they were selected.
4. The exact nature of the measurements made or the questions
asked.
5. The setting in which the measurements were taken.
6. The extraneous differences in groups being compared.
Chapter 3
Questionnaire Design: How to ask a question
(plus some statistical terms).
We saw in the previous chapter that deciding
exactly what to measure and what questions to
ask is extremely important.
Remember the 4th component
4. The exact nature of the measurements made
or the questions asked.
In this chapter we will examine this component
in detail.
Section 3.1 Questions
A study was conducted in the US in 1974 where two
researchers showed college students a film of a car
accident.
After viewing the film the students were asked one of
two questions.
Group 1 was asked the question:
"About how fast were the cars going when they contacted
each other?"
The average of the responses for Group 1 was 31.8
Miles per Hour
Group 2 was asked the question:
"About how fast were the cars going when they collided
with each other?"
The average of the responses for Group 2 was 40.8
Miles per Hour
Both groups had seen exactly the same film. The only
difference was the use of the word "collided" instead of
"contacted".
Simply using the word "collided" increased people's
estimates of the speed of the accident by 9 mph or
28%.
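The 9 mph and 28% figures follow directly from the two group averages. A minimal Python sketch of the arithmetic (variable names are illustrative):

```python
# Mean speed estimates (mph) reported above for each wording of the question.
contacted_mph = 31.8
collided_mph = 40.8

# Absolute increase in mph.
difference = collided_mph - contacted_mph

# Increase relative to the "contacted" group, as a percentage.
percent_increase = 100 * difference / contacted_mph

print(f"Increase: {difference:.1f} mph ({percent_increase:.0f}%)")
```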
There are many problems associated with asking
questions. We will examine seven of them:
Deliberate Bias
Unintentional Bias
Desire to please
Asking the uninformed
Unnecessary complexity
Ordering of questions
Confidentiality and anonymity
Deliberate Bias
Sometimes when a survey is conducted, the questions
are worded in a leading manner to elicit a favourable
response.
Recall the questions on refugees in Ireland.
The responses to questions that begin "Do you agree
that..." should be treated with caution. This does not
invalidate such questions; just be careful to see if there
is deliberate bias.
Asked whether they felt New Improved Persil was
better at cleaning clothes than ordinary Persil, 90% of
people said yes.
Who wouldn't say yes to such a leading question?
Deliberate Bias
Do you agree with the continued destruction of trees
on this campus for the construction of new buildings?
Do you agree that during the construction of new
buildings to alleviate the overcrowding on Belfield
campus that it is okay to knock down a few trees?
Unintentional Bias
Besides the deliberate bias caused by leading questions,
sometimes the questions are worded badly unintentionally
and are misinterpreted by many respondents.
What was the most important date in your life?
People may respond differently to this question.
Some may interpret the word date as calendar date and may
reply for instance
- The day I was born
- The day I passed my final exams
Some may interpret the word date as dinner and a movie.
And some may think
that a shrivelled fruit is being referred to.
Desire to please
Many respondents like to please the questioner.
Recall the sketch from The Fast Show
Respondents do not like to admit to certain socially
undesirable habits
Estimates of the prevalence of cigarette smoking based
on surveys of individuals disagree with data from
cigarette sales.
In Dublin's fair city
Asking the uninformed
Nobody likes appearing ignorant when asked a
question.
On the day that Articles 2 & 3 of our constitution were changed,
TV3 sent a reporter out to Grafton Street to ask Dubliners:
Do you know what important thing happened today in the
North?
Most people replied yes.
But the reporter then asked the people:
OK, so what happened?
Many people got embarrassed and said that they didn't
know after all.
Unnecessary Complexity
Questions should be kept simple, otherwise people may
get confused.
Shouldn't students not be allowed to repeat their
exams if they fail at the first attempt?
This sentence actually contains a double negative. Is it
therefore equivalent to the question:
Should students be allowed to repeat their exams if
they fail at the first attempt?
Ordering of Questions.
If two questions are asked of a respondent but one question
causes the respondent to think about something they may not
have thought of otherwise then the order of the questions will
be important.
Example
Name the five most popular types of television
programme.
Do you watch hospital dramas on TV such as Grey's
Anatomy?
Confidentiality and Anonymity
Anonymity
Some questions may only be answered if the
respondent feels that they are anonymous.
Confidentiality
If a follow up study is necessary then respondents
cannot remain anonymous and so confidentiality of
responses must be ensured.
Questions on sexual behaviour and financial dealings
are usually only responded to if either Anonymity or
Confidentiality can be ensured.
Section 3.2 Choices
When asking a question should we present the
respondent with a choice of possible answers?
Should we ask open questions or closed questions?
Most opinion polls are conducted using closed
questions, i.e. the respondent is asked to choose
between a group of answers. This allows easy
compilation of the results of the survey compared
to an open question format.
Closed Questions.
We've already mentioned that opinion polls often use
the closed question format, in which the respondent is
presented with a choice of answers. This form of
question can often lead to some very strange results.
The textbook refers to a study conducted in 1987 in the
US to examine the difference between Open Questions
and Closed Questions. The study asked the following
Question:
What do you think is the most important problem
facing this country today?
Half of the sample were given this as an open question,
the top four responses were:
17% Unemployment
17% General Economic Problems
12% Threat of Nuclear War
10% Foreign Affairs
The other half of the sample were given this as a closed
Question to pick between the following choices:
The Energy Shortage
The Quality of Public Schools
Legalised Abortion
Pollution
If you prefer you may name a different problem as
most important.
The responses to this closed question were:
5.6% The Energy Shortage
32% The Quality of Public Schools
8.4% Legalised Abortion
14% Pollution
So even though the respondents were allowed to choose
an alternative to these 4 choices, 60% saw these four
as being the most important problems.
But in the open question format these problems were
only listed by 2.4% of respondents.
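To see the size of the gap, the closed-question percentages can be added up and set against the open-question figure. A minimal Python sketch using only the numbers quoted above (the dictionary layout and names are illustrative):

```python
# Closed-question percentages for the four listed problems (figures from above).
closed_responses = {
    "The Energy Shortage": 5.6,
    "The Quality of Public Schools": 32.0,
    "Legalised Abortion": 8.4,
    "Pollution": 14.0,
}
# Share of open-question respondents who named any of these four problems.
open_share = 2.4

closed_share = sum(closed_responses.values())
ratio = closed_share / open_share

print(f"Closed format: {closed_share:.1f}% chose one of the four listed problems")
print(f"Open format:   {open_share:.1f}% named any of them")
print(f"Roughly {ratio:.0f} times as many in the closed format")
```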
Something is wrong here!
WHAT IS HAPPENING?
Open Questions.
We mentioned one problem with the open question
format, that it is hard to compile results from possibly
thousands of different responses.
There is another major problem with the open question
format. This was highlighted in the same 1987 study
referred to earlier.
A group of respondents were asked to name one or two
of the most important national or world events or
changes during the past 50 years
Half of this sample were given this as an open question,
the top responses were:
14.1% World War II
6.9% Space Exploration
4.6% JFK Assassination
10.1% Vietnam War
10.6% Don't know
53.7% All Other Responses
These responses were then given as a closed Question
together with another choice, "The invention of the
computer". This had been mentioned by only 1.4% of
respondents in the Open Question format.
The responses to this closed question were:
22.9% World War II
15.8% Space Exploration
11.6% JFK Assassination
14.1% Vietnam War
29.9% The Invention of the Computer
0.3% Don't know
5.4% All Other Responses
The problem here was the wording of the question:
people concentrated on the word "events" rather than
"changes". When it was shown to them they realised
that the invention of the computer was indeed one of
the most important changes during the past 50 years.
To summarise:
Perhaps the best way to ask a question is to conduct a
small trial Open Question format survey. Then use the
responses from this trial survey as the choices in a
Closed Question survey together with any other answers
that may not immediately spring to mind.
Section 3.3 Defining what's being measured
Before we use the results of a survey we should be fully
aware of what was actually measured by the survey.
Unemployment in Ireland.
Live Register 166,142
QNHS 86,700
GROWTH RATES
Q3 2002
GDP 6.9%
GNP -0.3%
Section 3.4
Some Statistical Terms
Measurement/Numerical Data: Data we measure in the
form of numbers.
Examples:
Percentage you will get on the summer exam for this
course.
Number of lectures that you will skip.
Frequency of radio station you listen to when studying.
Categorical Data: Data which can be placed in a category.
We cannot add or subtract this kind of data.
Examples:
Grade you will receive on the summer exam for this
course.
Name of radio station you listen to.
Brand of shoes you are wearing.
Numerical/Measurement data is further distinguished as to
whether it is Discrete or Continuous.
Discrete variables take only isolated whole number values
(integers) on the number line.
Example: Number of Nike runners in this class.
Continuous variables have values comprising entire intervals
of the number line. Decimals and Fractions are allowed.
Example: The duration of this class. Remember, seconds
are not the smallest unit of time measurement. This class
could possibly last 50.123456789 minutes.
Validity
A valid measurement is one which actually
measures what it claims to measure.
Example: Unemployment figures are validly
measured using the Labour Force Survey not the
Live Register
Reliability
A reliable measurement is one which will give
approximately the same result time after time, when
taken on the same individual or object.
Example: Most physical measurements are reliable,
for example measuring your weight using a
bathroom scales.
Some measurements may be reliable but not
necessarily valid.
Are exams reliable measuring devices?
Are exam results valid measurements of intelligence?
Bias
Sometimes when measurements are made a systematic
error is made which underestimates or overestimates
the true value. Such a measurement is called a biased
measurement.
Example: Suppose your bathroom scales always
overestimated your weight.
Example: Car Speedometers are deliberately biased to
overestimate a car's real speed.
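The difference between a biased measurement and an unreliable one can be illustrated with a small simulation of two bathroom scales. This is a sketch under stated assumptions: the true weight, the 2 kg bias and the noise levels are all invented for illustration, and `weigh` is a hypothetical helper, not part of any library.

```python
import random
import statistics

def weigh(true_weight, bias, noise_sd, times, rng):
    """Simulate repeated readings: true weight + systematic bias + random noise."""
    return [true_weight + bias + rng.gauss(0, noise_sd) for _ in range(times)]

rng = random.Random(1)   # fixed seed so the sketch is repeatable
true_weight = 70.0       # hypothetical true weight in kg

# Scale A: reliable (little noise) but biased (reads about 2 kg heavy).
scale_a = weigh(true_weight, bias=2.0, noise_sd=0.2, times=1000, rng=rng)
# Scale B: unbiased on average but unreliable (readings jump around).
scale_b = weigh(true_weight, bias=0.0, noise_sd=5.0, times=1000, rng=rng)

for name, readings in [("A (biased, reliable)", scale_a),
                       ("B (unbiased, unreliable)", scale_b)]:
    mean_error = statistics.mean(readings) - true_weight
    spread = statistics.stdev(readings)
    print(f"Scale {name}: mean error {mean_error:+.2f} kg, spread {spread:.2f} kg")
```

Scale A gives nearly the same answer every time but is systematically wrong, while Scale B is right on average but any single reading may be far off.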
Variability
If we try to measure a certain characteristic for
many different objects or people we will most likely
not get the same answer each time. The fact that the
observations vary is referred to as the variability in
the dataset.
Some datasets are more variable than others:
Example: A dataset consisting of the ages of 100
students in UCD will be less variable than a dataset
consisting of the ages of 100 randomly chosen Irish
people.
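The ages example can be sketched in Python with made-up data. The age ranges below are assumptions chosen only for illustration, and the standard deviation is used here as one common way of putting a number on variability.

```python
import random
import statistics

rng = random.Random(2)  # fixed seed so the sketch is repeatable

# Hypothetical data: 100 student ages between 18 and 23, and 100 ages
# drawn evenly from 0 to 90 to stand in for the general population.
student_ages = [rng.randint(18, 23) for _ in range(100)]
population_ages = [rng.randint(0, 90) for _ in range(100)]

student_sd = statistics.stdev(student_ages)
population_sd = statistics.stdev(population_ages)

print(f"Students:   ages {min(student_ages)} to {max(student_ages)}, "
      f"standard deviation {student_sd:.1f}")
print(f"Population: ages {min(population_ages)} to {max(population_ages)}, "
      f"standard deviation {population_sd:.1f}")
```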
Homework
Design two surveys to look at some of the
concepts in this chapter.
Choose a random sample of 20 people and divide
the sample into two groups of 10.
1. Examine bias caused by changing words in one
question.
2. Examine the effects of using Open Questions vs
Closed Questions
The Topics of the questions are up to you!
Chapter 4
How to get the Data Part 1:
Survey Design
In the first 3 Chapters of this course we spoke at
length about what care we should take in conducting
a study ourselves or in interpreting the results of
someone else's study.
What we didn't mention was how to actually conduct
the study. That will be the topic of today's lecture.
We saw that studies can take three forms:
Experiments
Observational Studies
Sample Surveys
We saw earlier that experiments are possibly the best
way to conduct studies as the researcher usually has
complete control over the elements of the study. And
experiments allow a determination of cause and effect.
Since this is a human sciences/business course and not
a Biology or Chemistry course we will restrict ourselves
to experiments involving humans.
No, not what you may think: experiments these days
rely on volunteer subjects.
Experiments
The Experimental procedure involves manipulating
something called the Explanatory Variable and seeing
the effect on something called the Outcome Variable
Example: In an experiment to test the effect of a new
diet.
The Explanatory Variable would be?
The Outcome Variable would be?
The experiment has to be designed to eliminate any
extraneous effects and to determine only the results of
the explanatory variable on the outcome variable.
The way that this is accomplished is that participants
are randomly assigned to one of two groups:
One group receives the treatment; the other receives a
placebo, i.e. no treatment at all.
This random assignment to a treatment group or control
group is the way most clinical trials are conducted
today.
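Random assignment to a treatment group and a control group can be sketched in a few lines of Python. This is a minimal illustration, not a description of how any particular trial is run; `assign_groups` and the participant labels are hypothetical.

```python
import random

def assign_groups(participants, seed=None):
    """Randomly split participants into a treatment group and a control group."""
    rng = random.Random(seed)
    shuffled = list(participants)   # copy so the caller's list is not reordered
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

# Hypothetical participant labels P1..P20.
participants = [f"P{i}" for i in range(1, 21)]
treatment, control = assign_groups(participants, seed=3)

print("Treatment:", sorted(treatment))
print("Control:  ", sorted(control))
```

Because chance alone decides who goes where, any extraneous differences between participants should be spread roughly evenly across the two groups.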
Observational Studies
This form of study is similar to an experiment except
that the treatment occurs naturally and is not imposed
on the subjects.
It is much harder to establish a cause and effect
relationship using an observational study than using an
experiment because we cannot create control and
treatment groups to eliminate confounding effects.
One attempt to isolate the explanatory variable is to
conduct what is called a case control study.
We will examine this type of study in detail later.
We will concentrate for the rest of this chapter on
sample surveys.
First some definitions:
A Unit is a single individual or object to be measured.
A Population is the entire collection of Units about
which we would like information.
A Sample is the collection of Units we actually
measure.
A Sampling Frame is a list of Units from which the
sample is chosen. Ideally the sampling frame
includes the whole Population.
Sample Surveys
In a Sample Survey measurements are taken on a
sample chosen from the Population
In a Census the entire Population is surveyed.
Resources are needed to conduct a Census
CSO Spends about 80 million to conduct the 5 year
Census of Population
Sometimes the measuring process destroys the thing
being measured, e.g. if we were to test the
strength of a weld or in testing an individual's blood:
who among us would be willing to donate all of our
blood in a test?
Because of the work involved in a Census it is much
faster to conduct a survey. Sometimes it is important to
have results fast.
Why is Sampling used?
There are accuracy advantages to be had in conducting
a sample survey:
It is easier to get complete coverage of a sample
than of a population.
Easier to train a small number of interviewers for a
sample survey than to train a large number for a
census.
OK, but a sample is still a sample and is bound to be
inaccurate by its very nature, isn't it?
British General Election
Accuracy in Surveys
We mentioned before that if a sample was chosen to be
representative of the target population then it could be
very accurate. How accurate?
For surveys conducted to measure a sample proportion
as an estimate for a population proportion we can define
a Margin of Error.
The sample proportion will differ from the population
proportion by more than the margin of error less than
5% of the time.
The Margin of Error for a sample of size n is 1/√n
Accuracy in Sampling
For example, with a sample of 1600 the margin of error
is 1/√1600 = 1/40 or 2.5%
So a survey conducted using a sample of size 1600 will
be accurate to within 2.5% more than 95% of the time.
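The worked example (a sample of 1600 giving 1/40) corresponds to a margin of error of one over the square root of the sample size. The Python sketch below computes it and then checks the "95% of the time" claim by simulation. The simulated population, in which exactly half the people hold a given opinion, is an assumption made purely for illustration.

```python
import math
import random

def margin_of_error(n):
    """Conservative margin of error for a sample proportion: 1 / sqrt(n)."""
    return 1 / math.sqrt(n)

print(f"n = 1600: margin of error = {margin_of_error(1600):.3f}")  # 0.025, i.e. 2.5%

# Simulation (assumed setup): a population in which exactly half hold an opinion.
rng = random.Random(4)
true_proportion = 0.5
n, polls = 1600, 2000
within = 0
for _ in range(polls):
    # Each simulated respondent says "yes" with probability true_proportion.
    yes = sum(rng.random() < true_proportion for _ in range(n))
    if abs(yes / n - true_proportion) <= margin_of_error(n):
        within += 1

coverage = within / polls
print(f"{coverage:.1%} of simulated polls landed within the margin of error")
```

Note that quadrupling the sample size only halves the margin of error, which is why very large samples bring diminishing returns.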
Accuracy in Sampling
We saw already that in order for the sampling procedure
to work properly the sample must be representative of
the target population
There are several ways to get a representative sample:
Simple Random Sampling
Stratified Random Sampling
Cluster Sampling
Quota Sampling
Systematic Sampling
Random Digit Dialing
Multi Stage Sampling
How to choose a Sample
The simplest form of sampling procedure.
Each group of individuals of a given size has the same chance of
getting chosen.
Therefore each individual has the same chance of being
chosen.
Use Random Number Tables or a random number
generator.
Or put names in a hat or roll a die.
Simple Random Sampling
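A random number generator can do the job of the hat. The following is a minimal Python sketch using the standard library's `random.sample`, which draws without replacement; the sampling frame of student numbers is hypothetical.

```python
import random

def simple_random_sample(population, k, seed=None):
    """Draw k distinct units so that every subset of size k is equally likely."""
    rng = random.Random(seed)
    return rng.sample(population, k)

# Hypothetical sampling frame: student numbers 1..500.
frame = list(range(1, 501))
sample = simple_random_sample(frame, 20, seed=5)
print(sorted(sample))
```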
Polling companies don't have a list of all adults
from which to select randomly.
Instead they use other methods like Stratified Random
Sampling
We first divide the population into different strata, then
sample randomly within those strata.
Example: To conduct an opinion poll we might divide
the population into different age groups or sexes or by
County of residence.
Advantages of this method are:
We can get individual estimates for each stratum
We can use different interviewers for each stratum and
train them appropriately
If strata are different geographic regions it may be
cheaper to sample them separately.
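Stratified random sampling can be sketched as "group first, then sample randomly within each group". In the Python sketch below the population, the age groups and the helper `stratified_sample` are all hypothetical, and taking the same number from each stratum is just one possible choice.

```python
import random
from collections import defaultdict

def stratified_sample(units, stratum_of, per_stratum, seed=None):
    """Group units by stratum, then draw a simple random sample within each."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)
    return {name: rng.sample(members, per_stratum)
            for name, members in strata.items()}

# Hypothetical population of (person id, age group) pairs.
age_groups = ["18-34", "35-54", "55+"]
population = [(i, age_groups[i % 3]) for i in range(300)]

sample = stratified_sample(population, stratum_of=lambda p: p[1],
                           per_stratum=10, seed=6)
for group, members in sorted(sample.items()):
    print(group, [person_id for person_id, _ in members])
```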
Divide the Population into similar groups or clusters.
Then choose a random sample of clusters.
The analysis of this type of survey is more complicated
than for simple random sampling.
NOTE: This is not the same as Stratified Sampling. In
Cluster Sampling the Clusters are chosen so that they
resemble each other as much as possible.
Cluster Sampling
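In contrast to stratified sampling, cluster sampling picks whole groups at random. The Python sketch below assumes that every unit in a chosen cluster is then surveyed, which is one common variant; the tutorial groups and the helper `cluster_sample` are hypothetical.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Pick whole clusters at random and return every unit in the chosen clusters."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: clusters[name] for name in chosen}

# Hypothetical clusters: 10 tutorial groups of 8 students each.
clusters = {f"Tutorial {g}": [f"S{g}-{i}" for i in range(1, 9)]
            for g in range(1, 11)}

chosen = cluster_sample(clusters, n_clusters=3, seed=7)
for name, students in chosen.items():
    print(name, students)
```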
This is where a plan is used to choose the participants in
the study.
For Example: We might decide to survey every 3rd
person we meet. Or to choose every 5th House to be
surveyed.
Sometimes this procedure can be very biased.
What happens if every 5th house is an end house?
Systematic Sampling
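The end-house problem can be reproduced with a small Python sketch. The street layout is invented for illustration: houses are assumed to come in terraces of 5, with the 5th house in each terrace marked as an end house, and `systematic_sample` is a hypothetical helper.

```python
import random

def systematic_sample(units, step, start=None, seed=None):
    """Take every step-th unit, beginning at a random position within the first step."""
    if start is None:
        start = random.Random(seed).randrange(step)
    return units[start::step]

# Hypothetical street: terraces of 5 houses, where the 5th house in each
# terrace is marked as an end house.
street = [{"number": n, "end_house": n % 5 == 0} for n in range(1, 51)]

# Starting at the 5th house and taking every 5th house selects only end houses.
sample = systematic_sample(street, step=5, start=4)
print([house["number"] for house in sample])
print("All end houses:", all(house["end_house"] for house in sample))
```

When the sampling interval lines up with a pattern in the list, as it does here, the sample can consist entirely of one unusual kind of unit.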
Quota Sampling
We know the demographic characteristics of the
population of interest and we ensure that our sample
contains the same distribution of demographic traits as
the population.
Over-sampling will be required as we will be forced to
exclude respondents once each quota has been achieved.
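The need to turn people away once a quota is full can be seen in a short Python sketch. The stream of passers-by, the quotas and the helper `quota_sample` are hypothetical and serve only to illustrate the idea.

```python
def quota_sample(respondents, group_of, quotas):
    """Accept respondents in arrival order until each group's quota is filled."""
    remaining = dict(quotas)
    accepted, turned_away = [], 0
    for person in respondents:
        group = group_of(person)
        if remaining.get(group, 0) > 0:
            accepted.append(person)
            remaining[group] -= 1
        else:
            turned_away += 1  # quota already met, so this respondent is excluded
        if not any(remaining.values()):
            break             # every quota has been filled
    return accepted, turned_away

# Hypothetical stream of passers-by as (id, sex) pairs, with more women than men.
passers_by = [(i, "F" if i % 3 else "M") for i in range(1, 61)]
quotas = {"F": 5, "M": 5}

accepted, turned_away = quota_sample(passers_by, group_of=lambda p: p[1],
                                     quotas=quotas)
print("Accepted:", accepted)
print("Turned away:", turned_away)
```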
Used very much in the US
Phone numbers in certain area codes are dialled
randomly by computer, then when someone answers an
interviewer asks questions
Random Digit Dialling
Multi-Stage Sampling
Used for large surveys
Involves using a combination of the methods described
above.
Here are 5 ways to make a mess of the sampling
procedure:
Use the wrong sampling frame
Fail to reach the individuals selected
Get no response
Get a sample of volunteers
Use a convenient or haphazard sample
The last 2 of these are disastrous
What can go wrong in Sampling
1936 US election
Literary Digest had been extremely successful in
its predictions.
In 1936 it predicted a 3-2 victory for the Republican Landon over the Democrat
FDR.
George Gallup's American Institute of Public Opinion
predicted FDR's win correctly and also predicted what Literary
Digest would predict.
Literary Digest 10 million
Gallup 50,000
Literary Digest sampling frame: magazine subscribers, phone directories, car
owners (skewed towards the wealthy).
Most serious though: 23% volunteer response.
What went wrong in Sampling
Chapter 5
Design of Experiments and Observational Studies
The Experimental procedure involves manipulating
something called the Explanatory Variable and seeing
the effect on something called the Outcome Variable
or the Response Variable.
Many times it is hard to establish a clear causal
connection between one variable and another. It may be
that a third variable is causing both.
There is an established correlation between the number
of fillings in children's teeth and their vocabulary.
Does this mean that eating Mars bars increases your
vocabulary??
There may be a third variable causing both of the
variables we are studying.
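A small simulation shows how a third variable can produce a correlation between two variables that have no effect on each other. The slides do not name the third variable; the Python sketch below assumes it is the child's age, and all of the numbers in it are invented for illustration.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

rng = random.Random(8)  # fixed seed so the sketch is repeatable

# Assumed mechanism: age drives both variables; neither affects the other.
children = []
for _ in range(2000):
    age = rng.randint(5, 12)
    fillings = max(0, round(0.5 * age + rng.gauss(0, 1)))
    vocabulary = 1000 * age + rng.gauss(0, 1500)
    children.append((age, fillings, vocabulary))

overall = pearson([c[1] for c in children], [c[2] for c in children])
eight_year_olds = [c for c in children if c[0] == 8]
within_age = pearson([c[1] for c in eight_year_olds],
                     [c[2] for c in eight_year_olds])

print(f"All children: correlation {overall:.2f}")
print(f"8-year-olds only: correlation {within_age:.2f}")
```

Across all children the two variables are clearly correlated, but among children of the same age the correlation should largely disappear, because age was the only thing linking them.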
Ideally we want to create changes in the explanatory
variable and then examine the effects on the response
variable. This we can only really do in an Experiment.
In an Observational Study we cannot create changes in
the explanatory variable. Instead we observe differences
in the explanatory variable and try to see if these are
related to changes in the response variable.
So in an Experiment we have an element of control that
we do not have in an Observational Study.
Why don't we just do Experiments then?
Well, it may be unethical to perform certain experiments.
E.g. to measure the effect of smoking during pregnancy on a
child, it would be unethical to make a random selection of
mothers smoke. We could however observe the effects on
the children of mothers who already are smoking during
pregnancy.
It may be that we cannot randomly assign some
explanatory variables, like baldness or handedness.
Some Definitions
A Treatment is one or more explanatory variables
assigned by the experimenter.
A Confounding Variable is one whose effect on the
response variable cannot be separated from the effect of
the explanatory variable.
An Interaction occurs when the effect of one explanatory
variable depends on what's happening with another
explanatory variable.
An Experimental Unit is the smallest object to which we
can assign different treatments in an experiment.
Some More Definitions
An Observational Unit is the smallest object which we can
observe in an observational study.
When the Units are people they are called Participants or
Subjects.
Usually these participants are Volunteers.
How to design an Experiment
Randomisation is the most important element of any
experiment. We will discuss different types of
randomisation in a little while.
A Control Group which is treated identically in all respects
to the group receiving the treatment except that the
members of the control group do not receive the
treatment.
Placebos: There is a proven phenomenon called the
placebo effect. Patients receiving Placebo tablets which
have no active drug ingredient (e.g. a sugar tablet) may
experience a certain beneficial effect.
The way to eliminate this Placebo Effect from the
experiment is to give Placebo tablets to the control group
when administering a tablet to the treatment group.
Blinding: It is not just in receiving tablets that the power
of suggestion plays an important role. It is usually best
therefore if the subject does not know whether they are
receiving the treatment or not. This practice is called
Blinding.
Sometimes it is also best if the experimenter does not
know which subject is receiving the treatment and which
is not. This will remove any potential bias in the way the
experimenter reports their findings.
Experiments in which neither the subject nor the
experimenter knows who receives the treatment
are called Double Blind.
Experiments in which either the subject or the
experimenter (but not both) does not know who receives
the treatment are called Single Blind.
Experimental design
The design of an experiment is very important.
Experiments are designed with the purpose of isolating
the effect of the treatment on the response variable and
removing any confounding effects.
One way that we have seen already of removing the
effect of any confounding variables is to randomly assign
subjects to the treatment or control group. This way any
possible bias in the population should be evenly spread
among the treatment and control groups.
Sometimes instead of relying on randomisation to make
the groups as even as possible we actually force the
groups to be similar.
Matched Pair designs: These are experimental designs in
which either the same individual or two matched
individuals are assigned to receive the treatment and the
control. In the case where an individual receives both the
treatment and the control, the order in which this
happens should be random. And the experiment should
be conducted as a Double Blind experiment.
Block Designs: This is an extension of the Matched Pair
design to the case of three or more treatments (one may
be the control).
If there are 4 treatments and a control then there will be
5 blocks each one designed to be as similar as possible. 4
of the blocks will each receive one of the treatments and
one block will be a control.
Problems with Experiments
Confounding variables -
These are variables connected to the explanatory variable
which may be the actual cause of the effect on the
response variable.
Cured by Randomisation
Storks
Interacting variables
A second variable which interacts with the explanatory
variable.
Smoking/Alcohol/Exercise
Placebo effect
Hawthorne effect
Participants in an experiment respond differently just
because they are in an experiment.
Problems with Experiments
Experimenter effect
The experimenter can influence the results of the
experiment.
They may record data incorrectly.
Or inadvertently let the subjects know the desired outcome.
RATS
Ecological validity and generalisability
Results obtained in a closed experimental setting may not
be applicable in the real world
Observational studies are not as good as Experiments at
establishing causal connections.
However since no special setting is required they usually
do not suffer from the problem of Ecological Validity.
In addition the Hawthorne and Experimenter effects are
not a problem.
Observational studies are classified as either
Retrospective or Prospective Studies
In Retrospective studies the participants are asked to
recall certain past events.
In Prospective studies the participants are followed by
the researcher into the future and events are recorded.
One particular type of observational study has become
very popular
- The Case Control Study
Case Control Studies
To try and examine the possibility of a relationship
between an explanatory variable and a response variable
the researcher selects a group of participants called
CASES in which the response variable is already present.
For example in a study to determine if there is a
relationship between baldness and heart attacks, a group
of patients in hospital for heart attacks are chosen as the
Cases.
A group who have not had heart attacks are chosen as
the Controls.
Case Control Studies
The Control Group should be chosen to be as similar as possible in all
respects to the case group, except for the response
variable. Why?
A Case Control study is much more efficient than some
other forms of study. Consider first choosing two groups
according to whether the explanatory variable was
present or not then waiting until the response variable
revealed itself. This may take a long time.
Good at removing confounding variables if controls are
chosen appropriately.
Problems with Observational Studies
Confounding Variables
Cases and Controls not representative of the population
Recollections of the Past may not be accurate.
