
Validity and Reliability in Research
Agenda
At the end of this lesson, you should be able to:

1 Discuss validity

2 Discuss reliability

3 Discuss how to achieve validity and reliability

4 Discuss validity in qualitative research

5 Discuss validity in experimental design


Reliability
The consistency of scores or answers from one
administration of an instrument to another, or from
one set of items to another.
A reliable instrument yields similar results if given
to a similar population at different times.
Validity
Appropriateness, meaningfulness, correctness,
and usefulness of inferences a researcher
makes.
Validity of what?
The instrument?
The data?
Validity
Internal validity is the extent to which research
findings are free from bias and the effects of
extraneous factors.
External validity is the extent to which the findings
can be generalized.
Validity - Content-related evidence
Content-related evidence of validity focuses on the
content and format of an instrument.
Is it appropriate?
Comprehensive?
Is it logical?
How well do the items or questions represent the
content? Is the format appropriate?
Validity - Criterion-related evidence
This refers to the relationship between the scores
obtained using the instrument and the scores
obtained using one or more other instruments or
measures. For example, are students' scores on
teacher-made tests consistent with their scores on
standardized tests in the same subject areas?
Validity - Construct-related evidence
Construct validity is defined as establishing correct
operational measures for the concepts being
studied (Yin, 1984).
For example, if one is looking at problem solving in
leaders, how well does a particular instrument
explain the relationship between being able to
solve problems and effectiveness as a leader?
ATTAINING VALIDITY AND RELIABILITY
Elements of content-related evidence
Adequacy: the size and scope of the questions
must be large enough to cover the topic.
Format of the instrument: Clarity of printing, type
size, adequacy of work area, appropriateness of
language, clarity of directions, etc.
How to achieve content validity
Consult other experts who rate the items.
Rate items, eliminating or changing those that do
not meet the specified content.
Repeat until all raters agree on the questions and
answers.
Criterion-related validity
To obtain criterion-related validity, researchers
identify a characteristic, assess it using one
instrument (e.g., IQ test) and compare the score
with performance on an external measure, such as
GPA or an achievement test.
A validity coefficient is obtained by correlating a set
of scores on one test (a predictor) with a set of
scores on another (the criterion).
The degree to which the predictor and the criterion
relate is the validity coefficient. A predictor that
has a strong relationship to a criterion test would
have a high coefficient.
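To make the arithmetic concrete, here is a minimal Python sketch (not from the original slides) of a validity coefficient computed as the Pearson correlation between predictor and criterion scores; all score values are hypothetical.

    # Minimal sketch: the validity coefficient as the Pearson correlation
    # between predictor scores (e.g., a teacher-made test) and criterion
    # scores (e.g., a standardized test). All scores below are invented.
    from statistics import mean, stdev

    predictor = [72, 85, 90, 66, 78, 95, 81, 70]   # hypothetical predictor scores
    criterion = [70, 88, 92, 60, 75, 97, 80, 73]   # hypothetical criterion scores

    n = len(predictor)
    mx, my = mean(predictor), mean(criterion)
    covariance = sum((x - mx) * (y - my) for x, y in zip(predictor, criterion)) / (n - 1)
    validity_coefficient = covariance / (stdev(predictor) * stdev(criterion))
    print(f"Validity coefficient r = {validity_coefficient:.2f}")

A coefficient near 1 indicates that the predictor relates strongly to the criterion; a coefficient near 0 indicates little relationship.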
Construct-related validity
This type of validity is more typically associated
with research studies than testing.
It relates to psychological traits, so multiple
sources are used to collect evidence. Often a
combination of observation, surveys, focus groups,
and other measures is used to identify how much
of the trait being measured the person observed
possesses.

Example construct: Proactive Coping Skills
Reliability
The consistency of scores
obtained from one instrument to
another, or from the same
instrument over different
groups.
Errors of measurement
Every test or instrument has errors of
measurement associated with it.
These can be due to a number of things: testing
conditions, student health or motivation, test
anxiety, etc.
Instrument/test developers work hard to try to
ensure that their errors are not grounded in flaws
with the instrument/test itself.
Reliability Methods
Test-retest: Same test to same group
Equivalent-forms: A different form of the same
instrument is given to the same group of
individuals
Internal consistency: Split-half procedure
Kuder-Richardson: Mathematically computes
reliability from the # of items, the mean, and the
standard deviation of the test.
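As an illustration of the last method, here is a minimal Python sketch assuming the KR-21 variant of the Kuder-Richardson formula, which uses exactly those quantities (number of items, mean, and variance of the total scores); the data are hypothetical.

    # Minimal sketch of Kuder-Richardson formula 21 (KR-21), which estimates
    # reliability from the number of items, the mean, and the variance
    # (standard deviation squared) of total test scores. Data are invented.
    from statistics import mean, pvariance

    total_scores = [8, 17, 12, 19, 15, 6, 16, 18, 10, 14]  # hypothetical totals on a 20-item test
    k = 20                                                  # number of items on the test

    m = mean(total_scores)
    var = pvariance(total_scores)                           # variance of the total scores
    kr21 = (k / (k - 1)) * (1 - (m * (k - m)) / (k * var))
    print(f"KR-21 reliability estimate = {kr21:.2f}")       # roughly 0.78 for these scores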
Reliability coefficient
Reliability coefficient: a number that tells us how
consistent an instrument is likely to be over
repeated administrations.
Alpha, or Cronbach's alpha,
is used on instruments where answers aren't scored right
or wrong. It is often used to test the reliability of
survey instruments.
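As a sketch (not from the slides), here is a minimal Python computation of Cronbach's alpha for a small set of invented Likert-style responses, where rows are respondents and columns are survey items.

    # Minimal sketch of Cronbach's alpha for items that are not scored
    # right or wrong (e.g., 1-5 Likert ratings). Responses are invented.
    from statistics import pvariance

    responses = [
        [4, 5, 4, 4],
        [3, 3, 2, 3],
        [5, 5, 5, 4],
        [2, 2, 3, 2],
        [4, 4, 4, 5],
    ]
    k = len(responses[0])                                        # number of items
    item_variances = [pvariance(item) for item in zip(*responses)]
    total_variance = pvariance([sum(row) for row in responses])  # variance of total scores
    alpha = (k / (k - 1)) * (1 - sum(item_variances) / total_variance)
    print(f"Cronbach's alpha = {alpha:.2f}")                     # about 0.94 for these responses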
INTERNAL VALIDITY
Validity
Validity can be used in three ways.
instrument or measurement validity
external or generalization validity
internal validity, which means that what
a researcher observes between two
variables should be clear in its meaning
rather than due to something else that
the study did not account for
What is the "something else"?
Any one (or more) of these conditions:
Age or ability of subjects
Conditions under which the study was conducted
Type of materials used in the study
Technically, the "something else" is called a threat to
internal validity.
Threats to internal validity
Subject characteristics
Loss of subjects
Location
Instrumentation
Testing
History
Maturation
Attitude of subjects
Implementation
Subject characteristics
Subject characteristics can pose a
threat if there is selection bias, or if
there are unintended factors present
within or among groups selected for a
study. For example, in group studies,
members may differ on the basis of
age, gender, ability, socioeconomic
background, etc. These must be
controlled for in order to ensure that
the key variables in the study, not
these characteristics, explain differences.
Subject characteristics
Age, Strength, Maturity, Gender, Ethnicity, Coordination, Speed,
Intelligence, Vocabulary, Reading ability, Fluency, Manual dexterity,
Socioeconomic status, Religious/political belief


Loss of subjects (mortality)
Loss of subjects limits generalizability, but
it can also affect internal validity if the
subjects who don't respond or participate
are overrepresented in a group.
Location
The place where data collection
occurs (the location) might pose a
threat. For example, hot, noisy,
unpleasant conditions might affect
scores; situations where privacy is
important for the results, but where
people are streaming in and out of the
room, might pose a threat.
Instrumentation
Decay: If the nature of the instrument or the
scoring procedure is changed in some way,
instrument decay occurs.
Data Collector Characteristics: The person
collecting data can affect the outcome.
Data Collector Bias: The data collector might hold
an opinion that is at odds with respondents' views,
and this can affect the administration.
Testing
In longitudinal studies, data are often collected
through more than one administration of a test.
If the previous test influences subsequent ones by
getting the subject to engage in learning or some
other behavior that he or she might not otherwise
have done, there is a testing threat.
History
If an unanticipated or unplanned event occurs during
a study or intervention, there might be a history
threat.
Attitude of subjects
Sometimes the very fact of being studied
influences subjects. The best known example of
this is the Hawthorne Effect.
Implementation
This threat can be caused by various things:
different data collectors, teachers, conditions in the
treatment, method bias, etc.
Minimizing Threats
Standardize conditions of study
Obtain more information on subjects
Obtain as much information as possible on the details
of the study: location, history, instrumentation,
subject attitude, implementation
Choose an appropriate design
Train data collectors
Qualitative Research
Validity and reliability??
Qualitative research
Many qualitative
researchers contend that
validity and reliability are
irrelevant to their work
because they study one
phenomenon and don't seek
to generalize.
Fraenkel and Wallen: any
instrument or design used
to collect data should be
credible and backed by
evidence, consistent with
quantitative studies.
Trustworthiness
Quantitative vs. Qualitative

Traditional criteria for judging quantitative research vs. alternative criteria for judging qualitative research:
Internal validity -> Credibility
External validity -> Transferability
Reliability -> Dependability
Objectivity -> Confirmability
In qualitative research
Reliability pertains to the extent to which the
study is replicable and how accurately the research
methods and techniques used produce data.
Objectivity of the researcher: the researcher must
examine her biases and preconceived notions of what
she will find before she begins her research.
Objectivity of the interviewee
In qualitative research
Triangulation
Member check
Audit trail
Let's look at one particular design
Validity in experimental research
Experimental designs should be developed to
ensure internal and external validity of the study.
Internal Validity:
Are the results of the study (the dependent
variable, DV) caused by the factors included in the
study (the independent variable, IV), or are they
caused by other factors (extraneous variables, EV)
that were not part of the study?
Threats to Internal Validity

Subject Characteristics
(Selection Bias/Differential Selection) -- The groups may have been
different from the start. If you were testing instructional strategies to
improve reading and one group enjoyed reading more than the
other group, they may improve more in their reading because they
enjoy it, rather than because of the instructional strategy you used.
Threats to Internal Validity

Loss of Subjects
(Mortality) -- All of the high- or low-scoring subjects may
have dropped out or were missing from one of the
groups. If we collected posttest data on a day when the
debate society was on a field trip, the mean for the
treatment group would probably be much lower than it
really should have been.
Threats to Internal Validity

Location
Perhaps one group was at a
disadvantage because of their location.
The city may have been demolishing a
building next to one of the schools in our
study and there were constant
distractions that interfered with our
treatment.
Threats to Internal Validity

Instrumentation (Instrument Decay)
The testing instruments may not be scored similarly.
Perhaps the person grading the posttest is fatigued
and pays less attention to the last set of papers
reviewed. It may be that those papers are from one
of our groups and will receive different scores than
the earlier group's papers.
Threats to Internal Validity

Data Collector Characteristics
The subjects of one group may react differently to the data collector
than the other group. A male interviewing males and females about
their attitudes toward a type of math instruction may not receive the
same responses from females as a female interviewing females would.
Threats to Internal Validity

Data Collector Bias
The person collecting data may favor one group, or some
characteristic some subjects possess, over another. A principal
who favors strict classroom management may rate students'
attention under different teaching conditions with a bias toward
one of the teaching conditions.
Threats to Internal Validity

Testing
The act of taking a pretest or posttest may influence the results of the
experiment. Suppose we were conducting a unit to increase student
sensitivity to racial prejudice. As a pretest we have the control and
treatment groups watch a movie on racism and write a reaction essay.
The pretest may have actually increased both groups' sensitivity and we
find that our treatment group didn't score any higher on a posttest given
later than the control group did. If we hadn't given the pretest, we might
have seen differences in the groups at the end of the study.
Threats to Internal Validity

History
Something may happen at one site during our study that influences the results.
Perhaps a classmate was injured in a car accident at the control site for a study
teaching children bike safety. The control group may actually demonstrate more
concern about bike safety than the treatment group.
Threats to Internal Validity

Maturation
There may be natural changes in the subjects that
can account for the changes found in a study. A
critical thinking unit may appear more effective if it
is taught during a time when children are developing
abstract reasoning.
Threats to Internal Validity

Hawthorne Effect
The subjects may respond differently just because they are being studied. The
name comes from a classic study in which researchers were studying the effect
of lighting on worker productivity. As the intensity of the factory lights increased,
so did the worker productivity. One researcher suggested that they reverse the
treatment and lower the lights. The productivity of the workers continued to
increase. It appears that being observed by the researchers was increasing
productivity, not the intensity of the lights.
Threats to Internal Validity

John Henry Effect
One group may feel that it is in competition with the other group and may work
harder than they would under normal circumstances. This generally is applied to
the control group "taking on" the treatment group.
Threats to Internal Validity

Resentful Demoralization of the Control Group
The control group may become discouraged because it is not
receiving the special attention that is given to the treatment
group. They may perform lower than usual because of this.
Threats to Internal Validity

Regression
(Statistical Regression) -- A class that scores particularly low can
be expected to score slightly higher just by chance. Likewise, a
class that scores particularly high will have a tendency to score
slightly lower by chance. The change in these scores may have
nothing to do with the treatment.
Threats to Internal Validity

Implementation
The treatment may not be implemented as intended. A
study where teachers are asked to use student modeling
techniques may not show positive results, not because
modeling techniques don't work, but because the teacher
didn't implement them or didn't implement them as they
were designed.
Threats to Internal Validity

Compensatory Equalization of Treatment
Someone may feel sorry for the control group because they
are not receiving much attention and give them special
treatment. For example, a researcher could be studying the
effect of laptop computers on students' attitudes toward
math. The teacher feels sorry for the class that doesn't have
computers and sponsors a popcorn party during math class.
The control group begins to develop a more positive attitude
about mathematics.
Threats to Internal Validity

Experimental Treatment Diffusion
Sometimes the control group actually
implements the treatment. If two different
techniques are being tested in two
different third grades in the same
building, the teachers may share what
they are doing. Unconsciously, the control
teacher may use the techniques she or he
learned from the treatment teacher.
Once the researchers are confident that the
outcome (dependent variable) of the experiment
they are designing is the result of their treatment
(independent variable) [internal validity], they
determine to which people or situations the results
of their study apply [external validity].
External Validity:

Are the results of the study generalizable to
other populations and settings?
Population
Ecological
Threats to External Validity
(Population)

Population Validity is the extent to which the results of a study can be
generalized from the specific sample that was studied to a larger
group of subjects. It involves the extent to which one can generalize
from the study sample to a defined population.
If the sample is drawn from an accessible population, rather than the target
population, generalizing the research results from the accessible population to the
target population is risky.
Threats to External Validity
(Ecological)

Ecological Validity is the extent to which the results of an
experiment can be generalized from the set of environmental
conditions created by the researcher to other environmental
conditions (settings and conditions).
There are 10 common threats to external validity.
Threats to External Validity
(Ecological)

Explicit description of the experimental treatment
(not sufficiently described for others to replicate)
If the researcher fails to adequately describe how he or
she conducted a study, it is difficult to determine
whether the results are applicable to other
settings.
Threats to External Validity
(Ecological)

Multiple-treatment interference
(catalyst effect)
If a researcher were to apply several treatments,
it is difficult to determine how well each of the
treatments would work individually. It might be
that only the combination of the treatments is
effective.
Threats to External Validity
(Ecological)

Hawthorne effect
(attention causes differences)
Subjects perform differently because they know they are
being studied. "...External validity of the experiment is
jeopardized because the findings might not generalize
to a situation in which researchers or others who were
involved in the research are not present" (Gall, Borg, &
Gall, 1996, p. 475)
Threats to External Validity
(Ecological)

Novelty and disruption effect
(anything different makes a difference)
A treatment may work because it is novel and the subjects respond to the
uniqueness, rather than the actual treatment. The opposite may also occur:
the treatment may not work because it is unique, but given time for the
subjects to adjust to it, it might have worked.
Threats to External Validity
(Ecological)

Experimenter effect
(it only works with this experimenter)
The treatment might have worked because of the
person implementing it. Given a different person, the
treatment might not work at all.
Threats to External Validity
(Ecological)

Pretest sensitization
(pretest sets the stage)
A treatment might only work if a pretest is
given. Because they have taken a pretest, the
subjects may be more sensitive to the
treatment. Had they not taken a pretest, the
treatment would not have worked.
Threats to External Validity
(Ecological)

Posttest sensitization
(posttest helps treatment "fall into place")
The posttest can become a learning experience. "For
example, the posttest might cause certain ideas presented
during the treatment to 'fall into place'." If the subjects had
not taken a posttest, the treatment would not have worked.
Threats to External Validity
(Ecological)

Interaction of history and treatment effect
(...to everything there is a time...)
Not only should researchers be cautious about generalizing to other
populations; caution should also be taken in generalizing to a different time
period. As time passes, the conditions under which treatments work
change.
Threats to External Validity
(Ecological)

Measurement of the dependent variable
(maybe only works with M/C tests)
A treatment may only be evident with certain types of
measurements. A teaching method may produce
superior results when its effectiveness is tested with an
essay test, but show no differences when the
effectiveness is measured with a multiple choice test.
Threats to External Validity
(Ecological)

Interaction of time of measurement and treatment effect
(it takes a while for the treatment to kick in)
It may be that the treatment effect does not occur until several weeks after the end
of the treatment. In this situation, a posttest at the end of the treatment would
show no impact, but a posttest a month later might show an impact.
NEXT WEEK
Consultation
