
Reliability and Validity

Introduction to Study Skills & Research Methods (HL10040)

Dr James Betts
Lecture Outline:
Definition of Terms
Types of Validity
Threats to Validity
Types of Reliability
Threats to Reliability
Introduction to Measurement Error.
Commonly used terms

She has a valid point

My car is unreliable

in science
The conclusion of the study was not valid

The findings of the study were not reliable.


Some definitions
Validity

The soundness or appropriateness of a test or instrument in measuring what it is designed to measure
(Vincent 1999)
Some definitions
Validity

Degree to which a test or instrument measures what it purports to measure

(Thomas & Nelson 1996)


Some definitions
Reliability

The degree to which a test or measure produces the same scores when applied in the same circumstances

(Nelson 1997)
Some definitions
Objectivity

The degree to which different observers agree on measurements

(Atkinson & Nevill 1998)


Types of Experimental Validity
Internal

Is the experimenter measuring the effect of the independent variable on the dependent variable?

External

Can the results be generalised to the wider population?
Validity
  Logical: Face, Content
  Statistical (AKA Criterion): Concurrent, Predictive
  Construct

Reliability
  Consistency
  Objectivity


Logical Validity
Face Validity
Infers that a test is valid by definition
It is clear that the test measures what it is supposed to

e.g.
If you want to assess reaction time, measuring how long it takes an individual to react to a given stimulus would have face validity.
Externally valid?

i.e.
Would assessing 15 m sprint
time be a valid means of
assessing reaction time?

Assessing face validity is therefore a subjective process.


Logical Validity
Content Validity
Infers that the test measures all aspects contributing to
the variable of interest
e.g.
Who is the most physically
fit?
VO2 max test?
Wingate test?
1 RM?
also a subjective process.
Overall:

A logically valid test simply appears to


measure the right variable in its entirety?
Statistical Validity
Concurrent Validity
Infers that the test produces similar results to a
previously validated test
e.g.
VO2 max

Incremental Treadmill Protocol with expired gas analysis
versus
Multi-Stage Fitness (Beep) Test
Statistical Validity
Predictive Validity
Infers that the test provides a valid reflection of
future performance using a similar test

e.g.
Can performance
during test A be
used to predict
future performance
in test B?

A B
http://www.youtube.com/watch?v=vdPQ3QxDZ1s
Overall:

A statistically valid test produces results


that agree with other similar tests?
Logical/Statistical Validity
Construct Validity
Infers not only that the test is measuring what it is
supposed to, but also that it is capable of detecting
what should exist, theoretically
Therefore relates to hypothetical or intangible
constructs
e.g.
Team Rivalry

Sportsmanship.
This makes assessment difficult,
i.e. if what should exist cannot be detected, this could mean:

a) Test Invalid? b) Theory Incorrect? c) Sensitivity/Specificity Issues?


Interesting Example: Breast Cancer
Incidence: ~1 % (0.8 %)
(i.e. approximately 1 in every 100 women tested actually has breast cancer)
Sensitivity: ~90 % (87 %)
(the mammogram is sensitive enough that approximately 90
in every 100 breast cancer patients will receive a positive result)
Specificity: ~90 % (93 %)
(the mammogram is specific enough that approximately 90
in every 100 healthy patients will receive a negative result).

Data from Kerlikowske et al. (1996)


Quick Test

What is the probability that a


patient receiving a positive
result actually has breast
cancer?
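The Quick Test can be worked through with Bayes' theorem. The sketch below is a minimal illustration, not part of the original slides: the function name is hypothetical, and it treats the quoted incidence as the proportion of tested women who have the disease. It uses both the rounded figures and the bracketed figures attributed to Kerlikowske et al. (1996).

```python
def positive_predictive_value(prevalence, sensitivity, specificity):
    """Probability of disease given a positive test, via Bayes' theorem."""
    true_positives = prevalence * sensitivity
    false_positives = (1.0 - prevalence) * (1.0 - specificity)
    return true_positives / (true_positives + false_positives)


# Rounded figures from the slide: ~1 % incidence, ~90 % sensitivity, ~90 % specificity
rounded = positive_predictive_value(0.01, 0.90, 0.90)

# Bracketed figures from the slide: 0.8 %, 87 %, 93 %
reported = positive_predictive_value(0.008, 0.87, 0.93)

print(f"Rounded figures:   {rounded:.1%}")
print(f"Bracketed figures: {reported:.1%}")
```

Under these assumptions the answer is roughly 8 to 9 %, because the few true positives are outnumbered by false positives from the much larger healthy group.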
Threats to Validity
(and possible solutions?)
Threats to Internal Validity
Maturation
Changes in the DV over time irrespective of the IV
Threats to Internal Validity
Maturation
e.g. One Group Pre-test Post-test

O1 T O2
Threats to Internal Validity
Maturation (possible solution)

Time series

O1 O2 O3 T O4 O5 O6
Threats to Internal Validity
Maturation (possible solution)
Pre-test Post-test Randomised Group Comparison (n.b. RCT)
R  O1 T O2
R  O3 P O4
Threats to Internal Validity
Maturation (possible solution)
Repeated measures designs can occasionally be an inappropriate
solution, even when randomised and counterbalanced
e.g.
Muscle Damage (repeated bout effect)
Vitamin Supplementation (wash-out period)

In which case independent measures designs could be used.


Threats to Internal Validity
History
Unplanned events between measurements
Threats to Internal Validity
History

O1 T O2

e.g. exercise?

Therefore, solution = control extraneous variables!


Threats to Internal/External Validity
Pre-testing
Interactive effects due to the pre-test (e.g. learning,
sensitisation, etc.)
Also influences External Validity
Threats to Internal/External Validity
Pre-testing
e.g.
R  O1 T O2
R  O3 P O4
Assessing muscle mass at pre-test could make them train harder in both trials but then respond better to the T than the P, so it is actually T+O1 that is better than P, not T alone.
Threats to Internal/External Validity
Pre-testing (possible solution)
Solomon Four-Group Design
R  O1 T O2
R  O3 P O4
R     T O5
R     P O6
Threats to Internal Validity
Statistical Regression
AKA regression to the mean
(cf. the Sophomore Slump & the SI Cover Jinx)

An initial extreme score is likely to be


followed by less extreme subsequent scores
e.g.
Training has the greatest effect on untrained individuals.

Therefore, solution = effective sampling.
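Regression to the mean can be illustrated with a small simulation. This is a sketch under invented assumptions: the score distribution, the noise level, the sample size and the function name are all hypothetical and are not taken from the lecture. Nobody receives any treatment, yet the group selected for extreme low scores on the first test appears to "improve" on the second.

```python
import random
from statistics import mean


def simulate_regression_to_mean(n=2000, seed=1):
    """Return (test 1 mean, test 2 mean) for the lowest 10 % of test 1 scorers."""
    rng = random.Random(seed)
    # Each person has a stable true score; every test adds independent random error.
    true_scores = [rng.gauss(50.0, 5.0) for _ in range(n)]
    test1 = [t + rng.gauss(0.0, 5.0) for t in true_scores]
    test2 = [t + rng.gauss(0.0, 5.0) for t in true_scores]

    # Select the people with the most extreme (lowest) scores on test 1.
    ranked = sorted(range(n), key=lambda i: test1[i])
    lowest = ranked[: n // 10]

    return mean(test1[i] for i in lowest), mean(test2[i] for i in lowest)


first, second = simulate_regression_to_mean()
print(f"Lowest group, test 1 mean: {first:.1f}")
print(f"Same group, test 2 mean:   {second:.1f}")
```

The selected group's second mean moves back towards the overall mean of 50 without any intervention, which is why sampling only extreme scorers can mimic a treatment effect.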


Threats to Internal Validity
Instrumentation
A difference in the way 2 comparable variables
were measured
e.g.
Uncalibrated equipment

Therefore, solution = calibrate!


Threats to Internal Validity
Selection Bias
The groups for comparison are not equivalent
Threats to Internal Validity
Selection Bias
e.g. Groups not randomly assigned
Static Group Comparison
T O1
P O2
i.e. Group T were resistance trained to start with
Threats to Internal Validity
Selection Bias (possible solution)

Either:
-Randomise group assignment,
-Pre-test and post-test difference,
-Repeated Measures Design.

T O1
P O2
Threats to Internal/External Validity
Experimental Mortality
Missing Data due to subject drop-out
Reduced n = reduced statistical Power
Not only challenges quality of data gathered
(Internal Validity) but
also our ability to
generalise
(External Validity).
Therefore, solution =
recruit sufficient (young?)
participants
Threats to External Validity
Inadequate description
5th characteristic of research: should be replicable

If nobody can replicate the methods of a given


study, then it is irrefutable and therefore lacks
external validity.

Therefore, solution = comprehensive methodology


Threats to External Validity
Biased sampling
Linked to statistical regression
Sample does not reflect target population
n vs. N
Results generalised across gender

Therefore, solution = random sample (of target population).


Threats to External Validity
Hawthorne Effect
DV is influenced by the fact that it is being
recorded
e.g.
Fastest sprint when
professor enters lab

Therefore, solution =
control the lab environment.
Threats to External Validity
Demand Characteristics
Participants detect the purpose of the study and
behave accordingly
e.g.
Sports Science students already know that the
carbohydrate drink is supposedly superior
Therefore, solution = double or single blinding (e.g. CHO vs. H2O).
Threats to External Validity
Operationalisation
AKA Ecological Validity
The DV must have some relevance in the
real world
e.g.
TTE (time to exhaustion) has no Olympic equivalent

Therefore, solution = choose your DV carefully.


Reliability
Reliability is a pre-requisite of validity
e.g. Direct versus Indirect measures of VO2 max

Direct                                      Indirect
-Gold Standard (i.e. valid and reliable)    -Predictive
-Expensive                                  -Cheap
-Complex                                    -Easy
Reliability

Subject 1    60 ml·kg⁻¹·min⁻¹    60 ml·kg⁻¹·min⁻¹    60 ml·kg⁻¹·min⁻¹
Subject 2    55 ml·kg⁻¹·min⁻¹    55 ml·kg⁻¹·min⁻¹    55 ml·kg⁻¹·min⁻¹
Subject 3    70 ml·kg⁻¹·min⁻¹    70 ml·kg⁻¹·min⁻¹    70 ml·kg⁻¹·min⁻¹

Valid and Reliable


Reliability

Subject 1    60 ml·kg⁻¹·min⁻¹    65 ml·kg⁻¹·min⁻¹    65 ml·kg⁻¹·min⁻¹
Subject 2    55 ml·kg⁻¹·min⁻¹    60 ml·kg⁻¹·min⁻¹    60 ml·kg⁻¹·min⁻¹
Subject 3    70 ml·kg⁻¹·min⁻¹    75 ml·kg⁻¹·min⁻¹    75 ml·kg⁻¹·min⁻¹

Not Valid but Reliable
5 ml·kg⁻¹·min⁻¹ correction?
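The idea of a constant correction can be checked numerically. The sketch below assumes that the first column of the table is the criterion (true) value and the second column is the value returned by the test under evaluation; the slide does not label its columns, so that reading and the function name are assumptions.

```python
from statistics import mean, stdev


def bias_and_random_error(criterion, measured):
    """Return the mean difference (systematic error) and the SD of the
    differences (random error) between measured and criterion values."""
    differences = [m - c for c, m in zip(criterion, measured)]
    return mean(differences), stdev(differences)


# Values in ml·kg⁻¹·min⁻¹, read from the table above (column meaning assumed)
criterion = [60, 55, 70]
measured = [65, 60, 75]

bias, spread = bias_and_random_error(criterion, measured)
corrected = [m - bias for m in measured]

print(f"Systematic error: {bias}, random error (SD): {spread}")
print(f"Corrected values: {corrected}")
```

With these numbers the error is entirely systematic (a constant offset of 5 with no scatter), so subtracting 5 recovers the criterion values exactly. Real data would normally show some random scatter as well.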
Reliability

Subject 1    60 ml·kg⁻¹·min⁻¹    72 ml·kg⁻¹·min⁻¹    57 ml·kg⁻¹·min⁻¹
Subject 2    55 ml·kg⁻¹·min⁻¹    61 ml·kg⁻¹·min⁻¹    52 ml·kg⁻¹·min⁻¹
Subject 3    70 ml·kg⁻¹·min⁻¹    40 ml·kg⁻¹·min⁻¹    84 ml·kg⁻¹·min⁻¹

Not Valid and not Reliable
i.e. a test can never be valid without being reliable?
Types of Reliability

Relative
Absolute
Rater reliability (Objectivity)
Intrarater reliability
Interrater reliability.
Relative Reliability

Subject 1    60 ml·kg⁻¹·min⁻¹    63 ml·kg⁻¹·min⁻¹    57 ml·kg⁻¹·min⁻¹
Subject 2    55 ml·kg⁻¹·min⁻¹    56 ml·kg⁻¹·min⁻¹    48 ml·kg⁻¹·min⁻¹
Subject 3    70 ml·kg⁻¹·min⁻¹    65 ml·kg⁻¹·min⁻¹    66 ml·kg⁻¹·min⁻¹

Relatively Reliable
i.e. Individuals maintain position in the group
Absolute Reliability

Subject 1    60 ml·kg⁻¹·min⁻¹    63 ml·kg⁻¹·min⁻¹    57 ml·kg⁻¹·min⁻¹
Subject 2    55 ml·kg⁻¹·min⁻¹    56 ml·kg⁻¹·min⁻¹    48 ml·kg⁻¹·min⁻¹
Subject 3    70 ml·kg⁻¹·min⁻¹    65 ml·kg⁻¹·min⁻¹    66 ml·kg⁻¹·min⁻¹

Not Absolutely Reliable
i.e. Test-Retest within individuals
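The contrast between relative and absolute reliability can be made concrete with the table values. The sketch below is a simplified illustration, assuming the three columns are repeated trials: it uses rank order as a stand-in for relative reliability and the within-subject range as a stand-in for absolute reliability. Neither is presented as the statistic a formal analysis would use, and the function name is hypothetical.

```python
# Scores in ml·kg⁻¹·min⁻¹, one list of three trials per subject (from the table above)
trials = {
    "Subject 1": [60, 63, 57],
    "Subject 2": [55, 56, 48],
    "Subject 3": [70, 65, 66],
}


def rank_order(trial_index):
    """Subjects ordered from highest to lowest score on one trial."""
    return sorted(trials, key=lambda s: trials[s][trial_index], reverse=True)


# Relative reliability: does everyone keep the same position in the group?
orders = [rank_order(i) for i in range(3)]
same_ranking = all(order == orders[0] for order in orders)

# Absolute reliability: how much does each individual's own score vary?
within_subject_range = {s: max(v) - min(v) for s, v in trials.items()}

print("Rank order preserved across trials:", same_ranking)
print("Within-subject range:", within_subject_range)
```

The ranking is identical on every trial, yet each subject's own scores vary by 5 to 8 ml·kg⁻¹·min⁻¹, which is the sense in which the same data are relatively but not absolutely reliable.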
Rater Reliability
Intrarater reliability
The consistency of a given observer or
measurement tool on more than one occasion
Rater Reliability
Interrater reliability
The consistency of a given measurement from
more than one observer or measurement tool
e.g.
Score for the American Gymnast
British Judge = 9.9
French Judge = 4.4
Japanese Judge = 7.0
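The disagreement between the judges can be summarised with simple descriptive statistics. This is only a rough sketch with hypothetical variable names; a formal assessment of interrater reliability would need more than one performance and a dedicated agreement statistic.

```python
from statistics import mean, stdev

# Scores for the same performance, taken from the example above
scores = {"British": 9.9, "French": 4.4, "Japanese": 7.0}

values = list(scores.values())
mean_score = mean(values)
score_range = max(values) - min(values)
score_sd = stdev(values)

print(f"Mean score: {mean_score:.1f}")
print(f"Range between judges: {score_range:.1f}")
print(f"SD between judges: {score_sd:.2f}")
```

A spread of more than five points on a ten-point scale for a single routine indicates poor objectivity.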
Threats to Reliability
Fatigue
             8 am                9 am                10 am
Subject 1    60 ml·kg⁻¹·min⁻¹    55 ml·kg⁻¹·min⁻¹    50 ml·kg⁻¹·min⁻¹

Therefore, solution = increase time between tests.


Threats to Reliability
Habituation

Subject 1    60 ml·kg⁻¹·min⁻¹    65 ml·kg⁻¹·min⁻¹    70 ml·kg⁻¹·min⁻¹

Therefore, solution = familiarise prior to test.


Threats to Reliability
Standardisation of Procedures
Control of extraneous variables

Precision of Measurements
i.e. if we are happy to measure VO2 max to the nearest 10 ml·kg⁻¹·min⁻¹, then it could probably be reliably predicted from your training volume and age.
Measurement Errors
Ultimately, reliability is dependent on the
degree of measurement error in a given study

The overall error in any measurement comprises both systematic and random error

We will address measurement error further next week
Literature Search Assignment
The handout lists 8 questions which can be
answered through retrieving the corresponding
source articles
Answer as many as possible and bring them to next week's lecture
DO NOT contact author or order articles.
Selected Reading
Atkinson, G. and A. M. Nevill. Statistical methods for assessing measurement error (reliability) in variables relevant to sports medicine. Sports Medicine. 26:217-238, 1998.

Holmes, T. H. Ten categories of statistical errors: a guide for research in endocrinology and metabolism. American Journal of Physiology. 286:E495-501.

Thomas, J. R. and J. K. Nelson. Research Methods in Physical Activity, 4th edition. Champaign, Illinois: Human Kinetics, 2001.
J.Betts@bath.ac.uk
