
Module 3 – Measurement

Learning Objectives:
- Identify different types of outcome measures and understand their limitations
- Define different measurement properties
- Describe methods to evaluate different types of validity, reliability, sensitivity to change
and responsiveness
- Design a study to evaluate the measurement properties of an outcome measure

Measuring Health

What is health?
- According to the WHO, it is a state of complete physical, mental, and social well-being,
and not merely the absence of disease or infirmity
- It is a multi-faceted concept influenced by a person’s experiences, beliefs, expectations,
and perceptions
- It means different things to different people

ICF Model of Health


- ICF = International Classification of Functioning, Disability, and Health
- Model meant to standardize communication about health
- Health outcomes are classified according to their effect on body function and structure
(impairment; includes mental health items), limitations in activities (disability), and
restrictions in participation (handicap)
- Modifiers of these outcomes are: age, coping strategies, social attitudes, education,
experience

Measuring Health in Research


- Using the ICF as a guide, select several outcome measures that each speak to the specific
aspect(s) of health affected by the intervention
- Measure QoL with a questionnaire whose items cover the specific aspects of health that
patients have deemed important and relevant to their disease
o Not all questions are necessarily valued equally; some are more important than others
- Good outcome measures for a study have good measurement properties AND are well
known/commonly used (for ease of interpretation by others)
- Issue: too many independent outcomes inflate the risk of a false-positive finding through
multiple comparisons

Types of Outcome Measures

Predictive Outcome Measure


- An instrument/device/method that predicts a future outcome
o E.g. the MCAT predicts who is likely to perform well on the licensing exam
o E.g. following an acute injury, predicts who is likely to become chronic
- Design a predictive instrument using a prognosis design
- Evaluate predictive validity using a diagnosis design

Discriminative Outcome Measure


- An instrument/device/method that sorts individuals into groups
o E.g. x-ray (fracture present or absent)
- Evaluate validity using a diagnosis design

Evaluative Outcome Measure
- An instrument/device/method that provides data on the quantity/quality of the result of
the experiment
- It is the basis for measuring the effects of the independent variable, i.e. change in the
dependent variable
o E.g. pain measured pre- and post-intervention
- Evaluate using longitudinal construct validity and sensitivity to change

Types of Evaluative Measures


- Surrogate Outcomes
- Patient Important Outcomes

Surrogate Outcomes
- Outcome measures that are not of direct practical importance to patients but are believed
to reflect outcomes that are important
- Validity depends on the magnitude of the association b/w the surrogate and the patient-
important outcome (i.e. its predictive validity)
o E.g. reduction in cholesterol as a surrogate for reduction in mortality
o E.g. increased bone density as a surrogate for reduction in fracture incidence
- We use these outcomes b/c of their efficiency; changes can be measured in all patients
over a shorter time interval

Patient Important Outcomes


- Outcome measures that are of direct importance to patients
o E.g. death/survival, success/failure, patient-reported QoL
- Advantage: validity
- Disadvantage: long time interval needed to measure

Measurement Properties

Validity and Reliability


- Validity (accuracy) is a measure of how close a measurement comes to the true score for
a variable
- Reliability (precision) is a measure of the extent to which repeated measurements come
up with the same value
- All outcome measures need to demonstrate validity and reliability
- Exception: evaluative measures only need to show responsiveness

Validity vs. Reliability


- Improve the precision of an estimate by increasing the number of measurements taken
(the mean of repeated measurements converges toward the true score; see the sketch below)
o This reduces the level of random error and narrows the CI about the value being estimated
- Increasing precision when the experiment contains systematic error is not the solution
o Solution: calibration of the instrument
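
A minimal Python sketch (the true score, bias, and error SD are invented) of why taking
more measurements narrows the CI but cannot remove a systematic error:

    import numpy as np

    rng = np.random.default_rng(0)
    true_score = 50.0
    bias = 3.0         # systematic error from a miscalibrated instrument
    sd_random = 4.0    # SD of random measurement error

    for n in (1, 4, 16, 64):
        # average of n repeated measurements, simulated 10,000 times
        means = rng.normal(true_score + bias, sd_random, size=(10000, n)).mean(axis=1)
        half_width = 1.96 * sd_random / np.sqrt(n)  # 95% CI half-width shrinks with sqrt(n)
        print(f"n={n:3d}  mean={means.mean():.2f}  CI half-width={half_width:.2f}")

    # The CI narrows as n grows, but the estimate stays near 53, not 50:
    # averaging never removes the +3 systematic error; only calibration does.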

Validity: the extent to which an instrument measures what it is intended to measure

Types:
- Face: the extent to which a measurement instrument appears to measure what it is
intended to measure.
- Content: the extent to which a measurement instrument represents all facets of a given
social construct.
- Criterion: examines the extent to which a measure provides results that are consistent
with a gold standard.
o Predictive: compares the measure in question with an outcome assessed at a later
time.
o Concurrent: comparison between the measure in question and an outcome assessed at
the same time.
- Construct: forming theories about the attribute of interest and then assessing the extent to
which the measure under investigation provides results that are consistent with the
theories.
o Convergent: tests the degree to which two measures of constructs that theoretically
should be related are in fact related
o Divergent: tests whether concepts or measurements that are supposed to be unrelated
are, in fact, unrelated

Study Designs: Validity


- In a known-groups design, one group has the disease and the other does not; a valid
measure should be able to distinguish b/w them

Reliability: the extent to which an instrument yields the same results in repeated
administrations in a stable population

Study Designs: Reliability


- All require the disease to be in a stable state; measurements are repeated at least twice
- Test re-test: assumes the rater and disease are consistent and evaluates the
reproducibility of the test (the patient has to perform the test)
- Inter-rater: the extent to which 2 or more raters are able to consistently differentiate
subjects with higher and lower values on an underlying trait
o Assumes the test and disease are consistent and evaluates the reproducibility b/w
different raters (a rater has to perform the test or observe the client)
- Intra-rater: the extent to which a rater is able to consistently differentiate participants
with higher and lower values of an underlying trait on repeated ratings over time
o Assumes the test and disease are consistent and evaluates the reproducibility of one
rater over time (a rater has to perform the test or observe the client)

Statistics to Communicate Reliability

Relative Reliability:
- Reliability = measuring agreement, NOT association
- Cannot use Pearson/Spearman to demonstrate reliability b/c they are measures of
association and do not consider systematic differences b/w measures
- However, both Intra-class Correlation Coefficient (ICC) and Kappa do consider this
(measures of agreement)
- Ideal value = 1
- Measures that are highly associated but systematically different will have a
correlation coefficient that is larger than the agreement statistic
- Measures that are highly associated without a systematic difference will have similar
values for the correlation coefficient and agreement statistic

ICC:
- A measure of reproducibility that compares the variance b/w patients to the total
variance, as illustrated below:
- ICC = (b/w-patient variance) / (b/w-patient variance + within-patient variance)
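
A small Python illustration of the contrast described above, using invented ratings: two
raters whose scores are highly associated but systematically offset by 5 points yield a
high Pearson r but a lower ICC. The form used is the two-way absolute-agreement ICC(2,1)
of Shrout & Fleiss:

    import numpy as np

    rng = np.random.default_rng(1)
    n = 30
    true = rng.normal(60, 10, n)
    rater1 = true + rng.normal(0, 2, n)
    rater2 = true + 5 + rng.normal(0, 2, n)  # same ordering, systematically higher

    pearson = np.corrcoef(rater1, rater2)[0, 1]

    # ICC(2,1) from two-way ANOVA mean squares
    x = np.column_stack([rater1, rater2])  # n subjects x k raters
    k = x.shape[1]
    grand = x.mean()
    ms_rows = k * np.sum((x.mean(axis=1) - grand) ** 2) / (n - 1)  # b/w subjects
    ms_cols = n * np.sum((x.mean(axis=0) - grand) ** 2) / (k - 1)  # b/w raters
    ss_err = np.sum((x - x.mean(axis=1, keepdims=True)
                       - x.mean(axis=0, keepdims=True) + grand) ** 2)
    ms_err = ss_err / ((n - 1) * (k - 1))
    icc = (ms_rows - ms_err) / (ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n)

    print(f"Pearson r = {pearson:.2f}")  # high: strong association
    print(f"ICC(2,1)  = {icc:.2f}")      # lower: the 5-point offset counts as disagreement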

Kappa
- Is a measure of the extent to which observers achieve agreement beyond the level
expected to occur by chance alone.
- Used for categorical (e.g. binary) outcome variables; Kappa = 0 means chance-level
agreement and Kappa = 1 means perfect agreement
- The more discordant the raters are, the lower the value of Kappa
- The weighted Kappa is for ordered categories (see the sketch below)
o Unweighted Kappa counts any discordant rating fully, whereas weighting gives partial
credit for near-agreement on the ordered scale
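
A minimal sketch of unweighted Cohen's Kappa for two raters and a binary outcome (the
ratings below are hypothetical):

    import numpy as np

    def cohens_kappa(r1, r2):
        # agreement beyond the level expected to occur by chance alone
        r1, r2 = np.asarray(r1), np.asarray(r2)
        p_obs = np.mean(r1 == r2)  # observed proportion of agreement
        # chance agreement: product of the raters' marginal proportions per category
        p_chance = sum(np.mean(r1 == c) * np.mean(r2 == c) for c in np.union1d(r1, r2))
        return (p_obs - p_chance) / (1 - p_chance)

    # 1 = disease present, 0 = absent
    r1 = [1, 1, 0, 0, 1, 0, 1, 1, 0, 0]
    r2 = [1, 0, 0, 0, 1, 0, 1, 1, 1, 0]
    print(f"kappa = {cohens_kappa(r1, r2):.2f}")  # 0.60: observed 0.8 vs chance 0.5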
Absolute Reliability: Precision – Individual Score
- Standard Error of Measurement (SEM) is a statistic for absolute reliability and is
calculated from test-re-test reliability study design
- SEM allows us to determine how certain we can be about a particular individual’s score
at a particular point in time.
- SEM = √(within-client variance); see the sketch below
- Ideally 0
- Clinician can be x % confident (x defined by confidence level chosen) that the true score
lies within the reported interval
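
A short Python sketch with invented test-retest scores: the SEM is the square root of the
within-client variance (with two administrations, each client's variance is half the
squared difference), and a CI around one observed score follows directly:

    import numpy as np

    # test-retest scores for 10 stable clients (hypothetical)
    t1 = np.array([42, 55, 61, 48, 70, 52, 66, 58, 45, 63], float)
    t2 = np.array([44, 53, 62, 50, 68, 55, 64, 59, 47, 61], float)

    within_var = np.mean((t1 - t2) ** 2 / 2)  # within-client variance, k = 2
    sem = np.sqrt(within_var)

    score, z = 58.0, 1.96  # one individual's observed score; z for 95% confidence
    print(f"SEM = {sem:.2f}")
    print(f"95% CI for the true score: ({score - z * sem:.1f}, {score + z * sem:.1f})")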

Absolute Reliability: Real Change or Error?


- We can use the SEM to determine if there has been a real change in score over time
- We can be x % confident that a true change has occurred, as opposed to one possibly due
to random error within the measurement, if the change exceeds the reported interval,
known as the Minimal Detectable Change/Difference (MDC/D)
- MDC(X) = SEM × z(X) × √2 (the √2 reflects that both the pre and post scores carry
measurement error)
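
As a worked example with an assumed SEM of 2 points: MDC(95) = 2 × 1.96 × √2 ≈ 5.5, so an
individual's score must change by more than about 5.5 points before we can be 95%
confident a real change has occurred rather than random measurement error.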

Sensitivity to Change
- The ability to detect change, whether or not that change is meaningful
- Many statistics exist for expressing this
- The Standardized Response Mean (SRM) is the most common (see the sketch below)
- Study Design: in a population expected to change, administer the new test pre- and post-
change
- SRM = (mean change) / (SD change)
- If SRM > 1, ‘signal/change’ could be detected over the ‘noise/variability’
- Signal = change that occurred from pre- to post-treatment
- Noise = all systematic and random errors
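
A minimal Python sketch of the SRM, using invented pre/post scores in a group expected to
improve:

    import numpy as np

    pre  = np.array([30, 42, 35, 28, 50, 39, 33, 45], float)
    post = np.array([38, 49, 40, 37, 58, 44, 42, 50], float)

    change = post - pre
    srm = change.mean() / change.std(ddof=1)  # signal (mean change) over noise (SD of change)
    print(f"SRM = {srm:.2f}")  # > 1: change is detectable over the variability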

Responsiveness

Responsiveness: the instrument's ability to detect a clinically meaningful change


- Statistic: Minimal Clinically Important Difference (MCID)
- Sensitivity to change is a necessary but not sufficient condition for responsiveness
- NOTE: using the wrong MCID has important implications for sample size
o Within-group: the change within a single treatment group, where every patient changes
from pre- to post-treatment
o B/w-group: the difference we want to detect in a study evaluating two different
treatments; the more similar the treatments, the smaller the expected difference b/w the
groups
o The b/w-group MCID is approx. 20% of the within-group MCID (see the sketch below)
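
A rough Python sketch of the sample-size implication, using the standard two-group formula
for a continuous outcome (alpha = 0.05 two-sided, 80% power) with made-up values for the
SD and within-group MCID; because the b/w-group MCID is ~5x smaller, the required n per
group is ~25x larger:

    import math

    def n_per_group(delta, sd, z_alpha=1.96, z_beta=0.84):
        # n per group to detect a b/w-group difference delta, outcome SD sd
        return math.ceil(2 * ((z_alpha + z_beta) * sd / delta) ** 2)

    sd = 10.0
    within_mcid = 5.0                  # hypothetical within-group MCID
    between_mcid = 0.2 * within_mcid   # ~20% of the within-group value

    print(n_per_group(within_mcid, sd))   # 63 per group: far too optimistic
    print(n_per_group(between_mcid, sd))  # 1568 per group: the realistic target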

Anchor-Based Approach
- A way to establish the interpretability of measures of patient-reported outcomes
- All patients are measured at Time 1 and Time 2
- B/w these times, deliver an intervention that usually produces some improvement
- At Time 2, the Anchor is included = a Global Rating of Change (GRC) questionnaire
o The patient indicates how much better/worse they feel compared to Time 1
o Calculate the average change score of all patients who indicated a small but important
change on the GRC (score of 2 or 3); this represents the within-group MCID for that
instrument (see the sketch below)
- If the magnitude of change in the 'better' and 'worse' groups differs, then averaging
scores across them is not valid
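
A minimal Python sketch of the anchor-based calculation, with invented change scores and
GRC ratings (GRC on a -7 to +7 scale):

    import numpy as np

    change = np.array([2, 8, 5, 12, 1, 6, 7, -1, 9, 4], float)  # instrument change scores
    grc    = np.array([0, 3, 2,  5, 0, 3, 2, -1, 4, 2])         # Global Rating of Change

    # within-group MCID = mean change among patients reporting a small
    # but important improvement (GRC of 2 or 3)
    mcid = change[(grc == 2) | (grc == 3)].mean()
    print(f"within-group MCID = {mcid:.1f}")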

Distribution-Based Approach

- Approach 1
o Measure outcome at two time points in individuals not expected to change
o Calculate change scores for every participant and plot them in distribution
o Choose threshold (MCID) for classifying an individual as not having changed by an
important amount

- Approach 2
o Measure outcome at two time points in individuals expected to change by an important
amount
o Calculate change scores for every participant and plot them in distribution
o Choose threshold (MCID) for classifying an individual as having changed by an
important amount

- The score at the cut-off is the within-group MCID for that instrument
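
A small Python sketch of both approaches on simulated change scores; taking the 95th
percentile of the stable group's distribution is one defensible cut-off (the distributions
and threshold are illustrative assumptions):

    import numpy as np

    rng = np.random.default_rng(2)

    # Approach 1: individuals not expected to change (error only)
    stable_change = rng.normal(0, 3, 200)
    # Approach 2: individuals expected to change by an important amount
    important_change = rng.normal(10, 3, 200)

    # cut-off: changes beyond the 95th percentile of the stable group
    # are unlikely to be measurement error alone
    cutoff = np.percentile(stable_change, 95)
    print(f"within-group MCID (cut-off) = {cutoff:.1f}")
    print(f"% of truly changed patients exceeding it = "
          f"{100 * np.mean(important_change > cutoff):.0f}%")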

