Learning Objectives:
- Identify different types of outcome measures and understand their limits
- Define different measurement properties
- Describe methods to evaluate different types of validity, reliability, sensitivity to change
and responsiveness
- Design a study to evaluate the measurement properties of an outcome measure
Measuring Health
What is health?
- According to WHO, it is a state of complete physical, mental, and social well-being
- It is a multi-faceted concept influenced by a person’s experiences, beliefs, expectations,
and perceptions
- It means different things to different people
Surrogate Outcomes
- Outcome measures that are not of direct practical importance to patients but are believed
to reflect outcomes that are important.
- Validity depends on magnitude of the association b/w surrogate and the patient important
outcome (i.e. its predictive validity)
o E.g. reduction in cholesterol as surrogate for reduction in mortality
o E.g. increased bone density as a surrogate for reduction in fracture incidence
- We use these outcomes b/c of their efficiency; changes can be measured on all patients over
a shorter time interval
Types of QoL
Measurement Properties
Types:
- Face: the extent to which a measurement instrument appears to measure what it is
intended to measure.
- Content: the extent to which a measurement instrument represents all facets of a given
social construct.
- Criterion: examines the extent to which a measure provides results that are consistent
with a gold standard.
o Predictive: compares the measure in question with an outcome assessed at a later
time.
o Concurrent: comparison between the measure in question and an outcome assessed at
the same time.
- Construct: forming theories about the attribute of interest and then assessing the extent to
which the measure under investigation provides results that are consistent with the
theories.
o Convergent: tests the degree to which two measures of constructs that theoretically
should be related, are in fact related
o Divergent: tests whether concepts or measurements that are supposed to be unrelated
are, in fact, unrelated
Reliability: the extent to which an instrument yields the same results in repeated
administrations in a stable population
Relative Reliability:
- Reliability = measuring agreement, NOT association
- Cannot use Pearson/Spearman to demonstrate reliability b/c they are measures of
association and do not consider systematic differences b/w measures
- However, both Intra-class Correlation Coefficient (ICC) and Kappa do consider this
(measures of agreement)
- Ideal value = 1
- Measures that are highly associated but systematically different will have a
correlation coefficient that is larger than the agreement statistic (A)
- Measures that are highly associated without a systematic difference will have similar
values for the correlation coefficient and agreement statistic (B)
ICC:
- Is a measure of reproducibility that compares variance b/w patients to the total variance
(b/w patient and within-patient variance)
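As a sketch of the definition above (the variance components are hypothetical illustration values, not from a real study):

```python
def icc(between_patient_var: float, within_patient_var: float) -> float:
    """ICC = between-patient variance / total variance."""
    return between_patient_var / (between_patient_var + within_patient_var)

# Reproducible measure: patients differ far more than repeated measurements do
print(icc(between_patient_var=9.0, within_patient_var=1.0))  # 0.9
# Noisy measure: measurement error swamps true differences b/w patients
print(icc(between_patient_var=1.0, within_patient_var=9.0))  # 0.1
```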
Kappa
- Is a measure of the extent to which observers achieve agreement beyond the level
expected to occur by chance alone.
- Applies to categorical (e.g. binary) ratings; Kappa ranges from 0 (agreement no better
than chance) to 1 (perfect agreement)
- The more discordant the raters are, the lower the value of Kappa
- The weighted Kappa is for ordered categories
o Unweighted Kappa penalizes all discordant ratings equally; weighting penalizes
disagreements b/w adjacent categories less than large disagreements
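A minimal Cohen's Kappa computation for a 2×2 agreement table (the counts below are invented for illustration):

```python
def cohens_kappa(table):
    """Cohen's Kappa for a square agreement table (rows: rater A, cols: rater B)."""
    n = sum(sum(row) for row in table)
    k = len(table)
    # Observed agreement: proportion of cases on the diagonal
    p_obs = sum(table[i][i] for i in range(k)) / n
    # Agreement expected by chance, from the marginal totals
    p_exp = sum(
        (sum(table[i]) / n) * (sum(table[r][i] for r in range(k)) / n)
        for i in range(k)
    )
    return (p_obs - p_exp) / (1 - p_exp)

# Two raters classify 100 patients as improved / not improved (hypothetical counts)
table = [[40, 10],
         [5, 45]]
print(round(cohens_kappa(table), 2))  # 0.7
```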
Absolute Reliability: Precision – Individual Score
- Standard Error of Measurement (SEM) is a statistic for absolute reliability and is
calculated from a test-retest reliability study design
- SEM allows us to determine how certain we can be about a particular individual’s score
at a particular point in time.
- SEM = √(within-patient variance)
- Ideally 0
- Clinician can be x % confident (x defined by confidence level chosen) that the true score
lies within the reported interval
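A sketch of the SEM and the resulting interval for an individual score (the variance and observed score are hypothetical; z = 1.96 corresponds to a 95% confidence level):

```python
import math

def sem(within_patient_var: float) -> float:
    """Standard Error of Measurement = √(within-patient variance)."""
    return math.sqrt(within_patient_var)

def score_ci(observed: float, within_patient_var: float, z: float = 1.96):
    """Interval within which the true score lies, at the chosen confidence level."""
    half_width = z * sem(within_patient_var)
    return (observed - half_width, observed + half_width)

# Within-patient variance of 4 gives SEM = 2; 95% CI for an observed score of 30
low, high = score_ci(30, 4.0)
print(round(low, 2), round(high, 2))  # 26.08 33.92
```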
Sensitivity to Change
- Is the ability to detect change, whether or not that change is meaningful
- Many statistics exist for expressing this
- Standardized Response Mean (SRM) is most common
- Study Design: in a population expected to change, administer the new test pre- and post-
change
- SRM = (mean change) / (SD change)
- If SRM > 1, ‘signal/change’ could be detected over the ‘noise/variability’
- Signal = change that occurred from pre- to post-treatment
- Noise = all systematic and random errors
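The SRM calculation is straightforward; the change scores below are hypothetical pre-to-post differences:

```python
import statistics

def srm(change_scores):
    """Standardized Response Mean = mean change / SD of change."""
    return statistics.mean(change_scores) / statistics.stdev(change_scores)

changes = [5, 7, 6, 8, 4, 6, 7, 5]  # hypothetical pre-to-post change scores
print(round(srm(changes), 2))       # 4.58 — well above 1: signal exceeds noise
```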
Responsiveness
- The ability to detect change that is meaningful/important to patients
Anchor-Based Approach
- A way to establish the interpretability of measures of patient-reported outcomes
- All patients are measured at Time 1 and Time 2
- B/w these times, administer an intervention that usually produces some improvement
- At Time 2, the Anchor is included = Global Rating of Change questionnaire
o Patient indicates how much better/worse they feel compared to Time 1
o Calculate the average change score of all patients who indicated a small but important
change on the GRC (score of 2 or 3); this represents the within-group MCID for that instrument.
- If the magnitudes of change in the ‘better’ and ‘worse’ groups differ, then averaging the
scores is not valid.
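A sketch of the anchor-based calculation (the patient data and GRC ratings are invented; a GRC of 2 or 3 marks a small but important improvement, as above):

```python
import statistics

# (change_score, GRC rating) per patient — hypothetical values
patients = [(4, 2), (6, 3), (3, 2), (10, 5), (1, 0), (5, 3), (-2, -1)]

# Average change among patients reporting a small but important change (GRC 2 or 3)
mcid = statistics.mean(change for change, grc in patients if grc in (2, 3))
print(mcid)  # 4.5
```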
Distribution-Based Approach
- Approach 1
o Measure outcome at two time points in individuals not expected to change
o Calculate change scores for every participant and plot them in distribution
o Choose threshold (MCID) for classifying an individual as not having changed by an
important amount
- Approach 2
o Measure outcome at two time points in individuals expected to change by an important
amount
o Calculate change scores for every participant and plot them in distribution
o Choose threshold (MCID) for classifying an individual as having changed by an
important amount
- The score at the cut-off is the within-group MCID for that instrument
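A sketch of Approach 1 with invented change scores from a stable population. Placing the threshold at mean + 1.96 SD of the stable group's change scores is one common convention, assumed here purely for illustration:

```python
import statistics

# Change scores measured twice in individuals not expected to change (hypothetical)
stable_changes = [-2, -1, 0, 1, 0, 2, -1, 1, 0, -2]

mean_change = statistics.mean(stable_changes)
sd_change = statistics.stdev(stable_changes)

# Changes beyond this threshold are unlikely to reflect measurement noise alone
threshold = mean_change + 1.96 * sd_change
print(round(threshold, 2))  # 2.38
```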
Self-Assessment