The Hype and Futility of Measuring Implementation Fidelity v5

The Hype and Futility of Measuring Implementation Fidelity (in GRTs)
David Judkins, presentation at Evaluation 2009, Orlando
The Hype
"Effectiveness research is now at the point of sophistication wherein black-box outcomes studies are no longer acceptable." (Mowbray, Holter, Teague and Bybee, 2003)
89,900 Google hits on October 10, 2009, for the phrase "what works best for whom"
Lofty Goals
"What social programs, policies, and interventions work? For whom do they work, and under what conditions? And why do they work, or fall short?" (Preface to Learning More from Social Experiments, edited by Howard Bloom)
(34,000 Google hits on the book title)
And why do they work, or fall short?

Bloom expands on the question in an MDRC announcement of the book's publication:
"But, in the past, there have been questions that randomized experiments have not been able to address effectively. What component of a social policy made it successful?"
Can such questions be answered?

I will argue that the answer is generally negative. Worse, attempting to answer it compromises the first objective: determining whether the intervention works at all. (This is all in the context of group randomized trials.)
Thesis
It is better not to attempt to measure fidelity in GRTs.
It is counter-productive to try to answer questions about efficacy (intervention effects under ideal conditions) in a trial designed to measure effectiveness (intervention effects under realistic conditions).
Other forms of measurement of the intervention process, undertaken in the hope of learning more about alternative interventions, are also vain and wasteful.
Outline
Opportunities & perspectives
The preconditions for useful fidelity measurement
Operational challenges in fidelity measurement
Statistical issues in the estimation of fidelity-adjusted intervention effectiveness
A case study
Another perspective
Opportunities
Many educational, social, and behavioral interventions are complex
Multidimensional
Incorporate aspects of culturally accepted best practices (traditions and fads)
Require the participation of trained intervenors and of intervention subjects over extended periods of time
Can never be detailed enough to handle every eventuality
Cynic's Perspective
If there is failure:
Blame the subjects
Blame the intervenors
Disqualify or discount the work of control intervenors who, by virtue of superior skill, appear to infringe on the developer's recipe, possibly merely by implementing the culturally accepted best practices
Kaspar Hauser (Jeder für sich und Gott gegen alle, "Every Man for Himself and God Against All")

The "wolf child", raised in isolation from most of humanity, would make an ideal foil for many educational and parenting interventions.
The Thrifty Perspective

It is urgent that an effective intervention be found.
Limited number of fresh ideas in circulation
Limited dollars for research
It would be nice to be able to learn, from an experiment designed to measure the impact of a complex intervention A3B7C2, what the effect of A1B22C4 would be.
Preconditions for Fidelity Measurement

Well-defined intervention
A way of splitting an intervention into components that could be recombined in alternative strengths and mixtures
Some theory about which aspects of interactions between intervenor and subject are relevant and consequential
Operational Challenges
Choice of informant
Subject
Intervenor
Trainer/senior intervenor adviser
Neutral observer
How can fidelity measurement be made reliable, valid, and cost-effective?
Intervenor Informant
Likely to think they are doing just fine if asked to summarize their fidelity
[Figure: histograms of project director fidelity ratings (x-axis: Project Director rating, 1 to 5; y-axis: frequency), one panel per curriculum: Let's Begin with the Letter People: ECE; Play & Learning Strategies (PALS): PE; Partners for Literacy: ECE; Partners for Literacy: PE]
Intervenor Informant (2)

If asked to keep detailed logs, they will likely do a poor job.
For those who do a good job on detailed activity reporting, the reporting burden will probably detract from their effectiveness.
Trainer/Advisor Informant
Can have a vested interest
Blind neither to treatment status nor to the outcome outlook
Possible to read the writing on the wall and rate the intervenors with unfavorable average outcomes as having low fidelity, thereby protecting the fidelity-adjusted effectiveness of the intervention
Trainer/Advisor Informant (2)

Even if unbiased, how sound an opinion can be formed from initial training and occasional (often telephone) contact with intervenors?
Neutral Observer
Very costly
Need staff who fully understand the intervention model
Need extensive training for consistent rating
Usually need travel
Results in strong pressure for additional clustering
Neutral Observer (2)

The high costs of training, travel, and salary directly reduce power for the primary effectiveness research by reducing the subject sample size (for a fixed budget).
Pressure for stronger clustering indirectly reduces power by reducing the number of intervenors and/or intervention sites.
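To make the budget trade-off concrete, here is a minimal sketch, assuming a fixed research budget, a per-site cost that absorbs observer training, travel, and salary, a per-subject cost, and the standard Kish design effect 1 + (m - 1)ρ for m subjects per site with intraclass correlation ρ. All dollar figures and the ICC are hypothetical illustrations, not numbers from the presentation, and the function name `effective_sample_size` is ours.

```python
def effective_sample_size(budget, cost_per_site, cost_per_subject,
                          subjects_per_site, icc):
    """Return (sites affordable, effective sample size) under a fixed budget.

    Hypothetical cost model: each site costs cost_per_site (e.g., observer
    training, travel, salary) plus cost_per_subject for every enrolled subject.
    Clustering is summarized by the Kish design effect 1 + (m - 1) * icc.
    """
    cost_of_one_site = cost_per_site + cost_per_subject * subjects_per_site
    sites = int(budget // cost_of_one_site)        # only whole sites can be funded
    total_subjects = sites * subjects_per_site
    design_effect = 1 + (subjects_per_site - 1) * icc
    return sites, total_subjects / design_effect


# Hypothetical: $1M budget, $500 per subject, ICC = 0.10.
no_observers = effective_sample_size(1_000_000, 5_000, 500, 20, 0.10)
with_observers = effective_sample_size(1_000_000, 15_000, 500, 20, 0.10)
# Observer costs push toward fewer, larger sites:
bigger_clusters = effective_sample_size(1_000_000, 15_000, 500, 40, 0.10)
print(no_observers, with_observers, bigger_clusters)
```

Under these made-up numbers, adding $10,000 of observer cost per site drops the study from 66 sites to 40 and the effective sample from about 455 to about 276. Doubling subjects per site to recover some of the lost sample buys more children (1,120) but an even smaller effective sample (about 229).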
Statistical Issues
Most advocates of fidelity measurement have unwarranted optimism about the ability of statisticians to do anything useful with the data.
Of course, one can always hunt for the statistician who will provide rosy promises of artful analyses.
Statistical Issues (2)

The statistician who offers multi-level causal path mediated analyses will be loved by many, but as Tukey said: "The data may not contain the answer. The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data."
The Best We (Statisticians and Econometricians) Can Offer

Requires heroic assumptions. Either:
Randomization provides an instrumental variable for fidelity; or
The collection of measured covariates is rich enough to render fidelity conditionally independent of potential outcomes
Heroic Assumption #1
Can only capture the mediating role of one (one!) unidimensional summary of fidelity.
By definition, Z is an instrumental variable for the effect of X on Y if the only effect of Z on Y is through X.
In other words, one must be able to rule out a priori any effects of Z on Y that do not run through X.
In the context of fidelity-adjusted effect estimation, this means that there must be a unique plausible summarization of fidelity.
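In standard textbook notation (not taken from the slides), with Z the randomized assignment, X the fidelity summary, and Y the outcome, a constant-effect linear model with the exclusion restriction reads as below. With a binary Z, the single identified ratio shows why one instrument can pin down the effect of only one scalar X, provided the denominator is nonzero:

```latex
Y_i = \beta_0 + \beta X_i + \varepsilon_i, \qquad \operatorname{Cov}(Z_i, \varepsilon_i) = 0
\\[4pt]
\beta = \frac{\operatorname{Cov}(Z, Y)}{\operatorname{Cov}(Z, X)}
      = \frac{E[Y \mid Z = 1] - E[Y \mid Z = 0]}{E[X \mid Z = 1] - E[X \mid Z = 0]}
```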
Heroic Assumption #1 (cont.)

Might not be so heroic if the intervention is very simple and nearly instantaneous.
Then a binary measure of fidelity might be the unique plausible choice.
Or, if the intervention is purely unidimensional, perhaps a uniquely plausible ordinal measure of fidelity could be developed.

Heroic Assumption #1 (cont.)
The little-recognized but ironic kicker is that even if you make this assumption, the formal hypothesis tests for fidelity-adjusted intervention effectiveness based on the IV approach yield the same star pattern as the original intent-to-treat (ITT) analysis.
The point estimate will be altered, but if the ITT analysis found no statistically significant treatment effect, an IV analysis with randomization as the instrumental variable will yield the same finding.
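A minimal sketch of why the stars do not move, assuming a single 0/1 randomization indicator `z`, a fidelity score `d`, and an outcome `y`, and ignoring clustering (a real GRT analysis would work at the cluster level). The Wald IV estimate is the ITT difference divided by the first-stage difference in fidelity, so it is zero exactly when the ITT difference is zero; the Anderson-Rubin test of "no fidelity-adjusted effect" is the same regression test as "no ITT effect". The data and the function name `wald_iv` are hypothetical.

```python
from statistics import mean


def wald_iv(z, d, y):
    """Wald (ratio) IV estimate of the effect of fidelity d on outcome y.

    z: 0/1 random assignment (the instrument); d: fidelity score; y: outcome.
    Returns (itt, first_stage, iv_estimate). Clustering is ignored.
    """
    y_treat = [yi for zi, yi in zip(z, y) if zi == 1]
    y_ctrl = [yi for zi, yi in zip(z, y) if zi == 0]
    d_treat = [di for zi, di in zip(z, d) if zi == 1]
    d_ctrl = [di for zi, di in zip(z, d) if zi == 0]

    itt = mean(y_treat) - mean(y_ctrl)           # reduced form: intent-to-treat effect
    first_stage = mean(d_treat) - mean(d_ctrl)   # how much assignment shifts fidelity
    if first_stage == 0:
        raise ValueError("assignment does not shift fidelity; IV is undefined")
    return itt, first_stage, itt / first_stage   # IV just rescales the ITT contrast


# Hypothetical data: 4 treated and 4 control units.
z = [1, 1, 1, 1, 0, 0, 0, 0]
d = [1.0, 0.5, 0.5, 1.0, 0.0, 0.0, 0.0, 0.0]
y = [12, 11, 10, 11, 10, 9, 10, 11]
itt, first_stage, iv = wald_iv(z, d, y)
print(itt, first_stage, iv)
```

Here the ITT difference is 1 and assignment raises mean fidelity by 0.75, so the IV point estimate is about 1.33: larger, but built on exactly the same contrast in outcomes.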
Heroic Assumption #2
If one relies upon the adequacy of covariate measurement, one quickly runs up against sample size problems.
A typical group randomized trial will have only a few dozen intervenors per arm (maybe just one or two dozen, and I have seen fewer than one dozen).

Heroic Assumption #2 (cont.)
If we agree that it would probably take on the order of 30 covariates to fully explain why some intervenors are more faithful than others (the propensity scoring approach) or more effective than others (the ANCOVA approach), then we need on the order of 1,000 intervenors before we even consider interactions among the covariates.
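The arithmetic behind "before we even consider interactions" can be sketched as follows. The slide's own figures (30 covariates, on the order of 1,000 intervenors) imply roughly 33 intervenors per covariate coefficient; carrying that ratio over to a model with all two-way interactions is our extrapolation, not a rule from the presentation. The function names are hypothetical.

```python
from math import comb

COVARIATES = 30
INTERVENORS_FOR_MAIN_EFFECTS = 1_000   # order-of-magnitude figure from the slide


def coefficient_count(k, pairwise=False):
    """k main-effect coefficients, plus C(k, 2) two-way interactions if requested."""
    return k + (comb(k, 2) if pairwise else 0)


def intervenors_needed(k, pairwise=False):
    """Scale the slide's implied intervenors-per-coefficient ratio to a larger model."""
    per_coefficient = INTERVENORS_FOR_MAIN_EFFECTS / COVARIATES   # about 33
    return coefficient_count(k, pairwise) * per_coefficient


print(coefficient_count(COVARIATES))                          # 30
print(coefficient_count(COVARIATES, pairwise=True))           # 465
print(round(intervenors_needed(COVARIATES, pairwise=True)))   # 15500
```

On this rough scaling, admitting pairwise interactions pushes the requirement to about 15,500 intervenors, against the few dozen per arm typical of a group randomized trial.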

Heroic Assumption #2 (cont.)
However, instrument designers generally have no clue how to design intervenor background questionnaires that would explain intervenor fidelity.
And if we knew how to measure intervenor effectiveness, then the entire experiment would be unnecessary.
CASE STUDY
CLIO
Randomized field trial of curricula for Even Start centers
5-arm study: 4 active arms, 1 control
Three fidelity measurements:
Local Even Start center director
Curriculum designer
Neutral observer
Fidelity Instrument Development

Several of the top national experts in the evaluation of early education interventions designed the neutral observer instruments and training.
Curriculum designers were consulted.
Curriculum designers had ongoing contact with intervenors through technical assistance contracts.
Correlations Between Developer and Observer Fidelity Ratings

Across 96 active projects for the early childhood curriculum: 0.48 in year 1; 0.39 in year 2
Across 48 active projects for the parenting curriculum: 0.10 in year 1; -0.01 in year 2
Relationship between developer-rated fidelity and emergent child English literacy (arm A2)
[Figure: scatterplot; x-axis: fidelity; y-axis: outcome]
Relationship between developer-rated fidelity and emergent child English literacy (arm B2)
[Figure: scatterplot; x-axis: fidelity; y-axis: outcome]
Relationship between developer-rated fidelity and emergent child English literacy (arm A1)
[Figure: scatterplot; x-axis: fidelity; y-axis: outcome]
Relationship between developer-rated fidelity and emergent child English literacy (arm B1)
[Figure: scatterplot; x-axis: fidelity; y-axis: outcome]
Relationship between developer-rated fidelity and emergent child English literacy (control)
[Figure: scatterplot; x-axis: fidelity; y-axis: outcome]
Relationship between observer-rated fidelity and emergent child English literacy (arm A2)
[Figure: scatterplot; x-axis: fidelity; y-axis: outcome]
Relationship between observer-rated fidelity and emergent child English literacy (arm B2)
[Figure: scatterplot; x-axis: fidelity; y-axis: outcome]
Relationship between observer-rated fidelity and emergent child English literacy (arm A1)
[Figure: scatterplot; x-axis: fidelity; y-axis: outcome]
Relationship between observer-rated fidelity and emergent child English literacy (arm B1)
[Figure: scatterplot; x-axis: fidelity; y-axis: outcome]
Relationship between observer-rated fidelity and emergent child English literacy (control)
[Figure: scatterplot; x-axis: fidelity; y-axis: outcome]
Methods and Results

Multiplied arm indicators by fidelity scores (constrained to lie between 0 and 1) in a multilevel model.
Generally similar results.
Fidelity-adjusted estimates not always larger than ITT estimates!
Two more stars: one positive, one negative!
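A minimal sketch of the fidelity-adjusted treatment variables described above, assuming fidelity is forced into [0, 1] by clipping (the slide does not say whether scores were clipped or rescaled). Each active arm gets its own regressor equal to its arm indicator times the project's fidelity, and control projects get zeros; these columns would then stand in for the plain arm indicators in the multilevel model, which is not reproduced here. The function names and example scores are hypothetical.

```python
ACTIVE_ARMS = ("A1", "A2", "B1", "B2")   # active CLIO arms shown on the slides


def clip_unit(x):
    """Force a fidelity score into [0, 1] (clipping is an assumption)."""
    return min(max(x, 0.0), 1.0)


def fidelity_adjusted_columns(arm, fidelity):
    """One regressor per active arm: indicator(arm == a) * clipped fidelity."""
    f = clip_unit(fidelity)
    return [f if arm == a else 0.0 for a in ACTIVE_ARMS]


# Hypothetical projects: (arm, raw fidelity score)
projects = [("A1", 0.8), ("B2", 1.2), ("control", 0.6), ("A2", -0.1)]
for arm, score in projects:
    print(arm, fidelity_adjusted_columns(arm, score))
```

Under the linear dose-response assumption this construction implies, each arm's coefficient is read as the effect at full fidelity (a score of 1), which is why the point estimates move relative to ITT without any guarantee that they move upward.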
Case Study Wrap Up

A lot of money was spent with little discernible return.
We still don't know how to develop good preschool curricula for Even Start projects.
Other voices
Peter Schochet, Mathematica Policy Research, in a recent IES white paper, final line: "Thus, these classroom practice mediators may be of little help in confirming the study's conceptual model and identifying teacher practices that are most associated with student learning gains."
Josh Angrist
"Instrumental Variables Methods in Experimental Criminological Research: What, Why, and How?" Journal of Experimental Criminology, 2004:
"Especially noteworthy is the fact that, in marked contrast with an unfortunate trend in education research, criminologists do not appear to have been afflicted with what social scientist Tom Cook (2001) calls 'sciencephobia.' This is a tendency to eschew rigorous quantitative research designs in favor of a softer approach that emphasizes process over outcomes."
