
The Hype and Futility of Measuring Implementation Fidelity (in GRTs)

David Judkins
Presentation at Evaluation 2009, Orlando

The Hype
"Effectiveness research is now at the point of sophistication wherein black-box outcomes studies are no longer acceptable." (Mowbray, Holter, Teague and Bybee, 2003)
89,900 Google hits on October 10, 2009, for the phrase "what works best for whom"

Lofty Goals
"What social programs, policies, and interventions work? For whom do they work, and under what conditions? And why do they work, or fall short?" (Preface to Learning More from Social Experiments, edited by Howard Bloom)
(34,000 Google hits on the book title)

And why do they work, or fall short?


Bloom expands on the question in an MDRC announcement of the book's publication:
"But, in the past, there have been questions that randomized experiments have not been able to address effectively. What component of a social policy made it successful?"

Can such questions be answered?


I will argue that the answer is generally negative.
Worse, attempting to answer it compromises the first objective of determining whether the intervention works at all.
(This is all in the context of group randomized trials.)

Thesis
- It is better not to attempt to measure fidelity in GRTs
- It is counterproductive to try to answer questions about efficacy (intervention effects under ideal conditions) in a trial designed to measure effectiveness (intervention effects under realistic conditions)
- Other forms of measurement of the intervention process, undertaken in the hope of learning more about alternative interventions, are also vain and wasteful

Outline
- Opportunities and perspectives
- The preconditions for useful fidelity measurement
- Operational challenges in fidelity measurement
- Statistical issues in the estimation of fidelity-adjusted intervention effectiveness
- A case study
- Another perspective

Opportunities
Many educational, social, and behavioral interventions are complex:
- Multidimensional
- Incorporate aspects of culturally accepted best practices (traditions and fads)
- Require the participation of trained intervenors and of intervention subjects over extended periods of time
- Can never be detailed enough to handle every eventuality

Cynic's Perspective
If there is failure:
- Blame the subjects
- Blame the intervenors
- Disqualify or discount the work of control intervenors who, by virtue of superior skill, appear to infringe on the developer's recipe, possibly merely by implementing culturally accepted best practices

Kaspar Hauser (Jeder für sich und Gott gegen alle, "Every Man for Himself and God Against All")


The wolf child, raised in isolation from most of humanity, would make an ideal foil for many educational and parenting interventions


The Thrifty Perspective


- It is urgent that an effective intervention be found
- Limited number of fresh ideas in circulation
- Limited dollars for research
- It would be nice to be able to learn, from an experiment designed to measure the impact of a complex intervention A3B7C2, what the effect of A1B22C4 would be

Preconditions for Fidelity Measurement


- A well-defined intervention
- A way of splitting an intervention into components that could be recombined in alternative strengths and mixtures
- Some theory about which aspects of the interactions between intervenor and subject are relevant and consequential

Operational Challenges
Choice of informant
- Subject
- Intervenor
- Trainer/senior intervenor adviser
- Neutral observer

How can fidelity measurement be made reliable, valid, and cost-effective?



Intervenor Informant
Likely to think they are doing just fine if asked to summarize their fidelity
[Figure: histograms of Project Director fidelity ratings (x-axis: rating, 1 to 5; y-axis: frequency) for four curricula: Let's Begin with the Letter People (ECE); Play & Learning Strategies (PALS) (PE); Partners for Literacy (ECE); Partners for Literacy (PE)]

Intervenor Informant (2)


- If asked to keep detailed logs, they will likely do a poor job
- For those who do a good job of detailed activity reporting, the reporting will probably detract from their effectiveness


Trainer/Advisor Informant
- Can have a vested interest
- Blind neither to treatment status nor to the outcome outlook
- Can read the writing on the wall and rate the intervenors with unfavorable average outcomes as having low fidelity, thereby protecting the fidelity-adjusted effectiveness of the intervention
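The third point can be made concrete with a small simulation. It is a hypothetical sketch, not CLIO data: it assumes intervenor-level mean outcomes that are pure noise (the intervention, and therefore fidelity, has no effect) and an assumed advisor who forms each fidelity rating partly from the outcomes already observed. The function name and the `peek_weight` parameter are illustrative inventions.

```python
import numpy as np


def simulate_outcome_driven_ratings(n_intervenors: int = 30,
                                    peek_weight: float = 0.8,
                                    seed: int = 1) -> float:
    """Return the fidelity-outcome correlation when a (hypothetical) rater
    partly bases fidelity ratings on observed average outcomes.

    Assumptions (illustrative only): outcomes are pure noise, so the true
    effect of the intervention, and of fidelity, is zero; `peek_weight` is
    the share of each rating driven by the observed outcome.
    """
    rng = np.random.default_rng(seed)
    mean_outcome = rng.normal(size=n_intervenors)  # no true effect anywhere
    rater_noise = rng.normal(size=n_intervenors)   # rater's independent impression
    fidelity = peek_weight * mean_outcome + (1 - peek_weight) * rater_noise
    return float(np.corrcoef(fidelity, mean_outcome)[0, 1])


print(round(simulate_outcome_driven_ratings(), 2))
```

Under these assumptions the ratings correlate strongly with outcomes, so a fidelity-adjusted analysis would appear to show that faithful implementation works, even though nothing works in the simulated data.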

Trainer/Advisor Informant (2)


Even if unbiased, how sound an opinion can be formed from initial training and occasional (often telephone) contact with intervenors?


Neutral Observer
- Very costly
- Need staff who fully understand the intervention model
- Need extensive training for consistent rating
- Usually need travel
- Results in strong pressure for additional clustering

Neutral Observer (2)


- The high costs of training, travel, and salary directly reduce power for the primary effectiveness research by reducing the subject sample size (for a fixed budget)
- Pressure for stronger clustering indirectly reduces power by reducing the number of intervenors and/or intervention sites
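The budget arithmetic behind these two points can be sketched with a standard large-sample approximation to the minimum detectable effect size (MDES) of a balanced two-arm group randomized trial with no covariates: MDES ≈ 2.8 × sqrt((ρ + (1 - ρ)/n) / (P(1 - P)J)), where J is the number of clusters, n the subjects per cluster, ρ the intraclass correlation, and P the share of clusters treated. All costs and the ICC below are hypothetical, as are the function names.

```python
import math


def mdes(n_clusters: int, n_per_cluster: int, icc: float,
         p_treated: float = 0.5, multiplier: float = 2.8) -> float:
    """Approximate minimum detectable effect size (outcome SD units) for a
    two-arm group randomized trial with no covariates.

    multiplier = 2.8 is the usual large-sample value for 80% power with a
    two-sided 5% test; a t-based multiplier is larger when clusters are few.
    """
    per_cluster_var = icc + (1.0 - icc) / n_per_cluster
    return multiplier * math.sqrt(
        per_cluster_var / (p_treated * (1.0 - p_treated) * n_clusters))


def clusters_affordable(budget: float, cost_per_cluster: float,
                        cost_per_subject: float, n_per_cluster: int) -> int:
    """Number of clusters (intervenors/sites) a fixed budget can pay for."""
    return int(budget // (cost_per_cluster + cost_per_subject * n_per_cluster))


# Hypothetical inputs, chosen only for illustration.
BUDGET, PER_CLUSTER, PER_SUBJECT, N, ICC = 2_000_000, 20_000, 500, 20, 0.15
OBSERVATION_COST = 10_000  # assumed extra cost per cluster for neutral observers

j_without = clusters_affordable(BUDGET, PER_CLUSTER, PER_SUBJECT, N)
j_with = clusters_affordable(BUDGET, PER_CLUSTER + OBSERVATION_COST, PER_SUBJECT, N)
print(j_without, round(mdes(j_without, N, ICC), 3))
print(j_with, round(mdes(j_with, N, ICC), 3))
```

With these assumed inputs, adding neutral-observer costs cuts the affordable clusters from 66 to 50 and raises the MDES from about 0.30 to about 0.35 standard deviations. Concentrating the same budget in fewer, larger clusters hurts in the same way, because the ρ term in the formula depends only on J and does not shrink as n grows.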

Statistical Issues
- Most advocates of fidelity measurement have unwarranted optimism about the ability of statisticians to do anything useful with the data
- Of course, one can always hunt for the statistician who will provide rosy promises of artful analyses

Statistical Issues (2)


The statistician who offers multilevel causal-path mediation analyses will be loved by many, but as Tukey said: "The data may not contain the answer. The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data."

The Best We (Statisticians and Econometricians) Can Offer


Requires heroic assumptions. Either:
- Randomization provides an instrumental variable for fidelity; or
- The collection of measured covariates is rich enough to render fidelity conditionally independent of potential outcomes


Heroic Assumption #1
- Can only capture the mediating role of one (one!) unidimensional summary of fidelity
- By definition, Z is an instrumental variable for the effect of X on Y if the only effect of Z on Y is through X
- In other words, one must be able to rule out a priori any effect of Z on Y that does not run through X
- In the context of fidelity-adjusted effect estimation, this means that there must be a unique plausible summarization of fidelity
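The exclusion restriction can be illustrated with a simulation of the Wald (ratio) IV estimator, using randomized assignment z as the instrument for a single fidelity summary x. Everything here is hypothetical: the data-generating process, the effect sizes, and the function name `wald_iv`. In the second case, assignment also affects outcomes through some aspect of the intervention that the one-number fidelity summary does not capture.

```python
import numpy as np


def wald_iv(z, x, y):
    """Wald (ratio) IV estimate with a binary instrument z:
    (difference in mean y by z) / (difference in mean x by z)."""
    z = np.asarray(z, dtype=bool)
    itt = y[z].mean() - y[~z].mean()          # intent-to-treat effect on y
    first_stage = x[z].mean() - x[~z].mean()  # effect of assignment on fidelity
    return itt / first_stage


rng = np.random.default_rng(0)
n = 200_000
z = rng.integers(0, 2, size=n)             # randomized assignment
x = 0.6 * z + rng.uniform(0, 0.4, size=n)  # hypothetical fidelity summary in [0, 1]
true_effect_of_x = 1.0

# Case 1: exclusion restriction holds (z affects y only through x).
y_ok = true_effect_of_x * x + rng.normal(size=n)
# Case 2: assignment also affects y through a channel the summary misses.
y_bad = y_ok + 0.3 * z

print(round(wald_iv(z, x, y_ok), 2), round(wald_iv(z, x, y_bad), 2))
```

In the first case the estimate recovers the assumed effect of 1.0; in the second it comes out near 1.5, because the IV estimator credits the unmeasured channel to fidelity. With a single instrument, nothing in the data signals that this has happened.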

Heroic Assumption #1 (cont.)


- Might not be so heroic if the intervention is very simple and nearly instantaneous
- Then a binary measure of fidelity might be the unique plausible choice
- Or, if the intervention is purely unidimensional, perhaps a uniquely plausible ordinal measure of fidelity could be developed


Heroic Assumption #1 (cont.)


- The little-recognized but ironic kick is that, even if you make this assumption, the formal hypothesis tests for fidelity-adjusted intervention effectiveness based on the IV approach yield the same star pattern as the original analysis
- The point estimate will be altered, but if the intention-to-treat (ITT) analysis found no statistically significant treatment effect, an IV analysis with randomization as the instrumental variable will yield the same finding
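A sketch of why, in standard notation that does not appear in the original slides: with binary randomized assignment Z and a single fidelity summary X, the IV (Wald) estimate is the ITT estimate divided by the first-stage difference in fidelity, so the null hypothesis of no effect is unchanged.

```latex
\hat\beta_{\mathrm{IV}}
  = \frac{\bar Y_{Z=1} - \bar Y_{Z=0}}{\bar X_{Z=1} - \bar X_{Z=0}}
  = \frac{\widehat{\mathrm{ITT}}}{\hat\Delta_X},
\qquad
H_0\colon \beta_{\mathrm{IV}} = 0
\;\iff\;
H_0\colon \mathrm{ITT} = 0
\quad \text{whenever } \Delta_X \neq 0 .
```

Dividing by a first-stage difference between 0 and 1 magnifies the point estimate in absolute value, but the test of β_IV = 0 is the test of ITT = 0: the Anderson-Rubin test at β = 0 is exactly the reduced-form (ITT) test, and the usual Wald-type IV test generally reaches the same verdict when the first stage is strong.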

Heroic Assumption #2
- If one relies on the adequacy of covariate measurement, one quickly runs up against sample size problems
- A typical group randomized trial will have only a few dozen intervenors per arm (maybe just one or two dozen; I have seen fewer than one dozen)


Heroic Assumption #2 (cont.)


If we agree that it would probably take on the order of 30 covariates to fully explain why some intervenors are more faithful than others (the propensity scoring approach) or more effective than others (the ANCOVA approach), then we need on the order of 1,000 intervenors before we even consider interactions among the covariates
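One way to see the sample size problem, independent of how the figure of 1,000 was arrived at: regress a pure-noise intervenor outcome on 30 pure-noise covariates. The function below is a hypothetical illustration, not an analysis from any trial.

```python
import numpy as np


def r_squared_on_noise(n_units: int, n_covariates: int, seed: int = 0) -> float:
    """R^2 from an OLS fit of a pure-noise outcome on pure-noise covariates.

    Outcome and covariates are independent draws, so any explanatory power
    is an artifact of having too few units relative to covariates.
    """
    rng = np.random.default_rng(seed)
    X = np.column_stack([np.ones(n_units),
                         rng.normal(size=(n_units, n_covariates))])
    y = rng.normal(size=n_units)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)  # least-squares fit
    resid = y - X @ beta
    return float(1.0 - resid @ resid / np.sum((y - y.mean()) ** 2))


print(round(r_squared_on_noise(24, 30), 3))    # about two dozen intervenors
print(round(r_squared_on_noise(1000, 30), 3))  # on the order of 1,000
```

With 24 intervenors and 30 covariates the fit is exact (an R² of 1) even though every variable is noise; with 1,000 intervenors the spurious R² is small, on the order of 30/1,000.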


Heroic Assumption #2 (cont.)


- However, instrument designers generally have no clue how to design intervenor background questionnaires that would explain intervenor fidelity
- And if we knew how to measure intervenor effectiveness, the entire experiment would be unnecessary


CASE STUDY


CLIO
- Randomized field trial of curricula for Even Start centers
- 5-arm study: 4 active, 1 control
- Three fidelity measurements:
  - Local Even Start center director
  - Curriculum designer
  - Neutral observer

Fidelity Instrument Development


- Several of the top national experts in the evaluation of early education interventions designed the neutral observer instruments and training
- Curriculum designers were consulted
- Curriculum designers had ongoing contact with intervenors through technical assistance contracts

Correlations Between Developer and Observer Fidelity Ratings


Across 96 active projects for early childhood curriculum:
- 0.48 in year 1
- 0.39 in year 2

Across 48 active projects for parenting curriculum:


- 0.10 in year 1
- -0.01 in year 2
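The uncertainty in these correlations can be gauged with the Fisher z transformation. This sketch assumes that projects are independent units and that the ratings are roughly bivariate normal; neither assumption is stated in the original, and the intervals are computed here from the reported r and n, not taken from the study.

```python
import math


def fisher_z_ci(r: float, n: int, z_crit: float = 1.96) -> tuple[float, float]:
    """Approximate 95% confidence interval for a Pearson correlation via the
    Fisher z transformation (assumes independent units, ~bivariate normal)."""
    z = math.atanh(r)
    se = 1.0 / math.sqrt(n - 3)
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)


for label, r, n in [("ECE, year 1", 0.48, 96), ("ECE, year 2", 0.39, 96),
                    ("Parenting, year 1", 0.10, 48), ("Parenting, year 2", -0.01, 48)]:
    lo, hi = fisher_z_ci(r, n)
    print(f"{label}: r = {r:+.2f}, 95% CI ({lo:+.2f}, {hi:+.2f})")
```

Under these assumptions, both parenting intervals comfortably include zero, while the early childhood intervals exclude zero but still imply that the two sets of raters share well under half their variance.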

Relationship between developer-rated fidelity and emergent child English literacy
[Figures: one scatterplot per arm (A2, B2, A1, B1, control); x-axis: developer-rated fidelity; y-axis: outcome]

Relationship between observer-rated fidelity and emergent child English literacy
[Figures: one scatterplot per arm (A2, B2, A1, B1, control); x-axis: observer-rated fidelity; y-axis: outcome]

Methods and Results


- Multiplied arm indicators by fidelity scores (constrained to lie between 0 and 1) in a multilevel model
- Generally similar results
- Fidelity-adjusted estimates not always larger than ITT estimates!
- Two more stars:
  - One positive
  - One negative!
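The regressor construction in the first point might look like the following. This is a hypothetical sketch: the deck does not say how scores were constrained to [0, 1], so a linear rescaling of an assumed 1-to-5 rating scale is used, the function name is invented, and the multilevel model itself is not shown.

```python
import numpy as np


def fidelity_weighted_arms(arm, fidelity, active_arms, scale_min, scale_max):
    """Build one regressor per active arm: arm indicator x fidelity in [0, 1].

    Assumption (hypothetical): raw ratings lie on a known scale
    [scale_min, scale_max] and are rescaled linearly, then clipped to [0, 1].
    """
    arm = np.asarray(arm)
    raw = np.asarray(fidelity, dtype=float)
    f01 = np.clip((raw - scale_min) / (scale_max - scale_min), 0.0, 1.0)
    return {a: (arm == a).astype(float) * f01 for a in active_arms}


arm = ["A1", "A2", "control", "A1", "B1"]
fidelity = [5, 3, 4, 1, 2]  # hypothetical 1-5 ratings
cols = fidelity_weighted_arms(arm, fidelity, ["A1", "A2", "B1", "B2"], 1, 5)
print(cols["A1"])
```

Each column would replace the corresponding 0/1 arm indicator, so control units (and units in other arms) get 0, and under this coding a coefficient can be read as the implied effect of an arm implemented at full fidelity relative to control.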

Case Study Wrap Up


- A lot of money spent with little discernible return
- We still don't know how to develop good preschool curricula for Even Start projects


Other voices
Peter Schochet, Mathematica Policy Research, in a recent IES white paper, final line: "Thus, these classroom practice mediators may be of little help in confirming the study's conceptual model and identifying teacher practices that are most associated with student learning gains."

Josh Angrist
"Instrumental Variables Methods in Experimental Criminological Research: What, Why, and How?" 2004. Journal of Experimental Criminology.
"Especially noteworthy is the fact that, in marked contrast with an unfortunate trend in education research, criminologists do not appear to have been afflicted with what social scientist Tom Cook (2001) calls 'sciencephobia.' This is a tendency to eschew rigorous quantitative research designs in favor of a softer approach that emphasizes process over outcomes."
