Assessing Intervention Fidelity in RCTs: Concepts and Methods

Assessing Intervention Fidelity in RCTs: Models, Methods and Modes of Analysis
David S. Cordray & Chris Hulleman, Vanderbilt University
Presentation for the IES Research Conference, Washington, DC, June 9, 2009

Overview
- Fidelity and Achieved Relative Strength: definitions, distinctions and illustrations
  - Conceptual foundation for assessing fidelity in RCTs
  - Achieved relative strength, a special case in RCTs
- Modes of analysis: approaches and challenges
- Chris Hulleman: Assessing implementation fidelity and achieved relative strength indices: the single core component case
- Questions and discussion

Distinguishing Implementation Assessment from the Assessment of Implementation Fidelity
Two ends on a continuum of intervention implementation/fidelity:
- A purely descriptive model: answering the question "What transpired as the intervention was put in place (implemented)?"
- Based on an a priori intervention model, with explicit expectations about implementation of program components:
  - Fidelity is the extent to which the realized intervention (tTx) is faithful to the pre-stated intervention model (TTx)
  - Infidelity = TTx - tTx
- Most implementation fidelity assessments involve descriptive and model-based approaches.

Dimensions of Intervention Fidelity
- Aside from agreement at the extremes, there is little consensus on what is meant by the term "intervention fidelity."
- Most frequent definitions:
  - True fidelity = adherence or compliance: program components are delivered/used/received as prescribed
    - With stated criteria for success or full adherence
    - The specification of these criteria is relatively rare
  - Intervention exposure: amount of program content, processes, activities delivered/received by all participants (aka receipt, responsiveness)
    - This notion is most prevalent
  - Intervention differentiation: the unique features of the intervention are distinguishable from other programs, including the control condition
    - A unique application within RCTs

Linking Intervention Fidelity Assessment to Contemporary Models of Causality
- Rubin's Causal Model:
  - The true causal effect of X is (YiTx - YiC)
  - RCT methodology is the best approximation to this true effect
  - In RCTs, the difference between conditions, on average, is the causal effect
- Fidelity assessment within RCTs entails examining the difference between causal components in the intervention and control conditions.
- Differencing causal conditions can be characterized as the achieved relative strength of the contrast.
  - Achieved Relative Strength (ARS) = tTx - tC
  - ARS is a default index of fidelity

[Figure: treatment strength (left axis, .00 to .45) and outcome (right axis, 50 to 100) for the planned (TTx, TC) and realized (tTx, tC) treatment and control conditions, with infidelity marked as the gap between planned and realized levels.]
- Expected Relative Strength = (0.40 - 0.15) = 0.25; Achieved Relative Strength = 0.15
- d with fidelity = (YT - YC) / sd pooled = (90 - 65) / 30 = 0.83
- d as achieved = (Yt - Yc) / sd pooled = (85 - 70) / 30 = 15 / 30 = 0.50

Why is this Important?
- Statistical conclusion validity
  - Unreliability of treatment implementation: variations across participants in the delivery/receipt of the causal variable (e.g., treatment)
  - Increases error and reduces the size of the effect; decreases the chances of detecting covariation
  - Results in a reduction in statistical power or the need for a larger study

The Effects of Structural Infidelity on Power
[Figure: statistical power at fidelity levels of .60, .80 and 1.0.]

Influence of Infidelity on Study-size
[Figure: required study size at fidelity levels of 1.0, .80 and .60.]

If That Isn't Enough...
- Construct validity: which is the cause, (TTx - TC) or (tTx - tC)?
  - Poor implementation: essential elements of the treatment are incompletely implemented.
  - Contamination: the essential elements of the treatment group are found in the control condition (to varying degrees).
  - Pre-existing similarities between T and C on intervention components.
- External validity: generalization is about (tTx - tC)
  - This difference needs to be known for proper generalization and future specification of the intervention components
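The arithmetic behind the treatment-strength figure and the study-size point can be sketched as follows. This is a minimal illustration rather than the presenters' analysis: the outcome means (90, 65, 85, 70), the pooled SD (30) and the planned strengths (0.40, 0.15) are the values shown on the slide, the function names are hypothetical, and the study-size ratio assumes the usual approximation that, for fixed power and alpha, the required sample size is proportional to 1/d^2.

```python
def standardized_difference(mean_tx, mean_c, sd_pooled):
    """Standardized mean difference d = (mean_tx - mean_c) / sd_pooled."""
    return (mean_tx - mean_c) / sd_pooled


def relative_strength(strength_tx, strength_c):
    """Difference in treatment strength between the two conditions."""
    return strength_tx - strength_c


# Values read from the slide.
d_with_fidelity = standardized_difference(90, 65, 30)   # planned contrast
d_achieved = standardized_difference(85, 70, 30)        # realized contrast
expected_rs = relative_strength(0.40, 0.15)             # TTx - TC

# Assumption: required n scales with 1 / d**2 at fixed power and alpha.
study_size_ratio = (d_with_fidelity / d_achieved) ** 2

print(f"d with fidelity: {d_with_fidelity:.2f}")
print(f"d as achieved:   {d_achieved:.2f}")
print(f"expected relative strength: {expected_rs:.2f}")
print(f"study-size ratio: {study_size_ratio:.2f}")
```

Under these assumptions, the attenuated effect (0.50 instead of 0.83) would call for a study nearly three times as large to keep the same power.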

So what is the cause? The achieved relative difference in conditions across components
[Figure: bar charts on a 0 to 10 scale across the components PD, Asmt and Diff Inst: (1) T planned versus T observed, with the gap labeled infidelity; (2) C planned versus C observed, with the gap labeled augmentation of control; (3) the difference in theory versus the difference as observed.]
PD = Professional Development; Asmt = Formative Assessment; Diff Inst = Differentiated Instruction

Review: Concepts and Definitions
[Figure: the treatment strength (.00 to .45) and outcome (50 to 100) diagram, annotated with true fidelity, positive infidelity, infidelity, intervention exposure, intervention differentiation, Tx contamination and augmentation of C; Achieved Relative Strength = .15.]

Some Sources and Types of Infidelity
- If delivery or receipt could be dichotomized (yes or no):
  - Simple fidelity involves compliers;
  - Simple infidelity involves "no shows" and crossovers.
- Structural flaws in implementing the intervention:
  - Missing or incomplete resources, processes
  - External constraints (e.g., snow days)
- Incomplete delivery of core intervention components
  - Implementer failures or incomplete delivery

A Tutoring Program: Variation in Exposure
- 4-5 tutoring sessions per week, 25 minutes each, 11 weeks
- Expectations: 44-55 sessions

Random assignment of students, by time:
Time       Average sessions delivered   Range
Cycle 1    47.7                         16-56
Cycle 2    33.1                         12-42
Cycle 3    31.6                         16-44

Variation in Exposure: Tutor Effects
[Figure: average number of tutoring sessions per tutor (0 to 50) for individual tutors 1 through 17.]
The other fidelity question: how faithful to the tutoring model is each tutor?

In Practice.

- Identify core components in the intervention group
  - e.g., via a Model of Change
- Establish benchmarks (if possible) for TTx and TC
- Measure core components to derive tTx and tC
  - e.g., via a Logic model based on the Model of Change
- Measurement (deriving indicators)
- Converted to Achieved Relative Strength and implementation fidelity scales
- Incorporated into the analysis of effects

What do we measure? What are the options?
(1) Essential or core components (activities, processes);
(2) Necessary, but not unique, activities, processes and structures (supporting the essential components of T); and
(3) Ordinary features of the setting (shared with the control group).
Focus on 1 and 2.

Fidelity Assessment Starts With a Model or Framework for the Intervention
From: Gamse et al. 2008

Core Reading Components for Local Reading First Programs (after Gamse et al. 2008)
- Design and Implementation of Research-Based Reading Programs
  - Use of research-based reading programs, instructional materials, and assessment, as articulated in the LEA/school application
  - Teacher professional development in the use of materials and instructional approaches
1) Teacher use of instructional strategies and content based on five essential components of reading instruction
2) Use of assessments to diagnose student needs and measure progress
3) Classroom organization and supplemental services and materials that support five essential components
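The "In Practice" steps (set a benchmark, measure what was delivered, convert the result to a fidelity scale) can be sketched with the tutoring exposure figures reported earlier. This is only an illustration under stated assumptions: the lower bound of the 44-55 session expectation is used as the benchmark TTx, fidelity is capped at 100%, and the names below are hypothetical rather than taken from the presentation.

```python
# Assumption: the lower bound of the expected 44-55 sessions is the benchmark (TTx).
BENCHMARK_SESSIONS = 44

# Average sessions delivered per cycle (tTx), as reported for the tutoring program.
observed_sessions = {"Cycle 1": 47.7, "Cycle 2": 33.1, "Cycle 3": 31.6}


def exposure_fidelity(observed, benchmark):
    """Return (fidelity proportion capped at 1.0, shortfall in sessions)."""
    fidelity = min(observed / benchmark, 1.0)
    shortfall = max(benchmark - observed, 0.0)  # infidelity = TTx - tTx
    return fidelity, shortfall


results = {
    cycle: exposure_fidelity(sessions, BENCHMARK_SESSIONS)
    for cycle, sessions in observed_sessions.items()
}

for cycle, (fidelity, shortfall) in results.items():
    print(f"{cycle}: fidelity = {fidelity:.0%}, shortfall = {shortfall:.1f} sessions")
```

A different benchmark (for example the midpoint of the expected range) would shift every value, which is one reason the benchmark has to be fixed before the data are examined.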

From Major Components to Indicators
[Figure: hierarchy running from major components (Reading Instruction, Support for Struggling Readers, Assessment, Professional Development) to subcomponents (Instructional Time, Instructional Material, Instructional Activities/Strategies), facets (Block, Actual Time) and indicators (Scheduled block?, Reported time).]

Reading First Implementation: Specifying Components and Operationalization

Components                            Sub-components                        Facets   Indicators (I/F)
Reading Instruction                   Instructional Time                    2        2 (1)
                                      Instructional Materials               4        12 (3)
                                      Instructional Activities/Strategies   8        28 (3.5)
Support for Struggling Readers (SR)   Intervention Services                 3        12 (4)
                                      Supports for Struggling Readers       2        16 (8)
                                      Supports for ELL/SPED                 2        5 (2.5)
Assessment                            Selection/Interpretation              5        12 (2.4)
                                      Types of Assessment                   3        9 (3)
                                      Use by Teachers                       1        7 (7)
Professional development              Improved Reading Instruction          11       67 (6.1)
Total: 4                              10                                    41       170 (4)
Adapted from Moss et al. 2008

Reading First Implementation: Some Results

                                                                 Performance levels
Components                 Subcomponents                         RF      Non-RF   ARSI (U3)
Reading Instruction        Instructional Time (minutes)          101     78       0.33 (63%)
                           Support                               79%     58%      0.50 (69%)
Struggling Readers         More Tx, Time, Supplemental Service   83%     74%      0.20 (58%)
Professional Development   Hours of PD                           41.5    17.6     0.42 (66%)
                           Five reading dimensions               86%     62%      0.55 (71%)
Assessment                 Grouping, progress, needs             84%     71%      0.32 (63%)
Average                                                                           0.39 (65%)
Adapted from Moss et al. 2008

So What Do I Do With All This Data?
- Start with:
  - Scale construction, aggregation over facets, subcomponents, components
- Use as:
  - Descriptive analyses
  - Explanatory (AKA exploratory) analyses
- There are a lot of options
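The U3 percentages reported next to each ARSI in the Reading First results are consistent with Cohen's U3, that is, the standard normal cumulative distribution evaluated at the standardized difference. The slides do not state how U3 was computed, so the sketch below rests on that assumption; the row labels and function name are illustrative.

```python
import math


def u3(standardized_difference):
    """Cohen's U3: standard normal CDF evaluated at the standardized difference."""
    return 0.5 * (1.0 + math.erf(standardized_difference / math.sqrt(2.0)))


# ARSI values as reported in the Reading First results table.
arsi = {
    "Instructional time": 0.33,
    "Support": 0.50,
    "Struggling readers": 0.20,
    "Hours of PD": 0.42,
    "Five reading dimensions": 0.55,
    "Assessment": 0.32,
}

# U3 as a whole-number percentage for each row.
u3_percent = {name: round(100 * u3(value)) for name, value in arsi.items()}
mean_arsi = sum(arsi.values()) / len(arsi)

for name, value in arsi.items():
    print(f"{name}: ARSI = {value:.2f}, U3 = {u3_percent[name]}%")
print(f"Mean ARSI = {mean_arsi:.2f}, U3 = {round(100 * u3(round(mean_arsi, 2)))}%")
```

Read this way, an ARSI of 0.50 means the average Reading First classroom sits at roughly the 69th percentile of the non-Reading First distribution on that component.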

In this section we describe a hierarchy of analyses, from higher to lower levels of causal inference.
Caveat: except for descriptive analyses, most approaches are relatively new and not fully tested.

Hierarchy of Approaches to Analysis
- ITT (Intent-to-treat) estimates (e.g., ES) plus:
  - an index of true fidelity: ES = .50, Fidelity = 96%
  - an index of Achieved Relative Strength (ARS). Hulleman's initial analysis: ES = 0.45, ARS = 0.92.
- LATE (Local Average Treatment Effect):
  - If treatment receipt/delivery can be meaningfully dichotomized and there is experimentally induced receipt or non-receipt of treatment: adjust the ITT estimate by the T and C treatment receipt rates.
  - The simple model can be extended to an Instrumental Variable Analysis (see Bloom's 2005 book).
- ITT retains causal status; LATE can approximate causal statements.

More on Fidelity to Outcome Linkages
- TOT (Treatment-on-Treated)
  - Simple: ITT estimate adjusted for compliance rate in Tx, no randomization.
  - Two-level linear production function, modeling the effects of implementation factors in Tx and modeling factors affecting C in separate Level 2 equations.
- Regression-based model, exchanging implementation fidelity scales for the treatment exposure variable.

Descriptive Analyses
- Fidelity is often examined in the intervention group only.
- Dose-response relationship
- Partition intervention sites into high and low implementation fidelity:
  - In my review of some ATOD prevention studies, ES_HIGH = 0.13 to 0.18 and ES_LOW = 0.00 to 0.03

Some Challenges
- Interventions are rarely clear;
- Measurement involves novel constructs;
- How should components be weighted, if at all?
- Fidelity assessment occurs at multiple levels;
- Fidelity indicators are used in 2nd and 3rd levels of HLM models, with few degrees of freedom;
- There is uncertainty about the psychometric properties of fidelity indicators; and
- The functional form relating fidelity and outcome measures is not always known.
But, despite these challenges, Chris Hulleman has a dandy example...

Assessing Implementation Fidelity in the Lab and in Classrooms: The Case of a Motivation Intervention

The Theory of Change
[Figure: theory-of-change diagram linking manipulated relevance, perceived utility value, interest and performance.]
Model adapted from: Eccles et al. (1983); Hulleman et al. (2009)

Methods (Hulleman & Cordray, 2009): laboratory and classroom studies

Sample
- Laboratory: N = 107 undergraduates
- Classroom: N = 182 ninth-graders, 13 classes, 8 teachers, 3 high schools
Task
- Laboratory: Mental multiplication technique
- Classroom: Biology, Physical Science, Physics
Treatment manipulation
- Laboratory: Write about how the mental math technique is relevant to your life.
- Classroom: Pick a topic from science class and write about how it relates to your life.
Control manipulation
- Laboratory: Write a description of a picture from the learning notebook.
- Classroom: Pick a topic from science class and write a summary of what you have learned.
Number of manipulations
- Laboratory: 1
- Classroom: 28
Length of study
- Laboratory: 1 hour
- Classroom: 1 semester
Dependent variable
- Perceived utility value (both studies)

Motivational Outcome
[Figure: bar chart of perceived utility value (axis from 3 to 6) for the control and treatment groups in the lab and in the classroom.]
- Lab: control = 4.78, treatment = 5.28; g = 0.45 (p = .03)
- Classroom: control = 3.56, treatment = 3.62; g = 0.05 (p = .67)

Fidelity Measurement and Achieved Relative Strength
- Simple intervention: one core component
- Intervention fidelity:
  - Exposure: quality of participant responsiveness
  - Rated on a scale from 0 (none) to 3 (high)
  - 2 independent raters, 88% agreement

Exposure by setting (laboratory and classroom):

Quality of        Laboratory C   Laboratory Tx   Classroom C   Classroom Tx
responsiveness    N      %       N      %        N      %      N      %
0                 47     100     7      11       86     96     38     41
1                 0      0       15     24       4      4      40     43
2                 0      0       29     46       0      0      14     15
3                 0      0       12     19       0      0      0      0
Total             47     100     63     100      90     100    92     100
Mean              0.00           1.73            0.04          0.74
SD                0.00           0.90            0.21          0.71

Indexing Fidelity
- Absolute
  - Compare observed fidelity (tTx) to the absolute or maximum level of fidelity (TTx)
- Average
  - Mean levels of observed fidelity (tTx)
- Binary
  - Yes/No treatment receipt based on fidelity scores
  - Requires selection of a cut-off value

Fidelity Indices

Index      Group   Conceptual            Laboratory                   Classroom
Absolute   Tx      (X̄Tx / TTx) x 100    (1.73 / 3.00) x 100 = 58%    (0.74 / 3.00) x 100 = 25%
           C       (X̄C / TC) x 100      (0.00 / 3.00) x 100 = 0%     (0.04 / 3.00) x 100 = 1%
Average    Tx      tTx = X̄Tx            1.73                         0.74
           C       tC = X̄C              0.00                         0.04
Binary     Tx      tTx = ntTx / nTx      41 / 63 = 0.65               14 / 92 = 0.15
           C       tC = ncTx / nC        0 / 47 = 0.00                0 / 90 = 0.00

Indexing Fidelity as Achieved Relative Strength
Intervention Strength = Treatment - Control
Achieved Relative Strength (ARS) Index:

$$\text{ARS Index} = \frac{\bar{t}_{Tx} - \bar{t}_{C}}{S_T}$$

- Standardized difference in fidelity index across Tx and C
- Based on Hedges' g (Hedges, 2007)
- Corrected for clustering in the classroom (ICCs from .01 to .08)
- See Hulleman & Cordray (2009)

Average ARS Index

$$g = \underbrace{\left(\frac{\bar{X}_1 - \bar{X}_2}{S_T}\right)}_{\text{Group Difference}} \times \underbrace{\left(1 - \frac{3}{4(n_{Tx} + n_C) - 9}\right)}_{\text{Sample Size Adjustment}} \times \underbrace{\sqrt{1 - \frac{2(n - 1)p}{N - 2}}}_{\text{Clustering Adjustment}}$$

Where:
- $\bar{X}_1$ = mean for group 1 (tTx)
- $\bar{X}_2$ = mean for group 2 (tC)
- $S_T$ = pooled within-groups standard deviation
- $n_{Tx}$ = treatment sample size
- $n_C$ = control sample size
- $n$ = average cluster size
- $p$ = intra-class correlation (ICC)
- $N$ = total sample size

Absolute and Binary ARS Indices

$$g = \underbrace{\left(2\arcsin\sqrt{p_{Tx}} - 2\arcsin\sqrt{p_C}\right)}_{\text{Group Difference}} \times \underbrace{\left(1 - \frac{3}{4(n_{Tx} + n_C) - 9}\right)}_{\text{Sample Size Adjustment}} \times \underbrace{\sqrt{1 - \frac{2(n - 1)p}{N - 2}}}_{\text{Clustering Adjustment}}$$

Where:
- $p_{Tx}$ = proportion for the treatment group (tTx)
- $p_C$ = proportion for the control group (tC)
- $n_{Tx}$ = treatment sample size
- $n_C$ = control sample size
- $n$ = average cluster size
- $p$ = intra-class correlation (ICC)
- $N$ = total sample size
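The three fidelity indices and their ARS versions can be recomputed from the rating counts in the exposure table. The sketch below is an illustration under assumptions, not the presenters' code: the binary cut-off of 2 is inferred because it reproduces the 41/63 and 14/92 counts, the classroom ICC of 0.05 is an arbitrary value inside the reported .01 to .08 range, the average classroom cluster size of 14 is taken as 182 students over 13 classes, and all function names are hypothetical.

```python
import math

# Counts of students at each rating (0, 1, 2, 3) from the exposure table.
COUNTS = {
    ("lab", "C"): [47, 0, 0, 0],
    ("lab", "Tx"): [7, 15, 29, 12],
    ("class", "C"): [86, 4, 0, 0],
    ("class", "Tx"): [38, 40, 14, 0],
}
SCALE_MAX = 3  # maximum rating, used as the benchmark TTx
CUTOFF = 2     # assumption: ratings >= 2 count as "received treatment"


def describe(freqs):
    """Return (n, mean, sample SD) for counts indexed by rating."""
    n = sum(freqs)
    mean = sum(r * f for r, f in enumerate(freqs)) / n
    ss = sum(f * (r - mean) ** 2 for r, f in enumerate(freqs))
    return n, mean, math.sqrt(ss / (n - 1))


def adjustments(n_tx, n_c, cluster_size=1, icc=0.0):
    """Sample-size adjustment multiplied by the clustering adjustment."""
    small_sample = 1 - 3 / (4 * (n_tx + n_c) - 9)
    clustering = math.sqrt(1 - 2 * (cluster_size - 1) * icc / (n_tx + n_c - 2))
    return small_sample * clustering


def average_ars(tx, c, **kwargs):
    """Average ARS index: standardized mean difference with adjustments."""
    n_tx, m_tx, sd_tx = describe(tx)
    n_c, m_c, sd_c = describe(c)
    pooled = math.sqrt(
        ((n_tx - 1) * sd_tx**2 + (n_c - 1) * sd_c**2) / (n_tx + n_c - 2)
    )
    return (m_tx - m_c) / pooled * adjustments(n_tx, n_c, **kwargs)


def proportion_ars(p_tx, p_c, n_tx, n_c, **kwargs):
    """Absolute or binary ARS index: arcsine-transformed proportion difference."""
    diff = 2 * math.asin(math.sqrt(p_tx)) - 2 * math.asin(math.sqrt(p_c))
    return diff * adjustments(n_tx, n_c, **kwargs)


def absolute_index(freqs):
    """Observed mean as a proportion of the maximum rating."""
    return describe(freqs)[1] / SCALE_MAX


def binary_index(freqs):
    """Proportion of participants at or above the cut-off."""
    return sum(freqs[CUTOFF:]) / sum(freqs)


lab_tx, lab_c = COUNTS[("lab", "Tx")], COUNTS[("lab", "C")]
class_tx, class_c = COUNTS[("class", "Tx")], COUNTS[("class", "C")]

lab_average_g = average_ars(lab_tx, lab_c)
lab_absolute_g = proportion_ars(absolute_index(lab_tx), absolute_index(lab_c), 63, 47)
lab_binary_g = proportion_ars(binary_index(lab_tx), binary_index(lab_c), 63, 47)
# Assumed ICC of 0.05 and average cluster size of 14 for the classroom study.
class_average_g = average_ars(class_tx, class_c, cluster_size=14, icc=0.05)

print(f"Lab:   average g = {lab_average_g:.2f}, absolute g = {lab_absolute_g:.2f}, "
      f"binary g = {lab_binary_g:.2f}")
print(f"Class: average g = {class_average_g:.2f}")
```

Computed from the unrounded counts, the lab values land within about 0.02 of the reported indices (2.52, 1.72 and 1.88), and the classroom average index lands close to the reported 1.32; the small gaps plausibly reflect rounding of the inputs on the slides and the ICC actually used.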

Average ARS Index
[Figure: treatment strength diagram for the classroom study, with fidelity on a 0 to 3 scale (0 to 100%), TTx = 3, the observed Tx and C means, and infidelity marked as the gap from the planned level; Achieved Relative Strength = 1.32.]
- Xt - Xc = (0.74) - (0.04) = 0.70
- ARS g = (Xt - Xc) / sd pooled = (0.74 - 0.04) / 0.53 = 1.32

Achieved Relative Strength Indices

Index      Observed fidelity   Lab    Class   Lab - Class
Absolute   Tx                  0.58   0.25
           C                   0.00   0.01
           g                   1.72   0.80    0.92
Average    Tx                  1.73   0.74
           C                   0.00   0.04
           g                   2.52   1.32    1.20
Binary     Tx                  0.65   0.15
           C                   0.00   0.00
           g                   1.88   0.80    1.08

Linking Achieved Relative Strength to Outcomes

Sources of Infidelity in the Classroom
- Student behaviors were nested within teacher behaviors
  - Teacher dosage
  - Frequency of student exposure
- Student and teacher behaviors were used to predict treatment fidelity (i.e., quality of responsiveness/exposure).

Sources of Infidelity: Multi-level Analyses
Part I: Baseline Analyses
- Identified the amount of residual variability in fidelity due to students and teachers.
- Due to missing data, we estimated a 2-level model (153 students, 6 teachers)
Student: $Y_{ij} = b_{0j} + b_{1j}(\text{TREATMENT})_{ij} + r_{ij}$
Teacher: $b_{0j} = \gamma_{00} + u_{0j}$, $b_{1j} = \gamma_{10} + u_{10j}$

Sources of Infidelity: Multi-level Analyses
Part II: Explanatory Analyses

- Predicted residual variability in fidelity (quality of responsiveness) with frequency of responsiveness and teacher dosage
Student: $Y_{ij} = b_{0j} + b_{1j}(\text{TREATMENT})_{ij} + b_{2j}(\text{RESPONSE FREQUENCY})_{ij} + r_{ij}$
Teacher: $b_{0j} = \gamma_{00} + u_{0j}$
         $b_{1j} = \gamma_{10} + b_{10}(\text{TEACHER DOSAGE})_j + u_{10j}$
         $b_{2j} = \gamma_{20} + b_{20}(\text{TEACHER DOSAGE})_j + u_{20j}$

Sources of Infidelity: Multi-level Analyses

                      Baseline model                   Explanatory model
Variance component    Residual variance   % of total   Variance    % reduction
Level 1 (Student)     0.15437*            52           0.15346*    < 1
Level 2 (Teacher)     0.13971*            48           0.04924     65*
Total                 0.29408                          0.20270
* p < .001.

Case Summary
- The motivational intervention was more effective in the lab (g = 0.45) than in the field (g = 0.05).
- Using 3 indices of fidelity and, in turn, achieved relative treatment strength, revealed that:
  - Classroom fidelity < Lab fidelity
  - Achieved relative strength was about 1 SD less in the classroom than in the laboratory
- Differences in achieved relative strength = differences in motivational outcome, especially in the lab.
- Sources of fidelity: teacher (not student) factors

Key Points and Issues
- Identifying and measuring, at a minimum, should include model-based core and necessary components
- Collaboration among researchers and practitioners (e.g., developers and implementers) is essential for specifying:
  - Intervention models
  - Core and essential components
  - Benchmarks for TTx (e.g., an educationally meaningful dose; what level of X is needed to instigate change)
  - Tolerable adaptation

Key Points and Issues
- Fidelity assessment serves two roles:
  - Average causal difference between conditions; and
  - Using fidelity measures to assess the effects of variation in implementation on outcomes.
- Post-experimental (re)specification of the intervention

Thank You
Questions and Discussion
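The percentages in the variance-component table of the multi-level analysis follow directly from the reported variances. A minimal sketch of that arithmetic is given here for readers who want to check or reuse it; the variable and function names are illustrative and not taken from the presentation.

```python
# Variance components as reported for the baseline and explanatory models.
baseline = {"student": 0.15437, "teacher": 0.13971}
explanatory = {"student": 0.15346, "teacher": 0.04924}


def share_of_total(components):
    """Proportion of total variance located at each level."""
    total = sum(components.values())
    return {level: value / total for level, value in components.items()}


def proportional_reduction(before, after):
    """Proportion of each level's baseline variance explained by the predictors."""
    return {level: (before[level] - after[level]) / before[level] for level in before}


shares = share_of_total(baseline)
reduction = proportional_reduction(baseline, explanatory)

for level in baseline:
    print(f"{level}: {shares[level]:.0%} of baseline variance, "
          f"{reduction[level]:.1%} reduction in the explanatory model")
```

The teacher-level share of baseline variance is the intraclass correlation for fidelity, and the large reduction at that level is what underlies the conclusion that teacher, not student, factors were the main source of infidelity.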
