Research Synopsis: How Can One Identify "Scientifically-Based Research"?
In December 2003, the Institute of Education Sciences (IES), a part of the U.S. Department of Education, published Identifying and Implementing Educational Practices Supported by Rigorous Evidence: A User Friendly Guide. In the executive summary, the purpose of this guide is defined as:
The field of K-12 education contains a vast array of educational interventions-such as reading and math curricula, school-wide reform programs, after-school programs, and new educational technologies-that claim to be able to improve educational outcomes and, in many cases, to be supported by evidence. This evidence often consists of poorly-designed and/or advocacy-driven studies. State and local education officials and educators must sort through a myriad of such claims to decide which interventions merit consideration for their schools and classrooms. Many of these practitioners have seen interventions, introduced with great fanfare as being able to produce dramatic gains, come and go over the years, yielding little in the way of positive and lasting change-a perception confirmed by the flat achievement results over the past 30 years in the National Assessment of Educational Progress long-term trend.
The federal No Child Left Behind Act of 2001, and many federal K-12 grant programs, call on educational practitioners to use 'scientifically-based research' to guide their decisions about which interventions to implement. As discussed below, we believe this approach can produce major advances in the effectiveness of American education. Yet many practitioners have not been given the tools to distinguish interventions supported by scientifically-rigorous evidence from those which are not. This Guide is intended to serve as a user-friendly resource that the education practitioner can use to identify and implement evidence-based interventions, so as to improve educational and life outcomes for the children they serve (p. iii).
The text of this article summarizes the main features of what the IES considers scientifically based research. The complete guide, including a checklist for evaluating whether an intervention (defined as "an educational practice, strategy, curriculum, or program," p. 1) is supported by rigorous evidence (Appendix B), can be found at http://www.ed.gov/rschstat/research/pubs/rigorousevid/rigorousevid.pdf . Web sites that list evidence-based interventions can be found in Appendix A of the report.
How to Evaluate if an Intervention is Backed by "Strong" Evidence of Effectiveness
Part of the difficulty in choosing programs, products, and strategies that are supported by strong evidence of effectiveness lies in defining the components of scientifically based research. Many interventions have been designed from a research base, and many claim success in educating students. Yet which should be trusted? Which should be discounted as marketing hype or part of a political agenda?
The IES considers randomized controlled trials (defined on p. 1 as "studies that randomly assign individuals to an intervention group or to a control group ... in order to measure the effects of the intervention") to be the "gold standard" for evaluating an intervention's effectiveness. IES provides an example: "Suppose you want to test, in a randomized controlled trial, whether a new math curriculum for third-graders is more effective than your school's existing math curriculum for third-graders. You would randomly assign a large number of third-grade students to either an intervention group, which uses the new curriculum, or to a control group, which uses the existing curriculum. You would then measure the math achievement of both groups over time. The difference in math achievement between the two groups would represent the effect of the new curriculum compared to the existing curriculum" (p. 1). In much the same way as randomized controlled trials are used in medicine, welfare and employment policy, and psychology, these techniques can be used in education.
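The third-grade example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names (`assign_groups`, `estimated_effect`), the student identifiers, and the simulated scores (including the 5-point boost given to the intervention group) are all hypothetical and do not come from the guide.

```python
import random
from statistics import mean


def assign_groups(student_ids, seed=None):
    """Randomly split students into an intervention group and a control group."""
    rng = random.Random(seed)
    shuffled = list(student_ids)
    rng.shuffle(shuffled)  # chance alone decides who lands in which group
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


def estimated_effect(scores, intervention, control):
    """Difference in mean achievement between the two groups."""
    return mean(scores[s] for s in intervention) - mean(scores[s] for s in control)


# Hypothetical roster of 200 third-graders
students = [f"student_{i}" for i in range(200)]
intervention, control = assign_groups(students, seed=42)

# Simulated end-of-year math scores, invented for illustration only:
# every student draws from the same distribution, and the intervention
# group receives an assumed 5-point boost from the new curriculum.
score_rng = random.Random(0)
intervention_set = set(intervention)
scores = {
    s: score_rng.gauss(70, 10) + (5 if s in intervention_set else 0)
    for s in students
}

effect = estimated_effect(scores, intervention, control)
print(f"Estimated effect of the new curriculum: {effect:.1f} points")
```

Because assignment is random, the two groups should be similar on average in everything except the curriculum they receive, which is why the difference in means can be read as an estimate of the curriculum's effect.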
Randomized controlled trials make it possible to evaluate the effect of the intervention itself, rather than the effects of other factors that could influence outcomes. If a randomized controlled trial is designed and implemented properly, it is "superior to other study designs in measuring an intervention's true effect" (p. 2). A pre-post study design (defined as a "study that examines whether participants in an intervention improve or regress during the course of the intervention, and then attributes any such improvement or regression to the intervention") cannot answer whether "the participants' improvement or decline would have occurred anyway, even without the intervention" (p. 2). A study of the federally funded "Even Start" program illustrates the advantages of randomized controlled trials over a pre-post study design:
A randomized controlled trial of Even Start-a federal program designed to improve the literacy of disadvantaged families-found that the program had no effect on improving the school readiness of participating children at the 18th-month follow-up. Specifically, there were no significant differences between young children in the program and those in the control group on measures of school readiness including the Picture Peabody Vocabulary Test (PPVT) and PreSchool Inventory. If a pre-post design rather than a randomized design had been used in this study, the study would have concluded erroneously that the program was effective in increasing school readiness. This is because both the children in the program and those in the control group showed improvement in school readiness during the course of the program (e.g., both groups of children improved substantially in their national percentile ranking on the PPVT). A pre-post study would have attributed the participants' improvement to the program whereas in fact it was the result of other factors, as evidenced by the equal improvement for children in the control group. (p. 2)
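The contrast between the two designs can be made concrete with a small sketch. The percentile ranks below are invented for illustration and are not Even Start data; the function names are likewise hypothetical. The pre-post estimate looks only at the program group's gain, while the controlled estimate subtracts the gain the control group made without the program.

```python
from statistics import mean

# Hypothetical national percentile ranks (NOT actual Even Start results)
program_pre = [20, 25, 30, 22, 28]
program_post = [30, 36, 39, 33, 37]
control_pre = [21, 24, 31, 23, 26]
control_post = [31, 35, 40, 34, 35]


def pre_post_estimate(pre, post):
    """Average gain over the program period, all of it credited to the program."""
    return mean(post) - mean(pre)


def controlled_estimate(prog_pre, prog_post, ctrl_pre, ctrl_post):
    """Program group's gain minus the gain the control group made anyway."""
    return pre_post_estimate(prog_pre, prog_post) - pre_post_estimate(ctrl_pre, ctrl_post)


naive = pre_post_estimate(program_pre, program_post)
controlled = controlled_estimate(program_pre, program_post, control_pre, control_post)

print(f"Pre-post estimate of the program's effect: {naive} percentile points")
print(f"Estimate using the control group: {controlled} percentile points")
```

In these made-up numbers both groups gain the same amount, so the pre-post design credits the program with a 10-point improvement while the comparison with the control group shows no effect at all.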
Other important features to consider when assessing a study that evaluates an intervention include:
- "The study should clearly describe (i) the intervention, including who administered it, who received it, and what it cost; (ii) how the intervention differed from what the control group received; and (iii) the logic of how the intervention is supposed to affect outcomes."
- "Be alert to any indication that the random assignment process may have been compromised." Did any of the individuals assigned to one group move to the other group?
- "The study should provide data showing that there were no systematic differences between the intervention and control groups before the intervention." "Systematic differences" here refer to academic achievement levels, socioeconomic status, and language learning status.
- "The study should use outcome measures that are 'valid'-i.e., that accurately measure the true outcomes that the intervention is designed to affect." Well-known tests should be used. If the study is based on interviews or observations, the interviewers or observers should not know which participants are in the intervention group and which are in the control group.
- "The percent of study participants that the study has lost track of when collecting outcome data should be small, and should not differ between the intervention and control groups." As a general guideline, the IES says that fewer than 25% of the participants in each group should be lost to follow-up.
- "The study should collect and report outcome data even for those members of the intervention group who don't participate in or complete the intervention." It is possible that only the more motivated students decide to participate in any intervention, thus skewing the data.
- "The study should preferably obtain data on long-term outcomes of the intervention, so that you can judge whether the intervention's effects were sustained over time." The IES mentions that many intervention strategies lose effectiveness within 2-3 years.
- "If the study claims that the intervention improves one or more outcomes, it should report (i) the size of the effect, and (ii) statistical tests showing the effect is unlikely to be due to chance." The difference between the intervention and control groups on outcome measures should be "statistically significant," conventionally at the 0.05 level. This means that, if the intervention truly had no effect, there would be only a 1 in 20 chance of seeing a difference this large by chance alone.
- "A study's claim that the intervention's effect on a subgroup (e.g., Hispanic students) is different than its effect on the overall population in the study should be treated with caution." The effects reported might again be because of chance: "studies that engage in a post-hoc search for different subgroup effects (as some do) will sometimes turn up spurious effects rather than legitimate ones."
- "The study should report the intervention's effects on all the outcomes that the study measured, not just those for which there is a positive effect." As the guide states: "if a study measures a large number of outcomes, it may, by chance alone, find positive (and statistically-significant) effects on one or a few of those outcomes."
A longer and more detailed description of these features can be found on pages 5-9 of the guide.
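Two of the features above lend themselves to quick arithmetic checks: the attrition guideline and the warning about studies that measure many outcomes. The sketch below is a rough aid, not part of the guide. The function names are hypothetical, the 5-percentage-point tolerance for the gap between groups (`max_gap`) is an assumed value the guide does not specify, and the spurious-finding calculation assumes the outcomes are statistically independent.

```python
def attrition_rate(n_assigned, n_with_outcome_data):
    """Share of originally assigned participants with no outcome data."""
    return 1 - n_with_outcome_data / n_assigned


def attrition_ok(rate_intervention, rate_control, max_rate=0.25, max_gap=0.05):
    """Check the guideline: under 25% lost in each group, and similar loss in both.

    max_gap is an assumed tolerance chosen for illustration only.
    """
    both_small = rate_intervention < max_rate and rate_control < max_rate
    similar = abs(rate_intervention - rate_control) <= max_gap
    return both_small and similar


def chance_of_spurious_finding(n_outcomes, alpha=0.05):
    """Probability that at least one of n independent outcomes tests as
    significant at level alpha when the intervention has no true effect."""
    return 1 - (1 - alpha) ** n_outcomes


# Hypothetical study: 200 students assigned to each group
lost_intervention = attrition_rate(200, 170)
lost_control = attrition_rate(200, 176)
print("Attrition acceptable:", attrition_ok(lost_intervention, lost_control))

for k in (1, 5, 20):
    print(f"{k} outcomes measured: {chance_of_spurious_finding(k):.0%} chance of a spurious finding")
```

Under the independence assumption, a study that measures 20 outcomes has roughly a 64 percent chance of showing at least one "significant" effect even when the intervention does nothing, which is why the guide asks that effects on all measured outcomes be reported.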
Factors to Consider when Implementing an Intervention
The guide cites two important factors to consider when implementing an intervention in a school or district:
A) "Whether an evidence-based intervention will have a positive effect in your schools or classrooms may depend critically on your adhering closely to the details of its implementation." For example:
The Tennessee Class-Size Experiment-a large, multi-site randomized controlled trial involving 12,000 students-showed that a state program that significantly reduced class size for public school students in grades K-3 had positive effects on educational outcomes. For example, the average student in the small classes scored higher on the Stanford Achievement Test in reading and math than about 60 percent of the students in the regular-sized classes, and this effect diminished only slightly at the fifth-grade follow-up.
Based largely on these results, in 1996 the state of California launched a much larger, state-wide class-size reduction effort for students in grades K-3. But to implement this effort, California schools hired 25,000 new K-3 teachers, many with low qualifications. Thus the proportion of fully-credentialed K-3 teachers fell in most California schools, with the largest drop (16 percent) occurring in the schools serving the lowest-income students. By contrast, all the teachers in the Tennessee study were fully qualified. This difference in implementation may account for the fact that, according to preliminary comparison-group data, class-size reduction in California may not be having as large an impact as in Tennessee (p. 14).
B) "When implementing an evidence-based intervention, it may be important to collect outcome data to check whether its effects in your schools differ greatly from what the evidence predicts." This is especially important when considering intervention strategies for English language learners (ELLs). Was the intervention's success demonstrated in a school district with very few ELLs, or none at all?