When Statistical Significance Hides More Than It Reveals
by Jeanne M. Powers & Gene V. Glass - July 02, 2014
Background & Purpose: The What Works Clearinghouse (WWC) recently released a summary of five CREDO studies of charter school outcomes produced between 2009 and 2013. We compared the WWC’s summary, which highlighted the statistical significance of the findings, to the effect sizes reported in the individual reports. We also addressed the findings reported in the original studies that were not summarized in the WWC reports.
Research Design: Analytic essay highlighting the gaps between the findings reported in the summary WWC report, the individual WWC reports for each study, and the original CREDO studies.
Findings: We argue that focusing on statistical significance is potentially misleading. The WWC summary invites the reader to conclude that charter schools had a greater effect on students’ achievement gains than traditional public schools. Comparing the studies’ effect sizes suggests that the average effect of charter schools on students’ achievement gains is negligible. The WWC reports also do not address the considerable variation in achievement gains within and across subgroups of students and schools.
Conclusion: Summaries generated from research studies should provide an accounting of findings that allows practitioners to assess their practical importance. When these and similar reports are hard to understand and misleading, they run the risk of eroding practitioners’ trust in research and increasing rather than bridging the gulf between research and practice.
Table 1 is a simplified version of a table produced for the What Works Clearinghouse’s (WWC) Review of the Center for Research on Education Outcomes (CREDO) Charter School Studies (WWC, 2014a). The WWC review summarized the individual reviews of five studies examining charter school student outcomes. The original studies were produced by CREDO between 2009 and 2013. The WWC reviews (three quick reviews and two single study reviews) were released about six months after the release of each study. The last of these reviews and the summary were released in January 2014.
Table 1: WWC table summarizing statistically significant findings across five CREDO studies.
Source: Authors’ reproduction of the table from the WWC Review of the CREDO Charter School Studies (WWC, 2014a).
The table is surrounded by technical details about the studies summarized. But the power of the table does not lie in its fidelity to those details, although it is technically correct; its power lies in its simple visual appeal. In this research note we argue that this table can also provide important lessons about statistical and practical significance.
The goal of the WWC is to allow educators to make evidence-based decisions by identifying studies that provide credible and reliable evidence of the effectiveness of a given practice, program, or policy (WWC, n.d.). These are studies that have been vetted and summarized using protocols for assessing the research designs and findings of studies (WWC, 2014b). These summaries of what works are explicitly intended to help people working in schools figure out how to work more effectively with students and their families; these short summaries are presumably easier to access and understand than the full research reports.
Quick reviews provide preliminary reviews of analyses of program or practice effectiveness; single study reviews are more detailed reviews of studies that underwent quick review. Single study reviews are designed to provide education practitioners and policymakers with timely and objective assessments of the quality of the research evidence from recently released research papers and reports (WWC, 2012).1 The CREDO studies were reviewed by the WWC because they had received significant media attention (WWC, 2014a). They are also important because they are the largest in scope of the school choice studies in the WWC. The 14 other studies of school choice in the WWC tend to focus on specific settings (Milwaukee, the District of Columbia, New York City) or smaller samples of specific types of charter schools (KIPP schools, charter schools run by charter management organizations). Our discussion is focused on the WWC reviews, although we consulted the CREDO studies as needed to clarify our understanding of the WWC reviews (WWC, 2010a, 2010b, 2011, 2013, 2014a, 2014c).
Briefly, the CREDO studies all used the same technique: charter school students were matched to similar students attending feeder traditional public schools2 by grade level, baseline achievement, and demographic characteristics (eligibility for free or reduced-price lunch, special education status, gender, and race). The analyses compared the two groups’ achievement gains on standardized reading and math tests between the baseline year and the following year. The main difference between the studies was the scope of the analytic samples. Two were multi-state (a national study and a 16-state study), two focused on specific states (Indiana and New Jersey), and the final study was of New York City charter schools. The studies also varied by time frame and the grades included in the analyses.
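The matching logic described above can be sketched in a few lines of code. This is a deliberately simplified illustration of the general approach, not CREDO’s actual procedure; the data structure, field names, and the exact-match-plus-nearest-baseline rule are our assumptions for the sake of the example.

```python
# Illustrative sketch of a CREDO-style matched comparison (simplified; not CREDO's code).
# Each charter student is paired with a traditional public school (TPS) student from a
# feeder school who shares grade level and demographic profile and has the closest
# baseline score; the analysis then compares the two groups' year-over-year gains.
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Student:
    grade: int
    demographics: tuple  # e.g., (lunch_eligibility, sped_status, gender, race)
    baseline: float      # baseline-year test score
    followup: float      # following-year test score

def match(charter: Student, tps_pool: List[Student]) -> Optional[Student]:
    """Find the TPS student with the same grade and demographic profile
    and the closest baseline score (exact matching plus nearest baseline)."""
    candidates = [s for s in tps_pool
                  if s.grade == charter.grade
                  and s.demographics == charter.demographics]
    if not candidates:
        return None  # no comparable feeder-school student; drop from the analysis
    return min(candidates, key=lambda s: abs(s.baseline - charter.baseline))

def mean_gain_difference(charter_students: List[Student],
                         tps_pool: List[Student]) -> float:
    """Average of (charter gain - matched TPS gain) across matched pairs."""
    diffs = []
    for c in charter_students:
        m = match(c, tps_pool)
        if m is not None:
            diffs.append((c.followup - c.baseline) - (m.followup - m.baseline))
    return sum(diffs) / len(diffs)
```

The key design point this sketch captures is that the comparison is between gains, not levels: matching on baseline achievement is what makes the gain comparison meaningful, though, as discussed below, it cannot rule out differences on unobserved characteristics.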
Table 2 provides a summary of the effect sizes reported in the individual WWC reviews. For the purposes of accuracy, in Table 2 we reproduced much of the text from the individual reports and organized it into a chart similar to Table 1, paraphrasing when necessary to present the findings in table form and to facilitate comparison across the reports. Leave aside for the moment the possibility that all of these comparisons are confounded by a differential regression effect (Campbell & Stanley, 1966); one of the common reasons a student enrolls in a charter school is that he or she earned poor grades at a traditional public school.3
Table 2: Effect sizes across CREDO charter school studies
Source: Authors’ summary from WWC 2010a, 2010b, 2011, 2013, 2014c.
1. Retrieved February 21, 2014 from http://ies.ed.gov/ncee/wwc/SingleStudyReview.aspx?sid=220. A + indicates that charter school students had greater achievement gains after one year than their traditional public school counterparts. A - indicates that traditional public school students made greater achievement gains. The text above the table notes that all of the findings were statistically significant except for the negative math gains observed in the national study.
2. The results reported here are from the WWC’s analysis of CREDO’s first report on Indiana (2011), which covered the years between 2004 and 2008. In a second report covering the years between 2007 and 2011 (CREDO, 2012), the achievement gains for charter school students, while still positive, were smaller (.04 standard deviations higher than those of traditional public school students in reading and .04 in math).
3. The information about percentile rank equivalency for New Jersey is drawn from the table in Appendix C. We reworded the information about percentile rank equivalency in the New Jersey and New York reports to be consistent with the information presented in the WWC’s 16-state report (the first of the WWC’s five reviews).
4. The WWC report discussed here analyzed CREDO’s January 2010 report on charter school achievement in New York City, which analyzed achievement gains from 2003 to 2008. A more recent report (February 2013) assessed achievement gains from 2007 to 2011 and found that the average achievement gains for charter school students in reading were lower than those reported here (.03 standard deviations higher than those of traditional public school students), and the average achievement gains for charter school students in math were slightly higher (.14 standard deviations).
Table 2 lacks the visual impact of Table 1. Table 1 invites the reader to count the number of cells with a + and conclude that, on balance, charter schools had a much greater effect on students’ achievement gains than traditional public schools. Yet for more than 30 years it has been clear that counts of statistically significant and non-significant results violate some of the most fundamental properties of statistical hypothesis testing (Hedges & Olkin, 1980). Table 2 suggests that on average, the effect of charter schools on students’ achievement gains, positive or negative, was negligible. The strongest positive results for charter schools were reported for New York City students in mathematics. While positive, these effect sizes are among the lowest of those reported across the 15 studies of school choice in the WWC database that met WWC evidence standards.4 They are also well under the definition of a substantively important finding provided in the glossary of all WWC single study reviews: a substantively important finding has an effect size of .25 or greater, regardless of statistical significance (WWC, 2014a).5 By comparison, the typical effect size of a year’s growth in reading achievement at the elementary school level is about 1.0; in math, it is slightly larger (Levin, Glass & Meister, 1986, 1987). The latter figure allows us to interpret the practical significance of the studies’ findings more accurately because we can compare the effect sizes yielded across the studies to an appropriate benchmark: typical yearly achievement growth.
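The benchmarking argument above amounts to simple arithmetic, which can be made concrete in a short back-of-the-envelope sketch. The 0.04 effect size and the roughly 1.0 standard deviation benchmark for a year’s growth come from the text; the percentile conversion uses a standard normal approximation (the same logic behind the percentile rank equivalencies the WWC reviews report), and the function names are ours.

```python
# Back-of-the-envelope translation of effect sizes into more interpretable terms,
# using figures cited in the text: a charter effect of 0.04 standard deviations
# and a benchmark of about 1.0 SD for a typical year's growth in elementary reading.
from math import erf, sqrt

def normal_cdf(z: float) -> float:
    """Standard normal cumulative distribution function, via the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def share_of_yearly_growth(effect_size: float, yearly_growth_sd: float = 1.0) -> float:
    """Effect size expressed as a fraction of one year's typical growth."""
    return effect_size / yearly_growth_sd

def percentile_shift(effect_size: float) -> float:
    """Percentile rank an average (50th-percentile) student would move to,
    assuming normally distributed scores."""
    return 100.0 * normal_cdf(effect_size)
```

On these assumptions, an effect of 0.04 standard deviations is 4% of a year’s growth and would move an average student from the 50th to roughly the 52nd percentile, which illustrates why a finding can be statistically significant in a very large sample yet practically negligible.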
Finally, while the discussion above focuses on the findings that the WWC reported and assessed, these are a relatively small part of the analyses in the CREDO studies. The WWC reports focus largely, although not exclusively, on the average achievement gains of charter school students compared to traditional public school students across the full sample of students in each study. However, the CREDO studies also compare achievement gains within subgroups of students by race, special education status, and English language learner status, deciles of prior achievement, and the number of years enrolled in charter schools. Other analyses include assessments of charter school and traditional public school students’ achievement gains in different locales (the local markets charter schools serve, metropolitan areas, and states, depending on the scope of the report). These findings are substantially more complex and nuanced than the summary table might suggest; they are only minimally and inconsistently addressed in three of the five WWC reports and not addressed in two.6 Moreover, marked differences in achievement gains within groups of students or schools indicate that simple overall averages are not appropriate for understanding the effectiveness of a given practice, program, or policy (WWC, n.d.). In other words, the WWC summary and individual reports are largely silent on a key finding of the CREDO studies: charter school achievement effects vary considerably across settings and subgroups of students. These findings have important implications that practitioners and researchers need to consider.
While the table that is arguably the center of the WWC Review of the CREDO Charter School Studies is a technically correct summary of statistical significance, it provides potentially misleading visual cues that may overemphasize and exaggerate the success of charter schools, adding to the myth of their superiority (Thaler & Sunstein, 2009; see also Fischman & Tefera, 2014 and Berliner, Glass & Associates, 2014). This review is unique in the WWC database because it summarizes across multiple single study and quick reviews of studies using the same research design. It is also the starkest representation of an inconsistency in how WWC reports are presented to the public. Half of the 14 other studies of school choice summarized as either quick or single study reviews address the practical significance of the studies’ findings on each review’s web page, the easiest-to-access source of information about a given study. More detailed reports can be easily downloaded in PDF form. In the latter, the statistical significance of findings and effect sizes are reported in narrative form, much like the information presented in Table 2 above. In the WWC Review of the CREDO Charter School Studies, the effort to summarize across multiple studies resulted in an oversimplification that obscures more than it illuminates. Likewise, the summary report and the individual reviews largely do not address a significant aspect of all of the CREDO analyses: the variations in achievement gains across subgroups of students and different types of charter schools.
If summaries generated from research studies are intended to be useful guides to practitioners, they must provide a consistent and careful accounting of findings that allows practitioners to assess their practical importance. Summaries of research that are hard to understand and misleading run the risk of eroding practitioners’ trust in research and increasing rather than bridging the gulf between research and practice.
1. The WWC provides four types of review: (a) intervention guides that review analyses of specific interventions; (b) practice guides that are produced by an expert panel and are aimed at providing clear, research-based guidance for practitioners; (c) quick reviews; and (d) single study reviews.
2. Feeder schools are traditional public schools that had students transfer to one of the charter schools in the study sample.
3. All of the reviews provide a similar note of caution about the research design that alludes to the possibility of regression effects. These vary somewhat across the five reports. The most elaborate version states the following: “[U]nobserved differences between [charter school and traditional public school students] may have existed. For example, charter school students may have been more motivated to do well in school or may have had other unobserved characteristics that influenced student achievement. This means the study’s results do not necessarily isolate the effect of charter schools” (WWC, 2013). According to the WWC protocol, the highest rating that quasi-experimental studies such as the CREDO charter school studies can receive is “meets WWC group design standards with reservations” because even if groups are equivalent on observed characteristics, “there may be important differences between a treatment group and the comparison group on unobserved characteristics that may introduce bias into an estimate of the effect of the intervention” (WWC, 2014b, pp. 10-11). All five of the CREDO studies met the WWC’s standards with reservations.
4. As the WWC Review of the CREDO Charter School Studies noted, the effect sizes for the CREDO studies are not directly comparable to the effect sizes reported for other studies because the CREDO studies compare achievement gains between charter school students and traditional public school students whereas other studies compare the two groups on different outcomes (e.g., reading and math achievement, high school graduation, and college attendance).
5. While outside the scope of the discussion here, this standard is also problematic because it provides a normative standard for interpreting findings without a clear rationale (Konstantopoulos & Hedges, 2008; see also Lipsey et al., 2012). It is significant here because this is ostensibly the information a practitioner, the target audience for WWC reviews, would have at hand to assess the findings reported in single study reviews.
6. The WWC report of the 16-state study discusses the school-level comparisons (WWC, 2010), the Indiana report summarizes the decile comparison (WWC, 2011), and the New Jersey report highlights the findings for Newark (WWC, 2013).
Berliner, D. C., Glass, G. V., & Associates. (2014). 50 myths and lies that threaten America’s public schools: The real crisis in education. New York: Teachers College Press.
Campbell, D. T., & Stanley, J. C. (1966). Experimental and quasi-experimental designs for research. Chicago: Rand-McNally.
Center for Research on Education Outcomes. (2012, December 12). Charter school performance in Indiana. Retrieved from http://credo.stanford.edu/pdfs/IN_2012_FINAL_20130117nw.pdf
Center for Research on Education Outcomes. (2013, February 20). Charter school performance in New York City. Retrieved from http://credo.stanford.edu/documents/NYC_report_2013_FINAL_20130219_000.pdf
Fischman, G. E., & Tefera, A. A. (2014). Qualitative inquiry in an age of educationalese. Education Policy Analysis Archives, 22(7).
Hedges, L. V., & Olkin, I. (1980). Vote-counting methods in research synthesis. Psychological Bulletin, 88(2), 359-369.
Konstantopoulos, S., & Hedges, L. (2008). How large an effect can we expect from school reforms? Teachers College Record, 110(8), 1611-1638.
Levin, H. M., Glass, G. V., & Meister, G. R. (1986). The political arithmetic of cost-effectiveness analysis. Phi Delta Kappan, 68(1), 69-72.
Levin, H. M., Glass, G. V., & Meister, G. R. (1987). Different approaches to improving performance at school. Zeitschrift für Internationale Erziehungs- und Sozialwissenschaftliche Forschung, 3, 156-176.
Lipsey, M. W., Puzio, K., Yun, C., Hebert, M. A., Steinka-Fry, K., Cole, M. W., Roberts, M., Anthony, K. S., & Busick, M. D. (2012). Translating the statistical representation of the effects of education interventions into more readily interpretable forms (NCSER 2013-3000). Washington, DC: Institute of Education Sciences, U.S. Department of Education.
Thaler, R. H., & Sunstein, C. R. (2009). Nudge: Improving decisions about health, wealth, and happiness. New York: Penguin Books.
What Works Clearinghouse. (n.d.). Topics in education. Retrieved February 23, 2014, from http://ies.ed.gov/ncee/wwc/topics.aspx
What Works Clearinghouse. (2010a, February). WWC quick review of the report Multiple choice: Charter school performance in 16 States. Retrieved from http://ies.ed.gov/ncee/wwc/pdf/quick_reviews/charterschools_021710.pdf
What Works Clearinghouse. (2010b, July). WWC review of the report Charter School Performance in New York. Retrieved from http://ies.ed.gov/ncee/wwc/pdf/quick_reviews/nyccharter_070710.pdf
What Works Clearinghouse. (2011, September). WWC review of the report Charter School Performance in Indiana. Retrieved from http://ies.ed.gov/ncee/wwc/pdf/quick_reviews/incharter_093011.pdf
What Works Clearinghouse. (2013, October). WWC review of the report Charter School Performance in New Jersey. Retrieved from http://ies.ed.gov/ncee/wwc/pdf/single_study_reviews/wwc_njcharter_100113.pdf
What Works Clearinghouse. (2014a, January). WWC review of the CREDO charter school studies. Retrieved from http://ies.ed.gov/ncee/wwc/SingleStudyReview.aspx?sid=220.
What Works Clearinghouse. (2014b). Procedures and standards handbook, Version 3.0. Retrieved from http://ies.ed.gov/ncee/wwc/pdf/reference_resources/wwc_procedures_v3_0_standards_handbook.pdf
What Works Clearinghouse. (2014c, January). WWC review of the report National Charter School Study: 2013. Retrieved from http://ies.ed.gov/ncee/wwc/pdf/single_study_reviews/wwc_ncss_012814.pdf