
Dividing by Zero: Exploring Null Results in a Mathematics Professional Development Program

by Heather C. Hill, Douglas Lyman Corey & Robin T. Jacob, 2018

Background/Context: Since 2002, U.S. federal funding for educational research has favored the development and rigorous testing of interventions designed to improve student outcomes. However, recent reviews suggest that a large fraction of the programs developed and rigorously tested in the past decade have shown null results on student outcomes and, often, intermediate variables. Scholars reporting on null results often explain such results by citing factors they informally identified while they either delivered or observed the program.

Purpose/Objective/Research Question/Focus of Study: In this paper, we argue for a more systematic approach to examining null results, and develop a framework for evaluating null results based on the policy implementation literature. We then illustrate this approach by examining why one professional development program failed to show impacts on measures of teaching and student learning in a recent study.

Setting: The professional development program took place in a midsized urban school district on the East Coast. The provider was fully scaled up, capable of providing professional development in most U.S. locations.

Research Design: The main study of this program was conducted as a cluster randomized trial with 105 teachers in 18 schools. Here, we engage in a qualitative case study, using multiple sources of evidence to assess the likelihood that specific reasons for null results are valid.

Data Collection and Analysis: The case study sources of evidence include observations of professional development; teacher surveys and logs; transcribed videos of teachers’ mathematics instruction; and teacher interviews.
Findings/Results: Our analysis suggested that null impacts could result from district priorities and instructional guidance that compete with professional development goals; weaknesses in the intervention as well as its fit to teachers’ needs; and the difficulty of implementing ambitious instructional practice.

Conclusions/Recommendations: Our findings suggest the need for further elaboration of the null-results framework. They also suggest that professional development providers consider both (a) the organizations in which programs operate and (b) the fit of the program to teachers’ needs as potential barriers to success.

Since 2002, U.S. federal funding for educational research has favored both the development of interventions designed to improve student outcomes and the use of randomized trials to evaluate these interventions. To the surprise of many, a large proportion of the programs developed and tested in the past decade have shown null results on student outcomes and, often, intermediate variables. A recent report (Coalition for Evidence-Based Policy, 2013) found that of roughly 90 Institute of Education Sciences studies funded under contracts between 2002 and 2013, 88% produced weak or null results; this included many highly regarded and widely implemented programs evaluated in well-designed studies (e.g., Bos et al., 2012; Garet et al., 2008; Garet et al., 2011). These findings have met with a fair amount of dismay among scholars and educators but, to date, there has been little systematic inquiry regarding why such results might occur. Some have speculated that this outcome is consistent with those from other fields, where no-impact findings are common (Coalition for Evidence-Based Policy, 2013). It is also possible that methodological issues, such as weak statistical power for detecting effects (Schochet, 2008; Spybrook & Raudenbush, 2009), may contribute to these null results.
Other evidence suggests more substantive reasons for flat student outcomes, including prominently the inadequate implementation of major intervention components (Garet et al., 2008; Garet et al., 2011; Hurtig, 2009; Santagata, Kersting, Givvin, & Stigler, 2011). In this paper, we argue for developing a common approach to testing substantive reasons for null results and offer a potential framework for engaging in such analyses. We then demonstrate how we used this framework to test why a mathematics professional development program failed to influence measures of teaching practice or student outcomes. Although identifying the causes of a null result is a little like dividing zero by zero—results are always indeterminate owing to infinite possibilities—gaining purchase on potential explanations would allow programs to adapt and innovate.

BACKGROUND

Our argument for this paper centers around two related points: (1) the field of educational impact evaluation to date has lacked consistency in its investigations and interpretations of null results, and (2) a framework for conducting such investigations would be particularly helpful for cases in which programs failed to influence classroom practice and processes, a finding that we refer to here as implementation failure. To illuminate these points, we examined recent K–12 Institute of Education Sciences (IES) efficacy and scale-up awards, which evaluate programs either already in wide use or developed through a succession of prior IES field trials. Of grants made through 2011, about one in three had null results, with null results defined as having more than half of main student impacts either not statistically significant or quite small.^{1} An examination of these 10 null-result studies (see Table 1) shows that four list implementation as a possible cause of program failure.^{2} Evidence from a fifth also suggests a lack of effect on classroom processes (Murray, Rabiner, & Carrig, 2014).
In these papers, authors listed a variety of factors that may have contributed to implementation failure: a lack of teacher buy-in; a lack of principal or administrator support; other programs competing for teachers’ time and attention; teachers who lack the knowledge or skills needed to take advantage of the intervention; and, importantly, an insufficiently strong intervention. In addition to implementation issues, authors of null-results studies noted methodological issues, such as inadequate power to detect desired effects and the lack of a “bright” treatment contrast, as well as a program that was adequately implemented but potentially too weak to show impacts on student outcomes. A lack of implementation, however, was among the most common reasons for null findings.

Table 1. IES Efficacy and Scale-Up Studies With Null Results
However, Table 1 shows that the field’s investigations of null results have, to date, been far from consistent and systematic across studies. Most papers considered very few explanations for null results—between zero and two typically, with Santagata and colleagues’ (Santagata et al., 2011) and Gersten and colleagues’ (Gersten, Dimino, Jayanthi, Kim, & Santoro, 2010) discussions of null effects the exceptions to this rule. Importantly, as reflected in the ratio of explanations considered to explanations accepted, most articles only present information on the hypotheses authors consider viable, rather than raising a number of theoretically based considerations and then testing evidence for each. This leaves the field to wonder whether alternative explanations were explored and rejected, or not explored at all—a significant problem for future attempts to aggregate findings across studies. Relatedly, few articles show their work, so to speak, in the consideration of explanations for null results/lack of fidelity. For instance, Rimm-Kaufman and colleagues (2014) show strong fidelity as measured quantitatively (a two-to-three SD difference in classroom practice based on observations and student surveys), but in the conclusion, they discuss a lack of fidelity at some sites without further elaboration of the evidence for this point. This can be confusing to readers seeking to assess the strength of evidence for specific conclusions. These observations lead us to suggest that authors describing null results, and scholars in the field more generally, collaborate to create a framework for interpreting and investigating null results. Because other scholars have focused attention on methodological issues (Lemons, Fuchs, Gilbert, & Fuchs, 2014; Spybrook & Raudenbush, 2009), we focus in this paper on developing a framework to test more substantive issues for null results, with particular attention to identifying reasons for nonimplementation of programs.
We argue for basing this framework in the existing literature on null results (e.g., Table 1), as well as the more established scholarship on policy implementation. Over a period of 50 years, this latter literature has identified a number of major causes of program failure, making it potentially useful in the design of a method for interpreting null results from educational evaluations. We also integrate observations from the teacher professional development literature, which both complements the implementation findings and makes its lessons more education specific.

LESSONS FROM POLICY IMPLEMENTATION

In the early 1970s, the failure of many Johnson-era Great Society programs to deliver relief to impoverished communities spurred academics to inquire about the reasons for such disappointments; the result was a thriving literature on the perils of policy implementation (e.g., Pressman & Wildavsky, 1973; Sabatier & Mazmanian, 1979). By the mid-1980s, implementation scholars had identified more than 300 variables (O’Toole, 1986) responsible for the adaptation, transformation, or collapse of policy at the local level. Many of these variables related to a newly recognized class of agents—termed street-level bureaucrats (Lipsky, 1980)—responsible for implementing social policy in schools, police stations, public defenders’ offices, welfare-to-work programs, and elsewhere. These agents interact directly with policy “recipients,” tend to have wide discretion in the construction of policy and practice, and often work in situations with few resources and high demand for services. Early research showed that street-level bureaucrats’ attitudes, capacity, and practice profoundly shape the implementation of policy (Berman & McLaughlin, 1978). Over the next three decades, scholars both inside and outside education elaborated on these findings, arriving at several broad reasons policies and programs fail.
Street-level bureaucrats’ lack of willingness to implement policies and reforms constitutes one such reason. Such unwillingness can take several forms: selectively attending to policy by screening out unwanted messages; symbolic acceptance of policy and reform by making superficial changes to typical practice; and outright rejection of policy altogether (Berman, 1978; Meyer & Rowan, 1977). Education scholars examining teachers’ reactions to reform initiatives have documented all such forms of unwillingness, for instance: selective attention to data resulting from accountability policies (Terhart, 2013); superficial change in response to standards-based reforms (Spillane & Zeuli, 1999); and outright rejection of school reform, particularly among veteran teachers who have experienced multiple waves of such reform (Cuban, 1993; Olsen & Sexton, 2009). A review of the teacher learning literature (Goldsmith, Doerr, & Lewis, 2014) also suggests subtler individual-level interactions that may lead teachers to hesitate to implement a program; for instance, their relative confidence in the efficacy of their existing instruction versus their confidence in the efficacy of the program (see also Borko, Mayfield, Marion, Flexer, & Cumbo, 1997). As this shows, a lack of willingness to implement may constitute a rational response, as programs and policies may not address important teacher and student needs, may ask teachers to perform new activities with inadequate resources, and/or may require actions that teachers feel conflict with students’ best interests.

Implementers’ sensemaking around policy constitutes another reason for policy failure. Logically, street-level bureaucrats must understand what policy implies for practice before enacting that practice.
Scholars have observed that street-level bureaucrats’ understanding of policy can vary and that the “meaning” of policy is created not only from the actual words of legislation, but also from the knowledge and values implementers bring to their jobs and from the milieu in which implementation occurs (Yanow, 1996). In education, studies of the 1990s standards-based reforms suggested that teachers’ interpretation of policy differed in significant ways from policy intent (Hill, 2001; Spillane & Zeuli, 1999). Moreover, they differed in predictable ways. In a review of the role cognition plays in implementation, Spillane, Reiser, and Reimer (2002) noted that teachers were likely to implement reforms in light of existing frames of reference, leading many to interpret policies in conventional ways. They also noted that reforms requiring the fundamental reorganization of teacher knowledge and beliefs were less likely to succeed. Finally, work in this field focused on the role colleagues play in influencing implementers’ understanding of policy, particularly when teachers have collaborative opportunities in which they interpret and respond to policy (e.g., Coburn, 2001).

Another finding from this literature focuses on street-level bureaucrats’ insufficient resources to implement policy. In the wider implementation literature, resources are often construed as financial inputs and support from key stakeholders (Derthick, 1972; Sabatier & Mazmanian, 1979). In education, some scholars have argued that the most relevant resources are those closest to classroom practice—the supports teachers draw upon when enacting instruction. One such resource is thought to be teachers’ knowledge (Ball, 1990; Cohen, 1990; Heaton, 1992; Putnam, Heaton, Prawat, & Remillard, 1992). Teachers must have basic knowledge of the policy or program they are to enact.
In addition, they are likely to need deep knowledge of the content they are to teach, as well as knowledge of how students are likely to learn that content. Cavalluzzo et al. (2014) and Santagata et al. (2010) provide examples of current-day interventions that may have suffered for lack of teacher knowledge. A second resource for classroom practice is thought to be curriculum materials, particularly in cases where policies and programs call for very different forms of instruction from those typical in U.S. schools (Remillard, 2005; Stein, Remillard, & Smith, 2007). Such materials can support changes in day-to-day classroom activities, providing novel tasks and problems for students to work on, allowing for more conceptually based presentations of content, and supporting new forms of assessment and student learning.

A fourth finding from this literature focuses on organizations and the broader environmental milieu in which street-level bureaucrats work. Sandfort and Moulton (2015), for instance, describe organizations as occupying the “critical middle position” between policy and street-level bureaucrats; organizations interpret, translate, and operationalize the signals received from policies and programs and direct resources toward implementation. These organizations often exist in multilevel, multilayered networks, creating problems of coordination between agencies tasked with solving problems (Sandfort & Moulton, 2015). In education, Berman (1978) noted that policy implementation required changes in local organizational routines, rules, and processes, and that congruence between organizational goals and policy aims affected the probability of full implementation. Cuban (1993) argued that norms, such as long-held traditions about the role of the teacher and the organization of classrooms and schools, affect the fate of curricular and pedagogical reforms.
Wayne, Yoon, Zhu, Cronen, and Garet (2008) note that organizational factors such as competing professional development and mismatched curriculum materials can impede program success. Factors outside of implementers’ organizations—rival policies, political upheavals, and unexpected events—can distract organizations and individuals from intended reforms (McLaughlin, 1987, 1990). Finally, simple things matter. For instance, reviews by McLaughlin (1990) and Wilson (2013) both note that supportive administrators increase the likelihood of instructional improvement, and single-program studies by Santagata et al. (2010) and Wanless, Patton, Rimm-Kaufman, and Deutsch (2013) name lack of principal support as a key ingredient in their null results.

A fifth and more recently emergent explanation for implementation failure in education focuses on the relative difficulty of enacting ambitious instructional practice. Kennedy (2005) notes that what reformers typically want—instruction that yields more student intellectual engagement—may be unattainable because such practice is both highly demanding and exhausting for teachers. Cohen (2011) elaborates this further, analyzing the demands that ambitious, rather than conventional, teaching places on teachers and their students. The former calls for students to engage intellectually, often around difficult problems and tasks; this puts the onus on teachers to motivate learners to take on these more complex tasks and assignments. Once students are willing to engage, ambitious instruction calls for more real-time instructional responsiveness on the part of the teacher.
Such work requires deep and flexible knowledge of content, a willingness and capacity to heed learners’ ideas and to make use of them in instruction, skill in leveraging classroom discussions and novel tasks to attain learning goals, and decision-making expertise responsive to both the content and students’ needs (Cohen, 2011; Cohen & Barnes, 1993; Heaton, 2000; Lampert, 2001).

A final lesson from this research literature is that policy itself may be poorly designed, lack an adequate theory of action, and/or have failed to attract “fixers” or other interested advisors to oversee its implementation (Bardach, 1977; Pressman & Wildavsky, 1973; Sabatier & Mazmanian, 1980). In education, this corresponds to some observers’ notions that policies and educational interventions are often weak treatments, incapable of securing long-term change in organizational and individual behavior (McLaughlin, 1987).

These observations from extant null-results studies as well as the policy implementation literature form the framework for the analysis we undertake below. Although we cannot evaluate all potential explanations for the null findings, by using the structure provided by the extant literature and checking these hypotheses against our data, our work can contribute a first test of the framework itself and also develop hypotheses regarding how educational interventions might counter common problems in their environments. Recognizing that the policy implementation literature may not completely describe reasons for the failure of classroom interventions like the one we study, we also leave room for emergent hypotheses. Our effort is also consistent with Goldsmith, Doerr, and Lewis’s (2014) call for common and rigorous reporting practices in considerations of professional development.

METHODS

THE MATH SOLUTIONS PROFESSIONAL DEVELOPMENT PROGRAM

Math Solutions (www.mathsolutions.com) supplied the professional development for teachers in this study.
Founded by Marilyn Burns in 1984, Math Solutions is one of a handful of mathematics professional development programs with a national reach, using hand-selected, rigorously trained professional developers based in locations across the country. At the beginning of the study (2010), Math Solutions had provided professional development to over 600 districts in 48 states, reaching over 100,000 teachers. According to Math Solutions materials, four goals are central to the professional development—helping teachers (a) to learn more mathematics, (b) to understand how children learn math, (c) to use formative assessment to develop insight into what specific students know and do not know, and (d) to develop effective classroom instructional strategies. Specifically, our observations of the program suggest that it emphasizes instructional strategies that involve engaging, high-cognitive-demand tasks that allow students to develop their own solutions to mathematical problems (often in collaboration with other students), to participate in mathematical discussions, and to deepen their understanding of mathematical concepts. Math Solutions sessions included (a) teachers collaboratively exploring and solving the mathematics tasks they could choose to use in classrooms, (b) discussions about best practices in instruction, (c) videotapes of students solving challenging mathematics problems via novel strategies, (d) examples of and teacher practice in interview-based assessment techniques, and (e) planning sessions aimed at allowing teachers to integrate Math Solutions ideas with district instructional guidance. The program also provided teachers with supplemental curriculum materials, most often in the form of books with detailed lesson plans and written explanations of both the mathematics and how students might learn the mathematics of the lesson.
Setting and Participants

The professional development and research study were conducted in one midsize school district, Eastern, serving a racially and socioeconomically diverse population of over 30,000 students across 46 school locations. Approximately 60% of the district’s elementary schools received Title I funds, and all experienced relatively high rates of student and teacher turnover due to a nearby military presence; Eastern’s schools have also performed consistently below state averages in elementary mathematics. Eastern was nominated to participate in the study by Math Solutions, which had worked with teachers as well as many of the individuals in the district’s mathematics office roughly five years prior to the current project. Recruitment efforts focused on schools and teachers not already trained by Math Solutions. The study enrolled 105 fourth- and fifth-grade teachers over two separate cohorts, with 88 teachers beginning the study in fall 2010 and an additional 17 teachers added in fall 2011 to replace 29 departing teachers. All teachers who left did so for reasons unrelated to the study; they left study schools, left teaching, were no longer teaching fourth or fifth grade, or stopped teaching mathematics. Teachers in our sample were 72% white and 9% male, and 55% held a graduate degree. They had, on average, almost nine years of teaching experience. The baseline mathematical knowledge for teaching (MKT) scores of the teachers in the study were slightly below the national average for elementary teachers of all grades, and moderately (0.2–0.3 SD) below teachers of the same grades in a nationally representative sample (see below for a description of MKT). Teachers were randomly assigned within schools to either the treatment or control group. The control group participated in district-developed science professional development, meeting roughly once a month during the school year to cover topics from the fourth- and fifth-grade curriculum.
Teachers in the treatment group began their Math Solutions experience with a four-day summer institute in August 2010, with similar summer institutes held in two subsequent summers through August 2012. Attendance was relatively high during the first year, when 35 out of 42 treatment teachers attended the summer session (83%), but dropped in Years 2 and 3 for reasons described below. In addition to the summer institutes, Math Solutions staff taught either four (Year 3) or six (Years 1 and 2) one-day, in-person sessions during the school year. Almost all of the teachers in the treatment group participated in these sessions during the first year of the study, but participation dropped during Years 2 and 3, again for reasons described below.

DATA COLLECTION AND MEASURES

A complete description of data collected for this study exists in Jacob, Hill, and Corey (2018). In Table 2 and below, we describe only the data we draw upon for our current analyses.

Table 2. Overview of Data Used in Analysis by Treatment (T) and Control (C)
All = both treatment and control teachers; T Only = only treatment teachers

Observations of Professional Development Activities

Two authors of this study attended 19 days of professional development led by Math Solutions staff. These days were split roughly equally between the summer and the school year. Observers took notes on the tasks completed by teachers, the discussions teachers had with one another, and the whole-group discussions between the providers and teachers. The observers also took photographs of displays created for or by teachers within the professional development (e.g., posters with a mathematics problem and teachers’ solution methods). We used these notes both to characterize the program and to evaluate the strength of the Math Solutions program relative to the task of helping teachers change their practice.

Teacher Surveys and Logs

Teachers completed surveys at the beginning of the study (the pretest) and each spring for the following three academic years (Year 1, Year 2, Year 3). These surveys provided two sources of information for this analysis. During the spring, teachers in both the treatment and control groups reported on the quantity and quality of their professional development during the prior year, including whether they learned content and pedagogical strategies that were relevant to their practice and whether they changed their teaching as a result. Because this set of items was given to both treatment and control teachers, the questions did not name Math Solutions specifically, and thus teachers reported on all professional development experiences in the year prior to the survey. Reliabilities on the professional development quality metric were 0.92 for Year 1, 0.93 for Year 2, and 0.98 for Year 3. We used these reports to assess the hypothesis regarding lack of will. Teacher surveys also included MKT items covering late-elementary number/operations and geometry (Hill, Schilling, & Ball, 2004).
These multiple-choice items measure teachers’ “common” content knowledge (i.e., the mathematical knowledge common to educated adults) as well as “specialized” content knowledge (i.e., the mathematical knowledge and skills uniquely needed by teachers). For instance, a common content knowledge item may ask a teacher to select the correct answer to the problem 35 × 25; a specialized version of this item asks teachers to assess the generalizability of several nonstandard methods for solving this problem. Specialized content knowledge items also asked teachers to identify mathematical explanations for common rules and procedures, and to assess the usability of content representations. Off-the-shelf MKT forms supplied by the Learning Mathematics for Teaching project were used, with reliabilities ranging between 0.62 and 0.78 across different years and test forms. We used teacher MKT scores to evaluate the hypothesis regarding whether and how teacher content knowledge affects changes in instruction. Teachers also completed a short log following each of their video-recorded lessons, described below. This log asked teachers to reflect on lesson goals, what went well, and what did not go well. We used these data to assess the hypothesis associated with sensemaking by comparing teachers’ reports to observers’ accounts of lesson activities. The log also asked teachers to report on the curriculum materials used in the lesson—whether it was the district-adopted text (Math Expressions), a supplemental set of materials used in the district (Investigations in Number and Space), materials from Math Solutions, or district-created materials. Teachers not using one of these resources could write in the name and source of the lesson materials. We used these data to assess whether the use of Math Solutions materials helps foster the implementation of Math Solutions instructional techniques, as suggested by resource theory.
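The reliability coefficients reported for the survey scales and MKT forms are scale-level reliability estimates. As an illustration only, a minimal sketch of the most common classical estimate, Cronbach’s alpha, appears below; the data are hypothetical, and the MKT project may report IRT-based rather than classical coefficients, which this sketch does not reproduce.

```python
from statistics import variance  # sample variance (n - 1 denominator)

def cronbach_alpha(scores):
    """Internal-consistency reliability for a respondents x items score matrix,
    given as a list of rows (one list of item scores per respondent)."""
    k = len(scores[0])                                   # number of items
    cols = list(zip(*scores))                            # item-wise score lists
    item_var = sum(variance(c) for c in cols)            # sum of per-item variances
    total_var = variance([sum(row) for row in scores])   # variance of total scores
    return (k / (k - 1)) * (1 - item_var / total_var)

# Hypothetical ratings: 6 respondents x 4 items on a 1-5 scale
scores = [
    [4, 4, 5, 4],
    [2, 2, 3, 2],
    [5, 4, 5, 5],
    [3, 3, 3, 3],
    [4, 5, 4, 4],
    [1, 2, 2, 1],
]
alpha = cronbach_alpha(scores)  # high alpha: items here are strongly correlated
```

Alpha rises with more items and stronger inter-item correlation, which is consistent with attitude-style survey scales (here, 0.92–0.98) typically outscoring knowledge tests built from heterogeneous items (here, 0.62–0.78).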
Video-Recorded Lessons

During the first two years of the study, lessons taught by teachers in the treatment group were video recorded;^{3} in the third and final year of the study, lessons taught by both treatment and control group teachers were captured. At each time point, participating teachers recorded six lessons averaging roughly one hour each in length. Lessons were sampled in three blocks of two back-to-back taping days. To avoid repetition of content, blocks were scheduled at least two weeks apart. Teachers chose the day and time to be video recorded, although we asked that teachers not record on testing days. We scored these videos using the Mathematical Quality of Instruction (MQI) observation instrument, which captures lesson quality along four dimensions. MQI’s richness dimension captures the depth of the mathematics taught to students, including the extent to which the lesson features mathematical practices and emphasizes the meaning behind facts and procedures. Student participation in meaning-making and reasoning (SPMMR) captures evidence of students’ involvement in cognitively demanding mathematical activities, such as providing explanations, engaging in reasoning, posing mathematically motivated claims or questions, or working on novel/complex tasks. Teacher errors reports on the presence of mathematical errors introduced by the teacher during instruction, and working with students captures teacher use of students’ mathematical ideas during the course of instruction and teachers’ remediation of incorrect student thinking. Each dimension consists of between two and five items that identify specific behaviors, and each item was scored on a scale of 1 (not present) to 3 (present and high quality).
Following scoring at the item level, raters were also asked to complete an overall assessment of the lesson (Overall MQI) on a 1–5 scale, with scores of 1–2 reserved for lessons that are mathematically problematic and scores of 4–5 reserved for lessons with both strong richness and student participation. Two trained raters were randomly assigned to each lesson and were blind not only to the treatment/control condition, but also to the fact that these data stemmed from a study of professional development, as lessons from this study were scored alongside those from a much larger study of instruction. The adjusted intraclass correlations of the within-year dimension-level scores ranged from 0.39 (errors) to 0.71 (SPMMR), low by conventional standards for reliability but typical for observational metrics (Bell et al., 2012; Garet et al., 2017; Kane & Staiger, 2012). MQI scores were used to check hypotheses regarding the role of teachers’ knowledge and curriculum materials in the implementation of Math Solutions instructional techniques. For the analyses reported here, we also used lesson videos to construct case studies of treatment teachers, noting instructional changes between the first and third year of videotaping and examining those changes (or lack thereof) in light of our hypotheses about insufficient resources and the difficulty of enacting ambitious instruction. Observers watched up to six videos per teacher (n = 24) in both Year 1 and Year 3, and examined these teachers’ post-observation logs. These observers wrote memos to record lesson strengths and weaknesses in an unstructured format. Then, observers completed a case analysis by answering questions about the degree of student participation in mathematical meaning-making and reasoning, teacher use of student ideas, teacher problems with mathematical content, and teacher change over time.
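The intraclass correlations above index agreement between the raters assigned to each lesson. As a hedged sketch of the mechanics (the data below are hypothetical, and the study’s exact adjustment procedure is not described in this excerpt), a one-way random-effects ICC for a lessons-by-raters score matrix can be computed from ANOVA mean squares, yielding both the single-rater reliability and the reliability of the k-rater average:

```python
def icc_oneway(ratings):
    """One-way random-effects ICC for a lessons x raters score matrix.
    Returns (single-rater ICC, reliability of the average across k raters)."""
    n, k = len(ratings), len(ratings[0])
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    # One-way ANOVA mean squares: between lessons and within lessons
    ms_between = k * sum((m - grand) ** 2 for m in row_means) / (n - 1)
    ms_within = sum((x - m) ** 2
                    for row, m in zip(ratings, row_means)
                    for x in row) / (n * (k - 1))
    icc_single = (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)
    icc_avg = (ms_between - ms_within) / ms_between
    return icc_single, icc_avg

# Hypothetical MQI-style item scores: 5 lessons, 2 raters each, 1-3 scale
ratings = [[1, 2], [3, 3], [2, 2], [3, 2], [1, 1]]
icc_single, icc_avg = icc_oneway(ratings)
```

Averaging the two raters raises reliability above the single-rater value (Spearman-Brown logic), one reason observation studies assign multiple raters per lesson even when single-rater ICCs look modest.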
These topics followed the major MQI dimensions, but allowed observers to construct a more holistic picture than could be accomplished during the rating process, and to note issues outside of the MQI framework (e.g., student engagement, climate). Together, a reading of the lesson notes and case analyses enabled us to understand the degree to which instruction overall matched Math Solutions ideals.

Interviews

School researchers conducted interviews with 31 of the 33 Eastern teachers who remained in the treatment group at the end of the project. These exit interviews averaged about 30 minutes in length and covered topics such as teachers’ beliefs about mathematics teaching, experiences in the Math Solutions professional development, and reported changes in practice. Although self-reports are not always reliable indicators of actual practice, they can provide insight into the ways in which teachers perceived both changes in their practice and the role of Math Solutions in fostering those changes. We used these interviews to assess hypotheses associated with lack of will and organizational barriers to implementation of the Math Solutions program. We also interviewed the Math Solutions providers who worked with teachers in Eastern. The interview was loosely structured around trying to understand the evaluation results presented below. Providers discussed their own explanations for the null results and commented on some of the hypotheses generated from our own observations of the program.

SUMMARY OF EVALUATION RESULTS

Because we have reported in depth on our formal evaluation of the program elsewhere (see Jacob, Hill, & Corey, 2017), we only briefly summarize those results here.
We expected that teachers’ participation in the Math Solutions professional development program would lead to an increase in teachers’ mathematical knowledge for teaching, classroom instruction that featured more student participation in mathematical thinking and reasoning, and stronger student test score outcomes as compared to a control group. Results did not bear these hypotheses out. Although there were some effects on teachers’ MKT favoring the treatment group (a significant +0.40 SDs on the numbers and operations assessment at the end of Year 2; +0.32 SDs, not statistically significant, at the end of Year 3), there were no statistically significant impacts on teachers’ instructional practice as measured by teachers’ MQI scores. Examining treatment teachers’ MQI scores between Year 1 and Year 3 of the study actually suggested they remained relatively static (Year 1 = 2.76, SD = 0.46; Year 3 = 2.87, SD = 0.25). However, this masks a slight improvement in the rate of teacher errors over this time period and a 1 SD drop in scores on the MQI dimension student participation in meaning-making and reasoning (Y1 = 1.31, SD = 0.20; Y3 = 1.12, SD = 0.08). Finally, there was no impact of the professional development on either a state standardized assessment or a project-administered assessment. This last result is unsurprising given the lack of effects on instruction, although it is also possible that student achievement could have improved as a result of factors other than those detected by the MQI.

CURRENT ANALYSIS

The current analysis assesses potential explanations for these null results, with a focus on factors that could have led to non-implementation at the classroom level. To do so, we specified hypotheses identified by the literature reviewed above, then drew on our quantitative, observational, and interview data to evaluate each. In reading through our data corpus, we also remained alert for other, emergent explanations.
We organize our results by first discussing a priori hypotheses and then reviewing emergent explanations. Although our analyses cannot definitively identify the reason(s) for program failure, we argue that this highly structured search can both help refine our null results framework and provide evidence useful for future comparisons across programs. Although Table 1 indicates that many authors attribute their null results to methodological issues, we do not believe they operated here. First, this study was well-instrumented, including two reasonably reliable assessments as well as other measures that captured intermediate outcomes. Second, our records show sufficient contrast between treatment and control dosages of math-related professional development. For example, at the end of the second year of the study, 93% of the teachers in the treatment group reported that they had received more than 15 hours of Math Solutions professional development over the course of the year. In contrast, 77% of the control group teachers reported receiving fewer than 15 hours of math professional development of any type, and 20% reported receiving no professional development in math at all. Third, analyses suggest that we had sufficient power to detect substantively meaningful effects. With our final Year 3 sample (n = 57), assuming power (1 − β) equal to 0.80 and alpha equal to 0.05, we had statistical power to detect effect sizes of 0.35 SD for the MKT measures, of 0.20 SD for the MQI measures, and of roughly 0.10 SD for student achievement outcomes. Back-of-the-envelope estimates suggest that for MKT, 0.35 SD corresponds to answering about three more items correctly out of 28 total, which is, in our opinion, the smallest MKT gain likely to lead to gains in instruction and student outcomes. For MQI, 0.20 SD corresponds to one or two more instances of student reasoning or rich mathematics per lesson.
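These minimum detectable effect sizes are considerably smaller than a naive two-group comparison of 57 teachers would allow, because power calculations of this kind exploit baseline covariates and repeated measures that absorb much of the outcome variance. As a rough illustration of the unadjusted baseline (our own sketch in Python using statsmodels, not the study’s cluster-adjusted calculation), one can solve for the minimum detectable effect of a simple two-sample t-test:

```python
# Minimum detectable effect size (MDES) for a plain two-group t-test with
# ~57 teachers split evenly into two arms. Illustration only: the study's
# actual power analysis adjusted for clustering and baseline covariates.
from statsmodels.stats.power import TTestIndPower

mdes = TTestIndPower().solve_power(
    effect_size=None,        # solve for the detectable effect size
    nobs1=28,                # ~57 teachers, split into two groups
    ratio=1.0,               # equal-sized treatment and control arms
    alpha=0.05,
    power=0.80,
    alternative='two-sided',
)
print(round(mdes, 2))  # roughly 0.76 SD without any covariate adjustment
```

The gap between this unadjusted figure (about 0.75 SD) and the values reported above (0.10–0.35 SD) shows how much detectable-effect precision the study gained from its design features.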
Impacts smaller than this are not likely to be educationally significant for children’s classroom experiences. Therefore, we do not believe factors related to the design of the study played a major role in the largely null results that were obtained. Finally, as we worked to test the hypotheses drawn from the implementation literature, we noted a logical problem: All failure can be attributed both to program recipients and their environments (e.g., contextual conditions limited full-scale enactment of the program’s ideas) and to the program itself (e.g., the intervention was not strong enough to overcome contextual conditions). This two-sides-of-the-same-coin issue led us to omit a direct evaluation of the weak treatment hypothesis, instead weaving observations regarding this topic into the discussions of other hypotheses below.

RESULTS

As summarized above, the professional development had a modest effect on teachers’ mathematical knowledge, but failed to realize an effect on instructional practice and student outcomes. Our qualitative analysis of the 24 teachers with Year 1 and Year 3 video data further illuminated these findings. Across the sample and years, observers found instruction to be variable both across and within teachers. But on average, the instruction contained few extended instances of student participation in meaning-making and reasoning, and few episodes in which teachers substantially used students’ thinking. Observers also noted that about one third of the teachers had moderate to serious problems with the clarity and accuracy of the mathematical content provided to students. Observers’ memos also suggested that there was a lack of substantial change in teachers’ practice between the first and third years of video data collection.
Although seven teachers appeared to change their instruction toward the Math Solutions ideal, observers noted that in five of these cases, changes were minor and/or superficial, and did not result in an overall improvement in the quality of the lesson. For another 15 teachers, observers reported no differences between Year 1 and Year 3 videos. Finally, we noted two teachers for whom observers described Year 3 lessons as, in the aggregate, lower quality than Year 1 lessons. The a priori and emergent hypotheses we explored to help explain these findings are listed in Table 3, and our findings are described in detail below.

Table 3. Explanations for Null Results
A PRIORI HYPOTHESES

Lack of Will

It is possible that teachers participated in the professional development for reasons other than wishing to change their practice. They may have felt pressure from the district math coordinator or their school principal—both of whom often attended recruiting sessions—to enroll, or they may have agreed to join the study because they were paid up to $550 per year for their participation. Likewise, the professional development itself may not have convinced teachers that improvements in their practice were necessary, or teachers may have initially wished to improve their practice and then changed their minds. We tested for these possibilities in several ways. First, we examined teachers’ self-reports of learning and change in response to their professional development on spring surveys. Second, we analyzed the teachers’ interviews to understand their views of the program’s influence on their instruction. Finally, we used survey data to rule out differences in attitudes between treatment teachers who finished Year 3 and those who dropped out. The results of these analyses indicate that teachers were overwhelmingly positive about the program, making lack of will an unlikely explanation for the failure of the program to improve instruction and student outcomes. Survey results show that teachers in the treatment group were significantly more likely than control-group teachers, in all three years, to report that they had learned a great deal from their professional development experiences. For example, at the end of Year 3, 75% of the treatment group but only 20% of the control group indicated strong agreement with the statement “I learned things that were helpful to my everyday practice” in their math professional development—despite the fact that most control-group teachers also participated in at least some math professional development.
Similarly, 63% of teachers in the treatment group strongly agreed that “the professional development was useful to me,” as compared to only 15% of the control group. In interviews, every treatment teacher indicated making at least one change to their practice in response to their Math Solutions experience, and approximately 60% indicated making three or more such changes. Table 4 lists the changes teachers reported; many of these were themes Math Solutions staff emphasized during professional development sessions. In discussing the program, teachers also volunteered statements such as “great program” and “I love it. The kids love it.” No teachers made negative comments about the program. This analysis of interview data was corroborated by survey findings, in which 83% of the treatment group teachers agreed or strongly agreed that they had made changes to the way they taught because of what they learned in the program.

Table 4. Reported Changes from Participant Interviews Attributed to the Math Solutions Professional Development
This enthusiasm did not appear to result from differential attrition. Using Year 1 spring survey data, we found no statistically significant difference in reported changes between teachers who left the study at the end of Year 1 or 2 and those who stayed for all three years. This evidence suggests that teachers’ lack of will to change was not an issue in this study. However, teachers’ reports of having made substantial changes in their practice do raise a new set of questions about how such perceptions could coexist with observational evidence of more conventional and static practice. To explore this further, we turn to sense-making.

Sense-Making

As Spillane and colleagues (2002) show, individuals typically interpret new ideas through existing cognitive frameworks, creating a gap between what programs or policies desire and what implementers understand must be done. Comments made by Math Solutions’ on-site professional developers raised this as a possibility:

If we just went off of [teachers’ comments] in the sessions, we would be thrilled with what they were saying, but then every time we watched somebody teach or saw stuff in the classroom, it was a reality check about what was really going on in classrooms. . . The wait time was very minimal, and for a lot of teachers when they did ask their questions, it was the traditional “I’m waiting to hear the right answer” strategy, and when they didn’t hear the right answer, they moved on to the next question, instead of really assessing what percentage of the class really understands it and how deep they understand it. It was really a race against time of “move, move, move” and not really being comfortable with students’ wrong answers.

Our own notes corroborate this trend: During professional development sessions, teachers’ comments suggested they had significantly reformed their practice, yet few recorded lessons contained evidence of significant student mathematical thinking.
On this evidence, we might have concluded that teachers’ sense-making was a factor in instruction remaining static. However, further analysis suggested otherwise. We reasoned that if there were such a gap between teachers’ and professional developers’ understandings of reform, we should see a mismatch between how teachers and observers described specific lessons: Teachers might remark that a lesson featured student reasoning and problem solving, whereas observers would characterize it as procedural and recall-oriented. To this end, we examined Year 3 responses to the open-ended post-lesson log questions that invited teachers to comment on what went well and did not go well during their lessons. For each, we coded responses for mentions of student thinking, reasoning, discussion, multiple solution strategies, and other terms associated with the Math Solutions professional development. To start, we found very few mentions of these terms; out of all (n = 187) Year 3 treatment-group logs, we noted only 12 where the teachers named such an activity as something that went well during the lesson. For the inverse question, what did not go well, only three logs recorded a wish for more student reasoning or communication. This suggests that, at a minimum, teachers were not attending to Math Solutions principles in reflecting on their lessons. Next, we compared the 12 cases where teachers used a Math Solutions-type term to describe their instruction to observers’ written characterizations of the lessons. In most cases, we did find a correspondence between the two descriptions of the lesson: When teachers mentioned asking “why” questions or fostering student discussions or multiple solution strategies, for instance, observers’ notes often explicitly mentioned or alluded to the same.
Observers noted, however, that these activities were often enacted with lower levels of quality than providers would wish, an issue Math Solutions providers also hinted at in their comments about lessons they observed (for example: “I think there was definitely more math talk, but it wasn’t necessarily more productive”). Together, this evidence suggests that sense-making was not a major explanatory factor for static instruction. The 14 teachers who referenced Math Solutions practices related to their instruction seemed to understand and interpret the meaning of those practices in the same way as the observers. Moreover, evidence in our analysis suggests that inattention to Math Solutions goals in day-to-day instruction and the difficulty of enacting high-quality practice might have played a role in non-implementation. We return to these issues below.
Insufficient Resources

A third potential reason for the lack of change in instructional practice relates to the resources teachers use when instructing students. As identified in prior research (e.g., Ball, 1990; Cohen, 2011; Lampert, 1990), teachers’ knowledge of content is one such resource. In this line of thinking, deeper knowledge of mathematics would ease the development of practice that features more rigorous mathematics and more student participation. If this were true in our data, we would expect instructional change to correlate with initial MKT status, with teachers who had stronger mathematical knowledge for teaching exhibiting more change and weaker teachers remaining the same. Our data, however, suggest that baseline MKT was not a significant predictor of change in MQI scores (r = –0.18, p = 0.39). Curriculum materials can also serve as a support for the enactment of reformed instructional practice (Ball & Cohen, 1996; Davis & Krajcik, 2005). Teachers who undertake ambitious instruction while using conventional materials face steep costs—designing or unearthing more challenging tasks, crafting open-ended questions for students, and anticipating students’ conceptual misunderstandings. At the time the study began, Eastern was using Math Expressions (Fuson, 2006). An inspection of fourth- and fifth-grade lessons suggests that there was considerable overlap between Math Expressions and Math Solutions in the mathematical content covered. However, an inspection of Math Solutions materials, which were presented to the teachers in the form of both professional development activities and published books, suggests that the lessons written by Math Solutions authors tended to follow a quite different format than those in Math Expressions.
Whereas Math Expressions lessons support teachers’ scaffolding of concepts and skills for students to learn in a whole-group setting, Math Solutions materials begin with teachers posing a complex question or problem for students to solve, often collaboratively. Thus, a lack of alignment between the district’s assigned textbook and Math Solutions materials may have suppressed the implementation of the ideas transmitted through professional development. However, an examination of teachers’ Year 3 reports of curriculum use against observers’ notes suggests that lesson quality—in particular the presence of student participation in mathematical explanation and reasoning—did not substantially vary by the materials used by teachers. To confirm this, we calculated overall MQI for each major category of curriculum material reported by teachers (see Table 5). If anything, the use of Math Solutions materials was associated with slightly lower MQI scores than district online materials and other lesson sources. This suggests that it was not a lack of appropriately aligned materials that contributed to the null results.

Table 5. Overall MQI Lesson Score by Curriculum Materials, Year 3
Note. Overall MQI score as averaged across two raters. Overall MQI is reported on a 1–5 scale, with scores of 1–2 reserved for lessons that are mathematically problematic, and scores of 4–5 reserved for lessons with both strong richness and student participation.

Organizational Barriers

Similar to the policy implementation literature, our analyses identified a set of district- and school-level factors that appeared to hinder the implementation of Math Solutions practices in classrooms. Some of these factors relate to what McLaughlin (1987) termed “environmental instability.” Between teacher recruitment and the end of the study, several changes took place: State and district mathematics standards and pacing guides were updated; the state assessment changed from paper-and-pencil to computerized administration and was reportedly upgraded to include more cognitively demanding problems; Eastern had three different superintendents; and the district’s math coordinator (a “fixer” of problems related to the program’s implementation; see Bardach, 1977) retired. As a result of these changes, district support for the professional development waned over the years, and teachers often could not attend scheduled professional development sessions because the district failed to prioritize time for them to leave their classrooms. Limited teacher access to appropriate technology also prevented the implementation of an online learning community, intended to provide support to teachers between formal professional development sessions. In interviews for the study, Math Solutions providers spoke of these as barriers to their work in the district. Another district-level factor appeared to be competing instructional guidance. Eastern, like many other urban districts struggling to make progress on state assessments and accountability metrics, maintained a series of instructional policies regulating teachers’ content coverage, instructional strategies, and time use in the classroom.
In Eastern, this took the form of specific grade-level objectives, frequent benchmark assessments, and pacing guides. District leaders rewrote some of these documents during the second year of the project, asking teachers to reconfigure their lesson sequencing and also to locate new lessons to fill content gaps. Eastern also encouraged the use of small-group instruction and, in Year 3 of the study, instituted a mandatory “calendar math” session for daily review of place value and basic mathematical skills. This district instructional policy was a frequent topic in our Year 3 interviews with Math Solutions teachers. In these interviews, we asked whether any aspects of Math Solutions were difficult to implement in the classroom—a relatively broad, open-ended question meant to elicit any topics teachers considered relevant. Over half—17 out of the 31 teachers interviewed—identified Eastern’s instructional guidance as a significant reason for non-implementation of Math Solutions. Some teachers specifically mentioned the district curriculum and pacing guides. For example:

You know and even like the materials that we get from Math Solutions . . . because of our curriculum and our pacing and all of those things, you’re not always able to, you know, use it, incorporate it.

A second set of teachers identified lesson length—either their prescribed mathematics time blocks or the time demands of Math Solutions lessons—as a major factor:

I think the only thing is a lot of [Math Solutions] is time consuming. Like I have an hour for math. Like I don’t have all this time. And there’s a pacing guide, like I’ve got to get to the next thing. And ideally it’s not Math Solutions who’s wrong; it’s more of the way that the system is set up in the district and the state by expecting all these things to be done in a short amount of time.
Finally, a third set of teachers noted a lack of fit between Math Solutions and their school requirements regarding the use of ability-sorted small groups in mathematics. For instance, a teacher commented:

There are a lot of [Math Solutions] activities and a lot of approaches and strategies and things that I would have liked to do in my classroom. However, what principals want to see and what we’re allowed to do is a different story. So I would have liked to do a lot more of heterogeneous groups and a lot more of, you know, problem-solving-based learning as the way we did it in Math Solutions. . . And we’re not able to do that as often as we probably should because of what Eastern thinks is the best way to teach math and what my principal thinks is the best way to teach math.

As a result, many teachers saw the Math Solutions material and approach as separate from their “regular instruction.” This suggests that the district environment may have played a significant role in producing the results observed in our evaluation. In our experience, it is not uncommon for districts to increase efforts to control instruction in light of state testing and accountability; it is also not uncommon to see multiple individuals and offices within the district add programs and additional guidance on top of extant decisions about curriculum and instruction. In this case, such guidance seems to have hampered the implementation of the instructional practices recommended by Math Solutions staff.

The Difficulty of Enacting Ambitious Practice

Kennedy (2005) and Cohen (2011) suggest that a reason for the non-implementation of ambitious mathematics reforms may be the difficulty of the practice itself. We tested this idea by examining observers’ write-ups and corresponding interviews for the seven teachers who appeared to be making some changes toward Math Solutions practices between Year 1 and Year 3.
As noted above, five of those teachers enacted the instructional questioning and tasks called for by the Math Solutions program superficially, suggesting that although these teachers were trying to make changes in the direction of the program, such changes were difficult to put into practice. For instance, one teacher had adopted Math Solutions-recommended questioning strategies, in Year 3 asking students “Why?” or asking them to explain their thinking more often than in Year 1. However, the observer noted that students’ responses to these “why” questions were more procedural than conceptual—a report on the series of steps the student used to solve a problem rather than an explanation of why a solution was correct or a particular method worked. Asking why, as the program recommends, is easy; however, it is far more difficult to know when to ask why, discern and encourage meaning-focused student responses, and respond to those students’ ideas (Cohen, 2011; Lampert, 2003). The same pattern was observed in teachers’ use of tasks aligned with those found in the Math Solutions program. Though cognitively complex tasks were used, they were rarely used to their potential. For instance, one observer wrote:

[Year 3] lessons featured a number of tasks that had potential to be inquiry/exploratory in nature but were enacted in a way that either removed the possibility of students finding their own way through the task or just devolved the cognitive demand. . . If anything, I might argue that the teacher-designed tasks in [Year 3] reflect a misguided attempt at meaning-making. . . It does seem that she has the idea to do more “engaging” longer tasks, but they never go deep.

Similarly, other implementations of open-ended tasks featured slow-paced mathematics without the richer discussions or the student mathematical thinking and reasoning intended by Math Solutions developers.
In one lesson, the class spent 47 minutes estimating how many beans would fit into a cup, then how many cups of beans would fit into a larger container. Students spent a majority of the lesson recording numbers, with no connections to larger ideas about estimation, volume, or three-dimensional shapes. An observer summarized what she saw for another teacher:

Although the [Year 3] lessons were clearly more open-ended and more student-centered, they were not appreciably higher in quality. The pacing was very slow and very little mathematics got done over fairly large chunks of time (for example, students were asked to find two numbers that multiply to 360 in groups and worked on this task for over a half-hour). Ultimately, the math worked on in these “investigations” was somewhat superficial (listing the factors of a product, dividing three pizzas among six people) and connections to big ideas were rarely surfaced.

To successfully enact such high-cognitive-demand tasks, these teachers would have had to either possess or learn several skills: how to launch a mathematical task without devolving its demands on student thinking; how to thread mathematical ideas through an entire investigation, drawing students’ attention to and engagement with those ideas; and how to maintain the mathematical density and pace of the lesson while allowing for depth in discussion. We argue that these observations, along with those above regarding the quality of lessons that use Math Solutions materials, provide evidence for the “difficulty of ambitious practice” hypothesis. Teachers who reported making sincere efforts to change, and whose classrooms did feature at least superficial differences between the first and third year of the study, did not improve along the major dimensions targeted by the intervention, likely because the new form of instruction required skills they had not yet developed.
Another way of looking at this problem, however, is that the treatment was too weak to support teachers’ transition to high-quality, ambitious practice. In fact, a read of our professional development field notes suggested that while the program itself prized ambitious instruction, its activities and features stayed at a superficial level rather than building the core technical skills that enable high-quality implementation. For example, discussion was valued, and mathematically meaningful discussions were often on display in videos and in the modeling by program staff; however, teachers themselves neither identified nor practiced the specific techniques involved in leading mathematically productive discussions (see, e.g., Chazan & Ball, 1999). Teachers were cautioned not to devolve the cognitive demand of tasks, but they did not work on how to launch tasks in a way that allowed students to work productively while not directly giving students solution methods (Jackson, Shahan, Gibbons, & Cobb, 2012). One reason for the lack of work on such technical skills may have been a district and program culture that pulled away from difficult public work; we discuss this next, as we transition to hypotheses that emerged from our read of the data.

EMERGENT HYPOTHESES

Weak Instructional Press

Observations of professional development and discussions with professional developers after the conclusion of the study suggested another potential reason for null effects: that factors both inside and outside the professional development program led to a relatively weak “instructional press” (Firestone & Rosenblum, 1988) on teachers to improve their craft. This was most visible in work around improving teachers’ mathematical knowledge and in assisting them to develop effective classroom instructional strategies.
On the first issue, the professional development was filled with what we judged to be mathematical learning opportunities for teachers: teachers solving problems their students would solve, with attention to meaning; videos of students using inventive and correct solution methods for complex mathematical problems; and examples of ways to link numeric and visual solution methods. However, as teachers carried out these tasks, the developers often stopped short of pressing hard on the underlying mathematics. For example, one task asked teachers to determine how many possible pentominoes exist (a pentomino is a shape made of five squares joined along their edges). Teachers worked in groups, finding all 12 possible pentominoes, and then shared each pentomino at length with the whole group. But rather than delving deeper into the mathematics (How do we know we have them all? What strategies could you use to make sure you had all sextominoes without actually making the arrangements of shapes? What is the mathematical point of this activity? What mathematical practices does this activity involve?), the subsequent conversation focused on group dynamics and other instructional matters. In other cases, the providers overlooked teachers’ difficulties solving mathematics tasks. For instance, when a group of teachers failed to successfully model problems such as 1/2 + 2/3 using manipulatives, the provider simply worked one example publicly and moved on without further comment, even though she later disclosed that she knew many of the teachers struggled with the task. On the second issue, a lack of instructional press was also evident during demonstration lesson debriefings, according to Math Solutions developers. For both lessons taught by Math Solutions providers and those taught by Eastern teachers themselves, few public critiques of the instruction surfaced.
Math Solutions providers reported that they were hesitant to press such critiques, for they needed to ensure teachers would feel safe participating in future model lessons. These two areas provide examples of ways Math Solutions staff failed to press on the skills and knowledge necessary for ambitious instruction. Although our study was not designed to examine this issue in detail, we hypothesize that the lack of press relates to the “culture of nice” (Mangin & Stoelinga, 2011) and the privatization of instruction that exists within schools, under which teachers are protected from questions about their professional judgment and expertise. In addition, Math Solutions staff raised concerns about retaining teachers in the study, noting that if teachers did not view the professional development as a positive experience, they would not return to the professional development and would thus likely count against the program in the impact evaluation. Whatever the cause, conditions in this study and district did not allow close work on topics that could have supported improved instruction.
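Returning to the pentomino task described above: the unasked question (“How do we know we have them all?”) admits a systematic answer. Growing shapes one edge-connected square at a time and discarding duplicates under rotation and reflection yields exactly 12 pentominoes, and 35 six-square shapes (conventionally called hexominoes). The sketch below is our own illustration of that enumeration strategy, not part of the Math Solutions materials:

```python
# Enumerate "free" polyominoes (shapes counted as the same under rotation
# and reflection) by growing them one edge-connected square at a time.

def normalize(cells):
    """Translate a shape so its smallest x and y coordinates become zero."""
    mx = min(x for x, _ in cells)
    my = min(y for _, y in cells)
    return frozenset((x - mx, y - my) for x, y in cells)

def canonical(cells):
    """Canonical representative of a shape under the 8 square symmetries."""
    variants = []
    for _ in range(4):
        cells = frozenset((y, -x) for x, y in cells)               # rotate 90 degrees
        variants.append(normalize(cells))
        variants.append(normalize(frozenset((-x, y) for x, y in cells)))  # and reflect
    return min(variants, key=sorted)

def free_polyominoes(n):
    """All free polyominoes made of n squares, starting from a single square."""
    shapes = {canonical(frozenset([(0, 0)]))}
    for _ in range(n - 1):
        grown = set()
        for shape in shapes:
            for x, y in shape:
                for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    cell = (x + dx, y + dy)
                    if cell not in shape:
                        grown.add(canonical(shape | {cell}))
        shapes = grown
    return shapes

print(len(free_polyominoes(5)))  # 12 pentominoes
print(len(free_polyominoes(6)))  # 35 hexominoes (the "sextominoes" above)
```

The design point is the one the providers might have drawn out: certainty about completeness comes not from finding shapes one by one but from an argument that every n-square shape arises from some (n − 1)-square shape plus one square, so an exhaustive growth process cannot miss any.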
Fit of the Intervention to Teachers’ Needs

Finally, our analyses uncovered one further potential reason for the null results: the fit of the intervention to teachers’ needs. We make this claim based on evidence from observers’ notes suggesting that teachers’ instructional quality varied widely after the first year of the intervention. For one group of teachers, Math Solutions was in fact a good fit; their Year 1 lessons featured little student thinking and reasoning, and the mathematics conveyed seldom went beyond procedures. A reasonable starting place for teachers in this category would be professional development focused on changing teachers’ beliefs about good instruction, providing models for such instruction, and exposing teachers to mathematics and classroom mathematics activities. However, a second group of teachers, whether due to the Year 1 program itself or to preexisting experiences and learning opportunities, already displayed a fair number of the practices the program desired. Although these practices were not necessarily enacted consistently throughout each Year 1 lesson or at the highest level of sophistication, roughly one third of the sample showed some productive work in these domains. The presence of these teachers in the professional development may have made providers’ work more complex, as these teachers had already taken a first step to change instruction and may have needed help fine-tuning those practices, as well as encouragement to make them more prevalent in classrooms. Our analysis also uncovered multiple areas of need not related to the cognitive demand of lessons. One was teachers’ mathematical content knowledge. Although observers’ notes did not often record outright teacher content errors (e.g., solving problems incorrectly, misdefining terms), teachers’ communication of the mathematics to students often suffered in Year 1 lessons.
In some cases, teachers’ presentations and launches of lesson tasks were muddled, with students sometimes unsure of how to engage in the mathematical task laid out. In other cases, observers noted instances in which the teacher likely had a correct idea of the mathematical content, but the language used to communicate the idea was imprecise (for instance, defining perimeter as “distance around” a shape and telling students that finding the perimeter consisted of “counting the squares around” rather than measuring the length of the shape’s edges). Another area of need identified in observers’ Year 1 notes concerned responding to students’ thinking and misconceptions in productive ways. In some cases, teachers chose not to engage incorrect student answers; in other cases, students made correct and sometimes elegant contributions to the mathematical discussion, but teachers struggled to use those contributions to develop a mathematical point. The thinking entailed in parsing student ideas, connecting them to the mathematical point, and deciding how to move instruction forward using them has been shown to be quite complex (Lampert, 2001). Our analysis also showed that some Year 1 lessons meant to be open-ended and exploratory suffered from a lack of a clear mathematical point. For example, in a game in which students could have developed sophisticated strategies for multiplying successive rolls of dice by a variable to arrive at a target number, a teacher was unable to describe the mathematical content beyond twice stating, “We’re using probability, and we’re computing.” Student written work during the lesson consisted only of computation, and by the end of the lesson, there had been no discussion of using principles from probability to better reach the target value. Open-ended and exploratory lessons appeared frequently in the Math Solutions materials, meaning the program itself induced these difficulties in treatment teachers’ classrooms.
A final issue observed in Year 1 lessons relates to pacing and the resulting density of the mathematics in instruction. For half of the teachers in the sample, observers noted that the time spent on some activities was quite long relative to the amount of mathematical content covered. In some cases, this occurred because of the lengthy preparation of, or passing out of, materials (e.g., cutting out triangles or copying definitions onto a poster); transitions between lesson activities; and off-task discussion and chatter. In other cases, teachers and students simply solved problems very slowly, sometimes covering only two or three in a given lesson without compensating features such as in-depth discussion or careful attention to meaning. Math Solutions materials may have exacerbated this problem for some teachers, especially when the enacted activity lacked a mathematical point and students engaged mainly in computational tasks or clerical activities, again decreasing the mathematical density of the lesson. For many of these activities, mathematical closure was seldom evident. Thus, teachers’ needs differed from what Math Solutions provided. The Math Solutions program, at the time we studied it, focused more on creating a new vision for instruction than on providing the technical skills to support that vision. But these technical skills—launching tasks so that students can profitably work, responding to student ideas—may be necessary before high-quality implementation can proceed.

DISCUSSION

In this paper, we used the implementation literature, as well as prior findings from IES-funded studies, to develop and test a framework for analyzing a null-impact study in which the cause seemed traceable to substantive, rather than methodological, explanations. A hallmark of this approach is the explicit consideration of multiple hypotheses for null results, the testing of those hypotheses against available evidence, and the transparency of this process for readers.
This contrasts with other null-results studies, which often follow a report of null student impacts with a search for positive signs in mediation or subgroup analyses but do not systematically vet potential reasons for the null results. Because this framework is in its infancy, we also allowed for alternative, exploratory hypotheses to emerge. In what follows, we place our Math Solutions findings in the context of other research in the field, and then reflect on the ways in which others may choose to use this or related frameworks to conduct similar analyses. Our analysis found little or no support for three phenomena common in other situations where implementation fails to occur: a lack of will to change on the part of teachers, insufficient resources, and sensemaking. These factors may play important roles in other program and policy failures, for instance when professional development is mandatory for teachers (see, e.g., Givvin & Santagata, 2011), but we think the probability that they were at work here is low. We did identify evidence that organizational barriers, particularly in the form of competing district instructional guidance, likely played a significant role in implementation failure. While the effects of conflicting district guidance have been noted (e.g., Spillane, 1998), the amount and specificity of the guidance around instructional activities in Eastern (e.g., calendar math, homogeneous grouping) seemed to us different from what has been observed in prior literature on standards-based reforms. The culture of teacher autonomy—long described in the literature as “close the door and teach how you want” (Lortie, 1975)—appears to have ended, at least in this district (see also Spillane, Parise, & Sherer, 2011). More broadly, as Wilson (2013) notes, “District leadership churns, new curriculum and assessments arrive, policies change, and resources are cut.
These factors vex researchers who need to control these changes as much as possible.” These factors also vex professional developers, especially those whose programs are not tightly aligned to the evolving mix of instructional guidance available in districts. We also evaluated Kennedy’s (2005) hypothesis that ambitious practice is simply difficult to enact. The data appear to support this assertion—there was little strongly reformed practice among the lessons we viewed, and even among teachers whose lessons showed some sign of change between Year 1 and Year 3, those changes were typically modest. We also observed teachers struggle with difficult aspects of enacting such ambitious practice, such as responding to student thinking and making the mathematical point of a lesson clear. This echoes findings from other intervention studies in which teachers’ classroom activities are vaguely specified and relatively complex. In a study of inquiry-based science, for instance, Penuel and colleagues (Debarger et al., 2017; Penuel & Gallagher, 2009) noted that although teachers did assign more complex tasks from new curriculum materials, “teachers ha[d] difficulty making connections among student ideas, science ideas, and the investigations that students conduct to support their learning” (DeBarger et al., 2016, p. 12). Evaluators of formative assessment and data interpretation programs reached similar conclusions (Cavalluzzo et al., 2014; Schneider & Meyer, 2011). Programs in which teacher moves are more highly contingent and require substantial expertise and judgment may need to provide more extensive or different kinds of support for practice. Emergent hypotheses also appeared related to the “weak treatment” hypothesis from the implementation literature. Our first emergent hypothesis suggested that several factors may have constrained providers from pressing on teachers’ mathematical knowledge and instruction.
One such factor is the “culture of nice.” Although this phrase is often used in descriptions of reform efforts (MacDonald, 2011; Mangin & Stoelinga, 2011), we know of only one prior study that has empirically identified this phenomenon as a potential barrier to the efficacy of professional development (Wilson, Lubienski, & Mattson, 1996). Yet the professional development staff appeared familiar with this dilemma, suggesting that it may be a widely faced problem in this sector. Our second emergent hypothesis—the lack of fit between the professional development content and teachers’ needs—echoes similar themes in the research literature. For instance, Santagata and colleagues (2011) note that some teachers in their program lacked the mathematical knowledge necessary to take advantage of the program. Cavalluzzo et al. (2014), studying an assessment-based intervention, found that teachers did report using assessment data, but were not proficient in tailoring instructional practices to students’ needs. Finally, the treatment may have been weak by virtue of its design as an externally facilitated, workshop-based program; it may be that such programs are less likely to affect practice than efforts rooted in local communities of practice that support teachers’ leadership, innovation, instructional inquiry, and change (Galosy & Gillespie, 2013; Katzenmeyer & Moller, 2009; Lewis, Perry, & Murata, 2006; Randi & Corno, 1997; Randi & Zeichner, 2004). Throughout this paper, we noted struggles with disentangling the weak treatment hypothesis from other factors that prevented implementation. For instance, Kennedy’s difficulty-of-practice hypothesis can also be viewed from the perspective of weak treatment; in the Math Solutions case, we suspect that the skills necessary to implement ambitious practice were not addressed by the program.
The case of organizational barriers can be viewed as a fault of contexts, or as a fault of interventions not well prepared to operate in complex environments. As noted earlier, these are two sides of the same coin; the difficulty of the new practice desired by a program must be matched by the supports that the program offers for enactment. Based on these results, we ask what both teacher professional development and teaching would have to look like to enable ambitious instructional practice—in other words, how “strong” the treatment would need to be to see effects. Such questions are typically answered in terms of hours of professional development, or by pointing to the formats and topics contained in effective professional development (see, e.g., Desimone, 2009; Landesman Ramey et al., 2011). What seems missing from such formulations, however, is a backwards mapping from instructional goals, to specific teacher competencies, to the components of professional development that might develop such competencies. For instance, if ambitious instruction is the goal, teachers might need to build skill in reasoning-in-practice—what Rowland, Huckstep, and Thwaites (2005) would call “contingent knowledge,” or the types of thinking Lampert (2001) elegantly describes. To develop such thinking, teachers would surely need opportunities to face complex instructional situations, try out instructional moves and questions, receive feedback on their performance, and reflect on the experience. Teachers may also have to build content knowledge—not just the actual content being taught, but also the structure of the discipline (N. Joglar Prieto, personal communication, July 12, 2015) and how particular classroom activities and student ideas relate to that structure. Finally, professional development may need to more closely resemble comprehensive school reform by carrying components meant to organize environments that support instructional improvement (Peurach, 2011).
In addition to backward mapping, we also argue that individualizing professional development would maximize learning gains for both teachers and students. As we move past the era of one-size-fits-all workshops, districts might invest in diagnostic systems, perhaps linked to teacher evaluation observation protocols, that would deliver the right tools at the right time. Such systems appear possible and efficacious with students (Connor, Morrison, Fishman, Schatschneider, & Underwood, 2007), and could be tried with teachers. We imagine that this work would more often take place in classrooms themselves, through coaching and individualized support designed to help teachers work on the specific aspects of practice required for improvement. Working in classrooms with individual teachers would also allow more nimble responses to implementation barriers. Math Solutions itself has moved away from workshop-style professional development and toward such coaching. Finally, we return to the use of a common framework for exploring and interpreting null results. We do not consider our framework complete, for several reasons. Although we drew upon insights from past research, we cannot be certain that we have the right set of potential hypotheses, that we have covered all the relevant ones, or that they are organized in the most effective manner. As well, a more thorough framework would be needed to cover all reasons for null results, including methodological issues; here, we focused our attention on a moderately sized but substantively interesting set of factors. Our framework also grew from a need to analyze null results from a fairly conventional professional development program; interventions that place bets on teacher collaborative work may require alternative, more situated frameworks for analyzing null results (e.g., Horn, 2005; Putnam & Borko, 2000; Randi & Zeichner, 2004).
Our framework favors macro-level reasons; a more micro-level lens, focusing on teachers’ internal learning processes and moving out toward their environments (see Goldsmith et al., 2014; Wilson, 2013), would undoubtedly point scholars toward different reasons and thus different kinds of data to collect and analyze. Finally, scholars may also want to pursue frameworks that derive from the field of implementation studies generally (e.g., Durlak & DuPre, 2008), rather than from the implementation literature focusing on instructional improvement. Although such a framework would take some effort to develop, we argue that this is work the field should undertake. Accounts of the prevalence of null and weak results put their frequency at between one third and nine tenths of large-scale RCTs. While scholars can now aggregate across all studies (both null and positive) to understand program and methodological features associated with positive outcomes (e.g., Yoon, Duncan, Lee, Scarloss, & Shapley, 2007), analysts cannot aggregate across studies to examine contextual reasons for program failure, for that type of data is seldom systematically reported. New, common reporting standards using a shared framework would help, as would funding support for data collection to explain program—and especially implementation—failure. If successful, we believe these efforts will help investigators triangulate among reports and identify means for both strengthening intervention design and better matching specific programs to environments likely to allow them to thrive.

Notes

1. We reviewed grants made through the following programs: social and behavioral contexts for academic learning; reading and writing; math and science; teacher quality in reading and writing; teacher quality in math and science. We identified 44 grants total and located impact analyses for 38. For each award, we divided positive results by the total number of impact analyses.
We defined weak results as less than or equal to 0.20 standard deviations for researcher-generated measures and less than 0.08 for standardized measures.

2. In one case, K-PALS (Lemons, Fuchs, Gilbert, & Fuchs, 2014), we also located a prior study that examined influences of implementation fidelity. However, there were positive impacts reported in this study, likely as a result of using a subsample of the total data eventually collected as part of the evaluation. We thus excluded this paper from consideration.

3. At the time the study was proposed to the National Science Foundation, the costs associated with video recording prohibited capturing lessons from all teachers in all years.

4. We conducted this test for Year 3 lesson data only, because the implementation of Math Solutions lessons in Year 1 could have been complicated by the fact that teachers were new to the materials.

References

Ball, D. L. (1990). The mathematical understandings that prospective teachers bring to teacher education. The Elementary School Journal, 90(4), 449–466. Ball, D. L., & Cohen, D. K. (1996). Reform by the book: What is, or might be, the role of curriculum materials in teacher learning and instructional reform? Educational Researcher, 25(9), 6–14. Bardach, E. (1977). The implementation game: What happens after a bill becomes a law (Vol. 1). Cambridge, MA: MIT Press. Bell, C. A., Gitomer, D. H., McCaffrey, D. F., Hamre, B. K., Pianta, R. C., & Yi, Q. (2012). An argument approach to observation protocol validity. Educational Assessment, 17(2–3), 62–87. Berman, P. (1978). The study of macro and micro implementation of social policy. Santa Monica, CA: RAND. Berman, P., & McLaughlin, M. W. (1978). Implementing and sustaining innovations (Vol. 8). Santa Monica, CA: RAND. Borko, H., Mayfield, V., Marion, S., Flexer, R., & Cumbo, K. (1997).
Teachers’ developing ideas and practices about mathematics performance assessment: Successes, stumbling blocks, and implications for professional development. Teaching and Teacher Education, 13(3), 259–278. Bos, J. M., Sanchez, R. C., Tseng, F., Rayyes, N., Ortiz, L., & Sinicrope, C. (2012). Evaluation of Quality Teaching for English Learners (QTEL) professional development (NCEE 2012-4005). Washington, DC: National Center for Education Evaluation and Regional Assistance, Institute of Education Sciences, U.S. Department of Education. Cavalluzzo, L., Geraghty, T. M., Steele, J. L., Holian, L., Jenkins, F., Alexander, J. M., & Yamasaki, K. Y. (2014). Using data to inform decisions: How teachers use data to inform practice and improve student performance in mathematics (CNA Report IRM-2013-U-006508). Retrieved from the CNA, Institute for Public Research website: https://www.cna.org/CNA_files/PDF/IRM2013U006508.pdf Chazan, D., & Ball, D. L. (1999). Beyond being told not to tell. For the Learning of Mathematics, 19(2), 2–10. Coalition for Evidence-Based Policy. (2013). Randomized controlled trials commissioned by the Institute of Education Sciences since 2002: How many found positive versus weak or no effects. Retrieved from http://coalition4evidence.org/wpcontent/uploads/2013/06/IESCommissionedRCTspositivevsweakornullfindings72013.pdf Coburn, C. E. (2001). Collective sensemaking about reading: How teachers mediate reading policy in their professional communities. Educational Evaluation and Policy Analysis, 23(2), 145–170. Cohen, D. K. (1990). A revolution in one classroom: The case of Mrs. Oublier. Educational Evaluation and Policy Analysis, 12(3), 311–329. Cohen, D. K. (2011). Teaching and its predicaments. Cambridge, MA: Harvard University Press. Cohen, D. K., & Barnes, C. A. (1993). Teaching for understanding: Challenges for policy and practice. San Francisco, CA: Jossey-Bass. Connor, C. M., Morrison, F. J., Fishman, B. J., Schatschneider, C., & Underwood, P. (2007).
Algorithm-guided individualized reading instruction. Science, 315(5811), 464–465. Cuban, L. (1993). The lure of curricular reform and its pitiful history. Phi Delta Kappan, 75(2), 182–185. Davis, E. A., & Krajcik, J. S. (2005). Designing educative curriculum materials to promote teacher learning. Educational Researcher, 34(3), 3–14. Debarger, A. H., Penuel, W. R., Moorthy, S., Beauvineau, Y., Kennedy, C. A., & Boscardin, C. K. (2017). Investigating purposeful science curriculum adaptation as a strategy to improve teaching and learning. Science Education, 101(1), 66–98. Derthick, M. (1972). New towns in-town. Washington, DC: Brookings Institution. Desimone, L. M. (2009). Improving impact studies of teachers’ professional development: Toward better conceptualizations and measures. Educational Researcher, 38(3), 181–199. DiPerna, J. C., Lei, P., Bellinger, J., & Cheng, W. (2015). Efficacy of the Social Skills Improvement System Classwide Intervention Program (SSIS-CIP) primary version. School Psychology Quarterly, 30(1), 123–141. Durlak, J. A., & DuPre, E. P. (2008). Implementation matters: A review of research on the influence of implementation on program outcomes and the factors affecting implementation. American Journal of Community Psychology, 41(3–4), 327–350. Firestone, W., & Rosenblum, S. (1988). The alienation and commitment of students and teachers in urban high schools. Washington, DC: Rutgers University and Office of Educational Research and Improvement. Fuson, K. (2006). Math expressions. Boston, MA: Houghton Mifflin. Galosy, J. A., & Gillespie, N. M. (2013). Community, inquiry, leadership: Exploring early career opportunities that support STEM teacher growth and sustainability. The Clearing House: A Journal of Educational Strategies, Issues and Ideas, 86(6), 207–215. Garet, M. S., Cronen, S., Eaton, M., Kurki, A., Ludwig, M., Jones, W., . . . Sztejnberg, L. (2008).
The impact of two professional development interventions on early reading instruction and achievement (NCEE 2008-4030). Washington, DC: National Center for Education Evaluation and Regional Assistance, Institute of Education Sciences, U.S. Department of Education. Garet, M. S., Wayne, A. J., Stancavage, F., Taylor, J., Eaton, M., Walters, K., . . . Doolittle, F. (2011). Middle school mathematics professional development impact study: Findings after the second year of implementation (NCEE 2011-4025). Washington, DC: National Center for Education Evaluation and Regional Assistance, Institute of Education Sciences, U.S. Department of Education. Garet, M. S., Wayne, A. J., Brown, S., Rickles, J., Song, M., Manzeske, D., & Ali, M. (2017). The impact of providing performance feedback to teachers and principals. Washington, DC: U.S. Department of Education, Institute of Education Sciences. Gersten, R., Dimino, J., Jayanthi, M., Kim, J. S., & Santoro, L. E. (2010). Teacher study group: Impact of the professional development model on reading instruction and student outcomes in first-grade classrooms. American Educational Research Journal, 47(3), 694–739. Givvin, K. B., & Santagata, R. (2011). Toward a common language for discussing the features of effective professional development: The case of a US mathematics program. Professional Development in Education, 37(3), 439–451. Goldsmith, L. T., Doerr, H. M., & Lewis, C. C. (2014). Mathematics teachers’ learning: A conceptual framework and synthesis of research. Journal of Mathematics Teacher Education, 17(1), 5–36. Heaton, R. M. (1992). Who is minding the mathematics content? A case study of a fifth-grade teacher. The Elementary School Journal, 93(2), 153–162. Heaton, R. M. (2000). Teaching mathematics to the new standard: Relearning the dance. New York, NY: Teachers College Press. Hill, H. C. (2001). Policy is not enough: Language and the interpretation of state standards. American Educational Research Journal, 38, 289–320.
Hill, H. C., Schilling, S. G., & Ball, D. L. (2004). Developing measures of teachers’ mathematics knowledge for teaching. The Elementary School Journal, 105(1), 11–30. Horn, I. S. (2005). Learning on the job: A situated account of teacher learning in high school mathematics departments. Cognition and Instruction, 23(2), 207–236. Hurtig, R. (2009). IES annual performance report (Grant R305G040145). Washington, DC: Author. Jackson, K. J., Shahan, E. C., Gibbons, L. K., & Cobb, P. A. (2012). Launching complex tasks. Mathematics Teaching in the Middle School, 18(1), 24–29. Jacob, R., Hill, H., & Corey, D. (2017). The impact of a professional development program on teachers’ mathematical knowledge for teaching, instruction, and student achievement. Journal of Research on Educational Effectiveness, 10(2), 379–407. Kane, T. J., & Staiger, D. O. (2012). Gathering feedback for teaching: Combining high-quality observations with student surveys and achievement gains. Seattle, WA: Bill & Melinda Gates Foundation. Katzenmeyer, M., & Moller, G. (2009). Awakening the sleeping giant: Helping teachers develop as leaders. Thousand Oaks, CA: Corwin Press. Kennedy, M. M. (2005). Inside teaching. Cambridge, MA: Harvard University Press. Lampert, M. (1990). When the problem is not the question and the solution is not the answer: Mathematical knowing and teaching. American Educational Research Journal, 27(1), 29–63. Lampert, M. (2001). Teaching problems and the problems of teaching. New Haven, CT: Yale University Press. Landesman Ramey, S., Crowell, N. A., Ramey, C. T., Grace, C., Timraz, N., & Davis, L. E. (2011). The dosage of professional development for early childhood professionals: How the amount and density of professional development may influence its effectiveness. In J. A. Sutterby (Ed.), The early childhood educator professional development grant: Research and practice (pp. 11–32). Boston, MA: Emerald Group Publishing Limited. Lemons, C. J., Fuchs, D., Gilbert, J. K., & Fuchs, L. S. (2014).
Evidence-based practices in a changing world: Reconsidering the counterfactual in education research. Educational Researcher, 43(5), 242–252. Lewis, C., Perry, R., & Murata, A. (2006). How should research contribute to instructional improvement? The case of lesson study. Educational Researcher, 35(3), 3–14. Lipsky, M. (1980). Street-level bureaucracy: Dilemmas of the individual in public services. New York, NY: Russell Sage Foundation. Lortie, D. C. (1975). Schoolteacher: A sociological study. Chicago, IL: University of Chicago Press. MacDonald, E. (2011). When nice won’t suffice. Journal of Staff Development, 32(3), 45–47. Mangin, M., & Stoelinga, S. (2011). Peer? Expert. Journal of Staff Development, 32(3), 48–52. McLaughlin, M. W. (1987). Learning from experience: Lessons from policy implementation. Educational Evaluation and Policy Analysis, 9(2), 171–178. McLaughlin, M. W. (1990). The Rand Change Agent Study revisited: Macro perspectives and micro realities. Educational Researcher, 19(9), 11–16. Meyer, J. W., & Rowan, B. (1977). Institutionalized organizations: Formal structure as myth and ceremony. American Journal of Sociology, 83(2), 340–363. Murray, D. W., Rabiner, D. L., & Carrig, M. M. (2014, March). Grade level effects of the Incredible Years Teacher Training Program on emotion regulation and attention. Paper presented at the annual conference of the Society for Research on Educational Effectiveness, Washington, DC. Olsen, B., & Sexton, D. (2009). Threat rigidity, school reform, and how teachers view their work inside current education policy contexts. American Educational Research Journal, 46(1), 9–44. O’Toole, L. J. (1986). Policy recommendations for multiactor implementation: An assessment of the field. Journal of Public Policy, 6(2), 181–210. Penuel, W. R., & Gallagher, L. P. (2009). Preparing teachers to design instruction for deep understanding in middle school earth science. The Journal of the Learning Sciences, 18(4), 461–508. Penuel, W.
R., Gallagher, L. P., & Moorthy, S. (2011). Preparing teachers to design sequences of instruction in earth systems science: A comparison of three professional development programs. American Educational Research Journal, 48(4), 996–1025. Peurach, D. (2011). Seeing complexity in public education: Problems, possibilities, and success for all. Oxford, UK: Oxford University Press. Pressman, J. L., & Wildavsky, A. B. (1973). Implementation: How great expectations in Washington are dashed in Oakland. Berkeley, CA: University of California Press. Putnam, R. T., & Borko, H. (2000). What do new views of knowledge and thinking have to say about research on teacher learning? Educational Researcher, 29(1), 4–15. Putnam, R. T., Heaton, R. M., Prawat, R. S., & Remillard, J. (1992). Teaching mathematics for understanding: Discussing case studies of four fifth-grade teachers. The Elementary School Journal, 93(2), 213–228. Randi, J., & Corno, L. (1997). Teachers as innovators. In B. J. Biddle, T. L. Good, & I. F. Goodson (Eds.), International handbook of teachers and teaching, Vol. I (pp. 1163–1221). Dordrecht, the Netherlands: Kluwer Academic. Randi, J., & Zeichner, K. (2004). New visions of teacher professional development. In M. Smylie & D. Miretzky (Eds.), Developing the teacher workforce: The 103rd yearbook of the National Society for the Study of Education, Part I (pp. 180–227). Chicago, IL: University of Chicago Press. Remillard, J. T. (2005). Examining key concepts in research on teachers’ use of mathematics curricula. Review of Educational Research, 75(2), 211–246. Rimm-Kaufman, S. E., Larsen, R. A., Baroody, A. E., Curby, T. W., Ko, M., Thomas, J. B., . . . DeCoster, J. (2014). Efficacy of the Responsive Classroom approach: Results from a 3-year, longitudinal randomized controlled trial. American Educational Research Journal, 51(3), 567–603. Rowland, T., Huckstep, P., & Thwaites, A. (2005). Elementary teachers’ mathematics subject knowledge: The knowledge quartet and the case of Naomi.
Journal of Mathematics Teacher Education, 8(3), 255–281. Sabatier, P., & Mazmanian, D. (1979). The conditions of effective implementation: A guide to accomplishing policy objectives. Policy Analysis, 5(4), 481–504. Sabatier, P., & Mazmanian, D. (1980). The implementation of public policy: A framework of analysis. Policy Studies Journal, 8(4), 538–560. Sandfort, J., & Moulton, S. (2015). Effective implementation in practice: Integrating public policy and management. San Francisco, CA: Jossey-Bass. Santagata, R., Kersting, N., Givvin, K. B., & Stigler, J. W. (2011). Problem implementation as a lever for change: An experimental study of the effects of a professional development program on students’ mathematics learning. Journal of Research on Educational Effectiveness, 4(1), 1–24. Schneider, M. C., & Meyer, J. P. (2011). Investigating the efficacy of a professional development program in formative classroom assessment in middle school English language arts and mathematics. Journal of Multidisciplinary Evaluation, 8(17), 1–24. Schochet, P. Z. (2008). Statistical power for random assignment evaluations of education programs. Journal of Educational and Behavioral Statistics, 33(1), 62–87. Schoenbach, R., Greenleaf, C., & Murphy, L. (2012). Reading for understanding: How Reading Apprenticeship improves disciplinary learning in secondary and college classrooms (2nd ed.). San Francisco, CA: Jossey-Bass. Spillane, J. P. (1998). Challenging instruction for “all students”: Policy, practitioners, and practice. Chicago, IL: Institute for Policy Research, Northwestern University. Spillane, J. P., Parise, L. M., & Sherer, J. Z. (2011). Organizational routines as coupling mechanisms: Policy, school administration, and the technical core. American Educational Research Journal, 48(3), 586–619. Spillane, J. P., Reiser, B. J., & Reimer, T. (2002). Policy implementation and cognition: Reframing and refocusing implementation research. Review of Educational Research, 72(3), 387–431.
Spillane, J. P., & Zeuli, J. S. (1999). Reform and teaching: Exploring patterns of practice in the context of national and state mathematics reforms. Educational Evaluation and Policy Analysis, 21(1), 1–27. Spybrook, J., & Raudenbush, S. W. (2009). An examination of the precision and technical accuracy of the first wave of group-randomized trials funded by the Institute of Education Sciences. Educational Evaluation and Policy Analysis, 31(3), 298–318. Stein, M. K., Remillard, J., & Smith, M. S. (2007). How curriculum influences student learning. In F. Lester (Ed.), Second handbook of research on mathematics teaching and learning (pp. 319–369). Greenwich, CT: Information Age Publishing. Terhart, E. (2013). Teacher resistance against school reform: Reflecting an inconvenient truth. School Leadership & Management, 33(5), 486–500. Wanless, S. B., Patton, C. L., Rimm-Kaufman, S. E., & Deutsch, N. L. (2013). Setting-level influences on implementation of the Responsive Classroom approach. Prevention Science, 14(1), 40–51. Wayne, A. J., Yoon, K. S., Zhu, P., Cronen, S., & Garet, M. S. (2008). Experimenting with teacher professional development: Motives and methods. Educational Researcher, 37(8), 469–479. Wilson, S. M. (2013). Professional development for science teachers. Science, 340(6130), 310–313. Wilson, S., Lubienski, S. T., & Mattson, S. M. (1996, April). What’s the role of mathematics? A case study of the challenges facing reform-oriented professional development. Paper presented at the annual meeting of the American Educational Research Association, New York, NY. Yanow, D. (1996). How does a policy mean? Interpreting policy and organizational actions. Washington, DC: Georgetown University Press. Yoon, K. S., Duncan, T., Lee, S. W. Y., Scarloss, B., & Shapley, K. L. (2007). Reviewing the evidence on how teacher professional development affects student achievement (Issues & Answers Report, REL 2007–No. 033). Washington, DC: Regional Educational Laboratory Southwest.


