
Do Organizational Supports for Math Instruction Improve the Quality of Beginning Teachers’ Instruction?
by Thomas M. Smith, Laura Neergaard Booker, Eric D. Hochberg & Laura M. Desimone, 2018
Background/Context: Researchers have found that teachers’ effectiveness at increasing student achievement improves during the first few years on the job. Yet little research maps the trajectory of beginning teachers’ instructional quality or investigates what forms of support are associated with variation in this trajectory. Further, although beginning teachers face many challenges not directly related to the rigor of their instruction, such as classroom management, effectively implementing high-quality instruction remains a major challenge.
Purpose/Objective/Research Question/Focus of Study: This article focuses on five research questions: (a) What are the initial levels of beginning seventh- and eighth-grade teachers’ mathematics instructional quality? (b) To what extent are teachers’ preservice qualifications (e.g., major; mathematics knowledge for teaching), prior teaching experience (e.g., weeks of student teaching), and school teaching context (e.g., percent of students receiving free or reduced-price lunch) associated with the quality of their instruction during their first semester of teaching? (c) What are the levels of, and changes in, organizational supports for math instruction that these teachers receive during their first three years in the profession? (d) To what extent does the instructional quality of beginning middle school math teachers change over their first three years of teaching? and (e) To what extent do content-focused supports (e.g., math-focused mentoring, math-focused PD, professional community, principal leadership) provided over these three years predict improvement in instructional quality?
Population/Participants/Subjects: Participants include 62 teachers from eight southeastern and three northeastern districts in the United States.
Research Design: Using observation, survey, and interview data, we identify the links between the organizational supports provided beginning teachers and the teachers’ improvements in instructional quality during their first three years of teaching.
Findings/Results: Results suggest little improvement in the instructional quality of mathematics lessons during the first three years of teaching and that most organizational supports, as they are currently delivered, do not appear to help beginning middle school mathematics teachers improve their instructional quality. Using in-depth case studies, we explore the nature of the supports provided and their potential links to teacher improvement.
Conclusions/Recommendations: Our quantitative findings suggest that current methods of supporting beginning middle school mathematics teachers are not robust enough to support the type of teacher improvement demanded by new math standards, although our qualitative analyses suggest ways of designing these supports to better attend to instructional improvement. Our findings also emphasize the critical role the principal can play in connecting new teachers to effective supports.

Improving teacher quality is the focus of many current educational reforms. Policies aimed at changing teacher behavior include incentive-based initiatives, such as pay for performance, and the implementation of high-stakes evaluation systems based on student test scores, classroom observations, and/or student surveys. Other policies focus on teacher learning, including the provision of supports such as induction, mentoring, coaching, instructional leadership, and other forms of teacher professional development (PD). Our study focuses on this latter set of policies. Researchers have found that teachers’ effectiveness at increasing student achievement improves during the first few years on the job (Clotfelter, Ladd, & Vigdor, 2007; Harris & Sass, 2007; Loeb, Beteille, & Kalogrides, 2012).
Yet little research maps the trajectory of beginning teachers’ instructional quality or investigates what forms of support are associated with variation in this trajectory. Further, although beginning teachers face many challenges not directly related to the rigor of their instruction, such as classroom management (Desimone et al., 2014), effectively implementing high-quality instruction remains a major challenge (e.g., Cohen & Lotan, 2014). We focus on whether current supports provided to beginning middle school mathematics teachers assist them in implementing forms of instruction advocated by National Council of Teachers of Mathematics (NCTM) standards and embedded in the new Common Core and other college-and-career-ready state standards (National Research Council, 2012; U.S. Department of Education, 2010).

RESEARCH QUESTIONS
We are primarily interested in the relationship between the supports provided by schools and districts for the learning of beginning teachers and the quality of their instructional practices. Specifically, we examine the relationship between what we label “organizational supports” for math instruction (formal and informal mentoring, collaboration with colleagues, PD, support from school leadership, and the rigor of the curriculum) and the extent to which beginning teachers implement instruction that reflects key features of high-quality mathematics teaching, namely the selection and implementation of tasks emphasizing high levels of cognitive demand, connections among mathematical concepts, and classroom discourse centered on mathematical reasoning. Using observation, survey, and interview data, we identify the links between the organizational supports provided beginning teachers and the teachers’ improvements in instructional quality during their first three years of teaching. To explore this issue, we answer five research questions:
(1) What are the initial levels of beginning seventh- and eighth-grade teachers’ mathematics instructional quality?
(2) To what extent are teachers’ preservice qualifications (e.g., major; mathematics knowledge for teaching), prior teaching experience (e.g., weeks of student teaching), and school teaching context (e.g., percent of students receiving free or reduced-price lunch) associated with the quality of their instruction during their first semester of teaching?
(3) What are the levels of, and changes in, organizational supports for math instruction that these teachers receive during their first three years in the profession?
(4) To what extent does the instructional quality of beginning middle school math teachers change over their first three years of teaching? and
(5) To what extent do content-focused supports (e.g., math-focused mentoring, math-focused PD, professional community, principal leadership) provided over these three years predict improvement in instructional quality?

CONTRIBUTIONS OF OUR STUDY
Our study contributes to the literature in several ways. First, we take a broader view of induction than many studies by examining the role of multiple aspects of teacher support: formal and informal mentoring, collaboration, participation in PD activities, principal guidance and support, and the curriculum. Where possible, we include not only measures of quantity of support but aspects of quality as well. Second, we broaden the set of outcomes examined in prior research on the impact of induction supports for beginning teachers to include trajectories of teachers’ instructional quality. Though 8 out of 10 beginning teachers participate in induction programs (Wei, Darling-Hammond, & Adamson, 2010), we know little about whether and how these programs improve teachers’ instruction. Comprehensive reviews of the induction literature conclude that much of the research has focused on describing differences among programs (e.g., Johnson & Birkeland, 2003; Long et al., 2012) rather than gauging their effectiveness.
Effectiveness studies have emphasized retention rates and measures of teacher satisfaction rather than instructional quality (DeAngelis, Wall, & Che, 2013; Feiman-Nemser, Schwille, Carver, & Yusko, 1999; Ingersoll & Kralik, 2004; Tricarico, Jacobs, & Yendol-Hoppey, 2015). Further, in the era of Common Core State Standards and other state content standards that emphasize conceptual instruction in mathematics, understanding how organizational supports may contribute to this type of instructional improvement is relevant and timely. A third contribution is that we look at three years of teaching, building on studies that usually only look at one or two years. Fourth, we use a mixed-methods approach that capitalizes on the strengths of survey, observation, and interview data to allow a more comprehensive and in-depth picture of the organizational supports provided beginning teachers and links among organizational supports and instructional quality.

WHY FOCUS ON THE INDUCTION OF MIDDLE SCHOOL MATHEMATICS TEACHERS?
U.S. middle school students continue to score at low levels in math. In 2011, only 42% of eighth graders scored at or above the proficiency level in mathematics on the National Assessment of Educational Progress (National Center for Education Statistics, 2013). The quality of middle school mathematics teaching has been implicated as contributing to this low student performance (Hiebert et al., 2003; Schmidt et al., 2008). For students to succeed in math, teachers must foster basic knowledge, advanced thinking, and problem solving (Hill, 2011). These abilities require a deep understanding of content that many teachers lack (Lewis, 2014; Ma, 1999).
Implementing high-quality instructional practices in mathematics (as required in the Common Core State Standards, for example) is challenging for teachers (Luft, Dubois, Nixon, & Campbell, 2015); doing so requires teachers to change their beliefs, advance their knowledge, and change their routines of practice (Cohen & Ball, 1999; Wood, Jilk, & Paine, 2012).

WHAT DO WE KNOW ABOUT INDUCTION AND INSTRUCTIONAL QUALITY?
Concerns about teacher quality and retention have driven the expansion of induction activities over the past few decades (Ingersoll, 2012; Ingersoll & Kralik, 2004; New Teacher Center, 2012). Designed to help acclimate new teachers to their schools and to the profession, induction programs often consist of activities such as orientation seminars and workshops; the most common component is mentoring, that is, personal guidance provided by veterans to beginning teachers (Fideler & Haselkorn, 1999). In the 2011–2012 school year, four states required new teachers to participate in a state-funded induction program, and 16 states had state-funded mentoring programs for beginning teachers (Education Week, 2011). According to the National Council on Teacher Quality (2014), in 2014, 32 states provided mentoring support to all of their new teachers. These efforts are costly; states and local schools contribute considerable resources to induction and mentoring. For example, in the 2008–2009 school year, the state of California spent about $130 million on new teacher induction (Goldrick, Osta, Barlin, & Burn, 2012), with average per-teacher costs of running an induction program between $1,000 and $9,468 in 2014–2015 (California Commission on Teacher Credentialing, 2015). An important goal of many mentoring programs is improving the instructional practices of beginning teachers (Isenberg et al., 2009; Stanulis & Floden, 2009; Stanulis, Little, & Wibbens, 2012).
Instructional support, often stated as the primary purpose for new teacher mentoring (Smith, Desimone, Porter, McGraner, & Taylor Haynes, 2012), frequently occurs via observation and feedback as well as guidance with planning and pacing. Early studies examining the relationship between induction supports and effects on instruction did not focus on full-time teachers (Ratsoy, 1987), failed to emphasize content (Feiman-Nemser & Parker, 1990), or returned inconclusive findings (Klug & Salzman, 1991). More recently, Ingersoll and Strong (2011) reviewed studies that examine the impact of induction on various outcomes. Of the five studies that examined the relationship between induction and the quality of beginning teachers’ instructional practices, the authors reported that only Thompson, Paek, Goe, and Ponte (2004), who examined the impact of California’s Beginning Teacher Support and Assessment program on teacher practice, analyzed multiple sources of data to measure instruction, including classroom observation. Thompson et al. (2004) studied third-, fourth-, and fifth-grade public school teachers in their third year of teaching, and they observed 34 teachers. The authors found that beginning teachers with higher engagement in induction tended to have higher ratings on most of the measures of teaching practice, though findings were only statistically significant for instructional planning. In a large-scale, randomized controlled trial studying the impacts of participation in an intensive mentoring program on instructional practices, student achievement, and teacher mobility, Glazerman et al. (2008) found no differences in instruction between treatment and control teachers in their first year; they did find effects on student achievement for teachers who had participated in two years of intensive induction (Isenberg et al., 2009). No classroom observation was conducted during the second year of the evaluation, however. Although Thompson et al. (2004) and Glazerman et al.
(2008) made important progress in discerning the relationship between induction and instruction, neither examined the development of beginning teachers’ practice over time. There is a clear need to examine whether (and if so, how) the supports schools and districts provide to beginning teachers, what we call organizational supports, improve their teaching quality (Loeb et al., 2012).

OUR DEPENDENT VARIABLE: COGNITIVE DEMAND, CONNECTIONS, AND DISCOURSE AS MEASURES OF INSTRUCTIONAL QUALITY
Both Curriculum and Evaluation Standards (1989) and Principles and Standards for School Mathematics (2000), published by the National Council of Teachers of Mathematics, called for comprehensive reforms to traditional mathematics instruction. Two central aspects of high-quality mathematics instruction outlined in these standards are the use of genuine, challenging tasks and classroom discourse that focuses on key mathematical ideas that emerge from individual and collective efforts to solve such problems (Stein, Engle, Smith, & Hughes, 2008); these ideas are consistent with those in recent Common Core standards (Cobb & Jackson, 2011a). Scholars (e.g., Polikoff, Porter, & Smithson, 2011; Stein, Grover, & Henningsen, 1996) classify the cognitive demand of tasks into those with low and high cognitive demand. Tasks with low cognitive demand require students to memorize or reproduce facts, or perform relatively routine procedures without making connections to the underlying mathematical ideas. Tasks with high cognitive demand require students to make connections to the underlying mathematical ideas, use procedures to solve tasks that are open with regard to which procedures to use, and, ideally, engage students in disciplinary activities of explanation, justification, and generalization. Stein and Lane (1996) found that the use of tasks with high cognitive demand was related to greater student gains on an assessment requiring high levels of mathematical thinking and reasoning.
In particular, the greatest gains occurred when teachers assigned tasks that were initially of high cognitive demand and maintained that level of demand throughout the lesson. These forms of mathematics instruction differ from typical math instruction that generally requires low cognitive demand from students and is consistent with “drill-and-practice” pedagogy (Valli, Croninger, & Buese, 2012).

HOW DOES TEACHERS’ (CONTENT AND PEDAGOGICAL) KNOWLEDGE RELATE TO INSTRUCTIONAL QUALITY?
Research suggests that both content knowledge and pedagogical content knowledge are critical for effective teaching. Teachers weak in either of these areas express more misconceptions and tend to pose questions of low cognitive level (Ball, Thames, & Phelps, 2008). Further, higher levels of subject matter knowledge and pedagogical content knowledge are associated with higher student achievement (e.g., Boyd, Grossman, Lankford, Loeb, & Wyckoff, 2009; Campbell et al., 2014; Darling-Hammond, 2000; Ferguson & Womak, 1993; Goldhaber & Brewer, 1997; Hill, Rowan, & Ball, 2005). Previous research on induction has not adequately addressed how the content knowledge of newly hired teachers influences their initial teaching quality or improvement in their teaching during their first few years in the profession. Based on prior research, we suspect that beginning teachers with stronger pedagogical content knowledge and mathematics content knowledge select and implement higher-level cognitive demand tasks and engage their students in richer whole-class discussion; we also suspect that the quality of their instruction would improve more rapidly than that of other beginning teachers with less pedagogical experience and math content knowledge. As we discuss in detail later, our findings do not back up these assumptions.

THE ROLE OF SUPPORTS IN NEW TEACHER LEARNING
Schools and districts offer multiple supports designed to address gaps in beginning teachers’ content knowledge and pedagogical content knowledge and to improve the quality of their instruction. As we mentioned earlier, we conceptualize induction to involve all the supports that beginning teachers draw on, including interaction with a formally assigned mentor, interaction with informal mentors whom the beginning teacher seeks out, collaboration with other teachers, participation in PD activities, the guidance and support they receive from their principal, and the curriculum they use. For mentoring, PD, and collaboration, we are particularly interested in activities focused on mathematics content and the teaching of mathematics, and the degree to which participation in these activities predicts instructional improvement.

Mentoring
Recent research indicates that having a mentor–mentee subject match improves retention (Ahn, 2014; Desimone et al., 2014; Smith & Ingersoll, 2014). Research also suggests that early content-focused PD experiences can shape instruction (Feiman-Nemser, 2012; Luft et al., 2015; Luft, Roehrig, & Patterson, 2003). Prior studies have been much less specific, however, with regard to the structure or quality of these supports, such as whether they are provided by a formally assigned mentor or a colleague whom a teacher seeks out on his or her own, the total amount of time that beginning teachers interact with their formal and informal mentors, or the content of the interactions (i.e., to what extent they were focused on math content, teaching math, or assessing math).

PD
Beginning teachers may also participate in PD specifically designed for novice teachers or in activities designed for teachers of all experience levels.
Though recent findings have shown that PD focused on specific content areas is associated with some changes in teaching practice, the link to student achievement gains is more tenuous (Desimone, Smith, & Phillips, 2013; Garet et al., 2010). We focus on the content of all support activities for new teachers, including the extent to which teacher interactions are focused on mathematics content or specific content-related instructional material, strategies, and lesson planning (Ball et al., 2008; Boston, 2012; Stodolsky & Grossman, 1995). We hypothesized that more intensive supports with a greater content focus would be associated with larger gains in teachers’ instructional quality.

Collaboration
We also consider the role that collaboration, or professional learning communities, plays in creating a productive learning environment for beginning mathematics teachers (Lee & Smith, 1996; Pogodzinski, 2015; Ronfeldt, Farmer, McQueen, & Grissom, 2015). We see the measurement of a new teacher’s sense of his or her professional learning community as critical to understanding the impact of the organizational supports available in schools, given the finding that PD, collaboration between teachers, and collegiality between teachers and leaders are unlikely to be effective unless they are connected to a sense of professional community defined by shared vision, expectations, commitment, responsibility for student learning, and trust (Bryk, Sebring, Allensworth, Easton, & Luppescu, 2010; Bryk & Schneider, 2002; Elmore, Peterson, & McCarthey, 1996; Kruse, Louis, & Bryk, 1994; Louis & Marks, 1998).

Principals
We also recognize that principals have the potential to influence the instructional development of new teachers by setting schoolwide instructional goals, arranging mentoring or PD experiences for teachers, and monitoring their instruction and providing instructional direction (Drago-Severson, 2012; Hallinger, 2003; Heck, 1992; Leithwood & Jantzi, 2008), although research on the effects of principals providing content-focused support is limited (Nelson & Sassi, 2005; Stein & Nelson, 2003). Results of meta-analyses do show, however, large student achievement effects when principals offer a clear academic vision for their schools and provide instructional supervision (Hallinger & Heck, 1998; Robinson, Lloyd, & Rowe, 2008). Principals’ direct work with teachers can include discussing instructional strategies, providing evaluations that help teachers improve their practices, encouraging the use of different instructional strategies, and observing classroom instruction frequently (Drago-Severson, 2012; Sun, Youngs, Yang, Chu, & Zhao, 2012). Supovitz, Sirinides, and May (2010) found a statistically significant association between principal leadership and self-reported change in teachers’ instructional practices. Thus, although there is little research on how aspects of principal leadership influence change in teachers’ instructional practice during their first few years in the profession, prior research suggests that schoolwide instructional improvement can be driven by principals taking an active role in instructional leadership.

Curriculum
Finally, we know that the curricular tools that teachers have at their disposal can influence their instructional practices (Remillard, Herbel-Eisenmann, & Lloyd, 2009). Inexperienced teachers are especially prone to stick close to the curriculum that they are provided (Grossman & Thompson, 2008).
The teachers in our study used curricula ranging from more traditional (e.g., Glencoe) to more reform-oriented (e.g., the Connected Mathematics Project, CMP). Identifying the relationship between curriculum and instructional practice is important because curricular tools like CMP contain a greater proportion of cognitively demanding tasks (Cai, Moyer, Wang, & Nie, 2009). Thus, we would expect teachers using a curriculum like CMP to select more cognitively demanding tasks (Remillard & Heck, 2014). A number of studies conclude that students who have access to reform-oriented curricula do as well on tests of computation and perform better on conceptual understanding and problem solving than students taught with traditional curricula (Schoenfeld, 2002).

METHODS
This study uses teacher- and student-level data from a longitudinal study of the induction and mentoring experiences of beginning middle school mathematics teachers. The study explores the natural variation among teachers who work in states with and without formal mentoring policies. We discuss the participants, measures, and issues with missing data in the following sections, followed by an explanation of our analysis methods and findings.

PARTICIPANTS
Participants include 62^{1} teachers from eight southeastern and three northeastern districts in the United States. Districts were chosen based on their location (proximity to researchers) as well as their likelihood of having new middle school mathematics teachers. Teachers were invited to participate after districts identified them as meeting two inclusion criteria: (a) serving as the teacher of record for at least one seventh- or eighth-grade general education math class, and (b) having no prior experience as a teacher of record. Stipends were offered for each year of participation. About 50% of eligible teachers who were recruited to the study participated in it.
This analysis uses three years of survey and classroom observation data from cohorts 1 (teachers who began teaching in 2007–2008), 2 (teachers who began teaching in 2008–2009), and 3 (teachers who began teaching in 2009–2010). We pooled the data so that we could examine the first-year, second-year, and third-year teaching experiences across cohorts.

MEASURES
This section details the observational and survey measures that we used to answer the research questions.

Dependent Variable: Instructional Quality
We measured teachers’ instructional quality using the Instructional Quality Assessment (IQA) (Boston & Wilhelm, 2015; Junker et al., 2006; Matsumura, Garnier, Slater, & Boston, 2008), which bases ratings on the degree to which the teacher selects and implements cognitively demanding problem-solving tasks and organizes discussions emphasizing reasoning and connections among mathematical ideas. The IQA’s design is based on guidelines for instructional practice as articulated in the National Research Council’s publication, How People Learn (Bransford, Brown, & Cocking, 2000), and summarized for practitioners as the Principles of Learning (Resnick & Hall, 2001). The IQA was originally designed for external evaluation, but its creators have argued that it is also positioned to serve as a resource for professional growth (Crosson et al., 2006). Indeed, the IQA has been used to gauge the quality of instruction across districts (Boston & Wilhelm, 2015), in studies of PD interventions designed to improve teachers’ instructional quality (Boston & Smith, 2011), as well as in studies of district efforts to improve math instruction (Cobb, Henrick, & Munter, 2011). The reliability of the rubrics for assessing the quality of mathematics lessons is good (alpha = .89; interrater agreement = 82% overall) (Matsumura et al., 2006). The IQA assesses the quality of observed classroom instruction on two separate sets of rubrics: academic rigor and accountable talk.
The three rubrics assessing academic rigor are (a) task potential, (b) task implementation, and (c) rigor of discussion following students’ work on the task. IQA ratings range from 0 to 4. For task potential and implementation, a score of 0 specifies absence of mathematical activity in a lesson; 1 indicates tasks or instruction emphasizing facts and memorization; 2 indicates tasks or instruction emphasizing unambiguous application of procedures and single representations of concepts; and 3 and 4 designate tasks or instruction characterized by open-ended tasks, multiple representations of mathematical concepts, and connections among mathematical ideas, with a 3 characterized by a lack of connections or explicit evidence of students’ reasoning. For rigor of discussion, a score of 0 indicates no discussion of the task; 1 specifies discussion where students provide brief or one-word answers; 2 designates discussion in which students show or describe their work but do not talk about their strategies or mathematical ideas; and 3 and 4 designate discussion characterized by student explanations for their strategies and connections to the underlying mathematical ideas involved in the task, with a 3 indicating lack of clear and thorough explanation in student responses. Accountable talk is assessed using five rubrics:
(a) Participation: Was there widespread participation in teacher-facilitated discussion?
(b) Teacher’s Linking: Does the teacher support students in connecting ideas and positions to build coherence in the discussion?
(c) Students’ Linking: Do students’ contributions link to and build on each other?
(d) Asking (Teachers): Were students pressed to support their contributions with evidence and/or reasoning? and
(e) Providing (Students): Did students support their contributions with evidence and/or reasoning?
An overall measure of instructional quality was created by (1) averaging the task potential and implementation scores, (2) averaging the academic rigor discussion score and five additional accountable talk scores (i.e., participation, teacher linking, student linking, asking, and providing), and (3) averaging the two averages. The creation of this overall instructional quality measure was supported by an exploratory factor analysis that indicated that the eight IQA scores separate into two main factors: one that includes the task potential and implementation and another that includes all the discussion ratings. This article therefore focuses on three IQA outcomes: (1) a combined task potential and implementation score constructed by averaging the rubric scores for task potential and task implementation, (2) an overall discussion score constructed by averaging the academic rigor discussion score and the five accountable talk scores, and (3) the overall IQA score.
Participating teachers’ instruction was recorded during the same class period on 2 consecutive school days in the fall and spring of their first year, and in the spring of their second and third years. One district did not allow videotaping, so live rating and audio recording were used. Videographers logged the sequence of activities that occurred in the classroom and collected or recorded (by hand or with video) the mathematical tasks students were assigned to work on during class. Raters participated in at least two full days of training conducted by IQA developers. Before rating began, each rater participated in interrater reliability exercises in which all raters on the team viewed the same subset of classroom videos, coded the videos according to the IQA protocol, and discussed their coding with one another to reach mutual understanding and agreement. Raters did not begin individually coding lessons until 80% interrater agreement was reached on the lessons used for these exercises.
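The three IQA outcomes defined earlier in this section can be illustrated with a short Python sketch. It is only an illustration of the averaging steps as described in the text, not the authors' actual scoring code; the dictionary keys, the function name `iqa_composites`, and the example ratings are hypothetical.

```python
from statistics import mean

# Hypothetical key names for the eight IQA rubric scores (each rated 0-4).
TASK_RUBRICS = ("task_potential", "task_implementation")
DISCUSSION_RUBRICS = (
    "rigor_of_discussion",  # academic rigor discussion score
    "participation",
    "teacher_linking",
    "student_linking",
    "asking",
    "providing",
)


def iqa_composites(scores):
    """Return the three outcomes described in the text for one observation.

    task:       mean of task potential and task implementation
    discussion: mean of rigor of discussion and the five accountable talk scores
    overall:    mean of the two composites above
    """
    task = mean(scores[k] for k in TASK_RUBRICS)
    discussion = mean(scores[k] for k in DISCUSSION_RUBRICS)
    overall = mean([task, discussion])
    return {"task": task, "discussion": discussion, "overall": overall}


# Made-up ratings for a single observation, for illustration only.
example = {
    "task_potential": 3,
    "task_implementation": 2,
    "rigor_of_discussion": 2,
    "participation": 1,
    "teacher_linking": 1,
    "student_linking": 1,
    "asking": 2,
    "providing": 2,
}
print(iqa_composites(example))  # task 2.5, discussion 1.5, overall 2.0
```

Because the overall score averages the two composites rather than all eight rubrics, the two task rubrics together carry the same weight as the six discussion rubrics together.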
Table 1 shows the number of recordings that were conducted at each time point. Sixty-six teachers participated in at least one component of the data collection. At least one IQA observation was conducted for 62 of the teachers; no IQA scores were collected for four teachers. Some teachers entered into the study late, and others are missing certain observation periods because of scheduling or technology issues. Our analyses make use of data on 62 teachers. Because of teachers either leaving the profession (n = 8), moving to different school districts (n = 10), switching subject areas or to grades outside of middle school (n = 2), or dropping out of the study (n = 7), we have a full three years of data for only 35 teachers.

Table 1. Number of Teachers in the Sample at Each Time Period
Notes:
1. Teachers entered the study late due to switching subject areas and grade levels in January of year 1.
2. One teacher observation is missing because the teacher was out for maternity leave, but this teacher was observed in year 3.

Recordings of each lesson were viewed and rated on the IQA by two independent raters. One set of IQA ratings for each of the four observation points was generated by averaging across the two raters for each day and then across the consecutive days as a means of improving reliability. A third rater was brought in if any of the eight rubric scores differed across the two initial raters by 2 or more points and if the difference crossed the 2-to-3-point threshold. For example, if the ratings were 0 and 2, we did not use a third rater, but we did use a third rater if the scores were 1 and 3, 2 and 4, or 0 and 3. Codes were then averaged across the two closest raters. A third rater was needed for about 25% of the observations.
We used multiple methods to assess the reliability of IQA ratings. Exact agreement between paired coders (calculated as the total number of agreements divided by the total number of agreements and disagreements) was 60% overall. This is lower than the overall exact point interrater reliability of about 80% found in a pilot study of 13 middle school mathematics teachers (Matsumura et al., 2006) but higher than the about 50% exact interrater agreement on the academic rigor rubrics found in a pilot conducted with 14 elementary school teachers (Boston & Wolf, 2006). One-point agreement, where we considered raters in agreement if individual scores were within 1 point of one another on each IQA rating scale, was 88%. In addition, we conducted a generalizability study (G-study) at four time points during the coding process to verify that our design for rating lessons (with two raters rating two lessons each at each time point) provided a stable estimate of instructional quality.
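The third-rater rule and the two agreement statistics described above can be sketched in Python as follows. This is a hedged illustration rather than the study's procedure: the function names are hypothetical, reading "crossed the 2-to-3-point threshold" as "one rating is 2 or lower and the other is 3 or higher" is an interpretation based on the examples in the text, and the tie-breaking used when picking the two closest raters is an assumption, since the text does not specify it.

```python
from itertools import combinations


def needs_third_rater(a, b):
    """True if two ratings differ by 2+ points and straddle the 2-to-3 boundary.

    Interpretation (assumption): the lower rating is <= 2 and the higher is >= 3.
    """
    low, high = min(a, b), max(a, b)
    return (high - low) >= 2 and low <= 2 and high >= 3


def average_two_closest(ratings):
    """Average the two closest ratings among three raters.

    Assumption: if two pairs are equally close, the first pair found is used.
    """
    pair = min(combinations(ratings, 2), key=lambda p: abs(p[0] - p[1]))
    return sum(pair) / 2


def agreement_rate(pairs, tolerance=0):
    """Share of rating pairs within `tolerance` points of each other.

    tolerance=0 gives exact agreement; tolerance=1 gives one-point agreement.
    """
    hits = sum(1 for a, b in pairs if abs(a - b) <= tolerance)
    return hits / len(pairs)


# The examples given in the text.
print(needs_third_rater(0, 2))  # False
print(needs_third_rater(1, 3))  # True
print(needs_third_rater(2, 4))  # True
print(needs_third_rater(0, 3))  # True

# Made-up paired ratings, for illustration only.
paired = [(2, 2), (1, 2), (0, 3), (4, 4)]
print(agreement_rate(paired))               # 0.5 exact agreement
print(agreement_rate(paired, tolerance=1))  # 0.75 one-point agreement
```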
For each Gstudy, each rater who was coding at the time when the Gstudy was conducted independently rated two lessons from each teacher in a random sample, and the ratings given to the lessons were analyzed using GENOVA software (Crick & Brennan, 2001). At the teacher level, generalizability coefficients with two raters and two observations for each teacher ranged from .74 to .98, with an average of .81, indicating sufficient reliability. Independent Variables We created variables measuring the quantity of mathrelated instructional support provided through mentoring and PD, as well as teachers’ perceptions of their school leadership and professional community. We created these variables from data generated from surveys that we administered. At each of the observation periods, we asked teachers to complete the surveys, which included questions about (a) the mentoring that they received, from both formally assigned mentors and other individuals to whom the teacher reported going for assistance with mathematics instruction (referred to here as informal mentors); (b) the content of PD they participated in; (c) the interactions with their principal; and (d) the degree of professional community in their school. Mentoring. Teachers in our sample reported having between zero and five formal mentors; most had either one or three, a reflection that a large portion of our sample taught in a state with a formal induction policy requiring mentorship from a threeperson committee. The number of informal mentors per teacher ranged from zero to six, although most reported between one and three. On average, teachers spent more time interacting with informal mentors than with formal mentors, but there was no statistically significant difference in the percentage of time devoted to mathspecific instructional topics between informal and formal mentors after taking total time into account. 
Because teachers’ mentorship experiences typically included both formal and informal elements, and these elements appear to be supplementary rather than compensatory, we aggregated formal and informal mentoring in our analyses. Surveys asked teachers to check whether their formal and informal mentoring on 14 topics was either not a focus, a minor focus, or a major focus. These topics included classroom management, parental involvement, emotional support and stress management, and several topics directly related to instruction. Because our analysis focuses on the role of contentspecific supports, we used four of these topics to form a variable for mathspecific instructional supports. To generate this variable, we first calculated the total amount of mentoring that teachers received from each mentor, using data from the four survey questions about frequency of formal inperson meetings, the average duration of those meetings, and frequency of informal communications. The questions from the firstyear fall and spring surveys asked about the previous semester, whereas the questions from the second and thirdyear surveys referred to the previous school year. To estimate the total amount of mentoring time each teacher received from each mentor, we multiplied the frequency of formal inperson meetings by the meeting duration and added to this the product of the estimate of informal communication frequency and 7.5 minutes (see the formula that follows; 7.5 minutes was the midpoint for the 0–15 minutes option). Although this procedure may result in an overestimate of total mentoring time, the overestimate applies uniformly across teachers in our sample and, therefore, enables us to make comparisons of relative time spent on mathrelated mentoring. 
Total time = [Frequency of formal meetings × Average duration of formal meetings] + [Frequency of informal communications × 7.5 minutes]

We used the total time estimates to calculate the amount of time spent on math-related instructional support. On each survey, we provided teachers with a list of topics and asked them to rank topics based on the level of focus given to the topics in their formal and informal meetings with each mentor; topics that were not a focus were ranked 1, topics that were a minor focus were ranked 2, and topics that were a major focus were ranked 3. To estimate the amount of time spent on math-related instructional support, we first summed all the content foci for each mentor, counting major foci as a 2 and minor foci as a 1. Next, we summed just the math-related instructional support foci using the same procedure. We labeled as math-related the following content areas: how students learn mathematics, deepening your subject-matter knowledge of mathematics, individualized instruction in mathematics, and analyzing student work. Finally, we divided the math focus rating by the total focus rating to calculate a percentage of math focus, which we then multiplied by the total time estimation for each mentor to create a more interpretable measure of the amount of math-related support provided through mentoring.

Professional development and orientation. To determine duration and content of PD supports, we asked teachers how many hours they spent on each of the four content-area topics (how students learn mathematics, deepening your subject-matter knowledge of mathematics, individualized instruction in mathematics, and analyzing student work) during their PD participation. We used their responses to create a variable for PD math content time for each teacher for each year by adding together the total number of PD hours spent on the four math-related instructional support topics.
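The mentoring-time calculation described above can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the study's actual processing code: the function names, topic labels, and example values are hypothetical, and the weights follow the summing step in the text (major focus = 2, minor focus = 1), with topics that were not a focus assumed to contribute 0.

```python
INFORMAL_MINUTES = 7.5  # midpoint of the 0-15 minutes survey option, per the text

# The four topics the text labels as math-related (labels are hypothetical).
MATH_TOPICS = {
    "how_students_learn_math",
    "math_subject_knowledge",
    "individualized_math_instruction",
    "analyzing_student_work",
}

# Assumed weights: major = 2 and minor = 1 as in the text; "none" = 0 is an assumption.
FOCUS_WEIGHT = {"major": 2, "minor": 1, "none": 0}


def total_minutes(formal_meetings, avg_formal_minutes, informal_contacts):
    """Total time = formal meetings x duration + informal contacts x 7.5 minutes."""
    return formal_meetings * avg_formal_minutes + informal_contacts * INFORMAL_MINUTES


def math_minutes(total, topic_focus):
    """Scale total time by the share of focus weight on math-related topics."""
    all_focus = sum(FOCUS_WEIGHT[f] for f in topic_focus.values())
    if all_focus == 0:
        return 0.0
    math_focus = sum(FOCUS_WEIGHT[f] for t, f in topic_focus.items() if t in MATH_TOPICS)
    return total * math_focus / all_focus


# Hypothetical mentor record (not study data).
total = total_minutes(formal_meetings=10, avg_formal_minutes=30, informal_contacts=20)
focus = {
    "classroom_management": "major",
    "parental_involvement": "minor",
    "how_students_learn_math": "major",
    "analyzing_student_work": "minor",
}
math_time = math_minutes(total, focus)
```

In this example, half of the focus weight falls on math-related topics, so half of the estimated 450 minutes is counted as math-related mentoring.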
In the fall of their first year, we also asked about the hours of mathrelated orientation activities in which the teachers participated. Finally, we summed the four mathrelated instruction support categories to create a math orientation variable. Leadership. In the spring of each year, teachers indicated whether they (1) strongly disagreed, (2) disagreed, (3) agreed, or (4) strongly agreed that the school administration (a) lets staff members know what is expected of them, (b) is supportive and encouraging, (c) enforces school rules for student conduct, (d) recognizes staff members for a job well done, (e) provides time for teachers to meet and share ideas with one another, (f) deals effectively with pressures from outside the school, (g) encourages innovative instructional practices, and (h) backs me up when I need it. From their responses, we created a leadership support scale (alpha reliability ranged from .80 to .90, depending on the time period). Professional community. To indicate the teachers’ perceptions of the supportiveness of their colleagues and the overall school culture, we generated a professional community scale. In the spring of each year, teachers indicated whether they (1) strongly disagreed, (2) disagreed, (3) agreed, or (4) strongly agreed that teachers in their school (a) feel supported by colleagues to try out new ideas, (b) regularly share ideas and materials related to instruction, (c) trust each other, (d) feel responsible to help each other do their best, (e) are willing to question one another’s views, (f) share high expectations for student work, and (g) share a vision of good teaching (alpha reliability ranged from .84 to .89, depending on the time period). Teacher background. Several teacher background and other school characteristics, attained from our initial survey administered in the fall of the first year for each cohort, were included as predictors of IQA scores during the first semester of teaching. 
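The alpha reliabilities reported for the leadership and professional community scales refer to Cronbach's alpha. The sketch below shows the standard formula applied to 1-4 agreement ratings; the function names and example ratings are hypothetical, and averaging items to form a scale score is an assumption, since the text does not state how item responses were combined.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of per-teacher rating lists (one rating per item).

    alpha = k / (k - 1) * (1 - sum of item variances / variance of total scores)
    """
    k = len(responses[0])
    items = list(zip(*responses))  # transpose: one tuple of ratings per item
    item_var_sum = sum(variance(item) for item in items)
    total_var = variance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - item_var_sum / total_var)


def scale_score(row):
    """Assumed scale construction: the mean of a teacher's item ratings."""
    return sum(row) / len(row)


# Hypothetical ratings from five teachers on three items (not study data).
ratings = [[3, 3, 4], [2, 2, 2], [4, 3, 4], [1, 2, 1], [3, 4, 3]]
alpha = cronbach_alpha(ratings)
scores = [scale_score(row) for row in ratings]
```

Values closer to 1 indicate that the items move together across teachers, which is the sense in which the reported ranges (.80 to .90 and .84 to .89) indicate internally consistent scales.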
Variables measuring the subjectcontent of teachers’ preservice training were generated from information provided about major and minor fields of study for each of the degrees indicated on the fall firstyear survey. Teachers were categorized as having a math degree if they listed mathematics or math education as their major or minor area of study for either a bachelor’s or master’s degree. Teachers were categorized as having an education degree if they did not have a math major, a math minor, or a matheducation degree but did have a bachelor’s degree with a major or minor in education or a master’s degree in education. Of the 62 teachers included in the study, 8% (n = 5) had math degrees, 24% (n = 15) had math education degrees, and 42% (n = 26) had education degrees but no math degree. The remaining 16 teachers did not have math, math education, or education degrees. Teacher preservice pedagogical experience was measured by the number of weeks of student teaching, reported on the fall firstyear survey. The average student teaching period was 11 weeks (16 of the beginning teachers had no student teaching). Teachers were also assessed on the Math Knowledge for Teaching (MKT) assessment developed by scholars at the University of Michigan (Hill, Schilling, & Ball, 2004), which measures mathematical knowledge as it is used within particular tasks of teaching. Current measures consist of 10 to 12 multiplechoice prompts, achieve reliability of .70 or above, and can be used as a pretest/posttest to assess teachers’ knowledge growth (Hill, Schilling, & Ball, 2004). The average MKT score of beginning teachers in our sample was .52, about half a standard deviation below the mean of a national norming sample of middle school teachers (Hill, 2007). School context. 
School characteristics measured as variables included use of reform or inquirybased mathematics curricula and school free and reduced price (FRPL) lunch information, obtained from the Common Core of Data collected by the National Center of Education Statistics. The average percent of FRPL students at a particular school ranged from 8% to 99%. About 42% (n = 26) of teachers used a reform mathematics curriculum. To construct this variable, we first identified the mathematics curriculum or textbook used in the class we observed for each teacher. We then classified the mathematics curriculum based on whether it was one of the programs deemed “exemplary” by the NSF (Borasi & Fonzi, 2002), assigning a value of 1 if it was and 0 if it was not. Teachers with a value of 0 on this variable generally used other commercially available programs that did not meet the NSF criteria, which include alignment with NCTM standards, development by groups of specialists in both content and pedagogy, revision based on fieldbased evaluations, emphasis on the development of mathematical concepts, and support for teachers through supplemental materials (Borasi & Fonzi, 2002). Two teachers taught in a district that used the Carnegie Learning program, which, although not on the NSF list of exemplary programs (and not funded by NSF), met similar criteria (Ritter, 2010); consequently, we included these teachers in the reform mathematics curriculum group. Class observed. Teachers may change their instruction depending on the level of the course taught. Because the level of the observed class for each teacher was not consistent from year to year, we controlled for the level of the course observed by using a binary variable to indicate advanced mathematics. We counted as advanced math honors math, prealgebra, and algebra. We gathered course data from teachers’ schedules and our observation notes. Month of observation. 
We created a time variable based on the month in which the IQA observation took place. The month variable ranged from 1 to 36 (e.g., August of the first year was coded as 1, and August of the second year was coded as 13).

ANALYSIS METHODS

Survey and Observation Data

To investigate the initial levels of the teachers’ mathematics instructional quality and the levels of, and changes in, organizational supports for math instruction that these teachers received, we examined descriptive statistics. We then used ordinary least squares (OLS) regression analysis to investigate what teacher background and school context factors are associated with teachers’ initial instructional quality on the IQA. To assess whether instructional quality improved over time, we employed a two-level hierarchical linear growth model (Raudenbush & Bryk, 2002), with time being the only predictor. Most researchers use growth modeling to study change in student achievement, though some researchers have applied this method to the study of teacher pedagogical and content knowledge (Goldschmidt & Phelps, 2010). Growth curve modeling has rarely been used to examine change in ratings of teachers’ instructional quality. Given that the achievement of beginning teachers’ students tends to rise during the teachers’ first few years of teaching, we expected to see improvement over time in beginning teachers’ instructional quality, as measured by the IQA (Clotfelter et al., 2007; Harris & Sass, 2007). The two-level hierarchical model allowed monthly growth in IQA to vary randomly across teachers (i.e., a random slope was fit on the month variable). The Level 1 model is:

$$Y_{it} = p_{0i} + p_{1i}(\text{Month of Observation})_{it} + e_{it}$$

Level 1 is a repeated-observations model where $Y_{it}$ is the IQA score at month $t$ for teacher $i$.
The coefficient $p_{0i}$ represents an estimate of the initial IQA score for teacher $i$ prior to the start of his or her first year of teaching (estimated initial score), and $p_{1i}$ is the estimated monthly growth rate for teacher $i$. The variable $e_{it}$ is the within-teacher error term, assumed to be normally distributed with a mean of 0 and constant variance. The Level 2 models measure differences between teachers in their initial status and rate of growth, which in this model are the teacher’s predicted initial score and the teacher’s monthly growth rate. Thus the Level 2 models are as follows:

$$p_{0i} = b_{00} + r_{0i}$$

$$p_{1i} = b_{10} + r_{1i}$$

To investigate which content-focused supports predict improvement in instructional quality, we included math-focused mentoring, math-focused PD, professional community, and principal leadership as time-varying covariates in the Level 1 model. We also controlled for whether we observed the teacher teaching an honors/advanced or regular math course at each time period. As a sensitivity test, we also implemented a teacher fixed-effects specification.

The longitudinal nature of the data allows us to use growth curve analysis to examine the relationship between organizational supports for math instruction and the extent to which teachers implement math instruction that is characterized by cognitively demanding tasks as well as opportunities for students to explain mathematical content. Advantages of growth curve modeling are (a) that assessment times do not have to be identical, which allows respondents with missing data to remain in the analysis, and (b) that it captures the time-ordered nature of the observations (Raudenbush & Bryk, 2002). We fit the multilevel models using Stata 11, specifying an independent covariance structure and restricted maximum likelihood (REML) to estimate the variance components.
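As a worked step, substituting the Level 2 equations into the Level 1 equation gives the combined (mixed) form of the unconditional growth model. The derivation below uses only the symbols defined above and abbreviates Month of Observation as Month; the variance symbols $\tau_{00}$, $\tau_{11}$, and $\sigma^2$ are introduced here for illustration and do not appear in the original text, and the time-varying covariates are omitted.

```latex
\begin{aligned}
Y_{it} &= p_{0i} + p_{1i}\,\text{Month}_{it} + e_{it} \\
       &= (b_{00} + r_{0i}) + (b_{10} + r_{1i})\,\text{Month}_{it} + e_{it} \\
       &= \underbrace{b_{00} + b_{10}\,\text{Month}_{it}}_{\text{fixed part}}
          + \underbrace{r_{0i} + r_{1i}\,\text{Month}_{it} + e_{it}}_{\text{random part}},
\end{aligned}
\qquad
r_{0i} \sim N(0, \tau_{00}),\quad
r_{1i} \sim N(0, \tau_{11}),\quad
e_{it} \sim N(0, \sigma^{2}).
```

Here $b_{00}$ is the average estimated initial score and $b_{10}$ is the average monthly growth rate, so the test of whether instructional quality improved over time is a test of $b_{10}$. Under the independent covariance structure mentioned above, the covariance between $r_{0i}$ and $r_{1i}$ is constrained to 0.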
Interview Data To explore how teachers who do and do not show improvement in instructional quality over their first three years teaching describe the nature of the supports they receive, we used comparative case study methods to examine a subset of 8 teachers from a single school district. New teachers express beliefs and enact practices that reflect their districts’ approaches to instruction (Achinstein, Ogawa, & Speiglman, 2004; Boston & Wilhelm, 2015; Grossman, Thompson, & Valencia, 2002; Youngs, HoldgreveResendez, & Qian, 2011). Because beginning teachers also tend to rely heavily on the prescribed curriculum (Grossman & Thompson, 2008), focusing on teachers from the same district allows us to examine differences in teachers’ descriptions of their supports within the same local and curricular context. We chose the school district that contributed the largest number of teachers to the study to help ensure sufficient variation in instructional quality. This district uses a reform mathematics curriculum, the Connected Mathematics Project 2 (CMP2), which is aligned with the vision of highquality instruction evaluated by the IQA and has an intensive, districtwide induction program. Teachers were interviewed and observed at four time points: winter of the first year of teaching and the spring of the first, second, and third years. To explore the development of beginning teachers’ instructional practices over time, it was important to have at least 2 years of data. Because no interviews were conducted with Cohort 3 teachers in their third year, only teachers from Cohorts 1 and 2 were eligible for case study selection. Eleven teachers from the selected district met these requirements. Our focus was to compare teachers who improved on the IQA and teachers who did not improve on the IQA, as well as teachers who reported high amounts of mentoring and PD and teachers who reported low amounts of mentoring and PD. 
Thus, we placed each of the 11 teachers into one of four categories: high support, improvement; high support, no improvement; low support, improvement; and low support, no improvement. Teachers were identified as being either improving or not improving by examining their IQA scores on the task potential, implementation, and discussion rubrics, as well as by looking at their predicted growth coefficient on the overall score. Of the 11 teachers, 3 were categorized as improving based on their positive growth coefficients, and another teacher was classified as improving based on high scores in the third year of teaching. We selected these 4 teachers as our improving teachers. Seven teachers were classified as nonimprovers based on a growth coefficient below or near 0. We used survey data on mentoring and PD to categorize teachers as receiving either a low or high amount of support. Teachers with both formal and informal mentoring and PD that were above average over most time points were categorized as highsupport teachers, and teachers with formal and informal mentoring and PD that were below average over most time points were categorized as lowsupport teachers. This selection method enables us to contrast the instructional growth cases by amount of organizational support that was received. Of the 4 improving teachers, 2 were low support and 2 were high support. Of the 7 nonimproving teachers, 2 were classified as high support, 3 were categorized as low support, and two were classified as having high and low support depending on the area. One of the lowsupport teachers was eliminated because of a high variability in IQA scores over time, so we selected the two highsupport, nonimproving teachers and the remaining two lowsupport, nonimproving teachers. Table 2 describes these eight teachers with whom we developed case studies. 
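The two-way categorization described above can be sketched as a small Python routine. This is an illustration only: the function names, the "most time points" cutoff (more than half), the zero threshold for the growth coefficient, and all example values are assumptions, and the sketch omits the additional criteria the text mentions, such as rubric-level scores and high third-year scores.

```python
def level(teacher_values, sample_means):
    """'high' if above the sample mean at most time points, 'low' if below at most."""
    n = len(teacher_values)
    above = sum(v > m for v, m in zip(teacher_values, sample_means))
    below = sum(v < m for v, m in zip(teacher_values, sample_means))
    if above > n / 2:
        return "high"
    if below > n / 2:
        return "low"
    return "mixed"


def classify(mentoring, pd_hours, mentoring_means, pd_means, growth, near_zero=0.0):
    """Return (support, trend) for one teacher.

    Support is 'high' or 'low' only when mentoring and PD agree; otherwise 'mixed'.
    near_zero is a hypothetical cutoff for a growth coefficient "below or near 0".
    """
    m = level(mentoring, mentoring_means)
    p = level(pd_hours, pd_means)
    support = m if m == p else "mixed"
    trend = "improvement" if growth > near_zero else "no improvement"
    return support, trend


# Hypothetical values at four time points (not study data).
mentoring_means = [20, 20, 9, 11]
pd_means = [6, 7, 18, 13]
teacher_a = classify([50, 40, 12, 15], [8, 10, 20, 15], mentoring_means, pd_means, growth=0.01)
teacher_b = classify([5, 5, 2, 3], [8, 10, 20, 15], mentoring_means, pd_means, growth=-0.002)
```

The "mixed" outcome corresponds to the teachers the text describes as having high and low support depending on the area.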
In the interviews, teachers were asked about their successes and challenges related to teaching; supports received for challenges; visions of high-quality math instruction; and relationships with their formal and informal mentors, principals and other school leaders, and colleagues. We also interviewed these teachers’ principals and formal and informal mentors each spring. The interview transcripts associated with a particular teacher constituted the case study data we analyzed. We read the interview transcripts and identified major themes related to the nature of supports provided. We captured information for each teacher in the following categories: challenges; collaboration; PD content and quality; administrative support; other supports; coherence and consistency of supports; predispositions to and experience with a certain curriculum and instructional style; instructional quality improvement on dimensions not measured by the IQA; school culture and visions of high-quality mathematics; and curriculum, grade, or school change.

Table 2. Description of Case Study Participants
PRESENTATION AND DISCUSSION OF RESULTS

TEACHERS’ INITIAL MATHEMATICS INSTRUCTIONAL QUALITY

In fall of their first year, the beginning mathematics teachers in our study tended to implement lessons with an average task potential of somewhat above 2, which indicates an emphasis on procedures and single representations of concepts, with comparatively few teachers selecting and implementing tasks that connect to underlying math concepts. Table 3 shows the average, minimum, and maximum initial scores on the eight IQA rubrics, as well as the average discussion and overall IQA scores, and Figure 1 shows the score distributions, with scores rounded to the nearest whole number. Seventy-seven percent of beginning teachers had task potential scores that averaged to 2, and 23% had scores of 3 or above. Because it is more likely that a teacher reduces rather than raises the cognitive demand of tasks during implementation, the distribution of task implementation scores tends to be lower than that of task selection. Ninety percent of our beginning teachers had task implementation scores during their first semester that averaged to a score of 2, signifying an emphasis on procedures and single representations of concepts. Only 10% had task implementation scores of 3 or above, indicating that the rigor of a high-level task was maintained during the lesson. Our findings are similar to results from the 1999 Trends in International Mathematics and Science Study (TIMSS) video analysis, which found that tasks implemented in eighth-grade classrooms in the United States tended to be procedural in nature and that when teachers did select high-level tasks, they often implemented them in low-level ways (Gallimore et al., 2003; Hiebert et al., 2003). These results are also consistent with more recent research documenting teachers implementing tasks at lower levels than was intended in the design of the curriculum (e.g., Boston & Wilhelm, 2015).

Table 3. Descriptive Statistics of Initial IQA Scores by Subcomponent (N = 61)
Figure 1. Initial IQA Score Distribution by Subcomponent

Our beginning mathematics teachers also tended to have low discussion scores, averaging about 1, indicating that students primarily provided brief responses to the teachers’ questions rather than having more elaborate discussions that involved sharing different strategies or justifying solutions. Nearly all teachers had discussion scores of less than 2, which signifies, for example, step-by-step instructions for solving a problem but no explanation of why such steps are appropriate. Several teachers scored 0 for rigor of discussion because they did not engage the entire class in a discussion. On average, scores for each accountable talk rubric were close to 1, indicating fairly low levels of accountable talk. Of the accountable talk items, teachers scored highest on participation and lowest on teacher asking.

RELATIONSHIP OF TEACHER BACKGROUND AND ORGANIZATIONAL CONTEXT TO INITIAL INSTRUCTIONAL QUALITY

Table 4 shows the results of three models examining the relationship between IQA scores and college major, math knowledge for teaching, student teaching experience, orientation of the curriculum, amount of math content during orientation, level of class observed, and school poverty level during the first semester of teaching. OLS models were estimated on the following IQA outcomes: (a) task potential and implementation, (b) overall discussion, and (c) the combined overall IQA score. Background factors include having a degree in mathematics/mathematics education, education (suppressed), or another field; weeks of student teaching experience; and the teacher’s MKT score. School context factors include whether the math curriculum was deemed “exemplary” by the NSF, hours of math content teachers reported being exposed to during orientation, school FRPL percentage, and course level.

Table 4. Ordinary Least Squares Results of Teacher Background and Organizational Context Variables Regressed on Initial IQA Scores (N = 61)
+p < .10. *p < .05. **p < .01. ***p < .001.

Contrary to our expectations, holding a math degree/math education degree compared with holding an education degree was associated with selection and implementation of lower level tasks (β = .34). This is supported by the fact that none of our beginning teachers who held math or math-education degrees had a task implementation score that averaged to 3 or above during the fall of their first year teaching. In contrast, beginning teachers holding a degree in neither math nor education (the “other” category) had similar task potential and implementation scores to those with education degrees (β = .01) and higher overall discussion scores (β = .66). Number of weeks of student teaching was also not significantly related to IQA outcomes, although in a sensitivity test replacing weeks of student teaching with a binary variable indicating whether preservice teaching experience was in middle school math, that variable was marginally significantly associated with overall discussion scores (β = .45, p < .09) and significantly associated with overall IQA scores (β = .38, p < .049) (not shown). This suggests that alignment of preservice training experience (in this case, student teaching in a middle school math classroom), holding degree and math content knowledge constant, is associated with higher initial teaching quality. Also contrary to expectations, beginning teachers’ scores on the MKT were not significantly associated with higher instructional quality, as measured by the IQA. Being in a school that uses an “exemplary” NSF curriculum was associated with higher average task potential and implementation scores (β = .25), as well as higher overall discussion (β = .35) and average IQA scores (β = .30). Attending more hours of math orientation was also positively associated with task potential and implementation scores (β = .05).
NSF curriculum and math orientation hours are highly correlated (r = .50) in our study, given that districts using more rigorous instructional materials tended to focus more time on math instruction during orientation. Teachers not using a reformmath curriculum reported just .32 hours of mathrelated orientation, whereas reformmath curriculum teachers reported 3.1 hours of mathrelated orientation. NSF curriculum and math orientation hours were jointly significant (p < .01) in predicting overall IQA scores. Teaching at a school with a higher percentage of students eligible for FRPL was negatively associated with some instructional quality ratings, though the effect was small. Teaching an advanced or honors math course was significantly associated with higher task potential and implementation scores (β = .36) during the first semester of teaching. ORGANIZATIONAL SUPPORTS PROVIDED TO BEGINNING TEACHERS DURING THEIR FIRST THREE YEARS The level of support beginning mathematics teachers in our study received varied widely across their organizational environments. Table 5 shows the sample means and standard deviations for a range of supports, including hours of formal and informal mentoring, hours spent in mathrelated PD, a scale measuring the teachers’ perceptions of their principals’ instructional leadership, and a scale with their perceptions of the level of professional community in their schools across four time points during their first three years. The total average of nearly 40 hours of mathrelated mentoring for the fall and spring of the first year contrasts with the 9 hours reported for the second year and 11 hours for the third year. Reported mathrelated mentoring hours were highly variable from teacher to teacher, ranging, for example, from 0 to more than 100 in the fall and spring semesters of the first year. 
The math mentoring hours are also broken down by the source of support: contact hours with formal and informal mentors during each of the four time periods. Teachers reported receiving greater amounts of informal math mentoring than formal math mentoring. This is likely due to informal mentors tending to be located at the same school as the beginning teachers. Generally, informal mentors either taught the same subject or were on the same grade-level team as the beginning teacher.

Table 5. Descriptive Statistics of Organizational Supports Provided to Beginning Math Teachers Over Time
Note. Standard deviation in parentheses.

Participation in math-related PD also varied widely across the beginning mathematics teachers in our study, although there was more consistency in the average hours of participation: 13 hours in the first year, 18 in the second year, and 13 in the third year. There may be an upward bias in PD participation over time if teachers who were more likely to remain in teaching were also those who sought out more PD offerings. We have some evidence in support of this hypothesis: Math-related PD hours in the spring of the first year averaged 6.66 for teachers who continued teaching the second year but only 1.89 for teachers who left teaching at the end of their first year.

The instructional leadership and collaboration scales were less variable across teachers than the mentoring and PD math hours. Although the mean of the instructional leadership variable remained stable near 3 between the first and second years, suggesting that, on average, teachers found their principal supportive, it decreased in the third year. The difference between the second- and third-year reports of instructional leadership is statistically significant for those teachers who were teaching in the third year. The mean for the collaboration scale was similar across all 3 years, suggesting that the typical beginning teacher in our study agreed that teachers in their school feel supported by colleagues to try out new ideas, regularly share ideas and materials related to instruction, trust each other, feel responsible to help each other do their best, are willing to question one another’s views, share high expectations for student work, and share a vision of good teaching.

CHANGE IN INSTRUCTIONAL QUALITY

Overall, the beginning teachers in our study did not show improvement in their instructional quality over their first three years in the profession, as measured by the IQA.
Figure 2 shows the averages for task implementation, task potential, overall discussion, and the average IQA over time. The pattern of little substantive change in average scores over time remains similar when looking only at teachers who remained teaching at the school where they started for three years (not shown). We also found no differences in IQA scores between teachers who remained teaching at their initial school and teachers who moved schools or left the profession. We tested these linear trends using growth models with no covariates; the coefficient on the month variable was not significant in any model (see Table 6).

Figure 2. Beginning Math Teachers’ Average IQA Scores Over Time

Table 6. Hierarchical Linear Growth Model Results From Regressing Background Variables and Time-Varying Organizational Supports on IQA Scores
RELATIONSHIP OF ORGANIZATIONAL SUPPORTS AND INSTRUCTIONAL QUALITY Overall we found very few supports associated with changes in teachers’ instructional practices, as measured by the IQA. Table 6 shows the relationship between level of/change in contentfocused supports (e.g., mathfocused mentoring, mathfocused PD, professional community, instructional leadership) and IQA scores (i.e., task potential and implementation, overall discussion, and overall IQA), holding constant preservice degree, student teaching experience, and whether or not the curriculum has been classified as reform oriented. Neither the amount of contentfocused interactions with mentors nor quantity of time spent in mathematicsfocused PD is associated with changes in any component of the IQA. Contrary to our expectations, strength of beginning teachers’ professional community is associated with lower IQA scores, although these differences are not statistically significant at conventional levels. Our measure of instructional leadership, however, is positively associated with teachers selecting more cognitively demanding tasks and implementing them at a higher level of rigor, an effect that is marginally significant. A standard deviation increase in teacher reports of their principal’s instructional leadership is associated with a .14 (p = .06) increase in the task potential and implementation average score—about 15% of the distance between a score of 2 and a score of 3. WHY ARE INCREASED SUPPORTS NOT ASSOCIATED WITH INCREASED TEACHING QUALITY? Comparative case study analysis of our interview data helps us understand how and why certain organizational supports may or may not be related to teacher improvement on our measure of instructional quality. A closer look at how teachers talk about their collaborative activities, mentoring, relationships with their principals, and their PD suggests aspects of these supports that may be important for improving teachers’ instruction. 
Collaboration

A theme emerged from our case study analysis that provides insight into why our empirical models may not have shown a significant relationship between frequent collaboration and improved instruction: Intense collaboration is often not deeply focused on mathematics or the teaching of mathematics. For our case study teachers, behavioral, social, and emotional issues often take precedence in beginning teachers’ interactions with their colleagues. Previous research suggests that for collaborative interaction to have positive effects on teacher knowledge and instruction, it must include sustained, deep engagement with math content and how students learn math (Ahn, 2014; Cohen & Ball, 1999; Wilson, Sztajn, Edgington, & Confrey, 2014; Windschitl, Thompson, & Braaten, 2011). Although teachers in our study were often engaged in multiple forms of collaboration, sometimes daily, which they found engaging and useful, a closer look at the content of these interactions reveals that many were focused on organizational, logistical, and behavioral aspects of instruction rather than mathematics or teaching mathematics effectively. Take Ian, for example, a high-support, nonimproving teacher. He said that he “always has someone to talk to.” But when probed about the nature of discussions with other mathematics teachers, he responded that conversations focus on logistics, how to use the new electronic chalk boards, and the pacing of the curriculum, not the mathematics. Similarly, we found examples of high-support teachers spending much of their collaborative time on organizational, management, and logistical strategies. Howard, a high-support, nonimproving teacher, told us that he spent lots of time working with other teachers but that most of this time was devoted to issues related to classroom management, sharing resources, or student behavior.
He said that departmental meetings were spent largely “dealing with standardized testing results [and] action plans.” This is in contrast to other types of collaboration that we would expect to translate directly into more math knowledge or better math teaching. Kelly, a high-support, improving teacher, gave examples of ongoing conversations she had with other math colleagues who were “good at helping me explain it to the kids,” colleagues who helped her identify background knowledge about kids that was missing and strategies for addressing that missing information. She described how she worked with another math teacher to develop projects for their math students. She said, “I keep in pretty good contact with most of the math teachers in the building, and we’re always . . . shooting ideas by each other about how this lesson went for them and how they ran their lesson to see what we might be able to combine and make a real good lesson with.” Based on Kelly’s comments, it seems that she and her fellow math teachers talked about what worked and what didn’t work in their mathematics lessons, the type of dialogue that can lead to increased understanding of how students learn math (Cohen & Ball, 1999). Similarly, Jen, a high-support, improving teacher, described learning how kids think about mathematics by using an intense, high-quality PD activity:

Connected Math 2 . . . help[ed] us think like kids. And so you’re not just working out the problem, you’re anticipating student problems. You’re figuring out what homework would be good, working out the homework problems, you’re getting really in depth with it, stuff that if you just sat down and did it on your own, you wouldn’t do.
Mentoring

Three themes emerged that are related to the content and quality of mentoring, and they are all supported by previous research: the importance of coplanning (Bauml, 2014; Louis & Marks, 1998), access to mentors beyond the first year of teaching (Desimone et al., 2014; Isenberg et al., 2009), and use of observation and feedback focused on mathematics instruction (Cohen & Ball, 1999; Polikoff, Desimone, Porter, & Hochberg, 2015). Although observation and feedback were included in our quantitative measure of mentoring supports, coplanning and longevity of relationship with the mentor were not a focus. Regular coplanning may be an especially strong form of mentoring that is related to improvement in instructional quality. Our case studies revealed that both high- and low-support teachers whose IQA scores improved had one thing in common: consistent, regular opportunities to coplan lessons. These opportunities occurred with both formal and informal mentors. Three of the improving teachers (Kelly, Tonya, and Janet) planned lessons with the other math teacher of the same grade. These teachers reported that they benefited from discussing ideas and strategies during the coplanning. As Kelly’s mentor explained, planning lessons together afforded them the opportunity to “talk . . . things through and com[e] out with what’s best for the students.” This is in contrast to nonimproving teachers, whose planning with others, if it occurred at all, was more of an occasional, rather than a regular, occurrence. A second theme was that improvers tended to have access to the same mentor in the second and third years of teaching. Three of the four teachers who improved on our measure of instructional quality (i.e., Kelly, Jen, and Tonya) reported that their formal mentors from their first year became informal mentors in their second year. The fourth improver, Janet, indicated that her formal mentor retired but that she received mentoring from another mentor during all three years.
Jen also reported receiving mentoring from the math department chair across all three years. In contrast, continued access to mentors was less consistent with the nonimprovers. Only one of the nonimprovers, Keisha, identified a formal mentor from the first year as an informal mentor in the second year, but the mentor transferred out of Keisha’s school before the end of her second year of teaching. Another nonimprover, Howard, had a new mentor assigned in his second year because his original mentor changed grade levels. Brandon, a third nonimprover, indicated that his formal mentor left the school in the first year, and he was never assigned another mentor. A third theme related to mentoring was that improvers tended to have mentors who engaged more often in observation and feedback focused on instruction, compared to the mentors of nonimprovers. Tonya, one of the low-support, improving teachers, reported getting thorough feedback from her formal mentor. She said that her mentor commented on “everything that she saw, the good, bad, the ugly. I mean she’d just tell me what she saw and you know like give me ideas on how I could work with that or improve it.” In contrast, Keisha, a low-support, nonimproving teacher, noted that her mentor did not provide her with critical feedback; instead, she “had nothing but positive things to say even though I felt like . . . I don’t know what I’m doing sometimes.” Further, when nonimproving teachers did receive feedback, it was more often focused on classroom management than on how to improve their math instruction.
As Brandon noted, “all of [my mentor’s] suggestions, all that was on classroom management because I really don’t need the help with content, but I need all the help I can get with classroom management.”

Principal Support

Although our survey data suggest that strong administrative support is related to instructional growth, our interview data suggest that the positive effect of principal support may operate primarily through the organizational climate established by the principal rather than the amount of direct support or pressure to improve math instruction. This idea, that principals can facilitate instructional improvement by setting the conditions that facilitate teacher learning and growth, is supported in the literature (Leithwood & Jantzi, 2008; Ten Bruggencate, Luyten, Scheerens, & Sleegers, 2012). Across our case study interviews, we found no systematic differences in the support or accountability relationships between principals and teachers, regardless of whether the teachers improved their instruction. Although the case-study district assigned the principal the specific role of mentoring new teachers, most of the beginning teachers reported infrequent contact with their principal. For example, Kelly, an improver, reported in her first year that she did not see her principal very often but that she felt she could go to him with questions or concerns. And although this principal and teacher had little contact, the principal reported intentionally putting the teacher on a “strong team” so that she would have peer support. This case highlights the range of ways that principals offer beginning teachers indirect, rather than direct, support. Although principals were required to observe and provide feedback to their beginning teachers, few teachers reported receiving feedback from their principals on the quality of their mathematics instruction. Most often, the focus of discussion between principals and teachers was on classroom management or student engagement.
For example, Tonya, an improver, met with her principal six times formally and another “dozen times informally,” primarily to discuss classroom management and working with difficult populations of students. Similarly, Ian, who did not improve, reported interacting with his principal in new teacher team meetings and sought him for assistance on behavioral issues. The teacher reported that his interactions with his principal were helpful, but he noted that they did not discuss math.

Professional Development

Our survey analysis did not support our initial hypothesis that PD focused on math content and instruction would be associated with teacher improvement in reform-oriented instruction, as measured on the IQA. All of our case study teachers experienced relatively intensive training in the implementation of the CMP curriculum over the course of their first years in the classroom, so the content or curriculum-embedded nature of PD did not itself suggest an explanation for growth on the IQA. But close examination of our case study teachers identified two dimensions of PD that were not directly examined in our survey analysis and that may be related to improved instruction: opportunities for interaction around the content of the PD, and targeting the PD to a teacher’s specific classroom challenges. Specifically, our case studies suggest that content-focused PD may be better positioned to foster instructional change when it provides opportunities for ongoing interaction linked to classroom implementation of the content. This hypothesis is consistent with the PD literature’s prioritization of active learning opportunities that are ongoing and that feature collective engagement (Garet, Porter, Desimone, Birman, & Yoon, 2001; Garet et al., 2010). All of our case study teachers found the curriculum-focused PD to be helpful, but the teachers who improved on the IQA had additional PD that was more connected to everyday math practice.
For example, both high-support IQA improvers (Kelly and Jen) participated in the district’s voluntary cohort of middle school math teachers who were provided with several full days of release time throughout the year to meet as a group and with district math specialists to focus on effective implementation of the CMP curriculum, including observation of model classrooms. Essentially, this math teacher cohort formed a professional learning community focused on math instruction, its members having common materials and curriculum objectives. As noted earlier, Jen explained the benefits as pushing her to interact more in depth with the material and to think like her students would. For this teacher, then, participating in the math teacher cohort enabled a deeper engagement with the curriculum than she would have had through the district CMP training sessions alone. In addition, both high-support improvers attended a weeklong summer PD experience on CMP2 conducted by the program’s developer. This training afforded them opportunities to interact with the curriculum, to ask about specific areas of the text they found especially challenging to teach, and, as Kelly indicated, to

get different ideas. Like you know, it’s a book but you can teach it like so many different ways. . . . And, you know, a lot of teachers, you know, they had different ways of making different little manipulatives for the kids or drawing different pictures or projects.

These findings suggest that this extra PD, which was closely tied to teachers’ actual classroom implementation of the curriculum, may have played a role in their instructional improvement. The second theme that we identified in our interviews was the importance of aligning PD that was not content-specific with content-focused PD. All of the beginning teachers in our sample experienced challenges with classroom management, ranging from issues with problematic student behaviors to difficulty motivating students to participate.
What differed between the improvers and nonimprovers was how these challenges appear to have been addressed by PD. Both high-support improvers indicated that their content-focused PD for managing their instruction provided them with ideas for promoting student success. This included ideas related to the particular manipulative materials used in lessons, as well as modifications to CMP instructional materials that could help maintain focus on learning objectives. For instance, referring to her math teacher cohort group, Kelly explained, “it was like, alright, the kids are gonna have trouble with this, so here’s how you can make this worksheet better or something.” Jen also learned about “anticipating student problems” as part of cohort group discussions. In contrast, the two high-support nonimprovers participated in several workshops that addressed issues outside of mathematics content, such as integrating technology, implementing multiple intelligences theory, and improving classroom climate. Although these topics may be valuable, the concepts they cover are less likely to improve one’s ability to manage instruction and promote depth of conceptual understanding within the Connected Mathematics curriculum. Both high-support nonimprovers indicated that the material in these workshops did not cover how to apply ideas to their math instruction. One would not expect content-focused PD to leverage instructional improvement if the goals of the math PD are misaligned with other PD or supports that the teacher is receiving (Desimone et al., 2014). Aligning the PD that isn’t focused on content (but is instead focused on topics such as technology integration or student engagement) with content and curriculum more directly may foster a clearer connection between content-based PD and improved instruction.
DISCUSSION

This study was motivated by our interest in how the organizational supports provided to beginning teachers by their schools and districts assist them in improving their instructional practices. The Common Core Standards, as well as the aligned assessments currently being developed through the Smarter Balanced Assessment Consortium and the Partnership for Assessment of Readiness for College and Careers, expect more rigorous instruction than is currently being implemented. The teachers in our study, a sample of beginning seventh- and eighth-grade mathematics teachers in 11 urban, suburban, and rural districts across four states, do not provide a strong endorsement of the current teacher support and development system, at least as far as middle school mathematics is concerned. The beginning seventh- and eighth-grade teachers in our study tended to introduce their students to tasks of relatively low cognitive demand (e.g., unambiguous application of procedures and single representations of concepts) and tended to proceduralize tasks when implemented; if they had a whole-class discussion of students’ work on the mathematical tasks at all, it tended to be characterized by students providing brief or one-word answers to teacher questions. We found relatively few cases of teachers pressing their students to explain their strategies or connecting their solutions to broader mathematical ideas. These findings would not be particularly surprising if they were just for teachers in their first or second semester of teaching, given that the transition to the profession is challenging in many ways. That relatively few teachers in our study improved their mathematics instruction during their first three years in the profession is troubling. Although we saw the proportion of teachers implementing tasks at a high level increase from 14% to 25% between their first and third years of teaching, the vast majority continued to emphasize applying procedures to solve tasks.
Further, we saw no improvement in the average quality of discussion. It could be, of course, that beginning teachers are improving in other critical aspects of teaching, such as organization and classroom management, and that improvement in the rigor of instruction may happen after the third year. We found little evidence that common markers of preservice teacher quality, including having a major in the field taught and more weeks of student teaching, were associated with higher levels of instructional quality as measured by the IQA. Further, our survey analysis showed that few of the organizational supports that beginning teachers received were associated with the improvement of our beginning teachers’ instructional practices during their early careers. Our interview analysis suggests that the design of the supports and how they relate to a beginning teacher’s needs may distinguish supports that help to improve instructional practice from supports that do not. We see this as an indication that the content and quality of the supports are what drive improvement, aspects that are often missing from survey data. For example, we found a lack of relationship between time spent on mathematics-related activities (math-focused mentoring by a formal or informal mentor or participation in math-focused PD) and improvement in instructional practice. Although the measures in our study tapped what the literature suggests are key aspects of high-quality mentoring (e.g., content focus, subject match) and PD (e.g., content focus, of significant duration), we did not find robust relationships between increased use of these supports and teacher improvement. One interpretation of this finding might be that investing in these supports for beginning teachers does not provide a return of increased learning and improvement.
Our case study analysis, however, suggests an alternative interpretation: Important differences within types of content-focused interactions can shape how effective the supports are. For example, our case studies support the hypothesis that mentoring is likely to have a bigger impact on instructional quality if teachers coplan with their mentors and engage in observation and feedback cycles focused on math instruction. It is likely that mentors and teachers need to explicitly focus on increasing the rigor of instruction for this to be an outcome of their interactions. Further, our case study analyses also suggest that the content-focused PD of teachers may be more effective at leveraging change in instructional practice if organized around curricular materials and fostering ongoing interactions between teachers and district coaches. Also contrary to the literature, we found that teacher ratings of professional learning communities in the school are not associated with developing the forms of instruction assessed by the IQA. Prior research suggests that teacher norms in a school can support traditional teaching (Rowan & Miller, 2007). The findings from our study suggest that increased math content supports may not be enough to substantially improve teaching quality, particularly when most teachers start teaching with low-level tasks and explicit instruction in procedures. Our case study analyses suggest that the content of collaboration may be key and that current forms of collaboration (even those that teachers value) are often not deeply focused on mathematics or the teaching of mathematics.
Thus, although our quantitative analyses suggest that, on average, the current menu of supports offered to beginning teachers is unlikely to leverage teacher improvement to the level required for the Common Core Standards to be taught effectively, our case study analyses signal the importance of the goals and design of organizational supports for beginning teachers, particularly how they support the enactment of rigorous instruction while attending to important aspects of the school context. Clearly, not all content-focused mentoring and PD meet these criteria. We interpret our findings as support for the importance of specifying the nature and quality of supports available to teachers in order to understand their links to teacher improvement and growth. Although not measured directly in our study, one challenge for our beginning teachers may be that they are not surrounded by teachers who are currently effective at the forms of teaching that the IQA values (Cobb & Jackson, 2011b; Cobb, Jackson, Smith, Sorum, & Henrick, 2013). A recent study by Jackson and Bruegmann (2009) documented larger mathematics test score gains for elementary teachers who have more effective colleagues. The authors found that these “spillover” effects are strongest for less experienced teachers, suggesting that schoolwide or districtwide efforts to improve teachers’ instructional quality may have added benefits for the instructional improvement of beginning teachers. One of our key quantitative findings involved the capacity of school leaders to positively influence their beginning teachers’ instructional practices.
We found that the academic rigor of beginning teachers’ instruction tends to be higher when they report that their principals (a) tell staff members what is expected of them, (b) offer support and encouragement, (c) enforce school rules for student conduct, (d) recognize staff members for a job well done, (e) provide time for teachers to meet and share ideas with one another, (f) deal effectively with pressure from outside the school, (g) encourage innovative instructional practices, and (h) back teachers up when they need it. Although it is possible that principals who score high on this index explicitly look for new teachers who are committed to improving their instructional practices, the fact that teachers’ own instructional quality tends to increase when they report an increase in these components of instructional leadership in their schools suggests a more causal interpretation. Our case study analyses suggest that the positive effect of principal support may operate primarily through the organizational climate established by the principal (e.g., what team a teacher is on or whom they assign as a mentor) rather than through the amount of direct support or pressure to improve math instruction. Our quantitative findings suggest that current methods of mentoring and PD are likely not robust enough to support the type of teacher improvement demanded by new math standards, although our qualitative analyses suggest ways of designing these supports to better attend to instructional improvement. Our findings also emphasize the critical role the principal may play. Future research should focus on designing supports for beginning teachers with instructional improvement in mind, while attending to the organizational contexts that can strengthen or inhibit these supports.

Acknowledgment

This article is based on work supported by the National Science Foundation under Grant No. 0554434.
All opinions, findings, conclusions, and recommendations expressed in this article are those of the authors and do not necessarily reflect the views of the National Science Foundation.

Notes

1. A total of 66 teachers participated in at least one element of the study, but four teachers were never observed.

References

Achinstein, B., Ogawa, R., & Speiglman, A. (2004). Are we creating separate and unequal tracks of teachers? The effects of state policy, local situations and teacher characteristics. American Educational Research Journal, 41(3), 557–603.
Ahn, R. (2014). How Japan supports novice teachers. Educational Leadership, 71(8), 49–53.
Ball, D. L., Thames, M. H., & Phelps, G. (2008). Content knowledge for teaching: What makes it special? Journal of Teacher Education, 59(5), 389–407.
Bauml, M. (2014). Collaborative lesson planning as professional development for beginning primary teachers. New Educator, 10(3), 182–200.
Borasi, R., & Fonzi, J. (2002). Foundations: Professional development that supports school mathematics reform. Washington, DC: National Science Foundation.
Boston, M. (2012). Assessing instructional quality in mathematics. Elementary School Journal, 113(1), 76–104.
Boston, M. D., & Smith, M. S. (2011). A “task-centric approach” to professional development: Enhancing and sustaining mathematics teachers’ ability to implement cognitively challenging mathematical tasks. ZDM Mathematics Education, 43(6–7), 965–977.
Boston, M., & Wilhelm, A. C. (2015). Middle school mathematics instruction in instructionally focused urban districts. Urban Education. doi:10.1177/0042085915574528.
Boston, M. D., & Wolf, M. K. (2006). Assessing academic rigor in mathematics instruction: The development of the instructional quality assessment toolkit (CSE Technical Report No. 672). Los Angeles, CA: National Center for Research on Evaluation, Standards, and Student Testing.
Boyd, D., Grossman, P., Lankford, H., Loeb, S., & Wyckoff, J. (2009).
Teacher preparation and student achievement. Educational Evaluation and Policy Analysis, 31(4), 416–440.
Bransford, J. D., Brown, A. L., & Cocking, R. R. (2000). How people learn. Washington, DC: National Academy Press.
Bryk, A. S., & Schneider, B. (2002). Trust in schools: A core resource for improvement. New York, NY: Russell Sage Foundation.
Bryk, A. S., Sebring, P. B., Allensworth, E., Easton, J. Q., & Luppescu, S. (2010). Organizing schools for improvement: Lessons from Chicago. Chicago, IL: University of Chicago Press.
Cai, J., Moyer, J. C., Wang, N., & Nie, B. (2009). Learning from classroom instruction in a curricular content: An analysis of instructional tasks. In S. L. Swars, D. W. Stinson, & S. Lemons-Smith (Eds.), Proceedings of the 31st Annual Meeting of the North American Chapter of the International Group for the Psychology of Mathematics Education (Vol. 5, pp. 692–699). Atlanta: Georgia State University.
California Commission on Teacher Credentialing. (2015, September). Report on new teacher induction. Sacramento, CA: Author.
Campbell, P. F., Nishio, M., Smith, T. M., Clark, L. W., Conant, L. M., Conant, A. H., . . . Choi, Y. (2014). The relationship between teachers’ mathematical content and pedagogical knowledge, teachers’ perceptions, and student achievement. Journal for Research in Mathematics Education, 45(4), 419–459.
Clotfelter, C. T., Ladd, H. F., & Vigdor, J. L. (2007). Teacher credentials and student achievement: Longitudinal analysis with student fixed effects. Economics of Education Review, 26(6), 673–682.
Cobb, P. A., Henrick, E. C., & Munter, C. (2011, April). Conducting design research at the district level. Paper presented at the annual meeting of the American Educational Research Association, New Orleans, LA.
Cobb, P., & Jackson, K. (2011a). Assessing the quality of the Common Core State Standards for mathematics. Educational Researcher, 40(4), 183–185.
Cobb, P., & Jackson, K. (2011b).
Towards an empirically grounded theory of action for improving the quality of mathematics teaching at scale. Mathematics Teacher Education and Development, 13(1), 6–33.
Cobb, P., Jackson, K., Smith, T., Sorum, M., & Henrick, E. (2013). Design research with educational systems: Investigating and supporting improvements in quality of mathematics teaching and learning at scale. National Society for the Study of Education Yearbook, 112, 320–349.
Cohen, D. K., & Ball, D. L. (1999). Instruction, capacity, and improvement. Philadelphia: Consortium for Policy Research in Education, University of Pennsylvania, Graduate School of Education.
Cohen, E. G., & Lotan, R. A. (2014). Designing group work: Strategies for the heterogeneous classroom (3rd ed.). New York, NY: Teachers College Press.
Crick, J. E., & Brennan, R. L. (2001). GENOVA (Version 3.1).
Crosson, A. C., Boston, M., Levison, A., Matsumura, L. B., Resnick, L. B., Wolf, M. K., & Junker, B. W. (2006). Beyond summative evaluation: The instructional quality assessment as a professional development tool (CSE Technical Report No. 691). Los Angeles, CA: National Center for Research on Evaluation, Standards, and Student Testing.
Darling-Hammond, L. (2000). Teacher quality and student achievement: A review of state policy evidence. Education Policy Analysis Archives, 8(1).
DeAngelis, K. J., Wall, A. F., & Che, J. (2013). The impact of preservice preparation and early career support on novice teachers’ career intentions and decisions. Journal of Teacher Education, 64(4), 338–355.
Desimone, L., Hochberg, E., Polikoff, M., Porter, A., Schwartz, R., & Johnson, L. (2014). Formal and informal mentoring: Compensatory, complementary, or consistent? Journal of Teacher Education, 65(2), 88–110.
Desimone, L. M., Smith, T. M., & Phillips, K. J. R. (2013). Linking student achievement growth to professional development participation and changes in instruction: A longitudinal study of elementary students and teachers in Title I schools.
Teachers College Record, 115(5), 1–46.
Drago-Severson, E. (2012). New opportunities for principal leadership: Shaping school climates for enhanced teacher development. Teachers College Record, 114(3), 1–30.
Education Week. (2011). Quality Counts 2011. Washington, DC: Author.
Elmore, R. F., Peterson, P. L., & McCarthey, S. J. (1996). Restructuring in the classroom: Teaching, learning, and school organization. San Francisco, CA: Jossey-Bass.
Feiman-Nemser, S. (2012). Teachers as learners. Cambridge, MA: Harvard Education Press.
Feiman-Nemser, S., & Parker, M. B. (1990). Making subject matter part of the conversation or helping beginning teachers learn to teach (Research Report No. 90-3). East Lansing: National Center for Research on Teacher Learning, Michigan State University.
Feiman-Nemser, S., Schwille, S., Carver, C., & Yusko, B. (1999). A conceptual review of literature on new teacher induction. Washington, DC: National Partnership for Excellence and Accountability in Teaching.
Ferguson, P., & Womack, S. T. (1993). The impact of subject matter and education coursework on teaching performance. Journal of Teacher Education, 44, 155–163.
Fideler, E. F., & Haselkorn, D. (1999). Learning the ropes: Urban teacher induction programs and practices in the United States. Belmont, MA: Recruiting New Teachers, Inc.
Gallimore, J., Garnier, H., Bogard Givvin, K., Hollingsworth, H., Jacobs, J., Chui, A. M.-Y., . . . Stigler, J. (2003). Teaching mathematics in seven countries: Results from the TIMSS 1999 Video Study. Washington, DC: U.S. Department of Education, National Center for Education Statistics.
Garet, M., Porter, A., Desimone, L., Birman, B., & Yoon, K. (2001). What makes professional development effective? Analysis of a national sample of teachers. American Educational Research Journal, 38(3), 915–945.
Garet, M., Wayne, A., Stancavage, F., Taylor, J., Walters, K., Song, M., . . . Doolittle, F. (2010).
Middle school mathematics professional development impact study: Findings after the first year of implementation (NCEE 2010-4009). Washington, DC: National Center for Education Evaluation and Regional Assistance, Institute of Education Sciences, U.S. Department of Education.
Glazerman, S., Dolfin, S., Bleeker, M., Johnson, A., Isenberg, E., Lugo-Gil, J., . . . Ali, M. (2008). Impacts of comprehensive teacher induction: Results from the first year of a randomized controlled study (NCEE 2009-4034). Washington, DC: U.S. Department of Education.
Goldhaber, D. D., & Brewer, D. J. (1997). Evaluating the effect of teacher degree level on educational performance. In W. Fowler (Ed.), Developments in school finance, 1996 (NCES Publication No. 97–535, pp. 197–210). Washington, DC: U.S. Department of Education, National Center for Education Statistics.
Goldrick, L., Osta, D., Barlin, D., & Burn, J. (2012). Review of state policies on teacher induction. Santa Cruz, CA: New Teacher Center.
Goldschmidt, P., & Phelps, G. (2010). Does teacher professional development affect content and pedagogical knowledge: How much and for how long? Economics of Education Review, 29(3), 432–439.
Grossman, P., & Thompson, C. (2008). Learning from curriculum materials: Scaffolds for new teachers? Teaching and Teacher Education, 24(8), 2014–2026.
Grossman, P., Thompson, C., & Valencia, S. W. (2002). Focusing the concerns of new teachers: The district as teacher educator. In A. M. Hightower, M. S. Knapp, J. A. Marsh, & M. W. McLaughlin (Eds.), School districts and instructional renewal (pp. 129–142). New York, NY: Teachers College Press.
Hallinger, P. (2003). Leading educational change: Reflections on the practice of instructional and transformational leadership. Cambridge Journal of Education, 33(3), 329–351.
Hallinger, P., & Heck, R. H. (1998). Exploring the principal’s contribution to school effectiveness: 1980–1995. School Effectiveness and School Improvement, 9(3), 157–191.
Harris, D.
N., & Sass, T. R. (2007). Teacher training, teacher quality, and student achievement (Working Paper No. 3). Washington, DC: National Center for the Analysis of Longitudinal Data in Education Research. Heck, R. H. (1992). Principals’ instructional leadership and school performance: Implications for policy development. Educational Evaluation and Policy Analysis, 14(1), 21–34. Hiebert, J., Gallimore, R., Garnier, H., Givvin, K., Hollingsworth, H., Jacobs, J., . . . Stigler, J. (2003). Teaching mathematics in seven countries: Results from the TIMSS 1999 Video Study (NCES Rep. No. 2003-013). Washington, DC: National Center for Education Statistics. Hill, H. C. (2007). Mathematical knowledge of middle school teachers: Implications for the No Child Left Behind policy initiative. Educational Evaluation and Policy Analysis, 29(2), 95–114. Hill, H. C. (2011). The nature and effects of middle school mathematics teacher learning experiences. Teachers College Record, 113(1), 205–234. Hill, H. C., Rowan, B., & Ball, D. L. (2005). Effects of teachers’ mathematical knowledge for teaching on student achievement. American Educational Research Journal, 42(2), 371–406. Hill, H. C., Schilling, S. G., & Ball, D. L. (2004). Developing measures of teachers’ mathematics knowledge for teaching. Elementary School Journal, 105, 11–30. Ingersoll, R. (2012). Beginning teacher induction: What the data tell us. Phi Delta Kappan, 93(8), 47–51. Retrieved from http://www.kappanmagazine.org/content/93/8/47 Ingersoll, R. M., & Kralik, J. M. (2004). The impact of mentoring on teacher retention: What the research says. Denver, CO: Education Commission of the States. Ingersoll, R. M., & Strong, M. (2011). The impact of induction and mentoring programs for beginning teachers: A review of the research. Review of Educational Research, 81(2), 201–233. Isenberg, E., Glazerman, S., Bleeker, M., Johnson, A., Lugo-Gil, J., Grider, M., & Dolfin, S. (2009).
Impacts of comprehensive teacher induction: Results from the second year of a randomized controlled study (NCEE 2009-4072). Washington, DC: U.S. Department of Education. Jackson, C. K., & Bruegmann, E. (2009). Teaching students and teaching each other: The importance of peer learning for teachers. American Economic Journal: Applied Economics, 1(4), 85–108. Johnson, S., & Birkeland, S. (2003). Pursuing a “sense of success”: New teachers explain their career decisions. American Educational Research Journal, 40(3), 581–617. Junker, B. W., Matsumura, L. C., Crosson, A., Wolf, M. K., Levison, A., Wiesberg, J., & Resnick, L. (2006). Overview of the Instructional Quality Assessment (CSE Technical Report No. 671). Los Angeles: University of California, National Center for Research on Evaluation, Standards, and Student Testing. Klug, B. J., & Salzman, S. A. (1991). Formal induction vs. informal mentoring: Comparative effects and outcomes. Teaching and Teacher Education, 7(3), 241–251. Kruse, S., Louis, K. S., & Bryk, A. (1994). Building professional community in schools. Issues in Restructuring Schools, 6, 3–6. Lee, V. E., & Smith, J. B. (1996). Collective responsibility for learning and its effects on gains in achievement for early secondary school students. American Journal of Education, 104, 103–147. Leithwood, K., & Jantzi, D. (2008). Linking leadership to student learning: The contributions of leader efficacy. Educational Administration Quarterly, 44(4), 496. Lewis, G. M. (2014). Implementing a reform-oriented pedagogy: Challenges for novice secondary mathematics teachers. Mathematics Education Research Journal, 26(2), 399–419. Loeb, S., Beteille, T., & Kalogrides, D. (2012). Effective schools: Teacher hiring, assignment, development, and retention. Education Finance and Policy, 7(3), 269–304. Long, J. S., McKenzie-Robblee, S., Schaefer, L., Steeves, P., Wnuk, S., Pinnegar, E., & Clandinin, D. J. (2012).
Literature review on induction and mentoring related to early career teacher attrition and retention. Mentoring and Tutoring: Partnership in Learning, 20(1), 7–26. Louis, K., & Marks, H. (1998). Does professional community affect the classroom? Teachers’ work and student experiences in restructuring schools. American Journal of Education, 106, 532–575. Luft, J. A., Dubois, S. L., Nixon, R. S., & Campbell, B. K. (2015). Supporting newly hired teachers of science: Attaining teacher professional standards. Studies in Science Education, 51(1), 1–48. Luft, J., Roehrig, G., & Patterson, N. (2003). Contrasting landscapes: A comparison of the impact of different induction programs on beginning secondary science teachers’ practices, beliefs, and experiences. Journal of Research in Science Teaching, 40(1), 77–97. Ma, L. (1999). Knowing and teaching elementary mathematics: Teachers’ understanding of fundamental mathematics in China and the United States. Mahwah, NJ: Erlbaum. Matsumura, L. C., Garnier, H., Slater, S. C., & Boston, M. (2008). Toward measuring instructional interactions “at-scale.” Educational Assessment, 13(4), 267–300. Matsumura, L. C., Slater, S. C., Junker, B., Peterson, M., Boston, M., Steele, M., & Resnick, L. (2006). Measuring reading comprehension and mathematics instruction in urban middle schools: A pilot study of the instructional quality assessment. Los Angeles, CA: National Center for Research on Evaluation, Standards, and Student Testing. National Center for Education Statistics, Institute of Education Sciences, U.S. Department of Education. (2013). The Nation’s Report Card: Mathematics 2013 (NCES 2013). Washington, DC: Author. National Council on Teacher Quality. (2014). Interactive map of each state’s teacher policies from 2014. State-by-State Summary. Retrieved from http://www.nctq.org/statePolicy/2014/statePolicyNationalSummary.do National Council of Teachers of Mathematics. (1989). Curriculum and evaluation standards for school mathematics.
Reston, VA: Author. National Council of Teachers of Mathematics. (2000). Principles and standards for school mathematics. Reston, VA: Author. National Research Council. (2012). Education for life and work: Developing transferable knowledge and skills in the 21st century (J. W. Pellegrino & M. L. Hilton, Eds.). Washington, DC: The National Academies Press. Nelson, B. S., & Sassi, A. (2005). The effective principal: Instructional leadership for high-quality learning. New York, NY: Teachers College Press. New Teacher Center. (2012). National Association of State Boards of Education discussion guide: Teacher induction—improving state systems for supporting new teachers. Santa Cruz, CA: Author. Pogodzinski, B. (2015). Administrative context and novice teacher-mentor interactions. Journal of Educational Administration, 53(1), 40–65. Polikoff, M. S., Desimone, L. M., Porter, A. C., & Hochberg, E. D. (2015). Mentor policy and the quality of mentoring. Elementary School Journal, 116(1), 76–102. Polikoff, M. S., Porter, A. C., & Smithson, J. (2011). How well aligned are state assessments of student achievement with state content standards? American Educational Research Journal, 48(4), 965–995. Ratsoy, E. (1987). Evaluation of the Initiation to Teaching Project: Final report. Edmonton, Alberta, Canada: Alberta Department of Education. Raudenbush, S. W., & Bryk, A. S. (2002). Hierarchical linear models: Applications and data analysis methods (2nd ed.). Thousand Oaks, CA: Sage. Remillard, J. T., & Heck, D. (2014). Conceptualizing the curriculum enactment process in mathematics education. ZDM: The International Journal on Mathematics Education, 46(5), 705–718. Remillard, J. T., Herbel-Eisenmann, B., & Lloyd, G. M. (Eds.). (2009). Mathematics teachers at work: Connecting curriculum materials and mathematics instruction. New York, NY: Routledge. Resnick, L. B., & Hall, M. W. (2001). The principles of learning: Study tools for educators.
Pittsburgh, PA: University of Pittsburgh, Learning Research and Development Center, Institute for Learning. Ritter, S. (2010). The research behind the Carnegie Learning Math series. Pittsburgh, PA: Carnegie Learning. Robinson, V. M. J., Lloyd, C. A., & Rowe, K. J. (2008). The impact of leadership on student outcomes: An analysis of the differential effects of leadership types. Educational Administration Quarterly, 44(5), 635–674. Ronfeldt, M., Farmer, S. O., McQueen, K., & Grissom, J. A. (2015). Teacher collaboration in instructional teams and student achievement. American Educational Research Journal, 52(3), 475–514. Rowan, B., & Miller, R. J. (2007). Organizational strategies for promoting instructional change: Implementation dynamics in schools working with comprehensive school reform providers. American Educational Research Journal, 44(2), 252–297. Schmidt, W. H., Houang, R. T., Cogan, L., Blomeke, S., Tatto, M. T., Hseih, F. J., . . . Paine, L. (2008). Opportunity to learn in the preparation of mathematics teachers: Its structure and how it varies across six countries. ZDM: The International Journal on Mathematics Education, 40(5), 735–747. Schoenfeld, A. H. (2002). Making mathematics work for all children: Issues of standards, testing, and equity. Educational Researcher, 31(1), 13–25. Smith, T. M., Desimone, L. M., Porter, A. C., McGraner, K., & Taylor Haynes, K. (2012). Learning to teach: An agenda for research on the induction and mentoring of beginning teachers. NSSE Yearbook, 111(2), 219–247. Smith, T. M., & Ingersoll, R. M. (2004). What are the effects of induction and mentoring on beginning teacher turnover? American Educational Research Journal, 41(3), 681–714. Stanulis, R. N., & Floden, R. E. (2009). Intensive mentoring as a way to help beginning teachers develop balanced instruction. Journal of Teacher Education, 60(2), 112–122. Stanulis, R. N., Little, S., & Wibbens, S. (2012).
Intensive mentoring that contributes to change in beginning elementary teachers’ learning to lead classroom discussion. Teaching and Teacher Education: An International Journal of Research and Studies, 28(1), 32–43. Stein, M. K., Engle, R. A., Smith, M. S., & Hughes, E. K. (2008). Orchestrating productive mathematical discussions: Five practices for helping teachers move beyond show and tell. Mathematical Thinking and Learning, 10(4), 313–340. Stein, M. K., Grover, B. W., & Henningsen, M. (1996). Building student capacity for mathematical thinking and reasoning: An analysis of mathematical tasks used in reform classrooms. American Educational Research Journal, 33(2), 455–488. Stein, M. K., & Lane, S. (1996). Instructional tasks and the development of student capacity to think and reason: An analysis of the relationship between teaching and learning in a reform mathematics project. Educational Research and Evaluation, 2(1), 50–80. Stein, M. K., & Nelson, B. S. (2003). Leadership content knowledge. Educational Evaluation and Policy Analysis, 25(4), 423–448. Stodolsky, S., & Grossman, P. (1995). The impact of subject matter on curricular activity: An analysis of five academic subjects. American Educational Research Journal, 32, 227–249. Sun, M., Youngs, P., Yang, H., Chu, H., & Zhao, Q. (2012). Association of district principal evaluation with learning-centered leadership practice: Evidence from Michigan and Beijing. Educational Assessment, Evaluation, and Accountability, 24(3), 189–213. Supovitz, J., Sirinides, P., & May, H. (2010). How principals and peers influence teaching and learning. Educational Administration Quarterly, 46(1), 31–56. Ten Bruggencate, G., Luyten, H., Scheerens, I., & Sleegers, P. (2012). Modeling the influence of school leaders on student achievement: How can school leaders make a difference? Educational Administration Quarterly, 48(4), 699–732. Thompson, M., Paek, P., Goe, L., & Ponte, E. (2004).
Study of the impact of the California formative assessment and support system for teachers: Report 2: Relationship of BTSA/CFASST engagement and teacher practices (ETS RR-04-31). Washington, DC: Educational Testing Service. Tricarico, K. M., Jacobs, J., & Yendol-Hoppey, D. (2015). Reflection on their first five years of teaching: Understanding staying and impact power. Teachers and Teaching: Theory and Practice, 21(3), 237–259. U.S. Department of Education. (2010). College- and career-ready students. Washington, DC: Author. Retrieved from http://www2.ed.gov/policy/elsec/leg/blueprint/collegecareerready.pdf Valli, L., Croninger, R., & Buese, D. (2012). Studying high-quality teaching in a highly charged policy environment. Teachers College Record, 114(4), 33. Wei, R. C., Darling-Hammond, L., & Adamson, F. (2010). Professional development in the United States: Trends and challenges. Stanford, CA: Stanford Center for Opportunity Policy in Education and Dallas, TX: National Staff Development Council. Wilson, P. H., Sztajn, P., Edgington, C., & Confrey, J. (2014). Teachers’ use of their mathematical knowledge for teaching in learning a mathematics learning trajectory. Journal of Mathematics Teacher Education, 17(2), 149–175. Windschitl, M., Thompson, J., & Braaten, M. (2011). Ambitious pedagogy by novice teachers: Who benefits from tool-supported collaborative inquiry into practice and why? Teachers College Record, 113(7), 1311–1360. Wood, M. B., Jilk, L. M., & Paine, L. W. (2012). Moving beyond sinking or swimming: Reconceptualizing the needs of beginning mathematics teachers. Teachers College Record, 114(8), 1–44. Youngs, P., Holdgreve-Resendez, R. T., & Qian, H. (2011). The role of instructional program coherence in beginning elementary teachers’ induction experiences. Elementary School Journal, 111(3), 455–476.


