Developing the "Will": The Relationship between Teachers’ Perceived Policy Legitimacy and Instructional Improvement
by Jihyun Kim, Min Sun & Peter Youngs - 2019
Background/Context: As part of a nationwide initiative that re-conceptualized teacher evaluation, Virginia issued the Guidelines for Uniform Performance Standards and Evaluation Criteria for Teachers on July 1, 2012; these guidelines marked a significant overhaul of the state’s approach to teacher evaluation. Previous studies examined the impact of teacher evaluation policies on student achievement (e.g., Dee & Wyckoff, 2015; Steinberg & Sartain, 2015; Taylor & Tyler, 2012), but there has been little empirical research on factors that lead teachers to change their instructional practices in response to teacher evaluation.
Purpose/Objective/Research Question/Focus of Study: We focused on an important element of policy implementation: teachers’ perceptions of the legitimacy of teacher evaluation policies. Specifically, we asked: 1) How do teachers’ perceptions of the legitimacy of teacher evaluation policies influence their efforts to improve their instruction? and 2) What school supports are associated with an increase in teachers’ perceived policy legitimacy? Our examination of teachers’ perceived legitimacy of teacher evaluation policies is critically important because individuals’ beliefs affect their willingness to respond to externally initiated reform in productive ways and to generate sustainable changes in instruction.
Research Design: To examine the potential impact of teachers’ perceived legitimacy of teacher evaluation policies on their instruction and the effects of various supports on teachers’ perceptions, we drew on teacher survey data and teacher evaluation ratings from two school districts in Virginia. We collected two years of teacher survey data and three years of teacher evaluation ratings. Combining these two data sets, we provided evidence of an association between teachers’ perceived legitimacy of teacher evaluation policies and their instructional practice.
Conclusions/Recommendations: Our findings indicate that teachers’ perceived legitimacy of evaluation policies is positively correlated with their likelihood of taking actions to improve their instruction. That is, developing teachers’ perceptions of policy legitimacy seems to be a fruitful strategy for promoting changes in instruction. Moreover, teachers’ perceived legitimacy of teacher evaluation policies seems to have a positive relationship with various school supports, such as principal leadership, professional development, and time and resources.
The consequences of even the best planned, best supported, and most promising policy initiatives depend finally on what happens as individuals throughout the policy system interpret and act on them . . . Policy success depends critically on two broad factors: local capacity and will (McLaughlin, 1987, p. 172).
Recent teacher evaluation reforms, propelled by Race to the Top (RTTT) and Elementary and Secondary Education Act (ESEA) Title I waivers, have used multiple measures of teacher performance that focus on classroom instruction and student learning outcomes. According to the National Council on Teacher Quality, by the end of 2015, 42 states and the District of Columbia had altered their teacher evaluation policies to require student achievement or progress as a significant criterion, and 37 states and the District of Columbia required multiple observations for some or all teachers (Doherty & Jacobs, 2015). As part of a nationwide initiative that re-conceptualized teacher evaluation, Virginia issued the Guidelines for Uniform Performance Standards and Evaluation Criteria for Teachers on July 1, 2012; these guidelines marked a significant overhaul of the state’s approach to teacher evaluation. This system aimed to promote instructional improvement through an iterative process of observations, principal-teacher feedback, and teacher self-reflection and refinement of their practices.
Some researchers have studied the impact of these teacher evaluation policies on instructional outcomes (e.g., Dee & Wyckoff, 2015; Steinberg & Sartain, 2015; Taylor & Tyler, 2012), but there is little empirical research on the factors that lead teachers to change their instructional practices in response to teacher evaluation. In this study, we focus on an important element of policy implementation: teachers’ perceptions of the legitimacy of teacher evaluation policies. Perceived legitimacy is based on individuals’ beliefs that a given policy is proper, just, and valuable. This internalized norm or value shapes their behavior and increases their will to respond to the policy. When individuals perceive a given policy to be legitimate, they tend to respond to the policy in more productive ways (Tyler, 2006; Van der Toorn, Tyler, & Jost, 2011). Our examination of teachers’ perceived legitimacy of teacher evaluation policies is critically important because individuals’ beliefs affect their willingness to respond to this externally initiated reform in productive ways and to generate sustainable changes in instruction.
This study addresses significant gaps in the literature. To our knowledge, this study is one of the first to examine teachers’ perceived legitimacy of teacher evaluation policies. While legitimacy theory is developed and supported by empirical studies in social psychology (Tyler, 1997) and public policy (Jackson et al., 2012; Sunshine & Tyler, 2003), it has rarely been applied in educational research. Recently enacted teacher evaluation policies provide a unique context for studying teachers’ perceptions of policy legitimacy. Although these policies are designed to strengthen the link between incentives and teacher performance, there are very few chronically low-performing teachers whose job security is threatened and very few high performers who are rewarded as a result of teacher evaluation. In most districts, the majority of teachers are not directly subject to either rewards or sanctions.
Therefore, to increase the likelihood that teacher evaluation policies actually alter the practices of most or all teachers, policymakers need to rely on a different approach to motivation than solely rewards and sanctions. As studies on the effects of legitimacy in other fields have suggested, policy legitimacy can be another form of power, distinct from policymakers’ control over incentives or sanctions, that enables authorities to shape the behavior of teachers (e.g., Lammers, Galinsky, Gordijn, & Otten, 2008; Porter et al., 2003; Tyler, 2006; Tyler, Schulhofer, & Huq, 2010; Van der Toorn, Tyler, & Jost, 2011). That is, perceived policy legitimacy can motivate teachers to change their practices voluntarily, to align with the expectations of a given policy, rather than following rules based on the fear of punishment or the anticipation of reward.
Moreover, among the few studies of teachers’ perceptions about teacher evaluation (Jiang, Sporte, & Luppescu, 2015; Milanowski & Heneman, 2001; Tuytens & Devos, 2010), the degree to which teachers’ perceptions influenced their classroom practices has not been examined. Given that one desired outcome of teacher evaluation is to improve instruction by providing teachers with quality feedback (Darling-Hammond, 2014; Firestone, 2014; Papay, 2012), it is important to examine the relationship between teachers’ perceptions of teacher evaluation policy and changes in their practices. More broadly, most studies of teachers’ perceptions of and responses to educational reforms are qualitative in nature (e.g., Coburn, 2001; Spillane, Reiser, & Reimer, 2002).
Using two years of teacher surveys and three years of teacher evaluation data from two school districts in Virginia, our study draws on the concept of policy legitimacy from public policy and social psychology to investigate teachers’ responses to recently enacted teacher evaluation policies (e.g., Smoke, 1994; Tyler, 2006; Wallner, 2008). Specifically, we ask:
1) How do teachers’ perceptions of the legitimacy of teacher evaluation policies influence their efforts to improve their instruction?
2) What school supports are associated with an increase in teachers’ perceived policy legitimacy?
We first review studies on the implementation of teacher evaluation policies and extend prior work on policy legitimacy to develop our conceptual framework. Second, we describe the state policy context in Virginia. Third, we introduce our sample and estimation strategy. Fourth, we present estimation results and, finally, we discuss the implications of our results.
RESEARCH ON TEACHERS’ PERCEPTIONS AND BELIEFS IN POLICY IMPLEMENTATION
This study is situated in the broader literature on teachers’ perceptions about policies and policy implementation. Educational policy implementation is often situated in complex school situations. Thus, it is important for researchers to focus on the conditions, if any, under which various education policies get implemented and work (Honig, 2006, p. 2), rather than simply on what was implemented or what worked, as prior studies have done. To be specific, Honig (2006) emphasized interactions between policies, people, and places as the conditions for policy implementation. This study is well aligned with this view of policy implementation as we argue that the impact of teacher evaluation policies depends on each teacher’s perceived legitimacy of the policies. Perceptions of policy legitimacy are a product of interactions between teachers and principals based on the design of a given policy.
This study also expands the line of inquiry into policy implementation based on sensemaking theory (cf. Coburn, 2001; Spillane et al., 2002). Sensemaking theory advances educational policy implementation research beyond just problems of will and organizational structure of local agents by examining social learning and the cognitive aspects of policy implementation (Park & Datnow, 2009). For example, on the surface, teachers may seem to lack the will to change their practices in response to a new reform but, in fact, they may choose their actions based on previous experience and professional knowledge with good intentions for contributing to their students’ academic development. Teachers’ perceived legitimacy is a product of their cognitive judgment based on various sources of information about a given policy and the policy implementation process, and is likely to affect their behavioral responses to the policy.
In terms of studies of the implementation of teacher evaluation policies, the previous literature mostly features descriptive case studies using surveys and interviews to investigate teachers’ or principals’ views on the policies or the challenges of implementation. The findings from these studies can be summarized as: (a) local actors affected by teacher evaluation policies, namely principals and teachers, mostly appreciate the goal of the policies, which is professional growth (Delvaux et al., 2013; Donaldson, 2012; Kraft & Gilmour, 2015); (b) they have had positive experiences with the observation process (Jiang et al., 2015; Tuytens & Devos, 2009) and the overall evaluation process (Lacireno-Paquet, Bocala, & Bailey, 2016); (c) conditions for implementation vary significantly across schools, districts, and states (Donaldson, Woulfin, LeChasseur, & Cobb, 2016; Herlihy et al., 2014; Riordan et al., 2015); and (d) principals and teachers face substantial challenges such as time, proper professional development, and expertise (Kimball, 2002; Kraft & Gilmour, 2015; Milanowski & Heneman, 2001; Riordan et al., 2015). These studies assume that teachers’ (or principals’) perceptions about the policies are a critical part of successful policy implementation. Although this assumption is supported by classic studies (cf. McLaughlin, 1987; Weatherly & Lipsky, 1977), there are two main gaps in these previous studies in terms of how researchers define perceptions and measure the impact of such perceptions.
First, although these case studies depict important aspects of teachers’ views of teacher evaluation policies in particular districts or states, they do not examine the link between these perceptions and changes in teachers’ instructional practices. That is, these studies assume that teachers’ views on teacher evaluation are important but provide little evidence that their views are actually critical for teaching and learning. For example, Delvaux and colleagues (2013) found that teachers who perceived the purpose and criteria of teacher evaluation as being clear were more likely to report positive effects of teacher evaluation on their professional development. The outcome in Riordan et al. (2015) was implementation fidelity, measured by the proportion of teachers who participated in the teacher evaluation process. As the authors noted, their study did not focus on the quality or breadth of implementation (p. 2). In these studies, the results focus only on teachers’ perceptions (e.g., their reported support for recently enacted policies or their self-reported effects on their professional growth) or the number of teachers who were evaluated, not on the desired outcome of the policy, which is to enhance instructional quality and student learning.
Second, these studies capture different aspects of teachers’ views but lack a coherent explanation for them. Some studies examined teachers’ perceptions about different aspects of teacher evaluation policies, including clarity, practicality, and cost (Jiang et al., 2015; Tuytens & Devos, 2010). Others put more emphasis on teachers’ general judgment about how policies were implemented. For example, Kimball’s (2002) framework included teachers’ perceptions of 1) the quality of feedback from evaluators; 2) fairness, including whether the teachers were aware of evaluation expectations; 3) opportunities for teacher input into the evaluation; and 4) enabling conditions, such as professional development and time. In sum, these studies addressed different aspects of teachers’ perceptions about the evaluation process. This is partially attributable to the absence of a guiding theory for studying teachers’ perceptions of teacher evaluation policies. In this study, we address this gap by drawing on the concept of policy legitimacy to explain teachers’ changes in behavior in response to teacher evaluation policies.
Moreover, some studies are based on the pilot phase of teacher evaluation policies, when high stakes were not attached to evaluation results (e.g., Donaldson et al., 2016; Firestone et al., 2013; Riordan et al., 2015). Teachers’ interpretations of and responses to teacher evaluation policies can be entirely different when the policies are fully enacted with high-stakes decisions attached, such as rewarding effective teachers or dismissing ineffective ones. Our study addresses this gap by using data from two school districts’ full implementation of teacher evaluation policies, rather than from the pilot phase, to examine the association between teachers’ perceptions of such policies and changes in their instruction.
Existing research suggests that teachers’ buy-in and support for a given policy matter significantly (cf. McLaughlin, 1987; Weatherly & Lipsky, 1977). However, as noted above, most prior measures of teachers’ perceptions about teacher evaluation policies are not based on theory. This section draws on and expands a theoretical model of legitimacy from social psychology and public policy to offer a framework for examining teachers’ perceptions of the legitimacy of teacher evaluation policies and identifying conditions that can shape their perceptions of policy legitimacy.
WHAT IS LEGITIMACY?
Perceptions of policy legitimacy enhance one’s motivation to comply with a policy or an authority even in the absence of incentives or sanctions (Tyler, 2006; Wallner, 2008; Weber, 1978). Legitimacy is a property of an authority, institution, or social arrangement that leads people connected to it to believe that the authority, institution, or social arrangement is appropriate, proper, and just. In the case of teacher evaluation policies, legitimacy can be judged based on three closely related factors: reliable and valid instruments, procedural fairness, and worthiness of effort.
First, the instruments used to evaluate teachers are a core aspect of the current policy context, and teachers might consider teacher evaluation policies to be legitimate when valid and reliable instruments are used (Tyler, 2006). The use of the instruments should promote effective communication among teachers and their principal (Coggshall, Rasmussen, Colton, Milton, & Jacques, 2012), capture the quality of instruction, and identify strengths and weaknesses of instruction for individual teachers, so that they can act on the feedback and improve their teaching. The fervent scholarly debates about the reliability and validity of teacher evaluation instruments, such as classroom observation tools, value-added measures, and student surveys, reflect the importance of the tools themselves (e.g., Hill, Kapitula, & Umland, 2011; Kane, Taylor, Tyler, & Wooten, 2011; Measures of Effective Teaching (MET), 2013; Rothstein, 2010; Taylor & Tyler, 2012; see Youngs & Haslam, 2012, for review). However, using a valid and reliable tool for teacher evaluation does not necessarily guarantee the desired outcomes of the policies, which are improved instruction and student learning (Harris & Herrington, 2006). Based on the legitimacy framework, teachers’ subjective judgments about the reliability and validity of the tools might be more important than the actual measurement properties of these instruments, as teachers’ perceptions about the instruments can directly affect their behavioral responses to their evaluations.
Second, procedural fairness is another key aspect of policy legitimacy (Sunshine & Tyler, 2003; Tyler, 1997; Tyler, 2006; Van der Toorn et al., 2011). Sunshine and Tyler argued that the antecedent of legitimacy is the fairness of the procedures (2003, p. 513). People are more willing to comply with decisions made by authorities when they perceive that the authorities are exercising their power in equitable ways (Tyler, 2006). In terms of teacher evaluation policies, teachers’ responses to two authorities are key: (a) principals, who usually evaluate them and use the results for various purposes, and (b) district administrators, who make decisions about policy design and allocate resources for teacher evaluation. The ways in which these two authorities exercise their power as part of the teacher evaluation process influence whether teachers perceive such policies as being fairly implemented. For example, principals and district administrators communicate evaluation standards to teachers, train evaluators, conduct pre-observation and post-observation conferences, offer assistance to teachers, and establish due process for dismissing ineffective teachers and rewarding effective ones. Fair procedures can increase the likelihood that teachers feel that they have control over decisions and are competent to take actions. Moreover, procedural fairness is influenced by instrument reliability and validity. In one study, when fine-grained evaluation standards were used, teachers considered the system more equitable compared with the previous system where principals made judgments about teacher performance without any guidelines (Milanowski & Heneman, 2001). Procedural fairness is widely viewed as one of the major features of successful teacher evaluation systems (Colby, Bradshaw, & Joyner, 2002; Delvaux et al., 2013; Kimball, 2002; Milanowski & Heneman, 2001).
Third, the worthiness of the anticipated future outcomes is another important source of policy legitimacy (Van der Toorn et al., 2011). The key assumption is that people react to instrumental aspects of their experiences with a recently enacted policy, including the favorability or desirability of the outcomes obtained or expected from conforming to authorities (Tyler, 1997). That is, the extent to which teachers perceive that the teacher evaluation process is worth their efforts and advances their interests is closely related to their perceptions of the legitimacy of teacher evaluation policies. This is also related to the abilities of authorities to achieve their goals. Within the two models of psychological dynamics underlying legitimacy introduced by Tyler (1997), the resource-based instrumental model suggests that people look for evidence of competence and likely success in resolving group problems when evaluating authorities (Tyler, 1997, p. 325). Accordingly, this third aspect of perceived legitimacy, related to the outcomes of a policy, is likely to be a cumulative result of the first two aspects. For example, Hegtvedt, Clay-Warner, and Johnson (2003) argued that procedural justice could impact members’ perceptions of policy outcomes. That is, if teachers perceive the teacher evaluation tool to be reliable and valid and the process as fair, they would be likely to regard the future outcome of this policy as being relevant to their own and their students’ interests and believe that it is possible to improve instruction and student learning via the current system.
WHAT PROMOTES LEGITIMACY?
Drawing on literature on teacher evaluation policy, policy implementation, and teacher sensemaking (e.g., Coburn, 2001; Spillane et al., 2002; Steinberg & Sartain, 2015; Sun, Frank, Penuel, & Kim, 2013; Sun, Penuel, Frank, Gallagher, & Youngs, 2013), we hypothesize that three school supports may promote or constrain teachers’ perceived policy legitimacy, particularly in the context of implementing teacher evaluation policies: principal leadership, professional development, and resources and time related to teacher evaluation. In contrast to the three aspects of policy legitimacy noted above, these factors are school supports that can affect teachers’ perceived legitimacy of teacher evaluation policies.
Principals are particularly important in promoting teachers’ perceived legitimacy of teacher evaluation policies because they are not only the leaders of their schools but also evaluators of teachers (Colby et al., 2002; Delvaux et al., 2013; Halverson, Kelley, & Kimball, 2004; Jiang et al., 2015; Sartain et al., 2011; Steinberg & Sartain, 2015; Tuytens & Devos, 2010). We relied on two frameworks to examine the role of principals in promoting legitimacy: instructional and transformational leadership (Delvaux et al., 2013). According to an instructional leadership framework, principals tend to focus on using the teacher evaluation process to gather data on classroom instruction, provide feedback on strengths and weaknesses of teachers’ instruction, and support teachers’ efforts to improve based on student achievement data. In other words, principals would treat teacher evaluation as a useful tool for instructional improvement rather than another administrative task. This focus on instructional improvement is likely to increase teachers’ perceived legitimacy of teacher evaluation policy. In contrast, if evaluators do not establish appropriate goals for evaluation or have proper knowledge and skills, even highly reliable and valid instruments may be used improperly and, thus, teachers may perceive the outcome of evaluation as neither feasible nor worth their effort.
Transformational leadership focuses on cultivating an organization’s capacity to address its challenges (Hallinger, 1992; Leithwood & Jantzi, 1999; Marks & Printy, 2003). Accordingly, the active roles that principals (and teachers) play in restructuring school organizations are emphasized; as those closest to students, teachers are likely to initiate changes (Hallinger, 1992). Principals can play an active role in implementing teacher evaluation policies by communicating the policies to teachers and rallying support for them.
Professional development can affect teachers’ capacity to modify their beliefs and improve their instructional practices (Sun et al., 2013; McLaughlin, 1987). Although capacity is separate from beliefs, it shapes beliefs. The recently enacted teacher evaluation policies include challenging tasks for teachers. They need to establish rigorous learning goals for students, develop assessments to monitor students’ progress, and reflect on and modify their instructional practices. Investment in the development of teachers’ knowledge and skills and their understanding of teacher evaluation policies is likely to increase their self-efficacy. Teachers’ attitudes regarding their capacity to earn high evaluation ratings and change their practice influence their perceptions of the quality of evaluation instruments, the fairness of the procedures, and the worthiness of their efforts (Guskey, 1988). Conversely, the lack of teacher capacity may weaken teachers’ support for the recently enacted policies or even their commitment to the teaching profession (Ford, Van Sickle, Clark, Fazio-Brunson, & Schween, 2017). In particular, enacting standards-based teacher evaluation without substantial investment in building teachers’ capacity to use data to adjust instruction is unlikely to elicit improved performance from low-performing teachers (Farrell & Marsh, 2016; Marsh, Bertrand, & Huguet, 2015).
Lastly, insufficient resources, particularly funding and time, can lead teachers to question the feasibility of a policy and its legitimacy. If political actors rush to enact teacher evaluation policy, they may be unable to garner support from individuals needed for successful implementation or to create a meaningful consensus to guarantee the sustainability of the initiative (Wallner, 2008). Teacher evaluation policies are time-sensitive and complex, and educators need time and resources in order to make productive use of them (Coggshall et al., 2012; Halverson et al., 2004; Kimball, 2002). Without sufficient resources, teachers and principals are more likely to treat the evaluation process as an administrative burden rather than a legitimate process for improving teaching quality (Coggshall et al., 2012).
VIRGINIA UNIFORM PERFORMANCE STANDARDS AND EVALUATION CRITERIA FOR TEACHERS
On April 28, 2011, the Board of Education of Virginia approved the revised Guidelines for Uniform Performance Standards and Evaluation Criteria for Teachers and the Virginia Standards for the Professional Practice of Teachers. The guidelines and standards became effective on July 1, 2012, and they represented a significant overhaul of conventional teacher evaluation practices. The guidelines called for 40% of a teacher’s evaluation rating to be based on student academic progress, one of seven standards. Each of the other six standards represented 10% of a teacher’s evaluation rating. These included professional knowledge, instructional planning, instructional delivery, assessment for and of learning, learning environment, and professionalism (see Table 1 for details). For each standard, teacher performance was rated on a four-point scale from exemplary to unacceptable. Annual summative evaluations categorized teacher performance into four groups: those whose performance consistently exceeded expectations on all standards (exemplary), those who met the standards (proficient), those who had an opportunity for improvement (developing), and those who did not meet expectations (unacceptable). If a teacher’s performance with regard to one or more of the standards was rated as unacceptable or developing, their school district placed them in either support dialogue or a performance improvement plan. When a teacher still did not make proper progress, even after being supported by an evaluator with a performance improvement plan, they could be dismissed.
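To make the weighting scheme concrete, the following is a minimal sketch of how the seven standards could be combined into a single weighted score. The weights come from the guidelines as described above; the numeric coding of the four-point scale (4 = exemplary through 1 = unacceptable), the standard names used as keys, and the function name are illustrative assumptions, not a procedure reported in this study.

```python
# Weights as described in the guidelines: student academic progress counts
# for 40%, and each of the other six standards counts for 10%.
WEIGHTS = {
    "professional_knowledge": 0.10,
    "instructional_planning": 0.10,
    "instructional_delivery": 0.10,
    "assessment_for_and_of_learning": 0.10,
    "learning_environment": 0.10,
    "professionalism": 0.10,
    "student_academic_progress": 0.40,
}

# ASSUMPTION: this numeric coding of the four-point scale is hypothetical
# and is used only to illustrate the arithmetic.
SCALE = {"exemplary": 4, "proficient": 3, "developing": 2, "unacceptable": 1}


def weighted_score(ratings):
    """Return the weighted average of a teacher's ratings on the seven standards.

    `ratings` maps each standard name in WEIGHTS to a label in SCALE.
    """
    missing = set(WEIGHTS) - set(ratings)
    if missing:
        raise ValueError(f"missing ratings for: {sorted(missing)}")
    return sum(WEIGHTS[s] * SCALE[ratings[s]] for s in WEIGHTS)


if __name__ == "__main__":
    # A teacher rated proficient on six standards and exemplary on student progress.
    example = {s: "proficient" for s in WEIGHTS}
    example["student_academic_progress"] = "exemplary"
    print(round(weighted_score(example), 2))
```

Under this assumed coding, student academic progress moves the weighted score four times as much as any other single standard, which is the practical meaning of the 40%/10% split.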
Typically, Virginia districts collected evidence on teachers’ performance from three sources: classroom observations and walkthroughs, teacher documentation of student growth objectives, and student/parent surveys. Specifically, the classroom observations and walkthroughs were often conducted by school administrators. Probationary teachers (similar to pre-tenured teachers) were observed at least three times per year, while teachers employed under a continuing contract (similar to tenured teachers) were observed at least once per year.
The Virginia Department of Education (VDOE) did not require school districts to use particular classroom observation instruments. Thus, some districts chose to employ widely used observation tools such as the Framework for Teaching (FFT) and the Classroom Assessment Scoring System (CLASS; Danielson, 2013; Pianta & Hamre, 2009). In contrast, other districts used district-developed observation instruments or permitted principals to select or develop their own approaches to classroom observation. In this study, Districts A and B allowed principals to identify or create their own approaches to classroom observation. For example, principals in District A indicated that they did not use an existing classroom observation tool; instead, they scripted what they observed (i.e., they wrote down the student and teacher behaviors that they saw) and reported them to the teachers they observed.
Both school districts that participated in this study had the option of administering student surveys to provide additional feedback to teachers. Teachers’ documentation provided information about aspects of their practices that principals could not observe and served as a basis for self-reflection and two-way communication between them and their evaluators.
In both school districts, teachers on annual contracts received a summative evaluation during each of their probationary years, as well as a mid-year interim review to provide systematic feedback prior to the summative review. However, the evaluation procedures for teachers on continuing contracts differed across the two districts. District A teachers received a summative evaluation every year, while their counterparts in District B received a summative evaluation every three years. These attributes of district evaluation policies, including variation between the districts and varying evaluation requirements for different categories of teachers, had significant implications for the conceptual framing of this study and our analyses (Sun, Mutcheson, & Kim, 2016).
We situate the Virginia system in the larger context of teacher evaluation reform nationwide. According to a 2015 NCTQ report, Virginia was one of 43 states that included objective student achievement measures in teacher evaluations; one of 38 states that emphasized teachers receiving feedback based on teacher evaluations; one of 23 states in which teacher evaluation ratings informed teacher tenure decisions; one of 34 states that used four categories to evaluate teachers; one of 27 states that required multiple observations only for new teachers; and one of 30 states that allowed districts to design their own system (Doherty & Jacobs, 2015). Although the NCTQ report did not provide further details, the main components and procedures of teacher evaluation in Virginia were largely consistent with those of other states as of 2015.
DATA AND SAMPLE
To examine the potential impact of teachers’ perceived legitimacy of teacher evaluation policies on teachers’ instruction and the effects of various supports on these perceptions, we drew on teacher survey data and teacher evaluation ratings from two school districts in Virginia. We collected two years of teacher survey data during the 2013–14 and 2014–15 school years, and three years of teacher evaluation ratings at the end of the 2012–13, 2013–14, and 2014–15 school years. These two districts were mid-sized suburban school districts that served mostly white students. In the 2012–13 school year, District A served about 3,900 students in six schools; among them, 77% were white and about 30% were eligible for free or reduced lunch. District B served about 4,900 students in 11 schools; 91% of these students were white and about 23% were eligible for free or reduced lunch. Total enrollment and student composition in both school districts remained stable during the years of our study. In terms of school performance, both school districts were high-achieving; all schools in District A were accredited by the State of Virginia based on their 2012–13 state standardized test results, and only one of District B’s schools was issued a warning due to its low mathematics achievement (Sun et al., 2016).
We invited all teachers from the 17 elementary, middle, and high schools in the two districts who had participated in teacher evaluation in the previous school year (2012–13 or 2013–14) to complete the survey during the next school year (2013–14 or 2014–15). The survey asked about different aspects of teachers’ experiences with their evaluations, their perceptions of their principals, their experiences with professional development, and available resources related to teacher evaluation. Among 695 eligible teachers, 392 teachers completed the survey in 2013–14, which resulted in a 56% response rate; among 452 eligible teachers, 298 teachers completed the survey in 2014–15, resulting in a 66% response rate. However, not all teachers in District B were assigned teacher evaluation ratings, so the probationary teachers in this district may have been overrepresented in our study. We thus included teachers’ experience level as a control variable in our analysis. In addition, this unique system in District B led us to draw only on data from District A to address the first research question. We were not able to increase our sample size by including District B in this analysis because it rarely evaluated the same teachers in two consecutive years. There were only 11 teachers from District B who had non-missing data on current-year evaluation ratings, prior ratings, and perceived policy legitimacy. Moreover, District B’s teacher evaluation system was significantly different from that in District A. Therefore, to simplify the analysis for the first research question, we included only District A data in the analysis; this limited the generalizability of our results.
Among all teachers who were included in our final analytic sample, most were white. Sixty-six percent of teachers who completed the 2013–14 survey had an advanced degree, while 59% of the 2014–15 survey completers had an advanced degree. They had an average of 13 and 15 years of teaching experience, respectively; about 70% of them taught in subjects and grades for which the state administered standardized tests, and 80% of teachers were female. While there were no statistically significant differences in most teacher demographic and professional characteristics between survey completers and non-completers, we observed statistically significant differences in prior teacher evaluation ratings (in the 2013–14 survey), gender, and whether they taught high-stakes subjects/grades (in the 2014–15 survey) (see Table 3). Teachers who had significantly higher evaluation rating scores in 2012–13 were more likely to complete the next year’s survey. Teachers who were female and taught high-stakes subjects/grades were more likely to complete the 2014–15 survey. These patterns appeared in only one year of the survey, and they prompted us to include teachers’ prior rating scores, gender, and whether they taught high-stakes subjects/grades in the analysis to account for teacher selection bias in our sample, as well as interaction terms between the main independent variables and those teacher background variables. Despite our efforts to rule out selection bias as shown in Appendix A, we were not able to fully address this issue, given that we lacked information about non-respondents.
Figure 1 illustrates the timeline for data collection. None of the surveys coincided with the release of teacher evaluation ratings. That is, we administered the surveys a few months after teachers were informed of their evaluation ratings. For example, the first survey was administered between the release of the year 1 and year 2 teacher evaluation ratings; this enabled us to study teachers’ reactions to the evaluation policy upon knowing their prior year’s ratings. This temporal order of collecting data on perceived legitimacy prior to observing teachers’ current-year instruction (i.e., as captured by the year 2 evaluation rating) helped us to assess the influence of perceived policy legitimacy on change in instruction at the end of the school year, controlling for the prior year’s ratings.
Figure 1. Timeline for Data Collection
MEASURES
In this section, we elaborate on the key measures we used and how they were derived. We also articulate our rationale for each measure by linking it to our research questions and conceptual framing. All composite measures have high internal consistency with Cronbach’s alpha ≥0.7 and eigenvalue >1.
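To make the internal-consistency criterion concrete, the following is a minimal Python sketch of Cronbach’s alpha using the standard formula, alpha = k/(k − 1) × (1 − sum of item variances / variance of the summed score). It is offered only as an illustration under stated assumptions: the function name and the item responses are hypothetical and are not drawn from the study’s data or code.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha for a list of items.

    `items` holds one list of scores per survey item, aligned by respondent.
    """
    k = len(items)
    if k < 2:
        raise ValueError("need at least two items")
    n = len(items[0])
    if any(len(col) != n for col in items):
        raise ValueError("all items must cover the same respondents")
    # Summed score for each respondent across the k items.
    totals = [sum(col[r] for col in items) for r in range(n)]
    item_variance_sum = sum(variance(col) for col in items)
    return (k / (k - 1)) * (1 - item_variance_sum / variance(totals))


# Hypothetical responses on the 0-3 scale: three items, five teachers.
responses = [
    [3, 2, 1, 0, 2],
    [3, 2, 1, 1, 2],
    [2, 3, 1, 0, 3],
]
alpha = cronbach_alpha(responses)
meets_threshold = alpha >= 0.7  # the cut-off reported in the text
```

Because alpha depends only on the ratio of the two variance terms, using sample or population variances gives the same value as long as the choice is consistent.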
Teachers’ Instructional Practices
For our first research question, we hypothesized that an increase in teachers’ perceived legitimacy of teacher evaluation policies would be positively associated with an increase in their likelihood of changing their instructional practices. The goal of the policies was to help teachers reflect on their teaching practice and increase their impact on student learning. Thus, we focused on three domains that were significantly and positively correlated with teachers’ value-added scores: instructional planning, instructional delivery, and assessment of and for student learning. We also focused on three domains pertaining to student progress: setting learning goals for students, documenting progress, and using outcome data to inform instruction (Sun et al., 2016). Teachers were rated on a scale from 1 = Unacceptable to 4 = Exemplary by their principals at the end of each school year. We took an average across these six domains and named this construct Teachers’ Instructional Practices. However, it is also possible that teachers changed other practices in response to the policies. Thus, we also used another measure, Teachers’ General Practices, to capture all aspects of teachers’ practices. We took an average across all domains for this variable. All measures were derived using district administrative data from personnel evaluations from the 2012–13 through 2014–15 school years and were standardized at the district level.
The following measures were constructed from a survey administered to all teachers across both districts. Drawing on our conceptualization of teachers’ perceived legitimacy of teacher evaluation policies, we derived composite measures from survey responses using factor analysis. The survey items were the same in both 2013–14 and 2014–15. To facilitate interpretation, we standardized the measures within a given school year. Table 4 presents the results of the descriptive analysis of these composite variables.
Teachers’ Perceived Legitimacy of Teacher Evaluation Policies
This measure was derived by taking the mean of teachers’ responses to three items under the question stem “To what extent do you agree with the following statements about your evaluation (in a given school year)?” The items included “Instruments used to evaluate me were precise,” “Procedures used to evaluate me were fair,” and “The evaluation process was worth the effort for me.” The scale ranged from 0 = Not at all, 1 = Some extent, 2 = Moderate extent, to 3 = Great extent.
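The construction of this composite, together with the within-year standardization described earlier, can be sketched as follows. This is a minimal illustration, not the authors’ code: the field names (`precise`, `fair`, `worth_effort`, `year`) and all response values are hypothetical, and the use of the sample standard deviation is an assumption, since the text does not state which was used.

```python
from collections import defaultdict
from statistics import mean, stdev

ITEMS = ("precise", "fair", "worth_effort")  # hypothetical field names


def composite(record):
    """Mean of the three legitimacy items, each coded 0-3."""
    return mean(record[item] for item in ITEMS)


def standardize_within_year(records):
    """Z-score the composite separately within each school year."""
    raw = [composite(r) for r in records]
    by_year = defaultdict(list)
    for r, score in zip(records, raw):
        by_year[r["year"]].append(score)
    # Assumption: sample SD (n - 1 denominator); the text does not specify.
    moments = {year: (mean(v), stdev(v)) for year, v in by_year.items()}
    return [
        (score - moments[r["year"]][0]) / moments[r["year"]][1]
        for r, score in zip(records, raw)
    ]


# Hypothetical teacher-year records.
records = [
    {"year": "2013-14", "precise": 3, "fair": 3, "worth_effort": 3},
    {"year": "2013-14", "precise": 2, "fair": 1, "worth_effort": 3},
    {"year": "2013-14", "precise": 0, "fair": 1, "worth_effort": 2},
    {"year": "2014-15", "precise": 1, "fair": 1, "worth_effort": 1},
    {"year": "2014-15", "precise": 3, "fair": 2, "worth_effort": 1},
    {"year": "2014-15", "precise": 2, "fair": 3, "worth_effort": 1},
    {"year": "2014-15", "precise": 3, "fair": 3, "worth_effort": 3},
]
z_scores = standardize_within_year(records)
```

Standardizing within year means that a coefficient on this measure is read in standard-deviation units relative to other respondents in the same survey wave.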
Principal Leadership
This measure addressed teachers’ perceptions of principal leadership relevant to the legitimacy of teacher evaluation policies. It was calculated by taking the mean of teachers’ responses to four items under “To what extent do you agree with the following statements about your principals (in the given school year)?” on a four-point scale, ranging from 0 = Not at all, 1 = Some extent, 2 = Moderate extent, to 3 = Great extent. Specifically, teachers were asked to rate the following: “My principal applied the same evaluation procedures to all teachers,” “My principal had the knowledge and skills to evaluate me,” “My principal advocated for the Virginia Uniform Performance Standards for Teachers,” and “My principal made decisions in the best interests of the school.” The first item was related to procedural fairness in principals’ implementation of the policies, the second item was related to principals’ instructional leadership related to teacher evaluation, and the third and fourth items measured principals’ transformational leadership in the context of teacher evaluation. Although the fourth item did not explicitly mention teacher evaluation, the whole survey focused on the teacher evaluation process; thus, it can be assumed that teachers would consider teacher evaluation while answering the item. These items about teachers’ perceptions of their principals’ abilities and trust in the teacher evaluation process are very similar to items used in previous research studies (Jiang et al., 2015; Tuytens & Devos, 2010).
Professional Development
This measure addressed teachers’ experiences with professional development related to teacher evaluation policies. It was calculated by taking the mean of teachers’ responses to three items about the usefulness of their professional development (i.e., “How useful was the professional development on each topic?”) on a four-point scale, ranging from 0 = Not at all, 1 = Some extent, 2 = Moderate extent, to 3 = Great extent. The items included “Content areas I taught,” “Teacher evaluation,” and “Make sense of data and use of data to adjust instruction.” Although teachers’ subjective evaluations of usefulness might be insufficient to capture the overall effect of professional development, these have typically been used in studying professional development (Lawless & Pellegrino, 2007).
Resources and Time
This was a composite measure derived by taking the mean of responses to two items: “I had sufficient time to complete the evaluation” and “I had sufficient resources to complete the evaluation.” Although principals conducted evaluations, teachers had to prepare all supporting documents, which took a considerable amount of time and resources. Thus, from a teacher’s perspective, these items might be understood as asking whether teachers had sufficient time and resources to prepare the materials needed for the evaluation process. Riordan and colleagues (2015) used a similar survey item to examine conditions for policy implementation.
For the first research question, we sought to establish an association between teachers’ perceived legitimacy of teacher evaluation policies and changes in their instructional practices. We used longitudinal measures to account for several observable and unobservable variables that might have confounded this relationship. We controlled for teachers’ instructional practices in the prior year; this meant that we essentially estimated how variation in perceived legitimacy was related to deviation from teachers’ expected instructional practices based on their prior instruction. Controlling for prior instruction accounts for important sources of bias in our estimates and adds precision (e.g., Bifulco, 2012; Cook, Shadish, & Wong, 2008; Kane & Staiger, 2008; Shadish, Clark, & Steiner, 2008). Moreover, teachers’ perceived legitimacy of evaluation policies can be influenced by other concurrent reforms and initiatives in a given school and a given year. It can also be affected by changes in school working conditions. To account for these school-level factors and changes in these factors over time, we included school-by-year fixed effects. [2] In addition, other factors related to teachers might have influenced both instruction and teachers’ beliefs, such as teachers’ years of experience, educational levels, gender, and subjects that they taught. We controlled for variation among these different variables. Including these variables was particularly important because they enabled us to identify whether the policy had different effects for different teachers. [3]
Our preferred estimation model is simplified in Equation 1.
$$Y_{ijt} = \alpha + \rho\,Y_{ij,t-1} + \beta\,\mathrm{LEGIT}_{ijt} + X_{ijt}\gamma + U_{jt} + e_{ijt} \tag{1}$$
where $Y_{ijt}$ indicates the instructional practice of teacher i, in school j, in year t, and $Y_{ij,t-1}$ is the same measure in the prior year. $\mathrm{LEGIT}_{ijt}$ is teachers’ perceived legitimacy of teacher evaluation policies self-reported by teacher i in year t. $X_{ijt}$ is a vector of teacher i’s characteristics, including gender, years of teaching, having a master’s degree or higher, and whether they taught high-stakes subject areas. $U_{jt}$ is school-year fixed effects and $e_{ijt}$ is a random error term. The standard errors were clustered at teacher level to account for the correlation among observations about the same individuals over time.
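The logic of Equation 1 can be illustrated with a small sketch of the within (fixed-effects) estimator: subtracting each school-year cell’s mean from the outcome and the regressors absorbs the school-year effects, after which ordinary least squares recovers the remaining coefficients. The sketch below uses only the Python standard library and entirely synthetic, noise-free data; the planted values (0.6 and 0.05, the cell effects, and all teacher rows) are invented, and 0.129 is borrowed from the results purely as a planted value. It does not compute clustered standard errors and should not be read as the authors’ estimation code.

```python
from collections import defaultdict


def demean_by_group(values, groups):
    """Subtract each group's mean; this absorbs group fixed effects."""
    sums, counts = defaultdict(float), defaultdict(int)
    for v, g in zip(values, groups):
        sums[g] += v
        counts[g] += 1
    return [v - sums[g] / counts[g] for v, g in zip(values, groups)]


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def within_ols(y, columns, groups):
    """OLS of y on the columns after demeaning everything within groups."""
    yd = demean_by_group(y, groups)
    xd = [demean_by_group(col, groups) for col in columns]
    k = len(xd)
    xtx = [[sum(p * q for p, q in zip(xd[i], xd[j])) for j in range(k)]
           for i in range(k)]
    xty = [sum(p * q for p, q in zip(xd[i], yd)) for i in range(k)]
    return solve(xtx, xty)


# Synthetic rows: (school-year cell, prior rating, legitimacy, female).
rows = [
    ("A-2014", 0.2, 1.0, 1), ("A-2014", -0.5, -0.3, 0),
    ("A-2014", 1.1, 0.4, 1), ("A-2014", 0.0, -1.2, 0),
    ("B-2014", 0.7, 0.9, 0), ("B-2014", -1.0, 0.1, 1),
    ("B-2014", 0.3, -0.8, 1), ("B-2014", -0.2, 1.5, 0),
]
rho, beta, gamma = 0.6, 0.129, 0.05           # planted, illustrative values
cell_effect = {"A-2014": 0.3, "B-2014": -0.4}  # invented fixed effects
y = [rho * yp + beta * lg + gamma * fe + cell_effect[c]
     for c, yp, lg, fe in rows]
cells = [r[0] for r in rows]
columns = [[r[k] for r in rows] for k in (1, 2, 3)]
rho_hat, beta_hat, gamma_hat = within_ols(y, columns, cells)
```

Because the synthetic outcome contains no noise, the estimates reproduce the planted coefficients; with real data, one would typically rely on a statistical package that also supplies cluster-robust standard errors.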
Next, to answer the second research question, we investigated the implementation supports that may have promoted teachers’ perceived legitimacy of teacher evaluation policies. These supports included principal leadership, the usefulness of professional development, and resources and time related to teacher evaluation. These three supports were measured using teachers’ self-reported data because there was substantial variation in individual teachers’ immediate, local conditions even within the same school, and this local variation was more relevant to their perceived legitimacy. By using school fixed effects, we examined variations in legitimacy within a school and factors that might have influenced such variations.
We controlled for teachers’ perceived legitimacy of the policy in the prior year to account for unobserved factors that may have affected perceived policy legitimacy in the current year. Moreover, including this prior measure enabled us to estimate the effects of implementation supports as the deviation from the expected legitimacy based on perceived legitimacy in the prior year. We first added these three variables (principal leadership, professional development, and resources/time) to Equation 2, one at a time, to gauge the main effect of each variable; then we included them all simultaneously in the model to estimate the marginal effect of each variable. Last, because all measures were on the same scale in these two school districts, we included both districts in the analysis.
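$$\mathrm{LEGIT}_{ijt} = \alpha + \rho\,\mathrm{LEGIT}_{ij,t-1} + \beta_1\,\mathrm{PRIN}_{ijt} + \beta_2\,\mathrm{PD}_{ijt} + \beta_3\,\mathrm{RES}_{ijt} + X_{ijt}\gamma + \Psi_j + e_{ijt} \tag{2}$$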
where $\mathrm{LEGIT}_{ijt}$ is perceived legitimacy of teacher evaluation policies reported by teacher i, in school j, in year t. $\mathrm{LEGIT}_{ij,t-1}$ is teacher i’s perceived legitimacy in the prior year. $\beta_1$, $\beta_2$, and $\beta_3$ represent the degree to which principal leadership, the usefulness of professional development, and resources and time promoted change in teacher i’s perceived legitimacy from the 2013–14 school year to 2014–15. $\Psi_j$ is school fixed effects and $e_{ijt}$ is a random error term.
It is also possible that teachers who received higher scores in the prior school year would be more likely to perceive teacher evaluation policies as legitimate, regardless of the supports they received (Donaldson, 2012). Therefore, we further controlled for teacher evaluation scores in the 2013–14 school year that were released prior to data collection on perceptions of policy legitimacy in the 2014–15 school year. Because including teachers’ evaluation scores dramatically reduced our sample size, particularly in District B, we used Equation 2 as our preferred model. However, we report the results from the alternative model, too. The results are qualitatively similar to those from Equation 2.
Our measures in this analysis of school supports’ influence on change in perceptions of policy legitimacy may have suffered from common source bias. That is, teachers who rated the three measures of school support highly may have been enthusiastic about school changes in general. This general trait could have extended to teachers’ positive perceptions of the legitimacy of teacher evaluation, which would have inflated the true relationship between policy implementation supports and perceptions of policy legitimacy. One way to address common source bias is to use school-level measures of supports, rather than teachers’ self-reported measures (Boyd et al., 2011). We calculated school means for all teachers’ responses in a given school while excluding the teacher’s own response. Although these school means helped us avoid the common source bias issue, it would not have been appropriate to include them for the purpose of investigating our main research question, that is, understanding local supports specifically available to individual teachers. There might be variation among teachers in a given school due to their individual interactions with these school supports. The individual-level measures were most relevant for teachers’ own perceived legitimacy of teacher evaluation policies. Moreover, given that we had only 17 schools in our sample, using school means considerably reduced our statistical power. As expected, none of the three factors were statistically significant in any model, but the coefficients were in the same direction as those of the individual-level measures. Moreover, controlling for teachers’ perceived legitimacy in prior years would have been a more appropriate way to address common source bias in our analysis. If there were time-invariant factors that predicted a teacher’s general level of satisfaction (e.g., an enthusiastic personality), the prior measure accounted for these types of unobserved factors.
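The school mean that excludes a teacher’s own response, as described above, is a leave-one-out mean. A minimal Python sketch follows, under the assumption that each teacher contributes one score per school; the function name, school labels, and scores are hypothetical.

```python
from collections import defaultdict


def leave_one_out_means(values, schools):
    """Mean of colleagues' scores in the same school, excluding one's own."""
    totals, counts = defaultdict(float), defaultdict(int)
    for v, s in zip(values, schools):
        totals[s] += v
        counts[s] += 1
    result = []
    for v, s in zip(values, schools):
        n = counts[s]
        # A lone respondent has no colleagues to average over.
        result.append((totals[s] - v) / (n - 1) if n > 1 else None)
    return result


# Hypothetical support ratings for six teachers in three schools.
support = [3.0, 1.0, 2.0, 0.0, 2.0, 1.5]
school = ["A", "A", "A", "B", "B", "C"]
peer_means = leave_one_out_means(support, school)
```

Because a teacher’s own rating never enters his or her peer mean, the resulting school-level measure cannot share that teacher’s individual response tendencies, which is the rationale for using it as a check on common source bias.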
THE RELATIONSHIP BETWEEN TEACHERS’ PERCEIVED LEGITIMACY OF TEACHER EVALUATION POLICIES AND THEIR CLASSROOM INSTRUCTION
As shown in Table 5, when a teacher perceived teacher evaluation as legitimate, they were more likely to improve their instructional practices compared to the previous year, controlling for school-year fixed effects and other teacher characteristics. [4] A one-standard deviation increase in teachers’ perceived legitimacy of teacher evaluation policies was associated with a 0.129 standard deviation increase in their end-of-year evaluation scores (p≤0.05). Because we controlled for school-year fixed effects, this result captured variation among teachers who worked at the same school in the same school year. The relationship of perceived policy legitimacy with teachers’ general practices was not statistically significant. This weaker association is partially attributable to the fact that instructional practices can be more susceptible to change for teachers than other aspects of their practices, such as professionalism (e.g., collaboration with other teachers in schools).
Next, we examined the degree to which the association between teachers’ perceived legitimacy of teacher evaluation policies and changes in their instruction varied across subgroups of teachers. Table 6 summarizes the results. There is weak evidence that the association of perceived legitimacy with instructional improvement was weaker for female teachers and stronger for teachers who held an advanced degree (p<0.1). There were no differential associations based on (a) teachers’ prior evaluation ratings, (b) teachers’ experience levels, or (c) whether teachers taught high-stakes subjects/grades.
SCHOOL SUPPORTS AND TEACHERS’ PERCEIVED LEGITIMACY OF TEACHER EVALUATION POLICIES
Before including school supports in the models, we examined whether teachers’ perceived legitimacy of teacher evaluation policies was related to their observed traits, such as their experience level, gender, whether they held a higher degree, or whether they taught tested subjects, and their teacher evaluation ratings. Except for teacher evaluation ratings, there was no significant association between these traits and teachers’ perceived legitimacy of teacher evaluation policies with or without controlling for the pre-measure of perceived legitimacy.
Table 7 reports results from our examination of the relationships between three types of support and teachers’ perceived legitimacy of teacher evaluation policies. In models 1 through 3, we added each type of support to Equation 2 separately, while in models 5 through 7, we included teachers’ previous-year evaluation ratings as a control. The main inferences of estimates for these three supports remained qualitatively the same between models 1 through 3 and models 5 through 7. Then, in models 4 and 8 we added the three school supports simultaneously to Equation 2 to test the marginal effect of each support, after controlling for the other two types of support. Although the coefficients for the three variables in models 1 through 3 and models 5 through 7 were larger than those in models 4 and 8 respectively, the standard errors were consistent across the models; therefore, multicollinearity is not a serious concern.
Each of the three types of supports is positively related to an increase in teachers’ perceived legitimacy of the evaluation. Specifically, a one-standard deviation increase in principal leadership related to teacher evaluation is associated with a 0.371 SD-higher perceived legitimacy of teacher evaluation. Because we include school fixed effects, this estimated association leverages variation among teachers within schools. This within-school variation may stem from individual teachers’ interactions with their principals, which are predictive of their perceived legitimacy of teacher evaluation policies. The results for professional development related to teacher evaluation and resources/time related to teacher evaluation are similar. With a one-standard deviation increase in professional development or in resources and time, perceived legitimacy would be likely to increase by 0.309 or 0.41 SD, respectively. The estimates for the marginal effects of these three supports remain similar.
Table 8 reports heterogeneous effects of the three school supports on teachers’ perceived legitimacy of teacher evaluation policies (see Appendix B for further details). We examined heterogeneous effects depending on teachers’ prior evaluation ratings, gender, experience levels, whether they taught high-stakes subjects/grades, and whether they held advanced degrees. As shown in Panel A, the association between principal leadership and perceived legitimacy did not vary across different groups of teachers. However, the association between professional development and perceived legitimacy was stronger for female teachers (p<0.01), and the association between resources and time and perceived legitimacy was stronger for teachers who taught high-stakes subjects or grades and for female teachers. It is worth noting that the interaction terms between the three kinds of supports and prior evaluation ratings were not statistically significant in Models 1, 6, or 11. That is, the effects of prior evaluation ratings were consistent among teachers who reported different levels of support from principals, professional development, and resources/time.
This study is one of the first to assess the degree to which teachers’ perceived legitimacy of teacher evaluation policies influences changes in their instruction and to examine which school supports shape such perceptions. Drawing on a conceptual framework of policy legitimacy from social psychology and public policy, our findings indicate that teachers’ perceived legitimacy of evaluation policies is positively correlated with their likelihood of taking actions to improve their instruction.
This study contributes to the current literature on policy implementation that emphasizes interactions among policies, people, and contexts where policies unfold (Honig, 2006). Researchers and policymakers acknowledge that teachers who participate in reforms affect the success of the reforms in a wide variety of ways (Spillane et al., 2002). This study shows that teachers evaluate the legitimacy of teacher evaluation policies and respond to the policies based on such evaluation.
Moreover, teachers’ perceived legitimacy of teacher evaluation policies seems to have a positive relationship with various school supports. In contrast to the assertion that their will is less likely to be changed via interventions, this study finds that three strategies can promote teachers’ perceived legitimacy of the policies, and thus affect teachers’ will. The first is principal leadership related to teacher evaluation. This is connected to research on instructional and transformational leadership regarding how principals can promote practical changes (Griffith, 2004; Leithwood & Jantzi, 1999; Marks & Printy, 2003). As much research about the influence of principal leadership in the teacher evaluation process has shown, interactions with principals can shape teachers’ perceived legitimacy of teacher evaluation policies (Delvaux et al., 2013; Halverson et al., 2004; Lacireno-Paquet et al., 2016; Tuytens & Devos, 2010). This is also related to the argument that legitimacy is not only influenced by the nature of a given policy but also by how local leaders or middle-level managers implement it (Tyler, 2006). In this case, how principals act (e.g., whether they evaluate teachers fairly and whether they make decisions in the best interests of the school), how much they know (i.e., the knowledge and skills they have for evaluating teachers), and what they stand for collectively influence how teachers perceive the legitimacy of teacher evaluation policies. Interestingly, the relationship between principal leadership and teachers’ perceived legitimacy is identified within a given school and within a given school year, which reinforces the notion that teachers’ individual interactions with their principal matter.
Moreover, professional development can be used as a strategy to develop teachers’ capacity to make productive use of teacher evaluation policies, which further develops their perceived legitimacy of the policies. Our results particularly emphasize the usefulness of professional development in developing teachers’ use of data to adjust instruction, promoting their knowledge and skills in teaching subject areas, and supporting the evaluation reform in general (Kerr, Marsh, Ikemoto, Darilek, & Barney, 2006; Lasky, Schaffer, & Hopkins, 2009; Little, 2012). We also find that having sufficient resources and time to complete the teacher evaluation process is another source of support for perceived legitimacy. This finding is aligned with literature on the enactment of teacher evaluation policies in that sufficient resources and enabling conditions are essential for policy implementation (Coggshall et al., 2012; Kimball, 2002). It should be noted that our focus is not on teachers’ overall perceptions about their professional development, resources and time, or principal leadership. We focused especially on these perceptions in relation to teacher evaluation policies by framing them as available supports when such policies are being enacted.
This study has several limitations. First, one may suspect an opposite causal relationship based on our results; Donaldson (2012) reported that teachers with the highest ratings tended to favor recently enacted teacher evaluation policies, while teachers with the lowest ratings tended to have negative attitudes toward them. In our study, we cannot completely rule out the possibility that a similar phenomenon happened, given its observational nature. However, we measured teachers’ perceived legitimacy of teacher evaluation policies about seven months before they knew their ratings of their performance during the same school year. Thus, it is very unlikely that teachers’ perceptions would be swayed by their rating scores in the current year. Although our results are less likely to suffer from questions about opposite causality, the identified inference that policy legitimacy affects teachers’ instruction may still not be causal. For example, there might be some unobserved traits of teachers that make those who are more adaptive to change also more likely to view the policies as fair and valuable, and thus more likely to change their teaching.
The inference could be invalidated by unobserved variables that are correlated with both our independent and dependent variables (e.g., teachers’ motivation and personality). However, given that none of the teachers’ observed traits used in the current study had a significant association with teachers’ perceived legitimacy, it is less likely that other unobserved traits would bias our results. Moreover, as noted earlier, our results were based on models that controlled for pre-measures of two outcomes, teachers’ instructional practices and perceived legitimacy. That is, we already controlled for some unobserved variables that affected the pre-measures. Given these controls, our results should be interpreted in relation to the deviation from expected outcomes. That is, each teacher’s expected instructional practices at year 2 were predicted by their year 1 practice, school fixed effects, and other traits, such as their gender and experience levels. Given these expected values, some teachers had much larger growth than others at the same school, and their perceived legitimacy of teacher evaluation policies could explain some of this variation. This also applies to the associations between teachers’ perceived legitimacy and support.
Second, it should be noted that our sample is not representative of all teachers working in Virginia. We drew on data from two small school districts located in a rural area in Virginia that served relatively higher-achieving students than the state average. The analysis for the first research question was based on data from only one of the two districts. Moreover, teachers who completed our survey tended to receive higher teacher evaluation ratings and were more likely to teach high-stakes subjects/grades than non-respondents. Such characteristics of the sampled districts and teacher respondents make the observed relationship less representative. Thus, we recommend caution in applying the findings from the current study to other contexts.
Third, our examination of the three supports for promoting teachers’ perceptions of policy legitimacy relied on teachers’ self-reported data. This may potentially raise an issue of endogeneity. However, it was important to use data on teachers’ individual experiences. Even when teachers had the same principal, the same professional development, or the same resources, they may have experienced these supports differently. Given that we examined changes in teachers’ beliefs, it was important to draw on these individual experiences. Nevertheless, we recognize that the results could be more policy-relevant if these measures of school supports were reported by a third party.
Fourth, although we used longitudinal data collected at two different time points, we cannot examine the long-term effects of perceived legitimacy on changes in teachers’ instruction. It is possible that changing teachers’ instruction takes more than one year or that teachers’ perceived legitimacy of teacher evaluation policies has only a short-term effect. While beyond the scope of our study, this is an important research question for future study.
Despite these limitations, the current study contributes significantly to understanding the implementation of teacher evaluation policies. The findings suggest several practical implications. First, developing teachers’ perceptions of policy legitimacy seems to be a fruitful strategy for promoting successful implementation. Second, this study highlights three supports for helping teachers make productive use of teacher evaluation policies: principal leadership, professional development, and resources/time. Teacher evaluation policies, like those in Virginia, rely on principals to conduct multiple classroom observations to gather evidence on the quality of teachers’ daily instruction and to communicate useful feedback to them. Similar to findings in Steinberg and Sartain’s (2015) study of a pilot teacher evaluation program in Chicago, training for principals to improve their capacity to conduct classroom observations and teacher conferences is essential for successful implementation. These trainings should include the use of instruments and rubrics, methods for collecting evidence, and strategies for coaching teachers. Variation in training likely shapes the variation in teachers’ use of evaluation results to improve practices and in their perceived legitimacy. Professional development is another powerful policy tool to build teachers’ capacity to make changes in instruction based on teacher evaluation results. Given our findings, the nature and extent of teachers’ professional development experiences are important for policy implementation, as is whether they find it useful. Although these school supports are well-known conditions that facilitate policy implementation, our study shows how these factors might affect teachers’ individual behavioral responses, as captured by teachers’ perceived legitimacy of teacher evaluation.
In sum, although it is well recognized that policy implementation is critical for achieving desired policy outcomes, surprisingly, we know very little about the conditions under which successful implementation unfolds. This study provides quantitative evidence of the importance of developing teachers perceptions of policy legitimacy for the successful implementation of an externally initiated policy reform and possible ways to develop such perceptions.
1. Total enrollment is rounded to maintain district confidentiality.
2. We are not able to include teacher fixed effects here because only 21% of the teachers had repeated observations.
3. We considered Hierarchical Linear Modeling (HLM) initially because of the nested data structure, with teachers nested within schools. We then realized that our data might not support HLM because of the low Intraclass Correlation Coefficients (ICCs, i.e., 0.04 and 0.06) and the small number of clusters (i.e., schools) in our data. We thus chose a different method to account for the nested structure of the data: using fixed effects (i.e., school-year fixed effects for the first part of the analysis and school fixed effects for the second part) and clustering the standard errors at the school level.
4. Before conducting the main models, we started by examining how the pre-measures were associated with the outcome variables. This is important because it helps justify using the pre-measures as the main control variables. In terms of teachers’ instructional practices, we ran an ordinary least squares (OLS) regression that predicted teachers’ evaluation ratings in the 2013–14 school year based on the previous year’s evaluation ratings. The regression coefficient was 0.618 (p < 0.001), with an adjusted R-squared of 0.407. Similarly, the pre-measure of teachers’ perceived legitimacy was a strong predictor of teachers’ perceived legitimacy in the current year (b = 0.545, p < 0.001), with an adjusted R-squared of 0.309. That is, for both outcomes, the pre-measures were strong predictors that explained a large portion of the variance, but some variance remained unexplained. In the current study, we aimed to tease out these unexplained portions of the variance.
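The pre-measure check described in note 4 can be illustrated with a minimal sketch of a one-predictor OLS regression that reports the slope and the adjusted R-squared. This is not the authors' code: the function name `simple_ols` and the rating values are hypothetical, invented only to show the calculation, and the sketch omits the significance test reported in the note.

```python
from statistics import mean


def simple_ols(x, y):
    """Fit y = a + b*x by ordinary least squares.

    Returns (intercept, slope, r_squared, adjusted_r_squared).
    """
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need paired samples with at least 3 observations")
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("predictor has no variation")
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # Residual and total sums of squares give R-squared.
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    r2 = 1 - ss_res / ss_tot
    # Adjusted R-squared with a single predictor: penalize by (n - 1) / (n - 2).
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - 2)
    return intercept, slope, r2, adj_r2


# Hypothetical evaluation ratings (NOT data from the study).
prior_year = [2.0, 2.5, 3.0, 3.0, 3.5, 4.0]
current_year = [2.4, 2.6, 3.1, 2.9, 3.4, 3.6]

intercept, slope, r2, adj_r2 = simple_ols(prior_year, current_year)
print(f"slope={slope:.3f}, adjusted R-squared={adj_r2:.3f}")
```

A slope below 1 together with an adjusted R-squared below 1 corresponds to the situation the note describes: the prior-year measure is a strong predictor, yet part of the variance in the current-year outcome remains to be explained by other variables.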
We greatly appreciate the generous financial support from both the CLAHS Grant-Writing Incentive Grants and seed funds from the Institute for Society, Culture, and Environment (ISCE) at Virginia Tech. We also would like to thank Brock Mutcheson for his fantastic assistance on data collection, as well as Susan Magliaro, Ken Frank, Courtney Bell, and Alan Seibert for their useful suggestions on this paper.
Bifulco, R. (2012). Can nonexperimental estimates replicate estimates based on random assignment in evaluations of school choice? A within‐study comparison. Journal of Policy Analysis and Management, 31(3), 729–751.
Boyd, D., Grossman, P., Ing, M., Lankford, H., Loeb, S., & Wyckoff, J. (2011). The influence of school administrators on teacher retention decisions. American Educational Research Journal, 48(2), 303–333.
Coburn, C. E. (2001). Collective sensemaking about reading: How teachers mediate reading policy in their professional communities. Educational Evaluation and Policy Analysis, 23(2), 145–170.
Coggshall, J. G., Rasmussen, C., Colton, A., Milton, J., & Jacques, C. (2012). Generating teaching effectiveness: The role of job-embedded professional learning in teacher evaluation. Research & Policy Brief. National Comprehensive Center for Teacher Quality. Retrieved from http://eric.ed.gov/?id=ED532776
Colby, S. A., Bradshaw, L. K., & Joyner, R. L. (2002). Teacher evaluation: A review of the literature. Paper presented at the Annual Meeting of the American Educational Research Association, New Orleans.
Cook, T. D., Shadish, W. R., & Wong, V. C. (2008). Three conditions under which experiments and observational studies produce comparable causal estimates: New findings from within‐study comparisons. Journal of Policy Analysis and Management, 27(4), 724–750.
Danielson, C. (2013). The framework for teaching evaluation instrument. Princeton, NJ: The Danielson Group.
Darling-Hammond, L. (2014). One piece of the whole: Teacher evaluation as part of a comprehensive system for teaching and learning. American Educator, 38(1), 4–13.
Dee, T. S., & Wyckoff, J. (2015). Incentives, selection, and teacher performance: Evidence from IMPACT. Journal of Policy Analysis and Management, 34(2), 267–297.
Delvaux, E., Vanhoof, J., Tuytens, M., Vekeman, E., Devos, G., & Van Petegem, P. (2013). How may teacher evaluations have an impact on professional development? A multilevel analysis. Teaching and Teacher Education, 36, 1–11.
Doherty, K. M., & Jacobs, S. (2015). State of the states 2015: Evaluating teaching, leading and learning. National Council on Teacher Quality. Retrieved from: http://www.nctq.org/dmsView/StateofStates2015
Donaldson, M. L. (2012). Teachers’ perspectives on evaluation reform. Washington, DC: Center for American Progress. Retrieved from: http://eric.ed.gov/?id=ED539750
Donaldson, M. L., Woulfin, S., LeChasseur, K., & Cobb, C. D. (2016). The structure and substance of teachers’ opportunities to learn about teacher evaluation reform: Promise or pitfall for equity? Equity & Excellence in Education, 49(2), 183–201.
Farrell, C. C., & Marsh, J. A. (2016). Metrics matter: How properties and perceptions of data shape teachers’ instructional responses. Educational Administration Quarterly, 52(3), 423–462.
Firestone, W. A. (2014). Teacher evaluation policy and conflicting theories of motivation. Educational Researcher, 43(2), 100–107.
Firestone, W. A., Blitz, C. L., Gitomer, D. H., Kirova, D., Shcherbakov, A., & Nordon, T. L. (2013). New Jersey Teacher Evaluation, RU-GSE External Assessment, Year 1 Report. New Brunswick, NJ: Rutgers University Graduate School of Education.
Ford, T. G., Van Sickle, M. E., Clark, L. V., Fazio-Brunson, M., & Schween, D. C. (2017). Teacher self-efficacy, professional commitment, and high-stakes teacher evaluation policy in Louisiana. Educational Policy, 31(2), 202–248.
Griffith, J. (2004). Relation of principal transformational leadership to school staff job satisfaction, staff turnover, and school performance. Journal of Educational Administration, 42(3), 333–356.
Guskey, T. R. (1988). Teacher efficacy, self-concept, and attitudes toward the implementation of instruction innovation. Teaching and Teacher Education, 4(1), 63–69.
Hallinger, P. (1992). The evolving role of American principals: From managerial to instructional to transformational leaders. Journal of Educational Administration, 30(3), 38–52.
Halverson, R., Kelley, C., & Kimball, S. (2004). Implementing teacher evaluation systems: How principals make sense of complex artifacts to shape local instructional practice. In W. Hay & C. Miskel (Eds.), Educational administration, policy, and reform: Research and measurement, a volume in research and theory in educational administration (Vol. 3, pp. 153–188). Greenwich, CT: George F. Johnson.
Harris, D. N., & Herrington, C. D. (2006). Accountability, standards, and the growing achievement gap: Lessons from the past half‐century. American Journal of Education, 112(2), 209–238.
Hegtvedt, K. A., Clay-Warner, J., & Johnson, C. (2003). The social context of responses to injustice: Considering the indirect and direct effects of group-level factors. Social Justice Research, 16(4), 343–366.
Herlihy, C., Karger, E., Pollard, C., Hill, H. C., Kraft, M. A., Williams, M., & Howard, S. (2014). State and local efforts to investigate the validity and reliability of scores from teacher evaluation systems. Teachers College Record, 116(1), 1–28.
Hill, H. C., Kapitula, L., & Umland, K. (2011). A validity argument approach to evaluating teacher value-added scores. American Educational Research Journal, 48(3), 794–831.
Honig, M. (2006). Complexity and policy implementation: Challenges and opportunities for the field. In M. Honig (Ed.), New directions in education policy implementation (pp. 1–24). Albany, NY: State University of New York Press.
Jackson, J., Bradford, B., Hough, M., Myhill, A., Quinton, P., & Tyler, T. R. (2012). Why do people comply with the law? Legitimacy and the influence of legal institutions. British Journal of Criminology, 52(6), 1051–1071.
Jiang, J. Y., Sporte, S. E., & Luppescu, S. (2015). Teacher perspectives on evaluation reform. Educational Researcher, 44(2), 105–116.
Kane, T., & Staiger, D. (2008). Estimating teacher impacts on student achievement: An experimental evaluation. NBER working paper 14607.
Kane, T. J., Taylor, E. S., Tyler, J. H., & Wooten, A. L. (2011). Identifying effective classroom practices using student achievement data. Journal of Human Resources, 46(3), 587–613.
Kerr, K. A., Marsh, J. A., Ikemoto, G. S., Darilek, H., & Barney, H. (2006). Strategies to promote data use for instructional improvement: Actions, outcomes, and lessons from three urban districts. American Journal of Education, 112(4), 496–520.
Kimball, S. M. (2002). Analysis of feedback, enabling conditions and fairness perceptions of teachers in three school districts with new standards-based evaluation systems. Journal of Personnel Evaluation in Education, 16(4), 241–268.
Kraft, M. A., & Gilmour, A. (2015). Can evaluation promote teacher development? Principals’ views and experiences implementing observation and feedback cycles. Retrieved from http://scholar.harvard.edu/files/mkraft/files/principals_as_evalutors_3.5_0.pdf
Lacireno-Paquet, N., Bocala, C., & Bailey, J. (2016). Relationship between school professional climate and teachers’ satisfaction with the evaluation process (REL 2016–133). Washington, DC: U.S. Department of Education, Institute of Education Sciences, National Center for Education Evaluation and Regional Assistance, Regional Educational Laboratory Northeast & Islands. Retrieved from http://ies.ed.gov/ncee/edlabs
Lammers, J., Galinsky, A. D., Gordijn, E. H., & Otten, S. (2008). Illegitimacy moderates the effects of power on approach. Psychological Science, 19(6), 558–564.
Lasky, S., Schaffer, G., & Hopkins, T. (2009). Learning to think and talk from evidence: Developing system-wide capacity for learning conversations. In L. M. Earl & H. Timperley (Eds.), Professional learning conversations: Challenges in using evidence for improvement (pp. 95–107). Dordrecht, Netherlands: Springer.
Lawless, K. A., & Pellegrino, J. W. (2007). Professional development in integrating technology into teaching and learning: Knowns, unknowns, and ways to pursue better questions and answers. Review of Educational Research, 77(4), 575–614.
Leithwood, K., & Jantzi, D. (1999). Transformational school leadership effects: A replication. School Effectiveness and School Improvement, 10(4), 451–479.
Little, J. W. (2012). Understanding data use practice among teachers: The contribution of microprocess studies. American Journal of Education, 118(2), 143–166.
Marks, H. M., & Printy, S. M. (2003). Principal leadership and school performance: An integration of transformational and instructional leadership. Educational Administration Quarterly, 39(3), 370–397.
Marsh, J. A., Bertrand, M., & Huguet, A. (2015). Using data to alter instructional practice: The mediating role of coaches and professional learning communities. Teachers College Record, 117(4), 1–40.
McLaughlin, M. W. (1987). Learning from experience: Lessons from policy implementation. Educational Evaluation and Policy Analysis, 9(2), 171–178.
Measures of Effective Teaching Project. (2013). Ensuring fair and reliable measures of effective teaching. Seattle, WA: Bill & Melinda Gates Foundation.
Milanowski, A. T., & Heneman, H. G. (2001). Assessment of teacher reactions to a standards-based teacher evaluation system: A pilot study. Journal of Personnel Evaluation in Education, 15(3), 193–212.
Papay, J. P. (2012). Refocusing the debate: Assessing the purposes and tools of teacher evaluation. Harvard Educational Review, 82(1), 123–141.
Park, V., & Datnow, A. (2009). Conceptualizing policy implementation: Large-scale reform in an era of complexity. In G. Sykes, B. Schneider, & D. Plank (Eds.), Handbook of education policy research (pp. 348–361). New York: Routledge.
Pianta, R. C., & Hamre, B. K. (2009). Conceptualization, measurement, and improvement of classroom processes: Standardized observation can leverage capacity. Educational Researcher, 38(2), 109–119.
Porter, C. O. L. H., Hollenbeck, J. R., Ilgen, D. R., Ellis, A. P., West, B. J., & Moon, H. (2003). Backing up behaviors in teams: The role of personality and legitimacy of need. Journal of Applied Psychology, 88(3), 391–403.
Riordan, J., Lacireno-Paquet, N., Shakman, K., Bocala, C., & Chang, Q. (2015). Redesigning teacher evaluation: Lessons from a pilot implementation (REL 2015–030). Washington, DC: U.S. Department of Education, Institute of Education Sciences, National Center for Education Evaluation and Regional Assistance, Regional Educational Laboratory Northeast & Islands. Retrieved from http://ies.ed.gov/ncee/edlabs
Rothstein, J. (2010). Teacher quality in educational production: Tracking, decay, and student achievement. Quarterly Journal of Economics, 125(1), 175–214.
Sartain, L., Stoelinga, S. R., Brown, E. R., Luppescu, S., & Consortium on Chicago School Research. (2011). Rethinking teacher evaluation in Chicago: Lessons learned from classroom observations, principal-teacher conferences, and district implementation. Retrieved from http://bibpurl.oclc.org/web/49392
Shadish, W. R., Clark, M. H., & Steiner, P. M. (2008). Can nonrandomized experiments yield accurate answers? A randomized experiment comparing random and nonrandom assignments. Journal of the American Statistical Association, 103(484), 1334–1356.
Smoke, R. (1994). On the importance of policy legitimacy. Political Psychology, 15(1), 97–110.
Spillane, J. P., Reiser, B. J., & Reimer, T. (2002). Policy implementation and cognition: Reframing and refocusing implementation research. Review of Educational Research, 72(3), 387–431.
Steinberg, M. P., & Sartain, L. (2015). Does teacher evaluation improve school performance? Experimental evidence from Chicago’s excellence in teaching project. Education Finance and Policy, 10(4), 535–572.
Sun, M., Frank, K. A., Penuel, W. R., & Kim, C. M. (2013). How external institutions penetrate schools through formal and informal leaders. Educational Administration Quarterly, 49(4), 610–644.
Sun, M., Mutcheson, B., & Kim, J. (2016). Teachers’ use of evaluation for instructional improvement and school supports. In J. A. Grissom & P. Youngs (Eds.), Making the most of multiple measures: The impacts and challenges of implementing rigorous teacher evaluation systems. New York: Teachers College Press.
Sun, M., Penuel, W. R., Frank, K. A., Gallagher, H. A., & Youngs, P. (2013). Shaping professional development to promote the diffusion of instructional expertise among teachers. Educational Evaluation and Policy Analysis, 35(3), 344–369.
Sunshine, J., & Tyler, T. R. (2003). The role of procedural justice and legitimacy in shaping public support for policing. Law & Society Review, 37(3), 513–548.
Taylor, E. S., & Tyler, J. H. (2012). The effect of evaluation on teacher performance. American Economic Review, 102(7), 3628–3651.
Tuytens, M., & Devos, G. (2009). Teachers’ perception of the new teacher evaluation policy: A validity study of the Policy Characteristics Scale. Teaching and Teacher Education, 25(6), 924–930.
Tuytens, M., & Devos, G. (2010). The influence of school leadership on teachers’ perception of teacher evaluation policy. Educational Studies, 36(5), 521–536.
Tyler, T. R. (1997). The psychology of legitimacy: A relational perspective on voluntary deference to authorities. Personality and Social Psychology Review, 1(4), 323–345.
Tyler, T. R. (2006). Psychological perspectives on legitimacy and legitimation. Annual Review of Psychology, 57(1), 375–400.
Tyler, T. R., Schulhofer, S., & Huq, A. Z. (2010). Legitimacy and deterrence effects in counterterrorism policing: A study of Muslim Americans. Law & Society Review, 44(2), 365–402.
Van der Toorn, J., Tyler, T. R., & Jost, J. T. (2011). More than fair: Outcome dependence, system justification, and the perceived legitimacy of authority figures. Journal of Experimental Social Psychology, 47(1), 127–138.
Virginia Department of Education. (2011). Guidelines for Uniform Performance Standards and Evaluation Criteria for Teachers. Virginia, U.S.
Wallner, J. (2008). Legitimacy and public policy: Seeing beyond effectiveness, efficiency, and performance. Policy Studies Journal, 36(3), 421–443.
Weatherly, R., & Lipsky, M. (1977). Street-level bureaucrats and institutional innovation: Implementing special-education reform. Harvard Educational Review, 47(2), 171–197.
Weber, M. (1968). Economy and society: An outline of interpretive sociology. New York: Bedminster Press.
Youngs, P., & Haslam, M. B. (2012). A review of research on emerging teacher evaluation systems. Washington, DC: Policy Studies Associates.
ISSUES REGARDING SELECTION BIAS
The main concerns arising from the comparison of survey respondents and non-respondents are: 1) teachers who had significantly higher evaluation ratings in 2012–13 were more likely to complete the next year’s survey, and 2) teachers who taught high-stakes subjects/grades were more likely to complete the 2014–15 survey. The larger share of female teachers among respondents is less of a concern because the population (i.e., teachers working in Virginia) is predominantly female, and it is unclear whether teachers’ gender is related to their perceived legitimacy of teacher evaluation policies or to their instructional practices in general.
We acknowledge that, without data from non-respondents, there is no perfect way to ascertain whether there is serious selection bias or to determine the direction of such bias (i.e., whether it overestimates or underestimates the effect). Nonetheless, we examined this issue using data from respondents whose characteristics were similar to those of non-respondents. To be specific, we grouped respondents based on their evaluation ratings in 2012–13 (i.e., teachers whose evaluation ratings were lower than the mean were coded 1; otherwise, they were coded 0) and on whether they taught high-stakes subjects/grades in 2014–15 (i.e., teachers who taught high-stakes subjects/grades were coded 1; otherwise, they were coded 0). First, using these group memberships, we conducted t-tests on the main variables of interest (perceived legitimacy of policies, the three types of supports, and teacher evaluation ratings) to examine the direction of the potential bias. Second, we ran the main models using data from only those teachers whose characteristics were similar to those of non-respondents (i.e., they had lower evaluation ratings or they did not teach high-stakes subjects/grades).
In terms of teacher evaluation ratings, teachers with lower ratings reported significantly lower perceived legitimacy and lower perceptions of principal leadership, while there was no statistically significant difference in perceptions of professional development or resources/time. Since teachers with lower evaluation ratings had lower values for most independent and dependent variables in the models, including more teachers with lower evaluation ratings might not change the main findings significantly. Though the magnitude of the associations may be weakened by including more teachers whose ratings were relatively lower, the direction of the associations is likely to remain the same. In fact, when we included only respondents whose scores were lower than average in the main models, the main findings stayed similar, although all previously significant associations lost statistical significance due to the small sample size.
Teaching high-stakes subjects/grades was not significantly associated with any of the main variables of interest. Results from models that used only data from teachers who did not teach high-stakes subjects/grades were again very similar to those from the models using all available data. As a result, we concluded that, based on the available data, it is hard to find evidence of serious systematic selection bias in our analysis.
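The grouping and comparison step described in this appendix can be sketched as follows. This is an illustrative example only, under stated assumptions: the appendix does not say which t-test variant was used, so the sketch uses Welch's unequal-variance t-test; the function name `welch_t` and all records are hypothetical and are not data from the study. The sketch returns the t statistic and the Welch–Satterthwaite degrees of freedom, and it leaves the p-value to a statistics package.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(group_a, group_b):
    """Welch's two-sample t statistic and Welch-Satterthwaite degrees of freedom."""
    na, nb = len(group_a), len(group_b)
    if na < 2 or nb < 2:
        raise ValueError("each group needs at least 2 observations")
    # Squared standard error contributed by each group (sample variance / n).
    va = variance(group_a) / na
    vb = variance(group_b) / nb
    if va + vb == 0:
        raise ValueError("both groups have zero variance")
    t = (mean(group_a) - mean(group_b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t, df


# Hypothetical respondent records: (prior-year rating, perceived legitimacy).
teachers = [
    (2.0, 2.8), (2.5, 3.0), (2.5, 3.2),
    (3.5, 3.6), (3.5, 3.8), (4.0, 4.0),
]

# Teachers rated below the mean are coded 1; all others are coded 0.
mean_rating = mean(rating for rating, _ in teachers)
low_group = [leg for rating, leg in teachers if rating < mean_rating]
other_group = [leg for rating, leg in teachers if rating >= mean_rating]

t_stat, df = welch_t(low_group, other_group)
print(f"t={t_stat:.3f}, df={df:.1f}")
```

A negative t statistic here means the lower-rated group reports lower perceived legitimacy on average, which is the kind of directional evidence the appendix uses to reason about how under-represented teachers might shift the estimates.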