This paper explores how state education officials and their district and local partners plan to implement and evaluate their teacher evaluation systems, focusing in particular on states’ efforts to investigate the reliability and validity of scores emerging from the observational component of these systems.
This article discusses the intended and unintended consequences of high-stakes teacher evaluation. It assesses the potential for high-stakes teacher evaluation to meet the intended outcome of a better teacher workforce and improved student achievement, along with the costs of doing so.
This study examines accountability in teacher education in an era of testing. It compares how multiple professions evaluate program outcomes and identifies concerns with overemphasis on value-added models as the basis for assessing the impact of teacher preparation program graduates. Suggestions are offered for possible alternative paths.
This article discusses the papers in the special issue of Teachers College Record addressing broad themes of reliability and validity that raise cautions regarding the usefulness of recent approaches to high-stakes evaluation of educators. Implications are drawn for the long-term health of the teacher labor market.
Foreword to the special issue on High-Stakes Teacher Evaluation.
In the fairytale of US public education reform, the root of all evil has presumably been identified: the dragons of ineffectiveness. In this fairytale, The LA Times, a newspaper team of investigative reporters, hired statisticians, and other columnists have ridden in on the back of Value-Added Measurement. In this paper, we present findings from a policy narrative centered on teacher evaluation and effectiveness. We conducted an analysis of 52 articles published between 2009 and 2011 that were from or related to a series on Value-Added Measurement initially published in 2010 by The LA Times. We sought to understand the ways in which discourse choices worked to construct a certain version of policy issues related to teacher quality, positioning some individuals and even national groups on one side of a polarized debate. We have given particular attention to the ways in which the media discourse functioned to politicize and (over)simplify issues related to educational policy and teacher evaluation.
Design-based implementation research offers the opportunity to rethink the relationships between intervention, research, and situation to better attune research and evaluation to the program development process. Using a heuristic called the intervention development curve, I describe the rough trajectory that programs typically follow as they evolve, and argue that research design considerations and methodological choices are best made in consideration of where interventions are along this curve. Further, I contend that, as programs develop, situational influences play a major role in their evolution and consequently require increased attention to design and methodological considerations. By viewing research as an integral part of a program’s development, by making design and methodological choices in consideration of where programs are in their development, and by considering the situation in which programs evolve as a potential source of change in the nature of the program itself, we alter fundamental perspectives on how research can best contribute to the steady work of building robust programs for educational improvement.
This article approaches the evolving concept of validity of assessments, moving from the scholarship of the past, to the constraints and demands of the present. The use of technology and globalization are raised as challenges to future approaches to validity.
This article provides an analysis of the recent publication of the value-added measurements found in the Teacher Data Reports of the New York City Department of Education.
This article considers the value of the national move toward value-added measures and our current fascination with objective measurements – a fascination that stems from our collective distrust of our teachers and ourselves, and our reluctance to make judgments about the substantive narratives we teach students.
Assessment use has switched from measurement tools to policy levers. Meaning is created by use, and the intense push for test-based accountability and teacher evaluation policies in the U.S. has fundamentally changed the nature of test use. The core meaning of testing has accordingly been qualitatively changed, and serious policy attention to issues of consequential validity counsels against use of tests to drive policy unless, and until, the results of that process itself have been validated for their furtherance of recognized goals.
This article, an afterword to the special issue, considers the multiple purposes of test validity.
This research uses oral history narratives to examine the professional choices and trajectories of Teach for America participants over a twenty-year period, attending especially to individuals’ perceptions of their urban teaching experiences, their beliefs, and their reasons for staying in or leaving the urban classroom, with the aim of better understanding the experiences of such teachers and the implications for staffing urban schools.
Examination of the political origins of state performance funding for higher education in six states (Florida, Illinois, Missouri, South Carolina, Tennessee, and Washington) and the lack of its development in another two states (California and Nevada).
Setting the stage for the special issue, this article discusses the increased attention to data use in policy and practice, provides an overview of the major ways that scholars have studied data use, highlights the limitations of the extant research, summarizes the contributions of the articles in this special issue to addressing these limitations, and previews the articles that follow.
This article reviews the literature on the qualities of assessments and identifies three crucial test design elements that can provide insightful feedback to teachers about students’ understanding to inform subsequent instructional choices.
This article synthesizes what we currently know about interventions to support educators’ use of data—ranging from comprehensive, system-level initiatives, such as reforms sponsored by districts or intermediary organizations, to more narrowly focused interventions, such as a workshop. The article summarizes what is known across studies about the design and implementation of these interventions, their effects at the individual and organizational levels, and the conditions shown to affect implementation and outcomes, and concludes by suggesting directions for future research.
Many studies have found that educational accountability policies increase data use, but because accountability has been conceived of as one “treatment,” little is known about the features of accountability systems that are most likely to increase desirable versus undesirable uses of data. I define desirable data use as practices that do not invalidate the inferences about student- and school-level performance that policy makers, educators, and parents hope to make. This article proposes that five features of accountability systems affect how data are used and discusses what we know, and what we don’t know, about their effects. In each of these areas, I propose a research agenda intended to further our understanding of how accountability systems affect data use.
This article observes that many data use studies suggest that the interpretation and use of data take place both within and between individuals who, through social interaction, are both co-constructing and making sense of data and their use. Given the increasingly important role of social relationships in data use studies, better theorizing and deeper understanding regarding the dynamics of social influence and processes on the interpretation and use of data are needed. Social network theory and analysis offer a useful conceptual framework and accompanying methods for describing and analyzing the structure of a social system in an effort to understand how social relationships support and constrain the interpretation and use of data in educational improvement.
This article reviews political science theories and findings to inform our understanding of how politics affects efforts to increase data usage in education policy and school reform. Rather than block the door to politics, those who hope to promote informed policy making might consider ways to use politics to protect and defend high-quality data.
This commentary on the special issue on data use highlights the distinctions between data systems intended to improve the performance of school staff and those intended to hold schools and districts accountable for outcomes. It advises researchers to be alert to the differences in the policy logics connected with each approach.
This commentary draws on the articles in this issue to underscore the importance of community engagement and districtwide capacity building as central to efforts to use data to inform accountability and choice, along with school and instructional improvement. The author cautions against treating data as an all-purpose tool absent adequate attention to developing solutions to the problems data illuminate.
This commentary frames the importance of the topic of this special issue by highlighting the changes that have occurred in school systems around data use, particularly in large urban districts, and the need for a more rigorous evidence base. Collectively, the articles in this volume provide a jumping-off point for such a research agenda around data use in schools. Each of the articles identifies significant gaps in our knowledge base and develops useful conceptual frameworks within which to think about the dimensions of data use, the quality of the research evidence, and the implications for the field.
College grades can influence a student’s graduation prospects, academic motivation, postgraduate job choice, professional and graduate school selection, and access to loans and scholarships. Despite the importance of grades, national trends in grading practices have not been examined in over a decade, and there has been a limited effort to examine the historical evolution of college grading. This article looks at the evolution of grading over time and space at American colleges and universities over the last 70 years. The data provide a means to examine how instructors’ assessments of excellence, mediocrity, and failure have changed in higher education.
This study examines the impact of high-stakes exit testing on English learners (ELs) in Texas. Quantitative and qualitative data suggest that high-stakes exit assessments have led to higher EL dropout and lower graduation rates.
This article describes a randomized field trial conducted to estimate the impact of modest monetary incentives on performance on a version of the National Assessment of Educational Progress (NAEP) 12th-grade reading assessment. Monetary incentives have a statistically significant and substantively important impact on both student engagement/effort and achievement.
The goal of this article is to broaden the understanding of what it means for schools and teachers to be held accountable for student learning and to discuss how different accountability frameworks affect instructional practices in classrooms. We take a practice-oriented perspective on assessment, examining how assessments in schools that participated in a class size reduction program intersected with forces of accountability. Over three years of data collection in 27 classrooms in nine schools, data were generated through multiple methods, including ethnographic observations, interviews, administration of the Classroom Assessment Scoring System (CLASS), document and artifact collection, and analyses of school-level standardized test scores. We identify and discuss three aspects of assessment practices that affect this intersection: alignment, audience, and action.
This article argues that the historical reduction of age at grade level in the 20th century has interacted with test scores that take age into account, resulting in a rise in IQ scores in school populations.
This chapter contrasts the approach to educational evaluation championed in recent educational policy-making with a dialogical epistemology of evaluation. A dialogical epistemology, drawn from the writings of M. M. Bakhtin, enjoins evaluators to consider multiple voices and their relative authority and power in making judgments of the worth or merit of programs. Further, it positions them as participants in policy discussions rather than arbiters of program value whose authority stems from the methods they use.
Framed by the assumptions of ethnomethodology and drawing on methods of conversational analysis, the author analyzed a set of 10 transcripts of Teacher Work Sample scoring conversations to identify patterns in scorer interaction. Interactive rules and strategies are identified and implications and cautions offered for the use of work samples, particularly for high-stakes assessment.