We use data from the Hillsborough County Public Schools (HCPS) district in Florida to consider the consequences of particular characteristics of instruction and testing in high school for the modeling and estimation of value-added measures of school or teacher effectiveness. We show that the traditional value-added model used in NCLB grades and subjects can be generalized to the high school context.
Connecticut experienced two major changes in testing policy for children with disabilities that played a major role in conclusions about educational progress in the state. The responses to these changes in testing policy make Connecticut an illuminating case regarding the problem of high-stakes testing and changes in policies for students with disabilities in a state characterized by deep racial and economic inequity.
This paper was developed as part of the Dynamic Learning Maps™ Alternate Assessment Project under grant 84.373X100001 from the U.S. Department of Education, Office of Special Education Programs. The views expressed herein are solely those of the authors and no official endorsement by the U.S. Department of Education should be inferred.
The accountability movement and high-stakes testing fail to attend to ongoing instructional improvements based on the regular assessment of student skills and teacher practices. The purpose of this chapter is to describe the School System Improvement Project’s hybrid approach to utilizing both formative and summative assessments to (a) inform decisions about effective instruction based on all students’ and teachers’ needs, and (b) guide high-stakes decisions about teacher effectiveness.
Drawing on state-level panel data for the 2007–2009 period, this study examines the potential overuse of test accommodations for students with disabilities as a gaming strategy to inflate state-level proficiency gains in response to high-stakes accountability pressures. We identify particular conditions under which test accommodations are more likely to be used for gaming and specify several directions for further research.
An analysis using a nationally representative dataset suggests that raising test scores by one standard deviation (SD) would substantially reduce the probabilities that black, Hispanic, Asian, and white students would drop out of high school and would increase the probabilities that students would compile a rigorous high school record, complete algebra 2 in high school, enroll at a 4-year institution, and attain a baccalaureate degree.
This study evaluates the information significance of Oklahoma A–F school accountability grades relevant to the policy objective of achievement equity.
This article analyzes the effects of mandated accountability testing, teachers' knowledge and beliefs, and teachers' milieu on the work of four social studies teachers in one middle school in Texas. The article argues that more comprehensive and holistic research efforts are needed for researchers to be able to more fully understand and communicate to readers the combination of factors that impact teachers' work.
The achievement gap may be explained as a consequence of the conventional structure of schooling and the failure to individualize task difficulty and provide performance feedback in a way that is necessary to ensure that all students experience mastery.
This study utilizes a non-equivalent control group design and quantitative analyses to examine the association between classroom grades and standardized test scores.
To allay public concerns that state exit examination mandates might unfairly hinder some students’ educational attainment prospects, most states with exam requirements offer alternative routes to graduation for all students. This study probes the relationship between various combinations of exam difficulty and alternative-route policy and the subsequent attainment outcomes of tenth-graders.
This article demonstrates that the carryover effects of STAR’s small classes are not robust; the effects are driven mostly by a small number of STAR schools.
This article explains the idea of a neopragmatic postmodernist test theory and offers some thoughts about what changing notions concerning the nature of and meanings assigned to knowledge imply for educational assessment, present and future.
Reprinted with permission from Transitions in Work and Learning: Implications for Assessment, 1997, by the National Academy of Sciences, Courtesy of the National Academies Press, Washington, D.C.
This article challenges the presumption that the educational testing of students provides objective information about such students.
This paper considers future educational assessment in terms of principles of evidential reasoning, focusing the discussion on the changes to the claims our assessments must support, the types of evidence needed to support these claims, and the statistical tools available to evaluate our evidence vis-à-vis the claims. An expanded view of assessment is advanced in which assessments based on multiple evidence sources from contextually rich situated learning environments, including unconventional data regarding human competencies, improve our ability to make valid inferences and decisions for all education stakeholders.
Educational researchers and policymakers have often lamented the failure of teachers to implement what they consider to be technically sound assessment procedures. Through a case study of New York City’s Central Park East Secondary School (CPESS), in the years when it served as a model for progressive American school reform, Duckor and Perlstein demonstrate the usefulness of an alternative to reliance on the technical characteristics of standardized tests for constructing and judging assessments: teachers’ self-conscious and reasoned articulation of their approaches to learning and assessment. They conclude that when teachers are given opportunities for genuine, shared reflection on teaching and learning, and classroom practices are tied to this understanding, fidelity to what they call the logic of assessment offers a more promising framework for the improvement of schooling than current forms of high-stakes, standardized accountability. Thus, instead of expecting teachers to rely on data from standardized assessments or replicate features of standardized testing in their own assessment practices, researchers, policymakers, and teacher educators should promote fidelity to the broader logic of assessment.
Drawing upon the concept of interpretive flexibility, this study illuminates some of the sensemaking processes around teachers’ uses of data and computer data systems. Accordingly, it provides recommendations regarding how researchers and school and district leaders might be more attentive to the “people problems” around data system implementation.
In this article, the author describes the history of classroom research and notes that, despite potential for present day application, many of those who currently develop observational systems for evaluating teachers appear to be unaware of this literature. The author describes what we know about effective teaching, the limits of using this information, and the need for identifying new important outcomes of schooling that can be used in teacher evaluation.
This paper reviews the literature on teacher effects and focuses on value-added measures and their use in evaluating teachers. Suggestions about the use of value-added measures and about the future of teacher effects research are provided.
In this study, the researchers surveyed all 50 states and the District of Columbia to provide a comprehensive national overview of growth and value-added models.
This paper explores how state education officials and their district and local partners plan to implement and evaluate their teacher evaluation systems, focusing in particular on states’ efforts to investigate the reliability and validity of scores emerging from the observational component of these systems.
This article discusses the intended and unintended consequences of high-stakes teacher evaluation. The potential for high-stakes teacher evaluation to meet the intended outcome of a better teacher workforce and improved student achievement is assessed, as are the costs of doing so.
This study examines accountability in teacher education in an era of testing. It compares how multiple professions evaluate program outcomes and identifies concerns with overemphasis on value-added models as the basis for assessing the impact of teacher preparation program graduates. Suggestions are offered for possible alternative paths.
This article discusses the papers in the special issue of Teachers College Record addressing broad themes of reliability and validity that raise cautions regarding the usefulness of recent approaches to high-stakes evaluation of educators. Implications are drawn for the long-term health of the teacher labor market.
Foreword to the special issue on High-Stakes Teacher Evaluation.
In the fairytale of US public education reform, the root of all evil has presumably been identified: the dragons of ineffectiveness. In this fairytale, a newspaper team of investigative reporters, hired statisticians, and other columnists at The LA Times have ridden in on the back of Value-Added Measurement. In this paper, we present findings from a policy narrative centered on teacher evaluation and effectiveness. We conducted an analysis of 52 articles published between 2009 and 2011 that were from or related to a series on Value-Added Measurement initially published in 2010 by The LA Times. We sought to understand the ways in which discourse choices worked to construct a certain version of policy issues related to teacher quality, positioning some individuals and even national groups on one side of a polarized debate. We have given particular attention to the ways in which the media discourse functioned to politicize and (over)simplify issues related to educational policy and teacher evaluation.
Design-based implementation research offers the opportunity to rethink the relationships between intervention, research, and situation to better attune research and evaluation to the program development process. Using a heuristic called the intervention development curve, I describe the rough trajectory that programs typically follow as they evolve, and argue that research design considerations and methodological choices are best made in consideration of where interventions are along this curve. Further, I contend that, as programs develop, situational influences play a major role in their evolution and consequently require increased attention to design and methodological considerations. By viewing research as an integral part of a program’s development, by making design and methodological choices in consideration of where programs are in their development, and by considering the situation in which programs evolve as a potential source of change in the nature of the program itself, we alter fundamental perspectives on how research can best contribute to the steady work of building robust programs for educational improvement.
This article approaches the evolving concept of validity of assessments, moving from the scholarship of the past to the constraints and demands of the present. The use of technology and globalization are raised as challenges to future approaches to validity.