State and Local Efforts To Investigate the Validity and Reliability of Scores From Teacher Evaluation Systems

by Corinne Herlihy, Ezra Karger, Cynthia Pollard, Heather C. Hill, Matthew A. Kraft, Megan Williams & Sarah Howard - 2014

Context: In the past two years, states have implemented sweeping reforms to their teacher evaluation systems in response to Race to the Top legislation and, more recently, NCLB waivers. With these new systems, policymakers hope to make teacher evaluation both more rigorous and more grounded in specific job performance domains such as teaching quality and contributions to student outcomes. Attaching high stakes to teacher scores has prompted an increased focus on the reliability and validity of these scores. Teachers' unions have expressed strong concerns about the reliability and validity of using student achievement data to evaluate teachers and the potential for subjective ratings by classroom observers to be biased. The legislation enacted by many states also requires scores derived from teacher observations and the overall systems of teacher evaluation to be valid and reliable.

Focus of the study: In this paper, we explore how state education officials and their district and local partners plan to implement and evaluate their teacher evaluation systems, focusing in particular on states' efforts to investigate the reliability and validity of scores emerging from the observational component of these systems.

Research design: Through document analysis and interviews with state education officials, we explore several issues that arise in observational systems, including the overall generalizability of teacher scores; the training, certification, and reliability of observers; and specifications regarding the sampling and number of lessons observed per teacher.

Findings: Respondents' reports suggest that states are attending to the reliability and validity of scores, but inconsistently; in only a few states does there appear to be a coherent strategy regarding reliability and validity in place.

Conclusions: There remain a variety of system design and implementation decisions that states can optimize to increase the reliability and validity of their teacher evaluation scores. While a state may engage in auditing scores, for instance, it may miss the gains to reliability and validity that would accrue from periodic rater retraining and recertification, a stiff program of rater monitoring, and the use of multiple raters per teacher. Most troublesome are decisions about which and how many lessons to sample: these are mandated legislatively, result from practical concerns or negotiations between stakeholders, or, at best, rest on broad research not directly related to the state context. This suggests that states should more actively investigate the number of lessons and lesson sampling designs required to yield high-quality scores.

Cite This Article as: Teachers College Record Volume 116 Number 1, 2014, p. - ID Number: 17292, Date Accessed: 9/21/2021 10:14:57 AM
