“Truths” Devoid of Empirical Proof: Underlying Assumptions Surrounding Value-Added Models in Teacher Evaluation
by Jessica Holloway-Libell & Audrey Amrein-Beardsley - June 29, 2015
Despite the overwhelming and research-based concerns regarding value-added models (VAMs), VAM advocates, policymakers, and supporters continue to hold strong to VAMs’ purported, yet still largely theoretical, strengths and potentials. Those advancing VAMs have, more or less, adopted and promoted an agreed-upon, albeit “heroic,” set of assumptions, without independent, peer-reviewed research in support. These “heroic” assumptions pervade promotional, policy, media, and research-based pieces, but they have never been fully investigated, explicated, or made explicit as a set or whole. These assumptions, though often violated, are routinely ignored in order to promote VAM adoption and use, and also to sell for-profits’ and sometimes non-profits’ VAM-based systems to states and districts. The purpose of this study was to make obvious the assumptions that have been made within the VAM narrative and that, accordingly, have often been accepted without challenge. Ultimately, sources for this study included 470 distinctly different written pieces, from both traditional and non-traditional sources. The results of this analysis suggest that the preponderance of sources propagating unfounded assertions is fostering a sort of VAM echo chamber that seems impenetrable by even the most rigorous and trustworthy empirical evidence.
Value-added models (VAMs) are statistical instruments intended to objectively measure the amount of value that a teacher adds to (or detracts from) student learning and achievement from one school year to the next. Stemming largely from the field of economics (Hanushek, 1971, 1979, 2009, 2011), VAMs have been the source of both academic and public controversy, often causing a rift between teachers and public officials (e.g., the Chicago Teacher Strike of 2012).
There is also a growing divide between academic scholars who have taken to either side of the VAM debate, with economists often on one side (although not always; see, for example, Rothstein, 2009, 2014) promoting the still largely purported strengths of these models (see, for example, Chetty, Friedman, & Rockoff, 2011, 2014a, 2014b, 2014c; Ehlert, Koedel, Parsons, & Podgursky, 2012; Gordon, Kane, & Staiger, 2006; Kane & Staiger, 2008, 2012; Hanushek, 1971, 2009, 2011), and educational researchers often on the other side, representing the VAM critics (see, for example, Baker, Barton, Darling-Hammond, Haertel, Ladd, Linn, . . . & Shepard, 2010; Corcoran, 2010; Gabriel & Lester, 2013; Graue, Delaney, & Karch, 2013; Hill, Kapitula, & Umland, 2011; Newton, Darling-Hammond, Haertel, & Thomas, 2010; Papay, 2010; Rothstein, 2009, 2010).
Given the continued momentum of VAM adoption and use, it is reasonable to posit that economists have taken the lead in influencing educational policy in this area, which is not surprising in light of their increasing influence in the public policy arena at large (Fourcade, Ollion, & Algan, 2014; Lazear, 1999).
THE ROLE OF ECONOMICS IN EDUCATIONAL POLICY
Economists work from a different set of theoretical and epistemological assumptions than those of other social scientists (Fourcade et al., 2014). Economists, for example, tend to work from the presupposition that the properties of 'collectivities' (groups, institutions, societies) can be reduced to statements about the properties of individuals (Ingham, 1996, pp. 245-246). In contrast, sociologists (and educational researchers) work from the presupposition that individuals (human capital) are part of larger social structures (social capital) that influence the make-up of the individuals (Ingham, 1996). Accordingly, the methodological approaches to understanding human behavior and society differ considerably by discipline. Other social science disciplines, such as psychology, anthropology, and political science, likewise ground their work in entirely different sets of epistemological assumptions.
But here, and in particular, VAMs, the products of economists, are built on assumptions that, while appropriate for an economic analytical framework, create a set of problems for other disciplinary approaches that work from different sets of assumptions. Hence, our purpose for writing this commentary was to explore and (hopefully) help others better understand the core of the VAM debate by (a) identifying some of the key assumptions, or conditions, that have not been explicitly addressed by economics-based value-added calculations, but have nonetheless been implicitly assumed as true conditions upon which VAMs and VAM-use can be justified and rationalized; and (b) mapping these assumptions onto the greater VAM literature to interpret the feasibility of such assumptions being true given a multi-disciplinary lens.
For the purposes of this paper, we define assumptions broadly to include room for all social science disciplines’ definitions, for we find that such inclusion is not only important but critical for understanding the complexities involved in the process of measuring teacher quality, as well as teacher impacts on student achievement and learning over time (i.e., growth).
In addition, we worked from the position that economists have led the work. They have also informed and defined the narratives surrounding VAMs, especially as these narratives relate to the asserted need for VAMs (Chetty et al., 2011, 2014a, 2014b, 2014c; Gordon et al., 2006; Kane & Staiger, 2008, 2012; Rockoff, Staiger, Kane, & Taylor, 2010; Weisberg, Sexton, Mulhern, & Keeling, 2009), the development of VAMs (Hanushek, 1971, 2009, 2011), and the use of VAMs for teacher evaluation and accountability purposes (Chetty et al., 2011, 2014a, 2014b, 2014c; Harris & Weingarten, 2011; Sanders, 2003; Sanders, Saxton, & Horn, 1997).
Accordingly, VAMs and VAM-based policies and practices have been built upon a set of assumptions that are appropriate to the discipline of economics. However, given the complexities that are inherent in education systems, it might serve our understandings well to apply multidisciplinary approaches to think about not only the capabilities of VAMs, but also the assertions regarding the need for VAMs, the practical application of VAMs, and the potential consequences related to VAM-use. We argue that by looking at the VAM literature in terms of disciplinary assumptions, we might better understand the nature of the debate surrounding VAMs, all the while explicitly unpacking the problems that can arise by depending almost entirely on one disciplinary approach to a problem. To this end, we conducted a content analysis of the VAM literature, defined in both traditional (e.g., peer-reviewed publications) and non-traditional (e.g., news articles) terms, treating each piece as an artifact.
Our goal was to locate and identify the implicit and explicit assumptions that were made across pieces, while situating these assumptions within a multidisciplinary framework. In other words, we did not limit our definition of assumption to the strict, statistical sense of the word. Rather, we took the term assumption as broadly conceived, to encompass that which is accepted as truth, devoid of empirical proof. Sources for this study included 470 distinctly different pieces1 that we read and analyzed, again while deconstructing the assumptions common across sources. In order of volume, resources that we read and analyzed included: peer-reviewed research studies (29%, n=138); articles published in media outlets (21%, n=98); organization, foundation, business, and think tank research studies (20%, n=95); editor- and self- or author-reviewed studies (14%, n=66); federal, local, and other promotional materials (10%, n=46); and blog posts (6%, n=27) (see Figure 1).
Figure 1. Types of VAM-based articles read and analyzed for this study.
We carefully read through each of the sources and, using a constant comparison analytic method (Leech & Onwuegbuzie, 2008), noted when the authors made or advanced assumptions, or implicitly accepted as true potentially necessary pre-conditions, without acknowledgement or research or references in support. In other words, we attempted to make sense of the way in which VAMs have been politically and socially accepted, despite the academic contention, by tracking the narrative at the base of the discrepancies: the assumptions or conditions upon which VAMs are possible, necessary, and successful at measuring teacher quality as conceptualized.
After collapsing the 1,226 initial codes into a set of 33 major assumptions, and eventually into four major themes, we mapped these assumptions onto the greater VAM literature to determine the feasibility, practicality, and appropriateness of VAM-use given a multidisciplinary lens. We compared each assumption against the extant research to determine whether the literature supported it.
We present each assumption as situated within the greater multi-disciplinary VAM literature. We do this to consider VAMs and VAM-use as part of complex schooling systems that can best be understood via multiple approaches instead of a single, economics-based approach. Of particular focus are the 33 assumptions, related to (a) assumptions used as rationales to justify VAM adoption (see Figure 2), (b) assumptions used as justifications to further advance VAM implementation (see Figure 3), (c) major statistical and methodological assumptions about VAMs (see Figure 4), and (d) assumptions specifically made about the large-scale standardized tests used for value-added calculations (see Figure 5). We call on readers to consider and deliberate these assumptions, as illustrated in these figures.
Figure 2. Assumptions used as rationales to justify VAM adoption.
Figure 3. Assumptions used as justifications to further advance VAM implementation.
Figure 4. Statistical and methodological assumptions about VAMs.
Figure 5. Assumptions made about the tests used for value-added calculations.
For this commentary, we attempted to situate the VAM narrative within a multi-disciplinary context by unpacking the conditions upon which VAMs and VAM-use are built. We also attempted to situate these conditions within, and map these conditions onto, the larger value-added literature. Taking such a multi-disciplinary approach, it becomes rather clear that demonstrating that these conditions hold true, as required for VAMs and VAM-use to be appropriate, is much more difficult than is currently represented in the public narrative surrounding VAMs.
As illustrated, accepting the many assumptions that must be accepted in order to truly measure the isolated impact of teacher effects on student learning is nearly impossible. Nonetheless, policies across the country not only require school administrators to apply these methods to teacher evaluations, and to accept these assumptions rather than reject them, but then also to attach high-stakes personnel decisions to VAM-derived outcomes.
Lazear (1999) reminds us that it is the intention of the economist to make simplifying assumptions (p. 5) in order to sustain a level of statistical rigor. It is feasible to assume that such work has a certain appeal that might lure politicians and the public into trusting such seemingly simple analytical instruments and analyses, largely given the simple assumptions that make simple sense.
1. To view a full list of these 470 references, please visit an anonymous, APA-formatted listing of them at https://sites.google.com/site/anonymousauthor111/references
Baker, E. L., Barton, P. E., Darling-Hammond, L., Haertel, E., Ladd, H. F., Linn, R. L., . . . Shepard, L. A. (2010). Problems with the use of student test scores to evaluate teachers. (Briefing Paper #278). Retrieved from the Economic Policy Institute website: http://www.epi.org/publications/entry/bp278
Chetty, R., Friedman, J. N., & Rockoff, J. E. (2011, December). The long-term impacts of teachers: Teacher value-added and student outcomes in adulthood. (Working Paper No. 17699). Retrieved from National Bureau of Economic Research website: http://www.nber.org/papers/w17699
Chetty, R., Friedman, J. N., & Rockoff, J. E. (2014a). Measuring the impact of teachers I: Teacher value-added and student outcomes in adulthood. (Working Paper No. 19423). Retrieved from National Bureau of Economic Research website: http://www.nber.org/papers/w19423
Chetty, R., Friedman, J. N., & Rockoff, J. E. (2014b). Measuring the impact of teachers II: Evaluating bias in teacher value-added estimates. (Working Paper No. 19424). Retrieved from National Bureau of Economic Research website: http://www.nber.org/papers/w19424
Chetty, R., Friedman, J. N., & Rockoff, J. E. (2014c). Response to Rothstein (2014) on revisiting the impacts of teachers. Unpublished research note. Retrieved from Harvard website: http://obs.rc.fas.harvard.edu/chetty/Rothstein_response.pdf
Corcoran, S. P. (2010). Can teachers be evaluated by their students’ test scores? Should they be? The use of value-added measures of teacher effectiveness in policy and practice. Providence, RI: Annenberg Institute for School Reform. Retrieved from http://annenberginstitute.org/publication/can-teachers-be-evaluated-their-students%E2%80%99-test-scores-should-they-be-use-value-added-mea
Ehlert, M., Koedel, C., Parsons, E., & Podgursky, M. (2012). Selecting growth measures for school and teacher evaluations. Washington, DC: National Center for Analysis of Longitudinal Data in Education Research (CALDER). Retrieved from: http://www.caldercenter.org/publications/selecting-growth-models-school-and-teacher-evaluations-should-proportionality-matter
Fourcade, M., Ollion, E., & Algan, Y. (2014). The superiority of economists. (Discussion Paper 14/3). Retrieved from the Max Planck Sciences Po Center website: http://www.maxpo.eu/pub/maxpo_dp/maxpodp14-3.pdf
Gabriel, R., & Lester, J. N. (2013). Sentinels guarding the grail: Value-added measurement and the quest for education reform. Education Policy Analysis Archives, 21(9), 1-30. Retrieved from http://epaa.asu.edu/ojs/article/view/1165
Gordon, R., Kane, T. J., & Staiger, D. O. (2006). Identifying effective teachers using performance on the job. (Discussion paper). Retrieved from the Brookings Institution website: http://www.brookings.edu/papers/2006/04education_gordon.aspx
Graue, M. E., Delaney, K. K., & Karch, A. S. (2013). Ecologies of education quality. Education Policy Analysis Archives, 21(8), 1-36.
Hanushek, E. (2009). Teacher deselection. In D. Goldhaber & J. Hannaway (Eds.), Creating a new teaching profession (pp. 165-180). Washington, DC: Urban Institute Press.
Hanushek, E. A. (1971). Teacher characteristics and gains in student achievement: Estimation using micro data. The American Economic Review, 61(2), 280-288.
Hanushek, E. A. (1979). Conceptual and empirical issues in the estimation of educational production functions. Journal of Human Resources, 14(3), 351-388.
Hanushek, E. A. (2011). The economic value of higher teacher quality. Economics of Education Review, 30, 466-479.
Harris, D. N., & Weingarten, R. (2011). Value-added measures in education: What every educator needs to know. Cambridge, MA: Harvard Education Press.
Hill, H. C., Kapitula, L., & Umland, K. (2011, June). A validity argument approach to evaluating teacher value-added scores. American Educational Research Journal, 48(3), 794-831. doi:10.3102/0002831210387916
Ingham, G. (1996). Some recent changes in the relationship between economics and sociology. Cambridge Journal of Economics, 20(2), 243-275.
Kane, T. J., & Staiger, D. O. (2008). Estimating teacher impacts on student achievement: An experimental evaluation. (Working Paper No. 14607). Retrieved from the National Bureau of Economic Research website: http://www.nber.org/papers/w14607
Kane, T., & Staiger, D. (2012). Gathering feedback for teaching: Combining high-quality observations with student surveys and achievement gains. (Research Paper). Retrieved from the Bill & Melinda Gates Foundation website: http://www.metproject.org/downloads/Preliminary_Findings-Research_Paper.pdf
Lazear, E. P. (1999). Economic imperialism. (Working Paper No. 7300). Retrieved from National Bureau of Economic Research website: http://www.nber.org/papers/w7300
Leech, N. L., & Onwuegbuzie, A. J. (2008). Qualitative data analysis: A compendium of techniques for school psychology research and beyond. School Psychology Quarterly, 23, 587-604.
Newton, X., Darling-Hammond, L., Haertel, E., & Thomas, E. (2010). Value-added modeling of teacher effectiveness: An exploration of stability across models and contexts. Education Policy Analysis Archives, 18(23), 1-27. Retrieved from http://epaa.asu.edu/ojs/article/view/810
Papay, J. P. (2010). Different tests, different answers: The stability of teacher value-added estimates across outcome measures. American Educational Research Journal, 48, 163-193. doi:10.3102/0002831210362589
Rockoff, J. E., Staiger, D. O., Kane, T. J., & Taylor, E. S. (2010). Information and employee evaluation: Evidence from a randomized intervention in public schools. (Working Paper No. 16240). Retrieved from the National Bureau of Economic Research website: http://www.nber.org/papers/w16240
Rothstein, J. (2010). Teacher quality in educational production: Tracking, decay, and student achievement. Quarterly Journal of Economics, 125(1), 175-214. doi:10.1162/qjec.2010.125.1.175
Sanders, W. L. (2003, April). Beyond No Child Left Behind. Paper presented at the Annual Meeting of the American Educational Research Association. Retrieved from: http://www.sas.com/govedu/edu/no-child.pdf
Sanders, W. L., Saxton, A. M., & Horn, S. P. (1997). The Tennessee Value-Added Accountability System: A quantitative, outcomes-based approach to educational assessment. In J. Millman (Ed.), Grading teachers, grading schools: Is student achievement a valid evaluation measure? (pp. 137-162). Thousand Oaks, CA: Corwin Press.
Weisberg, D., Sexton, S., Mulhern, J., & Keeling, D. (2009). The widget effect. Education Digest, 75(2), 31-35.