Evidence of Grade and Subject-Level Bias in Value-Added Measures


by Jessica Holloway-Libell - June 08, 2015

While value-added models (VAMs), the statistical tools used to measure teacher effects on student achievement scores, continue to spread across districts and states, education scholars continue to recommend caution, especially regarding the inferences drawn from VAM output. This research note investigates a largely unexplored source of bias in VAM-based estimates: bias associated with grade levels and subject areas. The findings offer an alternative perspective on how we think about VAM-based bias and teacher classifications.

INTRODUCTION


Tennessee was one of only two states to win first-round Race to the Top (RttT) funds in 2009 (RttT, 2011). While most other contending states had to adopt or develop a statistical model to measure growth in achievement scores and attribute that growth to teachers (e.g., via a growth or value-added model [VAM]), Tennessee had a distinct advantage: its VAM, the Tennessee Value-Added Assessment System (TVAAS), had been in place for nearly two decades by the time applications were due.


Currently, 44 states and Washington, D.C. have adopted some form of VAM or other student growth model, whether to comply with RttT stipulations or to receive No Child Left Behind waivers (Collins & Amrein-Beardsley, 2014); Tennessee, then, was rewarded for being a leader in this competition. The TVAAS model was later acquired by the analytic software company SAS® Institute Inc. and renamed the Education Value-Added Assessment System (EVAAS), although it retains the TVAAS name in Tennessee. It has since become the most popular and widely used VAM in the country.


As such models have proliferated, however, many educational researchers have grown increasingly skeptical of them, primarily because of concerns about the reliability, validity, and bias of the models currently available (Haertel, Rothstein, Amrein-Beardsley, & Darling-Hammond, 2011; Graue, Delaney, & Karch, 2013; Newton, Darling-Hammond, Haertel, & Thomas, 2010; Papay, 2010; Rothstein, 2009, 2010, 2014). The TVAAS model specifically has also been the subject of such criticisms (Amrein-Beardsley, 2008; Ballou & Springer, 2015; Gabriel & Lester, 2013; Kupermintz, 2003). This paper calls for additional consideration of how we think about issues of bias, as observed in what is arguably the "best" VAM on the market, and more specifically about whether Tennessee teachers' VAM estimates might be influenced by the grade levels and subject areas they teach.


VALUE-ADDED MODELS AND BIAS


VAMs are statistical tools meant to measure the purportedly causal relationship between teachers' instruction and student achievement. This is done by measuring growth in student achievement from one year to the next using students' scores on large-scale standardized tests, while controlling for students' prior testing histories and, in some models, student-level variables (e.g., race, gender, poverty) and classroom- or school-level variables (e.g., class size, a teacher's average prior achievement). However, the extent to which these controls function as intended and actually remove bias has been a source of contention (Ehlert, Koedel, Parsons, & Podgursky, 2013; Rothstein, 2009, 2010, 2014).
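To make the general logic concrete, the sketch below fits a simplified covariate-adjusted regression of the kind described above. It is an illustration only, not the TVAAS/EVAAS specification, and the data file and column names (score_2013, score_2012, frl, teacher_id) are hypothetical stand-ins.

```python
# Minimal sketch of a generic value-added regression (NOT the TVAAS/EVAAS model):
# current scores are regressed on prior scores and student covariates, and a
# teacher "effect" is read off the teacher fixed-effect coefficients.
# The input file and column names below are hypothetical.
import pandas as pd
import statsmodels.formula.api as smf

students = pd.read_csv("student_scores.csv")  # hypothetical file of linked student records

model = smf.ols(
    "score_2013 ~ score_2012 + C(gender) + C(race) + frl + C(teacher_id)",
    data=students,
).fit()

# Each C(teacher_id)[T.x] coefficient is that teacher's estimated "value added"
# relative to the omitted reference teacher, conditional on the controls.
teacher_effects = model.params.filter(like="C(teacher_id)")
print(teacher_effects.sort_values(ascending=False).head())
```

Operational models such as the TVAAS rely on far more elaborate longitudinal, mixed-model machinery; the point here is only to show the covariate-adjustment structure on which the bias debate turns.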


The most salient concern is bias related to the types of students taught, particularly in homogeneous classrooms, including those whose students consistently score at the extremes of the test score distribution (Goldhaber, Gabele, & Walch, 2012; Kupermintz, 2003; Newton et al., 2010; Rothstein, 2009, 2010, 2014; Rothstein & Mathis, 2013). To mitigate these effects, the strongest recommendation is to randomly assign students to teachers (Ballou, 2012; Ehlert, Koedel, Parsons, & Podgursky, 2012; Glazerman & Potamites, 2011); yet it is unlikely this will be done in practice (Paufler & Amrein-Beardsley, 2014). Others have argued that random assignment is unnecessary because most, if not all, VAMs control for at least students' prior achievement scores, thereby controlling for other risk variables by proxy (Sanders & Horn, 1998; Sanders, Wright, Rivers, & Leandro, 2009).
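To illustrate why non-random assignment matters, the toy simulation below (an illustration only, not drawn from Tennessee data) gives two teachers identical true effects but sorts students to them on an unobserved trait that also drives growth. Controlling for prior achievement alone does not remove the resulting bias, which is why prior scores may not proxy for every relevant student characteristic.

```python
# Illustrative simulation: two equally effective "teachers," but students are
# sorted to them on an unobserved trait ("motivation") that also drives growth.
# Conditioning on the prior score alone does not remove the bias.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 2000
motivation = rng.normal(size=n)                    # unobserved by the model
prior = rng.normal(size=n)                         # prior-year score
teacher_a = (motivation > 0).astype(int)           # non-random sorting on motivation
current = prior + 0.3 * motivation + rng.normal(scale=0.5, size=n)  # neither teacher adds anything

df = pd.DataFrame({"current": current, "prior": prior, "teacher_a": teacher_a})
fit = smf.ols("current ~ prior + teacher_a", data=df).fit()

# The teacher_a coefficient is noticeably greater than zero even though the two
# teachers are, by construction, equally effective.
print(round(fit.params["teacher_a"], 2))
```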


One area that has yet to be explored, however, is potential bias at the grade and/or subject level, which raises the question: are students more likely to demonstrate growth in some grade levels or subject areas than in others? If so, then teachers' value-added scores might be biased beyond the student level as well.


A CASE IN POINT


This analysis was initially prompted by an assistant principal in Tennessee who recently wrote about what he thought were skewed trends in his state's 2013 TVAAS scores (Amrein-Beardsley, 2014). A quick look at the publicly available data seemed to corroborate his "hunches" and compelled a more systematic review. I pulled data from the 10 largest school districts in Tennessee for all grade levels and subjects for which TVAAS scores were publicly available (i.e., third- through eighth-grade English/language arts [ELA] and mathematics). For each grade level and subject area, I calculated the percentage of schools per district whose students made more than expected growth (i.e., positive grade-level value-added) in 2013 and on their three-year composite scores.
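A minimal sketch of this tabulation appears below, assuming the public TVAAS school-level estimates have been exported to a flat file; the file name and column names (value_added, district, subject, grade) are hypothetical stand-ins for the public data.

```python
# Sketch of the tabulation described above (file and column names hypothetical):
# for each district, subject, and grade, compute the share of schools whose
# grade-level value-added estimate was positive (i.e., more than expected growth).
import pandas as pd

tvaas = pd.read_csv("tvaas_school_scores_2013.csv")  # hypothetical export of the public data

pct_positive = (
    tvaas.assign(positive=tvaas["value_added"] > 0)
    .groupby(["district", "subject", "grade"])["positive"]
    .mean()
    .mul(100)
    .round()
    .unstack("grade")   # districts as rows, grades as columns, as in Tables 1-4
)
print(pct_positive)
```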


FINDINGS


There were, indeed, some perplexing trends, suggesting that this Tennessee administrator's "hunches" were more than conjecture. Two benchmarks aid interpretation of the tables below: cells in which 75% or more of a district's schools had students who made more than expected growth (i.e., positive grade-level value-added), marked green in the original color-coded tables, and cells in which 25% or less of a district's schools did, marked red.
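A small helper applying those two benchmarks to the district-by-grade percentages sketched above might look as follows (again, purely illustrative; the example cells reuse values from Table 1).

```python
# Mirrors the benchmarks described above: 75% or more of a district's schools
# positive is flagged "green"; 25% or less is flagged "red".
import pandas as pd

def flag(pct: float) -> str:
    if pct >= 75:
        return "green"
    if pct <= 25:
        return "red"
    return ""

# Small stand-in for the district-by-grade table (values taken from Table 1).
example = pd.DataFrame(
    {"Sixth": [19.0, 14.0], "Eighth": [76.0, 73.0]},
    index=["Memphis", "Knox"],
)
print(example.applymap(flag))
```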


In 2013, schools were, across the board, much more likely to receive positive ELA value-added scores in fourth and eighth grades than in other grades (see Table 1). At the same time, districts struggled to yield positive value-added scores for their sixth and seventh grades in the same subject. Fifth-grade scores fell consistently in the middle range, while third-grade scores varied across districts.


Table 1. Percent of Schools That Had Positive Value-Added Scores in English/Language Arts, by Grade and District (2013)

District             Third    Fourth   Fifth    Sixth    Seventh   Eighth
Memphis              41%      43%      45%      19%      14%       76%
Nashville-Davidson   NA       43%      28%      16%      15%       74%
Knox                 72%      79%      47%      14%      7%        73%
Hamilton             38%      64%      48%      33%      29%       81%
Shelby               97%      76%      61%      6%       50%       69%
Sumner               77%      85%      42%      17%      33%       83%
Montgomery           NA       71%      62%      0%       0%        71%
Rutherford           83%      92%      63%      15%      23%       85%
Williamson           NA       88%      58%      11%      33%       100%
Murfreesboro         NA       90%      50%      30%      NA        NA


The three-year composite scores were similar (see Table 2), except that even more schools received positive value-added scores in the fifth and eighth grades. In fact, in each of the nine districts with a composite score for eighth grade, at least 86% of schools received positive value-added scores at the eighth-grade level. This, contrasted with the sixth- and seventh-grade composite scores, suggests that there might be some grade-level bias at play, in this case against sixth- and seventh-grade ELA teachers.


Table 2. Percent of Schools That Had Positive Value-Added Scores in English/Language Arts, by Grade and District (Three-Year Composite)

District             Third    Fourth   Fifth    Sixth    Seventh   Eighth
Memphis              NA       54%      54%      46%      17%       98%
Nashville-Davidson   NA       70%      48%      50%      36%       100%
Knox                 NA       77%      77%      43%      14%       93%
Hamilton             NA       75%      55%      29%      33%       95%
Shelby               NA       72%      82%      25%      38%       88%
Sumner               NA       69%      58%      25%      25%       92%
Montgomery           NA       85%      95%      14%      0%        86%
Rutherford           NA       96%      29%      54%      31%       92%
Williamson           NA       96%      91%      33%      78%       100%
Murfreesboro         NA       90%      90%      90%      NA        NA


Though the mathematics scores were not as obviously skewed as the ELA scores, there were trends worth noting there as well (see Table 3). In particular, the fourth- and seventh-grade scores were consistently higher than those of the third, fifth, sixth, and eighth grades, which showed much greater variation across districts. The three-year composite scores were similar; in fact, a majority of schools across the state received positive value-added scores in mathematics at every grade level (see Table 4).


Table 3. Percent of Schools That Had Positive Value-Added Scores in Mathematics, by Grade and District (2013)

District             Third    Fourth   Fifth    Sixth    Seventh   Eighth
Memphis              42%      91%      70%      37%      84%       55%
Nashville-Davidson   NA       60%      65%      70%      90%       44%
Knox                 72%      87%      57%      64%      93%       67%
Hamilton             31%      86%      84%      38%      76%       78%
Shelby               97%      97%      61%      75%      100%      69%
Sumner               81%      77%      50%      67%      75%       58%
Montgomery           NA       86%      71%      14%      86%       57%
Rutherford           79%      100%     67%      62%      77%       85%
Williamson           NA       67%      21%      100%     100%      56%
Murfreesboro         NA       100%     60%      90%      NA        NA



Table 4. Percent of Schools That Had Positive Value-Added Scores in Mathematics, by Grade and District (Three-Year Composite)

District             Third    Fourth   Fifth    Sixth    Seventh   Eighth
Memphis              NA       90%      72%      62%      96%       78%
Nashville-Davidson   NA       73%      67%      89%      97%       74%
Knox                 NA       96%      79%      90%      93%       93%
Hamilton             NA       89%      80%      52%      90%       71%
Shelby               NA       97%      79%      75%      100%      69%
Sumner               NA       92%      65%      90%      92%       100%
Montgomery           NA       95%      85%      71%      86%       71%
Rutherford           NA       100%     79%      69%      92%       85%
Williamson           NA       91%      22%      100%     100%      89%
Murfreesboro         NA       100%     90%      100%     NA        NA


CONCLUSION


Of most importance here is how these results are being interpreted and used, particularly the validity of the inferences being drawn from these data. By Tennessee's standard, given the state's heavy reliance on the TVAAS to evaluate teachers, the conclusion might be that mathematics teachers were, overall, more effective than ELA teachers in almost every tested grade level (with the exception of eighth-grade ELA), regardless of school district. Perhaps the fourth- and eighth-grade ELA teachers across the state were indeed more effective than the sixth- and seventh-grade ELA teachers; thus, they earned and deserved the higher value-added scores and the accompanying accolades. Perhaps not.


Reckase (2004) reminds us that bias is not always what it seems. Sometimes, for example, apparent "bias" is due to a misalignment between the content taught and the content tested. Interestingly, however, if Tennessee deserves praise in one area, it may well be for its strategic plan regarding standards, curriculum, and testing alignment, especially given the state's recent transition from its own standards to the Common Core State Standards. To accommodate the transition, the Tennessee Department of Education, with the support of a Technical Advisory Committee, revisits the standards and the accompanying tests each year to ensure effective alignment (for the detailed transition plans, visit TNCore.org and TN.gov). I also spoke with the aforementioned assistant principal, who agreed that the state has gone to extensive lengths to ensure that what is taught is what is tested, indicating that the problem most likely does not lie with alignment.


Perhaps a more reasonable explanation is that some form of bias was, and is, present, possibly related to the vertical scaling of Tennessee's tests, other measurement error, or some other culprit that cannot be determined at this time. The extreme grade-level differences in ELA, and the relative lack thereof in mathematics, suggest two potential forms of bias: grade-level bias (i.e., against sixth- and seventh-grade ELA teachers) and subject-level bias (i.e., in favor of mathematics teachers in general). Both indicate that teacher effectiveness, contrary to what is often simplistically assumed, is most likely not the sole driver of what are positioned as "true" differences in effectiveness between and among teachers.


Regardless, in Tennessee and in other states and districts across the country, this growth is being attributed to the teachers of students in these grades and subject areas, despite the bias likely inherent in the estimates themselves: not only bias related to the types of students taught (itself a matter of great controversy), but also bias related to the grade levels and subject areas taught. The latter certainly warrants more research, particularly in terms of how we define and think about VAM-based bias and teacher classifications.


References


Amrein-Beardsley, A. (2008). Methodological concerns about the Education Value-Added Assessment System (EVAAS). Educational Researcher, 37(2), 65–75. doi: 10.3102/0013189X08316420


Amrein-Beardsley, A. (2014, February 9). An assistant principal from Tennessee on the EVAAS system. [Blog post]. Retrieved from http://vamboozled.com/an-assistant-principal-from-tennessee-on-the-evaas-system/


Ballou, D. (2012). Review of "The long-term impacts of teachers: Teacher value-added and student outcomes in adulthood" [Review of the report The long-term impacts of teachers: Teacher value-added and student outcomes in adulthood, by R. Chetty, J. N. Friedman, & J. E. Rockoff]. Boulder, CO: National Education Policy Center. Retrieved from http://nepc.colorado.edu/thinktank/review-long-term-impacts


Ballou, D., & Springer, M. G. (2015). Using student test scores to measure teacher performance: Some problems in the design and implementation of evaluation systems. Educational Researcher, 44(2), 77–86.


Collins, C., & Amrein-Beardsley, A. (2014). Putting growth and value-added models on the map: A national overview. Teachers College Record, 116(1), 1–34. Retrieved from http://www.tcrecord.org/Content.asp?ContentId=17291


Ehlert, M., Koedel, C., Parsons, E., & Podgursky, M. (2012). Selecting growth measures for school and teacher evaluations. Washington, DC: National Center for Analysis of Longitudinal Data in Education Research. Retrieved from www.caldercenter.org/publications/upload/WP-80.pdf


Ehlert, M., Koedel, C., Parsons, E., & Podgursky, M. (2013). The sensitivity of value-added estimates to specification adjustments: Evidence from school- and teacher-level models in Missouri. Statistics and Public Policy, 1(1), 19–27. doi: 10.1080/2330443X.2013.856152


Gabriel, R., & Lester, J. N. (2013). Sentinels guarding the grail: Value-added measurement and the quest for education reform. Education Policy Analysis Archives, 21(9), 1–30. Retrieved from http://epaa.asu.edu/ojs/article/view/1165


Glazerman, S. M., & Potamites, L. (2011). False performance gains: A critique of successive cohort indicators. [Working paper]. Washington DC: Mathematica Policy Research. Retrieved from http://www.mathematica-mpr.com/~/media/publications/PDFs/Education/False_Perf.pdf


Goldhaber, D., Gabele, B., & Walch, J. (2012). Does the model matter? Exploring the relationship between different student achievement-based teacher assessments. [Panel paper]. Seattle, WA: Center for Education Data and Research. Retrieved from https://appam.confex.com/appam/2012/webprogram/Paper2264.html


Graue, M. E., Delaney, K. K., & Karch, A. S. (2013). Ecologies of education quality. Education Policy Analysis Archives, 21(8), 1–36. Retrieved from http://epaa.asu.edu/ojs/article/view/1163


Haertel, E., Rothstein, J., Amrein-Beardsley, A., & Darling-Hammond, L. (2011). Getting teacher evaluation right: A challenge for policy makers. [Capitol Hill briefing]. Retrieved from http://www.aera.net/AboutAERA/KeyPrograms/EducationResearchandResearchPolicy/AERANAEDHoldSuccessfulBriefingonTeacherEval/VideoRecordingofResearchBriefing/tabid/12327/Default.aspx


Kupermintz, H. (2003). Teacher effects and teacher effectiveness: A validity investigation of the Tennessee Value-Added Assessment System. Educational Evaluation and Policy Analysis, 25, 287–298. doi:10.3102/01623737025003287


Newton, X., Darling-Hammond, L., Haertel, E., & Thomas, E. (2010). Value-added modeling of teacher effectiveness: An exploration of stability across models and contexts. Education Policy Analysis Archives, 18(23), 1–27. Retrieved from http://epaa.asu.edu/ojs/article/view/810


Papay, J. P. (2010). Different tests, different answers: The stability of teacher value-added estimates across outcome measures. American Educational Research Journal, 48(1), 163–193. doi: 10.3102/0002831210362589


Paufler, N. A., & Amrein-Beardsley, A. (2014). The random assignment of students into elementary classrooms: Implications for value-added analyses and interpretations. American Educational Research Journal. doi: 10.3102/0002831213508299


Race to the Top Act of 2011, S. 844, 112th Congress. (2011). Retrieved from https://www.govtrack.us/congress/bills/112/s844

Reckase, M. D. (2004). The real world is more complicated than we would like. Journal of Educational and Behavioral Statistics, 29(1), 117–120.

Rothstein, J. (2009). Student sorting and bias in value-added estimation: Selection on observables and unobservables. Education Finance and Policy, 4(4), 537–571. Retrieved from http://dx.doi.org/10.1162/edfp.2009.4.4.537

Rothstein, J. (2010). Teacher quality in educational production: Tracking, decay, and student achievement. Quarterly Journal of Economics, 125(1), 175–214. doi:10.1162/qjec.2010.125.1.175

Rothstein, J., & Mathis, W. J. (2013). Review of two culminating reports from the MET Project. Boulder, CO: National Education Policy Center. Retrieved from http://nepc.colorado.edu/thinktank/review-MET-final-2013

Rothstein, J. (2014). Revisiting the impacts of teachers [Working paper]. Berkeley, CA: University of California, Berkeley.

Sanders, W. L., & Horn, S. (1998). Research findings from the Tennessee Value-Added Assessment System (TVAAS) database: Implications for educational evaluation and research. Journal of Personnel Evaluation in Education, 12(3), 247–256.

Sanders, W. L., Wright, S. P., Rivers, J. C., & Leandro, J. G. (2009). A response to criticisms of SAS EVAAS. Cary, NC: SAS Institute Inc. Retrieved from http://www.sas.com/resources/asset/Response_to_Criticisms_of_SAS_EVAAS_11-13-09.pdf




About the Author

Jessica Holloway-Libell, Kansas State University

JESSICA HOLLOWAY-LIBELL is an assistant professor of Educational Leadership at Kansas State University. Her current research looks at market influences on teacher evaluation policies, practices, and instruments. Her recent publications include Holloway-Libell, J., & Amrein-Beardsley, A. (in press). "Truths" devoid of empirical proof: Underlying assumptions surrounding value-added models (VAMs) in teacher evaluation [Commentary]. Teachers College Record; and Holloway-Libell, J., & Collins, C. (2014). VAM-based teacher evaluation policies: Ideological foundations, policy mechanisms, and implications. InterActions: UCLA Journal of Education and Information Studies.
 