Evaluating Teacher Education Programs in the Accountability Era
by Joshua Barnett & Audrey Amrein-Beardsley - January 13, 2011
Included in this commentary is a discussion of three key challenges that need to be addressed when conducting large-scale, internal evaluations of teacher education programs, a progressively interesting undertaking as those on the inside are increasingly engaging in this empirical work and those on the outside are increasingly holding them accountable for doing so.
Those outside of teacher education are increasingly viewing traditional teacher education programs as ripe for revision. Spirited calls for higher standards, less pedagogy, amplified course content, and stronger accountability are resounding from the national, state, and local levels. Yet as traditional teacher educators, generally those within university colleges of education, continue responding, a need is emerging for more discussion of the methodological considerations required for conducting teacher education evaluations.
In this commentary, we describe a statewide initiative organized to systematically evaluate our state's five public, traditional teacher education programs to assess whether our graduates are impacting student learning in positive and sustained ways. In addition, we present the three most significant challenges we have faced thus far, offer critical reflections about what we believe are necessary steps towards conducting such evaluations internally, and provide recommendations to others on how to use evaluation data to improve their programs.
This commentary follows prior research on teacher education quality (Berliner, 1976) and similar calls for expediency (Grossman, 2008). The need for this commentary is further bolstered by the ongoing speeches made by the U.S. Secretary of Education, Arne Duncan, who recently described teacher education as brimming with "Mickey Mouse" courses (Duncan, 2009), and by the Mathematica Policy Research Report (Constantine, Player, Silva, Hallgren, Grider, Deke, & Warner, 2009), which noted negligible differences between teachers trained through traditional and alternative routes.
In October 2009, approximately 30 individuals convened for their fall steering committee meeting. At the table sat the deans and representative faculty from the five colleges of education, the two principal researchers, representatives of the state department and the governor's office, and individuals from education associations. The principal researchers presented the results in both aggregated and disaggregated figures. What followed was a critical conversation regarding the nature of data collection, the analytical procedures employed, and the dissemination of results. The conversation centered on three issues: What is the purpose of this evaluation? Who should manage the evaluation data? Should comparisons among the five teacher education programs be made? These questions were especially pertinent because all involved were aware of the consequences facing their programs, including that public information could affect enrollment numbers and potentially affect legislation about which entities could continue to certify teachers.
Challenge 1: Purpose
Those charged with conducting internal evaluations must first determine what the purpose for doing so should be. On one hand, the primary purpose could be to rebut the growing arguments from those outside of traditional teacher education programs by showcasing that those inside, many with former PreK-12 teaching experience, are the most capable of researching their graduates' impact in schools. On the other hand, the purpose could be to reflect critically and identify areas for improvement, where researchers could use disaggregated data to draw inter-institutional comparisons. The dilemma in our project, and likely in others, was that we needed to accomplish multiple goals. We needed to inform policymakers that we were working to discover and address our own weaknesses, yet also promote positive elements within our programs, all the while not disrupting the increasingly unstable status of teacher education.
As the aforementioned results meeting unfolded, a serious discussion commenced regarding why anyone but the participating institutions should be made aware of the findings, which indicated that, overall, students were well pleased with their preparation and felt prepared for the field, although some differences emerged between colleges and programs. After considerable discussion, consensus was eventually reached reaffirming that the purpose of this project was to provide information to program leaders for formative purposes. This information could then be used to promulgate the successes and challenges the state's colleges were both celebrating and addressing.
Based on our experience, we recommend releasing aggregated information publicly, while using disaggregated data internally to facilitate candid conversations about improving teacher education, graduate quality, and student learning. We contend that similar internal evaluations should do more than air the proverbial dirty laundry; they should reinforce that those within these fields are best suited to celebrate their achievements and redress areas for change.
Challenge 2: Data Management
The second challenge we faced concerned who would be responsible for managing the data collected. This unforeseen dilemma came to a head as the five teacher education programs, formerly in competition with each other for students, resources, and rankings, had to work collectively. Some believed that all of the data should be collected through a central source and then disseminated to the institutions, while others stated that each program should collect its own data. Ultimately, the conversation developed around privacy and transparency. Central to the discussion was that those involved did not want their data mishandled or used against them. At the center of this dilemma was that communal data collection across colleges and universities required agreement to transform the education of teachers across the state.
In our project, members agreed to hire a project director, who was charged with collecting, analyzing, and disseminating the data to the group. We contend that this approach addressed the concerns of transparency across colleges, while maintaining a level of trust and support within the project. Overall, we assert that central collection allows for transparency across groups and allows key personnel to operationalize terms consistently (e.g., courses, transfer students, branch campuses). Further, central collection allows for inter- and intra-group comparisons from which all constituents can learn and improve.
Challenge 3: Data Dissemination
The primary argument for disaggregating the data was to permit a transparent examination and to facilitate comparisons, so members could see how their programs measured up relative to one another. The primary argument for aggregating was to avoid comparisons by providing a holistic summary. According to Boyd, Grossman, Lankford, Loeb, Michelli, and Wyckoff (2006), a cardinal rule when conducting such evaluations is that evaluators should compare performance across institutions to identify effective practices (see also Wineburg, 2006). However, comparisons do make programs susceptible to ranking systems and the pursuit of self-interests (Grossman, 2008), and with comparisons, data can be used to make highly consequential decisions not necessarily made previously (e.g., termination of a department or program).
Our recommendation, and the action taken by our group, was to produce two reports: one with aggregated data, written for a public audience and showcasing that we were collectively producing highly confident and competent teachers, and one with disaggregated data, written for an internal audience and comparing results by college and program area. Disaggregated comparisons helped college administrators make college, departmental, and programmatic decisions and better contextualize the data to rectify issues locally. Aggregated results allowed for a collective response to the growing concerns from those outside about the performance of teacher education programs and the teachers they are training.
As policymakers have increased the demands on teacher education programs to produce higher quality teachers, these types of evaluations are needed to identify what is and what is not working. We contend that, as educational researchers, we must constantly be reminded that discovering or constructing knowledge and meaning in the field of education is never easy, regardless of the prudence with which a researcher might approach such an evaluation. Evaluations of this scale are extremely complex (Berliner, 2002), difficult to conduct, and take inordinate amounts of time, attention, and care.
In the evaluation example discussed, each of the representatives had a substantial stake in the results. If the results indicated one program was training teachers better than another, this finding could result in a list of unforeseen outcomes. These are serious issues with serious consequences. However, if we fail to acknowledge and work under these conditions, others on the outside will become increasingly likely to continue viewing us as part of the problem. Only together can we re-emerge as not only relevant to the 21st century, but vital, impressive, and prestigious as we collectively pave the way for our country's future.
Berliner, D. (1976). Impediments to the study of teacher effectiveness. Journal of Teacher Education, 27, 5-13.
Berliner, D. (2002). Educational research: The hardest science of all. Educational Researcher, 31(8), 18-20.
Boyd, D., Grossman, P., Lankford, H., Loeb, S., Michelli, N., & Wyckoff, J. (2006). Complex by design: Investigating pathways into teaching in New York City Schools. Journal of Teacher Education, 57(2), 102-119.
Constantine, J., Player, D., Silva, T., Hallgren, K., Grider, M., Deke, J., & Warner, E. (2009). An evaluation of teachers trained through different routes to certification: Final report. Retrieved from http://www.mathematica-mpr.com/publications/pdfs/education/teacherstrained09.pdf
Duncan, A. (2009, October 22). Teacher preparation: Reforming the uncertain profession. Retrieved from http://www.ed.gov/news/speeches/2009/10/10222009.html
Grossman, P. (2008). Responding to our critics: From crisis to opportunity in research on teacher education. Journal of Teacher Education, 59(1), 10-23.
Wineburg, M. S. (2006). Evidence in teacher preparation: Establishing a framework for accountability. Journal of Teacher Education, 57(1), 51-64.