Teacher as Game-Show Host, Bookkeeper, or Judge? Challenges, Contradictions, and Consequences of Accountability
by Patricia Anders & Virginia Richardson - 1992
A culture of testing permeates schools. Many teachers cannot separate instruction/instructional improvement from outcome measures designed for accountability. The article discusses the effects of standardized testing and examines a study of teachers' beliefs and practices concerning reading comprehension instruction that indicates accountability greatly concerns and confuses teachers. (Source: ERIC)
On a bright, sunny afternoon in the Sonoran Desert, seven teachers of grades four, five, and six gathered in the library of their working-class, modern, and nicely designed neighborhood school. Both authors of this article are professors of education at a nearby university in Arizona, and we were meeting regularly with the teachers to explore barriers to using research-based practices in reading instruction. This particular afternoon was the day before grades were due, and a topic that had dominated many of our previous conversations (testing, grading, and assessment) was weighing heavily on the teachers' minds. The meeting started with a discussion orchestrated by us to compare notions of subjectivity and objectivity in grading. Objectivity was held in high regard and being objective was a goal the teachers shared. We hoped to encourage the teachers to elaborate on their understandings of the nature of that construct. The following dialogue represents part of the conversation from that afternoon's session:
PATTY:1 Okay, Kerri, let's say that my son is one of your students, and he receives a B in reading. I come to the parent-teacher conference, and ask you what it means for my son to receive a B in reading. I'm not complaining about his grade, I just want to know what a B means, relative to other kids his age who get an A or a C.
KERRI: Well, we'd look in the grade book at the 44 grades for the grading period, and I would show you how when averaged together they come out to 85 percent, and that equals a B.
PATTY: Yeah, but what would that mean about my son's reading?
KERRI: Well, we'd have to look at each of the assignments and see what he missed and why he got the scores he did.
PATTY: What kind of activities would have been graded that way?
KERRI: Oh, you know, they'd be the workbook pages and the end of the unit quizzes.
PATTY: But we've already agreed that workbook pages and unit quizzes don't really represent reading . . .
KERRI (in an agitated voice): Yeah, yeah, but let's face it - grading is really a game. The kids have to learn to play the game!
PHILIP (enthusiastically interjecting): Hey! That's right. My wife and I are both teachers, and at this time of the year, we set up the calculators, turn on the TV and average grades. Add 'em up and you're all done.
RANDALL: Oh, it's gotta be more than that . . . there has to be some reflection, some judgment, it just can't be the numbers . . . sometimes the grade I give a student is because they've made a lot of progress, or tried really hard . . . it can't be just the numbers . . .
PATTY: That sounds more like a judge, and I sure like thinking of teachers as more judge-like than like a game show host or a bookkeeper . . .
Of course, it is well known that beliefs about testing, assessment, grades, and accountability in general play a major role in shaping teachers' practice. We have now come to realize that a culture2 of testing permeates schools and classrooms to such an extent that teachers seemingly have great difficulty considering instruction and the improvement of instruction separately from outcome measures designed for purposes of accountability. After reviewing some of the literature on the effects of standardized testing, we will explore the findings of our study that suggest that the view of teachers toward standardized testing has been generalized to other forms of assessment.3
THE EFFECTS OF STANDARDIZED TESTING
A small but growing body of literature on the negative effects of testing has documented the influence of standardized testing on society, teachers, curriculum, and students. Potent effects of testing are attributed to undue credibility placed on concepts inherent in the construct of normative measurement for educational purposes.
Lorrie Shepard, an educational psychologist at the University of Colorado, stated the case strongly:
In the U.S. today standardized testing is running amok. Newspapers rank schools and districts by their test scores. Real estate agents use test scores to identify the best schools as selling points for expensive housing. Superintendents can be fired for low scores, and teachers can receive merit pay for high test scores. Superintendents exhort principals and principals admonish teachers to raise test scores - rather than increase learning.4
Thus, standardized testing is high stakes. How students do on the tests has a tremendous impact on all who are even remotely concerned with schools. In summarizing general principles regarding the impact of tests, George Madaus has noted:
The power of tests and examinations to affect individuals, institutions, curriculum, or instruction is a perceptual phenomenon: if students, teachers, or administrators believe that the results of an examination are important, it matters very little whether this is really true or false - the effect is produced by what individuals perceive to be the case.5
A teacher from a Native American school even told us that her principal discouraged her from presenting a paper at a conference because the district's test scores were among the lowest in the state.
In addition, standardized tests influence the curriculum teachers offer in their classrooms. In a study conducted for the Rand Corporation, Linda Darling-Hammond and Arthur Wise interviewed a random sample of forty-three teachers to understand teachers' responses to policies that shape the three conditions of their work. Standardized testing was one of the factors that emerged as a powerful force shaping life in the classroom. Content analysis of the interview responses revealed five categories of effects: altered curriculum emphasis, teaching students how to take tests, teaching students for the test (specific preparation for the test), having less time to teach, and feeling under pressure. The most common effect reported by teachers about their own behavior was that they altered their curriculum emphasis.6
Peter Johnston extended these observations by suggesting that methods used to evaluate students' reading achievement strongly influenced not only what was taught, but how it was taught. The current standardized measures, he suggested, ignore reading as a constructive process and reduce it to a product. This leaves out important aspects of the reading process, such as a reader's purpose, prior knowledge and experience, and social context. Practices teachers choose reflect their understanding of the content. Practices that model reading as a product and as a scope and sequence of skills and objectives that can be tested on a multiple-choice test violate the nature of the reading process and send a message to students that suggests reading to be something different from what those of us who read understand it to be.7 As Arthur Wirth has concluded: "Predictable multiple-choice questions replace conversation about books. Reading scores may have gone up, but literacy has collapsed."8
Reading is not the only aspect of curriculum affected. L. Salmon-Cox found that teachers reported taking time away from science and social studies to devote more time to teaching math skills that were to be tested on the standardized test. Mary Lee Smith found that teachers employed eight different approaches to preparing students for the test, six of which would take time away from the regular curriculum.9 This curriculum effect was also summarized by George Madaus as a principle of the impact of testing: "If important decisions are presumed to be related to test results, then teachers will teach to the test."10
Standardized testing also influences how teachers perceive and describe their students. In a study of norms guiding teachers' evaluation of students, Peter Johnston, Paula Weiss, and Peter Afflerbach found that where tests ruled the classroom, descriptions of development were framed by those tests. In this situation, descriptions of the students' development were relatively scant and impersonal, reflecting a lack of detailed knowledge of the students and a less personal and involving relationship.11 Further, Terrence Crooks observed that teachers pay relatively little attention to their own methods of evaluation; rather, he suggests that responsibility for the process of reflecting on student strengths and limitations is too often deferred by teachers to those who are more expert and "objective": the norm-referenced test makers.12
Students' experiences with tests send a variety of messages concerning the nature and meaning of learning and what is valuable to learn. Lorrie Shepard claimed, for example, that students learn that there is one right answer to every question, that the right answer resides in the head of the teacher or test maker, and that their job is to get that answer by guessing if necessary. Those perceptions and behaviors are hardly consistent with the goal of having children construct their own understanding.13 What is more, these attributes are not consistent with the promotion of critical thinking, creativity, or responsibility for one's own learning, all goals that most educators would agree are important.
Clearly, there is quite convincing evidence that standardized testing of the sort that is going on in the majority of states carries with it liabilities and consequences that seriously affect teachers, students, and curricula. These effects were at work in the schools we studied, because the state of Arizona mandates annual assessment of schoolchildren using a state-selected standardized test. However, we believe we uncovered additional influences of accountability that incorporate and go beyond the singular, albeit important, influence of the standardized test.
THE CONTEXT OF THE STUDY
This study was designed to describe and examine teachers' beliefs and practices concerning the teaching of reading comprehension in grades four, five, and six.14 Altogether, the study included thirty-nine teachers in six schools and two school districts in the Southwest. All thirty-nine teachers were interviewed about their beliefs concerning reading comprehension and were observed teaching reading in their classrooms. As one element of the study, twelve of these teachers (seven at Desert School and five at Gun School) were involved in a semester-long staff-development program that was videotaped. The principals of these schools were also interviewed, and the principal of Desert School dropped in on the staff-development meetings from time to time. This article focuses on the conversations in staff-development sessions.
The content of the staff-development program was teachers' practical knowledge about reading, in combination with research-based reading comprehension practices that may be used when teaching fourth-, fifth-, and sixth-grade students. We had conducted an extensive literature search to find practices that were reported to be empirically sound.15 In the staff-development program, we introduced research-based practices as teachers engaged in informal group discussions about their questions and needs. To help frame questions and to articulate needs, typed transcriptions of the teachers' interviews and our analysis of the premises revealed by the teachers in the interviews were presented and discussed at the first staff-development meeting. Through discussion of those interviews, a tentative agenda was established for the topics that would be the focus of future meetings. It is important to emphasize that the topics discussed by the group were those they selected within the general parameters of reading comprehension.
We videotaped each staff-development session and then analyzed the conversations by categorizing them into major topics and subtopics. A major topic, for example, was grading, with a subtopic being book reports (when the conversation shifted from a general discussion of grading to one in which a teacher described how she grades book reports). The time devoted to each topic and the following information were recorded: the stimulus for the change in topic, the participant(s) who initiated the topic, the nature of the conversation, the discourse mode, and the participation level. All topics related to assessment or grading were examined in depth for this article.
TIME SPENT ON ASSESSMENT ISSUES
We met with the teachers of Desert School eleven times, in two-hour sessions, and the teachers of Gun School eight times, in three-hour sessions. During fourteen of those meetings time was devoted to discussing accountability-related topics. At most meetings, the conversation covered a broad range of topics, including instruction and management-related issues in literature groups, different questioning practices, definitions of reading and comprehension, general management concerns, pros and cons of basal readers, and vocabulary instruction. For many of these topics, the conversation would turn quickly to assessment. At Desert School, one meeting focused entirely on assessment issues; at another meeting half the time was devoted to that topic, and during five other meetings at least 20 percent of the time was devoted to assessment. A similar pattern emerged at Gun School. All but one of the meetings had some time devoted to the discussion of assessment, and all but two of the remaining meetings focused on assessment 20 percent or more of the time. Thus, there was little doubt that assessment weighed heavily on these teachers' minds.
Examining the nature of the discussions also revealed interesting patterns. First, on days on which the discussion was devoted entirely to testing, the dialogue was characterized by those analyzing the data as adamant [intensive], with all teachers actively involved and engaged. Likewise, at Gun School, during the session in which testing was discussed the most, the dialogue was characterized as being a conversation among all the teachers. Control of the discussion seemed to be with the group rather than with the leaders. In contrast, when instructional practices were presented or when the nature of reading comprehension was described, the project evaluators described the discussion as being a lecture initiated by a staff developer or teacher, and the floor was controlled by that person while the explanation was being offered. The difference between the two modes of discussion may be similar to what Johnston refers to as dialogue, when discussing testing, and monologue, when discussing practices for teaching reading or theories of the reading process.16
Two themes emerged from analyzing the content of the conversations devoted to accountability: (1) grading as accountability and control, and (2) the importance of objectivity and the teachers' mistrust of their own judgment when evaluating students.
GRADING AS ACCOUNTABILITY AND CONTROL
With regard to grading as related to accountability and control, Margie from Desert School said:
We find ourselves in this system of balancing what we consider to be meaningful teaching, not only reading but anything. Balancing meaningful teaching with what is expected of us from a management standpoint. And each teacher and each site and administrator has to come to some kind of a compromise on this thing.
In a similar vein, Peggy from Gun School reported needing to show her grade book to her principal to prove that she knew what her students were doing, that she had enough separate grades for each item on the report card, and that certain material had been covered.
Teachers at Gun School argued for quite some time that the issue really was one of accountability rather than diagnosis or assessment. They felt they needed to prove to their principal and to the parents of their students that they knew where their students were and had systematically gathered grades to prove it. Teachers at Desert School further argued adamantly that a school board member had the right and could be expected to walk unannounced into a teacher's classroom to examine the plan book and grade book. Although none had experienced this, nor knew any teachers who had, Kerri said that a parent or school board member could and would bring a teacher up in front of the board for reprimands for grading practices.
Not only were the teachers pressured by their external constituents, but they were also influenced by their perception of the role grades played in the classroom. There was a generally accepted sense that without grading, students would not work or pay attention. When asked in a session what would happen if we did away with report cards, one teacher responded: "Well, we might as well just send the students home." Others nodded and laughed in agreement.
This climate of suspicion and defensiveness is consistent with the findings of the effects of standardized testing, and suggests that these attitudes have generalized beyond standardized tests to include day-to-day evaluation methods. Grading, like standardized tests, was viewed not as providing teachers, parents, and students with information that would help them in the instruction and learning process, but as a response to outside demands for accountability. Grading was also seen as functioning to control student behavior. The cost of these practices and beliefs is that evaluation has moved away from educational goals and toward responding to systemic pressures applied by those outside the classroom. Indeed, as Kerri said in the opening dialogue, a game is being played but it is not a game necessarily related to learning; rather, it is a public relations game, meant to appease the constituents.
The importance of objectivity and the teachers' lack of trust in their own judgment was also strongly evident in the tapes. The teachers at Desert School talked at length about how limited their own judgment was and about how no one could be expected to believe them, since indeed they mistrusted themselves. Kerri said that she would have given many of her students A's on the basis of how they talked:
I don't want to be a subjective grader, I want to be an objective grader. I can tell you of students that if they came into my room at the beginning of the year, the way they talk and the things that they say, I would give all A's. But actually when I look at the work they do, it's not A work.
An initial response to Kerri might be that she simply needed to learn about alternative means of watching and evaluating students; however, a review of the videotape from this session revealed that her colleagues nodded in agreement and apparently supported her contention. There were mumblings that the students could talk a good line, and that the only way to know what the youngsters could really do was to accumulate grades and average them. The power attributed to averaging is reminiscent of Richard Feynman's reaction to a panel of educators who were carelessly, he thought, evaluating textbooks for adoption. He told this story:
Nobody was permitted to see the Emperor of China, and the question was, What is the length of the Emperor's nose? To find out, you go all over the country asking people what they think the length of the Emperor of China's nose is, and you average it. And that would be accurate because you averaged so many people. But it's no way to find anything out.17
In the case of averaging for grading, the sense was that grades could be sort of mushed together, thereby achieving objectivity about which no one could complain.
The worth and nature of the report card and the objectivity it demanded were topics of major conflict. The report card called for grades in areas of student behavior such as listening and speaking. Teachers reported that it was impossible to determine objective grades for behavior that so clearly could be evaluated only subjectively. Thus, we saw a major contradiction among these teachers and within the community at large arising from the perceived demands for and desirability of objectivity and the reality that all types of educational evaluation were necessarily and by definition subjective.
This theme, which suggested that the teachers lacked confidence in their own judgment, was disturbing because it was indicative of the dehumanization that Arthur Wirth wrote of when describing the character of schools.18 Tests developed elsewhere were objective, and could therefore be trusted, but teachers' own tests were subjective, implying that when they used their own judgment the grades could not be trusted.
THREE TYPES OF GRADING
We saw the two themes found in our discussions further played out within the context of three types of grading and assessment systems used by these teachers. The different systems seemed to affect teachers differently in terms of the anxiety caused by their sense of the measures' validity, the degree of personal control they felt over the evaluation process, and how they used each measure. The three systems comprised assessment to place students in reading groups, assessment to determine whether students had mastered skills in the basal readers, and assessment to determine grades for the required districtwide report card.
Placement for Reading Groups
Among teachers who grouped students by ability, this measurement system was used to determine reading level for the number and makeup of reading groups. The teachers in this sample seemed to feel the most ownership of this process. It was also the only private evaluation act they talked about; that is, they were not accountable to anyone for this decision, and were not required to explain it to anyone beyond a parent who might ask.
Sometimes they stated that they gave placement tests provided by the basal reader publisher, and sometimes they looked at a student's preceding year's placement. Few teachers used the state-mandated standardized test for this decision. One teacher mentioned that she looked at the placement files at the end of September, after she had placed students in reading groups. This was consistent with other findings that experienced teachers prefer to make their own judgments about students before reading the students' files.19 The teachers who grouped by ability further reported that their placement decisions often contradicted the results of the district-adopted basal placement tests, and all stated that they would move a student from one group to another in theory, but that they did not do so very often in practice.
The fact that these teachers took responsibility for placing students in reading groups may seem to contradict both the accountability-to-constituents and objectivity themes. When this was discussed with teachers, they suggested that it was a low-risk procedure because students could always be changed from one group to another. Further, this process was not always seen as assessment, but rather as an instructional decision, and teachers were seldom, if ever, challenged to justify their placement decisions; thus, it was not a public act.
Basal Unit Tests
The results of these tests, presumably measuring students' mastery of skills taught in the basal reader units, were recorded in students' files. The assumption was that teachers who received these students in the next grade would use this information in placement for reading. Teachers appeared nervous about this reporting mechanism, and perceived that they were mandated to give the tests after teaching each unit. They thought that the scores on the unit tests were the least valid of all, and most said they never looked at their students' scores from preceding years. All teachers agreed that some students who did very well on a unit test really could not read very well, and others, who were superb readers, did poorly on the unit tests. One teacher said that she had recently figured out that she did not have to race through the content in the basal reader because "if we don't get through the book, I give them the test anyway. . . . I think they could have done all of it without anybody's help in the first place."
However, greater concern was expressed about getting around the unit test than about contradicting the placement tests. Perhaps this was because they placed the authority of the unit test outside the classroom, whereas placement of students in groups appeared to be a decision that conventionally and rightfully belonged to the teacher.
Their approach to the unit tests indicated a misunderstanding of the purpose and theory of competency-based testing. When we suggested that the test be given before the unit to see if the skills really needed to be taught, all looked very uncomfortable, and some suggested that such a practice would be cheating. Their understanding of the process was that the teachers were expected to cover the content of the basal reader whether or not the students could perform the skills, and give the unit test at the completion of the unit. This was a good example of imposed accountability expectations. It is important to emphasize that these teachers and many others with whom we have worked in the districts spend countless hours giving unit tests and recording the results on forms, but report never using the results of the tests from previous years when assessing their current students.
All teachers in these schools were required to send home a report card each quarter. It was a one-page listing of subjects including listening, speaking, writing, grammar, reading, math, science, social studies, the fine arts, and several aspects of behavior and citizenship. The subject areas, including listening and speaking, were to be filled in with the traditional letter grade, and the citizenship grades with marks representing excellent, satisfactory, needs improvement, or unsatisfactory.
The teachers exhibited anxiety about the report card, perhaps because this was the system they were personally responsible for filling out and the means by which they most consistently communicated progress to parents. As discussed before, the pressure to be objective was often expressed within the context of discussions of the report card. This pressure was the rationale given for grading every piece of work students turned in. Grades were even given for drafts of writing.
This concern also seemed to affect the use of more current practices in teaching reading comprehension. Many of the practices involve encouraging students to take risks, make errors, and work in collaboration with their peers. Teachers were concerned about how these practices could be graded fairly. Process-oriented and descriptive means for keeping track of student progress were suggested, but were clearly judged to be inadequate by the teachers because such methods were subjective rather than objective and because those sorts of evaluation were not understood by the constituents.
Thus, the teachers' proclaimed interest in moving toward process-oriented practices was constrained by their need for objective measures. This tension was played out in practice by the way in which they actually determined students' grades. Their practices can be portrayed on a continuum, as seen in Table 1.
The most inflexible response, "Hands-off objective," was subscribed to by three of the teachers at both schools. The grading was viewed as objective and fair, with little reliance on the personal whims of a particular teacher. It was a system that was distanced from the teacher; therefore the teacher could not be held personally accountable for grades students earned, and questions from parents or administrators could be answered objectively. One of the three teachers admitted to being uncomfortable with the system because often it was not in agreement with her own judgments, and it just seemed like more useless paperwork. However, the other two teachers were satisfied with the system, and both indicated that it helped them control their students.
The second category, "Squiggle the grades to fit judgments," was the practice used by the majority of the teachers. Some were more willing than others to talk about how and why they squiggled the grades. One, for example, started giving his poor readers higher grades than they deserved because he wanted them to feel better about themselves as readers. Another claimed to be very objective, but admitted that after going to extensive lengths to carefully calculate each student's grade in each of the subject areas, he would then go back through and change the grades if he thought doing so would motivate or reward students. Another teacher agreed, saying she would change grades for self-concept or motivational reasons.
Two teachers fit into the third category. They adapted the grading system to deal with particularly problematic students. For example, one developed a special grade that she would explain on the report card to deal with students who were trying but not succeeding. She simply did not wish to give a D or F to such students. She would, however, use a standard approach to grading in her classroom. Another teacher who taught students identified as learning disabled felt comfortable developing a special set of individualized standards for her students.
Among the twelve teachers, only one developed her own system outside of the supposedly required report card. She communicated extensively with parents at the beginning of the year, assuring them that they could come in to talk with her at any time. She collected samples of students' work, and made global judgments about them on the basis of progress, completion of activities and assignments, and other unspecified criteria. Should a parent complain, which had happened very seldom, she used the student's file containing the work and showed parents the actual work, not the grades; sometimes she compared the file with the work of another (anonymous) student. Her colleagues were surprised, perhaps even incredulous, that she would go against the required report card and were concerned that she was going to get into trouble with the administration and parents. Their response seems to be yet another example of the influence of perceived outside-of-the-classroom expectations that teachers allow to shape their practice.
While the teachers responded to the influence of assessment and accountability in varying ways, their responses did not vary between the two schools. Anxiety was high and the two main themes identified in this study (that the functions of assessment were accountability and control and that teachers feel a strong allegiance to objectivity) ran through the teachers' comments in both schools. We therefore concluded that the schools, as well as the views of the two principals toward testing, were similar. As will be shown below, however, they were not, suggesting that the culture of testing extends well beyond school walls.
While both schools were similar in overall demographics, the mix of students in each school was quite different. Desert School was located in a suburban neighborhood with an enrollment of 50 percent Hispanic students, 47 percent white, and 1 percent African-American, Asian, and Native American. The principal described the families of the students as upwardly mobile working class to middle class. Students were primarily from the immediate neighborhood with a few bused in for special education. The Gun School was located in midtown in an area that is changing from a residential to a small business-oriented neighborhood. The student population was a complex mixture of neighborhood students, minority students bused in from another area, and students bused from a more affluent neighborhood.
The principals differed considerably in their views of testing. The principal of Gun School paid a lot of attention to scores, and worried about being embarrassed when they were published each year. He seemed quite product-oriented, and described himself as "a real competitor. . . . If there's a game to be played, I'd like to be in the top 9 or 10 percent."
The Desert School principal was, however, much less test-oriented. In a staff-development session, she stated that not all activities had to be graded, and that grading could disempower the students. She consciously worked to regard teachers as professionals and to support their practices. Her personal educational philosophy was oriented toward whole language and integrated curriculum.
While the principals of these two schools differed considerably in their views and attitudes about grading and assessment, the views of the teachers did not. In fact, at Desert School, the principal's protestation concerning the lack of need to grade everything seemed to be completely ignored in subsequent discussions among the teachers. The one teacher who rejected the district-made report card and made her own system of reporting to parents taught at Gun School. At least in these two schools, the principals' influence was minimal.
Although our study involved only twelve teachers in one large school district in Arizona, the results were similar to those of other larger studies, including some based on samples of urban teachers.20 It was not insignificant, therefore, that we found that teachers are leery of measures of accountability. This wariness affects both curriculum and pedagogy, and results in a frightening image. The assumptions and values of the psychometric industry permeate the school and the community in which the school resides, and are overgeneralized and misunderstood. Expectations from society at large are that tests are objective and scientific. The teachers' responses ranged from being objective and scientific to being very inventive and developing creative, noncompliant reactions to the expectations. These views led to the set of metaphors included in the title: teacher as game-show host, bookkeeper, or judge.
We believe that tests and grades play a negative role in schooling. This belief was supported by our study, which indicated that testing is a matter of extreme concern and confusion for teachers. Conflict and challenge about testing, both within individual teachers' value systems and among teachers within a school and community, are bound to create tensions and disagreements. In and of themselves, tensions and disagreements are not harmful, but this study suggests inherent and pervasive contradictions between what a teacher is required to do and his or her own values. What is more, grading makes it difficult for teachers to care for their students. This may be why teachers are so uncomfortable with the grading process. Low grades and concepts such as standardized tests are antithetical to sound pedagogical principles that suggest that students should be provided with success experiences.21 Those of us who are concerned with the caring ethic and with pedagogical principles that rely on constructivist theories therefore need to find ways to resist the culture of accountability that is developing in this country. The influence of this culture must be addressed for teacher autonomy to become a reality.
This is especially important because principals may be relatively powerless in affecting teachers' views of grading and testing. Recall the principal of Desert School, who clearly told her faculty about her position on grading and testing to no apparent avail. Indeed, comments by the teachers suggested that they perceived her as an exception and possibly even a wolf in sheep's clothing. They were afraid that she really did care about the standardized test scores, and they admitted to doing what they could to ensure that their students would do as well as possible.
In light of this, staff-development meetings in which a considerable amount of time is devoted to aspects of the accountability culture are important. Albeit in a small way, such a staff-development process can provide what Deborah Meier has called for to advance reform. It can put teachers in charge by structural change that supports school-based initiative, inquiry, and decision making: changes that open time and space so that teachers have time to converse and plan together.22 Further, it is the responsibility of those who helped create the problem in the first place (researchers and teacher educators) to persist in educating policymakers and the public about psychometric principles and, particularly, their limitations.