The Conscience of Educational Evaluation
by Ernest R. House - 1972
The author discusses teachers' and school board opinions of being evaluated and their initial hostile reaction to the process. At the termination of the evaluation, most teachers and boards come to realize its value. (Source: ERIC)
Those persons who talk most about human freedom are those who are actually most blindly subject to social determination, inasmuch as they do not in most cases suspect the profound degree to which their conduct is determined by their interests. In contrast with this, it should be noted that it is precisely those who insist on the unconscious influence of the social determinants in conduct, who strive to overcome these determinants as much as possible. They uncover unconscious motivations in order to make those forces which formerly ruled them more and more into objects of conscious rational decision.
Karl Mannheim, Ideology and Utopia1
In recent years many good papers have been written on educational evaluation. In one of the best, Robert E. Stake outlined the many kinds of data useful for evaluation and suggested ways of processing those data.2 The liberating impact of his paper is demonstrated by the reaction of a fellow evaluator who, in a frenzy of orgiastic excess, exclaimed, "Stake has opened the data matrices!" That was a real high point in the evaluation world.
The title of Stake's paper was "The Countenance of Educational Evaluation." The title of this paper, "The Conscience of Educational Evaluation," reflects a somewhat different perspective. I wish to look at some unpleasant facts, to examine evaluation from a different focus.
The first observation I want to make is that there is no real demand among teachers and administrators for evaluating their own programs. To evaluate kids, yes, we cannot live without that; but to evaluate ourselves and our own programs, no. At times, in that strange ideology with which we disguise our motives and cover our tracks, we educators convince ourselves that we would be overjoyed to receive data about our teaching and educational programs. Well, try it sometime. Try evaluating a program. On simply asking teachers their goals, we have had them break into tears and throw chalk across the room. Rare events, but not unrepresentative of teachers' attitudes towards evaluation.
After all, what does a teacher have to gain from having his work examined? As he sees it, absolutely nothing. He is exposing himself to administrators and parents. He risks damage to his ego by finding out he is not doing his job as well as he thinks he is. Perhaps worst of all he risks discovering that his students do not really care for him, something a teacher would rather not know. The culture of the school offers no rewards for examining one's behavior, only penalties. Since there are no punishments for not exposing one's behavior and many dangers in so doing, the prudent teacher gives lip-service to the idea and drags both feet.
And this is not so strange, is it? Do we have any serious evaluations of lawyers, or doctors, or cabdrivers? That there is no such demand is a corollary of a broader principle: No one wants to be evaluated by anybody at any time. Evaluate an evaluator's work and see how he reacts.
As recently as a year ago I believed in what might be called the "good instrument" assumption. If one could develop an instrument that was easy to administer, non-threatening to the teacher, that spoke in terms meaningful in the teacher's world (unlike most psychometric instruments which measure things like "convoluted neurasthenia"), and was cheap enough for even an administrator to purchase, the knot of evaluation would be partly unraveled. In fact, last year the Center for Instructional Research and Curriculum Evaluation (CIRCE) developed two such instruments. One could be administered in twenty minutes, measured things that teachers at least talk about, and cost about six dollars per class. The other was so non-threatening that the teacher filled it out herself and sent it home with her students; it was simply a description of what was going on in class. The response to these instruments was considerably less than overwhelming. Teachers reacted as I've indicated: What have I to gain? Why do this extra work? Why set myself up for someone to shoot at? Why supply material for somebody's book?
Nonetheless, the instruments have been used somewhat. Although there is no demand for evaluation, there is a demand for personal attention and for someone to talk to. Teaching is a lonely profession, and if the instruments (or any innovations) are propagated by personal contact, some people will use them, at least as long as the personal contact lasts.
Budgets indicate the low priority administrators assign to evaluation. School districts spend no money for evaluation. State and federal programs typically allocate very little, usually a small percentage of the total budget. When budgets are cut (as they always are), evaluation funds are the first to go. Even when as much as 10 percent of the budget has been set aside for evaluation, it is difficult to get districts to spend the money for the purpose designated. The anti-evaluation attitude is deeply embedded in the school structure.
At the higher governmental levels the situation is only slightly different. Once when I was harping about the United States Office of Education evaluating some of its programs, a high OE official told me, "Look. In order to sell these programs to the Congress we have to promise them everything good that can possibly occur for the next ten years. What's an evaluation going to do? No matter how good the program, it's going to show that we are not delivering on all we promised." Generally speaking, in this country the claims for education have been so extravagant that there is no hope of living up to the promise. But without evaluation who's going to know the difference?
Why then have I been directing a large evaluation project over the last few years? There are, I think, a few important exceptions to the general rule. Evaluation becomes desirable when you think you are doing well but feel unappreciated; when you are in serious trouble; or when someone with authority over you insists that you be evaluated. For example, the large-scale evaluation of the Illinois Gifted Program got underway only when it was clear that the program was in deep trouble with the state legislature and even then only when higher authorities strongly pressured us to have it evaluated. State officials will confess that this evaluation was a novel and anxious event.
Under such pressures the personnel initially submitted with the enthusiasm of a chloroformed moth being pinned to a mounting board. They later warmed up when it appeared the evaluation might have some value for them. As it turned out, the evaluation has played an instrumental role in saving the program, but at the beginning no one knew that. By no means did the Illinois people volunteer themselves for examination.
As everyone knows, the troubles that beset most programs are economic; they need money. The contest for funds at the national level and the rising tax rates at the state level insure that educational enterprises are going to come under increased scrutiny. Accountability is one means of lassoing the wild stallion of educational spending.
A related set of forces are the interest groups who want one thing or another within the school. For instance, one of the Illinois independent study programs was under attack from conservative elements on the local board of education. Forces in favor of the program launched a counterattack. Right now there is an uneasy stalemate. The board has removed the school principal, but the independent study program is still operating. Since neither side has decisive political strength, words of evaluation are in the air. Conceivably either side may use data to bolster its case. A real resolution will come only when one side gains the upper hand politically.
Often evaluation activities are spurred by groups who feel they have been short-changed by the schools. Today this could be almost anybody. We have, for example, the NAACP audit of Title I, ESEA funds which revealed interesting irregularities in their disbursal. My point is that evaluations are not inspired by the heartfelt need of professionals to try to do a better job (frequent though that rhetoric is); rather, the impetus is usually traceable to a pressure group with a specific aim.
Evaluations For Defense And For Attack
From such dynamics result evaluations for defense and for attack. Most Office of Education evaluations are a good example of the former. Under pressure the OE officials tell Congressional aides, "Yes, we will evaluate this program if we get the money." So into the OE guidelines goes a requirement for evaluation. It is a symbolic gesture and is interpreted as such by the fund recipients. They respond with a token evaluation which says their project is doing fine. Everyone is happy. Paper begets paper.
These evaluations are faulted for their poor scientific methodology. But that is not the primary problem. One might get valuable information by asking one's cousin or by other "unscientific" means. The problem is in the intent: the evaluations were never intended to produce relevant information. Rather, they were meant to protect the Office of Education from hostile forces in the Congress and the Bureau of the Budget. Occasionally the Office is surprised, as it was with last year's Title I evaluation.
The evaluations for attack are generally more carefully done since they are trying to change things, rather than defending an entrenched position. A recent example is the Carnegie report by Charles Silberman, Crisis in the Classroom, which obviously laid the groundwork for a liberal assault upon the educational establishment.3 It attacks the "mindlessness" and restrictiveness of the public schools and takes as its positive model the "open classroom" as exemplified by the English infant schools.
Occasionally we also get evaluations that are simultaneously defensive and attacking. The following is an excerpt from the University of Illinois student paper:
POLI SCI INVESTIGATIONS START
The University political science department will be undergoing an evaluative investigation this week as initiated by the College of Liberal Arts. . . . The investigation is the first of its kind, although the political science department evaluation will not be the only one conducted this year in the college. . . . Sources in the political science department maintain, however, that the investigation was initiated by Chancellor J. V. Peltason. . . . Peltason initiated the evaluation, according to sources within the department, because certain department members' political activities have been "a thorn in his side" for at least two years.4
What might be interpreted as an evaluative attack on the political science department on one level, could be construed as an evaluation for defense of the University on another. Some very angry state legislators are breathing hard down the neck of the University over last spring's strike activities which featured prominent performances by members of the political science department.
Who sponsors and pays for the evaluation makes a critical difference in the eventual findings. Who conducts the evaluation is equally important. For example, knowing that the evaluation of the University of Illinois political science department is to be conducted by a committee of three political scientists from other campuses might give one a clue to the outcome of the evaluation. Certainly the outcome will be different than if the committee were comprised of three legislators. Even finer predictions could be made if we knew the relationships among these three professors and the principals in the case, e.g., how the evaluators were chosen. This is a gross example, but so are the differences which can result.
The Context of Valuation and the Context of Justification
Now this is quite a mess, is it not? If you have been following my argument, you should by now believe that evaluation is a branch of sophistry. And, judging from the number of times I've heard comments like, "Well, you can prove anything with statistics," the feeling is fairly widespread. Formal evaluations do arise from political motivations and reflect the biases of their origins. Whom can you trust if you can't trust an evaluator?
However, this is not an entirely new problem in human knowledge. Thoughts always arise from obscure and suspect origins: from metaphors, from dreams, from all forms of illogic, quite often from the most perverse attempts at self-serving. Yet, whatever the genesis of these thoughts, it is often possible to determine the worth of an idea.
Philosophers of science have found it useful to distinguish between an idea's "context of discovery" (the psychological background from which the idea arose) and its "context of justification" (the publicly determined worth of the idea).5 Even though we do not think in syllogisms or use formal logic, it is often desirable, perhaps necessary, to construct a logically idealized version of an idea so that it may be critically examined.
Analogously, in formal evaluation it may be useful to distinguish between the "context of valuation" and the "context of justification." The "context of valuation" involves the basic value slant derived from the genesis of the evaluation, and includes all those motivations, biases, values, attitudes, and pressures from which the evaluation arose. The "context of justification" involves our attempt to justify our findings. There are many means of justifying findings, but in formal educational evaluation it usually means using the logic and methodology of the social sciences (predominantly psychology) to collect and analyze data. In fact, Michael Scriven defines evaluation as a methodological activity which consists of gathering and combining performance data to yield ratings, and in justifying the data collection procedures, the weighings of data, and the goals themselves.6 (Most operating definitions of evaluation set it down squarely within the "context of justification.")
Such "scientific" procedures do not guarantee that the findings are "true," but they do promise that biases originating from the "context of valuation" will be greatly reduced. Hence we get concepts like "control group" and "random sampling" which are common sense attempts to eliminate one bias or another. It should be noted that in addition to the non-argumentative logic of science, argumentative forms of logic are available for justifying findings, for example, the methodology of our court system to which we entrust our property and our lives. This raises the possibility of other legitimate forms of justification. In the main, however, we rely on the institutionalized methods of science to exorcise the demons of subjectivity.
Thus utilizing scientific methodology in the "context of justification" enables us to justify findings arising from the "context of valuation." But, as much as evaluators love to appear in white lab coats and tell their clients that this brand of individualized instruction will decrease tooth decay (and as much as it comforts clients to have them do so), life is not so simple. Many leading scientists tell us that even our scientific approaches are ultimately biased. After all, the communities of men who establish scientific canons are subject to the same pressures as the rest of us. In fact, there can be no value-free social research. All research must proceed from initial valuations of some kind. So if being "objective" means being totally free from bias, there can be no "objective" research. Try as we will, there is no escape from the "context of valuation."
Gunnar Myrdal, the Swedish social scientist, contends that it is not necessary for social research to meet the impossible condition of being value-free in order to be useful to us.7 All that is important is that the scientist reveal the values on which his research is based. It is the hidden, unseen valuation which is damaging and which leads to opportunistically distorted research findings, for covert valuations allow us to pursue our base interests at the expense of proper justification. We trick ourselves as well as others.
Making valuations explicit demonstrates the evaluator's awareness of them, forces him to account for them, and exposes them for what they are. Ideally one would use alternative sets of values to judge a program. Resources are seldom available to do so. Practically one usually chooses one set of valuations and puts one's meager resources into that. But one can try to see that the valuations are neither hidden nor arbitrary. They should be relevant and significant to the audiences involved. According to Myrdal, only in this way can the evaluation be as fair and honest as possible.
For example, when I began to evaluate the Illinois Gifted Program I was not a neutral observer. I was not about (nor is anyone likely) to invest much time and effort in a project about which I had no feelings. Early in the project it was suggested by several people that we set up a massive testing design using standardized achievement tests to measure the outcomes of the state program. That was the "normal" thing to do and in fact would have been done by a "neutral" evaluator who was unwilling to spend much time. I knew in advance that the nature of achievement tests plus technical problems like regression effects would combine to show no significant difference in favor of gifted programs.
Most importantly, by familiarity with the program, I knew that increasing achievement was not the state's main intent, and that the major efforts of the program had been to promote certain kinds of classroom achievements, namely higher thought processes and student involvement. So we looked into the area of environmental press measures and eventually developed an instrument to measure those factors. We measured the program at perhaps its point of greatest strength; that's what I mean by being fair.
At the same time in the "context of justification" we developed our instrument in a rigorous fashion and employed as good a sampling design as we were able. We in no way "set-up" the gifted people. Where the program did turn out badly we reported it. Through familiarity with the program we also knew where the weakest points lay. We knew that many districts were taking state funds for the program and doing absolutely nothing for the gifted. So we developed instruments to measure that too. That's what I mean by being honest.
We reported both favorable and unfavorable data to people in the program, to people with control over the program, and to state legislators, saying that in our opinion the good outweighed the bad. Although we were conscious of "context of valuation" problems at the time, even stating in the original rationale the intention of making all value assumptions explicit, in retrospect I see places where we did slipshod work because we were influenced by our prior valuations. For example, we intended to collect some achievement scores as well as the other types of data, yet succumbed to our own prior valuations by placing such data so low on our priority list that we never did so.
In looking at our earlier work I can also see all sorts of hidden valuations and implicit assumptions that resulted in systematic errors. Perhaps most important, we put many more resources into collecting data that might be positive rather than negative. For example, we never investigated the elitism that might arise from gifted programs. As Myrdal suggests, ignorance, like knowledge, is not random but opportunistically contrived. We don't know what we don't want to know. All these problems notwithstanding, I think we were as fair and honest as one is likely to be, though our work can certainly be criticized on these grounds.
The Context of Persuasion
There is one final check on our own valuations and biases: the biases and interests of other people. I might note that although the state legislature, influenced by our evaluation data, maintained the funding of the Illinois Program at its previous level, it did not significantly increase funding in certain areas we suggested. Much to the chagrin of both evaluators and parents, people do not always do what you tell them to do. This final argument I will pursue under the label the "context of persuasion." Producing data is one thing; getting it used is quite another.
In fact, evaluation data will be utilized or ignored mainly to the extent that they are of advantage to the interpreting group. For example, the findings of the Kerner Commission on Civil Disorders, which said that the United States has a system of institutionalized racism, were never put into effect. More recently the Scranton Report claimed that a major reason for campus unrest was a lack of moral leadership from the White House. President Nixon replied that there are lots of people responsible for moral leadership, like teachers and preachers. He did not even wait to read the commission report on obscenity and pornography, but rejected the findings out of hand without contaminating himself with the data. (Evidently not everyone felt as the President did. An illustrated copy of the Pornography Report was a bestseller at $12.50.) Along with these examples, all hot political issues, I submit the following from the Chicago Daily News:
WHITE HOUSE STALLS DATA ON CONSUMER PRODUCT TESTS
The White House is delaying release of an order to government agencies to publish data on the thousands of products they test . . . including brand names of everything from computers to toilet paper. The order is based on a year-long study recommending that [government agencies] publish for the first time results of tests they make on products for government purchase. . . . White House sources say that pressures from business groups . . . have postponed its release indefinitely. The report, prepared by representatives of 21 agencies under Mrs. Knauer's direction, is weaker than consumer groups had hoped because it concentrated on positive data. Groups such as Consumers Union had sought release of all information, including negative facts on which products failed tests and why. But the White House report . . . states "the release of . . . adverse test data . . . would not contribute much in the way of useful data to consumers. . . ."
I submit that toilet paper is not a hot political issue, but evaluation is. Evaluation is political because it involves allocation of resources, which means that some people gain and some lose. Little wonder that evaluation data are strongly slanted in the light of what the results mean for the interpreting group. Early in the Illinois evaluation we were amazed and confounded by the way our findings were used. One group would pick up one finding and use it to press its own position, ignoring the rest of the findings. Another would take the same finding to justify maintaining the status quo. In the legislature the part of the program (training) we found most effective for changing teacher behavior was severely cut because it was labeled "fellowships" and fell under the ire of legislators during the spring riots.
Later on we began to understand this phenomenon and use it to our advantage, which brings me to the last point of the paper, the "context of persuasion." We all live in a concrete world, in a world of metaphors and anecdotes, of strong feelings and personal relationships. Even evaluators live in that world. When I make decisions for myself it is on the basis of this concrete world, not an abstract one. The kind of information a person can act on must be meaningful in terms of personal experience. And that means appeals to metaphors, anecdotes, and self-interests.
Evaluators are not very good at translating abstract data, like correlation coefficients, into concrete experience for their audience. The more formalized the operations in the "context of justification," the more difficult it is to make the findings meaningful in personal terms. In fact, the very methods that increase our generalizing powers lead us away from concrete meanings into abstract relationships. There is a natural antipathy between personal meaning, which leads us to action, and abstract data. Communicating scientific findings is not a matter of understanding research terminology; it is a matter of translating the findings to fit the audience's personal experience. Every person has a vocabulary of action within his mind; only when evaluation data roughly correspond to his internal vocabulary, does he respond to them.
Political lobbyists have been most effective in understanding and using the vocabulary of legislators. As one state legislator told me, "Man, I don't need another report. I've got sixty of those in my office. Don't tell me about statistics and all that stuff. I believe you. Tell me what you found out." In this case, personal credibility was his guide to action. On another level, the Silberman report is a good example of persuasive writing. In 525 pages there are only two lines explaining how data were collected. Though Silberman does refer to research studies, the bulk of the report is composed of anecdotes. And it is persuasive. Ordinarily evaluators greatly neglect the "context of persuasion."
However, it seems to me that the producers of the data must assume some burden in seeing their information is properly understood. Simply wrapping the baby up warmly and leaving him on the doorstep at midnight does not absolve one of responsibility.
So there it is: a thumbnail sketch of the problems of trying to determine the social worth of educational programs: valuation, justification, persuasion; values, thought, action; morality, knowledge, power. New problems for education perhaps, but perennial problems for society.
1 Karl Mannheim. Ideology and Utopia. New York: Harcourt, Brace and World, 1936.
2 Robert E. Stake, "The Countenance of Educational Evaluation," Teachers College Record, Vol. 68, 1967, pp. 523-540.
3 Charles E. Silberman. Crisis in the Classroom. New York: Random House, 1970.
4 Daily Illini, December 10, 1970.
5 Israel Scheffler. Science and Subjectivity. Indianapolis, Indiana: Bobbs-Merrill Co., 1967.
6 Michael Scriven, "The Methodology of Evaluation," AERA Monograph Series of Curriculum Evaluation, November 1, 1967.
7 Gunnar Myrdal. Objectivity in Social Research. New York: Random House, 1969.