
Constructive Evaluation and the Improvement of Teaching and Learning

by Peter Johnston - 1989

In the attempts to improve teaching and learning through the use of psychometrics and its underlying science, conditions have been set inadvertently that are unlikely to encourage and support learning and self-evaluation. (Source: ERIC)

I would like to thank Terry Crooks, Carole Edelsky, Michael Kane, and Robert McMorris for various comments and constructive critiques of this article.

The purpose of educational evaluation is ultimately to contribute to the improvement of teaching and learning, and many see psychometrics (and edumetrics) as a cornerstone of this enterprise. The field of psychometrics1 rests on the assumption that it is possible to obtain an objective, valid, unbiased, empirical description of human learning activity and that it will serve educational stakeholders (students, teachers, parents, administrators) to do so. Yet for some time, leading researchers in the area and in numerous related domains have expressed serious doubts about the appropriateness of viewing human beings in objective terms, and about the underlying notion of science that makes it possible to do so. Perhaps the doubts are not being taken seriously, even when they are expressed by leading figures in the field. What would be the consequences for the field if these concerns were taken seriously? What would happen to such issues as validity, objectivity, and the value of educational measurement for serving the educational needs of society, and what are some alternatives? These are the issues I shall raise in this article.

At the outset, I shall assume (with unbounded optimism) that the gate-keeping function of testing is now declining in our society and that we seek more egalitarian, democratic ends. This issue alone would constitute a lengthy treatise, but I shall take seriously a statement by a major figure in the field of psychometrics. Robert Glaser, proposing an agenda for psychometrics, wrote in the American Psychologist:

It seems clear that we are over the threshold in the transition from education as a highly selective enterprise to one that is focused on developing an entire population of educated people. A selective system is no longer the prevalent educational demand. . . . The requirement now is to design a helping society in which we devise means for providing educational opportunities for all in equitable ways.2

In other words, we should be seeking to develop our educational system in such a way that we provide high-quality instruction for all. I think this is an admirable goal but, unlike Glaser, I would like to suspend for a moment the underlying assumption that psychometrics with its endless searches for objective, valid measures of human characteristics has anything to do with this goal. It appears to me, for example, that the issue of objectivity is something of a red barracuda, and that the term validity, as it is used in psychometrics, needs to be taken off life support. I shall start by describing some problems with objectivity and validity, and then proceed to describe what I think might be more constructive issues to wrestle with.3


The search for objectivity4 in psychometrics has been a search for tools that will provide facts that are untouched by human minds. Classical measurement has enshrined objectivity in terms such as “objective tests” and “true score” (the absolute reality). The intention is to appear like the “hard sciences” of physics, chemistry, and the like, which are believed by many to have such tools for describing real facts. Having stolen this idea from the hard sciences, it would be nice if we could give it back, but alas I doubt that they would take it. Even in quantum physics, the act of observation materially influences what is being observed.5 That is, when light is used to observe atomic particles, the photons of light alter the particles under observation. In addition, as high school science students are well aware, light can be described in terms of waves or in terms of particles, depending on the techniques used to study it or the questions asked about it. In the Atlantic Monthly, Werner Heisenberg, the Nobel Prize-winning physicist, noted:

In science we are not dealing with nature itself but with the science of nature—that is, with nature which has been thought through and described by man. . . . Modern physics, in the final analysis, has already discredited the concept of the truly real.6

In education and psychology we have been a little slow to realize this, but we have not been without warning even in our own fields.7 That perception itself is interpretive has been argued by philosophers of science such as Kuhn,8 and in education we are concerned with aspects of mental activity that we cannot see. Even children’s overt behavior, which is presumed to reflect mental activity, cannot be seen without being interpreted. There is no way to avoid this state of affairs. We look with our eyes, but we choose what to look at and we see (interpret, make sense) with our minds and describe with our language. The point is that no matter how we go about educational evaluation, it involves interpretation. Human symbol systems are involved, and thus there is no “objective” measurement. Tests are constructed by people with their particular frame of reality and responded to by people within their constitution of reality, and responses are analyzed by people within their own version of reality. Despite this knowledge, in education and psychology we still take a test like the Stanford Diagnostic Reading Test9 and treat the data it generates as if it were not a reflection of its authors’ view of reading, and as if the numbers it produces will not have to be interpreted by someone else with a particular view of reading, learning, and education. We struggle to ascertain whether the test is a truthful representation of the true components of real reading.

We are stuck with interpretation, so we might as well get used to it and make the most of it. Furthermore, unlike measurement and description in the physical sciences, when we are assessing literacy, we are engaged in examining something that is personal and (consequently) cultural in nature, using tools that are similarly of cultural origin. In doing so we engage in a social interaction with the individual or group being evaluated, and thus influence in powerful ways the nature of the understanding constructed by all parties. To make matters even less objective, in reading, the performance of the subject about whom we wish to make a statement involves his or her interaction with and interpretation of a text. The text is a sociohistorically situated symbolic entity produced by another subject. Furthermore, these dialogical interchanges take place in particular social contexts. In other words, evaluating reading performance involves a social interaction between a subject who is interpreting the interpretive interaction between another subject and a text produced by yet another subject within a particular (different) sociohistorical context. Objective? Hardly.

The search for objectivity may not simply be futile. I believe it to be destructive. Jerome Bruner has commented that “the language of education, if it is to be an invitation to reflection and culture creating, cannot be the so-called uncontaminated language of fact and ‘objectivity.’ It must express stance and must invite counter-stance and in the process leave place for reflection, for metacognition.”10 Psychometrics, because of its roots in positivistic empiricism, prevents the expression of stance because that suggests “subjectivity.” In doing so it eliminates the likelihood of the expression of counter-stance. Reflection is an internal, personal activity that is unlikely to be invoked by psychometrics, which validates only external knowledge. When used for purposes such as accountability, it is most likely to produce threat and entrenchment or self-righteousness rather than reflectiveness. Attempts to make knowledge of teaching and learning neutral through objectivity are also problematic, for reasons Wirth makes clear:

Insight may be regarded as the core of social knowledge. It is arrived at by being on the inside of the phenomena to be observed. . . . It is participation in an activity that generates interest, purpose, point of view, value, meaning, and intelligibility, as well as bias.11

In other words, by reducing the likelihood of responsive, reflective action, our efforts toward producing objectivity may be in direct conflict with our overall goal of improving teaching and learning. It is personal response and involvement that produce caring and concern. Of course this can go both ways. We can get negative consequences of personal involvement such as racism, classism, and low expectations. Attempts to produce neutral objectifications are, however, only one approach to addressing these problems, and I would argue a bad one.

Our efforts to objectify children’s activity involve trying to place sufficient distance between teacher and student and teacher and parent so that the human activity can be seen without invoking the “subjective” human response. In other words, we try to depersonalize the interaction and the interpretation. There are costs to this. For example, one of the important school characteristics that leads to a reduction in the dropout rate is a feeling on the part of the students that teachers actually care about them as individuals.12 The reduction of students or teachers to numbers in the service of objective truth statements suitable for decision making is not only counterproductive, however; it hides the political nature of the activity in a veil of science. As Raskin points out, numbers are “true” only when they are used nonreferentially.13 As soon as they are applied to people, serious social decisions have to be made about what can be seen as interchangeable, about the importance of interchangeability over uniqueness, and about which qualities of uniqueness should be sacrificed.

If objectivity as presently conceived is impossible, or irrelevant or downright destructive to educational improvement, what is left for psychometrics as a science to offer the educational community to help it better itself? Perhaps validity?


Richard Schutz, executive director of the Southwest Regional Laboratory (SWRL), in reconsidering validity,14 commented that

the most reasonable course of action at this time would be to apply Occam’s razor to the term validity and to drop it forthwith. But that is not a realistic course of action. It is just not feasible to knock out one of the two prime foundations of an endeavor that has evolved to the present stature of educational measurement.15

On the contrary, eliminating the term with its present connotations in measurement might provide an interesting exercise. As Messick has noted, validity is merely a hypothetical construct, subject to critique like any other construct.16 Schutz’s concerns about validity are based on the work of researchers who, on analyzing all the different types of test validity, conclude that at base they all amount to construct validity. For example, Messick contends that “the evidential basis of test interpretation is construct validity. The evidential basis of test use is also construct validity, but elaborated to determine the relevance of the construct to the applied purpose and the utility of the measure in the applied setting.”17 He notes further that

constructs . . . carry with them in the process of interpretation a variety of value connotations stemming from three main sources: the evaluative overtones of the construct rubrics themselves, the value connotations of the broader theories or nomological networks in which the constructs are embedded, and the valuative implications of the still broader ideologies about the nature of humanity and society that frame the construct theories.18

I interpret Messick to be saying that test validity in an educational setting essentially comes down to construct validity and that constructs are social constructions that carry with them all the personal and social attendants that implies.

Social constructions are in a constant state of transformation and negotiation under normal circumstances, just as language evolves and changes. If language has anything at all to do with thought, as Messick implies and Vygotsky19 and others clearly argue, then I can also assume that this transformation is critical to development and social evolution. The values that underlie judgments of validity cannot be universally agreed on, and keep changing, as Messick points out, “because of the incompleteness of our knowledge about human functioning and the workings of education and society, and in part because they implicate individual and societal values which appear to have changed considerably in the last 50 years.”20 In other words, the constructs that “anchor” educational testing are socially negotiated variables, the validity of which is socially negotiated within a changing pluralistic society.21

A good proportion of educational assessment and evaluation, including achievement testing for accountability, involves not only descriptive constructs, but causal explanatory constructs. We attempt to explain why children performed as they did so that we can get them or the teacher to perform better. Messick notes that we are dealing with an “open system” in which causal statements will be problematic, and generalization equally so since definitions of “better” and “perform” are continually up for negotiation. Lee Cronbach illustrates this problem by describing the difficulty of specifying boundary conditions.

Within an isolated system, the sufficient conditions for a phenomenon may be few; one can speak of a bacillus as causing disease. The boundary assumptions (for example, the bacillus must enter a living human body) go unmentioned, being implicit in the nature of the inquiry. A social phenomenon, however, does not reside within a stable, isolated system. Causal language is loose at best. What does it mean to ask the cause of the women’s movement of the last decade in the United States? It is only tautological to say that conditions in society were right for it. One might point out a few events or demographic trends and say, “If it were not for these, the movement would have died.” But such a claim appears to be unprovable. The issues raised in an educational evaluation are more like questions about the women’s movement than like questions about a bacillus.22

He also notes how we more frequently understand phenomena, and how the phenomena are not necessarily stable: “It is accumulated experience with widely varied conditions that enables us [for example] to attribute scurvy to lack of a certain food and nothing else; even so, evolution may in time change human metabolism so that people can go two years without vitamin C.”23

I find Messick’s and Cronbach’s arguments compelling. If we reject the notion of objectivity and the positivism that accompanied it as I have suggested, however, we are left without any “truly real” entities against which to consider the “truth” and social value of our interpretations and their use. We may still objectify, but the objectifications are deprived of their presumed ontological, privileged status. We can discuss our objectifications, or constructs, but not out of the context of our methods of establishing, or negotiating, the meaningfulness, value, and consequences of the constructs. These turn out to be a matter of taste with normative social constraints as with any other taste.24 From Messick’s perspective, “Test validity is viewed as an overall evaluative judgment of both the adequacy and the appropriateness of both inferences and actions derived from test scores.”25

Of course, one way for the field to handle this more complex construction of validity is to accept its various dimensions but weight some more heavily than others. For example, Lee Cronbach comments that

a well-trained investigator records what was done to a sample on a stated date and what was observed by what means, then derives various numbers. To that point, interpretation is minimal and ordinarily no question of validity arises. At most a critic can wish that a different study had been made. Questions of construct validity become pertinent the moment a finding is put into words.26

Cronbach appears to agree in general with Messick, but holds as primary the (relatively pure) empirical observation and its quantification (not considered a language) and places as secondary (“at most”) the choice of which questions to ask. When a first-grade teacher is deciding whether to focus on a beginning writer’s spelling, or some less countable aspect of how the child makes meaning, or the manner in which the child adopts the role of a writer, the consequences of choice of question are very serious indeed, making questions of relative empirical accuracy trivial.

Messick’s comments notwithstanding, many psychometricians will find this idea unsatisfying because it is very hard to give up a simple notion of accuracy. Some tests, some interpretations, must be truly better than others. How do we decide which interpretation is better than others? The current answer, of course, is that we turn to scientists—the ones who have privileged access to the definitions of accuracy and acceptable logic. If scientists are forced to give up privileged access to objectivity, or the relevance or ontological status of objectivity, what do we have left to maintain privileged status? If we must concede that all validity is construct validity and that these constructs are socially constructed figments of our imagination, which may or may not be socially valuable, then to maintain our privileged status we would need to argue for the privilege of the imagination and values of the “scientific community” over those of other communities such as the “teaching community.” I do not think we could muster a compelling argument, and if we could, we would have to interpret it in terms of the consequences for our children’s development under such tutelage, and what we must do about it.

My argument, then, is that the overwhelming concern for objective, valid measurement as a means to improve teaching and learning is not very helpful. I will argue next that if we really want to improve the quality of teaching and learning in the schools, two things must happen. First, we must turn to a different view of science, one based on dialogue not monologue. Second, if we want to improve children’s learning we must improve teachers. In other words, we need to set conditions in which teachers become learners about children, themselves, and their practice.


Science is fundamentally political and the choice of a scientific “metatheory” has serious implications. The positivistic view of science that drives current attempts to improve teaching and learning has been under criticism by writers in many different disciplines. Indeed, there is a groundswell that at the very least is rocking the boat. Kenneth Gergen in the American Psychologist discusses the issue of a metatheory from the perspective of a social constructionist:

What is confronted [by social constructionism] is the traditional, Western conception of objective, individualistic, ahistorical knowledge. . . . As this view is increasingly challenged one must entertain the possibility of molding an alternative scientific metatheory based on constructionist assumptions. Such a metatheory would remove knowledge from the data-driven and/or the cognitively necessitated domains and place it in the hands of people in relationship. Scientific formulations would not on this account be the result of an impersonal application of decontextualized, methodological rules, but the responsibility of persons in active, communal interchange.27

In other words, Gergen argues that knowledge about teaching and learning must be intersubjective—at once personal and social—and that science is something practiced collaboratively by communities more than individuals. He proposes that

rather than looking toward the natural sciences and experimental psychology for kinship, an affinity is rapidly sensed with a range of what may be termed interpretive disciplines, that is, disciplines chiefly concerned with rendering accounts of human meaning systems. . . . On the most immediate level, social constructionist inquiry is conjoined with ethnomethodological work . . . with its emphasis on the methods employed by persons to render the world sensible, and with much dramaturgical analysis . . . and its focus on the strategic deployment of social conduct.28

Suppose this view of science were at the heart of attempts to improve education. What are some of the consequences? First, it would mean an elimination of the ontological privileging of particular ways of knowing. It would no longer be permissible to accept a test score as more real than a teacher’s narrative description of a child’s reading and writing development over a period of time. Second, we would need to set conditions in which teacher learning communities could develop and engage in open, reflective dialogue. Third, a social constructionist view recognizes the place of values in science, and George Howard, also writing in the American Psychologist, comments that this recognition implies “a shift in roles for research psychologists from neutral truth seekers to chroniclers and molders of human action.”29 This sounds a lot like a teacher’s role to me, particularly in its active nature. It is reminiscent of Donald Schön’s description of the “reflective practitioner” or Ann Berthoff’s description of the “teacher-researcher.”30 Howard notes that such a shift in roles “implies a greater acknowledgment of subjectivity and value judgement in science.”31 Unfortunately, acknowledgment of this implication is not likely to be forthcoming from the educational measurement community. Its current general acceptance and financial viability rest very heavily on the denial of the relevance of values and subjectivity. The public, too, would be bitterly disappointed by such an admission.

Suppose the public did accept the place of subjectivity and value judgment in the work of research scientists.32 Could they accept teachers as bona fide researchers? Could they trust teachers to responsibly adopt such a role? Could they trust them to be responsive, reflective practitioners? I believe that under certain conditions they can, indeed must, just as teachers must trust children to be independent learners, and set conditions to allow this, if they ever expect them to become so. I believe that this is our only option, and I am not alone. Robert Stake noted in summarizing the consensus that he and his colleagues reached over a quarter of a century of experience in educational evaluation:

We did not come to be great admirers of American teachers, or their capacity for self-correction, but as with Churchill’s view of democracy, it beats the alternatives. We came to see that the important knowledge for correction on classroom practice is experiential knowledge.33

Although this comment supports my contention, it is a trifle negative and I would like to qualify it. We should not be surprised in the United States if teachers currently are not great self-evaluators. It is almost surprising that they self-evaluate at all. Evaluation has always been external to them. They are evaluated in order to have what they are doing validated by “science.” Positivistic science has devalued their means of knowing the children and removed most of the means of their knowing themselves, as I shall now explain.

The current assessment system, grounded as it is in psychometrics with its positivistic, empiricist notions of science, strongly contributes to a manipulative, depersonalized view of those to whom it is applied: students and teachers. In a largely female teaching community, it sanctions only ways of knowing that are not prototypical of women’s ways of knowing.34 For example, it does not value the kind of knowledge that allows kindergarten teachers to read their students’ writing. It keeps teachers lacking confidence in their own knowledge of themselves, their teaching, and their students, thus separating them and reducing critical dialogue through insecurity. As an example, in my region most school districts offer teachers several days per year in which they can go and observe other teachers, but this offer is rarely taken. My interviews with teachers suggest that, apart from administrative problems, this is at least in part because they fear that if they observe another teacher whom they respect, that teacher will come and watch them and see how little they really know. The competitive situations often set up through the use of tests also do a great deal to keep teachers apart.35

In these ways, the tests function to maintain subservience or powerlessness on the part of teachers. In addition, the tests maintain a situation that favors what Belenky and her colleagues call “received knowledge”:

For those who adhere to the perspective of received knowledge, there are no gradations of the truth—no gray areas. Paradox is inconceivable because received knowers believe several contradictory ideas are never simultaneously in accordance with fact. Because they see only black and white, but never shades of gray, these women shun the qualitative and welcome the quantitative.”36

Many teachers appear to have adopted this view of learning, which contributes to their oppression. However, Belenky and her colleagues point out that alternative settings can produce something quite different: “In pluralistic and intellectually challenging environments, this way of thinking quickly disappears.”37

It appears to me that in our attempts to improve teaching and learning through the use of psychometrics and its underlying science, we have inadvertently set conditions that are unlikely to encourage and support learning and self-evaluation. The social constructionist metatheory of science stresses community, responsibility (which implies self-knowledge), and interchange or dialogue (which requires voice), and we have gone to some lengths to eliminate these conditions. Yet, even under such adverse conditions, Stake is still prepared to say that teacher self-evaluation is better than the alternatives.


It is often hard to imagine what something that makes sense in theory would look like in practice. How might children’s learning and its irregularities be documented? Let us start with an example that might fit the criteria. The Prospect School at North Bennington, Vermont, because of its concern for understanding and valuing the whole child’s development, shuns psychometrics entirely. Pat Carini, the director, argues that their method of evaluating children’s development is more comprehensive than psychometric methods. Walt Haney describes Prospect’s approach as including

close observation of students and classes, written descriptions of children and the life of the school, and documentation of children’s work, including collecting samples of their writing, drawings, and other projects. Records are kept in narrative form (not, for example, by using checklists), and may contain accounts of what children like to do, how they work in groups and with other children, and their “involvement with formal subject matter.” . . .

Prospect School members meet regularly—at least once a week—to review records or issues. These reviews, also called reflective conversations, may focus on a particular child, a curriculum issue, or a more general issue or problem. These events are intended as collegial reviews in which staff members, and sometimes outsiders, attempt to deepen their understanding of particular children, pedagogical practices, ethical issues, or sometimes just school procedures.38

Haney notes that when reading the two- or three-page narrative observation reports, parent reports, and transition reports on Prospect children he was struck by the feeling he got for the child as an individual and contrasted it to his response to the typical test report, or school records. Narrative, you will note, is not commonly associated with empiricist views of science. However, Carini comments that this form of evaluation is more concrete, stemming from the practical need to understand particular children in particular settings. It is also, incidentally, more memorable and thus more likely to be used.

Central to this approach is the teacher’s ability to know the students, and to notice and record their development in a variety of areas. In other words, the teacher (in the context of dialogue) is the critical evaluation instrument. The ability to set the conditions for and to notice patterns of activity and changes in those patterns is at the heart of the teacher’s evaluative skill.

At Prospect, the teachers engage in real dialogue during which there is the opportunity for them to understand the children better, both directly through the dialogue about particular children and other children by analogy and by internalization of the dialogical process. At the same time, the dialogue allows reflective evaluation of their own teaching activity. The situation is such that it is likely to produce a community of what Schön has called “reflective practitioners.” The reflective practitioner is expected to reveal uncertainties, publicly reflect on his knowledge-in-practice, and engage in a process of self-education.39 Part of this process requires allowing oneself to experience one’s confusions. This clearly requires the teacher to have confidence in his or her ways of knowing and ability to participate in the construction of knowledge, and in himself or herself as a learner and practitioner.

But the practicalities! Without psychometrics, how could we classify children with “special needs”—a major function of psychometrics? At the Prospect School, they believe that all children have special needs. Lorrie Shepard claims that not classifying children can save a school $600 to $1,000 per child just for the assessment.40 Admittedly, not classifying children results in reduced federal assistance, but the evidence suggests that such assistance results in no gains (or worse) for the children so classified in any case.41 In other words, the use of psychometrics for this function is unacceptable in the terms provided by Starr in the American Psychologist: “Testing should always be used in the interests of the children tested.”42

What can be done instead of classification? Pugach and Johnson introduced what they call “prereferral intervention.”43 Their work shows that teachers can be helped to counsel one another when they are having difficulties with a particular student. The teachers learn to notice children who are having difficulties, to describe the problem clearly, and through brief focused discussions to devise and monitor interventions within their classrooms. This has had the effect of reducing substantially the number of children classified as learning disabled.

This idea of teachers helping teachers also appears if we recast the literature on “dynamic assessment.” Dynamic assessment involves the teacher/tester working one-on-one with the learner, and by strategically intervening, finding out what the learner can accomplish given particular kinds of support. This allows the examination of what Vygotsky called the “zone of proximal development,” which is the region between what the child can do unassisted and what he or she can do with optimal instructional support.44 Vygotsky and others have claimed that this upper limit is a better indicator of ability than is the lower, static, limit.45

Recently Delclos, Burns, and Kulewicz found that when teachers watched children being assessed in this way, their expectations of the children were raised, and that this had important instructional consequences.46 I interpret this finding in terms of demonstrations of optimal instructional techniques and positive performance rather than as having anything to do with measurement and finding “levels of performance.” Delclos and his colleagues comment that the raised expectations are unlikely to be readily produced by written reports (although they have not tried narratives); direct observation seems to be critical. Presumably, observation along with dialogue offers the possibility that the teacher learns more not only about the child but also about optimal instructional practice.

Note that in this situation the learning that takes place about the child is not merely “objective.” There is a memorable ownership of the knowledge. This personalizing and contextualizing of the evaluation brings us back to an issue that has been considered problematic: with the personalizing of the knowledge comes not only ownership, insight, and motivation, but bias. Psychometrics has struggled to remove bias by neutralizing the data, assuming that bias will not be restored in the inevitable interpretation process; it has shown little concern for the bias of interpretation itself. In the study by Delclos and his colleagues we see what amounts to the introduction of a positive bias. Work by Marie Clay has a similar effect: she has teachers do no teaching in initial tutoring sessions with children.47 This requires the teacher to set up a situation in which the child can successfully engage in reading and writing activities independently, so that when teaching does begin, the teacher has already seen the child perform in a positive and independent manner. Since teacher expectation does make a difference, perhaps it would be better to approach the problem of interpretive bias directly and actually aim for a positive bias.

What of the big practicality? Without psychometrics, how will the public know whether it is getting its money’s worth—whether teachers are doing a good job? How will teachers be held accountable? A good proportion of the public will undoubtedly be uncomfortable with the apparent loss of “quality control” if we move away from psychometrics and the science that supports it. Reified positivism is very much a part of the public’s folk wisdom, and to be left with a relativistic view of what constitutes quality would certainly be unnerving. Some will fear that we would soon see a return to witchcraft, black magic, and folk wisdom. There are several responses to this issue. The first is that psychometrics is not immune to this problem; witness such constructs as “learning disabilities,” which are supported by one or another aspect of psychometrics. A second response is that views of what is acceptable in terms of social values, and of what is credible in terms of acceptable lines of argument, are like the language that mediates them: they generally change quite slowly. A third response is to propose open negotiation of criteria for quality control on an ongoing basis, within a situation that allows for a dialogical view of knowledge.

A fourth response is to consider the issue not as one of quality control or accountability, but rather as one of trust. In other words, the issue would be how teachers might obtain the public’s trust that they are doing a good job. Trust and trustworthiness seem to be more appropriate qualities to seek than validity. I do not mean this simply in terms of the data and interpretations of them, but in terms of the personnel, resources, and community involved in the process itself. Trustworthiness implies the credibility to which Guba and Lincoln refer,48 as well as a general acceptance of the values implicit in the interpretation and application.

In the school setting, in which the processes are always ongoing, we must consider trust and trustworthiness as somewhat separate issues. That is, a school may be doing everything in its power to operate as an open, auditable, credible, trustworthy enterprise, yet not be trusted by the community. Trust is a personal response that often has a lot to do with the extent of one’s knowledge about and experience with an individual or a community. We can examine how to put a trustworthy system in place, but the involvement of the community as a stakeholder from the outset is critical. If the only contact the community has with the school is through a set of graded report cards and test scores, then the establishment of trust will be a rocky road. As long as the relationship is based on threat and power (in the sense of “power over”), it will be hard for the school to carry out its work and be trusted.


In my school district I observed one of the kindergarten teachers on parents’ night showing the parents a video of her teaching activity, explaining the rationale as it played, and stopping here and there for questions. As a parent I found this (subjectively) very compelling, and at least healthy grounds for open discussion. I doubt that all of the parents found it quite as compelling, especially those who did not attend, but most appeared to be a great deal more comfortable after viewing the videotape than when they arrived. If nothing else, the fear of the unknown was substantially diminished. I am certain that they also had a great deal more respect for the teacher’s knowledge. In the larger arena, once a week the school districts run “Principals’ Report” on their local cable television stations. In these half-hour reports, school activities are shown and explained, and the principals and teachers are interviewed. Perhaps this type of activity would be more helpful in establishing trust than the production of standardized test scores. Another possibility might be, under certain conditions, to involve members of the public in the regular school reflective conversations. Newsletters sent to the public can be used more extensively, and other means can be found to open the schools to the public for dialogue.

It is generally compelling to parents when teachers are able to demonstrate that they know a particular child and can discuss his or her development in detail. This requires the teacher to be a good observer of children’s development, and skilled at keeping records of it—in other words, to be an evaluation expert.49 It also requires teachers to be able to talk coherently about the theory in their practice. Teachers must establish themselves as reflective practitioners within a community of such practitioners.

Guba and Lincoln have stressed the importance of auditability.50 Using the metaphor of the accounting audit, the idea is to leave a trail documenting the methods, the data, and the interpretations so that an external auditor might examine them and comment on their adequacy. The audit trail might be an archive of teacher logs, student portfolios, records of meetings, videotapes of instruction (perhaps with accompanying commentary), and other “raw” data against which external reviewers can compare interpretations. The sheer availability of such information is likely to be reasonable grounds for trust, though it is unlikely that much of it would be examined in detail by external reviewers. Data and interpretations can also be considered by a group of stakeholders from the local community (teachers and public). Provided the audiences for interpretation are critical and supportive (I include here the self as audience), these are ways to establish trust or credibility. However, the public, and teachers themselves, must first come to respect other types of knowledge and ways of knowing, which will not happen easily within the current reign of psychometrics, for the reasons I noted earlier.

What if the stakeholders outside the school are unsatisfied with the school’s interpretations of its own activity? Or what can be done to prevent their becoming dissatisfied? Susan Klein and her colleagues at the U.S. Department of Education have reported the successful use of a procedure for dealing with this problem, which they call “the convening process.”51 The idea is for the school to invite external analysis of its practice by a group of respected outside school practitioners, and to enter into dialogue with them about ways to develop its instructional practice. The dialogue should include, for example, some board members, and should result in specific recommendations. I would add that it is probably important for the external group to take pains to describe the school’s current practice not in terms of psychometrically sound data, but rather in terms of the participants’ views of their activity. As Dorr-Bremme points out, “If the account of the program or other activity under study does not accurately portray reality as participants know and experience it, participants can easily reject the evaluation as a useful basis for decision and action.”52

Aside from (but not unrelated to) the issue of establishing trust, there is the problem of establishing the credibility of data and interpretations at various levels. Guba and Lincoln describe a number of techniques for ensuring the credibility of interpretations. For example, they note the necessity for prolonged engagement in the social situation to ensure that the interpretations being made are consistent with the realities of the participants, and not simply imposed by an observer from an external frame of reference.53 Providing evidence from a variety of different sources to justify an interpretation also makes it more credible. That is, if we have two different verbal reports, a piece of writing, and a pattern of questions asked by a student, all pointing to the same interpretation, then we can have more confidence in it than if we had only one form of information.

Guba and Lincoln suggest “peer debriefing” to increase the credibility of data.54 They note that searching dialogue about the interpretations with an experienced peer can keep the inquirer (teacher) from making hasty interpretations, and can push the inquirer on. At the same time, peer debriefing can reduce the emotional clouding that is almost inevitable when one is involved in close observation. It is critical that this peer be a real peer and that the context not be comparative, in order to prevent other issues from undermining real dialogue. Note that this activity sounds very much like the “reflective conversations” at the Prospect School described by Haney and by the Prospect Archive and Center for Education and Research.55 Pugach and Johnson, too, have shown this kind of activity to be very powerful in preventing educational difficulties for particular children (or, rather, for particular teacher-child relationships).56


The most effective evaluation for learning is self-evaluation.57 Failure to evaluate one’s own learning is quite debilitating and severely restricts independent learning. For example, probably the best indicator of unhealthy reading development is failure to self-correct. The teacher’s responsibility is to help children self-evaluate by causing “intelligent unrest” while preserving the student’s confidence, so that he or she responds by trying new possibilities rather than becoming defensive. Psychometrics stresses external evaluation and produces dependency, and a distrust of the individual’s way of knowing himself or herself.

The same is true for teachers as learners. For example, when two teachers observe each other teaching and talk about their experience, some important things tend to happen. That each watches the other reduces the power differential—both are at risk. When I watch another teacher, I see specific examples of teaching activity that will or will not fit into my theory of teaching. When the other teacher watches me, to the extent that I know that teacher, I view what I am doing, as I am doing it, through his or her eyes. In many cases, simply doing that is enough to produce reevaluation and change. In a way it gives stereoscopic vision. The dialogue that stems from these experiences, and the explanation of them, is likely to produce a reevaluation of the activities and of the theories, provided there is the security within which to contemplate disagreement. As I noted earlier, the current effect of psychometrics is to inhibit such activity.

The point is that much psychometric evaluation is ostensibly in the service of public trust, but there are more effective ways of establishing public trust than through the use of psychometrics, and the alternatives may be less prone to producing negative side effects. Methods of establishing trust deserve more direct investigation. Similarly, much psychometric evaluation is directed toward evaluating, and thereby improving, teaching and learning. However, the side effects of efforts to produce “accurate evaluations” often reduce the chances of improving teaching and learning. It would be better to address the process of improvement directly.


I have argued that psychometrics is based on a view of science that is unfriendly to teaching and learning. The assumptions that underlie it make it indefensible in applied contexts, as can readily be inferred from the statements of major figures in the field of psychometrics itself. Furthermore, the principal concepts of objectivity and validity that currently preoccupy the field are more than distractions; they actually do substantial damage. Indeed, the only way to defend psychometric science is in the abstract, totally removed from the applied context. This is indeed one of the tenets of the view of science that spawned psychometrics: that “pure science” and the academic should be insulated from the buffeting and turmoil of the “real world.” I have tried to make it clear that this policy cannot work in matters of human affairs, and that it is pretentious to believe that researchers can arrive at useful, static definitions of evolving sociocultural activities such as literacy.

For example, improving learning conditions in classrooms will involve eliminating basal reading programs, which constitute the most public comparative measurement system we have for children. A change to the use of writing and diverse literature in the classroom would help prevent teachers and parents from knowing children only through the simplistic constructs of “grade level” or “reader level,” and would require teachers to know children in more complex ways.

If we start with the goal of setting conditions for reflective action and the generation of the personal and social knowledge that it engenders, our efforts may bear more fruit. If psychometrics seems relevant to those reflective practitioners at some point, then so be it.

In the long run, I believe that we will have no alternative but to provide teachers (most of whom happen to be women) with the conditions to develop themselves both as individuals and as a community of teachers and learners. In this regard, we might find relevant studies of how people know themselves and others. As Belenky and her colleagues describe in Women’s Ways of Knowing, “To learn to speak in a unique and authentic voice, women must ‘jump outside’ the frames and systems authorities provide and create their own frame.”58 The women in their study who became such “constructive knowers” report that it

began as an effort to reclaim the self by attempting to integrate knowledge that they felt intuitively was personally important with knowledge they had learned from others. They told of weaving together the strands of rational and emotive thought and of integrating objective and subjective knowing. Rather than extricating the self in the acquisition of knowledge, these women used themselves in rising to a new way of thinking.59

It is in the process of sorting out the pieces of the self and of searching for a unique and authentic voice that women come to the basic insights of constructivist thought: “All knowledge is constructed, and the knower is an intimate part of the known. At first women arrive at this insight in searching for a core self that remains responsive to situation and context.”60

If our ultimate goal is to improve teaching and learning, we have no alternative but to improve teachers, and we must address what that means and how we might go about it. We are not without models. For example, Carole Edelsky describes some excellent examples of how teachers have organized to improve themselves and their practice.61 If we want children who are constructive knowers, then it would surely help to have teachers who are appropriate models.

If a school can establish that it is a reflective community committed to its clients, the students, then striking an agreement with the community should be somewhat easier, though far from easy. The major hurdle is probably the pervasiveness of a particular view of what constitutes science in the folk wisdom of the general public and of academia. We must work to assist the teaching community to make strong arguments for the rigor of reflection-in-action, the uniqueness of individuals, and the openness of classroom systems in order for real dialogue about teaching and learning to occur among the stakeholders. This will require rejecting psychometrics (at least as it is currently constituted) and the privileging of certain types of knowledge by certain segments of the community. Academicians must argue against many of the things that give them privileged status in the community, a tall order. Ultimately, we will be looking for a balance in our ways of knowing teaching and learning. I have not provided answers, just some issues and suggestions that I hope will prove provocative and productive.

Cite This Article as: Teachers College Record, Volume 90, Number 4, 1989, pp. 509-528
https://www.tcrecord.org ID Number: 464, Date Accessed: 5/25/2022 10:59:15 PM