
Reflecting on Authentic Assessment and School Reform

reviewed by Walt Haney - 1996

Title: Reflecting on Authentic Assessment and School Reform
Author(s): Linda Darling-Hammond, Jacqueline Ancess & Beverly Falk
Publisher: Teachers College Press, New York
ISBN: 0807734381, Year: 1995

In its lead article, the winter 1994 issue of the ERIC Review noted that much attention lately has been focused on alternatives to multiple-choice standardized tests or what has variously been called authentic, alternative, or performance assessment:

Federal commissions have endorsed performance assessment. It's been discussed on C-SPAN, and in a number of books and articles. Full issues of major education journals, including Educational Leadership (April 1989 and May 1992) and Phi Delta Kappan (February 1993) have been devoted to performance assessment. A surprisingly large number of organizations are actively involved in developing components of a performance assessment system. (Rudner & Boston, 1994, p. 2)

Authentic Assessment in Action is one of the first book-length accounts of how these forms of assessment are working out in practice in a variety of schools. The bulk of the book, apart from brief introductory and concluding chapters, consists of accounts of how "authentic" performance-based assessments of student learning were developed in five schools and how these assessments relate to teaching and learning in the schools. Given widespread interest in authentic assessment recently, these case studies provide rich and welcome descriptions of the complexity of using these kinds of assessments in situ in schools. Taken collectively these accounts raise several important questions about the relationship between authentic assessment and other educational practices, about how student learning promoted via such assessments appears through the alternative lens of traditional standardized tests, and even about the appropriate scale for schools. At a minor level, these questions raise doubts about the consistency of viewpoint expressed in Authentic Assessment in Action. But more broadly they raise important issues about how school reforms being attempted in many of our nation's schools ought to be judged-and even more generally how we ought to view the aims and accomplishments of all our schools.

Before recapping some of the case studies and describing some of the large questions they raise, let me begin by recounting what the authors mean by the alliteratively attractive but ambiguous phrase in the book's title: "authentic assessment." According to Darling-Hammond, Ancess, and Falk, authentic assessments

engage students in "real world" tasks rather than in multiple choice exercises, and evaluate them according to criteria that are important for actual performance in that field (Wiggins, 1989). Such assessments include oral presentations or performances along with collections of students' written products and their solutions to problems, experiments, debates and inquiries (Archbald & Newman, 1988). They also include teacher observations and inventories of individual students' work and behavior, as well as cooperative group projects (NAEYC, 1988). (p. 10)

While I certainly endorse the authors' promotion of attention to alternative forms of assessment, such as those mentioned, it is unfortunate that they neglect to give more serious attention to standardized multiple-choice tests, which are of course frequently encountered in the "real world" outside of schools-for example in the form of employment, licensing, and military tests-and have often been seen as solving some of the shortcomings (such as unreliability, noncomparability, and cost) of the kinds of assessment that Darling-Hammond, Ancess, and Falk now tout as authentic.

The apparent reason the authors omit standardized tests from their conception of authentic assessment is that they argue that alternative kinds of assessments, other than those of the multiple-choice format, are needed as an antidote to the ills of standardized testing. Standardized multiple-choice tests "are poor measures of higher order thinking skills" (p. 6), they say; overreliance on standardized tests narrows the curriculum and such tests are poor diagnostic tools. Authentic Assessment in Action then can be read as an account of how five schools, all in the greater New York area and members of the Coalition of Essential Schools, have tried to unlace the straitjacket of standardized tests and have developed their own assessment systems.(1)

The five case studies are based on "classroom observations and interviews with staff, students and parents" (p. 15). This is about all we are told as to how the authors developed their case studies. Hence readers with a research bent may well wish for a bit more explanation as to how these five schools were selected for study and the methods by which the case studies were developed. But however developed, the cases all present rich depictions of some highly unusual schools that appear to be unusually effective in serving substantial populations of minority, immigrant, and low-income students.

The first case, in Chapter 2, describes how students at Central Park East Secondary School (CPESS) in New York City prepare and present portfolios of their work as part of the school's graduation requirements. CPESS is a small, 450-student secondary school organized into three levels: Division I or what in most school systems would be described as a grade seven to eight junior high, Division II or grades nine and ten, and the Senior Institute, serving students who would normally be classified as upper class, grade eleven and twelve students. Chapter 2, "Graduation by Portfolio at CPESS," focuses on how students must meet fourteen portfolio requirements in order to graduate from the Senior Institute. Most of the portfolios students must prepare focus on traditional academic disciplines, such as math, fine arts, and history, but four portfolio topics are more unusual: postgraduate plan, autobiography, school/community service and internship, and ethics and social issues (and seven of the portfolios must be presented to a review panel). The Senior Institute is also unusual in that students are engaged in two kinds of work outside the school. While all students take a core set of seminars, each must also undertake a work-related internship (of 100 hours over the course of a semester) and attend two courses in local colleges. Examples of student work in meeting requirements of the Senior Institute are presented in some detail in Chapter 2. For instance, the portfolio of Marlena included "reports on an experiment on cancer causing cell transformations conducted as part of her internship in a Minority Research Apprenticeship Program at Hunter College" (p. 37).

Chapter 3 describes the senior project at Hodgson Vocational Technical High School. Building on a previous Hodgson requirement for a senior English paper, the concept of "diploma by exhibition," and a commitment to the idea of "student as worker," the Hodgson staff developed the idea of the senior project, which would require students to demonstrate skills mastered in both academic and career programs. Projects are comprised of a shop-based research paper, a shop project, and a formal public presentation of the project. The project of Jake, described in Chapter 3, consisted of a research paper on the history and craft of bed-making, a mahogany eighteenth-century-style pencil post bed he had built, and a presentation on his project to a panel of three teachers.

Chapter 4 is an account of collaborative learning and assessment at International High School in New York City. International aims to serve new immigrant students with limited English language proficiency. In 1992, its 460 students came from 54 countries and represented native speakers of 39 different languages. As one of the International teachers explained the evolution of the educational program there: "It became apparent that many of the techniques that I used in teaching, when applied to limited English proficient students, simply didn't work"(p. 120). So International developed a "learner-centered" model of instruction based on collaboration between and among students and teachers. One example of the new approach is the Motion Program, an interdisciplinary trimester-long program in which students participate full time and which spans the study of physics, mathematics, and literature. Other intensive programs developed at International High School are the Beginnings Program (aimed at helping students learn about themselves and how to live in a new place, and to take stock of who they are and what they want to become while beginning to learn English) and the Personal and Career Development Program (which focuses on career development and learning about work, oneself, and responsibility). Each of these programs within International High School has developed various programs of authentic assessment such as debriefings, exhibitions, written evaluations, portfolio reviews, and conferences.

The fourth case, presented in Chapter 5, is that of how P.S. 261 in Brooklyn, New York, is using the Primary Language Record (PLR) to document children's literacy development in reading, writing, and listening. The PLR was "conceived in 1985 by educators in England who were searching for a better means of recording children's literacy progress" (p. 170). Though some fifty New York public elementary schools have been using the PLR since around 1980, the focus in Authentic Assessment is on how it is being used in P.S. 261, "a fairly large, traditionally structured school of about 700 students with about 35 teachers and other support staff" (p. 169). With the encouragement of the school principal, Pre-K through grade 1 teachers at P.S. 261 started using the PLR in 1991. The PLR was designed around the principles of parent involvement, respect for families, respect for children, and respect for teachers' knowledge and professionalism. It encourages teachers to observe, document, and learn about their students' learning by interviewing parents about children's reading, writing, and personal interests, by language/literacy conferences between child and teacher, via observations focused on the child as a language learner, and by using rating scales of the child's literacy development.

The final case presented in Authentic Assessment is of how assessment has been woven into the fabric of teaching and learning in the Bronx New School. "In 1987, parents and teachers from diverse neighborhoods of a local school district in New York City came together to found the Bronx New School (BNS)-a small public elementary school of choice" (p. 205). At BNS, pedagogy centers on students, and assessment "places observation of students and their work at its center" (p. 207). The main approaches to assessment described in Chapter 6 are portfolios of both teacher-kept records and samples of student work. Though BNS is a small school, teachers were often overwhelmed by the job of keeping such records and the time it takes to write notes on each student. BNS teachers experimented with several approaches to documenting student progress, including the Primary Language Record, but according to the account presented in Chapter 6, they seem to have found a more powerful assessment strategy in the Descriptive Review of the Child. The Descriptive Review, developed by Pat Carini and others at the Prospect Center in Bennington, Vermont, is not so much an assessment technique as it is a social process through which teachers in a school collaborate in making sense of individual children's progress in school:

The perspectives through which the child is described are multiple, to insure a balanced portrayal of the person, that neither over emphasizes some current "problem" nor minimizes ongoing difficulty. The description of the child addresses the following facets of the person as these characteristics are expressed within the classroom setting at the present time:

The child's physical presence and gesture
The child's disposition
The child's relationships with other children and adults
The child's activities and interests
The child's approach to formal learning
The child's strengths and vulnerabilities

(Prospect Center, 1986, quoted in Darling-Hammond, Ancess, and Falk, pp. 224-225)

The Descriptive Review proceeds through a meeting in which teachers come together to discuss an individual child and his or her schoolwork in depth and to collaborate on finding ways for the school to be responsive to and promote the growth of the individual child. How the Descriptive Review process works is nicely illustrated through the story of Akeem, who was a third grader when he came to the Bronx New School. Considered an openly disruptive presence in class, Akeem seemed almost surely headed for school failure until his BNS teachers figured out what was causing his disruptive outbursts and found ways of building on his strengths in art and physical construction.

The main value of Authentic Assessment in Action, it seems to me, resides in these five case studies themselves. The cases are a bit uneven, with those of Hodgson and BNS having more narrative detail to catch and hold readers' interest than some of the other cases (the stories of Jake's bed project at Hodgson and Akeem's integration into school life at BNS, for example, are ones this reader will remember). Nonetheless, the final slim chapter (a mere 15 pages out of 270) attempts to draw some lessons about how "these schools' teaching, learning and assessment practices are able to support high standards without standardization-to promote excellence in the context of diversity" (p. 252). Among the common factors identified as contributing to these diverse schools' success are commitment to viewing the school community as a learning organization, with strong leadership committed to democratic approaches to educational governance, with time and resources set aside for teachers to learn from one another and from sources of help outside the schools; keeping schools small enough to allow and maintain a real sense of community among teachers, students, and parents; and a common commitment to building on the strengths of all students so that "teachers, along with students and their families, solve problems before they give rise to learning failures [which, incidentally, is what almost happened to Akeem], nonattendance, disciplinary issues, and other outcomes of students' inabilities to adapt to or be noticed by a depersonalized school structure" (p. 264).

Authentic Assessment in Action is a very good book, one that can be profitably read and pondered not just by people interested in alternative forms of educational assessment, but by anyone interested in school reform and improvement. And if the hallmark of excellence in a volume is its capacity to make a reader think, this is an excellent book, for I am left pondering several matters: (1) to what extent the assessment cart can lead the educational horse; (2) to what extent the sorts of student achievement that Darling-Hammond, Ancess, and Falk have documented in Authentic Assessment in Action are or should be mirrored through the alternative assessment lens of standardized multiple-choice tests; (3) whether there is a critical size for schools; and (4) whether there has been an overemphasis on cognitive learning in recent school reform initiatives.

Can the assessment cart lead the educational horse? While reading Authentic Assessment, I wondered several times whether Darling-Hammond, Ancess, and Falk engaged in a sleight of pen in naming this volume, for much of the book is not about assessment, but about the broader processes by which these five schools have transformed themselves. In their final chapter, the authors clearly state their view that authentic assessment cannot be grafted onto just any school:

Authentic assessment is not just another external program that can be imported into a school that behaves as though it is a warehouse of programs, stacked side by side, securely isolated from one another, and having little expectation of significantly impacting the school culture...Authentic assessment can survive, thrive and be powerful only as schools become dynamic consciously evolving communities. (p. 262)

This passage suggests that it is the educational horse of the school that pulls the authentic assessment wagon. And it leaves the reader wondering: If not assessment, what created the educational horsepower in these schools? But just one page later, the authors suggest that it is the other way around, that assessment can lead:

In all of the schools we have studied, authentic assessment practices have encouraged faculties to rethink traditional allocations of time and traditional organizational structures to support these norms. (p. 263)

The authors sometimes seem inconsistent on the extent to which authentic assessment can spur schools into becoming "dynamic consciously evolving communities." If assessment alone cannot, what other crucial ingredients are required? Perhaps the closest approximation to a single line answer to this question comes from the Chapter 4 case study of International High School: "At the heart of International's integrated teaching, learning and assessment system is its creative and energetic faculty" (p. 165).

But what really seems to be at work in the schools described in Authentic Assessment is something of a dialectical process. The schools have begun with a number of conditions that seem conducive to learning-widely shared views of the school community as a learning organization, strong school leadership committed to democratic approaches to educational governance, time and resources set aside for teachers to learn from one another and from sources of help outside the schools, small school size (more on this point later), and a common commitment to student-centered learning. Then, in the process of thinking through how to better promote learning, the educators in these schools made a commitment to using new assessments, and in using new assessments, they see the need to change conditions for learning and using assessments, like taking more time to talk about student work, which in turn can lead to further changes in assessment and conditions of learning.

If the authors of Authentic Assessment are a tad inconsistent as to the extent to which they think that authentic assessment can lead school reform, they are downright schizophrenic on the matter of whether the student learning promoted and documented via authentic assessments will be apparent in standardized test results. The reader is first alerted to this problem by the following disclaimer in the preface:

It is not our intent to investigate either the measurement properties of these [authentic] assessments or their effects on student performance as captured by other measures, though we touch on these matters as they arise in the course of our inquiry. (p. xi)

The first place I was reminded of this disclaimer was in Chapter 2. In discussing the case of Marlena, the CPESS Senior Institute student who prepared a report on cancer-causing cell transformations, the authors present Marlena's transcript and comment that this student had "secured creditable scores on the SAT and College Board Achievement Tests" (p. 37). But Marlena's transcript (p. 41) shows that she had received a 430 on the SAT Verbal. Presuming that this was a pre-"recentered" SAT score, it is equivalent to about a median score among the national population of SAT-taking students. Such a test score is certainly nothing to be ashamed of, but it seems a bit out of line for a student whom the authors describe as "a young person well-launched on a scientific career" (p. 37). My own view is that single test scores should never be judged in isolation-and since Marlena's transcript showed a number of other more "creditable" test scores, including SAT math and College Board Achievement test scores in the range of 550 to 600 and satisfactory-plus or distinguished grades in most of her Senior Institute courses, I did not make much of this minor apparent anomaly.

Then in Chapter 3, on Hodgson's senior project, I was puzzled by the following passage:

Given the differences in the nature of abilities being assessed, it would be unreasonable to expect the Senior Project to significantly influence Hodgson's standardized test scores. However, it is possible that the companion efforts to integrate mathematics into the vocational curriculum had some bearing on the increase in the school's Stanford Achievement Test scores in math between 1991 and 1992. (p. 108)

This passage hints at an asymmetry of attitude that seems unfortunately common in school reform efforts. If test scores go up, schools are usually happy to take the credit. But if scores go down, then either the test or a mismatch between test and instruction is frequently fingered for the blame.(2)

In Chapter 4, we are presented with a table (about the only table of numerical data in all of Authentic Assessment) showing the International High School Regents Competency Results for 1990, 1991, and 1992 and these data are cited as indicators of the school's success: "As Table 4.1 indicates, virtually all of the students pass the New York Regents Competency Tests, an unusual accomplishment for students whose first language is not English and who have been in this country for only a short time" (pp. 116-17). But then at the end of Chapter 4, the authors comment:

As the school seeks to go further in the direction of an interdisciplinary curriculum, organized around active, collaborative learning, the New York Regents Competency Tests (RCTs), particularly those in content areas (e.g. social studies and science), increasingly "get in the way," according to Nadelstern [International's founding director and principal]. Still grounded in an outmoded theory of learning emphasizing the memorization of discrete facts, and an outmoded form of testing using primarily multiple-choice exercises, the RCTs grate against integrated project and activity based curricula that are seeking to get students to think in greater depth and to perform in more complex ways.
Though students at International succeed on the Regents tests, the tests deflect time and attention from what the school sees as more important kinds of learning activities. International faculty look forward to a day when the state's approach to assessment supports the kind of successful teaching and learning they have developed. (pp. 165-66)

Chapter 5, on the use of the Primary Language Record at P.S. 261, says little about how student performance compares when viewed through the alternative lenses of the PLR and standardized tests. One P.S. 261 teacher is quoted as saying that the PLR provides her with concrete information that is more useful than standardized test scores in guiding her teaching (p. 186). At the end of Chapter 5, in the section "Lessons for Using Assessment like the Primary Language Record," the authors suggest:

Perhaps the major challenge is how to lessen the grip of traditional standardized tests on teaching, so that the tension between the values and goals of these different approaches to learning are [sic] minimized. The barrage of tests given each year to students in New York City public schools (and many other school systems) reflects views of literacy development that conflict with those that shape the practices of the Primary Language Record. (p. 196)

The final case study, that of the Bronx New School in Chapter 6, again presents a contradictory answer to the question of whether student learning as revealed through authentic assessment will also appear in standardized test results. In the case of Akeem, the answer appears to be no:

In spite of the changes that took place in Akeem's learning, his standardized test scores did not improve dramatically over the three years that Bronx New School teachers knew him. Although they increased slightly each year, he essentially remained a low test scorer. The tests did not reveal what a thinker and questioner he had become; what a risk-taker he was; what an inventive, artistic sculptor and drawer he was; what a gentle, funny, considerate person he could be. They did not demonstrate his progress or give information that would support further teaching for him. They did not show, for instance, that over time he had tapped into many more reading strategies than he had utilized before; he was able to read a wider range of material with greater success. (p. 223)

Later in Chapter 6, however, we are told that students at the Bronx New School "actually performed quite well" on standardized multiple-choice tests:

The areas in which they showed the greatest strength were the sections that allowed them to demonstrate their abilities to "problem-solve" in mathematics and to make sense of text in a holistic way. The older the children, the better the scores. It seemed that as children progressed through the grades, the limiting format of the tests became less of a hindrance in allowing them to demonstrate what they knew and what they could do. (p. 246)

So in the end, Authentic Assessment leaves us with a repeatedly mixed message as to whether, or under what conditions, student learning documented through various forms of "authentic assessment" (such as portfolios, debriefings, exhibitions, written evaluations, and conferences) will also be apparent in standardized test results. Why is it that International High School students do well on the New York Regents Competency Tests, but regarding Hodgson high school students we are told that "it would be unreasonable to expect the Senior Project to significantly influence Hodgson's standardized test scores?" And in the Bronx New School, why did Akeem remain "a low test scorer" over a period of three years, even though "over time he had tapped into many more reading strategies than he had utilized before; he was able to read a wider range of material with greater success," whereas other students at the Bronx New School "actually performed quite well" on standardized multiple-choice tests?

To be fair to Darling-Hammond, Ancess, and Falk, they tell readers right from the start that they are not going to address such matters systematically-that is, that they do not intend to investigate the effects of authentic assessments "on student performance as captured by other measures." But these matters deserve more attention than they have been given in Authentic Assessment, because on two important points Darling-Hammond, Ancess, and Falk are in one instance simply dead wrong and on the other ironically myopic.

First, they are wrong in saying that standardized multiple-choice tests represent an outmoded technology that is soon to be replaced by more authentic forms of assessment (pp. 165-66). By all the signs I can read, standardized multiple-choice tests are going to be with us and our schools for the foreseeable future. Indeed, despite the upsurge of interest in and enthusiasm for performance assessment in the early 1990s, there have been several prominent signs in the last few years of cutbacks in performance assessment as part of large-scale assessment programs. In California, that state's much publicized California Learning Assessment System (CLAS) came under fire for technical deficiencies and political correctness, and then was aborted when funding for the program was vetoed by the California governor. Plans for the National Assessment of Educational Progress call for scaling back the use of performance assessments so as to allow wider content coverage via multiple-choice items. And in Kentucky, which in the Kentucky Instructional Results Information System (KIRIS) has had one of the most widely watched statewide systems of alternative assessment, "multiple-choice items will be added for direct assessment of knowledge and skills in all subject areas" in 1996-1997, while "performance events will not be part of the school and district accountability calculations for 1996-98" (Salyers, 1996, pp. 10, 11).

On the second point, Darling-Hammond, Ancess, and Falk are ironically myopic in conveying the message that the various forms of assessment they deem authentic are intrinsically better than standardized multiple-choice tests. One of the reasons that these authors prefer authentic assessments is their "contextual authenticity" (p. 263), by which they mean that assessments are embedded in meaningful learning experiences for students and in the educational culture of particular schools. In these regards, Darling-Hammond, Ancess, and Falk are quite correct in arguing that when "members of school communities [are] engaged in development and use of the assessments" (p. 253), such involvement can help schools become learning communities in which teachers and students become "both problem-framers and problem-solvers, taking hold of the destiny of their school and collectively steering its course to enhance student success" (p. 255) and can help "transform school from a bureaucratic institution into an educational community, bound by commonly constructed goals, values, beliefs and commitments about how children should be educated and how schools need to be organized to achieve these goals" (p. 262).

But what is largely ignored in Authentic Assessment is that there are other contexts, outside of individual schools, in which assessment plays an important role. Indeed, it is no accident that standardized testing has grown in prominence in the educational system of the United States as schooling grew from a largely local enterprise in the nineteenth century to a state and national undertaking in the twentieth. One of the key reasons standardized testing seems to have grown so dramatically in the United States in this century, not just in education but also in business and the military, is that it fits so well with key aspects of the bureaucratic ideal-for instance, that individuals should advance in society not because of their family connections or who they know, but because of merit, objectively determined. And more concretely, one of the biggest spurs to standardized testing in schools was the passage of the Elementary and Secondary Education Act of 1965, with its mandate for evaluation of the effects of federally funded Title I programs (See Haney, Madaus, & Lyons, 1993, chap. 5, for a discussion of forces powering the growth of testing in the United States in this century).

While it is tempting to try to dismiss such forces as the bureaucratic intrusions of big government into educational matters, to do so would be a mistake. As the preceding paragraph suggests, even a modest historical perspective shows that the bureaucratic ideal presently gets more of a bum rap than it deserves (indeed in recent political discourse, "bureaucracy" seems to have become almost an epithet). And education is a key function of state government and states have a responsibility to assess how well students are progressing, not just in terms of schools' own internal assessments, but also in terms of external assessments developed outside of individual schools. External assessments provide important checks and balances to schools' homegrown internal assessments. And in this regard it is notable how seemingly universal-even in states such as Kentucky and Maryland, with well-developed state assessment systems-is the practice of having students assessed also on nationally developed tests to provide a gauge of how students are progressing in the basic academic subjects in terms of national norms and standards.

To be clear, I concur completely with Darling-Hammond, Ancess, and Falk that in recent decades-particularly the 1970s and 1980s-there has been an overemphasis on standardized multiple-choice test results in American schools. But my view is that schools ought to employ assessments that span the full range of modes by which we want students to be able to demonstrate their learning, including speaking, writing, performing hands-on tasks, and answering multiple-choice tests. Some such assessments may well be developed within particular schools. Indeed my own experience and that of Boston College colleagues Mike Russell, Cengiz Gulek, Ed Fierros, and Amie Goldberg over the last five years in helping schools tailor their own school wide assessment leads me to concur completely with Darling-Hammond, Ancess, and Falk that involving members of school communities in developing their own school wide assessment systems and involving them in collaboratively judging student work can be a very valuable vehicle for helping them take hold of the destiny of their school and collectively steer their own course to enhance student learning.

But our experience also suggests that it can be extremely valuable to schools to help them try out externally developed assessments, including multiple-choice test questions that have been used elsewhere. Discussions about student learning are often deepened, we have found, when one form of assessment (such as answering multiple-choice science questions) shows students to be doing well, whereas another (such as solving open-ended science tasks cooperatively) shows students to be doing poorly. Thus, I really would have liked to see more discussion of why it was that Akeem remained "a low test scorer" even while the Descriptive Review procedures at BNS showed him to be making substantial progress as a reader.

Authentic Assessment provides little insight into unlocking this puzzle regarding Akeem. But in pondering what might explain the puzzle, I was reminded of a concrete example of the value of pursuing anomalous assessment results in the cases of individual students. It is a small study undertaken several years ago by John Cawthorne of Boston College and the National Urban League (Cawthorne, 1990).

Around 1990 the Boston School Committee mandated a policy requiring that all high school students attain a particular score in reading on the Metropolitan Achievement Test (MAT) in order to graduate from high school. Preliminary results of MAT scores from the fall of 1989 indicated that 40 percent of high school seniors would not meet the new requirement (and over 50 percent if Boston's exam schools were not included). What Cawthorne (1990) did was to interview students from two Boston high schools to find out why many "good" kids who "were doing everything their schools asked of them" had failed the test. In the interviews he also asked students to read for him. He found that many of the students were minority and/or bilingual students (some of whom had already been accepted to college). Though all students he interviewed could read, some said that they "did not test well" or read English so slowly that they had been unable to finish the test in the time allotted (though by working long hours on their school assignments they were able to do well in school). Other students simply had not been given reasonable notice about the importance of the test and had not taken it very seriously. What Cawthorne showed by delving into individual cases was that there were some understandable reasons why some students did well in terms of school grades but still performed poorly on standardized tests. (And Cawthorne's study also helped deep-six the proposed reading test graduation policy.)

In short, my experience as well as that of colleagues in helping schools to actively use different forms of assessment (including multiple-choice questions, writing tasks, and performance tasks) indicates that, at least in the short term, different forms of assessment often provide quite different pictures of how well students are learning. (And incidentally, just as Akeem's teachers found that art provided an effective means for his expression and learning, we have found that highly unusual and provocative views of life in schools-often quite different from those afforded by any of the sorts of authentic assessment discussed by Darling-Hammond, Ancess, and Falk-can be derived from student drawings of their teachers.) But it seems to me a disservice to members of school communities (and likely to fall on skeptical, if not deaf, ears anyway) to tell them that multiple-choice tests represent an outmoded form of assessment that will soon be replaced by more "authentic" forms of assessment. Instead, educators should be encouraged to dig beneath the surface of the contrasting pictures of student performance that may be seen through alternative forms of assessment.

Lest these last few comments should seem to overly qualify my essentially very positive evaluation of Authentic Assessment in Action, let me close this review by reflecting on two questions raised by this volume that may be much more important than debates about the merits of alternative forms of assessment. The first is whether there is a critical size for schools. The second is whether there has been an overemphasis on cognitive learning in recent school reform initiatives.

Is there a critical size for schools? Reading about the schools profiled in Authentic Assessment, one is struck by how unusual they are in many respects-for instance, with teachers investing huge amounts of time in professional deliberations with one another and with extensive and on-going communications among schools, students, and families. But one simple structural feature of all five schools is that they are relatively small. In their closing chapter, Darling-Hammond, Ancess, and Falk comment on this feature in saying that it is important to keep schools small enough to allow and maintain a sense of community among teachers, students, and parents, and a common commitment to building on the strengths of all students. But the authors do not tell us what they mean by "small enough." While no systematic data are presented in Authentic Assessment in Action about pupil-teacher ratios, passing references lead me to estimate that in all of the schools profiled there is at least one teacher for every twenty pupils. This observation reminded me of a most unusual experimental study of class size that has garnered far less attention than it deserves, namely the Tennessee class-size study.

The Tennessee class-size project was begun in 1985. Under then governor of Tennessee Lamar Alexander, education had been made a high priority. In light of proposals to increase funding for education so that class sizes in the state could be reduced, the Tennessee legislature authorized a study in which learning would be compared in kindergarten through grade three classrooms when class sizes were randomly assigned as either small (thirteen to seventeen pupils per classroom) or large (twenty-two to twenty-five pupils).

Both standardized and curriculum-based tests were used to assess and compare the performance of some 6,500 pupils in 330 classrooms at approximately 80 schools in the areas of reading, mathematics and basic study skills. After four years, it was clear that the smaller classes did produce substantial improvement in early learning and cognitive studies and that the effect of small class size on achievement of minority children was about double that observed for majority children, but in later years it was about the same. (Mosteller, 1995, p. 113)

A follow-up study, begun in 1989, examined whether children who had been in smaller classes in the early grades continued to perform better than their grademates after they returned to normal-sized classes in the upper elementary grades. Results showed a lasting benefit to those who were in the small-size classes during their early school grades.

Now the Tennessee class-size study was very, very different from the inquiry undertaken by Darling-Hammond, Ancess, and Falk. The former was based on a randomized experimental design, for example, whereas the latter clearly drew on ethnographic traditions of observation and interviews in natural settings. And of course, the Tennessee study employed assessments that Darling-Hammond, Ancess, and Falk would seem to deem inauthentic. But the strikingly common finding from these two very different studies is that small class size-or since some of the schools studied by Darling-Hammond, Ancess, and Falk were not organized into traditional classes, low pupil-teacher ratios-with fewer than twenty students per teacher, can lead to substantial gains in student learning.

What is more, Darling-Hammond, Ancess, and Falk point out a simple strategy by which pupil-teacher ratios can be reduced. Schools can organize themselves for more personalized and intensive adult-student relationships, they write, by reconfiguring themselves "into smaller self-contained learning clusters" so that "the need for non-teaching personnel, such as deans, assistant principals, guidance counselors, attendance officers, and pull-out program staff, diminishes" (p. 264).

But even more important than such suggestions about school organization is an implicit message in the cases profiled in Authentic Assessment about the aims of education. Most school reform initiatives of the last decade or so seem to have been led by academics. And academics frequently suffer from what might be called "cogniphilia"-a love of things cognitive. The very first principle of the Coalition of Essential Schools, headed by Theodore Sizer of Brown University (who incidentally wrote the foreword for Authentic Assessment), refers to schools' "central intellectual purpose" (p. 16). The implicit assumption seems to be that schools' central aim ought to be to make students smarter, to improve their intellects. But one common feature of all five schools profiled in Authentic Assessment that struck this reader is their aim to engage not just students' minds, but their hearts and hands as well. At CPESS, students have to complete portfolios not just in the academic subjects, but also in school/community service and internship, and ethics and social issues. At Hodgson Vocational Technical High School, much of the school's power for engaging students seems to derive from involving them in vocational pursuits in shop projects and in work experiences outside of school. And one of the three interdisciplinary programs at International High School focused on personal and career development including internships in which students work outside the school four days per week. All of which leads me to suggest that we cannot make a fair or full appraisal of the success of these, or any other schools, by focusing only on students' cognitive learning (regardless of how it is assessed). Instead of looking just at development of minds, we need also to weigh how hearts have evolved and hands have been engaged. For as Francis Bacon reminds us:

Consider what are the true ends of knowledge, and seek it not either for pleasure of mind or for contention, or for superiority to others, or for profit, or fame, or any of the inferior things; but for the benefit and use of life; and that they perfect and govern it in charity. For it was from lust of power that the angels fell, from lust of knowledge that men fell; but of charity there can be no excess, neither did angel or men come in danger of it. (quoted in Ravetz, 1971, p. 436)

This review has now wandered far afield from assessment. And so in the end I must conclude that Darling-Hammond, Ancess, and Falk most likely did not undertake any conscious sleight of pen in naming their volume Authentic Assessment in Action. Rather, it is testimony to the richness of the cases they have profiled that in their reading we are led to contemplate matters ranging from the aims of education and the appropriate size for schools to the nature of bureaucracy. And it is a reflection of a point made some years ago in a social history of standardized testing in the United States:

To make sense of the last century of reasoning about testing, as Cohen and Rosenberg (1977) observed about the social functions of schooling, we must attend not only to the strictly instrumental functions of tests [and now of assessment] but also to their expressive qualities, such as feelings, values and images expressed. In this regard, it seems clear that attitudes and debates over testing are affected not just by prevailing social conditions and political attitudes but in particular by attitudes toward the social role of schooling. (Haney, 1984, p. 641)

Thus Authentic Assessment in Action, like many recent debates about the virtues of different forms of assessment, beneath the surface harbors a brief for a particular view of the social role for schooling. In this case it is one that is democratic, emancipatory, and inclusive of all children. But make no mistake about the likelihood of settling disputes about the merits of alternative forms of assessment. As long as we have contrasting opinions as to the appropriate social role for schools, and as long as competing roles have different emphases at different levels of schooling (for instance, there seems to be broad social commitment to Darling-Hammond, Ancess, and Falk's view of schooling at the elementary level, but at the secondary and postsecondary levels, the sorting and selection functions of schooling come to the fore), debates about alternative forms of assessment will continue to ebb and flow.

I would like to thank Diane Joyce, Joe Pedulla, and Anne Wheelock for helpful comments on a draft of this review.

(1) In the interest of candor it should be noted that while all the authors of Authentic Assessment are now affiliated with the National Center for Restructuring Education (NCREST) at Teachers College, Columbia University, one was previously directly affiliated with the Coalition of Essential Schools. Before going to NCREST, Ancess was the "Director of Secondary School Change Services for the Center for Collaborative Education, the New York City Affiliate of the Coalition for Essential Schools" (p. 275).
And on the topic of candor, it should be mentioned that this reviewer is credited in the acknowledgments of Authentic Assessment with having provided a helpful review of earlier drafts of some of the case studies presented in this volume. To be candid, I cannot recall exactly what I said in reviewing such drafts. I can only hope that it was nothing terribly inconsistent with my current review of this volume.
(2) It is worth mentioning that there may in fact be a substantive reason for standardized test math scores to be more susceptible to curriculum effects than verbal test scores. Math skills, unlike verbal skills, are learned almost exclusively in school. Thus it is not surprising that research on the effects of test preparation and coaching quite consistently has found larger effects on math than on verbal tests (Becker, 1990).


Archbald, D.A., & Newmann, F.M. (1988). Beyond standardized testing: Assessing authentic academic achievement in the secondary school. Reston, VA: National Association of Secondary School Principals.

Becker, B.J. (1990). Coaching for the Scholastic Aptitude Test: Further synthesis and appraisal. Review of Educational Research, 60(3), 373-417.

Cawthorne, J. (1990). "Tough" graduation standards and the good kids. Unpublished paper. Chestnut Hill, MA: Center for the Study of Testing, Evaluation and Educational Policy.

Cohen, D.K., & Rosenberg, B. (1977). Functions and fantasies: Understanding schools in capitalist America. History of Education Quarterly, 11, 113-117.

Haney, W. (1984). Testing reasoning and reasoning about testing. Review of Educational Research, 54(4), 597-654.

Haney, W., Madaus, G., & Lyons, R. (1993). The fractured marketplace for standardized testing. Boston, MA: Kluwer Academic Publishers.

Mosteller, F. (1995). The Tennessee study of class size in the early school grades. Critical Issues for Children and Youth, 5(2), 113-127.

National Association for the Education of Young Children (NAEYC). (1988). NAEYC position statement on developmentally appropriate practices in the primary grades, serving 5 through 8 year olds. Young Children, 43(2), 64-84.

Ravetz, J. (1971). Scientific knowledge and its social problems. New York, NY: Oxford University Press.

Rudner, L.M., & Boston, C. (1994). Performance assessment. The ERIC Review, 3(1), 2-12.

Salyers, F. (1996). Kentucky continues to improve assessment. Kentucky Teacher, pp. 10-11.

Wiggins, G. (1989). Teaching to the (authentic) test. Educational Leadership, 46(7), 141-147.

Cite This Article as: Teachers College Record Volume 98 Number 2, 1996, p. 328-345
https://www.tcrecord.org ID Number: 9631, Date Accessed: 1/24/2022 8:04:34 PM
