Evaluation and Education at Quarter Century
reviewed by Peter H. Rossi - 1993
Actually, education has been evaluated for more than twenty-five years. Educational evaluation has no identifiable birth-year that can be used to calculate its age. However, the twenty-fifth anniversary celebrated in this volume is that of the passage of the Elementary and Secondary Education Act of 1965, certainly a milestone in evaluation history worth noting because that act marked the beginning of a federal commitment to assessing educational programs (and other social programs) that has continued, with ups and downs, ever since. Acting on behalf of the National Society for the Study of Education, the two editors corralled a set of first-rate authors, most of whom had written important statements at the beginning of the surge of evaluations that followed the passage of the 1965 act. The editors asked each to review what they had written and to comment on how they would change things were they to rewrite those early contributions today. The end result is a set of excellent essays, fascinating for what they tell us about how evaluation in education has changed over the last two decades and for hints about transformations yet to come.
The list of authors is studded with the stars of evaluation. The grandest old man of evaluation of education, Ralph W. Tyler, comments on his seminal 1942 paper and expands on it. Tyler's essay emphasizes the importance of studying the implementation of educational programs and of thereby specifying what is being evaluated. Michael Scriven worries in his essay whether his contributions to the lexicon of evaluation, especially the terms "formative" and "summative," have been inaccurately interpreted to mark different kinds of evaluations rather than different evaluation tasks or roles. As is usual for him, Scriven writes an essay that is provocative and rich with insights. Seemingly filling in for Donald T. Campbell, Thomas Cook looks back to what must be the most frequently cited exposition of evaluation design, by Campbell and Stanley.(n1) Campbell lent legitimacy to quasi-experimentation in that work, but, as Cook relates, the difficulties in the use of such designs, although better known today, are not fully solved. The dilemma facing the evaluator is that randomized experiments produce the most credible estimates but cannot be used for most evaluations. "Queazy-experiments," as Campbell once called them, are practical to use but difficult to defend, except under some unusual circumstances. The chapter by Boruch, another of Campbell's proteges, presents a stronger case for randomized experiments, but argues for abandoning the leviathan experiments of the seventies for families of smaller, closely related experiments from which we will learn more about how best to improve education.
Stake's chapter expresses his disillusionment with the social science forms of evaluation, noting that formal evaluations did not improve education when they were in vogue. Stake discerns a strong trend toward qualitative methods involving deeper attention to process as opposed to net effects. The drift to the qualitative is echoed in chapters by Alkin, House, and Eisner. Carol Weiss assesses the impact of evaluation on decision making, noting that evaluation results rarely settle the fate of programs, but the knowledge gained through them infiltrates and changes decision makers' views of the educational process. A final chapter by Stufflebeam reviews the state of educational standards as promulgated by professional associations.
There are a few stars missing from the list of contributors: I especially marked the absence of Donald T. Campbell and Lee Cronbach, truly giants of evaluation whose writings are referred to with great respect in almost all of the essays. Campbell led evaluators to develop field (i.e., outside-the-laboratory) versions of randomized experiments and explored the flaws in approximations to randomized experiments. In his writings, Cronbach was willing to trade off the internal validity of randomized experiments for the external validity of evaluations that would be more relevant to decisions. The difference between these two viewpoints is echoed in almost all of the chapters.
The message I draw from these essays is that the field of education has been traumatized by the events of the last two and a half decades. The Elementary and Secondary Education Act of 1965 launched an era of great promise. The federal government was going to take on some of the fiscal burden of public education and enlarge the total financial support for that institution. Evaluation, to be done in the social science mode, would show what worked and what did not, allowing education to evolve through the selective elimination of the identified unfit programs.
Alas, the last quarter century did not live up to much of its promise. Our public schools did not improve. Indeed, there is some evidence that they declined in quality--certainly many of our political leaders believe so. As initially envisaged, the evaluation enterprise also failed. Evaluations were difficult to conduct in the best social science mode as school systems held back their full cooperation. Then too, evaluations did not clearly identify either the fit or the unfit. Educational evaluators reacted by slowly abandoning the more rigorous social science evaluation modes and adopting those that brought them closer to the schools. The bearer of bad news began to find out how to deliver good news or, often, no news at all.
All that said, the essays in this volume are excellent and should be of great interest to evaluators of all types. They will not instruct the reader on how to conduct evaluations, but they will make all evaluators more truly appreciate where evaluation fits into the politics of public education.
(n1) Donald T. Campbell and Julian G. Stanley, Experimental and Quasi-Experimental Designs for Research (Chicago: Rand McNally, 1963).