The Science and Politics of National Educational Assessment
by Martin T. Katzman & Ronald S. Rosen - 1970
The authors provide an overview of the political and scientific factors pertinent to the "rise and fall" of National Assessment.
At a time when, for rather different reasons, President Nixon and Dr. Kenneth Clark are proposing that educational "output" be evaluated, we think attention should be paid to the strange history of the National Educational Assessment, initiated in 1964. In this article, Professor Katzman of the Harvard Graduate School of Education and classroom teacher Ronald Rosen provide an overview of the political and scientific factors pertinent to the "rise and fall" of National Assessment. Their essay is an outgrowth of Harvard's 1968-69 Seminar on Education and Public Policy; and they wish to thank Walter McCann for introducing them to the issues involved in national assessment, and both Walter McCann and David K. Cohen (who sent their manuscript to the Record) for their comments on an earlier draft.
The American people today expect more of American education than ever before. At such a time isn't it clear to all of us as educators that what we don't know CAN hurt us?
- Francis Keppel, then Assistant Secretary of Education1
Without social indicators on the results of the educational process, the Federal Government cannot know where its financial help is needed most.
- Wilbur Cohen, then Secretary of Health, Education, and Welfare2
It is a commonplace that a large and increasing share of the nation's resources, human and financial, are being devoted to the schooling of its children. Were schooling simply a consumer service like getting a haircut, there would hardly be such anxiety and conflict over the state of American education. Schooling assumes tremendous importance because both the public and the experts perceive it as an investment, preparing youngsters for adult roles, and perhaps more importantly, as an agency for solving social problems in the areas of race relations, poverty, and technological unemployment.
For decades we have had considerable information about the inputs into the educational process (enrollments, teacher salaries, class size, etc.) but little systematic information on its outcome—i.e., how much children know or what they learn at school. With the express purpose of gathering information useful in formulating policy, a continuing program of National Educational Assessment (NEA) is being undertaken, at a cost of some $3-4 million per annum. Over a three-year period, more than one hundred thousand individuals per year will be tested in ten subject areas. In 1969-70, science, citizenship, and writing will be assessed; in 1970-71, music, mathematics, literature, and social studies; and in 1971-72, reading, art, and vocational education. In this essay we consider the factors which led to the NEA program as currently conceived and evaluate the likelihood of its serving the intended purposes.
I. THE NEED FOR EVALUATION IN EDUCATIONAL POLICY
Americans are the great quantifiers, gauging the nation's progress in terms of numbers, indices, and rates of growth. No clearer manifestation of this tendency can be found than in the flurry of emotion and activity in the newsrooms and stock markets which follows the publication of the latest consumer price index or income estimates. In the regulation of aggregate economic activity, such numbers perform a highly useful function, indicating: 1) how well we are doing, as opposed to other countries or other time periods; and 2) the directions in which fiscal and/or monetary policy should move to attain particular economic objectives. While partisans of conflicting hue may disagree as to the relative importance of various macroeconomic indicators—inflation, unemployment, growth—all partisans find these measures useful guides to behavior and none would claim to be better off without this information. Such indicators assume usefulness because most observers share a common view of reality which relates policy to outcomes. In fact, the development of this shared view has been facilitated by an analysis of these indicators.
Systems Analysis in Education
Unlike the economic dimension of society, progress in the area of education has generally been measured by inputs (expenditures, etc.) rather than by outcomes. Because such measures do not tell us how well we are doing or where we should go, there has been a renewal of interest in systems analytic techniques in education, in emulation of its "success" in the area of defense.3 (The techniques have been successful in defense perhaps only in the sense that they are used, and not necessarily in the sense that they promote "better" decisions.) Systems techniques (variously known as operations analysis, cost-benefit, cost-effectiveness, and programming-planning-budgeting systems) focus on two aspects of policy. First, they attempt to define programs in terms of outcomes as opposed to inputs. Second, they attempt to specify the input-output or policy-performance relationship.
In the area of education, it would be interesting to know, for example, what percentage of 17-year-olds can comprehend a given newspaper paragraph or how many 9-year-olds can solve a particular problem in multiplication. It would further be interesting to know whether the average American youngster of today is a better reader or mathematician than those of a decade ago. Finally, it might be interesting to know whether certain subpopulations (economically, ethnically, or regionally defined) perform below the standards expected in our society.
While interesting, such information alone does not help us in policy formation. We must have prior knowledge of the effect of our policy variables (such as curriculum, class size, teacher qualifications) on these measures of outcome.
Difficulties of Educational Survey Research
Although school systems have been testing their students for decades, there have never been comprehensive measures on a national basis which permit the gauging of progress over time, a comparison among subpopulations, and most importantly the guiding of policy. Two programs which came close to this ideal were the one-shot Equal Educational Opportunity (E.E.O.) Survey, popularly known as the Coleman Report, and the longitudinal Project TALENT, both of which were initiated by the U.S. Office of Education. These sample surveys provided some information on how much children know, what kinds of schools they attend, and what characteristics their families and peer groups possess.
In principle, these are the kinds of data necessary to evaluate the effects of educational policy differences. Following the canons of experimental design, one can test the impact of a particular policy variable—e.g., varying class size —by comparing the performance of youngsters of similar background whose schools are similar in all respects but class size.
In addition to the many methodological shortcomings for which the E.E.O. Survey and TALENT have been criticized,4 a basic difficulty with educational survey research is that social reality refuses to provide data in a pattern tailored to the requirements of experimental design. First, schools simply do not differ enough, at least on the characteristics which are most measurable or manipulable. For example, there are too few schools with extreme class sizes, say 10 and 50, which are similar in all other respects for us to test non-experimentally the effects of radical policy changes in class size. Second, school and student characteristics are so highly correlated that it is difficult to find schools which differ in only one major respect. For example, it is difficult to find upper class children attending small classes who have poorly trained teachers; most such children have highly trained teachers. Consequently, we cannot distinguish among the effects of various school characteristics. These two difficulties are not obviated by larger or better chosen samples.
The Thrust towards National Assessment
These generally recognized limitations of educational research notwithstanding, the wheels were set in motion for a continuing national program of testing with the express purpose of improving policy making. To allay fears of federal encroachment in local affairs, the Office of Education encouraged the Carnegie Corporation to set up the Exploratory Committee on Assessing the Progress of Education (ECAPE)5 in 1964. Under the direction of Ralph Tyler, a distinguished psychologist and then director of the Center for Advanced Study in the Behavioral Sciences, this committee set some rather far-reaching goals. It hoped to design a system of assessment with two major characteristics:
First, it would give the nation as a whole a better understanding of the strengths and weaknesses of the American educational system. Thus, it might contribute a more accurate guide than we currently possess for allocation of public and private funds...
Second, assessment results, especially if coupled with auxiliary information on characteristics of the various regions, would provide data necessary for research on educational problems and processes which cannot now be undertaken (italics ours).6
II. THE RISE AND FALL OF NATIONAL EDUCATIONAL ASSESSMENT
The thrust towards national assessment, embodied in ECAPE, was not without opposition. The changing nature of the criticism and debate that the Assessment program met in its development arouses suspicion. Although some of the opposition to the project was basically political, a good deal of it was based on technical objections. Curiously, it seems that these objections were quelled without any changes in the substance of the Assessment program.
Many of the very early disagreements were based on unreliable facts or incomplete information about ECAPE's plans. When educational administrators became informed of the true nature of the program, and took part in its development, many of their fears were allayed. Unfortunately, meaningful criticism dissipated simultaneously. It is worthwhile and interesting to review how an institution for disseminating "pure" knowledge affects and is in turn affected by the phenomenon it intends to describe—in this case, the educational enterprise.
The Opposition Phase
At its inception, ECAPE's membership consisted primarily of behavioral scientists and administrators with previous interest in educational measurement, obvious supporters of the program. Two important professional interest groups, in particular, the American Association of School Administrators and the National Education Association, were not represented.
When ECAPE later moved from research and pilot testing to a design for national operations, this one-sidedness proved to be its major stumbling block. The program came under heavy attack from the American Association of School Administrators (AASA), which felt it legitimately deserved a role in a program impinging upon its area of professional concern, a role which it had been in fact, if not in intent, denied. While exclusion from participation may have been the source of a good deal of the suspicion and animosity cast upon the Assessment program, it must be remembered that at least some of the objections raised at that time were valid and worth the consideration of the ECAPE staff.
Many of the more valid technical objections were raised at the 1965 White House Conference on Education, at which one panel discussed the advantages and disadvantages of national assessment.7 Among the objections, some of which are discussed in greater detail below, were that assessment would tell us little more than current standardized tests, that memorization and conformity would be rated above understanding and creativity, and that the information might be misused.8
AASA opposition accumulated to the point that its executive committee resolved not to cooperate with the Assessment. In effect, this meant that the AASA was boycotting the project and it was likely that ECAPE would not be allowed to administer its tests in the schools.
Political Grounds for Opposition
Rather than perceiving an attempt at national evaluation as a neutral technological advance that would have benign effects on all concerned with education, grass-roots administrators were gripped with fear on three major grounds: 1) redirection of educational activity; 2) federal control or homogenization; and 3) invidious comparison.9
First, there was the fear that local school systems would redirect teaching to improve the scores of students on the nationwide test. The precedent for this fear is the perceived redirection of high school curricula to improve students' College Board scores. A corollary was that tremendous pressure would be placed on students to perform well.
Second, there was the fear that national standards of performance would be imposed, supplanting state controlled input standards. At the extreme there was fear that participation in national testing programs might be necessary to receive federal aid.
Third, there was the fear that local or state schools would appear in unfavorable light as compared to other systems. This in turn might shake public confidence in an already vulnerable educational establishment.
The Interests of the Public vs. the Professionals
These fears, which stem from the occupational interests of the educators, were not necessarily in the best interests of the clientele. Basically, educators are unwilling to subject the results of their enterprises to public scrutiny, since as marginal professionals they wish to be judged only by peers.10 Neither do they wish to be held accountable for problems which they can do relatively little about.
In the absence of performance measures, it is not only difficult for parents to determine how much their children know, it is also difficult for educators to determine their teaching success. Only if testing instruments were arbitrary or invalid (i.e., unrepresentative of things we expect people to know in the real world) should the public or teachers find them objectionable. If the test sampled what society deemed useful knowledge, teaching for the test would become compatible, indeed synonymous, with a "good education."
The fear of national standardization is somewhat spurious. Because school administrators tend to share a common professional subculture and because there is consensus, at least within social classes, as to what characterizes good schooling, there are tremendous similarities among the nation's school systems. In fact, the similarity of controllable aspects like teacher salary, class size, facilities, is so great that survey research on the effect of such differences becomes nigh impossible, as suggested above. In addition, local school systems already tend to be constrained by state standards and requirements and are likely to become more so as state aid assumes greater prominence.
Finally, the fear of invidious comparison at best saves face for the educators and at worst abuses the students. Regardless of who is embarrassed by publicity on poor performance, parents have a right to know how their children stand so that corrective measures can be applied if necessary. Hiding the community's head in the sand is maladaptive, for the labor market and/or the colleges these students enter are much more ruthless evaluators than a battery of tests.
The Co-optation Phase
The legitimacy of superintendents' fears notwithstanding, these were political facts which the U.S. Office of Education and ECAPE had to cope with when they began to promote national assessment. The ability of superintendents to stymie any comprehensive testing program in their jurisdictions was amply demonstrated in the refusal of many big cities to participate in the Equal Opportunity Survey undertaken by Congressional mandate.11
At this point, ECAPE realized that its project was in jeopardy and was forced to consider changes to ease the situation. At least two roads were open to ECAPE: change the details of the program to be more acceptable to the opponents, or invite the opponents to join the committee. ECAPE took the latter course, doubling the membership of the committee (and dropping the "Exploratory" from its name, becoming CAPE) and changing its chairman (the new chairman, George Brain, is a past president of AASA). After this change, the opposition began to dwindle. In fact, William H. Curtis, who was president of AASA when it was still the loyal opposition, commented, "I think we should give encouragement to this committee because its responsibilities are great and its long-range impact is tremendous... (ECAPE) is now on the right track."12 Despite this pronouncement, there has been no evidence since the reorganization of any substantive change in the evaluation program. Apparently having "a piece of the action" was enough to mollify the opposition.
It is curious and surprising that the enlarged committee has not made any changes in the Assessment. Possibly ECAPE decided to enlarge the committee at such a late date that all the important decisions had been irreversibly sealed, but still early enough to diminish the resistance of school administrators, a necessity for the actual testing to begin. It is also possible that the original supporters bargained with the prospective members of CAPE to be allowed to conduct one cycle of testing before making any changes; financial reasons alone might have forced such an understanding. In either event, two facts are clear: no effective criticism of Assessment remains, and the program which began administering tests in April 1969 maintains its original form in all essentials.
Except for the tactical error of limiting its original steering committee to its strong supporters, ECAPE made few political mistakes. This seems to be particularly true of the detailed plans for the Assessment, i.e., the research design. It appears that ECAPE, in a desire to avoid critics as well as criticism, decided to make the Assessment "politically harmless." It is our contention that in so doing they have come close to rendering it educationally useless.
Just how did ECAPE manage to evade these issues? Dr. Banesh Hoffmann, author of The Tyranny of Testing, explains, "In the purely political sense this program is brilliantly conceived. It will not tread on the toes of any individual, simply because no student, teacher, or school will be individually rated. Only a small sampling of students will be tested, and none of these students will be subjected to more than a small sampling of the total evaluative procedure."13 ECAPE found that statistical sampling procedures had been developed to the point where testing a small fraction of the nation's students could yield accurate data for the whole population.
What this plan means for a particular school is that only a few students would be involved, and even then for only a half hour. No individual scores would ever be reported since the sampling procedure would make this information meaningless. Therefore students and teachers would feel little pressure to prepare for the examination or feel burdened by it. Even so, it is conceivable that a state, facing the possibility of being rated in comparison to other states, would apply pressure to the schools and teachers. In response to this possibility, ECAPE decided that the smallest geographical area for which data would be reported would be a quarter-country region; that is, data will be reported as relevant to the Northeastern, Southeastern, Central, or Western regions, or to the nation as a whole.
One should not get the impression that the technicians on the CAPE staff were solely motivated by political considerations in establishing their sampling procedures. In preparing to administer the test, CAPE made useful contributions to the methodology of testing by carefully controlled experiments. They found, for example, that the locale of testing (in-school versus out-of-school) had no effect on test results.14 These technical advances, however, concern us less than what can be learned from the results.
Regions are but one of the subpopulations for which the test results will be reported. Additionally, each participant will be classified by sex, one of two socioeconomic background levels (rich or poor, with the demarcation line set at some poverty index), four types of communities (urban, suburban, smaller city, rural), and age (the tests will be given to four age groups: 9, 13, 17, and young adult, ages 26 to 35). (The ECAPE public reports originally included race as a dimension with the choices white, Negro, and "other," but the more recent technical articles do not mention it, probably in another move to avoid invidious comparisons.) Thus CAPE might report that 35% of the male 17-year-olds of lower socioeconomic status from the Northeast were not able to read and understand a particular reading passage.
One could hardly quarrel with the age groupings (9, 13, 17, 26+), sex, or regional divisions (Southeast, Northeast, Midwest, Far West). These, however, are not dimensions along which disparities are of prime public concern. The most excruciating cleavages are along lines of social class and race, or more properly, ethnicity, neither of which is adequately measured in this study.
Community breakdowns (urban, suburban, small city, rural) are not of intrinsic interest, but really reflect possible differences in industrial structure and hence socioeconomic composition. If class is what we are really interested in, it seems rather inefficient to mask class divisions by community labels.
Since only two socioeconomic levels are to be reported, CAPE seems no more willing than the average American to recognize the fine grain of the American class structure with its many subcultures. The differences in attitudes toward education between the working class and the middle class, holding income constant, are immense.15
Regarding ethnicity, CAPE is not following the striking breakthrough of the Coleman Survey in classifying students as white, Negro, Mexican-American, Puerto Rican, Oriental, and American Indian. Coleman's ethnic categories break through the usual white-nonwhite dichotomies which hide the fact that Japanese and Chinese are more like Caucasians in their socioeconomic status while statistically white Mexican-Americans and Puerto Ricans are more like Negroes. In ignoring substantial cognitive differences among Caucasians,16 CAPE hews to a dubious tradition.
When broken down on these five dimensions, the tested population comprises 256 subpopulations (= 4 regions x 4 communities x 4 ages x 2 sexes x 2 socioeconomic levels). Given the sample size, mode of testing, and number of subpopulations, there would be an average of 300 in each sample cell. In other words, there would be, for example, about 300 Northeastern, urban, male, poor 17-year-olds for whom there are test data in any year. Clearly, the number of cells could be increased manifold to give finer class-ethnic breakdowns without compromising the reliability of the sample estimates too seriously.17
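The cell arithmetic above can be checked directly. The sketch below is ours, not part of the Assessment's materials; it simply multiplies out the five reporting dimensions, and (assuming the roughly 300-per-cell average stated above) shows how many observations stand behind each reported measure:

```python
# Reporting dimensions described in the text (illustrative only).
regions = 4      # Northeast, Southeast, Central, West
communities = 4  # urban, suburban, smaller city, rural
ages = 4         # 9, 13, 17, young adult (26-35)
sexes = 2
ses_levels = 2   # "rich" or "poor", split at a poverty index

cells = regions * communities * ages * sexes * ses_levels
print(cells)  # 256 subpopulations

# At an assumed average of 300 tested individuals per cell,
# a single reported measure rests on about 76,800 observations.
print(cells * 300)  # 76800
```

Doubling any one dimension (say, four socioeconomic levels instead of two) would halve the average cell size, which is the trade-off the authors have in mind when they say finer breakdowns are feasible.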
The Goals of Education
In developing the assessment materials, CAPE avoided the pejorative "tests" and adopted the euphemism "instruments." Almost from the beginning it was decided that assessing education meant testing the students in a broad range of subject areas. Up until now, CAPE has chosen ten academic areas that will be covered in the assessment: reading, writing, science, mathematics, social studies, citizenship, music, literature, vocational education, and art. CAPE's publicity states that more areas will be added later, but the prospect for this seems limited by the plan to administer the tests in a three-year cycle, covering three or four of the areas each year. The significance of the decision to test these ten areas is elusive; although the list sounds complete, it assumes that the only thing necessary to assess education is to assess music, writing, science, and so forth. This assumption is based on the educational philosophy that the learning process consists of the absorption of specific material.
In January 1966, the Executive Committee of the Association for Supervision and Curriculum Development, wary of the plans for National Assessment, published a statement of "Guidelines for National Assessment of Educational Outcomes." One of its criteria that any assessment program must meet is:
Adequate assessment also requires exploration of learning in depth. Learning may vary from superficial 'knowing' to effective, efficient 'behaving.' It is not enough that schools produce students who 'know' better. The only valid criterion for effective learning is whether the student behaves differently as a consequence of having participated in the process. Proper assessment must be directed to the deeper questions of effective behavior.18
In limiting assessment to traditional curriculum areas, ECAPE had to develop the "goals of education" in each of the ten areas. ECAPE asked one of its contractors to work with teachers and curriculum specialists from each subject field in listing objectives that met these three requirements:19
It is implied, but nowhere stated, that the curriculum specialists consulted by the contractors were expected to test possible objectives on the first criterion and that teachers were asked to cover the second requirement. To meet the third criterion, the results of the first two groups were submitted to various panels of laymen assembled by ECAPE for the purpose. These laymen were asked to review the lists, making sure the objectives were important for children to learn. In practice, this meant revising the wording of a goal or possibly removing one. Hence it is doubtful that at this stage many new goals were added to the program since the laymen were charged with revising existing lists, not with creating their own.
There is an additional reason that these lay committees would not be likely to produce any new ideas, nor to disagree with those of the educators who drew up the lists in the first place. ECAPE asked national educational organizations and their affiliates to nominate people for these panels. In so doing they assembled people with a background of interest in educational matters, who are unlikely to represent fairly the majority of the lay public. To add a second limitation, ECAPE wanted "intelligent and thoughtful" laymen, which criterion alone would probably limit the participation of minority groups in this procedure.
Thus it is not surprising that the goals for a particular subject, as developed through this procedure, tend to be those of professionals in that field (including teachers). Any objective of education reflecting innovative goals would not be found on the assessment lists, since it would not be an objective "which the schools are currently seeking to attain" (italics ours).
Considering all of these tendencies to assume that the educators know best in terms of the goals of learning, it is not too surprising that the product of these efforts reads like an elementary school report card. (Remember the teacher evaluations—"tries to work efficiently" or "respects the rights of others"?)
For example, the goals for reading include such nondescript objectives as (1) comprehend what is read, (2) analyze what is read, (3) use what is read, (4) reason logically from what is read, (5) make judgments from what is read. Similarly, citizenship goals include (1) show concern for the welfare and dignity of others, (2) help maintain law and order, (3) seek school and community improvements through active democratic participation, (4) support rationality in communication, independent and informed thought, and action on school, civic, and social problems, and (5) help and respect one's family and nurture the development of children as future citizens. The goals of science contain (1) understand the investigative nature of science, (2) possess the abilities and skills to engage in the process of science, and (3) know fundamental facts and principles of science.20
The responses evoked by reading these goal lists to a sample of educators and laymen alike have varied from mild disbelief to outright laughter. ECAPE's goals of education, a blend of both the unimaginative and the chiliastic, seem to boil right down to "momism" and apple pie.
In a vain attempt to test more than specific knowledge, ECAPE included in the lists objectives relating to opinions and attitudes in addition to those assessing skills or knowledge. It certainly seems that one extra objective has been added in each field to pay lip service to the importance of understanding or appreciation. For example, as an adjunct to the three goals of science listed above, the fourth and remaining goal is "have attitudes about and appreciation for scientists, science, and the consequences of science that stem from adequate understanding." Having had four years of higher education as a physicist, the junior author finds little meaning in that statement. It reads, as do the "tacked-on" attitude questions in most of the other fields assessed, as though the "Party Line" required an attitude question.
Operationalizing the Goals
The specific instruments themselves seem to test a child's ability to memorize tidbits of information rather than any ability to process information or solve problems.
A bright urban nine-year-old, for example, may be unable to answer, "From what animal does pork come?" having never seen a pig nor eaten its meat. The nine-year-old who can "name five causes of the Civil War" may remember the list put up by his history teacher on the blackboard, without having the slightest understanding of the web of factors which lead men to solve their conflicts violently. A thirteen-year-old who "correctly" answers that "only Congress can declare war according to the U.S. Constitution" either possesses a fine sense of semantics or is unaware that 35,000 Americans have died in Vietnam.
Another set of questions attempts to measure the normative behavior of American youngsters. The correct answers on "how to elect a team captain" or "how to treat a substitute teacher" are probably those in conformity with the norms of American society.21 Without challenging these norms directly, we may ask whether the use of the results will be to enforce conformity for its own sake.
One gets the overall impression that CAPE, in its attention to details of statistical validity, simplicity of administration, and use of a quasi-scientific approach, has lost sight of its major aims. It may seem amazing that such a large undertaking could go so far astray, but this becomes understandable when viewed in the perspective of its growth. Overreacting to early opposition, CAPE has evolved to a point of considerable ambivalence with respect to its original purpose of improving educational decision-making at the local, state, and federal levels. It is quite clear that National Assessment will provide little information on the policy issues of the day: the effects of segregation, the effects of decentralization, the effects of resource or curriculum shifts. Nevertheless, considerable lip service is paid to the notion that assessment will improve policy.
As for precisely how the results of assessment can be so used, CAPE staff members are now suggesting that local schools might obtain copies of the national examinations, or even design their own based upon the CAPE experience. Presumably after comparing the local and national results, the local system could adjust its policy.
First, it is quite unlikely that local systems would have the resources to design or administer another battery of tests above and beyond those they normally administer. Second, the local-national comparison would only tell them where to go, but not how to get there. Finally, there is no mechanism for forcing states or localities to use the tests to identify, much less correct, problem areas.
III. CAN NATIONAL ASSESSMENT BE SAVED?
The National Educational Assessment Program as it stands today can be criticized on several grounds: 1) measuring questionable educational outcomes with questionable techniques; 2) classifying student subpopulations on largely irrelevant dimensions and/or in insufficient detail; 3) neglecting to collect any information on school characteristics which would identify policy-performance relationships. In principle all of these shortcomings can be remedied; however, the institutions for administering the program make such remedy unlikely. We question whether the budget for the program might better be shifted to other forms of educational research.
The dwindling opposition to NEAP on the part of the superintendents was purchased at the price of forgoing crucial information on school resources. Although one might argue that the present compromises were necessary to get a foot in the door for proper evaluation in the future, it seems more likely that the current program will become ossified, devoted to the publication of "social vindicators" of the educational establishment, to adapt a phrase of Raymond Bauer's. Since the superintendents are in a position to veto any changes which would arouse the fears discussed above, it seems unlikely that more detailed data will be collected in the future. In short, the compliance of the educational establishment with NEAP was a Pyrrhic victory.
Although the technicians who developed NEAP still hope that it can be used in policy formulation, they are tending to shift from an explicitly evaluative to an explicitly descriptive orientation, to "what is learned, not where or how."22 In other words, at most NEAP can provide a measure of educational progress analogous to the gross national product, but no tools to affect it. If National Assessment is simply to be a descriptive venture, it will duplicate, with little advantage, efforts currently undertaken by other agencies. School systems throughout the country already administer tests which come close to measuring the verbal and computational skills which are perhaps the best indicators of life chances. While different local and state systems administer different tests, it is possible to develop standards for translating scores on one test to those on another. For example, the Educational Testing Service, in one pilot survey, was able to determine the extent of cognitive progress made by elementary pupils in the last two decades.23 In addition, as high school graduation and college application become more universal, Scholastic Aptitude Tests may become more representative indicators of what high school students know.
National Educational Assessment will cost $3-4 million annually for its activities, $1 million of which are federal funds, the remainder coming from foundations. An alternative use of these funds might be experimentation to determine the effectiveness of alternative policies. While Americans tend to view human experimentation with considerable reservation, verging on horror, the obstacles to performing selected kinds of experimental programs may not be so formidable. This has already been done with some success for curriculum changes such as PSSC physics.
While we feel that comprehensive educational surveys may be desirable, despite their limitations, the particular National Assessment program of CAPE is of dubious value, with little hope for future payoff. Finally, we feel that the resources for assessment could be put to better use in serious educational experimentation.
POSTSCRIPT: THE FUTURE OF SOCIAL INDICATORS
In response to Secretary Keppel's exhortation to know itself, the educational establishment revealed something about the relationship between knowledge and power. The leaders of enterprises whose main purpose is to disseminate knowledge acted as if the dissemination of knowledge about those enterprises was indeed threatening. Although the development of indicators of performance for public enterprises is important for improving the efficiency and responsiveness of those enterprises, administrators—be they educators, physicians, policemen, or soldiers—have little to gain and much to lose from public scrutiny. Not only may scrutiny challenge the ends of those enterprises, but also, most abhorrent to the professional, the means by which those ends are achieved.
In a contest between the professional administrator and the public, mere numbers are not the sole determinant of the outcome. The professional has greater authority, greater personal stake, and greater information than the public. While the benefits of scrutiny to the public may be significant, the public is generally not organized qua consumers. Because of the asymmetry of information and authority, even delegated watchdogs of the consumers are often overwhelmed by the producers in any regulatory confrontation. The consumer is generally overwhelmed for another important reason: the dissemination of information is what economists call a "collective good"—once information becomes public, an individual cannot be excluded from consuming it, whether or not he fought for its disclosure. A consumer would rather not undertake this effort alone, since others will reap the benefits without having to fight for them. Consequently, it is difficult to get anybody to invest effort in obtaining the disclosure of this type of information.24 On the other hand, professionals are highly organized in trade associations, such as the AASA and NEA, in which mechanisms for cooperation and lobbying on pertinent issues already exist.
Perhaps one of the great problems of a postindustrial society is correcting the imbalance between highly informed professionals and the uninformed public in whose interest they are charged to act. The sectors in which the consumer is unable to evaluate the performance of the producer—national defense, education, health—compose a growing share of economic activity.
Organizing consumer lobbies for the disclosure of information will be a difficult job. If the experience of National Assessment is indicative, the future of meaningful social accounting does not seem bright.