How Effective is Schooling? A Critical Review and Synthesis of Research Findings
reviewed by Selma J. Mushkin - 1973
Title: How Effective is Schooling? A Critical Review and Synthesis of Research Findings
Author(s): Harvey Averch, Stephen J. Carroll, Theodore S. Donaldson, Herbert J. Kiesling and John Pincus
Publisher: Rand, Santa Monica, CA
ISBN: 0877780714, Pages: 232, Year: 1971
Dr. Selma J. Mushkin is professor of economics and director of Public Services Laboratory, Georgetown University, Washington, D.C.
How Effective is Schooling? A Critical Review and Synthesis of Research Findings
Harvey Averch, Stephen J. Carroll, Theodore S. Donaldson, Herbert J. Kiesling, and John Pincus. Santa Monica, Calif.: Rand Corporation, 1971. $5.00. 232 pp.
In the short period since its release in March 1972, How Effective is Schooling? A Critical Review and Synthesis of Research Findings by Harvey Averch and others for the Rand Corporation has become a model of evaluation of research studies on policy-related, politically charged educational questions. The report shows the great need for an evaluation strategy that encourages new research and cautions against premature policy application of what are essentially preliminary findings. While the authors say that they view the report as the first step toward increasing the potential effectiveness of interdisciplinary research in education, their emphasis is directed to evaluation study findings and policy recommendations.
From the outset it was clear that this Rand study would have far-reaching consequences. Unlike the research it examines, it comes at a time when the seemingly anti-education findings of the basic works on output assessment have lost their shock effect. Because it is a clear, careful summary of these studies, it is this reviewer's impression that its findings on educational outcomes are quoted now far more than those of any other work.
THE ORIGINS OF THE RAND STUDY
The President's Commission on School Finance, established in March 1970, was charged with making recommendations to the President on possible alternative roles of the federal government in the financing of elementary and secondary education. The commission requisitioned a number of studies, among them the Rand study made by Averch and his colleagues. The purpose was analysis and summary of the vast body of data on educational outcome that would help answer the question: Do the resources, processes, and organizations now employed in primary and secondary education have an appreciable impact on student achievement? The Commission sponsored a small interdisciplinary study beginning in January 1971, and the Rand Corporation, in recognition of the widespread interest in the question raised, supplemented the Commission's funding.
A MODEL OF EVALUATION OF EVALUATION
Faced with the complex interdisciplinary studies that run the gamut from microresearch to the pathbreaking Coleman report, Equality of Educational Opportunity,1 the authors classify the several research approaches associated with the discipline in which the research originates. They identify the similarities and differences among various types of research; place the individual studies and groups of studies in the perspective of the gamut of ongoing research; and set up a framework that facilitates the application of common standards of appropriateness of methodology and consistency of findings. The different approaches used by the several disciplines to examine educational outcome are modelled graphically. When at the outset the description falls short, the subsequent presentation overcomes any initial shortcomings.
The Rand study presents a review pattern of evaluation studies that calls for identification and subsequent assessment of policy-related research. It selects from the vast body of research those studies that are centered on a policy issue and assesses them using two scales, a scale of internal validity and a scale of consistency among studies, or external validity. The presentation of the input-output approach is especially lucid and should lead to more widespread understanding of this economic methodology.
The design of the tests for internal and for external validity marks an important forward step in "state of the research" assessments. Internal validity is defined in the monograph as methods appropriate to the evaluations made. Questions to be asked to determine internal validity include: Was the method appropriate to the problem? Were generally accepted procedures carefully followed? Were the results correctly interpreted given the advantages and limitations of the analytical technique used? Tests of significance and goodness of fit are applied to assessment of input-output studies in which multiple regression analyses were used. The monograph notes the search for credibility. Studies that did not satisfy minimum requirements for internal validity were disregarded.
External validity requires consistency in the findings of a number of studies. External validity testing is also a mechanical way of summarizing numerous disparate studies. It asks what conclusions should be drawn in the face of conflicting results, and it attempts to overcome the difficulties of evaluating materials in the extensive area of educational research. Interstudy consistency is determined subjectively. Studies using larger samples, more replication, better study designs, a greater number of controls for intervening variables, or more accurate measures of variables are given greater weight. Thus internal validity criteria govern the findings on consistency.
The National Science Foundation in its recent requests for proposals has adopted the general framework of the tests of internal and external validity of the Averch study. The differentiation between tests of validity and the greater systematic review that those criteria of validity testing imply cannot but over time gain widespread acceptance. Improved reporting on the state of the research art will be an important consequence.
The monograph describes general methods of reviewing research that also will have a long-term impact. Especially important for those concerned with evaluation of educational outcomes is the caution about omitting individual student differences. Such omissions cloud the results of earlier research based on averages of information. The report notes that differences among students may set in motion differing interactions among students, teachers, schools, teaching methods, and so forth, and suggests a watchful concern about averages of data that can obscure such differences. Study findings of differences among student achievement scores using optional types of teaching methods, to take a simple case, assume essentially that all students respond in the same way to the teaching method. The method tried out works to improve learning, or it does not work. But if differences among students mean that one group learns better with method A and the other with method B, averaging the two groups tends to obscure the meaning of the differences in teaching method. The monograph notes that reasonable as the hypothesis of student differences in response to different kinds of teaching methods, in different classrooms with different teachers, may be, there is little research that supports it, though there are some notable exceptions. The problem rests not with the basic research on differences but in knowing how to respond to such identified student differences and how to define interactions so that students may be grouped for appropriate teaching method or teacher response.
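The averaging problem described above can be made concrete with a small sketch. The figures below are invented for illustration and are not drawn from the Rand report or from any study it reviews; the group and method names are likewise hypothetical. The sketch assumes two equally sized groups of students, one of which gains more under method A and the other under method B.

```python
# Hypothetical mean test-score gains; every number here is invented
# for illustration and does not come from the Rand report.
gains = {
    ("group_1", "method_A"): 10.0,
    ("group_1", "method_B"): 2.0,
    ("group_2", "method_A"): 2.0,
    ("group_2", "method_B"): 10.0,
}

GROUPS = ("group_1", "group_2")
METHODS = ("method_A", "method_B")


def mean_gain(method):
    """Average gain for one method across the two equally sized groups."""
    return sum(gains[(group, method)] for group in GROUPS) / len(GROUPS)


def best_method(group):
    """Method with the larger gain for a single group."""
    return max(METHODS, key=lambda method: gains[(group, method)])


# Pooled comparison: the two methods look identical on average.
pooled_difference = mean_gain("method_A") - mean_gain("method_B")
print("Pooled difference (A minus B):", pooled_difference)

# Group-by-group comparison: each group has a clearly better method.
for group in GROUPS:
    print(group, "does better with", best_method(group))
```

With these assumed figures the pooled comparison shows no difference between the methods, although within each group one method yields five times the gain of the other. An evaluation that reported only the pooled averages would conclude that the choice of method does not matter.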
SUMMARY OF AIMS
In brief, the report undertakes to assess what is known about the determinants of educational effectiveness. The monograph manages the large research base on which it has to draw by organizing the analysis according to an identified approach to research as follows: (1) the input-output approach; (2) the process approach; (3) the organizational approach; (4) effectiveness studies (such as those required by Title I and Head Start); and (5) the experiential approach which seeks to summarize the educational reform literature. The common thread for the particular studies selected is the view that the student in the school is an end objective rather than a means toward some further end.
Limitations of existing research, such as inadequacy of data, the serious problems of interpretation of achievement test results, the restriction of findings to cognitive outcomes, are emphasized. The authors particularly note that the failure of research studies to examine the cost implications of their findings makes it very difficult to translate the research results into policy-relevant statements.
COMMENTS ON REPORT'S CONCLUSIONS
The Averch study is noted for the findings quoted below:
1. Research has not identified a variant of the existing system that is consistently related to students' educational outcomes.
2. Increasing expenditures on traditional educational practices is not likely to improve educational outcome substantially.
3. There seem to be opportunities for significant reduction or redirection on educational expenditures without deterioration in educational outcomes.
4. Innovation, responsiveness, and adaptation in school systems decrease with size and depend upon exogenous shocks to the system.
The monograph also notes tentatively that research suggests that improvements in both cognitive and noncognitive student outcomes may require sweeping changes in the organization, structure, and conduct of educational experiences. This inference is drawn from the four conclusions already cited and from the testimony of the studies examined under the experiential approach.
It is this reviewer's view that these recommendations detract from the luster of the report and stand in sharp contrast to the careful presentation of evaluative materials and qualifications. First, despite the prefatory words emphasizing the limitations identified throughout the report, implications are drawn for program funding and for program redesign.
Second, there is some doubt whether the right questions are addressed by the studies reviewed. What is it that schools are trying to accomplish in their segment of student time? What is the actual amount of school learning time? (According to my own count, it is 20 percent or less of the annual waking time of the child.) What are the appropriate measures of outcome? The effectiveness of education is being measured in these studies often by one outcome among many. Achievement scores on standardized tests are used despite their known deficiencies; other outcomes of the schools are set aside.
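A rough check shows how a share of this order can arise. The inputs below are assumptions chosen for illustration (a 180-day school year, six hours of school per day, and fifteen waking hours per day); they are not the figures behind the count cited above, which the review does not give.

```python
# All inputs are illustrative assumptions, not the reviewer's own figures.
SCHOOL_DAYS_PER_YEAR = 180   # assumed length of the school year
SCHOOL_HOURS_PER_DAY = 6     # assumed hours spent in school each day
WAKING_HOURS_PER_DAY = 15    # assumed waking hours for a school-age child
DAYS_PER_YEAR = 365

school_hours = SCHOOL_DAYS_PER_YEAR * SCHOOL_HOURS_PER_DAY
waking_hours = DAYS_PER_YEAR * WAKING_HOURS_PER_DAY
share = school_hours / waking_hours

print(f"School hours per year: {school_hours}")
print(f"Waking hours per year: {waking_hours}")
print(f"Share of waking time spent in school: {share:.1%}")
```

Under these assumptions school accounts for just under one-fifth of a child's annual waking time. Actual learning time would be smaller still once lunch, recess, and absences are deducted.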
The report does note that there are numerous problems about method and measurement: "In general there is good reason to believe that our statistical techniques have just not been up to the kinds of problems we are addressing." The report states emphatically: "We must emphasize that we are not suggesting that nothing makes a difference, or that nothing works. Rather, we are saying that research has found nothing that consistently and unambiguously makes a difference in student outcomes." We must, the report goes on to say, note that we are not saying "that school does not affect student outcomes."
COMMENTS ON THE RESEARCH AGENDA
Identification of the limitations of the research starts with the difficulties of defining educational outcome. In most of the studies reviewed in the Averch volume, education is measured almost exclusively by cognitive achievement. Yet cognitive achievement is at best a simplistic approach to the complex development of human competence, an approach taken in the reach toward quantification when the more orthodox economic measurement of educational return, namely increments to discounted lifetime earnings, clearly could not be used. Yet competence of an individual has many parts. Earlier, in another context, I have identified the 4A's of educational outcomes: aptitudes, attributes, attitudes, and achievements. Each of these parts in turn has many components. The debate on aptitudes, for example, still continues after many decades of IQ measurement. Understanding of educational outcome requires much research, and the agenda provided in the Rand report is insufficient in its detail. Some comments may be needed for clarification.
Numerous measurements and scales, for example, are available on attributes and attitudes. The interest in self-esteem as a product, or as component input of educational development, has led to the creation of more than 200 measures to assess it. Many of those measures have been used only once, many have been used only for a specific research task, and few have been applied to large samples. Coleman's use of three questions on self-esteem and the National Center for Educational Statistics' longitudinal study use of such questions, along with the Bachman2 and Rosenberg3 surveys, mark steps toward understanding educational outcomes.
Types of additional research required include experimentation with instrument testing. Such research would establish the kinds of instruments that could be applied uniformly in large numbers of evaluations and statistical surveys. In addition, there is need for analyses of how to measure concepts of competence. Such analyses would address questions of developmental work on measurement concepts, standardization of testing instruments, and design of large-scale application techniques.
The sole reliance on achievement testing stands in sharp contrast to the need for a major increase in research on such achievement tests:
1. What does each of the major testing instruments now used in the schools actually measure of what children are intended to learn? What additions and improvements are needed? The Center for the Study of Evaluation (CSE) at the University of California at Los Angeles, for example, reviewed currently available elementary school tests and judged each one according to its capacity to assess a particular educational objective from among 145 enumerated goals. The findings illuminate strikingly those educational objectives and goals that are not now measurable by published instruments.
2. What is the predictive value of the testing instruments? Is the outcome that is measured really a useful one? College testing and the predictive value of those tests for performance in college have been the subject of much research by Coffman, Dyer, Mills, Olsen, and others. But wider questions can be raised. Of those young persons giving evidence of capacity to perform in higher education, for example, what proportion goes on to college? What proportion of young persons with capacity for mechanical skills continues through to high school graduation in a technical or vocational school compared with the proportion completing comprehensive high school?
3. What is the currency of grade score norms for each of the most widely used achievement tests? And is the score scale continuing to measure the range of learning? In the CSE test evaluations mentioned earlier, some 1,600 scales and subscales were evaluated for the following criteria: stability of test scoring; internal consistency for a set and subset of scores; replicability of results from one group to another, or one time period to another; the range of coverage of the test within each norm; and the gradation of the scores or the score scale.
The norms for many widely used tests are out of date, and there is some question as to their representativeness at the time the norm of the test was established. (Voluntary participation of a number of schools or school districts in a project to establish a norm does not yield a representative sample for norm setting.)
Test forms that measure the same level of competence would presumably yield the same scaled score with high probability. Scores designating a position on a scale should be reproduced, test form by test form. Further, for each test given within a battery of tests the number designation of a specific position on the scale should represent the same level of competence for any test administered; that is, 80 should always be in the same position relative to a top score. As of 1972, the scale system of even College Board scores was defined by the candidates who took the tests in April 1941.
The Rand report does not deal sufficiently with such areas of new research. For example, it displays the Coleman report finding of a high correlation between students' socioeconomic backgrounds and educational outcomes and puts forth a variety of hypotheses as to why the SES relationships seem so powerful in the Coleman report. But it does this without mentioning that the Coleman report finds an even higher correlation between educational achievement scores and self-esteem and control.
The Rand report identifies program interactions as important areas of study. The hypothesis is presented that teacher, student, instructional method, and perhaps other aspects of the educational process interact with each other in different ways. These changing interactions need to be understood when examining the issue of program outcomes related to inputs. "We must emphasize," the report notes, "that we know very little about interactions."
Two kinds of interactions, it appears to me, are at issue. The report makes a very important contribution by identifying one set: the averaging of outcome over classes, groups, and schools. This averaging would screen out the effects on different individuals.
Another set of program interactions appears to be ignored: the question of joint costs of interactive public programs and their joint effects. School lunches, school physical education, highway safety education programs are cases in point. Despite the difficult conceptual problem of joint costs and gains, methods for differentiation become necessary if single programs are to be judged individually.
ANSWERING POLICY QUESTIONS PREMATURELY
Problems of analysis and interpretations of analytical results beset policy research. When instruments for fact-gathering are found lacking and the conceptual framework of analysis and data bases is inadequate, caution on policy prescriptions is indicated. At the same time, it is highly important that analytical research go forward if we are ever to learn about costs and benefits of public policies, about the full range of these impacts, or about the quality of public services.
Public officials have a lawyer's way of marshaling the evidence in support of a position. Evaluation studies are qualifying the presentation of that evidence. A 1972 study in the Office of the Secretary of the Department of Health, Education, and Welfare reviews, as did the Averch study, the evidence of the effectiveness of education, with the HEW study restricting its review to compensatory education. Compensatory education is selected in the light of the recommendation of the President for a new federal compensatory program for disadvantaged children. This report asks two questions:
1. Can compensatory education be made to work?
2. Does the application of concentrated compensatory resources (usually at higher dollar costs) in basic learning programs enhance the probability of success in compensatory education?
With respect to the first question on whether or not compensatory education can work, the evidence . . . is definitely encouraging. The important difference between success and non-success appears to depend on whether compensatory education funds have been channeled into traditional patterns of expenditure (salary increases, routine techniques, etc.) or whether they have been used to develop supplementary, focused, compensatory education programs. The reason there is so much evidence of failure is that resources have more often been used in the former rather than in the latter manner. On the second question of how closely effective compensatory education is related to increased expenditures, the evidence, and therefore our conclusion, is much less clear. However, on the basis of the commonsense observation that a supplementary compensatory education program will require additional resources, on the evidence that the elements of programs found to be successful require significant additional resources (e.g., individualized instruction), and on the basis of some fragmentary evidence from several studies which have attempted to relate achievement gains to additional expenditures, we conclude that an effective compensatory education program will indeed require significant additional resources and we have recommended as an approximation of the needed addition the figure of $300.00.
The report in support of the President's 1972 recommendation illustrates the general importance of analytical studies that question and probe. Without these, we fail to understand the problem; in fact, we even fail to understand what the real questions are.
The studies that have been made have identified important issues, including the question of the bounds and metes of potential school impact. Earlier this question was not raised. Nor were the issues of program interactions and of timing. The analytical research studies have also identified the question of costs and cost differentials that take into account family differences, neighborhood differences, and also direct school cost differences.
Controversy surrounding interpretation of analyses' results should point to more analysis, more fact-gathering, and more research, as well as development of better methodology for answering policy-related questions. Present analytical methods, as the Averch report notes, are not sufficient. Empirical research and experimental program designs may be a necessary adjunct.
New organizational arrangements have to be created that can support the essentially embryonic field of policy research by developing new competencies and new knowledge. Premature findings can be counterproductive, and premature application of premature findings can be destructive.
1 James S. Coleman et al. Equality of Educational Opportunity. Washington, D.C.: Government Printing Office, 1966.
2 Jerald G. Bachman. Youth in Transition, 2 Vols. Ann Arbor, Mich.: Institute for Social Research, University of Michigan, 1969, 1970.
3 Morris Rosenberg. Society and the Adolescent Self-image. Princeton, N.J.: Princeton University Press, 1965.