Curriculum Evaluation: Problems and Guidelines

by Herbert J. Walberg - 1970

Professor Walberg, of the Wisconsin Research and Development Center for Cognitive Learning, continues The Record's discussion of curriculum evaluation. He confronts the problems of how specific statements of objectives ought to be, what sorts of indicators might help in the judgmental process, and how better instructional methods might be revealed. Emphasizing the importance of "explicitness, objectivity, and critical judgment" in evaluation, Dr. Walberg has some suggestive things to say about the usefulness and relevance of educational evaluation in general. If it is ever to become an applied science, he says, it has a long way to go.

Thomas Kuhn1 termed underdeveloped fields of science "pre-paradigmatic." He defined "paradigms" as "universally recognized scientific achievements that for a time provide model problems and solutions to a community of practitioners." In distinguishing "pre-paradigmatic" and "normal" science, Kuhn writes:

No natural history can be interpreted in the absence of at least some implicit body of intertwined theoretical and methodological belief that permits selection, evaluation, and criticism. If that body of belief is not already implicit in the collection of facts—in which case more than 'mere facts' are at hand—it must be externally supplied, perhaps by a current metaphysic, by another science, or by personal or historical accident. No wonder, then, that in the early stages of development of any science different men confronting the same particular phenomena, describe and interpret them in different ways.

By this characterization of nascent disciplines, then, educational research and evaluation are in an early stage of development. Gage2, for example, has summarized some paradigms for research on teaching, but none would meet Kuhn's criterion of universal recognition. While these and other paradigms in education have been useful in isolated research efforts, none has led to a programmatic, cumulative series of studies. Much of educational research is atheoretical, and what theoretical work has been accomplished is largely derived from the physical, biological, and social sciences, from philosophy, or from personal idiosyncrasies. Even aside from theory, fundamental and unresolved methodological problems of gathering "facts" plague educational research and evaluation. Research workers and schoolmen alike have been disappointed in the practical results of experimental methods in education.3 Experimenters argue that more precise measurement and rigorous research designs are likely to bear fruit eventually. But other investigators question the appropriateness of quantitative methods and use in their place methods of anecdotal descriptions of classroom events, like those of the social anthropologist, or observation and intuition, in the manner of the clinical psychologist. The point is not to bewail the inadequacy of various methods of gathering "facts," but to illustrate the necessity for a re-assessment of educational evaluation, its theory, practice, and their interdependencies.

Tyler's Strategies

The writings of Ralph Tyler have been the basis of much of the major work in educational evaluation and offer a constructive starting point for conceptualizing work in this field. Tyler4 proposed a three-stage process in curriculum development: 1) stating objectives in terms of student behavior, 2) specifying learning experiences likely to contribute to student attainment of objectives, and 3) evaluating learning experiences in terms of attainment of objectives. (This rationale arises from the means-ends distinction emphasized by the pragmatic philosophers Charles Peirce, William James, and John Dewey.)

The next two sections identify the problems of Tyler's strategies in stating objectives and specifying learning experiences in course evaluation. By no means is this discussion intended to belittle the work of Tyler, his former colleagues, and students at the University of Chicago. Indeed, their fundamental contribution to both the theory and practice of evaluation can hardly be overestimated. However, Tyler himself might be the first to admit that this is not a time for orthodoxy, even his own. He writes:

The accelerating development of research in the area of educational evaluation has created a collection of concepts, facts, generalizations, and research instruments and methods that represent many inconsistencies and contradictions because new problems, new conditions, and new assumptions are introduced without reviewing the changes they create in the relevance and logic of the older structure.

Therefore let us examine the "relevance and logic of the older structure."

Controversy on Objectives

During the past few years a controversy has centered on the specificity of the statement of objectives. Gagne5 and Mager6 hold that the objectives must be precise, detailed descriptions of student behavior exhibited on attainment of the objective. Others have argued that behavioral objectives constrict education to the trivial kinds of behavior that can be described precisely. Eisner7 warned that adherence to precise behavioral objectives may prevent the teacher from spontaneously deriving new objectives from on-going learning activities, especially in the arts where creative expressions are most clearly valued. Moreover, even the behaviorists would have to admit that it is often time-consuming and frustrating, if not impossible, to get curriculum workers and teachers to state precise behavioral objectives. Nor have evaluations employing behavioral objectives proved to be conspicuously successful.

Bloom8 takes a reasonable position on this controversy: "It is virtually impossible to engage in an educational enterprise of any duration without some specification to guide one." Further, "Insofar as possible, the purpose of education and the specifications for educational changes should be made explicit if they are to be open to inquiry, if teaching and learning are to be modified as improvement or change is needed, and if each new group of students is to be subjected to a particular set of educative processes." Hopefully, further work in evaluation will reveal the efficacy of explicit objectives in instruction and evaluation.

Another point made by Bloom also seems constructive: less specific objectives may be more appropriate for educational media designed for teacher use. Indeed, it may be that a teacher's rigid adherence to pre-determined, specific objectives may impede student learning in much of education. Now in training, as opposed to liberal education, a number of explicit criteria are set forth; they can be "covered" by the teacher, programmed materials, or, probably just as effectively, by a textbook. Training is most effective when the objectives are explicit and when adequate motivation or reinforcement can be assumed as in military or industrial settings. Such training is characterized by its emphasis on the acquisition of basic skills, which can often be defined behaviorally.

On the other hand, curriculum makers, school boards, and teachers aspire to inculcate ideals, values, social skills, and other intangibles. They are concerned with higher-order cognitive processes such as analysis and critical thinking. Moreover, both the teacher and the students bring important, though vague, objectives, ideas, and interests to class, some permanent, others transient. Paradoxically, these random elements lend caprice and serendipity to the class that may be far more important to the attainment of general ideals than predetermined specific objectives and lessons. Or certain events of the day may conjoin unexpectedly with the teacher's planned objectives and activities. These occurrences inject relevance, suspense, humor, and other human qualities into learning that are impossible with a programmed machine, a programmed course, or a programmed teacher.

The Problem with Programming

Though the word "programmed" has a modern ring, in education it is an essentially medieval idea. It stems from the time before printing when professors literally dictated Aristotle and exegeses to their students. The lecture method is still prevalent in modern times, and is bound up with the objective of "covering" a subject or a text through lectures and recitation. This is not to say that lectures always are inappropriate. There are a few teachers who can occasionally muster a beautiful lecture and create excitement in their students. But in the main it is overused: writing is generally more organized and comprehensive than speech, and reading normally proceeds at three times the speed of speech. Moreover, the reader may skip or skim parts of the work he knows and actively concentrate on what gives him difficulty.

Not only are programmed methods inefficient in "covering" material, they may be harmful to the social environment of learning.9 Classroom groups have at least two tasks: attaining instructional objectives through learning and developing a viable, if not cohesive, social structure. Paradoxically, if the course or teacher specifies the purposes and procedures of instruction too emphatically, the group may resist and learning may not proceed. This phenomenon may be observed on the university campus particularly in the lists of demands for greater student participation in the formulation of objectives, activities, and evaluation of learning. The little available objective evidence (cited above) suggests that the social environment has greater effect on important affective learning than on cognitive achievement.

If explicit programmed objectives have been slighted in this discussion, it may be a reaction to their current vogue. Federal funding agencies have required their use in new educational projects; persuasive exponents have sold them to schools. Their rhetoric seems to insist that what is not objectively specified and precisely measured does not exist or is not important, and further, that what is most measurable is most important. Taken to an extreme, the argument holds that social and affective learning may be ignored since it is difficult to measure; that, in the cognitive domain, essay examinations are undesirable because they lack technical standards of reliability; and that hope lies in multiple-choice tests because they are efficient and require no judgment in scoring. And this may be right; it remains to be seen. But until there is convincing evidence to support these kinds of assertions, it is dangerous to force such an orthodoxy on the schools.

The Need for Indicators

In the meantime, the evaluator is often supplied with vague, general objectives or no objectives at all. Obviously these conditions make his work more difficult—but not impossible. His job is to elicit more explicit objectives from the curriculum maker, or, failing this, he has other alternatives. He may be able to derive explicit objectives from the general; he may infer objectives from the learning materials themselves; and he may administer a general battery of indicators to find out what objectives the materials accomplish.

These alternatives may be used in combination, but in any case, a general battery should be employed for at least two reasons. Many different kinds of learning may occur in a course; by emphasizing one, others may be sacrificed. Also, the evaluator cannot assume that schoolmen will value the same objectives for the subject as the course developer; therefore, he must include indicators that may be of interest to a variety of consumers. Metfessel and Michael10 provided a comprehensive, seven-page list of about 105 suggested criteria. While it would not be feasible to include all these in most projects, it would seem necessary in any educational evaluation to use indicators of the following: factual and conceptual mastery of the general subject; higher-order cognitive mastery such as understanding and analysis; and affective learning such as values, interests, and attitudes brought about by the course. It would also be desirable to include other indicators even if they are only in experimental stages of development. One of these is the induced flavor or projected image of the subject, for example, the relative emphasis on developmental, logical, or intuitive aspects. For reasons discussed earlier, indicators of the social environment of learning might reveal unintended consequences of the course. Systematic observations in classrooms might show changed patterns of teacher behavior; casual visits would at least reveal whether or not the teachers are using the course materials. Since these indicators are by no means comprehensive, teacher and student comments might be solicited. Although it is difficult to code comments objectively, presumably any expected or unexpected sterling qualities or glaring inadequacies would be salient enough to detect.

Learning Experiences

Let us now turn to another difficulty of Tyler's strategy -- designing and selecting the most appropriate learning experiences to attain general or specific objectives. Stephens11 has taken a fresh look at educational research over the last fifty years and produced some humbling conclusions. The results of his survey indicate that the things commonly believed to promote learning make no difference at all. Research on teaching, for example, has consistently concluded that different teaching methods make little or no difference in student learning and attitudes. These conclusions apply to television and traditional instruction, team teaching and ordinary teaching, teaching in large and small classes, homogeneous and heterogeneous groups, core and traditional curricula, lecture classes and discussion classes, teacher-centered and group-centered approaches, in small schools with indifferent facilities and large schools with lavish facilities. Thus, it has proven impossible to specify instructional activities which optimize the general performance of students.

Perhaps as a consequence, some theorists have proposed sub-optimizing learning for groups of students with different aptitudes. This proposal is more than two thousand years old and can be traced to Plato's Republic, where he describes children of brass, iron, and gold and the different learning experiences required for each group. This concept is now known as "individualizing" instruction. Technically it depends upon the presence of "aptitude-instruction interaction," i.e., the tendency for different students to benefit unequally under different methods of instruction. For example, student A performs better under instruction A; whereas student B performs better under instruction B.
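
The idea of an interaction can be made concrete with a small calculation. What follows is only a minimal sketch, not a procedure taken from the research discussed here: the function name `interaction_contrast` and all of the cell means are hypothetical, invented for illustration. It compares the advantage of instruction A over instruction B for one kind of student with the same advantage for another kind; a contrast near zero means the methods rank the same way for both kinds of student, i.e., no interaction.

```python
def interaction_contrast(cell_means):
    """Return the interaction contrast for a 2 x 2 layout.

    cell_means maps (student_type, instruction) -> mean score.
    The contrast is the advantage of instruction "A" over "B" for
    student type "A", minus the same advantage for student type "B".
    """
    advantage_for_a = cell_means[("A", "A")] - cell_means[("A", "B")]
    advantage_for_b = cell_means[("B", "A")] - cell_means[("B", "B")]
    return advantage_for_a - advantage_for_b


# Hypothetical means: each student type does better under "its own" method.
crossover = {
    ("A", "A"): 60.0, ("A", "B"): 50.0,
    ("B", "A"): 50.0, ("B", "B"): 60.0,
}

# Hypothetical means: instruction A helps both types equally (no interaction).
parallel = {
    ("A", "A"): 60.0, ("A", "B"): 50.0,
    ("B", "A"): 45.0, ("B", "B"): 35.0,
}

print(interaction_contrast(crossover))  # nonzero: an interaction is present
print(interaction_contrast(parallel))   # zero: main effects only
```

In the second table, students of type A simply score higher everywhere and instruction A is better for everyone; that pattern offers no grounds for assigning different students to different methods.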

Unfortunately, it is extremely difficult to find consistent evidence for the aptitude-instruction interaction. Bar-Yam,12 in a 231-item review of aptitude-instruction interaction research, has found a little evidence that bright students and independent, assertive, flexible students perform better with flexibility and independence in the classroom; whereas dull students and dependent, anxious, rigid students do better under directive, highly structured conditions. While a balance of evidence shows that the two types of students perform better under these two conditions of learning, there are a number of studies which do not support this notion. Moreover, these interactions account for little variance compared to that accounted for separately by intelligence, socio-economic status, and prior achievement. It is likely that if there were powerful interactions of student aptitudes and instruction, they would have been found by now. Moreover, Bracht and Glass13 point out that it might be fruitless to look for these kinds of interactions in courses because they are complex and contain many instructional and content elements.

The Environment of Learning

Perhaps a more fruitful area of optimization and sub-optimization research lies in the social environment of learning brought about by different courses. Exploratory research has already shown significant differences in environments attributable to randomly assigned courses.14 Moreover, with relevant factors held constant, the social environment is an optimizer of cognitive and affective learning;15 and environmental characteristics sub-optimize student learning, i.e., students of different levels of intelligence, personality, and other characteristics differ sharply in their performance in different environments.

The discussion in this section is not to depreciate basic instructional research in curriculum evaluation; indeed, curriculum projects offer an ideal setting for the educational psychologist to test his ideas against instructional realities. Moreover, courses are superseded; whereas instructional research, if it becomes an applied science, could develop empirical laws of learning that would have continuing relevance for courses in the future.

For the time being, however, Tyler's second stage must be based on common sense and guess work. Educational psychology offers no satisfactory method of designing learning experiences to attain given objectives. In view of the multiplicity of vague course and teacher objectives, the problem of specifying learning activities, and the possibilities of aptitude-instruction and aptitude-environment interactions, the course developer might do well to avoid trying to optimize and instead, include many diverse concepts and learning materials in the course. These elements, with a guide to their possible organization and use, may enable supervisors, teachers, and students to optimize and sub-optimize according to their own needs and objectives. If this is done, there is all the more need for a general battery of indicators in the evaluation of the course. A second consequence is the necessity of studying sub-groups of students who are likely to perform especially well or poorly under varying conditions of course use.

Having examined the difficulties of the Tyler strategies of stating objectives and specifying learning activities, and offered some provisional solutions that seem workable, let us turn to the problem of generalizing curriculum research.


That evaluation should be generalizable to specified populations of students seems an obvious objective; yet most evaluations must be faulted on statistical grounds. Certain well-known but little employed statistical procedures relating to randomization bear repeating here. Let us first consider the two traditional uses of randomization. As R. A. Fisher16 showed, the assumption underlying statistical inference is that the experiment to which it is applied meets the following conditions: 1) there has been a random selection of units from the population under study, from which population parameters can be estimated, and 2) for the estimation of experimental effects, there has been a random assignment of experimental units to treatments (and non-treatment to control groups). The first assumption allows estimation of population parameters with a known probability of error; the second allows the estimation of treatment effects with a known probability of error. It would hardly seem necessary to point out these assumptions again in 1970; but educational researchers (and social and biological scientists, for that matter) have continued to ignore them and resorted to "convenient" samples, "matched" groups, and "quasi-experiments." While descriptive statistics may be calculated for non-random samples, it is misleading to infer population parameters for them.
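
The two conditions can be illustrated in a few lines of code. This is a minimal sketch under stated assumptions, not a description of any actual project: the function `select_and_assign`, the list of 200 teacher identifiers, the sample size, and the seed are all hypothetical. It first draws a random sample from the population (condition 1) and then randomly divides that sample between an experimental and a control group (condition 2).

```python
import random


def select_and_assign(population, sample_size, seed=None):
    """Randomly select units, then randomly assign them to two groups."""
    rng = random.Random(seed)
    # Condition 1: random selection of units from the population under study.
    sample = rng.sample(population, sample_size)
    # Condition 2: random assignment of the selected units to treatments.
    rng.shuffle(sample)
    half = sample_size // 2
    return {"experimental": sample[:half], "control": sample[half:]}


# Hypothetical population: identifiers for 200 teachers.
teachers = [f"teacher_{i:03d}" for i in range(200)]
groups = select_and_assign(teachers, sample_size=40, seed=1970)

print(len(groups["experimental"]), len(groups["control"]))
```

A "convenient" sample of volunteers would replace the first step with self-selection, and "matched" groups would replace the second step with the investigator's own choices; in either case the known probability of error is lost.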

Only random samples of the population permit valid estimates of population parameters. Actually the sample defines the population, and statistical inference must be limited to that population from which the sample has been drawn. Unfortunately, this means that in a typical curriculum study, the sample, even if it is random, unnaturally constrains inferences to volunteer teachers or local schools or school systems with cooperative administrators. There is a great need for national random samples in educational research. To our knowledge, there has never before been a curriculum project to employ a truly random national sample of teachers with random assignment to control and experimental treatments.

A related statistical point often overlooked or misunderstood concerns the units of analysis, which must be independent observations. If a sample of teachers is drawn and the comparative progress of their students in different courses is to be studied, the proper unit of analysis is the mean of the students under each teacher. The "degrees of freedom" used in statistical significance tests is the number of teachers, not the number of students since students within the same class are not independent sampling units. This is not to say that non-inferential research studies with students or classes as the units of analysis are invalid; indeed, they are necessary to examine certain questions, for example, the comparative progress of bright and dull students in two courses. However, these studies do not permit generalization to the population.
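
The point about units of analysis can be sketched as follows. The scores, the teacher labels, and the function names `class_means` and `degrees_of_freedom` are hypothetical, and the formula used (number of teachers minus two) assumes a simple two-group comparison of class means with pooled variance; other designs would call for a different count. The sketch collapses each teacher's students to a single class mean and counts degrees of freedom from teachers rather than students.

```python
from statistics import mean


def class_means(scores_by_teacher):
    """Collapse student scores to one mean per teacher, the unit of analysis."""
    return {teacher: mean(scores) for teacher, scores in scores_by_teacher.items()}


def degrees_of_freedom(course_a, course_b):
    """Degrees of freedom for a two-group comparison of class means.

    Assumes a pooled-variance comparison: teachers in A + teachers in B - 2.
    """
    return len(course_a) + len(course_b) - 2


# Hypothetical student scores, grouped by teacher, for two courses.
course_a = {"t1": [70, 80, 90], "t2": [60, 65, 70, 75], "t3": [85, 95]}
course_b = {"t4": [55, 65], "t5": [70, 70, 70], "t6": [60, 80, 70, 90]}

means_a = class_means(course_a)
means_b = class_means(course_b)
df = degrees_of_freedom(course_a, course_b)
n_students = sum(len(s) for s in list(course_a.values()) + list(course_b.values()))

print(means_a, means_b)
print("teachers-based df:", df, "| students in study:", n_students)
```

Treating each student as an independent observation would inflate the degrees of freedom well beyond what the six sampled teachers can support, and would make significance tests look more convincing than the design warrants.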

The Long Haul

Another problem of generalizability has to do with changes in the course and students across time. To what extent does a course remain unchanged while undergoing evaluation? The intent of formative evaluation, of course, is to suggest ways that course materials might be improved. But even at the stage of summative evaluation, the course may still be evolving. If this is so, it may be well to recycle the formative evaluation each year from the beginning to the end of the project, and to begin yearly cycles of summative evaluation during the last few years of the project and extend them for a few years after the course is completed. Evaluation of this scope and duration would require much labor and coordination, but it may be the most effective, if not the most efficient, method of valid, comprehensive assessment.

If a project would continue evaluating for several years, it would allow follow-up studies of the students several years after they have taken the course; evaluation "over the long haul," as Carroll17 has put it, might be quite valuable. Ebbinghaus's classic studies of memory curves have shown the rapid rate of forgetting immediately after learning and the retention of the residual for long periods. Thus, an important topic for extended-term evaluation is the student retention over long periods after completing the course. Another question that may be answered by long-term evaluation is: Has the course aroused the student's motivation and interest enough for him to continue learning as evidenced by pursuing a career in the area of the course, taking more courses, or continuing his interest through independent study?

Still another problem of generalizability across time is the changing state of society and the possible irrelevance of courses developed before relevant changes. A vast complex of waxing and waning forces bears upon the content and methods of the curriculum. Dewey held that the schools reflect society, which seems obvious enough; but because the reflection is screened, distorted, and delayed, it would be difficult to specify and quantify the characteristics of society that brought about a given curriculum change. Many of the forces are, like social class, hypothetical constructs difficult to measure and weakly related to a host of other constructs in an uncertain direction of causality. Consider the changing character of high school physics: in 1949, applied, technical aspects were given primacy; 1959 marked the era of waxing scientific modernity and rigor; and 1970 seems to exemplify concern for the humanistic, social, and moral relevance of science. Many factors come to mind that may have led to these changes, but who is to say which and to what extent? The point is that social conditions change rapidly, and the curriculum reflects the changes. Ironically, the course that appears to be relevant to specific conditions at one period may well be quickly outdated. Until social indicators of the Zeitgeist are developed, the course evaluator will have to duck these issues or assert subjective judgment.

Explicitness, Objectivity, and Judgment

Explicitness in evaluation means that the methods employed are described in enough detail that the reader may assess their validity and attempt to replicate them. Objectivity is the independence of results from the individual characteristics of the evaluator. Meeting the standard of objectivity will increase the likelihood of making evaluation an applied science. Yet neither science nor evaluation is value-free: subjective factors have enormously influenced the progress of science, and the root of the word "evaluation" connotes human judgment and possible personal bias. The interplay of these factors warrants more careful consideration.

The need for objectivity is most apparent in summative evaluation, for its purpose is to assess the comparative or absolute effectiveness of the finished course in attaining objectives. Many projects have sampled a highly selected group of teachers with able students, administered achievement test items based upon the course text, and concluded that the resulting scores demonstrate the effectiveness of the course. A few projects have employed pretests and posttests to show student growth in achievement during the course; and still fewer projects have contrasted the achievement of students in their course with a contrast or control group of students in other courses. If these methods are made explicit, the evaluator and his readers are able to judge the value of the evaluation design. While the readers, if not the evaluator, may conclude that the evaluation is trivial, biased, or invalid, these judgments can only be made if the methods and results are explicit.

Objectivity and judgment are also important in formative evaluation. It is extremely difficult for course developers to be objective and critical of their own work, yet it is absolutely necessary. As in any creative work, there must be a continuous, balanced re-cycling of productive and critical phases. The first and most severe critic must be the developer. But his own criticism is not enough, for inevitably he will be biased and unable to see all the weak points of his work. Therefore, he must solicit critical opinion from his immediate colleagues and various outsiders—specialists in educational media and evaluation, university professors of the subject, and school teachers and students using trial versions of the course. Yet here a balance is needed for critical capacity often outruns the productive with the result that work is never finished. Too much criticism, doubt, and revision may prevent bringing work to fruition. No amount of revising and polishing of a course or evaluation will result in a perfect product. One can hope for a reasonably good job given the inevitable constraints of time, energy, and funds. After this, remaining creative energy might well be channeled into objectivity and judgment in identifying the strengths and weaknesses of the finished course and evaluation and their implications for future projects.

Perhaps the role of judgment has been underestimated; the evaluator must judge. Bias can enter the "objective" methods and results through the choice of groups and instruments employed in the evaluation. Therefore, judgments and decisions regarding technical methods must stem from an explicit rationale for the evaluation so that the reader may judge its validity. A rationale is needed for the interpretation and judgment of the results; these processes must be explicit, couched in interpretive rather than objective language, and should err on the side of caution.

Education requires rigor and relevance, social and moral passion; but these very factors may be the downfall of research and evaluation. The history of "scientific breakthroughs" in education reveals a discouraging series of inadequate experiments which could not be replicated.18 The technical inadequacies went unrecognized by educational policy makers and did not deter them from attempting to reform the schools. Contemporary examples may be found in critical reviews19 of two recent books on "creativity" and "teacher expectancies and blooming students." The rather devastating reviews were probably read by only a handful of educational researchers concerned about the methodology. Yet these books or newspaper summaries of them reached the public and professional educators, and policy decisions based upon findings have already been made. The implication for evaluation is clear: it is not enough to present objective results and judgments; the evaluator must make clear to the non-technical reader the possible inadequacies of his methods and the weaknesses of his conclusions. An authoritative, refereed journal of educational evaluation would serve as an excellent vehicle for such studies.

There is also the problem of the evaluator's allegiance. An evaluator on a project staff may have a conflict of interest which biases his judgment. Since he is paid by the project, his job or even the project may be at stake if he publishes an uncomplimentary report. On the other hand, non-staff evaluators may lack appreciation of the special qualities of a project or the interest and wherewithal to do a comprehensive job. It is difficult to imagine how a federal bureau modeled on the Food and Drug Administration or the National Bureau of Standards could take on this work, especially in view of the traditional fear of national control of education. An independent group modeled on Consumers Union may seem even more farfetched. Yet the massive amount of evaluation needed in education may require such steps. In the meantime, curriculum groups will probably continue their own evaluations, and there are a few ways that conflicts of interest may be lessened. Developing a critical climate and involving outside critics have been mentioned. Another alternative is to commission outsiders to carry on parts of the evaluation. This practice would be especially useful when the project lacks the facilities or specialized competence for certain aspects of the work, for example, the data files from national testing agencies or the techniques of quantifying teacher observations. Still another alternative is to separate the evaluation group to some extent from the rest of the project staff and give them no responsibilities for course development.

The evaluation group should have autonomy and authority to carry out their work. Presumably they would be sympathetic to the goals of the course and perhaps identified with them, but would be expected to reach their own decisions regarding evaluation. They would serve as a kind of "loyal opposition" as in the British Parliament. None of these methods, however, can insure complete objectivity and valid judgment.

Nor can the alternatives described above make educational evaluation a science in the same way physics or chemistry is a science. Like the social sciences, educational research is inevitably subjective in known and unknown ways. And this is as it should be, for education is committed to social and moral values. The general goal is making explicit these values and the "objective" methodology so that other workers can assess their validity from their own viewpoint.

Usefulness of Evaluation

Finally, evaluation should be useful. Obviously, formative evaluation should be useful in improving the course and is of concern mainly to the course makers before releasing the final product. On the other hand, others will be interested in the summative evaluation. For whom should it be useful: the curriculum maker, the subject-matter expert, the supervisor, the teacher, or the student? Or should it be designed for a technical research audience or school purchasing officers in large city school systems?

In line with earlier discussion, an evaluation report should be appropriately explicit concerning sampling, research design, measurement, statistical analysis, and interpretation. Such a report would enable other evaluators to judge the merits of the evaluation. On the other hand, this kind of detailed analysis may make the report dull and restricted to a technical audience. Therefore, many teachers and supervisors would not read it in its entirety or at all. They would be more interested in a description of the course and only the results of the evaluation. These problems can be resolved by writing at least two reports, one a technical substantive report for the research audience, the other a shorter substantive report for schoolmen. Part of the results might well be published in journals for the teaching audience or reported orally and graphically at various regional conferences so that teachers and supervisors may react to the results, comment on the usefulness of the materials, and raise further questions for the evaluator to pursue.

Some of these points would hardly seem in need of saying. But there is a danger of evaluation becoming an isolated professional specialty. Already at educational research meetings, presented papers often appear to be displays of methodological virtuosity rather than educationally relevant contributions. If educational evaluation is to become a useful applied science, it must develop theory, rigor, and relevance; and it has a long way to go on all three counts.

In conclusion, the recommendations made earlier in this paper may also make evaluation useful. Stating the special objectives of the course as best one can will enable others to judge its effectiveness on these criteria. Including both special indicators and those of interest to various groups will enable others to form a judgment of the course on the basis of their own priorities. Studies of sub-groups of students will enable those working with similar groups to judge the adequacy of the course for their students. Basic educational research might reveal better instructional methods and media. Including a random sample would allow generalizing the results to specified populations. Explicitness, objectivity, and critical judgment in formative evaluation are likely to improve the course. And finally, an objective reporting of the results and possible sources of bias will enable both other evaluators and potential consumers to judge the effectiveness of the evaluation itself.


  1. Thomas S. Kuhn. The Structure of Scientific Revolutions. Chicago: University of Chicago Press, 1962.
  2. N. L. Gage, Ed. "Paradigms for Research on Teaching," Handbook of Research on Teaching. Chicago: Rand McNally, 1963.
  3. See Donald T. Campbell and Julian C. Stanley, "Experimental and Quasi-Experimental Designs for Research on Teaching," Handbook of Research on Teaching, op. cit.; J. M. Stephens. The Process of Schooling: A Psychological Examination. New York: Holt, Rinehart and Winston, 1967; H. J. Walberg, "Can Educational Research Contribute to the Practice of Teaching?", Journal of Social Work Education, Vol. 9, Fall 1968, pp. 77-85.
  4. Ralph W. Tyler. Constructing Achievement Tests. Columbus, Ohio: Ohio State University Press, 1934.
  5. R. M. Gagne, "The Analysis of Instructional Objectives for the Design of Instruction," in Robert Glaser, Ed. Teaching Machines and Programmed Instruction. Washington, D.C.: Department of Audiovisual Instruction, National Education Association, 1965.
  6. R. F. Mager. Preparing Instructional Objectives. Palo Alto, Calif.: Fearon Publishers, 1962.
  7. Elliot W. Eisner, "Educational Objectives: Help or Hindrance?", School Review, Vol. 75, Winter 1967, pp. 250-62.
  8. Benjamin S. Bloom, "Some Theoretical Issues Relating to Educational Evaluation," Educational Evaluation: New Roles, New Means. Sixty-eighth Yearbook, Part II, National Society for the Study of Education, Chicago: University of Chicago Press, 1969.
  9. See Gary J. Anderson, Herbert J. Walberg and Wayne W. Welch, "Curriculum Effects on the Social Climate of Learning: A New Representation of Discriminant Functions," American Educational Research Journal, Vol. VI, No. 3, May 1969; Herbert Thelen, "The Evaluation of Group Instruction," Educational Evaluation: New Roles, New Means. Sixty-eighth Yearbook, Part II, National Society for the Study of Education, Chicago: University of Chicago Press, 1969; and Herbert J. Walberg, "The Social Environment as a Mediator of Classroom Learning," Journal of Psychology, Vol. 60, No. 6, Dec. 1969.
  10. Newton S. Metfessel and William B. Michael, "A Paradigm Involving Multiple Criterion Measures for the Evaluation of the Effectiveness of School Programs," Educational and Psychological Measurement, Vol. 27, Winter 1967, pp. 931-44.
  11. J. M. Stephens. The Process of Schooling: A Psychological Examination, op. cit.
  12. Miriam Bar-Yam. The Interaction of Student Characteristics with Instruction Strategies: A Study of Students' Performance and Attitude in a High School Innovative Course. Doctoral thesis. Cambridge, Mass.: Harvard University, 1969.
  13. Glenn H. Bracht and Gene V. Glass, "The External Validity of Experiments," American Educational Research Journal, Vol. 5, November 1968, pp. 437-74.
  14. Anderson, Walberg and Welch, "Curriculum Effects on the Social Climate of Learning: A New Representation of Discriminant Functions," op. cit.
  15. H. J. Walberg, "The Social Environment as a Mediator of Classroom Learning," op. cit.
  16. Ronald A. Fisher. Statistical Methods for Research Workers. Edinburgh: Oliver and Boyd, 1925.
  17. John B. Carroll, "School Learning over the Long Haul" in J. D. Krumboltz, Ed. Learning and the Educational Process. Chicago: Rand McNally, 1965.
  18. Gene V. Glass, "Educational Piltdown Men," Phi Delta Kappan, Vol. 50, November 1968, pp. 148-51.
  19. Lee J. Cronbach, "Intelligence? Creativity? A Parsimonious Reinterpretation of the Wallach-Kogan Data," American Educational Research Journal, Vol. 5, November 1968, pp. 491-511; Robert L. Thorndike, a review of Robert Rosenthal and Lenore Jacobson. Pygmalion in the Classroom. American Educational Research Journal, Vol. 5, November 1968, pp. 708-11.

Cite This Article as: Teachers College Record Volume 71 Number 4, 1970, p. 557-570
https://www.tcrecord.org ID Number: 1788, Date Accessed: 12/3/2021 2:57:00 AM
