On The Assessment of Academic Achievement
by Henry S. Dyer - 1960
A comprehensive program to develop better methods for assessing achievement in all the classrooms of the country would go a long way toward taking the heat out of the controversies now plaguing us and would furnish the means for replacing the anarchy in education with new vitality and a sense of direction.
IN THESE DAYS of ferment in the curriculum, any large producer of educational tests is likely to get strange and wonderful requests from teachers suddenly smitten by the need to evaluate the new things they are doing. The following letter, though purely imaginary, is not untypical. It reflects a few of the mistaken ideas about testing and the assessment of achievement which are prevalent:
During this past academic year I have been giving a new course, Marine Wild Life 106. I need some incontrovertible evidence to prove to my suspicious colleagues that Marine Wild Life 106 is more comprehensive and effective than the course it has replaced, namely, Marine Wild Life 102.
Will you therefore please send me 92 copies of your standardized objective test in ichthyology? I am particularly interested in showing that my students have developed a true and deep appreciation of the love-life of our underwater friends.
Please rush the shipment, since I want to give the test as part of my final examination next Thursday.
Although Professor Finn does not exist, he has many educational cousins who do, and who think, with him, that the purpose of educational evaluation is to demonstrate the truth of a foregone conclusion. It has never occurred to him that an investigation of Marine Wild Life 106 should be so planned as to permit negative findings to appear if, in fact, the course is not all he firmly believes it to be.
But what most bothers the professional tester about the many Professor Finns who are experimenting with new courses is the series of bland assumptions that lie behind the letters they write. The first such assumption is that one can prove something by giving a single test after the show is all over. The second is that a standardized objective test is standard and objective in some absolute sense, as though it had been made in heaven. The third is that testing agencies are like vending machines: All you have to do is put your nickel in the right slot, and out will come precisely the test you are looking for. All of which would be just dandy if true, but unfortunately none of these assumptions bears any noticeable relation to the facts of testing and the assessment of achievement.
THE HUMAN SIDE OF TESTS
What Professor Finn does not appear to realize is that an educational test of any kind is primarily a human process, not a physical thing. It is a process that begins and ends in human judgment. It is a process that requires time, effort, and hard creative thinking, not only of the professional tester, but of the test user as well. It is a process that can take an infinite number of forms, depending on the purposes to be served and the conditions under which the testing is to be done. It is a process that can hardly be confined to the use of multiple choice questions and paper-and-pencil techniques if it is to provide the sort of rewarding illumination of the educational scene that some people hope for.
In what follows I shall discuss some of the essentials of the testing process as applied to the assessment of academic achievement and pay my respects to several of the knotty problems that bedevil the whole enterprise. The discussion will turn up more questions than answers, but if the questions are sufficiently disturbing, the outcome should be such as to nudge Professor Finn toward deeper wisdom in approaching his evaluation problem.
What do we mean by "academic achievement"? If it is something we are trying to assess, then it seems reasonable to ask first of all what it is. In current usage, it is a fuzzy term that may mean any one of a dozen unspecified things: the sum total of information a student has at his command when he finishes a course of instruction, the getting of a passing grade in a course regardless of what may lie behind the grade, the score on a test that has "achievement" in the title, and so on.
There are two ideas that can be used to pin down the notion of academic achievement a bit more precisely. The first idea is that academic achievement refers to the identifiable operations a student is expected to perform on the materials of a course, that is, on the facts, theories, problems, principles, and points of view which he encounters while taking the course. The second idea is that academic achievement refers to the differences between the number and kinds of operations a student can and does perform at the beginning of a course and the number and kinds of operations he can and does perform at the end of a course.
The key terms in this definition of achievement are identifiable operations and differences. The emphasis on operations is supposed to suggest that it is what the student actually does that counts. We should think of achievement as something composed of transitive verbs with direct objects: verbs like infer, generalize, recall, compare, analyze, evaluate, organize, criticize. And we need to be sure that we can identify these operations by reference to specific tasks or questions that require them. ("What inferences can you draw from the following set of data?" "What general principle can you find that explains the behavior of the following political figures?" etc.)
The emphasis on the differences between what a student does at the beginning of a course and what he does at the end of a course calls attention to the fact that academic achievement is a dynamic, not a static, concept; it is what has happened between then and now, not just what is happening now. Can the student now solve differential equations that he found impossible last October? Can he detect differences between Beethoven and Brahms that were not apparent to him earlier? Can he organize his thoughts in written form better than he did at the outset?
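Dyer's difference idea can be put in modern computational terms. The sketch below is purely illustrative: the student names and scores are invented, and "score" stands in for a count of operations performed on parallel forms of the same test given in September and June.

```python
# Sketch: achievement as the difference between what a student can do at the
# start of a course and what he can do at the end. All data here are invented.

pre_scores = {"Adams": 12, "Baker": 20, "Clark": 31}   # September performance
post_scores = {"Adams": 25, "Baker": 22, "Clark": 40}  # June performance

def gains(pre, post):
    """Achievement per student: end-of-course minus start-of-course performance."""
    return {name: post[name] - pre[name] for name in pre}

print(gains(pre_scores, post_scores))
# A single June measurement would rank Clark highest; the gain scores show
# that Adams, who started lowest, achieved the most.
```

The point of the sketch is Dyer's: a single end-of-course test measures status, while achievement is the change, and the two can order the same students quite differently.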
Clearly, under this definition, the assessment of academic achievement is a complicated business; it is not something that can be done merely by going out and buying a standardized test two weeks before the final examination. It requires that the teacher responsible for a course must himself be a major participant in the assessment job from start to finish. He may be able to get supplementary aid from published tests; he may find that the professionals in testing can give him useful ideas and point the way to sound strategy; but the substance of the job of sizing up what he has done to his students rests with him.
How shall he proceed?
CLARITY OF GOALS
Obviously, the first step is for him to get a clear idea of what he is trying to have his students achieve. What operations does he want them to perform in June that they could not perform last September? This may be an "obvious" first step, but a look around the classrooms of the country suggests that it is a step rarely taken. Almost invariably teachers are more concerned with what they are going to do in their courses than with what their students are going to do. Lessons and lectures are outlined, reading lists are laid out, visual aids are planned, but few teachers give much concentrated attention to what the fuss is really all about from the standpoint of the student. If a syllabus is prepared, it almost never gives any detailed description of what students are supposed to be able to do as a result of having been exposed to the instructor and the subject matter. There is usually a hope that students will be impressed, that they will remember something, and in a vague sort of way it is felt that the course experiences will do them good, that is, make better thinkers or better citizens of them. But there is very little disposition to get down to brass tacks and specify what kinds of student performance are deemed to reflect better thinking or better citizenship.
All of which is hardly surprising. Stating the goals of a course in a highly concrete manner takes more energy and imagination than most teachers have time for. One of the very practical problems in developing a sound assessment program is to find the time needed to carry out this essential first step.
Generally speaking the goals of achievement fall into three broad classes: informational goals, proficiency goals, and attitudinal goals. The informational goals refer to those items of information that students are expected to know and give forth on demand by the time they have completed a course. It would appear that these can be readily described simply by listing the topics, sub-topics, and sub-sub-topics the course is expected to cover. But this is only half the story. A fact in human knowledge is not just the name of something standing all by itself; it is a subject and a predicate. It is not just the name "sodium"; it is the sentence, "Sodium is a metal." Progress toward the informational goals of a course, therefore, is measured by the number and complexity of these subject-predicate relationships which the student can reproduce. In making the informational goals useful as a basis for assessing achievement, some explicit attention must be given to these relationships and to their degree of complexity. It is not sufficient to list topics and sub-topics and sub-sub-topics.
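The subject-predicate point lends itself to a small illustration. In the sketch below, which uses invented facts and an invented student record, each informational goal is stored as a subject-predicate pair rather than a bare topic name, and progress is the count of such relationships the student can reproduce.

```python
# Sketch: a fact as a subject-predicate relationship, not a bare topic name.
# The course facts and the student's responses are invented for illustration.

course_facts = {
    ("sodium", "is a metal"),
    ("sodium", "reacts violently with water"),
    ("chlorine", "is a gas at room temperature"),
}

student_reproduced = {
    ("sodium", "is a metal"),
    ("chlorine", "is a gas at room temperature"),
}

# Progress toward the informational goals: the number of the course's
# subject-predicate relationships the student can actually reproduce.
known = course_facts & student_reproduced
print(f"{len(known)} of {len(course_facts)} relationships reproduced")
```

A mere topic list ("sodium", "chlorine") could not support this count; only the full sentences can be checked against what the student gives forth on demand.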
The large majority of teachers would disown the idea that the only kinds of goals they have in mind for their courses are informational goals. They would argue, and quite properly, that information is one of the goals, and a necessary one, but certainly not one of ultimate importance. This they argue, but their arguments are too often confounded by their own examinations, which demand facts and facts only. One of the hoariest criticisms brought against objective tests is that they test only for factual information, while essay tests get at the higher aspects of learning. This would be a cogent criticism except for two things: First, a well-made objective test need not be limited to the measurement of factual information; and second, essay tests are too frequently graded solely by counting the number of relevant facts that each student churns up.
Most of what we think of as "the higher aspects of learning" is contained in the proficiency goals of academic achievement, that is, proficiency in various kinds of skills, both manual and mental: manipulative skills, problem solving skills, evaluative skills, and organizational skills. Taken together, they add up to effective thinking and sound execution. But to be useful as guides in the assessment of achievement, they must not be taken together; they must be elaborated in detail and expressed in terms of the multitude of different kinds of specific tasks a student is expected to perform and in terms of the specific kinds of operations he must follow in performing them.
Unless the proficiency goals are elaborated in this fashion, they may be too easily overlooked or forgotten. Back in the 1930's a number of experiments in the teaching of science led to the conclusion that science courses without laboratory instruction were just as effective as science courses with laboratory instruction. The typical set-up was to divide science classes into two groups, one of which received laboratory work in addition to the regular course work and the other of which received no laboratory exercise. At the end of the course, the two groups would be compared by means of an achievement test in science. The differences in test scores between the two groups were generally negligible. Why? Because in setting up the tests, the proficiency goals peculiar to laboratory instruction had been overlooked. The only measures used had been measures related mainly to informational goals that have little or nothing to do with what goes on in a laboratory. Under these conditions, the finding of no difference in achievement between the two groups is essentially meaningless, since the relevant proficiency variables had not been taken into account. More recently Kruglak and his associates at Minnesota have taken a second look at this problem in science teaching, have devised tests aimed directly at the proficiency goals of laboratory instruction, and have come up with results of a more meaningful sort (7).
It is admittedly no easy matter to define all the proficiency goals in terms of student performance. One can flounder a long time over the types of performance that form the components of creative thinking, for instance, or skill in evaluative judgment. The result is that nobody has yet been able to produce a completely satisfactory and universally applicable method for assessing achievement in these areas. On the other hand, there have been some interesting attacks on the problem, and possibly in the course of time it may be solved (8, 10, 11). A giant step toward the solution will be made when teachers now uttering pious sentiments on the subject will get down to cases and try to define what they are really looking for when they speak of such things as "creative power" and "sensitivity to values."
Finally, there are the attitudinal goals: those educational objectives which are often blithely called the "intangibles." Everybody applauds them, but few teachers seem to know what to do about them. How does one define such goals as love of music or sense of social responsibility or enthusiasm for abstract ideas in such a way that they can be recognized when seen? What is it that a student does to demonstrate that he likes good literature? We have our dodges on matters of this sort: We count the books a student takes out of the library; we measure the amount of spare time he gives to good reading (provided, of course, we allow him any spare time at all); we engage him in conversation and try to judge from the fervor in his voice how far along the scale of appreciation his reading has brought him, or we ask him outright on some sort of rating schedule how well he likes Shakespeare and T. S. Eliot. But we suspect that these devices for uncovering attitudes are based on exceedingly tenuous inferences. Books may be taken out of the library not to be read, but to maintain an impressive looking bookshelf. The time spent in reading may really be spent only in an effort to avoid reality. Conversational fervor may be only good acting or the effect of one cocktail too many. And a rating schedule can almost always be faked if much depends upon it.
I dislike to be discouraging in this all-important matter of attitudinal goals, but thus far very few operations have been suggested which define them satisfactorily. There are some breaks in the clouds, though. At Pennsylvania State University a group of experimenters has been trying to get a line on student attitudes toward television instruction. One device they have hit upon is to give the students generous samples of both TV instruction and conventional instruction, and then to permit them a genuine choice as to which type they would take for the remainder of the semester (3). This is a real operational definition of a specific attitude. It furnishes a fruitful clue for further development in defining attitudes in other areas, the general principle being that one defines attitudes in terms of live decisions that make an actual difference in what an individual will do or not do.
MEASUREMENTS OF GOALS
Once the goals of instruction have been clearly set forth by describing the operations students have to perform to attain them, the problem of devising tests and other techniques for assessing academic achievement is at least seventy-five per cent solved. The reason is that an achievement test is in effect a sample of all the kinds of tasks that a given course of study is striving to get students to master. As such, the tests should in themselves constitute a definition of the goals to be attained. In putting together an appropriate sample of tasks and checking out their adequacy, there are technical matters in which the experienced tester may serve as a guide to the teacher. But, again, the questions of what to include or not to include, what is more important and what is less important, what constitutes a good response and what constitutes a poor response must in the last analysis be the teacher's decision if the test is to measure progress toward the goals he has in mind.
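Dyer's view of a test as a sample of tasks can be sketched concretely. In the sketch below, the task pools, the goal weights, and the item count are all invented; the one substantive point carried over from the essay is that the teacher, not the test-maker, decides how much each class of goals should count.

```python
# Sketch: an achievement test as a sample drawn from the pool of tasks a
# course asks students to master. Pools and weights are invented.
import random

task_pools = {
    "informational": ["recall fact A", "recall fact B", "recall fact C", "recall fact D"],
    "proficiency":   ["solve problem X", "analyze data Y", "organize essay Z"],
    "attitudinal":   ["choose between readings P and Q"],
}

# The teacher's judgment of relative importance, expressed as weights.
weights = {"informational": 0.4, "proficiency": 0.5, "attitudinal": 0.1}

def draw_test(n_items, pools, weights, seed=0):
    """Draw a test of about n_items tasks, allocated in proportion to the weights."""
    rng = random.Random(seed)
    test = []
    for goal, pool in pools.items():
        k = min(round(n_items * weights[goal]), len(pool))
        test.extend(rng.sample(pool, k))
    return test

print(draw_test(6, task_pools, weights))
```

The mechanics of drawing the sample are the tester's business; the contents of the pools and the weights attached to them are, as Dyer insists, the teacher's.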
On the other hand, as mentioned above, teachers are busy people who are unlikely to find the time they need to do all that really should be done to work out the goals of their own instruction and to devise fully valid measures for assessing achievement. What are some of the practical compromises?
In the first place, although one is unlikely to find any published standardized test that in all respects fits the goals of a particular course or school, one may, by analyzing such tests question by question, possibly find one that comes reasonably close. Looking at a test question by question is important not only for determining a test's general suitability, but also for deciding what questions might properly be eliminated in assessing the performance of one's own students. Surprisingly, the notion of adapting a published test by dropping out irrelevant material seldom occurs to people, yet in many circumstances it is an obvious and justifiable procedure.
Standardized tests, when they can be found and adapted, have two important advantages for any program for assessing academic achievement. First, the makers of such tests have usually lavished a great deal of care on the preparation and try-out of each question to make sure that it is unambiguous, that it discriminates sharply between good and poor students, and that it is of the right level of difficulty for the group at which it is aimed. Second, two or more parallel forms of a standardized test are usually available, so that one has at his disposal the means for accomplishing a highly important part of the assessment job, that of measuring the student's performance twice: once at the beginning of the course and once again at the end.
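The "try-out" statistics mentioned above can be sketched in miniature. The response matrix below is invented; the two quantities computed, the proportion of students answering an item correctly (its difficulty) and a crude top-half-minus-bottom-half discrimination index, are simplified stand-ins for the statistics a test-maker would actually compute over much larger try-out groups.

```python
# Sketch of item try-out statistics: how hard each question is, and how
# sharply it separates strong from weak students. Data invented (1 = right).

responses = [  # one row per student, one column per question
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 1],
]

def item_stats(matrix):
    totals = [sum(row) for row in matrix]       # each student's total score
    n = len(matrix)
    order = sorted(range(n), key=lambda i: totals[i], reverse=True)
    half = n // 2
    stats = []
    for j in range(len(matrix[0])):
        p = sum(row[j] for row in matrix) / n   # difficulty: proportion correct
        top = sum(matrix[i][j] for i in order[:half]) / half
        bottom = sum(matrix[i][j] for i in order[-half:]) / half
        stats.append({"difficulty": p, "discrimination": top - bottom})
    return stats

for j, s in enumerate(item_stats(responses), start=1):
    print(f"Question {j}: difficulty={s['difficulty']:.2f}, "
          f"discrimination={s['discrimination']:+.2f}")
```

In this invented data, the last question is answered correctly only by the weakest student and so earns a negative discrimination index: exactly the sort of item the try-out process is designed to catch and discard.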
TESTING THINKING ABILITY
Until recently most standardized tests have concentrated rather heavily on the purely informational goals of instruction and have tended to neglect the proficiency goals. In the last few years, however, there has been a marked change in this respect. More and more tests are made up of questions that require the student to perform various mental operations well beyond the simple operation of recalling learned facts. They are getting at problem solving and reasoning processes of many kinds. Consider the following sets of questions drawn from a booklet describing the achievement tests of the College Entrance Examination Board (1). If the reader will take the trouble to wrestle with them, I think he will find that they challenge powers of thought well beyond simple memory, though they also require of the student a firm foundation of factual knowledge.
The first set of questions is intended to test the student's appreciation of those factors that must be controlled so that valid conclusions can be drawn from the results of a scientific experiment:
You are to conduct an experiment to determine whether the rate of photosynthesis in aquatic plants is affected by the addition of small amounts of carbon dioxide to the water in which the plants are growing. You have available all of the equipment and material generally found in a well-equipped science laboratory plus several well-rooted Elodea plants which are growing in a battery jar. It is evident that the plants are healthy because they are giving off bubbles of oxygen.
1. If the addition of carbon dioxide were to have an effect upon aquatic plants, almost immediately after bubbling the gas through the water you would expect to observe a noticeable change in the
(A) growth of the plants
(B) temperature of the water
(C) coloring of the plant leaves
(D) rate of bubble production by the plant
(E) amount of oxygen consumed by leaf respiration
2. If a carbon dioxide generator were not available for the experiment, an adequate supply of carbon dioxide might be provided by
(A) placing a piece of carbon in the water in which the plants are growing
(B) burning a candle over the battery jar
(C) adding carbon tetrachloride to the water
(D) pumping air into the water
(E) blowing through a glass tube into the water
3. Which of the following materials would be least useful in identifying the escaping gas as oxygen?
(A) A collection bottle
(B) Glass tubing
(C) A glass plate
4. Which of the following materials could be used to provide a source of carbon dioxide for this experiment?
(A) Distilled water and carbon
(B) Zinc and carbonic acid
(C) Limestone and hydrochloric acid
(D) Limewater and carbon
(E) Sodium hydroxide and carbon
5. The addition of carbon dioxide could have no observable effect upon the rate of bubbling by the Elodea plants if you had
(A) previously turned off the light
(B) no means of regulating the carbon dioxide flow
(C) no means of regulating the room temperature
(D) a variable water temperature
(E) no cover on the battery jar
6. Before you could conclude whether or not the rate of photosynthesis in aquatic plants in general is affected by the addition of small amounts of carbon dioxide to the surrounding water, you would have to repeat the experiment using
(A) all the materials of the original experiment
(B) an entirely different set of materials
(C) other aquatic plants
(D) other Elodea plants
(E) some terrestrial plants
The following set of questions attempts to test the student's knowledge and understanding of United States foreign policy:
SPEAKER I: I don't think the United States has any business getting involved in the affairs of other nations, through foreign aid or any thing else. We have prospered without aid from other countries. Why can't other nations do the same?
SPEAKER II: But we can't afford not to aid other nations. A foreign-aid program offers us a great opportunity to increase our prestige and we should take advantage of it. A nation which ignores other nations will not be regarded as important.
SPEAKER III: Maybe, but let's not forget the fact that foreign aid is also an investment in our own future. If we don't help other free nations, we can't expect to stay free ourselves.
SPEAKER IV: Let's be practical about this. If other nations are too weak to stand on their own two feet, we should help them, yes, but let's remember that when they become dependent on us we must also subordinate them to us. We can only justify foreign aid if we're going to protect our investment.
SPEAKER V: You're all so cold-blooded about this! It isn't a matter of practicality, but of moral obligation to aid other countries. If we could only renounce war, think of the money and material which would be available for constructive work.
1. A humanitarian point of view is best represented by Speaker (A) I (B) II (C) III (D) IV (E) V
2. Since the close of World War II, the United States has mainly justified its foreign-aid policy by arguments such as those advanced by Speaker (A) I (B) II (C) III (D) IV (E) V
3. Which speaker represents a point of view historically associated with the midwestern United States?
(A) I (B) II (C) III (D) IV (E) V
4. Speaker IV would probably have approved of the
1. Platt Amendment
2. Roosevelt Corollary to the Monroe Doctrine
3. Good Neighbor policy
(A) 3 only
(B) 1 and 2 only
(C) 1 and 3 only
(D) 2 and 3 only
(E) 1, 2, and 3
* * *
Despite the progress that may have been made in developing standardized tests, one must always bear in mind that they can never do the whole job of assessing academic achievement in a particular situation. In the last analysis, if a teacher wants to measure those aspects of achievement peculiar to his own instruction he must resort to at least some tests of his own making. He can get some good ideas for approaching this enterprise from two books: Taxonomy of Educational Objectives (2) and General Education: Explorations in Evaluation (5). If one peruses these books looking for new testing ideas rather than faults, one may find much that is helpful in suggesting ways and means of measuring progress toward the goal of more effective thinking.
But the assessment of the student's growth in effective thinking is still not enough. Any teacher with a well-developed conscience is primarily concerned that his students shall acquire increasing maturity in their attitudes, that is, a more grown-up and informed approach to life and learning. The assessment of this aspect of achievement requires ingenuity and a willingness to experiment with tests that might be regarded as "offbeat."
Two such instruments may be mentioned as examples. One, called The Facts About Science (9), looks on the surface like an ordinary factual test aimed at seeing how much the student knows about scientists and the sorts of things they do. Actually, however, the pay-off response to each of the scored questions represents a false stereotype of the scientist or his work, e.g., "A scientist has no sense of humor." The test has been used to see to what extent high school courses in science influence students away from these false stereotypes. The same approach could be taken to get at student attitudes toward many other social institutions and enterprises.
Another instrument carries the title Sizing Up Your School Subjects (6). It focuses on students' attitudes toward their current academic work and is especially designed for use in educational experiments where a given subject is being taught by two or more methods. Through an indirect approach, which almost certainly guarantees that the student will not fake his responses, it provides the means for comparing different courses or different methods of instruction in the same course with respect to the reactions of students to ten aspects of the material or the teaching. (E.g., What course is best at holding the student's attention? What course is regarded by the student as most valuable to him?) With a little imagination, the technique can be elaborated to cover the attitudes of students toward almost any feature of the academic work in which they are engaged.
One of the things that is holding back the development of new and better methods of assessing academic achievement, especially in the area of attitudes, is the tendency to think almost exclusively in terms of paper-and-pencil tests, particularly multiple choice tests. Multiple choice tests are useful up to a point. They can do more than most people realize in tapping a student's higher mental processes, and all the possibilities of multiple choice questions have not yet been fully explored. Nevertheless, the multiple choice test has, by its very nature, certain severe limitations. So do other forms of written tests, including essay tests.
To break out of this restricted mode of thinking we ought to consider how to exploit the possibilities in the so-called situational tests which psychologists have been working on in recent years. To what extent can such tests be adapted to the measurement of the "intangibles" of academic achievement? Situational tests are still highly experimental and full of technical problems, but they may well be the answer to the question of how to evaluate many of the subtle aspects of human behavior that can never be reached satisfactorily by paper-and-pencil devices.
One effort to apply a situational test in the classroom setting can be found in The Russell Sage Social Relations Test (4). On the surface, this is a test of the ability of a group of students to solve cooperatively a series of construction problems. It does in fact give a reasonably good measure of the group's effectiveness in this situation. If properly administered, however, it also yields information about other important aspects of group behavior: how well the students have learned to work together, how they respond to group control, how well they develop among themselves an efficient group organization, how they react on one another, and so on.
This sort of technique, and others like it, should eventually bring us closer to a truly adequate assessment of how students are attaining the attitudinal goals of instruction. Indeed, such techniques can do more. They can dramatize the importance of the goals by demonstrating how far we are falling short of achieving them. The development and application of good situational tests are expensive and time-consuming. Nevertheless, if we really wish to know what is happening to students in our classrooms, we are going to have to give up the notion that we can find out by relying on the "quick-and-dirty" tests that come cheap.
ASSESSMENT IS CENTRAL
Education these days has become a major concern of national policy. It is in consequence the center of a good deal of bitter controversy, and the issues being debated (segregation, the source of school support, teachers' salaries, federal or local control, "standards") are so overriding that the question of how to assess student achievement seems by comparison to be of minor technical importance. Actually, it ought to be regarded as of central importance to the whole educational enterprise. Until we can reduce our vast ignorance of what is actually happening to the minds and hearts of students in the classrooms, until we can point up the goals of education in terms of what we want students to do physically, mentally, spiritually, and until we have better ways of knowing how well we are getting them to do these things, there is hardly much point to all the fuss about the so-called "larger issues." A comprehensive program to develop better methods for assessing achievement in all the classrooms of the country would go a long way toward taking the heat out of the controversies now plaguing us and would furnish the means for replacing the anarchy in education with new vitality and a sense of direction.
1. A description of the College Board Achievement Tests. Princeton, N. J.: College Entrance Examination Board, 1960.
2. Bloom, B. S. (Ed.). Taxonomy of Educational Objectives. New York, N. Y.: Longmans, Green and Co., 1956.
3. Carpenter, C. R., & Greenhill, L. P. An Investigation of Closed Circuit Television for Teaching University Courses. University Park, Pa.: The Pennsylvania State University, 1958, p. 77.
4. Damrin, D. E. The Russell Sage Social Relations Test: a technique for measuring group problem solving skills in elementary school children. Journal of Experimental Education, 1959, 28, 85-99.
5. Dressel, P. L., & Mayhew, L. B. General Education: Explorations in Evaluation. Washington, D. C.: American Council on Education, 1954.
6. Dyer, H. S., & Ferris, A. H. Sizing Up Your School Subjects. Princeton, N. J.: Educational Testing Service, 1958.
7. Kruglak, H., & Carlson, C. R. Performance tests in physics at the University of Minnesota. Science Education, 1953, 37:2, 108-121.
8. MacKinnon, D. W. The highly effective individual. Teachers College Record, 1960, 61, 367-378.
9. Stice, G. The Facts About Science Test. Princeton, N. J.: Educational Testing Service, 1958.
10. Taylor, C. W. (Ed.). Research Conference on the Identification of Creative Scientific Talent. Salt Lake City, Utah: University of Utah Press, 1956.
11. Wilson, R. C. Improving criteria for complex mental processes. Proceedings of the 1957 Invitation Conference on Testing Problems. Princeton, N. J.: Educational Testing Service, 1957.