The Decline and Fall of Group Intelligence Testing

by Joseph O. Loretan - 1965

The validity of the group intelligence test, used extensively all over the nation, is being widely questioned; and New York City, with the largest public school system in the country, has recently discontinued its use. Although the concept of the IQ has long been taken for granted, it is actually only within the last four decades that intellectual development has been measured by group tests.

"Give a man a tolerably fair memory to start with, and piloting will develop it into a very colossus of a capability. But only in the matters it is daily drilled in. A time would come when the man's faculties could not help noticing landmarks and soundings, and his memory could not help holding on to them with the grip of a vise, but if you asked that same man at noon what he had had for breakfast, it would be ten chances to one that he could not tell you." So Mark Twain commented on the training of a riverboat pilot in Old Times on the Mississippi, coming close to suggesting what Dr. Loretan here calls one of the crucial weaknesses of group IQs. They measure only a limited range of abilities in terms of something already learned; and he discusses here (from a position on the firing line) some of the good reasons for discontinuing their use and replacing them with achievement tests.

(For important help with this article, the author wishes to thank Shelly Umans of the Office of Curriculum Research, NYC Board of Education.)

Traditionally, individual teachers bore the responsibility for assessing their own students' mental characteristics as they watched them struggle to accomplish assigned tasks and saw how some succeeded and some did not. Their judgments were based on observation; and, although they sometimes miscalculated, one teacher's assessment could always be balanced against another's, and great damage was seldom done.

Then, early in this century, science intervened with new ideas about the measurement of intellectual development. In 1904, Alfred Binet was appointed to a commission in the city of Paris which had been asked to recommend methods of picking out mentally incapable schoolchildren. He had worked out some tasks that middle class children could be expected to perform routinely. They included such activities as buttoning clothes, obeying simple commands, copying simple drawings, and knowing left from right. On the basis of his observations of individual children set to such tasks, he came to some tentative conclusions about the comparative development of intellect. These, coupled with an interest in psychometrics, led him to devise the first "measurement of intelligence" instrument.

But "intelligence," for Binet, was educable. "A child's mind," he wrote, "is like a field for which an expert farmer has advised a change in the method of cultivating, with the result that in place of desert land we now have a harvest. It is in this particular sense, the one that is significant, that we say that the intelligence of children may be increased. One increases that which constitutes the intelligence of a school child, namely, the capacity to learn, to improve with instruction" (6).

Binet's testing procedures survived, but not his belief that what was being measured was educable. James McKeen Cattell, Henry Goddard, and Lewis Terman, concerned with either retarded or gifted children, took the tasks developed by Binet and incorporated them into the instrument we have been using for forty years. Binet's notion of "a field of cultivation" was completely reversed in the textbooks on measurement written after the development of the Stanford-Binet test in 1916.

The textbook writers tended to the view that the IQ was to be considered a constant, since, in each individual case, intelligence was conceived to be fixed. By intelligence, psychologists suggested, they understood "inborn, all round intellectual ability" (9), an inherited, general mental quality susceptible to accurate and easy measurement. Almost universally, the popular press and the parents and teachers of America began speaking of intelligence as something children possessed in fairly fixed amounts.


The advent of the group test was what led to the greatest misconceptions. The group IQ test was descended from Binet's individual test; but, unlike a biological descendant, it possessed few of its ancestor's characteristics. Modified by psychologists at Columbia, Stanford, and Chicago, it was intended for low-cost use in various-sized groups, since individual testing was (and is) prohibitive in terms of time, personnel and expense. The test-makers, it is true, were men of good conscience; and care was taken to preserve as many as possible of the aspects of the individually administered test. The first group mental ability tests, in fact, took four hours to administer and included many of the elements of the original individual test. But even the most enthusiastic proponents of the four hour device realized that four hours were too much for youngsters in classroom situations. So the test was whittled away until it was reduced to one that could be administered in forty-five minutes.

In such a brief span, then, a youngster is tested on verbal meanings, spatial concepts, reasoning, number concepts, and word fluency. He is subjected to this, moreover, at whatever time is administratively convenient for his particular school system. The test is usually administered to him by a teacher untrained in testing, whose only qualification is the ability to read the "Directions to Teachers" on the first page of the manual.

By these means, he is identified, labelled, classified, and placed, more often than not, in a class with youngsters of "equal ability." As more than one teacher has said, "Once you know the child's IQ, you tend to see him through it, and you adjust your teaching to his ability or level of intelligence—as revealed by the test." In this manner, the IQ becomes a self-fulfilling prophecy to a child as well as a teacher. It begins to signify something "given," as if it were a part of the body; and, too often, it is taken to be a clue to what a child is "worth." The effect of this on the initiative of teachers and the self-images of children is appalling to contemplate. What is even worse, the general public seems to think that the IQ means the brain power a person was born with, and that nothing can be done about it in future life.

Yet we know through experience that the IQ score is not unchanging. Glenn Heathers, Director of the New York University Experimental Center, states that some children's IQs may vary as much as 40 points from one period of their lives to another. This was dramatically demonstrated in the New York City Demonstration Guidance Project not long ago. Children who were in the project from 1956-59 gained an average of 8 points in IQ. Those in the project from 1957-60 gained an average of 15 points. The range was from a gain of 5 points to 40 points.

Study after study is proving that the IQ can be elevated with good teaching and increased student motivation, for all the fact that many people still consider the IQ score to be stable and predictive. Henry C. Dyer, a recognized expert in testing, writes that ". . . the 'IQ,' which is a type of derived score attached to intelligence tests, is now generally frowned upon by experts in measurement because the assumption on which it rests differs from one test to another and from one standardization group to another. The common meaning it appears to have across tests and populations is quite unreal and misleading" (4). Kenneth Clark, psychologist at City College of the City University of New York, says that IQs based on the usual group tests "are worse than meaningless; they are misleading." He goes on to explain that an IQ score is meaningful only under controlled conditions "of individual testing by a specially trained psychologist who can draw out the best in a child and who can remain sensitive to the level of the child's motivation during testing."

When one looks at studies of the distribution of intelligence, as measured by group tests, one finds that only five to ten per cent of America's students rank as "superior." If this is so, the chances of national survival are slim. But, in reality, we have never learned to recognize or develop more than a portion of the potential in this country because we have imposed strait-jackets through our measurement devices, set limits, sat in judgment.


Consider the observation made in the HARYOU report to the United States Office of Education: that median IQs dropped 4 points from grade 3 to grade 6 in 25 Central Harlem schools. What inferences ought to be drawn from this? Can it be that, the longer some children stay in school, the "less intelligent" they become? Or can it mean that we are simply testing achievement with respect to a narrow range of abilities?

Chauncey and Dobbin, in a recent book (2), state that intelligence tests only measure capacity for learning. They do not measure latent intelligence, nor do they trick people into revealing how much brilliance or stupidity they possess. The basic idea of an intelligence test is that of the "work-sample," since it confronts a student with tasks (or standard jobs) on which he can demonstrate his skills and then compares his work with that of others who have been set to the same tasks.

In addition, they can only measure mental ability, not in terms of some inborn power but in terms of something already learned. ". . . even though 'native intelligence' is suspected to exist, the intelligence tests we use" can only measure "a developed ability in which the innate ability and learned behavior are mixed in unknown proportions." Moreover, the writers go on to say, intelligence tests provide comparative estimates of learning capacity, rather than measures of capacity in absolute units.

Although similar in concept, intelligence tests tend to differ from publisher to publisher; and different types of ability may be associated with intelligence in different schools. Nevertheless, the tasks on which children's skills are tested tend always to be those most appropriate for children in an "average" cultural environment; and they are almost always "schoolish," if not "bookish" in nature.

Recent research, like J. P. Guilford's on the specific abilities required for scientific research, and Jacob Getzels' on creativity, indicates clearly that the abilities commonly measured do not include the ability to think of original scientific solutions, nor do they include the skill of "divergent thinking" which creative people characteristically possess. With considerable justification, Robert L. Ebel of Michigan State University has pointed to the danger associated with the use of a single test or test battery in selective admission procedures and in awards of scholarships. This may foster, he says, "an undesirably narrow conception of ability and thus tend to reduce diversity in the talents available to a school or to society" (5).

Important as verbal and quantitative skills may be, Dr. Ebel suggests, "they do not encompass all phases of achievement." It is operationally simpler to use "a common yardstick" for all students, but overemphasis on the single test may well lead educators "to neglect those students whose special talents lie outside the common core."


Having found +.80 correlations between high school grades and freshman college grades in a study conducted for the Educational Records Bureau, Geraldine Spaulding concluded that there was considerable evidence for the stability of student grades throughout four years of high school, and that a high correlation exists even between ninth year grades and college freshman grades. Studies such as hers suggest the possibility of longitudinal studies based upon school grades supported by standardized achievement tests (11).

Benjamin Bloom, in a recent book (1), writes of "correlations as high as +.83 between high school grades and college grades (scaled), with the average correlation being approximately +.78." The evidence seems to be reliable enough to suggest the use of school grades as a "measure of academic prediction," when further longitudinal evaluations are made. Another possible measure is to be found in standardized achievement tests like those reported by Learned and Wood (8). Again, high correlations were found between the results of tests at the end of the 14th and 16th years of school. Similar results were reported by Lannholm (7), after giving the General Education portion of the Graduate Record Examination to 1,000 college students when they were sophomores and again when they were seniors. Where elementary and secondary school students are concerned, there are studies like those of A. E. Traxler (12), who gave reading comprehension tests to 7th grade students in four schools and repeated the tests in the 12th grade with a correlation of +.85. D. P. Scannel (10) administered the Iowa Tests of Basic Skills in the 4th, 6th and 8th grades and gave the Iowa Tests of Educational Development to the same students in the 9th and 12th grades. Again the correlation (between, for example, the scores at the 4th and 12th grades) was high.
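The coefficients cited above summarize how closely two sets of scores for the same students move together. As a minimal sketch of how such a coefficient might be computed, assuming the studies used the ordinary product-moment (Pearson) correlation, the Python fragment below works through a small example. The scores are invented for illustration and are not taken from Traxler's or any other study cited here; the function name `pearson_r` is likewise only an assumption.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two score lists of equal length (at least 2 pupils)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products of each pupil's deviations from the two means
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations for each set of scores
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined when all scores are identical")
    return cross / sqrt(ss_x * ss_y)


# Hypothetical reading scores for the same six pupils at two grade levels
# (invented data, used only to exercise the function)
grade7 = [41, 55, 48, 62, 37, 58]
grade12 = [60, 71, 69, 80, 55, 72]

r = pearson_r(grade7, grade12)
print(f"r = {r:+.2f}")
```

A value near +1 means that pupils who ranked high on the earlier test tended to rank high on the later one; it says nothing about how much any individual pupil's score rose or fell in absolute terms.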

As Vaughn J. Crandall reports (3), achievement behaviors tend to be consistent across situations and time; and evidence of achievement efforts during elementary school age and early adolescence tends to be predictive of adult behavior. Also, achievement testing combined with teacher judgment has the value of showing continuing growth. The student has the opportunity to compare himself against an achievement test mark and to move up from where he started, if he can. Unlike IQ tests, achievement tests offer hope for improvement and serve as starting lines, not limits or goals. There is considerable agreement among teachers as well as psychologists that, as Kurt Lewin once pointed out, successful goal striving renews motivation for more striving.


New York City has initiated a comprehensive program of achievement testing to supplement class grades. Not only is there substantial evidence that, with appropriate help, achievement scores tend to soar; there is the sense that the new programs may counteract the fatalism fed by beliefs of inherent inferiority, based on misinterpretations of the meaning of the IQ.

Beginning in September, 1964, reading tests have been administered to every grade from the 2nd to the 10th. Achievement batteries are planned for three points in each pupil's educational career, in grades 3, 6, and 9. Mathematics tests are to be given in grades 2, 3, 6, and 8. The following skills are to be repeatedly tested: spelling, capitalization, punctuation, usage, map-reading, graphs and tables, reference tables, arithmetic concepts, and problem-solving, constituting a range of abilities considerably wider than those traditionally tested by IQ group tests.

In the spirit of this new approach, an action research project has been undertaken and is being conducted cooperatively by the New York City School System and the Educational Testing Service. The project deals with the problem of assessing the intellectual ability of young children; and 2,250 first graders in 25 schools are involved. It grew out of the conviction of the author and Henry Chauncey, President of ETS, that teachers' judgments would be more reliable than group IQ testing in gauging diverse first graders' intellectual ability, especially if certain guidelines were established, and if the judgments were supplemented by achievement testing.

It was thought necessary to show teachers how to assess their pupils' intellectual development on the basis of observations in the classroom; and a variety of principles were developed. Teachers were provided with sensible guides to observation, so as to know precisely what to watch for; and a few ground rules were defined for recording and interpreting the observations made.

Equally significant was the structuring of a theoretical model or point of view. A unifying theory does not have to be totally or uncritically accepted to benefit a project of this sort; but there are great advantages in using a single model as a set of working assumptions, so that the project can proceed "all of a piece" rather than as a series of unrelated activities.

The source of the model was largely discovered in the now familiar work of Jean Piaget, modified by ETS staff members and by relevant observations of such psychologists as Jerome Bruner, J. P. Guilford, and others. The three basic assumptions are (1) that intelligence is mostly acquired in a sequence of stages which is about the same for every individual; (2) that it is acquired through interaction between the child and his environment; and (3) that it is revealed by the child's behavior—to the person who knows what to look for.

The first step in putting a theory to work in education is to translate it into terms which can be used by teachers; and the logical starting point for our project was found in the work of Jean Piaget himself. If, it was asked, the working assumptions are tentatively considered to be true, what are the behavioral clues to intellectual development to be used by the first grade teacher in improving his or her understanding of the individual child? The answer had to satisfy two criteria: each clue was to have a demonstrable connection with the child's intellectual development and each clue taken was to be a kind of behavior potentially visible to the teacher who knew what to look for.

Specific descriptions of behavioral clues to intellect among children between five and seven were drawn from the research observations of Piaget, Guilford, Bruner, and others. Contemporary empiricists like these men, no matter how their observations are phrased, always describe development in terms of behavior, or in terms of what the child does that is visible to someone else. Those concerned with the project were able to mine a rich vein of research into the nature of intellect and the evidence of learning, and to compile a list of behavioral "signs" of intellect as observed by many professional researchers. This list was called the "researchers' list" of clues to intellect.


The second source was the primary grade teacher. The project team, an ETS staff, and a headquarters "task force" consulted 75 teachers in 25 New York City elementary schools and asked them about the behavioral signs of intellect they could see in their classrooms. After being given an explanation of the project, each teacher was encouraged to talk about specific occasions when particular children, by their behavior, provided insights into their intellectual development. Occasionally, two or three teachers meeting together would recall behavioral clues provided by the same child, filling out a rounded picture of an intellectually able child. All those interviewed were asked to do all their describing in terms of behavior and to concentrate on positive, not negative clues to intellect.

The candor and thoughtfulness of the teachers interviewed may be suggested by the following:

Teacher A: "Juan seldom speaks in class, apparently doesn't know much English, and will fall out of the bottom of the readiness tests, BUT ... he can take the right bus to get him across town with the laundry, work all the right buttons in the laundromat, and go home again on the bus in rush hour. No matter what the IQ tests show, Juan is a bright six year old in my book!"

Teacher B: "Grace can't or won't do much in language or arithmetic, but in art she reveals perceptions and understandings that would be a credit to a child twice her age."

Teacher C: "If children at this age can communicate well—with each other and with me—I suspect they are pretty bright. Not all the bright ones can communicate, of course, but all the good communicators seem better to me than average in mental alertness."

Hundreds of behavioral clues were gathered from these conversations with teachers. Each suggestion was written on a card immediately after each session with a teacher; and no attempt was made to eliminate duplication until all the material had been gathered. When the teachers' suggestions were finally edited and categorized, it was found that a rich "teacher list" of behavioral clues to intellectual development could now be evaluated against and compared with the "researchers' list." A staff jury did this work and discovered that most of the clues suggested by the teachers fitted into the original list at least once. The outcome was a long list of behavioral clues to intellectual development in six year olds which conformed to the theoretical model.

Because of the unfamiliarity of Piaget's terminology, a set of categories appropriate to the teachers' list of clues was developed for the try-out materials, and the researchers' list was fitted into the categories defined. Like all categories, these are simply "handles" for efficient use of the behavioral statement, and the primary purpose of the organization is to help teachers find their way through the list of behavioral descriptions. The categories are:

Area 1: Concepts of Space and Time

Area 2: The Growth of Logical Reasoning

Area 3: Understanding Mathematics

Area 4: Oral Communication

Area 5: Learning About the World

Area 6: Imagination and Creativity

The combined researchers' list and New York teachers' list of behavioral clues to intellectual development have been printed in the booklet Let's Look At First Graders: An Observational Guide for Teachers, for try-out in the school year 1964-65.


From the very beginning of the project, when the ETS staff was combing the research papers of Piaget and others for the specific behavioral evidence of intelligence, it was apparent that not all of the behavioral signs of growing maturity could be seen just by waiting for them to happen in the classroom. Some of the more important signs appear so infrequently in the ordinary course of events that the teacher might watch for months and never see them. The water-volume demonstration that Piaget was so fond of illustrates the point. This is a simple experiment that children think is fun to watch, and it permits the individual to exhibit his level of development in two different intellectual characteristics. However, lacking this simple but artificial opportunity for a child to reveal such development, the teacher might never be able to see what the child is like in these respects.

So the project was planned by the author's direct staff and the ETS policy group and project staff to consist of three phases or elements:

Element I: This phase is to concentrate on behavioral clues to intellectual development likely to be exhibited naturally in the course of planned classroom lessons and playground activities following usual curriculum guides. The teacher is to observe clues without special provision for eliciting them. The eventual outcome of this phase of the project is to be a guide to the observation of six-year olds for the purpose of estimating their intellectual development. The try-out form of the material in this phase is Let's Look at First Graders.

Element II: This phase is to concentrate on lesson materials and ideas for eliciting intellectual behavior so that it can be observed by the teacher. If, for any one of several reasons, the teacher is not able to "see" certain children engaged in appropriate intellectual tasks, she can use prototype lessons suggested as part of Element II, as well as puzzles, tasks, and games, to make sure that they have opportunities to demonstrate their development.

Element III: This element of the project is concentrated upon developing special tasks for first-graders to perform in approximately "standardized" circumstances. That is, even though the project emphasizes the role and the skill of the teacher in observing and assessing pupils' intellectual development, there remains a need for a means of comparison among children so that the teacher may retain a fairly stable set of ideas about what bright children at age six can do. Though the materials produced in this phase of the project will not be standardized "tests" in the traditional meaning of that term, they will have more of the characteristics of tests than the materials in either of the other two phases.

There are no assurances that Let's Look At First Graders: An Observational Guide for Teachers will be a valid assessment of the intellectual ability of young children. It might just live up to its name, "an observational guide for teachers"; and this alone might represent a giant step in education, especially if it directs teachers' attention to the actualities of behavior.

Jerome Bruner, Director of the Center of Cognitive Studies at Harvard University, suggests that the objective of testing should be to discern what the far limits of man's capacities are—how much a child can make of the best hints we might give him. He goes further and suggests that we should teach and test and teach and test, and only then can tests serve us with a benchmark, not of where a child is but of where he is capable of going.

It is on this premise that we should build: that there is no upper limit to what people are capable of doing with their minds. Back in 1900, before Binet had turned most of his attention to psychometrics, he concerned himself with the educability of intelligence, taking five or six aspects of intelligence that he thought could be trained. We have moved little in sixty years. It is time for educators to concern themselves with educability once again.


1. Bloom, B. S. Stability and change in human characteristics. New York: Wiley, 1964.

2. Chauncey, H., & Dobbin, J. Testing: Its place in education today. New York: Harper & Row, 1963.

3. Crandall, V. J. Achievement. In Stevenson, H. W., et al. (Eds.) Child Psychology: Yearb. nat. Soc. Stud. Educ., 1963, 62, Part I, Pp. 416-459.

4. Dyer, H. What intelligence tests don't test. University, A Princeton Magazine, 1964 (No. 20), 4-5.

5. Ebel, R. The social consequences of educational testing. ETS Annual Meeting, 30 October, 1964.

6. Hunt, J. McV. Intelligence and experience. New York: Ronald, 1961.

7. Lannholm, G. Educational growth during the second two years of college. Educ. psychol. Measmt., 1950, 10, 367-370.

8. Learned, W. S., & Wood, B. D. The student and his knowledge. New York: Carnegie Fnd. Adv. Teach., 1938.

9. Moody, W. and others. How the mind works. London, Eng.: George Allen, 1945.

10. Scannel, D. P. Differential prediction of academic success from achievement test scores. Unpubl. PhD dissertation, State Univer. Iowa, 1958.

11. Spaulding, G. Another look at the prediction of college scores. Unpubl. rep. for the College Entrance Examination Board. New York: Educ. Rec. Bur., 1960.

12. Traxler, A. E. Reading growth of secondary school pupils during a five year period. Educational Records Bulletin. New York: Educ. Rec. Bur., 1950, No. 54, Pp. 96-107.

Cite This Article as: Teachers College Record, Volume 67, Number 1, 1965, pp. 10-17