
Gender and Antecedents of Performance in Mathematics Testing

by Ann Gallagher - 1998

Gender differences in mathematics test performance have been documented extensively, providing a fairly clear picture of the circumstances under which differences are found. Notably fewer insights have been offered as to how these differences arise, why performance differences are found on tests but not in classroom grades, or what might be done to change current patterns. Using Halpern's (1997) psychobiosocial model of cognitive development as the point of departure, this article seeks to trace how differences in socialization patterns may contribute to cognitive processing differences, which, in turn, may lead to performance differences on tests.

Gender differences in mathematics test performance have been documented extensively (Hyde, Fennema, & Lamon, 1990; Kimball, 1989; Wilder, 1996; Wilder & Powell, 1989; Willingham & Cole, 1997). The size of the differences ranges from negligible to greater than half a standard deviation (a magnitude considered either moderate or large depending on the consequences). In general, larger gender differences are found on tests intended to assess quantitative reasoning ability as opposed to achievement in mathematics.1 Furthermore, significant gender differences are found primarily on tests administered to groups at or beyond adolescence (Hyde, Fennema, & Lamon, 1990; Maccoby & Jacklin, 1974), although recent work suggests that some differences in strategy use can be detected at earlier ages (Fennema, Carpenter, Jacobs, Franke, & Levi, 1998). Research examining gender differences in classroom grades, on the other hand, has generally reported no difference, or differences favoring females, even in high-level mathematics and science courses (Bridgeman & Wendler, 1991; Kessel & Linn, 1996).

Indeed, an examination of mean college grade point averages (GPA) for majors in engineering and physical sciences conducted at the Massachusetts Institute of Technology revealed that when students are selected for admission on the basis of multiple criteria, significant gender differences in mean SAT-I Mathematical Reasoning Test (SAT-M) scores do not translate into significant differences in classroom performance. For eight cohorts of students showing significant differences in SAT-M scores ranging from three points to sixty-six points in favor of males, only two cohorts (both in Aero/Astro Engineering) showed any significant difference in mean GPA (Johnson, 1993).

The literature examining factors related to gender differences in quantitative test performance has primarily been descriptive, dealing with the conditions under which differences are more or less likely to be found and documenting patterns of course-taking and their relationship to performance. Although these descriptive accounts provide a fairly clear picture of where and when gender differences on quantitative tests may be found, they offer few insights into how these differences arise, why performance differences are found in tests, but not in classroom grades, or what might be done to change the current patterns.

Many models have been suggested to explain the existence of gender differences in cognitive abilities. Traditionally, models were based on the premise that these differences resulted primarily from either socialization practices (Bandura & Walters, 1963; Bem, 1981; Kohlberg, 1966) or physiological indicators (Nyborg, 1983; Tanner, 1962). However, more recent models recognize that societal and biological factors interact systematically to create gender differences in cognitive abilities. Halpern's (1997) psychobiosocial model outlines a system in which socialization and biological predisposition interact in a complex fashion and the influence of either on its own is impossible to determine. As Halpern notes,

Learning is both a socially-mediated event and a biological one. Individuals are predisposed to learn some topics more readily than others. A predisposition to learn some behaviors or concepts more easily than others is determined by prior learning experiences. . . . . Neural structures change in response to environmental events; environmental events are selected from the environment on the basis of, in part, predilections and expectancies; and the biological and socially mediated underpinnings of learning help to create the predilections and expectancies that guide future learning. (p. 1692)

In other words, small genetic predispositions influence early experience and experience influences development, which in turn influences future experiences and future development. Although biological and environmental factors cannot be examined as separate entities in research, for both ethical and methodological reasons, environmental factors are the easier of the two to manipulate. For this reason, the current article focuses on patterns of socialization and how they are reflected in cognitive processing differences that ultimately affect performance on tests. This discussion should be understood in the context of the psychobiosocial model in that gender differences in cognitive abilities stem from both physiological and environmental influences. However, according to this model, if environmental factors change, then physiological differences may also change. Environmental factors, therefore, offer one potential avenue for diminishing gender differences in performance.

By linking research from three disparate areas of psychology it is possible to create a clearer picture of the influence of socialization on gender differences in test performance. First, research examining gender differences in socialization reveals how early socialization creates two distinct cultural realms. Differences in male and female cultures result in differential participation in activities and patterns of behaviors that provide the background for performance differences on tests. Second, research examining gender differences in educational experiences demonstrates how experiences in school help to reinforce and augment the early influences of socialization. Finally, work on gender differences in cognitive processes indicates patterns of performance that parallel the patterns found in socialization. These cognitive-processing differences ultimately lead to differential performance on tests.

The first section of this article discusses explanatory factors that have been proposed in the educational psychology literature on gender differences in quantitative test performance. This includes differences in course-taking patterns, as well as differences in classroom experiences and aspects of motivation. The explanatory value of these factors can be evaluated for various segments of the testing population.

The next section discusses work from cognitive psychology, examining gender differences in cognitive processes used in responding to questions on quantitative tests. This section will review published work examining gender differences in problem solving.

The third section reviews work from social psychology that holds explanatory value for patterns of gender differences in problem-solving strategies observed on quantitative tests. This discussion focuses on socialization factors that differentially shape the lives of males and females. Part of this socialization process includes the formation of a preferred mode of approaching quantitative material that differs as a function of broader socialization factors.

A summary of the findings from the three domains of psychology (educational, cognitive, and social) suggests how each informs the others to provide a richer understanding of the various factors that interact to produce gender differences in performance on quantitative tests. To illustrate what can be done, given the present state of data, some research in progress is discussed. The final section of this article explores policy and practice implications for standardized testing in quantitative domains and for teachers and parents as influential elements in the socialization process.


Several explanations have been offered for gender differences in performance on standardized tests of quantitative reasoning. The most widely cited factors are different course-taking patterns, different experiences both inside and outside the classroom, differences in motivational factors, such as self-confidence and expectations, and the sex-role standard that mathematics is a male domain.


Gender differences in course-taking patterns for advanced mathematics courses have been well documented (College Board, 1996). On the SAT-M, the magnitude of the gap between males and females is reduced, but not eliminated, when students are matched on specific course work. Bridgeman and Wendler (1991) found that male and female students performed equally well within specific mathematics courses (i.e., course grades were the same or favored females slightly). However, differences in the performance of these same students on the SAT-M remained sizable; the initial difference of .48 standard deviation units was reduced to .42 for students in algebra classes, .36 for students in precalculus, and .32 for students in calculus classes.
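The arithmetic behind "reduced, but not eliminated" can be made explicit. The following is a minimal sketch that uses only the four effect sizes reported above from Bridgeman and Wendler (1991). The variable and function names are illustrative assumptions, not part of the original study. It expresses each course-matched gap as a share of the initial .48 standard deviation gap.

```python
# Gender gap on the SAT-M in standard deviation units, as reported in the
# text above (Bridgeman & Wendler, 1991). All names here are illustrative.
INITIAL_GAP = 0.48
GAP_BY_COURSE = {"algebra": 0.42, "precalculus": 0.36, "calculus": 0.32}


def relative_reduction(initial, matched):
    """Return the fraction of the initial gap removed after course matching."""
    return (initial - matched) / initial


# Share of the initial gap that disappears within each course group,
# rounded to three decimals to avoid floating-point noise.
reductions = {
    course: round(relative_reduction(INITIAL_GAP, gap), 3)
    for course, gap in GAP_BY_COURSE.items()
}

for course, share in reductions.items():
    print(f"{course}: {share:.1%} of the initial gap removed")
```

On these figures, matching on course work removes roughly one-eighth of the gap for algebra students, one-quarter for precalculus students, and one-third for calculus students, so about two-thirds of the initial difference remains even in the most advanced group.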

Performance differences on the quantitative section of the Graduate Record Examination general test (GRE-Q) are also reduced, but not eliminated, when the sample is restricted only to students entering fields requiring substantial mathematics preparation. An examination of gender differences in performance for GRE test-takers from 1995 to 1996 who intended to apply to graduate school in Physical Sciences indicates that the difference is reduced from the average of .65 standard deviation units to .39 standard deviation units (Graduate Record Examinations Board, 1997). In fact, because a large proportion of these students score in the highest range, the reduction in gender differences for this group may appear more substantial than is actually the case because of the "ceiling effect."2


Sadker, Sadker, and Klein (1991) reviewed work conducted during the 1970s and 1980s documenting gender differences in the classroom interactions of teachers with students. This work included observations of mathematics and science classrooms. In general, males tend to receive both more positive and more negative attention from teachers, and teachers tend to provide more scaffolding for male students, encouraging them to rethink incorrect responses and arrive at correct answers. In contrast, teachers are more likely to give shorter or yes/no responses to female students and to be less encouraging of longer, more elaborate responses from female students than from male students. Beal (1994) suggests that this may be the result of more protective feelings toward girls on the part of teachers; they challenge girls less for fear of embarrassing them. This type of differential treatment, however, has been linked to differences in educational outcomes (Beal, 1994).

Teacher feedback to male and female students in mathematics also tends to differ. According to Eccles and Blumenfeld (1985), observations of first- and fifth-grade classrooms in both middle- and working-class school districts showed that the majority of teacher feedback to both male and female students was negative. Feedback to male students focused primarily on procedural violations, while feedback to female students focused on their academic performance. According to the authors, negative feedback for academic performance (as opposed to procedural violations) served to lower female students' confidence in their ability to perform well on future tasks.

Some parents appear to have diminished expectations for their daughters' success in mathematics. In a survey of parents and students in eleven high schools in the San Francisco Bay area, Stallings (1985) found that of students continuing in advanced mathematics courses, parents had higher expectations for sons than for daughters; male students reported more parental support than females for taking advanced mathematics courses.

Other studies indicate that parental expectations influence children's own expectations for success in mathematics. Jacobs (1991) surveyed middle school and high school students and their parents about the students' mathematics achievement and gender differences in mathematics ability. Results from this survey indicate that parental expectations for children's performance in mathematics classes were influenced by the child's sex and parental stereotypes regarding mathematics. Parents who stereotyped mathematics as a male domain held lower expectations for their daughters than for their sons. In addition, students' self-perceptions about mathematics ability were more strongly related to parental expectations than to actual classroom performance.

The intermediate outcome of differential teacher behaviors and parents' expectations is that males and females develop different perceptions of their own mathematics ability and potential for future success in mathematics. Some researchers hypothesize that differential feedback from teachers, combined with lower expectations from parents, causes females to attribute successful performance to different and less stable causes than males (Eccles & Jacobs, 1986). Females have a greater tendency to attribute their successes in mathematics to hard work, while males more often attribute their successes to ability (Wolleat, Pedro, Becker, & Fennema, 1980). Hard work is an unstable factor that demands a high level of sustained effort, as opposed to ability, which is stable and does not require effort to maintain. These differential attributions can lead to reduced self-confidence and persistence in mathematics as a field by females (Meyer & Koehler, 1990).


Differences in course-taking patterns provide one reasonable explanation for average gender differences in performance on standardized tests of quantitative material and differing proportions of males and females entering and persisting in fields requiring extensive mathematical training. Female students tend to take fewer advanced mathematics courses, and so are less familiar with the material presented on the tests and do not have the prerequisite skills for more technical courses. However, patterns of course-taking fail to account for gender differences in test scores among students with equivalent training (Bridgeman & Wendler, 1991; Graduate Record Examinations Board, 1997).

On tests such as the SAT-M, it is possible that the large residual differences found after course work has been controlled could be attributed to factors such as different levels of self-confidence and expectancies of teachers and parents. However, it is not clear why these factors would affect test performance and not classroom performance. The assumption is that standardized tests present a high-stakes environment likely to elicit greater anxiety than classroom assessments. However, work examining the relationship between gender differences in performance and test anxiety indicates that although females generally rate themselves higher on test anxiety (Hembree, 1988), gender differences in performance remain at about the same level in both high-stakes and low-stakes testing situations (Bridgeman, 1992).

Self-confidence becomes an even less convincing explanatory factor for gender differences in performance among students who have completed undergraduate programs in mathematical and scientific fields and are contemplating graduate work in these domains. Females who lack confidence in their mathematics ability are not likely to survive in engineering or other undergraduate programs that rely heavily on quantitative skill and are largely populated by males. Differences in course work are also not a reasonable explanation for gender differences in performance for this group. Most undergraduate programs in natural sciences and engineering have strict prerequisites regarding course work in calculus and other advanced quantitative material. Notwithstanding these requirements, performance differences favoring males are found for this group of students on quantitative tests. Indeed, the magnitude of the gender differences on assessments targeting highly technical fields is equivalent to what is found on assessments intended for the general population. For example, on the mathematics portion of the GRE Computer Science exam (taken primarily by students planning graduate work in that field) the difference is .62 standard deviation units and for the GRE Mathematics subject test (for those entering graduate work in mathematics) the difference is .87 (Willingham & Cole, 1997).

More recent work focusing on the interaction between cognitive-processing demands related to specific types of problems and differences in abilities of males and females may provide a more credible explanation for performance differences on tests designed for populations with advanced quantitative training.


Analyses of the mathematics content of questions on standardized tests have failed to identify specific mathematics content that consistently favors males over females (Doolittle & Cleary, 1987; McPeek & Wild, 1987; O'Neill, Wild & McPeek, 1989). However, general patterns of performance are evident; many studies find that women tend to perform better on algebra problems than on geometry problems. Other studies examining cognitive factors show gender differences in performance on problems requiring unconventional applications of mathematics knowledge or problems classified as reasoning problems (Armstrong, 1985; Dossey, Mullis, Lindquist & Chambers, 1988; Gallagher & De Lisi, 1994).

The evidence for gender differences in performance patterns on standardized mathematics tests, combined with the fact that women do as well as (or better than) men in mathematics courses even at the most advanced levels (Kimball, 1989), suggests that the standardized tests may be assessing a different or more narrowly defined construct than do course grades.

This is not surprising given the differences between standardized tests and the teacher-constructed tests that form much of the basis for course grades. Standardized tests generally rely on a large number of rapidly generated responses to a diverse set of questions. Questions are often designed to require unfamiliar applications of mathematical knowledge. The solution path is irrelevant as long as the answer is correct. Teacher-designed tests, on the other hand, focus on applications and content students have worked with over a prolonged period of time. Teacher-designed assessments of advanced quantitative material place as much (if not more) emphasis on how the answer is constructed as on whether it is correct. Partial credit is frequently awarded for incorrect answers that are, nonetheless, based on sound reasoning.

Kessel and Linn (1996) suggest that there are two views of what constitutes mathematical ability. One view is that the kind of unfamiliar task found on some standardized tests of quantitative material is a crucial element in any assessment of quantitative reasoning ability. This line of reasoning posits that those who can apply knowledge in an unfamiliar circumstance and solve test items quickly are more able reasoners than those who cannot. On the other hand, according to Kessel and Linn, professional mathematicians value the solution of difficult extended problems that require thought over a period of hours or days. They suggest that the former view of quantitative reasoning ability, which they call clever and speedy, is likely to disadvantage more reflective students whose study habits and problem-solving approach may actually be closer to the reasoning valued by professional mathematicians. Research by Gallagher and De Lisi (1994) suggests that females as a group are less likely than males to fall into the clever and speedy category in solving difficult mathematical reasoning problems.

In this study, students (22 females and 25 males) who scored at least 670 on the mathematics portion of the Scholastic Aptitude Test (SAT-M) were asked to think aloud while solving difficult mathematics problems that had previously shown sizable gender differences in performance when administered as part of an operational SAT. Problems were labeled either conventional or unconventional on the basis of how closely they resembled typical textbook problems. Solution strategies used by students were also categorized as conventional and unconventional on the basis of how closely they resembled strategies taught in mathematics classes. Most of the unconventional strategies involved short-cuts, estimation, or visual/spatial strategies. Results indicated that although there was no gender difference in performance on the entire set of problems (both conventional and unconventional), females were more likely than males to use conventional strategies and males were more likely than females to use unconventional strategies.

More recent work by Fennema et al. (1998) suggests that the same type of differential strategy use found by Gallagher and De Lisi (1994) may be evident even in grade school, before it is generally found in standardized test performance. In their sample of third-grade students, there was no sex difference in performance on number fact, addition/subtraction, or nonroutine problems, but boys performed significantly better than girls on extension problems (problems requiring multi-digit operations that had to be done without paper and pencil).

Work by Halpern (1992) approaches this same issue from a cognitive-processing perspective. Halpern suggests that there may be common underlying cognitive processes among the tasks that favor males or females across a variety of content domains. According to this hypothesis, women appear to excel at tasks that require rapid access and retrieval of information from memory. These tasks include associational fluency tasks (e.g., generating synonyms), language production, and word fluency tasks, as well as anagrams and computational tasks. Males appear to excel at tasks that require the retention and manipulation of a mental representation. These tasks include mental rotation and spatial perception tasks, verbal analogies, and some types of mathematical problem solving. In terms of the discrepancy between course grades and standardized test performance in mathematics, course work relies heavily on accessing and retrieval of information, skills at which females tend to excel. Standardized tests, on the other hand, may rely more heavily on the quick mental manipulation tasks at which males tend to excel.


A large body of research documents gender differences in patterns of socialization and the consistency and pervasiveness of sex-role socialization throughout a child's life from infancy through adulthood. Specific sets of behaviors, based on the child's sex, are rewarded and sanctioned by parents, teachers, and peers. These male and female cultures hold different expectations for their members, which are repeatedly reinforced by society.

The general pattern that emerges from reviews of this literature (e.g., Beal, 1994; Bem, 1993; Thorne, 1993) is that males are expected to show greater independence and assertiveness and less compliance with rules and authority. Females are expected to show greater compliance with rules and authority and less assertiveness and independence. For example, male toddlers are encouraged to explore their environment, whereas female toddlers are encouraged to stay in close physical proximity to their caregiver. Male toddlers are more often left to play alone and are expected to be less compliant when told "no" by their mothers. Later, school-age boys are granted more independence to visit parks or libraries on their own, and parents of school-age boys believe they are more competent and better able to take care of themselves than girls of the same age. Parents believe that daughters are in need of extra protection and are more fragile and easily frightened than boys.

Parental attitudes toward achievement also differ for male and female children. Hard work, self-reliance, persistence, and initiative are more highly valued for sons than for daughters. Some parents have more narrowly defined views of success for their sons than for their daughters and are less involved in decisions regarding choice of college major and career for daughters than for sons. Beal's (1994) review paints a picture of parents reinforcing gender-appropriate stereotypes in their children. The result of this reinforcement is the development of different styles of interactions with peers and the world at large.

Patterns of interactions with peers also suggest that two distinctly different cultures exist, each operating under a different set of rules and assumptions. Male friendships are usually based on large groups of peers, whereas females tend toward small groups of two or three. Competition and a dominance hierarchy are typical of male groups of friends. Female groups, on the other hand, more often emphasize equality and cooperation. Males participate to a greater degree than females in competitive sports; females tend to participate in individual sports, such as gymnastics, that involve indirect competition.

Several researchers have examined the influences of sex-role stereotypes on participation in mathematics and science (Armstrong, 1985; Eccles & Jacobs, 1986; Jacobs, 1991; Sadker, Sadker, & Klein, 1991; Stallings, 1985). The consensus is that mathematics is considered a male domain. These cultural stereotypes discourage females from participating. However, it appears that even among those few who persist in the field, gendered socialization may influence their way of approaching quantitative problems.

The Autonomous Learning theory (Fennema & Peterson, 1985) applies the notion of differential socialization to learning behaviors in the classroom. According to this theory, beliefs about oneself and the realm of mathematics influence students to choose and persist at different sets of tasks. A high level of independence and confidence leads to independent or autonomous learning, where students seek out complex tasks and show a high level of persistence. Males, who are expected and encouraged to be independent in other aspects of their lives, tend to bring this style to the classroom, especially in the male domain of mathematics.

Other research indicates that females tend to prefer cooperative, rather than competitive, learning environments (Sadker et al., 1991) and that cognitive gain for females is enhanced in cooperative or one-on-one situations. It is only a small leap to extend these notions to strategies applied in a standardized testing environment. Females are socialized to be cautious, compliant, and cooperative and to follow the rules, while males are socialized to be competitive and independent thinkers, and to take matters into their own hands. The timed nature of standardized tests evokes a highly competitive environment. The unfamiliarity of some of the material in standardized tests also lends itself more readily to a problem-solving style that is less rule-governed, more competitive, and more prone to risk-taking than the typical classroom environment. Given the differences in the behaviors that are considered gender-appropriate, the standardized testing environment (in contrast to the classroom environment) seems clearly more compatible with a masculine rather than a feminine culture.


Work from the domains of educational, cognitive, and social psychology sheds light on the question of factors underlying gender differences in performance on quantitative tests. The literature from social and educational psychology paints a clear picture of two distinct sets of behaviors that are considered appropriate for males and females. This socialization leads to different ways of thinking about the world and approaching problems; females are more conservative and consistent in following the rules, while males take greater risks and respect the rules less. In the words of Bem (1993), each individual brings a particular style of social interaction to all situations (p. 154), and this style is the product of the individual's interactions with society.

The cognitive psychology literature suggests a similar pattern in the kinds of quantitative tasks at which males and females excel; males are better at manipulating images and finding less orthodox shortcuts to problems, working outside the rules, while females are better at remembering what they have been taught and consistently applying rules and translating from one language to another (e.g., from standard English to algebra). The educational psychology literature shows that females get equal or better grades in mathematics classes, but that males outperform females on standardized tests. Research examining performance on standardized tests indicates that the largest differences in performance are found on tasks that are the farthest removed from what is taught in school: unfamiliar problems that require some unconventional application of knowledge in a highly timed, competitive setting.

Behaviors that are reinforced in schools are many of the same behaviors that society labels appropriate for girls. Boys, on the other hand, learn one set of behaviors from society and a different set in school. Society teaches boys to take risks and break rules while schools encourage them to be compliant and adhere to rules (though there is quite a bit of wink and nod toward boys' behavior in schools). In addition to a knowledge of the material being tested, the behaviors that are most likely to lead to success on standardized tests of quantitative reasoning are the behaviors that boys (but not girls) learn outside of school. In contrast, the behaviors at which girls excel, applying a given set of rules or language-to-language translations, are behaviors that are rewarded in the classroom but not on tests.


If research continues to show patterns of gender differences in performance that are related to different sets of cognitive processing demands that, in turn, may stem (at least in part) from differences in socialization, then several important policy issues must be considered. First, testing organizations and their clients must decide whether a more assertive, greater risk-taking approach is, in fact, more likely to predict success in higher education than a systematic, rule-governed approach. If both approaches are deemed valuable, then changes to the test should be made to reflect that balanced view.

Equally important, consideration should be given to factors that contribute to success but are currently left out of the assessments (e.g., interpersonal skills, persistence, motivation, and an ability to get the job done), and funding should be allocated to exploring ways of reliably assessing these skills and qualities. This becomes increasingly important as affirmative action guidelines are being abandoned in favor of more objective criteria (such as standardized test scores). Funding for continuing education is frequently tied to performance on standardized tests. If the tests reflect only a subset of skills and abilities required for success in graduate education, and these skills are more closely aligned with behaviors designated male, then females are more likely to be unfairly denied access to higher education.

If standardized admissions tests continue to focus on only a limited set of skills and abilities, then use of these tests to select students to continue in higher education will omit students with other qualities. Testing organizations should continue to expand the dialogue with the teaching community to form a partnership capable of clarifying the limitations of standardized tests and to work toward incorporating other important skills and abilities in future assessments. Such a partnership is likely to result in a more valid, broadly based assessment process.

Educational institutions and teachers within them should reflect on problem-solving styles that are being taught in fields such as mathematics. Initial findings from the Third International Mathematics and Science Study (Peak, 1996), an international comparison of performance and practices in eighth-grade mathematics and science education, reveal that, although most mathematics teachers in the United States are familiar with recommendations for the reform of mathematics teaching (e.g., the National Council of Teachers of Mathematics, 1989), very few apply key points of this reform in their own classroom teaching. The key points of this reform movement are a departure from the tradition of the teacher and text as exclusive sources of knowledge to a greater emphasis on the students' development of conceptual understanding, exploration of a variety of solution methods, and deductive validation of their own conjectures. Instruction that emphasizes multiple methods for solving individual problems and encourages students to formulate and evaluate their own conjectures regarding quantitative material is likely to foster the kind of independent thinking and creativity necessary for high-level mathematical reasoning. Indeed, the patterns of behavior and test performance discussed above are almost exclusively based on research conducted within the United States. International research indicates that performance patterns found in the United States may not hold in other cultures (Peak, 1996).

Teachers themselves are likely to be as steeped in sex-role stereotypes as the rest of society. Therefore, preservice and continuing education for teachers should focus on sensitizing them to the fact that they may be holding down their students' achievement by perpetuating such stereotypes. Training should also incorporate specific strategies for teaching girls the set of skills that boys learn as part of their socialization outside of the school environment. Many of these behaviors (e.g., estimating, forming conjectures, or exploring multiple routes of arriving at an answer) already form part of recommendations for reforms to mathematics instruction. By incorporating reforms, teachers may help to reduce gender differences in performance on tests of quantitative reasoning.


To illustrate how the patterns of performance discussed may be linked to specific characteristics of test questions, we can consider some work in progress examining operational testing data for several GRE quantitative (GRE-Q) tests. At Educational Testing Service, we (Gallagher, Morley, & Cahalan, in preparation) are currently examining gender differences in performance on test questions that have been coded for attributes of both the stimulus (or question) and their solutions. This coding scheme is an attempt to build on earlier work (see Gallagher & De Lisi, 1994) examining SAT-M data in a way that incorporates Halpern's (1992) theoretical perspective on categories of cognitive tasks that show differential performance by gender.

We have now coded test questions from four forms of the GRE-Q using the categories listed in Table 1. A key consideration in coding the items was the mathematical expertise of the population taking the test. What is considered content mastery for one population may be considered unfamiliar material for another. Therefore, it was important to know something about the level of mathematics training of students taking the test. Items were coded only for the most salient characteristics of solutions since many items are highly complex in the set of skills that are tapped. In some cases, more than one code applied to items (e.g., word problems with a spatial component). Questions that were coded as both F (likely to favor females) and M (likely to favor males) were recoded as M (n = 72).
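The recoding rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the item identifiers and their code sets are hypothetical, and the function name `resolve_code` does not come from the study. Only the rule itself (an item carrying both codes is treated as M) is taken from the text.

```python
from collections import Counter

def resolve_code(codes):
    """Collapse an item's set of attribute codes to a single category.

    Rule from the text: an item coded both F and M is recoded as M.
    Items with neither code return None (they fit neither category).
    """
    if "M" in codes:
        return "M"
    if "F" in codes:
        return "F"
    return None

# Hypothetical items, for illustration only (not actual GRE-Q data).
items = {
    "item_01": {"F"},
    "item_02": {"M"},
    "item_03": {"F", "M"},   # carries both codes, so it is recoded as M
    "item_04": set(),        # fits neither category, so it is left out
}

resolved = {name: resolve_code(codes) for name, codes in items.items()}
tally = Counter(code for code in resolved.values() if code is not None)
print(resolved)
print(tally)
```

Resolving ties toward M means that any M attribute in a solution places the whole item in the M category, so the F category contains only items with no M attribute.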

Of the 234 questions included in the analyses from the four forms, 108 were coded F and 126 were coded M. Six items were excluded from the analyses because they did not clearly fit either coding category. There was no significant difference in the average difficulty (percent correct) of items in the two categories.

Analyses were conducted on data for students (11,804 males and 25,382 females) who reported majoring in social sciences (e.g., Psychology, Sociology, Anthropology, Political Science, Economics).3 Males outperformed females on the majority of test questions (95%). However, the average impact, or gender difference, in performance4 was significantly larger for items coded M versus those coded F [F(1, 232) = 9.74, p < .002]. The effect size for this difference was .41 standard deviation units.
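As a rough consistency check on the figures reported above, the F statistic and the effect size can be related arithmetically. The sketch below assumes that the comparison was a two-group, one-way analysis of variance over items (108 coded F and 126 coded M) and that the effect size is a standardized mean difference based on a pooled standard deviation. Neither assumption is stated in the text, and the function name `d_from_f` is illustrative.

```python
import math

def d_from_f(f_stat, n1, n2):
    """Convert a two-group F statistic to a standardized mean difference.

    With two groups, F equals t squared, and for a pooled-SD effect size
    d = t * sqrt(1/n1 + 1/n2). Assumes a one-way, two-group design.
    """
    t = math.sqrt(f_stat)
    return t * math.sqrt(1.0 / n1 + 1.0 / n2)

n_f, n_m = 108, 126          # item counts reported in the text
df_error = n_f + n_m - 2     # error degrees of freedom for two groups
d = d_from_f(9.74, n_f, n_m)

print(df_error)
print(round(d, 2))
```

Under these assumptions the error degrees of freedom come to 232 and the effect size to about .41, both of which agree with the reported values.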

Because the coding scheme used in these analyses focuses on cognitive attributes as opposed to mathematics content, it may be possible to change the balance of these attributes within a test without violating current definitions of the construct as outlined by test specifications. The next step in this line of research is to examine the intersection of test specifications and M/F attributes to determine whether this can be done and whether test forms with reduced impact can be created by taking into account the cognitive attributes of test questions and their solutions.

If this program of research is successful in manipulating the magnitude of gender differences in performance through changes to question formats and cognitive content, then it may be possible to reduce performance differences on tests through small but systematic changes to the questions themselves, while remaining faithful to the mathematical construct. This work will also contribute to the discussion of which elements of current tests are important in the assessment of mathematical reasoning by identifying sources of gender differences in performance beyond preparation and motivational factors. Once gender-linked test elements have been identified, they can then be considered for inclusion (or elimination) in tests.

Reform movements in the teaching of mathematics advocate a style of teaching that encourages students to think of mathematics not as a set of rules that must be applied in a narrowly defined manner but as a dynamic system, or language, that provides a multitude of routes to solving any problem. Reformers argue that teachers should encourage students (especially females) to explore, experiment, and conjecture about quantitative situations.


Such exploration and experimentation seem to be one avenue for changing students' behavior and potentially reducing performance differences on high-stakes assessments. However, the relationship between the way mathematics is taught and gender differences in test performance has not yet been investigated. Future studies should examine this relationship to determine whether reforms that are currently recommended can contribute to changing patterns of mathematics test performance.

This article was made possible through the support and assistance of Ellen Mandinach.


Armstrong, J. M. (1985). A national assessment of participation and achievement of women in mathematics. In S. F. Chipman, L. R. Brush, & D. M. Wilson (Eds.), Women and mathematics: Balancing the equation (pp. 59-94). Hillsdale, NJ: Lawrence Erlbaum.

Bandura, A., & Walters, R. H. (1963). Social learning and personality development. New York: Holt, Rinehart & Winston.

Beal, C. R. (1994). Boys and girls: The development of gender roles. New York: McGraw-Hill.

Bem, S. L. (1981). Gender schema theory: A cognitive account of sex-typing. Psychological Review, 88, 354-364.

Bem, S. L. (1993). The lenses of gender. New Haven: Yale University Press.

Bridgeman, B. (1992). A comparison of quantitative questions in open-ended and multiple-choice formats. Journal of Educational Measurement, 29(3), 253-271.

Bridgeman, B., & Wendler, C. (1991). Gender differences in predictors of college mathematics performance. Journal of Educational Psychology, 83(2), 275-284.

College Board. (1996). College bound seniors: A profile of SAT program test takers. Princeton, NJ: Educational Testing Service.

Doolittle, A. E., & Cleary, T. A. (1987). Gender-based differential problem performance in mathematics achievement problems. Journal of Educational Measurement, 24, 157-166.

Dossey, J. A., Mullis, I. V. S., Lindquist, M. M., & Chambers, D. L. (1988). The mathematics report card: Are we measuring up? Trends and achievement based on the 1986 National Assessment. Princeton, NJ: The Nation's Report Card, NAEP, Educational Testing Service.

Eccles, J. S., & Blumenfeld, P. (1985). Classroom experiences and student gender: Are there differences and do they matter? In L. C. Wilkinson & C. B. Marrett (Eds.), Gender influences in classroom interaction (pp. 79-114). New York: Academic Press.

Eccles, J. S., & Jacobs, J. (1986). Social forces shape math attitudes and performance. Signs, 11, 367-389.

Fennema, E., Carpenter, T. P., Jacobs, V. R., Franke, M. L., & Levi, L. W. (1998). A longitudinal study of gender differences in young children's mathematical thinking. Educational Researcher, 27(5), 6-11.

Fennema, E., & Peterson, P. L. (1985). Autonomous learning behavior: A possible explanation of gender-related differences in mathematics. In L. C. Wilkinson & C. B. Marrett (Eds.), Gender-related differences in classroom interactions (pp. 17-35). New York: Academic Press.

Gallagher, A. M., & De Lisi, R. (1994). Gender differences in Scholastic Aptitude Test-Mathematics problem solving among high ability students. Journal of Educational Psychology, 86(2), 204-211.

Gallagher, A. M., Morley, M., & Cahalan, C. (in preparation). An examination of the underlying cognitive processing demands of GRE quantitative items. Princeton, NJ: Educational Testing Service.

Graduate Record Examinations Board. (1997). Sex, race, ethnicity, and performance on the GRE General Test (technical report). Princeton, NJ: Educational Testing Service.

Halpern, D. F. (1992). Sex differences in cognitive abilities (2nd ed.). Hillsdale, NJ: Lawrence Erlbaum.

Halpern, D. F. (1997). Sex differences in intelligence. American Psychologist, 52(10), 1091-1102.

Hembree, R. (1988). Correlates, causes, effects and treatment of test anxiety. Review of Educational Research, 58, 47-77.

Hyde, J. S., Fennema, E., & Lamon, S. J. (1990). Gender differences in mathematical performance: A meta-analysis. Psychological Bulletin, 107(2), 139-155.

Jacobs, J. E. (1991). Influence of gender stereotypes on parent and child mathematics ability. Journal of Educational Psychology, 83, 518-527.

Johnson, E. S. (1993). College women's performance in a math-science curriculum: A case study. College and University, 68(2), 74-78.

Kessel, C., & Linn, M. C. (1996). Grades or scores: Predicting future college mathematics performance. Educational Measurement: Issues and Practice, 15(4), 10-14.

Kimball, M. M. (1989). A new perspective on women's math achievement. Psychological Bulletin, 105, 198-214.

Kohlberg, L. (1966). A cognitive-developmental analysis of children's sex-role concepts and attitudes. In E. Maccoby (Ed.), The development of sex differences (pp. 82-172). Stanford: Stanford University Press.

Maccoby, E. E., & Jacklin, C. N. (1974). The psychology of sex differences. Stanford: Stanford University Press.

McPeek, W. M., & Wild, C. L. (1987, August). Characteristics of quantitative problems that function differently for men and women. Paper presented at the annual meeting of the American Psychological Association, New York.

Meyer, M. R., & Kohler, M. S. (1990). Internal influences on gender differences in mathematics. In E. Fennema & G. Leder (Eds.), Mathematics and gender (pp. 60-95). New York: Teachers College Press.

National Council of Teachers of Mathematics. (1989). Professional standards for teaching mathematics. Reston, VA: Author.

Nyborg, H. (1983). Spatial ability in men and women: Review and new theory. Advances in Behaviour Research & Therapy, 5, 89-140.

O'Neill, K., Wild, C. L., & McPeek, W. M. (1989, March). Gender-related differential item performance on graduate admissions tests. Paper presented at the annual meeting of the American Psychological Association, San Francisco, CA.

Peak, L. (1996). Pursuing excellence: A study of U.S. eighth-grade mathematics and science teaching, learning, curriculum, and achievement in international context. Washington, DC: National Center for Educational Statistics.

Sadker, M., Sadker, D., & Klein, S. (1991). The issue of gender in elementary and secondary education. In G. Grant (Ed.), Review of research in education (pp. 269-334). Washington, DC: American Educational Research Association.

Stallings, J. (1985). School, classroom, and home influences on women's decisions to enroll in advanced mathematics courses. In S. Chipman, L. Brush, & D. Wilson (Eds.), Women and mathematics: Balancing the equation (pp. 199-224). Hillsdale, NJ: Lawrence Erlbaum.

Tanner, J. M. (1962). Growth at adolescence. Oxford: Blackwell Scientific.

Thorne, B. (1993). Gender play: Girls and boys in school. New Brunswick, NJ: Rutgers University Press.

Wilder, G. Z. (1996). Correlates of gender differences in cognitive functioning (CB Rep. No. 96-03). New York: College Entrance Examination Board.

Wilder, G. Z., & Powell, K. (1989). Sex differences in test performance: A survey of the literature (CB Rep. No. 89-3: ETS RR-89-4). New York: College Entrance Examination Board.

Willingham, W. W., & Cole, N. S. (1997). Gender and fair assessment. Mahwah, NJ: Lawrence Erlbaum.

Wolleat, P., Pedro, J. D., Becker, A., & Fennema, E. (1980). Sex differences in high school students' causal attributions of performance in mathematics. Journal for Research in Mathematics Education, 11(5), 356-366.

Cite This Article as: Teachers College Record Volume 100 Number 2, 1998, p. 297-314
https://www.tcrecord.org ID Number: 10313, Date Accessed: 11/27/2021 7:03:43 PM

About the Author
  • Ann Gallagher
    Educational Testing Service
    Ann Gallagher is a research scientist, Educational Testing Service, Princeton, New Jersey. She is co-author, with R. De Lisi, of "Gender Differences in Scholastic Aptitude Test: Mathematics Problem Solving among High Ability Students," Journal of Educational Psychology (1994).