Rethinking "High Stakes": Lessons from the US and England and Wales
by William A. Firestone & David Mayrowetz - 2000
Based on fieldwork conducted in England, Wales and two American states, this paper suggests six themes about “high stakes testing.” First, not all stakes are perceived to be equally high. Second, pressure to respond to a test comes from more than just formal stakes. Third, external pressure leads to symbolic responses outside the classroom. Fourth, external pressure can be useful for changing content taught. Fifth, external pressure is less effective in changing instructional strategy than content taught. Sixth, the effects of stakes will depend on a variety of other policy factors.
The utility of high stakes testing as a policy tool promoting excellence has been the subject of heated debate. Educators and some researchers believe that high stakes inevitably distort the instructional process, leading to the worst sort of teaching to the test and harmful stress for teachers and children (Corbett & Wilson, 1991; Smith, 1991). However, the popular view that stakes are needed to enforce sanctions is reflected in such documents as Quality Counts '99 (Education Week, 1999). A number of test developers and policy analysts argue that governmental tests can play a useful role if new technologies such as the use of portfolios and performance-based assessment are expanded, although they focus more on test format than stakes (Baron & Wolf, 1996). A third view is that stakes and tests have less influence than either of the dominant parties think (Cohen, 1995).
While researchers debate these issues, policy makers are expanding the use of tests and the stakes that go with them. Forty-seven states now test students. Almost all of these use multiple-choice technology, but 34 states also use performance-based assessments and two score portfolios. Stakes also vary widely: 36 states report school or district results publicly, 14 link some rewards to high scores, and 16 apply some sanctions to professionals when scores are low (Education Week, 1999). Furthermore, high stakes testing is an international phenomenon. Policy makers in England, for instance, introduced national assessment in the early 1990s. Individual test scores help determine students' educational and occupational future after age 16; and with extensive parental choice, public reporting of test scores is said to influence enrollments in specific schools (Daugherty, 1997).
Given the prevalence of tests with stakes attached to them, it is important to better understand what the effects of such policies are. To that end, this paper offers six themes about the nature of stakes attached to testing and their relationship to changes in instructional practice. These themes, which we hope will guide future research, are illustrated with data from two related studies. The first study looked at teacher responses to two American state testing programs. The second used a similar design to examine the instructional effects of National Assessment in England and Wales. Though study limitations prevent us from drawing definitive conclusions, our findings suggest that moderate stakes for educators (as opposed to students) can contribute to changes in content taught but not to changes in instructional strategies, the issue that has generated most discussion recently. Our findings also illustrate the difficulty in fine-tuning educational policy in order to get desired results.
The American study examined teacher responses to two performance-based state assessment programs. The research team sought states with performance-based assessments of mathematics for eighth graders (14-year-olds). When American fieldwork was initiated in 1994, Maine and Maryland were two of three states that reported using such tests. The third, Kentucky, was not included because the political situation was quite unstable, and it was not clear that we would be able to finish data collection before the assessment changed. The Maryland School Performance Assessment Program (MSPAP) had high stakes for educators through the threat of school reconstitution, a form of takeover. The Maine Education Assessment (MEA) was not linked to such stakes, permitting a useful comparison.
Shortly after the American fieldwork was completed, an opportunity arose to conduct comparative research in England and Wales. By that time, some Americans were quite interested in the implementation of the English-Welsh national performance-based assessment although, as will become clear below, the central government was moving away from that approach as fieldwork was carried out. The research focused on the Key Stage 3 (KS3 for 14-year-olds) mathematics assessment. At that time, the KS3 test had quite low stakes because results were not published, although they soon would be. Moreover, results for the test for 16-year-olds, the General Certificate of Secondary Education (GCSE), were published. Since 14- and 16-year-olds attend the same secondary schools and are usually taught by the same teachers in England and Wales, we learned a lot about the effects of high stakes testing there, albeit indirectly.
Both research projects used embedded case studies to clarify how assessment policies were interpreted. Conducted in 1995-1996, the American study focused on four schools in two districts in Maryland and eight schools in three Maine districts. The original plan was to select a poor and a middle-wealth district in each state for study, expecting this strategy to give us one high scoring and one low scoring district. This approach made sense in Maryland, but a major source of variation in Maine was between the few relatively urban districts and the many rural ones. There we chose one urban district as well as a poorer and wealthier system from the smaller districts.
In each district, semi-structured interviews were conducted with school board members, the superintendent, district curriculum specialists, principals, and department heads in math, English, and social studies. In addition, 25 math teachers across the two states were visited on two occasions to be interviewed and observed (see Table 1). Interviews were structured by fairly detailed guides with open-ended questions. However, the second interview for math teachers included three questions asking teachers to choose between prespecified options and explain their choices as a way of clarifying their thinking about mathematics teaching.
The European fieldwork, conducted in Bristol and Cardiff in the spring of 1997, was designed to parallel the American procedures as much as possible. A middle income and a low income school were chosen in each city. Interviews were conducted with head teachers (the equivalent of American principals), the senior management team member responsible for curriculum, and mathematics department heads.1 In addition, two visits to 16 teachers were made for classroom observations and interviews (see Table 1). The interview guides used in England and Wales drew from those used in Maine and Maryland but were shortened somewhat and modified to reflect the different national context.
Data were entered into NUD-IST, a computerized data analysis package, and coded using a system that began from initial hypotheses but was elaborated inductively through a review of interviews and observation notes. The coding scheme permitted identifying themes and developing and testing larger arguments as suggested by Strauss and Corbin (1990). The American data were analyzed first and provided some direction for comparing American and English-Welsh data.
To examine the classroom observations, we identified two dimensions that described possible changes in the teaching and learning of mathematics. These were developed by the Third International Mathematics and Science Study (TIMSS) (Stigler & Hiebert, 1997). To standardize the classroom coding system, coders independently analyzed initial sets of observation transcripts, differences were reconciled, and interrater reliability was checked (Firestone, Mayrowetz, & Fairman, 1998). The system developed for coding the American data was later used for coding the English and Welsh classroom observations. All American data were coded by two American observers (94 percent agreement); the observations from England and Wales were coded by three American observers (77 percent agreement).
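As an illustration of the kind of agreement figure reported above, the following is a minimal sketch of simple percent agreement between two coders. It is not the authors' procedure, which is described only in Firestone, Mayrowetz, and Fairman (1998); the function name, the category labels, and the five sample codes are all hypothetical, and a study with three coders would need an additional rule (for example, averaging pairwise agreement) that is not specified here.

```python
def percent_agreement(codes_a, codes_b):
    """Return the percentage of items that two coders labeled identically."""
    if len(codes_a) != len(codes_b):
        raise ValueError("Both coders must rate the same items")
    if not codes_a:
        raise ValueError("No items to compare")
    # Count the items on which the two coders assigned the same code.
    matches = sum(a == b for a, b in zip(codes_a, codes_b))
    return 100.0 * matches / len(codes_a)


# Hypothetical codes for five observed lessons (not data from the study).
coder_1 = ["practice", "practice", "explore", "practice", "explore"]
coder_2 = ["practice", "explore", "explore", "practice", "explore"]
agreement = percent_agreement(coder_1, coder_2)
```

Simple percent agreement does not correct for agreement expected by chance, so it is best read as a rough check rather than a full reliability statistic.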
The following six themes surfaced from national and international comparisons of our data. The first two themes deal with educators' perceptions of stakes while the next three address changes in practice. The final theme offers some reasons for the limitations of stakes. For each theme, we present data from our fieldwork along with explanation and interpretation.
THEME 1: NOT ALL STAKES ARE PERCEIVED TO BE EQUALLY HIGH
Educators often use the phrase “high stakes” rather casually and loosely. Some individuals become quite concerned about what seem to be very modest sanctions. For instance, the major reward or punishment linked to the MEA was the publication of school scores in the newspaper.2 One principal complained that, “What is bad about it is that it was never intended for school boards to use to compare between schools or between districts. It has been used that way; they took the blame. . . .” Corroborating the bitterness and discomfort created by the publication and comparison of test scores, three teachers in that state said:
The MEAs are a political issue. They can make us look great, or they can cause mischief.
It's not what the test was set up for. Other communities have the same problem. How come this town fifteen miles away does better than we do?
What is negative is that all the schools are vying to be number one on the test scores. They shouldn't compete with each other.
In Maryland, 88 percent of teachers surveyed by Koretz, Mitchell, Barron, and Keith (1996) agreed with a statement that they were under undue pressure to improve students' test scores.
While teachers complained in both states, from an outsider's perspective, the differences in sanctions among these governments seemed considerable, and educators actually responded accordingly. The only sanction in Maine was negative publicity. After all the general objections there, educators said:
MEA is not what drives teachers. My perception is teachers don't place so much stock in MEA that it will move them to do something differently.
I will not teach to the test to enhance scores. But at the same time I want them [the students] to do the best they can. (Emphasis as spoken)
Four teachers and one administrator said the MEA had some influence on their work, three teachers saw the MEA as providing only symbolic influence, and three teachers and one administrator denied that it had any influence. No one felt compelled to respond to state demands in Maine. In fact one math teacher explained that his job was “to get them ready for algebra. That's my personal opinion. You know, not to get them ready to pass the MEAs. That's not what I'm here for.”
The stakes were higher in Maryland. Aggregate student test scores on the MSPAP were important indicators to determine which schools should be reconstituted; reconstitution is a form of state takeover that could include the removal of teachers and administrators. However, almost all the reconstituted schools were in Baltimore City. The schools visited for this project were near the middle of the state test score distribution and not at great risk. While not imminent, reconstitution, as one principal put it, was “in the back of your mind. It's a reflection on the school. We're above the state level and I don't think anyone in this county has to worry about it.” A district curriculum specialist said, “For most schools, reconstitution is not perceived as a threat. But even if the fire isn't on your block, if it's three blocks away, you check your fire insurance.”
Yet the tension was generally higher than in Maine as indicated by jokes that floated around district offices and teachers' lounges. One such humorous yet nervous response to the test that illustrates the tension and frustration of Maryland teachers was a “MSPAP exam for adults” that contained items like:
BIOLOGY: Create life. Estimate the difference in subsequent human culture if this form of life had developed 500 million years earlier, with special attention to its probable effect on the English parliamentary system. Prove your thesis.
EPISTEMOLOGY: Take a position for or against truth. Prove the validity of your position.
These examples illustrate Maryland teachers' discomfort with the new form of assessment used by MSPAP. Four Maryland teachers and five administrators said they felt compelled to respond to the MSPAP and four teachers (some of the same ones) said they resented the state pressure. In addition, two teachers said the MSPAP had some influence, and, unlike Maine, no one denied that it had any influence on what they did. Still, when we visited one Maryland school two weeks before the MSPAP was given to view test preparation activity, the big event was Olympics Week, a series of assemblies with professional athletes and track-and-field events for students. Students were pulled out of classes, including test preparation work, to participate in these events, and the teacher talk over lunch was about the travails of putting on those events. In short, our data indicate that not all stakes are equally high, even within the same state.
THEME 2: PRESSURE TO RESPOND TO A TEST COMES FROM MORE THAN JUST FORMAL STAKES
Early researchers on high stakes noted that the strength of sanctions is socially constructed so that the same formal sanctions will have different meanings to different people (Corbett & Wilson, 1991). Less attention has been given to contextual factors that may affect how people interpret those sanctions. The political culture of central jurisdictions (i.e., a level of government, like a state or nation, that incorporates smaller units) is one such factor (Marshall, Mitchell, & Wirt, 1989).
The importance of political culture comes through in a comparison of Maryland and Maine. Maryland has a history of making a great deal of educational policy. In the 1980s, it was one of the first states to increase its high school graduation requirements. Today, it mandates that individual schools establish School Improvement Teams that must create and report annual development plans with building-wide strategies to improve MSPAP scores.
The political culture in Maryland is manifest in an institutional framework that has strong connections between the central authority and local agents. First, the state has been consolidated into 24 (usually) county districts, which enables frequent, effective communication between state and district administrators. This system structure also led to the creation of job-alike groups of state and local officials who meet often and communicate fairly extensively. Moreover, in some counties board members are nominated locally but appointed by the governor, while in others they are elected at large. Either way, public input to the school board is limited.
The result is a habit of complying with state policy, at least at a surface level. When asked a standard interview question about how useful the MSPAP was, three Maryland administrators explicitly challenged the premise of the question, saying things like, “It doesn't work that way. It's like saying someone wrote a law, and how does it meet this guy's needs? No. This is the law. You will shape yourself to fit the law's needs.” It is not clear that teachers were as sensitive to these issues as principals and central office staff, but administrators were certainly primed to comply with state policy.
The culture and educational system in Maine are quite different from those in Maryland, resulting in fewer opportunities for central authority to influence local districts. Generally, less educational policy is made in Maine than in Maryland. The small, and shrinking, state education department has limited capacity either to devise policy or oversee its implementation. Just before the study period, for instance, the department's assessment unit was reduced from five professional staff to one. Furthermore, local governance of schools is distributed among 188 local authorities, including school districts, towns, supervisory unions, and school administrative districts (SADs), so there is much less opportunity for communication between state and district officials than in Maryland. School boards are larger and elected regionally, not at large, so local influence is greater. These structural and political factors, as well as the difference in stakes between the two states, help explain why educators in Maine were more likely to ignore the state assessments. In fact, the absence of formal stakes in Maine may reflect the decentralized political culture of that state.
Another key contextual factor that affected stakes was how each test was situated with regard to other educational policies in the jurisdiction. The major reforms in England and Wales were the national curriculum and assessment, where the emphasis was on curriculum; assessments were, in large part, used to determine how well the curriculum was taught. By contrast, both American states had testing programs before the current interest in state standards. The MEA had preceded the standards movement. Maine legislators were debating what their learning results should be while our fieldwork was going on, so curricular standards had little influence on assessment design. The MSPAP was part of a larger effort to develop state standards and monitor schools, but it seemed to have some life of its own. In neither state did the content standards offer as much guidance about what should be taught as the relatively detailed schemes of work being developed in England and Wales (see below).
THEME 3: EXTERNAL PRESSURE LEADS TO SYMBOLIC RESPONSES
School officials, primarily administrators, responded to pressure from the threat of formal sanctions, impending comparisons in the press, and potential enrollment losses with actions designed to motivate teachers and students and maintain the public trust. These intentions, combined with the fact that administrators are often removed from the classroom, led to responses that had little direct impact on instructional practice. Instead, administrators drew on the power of symbolic gestures (Edelman, 1988) to broadcast the importance of tests and took steps to ensure that newspaper coverage of test results would not damage the institutional prestige or survival of their schools.
In Maine, where external pressures were arguably the lowest, teachers and administrators reported using pep rallies to exhort their students to try their best on the MEA. Often principals would then reward student efforts with a party. One principal reported, “I challenge the kids to do their best, and try to do better than other schools or last year's scores. If they try hard, I give them an ice cream or pizza party afterwards. I found other places doing that.” In another district, a curriculum coordinator reported “principal cheerleading [tactics] run[ning] the gamut from parties, to snacks during testing, to letters to parents reminding students to eat and sleep the night before the assessment.” The strongest signal of the importance of testing occurred at a high school in one of our districts where test scores were so low in comparison to its neighbors they made front page news. There, school board members seriously contemplated using cash incentives for students but did not do so.
With more capacity and more distance from voters, administrators in both Maryland districts reached out to the public in an effort to promote themselves and head off possible unrest stemming from low test scores. This was especially important because the MSPAP was considered to be a difficult test and the performance standards were set fairly high; 95 percent of students are expected to pass. One district level official said that after the first year the MSPAP scores were publicized, a local newspaper reported that all the schools in his district failed because none attained the very high standard. To forestall further embarrassment in the following year, he and his colleagues began staging a public release of test results, inviting reporters from two newspapers and local government officials to partake in a celebration, complete with finger foods and a band. In the third year of testing, slightly over 50 percent of district eighth graders passed the mathematics section and less than 35 percent scored at the satisfactory level in reading. The message of that press conference was “We have not reached our goal yet but we made a lot of progress.” According to the superintendent, public relations efforts such as these were necessary because they educated parents about the effects of the testing program. But other officials remarked that “what goes out to the public” (i.e., test scores) mitigated external pressure since the district was doing well in comparison to others statewide. In short, since administrators were unable to directly control student performance, they tried to control the interpretation of that performance.
In some ways, the highest external pressures were in England and Wales where educators worried about league tables of test scores published in the local papers. The difference between those tables and the newspaper reports in Maine and Maryland was that the English-Welsh tables were believed to affect school enrollments. The neighborhood school is largely dead in England and Wales where parents and students have considerable choice in what school they can attend. Head teachers believed that test scores influenced who came into their schools. One explained that:
The big problem has been intake [of students], quality of intake. Our ability profile is slipping to the left quite rapidly. A constant drift from schools like us to other, better schools because of parental perception. That's the way this government at the moment has given us hurdles to jump. The way they publicize our achievement which is totally unscientific and statistically inaccurate. League tables. Schools are directly compared even though their [student] intake is totally different.
The schools appeared to take the GCSE test for 16-year-olds quite seriously, as it was the one with the highest stakes. However, they adopted a two-part coping strategy. Actual changes in curriculum and instruction were delegated to departments and department heads. Schools' senior management teams (head teachers and others) responded to the recruitment problem more directly by taking actions such as offering a foreign language not available at other schools in the city, developing a strong performing arts program and seeking funding for a special performing arts building, strengthening links to feeder primary schools to get head teacher endorsement of their programs when parents chose secondary schools, and publicizing special events and actively courting good publicity (for further discussion, see Gewirtz, Ball, & Bowe, 1995). More of these actions were taken in the schools with lower socioeconomic intake. Thus while stakes appear to be high in England and Wales, especially for schools serving a lower socioeconomic clientele, much of the response to these sanctions takes the form of marketing, a response that is more symbolic than substantive, instead of instructional reform.
THEME 4: EXTERNAL PRESSURE CAN BE USEFUL FOR CHANGING CONTENT TAUGHT
While many responses to external pressures were powerfully symbolic and directed outside the system, educators also reported making more substantive changes. A major focus of this research was to discover whether instructional changes were occurring and what role pressure played in causing them. A comparison between teachers in Maine and Maryland suggested that external pressure might influence classroom practice more in the latter state. Seventy-five percent of teachers in Maryland (15 of 20, including mathematics teachers and department heads) described making changes in their teaching to accommodate the state test while only 20 percent of their counterparts in Maine did the same (5 of 25). Nine of the 11 Maryland math teachers reported some changes in teaching in response to the MSPAP. The most prominent changes reported were in the content taught. These were rather extensive in Maryland although they varied from broad topics like number relationships or measurement to very narrow ones like stem-and-leaf plots. One teacher said the latter was necessary because without such instruction, children were answering practice test items by drawing charts with leaves on them. Given the National Council of Teachers of Mathematics' (NCTM's) (1989) concern to give certain topics, like statistics and discrete math, more time in the curriculum, the broad changes are not insignificant. However, squeezing new topics into the curriculum was not easy because conventional topics remained in place. Pressure to maintain the status quo was reflected by five of the 11 Maryland math teachers who felt some conflict between the requirements to get students ready for the MSPAP and for high school algebra. These teachers commented:
There's not enough time to go through all the algebra and do the MSPAP tests.
This second class you observed . . . will lead into essentials of algebra next year, and there's very little probability in our algebra curriculum right now. So, I have to make sure that they get it well enough to remember it to the end of next year when they see it on the MSPAP because it's always probability.
The other change was teaching to the test. Teachers used items like those on the tests in their teaching. They also explained to children how the tests were scored and how to do well on them, for instance by giving hints about how much to write on the math tests.
The situation is more complicated in England and Wales because, as one department head noted, “The Key Stage 3 assessment came [i.e., was initiated] at the same time as the National Curriculum, and the National Curriculum told us what to teach.” Or as a teacher in another school explained, “Our schemes of work are in line with the National Curriculum, so we are covering it.” Thus, as policy makers intended, it is difficult to ascertain whether local curricular changes were in response to the National Curriculum or to the tests that are supposed to be (and appear to be) aligned with it.
Teachers and administrators in England and Wales pointed to a certain amount of explicit test preparation activity, however. This included adjusting the order in which topics were taught to cover areas expected to be on the test, and revision3 of topics on the test just before it was given. Something we observed directly was explicit practice with tests from the previous year, which are released in England and Wales but not in the two American states. Nevertheless, since the greatest pressure was linked to the GCSE for 16-year-olds, that was where teachers put their effort. As one said, “I do try very hard not to allow [the KS3 test] to impact on my teaching in any way, shape, or form. . . . I think of the GCSE right from year 7 [age 11].”
In Maine, where teachers had the least pressure to comply, only four of 14 math teachers mentioned any response to the test. These were usually changes in the order in which topics were presented, shifts to focus more intensively on topics, or even specific practices (i.e., writing out answers) where students did poorly, but rarely included adding new topics. Teachers were more concerned with the conventional curriculum. Five of the 14 math teachers mentioned that they felt pressure to get students ready for high school algebra, and only one mentioned conflict between the MEA and preparing for algebra. Changes in the order topics were presented were rarely made just because of the state test. As one principal explained,
There was talk about putting more physical science in sixth and seventh grade because of the MEA, but students will get it in 9th grade. The one thing we did change was that the half-year course in Maine studies used to be done with ninth graders. Since we knew Maine studies would be on the MEA, we put it in seventh grade. . . .
In this and other instances, reordering of curricular topics seemed to reflect local concerns as much as any perceived need to respond to the MEA. Maine teachers also reported less teaching to the test than in Maryland.
THEME 5: EXTERNAL PRESSURE IS LESS EFFECTIVE IN CHANGING INSTRUCTIONAL STRATEGY THAN CONTENT TAUGHT
While changing curriculum content is important, much of the debate about the quality of American education focuses on the instructional strategies used. Critics point to a persistent pattern of teaching in the United States that is “emotionally flat and intellectually unengaging” (Elmore, 1996, p. 299). Researchers using many methodologies and perspectives agree. Most American math teaching features lectures and recitation, students using worksheets requiring simplistic answers, and too many topics covered too shallowly (Goodlad, 1984; McNeill, 1986; Schmidt, McNight, & Raizen, 1996).
Reformers have argued for instruction that provides more opportunities for students to learn actively and explore issues in more depth. In the United States, the NCTM (1989, p. 5) argues that teachers should build students' “mathematical power,” a term that denotes an individual's abilities to explore, conjecture, and reason logically, as well as the ability to use a variety of mathematical methods effectively to solve non-routine problems. Similar prescriptions have been advocated as ways to raise the scores of American students in international comparisons (Schmidt, McNight, & Raizen, 1996). Related ideas were proposed in England's Cockcroft Report (Cockcroft, 1982). The national curriculum in England and Wales embodies similar ideas through its mathematics Attainment Target 1, Using and Applying Mathematics, which promotes learning of conventional mathematical topics through exploration and investigation (Hughes, 1997).
Traditional ideas about teaching are thoroughly embedded in the thinking of teachers in both countries. Rather than giving children a chance to develop their mathematical power by exploring mathematical ideas, the large majority of these teachers strongly believed they had to break topics down into easily learnable bits. In our second interview, teachers were asked to choose between one teaching strategy where “the topic is broken down into smaller pieces, and I show them how to move through each step, and they have lots of opportunities to practice,” and another where, “I give them a larger problem, with mathematical content for them to figure out, and they can get help from me or other students.” Sixteen of the American math teachers who responded to the question chose the first option, two chose the second, and six wanted to combine the two.4 One said:
I do the pieces. They can put the pieces together if you give them the pieces at the middle school level. I've given them the whole thing and they don't seem to know where to start. Once I give them a starting point and break it down, they can handle it better.
The notable exceptions were the few teachers who felt that after students had drilled on specific computational procedures, they could be given end-of-unit activities offering an opportunity to extend or apply their knowledge. In England and Wales, no teachers chose the large problem option, seven of the 14 who responded to the question chose small problems, and seven more chose a combination. One offered that “they learn easier using [small problems], but I think their long term understanding is better using [larger problems].”
Teachers were also asked to choose between two definitions of mathematics: one as a language to investigate patterns and relationships and to build models, intended to reflect the NCTMs perspective, and the second as a system of procedures and rules to solve problems, indicating a more conventional view. American teachers split evenly between these two options (10 choosing the first, 11 the second, and three unable to choose). However, those who said that mathematics involved recognizing patterns and relationships also said that this view reflected what they wanted to think math is; they actually felt constrained to teach in ways that reflected the second definition. The answers were also split in England and Wales (3 choosing patterns and relationships, 5 procedures and rules, and 5 a combination).5 One said, Id go with . . . rules and procedures. With everything, you need rules before you can build anything.
Thus an important question is whether external pressure linked to tests can encourage teachers to change their thinking about and practice of teaching mathematics. Some Maryland teachers recognized that the MSPAP was supposed to bring about a major shift in mathematics teaching. One said that MSPAP is challenging us to reassess our traditional methods of teaching math. We are old dogs learning new tricks. The challenge really is to rethink and adopt new paradigms. I still have a hard time interpreting it into the classroom.
The most obvious and ubiquitous manifestation of instructional change in Maryland was the spread of MSPAP activities. The pedagogical significance is that MSPAP activities were extended projects that used a variety of mathematical and nonmathematical concepts as well as manipulatives and multiple forms of representation. They provided one of the few breaks from the dominant pattern in these data where teaching was characterized by extensive practice on large sets of small, tightly structured problems focusing on single concepts or a limited number of operations (Stigler & Hiebert, 1997). MSPAP activities included surveys of student smoking behavior and hands-on projects where students went outside to measure the height of a water tower using several mathematical methods.
At their best, MSPAP activities gave students the opportunity to reason about mathematical issues and develop their own ways to represent and solve problems. Yet, the MSPAP activities we observed often proved flawed. One activity, for instance, gave students the opportunity to develop ways to display data and draw conclusions from those graphical representations. Students were given information about how many meals had been bought at specific prices and asked to organize the data and then make recommendations for how to price meals in the future. However, the book containing the problem specified the displays students should use (stem-and-leaf charts, histograms), and the teacher's coaching indicated clearly that the price at which the most meals had been sold in the past should be preferred in the future. Alternative ways to organize data and the consideration of other factors (e.g., that pizza at almost any price will sell well in a student cafeteria) were not entertained in class discussion.
Moreover, especially in the weeks leading up to the test itself, pressure to do well encouraged teachers to give students formulae for coping with MSPAP activities. One teacher did a strict simulation of the test with problems he had made up and coached students on what was an appropriate answer in the testing context. At one point, he told students, "Let me warn you. What is a paragraph? Is it two pages? Is it fifty words? Twenty-five words? Longer isn't better. They like it short and sweet in complete sentences. Also you can draw a picture and label it and then use the labels in sentences."
In England and Wales, changes away from conventional instructional strategies toward larger problems with greater opportunities for mathematical reasoning usually come under the heading of investigational research (see below). Teachers report doing investigations (11 of 16 so indicated), but these seem to take a relatively small amount of time during the academic year because of the crowded curriculum. As one teacher explained, "If anything gets squeezed, it will be that. I think we feel the important thing is to cover the scheme of work and if you don't get the investigation done that's not too bad." Even where it takes place, investigational work is clearly segmented from regular teaching, and has relatively little impact on it.
More generally, the conventional instructional pattern proved quite universal. The TIMSS video study used a number of dimensions to compare American mathematics teaching to that in other countries (Stigler & Hiebert, 1997). In one dimension, student activities were divided into three categories. First, students might practice solving routine problems by repeatedly using one or a few procedures, as in conventional American worksheets consisting of a large number of one kind of problem (e.g., long division). Second, they could apply procedures to situations that are new, either because they require some connection to the real world or because they require consideration of new mathematical concepts or situations. Exemplary activities that we observed included finding the area of a circle several different ways as a means to understand the value of pi or, after playing two probability games with dice and discussing what fairness meant in this context, having the opportunity to invent their own game. Finally, they might invent new procedures and analyze new situations. In these cases, they may have to generate proofs, theorems, or rules. Not surprisingly, almost all the math classes required students to practice using procedures (see Table 2).
Another dimension of teaching refers to the teachers' presentations: whether teachers simply state concepts and formulae or try to develop them somehow. For instance, one teacher showed how the formula for the area of a triangle can be derived by combining two triangles to form a rectangle. This approach addresses a concept in more intellectual depth than simply presenting the class with A = ½ × b × h. In another instance of developing concepts in our data, a teacher used a number line and then pieces of plastic that represented parts of a whole to help students develop an understanding of fractions. Overall, however, our teachers overwhelmingly engaged in telling rather than developing mathematical concepts (see Table 2). Indeed, the emphasis on teacher telling and student practice characterized teaching regardless of the nature of the test or the stakes attached to it.
A third dimension, generated from direct observation and a desire to compare the prevalence of MSPAP activities and investigational work in England and Wales, was problem size; large problems should provide more opportunities for conjecture and mathematical reasoning than smaller ones. Thus they could facilitate the development of mathematical power (NCTM, 1989), although, as we have noted, that need not be the case. Large problems require extended effort by students; they often entail several steps or activities organized around a common theme, problem situation, or math concept. The problem where students approximate the area of a circle six different ways to lead up to the concept of pi is a large one. While this problem gave students considerable room to reason about a mathematical issue, not all large problems did so. The problem where students used data to price cafeteria meals was also a large problem. Small problems would include sets of many long-division problems, fractions to reduce, or polynomials to factor and are designed to help students practice a procedure. Historically, most American textbooks have featured large sets of small problems. These rarely provide opportunities for analytical reasoning. Large problems proved especially rare in England and Wales, reinforcing the observation that investigational work is uncommon. They occurred in only a small minority of cases in the two American states, and no more often in Maryland than in Maine. When juxtaposed, the interview and observation data suggest that the external pressure felt by educators did not greatly influence teachers' instructional strategies (see Table 2).
THEME 6: THE EFFECTS OF EXTERNAL PRESSURE WILL DEPEND ON A VARIETY OF OTHER POLICY FACTORS
This statement is the most speculative. The classroom observations suggest that pressure to respond to tests can influence topics taught, but not instructional strategies, except at the most obvious level of promoting teaching to the test (although the pressure to do it seems to be weaker or less monolithic than some previous observers have suggested). But what factors might combine with such pressure to influence instructional practice? Examination of the policy context in England, Wales, and Maryland suggests three: the policy makers' intent, the design of the assessments, and learning opportunities.
Of the two higher pressure jurisdictions, Maryland most clearly intended to break with the documented pattern of conventional mathematics teaching. When it began the new assessment program in the 1990s, its policy documents embraced the NCTM framework:
The National Council of Teachers of Mathematics approved, in 1989, the CURRICULUM AND EVALUATION STANDARDS FOR SCHOOL MATHEMATICS. This document will be used as a guideline for local and state agencies as new curricula are being prepared. Four STANDARDS are common to all grade levels: communication, reasoning, problem solving, and mathematical connections. . . . Problems for discussion will be based on real life situations. (Maryland School Performance Program, 1990, p. 1, capitalization as in the original)
Support for the program stretched across the terms of two different governors. Although Parris Glendening opposed MSPAP when he campaigned for governor, he became a supporter once in office.
Mathematics educators in England and Wales have sought to achieve ends similar to those advocated by NCTM by promoting investigational research as a teaching strategy, and other practices that provide more opportunity for mathematical reasoning and problem solving. However, the Conservative Party, which adopted the country's curriculum and testing policy in 1988, was not particularly sympathetic to these concerns. That party was dominated by two wings: neoliberals who were interested in market reforms, and cultural reconstructionists bent on bringing back the traditional curriculum they had been taught (Ball, 1990). The latter group had the greatest impact on curriculum and assessment development (Hughes, 1997).
The impact of cultural reconstructionists is reflected in the history of the development of mathematics attainment targets, critical components of the national curriculum. The earliest drafts of the mathematics curriculum had a profile component, as it was then called, for using mathematics, communications skills, and personal qualities. Government ministers objected to this component, but largely because of its reference to personal qualities. The draft curriculum put before Parliament in March, 1989, did have an attainment target for using and applying mathematics. This target was intended to cut across the other, more topic-focused targets and was to encourage instruction through investigations. Thus it was the one target that promoted mathematical reasoning and communication. It was almost removed during several later efforts to streamline the curriculum.
A similar debate occurred over the design of the assessments to go with the National Curriculum. The first round, the report of the Task Group on Assessment and Testing (TGAT), clearly reflected the perspective of professional educators (Daugherty, 1997). While a very complex document, it had two key features. The first was a balance of emphasis between formative assessment to be used by educators and summative assessment for making external judgements about schools and students. The second was advocacy of standardized assessment tasks rather than conventional examinations. These tasks were to use a wider variety of modes of presentation and be much closer to the American idea of performance-based assessment. Teachers' ratings of pupil performance during regular class work were to have weight equal to that of scores on preconstructed tasks (Daugherty, 1995).
Actual implementation moved away from the TGAT design rather quickly. During pilots, the TGAT ideal proved difficult to actualize. Individual standardized assessment tasks were intended to model good mathematical investigations but proved extremely time-consuming and disruptive (Daugherty, 1995). A new Secretary of State for Education and Science attacked the whole premise of standardized assessment tasks, primarily because he (and the rest of the government) sought tests that provided more rigorous, objective summative measures of pupil attainment that could be used to compare schools. In due course, the language of standardized assessment tasks was eliminated, and written assessments became more like conventional assessments in England and Wales. Teacher assessment remained but took a definite back seat to written tests (Daugherty, 1995).
Where they happen, political debates are likely to be reflected in the design of state and national assessments. These can be important for their hortatory significance (McDonnell, 1994), the way they exemplify good practice for teachers. The MSPAP seems to be fairly well aligned with the NCTM framework (Firestone et al., 1998). It is a significant departure from multiple-choice tests in that students are asked to construct responses that must be scored by trained judges using predetermined rubrics.6 MSPAP problems tend to be fairly long or to include linked parts, and may be interdisciplinary; the same problem may be scored for two or more subject areas such as mathematics and science. Most problems are given to individual students but there is some group work.
The MSPAP de-emphasizes pure calculation by allowing students to use calculators on at least parts of the test. It follows NCTM standards that call for greater emphasis on complex, problem situations involving topics such as probability, statistics, geometry, and rational numbers (National Council of Teachers of Mathematics, 1989, p. 75). The NCTM standards expect "students [to] assume more responsibility for validating their own thinking" (p. 79), and MSPAP scores are higher when students can explain their answers better. Moreover, students are expected to use various forms of representation, including graphs, charts, pictures, and writing as well as mathematical notation. Another standard is mathematical reasoning. Some items ask students to generate, test, and refine hypotheses. Others ask students to consider and generate patterns and present arguments justifying conclusions. One, for instance, asks students to extend a pattern and develop a rule for identifying the nth item in a series.
In England and Wales, using and applying mathematics has remained part of the curriculum, but it is not assessed through the written tests, in spite of recurring professional recommendations to give that target more attention (e.g., Brown, Johnson, Askew, & Millett, 1994). Early on, this target was deemed too difficult to operationalize reliably on written tests.7 In theory, it is now only tested through teacher assessment. However, teacher assessment has never received the priority of written tests (Radnor, 1996). When asked about teacher assessment, one math teacher asked, "Is this where you write down in the report what level you think they're at and you compare it to what they actually get on the standardized assessment task?"8 Moreover, teacher assessments of a student's level are based on a wide variety of data, and some teachers seem to rely primarily on end-of-unit tests or homework, more than investigations. A typical response is that teacher assessment is "just based on the marks I've recorded and leveling them in terms of how much I think they've retained and understood." The limited significance of teacher assessment and the variety of data sources used contribute to the restricted time teachers in England and Wales give to investigational work.
The KS3 written tests are not multiple choice, but they usually require short answers. One 1997 question showed students a series of triangles made out of matchsticks and told students the formula relating the number of sticks to the number of triangles. Students were then asked to find how many sticks they needed to construct so many triangles, find how many triangles could be constructed from a given number of sticks, and write out in letters the formula linking matchsticks to triangles. Students are rarely asked to show their work or justify their answers. Where they have to organize and represent data, one clear best answer is expected.
Another indication of the priority given to traditional mathematics is that the major assessment innovation during this fieldwork was the introduction of a mental math test, which required students to answer straight calculation problems read to them on a tape in a fixed number of seconds. Thus the English-Welsh national assessment program seems to reinforce practicing procedures, as opposed to encouraging mathematical reasoning, because of the intent of policy makers.
The third factor that interacts with stakes to influence instruction is the learning opportunities that accompany new tests. Given the strong commitment the teachers from all three countries had to conventional instructional practices, such learning opportunities seemed especially critical. Without them, teachers would not have known what to do differently regardless of the stakes. At the state level, Maryland did provide some opportunities to learn about teaching approaches congruent with the MSPAP through the Maryland Council of Teachers of Mathematics and the state's Governor's Academy. Despite the fact that these statewide professional development activities were the best attended in the state, the large majority of eighth grade mathematics teachers still did not have direct access to these informational resources (Koretz et al., 1996).
Maryland had very large districts with substantial central curriculum staffs, including math specialists in both districts visited. Through the efforts of these specialists, and as a result of state requirements for site-based management, a fair amount of time was made available through district in-service days and in-school planning time to develop activities aligned to the state test and otherwise work on raising test scores. One Maryland district math coordinator described an in-service day he organized where "the middle school math teachers met as a group, and we had the people who attended the Governor's Academy share activities, materials, strategies that they picked up at the Academy. Many focused on MSPAP." He also supervised teachers directly, at which time he said, "I always try to focus on the outcomes, MSPAP outcomes. Which math outcomes and indicators is this lesson or series of lessons addressing?" The strength of these in-service activities was their process; teachers worked together to develop new teaching approaches and came to own them. The weakness was their content; the workshops rarely, if ever, provided teachers with any new ideas about the nature of mathematics or forms of instructional practice that differed from what they had used before. The fundamental knowledge about mathematical content and instructional strategies that Cohen and Hill (1998) suggest is critical for changing instructional practice was missing.
Maine offered much less subject-based professional development. The state department of education offered little, if any, professional development to teachers. Maine teachers were critical of their districts' efforts to support instruction. These districts did not have central office subject matter specialists, and one was phasing out its district curriculum coordinator during fieldwork. Two of the three districts did not have system-wide in-service days at all. In commenting on the quality of the professional development available to them, one teacher said, "Well, we do have workshop days, but truly, there's not much that's useful."
Most teachers in England and Wales (9 of 14 commenting) felt that they did not have adequate professional development to cope with the new national assessments. They noted special needs in two areas. The first was in the target area, using and applying mathematics, and the investigational work that was supposed to go with it. Their concerns included both how to do and how to score such investigations. They also sought more guidance on the written KS3 assessment. As one teacher explained, "I still don't know whether we are doing it right or not." Moreover, unlike their American counterparts, teachers in England and Wales get no financial incentives for taking university courses, another potential source of knowledge about mathematics education. As a result, fewer teachers take university courses in those countries. The schools did have INSET days that provided release time for working together as a department, much more than providing access to new knowledge. In that regard, they were somewhat like the professional development days in Maryland. LEAs (i.e., districts) were supposed to be offering training, but as LEA budgets and authority declined in response to changes in central policy, the presence of these entities was becoming much less noticeable in English and Welsh schools.
While these findings must be treated as more illustrative than definitive, they do suggest some things about the place of stakes in a comprehensive or systemic reform policy (Elmore & Fuhrman, 1994). One is that stakes must be treated as one of several sources of pressure for educational change, and one important question then is what contribution pressure can make to efforts to reform practice. There appear to be two views on how to accomplish such reform. Much of the popular press views poor practice as a problem of limited will to be resolved by applying more pressure. Some of the best known policy analysts (e.g., Berman, 1986; Cohen & Barnes, 1993) argue that the necessary changes in practice require considerable learning on the part of current teachers, a position that we share. Past critics of high stakes testing have suggested that such policies will primarily promote short term accommodations, but not significant deeper learning. However, these analysts have not considered whether pressure can help promote learning.
Fullan (1991) suggests that effective change requires a mix of pressure and support. The contribution of pressure has been suggested by Deborah Ball (personal communication). Following Schwab (1978), she suggests that people learn from their own practice; it is hard for them to apprehend something they have not experienced. The problem then is to get people to have a new kind of experience when they are already committed to doing things in an older, routine way. Certain kinds of pressure may have that effect. An important question raised by research on stakes related to assessments is whether they are an effective means to encourage teachers to have such experiences without generating excessive resentment. Clotfelter and Ladd (1996) point to a similar balancing act in the use of financial incentives to reward performance.
Fine tuning that kind of pressure would appear to be a challenge partly because of the elements that go into creating the sense of pressure, partly because pressure alone is not enough, and partly because of the nature of the policy making process. The factors that come together to generate an individual teacher's experience of pressure are difficult to predict. Loose talk among educators about high stakes means that it is difficult to take local reports totally at face value.
Moreover, the same policy may have different meanings in different places. For instance, the greater pressure that Maryland educators felt in comparison to those in Maine reflected such factors as relative insulation from local constituencies, tight linkages to the state government, and a fairly centralized political culture. Differences in perceived pressures may also reflect local context in a way that amplifies equity concerns. It is likely that teachers in Baltimore City felt very differently about reconstitution than did teachers in other parts of Maryland. There seem to have been similar differences in response to national assessments between the English and Welsh schools that historically attracted high achieving students and those that did not. Teachers in these schools were pressed to change their practice but were not necessarily provided extensive guidance about what good practice looked like or the materials needed to implement such practice. This kind of concern has led a National Academy of Education panel to recommend that sanctions not be linked to test scores until opportunity to learn has been equalized between schools serving rich and poor students (McLaughlin & Shepard with O'Day, 1995). Their focus is on opportunity for students to learn. We would add that for such opportunities for students to be equalized, the opportunities for teachers in those settings may have to be even higher than in other schools.
Even when some pressure is present, however, its relationship to practice is complex. Pressure often generates symbolic responses from press releases to program marketing. In some instances, it also promotes changes in practice. This was the case in Maryland and England and Wales where there were changes in content taught, but not instructional strategy. It appears that reconstitution (outside of the inner city) was a reasonably moderate form of pressure in that it got educators' attention and led to some constructive change without causing panic. Extensive curriculum guidance in England and Wales combined with market sanctions also influenced content, but the conflict it caused was considerable (see Daugherty, 1995).
What was missing with reconstitution and to a lesser extent with the sanctions in England and Wales were the structures and opportunities to help teachers reflect on their teaching and develop more effective practices. Cohen and Hill (1998) suggest that changing practice requires less time thinking about how to best the test and more time thinking about what mathematics is and how to teach it. There are models that promote that kind of reflection in the American policy context. For example, teacher networks, including state-sponsored networks, have been very helpful for some teachers (Firestone & Pennell, 1997). In fact, under some circumstances, thinking about the work students do and how to help them do better work can be combined with thinking about tests, as the experience around the Vermont portfolio assessment illustrates (Murnane & Levy, 1996). While Maryland did more than Maine or England and Wales to provide teachers the time and knowledge to rethink their practice, what was offered did not seem to be enough. Hence educators there made the changes they knew how to make (changing the content presented) and not the ones they did not understand.
It appears then that in the best case, policies to change instructional strategy can benefit from a mix of mechanisms to generate pressure (which are likely to be interpreted differently by different teachers) and learning opportunities. Yet even if we had a better social psychology of educational incentives and knew more about how to guide teachers to a deeper understanding of subject-based pedagogy, it seems unlikely that such knowledge could be enacted in central policy. For one thing, the messiness, randomness, and fragmentation of the policy making process (e.g., Firestone, 1989; Kingdon, 1995) probably works against the fine-tuning that seems necessary. For another, without a strong political consensus around the educational philosophies and instructional strategies advocated by NCTM and many professional educators in Britain, top-down policies that incorporate them constantly run the risk of being attacked, subverted, or reversed. That was certainly true in England and Wales under the Thatcher and Major governments, and appears still to be so with the new Labor government. The debates over assessment design and standards in California (e.g., Lawton, 1997), Arizona (Smith, 1996), and other states, and now over how to rate state standards [compare the recent report by the Council for Basic Education (Joftus & Berman, 1998) with those from the Fordham Foundation (Raimi & Braden, 1998; Stotsky, 1997)], indicate that the same kinds of disagreements are common in the United States.
In sum, this study of what we would call moderate stakes for educators leaves one somewhat skeptical about both the claims for and the charges against high stakes. The popular complaints seem somewhat overdrawn partly because educators are likely to claim that stakes are higher than they are, and partly because some forms of pressure can help induce change. We saw less stress among educators than some earlier writing would have led us to believe (Corbett & Wilson, 1991; Smith, 1991). The teaching to the test we saw was less a matter of dumbing down an existing curriculum than of undermining efforts to introduce a more challenging form of pedagogy.
Yet teaching to the test is one reason for skepticism about the way pressure is applied to educators in conjunction with assessments. It is hard to imagine how to organize such pressure and related policy instruments to really induce change in instructional strategies. And if one could design such a system, the ways in which policy is made work against a careful calibration of pressure. Moreover, with the current backlash against what was generally seen as high standards teaching just a few years ago (NCTM, 1989), it is probably appropriate to spend less time thinking about how to design incentive systems and more time thinking about what constitutes good practice and how to help teachers and the public understand what it is.
Writing this paper was supported by a grant from the Spencer Foundation to Rutgers University and a Spencer small grant to the University of Wales-Cardiff. Responsibility for the conclusions offered here rests strictly with the authors. Thanks to Brian Davies, John Fitz, and Jan Winter for sharing the English and Welsh fieldwork and Janet Fairman for her help with the American fieldwork. Cathy Lugg and Greg Camilli provided useful reviews of an earlier draft.
Ball, S. (1990). Politics and policymaking in education. London: Faber & Faber.
Baron, J. B., & Wolf, D. P. (1996). Performance-based student assessment: Challenges and possibilities. Chicago: University of Chicago Press.
Berman, P. E. (1986). From compliance to learning: Implementing legally-induced reform. In D. Kirp & D. Jensen (Eds.), School days, rule days (pp. 46–62). Philadelphia: Falmer.
Brown, M., Johnson, D., Askew, M., & Millett, A. (1994). Evaluation of the implementation of National Curriculum mathematics at Key Stages 1, 2 and 3. London: SCAA.
Clotfelter, C. T., & Ladd, H. F. (1996). Recognizing and rewarding success in public schools. In H. F. Ladd (Ed.), Holding schools accountable: Performance-based reform in education (pp. 23–63). Washington, DC: Brookings Institution.
Cockcroft, W. H. (1982). Mathematics counts: Report of the Committee of Inquiry into the Teaching of Mathematics in Schools. London: HMSO.
Cohen, D. K. (1995). What is the system in systemic reform? Educational Researcher, 24(9), 11–17.
Cohen, D. K., & Barnes, C. A. (1993). Pedagogy and policy. In D. K. Cohen, M. W. McLaughlin, & J. E. Talbert (Eds.), Teaching for understanding (pp. 207–239). San Francisco: Jossey-Bass.
Cohen, D. K., & Hill, H. (1998). Instructional policy and classroom performance: The mathematics reform in California. Philadelphia: CPRE.
Corbett, H. D., & Wilson, B. L. (1991). Testing, reform, and rebellion. Norwood, NJ: Ablex.
Daugherty, R. (1995). National curriculum assessment: A review of policy 1987–1994. London: Falmer.
Daugherty, R. (1997). National curriculum assessment: The experience of England and Wales. Educational Administration Quarterly, 33(2), 198–218.
Edelman, M. (1988). Constructing the political spectacle. Chicago: University of Chicago Press.
Education Week. (1999). Quality counts '99: Rewarding results, punishing failure. Bethesda, MD: Author.
Elmore, R. F. (1996). Getting to scale with successful educational practices. In S. H. Fuhrman & J. A. O'Day (Eds.), Rewards and reform: Creating educational incentives that work (pp. 294–329). San Francisco: Jossey-Bass.
Elmore, R. F., & Fuhrman, S. H. (1994). The governance of curriculum. Alexandria, VA: ASCD.
Firestone, W. A. (1989). Educational policy as an ecology of games. Educational Researcher, 18(7), 18–24.
Firestone, W. A., Mayrowetz, D., & Fairman, J. (1998). Performance-based assessment and instructional change: The effects of testing in Maine and Maryland. Educational Evaluation and Policy Analysis, 20(2), 95–117.
Firestone, W. A., & Pennell, J. R. (1997). State-initiated teacher networks: A comparison of two cases. American Educational Research Journal, 34(2), 237–268.
Fullan, M. (1991). The new meaning of educational change. New York: Teachers College Press.
Gewirtz, S., Ball, S. J., & Bowe, R. (1995). Markets, choice and equity in education. Buckingham, UK: Open University Press.
Goodlad, J. A. (1984). A place called school. New York: McGraw-Hill.
Hughes, M. (1997). The National Curriculum in England and Wales: A lesson in externally imposed reform. Educational Administration Quarterly, 33(2), 183–197.
Joftus, S., & Berman, I. (1998, January). Great expectations?: Defining and assessing rigor in state standards for mathematics and English language arts. Washington, DC: Council for Basic Education.
Kingdon, J. W. (1995). Agendas, alternatives, and public policy (2nd ed.). New York: Harper Collins.
Koretz, D., Mitchell, K., Barron, S., & Keith, S. (1996). Final report: Perceived effects of the Maryland school performance assessment program. (Project 3.2). Los Angeles: National Center for Research on Evaluation, Standards and Student Testing.
Lawton, M. (1997, March 12). Facing deadline, California is locked in battle over how to teach math. Education Week, pp. 1, 25.
Marshall, C., Mitchell, D., & Wirt, F. (1989). Culture and education policy in the American states. New York: Falmer.
Maryland School Performance Program. (1990). Learning outcomes in mathematics, reading, writing/language usage, social studies, and science for Maryland School Performance Assessment Program. Baltimore: Maryland State Department of Education.
McDonnell, L. M. (1994). Assessment policy as persuasion and regulation. American Journal of Education, 102, 394–420.
McLaughlin, M. W., & Shepard, L. A., with O'Day, J. (1995). Improving education through standards-based reform: A report by the National Academy of Education Panel on Standards-Based Education Reform. Washington, DC: National Academy of Education.
McNeill, L. M. (1986). Contradictions of control. New York: Routledge.
Murnane, R. J., & Levy, F. (1996). Teaching to new standards. In S. H. Fuhrman & J. A. O'Day (Eds.), Rewards and reform: Creating educational incentives that work (pp. 257–293). San Francisco: Jossey-Bass.
National Council of Teachers of Mathematics. (1989). Curriculum and evaluation standards for school mathematics. Reston, VA: Author.
Radnor, H. (1996). Evaluation of Key Stage 3 Assessment: Final Report 1995. London: School Curriculum and Assessment Authority.
Raimi, R. A., & Braden, L. S. (1998, March). State mathematics standards: An appraisal of math standards in 46 states, the District of Columbia and Japan. Fordham Report 2(3).
Schmidt, W. H., McKnight, C. C., & Raizen, S. A. (1996). A splintered vision: An investigation of U.S. science and mathematics education. East Lansing, MI: U.S. National Research Center for the Third International Mathematics and Science Study.
Schwab, J. J. (1978). The impossible role of the teacher in progressive education. In I. Westbury & N. Wilkof (Eds.), Science, curriculum, and liberal education: Selected essays (pp. 167–183). Chicago: University of Chicago Press.
Smith, M. L. (1991). Put to the test: The effects of external testing on students. Educational Researcher, 20(5), 8–12.
Smith, M. L. (1996). Reforming schools by reforming assessment: Consequences of the Arizona Student Assessment Program. Phoenix, AZ: Arizona State University.
Stigler, J. W., & Hiebert, J. (1997). Understanding and improving classroom mathematics instruction. Phi Delta Kappan, 79(1), 14–21.
Stotsky, S. (1997, July). State English standards. Fordham Report 1(1).
Strauss, A., & Corbin, J. (1990). Basics of qualitative research: Grounded theory procedures and techniques. Newbury Park, CA: Sage.
WILLIAM A. FIRESTONE is Professor of Educational Policy at the Rutgers Graduate School of Education and Director of the Center for Educational Policy Analysis. He is continuing his research on the effects of assessment on practice with a large-scale survey study in New Jersey. He is co-author, with James Pennell, of "State-Initiated Teacher Networks: A Comparison of Two Cases" (American Educational Research Journal, 1997).
DAVID MAYROWETZ is a doctoral candidate in the Department of Educational Theory, Policy and Administration, Rutgers University, and research associate at the Center for Educational Policy Analysis. His interests include policy implementation, inclusion of students with disabilities, and assessment reform. He is co-author, with Carol Weinstein, of "Sources of Leadership for Inclusive Education: Creating Schools for All Children" (Educational Administration Quarterly, September 1999).