The Paradoxes of High Stakes Testing: How They Affect Students, Their Parents, Teachers, Principals, Schools, and Society

reviewed by Sharon L. Nichols - June 24, 2009

Title: The Paradoxes of High Stakes Testing: How They Affect Students, Their Parents, Teachers, Principals, Schools, and Society
Author(s): George Madaus, Michael Russell, and Jennifer Higgins
Publisher: Information Age Publishing, Charlotte
ISBN: 1607520273, Pages: 264, Year: 2009
Madaus, Russell, and Higgins provide an authoritative historical and contemporary account of the role, impact, and consequences of high-stakes testing—tests that are used to make decisions about people or institutions. Although some of the evidence they present is not new (e.g., that high-stakes testing narrows the curriculum or changes how teachers teach), their contextualization of high-stakes testing from an historical viewpoint offers a fresh and much needed perspective on the evolution and impact of high-stakes testing in society.

The authors’ primary argument is that the practice of high-stakes testing presents an inherent paradox. As one example, they argue that high-stakes testing may increase student motivation and learning or improve teaching for some while eroding them for others. Consequently, if we are to use high-stakes testing as a lever for school reform, they ask us to consider how we might “accentuate the positive” while we simultaneously “eliminate” the negative side effects (p. 2). However, as I made my way through the book, I found myself increasingly questioning this fundamental premise. The authors do such a good job worrying me about the fallibility of tests and the problems of high-stakes tests that I was hard-pressed by the end to identify the ways in which high-stakes testing offered any “positives.”

The paradoxical nature of high-stakes testing is introduced in chapter 1, where the authors discuss its contributions as well as its problems. For example, they suggest high-stakes testing is reasonable since one of its goals is to provide the public with information about local schools. They note, “[high-stakes tests] give communities information about the quality of their schools and help parents make informed decisions when choosing a school for their children” (p. 2). They go on to note, “high-stakes tests also open doors of opportunity to those previously shut out by holding teachers and schools accountable for student achievement and helping them to focus attention on students previously poorly served” (p. 2). Although it is true that we want the best from our schools and for our students, by suggesting that high-stakes tests are the only mechanism for providing this information, the authors confound quality accountability with high-stakes testing. Accountability is paradoxical and has great potential to open doors if it works in intended ways. High-stakes testing as a form of accountability does not seem to work.

After a brief review of the history of testing and its role in educational policy in chapter 2, the authors discuss test construction, reliability, and validity in chapters 3, 4, and 5. They discuss the considerations involved in test construction, focusing on item sampling and item formats. In chapter 4, they point out the complications of how students’ prior knowledge interacts with tests and test items. And in chapter 5, they review cut scores and the politics of determining “proficiency.” Chapter 4 presents an especially compelling discussion of how culture impacts testing. They note, “High-stakes testing incorporates two culturally held values. The first is that achievement is an individual accomplishment. The second value is that individuals must display their accomplishment publicly” (p. 62). However, these values conflict with the beliefs of other cultures, such as Native American cultures, which tend to value collectivism, reflection, and introspection.

In chapter 5, the authors focus on the importance and challenges of establishing test validity. Validity can be a difficult and abstract concept for many to grasp. However, Madaus and his co-authors simplify the concept by focusing on the problematic nature of using cut scores to label and categorize students. They use analogies to sports and medicine to make the critical argument that tests are inherently error-prone and therefore should not be used to make critical decisions about students. After reviewing data showing that students who performed well on state-level high-stakes tests were subsequently ill-prepared for college-level academics, they conclude with the reminder that test validity is not “an all-or-none concept” but “a matter of degree” (p. 80). This holds particular significance in the last chapter, where the authors point out that there is virtually no oversight of the test makers.

In chapters 6 and 7, tests are viewed as a form of technology and the authors trace the history of testing as it relates to societal advances that include new technologies. Tests of “authenticity” were among the first types of tests used to measure physical characteristics thought to represent underlying personality characteristics or traits. These tests helped to shape “what is valued by society, how those values are measured, and subsequently how those measures are applied to foster those values” (p. 111).  As researchers grew more cognitively and technologically sophisticated, so too did the function, form, and utility value of tests. For example, the authors note,

with the introduction of mathematics into the curriculum at Cambridge University during the mid-18th century, it was soon apparent that the oral mode of examining was inappropriate for measuring mathematical skills. To overcome this limitation of the oral exam, a written exam was introduced. (p. 116)

Tying advances in tests and testing with advances in technology, the authors convincingly demonstrate how politics, education, and commerce are inextricably and perpetually intertwined.

In chapter 8, the authors discuss what they call the paradoxes of testing. Here they review much of the evidence that has been discussed by others: how testing narrows the curriculum, encourages memorization, and lowers student motivation, as evidenced by increasing rates of grade retention and dropout. Interestingly, they place these data in an historical context that demonstrates the persistence of these effects over decades if not centuries. We learn, for example, that an analysis of over 1,200 years of the administration of the high-stakes Chinese civil service exams revealed many negative side effects such as “students focusing on test taking skills … and elaborate student cheating” (p. 142). One is left to wonder whether today’s policymakers were simply unaware of this history or chose to ignore it in adopting high-stakes testing as the best option for educational accountability.

Chapter 9 is concerned with the future of testing. Here, the authors review what they see to be the potential advantages and disadvantages of using computers with testing. In chapter 10, they end with one of the most important points about high-stakes tests: the need for oversight and accountability of the testers.

High-stakes testing is not paradoxical at all. But tests, as a tool for measuring human qualities, may be. Danish reviewers Garsdal and Ydesen (2009) argue that Madaus and colleagues may have missed an opportunity to explore a more fundamental problem of tests: “On the one hand testing denies the test taker his/her uniqueness but on the other hand testing is a science of the individual” (Garsdal & Ydesen, 2009). Our society must grapple with the ongoing dilemma of how we might identify and then meaningfully measure attributes we most value before we worry about how those measures might be used.

Madaus, Russell, and Higgins make an important contribution to our understanding of the role and nature of high-stakes testing. The book effectively argues that not only is high-stakes testing bad for education, but its deleterious effects have been known for a very long time! By highlighting the complexities involved in creating meaningful and valid tests, and by showing how tests have become inextricably linked with the politics of education, the authors present a convincing case for holding test makers accountable.


Garsdal, J., & Ydesen, C. (2009). The debate on educational and psychological testing in the United States: An outsider perspective with some philosophical musings. Education Review, 12(8). Retrieved June 15, 2009, from http://edrev.asu.edu/essays/v12n8index.html

