Assessment Policy and Political Spectacle
by Mary Lee Smith, Walter Heinecke & Audrey J. Noble - 1999
This paper relates the history of assessment policy in Arizona, drawing on an extensive base of interview, observation, and archival data. In less than a decade, two radical shifts took place: from basic skills and standardized testing to progressive reform by performance testing, and then to high-stakes standardized testing based on state standards. We argue that the shape of assessment policy and policy change had more to do with political spectacle (Edelman, 1988) and the struggle for power, position, resources, and the control over public schools than with empirical or rational analysis, moral imperative, or democratic debate.
At a news conference2 on January 21, 1995, newly elected superintendent of public instruction Lisa Graham Keegan announced the suspension of one element of the Arizona Student Assessment Program (ASAP) a mere two weeks before it was to have been administered to thousands of students throughout the state. Serious problems of reliability and validity threatened Form D (the on-demand or audit form of the state assessment program), the performance test her predecessor had put into place and described as the cutting edge of state-mandated assessments. The developers of Form D themselves had uncovered these technical problems, which, she said, rendered this decision inevitable. She attempted to reassure educators that all their efforts to adjust curriculum and instruction to be consistent with ASAP had not been wasted, and that she would call on technical experts to study and remedy the problems of Form D. Instead, at a second news conference 8 weeks later, she declared that more serious issues underlying ASAP had come to light, necessitating wholesale changes in state assessment policy, eventually resulting in an Academic Summit to write new state standards, new legislation and regulations, new tests, and new forms of accountability. Along the way, this rupture in assessment policy created an arena for the performance of competing pedagogies, ideologies, interests, coalitions, political actors, and agencies contending for power and office.
In this paper we trace the history of assessment policy in Arizona over a 10-year period. As we watched events unfold we recognized a predicament in the theory of policy. Is policy action fundamentally a matter of technical rationality, or of political symbolism and contention?
The conventional model of school reform assumes that states develop policies by consensus, state departments develop programs and policy instruments consistent with the goal consensus, and educators respond rationally and predictably to implement the programs. Schools change because of state policy, in mechanistic ways, rationally, with a sense of shared vision, and more or less predictably and uniformly. Even slow or variable response can be explained on rational grounds; for example, by lack of resources or knowledge.
Applying the conventional model to assessment policy, one would expect that rational and democratic processes would underlie a state's rules and programs for determining whom and what content to test, using what instruments, on what schedules, and how the results of the tests will be aggregated and counted. Assessment policy sets the functions of state testing programs as well. A state adopts academic standards and frameworks to bring about curricular coherence, increase achievement, and make schools accountable. It chooses tests as the tools or instruments to attain policy goals (McDonnell & Elmore, 1987). By placing consequences (high stakes) on test results, the state expects that students, teachers, and school authorities will focus their energies on the policy targets.
Against this conventional model, we consider an alternative, political model, in which assessment policy is variously constructed by political and policy actors as well as educational practitioners. Politics at both macro and micro levels influence these constructions. The political model challenges assumptions of rationality and uniformity, occasionally exemplifying the spectacular, though often hiding the ugly pushing and tugging for power, resources, and political agendas behind a facade of rationality. By politics we mean not the usual view of the contest between Republicans and Democrats, but the dynamic process wherein partisans contend for power, prestige, position, and ideology in an official (governmental or institutional) capacity. The enactment of assessment policy is as much a symbol over which the partisans contend as it is a deliberate technique to change schools.
This paper rests on three sources of data. First, a policy study was conducted to discover the images, ideologies, interests, and tactics of policy actors at the outset of the ASAP, using interviews and documents as data sources (Noble, 1994). Second, a 2-year study of the responses of educators to ASAP employed both long-term qualitative and extensive survey data sources (Smith, 1996). These first two projects followed a consensual model of policy analysis (Rein, 1976), which "proceeds from agreed-upon aims; it asks whether policies and the specific programmes that implement them, work as intended" (p. 126). As we observed events such as the early 1995 suspension of ASAP and the posturing and maneuvering of political actors over elements of assessment policy, we realized that a different approach was necessary. We thus adopted Rein's contentious or paradigm-challenging approach, in which the researcher acts as a moral witness or social critic about a government's aims, actions, or nonactions regarding social needs. The aim of the policy researcher should be to develop stories that weave together values, facts, images, and tentative explanations with the setting, characters, and local context.
Evidence on assessment policy at the end of ASAP and the beginning of the next phase was constructed by collecting documents and reports, interviewing policy actors and constituents, and observing events in which assessment policy was negotiated, including the Academic Summit. Data were arranged in an event chronology. Data were sorted into the categories from the theoretical frameworks. Examining the data within these categories suggested a narrative line with characters, settings, plot, and themes. We then engaged in a process of internal, structural corroboration and subjected the report to review by three informants.2
WHAT WAS ASAP?
The ASAP was the official state assessment policy from its inception in 1990 to its radical revision in 1995. The Arizona Department of Education (ADE) derived its authority for administering the program from Arizona Revised Statutes 15-741. The intent of the program was to increase accountability to the state's curriculum frameworks and move schools toward higher-order thinking, complex problem-solving on real-world problems, integrated subject matter, and application of basic skills. ASAP included the following elements: Arizona Essential Skills (curriculum frameworks in nine subjects); District Assessment Plans (DAPs: each district's plan for measuring each Essential Skill and its mastery by grades 3, 8, and 12); norm-referenced testing (Iowa Test of Basic Skills and Tests of Academic Proficiency in grades 4, 7, and 10); Essential Skills Reporting Documents (each district's annual report to ADE of test scores and mastery of skills); report cards (annual reports by student, school, district, and state); goal-setting (districts' plans for improvement); and performance assessments. Assessments of clusters of Essential Skills were administered to all students in grades 3, 8, and 12, in the form of ASAP Form D. The state published guidelines to ensure that Form D would be administered under standard conditions. The generic rubric (4-point guide) was used by scoring contractors at several sites to standardize scoring. Form D tested in integrated style (students had to write an essay or some other extended form in response to a reading assignment with an embedded math problem). Four variations of Form D (D-1 through D-4) were to be phased in over four years, each testing one quarter of the Essential Skills. ASAP Forms A, B, and C consisted of performance assessments in reading, writing, and math to be used for preparing pupils to take Form D, or as instructional packets, and for district assessments.
Forms A, B, and C tested the content areas separately, purported to measure all the Essential Skills, and were administered and scored by teachers, also using the generic rubric. Spanish language versions of all forms of the performance assessment were available and there were guidelines for teachers to modify test administration conditions for disabled students. The performance tests were developed by Riverside Publishers, which also had contracts for scoring services and technical reports. In response to a 1994 policy adopted by the Arizona Board of Education, ADE planned to use a revised version of Form A assessments as a graduation competency battery.
It is doubtful that any individual during 1995, whether policy actor, politician, or educator, could keep this whole list in mind. The average teacher or principal only knew ASAP as a particular set of performance tests and perhaps a set of reform ideals. What one knew was what one had experienced directly. This included statements in the press or during meetings that ASAP was the best we know about how students learn, or that ASAP represented authentic assessment, integrated learning, a new role for teachers, and a way out of the quagmire of high-stakes standardized testing and the stale brand of teaching that seemed to follow it. The accountability aspects and intents of the program came later to awareness, and to some, not at all.
PROLOGUE TO ASAP
Before ASAP, Arizona students experienced one of the highest test burdens in the nation. Legislation mandated the administration of the Iowa Test of Basic Skills (ITBS) to every pupil in grades 3-8 (the Tests of Academic Progress [TAP] in grades 9-12). Although the state placed no consequences on the scores of students, some districts based decisions about salaries of administrators and teachers in part on test results. High stakes are often in the eye of the beholder, however, as teachers reported a high degree of embarrassment, shame, and pressure when the results (reported at the district, school, and grade level) came out in the newspapers each summer.
In addition to the norm-referenced testing, the state also required that each district administer tests of basic skills, a list of which had been first approved by the state board of education in 1983. Dissatisfaction with the list resulted in the development of the Arizona Essential Skills, for which the state would hold districts accountable. In 1987, the board of education appointed representative groups of educators and content specialists to committees that wrote the content frameworks and revised them based on extensive hearings around the state. ADE staff guided the work of these committees toward the newly emerging principles of constructivism and progressivism. An influential staff member reflected on their work:
The Language Arts Essential Skills looked at what people were learning about from a constructivist philosophy of education. It looked at what the writing teachers were saying and the writing professors, writing research was saying about writing as a process. It looked at new ways of reading instruction, somewhat whole-language based or literature, and it looked at integrating the language arts so that you didn't teach reading separately from writing. You looked at how they supported each other. And, finally, it looked at the possibilities of assessing language and other subjects directly that, instead of judging whether students can write by having them do a multiple choice test, it looked at having students actually write, and assessing that writing.
In the late 1980s, the board of education and ADE seem to have arrived at a shared definition of the problem: that existing state tests failed to cohere with the progressive, constructivist frameworks newly developed by content specialists nationally. C. Diane Bishop was an influential member of the board in the late 1980s and became state superintendent and head of ADE in 1991. Then a Democrat, she taught high school mathematics and thus understood higher-level thinking in mathematics. Bishop's ideas found support among the curriculum specialists at ADE, the Arizona English Teachers Association, the Arizona Education Association (AEA), local affiliates of the National Council of Teachers of Mathematics, the Center for Establishing Dialogue in Education, and local university professors. ADE commissioned two research studies that contributed data to this common definition of the problem. One study compared the content of the ITBS with the content in the Essential Skills and found that only 26% of the Skills were tested. A survey (Nolan et al., 1989) showed that most Arizona educators disputed the validity of ITBS and TAP, spent too much time preparing students to take them, and believed that the tests had deleterious effects on students, teachers, and the curriculum.
A powerful constituency at that time consisted of teachers and university experts in curriculum and pedagogy who believed strongly that students construct knowledge actively and intentionally from their interactions with texts, teachers, and other students; that reading, writing, and problem solving are parts of a whole. They believed that the state-mandated standardized testing program functioned as a structural barrier to expanding this mode of teaching and learning. Standardized tests, they believed, encouraged teachers to teach in ways that mimicked the form of mandated assessment. There was empirical support for their view that state testing narrowed curriculum and restricted instructional methods (Smith et al., 1989). If the state would only cease standardized testing, they reasoned, the way would open for better education. They found common ground with language educators and activists who believed that standardized tests adversely affect children whose first language is not English.
The momentum for change was building. Even within the coalition aiming to change assessment policy, however, one can uncover alternative views of what was problematic with existing state assessments. One subgroup believed that the ITBS detracted from efforts to reform instruction toward progressivism and constructivism. The other believed that the ITBS did not provide adequate accountability to the Essential Skills.
It was the perspective of the latter group that resonated most with the policy constituency with the power to make change: the legislature. Asked later about the function of ASAP, a legislator said:
This assessment is an accountability measure, because we want those Essential Skills taught. And the only way we know that it's going to be done is if you drop in and take an assessment of that . . . because there really have been no accountability measures up until now. . . . I think that was a driving force to put this all under a legislative piece and put a little teeth into this thing.
Legislators involved in the birth of ASAP had little concern or understanding of those principles of schooling that so motivated the policy actors at ADE and in the professional associations. Nevertheless, the various actors came together in the Goals for Educational Excellence project to develop new assessment policy and write enabling legislation.
Arizona Revised Statutes 15-741, which became effective in May 1990, required the state board to implement Essential Skills tests to do the following: measure pupil achievement in reading, writing, and mathematics in grades 3, 8, and 12; ensure that the tests would be uniform, objective, and yield national comparisons; and mandate district assessment plans, local assessments, and report mechanisms. The legislation itself never mentions the ASAP nor commits schools toward any principles of practice at all, nor to reform, nor to performance tests.
Policy actors would later report a high degree of consensus among the board, legislature, and ADE at the outset of the ASAP program. Yet these agencies were consenting to quite different things. The legislators believed that they were promoting greater accountability as a result of the legislation, while parts of the department and board believed that the state had embarked on a bold new vision of teaching and learning.
ASAP AS ASSESSMENT POLICY: A MOVING TARGET
Most educators never came into contact with the legislation itself and probably would have been surprised by its wording if they had done so. What they were exposed to, in contrast, was the program implemented by ADE and the extensive communication about the function of ASAP to change curriculum and teachers' instructional practice by changing the nature of the test. Every communication from ADE trumpeted the merits of performance assessment and the kind of education that is consistent with it.
The moving target of Arizona assessment policy, rife with internal contradictions and ambiguities from the outset, evolved and diversified in relation to (a) shifting power among coalitions of policy actors, (b) variations and limitations in capacity-building efforts, (c) confrontations with the tests themselves, and (d) scarcity of time and resources.
Three reform-minded officials at ADE, supported by the board, AEA, and constructivist educators, set the tone for the program at this stage. They believed that assessment must be authentic and integrated with instruction; that subjects must be integrated with each other around interesting, real-life problems; that teachers should be co-learners, coaches, collaborators, facilitators of learning, and actors rather than targets of curricular reform; that instruction should follow new research on cognition, multiple intelligences, and constructivist learning theory. They also believed that mandating a performance test would move teachers away from reductionistic, basic-skills (drill and practice) teaching of subjects in isolation. As one of them said later, "We saw the potential of doing something special . . . the chance for the state to break the lock of the norm-referenced testing which was not serving teaching or learning very well."
ADE public discourse rarely included the accountability aspects of ASAP in those early days. Later, one person declared:
[ASAP] was never intended to be that [a high stakes test], never intended to be used as a determiner for whether or not students graduate, never intended to be a determiner of whether or not they go from grade to grade.
Against this reform coalition, however, was a legislature that was still more interested in holding teachers' feet to the fire than in changing the way they taught. As its composition became more conservative and Republican over time, one heard more complaints about, for example, the subjective scoring of the performance measures and the fact that anti-business and environmental activist attitudes had crept into the content of the tests.
Within ADE, the staff was not of one mind about assessment policy, even at the beginning. Bishop herself had always spoken of ASAP as a way to focus teachers' attention on the state's curriculum frameworks and students' attention on mastery of the Essential Skills. Staff and officials concerned with ASAP were organized in two different units, with the reform agenda represented in the ASAP Unit. Meanwhile, the Pupil Achievement Testing Unit, made up of individuals who were experienced with norm-referenced and criterion-referenced assessment (not performance assessment), aligned themselves with the accountability values of the legislature.
About 3 years into the development of this program, the three key reformers left the department. Remaining staff proved to be less thoroughly grounded in progressive educational principles and less effective in maintaining the direction that ADE initially took. While the ASAP Unit changed faces and voices, the Assessment Unit remained consistent and began to dominate official discourse.
If the intent of ASAP was to change the way teachers teach, a critical element was missing: teacher training. Many Arizona teachers were less than expert in the principles and practices of performance assessment and of holistic, thematic, and problem-solving instruction and curriculum reform. For a reform of this type, extensive time and training have proved in other states to be necessary (Flexer & Gerstner, 1994; Shepard et al., 1995). The report of the Goals for Educational Excellence panel had earlier promised the legislature that the new program would cost no more than the previous testing program. Now ADE was in a bind, able to mandate assessment policy but powerless to fund state-wide training of teachers to adapt to it. Some districts with sufficient wealth and officials who were open to the new direction suggested by ASAP invested considerable resources in local capacity development, but these were in the minority. In a state with considerable disparity in taxing ability, the already rich and poor districts reproduced disparities in staff development for ASAP as well.
The pace of transformation of the program away from its reform agenda increased under the press from key legislators to start producing data for accountability purposes, which had been their intention all along.
ASAP had limited time either to develop the capacities of teachers and schools or to develop sound psychometric instruments. As a result of political pressure, ADE and the test publisher had to produce the various performance test forms in weeks rather than years. Form D-1 was commissioned and administered before all the psychometric and administrative kinks of Form A were worked out, and D-2 was commissioned and administered before the characteristics of D-1 were corrected or even known. Nor was there an equating study to show whether Form D could function as an audit of Form A. An ADE official would later recall that the development of D was done in a fairly shabby way, without adequate field testing. But ADE didn't act on this information for a couple of years. An ADE insider at the time agreed:
Now, the problem there was that the first Form D was used, we tried it out, we reported the results, but Riverside ran a concurrent field test on the Form D. The concurrent Form D-1 field test was returned to the department in late '93, or the fall of '93 sometime. And what it said is that the D form didn't match the A form well enough. But, for whatever reason, the staff of the department kind of took that report and put it on the shelf. . . . Politically, the thing was developing its own momentum down there. Nobody wanted to stop the process, nobody wanted to pull it back. Riverside staff was saying you've got to stop this because the D now needs to be revised and refield-tested to be sure that it matches the A that it's auditing. Wasn't done. [The report] was shelved . . . and D-2 then was commissioned, developed, was not field-tested, and was ready to go as the next state-wide audit.
Beyond ignoring the problematic technical data of early versions of the state performance assessments, Superintendent Bishop and other ADE staff reacted defensively to criticism of ASAP. At the time, we found that the teachers' complaints seemed quite reasonable. The problems seemed to have been for the most part correctable, having to do with glitches in administration, the burden of purchasing test materials, question wording, insufficient time limits, inadequately prepared scorers, vague scoring rubrics, and lack of time and training. However, these rational suggestions were met with political responses. At a meeting of educators sponsored by ADE and AEA, Bishop warned teachers not to complain too much, because conservative policy actors were poised to reinstate universal standardized testing. Open debate over assessment policy did not happen.
In June of 1993, ADE released the initial results of ASAP Form D. The newspapers published the results by school and grade level and ranked schools in much the same manner as they had always reported the standardized test results. In a newspaper report headlined "Tests Say Schools are Failing," the superintendent called the results disturbing and criticized schools and teachers for not adapting fast enough and for not teaching the way kids learn. ADE took the scores at face value, failing to note the possible technical problems associated with any new test. Educators were shocked and dismayed at ADE and media reaction. Many believed that the state had pledged not to use performance assessments this way.
Time and political capital had begun to run out for the reform faction in ADE. The state board of education, prompted by key legislators, demanded accountability. The use of ASAP in determining high school graduation standards became part of assessment policy in January 1994 through the action of state board of education rule R7-2-317. A task force on graduation standards was then appointed to make recommendations about proficiency levels. Its recommendations were later adopted, specifying a level of proficiency for graduation from Grade 12: "A student shall demonstrate competency in reading, writing, mathematics, social studies and science . . . by attaining a score of 3 or 4 on each question or item of each Form A assessment [of ASAP] . . . scored with the corresponding Essential Skills (ASAP) generic rubric. . . ." Its decision process was political, based on what would appear rigorous to the public. There is no evidence that the task force examined technical data (for example, standard errors around cut scores) or consulted experts on established procedures for setting cut scores. It specifically ignored the Riverside technical report that warned against the use of Form A for pupil level reporting or accountability.
CONSEQUENCES OF ASAP
Had ASAP been effective in achieving its reform aims? Although ADE monitored compliance through the DAPs and conducted a survey of teachers, it commissioned no serious evaluation. Our independent policy study of the consequences of ASAP, however, tracked implementation and reactions for nearly the complete life of the program. What we found (Smith, 1996) can be summarized as follows: Arizona educators were fully cognizant of ASAP, though they defined it in quite different ways. Somewhat less than half of the educators we studied approved of ASAP as they defined it. Much of the disapproval seemed to be the result of problems of implementation of ASAP Form D (e.g., inadequate time limits or directions, inappropriate item content or scoring rubrics, the use of ASAP scores for high-stakes purposes) rather than the idea of ASAP. Change in curriculum and teaching consistent with ASAP also varied widely and depended on whether there were adequate financial resources to make the change and knowledgeable personnel to help, whether the existing teaching patterns and beliefs were amenable to constructivism, and whether there was little commitment to standardized achievement testing and traditional education. If so, then change was evident, but not otherwise. The low rate of change can be attributed in part to inadequate professional development. The state failed to provide resources for teacher training, ceding responsibility to the districts, many of which were too strapped financially to do anything. Although a few districts devoted impressive resources to develop teachers' capacities to implement ASAP reform ideals and invested in curricular changes as well, the average number of hours of relevant professional development reported by teachers across the state was only about 8 hours over a 2-year period. Still, there was enormous effort spent by teachers and administrators simply in complying with ASAP testing and reporting requirements.
When we interviewed policy actors subsequent to the demise of ASAP, we found that their interpretations of the consequences of ASAP fit their current agenda and position. For example, an ADE insider during Bishop's administration said this:
Well, the great strength of it was that it was changing the behavior in the classroom. We were making change with respect to teaching methodology, instruction, technique, those kinds of things. Also materials were being changed, moving from a reliance on rote memorization, teachers primarily engaged in lecture and students repeating back what they heard, to one where students were actually engaged in the application of knowledge. Teachers were engaging the students in the learning process more. In other words, they would have to solve problems, they would have to apply whatever they had learned in the classroom to a real life situation. They would have to actually write.
But a legislator reflecting back on the program noted:
[There were] constant complaints about the content of the test, about perceptions that the test wasn't valid, that it disrupted the classroom, just constant complaints, and zero, zero [support to keep it going].
CAST CHANGES/ASSESSMENT POLICY CHANGES
Curricular change failed to keep pace with political change. Bishop decided not to seek a second term of office. The Democrats nominated a prominent official in the AEA, the state's largest professional association. The Republican nominee and subsequent electee was Lisa Graham Keegan, a state legislator and chair of the Education Committee. The 1994 election also featured the reelection bid of Governor Fife Symington, a prominent businessman who shared politically conservative views with Keegan. They were particularly in tune over the issue of vouchers and school choice. In a move that took everyone by surprise, Bishop bolted the Democratic Party and campaigned both for Symington's reelection and for the support of charter schools and vouchers. After Symington was elected, he appointed Bishop to a newly created post as education advisor to his administration. After that, she became nearly invisible as a policy actor.
Few involved could be considered more visible than Keegan, a bright, attractive, and articulate woman in her 30s who was a Stanford-educated speech therapist. She was termed the "Voucher Queen" after two terms as a prominent state legislator. The Phoenix Gazette had tagged her in May 1996 as the strongest candidate for governor in 1998, a designation that may have proved significant.
As a political conservative, Keegan supported policies of less government and less regulation and promoted the values of efficiency, decentralization, and choice. These values she shared with Symington. They disagreed with each other, however, on how the state should respond to the 1994 Supreme Court order in Roosevelt v. Bishop, which required the state to correct the substantial financial inequities among the districts. Keegan believed that making the educational opportunities more nearly equal was a fundamental responsibility of the state, while Symington fought the order at every turn, launched a court challenge, and even declared that making districts equal was tantamount to state socialism.
Where Keegan stood on assessment policy was less clear. During her campaign she professed her support for ASAP when she addressed teachers' organizations and the like. With her corporate and conservative supporters she agreed that the schools were underachieving, bureaucratic, and falling short of producing graduates who could plug into jobs in the corporate world. According to the corporate view, the problem with public schools was a lack of accountability. More testing was the solution.
The Reorganization of ADE
One month after taking office in January 1995, Keegan announced the reorganization of ADE, "to better focus our energies on our mission of improving academic achievement." Staff who had previously been involved in ASAP were moved to units not involved with curriculum or assessment, and some were assigned to menial tasks, shuffling papers. Various informants used words such as "purge," "hit list," and "litmus test" to describe the changes in the department. Results of the reorganization on assessment policy were immediate: "When you call for information, either no one answers or someone answers who knows nothing."
An official in the Bishop administration who had already left ADE reflected later:
It wasn't personal, it was political. It says more about the lack of confidence in a professional educator, that someone who has a doctorate in curriculum and instruction, someone who has a doctorate in tests and measurements, someone who has a doctorate in fine arts, for example, is not really important.
TECHNICAL REPORT: TEXT OR PRETEXT?
On January 25, 1995, just two weeks before it was scheduled to be administered, Keegan suspended ASAP Form D. According to news reports, the test publishers themselves had brought technical problems to her attention and compelled her to act: "The results we have so far have been called into question," Graham said. "I can't say with confidence that it's a valid test. It hasn't been verified enough to determine whether it correlates with how much kids know" ("ASAP Test Suspended," 1995). She stated that the suspension would be only for 1 year while ADE and Riverside worked to improve the test. Meanwhile, the other aspects of ASAP would remain in place.
The main stimulus to Keegan's decision seems to have been the Arizona Student Assessment Program Assessment Development Process/Technical Report, which had been prepared by Riverside Publishing Company in June 1994 (with a supplement in November 1994) but never made public. As she said later, "The technical merit of the test, probably on a weight basis, weighed more with me than anything else. You can't get around basically an invalid test." The gist of her argument was that the report said Form D correlated too weakly with Form A and had inadequate levels of reliability.
The few with access to the report realized that there was little new in the data or recommendations. The report detailed the procedures of a study Riverside staff conducted to equate Forms A and D in volunteer samples. The resulting alpha reliabilities of Form D ranged from .69 (for third grade math) to .87 (twelfth grade writing), with a median of .80. The correlations between relevant Form A and Form D components ranged from .31 to .61, with a median of .45. These, the report said, were about typical for performance tests.
Although cautioning against broad interpretation (given small sample size and differences in level of difficulty between the forms), the report nevertheless concluded that "statistical correlations are at a level to demonstrate that the assessments measure the Essential Skills being tested" (p. 8), although the evidence was less than compelling and the results not as dependable as those of standardized tests. The report warned against use of ASAP performance test scores to make decisions about individuals.
Riverside's interpretations represented only one way to spin these data. We heard others. For example, examining the technical data of established assessment systems that use constructed-response measurement put the ASAP data in a more favorable light. Some observers marveled at how high the correlation turned out to be, given that Form D measured integrated reading, writing, and math, whereas Form A measured skills separately. Or that D-1 was designed to test only one quarter of the Essential Skills, not all of them. Or that development time had been so brief, with one version administered before the previous one had been evaluated. Or that the scoring rubric was so nonspecific.
Opinion about the technical adequacy of Form D divided over its use as an audit of district assessments (Form A) and whether it could be used to certify individual student competence. ADE insiders believed that Form D "wasn't providing honest accountability. And what you needed was a high stakes examination that was true."
But an official from the Bishop administration countered:
I don't think the reliability and validity would be a problem if those assessments were put to the use for which they were designed. It is when you try to use them for something for which they were not designed that those things become a problem. . . . Reliability and validity are very good words to use when you want to take an action as she did to end the test.
Suggesting that technical data were pretext for political action, an observer said:
Well, it gave people a place to stand if they didn't like ASAP. . . . I think it was a political decision more than anything, and the technical report provided some place to stand. But I wouldn't say that the action taken to suspend was really warranted based on that technical report.
A Democratic legislator agreed:
So this trumped-up, great revelation that this is all out of whack and we have to put a moratorium on testing . . . and we have to re-tool the instrument, I think it was done with a lot of dramatic flair.
A Riverside representative opined that performance assessments are notoriously unreliable and that given the nature of the materials, it was probably about average. Adequate reliability and validity coefficients rely on a much longer development process than that which characterized ASAP:
Policymakers have this just (clicks fingers), I mean completely unrealistic idea about the difficulties in building tests in terms of time and money. . . . And reliability and validity? They're just words to policymakers.
Among those insiders in the Keegan administration and her allies on the state board of education, the evidence was considered damning and Form D incapable of resuscitation. Asked about whether she had ever considered an effort to improve the assessment rather than kill it, Keegan replied:
I don't have that kind of patience. I mean I can't fathom my representing the state exam as a valid measurement of the Essential Skills which were mandatory, are mandatory, when I knew for a fact that the test was not a representation of ability in that area. It's dishonest.
To advise her on what action to take after the suspension, Keegan assembled an ad hoc committee that met three times during February and March. A member of the committee, which consisted primarily of district test coordinators and ADE staff, reported later:
The committee recommended that D be fixed. We kept telling her, keep working on it, don't get rid of it. . . . But she had this bee in her bonnet that the validity was not high enough, and I think it was all just rhetoric. . . . She had promised to get rid of it [ASAP] during her campaign, and, lo and behold, Form D is gone. I don't believe that she pulled it because it was not valid. It was her way of making a statement right away that she was going to be a strong superintendent. She convened that committee but wasn't really interested in what the committee had to say. It was all just show.
By May 1995 Keegan decided that a 1-year suspension and revision of ASAP Form D would not be enough. The Essential Skills themselves needed substantial revision. On a local television program, Keegan foreshadowed what was to come when she recommended "sending the Essential Skills to the scrap heap." Most of them were not measurable, and the documents were so long, convoluted, and filled with educational jargon that parents could not possibly understand them. In addition, the Essential Skills failed to embody "world class standards" and emphasized process rather than outcomes, according to her view.
Stamping her vision of assessment policy meant first changing its name (but not its acronym) to the Arizona Student Achievement Program [italics added]. The new ASAP would test student mastery at different grade levels and add a high school competency exam. Norm-referenced testing would continue. No state ASAP testing would be conducted during the 1995-96 academic year.
Reactions to this announcement varied across the spectrum. "Zero reaction," recalled a Republican legislator. Insiders in the Bishop administration, however, reported protests from groups such as the Parent-Teacher Association, teacher groups, and administrators in those districts that had spent time and energy adapting their programs toward constructivist education.
People were very disappointed, very disappointed. And they didn't understand why it was stopping. People in school districts said, geez, we've really invested a lot of time and effort getting the kids ready. Everything that we had hoped for that districts and schools would do to prepare their kids to do well on these assessments was happening. There were places obviously it wasn't, but just stopping was a tremendous let-down for people.
Members of the Keegan administration maintained that this move amounted to only a correction or clarification, because it affected only a part of ASAP. What they misunderstood was that, to many educators, Form D was ASAP: an integrated assessment that both mirrored and promoted integrated instruction.
Although Keegan originally planned to take 5 years to revise the Essential Skills and develop new tests in consultation with teachers, parents, business, experts, and the like, the board of education rejected that plan as too slow. Board members wanted an immediate revision, no gaps in accountability data, and a graduation competency test as a substitute for seat time. So Keegan put aside her plan for a patient, collaborative process of standard-setting in favor of staging the Academic Summit, at which the Skills would be replaced with new standards in minimal time.
Some educators wrongly believed that the actions of the board and superintendent contravened state law. In fact, the legislation had never mandated or even mentioned ASAP, but prescribed norm-referenced testing in three grades and Essential Skills testing, which the districts were to carry out during the transition. However, the proposed change of grades for Essential Skills testing (i.e., from grades 3, 8, and 12 to grades 4, 8, and 10, plus a graduation test) required that the legislation be changed. The necessary change in legislation again made the state legislature a powerful player in assessment policy. By this time, the legislature had moved far to the right.
THE ACADEMIC SUMMIT: BLITZKRIEG STANDARD-SETTING
The Academic Summit took place in a Scottsdale resort, in October 1995. The nine design teams, one team for each of the nine content areas for which standards were to be developed, had had two prior meetings to become acquainted with each other and with the task before them. The schedule would be tight, but the summit planners believed it would be possible to form a team, write a three-page list of clear, measurable standards, present them to the other teams, get reactions during public hearings in December, and write a final draft to present to the state board in time for its January 1996 meeting. The board would then approve the standards and issue a request for proposals to test publishers. The winning bidder would then construct pilot assessments to be administered in March of 1996. That was the plan.
Although officially managed by the ADE deputy superintendent, the summit organization bore the marks of two groups: a set of consultants from Doyle Associates (a conservative think tank) and trainers and facilitators from Keegan's corporate partners. The influence of Dennis Doyle was apparent in that the new standards were to be written in terms of "Levels," rather than grades. Competency tests, rather than time, would be the means by which pupils would progress through the system. The influence of the corporate partners was felt in choice of facilitators, in language and concepts more appropriate to the corporate world than education ("design teams," "clear and measurable standards," "market incentives," performance equated with product, and the like), and in the prominent place of workplace skills and technology.
Each of the nine design teams comprised nine members plus one or more facilitators. The participants included parents, teachers, students, and laypersons who had been appointed by ADE from a list of self-nominations. That is, a math teacher was just as likely to turn up on the social studies team as on the math team. Curriculum specialists were conspicuously absent. An official with ADE explained that many of the participants had nominated themselves based on Keegan's informal comments during a fact-finding trip. Later, when the summit was announced more generally and officially, people (i.e., curriculum specialists) who asked to participate were told that the teams were already full. But the department also made clear that loading the teams with nonspecialists would have the effect of reducing educational jargon and making the standards "clear and measurable." During a board meeting, one member said about curriculum specialists, "We don't want to know what they know. We deliberately cut them out of the process." Another board member said in an interview:
We had teachers involved, but we did not have curriculum coordinators involved, nor did we have the Department of Education employees involved. We felt those latter two, while many could contribute, just as many, if not all, would have a stake in maintaining the status quo, or in making their job easier. And that was not our intent. We don't care really whether their job is easier. We want it to be just as challenging for them as it is for everybody else.
Whether intentional or accidental, ADE did not pursue a strategy of representation on the teams except to include laypersons on each. Teams had few minorities.
In the language arts team, tensions surfaced. Although the summit directives attempted to guide participants away from constructivism toward the simple, clear, and measurable, resistance by the few professionals was evident in their attempts to introduce constructivist principles (some said jargon) from the standards of the National Council of Teachers of English. Again resisting directives, participants repeatedly tried to deal with issues of equity and quality education for second-language learners. Despite their pleas, the superintendent resolved that "Without question, the standards you are creating are for proficiency in the English language. Assessment of the standard will be in the English language."
On the language arts design team, 20 planned hours became approximately 200 hours of meeting time, spread over nearly a calendar year, plus countless hours spent in reading, writing, and reflection. The language arts standards would not be accepted by the board until the summer of 1996, and then only the basic skills parts. By late 1997 conservative groups were holding up science standards over inclusion of evolution and creationism. By late 1998, social studies standards were still bogged down in controversy.
In January 1996 ADE conceded that its schedule would not hold up. The design teams continued to meet thereafter, passing drafts back and forth to consultants and ADE. In a letter to schools Keegan announced that the drafts would not be submitted as planned to the board and that the spring pilot assessment would therefore be delayed a year.
Despite the diversity of backgrounds of the language arts team, the members eventually came to common understandings of the issues. They took their work seriously and produced draft standards with integrity, focus, and balance, according to an informant. They incorporated constructivist principles in several ways. They included standards that could not be easily tested with multiple-choice tests. They injected issues of global literature, cultural comparisons, reading for pleasure, and self-assessment of student as writer. They designated four varieties of standards within language arts, that is, listening/speaking and visual representation as well as reading and writing. Many of the constructivist ideas such as content integration, projects, thematics, and problem-solving were in those latter parts of the draft language arts standards.
REACTIONS AND REVISIONS OF STANDARDS
During the third week of December, after the nine draft standards documents had been distributed, 11 public hearings were conducted. ADE staff ran the meetings, sometimes including the superintendent and board of education members as well as the design team participants. The audience for most of these meetings consisted primarily of organized constituencies: members of Parent-Teacher Associations, teacher associations, university content specialists, and conservative political groups, plus a few parents and teachers on their own. The meetings were all of a type: an ADE presentation on the history of the standard-setting process, a question-and-answer session, and then a series of controlled, 3-minute speeches from members of the audience who had signed a list to speak. Although those at the podium politely listened, there was little democratic participation. Occasionally tempers grew short, as contenders over such issues as outcomes-based education, bilingual education, and phonics argued against each other or against the state. Conservatives reminded Keegan that she had made campaign promises to them to eliminate ASAP and other progressive reforms.
Following the public meetings, further reactions to the draft standards came to ADE by mail and fax. No one could say, however, just how extensive was the distribution of the design drafts or how representative were the comments sent back. If state teacher organizations had an official response, it was not reported in the newspapers. ADE later claimed that extensive teacher input had been sought and received.
Revisions of the draft standards were made and passed back and forth to ADE and consultants over several months. Two members of the state board also participated in the review. The evolution of the drafts reflected the tensions already evident at the summit: the board and ADE emphasizing the simple, brief, measurable, and ambitious; and the team leaning toward the complex, process-oriented, holistic, and integrated, influenced by the national standards. Between drafts three and four, the superintendent wrote a long memo to the review team, recommending a variety of substantive changes. "We must develop a sample reading list for each of the five levels to give . . . a sense of the quality and complexity of text students are expected to read and master. . . . We may want to add a requirement that students read a certain number of books per year (e.g., 20-30) from an identified number of writers and genres." Angered, the team ignored most of the recommendations. One change she suggested did work its way into (or more properly, out of) the writing standards. Originally, one standard at the readiness level read, "Perceiving themselves as writers." Her response, "important, but how do you measure?" led to deletion of that standard and the substitution of "spells simple words, and writes the 26 letters of the alphabet." ADE's insistence on measurability also resulted in the deletion of standards related to developing students as life-long readers.
Overall, the board had two interests that interfered with the team's vision of setting integrated language arts standards. First there was the pressure of time and the need to show that something had been accomplished; the standard-setting process had already taken up many more months than had been planned. But the second interest was to boldly undo the old ASAP reform agenda and move the policy toward a more traditional pedagogical orientation. The board therefore moved to fracture language arts into separate areas: reading, writing, listening/speaking, and visual representation. Since the progressive elements were in the connections, what was left was disaggregated skills. The team was frustrated in its attempts to preserve the old ASAP, professional autonomy, progressive pedagogy, and sensitivity to ethnic and language diversity.
THE LEGISLATORS, GOVERNOR, SUPERINTENDENT, AND BOARD PRESIDENT: AT PLAY IN ASSESSMENT POLICY
During the months after the summit when the team members, ADE staff, and board president were occupied with exchanging drafts and moderating their positions, the governor and key Republican legislators were also working on their own versions of assessment policy. A legislative staffer reminded us that there had been serious opposition to ASAP (the old version) for some time, and bills had twice been introduced to kill it:
There were a number of concerns around the ASAP test. Some of them are the same kinds of concerns that have come up about what you call outcomes-based education, that it's value-driven, that you're funneling values to the kids that might be contrary to some people's family values, and so forth. . . . I know that, for example, the House Speaker and a number of parents were quite upset one year when the twelfth-grade test had to do with rain forests. They claimed that the test biased the children in favor of keeping rain forests at all costs [and] they come from it from an economic perspective which is, don't harm people's economics just to save the environment. Then there were those like the Senate majority leader [who] hated it because he mistrusts educators in general. . . . The people who feel very strongly about this know that that's what they want. And they didn't ask for research.
House Bill 2417, after considerable give and take about how much testing there should be, finally emerged. It required Essential Skills testing in at least four grades, norm-referenced tests in four grades, and competency tests to determine high school graduation. Several versions of the bill passed privately among the political actors with virtually no public scrutiny or commentary. When the bill passed both the education committees and the full legislative bodies, it happened without a public hearing.
SYMINGTON CONTRA KEEGAN
By the state constitution, the Arizona Department of Education does not report to the governor, as some other state departments do, and therefore he has no control over its budget and operations. The state superintendent, who heads ADE, is elected rather than appointed and thus is not part of the governor's cabinet. However, the governor appoints members of the board of education, which has statutory authority for educational policy. The superintendent both serves as a member of the board and holds primary responsibility for carrying out board policy through the department. The tussle between Superintendent Keegan and Governor Symington reached a dramatic climax in the March 25, 1996, incident described below. No less than the governorship was at stake.
Fife Symington was first elected governor in 1990, campaigning on his record as a successful businessman and on his moderate position on social issues and fiscal conservatism. By 1995, he had moved far to the right in the political spectrum on every issue. In a September 29, 1995, press release of a speech before the Phoenix 100 Rotary Club, he called for radical restructuring of the state school system, doing away with districts altogether. No collective bargaining or master contracts would be allowed. He planned to eliminate certification of teachers and administrators, free existing public schools from all laws and regulations, institute "parental choice grants" to enable parents to send their children to the school of their choice, and create a mechanism "to place into receivership those schools that consistently fail to educate their students." In addition, Symington proposed to abolish the ADE (Keegan's department):
The public education system spends over $3.5 billion taxpayer dollars annually, with absolutely no accountability for results. We have a school report card that is virtually toothless because we have no independent, uniform testing system in place to evaluate our students' progress. We must restore the ITBS achievement testing of every student, every year in grades 3-12 immediately; we cannot wait two or more years for the department of education to revise the state testing program. We must set high graduation standards for all students.
A week before this speech, Symington had declared personal bankruptcy. The Arizona Republic editorialized ("Symington's shifting priorities," 1995), "If Gov. Fife Symington had set out deliberately to divert public attention from his personal financial travails, he couldn't have picked a better strategy than getting his critics, and others, focused on something else."
GOVERNOR ENTERS STANDARDS DRAMA FROM STAGE RIGHT
March 25, 1996; what we thought would be the finale to the Academic Summit turns out to be a mere complicating action. As the summiteers and other educational policy watchers take their seats in the board room, the members of the Arizona Board of Education assume their leather high-back chairs behind a long curved table raised to imposing height above the spectators. The board meets today to consider (we expect them to vote, yea or nay) the new academic standards developed during the Academic Summit. The summit and standard-setting process have dragged on for more than a year already. But Keegan has come into the superintendency with a new and quick broom. Sweeping out the old here means rejecting the progressivism deeply embedded in Essential Skills and ASAP, which, she believes, have mistakenly abandoned traditional drill and practice, basal and textbook mode of instruction, and high-stakes testing in favor of student-centered, integrated, higher-order thinking and problem discovery and solving. The new standards and assessments would instead be clear, measurable, and focused on basic skills. But there is still a constituency for ASAP among the summit participants and the standard-setting process has turned contentious and drawn out as each faction struggles for purchase. We have followed this history with interest. And now, finally, at least the language arts and mathematics design teams have worked out their final drafts. We observers thought they represented compromises between progressive and traditional values; there was something in them for both sides. So, it was now time for the board to make them official. Members of the board had already indicated they wanted to go forward so that test development could begin. The agenda calls for short testimonies from the floor and then a vote. No one expects much out of the ordinary.
The buzz takes us all by surprise, as Governor Symington strides purposefully into the room and requests permission to address the board. Trailing him are members of his entourage, who distribute copies of his prepared address to members of the press. Among them is his educational advisor, C. Diane Bishop, the former superintendent.
From the looks on their faces, we can see that the board members and Keegan are just as surprised as the rest of us at this unprecedented intrusion. The gist of Symington's remarks is this: the draft standards show the "reckless drift toward fads and foolishness" that characterizes most of professional educators' work; and the standards fail to mention phonics, spelling, or memorization of math facts and the state capitals. The arts standards are singled out for particular ridicule. Taking questions from the floor, he declares that the state should reject pointy-headed elitist professional jargon and just get back to basics and standardized testing for everybody.
Keegan is clearly flustered and tries to correct what she sees as Symingtons misreading of the standards, but the damage is already done. Symington leaves, the board breaks, and the argument goes on in the audience.
The vignette above was constructed from field notes. A press release from the governor's office preserved his remarks at the board meeting:
I have been following the effort by the board and Lisa Graham Keegan to develop curriculum standards for Arizona's public schools. I support the concept, but I am concerned about the direction the board may be taking. . . . In education, we have been making the same mistake humanity always makes again and again. We have casually cast aside the settled and true in favor of the trendy and allegedly exciting. [Other than technology,] there is almost nothing new about a high-quality primary education, and very little new in secondary education. Most of the social and academic innovations the so-called professional educators have brought to our classrooms are wasteful at best and insidious at worst. I stopped by today because some of this reckless drift toward fads and foolishness is evident in the standards currently under consideration. The reading standards, for instance, mention nothing about phonics for primary school students, nor, say, great works of literature for those in high school. They do, however, insist that our students learn to "use consumer information for making decisions," and to "interpret visual clues in cartoons." The mathematics standards state that students should be able to "explore, model, and describe patterns and functions involving numbers, shapes, data, and graphs," and "use simulations to estimate probabilities." Educational concepts more familiar to most of us, such as multiplication and division, are unmentioned.
In a political analysis in the Arizona Republic, headlined "Symington Moves to Right Seeking Votes," Michael Murphy wrote:
Symington, who once cast himself as a moderate Republican, is laying the groundwork for a re-election bid by courting the most extreme elements in the conservative coalition. . . . [P]olitical observers in Arizona agree that Symington, whose popularity ratings have spiraled downward because of his personal bankruptcy and the indictment of two close associates, has adopted a strategy of fiery neo-populism. The idea is to build a core of supporters among the state's hard-liners who would be the backbone of a 1998 re-election campaign. . . . [O]ne close political ally indicated that Symington has developed a political playbook focused on picking hot-button issues that resonate among the GOP's most conservative elements. . . .
Symington's interest [in the standard-setting process] was spurred by Dinah Monahan, a Snowflake [Arizona] mother of five and a leader in the Eagle Forum, a conservative lobbying group founded by Phyllis Schlafly. She is mobilizing other Christian Right groups, including the Christian Coalition and the Concerned Women for America, to fight what she calls the "humanist, globalist, New Age" indoctrination of Arizona schoolchildren. (Murphy, 1996, pp. 1, 13)
In October 1996, Keegan continued to conflict with the governor, calling on him to resign, as he was no longer a productive or effective leader. But neither his legal and financial problems nor opposition from fellow Republicans could slow Symington's pursuit of his policy agenda. The appointments he made to the state board during 1996 and his working behind the scenes with the legislature further reinforced his conservative stance and shifted the conflict in developing assessment policy away from ASAP reform ideals and toward basic skills and standardized testing.
APPROVAL OF THE STANDARDS: HOOKED ON PHONETICS
Months dragged on. The draft standards in math and language arts were discussed at every meeting from January to August. By summer, the composition of the board had shifted, and the alliance between the board and the superintendent had weakened.
The board interpreted its task of approving standards quite broadly. Indeed, members minutely inspected each standard, bullet, and level. The primary bone of contention was the extent of basic skills to be made explicit in the standards. The newly appointed conservatives insisted on explicit inclusion of rote memorization of math facts, direct instruction of spelling and phonic skills, and exclusion of anything they deemed unmeasureable or progressive. Keegan defended the drafts the language arts team had submitted. An informant described the climactic moment in the July board meeting:
All along Lisa hadn't wanted phonics mentioned as a standard because it was a process, and she didn't want instructional techniques in the standards. But Felicia, who is a teacher at Franklin Traditional school and is about as far right as you can get. She was unbelievable. She was relentless. She had been wearing people down all day. All day long they had been going back and forth, back and forth about whether to put the phonics in as a standard. Finally it came down to 5 o'clock in the afternoon and everyone on the board had left but five people and you need all those five votes to pass anything. And Keegan caved in. And I had quotes from her earlier in the day that said she wouldn't go for it, but she did. Because Felicia said in so many words that she would not approve the standards unless phonics were not only part of K–3, but also part of K–8. I think they caved in just to get something passed. Anything. All these months had gone by, and still nothing had been approved. I think they were desperate to get something officially approved. So what they did, they agreed to a change in wording. Instead of calling it phonics, they called it phonetics. And the vote was 5–0.
THE DEBUT OF AIMS
The political salience of assessment policy has cooled considerably since the conviction and removal from office of Governor Symington and the fall of Superintendent Keegan from the list of gubernatorial replacements. State board versus legislature became the hotter political issue in 1998. ASAP was replaced by AIMS, the Arizona Instrument to Measure Standards. No official talks are held about problem-solving; integrated, thematic teaching; or reading-writing connections; or about testing the way teachers teach and learners learn. Discourse about progressive reform no longer occupies the wings, let alone the center stage. Talk about performance assessment as a way of authentically representing the way teachers teach has given way to concerns about efficient testing of basic skills, isolated by subject matter. The test burden on Arizona school children is higher than it was before ASAP, with standardized testing of every pupil in all grades 3–12. The new AIMS (reading, writing, and math) is administered in four grades as well, and consists mostly of multiple-choice items, a few short-answer items, and one essay. In addition, the state mandates that districts develop and implement local assessments at the grades AIMS does not cover and in the other content areas. Student mastery will be reported to the Arizona Department of Education, which makes all these data available on the Internet. High-stakes accountability is also greater than it had been before, with high school graduation to be determined by test results. State assessment policy has moved further to the right both politically and pedagogically. Teacher organizations and progressive educators have little voice. Second-language issues remain unresolved. The official line on equity for ethnic and language minority and disadvantaged pupils is that the state must set the bar high and equally for everyone, and that it is racist to think that all children cannot vault it successfully with the available pole.
Like the ASAP policy, the successor fails to provide for professional development. ADE policy actors pay lip service to teacher training, although they lack the power or budget to do anything more. The legislature is disinclined to do so because of its higher priority on tax saving and its general antiprofessional sentiment. Once again, the districts and individual teachers are left to their own devices about retraining, curriculum development, and adjusting to the demands of the new tests. Nor has there been any concern expressed for opportunity-to-learn (OTL) or delivery standards. The graduation competency requirements will have kicked in before any official attention to curricular offerings has been paid. This will be most serious in the consequences of the math standards, some of which mirror the National Council of Teachers of Mathematics standards. No one knows how widely these standards have permeated math classrooms in Arizona, but the math curriculum experts believe that the dissemination is uneven and slow overall. In those districts with substandard math programs, it will be students who suffer the high-stakes consequences when the tests test what they have not been taught. Policy actors refused to see OTL issues as anything but a bid by the district establishment for more money. Finally, like ASAP performance tests before them, the new assessments will have to be developed in a matter of weeks rather than years, with insufficient review and evaluation, because the board of education and legislature demand accountability, now, and cheaply.
POLITICAL ANALYSIS OF ARIZONA ASSESSMENT POLICY
The preceding narrative was constructed around the theme of politics. Consequently, it suggests that fair deliberation and rational action do not always or necessarily characterize policy making. The conventional view of policy assumes that rational choice, although never optimal, is the central influence in decision making and policy making. In this view, better information and democratic deliberation by informed participants will lead to better policies (Edelman, 1985). Rein added that rationality suggests that, at the least, the process of making a decision made use of whatever resources of knowledge, judgment, imagination, and analysis were available (Rein, 1976, p. 100). This view assumes that policy making aims to optimize the consensual and core values of the commonweal and that the democratic process adjudicates differences in values among constituent groups. In educational policy making it assumes that once consensus is reached the policy translates rationally and predictably through program development and implementation to its logical consequences in schools, distributing the benefits imagined in policy goals. Means and ends are coordinated. Policies are instrumental: they lead to achievable consequences with few surprises. Costs and benefits are understood (Edelman, 1985). Stone (1997) refers to this set of assumptions as the rationality project.
Edelman contrasted instrumental with symbolic and political policy: Any analysis that encourages belief in a secure, rational, and cooperative world fails the test of conformity to experience and to the record of history (Edelman, 1985, 1988). Instead, politics and policy are matters of symbol, myth, and spectacle constructed for and by the public (Edelman, 1988, pp. 4–5).
The political nature of Arizona assessment policy is revealed by reference to a variety of frameworks in the literature: political culture, political trends and structure, garbage can, transformation of intentions, micropolitics, political symbolism, and political spectacle models. We will reference each of these and indicate the features of the Arizona case that instantiate the categories of each.
The policy that takes hold and persists in a state tends to be one consistent with that state's political culture (Marshall et al., 1989). Those policies that conflict will not long endure. Arizona's political culture tugs policy more often toward its dominant values of efficiency, accountability, and choice, and away from contending values of quality, equity, and professionalism. Assessment policy before and after the ASAP era reflected these core values. ASAP, however, which was conceived by professionals and embodied progressive ideals, was soon overrun by demands for central control over curriculum and tangible test results for the least expense. Calls for policy to enhance fairness and equity were unlikely to last for long in a state political culture hostile to these values. Traditionalistic policy cultures such as Arizona's emphasize the leading role of economic elites in shaping public decisions, with a consequent fusing of private and public sectors and a limitation on citizen participation (Marshall et al., 1989, p. 118), as well as a distrust of bureaucracy, labor unions, or teacher and administrator authority and concerns (e.g., professional development and certification standards). Furthermore, there were strong antitaxation sentiments and persistent demands for accountability (Marshall et al., 1989) in the political culture that had effects on assessment policy.
The national discourse on education and political trends interacted with the state policy culture (Ball, 1990; House, 1991). Claims that public schools were failing were repeated as Arizona's policy actors proclaimed the need for greater stringency in state assessment policy. Discourse about the link between achievement test scores and economic competitiveness reinforced this trend, as did the image of schools as factories manufacturing achievement test scores and producing economic prosperity. The role of corporate elites and national networks of conservative actors in the local policymaking and standard-setting process further revealed the influence of national political culture and political trends. The shift to the political right that characterized national and state politics from the late 1980s to the middle 1990s also influenced assessment policy. Early on, the policy actors were split between the parties. Later, Arizona became virtually a one-party state. The governor, state superintendent, and legislative majorities were conservative Republicans. The appointments they made to the state board of education, ADE staff, and various ad hoc groups reinforced this trend. The dominant discourse was union-baiting, educator-bashing, federal mandate- and court order-defying. Right-wing extremists often made the news, as did religious conservatives. Assessment policy could hardly be immune from this climate, particularly because of the relationship between political and pedagogical conservatism.
Within broad structural limits, policy itself is neither unitary nor invariant in that different actors in different situations experience and interpret assessment policy differently (Hall, 1995). At the stage of policy formation, policy actors construct links between assessment solutions and putative problems. The problem, as defined by some Arizona actors, was that schools were not accountable, efficient, or effective. Others defined the problem as an outmoded form of pedagogy held in place by an outmoded high-stakes standardized test. With her progressive-minded staff, Bishop, as policy entrepreneur, grafted constituencies together to get ASAP on the policy agenda, doing so by obscuring the underlying contradictions between the two problem definitions. Change and the pace of change (at both the birth and death of ASAP) can be explained by the garbage can theory of policy making (Kingdon, 1995). This theory suggests that there is a narrow window of opportunity during which the various constituencies (each with different policy goals) can be brought together to get a policy on the agenda. Coalitions of constituencies with conflicting agendas and interests often prove to be unstable, as happened in Arizona. The constituency for progressive reform through performance testing was scattered and silent by 1995. This short-lived confluence of alternative, even internally contradictory, perspectives reflects Kingdon's theory that policies usually obscure underlying contradictions in values and perspectives of political actors whose various agendas come together temporarily. Ambiguities and contradictions then send conflicting signals to those who must implement the policy and those who are supposed to react to it (Kingdon, 1995). An important influence on changing assessment policy was the change in policy entrepreneur (Kingdon, 1995) and the political motivation to establish the leadership of the new superintendent.
Bishop had defined ASAP as the centerpiece of her administration. The political competition between Keegan and the new coalition of Bishop and Symington in 1995 made a change predictable.
Once on the agenda, a legislated policy is still not invariant, as the ASAP case illustrates. Hall's model proposed that policy is a process of transformations from its original intentions through layers of administration and implementation (Hall, 1995). This helps explain how the Bishop administration could take a piece of legislation that specified one thing (only that essential skills should be assessed and reported), turn it into something else (a program of performance assessment and reform of teaching and curriculum), and later transform the policy into a form of high-stakes accountability, even though the policy, a legislative text, remained the same. There were multiple incidents that revealed the conflicting goals and interests of constituent groups within the educational system that contended with one another for status, power, and definitions of the situation (Ball, 1987). For example, bureaus within Bishop's ADE contended over standardized versus performance assessments, ADE contended with the Arizona Board of Education, and the Arizona Board of Education contended with the legislature over how to define accountability and which testing system best provided it. The micropolitics between levels, factions, and organizational units had effects on the radical change in assessment policy.
Once a policy enters the political arena, political analysis is necessary to understand it. When a candidate for office, or officeholder, tries to identify him- or herself as the education governor, or when educational goals become part of campaign promises or function as bargaining chips among political supporters, a conventional policy analysis becomes incomplete or even distorted. Edelman (1985) disputes the conventional notion about politics as the authoritative allocation of benefits and costs across the polity. Political actions do convey goods, services, and power to specific groups (p. 5), but these groups are the few. For the majority of us, most of the time politics is a series of pictures in the mind . . . a passing parade of abstract symbols (p. 5). Thus, policy analysis must distinguish between instrumental elements (rational process and real consequences) and symbolic elements. The following paragraphs present these elements, which, in Table 1, we instantiate with incidents from the narrative.
MANEUVERING TO OBTAIN REAL EFFECTS FOR THE FEW
Political acts do result in tangible benefits to a few political actors. These political acts typically occur in the background, largely out of sight of the general public. They result in monetary gains and losses (through jobs and contracts) for the few, advancement of political careers and status positions, furtherance of an ideological position of a particular group of political supporters, and changes in the relative power of one agency at the expense of others. In contrast, the general public is engaged in a spectator sport where politics are concerned. Nearly all important or controversial political acts function primarily as symbols. The meaning of an act . . . depends only partly or not at all upon its objective consequences, which the mass public cannot know (Edelman, 1985, p. 7).
INVOKING SYMBOLIC LANGUAGE
The paradigm case of using symbolic language in educational policy is A Nation at Risk (National Commission on Excellence in Education, 1983), from its title to its recommendations. The symbolic association of the status of U.S. schools with disease or national defense is by now so ingrained that one rarely notes that its concrete referent is only distantly related to its symbolic forms. It functions, Edelman would argue, to create anxiety in the public and justify actions on the part of political policy makers (Edelman, 1985). Use of hortatory language and metaphors aims to appeal to the support or at least quiescence of the public. Policies meant to determine high school graduation standards by attainment of passing scores on competency tests, for example, use the metaphor of seat-time. The phrase connotes that the system to be replaced is one in which students just sit passively and incompetently. It quiets the critical response that might refer instead to students accumulating Carnegie units (course credits) on the basis of multiple teachers making professional judgments of their competence. But as a result, material changes in authority and definition of schooling have been effected, largely out of public notice.
INVOKING POLITICAL SETTINGS AS SYMBOLS
According to Edelman, political acts take place in contexts that suggest some individuals are actors but most are spectators. These formal settings reinforce and justify the social distance between the two groups and legitimize a series of future acts (whose content is still unknown) . . . thereby maximizing the chance of acquiescence (Edelman, 1985, p. 98). Policies announced from in front of the presidential seal, rules handed down from a federal court bench or from other formal or evocative settings have this function.
POSITIONING POLITICAL ACTORS AS LEADERS
Politicians in the policy arena take advantage of the common ideology that some people are born leaders and thus are different from the rest of us, according to Edelman. They reinforce images of themselves as leaders by acting in formal, public settings and through a dramaturgical performance emphasizing the traits popularly associated with leadership: forcefulness, responsibility, courage, decency, and so on (Edelman, 1985, p. 81). Leaders publicly associate themselves with innovation, emphasizing the apparent differences between their own qualities and programs versus those of their predecessors or competitors. The defining of policy actors as leaders functions to ensure quiescence and justify unequal privileges and authority.
Part of the process of constructing oneself as leader involves evoking crises as justification for one's policy initiatives. For example, a precipitous (though possibly artifactual) test score decline becomes a pretext for changing curriculum to the leader's (or the leader's political supporters') favorite alternative. The change is justified because of the dire risk the decline implies, say, to the state's economic health. Likewise, leaders create enemies and stage battles for dramaturgical effects. Media reinforce the aspects of spectacle rather than substance.
DEMOCRATIC PARTICIPATION AS MYTH
Leaders are privileged to act. Others react. Most people believe they participate in a democratic or representative way by voting or testifying at hearings where legislation or regulations are decided. Yet in politicized policy making, according to Edelman, the actions of the public amount to mere rituals because they are highly formalized and far removed from where the real decisions are made. The backstage is where broad visions and fine details of policies are made.
MYTH OF RATIONALITY
According to Edelman, complete rationality in decision-making is never possible . . . because knowledge of consequences of any course of action is always fragmentary, because future values cannot be anticipated perfectly, and because only a few of the possible alternative courses of action ever come to mind (Edelman, 1985, p. 68). In political acts, actors evoke symbols of rationality even when reason does not govern the act itself. Thus political actors point to the results of public polls, census statistics, or test score declines as justification for actions they want to take on political grounds. But the public must believe in the rational and ethical underpinning of the action or else it will fail the test of credibility and authority.
DISCONNECTION OF MEANS AND ENDS
One can distinguish instrumental from symbolic policies by judging whether the goals they purport to achieve have a credible relationship to the means provided or suggested to achieve them. Is there a technology established or a research base available that connects programs to desired outcomes? Are teachers equipped to deliver the programs? Have enough time and material resources been provided to develop and implement them? Is there any provision for monitoring implementation or assessing effects? If not, one suspects a primarily symbolic policy. Symbolic policies reinforce the leadership image of those who propose them and instill quiescence among others, a dulling of critical response. Calling for a reduction in class size positions the political actor as a friend of education and defender of high achievement standards. The public is lulled into a satisfied mind-set of feeling something positive is being done. People in such a state are unlikely to ask about the potential side effects on teacher supply and classroom availability (or what children are most likely to be taught by uncertified teachers as a result). The high costs of the program may make implementation prohibitive. The leader symbolically benefits, and material benefits for children will be unequally distributed and largely absent.
CREATING POLITICAL SPECTACLE
The elements of symbolism in policy making add up to what Edelman refers to as the political spectacle (Edelman, 1988). Assessment policy is not solely political in nature. Policy actors and those who administer policy have positive intentions to do substantive good. Instrumental goals are both implicit and explicit at all levels. But to ignore the political nature of assessment policy, that is, to treat it solely as rational and instrumental, is to engage in a cycle of confusion, optimism, frenzied activity, disappointment, and cynicism.
REFERENCES
ASAP test suspended. (1995, January 25). Arizona Republic, pp. A1, B2.
Ball, S. J. (1987). The micropolitics of the school. London: Methuen.
Ball, S. J. (1990). Politics and policy making in education: Explorations in policy sociology. London: Routledge.
Edelman, M. (1985). The symbolic uses of politics. Urbana: University of Illinois Press.
Edelman, M. (1988). Constructing the political spectacle. Chicago: University of Chicago Press.
Flexer, R. J., & Gerstner, E. A. (1994). Dilemmas and issues for teachers developing performance assessments in mathematics. Los Angeles: University of California, Los Angeles, Center for Research on Educational Standards and Student Testing.
Hall, P. M. (1995). The consequences of qualitative analysis for sociological theory: Beyond the microlevel. The Sociological Quarterly 36(2): 397–423.
House, E. R. (1991). Big policy, little policy. Educational Researcher 20(5): 21–26.
Kingdon, J. W. (1995). Agendas, alternatives, and public policies. New York: Harper-Collins.
Marshall, C., Mitchell, D., & Wirt, F. (1989). Culture and educational policy in the American states. New York: Falmer Press.
McDonnell, L., & Elmore, R. (1987). Getting the job done: Alternative policy instruments. Educational Evaluation and Policy Analysis 9(2): 133–152.
Murphy, M. (1996, May 5). Symington moves to the right seeking votes. Arizona Republic, pp. A1, A13.
National Commission on Excellence in Education. (1983). A nation at risk: The imperative for educational reform. Washington, DC: U.S. Government Printing Office.
Noble, A. J. (1994). Measurement-driven reform: The interplay of educational policy and practice. Tempe: Arizona State University Press.
Nolan, S. B., Haladyna, T. M., & Haas, N. S. (1989). A survey of Arizona teachers and school administrators on the uses and effects of standardized achievement testing. Phoenix: Arizona State University West Campus, Education and Human Services.
Rein, M. (1976). Social science and public policy. New York: Penguin Books.
Shepard, L. A., Flexer, R., Hiebert, E., Marion, S., Mayfield, V., & Weston, T. (1995). Effects of introducing classroom performance assessments on student learning. Los Angeles: University of California, Los Angeles, Center for Research on Educational Standards and Student Testing.
Smith, M. L. (1996). Reforming schools by reforming assessment: Consequences of the Arizona student assessment program. Los Angeles: University of California, Los Angeles, Center for Research on Educational Standards and Student Testing.
Smith, M. L., Edelsky, C., Draper, K., Cherland, M., & Rottenberg, C. (1989). The role of testing in elementary schools. Los Angeles: University of California, Los Angeles, Center for Research on Educational Standards and Student Testing.
Stone, D. (1997). Policy paradox: The art of political decision making. New York: Norton.
Symington's shifting priorities. (1995, October 3). Arizona Republic, p. B4.