State Assessment Becomes Political Spectacle--Part V: Cast Changes in the Second Act of Assessment Policy History


by Mary Lee Smith, Walter Heinecke & Audrey Noble - September 13, 2000



...Continued from Part IV: First Act in the History of Arizona Assessment Policy


ELECTIONS

Bishop's first term of office was due to conclude in 1994. For reasons unrelated to policy, she decided not to seek a second term. The Democrats nominated a prominent official in the Arizona Education Association, the state's largest professional association. The Republican nominee, and the eventual winner, was Lisa Graham Keegan, a state legislator and chair of the Education Committee.

The 1994 election also featured the reelection bid of Governor Fife Symington, a prominent businessman who shared politically conservative views with Keegan. They were particularly in tune over the issue of vouchers and school choice. He was opposed by another businessman, Eddie Basha, who enthusiastically supported public schools and opposed voucher initiatives.

In a move that took everyone by surprise, Bishop bolted the Democratic Party and campaigned both for Symington's reelection and in support of charter schools and vouchers. After Symington was elected, he appointed Bishop to a newly created post as education advisor to his administration, a role in which she became nearly invisible as a policy actor.

Few could be considered more visible than Keegan, a bright, attractive, articulate woman in her thirties and a Stanford-educated speech therapist. According to a subsequent article in Education Week ("Ariz. Chief Puts Name and Face Front and Center," 9/25/96, p. 20), she had been termed the "Voucher Queen" after two terms as a highly visible state legislator. A very powerful speaker, she called her approach "populist," "emphasizing equity and access for all." The article noted that the Phoenix Gazette had tagged her in May 1996 as the strongest candidate for governor in 1998, a designation that may have proved significant to subsequent events.

 

KEEGAN'S AGENDA

As a political conservative, Keegan supported policies of less government, less regulation, efficiency, decentralization, and choice. Most observers agreed that her central mission would be support of the charter school movement. These values she shared with Symington. They disagreed, however, on the issue of financial equalization and the response to the Arizona Supreme Court order in Roosevelt v. Bishop (1994), which found substantial inequity in the way Arizona funded its schools and ordered the state to rectify it. Keegan believed that making the educational opportunities of students more nearly equal was a fundamental responsibility of the state, while Symington fought the order at every turn, launched a court challenge, and even declared that making districts equal was tantamount to "state socialism."

Where Keegan stood on assessment policy was less clear. Publicly she supported the ASAP "process." To her conservative supporters, however, she promised to retreat from ASAP as it was then defined toward an assessment policy based on standardized tests. In an interview later, she had this to say:

I do believe that there is a little bit of a simplistic assumption: that if I write out my answer it is just de facto a better answer than if I say "yes" or "no." And all of a sudden performance examinations or answers that are written out and explained became almost cultish. I mean it is not the case that there aren't any good computations that can be made and answered in a multiple choice fashion. You have to have gone through a computation to get there. This sort of slavish devotion to watching everything get written out, I think has taken over in large aspect of what performance assessment is really about, performance assessment being the ability to think critically and to demonstrate that you've done that. It's not the same thing as is just writing out an answer. As a matter of fact, I would suggest that some of the problems that we've got in curriculum are around a very poor understanding of performance-based assessment. So that if a student writes out an answer, even if it's flawed in its thinking, if the answer has been written out, it's somehow defended, even if it's wrong.

Before audiences of teachers' organizations and school advocates, she used the inclusive pronouns "we" and "us." She aligned with her corporate supporters, however, in her belief that the schools were underachieving, bureaucratic, and failing to produce graduates who could plug into jobs in the corporate world. According to the corporate view, the problem with public schools was a lack of accountability; more testing and more control over the curriculum would solve the problem. The corporate constituency believed that a high school diploma certified "seat time" rather than academic proficiency. Keegan seemed to share that view, even though control of curriculum and academic standards might seem inconsistent with libertarian, anti-government ideology (a post-Fordist analysis resolves the contradiction).

Her first press release upon taking office at ADE in January 1995 stated that her vision was "To promote academic achievement and ensure the responsible, effective, and efficient use of state monies through services provided to all constituents." Among her goals were to streamline the system to allow schools to focus on academics (including reduction of mandates and paperwork burden); create options in the system (e.g., charter schools, parental choice grants, open enrollment); provide accurate and timely information and communication (an on-line school report card and financial data bank); ensure effective funding (work with the governor and legislators to equalize school finance); and "emphasize high-stakes academic accountability and testing" (critique and refine ASAP, continue norm-referenced testing, emphasize Essential Skills standards, adapt Goals 2000 to Arizona use, report cards, assure relevance of education to the workplace).

Very quickly, Keegan gained visibility in educational policy in the national arena as well as the state. On the national scene, she broke ranks with the Council of Chief State School Officers, the group that comprises most state superintendents. In an Education Week article (Lisa Graham Keegan and John Root, "Why We Formed the Education Leaders Council," 2/21/96, p. 39), she expressed her opposition to federal assessment policy (Goals 2000), her support of local reform and parental choice, and her skepticism about the quality of public schools:

[We] share the belief that education initiatives, policies, practices, and standards are strongest when generated from within individual communities and weakest when handed down from on high. We also believe that true education reforms are those that center on the needs and choices of families, empower parents and teachers to work in concert to chart the course of a child's education, increase accountability in America's schools, and restore local control over school policies and practices. While all of that separates us from the education establishment, we believe it unites us with parents and the vast majority of American teachers and school administrators who share our exasperation with the nationalized business-as-usual approach to reform, and our fear that unless we act quickly and boldly to restore excellence to all schools our nation at risk will become a nation of ruin...We have two constituencies: parents and children. We share their views on what they want and need... As reported by Public Agenda, a public-opinion research organization in NYC, parents' priorities for education are: safety, discipline, high standards, and a focus on the basics...more than half...said that if they could afford to, they'd send their children to private schools because...those priorities were being met [there]...We don't need politically correct education standards set at the national level. It's our job to set standards and then to hold schools, educators, and students accountable...too many [children] are stuck in failing schools. We haven't lost our zeal to free them...

 

THE REORGANIZATION OF ADE

One month after taking office in January 1995, Keegan announced the reorganization of ADE "to better focus our energies on our mission of improving academic achievement" (press release, 2/9/95). Brenda Henderson was named to head the Division of Student Achievement and Assessment, wherein the Essential Skills and ASAP Unit was placed. Staff who had previously been involved in ASAP were moved to Academic Support, Teacher Certification, and Teacher Development projects. Various informants used words such as "purge," "hit list," and "litmus test" to describe the changes in the department.

The effects of the reorganization on assessment policy were immediate, both symbolically and in reality. "When you call for information, either no one answers or someone answers who knows nothing," said an observer of ADE. Another noted:

The word, reorganization, sounds like an oxymoron. They came in and they decimated the department, is what they did. Anybody that wasn't on permanent status was history -- gone. That was 90 something people, as I understood, overnight. Anybody with a doctorate -- gone, or moved to a marginal position. And then because they had people with-that you had different departments that didn't communicate with one another, but the solution was to take people from one department and put them in another department, so they would take their experience with them and so on. So it was like a salad bowl; they just tossed everybody around, and they landed all different kinds of places and they did it in (clicks fingers) a very short order. And the result was chaos.

Another official in the Bishop administration who had already left ADE reflected later:

It wasn't personal, it was political. It says more about the lack of confidence in a professional educator -- that someone who has a doctorate in curriculum and instruction, someone who has a doctorate in tests and measurements, someone who has a doctorate in fine arts, for example, is not really important, it's not necessary. That someone who is a layperson or a bureaucrat in a government agency or someone of that ilk can do the job better than a professional educator can...[The people who were committed to using ASAP as a way of improving education] are now the ones who are doing menial tasks, shuffling papers somewhere... no longer involved in that kind of work.

TECHNICAL REPORT: TEXT OR PRETEXT?

On 1/25/95, the Arizona Republic (Hal Mattern, "State's public-school test suspended," p. A1, B-8) wrote:

Arizona's new standardized [sic] test for public-school students has been suspended for one year because of concerns about whether it accurately reflects what students are learning, the state's top education official said Friday... said she decided to suspend the annual Arizona Student Assessment Program test after questions were raised by the company that developed it. "The results we have so far have been called into question," Graham [Keegan] said. "I can't say with confidence that it's a valid test. It hasn't been verified enough to determine whether it correlates with how much kids know." ...Graham said the suspension of the test won't affect the curriculum portion of ASAP, which required teachers to change their methods of instruction. "Instituting that program has really made a difference in the classrooms," she said. Graham said that she and other state Department of Education officials will be studying the test over the next year with representatives of Chicago-based Riverside Publishing Co., which developed the test, in an effort to improve it. Suspension of the test didn't surprise some educators, some of whom have complained in the past about inconsistencies in the way the ASAP program has been implemented and the lack of training on administering the test. "We like the concept of the test, but more training and adjustments in the way districts use it are needed to make it as successful as it should be," said an AEA spokesman. A teacher in Avondale said, "I'm major-league disappointed." and said he likes the test because it "stresses writing and thinking skills, not the memorization of facts, rules, and formulas. I love the test. It's really worth teaching to." Graham said teachers such as Cooper needn't worry. She expects the ASAP test to be back in place in 1996. In a memo to school district officials announcing the suspension, Graham described her action as an "affirmation of ASAP and nothing less. We are not abandoning the process," she said.

The report (Arizona Student Assessment Program Assessment Development Process/Technical Report, by Riverside Publishing Company) that Keegan referred to had been available since June 1994 but had not been made public. An official later admitted that the technical data on the performance assessments had never been very promising, but that "ideologues" (those who were actively promoting ASAP reform goals and performance assessment in general) paid little attention and suppressed or ignored information that cast ASAP in a bad light. A former member of the Bishop administration commented on the calculated neglect of the weak technical data: "I think the staff chose to ignore it because nobody wanted to go in and say to her, hey we've done this now once or twice, and we're not getting the results back we ought to. It was so public at that point, that who's going to walk into that office and say we're using a test that has not been properly field-tested."

Upon taking office, Keegan became aware of the technical report and the direction of its evidence. She sought counsel from representatives of the test publisher, staff members, and some university and district testing experts before making her decision to suspend ASAP. Other sources of information, such as an ADE teacher survey and the opinions of staff and advisors, were said to have influenced her decision. The technical report, however, was the principal stimulus:

The technical merit of the test, probably on a weight basis weighed more with me than anything else. You can work your way around a whole lot of veracity and work your way out of a whole lot of curricula problems or pedagogy problems or any of the things that I thought were evident in the skills themselves, because you could have fixed them, but you can't get around basically an invalid test.

After consulting with the State Board, she announced the decision, just two weeks before Form D-3 was due to be administered.

WHAT WAS IN THE TECHNICAL REPORT?

One's eyes are drawn to these items on the cover: the word "Draft" (though no final report was ever issued); the embargo date ("For use 2/2/95," though the report had been available to the department months before); and the absence of authorship. Its introduction reads:

Form D was conceived and designed as the Arizona state assessment to be administered each spring beginning in 1993 and continuing through the spring of 1996 to third, eighth, and twelfth grade students. Form D is a statewide audit of student achievement on a subset of the Essential Skills. As an audit, the content and the specific skills addressed has to be secure. Form D was developed by addressing a selection of Essential Skills in reading, mathematics, and writing each year over a four-year period, for each of the grades 3, 8, and 12 with all the Essential Skills being measured over the four-year span.

Since it was neither practical nor necessary to measure every essential skill in Form D, a cluster of Essential Skills in a given Form A assessment was selected with emphasis on those requiring higher order thinking... Thus, in the Form D assessments, students are required to use their knowledge and skills to think critically, to solve problem [sic] and to demonstrate writing skills as related to real life scenarios. (p. 7-8).

The Form D assessments were designed to be performance based and require students to engage in authentic tasks and use critical thinking skills to arrive at an opinion, conclusion, or summary; to construct a graph, chart, or table; or otherwise solve a problem. In addition to being authentic, the assessments had to be relevant, contemporary, and developmentally appropriate (p. 10).

The report described the development of Form D, a process that apparently took less than three months (the contract from ADE to Riverside was awarded in October 1992, with print-ready copy of the assessments due in January 1993). Fairness and content validity checks were performed by convening "focus group discussions" (p. 11) and incorporating the comments of participants in revisions of the assessments. The content group evaluated the match between the assessments and the Essential Skills. Based on this review and "informal tryouts" to determine whether instructions were clear and to estimate time requirements, the report declared the Form D assessments content valid and free of bias. A thorough reading of the report fails to reveal any empirical evaluation of this tryout to check for reliability or construct validity.

According to the report, scoring of the assessments was contracted to Measurement Incorporated, whose personnel managed scoring at five sites in Arizona or at their own offices. The report refers to a separate technical report that contained the statistical data from the test administration itself, that is, data gathered after the assessments had actually been administered rather than from a pilot.

The report explains that validation of ASAP Form D-1 consisted of matching items with Essential Skills (content validity checks by the focus groups) and a construct validation study designed to equate Forms A and D-1 and estimate the correlations among the relevant portions of each. The equating study was done on volunteer samples of districts. Because the Arizona samples proved too small, the publisher solicited participation from a New Mexico school. The samples in the study then took both forms of ASAP assessment. In the end, twelfth grade data were insufficient to provide validity estimates. For the other grades, the resulting correlations between Form D-1 and the corresponding sections of Form A are represented in Table 1.
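Two of the columns in the tables that follow are simple transformations of the Pearson correlation between forms. As a reading aid only (these are standard psychometric formulas, not computations reproduced from the Riverside report, and the reliabilities underlying Table 1 are not given in the material quoted here), the relationships are:

\[
\text{common variance} = r_{AD}^{2}, \qquad
r_{\text{corrected}} = \frac{r_{AD}}{\sqrt{r_{AA}\, r_{DD}}},
\]

where \(r_{AD}\) is the observed correlation between Forms A and D and \(r_{AA}\), \(r_{DD}\) are the reliabilities of the two forms (reported as alpha coefficients only in Table 2). For example, the grade three reading correlation of 0.37 in Table 1 yields \(0.37^{2} \approx 0.14\), or 14% common variance.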

TABLE 1: CORRELATIONS BETWEEN ASAP FORM A AND D-1

GRADE THREE
            Pearson Correlation   Corrected for Attenuation   % Common Variance
READING     0.37                  0.47                        14%
MATH        0.54 / 0.41           0.79 / 0.67                 29% / 17%
WRITING     0.51                  0.80                        26%

GRADE EIGHT
            Pearson Correlation   Corrected for Attenuation   % Common Variance
READING     0.30 / --             0.41 / 0.39                 9% / 12%
MATH        0.58 / 0.41           0.76 / 0.67                 34% / 17%
WRITING     0.34 / 0.51           0.45 / 0.75                 12% / 26%

(Where a subject shows two sets of values, the report listed two entries for that subject; a dash marks a value not recoverable from the source.)

A supplement to the technical report was published in November 1994, based on an analysis of the Form D-2 administration. Another equating study was conducted, in which students took D-2 as part of the regular assessment process and were then administered Form A. This time the correlations were slightly higher, and alpha reliabilities were also reported for both Forms A and D-2. These data are presented in Table 2.
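Because Table 2 reports alpha reliabilities for both forms, the attenuation correction can be illustrated directly. The following computation is offered only as an illustration using the grade three reading row; it is not taken from the report:

\[
r_{\text{corrected}} = \frac{0.45}{\sqrt{0.80 \times 0.76}} \approx \frac{0.45}{0.78} \approx 0.58,
\]

which is essentially the value of 0.59 shown in the table, the small difference reflecting rounding.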

 

TABLE 2: ALPHA RELIABILITY AND CORRELATIONS BETWEEN ASAP FORMS A AND D-2

GRADE THREE
            Pearson Correlation   Corrected for Attenuation   % Common Variance   Alpha Reliabilities
READING     0.45                  0.59                        21%                 0.80 (D), 0.76 (A)
MATH        0.52 / 0.57           0.84 / 0.83                 27% / 33%           0.69 (D), 0.56 (A), 0.69 (A)
WRITING     0.53                  0.73                        28%                 0.81 (D), 0.65 (A)

GRADE EIGHT
            Pearson Correlation   Corrected for Attenuation   % Common Variance   Alpha Reliabilities
READING     0.61 / 0.45           0.80 / 0.54                 37% / 20%           0.82 (D), 0.71 (A), 0.84 (A)
MATH        0.42 / 0.43           0.53 / 0.58                 18% / 19%           0.86 (D), 0.73 (A), 0.66 (A)
WRITING     0.41                  0.52                        17%                 0.77 (D), 0.80 (A)

GRADE TWELVE
            Pearson Correlation   Corrected for Attenuation   % Common Variance   Alpha Reliabilities
READING     0.69 / 0.31           0.95 / 0.42                 47% / 10%           0.74 (D), 0.71 (A), 0.76 (A)
MATH        0.52                  0.71                        27%                 0.69 (D), 0.79 (A)
WRITING     0.37                  0.50                        14%                 0.87 (D), 0.65 (A)

(Where a subject shows two sets of values, the report listed two entries for that subject.)

The report cautioned against interpreting the results of the equating study because of the small sample sizes in certain categories and because the levels of difficulty of Forms A and D differed in some cases. In addition, the twelfth grade sample produced a substantial number of scores of 0 on the writing test, calling into question the accuracy of the data. Having offered these cautions, the report asserted the following:

The validity studies performed for Form D2 provide the documentation to demonstrate that appropriate procedures were employed in the test construction process and that the statistical correlations are at a level to demonstrate that the assessments measure the Essential Skills being tested (p.8).... The amount of common variance between the assessments was greater than 10% for all assessments except in grade eight reading. Although these correlations would signify that a relationship between the two assessments exists, the amount of common variance is small compared with that usually found in studies like this involving assessments in reading, mathematics, and writing. For example, in the Iowa Test of Basic Skills (ITBS) correlations between Forms K and L (two parallel versions of the ITBS) are usually in the 70s or 80s.... [In this study] the evidence is less than compelling (p. 12).

About the reliabilities, the report stated:

This range in reliabilities is typical of the magnitude generally found in assessments eliciting student-constructed responses, however it is lower than the reliabilities of most multiple-choice assessments used in large scale testing programs which generally range from 0.85 to 0.95... For accurate reporting of individual student scores, reliabilities greater than 0.85 are generally expected. However, if the intent is to examine school level results, reliabilities of 0.55 and higher can be satisfactory... [I]ndividual student scores reported from most of these assessments contain a large amount of error and should be used with caution.
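The report's thresholds (reliability above 0.85 for reporting individual scores, 0.55 and higher for school-level results) can be read against the conventional relationship between reliability and the standard error of measurement. The formula and figures below are standard psychometrics offered for context, not material from the report:

\[
SEM = \sigma_X \sqrt{1 - \rho_{XX'}},
\]

so a score from a test with reliability 0.55 carries a standard error of about \(0.67\,\sigma_X\), while one from a test with reliability 0.90 carries about \(0.32\,\sigma_X\), roughly half as much error around each individual score.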

REFLECTIONS ON THE REPORT

What can one make of this technical report? First, note that the psychometric analysis occurred after the test was administered rather than before. If there were technical weaknesses, they were already embedded in the assessment results. Furthermore, the psychometric properties of scoring were not addressed. The four-point rubric that the state had adopted was extremely general, not even differentiated by grade level.

Second, the equating study was a best-case scenario of the relationship between Forms A and D. Form A as implemented in schools was administered and scored by teachers and thus would likely be less standardized than the administration and scoring undertaken by the supervised scorers employed in the equating studies. Districts had a choice of using Form A, a portfolio system, or other local tests, as long as they could be scored by the official rubric used on Form A. Thus the function of D as an audit of A in practice was not evaluated by this study, although the data are useful in other respects.

Third, the report spins the data a particular way by comparing them with reliability and validity data from standardized tests, an unrealistic burden for any performance assessment to bear. The authors give little guidance about comparable data from other large-scale assessments with a history of extended response formats (e.g., the National Assessment of Educational Progress, Advanced Placement), a perspective that might have cast the ASAP data in a more favorable light. Interestingly, although Keegan focused on the inadequacy of Form D, D-2 had higher alpha reliabilities in some cases than the comparable Form A, which emerged unscathed from the department's analysis and was destined at that time to be the graduation competency test.

Fourth, the report fails to mention that Form D tested content in integrated form while Form A tested reading, writing, and math separately, a fact that would likely depress their correlation. Finally, the analysis fails to account for the fact that each version of D (D-1, D-2, etc.) was designed to test one-fourth of the Essential Skills, so that the content domains of the two forms were different. Thus, depending on the question asked, the evidence about the technical qualities of the performance test is either more or less positive than how it was interpreted for the public. Of course, the actual report was quite closely held, and few have seen it directly.

Even accepting the premise that the evidence on the quality of ASAP was negative, that evidence was nevertheless less than fatal. For example, problems with poor directions or ambiguous content could have been discovered in a field trial and fixed. A rigorous and independent evaluation by groups with expertise in performance assessment could have identified test content that was too easy or too difficult for each age level, or could have suggested ways to simplify local testing practices. Reliabilities could have been increased in a number of ways -- by sampling items from much more specifically defined domains of tasks (and increasing the size of the sample), by providing pupils with many practice exercises similar to the tasks that were measured, by using more specific rubrics to score the results, by training and increasing the number of raters for each test, by calibrating the performance of rating teams and rating sites, and the like (personal communication, Lorrie A. Shepard, 1996).
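One way to see how several of these remedies (more tasks, more raters, longer tests) bear on reliability is the Spearman-Brown prophecy formula; this is a standard psychometric result offered as an illustration, not a calculation proposed in the source:

\[
\rho_k = \frac{k\,\rho}{1 + (k-1)\,\rho},
\]

where \(\rho\) is the reliability of a single task or rating and \(k\) is the factor by which comparable tasks or ratings are multiplied. Doubling the number of comparable tasks or ratings (\(k = 2\)), for instance, would raise a reliability of 0.55 to about \(\frac{2(0.55)}{1 + 0.55} \approx 0.71\).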

HOW WAS THE REPORT INTERPRETED?

Comments of insiders and observers subsequent to the decision represent an interesting array of information, misinformation, and alternative definitions of ideas held dear by psychometricians. We include some of the comments here to demonstrate how tenuous a hold policy actors had on these ideas.

Riverside told us that Form D will not correlate with your Form A. And so you can give it to them, the kids, but you're not going to get any valid information. There isn't any validity there. Form A, B, and C were district level assessments, practice. Form D came in to say, 'Did you report that your kids knew this? Did you tell us the truth?' is basically what it meant. I'm only telling you what I know about it. And to be very honest with you, I never went back and asked anyone. This is hearsay. This is not-it's what I understand to be true.

Well, the problem was the format itself. The company who had developed ASAP, which is out of California, Riverside, in the testing, it appeared that in the Form D that we weren't getting an accurate reading of the overall assessment of ASAP. And it made it impossible to absolutely certify the results. So if we couldn't do that, then basically the overall aspect means that it was worthless from a standpoint of being able to say, "Here's what's this data is doing compared to another state." It did not mean that the individual parts -- A, etc., were not good tests. It's just that we couldn't absolutely guarantee it."

Well, I had felt that if we were going to base graduation requirements on Form A's, which the districts were kind of reporting themselves, but then we had a statewide audit on Form D's. I felt from the very beginning that as soon as those- If or as soon as those results started to ever differ, if the districts are saying, "Yeah, 90% of our students are competent on all Form A Essential Skills," then we give a Form D once a year, and it's 50, I felt that something would begin to unravel. I've never been real clear-I mean I'm not an educational professional, I guess, so I'm not sure that I have a complete understanding of what about the Form D's that was declared invalid or unreliable. But once the results of those too started to be different, something was going to unravel.

 

The teacher comments were particularly poignant in that the teacher comments suggested that the test couldn't possibly be valid because those districts that had adopted to a large degree the ASAP curriculum, the Form A's and the State promoted Essential Skills tests did not feel that the Form D, the examination followed, covered the same material.... The validity reports, the technical reports that came from Riverside itself said the same thing. And they started in the first year, suggesting that there were concerns about validity, how well did the Form D actually track the Form A. In the second year, there was no question, but they've got in their documents there really was a worse than chance equation. The contracts that had been drawn suggested that the Form D would equate with the A.

The field test reports show that these are not valid matches of the A's that they're supposed to be auditing. ...At the same time, though, there were enough districts... [saying,] hey, there's something wrong with this. Our kids completed all the A's, got them right, and then they came in and did the D's on the same skills, and their performance levels were two different? It can't happen. If you can do it one time you ought to be able to do it the next.

Among those insiders in the Keegan administration and her allies on the State Board of Education, the evidence was considered damning and the Form D incapable of resuscitation. Asked about whether she had ever considered an effort to improve the assessment rather than kill it, Keegan replied:

I don't have that kind of patience. I mean I can't fathom my representing the state exam as a valid measurement of the Essential Skills which were mandatory - are mandatory -- when I knew for a fact that the test was not a representation of ability in that area. It's dishonest. So I mean, no amount of time gets you over dishonesty. I don't know how that works itself out. That is not-it was not represented to be a feature of the test, [that it] would resolve itself in time, even by the testing company or the measurement company who scored this test.

The touchstone of opinions about the technical adequacy of Form D centered on whether the state could use it to audit district assessments and certify individual student competence. ADE insiders believed that Form D "wasn't providing honest accountability. And what you needed was a high stakes examination that was true." A related worry concerned the lawsuits that might follow the use of a test with low reliability to deny high school diplomas. Because of the confluence of accountability functions with weak technical evidence, the Department treated the decision to suspend as a psychometric inevitability. As Keegan rhetorically asked, "What else could I do?"

But others believed the data became a pretext for political action. That is, given the rampant misunderstanding of psychometric principles and conventions and the relative absence of technical advice, the state could not possibly have been using the data rationally; only a political use of the report remains credible. An observer of the department reflected later:

I don't know. But I think when she made the decision she was faced with a decision that she had to make. Knowing what she knew, to go ahead and do D-3 becomes her problem all of a sudden. It was somebody else's problem, I mean, let's face it. If she did the D-3 knowing that its prior two forms were neither valid enough nor reliable enough, she then took the problem on for herself. So I don't think she had any other choice she could have made.

Not everyone, however, viewed the reliability and validity evidence in quite so negative a light. A district testing official characterized the equating study of A and D as comparing "apples to oranges. It's a wonder they correlated at all." A Riverside representative opined that, "given the nature of the materials, it was probably about average, okay," and that performance assessments are "notoriously unreliable." He explained that high reliability and validity coefficients depend on a much longer development process than the one that characterized ASAP:

One of the problems with so many of these State mandated programs that, you know, somebody comes out here with an RFP, and where the impetus for this is coming from either the Legislature or the Governor, policymakers have this just (clicks fingers) I mean completely unrealistic idea about the difficulties in building tests in terms of time and money.... And so inevitably these mandated programs where they're mandated, they always have too short of a startup time associated with them. Which means there's no question but what the materials suffer in quality. You just can't do things that fast. And so when I say given, what I think of the comparison of these materials to other similar kinds of materials, they're fine. They're-but they're not near as good as they would be- I mean we would develop ITBS, I mean we're talking about years.... And reliability and validity? They're just words to policymakers.

 

An official in the Bishop administration, who thought that the technical qualities were adequate even in light of the suspect methodology of the study, expressed the widely shared belief that the technical report was a pretext for a political decision. Referring to the report, she stated:

Well, it gave people a place to stand if they didn't like ASAP. But I think nothing was so severe that would require completely scrapping the examination. I mean it could have been corrected; if there was a technical flaw, it could have been corrected. Even though the correlations were low, it still didn't answer the question about how well is it measuring the Essential Skills. I'm not privy to the discussions that went on. I think it was a political decision more than anything, and the technical report provided some place to stand.

A Democratic legislator reported:

I do believe that it was politically motivated. I don't believe for one second that it was this great revelation to [Keegan] that this testing didn't jive, mainly because everyone in the educational field knew that Form D was not compatible or to be compared to the Form A tests.... So this trumped-up, great revelation that this is all out of whack and we have to put a moratorium on testing and I don't know what we're going to do and we have to re-tool the instrument, I think it was done with a lot of dramatic flair. And also I think it was politically motivated because there was a certain perception in political circles that it was a slam to Superintendent (Keegan) that the Governor turned around and hired (Bishop) as his education advisor. I just thought it was a reckless move. It was not done with a lot of consideration, or forethought, or... I don't mind people coming in and taking radical changes or doing radical things as long as they anticipate the outcomes more and make contingencies for it, but to come in and do what I would perceive as a reckless act, I can't support.

A former official in the Bishop administration connected the action to suspend ASAP with the transformed function that the assessment had assumed over the past two years:

I don't think the reliability and validity would be a problem, I don't think those are problems, if those assessments were put to the use for which they were designed. It is when you try to use them for something for which they were not designed that those things become a problem. Which makes me wonder about two things: one, the politics of it. As a new chief, she has to make her mark, and second, I wonder about the finance, the money. Those are the two things that come to my mind as you talk, that the reliability and validity are very good words to use when you want to take an action as she did to end the test. Those are both valid words to use, they're valid for her to use, even. Because she's thinking of a different use for the test. But, reliability only becomes a problem if you're using a test for purposes other than which that test was designed.

 

To advise her on what action to take after the suspension, Keegan assembled an ad hoc committee that met three times during February and March. A member of the committee, which consisted primarily of district test coordinators and ADE staff, reported later:

The committee recommended that D be fixed. We kept telling her, keep working on it, don't get rid of it because you are going to lose your credibility and of course your validity is not going to be too high, because you're comparing apples to oranges. And if Form D tests the Essential Skills and Form A tests the Essential Skills, that's what you're trying to do. So what if they don't correlate with each other? But she had this bee in her bonnet that the validity was not high enough, and I think it was all just rhetoric... She had promised to get rid of it (ASAP) during her campaign, and, lo and behold, Form D is gone. I don't believe that she pulled it because it was not valid. It was her way of making a statement right away that she was going to be a strong superintendent. And it was an unpopular test anyway, so what better way to get the teachers and parents behind her, to pull an unpopular test. It was unpopular because of the way it was implemented, not because of the underlying idea. I agree that they should have fixed it rather than starting over from square one. No one really wanted to get rid of it. That was mainly because of loss of credibility and the fact that we would be several years without any state test or state data. But she just disregarded our recommendations and disbanded the committee. She convened that committee but wasn't really interested in what the committee had to say. It was all just show.

 

THE DEMISE OF ASAP

Keegan had promised in January not to abandon "the ASAP process," but merely to critique and refine it. Abandon it she did, however, if by "the process" one means large-scale performance assessment as the touchstone, or "audit," of the ASAP program as a whole, or the reform of schools toward constructivism. Between January and May, she decided that the problems of Form D were too serious to remedy and that they signaled more far-reaching problems in state assessment policy. The Arizona Essential Skills needed an overhaul as well. On a local television program, Keegan foreshadowed what was to come when she derisively recommended sending the Essential Skills to the scrap heap. Most of them were not measurable, she claimed, and the documents were so long, convoluted, and filled with educational jargon that parents could not possibly understand them or hold schools accountable for achieving them. In addition, the Essential Skills failed to embody world-class standards and emphasized process rather than outcomes.

An ADE memo of 3/8/95 to schools stated that "a team of specialists from the Student Achievement and Assessment and the School to Work Divisions has been formed. This team is charged with ensuring that the Essential Skills have the following characteristics: Encourage high-level achievement, emphasize academic content, be precisely defined and measurable, incorporate the SCANS skills considered necessary by business and industry, reflect a real-world, occupational context, have a consistent, easy-to-read format." The superintendent also assigned the team the task of ensuring that the state assessment system incorporate both standardized and performance-based tests and include high school graduation tests. Two months later, on May 25, 1995, ADE announced a revision of the program as a whole, and a new name with the old acronym:

...[Keegan] announced the new direction for the Arizona Student Achievement Program [italics added] which integrates career education and academic proficiency for all students. "What we expect of our students is what we will get," she said. "Our expectation must be for both high academic achievement and lifetime employment."

The ASAP has undergone thorough review since the Form D statewide ASAP was suspended earlier this year. "While I was unhappy to find that our previous testing program was a problem, the discussion of the past few months has resulted in very strong revisions.... What we have heard from parents, teachers, and business about ASAP has led us to keep the foundation and vision of this excellent concept, but to be far more demanding and to ensure that the program is relevant to our students' education."

The major components of the revised program include:

  • new statewide and district level assessments
  • professional development for teachers
  • changing to 4th, 8th, and 10th the grades in which the statewide assessments of academic proficiency are given
  • introduction of a certificate of mastery of academic proficiency in 10th grade
  • introduction of a 12th grade workplace-specific or higher education placement test.

"Our primary emphasis at the elementary level will be mastery of foundational skills......Where we find students not proficient in those Essential Skills by grade four, we must offer solutions immediately to increase the likelihood of that child's successful completion of his or her education. Waiting on a 12th grade graduation test of proficiency is eight years too late."

Under Graham's proposal, the focus on academic proficiency will continue in middle school, but with career counseling and an eye toward the work force. Upon demonstrating the mastery of the Essential Skills in 10th grade, students will earn a certificate of mastery of academic proficiency. The balance of a student's high school course work will be tailored to meet his or her educational and career goals.

A final test of proficiency in the 12th grade, congruent with students' course work and career goals, will be required. These tests may be either workplace-specific tests or college placement tests. The certificate of mastery and successful completion of the appropriate tests will be a graduation requirement for the class of 2002.

"The difference we've created by adding the 10th grade academic proficiency assessment and appropriate 12th grade tests is that it ensures that 100 percent of students will have exposure to workplace experience, and that all will be expected to master the same academic skills."

"We remain committed to high stakes graduation requirements for our students."

 

The release noted that norm-referenced testing would be continued, that no state ASAP testing would be conducted during the 1995-96 academic year, and that the new ASAP state test would be piloted during 1996-97. Meanwhile, districts would still be required to continue their state-mandated testing and reporting according to the DAPs. Subsequent to the press release, the Board of Education approved the action to reform the assessment policy. The vote was not unanimous, as some members worried that educators would construe the action as affecting the ASAP program as a whole and that state testing would lose credibility.

Reactions to this announcement varied across the spectrum, as Keegan herself noted in an interview.

Well, they fell into two categories. Either people were pleased about [the decision] because they didn't like the test, and I guess within that category you had people who didn't like the test because they weren't on board, period, didn't want tests, didn't want to deal with that. And then the other side are those people who really had held out some hope that this would be something that would be terrific, that were on board with the Essential Skills and performance-based assessment, and they just flat out thought that the examination was not reflective of what their kids knew, and they felt like they were being misrepresented in their teaching. Now, that was the majority of the letters that I saw, not "We don't want to be tested," but, "This test isn't reflective of my students' ability," you know, "This is a horrible experience, and the kids don't respond well to it," which, you know, my own experience in looking at that exam is pretty predictable. So there was that, "Gee, thanks, we didn't like it either, it does need to be fixed," and then there was, "Thanks for getting rid of it, we don't ever want to see a test again." And then probably a third which was, "We don't like performance assessment, we're fine with yes-no, true-false, traditional what-we-understand-type testing." So those folks, all of whom were happy to have it suspended. The other side was those who were not happy to have it suspended, and primarily those were curriculum directors, sort of people who had been involved in putting it together, in pushing it, the administration superintendents who were on board with the ASAP program. And there was a whole lot of diatribe and rumor about the fact that, you know, my election was seen by a lot of people as sort of a right wing thing and "See, this is a back to basics move, and we're going to start testing with some Neanderthal examination."

Republican legislators and State Board members played down the reaction or failed to see it at all. "Zero reaction," said a legislator. Insiders in the Bishop administration, however, reported dissatisfaction and even protest from groups such as the Parent Teachers Association, teacher groups, and administrators in those districts that had spent time and energy adapting their programs toward constructivist education. Said one:

People were very disappointed, very disappointed. And they didn't understand why it was stopping. People in school districts said, "geez, we've really invested a lot of time and effort getting the kids ready." Everything that we had hoped for that districts and schools would do to prepare their kids to do well on these assessments was happening. Everything was happening. There were places obviously it wasn't, but just stopping was a tremendous let-down for people. They said, "oh, geez, we've really worked hard." So, I thought at that point in January of '95 that the program really had come a long way. Had really made a huge impact. And people here, I went in and talked to the principals about not giving the D-3, which was going to be within two weeks or something--three weeks, I don't know what the timing was, March maybe--and they were tremendously disappointed because they had invested in the program, they began to understand the power of this. Our principals here were saying our teachers are teaching differently, they're thinking differently about achievement, they're using resources and their time--everything that late in '88 we thought we could do was being done. And so they were very disappointed.

Members of the Keegan administration were befuddled by the negative reaction in some quarters, because they claimed that this was not a wholesale change in assessment policy but only a "correction" or "clarification," and that it affected only parts of ASAP. What they misunderstood was that, to many educators, Form D was ASAP -- an integrated assessment that both mirrored and promoted integrated instruction. The words of the superintendent seemed to indicate a return to competency-based teaching, skills-oriented testing, and a highly tracked, test-driven school structure.

An insider in Keegan's administration noted that her original plan had involved taking five years to revise the Essential Skills along the lines of her specifications, consulting with teachers, parents, business, experts, and the like, and carefully developing appropriate state testing. But the Board of Education rejected that plan as too slow. Board members wanted an immediate revision and no gaps in accountability data. The majority agenda on the Board was to institute, as soon as possible, a graduation competency test as a substitute for "seat time." A State Board member reported:

I have always been concerned that seat time should not be a graduation requirement. Ever since I was in high school, seat time was all you really needed to get a diploma. And we all agreed that we wanted a diploma to mean something, to have some stakes to it, some risks to it, perhaps even get to the point ultimately where there could be a guarantee to the business community that if our students have a diploma that they can count on them having certain skills.

So Keegan put aside her plan for a patient, collaborative process of standard-setting in favor of staging the Academic Summit, in which the Skills would be replaced with new standards in minimal time. According to a Board member,

Well, I think there had been some prior belief among board members that the Essential Skills were too voluminous, and that when the time was appropriate, it would be a good idea to streamline them, to perhaps review them, to make them higher standards. And probably mid-year, we asked the administration how quickly could we develop standards so that then assessments could be developed, so that then we could develop an appropriate testing mechanism, keeping in mind that we wanted this to happen as soon as possible, which is A-S-A-P. And we're told that they felt that they could accomplish that with our support by January of '96.

Some educators wrongly believed that the actions of the Board and Superintendent contravened state law. In fact, the legislation had never mandated or even mentioned "ASAP," but prescribed norm-referenced testing in three grades and "Essential Skills testing," which the districts carried on during the transition. However, the proposed change of grades for state Essential Skills testing (i.e., from grades 3, 8, and 12 to grades 4, 8, and 10 plus a graduation test) required that the legislation be changed. The necessary change in legislation again made the state legislature a powerful player in assessment policy. By this time, the Legislature had moved far to the right from where it had been at the time of the last legislative change in assessment policy, as one can see in a subsequent section.

Next - Part VI: Raising Standards



Cite This Article as: Teachers College Record, ID Number: 10479, Date Published: September 13, 2000.
https://www.tcrecord.org

About the Author
  • Mary Lee Smith
    Arizona State University
    Mary Lee Smith is professor of Educational Leadership and Policy Studies in the College of Education at Arizona State University. Her research interests include the effects of state-mandated measurement-driven reform on schools. Among her publications is Analysis of Quantitative and Qualitative Data (Handbook of Educational Psychology).
  • Walter Heinecke
    University of Virginia
    Walter Heinecke is an Assistant Professor in the Department of Educational Leadership, Foundations and Policy in the Curry School of Education at the University of Virginia. His research interests include the impact of policy on practice in education. He has conducted research on the impacts of standardized testing on elementary school instruction, desegregation, educational technology and school reform policy. He is co-editor of Advances in Research on Educational Technology.
  • Audrey Noble
    University of Delaware
    Audrey J. Noble is the Director of the Delaware Education Research & Development Center at the University of Delaware. Her current policy research examines Delaware's efforts to reform education through standards, assessment, capacity-building, and governance. Among her recent papers is "Old and new beliefs about measurement-driven reform" in Educational Policy.
 