
Instructional Policy and Classroom Performance: The Mathematics Reform in California

by David K. Cohen & Heather C. Hill
2000

Educational reformers increasingly seek to manipulate policies regarding assessment, curriculum, and professional development in order to improve instruction. They assume that manipulating these elements of instructional policy will change teachers' practice, which will then improve student performance. We formalize these ideas into a rudimentary model of the relations among instructional policy, teaching, and learning. We propose that successful instructional policies are themselves instructional in nature: because teachers figure as a key connection between policy and practice, their opportunities to learn about and from policy are a crucial influence both on their practice, and, at least indirectly, on student achievement. Using data from a 1994 survey of California elementary school teachers and 1994 student California Learning Assessment System (CLAS) scores, we examine the influence of assessment, curriculum, and professional development on teacher practice and student achievement. Our results bear out the usefulness of the model: under circumstances that we identify, policy can affect practice, and both can affect student performance.

Politicians, business leaders, and educators
have been pressing major change in education since the mid-1980s.
School reform had focused on the “basics” in the
mid-1970s and early 1980s, but by the end of President Reagan’s first
term, researchers and reformers had begun to argue for more
intellectually ambitious instruction. They argued for better
academic performance, stiffer state and national standards, and
even stiffer state and perhaps national tests. Many contended that
teaching and learning should be more deeply rooted in the
disciplines and much more demanding. Teachers should help students
understand mathematical concepts, interpret serious literature,
write creatively about their ideas, and converse thoughtfully about
history and social science—not focus only on facts and
skills. Reformers also began to argue that schools should orient
their work to the results that students achieve rather than the
resources that schools receive. Beginning with

The reformers faced two central problems. One was political: power and authority were extraordinarily dispersed in

As instructional policy moved to the top of many states’ education agendas in the past fifteen years, these two questions about the relations between policy and practice have moved to the center of researchers’ investigation of school reform. We continue the effort. Using data from a 1994 survey of

In the pages that follow, we develop this
rudimentary model: students’ achievement is the ultimate
dependent measure of instructional policy, and teachers’
practice is both an intermediate dependent measure of policy
enactment and a direct influence on students’ performance.
Teachers therefore figure in the model as a key connection between
policy and practice, and teachers’ opportunities to learn
what the policy implies for instruction are a crucial influence on
their practice, and thus at least indirectly an influence on
student achievement.

THE REFORM: POLICY AND INSTRUMENTS

State reform of mathematics instruction in

The 1985 Framework called for intellectually
much more ambitious instruction, more mathematically engaging work
for students, and teachers to help students understand rather than
just memorize math facts and operations. The Framework was a
central part of state instructional policy, though it was formally
“advisory” to local districts. It encouraged teachers
to open up discourse about math in their classrooms, to pay more
attention to students’ mathematical ideas, and to place much
more emphasis on mathematical reasoning and explanation rather than
the mechanics of mathematical facts and skills.

Shortly after the new Framework was issued, the
state board of education tried to use textbook adoption as an
instrument of the policy; state approval carries great weight with
localities, since they get state aid for using approved
texts.
The state board used the Framework to reject most texts. After much
debate, some negotiation, and a good deal of acrimony, some of the
books were somewhat revised, and the board declared most of them
fit for children’s use. But state officials were not happy
with the result, and decided that text revision might not be the
best way to encourage reform.

Reformers then began to encourage the development of other curriculum materials (small, topic-centered modules termed “replacement units”) that would support changed math teaching without requiring warfare with text publishers. The state department also tried to encourage professional development for teachers around the reforms, though continuing budget cuts had weakened the department’s capacity to support such work.

The new Framework called for a substantial shift in teachers’ and students’ views of knowledge and learning, toward views that most Americans would see as unfamiliar and unconventional. If the new ideas were to be taken seriously, teachers and other educators would have a great deal to learn. Moreover, the Framework offered such general guidance that the

The California Department of Education (CDE)
used its student assessment system as another means to change
teaching, and devoted considerable attention to revising the tests
so that they were aligned with the Framework. Though some reformers
were uneasy about testing, others assumed that new tests could
help.
They reasoned that once the state began to test students on new
mathematical content and methods, scores would drop because the
material would be unfamiliar and more difficult. Teachers and the
public would notice the lower scores, which would generate pressure
for better results; teachers would pay attention to the pressure
and thus to the new tests, and instruction would change. As one
state official told us, “. . . tests drive
instruction.” The Department had some difficulty revising
the tests—in part because it was a formidable task, in part
because of state Superintendent of Public Instruction Bill
Honig’s disputes with then-Governor Deukmejian, and in part
because of Honig’s own tribulations and trials. But finally
the revisions were made and the new tests were given in 1993 and
1994. As state education leaders had thought, scores were lower and
the public noticed, but that rather understates the matter: a storm
of protest erupted after the 1993 test results were
published. Not only were scores generally quite low but a
committee chaired by Lee Cronbach, a prominent researcher, raised
questions about the technical quality of the test and its
administration, reporting, and analysis (Cronbach, Bradburn, &
Horvitz, 1994). Things were modified for 1994, partly in response
to the outcry over low scores, but it was too late, for the
opposition had organized an assault on the whole enterprise.
Conservatives criticized the new tests on the grounds that they
gave little attention to the “basics” and instead
encouraged “critical thinking,” or
“outcomes-based education,” activities that many
rejected. Governor Wilson was running for the Republican
presidential nomination at the time; he attacked and then canceled
the testing program.

PROFESSIONAL LEARNING AND REFORM

Larry Cuban (1984) once wrote of such
political controversies that they only weakly affect schools and
classrooms. Like storms on the surface of a deep ocean, they roil
the surface but have little impact on developments further below.
Much research on policy implementation has probed the failure of
central policies to shape practice in street-level agencies, but
most researchers seem to assume that policy is normative and
practice should follow suit. They write from the perspective of
policy, trying to explain why practice has gone awry. A smaller
number have tried to consider policy from the perspective of
practice, asking whether there are features of policy that explain
its frequent failure to affect practice.

The research reported here was begun nearly a decade ago by a research group we participated in at

As the MSU research project studied classrooms and mathematics teaching in

From this perspective it seems that the
connections between policy and practice would be crucial. If
implementation were in part a matter of professionals’
learning, and if most teachers could not do it all by themselves,
then some agencies would have to do the teaching and encourage the
learning. This implies that the relations between events both on
and far beneath the surface of policy would be significant, and
that the content of those relations would in a sense be
instructional. From this point of view the key issues about those
relations would be similar to those one might encounter in any case
of teaching and learning: What opportunities did teachers and other
enactors have to learn? What content were they taught? Did teachers
who reported these opportunities report a different kind of
practice than those who did not have them? Analysts would
investigate who taught the new ideas and materials, and what
materials or other guidance for learning teachers had. It would not
do solely to look beneath the surface of policy in practice;
rather, one would want to look anywhere one could find agents and
opportunities that might connect policy and practice via
professional learning.

Beginning in 1988 the MSU research group
explored some of the features of the response to reform in detailed
longitudinal field studies of teachers’ practice, their
understanding of the reforms, whether their practice changed, and what
opportunities they had had to learn. In late 1994 the studies were
supplemented with a one-time survey of elementary school teachers in order
to extend the breadth of our findings about the extent of change in
math teaching. A survey instrument was designed and a random
sample selected using the school district as the primary sampling
unit (Table 1). Because the number of students in each district
varies so greatly, districts were stratified into five categories
by student population and unevenly sampled in order to achieve
probabilities proportionate to size. Within the 250 schools
sampled, one teacher from each of grades 2 through 5 was selected
at random and mailed a survey. Because some schools did not support
four teachers for these grades, the final number of teachers in the
sample was 975, rather than 1,000. Teachers were offered $25 in
instructional materials for completing the hourlong survey and
returning it. Those who did not immediately respond were sent mail
reminders, and eventually 595 (61%) teachers
responded. Teachers’

We report the initial analysis of the survey
data here. Our opening conjecture was that the greater the
teachers’ opportunities to learn the new mathematics and how
to teach it, the more their practice would move in the direction
that the state policy had proposed. To probe this
conjecture we needed to know what opportunities teachers
had to learn, what they had learned, and what they did in math
class. Therefore the survey examined teachers’ familiarity
with the leading reform ideas, their opportunities to learn about
improved mathematics instruction, and their mathematics
teaching.

Table 1. Sampling information

Stratum   Size of district    # districts sampled /         # schools sampled
          (in students)       total districts in stratum    per district
1         600,000             1/1                           10
2         35,000+             10/10                         5
3         10–35,000           50/97                         2
4         1–10,000            70/367                        1
5         <1,000              20/421                        1

This approach implies a conception of the
relations between policy and practice in which a teacher’s
opportunity to learn would be a critical mediating instrument. But
the content of this opportunity is not self-evident, and if the
teachers might play the crucial role we propose, a more precise
notion of what various opportunities entail is required. The MSU
group’s work and previous research suggest that several
aspects of teachers’ opportunities to learn would be
significant:

1. General orientation: teachers’ exposure to key ideas about reform
2. Specific content: teachers’ exposure to such educational “instruments” as improved mathematics curriculum for students, or assessments that inform teachers about what students should know and how they perform
3. Consistency: the more overlap there was among the educational instruments noted above, the more likely teachers’ learning would be to move in the direction that state policy proposed
4. Time: teachers who had more of such exposure would be more likely to move in the direction that the state policy proposed

These ideas imply two points about any
analysis of the relations between instructional policy and teaching
practice. One is that we view teachers’ reported practice as
evidence of the enactment of state instructional policy and thus as
a key dependent measure. The other is that we view teachers’
opportunities to learn as a bundle of independent variables that
are likely to influence practice. The connections between policy
and practice, therefore, are our central concern, and learning for
professionals is one of several key connecting
agents.

We conjecture that relations between events on
and below the surface of policy implementation would depend less on
the depth of the water than on the extent to which government or
other agencies built connections, or made use of those already
extant. Ours is an instructional model of instructional policy. It
draws on the MSU group’s earlier studies in California
(Guthrie, 1990), other work we have done on practice and policy
(Cohen, 1989; Cohen & Spillane, 1992), and several lines of
research in which researchers have tried to observe teachers’
responses to policy (Schwille et al., 1983), or in which they have
tried to devise indicators of quality in teaching and curriculum
(Murnane & Raizen, 1988; Porter, 1991; Shavelson, McDonnell,
Oakes, & Carey, 1987).

Student Achievement

Students’ performance is no less important an outcome than teaching practice, since reformers’ justification for asking teachers to learn new math instruction was that students’ learning would improve. In this sense teachers’ practices are a crucial intervening factor, for if instructional reform were to affect most students, it would mainly be through teachers’ practice. Thus teachers’ practice is a dependent measure of policy implementation from one perspective, but from another it is an independent measure that mediates the effects policy may have on students’ work, which is the final dependent measure. Hence we probe links between teachers’ opportunity to learn, their practice, and scores on

In this conception of the relations between
policy and practice, teachers’ opportunity to learn
(1–4 above) influences their practice, and their practice
influences students’ performance. But teachers’
practice is not the only influence on students’ learning.
Such a policy also could influence learning by way of
students’ exposure to specific educational
“instruments” such as improved mathematics curriculum,
or tests that direct teachers’ and students’ attention
to the goals and content of reform.

Other factors are also likely to influence
either the opportunities that teachers are provided, their
learning, or students’ learning. Social and economic
inequalities among families would create differences in
students’ capacity to take advantage of improved curriculum
and teaching, and inequalities among schools and communities could
inhibit teachers’ capacity to learn from new curriculum and
assessments. Neither learning nor opportunities to learn are
independent of politics, money, social and economic advantages, and
culture. Hence we take several of these into account in the
analysis that follows. But in developing our conception of links
between policy and practice we keep most attention on factors
closest to the production of student growth—teachers’
learning and practices, and related curricula and
time.

MEASURES OF PRACTICE AND PROFESSIONAL LEARNING

We want to know how teachers’ practice
compares with reform ideals, so we asked teachers to report on
their classroom practice in mathematics along some of the
dimensions advocated by the California Frameworks. But since
we—and the reformers—were interested in change, we also
wanted to know how their teaching compared with conventional
practice, and we asked teachers to report on that as well. Both
sorts of measures would be required to probe whether
teachers’ opportunities to learn influenced their practice,
and to explore whether reform-oriented practice is related to
students’ achievement.

For now, we stick to the first part of this investigation, asking whether teachers’ practice is correlated with their own opportunities to learn. We start by more closely defining how we measured “practice” and investigating how opportunities to learn are distributed through

Practice

Teachers’ self-reports of classroom
practices associated with mathematics instruction are measured by
fourteen survey items. A factor analysis reveals two dimensions
along which these items line up. The first consists of more
conventional instructional activities (see Table 2), with higher
numbered responses indicating more such activities, except in the
case of item 35, which was reversed for the purposes of this
analysis. The responses to these items are individually
standardized and averaged by teacher to form the scale we call
“conventional practice.” Its mean is zero and standard
deviation .75; the scale’s reliability is .63.

The second set of items that emerged from our
factor analysis is composed of activities more closely keyed to
practices that reformers wish to see in classrooms (see Table 3).
We averaged teachers’ responses to these seven items to make
our “framework practice” scale, on which higher scores
indicate more of this kind of activity. The scale has a mean of
3.26, a standard deviation of .72, and a reliability of
.85.

Though these scales were not validated by the
project, a previous study comparing teachers’ survey answers
to data from observations of their teaching demonstrated that while
teachers tend to overreport the reform practices they engage in,
their placement in relation to other teachers on self-reported
practice scales tends to correlate highly with the relative
placement made by an observer (Mayer, 1999).

Professional Learning

As we said above, most teachers had much to
learn if they were to respond deeply to the new ideas about
mathematics teaching and learning. We report here on three very
different sorts of opportunities to learn: study of certain special
topics and issues related to reform; study of specific math
curriculum materials for students that were created to advance the
reforms; and more general participation in learning opportunities,
reform networks, and activities.

Table 2. Teacher reports of conventional mathematics practices

About how often do students in your class take part in the following activities during mathematics instruction?* (CIRCLE ONE ON EACH LINE.)

Q. 35. a. Which statement best describes your use of a mathematics textbook?* (CIRCLE ONE.)

1. A textbook is my main curriculum resource                      30.9
2. I use other curriculum resources as much as I use the text     39.1
3. I mainly use curriculum resources other than the text          21.0
4. I do not use a textbook. I use only supplementary resources     9.1

*Numbers are percentages of respondents selecting that category, weighted to represent statewide population.

Table 3. Teacher reports of framework practices

9. About how often do students in your class take part in the following activities during mathematics instruction?* (CIRCLE ONE ON EACH LINE.)

*Numbers are percentages of respondents selecting that category, weighted to represent statewide population.

Table 4 contains evidence from our first
inquiry into teachers’ opportunities to learn. A single
question reproduced in the table asks teachers to estimate how much
time they invested in mathematics-related activities within the
previous year. The question refers to two somewhat different sorts
of workshops. The workshops mentioned in section B of the table
focus on the new mathematics curriculum for students. For instance,
Marilyn Burns Institutes are offered by experienced trainers whom
Marilyn Burns selects and teaches, and focus on teaching specific
math topics; some focus on replacement units that Burns has
developed. In some cases, teachers who attended these workshops one
summer were able to return the next summer and
continue.

Replacement units are curriculum modules
designed to be consistent with reforms that center on specific
topics, like fractions, or sets of topics. Unit authors devised
these units to be coherent and comprehensive in their exploration
of mathematical topics—to truly replace an entire unit in
mathematics texts, rather than just add in activities to existing
curricula—and to be supportive of teacher as well as student
learning. Teachers who attended these sessions worked through the
units themselves, and often had a chance to return to the workshops
during the school year for debriefing and discussions about how the
unit worked in their own classrooms.

Workshops like cooperative learning, EQUALS,
and Family Math (see section A of the table) had a different focus.
Each was loosely related to the frameworks, for the frameworks had
many goals, but none of the three focused directly on
students’ mathematical curriculum. EQUALS, for instance,
deals with gender, linguistic, class, and racial inequalities in
math classrooms. Family Math helps teachers involve
their students’ parents in math learning, and
cooperative learning workshops come in many different flavors, some
dealing with “detracking,” some with other things, but
all encouraging learning together.

Table 4. Teachers’ opportunities to learn

Which of the following mathematics-related activities have you participated in during the past year and approximately how much total time did you spend in each? (E.g., if four 2-hour meetings, circle 2: “1 day or less.”) (CIRCLE ONE ON EACH LINE.)*

                                   None**   1 day     2–6      1–2      >2
                                            or less   days     weeks    weeks
Section A: Special Topics
  EQUALS                           96.5      2.4       .9       .2      0
  Family Math                      81.7     12.9      4.3       .8       .3
  Cooperative learning             54.5     28.9     13.7      1.8      1.1
Section B: Student Curriculum
  Marilyn Burns Workshops          83.2      9.8      5.3      1.3       .3
  Mathematics replacement units    58.9     22.7     14.2      1.7      2.5

*Numbers are percentages of respondents selecting that category, weighted to represent statewide population.
**Missing data assumed to be “none.”

Two-thirds of the teachers who responded to
our survey participated in professional development activities in
at least one of the five workshops listed in Table 4. But the
breadth of these professional development opportunities was not
matched by their depth. Our chief indicator of depth was the amount
of time teachers reported participating. While we recognize that
more time is no guarantee of more substantial content, it creates
the opportunity for substantial work, which could not occur in a
day or a few hours. Table 4 shows that most teachers spent only
nominal amounts of time in either sort of professional development
activity. By tabulating each teacher’s total investment
across the five options above, we found that roughly half of all
teachers who reported attending one of the workshops in the past
year indicated they spent one day or less in it. About 35% reported
spending somewhere between two and six days. A smaller fraction of
those who attended workshops—and a very small fraction of the
sample as a whole—attended workshops for one week or
more.

One way to place these numbers in context would be to compare

Another way to put these numbers in context is to ask how they relate to teachers’ more general opportunities to learn about

Table 5. Participation in reform networks and leadership roles*

Activities                                                        Percent that
                                                                  participated**
Attended a national mathematics teacher association meeting            5.8
Attended a state or regional mathematics teacher association
  meeting, including California Mathematics Council affiliates        12.5
Taught an in-service workshop or course in mathematics or
  mathematics teaching                                                13.8
Served on a district mathematics curriculum committee                 13.7

*Teachers were asked to report only for the year prior to the survey.
**Numbers are percentages of respondents selecting that category, weighted to represent statewide population.

So far, we have described what teachers
reported to us on their learning opportunities within the year
before the survey. We also asked teachers to tell us whether they
had had past opportunities to learn about the new standards,
although we did not inquire into the specifics of those
experiences. According to our tabulations, 65% of teachers reported
they had at some time attended school or district workshops related
to the new mathematics standards, and 45% said they had been given
time to attend off-site workshops or conferences related to those
standards. Taken together, roughly seven out of ten teachers did at least one
of these two activities, and many did both. But because these are
general measures only, we have no sense of the character of the
learning opportunities—whether they were long or short,
whether they focused on specific problems or general principles, or
whether their formats were innovative or
conventional.

One view of the evidence in Tables 4 and 5 is that reformers in

But another view of the evidence is that some
reformers took a novel departure: they grounded some
teachers’ professional development in the improved student
curriculum that state policy had helped to enable. This was a
unique opportunity, for most professional development is not so
grounded. It also is a happy event for the interested researcher,
for comparing the two approaches in Table 4 enables us to ask one
central question: Do teachers who attend the curriculum-centered
workshops in Table 4 report different kinds of practice from those
who attend the special topics workshops, or none at
all?

We used the raw data reported in Table 4 to
create several variables that represent the broader classes of
opportunities to learn we identified earlier. “Time in
student curriculum workshops” is a variable marking time
invested in the workshops that used students’ new curriculum
to investigate mathematics instruction. “Time in special
topics workshops” marks attendance at workshops associated
with special issues or topics in mathematics reform. Roughly 45% of
teachers had at least some opportunity to learn about student
curriculum in either the Marilyn Burns or mathematics replacement
units workshops, and around 50% of teachers spent some time
learning about EQUALS, Family Math, or cooperative learning. Taking
time investment into consideration, the mean of our “time
spent” markers was .91 days for student curriculum, and .5
days for the special issue workshops.

Finally, we created a more general variable
known as “past framework opportunity to learn (OTL).”
Since the variables in Table 4 capture teachers’
opportunities to learn only in the year prior to the survey, we
tried to control for earlier learning opportunities in predicting
“framework” and “conventional” practice.
Not doing so could lead to a type of omitted variable bias, for
teachers who had some earlier learning about the content of the new
frameworks would be lumped with teachers who had none. Our simple
measure of earlier learning showed that about 30% of teachers had
not attended one of the curriculum or math-related workshops in
the past year but did report some previous opportunity to
learn.

Controls

Causality is difficult to determine in a
one-time survey. It would not be surprising, for instance, if
teachers who took advantage of professional development that was
centered in students’ mathematics curriculum were different
than teachers who spent their time in brief workshops on peripheral
matters. Teachers of the first sort might be more committed to the
reforms, or more knowledgeable about them already, or both. Were
that the case, our measures of teachers’ opportunities to
learn would include effects of such selectivity, and relationships
with practice would be suspect. We tried to err on the side of
caution by including two controls; while these do not completely
eliminate possible selection bias, they safeguard against inflation
of teacher learning effects. The first, “affect,” is
teachers’ reports about their views of the state mathematics
reforms. Teachers answered this item on a scale of 1 to 5, with 1
being “extremely negative” and 5 “extremely
positive.” The scale mean is 3.77 and its standard deviation
.93. We include it in our equations since
teachers’ views of reform are likely to be linked to the
classroom practices they report. Affect also might be correlated
with taking certain workshops, either because being enthusiastic
about the frameworks leads teachers to certain workshops, or
because these workshops cause teachers to be more enthusiastic. We
wanted to control for selectivity—the former
case—because leaving such affect out of the model might
enable a workshop marker to pick up this selectivity and upwardly
bias the workshop variable’s value. Because affect could also
pick up some effects of workshops on individuals’ opinion of
reform, thus understating any relationship between opportunities to
learn and practice, this may act as a conservative
control.

The second control is teachers’
familiarity with the state reform. Teachers who are more familiar
with these broad policy objectives may have at least learned to use
the language of the frameworks and know what is “in”
and “out.” We found, for example, that familiarity is
linked to teachers’ attitudes toward conventional math
instruction; teachers who know what classroom practices are
approved by the frameworks much less often report approval of
spending math time in drill and skill. Familiarity was measured
by asking teachers to identify the themes central and not central
to the reforms from a list of statements about instruction and
student learning. We include this in our analysis of the
relationship between opportunities to learn and classroom practice
because teachers who were more familiar with the reform might
report practices more consistent with the reforms, just because
they know what is approved. Other teachers whose classrooms were
identical but who were less familiar with the reforms might have
been less likely to report practices acceptable to reformers. The
mean of this measure is .83 on a scale of 0–1, which
indicates considerable familiarity with the leading reform ideas.
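
One way to read a 0–1 familiarity score is as the share of statements a teacher classifies correctly. The sketch below works under that assumption only: the source does not state the scoring rule, and the statement ids, the answer key, and the treatment of skipped items are all hypothetical.

```python
from typing import Dict, Optional

# Hypothetical answer key: True means the statement names a theme that is
# central to the reforms. The ids are placeholders, not the survey's items.
ANSWER_KEY: Dict[str, bool] = {
    "statement_1": True,
    "statement_2": False,
    "statement_3": True,
    "statement_4": False,
    "statement_5": True,
    "statement_6": False,
}


def familiarity_score(responses: Dict[str, Optional[bool]],
                      key: Dict[str, bool] = ANSWER_KEY) -> float:
    """Return the share of statements classified correctly, from 0 to 1.

    A skipped statement (missing or None) counts as incorrect; this is an
    assumption of the sketch, not a rule stated in the source.
    """
    correct = sum(1 for item, central in key.items()
                  if responses.get(item) == central)
    return correct / len(key)


# Example: a teacher who misclassifies statement_4 and gets the rest right.
example_teacher = dict(ANSWER_KEY, statement_4=True)
example_score = familiarity_score(example_teacher)  # five of six correct
```

Under this reading, a sample mean near the top of the range would mean that most teachers sorted most statements the way the reform documents would.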
Familiarity also may be a conservative check on our analysis:
though some portion of teachers’ familiarity may predate the
workshops and thus signal selection, another portion may be an
effect of workshops. By including this measure we may be reducing
possible associations between professional development and
practice.

IMPACT OF OPPORTUNITIES TO LEARN ON PRACTICE

We turn now to the results. We report first on
the impact that workshop “curricula” had on
teachers’ reports of both conventional and framework
practices.

Teacher Learning and Practice

Table 6. Associations between teachers’ learning opportunities and practice
*indicates p < .05  **indicates p < .01  ***indicates p < .001
Note: Estimation by OLS.

The results of the Ordinary Least Squares
(OLS) regressions in Table 6 state a central finding quite bluntly:
the content of teachers’ professional development makes a
difference to their practice. Workshops that offered teachers an
opportunity to learn about student math curriculum are associated
with teacher reports of more reform-oriented
practice. The average teacher who attended a Marilyn Burns or
replacement unit workshop on student curriculum—who focused
for an average amount of time (two days)—reports nearly a
half of a standard deviation more of framework practice than the
average teacher who did not attend those workshops. And the typical
teacher who attended a weeklong student curriculum workshop appears
about a full standard deviation higher on the framework practice
scale than the teacher who did not attend this workshop at all.
Moreover, the relationship works in both directions. Teachers who
report an average amount of attendance at either Marilyn Burns or
replacement unit workshops report fewer conventional practices
(about a third of a standard deviation fewer) than teachers who did not
attend. These opportunities to learn seem not only to increase
innovative practice but to decrease conventional practice; teachers
do not just add new practices to a conventional core, but also
change that core. In contrast, the variable for the special
topics and issues workshops has nearly a zero regression
coefficient in both cases. Workshops not closely tied to student
curriculum thus seem unrelated either to the kinds of practices
reformers wish to see in schools or to the conventional
practices, like worksheets and computational tests, that
they would rather not see. We suspect this is because the special
topics and issues workshops, though consonant with the state math
frameworks in some respects, are not so much about mathematics
teaching practices that are central to instruction, but focus
instead on other things that may be relevant to instruction but are
not chiefly about the mathematical content. Such workshops may be
useful for some purposes, but they would likely be peripheral to
mathematics teaching: useful for adding cooperative learning
groups or new techniques for girls or students of
color, rather than for changing core beliefs and practices
about mathematics and its teaching.

The coefficients on “past math framework
OTL” show a slight effect on conventional practice, and none
for framework practice. This is as expected, for the question from
which this variable was constructed invited teachers to lump all
sorts of different learning opportunities
together—opportunities both about student curriculum and
not.

An extension of this analysis checked for
linearity in the effects associated with student curriculum and
special topics workshops by breaking each workshop into a set of
five dummy variables representing a discrete time investment
(Marilyn Burns for one day; Marilyn Burns for two to five days,
etc.; analysis not shown), and entering them into the models
predicting framework and conventional practice. In general, the
greater time investments in student curriculum workshops were
associated with teacher reports of more frequent framework
practices and fewer reports of conventional practices, while
greater investments of time in special topics workshops were not
related to either. This result parallels research on
students’ opportunities to learn, in which researchers have
found the combination of time and content focus to be a potent
influence on learning. It also raises an important point: even
large investments of time in less content-focused workshops are not
associated with more of the practices that reformers want, nor with
fewer of the conventional practices that reformers consider
inadequate. The effects of these workshops seem tangential to the
central classroom issues measured by our practice scales and on
which the mathematics reform focused.

This effect of time bears on our concerns
about selectivity. A critic might argue that the regression results
presented in Table 6 could be explained by teachers having selected
into workshops that mirror their extant teaching styles and
interests. But it seems extremely unlikely that teachers would
arrange themselves neatly by level of enthusiasm and progressive
practice into different levels of time investment as well. Thus
when we see that adding hours or days in a student curriculum
workshop means scoring progressively higher on our framework
practice scale and lower on the conventional practice scale, especially
when controlling for teachers’ familiarity with and views of
reform, we surmise that learning, not fiendishly clever
self-selection, was the cause.

When teachers’ opportunities to learn
from instructional policy are focused directly on student
curriculum that exemplifies the policy, then that learning is more
likely to affect teachers’ practice. Capable math teachers
must know many things, but their knowledge of mathematics, and how
it is taught and learned, are central. This explanation points to
the unusual coherence between the curricula of students’ work
and teachers’ learning that the student curriculum-centered
professional development created. Teachers in these workshops would
have been learning both the mathematics that their students would
study and something about teaching and learning it.

This type of learning differs quite sharply
from most professional development, which appears to be either
generic (“classroom management”), or peripheral to
subject matter (“using math manipulatives”). Neither
has deep connections to central topics in school subjects (Little,
1993; Lord, 1994). There was a modest move in the 1980s away from
generic pedagogy workshops, toward subject-specific workshops like
cooperative learning for math, that several observers considered an
improvement (see Little, 1989, 1993; McLaughlin, 1991). But our
results suggest that teachers’ learning opportunities may
have to go one level deeper than just subject specificity. Changing
mathematics teaching practices seems to be easier if teachers have
even more concrete, topic-specific learning opportunities:
fractions, or measurement, or geometry. This conjecture is
consistent with recent research in cognitive psychology, which
holds that learning is domain-specific. It also may be because the
workshops offered teachers elements of a student curriculum, which
may have helped them to structure their teaching and support their
practices when they left the workshop and returned to their
classrooms.

What does this all mean for the average teacher in

* * * *

This picture of the impact that professional
learning can have on teacher practice is grainy, for surveys of
this sort are relatively crude instruments. But the associations
are substantively significant and fairly consistent in size across
different model specifications. They support the idea that the kind
of learning opportunities teachers have matters to their practice,
as does the time that they spend learning.

Because of our concerns about causality, we
have built some protections against selection into our models,
such as fairly strict control variables like
“affect” and “familiar.” But since these are far
from perfect, we also performed a two-stage least squares
regression to control for those factors, possibly
correlated with teacher practice, that may have led teachers
to select themselves into certain workshops. The results show that
decisions to enroll appear to be only modestly related to
teachers’ preexisting dispositions toward certain types of
mathematics teaching. Insofar as we can tell from these data,
teacher selection into workshops is only weakly rational, if
rational means that teachers carefully seek out workshops that fit with
strongly held convictions about reform. This further suggests that
our findings are robust, an impression that is strengthened by
Little’s (1989, 1993) account of the professional development
“system.” She describes teachers’ workshop
choices as usually related to very general subject-matter interest
like “math” or “technology” but only weakly
related to things like specific workshop content, quality, or
potential effects for students’ learning. Lord (1994) goes
one step further, arguing that teachers’ staff development
choices are “random” with regard to the factors
reformers might care about. The sort of selection that concerns us
does not seem to be characteristic of professional
development.

The Mediating Role of Tests

Tests are widely believed to be a significant influence on teaching, and the

Efforts of this sort raise several issues for
anyone concerned about the state reforms. One is straightforward:
Did the tests affect practice? Did teachers who knew about,
administered, or shared the intellectual bent of the CLAS report
more “framework practice” and less conventional
practice than others who did not? If so, a second question is: How
did the tests affect practice? If some of the reformers were
correct, the test should have provided an incentive for
fourth-grade math teachers, or an opportunity for them to learn
more about the new mathematics teaching—or both. That
question is especially salient because there is hot disagreement
about the means by which tests influence practice—is it
learning or incentives? A third question concerns differing methods
of reform: Do the effects of tests on teachers’ practice wash
out the effects that teachers’ opportunities to learn have on
practice? That could occur if teachers who took the CLAS seriously
had attended the student curriculum workshops—but had done
so, and changed their practice, because of the test rather than the
workshops.

To investigate these issues we operationalized
two variables: whether teachers “learned about
CLAS,” and whether teachers administered the CLAS. About
one-third of the teachers reported they had learned about the CLAS,
and about one-third reported they had administered it. But not all
teachers who learned about the mathematics CLAS said they also
administered the test, or vice versa. Table 7 shows that there is
an association between these two variables—teachers who
administered the CLAS were more likely to have had an opportunity
to learn about it. The off-diagonal cases, however, show that there
is enough variance to enable us to sort out the effects of learning
about the test from the effects of actually administering
it.

Table 8, Set I, contains the results of that
effort. As one would expect, there is a statistically significant
and positive relationship between administering the CLAS and
reporting more framework practice. But the relationship is quite
modest; it does not come at all close to the size of the
association between curriculum workshop learning and practice. In
addition, this CLAS-practice association does not decrease
teachers’ reports of conventional practices like bookwork and computational
tests.

Table 7. Learning about CLAS versus administering the test*

                        Learned about CLAS
Administered CLAS       No            Yes           Total
No                      312 (53%)     93 (16%)      405 (68%)
Yes                     58 (10%)      131 (22%)     190 (32%)
Total                   371 (62%)     224 (38%)     595 (100%)

*Numbers are weighted to represent statewide population

Table 8. Association between OTL, practice, and CLAS measures
~indicates p < .15 *indicates p < .05 **indicates p < .01 ***indicates p < .001
Note: Estimation by OLS.

It seems that any incentive associated with
administration of the CLAS only adds new practices onto existing,
mostly conventional practice. Rather than redecorating the whole
house, teachers supplemented an existing motif with more
stuff—a result that also was clear in our fieldwork. By way of
contrast, the teachers who spent extended time in curriculum
workshops reported both less conventional practice and more
framework-oriented practice.

That modest effect of test administration
might disappoint supporters of assessment-based reform because it
suggests that the incentives associated with testing alone are not
great. But the CLAS lasted for only two years and published results
only at the school level, which may not have been sufficient for
incentive effects to develop. There also seems to be little solace
in the results above for advocates of a contrary view: that any
effect of assessment-based reform will occur only through
teachers’ opportunities to learn. The other new variable in
this model—whether teachers reported learning about the
CLAS—fared even worse: it was unrelated to teachers’
descriptions of their classroom practice in
mathematics.

One might conclude both that the
“incentive” that the CLAS presented to teachers who
administered it caused mild change in their math instruction, and
that the test prompted little independent learning about new
mathematics practices. That alone would be humble yet hopeful news
for assessment-based reform: since teachers certainly did not
“select” themselves into administering the test, the
effect associated with test administration should be a
“true” estimate of practitioners’ response to
policy.

But there is more to the story. To further probe
teachers’ views of the assessment, we generated cross tabs
that described the relationship between administering the CLAS and
various measures of agreement with the test. Table 9 shows that
there is a strong relationship among administering the CLAS,
teachers’ views of the test, and adopting classroom practices
that it might seem to inspire. But the table also shows that not
all teachers who reported administering the CLAS either agreed with
the test’s orientation or tried to fit their teaching to it.
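The kind of cross-tabulation described here can be sketched with the weighted counts already reported in Table 7. The function names are illustrative, and the odds ratio is offered only as one conventional summary of a two-by-two table; it is not a statistic reported in the analysis. Because the published counts are weighted and rounded, cell sums can differ by one from the printed margins.

```python
# Weighted counts from Table 7: rows are "administered CLAS" (No, Yes),
# columns are "learned about CLAS" (No, Yes).
counts = {
    ("not administered", "not learned"): 312,
    ("not administered", "learned"): 93,
    ("administered", "not learned"): 58,
    ("administered", "learned"): 131,
}


def share_learned(row):
    """Share of teachers in a row who reported learning about the CLAS."""
    learned = counts[(row, "learned")]
    total = learned + counts[(row, "not learned")]
    return learned / total


def odds_ratio():
    """Cross-product ratio of the 2x2 table (1.0 would mean no association)."""
    a = counts[("administered", "learned")]
    b = counts[("administered", "not learned")]
    c = counts[("not administered", "learned")]
    d = counts[("not administered", "not learned")]
    return (a * d) / (b * c)


p_admin = share_learned("administered")          # about 0.69
p_not_admin = share_learned("not administered")  # about 0.23
ratio = odds_ratio()                             # about 7.6
```

Both row shares are well away from 0 and 1, which is the sense in which the off-diagonal cells leave enough variation to separate administering the test from learning about it.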
Table 9. Attitude toward CLAS by test administration
The mathematics CLAS corresponds well with the mathematics understanding that I want my students to demonstrate.
I currently use performance assessments like CLAS in my classroom.
The mathematics CLAS has prompted me to change some of my teaching practices.
Learning new forms of assessment has been valuable for my teaching.
Note: Numbers do not always add to 100 due to rounding.

This implies that teachers were quite selective in attending to
the new test. Many who administered the CLAS liked it and used it
as a learning opportunity, but others did not. The same can be said
for those who did not administer the test but learned about it some
other way: even without the direct “incentive” supplied
by the tests’ presence in their classroom, some found it
instructive in changing their mathematics teaching, while others
paid it little heed. That throws a bit more light on how statewide testing may influence teaching and curriculum, at least in states that resemble

To pursue this more teacher-dependent
representation of teachers’ relationship with the test, we
made the four survey items in Table 9 into a scale, called
“CLAS useful.” They were:

1. The mathematics CLAS corresponds well with the mathematics understanding I want my students to demonstrate.
2. I currently use performance assessments like CLAS in my classroom.
3. The mathematics CLAS has prompted me to change some of my teaching practices.
4. Learning new forms of assessment has been valuable for my teaching.

The scale links several elements of the role
that an assessment might play: (1) teachers’ sense of the
congruence between the CLAS and their work; (2) their use of and
thus familiarity with such assessments; (3) their sense of whether
the test had changed their teaching, which could occur through
learning or an incentive, or both; and (4) their view of whether
they had learned from CLAS-like assessments and whether the
learning was pedagogically useful.

We then reran the equations that probed the
effects of testing on practice in Table 8, with this new variable
included. Doing so rendered the two test-related variables that we
initially discussed quite insignificant, or significant in the
wrong direction (see Table 8, Set II). Moreover, teachers who score
relatively high on this scale not only have more reform-oriented
practices but fewer conventional practices, which indicates a more
thorough revision of practice, and perhaps greater internal
consistency in teachers’ work than if teachers had reported
more framework practice but no less conventional practice. This
supports the view that it is neither learning alone nor incentives
alone that makes a difference to teachers’ practice, but a
combination of experience, knowledge, belief, and incentives that
seems to condition teachers’ responses to the test. The
effects of assessment on practice appear among those teachers who
constituted themselves as learners about and sympathizers with the
test—and this group itself seems to consist both of teachers
whose approaches already concurred with the test and those for whom
the test spurred learning more about mathematics.

This complex relationship also was evident in the views of

. . . the CLAS test. . . . It was a shock to me.
They (students) really did fall apart. It was like, “Oh! What
do I do?” And I realized, I need to look at mathematics
differently. You know, I really was doing it the way I had been
taught so many years before. I mean, it was so dated. And I began
last year, because of the CLAS test the year before, looking to see
what other kinds of things were available. (Perry, 1996, p. 87)

This suggests that the teacher’s learning (“. . .
looking to see what other kinds of things were available”)
and her efforts to change her practice were associated with the
incentive for change that was created when she noticed that her
students “. . . really did fall apart” when trying to
take the new test. Her students’ weak performance as test
takers prompted her to find ways to help them do better, before she
saw any scores.

Another major reason the new assessment system
worked as it did is that it provided opportunities for teachers to
learn. To start, the CDE involved a small number of teachers in the
development and pilot testing of the CLAS. The state department
then paid many more teachers—several hundred—to grade
student responses to open-ended tasks on the 1993 and 1994
assessments. These teachers then returned to their districts and
taught others about performance assessment in general, and about
the CLAS in particular. Other opportunities to learn about the test
were made available through the California Mathematics Council and
its regional affiliates, various branches of the California Math
Projects, and assessment collaboratives in the state. Finally, the
state published in 1991 and 1993 “Samplers of Mathematics
Assessment” to help familiarize teachers with the novel
problems and formats of the new test.

Wherever teachers came into contact with the
new assessment, they had opportunities to examine student work
closely, to think about children’s mathematical thinking, and
to learn about the activities and understandings associated with
the state’s reform. Such work would have offered participants
elements of a “curriculum” of improved math teaching.
Simply administering the CLAS may have served as a curriculum for
many teachers, for it provided those unfamiliar with the frameworks
a chance to observe how children react to challenging math problems
and novel exercises and activities. In either event, the closer a
teacher’s contact with the test—via its administration
or by learning about it—the more likely he or she was to have
had both internal incentives to change and opportunities to
learn.

Our third question about testing was whether
the effects of the CLAS on teachers’ practice washed out the
effects of their workshop learning on practice. Table 8 shows that
they did not. When we ran models with only “administered
CLAS” and “learned about CLAS” (Table 8, Set I),
the coefficients on the curriculum workshop variables declined very
slightly. When we entered “CLAS useful” (Table 8, Set
II), the student curriculum coefficient declined a bit, suggesting
modest overlap between teachers’ learning about the CLAS and
learning from curriculum. But it was a small overlap: the
coefficients on “student curriculum workshops” remained
quite near their former size, and statistically
significant.

Teachers’ learning through student
curriculum workshops and their learning via the CLAS were more
independent than overlapping paths to framework-oriented practice.
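The logic of comparing a coefficient before and after a second predictor enters the model can be illustrated with the textbook closed-form solution for a two-predictor regression on standardized variables. This is a sketch under stated assumptions: the three correlations below are invented for illustration and are not estimates from Table 8, and the function name is hypothetical.

```python
def standardized_betas(r_y1, r_y2, r_12):
    """Standardized OLS coefficients for y regressed on x1 and x2.

    Uses the closed-form two-predictor solution:
        beta1 = (r_y1 - r_y2 * r_12) / (1 - r_12 ** 2)
        beta2 = (r_y2 - r_y1 * r_12) / (1 - r_12 ** 2)
    """
    if abs(r_12) >= 1:
        raise ValueError("predictors must not be perfectly correlated")
    denom = 1 - r_12 ** 2
    beta1 = (r_y1 - r_y2 * r_12) / denom
    beta2 = (r_y2 - r_y1 * r_12) / denom
    return beta1, beta2


# Hypothetical correlations (NOT taken from the study):
r_practice_workshop = 0.40   # framework practice with curriculum workshops
r_practice_clas = 0.30       # framework practice with "CLAS useful"
r_workshop_clas = 0.20       # curriculum workshops with "CLAS useful"

# With a single predictor, the standardized coefficient equals the correlation.
beta_workshop_alone = r_practice_workshop

beta_workshop, beta_clas = standardized_betas(
    r_practice_workshop, r_practice_clas, r_workshop_clas
)

# The drop in the workshop coefficient once the second predictor is added
# reflects the part of its association that the two predictors share.
decline = beta_workshop_alone - beta_workshop
```

With these made-up inputs the workshop coefficient falls only from 0.40 to about 0.35, which is the pattern of modest overlap between two largely independent paths.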
* * * *

These effects tend to support our conjecture
that teachers’ opportunities to learn can be a crucial link
between instructional policy and classroom practice. Many educators
believe that such links exist, but research generally has not
supported that belief. Our results suggest that one may expect such
links when teachers’ opportunities to learn
are:

• grounded in the curriculum that students study;
• connected to several elements of instruction, i.e., not only curriculum but also assessment;
• extended in time, possibly including follow-up during the school year.

Such opportunities are quite unusual in
American education, for professional development rarely has been
grounded either in the academic content of schooling or in
knowledge of students’ performance. That is probably why so
few studies of professional development report connections with
teachers’ practice, and why so many studies of instructional
policy report weak implementation: teachers’ work as learners
has not been tied to the academic content of their work with
students.

EFFECTS ON STUDENT ACHIEVEMENT

Reformers in

To explore this reasoning we merged student
scores on the 1994 fourth grade mathematics CLAS onto the school
files in our data set. The CLAS included a good deal of
“performance based assessment”; to do well, students
would have had to answer adequately a combination of open-ended and
multiple-choice items designed to tap their understanding of
mathematical problems and procedures. State scorers assigned
students a score of 1–6 based on their proficiency level, and
school scores were reported as “percent of students scoring
level 1,” and so on. We established an average of these for
each school to arrive at our dependent variable (CLAS), on which
higher school scores represent a more proficient student body. The
mean of CLAS in our sample of schools was 2.77, and the standard
deviation at the school level was .56. Because assessment
officials corrected problems that had turned up the year before,
the 1994 assessment was technically improved—all student
booklets were scored, and measurement problems reduced. Moreover,
it was administered in the spring of 1994, roughly six months
before this survey, so our estimates of teachers’ learning
opportunities and practice corresponded in time to the
assessment.

Despite good timing, we faced several
difficulties. Because the CDE reported only school-level scores, we
had to compute school averages of all independent variables,
including teachers’ reports of practice and opportunities to
learn. Using aggregate data tends to increase any relationship
found between variables, because aggregation is apt to reduce
random error in data. Yet the survey sampled only four or fewer
teachers per school, so the averages enable us to get only a crude
estimate of our independent measures. These measures of school
engagement with reform are therefore “error filled,”
that is, likely to bias the investigation against finding
significant results, since random “noise” in a measure
is known to diminish the estimated effects of the affected variables. Working
with school averages also reduced the size of the sample (n
= 161), for we deleted school files in which only one teacher
responded or which lacked CLAS scores.

We created three additional variables for each
school in the reduced sample. One is the 1994 report of the percent
of students in each school who qualified for free lunch (%FLE,
free-lunch-eligible), so we can allow for the influence of
students’ social class on test scores. The next is the school
average of teachers’ estimates of the school environment,
called “school conditions.” This consists of a
five-point scale that includes teacher reports on parental support,
student turnover, and the condition of facilities, with 5
indicating poorer conditions. Finally, we took teachers’
reports of the number of replacement units they used and averaged
them by school; the mean for this measure is .73, its standard
deviation .70. In addition to these three, we continued to use the
variables that mark other potential connections between policy and
practice, including time in student curriculum workshops, our
control for teachers’ past framework learning
experiences, teachers’ reports of framework practice, and
the CLAS-OTL measure, all averaged for schools. Table 10 shows the
school averages for all these measures.

The central issue in this analysis is whether
the evidence supports our model of relations between policy and
performance, but this question is difficult to handle empirically.
Reformers and researchers argue that the more actual overlap there
is among policy instruments, the more likely teachers, students,
and parents are to get the same messages and respond in ways that
are consistent with policy. But the more the overlap, the more
highly correlated any possible measures of those policy instruments
would be, and thus the greater the problems of multicollinearity.
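One common way to gauge this problem is the variance inflation factor. The sketch below covers only the two-predictor case, where the factor depends on a single correlation. The function name is illustrative; the .44 input is the correlation between replacement unit use and student curriculum workshop participation reported in the discussion of Table 11, and the .90 input is a hypothetical contrast.

```python
def variance_inflation_factor(r):
    """VIF for either predictor in a two-predictor regression.

    With two predictors, the R-squared from regressing one on the other is
    r ** 2, so VIF = 1 / (1 - r ** 2). A value of 1.0 means no inflation.
    """
    if abs(r) >= 1:
        raise ValueError("perfectly correlated predictors have no finite VIF")
    return 1.0 / (1.0 - r ** 2)


vif_reported = variance_inflation_factor(0.44)  # about 1.24
vif_strong = variance_inflation_factor(0.90)    # about 5.26, hypothetical
```

With more than two correlated predictors the factor depends on all of them jointly, so a pairwise figure like this can understate the inflation in a fuller model.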
The more successful agencies are at “aligning” the
instruments of a given policy, the more headaches analysts will
have in discerning the extent to which they operate jointly or
separately.

Table 11 displays some reasons for such
headaches, for it reveals that the correlations among the
independent variables of interest range from mild to moderately
strong. At the stronger end of this continuum, school average
incidence of using replacement units is correlated at .44 with the
school average teacher report of participation in the student
curriculum workshops within the previous year, and at .48 with
school average reports of framework practice. This makes sense,
since student curriculum workshops should provide teachers
replacement unit materials and know-how, and encourage them to
change their practices. At the weak end of the continuum, the school
average of teachers’ reports of learning about the CLAS is
correlated at only the .10 to .16 level with schools’ use of
replacement units, teachers’ reports of framework practice,
and their average participation in the student curriculum
workshops. Special topics workshops and conventional practices also
evidenced low correlations with other variables. Finally, school
average student performance on the math CLAS is correlated at the
.14 to .29 level with the markers for policy instruments that we
think may explain variation in these math scores.

Table 10. Basic data statistics for analysis of achievement and policy

Table 11. Intercorrelations among measures of policy “instruments” and math performance (school level)

With this knowledge, we built an analysis
strategy. Starting with a base equation including the demographic
measures, we tested our primary conjecture in this part of the
analysis: that changes in teacher practice will lead to
improvements in student performance. But because our practice scale
is an imperfect measure, tapping only one subset of the ways
instruction might improve, we also tested the separate effects of
each of the policy variables—teacher learning about the
CLAS, use of replacement units, and learning about the student
curriculum—on student achievement in successive equations.
These models will provide some overall impressions about the effect
of policy on student performance because each of the variables
roughly summarizes a type of intervention that policy makers or
others can organize. Yet the coefficient estimates in these three
models will be compromised by the high correlations among the
policy variables as evidenced in Table 11. Hence we devised a
second strategy: put all three policy variables in the base
equation at once to see if it is possible to sort out the
independent effects of new student curriculum, teacher learning,
and learning about the test on student achievement. If the second
method enables us to distinguish the relative importance of policy
variables, it would offer evidence about which paths to reform
might be most effective. Finally, we also want to know whether
these policy activities were independently influential in improving
student performance, or whether they operate through
teachers’ practice. So our third analysis strategy is to add
back our practice variable to this fuller model. We include the
demographic measures in all equations to control the influence of
social and economic status on student performance.

We start with teachers’ practice alone
because we have already shown that practice results, at least in part,
from some of the learning opportunities provided by reformers, and
because it provides the most logical link between policy
makers’ efforts to affect what happens in the classroom and
how students score on tests. Equation 1 (column 1) in Table 12
below shows a modest relationship: schools in which teachers report
classroom practice that is more oriented to the math frameworks
have higher average student scores on the fourth grade 1994 CLAS,
controlling for the demographic characteristics of schools. No such
relationship, however, was found between schools high on our
conventional practice scale and student achievement scores. This
provides evidence that teachers’ practice links the goals and
results of state policy: students benefited from having teachers
whose work was more closely tied to state instructional goals.
Though this interpretation is based on aggregate data, it is
difficult to think of any other reasonable inference than that
teachers’ opportunities to learn can pay off for their
students’ performance if the conditions summarized in our
model are satisfied.

Table 12. Associations between teachers’ practice, their learning, and student math scores
*indicates significance at p < .05 level **indicates significance at p < .15 level
Note: All survey-based measures are averages from the teachers within a school who responded.

The significant coefficient on
“framework practice” also helps to answer one possible
criticism of our earlier analysis, namely that the relationship
between workshop attendance and framework practice results from
teachers learning to “talk the talk” of reform rather
than making substantial changes in their classrooms. A critic might
argue that that relationship is an artifact of teachers’
rephrasing their descriptions of classroom work to be more
consistent with the reform lingo; in that critic’s scenario,
only the talk would be different, and classroom practice
would be the same. But if teachers learned only new talk, it is
difficult to imagine how schools with teachers who report more
framework-related instruction should post higher scores on the
CLAS. Thus the association between framework practice and student
scores seems to ratify the link between teacher and student
learning, and to imply that teachers are roughly doing what they
report. It also seems to indirectly confirm our earlier finding
that teachers who had substantial opportunities to learn did
substantially change their practice.

Our second model concerns the
effect of teachers’ learning on student achievement. Given
the analysis just above, we would expect a modest relationship
between teacher attendance at student curriculum workshops and CLAS
scores absent other things, for we have seen that teachers who attend
these workshops report more framework practice. Controlling for
teachers’ past framework learning, that relationship does
occur, as is evident from Model 2 in Table 12.

A more important query, perhaps, is the effect of teacher learning in the special topics workshops on student achievement. We saw earlier that this variable contributed little to explaining differences among teachers in framework or conventional mathematics practice. Hence any effect we might find on student achievement would be through pathways not detected by these scales, like increasing teacher knowledge, improving equity within classrooms, or helping teachers better understand student learning. But we found no such effect of special topics workshops on student achievement. This is a very important result: whatever improvements these workshops may bring to

The third component of the policy mix, the use
of replacement units, also shows a positive relationship to student
achievement. Model 3 indicates that schools in which teachers
report they use one replacement unit each have student test scores
which average about a fifth of a standard deviation higher than
schools in which no teachers reported replacement unit
use. Finally, we come to the effect on achievement
associated with teacher learning about the CLAS. The coefficient on
the CLASOTL (Model 4) suggests a clear effect: when comparing
student achievement scores, schools where all teachers learned
about the CLAS had student test scores that were roughly a third of
a standard deviation higher than schools where no teachers learned.
It is easier to report this result than to decide what it means.
The CLASOTL measure consists of the question asking teachers
whether they have had an opportunity to learn about the new test in
professional development, test piloting, scoring, and so forth. We
saw earlier that this kind of learning affected teachers’
practices under certain conditions, and that learning may then
translate into changed practice and improved student achievement.
But it also is possible that teachers prepared their students by
administering CLAS-like assessments, used performance-based
assessments year-round, or learned something more about mathematics
while learning about the CLAS. In principle, then, both our practice and
policy measures positively relate to student achievement. State
efforts to improve instruction can affect both teaching and
learning. But the relatively close relations among these markers
call the point estimates in these models into question, since
omitting any one variable will allow another to pick up its effects
via their correlation. So we ask next about the “true”
influence of each policy instrument on student achievement,
controlling for the effects of others: Do the three instruments of
policy exert their influence jointly, each having some independent
effect on performance, or does one dominate? This is an important
theoretical and practical question, for if one instrument were
overwhelmingly influential we would draw different inferences for
action than if several were jointly influential. To this end, we
entered the CLASOTL, student curriculum workshop, and replacement
unit markers into the CLAS regression along with the important
control variable, “past framework learning,” hoping we
had enough statistical power to sort among them. We did, and Model 5 offers a version of the
joint influence story. Schools in which all teachers reported using
an average of one replacement unit appeared a little less than a
fifth of a standard deviation higher in the distribution of CLAS
scores than schools where no replacement units were used. This
effect is modest, and statistically significant at the .15 level.
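The concern that motivates Model 5, namely that a model omitting one of several correlated policy markers lets the included marker absorb the omitted marker's effect, can be illustrated with a small simulation. The sketch below is not the analysis reported here: the data are synthetic, and the coefficients (0.2 and 0.3), the correlation between markers (0.6), and every function name are assumptions chosen only for illustration.

```python
import random

def cov(a, b):
    """Sample covariance of two equal-length lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / (n - 1)

def simple_slope(x, y):
    """OLS slope from regressing y on x alone."""
    return cov(x, y) / cov(x, x)

def joint_slopes(x1, x2, y):
    """OLS slopes from regressing y on x1 and x2 together (closed form)."""
    s11, s22, s12 = cov(x1, x1), cov(x2, x2), cov(x1, x2)
    s1y, s2y = cov(x1, y), cov(x2, y)
    det = s11 * s22 - s12 ** 2
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det

def simulate(n=20000, rho=0.6, b1=0.2, b2=0.3, seed=0):
    """Synthetic 'schools' with two correlated policy markers (assumed values)."""
    rng = random.Random(seed)
    x1, x2, y = [], [], []
    for _ in range(n):
        shared = rng.gauss(0, 1)  # common component that makes the markers correlate
        m1 = rho ** 0.5 * shared + (1 - rho) ** 0.5 * rng.gauss(0, 1)
        m2 = rho ** 0.5 * shared + (1 - rho) ** 0.5 * rng.gauss(0, 1)
        x1.append(m1)
        x2.append(m2)
        y.append(b1 * m1 + b2 * m2 + rng.gauss(0, 1))
    return x1, x2, y

x1, x2, y = simulate()
alone = simple_slope(x1, y)         # marker 1 entered by itself
together = joint_slopes(x1, x2, y)  # both markers entered jointly
print(f"marker 1 alone:  {alone:.2f}")
print(f"markers jointly: {together[0]:.2f}, {together[1]:.2f}")
```

Under these assumed values the one-predictor slope should approach 0.2 + 0.3 × 0.6 = 0.38, while the joint model should recover values near 0.2 and 0.3. The closer the correlation is to 1, the less precisely the joint model can separate the two, which is the difficulty that collinear survey measures pose.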
Teacher learning in student curriculum workshops
“added” less power to student learning than did
replacement unit use. But this is what one would expect; if
teachers’ participation in curriculum workshops were to have
an effect on students’ performance, that effect would be
exerted both through what teachers learned about materials,
mathematics, and other things, and through the materials they used
with students. Hence we would expect the curriculum marker to be
linked to student performance and to pick up some of the workshop
effect. In addition, schools in which teachers had
opportunities to learn about the CLAS itself continued to post
scores about a third of a standard deviation higher than schools in
which teachers did not. Two of the three interventions organized by
reformers were associated with higher student scores on the
CLAS. One reason these policy variables might appear
significant in this equation is that they might correlate with
framework practice. If instructional policy is to improve student
achievement, it must do so to some extent through changes in
teacher practice, for students will not learn more simply because
teachers know different things about mathematics or have
been exposed to new curricula or tests. Instructional interventions
like those studied here must change what teachers do in the
classroom—including what they do with curricula and
tests—even if very subtly, in order to affect student
understanding. Teachers who used new curricula but understood
nothing about how to use them might have some students who were so
capable and motivated as to learn from the materials alone, but
there is no reason to expect general effects of curriculum alone on
student learning. Following this reasoning—and assuming that
we had measured framework practice perfectly—adding the
measure of framework practice to Model 5 should result in that
variable gathering an effect and zeroing out the three policy
measures. Model 6 reveals some but not all of that
result. “Framework practice” and “learned about
CLAS” retain some modestly significant effect on CLAS scores,
but replacement unit use and student curriculum workshops edge
closer to zero. Notably, the coefficient on our measure of
framework practice is cut by about a third, indicating it
“shares” variance with student curriculum workshops and
replacement unit use. But we do not imagine we have discovered a
hitherto unnoticed magical effect of teacher knowledge or
curriculum use on student achievement. Instead, we are inclined to
stick to our learning-practice-learning story. One reason is that
the three variables that split variance are the most collinear,
suggesting both that the regression algorithm will have difficulty
sorting among their effects, and that we might do better to
conceive of the three as a package, rather than as independent
units. An F-test of the three policy variables (student curriculum
use; workshops; and CLAS learning) finds them jointly
significant. A second reason is that our practice scale is
imperfect. Recall the types of items that comprise this measure:
students do problems that have more than one correct solution;
students make conjectures; students work in small groups. While
these represent one aspect of the ways teachers’ practices
may change due to reformers’ efforts, they don’t
represent others, such as the changes in practice that might occur
when a teacher’s understanding of mathematics deepens, when
teachers understand student learning differently, when they
reconceive assessment, or when their pedagogical content knowledge
increases. It is hard to imagine these interventions not teaching
teachers some of these things, yet these dimensions of instruction
are omitted from the framework practice scale. Hence if, as we
expect, they do affect student achievement, they would be picked up
by the policy variables in Model 6. Model 6 teaches us as much
about issues in survey research in instructional policy as it does
about the pathways to improved student achievement.

CONCLUSION

We began this article by sketching an
instructional view of instructional policy. We argued that policy
makers increasingly seek to improve student achievement by
manipulating elements of instruction, including assessment,
curriculum, and teachers’ knowledge and practice. To do so
requires the deployment of a range of instruments that are specific
to instructional policy, including student curriculum, assessments,
and teachers’ opportunities to learn. Because these
instruments’ effects would depend in considerable part on
professionals’ learning, both teachers’ knowledge and
practice and their opportunities to learn would be key to such
policies’ effects. We proposed a rudimentary model of this
sort, in which students’ achievement was the ultimate
dependent measure of the effects of instructional policy, and in
which teachers’ practice was both an intermediate dependent
measure of policy enactment and a direct influence on
students’ performance. Teachers figure as a key connection
between policy and practice, and teachers’ opportunities to
learn what the policy implies for instruction are both a crucial
influence on their practice, and at least an indirect influence on
student achievement. The results we reported seem to bear out the
model’s usefulness. We were able to operationalize measures
of each important element, and the relationships we hypothesized
did exist. Teachers’ opportunities to learn about reform do
affect their knowledge and practices; when those opportunities were
situated in curriculum that was designed to be consistent with the
reforms, and which their students studied, teachers reported
practice that was significantly closer to the aims of the policy.
There was a consistent relationship among the professional
curriculum of reform, the purposes of policy, assessment and
teachers’ knowledge of assessment, and the student
curriculum. Since the assessment of students’ performance was
consistent with the student and teacher curriculum, teachers’
opportunities to learn paid off for students’ math
performance. This confirms the analytic usefulness of an
instructional model of instructional policy, and suggests the
potent role that professional education could play in efforts to
improve public education. It has been relatively unusual for researchers
to investigate the relations between teachers’ and
students’ learning, and when they did so it has been even
more unusual to find evidence that teachers’ learning
influenced students’ learning. But a few recent studies are
consistent with our results. Wiley and Yoon (1995) investigated the
impact of teachers’ learning opportunities on student
performance on the 1993 CLAS, and found higher student achievement
when teachers had extended opportunities to learn about mathematics
curriculum and instruction. Brown, Smith, and Stein (1996) analyzed
teacher learning, practice, and student achievement data collected
from four QUASAR project schools, and found that students had
higher scores when teachers had more opportunities to study a
coherent curriculum designed to enhance both teacher and student
learning. In a meta-analysis of the features of professional
development that affect teachers’ practice and student
achievement, Kennedy (1998) found the content for teacher
learning—e.g., subject matter or student learning—was
an important predictor of student achievement. Format, such as
distributed hours or longer opportunities, was not. All of these studies suggest that when
educational improvement is focused on learning and teaching
academic content, and when curriculum for improving teaching
overlaps with curriculum and assessment for students, teaching
practice and student performance are likely to improve. Under such
circumstances educational policy can be an instrument for improving
teaching and learning. Policies that do not meet these
conditions—like new assessments or curricula that do not
offer teachers adequate opportunities to learn, or professional
development that is not grounded in academic content—are less
likely to have constructive effects.

These points have important bearing for the professional development system, or nonsystem. Professional development that is fragmented, is not focused on curriculum for students, and does not afford teachers consequential opportunities to learn cannot be expected to be a constructive agent of state or local policy. Yet that seems to be the nature of most professional development in the

Elements of this analysis also seem to confirm
some of the arguments for standards-based reform, particularly
those that stress the importance of consistency among the elements
of instructional policy, and the importance of professional
learning. But the analysis goes beyond most enacted versions of
standards-based reform, for it suggests the value of school
improvement that integrates better curriculum for students, makes
suitable provision for teachers to learn that curriculum, focuses
teaching on learning, and thoughtfully links curriculum and
assessment to teaching. Most examples of standards-based reform do
not meet these criteria; some other approaches to school
improvement that do not fly the banner of standards-based reform do
meet the criteria.

It also is worth noting that ours is not a story in which state agencies carried the day. Rather, the related actions of government and professional organizations were crucial. When this diverse set of agencies worked together it was able to create coherent relationships among teachers’ opportunities to learn, their practice, school curriculum and assessments, and student achievement. But such relationships were not easy to organize. Our evidence shows that reformers in

This returns us to the opening of this essay, where we distinguished between life at and below the surface of policy. At the surface, in debates about math reform that roiled

But all of this rests on nonexperimental
evidence, which is not conclusive. At the very least, the
relationships that we have reported should be investigated with a
larger population of schools and teachers in a longitudinal format,
so that more robust causal attributions might be probed, and more
precise measures employed. Better yet, experiments could be
designed to clarify both the mechanisms and effects of teacher
learning. But the results that we reported do not come from left
field: they seem reasonably robust, and are quite consistent with
several related lines of recent research. Though we think better
research on these issues is essential, we would be surprised if the
direction of the effects we have found, and our model of causation,
do not stand up in a more powerful design. We have no hesitation in
writing that policy makers and practitioners would be well advised
to more solidly ground teachers’ professional education in
deeper knowledge of the student curriculum, or that it would be
wise, when new curricula and assessments are being designed, to
make much more adequate provision for teachers to acquaint
themselves with and learn from them.

APPENDIX A: SELECTIVITY

A two-stage least-squares procedure was performed on the
student curriculum models to help mitigate “selection
effects,” that is, the possibility that teachers who
attended one of the workshops did so because they were somehow
predisposed to teach to the Frameworks. To do this, we identified
variables that affect the probability teachers would attend a
Marilyn Burns or replacement unit workshop and estimated a probit
equation representing that relationship. We then took the predicted
values from this first equation and used them instead of the
Student Curriculum Workshop (SCW) marker in the student curriculum
models. In order to resolve endogeneity problems, we
needed to identify factors that affect the probability a unit will
select into the “treatment” condition but that do not
affect the final outcome variable. We searched for factors that
might encourage teachers to take these workshops but would not have
a direct effect on their practice. Using both theory and empirical
investigation, we identified three such factors:

• Policy networks, a variable marking teacher attendance at national or regional mathematics association meetings. Such participation should affect teacher practice if the content of meetings focuses on substantive matters of instruction and mathematics; where the focus is administrative or political matters, practice is less likely to be affected (Lichtenstein et al., 1992). The content of

• District development, a variable marking teacher participation in district mathematics committees or in teaching math in-services. Again, knowledge of the content of those activities is key to understanding whether this should affect teacher practice or not. In the absence of this information, however, we proceed on the basis of results from a regression analysis that shows this marker unrelated to teacher practices.

• Administrative support, a three-item measure of teachers’ reports of the extent to which their principal, school, and district are well-informed and favorable toward the Frameworks. One item specifically asked about the amount of staff development supplied by the district. School and district instructional policy, however, is not thought to have great direct impact on teacher practice, and this measure has no direct effect on our practice scales.

We also chose to include two more variables in
the first-stage selection equations (teacher affect toward the
reforms, and teacher familiarity with the reforms) on the view
that these markers might indicate teacher desire to learn about
both the reforms and children’s curriculum as a vehicle for
those reforms. To the extent these capture teacher
“will,” they will act as important
controls.

Teachers’ reports on all these measures
were entered into the first-stage probit equation predicting
whether or not teachers attended an SCW activity in 1993–1994
(Table A1):

$$\text{attendSCW} = b_0 + b_1\,\text{affect} + b_2\,\text{familiar} + b_3\,\text{policy} + b_4\,\text{district} + b_5\,\text{adminsup}$$

All five proved moderately strong predictors
variables—teacher math background, classes in mathematics
teaching, student race, and class—were examined but yielded
weaker or nonexistent relationships to workshop attendance. It is
noteworthy that teacher affect toward the reforms is outperformed
by other predictors in this equation.

Based on the first-stage model above, a
predicted level of SCW (zero or one) was generated using the probit
model for each observation in the sample, and this predicted value
was entered into a pair of practice equations similar to those in
Table 6 (see Table A2):

$$\text{practice} = b_0 + b_1\,\text{SCW} + b_2\,\text{affect} + b_3\,\text{familiar}$$

Table A1. First-stage regression predicting student curriculum workshop attendance
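The logic of the two-stage procedure described above can be sketched with synthetic data. This is a minimal illustration, not the estimation reported in this appendix: it swaps in a linear first stage and a continuous attendance measure in place of the probit and the zero-or-one SCW marker, uses a single made-up instrument (labeled `support`), and assumes a true effect of 0.5. All names and values are hypothetical.

```python
import random

def mean(v):
    return sum(v) / len(v)

def cov(a, b):
    """Sample covariance of two equal-length lists."""
    ma, mb = mean(a), mean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)

def ols_slope(x, y):
    """OLS slope from regressing y on x."""
    return cov(x, y) / cov(x, x)

def two_stage_slope(z, x, y):
    """Stage 1: predict x from instrument z. Stage 2: regress y on predicted x."""
    g = ols_slope(z, x)
    mz, mx = mean(z), mean(x)
    x_hat = [mx + g * (zi - mz) for zi in z]  # first-stage predicted values
    return ols_slope(x_hat, y)

def simulate(n=20000, true_effect=0.5, seed=1):
    """Synthetic teachers; every value here is an assumption for illustration."""
    rng = random.Random(seed)
    z, x, y = [], [], []
    for _ in range(n):
        predisposition = rng.gauss(0, 1)  # unobserved; drives attendance and practice
        support = rng.gauss(0, 1)         # instrument: affects attendance only
        attend = support + predisposition + rng.gauss(0, 1)
        practice = true_effect * attend + predisposition + rng.gauss(0, 1)
        z.append(support)
        x.append(attend)
        y.append(practice)
    return z, x, y

z, x, y = simulate()
naive = ols_slope(x, y)               # ignores selection on predisposition
corrected = two_stage_slope(z, x, y)  # uses only instrument-driven variation
print(f"naive: {naive:.2f}  two-stage: {corrected:.2f}")
```

In this construction the naive slope is biased upward because predisposition raises both attendance and practice, while the two-stage slope should land near the assumed 0.5. The direction of the bias is a property of the simulation only and implies nothing about the direction of any bias in the survey data. The sketch also omits standard errors, which the two-stage approach tends to enlarge.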
Here the coefficient on a dummy SCW variable
increases from .54 (SE = .06) to 1.04 (SE = .19) in
the framework practice regression. The increase in the coefficient
is likely due to the decreased precision with which our statistical
package can estimate the two-stage equation rather than to
substantive differences in its real value. The coefficient on a
dummy SCW variable in the conventional practice regression likewise
dropped from −.28 (SE = .07) to −.74 (SE = .22), and its
standard error likewise rose.

Table A2. Predicting teachers’ practice from “corrected” student curriculum workshop attendance
Despite the decrease in precision with which
we could estimate both equations, we note that both measures of SCW
remain significant and related to the dependent variables in the
expected direction. The same procedure was carried out for the
regressions using the variable “time in student curriculum
workshop” instead of the simple dummy SCW. Similar results
obtained.

This method, specifying a two-stage model
in which the first stage is a probit, tends to inflate
standard errors for the regressors in the model. Because our
regressors remained significant predictors of teacher practice
outcomes, however, we did not pursue methods to correct this
problem.

APPENDIX B: CHANGE IN COEFFICIENTS IN TABLE 8

Clogg, Petkova, and Haritou’s (1995)
test for differences in coefficients between nested models compares point estimates
from models with and without one predictor or a set of predictors. Point
estimates for the variable in question (here, student
curriculum workshop) are compared with and without the
“competing” explanatory variable(s) in the equation to
see if the difference in its “effect” is significant,
and thus warrants a claim that the regression is in fact
incorrect without the competitor variable included. Here we
examined the point estimate on student curriculum workshop
with and without the CLAS variables (CLAS Useful, CLASOTL, and
CLASADM, administering the CLAS) in the equation
(Table B1; see note 26). For more details, see Clogg, Petkova, and
Haritou (1995).

Table B1. Clogg, Petkova, Haritou test

APPENDIX C: HIERARCHICAL LINEAR MODELING ADJUSTMENT FOR MEASUREMENT ERROR

We improved the estimates in our CLAS models
by employing (a) knowledge of the variances and covariances of the
observed data, (b) assumptions
regarding the nature of the variances and
covariances of the measurement error, and (c) multilevel modeling
using hierarchical linear modeling (HLM) software. These
assumptions maintain that errors of measurement are independent of
one another and of the true values.

The intuition behind the model is this: given
that we have observed “error filled” estimates of our
dependent and independent variables, and that we also have some
knowledge of the error variance of each of these variables
(obtained through HLM for all independent variables; for CLAS, the
variance in student scores), we can obtain the “true”
point estimates for the model. The math for the model that includes
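SCW follows:

$$
\begin{aligned}
\text{CLAS} &= b_{1} + e_{1} \\
\text{SCW} &= b_{2} + e_{2} \\
b_{1} &= \gamma_{10} + \gamma_{11}\,\%\text{FLE} + \gamma_{12}\,\text{PastOTL} + u_{1} \\
b_{2} &= \gamma_{20} + u_{2}
\end{aligned}
$$

$$
E\begin{bmatrix} b_{1} \\ b_{2} \end{bmatrix} =
\begin{bmatrix} \gamma_{10} + \gamma_{11}\,\%\text{FLE} + \gamma_{12}\,\text{PastOTL} \\ \gamma_{20} \end{bmatrix},
\qquad
\operatorname{var}\begin{bmatrix} b_{1} \\ b_{2} \end{bmatrix} =
\begin{bmatrix} \tau_{11} & \tau_{12} \\ \tau_{21} & \tau_{22} \end{bmatrix}
$$

$$
E(b_{1} \mid b_{2}) = \gamma_{10} + \gamma_{11}\,\%\text{FLE} + \gamma_{12}\,\text{PastOTL} + \frac{\tau_{12}}{\tau_{22}}\,(b_{2} - \gamma_{20}) + S,
\qquad
\operatorname{var}(S) = \tau_{11} - \tau_{12}\,\tau_{22}^{-1}\,\tau_{21}
$$

Level 1 observations are weighted by the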
variance of the measure. The equations for our other independent
variables are similar. Due to multicollinearity, we could only test
three- and four-variable models. We did so with an expanded sample,
however, since using HLM allowed us to readmit into the sample
schools where only one teacher answered the survey (N =
199). The results of the models are listed in Table
C1.

Most coefficients increase to between three
and seven times their previous size, an outcome consistent with the
observation that estimated coefficients are attenuated relative to
their “true” values, by an amount that depends on each variable’s
reliability, when measurement error is present. The
reliabilities of these measures (essentially a measure of
agreement between teachers within schools) range between .15
(replacement unit use) and .50 (conventional practice). Standard
errors are likewise affected. Since
they are also functions of a measure’s
reliability at the school level, they also inflate when the
measurement error fix is applied. Thus we achieve with this method
more realistic point estimates, but cannot decrease the size of the
confidence interval around these estimates.

Table C1. HLM models
* indicates significance at p < .05
** indicates significance at p < .10
*** These models were created before the variable “school conditions” was reversed (to make it interpretable in the same way as “percent FLE”). Thus in these models, higher scores on “school conditions” represent better teacher estimates of their working climate.
**** CLAS OTL was not “improved” because it was, at the teacher level, a 0–1 variable.

Apart from the size of actual estimates
themselves, there are several other things to note in Table C1.
Teachers’ average reports of conventional practice continue
to be statistically unrelated to student achievement on the CLAS,
as does teachers’ time spent in special topics workshops.
Though the coefficient on the latter does increase in size, it
remains well below the level acceptable within conventional
significance testing. Also, when controlling for replacement unit
use and student curriculum workshop learning, the effect on
“school conditions” disappears, suggesting it proxies
for these variables when they are more roughly measured. Finally,
the coefficient on the replacement unit variable does
increase—but is swamped by the increase in its standard
error.

This paper is part of a continuing study of the origins and enactment of the reforms and their effects that was carried out by Deborah Loewenberg Ball, David K. Cohen, Penelope Peterson, Suzanne Wilson, and a group of associated researchers at

An earlier version of this paper was
presented at the 1997 meeting of the American Educational Research
Association, and draws on a larger book manuscript. We thank Gary
Natriello at Teachers College Record and an anonymous reviewer for
helpful comments. For such comments on earlier versions we also
thank Richard Murnane and Susan Moffitt.

Notes

In the 1970s and early 1980s, in response to worries about relaxed standards and weak performance by disadvantaged students, states and the federal government pressed basic skills instruction on schools, supporting the idea with technical assistance, and enforcing it with standardized “minimum competency” tests. Those tests were

In

This was not the first bad news about student performance in

One investigation carried out was the RAND Change Agent studies (Berman & McLaughlin, 1978); another was Richard Elmore’s use of what he calls backward mapping (1979); a third was made by Pressman and Wildavsky in their book on implementation (1984); a fourth was done by a research group at Michigan State University led by Andrew Porter, which found that teachers were “policy brokers.” Michael Lipsky’s book Street-Level Bureaucracy (1980) offers one of the few efforts at extended explanation of policy failures from a perspective of practice.

Because this was a one-time survey, this analysis faces several problems. To start, selection effects threaten any causal claims we might make; we deal with this extensively below. Further, variables constructed from a single survey tend to be more highly intercorrelated than independently derived estimates.

The survey was designed by Ball, Cohen, Peterson and

As is often the case with factor analyses, the “results” are dependent on statistical specifications. When different types of factor analyses turned up conflicting results for specific items, theoretical judgments were made concerning where those items belonged. In the main, however, every factor analysis run turned up two dimensions: conventional and “framework” practice.

It is common in workshops like EQUALS and cooperative learning for teachers to engage in mathematical activities they may then bring back to their classes to “try out.” We feel it is important to distinguish these activities, which tend to be short exercises intended to motivate students or introduce them to a topic, from the kind of curriculum offered by a replacement unit.

Iris Weiss’ 1993 National Survey of Science and Mathematics Education suggests that teachers in

The survey asked teachers to circle an amount of time ranging from “one day or less” to “more than two weeks” rather than write down the number of days they spent at each activity. To calculate time spent, we assumed the following: “None” = 0 days; “One day or less” = 1 day; “2–6 days” = 4 days; “1–2 weeks” = 7.5 days; “More than 2 weeks” = 14 days. We then added up the teachers’ reports of workshop attendance.

A number of respondents in this category, for instance, reported using replacement units, indicating they had perhaps attended a replacement unit workshop in a past year.

Our hypothesis is not that knowing of broad policy objectives will, ceteris paribus, lead teachers to greater classroom enactment; knowledge of broad policy prescriptions is not the same as practice, many of these practices require learning and resources, and the scale of familiarity does not measure knowledge deeply.

We say “smaller scale” because that is what we have found; familiarity with reform has a stronger influence on teachers’ beliefs than on their practice.

When we run the models in Table 5 without “affect” and “familiar” with controls, the size of the coefficients on the student curriculum workshop variables increases.

Instead, teachers’ familiarity with reform, work in reform networks, and their administrators’ support for reform were the strongest predictors of participation in student curriculum workshops.

The question asked if teachers “. . .
participated in any activities that provided [them] with information about the CLAS (e.g., task development, scoring, pilot testing, staff development).”

Making the four survey items in Table 9 into a dependent measure and regressing it on “administered CLAS” and “learned about CLAS” show that both learning and doing add about the same amount of “enthusiasm” to teachers’ responses.

This scale runs from 1 (CLAS did not correspond . . . etc.) to 5 (CLAS corresponded well . . . etc.). Its mean is 3.24, its standard deviation 1.02, and its reliability .85.

According to the test suggested by Clogg, Petkova, and Haritou (1995), the change in the student curriculum workshop coefficient is not significant in the conventional practice model (t = 1.74), but significant in the framework practice model (t = 6.26). See Appendix B for details.

The same statistics for all elementary schools in the state are:

N      Mean       Std Dev    Minimum  Maximum
4228   2.8135951  0.6242373  0        5.0200000

The student-level standard deviation for our sample (constructed from schools’ reports of student distributions) is 1.728.

To the extent teachers’ workshop learning occurred in the summer of 1994 (after the test) we could underestimate the effect of these workshops on student learning.

The CLAS scores also have some measurement error, most of it consistent with the usual problems associated with psychometric research. Also, the CDE reports that schools’ CLAS scores were not reported in the case where “error” in the score crossed above a threshold of acceptability, the number of students on which the score was based was low, or the number of students who opted out of taking the test was too high.

We compared schools that we did use in the CLAS analysis against those we could not use because they had missing school scores, had only one teacher who responded to our survey, or were unusable for some other reason. Of our independent variables, significant differences between the two groups occurred in only a handful of cases: schools with CLAS data tended to have fewer free-lunch-eligible students; schools with CLAS data tended to have teachers who reported more opportunities to learn about the assessment, were more likely to have teachers who said they had administered the test, and had higher scores on the CLAS useful scale; schools with CLAS data also had more teachers, on average, who attended student curriculum workshops, although there is no significant difference in the “time” correlate of this variable used in the CLAS analysis.

We include this variable in our equations because educational environments are not perfectly correlated with student socioeconomic status; some schools enrolling many FLE children, for example, have teachers who report quite orderly environments, with lots of parental support and good building facilities.
The scale items are: [Q: How well does each of the following statements describe general conditions and resources for mathematics teaching in your classroom, school, and district?] 1. Adequate parent support of your instruction; 2. High student turnover during the school year; 3. Well-maintained school facilities.

This variable is underspecified, but not including it biases the coefficients on the remaining variables, since teachers with some past opportunities to learn would be marked as zero, and the baseline would be off.

This correlation probably would rise if we had career-long estimates of teachers’ attendance at student curriculum workshops.

We tried both “Learned about CLAS” and “CLAS Useful” in this model, since both could be measures of teachers’ attempts to prepare students for the test. “CLAS Useful” was not significant, and evidenced collinearity with “framework practice.”

There is reason to expect that the coefficient on student curriculum time in this model, and elsewhere, is actually underestimated. Remember that the survey asked teachers to report workshop learning of this type within the last year, leaving teachers who attended student curriculum workshops in past years and now use replacement units represented by only the replacement unit marker. This will bias the effect of replacement unit use up, and student curriculum time down.

Because we had at most four teachers from whom to construct each school’s profile, all estimates in Table 11 are attenuated by measurement error. We attempted to correct these models for this error using a latent variable approach and a prototype version of the hierarchical linear modeling software. By employing information generated by that program about the variance and covariance of both errors and variables, we were able to arrive at better point estimates for models with one independent regressor.
The size of the “effect” on the student curriculum time marker, for instance, rises to .43 from .065 when holding percent FLE and school conditions constant. But though we have arrived at this more reasonable point estimate, we cannot be correspondingly more sure of its accuracy, as standard errors do not also improve with this method. Further, our attempts to test fuller models including competing policy and practice variables were frustrated by the low reliability and multicollinearity of the survey measures. For these reasons, we leave both the technical details and model presentation to Appendix C. We are indebted to Steve Raudenbush for masterminding this fix for measurement error, but reserve the right to claim all mistakes as our own.

These studies are supported indirectly by other work on opportunity to learn, including Cooley and Leinhardt’s Instructional Dimensions Study (Leinhardt & Seewaldt, 1981; Linn, 1983), other research concerning the significance of time on task, and studies of the relationship between the purposes and content of instruction (Barr & Dreeben, 1983; Berliner, 1979).

Efforts to improve schools typically have focused on only one or another of the influences that we discussed. Challenging curricula have failed to broadly influence teaching and learning at least partly because teachers had few opportunities to learn and improve their practice (Dow, 1991). Countless efforts to change teachers’ practices in various types of professional development have been unrelated to central features of the curriculum that students would study, and have yielded no evidence of effect on students’ learning. Many efforts to “drive” instruction by using “high-stakes” tests failed either to link the tests to the student curriculum or to offer teachers substantial opportunities to learn.
These and other interventions assume that working on one of the many elements that shape instruction will affect all the others, but lacking rational relationships among at least several of the key influences, that assumption seems likely to remain unwarranted.

We have profited from reading portions of a book manuscript that Suzanne Wilson is at work on concerning educators learning in and from the

One issue that this research does not reach is the effects of compulsory assignment to improved professional development. While we are fairly confident that the results reported here are not the result of self-selection, we have no evidence on which teachers in our sample were assigned to professional development by superiors. Hence we could not probe the effects of the influences that we discussed on teachers who would initially not volunteer for education in the replacement units or for other elements of the reforms.

We have so far only performed the check for
“administrative support” in SAS; a more proper
estimation technique might be hierarchical linear modeling, given
that this is a school- or district-level variable. It would be
surprising, given the very low coefficient on this variable, if HLM
changed the results to any great extent. There is also an argument
for the view that different communities of support exist within the
same schools, and therefore the individual-level measure is
more appropriate.

References

Ball, D. L., & Rundquist, S. S. (1993). Collaboration as context for joining teacher learning with learning about teaching. In D. Cohen, M. McLaughlin, & J. Talbert (Eds.), Teaching for understanding (pp. 13–42).
Barr, R., & Dreeben, R. (1983). How schools work.
Berliner, D. (1979). Tempus educare. In P. Peterson & H. Walberg (Eds.), Research on teaching: Concepts, findings, and implications (pp. 120–135).
Berman, P., & McLaughlin, M. W. (1978). Federal programs supporting educational change. Vol. VIII: Implementation and sustaining innovations.
Brown, C. A., Smith, M. S., & Stein, M. K. (1996, April). Linking teacher support to enhanced classroom instruction. Paper presented at the annual meeting of the American Educational Research Association,
Clogg, C. C., Petkova, E., & Haritou, A. (1995). Statistical methods for comparing regression coefficients between models. American Journal of Sociology, 100(5), 1261–1312.
Cohen, D. K. (1989). Teaching practice: Plus ça change.... In P. W. Jackson (Ed.), Contributing to educational change: Perspectives on research and practice (pp. 27–84).
Cohen, D. K., & Spillane, J. P. (1992). Policy and practice: The relations between governance and instruction. Review of Research in Education, 18, 3–49.
Corcoran, T. B. (1995). Helping teachers teach well: Transforming professional development.
Cuban, L. (1984). How teachers taught: Constancy and change in American classrooms, 1890–1980.
Dow, P. B. (1991). Schoolhouse politics: Lessons from the Sputnik Era.
Elmore, R. F. (1979). Backward mapping: Implementation research and policy decisions. Political Science Quarterly, 94(4), 601–616.
Guthrie, J. W. (Ed.). (1990). Educational Evaluation and Policy Analysis, 12(3).
Heaton, R. M., & Lampert, M. (1993). Learning to hear voices: Inventing a new pedagogy of teacher education. In D. Cohen, M. McLaughlin, & J. Talbert (Eds.), Teaching for understanding (pp. 43–83).
Kennedy, M. M. (1998, April). Form and substance in inservice teacher education. Paper presented at the annual meeting of the American Educational Research Association,
Kirst, M. W., & Mazzeo, C. (1996). The rise, fall, and rise of state assessment in
Lash, A., Perry, R., & Talbert, J. (1996). Survey of elementary mathematics education in
Leinhardt, G., & Seewaldt, A. M. (1981). Overlap: What’s tested, what’s taught. Journal of Educational Measurement, 18(2), 85–95.
Linn, R. L. (1983). Testing and instruction: Links and distinctions. Journal of Educational Measurement, 20(2), 179–189.
Lipsky, M. (1980). Street level bureaucracy: Dilemmas of the individual in public services.
Little, J. W. (1989). District policy choices and teachers’ professional development opportunities. Educational Evaluation and Policy Analysis, 11(2), 165–179.
Little, J. W. (1993). Teachers’ professional development in a climate of educational reform. Educational Evaluation and Policy Analysis, 15(2), 129–151.
Lord, B. (1994). Teachers’ professional development: Critical colleagueship and the role of professional communities. In N. Cobb (Ed.), The future of education: Perspectives on national standards in education (pp. 175–204).
Mayer, D. P. (1999). Measuring instructional practice: Can policymakers trust survey data? Educational Evaluation and Policy Analysis, 21(1), 29–45.
McCarthy, S. J., & Peterson, P. L. (1993). Creating classroom practice within the context of a restructured professional development school. In D. Cohen, M. McLaughlin, & J. Talbert (Eds.), Teaching for understanding (pp. 130–166).
McLaughlin, M. W. (1991). Enabling professional development: What have we learned? In A. Lieberman & L. Miller (Eds.), Staff development for education in the ’90s (pp. 61–82).
Murnane, R. J., & Raizen, S. A. (1988). Improving indicators of the quality of science and mathematics education in grades K–12.
O’Day, J., & Smith, M. (1993). Systemic reform and educational opportunity. In
Pressman, J. L., & Wildavsky, A. B. (1984). Implementation.
Schifter, D., & Fosnot, C. T. (1993). Reconstructing mathematics education.
Schwille, J., Porter, A., Floden, R., Freeman, D., Knapp, L., Kuhs, T., & Schmidt, W. (1983). Teachers as policy brokers in the content of elementary school mathematics. In L. Shulman & G. Sykes (Eds.), Handbook of teaching and policy (pp. 370–391).
Shavelson, R. J., McDonnell, L., Oakes, J., & Carey, N. (1987). Indicator systems for monitoring mathematics and science education.
Weiss,
Welch, W. W. (1979). Twenty years of science curriculum development: A look back. In D. C. Berliner (Ed.), Review of research in education (Vol. 7, pp. 282–306).
Wiley, D., & Yoon, B. (1995). Teacher reports of opportunity to learn: Analyses of the 1993
Wilson, S. M., Miller, C., & Yerkes, C. (1993). Deeply rooted change: A tale of teaching adventurously. In D. Cohen, M. McLaughlin, & J. Talbert (Eds.), Teaching for understanding (pp. 84–129).

DAVID K. COHEN is John Dewey Collegiate Professor of Education and Professor of Public Policy at the


