
Ambitious Pedagogy by Novice Teachers: Who Benefits From Tool-Supported Collaborative Inquiry into Practice and Why?

by Mark Windschitl, Jessica Thompson & Melissa Braaten - 2011

Background/Context: The collegial analysis of student work artifacts has been effective in advancing the practice of experienced teachers; however, the use of such strategies as a centerpiece for induction has not been explored, nor has the development of tool systems to support such activity with novices.

Purpose/Objective: We tested the hypothesis that first-year teachers could take up forms of ambitious pedagogy under the following conditions: 1) that reform-based practices introduced in teacher preparation would be the focus of collaborative inquiry throughout the first year of teaching, 2) that participants use analyses of their students’ work as the basis of critique and change in practice, and 3) that special tools be employed that help participants hypothesize about relationships between instruction and student performance.

Participants: Eleven secondary science teachers engaged in tool-supported collegial analysis of their students’ work over two years, spanning pre-service and in-service contexts.

Research Design: We used a qualitative multi-case study approach, incorporating videotapes of collaborative inquiry (CFG) sessions, classroom observations, student-created artifacts, interviews, and field notes. The primary cases were of the CFG sessions themselves. Analysis entailed identifying patterns of participation across CFG sessions and changes in classroom practice during induction.

Findings: More than one third of the group developed elements of expert-like teaching, with the greatest gains made in pressing their students for evidence-based scientific explanations, a practice that was the focus of their regular examinations of student work. For a majority—those who initially held the most problematized images of the relationships between teaching and learning—the system of tools (rubrics and protocol) was critical in allowing deep analyses of students’ work and supporting a shared language that catalyzed conversations linking “what counts” as scientific explanation with the re-calibration of expectations for students. This, in turn, helped participants envision more specialized forms of scaffolding for learners.

Conclusions: Those who begin their careers with a problematized view of the relationships between teaching and learning are not only more likely to appropriate sophisticated practices early, but also to benefit from evidence-based collaborative inquiry into practice. This study also highlights the potentially powerful role of tools and tool-based routines, tailored to the needs of beginning teachers, in fostering ambitious pedagogy. This success, we believe, can support the design of more robust systems of tools for early career teachers’ collaborative inquiry and can inform theory around the implementation of these tools.

There is a growing consensus within the field of teacher education that equipping novices with a repertoire of competent classroom practices is no longer considered an adequate professional preparation. Because initial training can only begin new educators on the long trajectory towards expert teaching, it is equally important that these programs help novices develop strategies and habits of mind to learn from practice as they enter the profession, laying the foundations for career-long development (Darling-Hammond & McLaughlin, 1995; Fullan, 1993; Grossman & McDonald, 2008; Hiebert, Morris, Berk, & Jansen, 2007; Lieberman & Miller, 2001; Nemser, 1983). Broadly speaking, learning from teaching is best achieved through systematic cycles of inquiry into practice and using evidence generated by these examinations to re-shape instruction (Borko, Jacobs, Eiteljorg, & Pittman, 2008; Cobb, Dean, & Zhao, 2006; Jacobs, Franke, Carpenter, Levi, & Battey, 2007; Lewis, Perry, & Murata, 2006; Nelson, Slavit, Perkins, & Hathorn, 2008).

In recent years, teachers have engaged in such inquiries within collegial settings by analyzing student-generated artifacts and hypothesizing about the relationships between learning and instructional moves made in the classroom (Curry, 2008; Kazemi & Franke, 2004; Little, 2007). Although the analysis of student work as a way to advance practice is no longer a new idea, the use of such strategies with pre-service educators has not been widely documented. Indeed, the data are too scarce to confirm that prospective teachers can acquire the skills necessary to analyze teaching (Hiebert et al., 2007, p. 58). In this research, we test the hypothesis that teacher candidates can systematically interrogate and advance their beginning practice with the help of special tools to guide the analysis of their pupils’ work and support collegial critique based on those analyses. We explore the broader possibility that the analysis of student work can serve as the basis for sustaining an intellectual community that helps bridge pre-service and induction years. Such a bridging strategy makes sense in terms of what we know of the difficult transitions that novices face when entering the world of professional work and in terms of maintaining a collaborative sense of purpose for instructional critique that builds upon and extends powerful pedagogical ideas that are introduced in teacher preparation.

Although studies that follow novices from their preparation experiences into the first years of teaching are remarkably rare, the few that have been reported portray similar transitions into professional work: newcomers are willing to try out some non-traditional strategies when they enter their classrooms, but for a variety of reasons, key pedagogical practices from their pre-service education are not often put into play (Goodlad, 1990; Grossman et al., 2000; Kennedy, 1999; Nolen et al., in press; Simmons et al., 1999; Wilcox, Schram, Lappan, & Lanier, 1991).

In the research presented here, we describe how eleven secondary science teachers engaged in the collegial analysis of their students’ work over two years, spanning pre-service and in-service contexts. The analyses were facilitated by tools that allowed them to situate their current repertoire of instruction within an explicit continuum of development, and to visualize their practice as an object of critique, evidence-based analysis, and target of ongoing refinement. We seek to develop theory around engagement with collegial tool-based practices by novice teachers in order to extend the press for ambitious pedagogy from pre-service preparation into their first year of professional work. Currently, we know little about the mechanisms by which individual teachers’ practices change as a result of examinations of student work (Borko, 2004; Lewis et al., 2006; Wilson & Berne, 1999), nor do we have ways to characterize the collective trajectories of pedagogical reasoning in groups of teachers (Little & Horn, 2007).



In this study, the analysis of practice does not focus on teacher behavior but on student work (artifacts) that allows insights into young learners’ thinking and how performance is influenced by tasks, questions, and assessments developed by the teacher (Kazemi & Hubbard, 2008). These artifacts include students’ written responses to teachers’ prompts or problems, lists of ideas, drawings, video of conversations, or self-reflections on learning. Analyzing learning artifacts has helped teachers generate and test hypotheses about instructional decisions (Hiebert et al., 2007; Nave, 2000; Wheatley, 2002), pushed them to think beyond routine, familiar activity in the classroom (Kazemi & Franke, 2004; Sandoval, Deneroff, & Franke, 2002), and led to improved student learning (Crespo, 2002; Goldenberg, Saunders, & Gallimore, 2004). The larger aim of such analyses in professional settings is to cultivate an inquiry stance, where classroom practice becomes an object of ongoing critique and refinement (Ball & Cohen, 1999; Warren-Little, Gearhart, Curry, & Kafka, 2003).

The available research, however, gives little sense of how these analyses and subsequent conversations should be structured so that benefits might be systematically achieved. Without guidance for these conversations, the focus on student learning is often lost when participants become pre-occupied with talk of classroom activity or conceptualize the analysis as an evaluation task (Cobb et al., 2006; Crespo, 2002). For example, in a high school science professional development project aimed at having teachers use pupil work to understand their own practice, Sandoval et al. (2002) found that teachers’ discourse was almost entirely grounded in descriptions of activity or judgments about particular activities, and that these descriptions were rarely unpacked or explained (p. 4). Hammer (1999) similarly noticed that, when science teachers were asked to examine student work, they fixated on activities that could be used to correct a misconception or on how an activity could be done differently to avoid a problem, rather than looking for evidence of student thinking.

In order to sustain the kinds of analysis necessary to interrogate and advance practice, special resources and supportive contexts are necessary. Among these resources are tools such as protocols that guide conversations and rubrics for analyzing student work. These help frame the boundaries of practice being investigated, make clear what counts as evidence for assertions about learning, and facilitate how collegial groups might talk together about inquiries into learning (Hiebert et al., 2007). Such tools serve another valuable function in professional development: the reinforcement of a common vocabulary that references certain types of learning, teaching, or ideas (Curry, 2008). This helps establish what Lortie (1975) termed a “technical culture” within which teachers can broker with one another meaningful exchanges about classroom experiences, define problems of practice, and co-construct informative accounts of teaching and learning (Wenger, 1998). Little (2007) notes that “[o]ther things being equal, the utility of collegial work and the rigor of experimentation with teaching is a direct function of the concreteness, precision, and coherence of the shared language” (p. 331).


Productively investigating one’s practice through the collaborative examination of student work requires special analytical tools and a shared language, but it also requires that participants have access to a theory about what constitutes good teaching. Our study is grounded in the logic that the learning envisioned in current educational reforms can only be realized through ambitious forms of teaching that are unlike the pedagogy seen in most classrooms. Ambitious teaching deliberately aims to get students of all racial, ethnic, class, and gender categories to understand important ideas, participate in the discourses of the discipline, and solve authentic problems (Lampert & Graziani, 2009; Newmann et al., 1996). This kind of pedagogy is both adaptive to students’ needs and thinking, and maintains high standards of achievement for learners. Within science education, such instruction has been clarified through converging lines of scholarship within science studies, student learning, assessment, and curriculum (summarized in National Research Council, 2006; National Research Council, 2005a; National Research Council, 2005b). Based on this scholarship, we constructed a framework of four inter-related elements of ambitious pedagogy. Each of these elements is represented by a range of how it may be enacted in classrooms, ranging from traditional to more sophisticated (Appendix 1). Each of these four ideas was incorporated into participants’ methods class instruction (see Windschitl, Thompson, & Braaten, 2008a for details).

The larger instructional context for these four ideas was that of Model-Based Inquiry (MBI). Ideally, MBI is used to help students test ideas which are represented as a system of explanatory processes, events, or structures. These tentative models, in the form of drawings or verbal descriptions, are tested against observations in the world and assessed for adequacy against standards of evidence (see Windschitl, Thompson, & Braaten, 2008b; Schauble, Glaser, Duschl, Schulze, & John, 1995; Schwarz & White, 2005; Smith, Maclin, Houghton & Hennessey, 2000; Stewart, Hafner, Johnson & Finkel, 1992; Stewart, Passmore, Cartier, Rudolph & Donovan, 2005). Of the four MBI-embedded elements of pedagogy, participants in this study focused primarily on pressing students for evidence-based explanations as the basis of their collaborative inquiries into practice.

In the most sophisticated version of pressing for explanation, the teacher supports students’ abilities to use observable evidence to generate causal explanations for specific science phenomena (such as the ocean’s tides, or the production of gases during yeast growth). The teacher supports epistemic thinking by prompting students to unpack what counts as a good scientific explanation. In the less sophisticated versions of pressing for explanation, teachers ask students only to provide a description of what happened in a demonstration or inquiry (e.g., food dye disperses faster in a beaker of warm water than in a beaker of cold water). Otherwise, the teacher might ask students to talk about trends in data (e.g., the more dissolved oxygen there is in a stream, the greater the number of invertebrate species present) without probing for what the relationships might indicate about the cause of the phenomenon under question. We recognize that there are other forms of scientific explanation. We emphasized this type, however, because it represents an important goal for inquiry in most science sub-disciplines (Friedman, 1974; Salmon, 1989) and it requires students to focus on how explanatory scientific models help us understand a wide range of natural phenomena.

We define the types of teaching that focus on students’ Model-Based Inquiry, and in particular evidence-based explanation, as ambitious pedagogy for three reasons. From a content standpoint, it requires teachers to have a mastery of subject matter that allows them to penetrate the layers of material activity and vocabulary inhabiting common curricula to identify core scientific ideas (e.g., force and motion, natural selection, plate tectonics, chemical equilibrium) that can explain a wide range of phenomena in the natural world. From an epistemic standpoint, ambitious pedagogy in the form of MBI requires that the teacher involve students in practices that reflect the nature of discourse and intellectual work done by scientists (iteratively proposing and comparing explanatory models, generating evidence to test hypotheses, and constructing explanations for phenomena) rather than occupying young learners with a set of confirmatory lab exercises that point to predetermined outcomes. Finally, from an interactional standpoint, ambitious pedagogy is fundamentally dialogic. This means that sense-making and problem-solving are mediated by opportunities for students to talk through their science conceptions in increasingly sophisticated ways over time. During instruction the teacher must use in-the-moment judgment to build upon, challenge, or re-voice students’ conjectures, explanations, or arguments, and ultimately to get students to query each other in these ways. Several large-scale studies indicate that this kind of teaching is rare, even in the classrooms of experienced teachers (Baldi et al., 2007; Banilower, Smith, Weiss, & Pasley, 2006; Horizon Research International, 2003; Roth & Garnier, 2007).


We view the system of support needed by novices through the lens of activity theory. This theory assumes that a person’s frameworks for thinking are developed through problem-solving actions carried out with the help of tools, socially constructed routines of activity, and other resources (Cole, 1996; Engestrom, 1999; Wertsch, 1991). In this study, tool-based routines are recurring, recognizable patterns of social and intellectual interaction, mediated by the structure and language of material resources that assist in the achievement of culturally-defined goals (Feldman & Pentland, 2003).

Within an instructional context, the use of tools and tool-based practices can scaffold a teacher’s thinking around problems such as how to transform subject matter and disciplinary practices into forms comprehensible to learners (Hammerness et al., 2005). Although many teacher preparation programs provide conceptual tools that assist in the process (e.g., broad theories about teaching, learning, or the subject matter as a discipline), these do not solve the problem of what to do next in the classroom for beginning teachers (Grossman, Smagorinsky, & Valencia, 1999), or in our case, how to interpret what went on in the classroom. For this kind of learning, praxis tools must complement conceptual tools. Praxis tools embed theory about good teaching into material resources or strategies that guide planning, instruction, analysis of learning, and reflection.

While tools serve valuable roles in mediating classroom discourse, they may be used in a range of different ways by teachers who hold to various mental models of what constitutes good teaching. These mental models likely play a role as well in settings of collegial inquiry (Strauss, 1993). Such models represent informal theory about what conditions lead to learning in particular situations and draw upon a logic that “organize[s] all thoughts, perceptions, and actions by means of a few generative principles” (Bordieu, 1990, p. 86). Spillane and Miele (2007) argue that because individuals bring different mental models to bear on problems of practice, it is possible for practitioners in a group to interpret and act upon the same evidence in fundamentally different ways.

Using this conceptual framework, we describe our rationale for using the collegial analysis and discussion of student work to bridge early learning-to-teach contexts (pre- and in-service). Our participants, as members of a teacher preparation cohort, had shared a history of engagement with reform-based pedagogical ideas. These ideas formed the beginnings of a common technical language around classroom science inquiry. These ideas and their constituent language were incorporated into a praxis tool, i.e., a multi-faceted rubric for the analysis of students’ work. This rubric was then used with a conversation protocol (a second praxis tool) during regular collaborative sessions to mediate interactions the cohort had over the joint examination of student work, preserving this common language by using it to envision, negotiate, and solve problems of practice. If we could enculturate participants into routine examinations of student work as “what all good teachers do,” we could then build expectations to pursue ambitious forms of pedagogy, and as part of this process, to engage in evidence-based hypothesis testing about teaching. Thus, we hoped to dispel a mindset common to beginning teachers of mere first-year survival and the intellectual dormancy that can result.


Three specific questions guided our inquiry:

1) Under what conditions of analysis and collegial conversation do participants come to a deeper understanding of elements of instruction that influence students’ performance and learning?

2) Are there differences in how individuals benefit from analytic tools and tool-based practices in socio-professional settings where the critique and revision of practice is the goal?

3) Does classroom practice change as a result of participation in collegial tool-supported analyses of student work?



The 11 participants were candidates in a teacher education program at a public university in the western United States (see Table 1 for additional participant information). These individuals were part of a larger cohort and had remained in the region for their first year of teaching. Only one participant who took up a local teaching position declined to participate. All entered the program with a bachelor’s degree in an area of science or engineering. During the first two quarters of the program, participants took a methods course, taught by the first author, in which elements of ambitious science pedagogy mentioned in the previous section were introduced and practiced (through peer-teaching and lesson design activities; for details, see Windschitl et al., 2008a).

Table 1. Backgrounds of Participants


Degrees held when entering program | Undergraduate credits in science | Student teaching context | First year teaching context
Cell Biology | | Suburban High School | Suburban High School
 | | Suburban Middle School | Suburban High School
 | | Suburban Junior High School | Suburban Junior High School
 | | Suburban High School | Urban High School
 | | Suburban High School | Urban High School
 | | Suburban Junior High School | Suburban Junior High School
Earth Sciences | | Urban Middle School | Urban Middle School
 | | Suburban Middle School | Suburban High School
 | | Suburban Middle School | Suburban Junior High School
 | | Suburban High School | Suburban High School
 | | Suburban Middle School | Suburban High School

Following this methods course, participants taught as student teachers in either a middle school or high school for approximately 10 weeks. We observed each participant between three and nine times, depending upon their availability and desire to receive feedback from us after the observation. Typically, we would observe a full class period, during which we would script classroom dialogue.

During this practicum, we asked participants to collect samples of student work at points early in the term, midway through the term, and near the end of term. One set of samples was to demonstrate student growth in understanding of the use of evidence in scientific explanation (i.e., what counts as evidence, how it can be used to make claims, support arguments), and another set was to demonstrate student growth in understanding of a key scientific concept (e.g., the seasons, photosynthesis, or energy transformations). We asked that each set of student work include samples from four individuals who appeared to learn new ideas easily, four who were in the mid-range, and four who appeared to learn with great difficulty. Each of the sets came from the same 12 students.

When participants returned to the university the following quarter, we facilitated three working sessions with them to analyze their students’ work for trends that could illuminate how learners responded to different aspects of instruction. We developed and tested rubrics to help participants analyze their students’ work around different facets of Model-Based Inquiry and evidence-based explanation (Figure 1 shows the explanation section of the rubric). The larger rubric includes criteria for assessing to what degree pupils understand the nature and function of models, how students use models to generate hypotheses for testing, how students use evidence to support explanations, and several other indices of thinking. After the analysis sessions, participants came together in a Critical Friends Group (CFG) setting to consider with colleagues how patterns of student learning might be linked to instructional decisions made during the teaching practica. The authors led groups of three or four individuals through a protocol (Appendix 2) that allowed the focal participant to present her/his student work, describe instructional decisions that informed the design of students’ tasks, illustrate the trends they saw in the work, and offer a key problem of practice they wanted colleagues to help them think about. Non-presenting participants reviewed the student work and had multiple opportunities to discuss the work and offer hypotheses about connections between student performance and teacher decision-making. Through the design of the protocol tool, we intended for CFG members to engage in discussions marked by constructive controversy (Achinstein, 2002), pressing each other to unpack assumptions about what was represented in the student work, clarify terms, and defend stances about teaching.

Figure 1. Criteria lines for pressing for explanations from rubric used to analyze student work

Explanations with Theoretical Components

Level 1: Student describes what happened. Student describes, summarizes, or restates a pattern or trend in data without making a connection to any unobservable/theoretical components.

Level 2: Student describes how something happened. Student addresses unobservable/theoretical components tangentially, but not as part of a causal explanation.

Level 3: Student explains why something happened. Student explanation contains a claim that justifies the link between observable data and unobservable/theoretical causal components.

Explanations with Mathematical

Level 1: Student describes what happened. Student describes, summarizes, or restates a pattern or trend in data.

Level 2: Student describes how something happened. Student links observations to mathematical concepts in isolation.

Level 3: Student explains why a mathematical model accounts for a phenomenon. Student links observations to statistical or other mathematical models. Student explains the links between observations and statistical or other mathematical expressions.


The following year, participants began their professional work in local secondary schools. We observed classroom instruction and debriefed with participants six times during their first year of teaching. Participants again collected, at four different points in time, samples of work by twelve pupils from a range of abilities. This work was selected specifically to assess pupils’ understandings of how to use evidence in constructing coherent causal explanations of important science phenomena. These pupil work samples were analyzed by participants before they arrived for the Critical Friends Group meetings, which were held in December, March, and May of the school year.

All CFGs were videotaped and samples of the student work were collected. During the first year of teaching, the research team purposely limited support to participants during classroom visits (i.e., little overt guidance about planning, teaching, or assessment). Across the two years encompassed by this study we conducted four interviews that probed participants’ curricular vision, continuing pedagogical challenges they faced, and retrospective assessments of how their practice had evolved.


We used a qualitative multi-case study approach (Creswell, 1998), incorporating videotapes of CFG sessions, classroom observations, student-created artifacts, interviews, and field notes. The primary cases we developed were of the CFG sessions themselves. Analysis of the CFGs entailed iterative cycles of identifying comparative patterns of discourse across sessions and longitudinal trends in discourse over the two years.

In analyzing the video data of the CFGs, we used three sets of codes aimed at revealing patterns in participants’ use of language to construct and re-construct representations of practice with peers (Green & Bloome, 1997). One set of codes related to forms of discourse associated with the tool (rubric) and marked how the tool was referenced and how specific language from the tool itself was incorporated into the conversation. The second set of codes related to interactions among participants around re-voicing or re-framing the problems and hypotheses that the focal participant had originally brought to the table. A third set of codes related to the interactions among participants that connected students’ work with pedagogical decisions. During the analyses, we developed and tested hypotheses related to our research questions, then created a number of matrices that charted longitudinal connections between specific features of participation in the CFGs, the evolving nature of the tools we asked participants to use, and changes or stasis in their classroom practice.


We begin by describing the activity of our first two CFGs. We foreground the different ways in which participants represented their practice to peers and describe the emerging role of the analysis tools in their talk. In the second section, covering the later CFGs, we foreground the differential roles the tools and tool-based practices played for two groups of participants, depending upon how they conceptualized their practice and represented episodes of learning in their classrooms. In the third section, we describe changes in classroom practice by these novices.


Our first CFG was held in March 2007, two months after participants had finished student teaching and just prior to their graduation. In addition to the participants, we invited their cooperating teachers to engage with them in the discussions. Participants brought with them boxes they had used to organize their pupils’ work; some carried in color-coded file folders and others brought posters with students’ hand-drawn diagrams of science ideas. Throughout the day, in mixed groups of four to six, each participant (the focal participant) took a turn presenting a dilemma of practice that emerged from their previous analysis of artifacts. They began by describing the instructional context and passed out copies of student work that represented their dilemmas. To focus the conversations, we specifically requested that they collect work on student understanding of evidence-based explanations, as opposed to presenting generic dilemmas to their peers (classroom management, problems with parents, issues with curriculum, etc.).

From the later analysis of these videotapes, we first noted distinctly varied forms of engagement, and that these forms of engagement appeared to result from how the focal participants initially framed both their teaching practice and their dilemmas to peers. We referred to this framing as the “problems with students / puzzles of practice” continuum; we call it a continuum because each of the characteristics that helped to define this range of representation (discussed later) varied along a qualitative spectrum.

Approximately one-third of our participants described their dilemma as a “problem with students”; these were often summarized as “My students don’t get it” or “I can’t get them past this misconception.” Student learning in these cases was characterized as an all-or-nothing intellectual event. Participants who framed their dilemma as a problem with students used language that suggested a sense of frustration or disappointment in students about failing to meet what they portrayed as modest expectations, despite exposure to competent teaching. One pre-service physics teacher presented his case this way:

I thought I was doing a great job at having kids look at evidence, until I looked at the student work, and, ah well, they are basing [their explanations] on evidence, but what they have written on the page is not very good at all ... It was more of an assumption on my part, you know, they were juniors in high school, you know, you’ve had enough science so that you should probably do this not once, but twice or three times [measure speed during a kinetics lab]; those are the assumptions I had and they were bad ones.

The talk of “bad assumptions” above was characteristic of a pattern that this individual continued throughout the CFGs, which was to reference in only a perfunctory and tangential way his responsibility for student performance. In fact, for all the CFGs that were framed by the focal participant as problems with students, the responsibility for performance rested almost entirely with students. In the focal participants’ narratives, they described their teaching as having crafted an illustrative event rather than having made a series of decisions that would influence learning. The questions posed to peers from this perspective seemed to be grounded in a normalized conception of teaching; essentially: “What special or extra moves must I make for individuals who didn’t learn from instruction-as-planned?”

The "problems with students" framing was also linked with limited analyses of students' work. We had intentionally asked participants to collect artifacts from students of a range of abilities. However, rather than looking at trends across different types of students, or across different assignments, all participants with a "problems" orientation focused on one student, one assignment, and often one question on an assignment. For example, a participant who had just finished teaching in a local middle school grounded his entire CFG session in the work of a single student, without reference to the work samples of 12 of his pupils collected over three months. He remarked that this student "...had straightforward instructions, but I got weird results from her." His dilemma was presented as: "How can I help students with learning disabilities? What strategies will provide better results?"

On the other end of the continuum, approximately one-third of participants portrayed their dilemmas as "puzzles of practice." These narratives were marked by a genuine sense of curiosity and intellectual challenge, often couched in terms of student thinking rather than student answers. As opposed to the "problems" frame of learning (as an all-or-none event), the "puzzles" participants initiated their CFGs by citing evidence of partial understandings in their students' work as clues to how they were interpreting learning tasks, or as potential leverage points for different instructional moves. Puzzles were more often accompanied by talk of high expectations of all students that differed from the expectations expressed from the "problems" perspective, in that students were portrayed as capable of significant achievement under the right conditions. When participants presented puzzles of practice, they tended to use the language of support and scaffolding rather than instruction. Puzzles were puzzling in part because they were a product of interactions between teachers' instructional choices and the students' responses, not simply located in student failure. These participants would say, in effect: "My students responded this way to a prompt I gave," and then describe the question or the scaffolding in detail. Sarah, for example, shared how her higher-level students actually regressed in their depth of explanation between a lab exercise with termites and a subsequent test:

The high-level students on the termite lab had a level 3, but they all seemed to decrease on this later test, on their Depth of Student Explanation [line on rubric]. A lot of the middle-level students and low-level students actually increased. So I guess my question is... what about how I've worded this question made these students, kind of, provide less of a rich explanation compared to a lot of the other students who had an increase?

For some participants, this sense of teacher responsibility generated a great deal of emotion. Amanda, for example, who taught in a high school chemistry class, fought back tears as she opened her own CFG. After surveying the nature of the questions she had asked students throughout her practicum, she came to a painful conclusion:

I didn't push students to higher level of thinking. I was asking "what" instead of "how" or "why"... If they went to higher level they had to do it themselves, then that thinking eventually stopped... Under-served students, well, they stayed the same. I was focused on imparting knowledge and didn't push them to that level where they felt uncomfortable going.

Another characteristic of those who presented puzzles was their deeper analyses of students' work, describing how different groups of students responded to prompts or feedback, citing trends over time in students' performances, or showing differences in how students performed given different instructional conditions. Rachel, who student taught in a high school biology classroom, set up her dilemma by referencing groups of students at varying ability levels and the differential progress of these groups over time:

I was making bits of progress with high-end kids and underserved kids, so, using this rubric, going from a one to a one-plus or a one-plus to a two-minus, that kind of thing. The rest of the class, you know, most of the kids were flat-lining on both claims and evidence and on models.

She continued her introduction, commenting on one student in particular, saying that she was not seeing the kind of work this student was capable of. Then, as was characteristic of others who posed puzzles, she asked: "What can I be doing differently to support this student?"

These more complex analyses resulted in CFG discussions where sophisticated hypotheses about teaching and learning were generated. Barbara, for example, had asked her students to make a claim about how air pressure had caused a soda can to implode during a lab. In looking at the work of these students, she was disappointed in their ability to use data to back up their claims, and she spent a great deal of time providing targeted feedback. Then, in a subsequent lesson, she again asked students to construct a claim and provide warrants for that claim. After analyzing this second set of work samples, she found that her high-level students had made great strides, while her middle and lower students did not improve. During the course of the CFG, her colleagues proposed ways of testing the hypothesis that many of her lower ability students were struggling, in part, because they could not incorporate Barbara's feedback into their thinking and that this could be partially responsible for their difficulties in meeting the high expectations she maintained in her classroom.

In summary, participants who presented their dilemmas as puzzles of practice differed along several dimensions from those who presented their dilemmas as problems with students. These dimensions included: 1) depth of analysis of student work, 2) locus of responsibility for student performance, 3) nature of focal questions to peers, and 4) ways of expressing expectations of students.

A second major trend in the early CFGs related to the conversations that followed the focal participant's presentation of the dilemma. Nearly all of the CFG sessions indicated a lack of accountability in conversations to understand evidence of thinking in the student work and to understand the science ideas being taught. Despite the occasionally rich and varied conversations about teaching and learning, participants were engaging in most CFG discussions as if they had shared understandings of the student thinking represented in the artifacts and of the science being taught. We discuss first the lack of commitment to understanding the students' work.

We intended for student work to be the central text of the CFG conversations, and assumed that non-focal participants would give these artifacts a generous reading. We reserved significant time during the CFGs for participants collectively to examine the samples in order to comment on the nature of the tasks to which the students were responding, to challenge the focal participant's view of what evidence of understanding they saw in the samples, and to provide concrete referents for comments about the science involved in the lessons. These types of interactions, however, rarely happened. Rather, participants spent much of their time asking about the context of instruction, about what work students had done previously, or what curriculum was being used. Immediately following these questions about context came a distinctive pattern of discourse that we refer to as "repair talk," in which hypotheses were offered about why the student's work was not of high quality, but without reference to the work itself. CFG participants would suggest, for example, that the pupil in question may have trouble with writing in general or be disaffected from school, but would offer no warrants for these beliefs. The excerpt below shows one of the cooperating teachers in the first round of CFGs (Christine) attempting to identify why a student of the focal participant (Patricia) was not able to describe how she used evidence in her conclusions to an experiment. Without reference to the student work, Christine speculates about issues the student might have with writing, then tags on an additional question about whether gender had an influence on her performance:

Christine: Can you tell us anything about this student? How long has she been at your school?

Patricia: She's been there a while.

Christine: Can you tell us a little about her writing ability outside of science?

Patricia: Not much but she is a good student

Christine: She gets very good grades?

Patricia: Yes.

Christine: If you compare this student who gets very good grades with another student in the class who gets very good grades, is there a difference in writing ability? So she is not right there with her peer group?

Patricia: What I'm saying is that of the people who were getting good grades, there were two types, one group...

Christine: knew where they were going

Patricia: Yea, she was one among many of her peers.

Christine: So, in these groups was there gender predictability?

This type of talk was infectious. Once started, other participants would join in trying to fix what might be wrong with the lesson; in the case above, offering suggestions as to how to scaffold writing in science notebooks. These comments were not always counterproductive, but they drew attention away from the details of the student work, signifying that colleagues could take the dilemma posed by the focal participant at face value and move on from there.

Similarly, in many of these sessions, there was no interrogation of the scientific concepts or practices on which focal participants based the student tasks. For example, one individual had her students roll toy cars down ramps with different inclines. Her dilemma was "Why can't my students put evidence in their conclusions?" Over the course of an hour, six different peers probed for solutions to this question, but no one asked "What did you expect in the conclusion? What should a conclusion look like?" Similarly, no participant questioned the science, for example, "What is supposed to happen with cars going down ramps at different angles, and why?" Many participants focused their students on the mechanics of conclusion-writing after activities rather than having them use ideas generated from the activities to revise their current explanatory models, as suggested in our MBI framework.

In sum, the protocol allowed too much to be taken for granted about students' thinking, and about the aims of instruction. There were virtually no verbal references back to the rubrics in the first round of CFGs. Participants had used them in previous weeks to analyze their students' work, but the rubrics as tools did little in terms of directly supporting the CFG conversations.


In reflecting on our participants' performances in the post student-teaching CFGs and in preparation for following them into their first year of teaching, the research team, with input from participants, agreed that engaging students in scientific explanation was a key practice worthy of an induction-year focus. We also recognized that our participants had not really understood how to analyze their students' abilities to engage in this key discourse; on the contrary, they appeared satisfied with surface level descriptions from their students and could not imagine what a press for a more sophisticated (evidence-based, causal) explanation might look like. In preparation then for the second round of the CFGs, we asked participants to use the rubric explicitly as a guide to collect certain types of student work for analysis. We asked them to consider what phenomenon they were asking students to develop explanatory models for and then to determine for themselves what the typical student responses might be for all three levels of explanation before examining their work. During the CFGs, we included the rubric as a handout, making it available to reference features of the student work. With these conditions in place, participants gradually began to use the rubric to support different forms of productive conversation.

Interestingly, participants in discussions began referring to the rubric not only to classify an existing question they had posed to their students, but to imagine what kind of question might represent the next level of thinking. Sarah, for example, had asked her students to write about why humans have opposable thumbs after a lab in which students explored how they used their own thumbs to carry out everyday tasks. Most of her students had replied pragmatically, "to manipulate things," but no one had mentioned a different kind of "why": a causal story involving evolutionary mechanisms. This confused Sarah because her students were able to come up with richer "why" explanations two months earlier in a lab on termite behavior. Here, Sarah references the levels of explanation in the rubric, realizing that the way she had asked the question of her students had prompted only a low level of thinking. Another participant, Rachel, co-constructs with Sarah a more challenging question:

Sarah: So I mainly think it was that, like, how did I ask the question? So that it maybe... and why were they able to come up with kind of a more further, richer explanation for the termite lab than the thumb lab even though they had experience with the thumb already?

Jessica: Yes.

Sarah: And we talked a lot about it.  So how did I ask the question that made them not go there?

Jessica: Yes, because both of them, you did sort of say you prompted them to explain "why" in both of them. And so, Sarah, what did you hear from the discussion today?

Sarah: Well, I think, and I kind of shared already that question... it wasn't possible to really get to a level 3. I mean it's a pretty simple question. It's not open-ended and it's not very in depth. I mean it's a thumb. It's not... I'm not asking them to make the connection to any sort of great evolutionary ideas. So I mean my purpose for this question really was to work on or see if all the parts... but since I knew this was coming I also wanted to see the "why" and I hadn't collected any other work that really did that.

Rachel: Well if you... I think just if you asked a following question "Why is it that you think that humans have an opposable thumb yet other species don't?" I mean maybe it doesn't have to be part of the WASL-like question [acronym for state exams]...

Sarah: Yeah, I did feel kind of constrained by the format which I was trying to make it like a WASL question.

We also made changes in the protocol itself. To help participants be more accountable to the science in their conversations, we built in a step where participants jointly had to answer the questions: "What is a level 3 explanation for this phenomenon?" and "What could teachers do to incorporate content into the inquiry?" Over the course of the final three CFGs then, we had inserted multiple opportunities for participants to consider the qualities of a well-constructed scientific explanation and had directed them to think about levels of explanation (for their particular lessons) earlier and more explicitly in their planning process.


In this section we first describe the increasingly consistent differences in engagement in the CFG conversations between those participants who conceived of their dilemmas as problems with students and those who framed dilemmas as puzzles of practice. Second, we characterize emergent uses of the rubric tool that arose within the context of the CFGs, and later in participants' classrooms. As we collected more data from classrooms, we began to recognize that the individuals who presented their dilemmas as problems with students were also the ones who held to the most unproblematic views of the relationships between teaching and learning. This view was most closely documented by our classroom analysis of one of the four elements of ambitious pedagogy which we referred to as "working with students' ideas." This element captures the degree to which participants incorporated the thinking of students into planning, classroom discourse, and assessment (Appendix A).

The four participants who routinely presented their dilemmas as problems with students each had a modal rating of 1 (least sophisticated) in the "Working with students' ideas" category during their first year of teaching. This rating indicated an acquisition model of learning. These participants began lessons with no knowledge of students' ideas of the scientific topic (i.e., no initial models to work on via Model-Based Inquiry), and no efforts to document these beginning understandings. Instruction was characterized by an organized delivery of information by the teacher, often scripted by the curriculum. Whole class conversations were used only as a check for nominal understandings or vocabulary-level familiarity with the science ideas. Engagement with individual students was to do one-on-one tutoring or to see if students "got it." On the other hand, for participants who routinely presented their dilemmas as puzzles of practice, four of the five had modal ratings in the "Working with students' ideas" category at level 3 (most sophisticated), and one participant had a modal rating of 2. The individuals rated at level 3 began units of instruction by eliciting students' ideas (sometimes using these ideas to create tentative public models of how sound travels or why the seasons occur), then referenced these to reshape the direction of instruction.

We characterized the five participants with the highest ratings in this category as having an intellectual stance called: Teaching-Learning Problematic (TL-P), because data from classroom observations and de-briefings indicated they were working to help students to reconstruct their own partial understandings to be more coherent with canonical scientific ideas, and were dealing with the improvisational instructional moves and ambiguities that are inherent to this way of teaching. The intellectual stance of the four participants who had the lowest ratings we referred to as: Teaching-Learning Unproblematic (TL-U), because their classroom practices indicated they were working from a transmission-acquisition theory of pedagogy in which student learning was expected to result from an organized presentation of activities and information. In this view, the scientific sense to be made was evident in the lessons themselves. These participants indicated an absolutist view of learning in which students would understand an idea or not, with little middle ground. We classified two participants as Transitional. Although these individuals did not often teach in ways that demonstrated a problematic stance toward learning, they recognized how their current practice differed from the higher levels on our rating scales. Their participation in the CFGs was similar to that of the TL-P participants.

During the final two CFGs, other trends in the nature of the collegial conversations became clearer (Figure 2). The TL-P participants, who from the beginning CFGs had presented more in-depth analyses of their own students' work, also made references to detail in the students' work of others who were presenting their dilemmas; they looked for evidence of partial understandings by these students, and created more evidence-based hypotheses linking student performance with instructional decisions. In addition, when dilemmas were presented as puzzles by an individual, the other CFG participants tended to make more references to the student work in the course of the conversation. These more frequent references to student work were made by both TL-P (to a greater degree) and TL-U participants (to a lesser degree). In one example, a TL-P participant, Sarah, presented a puzzle about her students' understanding of osmosis. Emily, a Transitional participant, looked for partial understandings in the student work and connected these with the big idea of the lesson:

I like how student 5 says, "when you break up corn syrup"... it at least makes sense in terms of a concentration gradient where it says half stays in one place and half goes in another place. It shows some gradient thinking going on there, a little bit, and it's not clear, but something is happening there, with this person using the one-half. So this, I think, is the idea of yours, that you want kids to see how homeostasis gets back to a nice balance.

Substantive references to student work were important because they often initiated conversations that linked the nature of science ideas in the curriculum with the questions or tasks given by the teacher, and with appropriate expectations for students' responses. For example, in a photosynthesis inquiry, Rachel had her students extract the air from the intercellular spaces of spinach leaves, and then submerge the leaves in water. In a few minutes, the leaves began to produce oxygen bubbles and eventually floated to the surface of the container. One participant pointed out students' responses to Rachel's question about why the leaves floated, and challenged her on the way it was constructed: "But is this the why that you want? Some students here are explaining why leaves float, but are not explaining why the bubbles are getting produced." Another participant noted that "This kid is using chemistry knowledge here but have they taken chemistry yet? I think this person is making an effort to use some outside knowledge." In the next few minutes, participants questioned the links between the way questions were posed, the different levels of expectations that teachers should have for students in writing explanations of scientific phenomena, and the scientific explanation for the phenomenon itself.

Figure 2. How personal theories of teaching and learning related to participants' analysis of student work and to their representations of practice.


To get a quantitative picture of participants' engagement with students' artifacts, we compared the number of substantive references (for example, pointing out partial understandings, comparing written responses across samples, using the work to support hypotheses about the relationships between the prompts given by the teacher and the responses students gave) to the nominal references (for example, using the artifact as a prop to make a broad generalization that was not linked to a close examination of the work or making a non-specific comment such as "I liked what this student wrote"). We then looked at the relative frequencies of substantive references in the CFGs that were initiated by puzzles of practice versus problems with students. Across the four CFGs, about half the sessions were framed as puzzles, and 85% of these sessions were characterized by moderate to high rates of substantive references to student work. On the other hand, only 55% of those sessions initiated by problems with students were characterized by moderate or high numbers of substantive references to student work.

The influence of the tools in discourse.

During the final two CFGs we found that the ideas and the language in the rubric were taken up in a number of consequential ways primarily by TL-P and Transitional participants. These were categorized as analytical/evaluative, linguistic, pedagogical, and scientific. In some cases the tool was used in preparation for CFGs, in other cases the tool supported the conversations of the CFGs themselves, and interestingly, the tools were occasionally transformed by participants for use in their own classrooms. We do not view these uses as mutually exclusive of one another; two or more uses were often part of the same conversation, and contributed to the same understandings (AE2 and P5 for example).

1. Analytical/Evaluative (used in preparation for and during CFGs)

AE1. Used to characterize and classify individual responses by students.

AE2. Used to identify trends across student groups or within a group across time.

2. Linguistic (used during CFGs and in participants' classrooms)

L1. Used during CFGs to represent practice in mutually understandable ways, to help pose and explore common problems of practice.

L2. Used as a tool in participants' classrooms as the basis for a common language for and by students about the difference between "what," "how," and "why" explanations.

3. Pedagogical (used during CFGs and in participants' classrooms)

P1. Used in CFGs as a standard to evaluate depth of one's own current questioning and tasks.

P2. Used to anchor conversations in CFGs about the scaffolding necessary to press for deeper explanations.

P3. Used in CFGs to calibrate expectations for students with different abilities.

P4. Used in CFGs to envision where to go next in pressing students for use of evidence, constructing explanations.

P5. Used over time to help determine "big picture" effects of one's teaching.

P6. Used in classroom as basis for assessments in everyday practice.

P7. Rubrics modified, then given to and used by students to direct and evaluate their own performances.

4. Scientific (used primarily during CFGs)

S1. Used in combination with the protocol to probe the "why" explanations; compels participants to unpack science ideas at appropriately fundamental levels.

In the previous sections of this research report, we have already provided passages that exemplify some of these elements. P5 is demonstrated in Amanda's earlier quote, when she realized, after her practicum was over, that she had not once pressed her students to construct causal explanations for chemistry phenomena. In a later interview she referred to this as a turning point in planning for her first year of teaching. She was able to identify which aspects of her new curriculum were "what" based (her words) and replaced many of the surface level questions with deeper probes for students. Another participant, Sarah, used the three different levels of the explanation rubric to scaffold different student discussions of sickle cell anemia (P2, P4). The most sophisticated portion in the explanation rubric (level three: the "why") became the goal for the unit, but she also recognized that depictions of "what" and "how" explanations, described in levels one and two, could guide how she would teach and what she would expect of her students in the early lessons of the unit.

While each of these uses of the rubric supported individual practice, some were critical to the intellectual work that was accomplished in the collective analysis of teaching. For example, using the tool in CFGs to represent practice in mutually understandable ways (L1) is instrumental in the refinement of teaching. Sarah, for example, described an osmosis activity in which her students had put eggs in three different solutions with different concentrations of salt. She asked them to "Use words and pictures to describe what happened to each of the eggs if you left them in these conditions, and why." In the following quote, Sarah uses "level 3" as shorthand for a deep causal explanation that both ties together evidence from multiple investigations and references underlying (theoretical) events, a key practice in MBI.

I was hoping they could use some of the different pieces of evidence from these activities throughout the unit, to add on to their ideas about what was happening to the egg. But most people didn't do that... I don't think I had anybody who got up to level 3. I didn't think they were using multiple findings, to fit them together to develop a theory. A few people were pushing that way, they used words like isotonic, hypotonic, but they weren't connecting the various experiences they had in combination with this egg lesson.

Later in this conversation, the moderator used the revised protocol tool (which now required all participants in a CFG to determine the ideal causal explanation for the focal phenomena) in conjunction with the rubric to ask Sarah and others to imagine what a level 3 explanation would look like. In the following conversation, we can see not only that the rubric is being used to compel participants to talk about science ideas in deep ways (S1), but also used as a reference to evaluate the depth of one's own current questioning and tasks (P1), and hints of where to go next (P4).

Sarah: I think [a level 3] definitely needs to include something about there being a membrane. One of the big ideas we talked about was an internal and external environment, if we are shooting for the stars here, they'd be able to talk about the difference between the internal and external environment, and that there would be pores in the membrane that would allow some things to go across, and that they'd mention if the concentrations were different, that that would allow diffusion to happen. Except they could mention if the molecule was too big to get across.

Emily: I think too you'd have something about corn syrup moving from inside the egg... something about direction of the movement

Sarah: Yeah,

Adam: [refers to the rubric] Isn't a level three a "why" question?

Emily: Oh, yeah, so, why do the molecules

Jessica: So that's the "how," that's good.

Adam: It gets more into stuff like entropy, that's why diffusion works

Emily: Why do molecules move in the first place?

Sarah: [She looks at what she had written as her own description for a level 3 explanation] I had a hard time coming up with a level three full explanation, I had "Explains why concentration of eggs change," so, ok, it's not why is this happening, but why did the mass decrease, so, it's kind of a lower level of why, it's not like [gestures with outstretched arms and looks at ceiling] "Whyyy?" It's not "Why diffusion" but "Why did the mass change?" So I brought it down, so I said for level 1 [refers to rubric] "States expected results based on what he/she saw," so like "Did the mass change?" or "Did it look wrinkly?" I had one kid say that. Then for level 2, "Describes how mass changes," like water went out or in, so they were able to say that the contents were changing and that's how the mass changed. And then the why would be all the things we talked about earlier. Pores, membranes, gradients, that stuff. So it's a why, but a different why. It's not like the big theoretical

Emily: Now I'm trying to think in my mind... why? [laughter]. I'm trying to go to the next level.

Adam: It has to do with entropy, the random motion of molecules, I don't know

Emily: So do they have to be spread out in equal distances? And equally apart in the vessel they're in? Is that the "Why"?

Jessica: You were talking about balance.

Emily: Homeostasis basically, the balance between, it has to be the same concentration on the external as on the inside. But why does it have to be? Why does it have to happen? Is it just that energy has to be equally dispersed?

This conversation epitomizes the need for teachers who attempt this kind of ambitious science pedagogy to not only identify the topic they are going to teach, but to understand for themselves the most rich and well integrated explanation for that phenomenon in preparation for teaching.

We never intended for our participants to hand this rubric over to students, but several did. Sarah, Emily, Simon, Amanda and Rachel (all TL-P or Transitional participants) imported our rubric for classroom use, transforming it in the process so that their students could use it to plan how they would complete certain assessments and use it to judge their performances as well as the performances of others. Simon used his version of the rubric to mediate conversations with students about levels of explanation. Near the end of his first year of teaching, he recalled:

But after that first lab that I graded with that rubric they'd come up to me and ask "Why did I get 8 out of 10 or 7 out of 10?" Well, let's take out the rubric. So here is how I graded it and here is how the points break down, now when I read through this, what I am hearing is you are telling me how you think something happened and what I was looking for is why you think something happened or, you know, however it was. I would give it back to the kids and say, okay, you underline in there in one color when you are telling me what. Underline in another color where you are telling me how. Underline in another color where you are telling me why. They would do their little colors and say, "Oh, you are right I didn't tell you why at all"... I think at that point it really started to sink in for a couple kids of okay that is great, there is nothing wrong with this answer, but I need to go beyond... and they are starting to.

Regarding the uses of the rubric as a tool, there were again consequential differences between the TL-U and TL-P participants. The TL-U participants typically used it only to classify individual responses by students (AE1). These individuals were more likely than TL-P participants to refer to the CFGs and the whole system of first-year activities as “analyzing student work” and were more prone to see the process as simply a detailed form of grading. In contrast, TL-P participants talked about the analysis as an inquiry process rather than an evaluation. One participant, for example, stated without prompting in an interview that the analysis was not like grading, and another concluded after the final CFG that traditional grading to her had become a clumsy tool for understanding what her kids were capable of. In sum, the TL-P participants and occasionally those in transition:

• took up the language of the rubric more frequently,

• used it to make sense of their students’ work,

• used it to imagine more in-depth answers their students could give, and

• used it to imagine what forms of prompting and scaffolding their students would need to demonstrate deep understandings of the subject matter.

The TL-U participants, on the other hand, seemed to use the terms in the rubric as linguistic props rather than functional ideas, and rarely applied the ideas embodied in the rubric to their own practice. Although there was little difference in the number of times the TL-U and the TL-P participants invoked the word “explanation,” the TL-U participants would only use the term to express what they wished they had done in practice (e.g., “I should have pressed my students to go further in their explanation”). For these participants, the idea of explanation, including its various enactments in the classroom, was a vague notion, even though the rubric spelled out what characterized various levels of explanation and their TL-P colleagues were routinely using the details of the rubric as a touchstone to depict the work of their students. TL-U participants often substituted the term “conclusions” for “explanation,” and when pressed to articulate to others what it meant “to conclude,” they would cast about for other equally ambiguous terms to support their reasoning. Even the phrase “explain why” was re-shaped by TL-U participants to fit their existing classroom practices. During the final CFG in May 2008, Patricia brought a set of five different laboratory assignments. In each she claimed that she had asked students to express “what, why, and what’s your evidence.” The following dialogue shows the unease with which Patricia spells out to others what her intentions for students were.

Justin: What do you mean by “why”?

Patricia: They know they need to come up with a “scientific” (she gestures quotes with her fingers) reason, I never use the term “theoretical” with them, so most of them seem to come up with something scientific. People who tend to misinterpret it the most seem to say “the reason why is because of the evidence.” I get that a lot but usually I get something that is somewhat theoretical. So your question is “Do they really know what I mean by why?” (shakes her head side to side).

Melissa: So what do you mean by “scientific”?

Patricia: Well, that gets to the crux of it doesn’t it? So the “why” is using some sort of concept we used in science class, and giving it a reason why. What did [students] mean by scientific? I don’t know.

Adam: So what part of their answers are you looking for them to improve in?

Sarah: You mean the row on the rubric?

Patricia: I don’t know.

Sarah: So earlier you said you weren’t seeing any trends in improvement over the year (she leans over the rubric, passes her hand over it), so what particular trends were you not seeing? (Patricia and rest of group laugh).

Despite participating in a methods class and multiple CFGs that had specified what it means to help students develop causal explanations, Patricia still held to only a cursory, almost non-functional model of this rhetorical practice.

In our classroom observations, we found that other participants who held these vague notions of explanation routinely undershot the capabilities of their students in doing scientific work. Luke, for example, had adopted a theory of instruction advocated in a series of university physics department workshops, and he resisted many of the strategies demonstrated in the teacher education program. His adopted system led him to dismiss a fundamental premise of MBI: that students could understand the underlying theoretical events and structures that furnished the “why” behind science phenomena. Luke, then, was unconvinced of the appropriateness of pressing students for explanation, despite participating in an early CFG with Simon (a TL-P participant) who was teaching in one of the most challenging high-needs middle schools in the area. Simon’s curriculum included the study of pulleys and forces, a topic similar to one taught by Luke. Simon had introduced to his students the theoretical notion of balanced forces and how their distribution in the pulley system made loads easier to lift. His students successfully transcended the “what” and the “how” and eventually explained, using theories of force, why simple pulleys functioned as efficient machines, a level of understanding that Luke believed was beyond the capabilities of his suburban high school students.


We made frequent observations of participants’ classrooms. Figure 3 shows the four levels of Pressing for explanations and the number of observations we recorded for each participant at these levels (our observation rubric differed from that used by participants in that it had four levels, the first level indicating no press for explanation). The TL-P participants are shown in the top half, the Transitional participants below them, and the TL-U participants in the lower half. The shaded cells reflect the modal levels of participants’ teaching in their first year. The modal levels were determined by direct observation and through debriefings in which participants described recent past and upcoming lessons. The TL-P individuals had significantly greater levels of sophisticated practice, in terms of pressing students for science explanations, than TL-U participants.

Figure 3. Participants’ appropriation of instructional strategies introduced during teacher education coursework, tracked during classroom observations for student teaching and their first year of teaching


a = Student teaching classroom observation

Ü = First year of teaching classroom observation

Dark Shading = Participants’ typical practice for first year of teaching

* Note = Levels 3 and 4 represent pedagogical moves beyond what is required by or designed into participants’ standard curriculum

This trend held true for the other three core elements as well. Table 2 shows a comparison among our classroom observations of the four pedagogical components of ambitious pedagogy. There were approximately 80 total observations. The second row of the table shows the numbers of these observations which we rated as going beyond the requirements of the curriculum used by these participants. We examined their standard curriculum and noted what kinds of instructional moves or classroom discourse were required. The top two levels of each of our dimensions generally required that participants go beyond what was encouraged by or designed into their curricula. Using this as a metric, participants went beyond the curriculum requirements in 38% of the observations of Pressing for explanations, 41% of the observations of Selecting big ideas, 16% of the observations of Working with science ideas, and 63% of the observations of Working with students’ ideas. In comparing row three with row one of Table 2, we see that in about 10% of observations, participants engaged in pedagogical moves that could be classified as expert-like.

Table 2. Summary classroom observation data for the four dimensions of ambitious pedagogy associated with Model-Based Inquiry

Four Dimensions of Ambitious Pedagogy | Pressing for explanation | Selecting big ideas/models | Working with science ideas | Working with students’ ideas

Total number of classroom observations for all participants in which this element was applicable | | | |

Observations (and percents of total observations) in which participant was rated beyond what standard curriculum required | 31 (38%) | 33 (41%) | 13 (16%) | 50 (63%)

Number of observations rated at highest level for that dimension | | | |

Average gain/loss in mean rating for all participants between student teaching and first year teaching (average gain/loss for TL-P participants) | +.35 (+.60) | -.03 (+.24) | -.20 (-.34) | -.21 (+.14)

All four of these pedagogical elements were major features of the pre-service preparation; however, in the first year the CFG sessions focused almost exclusively on pressing for explanation. The bottom row of Table 2 shows the mean gain or loss in ratings for each of these four dimensions between the student teaching observations and the first year of teaching. Of the four dimensions, only Pressing for explanation showed an improvement, and this improvement was significant (an average of a third of a point). Even more significant were the differences in degree of improvement between the TL-P participants and the rest of the participants. TL-P and Transitional participants showed an average gain of .60 in Pressing for explanation ratings, while TL-U participants lost .08. It appears that the analysis of students’ work and the CFG conversations contributed to a significant change in pedagogy, but only for those who were classified as TL-P or Transitional.

We coordinated our classroom observations with the final interviews to identify changes in their practice that were attributable to their participation in the CFGs. In the interviews, some participants described particular moments in the CFGs that gave them unexpected insights, which in turn altered their classroom practice. Sarah, for example, cited the second CFG, during which they were given some planning time with peers later in the day and were asked to use a checklist developed by the participants themselves in the previous CFG. Their list included teaching strategies for scaffolding students’ efforts to use evidence in developing explanations. As Sarah’s peers helped her move back and forth between concrete designs for her upcoming lesson and the scaffolding elements on the checklist, she suddenly realized that the principled explanation of scientific phenomena should serve as a focus of instruction (something that we as instructors thought we were making clear from the outset of the teacher education program). Sarah recalled the moment: “It was the first time I had thought about lessons being – the goal of the lesson is to end up with an explanation.”

Other participants spoke not in terms of epiphany, but of refining one’s vision over time, employing a more disciplined eye to their students’ performances and imagining with more clarity what might be pedagogically possible. Emily, for example, spoke about what the analysis allowed her to see:

I think, you know, the analysis worksheets that we had to fill out on student work and rate them at a 2 or a 3 or something like this, it is definitely the first step in understanding, not particularly what one student is doing, but understanding what the majority of a class is doing and thinking. And picturing where it is that they are at and where they need to go. I don’t think you can reflect in a quality way if you don’t make some things concrete when you’re analyzing student data.

A comment by Simon represents another major theme in participants’ thinking as a result of CFG activities: a recalibration in expectations of students and of their own teaching.

I think what I realized is that I might have actually been pretty comfortable early on just leaving them with recognizing that things are happening and recognizing patterns of stuff or patterns in data. Had I not used that [rubric] to look at like, okay what are these students understanding and then what’s the outcome that the book wants… And then realizing that there’s obviously this whole other component of like understanding like why it is that things are happening.

Not everyone could articulate how the CFG experiences influenced their practice in terms of seeing features of student thinking or imagining new levels of expectation for students. The TL-U participants, to a person, spoke only in broad conceptual language about the CFG experiences. Patricia remarked, “The CFGs were always amazing. I mean, it was always amazing to me that something could be picked up on that you hadn’t thought of… So, that I think I can make huge strides as far as trying to make it a little more inquiry-based instead of this textbook-based and try to have things work together.” When asked about how the analysis of work had changed her practice, she did not talk about fundamental shifts in thinking; rather, she talked only about acquiring portions of lessons that other participants had mentioned throughout the year: “…I think I got great ideas with what they did in their classroom more than necessarily hearing their dilemma. Not necessarily strategies to work with kids, but more ideas for activities and how to set them up – that kind of thing.” Her “acquisition” theory of student learning seemed mirrored in her own professional learning as well.

Another TL-U participant, Adam, admitted that examining student work was not helpful to him, and in the final interview he echoed a recurring theme from his own presentation of student work, that the responsibility for learning had little to do with him:

Well, I guess the analysis hasn’t done much for me, probably because I’ve noticed that all their [his students’] answers just don’t go that far. So they’re always the same, basically. They could go further. I think the big help has been in the CFGs where I’m getting feedback of how I could do this better next time.

Adam went on to say that, when looking at his student work, there was not much to analyze because his assignments were not framed in a way that fit on the rubric. The fact that his assessments never required his students to explain science ideas did not strike him as problematic, and consequently he made no changes throughout the year in his instruction or assessment.



We begin with three inter-related assertions that problematize what it means to individually and collectively take an inquiry stance to one’s teaching. First, the depth of analysis of students’ work and the way practice is represented to peers are influenced by participants’ tacit theories about the relationship between teaching and learning. Second, these differences in representations of practice vary along the fundamental discourses of who is responsible for learning (the student versus shared between students and teacher), and the expected outcome of dilemma discussions (fixing a problem versus learning more about how to support thinking). Third, in CFG situations, the quality and consequences of intellectual engagement with dilemmas of practice are shaped both by how the focal participant represents their practice to peers and by the underlying theories of teaching and learning each participant brings to the discussion.

Beginning from the first CFG, one group of participants consistently presented their dilemmas as puzzles of practice. These puzzles had three defining characteristics: they arose from in-depth analyses of student work across time and/or across different ability groups, they placed responsibility for learning in a space shared by the student and the teacher, and the questions posed to peers were about scaffolding learning versus fixing deficits in performances.

From a functional standpoint, when dilemmas were presented as puzzles, the other participants in that CFG tended to make more references to the student work in the course of the conversation. Substantive references to student work were important because they often initiated conversations that linked the science ideas with the nature of the questions or tasks given by the teacher, and with appropriate expectations for students. On the other hand, when dilemmas were presented as “problems with students,” the normalized portrayals of practice and the declarative nature of the dilemma as students “not getting it” seemed to foreclose the possibility that something might be gained by looking closely at student work.

From a research standpoint, the disposition to frame dilemmas as puzzles appeared to arise from a more fundamental theory participants held about the relationships between teaching and learning, and these theories appeared to interact with the tools and tool-based practices we used in the CFGs. In our classroom observations, we noticed that those participants who presented their practice as “problems with students” were working from an acquisition model of learning. These individuals, whose model of pedagogy we labeled “Teaching-Learning Unproblematic” (TL-U), began lessons with little knowledge of students’ scientific ideas, and made no efforts to make visible these beginning understandings. In contrast, the participants who routinely presented their dilemmas as puzzles of practice (“Teaching-Learning Problematic,” TL-P) often began units of instruction by eliciting students’ ideas, and then reshaped classroom conversations and tasks based on students’ conceptions.

Participants’ tacit theories of teaching and learning influenced not only how they represented their practice to others, but the ways in which they would participate in the CFGs, use the tools to inquire into practice, appropriate the language necessary to pose and solve problems with their peers, and eventually make strategic shifts in their practice based on CFG conversations. These theories may partially explain the contradictory findings in the literature around dialogic engagement in CFGs. Crespo (2002), for example, found that when teachers collaboratively examined their students’ work in mathematics, they engaged in talk characterized by the use of monologues, seeking and giving approval, and non-analytical or unproblematic narration of events. Crockett (2002) found that under similar conditions teachers showed a great deal of intellectual involvement; there were explicit disagreements about what the work was showing, and participants readily disclosed uncertainties and confusion to each other. The overt goals of collectively analyzing student work may not influence how participants engage with each other as much as participants’ working models of teaching and learning and, in turn, how practice is then represented to peers.

The question remains, however: Why would the TL-P participants engage in deeper analyses of student work than the TL-U participants? Extensive analysis requires effort, and such an investment, especially during the first year of teaching, requires a sense of pay-off, that valuable insights could emerge from the work. If TL-P participants held to a theory that learning is a product of sense-making, and that sense-making is generated by certain questions and tasks that have to be adapted to learners’ existing conceptions, then an analysis of student work, if it is the kind of work that displays student thinking, will have something valuable to indicate about the effects of instructional choices. In addition, participants’ analyses would be perceived as useful if they felt they could then make changes in their instruction. It turns out that all TL-P participants regularly made adaptations to their curriculum. We believe that the TL-U participants, on the other hand, followed their curriculum as an organized presentation of ideas and assumed learning should follow from exposure to it. The TL-U participants, in fact, rarely varied from their curricula, and when they did, it was to add a question or two to an existing assignment. For TL-U participants, the teacher’s potential role in student sense-making was not a central feature of their pedagogical imagination. For these participants, the analysis of student work took on an evaluative function, similar to grading. As such, it did not require a look across ability groups and consequently revealed only broad trends from a single assignment or from individual students whose only characteristic was that they did not take up information from the classroom experiences. This hypothesis about patterns of pedagogical decision-making may help to explain why Gearhart et al. (2006, p. 33) and others have found that “Weaknesses in teachers’ interpretation of evidence have consequences for equitable teaching and assessment” (see also: Morgan & Watson, 2002; Schafer, Swanson, Bene, & Newberry, 2001; Watson, 2000). It may be that instruction that is sensitive to student differences and adaptive (i.e., equitable) emerges from the same underlying theories of teaching and learning that allow such in-depth analyses of student work in the first place and that make constructivist approaches such as Model-Based Inquiry seem potentially productive for young learners.

Regarding participants’ differences in how they perceived responsibility for learning, if learning is considered a reception phenomenon (as TL-U participants believed), the burden of understanding lies primarily with the student. For TL-P participants, learning is a more complex interaction between the scientific ideas to be learned, conditions of learning that may be tailored to students’ current conceptions (the phrasing of a question, the particular demands of a task) and purposeful sense-making on the part of students. It follows that learning outcomes would be considered the products of teacher-student-task interactions. It would further be sensible for TL-P participants to frame the CFG questions in terms of how the teacher can create effective conditions for learning beyond what the curriculum offers (i.e., scaffolding), given that the curriculum by itself cannot shape instruction.


In this section, we draw upon activity theory to interpret two data patterns. First, when dilemmas were presented as puzzles of practice, the tools served to mediate talk that pressed participants to be accountable to understanding the science and the thinking represented in students’ work samples. This accountable talk was characterized by inter-animated discussions in which participants used the language of the rubric to make collaborative sense of the student artifacts, co-develop local theory about what counts as different levels of scientific explanation, imagine more sophisticated answers students could give, and posit the types of scaffolding needed to support higher student expectations. Second, rubrics that describe performance levels of a disciplinary practice, valued within a particular socio-professional setting, can be used not only as an analytical/evaluative tool for student work, but can also serve linguistic, pedagogical, and scientific functions. These functions, however, are less visible and consequently less accessible to individuals with unproblematic images of teaching and learning. These findings may help explain outcomes such as those by Horn (2005), who found that groups of mathematics teachers could productively use reifications of reform ideas (curriculum materials, assessment instruments, etc.) as tools to guide collaborative inquiries into practice, but that these tools varied widely in the extent to which they transformed teaching practice.

In our study the TL-P participants used the structure (levels) of the rubric and the language of the rubric (e.g., the “what,” “how,” and “why” scientific explanations, or the idea of what is theoretical versus observable) in ways that transcended the purely evaluative functions of the tool used by the TL-U participants. Together with changes in the protocol tool, which over the two years increasingly foregrounded the requirement that participants themselves co-construct explanations for the central science phenomenon, the rubric mediated conversations in which participants grappled with different versions of the scientific ideas in the lessons.

We hypothesize that the tools pressed participants to be accountable to understand both the science and the student work, and through these functions influenced key conversations (Figure 4). Accountability to the science was promoted by the protocol in that it required participants to present student work on a valued scientific practice. It directed participants to co-explain scientific phenomena in enough detail to support conversations about learner expectations and needed scaffolding. Accountability to understanding the students’ work was prompted by the protocol, in that time was provided for peers to closely examine artifacts brought to the CFGs and ask probing questions. As for the rubric, it aided accountability to science in that it furnished a language participants could use to co-construct deep scientific explanations and compare these against surface-level explanations. The rubric fostered accountability for students’ work in that it allowed participants to gauge at what depth students were constructing evidence-based explanations. Overall, this press for accountability influenced four inter-animated conversations of pedagogical value. These occurred in the later CFGs and generally involved the TL-P and Transitional participants. These individuals used language from the rubric to 1) make collaborative sense of students’ partial understandings and to 2) co-develop specific theory about “what counts” as different levels of scientific explanation. These conversations led participants to 3) imagine more sophisticated answers students could give and 4) suggest types of scaffolding needed to support these expectations for student performance.

Figure 4: Conversations of pedagogical value, evolving later in study, supported by the language of the rubric and demands of the protocol.


The rubric was used in ways we had not anticipated (linguistic, pedagogical, and scientific). Several of its most important functions were realized outside the analysis and CFG settings. The portability of this tool was not an intentional design element; nevertheless, the rubric took on a life of its own. Participants used it as a referent to directly shape instructional choices and to construct classroom assessments. Several even had their own students take up the language of the rubric as a way to negotiate with each other and the teacher what counted as high-level performances in the classroom. Kazemi and Hubbard (2008) note that it may be helpful to design analysis tools specifically as boundary objects that can travel across multiple settings (see Borko et al., 2008; Jacobs et al., 2007; Lewis et al., 2006); however, our findings indicate that, to be taken up for use in classrooms, the tool must have more than a general heuristic value for critical analysis. It must: 1) embody a valued practice (e.g., scientific explanation), 2) be applicable across grade levels and subject matter sub-domains, 3) represent practice in accessible language, and 4) provide descriptions of levels of performance from which teachers and students can identify where they are and what their next level of performance would be.

This model suggests that the idea of “shared language” as a criterion for successful engagement within a community of professionals be clarified. Vocabulary such as “explanation,” “inquiry,” “conclusions,” and “scientific” can be used to describe a range of fundamentally different activities or describe what appears in student work. After two years, a number of our participants clung to poorly articulated versions of these terms and/or used frames for teaching science that were based on vague notions of “the scientific method.” Patricia, for example, never developed an actionable pedagogical theory about scientific explanation. Our conclusion is that it is necessary but insufficient to have a shared vocabulary. The language must reflect and reinforce shared conceptions of what counts as key intellectual tasks. Without this, as Patricia demonstrates, participants have difficulty communicating to others what they actually do in the classroom, analyzing their students’ performances over time, and imagining how they might scaffold students to achieve definable goals.

The looming question for those of us interested in teachers’ early career development is: “Can more sophisticated theories about teaching and learning be fostered in novices through the design of praxis tools and the broader CFG experiences?” Because we have evidence that an unsophisticated view of teaching and learning underlies pedagogical reasoning, teaching moves, and frames for the analysis of student work, it seems as though we would merely be treating the symptoms if we, for example, more explicitly guided these individuals to represent their dilemmas as puzzles of practice or directed them to speak about the responsibility for learning as shared between student and teacher. On the other hand, we believe that helping participants design classroom assessment tasks that can yield rich data on student thinking may allow TL-U participants to see how more nuanced understandings are expressed by young learners and at least understand qualitative differences in how high-achieving, middle, and struggling students respond to instruction. For TL-P participants, the more rigorous analysis of high, middle, and underachieving students yielded insights about different needs they had, not simply that struggling students need “more of something.” In contrast to these analyses, the TL-U participants could not determine what they as teachers were doing particularly well or poorly, in part because they could not see the impact of their instruction on different types of learners.


Our primary assertion here is that the principled and ongoing analysis of student work around an important professional practice can support the development of expert-like pedagogical practice by novice teachers. However, before this is explored, a caution about causality and fully accounting for findings is in order. Up to this point, we have emphasized the influences of varying forms of participation in tool-supported routines as linked to changes in practice, rather than directly attributing such changes to the tools themselves or to the mental models of teaching held by participants. Even though simple causal storylines are inappropriate in this case, examining actual classroom practice can inform our understanding of how the tools and mental models of teaching/learning mediated participation in the collegial inquiries and perhaps teaching itself. In accounting for changes in classroom practice, we also recognize the inevitable influences of participants’ school contexts, and in particular, those of their cooperating teachers. As student teachers, some of our participants felt highly constrained about the kinds of practices they could engage in or about deviating from their curriculum. On the positive side, other participants readily took up thoughtful practices used by their cooperating teachers. We documented, however, a significant number of counterexamples for the dominating influence of context. Several individuals who reported feeling constrained in their instructional style and curriculum choices during their student teaching taught in similarly conformist ways when they moved into different school settings for their first year of practice, settings where they were free to use strategies and curriculum as they saw fit. Regarding cooperating teachers, Sarah and Simon, two of the most sophisticated beginners, worked with mentors whom they reported as routinely using practices antithetical to their own visions of equitable and effective instruction. So, while we acknowledge the likely contribution of school-based contextual influences, we focus principally on the context we provided, namely the regular collaborative inquiries into practice.

We see three major trends related to teaching practice. First, if we look at all participants, they frequently went beyond what the curriculum required them to do in the areas of Working with science ideas (16% of classroom observations), Selecting big ideas (41%), Working with students' ideas (63%) and Pressing for explanation (38%). Second, if we test the hypothesis that participating in the system of tools and tool-based practices was linked with more effective teaching, supporting evidence is provided by the fact that the mean gain in rating for Pressing for explanation between student teaching and first year teaching was more than a third of a point (+0.35) while the other three categories slightly declined. Third, if we test an adjunct hypothesis that productive participation in the CFGs (effortful analysis of student work, engaging in the student work of others, using the rubric to make sense of one's current teaching repertoire and considering how to advance one's practice) would be associated with even higher gains in the target pedagogical elements, we find supporting evidence in the fact that the TL-P and the Transitional participants had a mean gain of 0.60 in Pressing for explanation, while the TL-U participants remained nearly the same between student teaching and first year teaching. This increase by TL-P participants is even more impressive when we consider that some of their student teaching observations were already rated as level 3 or 4 (the maximum).

To put these ratings in perspective, "going beyond the curriculum" for these novices does not mean grafting on additional questions or tasks to the lessons they were handed (literally). On the Pressing for explanation dimension, for example, teaching at a level 3 puts students in a qualitatively different and more challenging learning situation as compared to a level 2 or 1. Teachers can ask new kinds of epistemic questions since unobservable entities and events are being used to explain how and why the observable world operates in the ways it does. Such considerations are at the very heart of scientific theorizing and our Model-Based Inquiry approach. The fact that several of our participants in this study were successful in taking students to these levels with topics like pulleys (Simon), air pressure (Barbara), pond ecosystems (Rachel), and natural selection (Sarah) leads us to label this kind of teaching as ambitious and expert-like.

Teaching in these ways involves riskier, more complex ways of relating to the subject matter and the learners than traditional approaches (characterized by the "1" rating of the four elements of pedagogy in our matrix). We cannot overstate what other researchers have found (Baldi et al., 2007; Banilower et al., 2006; Goodlad, 1983; Horizon Research International, 2003; Roth & Garnier, 2007): that traditional forms of instruction remain the norm in science classrooms and that these methods are generally unresponsive to students' thinking and lack disciplinary rigor. Any attempts by new teachers to challenge these norms, then, are small victories that must be supported. Support is key because this kind of teaching comes at a price in the chaotic early years of one's career. In the fall of her first year of teaching, Sarah confided to us, her voice shaking from stress, that it would save her hours of preparation time if she would just use the lesson plans of her colleagues (which were heavily fact-based). "I can't," she said, "I'm not that kind of teacher."


This research indicates that pre-service and first year teachers are capable of productively analyzing student work, and more importantly that these analyses can play a significant role in helping some teachers develop expert-like classroom repertoires early in their career. Thankfully, none of these findings are particular to science; similarly principled tools and routines can be developed in any subject matter area where student thinking or performance can be documented. The question remains, however: Who benefits most from these experiences and why? The nature of individuals' participation in systems of tool-based practices reflects their underlying assumptions of what counts as learning and what counts as good (or adequate) teaching. The habits of mind that follow from these orientations influence the kinds of artifacts that can be collected from classroom tasks, the depth of analyses of student work, the ways in which dilemmas are represented to peers, the nature of collegial conversations around these dilemmas, and ultimately what individuals can learn from the collaborative examination of records of practice. It appears that those who begin their careers with a problematized view of the relationships between teaching and learning are not only more likely to engage early in more skilled teaching, but also to benefit more from evidence-based collaborative inquiry into practice. This kind of professional momentum is more difficult to achieve for those beginning their careers with simplified conceptions of teaching and learning. These findings help explain why, in and of themselves, neither experience nor inquiry improves teaching (Ball & Cohen, 1999).

This study also highlights the potentially powerful role of praxis tools and tool-based routines, tailored to the needs of beginning teachers, in fostering ambitious pedagogy. Our relatively simple system of protocol and rubric served several important mediating functions that were transformative for many participants. This success, we believe, can support the design of more sophisticated systems of tools for early career teachers' collaborative inquiry and can inform theory around the implementation of these tools. We are currently testing a set of discourse tools for classroom conversations that novices find especially challenging, and a learning progression for Model-Based Inquiry to be used by teachers in both crafting investigative experiences for students and assessing where their own practice is located on this progression.

Hiebert et al. (2007) suggest that there is a long learning curve to expert teaching, and that this path is not traversed very far in preparation programs. We do not fundamentally disagree with the latter half of this statement, but feel that there may be multiple paths to expertise and that the journey may be hastened substantially with coherent programs of support that focus uncompromisingly on student learning. After witnessing participants' first year in classrooms, we saw how the faint echoes of best practices from their preparation program could be drowned out by the unrelenting pace of the plan-teach-grade cycles that dominate their working lives. Given this, we consider it not only professionally prudent, but a moral imperative to allow early career teachers regular opportunities for collaborative, supportive and disciplined reflection on their practice.


Achinstein, B. (2002). Conflict and community: Micropolitics of teacher collaboration. Teachers College Record, 104(3), 421-455.

Banilower, E., Smith, P. S., Weiss, I. R., & Pasley, J. D. (2006). The status of K-12 science teaching in the United States: Results from a national observation survey. In D. Sunal & E. Wright (Eds.), The impact of the state and national standards on K-12 science teaching (pp. 83-122). Greenwich, Connecticut: Information Age Publishing.

Baldi, S., Jin, Y., Skemer, M., Green, P., Herget, D., & Xie, H. (2007). Highlights from PISA 2006: Performance of US 15-year old students in science and mathematics literacy in an international context. National Center for Education Statistics, US Department of Education.

Ball, D. & Cohen, D. (1999). Developing practice, developing practitioners: Toward a practice-based theory of professional education. In L. Darling-Hammond & G. Sykes (Eds.), Teaching as the learning profession: Handbook of policy and practice (pp. 3-32). San Francisco: Jossey-Bass.

Borko, H., Jacobs, J., Eiteljorg, E., & Pittman, M. E. (2008). Video as a tool for fostering productive discussions in mathematics professional development. Teaching and Teacher Education, 24, 417-436.

Cobb, P., Dean, C., & Zhao, Q. (2006). Conducting design experiments to support teachers' learning. Paper presented at the Annual Conference of the American Educational Research Association, San Francisco, CA.

Cole, M. (1996). Cultural psychology: A once and future discipline. Cambridge, MA: Harvard University Press.

Crespo, S. (2002, April). Doing mathematics and analyzing student work: Problem-solving discourse as a professional learning experience. Paper presented at the American Educational Research Association, New Orleans.

Creswell, J. (1998). Qualitative inquiry and research design. Thousand Oaks, CA: Sage Publications.

Crockett, M. D. (2002). Inquiry as professional development: Creating dilemmas through teachers' work. Teaching and Teacher Education, 18, 609-624.

Curry, M. (2008). Critical friends groups: The possibilities and limitations embedded in teacher professional communities aimed at instructional improvement and school reform. Teachers College Record, 110(4), 733-774.

Darling-Hammond, L. & McLaughlin, M. W. (1995). Policies that support professional development in an era of reform. Phi Delta Kappan, 76(8), 597-604.

Engestrom, Y. (1999). Activity theory and individual and social transformation. In Y. Engestrom, R. Miettinen, & R. Punamaki (Eds.), Perspectives on activity theory (pp. 19-38). New York: Cambridge University Press.

Feldman, M. S. & Pentland, B. T. (2003). Reconceptualizing organizational routines as a source of flexibility and change. Administrative Science Quarterly, 48(1), 96.

Friedman, M. (1974). Explanation and scientific understanding. Journal of Philosophy, 71, 5-19.

Fullan, M. (1993). Change forces: Probing the depths of educational reform. Bristol, PA: Falmer Press.

Gearhart, M., Nagashima, S., Pfotenhauer, J., Clark, S., Schwab, C., Vendlinski, T., Osmundson, E., Herman, J., & Bernbaum, D. (2006). Developing expertise with classroom assessment in K-12 science: Learning to interpret student work. Interim findings from a 2-year study. CSE Technical Report 704. Center for the Assessment and Evaluation of Student Learning (CAESL)/University of California, Berkeley & Center for the Assessment and Evaluation of Student Learning (CAESL)/University of California, Los Angeles.

Goldenberg, C., Saunders, B., & Gallimore, R. (2004). Settings for change: A practical model for linking rhetoric and action to improve achievement of diverse students (Final report to the Spencer Foundation: Grant #199800042). Long Beach: California State University at Long Beach.

Goodlad, J. (1983). A place called school. New York: McGraw-Hill.

Goodlad, J. (1990). Teachers for our nation's schools. San Francisco: Jossey-Bass.

Green, J., & Bloome, D. (1997). Ethnography and ethnographers of and in education: A situated perspective. In J. Flood, S. Brice Heath, D. Lapp (Eds.), Research on teaching literacy through the communicative and visual arts (pp. 181-202). New York: Macmillan.

Grossman, P., & McDonald, M. (2008). Back to the future: Directions for research in teaching and teacher education. American Educational Research Journal, 45, 184-205.

Grossman, P., Valencia, S., Evans, K., Thompson, C., Martin, S., & Place, N. (2000). Transitions into teaching: Learning to teach writing in teacher education. Journal of Literacy Research, 32(4), 631-662.

Grossman, P., Smagorinsky, P. & Valencia, S. (1999). Appropriating tools for teaching English: A theoretical framework for research on learning to teach. American Journal of Education, 108, 1-29.

Hammer, D. (1999). Teacher inquiry. Newton, MA: Center for the Development of Teaching, Education Development Center, Inc.

Hammerness, K., Darling-Hammond, L., Bransford, J., Berliner, D., Cochran-Smith, M., McDonald, M., & Zeichner, K. (2005). How teachers learn and develop. In L. Darling-Hammond, J. Bransford, P. LePage, K. Hammerness, & H. Duffy (Eds.), Preparing teachers for a changing world: What teachers should learn and be able to do (pp. 358-389). San Francisco, CA: Jossey-Bass.

Hiebert, J., Morris, A., Berk, D., & Jansen, A. (2007). Preparing teachers to learn from teaching. Journal of Teacher Education, 58(1), 47-61.

Horizon Research International. (2003). Special tabulations of the 2000-2001 LSC teacher questionnaire and classroom observation data. Chapel Hill, NC: Horizon Research.

Horn, I. (2005). Learning on the job: A situated account of teacher learning in high school mathematics departments. Cognition and Instruction, 23(2), 207-236.

Jacobs, V. R., Franke, M. L., Carpenter, T. P., Levi, L., & Battey, D. (2007). A large-scale study of professional development focused on children's algebraic reasoning in elementary school. Journal for Research in Mathematics Education, 38, 258-288.

Kazemi, E., & Franke, M. L. (2004). Teacher learning in mathematics: Using student work to promote collective Inquiry. Journal of Mathematics Teacher Education, 7, 203-235.

Kazemi, E. & Hubbard, A. (2008). New Directions for the Design and Study of Professional Development: Attending to the Coevolution of Teachers' Participation across Contexts. Journal of Teacher Education, 59(5), 428-441.

Kennedy, M. (1999). The role of pre-service education. In L. Darling-Hammond & G. Sykes (Eds.) Teaching as the learning profession: Handbook of policy and practice (pp. 54-85). San Francisco: Jossey-Bass.

Lampert, M., & Graziani, F. (2009). Instructional activities as a tool for teachers' and teacher educators' learning. Elementary School Journal, 109(5), 491-509.

Lewis, C., Perry, R., & Murata, A. (2006). How should research contribute to instructional improvement?  The case of lesson study. Educational Researcher, 35, 3-14.

Lieberman, A. & Miller, L. (2001). Teachers caught in the action: The work of professional development. New York: Teachers College Press.

Little, J.W. (2007). Teachers' accounts of classroom experience as a resource for professional learning and instructional decision making. In P.A. Moss (Ed.), Evidence and decision-making: The 106th yearbook of the National Society for the Study of Education, Part I (pp. 217-240). Malden, MA: Blackwell Publishing.

Little, J. W., & Horn, I. S. (2007). 'Normalizing' problems of practice: Converting routine conversation into a resource for learning in professional communities. In L. Stoll & K. S. Louis (Eds.), Professional learning communities: Divergence, depth, and dilemmas (pp. 79-92). New York: Open University Press.

Lortie, D. (1975). Schoolteacher: A sociological study. Chicago: University of Chicago Press.

Morgan, C., & Watson, A. (2002). The interpretative nature of teachers' assessment of students' mathematics: Issues for equity. Journal for Research in Mathematics Education, 33, 78-110.

National Research Council. (2005a). America's lab report: Investigations in high school science. Committee on High School Laboratories: Role and vision. In S. R. Singer, A. L. Hilton, & H. A. Schweingruber (Eds.). Board on Science Education. Center for Education, Division of Behavioral and Social Sciences and Education. Washington, DC: National Academies Press.

National Research Council. (2005b). How students learn science in the classroom. Washington DC: National Academies Press.

National Research Council (2006). Taking science to school: learning and teaching science in grades K-8. Committee on Science Learning, Kindergarten Through Eighth Grade. R. A. Duschl, H. A. Schweingruber, and A. W. Shouse (Eds.). Board on Science Education, Center for Education, Division of Behavioral and Social Sciences and Education. Washington, DC: The National Academies Press.

Nave, B. (2000).  Among critical friends: A study of critical friends groups in three Maine schools. Ed.D. dissertation, Harvard University, United States MA. Retrieved September 7, 2005, from ProQuest Digital Dissertation database. (Publication No. AAT 9968318).

Nelson, T. H., Slavit, D., Perkins, M., & Hathorn, T. (2008). A culture of collaborative inquiry: Learning to develop and support professional learning communities. Teachers College Record, 110(4).

Nemser, S. F. (1983). Learning to teach. In L. Shulman & G. Sykes (Eds.), Handbook of teaching and policy (pp. 150-170). New York: Longman.

Newmann, F., & Associates. (1996). Authentic achievement: Restructuring schools for intellectual quality. San Francisco: Jossey-Bass.

Nolen, S. B., Ward, C. J., Horn, I. S., Childers, S., Campbell, S., & Mahna, K. (forthcoming). Motivation in preservice teachers: The development of utility filters. In M. Wosnitza, S. A. Karabenick, A. Efklides & P. Nenniger (Eds.). Contemporary Motivation Research: From Global to Local Perspectives. Ashland, OH: Hogrefe & Huber.

Roth, K. & Garnier, H. (2007). What science teaching looks like: An international perspective. Educational Leadership, 64(4), 16-23.  

Salmon, W. (1989). Four decades of scientific explanation. Minneapolis, MN: University of Minnesota Press.

Sandoval, W., Deneroff, V., & Franke, M. (2002). Teaching as learning, as inquiry: Moving beyond activity in the analysis of teaching practice. Paper presented at the annual conference of the American Educational Research Association, April, New Orleans, La.

Schafer, W. D., Swanson, G., Bene, N., & Newberry, G. (2001). Effects of teacher knowledge of rubrics on student achievement in four content areas. Applied Measurement in Education, 14, 151-170.

Schauble, L., Glaser, R., Duschl, R., Schulze, S., & John, J. (1995). Students' understandings of the objectives and procedures of experimentation in the science classroom. Journal of the Learning Sciences, 4(2), 131-166.

Schwarz, C. & White, B. (2005). Meta-modeling knowledge: Developing students' understanding of scientific modeling. Cognition and Instruction, 23(2), 165-205.

Simmons, P. and others (1999). Beginning teachers: Beliefs and classroom actions. Journal of Research in Science Teaching, 36(8), 930-954.

Smith, C. L., Maclin, D., Houghton, C., & Hennessey, M. G. (2000). Sixth-grade students' epistemologies of science: The impact of school science experiences on epistemological development. Cognition and Instruction, 18(3), 349-422.

Spillane, J. & Miele, D. (2007). Evidence in practice: A framing of the terrain. In P.A. Moss (Ed.), Evidence and decision-making: The 106th yearbook of the National Society for the Study of Education, Part I (pp. 46-73). Malden, MA: Blackwell Publishing.

Stewart, J., Hafner, R., Johnson, S., & Finkel E. (1992). Science as model-building: Computers and high school genetics. Educational Psychologist, 27, 317-336.

Stewart, J., Passmore, C., Cartier, J., Rudolph, J., & Donovan, S. (2005). Modeling for understanding in science education. In T. Romberg, T. Carpenter, & F. Dremock (Eds.), Understanding mathematics and science matters (pp. 159-184). Mahwah, NJ: Lawrence Erlbaum Associates.

Strauss, S. (1993). Teachers' pedagogical content knowledge about children's minds and learning: Implications for teacher education. Educational Psychologist, 28, 279-290.

U.S. Department of Education. (2008). Web document retrieved June 25th, 2008 at http://www.ed.gov/nclb/methods/teachers/hqtflexibility.html.

Warren-Little, J., Gearhart, M., Curry, M., & Kafka, J. (2003). Looking at student work for teacher learning, teacher community, and school reform. Phi Delta Kappan, 185-192, November.

Watson, A. (2000). Mathematics teachers acting as informal assessors: Practices, problems and recommendations. Educational Studies in Mathematics, 41, 69-91.

Wenger, E. (1998). Communities of practice:  Learning, meaning, and identity. Cambridge: Cambridge University Press.

Wertsch, J. (1991). Voices of the mind: A sociocultural approach to mediated action. Cambridge, MA: Harvard University Press.  

Wheatley, K. (2002). The potential benefits of teacher efficacy doubts for educational reform. Teaching and Teacher Education, 18, 5-22.

Wilcox, S. K., Schram, P., Lappan, G., & Lanier, P. (1991). The role of a learning community in changing preservice teachers' knowledge and beliefs about mathematics education. East Lansing, MI: Michigan State University. ERIC Document Reproduction Service No. 330 680.

Wilson, S. M., & Berne, J. (1999). Teacher learning and the acquisition of professional knowledge: An examination of research on contemporary professional development. In A. Iran-Nejad, & P. D. Pearson (Eds.), Review of Research in Education, 24, 173-209.

Windschitl, M., Thompson, J. & Braaten, M. (2008a). Beyond The Scientific Method: Model-Based Inquiry As A New Paradigm of Preference for School Science Investigations. Science Education, 92(5), 941-967.

Windschitl, M., Thompson, J. & Braaten, M. (2008b). How novice science teachers appropriate epistemic discourses around model-based inquiry for use in classrooms. Cognition and Instruction, 26(3), 310-378.  


Four dimensions of Model-Based Inquiry instruction. Within each dimension, levels are listed from the least sophisticated practices (first) to practices representing ambitious pedagogy (last). Note: This is not the same rubric that the participants used to analyze student work.

1) Selection and treatment of key ideas from the curriculum

Topic focus
  • T selects things as topic for instruction.
  • In class, T's press is on describing, naming, labeling, identifying, using correct vocabulary.

Process focus
  • T selects natural processes as topics, but without any connections to underlying causes.
  • In class, T focuses on what is changing in a system or descriptions of how a change happens within a condition.

Model/Theory focus
  • T able to see fundamental ideas in the curriculum.
  • T has Ss focus on unobservable and/or theoretical processes or on the relationships among science concepts.

2) Working with students' ideas

Monitoring, checking, re-teaching ideas
  • T begins instruction with no knowledge of Ss' conceptions.
  • Instruction centers on delivering correct information.
  • Whole class conversations are only to check for nominal understandings.
  • T engages in one-on-one tutoring to see if students "get it."

Eliciting Ss' initial understandings
  • T elicits Ss' initial hypotheses, questions, or conceptual frameworks about a scientific phenomenon.
  • This information not consciously used to shape subsequent instruction.

References Ss' ideas & adapts instruction
  • Within and across lessons T elicits and uses Ss' current conceptions of science ideas to reshape direction of classroom conversations.
  • T engineers productive classroom conversations, or consciously re-shapes Ss' line of thinking across multiple lessons.

3) Working with science ideas in the classroom

Primary focus on procedure
  • T asks Ss to identify variables in a system and describe an experimental set-up.
  • Science concepts are played down to afford time to talk about designing investigations.
  • Talk with Ss around method is about error and validity.

Discovering or Confirming Science Ideas
  • T has Ss discover conceptual relationships for themselves (with minimal background ahead of time) OR
  • T has Ss use an activity as proof of concept.

Forwarding science ideas to work on
  • T foregrounds key science concepts and asks Ss to use an investigation to make sense of the concepts.
  • Focus is on sense-making between data and developing science concepts.

Model-Based Inquiry focus
  • T set-up for inquiry and data collection highlights tentative explanatory models as the basis for investigation into a phenomenon.
  • T uses model as a touching point before, during and after an inquiry; builds in background knowledge of key science ideas and models before, during and following an inquiry.

4) Pressing for explanation

"What happened" explanation
  • T asks Ss to provide a description of a phenomenon or thing, or may ask Ss to put into words a given scientific correlation.

"How"/partial "why" explanation
  • T asks students to characterize relationships between observable/detectable features of a system.

Causal explanation
  • T asks Ss to use theoretical or unobservable processes to tell causal story of why something happened.
  • T unpacks/scaffolds what counts as an accountable scientific explanation with Ss.


Protocol for Critical Friends Groups (note Step 6 added after first CFG*)

Consultancy Protocol

  • Presenter: teacher bringing student work and a question for the group to discuss.
  • Facilitator: colleague who coordinates the group process and monitors time while participating.
  • Participants: teacher colleagues who collaborate in the analysis and discussion of student work.

Step 1 (5 minutes)
Presenter gives overview of student work and central question or dilemma. Please address the following factors:
  • Summary of findings from the analysis of student work including any patterns or trends seen in the data.
  • Brief context of the lesson(s) & student(s) featured in the samples of work.
  • Central question or dilemma that emerged from the analysis and that is featured in the samples.

Step 2: Reading & Listening (15 minutes)
Group quietly reads and reviews the sections of student work pertaining to presenter's dilemma.

Step 3: Clarifying questions (5 minutes)
Participants ask clarifying questions of presenter. Clarifying questions have brief, factual answers. Presenter responds.

Step 4: Probing questions (5 minutes)
Participants ask probing questions of presenter. Probing questions push presenter to think deeply about assumptions and different perspectives. The goal is to use questions to help presenter expand his/her thinking about student work and central question/dilemma. Presenter responds to the probing questions, but there is no larger discussion.

Step 5 (10 minutes)
Presenter is silent listener while participants engage in larger discussion of student work, the central question, and the information gathered from responses to questions. Participants encouraged to include both warm and cool feedback in the discussion.
  • Warm feedback: identify what you see or hear about successful first steps that students made (or that the teacher made) in these assignments.
  • Cool feedback: suggest an area that has some room for improvement and provide the next step that could be taken.

Step 6: Scientific Evidence and Explanation (10 minutes)
  • What is a level 3 explanation for this phenomenon?
  • What could teachers do to be more rigorous about incorporating content into the inquiry?

Step 7 (5 minutes)
Presenter reflects on what he/she learned from the consultancy. Presenter reflects on any new ideas, new perspectives, or questions that emerged from the group discussion. Presenter also reflects on central question/dilemma in light of discussion.

Step 8 (5 minutes)
Facilitator leads conversation about the overall group process, reflecting on the dynamics of the group and the use of the protocol. Some ideas for debriefing include:
  • Active listening: How did group members use strategies like paraphrasing and wait time to be active listeners?
  • Generous reading: How did group members give the student work a generous reading during the process?
  • Underlying assumptions: How did group members challenge underlying assumptions during the process?

Cite This Article as: Teachers College Record Volume 113 Number 7, 2011, p. 1311-1360
https://www.tcrecord.org ID Number: 16061, Date Accessed: 10/7/2021 8:43:37 AM

About the Author
  • Mark Windschitl
    University of Washington
    MARK WINDSCHITL is a Professor of Science Teaching and Learning. His research interests deal with the early career development of science teachers. Current work includes a five-year project to develop and study a system of tools and tool-based practices for early career secondary science teachers that support transitions from novice to expert-like pedagogical reasoning and practice.
  • Jessica Thompson
    University of Washington
    JESSICA THOMPSON is a Research Assistant Professor with the University of Washington College of Education. Her areas of inquiry focus on engaging underserved girls in science and on helping science teachers work toward ambitious and equitable pedagogy.
  • Melissa Braaten
    University of Washington
    MELISSA BRAATEN is a research assistant in science education working with the team to develop the tools, videos, and coursework that are part of this project. As a 4th year doctoral student Melissa is researching the relationships between teachers' learning, teachers' classroom practice, and student learning.