
Classroom Assessment: Tensions and Intersections in Theory and Practice


by Susan Brookhart - 2004

The practice of classroom assessment occurs at the intersection of three teaching functions: instruction, classroom management, and assessment. Theory relevant to studying classroom assessment comes from several different areas: the study of individual differences (e.g., educational psychology, theories of learning and motivation), the study of groups (e.g., social learning theory, sociology), and the study of measurement (e.g., validity and reliability theory, formative and summative assessment theory). This article explores how intersections of these areas have played out in the classroom assessment literature over the last 20 years. Some literature has emphasized one practical function or theoretical tradition; some literature has blended several. Overlapping theoretical traditions are opportunities both for richness of understanding and for tensions and conflicts.

Why is the classroom assessment literature all over the map, so to speak, and what is the map? Wiliam (1998) noted that two widely cited and substantive review articles from the same time period (Crooks, 1988; Natriello, 1987) had 323 references between them, but only 9 in common. Black and Wiliam (1998a) noted that formative assessment did not have a tight definition or usage. What British authors call formative assessment or teacher assessment is often called classroom assessment in the United States literature, although classroom assessment in the United States does include some aspects of assessment that are more properly considered summative (for example, grading).


I submit that classroom assessment as an academic field (that is, the theoretical and empirical study of the nature and effects of classroom assessment in schools) would make more progress if it had a clearer identity. I further submit that the reason the field is a bit scattered at present is that classroom assessment sits at intersections in both theory and practice and that the resulting array of relevant practical and theoretical material creates tensions for those who try to chart this territory. That several theoretical bases are relevant to classroom assessment also makes it more difficult for any one scholar to encompass them; academic research typically arises in some particular theoretical tradition.


THESIS


The practice of classroom assessment occurs at the intersection of three teaching functions: instruction, classroom management, and assessment. Theory relevant to classroom assessment comes from several different areas of study: the study of individual differences (e.g., educational psychology, theories of learning and motivation), the study of groups (e.g., social learning theory, sociology), and the study of measurement (e.g., validity and reliability theory, formative and summative assessment theory). Figure 1 presents a visual organizer for the reader.


I set out to test this thesis that classroom assessment as a field is characterized by these tensions and intersections by conducting a literature review covering roughly the last 20 years of classroom assessment literature. The purpose of this review was to explore and demonstrate how these tensions have been manifest in the classroom assessment literature. I made an effort to review both empirical studies and theoretical or review articles, and I tried to look at literature beyond the United States. Nevertheless, given the thesis of the article, I would not be surprised if there are some articles readers might have found and included that I did not.


Figure 1. Tensions in Classroom Assessment Theory and Practice




METHOD


LITERATURE SEARCH


I wished to find articles about (a) classroom assessment in basic education (K–12 in the United States, or its equivalent elsewhere) as opposed to higher education; (b) classroom assessment for learning as opposed to classroom assessment for administrative and grading purposes; and (c) classroom assessment as opposed to large-scale assessment.1 Because I would be using what I found to discuss the "field" of classroom assessment, I limited my search to published work to ensure some sort of peer review, reasoning that one's academic or scientific peers are the "field" in some sense. I was looking to select articles published within the last 20 years.


To canvass the United States literature, I searched the ERIC database from 1982–2002, using the phrase "classroom assessment." I limited the search to journal articles. From the 149 abstracts returned, I removed those in agriculture and in higher education, resulting in 83. I re-reviewed these to select those that had a K–12 academic focus (e.g., not assessment of behaviors or attitudes) and were either some sort of empirical work or theoretical literature review. I eliminated opinion papers, general discussions, or recommendations and guides for teachers. I added three reviews that are widely cited (Black & Wiliam, 1998a; Crooks, 1988; Natriello, 1987) because these have, in being widely cited, come to help define the "field." I added literature I have read myself, in the course of working in this field, that I judged to be relevant even though it did not score a hit in the ERIC search. These included three book-length summaries of research (Gipps, McCallum, & Hargreaves, 2000; Stiggins & Conklin, 1992; Torrance & Pryor, 1998).


To canvass the non-U.S. literature published in English, I searched the British Educational Research Journal table of contents on the publisher's Web site, adding one more article to the list. I also searched Educational Research Abstracts online, using the phrase "classroom assessment." Of the 16 articles returned, I removed those about higher education and those that I had already identified from the ERIC search. Four additional articles resulted. I also obtained a special issue of the Alberta Journal of Educational Research on classroom assessment.



Table 1. Characteristics of Articles Reviewed

                          EMPIRICAL    REVIEW
Publication Dates
  1980–1984                    2
  1985–1989                    7
  1987–1989                                3
  1990–1994                    7           6
  1995–1999                   15           7
  2000–2001                                1
  2000–2002                    9

Subjects
  Preservice teachers          4
  Inservice teachers          20
  Students                    10
  Pre- & inservice             1
  Inservice & students         5

Country
  United States               25
  Canada                       6
  Israel                       2
  United Kingdom               2
  Greece                       1
  Netherlands                  1
  New Zealand                  1
  UK and Ghana                 1
  Venezuela                    1


I reviewed the reference lists of the articles to identify additional relevant articles. A reviewer suggested some additional works. The final collection of published material included 41 articles, 1 book chapter, and 3 books that reported results based on quantitative or qualitative data collected in research studies. Because some of the studies in this count represented a series of analyses of data from the same study, there were 40 separate entries in the analysis. I counted as empirical those articles that referred to specific data collected even if they were not written up in a conventional research report structure. The final collection also included 17 reviews of various kinds (theoretical reviews, meta-analyses, and other broad and narrow literature reviews), in all 15 articles (representing 12 reviews) and 2 book chapters. The reference section of this article is coded to identify the empirical and review studies. Table 1 presents characteristics of the articles reviewed.



ANALYSIS


I sorted the empirical articles according to (a) whether the subjects were preservice teachers, in-service teachers, or K–12 students; (b) the country in which data were collected; and (c) the main theoretical or practical frameworks (from Figure 1) that structured the analysis. Similarly, I sorted the reviews according to the main theoretical and/or practical frameworks (from Figure 1) that structured the analysis.


By "main theoretical and practical frameworks," I do not mean an exhaustive categorization of each reference in an article's bibliography. Rather, I attempted to categorize the framework as a whole to see what learned tradition influenced the research question and methods (for empirical articles) and the thesis and discussion (for reviews). For example, if an article mainly talked about the role of feedback in providing information that individual students can use for learning, I counted that as a framework based on individual psychology (in this case, specifically, learning theory). Frameworks that paid attention to the group nature of classrooms, including social learning theories, classroom environment theories, or sociological theories, were counted as frameworks based on theories about groups. Frameworks based in measurement theory fell mostly into two categories, either talking about measurement quality (validity and reliability) or measurement function (formative and summative).




THEORETICAL AND PRACTICAL BASES OF CLASSROOM ASSESSMENT LITERATURE


PRACTICAL BASES: INSTRUCTION, CLASSROOM MANAGEMENT, AND ASSESSMENT PRACTICES


All 40 of the classroom assessment studies as searched and defined above discussed classroom assessment practices; this was the definition required by the search. Twelve (30%) discussed classroom assessment practices and related instructional issues. Seven (18%) discussed classroom assessment practices, instructional practices, and classroom management issues. The remaining 21 articles, roughly half, discussed classroom assessment practices without reference to related instructional or management practices.


INVENTORIES OF CLASSROOM ASSESSMENT PATTERNS OR PRACTICES


The basic purpose of some studies was to do an inventory of classroom assessment practices. Some studies did simply that. Either there was no theoretical framework at all, or there was a literature review but it did not drive the nature of the research questions for the inventory or the discussion of results. In either case, these studies remained basically reports of practice.


Barnes (1985) concentrated on the people who do the evaluating. She interviewed 20 student teachers and their 20 cooperating teachers to find out their purposes for and uses of classroom evaluation. For the teachers, the main purpose of classroom evaluation was grades. Teachers also made intuitive judgments about their pupils based on their grades. For student teachers, the main purposes of evaluation were evaluating motivated students, communicating with parents, classifying students, and assessing their own effectiveness. Their concept of evaluation was limited to marking papers. Teachers' concerns included (1) fear of hurting pupils' feelings, (2) lack of evaluation knowledge and skills, and (3) unresolved conflicts over criteria to be used.


Several studies concentrated on listing the kinds of classroom assessment methods U.S. teachers used. Gullickson (1985) surveyed 295 teachers in one state to find what student evaluation techniques they used and whether these practices varied by grade or subject. Elementary teachers used a diversity of techniques, including heavy reliance on nontest techniques. Secondary teachers relied more heavily on tests. Secondary teachers used fewer commercially prepared tests than did elementary teachers. Schmidt and Brosnan (1996) surveyed 180 mathematics teachers in 35 school districts to find out what evaluation practices and reporting methods were in use in mathematics. They concluded that current reporting practices limited the use of alternative assessment methods in mathematics. Teachers reported believing alternative assessments were important but did not use them much. Most used points or proportions correct on conventional tests and assignments to arrive at grades.


Practice inventories have also been conducted outside of the United States. Nicholson and Anderson (1993) interviewed 21 Canadian primary (K–4) teachers about observational assessment practices and opinions about using observation for classroom assessment. They found teachers felt comfortable with observation. Most believed the purpose of assessment was to help pupils learn and that assessment was part of the larger task of education. About half believed the purpose of observation was also to collect information to communicate to parents. To most, observation meant simply watching students. It also could mean listening to pupils, recording anything children do, and asking pupils questions. Most reported they observed continuously, then focused on behavior of interest. Methods of record keeping varied, and quiet students in the middle were often overlooked.


Mavrommatis (1997) observed 20 and surveyed 372 teachers in Greece about their assessment practices. Teachers used oral questioning, class or individual discussions, informal observations, commenting or marking work, behavior, interaction, paper-pencil exercises, and tests. Evidence interpretation was somewhat criterion referenced (against the tasks and assignments themselves), but there were not usually clear criteria. Teachers were aware of this lack of clarity.


There is some use for this describe-and-categorize kind of information. A colleague once reminded me that one of the main rationales for descriptive research in education is that we do not yet have a full set of categories or taxonomies for all the topics of interest in our field. As he put it (Shank, personal communication), "there was no Linnaeus" in educational research. Nevertheless, inventories of practice that remain at the level of reporting a list of practices, or a list with frequencies and perceived importance, lack a framework for discussion that allows next steps to be taken to advance a research agenda.


INVENTORIES OF CLASSROOM ASSESSMENT PRACTICES BASED IN A THEORETICAL FRAMEWORK


Some inventories of practice were based on a theoretical framework. The results of these studies have the potential to inform additional work because of their theoretical basis. However, of the five studies clearly in this genre (inventories of practices that were based on some theoretical framework), each used a different framework. Thus these studies serve to illustrate the patchwork nature of the theoretical bases for classroom assessment research, creating a tension instead of resolving one.


Stiggins and Bridgeford (1985) surveyed 228 teachers in 8 districts across the United States. Their questionnaire was essentially an inventory of assessment practices, but they used a framework based on teachers’ levels of use and stages of concern, that is, a framework concerned with the psychology of teachers. They found that teacher-developed assessments were the mainstay of classroom assessment. Teachers reported being concerned about assessment and needing help making changes. Stiggins and Bridgeford’s grade- and subject-level findings were similar to Gullickson’s (1985). Use of teacher-made objective tests and structured performance assessment increased across grades; reliance on published tests and spontaneous performance assessment decreased. Math and science teachers used more objective tests. Writing teachers used more structured performance assessments. Speaking teachers used more spontaneous performance assessments. Teachers were concerned about test improvement and time management. Concern about assessment and attention to quality control increased with grade.


Wilson (1990) also used a framework concerned with the psychology of teachers, but his framework focused on teachers' decision making. Two different Canadian provinces, with different educational policies and procedures, were sampled. Findings concentrated on the uses teachers made of assessment information. Teachers beyond early primary grades used evaluation mainly for report card marks, so the reports "typically reaffirm judgments about achievement rather than develop them, with the result that very little alteration of instruction is made as a result of the evaluation activity" (Wilson, 1990, p. 4). In addition to the grading purpose, elementary teachers also used evaluation to diagnose student weaknesses. Secondary teachers also used evaluation to check progress against course objectives. Classroom decision making served multiple purposes: "Consequently, the tidiness of aim, so necessary for the development of objective measurement, is replaced with a variety of purposes that renders the teacher decision making process far more complex" (Wilson, 1990, p. 5). Wilson (1990) also described the frequency, regularity, patterns, and origins (teacher developed or borrowed) of assessments. These patterns were similar to those that have been described for the U.S. (Gullickson, 1985; Stiggins & Bridgeford, 1985). Secondary teachers relied more heavily on teacher-made instruments than did elementary teachers. There was more performance assessment at the elementary level and more multiple-choice testing at the junior/intermediate levels.


Adams and Hsu (1998) surveyed 269 Florida elementary (Grades 1–4) teachers. They declared a framework based on the psychology of teachers, using the theory that the facilitation of meaningful change in curriculum and instruction in mathematics would be affected by teacher beliefs and conceptions. They did not, however, cite any literature focused on the assessment conceptions of math teachers. Consistent with previous literature, these elementary teachers rated pupil observations as most important and student performance as next most important. Essays were rated lowest in importance. In general, these elementary teachers rated all assessment techniques as neutral to important.


McMillan (2001) based his survey of 1,483 teachers in Virginia in measurement theory, specifically the validity issue that grades should consist solely of academic indicators. However, he considered teacher beliefs as he interpreted his findings, theorizing that teachers used classroom assessments of qualities he called "academic enablers" (e.g., effort and participation) to judge student motivation and engagement. There were some differences by grade, subject, and ability level of class regarding this practice. The largest effect sizes were for ability level of class.


Gipps et al. (2000) described assessment patterns of primary school teachers in the United Kingdom. They framed their discussions of assessment patterns by using a theory of formative assessment (Sadler, 1989). They categorized informal, formative assessment patterns of primary teachers and discussed how each contributed to teachers’ instruction. Assessment strategies included using other teachers’ records; using written tests; observing; questioning (of two types, oral testing and delving); getting a child to demonstrate; checking; listening; eavesdropping; marking; making a mental assessment note; gauging the level (of three types, assessing general level of understanding, judging individual progress, and looking at a range of work to make a judgment about U.K. National Assessment levels); and working out why a child has or has not achieved (Gipps et al., 2000, p. 68).



CLASSROOM MANAGEMENT: UNDEREMPHASIZED IN THE LITERATURE?


Only seven of the studies (18%) discussed classroom management issues. Perhaps this is one of the reasons the classroom assessment literature is sometimes overlooked or not used by teachers and other educators. Of these seven studies, four were conducted outside of the United States. One of the three U.S. studies was the only action research study in this review and was coauthored by the two teachers in whose classroom the study took place. It may be that in the U.S. academic community, the emphasis currently placed on classroom management (e.g., including it in studies’ theoretical framework, literature review, and method) is not commensurate with the size of its intersection (and tension!) with other aspects of classroom assessment.


Most of the studies that did discuss classroom management issues dealt with elementary school. Higgins, Harris, and Kuehn (1994) reported an action research study of student-generated performance criteria in a team-taught, first- and second-grade classroom. They found that these primary students believed that the amount of work and level of cooperation were as important as, if not more important than, the product of group work. Pupils were able to reliably assess the work of their own group but not others, because observing silly behavior led the students, but not the teachers, to underestimate the achievement level of other groups' work. Tunstall and Gipps (1996), studying teacher feedback in Years 1 and 2 in U.K. primary schools, found a whole class of teacher feedback whose primary role was socialization and management. Mavrommatis (1997) found that Greek primary teachers who were asked about the impact of their feedback on children mostly described children's nonverbal behavior. Mavrommatis also found that management concerns affected teachers' feedback. Teachers' comments about achievement typically were positive, with the intent to encourage students and others to do good work, a finding similar to one Barnes (1985) had made in the United States. Teachers' comments about behavior typically were negative, to control and correct. The amount of true descriptive feedback was notably small. Moreland and Jones (2000) studied assessment practices in technology education among New Zealand primary teachers (Years 1–6). They noted that in this context, assessment often dealt with the managerial issues of teamwork, taking turns at the computer, and the like, instead of dealing with the technology concepts and procedures the students were to learn. Thus teachers' formative interaction with students pulled the focus away from learning about technology and toward behavior management.


One U.S. study that dealt with classroom management issues focused on how student teachers learned to assess and grade in the middle school (Kusch, 1999). Student teachers from two different courses in math methods, reflective practice versus conventional, demonstrated both (a) control of procedure and content and (b) control of student behavior and learning in the classroom. Control-related aspects of learning assessment derived primarily from cooperating teachers' pressure to maintain established classroom order and secondarily from methods courses. Cooperating teachers expected assessment to result in grades (as did the teachers in Barnes' 1985 study) as well as to sanction thinking and police pupil behavior. These management functions of assessment were more salient than the instructional decision-making functions. Assessment did not really tell student teachers how to adjust lessons. Student teachers were constrained by the record-keeping systems their cooperating teachers had already set up.



THEORETICAL BASES: THEORIES ABOUT INDIVIDUALS, GROUPS, OR MEASUREMENT PRINCIPLES


Theory relevant to studying classroom assessment comes from the study of individuals, the study of groups, and the study of measurement principles. Some literature has emphasized one theoretical tradition; some literature has blended several. Overlapping theoretical traditions are opportunities both for richness of understanding and for tensions and conflicts.



Psychology, the Study of Individual Differences


The majority of the classroom assessment literature used psychological theory as the theoretical framework that drove the statement of the research questions and selection of methods. Fifteen of the reviews (88%) and 24 of the empirical studies (60%) used psychological theory either as part or as the main framework for design and analysis.


Learning and motivation theory. Studies with students as the subjects most often used theories of learning and motivation as the theoretical basis for the investigation. For example, Dassa (1990) was interested in the function of assessment for error diagnosis and remediation. Butler and Nisan (1986) and Butler (1987) framed their work in terms of students' goal orientations. Shepard and her colleagues (1996) used cognitive learning theory more generally (specifically, that active involvement in learning should lead to enhanced student learning) to investigate the effects of using performance assessment on student learning. Stiggins, Griswold, and Wikelund (1989) based their investigation on Quellmalz's taxonomy of cognitive levels.


Crooks (1988) and Natriello (1987) each searched primarily the educational psychology literature to construct reviews of the role of classroom evaluation in students' learning and motivation. Crooks (1988) studied the impact of classroom evaluation practices on students. This major literature review, still widely cited, summarized results from 14 fields of research that cast light on the relation between classroom assessment and student outcomes, especially outcomes regarding students' learning strategies, motivation, and achievement. Crooks (1988) described patterns of evaluation; what is evaluated; short-, medium-, and long-term effects (mostly on psychological variables); expectations, studying, and learning; and attributions, anxiety, and intrinsic motivation. He was interested in the intersection of instruction and evaluation practices, including teacher questioning practices and feedback.


Natriello (1987) studied the impact of evaluation processes on students from the perspective of psychology (not surprising, since the review was published in Educational Psychologist). This was a major literature review of research on effects of features of the evaluation process on students’ motivation and achievement. Features included (1) establishing the purposes for evaluating students; (2) assigning tasks to students; (3) setting criteria for student performance; (4) setting standards for student performance; (5) sampling information on student performance; (6) appraising student performance; (7) providing feedback on student performance; and (8) monitoring outcomes of the evaluation of students.


Stiggins (1999) was concerned about student motivation. Based on his reading of motivation theory, Stiggins concluded we must reevaluate not only how we assess but how we use assessment in pursuit of student success. He cited Black and Wiliam's (1998a, 1998b) evidence that formative assessment is the key to motivation and learning, but assessment practices often emphasize summative judgment. Historically, he claimed, we have assumed assessment would "motivate" by intimidation. Stiggins (1999, p. 195) concluded we should strive to (1) keep students from losing confidence in themselves as learners to begin with and (2) rekindle confidence among those students who have lost it. Students need to experience success on real classroom assessments to believe in themselves. He pointed out that current assessment practices often communicate to students that failure is bad. Involving students in their own assessment should foster motivation. He called for professional development for teachers in how to do this.


Feedback theory. One subset of the psychology-based studies that seemed especially interesting for a discussion about intersections in classroom assessment research comprised the studies that focused on the nature and function of feedback. This is an interest for psychologists who study learning theory, and it is also a practical function that teachers perform (or should perform) in their classroom assessment. Teachers evaluate student work, and communicating the results of that evaluation means giving feedback.


Elawar and Corno (1985) theorized that "written feedback that makes student errors salient in a motivationally favorable way" (p. 163) is effective. This cognitive approach postulated that drawing student attention is the mechanism by which feedback contributes to learning, in contrast to older, behaviorist learning theories that postulated feedback works when the signal value of a reinforcer outweighs the reward aspect. They studied 504 Venezuelan public school sixth graders, in 18 teachers' classes, using an experimental design. The "treatment" was a specific form of feedback: teachers gave constructive criticism on errors on students' papers, describing how to improve, coupled with at least one positive remark on what was done well. Their design acknowledged the intersection of assessment and instruction, since in addition to treatment and control groups they had a half-class treatment group to see whether instruction was influenced by the use of feedback. They measured students before and after the treatment, on self-esteem, attitude toward school, school anxiety, attitude toward math, analytic reasoning, and math achievement. They measured teachers' attitude toward math and conducted interviews about their reactions to the experiment. As hypothesized, providing constructive feedback improved student performance and attitude over simple knowledge of results. Teachers believed that feedback had this effect and felt that their training was an effective way to help them provide this kind of feedback.


Bangert-Drowns and his colleagues published two meta-analyses using the same set of 40 studies. The studies they reviewed mixed basic and higher education settings. Bangert-Drowns, Kulik, Kulik, and Morgan (1991) analyzed the instructional effect of feedback in testlike events, theorizing that the main function of feedback was correcting errors. They found that feedback was most effective under conditions that encouraged learners' mindful reception of the content. Bangert-Drowns, Kulik, and Kulik (1991) analyzed the effects of frequent classroom testing. They found that testing frequency in control conditions was the most important predictor of effect size between control (no feedback) and experimental conditions, suggesting that knowledge of incremental results functioned as a kind of feedback itself. Attitudes were more positive with more frequent testing. The effects of frequency of testing showed diminishing returns on achievement.


Kluger and DeNisi (1996) examined studies of the effects of feedback interventions on performance more broadly, including, along with the education literature, studies from other disciplines. Their work, which has been widely cited in the education literature, included a historical review and critique of previous feedback theories, a meta-analysis, a preliminary feedback intervention theory, and then a test of that theory with results from their meta-analysis. Their theoretical argument shared Elawar and Corno's (1985) focus on attention and Sadler's (1989) focus on comparing performance to an ideal. Kluger and DeNisi (1996, p. 259) postulated that behavior is regulated by comparisons of feedback to goals or standards; goals or standards are organized hierarchically; attention is limited; only feedback-standard gaps that receive attention actively help regulate behavior; attention is normally directed to a moderate level of the hierarchy; and feedback interventions change the locus of attention and therefore affect behavior. They also pointed out that people tend to think about a task in terms of goals at increasingly higher levels of abstraction as they learn.


Tunstall and Gipps (1996) developed a typology of the kinds of feedback U.K. primary teachers (years 1 and 2) used. Their data for this study were part of a larger study (Gipps et al., 2000) that described assessment and instructional patterns of successful primary teachers, acknowledging the relationships among them. Their discussion of feedback was partly based in psychological theories, but they also acknowledged the importance of the group aspect of classrooms. Learning environments, they pointed out, are distinguished by cues in the feedback about how students are to understand their performance. Their feedback typology distinguished socialization feedback, whose main role is socialization and classroom management, from assessment feedback, whose main role is differentiating students' learning orientations. They categorized assessment feedback according to whether it was evaluative (judgmental) or descriptive and positive or negative, for a total of eight categories of kinds of teacher feedback to students: rewarding, punishing, approving, disapproving, specifying attainment, specifying improvement, constructing achievement (describing the specifics of achievement), and constructing the way forward. There is some potential for overlap between socialization and assessment feedback in the reward/punishment category. One of the functions listed for this kind of feedback was management of the classroom and of individual students.


Ross, Rolheiser, and Hogaboam-Gray (2002) studied how elementary (grades 2, 4, and 6) students in Canada processed feedback, using Bandura's social cognition theory. They interviewed 71 students and found that parents, peers, and students' characteristics influenced student thoughts on evaluation. Female students produced richer, more productive interpretations of the feedback they received than did male students. For older students, peers' opinions were more salient than parents' opinions for focusing their attention on specific aspects of their performance they needed to work on. Older children became more sophisticated consumers of assessment information. They expressed less uncertainty and used more resources for interpretation, including more longitudinal comparisons. Younger students focused more on neatness and language. Older students had mixed feelings about evaluation but saw benefits to attending to the information.


Feedback theory, then, is an important connection between psychological theory and instructional, management, and assessment practices. Indeed, it pulls what is studied about assessment toward considerations about instruction or management. But considering psychology and classroom practices together also raises some questions that studies to date have not answered. For example, why do older students become more cynical and more oriented toward grades? Because their experience is that performance does not influence instruction or how they are treated (potentially a problem of practice)? Or because they are developmentally becoming more sophisticated?


Studies of teacher beliefs and practices. One review (Niemi, 1997) applied cognitive science to understanding teachers' use of performance assessments, pointing out that identifying or developing good performance tasks requires of the teacher a deep understanding of both the concepts to be assessed and how they are organized. Niemi also pointed out (p. 245) that teachers may contribute to cognitive science theory building as they study the "development and assessment of subject area competence in the classroom."


By and large, however, studies with preservice or in-service teachers as the subjects used one of two approaches: a focus on the nature and function of teacher beliefs about assessment or a focus on teacher decision making, especially instructional decision making. The two are related (McMillan, 2001; Thomas & Oldfather, 1997). Both are concerned with the psychology of teachers because they focus on the beliefs and behaviors of individual teachers.


Thomas and Oldfather (1997) pointed out the logical connections between individual teachers' epistemological beliefs and their assessment practices. If one believes knowledge is static, it follows that assessment should focus on scoring content. If one believes knowledge is dynamic, it follows that assessment should focus on constructing a narrative about process. If one believes knowledge is transmitted from experts, it follows that assessments should be individual and focus on cognition, and assessment of parts or subskills should be encouraged. If one believes knowledge is actively constructed and reciprocal, it follows that there should be both individual and group assessments (to assess where one performs alone and with scaffolding) and outcomes of interest should include not only performance but also interest in the subject, risks taken, and attitude. If one believes that the teacher is the keeper and provider of knowledge, it follows that the teacher should be responsible for grades. If one believes students are coconstructors, it follows that students and teacher should be responsible for assessment. Each of these choices has implications for students' perceived autonomy, self-determination, and self-efficacy. Kusch (1999) found some evidence that this logic does play out in practice. Student teachers who studied reflective practice in mathematics methods assessed during the lesson and asked pupils to participate in their assessments. Student teachers who studied conventionally assessed after the lesson.


Shepard (2001) established historical connections between epistemological beliefs and the study of assessment. She pointed out that in the early 20th century, the curriculum of social efficiency, hereditarian theories of intelligence and behaviorist theories of learning, and a belief in scientific measurement supported assessment in small, fragmented steps that postponed higher order thinking until basic skills were mastered. At the start of the 21st century, cognitive and constructivist learning theories, a reformed vision of the curriculum, and a more formative view of classroom assessment support a view of classroom assessment that is more integrated with learning and more performance based.


This changing view in the discipline, however, has not made a smooth transition into classroom practice. Teachers and student teachers, as a group, are uncomfortable with the assessment function of their work. This discomfort has been noted among U.S. teachers and student teachers (Barnes, 1985) and among teachers in the U.K. and in Ghana (Pryor & Akwesi, 1998). These studies reported that teachers dislike the summative nature of assessment; it interferes with their primary perception of their job, which is to nurture growth and understanding among their students.


The news is not all about roadblocks, however. Johnson, Wallace, and Thompson (1999) studied the effects of participation in a project emphasizing performance-based assessment (PBA) on building teacher efficacy among middle school math teachers. A high percentage believed they had the skills necessary to implement performance-based reform, disagreed that student motivation was primarily due to home environment (one measure of teacher efficacy), and rated their own success as moderate. Beliefs about math assessment moved toward a constructivist position, emphasizing active learning, after participation in the project. Johnson, Wallace, and Thompson (1999) therefore concluded that these "relatively seasoned, urban middle school" math teachers were confident, empowered, and involved in implementing PBA.


Wilson (1999) and colleagues studied how student teachers learned to assess student work. A series of articles in a special issue of the Alberta Journal of Educational Research reported on data from a study exploring the perceptions and behaviors of 147 Canadian preservice teachers as they assessed a hypothetical eighth-grade student named Chris. Each week they received more information from Chris's portfolio. They were asked to help Chris's teacher grade his work and comment on the processes they used and how they felt about the tasks they did. Wilson and Martinussen (1999) studied characteristics of distributions of marks and their relationships with other variables. Anderson (1999) used structural equation modeling to test a logic model about assessment. Shulha (1999) reported on content analyses of themes about participants' approaches to assessment. Their findings described the effects of student teachers' beliefs about students and learning on their evaluation of Chris's work.


One conclusion about the results of studies of teacher beliefs about assessment must be, therefore, that it depends on whom one studies. Reform leaders, regular teachers, beginning teachers, and student teachers all have been studied with varying results. This sampling issue will be discussed later as one of the methodological tensions in the study of classroom assessment.



Theories About Groups


Some studies also considered students in groups. Perhaps if the group nature of classroom assessments were better supported with theoretical work, the practical classroom management issues would not be so neglected in the literature. There is some overlap between group and individual psychology. For example, the major frameworks Crooks (1988) used for his discussion were from educational psychology, but he discussed the social outcomes of evaluation and the effects of cooperative learning for individuals. Many of the works cited here that were concerned with characteristics of individual students acknowledged that the individuals operated in groups in school. This section discusses the use in the classroom assessment literature of theoretical frameworks that go beyond an interest in groups (classes) as collections of individual students to situate some of their theoretical concerns within the group itself. The two most common theoretical frameworks that acknowledged the group nature of classroom assessments were classroom assessment environment theory and sociocultural learning theory.



Classroom assessment environment theory. The classroom assessment environment, as a theoretical construct, grew out of the work of Stiggins and his colleagues (Stiggins & Bridgeford, 1985; Stiggins & Conklin, 1992). Stiggins and Conklin (1992) described the classroom assessment environment more in terms of teacher practices than in terms of student perceptions. They based their description on a research agenda covering much of the 1980s and including surveys, interviews, observations, inspection of assessments, and teacher journals. The eight dimensions of the classroom assessment environment Stiggins and Conklin identified included (1) the purposes for which teachers used classroom assessments; (2) the assessment methods used, (3) the criteria for selecting them, and (4) their quality; (5) the teacher's use of feedback; (6) the teacher's preparation and background in assessment; (7) the teacher's perceptions of students; and (8) the assessment policy environment. All of these except the last are under the teacher's control. Classes have an assessment "character" or environment that stems from the teacher's general approach to assessment.


Tittle (1994) wrote a theory building review that took seriously the individual-within-group nature of the classroom. She proposed a general framework for educational psychologists to use in thinking about assessment dimensions with three categories (p. 150): (a) epistemology and theories about teaching and learning, curriculum, and development and change; (b) the knowledge, beliefs, intents, and actions of the assessment interpreter and user; and (c) assessment characteristics, including embeddedness in practice, format and mode, scoring, evaluation, preparation, and feedback. This review, published in a journal for educational psychologists, took seriously all three theoretical bases relevant to classroom assessment (theories of individuals, groups, and the measurements themselves). Her work acknowledged and described the multiple theoretical bases necessary for thinking about classroom assessment and identifying the constellations of relevant variables. The next step in theory development would be to specify theoretical relationships among the variables that would lead to testable assertions.


My colleagues and I (Brookhart, 1997a, 1997b; Brookhart & DeVoge, 1999) incorporated classroom assessment environment and educational psychology (theories of learning and motivation) together into a theory about the role of classroom assessment in student motivation and learning. We have not yet been able to test the classroom assessment environment aspect of the theory, because volunteer teachers usually have very positive classroom environments. Therefore, it is difficult to obtain a sample of a range of environments. This sampling issue is revisited in the section below on methodological issues.



Social constructivism. Sociocultural learning theory is a theory of learning that is broader than individual psychology. Whereas constructivist learning theory emphasizes that students construct their own meaning from their experiences, social constructivist learning theory situates the learning itself, not just the experiences that give rise to it, in the interactions among people. We have already seen that many of the psychology-oriented classroom assessment studies were based in constructivist learning theory, acknowledging that students construct their own meaning based in part on their group experiences. One study (Thomas & Oldfather, 1997) emphasized the construction of group understandings between and among students in the United States. They proposed a group analogue in social constructivism to what in individual psychology would be called intrinsic motivation, calling it "continuing impulse to learn" (CIL). Their definition of CIL (p. 111) was a personal learning agenda originating in the social process of meaning construction. They discussed teacher beliefs in terms of the social expectations of the profession, reminding readers of Dewey's observation that one cannot measure what does not exist, and that for teachers "Possibilities are more important than what already exists" (p. 107). Perhaps this explains findings from teacher belief studies that teachers dislike assessment.


Culture and sociology. Educators who are interested in a practice as pervasive as classroom assessment, a practice with such consequences for children's lives and learning, must recognize that there is a cultural aspect to the way we choose to pass on accepted knowledge and behavior patterns to our children. Schooling in general is part of how we socialize children into our academic disciplines and into society, and classroom assessment is a large part of that schooling. Wolf (1993) realized this. She called for assessments to become "episodes of learning" (a phrase that has come to be widely used) and pointed out that these episodes are cultural events. She wrote (p. 213), "any assessment is also a head-on encounter with a culture's models of prowess. It is an encounter with a deep-running kind of 'ought.'" She gave examples of "the ecology of wise assessments" from the arts and humanities, also keepers of culture. Artists and writers practice, keep journals and notebooks, do sketches, and so on, making evaluations of these partial products and incorporating what they learned in their final products or performances. These interim assessments are episodes of learning that are directly on the way to, and necessary to, the art. Wolf called for assessments in schools to become this deeply a part of learning.


Torrance and Pryor's (1998) study of formative assessment in classrooms drew on three theoretical frameworks (p. 3): classroom interaction studies, social constructivist learning theory, and "more straightforwardly psychological studies of motivation and attribution" (i.e., the psychology of individual differences). They called this work an interest in "the 'micro-sociology' of classroom assessment and classroom learning." They constructed a rich description of formative assessment in infant classrooms (ages 4 to 7 years) in the U.K.


The teacher characteristics that define what Stiggins and Conklin (1992) termed the classroom assessment environment are similar to what sociologists have termed aspects of classroom structure: the level of classroom task differentiation, student autonomy, grouping practices, and grading practices (Rosenholtz & Rosenholtz, 1981; Simpson, 1981). These instructional and assessment choices lead to different kinds of classroom structures, which afford students different environments in which to "construct identities" (Rosenholtz & Rosenholtz, 1981, p. 133). More unidimensional classroom structures (whole group instruction, little student choice, frequent grading) are associated with a more normal distribution of ability perceptions and more consensus about individuals' ability levels than are more multidimensional classrooms (individualized instruction, student choice, less frequent grading; Rosenholtz & Rosenholtz, 1981; Simpson, 1981).


Natriello (1996) used a theory of evaluation incompatibility and instability in the authority system to predict student disengagement from high school. The theory posited that students have a minimum satisfactory evaluation level and that incompatibility exists when the system prevents students from maintaining their own acceptable levels of performance because evaluations are contradictory, uncontrollable (by the student), unpredictable, or unattainable. He studied 291 students from 4 high schools and concluded that incompatibilities in evaluation and authority systems were pervasive and were strongly related to student disengagement from high school, as measured by their reported low acceptance levels, low effort, and participation in negative activities.



Measurement Theory


Theoretical frameworks based in measurement theory were mainly of two types. Either the measurement quality features of validity and reliability were considered, or the measurement function features of formative and summative assessment were considered.


Validity and reliability. Several studies and review articles used conventional measurement theory as the criterion against which to judge published classroom tests (Frisbie, Miranda, & Baker, 1993), portfolios (Cicmanec & Viechnicki, 1994), teachers' knowledge (Impara, Plake, & Fager, 1993), teachers' reported beliefs and practices (Frary, Cross, & Weber, 1993; Mertler, 2001; Traub, 1990), and student teachers' lesson plans (Campbell & Evans, 2000). In all of these studies, someone or something is found wanting. Frisbie et al. (1993) found that in unit tests from elementary and middle school science and social studies textbook series, on average about half of the chapter objectives were measured, about two thirds of the items were phrase matches with the text, and about 90% of the items required simple recall. Impara et al. (1993) reported teachers averaged 23.18 out of 35 on a test of classroom assessment knowledge. Frary et al. (1993) reported the results of a cluster analysis in which they classified teachers, based on their agreement or disagreement with various assessment practices, as ranking (comparing students with each other), "softhearted," "hard nosed," "arbitrary or manipulative," uncertain, or "easy" (p. 28). Mertler (2001) found the steps teachers in his study reported taking to ensure reliability and validity of classroom assessments to be incomplete or incorrect. Campbell and Evans (2000) found that only 25 out of 237 lesson plans that contained objectives had even partial evidence that the objectives would be assessed in a reliable and/or valid manner.


Bulterman-Bos, Terwel, Verloop, and Wardekker (2002) turned this reasoning around and tried to reframe reliability and validity in terms of teachers' practice. They interviewed 25 Dutch teachers, collected their stories, and did a narrative analysis. The two most frequent ways teachers identified students' individual differences were observations and graded work. They found four observational patterns: (1) triggered observations, (2) incidental observations, (3) intentional observations, and (4) long-term observations (over teaching careers, not longitudinal observations of pupils). They concluded that observation is embedded in the act of teaching, and teachers cannot separate themselves from the observations they make. Therefore, they tried to frame personal frameworks as reliability and validity issues, noting that teachers' observations will be better (more valid) the better teachers are at understanding teaching, instruction, and management. Considered this way, the success of an observation (e.g., at informing instruction) becomes a warrant for its validity.


Another use of validity theory as a framework for studies of classroom assessment has been to use Messick's (1986) observation that the consequences of an assessment's use should be part of the evidence for the validity of that use. Shepard and her colleagues (1996) looked for the effects on student learning of introducing classroom performance assessments into the schooling of Denver third graders, arguing that if learning was enhanced, that would be evidence for the validity of the performance assessments. Their basic finding was that there were no gains in student learning after a year of performance assessment. They saw some specific changes that could reasonably be interpreted as due to the use of performance assessments in instruction. There was some improvement in math achievement measured with performance-based assessment. Qualitative analysis indicated that more of the poorer performing students could recognize patterns in the assessment tasks, even if they could not get the whole problem correct.


Goldberg and Roswell (2000) asked whether scoring experience with the Maryland School Performance Assessment Program (MSPAP) really served as professional development, a claim made as consequential evidence for the validity of the MSPAP. Teachers perceived their scoring experience as valuable professional development. Most could define performance-based assessment generally but not completely or totally accurately. Scorers did say they would incorporate more performance-based assessment into their classes and create activities aligned with Maryland learning objectives. Teachers with scoring experience were more likely to integrate performance-based assessment into their classrooms instead of just tacking on an activity. The alignment of their classroom assessments with Maryland learning objectives was partial. Sometimes it appeared that teachers considered the use of hands-on or open-ended assessment activities as an end in itself. Scorers crafted and used evaluation criteria more often than nonscorers but often scored extraneous features or counted features of work instead of judging the quality of work.


Theory of formative assessment. A high view of the importance of formative assessment and a high view of teacher judgment of student classroom work are characteristic of the non-U.S. literature. Scriven (1967, pp. 40–43) is credited with the first published use of the terms "formative" and "summative" to describe two general functions of any evaluation. Sadler (1983, 1989), first in the context of higher education and then more generally, should be credited with the classic application of these functions to students' learning, showing how the instructional practice of formative feedback plays into learning theory. All of the studies that investigated formative assessment cited Sadler (1989).


Black and Wiliam (1998a, 1998b; Wiliam & Black, 1996) succeeded in moving the interest in formative assessment across the pond to the United States. Black and Wiliam (1998a) published a comprehensive and influential review of 250 studies that was summarized in "Inside the Black Box" (1998b) and reprinted in Phi Delta Kappan, a widely circulated U.S. educational journal. The black box is the classroom, and Black and Wiliam rightly pointed out that most studies of classroom assessment ignored the classroom, thereby ignoring the classroom assessment environment (Stiggins & Conklin, 1992) and the group nature of assessments, and also ignoring the formative purposes of most instructional practices (Sadler, 1989). The good news is that this oversight is now on the radar screen for many educators. The bad news is that, as this article demonstrates, the screen does not have a good map behind it.


Sadler (1989) used concepts from all three of the theoretical traditions described in this article as foundational for classroom assessment: theories about individuals, about groups, and about measurement. Part of the reason his article has become so widely cited, as it has especially in the non-U.S. literature on formative assessment, may be its solid grounding in the several relevant bodies of theory. He wrote (p. 119), "The focus is on judgments about the quality of student work; who makes the judgments, how they are made, how they may be refined, and how they may be put to use in bringing about improvement." From repeated feedback the student comes to hold a concept of quality. Sadler (1989) used feedback to mean information from the teacher and self-monitoring to mean information from students' self-evaluations. He concluded that lack of opportunity to self-monitor is an impediment to learning (pp. 140–141). He acknowledged the group nature of the concept of quality work (p. 135), observing that criteria are elusive until "caught" by examples and shared experiences. Levels of quality can be conveyed as statements or as exemplars. Standards become goals when they are "desired, aimed for, or aspired to." The source of some of the productive energy researchers have drawn from Sadler's theoretical work to advance the field may be the productive simultaneous consideration of theoretical work from different areas, operating in what I am calling the "intersections" in theory and practice.


Barnes (1985) found that U.S. teachers reported doing some formative evaluation work but emphasized the summative (grading) functions of classroom assessment. Thomas and Oldfather (1997) reported that students perceive that grades shift the focus in the classroom from learning and self-efficacy to evaluation. With one exception to date, published studies that framed their look at classroom assessment using formative evaluation theory have been done outside the United States; the one U.S. sample among these studies was my own, published in a non-U.S. journal. Pryor and Akwesi (1998) studied formative assessment in the U.K. and Ghana. Torrance and Pryor (1998), Gipps et al. (2000), and Rea-Dickens (2001) studied formative classroom assessment in the U.K. I (Brookhart, 2001) studied successful U.S. high school students' formative and summative uses of assessment information, finding that good students try to use all assessment information formatively and often mix formative and summative functions when they do.



INTERSECTIONS AND TENSIONS


The previous section reviewed the classroom assessment literature according to the main practical or theoretical orientation underlying the research questions and study design. A large portion of the articles consisted of inventories of practice or studies based in one theoretical tradition, often psychology. Some of the studies were informed by more than one theoretical tradition or discussed assessment practices in conjunction with instructional and classroom management practices. Studies that combined theories or practices in this way illustrated what I mean by taking seriously the ‘‘intersection’’ nature of classroom assessment.


This section discusses three additional intersections where theoretical concepts conflict. I believe productive new theoretical work that will move the field of classroom assessment forward may be done around these tensions. As a metaphor, when worlds collide, new stars are born. These three examples are by no means exhaustive.


INTERSECTION BETWEEN MEASUREMENT THEORY AND THE GROUP NATURE OF LEARNING─TEACHERS AS DEFINERS OF THE CONSTRUCT "GOOD WORK"


Classroom assessment environment theory, sociocultural theories of learning, and sociological studies of classroom structure emphasize the cultural nature of learning and assessment. Measurement theory focuses on constructs and how they are quantified, mostly from the perspective of individual psychology (Messick, 1986; Shepard, 2001). Classroom assessment sits at the intersection here. In the classroom context, teachers define what good work (the construct) is. Work is good if the teacher says it is good (Wiliam, 1998); Wiliam thus named this construct-referenced assessment. Dassa (1990) wrote "an error is in practice defined as such essentially by the teacher. Hence errors express, at least partially, the type of relationship to knowledge that is gradually being built for the student" (p. 40). Tunstall and Gipps (1996) studied feedback as the mechanism by which teachers create a learning environment. Cues in the feedback tell students how they are to understand their performance in the environment. Mavrommatis (1997, pp. 394–395) pointed out that symbols teachers use on their assessments function as a language. "Such symbols, i.e. faces, signatures, stars, ticks and the like, were unique codes for every class. They were part of the 'private assessment language' and only the teacher and the pupils of that particular class could fully interpret them."


Black and Wiliam (1998a) noted "a feature which in our view has been absent from a great deal of the research we have reviewed" (p. 56): that all the assessment processes are, at heart, social processes, taking place in social settings, conducted by, on, and for social actors. Guy Brousseau (1984, cited in Black & Wiliam, 1998a, p. 56) has used the term didactical contract to describe the network of (largely implicit) expectations and agreements that are evolved between students and teachers. Black and Wiliam go on to point out that these expectations limit what the teacher can do as well as what students can do. They use as an example a classroom where, over time, only knowledge-level questions were expected and thus came to be the only fair game. The students' assessments, the constructs those assessments indexed, and the concepts the students learned were therefore devoid of higher-order thinking.


Similarly, Simpson (1981, p. 124) used grading to characterize unidimensional classroom structure because "Frequent grading is capable of reducing even relatively complex performances to a single dimension, because grades reduce information to numbers, because these numbers can be averaged, and because teachers and student peers can use these numbers to place students on a single, global stratification scale."
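
To make Simpson's point concrete, here is a deliberately trivial sketch (my own illustration, with hypothetical scores, not Simpson's data): two students with opposite profiles of strength receive identical averages and so occupy the same position on the single stratification scale.

# Hypothetical scores; averaging collapses distinct multidimensional
# profiles onto a single point on a single scale.
profiles = {
    "Student A": {"computation": 95, "problem_solving": 55},
    "Student B": {"computation": 55, "problem_solving": 95},
}
for name, scores in profiles.items():
    average = sum(scores.values()) / len(scores)
    print(name, average)  # both print 75.0: opposite strengths, identical rank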


Black and Wiliam's (1998a) and Simpson's (1981) cautions─that assessment handled in a limited or low-level manner limits learning by limiting the construct that is referenced─contrast with Tunstall and Gipps's (1996), Wiliam's (1998), and others' descriptions of more positive assessment approaches that help define good work for students. Studies taking seriously a theory that situates the actual construct measured in group settings, and studies of assessment practices that are appropriately multidimensional, are needed for a complete understanding of classroom assessment at this intersection.



TENSION BETWEEN MEASUREMENT THEORY AND EDUCATIONAL PSYCHOLOGY─PURPOSE AND FUNCTION OF CLASSROOM ASSESSMENTS


Studies of classroom assessment that used measurement theory as the framework often discussed classroom assessments in light of validity and reliability theory developed for large-scale assessment contexts. These theories were applied from their primary context to a secondary one, which is perhaps why classroom assessment is often found wanting in this kind of research. We need some measurement theory work situated in the classroom assessment context (Bulterman-Bos et al., 2002; Wiliam, 1998). On this point, the difference between U.S. and non-U.S. scholarship is most clear. In the literature outside the United States, formative assessment and classroom assessment are often considered to be the same thing. In the United States, grading and evaluation have a larger place, and at least some classroom assessment is summative, although there is a sense in which all classroom assessment should be formative.


Isaacson (1999) ran into trouble trying to judge reliability and validity and develop practical assessments for a summer program at the same time. He taught a college-sponsored summer school program for middle school students who had problems with writing and developed a classroom-based assessment system for writing short expository pieces. In his analysis, the challenges that complicated the development of a classroom assessment system included (pp. 45–46): (1) tension between authenticity of task and a manageable instructional domain (chunks small enough to learn); (2) subjectivity in judging performance versus the objectivity needed for reliability; (3) the need for alternative equivalent tasks for formative assessment (an interesting difference from the need for equivalent tasks for reliability); and (4) how to promote a beneficial link between assessment and instruction.


The classroom environment provides a different kind of context for classroom assessments than the administrative context provides for large-scale assessments. Assessment should be integrated with instruction, which implies that the meaning of the items or assessment tasks will depend on the environment. This is contrary to the large-scale assessment expectation that assessment should be consistent across administration occasions and locations. Classroom assessment should be formative, although tensions with summative assessment cause an interesting set of problems. Taken together, these factors render the measure itself literally a part of who the student is in the classroom. As Thomas and Oldfather (1997) quoted a student, "That's my grade. That's me." This is terribly serious for students in classrooms and has all sorts of implications for future learning choices, self-worth, and a host of other cognitive, affective, and conative understandings (Stiggins, 1999). We need to develop theories of validity and reliability that acknowledge the classroom assessment context in order to evaluate the meaning, value, accuracy, and consistency of classroom assessment information.


TENSION AMONG METHODS CONVENTIONALLY REQUIRED FOR THE DIFFERENT THEORETICAL APPROACHES


Different theoretical traditions also have different methodological conventions and expectations. For example, the branch of psychology that has had the strongest influence on educational psychology, at least until recently, has a long tradition of quantitative methods and experimental design. Sociology has a strong quantitative tradition but also relies on participant observation and other methods from anthropology. Measurement theory uses large-sample quantitative methods to study reliability and validity. To consider several different theoretical traditions in designing a study, as the thesis of this article suggests might be a good idea, researchers also have to deal with tensions among the methodological approaches each tradition would take. If the research is to be credible to several disciplinary audiences, this must be done carefully and well. In addition to the tensions among methods required for various theoretical traditions, there is a tension between some methodological requirements and the realities of classrooms.


Quantitative methods generally require sample size and sampling variability that are difficult to come by in classroom assessment research. What should a researcher do, for example, if a power analysis suggests more participants are required than exist in one class? Combining data from several classes to form a larger sample is not theoretically defensible, because classroom assessment environment and sociocultural learning theories suggest that different classrooms will be non-equivalent groups. More complicated nested designs require even larger samples.
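
The arithmetic of the problem is easy to sketch. Under standard assumptions (a two-group comparison of means, a moderate standardized effect of d = 0.4, two-sided alpha of .05, and power of .80, all hypothetical choices of mine rather than figures from the studies reviewed), a conventional normal-approximation power formula already demands several classrooms' worth of students per group:

# A back-of-the-envelope power calculation (my own sketch, not from
# the article): per-group n for a two-sample comparison of means.
from math import ceil
from scipy.stats import norm

def n_per_group(d, alpha=0.05, power=0.80):
    """Normal-approximation sample size for standardized effect size d."""
    z_a = norm.ppf(1 - alpha / 2)  # critical value for two-sided alpha
    z_b = norm.ppf(power)          # quantile corresponding to desired power
    return ceil(2 * ((z_a + z_b) / d) ** 2)

print(n_per_group(d=0.4))  # about 99 per group, roughly 198 students in all

A class of 25 or 30 students cannot supply anything close to that, and, for the theoretical reasons above, pooling classes does not straightforwardly rescue the design.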


The representative nature of any sample of students or teachers is also an issue. Studies sampling teachers in general, for example, have found different attitudes, beliefs and practices from studies sampling teachers involved in reform efforts. Students and teachers are volunteer participants in studies, protected by human subjects protocols. This is a good thing, but it does affect the representativeness of samples more in classroom assessment research─where emotions tend to run high─than in many other areas of research. Nonresponse bias is more likely to be linked systematically to the classroom assessments under study.


Classroom assessment research may be reviewed by editors, reviewers, and other readers whose psychological, sociological, or measurement theory orientation differs from the one from which the researchers worked. Psychological theorists used to experimental methods may not appreciate the uncontrolled nature of field studies. Measurement theorists used to methods of testing reliability and validity that assume large sampling variability may misjudge the reliability and validity of classroom assessments developed for small, relatively homogeneous classrooms. Researchers unfamiliar with the cultural role of teachers in classrooms may not fully appreciate studies whose data rely heavily on teacher judgment. All of this may make it difficult for classroom assessment research to be clearly understood. And yet research "in the intersections" promises benefits worth pursuing, including both richer academic understanding of classroom assessment and more application of findings to practice.
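
The reliability point can be made concrete with a small simulation (again my own sketch, with hypothetical numbers). Classical internal-consistency coefficients such as Cronbach's alpha index reliability to between-student variability, so the same instrument, with the same measurement error, yields a much lower coefficient in a small, homogeneous class than in a large, heterogeneous sample:

# Hypothetical simulation: identical instrument and item error, but
# restricted true-score spread in the classroom sample lowers alpha.
import numpy as np

rng = np.random.default_rng(0)

def cronbach_alpha(scores):
    """Cronbach's alpha for an (n_students, k_items) score matrix."""
    k = scores.shape[1]
    item_vars = scores.var(axis=0, ddof=1)
    total_var = scores.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

def simulate(n, true_sd, k=10, error_sd=1.0):
    ability = rng.normal(0.0, true_sd, size=(n, 1))  # true scores
    noise = rng.normal(0.0, error_sd, size=(n, k))   # same error model
    return ability + noise

wide = simulate(n=1000, true_sd=1.0)  # heterogeneous large-scale sample
narrow = simulate(n=25, true_sd=0.3)  # small, homogeneous classroom

print(cronbach_alpha(wide))    # roughly .9: ample true-score variance
print(cronbach_alpha(narrow))  # far lower, though the instrument is unchanged

The coefficient falls not because the assessment got worse but because the classical model ties reliability to between-student spread, which classrooms, by design, often reduce.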


In the foreword to Torrance and Pryor’s (1998) book, Pollard (p. vi) points out that mixing theoretical traditions and methods can strengthen the ecological validity of findings.


A significant characteristic of this book is that it is positioned at the interface of psychology and sociology. Whilst there are benefits from disciplinary specialization, there are also weaknesses, and one of these is to undermine claims to holistic validity. Consequential problems may then arise in convincing research ‘users’ of the relevance of the work. . . . Stepping across such disciplinary boundaries may be academically risky but is soundly justified, if it yields a more valid analysis which practitioners can understand and apply. (p. vi)



CONCLUSION


The thesis of this article was twofold. First, the practice of classroom assessment occurs at the intersection of three teaching functions: instruction, classroom management, and assessment. Second, theory relevant to classroom assessment comes from several different areas of study: the study of individual differences, the study of groups, and the study of measurement. A review of recent classroom assessment literature was presented as evidence for this thesis. A summary of the evidence indicates that each of these practical and theoretical areas was indeed represented.


On the one hand, this patchwork of scholarship can produce a disciplinary richness and complexity that furthers understanding; on the other hand, it can produce an unfocused body of work that has mainstream influence in no one theoretical tradition. Many of the articles approached classroom assessment from only one perspective; the two largest groups of these were atheoretical inventories of practice and studies of classroom assessment based in psychological theory. Many of the articles reviewed, however, did mix two or more practical or theoretical interests, and where this was accomplished, the resulting picture of classroom assessment was richer and more multidimensional.


My reason for doing this review was to describe the patchwork systematically. My hope is that in taking a systematic look at the patchwork and coming to understand it, we might as a field begin more systematic work at turning the patches into a quilt. Pressing at the edges of theory and exploring intersections should contribute to richer understanding of classroom assessment. Reading across disciplines is difficult to do but is important and productive. Perhaps this article's demonstration of how classroom assessment requires a cross-disciplinary tapestry will broaden researchers' consideration of theory and methods as they design their own studies or read the studies of others. This becomes easier to do as growing electronic access makes searching and retrieving a wide range of journals more convenient. Finally, I remind readers that I reviewed only the English-language literature. It would be instructive to review studies of classroom assessment published in other languages.



Note


1 Articles that were included in the literature review are noted as follows:

e denotes a report containing at least some empirical data (not all are complete research reports); "empirical" here is construed to include both quantitative and qualitative data.

r denotes a review of studies.




References


eAdams, T. L., & Hsu, J. W. Y. (1998). Classroom assessment: Teachers’ conceptions and practices in mathematics. School Science and Mathematics, 98, 174–180.


eAnderson, J. O. (1999). Modeling the development of student assessment. Alberta Journal of Educational Research, 45, 278–287.


eBachor, D. G., & Baer, M. R. (2001). An examination of preservice teachers’ simulated classroom assessment practices. Alberta Journal of Educational Research, 47, 244–258.


rBangert-Drowns, R. L., Kulik, C. C., & Kulik, J. A. (1991). Effects of frequent classroom testing. Journal of Educational Research, 85, 89–99.


rBangert-Drowns, R. L., Kulik, C. C., Kulik, J. A., & Morgan, M. (1991). The instructional effect of feedback in test-like events. Review of Educational Research, 61, 213–238.

eBarnes, S. (1985). A study of classroom pupil evaluation: The missing link in teacher education. Journal of Teacher Education, 36(4), 46–49.


rBlack, P., & Wiliam, D. (1998a). Assessment and classroom learning. Assessment in Education, 5, 7–74.


rBlack, P., & Wiliam, D. (1998b). Inside the black box: Raising standards through classroom assessment. Phi Delta Kappan, 80, 139–144.


rBrookhart, S. M. (1997a). A theoretical framework for the role of classroom assessment in motivating student effort and achievement. Applied Measurement in Education, 10, 161–180.


eBrookhart, S. M. (1997b). Effects of the classroom assessment environment on mathematics and science achievement. Journal of Educational Research, 90, 323–330.


eBrookhart, S. M. (2001). Successful students’ formative and summative uses of assessment information. Assessment in Education, 8, 153–169.


eBrookhart, S. M., & DeVoge, J. G. (1999). Testing a theory about the role of classroom assessment in student motivation and achievement. Applied Measurement in Education, 12, 409–425.


eBulterman-Bos, J., Terwel, J., Verloop, N., & Wardekker, W. (2002). Observation in teaching: Toward a practice of objectivity. Teachers College Record, 104, 1069–1100.


eButler, R. (1987). Task-involving and ego-involving properties of evaluation: Effects of different feedback conditions on motivational perceptions, interest, and performance. Journal of Educational Psychology, 79, 474–482.


eButler, R., & Nisan, M. (1986). Effects of no feedback, task-related comments, and grades on intrinsic motivation and performance. Journal of Educational Psychology, 78, 210–216.


eCampbell, C., & Evans, J. A. (2000). Investigation of preservice teachers’ classroom assessment practices during student teaching. Journal of Educational Research, 93, 350–355.


rCicmanec, K. M., & Viechnicki, K. J. (1994). Assessing mathematics skills through portfolios: Validating the claims from existing literature. Educational Assessment, 2, 167–178.


rCrooks, T. J. (1988). The impact of classroom evaluation practices on students. Review of Educational Research, 58, 438–481.


eDassa, C. (1990). From a horizontal to a vertical method of integrating educational diagnosis with classroom assessment. Alberta Journal of Educational Research, 36, 35–44.


eElawar, M. C., & Corno, L. (1985). A factorial experiment in teachers’ written feedback on student homework: Changing teacher behavior a little rather than a lot. Journal of Educational Psychology, 77, 162–173.


eFrary, R. B., Cross, L. H., & Weber, L. J. (1993). Testing and grading practices and opinions of secondary teachers of academic subjects: Implications for instruction in measurement. Educational Measurement: Issues and Practice, 12(3), 23–30.


eFrisbie, D. A., Miranda, D. U., & Baker, K. K. (1993). An evaluation of elementary textbook tests as classroom assessment tools. Applied Measurement in Education, 6, 21–36.


Gipps, C., McCallum, B., & Hargreaves, E. (2000). What makes a good primary school teacher? Expert classroom strategies. London: RoutledgeFalmer.


eGoldberg, G. L., & Roswell, B. S. (2000). From perception to practice: The impact of teachers’ scoring experience on performance-based instruction and classroom assessment. Educational Assessment, 6, 257–290.


eGullickson, A. R. (1985). Student evaluation techniques and their relationship to grade and curriculum. Journal of Educational Research, 79, 96–100.


eHiggins, K. M., Harris, N. A., & Kuehn, L. L. (1994). Placing assessment into the hands of young children: A study of self-generated criteria and self-assessment. Educational Assessment, 2, 309–324.


eImpara, J. C., Plake, B. S., & Fager, J. J. (1993). Educational administrators’ and teachers’ knowledge of classroom assessment. Journal of School Leadership, 3, 510–521.


eIsaacson, S. (1999). Instructionally relevant writing assessment. Reading and Writing Quarterly: Overcoming Learning Difficulties, 15, 29–48.


eJohnson, S. T., Wallace, M. B., & Thompson, S. D. (1999). Broadening the scope of assessment in schools: Building teacher efficacy in student assessment. Journal of Negro Education, 68, 397–408.


rKluger, A. N., & DeNisi, A. (1996). The effects of feedback interventions on performance: A historical review, a meta-analysis, and a preliminary feedback intervention theory. Psychological Bulletin, 119, 254–284.


eKusch, J. W. (1999). The dimensions of classroom assessment: How field study students learn to grade in the middle level classroom. Journal of Educational Thought (Revue de la Pensee Educative), 33(1), 61–81.


eMavrommatis, Y. (1997). Understanding assessment in the classroom: Phases of the assessment process-The assessment episode. Assessment in Education, 4, 381–400.


eMcMillan, J. H. (2001). Secondary teachers’ classroom assessment and grading practices. Educational Measurement: Issues and Practice, 20(1), 20–32.


eMertler, C. A. (2000). Teacher-centered fallacies of classroom assessment validity and reliability. Mid-Western Educational Researcher, 13(4), 29–35.


Messick, S. (1986). Validity. In R. L. Linn (Ed.), Educational measurement (3rd ed., pp. 13–103). New York: Macmillan.


eMoreland, J., & Jones, A. (2000). Emerging assessment practices in an emergent curriculum: Implications for technology. International Journal of Technology and Design Education, 10, 283–305.


rNatriello, G. (1987). The impact of evaluation processes on students. Educational Psychologist, 22, 155–175.


eNatriello, G. (1996). Evaluation processes and student disengagement from high school. In A. M. Pallas (Ed.), Research in sociology of education and socialization, (Vol. 11, pp. 147–172). Greenwich, CT: JAI Press.


eNicholson, D. J., & Anderson, J. O. (1993). A time and place for observations: Talking with primary teachers about classroom assessment. Alberta Journal of Educational Research, 39, 363–374.


rNiemi, D. (1997). Cognitive science, expert-novice research, and performance assessment. Theory into Practice, 36, 239–246.


ePryor, J., & Akwesi, C. (1998). Assessment in Ghana and England: Putting reform to the test of practice. Compare, 28, 263–275.


eRea-Dickens, P. (2001). Mirror, mirror on the wall: Identifying the processes of classroom assessment. Language Testing, 18, 429–462.


eRosenholtz, S. J., & Rosenholtz, S. H. (1981). Classroom organization and the perception of ability. Sociology of Education, 54, 132–140.


eRoss, J. A., Rolheiser, C., & Hogaboam-Gray, A. (2002). Influences on student cognitions about evaluation. Assessment in Education, 9, 81–95.


Sadler, D. R. (1983). Evaluation and the improvement of academic learning. Journal of Higher Education, 54, 60–79.


rSadler, D. R. (1989). Formative assessment and the design of instructional systems. Instructional Science, 18, 119–144.


eSchmidt, M. E., & Brosnan, P. A. (1996). Mathematics assessment: Practices and reporting methods. School Science and Mathematics, 96, 17–20.


Scriven, M. (1967). The methodology of evaluation. In R. W. Tyler, R. M. Gagne & M. Scriven (Eds.), Perspectives of curriculum evaluation. Chicago: Rand McNally.


rShepard, L. A. (2001). The role of classroom assessment in teaching and learning. In V. Richardson (Ed.), Handbook of research on teaching (4th ed., pp. 1066–1101). Washington, DC: AERA.


eShepard, L. A., Flexer, R. J., Hiebert, E. H., Marion, S. F., Mayfield, V., & Weston, T. J. (1996). Effects of introducing classroom performance assessment on student learning. Educational Measurement: Issues and Practice, 15, 7–18.


eShulha, L. M. (1999). Understanding novice teachers’ thinking about assessment. Alberta Journal of Educational Research, 45, 288–303.


eSimpson, C. (1981). Classroom structure and the organization of ability. Sociology of Education, 54, 120–132.


rStiggins, R. J. (1999). Assessment, student confidence, and school success. Phi Delta Kappan, 81, 191–198.


eStiggins, R. J., & Bridgeford, N. J. (1985). The ecology of classroom assessment. Journal of Educational Measurement, 22, 271–286.


Stiggins, R. J., & Conklin, N. F. (1992). In teachers’ hands: Investigating the practices of classroom assessment. Albany: State University of New York Press.


eStiggins, R. J., Griswold, M. M., & Wikelund, K. R. (1989). Measuring thinking skills through classroom assessment. Journal of Educational Measurement, 26, 233–246.


eThomas, S., & Oldfather, P. (1997). Intrinsic motivations, literacy, and assessment practices: "That's my grade. That's me." Educational Psychologist, 32, 107–123.


rTittle, C. K. (1994). Toward an educational psychology of assessment for teaching and learning: Theories, contexts, and validation arguments. Educational Psychologist, 29, 149–162.


Torrance, H., & Pryor, J. (1998). Investigating formative assessment. Buckingham, UK: Open University Press.


rTraub, R. E. (1990). Assessment in the classroom: What is the role of research? Alberta Journal of Educational Research, 36(1), 85–91.


eTunstall, P., & Gipps, C. (1996). Teacher feedback to young children in formative assessment: A typology. British Educational Research Journal, 22, 389–404.


Wiliam, D. (1998, September). Enculturating learners into communities of practice: Raising achievement through classroom assessment. Paper presented at the European Conference for Educational Research, University of Ljubljana, Slovenia.


rWiliam, D., & Black, P. J. (1996). Meanings and consequences: A basis for distinguishing formative and summative functions of assessment? British Educational Research Journal, 22, 537–548.


Wilson, R. J. (1990). Classroom processes in evaluating student achievement. Alberta Journal of Educational Research, 36, 4–17.


eWilson, R. J. (1999). Classroom assessment investigations. Alberta Journal of Educational Research, 45, 263–266.


eWilson, R. J., & Martinussen, R. L. (1999). Factors affecting the assessment of student achievement. Alberta Journal of Educational Research, 45, 267–277.


rWolf, D. P. (1993). Assessment as an episode of learning. In R. E. Bennett & W. C. Ward (Eds.), Construction versus choice in cognitive measurement (pp. 213–240). Hillsdale, NJ: Lawrence Erlbaum.



Cite This Article as: Teachers College Record Volume 106 Number 3, 2004, p. 429-458
https://www.tcrecord.org ID Number: 11523


About the Author
  • Susan Brookhart
    Duquesne University
    E-mail Author
    SUSAN M. BROOKHART is an educational consultant and adjunct professor at Duquesne University. Her research specialty is classroom assessment. Recent publications include the theme article for the Winter, 2003, special issue of Educational Measurement: Issues and Practice on measurement theory for classroom assessment and a textbook on grading intended to help preservice and in-service teachers understand assessment and grading practices.
 