
Requirements for an Assessment Procedure for Beginning Teachers: Implications from Recent Theories on Teaching and Assessment

by Anne Marie Uhlenbeck, Nico Verloop & Douwe Beijaard - 2002

The purpose of this study was to determine the best approach to the development of procedures to assess beginning teachers. First, studies on teacher thinking, teacher development, teacher learning and teacher knowledge were reviewed to obtain information on the most current views on the nature of teaching. Second, studies on new approaches to teacher assessment and on issues of validity and reliability were examined. An analysis of these topics yielded a set of implications that could be used as a basis for an adequate evaluation procedure. We propose a framework that consists of 15 implications for the development of beginning teacher assessments. We illustrate how the framework was applied in the development of an assessment procedure for beginning teachers.

As in other industrialized countries, policy makers in the Netherlands are continually involved in efforts to improve the quality of teaching and learning in schools. These efforts rest on the general conviction that the teacher is at the center of any attempt to attain this goal. School restructuring and innovations in the curriculum ultimately rely on the professional development of teachers (Calderhead, 1996). Reform initiatives include the formulation of standards for both experienced and beginning teachers. As a result, beginning teachers will, in the future, be awarded their teaching certificate after demonstrating that they meet teaching standards and not because they have successfully completed teacher education programs (Darling-Hammond, 1999). At the same time, a shortage of teachers in the Netherlands makes it necessary to attract more people to the teaching profession and to provide shortcuts to certification. Formulating standards for beginning teachers opens the door to alternative routes to certification and to different ways to prepare for it. Standards for teaching, however, rely heavily on the possibilities of developing evaluation procedures that adequately assess beginning teachers' competence. The purpose of the present study is to formulate the requirements for such an evaluation procedure.

Existing teacher evaluation procedures have been criticized for assessing elements which are not relevant to teaching, for failure to take the classroom context into consideration, and for evaluating teaching processes without referring to the adequacy of the content transmitted (Andrews & Barnes, 1990; Beijaard & Verloop, 1996; Dwyer & Stufflebeam, 1996; Shulman, 1987a). They have also been criticized for being based on the notion of generic teaching characteristics and for decomposing teaching into a set of discrete categories, resulting in a fragmented view of teaching (Delandshere, 1994; Haertel, 1991; Tomlinson, 1995a; Wolf, 1995). Others have pointed out the negative impact of evaluation practices on the teaching profession and teacher education curricula as well as the tendency to view teacher evaluation in isolation from the preparation and further professional development of teachers (Dwyer & Stufflebeam, 1996). Evaluation procedures have also been criticized for not being explicit about underlying assumptions about teaching and good teaching (Dwyer, 1994; Stodolsky, 1990). In addition, and most important of all, teacher evaluation has mostly been atheoretical and not grounded in ongoing theory development (Dwyer & Stufflebeam, 1996).

Teacher evaluation is a controversial and complex issue. In order to avoid the weaknesses mentioned above, we sought guidance on how best to approach the development of a teacher evaluation procedure. Two sources were examined in order to inform the development process. First, we examined studies on teacher thinking, teacher development, teacher learning, and teacher knowledge. These studies contained information on current views on the nature of teaching, on how teachers learn and develop, and on what teachers need to know to teach well. Second, we turned to studies and reports on teacher assessments that have recently been developed or that are still under development elsewhere. These new approaches to teacher assessment rely on performance-based methods of assessment and emphasize a professional growth perspective.

From both types of sources, we extracted a set of implications that could serve as a framework that directs attention to the many issues involved in the design of an assessment procedure. The framework was also intended to make explicit the tensions that may arise among implications.

The context within which we started this undertaking is the development of an assessment procedure for certifying beginning secondary teachers of English as a Foreign Language (EFL). In our view, the issues involved in designing a procedure to assess EFL teachers do not fundamentally differ from teacher assessments for other subjects and levels and are therefore of interest to all involved in teacher assessment.

The first two parts of this article are devoted to the topics on which the development of valid teacher evaluation procedures largely rests: recent conceptions of teaching and new approaches to teacher assessment. An analysis of these topics enabled us to extract 15 implications that may be used as the basis for an adequate evaluation procedure. The third part contains a summary of the implications together with an indication of the foundations and a list of unresolved issues that need further study. We illustrate the use of the framework by describing the assessment procedure that has grown out of it.


During the last decade, new conceptions of teaching have emerged that have implications for the evaluation of teachers. Process-product studies in the 1970s and 1980s focused on teacher behavior and on how teacher behaviors related to outcomes in student learning. Effective teachers were expected to demonstrate generic teaching skills that correlated positively with student achievement (Lowyck, 1994). Studies on teacher thinking, teacher development, teacher learning, and teacher knowledge have fundamentally changed how teaching is viewed and what it takes to teach. When considering the evaluation of teachers, these views form an important impetus in developing new evaluation systems.


Studies on teacher thinking focus on how teachers make sense of their students and classroom events and how their understandings influence the decisions they make. Initially, these studies examined teachers' cognitions that are related to pre-, inter-, and postactive phases of teaching and then shifted to include teachers' beliefs, perceptions, reflections, and routines (Calderhead, 1996; Clark & Peterson, 1986). Studies on teachers' thought processes have revealed the complex nature of teaching. Teaching is complex because it is a purposeful activity intended to promote learning, usually in a relatively large group of students with different individual characteristics, different needs, and backgrounds (Tomlinson, 1995a). Promoting learning in groups of students means attending to multiple goals simultaneously, such as involving all students in the lesson, creating a safe learning environment, encouraging shy students, and managing the class (Leinhardt, 1993). Teaching is also complex because teachers work in settings that make constant demands on them and in which there is little time to reflect (Doyle, 1986). The classroom is an uncertain place where it is difficult to anticipate how a particular activity will work out. During teaching, teachers resolve tensions among competing goals as they make moment-to-moment decisions about what to do in a particular situation. Teachers act on what they think is best in a given situation, mostly on the basis of incomplete evidence, without much time for deliberation, and without clear criteria for judging the success of their actions (Airasian, Gullickson, Hahn, & Farland, 1995). While teachers may pursue the same goals, they may do so in different ways, using different strategies, depending on their personal theories about teaching and learning and their personal interpretations of the situation (Leinhardt, 1993; Tomlinson, 1995b).
The context, that is the school, the classroom, the particular students, the content, and the particular textbook, has considerable influence on teachers' decisions. Teaching shapes and is shaped by the context in which it takes place (Airasian et al., 1995; Leinhardt, 1993). In other words, teaching is defined as the interaction of teacher, students, content, and setting within the larger context of the school (Delandshere & Petrosky, 1994, 1998; Tomlinson, 1995b). Consequently, what is judged as appropriate and effective teaching cannot be separated from the context in which it takes place and from the goals a teacher pursues. Teaching, in this view, is not at all like a technique in which teachers apply teaching methods that produce unequivocal learning results in students. Rather, teaching requires considerable judgment, a variety of pedagogical and instructional strategies, and a good understanding of the context in order to select those strategies that best fit the situation.

Teacher thinking studies also underscored that much of what teachers do takes place outside the classroom, like planning, assessing students, choosing and adapting instructional materials, and working with colleagues (Stodolsky, 1990). This gradually led to a change in how teaching is viewed: from a narrow conception of teaching as performance in the classroom to a broad conception including pre- and postactive phases of teaching and collaboration with colleagues (Reynolds, 1992).

What is stated above has far-reaching consequences for the assessment of teachers. We infer four implications. First, we should not only assess a teacher's actions but also a teacher's cognitions. What a teacher is trying to achieve, how he or she is trying to do so in view of the particular situation, and why should be assessed. For instance, teachers' explanations of the reason they arranged a speaking lesson in a certain way can reveal their understanding of how best to conduct speaking lessons with certain types of students. This implies examining a teacher's actions in combination with his or her cognitions. However, this is not unproblematic since a teacher's cognitions are not directly accessible. Teachers often have difficulty being explicit about why they made certain decisions or why they acted in a certain way. Not all actions are consciously processed because, with experience, many behaviors become automated and routine (Tomlinson, 1995a). Moreover, although it is generally assumed that teachers' behaviors depend on what they think, how their thinking influences their actions is not well understood (Richardson, 1996). In addition, there are no well-tried methods by which cognitive data can be linked to behavioral data (Kagan, 1990).

Second, if teaching is defined as adapting instruction to the particular situation and to particular students, teaching should be assessed in context. Only within the context in which teaching takes place can the appropriateness of a teacher's actions be assessed. The most obvious context is a teacher's own school and his or her own classroom. However, teachers work in contexts that differ substantially. Inner-city schools, suburban or rural schools, schools with a multicultural or a mono-cultural school population, schools with extensive or limited resources, and schools with much or little support for teachers require diverse capabilities and place different constraints on teachers. This raises the question of whether the same criteria can be applied to all teachers. Some contexts make it much harder to satisfy criteria because of limited resources or support or the type of students. This will affect the reliability of the assessment because the interpretation of its results is ambiguous: we may ascribe them to the capabilities of the teacher (or a lack of them) or to the (difficulty of the) situation.

Third, studies on teacher thinking portray teaching as a complex activity. Teachers make complex decisions, taking the content, the students, and the situation into account in the light of the goals of the teacher. The assessment should reflect this by acknowledging that there are different ways in which teachers can deal with this complexity. Criteria on which teachers are judged should not prescribe a particular way of teaching but should accommodate a range of acceptable ways to teach. This implies that we need to formulate criteria that allow a range of possible courses of action and yet define what is considered unacceptable (Dwyer, 1994). How to specify criteria that satisfy this requirement is a challenge.

Finally, defining teaching more comprehensively by including pre- and postactive phases of teaching and collaboration with colleagues implies that we should collect evidence on all these aspects of teaching. If we limit teaching to performance in the classroom, we leave out much evidence about a teacher's work. Data need to be collected on all aspects of a teacher's work so that the entire domain of teaching is covered. This implies the need to develop more than one type of instrument.


In this section, we look at studies that attempt to describe how teachers grow and develop over their careers and the factors that affect their growth. If their development can be understood in terms of common sequences of changes in knowledge and beliefs, this might indicate what to assess and how to make distinctions between the competent and the not yet competent beginning teacher. In general, professional development and teacher learning have to be viewed as dynamic processes, not restricted to the period of preparation before practice or as periodic staff development but as extending from initial preparation over the course of a teacher's entire career (Cochran-Smith & Lytle, 1999).

In a review of studies on learning to teach, Kagan (1992) found support for a stage model of professional development from novice to expert. Kagan regarded preservice and the first year of teaching as one developmental stage in which a novice's primary tasks are to acquire knowledge of pupils, knowledge of self as teacher, and procedural routines. Novices need to address these tasks first before they can shift their attention to instruction and student learning. Kagan considered "procedural routines to be the sine qua non of classroom teaching" (p. 162). Preservice teacher education should primarily focus on promoting the acquisition of standardized routines. Only after these are in place can novices move to the next stage. From a stage model view of teacher growth, learning to teach is a more or less ordered process that progresses through distinct stages that can be defined and assessed. Kagan's conclusions seem to imply that the assessment of beginning teachers should focus on whether beginning teachers have successfully mastered classroom management routines.

Kagan's stage theory has been criticized because it suggests that earlier stages lead to later stages (Grossman, 1992). According to Grossman, there is no evidence that teachers, once they have mastered the routines of teaching, will naturally begin to question these routines. Bullough (1997) argued that stage theories tend to emphasize linearity while not accounting for movement or lack of movement from one stage to another. More recent stage theories emphasize the dynamic nature of the learning-to-teach process with teachers moving in and out of stages in response to environmental influences (Fessler, 1995; Huberman, 1995). Calderhead and Shorrock (1997) reported that some of the students in their study progressed through distinct stages and others did not. They emphasized the diversity of routes to becoming a teacher. They regard stage models mostly as useful heuristics to highlight the complexity of teaching. Bullough (1997), agreeing with Calderhead and Shorrock (1997), described the process of becoming a teacher as an idiosyncratic process in which past experience, personality, and context each influence the decisions beginning teachers make. This view is supported by constructivist perspectives on learning to teach that regard it as a unique and dynamic process in which multiple dimensions are involved and not as a single progression from novice to expert. Shifts in professional growth occur at different points in time and are difficult to predict.

For the assessment of beginning teachers, we extracted the following implication. If we conceptualize the process through which teachers develop not as a linear process but as a dynamic process that is difficult to predict and that shows inconsistencies, the assessment should reflect this. More concretely, the assessment should strive to capture the diversity in what concerns beginning teachers, in what they know, and in what they can do. A portrait or a profile that describes in some detail how the beginning teacher has performed on the assessment more truly reflects the various and complex ways in which beginning teachers grow. A profile or portrait may do justice to what an individual teacher knows and can do. Because it provides a teacher with a description of strengths and weaknesses, it would fit professional growth purposes of assessment. It leaves us, however, with the question whether such an approach can be combined with making pass/fail decisions with respect to certification. The detailed information of such a profile makes it difficult to reach an overall judgment. Decisions need to be made about whether to employ a model in which weaknesses can be compensated by strengths or a model that requires minimum competence on a range of standards.


In this section, we discuss constructivist perspectives on teacher learning that describe the process through which teachers learn to teach and how that process may be facilitated. In constructivist studies on learning to teach, teacher learning is described as a process of organizing and reorganizing, structuring and restructuring a teacher's understanding of practice. Teachers are viewed as learners who actively construct knowledge by interpreting events on the basis of existing knowledge, beliefs, and dispositions (Borko & Putnam, 1996; Feiman-Nemser & Remillard, 1996; Putnam & Borko, 1997). Prior knowledge and beliefs about learning, teaching, students, and subject matter play a central role in learning to teach because they function as interpretative lenses through which beginning teachers make sense of their experience and which determine how they frame and resolve teaching problems. These beliefs are often not held consciously and cannot be readily articulated. They are also highly resistant to change (Calderhead, 1996; Richardson, 1996). For instance, many beginning teachers believe that learning is absorbing and memorizing information, while teaching is a process of passing knowledge from teacher to students (Feiman-Nemser & Remillard, 1996). Such beliefs are incompatible with beliefs underlying recent conceptions of teaching and learning. It is assumed that only by changing their beliefs can teachers learn new instructional practices. In order to do this, beginning teachers need opportunities to become aware of and enhance their understanding of their actions and beliefs (Borko & Putnam, 1996; Cochran-Smith & Lytle, 1999). Only by becoming aware of their actions and beliefs and by holding them up for scrutiny and comparison to divergent beliefs can teachers develop new understandings and learn new instructional practices.
Calderhead (1996) argued that an important aspect of teachers' professional development is the process of making implicit beliefs explicit and thereby developing a language for talking and thinking about practice by questioning the sometimes contradictory beliefs underlying their practice.

In social views of cognition, learning to critically examine and reflect on teaching practice is seen primarily as a social rather than an individual process. Here, learning is conceptualized as coming to know how to participate in the discourse and the practices of a particular community (Putnam & Borko, 1997). By participating with others in professional activities and conversations, teachers extend their knowledge of teaching. Collaborating with others on tasks of teaching is assumed to be an important means to continue to learn professionally. In working with colleagues on teaching tasks, by discussing and reflecting on problems of teaching, teachers may be confronted with divergent beliefs. By articulating what underlies their practice and the practice of others, an opportunity is created to adjust their beliefs and to learn about other instructional practices. They can only profit from these opportunities when they possess the skills and the disposition to engage in such "conversations."

From what is stated above, we inferred two important implications for the assessment of beginning teachers. In view of the fact that professional development continues throughout a teacher's career, it is of major concern that teachers have the disposition and the skills to develop new understandings of teaching, to learn new instructional skills, and to expand their knowledge base for teaching. Being able to verbalize one's thinking and articulate one's assumptions is an important tool in constructing and expanding one's knowledge base for teaching. This means that beginning teachers should demonstrate that they engage in the kinds of thinking and reasoning that are necessary to continue and regulate their professional development. Beginning teachers should be required to question and reflect on their actions and beliefs. For instance, they should be explicit about issues like how and why they selected certain tasks for their students or why they explained a concept in a particular way. However, how to judge the quality of beginning teachers' deliberations and reasonings and how to determine criteria for good reasoning or good reflection are still matters of much debate (Kagan, 1990; Meijer, 1999).

If learning to teach is conceptualized not only as learning to individually examine and reflect on teaching practice but also as learning how to participate in professional discourse with colleagues, a second implication would be that we assess teachers' ability to collaborate on tasks of teaching. In working with colleagues, they should demonstrate they have the skills, the disposition, and the knowledge to engage in professional conversations with colleagues.

Again, although it is felt that working with colleagues and engaging in professional conversations are important qualities in view of further growth, it may be quite difficult to make evaluative judgments about their collaboration. What are the distinctive elements of such forms of collaboration? What should be concluded about beginning teachers who do not engage in such professional conversations? Does it mean they lack self-confidence or communication skills or the necessary knowledge for teaching? What if such professional conversations are not encouraged by the school culture?


Studies on knowledge of experts in fields other than education highlight the extensive, accessible, and well-organized bodies of knowledge experts have about their field of expertise (Bereiter & Scardamalia, 1993; Sternberg & Horvath, 1995). In the field of education, too, the central role of knowledge in teaching and in learning to teach, and thus in the assessment of (beginning) teachers, is widely accepted (Borko & Putnam, 1996). In this section, we discuss different perspectives on the sort of knowledge teachers need in order to teach well. From a research-based perspective, beginning teachers ideally need to know and be able to use the knowledge about teaching generated by researchers in various disciplines. This research-based knowledge relates to such topics as subject matter knowledge, classroom management, student learning, student motivation, and instructional strategies. It outlines confirmed knowledge and best practices that are based on empirical evidence of effectiveness (Reynolds, 1989). Teachers are encouraged to use this knowledge generated by researchers and apply it in their classrooms because it is assumed that such teachers will teach better (Cochran-Smith & Lytle, 1999; Feiman-Nemser & Remillard, 1996). Based on a research-based perspective on the knowledge beginning teachers need for teaching, an assessment should focus on how well they are able to apply this knowledge in their teaching. The extent to which beginning teachers have mastered certain confirmed strategies that have been empirically tested by researchers is the focus of the assessment. From this perspective, beginning teachers need to demonstrate that they can use and adapt certain instructional strategies to their own situation. Criteria derived from the research literature are used to judge how well they have done this. The vision underlying this perspective is that good teaching can be defined as being able to apply confirmed strategies.

However, numerous studies have shown that teachers use very little research-based knowledge and do not find it particularly relevant (Kagan, 1992; Kennedy, 1997; Kwakman, 1999). According to Feiman-Nemser and Remillard (1996), organizing the knowledge base for teachers around discrete topics sidesteps the issue of knowledge use. Teachers do not draw on knowledge about one topic at a time, but they integrate different kinds of knowledge in teaching. The question here is how teachers transform formal knowledge into teaching activities. This is better accounted for in a practice-based perspective on teacher knowledge, where the assumption is that what teachers need to develop is practical knowledge, the knowledge that is embedded in practice. This knowledge, variously indicated as personal practical knowledge, practical knowledge, craft knowledge, wisdom of practice, or implicit theories, refers to the knowledge teachers develop with respect to their teaching practice, knowledge of classroom situations and of practical dilemmas. It is knowledge of the particular and the concrete in contrast to abstract rules and general theories (Beijaard & Verloop, 1996; Carter, 1990; Fenstermacher, 1994). Classroom experiences and reflections on experiences form a primary source in the construction of teachers' practical knowledge. Teachers also integrate research-based knowledge in their practical knowledge if this knowledge seems useful in their particular situation (Beijaard, 1998; Van Driel, Verloop, & De Vos, 1998). Meijer (1999) summarized the characteristics of practical knowledge that emerge from different perspectives on practical knowledge: "It is personal and to a certain extent unique, it is contextual, based on (reflection on) experiences in teaching, and mainly tacit; it underlies teachers' practice and it is content-related" (p. 19). Practical knowledge guides a teacher's behavior because of its relevance and immediate utility in daily practice (Verloop, 1992).

For that reason, several authors suggest that the practical knowledge of teachers should be included in teacher assessments (Beijaard & Verloop, 1996; Leinhardt, 1990; Shulman, 1987b). By regarding the practical knowledge of teachers as valid knowledge, teachers are also seen as producers of knowledge (Cochran-Smith & Lytle, 1999). If the knowledge teachers themselves have of their practice is acknowledged and taken seriously, it will also make the assessment more valid in the eyes of beginning teachers. A practice-based perspective on teacher knowledge is, however, controversial. First, there are normative questions about the status of teachers' practical knowledge. For instance, whose practical knowledge is included, since experience in itself is not enough to be an expert teacher (Meijer, 1999; Sprinthall, Reiman, & Thies-Sprinthall, 1996; Sternberg & Horvath, 1995)? The question is how to make the distinction between an expert and an experienced teacher (Leinhardt, 1990). Second, it is often argued that teachers' practical knowledge is of a conservative nature (Sprinthall, Reiman, & Thies-Sprinthall, 1996). This is a particularly important point because reform initiatives expect teachers to adopt new practices such as more student-centered styles of teaching.

Taking a practice-based perspective on what beginning teachers should know and be able to do hinges on involving expert teachers in the development of the assessment. Being viewed by their peers and their headmasters as experts in combination with proof about their being involved in ongoing learning about their practice might be used as indicators of such expertise. The view underlying this perspective is that competent teaching can be defined as what expert teachers find acceptable.

From the above, we formulated two implications for the development of an assessment procedure. First, criteria on which beginning teachers are judged should be established through dialogue between research-based and practice-based perspectives on teacher knowledge. Both research- and practice-based perspectives on what teachers should know and be able to do form important sources of knowledge that should be included in the assessment. However, they form two completely different sources of knowledge: one is explicit and in the form of general rules or heuristics; the other resides in the mind of teachers and is mainly tacit, personal, and to some extent unique. The main issue here is how the interaction between these different sources of knowledge is organized. Delandshere (1996) suggested that by analyzing the practice of exemplary teachers, by involving expert teachers in the design of assessment tasks and activities, and by having them serve as assessors in the assessment procedure, their practical knowledge can be included in the assessment (see also Leinhardt, 1990).

The second implication follows from the first. Expert teachers should be involved in the development of the assessment, by taking part in exercise development and by acting as judges. By involving expert teachers in the development and judgment process, their practical knowledge about what is critical in teaching can be accessed in a natural way. Their involvement can guarantee that the assessment contains the kind of problems that practitioners are faced with. Their extensive and detailed knowledge of the context of teaching makes them, in principle, good judges of their beginning colleagues. This does not mean that they do not need extensive training in interpreting and judging the widely varying responses of examinees. Developing a shared frame of reference and the ability to articulate and justify their judgments needs specific training.


In the first section, we describe new approaches to teacher assessment and the main features of performance or authentic assessment. In the second section, we discuss problems of validity and reliability of performance assessment, setting out the main issues developers have to attend to in balancing concerns of validity and reliability in designing the overall assessment procedure.


In the 1970s, competence-based approaches to teacher education and assessment were viewed as promising ways to improve the preparation and evaluation of teachers. Competence was specified in concrete behaviors, and before being certified teachers had to demonstrate that they had acquired these behaviors (Wolf, 1995). Since then, this approach has met with considerable criticism. The criticism focused on how teacher competence is defined for assessment purposes and how this interacts with classroom observation as the preferred and often sole method of assessing teacher competence (Delandshere, 1994; Stodolsky, 1990). Teacher competence is defined in terms of a set of discrete behaviors associated with the completion of atomized tasks. How these behaviors relate to each other and to the whole of teaching is left unclear (Delandshere, 1994; Eraut, 1994; Gonczi, 1994; Tomlinson, 1995a; Wolf, 1995). Classroom observation checklists score whether a candidate shows the required behaviors, frequently without judging whether they are appropriate in the specific situation. In addition, by focusing exclusively on classroom performance, much evidence about teaching is inevitably left out (Delandshere, 1994). Because of the high stakes attributed to test scores, most observation systems for teacher evaluation have been more concerned with the reliability and the objectivity of the scores they produce than with their validity (Stodolsky, 1990). The negative effects of these types of teacher evaluations have been well documented in terms of reductionist views on teaching and learning and lack of consideration of the teaching context (Haney, Madaus, & Kreitzer, 1987).

In recent years, teacher assessment has received a strong new impulse, notably in the United States under the term standards-based teacher assessment. Teacher assessment procedures, developed by the National Board for Professional Teaching Standards (NBPTS, n.d.), the Interstate New Teacher Assessment and Support Consortium (INTASC, 1992), and Praxis III, developed by Educational Testing Service (ETS, 1992), all start from an awareness of the considerable impact of testing on teaching and learning to teach. Incorporated in these new assessments for beginning teachers (Praxis III and INTASC) and experienced teachers (NBPTS) is a more comprehensive view of teaching and an explicit intention to improve the quality of teaching. These assessments are based on standards that describe good teaching as knowing particular students, knowing subject matter content, being able to use a repertoire of teaching strategies, selecting techniques that best fit the situation, and being able to reflect on teaching (Milanowski, Odden, & Youngs, 1998). The standards are broad descriptions of what teachers need to know and be able to do, and they are based on the recognition that there are many ways in which teachers can meet these standards. As such, they differ considerably from standards formulated in early competence-based assessment that indicate prescribed behaviors. Standards are developed through consultation with experienced teachers and teacher educators. NBPTS and INTASC assessments are subject and level specific, whereas Praxis III is meant to be used in all subject areas and at all grade levels. They all rely on "authentic" assessments of teacher competence. From the evidence gathered from performances on these authentic assessments, assessors make judgments about whether an individual meets the criteria specified in the standards (Gonczi, 1994).

Authentic assessment (the term is often used interchangeably with other terms like performance assessment or performance-based methods of assessment) calls for examinees to demonstrate their capabilities directly by creating some product or engaging in some activity (Haertel, 1991). From the product or the performance, underlying competence is inferred. Messick (1994) elaborated on what authentic might mean in connection with expected benefits of authentic testing. In his view, authentic assessment means that the full complexity of the knowledge and skills involved in, for instance, teaching is preserved in the assessment and that nothing that is essential about teaching is left out of the assessment. The assessment tasks/activities, the scoring criteria, and rubrics should all reflect that complexity. This also bears upon the way in which the results of the assessment are reported to examinees (Delandshere & Petrosky, 1998; Gipps, 1994). Authentic or performance assessment attempts to mirror as closely as possible what is expected of a candidate in the real work situation. There is evidence that direct methods of assessment predict success at work much better than indirect tests like paper-and-pencil tests (Hoekstra, 1995; Wolf, 1995; Tomlinson, 1995a). The assumption is that assessment methods based on broad standards that reflect the complex knowledge and skills demanded of a candidate in the real (teaching) situation are valid measures of teaching. In addition, they are believed to have a positive impact on learning to teach, on further professional development, and on teacher education curricula (Delandshere, 1994; Dwyer & Stufflebeam, 1996; Moss, 1992).

Authentic assessment and performance assessment typically share the following features (Gipps, 1994; Haertel, 1990, 1991; Moss, 1994; Swanson, Norman, & Linn, 1995; Tillema, 1993; Wiggins, 1993):

1. They aim to be realistic representations of the actual tasks/activities we want to assess competence in.

2. They aim to be meaningful and relevant tasks/activities that are worthwhile doing.

3. They allow examinees substantial freedom in interpreting, responding to, and designing or selecting tasks/activities.

4. They take a considerable amount of time.

5. They require expert judgment in scoring.

Eraut (1994) made a useful distinction between performance assessment methods. First, there are methods that collect evidence on (teaching) performance and products (of teaching). It is assumed that competence is incorporated in the performance or in the product. Second, there are methods that collect evidence on capability. Eraut defined capability as knowledge in use, the knowledge and understandings that underpin competent performance and that comprise knowledge of people, knowledge of situations, and knowledge of practice. Third, there are methods that collect evidence on both performance (or products) and capability. Generalization from a candidate's performance on tasks and activities can be increased by questioning candidates about their understanding of underlying principles and their knowledge of alternative strategies in coping with variations in context (Jessup, 1991).

We drew three implications for developers of assessment procedures. The first implication is that, in order to be able to develop an assessment procedure for teachers, standards should be developed that represent the key aspects of professional practice. Standards indicate outcomes of education and training. They describe what we actually expect beginning teachers to know and be able to do and can be formulated at different levels of generality. At their most general level, standards define a profession. At a more specific level, standards describe what we expect from teachers teaching a specific age group and/or subject (Roth, 1996). This would imply that the connection between curriculum standards for students and the standards for teachers who teach these students needs to be considered. Standards may describe what we minimally expect from teachers, but often implicitly show a vision of higher levels of competence (Diez, Richardson, & Pearson, 1994). An important question is at what level of specificity standards are formulated. If they are formulated at a very general level, it is easier to reach consensus, but there is then considerable room for different interpretations. If they are formulated very specifically, the standards might have a limiting effect and fragment teaching (for different ways in which this problem is dealt with, see the vocational qualifications in Great Britain, described as National Vocational Qualifications (NVQs; NCVQ, 1991), and those by the National Board for Professional Teaching Standards (NBPTS, n.d.) in the United States).

A second implication is that it is important to think through the process by which standards are developed. How standards are developed significantly affects their nature, their effectiveness, and credibility (Roth, 1996). For instance, there is the question of who is involved in their development. There are different stakeholders in education: teachers and their organizations, students and their parents, state and local officials, researchers and educational specialists. Standards need to be accepted and understood by all parties involved, but in the first place by teachers themselves. The question is how to deal with the different perspectives (Dwyer & Stufflebeam, 1996). Another question concerns the sources of information that are used to define the content of the standards. For this, we refer to the section on research-based and practice-based perspectives on what teachers should know and be able to do. Moreover, standards need to be fair to teachers and reflect what is currently acceptable professional practice (Dwyer, 1994). At the same time, what is acceptable is not a static but a dynamic concept. This is particularly true at present, now that the curriculum in primary and secondary education in the Netherlands is undergoing changes towards more independent learning that require new knowledge and skills from teachers.

A third implication is that we should design assessment systems that assess what is actually demanded of candidates in the real situation, that is, on the job. This implies that assessment methods should collect evidence both on teaching performance and products of teaching and on knowledge and understanding of teaching. Developers should consider whether beginning teachers should be assessed in their own schools and in their classrooms or whether there are advantages in assessing them in simulated settings (Gonczi, 1994; Straetmans, 1995). In addition, since we cannot assess everything, a choice has to be made regarding the tasks or activities that best enable teachers to demonstrate that they meet the standards. Tasks or activities have to be chosen in such a way that candidates have a fair chance of meeting the standards regardless of the specific situation in their schools in terms of resources and the support they get (Delandshere, 1994; Moss & Schutz, 1999).


Although performance-based methods seem to be most promising for the assessment of teachers, they present serious problems with respect to the content validity and the reliability of the scores they generate (Moss, 1994).

The first problem concerns the limited number of tasks/activities on which to base decisions about a candidate's overall performance. We must feel assured that the evidence about a candidate's performance adequately covers teaching as we define it at present (Messick, 1994). This is a serious matter because it appears that performance on tasks/activities is highly task-specific. This means that there is low reliability in terms of consistency in performance across tasks that require knowledge and skills from the same domain or on tasks/activities that are very similar (Gipps, 1994; Linn, Baker, & Dunbar, 1991; Moss, 1994; Swanson, Norman, & Linn, 1995). In the case of teacher evaluation, this implies that what a teacher can do in one context does not generalize well to other contexts with other topics and other age groups. A teacher may demonstrate competence in teaching a particular topic to particular students of particular age groups, without necessarily showing the same competence in teaching another topic to other students. Thus, both the limited number of tasks and activities and the fact that performance on tasks/activities is highly task dependent make it difficult to generalize from the performance to the whole domain. Increasing the number of tasks tends to increase both domain coverage and consistency across tasks, but this is not always feasible because of extended testing time (Linn, 1993). For that reason, Messick (1994) advised combining extended and more time-consuming tasks that assess depth of understanding with shorter structured tasks in order to reach acceptable levels of content validity.

A second problem with respect to reliability refers to consistency in judging the performances elicited by tasks and activities. Examinees get considerable freedom in the interpretation of the task and activities, in their response, and even in the selection of tasks and activities (e.g., in the case of selecting items for a portfolio). Their responses may vary widely and be difficult to anticipate (Delandshere, 1994). Judging responses is even more difficult because the contexts in which candidates perform tasks and activities are highly variable, and this is particularly true when candidates are judged in the real context of the classroom and the school. This imposes a heavy burden on assessors. It implies that they must determine how to take the context into account when judging a candidate. This calls for trained assessors who are knowledgeable of the context. Training makes a major difference in supporting assessors to rate according to specified criteria rather than their own preferences. Another way of dealing with these problems is refining instructions to candidates, specifying scoring instructions, and reducing the variability of the context, thus standardizing the task and activity and narrowing the range of performances that are evaluated. If tasks and activities, the scoring procedure, and the context are highly specified, levels of interrater agreement will increase but the validity of the assessment will be compromised (Moss, 1994). For example, it will be easier to reach high levels of interrater agreement about the number of questions a teacher poses in a simulated lesson than about the quality and the appropriateness of questions in a real classroom. Although the second situation will reveal more about the qualities of the teacher, reaching agreement on this issue requires a good deal of expertise from the assessors.

In fact, and this is an enduring problem, reliability and validity place conflicting demands on the selection, formulation, and scoring of tasks and activities (Delandshere, 1994; Gipps, 1994; Moss, 1994). The concept of reliability, quantitatively defined in terms of consistency of evaluation across raters on a given task and consistency in performance across tasks that address the same knowledge and skills, needs to be redefined, at least according to some assessment specialists (Gipps, 1994; Moss, 1994, 1996). According to Gipps (1994), we should stop presenting assessment as an exact science and give up the thought that something like a "true score" exists. In a similar vein, Moss (1994) argued that we should acknowledge that complex performances like teaching are context bound and, therefore, are likely to show inconsistencies. Advocates of the view that reliability needs to be redefined argue that we should look for other measures to make sure that the judgment process proceeds fairly and responsibly (Wolf, Bixby, Glenn, & Gardner, 1991). The quality of assessors' argumentation and the adequacy of selected evidence supporting assessors' judgments might for instance serve as such a guarantee.

From a similar viewpoint, Delandshere and Petrosky (1998) questioned the use of numerical ratings as the basis on which decisions are made about the certification of teachers. In their view, numerical ratings reduce the complex, multidimensional evidence about a candidate's performance. Consequently, valuable information gets lost and an opportunity for personal, specific feedback is missed. They stressed the need to search for alternatives to the use of ratings in the context of teacher assessment.

From the above, we saw three implications for the development of an assessment procedure for beginning teachers. First, in order to reach acceptable levels of content validity, a mixture of assessment methods should be developed consisting of open, extended performance assessments and shorter structured tasks. In combination, they should sufficiently cover the different aspects of teaching, so decisions about a candidate are based on multiple lines of evidence. It is assumed that examining different sources of information results in a more valid picture of a teacher's overall performance. In addition, in order to broaden the base on which inferences are made about a candidate's competence, assessments should be designed in such a way that candidates comment on what they were trying to achieve and why they selected one course of action rather than another. The question that remains is how different types of information, for instance numerical scores and written summaries, can be combined in order to reach an overall judgment. In addition, questions such as whether a candidate can compensate for weaknesses on some aspects of the assessment with strengths on other aspects need to be answered.

The second implication is that we should look for measures that ensure that the judgment process proceeds responsibly. One measure that could serve that purpose is requiring judges to have extensive knowledge of the context of the assessment. This would imply involving assessors teaching the same subject to students of the same level. Investing energy in the training of judges in a shared understanding of the standards, of the criteria, and of the application of the criteria can also contribute to such a responsible process (Gipps, 1994; Wolf, 1995). In addition, the scoring procedure should be designed in such a way that judgment proceeds systematically and that the steps assessors take to reach a judgment are open for inspection by others (Moss, 1994, 1996). Finally, sources of evidence on which judgments about candidates are based should be varied. The question that remains to be answered is how these requirements can best be met. For instance, what is the best way to prepare assessors for their task? Another question is whether these measures are equally important or whether some are more important than others.

A third implication is that, in making decisions about the assessment design and the development of tasks/activities, developers should balance concerns regarding the validity and reliability of the total set of instruments for its intended purposes. Bachman and Palmer (1996) argue that, in order to achieve maximum usefulness of an assessment procedure, we need to find an appropriate balance among different test qualities. They state that individual test qualities must be evaluated in terms of their combined effect on the overall usefulness of the test. In making decisions about assessment methods and their design, developers should attempt to compensate for the weaknesses of one method with the strengths of another method. What the most appropriate combination is cannot be determined in a general way because it depends on the particular assessment situation and on the purposes of the assessment. Following Bachman and Palmer, we propose the following qualities for developers to consider in combination when designing an assessment procedure for beginning teachers. Most of these qualities flow from what was discussed before. Four qualities are concerned with the validity of the assessment and one with the reliability of the assessment. The sixth quality, practicality, is of a different nature and refers to practical considerations in terms of time and money that play a role in developing and implementing an assessment procedure.


Authenticity

In order to justify the use of teacher assessments, we need to be able to demonstrate that performance on the tasks/activities that are included in the assessment corresponds to performance in situations other than the test situation itself. Authenticity refers to the degree of correspondence between the tasks/activities the candidate is asked to carry out as part of the assessment and similar tasks/activities in nontest situations. The degree to which developers design tasks/activities that are faithful and realistic representations of what is actually required of candidates on the job contributes to the authenticity of the assessment. A critical feature of teaching is its complexity, and tasks/activities should reflect this. Authenticity is also an important consideration in designing tasks/activities that candidates perceive as relevant and significant aspects of their work. This would imply that candidates have some say in determining what is significant or relevant to them. Authenticity provides a means for investigating the extent to which score interpretations generalize beyond performance on the assessment tasks/activities.

Content quality

Content quality refers to the extent and the type of involvement of a candidate's capabilities in carrying out the tasks/activities included in the assessment. The degree to which tasks/activities elicit specific areas of (EFL) knowledge and skills as they are employed in practice contributes to their validity. The content quality of a task/activity can be characterized in terms of the ways in which a candidate's personal characteristics and specific areas of (EFL) teaching knowledge and skills are engaged by the task/activity. In order to make inferences about a candidate's teaching capabilities, his or her responses to assessment tasks/activities must involve the integration and use of relevant areas of EFL teaching knowledge.

Domain coverage

The evidence on which decisions about a candidate are based must be sufficiently broad and cover the different content standards that define what teachers should know and be able to do. Sources of evidence should be collected over a period of time and consist of various types of data. Considering the context-specific nature of teaching, the evidence should pertain to different settings, different ages of students, and different lesson content. This is particularly important because the purpose of the test is to make inferences that provide the basis for making decisions about beginning (EFL) teachers' capabilities to take on full responsibility for teaching students aged 12 to 18 in different schools with different school populations. Adequate levels of domain coverage may conflict with considerations of efficiency and time. Coverage refers to the breadth and depth of evidence.


Reliability

Reliability refers to the extent to which an assessment produces consistent scores across tasks intended to address the same capabilities or when tasks are rated by different assessors. Consistency in performance on tasks that address the same capabilities is difficult to achieve because of the complex interactions between the task and the performance. With performance assessment, attention therefore tends to focus on interrater reliability, that is, different raters agreeing on the same score for a single performance. Measures that improve interrater reliability are increased specification of the task/activity, clear performance criteria, scoring guides that help assessors to focus on the same features, and careful training of assessors. However, measures that increase reliability may be at the expense of authenticity and of content quality as these measures may narrow the performance that is being assessed.
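
For readers who want to see how interrater agreement of this kind might be quantified, the following is a minimal sketch, not part of the assessment procedure described in this article. It assumes two assessors who each assign one categorical score to the same set of performances, and it computes simple percent agreement together with Cohen's kappa, a standard index that corrects agreement for chance. The function names, the three-point scale, and the scores themselves are hypothetical and serve only as illustration.

```python
from collections import Counter


def percent_agreement(rater_a, rater_b):
    """Share of performances on which both raters gave the same score."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two non-empty score lists of equal length")
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return matches / len(rater_a)


def cohens_kappa(rater_a, rater_b):
    """Agreement between two raters, corrected for chance agreement."""
    n = len(rater_a)
    observed = percent_agreement(rater_a, rater_b)
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    # Chance agreement: probability that both raters pick the same
    # category if each scored independently at their own base rates.
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one and the same category throughout
    return (observed - expected) / (1 - expected)


# Hypothetical scores on an assumed three-point scale
# (1 = below standard, 2 = meets standard, 3 = exceeds standard).
scores_a = [2, 3, 1, 2, 2, 3, 1, 2]
scores_b = [2, 3, 2, 2, 1, 3, 1, 2]

print(percent_agreement(scores_a, scores_b))
print(cohens_kappa(scores_a, scores_b))
```

An index of this kind captures only consistency between raters. As argued above, a high value can be obtained by narrowing the task, so it says nothing by itself about authenticity or content quality.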


Impact

Impact refers to the extent to which the assessment has a positive or a negative impact on individual teachers, as well as on teacher education and the educational system. At the level of individual candidates, the experience of carrying out the assessment may have a positive impact if they perceive the tasks/activities as relevant. How the results of the assessments are communicated to candidates may also affect their perceptions. If candidates receive personalized feedback on their performance, for instance in the form of a profile, this may add to the positive impact of the assessment. A profile would do justice to the complexity and multidimensionality of the performance and could provide a candidate with valuable information about his or her performance. Candidates may also be affected by the decisions that are to be made about them on the basis of the evidence. The procedure should guarantee that candidates are treated fairly and uniformly. They should be informed about the different aspects of the assessment procedure. They should get equal opportunities to demonstrate what they can do. Throughout the development process, developers should consider the potential consequences of decisions for candidates.


Practicality

This quality is of a different type. In developing the assessment, an attempt should be made to achieve an optimum balance among the five qualities described above. Achieving an optimum balance requires considerations of time and cost in the development and implementation of the assessment. This is particularly important in developing teacher assessments because of the complexity of the undertaking. Development is costly, and once tasks/activities have been developed, they demand considerable investment of resources. So what can be achieved in terms of the five qualities described above directly relates to the availability of resources. Cost and efficiency relative to the expected benefits of the assessment should be taken into account when developing instruments.

Ultimately, the aim is to develop a set of instruments that is useful for its intended purposes by finding an optimum balance among the six qualities.


Although a framework cannot capture details, the results of the analysis of the recent literature on teaching and teacher assessment are presented in Table 1. We have extracted 15 implications that developers of teacher assessments should pay attention to and make decisions about. These are listed in the left column in the order in which they appear in the text. In the middle column, the foundations or the rationale for these implications are described briefly. In the right column, questions and problems that need further attention from developers are indicated.




In exploring the actual development of an assessment procedure for beginning EFL teachers, we took the framework as a point of departure. In order to take the framework a step further and specifically inform the development process, we turned to the Standards for Educational and Psychological Testing (AERA, 1999) for information. Next, we adapted components or phases of test development described by various designers of assessment procedures to suit our purposes (Bachman & Palmer, 1996; Haertel, 1991, 1992; Stiggins, 1987). We distinguished nine components that should be specified in order to develop an assessment procedure. The components refer to decisions regarding purposes of the assessment, characteristics of examinees, who determines on what examinees will be assessed and who are the assessors, the formulations of content standards, a specification of the performance to be assessed, the quality criteria that help to make a value judgment about the assessment procedure, methods by which candidates will be assessed, determination of the scoring procedure, and communication of the results. The 15 requirements provided guidance in the specification of these components. An illustration of how we linked the requirements to different components of test development is presented in Table 2. The components of test development are listed in the left column of Table 2. The right column contains a description of how we actually carried out the specification of this component in creating an assessment procedure for EFL teachers. Between brackets, we refer to the requirements listed in Table 1 that apply to the specific component. Though an essential aspect of test development, the first two components were part of prior decisions and did not form part of the 15 requirements. The illustration is necessarily of limited scope.

The assessment procedure under development illustrated in Table 2 demonstrates the usefulness of the framework presented in Table 1. The 15 requirements, reordered to fit the various components of test development, shed light on important considerations in the conceptualization and the design of teacher assessments. The usefulness of the framework lies in the explicit connections it creates between what we know about teaching and what it takes to teach and how to assess teaching. The framework points to important issues developers should pay attention to in specifying each component. However, we are aware that other sets of procedures are needed to further specify components. For example, the framework provides no guidelines on how to weigh the information resulting from the three instruments, how to combine the information to make an overall decision, and how to reach a pass-fail decision. It is beyond the scope of this article to include such procedures.


The framework fulfills three functions. Its first function is bringing together insights from recent theories about teaching and assessment and considering their implications for the development of teacher assessment procedures. This was what we originally set out to do. The result is a framework that provides a focus and a direction for the planning and development of an assessment procedure. As such, the framework serves a heuristic function: It is an aid to development. In exploring assessments for beginning teachers of English as a foreign language as illustrated in Table 2, we took this framework as a point of departure in our work. The framework was used to inform the different components of test development. Preliminary results indicate that the framework is useful in guiding the planning and development of beginning teacher assessments.

The framework also serves a critical function. Developers generate evaluative questions from the framework which they can use to evaluate an existing assessment system, while users of assessment systems may see the framework as a checklist to examine existing systems or systems under consideration.



A third function of the framework is that it offers an overview of the factors involved in developing assessment procedures. For all those involved in teacher evaluation, the complexity of creating an evaluation procedure becomes immediately apparent. Developers who intend to follow up on the implications have to face tough questions that cannot be easily answered. The framework and the preliminary results of the actual development of an assessment procedure for EFL teachers illustrate that developing teacher assessments is not only a complex technical process but also, and perhaps even more so, a social process. The need for standards implies that a consensus can be reached among divergent views on what good teaching involves. The selection of tasks by which candidates demonstrate their capabilities on standards also depends on human judgment and decision making, as does the ultimate judgment about whether candidates meet standards. The advantage of being explicit about these issues lies in stimulating the dialogue among all those involved in teaching about what we understand good teaching to be. The quality and substance of that effort determine how teacher evaluation affects teachers.


Airasian, P., Gullickson, A., Hahn, L., & Farland, D. (1995). Teacher self-evaluation: The literature in perspective. Kalamazoo, MI: CREATE.

American Educational Research Association (AERA), American Psychological Association, National Council on Measurement in Education. (1999). Standards for educational and psychological testing. Washington, DC: American Psychological Association.

Andrews, T. E., & Barnes, S. (1990). Assessment of teaching. In W. R. Houston (Ed.), Handbook of research on teacher education (pp. 569-598). New York: Macmillan.

Bachman, L. F., & Palmer, A. S. (1996). Language testing in practice. Oxford: Oxford University Press.

Beijaard, D. (1998). Persoonlijke onderwijstheorieen van leraren [Teachers' personal theories of teaching]. In L. Verschaffel & J. Vermunt (Eds.), Het leren van leerlingen. Onderwijskundig Lexicon Editie III (pp. 107-123). Alphen aan den Rijn, Netherlands: Samson.

Beijaard, D., & Verloop, N. (1996). Assessing teachers' practical knowledge. Studies in Educational Evaluation, 22 (3), 275-286.

Bereiter, C., & Scardamalia, M. (1993). Surpassing ourselves: An inquiry into the nature and implications of expertise. Chicago: Open Court.

Borko, H., & Putnam, R. T. (1996). Learning to teach. In D. C. Berliner & R. C. Calfee (Eds.), Handbook of educational psychology (pp. 673-708). New York: Macmillan.

Bullough, R. V. (1997). Becoming a teacher: Self and the social location of teacher education. In B. J. Biddle, T. L. Good, & I. F. Goodson (Eds.), The international handbook of teachers and teaching (pp. 79-134). Dordrecht, Netherlands: Kluwer.

Calderhead, J. (1996). Teachers: Beliefs and knowledge. In D. C. Berliner & R. C. Calfee (Eds.), Handbook of educational psychology (pp. 709-725). New York: Macmillan.

Calderhead, J., & Shorrock, S. B. (1997). Understanding teacher education. London: Falmer Press.

Carter, K. (1990). Teachers' knowledge and learning to teach. In W. R. Houston (Ed.), Handbook of research on teacher education (pp. 291-310). New York: Macmillan.

Clark, C. M., & Peterson, P. L. (1986). Teachers' thought processes. In M. C. Wittrock (Ed.), Handbook of research on teaching (3rd ed., pp. 255-296). New York: Macmillan.

Cochran-Smith, M., & Lytle, S. L. (1999). Relationships of knowledge and practice: Teacher learning in communities. In A. Iran-Nejad & P. D. Pearson (Eds.), Review of research in education (Vol. 24, pp. 249-305). Washington DC: American Educational Research Association.

Darling-Hammond, L. (1999). Educating teachers for the next century: Rethinking practice and policy. In G. A. Griffin (Ed.), Ninety-eighth yearbook of the national society for the study of education (pp. 221-256). Chicago: University of Chicago Press.

Delandshere, G. (1994). The assessment of teachers in the United States. Assessment in Education, 1 (1), 95-113.

Delandshere, G. (1996). From static and prescribed to dynamic and principled assessment of teaching. The Elementary School Journal, 97 (2), 105-120.

Delandshere, G., & Petrosky, A. R. (1994). Capturing teachers' knowledge: Performance assessment and post-structuralism. Educational Researcher, 23 (5), 11-18.

Delandshere, G., & Petrosky, A. R. (1998). Assessment of complex performances: Limitations of key measurement assumptions. Educational Researcher, 27 (2), 14-24.

Diez, M. E., Richardson, V., & Pearson, P. D. (Eds.) (1994). Setting standards and educating teachers: A national conversation. Washington DC: American Association of Colleges for Teacher Education.

Doyle, W. (1986). Classroom organization and management. In M. C. Wittrock (Ed.), Handbook of research on teaching (3rd ed., pp. 392-431). New York: Macmillan.

Dwyer, C. A. (1994). Criteria for performance-based teacher assessment: Validity, standards and issues. Journal of Personnel Evaluation in Education, 8 (2), 135-150.

Dwyer, C. A., & Stufflebeam, D. (1996). Teacher evaluation. In D. C. Berliner & R. C. Calfee (Eds.), Handbook of educational psychology (pp. 765-786). New York: Macmillan.

Eraut, M. (1994). Developing professional knowledge and competence. London: Falmer Press.

Educational Testing Service (ETS). (1992). Guidelines for proper use of the Praxis series: Professional assessments for beginning teachers. Princeton, NJ: Author.

Feiman-Nemser, S., & Remillard, J. (1996). Perspectives on learning to teach. In F. B. Murray (Ed.), The teacher educator's handbook: Building a knowledge base for the preparation of teachers (pp. 63-91). San Francisco: Jossey Bass.

Fenstermacher, G. D. (1994). The knower and the known: The nature of knowledge in research on teaching. In L. Darling-Hammond (Ed.), Review of research in education (Vol. 20, pp. 3-56). Washington DC: American Educational Research Association.

Fessler, R. (1995). Dynamics of teacher career stages. In T. R. Guskey & M. Huberman (Eds.), Professional development in education (pp. 171-192). New York: Teachers College Press.

Gipps, C. V. (1994). Beyond testing: Towards a theory of educational assessment. London: Falmer Press.

Gonczi, A. (1994). Competency based assessment in the professions in Australia. Assessment in Education, 1 (1), 27-45.

Grossman, P. L. (1992). Why models matter: An alternate view on professional growth in teaching. Review of Educational Research, 62 (2), 171-179.

Haertel, E. H. (1990). Performance tests, simulations and other methods. In J. Millman & L. Darling-Hammond (Eds.), The new handbook of teacher evaluation: Assessing elementary and secondary schoolteachers (pp. 278-294). Newbury Park, CA: Sage.

Haertel, E. H. (1991). New forms of teacher assessment. In G. Grant (Ed.), Review of research in education (Vol. 17, pp. 3-29). Washington DC: American Educational Research Association.

Haertel, E. H. (1992). Issues of validity and reliability in assessment center exercises and portfolios (Report No. S-1). Teacher Assessment Project, School of Education, Stanford University.

Haney, W., Madaus, G., & Kreitzer, A. (1987). Charms talismanic: Testing teachers for the improvement of American education. In E. Z. Rothkopf (Ed.), Review of research in education (Vol. 14, pp. 169-238). Washington DC: American Educational Research Association.

Hoekstra, H. A. (1995). Management selectie via simulaties: De methodologie van het assessment center [Management selection by simulations: The assessment center approach]. In F. J. R. C. Dochy & T. R. de Rijke (Eds.), Assessment Centers: Nieuwe toepassingen in opleiding, onderwijs en HRM (pp. 53-72). Utrecht, Netherlands: Lemma.

Huberman, M. (1995). Professional careers and professional development. In T. R. Guskey & M. Huberman (Eds.), Professional development in education (pp. 193-224). New York: Teachers College Press.

Interstate New Teacher Assessment and Support Consortium (INTASC). (1992). Model standards for beginning teacher licensing and development: A resource for state dialogue. Washington, DC: Council of Chief State School Officers.

Jessup, G. (1991). Outcomes: NVQs and the emerging model of education and training. London: Falmer Press.

Kagan, D. M. (1990). Ways of evaluating teacher cognition: Inferences concerning the Goldilocks principle. Review of Educational Research, 60 (3), 419-469.

Kagan, D. M. (1992). Professional growth among preservice and beginning teachers. Review of Educational Research, 62 (2), 129-169.

Kennedy, M. (1997). The connection between research and practice. Educational Researcher, 26 (7), 4-12.

Kwakman, K. (1999). Leren van docenten tijdens de beroepsloopbaan [Teacher learning during their careers]. Unpublished doctoral dissertation, Katholieke Universiteit Nijmegen, Netherlands.

Leinhardt, G. (1990). Capturing craft knowledge in teaching. Educational Researcher, 19 (2), 18-25.

Leinhardt, G. (1993). On teaching. In R. Glaser (Ed.), Advances in instructional psychology (pp. 1-54). Hillsdale, NJ: Lawrence Erlbaum.

Linn, R. L. (1993). Educational assessment: Expanded expectations and challenges. Educational Evaluation and Policy Analysis, 15 (1), 1-16.

Linn, R. L., Baker, E. L., & Dunbar, S. B. (1991). Complex, performance-based assessment: Expectations and validation criteria. Educational Researcher, 20 (8), 15-21.

Lowyck, J. (1994). Teaching effectiveness: An overview of studies. Tijdschrift voor Onderwijsresearch, 19 (1), 17-25.

Meijer, P. C. (1999). Teachers' practical knowledge: Teaching reading comprehension in secondary education. Unpublished doctoral dissertation, Leiden University, Netherlands.

Messick, S. (1994). The interplay of evidence and consequences in the validation of performance assessments. Educational Researcher, 23 (2), 13-23.

Milanowski, A., Odden, A., & Youngs, P. (1998). Teacher knowledge and skills assessments and teacher compensation: An overview of measurement and linkage issues. Journal of Personnel Evaluation in Education, 12 (2), 83-101.

Moss, P. A. (1992). Shifting conceptions of validity in educational measurement: Implications for performance assessment. Review of Educational Research, 62 (3), 229-258.

Moss, P. A. (1994). Can there be validity without reliability? Educational Researcher, 23 (2), 5-12.

Moss, P. A. (1996). Enlarging the dialogue in educational measurement: Voices from interpretive research traditions. Educational Researcher, 25 (1), 20-28.

Moss, P. A., & Schutz, A. M. (1999). Risking frankness in educational assessment. Phi Delta Kappan, 80 (9), 680-687.

National Board for Professional Teaching Standards (NBPTS). (n.d.). What teachers should know and be able to do. Southfield, MI: Author.

National Council for Vocational Qualifications (NCVQ). (1991). Guide to national vocational qualifications. London: Author.

Putnam, R. T., & Borko, H. (1997). Teacher learning: Implications of the new view of cognition. In B. J. Biddle, T. L. Good, & I. F. Goodson (Eds.), The international handbook of teachers and teaching (pp. 1223-1296). Dordrecht, Netherlands: Kluwer.

Reynolds, A. (1992). What is competent beginning teaching? A review of the literature. Review of Educational Research, 62 (1), 1-35.

Reynolds, M. C. (1989). Knowledge base for the beginning teacher. Oxford: Pergamon.

Richardson, V. (1996). The role of attitudes and beliefs in learning to teach. In D. C. Berliner & R. C. Calfee (Eds.), Handbook of educational psychology (pp. 102-119). New York: Macmillan.

Roth, R. A. (1996). Standards for certification, licensure and accreditation. In J. Sikula (Ed.), Handbook of research on teacher education (2nd ed., pp. 242-278). New York: Macmillan.

Shulman, L. S. (1987a). Assessment for teaching: An initiative for the profession. Phi Delta Kappan, 69 (1), 38-44.

Shulman, L. S. (1987b). Knowledge and teaching: Foundations of the new reform. Harvard Educational Review, 57 (1), 1-22.

Sprinthall, N. A., Reiman, A. J., & Thies-Sprinthall, L. (1996). Teacher professional development. In J. Sikula (Ed.), Handbook of research on teacher education (2nd ed., pp. 666-703). New York: Macmillan.

Sternberg, R. J., & Horvath, J. A. (1995). A prototype view of expert teaching. Educational Researcher, 24 (6), 9-17.

Stiggins, R. J. (1987). Design and development of performance assessments. Educational Measurement: Issues and Practice, 6 (3), 33-41.

Stodolsky, S. S. (1990). Classroom observation. In J. Millman & L. Darling-Hammond (Eds.), The new handbook of teacher evaluation: Assessing elementary and secondary schoolteachers (pp. 175-190). Newbury Park, CA: Sage.

Straetmans, G. J. J. M. (1995). Assessment in onderwijs en opleiding met competentietoetsen [Assessment in education and training with competence tests]. In F. J. R. C. Dochy & T. R. de Rijke (Eds.), Assessment Centers: Nieuwe toepassingen in opleiding, onderwijs en HRM (pp. 215-238). Utrecht, Netherlands: Lemma.

Swanson, D. B., Norman, G. R., & Linn, R. L. (1995). Performance-based assessment: Lessons from the health professions. Educational Researcher, 24 (5), 5-11.

Tillema, H. H. (1993). Ontwerpen van authentieke assessments [Designing authentic assessments]. In H. H. Tillema (Ed.), Assessment en opleiden in organisaties. Opleiders in organisaties 14 (pp. 46-65). Deventer, Netherlands: Kluwer Bedrijfswetenschappen.

Tomlinson, P. (1995a). Can competence profiling work for effective teacher preparation? Part I: General issues. Oxford Review of Education, 21 (2), 179-194.

Tomlinson, P. (1995b). Can competence profiling work for effective teacher preparation? Part II: Pitfalls and principles. Oxford Review of Education, 21 (3), 299-314.

Van Driel, J. H., Verloop, N., & De Vos, W. (1998). Developing science teachers' pedagogical content knowledge. Journal of Research in Science Teaching, 35 (6), 673-695.

Verloop, N. (1992). Praktijkkennis van docenten: Een blinde vlek in de onderwijskunde [Teachers' practical knowledge: A blind spot in educational theory]. Pedagogische Studien, 69 (6), 410-423.

Wiggins, G. (1993). Assessment: Authenticity, context, and validity. Phi Delta Kappan, 75 (3), 200-214.

Wolf, A. (1995). Competence-based assessment. Buckingham, England: Open University Press.

Wolf, D., Bixby, J., Glenn III, J., & Gardner, H. (1991). To use their minds well: Investigating new forms of student assessment. In G. Grant (Ed.), Review of research in education (Vol. 17, pp. 31-74). Washington DC: American Educational Research Association.

ANNE MARIE UHLENBECK is a teacher educator at ICLON, Graduate School of Education, Leiden University, The Netherlands. She is working on her dissertation on the evaluation of beginning teachers of English as a foreign language.

NICO VERLOOP is Professor of Education and Director of ICLON, Graduate School of Education, Leiden University, The Netherlands. His major research interests are teachers' knowledge base, teachers' practical knowledge, learning and professional development of teachers, and the evaluation of teachers. He is the coauthor of "Professional development and reform in science education: The role of teachers' practical knowledge," Journal of Research in Science Teaching (2001).

DOUWE BEIJAARD is Associate Professor at ICLON, Graduate School of Education, Leiden University, The Netherlands. His major research interests are teachers' practical knowledge, learning and professional development of teachers, teacher evaluation and the use of portfolios in teacher education programs. He is the author of "Teachers' perceptions of professional identity: An exploratory study from a personal knowledge perspective," Teaching and Teacher Education (2000).

Cite This Article as: Teachers College Record Volume 104 Number 2, 2002, p. 242-272
https://www.tcrecord.org ID Number: 10828, Date Accessed: 11/29/2021 10:11:47 AM
