Teacher Effects in Early Grades: Evidence From a Randomized Study
by Spyros Konstantopoulos - 2011
Background/Context: One important question to educational research is whether teachers can influence student achievement over time. This question is related to the durability of teacher effects on student achievement in successive grades. The research evidence about teacher effects on student achievement has been somewhat mixed. Some education production function studies seem to suggest that the effects of observed teacher characteristics on student achievement are negligible, while others suggest that they are considerable (Greenwald, Hedges, & Laine, 1996; Hanushek, 1986). Other studies have consistently documented that teachers differ substantially in their effectiveness measured as between-classroom variation in achievement adjusted by student background (Hanushek, 1986; Nye et al., 2004; Rivkin et al., 2005). Thus far, there is no evidence about the persistence of teacher effects in early grades using high quality data from a randomized experiment.
Purpose/Objective/Research Question/Focus of Study: This study examines the enduring benefits of teacher effects on student achievement in early elementary grades using high quality experimental data from Project STAR. I am interested in determining the persistence of teacher effects in early grades and whether teacher effects remain strong predictors of student achievement or fade over a four-year period for kindergarten through third grade.
Research Design: I computed teacher effects as classroom-specific random effects and then I used them as predictors of student achievement in subsequent years. I also examined whether teacher effects persisted through third grade. Multilevel models were used to conduct the analysis. The results suggest that overall teacher effects in early grades are evident through third grade in reading and mathematics achievement.
Findings/Results: The findings support the idea that teachers do matter and significantly affect reading and mathematics achievement not only in the current or the following year, but in subsequent years as well. However, the results also show that teacher effects estimates in previous grades are smaller than estimates in later grades. The teacher effects are more pronounced in reading.
Conclusions: Students who receive effective teachers at the 85th percentile of the teacher effectiveness distribution in three consecutive grades kindergarten through second grade would experience achievement increases of about one-third of a SD in reading in third grade. These effects are considerable and comparable to achievement increases caused by cumulative effects of small classes in early grades. Such effects in education are important and are nearly one-third of a year’s growth in achievement (Hill, Bloom, Black, & Lipsey, 2008).
A fundamental objective of American education is to provide high-quality educational experiences that facilitate academic growth for all students. Much educational research has focused on identifying important school-related factors that affect student learning, while many school policy initiatives have attempted to ensure that valuable school resources are allocated adequately across schools. What lies at the center of this line of research is the notion that school resources do matter; that is, they can positively affect student achievement. One paramount factor that is widely believed by educational researchers to affect student achievement is teachers, and a fundamental goal of research on teacher effects is to examine how teachers improve academic achievement for all students. Currently, with the passage of the No Child Left Behind Act (NCLB), test scores are widely used to hold schools and teachers accountable for student learning. One way to increase student achievement is through teachers, and NCLB has mandated state plans to improve teacher effectiveness. The underlying belief is that highly effective teachers can make a difference in promoting student achievement. An important and timely task, then, is to examine the effects of teachers on academic achievement.
One crucial question for educational research is whether teachers differ noticeably in their effectiveness as educators and pedagogues in promoting student achievement. This question has been addressed by numerous studies in the teacher effects literature, and evidence from experimental and non-experimental studies has consistently indicated that teachers differ considerably in their effectiveness (Goldhaber & Brewer, 1997; Hanushek, 1971; Nye, et al., 2004; Rivkin, Hanushek, & Kain, 2005; Rowan et al., 2002). In such studies, teacher effectiveness is typically defined as differences or variation in achievement between classrooms adjusted by student background. However, this work has mainly discussed teacher effects in current grades in specific years, employing cross-sectional analyses of student samples.
An equally important question for educational research is whether teacher effectiveness can influence student achievement over time. This question is related to the durability of teacher effects on student achievement in successive grades. Pioneering work on teacher effects in the 1970s and 1980s has hypothesized that those students who receive highly effective teachers in consecutive years benefit more than do other students (Good & Brophy, 1987; Pedersen, Faucher, & Eaton, 1978). For example, Pedersen and colleagues reported positive long-term effects of a first grade teacher on the adult success of disadvantaged students. If teaching indeed has enduring, meaningful, and positive effects on student achievement, then identifying effective teachers as well as factors that cause teachers to be more effective is important for both educational research and reform. This would indicate that not only the current, but the previous teachers also matter. This is of critical importance for student learning, especially in early school grades where students acquire the very basic skills that lay the foundations for more advanced skills in later grades. If, on the other hand, teacher effectiveness has negligible long-term benefits on student achievement, then perhaps only the current teachers make a difference in student achievement in specific years (assuming that teacher effects in a specific year are not trivial).
The research evidence about teacher effects on student achievement has been somewhat mixed. Some education production function studies seem to suggest that the effects of observed teacher characteristics on student achievement are negligible, while others suggest that they are considerable (Greenwald et al., 1996; Hanushek, 1986). Other studies have consistently documented that teachers differ substantially in their effectiveness, as measured by between-classroom variation in achievement adjusted by student background (Hanushek, 1986; Rivkin et al., 2005). Recent evidence using data from a randomized experiment has also suggested that teacher effects, measured as residual between-classroom variation in achievement, are large and meaningful in early grades (Nye et al., 2004). In fact, the results of that study indicated that the magnitude of teacher effects was at least as large as the small class effects in Project STAR and was similar to estimates from non-experimental studies of teacher effectiveness. Nonetheless, it seems that only a small proportion of the variation in teacher effectiveness is explained by the typical observable teacher characteristics such as education and experience (Nye et al., 2004; Rivkin et al., 2005).
Thus far, there is no evidence about the persistence of teacher effects in early grades using high quality data from a randomized experiment. The present study examines the enduring benefits of teacher effects on student achievement in early elementary grades, using high quality experimental data from Project STAR (Krueger, 1999; Nye et al., 2000). Specifically, I am interested in determining the persistence of teacher effects in early grades and whether teacher effects remain strong predictors of student achievement or whether they fade over a four-year period (kindergarten through third grade).
PREVIOUS RESEARCH ON TEACHER EFFECTS
In general, two lines of research have discussed the effects of teachers on student achievement. The first tradition of research includes studies that measure the association between teacher characteristics and student achievement. Some of these studies are known as education production function studies and these endeavor to determine the relationship of specific measured teacher characteristics, such as teacher experience or teacher education, with student achievement. However, because parents choose neighborhoods in which to live, and hence their associated schools, according to tastes and resources (Tiebout, 1956), student and family backgrounds are confounded with naturally occurring teacher characteristics. Education production function studies (Coleman, et al., 1966) attempt to control for this confounding effect by using student and family background characteristics as covariates in regression models. A particularly important covariate is prior achievement, because it can be seen as summarizing the effects of individual background (including prior educational experiences) and family background up to that time. However, even this covariate may leave important characteristics of the student unmeasured. Some reviewers of the production function literature argue that measured teacher characteristics such as educational preparation, experience, or salary are only slightly related to student achievement (Hanushek, 1986). Other reviewers argue that there are positive effects of some of the resource characteristics, such as teacher experience and teacher education, on student achievement (Greenwald et al., 1996). More recently, researchers have demonstrated a positive association between teacher experience and student achievement (Clotfelter, et al., 2006). In addition, National Board certified teachers have been shown to be more effective than other teachers (Goldhaber & Anthony, 2007).
The first tradition of research also includes studies that examine the association between good teaching practice, or what teachers do in the classroom, and student achievement (Good & Brophy, 1987). Such studies are often called process-product studies because they identify classroom processes or observed teacher characteristics that are associated with student outcomes or products such as achievement. Some of these teacher characteristics include teacher confidence in teaching students successfully, efficient allocation of classroom time to instruction and academic tasks, effective classroom organization and group management, and active/engaging teaching that emphasizes understanding of concepts (Good & Brophy, 1987). Such teacher characteristics have been documented as positive correlates of student achievement (Good & Brophy, 1987). Teachers with higher evaluation scores in their teaching also had higher classroom achievement means and contributed to closing the achievement gap between lower and higher socioeconomic status (SES) students in some grades (Borman & Kimball, 2005). Improvements in teacher qualifications also seem to improve student achievement especially in poor schools (Boyd, Lankford, Loeb, Rockoff, & Wyckoff, 2008) and teacher content knowledge seems to benefit students (Kennedy, 2008).
The second tradition of research examines the variation between classrooms in achievement, while controlling for student background. These models typically also use prior achievement as a covariate, so they can be interpreted as measuring the variance in residualized student achievement gain across classrooms. That is, these variances represent the variation in achievement gain due to differences in teacher effectiveness. The underlying assumption is that the between-classroom variation in achievement is caused by variation in teacher effectiveness. Overall, the results of such studies have suggested that there is indeed variation in teacher effectiveness (Goldhaber and Brewer, 1997; Murnane and Phillips, 1981; Rowan et al., 2002). A recent study provided evidence about teacher effects, similarly defined as variation in average classroom achievement, using experimental data and documented large differences in average achievement among classrooms (Nye et al., 2004). However, studies within this second tradition of research cannot identify specific teacher characteristics that compose teacher effectiveness and instead define teacher effects generally as between-classroom variation in achievement. It is noteworthy that typically observed teacher characteristics such as teacher experience and education explain a small proportion of the variation in teacher effectiveness (Rivkin et al., 2005). Using Project STAR data, I have also found that teacher education and experience explained less than one percent of the variation in teacher effectiveness across all grades and test scores. It is important to recognize that failure to find some set of measured teacher characteristics that is related to student achievement does not mean that all teachers have the same effectiveness in promoting achievement.
LIMITATIONS OF PREVIOUS WORK
Students within schools are frequently placed into classrooms or assigned to teachers based on student characteristics such as achievement. In turn, teachers are also not randomly assigned to classrooms. This creates problems when inferring the relationship between characteristics of teachers and student achievement. For instance, suppose that teachers with more experience are assigned to classes composed of higher achieving students as a privilege of seniority, or to classes consisting of lower achieving students as a compensatory strategy of assigning human capital. In such cases, the causal direction in the relationship between teacher experience and student achievement is not that teacher experience causes achievement but rather the reverse. This ambiguity of causal direction is a major problem for production function studies of the effects of teacher characteristics on student achievement. In fact, a recent study by Clotfelter and colleagues (2006) reported that advantaged students are more likely to receive highly qualified teachers, and this then biases the association between teacher characteristics and achievement.
Valid interpretation of results requires that the covariates adequately control for preexisting differences, including unobservable differences that are related to achievement growth, among students assigned to different classrooms. Another requirement is that teachers are not assigned to classrooms on the basis of student characteristics that are known to the school but unavailable for use as covariates in the statistical analysis, because such assignment can exaggerate or attenuate differences between classrooms in achievement or achievement gains. For example, schools might assign a particularly effective teacher to students believed to be entering a difficult period as a compensatory resource allocation strategy. Alternatively, schools might assign a particularly effective teacher to students believed to have promise for unusually large achievement gains as a reward for accomplishment or a meritocratic resource allocation strategy. Schools have many sources of information suitable for identifying students poised for unusually large gains or losses. These include essentially everything known about the child beyond test scores and easily recorded factors such as SES, gender, and family structure. Some examples are an impending divorce, change of residence, delinquency problems, problems with siblings, unemployment of parents, or adjustment problems in school, all of which may signal potential difficulties in the next school year. Alternatively, improvements in student motivation, compliance, adjustment, or parental involvement may all signal unusually good prospects for the next school year. Even important covariates such as prior achievement do not completely control for unobserved student characteristics.
It is difficult to interpret the estimates of teacher effects on student achievement in both traditions of research mentioned previously, even after controlling for previous achievement and student background such as family SES, because the teacher effects may still be confounded with unobserved individual, family, school, and neighborhood factors. It is not clear that the observed covariates adequately control for all preexisting differences, including unobservable differences that are related to achievement, such as motivation, among students assigned to different classrooms. This suggests that the teacher effects variable may be endogenous and its estimate biased. Nonetheless, when modeling non-experimental data to estimate teacher effects, it is crucial to adjust for important student covariates such as family background and especially previous achievement.
The problems in interpretation of the two research traditions discussed above would be eliminated if both students and teachers were randomly assigned to classes. Random assignment of students would ensure that all observable and unobservable differences between students in different classes would be no larger than would be expected by chance. Random assignment of teachers to classes would ensure that any differences in teacher characteristics would be uncorrelated with classroom achievement, although this potential problem would also be substantially mitigated if randomization of students ensured that no large differences would exist in student achievement across classrooms. In this study, I used data from the Tennessee class size experiment or Project STAR that satisfies both conditions of randomization. Note that Project STAR was a field experiment designed to measure class size effects. However, the fact that students and teachers were randomly assigned to classroom types within schools in each grade provides a great opportunity to gauge teacher effects, since the potential confounding issues should, in principle, be eliminated.
Project STAR (Student Teacher Achievement Ratio) is a four-year large-scale experiment that took place in Tennessee in the 1980s. The experiment was commissioned in 1985 by the Tennessee state legislature and was implemented by a consortium of universities and the Department of Education in Tennessee. The experiment lasted for four years, from kindergarten to grade 3, and the total cost, including hiring teachers and teacher aides, was about $12 million. The state of Tennessee paid for hiring additional teachers and classroom aides. Project STAR is considered one of the greatest experiments in education in America.
All school districts in Tennessee were asked to participate in Project STAR, but only about 100 schools met the criteria for participation. That is, each school had to have a minimum of nearly 60 students in each grade to participate, the idea being that, in each grade, it was necessary to form one small and two regular size classes in order to carry out the experimental design. This school size requirement excluded very small schools. In the first year of the experiment, a cohort of more than 6,000 students in more than 300 classrooms in 79 elementary schools in 42 districts in Tennessee participated. The sample included a broad range of schools and districts (urban, rural, wealthy, and poor). Districts had to agree to participate for 4 years, allow school visits for verification of class sizes, interviewing, and data collection, and include extra student testing. They also had to allow research staff to assign pupils and teachers randomly to class types and to maintain the assignment of students to class types from kindergarten through third grade.
During the first year of the study, within each school, kindergarten students who were enrolled in participating schools were assigned randomly to classrooms in one of three types: smaller classes (13 to 17 students), larger classes (with 22 to 26 students), or larger classes (22 to 26 students) with a full-time classroom aide. Teachers were also assigned randomly to classes of different types. The assignments of students to classroom types were maintained through the third grade for students who remained in the study. Some students entered the study in the first grade and in subsequent grades and were assigned randomly to classes at that time. Teachers at each subsequent grade level were also assigned randomly to classes as the cohort passed through the grades. The students had different teachers in each grade with a small exception in the first and second grade. In particular, 33 students who participated in Project STAR in the first and the second grade had the same teachers in both grades.
Schools followed their own policies and curricula and did not receive additional funds or incentives to participate in the project (other than hiring teachers and classroom aides). Teachers also did not receive additional training, and there were neither incentives nor penalties for students to participate in Project STAR. That is, Project STAR was part of the everyday operation of the schools that participated. On average, each year, more than 6,000 students in approximately 330 classrooms in about 75 schools were part of the project. Over the 4 years, more than 11,500 students participated in Project STAR. Stanford Achievement Test (SAT) scales of the seventh edition were used to measure reading and mathematics achievement in Project STAR.
VALIDITY OF PROJECT STAR
The internal validity of the Project STAR data depends on whether random assignment effectively eliminated preexisting differences between students and teachers assigned to different types of classrooms. The fact that the randomization of students and teachers to classrooms was carried out by the consortium of researchers who conducted the experiment enhances its credibility. However, it is good practice to check whether there were any differences in preexisting observed characteristics of teachers or students. Unfortunately, no pretest scores were collected in Project STAR, so it is not possible to examine differences in pre-kindergarten achievement. However, one could check randomization using student variables such as age, race, and SES. Krueger (1999) examined the effectiveness of the randomization among the treatment groups (small classes, regular classes, and regular classes with a full-time aide) and found that, across three observed student variables (SES, minority group status, and age), there were no significant differences between classroom types once school differences were taken into account. Krueger also found that there were no significant differences across classroom types with respect to teacher characteristics such as race, experience, and education. Krueger concluded that it did not appear that random assignment was compromised with respect to these observed characteristics. This result, however, does not necessarily indicate that randomization was successful on all observed and unobserved characteristics. Other investigators have expressed some concerns about the randomization in Project STAR, especially for teacher characteristics, and have argued that the small class effect may be biased upwards (Hanushek, 1999).
Even if we assume that randomization across classroom types was successful, it is still possible that there might be differences between classrooms that were assigned to the same treatment group within schools. Because teacher effects here are defined and estimated using differences in average achievement between classrooms that receive the same treatment type within schools, it is critical to check whether randomization worked across classrooms within treatment types within schools. A recent study undertook this task and produced results that are consistent with what would be expected if randomization were successful. That is, no systematic differences were found for specific observed student characteristics between classrooms that were in the same treatment type within schools (Nye et al., 2004). However, evidence about differences across teachers within schools was not provided.
Attrition and Mobility
As in most large-scale longitudinal studies, attrition also took place in Project STAR. Specifically, some of the students who attended participating schools in 1 year did not remain in the Project STAR sample and dropped out of the experiment. Approximately 28 percent of the students who participated in Project STAR in kindergarten were not part of the study in the first grade. The attrition rate was slightly smaller, nearly 25 percent, for students who participated in the study in the first grade, but were not present in the second grade. Another 20 percent of the students dropped out of the study after the second grade and thus they were not present in the third grade. Only about 50 percent of the students who were part of the experiment in kindergarten were still part of Project STAR in the third grade.
The effects of differential attrition on the estimates of class size have been discussed in two recent studies (Krueger, 1999; Nye et al., 2000). It is common practice to examine differential attrition between types of classrooms on outcome measures such as achievement scores. For example, Krueger examined whether differential attrition among types of classrooms biased the estimates of class size. Differential attrition can bias class size effects if the students who dropped out of small classes were systematically different in achievement compared with those who dropped out of the regular type of classes (Krueger, 1999). In longitudinal designs such as Project STAR, one way to measure the effects of differential attrition is by imputing the scores of those students who dropped out of the study each year. Specifically, Krueger imputed dropouts' scores with their most recent observed scores. That is, if a student participated in the study and had a specific score in kindergarten, but was not part of the study in first grade, their kindergarten score was assigned as their first grade score. Krueger computed the class size estimates with and without imputation and, after comparing these estimates, he concluded that it seemed unlikely that differential attrition biased the class size estimates. The same conclusion was independently reached by Nye and colleagues (2000) using slightly different methods. Nye and colleagues examined differences in achievement for dropouts and stayers and observed no differential attrition that could bias the small class estimates.
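The imputation step described above amounts to carrying the last observed score forward. The following is a minimal Python sketch of that idea only; the grade labels, the function name carry_forward, and the scores are hypothetical illustrations, not Project STAR data or Krueger's actual code.

```python
from typing import Dict, Optional

# Grade order in the study: kindergarten through third grade.
GRADES = ["K", "1", "2", "3"]


def carry_forward(scores: Dict[str, Optional[float]]) -> Dict[str, Optional[float]]:
    """Fill each missing grade score with the most recent observed score.

    Grades before a student's first observed score stay missing (None),
    because there is no earlier score to carry forward.
    """
    filled: Dict[str, Optional[float]] = {}
    last_seen: Optional[float] = None
    for grade in GRADES:
        value = scores.get(grade)
        if value is not None:
            last_seen = value
        filled[grade] = last_seen
    return filled


# Hypothetical student: tested in kindergarten and first grade, then left the study.
student = {"K": 480.0, "1": 515.0, "2": None, "3": None}
imputed = carry_forward(student)

# Hypothetical late entrant: joined in first grade and left after it.
late_entrant = carry_forward({"K": None, "1": 500.0, "2": None, "3": None})
```

Estimates computed on the imputed scores can then be compared with estimates computed on the observed scores alone, which is the comparison the text attributes to Krueger.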
In the same vein, differential attrition can potentially affect the teacher effect estimates of the present study. Specifically, attrition could be a source of bias if the students who dropped out in one year, and who had received, for example, low effective teachers in the previous year, are systematically different in their achievement levels from students who dropped out and had received high effective teachers in the previous year. If students who dropped out from low effective teachers/classrooms have higher achievement than those who dropped out from high effective teachers/classrooms, then teacher effects may be overestimated because of this selection mechanism. Such differential attrition mechanisms will bias the teacher effect estimates.
To examine whether differential attrition might bias teacher effects, I followed a method similar to that used by Nye et al. (2000). I computed differences in achievement between students who dropped out or who stayed in the experiment and who received higher or lower effective teachers. For each grade (e.g., kindergarten, first, or second) I compared the achievement scores of students who dropped out of or stayed in the study the following grade (e.g., first, second, or third). I first created three categories of teacher effectiveness: a) high effective teachers (e.g., top quartile in the distribution of teacher effectiveness), b) medium effective teachers (e.g., middle 50 percent in the distribution of teacher effectiveness), and c) low effective teachers (e.g., bottom quartile in the distribution of teacher effectiveness). Second, for each year, I constructed three binary indicators to represent these categories, and a binary variable for students who dropped out or stayed in Project STAR. Third, I used a linear model and regressed mathematics or reading scores on each teacher effectiveness indicator, the dropout indicator, and the interaction between the teacher effectiveness and the dropout dummies. Specifically, for student i in school j in grade g the statistical model I used is:
$$y_{ijg} = \gamma_{00g} + \gamma_{10g}\,\mathrm{DROP}_{ijg} + \gamma_{20g}\,\mathrm{HIGHEFF}_{ijg} + \gamma_{30g}\,\mathrm{LOWEFF}_{ijg} + \gamma_{40g}\,\mathrm{DROPHIGH}_{ijg} + \gamma_{50g}\,\mathrm{DROPLOW}_{ijg} + e_{ijg} \qquad (1)$$
where y represents mathematics or reading achievement, g represents the grade (g = k, 1, 2), DROP is an indicator variable for dropout status, HIGHEFF and LOWEFF are indicator variables for having a high or low effective teacher, respectively, DROPHIGH and DROPLOW are interaction terms between dropout status and having a high or low effective teacher, and e is a random error term. The coefficients of interest in this equation are $\gamma_{40g}$ and $\gamma_{50g}$. The estimates of the interaction effects represent differences in achievement for dropouts and stayers who received high, medium, or low effective teachers the previous year. Nonsignificant interactions would suggest no evidence of differential attrition for each category of teacher effectiveness. I also examined whether the linear association between teacher effectiveness and achievement differed for dropouts and stayers.
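To make the interpretation of the interaction terms concrete, here is a small Python sketch. It assumes that medium effective teachers are the reference category and that the model contains only the dummies described above; under those assumptions the model is saturated, so each interaction coefficient equals a difference between dropout-versus-stayer gaps in cell means. The function name and the scores are hypothetical, and the sketch gives point estimates only (no standard errors or p-values).

```python
from statistics import mean


def interaction_estimates(records):
    """Return (drop_high, drop_low) from (score, dropped, category) records.

    category is 'high', 'medium', or 'low' teacher effectiveness.
    Each estimate is the dropout-minus-stayer gap in mean achievement for
    that category minus the same gap in the reference ('medium') category.
    """
    cells = {}
    for score, dropped, category in records:
        cells.setdefault((category, dropped), []).append(score)
    gap = {
        c: mean(cells[(c, True)]) - mean(cells[(c, False)])
        for c in ("high", "medium", "low")
    }
    return gap["high"] - gap["medium"], gap["low"] - gap["medium"]


# Made-up scores, for illustration only.
records = [
    (520.0, False, "high"), (530.0, False, "high"),
    (510.0, True, "high"), (520.0, True, "high"),
    (500.0, False, "medium"), (510.0, False, "medium"),
    (490.0, True, "medium"), (500.0, True, "medium"),
    (480.0, False, "low"), (490.0, False, "low"),
    (490.0, True, "low"), (500.0, True, "low"),
]
drop_high, drop_low = interaction_estimates(records)
```

In this made-up example the dropout gap for high effective teachers matches the reference gap (an interaction of zero), while dropouts from low effective teachers score higher than stayers relative to the reference gap, which is the pattern that would point to differential attrition.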
Table 1 summarizes the p-values of the tests of the interaction effects for reading and mathematics for kindergarten, first, and second grade. Overall, I examined 24 interactions. Only one of the 24 p-values was smaller than 0.05. That single significant interaction indicated that students who received low effective teachers in the first grade and dropped out in the second grade had significantly higher reading achievement in first grade than those who stayed in the study in the second grade. All other interactions were not significant. One significant result out of 24 tests is about 4 percent of the tests, slightly less than the 5 percent that would be expected by chance alone at the 0.05 level, and thus I argue that it could have occurred by chance. Overall, these results do not suggest systematic evidence of differential attrition and hence, attrition probably did not bias the teacher effects substantially. It is not impossible, however, that differential attrition may have created differences among students with respect to unobserved characteristics.
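The chance argument can be checked with a few lines of arithmetic. The sketch below assumes, purely for illustration, that the 24 tests are independent. They are not strictly independent here, since the same students contribute reading and mathematics scores, so the last figure is only a rough guide.

```python
n_tests = 24   # interactions examined, as reported in the text
alpha = 0.05   # significance level used for each test

# Share of tests that came out significant: 1 of 24.
observed_share = 1 / n_tests

# Number of significant results expected by chance if every null hypothesis is true.
expected_false_positives = n_tests * alpha

# Chance of at least one significant result, ASSUMING independent tests.
p_at_least_one = 1 - (1 - alpha) ** n_tests

print(f"observed share: {observed_share:.3f}")
print(f"expected false positives: {expected_false_positives:.1f}")
print(f"P(at least one | independence): {p_at_least_one:.3f}")
```

Under these assumptions, a single significant interaction among 24 is in line with what chance alone would produce.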
In addition, a small percentage of students who remained in the sample switched schools each year. For example, about four percent of the students who were part of Project STAR in kindergarten and first grade switched schools in first grade. Approximately two percent of the students who were part of Project STAR in the first and second grades, or in the second and third grades, switched schools in the second and third grade, respectively. It is possible that student mobility affected the teacher effect estimates. To examine whether student mobility affected these estimates, I ran sensitivity analyses for all models described in the following sections (see equations 3 and 4) on the sample of students who did not switch schools. The results from these analyses produced estimates of teacher effects that were very similar in magnitude and had the same signs as those reported in Tables 4 to 6. Hence, it seems unlikely that student switching across schools would have meaningfully affected the estimates.
DEFINING TEACHER EFFECTS
The main objective of the study is to examine whether teacher effects in 1 year (e.g., kindergarten) predict student achievement in subsequent years (e.g., first grade), net of the teacher effects in the current grade. The first step in this process involves the computation of teacher effects within each grade. This analysis makes use of the SAT reading and mathematics test scores collected as part of Project STAR. SAT is a widely used test that measures academic achievement of elementary and secondary school students that is designed to measure, among other things, word reading, reading comprehension, and mathematics computation and application. Overall, the internal consistency of the test is considered excellent. Because of the random assignment of students and teachers to classrooms within schools, the classrooms within each school are initially equivalent, and hence, any systematic differences in achievement among classes must be due to one of two sources: the class size effect or differences in teacher effectiveness. Thus, within each school, any systematic differences in achievement between classrooms that had the same treatment must be due to differences in teacher effectiveness (Nye et al., 2004). In other words, in this randomized experiment, the classroom mean residual adjusted for the treatment effect should be a valid estimate of the teacher effect (Raudenbush, 2004). In this case, it is reasonable to consider teacher effects as causal effects.
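As a rough descriptive illustration of a classroom mean residual adjusted for treatment, the Python sketch below subtracts from each classroom's mean score the average of classroom means for the same school and class type. This is a simplified stand-in and not the multilevel model used in the study: it applies no shrinkage, ignores student covariates, and returns zero for any classroom that is the only one of its type in its school. All names and scores are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def classroom_effects(rows):
    """rows: (school, class_type, classroom, score) tuples.

    Returns {classroom: classroom mean minus the mean of classroom means
    for the same school and class type}.
    """
    scores_by_class = defaultdict(list)
    group_of = {}
    for school, class_type, classroom, score in rows:
        scores_by_class[classroom].append(score)
        group_of[classroom] = (school, class_type)

    class_mean = {c: mean(s) for c, s in scores_by_class.items()}

    means_by_group = defaultdict(list)
    for classroom, group in group_of.items():
        means_by_group[group].append(class_mean[classroom])
    group_mean = {g: mean(m) for g, m in means_by_group.items()}

    return {c: class_mean[c] - group_mean[group_of[c]] for c in class_mean}


# Made-up data: one school with two small classes and one regular class.
rows = [
    ("A", "small", "a1", 500.0), ("A", "small", "a1", 510.0),
    ("A", "small", "a2", 520.0), ("A", "small", "a2", 530.0),
    ("A", "regular", "a3", 495.0), ("A", "regular", "a3", 505.0),
]
effects = classroom_effects(rows)
```

Because the comparison is made only among classrooms with the same treatment in the same school, class size differences and between-school differences do not enter these deviations.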
Following previous work by Nye et al. (2004), I operationalize the teacher effects as classroom-specific residuals that are adjusted for class size and student effects such as race, SES, or previous achievement. It is crucial to adjust for class size effects because it is likely that class size plays a role in achievement differences between classrooms. I compute teacher effects as classroom-specific random effects or residuals employing a three-level HLM (Raudenbush & Bryk, 2002). The first level is a between-student model within classrooms and schools, the second level is a between-classroom, within-school model, and the third level is a between-school model. To compute the teacher effects, I used the same specification for mathematics and reading achievement for each grade (kindergarten, first, or second grade) separately. Hence, for each grade g, the one-level regression equation for student i, in class j, in school k is
$$Y_{ijkg} = \gamma_{000g} + \gamma_{010g}\,\mathrm{SMALL}_{jkg} + \gamma_{020g}\,\mathrm{AIDE}_{jkg} + \gamma_{100g}\,\mathrm{FEMALE}_{ijkg} + \gamma_{200g}\,\mathrm{LOWSES}_{ijkg} + \gamma_{300g}\,\mathrm{MINORITY}_{ijkg} + \varepsilon_{ijkg} + \xi_{0jkg} + \eta_{00kg},$$

where g = K, 1, 2, $Y_{ijkg}$ represents student achievement in mathematics or reading in grade g, $\gamma_{000g}$ is the average achievement across students, classrooms, and schools in grade g, $\gamma_{010g}$ represents the overall small class effect in grade g, SMALL is a dummy variable for being in a small class in grade g, $\gamma_{020g}$ represents the overall effect of being in a regular class with a full-time aide in grade g, AIDE is a dummy variable for being in a regular class with a full-time aide in grade g, $\gamma_{100g}$ is the overall gender effect in grade g, FEMALE is a dummy variable for gender in grade g, $\gamma_{200g}$ is the overall low SES effect in grade g, LOWSES is a dummy variable for free or reduced price lunch eligibility in grade g, $\gamma_{300g}$ is the overall minority effect in grade g, MINORITY is a dummy variable for minority group membership indicating that the student was Black, Hispanic, or Asian, $\varepsilon_{ijkg}$ is a student-specific random effect or residual in grade g, $\xi_{0jkg}$ is a classroom-specific random effect or residual in grade g, and $\eta_{00kg}$ is a school-specific random effect or residual in grade g. Notice that, for simplicity, only the classroom-specific and school-specific intercepts are treated as random at the second and third level, respectively. In this model, the variance of the error term is divided into three parts: the within-classroom, the between-classroom within-school, and the between-school variance. The classroom-specific random effects, the $\xi$s, represent the teacher effects adjusted for student gender, SES, minority group status, and class size effects.
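The three-part split of the residual variance described above can be written compactly. The variance symbols $\sigma^2_g$, $\tau_{\xi g}$, and $\tau_{\eta g}$ are notation assumed for this sketch rather than taken from the source, and the student, classroom, and school residuals are assumed to be mutually independent, as is usual in a random-intercept model.

```latex
\operatorname{Var}\left(Y_{ijkg} \mid \text{covariates}\right)
  = \underbrace{\sigma^2_g}_{\text{within classroom}}
  + \underbrace{\tau_{\xi g}}_{\text{between classrooms, within school}}
  + \underbrace{\tau_{\eta g}}_{\text{between schools}}
```

Under this notation, the ratio $\tau_{\xi g} / (\sigma^2_g + \tau_{\xi g} + \tau_{\eta g})$ would be the share of residual variance in grade g that lies between classrooms within schools, that is, the share attributable to teachers.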
Because the teacher-specific residuals are computed separately from the school level residuals, the differences or variation in achievement among teachers/classrooms within types of classrooms and within schools are net of the differences in achievement among schools. That is, the variance of the second level residuals indicates variance in classroom achievement within treatment types and within schools that is adjusted for school effects expressed as variability in achievement among schools in the third level residuals. Empirically, using Project STAR data, the variability in achievement among schools in the third level residuals is almost equivalent to the variability explained due to school fixed effects via dummy indicators in a typical regression model.
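The logic of a classroom residual that is net of both the treatment and the school can be illustrated with a deliberately simplified sketch. It is not the three-level HLM used in the study: it ignores student covariates, applies no shrinkage to the classroom means, and removes treatment and school differences by simple mean subtraction. The function names and the toy scores are hypothetical.

```python
from collections import defaultdict
from statistics import mean

def group_means(keys, values):
    """Mean of `values` within each distinct key."""
    buckets = defaultdict(list)
    for k, v in zip(keys, values):
        buckets[k].append(v)
    return {k: mean(v) for k, v in buckets.items()}

def teacher_effects(records):
    """records: list of (school, class_type, classroom, score) tuples."""
    schools = [r[0] for r in records]
    types = [r[1] for r in records]
    rooms = [r[2] for r in records]
    scores = [r[3] for r in records]

    # Step 1: remove the class-type (treatment) mean from every score.
    type_mean = group_means(types, scores)
    adj = [s - type_mean[t] for s, t in zip(scores, types)]

    # Step 2: remove the school mean of the treatment-adjusted scores.
    school_mean = group_means(schools, adj)
    adj = [a - school_mean[k] for a, k in zip(adj, schools)]

    # Step 3: the classroom mean of what remains is the raw classroom residual.
    return group_means(rooms, adj)

# Hypothetical toy data: two schools, one small and two regular classes each.
records = [
    ("S1", "small", "S1-a", 52), ("S1", "small", "S1-a", 54),
    ("S1", "regular", "S1-b", 48), ("S1", "regular", "S1-b", 50),
    ("S1", "regular", "S1-c", 44), ("S1", "regular", "S1-c", 46),
    ("S2", "small", "S2-a", 62), ("S2", "small", "S2-a", 64),
    ("S2", "regular", "S2-b", 56), ("S2", "regular", "S2-b", 58),
    ("S2", "regular", "S2-c", 58), ("S2", "regular", "S2-c", 60),
]
effects = teacher_effects(records)
for room in sorted(effects):
    print(room, round(effects[room], 2))
```

In this toy example the two regular classes in school S1 differ by 4 points even though they share a school and a treatment, and that gap is what the sketch attributes to the teachers. A multilevel model would additionally shrink each classroom residual toward zero according to its reliability.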
ASSESSING THE PREDICTIVE EFFICACY OF TEACHER EFFECTS
Once the teacher effects were computed for kindergarten, first, and second grade, they were used as predictors of student achievement in subsequent years. In this analysis, teacher effectiveness is used as a predictor of future achievement at the student level, and its estimate indicates whether the effectiveness of the teacher who taught a student in one year affected that student's achievement in the following years, net of the effects of the current grade teachers. That is, this analysis uses samples of students who were part of Project STAR for two or more grades.
In the first part of this analysis, I examined whether teacher effects in one year predicted student achievement the following year. For example, I examined whether the teacher effects computed in kindergarten were a significant predictor of student achievement in the first grade, net of first grade teacher effects. It is common practice also to include student covariates to model achievement in school effects research (Bryk & Raudenbush, 1988; Konstantopoulos, 2006; Lee, 2000). Hence, at the student level, I included the typical student demographic characteristics such as SES, race/ethnicity, gender, and previous achievement. These covariates help to estimate the teacher effects more precisely (e.g., with smaller standard errors). To simplify interpretations, I standardized the outcomes and the predictors included in the models to have a mean of zero and a standard deviation of one in each grade. Because all variables are standardized, all estimates are standardized regression coefficients. To determine the predictive efficacy of the teacher effects, I also employed a three-level HLM. I used the same specification for mathematics and reading achievement in each grade (first, second, and third). Hence, for each grade g, the one-level regression equation for student i, in class j, in school k is:
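$$Y_{ijkg} = \gamma_{000g} + \gamma_{010g}\,\mathrm{SMALL}_{jkg} + \gamma_{020g}\,\mathrm{AIDE}_{jkg} + \gamma_{100g}\,\mathrm{FEMALE}_{ijkg} + \gamma_{200g}\,\mathrm{LOWSES}_{ijkg} + \gamma_{300g}\,\mathrm{MINORITY}_{ijkg} + \gamma_{400g}\,\mathrm{TEACHEREFFECT}_{ijk(g-1)} + \varepsilon_{ijkg} + \xi_{0jkg} + \eta_{00kg}, \qquad (3)$$

where g = 1, 2, 3, $Y_{ijkg}$ represents student achievement in mathematics or reading in first, second, or third grade, $\gamma_{400g}$ represents the overall effect of the previous year's teacher on student achievement in grade g, TEACHEREFFECT is a continuous variable that represents the teacher effects in the previous year (e.g., kindergarten, first, or second grade), and all other terms have been defined previously. The main objective of this analysis was to calculate the estimate of $\gamma_{400g}$, which indicates whether teacher effects predict student achievement in the following year.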
In this analysis, it is important to compute the association between teacher effects in one year and student achievement in the following year, while controlling for the effects of the current teachers. In HLM, this can be achieved by centering the level-1 predictors at their classroom means (group mean centering). Group mean centering adjusts for teacher effects in the current year, and it is equivalent to using teacher fixed effects in regression (see Raudenbush & Willms, 1995).
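Why centering at classroom means removes current-teacher differences can be seen in a small sketch. It uses a plain within-classroom slope rather than an HLM, and the function names and toy values are hypothetical: any constant added to every student in a classroom, which is how a current-teacher effect would act on scores, drops out of the slope.

```python
from collections import defaultdict
from statistics import mean

def center_within(groups, values):
    """Subtract each group's mean from the values observed in that group."""
    by_group = defaultdict(list)
    for g, v in zip(groups, values):
        by_group[g].append(v)
    group_mean = {g: mean(v) for g, v in by_group.items()}
    return [v - group_mean[g] for g, v in zip(groups, values)]

def within_slope(groups, x, y):
    """Slope of y on x using only within-group (classroom) variation."""
    xc = center_within(groups, x)
    yc = center_within(groups, y)
    return sum(a * b for a, b in zip(xc, yc)) / sum(a * a for a in xc)

# Hypothetical toy data: current classroom, prior-year teacher effect, score.
classroom = ["c1", "c1", "c1", "c2", "c2", "c2"]
prior_effect = [-1.0, 0.0, 1.0, -0.5, 0.5, 1.5]
score = [0.1, 0.3, 0.2, 0.0, 0.4, 0.5]

slope = within_slope(classroom, prior_effect, score)

# Shift every score in a classroom by a constant, mimicking a current-teacher
# effect; the within-classroom slope should not move.
shift = {"c1": 2.0, "c2": -3.0}
shifted = [s + shift[c] for c, s in zip(classroom, score)]
slope_shifted = within_slope(classroom, prior_effect, shifted)

print(round(slope, 3), round(slope_shifted, 3))
```

Both slopes come out the same, which is the sense in which group mean centering behaves like a set of current-teacher fixed effects.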
In the second part of this analysis, I examined whether teacher effects persisted in early grades. Specifically, teacher effects in kindergarten and first grade were used simultaneously in the regression equation to predict student achievement in second grade. In the same vein, teacher effects in kindergarten, first, and second grade were used simultaneously in the regression equation to predict student achievement in third grade. The objective was to investigate whether teacher effects in earlier grades such as kindergarten influence third grade achievement in the presence of teacher effects in first, second, and third grade. Below, I portray the model for third grade achievement. The model for second grade achievement is similar, except that the teacher effects in grade 2 are not included in the equation. Hence, for grade 3, for student i, in classroom j, in school k, the one-level regression model is:
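$$Y_{ijk3} = \gamma_{0003} + \gamma_{0103}\,\mathrm{SMALL}_{jk3} + \gamma_{0203}\,\mathrm{AIDE}_{jk3} + \gamma_{1003}\,\mathrm{FEMALE}_{ijk3} + \gamma_{2003}\,\mathrm{LOWSES}_{ijk3} + \gamma_{3003}\,\mathrm{MINORITY}_{ijk3} + \gamma_{4003}\,\mathrm{TEACHEREFFECT}_{ijkK} + \gamma_{5003}\,\mathrm{TEACHEREFFECT}_{ijk1} + \gamma_{6003}\,\mathrm{TEACHEREFFECT}_{ijk2} + \varepsilon_{ijk3} + \xi_{0jk3} + \eta_{00k3}, \qquad (4)$$

where the final subscripts K, 1, and 2 on TEACHEREFFECT indicate teacher effects in kindergarten, grade 1, and grade 2, respectively, $\gamma_{4003}$, $\gamma_{5003}$, and $\gamma_{6003}$ represent estimates of teacher effects in kindergarten, grade 1, and grade 2, respectively, and all other terms have been defined previously. The teacher effects in equation 4 are conditioned on all other teacher effects in the model. In these analyses, level-1 predictors were also group mean centered on their classroom means to adjust for current teacher effects.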
DESCRIPTIVE STATISTICS OF THE PROJECT STAR SAMPLE
Table 2 reports descriptive statistics for the variables of interest included in the analysis. Nearly 50 percent of the students in the sample were female, and nearly 50 percent were low SES students. Approximately 33 percent of the students were minorities, and about 30 percent of the students were in small classes.
Descriptive statistics of the teacher effects computed from kindergarten through second grade are summarized in Table 3. The means of teacher effects are zero, which is expected since they are residuals. In this case, the distribution of teacher effects is well defined by its variability. In the entire sample, the teacher effects had larger variability in mathematics than in reading, especially in the first and second grade, which indicates larger differences in classroom achievement in mathematics than in reading.
ASSESSING THE PREDICTIVE EFFICACY OF TEACHER EFFECTS
First, I present the results that describe the association between teacher effects in one year and student achievement in the following year for mathematics and reading achievement. These results are reported in Tables 4 and 5, respectively. All estimates in Table 4 are standardized regression coefficients; thus, the results for the first grade indicated that an increase of one standard deviation (SD) in teacher effectiveness in kindergarten corresponds to an increase of 0.070 SD in mathematics achievement. This association was positive and significant and suggested that the teachers that students have in kindergarten affect their mathematics achievement in first grade, net of gender, race, SES, class size, and teacher effects in first grade. The results for the second and third grades are similar, with positive and significant coefficients of larger magnitude at 0.08 and 0.11 SD, respectively. The gender differences in mathematics were small and not statistically significant, whilst minority and low SES students had significantly lower mathematics achievement than their white and higher SES peers. Small class effects were positive and significant.
The associations between teacher effects in one year and student reading achievement in the following year are summarized in Table 5. The standardized regression coefficients of teacher effects modeling first grade achievement indicated that an increase of one standard deviation in teacher effectiveness in kindergarten corresponds to an increase of nearly 1/10 of a standard deviation in first grade reading achievement. This association was positive and significant and suggests that the teachers who taught students in kindergarten affected their reading achievement in first grade, net of student background and current grade teacher effects. The results for the second and third grade were similar, with positive and significant coefficients of larger magnitude at 0.14 and 0.13 SD, respectively. The gender differences in reading were positive and significant in favor of female students, while minority and low SES students had significantly lower reading achievement than their white and higher SES peers, respectively. Small class effects were positive and significant.
Overall, these results showed that teacher effects in one year were positive and significant predictors of student achievement in the following year. On average, in reading, one standard deviation increase in teacher effects resulted in nearly one-tenth of a SD increase in student achievement, which does not seem to be a trivial effect. In mathematics, however, the teacher effects were somewhat smaller than were those in reading.
Second, I present the results that show the conditional teacher effects over time. These results indicated whether teacher effects in kindergarten, first, and second grade predicted third grade achievement when teacher effects in third grade were taken into account. In the first model, teacher effects in kindergarten and first grade were used simultaneously in the equation to predict student achievement in grade 2. Likewise, teacher effects in kindergarten, first, and second grade were used simultaneously in the equation to predict student achievement in grade 3. The results of these analyses are reported in Table 6. Although class size, gender, race, SES, and current teacher effects were taken into account in this analysis, Table 6 reports only estimates of teacher effects for simplicity. Again, all estimates are standardized regression coefficients. The results for grade 2 mathematics achievement indicated that teacher effects in kindergarten and first grade have independent, positive, and significant effects on student mathematics achievement. The results for grade 2 reading achievement were comparable, but the teacher effects estimates were much larger and, on average, about one-tenth of a SD. The results for grade 3 mathematics achievement indicated that teacher effects in kindergarten and second grade have independent, positive, and significant effects on student mathematics achievement. However, teacher effects in first grade were not a significant predictor of grade 3 mathematics achievement in the presence of the other covariates. The results for grade 3 reading achievement were similar, larger in magnitude, and suggested that teacher effects in kindergarten, first, and second grade are significant and positive predictors of third grade reading achievement. In reading, the teacher effects estimates were, on average, greater than one-tenth of a SD.
It is noteworthy that, in reading achievement, the estimates of teacher effects were consistently larger than were those in mathematics. These results indicated that teacher effects in different grades predicted third grade achievement for both reading and mathematics. However, the teacher effects estimates in later years were typically stronger than those in earlier years.
In this study, I investigated teacher effects in early grades using high-quality data from Project STAR, in which teachers and students were randomly assigned to classrooms within schools. The results suggest that overall teacher effects in early grades are evident through third grade in reading and mathematics achievement. Because of random assignment of teachers and students to classrooms in this experiment, these results should provide strong evidence about the durability of teacher effects. The findings support the idea that teachers do matter and that they significantly affect mathematics and reading achievement not only in the current or the following year, but in subsequent years as well. However, the results also show that teacher effect estimates in previous grades are smaller than estimates in later grades.
The teacher effects are more pronounced in reading. In particular, students who are taught by effective teachers at the 85th percentile of the teacher effectiveness distribution in three consecutive grades (e.g., kindergarten, first, and second grade) would experience achievement increases of about one-third of a SD in reading. These effects are considerable and comparable to achievement increases resulting from cumulative small classes in early grades. Such effects in education are important and are nearly one-third of a year's growth in achievement (Hill et al., 2008). In addition, as Krueger (2003) argues, the minimum cost-effective gain from class size reduction of the magnitude undertaken in Project STAR would be one-tenth of a standard deviation; many of the teacher effects estimates in the present study are of that magnitude in reading. The teacher effects are slightly smaller in mathematics (an additive effect of about one-fourth of a SD). The teacher effects estimates in this study are typically larger than gender and race effects, which are typically not trivial (Hedges & Nowell, 1995, 1999; Konstantopoulos, 2009), and in certain cases nearly one half as large as the SES effects, which are typically substantial.
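The arithmetic behind a cumulative effect of about one-third of a SD can be sketched as follows. The sketch assumes that teacher effects are roughly normally distributed, so that the 85th percentile sits about one SD above the mean, and it uses placeholder coefficients of one-tenth of a SD per grade to match the rough magnitudes described in the text. These placeholders are not the Table 6 estimates.

```python
from statistics import NormalDist

# z-score of a teacher at the 85th percentile, assuming normally distributed
# teacher effects (an assumption of this sketch).
z85 = NormalDist().inv_cdf(0.85)

# Placeholder standardized coefficients, about one-tenth of a SD each.
# These are illustrative values, NOT the estimates reported in Table 6.
coefs = {"kindergarten": 0.10, "first": 0.10, "second": 0.10}

# Additive gain from having an 85th-percentile teacher in all three grades.
cumulative_gain = z85 * sum(coefs.values())

print(round(z85, 2), round(cumulative_gain, 2))
```

With these placeholder values the additive gain is a little under one-third of a SD; slightly larger coefficients, as described for reading, would bring it to about one-third.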
It is noteworthy that the enduring benefits of teacher effects seem consistently larger in reading than in mathematics. The teacher effects estimates in reading were, in some cases, 25 to 50 percent larger than those in mathematics. This is an interesting finding, given that the students in the same classroom are taught mathematics and reading by the same teacher. Student selection is also unlikely, since virtually the same samples of students took the SAT mathematics and reading tests. One explanation is that this finding is consistent with the notion that teachers typically put more emphasis on reading than on mathematics in early grades and that the pedagogy of reading is heavily infused in early grades. Familiarity with the basic mechanisms of reading, vocabulary growth, and systematic practice in reading take place in early grades. In addition, basic reading skills such as coding are developed in early grades and lay the foundation for later, more advanced reading skills such as comprehension. A related point is that teachers who teach in early grades may be better prepared to teach reading than mathematics, and schools may stress the importance of focusing more on reading in early grades. Another explanation is that curricula and teaching are more closely connected across early grades in reading than in mathematics. It is also possible that teaching practices and coverage of content are more likely to be captured by the reading than the mathematics section of SAT. In any case, the findings suggest that not much value is added in mathematics in early grades by teachers. Unfortunately, classroom observations or teacher logs were not available in Project STAR; therefore, it is impossible to know the actual teaching practices that took place in each classroom.
I also conducted analyses to examine whether teacher effects estimates were consistent across classrooms and schools. That is, I treated teacher effects estimates as random effects at the second and at the third level. I replicated all analyses portrayed in equations 3 and 4 for mathematics and reading scores. The results consistently indicated that the estimates of teacher effects did not vary much between schools, but varied considerably among classrooms within schools. This suggests that the estimates of teacher effects are similar across schools, whilst the magnitude of the estimates of teacher effects depends on the classrooms attended by the students in the following years. That is, the effects are more pronounced in some classrooms and less pronounced in other classrooms.
Sensitivity analyses were also conducted. First, I constructed different coding schemes for teacher effects in order to examine whether they have different effects on student achievement. The teacher effects in equations 3 and 4 were assumed to have a linear association with student outcomes, but it is possible that teacher effects are nonlinear. As a result, I reran some of the models defined earlier that coded teacher effects as nonlinear (e.g., top quartile or top half of the teacher effectiveness distribution). Overall, the estimates of these sensitivity analyses were similar to the estimates reported in Tables 4 to 6. Second, because of the possibility of nonrandom switching among types of classrooms in the first, second, and third grades, it was also crucial to examine how switching affected the teacher effects. Krueger (1999) and Nye et al. (2000) examined whether switching biased the estimates of class size effects and concluded that it seemed unlikely. A typical way of doing such analysis is to use the intention to treat (ITT) assignment as the main independent variable, not the actual assignment as it was received because the latter may be biased. In particular, the ITT is unbiased by design and does not incorporate any possible compromises that may have occurred during the experiment (Freedman, 2006). As a result, I reran all of the analyses using the ITT variable in the equations. The results of this analysis indicated that the estimates of the teacher effects in all models had the same sign, were essentially identical to those reported in Tables 4, 5, and 6, and differed only in the second or third decimal place. In addition, statistical significance was not affected. Third, it is also possible that teacher effects may be affected by differences in the actual class size. By design, different types of classrooms had different numbers of students in Project STAR. The size of the class may affect teacher instruction and classroom practices. 
Hence, one needs to control for actual class size when computing teacher effects. This approach, however, has the disadvantage that, although target class size was assigned randomly, actual class size may be a result of nonrandom unobserved factors and hence, its estimate may be biased. Indeed, in Project STAR, there was more variation than intended within small and regular classes; that is, the actual class size ranged from 11 to 20 for small classes and from 15 to 29 for regular classes. To overcome this problem, I used the treatment assignment as an instrument for actual class size (Angrist, Imbens, & Rubin, 1996) and replicated all analyses using the new class size variable. Again, the estimates of the teacher effects were qualitatively (in terms of statistical significance) similar to those in Tables 4, 5, and 6, which suggests that actual class size did not seem to influence teacher effects. Overall, these results support the notion that teacher effects estimates are robust.
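The idea of using random assignment as an instrument for actual class size can be illustrated with a minimal single-instrument (Wald) estimator on simulated data. This is a sketch under stated assumptions, not the study's estimation procedure: the data-generating values, the variable names, and the unobserved factor `u` that drives both class size and scores are all hypothetical.

```python
import random
from statistics import fmean

def cov(a, b):
    """Sample covariance of two equal-length sequences."""
    ma, mb = fmean(a), fmean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)

def ols_slope(x, y):
    """Ordinary least squares slope of y on x."""
    return cov(x, y) / cov(x, x)

def iv_slope(instrument, treatment, outcome):
    """Single-instrument IV (Wald) estimate: cov(z, y) / cov(z, x)."""
    return cov(instrument, outcome) / cov(instrument, treatment)

rng = random.Random(0)
TRUE_EFFECT = -0.02  # hypothetical effect of one extra student, in SD units

assigned, size, score = [], [], []
for _ in range(20000):
    small = 1.0 if rng.random() < 0.3 else 0.0  # randomized assignment
    u = rng.gauss(0, 1)                         # unobserved nonrandom factor
    n = 22 - 7 * small + 2 * u + rng.gauss(0, 1)         # actual class size
    y = TRUE_EFFECT * n + 0.3 * u + rng.gauss(0, 0.5)    # achievement
    assigned.append(small)
    size.append(n)
    score.append(y)

naive = ols_slope(size, score)               # contaminated by u
instrumented = iv_slope(assigned, size, score)  # uses only randomized variation
print(round(naive, 3), round(instrumented, 3))
```

Because the simulated assignment is random, it is unrelated to the unobserved factor, so the instrumented estimate stays close to the hypothetical true effect while the naive regression on actual class size does not.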
Because students and teachers are not randomly assigned to schools, adjusting for school effects when estimating teacher effects is important. As a result, I also repeated the analysis using two-level models, where students are nested within schools, in order to control directly for school effects. The results of this analysis were comparable to those reported here. The coefficients of the student characteristics and teacher effects were very similar to those shown in Tables 4, 5, and 6. Therefore, this additional analysis points to the robustness of the estimates reported.
Further, because teacher effects are computed as classroom-specific residuals, it is possible that these estimates include some amount of measurement error. When these estimates are used as predictors of future achievement, the assumption that the predictor is measured without any error therefore seems implausible. As a result, the results reported here may be affected by measurement error and may overstate or understate the teacher effects. This is a potential limitation of the study.
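A standard textbook result, offered here only as an illustration, shows how error in a predictor can distort its coefficient. Suppose, as assumptions of this sketch, that the estimated teacher effect $\hat{T}$ equals the true effect $T$ plus independent noise $u$ (classical measurement error), and that it is the only predictor in the regression. The symbols are introduced for this sketch and do not appear in the source.

```latex
\hat{T} = T + u, \qquad
\operatorname{plim}\,\hat{\beta}
  = \beta \cdot \frac{\sigma_T^2}{\sigma_T^2 + \sigma_u^2}
```

In that simple case the coefficient is biased toward zero. When several error-prone and correlated predictors enter the model together, as with teacher effects from kindergarten, first, and second grade, the direction of the bias is no longer determined, which is consistent with the statement that the estimates may be overstated or understated.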
The present study showed that teachers matter in early grades and that teacher effects are evident through the third grade in reading and mathematics. This finding suggests the importance of having effective teachers in early grades. This finding also highlights the importance of identifying effective teachers and studying the characteristics that compose teacher effectiveness. In addition, the findings reported here suggest that interventions to improve the effectiveness of teachers are promising strategies for improving student achievement in the early grades. The challenge is to identify what constitutes teacher effectiveness, and then to design and implement interventions, such as professional development, to improve teacher effectiveness. If there were a cost-effective intervention to improve teacher effectiveness, the findings suggest that the cumulative effects would be at least as large as those obtained from small class effects.
Angrist, J., Imbens, G. W., & Rubin, D. B. (1996). Identification of causal effects using instrumental variables. Journal of the American Statistical Association, 91, 444-472.
Borman, G. D., & Kimball, S. M. (2005). Teacher quality and educational equality: Do teachers with higher standards-based evaluation ratings close student achievement gaps? Elementary School Journal, 106, 3-20.
Boyd, D., Lankford, H., Loeb, S., Rockoff, J., & Wyckoff, J. (2008). The narrowing gap in New York teacher qualifications and its implications for student achievement in high-poverty schools. Journal of Policy Analysis and Management, 27, 793-818.
Bryk, A. S., & Raudenbush, S. W. (1988). Toward a more appropriate conceptualization of research on school effects: A three-level hierarchical linear model. American Journal of Education, 97, 65-108.
Clotfelter, C. T., Ladd, H. F., & Vigdor, J. L. (2006). Teacher-student matching and the assessment of teacher effectiveness. Journal of Human Resources, 41, 778-820.
Coleman, J. S., Campbell, E. Q., Hobson, C. J., McPartland, J., Mood, A. M., Weinfeld, F. D., & York, R. L. (1966). Equality of educational opportunity. Washington, DC: U.S. Government Printing Office.
Freedman, D. A. (2006). Statistical models for causation. What inferential leverage do they provide? Evaluation Review, 30, 691-713.
Goldhaber, D., & Anthony, E. (2007). Can teacher quality be effectively assessed? National Board certification as a signal of effective teaching. Review of Economics and Statistics, 89, 134-150.
Goldhaber, D. D., & Brewer, D. J. (1997). Why don't schools and teachers seem to matter? Assessing the impact of unobservables on educational productivity. The Journal of Human Resources, 32, 505-523.
Good, T., & Brophy, J. (1987). Looking in classrooms. New York: Harper & Row.
Greenwald, R., Hedges, L. V., & Laine, R. D. (1996). The effect of school resources on student achievement. Review of Educational Research, 66, 361-396.
Hanushek, E. A. (1971). Teacher characteristics and gains in student achievement: Estimation using micro data. American Economic Review, 61, 280-288.
Hanushek, E. A. (1986). The economics of schooling: Production and efficiency in public schools. Journal of Economic Literature, 24, 1141-1177.
Hanushek, E. A. (1999). Some findings from an independent investigation of the Tennessee STAR experiment and from other investigations of class size effects. Educational Evaluation and Policy Analysis, 21, 143-163.
Hedges, L. V., & Nowell, A. (1995). Sex differences in mental test scores, variability, and numbers of high-scoring individuals. Science, 269, 41-45.
Hedges, L. V., & Nowell, A. (1999). Changes in the Black-White gap in achievement test scores. Sociology of Education, 72, 111-135.
Hill, C. J., Bloom, H. S., Black, A. R., & Lipsey, M. W. (2008). Empirical benchmarks for interpreting effect sizes in research. Child Development Perspectives, 2, 172-177.
Kennedy, M. M. (2008). Contributions of qualitative research to research in teacher qualifications. Educational Evaluation and Policy Analysis, 30, 344-367.
Konstantopoulos, S. (2006). Trends of school effects on student achievement: Evidence from NLS:72, HSB:82, and NELS:92. Teachers College Record, 108, 2550-2581.
Konstantopoulos, S. (2009). The mean is not enough: Using quantile regression to examine trends in Asian-White differences across the entire achievement distribution. Teachers College Record, 111, 1274-1295.
Krueger, A. B. (1999). Experimental estimates of education production functions. Quarterly Journal of Economics, 114, 497-532.
Krueger, A. B. (2003). Economic considerations and class size. Economic Journal, 113, 34-63.
Lee, V. E. (2000). Using hierarchical linear modeling to study social contexts: The case of school effects. Educational Psychologist, 35, 125-141.
Murnane, R. J., & Phillips, B. R. (1981). What do effective teachers of inner-city children have in common? Social Science Research, 10, 83-100.
Nye, B., Hedges, L. V., & Konstantopoulos, S. (2000). Effects of small classes on academic achievement: The results of the Tennessee class size experiment. American Educational Research Journal, 37, 123-151.
Nye, B., Konstantopoulos, S., & Hedges, L. V. (2004). How large are teacher effects? Educational Evaluation and Policy Analysis, 26, 237-257.
Pedersen, E., Faucher, T. A., & Eaton, W. W. (1978). A new perspective on the effects of first grade teachers on children's subsequent status. Harvard Educational Review, 48, 1-31.
Raudenbush, S. W. (2004). What are value-added models estimating and what does this imply for statistical practice? Journal of Educational and Behavioral Statistics, 29, 121-129.
Raudenbush, S. W., & Bryk, A. S. (2002). Hierarchical linear models. Thousand Oaks, CA: Sage Publications.
Raudenbush, S. W., & Willms, J. D. (1995). The estimation of school effects. Journal of Educational and Behavioral Statistics, 20, 307-335.
Rivkin, S. G., Hanushek, E. A., & Kain, J. F. (2005). Teachers, schools, and academic achievement. Econometrica, 73, 417-458.
Rowan, B., Correnti, R., & Miller, R. J. (2002). What large scale, survey research tells us about teacher effects on student achievement: Insights from the Prospects study of elementary schools. Teachers College Record, 104, 1525-1567.
Tiebout, C. M. (1956). A pure theory of local expenditures. Journal of Political Economy, 64, 416-424.