
Contradictions Resolved: An Analysis of Two Theories of the Achievement Gap


by Stuart S. Yeh - 2017

Background: Value-added modeling (VAM) has been used to rank teachers and assess teacher and school quality. The apparent relationship between value-added teacher rankings and gains in student performance provides a foundation for the view that the contribution of teachers to student performance is the largest factor influencing student achievement, suggesting that differences in teacher quality might explain the persistence of the gap in student achievement as students advance throughout the K–12 years. However, several studies raise questions about the reliability and validity of VAM.

Purpose: The purpose of this article is to reconcile the evidence that the contribution of teachers to student achievement is large with the evidence that value-added rankings are unreliable and possibly invalid.

Design: The method involves an analytical review of the available evidence, development of a theoretical explanation for the contradictory results, and a test of this explanation using path analysis with three longitudinal datasets involving nationally representative samples of schools and students.

Conclusion: The hypothesis that the contribution of teachers to student performance is the strongest factor influencing student achievement is not supported. A stronger factor is the degree to which students believe that they are proficient students. This is consistent with the view that the persistence of the achievement gap is better explained as the outcome of structural factors embedded in the conventional model of schooling that undermine the self-efficacy, engagement, effort, and achievement of students who enter kindergarten performing below the level of their more advantaged peers.



What is the cause of the persistent gap in academic achievement between minority students and their white peers? One possibility is that early differences in parenting or other sociocultural and socioeconomic factors contribute to initial differences in academic achievement that persist over time. From a theoretical perspective, however, one might expect the importance of parenting style and socioeconomic and sociocultural factors to decline with age since the proportion of a school-age child's life spent in school steadily increases with age. The persistence of the gap in the face of strenuous interventions by teachers throughout the academic careers of low-achieving students suggests the need for an explanation that identifies a ubiquitous factor (or factors) that consistently reinforce downward spirals throughout the K–12 years (for evidence of downward spirals, see LoGerfo, Nichols, & Reardon, 2006, pp. 26–32, 62–65).


This article compares two alternative models that seek to explain the persistence of the achievement gap. One model explains the persistence of the gap based on data suggesting that minority students tend to experience lower quality schools and teachers compared to their white peers. The second model explains the persistence of the gap as a psychological phenomenon. According to this model, minority students tend to become demoralized because they enter kindergarten performing below their same-age classmates and from that point forward receive comments, test scores, grades, and other cues that trigger and reinforce negative self-images, undermining effort and achievement throughout their school careers.


This article advances an unorthodox argument: after parsing the available evidence regarding the reliability and validity of value-added teacher rankings, and after reviewing evidence that National Board for Professional Teaching Standards (NBPTS) teacher certification is a reliable measure, but a weak predictor, of gains in student performance, there is reason to question the prevailing view that the contribution of teachers to student performance is the largest factor influencing student achievement. The article begins by reviewing studies regarding the use of value-added modeling (VAM) to assess teacher and school quality. VAM has received attention as a promising approach for judging quality. The apparent relationship between value-added teacher rankings and gains in student performance provides the foundation for the argument that the contribution of teachers to student performance is the largest factor influencing student achievement. However, several studies raise questions about the reliability and validity of VAM. I review this literature and offer a theoretical explanation for these contradictory results. This explanation is linked to the second model, which explains the persistence of the achievement gap as a psychological phenomenon rather than a problem of teacher or school quality. The two models are then tested and compared using path analysis with three longitudinal datasets involving nationally representative samples of schools and students.


TEACHER QUALITY


A current theory regarding the persistence of the achievement gap is that minority students tend to experience lower quality schools and teachers, compared to their white peers. In the United States, sizable race-related wealth inequalities persist. Black individuals generally have a smaller stock of accumulated wealth to bestow upon their offspring, thereby perpetuating the black-white gap in net worth (Darity, 2005). Low wealth limits the prices of the homes that black families can afford.  This translates into differences in the perceived quality of the neighborhoods and associated schools experienced by blacks versus whites. Whites and individuals with more education and income are disproportionately likely to reside in neighborhoods on the "high quality" side of school boundaries, where homes command school-related premiums (Bayer & McMillan, 2012). Blacks and individuals with less education and income are disproportionately likely to live in neighborhoods on the "low quality" side of school boundaries, where home values are depressed by perceptions of low school quality (Bayer & McMillan, 2012). Whites live in neighborhoods where public school test scores are higher, on average, by 15%, compared to neighborhoods where the average black family lives (Bayer & McMillan, 2012). In sum, there is evidence of racial segregation that translates into differences in the quality of schools experienced by blacks versus whites.


Significantly, studies suggest that the contribution of teachers to student achievement is large and value-added estimates of teacher contributions predict their students’ measured achievement (Rivkin, Hanushek, & Kain, 2005; Rowan, Correnti, & Miller, 2002; Sanders & Rivers, 1996; Staiger & Rockoff, 2010; Wright, Horn, & Sanders, 1997). If it is true that minority students experience lower quality teachers and schools, and if teachers significantly influence student performance, this might explain the gap in achievement versus white peers.


FLAWED MEASURES


While value-added modeling of student achievement is increasingly being adopted by school districts across the nation, growing evidence suggests that statistical estimates of teacher contributions are flawed. Teacher rankings based on value-added estimates of performance are unreliable measures of future performance. Six studies have investigated the predictive power of teacher rankings based on value-added measures (Aaronson, Barrow, & Sander, 2007; Ballou, 2005; Goldhaber & Hansen, 2008; Koedel & Betts, 2007; Lefgren & Sims, 2012; McCaffrey, Sass, Lockwood, & Mihaly, 2009). In each study, teachers were ranked from high to low during a base period. In some studies, only one year of data was used to create the ranking (Aaronson et al., 2007; Ballou, 2005; Koedel & Betts, 2007; McCaffrey et al., 2009). In some studies, two or three years of data were used (Goldhaber & Hansen, 2008; Lefgren & Sims, 2012). In one study, four to five years of ranking data were used (Lefgren & Sims, 2012). In some studies, the focus was on the top and bottom quartiles of teachers. In other studies, the focus was on the top and bottom quintiles. However, in every study, the rankings were unreliable in predicting future performance. In all but one instance (top quartile teachers in the Aaronson et al. study), over half of the teachers ranked in the top and bottom quartiles (or quintiles) during the base period did not remain in those categories during the subsequent year (Aaronson et al., 2007; Ballou, 2005; Goldhaber & Hansen, 2008; Koedel & Betts, 2007; Lefgren & Sims, 2012; McCaffrey et al., 2009). While the value-added measure predicted that teachers in the top and bottom quartiles (or quintiles) would remain in those categories, over half of the teachers shifted out of those categories during the subsequent year.1 In short, value-added methods do not permit the reliable identification of high- and low-performing teachers.
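The instability these six studies document can be illustrated with a small simulation. The sketch below is purely illustrative: the variances assumed for the persistent component of each teacher's estimated effect and for the year-specific noise are assumptions chosen to mimic the qualitative pattern the studies report, not estimates drawn from any of them. When year-to-year noise is large relative to the persistent component, most teachers ranked in the top quartile in a base year do not remain there the next year.

```python
import numpy as np

rng = np.random.default_rng(0)
n_teachers = 1000

# Illustrative assumption: a small stable component of each teacher's
# estimated effect plus large year-specific noise (sampling error,
# classroom composition, and other transient factors).
persistent = rng.normal(0.0, 0.10, n_teachers)
vam_year1 = persistent + rng.normal(0.0, 0.15, n_teachers)
vam_year2 = persistent + rng.normal(0.0, 0.15, n_teachers)

top_year1 = vam_year1 >= np.quantile(vam_year1, 0.75)
top_year2 = vam_year2 >= np.quantile(vam_year2, 0.75)

# Share of base-year top-quartile teachers who remain in the top quartile
retention = (top_year1 & top_year2).sum() / top_year1.sum()
print(f"Top-quartile retention after one year: {retention:.0%}")
```

Under these assumed variances, fewer than half of the top-quartile teachers remain in the top quartile the following year, mirroring the pattern reported in all but one instance across the six studies.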


A second problem is that the estimate of a teacher's value-added contribution predicts the prior performances of his or her students (Koedel & Betts, 2011; Rothstein, 2009, 2010). Since it is impossible for a teacher to cause the prior performance of his or her students, this result implies there is nonrandom selection of students into teacher classrooms that is not controlled through the inclusion of time-invariant student characteristics. Therefore, the central assumption underlying value-added modeling appears to be invalid (Braun, Chudowsky, & Koenig, 2010).


These results suggest that the use of value-added measures to rank teacher quality is not warranted. The instability in teacher rankings suggests that it is misleading to assert that a student who has a high-quality teacher for three years in a row would greatly benefit. The reason is that a high-quality teacher this year is not likely to remain a high-quality teacher next year, if it is indeed the case that teacher rankings are highly unstable. Value-added methods cannot be relied upon to identify teachers who cause their students to achieve at high levels.


In sum, the value-added modeling (VAM) methods that are employed to calculate the contribution of each teacher to student achievement depend on assumptions that appear to be invalid. The evidence that these methods are unreliable in predicting future teacher performance suggests that key variables influencing student performance have been omitted from the statistical models. The evidence that the estimate of a teacher's value-added contribution predicts the prior performances of his or her students indicates that the central assumption underlying value-added modeling is invalid.


OMITTED VARIABLES


McCaffrey et al. (2009) explain that VAM estimates the contribution of each teacher each year by calculating, across all students served by the teacher, the average gain in student performance from the previous year after controlling for all other variables that are included in the VAM model. The teacher's contribution is calculated indirectly, as the residual gain that remains after the influences of all other variables that are included in the VAM model are controlled. A problem arises if key variables are inadvertently omitted from the VAM model. The influences of the omitted variables are inadvertently lumped together in the calculation of the residual gain and, thus, are inextricably conflated with, and indistinguishable from, the value of the teacher's contribution calculated through VAM. The procedure that is used to calculate the teacher's estimated contribution mixes the teacher's actual contribution and the influences of all other variables that were inadvertently omitted from the model:


The persistent teacher effect is simply the portion of the estimated effect that is common across years. It is not necessarily equal to the teacher’s true performance; estimated effects from VAMs might not equal true causal effects of teachers due to violations of the model assumptions (Rothstein, 2008), and even persistent components of the estimated effects might include confounding factors that endure over time. For example, if the achievement model fails to properly capture all unobservables that are correlated with classroom assignments and the classroom average of the unobservables is stable across years, these omitted variables will be part of the persistent teacher effect. In an extreme case of confounding, suppose annual teacher effects were measured by classroom average test scores without any adjustment for student heterogeneity. These effects would likely demonstrate strong persistence within teacher over time due to the stability in the types of students assigned to teachers across years. (McCaffrey et al., 2009, p. 578).
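The residual-gain procedure that McCaffrey et al. describe can be sketched in a few lines. The sketch below is a deliberately simplified model (a single covariate, prior achievement; random assignment of students; illustrative parameter values): the teacher's estimated contribution is simply the mean residual, across the students he or she serves, that remains after the covariates in the model are regressed out.

```python
import numpy as np

rng = np.random.default_rng(1)
n_teachers, class_size = 50, 25
teacher = np.repeat(np.arange(n_teachers), class_size)

# Illustrative data-generating process: current score depends on prior
# score, a true (persistent) teacher contribution, and student-level noise.
prior = rng.normal(0.0, 1.0, n_teachers * class_size)
true_effect = rng.normal(0.0, 0.10, n_teachers)
score = 0.7 * prior + true_effect[teacher] + rng.normal(0.0, 0.5, prior.size)

# Step 1: regress current score on the covariates included in the model.
slope, intercept = np.polyfit(prior, score, 1)
residual = score - (slope * prior + intercept)

# Step 2: label each teacher's mean residual gain as that teacher's
# "contribution" -- along with whatever else was omitted from the model.
vam = np.array([residual[teacher == t].mean() for t in range(n_teachers)])
```

When students are randomly assigned, as here, the estimate tracks the true effect; the conflation problem arises when an omitted factor, rather than the teacher, drives the residual.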


Suppose, for example, that children who enter kindergarten performing below their same-age classmates consistently receive signals through grades, test scores, and teacher comments throughout their academic careers indicating that the children are performing below-average, and suppose that this steady diet of negative feedback depresses the children's levels of self-efficacy, engagement, effort, and achievement throughout their school careers. Depressed levels of self-efficacy would be an example of the type of unmeasured, unobserved characteristic described by McCaffrey et al. (2009) whose effects would be conflated with the estimated contribution of each teacher and would cause each teacher's contribution to be incorrectly estimated when using value-added statistical modeling. Teachers who are assigned to teach classes with high proportions of low-income minority students would be likely to receive, year after year, students whose levels of self-efficacy and learning potential are depressed relative to the average student in the school district, causing value-added estimates of the teachers' contributions to be artificially depressed. Teachers who are assigned to teach classes with low proportions of low-income minority students would be likely to receive, year after year, students whose levels of self-efficacy and learning potential are elevated relative to the average student in the school district, causing value-added estimates of the teachers' contributions to be artificially elevated.

 

ARTIFICIAL EFFECTS


As a consequence, it may be expected that a portion of all teachers would, by chance, periodically receive entire classrooms filled with high-self-efficacy, high academic potential students that would artificially boost the value-added estimates of the contributions of those teachers. For any given teacher, this might happen to occur over a one-, two-, or three-year period, and the effect would be reflected in any VAM research study lasting one, two, or three years. However, this process would be unrelated to each teacher's skill and ability. The identities of the teachers whose students exhibit these gains would continually change. However, these fortuitous alignments of entire classrooms filled with high-self-efficacy students would mislead researchers into thinking that some teachers make large contributions to student achievement.


To draw an analogy, strong winds whip up unusually large waves. It would be wrong to conclude that these waves are "high quality" waves that intentionally direct their skills to achieve impressive effects. Similarly, we might expect that any given sample of teachers would include, by chance, a number of teachers who happened to receive an unusually high proportion of high-self-efficacy, high academic potential students for a one-, two-, or three-year period. Since the effect that is attributable to self-efficacy is not measured and not included in the VAM model, the value-added procedure for calculating each teacher's contribution is unable to disentangle the self-efficacy effect from each teacher's actual contribution. The two effects would be lumped together when calculating the residual gain and would be indistinguishable. This would explain not only why some teachers exhibit large estimated value-added contributions to student achievement over one-, two-, or three-year periods, but why those estimates are poor predictors of future performance. The specific mix of high- and low-self-efficacy students received by each teacher each year is akin to fluctuation in winds that vary beyond the control of particular waves. In this view, the mix of students received by each teacher each year, rather than deliberate actions by individual teachers, drives the value-added statistical estimates of each teacher's contribution. Consistent with this view, the only study that investigated the stability of value-added estimates of teacher performance over a 10-year period found that the estimated performance of individual teachers fluctuates significantly over time due to unobserved factors that are currently not captured through VAM (Goldhaber & Hansen, 2012). This invalidates the assumption of stable teacher performance that is embedded in key studies regarding VAM (see Gordon, Kane, & Staiger, 2006; Hanushek, 2009; McCaffrey et al., 2009; Staiger & Rockoff, 2010).
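A minimal simulation of this conflation, under assumed (not estimated) parameter values: every teacher below has a true contribution of exactly zero, but half are systematically assigned low-self-efficacy classrooms and half high-self-efficacy classrooms. Because self-efficacy is omitted from the model, VAM attributes the difference to the teachers.

```python
import numpy as np

rng = np.random.default_rng(2)
n_teachers, class_size = 100, 25
teacher = np.repeat(np.arange(n_teachers), class_size)

# All teachers are equally effective: true contribution is zero for all.
# Assumed assignment pattern: the first 50 teachers receive classrooms of
# low-self-efficacy students; the other 50 receive high-self-efficacy ones.
class_efficacy = np.where(np.arange(n_teachers) < 50, -0.3, 0.3)

prior = rng.normal(0.0, 1.0, teacher.size)
score = (0.7 * prior
         + 0.3 * class_efficacy[teacher]          # the omitted variable
         + rng.normal(0.0, 0.5, teacher.size))    # student-level noise

slope, intercept = np.polyfit(prior, score, 1)    # VAM controls prior only
residual = score - (slope * prior + intercept)
vam = np.array([residual[teacher == t].mean() for t in range(n_teachers)])

print(f"mean 'effect', low-efficacy classes:  {vam[:50].mean():+.3f}")
print(f"mean 'effect', high-efficacy classes: {vam[50:].mean():+.3f}")
```

The spurious gap between the two groups persists for as long as the assignment pattern persists, exactly the kind of durable confounding that McCaffrey et al. warn about.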


INADEQUATE CONTROLS


The inclusion of race and poverty covariates fails to fully adjust the value-added statistical calculations for differences in levels of self-efficacy (and differences in levels of potential achievement associated with differences in levels of self-efficacy) because race and poverty covariates measure race and poverty, not self-efficacy. For example, some minority students possess high levels of self-efficacy, while other minority students possess low levels of self-efficacy. Some minority students possess high levels of potential achievement, while other minority students possess lower levels of potential achievement. A value-added model might include race covariates, but this would not fully adjust the value-added calculation of a teacher's contribution to student achievement if all students taught by the teacher possess low levels of self-efficacy. A teacher may receive an entire class filled with low-self-efficacy black and Hispanic students, but the black and Hispanic covariate indicators under-adjust for the effect of low self-efficacy because they adjust performance for a mixed group of high- and low-self-efficacy students rather than an entire group of low-self-efficacy students. Similarly, free-lunch eligibility status is an indicator that is used in value-added modeling to control for level of poverty, but it is an imprecise measure of self-efficacy. Many students who are not eligible for free-lunch programs may possess low levels of self-efficacy. A teacher may be assigned to teach an entire class of students that happen to have low levels of self-efficacy, but only some students may qualify for free lunch. The inclusion of free-lunch status as a covariate in the value-added statistical model fails to fully control for the depressed level of potential achievement that may be expected with entire classes of students who possess low levels of self-efficacy.
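The under-adjustment argument can be made concrete with a sketch (all coefficients are illustrative assumptions): self-efficacy is a continuous latent variable, free-lunch status is a noisy binary correlate of it, and including the binary indicator as a covariate removes only a small share of the latent variable's influence from the residual.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 5000

efficacy = rng.normal(0.0, 1.0, n)  # latent, unmeasured self-efficacy
# Assumed: free-lunch eligibility is only loosely related to self-efficacy.
free_lunch = (efficacy + rng.normal(0.0, 2.0, n) < -0.5).astype(float)

prior = rng.normal(0.0, 1.0, n)
score = 0.7 * prior + 0.3 * efficacy + rng.normal(0.0, 0.5, n)

# Regress score on prior achievement AND the free-lunch indicator.
X = np.column_stack([np.ones(n), prior, free_lunch])
beta, *_ = np.linalg.lstsq(X, score, rcond=None)
residual = score - X @ beta

# The residual -- what VAM would attribute to the teacher -- still
# carries most of the omitted self-efficacy variable's influence.
print(f"corr(residual, self-efficacy): {np.corrcoef(residual, efficacy)[0, 1]:.2f}")
```

The binary covariate soaks up only the portion of the latent variable that it happens to track; a classroom composed entirely of low-self-efficacy students, only some of whom qualify for free lunch, remains under-adjusted.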


A similar issue arises if the focus is switched to school quality, measured by average school test scores. If large numbers of low-income minority children enter kindergarten performing below their same-age classmates and consistently receive signals through grades, test scores, and teacher comments throughout their academic careers indicating that the children are performing below-average, and if this steady diet of negative feedback depresses the children's levels of self-efficacy, engagement, effort, and potential achievement throughout their school careers, then it may be expected that certain schools in low-income neighborhoods would be filled every year with children whose levels of self-efficacy and potential achievement are depressed—not because of the quality of the schools, but instead because universal grading and testing practices systematically undermine the self-efficacy, effort, and potential achievement of every child who enters a school performing below grade level.


The use of race and free-lunch status as covariates in value-added statistical models would fail to fully adjust the models for the same reasons explained above. Both race and the indicator of free-lunch eligibility are flawed measures of self-efficacy. They are flawed measures of the degree to which potential achievement is depressed whenever children enter a school performing below grade level and are continuously subjected to grading and testing practices that systematically depress self-efficacy and potential achievement.


The use of prior student achievement as a covariate in value-added statistical models would be inadequate to fully adjust the models because the harm that is produced by existing grading and testing practices would be expected to accumulate throughout the career of each low-performing student at each school that he or she attends. The hypothesized effect would be perfectly confounded with attendance at every school and would be impossible to separate from the independent contribution of each school.


A THEORETICAL EXPLANATION


The hypothesis that harmful effects from existing grading and testing practices persist and accumulate throughout the career of each low-performing student at each school that he or she attends provides a theoretical explanation for the finding that the estimate of a teacher's value-added contribution predicts the prior performances of his or her students (Koedel & Betts, 2011; Rothstein, 2009, 2010). If existing grading and testing practices exert a steady, cumulative effect throughout each student's career, and if teachers tend to be assigned the same types of students every year, certain unfortunate teachers will be systematically assigned students whose academic potentials have been undermined and continue to be undermined. The low performance of these students would necessarily be correlated with their prior performances. Since VAM omits a control for the level of self-efficacy, the procedure lumps the influence of this factor together with the teacher's contribution when calculating the residual gain, and then labels the entire residual gain as the teacher's contribution. What VAM labels as the teacher's contribution is therefore contaminated by the influence of the self-efficacy variable, creating the observed correlation between the residual gain score calculated for each student (i.e., what VAM labels the teacher's estimated contribution to student performance) and each student's prior performance. The evidence that contamination has occurred lies in the fact that the residual gain scores are correlated with the level of prior student performance. The residual gain scores must be contaminated by factors other than teacher contributions because teachers cannot cause the performance of their students during the period prior to the point where teachers receive the students.
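This explanation can be checked in simulation. In the sketch below (parameter values are assumptions chosen for illustration), no teacher has any causal effect at all, but students are tracked into classrooms by self-efficacy, which is omitted from the model. The resulting "teacher effects" nonetheless predict students' prior scores, reproducing the falsification-test pattern reported by Rothstein and by Koedel and Betts.

```python
import numpy as np

rng = np.random.default_rng(4)
n_teachers, class_size = 80, 25
n = n_teachers * class_size

efficacy = rng.normal(0.0, 1.0, n)               # omitted persistent factor
prior = 0.6 * efficacy + rng.normal(0.0, 0.8, n)
# No teacher term at all: teachers contribute nothing in this simulation.
current = 0.7 * prior + 0.3 * efficacy + rng.normal(0.0, 0.5, n)

# Nonrandom assignment: students are tracked into classrooms by efficacy.
teacher = np.empty(n, dtype=int)
teacher[np.argsort(efficacy)] = np.repeat(np.arange(n_teachers), class_size)

slope, intercept = np.polyfit(prior, current, 1)  # VAM controls prior only
residual = current - (slope * prior + intercept)

vam = np.array([residual[teacher == t].mean() for t in range(n_teachers)])
class_prior = np.array([prior[teacher == t].mean() for t in range(n_teachers)])

# The falsification test: the estimated "teacher effect" predicts scores
# earned before the students ever reached the teacher.
print(f"corr(teacher 'effect', class mean prior score): "
      f"{np.corrcoef(vam, class_prior)[0, 1]:.2f}")
```

Because the omitted factor drives both prior achievement and the residual gain, the "teacher effect" correlates strongly with performance the teacher could not have caused.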


To recapitulate, VAM teacher rankings predict future student performance. But they also predict past student performance, which implies that the correlation is not an indicator of causation but is instead an indicator of some third factor that influences student performance as well as the VAM rankings. It is difficult to identify a suitable factor that is (a) strongly correlated with student achievement and (b) not controlled in conventional value-added models of achievement. Self-efficacy is a likely candidate. It is strongly correlated with (and influences) student achievement, yet it is not measured and controlled in conventional value-added models of achievement (see Skinner, Zimmer-Gembeck, Connell, Eccles, & Wellborn, 1998). Therefore, its influence is not controlled. Instead, any persistent influence attributable to self-efficacy is lumped together with each teacher's independent contribution to student achievement and the combined effect is attributed to the teacher. It is difficult to think of another factor that is strongly correlated with student achievement and is not measured in conventional value-added models of achievement.


CHECKING THE ASSUMPTIONS


All of this suggests a need to re-examine the fundamental assumption that teachers exert strong influences on student performance. A strategy to check the premise that teachers exert strong influences on student performance is to investigate the best alternative measure of teacher quality that is independent of value-added statistical measures of teacher quality. If the best alternative measure of teacher quality is a strong predictor of student performance, this would support the theory that teachers make strong contributions to student performance. However, if the best available measures of teacher quality are weak predictors of student performance, this would undermine the theory that teachers make strong contributions to student performance.


Perhaps the best alternative measure of teacher quality is certification by the National Board for Professional Teaching Standards (NBPTS). NBPTS is an independent organization established in 1987 with the goal of advancing the quality of teaching and learning (National Board for Professional Teaching Standards, 2015a). NBPTS developed professional standards for teaching, then contracted with the Educational Testing Service (and, later, Pearson Educational Measurement) to create a voluntary system to certify teachers who meet those standards (Educational Testing Service, 2004; National Board for Professional Teaching Standards, 2015a).


NBPTS certification is a lengthy, highly demanding process. Applicants for certification are required to submit a portfolio to NBPTS involving four entries (National Board for Professional Teaching Standards, 2015b). Three are classroom based, where video recordings of teacher-student interaction and examples of student work serve as supporting documentation. A fourth entry relates to the candidate’s accomplishments outside of the classroom—with families, the community, or colleagues—and how they impact student learning. Each entry requires some direct evidence of teaching or school counseling as well as a commentary describing, analyzing, and reflecting on this evidence. Following submission of the portfolio, candidates are tested on their content knowledge through six 30-minute exercises, specific to the candidate’s chosen certificate area, at one of 300 NBPTS computer-based testing centers across the United States. Applicants are scored on a scale of 75 to 425, incorporating both the portfolio and the assessment center exercises, and they must earn a score of at least 275 to achieve certification (Goldhaber, Perry, & Anthony, 2003).


Drew Gitomer evaluated the interrater reliability of NBPTS ratings and found that there was agreement within one score point for approximately 90% of all ratings where two assessors performed the rating (Gitomer, 2008, p. 241). This indicates a high level of inter-rater reliability. Gitomer concluded that "the design features of the NBPTS system support a relatively reliable set of assessments" (Gitomer, 2008, p. 231).


Seven large-scale studies have investigated the impact of NBPTS certification (Cavalluzzo, 2004; Clotfelter, Ladd, & Vigdor, 2006, 2007a; Goldhaber & Anthony, 2007; Harris & Sass, 2007; Ladd, Sass, & Harris, 2007; Sanders, Ashton, & Wright, 2005). Each offers the statistical power necessary to detect effects, if they exist, and either controlled for student or school fixed effects or used hierarchical linear modeling (HLM). These studies provide the best available estimates of the signaling and human capital effects of NBPTS certification.


To summarize, there is a small signaling effect of NBPTS certification, and effects on human capital are either mixed or negative. The average signaling effect size across the seven key studies is 0.002 SD in reading and 0.004 SD in math (Yeh, 2010b). This represents the average gain in student achievement of replacing an existing teacher with an NBPTS-certified teacher (the effect is diluted because some teachers in the general population are already teaching at the NBPTS level and would pass the NBPTS exam if they applied while others would fall below the NBPTS standard and would fail the NBPTS exam).


While other studies have investigated the relationship between NBPTS certification and student achievement (Bond, Smith, Baker, & Hattie, 2000; McColskey et al., 2005; Stone, 2002; Vandevoort, Amrein-Beardsley, & Berliner, 2004), none involved a sample with more than 35 NBPTS-certified teachers, none controlled for student fixed effects, and the only study that used HLM involved a small sample of 25 NBPTS-certified teachers and failed to find any impact on student achievement (McColskey et al., 2005). These studies are limited by key methodological weaknesses (Cunningham & Stone, 2005; Education Commission of the States, 2006).


A third measure of teacher quality is whether a teacher meets the federal definition of a highly qualified teacher: a person who has been awarded a minimum of a bachelor's degree from a four-year institution, is fully certificated or licensed by the state in which the teacher teaches, and demonstrates subject matter competence in each core academic subject area taught by the teacher (U.S. Department of Education, 2015). At the middle and high school level, teachers may demonstrate competency by showing that they possess a major in the subject area of instruction, credits equivalent to a major in the subject, an advanced level of state certification, a graduate degree, or have passed a state-developed test in the subject area or a high, objective uniform state standard developed by a state for the purpose of establishing subject-matter competency. These credentials are only weakly associated with value-added estimates of teacher performance (Clotfelter et al., 2007a; Clotfelter, Ladd, & Vigdor, 2007b). However, it is unclear whether the problem is weak reliability of the criterion measure or if credentials are indeed weak predictors of teacher contributions to student performance.


Measures of school quality based on value-added measures of school performance raise the issues described above with regard to the use of VAM to measure quality. What is needed is a measure that does not rely upon VAM. Measures that are independent of VAM include student or parent judgments of school quality. Significantly, De Jong and Westerhof found that aggregated student ratings of eighth-grade mathematics teachers are equal in quality to data obtained from trained external observers (De Jong & Westerhof, 2001). With regard to the quality of data obtained from parents, a survey involving responses from 3,948 District of Columbia Public School parents yielded a strong test-retest reliability coefficient of .937, while the internal reliability of survey items for each section of the survey ranged from .69 to .90 (Tuck, 1995). These results suggest that the reliability of data obtained from students and parents may be adequate for the purpose of rating teacher or school quality. Alternatively, an objective indicator that does not rely upon student or parent judgments of quality, such as reports of whether students or teachers have been physically attacked or whether students have been involved in fights, is likely to be correlated with the perceived quality of a school and may be used as a proxy indicator of quality. A reasonable approach might employ multiple measures to arrive at an overall judgment of the degree to which measures of teacher or school quality predict student achievement and explain the persistence of the achievement gap throughout the K–12 years.


TWO APPROACHES


Jesse Rothstein's results indicate that VAM omits one or more key variables and is a biased measure of teacher quality (Rothstein, 2009, 2010). While NBPTS teacher certification is a reliable measure of teacher quality (Gitomer, 2008), this measure is not a strong predictor of gains in student achievement, nor is there evidence that any other available measure of teacher or school quality is a strong predictor of gains in student achievement.


The current study, reported below, employed available measures of teacher and school quality to evaluate the hypothesis that teacher and school quality are strong influences on student achievement. While the reliability of these measures is uncertain, a sensitivity analysis suggests that the results reported here are not sensitive to measurement reliability.2 An alternative approach would be to suspend this type of evaluation until advances in technology produce reliable measures of teacher and school quality that are strong predictors of gains in student achievement. Pursuit of this alternative path, however, presumes that advances in technology will eventually overcome the limitations of current measures such as VAM, and presumes that teacher and school rankings based on these measures will, in the future, prove to be strong, reliable measures of teacher and school quality. There are three difficulties with this approach.


First, federal funds, including Race to the Top funds, are increasingly being directed toward state educational agencies that promise to implement measures such as VAM and promise to use these measures to make operational decisions regarding the identification and termination of low-performing teachers (Dillon, 2010; U.S. Department of Education, 2012). There is an urgent need for research to investigate the core assumption underlying this policy. Policymakers cannot and will not wait until ideal measures of teacher and school quality are perfected. Instead, they will continue to proceed with the implementation of policies based on the assumption that teacher and school quality are the primary influences on student achievement—unless and until researchers demonstrate otherwise. Researchers who insist upon methodological purity risk delaying the type of research studies that are urgently needed.


Second, the literature reviewed above suggests reasons to question the core assumption that teacher and school quality are the primary influences on student achievement. NBPTS teacher certification is a reliable measure of teacher quality, yet it is only weakly correlated with gains in student achievement. Other than VAM-based rankings, none of the available measures of teacher quality are strongly correlated with gains in student achievement. If it is true that teacher and school quality are not primary influences on student achievement, then no advance in technology will ever lead to the development of measures of teacher and school quality that are strongly correlated with gains in student achievement. It would only be possible to develop measures of teacher and school quality that are strongly correlated with gains in student achievement if there is, in fact, a strong relationship between teacher and school quality and student achievement. Substantial effort has been expended on extremely sophisticated value-added statistical modeling and the development of NBPTS certification procedures. The results are disappointing. It may be necessary to consider the possibility that the failure to identify strong measures of teacher and school quality is not due to the limits of current technology but instead reflects a need to pursue a different strategy that is based on an alternative view of factors influencing and maintaining the achievement gap.


Third, it is incumbent upon advocates of the view that teacher and school quality are the primary influences on student achievement to develop appropriate measures and demonstrate that those measures are strong, reliable, valid measures of teacher and school quality. If those measures are not developed and made available to other researchers, it would not be appropriate to fault those researchers for failure to use strong, reliable, valid measures of teacher and school quality.


GRADING PRACTICES


An alternative view is that the persistence of the achievement gap may be traced to the way that schools are currently structured. A factor that has been overlooked is the psychological impact on children of existing grading practices. When children enter the school system, they are graded and compared to their same-age classmates. This practice undermines children's self-efficacy, engagement, effort, and achievement (Crooks, 1988, p. 464; C. Dweck, 1986, p. 1041; C. S. Dweck & Elliott, 1983). In particular, low-performing children are continually reminded that their performances are below par. The psychological impact is exacerbated by the introduction of letter grades in middle school. Even relatively high-performing children may be discouraged by an occasional bad grade as they realize they are not "straight-A" students. However, the impact on low-performing children is more severe and this may explain the persistence and growth of the achievement gap as children advance from grade to grade.


Children may receive low grades for several reasons. The tasks that are assigned may be too difficult. In addition, a child may receive a low grade even when demonstrating improvement if the performance of each child is scored in relation to other children, rather than in relation to each child's prior performance.


Lack of individualization has profound effects on children. For example, in one study, matched children ages 9–11 who were identified as exhibiting poor engagement and performance were randomly assigned to three groups (Kennelly, Dietz, & Benson, 1985). Group 1 received easy math problems, Group 2 received moderately difficult problems, and Group 3 received difficult problems. Group 1 completed all problems with 100% accuracy, Group 2 achieved 76.9% accuracy, and Group 3 achieved 46.2% accuracy. Significantly, Group 2 exhibited the best level of persistence on a subsequent set of math problems. This study indicates the importance of individualizing task difficulty so that children experience a modest but not overwhelming challenge to their levels of competence.


Individualization of task difficulty permits low-performing children to achieve high accuracy scores on daily math assignments and high reading comprehension scores on reading comprehension tests. High scores presumably permit children to feel a sense of accomplishment. It appears that this promotes engagement, effort and achievement (Blankenship, 1992; Danner & Lonsky, 1981; Dougherty & Harbison, 2007; Drucker, Drucker, Litto, & Stevens, 1998; Harter, 1978; Kennelly et al., 1985; Nygard, 1977).


In combination with individualized task difficulty, it may be important to provide rapid performance feedback on daily math and reading assignments. Studies indicate that the most effective feedback is objective, involving daily testing, permitting children to see that they are making progress, thereby promoting engagement, effort and achievement (Bangert-Drowns, Kulik, Kulik, & Morgan, 1991; Fuchs & Fuchs, 1986; Kluger & DeNisi, 1996; Mac Iver, Stipek, & Daniels, 1991; Ryan, Mims, & Koestner, 1983).


While individualization of task difficulty and rapid performance feedback might be considered integral aspects of strong teaching, it is not unusual for teachers, including those typically categorized as strong teachers, to give the same set of math problems to all students in each class, and not to grade math homework until days later. It is not unusual for teachers categorized as strong teachers to employ letter-grade report cards and classroom assessments that compare students to each other and demoralize low-achieving students. NBPTS offers perhaps the most rigorous set of standards for distinguishing between strong and weak teachers. However, the NBPTS scoring process does not distinguish between teachers who do, and teachers who do not, employ letter-grade report cards and classroom assessments that compare students to each other. Nor does the NBPTS scoring process distinguish between teachers who do, and teachers who do not, give the same set of math problems to all students in each class, or delay grading math homework.


The challenge of individualizing task difficulty and providing rapid performance feedback on a daily basis for each student in a class of 25 students may be addressed through the use of technology. Evaluations of this technology indicate that it is more efficient than numerous alternative strategies for raising student achievement: voucher programs, charter schools, increased expenditure per pupil, stronger accountability for students and teachers, teacher certification by the National Board for Professional Teaching Standards, class-size reduction, comprehensive school reform, and the use of value-added statistical methods to identify and replace low-performing teachers (Yeh, 2010a).


The evidence that the combination of individualized task difficulty and rapid performance feedback is more efficient than numerous alternative strategies for addressing the achievement gap suggests that the persistence of the gap may be understood as a lack of individualized task difficulty, a lack of rapid performance feedback, and a lack of attention to grading practices that inadvertently undermine children's self-efficacy, engagement, and achievement. In this view, differences in achievement that exist at kindergarten are perpetuated through a system of grading practices that undermines the self-efficacy of low-performing children in a way that maintains the gap in achievement through the end of high school.


Three randomized studies support this view (Nunnery, Ross, & McDonald, 2006; Ross, Nunnery, & Goldfeder, 2004; Ysseldyke & Bolt, 2007). In each study, a randomized treatment group received a technology-based intervention that individualized task difficulty in either math or reading, in combination with rapid performance feedback. Randomized evaluations are significant because they provide strong evidence that individualization of task difficulty, in combination with rapid performance feedback, raises student achievement. To the extent that the intervention operates through the hypothesized mechanism and improves student self-efficacy, engagement, and effort, the results support the proposed theory of the achievement gap. Randomized evaluations are preferred to regression analyses because the latter are correlational studies that, by nature, cannot establish causal relationships.


While randomized studies are preferred to correlational studies, it is impractical to perform randomized studies with nationally representative samples of students. In addition, this type of study does not permit a direct test of hypothesized path effects with nationally representative samples of students. Path modeling with nationally representative samples of students permits a direct test of the relative strength of factors that hypothetically mediate early and late academic achievement, explaining how differences in achievement that exist at kindergarten may be translated into differences in achievement that exist at the end of high school.


An advantage of using nationally representative samples of students is that the results may be generalized to the entire population of American students. A disadvantage, however, is that researchers are limited to the indicators and measures for which data were collected. In particular, researchers who wish to investigate the influence of teacher or school quality are limited to the indicators and measures of teacher and school quality that were used in the available studies involving nationally representative samples of students. These indicators include the federal measure of high quality teachers, student judgments of the quality of their teachers, and parental judgments of the quality of the schools attended by their children. While some researchers may prefer the use of value-added measures of teacher quality or NBPTS certification as a signal of teacher quality, the studies reviewed above indicate that value-added measures are biased and NBPTS certification is only weakly correlated with gains in student achievement. Even if these measures were available with nationally representative samples of students, it would not be sensible to substitute these measures for the measures included in the current study. While the federal measure of high quality teachers, student judgments of the quality of their teachers, and parental judgments of the quality of the schools attended by their children may have limitations as indicators of teacher and school quality, there is no consensus regarding suitable indicators and there is no expectation that the problem of developing suitable indicators will be solved in the near future. Regardless, policymakers need information about the most promising strategies for improving student achievement. There is a need for studies that directly test and compare various theories about the nature of the achievement gap and compare promising ways of thinking about how to address the gap. 
This article reports the results of a study that employed path analysis with data from nationally representative samples of students to investigate factors, using available measures, that are hypothesized to maintain and perpetuate initial differences in achievement that exist at kindergarten.


METHODS


Path analysis was employed to compare two theories regarding the persistence of the achievement gap using data from three surveys sponsored by the National Center for Education Statistics (NCES). The Early Childhood Longitudinal Study of the Kindergarten Class of 2010–2011 (ECLS-K:2011) is currently following a nationally representative cohort of 18,170 children who attended kindergarten during the 2010–2011 school year (National Center for Education Statistics, 2015b). As of June 2015, data from the kindergarten and first-grade years had been released. Data from this survey were used for path analyses covering the fall kindergarten through spring first-grade period of each student's academic career. The Early Childhood Longitudinal Study of the Kindergarten Class of 1998–1999 (ECLS-K) followed a nationally representative cohort of 21,260 children from kindergarten into middle school (National Center for Education Statistics, 2015a). Data from this survey were used for path analyses covering the spring first-grade through eighth-grade period of each student's academic career. The National Education Longitudinal Study (NELS) followed a nationally representative cohort of 27,394 individuals who were surveyed as eighth-grade students in 1988, tenth-grade students in 1990, and twelfth-grade students in 1992 (National Center for Education Statistics, 2015c). Data from this survey were used for path analyses covering the eighth- through twelfth-grade period of each student's academic career. The data collected through each survey included data from student, parent, teacher and school administrator questionnaires, standardized reading and math assessments, and administrative records.


The path diagram in Figure 1 suggests that differences in math achievement upon entry at kindergarten are perpetuated and maintained by differences in the quality of the teachers and schools experienced by students. In this model, socioeconomic and sociocultural factors related to race as well as gender contribute to differences in achievement that exist upon entry at kindergarten. These differences in student achievement are presumed to be associated with race-related socioeconomic differences that influence residential location and are correlated with the quality of schools and teachers experienced by students. It is hypothesized that these differences in school and teacher quality magnify and perpetuate differences in academic achievement throughout the students' academic careers.


Figure 1. Model of Achievement, Teacher Quality and School Quality


[39_21786.htm_g/00002.jpg]

Notes: K=kindergarten. G1=grade 1. G3=grade 3.



In this model, the expectation is that differences in school and teacher quality experienced by students are correlated with race and socioeconomic status, as well as achievement, throughout the students' academic careers, i.e., low-achieving minority students from poor families are more likely than high-achieving white students from middle income families to experience below-average schools and teachers. Therefore, indicators for race and socioeconomic status were intentionally omitted as covariates after the kindergarten time period because it would not be appropriate to include covariates that are highly correlated with school and teacher quality (i.e., in the presence of collinearity). The model was explicitly designed to investigate the unconditional influence of school and teacher quality on student achievement and, conversely, the unconditional influence of differences in student achievement that are presumably related to racial and socioeconomic differences influencing residential location and correlated with the quality of schools and teachers experienced by students.
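The collinearity rationale can be illustrated with a small simulation (hypothetical data, not the actual survey samples): when a covariate such as SES is nearly collinear with a school-quality predictor, the standard error of the school-quality coefficient inflates sharply, making its estimate unstable.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 300

# Simulated variables: SES and school quality are nearly collinear
# (correlation around .99), and achievement depends on school quality.
ses = rng.normal(size=n)
school_quality = 0.95 * ses + 0.1 * rng.normal(size=n)
achievement = 0.5 * school_quality + rng.normal(size=n)

def ols_se(X, y):
    """OLS coefficients and their standard errors (with intercept)."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    sigma2 = resid @ resid / (len(y) - X1.shape[1])
    cov = sigma2 * np.linalg.inv(X1.T @ X1)
    return beta, np.sqrt(np.diag(cov))

# Standard error of the school-quality slope alone vs. with the
# collinear SES covariate also included in the model.
_, se_alone = ols_se(school_quality.reshape(-1, 1), achievement)
_, se_both = ols_se(np.column_stack([school_quality, ses]), achievement)
print(se_alone[1], se_both[1])  # SE is far larger when SES is included
```

This is only a sketch of the general statistical point; the simulated correlation is chosen to make the inflation visible, not to match the actual correlations in the ECLS-K or NELS data.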


Race and ethnicity data were collected from parent interviews. Math achievement was measured by standardized math assessment scores. Grade 1 teacher quality was measured by dichotomous teacher responses to the question, "This school year, do you qualify as a 'highly qualified teacher (HQT)' according to your state's requirements?" Grade 3 school quality was measured by dichotomous school administrator responses to the question: "Have any of the following things happened during this school year at this school: Children or teachers being physically attacked or involved in fights?" Grade 8 school quality was measured by level of parent agreement with the statement "(child)'s school is a good school." Grade 10 school quality was measured by level of student agreement with the statement "the teaching is good at this school." All teacher and school quality variables were recoded so that high values indicated high quality and low values indicated low quality.


The decision to employ multiple measures of teacher and school quality throughout the path analyses was dictated by the availability of the measures in the ECLS-K and NELS datasets at each grade level where data were collected. It was not feasible to restrict the path analyses to a single type of school quality measure. An advantage, however, of using multiple measures is that this approach permits a judgment about whether the particular choice of measure affects conclusions about the influence of teacher and school quality on student achievement.


The path diagram in Figure 2 suggests that differences in math achievement upon entry at kindergarten are perpetuated and maintained by differences in the levels of self-efficacy experienced by students. In this model, socioeconomic and sociocultural factors related to race as well as gender contribute to differences in achievement that exist upon entry at kindergarten. It is hypothesized that these differences are associated with differences in the comments, grades, test scores and other cues received by students that magnify and perpetuate differences in self-efficacy and academic achievement throughout the students' academic careers.


Figure 2. Model of Achievement and Self-efficacy


[39_21786.htm_g/00004.jpg]


Notes: K=kindergarten.



In this model, the expectation is that differences in student self-efficacy across students are correlated with race and socioeconomic status, as well as achievement, throughout the students' academic careers, i.e., low-achieving minority students from poor families are more likely than high-achieving white students from middle income families to experience sharp, lengthy declines in self-efficacy over the K–12 years. Therefore, indicators for race and socioeconomic status were intentionally omitted as covariates after the kindergarten time period because it would not be appropriate to include covariates that are highly correlated with student self-efficacy (i.e., in the presence of collinearity).3 The model was explicitly designed to investigate the unconditional influence of student self-efficacy on student achievement and, conversely, the unconditional influence of student achievement on student self-efficacy.4 The appendix addresses hypotheses that race, income, or socioeconomic status might explain why black, Hispanic, low-income, and low-SES children exhibit depressed levels of self-efficacy.


Race and ethnicity data were collected from parent interviews. Math achievement was measured by standardized math assessment scores. Math self-efficacy was measured by level of student agreement with the statement "I am good at math" (grades 3 and 5) or "mathematics is one of respondent's best subjects" (grade 10). These are the only items available in the ECLS-K and NELS datasets that are consistent with Albert Bandura's construct of perceived self-efficacy, defined (in this case) as expectations and convictions that a student can successfully execute the behavior required to solve math problems (Bandura, 1977). Other items measure the extent to which respondents "like" or "look forward to" or "enjoy work" in math. These measures relate to a student's affect, anticipation for the subject of math, and degree of positive emotions, rather than expectations and convictions about whether a student can successfully execute the required behavior. These constructs are different. A student might have positive feelings but lack a strong conviction that he/she can solve math problems. Or, a student might be confident about solving math problems, but not derive pleasure from solving them. Lumping inconsistent measures together would be conceptually problematic and would interfere with the interpretation of the results.


The path diagram in Figure 3 suggests that achievement in grade 5 influences later achievement (in grade 8) by influencing grades received by students in grade 5, levels of academic interest and self-efficacy in grade 5, and the level of effort exerted by students in grade 5. Grades in grade 5 were measured by level of student agreement with the statement "I get good grades in all school subjects." This self-report measure reflects each student's own evaluation of his/her grades and is presumably a better measure for the purpose of evaluating the psychological impact of grades on each student's level of academic interest and self-efficacy. The academic interest and self-efficacy indicator is a composite indicator measuring level of student agreement with statements regarding academic interest ("I like all school subjects," "I enjoy work in all school subjects," "I look forward to all school subjects") and academic self-efficacy ("I am good at all school subjects," "work in all school subjects is easy for me"). This composite indicator was computed by the researcher as the mean of the items comprising the score and was only created if there were valid data on at least four of the five items. This indicator is similar but not identical to a composite indicator created by NCES and labeled "perceived interest/competence in all school subjects." Each student's effort in grade 5 was measured by combining the responses of the student's teacher to two queries ("How often does this child work to the best of her/his ability in reading?" and "How often does this child work to the best of her/his ability in math?") into a single composite measure of each child's level of effort. Each student's achievement in grade 8 was measured by combining each student's standardized math and reading assessment scores. Grade 8 school quality was measured by level of parent agreement with the statement "(child)'s school is a good school."
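The composite-scoring rule described above (take the mean of the five items, but only when at least four have valid responses) can be sketched as follows; the column names and response values are illustrative stand-ins, not the actual ECLS-K variable names.

```python
import numpy as np
import pandas as pd

# Hypothetical item responses on a 1-4 agreement scale; NaN = missing.
# Column names are invented for illustration only.
items = pd.DataFrame({
    "like_subjects":    [4, 3, np.nan, 2],
    "enjoy_work":       [4, np.nan, np.nan, 2],
    "look_forward":     [3, 3, np.nan, 1],
    "good_at_subjects": [4, 2, 1, np.nan],
    "work_is_easy":     [4, 2, np.nan, 2],
})

# Composite = mean of the available items, retained only when at
# least four of the five items have valid responses.
valid_counts = items.notna().sum(axis=1)
composite = items.mean(axis=1).where(valid_counts >= 4)
print(composite.tolist())  # third respondent gets a missing score
```

Under this rule, a respondent with fewer than four valid items receives a missing composite score rather than a mean computed from too little information.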


Figure 3. Grade 8 Achievement Model


[39_21786.htm_g/00006.jpg]




Path coefficients were derived from regressions of outcomes on predictors as indicated in Figure 1, Figure 2, and Figure 3. Dichotomous outcomes were modeled using logistic regression. Continuous measures were standardized to permit comparisons of effect magnitudes. For continuous predictors, each path coefficient corresponds to a one standard deviation change in the corresponding predictor. For dichotomous (yes/no) predictors, each path coefficient corresponds to "yes" values of the predictors. The data, involving complex surveys and non-independent observations, did not permit the use of log-likelihood, Akaike information criterion (AIC) or Bayesian information criterion (BIC) statistics to compare the models in Figures 1 and 2.5 However, the standardized path coefficients are comparable.
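For the continuous-outcome case, the estimation described above can be sketched with simulated data: standardize each measure, regress the outcome on the predictors, and read each slope as the expected change associated with a one-standard-deviation change in that predictor. Variable names and effect sizes here are invented for illustration, not taken from the study's datasets.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Simulated stand-ins for the study's variables: prior achievement,
# a self-efficacy rating, and later achievement.
prior = rng.normal(size=n)
efficacy = 0.3 * prior + rng.normal(size=n)
later = 0.5 * prior + 0.25 * efficacy + rng.normal(size=n)

def standardize(x):
    """Convert a variable to z-scores (mean 0, SD 1)."""
    return (x - x.mean()) / x.std()

# Regress the standardized outcome on the standardized predictors:
# each slope is then a standardized path coefficient, i.e., the
# expected SD change in the outcome per one-SD change in the predictor.
X = np.column_stack([np.ones(n), standardize(prior), standardize(efficacy)])
y = standardize(later)
coefs, *_ = np.linalg.lstsq(X, y, rcond=None)
print(coefs[1:])  # standardized path coefficients for prior and efficacy
```

A dichotomous outcome would instead be modeled with logistic regression, as noted above; this sketch covers only the standardized continuous case.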


RESULTS


The hypothesis that the contribution of teachers to student performance is the strongest factor influencing student achievement is not supported. A stronger factor is the degree to which students believe that they are proficient students. The path coefficients in Figure 4 and Figure 5 indicate that self-efficacy is a stronger predictor of student achievement than school or teacher quality at every level of schooling.6



Figure 4. Model of Achievement, Teacher Quality and School Quality With Standardized Path Coefficients

[39_21786.htm_g/00008.jpg]


Notes: *p < .05. **p < .01. ***p < .001. K=kindergarten. G1=grade 1. G3=grade 3.


Figure 5. Model of Achievement and Self-efficacy With Standardized Path Coefficients


[39_21786.htm_g/00010.jpg]

Notes: *p < .05. **p < .01. ***p < .001. K=kindergarten.



The effect of the grade 1 measure of teacher quality on student achievement in spring of grade 1 is not significantly different from zero. The grade 3 measure of school quality, the grade 8 measure of school quality, and the grade 10 measure of school quality never exhibit path coefficients exceeding .18. The path coefficients bounce up and down and do not demonstrate a consistent pattern. In addition, the level of student achievement in fall kindergarten is not a significant predictor of grade 1 teacher quality. After kindergarten, the level of student achievement modestly predicts school quality. This supports the hypothesis that differences in achievement (presumably associated with socioeconomic status) are associated with residential decisions that influence the quality of the schools attended by students, but suggests that the effect is modest.


In contrast, the path coefficient for the grade 3 measure of self-efficacy on grade 3 achievement equals .17, the path coefficient for the grade 5 measure of self-efficacy on grade 8 achievement equals .25, and the path coefficient for the grade 10 measure of self-efficacy on grade 12 achievement equals .34, with all path coefficients significant at the .001 alpha level. The steady increase in the magnitude of the path coefficients as students advance from grade to grade, and the doubling of the path coefficient from grade 3 to grade 12, are consistent with the hypothesis that grading and testing practices systematically erode student self-efficacy throughout the academic careers of low-achieving students, eroding achievement in a way that maintains, perpetuates, and widens the differential between high- and low-self-efficacy students. In addition, there appears to be a significant feedback effect that increases in magnitude as students advance from grade to grade, tripling between kindergarten and grade 5. The path coefficient for the fall kindergarten measure of student achievement on grade 3 math self-efficacy equals .09. The path coefficient for the grade 3 measure of student achievement on grade 5 self-efficacy equals .27. The path coefficient for the grade 8 measure of student achievement on grade 10 self-efficacy equals .28. Once again, the magnitude of the effects increases as students advance from grade to grade. All of these coefficients are significant at the .001 alpha level. This pattern of results is consistent with the hypothesis that grading and testing practices exert a corrosive effect on student self-efficacy throughout the academic careers of low-achieving students. The effect is magnified because depressed achievement feeds back and further depresses self-efficacy, a downward spiral that strengthens as students advance from grade to grade.
These downward spirals may explain the persistence of the achievement gap despite the best efforts of teachers to address it.


The path coefficients in Figure 6 support this interpretation. Grades have a strong effect (path coefficient equal to .59) on the composite measure of academic interest and self-efficacy. The direction of influence may be inferred from previous research indicating that children enter kindergarten with relatively high levels of academic interest and self-efficacy but something about the interaction of children with the school system causes interest and self-efficacy to decline at an accelerating rate as children advance from grade to grade (Yeh, 2015). This suggests that sociocultural and family influences that exist prior to the point when children enter the school system equip children with relatively high levels of interest and self-efficacy that are then eroded after children enter the school system in kindergarten. This implies that the direction of causation runs from grades to interest/efficacy, not the reverse (otherwise, interest/efficacy would remain high, instead of declining after students enter kindergarten).



Figure 6. Grade 8 Achievement Model With Standardized Path Coefficients


[39_21786.htm_g/00012.jpg]

Notes: *p < .05. **p < .01. ***p < .001. G5=grade 5. G8=grade 8.



The path coefficient relating academic interest/efficacy to effort indicates that the level of student interest/efficacy is related to the level of effort exerted by each student. Presumably, the causal direction runs from interest/efficacy to effort, not the reverse (unless the exertion of effort causes interest and self-efficacy to increase). The components of the composite indicator of interest/efficacy suggest that students who receive high grades tend to like, enjoy, and look forward to, activities in all of their academic subject areas, feel confident about their abilities in those areas and, as a consequence, exert relatively high levels of effort that contribute to achievement three years later in grade 8. Conversely, it appears that students who receive low grades tend to dislike, do not enjoy, and do not look forward to, academic activities, do not feel confident about their academic abilities and, as a consequence, exert relatively low levels of effort that contribute to depressed achievement three years later in grade 8.


The strong association between grades and interest/efficacy, the association between interest/efficacy and effort, and the evidence suggesting that the causal direction runs from grades to interest/efficacy (and presumably from interest/efficacy to effort) suggests that the correlation between grades and effort is explained by the causal effect of grades operating through interest/efficacy on effort, not the reverse.


The direct effect of grades on achievement in grade 8 is .20 SD. The indirect effect of grades on achievement in grade 8 is .59 × .25 × .31, or .05 SD. The total effect size is .20 plus .05, or .25 SD. This is substantially larger than the .14 SD effect size of grade 8 school quality on achievement in grade 8. In addition, the largest immediate influence on achievement in grade 8 is student effort in grade 5. The effect size is .31 SD, more than twice the effect of grade 8 school quality. This suggests that grades and student effort, measured in grade 5 and mediated by level of academic interest and self-efficacy in grade 5, are much more important influences on achievement in grade 8 than school quality measured in grade 8.
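The arithmetic behind these figures follows the standard path-analytic rule that an indirect effect is the product of the coefficients along the mediated path, and the total effect is the sum of the direct and indirect effects:

```python
# Reproducing the effect-size arithmetic reported in the text:
# the mediated path runs grades -> interest/efficacy -> effort ->
# grade 8 achievement, with coefficients .59, .25, and .31.
direct = 0.20                  # grades -> grade 8 achievement
indirect = 0.59 * 0.25 * 0.31  # product along the mediated path
total = direct + indirect

print(round(indirect, 2))  # 0.05
print(round(total, 2))     # 0.25
```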


These results should not be surprising. The Coleman report was the first national study to document the existence of substantial differences in educational achievement between black and white students at every grade level (Coleman et al., 1966). The Coleman report demonstrated that these differences increased as students progressed from first through 12th grade and demonstrated that the strongest predictor of student achievement for black and Hispanic students was a student's perceived control over his or her environment (Coleman et al., 1966, pp. 319, 322). For black and Hispanic students, this factor was stronger than any other school or background variable, including parental education (Coleman et al., 1966, pp. 319, 322). This suggests a psychological explanation for low minority student achievement instead of an explanation that focuses on school or teacher quality.


The results reported here support the hypothesis that students who enter kindergarten performing above their same-age classmates tend to receive grades, test scores, and teacher comments that reinforce student interest in academic activities and feelings of competence and self-efficacy with regard to academic activities throughout the K–12 years. This reinforces and promotes high levels of effort that tend to maintain high levels of achievement. Conversely, students who enter kindergarten performing below their same-age classmates tend to receive grades, test scores, and teacher comments that undermine student interest in academic activities and feelings of competence and self-efficacy with regard to academic activities. This undermines student effort in a way that tends to further depress achievement throughout the K–12 years.


CONCLUSION


In the United States, a disproportionate fraction of students who enter kindergarten performing above their same-age classmates happen to be white and Asian, while a disproportionate fraction of students who enter kindergarten performing below their same-age classmates happen to be black and Hispanic. The results reported here suggest that the combination of this circumstance with the universal practice of grading, testing, and comparing students to their same-age classmates may be sufficient to explain the persistence of the achievement gap throughout the K–12 years.


What explains the persistence of the idea that low achievement is associated with low school quality? Schools where achievement is low are disproportionately characterized by classrooms filled with students who are disengaged, apathetic, and disruptive. Teachers may have difficulty commanding the attention of their pupils. Pupils and teachers may engage in testy exchanges. Discipline tends to be poor. Classroom management tends to be poor. Conversely, schools where achievement is high are characterized by classrooms filled with students who are engaged, on-task, and eager to learn. Teachers have no difficulty commanding the attention of their pupils. Interactions between pupils and teachers tend to be pleasant and cooperative. Discipline tends to be good. Classroom management tends to be good.


A casual observer would have no difficulty in categorizing the first set of schools as "bad" schools and the latter set of schools as "good" schools. Arguably, however, all of the characteristics of "good" and "bad" schools are predictable when "good" schools happen to be filled with students with high self-efficacy and high learning potential, and "bad" schools happen to be filled with students with low self-efficacy and low learning potential. Students with high self-efficacy are engaged, on-task, and eager to learn, and they exhibit high learning potential and large gains in achievement. Students with low self-efficacy are disengaged, disruptive, and apathetic, and they exhibit low learning potential and small gains in achievement. Teachers who teach in classrooms filled with high-self-efficacy students have no difficulty commanding the attention of their pupils, communicating pleasantly, maintaining discipline, managing their classrooms, and raising student achievement. Teachers who teach in classrooms filled with low-self-efficacy students have tremendous difficulty commanding the attention of their pupils, maintaining pleasant communication, maintaining discipline, managing their classrooms, and raising student achievement.


The analysis presented in this article suggests, however, that correlation has been mistaken for causation. The characteristics of "good" schools are associated with high student achievement, and the characteristics of "bad" schools are associated with low student achievement. But it appears that the relationship is not causal, with school or teacher quality driving differences in student achievement. Instead, the characteristics of "good" and "bad" schools, as well as the levels of achievement that characterize them, may be traced to a third factor: the nearly universal practice of grading, testing, and comparing students to their same-age classmates, which demoralizes low-achieving students and triggers disengagement, reduced effort, and reduced learning potential. This translates into depressed achievement, which further reduces self-efficacy, engagement, effort, and future achievement in a downward spiral that magnifies and perpetuates the initial differences in achievement that exist at kindergarten. This process generates large numbers of low-performing students who fill schools in low-income urban areas, and these schools then acquire reputations as bad schools.


The notion that teacher and school quality are the key influences on student achievement has been sustained by research suggesting that the contribution of teachers to student achievement is large and that value-added estimates of teacher contributions predict their students' measured achievement (Rivkin et al., 2005; Rowan et al., 2002; Sanders & Rivers, 1996; Staiger & Rockoff, 2010; Wright et al., 1997). Many researchers accept this evidence at face value.


However, the analysis presented here explains and resolves the puzzling contradictions involving VAM: the existence of large numbers of teachers who exhibit high value-added estimates of their contributions to student achievement; the poor predictive reliability of teacher rankings based on VAM; the failure of measures of race, poverty, and prior achievement to fully adjust and control for student heterogeneity when estimating the value-added contribution of each teacher; and the seemingly impossible finding that VAM estimates predict the achievement of each teacher's students prior to the point where the teacher receives those students. These contradictions are not easily explained in any other way. The capacity to explain these contradictions is powerful evidence supporting the proposed explanation of the achievement gap.


In October 2015, Jesse Rothstein analyzed and responded to Raj Chetty, John Friedman, and Jonah Rockoff's series of arguments and studies defending VAM (Rothstein, 2015). Rothstein's analysis capped five years of private and public communication and debate among the researchers in an attempt to pinpoint the source of the differences in their assessments of the reliability and validity of using VAM to estimate the contribution of individual teachers to student achievement. Rothstein concluded that none of the arguments and evidence presented by Chetty et al. alters the conclusion that VAM-based estimates of teacher quality are biased, unreliable, and invalid:


My results are sufficient to re-open the question of whether high-value added elementary teachers have substantial causal effects on their students’ long-run outcomes . . . [there is] no strong basis for conclusions about the long-run effects of high- vs. low-value added teachers, which in the most credible estimates are not distinguishable from zero (Rothstein, 2015, p. 32).


The most credible VAM-based estimates of the contribution of individual teachers to student achievement "are not distinguishable from zero" and any statement, based on VAM, that teachers make significant contributions to student achievement is open to question. Rothstein's analysis challenges not only the Chetty et al. analysis, but also the previous body of studies, based on VAM, suggesting that teachers make significant contributions to student achievement (Rivkin, Hanushek, & Kain, 2005; Rowan, Correnti, & Miller, 2002; Sanders & Rivers, 1996; Staiger & Rockoff, 2010; Wright, Horn, & Sanders, 1997). The conclusion that VAM is flawed raises serious questions about the assertion that teachers make significant contributions to student achievement.


What this suggests is a need for a fundamental reordering of current ideas about the key factors influencing student achievement and the gap in achievement. It suggests a need to rethink the best approaches for addressing the gap. Most importantly, it suggests a need to reconsider the idea that when student achievement is low, the cause is bad schools and bad teachers.




APPENDIX


Models of achievement typically include controls for race. This controls for the possibility that the effects of other covariates depend on race. For example, if black and Hispanic children are born with levels of self-efficacy that are depressed relative to the self-efficacy of white children, this might explain why black and Hispanic children exhibit depressed levels of self-efficacy and depressed levels of achievement. Alternatively, if black and Hispanic families raise their children in ways that cause self-efficacy to be depressed, this might explain why black and Hispanic children exhibit depressed levels of self-efficacy and depressed levels of achievement.


However, it appears that black and Hispanic children enter the school system with higher—not lower—levels of self-efficacy, compared to white children, but their levels of self-efficacy are depressed and fall below the level for white children after entry into the school system (see Figures A.1 and A.2). This contradicts the hypothesis that black and Hispanic children are born with levels of self-efficacy that are depressed relative to the self-efficacy of white children. In addition, this contradicts the hypothesis that black and Hispanic families raise their children in ways that cause self-efficacy to be depressed.



Figure A.1. Self-efficacy of black children falls below self-efficacy of white children after grade 3.

[39_21786.htm_g/00014.jpg]


Figure A.2. Self-efficacy of Hispanic children falls below self-efficacy of white children after grade 3

[39_21786.htm_g/00016.jpg]




Similarly, models of achievement typically include controls for level of income or socioeconomic status. This controls for the possibility that the effects of other covariates depend on the level of income or socioeconomic status. For example, if low-income children (family income below the poverty level) are born with levels of self-efficacy that are depressed relative to the self-efficacy of middle-income children (family income above the poverty level), this might explain why low-income children exhibit depressed levels of self-efficacy and depressed levels of achievement. Similarly, if low-SES children (who fall in the bottom quintile of the SES distribution) are born with levels of self-efficacy that are depressed relative to the self-efficacy of high-SES children (who fall in the top four quintiles of the SES distribution), this might explain why low-SES children exhibit depressed levels of self-efficacy and depressed levels of achievement. Alternatively, if low-income families raise their children in ways that cause self-efficacy to be depressed, this might explain why low-income children exhibit depressed levels of self-efficacy and depressed levels of achievement. Similarly, if low-SES families raise their children in ways that cause self-efficacy to be depressed, this might explain why low-SES children exhibit depressed levels of self-efficacy and depressed levels of achievement.


However, it appears that low-income children enter the school system with higher—not lower—levels of self-efficacy, compared to middle-income children, but their levels of self-efficacy are depressed and fall below the level for middle-income children after entry into the school system (see Figure A.3). Similarly, low-SES children enter the school system with higher—not lower—levels of self-efficacy, compared to high-SES children, but their levels of self-efficacy are depressed and fall below the level for high-SES children after entry into the school system (see Figure A.4).


This contradicts the hypothesis that low-income (or low-SES) children are born with levels of self-efficacy that are depressed relative to the self-efficacy of middle-income (or high-SES) children. In addition, this contradicts the hypothesis that low-income (or low-SES) families raise their children in ways that cause self-efficacy to be depressed.


Figure A.3. Self-efficacy of low-income children falls below self-efficacy of middle-income children after grade 3

[39_21786.htm_g/00018.jpg]


Figure A.4. Self-efficacy of low SES children falls below self-efficacy of high SES children after grade 3

[39_21786.htm_g/00020.jpg]




Notes


1. When VAM is used to categorize teachers into top-quintile and bottom-quintile teachers, the result is highly unstable. If 100 teachers are categorized as "high" and 100 teachers are categorized as "low" at time t, the studies cited here imply that when the VAM ranking procedure is repeated at time t+1, fewer than 50 of the 100 teachers who were categorized as "high" at time t are categorized as "high" at time t+1, and fewer than 50 of the 100 teachers who were categorized as "low" at time t are categorized as "low" at time t+1.
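This instability can be illustrated with a small simulation. The year-to-year correlation of 0.3 is an assumed, illustrative value (in the spirit of the low stability reported in the studies cited here), not a figure from this article:

```python
# Hypothetical sketch: simulate two years of noisy VAM estimates with a
# modest year-to-year correlation (r = 0.3, an assumed value) and count how
# many top-quintile teachers remain top-quintile the following year.
import math
import random

random.seed(0)
r = 0.3
n = 100_000  # simulated teachers

year1 = [random.gauss(0, 1) for _ in range(n)]
year2 = [r * z + math.sqrt(1 - r * r) * random.gauss(0, 1) for z in year1]

cut1 = sorted(year1)[int(0.8 * n)]  # 80th-percentile cutoff, year 1
cut2 = sorted(year2)[int(0.8 * n)]  # 80th-percentile cutoff, year 2

top1 = [i for i in range(n) if year1[i] >= cut1]
stay = sum(1 for i in top1 if year2[i] >= cut2)
print(round(stay / len(top1), 2))  # roughly one-third stay "high" -- fewer than half
```

Under these assumptions, only about a third of "high" teachers would be classified as "high" again the following year, consistent with the pattern described above.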


2. NCES did not report the reliability of the measures used in the current study. However, a sensitivity analysis may be performed that explores the sensitivity of the results to a range of plausible values for measurement reliability. This analysis suggests that the results reported here are not sensitive to measurement reliability. Spearman's formula for disattenuating sample correlations is:

ρ̂xy = rxy / √(rxx · ryy)

where:

ρ̂xy = estimated population correlation of true scores

rxy = observed sample correlation

rxx = reliability of variable x

ryy = reliability of variable y

With regard to parent estimates of school quality, plausible values of item reliabilities are suggested by a survey involving responses from 3,948 District of Columbia Public School parents that yielded internal reliabilities of survey items for each section of the survey ranging from .69 to .90 (Tuck, 1995). Using these figures, Table 1 reports disattenuated correlations for the path coefficients for school quality on math achievement (see Figure 4). Using the low (.69) estimate of reliability for the school quality measure, the values of the disattenuated correlations for school quality on math achievement range from 0.18 to 0.23. Table 2 reports disattenuated correlations for the path coefficients for self-efficacy on math achievement (see Figure 5). Using the high (.9) estimate of reliability, the values of the disattenuated correlations for self-efficacy on math achievement range from 0.19 to 0.38. Even under unfavorable assumptions regarding measurement reliability (low reliability of the school quality measure and high reliability of the self-efficacy measure), the average effect of self-efficacy on achievement remains stronger than the average effect of school quality on achievement, and the effect of self-efficacy on achievement doubles between grade 3 and grade 12.


Table 1. Disattenuated Correlations for Path Coefficients for School Quality on Math Achievement Derived from Figure 4*

rxy     low reliability of x       high reliability of x
        (rxx = .69; ryy = .9)      (rxx = ryy = .9)

0.16    0.20                       0.18
0.18    0.23                       0.20
0.14    0.18                       0.16
mean    0.20                       0.18

*Based upon the range of item reliabilities (.69–.9) reported by Tuck (1995).



Table 2. Disattenuated Correlations for Path Coefficients for Self-Efficacy on Math Achievement Derived from Figure 5*

rxy     low reliability of x       high reliability of x
        (rxx = .69; ryy = .9)      (rxx = ryy = .9)

0.17    0.22                       0.19
0.25    0.32                       0.28
0.34    0.43                       0.38
mean    0.32                       0.28

*Based upon the range of item reliabilities (.69–.9) reported by Tuck (1995).
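As a check, the disattenuated values in Tables 1 and 2 can be reproduced directly from Spearman's formula, using the Tuck (1995) reliabilities:

```python
# Spearman disattenuation: rho_hat = r_xy / sqrt(r_xx * r_yy),
# using the item reliabilities (.69 and .9) reported by Tuck (1995).
import math

def disattenuate(r_xy, r_xx, r_yy):
    """Estimated true-score correlation from an observed correlation."""
    return r_xy / math.sqrt(r_xx * r_yy)

# School quality -> math achievement (Table 1), low-reliability case
print([round(disattenuate(r, 0.69, 0.90), 2) for r in (0.16, 0.18, 0.14)])
# -> [0.2, 0.23, 0.18]

# Self-efficacy -> math achievement (Table 2), high-reliability case
print([round(disattenuate(r, 0.90, 0.90), 2) for r in (0.17, 0.25, 0.34)])
# -> [0.19, 0.28, 0.38]
```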


3. For example, the inclusion of race dummies as covariates would cause the coefficient for self-efficacy to drop to zero in a model predicting achievement if the only students who exhibit low self-efficacy and low achievement are black and Hispanic students and the only students who exhibit high self-efficacy and high achievement are white and Asian students. The expectation and presence of collinearity dictate that race be excluded from this model if the guiding theory is that grading practices depress black and Hispanic student self-efficacy, depressing effort and achievement, and if the purpose of the analysis is to estimate the relationship between levels of self-efficacy and levels of achievement. Note that race dummies are included in the path analyses as exogenous predictors of achievement upon entry at kindergarten.


Similarly, it would not be appropriate to include school quality or teacher quality as a covariate along with self-efficacy in a model predicting achievement if the only students exhibiting low self-efficacy and low achievement are minority students who attend urban schools (or are taught by urban teachers) rated low in quality, and the only students exhibiting high self-efficacy and high achievement are white students who attend suburban schools (or are taught by suburban teachers) rated high in quality. The expectation and presence of collinearity dictate that school quality and teacher quality be excluded as predictors if the guiding theory is that grading and testing practices depress the self-efficacy, effort, and achievement of minority students concentrated in urban schools that are subsequently rated low in quality, while the same practices tend to maintain high levels of self-efficacy, effort, and achievement among white students concentrated in suburban schools that are subsequently rated high in quality, and if the purpose of the analysis is to estimate the relationship between levels of self-efficacy and levels of achievement.


The appropriate strategy is the strategy that is adopted here, i.e., to estimate separate models for each of the two theories regarding the persistence of the achievement gap without inserting extraneous predictors that are expected to be highly correlated with the predictors of interest, and then compare the magnitudes of the path coefficients.

4. It might be argued that prior achievement should be included as a covariate in any model predicting student achievement. For example, a student who performs poorly with regard to addition and subtraction does not have the foundation to advance to multiplication and division, and so forth. But this raises the question of why the student is performing poorly with regard to addition and subtraction in the first place. The array of assessments administered to students starting at entry into kindergarten quickly reveals areas of weakness. Teachers are well informed about those areas of weakness and have ample opportunities to address them.


When students continue to perform poorly despite attention from teachers, one possible explanation is that the quality of teaching is poor. However, significant resources and attention have been devoted to improving the quality of teaching, with disappointing results. Furthermore, as explained in the review of literature included in the current article, value-added measures of teacher quality are highly unreliable, and the statistical approach used to calculate teacher rankings appears to be invalid. The lack of reliability suggests that existing VAM models are based on incorrect assumptions and an incorrect understanding of the sources of poor performance. There is a need to re-examine those assumptions and consider alternative views.


When students continue to perform poorly despite attention from teachers, a second possible explanation is that low-achieving students are subjected throughout their academic careers to a steady diet of negative comments, grades, test scores, and other cues that trigger and reinforce declines in self-efficacy that trigger reduced effort and achievement. The current study employs path analysis to compare both explanations.


The inclusion of prior achievement as a covariate in any statistical model predicting student achievement is problematic because it dominates the influence of other predictors. One might argue that the data simply indicate the need to drop other, insignificant predictors. However, there are two problems. First, any analysis suggesting that achievement is mainly a function of prior achievement offers little insight into factors that mediate the relationship between prior achievement and later achievement. Second, the effect of prior achievement may simply represent the cumulative effect of a factor (such as erosion of self-efficacy) that is slow and steady but is dominated in any statistical model that includes prior achievement as a covariate.


To draw an analogy, a build-up of corrosion can cause an engine to malfunction. The immediate cause of the malfunction is corrosion. If a statistical analysis were to be performed involving cases of corrosion, an indicator of the level of corrosion would dominate other factors, such as the level of moisture seeping into engine components during periods when the engine is not used. However, in these cases the root cause of engine malfunction is moisture, not corrosion.


The current analysis seeks to compare and contrast two possible explanations of the root cause of the persistent achievement gap. Prior achievement is a proximate factor that strongly predicts current achievement, but it is not the root cause of current differences in achievement.


5. Various indices are available to compare nested models, where one parent model includes all of the predictors specified in comparison models that contain a subset of the predictors. However, Figure 1 and Figure 2 describe two non-nested models. It is not appropriate to compare them to a parent model that includes both sets of predictors because indicators of school and teacher quality are expected to be highly correlated with indicators of student self-efficacy, implying collinearity. Recall that Figure 1 describes a model where school and teacher quality are expected to be correlated with race and socioeconomic status, and Figure 2 describes a model where student self-efficacy is expected to be correlated with race and socioeconomic status. The key predictors in the two models are therefore expected to be correlated with each other, so it would not be appropriate to include them in the same statistical model.
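The collinearity concern in these notes can be made concrete with the closed-form OLS coefficients for two standardized predictors. The correlations used here (r12 = .98, ry1 = .50, ry2 = .50 or .52) are assumed, illustrative values, not estimates from this study:

```python
# With standardized y, x1, x2, the OLS coefficient on x1 is
#   b1 = (r_y1 - r12 * r_y2) / (1 - r12**2).
# When x1 and x2 correlate at .98 (assumed value), a shift of .02 in one
# observed correlation flips the sign of b1 -- the instability that
# motivates estimating the two theories in separate models.
def beta1(r_y1, r_y2, r12):
    return (r_y1 - r12 * r_y2) / (1 - r12 ** 2)

print(round(beta1(0.50, 0.50, 0.98), 2))  # 0.25
print(round(beta1(0.50, 0.52, 0.98), 2))  # -0.24 (sign flip)
```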


References


Aaronson, D., Barrow, L., & Sander, W. (2007). Teachers and student achievement in the Chicago public high schools. Journal of Labor Economics, 25(1), 95–135.


Ballou, D. (2005). Value-added assessment: Lessons from Tennessee. In R. Lissetz (Ed.), Value added models in education: Theory and applications (pp. 1–26). Maple Grove, MN: JAM Press.


Bandura, A. (1977). Self-efficacy:  Toward a unifying theory of behavior change. Psychological Review, 84, 191–215.


Bangert-Drowns, R. L., Kulik, C. C., Kulik, J. A., & Morgan, M. (1991). The instructional effect of feedback in test-like events. Review of Educational Research, 61(2), 213–238.


Bayer, P., & McMillan, R. (2012). Tiebout sorting and neighborhood stratification. Journal of Public Economics, 96(11-12), 1129–1143.


Blankenship, V. (1992). Individual differences in resultant achievement motivation and latency to and persistence at an achievement task. Motivation and Emotion, 16(1), 35–63.


Bond, L., Smith, T., Baker, W. K., & Hattie, J. A. (2000). The certification system of the National Board for Professional Teaching Standards: A construct and consequential validity study. Greensboro, NC: Center for Educational Research and Evaluation, The University of North Carolina at Greensboro.


Braun, H., Chudowsky, N., & Koenig, J. (Eds.). (2010). Getting value out of value-added: Report of a workshop. Washington, DC: The National Academies Press.


Cavalluzzo, L. (2004). Is National Board certification an effective signal of teacher quality? Retrieved from http://www.nbpts.org/sites/default/files/documents/research/Cavalluzzo_IsNBCAnEffectiveSignalofTeachingQuality.pdf


Clotfelter, C. T., Ladd, H. F., & Vigdor, J. L. (2006). Teacher-student matching and the assessment of teacher effectiveness. Journal of Human Resources, 41(4), 778–820.


Clotfelter, C. T., Ladd, H. F., & Vigdor, J. L. (2007a). How and why do teacher credentials matter for student achievement? Washington, DC: National Center for Analysis of Longitudinal Data in Education Research.


Clotfelter, C. T., Ladd, H. F., & Vigdor, J. L. (2007b). Teacher credentials and student achievement: Longitudinal analysis with student fixed effects. Economics of Education Review, 26(6), 673–682.


Coleman, J. S., Campbell, E., Hobson, C., McPartland, J., Mood, A., Weinfeld, R., & York, R. (1966). Equality of educational opportunity. Washington, DC: Government Printing Office.


Crooks, T. J. (1988). The impact of classroom evaluation practices on students. Review of Educational Research, 58, 438–481.


Cunningham, G. K., & Stone, J. E. (2005). Value-added assessment of teacher quality as an alternative to the National Board for Professional Teaching Standards: What recent studies say. In R. Lissitz (Ed.), Value added models in education: Theory and applications (pp. 209–232). Maple Grove, MN: JAM Press.


Danner, F. W., & Lonsky, D. (1981). A cognitive-developmental approach to the effects of rewards on intrinsic motivation. Child Development, 52, 1043–1052.


Darity, W. (2005). Stratification economics: The role of intergroup inequality. Journal of Economics and Finance, 29(2), 144–153.


De Jong, R., & Westerhof, K. J. (2001). The quality of student ratings of teacher behavior. Learning Environments Research, 4(1), 51–85.


Dillon, S. (2010, September 1). Formula to grade teachers' skill gains acceptance, and critics. The New York Times, pp. A1, A3.


Dougherty, M. R., & Harbison, J. I. (2007). Motivated to retrieve: How often are you willing to go back to the well when the well is dry? Journal of Experimental Psychology: Learning, Memory, and Cognition, 33(6), 1108–1117.


Drucker, P. M., Drucker, D. B., Litto, T., & Stevens, R. (1998). Relation of task difficulty to persistence. Perceptual and Motor Skills, 86, 787–794.


Dweck, C. (1986). Motivational processes affecting learning. American Psychologist, 41, 1040–1048.


Dweck, C. S., & Elliott, E. S. (1983). Achievement motivation. In E. M. Hetherington & P. H. Mussen (Eds.), Handbook of child psychology, Vol. 4: Socialization, personality, and social development (pp. 643–691). New York: Wiley.


Education Commission of the States. (2006). Synthesis of reviews of "The value-added achievement gains of NBPTS-certified teachers in Tennessee: A brief report". Retrieved from http://www.ecs.org/html/special/nbpts/PanelReport.htm


Educational Testing Service. (2004). Where we stand on teacher quality. Retrieved from https://www.ets.org/Media/Education_Topics/pdf/teacherquality.pdf


Fuchs, L. S., & Fuchs, D. (1986). Effects of systematic formative evaluation: A meta-analysis. Exceptional Children, 53(3), 199–208.


Gitomer, D. (2008). Reliability and NBPTS assessments. In L. Ingvarson & J. Hattie (Eds.), Assessing teachers for professional certification: The first decade of the National Board for Professional Teaching Standards (pp. 231–253). Amsterdam: Elsevier.


Goldhaber, D., & Anthony, E. (2007). Can teacher quality be effectively assessed? National Board certification as a signal of effective teaching. Review of Economics and Statistics, 89(1), 134–150.


Goldhaber, D., & Hansen, M. (2008). Assessing the potential of using value-added estimates of teacher job performance for making tenure decisions. Washington, DC: National Center for Analysis of Longitudinal Data in Education Research.


Goldhaber, D., & Hansen, M. (2012). Is it just a bad class? Assessing the long-term stability of estimated teacher performance. Washington, DC: National Center for Analysis of Longitudinal Data in Education Research.


Goldhaber, D., Perry, D., & Anthony, E. (2003). NBPTS certification: Who applies and what factors are associated with success? Seattle, WA: University of Washington.


Gordon, R., Kane, T. J., & Staiger, D. O. (2006). Identifying effective teachers using performance on the job. Washington, DC: The Brookings Institution.


Haertel, E. H. (2009, October 5). Letter Report to the U.S. Department of Education on the Race to the Top Fund. Retrieved from https://download.nap.edu/catalog.php?record_id=12780


Hanushek, E. A. (2009). Teacher deselection. In D. Goldhaber & J. Hannaway (Eds.), Creating a new teaching profession (pp. 165–180). Washington, DC: Urban Institute Press.


Harris, D. N., & Sass, T. R. (2007). The effects of NBPTS-certified teachers on student achievement. Madison, WI: University of Wisconsin.


Harter, S. (1978). Pleasure derived from optimal challenge and the effects of extrinsic rewards on children's difficulty level choices. Child Development, 49, 788–799.


Jacob, B. A., & Lefgren, L. (2008). Can principals identify effective teachers? Evidence on subjective performance evaluation in education. Journal of Labor Economics, 26(1), 101–136.


Kennelly, K. J., Dietz, D., & Benson, P. (1985). Reinforcement schedules, effort vs. ability attributions, and persistence. Psychology in the Schools, 22(4), 459–464.


Kluger, A. N., & DeNisi, A. (1996). The effects of feedback interventions on performance: A historical review, a meta-analysis, and a preliminary feedback intervention theory. Psychological Bulletin, 119(2), 254–284.

Koedel, C., & Betts, J. R. (2007). Re-examining the role of teacher quality in the educational production function. Columbia, MO: University of Missouri.


Koedel, C., & Betts, J. R. (2011). Does student sorting invalidate value-added models of teacher effectiveness? An extended analysis of the Rothstein critique. Education Finance and Policy, 6(1), 18–42.


Ladd, H. F., Sass, T. R., & Harris, D. N. (2007, February 28). The impact of National Board certified teachers on student achievement in Florida and North Carolina: A summary of the evidence prepared for the National Academies Committee on the evaluation of the impact of teacher certification by NBPTS. Retrieved from http://www7.nationalacademies.org/BOTA/NBPTS-MTG4-Sass-paper.pdf


Lefgren, L., & Sims, D. (2012). Using subject test scores efficiently to predict teacher value-added. Educational Evaluation and Policy Analysis, 34(1), 109–121.


LoGerfo, L., Nichols, A., & Reardon, S. (2006). Achievement gains in elementary and high school. Washington, DC: Urban Institute.


Mac Iver, D. J., Stipek, D. J., & Daniels, D. H. (1991). Explaining within-semester changes in student effort in junior high school and senior high school courses. Journal of Educational Psychology, 83(2), 201–211.


McCaffrey, D. F., Sass, T. R., Lockwood, J. R., & Mihaly, K. (2009). The intertemporal variability of teacher effect estimates. Education Finance and Policy, 4(4), 572–606.


McColskey, W., Stronge, J. H., Ward, T. J., Tucker, P. D., Howard, B., Lewis, K., & Hindman, J. L. (2005). Teacher effectiveness, student achievement, and National Board certified teachers: A comparison of National Board certified teachers and non-National Board certified teachers: Is there a difference in teacher effectiveness and student achievement? Retrieved from http://www.education-consumers.com/articles/W-M%20NBPTS%20certified%20report.pdf


National Board for Professional Teaching Standards. (2015a). About us. Retrieved from http://www.nbpts.org/who-we-are


National Board for Professional Teaching Standards. (2015b). Guide to National Board Certification. Retrieved from http://boardcertifiedteachers.org/sites/default/files/Guide_to_NB_Certification.pdf


National Center for Education Statistics. (2015a). Early childhood longitudinal program (ECLS): Kindergarten class of 1998-99 (ECLS-K). Retrieved from http://nces.ed.gov/ecls/kindergarten.asp


National Center for Education Statistics. (2015b). Early childhood longitudinal program (ECLS): Kindergarten class of 2010-11 (ECLS-K:2011). Retrieved from https://nces.ed.gov/ecls/kindergarten2011.asp


National Center for Education Statistics. (2015c). National education longitudinal study of 1988 (NELS: 88). Retrieved from http://nces.ed.gov/surveys/nels88/index.asp


Nunnery, J. A., Ross, S. M., & McDonald, A. (2006). A randomized experimental evaluation of the impact of Accelerated Reader/Reading Renaissance implementation on reading achievement in grades 3 to 6. Journal of Education for Students Placed at Risk, 11(1), 1–18.


Nygard, R. (1977). Personality, situation, and persistence: A study with emphasis on achievement motivation. Oslo: Universitetsforlaget.


Rivkin, S. G., Hanushek, E. A., & Kain, J. F. (2005). Teachers, schools and academic achievement. Econometrica, 73(2), 417–458.


Ross, S. M., Nunnery, J., & Goldfeder, E. (2004). A randomized experiment on the effects of Accelerated Reader/Reading Renaissance in an urban school district: Final evaluation report. Memphis, TN: Center for Research in Educational Policy, The University of Memphis.


Rothstein, J. (2008). Teacher quality in educational production: Tracking, decay, and student achievement (NBER Working Paper No. 14442). Cambridge, MA: National Bureau of Economic Research.


Rothstein, J. (2009). Student sorting and bias in value-added estimation: Selection on observables and unobservables. Education Finance and Policy, 4(4), 537–571.


Rothstein, J. (2010). Teacher quality in educational production: Tracking, decay, and student achievement. Quarterly Journal of Economics, 125(1), 175–214.


Rothstein, J. (2015). Revisiting the impacts of teachers. Goldman School of Public Policy and Department of Economics. University of California, Berkeley. Retrieved from http://eml.berkeley.edu/~jrothst/workingpapers/rothstein_cfr_oct2015.pdf


Rowan, B., Correnti, R., & Miller, R. J. (2002). What large-scale survey research tells us about teacher effects on student achievement: Insights from the Prospects study of elementary schools. Teachers College Record, 104, 1525–1567.


Ryan, R. M., Mims, V., & Koestner, R. (1983). The relationship of reward contingency and interpersonal context to intrinsic motivation: A review and test using cognitive evaluation theory. Journal of Personality and Social Psychology, 45, 736–750.


Sanders, W. L., Ashton, J. J., & Wright, S. P. (2005). Comparison of the effects of NBPTS certified teachers with other teachers on the rate of student academic progress. Cary, NC: SAS Institute, Inc.


Sanders, W. L., & Rivers, J. C. (1996). Cumulative and residual effects of teachers on future student academic achievement. Knoxville, TN: University of Tennessee Value-Added Research Center.


Skinner, E. A., Zimmer-Gembeck, M. J., Connell, J. P., Eccles, J. S., & Wellborn, J. G. (1998). Individual differences and the development of perceived control. Monographs of the Society for Research in Child Development, 63(2/3), 1–231.


Staiger, D. O., & Rockoff, J. E. (2010). Searching for effective teachers with imperfect information. Journal of Economic Perspectives, 24(3), 97–118.


Stone, J. E. (2002). The value-added achievement gains of NBPTS-certified teachers in Tennessee: A brief report. Retrieved from http://www.education-consumers.com/briefs/stoneNBPTS.shtm


Tuck, K. D. (1995). Parent satisfaction and information (a customer satisfaction survey). Washington, DC: District of Columbia Public Schools, Office of Educational Accountability, Assessment and Information.


U.S. Department of Education. (2012). Teacher Incentive Fund. Retrieved from http://www2.ed.gov/programs/teacherincentive/index.html


U.S. Department of Education. (2015). New No Child Left Behind flexibility: Highly qualified teachers. Retrieved from http://www2.ed.gov/nclb/methods/teachers/hqtflexibility.html


Vandevoort, L. G., Amrein-Beardsley, A., & Berliner, D. C. (2004). National Board certified teachers and their students’ achievement. Education Policy Analysis Archives, 12(46). Retrieved from http://epaa.asu.edu/ojs/article/view/201


Wright, S. P., Horn, S. P., & Sanders, W. L. (1997). Teacher and classroom context effects on student achievement: Implications for teacher evaluation. Journal of Personnel Evaluation in Education, 11, 57–67.


Yeh, S. S. (2010a). The cost-effectiveness of 22 approaches for raising student achievement. Journal of Education Finance, 36(1), 38–75.


Yeh, S. S. (2010b). The cost-effectiveness of NBPTS teacher certification. Evaluation Review, 34(3), 220–241.


Yeh, S. S. (2015). Two models of learning and achievement: An explanation for the achievement gap? Teachers College Record, 117(12).


Ysseldyke, J., & Bolt, D. M. (2007). Effect of technology-enhanced continuous progress monitoring on math achievement. School Psychology Review, 36(3), 453–467.


Cite This Article as: Teachers College Record, Volume 119, Number 6, 2017, pp. 1–42. https://www.tcrecord.org ID Number: 21786

About the Author

Stuart S. Yeh is Associate Professor of Evaluation Studies at the University of Minnesota. He is the author of Solving the Achievement Gap: Overcoming the Structure of School Inequality (Palgrave, 2017).
 
Member Center
In Print
This Month's Issue

Submit
EMAIL

Twitter

RSS