
Non-Evidence about Tracking: Critiquing the New Report from the Fordham Institute

by Kevin G. Welner - December 14, 2009

A new report authored by Tom Loveless and published by the Fordham Institute misleads in an attempt to convince policymakers to maintain tracking policies. The report combines weak data with questionable analyses to manufacture a flawed argument against detracking.


The issue of tracking and detracking is getting considerable education policy attention. This past Thursday, the Fordham Institute released a report by Brookings scholar Tom Loveless called “Tracking and Detracking: High Achievers in Massachusetts Middle Schools.” In a nutshell, the report concludes that non-tracked Massachusetts middle schools have fewer advanced students in math than do tracked schools. But the report is being characterized as more broadly supporting the claim that “students do better in school when they are separated into groups based on their achievement.” Teachers College Record is also currently highlighting featured articles by Jeannie Oakes and Beth Rubin concerning tracking and detracking – articles that present data and conclusions very much in tension with Dr. Loveless’ perspective.

Now, on the heels of the Loveless report comes one that I co-authored, “Universal Access to a Quality Education: Research and Recommendations for the Elimination of Curricular Stratification” published by the Education and the Public Interest Center (CU Boulder) and the Education Policy Research Unit (ASU). We use three case studies to document successful tracking reform, highlight lessons, and offer recommendations for changing policy and practice. The final section of the brief presents model statutory code language that can be used by state legislators seeking to implement the recommendations set forth in the brief.

This commentary focuses on a critique of the Fordham report. The piece is worthy of this attention in part because of the importance of tracking reform. Moreover, Loveless is probably the country’s foremost opponent of “detracking.” He has written extensively about the threat that he sees non-tracked classrooms posing for high-achieving math students. In this recent report, he follows up on two surveys of principals that he conducted, in 1995 and 2005, and he combines school-level tracking information with school-level ratings information – the percentage of students labeled as “Advanced” on the Massachusetts state assessment.

The Fordham study is preceded by a Foreword from Chester Finn and Amber Winkler, presumably speaking for the Fordham Institute. If anything, the tone and policy advocacy in the Foreword are even stronger than those found in the main body of the report. This surprised me, given Fordham’s long-time advocacy of strong standards and accountability policies. Whatever one may think of Dr. Loveless’ study, the body of research about low-track classes is crystal clear: students in those classes will tend to fall further behind. So it seems that, as a practical matter, Fordham is advocating rigorous standards and rigorous accountability, along with non-rigorous classes.

The key findings of the Fordham report are (as stated by Fordham):

• Tremendous change has occurred in tracking since the 1990s. The eighth-grade student of twenty years ago attended tracked classes for most of the day. Today’s eighth grader is primarily in detracked classes where students of all achievement levels are grouped together.

• Middle schools with more tracks have significantly more math pupils performing at the advanced and proficient levels on state tests …. And when socioeconomic status is held constant, each additional track in eighth-grade math is associated with a 3 percentage-point rise in students scoring at the advanced level. In other words, a school with 200 eighth graders that offers at least three levels of math is typically attended by twelve more students scoring at the advanced level than a detracked school of similar size and socioeconomic status.

• Detracking is popular in high-poverty schools. Urban schools serving mostly poor children are more likely to have diminished or abolished tracking while suburban schools serving children from more prosperous backgrounds are more apt to have retained it.
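
The arithmetic behind the example in the second bullet can be checked with a short sketch. It assumes, as the bullet's wording implies, that a detracked school counts as one level and that a school with "at least three levels" counts as three, so the two differ by two additional tracks. The variable names are illustrative and do not come from the report.

```python
# Figures quoted in the Fordham finding
eighth_graders = 200
points_per_track = 3  # percentage points of students scoring Advanced per additional track

# Assumption: "detracked" = 1 level, "at least three levels" = 3 levels,
# so the comparison involves 2 additional tracks.
additional_tracks = 3 - 1

# Extra Advanced students implied for the three-level school
extra_advanced = eighth_graders * points_per_track * additional_tracks / 100

# Same calculation for a single additional track (untracked versus two tracks)
extra_advanced_one_track = eighth_graders * points_per_track * 1 / 100

print(extra_advanced, extra_advanced_one_track)
```

Under these assumptions the sketch reproduces the "twelve more students" in the quoted finding, and it shows that a single additional track would correspond to six students in a school of that size.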

The Achievement Finding

As discussed below, each of these three findings is flawed – particularly the second, where this critique will concentrate. As noted at the outset, it is this purported “3 percentage-point rise” that has garnered media attention.

But there’s no “there” there. The publishers of this report seem to want it to say something that it doesn’t. In fact, the report presents data and analyses that are, by the author’s own admission, not very strong. Dr. Loveless concludes that the report “offers qualified support for the hypothesis that tracking policy is associated with achievement.” But, as discussed below, even this is an overstatement. The report combines weak data with questionable analyses to manufacture an argument against detracking.

Looking beyond the report’s rhetoric, there’s nothing in the data to argue against detracking. Dr. Loveless has tried to spin the results to say otherwise, and he offers a simple regression analysis (results presented in his Table 13) to support his claim. But – as explained below – the regression really only tells us about the distinction between schools with two tracks and those with three or more tracks, and it reports the reality that suburban schools tend to have high-scoring students. None of this is particularly useful for policymakers.

Loveless went to great pains to avoid writing that his study was asking about the “effects” of tracking and detracking on high achievers. Over and over, he reminds us that the study data and design don’t allow for answering causal questions. Instead, he writes about “associations” between tracking and test scores. He’s correct, of course – the study really can’t say much more than that. But one can understand why some readers would still walk away from the report confused by some of the statements in the report and the Foreword (“An analysis such as this cannot prove that tracking is the direct cause of the observable difference in schools. But the association is clear: More tracks, more high-performing kids and fewer failures. Fewer tracks, fewer high-performing kids and more failures.”) Unfortunately, it’s likely that many readers will jump to causal conclusions because of the overall tenor.

But, in fact, contrary to this assertion in the Foreword, the association itself is not clear at all, even using the data and results presented in this report. By Loveless’ own admission, it’s not clear for language arts (English) classes. And it’s really not clear for math, either.

To understand a key analytic weakness of the Fordham report, see Table 12 on page 32 of the report (replicated below). These are results prior to controlling for free- and reduced-price lunch (FRL) recipients in each school. That is, these are the raw percentages for each of the categories (Advanced, Proficient, etc.). The 17 schools with non-tracked (heterogeneous) 8th grade math classes have 15.8% who score in the advanced range. The others have 18.6% (2 tracks) and 26.6% (3+ tracks) in the advanced range.

Loveless’ Table 12: Distribution of Achievement in 8th Grade Math

(n = 126 out of 128)

[Table 12 body not recoverable. Surviving labels: “Number of math tracks” and “Needs Improvement.” Significance note: *p < .05, **p < .01.]

Although the report does not provide specifics, we know that the heterogeneously grouped schools tend to have a higher percentage of FRL children. The report says as much, and Table 13 (replicated below) also shows this, with the ‘No Controls’ result showing a 6.05 percentage point effect, dropping to 2.98 after ‘Lunch’ is added.

Loveless’ Table 13: Short-term Association. Predicting Percent Advanced 2008

(2008 Tracking Policy)

[Table 13 body not recoverable. Columns: No Controls (n = 126); + Lunch. Rows: Math Tracks; summary statistics (F statistic, degrees of freedom, adjusted R-square). Significance note: *p < .10, **p < .05, ***p < .01.]

Tables 12 and 13 together show a very different pattern than the one Loveless highlights in his text. The increase in Advanced from 15.8% for heterogeneously grouped schools to 18.6% for schools with two tracks is 2.8 percentage points. However, if Dr. Loveless had isolated this comparison and controlled for FRL, this difference would almost surely disappear or perhaps reverse. That is, the 2.8-point increase is prior to adding the FRL control. (The similarity to the 3-point regression result is merely coincidental.)

The regression outcomes (presented in Loveless’ Table 13) are driven not by the movement from no tracks to two tracks; it is only after one adds in the multi-tracked schools (where 26.6% of students score Advanced) that the headline “3 percentage-point gain associated with an additional track” emerges.
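
A small sketch makes the unevenness of the two steps concrete, using only the three raw percentages quoted from Table 12. The straight-line fit at the end is an illustration, not a reproduction of the Table 13 regression: it assumes the categories are coded 0, 1, 2 and weighted equally, whereas the report's regression was run on individual schools with a coding that is not described here.

```python
# Percent of students scoring Advanced, by number of math tracks (from Table 12)
pct_advanced = {"untracked": 15.8, "two tracks": 18.6, "three or more": 26.6}

# Size of each step, in percentage points
step_untracked_to_two = round(pct_advanced["two tracks"] - pct_advanced["untracked"], 1)
step_two_to_three = round(pct_advanced["three or more"] - pct_advanced["two tracks"], 1)

# Assumption for illustration only: code the categories 0, 1, 2 with equal weight
# and fit a single straight line through the three group percentages.
x = [0, 1, 2]
y = list(pct_advanced.values())
mean_x = sum(x) / len(x)
mean_y = sum(y) / len(y)
slope = (
    sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    / sum((xi - mean_x) ** 2 for xi in x)
)

print(step_untracked_to_two, step_two_to_three, round(slope, 1))
```

The first step is 2.8 points and the second is 8.0 points, yet a single linear term reports one per-track figure (5.4 in this equal-weight sketch) that describes neither step. A per-track coefficient of this kind can therefore be produced almost entirely by the jump at the top category.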

This distinction is important because the issue for reformers has not been about moving from three tracks to two, since the low-track problem continues to exist. The findings of the Loveless regression model, driven by the difference between three math tracks and two, tell us little or nothing about the comparison between heterogeneous (untracked) schools and schools with one additional track.

Why might this jump occur between two-track math schools and three-track schools? The answer probably lies in Loveless’ Table 6 (replicated below), which shows tracking levels in the urban, suburban, and rural schools that responded. While 64.3% of urban schools that responded are listed as detracked, only 28.6% of suburban ones are “detracked.” (Note that Loveless uses the term “detracked” to include all schools with no math tracking in 8th grade; no evidence is presented as to whether the schools had always had no tracking or whether they had engaged in a detracking reform.) Dr. Loveless skips over the likely reason for this: suburban schools, built more recently than urban schools and generally during a time when the large, comprehensive high school was in vogue, tend to be substantially larger (according to NCES data; note the “Enrollment (percentage distribution)” row). These larger, suburban schools tend to be the main location of the tracking that seems to be generating the results here.

Loveless’ Table 6: Community of School: Percentage of Urban, Suburban, and Rural Schools

(n = 97 out of 128)

[Table 6 body not recoverable. Surviving label: “School Community.”]

Dr. Loveless gamely offers the regression analysis in Table 13 (see above) to try to control for some of the difference that might exist between the urban schools and the suburban schools, but the FRL variable is hardly sufficient. Schools differ for a lot of reasons, not just the wealth of students’ families. Resources, teacher quality, parental educational levels, etc., are also undoubtedly different and are likely to have an effect on an outcome like "percent Advanced." Moreover, the percentage of students receiving free- or reduced-price lunch is only a weak measure of the overall range of differences in wealth. Simply put, there exist many likely explanations for the observed variance that are neither examined nor included in the Loveless analysis.
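
The weakness of a rough control can be illustrated with simulated data. The sketch below is not based on the Massachusetts data and makes no claim about the report's actual numbers: every variable (`wealth`, `tracks`, `advanced`, `frl_proxy`) is hypothetical, and the simulation is built so that tracking has no effect at all on the outcome. An unmeasured community factor drives both the number of tracks and the outcome, and the only available control is a noisy proxy for that factor.

```python
import random

def slope(x, y):
    """Least-squares slope of y on x (simple regression)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def residuals(y, x):
    """What is left of y after removing its straight-line fit on x."""
    b = slope(x, y)
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return [(yi - my) - b * (xi - mx) for xi, yi in zip(x, y)]

def partial_slope(x, y, control):
    """Slope of y on x with `control` held constant (residual-on-residual method)."""
    return slope(residuals(x, control), residuals(y, control))

rng = random.Random(0)  # fixed seed so the run is repeatable
n = 20000               # number of simulated schools (hypothetical)

# Hypothetical lurking variable: overall community advantage
wealth = [rng.gauss(0, 1) for _ in range(n)]
# Tracking intensity rises with wealth but has NO effect on the outcome
tracks = [w + rng.gauss(0, 1) for w in wealth]
# Outcome depends on wealth only
advanced = [5 * w + rng.gauss(0, 1) for w in wealth]
# The available control is only a noisy measure of wealth
frl_proxy = [w + rng.gauss(0, 1) for w in wealth]

naive = slope(tracks, advanced)                        # no controls
adjusted = partial_slope(tracks, advanced, frl_proxy)  # rough control
ideal = partial_slope(tracks, advanced, wealth)        # perfect control

print(round(naive, 2), round(adjusted, 2), round(ideal, 2))
```

In this simulation the rough control shrinks the apparent "tracking" coefficient but leaves a substantial positive association, and only the perfect control brings it close to zero. A shrunken but surviving coefficient is therefore what one would expect to see even where tracking itself contributes nothing.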

The analysis that Dr. Loveless would have ideally conducted – one that could have actually led to a causal conclusion – would require a student-level, longitudinal dataset with scale scores and some reliable indicators of students’ socio-economic status. If we think of this dataset as a full pail of water, we can then think of each shortfall as a hole in the pail. That is, because Loveless only had school-level data about achievement and poverty, he lost some crucial information about what was happening for the students within those schools. Because he only had snapshot data – what we call “cross-sectional data” – rather than longitudinal data, he had to compare different students across different years and hope they were comparable. Because he didn’t have students’ actual scores, he could only use the state categories of “advanced,” “proficient,” etc., so he lost tons of data about how well they actually performed. (In fact, because he was focused on high achievers only, he chose to conduct only an analysis that compared “advanced,” ignoring all three remaining categories, thus losing the benefit of even more data.) Because he had only school-level data on free- and reduced-price lunch recipients, he could only include a rough control on socio-economic status. He couldn’t distinguish between schools enrolling students with family incomes at twice the poverty level versus those that enrolled students from middle-class or wealthy families. (Public school children qualify for reduced-price lunches if their family's income is less than 185 percent of the federal poverty level.) Moreover, the free- or reduced-price lunch indicator tends to be unreliable after elementary grades because older students are more likely to opt out due to stigmatization fears.
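
The coarseness of the lunch indicator can be seen in a few lines. The sketch applies the 185 percent cutoff mentioned above; the poverty-line dollar amount and the function name are hypothetical placeholders chosen only for illustration.

```python
POVERTY_LINE = 22_000        # hypothetical annual figure, for illustration only
REDUCED_PRICE_CUTOFF = 1.85  # 185 percent of the poverty level, as stated in the text

def reduced_price_eligible(family_income):
    """True if income falls below 185 percent of the (hypothetical) poverty line."""
    return family_income < REDUCED_PRICE_CUTOFF * POVERTY_LINE

# Families at 1x, 2x, and 10x the poverty line
flags = {multiple: reduced_price_eligible(multiple * POVERTY_LINE) for multiple in (1, 2, 10)}
print(flags)
```

A family at twice the poverty line and a family at ten times the poverty line receive the same flag, so two schools with identical FRL percentages can still serve very different populations.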

Altogether, these weaknesses tremendously undermine the usefulness of the study. Downgrading the conclusions (as was the responsible thing to do) to merely associational does nothing to help provide guidance to policymakers.

The Other Two Findings

The focus above is on the achievement finding. But Fordham’s three bulleted findings presented above include two other findings:

• Tremendous change has occurred in tracking since the 1990s. The eighth-grade student of twenty years ago attended tracked classes for most of the day. Today’s eighth grader is primarily in detracked classes where students of all achievement levels are grouped together.

• Detracking is popular in high-poverty schools. Urban schools serving mostly poor children are more likely to have diminished or abolished tracking while suburban schools serving children from more prosperous backgrounds are more apt to have retained it.

The second bullet point above (bullet #3 in the Fordham presentation) is the most accurate. But I would use the term “heterogeneous grouping” rather than “detracking,” because what Loveless is seeing in Massachusetts middle schools is likely more the result of school size than any school reform. That is, the principals reported the number of tracks; they did not report that they had engaged in a detracking reform; and the numbers just discussed suggest that very little change has in fact been occurring. Rather, Loveless’ data show what Professor Valerie Lee has called a “constrained curriculum” – smaller schools tend to offer only the basic courses, while larger schools tend to take the ‘shopping mall’ approach (Lee, Croninger, & Smith, 1997). The misleading aspect of the Fordham statement is simply that it links “detracking” with high-poverty, when the more sensible link would be between large, suburban schools and more courses and more layers of courses.

Regarding the first bullet point, it should be noted that the Report itself presents (as its Table 1) the results of a survey conducted as part of the National Assessment of Educational Progress (NAEP). From 1992 to 2007, tracking in 8th grade ELA appears to have decreased from 48% to 43%. But the amount of tracking in 8th grade mathematics seems to have increased from 73% to 75%. Also, Dr. Loveless’ own data show (in his Table 4) that from 1995 to 2009 in Massachusetts, the percent of principals reporting heterogeneous 8th grade mathematics classes went from 15.2% to 15.6% (no real change). The only change occurred in the other two categories, with the percent of schools with 3+ tracks dropping from 54.5% to 35.2%, while schools with two tracks increased correspondingly. So the “tremendous change” stated by Fordham appears to be largely lacking in empirical support.
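
The Table 4 figures quoted above can be laid out as a short calculation. The two-track shares are not quoted in the text; the sketch derives them under the assumption that the three categories (untracked, two tracks, 3+ tracks) are exhaustive and sum to 100 percent.

```python
# Percent of schools in each category, as quoted from Loveless' Table 4
untracked = {1995: 15.2, 2009: 15.6}
three_plus = {1995: 54.5, 2009: 35.2}

# Assumption: the three categories account for all responding schools.
two_tracks = {year: round(100 - untracked[year] - three_plus[year], 1) for year in untracked}

# Change in each category, in percentage points
change_untracked = round(untracked[2009] - untracked[1995], 1)
change_two_tracks = round(two_tracks[2009] - two_tracks[1995], 1)
change_three_plus = round(three_plus[2009] - three_plus[1995], 1)

print(two_tracks, change_untracked, change_two_tracks, change_three_plus)
```

Under that assumption, nearly all of the 19.3-point drop in schools with 3+ tracks reappears as an 18.9-point rise in two-track schools, while the untracked share moves by only 0.4 points.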

Interestingly, these results (the data presented in Loveless’ Table 4) probably explain a key decision that Dr. Loveless made. The best (really, the only) non-experimental way to analyze the effect of a policy is to have a “counter-factual” – a sensible comparison. In this instance, the sensible comparison would be these same schools before and after a detracking reform. Dr. Loveless rhetorically argues that these schools did, in fact, detrack over time. And he has three data points (1995, 2005, and 2009), since this is his second follow-up study of Massachusetts. I therefore wondered why he didn’t analyze the test scores of students at these schools before and after that reform. But given that the percent of schools reporting heterogeneous 8th grade mathematics classes stayed static, and that the only change occurred in the other two categories, it seems the only analysis he could have done would have been to compare the effects of having two tracks to the effects of having 3+ tracks. Unfortunately, as noted above, that does not get at the key policy issue of actual detracking.


So what might really be going on in Massachusetts’ middle schools, if we accept the validity of the survey findings? (The survey had a 43% response rate; the Report provides no details to allow readers to tease out possible response bias.) Combining the information in Tables 6 and 12, it appears that Dr. Loveless’ data are telling us that large, suburban middle schools have more math tracks and have more students scoring in the Advanced range. So far, no surprises. The analysis then attempts to account for the demographic differences between the urban schools, which are disproportionately smaller and have fewer tracks, and these suburban, highly tracked schools. But all Dr. Loveless is able to use is the FRL variable. This substantially shrinks the outcome difference, but three percentage points per math track remain – the outcome trumpeted by Fordham.

The problem, as noted above, is that this outcome is (a) driven solely by the difference between two math tracks and 3+ math tracks, not by the difference between non-tracked and two-tracked schools; and (b) notwithstanding the Report’s repeated statement that the analyses controlled for “socio-economic status,” the FRL variable is a control for poverty rates but only a weak control for other differences in family wealth – let alone for the other differences between urban and suburban schools.

So, in the end, the data in the Loveless study suggest little or nothing about the effects of 8th grade math tracking or even the association between such tracking and the “Advanced” outcome. Clearly, even using the limited control of FRL rates, there is no benefit to high-achieving students associated with being in a school with two math tracks, as compared to an untracked school. What is less clear is what’s going on with regard to the schools with 3+ math tracks. From the information provided, it certainly appears that the better results are likely associated with locations in wealthier, suburban communities (what is sometimes called a “lurking variable”) rather than the variable that Dr. Loveless chose to include in his regression model (number of tracks). But we would need to know the specifics regarding the community location of the non-tracked, two-tracked, and 3+ tracked schools.

I should also note the study’s results for English language arts (ELA), presented in the Report’s Table 11. Schools with non-tracked and tracked ELA classes are shown with the same rate of Advanced performance. Moreover, most schools (91 out of 126) are reported by their principals to have non-tracked 8th grade ELA classes. There is no corresponding analysis that controls for FRL, since ELA was not the main focus of Loveless’ study. However, at the very least it appears that lack of ELA tracking in Massachusetts middle schools is not associated with lesser performance of high-achieving students. This might be the headline to the Study, if the underlying sentiments had been a bit different.

I do think it important to acknowledge that the tracking reform process can be difficult and that 8th grade math is grounded in the acquisition of prior, sequential knowledge. This is why the process that I recommended along with Carol Burris and Jennifer Bezoza in our new report makes the distinction between non-sequential subjects (such as ELA) and sequential subjects such as mathematics. We recommend a phase-in process that gives particular attention to sequential subjects – one that starts in elementary grades and works its way up to middle school and high school, thus increasing the likelihood of success for the detracking of 8th grade math.

However, acknowledging the challenges of reforming an injurious practice (and seeking to understand successful reform) is very different from defending the continuation of that practice. The Fordham report – and in particular its “3 percentage points per track” claim – does not help policy makers; it only misleads in an attempt to maintain tracking policies that are no longer defensible. Better treatment of these same data would, in fact, likely show that high-achieving Massachusetts middle school students in heterogeneous, untracked classrooms do as well as or better than those in schools with two tracks of math classrooms – certainly in language arts (English) and maybe even in mathematics.


Burris, C. C., Welner, K. G., & Bezoza, J. W. (2009). Universal access to a quality education: Research and recommendations for the elimination of curricular stratification. Boulder and Tempe: Education and the Public Interest Center & Education Policy Research Unit. Retrieved December 13, 2009 from http://epicpolicy.org/publication/universal-access

Lee, V. E., Croninger, R. G., & Smith, J. B. (1997). Course-taking, equity, and mathematics learning: Testing the constrained curriculum hypothesis in U.S. secondary schools. Educational Evaluation and Policy Analysis, 19(2), 99–121.

Loveless, T. (2009). Tracking and detracking: High achievers in Massachusetts middle schools. Washington D.C.: Thomas B. Fordham Institute. Retrieved December 13, 2009 from http://edexcellence.net/doc/200912_Detracking.pdf

Cite This Article as: Teachers College Record, Date Published: December 14, 2009
https://www.tcrecord.org ID Number: 15872, Date Accessed: 10/16/2021 9:01:59 AM

About the Author
  • Kevin Welner
    University of Colorado at Boulder
    KEVIN G. WELNER is Professor at the University of Colorado at Boulder. His present research examines small school reforms, tuition tax credit voucher policies, and various issues concerning the intersection between education rights litigation and educational opportunity scholarship. His past research studied the change process associated with equity-minded reform efforts - reforms aimed at benefiting those who hold less powerful school and community positions (primarily Latinos, African Americans, and the poor).