

Race, Test Scores, and Educational Outcomes
by Stuart S. Yeh, 2016

Background: Policymakers wish to know what changes must be made to the primary and secondary education system so that when students apply to college they are highly prepared, ready to perform at high levels, and likely to be successful in college.

Purpose: The purpose of the current study was to investigate the significance of strong educational preparation for all students, and especially for minority students, as measured by standardized test scores. While the importance of educational preparation may already seem well established, previous studies have tended to minimize or obscure the significance of the type of preparation that is measured by test scores. Existing studies are not adequate for the purpose of estimating the total effect of a hypothetical intervention that raises student achievement by one standard deviation (SD).

Research Design: The current study employed logistic regression with a nationally representative dataset, controlled for key covariates, and analyzed the hypothetical effect of raising student test scores by one SD.

Findings: Raising test scores by one SD would substantially reduce the probabilities that black, Hispanic, Asian, and white students would drop out of high school, and would greatly increase the probabilities that both minority and white students would compile a rigorous high school record, complete algebra 2 in high school, enroll at a 4-year institution, and attain a baccalaureate degree.

Recommendation: The results reported here underline the importance of swiftly adopting the most efficient approaches for raising student achievement. To the extent that existing approaches for raising student achievement are unproductive, inefficient, and disproportionately affect minority students, current policies may serve to depress baccalaureate attainment rates and to perpetuate the disadvantaged status of minorities.
Policymakers wish to understand the changes that must be made to the primary and secondary education system so that when students apply to college they are highly prepared, ready to perform at high levels, and likely to be successful in college. However, while previous studies have established that certain interventions are more effective and efficient than other interventions for the purpose of raising student achievement (Yeh, 2007, 2008, 2009a, 2009b, 2010a, 2010b, 2011, 2012, 2013; Yeh & Ritter, 2009), this research has received little attention. This may be attributed to a widespread perception that student achievement, as measured by standardized tests, may be useful for predicting freshman undergraduate letter grades (FGPA), but should not be overemphasized (see, for example, Bowen, Chingos, & McPherson, 2009; Geiser & Santelices, 2007; Geiser & Studley, 2002; Niu & Tienda, 2009). This perception may contribute to the problem of low achievement by inadvertently drawing attention away from the need to swiftly adopt the most efficient approaches for raising achievement in the elementary and secondary grades.

The purpose of the current study was to investigate the significance of strong educational preparation for all students, and especially for minority students, as measured by standardized test scores. While the importance of educational preparation may already seem well established, previous studies that regressed baccalaureate attainment on SAT score and high school grade-point average (GPA) are not adequate for understanding the potential impact of interventions that raise student achievement in the K-12 years. These studies held GPA constant, whereas an intervention that raises student achievement may be expected to raise both test scores and GPA in tandem.
A study is needed that aggregates the direct effect of an intervention that operates through test scores on postsecondary outcomes, plus the indirect effect of the intervention operating through GPA to raise postsecondary outcomes, to arrive at the total effect. Thus, the purpose of the current study is to estimate the total effect of a hypothetical intervention that raises student achievement by one standard deviation (SD). This article reports analyses involving a nationally representative sample of students in a way that graphically emphasizes the significant disadvantages that arise when students are not well prepared. The results reported here underline the importance of swiftly adopting the most efficient approaches for raising student achievement. To the extent that existing approaches for raising student achievement are unproductive, inefficient, and disproportionately affect minority students, current policies may serve to depress baccalaureate attainment rates and to perpetuate the disadvantaged status of minorities. The lack of corrective action may be partly attributable to an incomplete understanding of the significance of strong educational preparation as measured by test scores.

BACKGROUND

It is well accepted that student achievement, measured by test scores, influences educational outcomes (Duncan, Featherman, & Duncan, 1972; Herrnstein & Murray, 1994; Jencks et al., 1979; Jencks & Phillips, 1998; Winship & Korenman, 1999). Test scores are important standardized indicators of the educational preparation that the primary and secondary education system has imparted. A standardized measure is important because it can be used to measure the size of the gap that needs to be filled and the extent to which interventions to raise student achievement will fill that gap.

One issue is the appropriate criterion measure. Studies that use undergraduate freshman GPA (FGPA) as the criterion measure suffer from two problems.
Restriction of range occurs when students with high test scores and grades gravitate toward one set of institutions and students with low test scores and grades gravitate toward a second set of institutions. Criterion reliability is affected when FGPA does not reflect the difficulty of each course. Some students select challenging courses, while others select easier courses. The grades received do not reflect the difference in level of difficulty. The problem is exacerbated if students with greater aptitude select challenging courses while students with lesser aptitude select easier courses. After correcting for predictor restriction of range and criterion unreliability, the validity coefficient for SAT scores is similar to the validity coefficient for high school GPA, indicating that SAT scores are approximately as valid as high school GPA for the purpose of predicting FGPA (Camara & Echternacht, 2000; Ramist, Lewis, & McCamley-Jenkins, 1994).

However, the problems with FGPA as a criterion outcome suggest that college graduation is a better criterion. In perhaps the most influential study, Bowen and Bok (1998) investigated the relationship between combined SAT scores and baccalaureate graduation rates for students who matriculated at 28 academically selective colleges and universities and found that both black and white students graduated at higher rates at more selective institutions. This result suggested that it may be desirable to remove barriers that impede minorities from enrolling at selective institutions. However, Bowen and Bok's data indicate that there was a positive relationship between SAT scores and graduation rates. A subsequent study by Bowen and two coauthors found that SAT scores explained much of the variation in student performance at 19 academically selective postsecondary institutions (Bowen, Kurzweil, & Tobin, 2005).
After controlling for SAT scores, students from disadvantaged socioeconomic backgrounds did not underperform their more advantaged counterparts, graduated at comparable rates, and were equally successful in attaining lucrative law and business degrees. Bowen et al. concluded that strong preparation, captured by math and reading test scores, can overcome socioeconomic disadvantages and is the major determinant of differences in educational attainment between advantaged and disadvantaged young people (2005, p. 224).

While a third study by Bowen and two coauthors concluded that the influence of test scores is relatively small and high school GPA is a better predictor of baccalaureate attainment rates (Bowen, Chingos, & McPherson, 2009), other studies suggested that academic preparation as measured by test scores has an important influence on postsecondary outcomes. A synthesis of research studies involving 400 institutions and 82,000 students found that SAT scores are a better predictor of baccalaureate attainment rates than high school GPA (see review by Burton & Ramist, 2001). A national study of 1,429 baccalaureate-granting institutions found correlations ranging from 0.62 to 0.73 between graduation rates and SAT or ACT scores (Stumpf & Stanley, 2002). A national study of 262 baccalaureate-granting institutions found that students with combined SAT scores of 1300 or above were about three times more likely to attain a baccalaureate degree within four years compared to students scoring below 800 (Astin & Oseguera, 2005). An analysis of data involving 12,144 students from the nationally representative National Education Longitudinal Study found that only 16.1% of students scoring in the bottom quartile on the eighth-grade mathematics test subsequently completed a baccalaureate degree, compared to 67.0% of those scoring in the top quartile (Scott, Ingels, & Owings, 2007).
An analysis of longitudinal data from the nationally representative High School and Beyond Study of 10,470 high school students found that only 3% of those scoring in the bottom quintile of a short version of the SAT administered during their senior year subsequently completed a baccalaureate degree, compared to 64% of those scoring in the top quintile (Adelman, 1999). An analysis of data from 22,652 high school seniors who participated in the National Longitudinal Study of the High School Class of 1972 found that college students ranking at the 25th percentile of their high school class were twice as likely to drop out if their combined SAT score was 700, compared to students with a combined SAT score of 1300 (Manski & Wise, 1983). College students ranking at the 100th percentile of their high school class were over three times as likely to drop out if their SAT score was 700, compared to students with an SAT score of 1300 (Manski & Wise, 1983). A study of academically competitive colleges found that fewer than 14% of students with combined SAT scores of 1000 or lower compiled a college freshman grade-point average (FGPA) of 3.5 or higher, while over 50% of the students with SAT scores over 1200, and 77% of the students with SAT scores over 1400, reached this standard (Bridgeman, Pollack, & Burton, 2004). A study of academically selective colleges found a similar pattern for 4-year college GPAs: none of the students scoring below 800 had 4-year GPAs of 3.5 or higher, while over 50% of students with SAT scores equal to or exceeding 1410 met this standard (Bridgeman et al., 2004).

These results suggest that test scores do indeed measure significant differences in academic preparation that are reflected in important outcomes. If the purpose is to predict the impact on these outcomes of raising student achievement by one SD, it is reasonable to use test scores as a standardized measure of achievement.
However, even the best available studies are inadequate for this purpose. These studies typically regress baccalaureate attainment on SAT score and high school GPA and are intended to predict the college performance of a particular individual. However, these studies are not adequate for understanding the potential impact of an intervention that raises student achievement in the K-12 years because these studies hold GPA constant. This is problematic because an intervention that raises student achievement may be expected to raise both test scores and GPA in tandem. A study is needed that aggregates the direct effect of an intervention that operates through test scores on postsecondary outcomes, plus the indirect effect of the intervention operating through GPA to improve postsecondary outcomes, to arrive at the total effect (Figure 1). This type of path analysis was not conducted in any of the previous studies reviewed above. Thus, the purpose of the current study is to estimate the total effect of a hypothetical intervention that raises student achievement by one SD.
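The aggregation of direct and indirect effects described above can be written compactly. The following is only a sketch in notation introduced here for illustration, not the article's own: T is the standardized test score, G is standardized high school GPA, X collects the remaining covariates, Y is a binary outcome, and the alpha and beta symbols are assumed regression coefficients.

```latex
\begin{align}
G &= \alpha_0 + \alpha_1 T + \boldsymbol{\alpha}_2^{\top} X + \varepsilon
  && \text{(GPA as a function of test scores)} \\
\operatorname{logit} \Pr(Y = 1) &= \beta_0 + \beta_1 T + \beta_2 G + \boldsymbol{\beta}_3^{\top} X
  && \text{(outcome as a function of test scores and GPA)} \\
\Delta \operatorname{logit} &= \underbrace{\beta_1}_{\text{direct}}
  + \underbrace{\beta_2 \alpha_1}_{\text{indirect via GPA}}
  && \text{(total effect of a one-SD increase in } T\text{)}
\end{align}
```

Under these assumptions, a study that holds G constant recovers only the direct term, whereas the total effect also includes the product of the GPA coefficient and the slope of GPA on test scores.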
Figure 1. Path model of relationships among test scores, GPA, and postsecondary outcomes

Key outcomes include the probability of dropping out of high school, compiling a rigorous high school record, completing algebra 2 in high school, completing calculus in high school, enrolling at a 4-year institution, compiling an undergraduate GPA of 3.75 or above, attaining a baccalaureate degree, and aspiring to a doctorate or professional degree. Students who drop out of high school are economically disadvantaged. They are not able to earn as much as individuals who complete high school. Similarly, students who do not attain a baccalaureate degree are economically disadvantaged. On average, they do not earn as much as individuals who complete a baccalaureate degree. Students who do not complete a rigorous high school curriculum or algebra 2 are disadvantaged when applying for college. The completion of calculus is an indicator of a student's level of preparation in mathematics and preparation for careers in science, technology, engineering, and mathematics. Enrollment at a 4-year institution and compilation of an undergraduate GPA above 3.75 are important indicators of progress toward successful completion of a baccalaureate degree. Aspiring to a doctorate or professional degree is an important indicator of a student's educational aspirations. If a student does not aspire to a doctorate or professional degree it is unlikely that the student would apply to a doctoral or professional degree program and unlikely that the student would complete a doctoral or professional degree program.
METHOD

SAMPLE AND DATA COLLECTION

The National Education Longitudinal Study (NELS) sponsored by the National Center for Education Statistics followed a nationally representative cohort of 27,394 individuals who were surveyed as eighth-grade students in 1988, tenth-grade students in 1990, twelfth-grade students in 1992, and again in 1994 and 2000, eight years after their expected high school graduation date. In addition to the student survey data, information was collected from parents, teachers, and school administrators; high school transcripts; postsecondary institutions and transcripts; and standardized reading and math tests developed by the Educational Testing Service specifically for NELS. The analysis employed the results of tests administered during the sophomore year (1990). To guard against ceiling and floor effects and improve accuracy, each sample member was administered a test form that aligned the difficulty level of the mathematics and reading questions to the examinee based on his or her scores on the eighth-grade base year mathematics and reading tests. Unlike the SAT or ACT, the tests were administered to all students in the nationally representative sample and therefore do not suffer from restriction of range.

The measure of socioeconomic status was constructed from parent and student surveys using parental education levels and occupations and family income. The dropout indicator was constructed from transcript and survey information. Postsecondary transcripts and institutional data were used to determine baccalaureate attainment and to construct the indicator of attendance at a four-year institution. High school transcripts were used to determine whether a student received an Advanced Placement (AP) exam score in any subject and to determine course enrollment in specific courses and patterns of courses. Undergraduate GPA and educational aspirations by age 30 were each constructed from student survey responses at the last follow-up in 2000.
ANALYSIS

High school GPA was calculated from transcript data. GPA and test scores were standardized. Subgroup weighted means for test scores, GPA, and the socioeconomic status index, plus test scores one SD above the mean, were calculated by race. In Stage 1, high school GPA was regressed on test scores, socioeconomic status, sex, and categorical variables for black, Hispanic, Asian, and American Indian/Alaska Native students (the excluded category was white students). In Stage 2, logistic regression was employed to regress each of eight outcomes on test scores, GPA, socioeconomic status, sex, and categorical variables for black, Hispanic, Asian, and American Indian/Alaska Native students and the north-central, south, and west regions of the United States (the excluded categories were white students and the northeast region). Nonsignificant predictors were dropped to arrive at final models for the eight outcomes. The eight categorical outcomes were: a) ever dropped out of high school, b) ever attended a 4-year postsecondary institution, c) attained baccalaureate degree, d) completed a highly rigorous high school curriculum, e) took calculus in high school, f) took algebra 2 in high school, g) attained an undergraduate GPA above 3.75, and h) aspired to a Ph.D. or professional degree.

Predicted percentages for each of the eight outcomes were generated by race and sex, using corresponding values of mean test scores, predicted GPA from the Stage 1 regression, socioeconomic status, and the race, sex, and region categorical variables. Predictions were repeated using values of test scores at one SD above the corresponding mean values, including the increase in predicted GPA from the Stage 1 regression that would be expected if test scores were increased by one SD. Alternative estimates employing hierarchical linear modeling (HLM) are reported in Appendix A.
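The two-stage prediction procedure described above can be illustrated with a minimal Python sketch. It is not the study's code (the analysis was run in Stata), and the Stage 2 coefficients `B0`, `B_TEST`, and `B_GPA` are hypothetical values chosen only to show the mechanics. The Stage 1 slope of 0.45 is the value the Results section reports for white students. Test scores are expressed in SD units relative to the subgroup mean.

```python
import math


def inv_logit(x: float) -> float:
    """Convert a log-odds value to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def logit(p: float) -> float:
    """Convert a probability to log-odds."""
    return math.log(p / (1.0 - p))


# Stage 1 slope: the text reports that a one-SD increase in test scores
# raises predicted GPA by 0.45 SD for white students.
A_TEST = 0.45

# HYPOTHETICAL Stage 2 logit coefficients, for illustration only.
B0 = -2.0      # intercept plus all other covariates at subgroup values
B_TEST = 0.60  # direct effect of the standardized test score
B_GPA = 0.80   # effect of standardized GPA


def predicted_probability(test_z: float, gpa_at_mean: float = 0.0) -> float:
    """Predict an outcome probability, letting GPA move with test scores."""
    gpa_z = gpa_at_mean + A_TEST * test_z  # Stage 1 prediction of GPA
    return inv_logit(B0 + B_TEST * test_z + B_GPA * gpa_z)  # Stage 2


p_mean = predicted_probability(0.0)  # test score at the subgroup mean
p_plus = predicted_probability(1.0)  # test score one SD above the mean

# Total shift in log-odds: direct effect plus indirect effect through GPA.
total_shift = B_TEST + B_GPA * A_TEST

print(f"P(outcome) at mean: {p_mean:.3f}; at +1 SD: {p_plus:.3f}")
print(f"Total log-odds shift: {total_shift:.2f}")
```

Holding GPA fixed would shift the log-odds by `B_TEST` alone. Letting predicted GPA rise with test scores adds the `B_GPA * A_TEST` term, which is the distinction the analysis is designed to capture.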
The svyset feature of Stata was used to specify the primary sampling unit, the sampling weights and strata information, and to linearize the standard errors. Subgroup analyses accounted for the stratified, clustered survey design when estimating variances. Significance tests were conducted for each model and for each coefficient within each model. Partial correlations (a measure of effect size) were calculated for the outcomes and factors included in the Stage 1 and Stage 2 regressions.

RESULTS

Results of the Stage 1 ordinary least squares (OLS) regression are reported in Table 1, and partial correlations are reported in Appendix B. As expected, GPA is strongly predicted by test scores. The magnitude of the effect exceeds the magnitude of the effect of socioeconomic status by 74%. For white students, a one-SD increase in test scores is predicted to increase GPA by 0.45 SD. At the mean test score for black males, individuals in this group are predicted to have GPAs that are 0.63 SD below the predicted GPA for white males with mean test scores for white males. At one SD above the mean test score for black males, individuals in this group are predicted to have GPAs that are 0.28 SD below the predicted GPA for white males with mean test scores for white males.

Table 1. OLS Estimates and Predicted Values of GPA at Mean Scores and 1 SD Above Mean Scores, by Race and Sex
Notes. *p<.05, **p<.01, ***p<.001 (two-tailed tests). Top panel: Linearized standard errors in parentheses. Predicted GPA is standardized.

Results of the Stage 2 logistic regressions are reported in Tables 2a and 2b, and partial correlations are reported in Appendix B. The effect size for test scores exceeds the effect size for socioeconomic status in every comparison, and exceeds the effect size for GPA in all but one comparison. For each outcome, results for the full model as well as the final reduced model are reported. Table 3 reports the predicted percentages of black and Hispanic students attaining each outcome when test scores are one SD above the mean score for the corresponding racial group, in comparison with the predicted percentages of black and Hispanic students meeting each outcome when test scores are at the mean of the corresponding group (if test scores are normally distributed, a one-SD increase above the mean score corresponds to an increase from the 50th to the 84th percentile of test scores for the relevant group). By every measure, high-scoring students are significantly better prepared and perform significantly better than students who score at the mean. In most cases, the difference in educational preparation and performance is large; in some cases, it is extremely large.

Table 2a. Logit Estimates for Selected Outcomes
Notes. *p<.05, **p<.01, ***p<.001 (two-tailed tests). Linearized standard errors in parentheses. Neither R-squared nor pseudo-R-squared is computed with logistic regression using survey data. (a) No AmIndian/Alaskan student experienced a highly rigorous curriculum; these cases were dropped.

Table 2b. Logit Estimates for Selected Outcomes
Notes. *p<.05, **p<.01, ***p<.001 (two-tailed tests). Linearized standard errors in parentheses. Neither R-squared nor pseudo-R-squared is computed with logistic regression using survey data.

Table 3. Predicted Probabilities at Mean Test Scores and 1 SD Above Mean Test Scores, by Race and Sex
EVER DROPPED OUT

Table 2a reports logit estimates and Table 3 reports predicted probabilities that a student ever dropped out of high school, by race and sex. For black males, the predicted probability of ever dropping out of high school decreased from 21.5% for students scoring at the mean test score to 10.4% for students scoring one SD above the mean. For Hispanic males, the predicted probability of ever dropping out of high school decreased from 20.1% for students scoring at the mean test score to 7.7% for students scoring one SD above the mean. The interpretation is that raising black student test scores from the mean to one SD above the mean is predicted to reduce the number of black male students who ever drop out of high school by 51.6%; the corresponding reduction for Hispanic male students is 61.7%.

EVER ATTENDED A 4-YEAR INSTITUTION

Table 2a reports logit estimates and Table 3 reports predicted probabilities that a student ever attended a 4-year postsecondary institution, by race and sex. For black males, the predicted probability of ever attending a 4-year institution increased from 57.6% for students scoring at the mean test score to 76.1% for students scoring one SD above the mean. For Hispanic males, the predicted probability of ever attending a 4-year institution increased from 55.1% for students scoring at the mean test score to 78.6% for students scoring one SD above the mean. The interpretation is that raising black student test scores from the mean to one SD above the mean is predicted to increase the number of black male students who ever attend a 4-year institution by 32.1%; the corresponding increase for Hispanic male students is 42.6%.

ATTAINED BA DEGREE

Table 2a reports logit estimates and Table 3 reports predicted probabilities that a student attained a baccalaureate degree, by race and sex.
For black males, the predicted probability of attaining a baccalaureate degree increased from 11.4% for students scoring at the mean test score to 21.5% for students scoring one SD above the mean. For Hispanic males, the predicted probability of attaining a baccalaureate degree increased from 10.5% for students scoring at the mean test score to 23.7% for students scoring one SD above the mean. The interpretation is that raising black student test scores from the mean to one SD above the mean is predicted to increase the number of black male students who attain a baccalaureate degree by 88.6%; the corresponding increase for Hispanic male students is 125.7%.

HIGHLY RIGOROUS HIGH SCHOOL CURRICULUM

Table 2a reports logit estimates and Table 3 reports predicted probabilities that a student completed a highly rigorous high school curriculum, by race and sex. "Highly rigorous" was defined as a minimum curriculum including no less than four years of English, three years of math, three years of science, three years of social science, two years of a foreign language, one or more AP test scores in any subject, and all of the following courses: precalculus, biology, chemistry, and physics. For black males, the predicted probability increased from 0.17% for students scoring at the mean test score to 0.80% for students scoring one SD above the mean. For Hispanic males, the predicted probability increased from 0.02% for students scoring at the mean to 0.18% for students scoring one SD above the mean. The interpretation is that raising black student test scores from the mean to one SD above the mean is predicted to increase the number of black male students who completed a highly rigorous high school curriculum by 370.6%; the corresponding increase for Hispanic male students is 800%.

COMPLETED CALCULUS

Table 2b reports logit estimates and Table 3 reports predicted probabilities that students had completed calculus in high school, by race and sex.
For black males, the predicted probability increased from 1.37% for students scoring at the mean test score to 7.1% for students scoring one SD above the mean. For Hispanic males, the predicted probability increased from 0.55% for students scoring at the mean to 4.8% for students scoring one SD above the mean. The interpretation is that raising black male student test scores from the mean to one SD above the mean is predicted to increase the number of black male students who completed calculus in high school by 418.2%; the corresponding increase for Hispanic male students is 772.7%.

COMPLETED ALGEBRA 2

Table 2b reports logit estimates and Table 3 reports predicted probabilities that students had completed algebra 2 in high school, by race and sex. For black males, the predicted probability increased from 23.6% for students scoring at the mean test score to 41.8% for students scoring one SD above the mean. For Hispanic males, the predicted probability increased from 33.4% for students scoring at the mean to 59.9% for students scoring one SD above the mean. The interpretation is that raising black male student test scores from the mean to one SD above the mean is predicted to increase the number of black male students who completed algebra 2 in high school by 77.1%; the corresponding increase for Hispanic male students is 79.3%.

ATTAINED 3.75 GPA

Table 2b reports logit estimates and Table 3 reports predicted probabilities that a student had compiled a cumulative undergraduate GPA above 3.75, by race and sex. For black males, the predicted probability increased from 5.1% for students scoring at the mean test score to 7.0% for students scoring one SD above the mean. For Hispanic males, the predicted probability increased from 6.5% for students scoring at the mean to 9.7% for students scoring one SD above the mean.
The interpretation is that raising black male student test scores from the mean to one SD above the mean is predicted to increase the number of black males who compile a cumulative undergraduate GPA above 3.75 by 37.3%; the corresponding increase for Hispanic male students is 49.2%.

ASPIRED TO PH.D. OR PROFESSIONAL DEGREE

Table 2b reports logit estimates and Table 3 reports predicted probabilities that a student aspired to a doctorate or a professional degree, by race and sex. For black males, the predicted probability increased from 1.4% for students scoring at the mean test score to 2.8% for students scoring one SD above the mean. For Hispanic males, the predicted probability increased from 1.3% for students scoring at the mean to 3.2% for students scoring one SD above the mean. The interpretation is that raising black male student test scores from the mean to one SD above the mean is predicted to increase the number of black males who aspire to a doctorate or a professional degree by 100.0%; the corresponding increase for Hispanic males is 146.2%.

DISCUSSION

The results of the current study predict that raising student test scores from the mean score in each racial group to scores that are one SD above the mean in each group would increase the number of black male students who attain baccalaureate degrees by 88.6% and the number of Hispanic male students who attain baccalaureate degrees by 125.7%. The number of black female students who attain baccalaureate degrees would increase by 79.2% and the number of Hispanic female students who attain baccalaureate degrees would increase by 110.3%.
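The percentage increases and reductions quoted in the Results and restated here are relative changes between two predicted probabilities, not differences in percentage points. A short Python sketch of that arithmetic follows, using predicted probabilities reported in the text. The function name is introduced here for illustration only.

```python
def relative_change_pct(p_mean: float, p_plus: float) -> float:
    """Relative change (in percent) from the predicted probability at the
    mean test score to the predicted probability at one SD above the mean."""
    return 100.0 * (p_plus - p_mean) / p_mean


# Predicted probabilities (in percent) reported in the text.
ba_black_males = relative_change_pct(11.4, 21.5)        # baccalaureate degree
ba_hispanic_males = relative_change_pct(10.5, 23.7)     # baccalaureate degree
dropout_black_males = relative_change_pct(21.5, 10.4)   # ever dropped out

print(round(ba_black_males, 1))       # 88.6
print(round(ba_hispanic_males, 1))    # 125.7
print(round(dropout_black_males, 1))  # -51.6
```

For black males, for example, the change in baccalaureate attainment is 10.1 percentage points (from 11.4% to 21.5%), which is an 88.6% increase relative to the baseline of 11.4%.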
The results predict that black and Hispanic students would be less likely to drop out of high school, much more likely to complete a highly rigorous high school curriculum and to complete algebra 2 and calculus in high school, more likely to attend a 4-year institution, more likely to compile an undergraduate GPA above 3.75, and much more likely to aspire to a doctorate or professional degree. Thus, an intervention that raises student achievement by one SD is predicted to have an enormous impact on disadvantaged black and Hispanic students.

A question that arises is whether the predictions would change if the ability of disadvantaged students to afford the cost of college is incorporated. However, while net tuition and fees at four-year public institutions increased significantly for full-time dependent students in the top half of the income distribution, these costs barely increased for students in the bottom half of the income distribution, from $893 in 1999-2000 to $1,163 in 2011-2012 (College Board, 2013, Figure 12). For students in the bottom quartile of the income distribution, net tuition and fees remained at zero from 1999-2000 through 2011-2012 (College Board, 2013, Figure 12). For students in the bottom half of the income distribution, the net cost of attendance, including living costs, increased modestly from $11,059 in 1999-2000 to $13,843 in 2011-2012: an increase of $2,784 over the 12-year period (College Board, 2013, Figure 12). For students in the bottom quartile of the income distribution, the net cost of attendance increased only $2,234 over the same period (College Board, 2013, Figure 12). Thus, while list prices have increased, those prices are not what well-prepared black and Hispanic students pay. These students are highly sought by selective colleges that offer scholarships and grants to attract qualified minorities.
As a consequence, the average net price of tuition for full-time, dependent, in-state freshmen in the bottom half of the income distribution at 20 prestigious state flagship universities, including the University of California-Los Angeles, the University of Wisconsin-Madison, the University of Illinois at Urbana-Champaign, and the University of Minnesota-Twin Cities was -$1,570; in other words, these students received more grant money than they paid in tuition, offsetting their living costs (calculated from Bowen et al., 2009, Figure 9.2, p. 169). The average net price of tuition for full-time, dependent, in-state freshmen in the bottom half of the income distribution at 15 less-selective state universities, including the University of North Carolina-Charlotte, Appalachian State University, George Mason University, and Virginia Commonwealth University was $1,280 (calculated from Bowen et al., 2009, Figure 9.4, p. 171).

While listed tuition and fees at private colleges seemingly place them out of reach of low- and middle-income students, net prices are much more affordable. The average net cost of attendance in 2012-13, including tuition, fees, and room and board, after taking into account federal, state, and institutional financial aid for students who come from households earning between $30,000 and $48,000 a year and qualifying for federal aid, was $3,000 at Harvard, $3,500 at Columbia, and $4,300 at Stanford (Leonhardt, 2014). Students who come from households earning less would pay even less than those amounts. The actual tuition paid by well-prepared minorities is typically a fraction of the list price. As a result, it appears that students are not being forced to enroll in inexpensive colleges that are inappropriate for their level of preparedness (Hoxby, 2000). Instead, it appears that students from high- and medium-high-income families who have low SAT scores and high school grades are being replaced by highly prepared students from low-income families (Hoxby, 2000).
Fewer than 8% of all students are prevented from enrolling by their inability to pay (Avery & Hoxby, 2004; Carneiro & Heckman, 2003), and federal Pell Grants have no significant impact on enrollment (Kane, 1999), leading Bowen et al. (2005) to conclude that family finances have a fairly minor direct impact on a student's ability to attend a college (p. 91). "Math and verbal SAT scores are much more important factors in the college [application] process than financial variables such as family income" (Spies, 2001, p. 17). Therefore, it appears that high tuition is not what prevents minorities from attaining baccalaureate degrees. Instead, the results reported here suggest that low average test scores significantly depress rates of attainment. If the test scores of minorities are raised by one SD, it appears that a much larger number of minorities would qualify for selective colleges, accelerating the substitution of highly qualified minorities for white students.
The significance of the results reported here is that they underline the importance of swiftly adopting the most efficient approaches for raising student achievement in the K-12 grades, well before students enroll in college. A one-SD increase in test scores would boost the average performance of every school, creating more high-performing schools and permitting more students, including more minority students, to experience the benefits of attending high-performing schools. In addition, students would benefit from the improvement in educational outcomes that would occur whether they attended a high- or low-performing school. The results suggest that when underprepared students enroll in college, their prospects for success are greatly diminished. However, the results should not be interpreted to imply that college admissions officers should emphasize test scores when they select students, nor should they be interpreted to imply that there is no room for colleges to improve baccalaureate attainment rates.
Instead, the policy implication is that efforts to identify and adopt efficient approaches for raising student achievement should be redoubled because the payoff appears to be large.
This conclusion may seem unsurprising. The call to raise student achievement has been sounded for at least 30 years, since the publication of A Nation at Risk: The Imperative for Educational Reform by the National Commission on Excellence in Education (1983). The report issued a call to improve what the Commission viewed as the nation's mediocre level of educational performance. In response, enormous effort has been invested over the past 30 years toward the goal of identifying effective approaches for raising student achievement. The Institute of Education Sciences alone budgets over $671 million annually for education research (U.S. Department of Education, 2013). What remains unclear is whether this effort has been productive. While there has been some improvement with regard to trends in achievement by 9-year-old and 13-year-old students, the achievement of 17-year-old students has remained flat over the past 40 years (National Center for Education Statistics, 2008).
If existing educational strategies have been unsuccessful, perhaps a new strategy is required. Rapid performance feedback is a strategy that has largely been ignored, yet potentially offers large gains. A review of research regarding feedback found an average effect size of 0.79 SD (Hattie & Timperley, 2007). The results suggest that feedback is most effective when it is nonjudgmental, involves frequent testing (2-5 times per week), and is presented immediately after a test.
Under these conditions, the meta-analyses and reviews of feedback interventions suggest that the effect size for testing feedback is no lower than 0.7 SD (Bangert-Drowns, Kulik, Kulik, & Morgan, 1991; Black & Wiliam, 1998; Fuchs & Fuchs, 1986; Kluger & DeNisi, 1996), equivalent to raising the achievement of an average nation such as the United States to the level of the top five nations (Black & Wiliam, 1998). When teachers were required to follow rules about using the assessment information to change instruction for students, the average effect size exceeded 0.9 SD, and when students were reinforced with material tokens in addition to the frequent testing, the average effect size increased even further, exceeding 1.1 SD (Fuchs & Fuchs, 1986). Emotionally neutral (i.e., testing) feedback that is devoid of praise or criticism is likely to yield impressive gains in performance, possibly exceeding 1 SD (Kluger & DeNisi, 1996, p. 278). Lysakowski and Walberg (1982) reported an effect size of 1.13, Walberg (1982) reported an effect size of 0.82, and Tenenbaum and Goldring (1989) reported an effect size of 0.74, all of which are substantial effects. These effect sizes were typically obtained over periods of one year or less. Presumably, the implementation of rapid performance feedback throughout the entire academic careers of students, from kindergarten through 12th grade, would result in even larger effect sizes.
The studies suggest that more efficient approaches for raising student achievement are available. To the extent that existing approaches for raising student achievement are relatively unproductive, inefficient, and disproportionately affect minority students, current policies may serve to perpetuate the disadvantaged status of minorities. The results reported in the present study suggest that attention to these issues should receive high priority.
Appendix A
When students are nested within schools, linear mixed modeling (including hierarchical linear modeling or HLM) may be employed to analyze the effects of school-level factors on the intercepts and coefficients of student-level relationships. When outcomes are significantly correlated within level-2 units, linear mixed modeling (LMM) is generally preferred. LMM corrects the standard errors of the prediction parameters and partitions the variance in outcomes into within- and between-school components. However, the software used to implement LMM and HLM typically employs an empirical Bayes estimation strategy because it results in a smaller mean square error (Woltmann, Feldstain, MacKay, & Rocchi, 2012). Parameter estimates are "shrunk" toward their estimated conditional mean vectors, and estimates with the least precision experience the most shrinkage (Raudenbush, 1988). Low precision and high shrinkage would occur whenever analyses are conducted with a truncated sample, for example, minority subgroups experiencing low-frequency outcomes such as electing a rigorous curriculum or electing calculus. In practice, the shrinkage estimator is often biased, especially in small-sample situations involving a logit link and Bernoulli sampling model (Afshartous & de Leeuw, 2005, p. 119; Busing, 1993; Raudenbush, 2008, pp. 213, 229, 230, 231, 234).
With this in mind, regression equations for eight educational outcomes were estimated using HLM. The level-2 group variable was each student's high school. Parameter estimates are reported in Table A.1.
Table A.1. Logit Estimates for Selected Outcomes
Notes. *p<.05, **p<.01, ***p<.001 (two-tailed tests). Robust standard errors in parentheses. Estimation method: EM Laplace approximation. Neither R-squared nor pseudo-R-squared is computed with logistic regression using survey data. School score mean = school-level test score associated with each student-level observation. Centered test = student-level test score minus school score mean. ^{a} No AmIndian/Alaskan student experienced a highly rigorous curriculum; these cases were dropped.
Predicted probabilities for each outcome at mean student-level test scores, and one SD above mean student-level test scores, are reported in Table A.2. School test score means when student-level test scores are increased by one SD were predicted from regression of school scores on student-level test scores, socioeconomic status, student GPA, sex, race, and geographic region (Table A.3) and were combined with predictions of grades when student-level test scores are increased by one SD (Table 1) to estimate the total impact on eight educational outcomes when student-level test scores are raised by one SD.
Table A.2. Predicted Probabilities at Mean Test Score and 1 SD Above Mean Test Score, by Race and Sex
Table A.3. OLS Estimates and Predicted Value of School Score at Mean Student-Level Score and 1 SD Above Mean Student-Level Score, by Race
Notes. *p<.05, **p<.01, ***p<.001 (two-tailed tests). Linearized standard errors in parentheses. Predicted school scores are standardized values.
The overall pattern of predicted probabilities reported in the top panel of Table A.2 is similar to the pattern reported in the top panel of Table 3: Asian students outperformed white students, who outperformed black and Hispanic students. Relative magnitudes of performance across race were roughly comparable. However, with regard to minority subgroups experiencing low-frequency outcomes such as electing a rigorous curriculum or electing calculus, the HLM estimates produced predicted frequencies that departed more sharply from the actual frequencies exhibited in the raw data, compared to the predicted frequencies produced by the generalized linear model (GLM) estimates. This difference is attributable to the application of empirical Bayes estimation, which shrinks estimates for each level-2 group intercept toward the grand mean (which is zero when values are standardized). Shrinkage is expected to be large when analyzing outcomes that are infrequent (because the resulting parameter estimates are imprecise). The HLM estimates predicted that no Asian, black, Hispanic, or white student experienced a rigorous curriculum. However, tabulations of frequencies derived from the raw data indicate that small percentages of Asian, black, Hispanic, and white students experienced a rigorous curriculum. The GLM estimates correctly predicted that small percentages of each racial group experienced a rigorous curriculum. In general, across most of the eight outcome measures, frequencies predicted using the GLM estimates were more closely matched to the actual frequencies exhibited in the raw data, compared to frequencies predicted from the HLM estimates. In addition, it appears that the GLM estimates produced more conservative estimates of the predicted impact of raising test scores by one SD.
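The shrinkage described above can be illustrated with a minimal sketch. It assumes the simplest textbook case, a linear random-intercept model with known variance components, rather than the logit link and Laplace approximation used for Table A.1; the function names and the variance values are hypothetical and chosen only for illustration. In that simple case, the empirical Bayes estimate of a group mean is a weighted average of the observed group mean and the grand mean, with a weight that falls as the group gets smaller.

```python
def shrinkage_weight(tau2: float, sigma2: float, n: int) -> float:
    """Weight on the observed group mean: tau2 / (tau2 + sigma2 / n).

    tau2   = between-group (level-2) variance
    sigma2 = within-group (level-1) variance
    n      = number of observations in the group
    """
    return tau2 / (tau2 + sigma2 / n)


def shrunken_mean(group_mean: float, grand_mean: float,
                  tau2: float, sigma2: float, n: int) -> float:
    """Empirical Bayes estimate for the linear (normal) case only."""
    w = shrinkage_weight(tau2, sigma2, n)
    return w * group_mean + (1.0 - w) * grand_mean


# Hypothetical variance components (assumptions, not values from the study).
TAU2 = 0.25
SIGMA2 = 1.0

# Two groups with the same observed mean (1.0) and a grand mean of zero,
# as when values are standardized.
small = shrunken_mean(1.0, 0.0, TAU2, SIGMA2, n=4)
large = shrunken_mean(1.0, 0.0, TAU2, SIGMA2, n=100)
print(f"n=4:   {small:.3f}")
print(f"n=100: {large:.3f}")
```

With these assumed values, the small group's estimate is pulled halfway toward the grand mean, while the large group's estimate stays close to its observed mean. This is the sense in which the least precise estimates experience the most shrinkage.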
For almost all outcomes, the predicted impact of raising test scores by one SD was larger when estimated with HLM than with GLM (compare the top and bottom panels of Table A.2 to the top and bottom panels of Table 3). The HLM estimates predicted that 9.3% of black males attained a BA degree and 20.3% of black males who scored one SD above the mean score would attain a BA degree, a 118.3% increase in the number of black males who would attain a BA degree. In comparison, the GLM estimates predicted that raising black student test scores from the mean to one SD above the mean would increase the number of black males who attain a BA degree by 88.6%. The HLM estimates predicted that 6.0% of Hispanic males attained a BA degree and 17.3% of Hispanic males who score one SD above the mean score would attain a BA degree, a 188.3% increase in the number of Hispanic males who would attain a BA degree. In comparison, the GLM estimates predicted that raising Hispanic student test scores from the mean to one SD above the mean would increase the number of Hispanic males who attain a baccalaureate degree by 125.7%.
The HLM estimates imply that a one-SD increase in test scores would increase the number of black males who complete calculus in high school by 500%. The number of Hispanic males who complete calculus would increase by 1,300%, and the number of Asian males who complete calculus would increase by 1,805.9%. In comparison, the GLM estimates predict that raising black student test scores from the mean to one SD above the mean would increase the number of black males who complete calculus in high school by 418.2%; the corresponding increase for Hispanic males is 772.7% and the increase for Asian males is 852.6%.
It might be argued that the HLM estimates should be preferred because they separate effects that are properly attributed to schools, rather than the characteristics of students within schools.
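The relative increases quoted above for BA attainment follow directly from the two predicted probabilities in each comparison. As a brief check, assuming only the percentages stated in the text (the function name is hypothetical), the arithmetic can be sketched as:

```python
def relative_increase_pct(baseline: float, raised: float) -> float:
    """Percentage increase from the baseline rate to the raised rate."""
    return (raised - baseline) / baseline * 100.0


# HLM-predicted BA attainment rates (in percent) quoted in the text:
# at the mean test score and at one SD above the mean.
black_males = relative_increase_pct(9.3, 20.3)
hispanic_males = relative_increase_pct(6.0, 17.3)

print(f"Black males:    {black_males:.1f}%")
print(f"Hispanic males: {hispanic_males:.1f}%")
```

Rounded to one decimal place, these reproduce the 118.3% and 188.3% figures reported for the HLM estimates.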
However, a one-SD increase in test scores that occurs over the K-12 years would have two effects. An increase in test scores would boost the average performance of every school, creating more high-performing schools and permitting more students, including more minority students, to experience the benefits of attending high-performing schools. In addition, students would benefit from the improvement in educational outcomes that would occur whether they attended a high- or low-performing school. Therefore, it is appropriate to aggregate both effects of test scores on educational outcomes: the effect that occurs when the average performance of every school increases, permitting more students to experience the benefits of high-performing schools, and the effect that occurs regardless of the quality of the high school that a student attended. The aggregate effect is what is reported by the GLM estimates in Table 2a and Table 2b and the predicted probabilities in Table 3.
In principle, if all student-level test scores are increased by one SD, school test score means would simply increase by corresponding amounts. As indicated above, however, school test score means when student-level test scores are increased by one SD were predicted from regression of school scores on student-level test scores, socioeconomic status, student GPA, sex, race, and geographic region (Table A.3). The latter approach produces conservative estimates of the impact of student-level test scores on school test score means. The conservative approach reflects the sorting that occurs when high-scoring students gain admission and self-select into high schools with high average test scores.
Sorting occurs in several ways. First, many of the best public high schools only admit students through competitive examinations. This includes Lowell High School in San Francisco; DeBakey High School in Houston; Boston Latin Academy, Boston Latin School, and John D. O'Bryant in Boston; nine selective public high schools in New York, including the Bronx High School of Science and the Brooklyn Latin School; and 11 selective-enrollment schools in Chicago. Second, the parents of high-scoring students seek to place their children in the best private and parochial high schools, which only admit students through competitive examinations. These private and parochial schools require applicants to submit scores from the Independent School Entrance Examination, the Secondary School Admission Test, or the High School Placement Test. In New York, students seeking admission to Catholic high schools must submit scores from the Test for Admission into Catholic High Schools. Third, regardless of whether school admission involves a competitive examination, students and their parents are acutely aware of the reputed quality of various schools. Parents of students who achieve at high levels seek to place their children in the best available schools, both public and private. It is not uncommon for children to commute long distances by bus or car to attend the best schools.
Even after controlling for socioeconomic status and race, a one-SD increase in student test score is predicted to increase the quality of the school attended by a student (the regression estimates reported in Table A.3 control for socioeconomic status and race). This occurs because high-scoring students gain admission and self-select into the best high schools. However, it is important to note that the relationship between student-level test scores and school test score means would be accentuated if all student-level test scores are raised by one SD because this increase would boost the average performance of every high school, creating more high-performing high schools and permitting more students to experience the benefits of attending high-performing high schools even if no student changed his or her school.
To summarize, while HLM is often preferred to GLM when students are nested within schools, the purpose of the analysis reported in this article is consistent with the use of GLM rather than HLM. The results reported in Tables A.1 and A.2 suggest that raising student test scores by one SD would have a significant positive impact, whether estimated with HLM or GLM. The impact is larger when estimated with HLM (implying that GLM estimates are relatively conservative). Finally, the application of HLM and empirical Bayes estimation when analyzing low-frequency minority group outcomes using a logit link and Bernoulli sampling model appears to introduce a greater degree of bias in the parameter estimates, compared to the application of GLM.
Appendix B
Table B.1. Partial Correlations Between Test Scores and Outcomes, Controlling for High School GPA, SES, Race, Region, and Sex
Notes. Each column reports a separate set of partial correlations after dropping insignificant covariates. The partial correlations are measures of effect sizes. They indicate sizable effects of test scores on the outcome variables, relative to the effects of socioeconomic status and race.
References
Adelman, C. (1999). Answers in the tool box: Academic intensity, attendance patterns, and bachelor's degree attainment. Washington, DC: U.S. Department of Education.
Afshartous, D., & de Leeuw, J. (2005). Prediction in multilevel models. Journal of Educational and Behavioral Statistics, 30(2), 109-139.
Astin, A. W., & Oseguera, L. (2005). Degree attainment rates at American colleges and universities. Los Angeles, CA: Higher Education Research Institute, Graduate School of Education, University of California, Los Angeles.
Avery, C., & Hoxby, C. M. (2004). Do and should financial aid packages affect students' college choices? In C. M. Hoxby (Ed.), College choices: The economics of where to go, when to go, and how to pay for it (pp. 239-301). Chicago: University of Chicago Press.
Bangert-Drowns, R. L., Kulik, C. C., Kulik, J. A., & Morgan, M. (1991). The instructional effect of feedback in test-like events. Review of Educational Research, 61(2), 213-238.
Black, P., & Wiliam, D. (1998). Assessment and classroom learning. Assessment in Education, 5(1), 7-74.
Bowen, W. G., & Bok, D. (1998). The shape of the river: Long-term consequences of considering race in college and university admissions. Princeton, NJ: Princeton University Press.
Bowen, W. G., Chingos, M., & McPherson, M. S. (2009). Crossing the finish line: Completing college at America's public universities. Princeton, NJ: Princeton University Press.
Bowen, W. G., Kurzweil, M. A., & Tobin, E. M. (2005). Equity and excellence in American higher education. Charlottesville, VA: University of Virginia Press.
Bridgeman, B., Pollack, J., & Burton, N. (2004). Understanding what SAT reasoning test scores add to high school grades: A straightforward approach. New York, NY: College Entrance Examination Board.
Burton, N. W., & Ramist, L. (2001). Predicting success in college: SAT studies of classes graduating since 1980. New York, NY: College Entrance Examination Board.
Busing, F. M. T. A. (1993). Distribution characteristics of variance estimates in two-level models: A Monte Carlo study. Leiden, The Netherlands: University of Leiden.
Camara, W. J., & Echternacht, G. (2000). The SAT I and high school grades: Utility in predicting success in college. New York, NY: College Entrance Examination Board.
Carneiro, P., & Heckman, J. J. (2003). Human capital policy. In J. J. Heckman & A. Krueger (Eds.), Inequality in America: What role for human capital policies? (pp. 77-239). Cambridge, MA: MIT Press.
College Board. (2013). Net prices by income over time: Public sector. Retrieved from
Duncan, O. D., Featherman, D. L., & Duncan, B. (1972). Socioeconomic background and achievement. New York, NY: Seminar Press.
Fuchs, L. S., & Fuchs, D. (1986). Effects of systematic formative evaluation: A meta-analysis. Exceptional Children, 53(3), 199-208.
Geiser, S., & Santelices, M. V. (2007). Validity of high-school grades in predicting student success beyond the freshman year: High-school record vs. standardized tests as indicators of four-year college outcomes. Berkeley, CA: University of California Press.
Geiser, S., & Studley, R. (2002). UC and the SAT: Predictive validity and differential impact of the SAT I and SAT II at the University of California. Educational Assessment, 8(1), 1-26.
Hattie, J., & Timperley, H. (2007). The power of feedback. Review of Educational Research, 77(1), 81-112.
Herrnstein, R. J., & Murray, C. (1994). The bell curve: Intelligence and class structure in American life. New York, NY: Simon and Schuster.
Jencks, C., Bartlett, S., Corcoran, M., Crouse, J., Eaglesfield, D., Jackson, G., & Williams, J. (1979). Who gets ahead? The determinants of economic success in America. New York, NY: Basic Books.
Jencks, C., & Phillips, M. (Eds.). (1998). The black-white test score gap. Washington, DC: Brookings Institution Press.
Kane, T. J. (1999). The price of admission: Rethinking how Americans pay for college. Washington, DC: Brookings Institution Press.
Kluger, A. N., & DeNisi, A. (1996). The effects of feedback interventions on performance: A historical review, a meta-analysis, and a preliminary feedback intervention theory. Psychological Bulletin, 119(2), 254-284.
Leonhardt, D. (2014, September 9). Measuring colleges' success in enrolling the less affluent. The New York Times, p. A3.
Lysakowski, R. S., & Walberg, H. J. (1982). Instructional effects of cues, participation, and corrective feedback: A quantitative synthesis. American Educational Research Journal, 19, 559-578.
Manski, C. F., & Wise, D. A. (1983). College choice in America. Cambridge, MA: Harvard University Press.
National Center for Education Statistics. (2008). NAEP 2008 trends in academic progress. Washington, DC: U.S. Department of Education.
National Commission on Excellence in Education. (1983). A nation at risk: The imperative for educational reform. Washington, DC: Author.
Niu, S. X., & Tienda, M. (2009). Testing, ranking and college performance: Does high school matter? Princeton, NJ: Princeton University.
Ramist, L., Lewis, C., & McCamley-Jenkins, L. (1994). Student group differences in predicting college grades: Sex, language, and ethnic groups. New York, NY: College Entrance Examination Board.
Raudenbush, S. W. (1988). Educational applications of hierarchical linear models: A review. Journal of Educational Statistics, 13(2), 85-116.
Raudenbush, S. W. (2008). Many small groups. In J. De Leeuw & E. Meijer (Eds.), Handbook of multilevel analysis (pp. 207-236). New York, NY: Springer.
Scott, L., Ingels, S. J., & Owings, J. A. (2007). Interpreting 12th-graders' NAEP-scaled mathematics performance using high school predictors and postsecondary outcomes from the National Education Longitudinal Study of 1988 (NELS:88). Washington, DC: National Center for Education Statistics.
Spies, R. (2001). The future of private colleges: The effect of rising costs on college choice. Princeton, NJ: Princeton University.
Stumpf, H., & Stanley, J. C. (2002). Group data on high school grade point averages and scores on academic aptitude tests as predictors of institutional graduation rates. Educational and Psychological Measurement, 62(6), 1042-1052.
Tenenbaum, G., & Goldring, E. (1989). A meta-analysis of the effect of enhanced instruction: Cues, participation, reinforcement and feedback and correctives on motor skill learning. Journal of Research and Development in Education, 22, 53-64.
Testimony by Caroline M. Hoxby, "The rising cost of college tuition and the effectiveness of government financial aid", U.S. Senate, 106th Congress, 2d Sess. (2000).
U.S. Department of Education. (2013). Department of Education budget tables. Retrieved from http://www2.ed.gov/about/overview/budget/tables.html?src=ct
Walberg, H. J. (1982). What makes schooling effective? Contemporary Education Review, 1, 1-34.
Winship, C., & Korenman, S. D. (1999). Economic success and the evolution of schooling and mental ability. In S. E. Mayer & P. E. Peterson (Eds.), Earning and learning: How schools matter (pp. 49-78). Washington, DC: Brookings Institution Press.
Woltmann, H., Feldstain, A., MacKay, J. C., & Rocchi, M. (2012). An introduction to hierarchical linear modeling. Tutorials in Quantitative Methods for Psychology, 8(1), 52-69.
Yeh, S. S. (2007). The cost-effectiveness of five policies for improving student achievement. American Journal of Evaluation, 28(4), 416-436.
Yeh, S. S. (2008). The cost-effectiveness of comprehensive school reform and rapid assessment. Education Policy Analysis Archives, 16(13). Retrieved from http://epaa.asu.edu/epaa/v16n13/
Yeh, S. S. (2009a). Class size reduction or rapid formative assessment? A comparison of cost-effectiveness. Educational Research Review, 4, 7-15.
Yeh, S. S. (2009b). The cost-effectiveness of raising teacher quality. Educational Research Review, 4(3), 220-232.
Yeh, S. S. (2010a). The cost-effectiveness of 22 approaches for raising student achievement. Journal of Education Finance, 36(1), 38-75.
Yeh, S. S. (2010b). The cost-effectiveness of NBPTS teacher certification. Evaluation Review, 34(3), 220-241.
Yeh, S. S. (2011). The cost-effectiveness of 22 approaches for raising student achievement. Charlotte, NC: Information Age Publishing.
Yeh, S. S. (2012). The reliability, impact and cost-effectiveness of value-added teacher assessment methods. Journal of Education Finance, 37(4), 374-399.
Yeh, S. S. (2013). A reanalysis of the effects of teacher replacement using value-added modeling. Teachers College Record, 115(12), 1-35.
Yeh, S. S., & Ritter, J. (2009). The cost-effectiveness of replacing the bottom quartile of novice teachers through value-added teacher assessment. Journal of Education Finance, 34(4), 426-451.





