Do Student-Level Incentives Increase Student Achievement? A Review of the Effect of Monetary Incentives on Test Performance
by Vi-Nhuan Le - 2020
Background: Policymakers have debated whether test scores represent students’ maximum level of effort, prompting research into whether student-level financial incentives can improve test scores. If cash incentives are shown to improve students’ test performance, there can be questions as to whether test scores obtained in the absence of financial incentives accurately reflect what students know and can do. This can raise concerns as to whether test scores should be used to guide policy decisions.
Purpose: This study used meta-analysis to estimate the effect of student-level incentives on test performance. The study also included a narrative review. Research Design: Twenty-one studies conducted in the United States and internationally were included in the meta-analysis. Effect sizes were estimated separately for mathematics, reading/language arts, and overall achievement.
Findings: Financial incentives had a significantly positive effect on overall achievement and on mathematics achievement, but no effect on reading/language arts achievement. The narrative review suggested mixed effects with regard to whether treatment estimates could be sustained after the removal of the incentives and whether larger cash payments were associated with stronger program impacts. Programs that offered monetary incentives in conjunction with other academic supports tended to show stronger effects than programs that offered incentives alone.
Conclusion: The findings raise questions as to whether policymakers should use scores from low-stakes tests to inform high-stakes policy decisions. The study cautions against using scores from international assessments to rank-order countries’ educational systems or using scores from state achievement tests to sanction schools or award teacher bonuses.
Although providing incentives to students to improve student achievement is a controversial practice, many schools throughout the United States use reward programs as a means of improving student test scores (Prothero, 2017). In a survey of 250 charter school principals from 17 states, Raymond (2008) found that 57% of responding principals indicated that they used incentives with their students as a way of raising student achievement. In recent years, cash incentive programs have gained traction, with districts throughout the country paying students for academic achievement. For example, the Baltimore City Public School district implemented a monetary incentive program that paid 10th and 11th graders who had previously failed one of their state graduation exams up to $110 if the students improved their scores on benchmark assessments (Ash, 2008). Similarly, the Urban Strategies Memphis Hope program awards students $30 for each A, $20 for each B, $10 for each C grade earned on their report card, and $50 for scoring at least 19 on the ACT (Scheie, 2014).
Understanding whether financial incentives can improve student achievement is important for two reasons. First, incentive interventions are less costly to implement than other types of educational reforms that require higher levels of human capital (e.g., class size reductions, teacher training, curricular development). If the body of literature suggests that monetary incentives have positive effects on test scores, policymakers may want to increase investments in cash incentive programs as a cost-effective strategy to improve student achievement. Second, educators often use test scores to guide policy decisions. An important assumption underlying the test scores is that students are sufficiently motivated to perform well, so their scores are an accurate representation of what they know and can do. If research shows that offering financial incentives can improve student scores, then there can be concerns as to whether test scores obtained in the absence of financial incentives are truly indicative of students’ abilities. This would call into question the utility of the test scores to inform policy decisions.
The purpose of this study is to use meta-analysis to synthesize the results across multiple evaluations of student-level incentives in order to estimate the magnitude of the effect of monetary incentives on student test performance. The meta-analysis is supplemented with a narrative review of findings that have important ramifications for the design of future cash incentive programs. This study is guided by the five research questions shown below; the first is addressed via meta-analysis, and the latter four are addressed through the narrative review:
What is the effect of cash incentives on student test performance? Does the effect vary by particular subgroups, including location (i.e., international students versus students in the United States), schooling level (i.e., elementary versus secondary grades), gender, and initial achievement level?
Are the effects of incentives on test performance sustained, once the incentives are removed?
Is there a relationship between the magnitude of program effect and the size of the monetary incentive?
Among studies with multiple treatment conditions, what are the features of promising incentive programs?
Are there any unintended consequences of implementing incentive programs?
This review is organized as follows. It begins with a discussion of the pros and cons of incentive programs from a motivational perspective. Next, the study describes previous literature that has examined the effect of monetary incentives on test performance. The study then describes the analytic approach used in the study, including the search methods, inclusion criteria, and meta-analytic techniques used to estimate effect sizes. This is followed by the results of the meta-analysis, then the results of the narrative review. The article concludes with implications of the results for future policy and research.
EXPECTANCY-VALUE THEORY AS A RATIONALE FOR PROVIDING MONETARY INCENTIVES TO STUDENTS
Although many different motivational theories have been used as a rationale for incentive programs,1 this study uses the expectancy-value framework adopted in the health and work performance fields (Stolovitch, Clark, & Condly, 2002; White, 2012). According to expectancy-value theory, students’ effort, persistence, and performance on a task depend on their beliefs about their chances of performing well on the task (i.e., expectancy) and on the subjective value that they place on the task and its associated rewards (i.e., value) (Eccles et al., 1983; Eccles & Wigfield, 2002; Wigfield & Cambria, 2000). Expectancy-value theorists argue that students may not put forth their best effort on achievement tasks because they have low valuation of the tasks and/or low expectancy of performing well on the tasks (Eccles, 2007). Monetary incentives are primarily intended to increase the value that students place on an achievement task (Levitt, List, Neckermann, & Sadoff, 2016).
From a behavioral economics perspective, students may have low subjective values for achievement tasks because the costs and effort required to perform well on the tasks are upfront and high, but the benefits are delayed and not readily apparent or tangible (Barrow & Rouse, 2016; Levitt, List, Neckermann, & Sadoff, 2016). For example, paying attention in class, completing homework, and engaging in behaviors that lead to school success can bring intangible rewards (such as peer recognition), but a tangible payoff may not be realized until well in the future (Bembenutty, 2008). Monetary rewards can enhance the value that students place on achievement tasks by allowing them to more quickly realize the payoffs of their hard work in a concrete manner (Sadoff, 2014; Wallace, 2009).
MONETARY INCENTIVES AS A DETRIMENT TO INTRINSIC MOTIVATION
Despite the theoretical appeal of expectancy-value theory as a rationale for providing monetary incentives for student achievement, many motivational theorists question the premise that monetary incentives will necessarily increase students’ motivation and resulting performance. Depending on the reward structure, incentives may have the opposite effect and can actually lead to decreased performance (Ryan & Deci, 2000). Because incentives create an explicit link between desired student behaviors that lead to high student achievement (such as studying) and monetary payments, engaging in these behaviors becomes a transactional process (Gallani, 2017). If students attribute their desire to study to a transactional link to a monetary reward, as opposed to an inherent interest in the subject matter, they may express lower intrinsic motivation in the task. A meta-analysis conducted by Deci, Koestner, and Ryan (1999) found support for the notion that providing performance-contingent rewards can undermine students’ intrinsic motivation, as the offer of a performance-contingent reward was associated with lower levels of self-reported interest. In a classic experiment, Lepper, Greene, and Nisbett (1973) found that offering an external reward to young children to draw and color pictures resulted in a subsequent decrease in children’s interest in drawing as a free-choice activity, relative to children who had not received a reward. Frey and Goette (1999) found that high school volunteers who were collecting donations for charity put forth more effort when they were not compensated than when a small payment was offered.
Motivational theorists also contend that even if external rewards could improve performance, the positive effect is likely to be fleeting, as students may engage in temporary compliance (Kohn, 1993) and decrease their efforts after the removal of the incentives (Gneezy, Meier, & Rey-Biel, 2011; Willingham, 2008). Gallani (2017) examined hand hygiene practices at a hospital and found that hand sanitizing increased during the period when workers were eligible for incentives, but that individuals regressed to lower levels of hand sanitizing after the incentives were withdrawn. Visaria, Dehejia, Chao, and Mukhopadhyay (2016) examined an incentive program in India that was designed to improve the school attendance of low-income children. When the incentives were in place, average attendance improved, but the removal of the incentives resulted in even lower attendance among children with initially low baseline attendance. Taken together, this line of research suggests that any positive effect of incentives on student achievement may fade once the incentives are removed.
PREVIOUS REVIEWS OF FINANCIAL INCENTIVES ON STUDENT ACHIEVEMENT
There have been three often-cited reviews of financial incentive programs on student achievement in field settings. Slavin (2010) highlighted 19 financial incentive programs implemented across developing and developed countries. Using a narrative approach, Slavin concluded that monetary incentives did not have any effects on the graduation rates and achievement of students in developed countries but were weakly positive for students in developing countries. The National Research Council (2011) conducted a narrative review of test-based incentives and concluded that incentives had a relatively weak effect on student achievement. McEwan (2015) conducted a meta-analysis of eight randomized cash incentive studies implemented in developing countries and reported a statistically significant standardized regression coefficient of 0.089.
Although these reviews provide useful information about the potential effect of student-level financial incentives on student achievement, they also underscore the need for more research. Conditional cash transfer programs comprised the vast majority of the studies included in Slavin’s (2010) review. However, conditional cash transfer programs typically incentivize school attendance as opposed to student achievement, so it is not entirely surprising that he found the effect of incentives on student achievement to be weak. Presumably, stronger effects could be observed with incentive programs that explicitly set out to improve student achievement. The National Research Council (2011) review represented an early synthesis of incentives research, and since its publication, numerous other incentive studies have been completed. McEwan’s (2015) study examined student-level incentives in conjunction with teacher-level incentives, rendering it difficult to understand to what extent the reported 0.089 regression coefficient reflected the specific effect of student-level incentives. The present study builds on these prior reviews by increasing the number of student-level incentive studies reviewed and by estimating an effect size specific to student-level incentives.
LITERATURE SEARCH PROCEDURES
Three sequential steps were used to search for relevant literature. First, various combinations of the search terms financial/cash/monetary, incentives/rewards/awards/prize, and student achievement/test scores/test performance were entered within Google and six databases representing multiple disciplines: ERIC, PsycInfo, Sociological Abstracts, Dissertation Abstracts, EconLit, and National Bureau of Economic Research. The sources covered both peer-reviewed journal articles as well as working papers in order to mitigate publication bias. The search included all studies up to December 2017, but because evaluations of student-level financial incentives are a relatively recent development, most studies were published within the last decade. Second, the Social Sciences Citation Index was used to identify studies that cited seminal research in the field. Finally, the bibliographies of all major literature reviews, as well as the bibliographies of the studies that met the inclusion criteria, were examined for any relevant or missing citations.
To be included in the meta-analysis, the study had to meet several criteria. First, the study must have been a field experiment as opposed to a laboratory experiment. Second, the incentive programs needed to provide students with cash rewards for meeting a specified academic performance threshold. This criterion eliminated the majority of conditional cash transfer reforms because most of these programs incentivized school attendance as opposed to school achievement. Third, the study needed to include standardized test scores as an outcome. This criterion eliminated studies that examined whether incentives increased participation in certain courses, such as Advanced Placement courses. Fourth, the study needed to focus on students in the elementary and secondary grades. This criterion eliminated all of the studies that provided postsecondary scholarship incentives to college students. Finally, research briefs, op-ed pieces, and research narratives were excluded because they were not primary empirical studies.
The literature search yielded 80 research studies, of which 74 were empirical studies that necessitated further review (see Figure 1). The majority of the studies (n = 30) were eliminated because they were conditional cash transfer programs that incentivized school attendance but not student achievement. Another 10 studies were eliminated because they were scholarship programs conducted with college-going students in postsecondary settings. Four studies were eliminated because they did not provide sufficient details that would allow for a conversion of their statistical results (e.g., percentages meeting an academic performance threshold) into the regression estimate metric used in this study. Five studies were eliminated because they did not include test performance as an outcome, and another four studies were eliminated because they were conducted in a laboratory setting. In total, 21 unique studies were included in the analysis.
Figure 1. Exclusion criteria for the meta-analysis
CODING STUDY INFORMATION
Each study was coded for the following information: (a) research design (e.g., randomized versus correlational); (b) sample size; (c) structure of the financial incentive program; (d) study location; (e) achievement outcomes; and (f) regression coefficients and associated standard errors for the treatment and control conditions for each achievement outcome, delineated by subject, gender, and initial achievement level, where relevant.
In some studies, both unconditional and conditional regression estimates were reported. In those instances, the regression coefficients from models that included controls for student- and school-level variables were used, because the inclusion of control variables often adjusts for imbalances between the treatment and control groups (McEwan, 2015) and can reduce the standard error of the treatment estimate (Duflo, Hanna, & Ryan, 2008).
This study focused on three achievement outcomes: overall achievement, mathematics, and reading/language arts. Overall achievement was defined as the treatment estimate pooled across subjects such as mathematics, reading/language arts, science, and history/geography/ social sciences. The analysis also estimated separate effect sizes for mathematics and reading/language arts, where possible.
Separate effect sizes were examined for certain subgroups. The subgroups include location, schooling level, gender, and initial achievement level. These subgroups were chosen because the literature suggests effect sizes may vary by these factors. For example, many of the incentive programs in international settings were conducted in developing countries, which differ markedly from the United States with respect to school resources. In terms of schooling level, it is possible that older children may be more motivated by incentives than younger children because older children may have a better understanding of money and finances. With respect to gender, scholarship studies conducted at the postsecondary level have found that females are more responsive to financial incentives than males (Angrist, Lang, & Oreopoulos, 2009). Finally, it is important to examine whether there are differential effects on students of different achievement levels because previous studies have found that financial rewards can have stronger effects on higher achieving college students than on lower achieving college students (Leuven, Oosterbeek, & van der Klaauw, 2010).
THE REGRESSION COEFFICIENT AS AN EFFECT SIZE INDEX
Following other studies (e.g., Cooper, Robinson, & Patall, 2006; Kim, 2011; McEwan, 2015; Nieminen, Lehtiniemi, Vähäkangas, Huusko, & Rautio, 2013), a standardized regression coefficient was used as the measure of effect size. All but two studies reported a standardized regression coefficient for the effect of financial incentives on test performance. In the two instances in which only unstandardized regression coefficients were reported, the estimates were converted to standardized regression coefficients by dividing the treatment effect and its associated standard error by the pooled standard deviation of the outcome variable (McEwan, 2015).
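This conversion is simple to express in code. The sketch below uses hypothetical numbers (a 4-point treatment effect on a test whose pooled standard deviation is 40 points); the function name and inputs are illustrative, not from the original studies.

```python
def standardize(b_unstd, se_unstd, sd_pooled):
    """Convert an unstandardized regression coefficient and its standard
    error to standardized form by dividing both by the pooled standard
    deviation of the outcome variable (following McEwan, 2015)."""
    return b_unstd / sd_pooled, se_unstd / sd_pooled

# Hypothetical example: a 4-point effect (SE = 1.5) on a test with a
# pooled SD of 40 points corresponds to a 0.10 SD standardized effect.
beta, se = standardize(4.0, 1.5, 40.0)
```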
An efficient estimator of the mean of the true effects of the various programs is the weighted average of the observed effect sizes, where each weight is the inverse of the total variance of the effect size (Kim, 2011; Nieminen et al., 2013). Thus, the mean effect size estimate of the standardized regression coefficients ($\hat{\beta}$) and its associated standard error ($SE(\hat{\beta})$) are estimated by:

$\hat{\beta} = \dfrac{\sum_{k}\sum_{j} w_{jk}\, b_{jk}}{\sum_{k}\sum_{j} w_{jk}}$ and $SE(\hat{\beta}) = \sqrt{\dfrac{1}{\sum_{k}\sum_{j} w_{jk}}}$, where $w_{jk} = \dfrac{1}{se_{jk}^{2} + \tau^{2}}$, (1)

where $b_{jk}$ is the standardized regression estimate for the effect size in treatment $j$ of study $k$, $se_{jk}^{2}$ is its squared standard error, and $\tau^{2}$ represents the common between-study variance resulting from random-effects pooling (Borenstein, Hedges, Higgins, & Rothstein, 2009), which is estimated via restricted maximum likelihood (Ringquist, 2013). Effects were estimated using a random-effects model, which assumes that variation in the observed effect sizes stems from both sampling error and random between-study variance.
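Equation (1) can be sketched in code as follows. This is a minimal illustration that takes the between-study variance as given (the paper estimates it via restricted maximum likelihood); the effect sizes and standard errors here are made up for demonstration.

```python
import math

def random_effects_mean(betas, ses, tau2):
    """Inverse-variance weighted mean of standardized regression
    coefficients under a random-effects model (Equation 1).
    betas: observed effect sizes b_jk; ses: their standard errors;
    tau2: common between-study variance (assumed known here)."""
    weights = [1.0 / (se ** 2 + tau2) for se in ses]
    mean = sum(w * b for w, b in zip(weights, betas)) / sum(weights)
    se_mean = math.sqrt(1.0 / sum(weights))
    return mean, se_mean

# Illustrative (hypothetical) effect sizes and standard errors:
mean, se_hat = random_effects_mean([0.10, 0.05, 0.02],
                                   [0.03, 0.04, 0.05],
                                   tau2=0.001)
```

Note that a larger τ² flattens the weights, pulling the pooled estimate toward a simple average of the studies.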
ACCOUNTING FOR DEPENDENT EFFECT SIZE ESTIMATES
A key assumption of meta-analysis is that the treatment effect sizes can be treated as independent (Kim, 2011). However, this assumption is violated when there are multiple outcomes or multiple incentive conditions within the same study (Scammacca, Roberts, & Stuebing, 2014). Multiple outcomes (e.g., mathematics and reading achievement scores) are not independent data points because the effect sizes are based on the same set of students. Similarly, studies can have multiple incentive conditions, requiring multiple treatment contrasts relative to the same control group. To account for the non-independence of effect sizes, the study adopted Hedges, Tipton, and Johnson’s (2010) robust variance estimation (RVE) approach. RVE has the advantage of making no assumptions about the covariance structure of the effect size estimates (Tanner-Smith & Tipton, 2014) and yields estimates that are robust to a wide range of within-study intraclass correlations (Scammacca et al., 2014; Wilson, Tanner-Smith, Lipsey, Steinka-Fry, & Morrison, 2013). In addition, the analyses used the small-sample corrections recommended by Tipton (2015) and Tipton and Pustejovsky (2015), which make adjustments to both the residuals and the degrees of freedom of the treatment estimates.
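The intercept-only case of this estimator can be sketched as follows. This is a simplified illustration of the Hedges, Tipton, and Johnson (2010) weighting, in which a study's total weight is spread across its dependent effect sizes, and it omits the small-sample corrections the paper applies; the input data are hypothetical.

```python
import math

def rve_mean(studies, tau2=0.0):
    """Simplified robust variance estimation (RVE) for an intercept-only
    meta-regression, after Hedges, Tipton, and Johnson (2010).
    `studies` is a list of (effect_sizes, variances) pairs, one per study.
    Each effect size in study j gets weight 1 / (k_j * (mean variance + tau2)),
    so a study contributes comparable total weight no matter how many
    dependent effect sizes it reports."""
    num = den = 0.0
    weighted = []  # per-study weight and effects, kept for the robust SE
    for effects, variances in studies:
        k = len(effects)
        v_bar = sum(variances) / k
        w = 1.0 / (k * (v_bar + tau2))  # one weight shared by the k effects
        num += w * sum(effects)
        den += w * k
        weighted.append((w, effects))
    b = num / den
    # Robust variance: squared weighted sums of within-study residuals,
    # divided by the squared total weight.
    vr = sum((w * sum(e - b for e in effects)) ** 2
             for w, effects in weighted) / den ** 2
    return b, math.sqrt(vr)

# Hypothetical data: one study with two dependent effect sizes,
# one study with a single effect size.
b, se_b = rve_mean([([0.10, 0.20], [0.01, 0.01]),
                    ([0.00], [0.02])])
```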
CHARACTERISTICS OF THE INCLUDED STUDIES
The appendix shows the characteristics of the incentive programs included in this meta-analysis. There was a balance of published and unpublished studies in the analysis, as well as a balance with respect to location and schooling level. All but three studies used a randomized design. Four studies did not report treatment estimates separately by subject area, which meant that these studies could only be used to estimate the effect of incentives on overall achievement.
Analysis was conducted on 21 studies that yielded 103 effect sizes. The relatively large number of effect sizes arose in part because several investigators chose to report results from distinct experiments within the same study. For example, Fryer, Devi, and Holden (2016) described two field experiments conducted in Washington, DC, and Houston that varied on several dimensions, including the academic behaviors that were incentivized, the frequency with which students were provided incentives, the grade levels that participated, the magnitude of the rewards, and the state achievement tests used as outcome measures. In addition, nine studies examined multiple treatment conditions (e.g., individual-level incentives versus team-level incentives versus a control group). As a result, the number of studies included in the meta-analysis was not aligned with the number of experiments that were conducted. Overall, the 21 studies reported on 39 different student-level cash incentive programs.
EFFECTS ON TEST PERFORMANCE
The I², which is a measure of heterogeneity between studies, was approximately 23%, which is indicative of a low level of between-study variability (Higgins, Thompson, Deeks, & Altman, 2003). Table 1 provides the mean estimates, the standard errors of the mean estimates, the associated p values, and the number of studies and effect sizes for each analysis. For overall achievement, the mean estimate (β = 0.062) was significantly positive. There was also a significantly positive effect of monetary incentives on mathematics achievement (β = 0.095), but there was no relationship between student incentives and reading/language arts achievement (β = 0.020).
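The I² statistic can be computed from Cochran's Q and its degrees of freedom; a minimal sketch follows. The Q value used here is hypothetical (the study does not report one), chosen only to illustrate how a value near 23% arises.

```python
def i_squared(q, df):
    """Higgins, Thompson, Deeks, and Altman (2003): the percentage of
    total variation across studies attributable to heterogeneity rather
    than chance; truncated at zero when Q falls below its df."""
    return max(0.0, 100.0 * (q - df) / q)

# Hypothetical inputs: Q = 26 over 20 degrees of freedom yields an
# I-squared of roughly 23%.
i2 = i_squared(26.0, 20.0)
```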
Table 1. Effect Sizes of Monetary Incentives on Test Performance
For both international and U.S. students, cash incentives had a statistically significant positive effect on overall achievement (β = 0.099 and β = 0.044, respectively) as well as on mathematics achievement (β = 0.125 and β = 0.082, respectively). The effect was stronger for international students than for U.S. students for overall achievement, but the effects for the two groups did not differ significantly for mathematics achievement.
There was also a statistically significant effect of cash incentives on overall achievement at the secondary grades (β = 0.076), but not at the elementary grades (β = 0.039). However, there was no statistically significant difference in the magnitude of effect between the two schooling levels.
Financial incentives had a statistically significant effect on overall achievement for both males (β = 0.085) and females (β = 0.084). In addition, the positive effect was equally strong for both genders.
Initial Achievement Levels
The effect of financial incentives for lower achieving students, defined as those whose test performance prior to the implementation of the incentive program was below the median score, was compared to the effect for higher achieving students, defined as those whose initial test performance was above the median score. The effect of incentives on overall achievement was statistically significant and positive for higher achieving students (β = 0.074) as well as for lower achieving students (β = 0.065). There was also no difference in the magnitude of effect between the two groups.
In designing cash incentive programs, policymakers need to consider several issues, including whether treatment effects persist after the incentives are removed and whether stronger effects can be observed if larger incentives are provided. In addition, it is important to identify promising features of incentive programs, especially among studies with multiple treatment conditions that facilitate direct comparisons of the effectiveness of different design options. Because studies that examined these issues did not necessarily present the findings within a quantitative framework, a narrative review is presented instead.
EFFECTS AFTER THE REMOVAL OF THE INCENTIVES
The evidence is mixed as to whether achievement gains stemming from the implementation of the incentive programs could be sustained after the removal of the incentives. Kremer, Miguel, and Thornton (2009) found that even one year after the incentive program had ended, the program continued to have a positive effect on test scores, although the effect was weaker than that observed when the program was still in place. They contend that these findings support the notion that the initial test score gains reflected real learning as opposed to cramming or cheating for the test. Levitt, List, and Sadoff (2016) also found that the effects of financial incentive programs could persist after the incentive program ended, at least for one additional year. They examined whether financial incentives provided during students’ freshman year in high school were associated with a higher probability of being on track to graduate in subsequent years. Treatment effects persisted one year after the incentives had been removed, such that students in the treatment group had a higher probability of being on track to graduate when measured at the 10th grade. However, the treatment effects dissipated thereafter, and there were no differences between the treatment and control students’ probabilities of being on track to graduate by the time students reached the 11th or 12th grades.
Other studies have confirmed that achievement gains associated with the monetary incentive programs may be short-lived. Bettinger (2012) conducted a multi-year evaluation, where students could be eligible for incentives in one year, but not the next. He found that the achievement gains demonstrated by the incentive recipients in the previous year did not persist into the following year. Similarly, examining the incentive program in Dallas, Fryer (2011) found that a year after the incentives had ended, the treatment estimates had faded and were no longer statistically significant.
EFFECTS OF INCENTIVES AS A FUNCTION OF THE SIZE OF THE CASH PRIZES
The results are also contradictory as to whether the effect of the monetary incentive programs is related to the size of the cash reward. Jackson (2014) did not find a relationship between program effect size and the size of the reward: the number of Advanced Placement tests passed was the same for schools that paid $100 per exam as for schools that paid between $101 and $500. By way of contrast, Fryer, Devi, and Holden (2016) found that achievement gains were greater with larger incentives. They initially paid students $2 per mathematics objective mastered. When they temporarily increased the amount of the incentives to $4 and then $6, the rate of learning objectives mastered per week also increased. Namely, when the incentive amount was $2, students mastered an average of 2.32 objectives per week, but when the amount was increased to $4 and then to $6, the average number of objectives mastered increased to 2.81 and 5.79, respectively. In a similar vein, Levitt, List, Neckermann, and Sadoff (2016) found that offering a $20 cash prize had a positive effect on test scores, while offering a $10 cash prize did not have an effect. However, this effect appeared to be driven mostly by older students, as younger children responded in similar ways to both the larger and smaller incentives.
COMPARISONS OF MULTIPLE TREATMENT CONDITIONS
Some studies included two or more treatment conditions, allowing for a direct comparison of the effectiveness of different types of incentive programs. Blimpo (2014) studied three types of student incentive structures (incentives to individual students, to teams of students, and to teams of students in a tournament format) and found them all to be equally effective at raising student test scores. Behrman, Parker, Todd, and Wolpin (2012) also studied the effectiveness of three incentive conditions, which differed with respect to the stakeholders being incentivized: individual students; individual teachers; or individual students together with groups of teachers and school administrators. They found that providing incentives to individual teachers had no effect on student test scores, but incentives provided to individual students had a positive effect on test scores. The strongest effect was found when individual students, groups of teachers, and school administrators were all eligible for cash rewards. Notably, teachers in this incentive condition reported spending more outside-of-class time helping students prepare for the exam than teachers in the two other incentive conditions.
Li, Han, Rozelle, and Zhang (2014) also found that a multipronged incentive structure was most effective. They studied an individual-level incentive program in which students who posted the largest achievement gains received a cash prize. In a variation on this incentive structure, they also incentivized peer tutoring in addition to test performance, such that a subset of higher achieving students were given contracts to tutor other students in the class. If their tutees were among the highest-gaining students, the tutors would receive the same cash prizes as the tutees. Offering individual student-level incentives had no effect on test scores, but combining incentives for test performance with incentives for peer tutoring showed a positive effect.
Hirshleifer (2017) compared the effectiveness of two incentive conditions, one of which focused on incentivizing inputs while the other focused on incentivizing outputs. In the inputs condition, students completed a series of interactive learning modules, after which they were administered a cumulative end-of-unit test. While working through the learning modules, students were provided immediate feedback about their performance. If they answered incorrectly, students were given an opportunity to click on a button to see the fully worked out question and the correct solution. Students could then incorporate this approach to future questions within a module. Students were paid based on the number of items they answered correctly while working through a given module and on the total number of modules mastered. In the outputs condition, students were paid based on the number of items answered correctly on the cumulative end-of-unit test. Hirshleifer (2017) found that on a subsequent non-incentivized test, students in the inputs condition outperformed students in the outputs condition. He hypothesized that incentivizing inputs was more effective than incentivizing outputs because it allowed students to more quickly and directly realize the fruits of their efforts.
UNINTENDED CONSEQUENCES OF MONETARY INCENTIVE PROGRAMS
One potential unintended consequence of monetary incentive programs is that they may divert students' attention to the incentivized subjects at the expense of the subjects that are not incentivized. This substitution effect, however, may depend on initial achievement level. In their study, Fryer and Holden (2013) paid students based on the number of mathematics objectives mastered. High-achieving treatment students mastered more mathematics objectives, scored higher on the standardized mathematics test, and scored comparably on the standardized reading test, relative to high-achieving control students. In contrast, although low-achieving treatment students mastered more mathematics objectives than low-achieving control students, they scored comparably to low-achieving control students on the standardized mathematics test and lower on the standardized reading test. Fryer and Holden (2013) noted that although both high- and low-achieving treatment students put in additional effort to obtain the prize (as evidenced by the increase in the number of mathematics objectives mastered), this increased effort came at the expense of the low-achieving students' reading performance.
The results of this study suggest that financial incentives can modestly improve student achievement. There was a positive effect of monetary incentives on overall achievement and on mathematics achievement, although there was no effect for reading/language arts achievement. This finding is consistent with studies suggesting that incentives may be more effective with concrete subjects, such as mathematics, than with conceptual subjects, such as reading/language arts (Rouse, 1998). Incentives were related to the achievement of both international and U.S. students, although the effect was stronger within the international context. There were no differences in the effects of incentives by gender or by initial achievement level, but the effect was significant for secondary grade students and not for elementary grade students. Perhaps by virtue of their better understanding of finances, older students may have found the cash rewards more enticing than younger students did.
IMPLICATIONS FOR THE DESIGN OF FUTURE INCENTIVE PROGRAMS
The modest effect sizes found in this study raise questions as to why incentives did not have a stronger effect. One possibility, raised by Fryer (2011), is that offering financial incentives may increase students' motivation to perform well, but students may not know what to do to improve their performance, despite their desire to do so. In interviews with students, Fryer (2011) found that students expressed excitement about the possibility of obtaining a cash prize, but when asked how they could improve their test performance to attain the reward, students could not readily answer. Students responded with general test-taking strategies (e.g., making sure that their answers were entered correctly), as opposed to strategies that would actually improve their learning (e.g., studying harder, completing their homework, asking teachers for help). Students' lack of understanding about what to do to improve their performance may explain why incentives did not affect their study habits.
In a similar vein, Li et al. (2014) noted that incentives may help to motivate students, but without being accompanied by additional remediation or academic supports, incentives, in and of themselves, will not help students learn the material. This may explain why treatment conditions that incentivized peers or teachers to provide extra assistance to students, such as those implemented by Li et al. (2014) and Behrman et al. (2015), showed stronger effects than treatment conditions that simply paid students for test performance. This finding is consistent with the conclusions by Slavin (2010), whose review led him to conclude that financial incentives to students worked best when paired with improvements in teaching or other supports.
Fryer (2011) suggested that incentivizing educational inputs (e.g., reading books) as opposed to outputs (e.g., reaching a performance standard on a test) may lead to stronger effects because inputs encourage students to engage in concrete behaviors that can lead to improved performance. By contrast, outputs such as reaching a proficient level are abstract goals, and do not offer students guidance about specific steps that will improve their learning. The findings from Hirshleifer's (2017) study lend credence to this idea, as students who were incentivized on inputs (i.e., the number of modules mastered during a unit) demonstrated better test performance than students who were incentivized on outputs (i.e., the number of items answered correctly on an end-of-unit exam). Due to a lack of studies that incentivized educational inputs, more definitive conclusions cannot be drawn. However, future studies should examine whether incentives applied in combination with educational inputs, such as providing immediate corrective feedback to students, prove to be more effective than incentives that merely pay students for the number of correct responses.
The results of this study have implications for the design of future incentive programs. Consistent with the findings from laboratory experiments (O'Neil, Abedi, Miyoshi, & Mastergeorge, 2005) as well as the findings from incentive programs conducted at the postsecondary level (Barrow & Rouse, 2016), there was some evidence that offering a larger cash prize may not necessarily lead to a stronger effect than offering a smaller cash prize (Jackson, 2010). The results also suggest that students may engage in substitution, focusing on the incentivized subjects to the detriment of non-incentivized subjects (Fryer et al., 2016). This suggests that policymakers may want to design incentive programs that incentivize multiple subjects, or include stipulations that performance on non-incentivized subjects must not decline beyond a pre-specified level in order for students to receive the reward.
IMPLICATIONS FOR POLICY AND PRACTICE
A key assumption in the interpretation of test scores is that the results are an accurate demonstration of what students know and can do. If students have not put forth their best effort on the tests because the tests do not hold personal consequences for them, then the test results can yield a misleading picture of students' abilities. The fact that this study found that student performance on mostly low-stakes tests could be improved with financial incentives calls into question the practice of making policy decisions based on these types of assessments. For example, the educational systems of countries that are highly ranked on international assessments are often lauded as exemplars that should serve as models for improvement (Grek, 2009). However, there is evidence that students from different countries may not have the same levels of motivation to perform on the low-stakes international tests (Zamarro, Hitt, & Mendez, 2016), and that performance differences on these tests reflect differences in ability as well as differences in motivation (Gneezy et al., 2017). This raises concerns about whether scores from these tests can accurately be used to rank order countries' educational systems.
Similarly, in the United States, policy decisions are often based on state achievement tests that carry little consequence for students yet high-stakes consequences for teachers and schools. Results from state achievement tests have been used to dismiss teachers, determine teacher bonuses, and sanction schools for failing to make adequate yearly progress. That students may not put forth maximum effort on these types of low-stakes tests in the absence of a financial reward undermines the utility of these tests as indicators of the quality of instruction, because the test scores may not accurately reflect what students have actually learned (Cole & Osterlind, 2008). Future studies should examine whether using tests that have personal consequences for students would change interpretations about teacher effectiveness or school improvement.
Another important policy question concerns the magnitude of the effect of student-level financial incentives relative to that of other programs designed to improve achievement. Compared to other interventions such as class size reduction (effect sizes of 0.08 to 0.10; Jepsen & Rivkin, 2002) or instructional reforms that involve computers or technology (an effect size of 0.15; McEwan, 2015), the effect sizes for student-level financial incentives are smaller. However, financial incentive programs are relatively inexpensive to implement, especially when compared to other types of reforms that involve substantial investments in human capital (Bettinger, 2012; Blimpo, 2014). Yeh (2010) suggested conducting a cost-effectiveness analysis, in which policymakers examine the relative impact of each intervention per dollar, in order to evaluate different types of interventions to inform policy decisions. Fryer (2011) conducted a cost-effectiveness analysis for the incentive programs included in his study, and found that statistically insignificant and weak effect sizes ranging from 0.0006 to 0.016 could have a 5% return on investment. Similarly, Blimpo (2014) found that the cost of implementing student-level financial incentives was $30 per standard deviation gain in test scores. By way of comparison, Yeh (2010) reported substantially higher costs for interventions that involved significant investments in human capital. For example, teacher education was estimated to cost just over $700 for a one-quarter standard deviation increase in student achievement, and cross-age tutoring was estimated to cost a little more than $550 for a nearly one standard deviation gain in mathematics test performance. Thus, although the effect sizes for financial incentives are smaller than those for other educational interventions, the relatively low level of resources needed to implement student-level incentives may mean that financial incentives can be a more cost-effective strategy for improving achievement.
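The comparison above can be made concrete with a back-of-the-envelope calculation that normalizes each cited cost figure to dollars per full standard-deviation gain. The sketch below is illustrative only: the function name is my own, and the figures are the approximate amounts quoted in the text (Blimpo, 2014; Yeh, 2010), not exact program costs.

```python
def cost_per_sd(cost_dollars, sd_gain):
    """Dollars required per 1.0 standard-deviation gain in test scores."""
    return cost_dollars / sd_gain

# Approximate figures quoted in the text (rounded; illustrative only)
interventions = {
    "student incentives (Blimpo, 2014)": cost_per_sd(30, 1.0),    # $30 per 1 SD
    "cross-age tutoring (Yeh, 2010)": cost_per_sd(550, 1.0),      # ~$550 per ~1 SD
    "teacher education (Yeh, 2010)": cost_per_sd(700, 0.25),      # ~$700 per 0.25 SD
}
for name, dollars in sorted(interventions.items(), key=lambda kv: kv[1]):
    print(f"{name}: ${dollars:,.0f} per SD gained")
```

On these rough numbers, incentives cost an order of magnitude less per standard deviation than the human-capital interventions, which is the crux of the cost-effectiveness argument.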
It is important to emphasize, however, that the effect sizes associated with financial incentives are not nearly large enough to bring the United States to the levels of the highest-achieving nations (National Research Council, 2011), so policymakers may wish to invest in other reforms that are more costly but also have the potential to yield stronger effects.
There are several limitations to this study. First, the study could not disentangle the impact of student incentives from other concurrent interventions. For example, in the United States, many schools and districts are required to submit continuous improvement plans, which are often accompanied by changes to the curriculum or to teachers' professional development. To the extent that the incentive programs took place at the same time as other ongoing interventions, the estimates in the meta-analysis may be overstated.
Second, it is possible that publication bias may have resulted in inaccurate estimates. Analyses using Duval and Tweedie's (2000) trim-and-fill procedure to assess publication bias did not change the study's conclusion that financial incentives had modestly positive effects on achievement. Similarly, Rosenthal's (1979) fail-safe N approach indicated that more than 300 additional studies with an average effect size of zero would be required to render the effect on overall achievement statistically insignificant. Although these analyses suggest that publication bias may be minimal, it remains possible that researchers self-censored and failed to publicly disseminate manuscripts with findings of null or negative relationships, which would result in an upward bias in the estimates.
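Rosenthal's (1979) fail-safe N asks how many unpublished zero-effect studies would be needed to overturn a significant combined result. A minimal sketch of the standard Stouffer-based computation, using hypothetical z-scores rather than this study's actual data:

```python
def fail_safe_n(z_scores, z_alpha=1.645):
    """Rosenthal's fail-safe N: the number of additional zero-effect studies
    needed to pull the combined (Stouffer) z below the one-tailed critical
    value z_alpha (1.645 for alpha = .05)."""
    k = len(z_scores)
    n_fs = sum(z_scores) ** 2 / z_alpha ** 2 - k
    return max(0.0, n_fs)  # cannot need a negative number of studies

# Hypothetical z-scores for five studies (illustrative only)
print(round(fail_safe_n([2.1, 2.8, 1.9, 3.0, 2.5]), 1))
```

A large fail-safe N relative to the number of observed studies, as reported here, suggests the file-drawer problem alone is unlikely to explain the positive overall effect.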
Finally, the relatively small number of studies included in the review means that the conclusions warrant caution. However, research has indicated that as few as two studies can yield meaningful meta-analytic results (Valentine, Pigott, & Rothstein, 2010), and it is common to find meta-analyses conducted on a small number of studies (IntHout, Ioannidis, Borm, & Goeman, 2015). For example, within the Cochrane Database of Systematic Reviews, a repository for thousands of systematic reviews and meta-analyses, the median number of studies per meta-analysis is seven (von Hippel, 2015). Research has also found that evaluating the reliability of meta-analytic results solely by the number of studies included can be misleading, because many meta-analyses include smaller studies that are underpowered, and the inclusion of smaller studies increases between-study heterogeneity (IntHout et al., 2015) and reduces the precision of the meta-analytic estimates (Turner, Bird, & Higgins, 2013). For this reason, researchers have suggested that meta-analyses ignore estimates from smaller studies and draw conclusions based exclusively on estimates from larger studies that are sufficiently powered (Kraemer, Gardner, Brooks, & Yesavage, 1998; Stanley, Jarrell, & Doucouliagos, 2010). Notably, the primary studies included in the meta-analysis were able to leverage existing administrative records, and each analysis was conducted on thousands of students. These larger sample sizes lend credence to the reliability of the study conclusions.
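A common way to quantify the between-study heterogeneity discussed above is Higgins et al.'s (2003) I² statistic, the percentage of total variation across studies that is due to heterogeneity rather than chance. A minimal sketch follows; the Cochran's Q value and study count are hypothetical inputs, not values from this meta-analysis.

```python
def i_squared(q, k):
    """I^2 = 100% * (Q - df) / Q, truncated at zero, where Q is Cochran's
    heterogeneity statistic and df = k - 1 for k studies."""
    df = k - 1
    return 0.0 if q <= df else 100.0 * (q - df) / q

# Hypothetical values: Q = 25.0 across k = 10 studies
print(i_squared(25.0, 10))
```

As von Hippel (2015) cautions, I² itself can be biased when the number of studies is small, which is one reason study count alone is a poor gauge of a meta-analysis's reliability.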
Overall, this study suggests that financial incentives can modestly improve student achievement. The findings also raise questions as to whether policymakers should use low-stakes tests, such as state achievement tests or international assessments, to inform high-stakes decisions. More research is needed to better understand the intersection between student achievement and student motivation, and the inferences that can be drawn from tests that are administered in the absence of financial incentives or other personal consequences for students.
1. See Fang and Gerhart (2012) and Johnston and Sniehotta (2010) for a discussion of alternative motivational theories for incentive programs, including self-regulation theory, self-determination theory, and cognitive evaluation theory.
This research was supported by funding from NORC's Working Paper Series. The content or opinions expressed do not necessarily reflect the views of NORC, and any errors remain my own.
References marked with an asterisk indicate studies included in the meta-analysis.
Angrist, J., Lang, D., & Oreopoulos, P. (2009). Incentives and services for college achievement: Evidence
from a randomized trial. American Economic Journal: Applied Economics, 1, 136–163.
Angrist, J. D., & Lavy, V. (2009). The effects of high stakes school achievement awards: Evidence from a
randomized trial. American Economic Review, 99, 301–331.*
Ash, K. (2008, February 13). Promises of money meant to heighten student motivation. Education Week.
Retrieved from http://www.edweek.org
Barrera-Osorio, F., & Filmer, D. (2016). Incentivizing schooling for learning: Evidence on the impact of
alternative targeting approaches. Journal of Human Resources, 51(2), 461–499.*
Barrow, L., & Rouse, C. E. (2016). Financial incentives and educational investment: The impact of
performance-based scholarships on student time use. Education Finance and Policy, 13(4), 419–448.
Behrman, J. R., Parker, S. W., Todd, P. E., & Wolpin, K. I. (2015). Aligning learning incentives of students
and teachers: Results from a social experiment in Mexican high schools. Journal of Political Economy,
Bembenutty, H. (2008). Academic delay of gratification and expectancy value. Personality and Individual
Differences, 44, 193–202.
Berry, J., Kim, H. B., & Son, H. (2017). When student incentives don't work: Evidence from a field
experiment in Malawi. Newark, DE: University of Delaware.*
Berry, J. W. (2015). Child control in education decisions. Journal of Human Resources, 50(4), 1051
Bettinger, E. P. (2012). Paying to learn: The effect of financial incentives on elementary school test
scores. The Review of Economics and Statistics, 94, 686–698.*
Blimpo, M. P. (2014). Team incentives for education in developing countries: A randomized field
experiment in Benin. American Economic Journal: Applied Economics, 6, 90–109.*
Borenstein, M., Hedges, L. V., Higgins, J. P. T., & Rothstein, H. R. (2009). Introduction to meta-analysis.
Chichester, UK: Wiley.
Burgess, S., Metcalfe, R., & Sadoff, S. (2016). Understanding the response to financial and non-financial
incentives in education: Field experimental evidence using high-stakes assessments. Bristol, UK:
University of Bristol.*
Cole, J. S., & Osterlind, S. J. (2008). Investigating differences between low- and high-stakes test
performance on a general education exam. The Journal of General Education, 57(2), 119–130.
Cooper, H., Robinson, J. C., & Patall, E. A. (2006). Does homework improve academic achievement? A
synthesis of research, 1987–2003. Review of Educational Research, 76, 1–62.
Deci, E. L., Koestner, R., & Ryan, R. (1999). A meta-analytic review of experiments examining the effects
of extrinsic rewards on intrinsic motivation. Psychological Bulletin, 125, 627–668.
Deci, E. L., Koestner, R., & Ryan, R. M. (2001). Extrinsic rewards and intrinsic motivation in education:
Reconsidered once again. Review of Educational Research, 71, 1–27.
Duflo, E., Hanna, R., & Ryan, S. P. (2012). Incentives work: Getting teachers to come to school.
American Economic Review, 102, 1241–1278.
Duval, S. J., & Tweedie, R. L. (2000). Trim and fill: A simple funnel-plot-based method of testing and
adjusting for publication bias in meta-analysis. Biometrics, 56(2), 455–463.
Eccles, J. S. (2007). Families, schools, and developing achievement-related motivations and
engagement. In J. E. Grusec & P. D. Hastings (Eds.), Handbook of socialization (pp. 665–691). New
York: The Guilford Press.
Eccles, J. S., Adler, T. F., Futterman, R., Goff, S. B., Kaczala, C. M., Meece, J. L., & Midgley, C. (1983).
Expectancies, values, and academic behaviors. In J. T. Spence (Ed.), Achievement and achievement
motivation (pp. 75–146). San Francisco, CA: W. H. Freeman.
Eccles, J. S., & Wigfield, A. (2002). Motivational beliefs, values, and goals. Annual Review of Psychology,
Fang, M., & Gerhart, B. (2012). Does pay for performance diminish intrinsic interest? The International Journal of Human Resource Management, 26(6), 1176–1196.
Frey, B. S., & Goette, L. (1999). Does pay motivate volunteers? Zürich, CH: Institute for Empirical
Research in Economics, Universität Zürich.
Fryer, R. G. (2011). Financial incentives and student achievement: Evidence from randomized trials. The
Quarterly Journal of Economics, 126, 1755–1798.*
Fryer, R. G., Devi, T., & Holden, R. T. (2016). Vertical versus horizontal incentives in education: Evidence
from randomized trials. Cambridge, MA: Harvard University.*
Gallini, S. (2017). Incentives, peer pressure, and behavior persistence. Cambridge, MA: Harvard University.
Gneezy, U., List, J. A., Livingston, J., Sadoff, S., Qin, X., & Xu, Y. (2017). Measuring success in education: The role of effort on the test itself. National Bureau of Economic Research Working Paper No. w24004. Cambridge, MA: NBER.*
Gneezy, U., Meier, S., & Rey-Biel, P. (2011). When and why incentives (don't) work to modify behavior. The Journal of Economic Perspectives, 25(4), 191–209.
Grek, S. (2009). Governing by numbers: The PISA effect in Europe. Journal of Education Policy, 24, 23–37.
Hedges, L. V., Tipton, E., & Johnson, M. C. (2010). Robust variance estimation in meta-regression with dependent effect size estimates. Research Synthesis Methods, 1, 39–65.
Higgins, J. P. T., Thompson, S. G., Deeks, J. J., & Altman, D. G. (2003). Measuring inconsistency in meta-analyses. BMJ, 327(7414), 557–560.
Hirshleifer, S. R. (2017). Incentives for effort or outputs? A field experiment to improve student performance. Riverside, CA: University of California, Riverside.*
IntHout, J., Ioannidis, J. P. A., Borm, G. F., & Goeman, J. J. (2015). Small studies are more
heterogeneous than large ones. Journal of Clinical Epidemiology, 68, 860–869.
Jackson, C. K. (2010). A little now for a lot later: A look at a Texas Advanced Placement incentive program. The Journal of Human Resources, 45, 591–639.*
Jackson, C. K. (2014). Do college-preparatory programs improve long-term outcomes? Economic Inquiry, 52, 72–99.*
Jepsen, C., & Rivkin, S. (2002). Class size reduction, teacher quality, and academic achievement in California elementary public schools. San Francisco: Public Policy Institute of California.
Johnston, M., & Sniehotta, F. (2010). Financial incentives to change patient behavior. Journal of Health Services Research Policy, 15(3), 131132.
Kim, R. S. (2011). Standardized regression coefficients as indices of effect sizes in meta-analysis (Unpublished doctoral dissertation). Florida State University, Tallahassee, FL.
Kohn, A. (1993). Why incentive plans cannot work. Harvard Business Review, 71, 54–63.
Kraemer, H. C., Gardner, C., Brooks, J. O., & Yesavage, J. A. (1998). Advantages of excluding underpowered studies in meta-analysis: Inclusionist versus exclusionist viewpoints. Psychological Methods, 3, 23–31.
Kremer, M., Miguel, E., & Thornton, R. (2009). Incentives to learn. The Review of Economics and Statistics, 91, 437–456.*
Lepper, M., Greene, D., & Nisbett, R. (1973). Undermining children's intrinsic interest with extrinsic reward: A test of the overjustification hypothesis. Journal of Personality and Social Psychology, 28, 129–137.
Leuven, E., Oosterbeek, H., & van der Klaauw, B. (2010). The effect of financial rewards on students' achievement: Evidence from a randomized experiment. Journal of the European Economic Association, 8, 1243–1265.
Levitt, S. D., List, J. A., Neckermann, S., & Sadoff, S. (2016). The behavioralist goes to school: Leveraging behavioral economics to improve educational performance. American Economic Journal: Economic Policy, 8(4), 183–219.*
Levitt, S. D., List, J. A., & Sadoff, S. (2016). The effect of performance-based incentives on educational achievement: evidence from a randomized experiment. National Bureau of Economic Research Working Paper No. w22107. Cambridge, MA: NBER.*
Li, T., Han, L., Rozelle, S., & Zhang, L. (2014). Encouraging classroom peer interactions: Evidence from Chinese migrant schools. Journal of Public Economics, 111, 29–45.*
List, J. A., Livingston, J. A., & Neckermann, S. (2012). Harnessing complementarities in the education production function. Chicago, IL: University of Chicago.*
McEwan, P. J. (2015). Improving learning in primary schools of developing countries: A meta-analysis of randomized experiments. Review of Educational Research, 85(3).
Miller, C., Riccio, J., Verma, N., Nunez, S., Dechausay, N., & Yang, E. (2015). Testing a conditional cash transfer program in the U.S.: The effects of the Family Rewards program in New York City. Journal of Labor Policy, 4, 1–29.*
National Research Council (2011). Incentives and test-based accountability in education. Washington, DC: Author.
Nieminen, P., Lehtiniemi, H., Vähäkangas, K., Huusko, A., & Rautio, A. (2013). Standardised regression coefficient as an effect size index in summarizing findings in epidemiological studies. Epidemiology Biostatistics and Public Health, 10, 1–15.
O'Neil, H. F., Abedi, J., Miyoshi, J., & Mastergeorge, A. (2005). Monetary incentives for low-stakes tests. Educational Assessment, 10, 185–208.
Prothero, A. (2017, October 17). Does paying kids to do well in school actually work? Education Week. Retrieved from http://www.edweek.org.
Raymond, M. (2008). Paying for As: An early exploration of student reward and incentive programs in charter schools. Stanford, CA: CREDO.
Ringquist, E.J. (2013). Meta-analysis for public management and policy. San Francisco, CA: Jossey-Bass.
Rosenthal, R. (1979). The file-drawer problem and tolerance for null results. Psychological Bulletin, 86, 638–641.
Rouse, C. (1998). Private school vouchers and student achievement: An evaluation of the Milwaukee Parental Choice Program. Quarterly Journal of Economics, 113, 553–602.
Ryan, R. M., & Deci, E. L. (2000). Self-determination theory and the facilitation of intrinsic motivation, social development, and well-being. American Psychologist, 55, 68–78.
Sadoff, S. (2014). The role of experimentation in education policy. Oxford Review of Economic Policy, 30, 597–620.
Scammacca, N., Roberts, G., & Stuebing, K. K. (2014). Meta-analysis with complex research designs: Dealing with dependence from multiple measures and multiple group comparisons. Review of Educational Research, 84(3), 328–364.
Scheie, E. (2014, August 20). Cities experiment with paying poor students for good grades. World
Journalism Institute. Retrieved from https://world.wng.org.
Sharma, D. (2010). The impact of financial incentives on academic achievement and household behavior: Evidence from a randomized trial. Columbus, OH: The Ohio State University.*
Slavin, R. E. (2010). Can financial incentives enhance educational outcomes? Evidence from international experiments. Educational Research Review, 5, 68–80.
Stanley, T. D., Jarrell, S. B., & Doucouliagos, H. (2010). Could it be better to discard 90% of the data? A statistical paradox. The American Statistician, 64, 70–77.
Stolovitch, H. D., Clark, R. E., & Condly, S. J. (2002). Incentives, motivation, and workplace performance: Research and best practices. McLean, VA: International Society for Performance Improvement and The Incentive Research Foundation.
Tanner-Smith, E. E., & Tipton, E. (2014). Robust variance estimation with dependent effect sizes: Practical considerations including a software tutorial in Stata and SPSS. Research Synthesis Methods, 5(1), 13–30.
Tipton, E. (2015). Small sample adjustments for robust variance estimation with meta-regression. Psychological Methods, 20(3), 375–393.
Tipton, E., & Pustejovsky, J. (2015). Small-sample adjustments for tests of moderators and model fit using robust variance estimation in meta-regression. Journal of Educational and Behavioral Statistics, 40(6), 604–634.
Turner, R. M., Bird, S. M., & Higgins, J. P. T. (2013). The impact of study size on meta-analyses: Examination of underpowered studies in Cochrane Reviews. PLoS ONE, 8(3), e59202.
Valentine, J. C., Pigott, T. D., & Rothstein, H. R. (2010). How many studies do you need? A primer on statistical power for meta-analysis. Journal of Educational and Behavioral Statistics, 35(2), 215–247.
Visaria, S., Dehejia, R., Chao, M. M., & Mukhopadhyay, A. (2016). Unintended consequences of rewards for student attendance: Results from a field experiment in Indian classrooms. Economics of Education Review, 54, 173–184.
von Hippel, P. T. (2015). The heterogeneity statistic I² can be biased in small meta-analyses. BMC Medical Research Methodology, 15(35), 1–8.
Wallace, B. D. (2009). Do economic rewards work? District Administration, 45, 24–27.
White, R. (2012). The effectiveness of an incentivized program to increase daily fruit and vegetable dietary intake by low-income, middle-aged women. Unpublished thesis. Mankato, MN: Minnesota State University, Mankato.
Wigfield, A., & Cambria, J. (2000). Expectancy-value theory: Retrospective and prospective. In T. Urdan & S. A. Karabenick (Eds.), The decade ahead: Theoretical perspectives on motivation and achievement (Advances in Motivation and Achievement, pp. 35–70). Bingley, UK: Emerald Group Publishing Limited.
Willingham, D. T. (2008). Should learning be its own reward? American Educator, 31, 29–35.
Wilson, S. J., Tanner-Smith, E. E., Lipsey, M. W., Steinka-Fry, K., & Morrison, J. (2011). Dropout prevention and intervention programs: Effects on school completion and dropout among school-aged children and youth. Campbell Systematic Reviews, 8.
Yeh, S. (2010). The cost effectiveness of 22 approaches for raising student achievement. Journal of Education Finance, 36(1), 38–75.
Zamarro, G., Hitt, C., & Mendez, I. (2016). Reexamining international differences in achievement and non-cognitive skills: When students don't care. Little Rock, AR: University of Arkansas.
CHARACTERISTICS OF THE STUDIES INCLUDED IN THE META-ANALYSIS
Note: A study may have included a separate condition that incentivized teachers or parents. Unless teachers or parents were incentivized in conjunction with students, the effect sizes from these treatment conditions were omitted from the analysis.