
Assessing the Effects of Small School Size on Mathematics Achievement: A Propensity Score-Matching Approach


by Adam E. Wyse, Venessa Keesler & Barbara Schneider - 2008

Background: Small schools have been promoted as an educational reform that is capable of improving student outcomes. However, a survey of the research on small schools indicates that much of the movement for decreasing school size is based primarily on correlational methods that do not control for selection effects in the data. In addition, several recent studies have suggested that smaller schools may be able to increase student attendance and graduation rates but that these gains might not necessarily translate into gains in student achievement.

Purpose: This study investigates the potential effect of attending smaller schools on student mathematics achievement using propensity score matching techniques.

Research Design: Data in the study are from the Educational Longitudinal Study of 2002 and represent over 12,000 high school students. Observed student responses from 10th and 12th grade are used in the analyses.

Data Collection and Analysis: An estimate of the potential effect of attending smaller schools is determined by matching students in the largest schools to smaller schools of four different sizes using propensity score matching techniques. These methods are used to attempt to account for selection effects present in these data and to approximate what the effect of attending a smaller school would be in each case.

Results: Results from the study suggest that simply switching students to smaller school environments does not necessarily raise the mathematics achievement of students in the largest schools. Further analysis indicated that there did not appear to be an optimal range of school sizes that would provide maximum levels of student mathematics achievement.

Conclusion: This study suggests that creating smaller schools might not be the best mechanism to raise student achievement. It is suggested that policy makers make careful deliberations before deciding to invest in small schools as an educational reform, because it is hard to establish when they will or will not be successful. Further research is needed into what makes some small schools effective.



Small schools have been promoted as a reform to improve student achievement (Lee, 2000; Lee & Smith, 1995, 1997; Wasley et al., 2000). However, many studies have reported inconsistent results for achievement, with much of this research primarily based on correlation analyses that do not control for selection effects (Schneider, Wyse, & Keesler, 2007). It could be, for example, that students in small schools have higher achievement scores not because of the size of the school they attend, but because students who attend these types of schools are fundamentally different from students in larger schools. Students in smaller schools may have characteristics influencing their achievement that are unrelated to school size. By matching students with similar characteristics, propensity score matching techniques attempt to account for selection effects and allow the potential impact of school size to be estimated. This article uses propensity score matching techniques to analyze data from the Educational Longitudinal Study of 2002 (ELS), exploring what the impact of attending smaller schools of various sizes could be for mathematics achievement.


THE EVIDENCE FOR SMALL SCHOOLS: IS SMALLER REALLY BETTER?


Small schools have been implemented as a school reform mechanism, drawing on a significant body of literature that suggests that small schools can produce a number of positive student outcomes. Through much of the 1980s and 1990s, research on small schools argued that decreasing school size could promote student motivation and achievement, teacher engagement, and effective management practices. Additionally, small schools were shown to have a positive impact on lowering dropout rates both in smaller religious and nonreligious schools (Coleman & Hoffer, 1987; Pittman & Haughwout, 1987).


These same findings were supported by analyses of the National Education Longitudinal Study of 1988 (NELS:88) in the 1990s, which showed that smaller schools produced substantial gains in mathematics achievement for high school students (Lee, 2000; Lee & Smith, 1997).1 Small schools were perceived to be an effective solution for improving low academic achievement, particularly in those schools serving low-income or minority students (Lee & Smith, 1995). Other researchers using different methods also showed positive effects for school size, arguing that small schools were more likely to produce a strong sense of community by creating a sense of belonging, improving interpersonal relations, strengthening teacher commitment, increasing student participation, developing greater program coherence, and modestly increasing student expectations (Raywid, 1996; Wasley et al., 2000). Results from these and other studies were often used as evidence by policy makers and school administrators to decrease school size, with the expectation that this could improve student achievement (Toch, 1991).


Recently, the Gates Foundation has funded the establishment of over 150 small schools around the country. However, the results of these initiatives have been mixed. Although the Gates Foundation schools have been found to have positive school cultures, the academic results are not encouraging, with these small schools performing lower in math and only slightly higher in English and reading than larger schools in the same locations with similar student populations (Greene & Symonds, 2006). In cities like New York and Chicago, which have pursued small schools as a means for raising student achievement, students in these smaller schools have better attendance rates and are more likely to be on track for graduation as compared with students in larger schools, but they do not score higher on achievement tests (Huebner, 2005; Herszenhorn, 2006; Kahne, Sporte, de la Torre, & Easton, 2006). Moreover, the dropout rate in the small schools is comparable with larger schools with similar student populations (Kahne et al., 2006). This suggests that decreasing the size of schools may be an insufficient solution to remediate the significant problems facing education today.


For example, research has shown that although students may feel better in schools with a greater sense of belonging, these feelings do not necessarily translate into achievement gains (Shouse, 1996, 1997). In schools with a sense of strong academic press, students were more likely to have increased achievement scores, whereas students in schools with weaker academic press, coupled with a strong sense of community, were more likely to have lower achievement scores.2 Phillips (1997) found that academic climate was related to increases in attendance and mathematics achievement, whereas communitarian values were negatively related to mathematics achievement and had minimal effects on attendance. In schools where teacher caring was high, mathematics test scores were low, suggesting that “teachers in some schools may be more concerned with maintaining affective relations with students than with imparting skills” (Phillips, 1997, p. 656). Some have argued that this substitution of “socially therapeutic” values and activities for the more intellectually demanding ones related to high academic press may be particularly prevalent in schools that serve disadvantaged students (Shouse, 1996).


Small schools are often created in low-performing urban areas with low-income and minority families. If these schools are successful in strengthening the sense of community and developing a positive school climate but are not able to raise achievement at the same time, it would appear that this reform may not be working as effectively for the population for which it was designed (Battistich, Solomon, Kim, Watson, & Schaps, 1995; Ravitch, 2006). When schools are the unit of reform for an innovation, effective change should be located in the school itself and be specific to the school, which is likely to have a unique organizational character and student population (Stevenson, 2000). Applying this rationale to school size, schools can be very similar in size but very different in organization, outcomes, and norms. Therefore, simply creating smaller schools and transferring students into them from larger schools may not produce the desired effect on student achievement.


Much of the prior research on small schools has used techniques to assess causal effects that rely on correlational analyses rather than the randomized clinical trials that are considered the “gold standard” for estimating effects (Shadish, Cook, & Campbell, 2002).3 Ideally, to determine whether smaller schools are effective in raising student achievement, one would randomly assign students to large and small schools and then compare achievement outcomes. Because this is challenging from a practical standpoint, an alternative technique is to match students on a set of common characteristics and then statistically estimate the effect of being in schools of varying sizes. Using such a technique is especially important when investigating the impact of school size on achievement because students who attend smaller schools may have background characteristics that are substantially different from those of students who attend large schools. Thus, in this study, we use propensity score matching techniques, which match students on a set of characteristics to estimate the effect of attending smaller schools (see, e.g., Rosenbaum, 2002; Rosenbaum & Rubin, 1983; Rubin, 1979; Rubin & Thomas, 2000).

 

The effect we have chosen to examine is mathematics achievement, for several reasons. First, researchers have tended to agree that, in contrast to other school subjects, mathematics learning is likely to occur in school and to be particularly sensitive to instruction; mathematics learning is thus more school dependent than other subjects (Borman & D’Agostino, 1996; Burkam, Ready, Lee, & LoGerfo, 2004; Porter, 1989). Second, achievement in science and mathematics has been shown to be associated with college attendance. Students who score higher in mathematics on standardized tests (Hoffer, 1995) and who take more advanced mathematics and science courses (Schneider, Swanson, & Riegle-Crumb, 1998) are more likely to attend competitive 4-year colleges. Finally, mathematics achievement test scores are one of the criteria by which schools are judged to be making Adequate Yearly Progress under the No Child Left Behind legislation (No Child Left Behind Act of 2001, 2002). Because mathematics achievement is a skill taught primarily in schools and often serves as a gatekeeper for access to postsecondary success, this study explores whether shifting students from very large schools to smaller schools of varying sizes has an impact on mathematics achievement. For this study, mathematics achievement is measured by standardized math test scores.


DATA AND SAMPLE


The data for this study were obtained from the Educational Longitudinal Study of 2002 (ELS:2002), conducted by the National Center for Education Statistics (NCES).4 ELS:2002 is a nationally representative sample of over 15,000 students in 750 high schools and provides detailed information about the nation’s high schools and students. The ELS data set is similar to NELS and contains many school and student measures, including information related to student achievement, academics, interests, and demographics. For this analysis, both the base-year and first follow-up surveys are used.


The sample for this analysis included students who completed surveys in both 10th and 12th grade, which represented 12,853 students in 745 schools. The change in sample size occurred because respondents who had legitimate skips on either high school size or mathematics test scores were excluded.5 For missing data, four imputed data sets are created using an EM algorithm (Dempster, Laird, & Rubin, 1977).6 This algorithm estimates missing values by using the other nonmissing values in the data set (Allison, 2002; Little & Rubin, 1987; Schafer, 1997). Although this strategy provides a more reliable alternative to listwise deletion, the results may still be an underestimation compared with the complete data set (Allison; Little & Rubin; Schafer).7
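The general flavor of such an algorithm can be sketched with a simple iterative regression imputation in numpy. This is only an illustration of the idea of estimating missing values from the nonmissing ones, on simulated toy data; it is not the authors' actual EM implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: two correlated covariates with ~10% of cells missing at random.
n = 200
x1 = rng.normal(50, 10, n)
x2 = 0.8 * x1 + rng.normal(0, 5, n)
X = np.column_stack([x1, x2])
mask = rng.random(X.shape) < 0.1
X_obs = np.where(mask, np.nan, X)

def em_style_impute(X, n_iter=20):
    """Seed missing cells with column means, then repeatedly re-estimate
    them by regressing each column on the others until values stabilize."""
    X = X.copy()
    miss = np.isnan(X)
    col_means = np.nanmean(X, axis=0)
    X[miss] = np.take(col_means, np.where(miss)[1])      # initial fill
    for _ in range(n_iter):
        for j in range(X.shape[1]):
            others = np.delete(X, j, axis=1)
            A = np.column_stack([np.ones(len(X)), others])
            beta, *_ = np.linalg.lstsq(A, X[:, j], rcond=None)
            pred = A @ beta
            X[miss[:, j], j] = pred[miss[:, j]]          # update missing only
    return X

X_imp = em_style_impute(X_obs)
```

Observed cells are never altered; only the initially missing cells are refined on each pass.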


Descriptive statistics indicate that in the ELS data set, students in the largest schools are more likely to attend public schools in urban areas than students in the smallest schools, who are more evenly dispersed between public and private schools in urban, suburban, and rural school environments. Larger schools also have more minority students, with a lower percentage of students enrolling in 4-year colleges after graduation, and are more likely to have students from lower income families (see Table 1).



Table 1. Descriptive Statistics

| | 1–399 students | 400–799 students | 800–1,199 students | 1,200–1,999 students | 2,000+ students |
|---|---|---|---|---|---|
| Urbanicity | | | | | |
| Urban | 23% | 25% | 35% | 37% | 41% |
| Suburban | 35% | 50% | 51% | 52% | 50% |
| Rural | 42% | 25% | 14% | 11% | 9% |
| Region | | | | | |
| Northeast | 18% | 23% | 23% | 16% | 12% |
| Midwest | 41% | 28% | 23% | 26% | 15% |
| South | 26% | 38% | 39% | 38% | 34% |
| West | 15% | 11% | 15% | 20% | 39% |
| Race | | | | | |
| White | 78% | 72% | 64% | 50% | 35% |
| Black | 6% | 11% | 14% | 16% | 11% |
| Hispanic | 7% | 9% | 9% | 15% | 27% |
| API | 2% | 3% | 8% | 13% | 21% |
| Multiracial | 5% | 4% | 4% | 5% | 5% |
| Other | 2% | 1% | 1% | 1% | 1% |
| Sector | | | | | |
| Public | 50% | 63% | 71% | 92% | 98% |
| Catholic | 19% | 26% | 20% | 6% | 1% |
| Other private | 31% | 11% | 9% | 2% | 1% |
| Free/Reduced Lunch | | | | | |
| 25% or less | 79% | 75% | 72% | 72% | 67% |
| Greater than 25% but less than 50% | 17% | 18% | 19% | 24% | 26% |
| Greater than 50% but less than 75% | 3% | 4% | 8% | 4% | 5% |
| 75% or more | <1% | 3% | <1% | <1% | 2% |
| College-going Rate | | | | | |
| Less than 50% 4-year college | 45% | 40% | 41% | 44% | 62% |
| 50% or more 4-year college | 55% | 60% | 59% | 56% | 38% |



These differences in background characteristics suggest that there are potential selection effects in these data and that an adjustment for selection effects should be conducted.8


MEASURES


The key outcome variable of interest is 12th-grade mathematics achievement, which is a continuous measure obtained from the math item response theory (IRT) estimate on the ELS mathematics test.9 This measure was selected because it is a standardized measure that can be used for comparative purposes, as opposed to other measures of math achievement, such as teacher-awarded grades in math classes, which are subjective and not standardized; SAT or ACT scores, which have missing values for a large proportion of students, especially for minority students in urban school settings; or math courses, which are not standardized across schools. This study focuses on how mathematics achievement is influenced by high school size—specifically, whether attending schools of contrasting sizes could result in different levels of mathematics achievement.


METHODOLOGICAL APPROACH


Most of the previous research on high school size has used hierarchical linear modeling (HLM) or correlation analyses. HLM provides a significant improvement over ordinary least squares (OLS) regression models because it can effectively capture the multilevel nature of these data by estimating effects at both the school and student level (Raudenbush & Bryk, 2002). However, similar to OLS regression, HLM is not designed to account for selection effects that may be present in these data.10, 11


One potential way to account for selection effects is to conduct an experiment in which students are randomly assigned to schools of different sizes. In this way, the selection effects would be accounted for by random assignment, and the difference in mathematics achievement could be estimated for large and small schools. However, it would be quite difficult to conduct an experiment in which students are assigned to schools that only varied in size. Because we cannot assume that school size will have the same effect on students with different characteristics, an alternative strategy is needed.


The propensity score methods of Rosenbaum and Rubin (1983) allow for the approximation of an experimental design using observational data by matching students together based on a set of covariates.12 Propensity score matching is a technique to approximate an experiment in which individuals with similar propensities are matched and compared on outcome measures to determine the effect of the treatment. This set of covariates typically includes each variable that could potentially cause selection effects. The propensity of being in a particular group (in this case, smaller schools) is estimated by submitting the covariates to logistic regression. For these analyses, 68 covariates are used to attempt to account for the selection effects that might be present in these data and include such factors as 10th-grade student achievement, demographics, attitudes, motivation, and extracurricular activity participation.13
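This estimation step can be sketched as follows, using scikit-learn's logistic regression. The data, variable names, and the handful of covariates below are illustrative stand-ins, not the study's actual 68 covariates.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)

# Hypothetical covariates (the study used 68, e.g. 10th-grade achievement,
# demographics, attitudes); here a few simulated stand-ins.
n = 1000
X = rng.normal(size=(n, 5))

# Simulated "attends a smaller school" indicator that depends on the
# covariates, so selection effects are present by construction.
p_true = 1 / (1 + np.exp(-(X[:, 0] - 0.5 * X[:, 1])))
small_school = rng.random(n) < p_true

# Estimate each student's propensity of being in the smaller schools.
model = LogisticRegression(max_iter=1000).fit(X, small_school)
propensity = model.predict_proba(X)[:, 1]

# The logit of the propensity score is the quantity balanced across strata.
logit = np.log(propensity / (1 - propensity))
```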


In each of the analytic models that follow, students in schools of 2,000 or more students are matched to students in schools of smaller sizes based on their propensity of being in the smaller school. Students are then sorted and subclassified into 20 strata, which are used to estimate the average effect of attending a smaller school for students currently attending schools of 2,000 or more students.14
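Subclassification of this kind can be sketched by cutting the logit of the propensity score at its 5% quantiles, which sorts students into 20 strata of roughly equal size. The data below are simulated stand-ins, and the study's actual stratification procedure may differ in detail.

```python
import numpy as np

rng = np.random.default_rng(2)
logit = rng.normal(size=1000)  # stand-in for the logit propensity scores

# Cut the logit of the propensity score at its 5% quantiles, sorting
# students into 20 strata of roughly equal size.
edges = np.quantile(logit, np.linspace(0, 1, 21))
stratum = np.digitize(logit, edges[1:-1], right=True) + 1  # strata 1..20
```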


Four separate propensity matches are performed, matching students in schools of 2,000 or more students to students attending schools of 1–399, 400–799, 800–1,199, and 1,200–1,999 students. These categories, which contain approximately equal numbers of students, were selected because they reflect the natural divisions among school sizes in current school organizations. In each propensity score match, balance was checked across each of the strata, and the observed differences did not exceed what would be expected by chance alone.
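One simple way to check balance of this sort is to compare the two groups' mean logit propensity score within each stratum, which is the comparison reported in Table 2. The data below are simulated for illustration only.

```python
import numpy as np

rng = np.random.default_rng(3)

# Illustrative data: logit propensity scores and a "smaller school"
# indicator whose probability depends on the score.
n = 1000
logit = rng.normal(size=n)
smaller = rng.random(n) < 1 / (1 + np.exp(-logit))

# Subclassify into 20 strata at the 5% quantiles of the logit.
edges = np.quantile(logit, np.linspace(0, 1, 21))
stratum = np.digitize(logit, edges[1:-1], right=True)  # 0..19

# Within each stratum, compare the two groups' mean logit.
# Well-balanced strata show small gaps between the group means.
gaps = []
for s in range(20):
    in_s = stratum == s
    a, b = logit[in_s & smaller], logit[in_s & ~smaller]
    if len(a) and len(b):
        gaps.append(abs(a.mean() - b.mean()))
```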


Table 2 shows the balance of the logit of the propensity score for each of the four matches across 20 different strata. The mean and the standard deviation within each stratum are approximately the same, further suggesting that the matches achieved adequate balance. The sample sizes across the strata indicate that students who are in the 20th stratum are more likely to attend a smaller school (fewer than 2,000 students), whereas students in the first stratum are more likely to attend schools of 2,000 or more students.


Table 2. Balance of Logit of Propensity Score

First Match for Mathematics Achievement

| Stratum | 2,000+ N | 2,000+ Mean | 2,000+ SD | 1–399 N | 1–399 Mean | 1–399 SD |
|---|---|---|---|---|---|---|
| 1 | 206 | -5.26 | 0.65 | 1 | -5.12 | |
| 2 | 197 | -4.06 | 0.24 | 11 | -4.00 | 0.29 |
| 3 | 202 | -3.39 | 0.16 | 5 | -3.45 | 0.10 |
| 4 | 196 | -2.87 | 0.14 | 12 | -2.85 | 0.16 |
| 5 | 191 | -2.39 | 0.14 | 16 | -2.32 | 0.08 |
| 6 | 181 | -1.97 | 0.12 | 27 | -0.19 | 0.10 |
| 7 | 175 | -1.56 | 0.11 | 32 | -1.57 | 0.11 |
| 8 | 166 | -1.21 | 0.10 | 42 | -1.21 | 0.11 |
| 9 | 156 | -0.83 | 0.13 | 51 | -0.83 | 0.12 |
| 10 | 118 | -0.45 | 0.10 | 90 | -0.43 | 0.09 |
| 11 | 114 | -0.12 | 0.10 | 94 | -0.09 | 0.10 |
| 12 | 98 | 0.23 | 0.09 | 109 | 0.24 | 0.10 |
| 13 | 62 | 0.59 | 0.12 | 146 | 0.60 | 0.10 |
| 14 | 48 | 0.97 | 0.10 | 159 | 0.97 | 0.11 |
| 15 | 41 | 1.36 | 0.12 | 167 | 1.35 | 0.11 |
| 16 | 30 | 1.78 | 0.11 | 177 | 1.77 | 0.12 |
| 17 | 23 | 2.18 | 0.14 | 185 | 2.21 | 0.14 |
| 18 | 11 | 2.73 | 0.18 | 196 | 2.76 | 0.17 |
| 19 | 10 | 3.32 | 0.31 | 198 | 3.48 | 0.27 |
| 20 | 2 | 4.80 | 1.03 | 205 | 5.18 | 1.07 |

Second Match for Mathematics Achievement

| Stratum | 2,000+ N | 2,000+ Mean | 2,000+ SD | 400–799 N | 400–799 Mean | 400–799 SD |
|---|---|---|---|---|---|---|
| 1 | 222 | -3.39 | 0.56 | 16 | -3.06 | 0.35 |
| 2 | 225 | -2.39 | 0.19 | 13 | -2.41 | 0.19 |
| 3 | 209 | -1.86 | 0.12 | 30 | -1.89 | 0.13 |
| 4 | 183 | -1.49 | 0.10 | 55 | -1.51 | 0.09 |
| 5 | 182 | -1.15 | 0.11 | 57 | -1.13 | 0.10 |
| 6 | 176 | -0.81 | 0.09 | 62 | -0.80 | 0.10 |
| 7 | 144 | -0.54 | 0.07 | 95 | -0.53 | 0.06 |
| 8 | 140 | -0.31 | 0.07 | 98 | -0.30 | 0.07 |
| 9 | 133 | -0.08 | 0.06 | 106 | -0.08 | 0.06 |
| 10 | 121 | 0.13 | 0.07 | 117 | 0.15 | 0.06 |
| 11 | 99 | 0.36 | 0.06 | 139 | 0.35 | 0.06 |
| 12 | 87 | 0.57 | 0.06 | 152 | 0.57 | 0.06 |
| 13 | 71 | 0.78 | 0.06 | 167 | 0.79 | 0.07 |
| 14 | 53 | 1.02 | 0.07 | 186 | 1.03 | 0.07 |
| 15 | 50 | 1.24 | 0.07 | 188 | 1.25 | 0.07 |
| 16 | 34 | 1.49 | 0.07 | 205 | 1.49 | 0.07 |
| 17 | 38 | 1.74 | 0.07 | 200 | 1.75 | 0.08 |
| 18 | 27 | 2.05 | 0.09 | 212 | 2.06 | 0.11 |
| 19 | 20 | 2.46 | 0.13 | 218 | 2.46 | 0.14 |
| 20 | 13 | 3.21 | 0.35 | 225 | 3.20 | 0.41 |

Third Match for Mathematics Achievement

| Stratum | 2,000+ N | 2,000+ Mean | 2,000+ SD | 800–1,199 N | 800–1,199 Mean | 800–1,199 SD |
|---|---|---|---|---|---|---|
| 1 | 209 | -2.25 | 0.42 | 38 | -2.23 | 0.38 |
| 2 | 210 | -1.55 | 0.12 | 37 | -1.51 | 0.09 |
| 3 | 197 | -1.18 | 0.10 | 50 | -1.19 | 0.10 |
| 4 | 170 | -0.89 | 0.08 | 77 | -0.88 | 0.07 |
| 5 | 158 | -0.64 | 0.07 | 89 | -0.64 | 0.07 |
| 6 | 141 | -0.42 | 0.06 | 106 | -0.43 | 0.06 |
| 7 | 144 | -0.24 | 0.05 | 103 | -0.23 | 0.05 |
| 8 | 140 | -0.07 | 0.04 | 107 | -0.07 | 0.04 |
| 9 | 121 | 0.08 | 0.04 | 126 | 0.09 | 0.04 |
| 10 | 123 | 0.23 | 0.04 | 124 | 0.23 | 0.04 |
| 11 | 99 | 0.36 | 0.04 | 149 | 0.36 | 0.04 |
| 12 | 93 | 0.51 | 0.04 | 154 | 0.50 | 0.04 |
| 13 | 82 | 0.66 | 0.05 | 165 | 0.66 | 0.04 |
| 14 | 81 | 0.81 | 0.04 | 166 | 0.81 | 0.04 |
| 15 | 64 | 0.97 | 0.05 | 183 | 0.97 | 0.04 |
| 16 | 56 | 1.14 | 0.05 | 191 | 1.14 | 0.05 |
| 17 | 55 | 1.32 | 0.06 | 192 | 1.33 | 0.06 |
| 18 | 31 | 1.53 | 0.07 | 216 | 1.54 | 0.07 |
| 19 | 31 | 1.83 | 0.09 | 216 | 1.84 | 0.11 |
| 20 | 22 | 2.43 | 0.29 | 225 | 2.46 | 0.31 |

Fourth Match for Mathematics Achievement

| Stratum | 2,000+ N | 2,000+ Mean | 2,000+ SD | 1,200–1,999 N | 1,200–1,999 Mean | 1,200–1,999 SD |
|---|---|---|---|---|---|---|
| 1 | 182 | -0.85 | 0.18 | 101 | -0.84 | 0.17 |
| 2 | 182 | -0.51 | 0.06 | 102 | -0.51 | 0.06 |
| 3 | 176 | -0.31 | 0.05 | 108 | -0.31 | 0.05 |
| 4 | 154 | -0.15 | 0.05 | 130 | -0.14 | 0.05 |
| 5 | 144 | 0.00 | 0.04 | 139 | 0.00 | 0.04 |
| 6 | 131 | 0.11 | 0.03 | 153 | 0.10 | 0.03 |
| 7 | 135 | 0.21 | 0.03 | 149 | 0.21 | 0.03 |
| 8 | 122 | 0.30 | 0.02 | 162 | 0.30 | 0.02 |
| 9 | 133 | 0.38 | 0.02 | 151 | 0.38 | 0.02 |
| 10 | 116 | 0.46 | 0.02 | 167 | 0.46 | 0.02 |
| 11 | 98 | 0.54 | 0.02 | 186 | 0.54 | 0.02 |
| 12 | 100 | 0.62 | 0.02 | 184 | 0.62 | 0.02 |
| 13 | 82 | 0.70 | 0.02 | 202 | 0.70 | 0.02 |
| 14 | 75 | 0.79 | 0.02 | 209 | 0.79 | 0.03 |
| 15 | 75 | 0.87 | 0.02 | 208 | 0.88 | 0.03 |
| 16 | 76 | 0.97 | 0.03 | 208 | 0.97 | 0.03 |
| 17 | 83 | 1.08 | 0.03 | 201 | 1.08 | 0.04 |
| 18 | 59 | 1.20 | 0.03 | 225 | 1.20 | 0.04 |
| 19 | 52 | 1.36 | 0.05 | 232 | 1.36 | 0.06 |
| 20 | 52 | 1.72 | 0.24 | 231 | 1.76 | 0.26 |



For each of the four matches, separate least squares within strata regression models are constructed. These models allow for the estimation of the potential effect of attending a smaller school for students in each stratum. However, to estimate the overall average treatment effect of being in the smaller schools, across-strata regression models that control for each stratum are used.15 This strategy allows for the effective estimation of the overall average treatment effect of being in the smaller schools for students currently enrolled in schools of 2,000 or more students.
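The across-strata model can be sketched as a regression of the outcome on a treatment indicator plus a full set of stratum dummies, with the treatment coefficient serving as the average treatment effect. All names and data below are illustrative; a treatment effect of 2.0 is simulated so the estimate can be checked against a known value.

```python
import numpy as np

rng = np.random.default_rng(4)

# Illustrative data: stratum labels, a "smaller school" indicator, and
# simulated math scores with a known treatment effect of 2.0.
n = 1000
stratum = rng.integers(0, 20, n)
treated = rng.random(n) < 0.3
base = 45 + 0.5 * stratum                 # stratum-specific score level
y = base + 2.0 * treated + rng.normal(0, 3, n)

# Across-strata regression: outcome on the treatment indicator with
# dummy variables controlling for each stratum (dummies span the
# intercept, so no separate constant is needed).
D = np.column_stack([treated.astype(float), np.eye(20)[stratum]])
beta, *_ = np.linalg.lstsq(D, y, rcond=None)
ate_hat = beta[0]                          # estimated average treatment effect
```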


EXPLORING THE EFFECT OF DIFFERENT SCHOOL SIZES


In these analyses, students in schools of 2,000 or more students are matched with students in schools of 1–399, 400–799, 800–1,199, and 1,200–1,999 students to determine whether students in larger schools would experience increased mathematics achievement if they attended these smaller schools of varying sizes.16 If attending smaller schools would provide a potential benefit to students currently enrolled in larger schools, the effects for average students’ mathematics achievement should be positive.


EFFECTS OF BEING IN SCHOOLS WITH 1-399 STUDENTS


For this first match, students in schools of 2,000 or more students are matched with students in schools with 1–399 students based on the set of 68 covariates. As shown in Table 3a, no significant difference is observed for the average treatment effect of being in the smaller schools. In terms of 12th-grade mathematics achievement, there does not appear to be a potential benefit for students in large high schools if they were to attend smaller schools. In fact, the estimated effect, although not significant, is negative.


Table 3a. OLS Within Stratum Propensity Score Regression: Math Achievement

| Stratum | 2,000+ Mean | 1–399 Mean | MD | STE | Signif |
|---|---|---|---|---|---|
| 1 | 51.561 | 78.295 | 26.734 | 4.248 | |
| 2 | 48.905 | 55.520 | 6.615 | 4.953 | |
| 3 | 47.888 | 65.288 | 17.400 | 1.538 | * |
| 4 | 48.349 | 50.625 | 2.276 | 4.871 | |
| 5 | 47.071 | 57.427 | 10.356 | 4.216 | * |
| 6 | 49.729 | 44.477 | -5.252 | 3.084 | |
| 7 | 47.548 | 51.048 | 3.500 | 3.019 | |
| 8 | 46.973 | 50.532 | 3.559 | 2.568 | |
| 9 | 48.348 | 49.342 | 0.994 | 2.579 | |
| 10 | 49.620 | 47.278 | -2.342 | 1.897 | |
| 11 | 51.825 | 50.905 | -0.920 | 1.975 | |
| 12 | 50.356 | 48.741 | -1.615 | 2.060 | |
| 13 | 52.009 | 49.063 | -2.946 | 2.205 | |
| 14 | 53.786 | 46.832 | -6.954 | 2.345 | ** |
| 15 | 51.592 | 48.967 | -2.625 | 2.502 | |
| 16 | 46.744 | 50.185 | 3.441 | 2.876 | |
| 17 | 54.001 | 49.363 | -4.638 | 3.079 | |
| 18 | 46.285 | 49.007 | 2.722 | 4.602 | |
| 19 | 49.287 | 50.726 | 1.439 | 4.400 | |
| 20 | 38.690 | 50.063 | 11.373 | 9.742 | |
| Average Treatment Effect | | | -0.565 | 0.654 | |

Note. MD = mean difference; STE = standard error. † p ≤ .10; * p ≤ .05; ** p ≤ .01; *** p ≤ .001.



The within-strata regression models for the individual strata also suggest that being in a school of 1–399 students for students currently in schools of 2,000 or more would not be beneficial in all circumstances. Stratum 3 and Stratum 5 suggest potential benefits for students in the larger schools in switching, whereas Stratum 14 suggests potential detrimental effects of changing to the smaller schools. Across the regression models, signs of the observed mean differences are not strictly positive. This further indicates that being in a smaller school of 1–399 students would not result in increased mathematics achievement for all students in schools of 2,000 or more students.


EFFECTS OF BEING IN SCHOOLS WITH 400-799 STUDENTS


In this model, students in schools with 2,000 or more students are matched with students in schools with 400–799 students based on their propensity score. The average overall effect for 12th-grade mathematics achievement is not significant. The direction of the coefficient is also negative. Again, it appears that if students currently enrolled in schools of 2,000 or more were instead enrolled in schools with 400–799 students, there would not be a significant increase in the students’ mathematics achievement (Table 3b).


Table 3b. OLS Within Stratum Propensity Score Regression: Math Achievement

| Stratum | 2,000+ Mean | 400–799 Mean | MD | STE | Signif |
|---|---|---|---|---|---|
| 1 | 51.561 | 44.491 | -7.070 | 4.248 | |
| 2 | 47.931 | 47.290 | -0.641 | 4.592 | |
| 3 | 49.258 | 51.177 | 1.919 | 1.538 | |
| 4 | 49.881 | 44.978 | -4.903 | 2.431 | * |
| 5 | 47.411 | 47.238 | -0.173 | 2.383 | |
| 6 | 46.655 | 47.454 | 0.799 | 2.271 | |
| 7 | 47.661 | 50.351 | 2.690 | 2.074 | |
| 8 | 48.359 | 50.532 | 2.173 | 2.075 | |
| 9 | 49.175 | 49.024 | -0.151 | 2.084 | |
| 10 | 49.465 | 48.778 | -0.687 | 1.932 | |
| 11 | 48.116 | 47.771 | -0.345 | 1.917 | |
| 12 | 51.900 | 52.195 | 0.295 | 2.013 | |
| 13 | 48.558 | 52.138 | 3.580 | 2.005 | |
| 14 | 55.005 | 49.160 | -5.845 | 2.381 | * |
| 15 | 51.915 | 51.020 | -0.895 | 2.274 | |
| 16 | 53.526 | 52.059 | -1.467 | 2.761 | |
| 17 | 54.484 | 51.781 | -2.703 | 2.557 | |
| 18 | 49.471 | 51.238 | 1.767 | 2.996 | |
| 19 | 54.112 | 51.142 | -2.970 | 3.308 | |
| 20 | 46.568 | 50.705 | 4.137 | 3.971 | |
| Average Treatment Effect | | | -0.226 | 0.543 | |

Note. MD = mean difference; STE = standard error. † p ≤ .10; * p ≤ .05; ** p ≤ .01; *** p ≤ .001.



The coefficients in the individual strata suggest that students in schools of 2,000 or more would not experience the desired increase in mathematics achievement from being in these smaller schools. Only Stratum 4 and Stratum 14 exhibit significant mean differences, and these differences are opposite of what would be desired. In these cases, being enrolled in the smaller schools would result in decreased mathematics achievement.


EFFECTS OF BEING IN SCHOOLS WITH 800-1,199 STUDENTS


In this model, students attending the larger schools are matched to students in schools that enroll between 800 and 1,199 students (Table 3c). The overall average treatment effect for these larger-school students is again not significant, which suggests that being in these smaller schools would not result in increased mathematics achievement for these students. Similar to the previous models, the direction of the coefficient for the average treatment effect is not positive, indicating that creating schools of this size would probably not be beneficial in terms of increasing student mathematics achievement. In addition, the coefficients for the individual within-strata regression models are not significant. The benefit of these smaller schools is not evident for students in schools of 2,000 or more students.


Table 3c. OLS Within Stratum Propensity Score Regression: Math Achievement

| Stratum | 2,000+ Mean | 800–1,199 Mean | MD | STE | Signif |
|---|---|---|---|---|---|
| 1 | 49.919 | 53.333 | 3.414 | 2.969 | |
| 2 | 48.295 | 49.663 | 1.368 | 2.962 | |
| 3 | 49.328 | 46.862 | -2.466 | 2.488 | |
| 4 | 48.072 | 48.564 | 0.492 | 2.186 | |
| 5 | 48.242 | 48.449 | 0.207 | 2.111 | |
| 6 | 48.788 | 49.812 | 1.024 | 2.115 | |
| 7 | 48.799 | 45.505 | -3.294 | 1.908 | |
| 8 | 49.166 | 49.069 | -0.097 | 2.057 | |
| 9 | 48.740 | 49.863 | 1.123 | 2.025 | |
| 10 | 48.232 | 47.783 | -0.449 | 2.016 | |
| 11 | 47.266 | 50.865 | 3.599 | 2.039 | |
| 12 | 50.834 | 48.850 | -1.984 | 1.889 | |
| 13 | 50.068 | 50.247 | 0.179 | 1.944 | |
| 14 | 52.448 | 52.542 | 0.094 | 1.936 | |
| 15 | 49.431 | 48.735 | -0.696 | 2.258 | |
| 16 | 51.435 | 51.402 | -0.033 | 2.291 | |
| 17 | 52.001 | 51.344 | -0.657 | 2.265 | |
| 18 | 54.595 | 52.123 | -2.472 | 2.794 | |
| 19 | 50.771 | 52.894 | 2.123 | 2.905 | |
| 20 | 53.015 | 53.785 | 0.770 | 3.022 | |
| Average Treatment Effect | | | -0.031 | 0.497 | |

Note. MD = mean difference; STE = standard error. † p ≤ .10; * p ≤ .05; ** p ≤ .01; *** p ≤ .001.



EFFECTS OF BEING IN SCHOOLS WITH 1,200-1,999 STUDENTS


In Table 3d, students in schools of 2,000 or more are matched with students in schools with 1,200–1,999 students. Results indicate that the average overall treatment effect is not significant, and the coefficient is in the negative direction. If students in schools of 2,000 or more students were instead enrolled in schools of 1,200–1,999, it appears that they would not see a substantial increase in their mathematics achievement. In some cases, the effect could be positive, and in others, the effect might be negative. Only Stratum 19 shows a significant increase in students’ 12th-grade mathematics achievement. The other strata do not exhibit this significant positive relationship, which again suggests that these smaller schools would not result in increased mathematics achievement for every student.


Table 3d. OLS Within-Stratum Propensity Score Regression: Math Achievement

                            Schools with    Schools with
                            2,000 or more   1,200–1,999     Mean Difference
                            Mean            Mean            MD        STE     Signif
Stratum 1                   52.118          51.825          -0.293    2.042
Stratum 2                   47.931          46.253          -1.678    2.133
Stratum 3                   47.690          48.415           0.725    2.052
Stratum 4                   49.041          48.161          -0.880    1.844
Stratum 5                   50.076          48.705          -1.371    1.773
Stratum 6                   48.498          50.009           1.511    1.870
Stratum 7                   49.757          49.120          -0.637    1.891
Stratum 8                   49.489          48.452          -1.037    1.749
Stratum 9                   48.879          49.216           0.337    1.890
Stratum 10                  50.197          50.066          -0.131    1.948
Stratum 11                  49.236          49.445           0.209    1.901
Stratum 12                  50.550          49.040          -1.510    1.943
Stratum 13                  47.465          49.758           2.293    1.943
Stratum 14                  48.070          49.249           1.179    1.954
Stratum 15                  48.341          49.671           1.330    2.020
Stratum 16                  50.056          48.175          -1.881    2.004
Stratum 17                  51.204          49.496          -1.708    2.089
Stratum 18                  52.493          49.621          -2.872    2.264
Stratum 19                  42.432          49.098           6.666    2.382    **
Stratum 20                  50.885          46.465          -4.420    2.296
Average Treatment Effect                                    -0.235    0.441

† p ≤ .10. * p ≤ .05. ** p ≤ .01. *** p ≤ .001.
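The significance flags in Table 3d can be spot-checked by dividing each stratum's mean difference by its standard error and treating the ratio as an approximate z statistic (a simplifying assumption; the paper does not state its exact test). A minimal sketch for Stratum 19, the one effect flagged at the .01 level:

```python
import math

md, ste = 6.666, 2.382   # Stratum 19, Table 3d
z = md / ste             # approximate z statistic
# two-sided p-value from the standard normal CDF, via the error function
p = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
print(round(z, 2), round(p, 4))
```

The ratio comes out near 2.8, which falls below the two-sided .01 threshold, consistent with the ** flag.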

      



MULTIVARIATE SENSITIVITY ANALYSIS


It is entirely possible that the matches did not isolate an “ideal” size category in which students from schools of 2,000 or more would experience a benefit. Perhaps attending schools in a different range of sizes would produce an increase in overall mathematics achievement that was not observable in these comparisons. To explore this possibility, a multivariate sensitivity analysis was conducted. Following Lee and Smith (1997), an HLM was estimated with mathematics achievement as the outcome and the other important predictors, excluding school size, included at both the school and student levels.17 After the model was run, the residuals were saved, and the average residual for each school was plotted against the continuous school-size variable. A peak in this plot would suggest an optimal school size at which mathematics achievement could potentially be maximized.


Figure 1. Multivariate Sensitivity Analysis


[39_15175.htm_g/00001.jpg]


Figure 1 shows a band of points across the entire range of school sizes, suggesting that there is no optimal school size that would produce higher levels of mathematics achievement. Students in schools of 2,000 or more would apparently not uniformly benefit from being in smaller schools, and, if there are benefits for these students, they do not appear to accrue to mathematics achievement.
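The sensitivity-analysis procedure can be sketched as follows. This is a hypothetical illustration on simulated data, not the ELS analysis: regress achievement on a covariate while deliberately omitting school size, average the residuals within schools, and check whether the school-level residuals track school size.

```python
import numpy as np

rng = np.random.default_rng(0)
n_schools, per_school = 200, 25
size = rng.integers(100, 3000, n_schools)             # hypothetical enrollments
school = np.repeat(np.arange(n_schools), per_school)  # student -> school index
ses = rng.normal(0, 1, n_schools * per_school)        # stand-in covariate
score = 50 + 2.0 * ses + rng.normal(0, 5, n_schools * per_school)

# Regress achievement on the covariate, omitting school size from the model
X = np.column_stack([np.ones_like(ses), ses])
beta, *_ = np.linalg.lstsq(X, score, rcond=None)
resid = score - X @ beta

# Average residual per school, then ask whether it is related to school size
school_resid = np.array([resid[school == s].mean() for s in range(n_schools)])
corr = np.corrcoef(size, school_resid)[0, 1]
print(round(corr, 3))
```

Here, because the simulated outcome has no size effect, the school-level residuals show no relationship to enrollment, the flat-band pattern the paper reports for Figure 1.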


DISCUSSION


The impact of high school size on academic outcomes is a pressing educational policy concern, especially in light of the establishment of many smaller schools across the United States. Few studies have systematically investigated the impact of smaller schools on academic achievement. Because mathematics achievement serves as a “gatekeeper” for many forms of higher education, the results of this study have discouraging implications for the return on the investment in these smaller school environments.


The propensity score models employed in this study allow for the systematic exploration of the potential benefits of shifting students in schools with 2,000 or more students to schools of smaller sizes. The models suggest that in few circumstances would moving students who are currently in large schools to smaller schools result in increased student mathematics achievement. In fact, in some cases, the result could potentially be detrimental to students and actually lower their current level of performance. Furthermore, results from the multivariate sensitivity analysis, which was conducted to determine whether the “ideal” match for students in these largest schools had been missed, confirmed that there was not a particular school size that would result in optimal mathematics achievement. Students’ mathematics performance across high schools of different sizes appears to be quite similar. Taken as a whole, these two sets of findings suggest that creating small schools may not be an effective strategy for improving mathematics achievement in high schools today.
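The stratification logic behind these models can be sketched on simulated data (the variable names and the simple logistic fit are illustrative assumptions, not the authors' 68-covariate specification): estimate each student's propensity to attend a small school, cut the fitted propensities into 20 equal-frequency strata, and compare treated and control means within each stratum.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5000
x = rng.normal(0, 1, (n, 3))                        # stand-in covariates
logit = 0.8 * x[:, 0] - 0.5 * x[:, 1]               # selection into small schools
treat = rng.random(n) < 1 / (1 + np.exp(-logit))    # "attends a small school"
y = 50 + 1.0 * x[:, 0] + rng.normal(0, 5, n)        # outcome with no true effect

# Logistic regression for the propensity score, fit by Newton's method
X = np.column_stack([np.ones(n), x])
beta = np.zeros(X.shape[1])
for _ in range(25):
    p = 1 / (1 + np.exp(-X @ beta))
    grad = X.T @ (treat - p)
    hess = -(X * (p * (1 - p))[:, None]).T @ X
    beta -= np.linalg.solve(hess, grad)
pscore = 1 / (1 + np.exp(-X @ beta))

# Cut the fitted propensities into 20 equal-frequency strata
edges = np.quantile(pscore, np.linspace(0, 1, 21))
stratum = np.clip(np.searchsorted(edges, pscore, side="right") - 1, 0, 19)

# Within-stratum treated-minus-control mean differences
diffs = np.array([y[(stratum == s) & treat].mean()
                  - y[(stratum == s) & ~treat].mean() for s in range(20)])
print(round(float(diffs.mean()), 2))
```

Because students within a stratum have similar propensities, the covariates that drive both selection and achievement are approximately balanced there, so the simulated mean differences hover around zero rather than reflecting the raw confounded gap.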


What could explain the difference between these results and previous findings, such as those by Lee and Smith (1995, 1997) showing that student mathematics performance was stronger in smaller high schools? One explanation is methodological: selection effects in the data set. When students in larger schools systematically differ from students in smaller schools, as with the differences between large urban schools and smaller rural schools, these differences must be accounted for in the analyses. The propensity score techniques used in this article attempt to correct for such selection effects in these data. It could be that the selection effects found in the ELS data were not as prevalent in earlier decades, when Lee and others conducted their analyses using the NELS data.18 Another possibility is that using 10th-grade measures for propensity matching does not allow a long enough time frame for the cumulative effect of smaller schools to be captured.19 This is a limitation of the ELS, because data are available only for 10th and 12th grade.


Additionally, the lack of evidence of increased mathematics achievement presented in this article may be attributable to changes in mathematics instruction over the past several years. Recent years have seen the introduction of different techniques for teaching mathematics, such as constructivist approaches, that may not be effective in increasing student performance in mathematics.20 Many high school mathematics teachers are inadequately prepared to teach the subject, with some providing instruction outside their fields of expertise (Ingersoll, 1999; Monk, 1994). Creating small schools may exacerbate this problem because the need for mathematics teachers may exceed the number of teachers who know how to teach mathematics effectively.


The literature on small schools shows that these schools often build a sense of community and increase student and teacher engagement. In related work using propensity score methods on a variety of outcome measures, the authors found some positive community effects for small high schools. With respect to students taking action on college plans—for example, by applying to college—small schools appear to have a positive impact. Specifically, students in small schools had a greater tendency to apply to more colleges than those in larger schools (Schneider et al., 2007).


These results may suggest that small-school environments could have more resources per student and therefore may be better equipped to help students align their ambitions with actual plans of action. Although these are important outcomes, small schools also need to demonstrate similar gains in academic performance to meet the demands of today’s era of high-stakes accountability. Schools are held accountable for improving student achievement in key areas such as mathematics, and failing to raise achievement in these areas has serious implications under the No Child Left Behind Act. Until we have stronger evidence with respect to raising achievement, especially for low-performing students in urban areas, decisions to create smaller schools should be made with caution.



Notes


1. The positive effects that Lee and Smith (1997) found were based on hierarchical linear modeling.

2. Academic press is defined as “the degree to which school organizations are driven by achievement oriented values, goals and norms” (Shouse, 1997, p. 61). It includes three components: academic climate, disciplinary climate, and teachers’ instructional practices and emphasis (Shouse, 1996, 1997).

3. For a discussion of the methodological challenges in prior school-size research, see Brookings Paper on Education Policy 2006/2007 (Loveless & Hess, 2007).

4. We use data from the restricted use file for this analysis because it contains a continuous measure of school size.

5. Dropouts and early completers were not excluded from the analysis if they had legitimate responses on both school size and mathematics test scores.

6. The EM algorithm is a general purpose iterative algorithm for computing maximum likelihood (ML) estimates for incomplete data (McLachlan & Krishnan, 1997). On each iteration of the algorithm, there is an expectation step (E-step) and a maximization step (M-step), and iterations continue until convergence is reached (Little & Rubin, 1987; McLachlan & Krishnan; Schafer, 1997).
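A minimal sketch of the E-step/M-step cycle described in this note, using a two-component Gaussian mixture as a stand-in problem (the actual imputation model used for the ELS data is not reproduced here):

```python
import numpy as np

rng = np.random.default_rng(0)
data = np.concatenate([rng.normal(-2, 1, 500), rng.normal(3, 1, 500)])

mu = np.array([-1.0, 1.0])      # starting guesses for the component means
sigma = np.array([1.0, 1.0])
pi = np.array([0.5, 0.5])

def normal_pdf(x, m, s):
    return np.exp(-0.5 * ((x - m) / s) ** 2) / (s * np.sqrt(2 * np.pi))

for _ in range(100):
    # E-step: posterior probability that each point belongs to each component
    dens = pi * normal_pdf(data[:, None], mu, sigma)
    resp = dens / dens.sum(axis=1, keepdims=True)
    # M-step: re-estimate the parameters from the responsibility-weighted data
    nk = resp.sum(axis=0)
    mu = (resp * data[:, None]).sum(axis=0) / nk
    sigma = np.sqrt((resp * (data[:, None] - mu) ** 2).sum(axis=0) / nk)
    pi = nk / len(data)

print(np.round(np.sort(mu), 1))
```

Each iteration raises the likelihood; after convergence the estimated means sit near the true values of -2 and 3.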

7. An assessment of missing data indicated that data are not missing at random. Missing responses were more likely to be observed for students who were low-income or minority students. A comparison of the imputed data to the original data suggested that the imputed data have similar characteristics to the data with observed responses. However, the effects of imputing data when data are not missing at random are unknown.

8. In prior work, we conducted extensive analyses of the ELS data set and found significant selection effects when investigating the effects of school size on student achievement, student expectations, and college attendance choices (see Schneider et al., 2007).

9. The IRT ability estimate for math is an estimate of the number of items students would have answered correctly had they responded to all 72 items in the ELS:2002 mathematics item pool.

10. The authors previously estimated a series of hierarchical linear models and found no effect for small schools (Schneider et al., 2007).

11. Similarly, design effects and weights are used to correct for sample selection and issues of representation that affect generalizability. In this analysis, we are focusing on individuals and the effect that attending smaller schools would have on their mathematics achievement.

12. Assuming that the treatment effects do not interact with each other (the stable unit treatment value assumption [SUTVA]), using the propensity score would provide an unbiased estimate of being in one group versus the other (Rosenbaum & Rubin, 1983).

13. Each covariate is assumed to be a function primarily of the student and not the school that the student was attending. The goal of propensity score matching is to obtain accurate predictions of the propensity of attending small schools by using every variable in the data set that might potentially be directly related to the effect, which is why this analysis has 68 covariates. Other applications of propensity score matching in educational research have used over 150 covariates when matching (see Hong & Raudenbush, 2005, 2006). Examples of specific covariates used in the analyses include whether a student planned to take college entrance exams or the Armed Services Vocational Aptitude Battery (ASVAB), whether the student had ever skipped school, whether the student was classified as learning disabled, and family composition. For a complete list of covariates, please contact the authors.

14. Twenty strata are used based on the sample size of these data and the need for appropriate balance across the strata, and to maintain consistency in presenting results. Adequate balance was obtained with the use of fewer strata for all the models except the match of students in schools of 2,000 or more and in schools of 1–399 students. This could indicate that the students in schools of 1–399 students and students in schools of 2,000 or more are less similar to each other. The numbers of strata were increased so that students in each of these strata would be more homogenous and to facilitate a comparison in this case. The use of large numbers of strata to achieve balance in educational research is not uncommon. For example, Hong and Raudenbush (2005) used 15 strata in their work on kindergarten retention policies.

15. We use weighted least squares (WLS) to estimate the overall treatment effects because heteroscedasticity was observed across some of the different strata. If OLS estimators were used in these cases, the overall significance of the result could not be determined because the difference could be based on an actual difference in the observed statistic or the difference in variances. WLS provides a way of correcting for these issues.
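One common way to pool per-stratum mean differences by WLS is to weight each stratum inversely by its squared standard error. The paper does not report its exact weighting scheme, so the sketch below is illustrative only, reusing the first five stratum estimates from Table 3d:

```python
import numpy as np

md  = np.array([-0.293, -1.678, 0.725, -0.880, -1.371])  # Table 3d, strata 1-5
ste = np.array([ 2.042,  2.133, 2.052,  1.844,  1.773])  # their standard errors

w = 1 / ste ** 2                    # precision (inverse-variance) weights
ate = np.sum(w * md) / np.sum(w)    # pooled WLS estimate of the effect
se = np.sqrt(1 / np.sum(w))         # its standard error
print(round(float(ate), 3), round(float(se), 3))
```

Down-weighting noisy strata in this way addresses the heteroscedasticity the note describes, since strata with larger sampling variances contribute less to the pooled estimate.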

16. Lee and Smith (1997) also divided the continuous school size variable into categorical school size ranges. The ranges used in this study are somewhat similar to the ones used in their work and the ones provided with the ELS data set.

17. Specifically, school-level factors included the percentage of the school that is minority, average socioeconomic status (SES) of the school, whether a school is Catholic, public, or private, and the school urbanicity. Dummy variables were constructed for Catholic, private, high minority, urban, and rural schools, and average SES was treated as continuous. Student-level factors included student SES, composite test score, and dummy variables for female, Asian, Black, Hispanic, and students from other minority racial groups (Alaskan, American Indian, and multiracial).

18. However, to check this, descriptive statistics on a corresponding sample in the NELS data set should be computed.

19. Other studies on school size usually focus their investigation on students’ high school years (Lee & Smith, 1997), capturing similar periods of time as those covered in this study.

20. There is some evidence of this. For example, using the NELS data set, Petrin (2005) found that more constructivist approaches were not uniformly associated with gains in science achievement and in some cases appeared to have a negative impact on student outcomes.



References


Allison, P. (2002). Missing data. Thousand Oaks, CA: Sage.


Battistich, V., Solomon, D., Kim, D.-i., Watson, M., & Schaps, E. (1995). Schools as communities, poverty levels of student populations, and students' attitudes, motives, and performance: A multilevel analysis. American Educational Research Journal, 32, 627–658.


Borman, G., & D’Agostino, J. (1996). Title I and student achievement: A meta-analysis of federal evaluation results. Educational Evaluation and Policy Analysis, 18, 309–326.


Burkam, D., Ready, D., Lee, V. E., & LoGerfo, L. (2004). Social-class differences in summer learning between kindergarten and first grade: Model specification and estimation. Sociology of Education, 77, 1–31.


Coleman, J. S., & Hoffer, T. (1987). Public and private high schools: the impact of communities. New York: Basic Books.


Dempster, A. P., Laird, M., & Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B, 39(1), 1–38.


Greene, J., & Symonds, W. C. (2006, June 26). Bill Gates gets schooled: Why he and other execs have struggled in their school reform efforts, and why they keep trying. Business Week, pp. 64–70.  


Herszenhorn, D. M. (2006, September 13). 11 city schools fail to meet state criteria. The New York Times, p. 8.


Hoffer, T. (1995). High school curriculum differentiation and postsecondary outcomes. In P. W. Cookson & B. Schneider (Eds.), Transforming schools (pp. 371–402). New York: Garland.


Hong, G., & Raudenbush, S. W. (2005). Effects of kindergarten retention policy on children’s cognitive growth in reading and mathematics. Educational Evaluation and Policy Analysis, 27, 205–224.


Hong, G., & Raudenbush, S. W. (2006). Evaluating kindergarten retention policy: A case study of causal inference for multilevel observational data. Journal of the American Statistical Association, 101, 901–910.


Huebner, T. A. (2005). Rethinking high school: An introduction to New York City's experience. San Francisco: WestEd for the Bill and Melinda Gates Foundation.


Ingersoll, R. M. (1999). The problem of underqualified teachers in American secondary schools.  Educational Researcher, 28(2), 26–37.


Kahne, J. E., Sporte, S. E., de la Torre, M., & Easton, J. (2006). Small high schools on a larger scale: The first three years of the Chicago High School Redesign Initiative. Chicago: Consortium on Chicago School Research at the University of Chicago.


Lee, V. E. (2000). School size and the organization of secondary schools. In M. Hallinan (Ed.), Handbook of the sociology of education (pp. 327–344). New York: Kluwer/Academic Plenum.


Lee, V. E., & Smith, J. (1995). Effects of high school restructuring and size on early gains in achievement and engagement. Sociology of Education, 68, 241–270.


Lee, V. E., & Smith, J. (1997). High school size: which works best and for whom? Educational Evaluation and Policy Analysis, 19, 205–227.


Little, R. J. A., & Rubin, D. B. (1987). Statistical analysis with missing data. New York: John Wiley & Sons.


Loveless, T., & Hess, F. M. (2007). Brookings papers on education policy 2006/2007. Washington, DC: Brookings Institution Press.


McLachlan, G. J., & Krishnan, T. (1997). The EM algorithm and extensions. New York: Wiley.


Monk, D. H. (1994). Subject area preparation of secondary mathematics and science teachers and student achievement. Economics of Education Review, 13, 125–145.


No Child Left Behind Act of 2001, Pub. L. No. 107-110, 115 Stat. 1425 (2002).


Petrin, R. (2005). School organization, curricular structure, and the distribution and effects of instruction for tenth-grade science. In L. V. Hedges & B. Schneider (Eds.), The social organization of schooling (pp. 175–199). New York: Russell Sage Foundation.


Phillips, M. (1997). What makes schools effective? A comparison of the relationships of communitarian climate and academic climate to mathematics achievement and attendance during middle school. American Educational Research Journal, 34, 633–662.


Pittman, R. B., & Haughwout, P. (1987). Influence of high school size on dropout rate. Educational Evaluation and Policy Analysis, 9, 337–343.


Porter, A. (1989). A curriculum out of balance: the case of elementary school mathematics. Educational Researcher, 18(5), 9–15.


Raudenbush, S., & Bryk, A. S. (2002). Hierarchical linear models: Applications and data analysis methods. Thousand Oaks, CA: Sage.


Ravitch, D. (2006, July 30). Bill Gates, the nation's superintendent of schools [Electronic version]. Los Angeles Times. Retrieved August 3, 2006, from http://www.latimes.com


Raywid, M. A. (1996). Taking stock: The movement to create mini-schools, schools-within-schools, and separate small schools (Urban Diversity Series No. 108). New York: ERIC Clearinghouse on Urban Education Institute for Urban and Minority Education. (ERIC Document Reproduction Service No. ED396045)


Rosenbaum, P. R. (2002). Observational studies. New York: Springer-Verlag.


Rosenbaum, P. R., & Rubin, D. B. (1983). The central role of the propensity score in observational studies for causal effects. Biometrika, 70(1), 41–55.


Rubin, D. B. (1979). Using multivariate matched sampling and regression adjustment to control bias in observational studies. Journal of the American Statistical Association, 73, 318–328.


Rubin, D. B., & Thomas, N. (2000). Combining propensity score matching with additional adjustments for prognostic covariates. Journal of the American Statistical Association, 95, 573–585.


Schafer, J. L. (1997). Analysis of incomplete multivariate data. New York: Chapman and Hall.


Schneider, B., Swanson, C., & Riegle-Crumb, C. (1998). Opportunities for learning: Course sequences and positional advantages. Social Psychology of Education, 2, 25–53.


Schneider, B., Wyse, A. E., & Keesler, V. (2007). Is small really better? Testing some assumptions about high school size. In T. Loveless & F. M. Hess (Eds.), Brookings papers on education policy 2006/2007 (pp. 15–47). Washington, DC: Brookings Institution Press.


Shadish, W. R., Cook, T. D., & Campbell, D. T. (2002). Experimental and quasi-experimental designs for generalized causal inference. Boston: Houghton Mifflin.


Shouse, R. (1996). Academic press and sense of community: Conflict, congruence, and implications for student achievement. Social Psychology of Education, 1, 47–68.


Shouse, R. (1997). Academic press, sense of community, and student achievement. In J. S. Coleman (Ed.), Redesigning American education (pp. 60–86). Boulder, CO: Westview Press.


Stevenson, D. (2000). The fit and misfit of sociological research and educational policy. In M. Hallinan (Ed.), Handbook of the sociology of education (pp. 547–563). New York: Kluwer Press.


Toch T. (1991). In the name of excellence. New York: Oxford University Press.


Wasley, P. A., Fine, M., Gladden, M., Holland, N. E., King, S. P., Mosak, E., et al. (2000). Small schools: Great strides. A study of new small schools in Chicago. New York: Bank Street College of Education.








Cite This Article as: Teachers College Record Volume 110 Number 9, 2008, p. 1879-1900
https://www.tcrecord.org ID Number: 15175, Date Accessed: 11/29/2021 10:26:38 AM

About the Author
  • Adam E. Wyse
    Michigan State University
    ADAM E. WYSE is a doctoral student and graduate research associate in the Measurement and Quantitative Methods program at Michigan State University. His research interests include educational assessments, educational policy, and psychometrics. Recent publications include “Is Small Really Better? Testing Some Assumptions of School Size” with Barbara Schneider and Venessa Keesler, in Brookings Papers on Education Policy 2006/2007 (Brookings Institution, 2007).
  • Venessa Keesler
    Michigan State University
    VENESSA KEESLER is a doctoral student and graduate research associate in the Measurement and Quantitative Methods program at Michigan State University. Her research interests include sociology of education, educational policy, and quantitative methods. Recent publications include “School Reform 2007: Transforming Education Into a Scientific Enterprise” with Barbara Schneider (in Annual Review of Sociology, Volume 33) and “Scaling-Up Exemplary Interventions” with Sarah-Kathryn McDonald, Nils Kauffman, and Barbara Schneider (in Educational Researcher, Volume 35).
  • Barbara Schneider
    Michigan State University
    E-mail Author
    BARBARA SCHNEIDER is the John A. Hannah Chair University Distinguished Professor in the College of Education and professor of sociology in the Department of Sociology at Michigan State University. Her research interests include sociology of education, scale-up research, and quasi-experimental designs for estimating causal effects. Recent publications include “Scale-up in Education, Volume 1: Ideas in Principle” and “Scale-Up in Education, Volume 2: Issues in Practice” with Sarah-Kathryn McDonald (Rowman & Littlefield, 2007); and “Estimating Causal Effects Using Experimental and Observational Designs” with Martin Carnoy, Jeremy Kilpatrick, William Schmidt, and Richard Shavelson (American Educational Research Association, 2007).
 