
Class Size Effects on Reading Achievement Using PIRLS Data: Evidence from Greece

by Spyros Konstantopoulos & Anne Traynor, 2014

Background/Context: The effects of class size on student achievement have gained considerable attention in education research and policy, especially over the last 30 years. Perhaps the best evidence about the effects of class size thus far has been produced from analyses of Project STAR data, a large-scale experiment where students and teachers were randomly assigned to smaller or larger classes within schools. Researchers have also analyzed observational data to examine the effects of class size, but the results have been mixed.

Purpose/Objective/Research Question/Focus of Study: It is generally difficult to draw causal inferences about class size effects with observational data because of the omitted variables problem. This shortcoming can be overcome with instrumental variables (IV) methods that are designed to facilitate causal inferences. The present study uses IV methods to examine the effects of class size on reading achievement using data from the 2001 fourth-grade sample of the Progress in International Reading Literacy Study (PIRLS) in Greece. We took advantage of Greece's nationwide rule about maximum class size in elementary schools to construct IV estimates of class size.

Population: PIRLS was designed to monitor children's achievement levels in fourth grade worldwide. We used reading achievement data from 2001 in Greece. The sample was a national probability sample of fourth graders. The use of appropriate weights helped us make projections to the fourth-grade student population in Greece in 2001.

Research Design: The research design was secondary analysis. We examined whether class size predicts reading achievement for fourth graders in Greece net of student, teacher/classroom, and school characteristics. We used multilevel models to capture the dependency in the data (i.e., students nested within schools).
We also used instrumental variables methods to facilitate causal inferences about class size effects.
Conclusions: We investigated the effects of class size on reading achievement for fourth graders in Greece in 2001 using rich data from PIRLS. The results produced from the multilevel and the IV analyses were overall similar. Generally, the results indicated a positive association between class size and achievement. However, the association was typically not statistically significant, especially when teacher/classroom and school variables were taken into account.

The effective allocation of school resources to increase student achievement has been a paramount objective of school effects research and policy making. Much of education research has focused on identifying important school-related factors that affect student learning positively. The underlying hypothesis is that school resources make a difference and have positive effects on student achievement. At the same time, many school policies are designed to ensure the best possible distribution of school resources that will result in higher levels of student performance. Decisions about the best allocation of school resources include decisions about assigning teachers and students to classrooms. Typically, such decisions involve determining the optimal number of students in a classroom (i.e., class size) in order to maximize student learning. The effects of class size on student achievement have gained considerable attention in education research and policy, especially over the last 30 years. Perhaps the best evidence about the effects of class size thus far has been produced from analyses of Project STAR data, a large-scale experiment where students and teachers were randomly assigned to smaller or larger classes within schools (e.g., Finn & Achilles, 1990; Nye, Hedges, & Konstantopoulos, 2000). However, researchers have also frequently analyzed observational data to examine the effects of class size (e.g., Angrist & Lavy, 1999; Milesi & Gamoran, 2006).
It is generally difficult to draw causal inferences about class size effects with observational data, however, because of the omitted variables problem. In schools, the assignment of students and teachers to classrooms is by and large not random, but rather a selection process. For example, parents, teachers, or principals influence student assignment to classrooms, and student background, ability, or motivation are often used to assign students to classrooms. Teacher assignment can also be subject to selection, because teacher characteristics such as seniority are often used to assign teachers to classrooms. As a result, the evidence about the effects of class size obtained from observational data is usually correlational. This "shortcoming" with observational data can be overcome with instrumental variables (IV) methods, which are econometric approaches designed to facilitate causal inferences with observational data (see Angrist & Krueger, 2001). The present study uses IV methods to examine the effects of class size on reading achievement using data from the 2001 fourth-grade sample of the Progress in International Reading Literacy Study (PIRLS) in Greece. Specifically, we took advantage of Greece's nationwide rule about maximum^{1} class size in elementary schools to construct IV estimates of class size that are hypothesized to facilitate causal inferences.

RELATED LITERATURE

Class size reduction is an appealing school intervention because it is easy to implement, assuming there are adequate funds and an adequate supply of classrooms and qualified teachers. Its implementation involves making sure that each classroom does not exceed a specific number of students (e.g., 20). Also, class size reduction does not necessarily require changes in teaching or instructional practices, or curricula; that is, teachers can go about their everyday classroom routine.
That does not preclude teachers, however, from modifying their classroom practices in smaller classes if needed, but such changes are more of a teacher-dependent decision, not necessarily an intention of class size reduction programs. The effects of class size reduction on student achievement have been examined empirically via various research designs and analyses over the past few decades. Numerous small-scale experimental and quasi-experimental studies have investigated the effects of class size on student achievement. Meta-analytic reviews of early work on class size effects have suggested that class size reduction has positive effects on student achievement and that these effects become greater as class size becomes smaller (Glass, Cahen, Smith, & Filby, 1982; Glass & Smith, 1979). The benefits of small classes appeared to be more pronounced in classes with 10 or 20 students as opposed to 30 or 40 students. The association between class size and achievement was more pronounced in randomized studies (Glass & Smith, 1979). There was also some evidence that minority and economically disadvantaged students benefitted more from being in smaller classes. Other studies have examined the effects of class size reduction on student achievement using observational data. Typically, these studies compute the association between class size and student achievement, adjusting for important student background factors such as gender, race/ethnicity, socioeconomic status (SES), and previous academic achievement. The interpretation of these studies' results has been contradictory, and this body of research has overall not yielded consistent evidence about the effects of small classes. Some reviewers of this research have argued that the effects of class size on student achievement are small or nonexistent (Hanushek, 1989).
However, other reviewers have suggested that reducing class size has considerable effects, and that students benefit by being in small classes (Greenwald, Hedges, & Laine, 1996). Findings from primary school studies during the last decade have also been mixed. For instance, Angrist and Lavy (1999) found that reducing class size increased Israeli fourth and fifth graders' scores significantly, and Pong and Pallas (2001) found positive small class effects on eighth-grade achievement in different countries. In contrast, Hoxby (2000) reported that smaller classes had little to no effect on student achievement, and Milesi and Gamoran (2006) found no evidence of class size effects on student achievement in early grades. Perhaps the best and most convincing evidence about class size effects has been produced from Project STAR, a large-scale randomized experiment designed to investigate class size effects in the state of Tennessee. In Project STAR the average class size in small classes was nearly 15 students, and in regular-size classes it was nearly 23 students. Early analyses of Project STAR data have indicated that small classes had positive effects on student achievement in early grades (Finn & Achilles, 1990). There was also some evidence that minority students benefitted more from small classes. More recent reanalyses of Project STAR data have also demonstrated that small classes increase achievement for all students on average (Hanushek, 1999; Krueger, 1999; Konstantopoulos, 2008; Nye, Hedges, & Konstantopoulos, 2000). Analyses of follow-up data that examined the lasting benefits of small classes have indicated that the small class advantage persists at least through middle school (Finn, Gerber, Achilles, & Boyd-Zaharias, 2001; Nye, Hedges, & Konstantopoulos, 1999). Mainly because of the findings produced by analyses of Project STAR data, class size reduction has been identified as a promising school mechanism that promotes student achievement.
As a result, some states have introduced class size reduction programs. For example, the state of California introduced a class size reduction program giving schools financial incentives to reduce class size in the early elementary grades to 20 or fewer students in each classroom. Also, the state of Wisconsin has adopted a program that reduced class size to nearly 15 students per classroom in the early grades in schools with high percentages of economically disadvantaged students. Overall, the findings from studies that have used experimental data seem to be consistent about small class effects, indicating positive immediate and longer-term effects of small classes on student achievement. In contrast, the findings obtained from observational data are by and large mixed. One reason for this inconsistency may be that in many observational studies class size is not measured accurately because data about the actual class size in each classroom are not available. Instead, class size is frequently represented by the average class size (e.g., pupil-to-teacher ratio) in each school. Another reason could be that student data are not always available in observational studies, and researchers typically analyze administrative data. In addition, in observational studies, selection bias is plausible, and measures of class size may be confounded with unobserved variables that play a role in the assignment of individuals to classrooms (e.g., parental pressure, teacher seniority, student ability, et cetera). Regardless of the assignment mechanism, it is likely that class size effects are confounded with student and teacher characteristics, observed or unobserved, because classroom assignment is generally nonrandom. To address this confounding issue with observational data, very careful statistical modeling is required.
However, the validity of the statistical modeling depends on whether variables that are relevant to class size and achievement have been measured and controlled for in regression models. In some cases important statistical controls such as prior achievement may be available, while in other cases they may not be. That is, model specifications might vary, and as a result the findings might vary. Omitted variable bias is a real threat with observational data, and the produced estimates might have lower internal validity than well-designed, well-executed experiments such as Project STAR (Krueger, 1999; Nye et al., 2000). Causal inferences are more likely to be tenable in randomized experiments because in principle random assignment eliminates preexisting differences between groups on average.

STUDENT VARIABLES AND ACHIEVEMENT

The association between student variables such as gender and SES and student achievement has been repeatedly estimated in empirical research. In education and the social sciences it is not uncommon to include gender in specifications that predict student achievement. Previous studies that have used national probability samples of students have found gender differences favoring female students in reading (see Hedges & Nowell, 1995; Willingham & Cole, 1997). Also, there is little disagreement about the role that the SES of a student plays in school performance. The relationship between test scores and SES has been documented repeatedly in education and the social sciences (e.g., White, 1982; White, Reynolds, Thomas, & Gitzlaff, 1993). Traditional measures of SES include parental education, family income, family size, and household possessions (Coleman, 1969; Konstantopoulos, 2006; White et al., 1993). In our study, SES was an important covariate that was used to adjust for preexisting differences among students in classes of different sizes. Another important covariate is prior achievement.
However, because PIRLS data are cross-sectional, such measures were not available. Instead we used an average of four self-reported measures of reading ability (see the appendix table). Finally, due to increased migration to Greece since the late 1990s, we also included in our specifications whether Greek language was spoken at home. We hypothesized that this variable would be important in predicting reading performance.

TEACHER/CLASSROOM VARIABLES AND ACHIEVEMENT

Prior work has examined whether teacher characteristics are linked with student achievement. For instance, Greenwald et al. (1996) found strong and positive effects of teacher characteristics (e.g., teacher education and experience) on student achievement. More recent studies have confirmed these findings for teacher experience in particular (Clotfelter, Ladd, & Vigdor, 2006; Nye, Konstantopoulos, & Hedges, 2004). In observational studies, where assignment to classrooms is not random, classroom context is also important and could contribute to achievement differences in reading among classrooms. For example, differences in the proportion of female or low-SES students among classrooms could partly explain achievement differences in reading among classrooms. The average reading ability level of the students in a classroom is also an important covariate that explains classroom differences in reading achievement. Such effects are also known as peer effects, and have been shown to be a significant determinant of student achievement (Zimmer & Toma, 2000). Instruction and teaching practices are also an important part of class size effects. Earlier studies have documented that teachers in small classes are more likely to teach the class as a whole (see Bourke, 1986). Other studies have provided evidence that, in small classes, teachers spent more time on individualized instruction and less time on group instruction (Betts & Shkolnik, 1999).
Along the same lines, more recent work has indicated that in small classes students are less likely to work in groups and engage in collaborative group work (Blatchford, Bassett, & Brown, 2011). Our study examined whether teacher characteristics and peer effects were linked to reading achievement. In addition, we investigated whether grouping had an effect on or mediated class size effects.
SCHOOL VARIABLES AND ACHIEVEMENT

The social composition of students in a school has also been found to have an influence on achievement. For example, school composition measured as percent of economically disadvantaged students in the school has been found to be negatively associated with achievement and to account for a substantial amount of variability in achievement (see Bryk & Raudenbush, 1988). In addition, we hypothesized that time spent on instruction, which is a function of time spent on instruction daily as well as throughout the school year, would be a positive correlate of reading achievement. Previous work has shown that the length of the school year, which translates to more instruction, has positive effects on student learning (D'Agostino, 2000). The effects of school sector on student achievement have also been frequently studied. For example, Coleman and Hoffer (1987) found that, on average, students' verbal achievement growth in Catholic schools was higher than that in public schools. School urbanization and school size have also been shown to be related to student achievement (Konstantopoulos, 2006; Lee & Loeb, 2000). For example, Lee and Loeb found that achievement gains in small schools in Chicago were higher than gains in larger schools.

THE PRESENT STUDY

Although evidence from Project STAR has strongly indicated that students in small classes have higher achievement than students in larger classes, large-scale experiments about class size effects that are both designed and executed well are uncommon. Project STAR was an exception and perhaps one of the best experiments that have been conducted in the US (see Mosteller, Light, & Sachs, 1996). More frequently, researchers analyze observational data to examine the effects of class size (e.g., Angrist & Lavy, 1999). A potential caveat with findings produced from analyses of observational data, however, is that it is difficult to infer causality about class size effects.
Specifically, omitted variable bias in these cases is plausible; that is, variables that contribute to the assignment mechanism of students and teachers to classrooms are not always measured. For example, parental pressure could determine the assignment of students to classrooms. One promising technique that can address this shortcoming with findings from observational data is IV estimation, which is designed to support causal claims about the effects of class size on student achievement. This approach is used widely in the econometric literature (e.g., Angrist & Krueger, 2001; Wooldridge, 2009). The challenging task in this procedure is to identify a good instrument for class size. When this is accomplished, however, causal inferences can be drawn about the effects of class size (see Angrist & Krueger, 2001). The promise of IV methods is that under specific assumptions and conditions a good instrument resembles the intention-to-treat variable in a randomized experiment, and that omitted variable bias should be minimized. However, good instruments for class size are not always easy to construct. Arguably, the rule about a maximum number of students in a classroom, as mandated by law in some countries (or states), can be used to construct a very good instrument that explains the formation of classes and average class size in grades and schools (see Angrist & Krueger, 2001). Thus far, there is little evidence about class size effects using the rule about maximum class size. Notable exceptions are the study by Angrist and Lavy (1999) in elementary schools in Israel, and the study by Leuven, Oosterbeek, and Ronning (2008) in secondary schools in Norway. Yet these studies have used administrative data and have not conducted analyses using rich student or classroom data. That is, previous studies have not controlled for student-level background adequately, nor have they examined how instructional practices in classrooms could impact or mediate the class size effects.
Also, there is not much evidence in the literature about class size effects in countries that have standardized curricula regulated by the national government (e.g., Greece). Finally, many studies about class size effects have produced findings that are not representative of specific student populations and thus have limited external validity (Shadish, Cook, & Campbell, 2002). Our study provides additional evidence about class size effects in Greece and fills a gap in the literature in four ways. First, Greece, a country in southeastern Europe, has a standardized education system regulated by the Ministry of Education. That is, in Greece all schools, public and private, have standardized curricula and follow the same guidelines regarding content coverage and classroom practices established by the Greek Ministry of Education. Students typically attend public schools in their own neighborhoods, unless they attend private schools that may be outside of their neighborhoods. In that sense, Greece is structured differently from the US. Second, Greece has a nationwide rule about maximum^{1} class size in elementary schools. Specifically, from 1985 to 2005 the ceiling for the number of students in a classroom was 30, both in public and private schools. We took advantage of that rule to construct IV estimates of class size that facilitate causal inferences (see Angrist & Lavy, 1999, for a thorough discussion). Because of standardized curricula and the rule about maximum class size, one can compute estimates about class size for the entire country of Greece. Third, PIRLS is a national probability sample of fourth graders, and thus the results of our study should have higher external validity and should be generalizable to all fourth graders in Greece. Fourth, PIRLS includes information about classroom practices (e.g., type of instruction, grouping) that are crucial in identifying the mechanism of class size effects.
This is important because data about classroom practices are infrequently available in class size studies, and therefore, thus far, factors that could mediate class size effects have not been clearly identified. Finally, PIRLS data are more recent than data from Project STAR, which allows us to explore class size effects on student achievement in the early 2000s. Data from Project STAR are now more than 20 years old, and not much work lately has examined class size effects using rich recent data that represent specific student populations. To our knowledge, PIRLS data have not been used to examine class size effects or to identify the class size mechanism.

METHODS

DATA

PIRLS is an important source of information about the reading achievement levels of fourth-grade students worldwide (Mullis, Martin, Kennedy, & Flaherty, 2003). The study was designed to monitor children's progress in reading, and has collected reading achievement data of high quality. PIRLS has collected data in a manner that permits trend comparisons with tests that are equated over time for countries that have participated more than once in the assessment. Few other large-scale surveys have collected reading achievement data on pre-high-school students, making PIRLS a unique source of information on reading achievement of fourth graders. Also, PIRLS has gathered rich information about the home and school contexts where learning to read takes place (Mullis et al., 2003). The student, home, teacher, and school questionnaires that PIRLS has administered helped us explore the associations between family background, teachers, classroom practices, and schools, and achievement in an interpretive framework. So far, PIRLS has collected reading achievement data of nationally representative samples of fourth graders in 2001, 2006, and 2011^{2} and will continue data collection every five years.
The instrumentation, sampling, and data collection procedures have been kept the same over time, and the scales on which tests are reported have been equated. In this study, we used reading achievement data from 2001 in Greece, a country that participated in PIRLS only that year. PIRLS's sampling design was a two-stage stratified cluster sample where schools were selected at the first stage and classrooms within schools at the second stage. In principle, all students in sampled classrooms were part of the study. In the Greek sample, only one classroom was sampled within each school, and thus the number of schools and classrooms is the same in our dataset. The study was designed to yield a national probability sample of fourth graders. Thus, with the use of appropriate weights we were able to make projections to the fourth-grade student population in Greece in 2001.

DEPENDENT VARIABLE

The main dependent variable was reading achievement. Because the item pool of PIRLS 2001 was quite extensive and would require more than five hours for each student to complete the entire assessment, it was decided that only a part of the entire assessment was going to be administered to each student (see Mullis et al., 2003). That is, each participating student responded to a subset of items only, and as a result, fewer responses from each student were available. Content representation was maintained by and large when the responses were aggregated across all students (see Mullis et al., 2003). To construct more reliable estimates of student performance that could be projected to the entirety of the assessment, state-of-the-art statistical methods were used (i.e., multiple imputation that generates plausible values) (see Rubin, 1987; Schafer, 1997). The key idea is that estimates of students' reading ability incorporate some uncertainty that needs to be taken into account in the computation of their overall reading performance.
Multiple imputation is an appropriate procedure that generates multiple sets of imputed scores (i.e., plausible values) that incorporate this uncertainty. Thus, plausible values are preferred to a single index of student achievement. The plausible values methodology for large-scale surveys was first developed for the National Assessment of Educational Progress (NAEP) (Mislevy, Johnson, & Muraki, 1992). Since then, the plausible values approach has been used in all subsequent NAEP surveys, but also in TIMSS (Trends in International Mathematics and Science Study), PIRLS, and PISA (Program for International Student Assessment). Statistical theory has shown that five plausible values can produce reliable and consistent estimates of student performance (Schafer, 1997). PIRLS followed NAEP and TIMSS and used the typical procedure of generating five estimates of performance for each student. Reading performance was derived from test booklets that were sent to schools along with student, teacher, and school questionnaires. The assessment took place in the spring of 2001. The class size information was provided by teachers' responses to a teacher questionnaire. All the data, including reading performance, class size, student, teacher/classroom, and school characteristics, were recorded during the same period (spring 2001).

INDEPENDENT VARIABLES

The main independent variable is class size in fourth grade. Specifically, the class size measure we used is an average of three variables: the total number of enrolled students in the sampled classroom, the total number of enrolled fourth-grade students in the sampled classroom, and the total number of fourth-grade students present on the day of the PIRLS survey. We took this average class size variable as the best estimate of the actual class size experienced by fourth-graders during a typical school day around the time of the PIRLS survey.
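As a concrete illustration of how the five plausible values enter an analysis, the standard combining rules for multiply imputed data (Rubin, 1987) can be sketched as follows. This is a generic sketch, not code from the study; the function name and the numeric inputs are hypothetical.

```python
import statistics

def combine_plausible_values(estimates, variances):
    """Pool analyses run once per plausible value (Rubin's combining rules).

    estimates: the m point estimates of a statistic, one per plausible value
    variances: the corresponding sampling variances
    Returns the pooled estimate and its total (imputation-adjusted) variance.
    """
    m = len(estimates)
    pooled = statistics.mean(estimates)        # average of the m estimates
    within = statistics.mean(variances)        # average within-imputation variance
    between = statistics.variance(estimates)   # variance across plausible values
    total = within + (1 + 1 / m) * between     # extra term carries the imputation
    return pooled, total                       # uncertainty into the standard error

# Hypothetical class-size coefficients from five plausible-value runs:
estimate, variance = combine_plausible_values(
    [1.8, 2.1, 1.9, 2.0, 2.2], [0.90, 1.00, 0.95, 1.05, 1.00]
)
```

The total variance exceeds the average sampling variance whenever the five runs disagree, which is precisely the uncertainty in the imputed achievement scores that a single index of performance would conceal.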
The three variables we used to compute the composite measure of class size were highly correlated with one another (correlations were greater than 0.91). From a measurement perspective, the average of the three measures of class size should be a more reliable/stable measure than each of the individual measures of class size. It should be noted that we also conducted separate analyses for each individual measure of class size, and the results were similar to those reported for the average measure in the results section. We also included in our models student, teacher, classroom, and school variables of interest. Evidence from previous research guided us in selecting the variables that we used in the models (see the literature review section). Overall, we built four model specifications. In the first model, only class size was used as a predictor of reading achievement; that is, we measured the bivariate association between reading achievement and class size. In the second model, we also included student covariates such as gender (e.g., a dummy for female); student reading ability; language spoken at home (e.g., a dummy for speaking Greek); SES represented by family size (i.e., number of children in the household); items present at home (e.g., books, computer, own room, et cetera); and parental educational attainment (e.g., a dummy for parents being college graduates). In the third model we added teacher covariates that include education (e.g., a dummy for college graduate); years of experience teaching fourth grade; and gender (e.g., a dummy for male). Classroom covariates, such as grouping students for instruction by ability and frequency of discussions of readings in small groups, classroom context (e.g., percent of female and remedial students, and reading ability level), and an aggregate classroom measure of items in the home that served as a proxy for classroom SES, were also included in the model.
It is important to control for variables that measure instructional practices in the classroom involving grouping, because such variables might be mediating the class size effect (Pong & Pallas, 2001). It is also critical to control for classroom context to probe the extent to which the class size estimate is adjusted by these variables. The last model also included school covariates such as sector (e.g., a dummy for private), city size, tracking, time spent on instruction, percent of economically disadvantaged students (50 percent or more being the reference group), percent of students with learning disabilities, and school size. Missing data flags (i.e., dummies) were included in the models to account for missing data effects. Overall, we estimated class size effects on reading achievement, adjusting for family background, teacher, classroom, and school characteristics. A detailed description of the variables we used can be found in the appendix table.

STATISTICAL ESTIMATION

Two-Level Model

The main objective of our study is to examine whether class size predicts reading achievement in fourth grade net of student, teacher/classroom, and school characteristics.
The two-level regression equation for student i in school j is

Y_{ij} = B_{00} + B_{01}ClassSize_{j} + ST_{ij}B_{02} + CL_{j}B_{03} + SC_{j}B_{04} + u_{j} + e_{ij},  (1)

where Y_{ij} represents the achievement scores (i.e., reading), B_{00} is the constant term, ClassSize is the main independent variable, B_{01} represents the class size effect and is the regression coefficient of interest, ST is a row vector of student background characteristics such as gender, parental education, family size, items in the home, et cetera, B_{02} is a column vector of regression coefficients of student characteristics, CL is a row vector of teacher or classroom characteristics such as teacher experience, education, use of ability grouping, et cetera, B_{03} is a column vector of regression coefficients of teacher and classroom characteristics, SC is a row vector of school characteristics such as city size, sector, proportion of economically disadvantaged students, school size, et cetera, and B_{04} is a column vector of regression coefficients of school characteristics. The last two terms are random effects or residuals. Specifically, u_{j} is a school residual, and e_{ij} is a student residual. The variance of the random effect u_{j} captures the clustering of students within schools and is used to correct the standard errors of the regression estimates. The variables included in the regression were described in the variables section above (also see appendix table).

Instrumental Variables

In observational studies the assignment of students and teachers to classrooms is typically not random, and could be affected by unobserved factors related to principals, teachers, parents, or student characteristics. If so, the association between class size and reading achievement captured in Equation 1 would be biased. Specifically, students could be assigned to classrooms because of decisions made by teachers or principals, or by parental pressure. Other times, assignment to classrooms may be related to students' characteristics such as prior achievement, motivation, or SES.
If some of these variables have not been measured or are not available, the class size estimate in Equation 1 may suffer from omitted variable bias. To overcome this potential shortcoming, we used an IV approach that facilitates causal inferences with observational data.^3 Specifically, we used the approach introduced by Angrist and Lavy (1999). The IV approach assumes that the instrument, which is the average class size in a grade computed using the rule about maximum class size in Greece, influences reading achievement only through class size. In our case, the rule imposing a ceiling on the number of students in a classroom should determine the average class size in a grade in a school. The instrument should be unrelated (i.e., exogenous) to other unobserved variables that are related to achievement, but related to class size (i.e., relevant). In our study, the correlation between the instrument and class size was nearly 0.50 and significant, which indicates that the instrument is relevant. The exogeneity assumption is much more difficult to verify. Nonetheless, when these assumptions hold, the IV approach produces regression estimates that are consistent in large samples, unlike ordinary regression estimates. In addition, IV approaches are useful in reducing bias from measurement error in the independent variable of interest, which in this case is class size (see Angrist & Krueger, 2001). The study by Angrist and Lavy (1999) was one of the first to use an IV technique that capitalizes on a rule about the maximum number of students allowed in a class. The authors used Maimonides' rule, which restricts class size in Israel to a maximum of 40 students, to create an instrument of average class size. Here, we follow their approach to compute an instrument of average class size for fourth graders in Greece. The rule about maximum class size in Greece is 30 students and applies as follows.
When there are 31 students in a specific grade, two classrooms should be formed; when there are 61 students in a specific grade, three classrooms should be formed; and so forth. Such rules are likely to be an exogenous source of class size variability (Angrist & Lavy, 1999). The first step in this procedure is to compute the average class size in fourth grade using total enrollment in fourth grade and the Greek national rule of not allowing more than 30 students in a classroom. Total enrollment in fourth grade is computed as the sum of female and male students in the grade (see appendix table). Following Angrist and Lavy, we utilized the Greek rule about maximum class size and computed the average class size in Grade 4 as

ACS4 = EG4 / [INT((EG4 - 1)/30) + 1]   (2)

where ACS4 is the average class size in fourth grade, EG4 (the numerator) is the student enrollment in Grade 4, and INT represents the function generating the next smaller integer of the expression (EG4 - 1)/30. For example, if enrollment in fourth grade is 70, then the integer of (70 - 1)/30 is 2, since 2 is the next smaller integer of 69/30 = 2.3, and the denominator of Equation 2 equals 3. That is, the denominator of Equation 2 computes the number of classes in fourth grade. The same logic applies for any enrollment value in Grade 4: for example, if 1 ≤ (EG4 - 1)/30 < 2, the integer produced is 1 and two classes are implied. The instrument ACS4 is a function of fourth-grade enrollment and the maximum class size rule and is the same for all classes in that grade in a specific school. Once the instrument was computed, we regressed class size on the instrument and other predictors such as student characteristics and fourth-grade enrollment. The regression equation for student i was defined as

ClassSize_{i} = a_{0} + a_{1}ACS4_{i} + ST_{i}A_{2} + EG_{i}A_{3} + e_{i}   (3)

where the row vector EG includes linear and quadratic terms of fourth-grade enrollment, the a's and A's are the regression coefficients that need to be estimated, and e is the residual term. All other terms have been defined previously. An implicit assumption in Equation 3 is that the instrument is related to achievement only through class size.
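As a concrete illustration, the rule-based instrument in Equation 2 can be computed in a few lines. This is a sketch of the computation described above, not the authors' actual code; the function name is ours:

```python
import math

def acs4(enrollment, max_size=30):
    """Average class size in Grade 4 implied by the maximum class size rule
    (Equation 2): enrollment divided by the rule-implied number of classes."""
    # INT((EG4 - 1)/30) + 1 gives the number of classes the rule requires,
    # i.e., the smallest count that keeps every class at or below the cap.
    n_classes = math.floor((enrollment - 1) / max_size) + 1
    return enrollment / n_classes

print(acs4(70))  # 70 students -> 3 classes -> average class size of about 23.3
print(acs4(31))  # 31 students -> 2 classes -> average class size 15.5
print(acs4(61))  # 61 students -> 3 classes -> average class size of about 20.3
```

Note the discontinuities the rule creates: a one-student increase in enrollment from 30 to 31 cuts the predicted average class size from 30 to 15.5, which is exactly the exogenous variation the IV design exploits.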
Another assumption is that the specification in Equation 3 is correct and that enrollment, for example, is unrelated to the error term (i.e., the variables that could affect achievement via enrollment have been included as covariates in the equation). Enrollment could be related to achievement via variables that are observed or unobserved, and it is difficult to verify that all relevant variables are included in Equation 3; if some are omitted, the specification in Equation 3 may not hold exactly. We computed the fitted or predicted values of the regression model in Equation 3 and used them to construct the class size variable that predicts reading achievement. That is, in the final statistical model, reading achievement was regressed on the fitted values from Equation 3 that represent class size, as well as other student, teacher, classroom, and school variables. The two-level regression equation for student i in school j is

Y_{ij} = G_{00} + G_{01}FV_{j} + ST_{ij}G_{02} + CL_{j}G_{03} + SC_{j}G_{04} + u_{j} + e_{ij}   (4)

where Y is reading achievement of fourth graders, FV represents class size (i.e., the fitted values from Equation 3), the G's are the regression coefficients that need to be estimated, and all other terms have been introduced previously. We also included missing data dummies for some variables to adjust for missing data effects. The variance of u is the between-school variance in the outcome, and it is used to correct the standard errors of the regression estimates, which otherwise may be underestimated. It is also important to correct for heteroscedasticity of the error terms in Equation 4. As a result, the standard errors we computed were robust; that is, they take into account the nesting structure of the data as well as heteroscedasticity (i.e., nonconstant variance).
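Equations 3 and 4 together amount to a two-stage procedure: first regress class size on the instrument, enrollment terms, and covariates; then regress achievement on the resulting fitted values. Below is a minimal single-level sketch on simulated data. All values and coefficients are hypothetical, and the multilevel structure, sampling weights, and robust standard errors of the actual analysis are omitted:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 1000

# Simulated (hypothetical) data: an instrument, enrollment, one covariate.
instrument = rng.uniform(15.0, 30.0, n)    # rule-based average class size (Eq. 2)
enrollment = rng.uniform(20.0, 120.0, n)   # Grade 4 enrollment
student_cov = rng.normal(size=n)           # a student background covariate
class_size = 0.6 * instrument + rng.normal(scale=2.0, size=n)
reading = 500.0 + 1.0 * class_size + 5.0 * student_cov + rng.normal(scale=10.0, size=n)

def ols(X, y):
    """Ordinary least squares coefficient vector."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

# Stage 1 (Equation 3): class size on instrument, enrollment terms, covariate.
X1 = np.column_stack([np.ones(n), instrument, enrollment, enrollment**2, student_cov])
fitted_class_size = X1 @ ols(X1, class_size)   # the FV variable

# Stage 2 (Equation 4, single-level version): achievement on FV and covariate.
X2 = np.column_stack([np.ones(n), fitted_class_size, student_cov])
coefs = ols(X2, reading)
print(coefs[1])  # IV estimate of the class size coefficient, near the true 1.0
```

Because only the instrument-driven part of class size enters the second stage, variation induced by unobserved assignment decisions is purged from the estimate; this is the sense in which the fitted values carry the "exogenous" variation in class size.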
The coefficient of interest here is the coefficient of the fitted class size values in Equation 4, which represents the relationship between reading achievement and class size, adjusted for student, teacher/classroom, and school characteristics. In order to make projections to the target population of fourth graders in Greece, we used student weights provided in PIRLS at the student level (see appendix table). The HLM software was used to compute the regression estimates and their standard errors. The analysis was conducted for each plausible value separately, and then an average of all values was calculated (Shafer, 1999). The standard errors of the average estimates were computed using methods described by Shafer (1999) and by Little and Shenker (1995).

RESULTS

DESCRIPTIVE STATISTICS

Descriptive statistics of the variables of interest are reported in Table 1. Fifty percent of fourth graders were females and more than 90% of the students spoke Greek at home. Twenty percent of students' parents had at least a college degree. The average class size was 19 students per classroom, which is smaller than the average class size in regular-size classes in Project STAR (23), but larger than the average class size in small classes (15) (Krueger, 1999). The average teacher experience in fourth grade was 4 years, whereas the overall average teacher experience was 16 years. Slightly more than 20% of the teachers had a bachelor's degree. It is noteworthy that in Greece until about 1990, teaching degrees for elementary education (i.e., first grade through sixth grade) were typically two-year degrees awarded by teaching colleges, not universities. Around 1990, the first wave of teachers with a four-year bachelor's degree graduated from Greek universities. Slowly, university graduates started entering the teaching profession, and thus it is not surprising that in 2001 they composed only one-fifth of the elementary school teacher sample. One-third of the teachers in the sample were male.
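The averaging of estimates across plausible values described in the estimation section follows standard multiple-imputation combining rules (Shafer, 1999; Rubin, 1987): the point estimate is the mean across plausible values, and its variance adds the average sampling variance to the between-plausible-value variance. A minimal sketch with hypothetical numbers (the estimates and standard errors below are illustrative, not values from this study):

```python
import numpy as np

# Hypothetical class size estimates and standard errors from analyses
# run separately on each of five plausible values.
estimates = np.array([1.55, 1.70, 1.48, 1.62, 1.75])
std_errors = np.array([0.90, 0.88, 0.95, 0.91, 0.93])

m = len(estimates)
pooled_est = estimates.mean()               # average across plausible values
within = (std_errors ** 2).mean()           # average sampling variance
between = estimates.var(ddof=1)             # variance across plausible values
total_var = within + (1 + 1 / m) * between  # multiple-imputation combining rule
pooled_se = np.sqrt(total_var)

print(round(pooled_est, 3), round(pooled_se, 3))  # pooled estimate and its SE
```

The between component inflates the standard error to reflect the measurement uncertainty that the plausible values encode, so ignoring it would overstate precision.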
Only six percent of schools were private, and nearly 50% of the schools had more than 50% economically disadvantaged students. The majority of schools were located in cities with populations that did not exceed 100,000 people.

MULTILEVEL ANALYSIS: CLASS SIZE

Because students were nested within schools, we used multilevel models to analyze the data as described in Equation 1. First we ran an unconditional model with no predictors to determine the variance decomposition in the outcome. The results indicated that nearly 23% of the variance in the outcome was between schools, while most of the variance, 77 percent, was within schools. This result is similar to results produced from analyses of trend NAEP data (see Konstantopoulos & Hedges, 2008). Then, we ran a model that included the main independent variable, class size. The estimates of this analysis are summarized in the first and second columns of Table 2. The class size regression coefficient was positive but insignificant at the .05 level. With a one-student increase in class size, reading achievement increased by 1.62 points on the reading scale, which appears to be a small effect given the central tendency and standard deviation of the reading scores. The second model added student covariates to the equation to adjust class size estimates for student background. The estimates of this analysis are reported in the third and fourth columns of Table 2. The class size coefficient was still positive and insignificant at .05. The student variables decreased the class size estimate by nearly 30%. Female students performed on average higher than their male peers in reading by nearly one-sixth of a standard deviation, which is not a trivial gap. Family size was negatively and significantly related to reading achievement, suggesting that students with more siblings performed significantly lower in reading than students with fewer siblings.
Students whose parents held college degrees performed significantly higher than other students. The parental education gap was nearly three times as large as the gender gap and nearly one-half of a standard deviation. Items in the home were also positively and significantly related to reading achievement, which indicates that more resources at home correspond to increases in reading achievement. The third model added teacher and classroom variables to the equation. The estimates of this analysis are shown in Columns 5 and 6 in Table 2. The class size estimate was still positive and insignificant at the .05 level. The regression estimate of class size decreased by nearly 10% compared to the estimate in the second model. The estimates of the student covariates were overall similar to the estimates in the second model. The teacher and classroom variables were insignificant for the most part. The only predictor that was significant at the .05 level was whether most students in the class were above average in reading. Some estimates, however, were significant at the .10 level. For example, teacher education was significant at .10, and the magnitude of the coefficient was nearly one-fifth of a standard deviation. Daily discussion of readings in small groups was also a positive predictor of reading achievement and significant at .10 (the reference group being "never discuss readings in small groups"). Finally, students in classrooms with higher proportions of female students had higher average reading achievement at the .10 level. The fourth and final model also included school covariates in the equation. The estimates of this analysis are reported in the seventh and eighth columns of Table 2. The class size estimate was still positive and insignificant at the .05 level. The regression coefficient decreased by nearly 50 percent compared to the estimate in the third model, and was smaller than its standard error.
The estimates of the student covariates were, overall, similar to the estimates in the second and third models. The estimates of the teacher and classroom variables did not change much either, with two exceptions. In classrooms with higher proportions of female students, student achievement was significantly higher than in other classrooms at the .05 level. Also, daily discussion of readings in small groups was a positive and significant predictor of reading achievement compared to the reference group ("never discuss readings in small groups"). School covariates such as instructional time, tracking, or percent of disadvantaged or learning disabled students were not significantly related to student achievement. On average, students who attended schools in cities with populations greater than 500,000 people performed significantly higher in reading achievement at the .05 level than students who attended schools in smaller cities and towns. This result is not that surprising because many higher-performing schools are in larger cities such as Athens, the capital of Greece, or Thessaloniki, the second largest city, located in northern Greece. In addition, students who attended private schools had higher average reading achievement than their peers in public elementary schools at the .05 level, a result that is similar to results reported in previous work (Coleman & Hoffer, 1987). The private school advantage was slightly larger than the parental education advantage and nearly one-half of a standard deviation. This finding is also intuitive because, in Greece, the students who attend private schools come from wealthier backgrounds with many resources.

MULTILEVEL ANALYSIS: CLASS SIZE USING IV

Multilevel models were also used to examine the effects of class size estimates that were produced by the IV methods (see Equation 4). Table 3 summarizes the estimates of this analysis.
The results of the first model showed that the class size estimate was positive and significant at .05. A one-student increase in class size corresponded to an increase in reading achievement of 3.37 points on the reading scale, which appears to be a small increase given the central tendency and standard deviation of reading scores. That is, the effect is significant, but it is unclear that it is meaningful. The results of the second model were overall similar to those in Table 2, only now the class size estimate was positive and significant at .05, but still small. Student characteristics adjusted the class size estimate by nearly 30%, but did not eliminate the positive class size effect. All student variables were significant except Greek language spoken at home. However, in the third model, when teacher and classroom variables were also included in the equation, the class size coefficient, although still positive, became insignificant. The teacher and classroom variables adjusted the class size estimate by nearly 30% compared to the estimate from the second model. As in Table 2, the majority of teacher and classroom variables had insignificant effects on reading scores, except the variable that represented above-average reading ability of the students in a classroom. Teacher education, daily discussion of readings in small groups, and proportion of female students in a classroom were significant predictors of reading achievement at the .10 level. Similar patterns were observed in the fourth model, only now the proportion of female students in a classroom was a significant predictor at the .05 level. The class size estimate was positive, nearly 65% smaller compared to that in the third model, insignificant, and smaller than its standard error. From the first to the fourth model, the class size estimate was reduced by nearly 80%.
Most of the school covariates, such as instructional time, tracking, or percent of disadvantaged or learning disabled students, were not significantly related to student achievement. However, on average, students who attended schools in cities with populations smaller than 500,000 people performed significantly lower than students who attended schools in larger cities. In addition, the private school advantage was slightly larger than the parental education advantage and slightly larger than one-half of a standard deviation. Overall, these results indicate a positive but insignificant association between class size and reading achievement. This finding is in accord with previous evidence from observational studies that have reported insignificant class size effects (Hoxby, 2000; Milesi & Gamoran, 2006). However, our results are not congruent with the evidence from Project STAR that has pointed to positive effects of small classes on achievement (Krueger, 1999). The fact that the class size coefficient was positive is somewhat puzzling, although evidence from previous analysis of TIMSS data has suggested that in Hong Kong, for example, class size was positively and significantly related to mathematics achievement (Pong & Pallas, 2001). Positive associations between class size and achievement have also been documented in earlier work. For instance, Mazareas (1981) analyzed data from a randomized experiment and reported a small statistically significant advantage in reading achievement for students in large classes. Other experimental studies have also reported higher means in reading achievement in larger classes, but the mean differences between smaller and larger classes were not statistically significant (see Shapson, Wright, Eason, & Fitzgerald, 1980). In Project STAR, students in regular-size classes had higher achievement than students in small classes in nearly one-third of the schools in the sample (see Konstantopoulos, 2011).
However, overall, the larger class advantage was not significant at the .05 level. In addition, it is difficult to know exactly how class size effects vary across the education systems of different countries. One hypothesis is that larger classes sometimes reflect school locale, city size, and degree of wealth. For example, in some countries larger classes are perhaps more likely to be formed in good public schools or private schools in larger cities and in wealthier areas. In contrast, perhaps smaller classes are formed more frequently in rural areas with lower SES. In addition, there may be a school context effect; that is, in some schools classroom practices in larger classes may be very effective in promoting student achievement. We were unable to support this hypothesis empirically with our data. For instance, class size did not interact with city size or with percent of economically disadvantaged students.

DISCUSSION

We investigated the effects of class size on reading achievement for fourth graders in Greece in 2001 using rich data from PIRLS. Because of a Greek Ministry of Education law about maximum class size, we were able to use an IV approach to determine whether class size affected reading achievement. The results produced from the multilevel and the IV analyses were overall similar. This may suggest that there is little bias from omitted variables in the multilevel regression estimate of class size. That is, perhaps in our study omitted variables are weakly related to class size and the bias is minimized (see Angrist & Krueger, 2001). We would not have known that if we had not conducted the IV analysis. That is, a researcher does not know beforehand how similar or different the results of multilevel regression and IV models will be. Generally, the results indicated a positive association between class size and achievement.
However, the association was typically statistically insignificant, especially when teacher/classroom and school variables were taken into account. This finding is congruent with findings of previous work suggesting that class size is not consistently related to student achievement, especially in observational studies (Hanushek, 1989). By contrast, our findings are not congruent with results from experimental studies in the US (Finn & Achilles, 1990). Class size may function differently in different countries. For instance, class size in some countries may be determined at a macro level by laws passed by representatives (e.g., parliaments) and enforced by federal education departments, while in other countries it may be decided at a micro level by states, school districts, and schools. Also, different countries may subscribe to different classroom practices and teaching methods that are driven by different learning theories and pedagogical approaches. Country-specific context notwithstanding, it is also difficult to disentangle class size effects from selection bias and potential moderators, unless a researcher has good quality classroom data and has measured all possible variables that could have an impact on class size effects. In this study, we were not able to examine the cumulative effects of smaller classes through time. Because our data are cross-sectional, we were only able to assess the effects of class size in one year (i.e., fourth grade). Previous research in the US has pointed to cumulative effects of small classes in early grades using Project STAR data (e.g., Konstantopoulos & Chung, 2009; Nye et al., 2000). However, given that the one-year results in our study point to effects that are not different from zero, it is unclear whether cumulative effects over time would have been evident or meaningful.
The variance decomposition of reading scores provided results that are similar to those reported for NAEP (Konstantopoulos & Hedges, 2008). That is, nearly 20% of the variance in the outcome was between schools, and when covariates were included in the models, the between-school variance was reduced gradually and accounted for 10% of the residual variance in the final model. Such values of clustering are similar to those reported in the US for achievement data (Hedges & Hedberg, 2007). We observed a gender gap favoring female students. In particular, female students outperformed their male peers in reading by nearly one-fifth of a standard deviation, which is a larger gap than what has been reported in previous work (see Hedges & Nowell, 1995). Family size was negatively related to reading achievement, a finding congruent with previous work (Kuo & Hauser, 1997). Parental education and items at home were positively and strongly related to reading achievement (i.e., the estimates were several times larger than their standard errors). Parental education in particular had a significant effect on reading achievement, and the advantage was nearly one-half of a standard deviation, which is considerable. In addition, more resources at home were related to higher levels of reading achievement. Overall, teacher and classroom practice variables were unrelated to reading achievement. Classrooms with higher proportions of female students had higher average reading achievement than other classrooms. Also, classrooms with higher average reading ability had higher reading achievement than other classrooms. Surprisingly, ability grouping and frequency of discussion of readings in small groups in the classroom were not related to reading achievement and did not affect the class size estimate dramatically. Thus, it is unclear whether grouping is an important part of the class size mechanism.
Most of the school variables, such as instructional time, tracking, and percentages of disadvantaged or learning disabled students in the school, were also not related to student achievement and did not affect the class size estimate much. However, school urbanicity, represented by city size, had an effect on student achievement. Specifically, schools located in larger cities had higher achievement than schools in smaller cities. In addition, school sector had an important effect. Students who attended private schools had significantly higher reading achievement than their peers in public elementary schools. The private school advantage was considerable and nearly one-half of a standard deviation. One potential limitation of the study is that we were not able to control for school effects adequately. Because only one classroom was sampled within each school, it was not possible to include school fixed effects in the model (as dummies) to control for general school effects. As a result, it is possible that the estimates of class size effects were not adequately adjusted. Nonetheless, in models that include school variables as controls, the class size effects are not significantly different from zero, which indicates that the observed school variables we used adjusted the class size coefficient to some degree. In addition, the school variables we included in the model are used frequently in school effects research (see Konstantopoulos, 2006; Lee & Croninger, 1994). Another potential limitation is that the assumptions of the IV estimation may not hold exactly. Although the rule about maximum class size is a good way to compute the number of classes in each school, and ultimately the average class size for each grade in each school, it is unclear that enrollment is unrelated to other unobserved variables.
For example, if enrollment is related to achievement through variables other than class size, such as student SES, city size, school sector, or other unobserved variables, then the IV estimates may be biased. We tested this empirically, and we did not find a meaningful association between enrollment and sector or city size. Still, it is unclear that all variables that are related to enrollment and achievement are controlled for in the IV estimation in Equation 3. We were able to control for student background to some degree in Equation 3. However, we were unable to control for prior achievement in third grade because such information was not available. Prior achievement is a good covariate to include in regression models, not only to control for pre-existing differences in achievement, but also to decrease the variance in achievement and produce more precise estimates (i.e., smaller standard errors of estimates). It should be noted that we also tested whether the instrument is relevant and strong. The correlation between class size and the instrument was nearly 0.50 and significant. In addition, the regression coefficient that captured the association between the instrument and class size in the first-stage regression (i.e., class size regressed on the instrument) was significant at the 0.0001 level. The F statistic of the regression in Equation 3 that tests whether the regression coefficient of the instrument is zero was much larger (F > 200) than the typical critical value of 10, which indicates that the instrument was not weak, but quite the opposite (see Stock, Wright, & Yogo, 2002). To conclude, our study provided additional evidence about the effects of class size on student achievement. The maximum class size rule provided an opportunity to construct a good instrument for class size, and the data allowed us to make projections to the population of fourth graders in Greece.
Data on classroom practices also helped us investigate whether the class size effect is mediated by such practices. The results of the multilevel and the IV analyses were consistent and suggested that class size did not impact reading achievement. In addition, classroom practices did not seem to affect class size estimates in our sample. One important aspect of empirical work is replication with different samples and settings, and thus future research should continue examining class size effects in different grades and countries or states. In particular, it would be useful to examine class size effects in countries and states that use different rules about the maximum number of students allowed in a classroom, as well as the effects produced by changes in these rules over time.

Acknowledgments

The authors thank Steve Porter, Wei Li, Mark Reckase, and two anonymous reviewers for constructive feedback.

Notes

1. In 1985, the Greek Ministry of Education passed a law that regulated class size in elementary public and private schools by setting the maximum number of students in each classroom to 30. This law was in effect until 2005.
2. The data from PIRLS 2011 are not currently publicly available.
3. IV methods are extensively used with observational data in economics. But even in experiments, the intention to treat has been used frequently as an instrument to estimate consistent treatment-on-the-treated effects (see Angrist, Imbens, & Rubin, 1996; Krueger, 1999).

References

Angrist, J. D., Imbens, G., & Rubin, D. (1996). Identification of causal effects using instrumental variables. Journal of the American Statistical Association, 91(434), 444–455.
Angrist, J. D., & Krueger, A. B. (2001). Instrumental variables and the search for identification: From supply and demand to natural experiments. Journal of Economic Perspectives, 15, 69–85.
Angrist, J. D., & Lavy, V. (1999). Using Maimonides' rule to estimate the effect of class size on scholastic achievement.
Quarterly Journal of Economics, 114, 533–575.
Betts, J. R., & Shkolnik, J. L. (1999). The behavioral effects of variations in class size: The case of math teachers. Educational Evaluation and Policy Analysis, 21, 193–213.
Blatchford, P., Bassett, P., & Brown, P. (2011). Examining the effect of class size on classroom engagement and teacher–pupil interaction: Differences in relation to prior pupil attainment and primary vs. secondary schools. Learning and Instruction, 21, 715–730.
Bourke, S. (1986). How smaller is better: Some relationships between class size, teaching practices, and student achievement. American Educational Research Journal, 23, 558–571.
Bryk, A. S., & Raudenbush, S. W. (1988). Toward a more appropriate conceptualization of research on school effects: A three-level hierarchical linear model. American Journal of Education, 97, 65–108.
Clotfelter, C. T., Ladd, H. F., & Vigdor, J. L. (2006). Teacher–student matching and the assessment of teacher effectiveness. Journal of Human Resources, 41, 778–820.
Coleman, J. S. (1969). Equality and achievement in education. Boulder, CO: Westview Press.
Coleman, J. S., & Hoffer, T. B. (1987). Public and private schools: The impact of communities. New York: Basic Books.
D'Agostino, J. V. (2000). Instructional and school effects on students' longitudinal reading and mathematics achievements. School Effectiveness and School Improvement, 11, 197–235.
Finn, J. D., & Achilles, C. M. (1990). Answers and questions about class size: A statewide experiment. American Educational Research Journal, 27, 557–577.
Finn, J. D., Gerber, S. B., Achilles, C. M., & Boyd-Zaharias, J. (2001). The enduring effects of small classes. Teachers College Record, 103, 145–183.
Glass, G. V., Cahen, L. S., Smith, M. E., & Filby, N. N. (1982). School class size: Research and policy. Beverly Hills, CA: Sage.
Glass, G. V., & Smith, M. E. (1979). Meta-analysis of research on class size and achievement. Educational Evaluation and Policy Analysis, 1, 2–16.
Greenwald, R., Hedges, L. V., & Laine, R. D. (1996). The effects of school resources on student achievement. Review of Educational Research, 66, 361–396.
Hanushek, E. A. (1989). The impact of differential expenditures on school performance. Educational Researcher, 18, 45–51.
Hanushek, E. A. (1999). Some findings from an independent investigation of the Tennessee STAR experiment and from other investigations of class size effects. Educational Evaluation and Policy Analysis, 21, 143–163.
Hedges, L. V., & Hedberg, E. (2007). Intraclass correlation values for planning group randomized trials in education. Educational Evaluation and Policy Analysis, 29, 60–87.
Hedges, L. V., & Nowell, A. (1995). Sex differences in mental test scores, variability, and numbers of high-scoring individuals. Science, 269, 41–45.
Hoxby, C. M. (2000). The effects of class size on student achievement: New evidence from population variation. Quarterly Journal of Economics, 115, 1239–1285.
Konstantopoulos, S. (2006). Trends of school effects on student achievement: Evidence from NLS:72, HSB:82, and NELS:92. Teachers College Record, 108, 2550–2581.
Konstantopoulos, S. (2008). Do small classes reduce the achievement gap between low and high achievers? Evidence from Project STAR. Elementary School Journal, 108, 275–291.
Konstantopoulos, S. (2011). How consistent are class size effects? Evaluation Review, 35, 71–92.
Konstantopoulos, S., & Hedges, L. V. (2008). How large an effect can we expect from school reforms? Teachers College Record, 110, 1613–1640.
Konstantopoulos, S., & Chung, V. (2009). What are the long-term effects of small classes on the achievement gap? Evidence from the Lasting Benefits Study. American Journal of Education, 116(1), 125–154.
Krueger, A. B. (1999). Experimental estimates of education production functions. Quarterly Journal of Economics, 114, 497–532.
Kuo, H. H. D., & Hauser, R. M. (1997). How does size of sibship matter?
Family configuration and family effects on educational attainment. Social Science Research, 26, 69–94.
Lee, V. E., & Croninger, R. G. (1994). The relative importance of home and school in the development of literacy skills for middle-grade students. American Journal of Education, 102, 286–329.
Lee, V. E., & Loeb, S. (2000). School size in Chicago elementary schools: Effects on teachers’ attitudes and students’ achievement. American Educational Research Journal, 37, 3–31.
Leuven, E., Oosterbeek, H., & Rønning, M. (2008). Quasi-experimental estimates of the effect of class size on achievement in Norway. Scandinavian Journal of Economics, 110, 663–693.
Little, R. J. A., & Schenker, N. (1995). Missing data. In G. Arminger, C. C. Clogg, & M. E. Sobel (Eds.), Handbook of statistical modeling for the social and behavioral sciences (pp. 39–76). New York: Plenum Press.
Mazareas, J. (1981). Effects of class size on the achievement of first-grade pupils. Unpublished doctoral dissertation, Boston University, Boston.
Milesi, C., & Gamoran, A. (2006). Effects of class size and instruction on kindergarten achievement. Educational Evaluation and Policy Analysis, 28, 287–313.
Mislevy, R. J., Johnson, E. G., & Muraki, E. (1992). Scaling procedures in NAEP. Journal of Educational and Behavioral Statistics, 17, 131–154.
Mosteller, F., Light, R. J., & Sachs, J. A. (1996). Sustained inquiry in education: Lessons learned from skill grouping and class size. Harvard Educational Review, 66, 797–842.
Mullis, I. V. S., Martin, M. O., Kennedy, A. M., & Flaherty, C. L. (2003). PIRLS 2001 encyclopedia: A reference guide to reading education in the countries participating in IEA’s Progress in International Reading Literacy Study (PIRLS). Chestnut Hill, MA: International Study Center, Lynch School of Education, Boston College.
Nye, B., Hedges, L. V., & Konstantopoulos, S. (1999). The long-term effects of small classes: A five-year follow-up of the Tennessee class size experiment.
Educational Evaluation and Policy Analysis, 21, 127–142.
Nye, B., Hedges, L. V., & Konstantopoulos, S. (2000). Effects of small classes on academic achievement: The results of the Tennessee class size experiment. American Educational Research Journal, 37, 123–151.
Nye, B., Konstantopoulos, S., & Hedges, L. V. (2004). How large are teacher effects? Educational Evaluation and Policy Analysis, 26, 237–257.
Pong, S., & Pallas, A. (2001). Class size and eighth-grade math achievement in the United States and abroad. Educational Evaluation and Policy Analysis, 23, 251–273.
Rubin, D. B. (1987). Multiple imputation for nonresponse in surveys. New York: Wiley and Sons.
Shadish, W. R., Cook, T. D., & Campbell, D. T. (2002). Experimental and quasi-experimental designs for generalized causal inference. Boston, MA: Houghton Mifflin.
Schafer, J. L. (1997). Analysis of incomplete multivariate data. London: Chapman and Hall.
Schafer, J. L. (1999). Multiple imputation: A primer. Statistical Methods in Medical Research, 8, 3–15.
Shapson, S. M., Wright, E. N., Eason, G., & Fitzgerald, J. (1980). An experimental study of the effects of class size. American Educational Research Journal, 17, 144–152.
Stock, J. H., Wright, J. H., & Yogo, M. (2002). A survey of weak instruments and weak identification in generalized method of moments. Journal of Business and Economic Statistics, 20, 518–529.
White, K. R. (1982). The relation between socioeconomic status and academic achievement. Psychological Bulletin, 91, 461–481.
White, S. W., Reynolds, P. D., Thomas, M. M., & Gitzlaff, N. J. (1993). Socioeconomic status and achievement revisited. Urban Education, 28, 328–343.
Willingham, W. W., & Cole, N. S. (1997). Gender and fair assessment. Mahwah, NJ: Lawrence Erlbaum.
Wooldridge, J. M. (2009). Introductory econometrics: A modern approach. Mason, OH: Cengage Learning.
Zimmer, R. W., & Toma, E. F. (2000). Peer effects in private and public schools across countries.
Journal of Policy Analysis and Management, 19, 75–92.


