School Composition and Contextual Effects on Student Outcomes
by J. Douglas Willms - 2010
Background: Findings from several international studies have shown that there is a significant relationship between literacy skills and socioeconomic status (SES). Research has also shown that schools differ considerably in their student outcomes, even after taking account of students’ ability and family background. The context or learning environment of a school or classroom is an important determinant of the rate at which children learn. The literature has traditionally used school composition, particularly the mean SES of the school, as a proxy for context.
Focus of Study: This study examines the relationships among school composition, several aspects of school and classroom context, and students’ literacy skills in science.
Population: The study uses data from the 2006 PISA (Programme for International Student Assessment) for 57 countries. PISA assesses the knowledge and life skills of 15-year-old youth as they approach the end of their compulsory period of schooling.
Research Design: Secondary analyses of the data describe the socioeconomic gradient (the relationship between a student outcome and SES) and the school profile (the relationship between average school performance and school composition) using data for the United States as an example. The analyses demonstrate two important relationships between school composition and the socioeconomic gradient and distinguish between two types of segregation, referred to as horizontal and vertical segregation. The analyses discern the extent to which school composition and classroom and school context separately and jointly account for variation in student achievement.
Findings: The results show that school composition is correlated with several aspects of school and classroom context and that these factors are associated with students’ science literacy. Literacy performance is associated with the extent to which school systems are segregated “horizontally,” based on the distribution among schools of students from differing SES backgrounds, and “vertically,” due mainly to mechanisms that select students into different types of schools.
Conclusions: An understanding of socioeconomic gradients and school profiles for a school system is critical to discerning whether reform efforts should be directed mainly at improving the performance of particular schools or at striving to alter policies and practices within all schools. Both horizontal and vertical segregation are associated with lower student outcomes; therefore, we require a better understanding of the mechanisms through which students are allocated to schools. When the correlation of school composition with a particular contextual variable is strong, it calls for policies aimed at increasing inclusion or differentially allocating school and classroom resources among schools serving students of differing status.
For at least two decades following the landmark U.S. study, Equality of Educational Opportunity (Coleman et al., 1966), researchers devoted their attention to discerning whether schools made a difference to students’ educational outcomes after taking account of students’ socioeconomic backgrounds. The terms school effect and value-added were used to refer to the effect on students’ outcomes associated with attendance at a particular school, net of the effects associated with student family background and wider social and economic factors that lie outside the control of teachers or school administrators (Raudenbush & Willms, 1995; Willms, 1992). Researchers found that the accurate measurement of school effects required outcome measures that were sensitive to what was actually taught in school (e.g., Brookover et al., 1978; Madaus, Kellaghan, Rakow, & King, 1979); research designs that took into account students’ learning at different stages (Alexander, Pallas, & Cook, 1981; Bryk & Raudenbush, 1988); and sophisticated statistical techniques that could model data that were nested hierarchically (Aitkin & Longford, 1986; Goldstein, 1987; Raudenbush & Bryk, 1986). The research provided compelling evidence that there are indeed school effects: Some schools have better schooling outcomes than others, even after taking account of students’ family backgrounds and their ability upon entry to school.
Researchers naturally turned their attention to asking why schools varied in their outcomes. What aspects of school policy and practice are associated with effective schools? Numerous studies were conducted during the 1970s and 1980s that focused on the curriculum, teaching practices, and the context or learning environment of the classroom and school (for reviews, see McPherson & Willms, 1987; Murnane, 1981; Rutter, 1983; Sammons, Hillman, & Mortimore, 1995; Scheerens, 1992). A number of detailed studies of teachers’ behaviors found that certain teaching practices are associated with increased student learning (Brophy & Good, 1986; Gage & Needels, 1989). Two critical elements are the effective use of class time and teaching that is structured and adaptive (Scheerens, 1992; Slavin, 1994). The research on school effectiveness also pointed to curriculum coverage as a significant factor: Effective schools tend to require students to take a core set of academically oriented courses (Alexander, 1982; Lee & Bryk, 1989). The content and pace of the curriculum are also related to student achievement (Barr & Dreeben, 1983; Dreeben & Gamoran, 1986). Finally, the research on school effects examined a number of factors associated with the context of the classroom and school, including the formal and informal mechanisms governing selection into particular schools and school programs, the ways that students are grouped for instruction, the nature of interactions among students and teachers, parental involvement, teachers’ expectations for achievement, and how time and resources are used (Gamoran, 1986, 1987; Ho & Willms, 1996; Pallas, 1988; Plewis, 1991; Slavin, 1990).
In considering the effects of school and classroom context and its role in educational policy, I will classify the various factors that affect student achievement into three broad domains: curriculum, teaching practices, and the context, or learning environment. This classification is somewhat contrived, because factors in each of these domains interact with those in other domains, and many factors arguably cut across at least two domains. For example, teachers may adopt different teaching practices for differing curricula, and the effective use of time is associated with both practice and context. However, this classification helps to set some boundaries on what is meant by school context, and it provides a framework for considering different educational reforms. I use the term context broadly to refer to the environment in which teaching and learning take place. It includes, among other factors, school and classroom resources, interactions among peers, the relationships between teachers and students, the disciplinary climate of the classroom, and the norms for academic success. Thus, it comprises factors that characterize or describe the learning environment: its physical features, its culture, and teachers’ practices. It does not include, however, structural features of the school system that may affect the environment, such as whether students are tracked into different types of school programs based on their ability or prior achievement. These affect school and classroom composition, but they are not part of what I consider the school and classroom context. In addition, there are other contexts in which teaching and learning take place. The major ones are the family, the neighborhood, peer groups, and the media. These are also important, but this article will focus on school composition and context.
The term contextual effects has been used in the academic literature to refer to the effects on student outcomes associated with the demographic characteristics of a school’s composition, particularly the mean socioeconomic status (SES) of a school (e.g., Alexander & Eckland, 1975; Bryk & Driscoll, 1988; Willms, 1986). Some authors have explicitly used mean SES, or various other classroom- or school-level aggregates that describe student composition, as a proxy for peer effects (Robertson & Symons, 1996; Zimmer & Toma, 1997). This is problematic for at least two reasons. First, the aggregate measure that describes student composition is an inadequate proxy for peer effects. It may be, for example, that low-ability students have fewer interactions with their more able peers when they are in a high-ability setting than when they are in a low-ability setting. Direct measures of peer interactions are required. Second, the model is likely underspecified (Nechyba, McEwan, & Older-Aguilar, 2004), because many other classroom and school processes are correlated with school composition. Thus, it is important to distinguish between contextual effects and school composition effects (Alexander, Fennessey, McDill, & D’Amico, 1979) and to be cautious about inferring causation.
In this article, I argue that the roles of school composition and context have important implications for school policy and practice, and the value-added of schools. First, I stress the importance of the socioeconomic gradient, the relationship between a student outcome and SES, and the school profile, which shows how average school performance is related to school composition. For a school system, an understanding of these relationships is critical to determining whether reform efforts should be directed mainly at improving the performance of particular schools or at striving to alter policies and practices within all schools. Second, I distinguish between two types of school segregation, which I refer to as horizontal and vertical segregation. Horizontal segregation is based on the extent to which students from differing SES backgrounds are distributed unequally across schools, whereas vertical segregation is based on the variation among schools in their performance that arises from early selection or the tracking of students into particular schools or school programs. The results show that both types of segregation are undesirable and that policies aimed at achieving a more inclusive school system require an understanding of the mechanisms through which students are allocated to schools. Third, I show that school composition is correlated with many aspects of school context that are relevant to student performance. The correlation between composition and contextual factors is relevant to equity, that is, the fair allocation of resources. When the correlation of school composition with a particular contextual variable is strong, it calls for policies aimed at increasing inclusion or differentially allocating school and classroom resources among schools that serve students of differing status.
Although the focus of this article is on schools, the same arguments apply to the allocation of students to classes within schools. Indeed, many of the important contextual factors, such as classroom disciplinary climate, are rooted in the classroom, not the school. A few studies of school effectiveness have estimated the proportions of variation in student performance among students within classrooms, among classrooms within schools, and among schools. Generally, for both elementary and secondary schools, a greater proportion of the variation in student performance is among classrooms than among schools (Hill & Rowe, 1996; Mortimore, Sammons, Stoll, Lewis, & Ecob, 1988; Scheerens, Vermeulen, & Pelgrum, 1989; Willms, 2000). Also, the country is the highest unit of analysis in this discussion, but the same arguments and analytic strategies can be applied to school districts or to school systems at the state or provincial levels.
THE PROGRAMME FOR INTERNATIONAL STUDENT ASSESSMENT (PISA)
To examine the role of school composition and contextual factors, I use data from PISA 2006, which describe students’ science proficiency at age 15 and various family, classroom, and school factors related to science performance. PISA is a collaborative initiative of member countries of the Organisation for Economic Co-operation and Development (OECD) that is aimed at assessing the knowledge and life skills of 15-year-old youth as they approach the end of their compulsory period of schooling. PISA is a policy-oriented assessment program, designed and guided by an international steering committee to provide regular data that pertain to the most pressing policy issues confronting educational administrators and policy makers around the world. The PISA 2006 data include measures of student proficiency in reading and mathematics; however, science was the major domain in PISA 2006, assessed with a large and comprehensive set of test items, whereas reading and mathematics were minor domains. In addition, reading proficiency data for the United States were not made public because of an error in the way the test booklets were organized. At the country level, average science scores are correlated 0.97 with reading and 0.95 with mathematics; therefore, country-level findings for most analyses in this study can be generalized to reading and mathematics proficiency as well.
The outcome measure in the analyses is students’ performance in science literacy. The literacy tests developed for PISA emphasize the kinds of skills that students will need in their everyday lives as they approach postsecondary education and employment. Therefore, the literacy tests are primarily concerned with whether students can apply the knowledge they have learned at school, rather than the content of secondary school curricula that is common among countries. The PISA science scores were standardized to have a mean of 500 and a standard deviation of 100 for all students in participating OECD countries. Scores were also classified into six levels. Students scoring at the lowest level, Level 1, have very little scientific knowledge, such that it can be applied only in a familiar situation. At Level 2, students have sufficient scientific knowledge to draw simple conclusions in familiar contexts or conduct simple investigations. At the highest level, Level 6, students are able to identify, explain, and apply scientific knowledge in many complex situations and can link different sources of information and evidence to provide reasoned arguments. Detailed explanations of the scale are provided in the OECD technical report (OECD, 2006). The mean scores, their standard deviations, and the skewness of the distributions for all countries are provided in the first three columns of the appendix.
The PISA measure of SES describes students’ economic, social, and cultural background. It was derived from a factor analysis of data describing levels of parental education and occupation, and an index of material, educational, and cultural possessions in the home. SES was scaled to have a mean of 0 and a standard deviation of 1 at the student level for all OECD countries.
Any measure of SES is likely to differ in its meaning across cultures, and indeed the factor structure does vary significantly among countries. However, in most countries, the relative contribution of the three factors is similar. More important, though, SES could be considered as either a relative measure describing the social hierarchy within countries, or as an absolute measure that is relevant to people’s social standing across countries. For example, a person with a secondary school diploma is likely to have a different relative status in a low-income country than in a high-income country, but the secondary diploma also carries some meaning in an absolute sense. In the analyses in this article, SES is treated in a relative sense in that gradients are estimated separately within countries. But when one compares the expected score of an average child in each country, it is necessary to treat SES as an absolute entity.
RAISING AND LEVELLING THE LEARNING BAR
I use the term learning bar as a metaphor for the socioeconomic gradient for schooling outcomes. A socioeconomic gradient describes the relationship between a social outcome and socioeconomic status for individuals in a specific jurisdiction, such as a school, a province or state, or a country (Willms, 2006). In the case of schooling, the social outcome is typically a measure of academic achievement, or an affective outcome such as self-esteem. Socioeconomic status refers to the relative position of an individual or family on a hierarchical social structure based on their access to, or control over, wealth, prestige, and power (Mueller & Parcel, 1981). The key indicators of SES in most educational studies include parents’ level of education and their occupational prestige.
Figure 1 shows the socioeconomic gradient for science performance for the United States (thin black line). Its gradient is similar to the average gradient for all OECD countries (thick gray line). The vertical axis has two scales: The left-hand scale is the continuous scale for science performance, and the right-hand axis depicts the six levels of science proficiency, described previously.
Figure 1. The socioeconomic gradient for the United States
Socioeconomic gradients comprise three components: their level, their slope, and the strength of the outcome-SES relationship1:
a. The level of the gradient is defined as the expected score on the outcome measure for a person with average SES. It can also be considered the SES-adjusted mean, because it is the average performance for a country (or for a province or state, or a school) after taking account of students’ SES. The level of the U.S. gradient is 477.
b. The slope of the gradient indicates the extent of inequality attributable to SES. Steeper gradients indicate a greater impact of SES on student performance (that is, more inequality), whereas more gradual gradients indicate a lower impact of SES (that is, less inequality). The slope of the U.S. gradient (in the center of the data) is 46.8, which indicates that the expected science performance increases by 46.8 points for a one-standard-deviation increase in SES. The U.S. gradient also has a small but statistically significant curvilinear component. The estimate for SES-squared is 4.9, which indicates that the slopes are slightly steeper at higher levels of SES.
c. The strength of the gradient refers to the proportion of variance in the social outcome that is explained by SES. If the strength of the relationship is strong, a considerable amount of the variation in the outcome measure is associated with SES, whereas a weak relationship indicates that relatively little of the variation is associated with SES. The most common measure of the strength of the relationship is a measure called R-squared, which for the U.S. example is 0.17.
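As a rough illustration of the three components, the sketch below fits a quadratic gradient by ordinary least squares and reads off the level, slope, curvature, and R-squared. It is a minimal sketch under stated assumptions, not the estimation procedure used for the PISA results: the function name fit_gradient is hypothetical, the data are synthetic (generated with the U.S. values of 477, 46.8, and 4.9 quoted above and an arbitrary noise level), and sampling weights and plausible values are ignored. The R-squared here comes from the quadratic fit, which may differ from the specification behind the reported 0.17.

```python
import numpy as np

def fit_gradient(ses, score):
    """Fit score = level + slope*SES + curvature*SES^2 by ordinary least squares."""
    ses = np.asarray(ses, dtype=float)
    score = np.asarray(score, dtype=float)
    # Design matrix: intercept (level at SES = 0), linear term, squared term.
    X = np.column_stack([np.ones_like(ses), ses, ses ** 2])
    coef, *_ = np.linalg.lstsq(X, score, rcond=None)
    fitted = X @ coef
    ss_res = np.sum((score - fitted) ** 2)
    ss_tot = np.sum((score - score.mean()) ** 2)
    return {
        "level": float(coef[0]),        # expected score at SES = 0
        "slope": float(coef[1]),        # points per one-SD increase in SES, at SES = 0
        "curvature": float(coef[2]),    # coefficient on SES squared
        "r_squared": float(1.0 - ss_res / ss_tot),  # strength of the relationship
    }

# Synthetic students only; the noise SD of 100 is an arbitrary assumption.
rng = np.random.default_rng(0)
ses = rng.normal(0.0, 1.0, size=5000)
score = 477.0 + 46.8 * ses + 4.9 * ses ** 2 + rng.normal(0.0, 100.0, size=5000)

gradient = fit_gradient(ses, score)
print(gradient)
```

Because SES is scaled to a mean of 0, the intercept of the fit plays the role of the level, and the linear coefficient is the slope at the center of the data.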
The gradient line is drawn from the 5th to the 95th percentile of the SES scores for a particular population. For the United States, the 5th and 95th percentiles are -1.47 and 1.49, respectively. Therefore, 90% of the students fall in this range. Students in the United States, on average, have a higher SES than those in other OECD countries. The 5th and 95th percentiles for all OECD students are -2.22 and 1.47 respectively. The learning bar graph also shows the science performance and SES for a representative sample of 5,000 U.S. students. These are the small black dots above and below the gradient line. They show that there is considerable variation in science performance at all levels of SES.
The socioeconomic gradient for the United States immediately reveals three important findings. First, the U.S. learning bar is below that of the OECD and is slightly steeper. Thus, science results are less equitable than the average for all OECD countries. Second, the students at Level 2 and lower are from a wide range of SES backgrounds. There are disproportionately more students from low SES backgrounds scoring at this low level, but there are also many students from average and high SES backgrounds scoring at or below Level 2. Third, there are some resilient students who are from low SES backgrounds and scored at Level 4 or higher; however, the vast majority of students with scores at or above Level 4 are from high SES backgrounds.
The gradient specifications are provided for all countries in the appendix. The results show clearly that countries vary substantially in their SES-adjusted level of performance and in the slopes and strength of their gradients. The gradients for some countries, like the United States, are curvilinear. One might expect that above a certain level of SES, there would be little or no increase in students’ outcomes. This is a test of the hypothesis of diminishing returns, which holds that the relationship between social outcomes and SES is weaker at higher levels of SES (Willms, 2006). When estimating the gradients, therefore, I included the square of SES to discern whether there was a statistically significant curvilinear relationship. The results in the appendix indicate that there is a statistically significant curvilinear relationship with SES for 24 of the 57 countries; however, the relationship is negative, consistent with the diminishing returns hypothesis, for only four countries: Japan, Austria, Italy, and Macao-China. In the other 20 countries, it is positive, indicating increasing returns for higher levels of SES. There tend to be increasing returns for SES in lower income countries. These findings are consistent with those reported by Willms and Somers (2001) for the reading and mathematics achievement of Grade 3 and 4 students in several Latin American countries. Their results suggested that there was a premium associated with parents having completed secondary school. However, their findings, and the PISA results reported here, may be attributable to a floor effect on the test.
Although the gradient line conveys considerable information about the distributions of science performance and SES and the relationship between them, it does not describe how these relationships vary within and between schools, or among other jurisdictions within the country. Some of this information can be summarized with a school-level scatter-plot that displays the relationship between school mean performance and school mean SES. I refer to such a display as a school profile.
Figure 2 provides an example for the United States. It displays the relationship between average school performance in science and average SES for the 166 U.S. schools that participated in PISA 2006. In this case, the dots represent schools rather than students. The size of the dots is proportional to school enrollment. The type of dot indicates whether the school is an urban public school (black dot), a rural public school (gray dot), or a private school (open circle). School profiles are useful in that they indicate the range in school performance at varying levels of SES. The range in science scores, from the lowest to highest performing schools, is about 100 points. This range appears to be fairly consistent at all levels of SES, at least for schools in the middle of the SES range (e.g., from -0.5 to 0.5).
The 25th percentile for SES at the student level in the United States is -0.43, which could be informally considered the poverty line. In Figure 2, there are 18 schools that had average SES scores below this threshold. These represent about 1 in 8 schools. Such low SES schools served approximately one quarter (25.3%) of all 15-year-old students in the United States. We will see that these students are especially vulnerable in that they tend to have a low SES, but there also is a negative school composition effect associated with attending a low SES school.
Figure 2. School profile for the United States
SCHOOL COMPOSITION AND SOCIOECONOMIC GRADIENTS
The term composition effect refers to the effect on students’ outcomes associated with the aggregate characteristics of the school, such as the average SES of the school, the percent of students who are either male or female, and the school’s ethnic composition. There are two important relationships that link the effects of school composition to the socioeconomic gradient. The overall socioeconomic gradient comprises separate gradients for each school, and a gradient associated with the relationship between mean school performance and the mean SES of the school. Figure 3 shows the average within-school gradient for the United States, characterized with the slopes for two hypothetical schools, School A and School B, and the between-school gradient. School A is a low SES school, with an average SES of -0.5. The expected score of an average SES student (that is, a student with an SES score of 0) attending School A is 445. School B is a high SES school, with an average SES of 0.5. The expected score of an average SES student attending School B is 503. The difference between the scores of these two hypothetical students, both with average SES, is 58 points. This is the estimate of the school composition effect of mean SES for the United States.
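The School A and School B comparison can be reproduced with a simple linear sketch in which a student's own SES acts through the within-school slope and the school's mean SES adds the composition effect. This is an illustration under assumptions rather than the fitted model: the function expected_score is hypothetical, the intercept of 474 is derived here as the midpoint of the two reported scores (it is not a reported estimate), and the within-school slope of 30 is a placeholder that drops out because both students have an SES of 0.

```python
def expected_score(student_ses, school_mean_ses, intercept, within_slope, composition_effect):
    """Expected score for a student, given own SES and the mean SES of the school."""
    return intercept + within_slope * student_ses + composition_effect * school_mean_ses

COMPOSITION_EFFECT = 58.0  # reported in the text for the United States
INTERCEPT = 474.0          # midpoint of 445 and 503; derived for this sketch only
WITHIN_SLOPE = 30.0        # hypothetical placeholder; has no effect when student SES is 0

# Two students with average SES (0) in a low SES and a high SES school.
school_a = expected_score(0.0, -0.5, INTERCEPT, WITHIN_SLOPE, COMPOSITION_EFFECT)
school_b = expected_score(0.0, 0.5, INTERCEPT, WITHIN_SLOPE, COMPOSITION_EFFECT)

print(school_a, school_b, school_b - school_a)
```

The two schools differ by one full unit of mean SES, so the 58-point gap between the students is the composition effect itself.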
Figure 3. Within- and between-school gradients for the United States
The within-school gradients, the average within-school gradients, and the between-school gradient can be estimated in a multilevel analysis.2 The estimates of these are shown for all countries in columns 8–10 in the appendix. The composition effect of mean SES of the school is the difference between the slope of the between-school gradient and the average slope of the within-school gradients3:
composition effect = βb − βw
There is another important relationship linking the effects of school composition to the socioeconomic gradient. The overall gradient, which was observed in Figure 1, is a function of the between-school gradient, the average within-school gradient, and η², which is the proportion of variation in SES that is between schools (Alwin, 1976):
βt = η²βb + (1 − η²)βw
where βt is the overall gradient, βb is the between-school gradient, and βw is the average within-school gradient. The statistic, η², is a measure of SES segregation, which theoretically can range from 0 for a completely desegregated system in which the distribution of SES is the same in every school, to 1.0 (or 100%) for a system in which students within schools have the same SES score, but the schools vary in their average SES.
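The relationship among the three gradients can be checked numerically. With ordinary least squares slopes, the overall slope equals the between-school slope weighted by the between-school share of SES variation plus the pooled within-school slope weighted by the remainder. The sketch below verifies this on synthetic data. It is an illustration under assumptions: the helper names are hypothetical, the data are simulated, and the multilevel estimates reported in the appendix need not satisfy the identity exactly because they are not simple OLS slopes.

```python
import numpy as np

def ols_slope(x, y):
    """Slope of the simple least-squares regression of y on x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc = x - x.mean()
    return float(np.sum(xc * (y - y.mean())) / np.sum(xc ** 2))

def decompose_gradient(ses, score, school):
    """Return (overall slope, between slope, pooled within slope, eta squared)."""
    ses = np.asarray(ses, dtype=float)
    score = np.asarray(score, dtype=float)
    _, idx = np.unique(np.asarray(school), return_inverse=True)
    counts = np.bincount(idx)
    # School means, expanded back to one value per student.
    mean_ses = (np.bincount(idx, weights=ses) / counts)[idx]
    mean_score = (np.bincount(idx, weights=score) / counts)[idx]
    # Share of SES variation that lies between schools.
    eta_sq = float(np.sum((mean_ses - ses.mean()) ** 2) / np.sum((ses - ses.mean()) ** 2))
    beta_t = ols_slope(ses, score)
    beta_b = ols_slope(mean_ses, mean_score)                 # student-weighted
    beta_w = ols_slope(ses - mean_ses, score - mean_score)   # pooled within schools
    return beta_t, beta_b, beta_w, eta_sq

# Synthetic system: 40 schools of 30 students; all parameters are made up.
rng = np.random.default_rng(1)
school = np.repeat(np.arange(40), 30)
school_ses = rng.normal(0.0, 0.5, size=40)
ses = school_ses[school] + rng.normal(0.0, 0.85, size=school.size)
score = 480.0 + 30.0 * ses + 58.0 * school_ses[school] + rng.normal(0.0, 80.0, size=school.size)

beta_t, beta_b, beta_w, eta_sq = decompose_gradient(ses, score, school)
print(beta_t, eta_sq * beta_b + (1.0 - eta_sq) * beta_w)
```

The two printed values agree up to floating-point error, which is why a steep between-school gradient matters more for the overall gradient in systems where SES segregation is high.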
HORIZONTAL AND VERTICAL SEGREGATION
I refer to the segregation associated with SES as horizontal segregation. All school systems have some degree of horizontal segregation stemming from residential segregation, and therefore, horizontal segregation tends to be most pronounced in large cities. Private schools usually contribute to SES segregation because wealthier families are more able to afford private school tuition. Special programs within the public sector can also contribute to SES segregation if they have greater appeal to high SES families. The French immersion program in New Brunswick, Canada, is a good example; it contributes to SES segregation because high SES families are more likely to choose the language program than are low SES families (Willms, 2008). Charter schools can potentially have the same effect even if they do not formally select students on the basis of ability or academic achievement. Levels of horizontal segregation for the countries in PISA 2006 range from 10% (Finland) to 50% (Bulgaria and Chile). The estimate of horizontal segregation for the United States is 23%, which is very close to the average for OECD countries, 24%.
I use the term vertical segregation to refer to the extent that students with differing levels of academic performance are segregated among schools. Recall that horizontal segregation is the proportion of variation in SES that is between schools. Similarly, vertical segregation is the proportion of variation in academic performance that is between schools. School systems that select students on the basis of ability or academic achievement tend to have greater between-school segregation. This need not be the case, but in practice, there is a clear relationship. Levels of vertical segregation range from 6% (Finland) to more than 60% (Hungary and The Netherlands). The estimate of vertical segregation for the United States is 23%, which is well below the average for OECD countries, 33%.
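Both indices are proportions of variation that lie between schools, so the same calculation serves for each: apply it to SES for horizontal segregation and to performance for vertical segregation. The sketch below uses a plain sum-of-squares decomposition on a toy data set of four students in two schools. The function name between_school_share is hypothetical, and the estimates in the appendix may instead come from multilevel variance components, so this is an approximation of the idea rather than the exact procedure.

```python
import numpy as np

def between_school_share(values, school):
    """Proportion of the total sum of squares that lies between schools."""
    values = np.asarray(values, dtype=float)
    _, idx = np.unique(np.asarray(school), return_inverse=True)
    school_means = np.bincount(idx, weights=values) / np.bincount(idx)
    ss_between = np.sum((school_means[idx] - values.mean()) ** 2)
    ss_total = np.sum((values - values.mean()) ** 2)
    return float(ss_between / ss_total)

# Toy data: two schools with two students each (made-up values).
school = ["A", "A", "B", "B"]
ses_mixed = [-1.0, 1.0, -1.0, 1.0]      # same SES distribution in both schools
ses_sorted = [-1.0, -1.0, 1.0, 1.0]     # schools fully sorted by SES
scores = [400.0, 500.0, 450.0, 550.0]

horizontal_mixed = between_school_share(ses_mixed, school)
horizontal_sorted = between_school_share(ses_sorted, school)
vertical = between_school_share(scores, school)
print(horizontal_mixed, horizontal_sorted, vertical)
```

The mixed and sorted SES cases mark the two theoretical extremes of the index, and the score example shows an intermediate value.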
The estimates of the composition effect and horizontal and vertical segregation for all countries are shown in the last two columns of the appendix. The correlations among the descriptive statistics, the gradient specifications, the gradient components, and the indices of segregation are shown in Table 1. School systems that are horizontally segregated tend to be also vertically segregated; the correlation is 0.64. There are some school systems with relatively low levels of horizontal segregation but high levels of vertical segregation, but nearly all school systems with high horizontal segregation tend to also have high vertical segregation. This highlights the important role of the school composition effect, which is highly correlated with the index of vertical segregation (r = 0.69).
Figure 4 shows the school profiles for four countries that illustrate the large differences among school systems on the two segregation indices. France has relatively high levels of both horizontal and vertical segregation, 34% and 54%, respectively. Spain has relatively low vertical segregation (15%) given its level of horizontal segregation, which at 24% is close to the OECD average. As noted, there are few school systems with high SES segregation that do not also have high vertical segregation. Compared with Spain, Japan has a comparable level of horizontal segregation (27%) but a much higher level of vertical segregation, 47%. Finland has the lowest levels of both horizontal and vertical segregation.
Figure 4. School profiles that illustrate vertical and horizontal inclusion.
Does segregation matter? Figure 5 shows the relationship between mean levels of science performance, and horizontal and vertical segregation. School systems with higher levels of horizontal segregation have lower mean scores on average; the correlation is -0.42. This relationship also holds for SES-adjusted means at the country level; the correlation is -0.30. The relationship of country mean performance and vertical segregation is also negative; the correlation is -0.27. This is of moderate size; the graph shows that the level of performance among countries with vertical segregation below 20% is uniformly average to high, whereas for highly segregated school systems, it is more variable.
Figure 5. Mean achievement and horizontal and vertical segregation
The comparison between France and Spain, shown in Figure 4, is instructive. The two countries have comparable average levels of performance, 495 and 489, respectively. However, France has a steeper overall socioeconomic gradient, 54.1, than Spain, 32.3. In France, the within-school gradient is relatively gradual (19.9), whereas the between-school gradient is very steep (116.0). The composition effect is also very strong: 95.5. This calls for reforms aimed at improving its low SES and low-performing schools. In Spain, the within-school gradient is steeper than in France (25.4 compared with 19.9), whereas the between-school gradient is much more gradual (48.3 compared with 116.0), and the composition effect is weaker as well (23 compared with 95.5). This calls for reforms aimed at improving performance within every school.
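The figures quoted for France and Spain can be set against the decomposition of the overall gradient into between- and within-school parts, weighted by the level of horizontal segregation (34% and 24%, respectively). The sketch below does that arithmetic; the function name implied_overall_gradient is hypothetical, and all inputs are the rounded values given in the text. The implied overall gradients, about 52.6 for France and 30.9 for Spain, sit close to but not exactly on the reported 54.1 and 32.3, which is to be expected when rounded multilevel estimates are combined in this way.

```python
def implied_overall_gradient(eta_sq, beta_between, beta_within):
    """Overall slope implied by the between/within decomposition."""
    return eta_sq * beta_between + (1.0 - eta_sq) * beta_within

# Rounded values quoted in the text for the two countries.
reported = {
    "France": {"eta_sq": 0.34, "between": 116.0, "within": 19.9,
               "overall": 54.1, "composition": 95.5},
    "Spain": {"eta_sq": 0.24, "between": 48.3, "within": 25.4,
              "overall": 32.3, "composition": 23.0},
}

implied = {}
slope_gap = {}
for country, r in reported.items():
    implied[country] = implied_overall_gradient(r["eta_sq"], r["between"], r["within"])
    slope_gap[country] = r["between"] - r["within"]
    print(f"{country}: implied overall {implied[country]:.1f} "
          f"(reported {r['overall']}); between minus within {slope_gap[country]:.1f} "
          f"(reported composition effect {r['composition']})")
```

In both countries the gap between the two slopes is within about one point of the reported composition effect, and the contrast between the countries is driven almost entirely by the between-school term.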
SCHOOL CONTEXTUAL FACTORS AND SOCIOECONOMIC GRADIENTS
Measures of school composition, especially school mean SES, are more than just proxies for peer effects. Some of their observed effects are likely due to their correlations with various aspects of classroom and school context, such as the quality of instruction, the amount of time devoted to instruction, student engagement, and school resources (Slavin, 1994). This section of the article provides estimates of the correlation between school mean SES and six aspects of classroom and school context. It also presents results from four separate multilevel regression models, based on a three-level hierarchical linear model, with students nested within schools and schools nested within countries. The aim of the analysis is to discern the extent to which school composition, classroom and school context, and the two sets of factors jointly account for variation in student achievement at each level.
Table 2 shows the relationships between science performance and school mean SES, as well as six school-level contextual factors:
Quality of instruction is based on two questions asked of students about whether their teachers and the subjects they study at school equip them with the skills they need for a science-related career.
Instruction is relevant is a measure derived from 14 questions about whether students feel that science topics are important to society, applicable in their daily life, and relevant to their future.
Instruction is interesting is based on 12 questions pertaining to students' interest in learning science.
Curriculum coverage is based on students' reports of whether they learned specific science topics, such as photosynthesis or nuclear energy, at school.
Science time is based on reports from students on the amount of time per week they spent in regular science lessons in their school. Each point on the 10-point scale represents 40 minutes of class time per week.
School resources is an indication of whether school administrators feel they have adequate material and human resources in the school. It is based on their reports on 13 questions about their school.
All these contextual factors were scaled on a 10-point scale, with higher scores indicating a positive response.
The first column shows the average within-country correlation between school mean SES and the six contextual factors. This is a measure of equity because it indicates whether children in low SES schools have the same access to these school processes as their counterparts in high SES schools.
The results indicate that there is an inequitable distribution of these processes across schools. For example, on average across countries, quality of instruction has a correlation of 0.07 with school mean SES. Thus, students in low SES schools tended to give lower ratings on this measure. The factor with the highest correlation was science time; students in low SES schools had considerably less time spent on science instruction than those in high SES schools.
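The equity measure described above can be sketched in two steps: correlate school mean SES with a contextual factor separately within each country, then average the correlations across countries. The data, country labels, and function names below are hypothetical, and the sketch is unweighted, whereas an analysis of PISA data would normally apply the survey weights.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical school-level records: (country, school mean SES, factor score).
schools = [
    ("A", -1.0, 4.0), ("A", 0.0, 5.0), ("A", 1.0, 6.5), ("A", 0.5, 5.0),
    ("B", -0.5, 6.0), ("B", 0.0, 5.5), ("B", 0.8, 6.2), ("B", 1.2, 6.4),
]


def mean_within_country_correlation(records):
    """Correlate SES with the factor inside each country, then average."""
    by_country = {}
    for country, ses, factor in records:
        xs, ys = by_country.setdefault(country, ([], []))
        xs.append(ses)
        ys.append(factor)
    rs = [pearson(xs, ys) for xs, ys in by_country.values()]
    return sum(rs) / len(rs)


avg_r = mean_within_country_correlation(schools)
print(round(avg_r, 2))
```

A value near zero would indicate that the factor is distributed evenly across low and high SES schools; a positive value indicates that high SES schools have more of it.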
The first model of the multilevel analysis is a null model. It simply partitions the variance in science scores into variance at the pupil, school, and country levels. The figures at the bottom of Table 2 express these components as standard deviations rather than variances because standard deviations are more easily interpreted. The results indicate that on average, within schools, the standard deviation is about 74 points. However, schools vary in their average levels of science scores; on average within countries, the standard deviation is 56.8 points. If we consider that there is a range of about plus or minus two standard deviations, these results suggest that, within a typical country, average scores differ by about 200 points between the lowest and highest performing schools. Countries also vary substantially in their average levels of science performance; the standard deviation is 54.8 points.
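Because the null model reports standard deviations, squaring them recovers the variance components, which can then be expressed as shares of the total. The short calculation below does this with the three values quoted above and also works out the plus-or-minus two standard deviation range for school means; the variable names are illustrative.

```python
# Standard deviations reported for the null model (score points).
sd = {
    "students within schools": 74.0,
    "schools within countries": 56.8,
    "countries": 54.8,
}

# Variance components are the squared standard deviations.
variances = {level: s ** 2 for level, s in sd.items()}
total = sum(variances.values())
shares = {level: v / total for level, v in variances.items()}

for level, share in shares.items():
    print(f"{level}: {share:.1%} of total variance")

# Approximate spread of school means inside a typical country (+/- 2 SD).
school_range = 4 * sd["schools within countries"]
print(f"range of school means: about {school_range:.0f} points")
```

On these figures a little under half of the total variance lies among students within schools, with the remainder split almost evenly between schools and countries. The range of school means works out to roughly 227 points, in line with the rounded figure of about 200 given in the text.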
The second model is a standard composition effects model; it includes just SES and school mean SES. The average within-school gradient across all participating countries is 17.3. The subscript s indicates that the gradients vary significantly among schools within countries (although not necessarily within all countries), and the subscript c indicates that the gradient varies among countries. The average effect across countries of school composition is 61.5. It varies significantly among countries, indicated by the subscript c. Although this was obvious in the previous discussion, the HLM provides a formal test of this hypothesis. The figures in the bottom three rows of Table 2 indicate that SES and school mean SES account for about 4.4% of the variance among students within schools, 58.8% of the variance among schools within countries, and 12.7% of the variance among countries. These results emphasize the important role that SES plays in determining student science scores.
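The logic of the composition effects model can be illustrated with a simplified, single-level analogue. The sketch below uses ordinary least squares on simulated data rather than the three-level HLM estimated in the article, and every name in it is an assumption made for illustration. It centers student SES on the school mean, estimates the within-school and between-school gradients, and takes their difference as the composition (contextual) effect.

```python
import random


def slope(x, y):
    """Ordinary least-squares slope of y on a single predictor x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def composition_effects(school_ids, ses, score):
    """Return (within gradient, between gradient, contextual effect)."""
    sums, counts = {}, {}
    for j, x in zip(school_ids, ses):
        sums[j] = sums.get(j, 0.0) + x
        counts[j] = counts.get(j, 0) + 1
    school_mean = [sums[j] / counts[j] for j in school_ids]
    centered = [x - m for x, m in zip(ses, school_mean)]
    # Centered SES and school mean SES are uncorrelated in-sample, so two
    # simple regressions reproduce the two-predictor OLS coefficients.
    within = slope(centered, score)
    between = slope(school_mean, score)
    return within, between, between - within


# Simulated data; the generating values 17.3 and 61.5 are borrowed from the
# text purely for illustration.
rng = random.Random(0)
ids, ses, score = [], [], []
for j in range(50):
    school_level = rng.gauss(0.0, 0.5)
    xs = [school_level + rng.gauss(0.0, 1.0) for _ in range(30)]
    xbar = sum(xs) / len(xs)
    for x in xs:
        ids.append(j)
        ses.append(x)
        score.append(500 + 17.3 * x + 61.5 * xbar + rng.gauss(0.0, 10.0))

within, between, contextual = composition_effects(ids, ses, score)
print(round(within, 1), round(between, 1), round(contextual, 1))
```

A multilevel model additionally allows the gradients to vary among schools and countries, which is what the subscripts in Table 2 indicate; this sketch estimates only the average effects.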
The third model includes the six contextual factors. All of them are statistically significant, and their effects vary significantly among countries. The two most important factors are curriculum coverage and science time. For each 1-point increase on the 10-point curriculum coverage scale, science scores increased by about 15 points. Each 1-point increase on the science time scale, which corresponds to an increase of 40 minutes of class time per week, is associated with an increase in science scores of about 17 points. The measure of quality of instruction yielded a counterintuitive result: It indicates a negative effect of about 10 points. This variable yields positive effects when entered in the model on its own, but its effect is mediated by the other factors in the model. The effects of instructional relevance and student interest are both positive, at about 6 and 9 points, respectively. School resources had a relatively small effect; each 1-point increase on the 10-point scale was associated with an increase of about 2 points.
The school context model accounted for about 5% of the variance at the student level, 66% of the variance at the school level, and 12% of the variance among countries. These figures are comparable with those of the school composition model.
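To see how the coefficients of the school context model combine, they can be read as a linear prediction of the score difference between two otherwise similar schools. The sketch below uses the approximate coefficients quoted above; the key names and the comparison itself are hypothetical, and the result describes an association in cross-sectional data, not a causal effect.

```python
# Approximate coefficients reported for the school context model
# (score points per 1-point change on each 10-point scale).
CONTEXT_COEFFICIENTS = {
    "quality_of_instruction": -10.0,
    "instruction_is_relevant": 6.0,
    "instruction_is_interesting": 9.0,
    "curriculum_coverage": 15.0,
    "science_time": 17.0,
    "school_resources": 2.0,
}

MINUTES_PER_SCALE_POINT = 40  # science time: one scale point = 40 min/week


def predicted_change(deltas):
    """Model-implied score difference for given changes in factor scores."""
    return sum(CONTEXT_COEFFICIENTS[name] * d for name, d in deltas.items())


# Two schools that differ by 80 minutes of weekly science time and one
# point of curriculum coverage, all else equal (a hypothetical comparison).
extra_minutes = 80
delta = {
    "science_time": extra_minutes / MINUTES_PER_SCALE_POINT,
    "curriculum_coverage": 1.0,
}
print(predicted_change(delta))  # 2 * 17 + 1 * 15 = 49.0
```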
The last model in Table 2 includes school composition and the contextual variables. The first and perhaps most important point is that about 40% of the school composition effect is accounted for by context; the effect decreased from 61.5 to 37.1. In addition, the effects of the context variables decreased, except for that of school resources. The effect of instructional relevance was no longer statistically significant, although it did vary among countries. The variables in the full model accounted for about three quarters of the variation among schools.
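The share of the composition effect absorbed by the contextual variables follows directly from the two coefficients. A minimal calculation, with an illustrative function name:

```python
def share_accounted_for(before: float, after: float) -> float:
    """Fraction of an effect that disappears once other terms are added."""
    return (before - after) / before


composition_only = 61.5  # school mean SES effect in the composition model
with_context = 37.1      # same effect after adding the six context factors
share = share_accounted_for(composition_only, with_context)
print(f"{share:.1%}")
```

The reduction of 24.4 points amounts to just under 40% of the original effect.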
The models presented in Table 2 were not intended to be fully specified models; rather, they were estimated to demonstrate the relationships between student performance and classroom and school contextual factors. Normally one would also include measures of family structure, immigrant status, and student gender in the student-level model. Also, the PISA 2006 data are not well-suited for testing hypotheses about the effects of school and classroom context. One problem is that the data on schooling processes are collected at the school level, and, as noted in the introduction to this article, there is considerable evidence that it is classroom processes, especially quality of instruction, that matter most. A more important limitation, though, is that students' PISA scores represent the cumulative effects of their family, community, and school experiences on their literacy skills since birth (or arguably even before). Much of the variation in student performance could probably be explained by measures of performance before students entered secondary school (e.g., Willms & Kerckhoff, 1995). Thus, we cannot expect that student accounts of their current classroom or school climate can give us much purchase on their school performance.
SUMMARY AND IMPLICATIONS FOR RESEARCH AND EDUCATIONAL POLICY
The central question facing most educational administrators and policy makers is, How can we raise and level the learning bar? This study used data from the 2006 PISA to examine the relationship between students' science performance and SES at the student and school levels within each participating country, with attention to the role of SES composition. The socioeconomic gradient, or learning bar, and school profiles are useful devices for educational policy because they shift attention away from the rank ordering of schools and countries based on mean scores and toward issues concerning the distribution of educational outcomes, equality associated with SES, and the equitable distribution of educational resources. They can also be used to assess changes in educational performance over time, set standards, and provide direction about the kinds of interventions that might best raise and level the learning bar.
It is not surprising that in every country, there is a significant relationship between science performance and SES. However, this relationship varies considerably among countries, as does the overall level of student performance, even when SES is taken into account. In many school systems, the inequality in schooling outcomes associated with SES is entrenched in the mechanisms through which students are allocated to schools, including residential segregation, private schooling, special programs in the public sector, and selective tracks that channel students into different schools based on their prior achievement or ability. It seems that each country establishes its own tolerable equilibrium for social class inequalities derived from its social, historical, political, and economic context. The central thesis of this article is that an understanding of the nature of socioeconomic gradients and the role of school composition is central to discussions about educational quality, equality, and equity.
In considering the role of school composition, I distinguished between two types of segregation: horizontal segregation, based on the unequal distribution among schools of students with differing SES, and vertical segregation, based on the unequal distribution of students based on their general ability or academic achievement. The two types of segregation are correlated, but some countries have relatively high or low levels of one type of segregation. The key finding of this study is that the most successful countries have low levels of horizontal segregation; the correlation between mean levels of student performance and the level of horizontal segregation is -0.42. The finding regarding vertical segregation is not unequivocal; the correlation between mean levels of student performance and the level of vertical segregation is -0.27, and several countries with high levels of vertical segregation also have relatively high levels of student performance. I would argue, however, that low levels of vertical segregation are also desirable in that most students benefit from learning in a mixed-ability setting.
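This section does not spell out how the segregation indices are computed. One common way to quantify sorting among schools, offered here as an assumption and not as the article's definition, is the share of total variance in a student characteristic that lies between schools. Applied to SES, it gives a rough analogue of horizontal segregation; applied to achievement, a rough analogue of vertical segregation. The function name and data below are hypothetical.

```python
def between_school_share(values_by_school):
    """Share of total variance that lies between schools (0 to 1).

    Uses the sum-of-squares decomposition: total = between + within.
    """
    all_values = [v for vals in values_by_school.values() for v in vals]
    grand_mean = sum(all_values) / len(all_values)
    ss_total = sum((v - grand_mean) ** 2 for v in all_values)
    ss_between = sum(
        len(vals) * (sum(vals) / len(vals) - grand_mean) ** 2
        for vals in values_by_school.values()
    )
    return ss_between / ss_total


# Hypothetical SES values for students in three schools.
mixed = {
    "s1": [-1.0, 0.0, 1.0],
    "s2": [-1.0, 0.0, 1.0],
    "s3": [-1.0, 0.0, 1.0],
}
sorted_by_ses = {
    "s1": [-1.0, -1.0, -1.0],
    "s2": [0.0, 0.0, 0.0],
    "s3": [1.0, 1.0, 1.0],
}

print(between_school_share(mixed))          # 0.0: every school looks alike
print(between_school_share(sorted_by_ses))  # 1.0: schools fully sorted by SES
```

Real school systems fall between these two extremes, and an index of this kind is naturally reported as a percentage.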
In other work, I considered five types of interventions for raising and leveling the learning bar (Willms, 2006). These can be considered in light of the notion of horizontally and vertically stratified systems:
Performance-targeted interventions aim to improve the levels of schooling outcomes of students who have low performance in a particular domain. They can entail the provision of a modified curriculum or additional instructional resources. A reading recovery or response-to-intervention (RTI) program is a good example. In school systems with low levels of segregation, such as Finland, most of the variation in student performance is within schools. Therefore, performance-targeted interventions would best be implemented in all schools in the system. In a system with a high level of either horizontal or vertical segregation, or both, the efforts would be best directed at raising levels of performance in low-performing schools. In the United States, for example, the school profiles revealed that there are several schools with average levels of performance at Level 2 or lower. Most of the schools are large schools or rural schools that serve students who are predominantly from low SES backgrounds.
SES-targeted interventions can entail the same kinds of interventions as performance-targeted interventions, except that they are directed toward children from low SES families. For example, recent studies have indicated that the gap in reading achievement associated with SES widens during the summer months (Alexander, Entwisle, & Olson, 2007; Burkam, Ready, Lee, & LoGerfo, 2004; Downey, von Hippel, & Broh, 2004), and thus there is great potential in closing the gap with summer learning programs targeted toward low SES youth. An SES-targeted intervention is most appropriate when there is a steep socioeconomic gradient. In a school system with low horizontal segregation, this might be best accomplished with interventions aimed at all schools, whereas in a system with high horizontal segregation, it could be implemented first for children in low SES schools.
Compensatory interventions provide additional economic resources to students from low SES families. These interventions differ from SES-targeted interventions in that they strive to ameliorate the effects of poverty in a more general way than providing instruction aimed at improving results for a particular outcome such as reading. A free breakfast or lunch program for children from low SES families is an example of a compensatory intervention. This type of intervention is most appropriate in school systems with high levels of poverty. Generally they do not directly raise and level the learning bar, but they can be an important complement to performance- or SES-targeted interventions. They are also important at the school level when there are large inequities in school resources associated with school mean SES.
Universal interventions are targeted at all children in a jurisdiction. They strive to raise performance uniformly. For example, a school district might strive to improve overall reading performance by increasing the amount of time devoted to reading instruction and by reducing class sizes in the primary grades. Universal interventions are most appropriate when the socioeconomic gradient is relatively flat and when there is a low level of vertical segregation.
Inclusive interventions attempt to reduce horizontal segregation with policies that redistribute low SES students into mainstream schools. This kind of intervention is appropriate in school systems that have a high level of horizontal segregation. For example, a school district might try to reduce between-school SES segregation by redrawing school catchment boundaries, amalgamating schools, or creating magnet schools in low SES areas. Inclusive interventions are difficult to achieve practically when SES segregation is entrenched geographically between urban and rural schools. They can also be difficult to achieve politically because there can be great resistance from middle-class parents who feel they benefit from a segregated system.
There is no one best type of intervention for raising and leveling the learning bar. The important point for educators and policy makers is that the effect of school composition is not particularly important by itself; rather, the focus should be on its role in determining the magnitude of the within- and between-school gradients, and the extent of horizontal and vertical segregation.
Increasing educational performance and reducing inequalities among students from differing socioeconomic backgrounds can be achieved in a number of ways. The approach that may work best depends on social and political issues; it also depends on the distribution of student performance and SES within and among schools, and how these factors are related to and interact with curricular offerings, teaching practice, and school context. The results presented in the hierarchical analysis cannot be interpreted as causal effects. The PISA data are cross-sectional and therefore should be considered a descriptive account of the relationship among factors.
Finally, these results have implications for how administrators report results based on district and state monitoring systems, especially given the recent push for assessment based on value-added models. There are two important points. First, although the importance of controlling for students' family background and prior achievement is recognized, these results stress the importance of also controlling for school composition. If we refer back to Figure 3, Schools A and B have mean science performance scores that are on the between-school regression line, and as such, they are doing neither poorly nor well compared with other schools with the same SES composition. However, an average SES child has a much higher score in School B than in School A. A measure of value-added that takes account of school composition is a fairer measure of the value-added associated with school policy and classroom practice; however, the danger is that in controlling for the school composition effect, one is also removing any contextual effects, including good instructional practices, that are correlated with SES composition (Raudenbush & Willms, 1995; Willms & Raudenbush, 1989). This is a limitation of value-added models that cannot be easily overcome.
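The idea of adjusting for school composition can be sketched as a residual from the between-school regression line: each school's mean score is compared with the mean expected for schools of the same SES composition. The school profile below is hypothetical and is constructed so that Schools A and B fall exactly on the line, as in the example above; the function and variable names are assumptions.

```python
def fit_line(x, y):
    """Least-squares intercept and slope of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum(
        (a - mx) ** 2 for a in x
    )
    return my - slope * mx, slope


# Hypothetical school profile: (school, mean SES, mean science score).
profile = [
    ("A", -1.0, 440.0),
    ("B", 1.0, 560.0),
    ("C", 0.0, 520.0),
    ("D", 0.0, 480.0),
]

ses = [s for _, s, _ in profile]
scores = [y for _, _, y in profile]
intercept, slope = fit_line(ses, scores)

# Composition-adjusted residual: observed mean minus the mean expected
# for a school with that SES composition.
adjusted = {name: y - (intercept + slope * s) for name, s, y in profile}
print(adjusted)
```

Schools A and B differ by 120 points in raw means yet both receive an adjusted value of zero. This makes the caveat concrete: any instructional advantage that is correlated with SES composition is removed along with the composition effect.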
School administrators and policy makers are increasingly being pressed to use data to inform their decisions. Although the analyses conducted in this article focus on the variation among countries, they serve as an example of the kinds of analyses that could be informative in the analysis of state and provincial data, and data collected at the district level. The socioeconomic gradient and school profile are a good place to start because they provide a useful portrait of the school system. This can be followed with detailed analyses that include the estimation of horizontal and vertical segregation, the school composition effect, and the gradients for each school. Finally, the analysis can examine the school resources and classroom practices that contribute to better schooling outcomes and discern whether these factors are fairly distributed among schools.
ACKNOWLEDGMENTS
The author is grateful for the support from the Social Sciences and Humanities Research Council for its funding of the research program, Raising and Levelling the Bar in Children's Cognitive, Behavioural and Health Outcomes. He is also appreciative of Lucia Tramonte for comments on an earlier draft of this paper, and especially for her insights concerning vertical segregation in European countries. He also wishes to thank Beth Fairbairn and Hasnain Mirza for assistance in preparing the manuscript.
REFERENCES
Aitkin, M., & Longford, N. (1986). Statistical modelling issues in school effectiveness studies. Journal of the Royal Statistical Society, Series A, 149(1), 1–43.
Alexander, K. L. (1982). Curricula and coursework: A surprise ending to a familiar story. American Sociological Review, 47, 626–640.
Alexander, K. L., & Eckland, B. K. (1975). Contextual effects in the high school attainment process. American Sociological Review, 40, 402–416.
Alexander, K. L., Entwisle, D. R., & Olson, L. S. (2007). Lasting consequences of the summer learning gap. American Sociological Review, 72, 167–180.
Alexander, K. L., Fennessey, J., McDill, E. L., & D'Amico, R. J. (1979). School SES influences – Composition or context? Sociology of Education, 52, 222–237.
Alexander, K. L., Pallas, A. M., & Cook, M. A. (1981). Measure for measure: On the use of endogenous ability data in school process research. American Sociological Review, 46, 619–631.
Alwin, D. F. (1976). Assessing school effects: Some identities. Sociology of Education, 49, 294–303.
Barr, R. D., & Dreeben, R. (1983). How schools work. Chicago: University of Chicago Press.
Brookover, W. B., Schweitzer, J. H., Schneider, J. M., Beady, C. H., Flood, P. K., & Wisenbaker, J. M. (1978). Elementary school social climate and school achievement. American Educational Research Journal, 15, 301–318.
Brophy, J., & Good, T. (1986). Teacher behaviour and student achievement. In M. C. Wittrock (Ed.), Handbook of research on teaching (pp. 328–375). New York: Macmillan.
Bryk, A. S., & Driscoll, M. E. (1988). The high school community: Contextual influences and consequences for students and teachers. Madison: National Center on Effective Secondary Schools, University of Wisconsin.
Bryk, A. S., & Raudenbush, S. W. (1988). Toward a more appropriate conceptualization of research on school effects: A three-level hierarchical linear model. American Journal of Education, 97, 65–108.
Burkam, D. T., Ready, D. D., Lee, V. E., & LoGerfo, L. F. (2004). Social-class differences in summer learning between kindergarten and first grade: Model specification and estimation. Sociology of Education, 77, 1–31.
Coleman, J. S., Campbell, E. Q., Hobson, C. F., McPartland, A. M., Mood, A. M., Weinfeld, F. D., et al. (1966). Equality of educational opportunity. Washington, DC: Department of Health, Education, & Welfare.
Downey, D. B., von Hippel, P. T., & Broh, B. (2004). Are schools the great equalizer? Cognitive inequality during the summer months and the school year. American Sociological Review, 69, 613–635.
Dreeben, R., & Gamoran, A. (1986). Race, instruction, and learning. American Sociological Review, 51, 660–669.
Gage, N. L., & Needels, M. C. (1989). Process-product research on teaching: A review of criticisms. Elementary School Journal, 89, 253–300.
Gamoran, A. (1986). Instructional and institutional effects of ability grouping. Sociology of Education, 59, 185–198.
Gamoran, A. (1987). The stratification of high school learning opportunities. Sociology of Education, 60, 135–155.
Goldstein, H. (1987). Multilevel models in educational and social research. New York: Oxford University Press.
Hill, P., & Rowe, K. (1996). Multilevel modelling in school effectiveness research. School Effectiveness and School Improvement, 7(1), 1–34.
Ho, E., & Willms, J. D. (1996). The effects of parental involvement on eighth grade achievement. Sociology of Education, 69, 126–141.
Lee, V. E., & Bryk, A. S. (1989). A multilevel model of the social distribution of high school achievement. Sociology of Education, 62, 172–192.
Madaus, G. F., Kellaghan, T., Rakow, E. A., & King, D. J. (1979). The sensitivity of measures of school effectiveness. Harvard Educational Review, 49, 207–230.
McPherson, A. F., & Willms, J. D. (1987). Beyond an atomistic model of school effects: Scottish findings. International Review of Sociology, 1, 145–184.
Mortimore, P., Sammons, P., Stoll, L., Lewis, D., & Ecob, R. (1988). School matters. Los Angeles: University of California Press.
Mueller, C. W., & Parcel, T. L. (1981). Measures of socioeconomic status: Alternatives and recommendations. Child Development, 52, 13–30.
Murnane, R. J. (1981). Interpreting the evidence on school effectiveness. Teachers College Record, 83, 19–35.
Nechyba, T., McEwan, P., & Older-Aguilar, D. (2004). The impact of family and community resources on student outcomes: An assessment of the international literature with implications for New Zealand. Thorndon: New Zealand Ministry of Education.
Organisation for Economic Co-operation and Development. (2006). Assessing scientific, reading and mathematical literacy: A framework for PISA 2006. Paris: Author.
Pallas, A. (1988). School climate in American high schools. Teachers College Record, 89, 541–553.
Plewis, I. (1991). Using multilevel models to link educational progress with curriculum coverage. In S. W. Raudenbush & J. D. Willms (Eds.), Schools, classrooms, and pupils: International studies of schooling from a multilevel perspective (pp. 149–166). San Diego, CA: Academic Press.
Raudenbush, S. W., & Bryk, A. S. (1986). A hierarchical model for studying school effects. Sociology of Education, 59, 1–17.
Raudenbush, S. W., & Willms, J. D. (1995). The estimation of school effects. Journal of Educational and Behavioral Statistics, 20, 307–335.
Robertson, D., & Symons, J. (1996). Do peer groups matter? Peer group versus schooling effects on academic attainment. London: London School of Economics, Centre for Economic Performance.
Rutter, M. (1983). School effects on pupil progress: Research findings and policy implications. Child Development, 54, 1–29.
Sammons, P., Hillman, J., & Mortimore, P. (1995). Key characteristics of effective schools: A review of school effectiveness research. London: London University Institute of Education, for OFSTED.
Scheerens, J. (1992). Effective schooling: Research, theory, and practice. London: Cassell.
Scheerens, J., Vermeulen, C., & Pelgrum, W. J. (1989). Generalizability of instructional and school effectiveness indicators across nations. International Journal of Educational Research, 13, 789–799.
Slavin, R. E. (1990). Achievement effects of ability grouping in secondary schools: A best evidence synthesis. Review of Educational Research, 60, 471–499.
Slavin, R. E. (1994). Quality, appropriateness, incentive, and time: A model of instructional effectiveness. International Journal of Educational Research, 21, 141–158.
Willms, J. D. (1986). Social class segregation and its relationship to pupils' examination results in Scotland. American Sociological Review, 51, 224–241.
Willms, J. D. (1992). Monitoring school performance: A non-technical guide for educational administrators. Lewes, England: Falmer Press.
Willms, J. D. (2000). Monitoring school performance for standards-based reform. Evaluation and Research in Education, 14, 237–253.
Willms, J. D. (2006). Learning divides: Ten policy questions about the performance and equity of schools and schooling systems. Montreal, Quebec, Canada: UNESCO Institute for Statistics.
Willms, J. D. (2008, July–August). The case for universal French immersion. Policy Options, 91–96.
Willms, J. D., & Kerckhoff, A. C. (1995). The challenge of developing new social indicators. Educational Evaluation and Policy Analysis, 17, 113–131.
Willms, J. D., & Raudenbush, S. W. (1989). A longitudinal hierarchical linear model for estimating school effects and their stability. Journal of Educational Measurement, 26, 209–232.
Willms, J. D., & Somers, M.-A. (2001). Family, classroom and school effects on children's educational outcomes in Latin America. International Journal of School Effectiveness and Improvement, 12, 409–445.
Zimmer, R. W., & Toma, E. F. (1997). Peer effects in private and public schools: A cross country empirical analysis. Lexington: University of Kentucky.