Improving High-Stakes Decisions via Formative Assessment, Professional Development, and Comprehensive Educator Evaluation: The School System Improvement Project

by Todd A. Glover, Linda A. Reddy, Ryan J. Kettler, Alexander Kunz & Adam J. Lekwa - 2016

The accountability movement and high-stakes testing fail to attend to ongoing instructional improvements based on the regular assessment of student skills and teacher practices. Summative achievement data used for high-stakes accountability decisions are collected too late in the school year to inform instruction. This is especially problematic for students who require early intervention to remediate skill-specific difficulties, such as those identified for special education. The purpose of this article is to describe the School System Improvement Project's hybrid approach to utilizing both formative and summative assessments to (a) inform decisions about effective instruction based on all students' and teachers' needs, and (b) guide high-stakes decisions about teacher effectiveness. Five key components of the SSI Project are outlined, including: (a) the use of formative assessments; (b) data collection from multiple teacher and student measures; (c) an emphasis on instruction and service delivery for all students, including those with disabilities, from minority groups, or from marginalized populations, based on a continuum of need; (d) ongoing teacher and administrator support via a systematic problem-solving process and coaching; and (e) consideration of student growth or progress. The practical implications of this approach are provided along with recommendations for advancing research and policy.

For over a decade, school reform initiatives in the United States have increased attention to student performance on high-stakes achievement tests as a means of increasing school and teacher accountability (No Child Left Behind Act of 2001). Although such efforts were intended to increase the rigor of educational opportunities for students, they have resulted in some unintended consequences, such as a shift from high-quality instructional experiences to test preparation, a narrowing of the school curriculum, rapid firing and rehiring of teachers and school leaders, and the inappropriate placement of low-achieving students into special education with less rigorous performance expectations (e.g., Heubert, 2002; Jacob, 2002).

One key challenge for the accountability movement is a lack of attention to ongoing instructional improvements based on regular assessment of student skills and teacher practices.  Although a focus on summative assessments, such as end-of-year achievement tests, has value in providing a reliable indication of students’ overall achievement based on well-defined standards, it is not helpful for identifying ongoing or emergent learning needs. For example, although the Partnership for Assessment of Readiness for College and Careers (PARCC) state reading achievement test is useful for determining general reading achievement, it is not designed to be useful for instructional planning. Formative assessments administered during the school year, such as brief reading measures that target discrete reading skills (e.g., phoneme blending or segmenting, reading fluency), are needed to determine which skills to target for individual students.  

Summative achievement data used for high-stakes accountability decisions are collected too late in the school year and tend to be too broad in scope to inform the quality of instruction or to improve student performance. This is especially problematic for students who require early intervention to remediate skill-specific difficulties, such as those identified for special education. By engaging in a process of data-based decision making that uses assessments to identify these students’ needs and measure their progress throughout the school year, teachers are better able to address academic concerns before they become a significant problem. This process is consistent with recent Response to Intervention models of assessment and service delivery that focus on the early detection and remediation of academic and behavior difficulties, as well as individualized or group-based instruction or intervention that is provided and adjusted based on students’ progress (e.g., Glover & Vaughn, 2010; Walker & Shinn, 2002).

We contend that an ideal approach to promoting accountability and student success, especially as it relates to students with disabilities, rests with a hybrid approach that utilizes both formative, skill-specific assessments and summative, end-of-year achievement data. This approach shifts away from an exclusive focus on evaluative statements about performance to the provision of instruction based on data-informed needs. One benefit of this approach is that, by attending to instructional needs early, it maximizes opportunities for all students to improve achievement.

The purpose of this article is to describe such an approach from the School System Improvement Project that utilizes both formative and summative assessments to (a) inform decisions about effective instruction based on all students’ and teachers’ needs, and (b) guide high-stakes decisions about teacher effectiveness. Through the use of regular skill-focused assessments during the school year, this approach maximizes opportunities for teachers to improve classroom practices that promote learning for their students in a more timely and constructive manner.


High-stakes testing for educators and students under the No Child Left Behind (NCLB) Act of 2001 mandated that all students reach proficiency status on statewide testing of educational performance by 2014. While NCLB highlighted some well-known educational challenges (e.g., student achievement gaps, school effectiveness, teacher quality), it fell short of improving many schools and meeting the needs of students with disabilities. A major shortcoming of NCLB and subsequent accountability efforts has been their primary focus on summative, end-of-year student achievement data to evaluate teacher and school effectiveness. Instructional improvements and needed intervention within this model have typically only occurred after “waiting for students to fail” (Brown-Chidsey, 2007).

Recognizing the shortcomings of NCLB, Congress granted states waivers from meeting NCLB requirements in exchange for their agreeing to improve school accountability and teacher effectiveness. As an NCLB waiver state, New Jersey implemented landmark legislation that revamped teacher tenure and evaluation for improving teacher effectiveness and student achievement (New Jersey Educator Effectiveness Task Force, 2011; Excellent Educators for New Jersey [EE4NJ]). Since 2012, it has mandated that all schools evaluate teachers through a prescribed multi-method assessment approach (with weighted measurement components) that includes measures of educator practice (processes) and student growth in achievement (outputs).  While the impact of this legislation remains unclear, high-poverty schools in New Jersey remain plagued with below-benchmark student achievement and high staff turnover rates (i.e., average 25% in high-poverty schools in comparison to the state average of 5%).  

Recognizing the need for new models of school-wide reform for high-poverty schools, in 2012, the U.S. Department of Education’s Teacher Incentive Fund Program provided a grant to the School System Improvement (SSI) Project, a five-year school reform initiative (2012–2017). The SSI Project is a partnership between researchers from Rutgers University and Arizona State University and 15 high-poverty charter schools throughout all regions of New Jersey that support traditionally underserved students from highly diverse backgrounds. In its third year of implementation, the SSI Project comprises 38 school administrators, 465 teachers, and over 7,000 general and special education students in kindergarten through 12th grade in New Jersey. The SSI Project aims to enhance school effectiveness through rigorous educator evaluation and the provision of teacher professional development targeting needs identified through the regular assessment of students and teachers. In addition, the project is designed to improve human capital management and educator evaluation systems that recognize, develop, and reward effective school leaders and teachers.


The SSI model differs from traditional high-stakes models, which focus exclusively on the use of tests to make summative judgments about educational effectiveness. In contrast, assessments in the SSI model are used to guide professional development support for teachers in advance of any evaluative determination. To address the shortcomings of existing “wait-to-fail” high-stakes approaches, the SSI Project uses assessment data from multiple sources, including classroom observations, student growth data, and teacher self-report measures, to guide instructional decisions and practices regularly throughout the year for all students, but especially those with the greatest needs, such as students identified for special education (Reddy, Kettler, & Kurz, 2015). Through a data collection and implementation planning platform, general and special education teachers enter and monitor progress toward individualized student goals. The academic and behavior goals of students with disabilities, such as Individualized Education Program (IEP) objectives, thus factor into teachers’ ongoing data-based decision making and professional development.

Five key components of the SSI Project include (a) the use of formative assessments; (b) data collection from multiple teacher and student measures (e.g., observations of instruction, student performance assessments); (c) an emphasis on instruction and service delivery for all students, including those with disabilities, from minority groups, or from marginalized populations, based on a continuum of need; (d) ongoing teacher and administrator support via a systematic problem-solving process and coaching; and (e) consideration of student growth or progress.

Throughout the academic year, skill-specific student and teacher assessments are used iteratively to guide instructional coaching with teachers to improve classroom practices to meet the academic needs of students. Multiple measures are used to assess the quality of teachers’ instruction and the skills of at-risk students. Through ongoing coaching, teachers are supported in the use of strategies for addressing a continuum of students’ needs. In addition to considering benchmarks (e.g., performance levels) to evaluate teachers, the project also takes into account student growth (i.e., rate of change in performance), an important indicator of progress in response to changes in classroom instruction and overall school effectiveness.

The five key components of the SSI Project are based on best practices from empirical research. A description of each follows.


Use of Formative Assessments

A primary element of the SSI program is the use of formative assessments. Ironically, although both the Elementary and Secondary Education Act (ESEA, 2001) and the Individuals with Disabilities Education Improvement Act (IDEIA, 2004) outlined the use of data-driven instructional decisions for students who may be at risk for failure and/or disability classification, high-stakes decisions are still primarily based on summative end-of-year data collection and therefore fail to identify and meet students’ needs earlier in the year. The use of formative assessments is needed to address this shortcoming.

Assessments are formative “to the extent that evidence about student achievement is elicited, interpreted, and used by teachers, learners, or their peers, to make decisions about the next steps in instruction that are likely to be better, or better founded, than the decisions they would have taken in the absence of the evidence that was elicited” (Black & Wiliam, 1998, p. 9).

There is a strong data-based justification for the use of formative assessments for instructional decision making. Black and Wiliam (1998), in reviewing 250 studies utilizing an intentionally broad range of educational measurements designed to inform instruction, found that formative assessment had a positive impact on students, with average effect sizes ranging from .40 to .70 (large effects for educational practices). Formative use of student data has also been found to be an essential feature for delivering effective interventions for at-risk students and students with disabilities (e.g., Fuchs, Deno, & Mirkin, 1984). Additionally, system-wide use of formative assessment has been beneficial for identifying school- or grade-level needs and tailoring effective interventions to promote school success (e.g., Bollman, Silberglitt, & Gibbons, 2007; VanDerHeyden, Witt, & Gilbertson, 2007). Formative assessment of student reading and mathematics achievement has also been found to be strongly related to state assessments (e.g., Shapiro, Keller, Lutz, Santoro, & Hintze, 2006; Silberglitt, Burns, Madyun, & Lail, 2006; Silberglitt & Hintze, 2005). Thus, extant research supports the utility of formative assessment in monitoring student progress and guiding instructional practices that optimize outcomes throughout the course of the academic year.


Data Collection From Multiple Teacher and Student Measures

The use of multiple methods of assessment for the SSI Project enables teachers to collect data for complementary functions, such as periodic screening to identify students’ and teachers’ performance relative to benchmarks, diagnostic assessment to identify specific skills or behaviors to target for additional support, and progress monitoring measures to assess students’ or teachers’ progress in attaining desired outcomes (e.g., Glover, 2010; Kettler, Glover, Albers, & Feeney-Kettler, 2014). Further, the use of multi-method assessment increases the validity of instructional decisions by triangulating data sources to identify and monitor student needs and teacher instructional goals (e.g., Reddy & Dudek, 2014).

Assessment of Student Performance

The SSI Project uses the following to collect data on student performance: the Measures of Academic Progress (MAP; Northwest Evaluation Association, 2015); general outcome measures and skill assessments unique to each school, such as DIBELS Next (Good et al., 2013), aimsweb (Pearson, 2012), or FastBridge Assessments (Christ & Kember, 2015); and an end-of-year state computer-administered achievement test (Partnership for Assessment of Readiness for College and Careers [PARCC] assessment; Pearson, 2015). MAP benchmark assessments—computer-adaptive measures of K–12 student achievement in reading/language arts and mathematics (and in grades 3–9 science)—are administered two times (fall and spring) or three times (fall, winter, and spring) per year with students school wide to (a) predict student performance on the end-of-year state achievement test, (b) screen for and identify students whose scores indicate that they are performing at a lower level than expected in reading/language arts or mathematics, and (c) monitor students’ academic growth over the course of a single or multiple school years. A broadband measure of academic performance aligned with state and national standards, the MAP is used for early identification of students (in the fall and winter) who may benefit from additional instruction or intervention. In addition to the MAP, some schools in the SSI Project also use screening tools from measures such as DIBELS Next, aimsweb, or FastBridge Assessments to screen for students in the fall, winter, and spring who may be at risk for difficulties with reading and/or mathematics, including students identified for special education.

After identifying students who may need differentiated instruction and intervention, skill-specific indicators are then used to determine students’ specific skill needs to guide instructional grouping. For example, second-grade students identified via the MAP as experiencing difficulties in the area of reading might then be administered a brief skill-specific measure of oral reading fluency (e.g., DIBELS Oral Reading Fluency; Good et al., 2013). If they performed at or above benchmark on this measure, then they might be placed into a comprehension strategy instructional group. If they did not meet benchmark expectations, then they would be administered a phonetic skill assessment, such as a measure of nonsense words fluency (e.g., DIBELS Nonsense Words Fluency; Good et al., 2013), to determine whether they were experiencing difficulties with phonics decoding and would benefit from group phonics instruction (e.g., instruction focusing on letter-sound correspondence, putting together sounds to make words).
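The branching logic in this example can be sketched as a small decision function. This is a minimal illustration only: the benchmark cutoffs, score values, and group labels below are invented placeholders, not actual DIBELS norms or SSI Project values.

```python
# Illustrative sketch of the tiered decision logic described above.
# Cutoffs are hypothetical, NOT published DIBELS benchmarks.
ORF_BENCHMARK = 52   # placeholder oral reading fluency cutoff (words correct/min)
NWF_BENCHMARK = 54   # placeholder nonsense words fluency cutoff

def assign_reading_group(orf_score, nwf_score=None):
    """Route a student flagged by the broadband MAP screen to a next step."""
    if orf_score >= ORF_BENCHMARK:
        # Reads fluently but was flagged by the screener:
        # comprehension-strategy instruction is the likely target.
        return "comprehension strategies group"
    if nwf_score is None:
        # Below the fluency benchmark: the phonetic skill probe comes next.
        return "administer nonsense words fluency assessment"
    if nwf_score < NWF_BENCHMARK:
        # Decoding difficulty: group phonics instruction.
        return "group phonics instruction"
    # Decodes adequately but reads slowly: target fluency practice.
    return "fluency practice group"

print(assign_reading_group(60))        # at/above benchmark on oral reading
print(assign_reading_group(30))        # below benchmark; probe decoding next
print(assign_reading_group(30, 40))    # below benchmark on both measures
```

The point of the sketch is only that each assessment narrows the instructional decision; actual cut scores would come from the publisher's benchmark tables.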

Once appropriate instructional goals are identified for students, progress is individually monitored at frequent intervals using skill-focused progress monitoring assessments to determine whether any changes are needed to instruction or intervention to meet students’ needs. For example, nonsense words fluency assessments might be administered every other week to monitor students’ progress in response to extra, small-group phonics instruction. If students continued to progress, the instruction would continue until they met benchmark expectations; if not, the instruction or intervention would be adjusted to better address their needs.
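A progress-monitoring decision rule of this kind can be expressed compactly. The sketch below assumes a hypothetical goal score and rate-of-improvement threshold; neither is an SSI Project value.

```python
# Hedged sketch of a biweekly progress-monitoring decision rule.
# The goal score and minimum gain per probe are invented for illustration.
def monitoring_decision(scores, goal, min_gain_per_probe=1.5):
    """Decide the next step from a series of biweekly probe scores."""
    if scores[-1] >= goal:
        return "exit intervention: benchmark met"
    if len(scores) < 2:
        return "continue: insufficient data"
    # Average gain between consecutive probes as a rough rate of improvement.
    gains = [later - earlier for earlier, later in zip(scores, scores[1:])]
    avg_gain = sum(gains) / len(gains)
    if avg_gain >= min_gain_per_probe:
        return "continue intervention: adequate progress"
    return "adjust intervention: inadequate progress"

print(monitoring_decision([20, 23, 27, 31], goal=54))  # steady gains
print(monitoring_decision([20, 21, 21, 22], goal=54))  # flat trajectory
```

In practice, teams often fit a trend line across more data points before changing an intervention; the simple average gain here stands in for that judgment.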

At the end of the school year, all students are administered the spring MAP assessment and the PARCC assessment, a computer-administered achievement test aligned with Common Core standards at each grade level. Both the PARCC assessment and MAP scores are used to (a) determine students’ achievement relative to benchmarked proficiency standards, and (b) guide decisions about teacher performance based on growth in student achievement over multiple years. As subsequently described, these summative assessments are considered only after teachers have had an opportunity to use formative data to guide instructional decisions that address students’ needs.

Assessment of Teacher Practices

As noted, the SSI Project includes four unique teacher formative assessment systems that capture instructional delivery and content coverage: the Classroom Strategies Scale Observer and Teacher Forms (CSS-O and CSS-T), the Danielson Framework for Teaching (DFT), and MyiLOGS (My Instructional Learning Opportunities Guidance System). The CSS-T and MyiLOGS assessments are teacher self-report measures, which structure teacher voice and self-reflection for purposes of data-based decision making and job-embedded professional development.

CSS (CSS-O and CSS-T). The CSS is a multi-dimensional assessment for identifying teachers’ use of practices related to student learning and positive behavior (Reddy & Dudek, 2014). Based on over 50 years of effective teaching research, the CSS includes strategies found effective across subject areas and grade levels (Reddy, Dudek, Fabiano, & Peters, 2015). It includes strategies (items) from different evidence-based models of instruction (e.g., direct instruction, constructivist, differentiated instruction) and classroom behavior management (e.g., praise, corrective feedback, proactive methods, directives) linked to student achievement. The CSS includes two brief forms, the CSS-Observer Form (CSS-O) and CSS-Teacher Form (CSS-T). The CSS-O includes three parts: (1) Strategy Counts, (2) Strategy Rating Scales (i.e., instructional and behavioral management strategies), and (3) a Classroom Checklist. Paralleling the CSS-O, the CSS-T includes two components: (1) Strategy Rating Scales (i.e., instructional and behavioral management strategies) and (2) a Classroom Checklist (for a detailed description of the CSS-O and CSS-T, see Reddy & Dudek, 2014; Reddy, Fabiano, Dudek, & Hsu, 2013).

For the SSI Project, the CSS is used formatively to guide coaching and to chart progress, as well as summatively (annually) to evaluate educator effectiveness (on a four-level performance rubric). Likewise, the CSS can be used to screen teachers who require individual intervention or group professional development. The CSS is used for general and special education teachers and evaluators of teachers such as directors of curriculum, school principals, instructional coaches, and school psychologists. CSS scores are used to (1) assess educators’ use of empirically supported instructional and classroom behavioral management strategies, (2) identify practice goals for improvement, (3) monitor educators’ progress toward practice goals following coaching/intervention, (4) inform professional development and supports (e.g., professional learning committees), and (5) provide evidence for overall teacher effectiveness.

Use of the CSS promotes collaboration between teachers and their evaluators or coaches for identifying practice goals, tracking changes in practices following feedback/intervention, and evaluating whether goals are obtained or require further intervention and support (for a detailed description, see Reddy, Dudek, & Shernoff, 2016).

For the SSI Project, the CSS is used in a three-stage assessment process that facilitates collaboration in measuring and understanding teachers’ classroom practices for coaching, progress monitoring, and evaluation. During a pre-observation conference stage (Stage 1), teachers and a school leader discuss a lesson that will be observed. Evaluators, such as principals or curriculum supervisors, engage in this step for the purpose of evaluation; other leaders, such as school-based instructional coaches, participate for the purpose of formatively informing instruction. This step includes discussing the lesson design, its objective, the strategies used for instructional presentation and encouragement of student learning, as well as the activities students may perform and how learning progress will be assessed. This conversation helps teachers and their evaluators understand how instruction takes place in each unique context and aids evaluators in providing feedback to teachers that is tailored to their particular classroom.

In an observation period (Stage 2), teachers and their school leaders conduct observations during lessons in which active instruction and student activities occur. The school leader completes the CSS-O Strategy Counts and takes targeted notes for the Strategy Rating Scales. The Strategy Counts tally how often teachers use empirically validated teaching skills (e.g., giving academic or behavioral praise, providing opportunities for students to respond). The targeted notes help the evaluator when subsequently completing the Strategy Rating Scales and making recommendations for teacher practice. After completing the observation, the observer uses his or her notes to complete the Strategy Rating Scales and Classroom Checklist. School leaders rate how often teachers used specific strategies related to instruction and classroom behavioral management (Frequency Rating Scale). The items assess both frequency and quality of the strategies used. Using their notes, evaluators then rate how often the teachers should have used those same strategies in the observed classroom context (Recommended Frequency Rating Scale). After the observed lesson, teachers complete the CSS-T Strategy Rating Scales and Classroom Checklist. The CSS-O and CSS-T are entered into a web-based CSSscore program that provides easy-to-use score reports and graphic performance feedback that is stored for progress monitoring throughout the school year.

Finally, during a post-observation conference (Stage 3), teachers and their school leaders discuss the observations in relation to the lesson objectives identified in the pre-conference. CSS-O and CSS-T data are used to facilitate discussion of observed effective teaching practices and to establish professional development goals. CSS graphed score performance feedback displayed across time periods (i.e., the time series of each observation) provides a visual representation of teachers’ progress toward goals. In sum, CSS-O and/or CSS-T scores offer a quantitative method to screen, diagnose, and monitor changes in teacher practice strengths and areas for improvement over time.

DFT. A second measure of teacher practices for the SSI Project is the DFT. The DFT is grounded in a research-based set of components of instruction, the Interstate Teacher Assessment and Support Consortium (InTASC) Core Teaching Standards, and a constructivist view of learning and teaching (Danielson, 2013). The DFT employs observations to evaluate teachers based on 26 elements of effective practice, organized into 22 components within four domains. The domains include Planning and Preparing for Student Learning (Domain 1), Creating an Environment for Student Learning (Domain 2), Teaching for Student Learning (Domain 3), and Professionalism (Domain 4). On each element, teachers are rated as Distinguished, Proficient, Basic, or Unsatisfactory. Observations and portfolio review data are used to inform the ratings. The DFT can be adapted to the needs and preferences of the districts in which it is implemented. In the SSI Project, evaluators are typically school principals who make three observations per year.

MyiLOGS. The online teacher log, MyiLOGS, is designed to measure the concept of opportunity to learn (OTL) for students with and without disabilities. For purposes of measurement, MyiLOGS collects five OTL indices along time, content, and quality dimensions of the enacted curriculum. Specifically, OTL is defined as “the degree to which a teacher dedicates instructional time and content coverage to the intended curriculum objectives emphasizing higher-order cognitive processes, evidence-based instructional practices, and alternative grouping formats” (Kurz, Elliott, Kettler, & Yel, 2014, p. 27). The teacher log features preloaded, state-specific standards for various subjects to define the intended content coverage. In addition, teachers can create custom objectives, such as Individualized Education Program (IEP) objectives, to ensure that other valuable learning targets besides academic content standards are reviewed for implementation. As such, MyiLOGS can be used for formative assessment and coaching, as well as summative (annual) self-reflection (e.g., Kurz, Elliott, & Roach, 2015; Roach, Kurz, & Elliott, 2015).

The SSI Project values teacher voice and self-reflection via self-report measures such as MyiLOGS, but does not permit the use of MyiLOGS (and CSS-T) scores to make high-stakes decisions due to their potential corruptibility. Instead, credit is given for regular use of these self-report measures irrespective of what scores teachers are achieving. Data on OTL about instructional time, content coverage, cognitive processes, instructional practices, and grouping formats, however, do factor into the coaching objectives for each teacher. In other words, the data are used to set instructional goals, monitor progress, and inform professional development, both in formative and summative ways.

To promote teacher self-reflection and data-based decision making in the SSI Project, MyiLOGS provides instructional feedback based on the information gathered through the teacher log. This feedback includes an Instructional Profile Report that summarizes a teacher’s OTL provisions via five scores, as well as a Comprehensive Feedback Report that provides over a dozen different charts and tables (see Roach et al., 2015). Teachers and coaches begin to use these OTL data after several weeks of logging. A review of OTL data is used to establish goals around any of the five OTL indices (i.e., content coverage, instructional time, cognitive processes, instructional practices, grouping formats) and to monitor progress toward these goals. For example, the Comprehensive Feedback Report provides a detailed list of all academic standards for a given subject and grade. One chart lists the content coverage per standard and thus helps teachers identify what areas have been covered, for how long, and what areas have yet to receive any coverage. Teachers set goals such as covering 90% of all academic standards for their particular subject area. Instructional coaches further review feedback reports with groups of teachers to identify consistent areas for professional development. For example, a coach may recognize that the majority of her teachers do not emphasize higher-order cognitive processes, such as the “Analyze/Evaluate” process expectations, during instruction. Lastly, OTL data are used in conjunction with achievement data to promote a more detailed review of instructional practices.
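The standards-coverage goal described above amounts to a simple calculation over the logged instructional time. A minimal sketch, with invented standard identifiers and minute totals (the real MyiLOGS reports compute this from a teacher's actual log):

```python
# Hypothetical computation behind a "cover 90% of standards" goal.
# Standard IDs and logged minutes are invented for illustration.
def coverage_rate(minutes_per_standard):
    """Fraction of standards that have received any instructional time."""
    covered = sum(1 for minutes in minutes_per_standard.values() if minutes > 0)
    return covered / len(minutes_per_standard)

log = {
    "ELA.2.1": 120,  # minutes of logged instruction per standard
    "ELA.2.2": 0,    # not yet covered
    "ELA.2.3": 45,
    "ELA.2.4": 90,
}
rate = coverage_rate(log)
print(f"{rate:.0%} of standards covered")       # 75% of standards covered
print("goal met" if rate >= 0.90 else "goal not yet met")
```

The same log data also supports the richer questions the report answers, such as how long each covered standard was taught, which a teacher and coach can read directly off `log`.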


Instruction and Service Delivery Based on a Continuum of Need

Recent attention to multi-tiered, response-to-intervention models of assessment and service delivery focuses on the early detection and remediation of academic and behavior difficulties. The purpose of these models is to provide a framework for service delivery for all students. This framework helps to ensure that students who would have “fallen through the cracks” or those who would traditionally be referred for special services (e.g., Title 1 services, special education) are regularly monitored with their peers and held to rigorous standards and expectations for performance. Within this framework, all students are screened to identify potential areas of concern, and specific needs for students potentially at risk are identified via skill assessments. Individualized or group-based instruction or intervention is then provided and adjusted based on students’ progress. All students’ needs are continually monitored and addressed through an ongoing assessment and intervention cycle (e.g., Glover & Vaughn, 2010; Walker & Shinn, 2002).

The use of formative assessment and ongoing teacher support for data-driven decisions within the SSI Project enables teachers to provide individualized instruction and intervention based on a continuum of student need. Systematic data collection and support better enables teachers to (a) make objective judgments about students’ progress relative to expectations, and (b) implement instruction/intervention to proactively address needs before students experience significant difficulties in the classroom. By screening all students and administering follow-up skill-based assessments for those experiencing difficulties, SSI teachers are able to identify and form specific target groups for instruction at various levels of intensity. For example, teachers differentiate instruction in their classroom to meet the needs of students experiencing moderate difficulties in specific target areas, or they create intervention groups for students who fall behind with basic skills, including those identified for special education. The frequency and duration of intervention for the SSI Project is determined by monitoring student progress in response to instruction.


Ongoing Teacher and Administrator Support via Problem Solving and Coaching

Despite the widespread use of data for accountability initiatives, data collection is a necessary but insufficient means of improving teacher and student performance (e.g., Reddy et al., 2015). Ongoing support is often required for teachers to interpret and apply data to make instructional decisions to address students’ needs. Coaching has emerged as a potential approach for providing job-embedded, individualized professional development for teachers; however, few coaching models have been found through research to effectively guide teachers’ use of data-based instructional decisions. The most promising research supports a coaching approach rooted in behavioral consultation that involves using data to inform the implementation of appropriate instruction. According to this approach, coaches work with teachers to (a) identify students’ needs and resources, (b) set data-based goals, (c) design implementation plans to achieve the goals, (d) model implementation steps for research-based practices, (e) provide performance feedback, (f) evaluate implementation and goal attainment, and (g) make revisions to plans of action when needed (e.g., Glover & Ihlo, 2015).

Within the SSI Project, a highly trained, full-time instructional coach at each school, previously identified as a successful teacher through systematic evaluations, supports teachers in making data-based instructional decisions. Coaches follow a four-phase coaching process (shown in Figure 1) that involves identifying students’ needs, setting goals, and systematically implementing, evaluating, and refining data-based decision-making plans. Phase 1 of the coaching process involves using data from the school-specific broadband student assessments (e.g., Measures of Academic Progress), skill measures (e.g., DIBELS Next, aimsweb, FastBridge Assessments), and measures of classroom practices (i.e., CSS-T, MyiLOGS) to identify potential areas of instructional need in reading/language arts and mathematics for students in participating teachers’ classes. Phase 2 involves supporting teachers in establishing goals for providing instruction in their classroom that are based on data-identified needs. During this phase, follow-up skill assessments are administered to students, teacher assessments are analyzed, and coaches work with the teachers to use data to prioritize goals for students and their teachers. Phase 3 involves supporting teachers in developing an action plan for attaining each of their goals. Action plans include instructional grouping practices, intervention steps for addressing data-identified needs, protocols for monitoring plan integrity, and timelines for achieving specific performance benchmarks. Appropriate instruction/interventions are identified for each plan based on a toolkit of research-based practices from each school. Coaches support teachers in designating appropriate instructional groups and implementing practices to effectively meet their specified instructional goals.
Phase 4 of the coaching process involves supporting teachers by observing the implementation of instruction/intervention from their toolkits and modeling and providing feedback to the teacher. Coaches also assist teachers in using skill assessments and measures of teacher practice to monitor progress relative to their specified goals. The coaches then support teachers in determining whether changes to action plans are needed based on their observations and student progress.

Figure 1. Instructional Coaching Platform (ICP) Model


Thus, the coaching process involves a problem-solving approach to formatively address data-identified instructional needs and to promote adherence to a system of continuous improvement. With coaching support, teachers are more likely to use data to improve their instruction in meaningful ways that ultimately lead to more positive student outcomes. Further, coaching support is useful for promoting teachers’ fidelity to instructional programs. Through engagement at each phase, coaches assist teachers in generating data-based decisions and in implementing appropriate practices to promote achievement for all students, including those identified for special education.


A primary strength of the SSI Project is its focus on student growth or progress over time. Measurement of growth occurs at two levels: (a) student progress in response to ongoing support based on data-identified needs, and (b) growth on broadband measures of achievement used to drive teacher incentives and make human capital management system (HCMS) decisions. On outcome measures, a focus on growth rather than status is preferred because end-of-year status is positively correlated with beginning-of-year performance. Using student growth greatly reduces the possibility of rewarding or penalizing a teacher based on his or her students’ achievement levels in September; instead, the change in achievement over the school year, which is relatively independent of students’ initial achievement, reflects teaching quality.

Formative Use of Growth Data

At the first level, data are used to guide instructional grouping based on students’ needs, and individual progress in response to instruction or intervention is monitored over time via skill-specific measures (e.g., measures from DIBELS Next, aimsweb, or FastBridge Assessments) to determine which instructional changes need to be made to increase student performance. In addition, repeated assessments of teachers’ instructional and behavioral management strategies via the CSS and MyiLOGS are used to determine progress in improving classroom practices aligned with standards. The previously described coaching process guides teachers as they use data to adjust instruction based on students’ needs and best practices.

Consideration of growth for making evaluative decisions about teachers. Only once teachers receive support in improving classroom practices matched to students’ needs are growth data used to inform incentives and HCMS evaluation decisions. By emphasizing formative assessment and data-based instructional decision making as a prerequisite to summative teacher evaluation, the SSI Project model prioritizes effective instruction that addresses students’ needs over punitive appraisals or rewards for teachers. However, summative evaluations are still an important element in maintaining accountability in SSI schools, as long as they include consideration of both practices and outcomes. McLaughlin and Jordan (2004) developed a model of evaluation that included an emphasis on process accountability, which focuses on the efforts that go into activities, and on outcome accountability, which focuses on the effects that result from those activities. Similarly, the SSI Project evaluates teachers using observational and rating measures of the educational process, and also using student growth on achievement tests of educational outcomes. As subsequently described, a formula that includes scores from these various measures is used to determine teacher effectiveness and guide teacher incentives.

Observations and teacher self-report methodology are used in the SSI Project for process accountability regarding teacher effectiveness because they are the best techniques available for measuring teaching. Two complementary tools, the CSS-O (Reddy & Dudek, 2014) and the DFT (Danielson, 2013), provide scores based on three annual classroom observations that are used to gauge levels of teacher effectiveness. For the DFT, teacher effectiveness is determined using a four-level, standards-based rubric that provides qualitative information for identifying teachers within each level of effectiveness. For the CSS-O, teacher effectiveness is determined using discrepancy scores (i.e., item-level recommended frequency minus observed frequency scores) within a four-level performance rubric (quantitative information) for identifying teachers within each level of effectiveness.

Standardized tests of student achievement are used in the SSI Project for outcome accountability regarding teacher effectiveness because they are the most objective tools available for measuring student growth in performance. The PARCC assessment is the required state accountability test in New Jersey. To capitalize on the complementary strengths of broadband summative achievement tests such as the PARCC assessment and computer adaptive measures such as the MAP, both tests are used.

Student growth percentiles on the PARCC assessments. For the SSI Project, growth on the PARCC assessments is calculated using student growth percentiles (SGPs) in the Colorado Model (a practice used by the New Jersey Department of Education). The Colorado Model characterizes student growth as the difference between achievement test performance in consecutive years and the projected achievement based on each student’s historical results. According to an August 2011 white paper from the National Center for the Improvement of Educational Assessment, SGPs (used in the Colorado Model) describe how “typical a student’s growth is by examining his/her current achievement relative to his/her academic peers” (Betebenner, 2011, p. 3). In the simplest case, a student who has only taken one achievement test in the past has his or her growth compared with the growth of all other students who have obtained that same score at that same grade level (i.e., academic peers). Over the years, as a student develops a longer history of annual achievement testing, more scores are entered into the model to better predict the amount of growth. For example, a student with an SGP of 75 showed an improvement over the previous year to a degree that equaled or exceeded the improvement of 75% of all other students who had a similar history of achievement test performance.
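The simplest case described above can be sketched in code. This is an illustrative simplification only: operational SGPs are estimated via quantile regression over full score histories (Betebenner, 2011), and the function name and all values here are hypothetical.

```python
def simple_sgp(prior_score, current_score, cohort):
    """Illustrative student growth percentile for the simplest case:
    a student with one prior test score is compared with 'academic
    peers' (students who earned the same prior score). Returns the
    percentage of peers whose current score the student equaled or
    exceeded. Not the operational quantile-regression method."""
    # Academic peers: students whose prior score matches this student's
    peers = [cur for (pri, cur) in cohort if pri == prior_score]
    if not peers:
        raise ValueError("no academic peers with that prior score")
    at_or_below = sum(1 for cur in peers if cur <= current_score)
    return round(100 * at_or_below / len(peers))
```

Under this sketch, a student whose growth equals or exceeds that of 75% of academic peers receives an SGP of 75, matching the example in the text.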

SGPs can be aggregated at the classroom, school, or district level, and reported as median scores indicative of educational effectiveness. In comparison to competing models of student growth, the Betebenner SGP models require neither vertical scaling across grades nor interval scaling within or across grades. The information provided by the SGPs in the Colorado Model is relatively easy to understand, compared to other value-added models that may input additional variables (e.g., demographics) in an attempt to isolate school and teacher effects.

RIT scores on the MAP tests. The SSI Project provides the MAP tests as a supplement to the PARCC for measuring student growth in reading and mathematics for grades 4–8, and as the sole measure of growth in the other grades and content areas. All teachers from the SSI Project are required to select at least two content areas for inclusion in their evaluations. This requirement serves the purposes of (a) ensuring that no teacher’s growth score is based on a single source, (b) recognizing the interrelatedness of the content areas within a school system, and (c) treating all teachers the same regardless of whether they are in a content area (e.g., reading) that typically has multiple, high-quality, and comparable testing options available or a content area (e.g., social studies) that typically does not.

The SSI Project utilizes growth scores based on Rasch Unit (RIT) scales developed by the Northwest Evaluation Association (NWEA) based on item response theory. RIT units have the advantage of being equal interval, such that a difference of a given number of points means the same whether that difference occurs at the top, middle, or bottom of the distribution of performers. NWEA provides growth projections on the MAP based on the average growth of all students from the same grade who started at the same RIT level. The SSI Project utilizes NWEA’s Growth Index, the number of points by which a student’s score exceeds or falls short of the projected growth based on that average. Teachers are evaluated based on the percentage of their students who exceed projected growth over a set period of time.
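The two computations described in this paragraph can be sketched as follows. The function names are illustrative, and projected-growth values would come from NWEA’s norms tables; the numbers used here are placeholders, not actual norms.

```python
def growth_index(start_rit, end_rit, projected_growth):
    """Growth Index as described in the text: the number of RIT points
    by which a student's observed growth exceeds (positive) or falls
    short of (negative) the projected growth for students in the same
    grade who started at the same RIT level."""
    return (end_rit - start_rit) - projected_growth

def pct_exceeding_projection(records):
    """Teacher-level summary: percentage of students whose growth
    exceeded projection. `records` is an iterable of
    (start_rit, end_rit, projected_growth) tuples."""
    records = list(records)
    exceeded = sum(1 for s, e, p in records if growth_index(s, e, p) > 0)
    return 100 * exceeded / len(records)
```

Because RIT scales are equal interval, the subtraction in the Growth Index is meaningful anywhere in the score distribution, which is the property the paragraph highlights.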

Progress relative to goals and benchmarks. A key distinction between the student assessment tools used for large-scale evaluation and those used for measuring student performance relative to goals and benchmarks is that the former tend to be broad, designed to cover most or all of the content standards within a domain (e.g., reading, mathematics) at a specific grade level. The latter focus on narrower and often prerequisite skills (e.g., phonemic awareness, mathematical operations) used to identify intervention needs for individual students. The SSI Project focuses on evaluation and professional development at the educator level, based on the high likelihood that such improvement in teacher performance will ultimately lead to improvements in student performance. As Wiliam (2009) indicated, the person for whom some assessment processes are formative is the teacher rather than the student. Although the PARCC and MAP are unlikely to provide information specific enough for intervention at the student level, they are highly reliable measures, and the data points collected across a classroom of students can provide critical information about professional development targets. For example, a teacher whose students perform worse on the multiplication and division portions of both the PARCC and MAP than in other areas should explore why the students are experiencing difficulties in these areas. To inform teacher evaluation and identify potential areas for professional development, the SSI Project attends to intrapersonal (ipsative) patterns in student achievement data rather than relying solely on normative categorical benchmarks derived from data reduction.

Teacher incentives and HCMS decisions based on growth. At the end of each year, scores from the process measures and outcome measures are combined into a compensatory model that yields four categories of teacher effectiveness, aligned with categories from the State of New Jersey: Ineffective, Partially Effective, Effective, and Highly Effective. The model is compensatory in that very high process scores can compensate for low outcome scores, and vice versa, although an educator would likely need high scores on both processes and outcomes to be Highly Effective. Following guidelines from the State, the weights for each measure are based on whether there are a sufficient number of scores from the PARCC for SGPs from a teacher’s classroom to include in the evaluation model. In the first year of PARCC testing (2015–2016), New Jersey advised weighting the SGPs at a relatively small percentage. As the most established indicators in the model, CSS-O, DFT, and MAP scores are relatively heavily weighted. Teacher self-report data from the CSS-T and MyiLOGS are lightly weighted, given that the level of completion is considered for these assessments rather than actual ratings or scores.
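As a hypothetical sketch only, a compensatory model of the kind described above combines weighted component scores into a single total, so strength on one measure can offset weakness on another, and maps the total to the State’s four categories. The weights and category cut points below are illustrative placeholders, not the SSI Project’s actual values.

```python
# Categories aligned with the State of New Jersey, lowest to highest.
CATEGORIES = ["Ineffective", "Partially Effective", "Effective", "Highly Effective"]

def summative_category(scores, weights, cuts=(1.85, 2.65, 3.45)):
    """Compensatory model sketch: `scores` and `weights` are dicts keyed
    by measure (e.g., 'CSS-O', 'DFT', 'MAP', 'SGP', 'Self'); weights
    should sum to 1, and scores are assumed to lie on a 1-4 scale.
    `cuts` are hypothetical lower bounds for the top three categories."""
    total = sum(scores[m] * weights[m] for m in weights)
    # Walk the cut points from highest to lowest to find the category.
    for label, cut in zip(reversed(CATEGORIES[1:]), reversed(cuts)):
        if total >= cut:
            return label
    return CATEGORIES[0]
```

The model is compensatory because only the weighted total matters: a very low SGP component, for instance, can be offset by strong observational scores, although reaching the top category in practice requires high scores on both processes and outcomes.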

Teachers in the Effective and Highly Effective categories receive modest financial stipends as incentives for their performance. Teachers found Effective in a given year receive a stipend in the range of 3% of their salaries, and teachers found Highly Effective receive a stipend in the range of 5% of their salaries. These amounts were determined following guidelines from the Center for Educational Compensation Reform and HCMS scholars (Odden, 2008). Teachers found Effective or Highly Effective also become eligible to apply for Master Mentor positions (full-time coaches), receiving a salary increase to leave the classroom for multiple years to coach teachers in their schools. SSI Project schools have agreed to take effectiveness determinations into account, along with other factors, when making decisions regarding teacher retention, promotion, and dismissal.


High-stakes testing for educators and students has increased significantly since the passage of the No Child Left Behind (NCLB) Act of 2001. Because NCLB and subsequent accountability efforts have focused primarily on summative, end-of-year student achievement data to evaluate teacher and school effectiveness, instructional improvements and needed interventions have typically occurred only after “waiting for students to fail” (Brown-Chidsey, 2007). The SSI Project presents an alternative to traditional “wait to fail” models through an evaluation model that incorporates assessment data from multiple sources to guide instructional decisions for all students, especially those with the greatest needs, such as students identified for special education (Reddy et al., 2015). The approach used in the SSI Project utilizes both formative and summative assessment to (a) inform decisions about effective instruction based on students’ and teachers’ needs, and (b) guide high-stakes decisions about teacher effectiveness. Through the use of regular skill-focused assessments during the school year, this approach maximizes opportunities for teachers to improve classroom practices that promote student learning.

Students identified for special education represent a unique population whose needs are often not met through core classroom instruction alone and who are often most significantly negatively affected by a wait-to-fail approach to school reform that does not sufficiently attend to their academic difficulties (Brown-Chidsey, 2007). Without adequate attention to instructional opportunities targeting specific data-identified skills, these students are unlikely to perform well on standardized achievement tests used to make high-stakes decisions. The SSI Project provides an example of an approach used to maximize the academic performance of such students through regular, ongoing support. This approach not only helps to promote achievement for students in special education, but also holds teachers accountable for these students throughout the school year, providing an alternative to the traditional wait-to-fail model. Unlike traditional high-stakes testing approaches, the SSI approach provides a method of ongoing support for teachers based on their students’ needs throughout the school year, thus promoting both teacher and student success rather than punishing failure. Financial incentives can then be applied once teachers have had an opportunity to increase their performance.


Despite the strengths of the SSI Project approach, there remain challenges in promoting and incentivizing teachers’ implementation of instruction that results in maximized achievement for all students. Such challenges include (a) a lack of precision in instruments designed to identify skill needs for students and teachers that are predictive of end-of-year, high-stakes achievement tests, (b) insufficient training for teachers in the use of formative data to guide instructional decisions, (c) insufficient access to systems that support the data-based instructional decision-making process, and (d) a lack of coordinated, school-based inventories of instructional approaches and interventions designed to meet data-identified needs. Although there have been significant advances in the development of assessments targeting student skills and teacher instruction, research on the precision of these instruments in predicting end-of-year achievement and specific instructional needs is still ongoing. Additional decision-making frameworks are needed to determine acceptable levels of false positives and false negatives for such instruments. Future research must also take into account variations in achievement tests that are often used as criteria in the assessment validation process. Developments in equating studies between achievement tests may be useful for advancing this effort (e.g., Glover & Albers, 2007).

For many teachers, assessment literacy is a significant obstacle to adopting a data-based decision-making approach to meeting students’ instructional needs. Although the SSI Project provides a framework for supporting teachers through coaching and professional development opportunities, the availability of such resources in U.S. schools varies greatly. Many schools do not have systematic professional development in formative assessment, nor do they have coaches to provide sufficient ongoing support to teachers.

Although data-based management systems have begun to emerge in schools over the past decade, many schools have not fully integrated such systems into their instructional decision-making process. The SSI Project is currently developing an online system to support teachers in the use of data to make instructional decisions based on their needs and those of their students. This system allows for easy access to graphical displays of data and requires minimal support. It is hoped that this platform can be used as a model for future development efforts. Further research is needed on the utility and acceptability of such systems for advancing instructional decisions in schools.

Finally, few schools have coordinated inventories of instructional approaches and interventions designed to meet data-identified needs. Assessment in absence of appropriate instruction is unlikely to result in benefits for students. Additional efforts are needed to assist schools in making decisions about research-supported instructional approaches for meeting needs identified through systematic assessment.


Given major shortcomings of NCLB and subsequent wait-to-fail accountability efforts, it would be helpful for future policies to advocate for the use of formative data to inform decisions about effective instruction based on students’ and teachers’ needs, and to guide high-stakes decisions about teacher effectiveness. Accordingly, efforts to incentivize formative assessment and data-based instructional decision-making literacy as part of school reform initiatives are sorely needed. Further, future policies should prioritize ongoing teacher professional development that promotes high-quality instruction based on students’ needs.

Based on lessons learned from the SSI Project, several recommendations are posited for advancing future high-stakes evaluation systems. First, it is important for future systems to focus on multi-method formative assessment of teachers and all students, including those identified for special education. By identifying instructional and skill needs early and regularly monitoring progress, educators are able to engage in practices that maximize all students’ achievement. Second, to capture progress over time in response to instructional changes, future systems should consider growth in student performance in addition to status. Third, future high-stakes evaluation systems should provide coaches and support personnel sufficient training to deliver regular guidance to teachers in making formative, data-based instructional decisions throughout the school year. Finally, given rapid developments in assessment and data-driven instructional practices over the past decade, future high-stakes evaluation systems should shift the focus away from a punitive model to one that rewards educators who engage in a prevention- and intervention-focused, data-based decision-making process that leads to improved instruction and outcomes for all students. It is hoped that the SSI Project can serve as an example of such an effort.


The research reported here was supported by the U.S. Department of Education – Teacher Incentive Fund Program, through Grant S374A120060 to Rutgers University. The opinions expressed are those of the authors and do not represent views of the U.S. Department of Education. Correspondence concerning this article should be addressed to Todd A. Glover, Rutgers University, 41 Gordon Rd, Suite C, Piscataway, NJ 08854; e-mail:


American Educational Research Association, American Psychological Association, & National Council on Measurement in Education. (2014). Standards for educational and psychological testing. Washington, DC: American Educational Research Association.

Betebenner, D. W. (2011). An overview of student growth percentiles. Dover, NH: National Center for the Improvement of Educational Assessment.

Black, P., & Wiliam, D. (1998). Assessment and classroom learning. Assessment in Education: Principles, Policy & Practice, 5(1), 7–74.

Bollman, K. A., Silberglitt, B., & Gibbons, K. A. (2007). The St. Croix River Educational District model: Incorporating systems-level organization and a multi-tiered problem-solving process for intervention delivery. In S. R. Jimerson, M. K. Burns, & A. M. VanDerHeyden (Eds.), Handbook of Response to Intervention: The science and practice of assessment of intervention (pp. 319–330). New York, NY: Springer.

Brown-Chidsey, R. (2007). No more “waiting to fail.” Educational Leadership, 65(2), 40–46.  

Christ, T. J., & Kember, J. (2014). FAST aReading, CBMreading and earlyReading: Standard Setting, Criterion Validity and Diagnostic Accuracy (Technical Report v1.3). Minneapolis, MN: FastBridge.

Danielson, C. (2013). The framework for teaching evaluation instrument (2013). Princeton, NJ: The Danielson Group.

Fuchs, L. S., Deno, S., & Mirkin, P. (1984). Effects of frequent curriculum-based measurement and evaluation on pedagogy, student achievement, and student awareness of learning. American Educational Research Journal, 21, 449–460.

Glover, T. A. (2010).  Key RTI service delivery components: Considerations for research-informed practice. In T. A. Glover & S. Vaughn (Eds.), The promise of Response to Intervention: Evaluating current science and practice (pp. 7–22). New York, NY: Guilford Press.

Glover, T. A., & Albers, C. A. (2007).  Considerations for evaluating universal screening assessments. Journal of School Psychology, 45, 117–135.  

Glover, T. A., & Ihlo, T. (2015, February). Professional development with coaching in RTI reading: A randomized study. Paper presented at the annual meeting of the National Association of School Psychologists, Orlando, FL.

Glover, T. A., & Vaughn, S. (Eds.). (2010). The promise of Response to Intervention: Evaluating current science and practice. New York, NY: Guilford Press.

Good, R. H., Kaminski, R. A., Dewey, E. N., Wallin, J., Powell-Smith, K. A., & Latimer, R. J. (2013). DIBELS Next Technical Manual. Eugene, OR: Dynamic Measurement Group.

Heubert, J. P. (2002). High-stakes testing: Opportunities and risks for students of color, English-Language Learners, and students with disabilities. Wakefield, MA: National Center on Accessing the General Curriculum. 

Jacob, B. A. (2002). Accountability, incentives, and behavior: The impact of high-stakes testing in the Chicago Public Schools. Cambridge, MA: National Bureau of Economic Research.

Joyce, B., & Showers, B. (1982). The coaching of teaching. Educational Leadership, 40, 4–10.

Kettler, R. J., Glover, T. A., Albers, C. A., & Feeney-Kettler, K. (Eds.). (2014). Universal screening in educational settings: Evidence-based decision making for schools.  Washington, DC: American Psychological Association.

Kurz, A., & Elliott, S. N. (2012). MyiLOGS: My instructional learning opportunities guidance system (Version 2) [Online measurement instrument]. Tempe, AZ: Arizona State University.

Kurz, A., Elliott, S. N., Kettler, R. J., & Yel, N. (2014). Assessing students’ opportunity to learn the intended curriculum using an online teacher log: Initial validity evidence. Educational Assessment, 19(3), 159–184. doi:10.1080/10627197.2014.934606

Kurz, A., Elliott, S. N., & Roach, A. T. (2015). Addressing the missing instructional data problem: Using a teacher log to document Tier 1 instruction. Remedial and Special Education. Advance online publication. doi:10.1177/0741932514567365

McLaughlin, J. A., & Jordan, G. B. (2004). Chapter 3: Logic Models. In J. S. Wholey, H. P. Hatry, & K. E. Newcomer (Eds.) Handbook of Practical Program Evaluation, 2nd Edition (pp. 55–80). Jossey-Bass.

Northwest Evaluation Association (NWEA). (2013). Educational assessment | Student centered learning | Common core assessments. Retrieved from:

New Jersey Department of Education. (2011). New Jersey Assessment of Skills and Knowledge: 2011 score interpretation manual, grades 3–8. Trenton, NJ: New Jersey State Department of Education.

New Jersey Department of Education. (2011). New Jersey High School Proficiency Assessment: District/school test coordinator manual. Trenton, NJ: New Jersey State Department of Education.

New Jersey Department of Education. (2012). New Jersey Assessment of Skills and Knowledge: 2011 technical report, grades 3–8. Trenton, NJ: New Jersey State Department of Education.

New Jersey Educator Effectiveness Task Force. (2011, March 1). Interim Report.

No Child Left Behind Act of 2001 (NCLB), 20 U.S.C. § 6311 et seq. (2001).

Odden, A. (2008). How new teacher pay structures can support education reform (Paper prepared for the College Board). Madison, WI: University of Wisconsin, Wisconsin Center for Education Research, Consortium for Policy Research in Education.

Reddy, L. A., & Dudek, C. (2014). Teacher progress monitoring of instructional and behavioral management practices: An evidence-based approach to improving classroom practices. International Journal of School and Educational Psychology, 2, 71–84. doi:10.1080/21683603.2013.876951

Reddy, L. A., Dudek, C. M., Fabiano, G., & Peters, S. (2015). Measuring teacher self-report on classroom practices: Construct validity and reliability of the Classroom Strategies Scale – Teacher Form. School Psychology Quarterly, 30, 513–533.

Reddy, L. A., Dudek, C. M., & Shernoff, E. (2016). Teacher formative assessment: The missing link in Response to Intervention. In S. Jimerson (Ed.), Handbook for Response to Intervention – Second Edition. New York, NY: Springer.

Reddy, L. A., Fabiano, G., Dudek, C., & Hsu, L. (2013). Development and construct validity of the Classroom Strategies Scale – Observer Form. School Psychology Quarterly, 28, 317–341. doi:10.1037/spq0000043

Reddy, L. A., Kettler, R. J., & Kurz, A. (2015). School-wide educator evaluation for improving school capacity and student achievement in high poverty schools: Year 1 of the school system improvement project. Journal of Educational and Psychological Consultation, 2, 1–19.

Roach, A. T., Kurz, A., & Elliott, S. N. (2015). Facilitating opportunity to learn for students with disabilities with instructional feedback data. Preventing School Failure. Advance online publication. doi:10.1080/1045988X.2014.901288

Shapiro, E. S., Keller, M. A., Lutz, J. G., Santoro, L. E., & Hintze, J. M. (2006). Curriculum-based measures and performance on state assessment and standardized tests: Reading and math performance in Pennsylvania. Journal of Psychoeducational Assessment, 24, 19–35.

Silberglitt, B., Burns, M. K., Madyun, N. H., & Lail, K. E. (2006). Relationship of reading fluency assessment data with state accountability test scores: A longitudinal comparison of grade levels. Psychology in the Schools, 43, 527–535.

Silberglitt, B., & Hintze, J. M. (2005). Formative assessment using CBM-R cut scores to track progress toward success on state-mandated achievement tests: A comparison of models. Journal of Psychoeducational Assessment, 23, 304–325.

U.S. Department of Education. (2012). U.S. Department of Education boosts district-led efforts to recognize and reward great teachers and principals through the 2012 Teacher Incentive Fund [Press release]. Retrieved from

VanDerHeyden, A. M., Witt, J. C., & Gilbertson, D. (2007). A multi-year evaluation of the effects of a Response to Intervention (RTI) model on identification of children for special education. Journal of School Psychology, 45, 225–256.

Walker, H. M., & Shinn, M. R. (2002). Structuring school-based interventions to achieve integrated primary, secondary, and tertiary prevention goals for safe and effective schools. In M. R. Shinn, H. M. Walker, & G. Stoner (Eds.), Interventions for academic and behavior problems II: Prevention and remedial approaches (pp. 681–701). Bethesda, MD: National Association of School Psychologists.


Table 1. Description and Function of SSI Project Assessments

Student Assessments

Measures of Academic Progress (MAP; Northwest Evaluation Association, 2015)

Computer-adaptive measures of K–12 student achievement in reading/language arts and mathematics (and in grades 3–9 science)

Used to predict end-of-year student achievement, screen for academic difficulties, and monitor single- and multi-year growth in achievement


Skill-Focused Indicators of Student Performance

Brief, individually administered assessments of student performance on discrete academic skills (e.g., DIBELS Next [Good et al., 2013]; aimsweb [Pearson, 2012]; FastBridge Assessments [Christ & Kember, 2014])

Used to screen for student difficulties, determine specific instructional needs, regularly monitor students’ response to instruction or intervention, and predict students’ end-of-year achievement


State Achievement Test (Partnership for Assessment of Readiness for College and Careers assessment; Pearson, 2015)

Computer-administered achievement test aligned with Common Core standards for reading/language arts, mathematics, and science

Used to determine end-of-year achievement relative to benchmarked proficiency standards and to guide decisions about teacher performance based on multi-year student growth

Teacher Assessments

Classroom Strategies Scale (Reddy & Dudek, 2014)

Multi-dimensional assessment of teachers’ instructional strategies and classroom behavioral management

Used to screen for teachers’ application of evidence-based instructional and behavioral management strategies, identify professional development needs, and monitor teachers’ progress in response to training


Danielson Framework for Teaching Evaluation Instrument (DFT; Danielson, 2013)

Observational measure of teacher practice in four domains: Planning and Preparing for Student Learning, Creating an Environment for Student Learning, Teaching for Student Learning, and Professionalism.

Used to evaluate teacher planning and instructional practices and to identify areas for professional development


My Instructional Learning Opportunities Guidance System (MyiLOGS; Kurz & Elliott, 2012)

Self-report measure (logging system) of student learning opportunities aligned with state content standards

Used to evaluate professional development needs and monitor progress in aligning content instruction with state standards

Cite This Article as: Teachers College Record Volume 118 Number 14, 2016, p. 1-26 ID Number: 21548, Date Accessed: 5/27/2022 6:28:11 PM
