Triggering ESEA Sanctions

K-12 Testing

Recent research reveals that one-third to three-quarters of the schools in most states will fail to meet new “Adequate Yearly Progress” (AYP) requirements established by the Elementary and Secondary Education Act (ESEA). AYP is a formula intended to ensure that all students attain the “proficient” level on state exams within the next 12 years (see Examiner, Winter 2001-02).


When the legislation first passed both houses of Congress, several researchers warned that most schools in the nation would be labeled as “needing improvement” within a few years. Fearing this would fuel a backlash, Congress revised the formula before final passage. However, subsequent studies based on data from several states have found that the final version only partly solves the problem.


According to state officials, up to 80% of Louisiana’s schools could be deemed “needing improvement,” along with two-thirds in North Carolina, and up to half in Wyoming. A study by the Congressional Research Service estimates that 17% to 64% of schools serving grades 3-8 in Maryland, North Carolina and Texas will “need improvement” under the law. Each state decides what “proficient” means for its students.


Schools that receive ESEA Title I money and fail to make AYP are subject to various “corrective actions.”


Assessment results, usually just standardized test scores, are the sole criterion for AYP. Further, each of several student subgroups (racial minorities, low-income students, limited English proficient students, and students with special needs) must be on track to reach 100% proficiency. This dramatically increases the number of schools and districts potentially facing sanctions.
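The arithmetic behind that claim can be sketched simply: if every subgroup must clear its target, a school's overall chance of making AYP shrinks multiplicatively with the number of subgroups. The 90% pass probability and the subgroup counts below are invented for illustration and assume independence; they are not figures from the law or the studies cited here.

```python
# Hypothetical sketch: under AYP, every subgroup must meet its target,
# so a school's chance of making AYP is (roughly) the product of its
# subgroups' individual pass probabilities. The 90% figure is invented.

def p_school_makes_ayp(p_subgroup: float, n_subgroups: int) -> float:
    """Probability a school makes AYP when all subgroups must pass
    independently, each with probability p_subgroup."""
    return p_subgroup ** n_subgroups

for k in (1, 3, 6):
    print(f"{k} subgroup(s): {p_school_makes_ayp(0.90, k):.2f}")
# 1 subgroup(s): 0.90 / 3 subgroup(s): 0.73 / 6 subgroup(s): 0.53
```

Even with each subgroup individually likely to pass, a school reporting six subgroups fails AYP nearly half the time in this toy model, which is the mechanism behind the "dramatically increases" claim.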


Score volatility
Year-to-year changes in test scores are volatile, particularly for small schools or schools with small numbers in specific subgroups (see Examiner, Summer 2001). Researchers Thomas Kane and Douglas Staiger found that 50 to 80 percent of the change in annual scores in North Carolina could be attributed to changes in the student population and one-time distractions, rather than to improved (or diminished) learning that could be attributed to the school.
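To see why small numbers matter, consider a sketch of our own (a simulation with invented parameters, not Kane and Staiger's actual model): even when a school's true quality never changes, its average score moves from year to year simply because a different cohort of students sits the test, and those swings are far larger when the tested group is small.

```python
# Invented-parameter sketch: simulate how much a school's average score
# moves year to year purely because the tested cohort changes, with no
# change at all in the school's underlying quality.
import random
import statistics

random.seed(0)

def year_to_year_swings(n_students, true_mean=60, sd=15, years=200):
    """Mean absolute change in the school-average score across
    simulated years, driven only by cohort sampling noise."""
    means = [statistics.mean(random.gauss(true_mean, sd)
                             for _ in range(n_students))
             for _ in range(years)]
    return statistics.mean(abs(a - b) for a, b in zip(means, means[1:]))

for n in (25, 400):
    print(f"{n} students tested: typical swing "
          f"{year_to_year_swings(n):.1f} points")
```

With these assumed parameters, the small school's average jumps around by several points a year from sampling noise alone, while the large school's barely moves, even though neither school changed. An accountability formula that reads annual changes as real gains or losses will mislabel small schools most often.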


Since then, studies in Colorado, Massachusetts and Florida have reached similar conclusions. A FairTest study conducted by Anne Wheelock found that Massachusetts schools which won awards for score gains one year were unlikely to make similar gains the next year. Often their scores dropped.


Researchers from the Center for Research on Standards, Evaluation and Student Assessment (CRESST) reviewed Colorado data and found that schools which started with high scores were likely to have smaller gains than schools which began with a greater percentage of lower-scoring students. Schools that posted large gains from years one to two usually saw declines in year three, while those that declined in the first two years usually gained in the third.


In Florida, researcher David Figlio studied score changes in two districts. He found substantial year-to-year volatility, which was even more pronounced for subgroups within schools. He concluded that “it may be nearly impossible for a school to experience persistent improvements across a wide variety of subgroups.”


In response to these problems, Congress allowed schools to use three-year rolling averages to smooth out the erratic annual data and improve reliability. However, averaging is only a partial fix.
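What the rolling average does can be sketched with invented figures (the proficiency rates below are not real data): averaging each year with the two before it compresses the swings in a noisy series, though a school then needs three years of comparable data, and genuine trends surface more slowly.

```python
# Sketch of the three-year rolling average the law permits, applied to
# hypothetical annual proficiency rates (values invented for illustration).
scores = [62, 48, 70, 55, 66, 51]

def rolling_mean(xs, window=3):
    """Average each value with the (window - 1) values before it."""
    return [round(sum(xs[i:i + window]) / window, 1)
            for i in range(len(xs) - window + 1)]

print(rolling_mean(scores))
# [60.0, 57.7, 63.7, 57.3]
```

The raw series swings across a 22-point range; the three-year averages stay within about six points of one another. That is the smoothing Congress intended, but it also means a real one-year improvement is diluted across three years of data.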


Kane and Staiger found that a North Carolina school seeking to predict its future scores could do so more accurately by assuming the average state score increase than by using its own previous four years of score changes. Figlio concluded that even with rolling averages, many schools will have erratic score patterns that have little to do with the quality of education.


• Education Week:
• Kane and Staiger:
• MCAS Alert:
• CRESST Line, Fall 2001, and papers referenced in it:
• Figlio’s paper: