Improving Accountability: A Review of Grading Education

K-12 Testing

FairTest Examiner, December 2008

Richard Rothstein, Rebecca Jacobsen and Tamara Wilder's new book, Grading Education: Getting Accountability Right, is a valuable addition to the nation's thinking on accountability, though it does not go far enough in its discussions of assessment. Noting that NCLB is "an utter failure," the authors would expand the conception of accountability well beyond the current fixation on test scores.

The book gives serious attention to curriculum narrowing but somewhat downplays the harmful consequences of teaching to the test in the tested subjects. Though it promotes performance assessment, it does not sufficiently recognize the full importance of expanding the use of performance assessments in educational accountability and improvement. It calls for a clear but limited accountability role for the federal government, returning primary responsibility to the states.

The book proposes gathering data in several different ways across eight youth development domains, then allowing states to use that data to improve educational opportunities and outcomes. The authors' more limited recommendations to the states focus on the development of inspectorates. Below is a synopsis, followed by an elaboration on some of the book's points.

In writing this review, I engaged in some discussion with Richard Rothstein, some of which is reflected in the text. I concur with him that sparking further discussion about accountability is essential, and I hope my comments and the abbreviated dialog with him contribute to that discussion.


The book argues that NCLB (and similar state test-based accountability systems) is "an utter failure" that "gave accountability a bad name." The book examines historical and current goals the public holds for education (the latter via surveys, including one by Rothstein and colleagues), sifting these to construct a list of eight main areas of youth development (see below). Over the next two chapters, it analyzes "goal distortion," primarily the undermining of the curriculum, caused by NCLB's "perverse accountability." In these chapters, I think the authors give too little attention to the damage caused to reading and math by standardized testing, which perhaps contributes to what I think is an over-reliance on testing in their recommendations (see below; this is the flaw I noted above). The next chapter discusses ways in which "accountability by the numbers" has had counter-productive consequences in a variety of other, non-educational, arenas.

The authors then describe components of what they think will be a reasonable accountability system for all government-funded youth-development areas. They propose first an overhauled National Assessment of Educational Progress (NAEP) that looks more like its original incarnation in the 1960s, and then an inspectorate modeled on the British system that could evolve from current U.S. regional accreditation systems. They conclude with an outline of an accountability system in which the federal government would ensure adequate school funding and conduct an expanded NAEP, while states would engage in a variety of accountability activities, including additional testing as well as the inspection process.

The book does not propose particular actions states should take in response to accountability data, but says that states would be expected to intervene when struggling localities fail to improve despite assistance. It says gathering this expanded accountability data would cost perhaps one percent of what the U.S. now spends on K-12 education and would be well worth it, as accountability can induce better programs and save money. More importantly, the public deserves to know how well schools and other programs are carrying out their social mission. While the book proposes accountability for "schools and other institutions of youth development," most of its discussion addresses schools. Framing the accountability question more broadly than just schools is important, but much more work will need to be done to construct an integrated accountability system – a point the authors at least implicitly make themselves, saying they intend this book to spur thinking and discussion.

The book has several appendices, notably "teacher accounts of goal distortions," in which some 14 teachers talk about the mostly harmful consequences of test-based accountability; and "schools as scapegoats," which rebuts those who treat schools as causing economic problems or as being the main solution to those problems. Each of the eight chapters is well-written, deeply researched, thoughtful and helpful. [Note, I am one of the people who read and commented on a pre-publication draft.]

Looking in more detail at the accountability proposals:

The book's eight goals for public education are: basic academic skills, critical thinking, arts and literature, preparation for skilled work, social skills and work ethic, citizenship, physical health, and emotional health. The authors surveyed the general public, school board members, and state legislators on how they would weight these, then offer their own proposed weighting formula. In general, these are certainly important areas for the nation's youth. In their proposal, 'basic academic skills' (including most academic subjects, not just math and reading) carries only modestly more weight than the other factors (about 1/5 in the surveys and the book's own proposals, versus 8-16% for the various other factors, some of which clearly overlap academics, such as critical thinking at 16% and citizenship at 13%). We can predict charges that Rothstein would let schools off the hook if kids do not learn, so long as they do art, are healthy and happy, and have good social skills. But after the destructive reductionism of NCLB, the nation needs public debate on how important various aspects of learning are and how to ensure our children receive a balanced opportunity for human development. The concluding chapter does not offer specific recommendations on how the weighting would play out in actions taken in response to the data, though it provides some possible examples. Plausibly, publicizing the full range of data would preclude such narrowly focused, goal-distorting actions as teaching to a standardized test.

The book proposes that data be gathered on all eight goal areas, at the state and national levels. This would happen largely through an expanded NAEP that would remain a sampling assessment, assess more academic subjects as well as the other goal areas, include more extensive surveys, and produce state-level data once every three years. The chapters on NAEP and the overall accountability plan provide some thoughtful details on how this would work, including more use of performance assessments, testing by age rather than grade, and assessing out-of-school 17-year-olds. In any case, NAEP and substantially increased educational funding would be the major components of the federal role. The feds, they say, should get out of school- and student-level data and testing, leaving that as a state responsibility. The authors also have a good argument on why national standards will not improve education. (They note that the positive role of the federal government in support of racial equality was largely confined to the years following the Brown decision and has largely been reversed, especially in the Bush II years. I would add that Reconstruction also was positive, but the general point that the federal government, including the courts, is not necessarily progressive, is correct.)

States would then use the federal data and add on whatever pieces they think useful. The authors envision additional testing beyond the federal NAEP and other "standardized assessment instruments" that would cover all eight areas and include performance tasks – but they don't provide much detail on these. Nor do they specify the frequency of such testing. It would be one thing if such assessing included only a few subjects per grade, if only a few grades were assessed each year, or if many components were assessed only once every 2-3 years. However, if the system heads toward collecting significant amounts of data annually in each area in most grades, not only would costs be very high, but the assessing could become even more burdensome for schools than the current system. While a reasonably lean system can be constructed, the quantity of assessing will be a critical issue if states move toward the sort of accountability system the authors outline.

In an email conversation in response to an earlier version of these comments, Rothstein said that because he and his co-authors see statewide standardized testing as being conducted in more subjects, but not annually in any subject, they do not think their proposal entails substantially more standardized testing. He notes that they recommend school inspections once every three years, and standardized testing for accountability purposes should be consistent with that recommendation. Rothstein said they were not more explicit about this because they were avoiding prescribing specific testing and accountability regimes to the states. I remain concerned that this lack of clarity or precision provides an opening for those who seek more testing.

As I read the book, the major problem I had is that it pays insufficient attention to the need for, value of, and feasibility of using classroom- and school-based evidence in an accountability system, as the Forum on Educational Accountability (FEA) proposes (for details, see the report of the FEA's Expert Panel on Assessment). (To be more precise, in Grading Education, the use of such evidence appears only as part of the inspectors' examination of classroom artifacts.) Using classroom-based evidence is technically feasible, and some other nations with strong education systems do rely primarily on local assessments or a mix of local and national assessments. (An article by Linda Darling-Hammond in the December 2008 Phi Delta Kappan provides substantial detail on this via discussions of Finland, Sweden, Queensland and Victoria in Australia, England, and Hong Kong; one might also consider New Zealand, which, as Rothstein points out, has a NAEP-like exam and otherwise relies on local information.)

The major reason to use classroom-based evidence to evaluate academic attainments is that it is the only feasible way to use a significant number of extended performance tasks and projects, which are necessary for assessing many significant areas of the curriculum (and which are also a valuable instructional mode and a tool for professional development). Rothstein and colleagues think that standardized tests with some performance items in NAEP, plus the same in the states' assessments, plus inspectors who look at student work, will do the job. I don't think the job will be adequately done that way. I fear that a relative emphasis on standardized tests (even better ones) via the state tests, and thus in what inspectors then look at, will narrow the modes of instruction, the range and kinds of knowledge students have an opportunity to learn, and the ways in which students can demonstrate their learning. That is, it could still be too much of a one-size-fits-all approach. For example, I have heard it said (though I have not verified it) that in England inspectors are now supposed to, in effect, use the nation's standardized tests as their touchstones, building their evaluation around those scores and what should be done to increase them. In such a system, reviews of work samples, classroom visits, and interviews with faculty and students are subordinated to the tests. Instead, tests should be an adjunct to a richer system of examining actual classroom- and school-based work.

Moreover, the authors do not consider costs for state assessments, but these costs are likely to be very large if exams include a large share of performance components that every student would take and that would be scored centrally. Such costs can be kept manageable when performance assessments and portfolios are part of teachers' regular work and ongoing professional learning. And if, as Rothstein hopes, states keep their testing limited, then the costs could remain within the bounds the book suggests.

NAEP and/or state sampling exams can be key parts of an accountability system, though some states have many schools so small that sampling is not feasible at that level. These exams should have performance components. However, if each school has a portfolio, learning record or work-sampling system, then teams of outside teachers and others can re-score samples and provide valuable feedback in a process that would steadily improve the quality of the system. As Linda Darling-Hammond explains, other nations do this. In the end, a portfolio-based system in which samples are re-scored would be cheaper than having a large set of performance items in an on-demand state exam and would provide numerous other benefits, including variety in methods for students to demonstrate their learning, and useful professional learning that would strengthen instructional skills.

In response to these comments, Rothstein said their expectation is that inspectors would look closely at student work. They would not have the federal government require states or schools to keep portfolios or work-sampling systems, but such approaches would be a feasible way to ensure the evidence was available for state inspections and thus for doing well in the inspection. He reiterated that they did not want to specify in great detail how states should construct their inspections. While I appreciate this added specificity in the authors' thinking, I believe the inspections would be valuable but not sufficient as an improvement or evaluation tool, for the reasons explained above.

Rothstein and his colleagues provide useful ideas for state accountability systems, despite the lack of development on how to use performance assessments. They also do not much discuss improvement, tending simply to state that the good data can be used for improvement purposes. By contrast, the Forum on Educational Accountability, building on the Joint Organizational Statement on NCLB, proposes that significant federal funds be directed to endeavors such as greatly improved professional development.


In sum, I would re-work the authors' system to:

  • include an expanded NAEP that looks much more like early NAEP than the current NAEP;
  • employ a modest amount of state testing;
  • strengthen the accreditation process to provide periodic school inspections; and
  • rely primarily on classroom- and school-based evidence for evaluating schools and helping in improvement, in part via the inspections, but largely through other modes of annually sharing and evaluating samples of student work across schools.

Assuming that testing is limited, the major distinction I would draw is the need to prioritize classroom-based evidence over the testing and to use it more fully than just in inspections. The authors may prefer to leave such issues for later discussion, mainly at the state level. Here I would note that FEA calls for, and a draft bill from Chairman Miller on the reauthorization of ESEA included, funding for states to develop new assessment systems that include classroom-based evidence. The best way(s) to combine limited standardized testing, inspections, and reviews of samples of student work into a coherent system remains to be determined.

Grading Education does not address the issue of making decisions about students, as that is not a component of federal accountability decisions. However, states will be thinking about it should they redesign their systems. Decisions about students should be made on the basis of their schoolwork, not external exams. The review of samples of student work provides a check on schools to ensure the quality of such decisions, as can inspections and a careful use of standardized tests. That is, given a rich evaluation system for schools, there is no good reason not to allow schools to make decisions about students. If there is a belief that in some schools students are not learning enough, then the state and district must work to improve the school, not impose accountability on the backs of students who attended inadequate schools – or, for that matter, punish schools for not doing what the larger system failed to give them the resources to accomplish.

The Mass Coalition for Authentic Reform in Education (CARE) provides similar recommendations for a combination of school-based evidence, limited exams, and inspections, including for high school graduation decisions. (FairTest and I participated in crafting these recommendations to Massachusetts, and FairTest has worked to promote them in other states.) CARE suggests inspections every five years, compared to the three the book proposes, but the reviews of student work samples across schools would be annual (though they need not involve every grade or subject each year). Both agree on more frequent inspections for schools having difficulties.

In "Class and Schools," Rothstein argued forcefully for paying more attention to non-school factors. That is, data about health, housing, employment, etc., should be gathered and factored into any accountability and improvement system. I concur. This book provides some good examples, including via an expanded NAEP survey, of how to obtain this information. More generally, opportunity-to-learn factors, from in and out of school, should be gathered, publicized and used in evaluating schools and their progress and in developing solutions to demonstrated problems.

What Rothstein and his colleagues seek is a rich, comprehensive accountability system that can be used to actually improve education and youth policy and practice. It must be complex enough to prevent the harmful effects caused not only by NCLB and current state testing programs but also by narrowly conceived accountability programs in other fields. It also must be simple and efficient enough that the benefits outweigh the burdens. I think this book largely provides a feasible approach, with the important caveat that there also needs to be a state system, beyond the inspection process, through which educators (primarily) evaluate samples of student work to provide feedback to teachers and guide professional learning. I also have concerns about how burdensome it would be if states continue with their now-extensive testing regimes while also adding inspections and an expanded NAEP, however contrary that might be to the authors' conceptions.

One further issue: programs of extensive data gathering can be misused. There are legitimate concerns about how data about students should be gathered and used. In this calculation, sampling is very important: few students would have much of the range of data accumulated about them (each would, as now, have her/his individual school record, available for parental inspection). However, the development of a far more extensive data system based on sampling could be a prelude to a system of detailed individualized data. Recall that NAEP started with sampling at the national level, moved to the state level, and now also reports on large cities – albeit all by sampling; yet there are serious proposals for a national exam of all students built on NAEP. This area will require serious attention, not only for issues of privacy but also for potential systemic abuses.

Lastly, I think Rothstein and colleagues could have paid more attention to other proposals for overhauling NCLB that are far more than just tinkering (as they say other proposals generally are). No doubt that is partly because I chair FEA, which has produced what I think are some positive conceptions of the federal role that are fundamentally different from current federal law. I also think that by approaching NCLB reauthorization with a rich, wide set of overlapping and complementary proposals, in which proponents acknowledge one another's positive components, we strengthen the likelihood of a more comprehensive and successful overhaul of federal law.


Reviewed by Monty Neill