Promoting Sound Assessment

Promoting Sound Assessment - a paper prepared by
FairTest for the American Association of School Administrators,
published on our website with permission.

What Superintendents Can Do to Promote
Sound Assessment in Light of NCLB

By Monty Neill, Ed.D.
Co-Executive Director

As implemented in all but a few states, the accountability
requirements of the No Child Left Behind Act (NCLB) are based almost
entirely on state standardized tests. Thus, schools are rated, and
face sanctions, on the basis of test scores. A widely observed result
is a heavy emphasis on teaching to the tests and on using test results
to shape curriculum and instruction (an approach increasingly called
"data-driven"). Many educators, teachers and administrators alike, are
uncomfortable with this, recognizing that the tests typically measure
only a limited slice of what children should know and be able to do
in the tested subjects and do not assess other academic and
non-academic areas at all. These educators want additional measures,
but they also need to keep paying close attention to test scores as
long as states use them as the sole measures. Thus, any additional
measures should simultaneously help raise test scores, encourage sound
teaching practices, and provide data on areas not assessed by the
standardized tests.

Fortunately, there are assessment approaches that can accomplish
this end. This memo outlines the basics of a few interrelated
approaches, along with some examples and research. For convenience,
I will present the approaches in three categories: embedded performance
assessment tasks, portfolios, and formative assessment. The three
overlap in important ways, which I will also address. There are
also important sources of general information on what is variously
termed authentic assessment, performance assessment, and classroom-based
assessment.

Embedded performance tasks: Essentially, these are tasks that
most typically take a class period or so to complete, though
they may take several periods, and some are full-scale projects
that might be worked on over several weeks. Classroom teachers can
design these tasks, or a district can design or acquire them.
Students may complete them individually or in groups. A task could
be administered across the district on a single day, or teachers
could use it when it fits the particular instructional moment
for the class or even for an individual student.

Perhaps the most familiar form is the essay, in literature,
history, science, and other subjects. Many science tasks involve
lab work, though it is essential that such tasks also require
students to think and reflect, not just follow a recipe. In math,
projects can include such things as geometric modeling or conducting
statistical analyses and then constructing tables and graphs. Again,
it is essential that the tasks require students to think, to
apply knowledge, and to reflect on and perhaps evaluate their own
work.

Typically, teachers use rubrics to score the tasks. Tasks may
be scored by the students' own teacher, by teachers at a school,
or by teachers from across schools; each approach has benefits
and drawbacks. It can be very useful to have teachers score
their own students' work and then have a limited sample scored
separately by experts, both to provide feedback to teachers and
to determine the reliability of the teachers' scoring.
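
To make the reliability check concrete, the sketch below compares a
teacher's scores with an expert's re-scores of the same sample of
papers. It assumes a hypothetical 4-point rubric and made-up scores,
and it uses three common agreement statistics (exact agreement,
adjacent agreement within one point, and Cohen's kappa, which
discounts the agreement expected by chance). These are illustrative
choices, not a method prescribed by any project named in this memo.

```python
from collections import Counter


def agreement_stats(teacher, expert):
    """Compare a teacher's rubric scores with an expert's re-scores.

    Both lists hold integer scores for the same papers, in the same order.
    Returns (exact agreement, adjacent agreement, Cohen's kappa).
    """
    if not teacher or len(teacher) != len(expert):
        raise ValueError("need two non-empty score lists of equal length")
    n = len(teacher)
    pairs = list(zip(teacher, expert))

    # Share of papers where both scorers gave the identical score.
    exact = sum(t == e for t, e in pairs) / n
    # Share of papers where the scores differ by at most one rubric point.
    adjacent = sum(abs(t - e) <= 1 for t, e in pairs) / n

    # Agreement expected by chance if each scorer assigned scores
    # independently, at the rates observed in their own scores.
    t_counts, e_counts = Counter(teacher), Counter(expert)
    chance = sum(t_counts[s] * e_counts[s] for s in t_counts) / (n * n)

    # Cohen's kappa: agreement beyond chance, scaled so 1.0 is perfect.
    # If chance agreement is 1 (both scorers used one identical score
    # throughout), kappa is undefined; this sketch reports 1.0.
    kappa = 1.0 if chance == 1 else (exact - chance) / (1 - chance)
    return exact, adjacent, kappa


# Hypothetical sample: ten papers on a 4-point rubric.
teacher_scores = [3, 2, 4, 3, 1, 2, 3, 4, 2, 3]
expert_scores = [3, 2, 3, 3, 1, 2, 4, 4, 2, 2]
exact, adjacent, kappa = agreement_stats(teacher_scores, expert_scores)
print(f"exact={exact:.2f} adjacent={adjacent:.2f} kappa={kappa:.2f}")
# prints: exact=0.70 adjacent=1.00 kappa=0.58
```

Exact agreement is the easiest figure to explain to teachers; kappa
is stricter because it discounts agreement that would occur by chance,
which matters when most papers cluster at one or two score points.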

Perhaps the most important benefits of using tasks are: 1)
determining whether students are learning beyond what is measured
by state exams; 2) signaling that such learning is important;
and 3) providing professional development opportunities for teachers.
The last benefit has two aspects: professional development is often
essential for teachers to be able to use tasks well, and using
and scoring the tasks in turn provide professional development,
through training teachers to score and through teachers'
discussions of student work. These benefits also apply to
portfolios and formative assessment, though the particulars
vary. The first benefit matters because it is widely recognized
that multiple-choice items are most useful for assessing declarative
knowledge and basic procedures, while performance tasks generally
do a better job of assessing deeper learning, critical thinking
such as synthesis and evaluation, and creativity.

There is debate on whether it is better to write tasks in-house
(gaining professional development and teacher buy-in) or acquire
them (as it is difficult to create good tasks, it can be efficient
to acquire them). The two approaches are not incompatible (acquire
some, make some).

Several organizations produce tasks and projects.
A multi-district project in California's Silicon Valley that
uses MARS math and science tasks has been quite successful: when
use of the tasks is accompanied by professional development, student
scores on the tasks have risen rapidly, and use of MARS correlates
with higher scores on the traditional state test. More broadly, the
MARS project includes researchers from Berkeley, Michigan, and the
United Kingdom. Their website includes sample tasks, evaluations
of the work, and more.

BEAR, at the University of California, Berkeley, is similar.
BEAR's director, Mark Wilson, said the center works with a limited
number of districts that are also revising curriculum in ways that
include embedded tasks. The website mainly contains research papers,
though BEAR may soon post a more accessible summary of its work.

The state of Maine has been posting tasks on its website for
use in the state-mandated Comprehensive Local Assessment Systems
(the tasks are optional, but districts must have a CLAS). Researchers
at the University of Southern Maine collaborated with several
dozen school systems to develop a guide to using assessment
to serve learning. The guide includes information about embedded
tasks and portfolios.

In Queensland, Australia, a pilot project that has been extensively
researched has used 'Rich Tasks' tied to a 'New Basics' curriculum.
As with MARS, there have been strong positive results: New Basics
students scored as well as others on traditional tests but
significantly outscored them on tests of higher-order thinking. The
pilot is expanding on a voluntary basis, with new schools joining and
other schools simply beginning to use the tasks. The project's website
contains a good deal of information, including many research papers,
and presents a few tasks in summary form. The forthcoming Winter
2004-05 FairTest Examiner has an article summarizing the Rich Task
approach.

The NY Performance Standards Consortium is developing and
using tasks across a number of mostly urban small high schools
as part of developing an alternative approach to accountability.
The consortium's website describes a few tasks and how they are
used: click on 'the consortium' and then 'alternatives to high
stakes testing', and also on 'performance assessment', which
includes, among other things, some detailed

Tasks are also available from sources such as Exemplars, which sells
tasks with rubrics in math, science, and writing and provides
professional development and technical assistance.

Portfolios: At root, portfolios are simply collections of
work. The idea comes from artists' portfolios, which typically are
selected to show the range and quality of the artist's work. Used in
education, portfolios may focus on 'best pieces' or offer a wider
sampling of student work. They also usually involve student
self-assessment, in which a student (typically along with teachers)
selects pieces to include and evaluates either some of the pieces or
the portfolio as a whole. Work in a portfolio can include tasks and
projects (thus an overlap with the task approach) as well as other
examples of classwork, journals, test results, etc. In writing, for
example, multiple drafts of a piece can be included, showing how well
a student knows how to revise and improve her writing.

Portfolio evaluation typically involves considering the body
of work in light of a rubric. It could be used, for example, to
determine how well a student met the state standards. Thus, the
pieces in a language arts portfolio would be selected to respond
to key content standards (any one piece might address more than
one standard). As with tasks, scorers need guidance from examples
of student work that vary in quality and that illustrate different
ways to demonstrate high quality. Scoring portfolios proceeds much
like scoring tasks, though with portfolios a larger body of work
must be evaluated. Well-crafted portfolio approaches can produce
high agreement among scorers, even when the work included in the
portfolios is highly diverse. As with tasks, professional development
is essential to make the effort work, while portfolio use in turn
provides opportunities for professional development.

One approach to portfolios is the Learning Record, which structures
the kinds of evidence of learning that teachers and students assemble
and provides a structure for evaluation and feedback to students
and parents. (It is, in fact, a more comprehensive approach than
is common with portfolios.) The LR was developed initially in
England, in part for use with students whose first language is
not English, and then further developed in the U.S. by the
now-closed Center for Language in Learning (CLL). The FairTest
website contains some of the key Learning Record materials as well
as links to other LR information, including the website of the
Tiospa Zina Tribal School, which is carrying on some of the CLL's
work. Also highly recommended are the Teachers' Handbooks for using
the LR in literacy (reading, writing, listening, speaking), available
from Heinemann; the lead author is Mary Barr. Limited technical
assistance is available through Professor Sally Thomas or through
Dr. Roger Bordeaux via the Tiospa Zina Tribal School.

The Work Sampling System is somewhat similar. It covers a
wide range of academic areas and some aspects of non-academic
learning, through grade 5. Developed by Dr. Samuel Meisels and used
in hundreds if not thousands of schools, it is now commercially
available from Pearson; see in particular the Quick Tour on the
website.

Maine has portfolio options on its state website, to be used
in the LAS; and the Southern Maine Partnership guide includes
information on portfolios.

There is a rich literature on how to do portfolios in general
as well as in particular subjects. A number of books are particularly
valuable. FairTest has an annotated bibliography on performance
assessment on its website; it has not been updated recently, but
the materials it lists remain excellent. FairTest's booklet
Implementing Performance Assessment contains a section on portfolios;
it is available by contacting FairTest or using the ordering form
on our website.

Formative Assessment: This term refers to assessment procedures
whose main purpose is to provide feedback to students so that
they can learn more effectively. It is sometimes called 'assessment
for learning' to distinguish it from 'assessment of learning'
(summative or outcome assessment). An extensive research review
by Paul Black and Dylan Wiliam (summarized in "Inside the Black Box,"
Phi Delta Kappan, Oct. 1998) concluded that the statistical effect
size of formative assessment on outcome measures (including
standardized tests) was as large as or larger than that of any other
school intervention, including smaller class sizes. (High-quality
formative assessment is admittedly labor-intensive, so sufficiently
small class size is often a minimum requirement.)
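
For readers unfamiliar with the term, an effect size of this kind is
usually a standardized mean difference: the gap between the average
outcome of students who received an intervention (group T) and that
of a comparison group (group C), divided by a pooled standard
deviation. Here the bars denote group means, s the group standard
deviations, and n the group sizes. The general formula is below;
individual studies in the Black and Wiliam review may have used
variants of it.

$$
d = \frac{\bar{X}_{T} - \bar{X}_{C}}{s_p},
\qquad
s_p = \sqrt{\frac{(n_T - 1)\,s_T^{2} + (n_C - 1)\,s_C^{2}}{n_T + n_C - 2}}
$$

As a purely hypothetical illustration, if a formative-assessment group
averaged 5 points higher than a comparison group on a test with a
pooled standard deviation of 10 points, then d = 5/10 = 0.5.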

The point of formative assessment is that a teacher provides
pinpoint information, in terms the student understands, so that
the student can build on his strengths or overcome a problem.
Standardized tests, even if scored and returned quickly, provide
far too little information to be of much use in this process.
While teachers should have access to assessment tools and need not
reinvent every wheel, the teacher is the essential assessment
'instrument' in formative assessment. The key ingredients are
professional development, so that teachers learn the necessary
skills, and sufficient time and appropriate class sizes to do it
well. Research suggests it is equally important that students learn
how to use the feedback and learn to assess themselves, so
teachers should help students develop these skills.

Information on how to use formative assessment is increasingly
available. For example, Black and Wiliam wrote a follow-up, "Working
Inside the Black Box" (Kappan, Sept. 2004). The Assessment Reform
Group in England, whose website includes additional work by Black
and Wiliam, is another source. Rick Stiggins and the Assessment
Training Institute have also done valuable work on formative
assessment.

Formative assessment certainly involves looking at student
work products. But it also involves observing student behavior,
individually and in groups; holding dialogues with a student; and
making other observations about the learning process. Valuable
approaches to formative work can also be found in the literature on
the "descriptive review" process pioneered by Patricia Carini and
the Prospect Center in Vermont.

The value of high-quality formative assessment is not in doubt.
In a formative assessment process, tasks and portfolios can be
used. For example, teachers can respond to student work on a
task in ways that help the student revise and then complete
a higher-quality task. Self-evaluation by a student of her or
his portfolio can be a formative practice. To some extent, assessment
tasks and portfolios therefore can serve both formative and summative
purposes. That said, the observations of student behaviors, responses
to student work, and dialogues with students that constitute the
heart of formative assessment are not likely to be summed up
in an outcome evaluation (though a teacher might comment on how
well a student can self-assess or use formative information provided
by the teacher).

Putting the pieces together: Trying to do everything at once
would likely be overwhelming. A district might well choose one
element as a starting point and include other components only
selectively at first. In the end, all three components (engaging
tasks, portfolios of student work, and formative assessment) are
important and should be incorporated into a comprehensive and useful
system. The implementation process itself must be manageable for
teachers, or it is likely to fail.

One example: a district might begin with a limited set of
performance tasks, either acquired or developed internally (or
both). They might assess only one or two subjects to start. Some
limited but general professional development would be provided
on how to use the tasks. Teachers at a school would score them,
but it would be best to train a core of teachers from each school
who would then work with teachers at their schools. Teachers
would discuss the results with students and thus provide potentially
formative feedback. Over time, more elements would be added (more
tasks in a subject, more subjects). Continuing professional development
would strengthen teacher capacity. Time for teachers to collaborate
in looking at and thinking about the assessments and student
work should be central to that professional development. The
tasks would become key pieces in portfolios that students would
learn to keep. As teachers became more familiar with this approach,
the quality of feedback to students would improve, and in turn
the students themselves would gain greater capacity to self-assess.
Teachers across schools could begin to meet to look at one another's
students' portfolios and perhaps move toward scoring them while also
providing detailed feedback.

Again, I recommend the bibliography on performance assessment
and FairTest's booklet, Implementing Performance Assessment,
as resources for thinking about how to begin, continue, and
expand this work.

In addition, see the Frequently Requested Topic Collection on
Assessment and Testing on the National Council of Teachers of
English website. In late spring 2005, a Teaching Collection on
Assessment will also be featured online. NCTE's Consulting Services
offers consulting engagements on assessment.

Outcomes: As noted above, evidence exists that these approaches
enhance student learning, both in areas not included in standardized
state tests and even on the state exams themselves. This is partly
because students often find the tasks engaging, which deepens
their commitment to academic work along with their understanding
and pays off in many ways, including on tests.

Costs: The costs of implementing performance assessments range
from negligible to fairly extensive. If teachers choose, they
can, for example, keep portfolios or create and share tasks at no
cost. Purchasing tasks costs money, and technical assistance or
professional development assistance costs somewhat more.

The major costs would be for staff time: time for teachers
to learn to use performance assessments, and then time to use
them and to engage in activities such as creating tasks or scoring
tasks and portfolios. Proponents of performance assessment have
long argued that the main cost lies not in the assessment itself
but in professional development. Improved teaching ought to be
central to district work, professional development is fundamental
to improved teaching, and high-quality assessment greatly benefits
student learning. It therefore makes sense for districts to begin
to implement assessment programs and training beyond the
requirements and limits of state