
Testing Our Children: A Report Card on State Assessment Systems

 

 

Introduction

 

Standardized tests first rose to prominence in the 1920s, the era in which the "factory model" of
education established clear dominance. They reinforced that mode of schooling, in which only a
few children received a high-quality education, and they were used to sort students hierarchically
within that model. The promise of school reform in the 1990s has been to break with that
inadequate, often harmful model of schooling. As one part of reaching that goal, assessment must
be fundamentally restructured to support high standards without standardization.

 

In this study, FairTest evaluates how well state assessment practices live up to this promise. We
have measured these practices against standards derived from the Principles and Indicators for
Student Assessment Systems, a 1995 publication by a coalition of education and civil rights
groups working together through the National Forum on Assessment.

 

In broad terms, the Principles calls for assessments that are:

 

  • grounded in solid knowledge of how students learn;
  • connected to clear statements of what is important for students to learn;
  • flexible enough to meet the needs of a diverse student body; and
  • able to provide students with the opportunity to actively produce work and demonstrate their
    learning.

What we have found is that despite nearly a decade of intensive discussions about the role and
nature of assessment, and despite some important improvements, the fundamental approach of
state testing programs has not changed. Though the labels have often been revised to
"assessment," most state programs still predominantly rely on traditional, multiple-choice tests,
and many states use them inappropriately to make high-stakes decisions.

 

Based on a detailed survey and other data sources, we conclude that two-thirds of state K-12
student assessment systems do not reach even the middle level of system quality. One-third of the
systems need a complete overhaul, and another third need major improvements if they are to
provide support for high-quality teaching and learning. The remaining third all have positive
components, but still need some improvements.

 

In two-thirds of the states, then, testing systems often impede, rather than enhance, genuine
education reform:

  • Rather than holding schools accountable for providing a rich, deep education and reporting on
    such achievement to the public, most state testing programs provide information on a too-limited
    range of student learning in each important subject area.
  • Rather than supporting and assessing complex and critical thinking and the ability to use
    knowledge in real-world situations, most state tests continue to focus too much on measuring rote
    learning.
  • Rather than making decisions about students based on multiple sources of evidence, too many
    states use a single test as a mandatory hurdle.

Since state tests powerfully affect curriculum and instruction, most state testing programs present
obstacles to developing high-quality classroom practices and fail to support strong school reform.
Some improvements can be seen in the use of writing samples (though these are often themselves
narrow) and constructed-response items (though their use remains too limited), and in more
attention to bias reduction. However, in most states, these modest changes amount to tinkering at
the edges of reform.

 

In fact, the recent tendency has been to intensify the traditional mode of testing, with higher
cut-off scores and more "difficult" exams, without changing the underlying approach. In most state
tests, "difficult" means testing student achievement in conventional academic subjects at an earlier
age, such as algebra in grade 8. The problem with this approach is not that algebra now may need
to be taught in grade 8, but that the kind of algebra tested remains predominantly the
memorization of rules and procedures, with very limited applications. This approach fails to meet
the essence of the math standards of the National Council of Teachers of Mathematics. A similar,
flawed approach can be found for every subject.

 

The negative consequences of relying on traditional tests and using them to control school reform
often seem to be the result of continued confusion over the limitations of large-scale assessments.
Unfortunately, states often fail to recognize these limitations and expect their tests to be useful in
ways they cannot be.

 

Large-scale testing programs are generally not useful in improving a student's immediate learning
process, though clearly that is what most parents hope for from assessment. As diagnostic tools,
most large-scale tests are blunt, imprecise, and often useless -- but most states claim that
diagnosis is a reason for their tests. Because most state tests do not provide any opportunity for
sustained and engaged thinking, they are poor tools for shaping or improving curriculum and
instruction -- a goal most states claim for their tests. While these exams can provide some
information to the public about what students have learned, most do not provide information
about whether students can use in their lives the things they have supposedly learned. They thus
provide limited accountability information.

 

Despite these extreme limitations of state testing programs, the cumulative effect of the multiple
uses of these tests is that the exams largely define the purpose and processes of schooling in most
states. They affect not only curriculum and instruction, but also the culture of learning, student
motivation, and the underlying conceptions of what learning is and how humans learn. Driving
school reform with traditional tests will not succeed if the nation really wants all children, not just
the children of the wealthy, to gain an education that challenges their minds and spirits, and that
assumes not only that they can learn some skills but also that they can learn to use their learning
as active participants in a democratic society.

 

There is an alternative. The Principles and Indicators calls for large-scale assessments that
combine sampling from classroom-based assessment data, such as portfolios and learning records,
with performance exams administered to samples of students. In this way, essential standards are
promoted and accountability information is gathered, while schools are encouraged to become
communities of learning that support all their students. Only one state, Vermont, approaches this
model, though elements of the assessments in a few other states are headed in this direction.

 

Fundamental assessment reform is still feasible. What is lacking is not the technical know-how,
though much remains to be learned in that domain, but the political will. The responsibility for
improving assessment programs rests first of all with policymakers -- governors, legislators,
boards of education. It rests secondly with all those who can educate, or influence, the
policymakers -- educators, parents, community and business leaders, testing experts, state
education staff, and the voting public. That makes achieving real assessment reform an education
and organizing project. Only with an informed and active community, as well as educated
policymakers, can deep reform be created and sustained, including the necessary transformation of
state assessment programs.

 

 

 

Executive Summary: State assessment systems in light of the Principles and Indicators for
Student Assessment Systems

Across the nation, state testing systems powerfully affect curriculum, instruction, school cultures,
and the quality of education delivered to our nation's children. They can either support important
learning or undermine it.

 

This study evaluates how well state assessment systems support and help improve student
learning. FairTest based its evaluation on standards derived from the Principles and Indicators
for Student Assessment Systems. This document was developed by the National Forum on
Assessment to help guide assessment reform and has been signed by over 80 education and civil
rights groups (see Appendix F). To gather data, FairTest used surveys, follow-up interviews, and
various documents (see Appendices D and E).

 

[The text of the Executive Summary is available from FairTest for $10. The full report, which
includes the Executive Summary, the state findings and the appendices, is available for $30].

 

 

State Findings

 

To evaluate the specific characteristics of state assessment programs, FairTest adapted the
Principles and Indicators to create standards and indicators appropriate for large-scale
assessment. The standards are:

Standard 1: Assessment supports important student learning.
Standard 2: Assessments are fair.
Standard 3: Professional development.
Standard 4: Public education, reporting, and parents' rights.
Standard 5: System review and improvement.

The following explains the basic purpose of each standard and indicator and why it is important,
summarizes the findings from across the states, and discusses the implications of each finding.
Forty-four states responded to the FairTest survey, providing relatively complete information for
the evaluation process. For the remaining six states, FairTest relied on other sources, which
provided substantially less data and no information at all on many of the indicators in the
standards.

 

 

 

B. Standards for Evaluating State Assessment Systems

 

Standard 1: Assessment supports important student learning.

1.1. Assessments are based on and aligned with standards.

1.2. Multiple-choice and very-short-answer (e.g., "gridded-in") items are a limited part of the
assessments; and assessments employ multiple methods, including those that allow students to
demonstrate understanding by applying knowledge and constructing responses.

1.3. Assessments designed to rank order, such as norm-referenced tests (NRT), are not used or
are not a significant part of the assessment system.

1.4. The test burden is not too heavy in any one grade or across the system.

1.5. High-stakes decisions, such as high school graduation for students or probation for schools,
are not made on the basis of any single assessment.

1.6. Sampling is employed to gather program information.

1.7. The evaluation of work done over time, e.g., portfolios, is a major component of
accountability and public reporting data.

1.8. Students are provided an opportunity to comment on or evaluate the instruction they receive
and their own learning.

1.9. Appropriate contextual information is gathered and reported with assessment data.

 

Standard 2: Assessments are fair.

2.1. States have implemented comprehensive bias review procedures.

2.2. Assessment results should be reported both for all students together and with disaggregated
data for sub-populations.

2.3. Adequate and appropriate accommodations and adaptations are provided for students with
Individual Education Plans (IEP).

2.4. Adequate and appropriate accommodations and adaptations, including translations or
developing assessments in languages other than English, are available for students with limited
English proficiency (LEP).

2.5. Multiple methods of assessment are provided to students to meet needs based on different
learning styles and cultural backgrounds.

2.6. Students are provided an adequate opportunity to learn about the assessment.

 

Standard 3: Professional development.

3.1. States have requirements for beginning teachers and administrators to be knowledgeable
about assessment, including appropriate classroom practices.

3.2. States provide sufficient professional development in assessment, including in classroom
assessment.

3.3. States survey educators about their professional development needs in assessment and
evaluate their competence in assessment.

3.4. Teachers and other educators are involved in designing, writing and scoring assessments.

 

 

 

Standard 4: Public education, reporting, and parents' rights.

4.1. Parents and community members are educated about the kinds of assessments used and the
meaning and interpretation of assessment results.

4.2. The state surveys parents/public to determine information they want on assessments and
whether assessment reports are understandable.

4.3. Reports should be available in languages other than English if a sizeable number or significant
percentage of the student population come from homes where another language is commonly
used.

4.5. Parents and/or students have the right to examine assessments, appeal assessment scores, or
challenge flawed items.

 

 

Standard 5: System review and improvement.

5.1. The assessment system is regularly reviewed.

5.2. The review includes participation by various stakeholders and evaluation by independent
experts.

5.3. The review studies how well the system actually is aligned to standards.

5.4. The review studies the impact of the assessment(s) on curriculum and instruction.

5.5. The review studies whether assessments assess critical thinking or the ability to engage in
cognitively complex work within a subject.

5.6. Reviews for assessments at grade 3 or below study whether the assessments are
developmentally appropriate.

5.7. Reviews study the impact of assessment programs on student progress and particularly the
impact of any high-stakes tests, such as high school exit exams, on graduation rates.

5.8. Reviews study the technical quality of assessments.

5.9. The state reviews local assessment practices.

5.10. Reviews help guide improvements in the assessment system that will bring the program
more in line with the Principles and Indicators.

 

 


 

C. Scoring Guide

 

The FairTest evaluation focuses on the primary characteristics described below. States' scores are
based primarily on their current programs, but on occasion changes that are currently being
implemented were considered.

 

Level 1. State assessment system needs a complete overhaul. Such a state system exhibits
three or more of the following negative characteristics:

  • Uses all or almost all multiple-choice testing;
  • Tests all students in one or more grades with a norm-referenced test;
  • Has a single exam as a high school exit or grade-promotion requirement; or
  • Exhibits generally poor performance on the other standards.

Level 2. State assessment system needs many major improvements. Such a state system has
two of the following negative characteristics:

  • Uses all or almost all multiple-choice testing;
  • Tests all students in one or more grades with a norm-referenced test;
  • Has a single exam as a high school exit or grade-promotion requirement; or
  • Exhibits generally poor performance on the other standards.

Level 3. State assessment system needs some significant improvements. Such a state system
has some positive attributes but still has one of the following negative characteristics:

  • Uses all or almost all multiple-choice testing;
  • Tests all students in one or more grades with a norm-referenced test;
  • Has a single exam as a high school exit or grade-promotion requirement; or
  • Exhibits generally poor performance on the other standards.

Level 4. State assessment system needs modest improvement. Such a state system generally
performs well across the standards, has none of the major problems described at previous levels,
but does not show all the characteristics of a model system, including use of sampling and
classroom-based assessments for accountability and public reporting.

 

Level 5. A model system. Such a state system performs well across all the standards, including
use of sampling and classroom-based assessments as significant portions of accountability and
public reporting. It may need minor improvements in some areas.

 

Not scorable. The state does not have an assessment system and does not mandate any
assessments for districts to use, or is otherwise not scorable.

 

Discussion. This scoring guide gives the most weight to Standard 1. If an assessment system does
not support high-quality teaching and learning, it should be completely overhauled. The presence
of some ameliorating characteristics, such as limited use of NRT (e.g., only one grade and subject)
or alternatives to the graduation requirement, or of some other significant positive attributes from
the other standards, can move a state up a level.
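
The scoring guide can be read as a counting rule over the four negative characteristics. The Python sketch below is illustrative only and is not FairTest's actual scoring procedure: the class StateSystem, its field names, and the functions base_level and adjusted_level are hypothetical, and each criterion is reduced to a yes/no value even though the report's evaluators judged them qualitatively. It also assumes that the one-level upward adjustment described in the Discussion applies only to Levels 1 through 3.

```python
from dataclasses import dataclass


@dataclass
class StateSystem:
    """Hypothetical yes/no summary of one state's assessment system."""
    mostly_multiple_choice: bool = False   # uses all or almost all multiple-choice testing
    nrt_for_all_students: bool = False     # norm-referenced test for all students in a grade
    single_exam_requirement: bool = False  # single exam for graduation or grade promotion
    poor_on_other_standards: bool = False  # generally poor performance on other standards
    model_features: bool = False           # sampling and classroom-based assessment used
    scorable: bool = True                  # False when there is no state system or mandate


def base_level(system: StateSystem) -> int:
    """Map the count of negative characteristics to a level (0 = not scorable)."""
    if not system.scorable:
        return 0
    negatives = sum([
        system.mostly_multiple_choice,
        system.nrt_for_all_students,
        system.single_exam_requirement,
        system.poor_on_other_standards,
    ])
    if negatives >= 3:
        return 1
    if negatives == 2:
        return 2
    if negatives == 1:
        return 3
    # No major problems: Level 5 only if the model-system features are present.
    return 5 if system.model_features else 4


def adjusted_level(system: StateSystem, ameliorating: bool = False) -> int:
    """Move a Level 1-3 system up one level when ameliorating factors exist (assumed rule)."""
    level = base_level(system)
    if ameliorating and 1 <= level <= 3:
        return level + 1
    return level


example = StateSystem(mostly_multiple_choice=True, nrt_for_all_students=True)
print(base_level(example))                         # 2 (two negative characteristics)
print(adjusted_level(example, ameliorating=True))  # 3 (moved up one level)
```

In practice the "ameliorating" judgment is the part that resists automation, which is why the sketch takes it as an explicit input rather than trying to derive it.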

 

D. STATE DATA TABLE

 

1996-97

 

STATE  level  m-c   nrt  grad test  writing  purposes
AL     1      1     1    1          1        1,4,6
AK     1      1     1    3          1        1,2,6
AZ     1      1     1                        1,2,6
AR     2      1     1    3                   1,2,6
CA*    2      2     1                        1,5,6
CO     4      2                     1,3      1,6
CT     4      2/3                   1,3      1,2,6
DE**   0      4**        3,2        1        1
FL     1      1     1    1          1        1,2,4
GA     1      1     1    1          1        1,2,3,6
HI     1      1     1    1                   1,2,6
ID     2      2     1               1        1,2
IL     3      1     1               1        1,4
IN     2/1    2     1               1        1,2,3,4,6
IA     0
KS     3/4    2                     1        1,2
KY     4/3    3     1               2        1,2,3,4
LA     1      2     1    1          1        1,2,5
ME     4      4                     1        1,2,5
MD     3      3     2    1          1        1,2,3,4,6
MA     2      1     1    3                   2
MI     3      2          4          4        1,2,3,4,5,6
MN     2      1          2          1        1
MS     1      1     1    1          1        1,2,4,6
MO^    4/3^   1                     1,3      1,2,4,6
MT     2      1     1                        1,2
NE     2      1     1                        2
NV     2      1     1    1          1        1,2,6
NH     4      2                     1        1,2
NJ     2      2          1          1        1,2,4,5,6
NM     1      2     1    1          1,2      1,2,6
NY     2      2          1          1        1,2,3,4,5,6
NC     1      2     2    1          1        1,2,3,4,6
ND     2      1     1                        1,2,4,6
OH     2      1          1          1        1,2,6
OK     2/1    1     1               1        1,2,4,5
OR     3      2                     3        1,2,6
PA     3      2                     1        1,2,3
RI     3      2/3   1               1        1,2,6
SC     1      1     1    1          1        1,2,3,4,5,6
SD     2      1     1                        1,2
TN     1      1     1    1          1        1,2,3,4,6
TX     2      1          1          1        1,2,3,4,5,6
UT     1      1     1                        1,2,5,6
VT     5      4                     2        1,2
VA     1      1     1    1          1        1,2,5,6
WA     2      2     1               1        1,2,3
WV     1      1     1    4          1        1,2,4,5,6
WI     2      2     1               1        1,2,4,6
WY+    0      4                              1

Coding and notes follow.

Coding of table

 

level = the level of the state program according to the FairTest scoring guide

1 = needs a complete overhaul

2 = needs many major improvements

3 = needs some significant improvements

4 = needs modest improvement

5 = model system

0 = no state system and no state mandate for particular district testing; or otherwise not scorable

 

m-c = multiple-choice, excluding writing assessment

1 = all/almost all m-c

2 = majority m-c

3 = minority m-c

4 = no/almost no m-c

 

nrt = use of a norm-referenced test (NRT)

1 = uses an NRT

2 = uses an NRT, but on a sampling basis

 

grad test = graduation test

1 = has a test and passing it is required for graduation

2 = has a required graduation test, but also an acceptable alternative

3 = state plans to require a graduation test but does not now have one

4 = has a graduation test, but passage is not required for diploma

 

writing = states have a writing assessment

1 = write to a prompt

2 = portfolio

3 = multiple choice

4 = anything else for writing

 

purposes = purposes for the test

1 = improve curriculum and instruction

2 = program evaluation/public reporting

3 = rewards for schools/districts

4 = sanctions for schools/districts

5 = rewards or sanctions for students other than high school graduation

6 = student diagnosis
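
As a reading aid, the codes above can be expanded mechanically. The Python sketch below is a hypothetical helper, not part of the FairTest report: the names PURPOSES, WRITING, and decode are illustrative, and the function handles only plain comma-separated cells (it does not interpret footnote markers or slashes).

```python
# Code tables copied from the "Coding of table" section.
PURPOSES = {
    "1": "improve curriculum and instruction",
    "2": "program evaluation/public reporting",
    "3": "rewards for schools/districts",
    "4": "sanctions for schools/districts",
    "5": "rewards or sanctions for students other than high school graduation",
    "6": "student diagnosis",
}

WRITING = {
    "1": "write to a prompt",
    "2": "portfolio",
    "3": "multiple choice",
    "4": "anything else for writing",
}


def decode(cell, codes):
    """Expand a comma-separated cell such as "1,4,6" into its labels.

    An empty cell yields an empty list.
    """
    return [codes[code.strip()] for code in cell.split(",") if code.strip()]


# Alabama's "purposes" cell in the state data table is "1,4,6".
for label in decode("1,4,6", PURPOSES):
    print(label)
```

For Alabama this prints three purposes: improving curriculum and instruction, sanctions for schools/districts, and student diagnosis.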

 

 

Notes:

 

Data are from the 1996-97 school year, except for Arkansas, Connecticut, Florida, Maryland,
Mississippi, and Ohio, which did not respond to the FairTest survey; data for those states are
from 1995-96.

 

In the "level" column, use of a slash (/), as in 4/3, indicates that the system is on the border; the
first number is the direction in which the state appears to be leaning. In this column, numbers
separated by a comma indicate a system whose parts (current, or current and being implemented)
require separate evaluation.
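
The slash and comma notation for the "level" column can be parsed as in the following Python sketch. This is an illustration under stated assumptions, not a tool used in the report: LevelEntry and parse_level are hypothetical names, trailing footnote markers (*, ^, +) are simply discarded, and the comma-separated input "0,2" in the usage lines is a made-up example of the notation rather than a value taken from the table.

```python
from typing import List, NamedTuple, Optional


class LevelEntry(NamedTuple):
    leaning: int              # the level the state appears to be leaning toward
    bordering: Optional[int]  # the other level when the system is on the border


def parse_level(cell: str) -> List[LevelEntry]:
    """Parse a "level" cell such as "4/3", "2", or a comma-separated pair."""
    entries = []
    # Drop trailing footnote markers, then split separately evaluated parts.
    for part in cell.strip().rstrip("*^+").split(","):
        part = part.strip()
        if not part:
            continue
        first, _, second = part.partition("/")
        entries.append(LevelEntry(int(first), int(second) if second else None))
    return entries


print(parse_level("4/3^"))  # one entry: leaning 4, bordering 3
print(parse_level("2"))     # one entry: level 2, not on a border
print(parse_level("0,2"))   # two separately evaluated parts (hypothetical input)
```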

 

In the multiple-choice ("m-c") column, use of a slash (/) indicates we could not precisely
determine the proportions of multiple-choice items used on state assessments.

 

* California pays districts to test voluntarily, mostly with NRTs (hence a 2), and has other exams
that are criterion-referenced with some constructed-response items (hence a 3).

 

** Delaware assessed only writing in 1996-97, not a full state testing program, hence a 0. Its new
program is still being designed, but it will include norm-referenced tests and a high school exit
exam (which will allow for alternatives), hence a 2.

 

^ Missouri's incoming program appears likely to score at a level 4; the current program, which
relies primarily on criterion-referenced multiple-choice items but employs sampling, rates a 3.

 

+ Wyoming assessed only employment readiness in 1996-97, and that on a sampling basis, making
it effectively a state without a state assessment system.