Some Criteria for Intelligent Accountability Applied to Accountability in New Zealand


Terry Crooks
Educational Assessment Research Unit
University of Otago
Box 56, Dunedin, New Zealand

Paper presented at the annual conference
of the American Educational Research Association,
Chicago, Illinois, 22 April 2003,
within Session 36.011 - Accountability from an International Perspective.

Some Criteria for Intelligent Accountability

Because this is the first paper in this session, it is appropriate to begin by considering why we have accountability processes and what sorts of criteria might be used for evaluating their performance. After attempting to do that, I will describe what I regard as the main accountability mechanisms for primary (elementary) and secondary education in New Zealand and discuss how they are performing. I recognize that some accountability mechanisms are aimed as much at students as at teachers or administrators, but for simplicity here I will focus on accountability applied to teachers and administrators.

It is not difficult to think of several possible rationales for accountability mechanisms in education:
o believing that many teachers and schools will only do their job properly if tightly directed and carefully monitored;
o believing that a minority of teachers and schools are performing poorly, and need to be identified and either removed from their roles or persuaded to improve their performance;
o believing that teachers and schools are generally well intentioned and professional in their work, but that it is helpful to them to have unequivocal external guidance about the goals they should be aiming towards;
o believing that teachers and schools will find it helpful to have a systematic source of well informed feedback about their work.

These purposes range from a distrustful, policing vision of accountability to a more collegial, formative vision. Most accountability systems explicitly or implicitly encompass more than one of these purposes, and some could be seen to include elements of all of them. None of the purposes can be rejected out of hand, but I think it is worthwhile to evaluate them further and perhaps suggest some priorities, and some principles to help achieve those priorities.

Each year, the British Broadcasting Corporation chooses an eminent scholar to deliver a series of lectures called the Reith Lectures. Last year, the Reith Lecturer was Cambridge University philosophy professor Onora O'Neill (O'Neill, 2002). Her third lecture was about accountability, and towards the end of it she commented:

Perhaps the present revolution in accountability will make us all trustworthier. Perhaps we shall be trusted once again. But I think that this is a vain hope - not because accountability is undesirable or unnecessary, but because currently fashionable methods of accountability damage rather than repair trust. If we want greater accountability without damaging professional performance we need intelligent accountability. [my emphasis]

Drawing on other parts of her lecture series, on what I think we know about human learning and behaviour, and on what we know about assessments and their use in accountability processes, I want to suggest six criteria for intelligent accountability. Their goal is better quality education. They are not tidily distinct from each other, and could perhaps be distilled into three or four broader criteria, but for now I believe there is value in keeping them separate.

My first criterion is that intelligent accountability preserves and enhances trust among the key participants in the accountability processes. The focus of Professor O'Neill's whole lecture series was on the erosion of trust in our societies. She saw the challenge of trust rather like a tight-rope walking act: too much trust can be dangerous, but so can too little. In lecture 1 she said:

Each of us and every profession and every institution needs trust. We need it because we have to be able to rely on others acting as they say that they will, and because we need others to accept that we will act as we say we will. The sociologist Niklas Luhmann was right that 'A complete absence of trust would prevent [one] even getting up in the morning.'

Education is an enterprise built on the interaction of human beings: administrators, teachers, students and parents. The chances that it will be successful must be greater if the participants trust each other. Accordingly, when evaluating accountability mechanisms a prime consideration should be the extent to which they foster or undermine trust: between teachers and students, between teachers and administrators, between educators and key education agencies, between educators and politicians, and between educators, parents of school students, and the wider community.

A second criterion for intelligent accountability is that it involves participants in the process, offering them a strong sense of professional responsibility and initiative. This, of course, contributes to trust. The seminal work of Royce Sadler (1989) on formative assessment stressed that the active involvement of students in self-assessment was a vital component of formative assessment with students. Wynne Harlen and Mary James (1996, p. 7) summarized the point nicely:

…students have to be active in their own learning (teachers cannot learn for them) and unless they come to understand their own strengths and weaknesses, and how they might deal with them, they will not make progress.

A similar message surely applies to the assessment of teachers, administrators and schools.

It appears that the key characteristic that distinguishes professionals from non-professionals is that professionals accept the primary responsibility for the quality of their work. In other words, to a large extent they are self-assessing and self-regulating. Self-regulation involves the ability to control or manage one's own work and to muster the willpower to keep working towards high standards of performance - what Lyn Corno (1992) described as "volition".

If our accountability measures sideline school leaders or teachers, making them feel like pawns in a game that is controlled by someone else, their sense of professionalism is threatened, and this in turn could be expected to undermine a key resource: their intrinsic motivation. To quote Onora O'Neill again:

I think that many public sector professionals find that the new demands damage their real work…Each profession has its proper aim, and this aim is not reducible to meeting set targets following prescribed procedures and requirements.

A third criterion for intelligent accountability is that it encourages deep, worthwhile responses rather than surface window dressing. In his powerful article in the March 2000 Educational Researcher, Bob Linn gave numerous examples of evidence that test-based accountability regimes had produced substantial short-term gains in test scores on the criterion measures selected, but little or no gain on what seemed to be valid alternative measures of the intended educational outcomes. He and others have presented evidence that educators and students, faced with high-stakes accountability tests, often choose approaches designed to achieve good results on the particular measure. These range from teaching narrowly to the specifications and approaches of the particular tests to strategically manipulating who takes the tests. Intelligent accountability, on the other hand, would promote deep, high quality learning in the domain to be assessed: the sort of learning that should have long-term payoff on any appropriate outcome measure. Deep learning takes time and focus, and is undermined by overemphasis on short-term goals.

My fourth criterion for intelligent accountability is closely related to the third: that it should recognise and attempt to compensate for the severe limitations of our ability to capture educational quality in performance indicators. Again, the review by Bob Linn (2000) provided very clear evidence that single indicators are highly vulnerable to distortion. Measurement professionals know that the adequacy of sampling of behaviour is a crucial issue in educational measurement. This operates at multiple levels: individual tasks, collections of tasks (tests or test subscales) intended to evaluate performance in a domain, and for accountability purposes the selection of domains from the full collection of desired educational outcomes that make up the intended school curriculum.

We know that quite subtle changes in how a question is asked can result in large changes in the apparent performance of students. In research conducted by the Assessment of Performance Unit in the UK in the 1980s, different versions of several tasks were administered to large randomly equivalent samples of 11 year olds. The tables below show the different versions of three tasks, with the percentages of students getting the correct answer.

3 added to 14 makes _____ [97%]

What number is 3 more than 14? [67%]

What number is 3 bigger than 14? [54%]

Which of these sums up how well students understand the task of adding 3 and 14?
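To make the size of the wording effect concrete, the three success rates reported above can be compared directly. The following is only a minimal sketch in Python: the figures are those quoted for the three versions of the task, while the variable and function names are illustrative and not part of the original research.

```python
# Percent of 11 year olds answering correctly, as reported above
# for three wordings of the same addition task.
success_rates = {
    "3 added to 14 makes _____": 97,
    "What number is 3 more than 14?": 67,
    "What number is 3 bigger than 14?": 54,
}


def wording_spread(rates):
    """Gap, in percentage points, between the easiest and hardest wording."""
    return max(rates.values()) - min(rates.values())


spread = wording_spread(success_rates)
print(f"Spread across wordings: {spread} percentage points")
```

A gap of this size on what is nominally the same piece of arithmetic suggests that no single version can safely be read as the measure of students' understanding.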

In New Zealand's National Education Monitoring Project, the following two tasks were administered to the same group of about 460 year 8 (12 to 13 year old) students in two different sessions of the 1995 science assessments (Crooks & Flockton, 1996). The success rate on the multiple-choice version was much lower than the success rate on the version where the students used physical resources and were videotaped working individually with a teacher.

Which statement explains why daylight and darkness occur on Earth?
a. The Earth rotates on its axis. [30%]
b. The Sun rotates on its axis. [5%]
c. The Earth's axis is tilted. [5%]
d. The Earth revolves around the Sun. [60%]

I want you to use the torch for the Sun and the globe for the Earth to show how we have day and night. [68% of year 8 students held the torch stable and rotated the globe].

When we aggregate assessment tasks (items) to assess a whole domain, the problems at the level of individual tasks are joined by the problems of adequately assessing the broader domain. The result is often severely constrained by cost considerations, by choices of task format, by difficulties with assessing some parts of the domain in a short time and a highly standardized way, and by pressures to achieve high internal consistency in a test of a domain that really is quite varied. As a result of these pressures, the test is likely to achieve a quite limited sampling of the domain.

In our accountability systems, a broader sampling issue becomes prominent. Usually, these systems select certain indicators as the important ones to be focused on: often reading, mathematics and science. Each of those tests is a limited sample of its domain, but now we have important judgments and decisions being made with the weight entirely placed on these areas, taking no account of performance in other curriculum areas, in the development of learning skills and attitudes, and in the development of citizenship. The results can be highly misleading. For instance, one teacher, school, or school district may choose to focus almost entirely on the assessed areas, while another might maintain a very balanced curriculum. If their test results are comparable, they may be seen as doing equally well (or badly), without taking any account of differences of performance in other learning outcomes not covered by the tests.

The National Assessment of Educational Progress in the United States aims to provide a broad picture of educational outcomes nationally, but has been constrained by political pressure to focus frequently on a few priority areas and touch others quite rarely. The state programmes of higher stakes testing of whole cohorts of students usually are much less wide-ranging, and therefore potentially more misleading as accountability indicators.

Onora O'Neill (2002) sums up my extended points very nicely in her third Reith lecture:

In theory again the new culture of accountability and audit makes professionals and institutions more accountable for good performance. This is manifest in the rhetoric of improvement and rising standards, of efficiency gains and best practice, of respect for patients and pupils and employees. But beneath this admirable rhetoric the real focus is on performance indicators chosen for ease of measurement and control rather than because they measure accurately what the quality of performance is.

In the end, the new culture of accountability provides incentives for arbitrary and unprofessional choices.

A fifth criterion for intelligent accountability is that it provides well-founded and effective feedback that promotes insight into performance and supports good decision making about what should be celebrated and what should be changed. Any good coach knows that while part of their role is judgmental - applauding good performance and criticising poor performance - that role has to be complemented by guidance on what needs to be improved and how to go about it. Judgment without help is a poor accountability model.

One of the key decisions taken when we designed New Zealand's National Education Monitoring Project was that more than half of the assessment tasks would be released each time we released a report on student performance (Crooks & Flockton, 1993). Furthermore, the reports are released with a commentary from a group of educators who have tried to identify the good news, the concerns, and some suggestions for action to address the concerns. Our intention is to help educators to understand in some detail what students can and cannot do, so that they can think about how their teaching can best be focused to help students improve in areas shown to be weak. By retaining a higher proportion of undisclosed items and concentrating our reporting on domain performance rather than individual task performance, we might have had a politically more powerful indicator, but one less likely to influence teaching of the areas being assessed. The intensive involvement of teachers in administering and scoring the assessments also strengthens the feedback link between the assessment and teaching practice.

In providing feedback on performance in an accountability system, there are three familiar options: comparisons with others (normative), comparisons with pre-defined standards (criterion-referenced or standards-based), and comparisons with past performance (sometimes called ipsative). For formative purposes, comparison with past performance is the clear winner. Every teacher, school and school district has the possibility of doing better, and evidence that they are doing better is highly motivating. An appropriate standard can be found for each individual - a target that is within reach with substantial effort.
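The difference between the three reference frames can be illustrated with a small sketch. All of the numbers below are invented purely for illustration, and the function names are hypothetical; nothing here is drawn from any actual accountability system.

```python
from statistics import mean

# Hypothetical data, invented for this sketch only.
school = {"this_year": 58, "last_year": 50}
peer_scores = [70, 65, 75]   # comparison group of other schools
standard = 60                # pre-defined performance standard


def normative(score, peers):
    """Difference between a score and the mean of a comparison group."""
    return score - mean(peers)


def criterion_referenced(score, required):
    """Whether a score meets a pre-defined standard."""
    return score >= required


def ipsative(score, previous):
    """Change relative to the same school's own past performance."""
    return score - previous


print("Normative:", normative(school["this_year"], peer_scores))
print("Meets standard:", criterion_referenced(school["this_year"], standard))
print("Ipsative:", ipsative(school["this_year"], school["last_year"]))
```

In this invented case the school sits below its peers and below the standard, yet has improved on its own previous result. Only the third comparison gives it evidence of progress, which is the formative point being made.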

My sixth criterion is that as a consequence of the accountability process, the majority of participants are more enthusiastic and motivated in their work (or at least not less enthusiastic and motivated).

When, fifteen years ago, I tried to synthesise research about the impact of assessment practices on students (Crooks, 1988), I planned to focus on cognitive factors. Several months of intensive reading of the literature in this area forced me to reconsider: I could not avoid the conclusion that impacts on motivation were of crucial importance, and indeed that motivation is of crucial importance in education. A couple of years later, I found this wonderful quote, which I have used ever since.

The chief impediments to learning are not cognitive. It is not that students cannot learn; it is that they do not wish to. If educators invested a fraction of the energy they now spend on trying to transmit information in trying to stimulate the students' enjoyment of learning, we could achieve much better results. (Csikszentmihalyi, 1990, p. 115)

In my view, this statement applies equally well to the effect of accountability processes on educators. There is now an enormous literature on motivation in education. Deborah Stipek (1996) has provided a good summary. It is clear that a highly desirable quality in education is intrinsic or continuing (Maehr, 1976) motivation. I am suggesting that we should constantly monitor all of our educational processes, including accountability processes, for their impact on the continuing motivation of the key participants. A key issue seems to be achieving a good balance between intrinsic and extrinsic motivation. It is unrealistic to expect participants in our education system to be intrinsically motivated all of the time. Even the best students have subjects that they only do because the teacher tells them to, or because a failing grade or low mark would be too embarrassing to bear. Most of the dedicated teachers who get enormous pleasure from seeing students blossom under their care also want to be given recognition for the effectiveness of their teaching. Extrinsic motivation is not bad, but it is not a sure winner: it needs to be applied sparingly and sensitively, and seen as a less powerful and less desirable alternative to intrinsic motivation. Intrinsic motivation should be treasured, protected and cultivated as much as is possible, because of the quality of work that it tends to foster.

Some of the politicians and administrators responsible for accountability processes are operating within an extrinsic motivation perspective. Indeed, the concept of assessment driven instruction fits within such a perspective. If educators feel too much pressure of extrinsic motivation through an accountability system, there is considerable risk to the quality and extent of their motivation as educators.



Accountability in Primary and Secondary Education in New Zealand

In the remainder of my paper, I will describe and make some evaluative comments about accountability systems in New Zealand primary and secondary schools. In doing so, I will make significant use of material from two recent papers: a description of educational assessment practices in New Zealand schools (Crooks, 2002a) and a discussion of links between educational goals, learning process and assessment procedures (Crooks, 2002b).

A Brief Description of the New Zealand School System
New Zealand is in the South-west Pacific Ocean, about 1600 kilometres east of Australia. It is more than 1600 kilometres long, and 450 kilometres wide at its widest part. Much of the country is rugged or mountainous. New Zealand's population is approximately 4 million, with 85 percent of the population residing in urban areas. About three-quarters of the population are of European origin, predominantly from the British Isles, with the indigenous Maori making up about 15 percent, people from other Pacific islands about 6 percent, and people of Asian origin about 4 percent. Because of more youthful age profiles, the latter three groups contribute about 20, 8 and 6 percent, respectively, of the students in New Zealand schools.

School attendance is compulsory in New Zealand from age 6 to age 16. However, it is normal for children to be enrolled in early childhood care and education well before their fifth birthday, to enter the school system on or soon after their fifth birthday, and to continue their formal education past their sixteenth birthday.

Primary schooling (Years 0-8) usually lasts seven and a half to eight and a half years, with students completing primary schooling at age 12 or 13. Secondary schooling normally lasts for up to five further years (Years 9-13). More than 80 percent of students remain until year 12 and about 55 percent to year 13, with 3 to 4 percent continuing to year 14 to try to upgrade their qualifications.

Because much of New Zealand is sparsely populated, many primary schools are small. For instance, 35 percent of the 2200 schools for year-4 students have fewer than twelve year 4 students, and 48 percent of the 1600 schools for year-8 students have fewer than twelve year 8 students.
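The percentages above can be converted into approximate numbers of schools. The sketch below assumes that the totals of 2200 and 1600 schools can be treated as exact, although the text gives them only as round figures, so the results are rough estimates. The function name is illustrative.

```python
def approx_school_count(total_schools, percent):
    """Approximate number of schools represented by a percentage of a total."""
    return round(total_schools * percent / 100)


# Schools with fewer than twelve students at the year level in question.
small_year4_schools = approx_school_count(2200, 35)
small_year8_schools = approx_school_count(1600, 48)

print("Small year 4 schools (approx.):", small_year4_schools)
print("Small year 8 schools (approx.):", small_year8_schools)
```

On these figures, roughly 770 schools at each of the two year levels have fewer than twelve students in the year group.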

State schools are governed by Boards of Trustees. Each Board includes the principal, a staff member elected to represent the school staff, and several trustees elected by the parents of current students. Boards of state-funded integrated schools have additional members appointed to help preserve the special character of these schools (such as the religious heritage associated with the school). Boards may co-opt additional members.

Each Board of Trustees is responsible for the operation of its school, within constraints imposed by its signed agreement (charter) with the national Ministry of Education. This agreement includes rules and guidelines that apply to all schools, but may also include components to reflect the specific goals of each school. Compliance with the agreement is monitored through external audit by the Education Review Office (which will be discussed later in this paper).

Prior to 1989, administration of schools was more centralised. Regional Education Boards worked closely with school inspectors employed by the national Department of Education to control key decisions for each school, including the selection of the principal and teachers (for primary schools only), the provision and maintenance of buildings and other resources, and professional development programmes for school personnel. The current system devolves most of these responsibilities to individual Boards of Trustees, with school principals having a more crucial role.

Accountability Associated with School Curriculum Requirements
In parallel to the changes in school administration, school curricula have been extensively restructured since the late 1980s. Prior to 1990, curricula evolved slowly, with leadership from experienced specialists in the Department of Education and heavy involvement of teachers in developing successive drafts and trying them with students. The syllabi that resulted from this process were not highly prescriptive, but were often supported by nationally developed resources that teachers could use. In 1991, a new government and Minister of Education introduced sweeping changes modeled on the curriculum and assessment changes made in the late 1980s in England and Wales. These moves towards tighter content and performance standards must be seen as key accountability measures (Linn, 2000).

The rationale for these changes ostensibly was to improve student learning through better designed and more focused teaching and assessment programmes, and greater assurance of continuity of learning when students moved from one school to another. A more jaundiced interpretation was that the changes reflected a lack of trust in teachers and a wish to restrict their freedom through tighter specification of what they would teach. It is interesting to note that the reforms of school administration in the later 1980s purported to give local communities greater control over the education of their children, yet much of this control was then withdrawn through tighter specification of school curricula and their enforcement through legislation and the activities of the Education Review Office (Codd, McAlpine & Poskitt, 1995).

Operating under considerable time pressure, working groups of curriculum experts and teachers identified several strands for each curriculum, and within each strand a substantial number of achievement objectives. The objectives were then placed into eight levels, representing the planned progress of students over their years of primary and secondary education. The first five levels were spaced approximately two years apart, with the last three levels each intended to represent one year of normal progress. After rapid development of the first four curricula (mathematics, science, English, and technology), primary school teachers were expressing considerable concern at the pace of curriculum change with which they needed to cope. Further curriculum documents were released to a slower timetable, and changes to the national education guidelines for schools offered schools greater flexibility in curriculum implementation, particularly for the first four years of school where renewed priority was given to literacy and numeracy.

These curriculum changes of the last ten years have seen a considerable tightening of curriculum expectations for schools and teachers. Pressure to implement the fairly detailed progressive structures of the new curricula, with their strands, levels, and achievement objectives, largely has come through the reviews of schools conducted by the Education Review Office, supplemented by nationally funded but regionally implemented professional development contracts. Paradoxically, while curriculum expectations had been tightened, teachers were initially offered few resources to support their implementation of the new curricula. More recently, substantial numbers of new resources have been prepared and distributed.

There is significant debate about the merits of these curriculum changes of the last ten years. On one hand, they are seen as helping teachers by offering them more structured guidance for their teaching, protecting students from teachers who are not very capable of planning good programmes without detailed guidance, and increasing the continuity of learning programmes for students moving between schools. Counter arguments are that many teachers feel that they and their students are on a treadmill of curriculum requirements, that the large list of achievement objectives encourages shallow rather than deep learning, and that the loss of the trust and creative freedom that teachers previously enjoyed has undermined their sense of professionalism.

These latter concerns are not restricted to New Zealand. Australian academic John Smyth (2002) commented recently that:

…it is clear that there is a gradual leaching out of innovative practices as schools and teachers become less inclined to engage in risk-taking activities (code for progressive teaching), and instead opt for safe practices, within a wider culture of conformity.

He went on to say that:

The effect of all this on teachers has been: an intensification of their work; a qualitative shift in the nature of that work in the direction of management and accountability; a lack of time to reflect on what they do; embracing imposed reforms in superficial ways; and a general feeling of drowning and "needing a big straw just to get to the surface."

David Almond, the prizewinning English author of Skellig and Kit's Wilderness, who is also a teacher, made these comments in his 1999 Carnegie Medal acceptance speech:

There's an arrogance at work: the arrogance that we know exactly what happens when someone learns something, that we can plan for it, that we can describe it, that we can record it - and that if we can't do these things, then the learning doesn't exist. The arrogance leads us to concentrate on a particular kind of work - noses-to-the-grindstone treadmill kind of work, work that is observable, recordable and well-nigh constant.

What would the assessors and recorders have made of Archimedes splashing happily about in his bath before he yelled Eureka! What would they have made of James Watson snoring in his bed as he dreamed the molecular structure of DNA?

I don't think that things are that bad in New Zealand, but I worry about the extent to which teachers and students are on a treadmill of requirements that threaten the nature and quality of both motivation and learning. Is schooling becoming too much "work" and too little "fun"? Is it becoming too much "prescribed" and too little "choice"? Tightening curriculum specifications and introducing them at a pace that prevents primary school teachers, in particular, from having time to deeply understand them and adopt them as their own, encourages teachers to offer their students a menu of continuous skimming across the surface of knowledge.

Of course, there is a legitimate reason behind the changes: to protect students from poorly thought out teaching. In the 1970s and 1980s, New Zealand offered teachers considerable flexibility, allowing the many fine teachers to offer creative programmes that they and their students found enjoyable. This freedom, however, also probably allowed the weakest New Zealand teachers to offer seriously inadequate programmes. Do we accept a push towards uniformity in order to limit the risks of poor teaching? How much does this in turn risk undermining first the creativity of teachers, and then the energy that is associated with a sense of mission and personal ownership of that mission?

I believe there is a strong case to ease off on the vice that primary teachers are feeling squeezed by: ever more curriculum requirements to be fitted into a constant amount of time. Unless that happens, the quality of learning is seriously threatened. But we have the dilemma that all parts of the curriculum seem to have merit - that we don't want to leave any of it out. I suggest that the solution has to be to raise the quality of learning, so that there does not need to be so much repetition year after year - so many spirals in the curriculum. This would allow a focus on fewer topics at any given time, but at greater depth and with greater likelihood of solid, enduring, and enjoyable learning. While not changing the long-term goals they are working towards, teachers could also exercise greater professional freedom in selecting the experiences they would organize to help the students towards those goals.

Accountability through student assessment
During the first 100 years of the New Zealand education system, standards testing was a quite prominent aspect in our schools (H. Lee & G. Lee, 2000). Standards tests were administered by national school inspectors, and determined whether students were performing at a suitable level to proceed to the next higher level of schooling the following year. Indeed, the middle years of our school system were labeled standards 1 to 6, because of the standards testing associated with those progressions. Gradually, the role of the inspectors and standards testing decreased, until by the middle of the 20th century the vast majority of students progressed through their primary and early secondary schooling in age cohorts. Occasionally, a student would be retained in the same class level for a second year, and more rarely a child would be accelerated, but these decisions would be taken by local school personnel, in consultation with the student's family, rather than by a school inspector.

Thus for the last 50 years teachers and students have been free of national testing until the last three years of secondary schooling. That testing has, however, been a powerful accountability force. Students in years 11 to 13 attempt to obtain national qualifications through national end-of-year examinations, moderated school-based assessments, or a combination of both. The precise form of these assessments has been adjusted several times over the 50 years, while the percentage of students remaining in school until years 12 and 13 has risen dramatically. I have given a brief description of these historical changes in an earlier paper (Crooks, 2002a), and here will focus just on the accountability implications.

There is little doubt that the nature of secondary schooling in New Zealand has largely been sculpted by the national qualifications awarded through national examinations and moderated internal assessments in the final three years of the secondary school system. The prescriptions of the content and skills that will be assessed in the examinations have defined what secondary schools aim to teach in those years, and have also had a significant backwash effect downwards onto the first two years of secondary schooling, giving those years a substantial focus on early preparation for the national examinations.

A less buoyant employment market, increasing demand for tertiary education qualifications, and removal of some of the hardest edges of the national testing arrangements in the final three years of secondary education have resulted in a dramatic increase in retention of students to the highest level of secondary school. Retention of entering secondary school students to a fifth year of secondary schooling has risen over the last 15 years from about 20 percent to about 50 percent. This has forced some rethinking of curriculum and assessment arrangements. Examinations aimed at preparing and selecting students for advanced university studies no longer suit many of the students remaining at school. This has led to the development of a much wider range of subject options, and a new structure with the daunting task of overseeing and coordinating all qualifications in the upper secondary and tertiary education sectors: the New Zealand Qualifications Authority (NZQA).

Whereas the upper secondary examinations have operated on a normatively standardized basis, the NZQA is committed to the use of performance standards in all of the assessments it is responsible for implementing or monitoring. Despite this, there are still strong pockets of support for norm-referencing and heavy reliance on national examinations, particularly among secondary schools that showed up well under those assessment arrangements. Furthermore, many teachers who have found the arguments for standards-based assessment convincing have struggled to develop practices consistent with the goal. The implementation of fully standards-based assessment in national qualifications at upper secondary school is now well underway, but its implementation in 2002 for year 11 students was accompanied by many reports of difficulty, frustration, and lack of faith in the new approaches. There is less concern this year, with implementation for year 12 underway in most schools, but nevertheless the further implementation to years 12 and 13 over the next two years will be a stern test of the national level planning, the professional development arrangements for the teachers, and the resilience of the teachers and administrators.

While much of the assessment takes place through national end-of-year examinations, substantial percentages of the marks in most subjects are awarded using internal (school-based) assessments. This practice has developed over the last twenty years, and has been accelerated by the recent changes. This has meant that more balanced coverage of the learning goals of the subjects has become possible, and the students' results are not so dependent on single-day performances on paper-and-pen examinations. This has reduced the potential for the assessment regime to seriously distort the teaching and learning programmes in schools. In other words, the assessments have a fairly high level of content validity, and to the extent that they drive instruction they do not, in my opinion, seriously misdirect it.

Using these national examination and assessment results in making judgments about the performance of teachers and schools remains a contentious issue in New Zealand. Because statistics about the national examination results are publicly available school-by-school, under New Zealand's Official Information Act, many newspapers have published tables comparing the results for different schools in their region. Often these comparisons have taken no account of different school circumstances, such as the socio-economic resources of the families of the children attending each school, but in some cases more selective comparison groups have been chosen. This publicity raises the stakes for educators in secondary schools, whose work is substantially judged by the published results.

There is no national testing at any level below year 11, where it might have more serious consequences because of the wider focus of students' studies in those earlier years, and the correspondingly greater difficulty of assessing the full sweep of learning through external testing. Furthermore, there is no requirement to use standardized tests, although there are some that are in reasonably widespread use. Some political parties and interest groups have argued strongly for national testing in primary and intermediate schools to permit school-by-school comparisons at those levels, but to date this has been resisted strongly and effectively by teachers and others concerned about the direct and indirect effects of such high stakes testing. The most recent concerted effort was by the previous government, which in a Green Paper (Government of New Zealand, 1998) proposed whole cohort testing in primary schools, initially at year 6 and year 8 levels. That proposal was opposed in a large majority of the submissions on the Green Paper, and as a result was not implemented. Some educators, officials and organizations remain keen for national testing to be introduced to primary schools, believing that this would allow better information to be obtained about the work of individual schools. The difficulties of interpreting that information appropriately, and the potential backwash effects on teachers and learners, remain the most potent arguments against such a development.

Instead of implementing national testing of all students, the present government is funding the development of tools for assessing literacy and numeracy, in both English and Maori, initially for the upper years of primary education. These Assessment Tools for Teaching and Learning are being developed by a group led by John Hattie at the University of Auckland. They are supplied on CD-ROM, free of charge to all schools. They include banks of nationally normed items, calibrated by one-parameter item response modeling, together with software to assemble tests and print reports in several easily understood graphic formats. The reports are intended to identify strengths and learning needs for individuals and groups of students, and to compare the performances of individuals and groups to selected norms. Tests can be tailored to the preferences of particular schools or teachers, for instance by selecting particular curriculum strands to focus on and emphasising easier or harder items, and the selection algorithm gives preference to items not used recently. Results can be viewed by curriculum level and compared against a variety of norms, such as students attending similar schools. This flexibility in test construction and reporting considerably reduces the risk that these test results will be used for high stakes comparisons among schools. It is not yet clear how teachers and schools will use these assessment tools. Some can be expected to use them as an additional source of information for teacher judgments and school self-monitoring. Others will report the results directly to parents. The Education Review Office will have a significant influence based on the use it makes of the results in its reports on school performance.
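
For readers unfamiliar with one-parameter item response modeling, the following minimal sketch may help. It assumes the standard one-parameter logistic (Rasch) form, in which the probability of a correct response depends only on the difference between a student's ability and an item's difficulty. The function name and the numerical values are my own illustrative assumptions, and are not taken from the Assessment Tools for Teaching and Learning software or its calibration.

```python
import math


def rasch_probability(ability, difficulty):
    """Probability of a correct response under a one-parameter logistic model.

    Both arguments are on the same (logit) scale. This is a generic sketch,
    not the calibration procedure used for any particular item bank.
    """
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


# Hypothetical student of average ability attempting three hypothetical items.
for difficulty in (-1.0, 0.0, 1.0):
    p = rasch_probability(0.0, difficulty)
    print(f"difficulty {difficulty:+.1f}: P(correct) = {p:.2f}")
```

Because every item is characterised by a single difficulty value on a common scale, tests assembled from different selections of items can, in principle, still be reported against the same norms.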

Even though the large potential problems associated with national testing have been avoided to date in our primary schools, all is not sweetness and light! There are some important concerns about the assessment aspects of our primary school curricula. During most of the 1990s, the new curriculum documents, each with their several strands, eight levels, and more than 100 achievement objectives, had many teachers trying to record which objectives each student had mastered or not mastered. This was a hopeless task: there were so many achievement objectives across the seven large curriculum documents that primary school teachers were trying to implement (several hundred per curriculum level across the seven documents) that to monitor a class of children adequately against these objectives was impossible. Fortunately, in the last five years New Zealand teachers and government agencies seem to have retreated a little from the tick box approach, which often did little more than say that the teacher was attempting to teach that objective to the student. The day-to-day unpredictability in student response means that frequent, fragmentary assessments have little meaning, yet they consume time and attention that would better be devoted to higher value teaching practices, including assessment for learning, as opposed to assessment of learning (Black & Wiliam, 1998; Assessment Reform Group, 1999; Stiggins, 2001).

Accountability of Schools Through the Work of Central Agencies
Two government agencies play major roles in overseeing the work of schools. The Ministry of Education, working with the Government, defines the resources available to schools, the curriculum to be followed, and the boundaries of acceptable practice. This extends to a more specific arrangement with each school, through a charter that is agreed between the Ministry and the school, and through a new requirement for school goal setting and annual reporting against those goals. The Education Review Office is independent of the Ministry, and has the role of evaluating and reporting publicly on the work of individual schools. I will discuss each briefly.

Apart from its "front-end" role, in consultation with the government and other parties, in defining the goals and working conditions of schools, the Ministry has several key roles in monitoring and responding to school performance.

First, it has commissioned the Educational Assessment Research Unit at the University of Otago to conduct regular national monitoring of what primary school students know and can do. Since 1995, the National Education Monitoring Project (NEMP) has provided detailed national assessments of the knowledge, skills and attitudes of primary and intermediate school students at two levels: year 4 (ages 8-9) and year 8 (ages 12-13).

Nationally representative samples of approximately 500 students attempt each assessment task. A matrix sampling arrangement distributes three sets of tasks among 1500 students at each year level, so that more tasks can be used without excessive demands on each student. NEMP covers fifteen different areas of the national curriculum over a four-year assessment cycle (Flockton, 1999). About one third of the assessment tasks are kept constant from one cycle to the next. This re-use of tasks allows trends in achievement across a four-year interval to be observed and reported. The remaining tasks are released, making them available for teacher use and allowing detailed and clear reporting of students' responses. Many of the tasks are performance tasks, and heavy use is made of videotaping to record student responses. The tasks are administered to individual students or groups of four students by specially trained teachers.
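
The arithmetic of the matrix sampling arrangement can be illustrated with a minimal sketch. It assumes simple random allocation of a sample of 1500 students to three task sets; the function name and the allocation method are hypothetical and chosen only for illustration, and the actual NEMP procedures for selecting schools and students are not reproduced here.

```python
import random


def assign_task_sets(student_ids, n_sets=3, seed=0):
    """Randomly split a sample of students into n_sets groups of near-equal size.

    Each group attempts one set of tasks, so every task is attempted by
    roughly len(student_ids) / n_sets students.
    """
    rng = random.Random(seed)
    shuffled = list(student_ids)
    rng.shuffle(shuffled)
    # Deal students out round-robin so group sizes differ by at most one.
    return {s: shuffled[s::n_sets] for s in range(n_sets)}


groups = assign_task_sets(range(1500))
sizes = {task_set: len(members) for task_set, members in groups.items()}
print(sizes)  # {0: 500, 1: 500, 2: 500}
```

Under this allocation each student meets only one third of the tasks, yet every task is still attempted by about 500 students, which is the trade-off the matrix sampling design is intended to achieve.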

This is system level accountability. It identifies which aspects are improving, staying constant, or declining nationally, allowing successes to be celebrated and priorities for curriculum change and teacher development to be debated. No information is provided about individual students or schools; so this is not a high-stakes accountability mechanism, but increasing attention is being paid to the trends and performance patterns which are being revealed by NEMP, and these are influencing both educational practices in schools and the development of national education policy.

Second, the Ministry has developed a new planning and reporting policy for schools, imposed on schools by the Education Standards Act (2001). It requires schools to set specific goals for school performance, in consultation with the Ministry. These goals will most commonly, but not exclusively, be expressed in terms of how the students perform. This is done in the school charter, which includes a long-term strategic plan and an annually updated section spelling out annual targets and priorities. The aim is to raise achievement and reduce disparities for particular groups of students. Because of existing patterns of disparity for Maori and Pasifika students, the plan must give attention to their needs and to the provision of instruction relating to Maori language and culture. Each school is to monitor performance against its charter and report annually to the Ministry, including in its report "an analysis of any variance between the school's performance and the relevant aims, objectives, directions, priorities or targets set out in the school charter" (section 87, clause 2 of the Act).

This regime does not come into full operation until 2004, so just how it will be implemented and what its longer-term effects will be are matters of speculation at present. Key issues will be the extent to which the policy builds trust and an optimal motivational environment, and the extent to which schools' own priorities are given weight relative to government priorities. There is no doubt that there is merit in schools establishing goals for improvement, finding ways to monitor progress towards these goals, and reviewing their success or lack of success in making the desired improvements. Such a process is consistent with my earlier comments about the importance of self-evaluation and self-regulation, and the association of these with professionalism. A key point, though, is that people need to own such strategies - to adopt them because they believe in their value, rather than be forced reluctantly to adopt them. At present most school personnel have little sense of ownership of this new policy and see it as another bureaucratic requirement imposed on them, and one that has more suggestion of threat than promise of help.

Third, it should be noted that while it is the role of the Education Review Office to visit schools and evaluate their work, that Office has no power to require schools to change. That power rests with the Ministry of Education, which can close schools, suspend their Boards of Trustees and replace them with commissioners, or impose strict requirements on how a particular school operates. Although rarely used, these powers are accountability in its most obvious form.

The Education Review Office is the other government agency most involved in school accountability. It has changed in very significant ways since it was established 14 years ago, partly as a result of three government-requested reviews of its role and functioning.
I will comment briefly on some of the history and then outline its current approach to the evaluation of schools, which is the accountability mechanism that has the greatest influence on most New Zealand primary schools (the national qualifications structure almost certainly has a stronger influence on most or all secondary schools).

The Education Review Office was established in 1989, as one part of the reforms that placed schools administratively under individual boards of trustees and, in theory at least, removed the Ministry of Education from operational responsibility for running the education system, placing emphasis instead on its policy role (Lange, 1988). Prior to this time, the Ministry had employed Inspectors of Schools, in regional clusters, who were responsible for periodic inspections of schools, the staffing of primary schools, oversight of most of the professional development for school personnel, and the grading of teachers. Inspectors also had a very significant role in acting as advisors to schools.

In these plans, the only one of these roles to be passed to the Education Review Office was the periodic inspection of schools. This was initially conceived of as an audit function, and the name initially proposed for the office was the Review and Audit Agency. Central to its design was the idea that an agency responsible for the summative evaluation of schools could not provide advice as to how schools were to resolve the problems identified by the Agency, because this would mean that on their next review in that school they would in part be reviewing the effects of their own advice, and this would compromise their audit role. As an aside, recent problems experienced by major accounting firms involved a similar issue.

One of the consequences of this design idea was that many schools saw the Education Review Office as an agency that posed a threat and offered no help. This view was exacerbated by the first model of review adopted, Accountability (compliance) Audits. These focused on checking whether features required by legislation or school charters were present in the schools. A checklist approach was used, focusing on relatively easily checked features ranging from the display of fire exit signs to the development of school policies and evidence of paperwork related to the Government's curriculum and assessment requirements. Schools had little role in defining what features would be looked at and valued. The tone of the reports seemed to schools largely negative: positive features were treated as expected, while negative features were highlighted. Thus even schools with mostly good results received reports that had a negative tinge, and what varied was the volume of negative comment. While there was good justification for checking many of the features checked, the model failed to meet most of the criteria for intelligent accountability I proposed earlier. There often was little trust, little ownership, little attention given to deeper aspects in the work of schools, and seemingly good reward for attention to surface details. The fact that the results were published and therefore available to news media and the general public made them high stakes evaluations. An unfortunate side effect was that staff of the Education Review Office, many of whom had previously been inspectors of schools, found that they had much lower status in schools than many of the inspectors had had. They often were seen as adversaries, rather than valued (if somewhat scary) experienced colleagues and influential educational leaders. This reduced the ability of the Office to recruit successful, experienced teachers and principals.

After about four years, Effectiveness Reviews were added to the mix. These came closer to addressing the core of the educational enterprise, because they focused on what schools were helping their students to achieve. Educational effectiveness is, however, extraordinarily difficult to judge. Brief visits by reviewers, usually involving less than an hour looking at each class, provide a very limited platform for judging the effectiveness of school programmes. The expertise of reviewers is critical in this sort of work, and it is hard to escape the view that on average the reviewers were less well equipped to make these judgments than the school inspectors of ten years earlier.

In my view, no part of the work of ERO during the 1990s did as much harm to teachers' trust in ERO and to the public's trust in teachers and schools as some of the public comments of the Chief Reviewer of this era. Those picked up by the news media had a strong negative focus, usually criticizing yet another aspect or segment of the education system, and seemingly suggesting that remarkably low proportions of teachers and schools were doing a good job. This had a demoralizing effect, totally inconsistent with intelligent accountability, even though it may have gained favour for ERO with hard-line politicians, agencies and interest groups. Some of the comments did lead to very worthwhile action, notably in certain areas of the country where poor socio-economic conditions appeared to have demoralized some schools. Even these gains could, I believe, have been achieved with less public pain for hard working teachers and schools. There are appropriate ways to stir remedial action without public derogation, and without punishing all for the deficiencies of some - good teachers use such ways all of the time!

After two reviews in the late 1990s, and the end of the term of office of one Chief Reviewer and the appointment of another, the situation is steadily improving. The new review model is Review and Assist, with its focus on educational outcomes for students and factors that directly relate to these. Assistance is limited to guidance and suggestions of alternative approaches to consider, provided at the time of review. Most important of all, schools are asked to conduct a self-review and the external review places major emphasis on school targets and priorities, while not neglecting to draw attention to government goals that appear to be receiving too little priority. There is much more of a sense of negotiation in this model, and compliance has become a lesser component with the focus more on quality of teaching and learning programmes. The main concern remaining must be the quality of reviewers: that can never be high enough to allow complacency, and the Office has to overcome aspects of its previous reputation to attract staff of high caliber.

To sum up, then, New Zealand has a distinctly different accountability profile to England, Canada or the United States. There is currently little use of high-stakes testing, except in the final three years of secondary school. Recent changes in curriculum requirements have followed the outcomes-based models apparent in these other countries, but they are enforced more by the work of the Education Review Office than by testing programmes. Similar issues of curriculum overload and achieving a good balance between depth and breadth of learning apply in all of these countries, and are a long way from being resolved.
There is considerable scope for re-thinking the rationales for and forms of accountability processes to make them more intelligent, so that their ultimate effects are to enhance the quality of education. I see little point in accountability processes if they do not have formative effects alongside their summative purposes.


References

Assessment Reform Group (1999). Assessment for learning: Beyond the black box. Cambridge: University of Cambridge School of Education.

Black, P. & Wiliam, D. (1998). Assessment and classroom learning. Assessment in Education, 5, 7-74.

Codd, J., McAlpine, D. & Poskitt, J. (1995). Assessment policies in New Zealand: Educational reform or political agenda? In R. Peddie & B. Tuck (Eds.), Setting the standards (pp. 32-58). Palmerston North, New Zealand: Dunmore Press.

Corno, L. (1992). Encouraging students to take responsibility for learning and performance. Elementary School Journal, 93, 69-83.

Crooks, T.J. (1988). The impact of classroom evaluation practices on students. Review of Educational Research, 58, 438-481.

Crooks, T. J. (2002a). Educational assessment in New Zealand schools. Assessment in Education, 9, 237-253.

Crooks, T. (2002b, December). Assessment, accountability and achievement: Principles, possibilities and pitfalls. Keynote address to the annual conference of the New Zealand Association for Research in Education, Palmerston North.

Crooks, T.J. & Flockton, L.C. (1993). The design and implementation of national monitoring of educational outcomes in New Zealand primary schools. Dunedin, New Zealand: Higher Education Development Centre.

Crooks, T.J. & Flockton, L.C. (1996). Science assessment results 1995: National Education Monitoring Report 1. Dunedin: EARU.

Csikszentmihalyi, M. (1990). Literacy and intrinsic motivation. Daedalus, Spring, 115-140.

Flockton, L.C. (1999). School-wide assessment: National Education Monitoring Project. Wellington, New Zealand: New Zealand Council for Educational Research.

Government of New Zealand. (1998). Assessment for success in primary schools. Wellington, New Zealand: Ministry of Education.

Harlen, W. & James, M. (1996, April). Creating a positive impact of assessment on learning. Paper presented at the American Educational Research Association annual conference, New York.

Lange, D. (1988). Tomorrow's Schools. Wellington, New Zealand: Government Printer.

Lee, H. & Lee, G. (2000). Back to the future? Compulsory national testing and the Green Paper on Assessment for Success in Primary Schools. Waikato Journal of Education, 6, 63-86.

Linn, R.L. (2002). Assessments and accountability. Educational Researcher, 29(2), 4-16.

Maehr, M.L. (1976). Continuing motivation: An analysis of a seldom considered educational outcome. Review of Educational Research, 46, 443-462.

O'Neill, O. (2002). A question of trust. BBC Reith Lectures 2002. London: BBC.

Sadler, D.R. (1989). Formative assessment and the design of instructional systems. Instructional Science, 18, 119-144.

Smyth, J. (2002). Value the learning, not the paper pushing. New Zealand Education Review, 7(39) [October 9-15], 6.

Stiggins, R.J. (2001). The unfulfilled promise of classroom assessment. Educational Measurement: Issues and Practice, 20(3), 5-15.

Stipek, D.J. (1996). Motivation and instruction. In D.C. Berliner & R.C. Calfee (Eds.), Handbook of Educational Psychology (pp. 85-113). New York: Simon & Schuster Macmillan.