
Durham Research Online

Deposited in DRO:
05 February 2007

Version of attached file:


Published Version

Peer-review status of attached file:


Not peer-reviewed

Citation for published item:


Ridgway, J. and McCusker, S. and Pead, D. (2004) 'Literature review of e-assessment.', UNSPECIFIED.

Futurelab, Bristol.

Further information on publisher's website:


http://www.futurelab.org.uk/resources/publications-reports-articles/literature-reviews/Literature-Review204

Publisher's copyright statement:

This Open Access Policy allows anyone to access our text content (where we have copyright - please note the
exceptions listed below) electronically without charge, as long as certain conditions are met. Users are welcome to
download, save, perform or distribute this work electronically or in any other format, including in foreign language
translation, without written permission subject to the conditions set out in the Futurelab Open Access Licence, some of
which are as follows: * The material cannot be used for commercial gain including professional, political or promotional
uses or for any financial gain. * The material must be used in full and without alterations or amendments. * Futurelab
and the authors must be acknowledged (with the Futurelab logo - see press page for download details) as the original
source of the material and the relevant Futurelab webpage (where the material can be found) must be given in a
prominent position. It should also acknowledge that its use is subject to the terms of this licence. * Only text is covered
by this licence - no pictures, images, diagrams, moving images, sound, video, downloads of prototypes or software is to
be used. * No more than 500 copies of any one piece of work may be reproduced. * You must advise Futurelab of where
and when the work will be reproduced - please send an e-mail to info@futurelab.org.uk. For a full list of the criteria to
be met, please read the Futurelab Open Access Licence.

Additional information:

Report number 10.

Use policy
The full-text may be used and/or reproduced, and given to third parties in any format or medium, without prior permission or charge, for
personal research or study, educational, or not-for-profit purposes provided that:
- a full bibliographic reference is made to the original source
- a link is made to the metadata record in DRO
- the full-text is not changed in any way
The full-text must not be sold in any format or medium without the formal permission of the copyright holders.
Please consult the full DRO policy for further details.

Durham University Library, Stockton Road, Durham DH1 3LY, United Kingdom
Tel : +44 (0)191 334 3042 | Fax : +44 (0)191 334 2971
http://dro-test.dur.ac.uk
FUTURELAB SERIES

REPORT 10:

Literature Review
of E-assessment
Jim Ridgway and Sean McCusker, School of Education, University of Durham
Daniel Pead, School of Education, University of Nottingham
CONTENTS:

EXECUTIVE SUMMARY
PURPOSE
SECTION 1: ASSESSMENT DRIVES EDUCATION
SECTION 2: HOW AND WHERE MIGHT ASSESSMENT BE DRIVEN?
SECTION 3: CURRENT DEVELOPMENTS IN E-ASSESSMENT
SECTION 4: OPPORTUNITIES AND CHALLENGES FOR E-ASSESSMENT
GLOSSARY
BIBLIOGRAPHY
APPENDIX: FUNDAMENTALS OF ASSESSMENT

FOREWORD

I have to admit to being someone who for many years has avoided thinking about assessment - it somehow always seemed distant from my interests, divorced from my concerns about how children learn with technologies and, to be honest, just a little less interesting than other things I was working on. In recent years, however, working in the field of education and technology, it has become clear that anyone with an interest in how we create equitable, engaging and relevant education systems needs to think long and hard about assessment. Futurelab's conference 'Beyond the Exam' in November 2003 further highlighted this point, as committed and engaged educators, software and media developers came together to raise a rallying cry for a rethink of our current assessment practices.

What I and many others working in this area have come to realise is that we can't just ignore assessment, or simply see it as someone else's job. Assessment practices shape, possibly more than any other factor, what is taught and how it is taught in schools. At the same time, these assessment practices serve as the focus (perhaps the only focus in this day and age) for a shared societal debate about what we, as a society, think are the core purposes and values of education. If we wish to create an education system that reflects and contributes to the development of our changing world, then we need to ask how we might change assessment practices to achieve this.

The authors of this review provide a compelling argument for the central role of assessment in shaping educational practice. They outline the challenges and opportunities posed by the changing global world around us, and the potential role of technologies in our assessment practices. Both optimistic and practical, the review summarises existing research and emergent practice, and provides a blueprint for thinking about the risks and potential that awaits us in this area.

We look forward to hearing your response to this review.

Keri Facer, Director of Learning Research
Futurelab
research@futurelab.org.uk
EXECUTIVE SUMMARY

'E-assessment must not simply invent new technologies which recycle our current ineffective practices.'
Martin Ripley, QCA, 2004

Assessment is central to educational practice. High-stakes assessments exemplify curriculum ambitions, define what is worth knowing, and drive classroom practices. It is essential to develop systems for assessment which reflect our core educational goals, and which reward students for developing skills and attributes which will be of long-term benefit to them and to society. There is good research evidence to show that well designed assessment systems lead to improved student performance. In contrast, the USA provides some spectacular examples of systems where narrowly focused high-stakes assessment systems produce illusory student gains; this 'friendly fire' results at best in lost opportunities, and at worst in damaged students, teachers and communities.

ICT provides a link between learning, teaching and assessment. In school, ICT is used to support learning. Currently, we have bizarre assessment practices where students use ICT tools such as word processors and graphics calculators as an integral part of learning, and are then restricted to paper and pencil when their knowledge is assessed.

Assessment systems drive education, but are themselves driven by a number of factors, which sometimes are in conflict. To understand likely developments in assessment, we need to examine some of these drivers of change. Implications of technology, globalisation, the EU, multinational companies, and the need to defend democracy are discussed. All of these influences are drivers for increased uses of ICT in assessment. Many of the developments require the assessment of higher-order thinking. However, there is a constant danger that assessment systems are driven in undesirable ways, where things that are easy to measure are valued more highly than things that are more important to learn (but harder to assess). In order to satisfy educational goals, we need to develop ways to make important things easier to measure - and ICT can help.

All is not well with education. The Tomlinson Report (2004) identifies major problems with current educational provision at ages 14-19 years: there is a plethora of qualifications; too few students engage with education; the drop-out rate is scandalously high; and the most able students are not stretched by their studies. Young people are not being equipped with the generic skills, knowledge and personal attributes they will need in the future. A radical approach to qualifications is suggested which (in our view) can only be introduced if there is a widespread adoption of e-assessment.

The UK government is committed to a bold e-assessment strategy. Components include: ICT support for current paper-based assessment systems; some online, on-demand testing; and the development of radical, ICT-set and assessed tests of ICT capability. Some good progress has been made with these developments.

E-assessment can be justified in a number of ways. It can help avoid the meltdown of current paper-based systems; it can assess valuable life skills; it can be better for users - for example by providing on-demand tests with immediate feedback, and perhaps diagnostic feedback, and more accurate results via adaptive testing; it can help improve the technical quality of tests by improving the reliability of scoring.

E-assessment can support current educational goals. Paper and pencil tests can be made more authentic by allowing students to word process essays, or to use spreadsheets, calculators or computer algebra systems in paper-based examinations. It can support current UK examination processes by using Electronic Data Exchange to smooth communications between schools and examinations authorities; current processes of training markers and recording scores can be improved. Systems where student work is scanned then distributed have advantages over conventional systems in terms of logistics (posting and tracking large volumes of paper, for example), and continuous monitoring can ensure high marker reliability. Current work is pushing boundaries in areas such as text comprehension, and automated analysis of student processes and strategies.

E-assessment can be used to assess new educational goals. Interactive displays which show changes in variables over time, microworlds and simulations, interfaces that present complex data in ways that are easy to control, all facilitate the assessment of problem-solving and process skills such as understanding and representing problems, controlling variables, generating and testing hypotheses, and finding rules and relationships. ICT facilitates new representations, which can be powerful aids to learning. Little is known about the cognitive implications of these representations; however, it seems likely that complex ideas (notably in reasoning from evidence of various sorts) will be acquired better and earlier than they are at present, and that the standards of performance demanded of students will rise dramatically. Here, we also explore ways to assess important but ill-defined goals such as the development of metacognitive skills, creativity, communication skills, and the ability to work productively in groups.

A major problem with education policy and practice in England is the separation of 'academic' and 'practical' subjects. In the worst case, to be able to invent and create something of value is taken to be a sure sign of feeble-mindedness; whereas to opine on the work of others shows towering intellectual power. A diet of academic subjects with no opportunities to act upon the world fails to equip students with ways to deal with their environments; a diet of practical subjects which do not engage higher-order thinking throughout the creative process equips students only to become workers for others. Both streams produce one-handed people, and polarised societies. E-portfolios can provide working environments and assessment frameworks which support project-based work across the curriculum, and can offer an escape from one of the most pernicious historical legacies in education. E-portfolios solve problems of storing student work, and make the activity of documenting the process of creation and reflection relatively easy. Reliable teacher assessment is enabled. There is likely to be extensive use of teacher assessment of those aspects of performance best judged by humans (including extended pieces of work assembled into portfolios), and more extensive use made of on-demand tests
of those aspects of performance which can be done easily by computer, or which are done best by computer.

The issue for e-assessment is not if it will happen, but rather, what, when and how it will happen. E-assessment is a stimulus for rethinking the whole curriculum, as well as all current assessment systems. New educational goals continue to emerge, and the process of critical reflection on what is important to learn, and how this might be assessed authentically, needs to be institutionalised into curriculum planning.

E-assessment is certain to play a major role in defining and implementing curriculum change in the UK. There is a strong government commitment to high quality e-assessment, and good initial progress has been made; nevertheless, there is a need to be vigilant that the design of assessment systems is not driven by considerations of cost.

Major challenges of going to scale have yet to be faced. A good deal of innovative work is needed, coupled with a grounded approach to system-wide implementation.

PURPOSE

The purpose of this report is:

- to assert the centrality of assessment in education systems
- to identify 'drivers' of assessment, and their likely impact on assessment, and thence on education systems
- to describe current, radical plans for increased use of high-stakes e-assessment in the UK
- to describe and exemplify current uses of ICT in assessment
- to explore the potential of new technologies for enhancing current assessment (and pedagogic) practices
- to identify opportunities and to suggest ways forward
- to 'drip feed' criteria for good assessment throughout (set out explicitly in an appendix).

This report has been designed to: present key findings on research in assessment; describe current UK government plans, and likely future developments; provide links to interesting examples of e-assessment; offer speculations on possible future developments; and to stimulate a debate on the role of e-assessment in assessment, teaching, and learning.

The key findings and implications of the report are presented within the Executive Summary.
SECTION 1

ASSESSMENT DRIVES EDUCATION

1 ASSESSMENT DRIVES EDUCATION

Assessment is an integral part of being. We all make myriads of assessments in the course of everyday life. Is Jane a good friend? Which Rachel Whiteread do I like best? Does my bum look big in this? The questions we ask, and the referents, give an insight into the way we see ourselves and the world (eg Groucho Marx's 'Please accept my resignation. I don't want to belong to any club that will accept me as a member'). For aspects of our lives that are goal-directed (getting promoted, going shopping), assessment is essential to progress. To be effective, it is necessary to know something of the intended goal; in well-defined situations, this will be relatively easy, and goals will be specified clearly. In ill-defined situations, such as creative acts, and research, the goals themselves might not be well specified, but the criteria for assessing products and processes may well be.

1.1 ASSESSMENT AND EDUCATION

Assessment is central to the practice of education. For students, good performance on high-stakes assessment gives access to further educational opportunities and employment. For teachers and schools, it provides evidence of success as individuals and organisations. Cultures of accountability drive everyone to be instrumental - how do I demonstrate success (without compromising my deep values)? Assessment systems provide the ways to measure individual and organisational success, and so can have a profound driving influence on systems they were designed to serve.

There is an intimate association between teaching, learning and assessment, illustrated in Fig 1. Robitaille et al (1993) distinguish three components of the curriculum: the intended curriculum (set out in policy statements), the implemented curriculum (which can only be known by studying classroom practices) and the attained curriculum (which is what students can do at the end of a course of study). The links between these three aspects of the curriculum are not straightforward. The 'top down' ambitions of some policy makers are hostages to a number of other factors. The assessment system - tests and scoring guides - provides a far clearer definition of what is to be learned than does any verbal description (and perhaps provides the only clear definition), and so is a far better basis for curriculum planning at classroom level than are grand statements of educational ambitions. Teachers' values and competences also mediate policy and attainment; however, the assessment system is the most potent driver of classroom practice.

[Fig 1: diagram with the labels Learning, Assessment and Pedagogy. Adapted from Pellegrino, Chudowski and Glaser (2001)]
In the UK, there is a long-standing belief (eg Cockcroft 1982) that assessment systems have a direct effect on curriculum and on classroom practices. In Australia, Barnes, Clarke and Stevens (2000) traced the effects of changing a high-stakes assessment on classroom practice, and claimed evidence for a direct causal link. Mathews (1985) traced the distorting effects on the whole school curriculum of formal examinations for university entrance (now A-levels), introduced when the university sector expanded beyond Cambridge, Durham and Oxford to accommodate as much as 5% of the population. There was a perceived need for entrance tests to pre-university courses (O-levels) designed for about 20% of the population - followed by a perceived need to align all certification in the education system (notably O-levels and CSE). This linkage between assessment for university admission and the assessment of low-attaining students had a direct and often damaging impact on courses of study for lower attaining students (Cockcroft 1982).

Ill-conceived assessment can damage educational systems. Klein, Hamilton, McCaffrey and Stecher (2000) present evidence on the 'Texas Miracle'. Here, scores on a rather narrow test designed by the State of Texas showed very large gains over a period of just four years. This test is used to determine the funding received by individual schools. Unfortunately, scores on a national test which supposedly measured the same sort of student attainment were largely unchanged in the same time interval. So scores on narrow tests can rise, even when underlying student attainment does not. The 'Texas Miracle' was used in the election campaign of President Bush, as evidence of his effectiveness as a governor in raising educational standards.

Linn (2000) points to an underhand method sometimes used by incoming superintendents of school districts to show the effectiveness of their leadership. Most commercially available multiple choice tests of educational attainment have a number of 'parallel' test forms, designed to measure the same knowledge and skills in the same way, but with slightly different formats (so '12 men take six days, how long will six men take?' becomes '12 men take six days, how long will four men take?'). These tests are designed in such a way that student scores on two parallel forms would be the same (plus or minus measurement error). Test designers do this so that school districts can change the test form every year, in order that tests measure the underlying knowledge and skills, not the ability to memorise the answers to specific questions. Linn (2000) gives an example where an incoming superintendent decides to use a new test form and also chooses to use this same test form in successive years. The result is a steady increase in student scores simply because of poor test security - students are taught to memorise answers. It appears that the superintendent has worked miracles with student attainment, because scores have gone up so much. However, when students are tested on a new parallel form, and have to work out the answers and not rely on memory, then scores plummet. So the high reputation for increasing student performance is built upon deliberate deceit. This is bad for teachers and students, and bad for public morality.

High-stakes assessment systems define what is rewarded by a culture, and
therefore the knowledge that is valuable. It is unsurprising that high-stakes assessment has a profound effect on both learning and teaching. Decisions about assessment systems are not made in a vacuum; the educational community in the UK (but not universally) is involved in the design of assessment systems, and these decisions are usually grounded in discussions on what is worth knowing, and in the practicalities of teaching different concepts and techniques to students of different ages.

1.2 THE IMPACT OF ASSESSMENT ON ATTAINMENT

An extensive literature review by Black and Wiliam (2002) showed that well designed formative assessment is associated with major gains in student attainment on a wide range of conventional measures of attainment. This result was found across all ages and all subject disciplines. Topping (1998) reviewed the impact of peer assessment between students in higher education on writing, and found large positive effects. A major literature review commissioned by the EPPI Centre (2002) showed that regular summative assessment had a large negative effect on the attainment of low-attaining students, but did little harm to high-attaining students. These studies provide strong evidence that good assessment practices produce large performance gains. These gains are amongst the largest gains found in any educational treatments. Similarly, poor assessment systems have negative - not neutral - effects on the performance of weak students. It follows that when we consider the introduction of e-assessment, we should be aware that we are working with a very sharp sword.

1.3 ICT AND ASSESSMENT

ICT perturbs the links between learning, teaching and assessment in a number of distinct ways:

1 ICT has changed the ways that research is conducted in most disciplines. Linguists analyse large corpuses of text; geographers use GIS systems; scientists and engineers use modelling packages. Everyone uses word processors, databases and spreadsheets. Students should use contemporary research methods; if they do not, school-based learning will become increasingly irrelevant to understanding developments in knowledge. Assessment should reinforce good curriculum practice. We are approaching a bizarre situation where students use powerful and appropriate tools to support learning and solve problems in class, but are then denied access to these tools when their knowledge is assessed.

2 ICT can support educational goals that have been judged to be desirable for a long time, but hard to achieve via conventional teaching methods. In particular, ICT can support the development of higher-order thinking skills such as critiquing, reflection on cognitive processes, and 'learning to learn', and can facilitate group work, and engagement with extended projects; ICT competence is itself a (moving) target for assessment.

3 New technologies raise an important set of questions about what is worth learning in an ICT-rich environment; what can be taught, given new pedagogic tools; and how assessment
systems can be designed which put pressure on educational systems to help students achieve these new goals. If we ignore these important questions, we run the risk that e-assessment will be designed on the basis of convenience, with disastrous consequences for educational practice.

1.4 ON THE NATURE OF SUMMATIVE AND FORMATIVE ASSESSMENT

We should distinguish between summative and formative assessment, which are different in conception and function. In principle, it is easy to distinguish between them. Summative assessment takes place at the end of some course of study, and is designed to summarise performance and attainment at the time of testing; high-stakes, end of schooling assessment - such as GCSE - provides a good example. Formative assessment takes place in mid-course, and is intended to enhance students' final performance; comments on the first draft of an essay provide an example.

Summative and formative assessments differ on a number of dimensions. These include:

Consequences: summative assessment is often highly significant for the student and teacher, whereas formative assessments need not be.

Exchange value: summative assessments often have a value outside the classroom - for certification, access to further courses, and careers; formative assessment usually has no currency outside a small group.

Audience: summative evaluations often have a large audience: the student and teacher, parent, school, employer and educational system. Formative evaluation can have a small audience; perhaps just the student and teacher (and parent in younger years).

Mendacity quotient: in summative assessment, students are advised to focus on things they do best and hide areas of ignorance; in formative assessment, it is more sensible for students to focus on things they understand least well.

Agency: summative assessment is often done to students, perhaps without their willing participation. Formative assessment is often actively sought out by the student; good formative feedback depends on student engagement in the process of revision.

Validation methods: summative assessment is often judged in terms of predictive validity - are students who got A grades more likely to get top grades in college (but see Messick 1995)? Formative assessment might be judged in terms of its usefulness in undoing predictive validity - what feedback can we give to students with C grades, so that they perform as well in college as anyone else?

Quality of the assessment: for summative assessment, the assessment method should achieve appropriately high standards of reliability and validity; for formative assessment, reliability and validity are negotiable between teacher and student.

Resources required: the nature of summative assessment can be influenced by considerations of cost and time. In
terms of cost, the estimation of the cost of testing is often done very badly, especially in the USA. There, it is common for cost to be equated with the money paid for the test and its scoring, not the real cost, which is the opportunity cost, measured in terms of the reduction in time spent learning which has been diverted to useless test prep. Formative evaluation should be an integral part of the work of teaching, so estimation of cost focuses naturally on opportunity costs - just what is an effective allocation of teaching and learning time to formative evaluation? In terms of time, for summative assessment time is easy to measure (so long as useless test prep is counted in); again, formative assessment is an integral part of teaching.

Knowledge and the knowledge community: summative assessment is explicit about what is being assessed, and ideas about the nature of knowledge are shared within a wide community; with formative evaluation, ideas about the nature of knowledge might be negotiated by just two people.

Status of the assessment: in summative assessment, the assessment can be ignored by the student; formative assessment simply isn't formative assessment unless the student does something with it to improve performance.

Focal domain: it is useful to distinguish between cognitive, social and emotional aspects of performance. Summative assessment commonly focuses on cognitive performance; formative assessment can run wild in the social and affective domains.

Theory dependence: summative assessment rarely rests on theory; formative assessment is likely to be theory-genic as participants discuss progress, what is known, how to learn and remember things, and how best to use evidence.

Tool types: summative assessment commonly uses timed written assessments where the structure is specified in advance, and which is scored using a common set of rules. Tests are often designed to discriminate between students, and to put them into a rank order in terms of performance. Formative assessment commonly uses a variety of methods such as portfolios of work, student draft work, student annotations of their work, concept mapping tools, diagnostic interviews and diagnostic tests. Each student is their own referent - comparison with other students may not be useful, and is often harmful to learning.

1.4.1 Reflecting on summative and formative assessment

Despite the differences highlighted here, the two sorts of assessment have many areas of overlap:

- a student can change their study methods on the basis of an end-of-year examination result (summative assessment used for formative purposes)
- summative evaluation of students can provide formative evaluation for teachers, schools and educational systems
- formative assessment always rests on some sort of summative assessment - feedback and discussion must rest
  on some assessment of the current state of knowledge
- some summative assessment should include the ability to benefit from formative assessment - learning to learn is an important educational goal, and should be assessed, formally
- summative assessment (eg of student teachers) should include the ability to provide formative assessment.

1.5 SUMMARY OF SECTION 1

Assessment lies at the heart of education. Assessment systems exemplify the goals and values of education systems. High-stakes assessment systems have a direct influence on classroom practices. Any discussion of assessment raises important questions about what is worth knowing, the extent to which such knowledge can be taught, and the best ways to support knowledge acquisition.

Well designed assessment systems are associated with large increases in student performances; frequent testing and reporting of scores damages weaker students. Badly designed high-stakes assessment systems can have strong negative consequences for students, communities and societies.

In this section, we distinguish between summative assessment (assessment of learning) and formative assessment (assessment for learning), and compare their characteristics.

ICT has changed the ways that academic work is done; this should be reflected in the tools used in education for both learning and assessment. Bizarre current practices where ICT is an integral part of learning, but where students are denied access to technology during assessment, must be reformed as a matter of urgency. Skills in ICT are essential for much modern living, and so should be a target for assessment.
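The test-reuse effect described in section 1.1 (Linn 2000) can be made concrete with a toy calculation. The sketch below is illustrative only: the function name, the 55% attainment figure and the yearly memorisation rate are assumptions chosen for the example, not data from Linn or from this review. It assumes that memorised items are always answered correctly and that all other items are answered at the student's underlying attainment level, so that a new parallel form measures underlying attainment alone.

```python
def score_on_reused_form(true_attainment, memorised_fraction):
    """Expected proportion correct on a test form that has been seen before.

    Hypothetical model: memorised items are always answered correctly;
    the remaining items are answered at the underlying attainment level.
    """
    return memorised_fraction + (1 - memorised_fraction) * true_attainment


TRUE_ATTAINMENT = 0.55      # assumed: underlying attainment never changes
MEMORISED_PER_YEAR = 0.10   # assumed: extra share of items memorised each year

for year in range(5):
    memorised = min(1.0, year * MEMORISED_PER_YEAR)
    reused = score_on_reused_form(TRUE_ATTAINMENT, memorised)
    # A new parallel form contains no memorised items, so it reports true attainment.
    print(f"Year {year}: reused form {reused:.1%}, new parallel form {TRUE_ATTAINMENT:.1%}")
```

Under these assumptions the reported score climbs every year although attainment never moves, and it falls back to the underlying level as soon as a new parallel form is introduced.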
SECTION 2

HOW AND WHERE MIGHT


ASSESSMENT BE DRIVEN?

2 HOW AND WHERE MIGHT ASSESSMENT BE DRIVEN?

There is a comforting belief that decisions about education and education systems are made within those systems, and that outside agents - notably foreign outside agents - have little or no influence on internal affairs. This has been true in the UK for a long time, but has not been true in countries which (for example) make use of UK examinations to certify students. If we are to explore plausible scenarios about the future impact of ICT on assessment, it is necessary to take account of drivers of change. Here, we consider technology, globalisation, the rise of mass education, problems of political stability, current government plans, and likely government plans, as drivers of educational change and, in parallel, of likely changes in assessment systems.

2.1 TECHNOLOGY AS DRIVER OF SOCIAL CHANGE

Technology is a key driver of social change. Technology has transformed the ways we work, our leisure activities, and the ways we interact with each other. The use of the web is growing at an extraordinary rate, and people increasingly have access to rich sources of information. Metcalfe's law states that the value of a network rises dramatically as more people join in - its value doesn't just increase steadily. The capability of computer hardware and software continues to improve, and features are being added (such as high quality video) which make computer use increasingly attractive, and well suited to supporting human-human interactions. The web is an increasingly valuable resource which is becoming progressively easier to use, and is attracting users at an increasing rate.

Technology is ubiquitous: as well as computers in the form of desktops and laptops, there has been an explosion of distributed computer power in the form of mobile phones which are also fully functioning personal digital assistants (PDAs), containing features such as a spreadsheet, database and word processor. It has been estimated that there are over three billion mobile phones worldwide (Bennett 2002); as before, this number is growing very fast, and new phones are manufactured with an increasing range of features.

Technology as a driver has a number of likely effects on assessment. New skills (and so new assessments) are needed for work and social functioning, which require fluent use of ICT; technology has had a profound effect on many labour intensive work practices, many of which resemble educational assessment. The use of ICT for assessment has hardly begun, and some new technologies such as mobile phones offer great promise not only because of their ubiquity (which might solve a current problem of access which has restricted widespread use of ICT in assessment in the past), but also because new technologies have become a natural form of communication for very many young people.

2.2 GLOBALISATION

Globalisation is probably the most obvious driver of change. Significant features for the current discussion are: the mobility of capital, employment opportunities (jobs), and people. Cooperation between countries (eg in the European Union), and the pervasive influence of multinational companies also have profound social effects.


The mobility of capital and jobs has changed the profile of the job market, with new kinds of jobs being created (eg in ICT) and old ones disappearing (eg in manufacturing industries). It is very easy to export jobs and capital from the developed world to the developing world (eg by relocating telephone call centres, or by establishing factories in countries with low wage costs). For people (and economies) to be successful, they must continue to learn new skills, and to adapt to change. Retraining will often require re-certification of competence, with the obvious consequence of further assessment, and the need to design assessment systems appropriate to the new needs of employment. These are pressures for more, and more effective, systems of competence-based assessment.

Migration for work and education raises similar issues. The developed world has a need to import highly skilled workers; universities worldwide seek international students. In both cases, there is a need to certify the competence of applicants, and to reject those least likely to be effective workers, or to complete courses successfully (because of a lack of fluency in the language of instruction, for example). Financial considerations make it impractical for testing to take place in the target country, and so a good deal of testing takes place in the country supplying workers or students. Again, it is common to use competence tests which are externally mandated and designed. Language testing provides a good example; a computer-based version of the Test of English as a Foreign Language (TOEFL) has been developed which adjusts the difficulty level of the questions in the light of the performance of the candidate on the test (see www.ets.org/toefl).

For developed economies to maintain their global dominance, their economies must be geared to adding value to raw materials (or to creating value from nothing, as in the entertainment and finance industries). This requires changes in the education system which encourage creative activities, and good problem-solving ability. Employment in a post-industrial society is likely to depend on higher-order thinking skills, such as learning to learn. This requires that these thinking skills be exemplified and assessed, if they are to receive appropriate attention in school.

Cooperation between countries in Europe will have an effect on assessment systems. Currently, there is a problem that qualifications in different member states (architect, engineer) are gained after rather different amounts of training, and equip people for quite different levels of professional responsibility. This makes job mobility very difficult. The Bologna Accord is an agreement between EU member states that all universities will adopt the same pattern of professional training (typically a three-year undergraduate degree followed by a two-year professional qualification) in order to make qualifications in different member states more comparable. Convergence of course structure is likely to lead to a convergence of assessment systems, in line with the desire to increase mobility (see www.engc.org.uk/international/bologna.asp for an analysis of the impact of the Bologna, Washington and Sydney Accords on engineering).

Globalisation is having a profound effect on educational systems worldwide. In higher education, Slaughter and Leslie (1997) describe the response of universities in

several countries to academic capitalism - a global trend to view knowledge as a product to be created and controlled, and to see universities as organisations which produce knowledge and more knowledgeable people as efficiently as possible. They document the changes in university structures and functioning which have been a response to such pressures; these include greater collaboration on teaching between universities, and mutual accreditation of courses. Again, the need for comparability of course difficulty and student attainment will lead to a careful re-examination of assessment systems, and some homogenisation.

Multinational companies also drive changes in assessment practices. These companies are successful in part because of their emphasis on uniform standards; one is unlikely to get a badly cooked hamburger in McDonald's, or a copy of Excel that functions worse than other copies. This emphasis on quality control extends to job qualifications, and to standards required of workers. In fast changing markets such as technology provision, retraining workers and checking their competence to use, install or repair new equipment or software requires appropriate assessment of competence. The needs of employers for large numbers of staff who are able to use ICT effectively as part of their job have led to trans-national qualifications such as the European Computer Driving Licence (www.ecdl.co.uk). Such examples are interesting because they are set by international organisations, or commercial organisations, and in some cases (eg the Microsoft Academy programme - www.microsoft.com/education/msitacademy/ITAPApplyOnline.aspx), state-funded educational organisations must submit themselves for examination by a commercial company before they are allowed to certify student competence.

The scale on which such examinations are taken is impressive. Bennett (2002) describes the National Computer Rank Examination, China, which is a proficiency exam to assess knowledge of computer science and the ability to use it; two million examinations were taken in 2002. Tests for the European Computer Driving Licence have been taken by more than a million people.

2.3 MASS EDUCATION

Mass education has developed rapidly and recently. In the last 30 years, the percentage of the UK population being educated at university has risen from about 5% to about 40%. This puts pressures on academic systems to develop efficient assessment systems.

There is now a great deal of distance education. China plans to have five million students in 50-100 online colleges by 2005. At least 35 US states have virtual universities (Bennett 2002). (The recent failure of the E-university in the UK - www.parliament.uk/post/pn200.pdf - and of the US Open University, shows that such ventures are not always successful!) A great deal of curriculum material is delivered via a variety of technologies (the Massachusetts Institute of Technology is in the process of putting all its course material online, for example - see http://ocw.mit.edu/index.html). Over 3,000 textbooks are freely available online at the National Academy Press (www.nap.edu). The use of technology in the assessment process is a logical consequence of these developments.


2.4 DEFENDING DEMOCRACY

Problems of potential political instability provide another driver of change. The rise of fundamentalism (both Christian and Moslem) can be seen as a loss for rationalism. Electoral apathy is a threat to the democratic process. One problem for politicians is to explain complex policies to citizens. This is made difficult if citizens understand little about modelling (such as ideas of multiple causality, feedback in systems, lead and lag times of effects etc). Informed citizens need to understand something about ways to describe and model complex systems, in order that they do not give up on democracy simply because they do not understand the policy arguments being made. Understanding arguments about causality and some experience of modelling systems via ICT should be major educational goals. These goals will need to be exemplified and valued by high-stakes assessment systems, if they are to become part of students' educational experiences.

Education for citizenship has received increasing emphasis in the UK. Some of the educational goals - such as understanding different perspectives, increased empathy, and community engagement - seem intangible. However, ICT can play a role in posing authentic questions (for example via video) and could play a role in formative assessment, and perhaps in summative assessment (using portfolios).

2.5 GOVERNMENT-LED REFORMS IN CURRICULUM AND ASSESSMENT

Governments are responsive to global pressures, and analyses of the limitations of current national systems. Two current UK initiatives are likely to lead to radical changes in assessment practices, notably to increase the use of e-assessment. One is the DfES E-learning Strategy (www.dfes.gov.uk/elearningstrategy/default.stm) which maps out a tight timeline for change in current examination systems; the other is the Tomlinson (2004) Report 14-19 Curriculum And Qualifications Reform, which proposes radical changes in educational provision itself (with direct consequences for e-assessment).

The Tomlinson Report (2002) into A-level standards argued that the examinations system is operating at, or perhaps beyond, capacity. According to Tomlinson (2002), in 2001, 24 million examination scripts and coursework assignments were produced at GCSE, AS and A level. In terms of the number of students being assessed, in 2002 there were around six million GCSE entries and nearly two million children sat Key Stage tests. More students are engaging in post-compulsory education; the introduction of modular A-levels, and the popularity of AS courses, have resulted in an increase in the number of examinations taken (Tomlinson reports a growth of 158% over a 20-year period).

There is an associated problem concerning the supply of examiners, in terms of both recruitment and training. Roan (2003) estimated that about 50,000 examiners were involved in the assessment of GCSEs, GNVQs and A-levels. Continued expansion of the current examination system without some changes does not seem a viable option. ICT support for current activities, described later, might well be of benefit.

ICT-based assessment is now part of UK government policy, and will be introduced

progressively, but on a tight timescale. The DfES E-learning Strategy will be accompanied by radical changes to the assessment process, for which the Qualifications and Curriculum Authority are responsible (www.qca.org.uk/adultlearning/workforce/6877.html). Over the next five years, the following activities are planned:

- All new qualifications should include assessment on-screen
- Awarding bodies set up to accept and assess e-portfolios
- Most examinations should be available optionally on-screen, where appropriate
- National curriculum tests available on-screen for those schools that want to use them
- The first on-demand GCSE examinations are starting to be introduced
- 10 new qualifications specifically designed for electronic delivery and assessment

QCA Blueprint (2004)

The timescale for these changes is short. For example, in 2005, 75% of basic and key skills tests will be delivered on-screen; in 2006, each major examination board will offer live GCSE examinations in two subjects, and will pilot at least one qualification, specifically designed for electronic delivery and assessment; in 2007, 10% of GCSE examinations will be administered on-screen; in 2008, there will be on-demand testing for GCSEs in at least two subjects.

Good progress has been made with these developments. For example, Edexcel is carrying out a pilot scheme for online GCSEs in chemistry, biology, physics and geography with 200 schools and colleges across the West Midlands and the west of England. AQA conducted a live trial in March 2004 on 20,000 scripts (Adams and Hudson 2004); in Summer 2004, about 500,000 marks (5% of the total) will be collected; by 2007, 100% of marks will be captured electronically.

The Tomlinson Report (2004, in prep) will offer a more radical challenge to assessment practices. The Interim Report (Tomlinson 2004) identified a number of problems with the existing system. These include concerns about:

- excellence - the current system does not stretch the most able young people (in 2003, over 20% of A-level entries resulted in grade A)
- vocational training - there is an historic failure to provide high-quality vocational courses that stretch young people and prepare them for work
- vocational learning is often assessed by external written examinations, not practical and continuous assessment
- assessment - the burden on students and teachers is too high
- disaffection - our high drop-out rates are scandalous
- the plethora of qualifications - currently around 4,000
- curricula - are often narrow, overfull, and limit in-depth learning
- too few students develop high levels of competence in mathematical skills, communication, working with others, or problem-solving
- failure to equip young people with the generic skills, knowledge and personal attributes they will need in the future.


The Report proposes a single qualifications framework, based on diplomas set at four levels (Entry, Foundation, Intermediate and Advanced). Students are expected to progress at a pace appropriate to their attainment, rather than their age. Each diploma shares some common features. These require students to demonstrate evidence of:

- mathematical skills, communication and ICT skills
- successful completion of an extended project
- participation in activities based on personal interest, contribution to the community as active citizens, and experience of employment
- personal planning, review and making informed choices
- engagement in 'main learning' - the major part of the diploma chosen by the student in order to open access to further opportunities (eg in employment or education).

These recommendations are exciting and very ambitious, but deeply problematic, unless there are radical changes to current assessment systems - notably in the large-scale adoption of e-assessment. We consider ways these recommendations might be met, in Section 3.

2.6 SUMMARY OF SECTION 2

A number of drivers are shaping both assessment and ICT; these need to be taken into account in any discussion of future developments. These drivers provide conflicting pressures. The drivers considered here include the increasing power and ubiquity of ICT, and the explosion of its usefulness and use in everyday life. These provide pressures for more relevant skills to be assessed, and also provide an assessment medium which is largely unexplored. Demands for lifelong learning, for people who can innovate and create new ideas, and the needs for informed citizenship are all pressures for education (and associated assessment systems) that rewards higher-order thinking, and personal development. Conversely, drivers such as the need to retrain and recertify staff, to ensure common standards across organisations in different countries, and to allow access to well-qualified migrants for jobs and education, emphasise assessments which transcend national boundaries and which are based on well-defined competencies (and where assessment design is sometimes based on perceived commercial imperatives). These drivers require different approaches to assessment, and all require new sorts of assessments and assessment systems to be developed.

In the UK, there are a number of problems with current assessment systems. First, they serve students very badly; second, they might soon collapse under their own weight. There is now the political will to develop pervasive, high quality e-assessment on a tight timescale, aligned with current and emerging educational goals. There is also an urgent need to invent and apply new sorts of e-assessment on a large scale.

SECTION 3

CURRENT DEVELOPMENTS
IN E-ASSESSMENT

3 CURRENT DEVELOPMENTS IN E-ASSESSMENT

The UK government has embarked on a very ambitious project to extend the use of e-assessment. The issue for education is not if e-assessment will play a major role, but when, what, and how. E-assessment can take a number of forms, including automating administrative procedures; digitising paper-based systems, and online testing - which extends from banal multiple choice tests to interactive assessments of problem-solving skills. In this section, we focus on current developments in e-assessment for summative purposes that can be used across the educational system. In Section 4 we address important but less well-defined targets for e-assessment.

Before we begin this section exploring different aspects of e-assessment, we should remember some of the virtues of paper-based tests, in order that we do not become so enamoured of new technologies that we lose sight of the benefits of current assessment systems. With paper:

- all stakeholders are familiar with all aspects of the medium
- paper is robust - it can be dropped, and it still functions
- there are rarely problems of legibility
- high resolution displays are readily available
- students can take questions in any order
- users can input cursive script, diagrams, graphs, tables
- a number of equity issues have been solved - it is easy to create large fonts and to solve other access problems
- paper-based testing systems are well established - it is relatively easy to prevent candidates from copying from each other, for example
- paper is easy to distribute, and can be used in most locations
- in extreme circumstances, it is possible to copy an examination paper, and find another desk
- human judgements are brought to bear throughout the process, so the scope of questions is unconstrained.

3.1 SOME MOTIVES FOR COMPUTER-BASED TESTING

A number of justifications have been put forward for computer-based testing, and are set out below. Not all justifications apply to every use of computers in assessment.

Avoiding meltdown: it may well be impossible to maintain existing paper-based assessment systems in the face of the current growth in the number of students being tested. Scanning technologies can help.

Valuable life skills: much of everyday life (including professional life) requires people to use computers. Not using computers for assessment seems perverse.

Alignment of curriculum and assessment: there is a danger of an emerging gap between classroom practices and the assessment system. It is very common for students (and almost all professionals) to use word processors when they write; in mathematics and science, the use of graphics calculators, spreadsheets, computer algebra systems (CAS) and


modelling software is commonplace (and universal in professional practice). Assessment systems that do not allow access to these tools are requiring students to work in unfamiliar and maladaptive ways. Non-ICT-based assessment can be a drag on curriculum reform, rather than a useful driver (see Section 1.2).

On-demand testing: in many situations (for example, students engaged in part-time study; students taking courses designed to develop competencies; students on short courses) it is appropriate to test students whenever they are judged (or judge themselves) to be ready. City and Guilds tests provide an illustration; 75,000 online tests have been taken, and candidates book a test time that suits them. Saturday is the third most popular day for assessment (Ripley 2004).

Students progress at different rates: currently, the UK examination system acts as a force against differentiation in the curriculum. Summative end-of-year tests make it attractive to schools to teach year groups together and to enter them in a common set of examinations. On-demand testing would enable students to take tests such as GCSEs when they are ready, and to progress through different academic subjects at different rates. In the USA, the Advanced Placement system allows students to take university-level courses in school, be tested, and to have success rewarded by college credits - so a student might enter the second year university course, for example. The Tomlinson Report (2004) argues for a more differentiated curriculum.

Adaptive testing: in some circumstances, the group to be tested is heterogeneous - as in the case of language testing, and selection tests for employment. Systems of assessment that change the tasks taken in the light of progress so far can be useful in such circumstances. The principle is straightforward: candidates are presented with tasks of intermediate difficulty; if they are successful, the difficulty level increases; if they are unsuccessful, it decreases. This allows a more accurate estimate of the level of attainment. Adaptive tests can work well when there is a single scale of difficulty - for example in number skill, or vocabulary. They require careful development when a number of different factors affect performance (such as technical as well as problem-solving skills), and are unlikely to be useful where extended responses are required, because the adaptive system has too little to work on. Examples in the school system can be found in Victoria, Australia (AIM Online 2003), where adaptive tests of English and mathematics are used.

Better immediate feedback: candidates can often be given information immediately about success, as is the case in the tests that all trainee teachers are required to take in English, mathematics and ICT (Teacher Training Agency 2003). (This is not necessarily an advantage, if this testing method encourages an instrumental approach, where students learn in order to pass tests rather than to learn things. It could also force assessment design to focus on objective knowledge rather than the development of process skills, if immediate feedback became a requirement for all testing.) In principle, candidates could also be given diagnostic information about those aspects of performance most in need of improvement.
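The adaptive testing principle described above (begin at intermediate difficulty, move up after a success and down after a failure) can be illustrated with a short sketch. This is a simple "staircase" procedure written for illustration only; it is not the method used by TOEFL, AIM Online or any other test named here, and operational adaptive tests typically rely on more sophisticated statistical models. The function names, the nine-level difficulty scale, the ten-item test length and the deterministic simulated candidate are all assumptions.

```python
from typing import Callable, List, Tuple


def run_adaptive_test(
    answers_correctly: Callable[[int], bool],
    num_levels: int = 9,   # assumed single scale of difficulty, 1 (easy) to 9 (hard)
    num_items: int = 10,   # assumed fixed test length
) -> Tuple[List[Tuple[int, bool]], float]:
    """Present items one at a time, raising difficulty after a success
    and lowering it after a failure."""
    level = (num_levels + 1) // 2  # start with a task of intermediate difficulty
    history: List[Tuple[int, bool]] = []
    for _ in range(num_items):
        correct = answers_correctly(level)
        history.append((level, correct))
        if correct:
            level = min(num_levels, level + 1)
        else:
            level = max(1, level - 1)
    # Crude attainment estimate: mean difficulty of the items presented
    # in the second half of the test, once the staircase has settled.
    tail = history[num_items // 2:]
    estimate = sum(lvl for lvl, _ in tail) / len(tail)
    return history, estimate


def make_candidate(ability: int) -> Callable[[int], bool]:
    """Hypothetical candidate who succeeds on any item at or below their ability."""
    return lambda level: level <= ability


history, estimate = run_adaptive_test(make_candidate(ability=7))
print(history)
print(estimate)
```

After a few items the difficulty settles into oscillating around the candidate's level, so most of the test is spent on items that are informative for that candidate. With only a pass or fail signal per item, the sketch also shows why extended responses give such a system too little to work on.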

Motivational gains: there are claims (Richardson, Baird, Ridgway, Ripley, Shorrocks-Taylor and Swan 2002; Ripley 2004) that students prefer e-assessment to paper-based assessment, because the users feel more in control; interfaces are judged to be friendly; and because some tests use games and simulations, which resemble both learning environments and recreational activities.

Better exemplification for students and teachers: posting examples of work which meets certain standards can be beneficial. In South Australia, excellent student work in technology is displayed on the web (see www.ssabsa.sa.edu.au/tech/2004techsho/index.htm).

Better system feedback: having full sets of response data from students available at the time of Examiners' Reports can improve the quality of feedback. Details of questions, and parts of questions, that proved relatively difficult and easy should improve the quality of Examiners' Reports (which are based currently on examiners' experiences of a sample of scripts, and rarely on candidate success on questions and part-questions). This information will be useful both for improving the quality of questions, and for providing information to teachers about topics that have not been learned well.

Faster information for higher education: universities need assessment results in a timely fashion. UK universities receive A-level results quite late in the academic year, and engage in a frenetic process to fill places with appropriately qualified applicants when students do and do not achieve the grades that were a condition of entry. These pressures would be eased if results were delivered earlier.

Better task design: it is easier for test constructors to change tasks on the basis of information during testing and pre-testing, because of the immediacy of data collection. This can range from the rejection of items that do not function well (for example items where students who score well overall are likely to fail a particular item) to improved test design (for example, ensuring that there are a lot of items set around critical cut-off points - especially the pass/fail boundary - so that the test is most reliable there).

Cost: it is common to claim that e-assessment can save money - it is clear that online multiple choice tests can be cheap to administer and score. However, if we are to exploit the potential of ICT to improve assessment - for example by presenting simulations or video as an integral part of a test - then the costs of testing are likely to increase.

3.2 USES OF E-ASSESSMENT TO SUPPORT CURRENT EDUCATIONAL GOALS

3.2.1 Using ICT to support Multiple Choice Tests

This is a well-established technology, particularly well suited to assessing declarative knowledge ('knowing that') in well-defined domains. Developing tasks to identify student misconceptions is also possible. It is harder to assess procedural knowledge ('knowing how'). MCT are unsuited to eliciting student explanations, or other open responses. MCT have the great advantage that they can be very cheap to create and use. Some of this cheapness is illusory, because the costs


of designing good items can be high. Over-use of MCT can be very expensive, if it leads to a distortion of the curriculum in favour of atomised declarative knowledge, divorced from conceptual structures that students can use to work on the world effectively. MCT are used extensively in the USA for high-stakes assessment, and are presented increasingly via the web. For example, web-based high-stakes State tests are available in Dakota and Georgia; the Graduate Record Examination (GRE), used by many US colleges to determine access to Graduate School, is available online.

3.2.2 Creating more authentic paper and pencil tests

It makes sense to allow students access to the tools they use in class, such as word processors, and that professionals use at work, such as graphing tools and modelling packages, during testing. It makes no sense at all to always forbid students to use tools of the trade when being assessed. E-learning changes the nature of the skills required. E-assessment allows examiners to focus more on conceptual understanding of what needs to be done to solve problems, and less on telling students what to do, then assessing them on their competence in using the manual techniques required to get the answer. In Australia, the State of Victoria (www.vcaa.vic.edu.au/prep10) has a system for essay marking where students key in their responses to questions, which are then distributed electronically and marked by human markers. Computer Algebra Systems (CAS) can be used in the Baccalauréat Général Mathématiques examination in France; the International Baccalaureate Organisation (IBO) is running a CAS pilot for its Higher Level Mathematics Diploma from September 2004. In the USA, CAS can be used when taking the College Board's Advanced Placement Calculus test.

3.2.3 Using ICT to support current UK examination processes

A number of ways in which ICT can improve current examination practices are set out below.

Better school-examination board communication: Tomlinson (2002) points to existing extensive use of ICT by awarding bodies in the examination process, and argues for more use of Electronic Data Interchange (EDI) systems, which enable schools and colleges to submit examination entries and information about candidates online and to receive results automatically.

Supporting the current marking and moderation process: a challenge faced by large-scale tests that require human markers is to ensure the comparability of standards across markers, and over time for all markers during the grading process. Chief examiners create scoring rubrics to guide other markers, and there is usually a process of standardisation where markers use the scoring rubrics to score a sample of scripts, and attend a standardising meeting where standards are compared, discrepancies are discussed, and the rubric is tuned. Once markers have reached an appropriate level of marking accuracy, they mark examinations independently. Systems vary in terms of the extent of the moderation used. In some systems, scripts are sampled by chief examiners, and serious deviation from the

rubric can lead to the remarking of all the scripts sent to a particular examiner. ICT can be used to support this process. Sample scripts typical of different categories of student work can be put online, for easy reference by markers. Entry of marks can be done via templates that ensure that markers complete every section, and the tedious process of aggregating marks from different parts of the script is done automatically and without error. Data is collected in a way that facilitates rapid and detailed analysis, at the level of responses to different parts of questions, whole questions, and the distribution of test scores.

Replacing paper: in the USA (and increasingly in the UK), there is widespread use of systems where students take paper-based examinations, and the scripts are scanned electronically (this is analogous to the Optical Mark Recognition for multiple choice tests that has been available for many years). Once in this format, the documents can be sent electronically to markers, who can be working almost anywhere. These systems have a number of advantages over paper-based systems. First, with paper there are considerable problems in tracking the distribution and return of large volumes of scripts to and from markers; there are security issues in sending examination papers by post, and scripts can get lost. Second, moderation of the quality of scoring can be done easily. Pre-scored anchor papers can be sent to markers during the course of their marking, to ensure they are maintaining standards; markers who do not perform adequately can be told to take a break, or can be removed from the pool of markers. The whole process can be monitored in terms of the rate at which scripts are being marked. There is flexibility in the ways that scoring is done. Markers can be asked to score whole scripts, or individual questions. So a newly appointed marker might be sent questions judged to be easy to mark, and more experienced markers might be sent questions which require deeper subject knowledge. The reliability of scoring can be increased. Scripts judged to be around key borderlines on first marking can be sent to other markers; scripts judged to be well away from boundaries need be scored only once. Online support can be provided; markers can ask for help with specific student responses. Data is captured in a form suitable for a number of subsequent analyses.

An interesting variant of this approach that obviates the need for scanning would be to require candidates to use intelligent pens. These pens have two distinct functions. The first is to write like a conventional pen. The second is to record the pen's movements (exactly) on the page. This is done by using specially prepared stationery. Imagine you could see a small square area of a banknote. The pattern across the whole surface is never repeated, so that, given sufficient time, you could find exactly where the square is located on the note. The pen works in a similar way, to record its position on the page over the course of the examination. The pen is then connected to a computer, and all the data is downloaded. The whole student response can then be reconstructed. Clearly, this approach would have to be subjected to extensive trialling before any widespread adoption.

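The second-marking rule described above (scripts near key borderlines on first marking go to another marker; scripts well away from boundaries are scored once) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a description of any awarding body's system: the grade boundaries, the tolerance of 2 marks, the script identifiers and the function names are all hypothetical.

```python
def needs_second_marking(first_mark, boundaries, tolerance=2):
    """Return True if a first mark lies within `tolerance` marks of any boundary."""
    return any(abs(first_mark - b) <= tolerance for b in boundaries)


def route_scripts(first_marks, boundaries, tolerance=2):
    """Split scripts into those sent to a second marker and those scored once.

    first_marks maps a script identifier to its first mark (hypothetical data).
    """
    second, once = [], []
    for script_id, mark in sorted(first_marks.items()):
        if needs_second_marking(mark, boundaries, tolerance):
            second.append(script_id)
        else:
            once.append(script_id)
    return second, once


# Hypothetical grade boundaries (marks out of 100) and first marks.
boundaries = [40, 55, 70]
first_marks = {"S001": 38, "S002": 47, "S003": 71, "S004": 90}
second, once = route_scripts(first_marks, boundaries)
print(second)  # ['S001', 'S003']
print(once)    # ['S002', 'S004']
```

Widening the tolerance sends more scripts for second marking, so the choice trades marking cost against reliability at the borderlines.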

3.2.4 Online assessment: turning a GCSE paper into computer-only e-assessment

An interesting challenge is to devise ways to replace paper-based tests with ICT-based tests, and to score them automatically. Some virtues of paper-based tests are unlikely to be replicated for a number of reasons, so setting tests on-screen is likely to bring about changes in the nature of what is assessed. Here, we consider one specimen GCSE mathematics paper to illustrate the problems.

Measuring and drawing: about 10% of the marks in the paper-based assessment required the use of actual instruments (ruler, protractor, compasses). One approach for translation onto screen would be to simulate the physical instruments, eg to provide a virtual protractor that can be dragged around the screen and rotated. Another is to provide CAD or interactive geometry packages. The latter would require a substantial change to the syllabus, but could provide real benefits in terms of student learning.

Mathematical expressions: about 20% of the marks required the student to write down answers that could not be keyed in using a standard keyboard. These included fractions, division expressions, and powers.

Rough work and partial credit: almost every question in the paper format included space for rough work, and about 30% of the total marks potentially could be awarded based on this work, in the form of partial credit awarded where the final answer is incorrect (these marks are usually awarded in full if the final answer is correct). There are two distinct problems in translating this to a digital format: first, capturing the rough work, and second, allocating partial credit. Computer capture is very difficult, given current interfaces; the rules for allocating partial credit would have to be specified in very fine detail for them to be used as part of an automatic scoring routine.

3.2.5 Scoring of open responses

GCSE questions often require students to answer questions in their own way, and to explain things - scoring these responses automatically is inherently difficult. Automated scoring of open student responses is the focus of a good deal of ongoing work. A number of approaches have been taken to the problem of automatic scoring. One is based on the analysis of the surface features of the response (Cohen, Ben-Simon and Hovav 2003), such as the number of characters entered, the number of sentences, sentence length, the number of low-frequency words used, and the like. The success of such methods can be judged by comparing the correlation between computer and human judges with the correlation between scores given by two sets of human judges. Cohen, Ben-Simon and Hovav (2003) looked at the scoring of a range of essay types by humans and computer, and report that the correlation between the number of characters keyed by the student and the scores given by human judges is as high as the correlation between scores given by human judges. Nevertheless, these scoring systems do not provide a panacea. In the USA, double marking is used to ensure reliability (this is rarely done in the UK). ICT can be used to moderate human markers (and save money) - if the computer and the human disagree, the
paper is re-marked by a human. Machine-only scoring is unlikely to be useful in UK contexts, for two reasons. First is that the UK culture requires that scoring schemes be described in ways that are useful to teachers and students. Second is that the consequential validity of such scoring systems would be dire - the advice to students would be to improve their scores simply by using more keystrokes.

A second approach which could improve the quality of scoring and reduce costs is being used to assess student responses on tasks in contexts where the range of acceptable responses can be well defined, such as in short answer science tasks (eg Sukkarieh, Pulman and Raikes 2003). Here, appropriate ('the Earth rotates around the sun') and inappropriate ('the sun rotates around the Earth') responses are defined. Lists of synonyms are generated for nouns ('our globe') and verbs ('circles'), and alternative grammatical forms are defined, based on analyses of large numbers of student responses. Student responses are parsed using techniques borrowed from Natural Language Processing, and are compared with stored appropriate and inappropriate responses, using a variety of Information Extraction techniques (see Cowie and Lehnert 1996). Mitchell, Aldridge, Williamson and Broomhead (2003) describe work at the Dundee Medical School. Here, all students take the same examination at the end of every year. Academics are presented with all the responses to the same question, with the computer's judgement on the correctness or otherwise of the answer, and an estimate of the confidence of the judgement. Human scoring time is dramatically reduced, and staff report positive benefits in terms of the quality of the questions they ask, both in terms of rewriting ambiguous questions (which produce student responses that are difficult to score) and in terms of writing questions which highlight student misconceptions. This approach requires a good deal of work prior to live testing, so is well suited to situations where tasks will be used repeatedly.

In the USA, the Graduate Management Admission Test (GMAT) - used to determine access to business schools - uses automated scoring of text. Here again, the test is scored by both human and machine, to offer some sort of reliability check for the human marker.

3.3 ICT SUPPORT FOR CURRENT 'NEW' EDUCATIONAL GOALS

There is an emerging consensus worldwide on 'new' educational goals, focused on problem solving using mathematics and science, supported by an increased use of information technology (compare, for example, UK developments with those in New Zealand, www.minedu.govt.nz, and Singapore, www1.moe.edu.sg/iteducation). These new goals involve the development of higher-order thinking, and a range of social skills such as communication, and working in groups. There is an honourable tradition of assessing problem solving via the use of extended tasks, such as those developed by the APU (eg Archenhold, Bell, Donnelly, Johnson and Welford 1988). However, the computer offers some unique features in terms of representation, interaction, and its support for modelling. Here, we describe some recent developments which make use of these unique features.

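The short-answer approach described in section 3.2.5 (stored appropriate and inappropriate responses, with synonym lists for nouns and verbs) can be illustrated with a deliberately simplified sketch. It is not the method of Sukkarieh, Pulman and Raikes (2003), which parses responses using Natural Language Processing techniques; here plain phrase substitution and word order stand in for parsing. The synonym table, the function names and the three-way verdict ('appropriate', 'inappropriate', 'refer') are assumptions made for illustration only.

```python
import re

# Hypothetical synonym table: each variant maps to a canonical term.
SYNONYMS = {
    "our globe": "earth", "the earth": "earth", "earth": "earth",
    "the sun": "sun", "sun": "sun",
    "rotates around": "orbits", "goes round": "orbits",
    "circles": "orbits", "orbits": "orbits",
}
KEY_TERMS = {"earth", "sun", "orbits"}

# Stored responses, expressed as canonical (subject, verb, object) triples.
APPROPRIATE = {("earth", "orbits", "sun")}
INAPPROPRIATE = {("sun", "orbits", "earth")}


def normalise(response):
    """Lower-case the response and replace known variants with canonical terms."""
    text = re.sub(r"[^a-z ]", " ", response.lower())
    text = " " + " ".join(text.split()) + " "
    # Replace longer phrases first so that "the earth" wins over "earth".
    for phrase in sorted(SYNONYMS, key=len, reverse=True):
        text = text.replace(f" {phrase} ", f" {SYNONYMS[phrase]} ")
    return text.split()


def classify(response):
    """Return 'appropriate', 'inappropriate', or 'refer' (send to a human marker)."""
    triple = tuple(t for t in normalise(response) if t in KEY_TERMS)
    if triple in APPROPRIATE:
        return "appropriate"
    if triple in INAPPROPRIATE:
        return "inappropriate"
    return "refer"


print(classify("Our globe circles the Sun."))        # appropriate
print(classify("The sun rotates around the Earth"))  # inappropriate
print(classify("It gets dark at night"))             # refer
```

Because word order stands in for grammatical analysis, a passive construction such as 'the sun is orbited by the Earth' would be misjudged by this sketch; that limitation is one reason the systems described above rely on parsing, and on human review of uncertain cases.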

3.3.1 The development of World Class Tests

Tests were designed to identify high-attaining students in problem solving in mathematics, science and technology at ages 9 and 13 years, as part of the work on the World Class Arena (www.worldclassarena.org). Computers make it easy to present new sorts of tasks, for example tasks where dynamic displays show changes in several variables over time, or which present video of a situation which students must model. A wide variety of representations can be supported, and students can be asked to switch between them. The interactive properties of computers make them well suited to the assessment of process skills.

Using computers to give students control over how data is presented allows them to work with complex data sets of a sort that would be very difficult to work with on paper. Tasks can be set in realistic contexts, using realistic data to address problems of considerable complexity, using resources and methods that are familiar to professionals working in the relevant field. Two examples are presented here: Oxygen and Bean Lab.

Further examples of tasks can be found in Ridgway and McCusker (2003). Skills assessed include:

Understanding and representing problems: traditional educational goals such as the ability to interpret tables and graphs, and to translate information coded in one representation into information coded in another representation continue to be vital skills for mathematical and scientific literacy. Computers allow fast and reversible transformations of information from one representation to another, and students can be asked to explain the relationships between them.

Assessing process skills in science and mathematics: the desire to assess process skills is not new. Traditionally, students would be presented with tasks in laboratories, or would be required to keep logs and portfolios of their laboratory work. However, the laboratory setting can introduce elements which reduce the reliability of the assessment, such as instruments which fail to function properly, or materials whose properties are less than ideal. Students are required to physically manipulate apparatus - chance differences between students in terms of
their previous exposure to particular equipment can both reduce reliability, and add an extra cognitive load to the intellectual task being performed. In some situations, issues of health and safety arise. Some education systems are unwilling to accept teacher ratings of students for the purposes of high-stakes testing, with the result that process skills in science are not assessed at all. Computer-based assessment permits the assessment of these valuable aspects of learning science, at modest cost. A range of different process skills can be identified, which include:

- working systematically (for example, choosing tests systematically, controlling variables and recording results systematically)
- generating and testing hypotheses
- finding rules and relationships
- handling complex data
- testing solutions
- seeking completeness and rigour (in many real-world situations, exemplified by diagnosis and remediation in spheres such as medicine and industrial process control, it is important to find all of the faults in a system).

Five sets of live tests have been administered in the UK and elsewhere, each of which was preceded by extensive pre-testing. A notable result was the ease with which students interacted with computers. The affective response from students was very strong - they really enjoy working on these tasks. This might be related to the sustained challenge the tasks present, which is similar to the reported reasons why they like computer-based games (Kirriemuir and McFarlane 2004).

Students performed better on some tasks than one might expect - notably tasks that require them to reason from complex data sets (eg data with two independent variables and one dependent variable at age 9 years). We take this as a very positive sign that computers can play a leading role in the development of the skills which constitute the new educational agenda. In many aspects, student performance was poor - work characterised by guessing, too little use of systematic methods, poor hypothesis generation, and poor generalisation. On many tasks, students were able to show evidence of good reasoning skills; however, explanations were often weak. Given the earlier discussion of the impact of assessment on the curriculum, it is to be hoped that the use of e-assessment of process skills will lead to better student performance on a range of important activities.

World Class Tests focused on summative assessment in science, mathematics and technology, and used a variety of contexts, including geography and economics, as well as biology, physics, and engineering. The ideas are generic, and can be applied to many curriculum areas. On the basis of analyses of student performance on WCT, teaching modules for whole class use have been developed, targeted on weak process skills. These teaching modules provide a good deal of formative assessment, and require students to engage in reflective activities such as critiquing student work, and explaining their own solution strategies.

We discuss new educational goals that are less amenable to summative assessment - such as the ability to work in groups, to communicate, to learn to learn - in Section 4.

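One of the process skills listed above, controlling variables, lends itself to a simple automated check when a computer-based task logs each trial a student runs. The sketch below is an assumption about how such a log might be analysed, not a description of how World Class Tests were scored: the log format (one dictionary of variable settings per trial), the variable names and the function name are all hypothetical.

```python
def controlled_comparisons(trials):
    """Count consecutive trial pairs that differ in exactly one variable.

    Each trial is a dict mapping variable name -> setting (hypothetical log format).
    Returns (controlled_pairs, total_pairs).
    """
    pairs = list(zip(trials, trials[1:]))
    controlled = 0
    for before, after in pairs:
        changed = [name for name in before if before[name] != after.get(name)]
        if len(changed) == 1:
            controlled += 1
    return controlled, len(pairs)


# Two hypothetical students investigating plant growth.
systematic = [
    {"light": "low", "water": "low"},
    {"light": "high", "water": "low"},   # only light changed
    {"light": "high", "water": "high"},  # only water changed
]
haphazard = [
    {"light": "low", "water": "low"},
    {"light": "high", "water": "high"},  # both changed at once
    {"light": "low", "water": "high"},   # only light changed
]
print(controlled_comparisons(systematic))  # (2, 2)
print(controlled_comparisons(haphazard))   # (1, 2)
```

A count of this kind captures only one aspect of working systematically; it says nothing about whether the student recorded results or drew sound conclusions, which would need separate evidence.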

3.3.2 Assessing ICT at Key Stage 3

Ongoing work funded by QCA sets out to assess student attainment in ICT at age 13 years. A key principle for the design of these tests is that students should be tested on their performance on extended tasks ('create a web page about topic X for audience Y, using a particular set of resources - a database, clients accessible via e-mail, spreadsheets for planning, web page creation tools'), not on a series of sub-tasks ('use a spreadsheet to add up these numbers'). An extraordinarily ambitious goal is to present tasks and score performance entirely by computer. This is a laudable aim, and shows a government commitment to high quality e-assessment (including £20m for the project).

3.3.3 Digital portfolios

An historical legacy which bedevils the current education system in the UK is the distinction between 'academic' and 'practical' subjects. This was enshrined in the 1944 Education Act, which created grammar, technical and secondary modern schools (Tattersall 2003). Abstract thinking is important; appropriate action in context that rests on practical competence is important. Neither is much use on its own, and students should be taught to both abstract and apply. For this to become a classroom reality, assessment systems must require students to show the full spectrum of competencies in a number of school subjects. If high-stakes assessment systems fail to reward such behaviours, they are unlikely to be the focus of much work in school. E-portfolios offer a way forward.

There are three distinct uses for portfolios. The first is to provide a repository for student work; the second is to provide a stimulus for reflective activity, which might involve reflection by the student, and critical and creative input from peers and tutors; the third is as showcase, which might be selected by the student to represent their 'best work' (as in an artist's portfolio) or to show that the student has satisfied some externally defined criteria, as in some teacher accreditation systems (eg Schulman 1998). These uses are not mutually exclusive. Students may well wish to archive all their work; reflective activities and feedback from others will be based on a subset of this work; the final presentation portfolio will be selected from this corpus.

These different uses of portfolios reflect different, but not always incompatible, theories of learning. A behaviourist approach will focus on defining core competencies that are impossible to assess in timed examinations, and the need for fast and efficient feedback on student products. A social constructivist view will focus on the importance of reflection and sense making by a group (including the tutor) which will include the negotiation of educational goals.

ICT provides an opportunity to introduce manageable, high quality coursework as part of the summative assessment process. Student portfolios have been advocated for a long time, and have been used on a limited basis. From the viewpoint of assessment, the rationale for portfolios is clear: there are a number of valuable activities and attainments that cannot be assessed using the format of timed tests. The ability to create, design, reflect, modify and persevere are all
important goals of education. It is entirely appropriate to assess these processes by collecting evidence on the ability to engage in an extended piece of work, and to bring it to a successful conclusion by the creation of some product - lab report, video, installation etc. Part of the portfolio can (should) provide evidence of the range of personal skills demonstrated, perhaps under the headings suggested in the Tomlinson Report (2004): student self-awareness of themselves and the ways they learn and what they know; how students appear to, and interact with, others; thinking about possible futures and making informed decisions. A section of the portfolio in the form of a viva, or simply annotations of products where students show their attainments in these three aspects of performance is appropriate.

A number of problems are associated with portfolios and other sorts of coursework. One is the problem of storage - especially in design projects and in art. ICT can solve the problem by holding images of artefacts created. A second problem is student misbehaviour; this can have a number of forms. One is simply that work is plagiarised; another is that students create some artefact, then 'back-fill' by inventing the development process (which is often assessed as part of the final mark) post hoc. ICT can help with both of these problems by requiring the submission of images of intermediate products, with time stamps. On a more positive note, the ability to store and work with images (photographs, video) is likely to make teaching of the design process more effective. Devices such as mobile phones with in-built cameras and facilities for audio recording make it easy to document the evolution of ideas and artefacts. This facility serves a number of functions. First, it simplifies the documentation of the development of work - reducing the 'busy work' students might otherwise have had to engage in. The process of documentation via a portfolio of work supports student reflections on processes - on decisions made deliberately, those forced by circumstances, and those that 'just sort of happened'. Digital images are easy to manipulate and present. Student presentations of work on the development of artefacts are easy, once images are captured digitally.

In some subjects, such as design and technology, and art, extended projects are at the heart of the discipline. The use of e-portfolios maps directly onto current conceptions of the domain, and offers practical solutions to some common problems (eg Kimbell 2003). This work is important, and is likely to be applicable on a large scale in the near future. A very large number of institutions have made use of portfolio systems; the American Association for Higher Education (AAHE) Portfolio Clearinghouse (www.aahe.org/teaching/portfolio_db.htm) provides an online searchable database of profiles of electronic portfolio projects and resources in higher education, and is a valuable source of ideas.

3.4 SUMMARY OF SECTION 3

There are a number of exciting developments in the use of e-assessment for both summative and formative purposes, and several UK developments are at the leading edge, worldwide. In the UK, the government has decided that extensive use will be made of e-assessment. Some of these developments are a response to current problems
associated with increases in the volume of assessment; some reflect a desire to improve the technical quality of assessment (such as increased scoring reliability), and to make the assessment process more convenient and more useful to users (by the introduction of on-demand testing, and fast reporting of results, for example). E-assessment also makes it possible to assess aspects of performance that have been seen as desirable for a long time, such as the assessment of process skills, and the efficient handling of student portfolios. Using e-assessment to test student ICT capability represents an extremely ambitious goal of presenting holistic tasks to assess performance, rather than a collection of short tasks which are symptoms, rather than exemplars, of ICT capability. Nevertheless, some major challenges face these new developments. Paper tests have a number of advantages in terms of the quality of the image presented, and the variety of ways in which students can respond; automatic scoring of responses will be very difficult, and in some cases impossible to achieve via computer.

A complete reliance on paper-based assessment has a number of drawbacks; first is that such assessments are increasingly inauthentic as classroom and professional practices embrace ICT. Second is that such assessments constrain progress, and have a negative effect on students who have to learn (just for the exam) how to do things on paper that are done far more effectively with ICT. A third major constraint is that current innovative suggestions for curriculum reform, which rely on student portfolios for their implementation, will be impossible to manage on a large scale without extensive use of ICT.

E-assessment is a stimulus for rethinking the whole curriculum, as well as all current assessment systems. E-assessment provides a cost-effective way to integrate high quality portfolio assessment with externally set and marked tests, in any combination. This makes it likely that there will be significant changes in the structure of summative assessments, because of the range of student attainments that can now be assessed reliably. There is likely to be extensive use of teacher assessment of those aspects of performance best judged by humans (including extended pieces of work assembled into portfolios), and more extensive use made of on-demand tests of those aspects of performance which can be done easily by computer, or which are done best by computer.

SECTION 4

OPPORTUNITIES AND CHALLENGES


FOR E-ASSESSMENT

4 OPPORTUNITIES AND CHALLENGES FOR E-ASSESSMENT

Here, we consider some issues which need to be addressed as a matter of urgency. First are some speculations on how we might assess process skills - essential but often ill-defined educational goals. It will be important to establish the value of such assessments as part of large-scale summative assessment, in contrast to their roles as potentially useful components of formative assessment. It will also be important to establish the appropriate scale of such assessments, and their locus in the curriculum, in terms of educational gains and manageability. Second, we consider the problems of going to scale. Large scale innovation - especially where computers are involved - does not always run smoothly.

4.1 ASSESSING PROCESS SKILLS

4.1.1 Assessing metacognition

As we move towards a knowledge-based society, the development of metacognitive skills increases in importance, and they become educational goals in themselves. Currently, these goals are ill-defined in that there is not yet a consensus in the educational community about their exact nature or how they can be assessed. Goals can be described, and recognised when they are achieved, but exemplification needs further work, and a general sharing of ideas. Ridgway, Swan and Burkhardt (2001) exemplify this process as part of 'Assessing Mathematical Thinking' in materials developed for the US National Institute for Science Education (www.wcer.wisc.edu/nise/cl1).

Here, examples of metacognition are given under four headings: knowing how to use knowledge; analysing and improving cognitive processes; supporting reflection and critical skills; and assessing competence with different thinking styles.

Knowing how to use knowledge: the web offers great opportunities and pitfalls for assessment. Most obviously, the existence of the web means that successful use of it should be an educational target. Expertise in navigation, such as learning how to bookmark useful sources, and how to refine searches are useful skills, but are subsidiary to a set of meta-knowledge skills about the nature of knowledge - how it is constructed, presented, and used by different people for different purposes. There is a need for students to develop sophisticated theories-in-action about knowledge. These theories should include accounts of the nature of knowledge - its generation, and the various functions it serves (including its use as just another rhetorical device!). Students also need to know about their own knowing - what they do and do not know, how they acquire, lose and change their own knowledge - and how they control their cognitive processes when solving problems.

We address the first goal elsewhere in the discussion on assessing competence in ICT. The latter goal is illustrated by Lord Armstrong's remark 'power is knowing how to use knowledge'. The common corruption to 'knowledge is power' misses Armstrong's point almost entirely. Our educational ambitions should be to encourage students to become sophisticated users and creators of knowledge. Good formative assessment should contribute to students' development; web-based sources can
be part of both formative and summative assessment of these key elements of student performance.

Key aspects of performance relate to the exploration of the origins of the source, analysis of its qualities as a source, and its relation to a wider set of information. Successful formative assessment helps students to internalise questions and question styles. For summative assessment, we expect students to ask questions about the nature of the information source. The originator can be important - dietary advice from Kellogg's should be treated more cautiously than advice from the British Medical Association. Who created it? For what purpose? From what perspective was this written? The poor quality of much of the information on the web can be a virtue, pedagogically, because students see the sense in challenging the authority of any source, and can do so easily by considering alternative sources (eg Downes and Zammit 2000).

Skills in analysing documents in terms of their style and their use of particular rhetorical devices, and in creating documents for different audiences and in different writing genres, are being developed and used in English (and sociology and philosophy at university level). Again, the ubiquitous use of web sources provides both a rationale for the value of these analytic and creative activities, and a rich source of resources for assessment purposes.

The web makes it easy to compare and contrast different interpretations of the same events by different news providers, and by the same provider over time. In terms of assessment, students can be asked to compare and contrast different presentations, and to describe the evolution of a news event over time. This requires analysis of the way that evidence is selected, and the ways that events are reconstructed over time.

A further key aspect of knowledge use is the ability to relate a particular source to a larger body of knowledge. It will always be important for learners to develop rich schemas of knowledge - facts, skills, and procedures and their interconnections - as the basis for judging the value or otherwise of putative new information, or a theoretical account. In science, a simple example is a digital image of a mammal with horns and claws. Students are expected to say it is most unlikely, because horns are associated with herbivores, and claws with carnivores. At a higher level of abstraction, students might be asked to resolve famous conflicts in scientific ideas, in terms of what was known at the time. For example, Lord Kelvin - probably the most distinguished scientist of his day - argued against the theory of evolution, on the grounds that the timescale was impossible. The core of the Earth is largely molten, but if the Earth were really the millions of years old needed for evolutionary processes to work, it would have cooled down long ago. What didn't he know (or is his criticism valid)? The web is a source of information that challenges current knowledge - students can be asked to relate breaking research to a wider set of knowledge. The recent scare over the MMR vaccine (and the damage that will be done to children by an under-analysed and over-publicised piece of research) provides an example.

A vivid example of summative evaluation which requires both a deep knowledge
schema and powerful skills in knowledge deconstruction and reconstruction is
provided by a final undergraduate examination at Goldsmiths University on
the art history course, where students are presented with two pictures, side
by side, which they are to compare and contrast. They are required to name
the artist, deconstruct the iconography, and interpret each work in its
historical context. This could be presented via ICT, and could be extended
to film, and to other contexts.

Another approach to supporting reflection about knowledge acquisition and
creation is to incorporate assignments that require a reflective account of
the process of creating some artefact (object or written). Students can be
asked process questions about sources of information - ways to find good
sources (perhaps in the form of advice to someone with a similar job to do),
and about the sources themselves. They can be asked about problems faced,
and the ways they were solved, in these meta-learning essays.

Open-web examinations offer a parallel to open-book examinations. One virtue
of such examinations is that they are more authentic than conventional
examinations, in that, outside educational contexts, one rarely has to
answer a substantive question without any resources. They allow the examiner
to set a broader range of questions, because students are not expected to
retain all the relevant information in memory. An adaptive strategy for
success on such examinations is to develop meta-knowledge of the whole area,
and to index sources very carefully. A large information bank with no index
is of little use. Compare the preparation necessary for this sort of
examination with the cramming strategy that can be effective when preparing
for conventional examinations. There, the danger is that students hold
information in a relatively temporary state for the purpose of the
examination, then forget the information once the examination is over.
Open-web examinations are likely to have desirable consequential validity -
that is to say, are likely to lead to desirable learning (and learning
strategies). The unpopularity of open-book examinations (which probably
arises because they require serious thought about the subject matter) is
likely to apply equally to open-web examinations. The potential for
fraudulent behaviour by students (such as e-mailing for advice in situations
where the purpose of testing is to assess the ability to search the web, or
searching the web when the purpose of the assessment is to assess networking
skills) means that student activities will need to be constrained in
appropriate ways. Nevertheless, open-web assessment should be explored
further.

Analysing and improving cognitive processes: interactive whiteboards can
provide the facility to work as a whole class on a problem or simulation,
then to replay and critique the sequence of actions. This provides the
opportunity to discuss seemingly abstract concepts such as strategy and
exemplify them with concrete examples. Analogies with the analysis of games
(eg tennis) can make the activity seem natural in class (of course, analysis
of on-screen video of ongoing games is a specific example of the sorts of
analyses being described here). The long-term intention is to help students
develop metacognitive skills that will be applicable in a wide variety of
situations. By looking at different solution attempts, students can be asked
high-level questions such as
'how do you solve problems of this sort?' which can be assessed more
formally by tasks such as 'write some guidance for someone else, that will
help them to solve problems like this one'. A requirement for summative
e-portfolios could be that sample reflective analyses of processes be
included.

These techniques have great potential when the focus is on the social and
emotional education of students. Topics raised in personal and social
education such as approaches to bullying can be approached by presenting
students with video vignettes, and asking them to describe situations, the
interactions that take place, and the feelings of participants. Parallel
information channels (provided by the participants) can provide students
with feedback on the correctness or otherwise of their insights. At a lower
level, assessing children's ability to identify the emotions being expressed
in different faces can give insights into their developmental state (or, in
more extreme cases, into pathological states such as autism). If summative
information is appropriate, it can be based on the analysis of such
vignettes.

Supporting reflection and critical skills: an important higher-order skill
is the ability to review and improve work. This can be done via paper and
pencil (for example by writing on every third line, and changing pen colour
at every revision cycle), but is made very easy by the use of ICT, with
facilities such as 'track changes' in MS-Word. Students can be asked to
provide examples of their ability to improve work on the basis of others'
and their own suggestions, and of their ability to critique the work of
others. Another way to assess critical thinking is to require students to
annotate work to show where they meet the assessment criteria.

Courtenay (personal communication, 2004) described an activity designed to
support creative writing in English in a night class comprised of 30
non-native speakers at an early stage of learning English. Courtenay focuses
on creation and critique, and seeks to spend as much time as possible
interacting with his students. Each student writes online, and when they are
satisfied with their composition, it is posted to a shared server. Every
student is required to offer constructive comments on five compositions, and
to revise their own writing in the light of five sets of comments. The
teacher is able to tour and coach individuals as they write. With little
effort, this approach could be extended to providing summative assessment.
Students could be required to submit their comments on others' writing to be
evaluated, and could provide evidence of their ability to use comments on
their own work. An assessment system like this would reinforce rather than
distort the educational ambitions of the teacher.

Peer assessment is attractive for a number of reasons. (Topping's 1998
review demonstrated that it is associated with gains on conventional
performance measures, in higher education.) Students can be asked to create
far more pieces of work than could be marked by a single tutor. It can avoid
the problem that as a class size gets bigger, the load on the tutor
increases directly, along with the time taken to provide feedback to
students. Students must understand criteria for assessment, and must acquire
a range of higher-order skills, such as abstracting ideas, detecting errors
and misconceptions, critiquing and suggesting improvements, if
they are to engage in peer assessment. Peer assessment is a fact of life
outside education, so peer assessment is far more authentic than some forms
of assessment such as multiple choice tests. Possible disadvantages relate
to the possibility of an enhanced workload on students, unreliable feedback,
and biased feedback.

A number of commercially available systems have been designed to support
peer assessment. Calibrated Peer Review (Chapman and Fiore 2001) was
designed to support the peer assessment of essays in molecular science, but
has been applied in a variety of subjects, and with students across the
education system. Students write short essays, and are asked questions
designed to foster their critical thinking. Students are presented with
three calibration essays to grade, and must demonstrate their competence
before they progress. Two of the essays contain errors and misconceptions
which students must identify and correct. Students are also asked questions
on style and grammar. The scores they give to the assignments are compared
with official scores, and a calibration report is created for the student
and the tutor. If performance is inadequate, more instruction is provided,
and the student must repeat the activity. Once they have shown that they can
assess essays effectively and reliably, they are asked to grade three essays
by peers, and finally are asked to grade their own essay. The student and
the instructor receive comments and scores.

CPR is not restricted to essays in science; the idea is generic, and can be
applied to literary criticism, commentaries on a piece of art, or laboratory
reports, for example. The tutor must select the focus of the assignment,
write an exemplar answer for calibration, and select two pieces of student
work which contain interesting errors or omissions. Each of these has to be
graded by the tutor, and relevant comments have to be written. The tutor
also writes key questions on content and style. CPR is designed to overcome
the potential weakness of peer assessment in terms of unreliable assessment
(via training and moderation) and bias (via anonymity). The authors claim
considerable gains in students' ability to 'learn to learn' because their
attention is focused on abstracting ideas and arguments, describing,
analysing and assessing the quality of material, and in review. CPR also
increases the amount of writing that students do.

Doiron and Isaac (2002) have developed a novel form of online peer review
designed to complement the American College of Surgeons Advanced Trauma Life
Support Course for fourth year medical students. Their system involves
self-assessment, peer evaluation, feedback and debate. There is an inherent
problem giving large numbers of students direct experience of Emergency Room
procedures. Here, students are presented with a realistic case study, and
must prevent the patient from dying, conduct clinical tests, then request
appropriate lab work followed by diagnosis and recommendation of a
treatment. Students reflect on, and self-assess, their knowledge. They
submit a diagnosis and proposed treatment plan to the whole group. For peer
review, they are presented with two other diagnoses and treatments - one
from the tutor, prepared to contain errors, for critique. If the student
fails to detect the errors, they get individual feedback from the tutor.
Students then review live reports from
two of their peers (so three reviews are considered together). Where there
are disagreements, the two views are presented to a larger group (four to
ten students) who must all offer their own view, and debate the issue.
Similar work is being conducted on a health psychology course, and in
engineering.

Assessing competence with different thinking styles: mobile phone technology
might provide a means of assessing thinking styles via simulated group work.
Here, each student works in a simulated environment, where responses from
other group members are pre-specified, and some responses to the actions of
the student are pre-defined. This environment is artificial for a number of
obvious reasons - contact is via phone (or e-mail) rather than face-to-face
and the range of dynamic interactions is constrained. However, these
constraints mean that students can be assessed in relatively standardised
conditions, and sequences can be replayed for analysis and reflection as
part of formative assessment.

Analysing the ability to engage in De Bono's (2000) Thinking Hats activity
provides a concrete example. De Bono has identified a number of thinking
styles, all of which are useful when solving problems. None is effective on
its own. He argues that people differ in their preferences for these
different thinking styles, and often stick with a particular style of
thinking. In terms of group dynamics, individuals can become ego-involved
with a particular style of thinking, with negative consequences for the
productivity of the group. De Bono argues that these different thinking
styles should be made explicit, and that every group member should engage
with every thinking style in the course of group work. He suggests a formal
mechanism for this, where thinking styles are associated with hats of
different colours, and group members are invited to take particular roles -
sometimes as individuals, and sometimes as a whole group. Thinking styles
include asking about what is known or what is needed (the White Hat); saying
why an idea won't work (the Black Hat); generating ideas and alternatives
(the Green Hat); describing feelings, hunches and intuitions (the Red Hat);
managing group processes (the Blue Hat); and the optimistic advocacy of
ideas (the Yellow Hat).

Given some specific suggestions for actions via mobile phone or e-mail,
students can be asked to work in Red, Yellow and Black Hat styles; or given
a stream of (simulated) input to a conference, students can be asked to work
in Blue Hat mode. Their responses provide information on their strengths and
weaknesses working in different thinking styles. This idea is not restricted
to de Bono's framework, but is a generic idea for assessing individual
skills in group settings.

4.1.2 Assessing group projects

A valuable skill is the ability to work productively in groups. This
requires good communication skills, understanding the criteria for effective
group work, understanding different roles, the ability to assess one's own
work and the work of others, and the ability to respond positively to
formative and summative feedback. The assessment of group work is
problematic for a number of reasons: problems can be caused by 'social
loafing' and the allocation of equal marks for unequal
contributions; undesirable effects of students rating peers; and time-hungry
procedures for gathering accurate evidence on student performance.

SPARK (Self and Peer Assessment Resource Kit -
www.educ.dab.uts.edu.au/darrall/sparksite) is an academic open source
project designed to support the effective evaluation of group work, that has
been used in a variety of contexts in higher education. It requires a clear
specification of the tasks to be performed by the group and the assessment
criteria. Students reflect on group processes during the performance of the
task, and rate all the group members, and themselves, against the criteria
provided. The tutor monitors the work of the group, grades the product of
the group work, uses SPARK to convert group marks into individual marks, and
provides individual summative and formative feedback (eg that a student
rates their own contribution to the group far higher than other group
members do). Evaluations of SPARK by its authors in a variety of higher
education contexts have been positive (eg Freeman and McKenzie 2002).

4.1.3 Assessing creativity

Creativity involves the production of a new idea or artefact that is judged
by some community to be of value. Many writers have made a distinction
between analytic and creative thinking. Analytic thinking has been
characterised as: linear, rational, logical, conscious and deliberate.
Creative thinking has been described as: parallel, unconstrained, illogical,
unconscious, and chaotic. Creativity became a bandwagon for education in the
1960s, in part as a healthy corrective to an over-emphasis on Intelligence.
A problem with some of these early proponents of creativity (eg Getzels and
Jackson 1962) was that they accepted many of the philosophical assumptions
of the Intelligence movement, and many of their methods, but were
incompetent in their use. The result was a movement that was based on some
good ideas, but which was poorly theorised, and supported by flawed
evidence. Just as there are many styles of analytic thinking, that are
coloured and improved by knowledge in particular domains, and different ways
to represent information, so too are there many styles of creative thinking,
again, influenced by knowledge and experience in a variety of domains.
Creativity (as defined above) requires an intimate interplay of creative and
analytic thinking. It is important to develop creativity, and to evaluate
the products of creative thinking. Creativity should be evaluated by an
analysis of product, and by an analysis of student processes, using methods
described earlier (notably, tracking the design process, and reflective
accounts on this process).

It can be difficult to obtain good paper-based accounts of student processes
and results after engaging with an extended piece of work. This can be a
desirable activity for a number of reasons. First, it requires students to
translate knowledge from one form to another, and to consider the needs of a
different audience - notably from a static written form whose primary
audience is the teacher, to a visual and dynamic form for some predefined
audience, who will have a range of understandings about the topic in hand.
Second, it is inherently valuable as a skill. Digital cameras and
whiteboards make it easy for students to show their work (which might be on
paper, in the form of
manipulatives, or some artefact that has been created) and to explain what
they have done, justify their answer, and describe the design decisions they
took.

4.1.4 Assessing communication skills

Mobile phones could be used more extensively for assessment. A simple
example would be to use mobile phones for the aural comprehension aspect of
language learning. Current practices of using an analogue tape recorder at
the front of a classroom are inherently unfair. The quality of the sound
will differ as a function of the tape machine used; the sound intensity at
the front of the room will be dramatically higher than at the back of the
room. Using conventional computer technology, South Australia uses MP3 files
to test language comprehension (see www.ssabsa.sa.edu.au) - clearly, good
practice.

The eVIVA project (www.qca.org.uk/adultlearning/downloads/eviva_project.pdf,
www.eviva.tv) uses phones as the medium for oral testing with
portfolio-based Key Stage 3 ICT assessment. Students can book a test
session, and so can have (almost) on-demand testing. The phones are also
used for recording voice postcards of learning milestones, and posting these
to a central website. The voice postcards can be used by a student to
support the piece of portfolio evidence which they are presenting.

As speech recognition technologies continue to improve, one can envisage a
situation where questions are posed orally by telephone, and student
responses are scored automatically. In the case of language learning, this
could be applied to elementary aspects of learning such as pronunciation, to
vocabulary, and to correcting sentence structure mistakes presented to
students. Given test technologies that support tailored testing, the phone
system could be used to provide on-demand testing of some aspects of
language use. Such systems are unlikely to be useable (in the short term at
least) for high-stakes testing, because of problems of impersonation. These
problems may be removed if effective person recognition systems are
developed and introduced on a large scale.

4.2 NATIONAL CURRICULA, NATIONAL ASSESSMENT

The Tomlinson Report (2004) addresses fundamental questions about curriculum
design and assessment, and describes a number of serious problems with
current systems. Assessment exemplifies educational goals, and has a major
effect on educational practice. Unless assessment systems are aligned with
educational goals, they will distort curriculum ambitions. There is a
general desire for more school-based assessment, and more process-based
assessment, and an insistence that current high standards of equity and
probity in the examination process are maintained. E-assessment (eg via
e-portfolios) can provide the means to empower teachers and schools, while
ensuring that high standards of assessment are met. ICT can support the
whole process of teacher preparation, and the establishment of procedures to
ensure comparability of standards across schools. School-based judgements
could be moderated by external computer-based tests. E-assessment can extend
the range of reliable assessments that can be
conducted, and so can widen the debate on curriculum and assessment design.
On-demand testing will have considerable implications for curriculum
planning. Students could take summative tests at different times, and could
progress through the curriculum at different rates.

E-assessment could reduce the damage caused by current tests. At present,
new SAT papers are created each year, and all students answer the same
questions. If the purpose of testing is to establish the performance of some
system (such as a school or an LEA), better methods could be employed. If
there were a large bank of tasks available in electronic form, and different
students received a different set of tasks, then coverage of the curriculum
could be better, and there would be no need to report individual student
scores. This would have the advantage that a larger variety of task types
could be used, and would avoid the current distortions caused by teachers
teaching to the SAT.

4.3 EVOLUTION AND REVOLUTION

Even where there is a shared vision on future curricula, there can be
considerable problems in implementation. Ridgway (1998) draws analogies
between ecological restoration and educational change, and describes the
sorts of research needed for successful change. This style is close to
research in fast-changing fields such as electronics, where discoveries and
inventions drive practice and theory, in contrast to well-established fields
where theory can lead practice. It is important to be aware that some goals
are easy to achieve from most starting points, whilst others need a good
deal of capacity building before they can be reached. It will be important
to phase the introduction of e-assessment in such a way that the load on
students, teachers, schools and systems is lower than the current assessment
load. Some barriers are discussed below.

Establishing the credibility of e-assessment: in some areas such as
competency-based assessment, the case for e-assessment is self-evident. In
other areas, reasonable sceptics will have to be convinced of its value.
They will have concerns about the construct validity of new tests (exactly
what do they measure?); the reliability of new tests in comparison with
existing tests; and the educational standards required both in relation to
current tests, and across tests such as those given on-demand in different
places and at different times. Each of these questions will need to be
addressed for each family of e-assessments, usually by means of an empirical
study.

Building system capacity: there is an urgent need to build capacity for
e-assessment, ranging from test design, through test delivery and
processing, to expertise in school. Each of these is problematic.

Task and test design: very few people have expertise in creating
e-assessments, in comparison to the large numbers of people competent to
create conventional tests. There is an urgent need to create new task types
and to explore their reliability and validity. If we do not continue to
explore, students will be faced with a set of tasks which recently were
innovative, but which are now hackneyed.
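
The task-bank idea raised in section 4.2 (different students receive
different sets of tasks, and only system-level results are reported) can be
illustrated with a short sketch. It is not drawn from any system described
in this review: the function names assign_tasks and system_summary, the
rotating allocation scheme, and the 0 to 1 mark scale are all assumptions
made purely for illustration.

```python
import random
from collections import defaultdict


def assign_tasks(student_ids, task_bank, tasks_per_student, seed=0):
    """Deal tasks from a shuffled bank in rotation (hypothetical scheme).

    Each student gets `tasks_per_student` distinct tasks, and the tasks
    in the bank are used about equally often across the cohort.
    """
    if tasks_per_student > len(task_bank):
        raise ValueError("cannot give a student more tasks than the bank holds")
    order = list(task_bank)
    random.Random(seed).shuffle(order)
    assignments = {}
    position = 0
    for student in student_ids:
        # Take the next run of tasks, wrapping round the end of the bank.
        assignments[student] = [
            order[(position + k) % len(order)] for k in range(tasks_per_student)
        ]
        position += tasks_per_student
    return assignments


def system_summary(assignments, marks):
    """Mean mark per task; `marks` maps (student, task) to a value in 0..1.

    Only task-level averages are returned, so no individual student
    score needs to be reported.
    """
    per_task = defaultdict(list)
    for student, tasks in assignments.items():
        for task in tasks:
            if (student, task) in marks:
                per_task[task].append(marks[(student, task)])
    return {task: sum(values) / len(values) for task, values in per_task.items()}
```

Under these assumptions, with 30 students, a bank of 20 tasks and 4 tasks
per student, each task is attempted by exactly 6 students, so the whole bank
is covered even though no student sees more than a fifth of it.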


Establishing technical standards: currently, there are three sets of
technical standards. We need a consensus document. The needs of students
with special needs must be addressed. Standards for monitoring the quality
of the assessments given in schools (actually a rather hostile environment
for ICT, because of the plethora of machines and operating systems), and the
procedures put in place by examination authorities need to be written, and
validated in practical settings.

ICT infrastructure: good broadband systems are needed - in particular, very
high specification systems are needed for big schools. Currently, about 40%
of primary schools, and about 100% of secondary schools have broadband
access, but not necessarily at the levels needed for online assessment (Rt
Hon Charles Clarke MP 2004). The proposals set out in the Tomlinson Report
are only feasible if a national database of student achievement is
established. At school level, extensive investment in ICT will be needed,
and costs will recur.

The examination process: dealing with e-assessment poses serious challenges
to paper-based examination authorities. They need to develop a robust
technology infrastructure, and (at least as important) the competencies of
staff to make these systems function effectively. A good start has been made
here, for example in the work on the assessment of basic and key skills.
However, there are salutary messages from the QCA Report on implementation
(QCA 2004). AQA report (Adams and Hudson 2004) that their surveys show
considerable satisfaction from examiners. Examiners report that the software
is easy to use; they like the increased accuracy and validation at input,
the auto-totalling of marks by the computer, and the electronic management
of reporting and discrepancies.

On examiners and examining: High quality training is an essential aspect of
reliable assessment. Tomlinson recommends (paras 134-136) a thorough
professionalisation of the role of markers and examiners, including
coursework markers, and the Report makes a number of specific
recommendations on how this might be institutionalised via schemes for
professional development, accreditation, and appropriate professional reward
systems. The Secondary Heads Associations have argued for the establishment
of Chartered Examiners in schools and colleges, who would give their
organisations the right to take more control over examination assessment.

School and test-centre expertise: this presents a massive challenge for
professional development. Schools need to develop systems which are robust.

Plagiarism: this poses a major threat to all assessment systems (eg Ridgway
and Smith 2004). These threats range from downloading work direct from the
internet, to commissioning work, and impersonation. Assessment systems will
need to be resistant to such attacks.

Equity issues: it is important that e-assessment does not create a digital
divide which privileges some students over others on the basis of
opportunities of access.

4.4 RELIABLE TEACHER ASSESSMENT VIA E-PORTFOLIOS

A key decision for educational systems is to decide exactly how much of the
students' time should be devoted to working on extended projects, and how
much should be based on shorter activities. A related decision is the
balance to be struck between portfolio systems assessed in school, and timed
external assessments. A key issue is to establish robust and reliable
systems of school-based assessment. It is worth highlighting the extreme
positions that different systems use. In some systems, all assessment is
done externally. In some systems - for example Queensland, Australia - all
assessment is school-based. Queensland provides extensive systems for
training teachers, and for moderating their judgements. ICT can facilitate
this process. All student submissions can be put onto the web, and systems
of cross-moderation can be established. Externally defined tests can be used
to guide the moderation process.

4.5 DUMBING-DOWN ASSESSMENT

There is a danger that considerations of cost and ease of assessment will
lead to the introduction of cheap assessment systems which prove to be very
expensive in terms of the damage they do to students' educational
experiences. At the time of writing, this seems most unlikely in the UK. QCA
have funded some innovative e-assessment developments at investment levels
beyond the reach of most companies, and have a large group focused on
developing and sharing expertise in e-assessment (www.qca.org.uk).

4.6 SUMMARY OF SECTION 4

New educational goals continue to emerge, and the process of critical
reflection on what is important to learn, and how this might be assessed
authentically, needs to be institutionalised into curriculum planning. In
this section, we explore ways to assess metacognition, group projects,
creativity and communication skills. E-assessment is certain to play a major
role in defining and implementing curriculum change in the UK. There is a
strong government commitment to e-assessment, and good initial progress has
been made. Major challenges of going to scale have yet to be faced. A good
deal of innovative work is needed, coupled with a grounded approach to
system-wide implementation.


ACKNOWLEDGEMENTS

We wish to thank a number of people who have commented constructively on
this document, in particular Keri Facer, Annika Small, Jeremy Tafler, and
Kathleen Tattersall. We are grateful to them for their input. All the faults
and errors of omission are our own.

GLOSSARY

Adaptive testing - a sequential form of individual testing in which
successive items in the test are chosen based primarily on the psychometric
properties and content of the items, and the participant's response to
previous items

A-level (AS/A2) - General Certificate of Education (GCE) Advanced Level.
Study usually consists of a two-year academic course and students will
usually select two or three subjects from subjects studied at AS-levels to
continue to A-level (called A2)

Anchor(s) - a sample of student work that exemplifies a specific level of
performance. Markers use anchors to score student work, usually comparing
the student performance to the anchor

AQA - an awarding body: Assessment and Qualifications Alliance formed from
the merger of Associated Examining Board (AEB) and the Northern Examinations
and Assessment Board (NEAB) in 2000

AS-levels - General Certificate of Advanced Supplementary Level, considered
to be the equivalent of half an A-level. Young people are now expected to
study four AS-levels during Year 12 at school or college

Assessment - any systematic method of obtaining evidence from tests,
examinations, questionnaires, surveys and collateral sources used to draw
inferences about characteristics of people, objects or programs for a
specific purpose

Basic skills - the ability to read, write and speak in English and use
mathematics at a level necessary to function and progress at work and
society in general

CAS - Computer Algebra System. Software package used for the manipulation of
mathematical formulae. Automates tedious and sometimes difficult algebraic
manipulation tasks. Systems vary and may include facilities for graphing
equations or provide a programming language for the user to define their own
procedures

City and Guilds - major awarding body for vocational qualifications in the
UK

Competency-based assessment - assessment process based on the collection of
evidence on which judgments are made concerning progress towards
satisfaction of standard performance criteria

Concept map - the arrangement of ideas into a visual layout highlighting
connections between associated ideas, revealing the structural pattern in
the information

Criterion referenced assessment - assessment linked to predefined standards.
(eg 'Can swim 25 metres in a swimming pool')

CSE - Certificate of Secondary Education: former system of British
examinations taken in a range of subjects, usually at the age of 16

Diagnostic testing - testing used to identify the conceptions and
misconceptions with a view to providing appropriate remedial experiences

Discrimination - the ability to distinguish between and among different
levels of work or achievement

E-assessment - electronic assessment: processes involving the implementation
of ICT for the recording, transmission, presentation and processing of
assessment material

Edexcel - UK examining and awarding body providing a range of qualifications
including at higher education level

EiC - Excellence in Cities. Government initiative aimed at raising the
educational aspirations and attainment of children in inner cities

European Computer Driving Licence - European-wide qualification allowing
candidates to demonstrate competence in computer skills, covering the areas
of basic concepts of IT, using the computer and managing files, word
processing, spreadsheets, database, presentation and information, and
communication

Formative assessment - often called assessment for learning. Assessment used
to support teaching and learning, which identifies strengths and weaknesses
of the student

GCE - General Certificate of Education

GCSE - General Certificate of Secondary Education (GCSE). The main secondary
school examinations usually at 16, which replaced previous system GCE
O-levels and CSEs

GIS - Geographic Information System. System of software used for the
storage, retrieval, mapping and analysis of spatial data, such as mortality
by different regions

GNVQ - General National Vocational Qualification. Vocational qualification,
often taken as an alternative to GCSE or A-levels, usually after compulsory
schooling. Available at three levels: Foundation, Intermediate, and Advanced

High-stakes assessment - assessment that has important consequences or
implications for students, staff or schools

ICT - Information and Communications Technology

Key skills - a group of skills valued by employers as being central to all
work and learning, including communication, information technology,
application of numbers, working with others, and improving own learning and
performance

Key Stages - the four stages of the National Curriculum: KS1 for pupils aged
5-7; KS2 for 7-11; KS3 for 11-14; KS4 for 14-16

NVQ - National Vocational Qualifications. Work-based vocational
qualifications. They are portfolio-based qualifications which show skills,
knowledge and ability in specific work areas. Can be taken at five levels,
depending on level of expertise and responsibility of the job

O-level - also GCE Ordinary level. Former system of British examinations
taken in a range of subjects, usually at the age of 16. Ran in parallel with
but at a higher level than CSE. Both systems now replaced by current GCSE

Parallel forms - tests that are created to measure the same constructs, and
to produce the same scores, if they were given to individuals on different
occasions

PDA - Personal Digital Assistant; a small hand-held computer. Depending on
level of sophistication may allow e-mail, word processing, music playback,
internet access, digital photography or GPS reception


Pedagogy: philosophy of approach to schooling, learning, and teaching including what is taught, how teaching occurs, and how learning occurs
Portfolio: a representative collection of a candidate's work, which is used either to demonstrate or exemplify that a range of criteria has been met, or to showcase the very best that a candidate is capable of
Portfolio assessment: assessment based on judgment made about the work shown as evidence within a portfolio
Predictive validity: the extent to which scores on a test predict some future performance. For example, a student's GCSE grade can be used to predict their likely A-level grade; in some subjects, the prediction is better than in other subjects
QCA: Qualifications and Curriculum Authority. UK public body, sponsored by the Department for Education and Skills (DfES). Roles include the maintenance and development of the national curriculum and associated assessments, tests and examinations
Reliability: reliability in measurement and testing is a measure of the accuracy of the score achieved, with respect to the likelihood that the score would be constant if the test were re-taken or the same performance were re-scored by another marker, or if another test from a test bank of ostensibly equivalent items is used
Summative assessment: assessment used to measure performance, usually at the end of a course of study
TIMSS: Trends in International Mathematics and Science Study, formerly Third International Mathematics and Science Study. Comprehensive study offering data on students' mathematics and science achievement from an international perspective. Data from 1995, 1999, and 2003
UCLES: University of Cambridge Local Examinations Syndicate, comprising three business units: Cambridge ESOL (English for Speakers of Other Languages), providing examinations in English as a foreign language and qualifications for language teachers; CIE (University of Cambridge International Examinations), providing international school examinations and international vocational awards; and OCR (Oxford, Cambridge and RSA Examinations), providing general and vocational qualifications
Validity: the appropriateness of the interpretation and use of the results for any assessment procedure
Value added: the increase in learning that occurs during a course of education. Based either on the gains of an individual or a group of students. Requires a baseline measurement for comparison

BIBLIOGRAPHY

Adams, C and Hudson, G (2004). AQA and DRS electronic mark capture, presented at the QCA E-assessment Summit, 24 April
Aim Online P10 Supplement (2003). Supplement to the VCAA Bulletin No 6 September 2003. AIM Online: www.aimonline.vic.edu.au
Archenhold, WF, Bell, J, Donnelly, J, Johnson, S and Welford, G (1988). Science at Age 15: a Review of APU Findings 1980-1984. London: HMSO
Barnes, M, Clarke, D and Stephens, M (2000). Assessment: the engine of systemic curriculum reform? Journal of Curriculum Studies, 32(5) 623-650
Bennett, RE (2002). Inexorable and inevitable: the continuing story of technology and assessment. Journal of Technology, Learning, and Assessment, (). Available from www.jtla.org
Black, P and Wiliam, D (2002). Assessment for Learning: Beyond the Black Box (2002). www.assessment-reform-group.org.uk/publications.html
Chapman, OL and Fiore, MA (2001). Calibrated peer review: a writing and critical thinking instructional tool. The White Paper: a Description of CPR. http://cpr.molsci.ucla.edu/
Cockcroft, WH (1982). Mathematics Counts. London: HMSO
Cohen, Y, Ben-Simon, A and Hovav, M (2003). The effect of specific language features on the complexity of systems for automated essay scoring. Paper presented to the 29th Annual Conference of the International Association for Educational Assessment. www.aqa.org.uk/support/iaea/papers/ben-cohen-hovav.pdf
Cowie, J and Lehnert, W (1996). Information extraction. Communications of the ACM vol 39 (1), pp80-91
De Bono, E (2000). Six Thinking Hats. London: Penguin Books
Doiron, G and Isaac, JR (2002). Designing an ER online role play for medical students. 2nd Symposium on Teaching and Learning in Higher Education Paradigm Shift in Higher Education, National University of Singapore, 4-6 September 2002
Downes, T and Zammit, K (2000). New literacies for connected learning in global classrooms, in: H Taylor and P Hogenbirk (Eds) Information and Communication Technologies: the School of the Future. London: Kluwer Academic Publishers
EPPI Centre (2002). A Systematic Review of the Impact of Summative Assessment and Tests on Students' Motivation for Learning. http://eppi.ioe.ac.uk
Frederikson, JR and Collins, A (1989). A system approach to educational testing. Educational Researcher, 18(9), 27-32
Freeman, MA and McKenzie, J (2002). Implementing and evaluating SPARK, a confidential web-based template for self and peer assessment of student teamwork: benefits of evaluating across different subjects. British Journal of Educational Technology, 33 (5), pp553-572. Cited at www.educ.dab.uts.edu.au/darrall/sparksite
Getzels, JW and Jackson, PW (1962). Creativity and Intelligence: Explorations with Gifted Students. New York: John Wiley
Kimbell, R (2003). Performance assessment: assessing the inaccessible. Paper presented at Futurelab's Beyond the Exam conference, 19-20 November 2003, Bristol


Kirriemuir, J and McFarlane, A (2003). Literature Review in Games and Learning (2004). Bristol: Futurelab. Retrieved 05/09/2004 from www.futurelab.org.uk/research/lit_reviews.htm
Klein, SP, Hamilton, LS, McCaffrey, DF and Stecher, BM (2000). What do test scores in Texas tell us? RAND Issues Paper. www.rand.org/publications/IP/IP202
Koretz and Barron (1998). The Validity of Gains in Scores on the Kentucky Instructional Results Information System (KIRIS). www.rand.org/publications/MR/MR1014/MR1014.pref.pdf
Linn, RL (2000). Assessments and accountability. ER Online, 29(2). www.aera.net/pubs/er/arts/29-02/linn01.htm
Mathews, JC (1985). Examinations: a Commentary. London: George Allen and Unwin
Messick, S (1995). Validity of psychological assessment. American Psychologist vol 50, no 9, pp741-749
Mitchell, T, Aldridge, N, Williamson, W and Broomhead, P (2003). Computer based testing of medical knowledge. Proceedings of the 7th International Computer Assisted Assessment Conference, Loughborough, pp249-267
Pellegrino, JW, Chudowski, N, Glaser, R (Eds) (2001). Knowing What Students Know. Washington DC: National Academy of Sciences
QCA (2004). The Basic and Key Skills (BKS) E-assessment Experience Report. www.qca.org.uk/adultlearning/downloads/bks_e-assessment_experience.pdf
Richardson, M, Baird, J, Ridgway, J, Ripley, M, Shorrocks-Taylor, D and Swan, M (2002). Challenging minds? Students' perceptions of computer-based World Class Tests of problem solving. Computers and Human Behaviour, 18 (6), 633-649
Ridgway, J and Passey, D (1993). An international view of mathematics assessment - through a class, darkly, in: Niss, M (Ed) Investigations into Assessment in Mathematics Education. Kluwer Academic Publishers, pp57-72
Ridgway, J (1998). The Modelling of Systems and Macro-Systemic Change - Lessons for Evaluation from Epidemiology and Ecology. National Institute for Science Education Monograph 8, University of Wisconsin-Madison
Ridgway, J and Smith, H (2004). Against plagiarism: strategies for defending the validity of assessment systems. EARLI Assessment SIG, Bergen, Norway
Ridgway, J, Swan, M and Burkhardt, H (2001). Assessing mathematical thinking via FLAG, in: D Holton and M Niss (Eds) Teaching and Learning Mathematics at University Level - An ICMI Study. Dordrecht: Kluwer Academic Publishers, pp 423-430. Field-Tested Learning Assessment Guide (FLAG). www.wcer.wisc.edu/nise/cl1
Ridgway, J and McCusker, S (2003). Using computers to assess new educational goals. Assessment in Education: Principles, Policy and Practice, vol 10, no 3, pp309-328(20)
Ripley, M (2004). E-assessment question 2004 QCA keynote speech e-assessment: an overview. Presentation given by Martin Ripley at Delivering E-assessment - a Fair Deal for Learners, a summit held by QCA on 20 April 2004
Roan, M (2003). Computerised assessment: changes in marking UK examinations - are we ready yet? Paper presented to the 29th Annual Conference of the International Association for Educational Assessment. www.aqa.org.uk/support/iaea/papers/roan.pdf
Robitaille, DF, Schmidt, WH, Raizen, S, McKnight, C, Britton, E and Nicol, C (1993). Curriculum frameworks for mathematics and science. TIMSS Monograph No 1. Vancouver: Pacific Educational Press
Rt Hon Charles Clarke MP, Secretary of State for Education and Skills. Keynote speech at Delivering E-assessment - a Fair Deal for Learners, a summit held by QCA on 20 April 2004
Schulman, L (1998). Teacher portfolios: a theoretical activity, in: N Lyons (Ed) With Portfolio in Hand: Validating the New Teacher Professionalism (pp23-37). NY: Teachers College Press
Slaughter, S and Leslie, LL (1997). Academic Capitalism: Politics, Policies and the Entrepreneurial University. Baltimore: The Johns Hopkins University Press
Sukkarieh, JZ, Pulman, SG and Raikes, N (2003). Auto-marking: using computational linguistics to score short, free text responses. Paper presented to the 29th Annual Conference of the International Association for Educational Assessment. www.aqa.org.uk/support/iaea/papers/sukkarieh-pulman-raikes.pdf
Tattersall, K (2003). Ringing the changes: educational and assessment policies, 1900 to the present, in: Setting the Standard. AQA: Manchester, pp7-27
Teacher Training Agency (2003). Qualifying to Teach: Professional Standards for Qualified Teacher Status and Requirements for Initial Teacher Training
Tomlinson, M (2002). Inquiry into A Level Standards. London: DfES
Tomlinson, M (2004). 14-19 Curriculum and Qualifications Reform: Interim Report Of The Working Group On 14-19 Reform. London: DfES. www.14-19reform.gov.uk
Topping, KJ (1998). Peer assessment between students in college and university. Review of Educational Research. 68 (3), 249-276

APPENDIX:
FUNDAMENTALS OF ASSESSMENT

How shall they be judged?

Here we consider some of the criteria against which tests and testing systems can be judged.

Validity and reliability are often written about as if they were separate things. Actually, they are intimately entwined, but it is worth starting with two simple definitions: validity is concerned with the nature of what is being measured, while reliability is concerned with the quality of the measurement instrument.

A loose set of criteria can be set out under the heading of educational validity (Frederikson and Collins (1989) use the term 'systemic validity'). Educational validity encompasses a number of aspects which are set out below.

Consequential validity: refers to the effects that assessment has on the educational system (Ridgway and Passey (1993) use 'generative validity'). Messick (1995) argues that consequential validity is probably the most important criterion on which to judge an assessment system. For example, high-stakes testing regimes which focus exclusively on timed multiple choice items in a narrow domain can produce severe distortions of the educational process, including rewarding both students and teachers for cheating. Klein, Hamilton, McCaffrey and Stecher (2000), and Koretz and Barron (1998) provide examples where scores on high-stakes State tests rise dramatically over a four-year period, while national tests taken by the same students, which measure the same constructs, show little change.

Construct validity: refers to the extent to which a test measures what it purports to measure. There is a need for a clear description of the whole topic area (the domain definition) covered by the test. There is a need for a clear statement of the design of the test (the test blueprint), with examples in the form of tasks and sample tests. Construct validity requires supporting evidence on the match between the domain definition and the test. Construct validity can be approached in a number of ways. It is important to check on:

- content validity: are items fully representative of the topic being measured?
- convergent validity: given the domain definition, are constructs which should be related to each other actually observed to be related to each other?
- discriminant validity: given the domain definition, are constructs which should not be related to each other actually observed to be unrelated?
- concurrent validity: does the test correlate highly with other tests which supposedly measure the same things?

The essential idea about reliability is that test scores should be a lot better than random numbers. Any test situation involves several kinds of reliability. The over-arching question concerning reliability is: if we could test identical students on different occasions using the same tests, would we get the same results?

Take the measurement of student height as an example. The concept is easy to define; we have good reason to believe that height can be measured on a single dimension (contrast this with athletic ability, or creativity, where a number of different components need to be considered). However, the accurate measurement of height needs care.

Height is affected by the circumstances of measurement: students should take off their shoes and hats, and should not slump when they are measured. The measuring instrument is important: a yard stick will provide a crude estimate, good for identifying students who are exceptionally short or exceptionally tall, but not capable of fine discriminations between students; using a tape measure is likely to lead to more measurement error than using a fixed vertical ruler with a bar which rests on each student's head. Time of day should be considered (people are taller in the morning); so should the time between measurements. If we assess the reliability of measurement by comparing measurements on successive occasions, we will under-estimate reliability if the measures are taken too far apart, and students grow different amounts in the intervening period.

Exploration of reliability raises a set of finer-grained questions. Here are some examples:

- is the phenomenon being measured relatively stable? What inherent variation do we expect? (mood is likely to be less stable than vocabulary size)
- to what extent do different markers assign the same marks as each other to a set of student responses?
- do students of equal ability get the same marks no matter which version of the test they take?

Fitness for purpose: the quality of any design can be judged in terms of its fitness for purpose. Tests are designed for a variety of purposes, and so the criteria for judging a particular test will shift as a function of its intended purpose; the same test may be well suited to one purpose and ill suited to another.

Usability: people using an assessment system, notably students and teachers, need to understand and be sympathetic to its purposes.

Practicality: few designers work in arenas where cost is irrelevant. In educational settings, a major restriction on design is the total cost of the assessment system. The key principle here is that test administration and scoring must be manageable within existing financial resources, and should be cost-effective in the context of the education of students.

Equity: equity issues must be addressed; inequitable tests are (by definition) unfair, illegal, and can have negative social consequences.
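
The convergent and discriminant checks listed under construct validity in this appendix are often examined by looking at correlations between measures. The following is a minimal sketch of that idea in Python, not a method prescribed by this review: the scores are invented for illustration, the variable names (`algebra_test`, `geometry_test`, `shoe_size`) are hypothetical, and the use of a Pearson correlation is an assumption about how "related" might be quantified.

```python
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when all scores are identical")
    return cov / (var_x * var_y) ** 0.5


# Hypothetical scores for six students (invented for illustration only).
algebra_test = [10, 14, 8, 18, 12, 16]
geometry_test = [11, 13, 9, 17, 12, 15]  # a construct expected to be related
shoe_size = [6, 4, 6, 6, 3, 5]           # a measure expected to be unrelated

# Convergent check: related constructs should correlate strongly.
convergent_r = pearson(algebra_test, geometry_test)
# Discriminant check: unrelated constructs should correlate weakly.
discriminant_r = pearson(algebra_test, shoe_size)

print(f"convergent check:   r = {convergent_r:.2f}")
print(f"discriminant check: r = {discriminant_r:.2f}")
```

With real data, what counts as a "high" or "low" correlation is a judgment that depends on the domain definition and on sample size; no fixed threshold is implied here.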

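
Two of the finer-grained reliability questions raised in this appendix, stability across occasions and agreement between markers, can be made concrete with simple statistics. The sketch below is illustrative only: the marks are invented, the function names `pearson` and `exact_agreement` are hypothetical, and the choice of a test-retest correlation and an exact-agreement rate is an assumption rather than a procedure taken from the sources cited here.

```python
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when all scores are identical")
    return cov / (var_x * var_y) ** 0.5


def exact_agreement(marks_a, marks_b):
    """Proportion of scripts on which two markers award the same mark."""
    if len(marks_a) != len(marks_b) or not marks_a:
        raise ValueError("need two non-empty, equal-length lists of marks")
    same = sum(a == b for a, b in zip(marks_a, marks_b))
    return same / len(marks_a)


# Hypothetical data: the same six students tested on two occasions.
first_sitting = [12, 15, 9, 20, 17, 11]
second_sitting = [13, 14, 10, 19, 18, 10]

# Hypothetical data: two markers scoring the same six scripts.
marker_1 = [3, 4, 2, 5, 4, 3]
marker_2 = [3, 4, 3, 5, 4, 2]

test_retest_r = pearson(first_sitting, second_sitting)
agreement = exact_agreement(marker_1, marker_2)

print(f"test-retest correlation: {test_retest_r:.2f}")
print(f"exact marker agreement:  {agreement:.2f}")
```

As the height example suggests, the interval between sittings matters: if students change by different amounts between occasions, a test-retest correlation will understate the reliability of the instrument itself.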
About Futurelab

Futurelab is passionate about transforming the way people learn. Tapping into the huge
potential offered by digital and other technologies, we are developing innovative learning
resources and practices that support new approaches to education for the 21st century.

Working in partnership with industry, policy and practice, Futurelab:

- incubates new ideas, taking them from the lab to the classroom
- offers hard evidence and practical advice to support the design and use of innovative learning tools
- communicates the latest thinking and practice in educational ICT
- provides the space for experimentation and the exchange of ideas between the creative, technology and education sectors.

A not-for-profit organisation, Futurelab is committed to sharing the lessons learnt from our research and development in order to inform positive change to educational policy and practice.

Futurelab
1 Canons Road
Harbourside
Bristol BS1 5UH
United Kingdom

tel +44 (0)117 915 8200
fax +44 (0)117 915 8201
info@futurelab.org.uk

www.futurelab.org.uk

Registered charity 1113051


This publication is available to download from the Futurelab website
www.futurelab.org.uk/research/lit_reviews.htm

Also from Futurelab:

Literature Reviews and Research Reports
Written by leading academics, these publications provide comprehensive surveys of research and practice in a range of different fields.

Handbooks
Drawing on Futurelab's in-house R&D programme as well as projects from around the
world, these handbooks offer practical advice and guidance to support the design and
development of new approaches to education.

Opening Education Series
Focusing on emergent ideas in education and technology, this series of publications opens up new areas for debate and discussion.

We encourage the use and circulation of the text content of these publications, which
are available to download from the Futurelab website www.futurelab.org.uk/research.
For full details of our open access policy, go to www.futurelab.org.uk/open_access.htm.

Creative Commons

Futurelab 2006. All rights reserved; Futurelab has an open access policy which encourages circulation of
our work, including this report, under certain copyright conditions - however, please ensure that Futurelab is
acknowledged. For full details of our Creative Commons licence, go to www.futurelab.org.uk/open_access.htm

Disclaimer

These reviews have been published to present useful and timely information and to stimulate thinking and
debate. It should be recognised that the opinions expressed in this document are personal to the author and
should not be taken to reflect the views of Futurelab. Futurelab does not guarantee the accuracy of the
information or opinion contained within the review.
FUTURELAB SERIES

REPORT 10

ISBN: 0-9544695-8-5
Futurelab 2004
