Deposited in DRO:
05 February 2007
Futurelab, Bristol.
This Open Access Policy allows anyone to access our text content (where we have copyright - please note the
exceptions listed below) electronically without charge, as long as certain conditions are met. Users are welcome to
download, save, perform or distribute this work electronically or in any other format, including in foreign language
translation, without written permission subject to the conditions set out in the Futurelab Open Access Licence, some of
which are as follows: * The material cannot be used for commercial gain including professional, political or promotional
uses or for any financial gain. * The material must be used in full and without alterations or amendments. * Futurelab
and the authors must be acknowledged (with the Futurelab logo - see press page for download details) as the original
source of the material and the relevant Futurelab webpage (where the material can be found) must be given in a
prominent position. It should also acknowledge that its use is subject to the terms of this licence. * Only text is covered
by this licence - no pictures, images, diagrams, moving images, sound, video, downloads of prototypes or software is to
be used. * No more than 500 copies of any one piece of work may be reproduced. * You must advise Futurelab of where
and when the work will be reproduced - please send an e-mail to info@futurelab.org.uk. For a full list of the criteria to
be met, please read the Futurelab Open Access Licence.
Additional information:
Use policy
The full-text may be used and/or reproduced, and given to third parties in any format or medium, without prior permission or charge, for
personal research or study, educational, or not-for-profit purposes provided that:
a full bibliographic reference is made to the original source
a link is made to the metadata record in DRO
the full-text is not changed in any way
The full-text must not be sold in any format or medium without the formal permission of the copyright holders.
Please consult the full DRO policy for further details.
Durham University Library, Stockton Road, Durham DH1 3LY, United Kingdom
Tel : +44 (0)191 334 3042 | Fax : +44 (0)191 334 2971
http://dro-test.dur.ac.uk
FUTURELAB SERIES
REPORT 10:
Literature Review
of E-assessment
Jim Ridgway and Sean McCusker, School of Education, University of Durham
Daniel Pead, School of Education, University of Nottingham
CONTENTS

EXECUTIVE SUMMARY
SECTION 1: ASSESSMENT DRIVES EDUCATION
SECTION 2: HOW AND WHERE MIGHT ASSESSMENT BE DRIVEN?
SECTION 3: CURRENT DEVELOPMENTS IN E-ASSESSMENT
SECTION 4: OPPORTUNITIES AND CHALLENGES FOR E-ASSESSMENT
GLOSSARY
BIBLIOGRAPHY
APPENDIX: FUNDAMENTALS OF ASSESSMENT

FOREWORD

I have to admit to being someone who for many years has avoided thinking about assessment - it somehow always seemed distant from my interests, divorced from my concerns about how children learn with technologies and, to be honest, just a little less interesting than other things I was working on. In recent years, however, working in the field of education and technology, it has become clear that anyone with an interest in how we create equitable, engaging and relevant education systems needs to think long and hard about assessment. Futurelab's conference 'Beyond the Exam' in November 2003 further highlighted this point, as committed and engaged educators, software and media developers came together to raise a rallying cry for a rethink of our current assessment practices.

What I and many others working in this area have come to realise is that we can't just ignore assessment, or simply see it as someone else's job. Assessment practices shape, possibly more than any other factor, what is taught and how it is taught in schools. At the same time, these assessment practices serve as the focus (perhaps the only focus in this day and age) for a shared societal debate about what we, as a society, think are the core purposes and values of education. If we wish to create an education system that reflects and contributes to the development of our changing world, then we need to ask how we might change assessment practices to achieve this.

The authors of this review provide a compelling argument for the central role of assessment in shaping educational practice. They outline the challenges and opportunities posed by the changing global world around us, and the potential role of technologies in our assessment practices. Both optimistic and practical, the review summarises existing research and emergent practice, and provides a blueprint for thinking about the risks and potential that awaits us in this area.

We look forward to hearing your response to this review.

Keri Facer, Director of Learning Research
Futurelab
research@futurelab.org.uk
EXECUTIVE SUMMARY
for users, for example by providing on-demand tests with immediate feedback, and perhaps diagnostic feedback, and more accurate results via adaptive testing; it can help improve the technical quality of tests by improving the reliability of scoring.

E-assessment can support current educational goals. Paper and pencil tests can be made more authentic by allowing students to word process essays, or to use spreadsheets, calculators or computer algebra systems in paper-based examinations. It can support current UK examination processes by using Electronic Data Exchange to smooth communications between schools and examinations authorities; current processes of training markers and recording scores can be improved. Systems where student work is scanned then distributed have advantages over conventional systems in terms of logistics (posting and tracking large volumes of paper, for example), and continuous monitoring can ensure high marker reliability. Current work is pushing boundaries in areas such as text comprehension, and automated analysis of student processes and strategies.

E-assessment can be used to assess new educational goals. Interactive displays which show changes in variables over time, microworlds and simulations, and interfaces that present complex data in ways that are easy to control, all facilitate the assessment of problem-solving and process skills such as understanding and representing problems, controlling variables, generating and testing hypotheses, and finding rules and relationships. ICT facilitates new representations, which can be powerful aids to learning. Little is known about the cognitive implications of these representations; however, it seems likely that complex ideas (notably in reasoning from evidence of various sorts) will be acquired better and earlier than they are at present, and that the standards of performance demanded of students will rise dramatically. Here, we also explore ways to assess important but ill-defined goals such as the development of metacognitive skills, creativity, communication skills, and the ability to work productively in groups.

A major problem with education policy and practice in England is the separation of academic and practical subjects. In the worst case, to be able to invent and create something of value is taken to be a sure sign of feeble-mindedness, whereas to opine on the work of others shows towering intellectual power. A diet of academic subjects with no opportunities to act upon the world fails to equip students with ways to deal with their environments; a diet of practical subjects which do not engage higher-order thinking throughout the creative process equips students only to become workers for others. Both streams produce one-handed people, and polarised societies. E-portfolios can provide working environments and assessment frameworks which support project-based work across the curriculum, and can offer an escape from one of the most pernicious historical legacies in education. E-portfolios solve problems of storing student work, and make the activity of documenting the process of creation and reflection relatively easy. Reliable teacher assessment is enabled. There is likely to be extensive use of teacher assessment of those aspects of performance best judged by humans (including extended pieces of work assembled into portfolios), and more extensive use made of on-demand tests of those aspects of performance which can be done easily by computer, or which are done best by computer.

The issue for e-assessment is not if it will happen, but rather, what, when and how it will happen. E-assessment is a stimulus for rethinking the whole curriculum, as well as all current assessment systems. New educational goals continue to emerge, and the process of critical reflection on what is important to learn, and how this might be assessed authentically, needs to be institutionalised into curriculum planning.

E-assessment is certain to play a major role in defining and implementing curriculum change in the UK. There is a strong government commitment to high quality e-assessment, and good initial progress has been made; nevertheless, there is a need to be vigilant that the design of assessment systems is not driven by considerations of cost.

Major challenges of going to scale have yet to be faced. A good deal of innovative work is needed, coupled with a grounded approach to system-wide implementation.

PURPOSE

The purpose of this report is:

to assert the centrality of assessment in education systems
to identify drivers of assessment, and their likely impact on assessment, and thence on education systems
to describe current, radical plans for increased use of high-stakes e-assessment in the UK
to describe and exemplify current uses of ICT in assessment
to explore the potential of new technologies for enhancing current assessment (and pedagogic) practices
to identify opportunities and to suggest ways forward
to drip feed criteria for good assessment throughout (set out explicitly in an appendix).

This report has been designed to: present key findings on research in assessment; describe current UK government plans, and likely future developments; provide links to interesting examples of e-assessment; offer speculations on possible future developments; and to stimulate a debate on the role of e-assessment in assessment, teaching, and learning.
SECTION 1
therefore the knowledge that is valuable. It is unsurprising that high-stakes assessment has a profound effect on both learning and teaching. Decisions about assessment systems are not made in a vacuum; the educational community in the UK (but not universally) is involved in the design of assessment systems, and these decisions are usually grounded in discussions on what is worth knowing, and in the practicalities of teaching different concepts and techniques to students of different ages.

1.2 THE IMPACT OF ASSESSMENT ON ATTAINMENT

An extensive literature review by Black and Wiliam (2002) showed that well designed formative assessment is associated with major gains in student attainment on a wide range of conventional measures of attainment. This result was found across all ages and all subject disciplines. Topping (1998) reviewed the impact of peer assessment between students in higher education on writing, and found large positive effects. A major literature review commissioned by the EPPI Centre (2002) showed that regular summative assessment had a large negative effect on the attainment of low-attaining students, but did little harm to high-attaining students. These studies provide strong evidence that good assessment practices produce large performance gains. These gains are amongst the largest gains found in any educational treatments. Similarly, poor assessment systems have negative, not neutral, effects on the performance of weak students. It follows that when we consider the introduction of e-assessment, we should be aware that we are working with a very sharp sword.

1.3 ICT AND ASSESSMENT

ICT perturbs the links between learning, teaching and assessment in a number of distinct ways:

1 ICT has changed the ways that research is conducted in most disciplines. Linguists analyse large corpuses of text; geographers use GIS systems; scientists and engineers use modelling packages. Everyone uses word processors, databases and spreadsheets. Students should use contemporary research methods; if they do not, school-based learning will become increasingly irrelevant to understanding developments in knowledge. Assessment should reinforce good curriculum practice. We are approaching a bizarre situation where students use powerful and appropriate tools to support learning and solve problems in class, but are then denied access to these tools when their knowledge is assessed.

2 ICT can support educational goals that have been judged to be desirable for a long time, but hard to achieve via conventional teaching methods. In particular, ICT can support the development of higher-order thinking skills such as critiquing, reflection on cognitive processes, and 'learning to learn', and can facilitate group work, and engagement with extended projects; ICT competence is itself a (moving) target for assessment.

3 New technologies raise an important set of questions about what is worth learning in an ICT-rich environment; what can be taught, given new pedagogic tools; and how assessment
terms of cost, the estimation of the cost of testing is often done very badly, especially in the USA. There, it is common for cost to be equated with the money paid for the test and its scoring, not the real cost, which is the opportunity cost, measured in terms of the reduction in time spent learning which has been diverted to useless 'test prep'. Formative evaluation should be an integral part of the work of teaching, so estimation of cost focuses naturally on opportunity costs: just what is an effective allocation of teaching and learning time to formative evaluation? In terms of time, for summative assessment time is easy to measure (so long as useless 'test prep' is counted in); again, formative assessment is an integral part of teaching.

Knowledge and the knowledge community: summative assessment is explicit about what is being assessed, and ideas about the nature of knowledge are shared within a wide community; with formative evaluation, ideas about the nature of knowledge might be negotiated by just two people.

Status of the assessment: in summative assessment, the assessment can be ignored by the student; formative assessment simply isn't formative assessment unless the student does something with it to improve performance.

Focal domain: it is useful to distinguish between cognitive, social and emotional aspects of performance. Summative assessment commonly focuses on cognitive performance; formative assessment can run wild in the social and affective domains.

Theory dependence: summative assessment rarely rests on theory; formative assessment is likely to be theory-genic as participants discuss progress, what is known, how to learn and remember things, and how best to use evidence.

Tool types: summative assessment commonly uses timed written assessments where the structure is specified in advance, and which are scored using a common set of rules. Tests are often designed to discriminate between students, and to put them into a rank order in terms of performance. Formative assessment commonly uses a variety of methods such as portfolios of work, student draft work, student annotations of their work, concept mapping tools, diagnostic interviews and diagnostic tests. Each student is their own referent - comparison with other students may not be useful, and is often harmful to learning.

1.4.1 Reflecting on summative and formative assessment

Despite the differences highlighted here, the two sorts of assessment have many areas of overlap:

a student can change their study methods on the basis of an end-of-year examination result (summative assessment used for formative purposes)
summative evaluation of students can provide formative evaluation for teachers, schools and educational systems
formative assessment always rests on some sort of summative assessment - feedback and discussion must rest
SECTION 2
The mobility of capital and jobs has changed the profile of the job market, with new kinds of jobs being created (eg in ICT) and old ones disappearing (eg in manufacturing industries). It is very easy to export jobs and capital from the developed world to the developing world (eg by relocating telephone call centres, or by establishing factories in countries with low wage costs). For people (and economies) to be successful, they must continue to learn new skills, and to adapt to change. Retraining will often require re-certification of competence, with the obvious consequence of further assessment, and the need to design assessment systems appropriate to the new needs of employment. These are pressures for more, and effective, systems of competence-based assessment.

Migration for work and education raises similar issues. The developed world has a need to import highly skilled workers; universities worldwide seek international students. In both cases, there is a need to certify the competence of applicants, and to reject those least likely to be effective workers, or to complete courses successfully (because of a lack of fluency in the language of instruction, for example). Financial considerations make it impractical for testing to take place in the target country, and so a good deal of testing takes place in the country supplying workers or students. Again, it is common to use competence tests which are externally mandated and designed. Language testing provides a good example; a computer-based version of the Test of English as a Foreign Language (TOEFL) has been developed which adjusts the difficulty level of the questions in the light of the performance of the candidate on the test (see www.ets.org/toefl).

For developed economies to maintain their global dominance, their economies must be geared to adding value to raw materials (or to creating value from nothing, as in the entertainment and finance industries). This requires changes in the education system which encourage creative activities, and good problem-solving ability. Employment in a post-industrial society is likely to depend on higher-order thinking skills, such as 'learning to learn'. This requires that these thinking skills be exemplified and assessed, if they are to receive appropriate attention in school.

Cooperation between countries in Europe will have an effect on assessment systems. Currently, there is a problem that qualifications in different member states (architect, engineer) are gained after rather different amounts of training, and equip people for quite different levels of professional responsibility. This makes job mobility very difficult. The Bologna Accord is an agreement between EU member states that all universities will adopt the same pattern of professional training (typically a three-year undergraduate degree followed by a two-year professional qualification) in order to make qualifications in different member states more comparable. Convergence of course structure is likely to lead to a convergence of assessment systems, in line with the desire to increase mobility (see www.engc.org.uk/international/bologna.asp for an analysis of the impact of the Bologna, Washington and Sydney Accords on engineering).

Globalisation is having a profound effect on educational systems worldwide. In higher education, Slaughter and Leslie (1997) describe the response of universities in several countries to 'academic capitalism' - a global trend to view knowledge as a product to be created and controlled, and to see universities as organisations which produce knowledge and more knowledgeable people as efficiently as possible. They document the changes in university structures and functioning which have been a response to such pressures; these include greater collaboration on teaching between universities, and mutual accreditation of courses. Again, the need for comparability of course difficulty and student attainment will lead to a careful re-examination of assessment systems, and some homogenisation.

Multinational companies also drive changes in assessment practices. These companies are successful in part because of their emphasis on uniform standards; one is unlikely to get a badly cooked hamburger in McDonald's, or a copy of Excel that functions worse than other copies. This emphasis on quality control extends to job qualifications, and to standards required of workers. In fast changing markets such as technology provision, retraining workers and checking their competence to use, install or repair new equipment or software requires appropriate assessment of competence. The needs of employers for large numbers of staff who are able to use ICT effectively as part of their job have led to trans-national qualifications such as the European Computer Driving Licence (www.ecdl.co.uk). Such examples are interesting because they are set by international organisations, or commercial organisations, and in some cases (eg the Microsoft Academy programme - www.microsoft.com/education/msitacademy/ITAPApplyOnline.aspx), state-funded educational organisations must submit themselves for examination by a commercial company before they are allowed to certify student competence.

The scale on which such examinations are taken is impressive. Bennett (2002) describes the National Computer Rank Examination, China, which is a proficiency exam to assess knowledge of computer science and the ability to use it; two million examinations were taken in 2002. Tests for the European Computer Driving Licence have been taken by more than a million people.

2.3 MASS EDUCATION

Mass education has developed rapidly and recently. In the last 30 years, the percentage of the UK population being educated at university has risen from about 5% to about 40%. This puts pressures on academic systems to develop efficient assessment systems.

There is now a great deal of distance education. China plans to have five million students in 50-100 online colleges by 2005. At least 35 US states have virtual universities (Bennett 2002). (The recent failure of the E-university in the UK - www.parliament.uk/post/pn200.pdf - and of the US Open University, shows that such ventures are not always successful!) A great deal of curriculum material is delivered via a variety of technologies (the Massachusetts Institute of Technology is in the process of putting all its course material online, for example - see http://ocw.mit.edu/index.html). Over 3,000 textbooks are freely available online at the National Academy Press (www.nap.edu). The use of technology in the assessment process is a logical consequence of these developments.
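The computer-based TOEFL mentioned earlier in this section is described as adjusting the difficulty of questions in the light of the candidate's performance. To make that idea concrete, here is a minimal sketch of one very simple adaptive rule (a one-up, one-down 'staircase'): a correct answer leads to a harder item, an incorrect answer to an easier one. This is an illustration only and should not be read as a description of how the TOEFL, or any operational test, selects items; the function names, the five difficulty levels and the simulated candidate are all assumptions introduced for this example.

```python
from typing import Callable, List, Tuple


def run_adaptive_test(
    answer_item: Callable[[int], bool],
    n_items: int = 10,
    min_level: int = 1,
    max_level: int = 5,
    start_level: int = 3,
) -> List[Tuple[int, bool]]:
    """Administer n_items questions, stepping difficulty up or down.

    answer_item(level) is a hypothetical callback that presents one question
    at the given difficulty level and returns True if it was answered
    correctly. Returns the (level, correct) history of the session.
    """
    level = start_level
    history: List[Tuple[int, bool]] = []
    for _ in range(n_items):
        correct = answer_item(level)
        history.append((level, correct))
        if correct:
            level = min(max_level, level + 1)  # next item is harder
        else:
            level = max(min_level, level - 1)  # next item is easier
    return history


def simulated_candidate(ability: int) -> Callable[[int], bool]:
    """Deterministic stand-in candidate: succeeds on items at or below ability."""
    return lambda level: level <= ability


if __name__ == "__main__":
    session = run_adaptive_test(simulated_candidate(ability=4), n_items=6)
    print([level for level, _ in session])
```

With the simulated candidate above, the difficulty presented climbs from the starting level and then oscillates around the candidate's ability, so most items are set near the level that is most informative about that candidate. Operational adaptive tests generally rely on statistical models of item difficulty rather than a fixed step rule; the staircase is used here only because it is the simplest rule that shows the principle.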
progressively, but on a tight timescale. The DfES E-learning Strategy will be accompanied by radical changes to the assessment process, for which the Qualifications and Curriculum Authority is responsible (www.qca.org.uk/adultlearning/workforce/6877.html). Over the next five years, the following activities are planned:

All new qualifications should include assessment on-screen
Awarding bodies set up to accept and assess e-portfolios
Most examinations should be available optionally on-screen, where appropriate
National curriculum tests available on-screen for those schools that want to use them
The first on-demand GCSE examinations are starting to be introduced
10 new qualifications specifically designed for electronic delivery and assessment
QCA Blueprint (2004)

The timescale for these changes is short. For example, in 2005, 75% of basic and key skills tests will be delivered on-screen; in 2006, each major examination board will offer live GCSE examinations in two subjects, and will pilot at least one qualification specifically designed for electronic delivery and assessment; in 2007, 10% of GCSE examinations will be administered on-screen; in 2008, there will be on-demand testing for GCSEs in at least two subjects.

Good progress has been made with these developments. For example, Edexcel is carrying out a pilot scheme for online GCSEs in chemistry, biology, physics and geography with 200 schools and colleges across the West Midlands and the west of England. AQA conducted a live trial in March 2004 on 20,000 scripts (Adams and Hudson 2004); in Summer 2004, about 500,000 marks (5% of the total) will be collected; by 2007, 100% of marks will be captured electronically.

The Tomlinson Report (2004, in prep) will offer a more radical challenge to assessment practices. The Interim Report (Tomlinson 2004) identified a number of problems with the existing system. These include concerns about:

excellence - the current system does not stretch the most able young people (in 2003, over 20% of A-level entries resulted in grade A)
vocational training - there is an historic failure to provide high-quality vocational courses that stretch young people and prepare them for work
vocational learning is often assessed by external written examinations, not practical and continuous assessment
assessment - the burden on students and teachers is too high
disaffection - our high drop-out rates are scandalous
the plethora of qualifications - currently around 4,000
curricula - are often narrow, overfull, and limit in-depth learning
too few students develop high levels of competence in mathematical skills, communication, working with others, or problem-solving
failure to equip young people with the generic skills, knowledge and personal attributes they will need in the future.
The Report proposes a single qualifications framework, based on diplomas set at four levels (Entry, Foundation, Intermediate and Advanced). Students are expected to progress at a pace appropriate to their attainment, rather than their age. Each diploma shares some common features. These require students to demonstrate evidence of:

mathematical skills, communication and ICT skills
successful completion of an extended project
participation in activities based on personal interest, contribution to the community as active citizens, and experience of employment
personal planning, review and making informed choices
engagement in 'main learning' - the major part of the diploma - chosen by the student in order to open access to further opportunities (eg in employment or education).

These recommendations are exciting and very ambitious, but deeply problematic, unless there are radical changes to current assessment systems, notably in the large-scale adoption of e-assessment. We consider ways these recommendations might be met, in Section 3.

2.6 SUMMARY OF SECTION 2

A number of drivers are shaping both assessment and ICT; these need to be taken into account in any discussion of future developments. These drivers provide conflicting pressures. The drivers considered here include the increasing power and ubiquity of ICT, and the explosion of its usefulness and use in everyday life. These provide pressures for more relevant skills to be assessed, and also provide an assessment medium which is largely unexplored. Demands for lifelong learning, for people who can innovate and create new ideas, and the needs for informed citizenship are all pressures for education (and associated assessment systems) that rewards higher-order thinking, and personal development. Conversely, drivers such as the need to retrain and recertify staff, to ensure common standards across organisations in different countries, and to allow access to well-qualified migrants for jobs and education, emphasise assessments which transcend national boundaries and which are based on well-defined competencies (and where assessment design is sometimes based on perceived commercial imperatives). These drivers require different approaches to assessment, and all require new sorts of assessments and assessment systems to be developed.

In the UK, there are a number of problems with current assessment systems. First, they serve students very badly; second, they might soon collapse under their own weight. There is now the political will to develop pervasive, high quality e-assessment on a tight timescale, aligned with current and emerging educational goals. There is also an urgent need to invent and apply new sorts of e-assessment on a large scale.
SECTION 3
CURRENT DEVELOPMENTS
IN E-ASSESSMENT
Motivational gains: there are claims Better task design: it is easier for test
(Richardson, Baird, Ridgway, Ripley, constructors to change tasks on the basis
Shorrocks-Taylor and Swan 2002; Ripley of information during testing and pre-
2004) that students prefer e-assessment to testing, because of the immediacy of data
paper-based assessment, because the collection. This can range from the
users feel more in control; interfaces are rejection of items that do not function well
judged to be friendly; and because some (for example items where students who
tests use games and simulations, which resemble both learning environments and recreational activities.

Better exemplification for students and teachers: posting examples of work which meets certain standards can be beneficial. In South Australia, excellent student work in technology is displayed on the web (see www.ssabsa.sa.edu.au/tech/2004techsho/index.htm).

Better system feedback: having full sets of response data from students available at the time of Examiners' Reports can improve the quality of feedback. Details of questions, and parts of questions, that proved relatively difficult and easy should improve the quality of Examiners' Reports (which are based currently on examiners' experiences of a sample of scripts, and rarely on candidate success on questions and part-questions). This information will be useful both for improving the quality of questions, and for providing information to teachers about topics that have not been learned well.

Faster information for higher education: universities need assessment results in a timely fashion. UK universities receive A-level results quite late in the academic year, and engage in a frenetic process to fill places with appropriately qualified applicants when students do and do not achieve the grades that were a condition of entry. These pressures would be eased if results were delivered earlier.

score well overall are likely to fail a particular item) to improved test design (for example, ensuring that there are a lot of items set around critical cut-off points, especially the pass/fail boundary, so that the test is most reliable there).

Cost: it is common to claim that e-assessment can save money; it is clear that online multiple choice tests can be cheap to administer and score. However, if we are to exploit the potential of ICT to improve assessment, for example by presenting simulations or video as an integral part of a test, then the costs of testing are likely to increase.

3.2 USES OF E-ASSESSMENT TO SUPPORT CURRENT EDUCATIONAL GOALS

3.2.1 Using ICT to support Multiple Choice Tests

This is a well-established technology, particularly well suited to assessing declarative knowledge ('knowing that') in well-defined domains. Developing tasks to identify student misconceptions is also possible. It is harder to assess procedural knowledge ('knowing how'). MCT is unsuited to eliciting student explanations, or other open responses. MCT have the great advantage that they can be very cheap to create and use. Some of this cheapness is illusory, because the costs
SECTION 3
CURRENT DEVELOPMENTS
IN E-ASSESSMENT
of designing good items can be high. Over-use of MCT can be very expensive, if it leads to a distortion of the curriculum in favour of atomised declarative knowledge, divorced from conceptual structures that students can use to work on the world, effectively. MCT are used extensively in the USA for high-stakes assessment, and are presented increasingly via the web. For example, web-based high-stakes State tests are available in Dakota and Georgia; the Graduate Record Examination (GRE), used to determine access to Graduate School in many US colleges, is available online.

3.2.2 Creating more authentic paper and pencil tests

It makes sense to allow students access to the tools they use in class, such as word processors, and that professionals use at work, such as graphing tools and modelling packages, during testing. It makes no sense at all to always forbid students to use tools of the trade when being assessed. E-learning changes the nature of the skills required. E-assessment allows examiners to focus more on conceptual understanding of what needs to be done to solve problems, and less on telling students what to do, then assessing them on their competence in using the manual techniques required to get the answer. In Australia, the State of Victoria (www.vcaa.vic.edu.au/prep10) has a system for essay marking where students key in their responses to questions, which are then distributed electronically and marked by human markers. Computer Algebra Systems (CAS) can be used in the Baccalauréat Général Mathématiques examination in France; the International Baccalaureate Organisation (IBO) is running a CAS pilot for its Higher Level Mathematics Diploma from September 2004. In the USA, CAS can be used when taking the College Board's Advanced Placement Calculus test.

3.2.3 Using ICT to support current UK examination processes

A number of ways in which ICT can improve current examination practices are set out below.

Better school-examination board communication: Tomlinson (2002) points to existing extensive use of ICT by awarding bodies in the examination process, and argues for more use of Electronic Data Interchange (EDI) systems, which enable schools and colleges to submit examination entries and information about candidates online and to receive results automatically.

Supporting the current marking and moderation process: a challenge faced by large-scale tests that require human markers is to ensure the comparability of standards across markers, and over time for all markers during the grading process. Chief examiners create scoring rubrics to guide other markers, and there is usually a process of standardisation where markers use the scoring rubrics to score a sample of scripts, and attend a standardising meeting where standards are compared, discrepancies are discussed, and the rubric is tuned. Once markers have reached an appropriate level of marking accuracy, they mark examinations independently. Systems vary in terms of the extent of the moderation used. In some systems, scripts are sampled by chief examiners, and serious deviation from the
rubric can lead to the remarking of all the scripts sent to a particular examiner. ICT can be used to support this process. Sample scripts typical of different categories of student work can be put online, for easy reference by markers. Entry of marks can be done via templates that ensure that markers complete every section, and the tedious process of aggregating marks from different parts of the script is done automatically and without error. Data is collected in a way that facilitates rapid and detailed analysis, at the level of responses to different parts of questions, whole questions, and the distribution of test scores.

Replacing paper: in the USA (and increasingly in the UK), there is widespread use of systems where students take paper-based examinations, and the scripts are scanned electronically (this is analogous to Optical Mark Recognition for multiple choice tests that has been available for many years). Once in this format, the documents can be sent electronically to markers, who can be working almost anywhere. These systems have a number of advantages over paper-based systems. First, there are considerable problems in tracking the distribution and return of large volumes of paper to and from markers; there are security issues sending examination papers by post, and scripts can get lost. Second, moderation of the quality of scoring can be done easily. Pre-scored anchor papers can be sent to markers during the course of their marking, to ensure they are maintaining standards; markers who do not perform adequately can be told to take a break, or can be removed from the pool of markers. The whole process can be monitored in terms of the rate at which scripts are being marked. There is flexibility in the ways that scoring is done. Markers can be asked to score whole scripts, or individual questions. So a newly appointed marker might be sent questions judged to be easy to mark, and more experienced markers might be sent questions which require deeper subject knowledge. The reliability of scoring can be increased. Scripts judged to be around key borderlines on first marking can be sent to other markers; scripts judged to be well away from boundaries need be scored only once. Online support can be provided; markers can ask for help with specific student responses. Data is captured in a form suitable for a number of subsequent analyses.

An interesting variant of this approach that obviates the need for scanning would be to require candidates to use intelligent pens. These pens have two distinct functions. The first is to write like a conventional pen. The second is to record its movements (exactly) on the page. This is done by using specially prepared stationery. Imagine you could see a small square area of a banknote. The pattern across the whole surface is never repeated, so that, given sufficient time, you could find exactly where the square is located on the note. The pen works in a similar way, to record its position on the page over the course of the examination. The pen is then connected to a computer, and all the data is downloaded. The whole student response can then be reconstructed. Clearly, this approach would have to be subjected to extensive trialling before any widespread adoption.
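The banknote analogy can be made concrete with a one-dimensional toy. The sketch below is an illustration only, and is not the pattern used by any real digital pen: it assumes a strip printed with a de Bruijn sequence, in which every window of N consecutive symbols occurs exactly once, so that a pen seeing any one window can look up where it is. The function names (de_bruijn, build_index), the alphabet size K and the window length N are all assumptions made for the example.

```python
def de_bruijn(k: int, n: int) -> list[int]:
    """Cyclic de Bruijn sequence over symbols 0..k-1 (k >= 2), window length n.

    Standard recursive construction that concatenates Lyndon words.
    """
    a = [0] * (k * n)
    sequence: list[int] = []

    def db(t: int, p: int) -> None:
        if t > n:
            if n % p == 0:
                sequence.extend(a[1:p + 1])
        else:
            a[t] = a[t - p]
            db(t + 1, p)
            for j in range(a[t - p] + 1, k):
                a[t] = j
                db(t + 1, t)

    db(1, 1)
    return sequence


def build_index(pattern: list[int], n: int) -> dict[tuple[int, ...], int]:
    """Map every window of n symbols to the position where it starts."""
    return {tuple(pattern[i:i + n]): i for i in range(len(pattern) - n + 1)}


K, N = 4, 3                      # hypothetical: 4 dot shapes, pen sees 3 at a time
cyclic = de_bruijn(K, N)
strip = cyclic + cyclic[:N - 1]  # unroll the cycle so every window is printed
index = build_index(strip, N)

# The pen reports what it sees; the lookup recovers its position on the strip.
seen = tuple(strip[17:17 + N])
position = index[seen]
```

A real pen would need a two-dimensional pattern and some tolerance of misread marks; the sketch only shows why a small visible patch is enough to fix a position when the pattern never repeats.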
3.2.4 Online assessment: turning a GCSE paper into computer-only e-assessment

An interesting challenge is to devise ways to replace paper-based tests with ICT-based tests, and to score them automatically. Some virtues of paper-based tests are unlikely to be replicated for a number of reasons, so setting tests on-screen is likely to bring about changes in the nature of what is assessed. Here, we consider one specimen GCSE mathematics paper to illustrate the problems.

Measuring and drawing: about 10% of the marks in the paper-based assessment required the use of actual instruments (ruler, protractor, compasses). One approach for translation onto screen would be to simulate the physical instruments, eg to provide a virtual protractor that can be dragged around the screen and rotated. Another is to provide CAD or interactive geometry packages. The latter would require a substantial change to the syllabus, but could provide real benefits in terms of student learning.

Mathematical expressions: about 20% of the marks required the student to write down answers that could not be keyed in, using a standard keyboard. These included fractions, division expressions, and powers.

Rough work and partial credit: almost every question in the paper format included space for rough work, and about 30% of the total marks potentially could be awarded based on this work, in the form of partial credit awarded where the final answer is incorrect (these marks are usually awarded in full if the final answer is correct). There are two distinct problems in translating this to a digital format: first, capturing the rough work, and second, allocating partial credit. Computer capture is very difficult, given current interfaces; the rules for allocating partial credit would have to be specified in very fine detail for them to be used as part of an automatic scoring routine.

3.2.5 Scoring of open responses

GCSE questions often require students to answer questions in their own way, and to explain things; scoring these responses automatically is inherently difficult. Automated scoring of open student responses is the focus of a good deal of ongoing work. A number of approaches have been taken to the problem of automatic scoring. One is based on the analysis of the surface features of the response (Cohen, Ben-Simon and Hovav 2003), such as the number of characters entered, the number of sentences, sentence length, the number of low-frequency words used, and the like. The success of such methods can be judged by comparing the correlation between computer and human judges, and the correlation between scores given by two sets of human judges. Cohen, Ben-Simon and Hovav (2003) looked at the scoring of a range of essay types by humans and computer, and report that the correlation between the number of characters keyed by the student and the scores given by human judges is as high as the correlation between scores given by human judges. Nevertheless, these scoring systems do not provide a panacea. In the USA, double marking is used to ensure reliability (this is rarely done in the UK). ICT can be used to moderate human markers (and save money): if the computer and the human disagree, the
paper is re-marked by a human. Machine-only scoring is unlikely to be useful in UK contexts, for two reasons. First is that the UK culture requires that scoring schemes be described in ways that are useful to teachers and students. Second is that the consequential validity of such scoring systems would be dire: the advice to students would be to improve their scores simply by using more keystrokes. A second approach which could improve the quality of scoring and reduce costs is being used to assess student responses on tasks in contexts where the range of acceptable responses can be well defined, such as in short answer science tasks (eg Sukkarieh, Pulman and Raikes 2003). Here, appropriate ('the Earth rotates around the sun') and inappropriate ('the sun rotates around the Earth') responses are defined. Lists of synonyms are generated for nouns ('our globe') and verbs ('circles'), and alternative grammatical forms are defined, based on analyses of large numbers of student responses. Student responses are parsed using techniques borrowed from Natural Language Processing, and are compared with stored appropriate and inappropriate responses, using a variety of Information Extraction techniques (see Cowie and Lehnert 1996). Mitchell, Aldridge, Williamson and Broomhead (2003) describe work at The Dundee Medical School. Here, all students take the same examination at the end of every year. Academics are presented with all the responses to the same question, with the computer's judgement on the correctness or otherwise of the answer, and an estimate of the confidence of the judgement. Human scoring time is dramatically reduced, and staff report positive benefits in terms of the quality of the questions they ask, both in terms of rewriting ambiguous questions (which produce student responses that are difficult to score) and in terms of writing questions which highlight student misconceptions. This approach requires a good deal of work prior to live testing, so is well suited to situations where tasks will be used repeatedly.

In the USA, the Graduate Management Admission Test (GMAT), used to determine access to business schools, uses automated scoring of text. Here again, the test is scored by both human and machine, to offer some sort of reliability check for the human marker.

3.3 ICT SUPPORT FOR CURRENT NEW EDUCATIONAL GOALS

There is an emerging consensus worldwide on new educational goals, focused on problem solving using mathematics and science, supported by an increased use of information technology (compare, for example, UK developments with those in New Zealand, www.minedu.govt.nz; and Singapore, www1.moe.edu.sg/iteducation). These new goals involve the development of higher-order thinking, and a range of social skills such as communication, and working in groups. There is an honourable tradition of assessing problem solving via the use of extended tasks, such as those developed by the APU (eg Archenhold, Bell, Donnelly, Johnson and Welford 1988). However, the computer offers some unique features in terms of representation, interaction, and its support for modelling. Here, we describe some recent developments which make use of these unique features.
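Looking back at the short-answer approach in Section 3.2.5, the matching of a response against stored appropriate and inappropriate answers can be sketched in a few lines. This is a deliberately crude toy under stated assumptions, not the method of Sukkarieh, Pulman and Raikes: it replaces parsing and Information Extraction with a hand-written synonym table and an exact comparison of word order, and the tables, stop-word list and function names are all invented for the example.

```python
import re

# Hypothetical synonym lists mapping student wording onto canonical terms.
SYNONYMS = {
    "globe": "earth", "world": "earth", "planet": "earth",
    "circles": "rotates", "orbits": "rotates", "revolves": "rotates",
}
# Words ignored when comparing responses (assumed list).
STOPWORDS = {"the", "our", "a", "around", "round", "about"}

# Stored model responses, in canonical form. Word order matters here.
APPROPRIATE = {("earth", "rotates", "sun")}
INAPPROPRIATE = {("sun", "rotates", "earth")}


def canonical(response: str) -> tuple[str, ...]:
    """Lower-case, drop stop words, and replace synonyms by canonical terms."""
    words = re.findall(r"[a-z]+", response.lower())
    return tuple(SYNONYMS.get(w, w) for w in words if w not in STOPWORDS)


def judge(response: str) -> str:
    """Return the stored judgement, or refer anything unrecognised to a human."""
    key = canonical(response)
    if key in APPROPRIATE:
        return "appropriate"
    if key in INAPPROPRIATE:
        return "inappropriate"
    return "refer to human marker"
```

For instance, judge("Our globe circles the Sun.") is classed as appropriate, while a response the tables do not cover is passed to a human marker rather than guessed at, which mirrors the use of the computer as a check on, not a replacement for, human judgement.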
the interactive
properties
of computers
make them well
suited to the
assessment of
process skills
their previous exposure to particular equipment can both reduce reliability, and add an extra cognitive load to the intellectual task being performed. In some situations, issues of health and safety arise. Some education systems are unwilling to accept teacher ratings of students for the purposes of high-stakes testing, with the result that process skills in science are not assessed at all. Computer-based assessment permits the assessment of these valuable aspects of learning science, at modest cost. A range of different process skills can be identified, which include:

• working systematically (for example, choosing tests systematically, controlling variables and recording results systematically)
• generating and testing hypotheses
• finding rules and relationships
• handling complex data
• testing solutions
• seeking completeness and rigour (in many real-world situations, exemplified by diagnosis and remediation in spheres such as medicine and industrial process control, it is important to find all of the faults in a system).

Five sets of live tests have been administered in the UK and elsewhere, each of which was preceded by extensive pre-testing. A notable result was the ease with which students interacted with computers. The affective response from students was very strong: they really enjoy working on these tasks. This might be related to the sustained challenge the tasks present, which is similar to the reported reasons why they like computer-based games (Kirriemuir and McFarlane 2004).

Students performed better on some tasks than one might expect, notably tasks that require them to reason from complex data sets (eg data with two independent variables and one dependent variable at age 9 years). We take this as a very positive sign that computers can play a leading role in the development of the skills which constitute the new educational agenda. In many aspects, student performance was poor: work characterised by guessing, too little use of systematic methods, poor hypothesis generation, and poor generalisation. On many tasks, students were able to show evidence of good reasoning skills; however, explanations were often weak. Given the earlier discussion of the impact of assessment on the curriculum, it is to be hoped that the use of e-assessment of process skills will lead to better student performance on a range of important activities.

World Class Tests focused on summative assessment in science, mathematics and technology, and used a variety of contexts, including geography and economics, as well as biology, physics, and engineering. The ideas are generic, and can be applied to many curriculum areas. On the basis of analyses of student performance on WCT, teaching modules for whole class use have been developed, targeted on weak process skills. These teaching modules provide a good deal of formative assessment, and require students to engage in reflective activities such as critiquing student work, and explaining their own solution strategies.

We discuss new educational goals that are less amenable to summative assessment (such as the ability to work in groups, to communicate, to learn to learn) in Section 4.
3.3.2 Assessing ICT at Key Stage 3

Ongoing work funded by QCA sets out to assess student attainment in ICT at age 13 years. A key principle for the design of these tests is that students should be tested on their performance on extended tasks (create a web page about topic X for audience Y, using a particular set of resources: a database, clients accessible via e-mail, spreadsheets for planning, web page creation tools), not on a series of sub-tasks (use a spreadsheet to add up these numbers). An extraordinarily ambitious goal is to present tasks and score performance entirely by computer. This is a laudable aim, and shows a government commitment to high quality e-assessment (including £20m for the project).

3.3.3 Digital portfolios

An historical legacy which bedevils the current education system in the UK is the distinction between academic and practical subjects. This was enshrined in the 1944 Education Act, which created grammar, technical and secondary modern schools (Tattersall 2003). Abstract thinking is important; appropriate action in context that rests on practical competence is important. Neither is much use on its own, and students should be taught to both abstract and apply. For this to become a classroom reality, assessment systems must require students to show the full spectrum of competencies in a number of school subjects. If high-stakes assessment systems fail to reward such behaviours, they are unlikely to be the focus of much work in school. E-portfolios offer a way forward.

There are three distinct uses for portfolios. The first is to provide a repository for student work; the second is to provide a stimulus for reflective activity, which might involve reflection by the student, and critical and creative input from peers and tutors; the third is as a showcase, which might be selected by the student to represent their best work (as in an artist's portfolio) or to show that the student has satisfied some externally defined criteria, as in some teacher accreditation systems (eg Schulman 1998). These uses are not mutually exclusive. Students may well wish to archive all their work; reflective activities and feedback from others will be based on a subset of this work; the final presentation portfolio will be selected from this corpus.

These different uses of portfolios reflect different, but not always incompatible, theories of learning. A behaviourist approach will focus on defining core competencies that are impossible to assess in timed examinations, and the need for fast and efficient feedback on student products. A social constructivist view will focus on the importance of reflection and sense making by a group (including the tutor) which will include the negotiation of educational goals.

ICT provides an opportunity to introduce manageable, high quality coursework as part of the summative assessment process. Student portfolios have been advocated for a long time, and have been used on a limited basis. From the viewpoint of assessment, the rationale for portfolios is clear: there are a number of valuable activities and attainments that cannot be assessed using the format of timed tests. The ability to create, design, reflect, modify and persevere are all
important goals of education. It is entirely appropriate to assess these processes by collecting evidence on the ability to engage in an extended piece of work, and to bring it to a successful conclusion by the creation of some product (lab report, video, installation etc). Part of the portfolio can (should) provide evidence of the range of personal skills demonstrated, perhaps under the headings suggested in the Tomlinson Report (2004): student self-awareness of themselves and the ways they learn and what they know; how students appear to, and interact with, others; thinking about possible futures and making informed decisions. A section of the portfolio in the form of a viva, or simply annotations of products, where students show their attainments in these three aspects of performance, is appropriate.

A number of problems are associated with portfolios and other sorts of coursework. One is the problem of storage, especially in design projects and in art. ICT can solve the problem by holding images of artefacts created. A second problem is student misbehaviour; this can have a number of forms. One is simply that work is plagiarised; another is that students create some artefact, then back-fill by inventing the development process (which is often assessed as part of the final mark) post hoc. ICT can help with both of these problems by requiring the submission of images of intermediate products, with time stamps. On a more positive note, the ability to store and work with images (photographs, video) is likely to make teaching of the design process more effective. Devices such as mobile phones with in-built cameras and facilities for audio recording make it easy to document the evolution of ideas and artefacts. This facility serves a number of functions. First, it simplifies the documentation of the development of work, reducing the 'busy work' students might otherwise have had to engage in. The process of documentation via a portfolio of work supports student reflections on processes: on decisions made deliberately, those forced by circumstances, and those that 'just sort of happened'. Digital images are easy to manipulate and present. Student presentation of work on the development of artefacts is easy, once images are captured digitally.

In some subjects, such as design and technology, and art, extended projects are at the heart of the discipline. The use of e-portfolios maps directly onto current conceptions of the domain, and offers practical solutions to some common problems (eg Kimbell 2003). This work is important, and is likely to be applicable on a large scale in the near future. A very large number of institutions have made use of portfolio systems; the American Association for Higher Education (AAHE) Portfolio Clearinghouse (www.aahe.org/teaching/portfolio_db.htm) provides an online searchable database of profiles of electronic portfolio projects and resources in higher education, and is a valuable source of ideas.

3.4 SUMMARY OF SECTION 3

There are a number of exciting developments in the use of e-assessment for both summative and formative purposes, and several UK developments are at the leading edge, worldwide. In the UK, the government has decided that extensive use will be made of e-assessment. Some of these developments are a response to current problems
SECTION 4
be part of both formative and summative assessment of these key elements of student performance.

Key aspects of performance relate to the exploration of the origins of the source, analysis of its qualities as a source, and its relation to a wider set of information. Successful formative assessment helps students to internalise questions and question styles. For summative assessment, we expect students to ask questions about the nature of the information source. The originator can be important: dietary advice from Kellogg's should be treated more cautiously than advice from the British Medical Association. Who created it? For what purpose? From what perspective was this written? The poor quality of much of the information on the web can be a virtue, pedagogically, because students see the sense in challenging the authority of any source, and can do so easily by considering alternative sources (eg Downes and Zammit 2000).

Skills in analysing documents in terms of their style and their use of particular rhetorical devices, and in creating documents for different audiences and in different writing genres, are being developed and used in English (and sociology and philosophy at university level). Again, the ubiquitous use of web sources provides both a rationale for the value of these analytic and creative activities, and a rich source of resources for assessment purposes.

The web makes it easy to compare and contrast different interpretations of the same events by different news providers, and by the same provider over time. In terms of assessment, students can be asked to compare and contrast different presentations, and to describe the evolution of a news event over time. This requires analysis of the way that evidence is selected, and the ways that events are reconstructed over time.

A further key aspect of knowledge use is the ability to relate a particular source to a larger body of knowledge. It will always be important for learners to develop rich schemas of knowledge (facts, skills, and procedures, and their interconnections) as the basis for judging the value or otherwise of putative new information, or a theoretical account. In science, a simple example is a digital image of a mammal with horns and claws. Students are expected to say it is most unlikely, because horns are associated with herbivores, and claws with carnivores. At a higher level of abstraction, students might be asked to resolve famous conflicts in scientific ideas, in terms of what was known at the time. For example, Lord Kelvin, probably the most distinguished scientist of his day, argued against the theory of evolution, on the grounds that the timescale was impossible. The core of the Earth is largely molten, but if the Earth were really the millions of years old needed for evolutionary processes to work, it would have cooled down long ago. What didn't he know (or is his criticism valid)? The web is a source of information that challenges current knowledge; students can be asked to relate breaking research to a wider set of knowledge. The recent scare over the MMR vaccine (and the damage that will be done to children by an under-analysed and over-publicised piece of research) provides an example.

A vivid example of summative evaluation which requires both a deep knowledge
schema and powerful skills in knowledge deconstruction and reconstruction is provided by a final undergraduate examination at Goldsmiths University on the art history course, where students are presented with two pictures, side by side, which they are to compare and contrast. They are required to name the artist, deconstruct the iconography, and interpret each work in its historical context. This could be presented via ICT, and could be extended to film, and to other contexts.

Another approach to supporting reflection about knowledge acquisition and creation is to incorporate assignments that require a reflective account of the process of creating some artefact (object or written). Students can be asked process questions about sources of information: ways to find good sources (perhaps in the form of advice to someone with a similar job to do), and about the sources themselves. They can be asked about problems faced, and the ways they were solved, in these meta-learning essays.

Open-web examinations offer a parallel to open-book examinations. One virtue of such examinations is that they are more authentic than conventional examinations, in that, outside educational contexts, one rarely has to answer a substantive question without any resources. They allow the examiner to set a broader range of questions, because students are not expected to retain all the relevant information in memory. An adaptive strategy for success on such examinations is to develop meta-knowledge of the whole area, and to index sources very carefully. A large information bank with no index is of little use. Compare the preparation necessary for this sort of examination with the cramming strategy that can be effective when preparing for conventional examinations. There, the danger is that students hold information in a relatively temporary state for the purpose of the examination, then forget the information once the examination is over. Open-web examinations are likely to have desirable consequential validity; that is to say, are likely to lead to desirable learning (and learning strategies). The unpopularity of open-book examinations (which probably arises because they require serious thought about the subject matter) is likely to apply equally to open-web examinations. The potential for fraudulent behaviour by students (such as e-mailing for advice in situations where the purpose of testing is to assess the ability to search the web, or searching the web when the purpose of the assessment is to assess networking skills) means that student activities will need to be constrained in appropriate ways. Nevertheless, open-web assessment should be explored further.

Analysing and improving cognitive processes: interactive whiteboards can provide the facility to work as a whole class on a problem or simulation, then to replay and critique the sequence of actions. This provides the opportunity to discuss seemingly abstract concepts such as strategy and exemplify them with concrete examples. Analogies with the analysis of games (eg tennis) can make the activity seem natural in class (of course, analysis of on-screen video of ongoing games is a specific example of the sorts of analyses being described here). The long-term intention is to help students develop metacognitive skills that will be applicable in a wide variety of situations. By looking at different solution attempts, students can be asked high-level questions such as
'how do you solve problems of this sort?', which can be assessed more
formally by tasks such as 'write some guidance for someone else, that will
help them to solve problems like this one'. A requirement for summative
e-portfolios could be that sample reflective analyses of processes be
included.

These techniques have great potential when the focus is on the social and
emotional education of students. Topics raised in personal and social
education such as approaches to bullying can be approached by presenting
students with video vignettes, and asking them to describe situations, the
interactions that take place, and the feelings of participants. Parallel
information channels (provided by the participants) can provide students
with feedback on the correctness or otherwise of their insights. At a lower
level, assessing children's ability to identify the emotions being expressed
in different faces can give insights into their developmental state (or, in
more extreme cases, into pathological states such as autism). If summative
information is appropriate, it can be based on the analysis of such
vignettes.

Supporting reflection and critical skills: an important higher-order skill
is the ability to review and improve work. This can be done via paper and
pencil (for example by writing on every third line, and changing pen colour
at every revision cycle), but is made very easy by the use of ICT, with
facilities such as 'track changes' in MS-Word. Students can be asked to
provide examples of their ability to improve work on the basis of others'
and their own suggestions, and of their ability to critique the work of
others. Another way to assess critical thinking is to require students to
annotate work to show where they meet the assessment criteria.

Courtenay (personal communication, 2004) described an activity designed to
support creative writing in English in a night class comprised of 30
non-native speakers at an early stage of learning English. Courtenay focuses
on creation and critique, and seeks to spend as much time as possible
interacting with his students. Each student writes online, and when they are
satisfied with their composition, it is posted to a shared server. Every
student is required to offer constructive comments on five compositions, and
to revise their own writing in the light of five sets of comments. The
teacher is able to tour and coach individuals as they write. With little
effort, this approach could be extended to providing summative assessment.
Students could be required to submit their comments on others' writing to be
evaluated, and could provide evidence of their ability to use comments on
their own work. An assessment system like this would reinforce rather than
distort the educational ambitions of the teacher.

Peer assessment is attractive for a number of reasons. (Topping's 1998
review demonstrated that it is associated with gains on conventional
performance measures, in higher education.) Students can be asked to create
far more pieces of work than could be marked by a single tutor. It can avoid
the problem that as a class size gets bigger, the load on the tutor
increases directly, along with the time taken to provide feedback to
students. Students must understand criteria for assessment, and must acquire
a range of higher-order skills, such as abstracting ideas, detecting errors
and misconceptions, critiquing and suggesting improvements, if
they are to engage in peer assessment. Peer assessment is a fact of life
outside education, so peer assessment is far more authentic than some forms
of assessment such as multiple choice tests. Possible disadvantages relate
to the possibility of an enhanced workload on students, unreliable feedback,
and biased feedback.

A number of commercially available systems have been designed to support
peer assessment. Calibrated Peer Review (Chapman and Fiore 2001) was
designed to support the peer assessment of essays in molecular science, but
has been applied in a variety of subjects, and with students across the
education system. Students write short essays, and are asked questions
designed to foster their critical thinking. Students are presented with
three calibration essays to grade, and must demonstrate their competence
before they progress. Two of the essays contain errors and misconceptions
which students must identify and correct. Students are also asked questions
on style and grammar. The scores they give to the assignments are compared
with official scores, and a calibration report is created for the student
and the tutor. If performance is inadequate, more instruction is provided,
and the student must repeat the activity. Once they have shown that they can
assess essays effectively and reliably, they are asked to grade three essays
by peers, and finally are asked to grade their own essay. The student and
the instructor receive comments and scores.

CPR is not restricted to essays in science; the idea is generic, and can be
applied to literary criticism, commentaries on a piece of art, or laboratory
reports, for example. The tutor must select the focus of the assignment,
write an exemplar answer for calibration, and select two pieces of student
work which contain interesting errors or omissions. Each of these has to be
graded by the tutor, and relevant comments have to be written. The tutor
also writes key questions on content and style. CPR is designed to overcome
the potential weakness of peer assessment in terms of unreliable assessment
(via training and moderation) and bias (via anonymity). The authors claim
considerable gains in students' ability to 'learn to learn' because their
attention is focused on abstracting ideas and arguments, describing,
analysing and assessing the quality of material, and in review. CPR also
increases the amount of writing that students do.

Doiron and Isaac (2002) have developed a novel form of online peer review
designed to complement the American College of Surgeons Advanced Trauma Life
Support Course for fourth year medical students. Their system involves
self-assessment, peer evaluation, feedback and debate. There is an inherent
problem giving large numbers of students direct experience of Emergency Room
procedures. Here, students are presented with a realistic case study, and
must prevent the patient from dying, conduct clinical tests, then request
appropriate lab work followed by diagnosis and recommendation of a
treatment. Students reflect on, and self-assess, their knowledge. They
submit a diagnosis and proposed treatment plan to the whole group. For peer
review, they are presented with two other diagnoses and treatments - one
from the tutor, prepared to contain errors, for critique. If the student
fails to detect the errors, they get individual feedback from the tutor.
Students then review live reports from
two of their peers (so three reviews are considered together). Where there
are disagreements, the two views are presented to a larger group (four to
ten students) who must all offer their own view, and debate the issue.
Similar work is being conducted on a health psychology course, and in
engineering.

Assessing competence with different thinking styles: mobile phone technology
might provide a means of assessing thinking styles via simulated group work.
Here, each student works in a simulated environment, where responses from
other group members are pre-specified, and some responses to the actions of
the student are pre-defined. This environment is artificial for a number of
obvious reasons: contact is via phone (or e-mail) rather than face-to-face,
and the range of dynamic interactions is constrained. However, these
constraints mean that students can be assessed in relatively standardised
conditions, and sequences can be replayed for analysis and reflection as
part of formative assessment.

Analysing the ability to engage in De Bono's (2000) Thinking Hats activity
provides a concrete example. De Bono has identified a number of thinking
styles, all of which are useful when solving problems. None is effective on
its own. He argues that people differ in their preferences for these
different thinking styles, and often stick with a particular style of
thinking. In terms of group dynamics, individuals can become ego-involved
with a particular style of thinking, with negative consequences for the
productivity of the group. De Bono argues that these different thinking
styles should be made explicit, and that every group member should engage
with every thinking style in the course of group work. He suggests a formal
mechanism for this, where thinking styles are associated with hats of
different colours, and group members are invited to take particular roles,
sometimes as individuals, and sometimes as a whole group. Thinking styles
include asking about what is known or what is needed (the White Hat); saying
why an idea won't work (the Black Hat); generating ideas and alternatives
(the Green Hat); describing feelings, hunches and intuitions (the Red Hat);
managing group processes (the Blue Hat); and the optimistic advocacy of
ideas (the Yellow Hat).

Given some specific suggestions for actions via mobile phone or e-mail,
students can be asked to work in Red, Yellow and Black Hat styles; or given
a stream of (simulated) input to a conference, students can be asked to work
in Blue Hat mode. Their responses provide information on their strengths and
weaknesses working in different thinking styles. This idea is not restricted
to de Bono's framework, but is a generic idea for assessing individual
skills in group settings.

4.1.2 Assessing group projects

A valuable skill is the ability to work productively in groups. This
requires good communication skills, understanding the criteria for effective
group work, understanding different roles, the ability to assess one's own
work and the work of others, and the ability to respond positively to
formative and summative feedback. The assessment of group work is
problematic for a number of reasons: problems can be caused by social
loafing and the allocation of equal marks for unequal
contributions; undesirable effects of students rating peers; and time-hungry
procedures for gathering accurate evidence on student performance.

SPARK (Self and Peer Assessment Resource Kit -
www.educ.dab.uts.edu.au/darrall/sparksite) is an academic open source
project designed to support the effective evaluation of group work, that has
been used in a variety of contexts in higher education. It requires a clear
specification of the tasks to be performed by the group and the assessment
criteria. Students reflect on group processes during the performance of the
task, and rate all the group members, and themselves, against the criteria
provided. The tutor monitors the work of the group, grades the product of
the group work, uses SPARK to convert group marks into individual marks, and
provides individual summative and formative feedback (eg that a student
rates their own contribution to the group far higher than other group
members do). Evaluations of SPARK by its authors in a variety of higher
education contexts have been positive (eg Freeman and McKenzie 2002).

4.1.3 Assessing creativity

Creativity involves the production of a new idea or artefact that is judged
by some community to be of value. Many writers have made a distinction
between analytic and creative thinking. Analytic thinking has been
characterised as: linear, rational, logical, conscious and deliberate.
Creative thinking has been described as: parallel, unconstrained, illogical,
unconscious, and chaotic. Creativity became a bandwagon for education in the
1960s, in part as a healthy corrective to an over-emphasis on Intelligence.
A problem with some of these early proponents of creativity (eg Getzels and
Jackson 1962) was that they accepted many of the philosophical assumptions
of the Intelligence movement, and many of their methods, but were
incompetent in their use. The result was a movement that was based on some
good ideas, but which was poorly theorised, and supported by flawed
evidence. Just as there are many styles of analytic thinking, that are
coloured and improved by knowledge in particular domains, and different ways
to represent information, so too are there many styles of creative thinking,
again, influenced by knowledge and experience in a variety of domains.
Creativity (as defined above) requires an intimate interplay of creative and
analytic thinking. It is important to develop creativity, and to evaluate
the products of creative thinking. Creativity should be evaluated by an
analysis of product, and by an analysis of student processes, using methods
described earlier (notably, tracking the design process, and reflective
accounts on this process).

It can be difficult to obtain good paper-based accounts of student processes
and results after engaging with an extended piece of work. This can be a
desirable activity for a number of reasons. First, it requires students to
translate knowledge from one form to another, and to consider the needs of a
different audience, notably from a static written form whose primary
audience is the teacher, to a visual and dynamic form for some predefined
audience, who will have a range of understandings about the topic in hand.
Second, it is inherently valuable as a skill. Digital cameras and
whiteboards make it easy for students to show their work (which might be on
paper, in the form of
conducted, and so can widen the debate on curriculum and assessment design.
On-demand testing will have considerable implications for curriculum
planning. Students could take summative tests at different times, and could
progress through the curriculum at different rates.

E-assessment could reduce the damage caused by current tests. At present,
new SAT papers are created each year, and all students answer the same
questions. If the purpose of testing is to establish the performance of some
system (such as a school or an LEA), better methods could be employed. If
there were a large bank of tasks available in electronic form, and different
students received a different set of tasks, then coverage of the curriculum
could be better, and there would be no need to report individual student
scores. This would have the advantage that a larger variety of task types
could be used, and would avoid the current distortions caused by teachers
teaching to the SAT.

4.3 EVOLUTION AND REVOLUTION

Even where there is a shared vision on future curricula, there can be
considerable problems in implementation. Ridgway (1998) draws analogies
between ecological restoration and educational change, and describes the
sorts of research needed for successful change. This style is close to
research in fast-changing fields such as electronics, where discoveries and
inventions drive practice and theory, in contrast to well-established fields
where theory can lead practice. It is important to be aware that some goals
are easy to achieve from most starting points, whilst others need a good
deal of capacity building before they can be reached. It will be important
to phase the introduction of e-assessment in such a way that the load on
students, teachers, schools and systems is lower than the current assessment
load. Some barriers are discussed below.

Establishing the credibility of e-assessment: in some areas such as
competency-based assessment, the case for e-assessment is self-evident. In
other areas, reasonable sceptics will have to be convinced of its value.
They will have concerns about the construct validity of new tests (exactly
what do they measure?); the reliability of new tests in comparison with
existing tests; and the educational standards required, both in relation to
current tests, and across tests such as those given on-demand in different
places and at different times. Each of these questions will need to be
addressed for each family of e-assessments, usually by means of an empirical
study.

Building system capacity: there is an urgent need to build capacity for
e-assessment that ranges from test design, test delivery and processing, and
expertise in school. Each of these is problematic.

Task and test design: very few people have expertise in creating
e-assessments, in comparison to the large numbers of people competent to
create conventional tests. There is an urgent need to create new task types
and to explore their reliability and validity. If we do not continue to
explore, students will be faced with a set of tasks which recently were
innovative, but which are now hackneyed.
4.4 RELIABLE TEACHER ASSESSMENT VIA E-PORTFOLIOS

A key decision for educational systems is to decide exactly how much of the
students' time should be devoted to working on extended projects, and how
much should be based on shorter activities. A related decision is the
balance to be struck between portfolio systems assessed in school, and timed
external assessments. A key issue is to establish robust and reliable
systems of school-based assessment. It is worth highlighting the extreme
positions that different systems use. In some systems, all assessment is
done externally. In some systems - for example Queensland, Australia - all
assessment is school-based. Queensland provides extensive systems for
training teachers, and for moderating their judgements. ICT can facilitate
this process. All student submissions can be put onto the web, and systems
of cross-moderation can be established. Externally defined tests can be used
to guide the moderation process.

4.6 SUMMARY OF SECTION 4

New educational goals continue to emerge, and the process of critical
reflection on what is important to learn, and how this might be assessed
authentically needs to be institutionalised into curriculum planning. In
this section, we explore ways to assess metacognition, group projects,
creativity and communication skills. E-assessment is certain to play a major
role in defining and implementing curriculum change in the UK. There is a
strong government commitment to e-assessment, and good initial progress has
been made. Major challenges of going to scale have yet to be faced. A good
deal of innovative work is needed, coupled with a grounded approach to
system-wide implementation.
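
The suggestion in section 4.4 that externally defined tests can guide
moderation could be sketched roughly as follows. This is an illustration
only, not a description of the Queensland system or of any existing tool:
the function name flag_for_moderation, the record layout, the sample marks
and the 5-mark tolerance are all assumptions made for the example. For each
school, the sketch compares the mean teacher-assessed mark with the mean
external test mark for the same students, and flags schools where the gap
exceeds the tolerance.

```python
from statistics import mean

def flag_for_moderation(records, tolerance=5.0):
    """Return {school: gap} for schools whose mean teacher mark differs
    from the mean external test mark by more than `tolerance`.

    `records` is a list of (school, teacher_mark, external_mark) tuples,
    with both marks on the same scale. All names here are hypothetical.
    """
    # Group the paired marks by school
    by_school = {}
    for school, teacher_mark, external_mark in records:
        by_school.setdefault(school, []).append((teacher_mark, external_mark))

    # Compare the two means school by school
    flagged = {}
    for school, pairs in by_school.items():
        gap = mean(t for t, _ in pairs) - mean(e for _, e in pairs)
        if abs(gap) > tolerance:
            flagged[school] = round(gap, 1)
    return flagged

# Invented sample data: (school, teacher-assessed mark, external test mark)
sample = [
    ("School A", 62, 60), ("School A", 71, 70), ("School A", 55, 57),
    ("School B", 78, 65), ("School B", 82, 70), ("School B", 74, 66),
]
print(flag_for_moderation(sample))
```

A positive gap means that teacher marks are higher than the external test
would suggest. In this sketch a flag is only a prompt for closer
cross-moderation of that school's submissions, not an automatic adjustment
of marks.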
GLOSSARY
Discrimination - the ability to distinguish between and among different
levels of work or achievement

E-assessment - electronic assessment: processes involving the implementation
of ICT for the recording, transmission, presentation and processing of
assessment material

Edexcel - UK examining and awarding body providing a range of qualifications
including at higher education level

EiC - Excellence in Cities. Government initiative aimed at raising the
educational aspirations and attainment of children in inner cities

European Computer Driving Licence - European-wide qualification allowing
candidates to demonstrate competence in computer skills, covering the areas
of basic concepts of IT, using the computer and managing files, word
processing, spreadsheets, database, presentation and information, and
communication

Formative assessment - often called assessment for learning. Assessment used
to support teaching and learning, which identifies strengths and weaknesses
of the student

GCE - General Certificate of Education

GCSE - General Certificate of Secondary Education (GCSE). The main secondary
school examinations usually at 16, which replaced previous system GCE
O-levels and CSEs

GIS - Geographic Information System. System of software used for the
storage, retrieval, mapping and analysis of spatial data, such as mortality
by different regions

GNVQ - General National Vocational Qualification. Vocational qualification,
often taken as an alternative to GCSE or A-levels, usually after compulsory
schooling. Available at three levels: Foundation, Intermediate, and Advanced

High-stakes assessment - assessment that has important consequences or
implications for students, staff or schools

ICT - Information and Communications Technology

Key skills - a group of skills valued by employers as being central to all
work and learning, including communication, information technology,
application of numbers, working with others, and improving own learning and
performance

Key Stages - the four stages of the National Curriculum: KS1 for pupils aged
5-7; KS2 for 7-11; KS3 for 11-14; KS4 for 14-16

NVQ - National Vocational Qualifications. Work-based vocational
qualifications. They are portfolio-based qualifications which show skills,
knowledge and ability in specific work areas. Can be taken at five levels,
depending on level of expertise and responsibility of the job

O-level - also GCE Ordinary level. Former system of British examinations
taken in a range of subjects, usually at the age of 16. Ran in parallel with
but at a higher level than CSE. Both systems now replaced by current GCSE

Parallel forms - tests that are created to measure the same constructs, and
to produce the same scores, if they were given to individuals on different
occasions

PDA - Personal Digital Assistant; a small hand-held computer. Depending on
level of sophistication may allow e-mail, word processing, music playback,
internet access, digital photography or GPS reception
BIBLIOGRAPHY
presented to the 29th Annual Conference of the International Association for
Educational Assessment. www.aqa.org.uk/support/iaea/papers/roan.pdf

Robitaille, DF, Schmidt, WH, Raizen, S, McKnight, C, Britton, E and Nicol, C
(1993). Curriculum frameworks for mathematics and science. TIMSS Monograph
No 1. Vancouver: Pacific Educational Press

Rt Hon Charles Clarke MP, Secretary of State for Education and Skills.
Keynote speech at Delivering E-assessment - a Fair Deal for Learners, a
summit held by QCA on 20 April 2004

Schulman, L (1998). Teacher portfolios: a theoretical activity, in: N Lyons
(Ed) With Portfolio in Hand: Validating the New Teacher Professionalism
(pp23-37). NY: Teachers College Press

Slaughter, S and Leslie, LL (1997). Academic Capitalism: Politics, Policies
and the Entrepreneurial University. Baltimore: The Johns Hopkins University
Press

Sukkarieh, JZ, Pulman, SG and Raikes, N (2003). Auto-marking: using
computational linguistics to score short, free text responses. Paper
presented to the 29th Annual Conference of the International Association for
Educational Assessment.
www.aqa.org.uk/support/iaea/papers/sukkarieh-pulman-raikes.pdf

Tattersall, K (2003). Ringing the changes: educational and assessment
policies, 1900 to the present, in: Setting the Standard. AQA: Manchester,
pp7-27

Teacher Training Agency (2003). Qualifying to Teach: Professional Standards
for Qualified Teacher Status and Requirements for Initial Teacher Training

Tomlinson, M (2002). Inquiry into A Level Standards. London: DfES

Tomlinson, M (2004). 14-19 Curriculum and Qualifications Reform: Interim
Report Of The Working Group On 14-19 Reform. London: DfES.
www.14-19reform.gov.uk

Topping, KJ (1998). Peer assessment between students in college and
university. Review of Educational Research. 68 (3), 249-276
APPENDIX:
FUNDAMENTALS OF ASSESSMENT
Height is affected by the circumstances of measurement - students should
take off their shoes and hats, and should not slump when they are measured.
The measuring instrument is important - a yard stick will provide a crude
estimate, good for identifying students who are exceptionally short or
exceptionally tall, but not capable of fine discriminations between
students; using a tape measure is likely to lead to more measurement error
than using a fixed vertical ruler with a bar which rests on each student's
head. Time of day should be considered (people are taller in the morning);
so should the time between measurements. If we assess the reliability of
measurement by comparing measurements on successive occasions, we will
under-estimate reliability if the measures are taken too far apart, and
students grow different amounts in the intervening period.

Exploration of reliability raises a set of finer-grained questions. Here are
some examples:

- is the phenomenon being measured relatively stable? What inherent
  variation do we expect? (mood is likely to be less stable than vocabulary
  size)
- to what extent do different markers assign the same marks as each other to
  a set of student responses?
- do students of equal ability get the same marks no matter which version of
  the test they take?

Fitness for purpose: the quality of any design can be judged in terms of its
fitness for purpose. Tests are designed for a variety of purposes, and so
the criteria for judging a particular test will shift as a function of its
intended purpose; the same test may be well suited to one purpose and ill
suited to another.

Usability: people using an assessment system - notably students and
teachers - need to understand and be sympathetic to its purposes.

Practicality: few designers work in arenas where cost is irrelevant. In
educational settings, a major restriction on design is the total cost of the
assessment system. The key principle here is that test administration and
scoring must be manageable within existing financial resources, and should
be cost-effective in the context of the education of students.

Equity: equity issues must be addressed - inequitable tests are (by
definition) unfair, illegal, and can have negative social consequences.
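
One of the reliability questions listed in this appendix, the extent to
which different markers assign the same marks to a set of student responses,
can be explored with a few lines of Python. This is a hedged sketch rather
than a recommended procedure: the marks are invented, the function names are
hypothetical, and the two statistics used here (the proportion of exact
agreement and the Pearson correlation) are only two of several possible
indices of marker agreement.

```python
from math import sqrt

def exact_agreement(marks_a, marks_b):
    """Proportion of responses given identical marks by both markers."""
    same = sum(1 for a, b in zip(marks_a, marks_b) if a == b)
    return same / len(marks_a)

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of marks."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Invented marks from two markers for the same eight responses
marker_1 = [3, 5, 4, 2, 5, 1, 4, 3]
marker_2 = [3, 4, 4, 2, 5, 2, 4, 3]

print(exact_agreement(marker_1, marker_2))
print(round(pearson_r(marker_1, marker_2), 2))
```

In this invented sample the two markers give identical marks to six of the
eight responses. The two indices answer different questions: the correlation
shows whether the markers rank responses similarly, while exact agreement
shows how often a student would receive the same mark from either marker.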
About Futurelab
Futurelab is passionate about transforming the way people learn. Tapping into the huge
potential offered by digital and other technologies, we are developing innovative learning
resources and practices that support new approaches to education for the 21st century.
Futurelab:
- incubates new ideas, taking them from the lab to the classroom
- offers hard evidence and practical advice to support the design and use of
  innovative learning tools
- communicates the latest thinking and practice in educational ICT
- provides the space for experimentation and the exchange of ideas between
  the creative, technology and education sectors.
Futurelab
1 Canons Road
Harbourside
Bristol BS1 5UH
United Kingdom
www.futurelab.org.uk
Handbooks
Drawing on Futurelab's in-house R&D programme as well as projects from around the
world, these handbooks offer practical advice and guidance to support the design and
development of new approaches to education.
We encourage the use and circulation of the text content of these publications, which
are available to download from the Futurelab website www.futurelab.org.uk/research.
For full details of our open access policy, go to www.futurelab.org.uk/open_access.htm.
Creative Commons
Futurelab 2006. All rights reserved; Futurelab has an open access policy which encourages circulation of
our work, including this report, under certain copyright conditions - however, please ensure that Futurelab is
acknowledged. For full details of our Creative Commons licence, go to www.futurelab.org.uk/open_access.htm
Disclaimer
These reviews have been published to present useful and timely information and to stimulate thinking and
debate. It should be recognised that the opinions expressed in this document are personal to the author and
should not be taken to reflect the views of Futurelab. Futurelab does not guarantee the accuracy of the
information or opinion contained within the review.
FUTURELAB SERIES
REPORT 10
ISBN: 0-9544695-8-5
Futurelab 2004