Author's final manuscript. This article was published online in July 2015. It will be assigned to an issue of the journal with final page numbers in due course. Publication details: Sadler, D. R. (2015, online). Three in-course assessment reforms to improve higher education learning outcomes. Assessment & Evaluation in Higher Education. DOI: 10.1080/02602938.2015.1064858
D. Royce Sadler
School of Education, The University of Queensland
Abstract
A current international concern is that, for too large a proportion of graduates, their
higher-order cognitive and practical capabilities are below acceptable levels. The
constituent courses of academic programs are the most logical sites for developing
these capabilities. Contributing to patchy attainment are deficiencies in three particular
aspects of assessment practice: the design and specifications of many assessment tasks;
the minimum requirements for awarding a passing grade in a course and granting
credit towards the degree; and the accumulation of points derived from quizzes,
assessments or activities completed during the teaching period. Rethinking and
reforming these would lead to improvements for significant sub-populations of
students. Pursuing such a goal would also have significant positive implications for
academic teachers but be contingent on favourable contextual settings including
departmental and institutional priorities.
Introduction
This article is mainly about cognitive capabilities that are important in most academic fields:
proficiency in thinking, reasoning, synthesising, conceptualising, evaluating and
communicating. These higher-order capabilities form a subset of what are variously called 'intended learning outcomes' (Biggs and Tang 2011), or some combination of 'generic', 'graduate' or 'higher education' with 'competencies', 'skills' or 'attributes'. With
the rapid expansion of higher education worldwide, it is natural to ask about the extent to
which all students can demonstrate adequate levels of such higher-order capabilities by the
time they graduate. But what is meant by adequate? This is the fundamental question. A
number of agencies and commentators referenced in the next section have alleged that while
many graduates do achieve desired standards, many others do not.
This article is based on the premise that the most logical, direct and appropriate site for
developing capabilities is within the courses that constitute degree programs. Research by
Jones (2009, 2013) has demonstrated that interpretations of the competences differ from
field to field, sometimes widely. This is the nature of disciplines. However, there are
reasonable grounds for believing that capabilities developed thoroughly in one context (a particular course or sequence of courses) normally have a transferable element to them.
This allows them to be reconfigured and repurposed for use in other contexts at other times.
As Strathern (1997, 320), an anthropologist, explained it:
In making transferable skills an objective, one cannot reproduce what makes a skill
work, i.e. its embeddedness. … [W]hat is needed is the very ability to embed oneself
in diverse contexts, but that can only be learnt one context at a time. … [I]f you
embed yourself in site A you are more likely, not less, to be able to embed yourself
in site B. But if in Site A you are always casting around for how you might do
research in B or C or D, you never learn that. There is a lesson here for disciplines.
Somehow we have to produce embedded knowledge: i.e. insights that are there
for excavating later, when the context is right, but not until then. … [W]e have not
to block or hinder the organism's capacity to use time for the absorption of
information – time-released knowledge or delayed-reaction comprehension.
[Capitalization in the original].
Reforming three particular assessment practices would increase the likelihood that more
students, especially those currently at the minimum pass level, would achieve the levels
expected of all graduates. The three form a mutually interdependent package. They are: the
design and specification of assessment tasks; the requirements for a Pass; and the design of
course assessment programs. Wherever these are not currently being practiced as aspects of
normal institutional quality assurance, they amount to reforms that require enabling changes
to be made elsewhere in the learning environment.
Context
Two widely read books, by Bok (2006) and Arum and Roksa (2010) respectively, describe unevenness in graduate outcomes as perceived in the USA. Bok (2006, 7-8) wrote: 'Survey after survey of students and recent graduates shows that they are remarkably pleased with their college years. Overall, they also achieve significant gains in critical thinking, general knowledge, moral reasoning, quantitative skills, and other competencies.' At the same time, and fully compatible with that, colleges and universities, for all the benefits they bring, 'accomplish far less for their students than they should'. Many seniors graduate without being able to 'write well enough to satisfy their employers' (8) by expressing themselves 'with clarity, precision, style and grace' (82). Many cannot 'reason clearly or perform competently in analysing complex, nontechnical problems', even though faculties rank critical thinking as the primary goal of a college education (8). 'The ability to think critically – to ask pertinent questions, recognize and define problems, identify the arguments on all sides of an issue, search for and use relevant data, and arrive in the end at carefully reasoned judgments – is the indispensable means of making effective use of information' (109).
Here, Bok has raised quite specific concerns. They may be valid to a greater or lesser extent for particular institutions, academic degree programs or component courses – there is usually no independent way of telling. However, his portrayal of the situation in the USA
resonates with similar concerns raised in other countries. These are reflected in the number
of national and international discussions, policies, projects, regulations, instruments and
forms of cooperation aimed at assuring graduate outcomes (Australian Learning and
Teaching Council 2010; Bergan and Damian 2010; Lewis 2010; Williams 2010; Douglass,
Thomson and Zhao 2012; Blömeke, Zlatkin-Troitschanskaia, Kuhn and Fege 2013; Dill and Beerkens 2013; Sadler 2013b; Shavelson 2010, 2013; Tremblay 2013; Coates 2014). Part
of the overall unease is because, globally, higher education has expanded rapidly without
matching increases in public funding directed specifically towards teaching.
Despite what may seem an overwhelming challenge, progress could be made by
ensuring that the course grades entered on students' academic transcripts can be trusted to
represent adequate levels of the expected graduate competencies. Across a full degree
program, the transcript reports student performance on a large range of demanding tasks, in a
wide variety of courses, studied over a considerable period of time, and covering substantial
disciplinary and professional territory. Specialised tests of graduate competencies are not set
up to do this (Shavelson 2013). If third parties are to draw reasonably robust conclusions
about a graduate's acquired overall capability or competence, the grades on transcripts must
be trustworthy.
similarities, differences, and the superior fit of one of the styles for a particular purpose.
(Many other possibilities exist.) These students' comprehensive understanding of both the
topic itself and the assessment context leads them to be analytical rather than descriptive,
and high marks typically follow. Regardless of the actual form of the assessment task
specifications, examiners and markers find that student responses generally range from low
to high quality for any reasonably sized student group. In some examiners' eyes, this range
would be sufficient evidence to conclude that the structure and content of the assessment
task is unproblematic, thus reinforcing the status quo. That reasoning is faulty.
An example of better design for the leadership styles task would be to set up some
scenario involving two particular types of organisations (say, a voluntary association and a
business employing mostly casual staff). Ask students to explain which leadership style, or
which aspects of each, might suit the two organisations. Making the intention clear in this
way makes separate descriptions unnecessary, because how well students know the two
styles will be evident in their responses. The improved design also makes it reasonable to
hold all students to the task requirements. This is an important consideration if the
evidence of achievement is not to be compromised by poor item structure. 'Poor quality evidence of a student's … achievement must not be confused with evidence of poor achievement' (Sadler 2014b, 286).
In general, tasks need to stimulate higher-order thought processes such as:
hypothesising; extrapolating (or interpolating); exploring and articulating relationships
among things; estimating the likely effects of varying the parameters of a system;
redesigning something to suit a new purpose; using analogues as explanatory tools; outlining
and defending a scenario; and evaluating inadequacies or errors in solutions or arguments.
Given the huge variety of expected outcomes in different disciplines, fields and professions,
academics in those fields are best placed to determine the nature of well-formed questions
that push the students into the right amount of unfamiliar territory.
Ideally, the task specifications identify for all students the genre of response required.
Critical reviews, arguments, underlying assumptions that have to be identified, and causal
explanations are all distinct response genres (Sadler 2014a). This does not mean that students
should be given copious instructions on how to go about the task or detailed rubrics and
statements of criteria and standards of the type often recommended (Grunert O'Brien, Millis
and Cohen 2008). It means they have the right to know the genre for their response. It is both
illogical and counterproductive to appraise the quality of a student's work as a member of a particular genre if the work is not actually a member of that genre. (The concept of 'response genre' is not identical with 'writing genre' as Gardner and Nesi [2013] use the term in connection with teaching academic writing to students.)
Creating demanding assessment tasks from scratch is hard work if the tasks are to tap
into higher-order operations on ideas and information. A straightforward way to proceed is
to collect a broad range of existing tasks that require students to construct responses of
considerable length. Sources include previous assignment tasks, project descriptions and
examinations in the field. Similar material from related discipline fields may also prove
useful for ideas. Academics, individually or in groups, generally can, without special tuition
or much difficulty, scrutinise the materials, broaden their own insights, and differentiate
them according to quality. In so doing, they expand their own understanding of the
possibilities and can decide which to avoid, emulate or adapt to suit their own context and
purpose. They can also imagine themselves as students faced with responding to particular
task specifications, trying to figure out how they would proceed.
Potential sources also include real-life problems in the relevant field. These may be of
special value in assessing graduate capability late in a degree program. Although it may not
be feasible to deal with the complexities of the full problem in its context, doing away with
unnecessary detail has to be balanced against the value of providing students with experience
in deciding for themselves what is necessary and what may be safely discarded to make the
problem amenable to solution (Taylor 1974).
problem. The explanation can be found in Sadler's (2014b) parallel argument about the
impossibility of expressing academic achievement standards in verbal or other codified form.
The same reasoning applies to learning outcomes. The key terms in the language used cannot
be interpreted unambiguously. They float according to context because they have imagined
rather than concrete referents. On the other hand, assessment tasks and specifications are
material formulations that can be exhibited, argued about and administered. They provide the
sharpest and most direct tool available for discussing, clarifying and communicating course
intentions for students and academics alike.
How do institutions conceptualise what should count as a pass? Some clues can be found in
their published grade descriptors, where these exist, although the statements may not
necessarily correspond closely with actual grading decisions. Consider the five statements in
Table 1 outlining what a 'Pass' represents in five different institutions, all obtained from their web sites. All use the word 'Pass' explicitly as a grade label or refer to 'a pass' in associated
documentation. In some cases, conditions apply. For example, the number of courses that
can be passed at the minimum level and also credited towards a degree may be strictly
limited.
In these statements, expectations range from concessions to students who stay the full length of courses but may actually learn very little, through to notionally adequate levels of capability. Also to be found there are open tolerance of low levels of performance on higher-order objectives (specifically, the ability to make sound judgments, act independently, engage in analysis, and communicate clearly) and specific endorsement of participation in class towards course credit. Participation is not strictly an element of
achievement or competence at all. Taken together, these grade descriptors send mixed
messages about what it means to pass.
Table 1 Five grade descriptors for the lowest level of achievement in a course for
which credit can be counted towards the degree. Conditions may apply.
Although these formal grade descriptors indicate particular orientations, the definitive
measure of the adequacy of an institution's standards is whether the lowest-performing
students who gain credit for a course achieve higher-order objectives to a sufficient degree.
In the case of written responses, that includes the quality of writing. This can be determined
only by scrutinizing student responses to well-constructed assessment tasks. If a grade of D
is officially the lowest on the credit-earning scale but all students gain at least a B-, the
salient issue is whether the work awarded a B- deserves credit in terms of higher-order
outcomes. At the upper end of the grade scale, the issue is whether all students who gain the
highest available grade really do demonstrate 'excellence' or a 'high level of distinction'.
This is not the end of the story, and key questions still need to be asked: What is meant and implied by 'acceptable standards' or 'to a sufficient degree'? How can appropriate
standards be set collaboratively so as to reflect a broad consensus? What is required to give
course grades integrity and currency across courses, programs and institutions? How may
standards be given material form so they can remain stable reference points over time?
These have been at least partially addressed both theoretically (Sadler 2013a, 2014b) and in
field trials (Watty et al. 2014).
Accumulation of marks
In theory, a course grade is meant to represent a student's level of capability attained by the end of a course. '[G]rading is the assignment of a letter or number to indicate the level of mastery the student has attained at the end of a course of study' (Schrag 2001, 63). It is literally the out-come that goes on record. This is entirely consistent with the customary (and legitimate) way of expressing intended learning outcomes: 'By the end of this course, students should: …'. Whether the actual path of learning is smooth or bumpy, and regardless
of the effort the student has (or has not) put in, only the final achievement status should
matter in determining the course grade (Sadler 2009, 2010b). However, in many higher
education institutions, accumulating marks or points for work assessed during a period of
learning (continuous assessment) is the prevailing practice, mandated or at least endorsed by
the institution. Readily available software provides bookkeeping tools for it. These make it
easy to progressively bank marks, then weight and process them at the point of withdrawal
for conversion into the course grade.
The common arguments for accumulation are essentially instrumentalist (Isaksson
2008). The purpose is not so much to help learners attain adequate levels of complex
knowledge and skills by the end of a course as to keep them working and provide multiple
opportunities for feedback. In any case, so it is argued, students need, expect, appreciate and
thrive under continuous assessment (Trotter 2006; Hernández 2012). However,
notwithstanding its superficial appeal, accumulation actually diverts attention from the goal
of achieving a satisfactory level by course end.
First, accumulating performance measures during the learning period maps the shape of
the actual learning path into the grade (Sadler 2010b). In general, the context and actions of
both teacher and learner influence the rate and depth at which learning occurs. For many
students, coming to grips with and then overcoming false starts, errors, bumbling
attempts and time spent going up blind alleys lead to deep understandings by the end of a
course. Students can take bold risks that end in disasters and safely make conceptual
connections that later have to be unlearned. For well over a century, the role of spacing
during the total time available for developing high-order knowledge and skills has been
extensively studied. This research provides robust findings on how humans learn
(Ebbinghaus 1885; Bloom 1974; Conway, Cohen and Stanhope 1992; Rohrer and Pashler
2010; Budé et al. 2011). This is especially marked in 'sequential learning in which competence is attained only after a series of learning experiences that may take months or years to complete before the learner has developed a satisfactory degree of attainment in the field' (Bloom 1974, 682).
Second, accumulation lends itself to awarding and banking marks for a variety of
non-achievements for the purpose of influencing student behaviour. Marks are used to
incentivise and reward student effort, engagement in preferred activities, completion of
exercises or work stages, and participation. These behaviours and activities may well assist
learning, but they do not constitute the final level of achievement, or even part of it. On the
debit side of the ledger, marks may be deducted to penalise late submission, cheating or
plagiarism. The cost of using marks to modify behaviour is contamination of the grade.
Other ways have to be found. Quite apart from behaviour management, many students insist
they have a moral right for aspects other than unadulterated achievement to be included in
their grades (Zinn et al. 2011; Tippin, Lafreniere and Page 2012). Overall, the banking
model takes data from non-achievement contaminants, early deficits, and idiosyncratic
paths of learning and mixes them all into the final grade. The grade is then logically
impossible to disentangle and hence interpret (Brookhart 1991; Sadler 2010b). Equally
serious is that no coherent concept of a standard can apply to such a mishmash of data.
Finally, although accumulating marks may succeed in motivating and focusing student
effort, the pressure and drive typically ease off once the ledger balance approaches the Pass
score cut-off. This allows students to sidestep the challenge of gaining a command over the
course as a whole, especially its higher-order objectives. Put another way, accumulation
invites students to valorise externally offered proximate goals at the very time that the
eventual goal should be kept front and centre in their minds. A persons perspective on the
fullness of the eventual goal to be achieved, or the central purpose to be served, can play a
determinative role in how they approach and manage their own learning, and the task of
becoming competent (Sommers 1980; Entwistle 1995; Sadler 2014a). A steady stream of
extrinsic rewards is a poor substitute for developed intrinsic rewards where students take
primary responsibility for their own learning. Extrinsic rewards work directly against the
students' maturation as learners, in which they progress towards becoming
independent, self-directed, lifelong learners.
unaided and to a satisfactory standard. There are not just a handful of stereotypic problems
or types of tasks that characterise the course but a wide range of possibilities that entail
diverse cognitive and practical skills in different combinations. Research by Entwistle (1995)
showed that the best preparation for course examinations comes about only by having a
thorough grasp of the whole course. Sound assessment plans, tasks and specifications are
crucial to this.
The choice of assessment task format is an important meta-parameter in the design of
course assessment programs, both formative and summative. Extensive use of multiple choice tests reduces, if not eliminates altogether, the number of written prose responses, and with it a valuable opportunity to develop competence in discipline-focused writing.
Creating precise and cogent prose promotes high-level learning primarily because it requires 'careful, probing thought' (Bok 2006, 103). In her classic 1977 article, Emig wrote that 'Clear writing by definition is that writing which signals without ambiguity the nature of conceptual relationships, whether they be coordinate, subordinate, superordinate, causal, or something other' (126). In Sternglass's research, students repeatedly reported that 'Only through writing [papers of a type that] required them to integrate theory with evidence did they achieve the insights that moved them to complex reasoning about the topic under consideration' (1997, 295). Bok (2006), Zorn (2013) and many others have argued that the
best site for developing good writing is within the disciplines themselves, not separately as a
specialist activity.
differences of meaning, the dissonance, that writing as opposed to speech allows in the
possibility of revision. Writing has spatial and temporal features not apparent in speech – words are recorded in space and fixed in time – which is why writing is susceptible to reordering and later addition. Such features make possible the dissonance that both
provokes revision and promises, from itself, new meaning. (386)
Additional concerns about traditional examinations have their roots in typical examination
conditions. Students often experience considerable stress because of both the strict time
limits and the summary nature of high stakes, make-or-break events. In some cases,
medical researchers have explored coping strategies and the possible use of medication
(Edwards and Trimble 1992). Removing or relaxing problematic examination conditions
could well include making time limits generous (within reasonable limits) and allowing
review time and re-examination (with an accompanying fee if necessary). If it is objected
that all students in a course should perform under identical conditions, the reply is
straightforward. Students with special needs typically have accommodations made for them,
but within any course, some students may be just below the threshold at which special
accommodations would apply. In addition, the quality of a student's response as appraised against standards, rather than against other students' work, is a clearer indicator of their
capability than the speed of task completion.
This section is concluded with two observations that apply regardless of the mode or
medium of response: efficiency and sampling. An efficient plan results in high levels of
valid achievement information relative to the costs of getting it, including time spent in setting
and marking student work, and administrative overheads. Appropriate sampling involves
coverage across both the course subject matter (a preoccupation of many examiners) and
the range of relevant intended higher-order outcomes. These two together are somewhat
analogous to evaluating the economic potential of a mineral deposit by drilling a series of
cores into a prospective ore body to test its lateral extent and its richness (Whateley and
Scott 2006). Emphasising depth in thinking and precision in expression may well result in
higher quality but more condensed outputs.
this involves attending to course achievement goals as they come, and for each, a sense of
agency over personal performance.
Goal setting
Extensive research over several decades in a wide variety of field and laboratory settings has
investigated the impact that so-called 'hard' goals have on task performance. Progressive
reviews of this work are available in Locke et al. (1981), Locke et al. (1990), and the first
and last chapters of Locke and Latham (2013). Hard goals are specific and clear rather than general or vague, difficult and challenging rather than simple or easy, and closer to the upper limit of a person's capacity to perform than to their initial level of performance. Goals that
require students to stretch for them generally lead to substantial gains in performance. They
act to focus attention, mobilise effort and increase persistence at a task. In contrast, 'do-your-best' goals often fare little better than having no goals at all. As one would expect, the degree
of improvement is moderated by other factors, including the complexity of the task, the
learners ability, the strategies employed and various contextual constraints (Locke et al.
1981). However, the general conclusion is that 'an individual [cannot] thrive without goals to provide a sense of purpose … If the purpose is neither clear nor challenging, very little gets accomplished' (Locke and Latham 2009, 22).
Arranging the learning environment so that all students have an adequate grasp of the
higher-order outcomes stated in course outlines is a clear imperative for universities and
colleges. Setting standards that some students initially see as tough – and possibly even unfair or coercive, depending on their initial expectations – is part of that. Serious students
adapt pragmatically to hard constraints provided the settings are known, fair and relevant.
The consequences of a hard-earned Pass are highly positive in terms of both credit towards
the degree and personal sense of accomplishment. Carried out ethically, hard goals work
constructively for the student in both the short and the long term (Sadler 2014b).
reasonable conditions for students to succeed but also by providing effective developmental
support.
Being vividly aware that one is in control of one's actions brings with it a personal sense of responsibility. Frith (2014) summarised an ancient Hellenistic perspective on this, which in essence is that one's sense of agency is developed through two factors. The first is the cognitive binding that links one's intentional action (say, considerable effort) to its outcome (Passing the course). The second is the belief that an alternative action (investing little or no effort) would have led to a different outcome (Fail), accompanied by an
experience of regret. The second part of this is known as counterfactual reasoning because
although it is valid to think this way, it is essentially hypothetical, being contrary to what
actually happened (Roese 1997). If the likelihood of failure in a course is low or
non-existent, the sense of agency is weakened or disappears altogether.
For students to gain clarity on a complex course-based achievement goal – something radically different from trying to improve by, say, one grade – they must understand what
high-level achievement looks like and experience for themselves what reaching it entails.
Overall, students need to see and appreciate the purpose to be served, experience success in
moving towards its attainment, and be motivated, with grit and determination, to follow
through to completion.
Genuine achievement for which a student works hard and produces a high quality result
brings about levels of fulfilment and confidence that come only from possessing deep and
thorough knowledge of some body of worthwhile material or attaining proficiency in
high-level professional skills. The terms 'pleasure', 'satisfaction', 'motivation' and 'accomplishment' have many nuanced and overlapping meanings, but there is little doubt about the legitimacy of pleasure as a 'by-product of successful striving' (Duncker 1941, 391). This is categorically different from, in the modern context, having satisfying
experiences in the classroom (although the two may co-occur) or experiencing success in
winning against others. For some students more than others, developing this type of personal
capital demands substantial striving and struggling and induces considerable stress.
However, little by way of significant and enduring learning comes cheaply, and experiencing
success at something that was originally thought to be out of reach brings a distinctive
personal reward, a palpable sense of accomplishment. Not to insist on a demonstration of an
adequate level of higher-order capabilities is to deprive students of both an important
stimulus to achieve and the satisfaction of reaching a significant goal.
Inhibitors of change
Some inhibitors are conceptual in nature. One of these consists of the multiple meanings
attached to the term 'standard'. Add to that a limited awareness of the need for externally
validated anchorage points for standards generally and Passing grades in particular (Sadler
2011, 2013a). Others have to do with assessment practices that detract from the integrity of
course marks and grades. Some have been criticized in the literature for decades
(Oppenheim, Jahoda and James 1967; Elton 2004; Sadler 2009), but they are now so deeply
embedded in assessment cultures they are resistant to change. In addition, new practices
keep coming along and are added incrementally. Accepted uncritically, these often become
popular through being labelled as 'innovative' or 'best practice'. They are defended strongly
by academic teachers, students and administrators and may even be mandated in institutional
assessment policies. Accumulating marks is but one example. The fact that they reduce the
integrity of course grades goes largely unheralded.
Whether hard goals are actually set and enforced depends on a variety of other factors
as well, some of which are related to the grading dispositions of individual academics. At
successively higher levels in the chain of authority, the freedom of academics to make
significant changes depends on: an enabling and supportive context provided by academic
department heads and program directors; the fixedness of the prevailing assessment
traditions, grading policies and academic priorities; and requirements externally set by
governments or accrediting agencies.
personal and social consequences, such as loss of face, additional fees and delay to graduate
earnings, so avoiding failing grades is important. When students have to pay substantial
fees, they expect to pass and in any case would appeal against failure. All grade results are
reviewed by the Assessment Review Committee and, with very few exceptions, approved
without amendment. Consistent with the principle of academic freedom, professors must
be free to decide, according to their own professional judgments, the grades to be assigned.
Creative ways are found for students to earn enough marks to at least pass, with
scaffolding and active coaching to help them along. Students these days need a qualification even
if it means they are not truly qualified at the end. In any case, graduates learn most of what
they need to know after graduation. Cutting out cumulative assessment and instead
grading according to serious standards would produce high failure rates and a consequential
loss of income. The institution would not tolerate that.
Finally: 'I know I am generous in grading, but I need to keep my teaching evaluation
scores up so I can look forward to tenure.' Whether there is a causal link between grades and
teaching evaluations is debated, but '[r]egardless of the true relationship between grades and
teaching evaluations, the fact that many instructors perceive a positive correlation between
assigned grades and student evaluations of teaching has important consequences when there
also exists a perception that student course evaluations play a prominent role in promotion,
salary, and tenure decisions' (Johnson 2003, 49).
Most of these comments amount to admissions that things as they exist may not be as
they ought to be, but by implication, not much can be done about it. Addressing inflated pass
rates at their source by raising actual achievement levels is the only valid means of ensuring
grade integrity. No amount of tinkering with other variables, and no configuration of proxy
measurements, will make the difference required.
Conclusion
In recent decades, the focus for evaluating teaching quality has been heavily weighted
towards inputs (student entry levels, participation rates, facilities, resources and support
services) and a select group of outcomes (degree completions, employability, starting
salaries and student satisfaction, experience or engagement). Conspicuously absent is
anything to do with actual academic achievement in courses. This has allowed a number of
sub-optimal assessment practices to become normalised into assessment cultures. One of the
consequences is that too many students have been able to graduate without the capabilities
expected of graduates, yet this is not necessarily apparent from their transcripts.
The focus in this article has been on student outcomes rather than inputs, with particular
emphasis on the higher-order capabilities of students. Many students fail to master these, yet
they gain credit in course after course and eventually graduate. Directly addressing the
deficient aspects of assessment culture and practice could radically alter this state of affairs,
but it would require a transformation in thinking and practice on the part of many academics.
The ultimate aim is to ensure that all students accept a significant proportion of the
responsibility for achieving adequate levels of higher-order outcomes. Bluntly put, no
student would be awarded a pass in a course without being able to demonstrate these levels.
For some students, this would necessitate a major change in their priorities. For academics,
both their assessment practices and the nature of the student-teacher relationship would
change.
Undoubtedly, determination to pursue this end would have significant washback effects
on teaching, learning, and course and program objectives, but that is intended. The
likelihood of success depends on finding a rational, ethical and affordable way to do it. This
may require re-engineering some parts of the transition path, creating other parts from
scratch, and reworking priorities, policies, and practices to a considerable extent. In
particular, it would entail rebalancing institutional resource allocations in order to cater for
student cohorts that have become much more diversified. Except for aims geared narrowly to
economic and employment considerations, this goal is broadly consistent with traditional
and many recent statements of the real purposes of higher education.
References
Alderson, J. C. 1986. Innovations in Language Testing? In Innovations in Language
Testing: Proceedings of the IUS/NFER Conference, edited by M. Portal, 93-105.
Windsor, Berkshire: NFER-Nelson.
Arum, R., and J. Roksa. 2010. Academically Adrift: Limited Learning on College Campuses.
Chicago: University of Chicago Press.
Australian Learning and Teaching Council. 2010. Learning and Teaching Academic
Standards Project Final Report. Strawberry Hills, NSW: Australian Learning and
Teaching Council.
Bergan, S., and R. Damian, eds. 2010. Higher Education for Modern Societies:
Competences and Values. Higher Education Series No. 15. Strasbourg: Council of
Europe Publishing.
Biggs, J. B., and C. Tang. 2011. Teaching for Quality Learning at University: What the
Student Does. 4th ed. Maidenhead, UK: McGraw-Hill/Society for Research into Higher
Education/Open University Press.
Blömeke, S., O. Zlatkin-Troitschanskaia, C. Kuhn, and J. Fege, eds. 2013. Modeling and
Measuring Competencies in Higher Education: Tasks and Challenges. Rotterdam:
Sense Publishers.
Bloom, B. S. 1974. Time and Learning. American Psychologist 29 (9): 682-688.
doi:10.1037/h0037632.
Bok, D. 2006. Our Underachieving Colleges: A Candid Look at How Much Students Learn
and Why They Should Be Learning More. Princeton, NJ: Princeton University Press.
Brookhart, S. M. 1991. Grading Practices and Validity. Educational Measurement: Issues
and Practice 10 (1): 35-36. doi:10.1111/j.1745-3992.1991.tb00182.x.
Budé, L., T. Imbos, M. W. van de Wiel, and M. P. Berger. 2011. The Effect of Distributed
Practice on Students' Conceptual Understanding of Statistics. Higher Education 62
(1): 69-79. doi:10.1007/s10734-010-9366-y.
Coates, H., ed. 2014. Higher Education Learning Outcomes Assessment: International
Perspectives. (Series: Higher Education Research and Policy. Vol. 6). Frankfurt am
Main, Berlin, Bern, Bruxelles, New York, Oxford, Wien: Peter Lang.
Conway, M. A., G. Cohen, and N. Stanhope. 1992. Very Long-Term Memory for
Knowledge Acquired at School and University. Applied Cognitive Psychology 6 (6):
467-482. doi:10.1002/acp.2350060603.
Dill, D. D., and M. Beerkens. 2013. Designing the Framework Conditions for Assuring
Academic Standards: Lessons Learned about Professional, Market, and Government
Regulation of Academic Quality. Higher Education 65 (3): 341-357.
doi:10.1007/s10734-012-9548-x.
Douglass, J. A., G. Thomson, and C-M. Zhao. 2012. The Learning Outcomes Race: The
Value of Self-Reported Gains in Large Research Universities. Higher Education 64
(3): 317-335. doi:10.1007/s10734-011-9496-x.
Duncker, K. 1941. On Pleasure, Emotion, and Striving. Philosophy and Phenomenological
Research 1 (4): 391-430. doi:10.2307/2103143.
Ebbinghaus, H. 1885. Memory: A Contribution to Experimental Psychology. Trans. H. A.
Ruger and C. E. Bussenius. 1913. New York: Teachers College, Columbia University.
Edwards, J. M., and K. Trimble. 1992. Anxiety, Coping and Academic Performance.
Anxiety, Stress & Coping: An International Journal 5 (4): 337-350.
doi:10.1080/10615809208248370.
Elton, L. 2004. A Challenge to Established Assessment Practice. Higher Education
Quarterly 58 (1): 43-62. doi:10.1111/j.1468-2273.2004.00259.x.
Emig, J. 1977. Writing as a Mode of Learning. College Composition and Communication
28 (2): 122-128. doi:10.2307/356095.
Entwistle, N. 1995. Frameworks for Understanding as Experienced in Essay Writing and in
Preparing for Examination. Educational Psychologist 30 (1): 47-54.
doi:10.1207/s15326985ep3001_5.
Fiamengo, J. 2013. The Fail-Proof Student. Academic Questions 26 (3): 329-337.
doi:10.1007/s12129-013-9372-5.
Frith, C. D. 2014. Action, Agency and Responsibility. Neuropsychologia 55 (1): 137-142.
doi:10.1016/j.neuropsychologia.2013.09.007.
Gardner, S., and H. Nesi. 2013. A Classification of Genre Families in University Student
Writing. Applied Linguistics 34 (1): 25-52. doi:10.1093/applin/ams024.
Grunert O'Brien, J., B. J. Millis, and M. W. Cohen. 2008. The Course Syllabus: A Learning-
Centered Approach. San Francisco: Jossey-Bass.
Hernández, R. 2012. Does Continuous Assessment in Higher Education Support Student
Learning? Higher Education 64 (4): 489-502. doi:10.1007/s10734-012-9506-7.
Hounsell, D. 1987. Essay Writing and the Quality of Feedback. In Student Learning:
Research in Education and Cognitive Psychology, edited by J. T. E. Richardson, M. W.
Eysenck, and D. Warren-Piper, 109-119. Milton Keynes: Open University Press and
Society for Research into Higher Education.
Isaksson, S. 2008. Assess As You Go: The Effect of Continuous Assessment on Student
Learning During a Short Course in Archaeology. Assessment & Evaluation in Higher
Education 33 (1): 1-7. doi:10.1080/02602930601122498.
Johnson, V. E. 2003. Grade Inflation: A Crisis in College Education. New York: Springer-
Verlag.
Jones, A. 2009. Redisciplining Generic Attributes: The Disciplinary Context in Focus.
Studies in Higher Education 34 (1): 85-100. doi:10.1080/03075070802602018.
Jones, A. 2013. There is Nothing Generic about Graduate Attributes: Unpacking the Scope
of Context. Journal of Further and Higher Education 37 (5): 591-605.
doi:10.1080/0309877X.2011.645466.
Lewis, R. 2010. External Examiner System in the United Kingdom. In Public Policy for
Academic Quality: Analyses of Innovative Policy Instruments, edited by D. D. Dill, and
M. Beerkens, 21-36. Dordrecht: Springer.
Lindgren, R., and R. McDaniel. 2012. Transforming Online Learning through Narrative and
Student Agency. Educational Technology & Society 15 (4): 344-355.
Locke, E. A., K. N. Shaw, L. M. Saari, and G. P. Latham. 1981. Goal Setting and Task
Performance: 1969-1980. Psychological Bulletin 90 (1): 125-152.
doi:10.1037/0033-2909.90.1.125.
Locke, E. A., G. P. Latham, K. J. Smith, and R. E. Wood. 1990. A Theory of Goal Setting
and Task Performance. Englewood Cliffs, NJ: Prentice Hall.
Locke, E. A., and G. P. Latham. 2009. Has Goal Setting Gone Wild, or Have its Attackers
Abandoned Good Scholarship? Academy of Management Perspectives 23 (1): 17-23.
doi:10.5465/AMP.2009.37008000.
Locke, E. A., and G. P. Latham, eds. 2013. New Developments in Goal Setting and Task
Performance. New York: Routledge.
Nicol, D. J., and D. Macfarlane-Dick. 2006. Formative Assessment and Self-Regulated
Learning: A Model and Seven Principles of Good Feedback Practice. Studies in Higher
Education 31 (2): 199-218. doi:10.1080/03075070600572090.
Oppenheim, A. N., M. Jahoda, and R. L. James. 1967. Assumptions Underlying the Use of
University Examinations. Higher Education Quarterly 21 (3): 341-351.
doi:10.1111/j.1468-2273.1967.tb00245.x.
Osman, M. 2014. Future-Minded: The Psychology of Agency and Control. Basingstoke:
Palgrave-Macmillan.
Pacherie, E. 2008. The Phenomenology of Action: A Conceptual Framework. Cognition
107 (1): 179-217. doi:10.1016/j.cognition.2007.09.003.
Park, C. 2003. In Other (People's) Words: Plagiarism by University Students – Literature
and Lessons. Assessment & Evaluation in Higher Education 28 (5): 461-488.
doi:10.1080/02602930301677.
Roese, N. 1997. Counterfactual Thinking. Psychological Bulletin 121 (1): 133-148.
doi:10.1037/0033-2909.121.1.133.
Rohrer, D., and H. Pashler. 2010. Recent Research on Human Learning Challenges
Conventional Instructional Strategies. Educational Researcher 39 (5): 406-412.
doi:10.3102/0013189X10374770.
Sadler, D. R. 1989. Formative Assessment and the Design of Instructional Systems.
Instructional Science 18 (2): 119-144. doi:10.1007/BF00117714.
Sadler, D. R. 2009. Grade Integrity and the Representation of Academic Achievement.
Studies in Higher Education 34 (7): 807-826. doi:10.1080/03075070802706553.
Sadler, D. R. 2010a. Beyond Feedback: Developing Student Capability in Complex
Appraisal. Assessment & Evaluation in Higher Education 35 (5): 535-550.
doi:10.1080/02602930903541015.
Sadler, D. R. 2010b. Fidelity as a Precondition for Integrity in Grading Academic
Achievement. Assessment & Evaluation in Higher Education 35 (6): 727-743.
doi:10.1080/02602930902977756.
Sadler, D. R. 2011. Academic Freedom, Achievement Standards and Professional Identity.
Quality in Higher Education 17 (1): 103-118. doi:10.1080/13538322.2011.554639.
Sadler, D. R. 2013a. Assuring Academic Achievement Standards: From Moderation to
Calibration. Assessment in Education: Principles, Policy and Practice 20 (1): 5-19.
doi:10.1080/0969594X.2012.714742.
Sadler, D. R. 2013b. Making Competent Judgments of Competence. In Modeling and
Measuring Competencies in Higher Education, edited by S. Blömeke, O. Zlatkin-
Troitschanskaia, C. Kuhn, and J. Fege, 13-27. Rotterdam: Sense Publishers.
Sadler, D. R. 2013c. Opening up Feedback: Teaching Learners to See. In
Reconceptualising Feedback in Higher Education: Developing Dialogue with Students,
edited by S. Merry, M. Price, D. Carless, and M. Taras, 54-63. London: Routledge.
Sadler, D. R. 2014a. Learning from Assessment Events: The Role of Goal Knowledge. In
Advances and Innovations in University Assessment and Feedback edited by C. Kreber,
C. Anderson, N. Entwistle, and J. McArthur, 152-172. Edinburgh: Edinburgh University
Press.
Sadler, D. R. 2014b. The Futility of Attempting to Codify Academic Achievement
Standards. Higher Education 67 (3): 273-288. doi:10.1007/s10734-013-9649-1.
Schatzki, T. R. 2001. Practice Mind-ed Orders. In The Practice Turn in Contemporary
Theory, edited by T. R. Schatzki, K. K. Cetina, and E. von Savigny, 50-63. London:
Routledge.
Schrag, F. 2001. From Here to Equality: Grading Policies for Egalitarians. Educational
Theory 51 (1): 63-73. doi:10.1111/j.1741-5446.2001.00063.x.
Seligman, M. E. P., P. Railton, R. F. Baumeister, and C. Sripada. 2013. Navigating Into
the Future or Driven by the Past. Perspectives on Psychological Science 8 (2): 119-141.
doi:10.1177/1745691612474317.
Shavelson, R. J. 2010. Measuring College Learning Responsibly: Accountability in a New
Era. Stanford, CA: Stanford University Press.
Shavelson, R. J. 2013. An Approach to Testing and Modeling Competence. In Modeling
and Measuring Competencies in Higher Education, edited by S. Blömeke, O. Zlatkin-
Troitschanskaia, C. Kuhn, and J. Fege, 29-43. Rotterdam: Sense Publishers.
Sommers, N. 1980. Revision Strategies of Student Writers and Experienced Adult Writers.
College Composition and Communication 31 (4): 378-388. doi:10.2307/356588.
Sternglass, M. 1997. Time to Know Them: A Longitudinal Study of Writing and Learning at
the College Level. Mahwah, NJ: Erlbaum.
Strathern, M. 1997. Improving Ratings: Audit in the British University System.
European Review 5 (3): 305-321. doi:10.1002/(SICI)1234-981X(199707)5:33.0.CO;2-4.
Taylor, R. N. 1974. Nature of Problem Ill-Structuredness: Implications for Problem
Formulation and Solution. Decision Sciences 5 (4): 632-643. doi:10.1111/j.1540-
5915.1974.tb00642.x.
Tippin, G. K., K. D. Lafreniere, and S. Page. 2012. Student Perception of Academic
Grading: Personality, Academic Orientation, and Effort. Active Learning in Higher
Education 13 (1): 51-61. doi:10.1177/1469787411429187.
Tremblay, K. 2013. OECD Assessment of Higher Education Learning Outcomes
(AHELO): Rationale, Challenges and Initial Insights from the Feasibility Study. In
Modeling and Measuring Competencies in Higher Education, edited by S. Blömeke, O.
Zlatkin-Troitschanskaia, C. Kuhn, and J. Fege, 113-126. Rotterdam: Sense Publishers.
Trotter, E. 2006. Student Perceptions of Continuous Summative Assessment. Assessment
& Evaluation in Higher Education 31 (5): 505-521. doi:10.1080/02602930600679506.
Walker, J. 2010. Measuring Plagiarism: Researching What Students Do, Not What They
Say They Do. Studies in Higher Education 35 (1): 41-59.
doi:10.1080/03075070902912994.
Watty, K., M. Freeman, B. Howieson, P. Hancock, B. O'Connell, P. de Lange, and
A. Abraham. 2014. Social Moderation, Assessment and Assuring Standards for
Accounting Graduates. Assessment & Evaluation in Higher Education 39 (4): 461-
478. doi:10.1080/02602938.2013.848336.
Weissberg, R. 2013. Critically Thinking about Critical Thinking. Academic Questions 26
(3): 317-328. doi:10.1007/s12129-013-9375-2.
Whateley, M. K. G., and B. C. Scott. 2006. Evaluation Techniques. Chap. 10 in
Introduction to Mineral Exploration, 2nd ed., edited by C. J. Moon, M. K. G.
Whateley, and A. M. Evans, 199-252. Malden, MA: Blackwell.
Williams, G. 2010. Subject Benchmarking in the UK. In Public Policy for Academic
Quality: Analyses of Innovative Policy Instruments, edited by D. D. Dill, and M.
Beerkens, 157-181. Dordrecht: Springer. doi:10.1007/978-90-481-3754-1_9.
Williams, J. B. 2006. The Place of the Closed Book, Invigilated Final Examination in a
Knowledge Economy. Educational Media International 43 (2): 107-119.
doi:10.1080/09523980500237864.
Williams, J. B., and A. Wong. 2009. The Efficacy of Final Examinations: A Comparative
Study of Closed-Book, Invigilated Exams and Open-Book, Open-Web Exams. British
Journal of Educational Technology 40 (2): 227-236.
doi:10.1111/j.1467-8535.2008.00929.x.
Zinn, T. E., J. F. Magnotti, K. Marchuk, B. S. Schultz, A. Luther, and V. Varfolomeeva.
2011. Does Effort Still Count? More on What Makes the Grade. Teaching of
Psychology 38 (1): 10-15. doi:10.1177/0098628310390907.
Zorn, J. 2013. English Compositionism as Fraud and Failure. Academic Questions 26 (3):
270-284. doi:10.1007/s12129-013-9368-1.