
A peer-reviewed electronic journal.

Copyright is retained by the first or sole author, who grants right of first publication to Practical Assessment, Research & Evaluation. Permission
is granted to distribute this article for nonprofit, educational purposes if it is copied in its entirety and the journal is credited. PARE has the
right to authorize third party reproduction of this article in print, electronic and database forms.
Volume 22 Number 4, May 2017 ISSN 1531-7714

Constructing Multiple-Choice Items to Measure Higher-Order Thinking
Darina Scully, Dublin City University

Across education, certification and licensure, there are repeated calls for the development of
assessments that target higher-order thinking, as opposed to mere recall of facts. A common assumption
is that this necessitates the use of constructed response or essay-style test questions; however,
empirical evidence suggests that this may not be the case. In this paper, it is argued that multiple-
choice items have the capacity to assess certain higher-order skills. In addition, a series of practical
recommendations for test developers seeking to purposefully construct such items is provided.

The concept of higher-order thinking is often linked to Bloom's Taxonomy of Educational Objectives (Bloom, Englehart, Furst, Hill & Krathwohl, 1956), which set out six increasingly sophisticated cognitive processes in which a learner may engage. Revisions of and alternatives to Bloom's taxonomy have been proposed over the years, but the underlying framework has remained a stable and important influence in education. There is now widespread recognition of the importance of invoking higher-order processes in both curriculum and assessment design (e.g., Lord & Baviskar, 2007; Momsen, Long, Wyse & Ebert-May, 2010).

This article provides an overview of what is meant by higher-order thinking, and outlines why it is valued in assessment, not only in K-12 and higher education contexts, but also in the field of professional certification and licensure. It discusses research that has investigated the capacity of multiple-choice (MC) items to assess higher-order thinking, and argues that this item format is not as restricted as once thought. Intended for both researchers and practitioners in the field of assessment and test development, this paper highlights the potential of MC items to assess higher-order thinking skills and compiles some practical suggestions for how such items may be constructed.

Higher-Order Thinking: Bloom's Taxonomy & Related Frameworks

Higher-order thinking has typically been defined with specific reference to the cognitive domain of Bloom's Taxonomy (outlined in Table 1), a trend that is still evident in contemporary research and discourse (e.g. Barnett & Francis, 2012; Jensen, McDaniel, Woodard & Kummer, 2014). The persistent influence of Bloom's framework most likely stems from its intuitively appealing nature, and the fact that each level of cognitive sophistication, although designed to transcend specific subject matters and educational stages, can be interpreted and operationalized to suit individual contexts. The most basic level of the taxonomy is knowledge. Within any subject area, a learner can possess mere knowledge, and may demonstrate the ability to recall this learned knowledge in an assessment. They may not, however, understand the meaning of this knowledge. Furthermore, they may not possess the ability to apply it in situations other than that in which it was learnt, or to combine it with additional knowledge to create new insights. Such abilities are represented by subsequent levels of the taxonomy.
Table 1. Bloom's Taxonomy - Cognitive Domain

Knowledge: Recall or recognition of learned knowledge, without necessarily having the ability to apply this knowledge
Comprehension: Describing and explaining learned knowledge
Application: Using learned knowledge to solve problems in novel (but structurally similar) contexts
Analysis: Using learned knowledge to decompose situations into components, recognize unstated assumptions & identify motives
Synthesis: Combining elements of learned knowledge into new integrated wholes
Evaluation: Critiquing or judging the value or worth of learned knowledge

Although the taxonomy appears to assume a hierarchical structure, with the implication that processes such as knowledge and comprehension are prerequisites for processes such as application and analysis, Bloom et al. (1956) made no specific references to lower-order and higher-order thinking. Consequently, others' interpretations of where lower-order thinking ends and higher-order thinking begins have been inconsistent. Hopson, Simms and Knezek (2001), for example, included only analysis, synthesis and evaluation in their definition of higher-order thinking, whilst Fives and DiDonato-Barnes (2013) made the cut-point between comprehension and application. Wiggins (2015), on the other hand, posited that only knowledge constitutes lower-order thinking. In the present paper, the taxonomy is conceived of as a continuum, with each level identifying a higher level of thinking than that which preceded it. As such, the term higher-order thinking is understood to include all levels from comprehension onwards.

Since its inception, Bloom's taxonomy has been formally revised (Anderson et al., 2001; see Table 2). The revised taxonomy substitutes the noun forms used to name the levels with equivalent verb forms, with the aim of drawing greater attention to the actions in which learners engage. Furthermore, it provides a range of additional verbs associated with each level, facilitating greater precision in classifying particular learning objectives at different levels. Anderson et al.'s revision also reverses the top two levels of the taxonomy, with create categorized as a higher level of thinking than evaluate. No empirical evidence was provided for this decision, however, and it has been argued (e.g. Huitt, 2011) that these two levels are best thought of as being equal in terms of complexity.

Table 2. Anderson et al.'s (2001) Revision of Bloom's Taxonomy

Level: Description (Verbs Associated with Levels)
Remember: Retrieving relevant knowledge from long-term memory (Recognize, Recall)
Understand: Determining the meaning of instructional messages, including oral, written & graphic communication (Interpret, Exemplify, Classify, Summarize, Infer, Compare, Explain)
Apply: Carrying out or using a procedure in a given situation (Execute, Implement)
Analyse: Breaking material into its constituent parts, detecting how the parts relate to one another and to an overall structure or purpose (Differentiate, Organize)
Evaluate: Making judgements based on criteria and standards (Check, Critique)
Create: Putting elements together to form a novel whole (Generate, Plan, Produce)

An additional and important element of Anderson et al.'s (2001) taxonomy is its recognition of the fact that knowledge itself is not a unitary concept. Drawing on concepts from the field of cognitive psychology that emerged in the latter half of the 20th century, the revised taxonomy differentiates between four types of knowledge: factual (knowledge of the basic elements of a
discipline such as terminology and specific details), conceptual (knowledge of the inter-relationships between these elements within larger structures), procedural ('how-to' knowledge) and metacognitive (awareness and knowledge of one's own cognition). It follows that any cognitive process can interact with any type of knowledge. Krathwohl (2002) provided a helpful template to illustrate this concept, by plotting these different types of knowledge and the various cognitive levels on opposing axes of a two-dimensional table (Table 3). The cells formed by the intersections of these two dimensions give rise to a wide range of potential cognitive activities.
Table 3. Krathwohl's Taxonomy Table

Cognitive Process Dimension (columns): Remember | Understand | Apply | Analyse | Evaluate | Create
Knowledge Dimension (rows): Factual Knowledge | Conceptual Knowledge | Procedural Knowledge | Metacognitive Knowledge
(The cells formed by crossing the two dimensions are left blank in the original; each intersection represents a potential cognitive activity.)

It is acknowledged that there are various alternatives to the Bloom/Anderson framework. Indeed, Simkin and Kuechler (2005) noted 11 other knowledge taxonomies that have been proposed over the years. Examples include the SOLO (Structure of Observed Learning Outcome) taxonomy proposed by Biggs and Collis (1982), comprising unistructural, multi-structural, relational and extended abstract stages of knowledge, and Webb's (1997) DOK (Depth of Knowledge) model, made up of recall & reproduction, working with skills & concepts, short-term strategic thinking, and extended strategic thinking. Some of these alternative frameworks have been adapted and refined for particular disciplines (e.g. Webb, 2005); however, Bloom's (1956) original taxonomy and Anderson et al.'s (2001) revision of same continue to predominate in both research and practice.

Higher-Order Thinking & Assessment

In recent years, there has been increasing recognition of the potential formative role that assessment can play in education. That is, in addition to providing evaluative information about a student, assessment can - and should - also serve as a mechanism to aid learning (Black & Wiliam, 1998). To this end, assessments tapping higher-order thinking are particularly desirable. Indeed, it has been shown that students who experience assessments demanding higher-order thinking are subsequently more likely to adopt meaningful, holistic approaches to their study, as opposed to engaging in mere surface-level or rote learning techniques (Jensen et al., 2014; Leung, Mok & Wong, 2008). In addition, such assessments allow instructors to provide more detailed and specific feedback (Momsen et al., 2010), which in turn can promote and guide future learning.

In higher education, there is a particularly strong interest in the assessment of higher-order skills, as universities and third-level institutions face growing demands to bridge the perceived gap between what students learn, and what is valued by employers. The need for 'T-shaped' professionals, i.e. university graduates equipped not only with disciplinary specialization (represented by the vertical stroke of the T), but also with soft skills that allow them to operate effectively across a broad range of contexts (represented by the horizontal bar of the T), is increasingly emphasized in both the academic literature and the mainstream media (e.g. Bitner & Brown, 2008; MacCraith, 2016; Oskam, 2009; Selingo, 2015). Examples of these soft skills include creativity, collaborative problem-solving and critical thinking, all of which can be aligned with the upper levels of the various cognitive taxonomies.

In the field of certification and licensure, the primary objective of assessment is to reliably distinguish between candidates who do and do not possess the necessary knowledge, skills and abilities to practise a particular profession. Indeed, as LaDuca, Downing and Henzel (1995, p.138) asserted, the purpose of these types of assessment can be defined as 'the protection of the public and the profession from unqualified practitioners'. With this in mind, issues such as the potential impact of these assessments on subsequent learning behaviours, or the value of generic, transferrable skills, may seem less relevant. This does not imply that the importance of
higher-order thinking in this context is diminished, however. As Webb, Cizek and Kaloh (1993) pointed out, test items requiring higher-order thinking can improve the breadth and depth of content coverage within a licensure test. More importantly, the abilities to think strategically, to reflect, and to apply learned knowledge in a range of situations have been identified as key indicators of competency for a wide range of professions, including, but not limited to: nursing (Morrison & Free, 2001), medicine and the allied health professions (Choudhury, Gouldsborough & Shaw, 2015; Mann, 2008), accountancy (Hansen, 2006), and teaching (Struyven, Blieck & De Roeck, 2014). As such, it is vital that items measuring these skills are included to ensure the validity of pass/fail decisions arising from these tests.

MC Items as an Assessment Format

MC items (typically consisting of a stem and a choice of 3-5 response options) are an attractive assessment option for both educators and professional test developers for several reasons. Unlike constructed-response (CR) items such as short-answer or essay-style questions, they can be quickly administered and machine-scored, rendering them suitable for use with large groups of students or test candidates (Morrison & Free, 2001). In addition, they facilitate higher sampling of content per unit time (Schuwirth & Van der Vleuten, 2003), are associated with greater objectivity and reliability (Newstead & Dennis, 1994; Kniveton, 1996), and have even been shown to demonstrate superior concurrent validity with other measures of achievement (Bleks-Rechek, Zeug, & Webb, 2007). The use of multiple-choice items also provides the opportunity for test developers to quickly analyse the performance of each test item, and use this information to improve future assessments.
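To make the last point concrete, the sketch below shows one conventional way of analysing MC item performance from a scored 0/1 response matrix: a difficulty index (the proportion of examinees answering correctly) and a discrimination index (the correlation between success on an item and the total score on the remaining items). It is an illustrative sketch only, with invented data; it is not drawn from any of the studies cited in this article.

```python
from statistics import mean, stdev

# Each row is one examinee; each column is one MC item scored 0/1.
# These scores are invented purely for illustration.
scores = [
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [0, 1, 1, 1],
    [1, 1, 1, 0],
    [0, 0, 0, 1],
]

def difficulty(item: int) -> float:
    """Proportion of examinees answering the item correctly (the p-value)."""
    return mean(row[item] for row in scores)

def discrimination(item: int) -> float:
    """Corrected item-total correlation: item score vs. total score on the
    remaining items. Values near zero flag weakly discriminating items."""
    x = [row[item] for row in scores]
    y = [sum(row) - row[item] for row in scores]
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)
    return cov / (stdev(x) * stdev(y))

for i in range(len(scores[0])):
    print(f"Item {i + 1}: difficulty={difficulty(i):.2f}, "
          f"discrimination={discrimination(i):.2f}")
```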
Despite these advantages, MC items have also received a great deal of criticism. Veloski, Rabinowitz, Robeson and Young (1999, p. 539), for example, condemned their use in the context of medical education, arguing that professional competence requires being able to perform 'in a real-life setting that does not offer short lists of five choices'. This reflects a general perception that MC items are incapable of assessing cognitive processes beyond recall or recognition of knowledge, given that the correct answer is provided amongst the response options.

A number of empirical studies have attempted to investigate this issue, by comparing student performance across MC items and CR/performance items. Many have revealed that these item types measure two distinct constructs (e.g. Becker & Johnston, 1999; Frederiksen, 1984; Hickson & Reed, 2011; Krieg & Uyar, 2001), which is usually interpreted as a demonstration of the inferiority of MC items in assessing higher-order thinking. This interpretation, however, rests on the assumption that all CR items are a valid measure of higher-order thinking, which is not necessarily the case. Furthermore, other studies have failed to find differences in student performance across the two item types (e.g. Hickson, Reed & Sander, 2012; Thissen, Wainer & Wang, 1994; Walstad & Becker, 1994).

As Martinez (1999, p.207) argued, it is likely that any differences emerging between MC and CR items are 'less a reflection of the limitations of these formats than they are of typical use'. Indeed, when subject-matter experts are given the task of classifying MC items according to the different levels of Bloom's Taxonomy, the overwhelming majority are typically deemed to be recall or recognition items (e.g. Masters, Hulsmeyer, Pike, Leichty, Miller & Verst, 2001; Momsen et al., 2010; Tarrant, Knierim, Hayes & Ware, 2006; Webb et al., 1993). A small number of comprehension, application and analysis items have also been identified in these studies, however, which suggests that MC items do indeed have the potential to assess these skills, but that lower-level MC items are simply over-represented. It follows that comparisons of different item types have been constrained by this fact, and that the potential of MC items to assess higher-order thinking may be chronically underestimated. Strong support for this assertion arises from studies that use Bloom's Taxonomy to classify each item from both their MC and CR item banks at the outset. Indeed, Hancock (1994), Simkin and Kuechler (2005) and Traub (1993) followed this method, and observed that when subsequent performance comparisons were restricted to MC vs. CR items written at the same taxonomic level, moderate to strong correlations emerged.

At this point, it should be noted that there is currently insufficient evidence to suggest that MC items can measure a fully exhaustive range of thinking skills. Educational research distinguishes between convergent thinking, which refers to working with knowledge, concepts and processes that already exist, and divergent thinking, which is required in situations wherein there is no pre-determined solution. Given these definitions, it could be argued that the nature of the MC item format necessarily precludes its ability to assess the two highest
levels of the taxonomy. Indeed, the research discussed to this point supports the contention that the capabilities of MC items extend as far as the analyse level, but there have been few instances of evaluate or create MC items identified to date. A glance at some of the verbs suggested by Anderson et al. to describe actions reflecting these levels (refer back to Table 2) highlights the potential difficulty to this end. Despite this, some strategies may be employed to support pseudo-assessment of these higher levels using MC items. These will be discussed in the following section.

MC items, like all forms of assessment, are associated with certain limitations. It is fully acknowledged that the authenticity of an assessment can - and where possible, should - be maximized by encompassing various methods of measurement. The aim of this paper is not to advocate for the exclusive use of one particular method; rather, it is to draw attention to the possibility that the capacity of MC items to assess higher-order thinking has been masked by studies that treat this item format as a homogenous entity. Constructing MC items assessing higher-order thinking is undoubtedly a challenging and time-consuming task. Yet, it is possible; moreover, it is worthwhile investing the time and resources in doing so. Cognitively challenging MC items offer an attractive balance, as they have the potential to simultaneously meet the needs of (i) students seeking to improve their learning through the medium of assessment, (ii) educators who wish to obtain meaningful information about their students' abilities, (iii) universities seeking to measure skills valued by employers, and (iv) certification/licensure test developers aiming to improve the fidelity of their test items and the validity of their decision-making in a cost-effective manner.

Strategies for Constructing MC Items Assessing Higher-Order Thinking

Targeting an item at a particular cognitive level requires, above all, explicit reference to a well-established taxonomic framework such as that of Anderson et al. (2001). For example, items can be written specifically to assess a student's ability to remember procedural knowledge, to apply factual knowledge, etc. (refer back to Table 3). Whilst this is undoubtedly a helpful starting point, it is not enough to refer solely to process descriptions, as they are often quite broad in nature and open to subjective interpretations (Hancock, 1994). Rather, careful consideration should be given to the precise definitional criteria set out both for the cognitive level and type of knowledge in question, and these criteria mapped closely to the structure of the item.

Furthermore, although the taxonomy was initially designed to transcend specific subject matters, measurement experts argue that the more sophisticated cognitive processes are inherently domain-specific. Indeed, Anderson et al. (2001) acknowledged that, ideally, each major field should have its own taxonomy, 'closer to the special language and thinking of its experts', and reflecting its own appropriate sub-divisions. All subject-matter experts are faced with this challenge of operationalizing the general taxonomic levels for their specific area (Morrison & Free, 2001). As such, the extent to which generic rules for item construction at various cognitive levels can be generated is somewhat limited. Nevertheless, some strategies have been identified that may help guide the production of items that reach beyond mere recall.

(i) Manipulation of Target Verbs - Specific verbs have been linked to the various cognitive processes (Morrison & Free, 2001; Table 4). These verbs, when placed in an item, may serve as rudimental indicators of the cognitive level it is likely to assess. This strategy should be used with caution, as some verbs could arguably be placed in multiple categories, and much depends on the context of the item in which they are placed. They may, however, provide an objective, transparent basis for item-writers.

Table 4. Examples of verbs associated with various categories of Bloom's Taxonomy, reproduced from Morrison & Free (2001)

Knowledge: Identify, Define, Know, List, Name, Recognize, State
Comprehension: Describe, Differentiate, Discuss, Explain, Rephrase, Restate, Reword
Application: Apply, Calculate, Classify, Develop, Examine, Solve, Use
Analysis: Analyse, Categorize, Compare, Contrast, Distinguish, Determine, Investigate
Synthesis: Compose, Construct, Create, Design, Formulate, Modify, Plan
Evaluation: Appraise, Assess, Evaluate, Judge
At a first glance, many of these verbs may appear to be incompatible with MC items - even those associated with relatively low cognitive levels, such as describe and explain. This may explain why there is usually an abundance of knowledge-level items. However, as Dickinson (2011) pointed out, MC items assessing higher levels can be constructed by replacing the desired unconstrained verb with its noun derivative, and preceding it with a knowledge-level verb, resulting in stems such as 'select the best description' or 'identify the most accurate explanation'. This strategy could theoretically be extended to the synthesis and evaluation levels (e.g. 'select the best plan' / 'identify the best modification'). However, as students are not required to independently generate the solutions in such scenarios, this is best thought of as pseudo-assessment of the highest cognitive levels.

(ii) Item Flipping - Items that present an overarching concept or category, and require examinees to recognize a specific instance of this concept, can usually be classified at the lowest taxonomic level. Examinees can successfully answer these items without necessarily having an understanding of the concept, or what it means to be an exemplar of that concept, by simply drawing on a memorized list of terms. Dickinson (2011) suggests that such items can be 'flipped', by presenting the specific instance in the item stem, and asking the examinee to identify the underlying rule or concept instead. To correctly answer flipped items, test-takers must also have a complete understanding of the alternative distractor concepts, and consider whether or not the characteristic presented in the stem could fit in with any of these; thus the item is moved from the knowledge to the comprehension level. Examples from the fields of education and psychology are provided in Table 5.

Table 5. Item Flipping

Original item: Which of the following best describes what is meant by formative assessment?
A. is based on the students' attitudes, interests and values
B. is designed primarily to evaluate learning
C. is usually high stakes
D. provides information to modify teaching and learning*

Flipped item: A teacher uses a strategy called 'Thumbs Up, Thumbs Down' with her students. This illustrates the use of:
A. affective assessment
B. formative assessment*
C. diagnostic assessment
D. summative assessment
(Source: O'Leary, personal communication, May 1, 2017)

Original item: According to Piaget's theory of cognitive development, what is accommodation?
A. the ability to think logically
B. the diminishing of a response to a frequently repeated stimulus
C. altering one's existing schemas as a result of new information*
D. an inability to understand perspectives besides one's own

Flipped item: After Sarah learned that penguins can't fly, she had to modify her existing concept of birds. This best illustrates the process of:
A. Accommodation*
B. Conservation
C. Habituation
D. Egocentrism
(Adapted from: ProProfs, n.d.)

(iii) Use of High Quality Distractors - Regardless of how an item is constructed, if one or more of its distractors are implausible to even the weakest students, it will not assess higher-level thinking (Hancock, 1994). Distractors that are superficially similar to the item key, on the other hand, demand a high level of discriminating judgement. It has thus been recommended that, where possible, all of the given options are theoretically
plausible, with the key being the best answer, as opposed to the only correct option. The item stem should also be worded appropriately to reflect this. Great care must be taken to ensure that this strategy does not introduce subjectivity - the item key must remain indisputably and objectively more correct than any of the distractors.

Table 6 provides an example of the effectiveness of high quality distractors, using two items from the field of nursing (Morrison & Free, 2001). Although both of these items assess similar content, the item on the right demands a higher level of cognitive processing. To correctly answer this item, an examinee must know that a shuffling gait is characteristic of Parkinson's disease, and use this knowledge to understand that the presence of throw rugs throughout the home poses a significant safety hazard to the client, and thus has the greatest implication for his care. All three of the distractors are plausible, as they may also be construed as having implications for the client's care. For example, options C and D are indicative of additional symptoms of Parkinson's disease that may require attention, whilst option A could potentially, but not necessarily, have implications for care, depending on how the client feels about visits from his grandchildren.

Table 6. A comparison of two MC items, one with a standard stem and distractors, and one with a discriminating stem and high quality distractors (adapted from Morrison & Free, 2001)

Standard stem and distractors:
Which of the following assessment findings is characteristic of a client with Parkinson's disease?
A. Night blindness
B. Pain in lower extremities
C. Shuffling gait*
D. Incontinence

Discriminating stem and high quality distractors:
A nurse is making a home visit to a 75-year-old male who has had Parkinson's disease for the past five years. Which of the following has the greatest implication for this client's care?
A. The client's wife tells the nurse that the grandchildren have visited for over a month.
B. The nurse notes that there are numerous throw rugs throughout the client's home.*
C. The client has a towel wrapped around his neck that the wife uses to wipe her husband's face.
D. The client is sitting in an armchair, and the nurse notes that he is gripping the arms of the chair.
(iv) Tapping Multiple Neurons - Burns (2010, p.332) distinguished between one-neuron items, whereby, figuratively, 'the student only has to fire one neuron to obtain the memorized, tidbit answer', and multiple-neuron items, which require an understanding of interconnections between knowledge. Practically speaking, multiple-neuron items assess higher-level processes such as knowledge application, because they require examinees to have knowledge of more than one fact or concept, and to successfully combine these to arrive at the correct answer. Table 7 tracks the transformation of an item stem from a one-neuron to a five-neuron classification.

Table 7. The transformation of an item from a one-neuron to a five-neuron classification (adapted from Burns, 2010)

1-neuron: Identify the cell at the end of the pointer?
2-neuron: Identify the hormone produced by this cell?
3-neuron: Identify the target organ/tissue/cell for the hormone produced by this cell?
4-neuron: Identify the physiologic effect in the target organ/tissue/cell for the hormone produced by this cell?
5-neuron: Identify the physiologic effect in the body caused by the target organ/tissue/cell for the hormone produced by this cell?

As is evident from Table 7, the process of literally transforming an item from one- to five-neuron status in this way may result in rather cumbersome, poorly-worded items. It is important to strike a balance between achieving the desired level of cognitive complexity, whilst simultaneously maintaining basic principles for good item-writing, such as clarity of wording (see Haladyna, Downing, & Rodriguez, 2002 for a comprehensive overview of these). Furthermore, some content areas simply may not lend themselves to the use of five-neuron questions. An appropriate rule-of-thumb may be to strive for items that could be classified at least at the two-neuron level or, as Morrison and Free (2001)
suggested, items for which the answer could not theoretically be located on one page of a textbook.

Table 8 provides some examples of multiple-neuron items from the fields of pharmacy, statistics, and education respectively. It should be noted that multiple-neuron items are often (but not always) context-dependent. That is, they may present a stimulus or scenario, requiring the examinee to draw on various elements of their subject knowledge to interpret the scenario and subsequently select the most appropriate response. In some cases, several items may accompany a given scenario. These are known as context-dependent item sets. One potential disadvantage of context-dependent items is their inherently greater reading load (Airasian et al., 1994). Theoretically, this can introduce construct-irrelevant variance and disadvantage test-takers with low verbal ability or poor English proficiency. Consideration of test-taker characteristics and differential item functioning analyses may help monitor whether or not this is an issue in particular testing contexts. Where concerns are raised, the use of video (Chan & Schmitt, 1997) or animation (Dancy & Beichner, 2006) to present the information contained in the stem may be an effective solution.
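As an illustration of how differential item functioning might be monitored in practice, the sketch below computes a Mantel-Haenszel common odds ratio for a single item across two examinee groups, stratified by total score. This is only a minimal sketch with invented data; the article does not prescribe a particular procedure, and operational DIF analyses typically use purpose-built psychometric software and established effect-size classifications.

```python
from collections import defaultdict

# (group, total_score, item_correct) triples; invented data for illustration.
# Group "R" = reference, "F" = focal (e.g. higher vs. lower verbal ability).
responses = (
    [("R", 4, 1)] * 3 + [("R", 4, 0)] +
    [("F", 4, 1)] * 2 + [("F", 4, 0)] * 2 +
    [("R", 6, 1)] * 2 + [("R", 6, 0)] * 2 +
    [("F", 6, 1)] + [("F", 6, 0)] * 3
)

# Build a 2x2 table (group x incorrect/correct) per total-score stratum.
strata = defaultdict(lambda: {"R": [0, 0], "F": [0, 0]})  # [incorrect, correct]
for group, total, correct in responses:
    strata[total][group][correct] += 1

# Mantel-Haenszel common odds ratio: sum(a*d/n) / sum(b*c/n) across strata,
# where a,b = reference correct/incorrect and c,d = focal correct/incorrect.
num = den = 0.0
for table in strata.values():
    a, b = table["R"][1], table["R"][0]
    c, d = table["F"][1], table["F"][0]
    n = a + b + c + d
    num += a * d / n
    den += b * c / n

odds_ratio = num / den if den else float("inf")
print(f"MH common odds ratio: {odds_ratio:.2f}  (~1 suggests little DIF)")
```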

Table 8. Examples of multiple-neuron items

Which of the following is a contraindication for spironolactone?
A. Serum creatinine = 3.0 mg/dL*
B. Serum potassium = 3.5 mEq/L
C. Resting heart rate = 68 bpm
D. Blood pressure = 130/85 mmHg
(Source: Tiemeier, Stacy & Burke, 2011)

You have carried out a 3x2 ANOVA for independent groups. There were 60 participants, with 10 participants randomly assigned to each cell. You have now analysed the data and are checking your work. Which of the following would immediately let you know that you have made an error?
A. You found the total degrees of freedom to be 60.*
B. You found the mean square for the error term to be 6.25.
C. You found the F statistic for the interaction effect to be 2.34.
D. You found the degrees of freedom for the interaction effect to be 2.
(Source: DiBattista, 2011)

James is a fourth-class student. His results from a standardized reading assessment are below:

Test: Reading    Standard Score: 81
Level: 4         STEN Score: 2
Form: A          Percentile Rank: 12

James' teacher, Mrs Smith, is preparing to explain the test results to James' parents. Which of the following represents a correct interpretation of the results?
A) James did as well or better than 12% of students in his class
B) James did as well or better than 12% of 4th class students nationally*
C) James did better than 81% of 4th class students nationally
D) James did better than 81% of students in his class

What additional information would be most important for Mrs. Smith to communicate to James' parents to help them fully understand the meaning of these results?
A) James' raw score on the standardized test
B) The mean standard score for the class
C) James' performance in everyday reading activities*
D) James' standardized scores from last year
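As a worked check of the statistics item above (added here for illustration; it is not part of the original item), the degrees of freedom for a 3x2 between-subjects ANOVA with N = 60 follow directly from the design, which is why option A is immediately recognizable as an error:

```latex
\begin{align*}
df_{\text{total}} &= N - 1 = 60 - 1 = 59
  && \text{(so a total } df \text{ of } 60 \text{, option A, is impossible)} \\
df_{A} &= a - 1 = 3 - 1 = 2, \qquad df_{B} = b - 1 = 2 - 1 = 1 \\
df_{A \times B} &= (a - 1)(b - 1) = 2
  && \text{(so option D is consistent, not an error)} \\
df_{\text{error}} &= N - ab = 60 - 6 = 54
\end{align*}
```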
Each of the strategies described to this point may aid item-writers to construct MC items at higher cognitive levels. However, it should be appreciated that none will automatically produce items aligned to one particular level. Constructing analysis-level items, for example, is especially challenging, and requires an additional layer of abstraction. Indeed, as Oermann and Gaberson (2009) pointed out, differences between application and analysis items are not always readily apparent. Analysis has been described in various terms, such as 'breaking material into its constituent parts' (Anderson et al., 2001) and 'recognizing unstated assumptions' (Bloom et al., 1956). Items that assess analysis are often accompanied by complex stimuli that must be interpreted; alternatively, they may require examinees to digest and make sense of multiple pieces of information with respect to each response option in order to determine which is the most appropriate. Generally speaking, these items require high-level problem-solving skills such as interpreting abstract information, recognizing discrepancies, scrutinizing decisions, or inferring causality in complex situations (Lord & Baviskar, 2007; Simkin & Kuechler, 2005; Vacc, Loesch, & Lubik, 2001). Of course, it is necessary to tailor these criteria to the specific subject matter, and also to the level of the exam. Two examples of analysis items, from a nursing certification exam and a primary school science test respectively, are provided in Table 9.

Table 9. Examples of MC items that could be classified at the analysis level

You receive a report on the following patients at the beginning of your evening shift. Which patient should you assess first?
A. An 82-year-old with pneumonia who seems confused at times*
B. A 76-year-old patient with cancer with 300 mL remaining of an intravenous infusion
C. A 40-year-old who had an emergency appendectomy 8 hours ago
D. An 18-year-old with chest tubes for treatment of a pneumothorax following an accident
(Source: Oermann & Gaberson, 2009)

Linda and Steve did a survey of the fruit that children in their class liked best. Look at the chart and answer the two questions.

[Bar chart showing the number of children (0-12) who chose each fruit: Apples, Oranges, Bananas, Pears, Grapes]

1. Oranges are more popular than grapes
A. True
B. False*
C. Can't say

2. Children eat bananas most often
A. True
B. False
C. Can't say*

(Adapted from Kilfeather, O'Leary & Varley, 2011)
cognitive complexity of an item should be rated by subject-matter experts, trained explicitly in the use of one or more of the relevant taxonomies (e.g. Tarrant et al., 2006). A drawback to this method is that there is necessarily an element of subjectivity associated with these ratings. Accordingly, it is advisable that items are classified by diverse groups of experts, who have been instructed to focus closely on the definitional criteria outlined in the frameworks, and to consider potential discrepancies that could arise from these as they are classifying the items. Measures of inter-rater reliability should then be considered in determining whether an item can be eventually classified at any given level, or whether it should be revised or removed.
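One common inter-rater reliability statistic for this kind of categorical classification task is Cohen's kappa, which corrects raw agreement for the agreement expected by chance. The sketch below, with invented ratings, is illustrative only; the article does not prescribe a particular statistic, and designs with more than two raters would call for an extension such as Fleiss' kappa.

```python
from collections import Counter

# Two raters assign each item a level of the revised taxonomy.
# These ratings are invented for illustration.
rater_a = ["remember", "apply", "analyse", "remember", "understand", "apply"]
rater_b = ["remember", "apply", "apply", "remember", "understand", "analyse"]

n = len(rater_a)
observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

# Chance agreement: product of each rater's marginal proportions per level.
freq_a, freq_b = Counter(rater_a), Counter(rater_b)
levels = set(freq_a) | set(freq_b)
expected = sum((freq_a[lvl] / n) * (freq_b[lvl] / n) for lvl in levels)

kappa = (observed - expected) / (1 - expected)
print(f"Observed agreement: {observed:.2f}, Cohen's kappa: {kappa:.2f}")
```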
For those favouring more objective criteria, it may seem appealing to refer to the items' difficulty indices. However, this is not advisable. Difficulty and complexity are two distinct attributes, with the former simply referring to the proportion of test-takers who answer an item correctly. As Hancock (1994) pointed out, a test item may be difficult on the basis that it requires a test-taker to recall a relatively obscure fact. This fact, however, may be trivial with regard to a learner's overall level of understanding of a complex concept, or a licensure candidate's competence in the given profession. Empirical evidence has supported this view, with weak and occasionally inverse correlations emerging between complexity ratings and difficulty indices (e.g. Hancock, 1994; Schneider, Huff, Egan, Gaines, & Ferrara, 2013).
One particularly promising method that may be suitable for identifying the cognitive complexity of a test item is the think-aloud protocol (TAP; Ericsson, 2006), whereby individuals are asked to verbalize their thoughts whilst engaged in a learning activity. Evidence suggests that TAP may represent an accurate measure of both cognitive and metacognitive processes (Azevedo, Moos, Johnson & Chauncey, 2010); as such, if TAP was employed to measure the cognitive processes of a test-taker whilst completing a given item, this could potentially give valuable insights into the level of thinking assessed by this item.
thinking assessed by this item. Bleks-Rechek, A., Zeug, N. & Webb, R.M. (2007).
Discrepant performance on multiple-choice and short
Concluding Comments answer assessments and the relation of performance to
The importance of measuring higher-order general scholastic aptitude. Assessment and Evaluation in
thinking, for a variety of reasons, is well recognized in Higher Education, 32 (2) 89-105
both educational and professional assessment circles. Bloom, B., Englehart, M., Furst, E., Hill, W. & Krathwohl,
Many have argued that MC items although valued for D. (1956). A taxonomy of educational objectives,
their objective and cost-efficient nature are incapable
References

Airasian, P.W. (1994). Classroom assessment. New York: McGraw-Hill.

Anderson, L.W., Krathwohl, D.R., Airasian, P.W., Cruickshank, K.A., Mayer, R.E., Pintrich, P.R., & Wittrock, M.C. (2001). A taxonomy for learning, teaching and assessing: A revision of Bloom's Taxonomy of Educational Objectives (Complete edition). New York: Longman.

Azevedo, R., Moos, D.C., Johnson, A.M., & Chauncey, A.D. (2010). Measuring cognitive and metacognitive regulatory processes during hypermedia learning: issues and challenges. Educational Psychologist, 45(4), 210-223.

Barnett, J.E. & Francis, A.L. (2012). Using higher order thinking questions to foster critical thinking: a classroom study. Educational Psychology, 32(2), 201-211.

Becker, W.E., & Johnston, C. (1999). The relationship between multiple-choice and essay response questions in assessing economics understanding. Economic Record, 75(231), 348-357.

Biggs, J.B. & Collis, K.F. (1982). Evaluating the Quality of Learning: The SOLO Taxonomy (Structure of the Observed Learning Outcome). New York: Academic Press.

Bitner, M.J. & Brown, S.W. (2008). The service imperative. Business Horizons, 51(1), 39-46.

Black, P. & Wiliam, D. (1998). Assessment and Classroom Learning. Assessment in Education: Principles, Policy & Practice, 5(1), 7-74.

Bleks-Rechek, A., Zeug, N. & Webb, R.M. (2007). Discrepant performance on multiple-choice and short answer assessments and the relation of performance to general scholastic aptitude. Assessment and Evaluation in Higher Education, 32(2), 89-105.

Bloom, B., Englehart, M., Furst, E., Hill, W. & Krathwohl, D. (1956). A taxonomy of educational objectives, Handbook I: Cognitive domain. New York: David McKay Company.

Burns, E.R. (2010). Anatomizing reversed: Use of examination questions that foster use of higher order learning skills by students. Anatomical Science Education, 3(6), 330-334.

Chan, D. & Schmitt, N. (1997). Video-based versus paper-and-pencil method of assessment in situational judgment tests: Subgroup differences in performance and face validity perceptions. Journal of Applied Psychology, 82, 143-159.

Choudhury, B., Gouldsborough, I., & Shaw, F.L. (2015). The intelligent anatomy spotter: A new approach to incorporate higher levels of Bloom's taxonomy. Anatomical Science Education, 19.

Dancy, M. & Beichner, R. (2006). Impact of animation on assessment of conceptual understanding in physics. Physical Review Special Topics - Physics Education Research, 2(1), 1-7.

DiBattista, D. (2011, September). Getting the most out of multiple-choice questions. Paper presented at University of New Brunswick, Saint John.

Dickinson, M. (2011, December 5). Writing multiple-choice questions for higher-level thinking. Learning Solutions Magazine. Retrieved from http://www.learningsolutionsmag.com/articles/804/writing-multiple-choice-questions-for-higher-level-thinking

Ericsson, K.A. (2006). Protocol analysis: verbal thoughts as data. Cambridge: MIT Press.

Fives, H. & DiDonato-Barnes, N. (2013). Classroom Test Construction: The Power of a Table of Specifications. Practical Assessment, Research & Evaluation, 18(3). Available online.

Frederiksen, N. (1984). The real test bias. American Psychologist, 39, 193-202.

Haladyna, T.M., Downing, S.M. & Rodriguez, M.C. (2002). A Review of Multiple-Choice Item-Writing Guidelines for Classroom Assessment. Applied Measurement in Education, 15(3), 309-333.

Hancock, G.R. (1994). Cognitive Complexity and the Comparability of Multiple-Choice and Constructed-Response Test Formats. The Journal of Experimental Education, 62(2), 143-157.

Hansen, J.D. (2006). Using Problem-Based Learning in Accounting. Journal of Education for Business, 81(4), 221-224.

Hickson, S. & Reed, W.R. (2011). More evidence on the use of constructed-response questions in principles of economics classes. International Review of Economic Education, 10, 28-48.

Hickson, S., Reed, W.R., & Sander (2012). Estimating the Effect on Grades of Using Multiple-Choice Versus Constructive-Response Questions: Data From the Classroom. Educational Assessment, 17(4), 200-213.

Hopson, M.H., Simms, R.L. & Knezek, G.A. (2001). Using a Technology-Enriched Environment to Improve Higher-Order Thinking Skills. Journal of Research on Technology in Education, 34(2), 109-119.

Huitt, W. (2011). Bloom et al.'s taxonomy of the cognitive domain. Educational Psychology Interactive. Valdosta, GA: Valdosta State University. Retrieved from http://www.edpsycinteractive.org/topics/cognition/bloom.html

Jensen, J.L., McDaniel, M.A., Woodard, S.M., & Kummer, T.A. (2014). Teaching to the Test or Testing to Teach: Exams Requiring Higher Order Thinking Skills Encourage Greater Conceptual Understanding. Educational Psychology Review, 26(2), 307-329.

Kilfeather, P., O'Leary, M. & Varley, J. (2011). Irish Primary Science Achievement Tests. Dublin: CJ Fallon.

Kniveton, B.H. (1996). A correlational analysis of multiple-choice and essay assessment measures. Research in Education, 56, 73-84.

Krathwohl, D. (2002). A Revision of Bloom's Taxonomy: An Overview. Theory into Practice, 41(4), 212-218.

Krieg, R.G., & Uyar, B. (2001). Student performance in business and economic statistics: Does exam structure matter? Journal of Economics and Finance, 25, 229-241.

LaDuca, A., Downing, S.M., & Henzel, T.R. (1995). Systematic Item Writing and Test Construction. In J. Impara (Ed.), Licensure Testing: Purposes, Procedures and Practice (pp. 117-148). Lincoln, NE: Buros.

Leung, S.F., Mok, E., & Wong, D. (2008). The impact of assessment methods on the learning of nursing students. Nurse Education Today, 28, 711-719.

Lord, T. & Baviskar, S. (2007). Moving students from information recitation to information understanding: exploiting Bloom's taxonomy in creating science questions. Journal of College Science Teaching, 5, 40-44.

MacCraith, B. (2016, March 29). Why we need more T-shaped graduates. The Irish Times. Retrieved from https://www.irishtimes.com

Mann, K. (2008). Reflection: understanding its influence on practice. Medical Education, 42, 449-451.

Martinez, M.E. (1999). Cognition and the question of test item format. Educational Psychologist, 34(4), 207-218.

Masters, J.C., Hulsmeyer, B.S., Pike, M.E., Leichty, K., Miller, M.T., & Verst, A.L. (2001). Assessment of multiple-choice questions in selected banks accompanying text books used in nursing education. Journal of Nursing Education, 40(1), 25-32.

Momsen, J.L., Long, T.M., Wyse, S.A., & Ebert-May, D. (2010). Just the Facts? Introductory Undergraduate Biology Courses Focus on Low-Level Cognitive Skills. CBE Life Sciences Education, 9, 435-440.

Morrison, S. & Free, K.W. (2001). Writing multiple-choice test items that promote and measure critical thinking. Journal of Nursing Education, 40(1), 17-24.

Newstead, S. & Dennis, I. (1994). The reliability of exam marking in psychology: examiners examined. Psychologist, 7, 216-219.

Oermann, M.H. & Gaberson, K.B. (2009). Evaluation and Testing in Nursing Education (3rd ed.). New York: Springer.

Oskam, I.F. (2009). T-shaped engineers for interdisciplinary innovation: An attractive perspective for young people as well as a must for innovative organizations. SEFI (European Society of Engineering Education) Annual Conference. Retrieved from www.sefi.be/wp-content/abstracts2009/Oskam.pdf

ProProfs (n.d.). Unit 3: Developmental Psychology. Retrieved from https://www.proprofs.com/quiz-school/story.php?title=unit-3-developmental-psychology

Schneider, M.C., Huff, K.L., Egan, K.L., Gaines, M.L., & Ferrara, S. (2013). Relationships Among Item Cognitive Complexity, Contextual Demands, and Item Difficulty: Implications for Achievement-Level Descriptors. Educational Assessment, 18(2), 99-121.

Schuwirth, L.W. & van der Vleuten, C.P. (2003). ABC of learning and teaching in medicine: Written assessment. British Medical Journal, 326, 643-645.

Selingo, J. (2015, June 21). Education for a jobless future: Are colleges preparing students for the workforce? The Washington Post. Retrieved from https://www.washingtonpost.com

Simkin, M. & Kuechler, W. (2005). Multiple-Choice Tests and Student Understanding: What Is the Connection? Decision Sciences Journal of Innovative Education, 3(1), 73-97.

Struyven, K., Blieck, Y., & De Roeck, V. (2014). The electronic portfolio as a tool to develop and assess pre-service student teaching competences: Challenges for quality. Studies in Educational Evaluation, 43, 40-54.

Tarrant, M., Knierim, A., Hayes, S., & Ware, J. (2006). The frequency of item writing flaws in multiple-choice questions used in high stakes nursing assessments. Nurse Education in Practice, 6, 354-363.

Thissen, D., Wainer, H. & Wang, X. (1994). Are tests comprising both multiple-choice and free-response items necessarily less unidimensional than multiple-choice tests? An analysis of two tests. Journal of Educational Measurement, 31, 113-123.

Tiemeier, A.M., Stacy, Z.A., & Burke, J.M. (2011). Using Multiple Choice Questions Written at Various Bloom's Taxonomy Levels to Evaluate Student Performance across a Therapeutics Sequence. Innovations in Pharmacy, 2(2).

Traub, R.E. (1993). On the equivalence of the traits assessed by multiple-choice and constructed-response tests. In R.E. Bennett & W.C. Ward (Eds.), Construction versus choice in cognitive measurement: Issues in constructed response, performance testing, and portfolio assessment (pp. 29-43). Hillsdale, NJ: Lawrence Erlbaum Associates.

Vacc, N.A., Loesch, L.C., & Lubik, R.E. (2001). Writing Multiple-Choice Test Items. In G. Walz & J. Bleuer (Eds.), Assessment: Issues and Challenges for the Millennium. Greensboro, NC: CAPS.

Veloski, J.J., Rabinowitz, H.K., Robeson, M.R., & Young, P.R. (1999). Patients Don't Present with Five Choices: An Alternative to Multiple-choice Tests in Assessing Physicians' Competence. Academic Medicine, 75(5), 539-546.

Walstad, W.B., & Becker, W.E. (1994). Achievement differences on multiple-choice and essay tests in economics. American Economic Review, 84, 193-196.

Webb, L., Cizek, G. & Kaloh, J. (1993, April). The Use of Cognitive Taxonomies in Licensure and Certification Test Development: Reasonable or Customary? Paper presented at the annual meeting of the American Educational Research Association.

Webb, N.L. (1997, April). Research Monograph Number 6: Criteria for Alignment of Expectations and Assessments in Mathematics and Science Education. Washington, D.C.: Council of Chief State School Officers.

Webb, N.L. (2005). Depth of Knowledge levels for four content areas. Presentation to the Florida Education Research Association, 50th Annual Meeting, Miami, Florida.

Wiggins, G. (2015, March 4). Five unfortunate misunderstandings that almost all educators have about Bloom's Taxonomy [Blog post]. Retrieved from https://grantwiggins.wordpress.com/2015/03/04/5-unfortunate-misunderstandings-that-almost-all-educators-have-about-blooms-taxonomy

Zimmaro, D.M. (2004). Writing good multiple-choice exams [Workshop material]. Learning Sciences, University of Texas at Austin. Retrieved from https://facultyinnovate.utexas.edu/sites/default/files/documents/Writing-Good-Multiple-Choice-Exams-04-28-10.pdf

Author Notes
The work of the Centre for Assessment Research, Policy and Practice in Education at Dublin City University is
supported by Prometric. The author would like to thank the test development team at Prometric for stimulating
her interest in this topic. Acknowledgements are also due to Anastasios Karakolidis, Michael O'Leary and Linda
Waters for their helpful comments on drafts of the manuscript. The views expressed in this article are those of
the author and do not necessarily represent the views of Prometric.
Citation:
Scully, Darina (2017). Constructing Multiple-Choice Items to Measure Higher-Order Thinking. Practical
Assessment, Research & Evaluation, 22(4). Available online: http://pareonline.net/getvn.asp?v=22&n=4

Corresponding Author
Dr. Darina Scully
Centre for Assessment Research, Policy & Practice in Education (CARPE)
Institute of Education, St. Patrick's Campus
Dublin City University
Dublin 9
Republic of Ireland

email: darina.scully [at] dcu.ie
