
Cognitive Psychology 65 (2012) 207–240


Type of learning task impacts performance and strategy selection in decision making

Thorsten Pachur a,*, Henrik Olsson b

a Department of Psychology, University of Basel, Missionsstrasse 60/62, 4055 Basel, Switzerland
b Center for Adaptive Behavior and Cognition, Max Planck Institute for Human Development, Lentzeallee 84, 14195 Berlin, Germany

Article history: Accepted 30 March 2012

Keywords: Decision making; Strategy; Inference; Feedback; Learning; Exemplar processing; Comparison

Abstract

In order to be adaptive, cognition requires knowledge about the statistical structure of the environment. We show that decision performance and the selection between cue-based and exemplar-based inference mechanisms can depend critically on how this knowledge is acquired. Two types of learning tasks are distinguished: learning by comparison, by which the decision maker learns which of two objects has a higher criterion value, and direct criterion learning, by which the decision maker learns an object's criterion value directly. In three experiments, participants were trained either with learning by comparison or with direct criterion learning and subsequently tested with paired-comparison, classification, and estimation tasks. Experiments 1 and 2 showed that although providing less information, learning by comparison led to better generalization (at test), both when generalizing to new objects and when the task format at test differed from the task format during training. Moreover, learning by comparison enabled participants to provide rather accurate continuous estimates. Computational modeling suggests that the advantage of learning by comparison is due to differences in strategy selection: whereas direct criterion learning fosters the reliance on exemplar processing, learning by comparison fosters cue-based mechanisms. The pattern in decision performance reversed when the task environment was changed from a linear (Experiments 1 and 2) to a nonlinear structure (Experiment 3), where direct criterion learning led to better decisions. Our results demonstrate the critical impact of learning conditions for the subsequent selection of decision strategies and highlight the key role of comparison processes in cognition.

© 2012 Elsevier Inc. All rights reserved.

* Corresponding author. Fax: +41 61 2670441. E-mail address: thorsten.pachur@unibas.ch (T. Pachur).

http://dx.doi.org/10.1016/j.cogpsych.2012.03.003

1. Introduction

In Simon Beckett's The Chemistry of Death, protagonist David Hunter gets drawn into a murder investigation because of his expertise in reading dead bodies. For instance, he is able to assess the times of death of a murdered dog and its female owner based on the presence of empty pupae husks, beetles, flies, and maggots in and around the corpses, leading to the crucial insight that the dog was killed before the woman. This ability to draw inferences about the criterion (e.g., time of death) of objects based on probabilistic cue information (e.g., presence of maggots) is a key cognitive function: it is involved in many real-world judgment tasks, ranging from medical diagnosis to predicting the success of job applicants, students, and stocks.
Making accurate inferences requires knowledge about the statistical structure of the environment. There are two main experimental paradigms in decision research for training participants on this knowledge.1 In one frequently used type of learning task, people are presented with pairs of objects and given feedback on which object has a higher value on the criterion dimension (e.g., Bergert & Nosofsky, 2007; Newell, Weston, & Shanks, 2003; Persson & Rieskamp, 2009). To illustrate, when learning the relationship between features of a dead body and its time of death, one may learn which of two bodies has been dead for a longer time, but not by how much. In this article, we will refer to this type of learning task as learning by comparison. An alternative type of learning task is one in which, during training, each object is shown separately along with its continuous criterion value (e.g., Juslin, Jones, Olsson, & Winman, 2003; Juslin, Olsson, & Olsson, 2003). To illustrate, one may learn for each body the specific time that has elapsed since the person was killed (e.g., 12 days). In this article, we will refer to this type of learning task as direct criterion learning. Note that only in direct criterion learning, but not in learning by comparison, is the decision maker directly provided with metric information about the criterion.

In previous studies, learning by comparison and direct criterion learning have been used more or less interchangeably (see below), presumably under the assumption that it does not really matter how knowledge about the statistical structure of the environment is acquired.2 However, it is far from clear whether this assumption is justified. First, in classification research it has been demonstrated that learning tasks involving the possibility to compare objects with each other can lead to more accurate responses at a later test (e.g., Namy & Gentner, 2002), though no research exists as to whether such an advantage also holds in judgment.3 Second, studies on the relative contribution of cue-based mechanisms (which rely on abstracted cue–criterion relations) and exemplar-based mechanisms (which rely on retrieving entire feature patterns) in inference have yielded some puzzling inconsistencies that might be due to an impact of the type of learning task on strategy selection. For instance, whereas Juslin, Jones, et al. (2003) and Juslin, Olsson, et al. (2003), highlighting the importance of exemplar processing, used direct criterion learning, studies that found only limited evidence for exemplar processing (and instead obtained strong support for cue-based mechanisms; e.g., Nosofsky & Bergert, 2007; Persson & Rieskamp, 2009) used learning by comparison. By choice of the type of learning task used during training, researchers might thus have inadvertently fostered the use of a particular type of judgment mechanism.

1 In addition to the feedback-based learning paradigms discussed in this article, some studies provide information about the cue–criterion relationships (i.e., cue validities) numerically (e.g., Bröder & Schiffer, 2003; Rieskamp & Otto, 2006).
2 In contrast to judgment research, more sophisticated analyses of the effect of different types of learning tasks (e.g., feature inference learning vs. categorization learning) have been conducted in classification (e.g., Hoffman & Rehder, 2010; Yamauchi & Markman, 1998). One of the goals of this article is to connect results in classification to decision research, thus bridging the often lamented gap between the two related literatures (cf. Juslin, Olsson, et al., 2003; Juslin & Persson, 2002).
3 In fact, evidence in judgment research may even suggest a disadvantage of learning by comparison. Specifically, Juslin, Jones, et al. (2003) showed that providing only discrete (rather than continuous) information about the criterion, as in learning by comparison, obstructs the abstraction of cue–criterion relationships.

Our goal in this article is to examine the impact of the type of learning task on subsequent decision making (i.e., in a test phase). Specifically, participants were trained with either learning by comparison or direct criterion learning, and we investigated how this manipulation affected (a) people's ability to generalize their learned knowledge to new objects (i.e., ones not shown during training) or when switching to a different type of decision task (e.g., from paired comparison to classification) and (b) people's strategy selection between cue-based and exemplar-based mechanisms. We also test the hypothesis that whether learning by comparison or direct criterion learning leads to better performance at test depends on the structure of the environment, with potentially considerable practical implications for designing methods to improve decision making. In order to be able to directly compare reliance on cue-based and exemplar-based mechanisms after learning by comparison and after direct criterion learning, we propose a new exemplar model that can produce continuous responses after training with learning by comparison. This model assumes that each exemplar is stored along with a tally of how often the exemplar had a higher criterion value than the exemplars to which it was compared. In the following, we first characterize cue-based and exemplar-based mechanisms and give an overview of studies that have used learning by comparison and direct criterion learning. Then we contrast possible hypotheses concerning the impact of the type of learning task on decision accuracy and the reliance on cue-based and exemplar-based strategies, and report three experiments that test these hypotheses.

1.1. Cue-based and exemplar-based mechanisms in multiple-cue judgment

What mechanisms do people use when making inferences about the world? A key distinction is between cue-based and exemplar-based mechanisms (cf. Juslin, Olsson, et al., 2003; Rouder & Ratcliff, 2004). Cue-based models represent the traditional approach to modeling human judgment and assume that the decision maker processes individual cues of an object based on knowledge about the relationship between the cues and the criterion (Brehmer, 1994; Cooksey, 1996; Hammond, 1955; Meehl, 1954; see also Katsikopoulos, Pachur, Machery, & Wallin, 2008). This knowledge is abstracted from previous experience. Accordingly, David Hunter would judge which of two bodies has been dead for a longer time by considering indicators of time of death that he observes at the site of the bodies. He would give more weight to those indicators that in previous investigations were more accurate indicators of the actual time of death. Various models of cue-based inference have been proposed, such as the cue abstraction model (CAM; Juslin, Olsson, et al., 2003), the rational model (Lee & Cummins, 2004), tallying (Gigerenzer, Todd, & the ABC Research Group, 1999), the take-the-best heuristic (Gigerenzer & Goldstein, 1996; Khader et al., 2011), and the recognition heuristic (Goldstein & Gigerenzer, 2002; Pachur, 2010; Pachur, 2011).
Juslin and colleagues (Juslin, Olsson, et al., 2003; Juslin & Persson, 2002) highlighted a qualitatively different type of inference mechanism: exemplar processing. Exemplar-based models are an established approach in classification research (e.g., Medin & Schaffer, 1978; Nosofsky & Johansen, 2000), but are only rarely considered in decision research (but see Dougherty, Gettys, & Ogden, 1999; Sieck & Yates, 2001). In contrast to cue-based models, exemplar models do not assume abstraction of cue–criterion relationships. Instead, a judgment about a target object is based on knowledge represented by memory traces of exemplars (which represent collections of cues) that have previously been encountered. Each individual exemplar contributes to the judgment as a function of its similarity to the target object. According to the exemplar approach, David Hunter would judge the time of death of a body by recalling similar bodies he had previously examined and considering for how long they turned out to have been dead. The time of death of bodies with features highly similar to the current case would obtain a larger weight in Hunter's judgment than the time of death of less similar bodies.
Consistent with other approaches that assume a cognitive system with multiple qualitatively distinct representations (e.g., Ashby, Alfonso-Reese, Turken, & Waldron, 1998; Erickson & Kruschke, 1998; Juslin, Jones, et al., 2003; Logan, 1988; Nosofsky, Palmeri, & McKinley, 1994; Sloman, 1996), it has been proposed that people flexibly select between cue-based and exemplar-based mechanisms depending on the task structure (e.g., Juslin, Karlsson, & Olsson, 2008; Juslin, Olsson, et al., 2003). For instance, people seem to rely more on exemplar memory when the task is to judge a binary (rather than a continuous) criterion (Juslin, Olsson, et al., 2003); when the criterion values follow from the cue values deterministically (rather than probabilistically; Juslin, Olsson, et al., 2003); and when the criterion values follow from a multiplicative (rather than an additive) combination of cues (Juslin, Karlsson, & Olsson, 2008; Karlsson, Juslin, & Olsson, 2008). By examining the impact of different types of learning tasks on subsequent strategy selection, we contribute to a better understanding of the boundary conditions under which people use cue-based and exemplar-based mechanisms.

1.2. Learning by comparison vs. direct criterion learning

To illustrate learning by comparison, consider the experiments by Nosofsky and Bergert (2007). Participants were to make judgments concerning the poisonousness of fictitious bugs. Each bug was characterized by a pattern on several binary-valued cues, and the bugs' poisonousness varied along a continuous dimension. At each trial of the training phase, participants were presented with a pair of bugs and the task was to judge which one was more poisonous. Each decision was followed by feedback indicating the correct response, that is, which of the two bugs was more poisonous. The bugs' continuous poison levels were not provided.

Contrast this procedure with direct criterion learning as used by Juslin, Jones, et al. (2003), who asked participants to judge frogs in terms of their poisonousness. At each trial of the training phase, participants were presented with an individual frog and asked to decide, based on the frog's cue profile, whether the frog was dangerous or not. Each decision was followed by feedback indicating whether the response was correct, as well as the continuous poison level of the frog.
As mentioned above, in previous judgment research learning by comparison and direct criterion learning have been used more or less interchangeably. Several studies have used learning by comparison (Bergert & Nosofsky, 2007; Garcia-Retamero, Wallin, & Dieckmann, 2007; Lee & Cummins, 2004; Newell et al., 2003; Nosofsky & Bergert, 2007; Persson & Rieskamp, 2009, Study 1), others have used direct criterion learning (Bröder & Schiffer, 2006; Juslin, Olsson, et al., 2003; Juslin et al., 2008; Karlsson, Juslin, & Olsson, 2007; Olsson, Enkvist, & Juslin, 2006; Rakow, Hinvest, Jackson, & Palmer, 2004), and some have even used a mixture of both (Bröder, Newell, & Platzer, 2010; Newell & Shanks, 2003; Persson & Rieskamp, 2009, Study 2). Crucially, none of these studies has considered the possibility that the use of a particular learning task might have an influence on subsequent strategy selection. For instance, despite using similar material, Nosofsky and Bergert (2007) and Juslin, Jones, et al. (2003) reached opposite conclusions regarding the contribution of cue-based and exemplar-based mechanisms, but did not discuss that this difference might be due to their employing different types of learning tasks.

1.3. Might the type of learning task affect subsequent decision performance and strategy selection?

Currently, there is very little theoretical or empirical work in judgment research on how learning by comparison and direct criterion learning might differentially affect subsequent decision making. In addition, the existing work is consistent with opposing hypotheses as to the direction of a possible impact. On the one hand, it is assumed that one decisive requirement for efficient learning of an environment's statistical structure is the availability of metric information about the criterion. For instance, Juslin, Jones, et al. (2003; Experiment 1; see also Juslin, Olsson, et al., 2003) found that the provision of continuous criterion values during training led to better performance and a greater reliance on cue-based mechanisms than when only binary criterion information (in the context of a classification task) was provided. As direct criterion learning, but not learning by comparison, directly provides metric information about the objects' criterion values (learning by comparison provides only ordinal information), one might therefore expect less accurate decisions after learning by comparison. Moreover, if providing only binary feedback hampers the abstraction of cue–criterion relationships, learning by comparison should give rise to greater reliance on exemplar-based processes.
On the other hand, it has been proposed that tasks highlighting how differences between objects on the same cue are related to differences on the criterion might foster a more efficient abstraction of cue–criterion relationships (cf. Juslin et al., 2008; Klayman, 1988a). Indeed, there is evidence for a benefit of training conditions that involve comparison in classification (Namy & Gentner, 2002), word learning (Oakes & Ribar, 2005), and memory (Markman & Gentner, 1997; Oakes, Kovack-Lesh, & Horst, 2009; we provide a more extended discussion of this literature in Section 5). Further, Gentner and Namy (1999) found that after comparing two (similar) objects, participants relied on abstract taxonomic features to categorize objects (e.g., they categorized strawberry and banana in one category), whereas after viewing single objects, participants relied on perceptual features (e.g., they categorized strawberry and balloon in one category). Therefore, reliance on cue-based mechanisms might be greater after learning by comparison than after direct criterion learning, which could also be expected given the opposing findings of Nosofsky and Bergert (2007) and Juslin, Jones, et al. (2003) described above. In addition, Stewart, Chater, and Brown (2006) proposed that people can approximate metric criterion information very well based on simple, pairwise comparisons of objects. More specifically, although a single comparison provides only ordinal criterion information, people training with multiple comparisons may be able to extract metric information cumulatively by keeping, for instance, a tally of the relative frequencies with which one object wins against (i.e., has a higher criterion value than) another. As a consequence, performance might thus be equal or even better after learning by comparison.
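The cumulative tallying idea can be made concrete with a small simulation. The sketch below is a hypothetical illustration (not a model taken from this article or from Stewart et al., 2006): five objects with known, evenly spaced criterion values are compared in all possible pairs, and only the ordinal outcome of each comparison is recorded. The resulting win rates recover not just the criterion ordering but an approximately metric scale.

```python
import itertools

# Hypothetical objects with hidden, evenly spaced criterion values.
criteria = {"A": 0.1, "B": 0.3, "C": 0.5, "D": 0.7, "E": 0.9}

# Simulate learning by comparison: every pair is compared once, and
# only the ordinal outcome (which object "wins") is observed.
wins = {name: 0 for name in criteria}
comparisons = {name: 0 for name in criteria}
for a, b in itertools.combinations(criteria, 2):
    winner = a if criteria[a] > criteria[b] else b
    wins[winner] += 1
    comparisons[a] += 1
    comparisons[b] += 1

# Relative win frequency per object: a cumulative, metric-like score
# extracted purely from ordinal feedback.
win_rate = {name: wins[name] / comparisons[name] for name in criteria}
# win_rate: A 0.00, B 0.25, C 0.50, D 0.75, E 1.00
```

With evenly spaced criterion values, the win rates are themselves evenly spaced, which is the sense in which a tally of ordinal wins can approximate metric information.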
A third possibility is that, if the type of learning task influences the selection between cue-based and exemplar-based mechanisms, a particular type of learning task does not generally lead to better or worse performance; rather, whether learning by comparison or direct criterion learning is beneficial might depend on the structure of the environment. To appreciate this hypothesis, note that cue-based models and exemplar-based models excel in different environments. First, cue-based models, but not exemplar-based models, are able to extrapolate, that is, to produce estimates that are more extreme than the criterion values of objects encountered during the training phase (DeLosh, Busemeyer, & McDaniel, 1997; Juslin, Olsson, et al., 2003; Little & Lewandowsky, 2009). In a linear environment, a learning task fostering the use of cue-based mechanisms should therefore lead to higher accuracy than a learning task fostering exemplar processing; in a nonlinear environment, by contrast, the opposite should hold. Second, as cue-based processing is assumed to be constrained to linear additive integration (e.g., Juslin et al., 2008), cue-based mechanisms are highly limited in nonlinear environments, whereas exemplar processing is able to represent any functional form in the environment. In a nonlinear environment, a learning task fostering the use of an exemplar-based mechanism should therefore lead to higher accuracy than a learning task fostering the use of a cue-based strategy (for a related contingency between task structure and the type of learning task in classification research, see Yamauchi, Love, & Markman, 2002).
In the experiments reported below, we examined these three possibilities. Another (and related) issue investigated in this article is the degree to which people trained with learning by comparison are able to provide accurate estimates about continuous criterion values. To foreshadow our findings, after training with learning by comparison, people show an impressive ability to estimate continuous criterion values that can, surprisingly, even exceed the accuracy of people trained with direct criterion learning. In other words, we provide evidence that people can extract much of the metric structure of an environment based on simple ordinal comparisons between objects.

In sum, we highlighted a distinction between two types of learning tasks often used in decision research and argued that they might lead to very different conclusions regarding the contribution of cue-based and exemplar-based mechanisms in decision making. Moreover, we argued that the existing literature is consistent with opposing hypotheses as to which type of learning task should lead to superior performance. In addition to clarifying this important empirical issue, our work makes a theoretical contribution by investigating further boundary conditions for the use of exemplar processes in multiple-cue judgment.

1.4. Overview of the experiments

We conducted three experiments that investigated whether learning by comparison and direct criterion learning differentially impact performance and strategy selection in decision making. Specifically, we asked: Does learning by comparison hamper decision performance at test, or can it even lead to superior performance as compared with direct criterion learning? Is there evidence that direct criterion learning and learning by comparison differ in the strategies that they subsequently trigger? Does a potential effect of the type of learning task extend similarly across different types of decision tasks (e.g., paired comparison, classification, and estimation)? How does the direction of the effect of the learning task on decision performance (if there is one) depend on the statistical structure of the environment (e.g., linear vs. nonlinear)? And finally, does the ability to provide accurate continuous estimates of the criterion value necessitate training with continuous criterion values? To examine these questions, we used a broad array of tasks including paired-comparison, classification, and estimation tasks. In all experiments, participants were asked to make decisions concerning the toxicity of subspecies of a fictitious, poisonous "death bug" (cf. Juslin, Jones, et al., 2003), and our main manipulation was whether participants learned information about the criterion, that is, the subspecies' toxicity, via learning by comparison or via direct criterion learning.
All experiments used the same basic methodology and design. The subspecies were described on four binary cues. The cue values were translated into binary features of the bug (green or brown back, wedge or round spot on its back, large or small glands above its eyes, and white or yellow abdomen), which were presented to participants as verbal descriptions. The toxicity level of each subspecies, defined as the concentration of poison in a secretion of the bug, was a continuous value between 0 (0%) and 1 (100%). In Experiments 1 and 2, each subspecies' criterion value was a linear additive function of the cue values; in Experiment 3, we used a modified version of the nonlinear function used by Olsson et al. (2006).
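For concreteness, the linear additive environment of Experiments 1 and 2 can be written out explicitly. The weights below are not quoted from the method section; they are reconstructed from the continuous criterion values listed in Table 1 (intercept .1; cue weights .4, .3, .2, and −.1 for C1–C4) and reproduce every continuous criterion value c in that table.

```python
def toxicity(c1, c2, c3, c4):
    """Linear additive criterion consistent with the Experiment 1-2
    values in Table 1 (weights are a reconstruction, not the authors'
    code): intercept 0.1, cue weights 0.4, 0.3, 0.2, and -0.1."""
    return round(0.1 + 0.4 * c1 + 0.3 * c2 + 0.2 * c3 - 0.1 * c4, 2)

# Checked against Table 1: subspecies 1 (1, 1, 1, 0) -> 1.0;
# subspecies 7 (1, 0, 0, 0) -> 0.5; subspecies 16 (0, 0, 0, 1) -> 0.0
```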
Fig. 1 shows the design and the different tasks used in the experiments. In an initial training phase, participants were presented with a subset of the subspecies and acquired knowledge about their criterion values either via learning by comparison or via direct criterion learning. In the learning-by-comparison condition, participants were presented with two subspecies and asked to judge which one had a higher toxicity level. After each trial, participants were told the correct answer, but no information was provided about the continuous toxicity level of the subspecies. For comparability with the learning-by-comparison condition, in Experiments 1 and 3 the direct-criterion-learning condition also involved a binary response format, namely a classification task. Specifically, participants were presented with individual subspecies and asked to classify each as either deadly or not. In Experiment 2, the direct-criterion-learning condition involved an estimation task, in which the continuous criterion value had to be estimated; after each trial, participants were told the actual toxicity level of the subspecies. Importantly, in the training phase cue-based and exemplar-based mechanisms allowed, in principle, equally good performance, so possible differences in subsequent strategy use cannot be due to differences in the accuracy that the strategies conveyed.

Fig. 1. Design and tasks used in Experiments 1 and 3 (in Experiment 2, direct criterion learning occurred via an estimation task and the classification task in the test phase was replaced by an estimation task).

After the training phase, participants were presented with a test phase, in which no feedback was provided and in which new subspecies (i.e., ones that had not been presented during the training phase) were introduced. Our focus was on whether people's decisions in this test phase differed depending on whether they had trained with learning by comparison or direct criterion learning. The first task in the test phase was either a paired-comparison task or a single-object task (Experiments 1 and 3: classification; Experiment 2: estimation). As Fig. 1 shows, we crossed the type of learning task in the training phase (learning by comparison vs. direct criterion learning) with the type of the first decision task in the test phase (paired comparison vs. classification/estimation), yielding four conditions. In two conditions participants continued in the test phase with the same task type as in the training phase, whereas in the other two conditions participants continued with a different task type (i.e., switching from paired comparison to classification/estimation or vice versa). As a third and final task, all participants were presented with an estimation task, in which they had to estimate the continuous toxicity level of all subspecies (without feedback).

1.4.1. Cross-item generalization and cross-task generalization

Based on this design, we were able to evaluate both (a) cross-item generalization, that is, participants' ability to generalize in the test phase knowledge about old subspecies (i.e., presented in the training phase) to new subspecies (i.e., introduced in the test phase); and (b) cross-task generalization, that is, the degree to which, in the test phase, decision performance is affected by whether or not the type of decision task matches the one during the training phase. For instance, after training with learning by comparison (which involves a paired-comparison task), how does performance in a classification task differ from performance in a paired-comparison task? Although the ability to generalize knowledge about objects across different decision tasks is a very important aspect of successful cognition, there is very little research on people's ability for cross-task generalization (for work on this issue in classification, see Yamauchi & Markman, 1998).

1.4.2. Computational modeling: Cue-based and exemplar-based models

To investigate whether the selection between cue-based and exemplar-based mechanisms in the test phase is impacted by the type of learning task, and whether the impact manifests itself similarly across different types of decision tasks, we used computational modeling. Specifically, we compared various cue-based and exemplar-based models in their ability to capture participants' responses in the various tasks of the test phase. As described above, cue-based models assume that a decision is based on abstracted cue–criterion relationships, whereas exemplar-based models assume that a decision is based on the retrieval of previously encountered objects. Due to the different response formats of the decision tasks (paired comparison, classification, and estimation) and because different versions of cue-based and exemplar-based models have been proposed in the literature, we implemented several models for each of the two model classes. For instance, cue-based models were implemented either with or without a guessing parameter, and exemplar models assumed either a continuous or a binary representation of the criterion values. By considering a broad range of implementations, we wanted to ensure that our conclusions about the relative contribution of cue-based and exemplar-based processes in each of the different decision tasks would not depend on the specific implementation of the models.
The cue-based models we tested included two compensatory models, the cue abstraction model (CAM; Juslin, Jones, et al., 2003; Juslin, Olsson, et al., 2003) and weighted additive (WADD; Payne, Bettman, & Johnson, 1988), as well as the noncompensatory generalized take-the-best heuristic (gTTB; Nosofsky & Bergert, 2007). gTTB is a generalized version of Gigerenzer and Goldstein's (1996) take-the-best heuristic. Both compensatory models assume a weighted integration of the objects' cue values (although the models differ in terms of the definition of the weights); the noncompensatory gTTB assumes that cues are inspected sequentially, according to their validity (for a definition see Appendix A), and that inspection is stopped as soon as a discriminating cue is found.
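The lexicographic logic just described can be sketched as follows. This is a deterministic simplification, not gTTB itself: it omits gTTB's free parameters and assumes the validity ordering of the cues is already given (validity is defined in Appendix A).

```python
def take_the_best(obj_a, obj_b, cue_order):
    """Deterministic take-the-best for a paired comparison (a sketch).

    obj_a, obj_b: binary cue vectors; cue_order: cue indices sorted by
    validity (assumed known here). Cues are inspected one at a time,
    and inspection stops at the first cue that discriminates; if no
    cue discriminates, the model must guess.
    """
    for cue in cue_order:
        if obj_a[cue] != obj_b[cue]:
            return "A" if obj_a[cue] > obj_b[cue] else "B"
    return "guess"

# With cues inspected in the order C1, C2, C3, C4, the first cue
# already discriminates here, so the remaining cues are ignored:
# take_the_best([1, 0, 0, 0], [0, 1, 1, 1], [0, 1, 2, 3]) -> "A"
```

Note the noncompensatory character: in the example above, object B is favored by three of the four cues, yet the single higher-validity cue decides.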
All exemplar-based models (EBM) that we tested were implemented within the framework of the generalized context model (Nosofsky, 1984; Nosofsky, 1986). For predicting decisions after direct criterion learning, the application of the generalized context model is straightforward: the decision is based on a similarity-weighted average of the criterion values of the stored exemplars. For predicting classification or estimation responses after learning by comparison, however, current exemplar models cannot be applied, as in learning by comparison no criterion value is provided. Therefore, we propose a new exemplar model for decisions after training with learning by comparison. In brief, this model assumes that each exemplar is stored along with a tally of the relative frequency of the exemplar having a higher criterion value than the exemplars to which it had previously been compared (to which we refer as the dominance rate). This makes it possible for the model to predict continuous responses, which are a function of the average dominance rate of the stored exemplars, weighted by each exemplar's similarity to the probe.
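The prediction rule of this dominance-rate model can be sketched in a few lines. The similarity function below follows the generalized context model's multiplicative rule for binary cues (each mismatching cue attenuates similarity by a parameter s); both the parameter value and the code are illustrative simplifications, with the full model specified in Appendix A.

```python
def similarity(probe, exemplar, s):
    """GCM-style multiplicative similarity for binary cue vectors:
    each matching cue contributes 1, each mismatch contributes s
    (0 < s < 1; smaller s means sharper generalization gradients)."""
    sim = 1.0
    for p, e in zip(probe, exemplar):
        sim *= 1.0 if p == e else s
    return sim

def predict_estimate(probe, exemplars, dominance_rates, s=0.3):
    """Continuous response after learning by comparison: the
    similarity-weighted average of the stored exemplars' dominance
    rates (the fraction of training comparisons each exemplar won)."""
    sims = [similarity(probe, ex, s) for ex in exemplars]
    weighted = sum(si * d for si, d in zip(sims, dominance_rates))
    return weighted / sum(sims)
```

For a probe identical to a stored exemplar that won all of its training comparisons, the prediction approaches 1; for a probe equally similar to a high- and a low-dominance exemplar, it is their mean.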
Detailed descriptions of the various implementations of the cue-based and exemplar-based models
can be found in Appendix A. Table 2 summarizes which models were tested in the different tasks and
conditions as well as the number of free parameters in each model.
Next we report three experiments across which we varied (a) the response format in the training condition with direct criterion learning (binary vs. continuous) and (b) the structure of the environment (linear vs. nonlinear). The differences between the experiments are summarized in Table 3. In Experiments 1 and 3, the training phase in both training conditions involved a task with a binary response format, whereas in Experiment 2, the training phase with direct criterion learning involved an estimation task in which participants had to provide continuous responses.

2. Experiment 1

The main aim of Experiment 1 was to compare decision making after training with learning by
comparison and decision making after training with direct criterion learning. Do cross-item and
cross-task generalization at test differ as a function of the type of learning task in the training phase?
In addition, how well are people trained with learning by comparison able to map their learned
knowledge onto a continuous scale in an estimation task (when informed about the range in which
the values are located)? Moreover, we investigated whether the two types of learning tasks lead to
reliance on different strategies in the subsequent test phase.

2.1. Method

2.1.1. Participants
Eighty students (average age 24.0 years, 46 female) from the Free University Berlin participated in
the experiment. They received an hourly fee of €10 as compensation as well as an additional bonus
depending on their performance in the various decision tasks (see below).

Table 1
Cue patterns and continuous and binary criterion values of the 15 (16) exemplars used in Experiments 1 and 2 (Experiment 3).

Exemplar Cue values Linear criterion (Experiments 1–2) Nonlinear criterion (Experiment 3)
(Subspecies No.)
C1 C2 C3 C4 Continuous (c) Binary (b) Role Continuous (cNL) Binary (bNL) Role
1 1 1 1 0 1 1 N 0.16 0 N
2 1 1 1 1 .9 1 O 0.47 0 O
3 1 1 0 0 .8 1 O 0.71 0 O
4 1 1 0 1 .7 1 O 0.88 1 O
5 1 0 1 0 .7 1 N 0.88 1 N
6 1 0 1 1 .6 1 N 0.97 1 N
7 1 0 0 0 .5 p = .5 N 1 1 N
8 1 0 0 1 .4 0 O 0.94 1 O
9 0 1 1 0 .6 1 O 0.97 1 O
10 0 1 1 1 .5 p = .5 O 1 1 O
11 0 1 0 0 .4 0 O 0.94 1 O
12a 0 1 0 1 – – – 0.82 0 O
13 0 0 1 0 .3 0 O 0.82 0 O
14 0 0 1 1 .2 0 O 0.62 0 O
15 0 0 0 0 .1 0 O 0.35 0 O
16 0 0 0 1 0 0 N 0 0 N

Note: O = old exemplar, presented in the training phase, N = new exemplar, presented only in the test phase.
a Subspecies 12 was not used in Experiments 1–2.

Table 2
Cue-based and exemplar-based models that were tested in the paired-comparison (PC), classification (CLASS), and estimation (EST)
tasks of Experiments 1–3, separately for the conditions with training by learning by comparison and direct criterion learning, and
the number of free parameters in each model (see Appendix A for a detailed description of the models).

Model          Experiment 1      Experiment 2      Experiment 3      Number of free
               PC  CLASS  EST    PC  CLASS  EST    PC  CLASS  EST    parameters
Learning by comparison
CAM X (11) X (8) X (2) 4
CAMk X (28) X (28) X (9) 5
bCAM X (3) X (6) X (1) X (9) X (0) X (2) 4
bCAMk X (1) X (11) X (0) X (9) X (5) X (1) 5
gTTB X (0) X (1) X (1) 3
WADD X (13) X (16) X (2) 4
EBMdombin X (0) X (0) X (1) X (0) X (2) X (0) 4
EBMgdombin X (1) X (0) X (1) X (2) X (2) X (3) 5
EBMdomcon X (1) X (0) X (0) X (2) X (8) X (4) 4
EBMgdomcon X (0) X (1) X (0) X (0) X (5) X (22) 5
EBMpc X (0) X (0) X (0) 4
EBMgpc X (3) X (1) X (7) 5
EBMpc2 X (0) X (1) X (2) 4
EBMgpc2 X (0) X (0) X (3) 5
Direct criterion learning
CAM X (6) X (7) X (1) 4
CAMk X (18) X (7) X (9) 5
bCAM X (2) X (5) X (1) X (4) X (5) X (1) 4
bCAMk X (4) X (1) X (7) X (0) X (2) X (0) 5
gTTB X (2) X (3) X (1) 3
WADD X (11) X (4) X (4) 4
EBMcon X (0) X (1) X (1) X (8) X (1) X (6) 4
EBMgcon X (3) X (6) X (1) X (14) X (0) X (23) 5
EBM X (6) X (2) X (14) X (1) X (10) X (0) 4
EBMg X (2) X (7) X (0) X (3) X (8) X (1) 5
EBMdirpc X (0) X (3) X (7) 4
EBMgdirpc X (1) X (2) X (1) 5

Note: The numbers in the brackets indicate the number of participants who were classified as following each model. Note that in
the classification task in Experiment 1, there were 1 and 3 participants in the learning-by-comparison and the direct-criterion-
learning conditions, respectively, for whom several models showed the same best fit (see Fig. 5). As these "tied" participants
could thus not be classified unambiguously, they are not considered in the counts.

2.1.2. Material
Fifteen different subspecies of the bug were used in the experiment. Each subspecies was charac-
terized by values on four binary cues, C1, C2, C3, and C4, which were mapped on the four physical fea-
tures of the bugs (e.g., color of the back, color of the abdomen). The mapping of the four cues on the
features was randomized across participants, so that, for instance, C1 corresponded to the color of the
back for one participant and to the color of the abdomen for another. Table 1 shows the cue patterns of
the 15 subspecies as well as their continuous criterion values c, the toxicity level. The toxicity level of a
subspecies i followed a linear, additive function of the four cues:4

ci = .1 + .4 × Ci1 + .3 × Ci2 + .2 × Ci3 − .1 × Ci4.    (1)
Table 1 also shows the subspecies' binary criterion values b, indicating whether the subspecies was
deadly or not. Subspecies with c < .5 were harmless (b = 0), subspecies with c > .5 were deadly
(b = 1), and subspecies with c = .5 were harmless or deadly, determined randomly.
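Equation (1) and the binarization rule can be reproduced directly. As a check against Table 1, subspecies 1 (cues 1 1 1 0) yields c = 1 and subspecies 16 (cues 0 0 0 1) yields c = 0. The function names are illustrative:

```python
def toxicity(c1, c2, c3, c4):
    """Continuous criterion from Eq. (1): linear, additive in the four cues."""
    return 0.1 + 0.4 * c1 + 0.3 * c2 + 0.2 * c3 - 0.1 * c4

def binary_criterion(c):
    """Binary criterion b: deadly (1) if c > .5, harmless (0) if c < .5,
    and None for c = .5 (resolved randomly in the experiment)."""
    if c > 0.5:
        return 1
    if c < 0.5:
        return 0
    return None
```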
In the training phase, participants were presented with 10 (indicated by O in Table 1) of the 15 sub-
species. The 10 subspecies had criterion values ranging from 0.1 to 0.9. In a test phase, the 10 old as

4
The environment we used in Experiments 1 and 2 is slightly modified compared to the one used by Juslin, Jones, et al. (2003).
Specifically, to ensure that all cues had a validity >.5, we reversed C4 and did not use exemplar #12.

Table 3
Task manipulations across Experiments 1–3.

Experiment  Response format during training phase      Criterion for completion  Environmental
            Learning by comparison / Direct criterion  of training               structure
            learning
1           Binary / Binary                            Accuracy of 85%           Linear
2           Binary / Continuous                        Accuracy of 85%           Linear
3           Binary / Binary                            Accuracy of 80%           Nonlinear

well as 5 new subspecies (indicated by N in Table 1) were presented. The new subspecies were se-
lected such that they allowed discrimination between (linear) cue-based and exemplar-based pro-
cesses (see below for details). They included the subspecies with the most extreme criterion values
(i.e., 0 and 1) as well as subspecies that, despite different cue patterns, had the same criterion values
as some subspecies presented in the training phase.

2.1.3. Design and procedure


A 2 × 2 factorial between-subjects design was employed with type of learning task during the
training phase (learning by comparison or direct criterion learning) and type of the first decision task
in the test phase (paired comparison or classification) as independent variables (Fig. 1). Twenty participants were randomly assigned to each of the four experimental conditions. In the conditions with
learning by comparison, the training phase involved a paired-comparison task in which participants
had to decide which of two subspecies was more poisonous. After each decision, feedback was provided in that the more poisonous subspecies was indicated by a red rectangular border surrounding
it. The left–right positioning of each subspecies in a pair was randomized on each trial. Of the 45 possible pairwise combinations of the 10 subspecies, the one where two subspecies had identical toxicity
levels was excluded, resulting in 44 different pairs. In the basic training phase, each pair was repeated
four times, resulting in a total of 176 trials. The 176 trials were divided into four blocks (each consisting of a cycle of the 44 pair comparisons), after each of which participants could take a short break. In
the conditions with direct criterion learning, the training phase consisted of a classification task in
which each of the 10 subspecies had to be classified as "deadly" or "not deadly". After each decision, participants received feedback about the accuracy of their decision as well as the subspecies' toxicity level.5
In the basic training phase, each subspecies was shown 18 times, resulting in a total of 180 trials. The 180
trials were presented as three blocks (with 60 trials each), after each of which participants could take a
short break. In both training conditions, trials were self-paced and participants could study the feedback
and the cue pattern of the subspecies of the current trial for as long as they wished. Participants were
told that they would receive €0.02 for every correct response and would lose the same amount for every
incorrect response. If, after the basic training phase, a participant's performance in the last training block
did not reach an accuracy of at least 85%, the bonus gathered in that block was lost and the participant
was presented with another training block until the required accuracy level was reached.
To ensure a high level of motivation, participants were informed prior to the experiment that they
might find the training phase difficult, but that the task was solvable and that breaks were permitted
when necessary. Additionally, it was emphasized to participants who were training with direct criterion learning that it was important to pay attention to the criterion values of the individual subspecies
as they would be required in a later task.
In the test phase, participants were first presented with either a paired-comparison task or a classification task (without getting feedback). Note that for half of the participants, the type of the first
task in the test phase was different from the type of the decision task during the training phase
(Fig. 1). In the paired-comparison task, participants were presented with pairs of subspecies and asked
to decide which of the two was more poisonous. The left–right positioning of each subspecies in a pair

5
One may object that there was thus a mismatch between the format of the required response and the format in which the
feedback was provided. This objection is addressed in Experiment 2.

was randomized on each trial. Of the 105 (= 15 × 14/2) possible comparisons of all 15 subspecies, 18
pairs were selected in which TTB, WADD (both using the Bayesian definition of the validities/weights;
Lee & Cummins, 2004) and EBM predicted different decisions (with the attention weights in the model
set to 0.25 for all cues). Eleven comparisons involved either a new and an old subspecies or two new
subspecies and were repeated eight times; 7 comparisons involved two old subspecies and were repeated four times, yielding a total of 116 trials. In the classification task, all 15 subspecies were presented and participants were asked to classify each as "deadly" or "not deadly". Each of the 10 old
subspecies was presented four times, and each of the five new subspecies was presented eight times,
yielding a total of 80 trials. The order in which the subspecies were presented was determined randomly. Participants who were trained with learning by comparison but presented with a classification
task in the test phase were informed that the criterion values of the subspecies seen in the training
phase varied between 0.1 and 0.9 and that all subspecies with a toxicity higher than 0.5 were deadly.
As in the training phase, participants were told that they would receive €0.02 for every correct response and would lose the same amount for every incorrect response.
The final task was an estimation task (in all conditions). For this task, participants were reminded
that the toxicity of the subspecies varied between 0 and 1 and instructed to estimate the continuous
toxicity level of each of the 15 subspecies, given its characteristics, by typing in an estimate via the
computer keyboard. A value of 0 should be assigned to the subspecies that they thought was least
poisonous, and a value of 1 to the subspecies that they thought was most poisonous. Participants
were instructed that depending on their accuracy in the estimation task, they could win an additional
bonus of up to €1.70. Each subspecies was presented twice in random order. Completion of the experiment took, on average, around 50 min.

2.2. Results

2.2.1. Performance
The mean percentage of correct decisions in the last training block before achieving the required
accuracy level differed slightly between the training conditions (learning by comparison: 93.7%; direct
criterion learning: 95.1%), t(78) = −1.4, p = .17. However, participants training with learning by comparison reached the required level of accuracy substantially faster than those training with direct criterion learning: at the end of the training phase, participants in the learning-by-comparison condition
had been presented with fewer trials than participants in the direct-criterion-learning condition
(Ms = 181.5 vs. 264.3), t(40.6) = −3.31, p = .002, although the former had been presented with each
subspecies more frequently (Ms = 37.1 vs. 26.4), t(45.8) = 4.15, p < .001. To appreciate this apparent
advantage of participants who were trained with learning by comparison, note that they received
information about the subspecies' continuous criterion values only indirectly.
How well were participants able to generalize their knowledge acquired in the training phase, and
did their ability differ as a function of the type of learning task? Fig. 2 shows the results concerning
cross-item generalization, defined as the degree to which decision performance in the test phase differs between new and old subspecies. Unsurprisingly, in most conditions performance was lower
when new, as compared to when old, subspecies were presented. In the classification task, the drop
in performance when generalizing to new items was considerably less pronounced after training with
learning by comparison than after training with direct criterion learning, t(19) = 1.4, p = .17, d = 0.42
vs. t(19) = 9.2, p < .001, d = 3.17. In the paired-comparison task, performance dropped on new items
when trained with learning by comparison, t(19) = 4.3, p < .001, d = 0.91, but not when trained with
direct criterion learning, t(19) = 0.82, p = .42, d = 0.17. However, note that in the former, accuracy on
new items was still slightly higher than in the latter, t(38) = 1.43, p = .16, d = .46. A repeated-measures
ANOVA showed an interaction between item type (i.e., old/new) and training condition for the classification task, F(1, 38) = 38.72, p < .001, ηp² = .505, but not for the paired-comparison task,
F(1, 38) = 2.32, p = .14, ηp² = .058.
Fig. 3 shows the results concerning cross-task generalization. Trained with a paired-comparison
task (i.e., learning by comparison), participants' test performance in the classification task was
similarly high as test performance in the paired-comparison task, t(38) = 1.2, p = .25, d = .37;
trained with a classification task (i.e., direct criterion learning), participants' test performance in the

Fig. 2. Participants' mean percentage of correct responses for old and new items in the classification and paired-comparison
tasks in the test phase of Experiments 1–3, separately for the two training conditions. Error bars represent ±1 standard error of
the mean.

paired-comparison task was lower than in the classification task, t(38) = −3.0, p = .005, d = .93. An
ANOVA showed a trend for an interaction between type of decision task and training condition,
F(1, 38) = 2.96, p = .089, ηp² = .037. In sum, when tested with a different type of decision task than
the one in the training phase, participants were better able to approximate the accuracy of participants continuing with the same task type when they had been trained with learning by comparison
than when they had been trained with direct criterion learning.

2.2.2. Estimation task


Fig. 4 shows participants' mean estimates for old and new subspecies, separately for the two training conditions. Accuracy was quantified in terms of (a) the average (across participants) root mean
square error (RMSE) of all estimates and (b) the average (across participants) correlation between
the estimates and the actual criterion for old subspecies, rold (cf. Olsson et al., 2006).6 Table 5 shows
the results. Although participants trained with learning by comparison were never provided with the
subspecies' continuous toxicity levels, their estimates were more accurate than those of participants
who had trained with direct criterion learning in terms of RMSE, t(78) = −2.1, p = .036, and equal in terms
of rold, t(78) = .797, p = .49.
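Both accuracy measures are standard and can be computed per participant as follows (a sketch; the averaging across participants follows the text, and the function names are illustrative):

```python
import math

def rmse(estimates, criteria):
    """Root mean square error of a participant's estimates."""
    n = len(estimates)
    return math.sqrt(sum((e - c) ** 2 for e, c in zip(estimates, criteria)) / n)

def pearson_r(x, y):
    """Pearson correlation, used for r_old over the old subspecies."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)
```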
Inspection of Fig. 4 also provides some indication that depending on the type of learning task dur-
ing training, participants relied on cue abstraction and exemplar processes in the estimation task to

6
rold provides a straightforward and model-neutral measure of performance. The correlation across new subspecies, by contrast,
is not model-neutral because the task is construed specifically so that people relying on exemplar memory will make poor
judgments for these subspecies.

Fig. 3. Mean percentage of correct responses in the classification and paired-comparison tasks in the test phase of Experiments
1–3, separately for the two training conditions. Error bars represent ±1 standard error of the mean.

different degrees. Two aspects are important in this regard. First, the test phase introduced new subspecies that have different cue patterns but the same criterion value as some subspecies that were included in the training phase (e.g., subspecies 4 and 5 in Table 1 both have a criterion value c = .7).
Exemplar processing predicts large differences between the estimates for such old–new pairs,
whereas cue abstraction predicts differences to be small. Second, in the test phase we introduced
the two subspecies with the most extreme criterion values (i.e., subspecies 1 and 16 have the highest
and lowest, respectively; Table 1), and estimating them accurately requires extrapolation (i.e., providing estimates that go beyond the range of the criterion values of the objects encountered during the
training phase). Exemplar processing predicts a lack of extrapolation, whereas cue abstraction predicts
an ability to extrapolate. As can be seen, in the condition where participants had trained with direct
criterion learning, there are both substantial old–new differences and a lack of extrapolation; in the
condition where participants had trained with learning by comparison, by contrast, old–new differences are relatively small and there is clear evidence for extrapolation.
To quantify the contribution of cue abstraction and exemplar processes, we calculated the representation index (RI; cf. Juslin, Olsson, et al., 2003; Olsson, Enkvist, & Juslin, 2006; see Appendix B).
RI combines a measure of the amount of extrapolation (i.e., the deviation of the actual estimate for
the most extreme new subspecies from the estimate predicted by a regression line based on participants' estimates of the old subspecies) and a measure of the amount of old–new differences (i.e., the
average difference between the estimation error for the old subspecies and the estimation error for the
corresponding new subspecies). A negative RI suggests exemplar memory whereas an RI close to zero
suggests cue abstraction. As shown in Table 5, RI did not differ from zero after training with learning
by comparison, whereas it was clearly negative after training with direct criterion learning. In addition, there was a higher average proportion of estimates outside the training range in the condition
with learning by comparison than in the condition with direct criterion learning, t(78) = 5.39,
p < .001, d = 1.20. Taken together, these patterns in participants' responses in the estimation task provide a first indication that direct criterion learning fosters reliance on exemplar memory and learning
by comparison fosters reliance on cue abstraction. Next, we examine the effect of the type of learning
task on the mechanisms used in the different tasks of the test phase with computational modeling.
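The extrapolation component of RI can be sketched as follows; the exact definition, including the old–new component, is in Appendix B. This illustration fits a least-squares line to the old-item estimates and measures how far the estimate for an extreme new item deviates from it (the function name is hypothetical):

```python
def extrapolation_deviation(old_crit, old_est, new_crit, new_est):
    """Deviation of the estimate for an extreme new item from a regression
    line fitted to the estimates for the old items. Exemplar memory predicts
    estimates pulled back toward the training range (a clear deviation at the
    extremes), whereas cue abstraction predicts deviations near zero."""
    n = len(old_crit)
    mx = sum(old_crit) / n
    my = sum(old_est) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(old_crit, old_est))
             / sum((x - mx) ** 2 for x in old_crit))
    intercept = my - slope * mx
    return new_est - (intercept + slope * new_crit)
```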

2.2.3. Computational modeling


The analysis above has shown that people's performance in the test phase is affected by whether
the training phase involved learning by comparison or direct criterion learning. The analysis of the
estimation task provided a first hint that these differences may be due to a differential reliance on
cue-based and exemplar-based strategies after learning by comparison and direct criterion learning.

Fig. 4. Participants' estimates in the estimation task (test phase), separately for the two training conditions. Error bars represent
±1 standard error of the mean.

Alternatively, the two learning tasks may differ in terms of how effectively they allow the decision maker to extract relevant information about the structure of the environment (e.g., cue weights), while
not leading to differences in strategy use. To examine the impact of the type of learning task on strategy use, we fitted the various cue-based and exemplar-based models listed in Table 2 to each individual participant's responses in the paired-comparison task, the classification task, and the estimation
task, separately for those participants who had trained with learning by comparison and those who
had trained with direct criterion learning. As described above, each item in the different tasks (i.e.,
paired-comparison, classification, and estimation tasks) was presented several times. For binary responses (i.e., paired-comparison and classification tasks), the models predicted for each individual
participant the probability of a particular response at a given item. For continuous responses, the models predicted the average (across the repetitions) response for a given item.
Depending on the type of decision task and learning condition, between six and eight different models were fitted for each task (see Table 2). For each participant, the models were evaluated using the
Bayesian Information Criterion (BIC; Schwarz, 1978; see Appendix A for a detailed description of the

Table 4
Measures of fit for the best-fitting models in the computational modeling analysis.

Task                 Learning by comparison                    Direct criterion learning
                     Experiment 1  Experiment 2  Experiment 3  Experiment 1  Experiment 2  Experiment 3
r2
Paired comparison    0.939         0.955         0.717         0.904         0.733         0.765
Classification       0.973         0.931         0.725         0.977         0.928         0.731
Estimation           0.930         0.924         0.669         0.916         0.930         0.579
RMSD
Paired comparison    0.064         0.062         0.199         0.097         0.171         0.170
Classification       0.049         0.061         0.218         0.049         0.082         0.202
Estimation           0.054         0.067         0.147         0.063         0.057         0.137

Note: r2 = average (across participants) squared Pearson correlation between the predicted and the actual response proportions
(paired-comparison and classification tasks) or the predicted and the actual estimates (estimation task). RMSD = average (across
participants) root mean square deviation between the predicted and the actual response proportions (paired-comparison and
classification tasks) or between the predicted and the actual estimates (estimation task).

fitting procedure), and each participant was classified as a user of the model with the lowest BIC (cf.
Bergert & Nosofsky, 2007). Table 2 shows the number of participants who were classified to each of
the individual models (four participants for whom, in the classification task, both a cue-based and an
exemplar-based strategy showed the best fit are not considered in Table 2). As our primary focus was
on a comparison between cue-based and exemplar-based strategies and to simplify the presentation,
in Fig. 5 we collapsed all the best-fitting cue-based models and all the best-fitting exemplar-based models (we turn to a more detailed discussion of the various cue-based models in Section 5) for each decision
task, separately for the two training conditions. For the paired-comparison task, the type of learning task
did not seem to affect strategy selection in the test phase, with the large majority of participants classified to one of the cue-based models and only a minority to an exemplar model (learning by comparison: 85% vs. 15%; direct criterion learning: 95% vs. 5%). For the classification task, by contrast, there
were clear differences as a function of the training conditions. After training with direct criterion learning, 55% of the participants were best described by an exemplar model, but after training with learning
by comparison this was the case for only 10%. A similar picture emerged for the estimation task (with
separately fitted models), where 40% of participants were best described by an exemplar model after
training with direct criterion learning but only 3% after training with learning by comparison.
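The classification step amounts to computing each model's BIC from its maximized log-likelihood, its number of free parameters, and the number of observations, then picking the minimum. This is a generic illustration (not the Appendix A code), with hypothetical names and inputs:

```python
import math

def bic(log_likelihood, n_params, n_obs):
    """Bayesian Information Criterion: lower values indicate a better
    trade-off between goodness of fit and model complexity."""
    return -2.0 * log_likelihood + n_params * math.log(n_obs)

def classify_participant(models):
    """models: dict mapping model name -> (log_likelihood, n_params, n_obs).
    Returns the name of the model with the lowest BIC."""
    return min(models, key=lambda name: bic(*models[name]))
```

Because BIC's penalty grows with ln(n), a five-parameter exemplar model must fit noticeably better than a four-parameter cue-based model to be selected.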
Note that, in addition to the effect of the type of learning task, after training with direct criterion
learning strategy selection was also strongly affected by the type of decision task itself. The paired-
comparison task seemed to predominantly trigger reliance on cue-based processes, whereas the classification and estimation tasks seemed to trigger reliance on exemplar-based processes. Potentially,
the effect of the paired-comparison task to push people toward reliance on cue abstraction may have
overridden any additional impact of the type of learning task.
Table 4 shows how well the best-fitting models captured participants' responses, both in terms of
the models' average (across participants) root mean square deviation (RMSD) and the average squared
correlation between the predicted and observed response proportions. As can be seen, in both training
conditions the fit was good (with r2 ranging between .904 and .977) and not consistently lower after
training with direct criterion learning. This suggests that the lower performance in this condition was
not due to more error-prone strategy execution (or a less efficient extraction of cue–criterion relationships). In order to directly test the thesis that differences in accuracy were associated with differences
in strategy use, we calculated the percentage of correct responses in the classification and paired-comparison tasks conditional upon strategy classification. As expected, Fig. 6 shows that participants classified as following a cue-based model showed a slight trend for higher accuracy than those following
an exemplar-based model; nevertheless, in an ANOVA with strategy type (cue-based vs. exemplar-
based), type of decision task (classification vs. paired comparison; thus controlling for effects of task
difficulty), and training condition (learning by comparison vs. direct criterion learning) as independent
factors, the main effect for strategy type did not reach conventional significance levels, F(1, 72) = 1.37,
p = .26, ηp² = .038.

Fig. 5. Percentage of participants classified as following a cue-based model or an exemplar-based model based on their
responses in the paired-comparison, classification, and estimation tasks in the test phase, separately for the condition with
learning by comparison and direct criterion learning. "Tied" participants are those for whom a cue-based model and an
exemplar-based model showed the same best fit.

Table 5
Results of the estimation task.

Experiment  Training condition         RMSE  rold  RI (95% CI)           Proportion of extrapolation estimates
1           Learning by comparison     .124  .92   −.017 (−.106, .072)   .118
            Direct criterion learning  .158  .90   −.308 (−.397, −.218)  .043
2           Learning by comparison     .149  .88   .023 (−.046, .092)    .116
            Direct criterion learning  .126  .95   −.438 (−.529, −.346)  .059
3           Learning by comparison     .344  .62   .017 (−.079, .114)    .216
            Direct criterion learning  .311  .62   −.042 (−.177, .093)   .077

Note: RMSE = average (across participants) root mean square error of the estimates, rold = median correlation between estimates
and actual criterion values for old exemplars, RI = representation index, Proportion of extrapolation estimates = mean proportion of estimates that were outside the range of values shown in the training phase. In Experiment 3, RI was calculated only
based on the extrapolation index.

2.3. Summary and discussion

The results of Experiment 1 provide a first indication that the type of learning task during training
impacts decision performance at test. Surprisingly, and inconsistent with the thesis that mastering a
probabilistic environment requires the direct provision of metric information about the objects' criterion values, learning by comparison not only allowed people to perform well; it even seemed to give
an edge over direct criterion learning. Impressively, participants trained with learning by comparison
were able to map objects on a metric dimension with high accuracy although they had only obtained
ordinal information about the objects' criterion values. Computational modeling suggests that these
differences were (at least in part) due to the two types of learning tasks fostering the use of different
decision strategies, and not due to the fact that the learning tasks differ in how well they allow the
decision maker to abstract cue weights. Specifically, direct criterion learning was associated with a
somewhat greater reliance on exemplar processes than learning by comparison (and this effect was
most pronounced in the classification and estimation tasks). Opposing conclusions in previous studies
regarding the relative contribution of exemplar processing and cue abstraction (Juslin, Jones, et al.,
2003; Nosofsky & Bergert, 2007) are thus potentially due to differences in the type of learning task
employed. Reliance on exemplar processing impairs decision performance (as compared to cue-
based models) because exemplar processing is unable to extrapolate.

Fig. 6. Accuracy in the paired-comparison and classification tasks of the test phase, separately for participants classified as
using a cue-based or an exemplar-based model. Shown are the marginal estimated means of the percentage of correct
responses, collapsed across the paired-comparison and the classification tasks (controlling for effects of the type of decision
task). Error bars represent ±1 standard error of the mean.

It could be objected that the worse performance after training with direct criterion learning may
alternatively be due to a mismatch, in this condition, between the response format (i.e., a binary classification) and the format of the feedback (a continuous criterion value) in the training phase rather
than to the type of learning task itself. To test this possibility, in Experiment 2 participants training
with direct criterion learning were presented with an estimation task (during training) rather than
a classification task.

3. Experiment 2

3.1. Methods

3.1.1. Participants
Eighty students (average age 24.5 years, 53 female) from the University of Basel participated in the
experiment. Participants received either course credit or an hourly fee of 15 Swiss Francs as compen-
sation, and an additional bonus depending on their performance in the various decision tasks.

3.1.2. Material, design and procedure


The materials and design were as in Experiment 1, except for two modifications. First, in the training phase, participants training with direct criterion learning were presented with an estimation task
(rather than a classification task), and had to estimate the continuous toxicity level of each bug. (The
training phase with learning by comparison was not modified.) Blocking and presentation frequency
in the training phase with direct criterion learning were as in the classification task in Experiment 1.
To proceed to the test phase, participants training with direct criterion learning had to provide the correct toxicity level on 85% of trials in the last block of the training phase (and had to repeat the last
block otherwise). The second modification compared to Experiment 1 was that the classification task
in the test phase was replaced by an estimation task.7 Completion of the experiment took, on average,
around 70 min.

7
For comparability with the other conditions, the estimation task at the end of the experiment was retained in all conditions. As
a consequence, half of the participants thus performed two estimation tasks.
224 T. Pachur, H. Olsson / Cognitive Psychology 65 (2012) 207240

3.2. Results

3.2.1. Performance
In both training conditions, participants finished the training phase with a high level of accuracy.
Although the percentage of correct decisions differed somewhat between the conditions with learning
by comparison and direct criterion learning, the difference was rather small (Ms = 94.5% vs. 92.5%),
t(78) = 1.9, p = .07, d = .41. As in Experiment 1, participants training with direct criterion learning took
longer to reach the required accuracy level and, at the end of the training phase, had been presented
with a larger number of trials than participants training with learning by comparison (Ms = 331.5 vs.
180.4), t(39.4) = 4.90, p = .001. The average number of times each subspecies was presented, however,
did not differ between the two training conditions (learning by comparison: 36.9; direct criterion learn-
ing: 33.2), t(40.5) = 1.21, p = .23. Any potential advantage for participants who trained with learning by
comparison could thus not be due to a higher presentation frequency of the individual subspecies.
For comparability with Experiment 1, we determined the performance of participants whose first
task in the test phase was an estimation task by simulating classification responses (based on their
individual estimates): estimates higher than 0.5 were counted as "deadly" responses; estimates lower
than 0.5 were counted as "not deadly" responses. Estimates of 0.5 were counted as "deadly" or "not
deadly" responses with equal probability. Given that in Experiment 1 reliance on cue-based and exem-
plar-based strategies was rather similar for the classification and estimation tasks (see Fig. 5), we did
not expect these simulated classification responses to be strongly affected by response format ef-
fects (e.g., Slovic, 1995). Concerning cross-item generalization, Fig. 2 shows some differences to Exper-
iment 1. For instance, in the paired-comparison task performance was now rather similar in the two
training conditions, both for old items, t(38) = 1.4, p = .18, d = .43, and new items, t(38) = 1.04, p = .31,
d = .33. However, the difference to Experiment 1 is mainly due to different results in the condition
with learning by comparison, which was not modified across experiments (although run in different
labs). The reason for this inconsistency across the experiments is unclear, and we refrain from spec-
ulation. Nevertheless, note in the classification task that there was again a larger drop in performance
from old to new items after training with direct criterion learning, t(19) = 5.88, p = .001, d = 1.77, than
after training with learning by comparison, t(19) = 1.55, p = .14, d = 0.55. A repeated-measures ANOVA
showed an interaction between item type (i.e., old/new) and training condition for the classification
task, F(1, 38) = 4.66, p = .037, η²p = .11, but not for the paired-comparison task, F(1, 38) = 0.11, p = .74,
η²p = .003.
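The rule above for deriving simulated classification responses from continuous estimates can be sketched as follows (a minimal illustration; the function and variable names are ours, not the article's):

```python
import random

def simulate_classification(estimate, threshold=0.5, rng=random):
    """Convert a continuous toxicity estimate into a binary response,
    as described in the text: estimates above the threshold count as
    'deadly', estimates below as 'not deadly', and estimates exactly
    at the threshold are resolved at random with equal probability."""
    if estimate > threshold:
        return "deadly"
    if estimate < threshold:
        return "not deadly"
    return rng.choice(["deadly", "not deadly"])
```

Applying this rule to each trial's estimate yields the simulated classification responses analyzed below.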
The results concerning cross-task generalization are shown in Fig. 3. As can be seen, the (simulated)
classification performance in the condition with direct criterion learning was higher than in Experi-
ment 1 and now exceeded the performance of the condition with learning by comparison,
t(38) = 2.9, p = .006, d = .92. Nevertheless, cross-task generalization was again better after training
with learning by comparison: here the performance of the (simulated) classification responses did not dif-
fer from the performance in the paired-comparison task, t(38) = .99, p = .33, d = .31, whereas after
training with direct criterion learning performance in the paired-comparison task was lower than
the performance of the (simulated) classification responses, t(38) = 4.8, p = .001, d = 1.52. An ANOVA
showed a significant interaction between type of decision task and training condition, F(1, 38) = 6.62,
p = .012, η²p = .08.

3.2.2. Estimation task


Fig. 4 plots, separately for the two training conditions, the mean estimates for old and new subspe-
cies in the estimation task that all participants received at the end of the experiment. As can be seen,
the requirement of participants training with direct criterion learning to provide estimates during the
training phase indeed led to more accurate estimates for old subspecies than in Experiment 1. As can
be seen from Table 5, in contrast to Experiment 1 participants trained with direct criterion learning
provided more accurate estimates than participants trained with learning by comparison,
t(70.7) = 1.76, p = .08 and t(64.5) = 3.7, p = .001 for RMSE and r_old, respectively. Replicating Experi-
ment 1, however, there were again clear indications for extrapolation after learning by comparison,
but not after direct criterion learning: RI was strongly negative after training with direct criterion
learning, whereas it did not differ from zero after training with learning by comparison (Table 5). Fur-
ther, there was a higher proportion of estimates outside the training range in the condition with learn-
ing by comparison, t(78) = 4.22, p = .001, d = .95. These results again support the hypothesis that direct
criterion learning fosters reliance on exemplar processes and that learning by comparison fosters reli-
ance on cue abstraction.

3.2.3. Computational modeling


For comparability with the analysis of decision performance and Experiment 1, for the participants
whose first task in the test phase was an estimation task, we fitted the models to simulated classifi-
cation responses. As can be seen from Fig. 5, the computational modeling replicated, by and large, the
patterns observed in Experiment 1 (average RMSD and r² of the best-fitting models are reported in
Table 4; see again Section 5 for a more detailed discussion of the various cue-based models). Specif-
ically, in the paired-comparison task there was only a relatively weak effect of the type of learning task
on strategy selection, with 25% and 10% of the participants best described by an exemplar model in the
conditions with direct criterion learning and learning by comparison, respectively. Both in the classi-
fication task and in the estimation task, however, the type of learning task had a huge impact: consid-
erably more participants were best described by an exemplar model after training with direct criterion
learning than after training with learning by comparison (classification: 80% vs. 10%; estimation: 65%
vs. 10%). In addition, as in Experiment 1, strategy selection was strongly affected by the type of deci-
sion task. The paired-comparison task seemed to predominantly trigger reliance on cue-based pro-
cesses, whereas the classification and estimation tasks seemed to trigger reliance on exemplar
processes. This effect was particularly pronounced after training with direct criterion learning. Sup-
porting the idea that differences in strategy use are accompanied by differences in decision accuracy, Fig. 6
shows that participants best described by a cue-based model achieved a higher accuracy than those
best described by an exemplar-based model, F(1, 72) = 7.41, p = .008, η²p = .093.
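Schematically, assigning each participant to a best-fitting model amounts to comparing model fits. The sketch below uses RMSD on hypothetical data (the article's procedure additionally involves estimating free parameters for each model before this comparison):

```python
from math import sqrt

def rmsd(predicted, observed):
    """Root mean squared deviation between model predictions and responses."""
    n = len(predicted)
    return sqrt(sum((p - o) ** 2 for p, o in zip(predicted, observed)) / n)

def classify_participant(responses, model_predictions):
    """Assign a participant to the model with the lowest RMSD.
    model_predictions maps model name -> list of predicted responses."""
    return min(model_predictions, key=lambda m: rmsd(model_predictions[m], responses))

# Hypothetical data: the exemplar model tracks this participant best.
responses = [0.9, 0.1, 0.8, 0.2]
preds = {
    "cue-based": [0.6, 0.4, 0.6, 0.4],
    "exemplar": [0.85, 0.15, 0.75, 0.25],
}
```

Here `classify_participant(responses, preds)` assigns the participant to the exemplar model, since its predictions deviate least from the responses.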

3.3. Summary and discussion

The results of Experiment 2 suggest that even when the training phase with direct criterion learn-
ing involved an estimation task, participants can only approximate, but not consistently exceed, the
performance of participants who trained with learning by comparison. Specifically, after training with
direct criterion learning people still seem to suffer more when tested with a decision task that differs
from the one with which they were trained (Fig. 3). Moreover, computational modeling again shows a
clear effect of the type of learning task on strategy selection (which was most pronounced in classifi-
cation and estimation). This suggests that at least some of the remaining differences in performance
are due to a greater reliance on exemplar processing after training with direct criterion learning. (Re-
call that exemplar processing restricts performance in the linear environment used in Experiments 1
and 2 because it does not allow for extrapolation.)
If the worse performance after training with direct criterion learning is due to a greater reliance on
exemplar processing in this condition, it should be possible to reverse the direction of the effect of
learning task on subsequent performance by changing the statistical structure of the task from a linear
to a nonlinear environment (where exemplar processes can excel). This hypothesis is tested in Exper-
iment 3, thus examining a possible boundary condition of the beneficial effect of learning by
comparison.

4. Experiment 3

In Experiment 3, the toxicity of the subspecies followed a similar nonlinear quadratic function as
the one used by Olsson et al. (2006). Juslin et al. (2008) proposed that the mind is constrained to inte-
grate information in an additive linear fashion. Thus constrained, cue abstraction is unable to capture a
nonlinear task structure very well, whereas exemplar-based processes can capture any structure (i.e.,
exemplar models are nonparametric in the sense that they do not pose any restriction on the decision
bound between categories; as a consequence, they predict that given enough experience, participants
will eventually respond optimally even for highly complex category structures). The key question in
Experiment 3 was: does the direction of the learning task's impact on performance reverse in a
(quadratic) nonlinear environment? If people continue to show a tendency for greater reliance on cue-
based mechanisms when trained with learning by comparison and a greater reliance on exemplar pro-
cessing when trained with direct criterion learning, it should now be the latter who learn faster in the
training phase and show a better performance in the test phase, a pattern opposite to that observed
for the linear environment in Experiments 1 and 2.8

4.1. Method

4.1.1. Participants
Eighty students (average age 22.6 years, 55 female) from the University of Basel participated in the
experiment. The payment scheme was identical to the one used in Experiment 2.

4.1.2. Material
Sixteen different subspecies were used (see Table 1). Following Olsson et al. (2006), the toxicity le-
vel of each subspecies i was a nonlinear, quadratic function of the linear criterion, c_i, used in Experi-
ments 1 and 2 (see Fig. 7):9
c_i^{NL} = 0.9144 \cdot \left( -44.3 \cdot \frac{2.006 \cdot c_i^2}{5} + 1157.09 \cdot \frac{c_i}{50} - \frac{1}{10} \right)    (2)
Table 1 shows the resulting continuous (c_NL) and binary (b_NL) criterion values. Subspecies with toxicity
levels c_NL > .85 were "deadly", and subspecies with lower toxicity levels were "not deadly". Eleven of the
16 subspecies were presented during the training phase (indicated by O in Table 1), and five new sub-
species were introduced in the test phase (indicated by N). For the paired-comparison task in the test
phase, we selected those 18 pairs of the 120 (= 16 × 15/2) possible comparisons of all 16 subspecies
that allowed a discrimination between CAM and EBM, that is, where the two models (with CAM fitted
to the criterion values in the training phase and EBM having w = 0.25 for all cues) made different pre-
dictions and where the difference between the predicted criterion values for the subspecies in the pair
was at least 0.05.
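The pair-selection criterion can be sketched as follows. All names and the prediction vectors are our illustrative assumptions; in particular, we read the 0.05 threshold as applying to the predicted values of both models, which is our interpretation rather than the article's stated rule:

```python
from itertools import combinations

def discriminating_pairs(cam_pred, ebm_pred, min_diff=0.05):
    """Return index pairs (i, j) on which CAM and EBM disagree about
    which subspecies has the higher criterion, and for which the
    predicted criterion values within the pair differ by at least
    min_diff under both models."""
    pairs = []
    for i, j in combinations(range(len(cam_pred)), 2):
        cam_diff = cam_pred[i] - cam_pred[j]
        ebm_diff = ebm_pred[i] - ebm_pred[j]
        disagree = cam_diff * ebm_diff < 0  # opposite predicted winners
        large_enough = abs(cam_diff) >= min_diff and abs(ebm_diff) >= min_diff
        if disagree and large_enough:
            pairs.append((i, j))
    return pairs

# Hypothetical predicted criterion values for four subspecies:
cam = [0.2, 0.4, 0.6, 0.8]
ebm = [0.3, 0.2, 0.7, 0.6]
```

With these hypothetical predictions, `discriminating_pairs(cam, ebm)` keeps only the pairs on which the two models predict opposite winners by a sufficient margin.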

4.1.3. Design and procedure


With the exception of the structure of the environment, the design was identical to that in Exper-
iment 1 (see Fig. 1). In the conditions with direct criterion learning, the basic training phase consisted
of the 11 subspecies each being presented 18 times, resulting in 198 trials. The 198 trials were pre-
sented as three blocks (with 66 trials each), after each of which participants could take a short break.
For the conditions with learning by comparison, two of the 55 possible pair comparisons of the 11 sub-
species were excluded because the 2 subspecies had identical toxicity levels. The basic training phase
consisted of each of the remaining 53 pairs being presented four times, resulting in 212 trials. The 212
trials were divided into four blocks (each consisting of a cycle of the 53 pair comparisons). In both
training conditions, participants had to reach a certain accuracy level before they could proceed to
the test phase. Because quadratic functions are considerably more difficult to learn than linear ones
(e.g., Mellers, 1980; Olsson et al., 2006), the required accuracy for completing the training phase
was set to 80% (compared to 85% in Experiments 1 and 2). Again, care was taken to motivate partic-
ipants and to encourage them to persevere if they required several additional training blocks to reach
the necessary accuracy level.

8
In Olsson, Enkvist, and Juslin (2006), people showed no adaptive shift to exemplar processing in a quadratic nonlinear task
(unless explicitly instructed to do so) and instead seemed to stubbornly stick to cue-based processing. However, performance in
these experiments was very poor and responses were rather noisy (as indicated by poor model fits), possibly due to low
motivation. Whereas Olsson et al.'s participants were not paid according to performance, in our experiments participants received
performance-contingent payment and we therefore expected better performance and more systematic responses than in Olsson
et al.
9
Compared to Olsson et al. (2006), the function was adjusted to reduce the number of subspecies with the same criterion values
(which would have required a departure from a binary response format in the paired-comparison task).

In the test phase, participants were first presented with either a paired-comparison task or a clas-
sification task (without feedback). In the paired-comparison task, 11 of the 18 pairs included a new
and an old subspecies and were presented eight times; seven pairs included two old subspecies and
were presented four times, yielding a total of 116 trials. In the classification task, participants were asked
to classify each of the 16 subspecies as either "deadly" or "not deadly". Each of the 11 old subspecies
was presented four times, and each of the five new subspecies was presented eight times, yielding
a total of 84 trials. In the final estimation task, participants were presented twice with all 16 subspe-
cies (in random order) with the instruction to estimate the toxicity level of each. Completion of the
experiment took, on average, around 120 min.

4.2. Results

4.2.1. Performance
In both training conditions, participants finished the training phase with a high level of accuracy.
The percentage of correct decisions differed between the two groups, but in contrast to Experiments 1
and 2, it was now the participants in the condition with direct criterion learning who performed better
in the last training block (Ms = 90.3% vs. 83.4%), t(78) = 5.7, p = .001, d = 1.27. In addition, participants
training with direct criterion learning reached the required accuracy level faster: when finishing the
training phase, they had been presented with fewer trials (Ms = 282.2 vs. 360.4), t(78) = 2.54,
p = .013, and fewer repetitions of each subspecies (Ms = 25.7 vs. 66.8), t(51.5) = 8.7, p = .001, than
participants training with learning by comparison.
How was cross-item generalization in the test phase affected by the type of learning task in the
nonlinear environment? In the paired-comparison task, performance for old items did not differ be-
tween the training conditions, t(38) = .60, p = .55, d = .19. Fig. 2 shows that performance on new items
was lower than performance on old items for participants who had trained with learning by compar-
ison, t(19) = 2.90, p = .009, d = .85, but not for those who had trained with direct criterion learning,
t(19) = 0.33, p = .75, d = .07. In the classification task, performance on new items was lower than on
old items in both training conditions (direct criterion learning: t(19) = 11.70, p = .001, d = 3.48;
learning by comparison: t(19) = 4.71, p = .001, d = 1.51). A repeated-measures ANOVA showed an
interaction between item type (i.e., old/new) and training condition both for the classification task,
F(1, 38) = 5.61, p = .02, η²p = .13, and for the paired-comparison task, F(1, 38) = 4.81, p = .03, η²p = .11.
Concerning cross-task generalization, Fig. 3 shows that in contrast to Experiment 1 (which is most
similar to Experiment 3, except for the structure of the environment), performance was now better
after training with direct criterion learning than after training with learning by comparison. Perfor-
mance in the paired-comparison and the classification tasks differed after direct criterion learning,
t(27.1) = 3.42, p = .002, d = 1.11, but not after learning by comparison, t(38) = .55, p = .60, d = 0.17.
Nevertheless, overall performance was considerably better after direct criterion learning than after
learning by comparison in the classification task, t(38) = 5.63, p = .001, d = 1.83, and, to some degree,
also in the paired-comparison task, t(38) = 1.09, p = .28, d = 0.36. An ANOVA showed a significant inter-
action between type of decision task and training condition, F(1, 38) = 3.90, p = .05, η²p = .049.

4.2.2. Estimation task


Fig. 4 shows the mean estimates for old and new subspecies, separately for the two training con-
ditions. As Table 5 shows, participants trained with learning by comparison provided less accurate
estimates than those trained with direct criterion learning in terms of RMSE, t(78) = 2.1, p = .036,
but not in terms of r_old, t(78) = 0.01, p = .99. Because in nonlinear environments only the amount of
extrapolation, but not the size of old-new differences, is diagnostic for distinguishing between (linear)
cue abstraction and exemplar-based processes (Karlsson et al., 2007), RI was calculated based on the
extrapolation index only (see Appendix B). As can be seen in Table 5, although RI was negative (sug-
gesting exemplar processing) in the condition with direct criterion learning only, it did not differ sig-
nificantly from zero in either condition. The proportion of extrapolation estimates was higher after
training with learning by comparison than after training with direct criterion learning, t(65.4) = 4.7,
p = .001, d = 1.06.

4.2.3. Computational modeling


As reported above, performance after learning by comparison was now somewhat lower than after
direct criterion learning. This would be expected if participants trained with learning by comparison
relied more on cue-based processes than participants trained with direct criterion learning (as in
Experiments 1 and 2). Is there again evidence for such an effect of the type of learning task on strategy
selection in the test phase? As the results of the computational modeling of individual participants'
responses in the test phase show (see Table 2 and Fig. 5; average RMSD and r² of the best-fitting mod-
els are reported in Table 4), the answer is no (see again Section 5 for a more detailed discussion of the
various cue-based models). Instead, participants seemed to rely mainly on exemplar-based processes,
both after training with learning by comparison and after training with direct criterion learning
(paired comparison: 55% vs. 40%; classification: 85% vs. 95%; estimation: 72% vs. 75%). This result sug-
gests that in a nonlinear environment, people generally and adaptively switch to exemplar processing
and that this effect of the structure of the environment overrides an effect of the type of learning task
on strategy selection. The worse performance after training with learning by comparison might nev-
ertheless indicate that learning by comparison has a hindering effect on people's ability to apply
exemplar processing. As expected, Fig. 6 shows that in contrast to Experiments 1 and 2, accuracy
was now higher for participants classified as following an exemplar-based model than for those clas-
sified as following a cue-based model, F(1, 72) = 3.94, p = .051, η²p = .052.
The finding that in the classification and estimation tasks after learning by comparison a large pro-
portion of participants were best described by our new exemplar model supports the thesis that in the
absence of metric criterion information, people store dominance proportions for each exemplar as
proxies for criterion values. Finally, as in Experiments 1 and 2, there were indications for an effect
of the type of decision task on strategy selection, with greater use of cue-based mechanisms in the
paired-comparison task than in the classification and estimation tasks.10

4.3. Summary and discussion

Most importantly, compared to the linear environment used in Experiments 1 and 2, we observed a
reversal of the effect of learning task on decision performance when using a nonlinear environment in
Experiment 3. Specifically, it was now the participants in the condition with direct criterion learning
who showed faster learning in the training phase and better performance in the test phase. One pos-
sible explanation for the slower learning in the training phase with learning by comparison could be
that in this condition participants first stubbornly tried to apply linear cue abstraction, consistent
with the results in Experiments 1 and 2. As, however, a strategy with linear cue abstraction allowed
a maximum performance of only about 70% correct decisions, the participants in this condition also
finally switched to exemplar processing. Although an effect of type of learning task on strategy selec-
tion was not visible in the test phase (and participants in both training conditions seemed to rely
mainly and adaptively on exemplar processing; Fig. 5), participants who trained with learning by com-
parison nevertheless performed worse. This lower performance may thus be mainly due to a more er-
ror-prone execution of exemplar processing in this condition.
Taken together, the results of Experiment 3 highlight a boundary condition of the beneficial effect
of learning by comparison and provide some evidence that this training condition fosters reliance on
exemplar-based processes less than direct criterion learning also in a nonlinear environment. More-
over, the ability of exemplar models to capture the continuous responses of participants who had

10
Although sigma theory predicts an inability of the mind to implement nonlinear cue abstraction, for the purpose of comparison
we additionally tested how well a nonlinear CAM would capture participants' responses (cf. Juslin et al., 2008; Olsson et al., 2006).
An analysis of nonlinear CAM's flexibility showed that based on BIC it tended to overfit. Specifically, in contrast to the other models
tested in this article, it was able to fit a randomly generated data set. This was not the case, however, when projective fit (e.g.,
Juslin, Jones, et al., 2003) was used to evaluate the models. Therefore, the competition between the nonlinear CAM and the other models was
based on projective fit. It turned out that although a minority of participants was indeed best described by a nonlinear CAM, the
general conclusions remained the same (i.e., with the majority of participants best described by an exemplar model and only small
differences in strategy use between the training conditions).

[Figure 7: nonlinear criterion (y-axis, 0 to 0.8) plotted against linear criterion (x-axis, 0 to 1); old and new subspecies are marked.]

Fig. 7. The nonlinear quadratic function determining the criterion values of the subspecies used in Experiment 3. Shown are
both the old subspecies (i.e., those presented during the training phase) and new subspecies (i.e., those introduced only in the
test phase).

trained with learning by comparison supports our new implementation of the generalized context
model, which uses dominance rates as proxies for criterion values.
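A minimal sketch of this idea follows, assuming a generic GCM-style exponential similarity over binary cue profiles. The similarity function, its parameter h, and all data below are our illustrative assumptions, not the article's exact specification:

```python
from math import exp

def similarity(probe, exemplar, h=1.0):
    """Exponential similarity over the number of mismatching binary cues
    (a generic GCM-style choice; the article's exact form may differ)."""
    mismatches = sum(p != e for p, e in zip(probe, exemplar))
    return exp(-h * mismatches)

def exemplar_estimate(probe, exemplars, dominance_rates, h=1.0):
    """Similarity-weighted average of stored dominance rates, used as
    proxies for criterion values when no metric feedback was available."""
    sims = [similarity(probe, ex, h) for ex in exemplars]
    return sum(s * d for s, d in zip(sims, dominance_rates)) / sum(sims)

# Hypothetical training exemplars (binary cue profiles) and the
# proportion of paired comparisons each exemplar won during training:
exemplars = [(1, 1, 0), (0, 1, 1), (0, 0, 0)]
dominance = [0.9, 0.6, 0.1]
```

A probe matching the first exemplar receives an estimate pulled toward its dominance rate of 0.9; the design choice here is that the model never needs metric criterion feedback, only win proportions from the comparison trials.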

5. General discussion

We highlighted a distinction between two types of learning tasks, learning by comparison and di-
rect criterion learning, that have frequently and interchangeably been used in previous studies. Not-
ing striking inconsistencies in findings concerning the contribution of exemplar processing in
multiple-cue judgment (Juslin, Jones, et al., 2003; Nosofsky & Bergert, 2007), we argued that the selec-
tion of cue-based and exemplar-based mechanisms might be critically influenced by the type of learn-
ing task used during training, thus pointing to a possible further boundary condition for exemplar
processing in decision making. We highlighted that the existing findings on effects of learning condi-
tions in decision making and classification research give rise to opposing hypotheses concerning deci-
sion performance: learning by comparison could lead to either worse or better decisions as compared
to direct criterion learning. In addition, we pointed out that the direction of the effect might depend on
the structure of the environment.
In three experiments, it was demonstrated that the type of learning task provided during training
can have a substantial impact on decision performance and strategy selection in a subsequent test
phase. Specifically, although training with learning by comparison provides, in principle, less informa-
tion than direct criterion learning, Experiments 1 and 2 showed that this learning task can convey sev-
eral advantages to the decision maker in a linear environment: with training via learning by
comparison, performance improved faster, and people performed better when they had to generalize
knowledge about the environment to new items as well as when switching to a different type of deci-
sion task. In addition, after training via learning by comparison participants were able to translate
their knowledge about the environment into highly accurate continuous estimates. Tests of qualitative
predictions of cue abstraction and exemplar processing (based on the RI index) as well as computa-
tional modeling suggest that the effect of learning task is due (at least in part) to an effect on strategy
selection. Learning by comparison seems to foster the recruitment of cue-based processes, whereas
direct criterion learning seems to foster the recruitment of exemplar-based processes; this effect
was very pronounced in classification and estimation but did not occur in paired comparison.

Consistent with the thesis that mechanisms based on cue abstraction are constrained to linear, addi-
tive integration, Experiment 3 showed that in a nonlinear environment, learning by comparison leads
to worse performance and generalization than direct criterion learning. In addition, the large majority
of participants relied, adaptively in this environment, on exemplar processing. Overall, we thus iden-
tified two boundary conditions where the type of learning task did not affect strategy selection: the
paired-comparison task and environments with a nonlinear (quadratic) structure. In the following,
we discuss the implications of our findings.

5.1. The role of exemplar processing and cue abstraction in decision making

Our results have important implications for research on the relative contribution of cue-based and
exemplar-based processes in decision making. Specifically, researchers' decision to use, in the training
phase, learning by comparison or direct criterion learning can lead to drastically different conclusions.
For illustration, consider the results of Experiment 2 depicted in Fig. 5. Whereas using direct criterion
learning would lead to the conclusion that classification decisions are mainly based on exemplar pro-
cesses (i.e., 80% of participants assigned to an exemplar model), with cue-based mechanisms playing
only a minor role (20%), one would come to the exact opposite conclusion using learning by compar-
ison (10% vs. 90%).
Whereas previous research has yielded a rather inconsistent picture as to whether cue-based or
exemplar-based mechanisms are predominant, our results demonstrate that both are key tools in
decision making. However, people's reliance on them seems to be highly sensitive to various charac-
teristics of the task environment. The crucial goal is thus to identify these characteristics. Previous
studies have mainly focused on the influence of the statistical structure of the environment and as-
pects of the decision task (for an overview, see Karlsson et al., 2008). Effects of the decision task were
clearly evident in our experiments as well: participants relied on exemplar processes much more in
classification and estimation than in paired comparison, and in the latter the influence of the type
of learning task on strategy selection was rather weak (Fig. 5). Our results underline that, in addition
to environmental and task factors, the way in which information about the environment is ac-
quired also has an important influence on strategy selection (for related results in experience-based risky
choice see Hau, Pleskac, Kiefer, & Hertwig, 2008; Hills & Hertwig, 2010).

5.2. Why do learning by comparison and direct criterion learning lead to differences in decision making?

Our results show that people's decision making can be critically affected by whether they ac-
quired knowledge about the environment via learning by comparison or via direct criterion learning.
What are possible explanations for this effect of the type of learning task? Our finding that when
training in a linear environment, people's performance improved faster with learning by comparison
is related to findings by Klayman (1988b). He observed in a cue discovery experiment that partici-
pants who were allowed to compare cue values of several objects were subsequently more accurate
in mastering the task than participants who could not directly compare the objects. To account for
this finding, Klayman (1988a) proposed that people spontaneously engage in "comparative hypoth-
esizing" and approach learning in a multiple-cue judgment task by learning how changes in cues are
associated with changes in the criterion (rather than learning associations between cues and the cri-
terion per se). Learning by comparison may match such comparative hypothesizing more readily
than direct criterion learning, which requires the decision maker to construct a reference for the ob-
served changes internally. A beneficial effect of comparison was also reported by Oakes and Ribar
(2005), who found that 4-month-old infants were able to discriminate between two categories
(e.g., cats and dogs) if, in a training phase, they were presented with pairs of objects that allowed
comparing members within one category. When the infants were presented with individual objects,
however, they were not able to discriminate between the two categories. According to the authors,
the advantage of learning with pairs of objects was due to a lower memory demand, which should
facilitate the detection of similarities between objects and thus lead to a richer encoding of the
relevant features.

A beneficial effect of comparison can also be related to Gentner's (1983; see also Gentner & Mark-
man, 1997) structure-mapping theory, according to which extracting the relational structure in a task
is facilitated to the extent that features of objects can easily be aligned. This idea has been investigated
in the context of spatial mapping (e.g., finding the corresponding locations in two rooms; Loewenstein
& Gentner, 2001), part learning (e.g., mapping a novel part of an object within the whole object), and
word learning (Gentner & Namy, 2004). For example, in Gentner, Loewenstein, and Hung (2007) chil-
dren were trained in mapping a novel part (e.g., a "blick") on a whole object (e.g., a seal). As alignment
between two highly similar objects is assumed to be easier than alignment between two dissimilar
objects, similarity should promote good performance in this task. As predicted, children who were
trained on highly similar objects performed better on a later generalization test than children who
were trained on dissimilar objects. Potentially, learning by comparison makes alignment easier than
direct criterion learning.
Importantly, however, although these approaches might be used to explain the beneficial effect of learning by comparison that we observed in the training phase of Experiments 1 and 2, they cannot directly account for the observed differences in decision making in the test phase. First, due to the accuracy criterion that had to be achieved in the training phase, we ensured that participants in both training conditions were equally well able to master the task. Second, we have no indication that participants trained with direct criterion learning had abstracted cue weights to a lesser extent than participants trained with learning by comparison. Specifically, in both training conditions the majority of participants were classified as following a cue-based strategy (which relies on abstracted cue weights) when presented with a paired-comparison task. So how then to account for the differences in strategy selection, which, as we argued, contributed to the differences in generalization ability after learning by comparison and direct criterion learning?
Importantly, as in the training phase cue-based and exemplar-based mechanisms allowed, in principle, equally good performance, differences in strategy selection cannot be due to differences in reinforcement (cf. Rieskamp & Otto, 2006). A more likely, but necessarily post hoc, explanation could be that the comparison processes facilitated in learning by comparison led to a greater focus on the individual cues and thus might have primed participants to subsequently rely more on a cue-based strategy (cf. Juslin, Jones, et al., 2003). To what extent, however, learning by comparison indeed triggers a stronger focus on individual cues than direct criterion learning is an interesting issue for future research (we discuss possible ways to study the learning processes in Section 5.4).
5.3. The impact of the type of decision task on strategy selection
Though not the focus of this article, our results also show differences in the selection of cue-based and exemplar-based mechanisms between different types of decision tasks. As can be seen in Fig. 5, people's reliance on cue-based strategies was more pronounced in the paired-comparison task than in the classification and estimation tasks, in particular after direct criterion learning. How to explain the greater reliance on cue-based strategies in a task with two objects than in a single-object task? One possible contributing factor might be cognitive effort. As in a paired-comparison task an exemplar-based mechanism would require the retrieval of two exemplars, people might prefer an arguably less effortful cue-based strategy, which only requires the retrieval of a rule. Given the current lack of studies systematically investigating task effects in decision making, however, these considerations must remain speculative.
5.4. Cue abstraction and exemplar processing after learning by comparison
How do people acquire knowledge about the structure of the environment when training with learning by comparison? Existing process accounts of how people abstract cue–criterion relationships assume that people relate changes in cue values to changes in criterion values (e.g., DeLosh et al., 1997; Juslin et al., 2008). With direct criterion learning this works in a rather straightforward fashion, but with learning by comparison no criterion values are provided. Nevertheless, people seem to be able to accurately derive the structure of a linear environment when learning by comparison. In this article, we assumed that people abstract metric information about objects' criterion
values from comparisons by keeping a tally of the relative frequency with which one object has a higher criterion value than another. Moreover, we proposed an exemplar model that is based on such dominance information (and which captured participants' responses in Experiment 3 rather well). In both the linear and nonlinear environments used in the experiments, the objects' dominance rates were highly correlated with their criterion values (linear environment: r = .998; nonlinear environment: r = .950) and therefore provided a helpful basis to learn about the metric structure of the environment. The hypothesis that people derive metric information about the environment from accumulated frequencies of ordinal comparisons is shared by other current models of decision making, such as decision-by-sampling theory (Stewart et al., 2006), and supported by the vast literature showing that people are highly efficient in encoding, storing, and judging frequency information in the environment (e.g., Hertwig, Pachur, & Kurzenhäuser, 2005; Zacks & Hasher, 2002). One further way to test this hypothesis would be to manipulate, by selective repetitions of individual comparisons during the training phase, dominance rates such that they deviate systematically from the criterion values.
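The dominance-rate tally described above can be sketched in a few lines of code (a minimal illustration; the function name and the toy comparison data are ours, not from the experiments):

```python
from collections import defaultdict

def dominance_rates(comparisons):
    """Relative frequency with which each object had the higher criterion
    value across all paired comparisons in which it appeared.
    `comparisons` is a list of (winner, loser) pairs."""
    wins = defaultdict(int)
    appearances = defaultdict(int)
    for winner, loser in comparisons:
        wins[winner] += 1
        appearances[winner] += 1
        appearances[loser] += 1
    return {obj: wins[obj] / appearances[obj] for obj in appearances}

# Toy training phase: A beats B twice, B beats C once.
# A wins all of its comparisons, B one of three, C none.
rates = dominance_rates([("A", "B"), ("A", "B"), ("B", "C")])
```

Selectively repeating individual comparisons, as suggested above, would shift these rates away from the true criterion ordering.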
Another important issue that is, however, beyond the scope of this article concerns differences in the learning processes (during the training phase) between learning by comparison and direct criterion learning. For instance, take the finding that participants in the condition with learning by comparison improved their performance during training more quickly than those in the condition with direct criterion learning. Given that the training phase could, in principle, be equally well mastered with a cue-based and an exemplar-based strategy, the differences in learning speed cannot be due to differences in strategy use. Rather, as we discussed in Section 5.2, the two learning tasks might, in addition to nudging participants to subsequently rely on different strategies, also differ in the degree to which they allow detecting statistical regularities in the environment. The processes during learning could be examined, for instance, by tracking people's eye movements during the training phase (cf. Clement, Harris, Burns, & Weatherholt, 2010; Kruschke, Kappenman, & Hetrick, 2005). Do people in the learning-by-comparison condition primarily examine the cue patterns in a cue-wise fashion (i.e., by mainly moving attention between the bugs) rather than processing the bugs in an alternative-wise fashion (i.e., by mainly moving within the bugs)?
5.5. Compensatory vs. noncompensatory cue-based decision making
Our main focus in this article was on comparing cue-based and exemplar-based mechanisms. However, cue-based mechanisms can be further distinguished into compensatory ones (such as CAM and WADD) and noncompensatory ones (such as gTTB). Which strategy do people use for cue-based inference? Consistent with previous analyses by Bröder (2000), Newell and Shanks (2003), and Glöckner and Betsch (2008), we found that in inferences from givens, where all cues are readily provided on the screen and information costs are thus rather low, the majority of participants relied on compensatory mechanisms (see Table 6). Our results diverge, however, from those of Bergert and Nosofsky (2007), who found in inferences from givens that 85% of the participants were best predicted by the noncompensatory gTTB. How to explain these discrepancies?
Table 6
Percentage of participants classified as following the three cue-based mechanisms in the paired-comparison task in Experiments 1–3.

Experiment   Training condition           gTTB%   WADD%   CAM%
1            Learning by comparison         0      65      20
             Direct criterion learning     10      55      30
2            Learning by comparison         5      80       5
             Direct criterion learning     15      20      40
3            Learning by comparison         5      10      25
             Direct criterion learning      5      20      35

Note: gTTB = generalized take-the-best; WADD = weighted additive strategy; CAM = cue abstraction model.
One possibility might be differences in cue dispersion (i.e., the distribution of weights across the cues), which has been shown to affect strategy selection. For instance, Mata, Schooler, and Rieskamp (2007) showed that people are more likely to follow the take-the-best heuristic when the cue dispersion is skewed than when the distribution is rather equal across the cues. However, both in Bergert and Nosofsky (2007; see their Table 2) and in our studies (Experiments 1 and 2: 0.98, 0.90, 0.62, and 0.52) the cue validities decreased relatively evenly from the most to the least valid cue, so this factor can be ruled out.
A more likely explanation is a combination of the number of cues and how positive and negative values are coded. Some studies (Bröder, 2000; Glöckner & Betsch, 2008; Newell & Shanks, 2003) used a consistent cue coding, where for all cues positive and negative values were indicated as "+" and "−", respectively; in Bergert and Nosofsky's (2007) and our experiments, by contrast, the cue coding was varied, that is, the way positive and negative cue values were indicated differed across the cues (e.g., green vs. brown back, large vs. small glands). Pachur and Hass (2012) found that people relied on compensatory strategies even when the number of cues was high, as long as the cue coding was consistent, or when the number of cues, with varied cue coding, was relatively small (i.e., 4 cues). People switched to noncompensatory strategies, however, when the number of cues was high and the cue coding was varied. In other words, the strong reliance on take-the-best found by Bergert and Nosofsky might be the result of the combination of a relatively large number of cues (6 cues) and varied cue coding.
5.6. Learning by comparison and direct criterion learning in the wild
Our results have important implications for understanding and improving decision making in the real world. First, domains are likely to differ in terms of how people learn about the statistical structure of the environment. For instance, when predicting outcomes in competitive contests (e.g., sports, political elections, job selection), the main focus is often only on which of two contestants (or candidates) is superior, with the margin being irrelevant. As a consequence, learning in such domains is likely to occur mainly by comparison, and our results suggest that one should therefore expect mainly cue-based decision making. In other domains, by contrast, the decision maker is more likely to focus on the continuous criterion value of each individual object and thus acquire knowledge by direct criterion learning. In investment decisions, for instance, the absolute difference between stocks is highly relevant, as even a slightly higher profitability can translate into enormously higher gains. According to our results, one should expect exemplar-based decision making to play an important role in these domains.
Second, real-world domains differ in terms of their statistical structure, and our findings might be used to derive recommendations as to how methods for training decision makers should be matched to a domain's structure. In domains with an underlying linear structure, procedures highlighting comparisons between objects should be employed; in domains with a nonlinear structure, procedures highlighting the criterion value of individual objects seem more appropriate.
6. Conclusion
Thurstone's (1927) law of comparative judgment, which asks people for successive ordinal comparisons of paired objects, was proposed as the experimenter's royal road for revealing people's internal representation of the world. Our results highlight that comparing objects might convey considerable benefits for people as well. Specifically, ordinal comparisons might represent not only an effective way for experimenters to learn about the structure of the mind, but also an effective way for the mind to learn about the structure of the world.
Acknowledgments
We thank Stephan Lewandowsky, Ralph Hertwig, Thomas Hills, Bryan Bergert, and Linnea Karlsson
for comments on a previous draft of this article and Laura Wiles for editing the manuscript.
Appendix A
A.1. Detailed description of the cue-based and exemplar-based models
Due to the different response formats in the decision tasks and the different feedback formats in the learning tasks, and because different versions of cue-based and exemplar-based models have been proposed in the literature, we tested several types and implementations of the models. We thereby wanted to minimize the risk that our conclusions regarding the relative contribution of cue-based or exemplar-based decision making would depend on the specific implementations of the models. Table 2 summarizes which models were tested for the different decision tasks (i.e., paired comparison, classification, and estimation), separately for the conditions with direct criterion learning and learning by comparison, as well as the number of free parameters in each model. Note that we fitted the models to individual participants' responses. Next we describe the tested models in more detail.
A.2. Cue-based models

A.2.1. Cue abstraction model (CAM)
For the estimation task, the cue abstraction model assumes that a continuous judgment y for an object is the sum of the weighted cue values c_1 ... c_J plus an intercept k (Juslin, Olsson, et al., 2003):

y = k + \sum_{j=1}^{J} w_j \cdot c_j,    (A1)

where the intercept k and the weights w are free parameters. If k = .1, w_1 = .4, w_2 = .3, w_3 = .2, and w_4 = .1, Eq. (A1) is identical to the function determining the continuous criterion in Experiments 1 and 2 and the model produces perfect estimates. We implemented two versions of the cue abstraction model, CAMk and CAM. In CAMk, the intercept k was a free parameter. In CAM, k was set equal to 0.5(1 - \sum w_j).
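As a concrete illustration, Eq. (A1) amounts to a single weighted sum (a sketch; the function name and the example cue pattern are ours):

```python
def cam_estimate(cues, weights, k):
    """Cue abstraction model, Eq. (A1): intercept k plus the weighted
    sum of the cue values."""
    return k + sum(w * c for w, c in zip(weights, cues))

# With k = .1 and w = (.4, .3, .2, .1), as in the linear environment of
# Experiments 1 and 2, the model reproduces the criterion exactly:
# the cue pattern (1, 0, 1, 0) yields .1 + .4 + .2 = .7.
estimate = cam_estimate([1, 0, 1, 0], weights=[.4, .3, .2, .1], k=.1)
```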
For the classification task, we implemented a cue abstraction model where the probability of a "deadly" response (i.e., b = 1; b = 0 indicates a "harmless" response) followed a logistic function (cf. Juslin, Olsson, et al., 2003):

P(b = 1) = \frac{e^{k + \sum w_j c_j}}{1 + e^{k + \sum w_j c_j}},    (A2)

where w_j is the weight and c_j the value of cue j, and k is the intercept. As for the estimation task, two models were implemented: bCAMk and bCAM. In bCAMk, the intercept k was a free parameter; in bCAM, k was set equal to -0.5 \sum w_j.
For the paired-comparison task, the cue abstraction models were the same as for the classification task, except that the input to the models were the differences of the cue values, \Delta c, of the subspecies within each pair. The probability that alternative A will be chosen over alternative B is given by

P(A; \{A, B\}) = \frac{e^{k + \sum w_j \Delta c_j}}{1 + e^{k + \sum w_j \Delta c_j}}.    (A3)

For the paired-comparison task, we also implemented two alternative cue-based models: the compensatory weighted-additive model and the noncompensatory generalized take-the-best heuristic.
A.2.2. Weighted-additive (WADD) model

WADD (cf. Payne et al., 1988) applies to paired comparison; it is a probabilistic generalization of the rational model described by Lee and Cummins (2004) and was proposed by Bergert and Nosofsky (2007). In this model, the probability that alternative A will be chosen over alternative B is given by

P(A; \{A, B\}) = \frac{\left(\sum_{a \in F_A} w_a\right)^{\gamma}}{\left(\sum_{a \in F_A} w_a\right)^{\gamma} + \left(\sum_{b \in F_B} w_b\right)^{\gamma}},    (A4)

where w_j (0 \le w_j \le 1) are the weights assigned to each individual cue; the weights are constrained to sum to 1. \gamma is a free response scaling parameter (see, e.g., Ashby & Maddox, 1993). With \gamma set to 1, WADD reduces to Lee and Cummins' (2004) model. F_A and F_B denote the sets of cues favoring alternatives A and B, respectively.
A.2.3. Generalized take-the-best (gTTB)

The noncompensatory strategy gTTB (Bergert & Nosofsky, 2007; Nosofsky & Bergert, 2007) applies to paired comparison and represents a probabilistic generalization of Gigerenzer and Goldstein's (1996) take-the-best heuristic. gTTB is formally identical to Tversky's (1972) elimination-by-aspects model as applied to paired comparison. As for WADD, w_j (0 \le w_j \le 1) are the weights assigned to each cue and are constrained to sum to 1. The probability that alternative A will be chosen over alternative B is given by

P(A; \{A, B\}) = \frac{\sum_{a \in F_A} w_a}{\sum_{a \in F_A} w_a + \sum_{b \in F_B} w_b},    (A5)

where F_A and F_B again denote the sets of cues favoring alternatives A and B, respectively. Note that gTTB is nested under WADD: when \gamma in Eq. (A4) is set to 1, the two models make identical predictions (although the implied psychological processes are different).
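The nesting of gTTB under WADD can be made concrete with a short sketch (function names and example weights are ours):

```python
def wadd_prob(fa_weights, fb_weights, gamma):
    """Eq. (A4): WADD choice probability for A, where fa_weights and
    fb_weights are the weights of the cues favoring A and B, and gamma
    is the response scaling parameter."""
    a, b = sum(fa_weights) ** gamma, sum(fb_weights) ** gamma
    return a / (a + b)

def gttb_prob(fa_weights, fb_weights):
    """Eq. (A5): gTTB choice probability for A."""
    a, b = sum(fa_weights), sum(fb_weights)
    return a / (a + b)

# With gamma = 1, WADD and gTTB make identical predictions.
fa, fb = [0.5, 0.1], [0.3, 0.1]
p_wadd = wadd_prob(fa, fb, gamma=1.0)
p_gttb = gttb_prob(fa, fb)  # both equal .6 / (.6 + .4)
```

For gamma > 1 the stronger set of cues is weighted more than additively, so WADD and gTTB diverge.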
A.3. Exemplar-based models
As for the cue-based models, we also tested various versions of the exemplar model, which differed depending on the type of decision task in the test phase and whether the training had been with direct criterion learning or learning by comparison. The exemplar model for the estimation task after direct criterion learning, EBMcon, assumes that the continuous judgment y for object/probe p is the average of the criterion values of the previously encountered exemplars, weighted by the similarity of each exemplar to the probe:

y = \frac{\sum_{i=1}^{I} S(p, i) x_i}{\sum_{i=1}^{I} S(p, i)},    (A6)
where S(p, i) is the similarity of the probe p to the stored exemplar i; x_i is the criterion value of exemplar i; and I is the number of stored exemplars in memory. S(p, i) is calculated by the similarity rule of the generalized context model (Nosofsky, 1984), that is, by transforming the distances between probe p and exemplar i. The distance d_{pi} is calculated as

d_{pi} = h \left[ \sum_{j=1}^{J} w_j |c_{pj} - c_{ij}| \right],    (A7)

where c_{pj} and c_{ij} are the cue values of probe p and exemplar i, respectively, on cue dimension j; h is a sensitivity parameter that reflects the overall discriminability in the psychological space (Nosofsky, 1984); the parameter w_j is the respective attention weight attached to cue j. Attention weights vary between 0 and 1 and are constrained to sum to 1. The similarity S(p, i) between a probe p and an exemplar i is a nonlinearly decreasing function of their distance (d_{pi}),

S(p, i) = e^{-d_{pi}}.    (A8)
The exemplar model for the classification task, EBM, predicts the probability of a "deadly" response (i.e., b = 1) as

P(b = 1) = \frac{\sum S(p, i)_{b=1}}{\sum S(p, i)_{b=1} + \sum S(p, i)_{b=0}},    (A9)

where \sum S(p, i)_{b=1} and \sum S(p, i)_{b=0} denote the summed similarities of a probe to the exemplars belonging to categories b = 1 and b = 0, respectively. The distance d and similarity S are calculated as defined in Eqs. (A7) and (A8). This version of the exemplar model is identical to the generalized context model except for having no response scaling parameter (Nosofsky, 1984, 1986). EBM was also tested in the estimation task, with the predicted probability (as defined in Eq. (A9)) used as the predicted estimate of the criterion value; in addition, EBMcon was also tested in the classification task, using the predicted estimates as predicted probabilities of a b = 1 response.
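Eqs. (A6)–(A9) can be summarized in a compact sketch (function names are ours; the toy exemplars serve only as an illustration):

```python
import math

def similarity(probe, exemplar, attention, h):
    """Eqs. (A7)-(A8): exponential similarity based on the
    attention-weighted city-block distance, scaled by sensitivity h."""
    d = h * sum(w * abs(p - e) for w, p, e in zip(attention, probe, exemplar))
    return math.exp(-d)

def ebm_estimate(probe, exemplars, criteria, attention, h):
    """Eq. (A6): similarity-weighted average of stored criterion values."""
    sims = [similarity(probe, ex, attention, h) for ex in exemplars]
    return sum(s * x for s, x in zip(sims, criteria)) / sum(sims)

def ebm_classify(probe, exemplars, labels, attention, h):
    """Eq. (A9): summed similarity to b = 1 exemplars, relative to the
    summed similarity to all exemplars."""
    s1 = sum(similarity(probe, ex, attention, h)
             for ex, b in zip(exemplars, labels) if b == 1)
    s0 = sum(similarity(probe, ex, attention, h)
             for ex, b in zip(exemplars, labels) if b == 0)
    return s1 / (s1 + s0)

# A probe identical to a stored exemplar is pulled toward that
# exemplar's criterion value when sensitivity h is high.
y = ebm_estimate([0, 0], [[0, 0], [1, 1]], [0.0, 1.0], [0.5, 0.5], h=10.0)
```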
In contrast to the situation after direct criterion learning, for the classification and estimation tasks after training with learning by comparison current exemplar models cannot be applied, as they assume that each exemplar is stored along with either its binary (i.e., deadly vs. not deadly) or continuous criterion value (e.g., Juslin, Olsson, et al., 2003). In learning by comparison, however, participants are never directly informed about the exemplars' binary or continuous criterion values. We therefore propose a new exemplar model for classification and estimation after learning by comparison. This new model assumes that each exemplar is stored along with criterion information derived from dominance rates, defined as the relative frequency of the exemplar having a higher criterion value than the exemplars to which it had been compared in the training phase. This makes it possible for the model to predict classification probabilities and continuous criterion values for single objects.
We implemented two versions of this new exemplar model. In the first version, EBMdomcon, it is assumed that continuous dominance rates are stored, and for the estimation task, a continuous judgment y for the object/probe p is defined as in Eq. (A6), except that the criterion value x is replaced by the exemplar's dominance rate. In other words, the predicted estimate is the average dominance rate of the stored exemplars, weighted by each exemplar's similarity to the probe. In the second version, EBMdombin, it is assumed that binary dominance rates are stored. A binary dominance rate indicates for each exemplar whether it has "won" (i.e., was the one with the higher criterion value) in the majority of paired comparisons in the training phase, in which case the exemplar was assigned to category b = 1; or whether it has "lost" in the majority of paired comparisons, in which case it was assigned to category b = 0. For the classification task, the probability of a b = 1 response is defined as in Eq. (A9). EBMdombin was also tested in the estimation task, with the predicted classification probabilities as predicted estimates; and EBMdomcon was also tested in the classification task, using the predicted estimates as predicted probabilities of a b = 1 response.
For the paired-comparison task after training with learning by comparison, we implemented both exemplar models that Nosofsky and Bergert (2007, p. 1002) proposed for this situation. In the first model, EBMpc, it is assumed that winning alternatives are stored as exemplars of the winners' category and losing alternatives are stored as members of the losers' category, and the similarities to the winners' and losers' categories for alternative A are given by

S(A, W) = \sum_{w \in W} s(A, w)

and

S(A, L) = \sum_{l \in L} s(A, l),    (A10)

respectively, where S(A, W) and S(A, L) denote the similarities of A to each exemplar in the winners' and losers' category, respectively. The relative evidence for alternative A is given by

G_A = \frac{S(A, W)}{S(A, W) + S(A, L)}.    (A11)

The process is the same for alternative B. The probability that alternative A is judged to have a higher criterion value is given by

P(A; \{A, B\}) = \frac{G_A}{G_A + G_B}.    (A12)

The distances and similarities are calculated as described in Eqs. (A7) and (A8) with the same number of free parameters and the same parameter constraints.
The second model proposed by Nosofsky and Bergert (2007), EBMpc2, assumes that in the training phase people store pairs of alternatives as exemplars in winners' and losers' categories. If the feedback indicates that alternative A has a higher criterion value than alternative B, then the pair AB is stored as a member of the winners' category and the pair BA is stored as a member of the losers' category. It is further assumed that "a decision for a probe pair is a function of its similarity to each exemplar pair stored in memory" (Nosofsky & Bergert, 2007, p. 1003). The similarity is calculated with equations analogous to Eqs. (A7) and (A8). The attention weight w assigned to cue j for the first exemplar in the pair is assigned the same value as for cue j + 4 for the second exemplar in the pair. The probability that alternative A is judged as having a higher criterion value than alternative B is given by

P(A; \{A, B\}) = \frac{S(AB, W)}{S(AB, W) + S(AB, L)},    (A13)

where S(AB, W) and S(AB, L) denote the similarities of the pair AB to each exemplar in the winners' and losers' categories, respectively.
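A sketch of EBMpc (Eqs. (A10)–(A12)) may help to fix ideas (function names and the toy categories are ours; similarity follows Eqs. (A7)–(A8)):

```python
import math

def _sim(x, y, attention, h):
    """Exponential similarity from the attention-weighted city-block
    distance (Eqs. (A7)-(A8))."""
    d = h * sum(w * abs(a - b) for w, a, b in zip(attention, x, y))
    return math.exp(-d)

def ebmpc_prob(alt_a, alt_b, winners, losers, attention, h):
    """Eqs. (A10)-(A12): probability that alternative A is judged to
    have the higher criterion value."""
    def evidence(alt):  # Eq. (A11), built on the sums of Eq. (A10)
        s_w = sum(_sim(alt, w, attention, h) for w in winners)
        s_l = sum(_sim(alt, l, attention, h) for l in losers)
        return s_w / (s_w + s_l)
    g_a, g_b = evidence(alt_a), evidence(alt_b)
    return g_a / (g_a + g_b)  # Eq. (A12)

# An alternative resembling past winners should be chosen more often.
p = ebmpc_prob([1, 1], [0, 0], winners=[[1, 1]], losers=[[0, 0]],
               attention=[0.5, 0.5], h=2.0)
```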
For the paired-comparison task after training with direct criterion learning, we implemented a modified version of EBMpc, where the allocation of members to the winners' and losers' categories was determined by the binary criteria for each exemplar in the training phase (EBMdirpc).
Finally, all exemplar models were also implemented with a response scaling parameter (EBMgcon, EBMg, EBMgdombin, EBMgdomcon, EBMgpc, EBMgpc2, and EBMgdirpc), as for the WADD model (cf. Eq. (A4)).
A.4. Parameter estimation and model evaluation
For the paired-comparison task and the classification task, all models were fitted to each individual participant's responses in the test phase by minimizing the Bayesian Information Criterion (BIC; Schwarz, 1978). The BIC for a model is given by

BIC = -2 \ln(L) + k \ln(N),    (A14)

where ln(L) is the log-likelihood of the data given the model, k is the number of free parameters in the model, and N is the number of observations. The likelihood function assumes binomially distributed data. For the estimation task, we assumed normally distributed errors and all models were fitted by minimizing the least squares version of Eq. (A14),

BIC_{LS} = n \ln(\hat{\sigma}^2_e) + k \ln(N),    (A15)

where \hat{\sigma}^2_e is the error variance (i.e., mean squared error).
We tried several numerical and analytical methods for parameter estimation, but the method that performed best overall was a version of the simplex algorithm (Lagarias, Reeds, Wright, & Wright, 1998) with 30 random starting points for each data set. We chose to use BIC for the model selection in the test phase for two reasons. First, it turned out that some of the models were non-identifiable or near non-identifiable (i.e., different parameter estimates gave rise to the same likelihood) when fewer than all of the responses in the test phase were used to estimate the parameter values or when the last part of the training phase was used to estimate the parameter values. This ruled out model selection methods such as cross-validation and projective fit (in the latter, the parameters are estimated based on data in the final part of the training phase and then used to predict the data in the test phase). The second reason was that using BIC makes our results comparable to those of Nosofsky and Bergert (2007).
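For concreteness, Eqs. (A14) and (A15) translate directly into code (a sketch; the fitted values below are hypothetical):

```python
import math

def bic(log_likelihood, k, n):
    """Eq. (A14): BIC = -2 ln(L) + k ln(N)."""
    return -2.0 * log_likelihood + k * math.log(n)

def bic_ls(mse, k, n):
    """Eq. (A15): least-squares BIC = n ln(sigma_e^2) + k ln(N)."""
    return n * math.log(mse) + k * math.log(n)

# Hypothetical fits: an extra free parameter must buy enough likelihood
# to offset the ln(N) penalty before the larger model is preferred.
simple = bic(log_likelihood=-100.0, k=2, n=180)
larger = bic(log_likelihood=-99.5, k=3, n=180)
```

Here the larger model's small likelihood gain does not offset the penalty, so the simpler model would be selected.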
Appendix B

B.1. Calculation of the representation index (RI)
The representation index is defined as the sum of two individual indices: the interpolation index and the extrapolation index (cf. Olsson et al., 2006). The interpolation index is calculated on the n pairs of subspecies for which there is an old subspecies and a corresponding new subspecies with the same criterion value (i.e., c = .7, c = .6, and c = .5 in Table 1). The index is defined as

I = \frac{\sum_{i=1}^{n} (\Delta_{Old,i} - \Delta_{New,i})}{n}    (A16)
and expresses the average difference between the estimation errors \Delta for the old subspecies and the estimation errors for the corresponding new subspecies. Exemplar processing predicts that the estimation error for old subspecies is smaller than for new ones (yielding an I < 0), whereas cue abstraction predicts no systematic differences between the estimates for old and new subspecies. Therefore, a negative value is suggestive of exemplar processing, and a value of I near 0 is suggestive of cue abstraction. The extrapolation index is defined as the deviation of the observed estimate for the most extreme new subspecies from the estimate predicted by linear regression, which was derived from each participant's estimates for the old subspecies:

E = \begin{cases} (x_{c=1} - x_{c=0.9}) - b, & \text{for } x_c = x_1 \\ (x_{c=0.1} - x_{c=0}) - b, & \text{for } x_c = x_0, \end{cases}    (A17)
where (x_{c=1} - x_{c=0.9}) and (x_{c=0.1} - x_{c=0}) are the slopes of the lines that relate the mean estimates for exemplars with criteria 1 and 0.9, and 0.1 and 0, respectively; b is the predicted extrapolation. A value of E close to 0 implies cue abstraction; a negative value of E implies exemplar processing. Why? In contrast to cue abstraction (which predicts an ability to extrapolate), exemplar memory predicts an inability to extrapolate beyond the range of the training data (i.e., exemplar memory acts as a weighted average of the criterion values stored during training), which would lead to a value of E smaller than 0. As RI is the sum of I and E, an RI of around 0 is suggestive of cue abstraction and a negative RI is suggestive of exemplar processing.
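The interpolation index of Eq. (A16) is straightforward to compute (a sketch; the error values are hypothetical, with \Delta taken to be each item's estimation error):

```python
def interpolation_index(old_errors, new_errors):
    """Eq. (A16): average difference between the estimation errors for
    the old subspecies and those for the matched new subspecies.
    Values near 0 suggest cue abstraction; negative values suggest
    exemplar processing."""
    n = len(old_errors)
    return sum(o - w for o, w in zip(old_errors, new_errors)) / n

# Hypothetical errors for the three matched pairs (c = .7, .6, .5):
# smaller errors for old than for new items point to exemplar processing.
i_index = interpolation_index([0.02, 0.03, 0.01], [0.08, 0.07, 0.09])
```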
References
Ashby, F. G., Alfonso-Reese, L. A., Turken, A. U., & Waldron, E. M. (1998). A neuropsychological theory of multiple systems in category learning. Psychological Review, 105, 442–481.
Ashby, F. G., & Maddox, W. T. (1993). Relations between prototype, exemplar, and decision bound models of categorization. Journal of Mathematical Psychology, 37, 372–400.
Bergert, F. B., & Nosofsky, R. M. (2007). A response-time approach to comparing generalized rational and take-the-best models of decision making. Journal of Experimental Psychology: Learning, Memory, and Cognition, 33, 107–129.
Brehmer, B. (1994). The psychology of linear judgement models. Acta Psychologica, 87, 137–154.
Bröder, A. (2000). Assessing the empirical validity of the "Take-the-Best" heuristic as a model of human probabilistic inference. Journal of Experimental Psychology: Learning, Memory, and Cognition, 26, 1332–1346.
Bröder, A., Newell, B. R., & Platzer, C. (2010). Cue integration vs. exemplar-based reasoning in multi-attribute decisions from memory: A matter of cue representation. Judgment and Decision Making, 5, 326–338.
Bröder, A., & Schiffer, S. (2003). Take-the-Best versus simultaneous feature matching: Probabilistic inferences from memory and effects of representation format. Journal of Experimental Psychology: General, 132, 277–293.
Bröder, A., & Schiffer, S. (2006). Stimulus format and working memory in fast and frugal strategy selection. Journal of Behavioral Decision Making, 19, 361–380.
Clement, C. A., Harris, R. C., Burns, B. M., & Weatherholt, T. N. (2010). An eye-tracking analysis of the effect of prior comparison on analogical mapping. Current Psychology, 29, 273–287.
Cooksey, R. W. (1996). Judgment analysis: Theory, methods, and applications. New York: Academic Press.
DeLosh, E. L., Busemeyer, J. R., & McDaniel, M. A. (1997). Extrapolation: The sine qua non for abstraction in function learning. Journal of Experimental Psychology: Learning, Memory, and Cognition, 23, 968–986.
Dougherty, M. R. P., Gettys, C. F., & Ogden, E. E. (1999). MINERVA-DM: A memory processes model for judgments of likelihood. Psychological Review, 106, 180–209.
Erickson, M. A., & Kruschke, J. K. (1998). Rules and exemplars in category learning. Journal of Experimental Psychology: General, 127, 107–140.
Garcia-Retamero, R., Wallin, A., & Dieckmann, A. (2007). Does causal knowledge help us be faster and more frugal in our decisions? Memory and Cognition, 35, 1399–1409.
Gentner, D. (1983). Structure-mapping: A theoretical framework for analogy. Cognitive Science, 7, 155–170.
Gentner, D., Loewenstein, J., & Hung, B. (2007). Comparison facilitates children's learning of names for parts. Journal of Cognition and Development, 8, 285–307.
Gentner, D., & Markman, A. B. (1997). Structure mapping in analogy and similarity. American Psychologist, 52, 45–56.
Gentner, D., & Namy, L. L. (1999). Comparison in the development of categories. Cognitive Development, 14, 487–513.
Gentner, D., & Namy, L. L. (2004). The role of comparison in children's early word learning. In D. G. Hall & S. R. Waxman (Eds.), Weaving a lexicon (pp. 533–568). Cambridge, MA: MIT Press.
Gigerenzer, G., & Goldstein, D. G. (1996). Reasoning the fast and frugal way: Models of bounded rationality. Psychological Review, 103, 650–669.
Gigerenzer, G., Todd, P. M., & the ABC Research Group (1999). Simple heuristics that make us smart. New York: Oxford University Press.
Glöckner, A., & Betsch, T. (2008). Multiple-reason decision making based on automatic processing. Journal of Experimental Psychology: Learning, Memory, and Cognition, 34, 1055–1075.
Goldstein, D. G., & Gigerenzer, G. (2002). Models of ecological rationality: The recognition heuristic. Psychological Review, 109, 75–90.
Hammond, K. R. (1955). Probabilistic functioning and the clinical method. Psychological Review, 62, 255–262.
Hau, R., Pleskac, T. J., Kiefer, J., & Hertwig, R. (2008). The description–experience gap in risky choice: The role of sample size and experienced probabilities. Journal of Behavioral Decision Making, 21, 493–518.
Hertwig, R., Pachur, T., & Kurzenhäuser, S. (2005). Judgments of risk frequencies: Tests of possible cognitive mechanisms. Journal of Experimental Psychology: Learning, Memory, and Cognition, 31, 621–642.
Hills, T. T., & Hertwig, R. (2010). Information search in decisions from experience: Do our patterns of sampling foreshadow our decisions? Psychological Science, 21, 1787–1792.
Hoffman, A. B., & Rehder, B. (2010). The costs of supervised classification: The effect of learning task on conceptual flexibility. Journal of Experimental Psychology: General, 139, 319–340.
Juslin, P., Jones, S., Olsson, H., & Winman, A. (2003). Cue abstraction and exemplar memory in categorization. Journal of Experimental Psychology: Learning, Memory, and Cognition, 29, 924–941.
Juslin, P., Karlsson, L., & Olsson, H. (2008). Information integration in multiple cue judgment: A division of labor hypothesis. Cognition, 106, 259–298.
Juslin, P., Olsson, H., & Olsson, A.-C. (2003). Exemplar effects in categorization and multiple-cue judgment. Journal of Experimental Psychology: General, 132, 133–156.
Juslin, P., & Persson, M. (2002). PROBabilities from Exemplars (PROBEX): A "lazy" algorithm for probabilistic inference from generic knowledge. Cognitive Science, 26, 563–607.
Karlsson, L., Juslin, P., & Olsson, H. (2007). Adaptive changes between cue abstraction and exemplar memory in a multiple-cue judgment task with continuous cues. Psychonomic Bulletin and Review, 14, 1140–1146.
Karlsson, L., Juslin, P., & Olsson, H. (2008). Exemplar-based inference in multi-attribute decision making: Contingent, not automatic, strategy shifts? Judgment and Decision Making, 3, 244–260.
Katsikopoulos, K., Pachur, T., Machery, E., & Wallin, A. (2008). From Meehl (1954) to fast and frugal heuristics (and back): New insights into how to bridge the clinical–actuarial divide. Theory and Psychology, 18, 443–464.
Khader, P. H., Pachur, T., Meier, S., Bien, S., Jost, K., & Rösler, F. (2011). Memory-based decision making with heuristics involves increased activation of decision-relevant memory representations. Journal of Cognitive Neuroscience, 23, 3540–3554.
Klayman, J. (1988a). On the how and why (not) of learning from outcomes. In B. Brehmer & C. R. B. Joyce (Eds.), Human judgment: The SJT view. Amsterdam, The Netherlands: Elsevier.
Klayman, J. (1988b). Cue discovery in probabilistic environments: Uncertainty and experimentation. Journal of Experimental Psychology: Learning, Memory, and Cognition, 14, 317–330.
Kruschke, J. K., Kappenman, E. S., & Hetrick, W. P. (2005). Eye gaze and individual differences consistent with learned inattention in associative blocking and highlighting. Journal of Experimental Psychology: Learning, Memory, and Cognition, 31, 830–845.
Lagarias, J. C., Reeds, J. A., Wright, M. H., & Wright, P. E. (1998). Convergence properties of the Nelder–Mead simplex method in low dimensions. SIAM Journal on Optimization, 9, 112–147.
Lee, M. D., & Cummins, T. D. R. (2004). Evidence accumulation in decision making: Unifying the "take the best" and the "rational" models. Psychonomic Bulletin and Review, 11, 343–352.
Little, D. R., & Lewandowsky, S. (2009). Beyond non-utilization: Irrelevant cues can gate learning in probabilistic categorization. Journal of Experimental Psychology: Human Perception and Performance, 35, 530–550.
Loewenstein, J., & Gentner, D. (2001). Spatial mapping in preschoolers: Close comparisons facilitate far mappings. Journal of Cognition and Development, 2, 189–219.
Logan, G. D. (1988). Toward an instance theory of automatization. Psychological Review, 95, 492–527.
Markman, A. B., & Gentner, D. (1997). The effects of alignability on memory. Psychological Science, 8, 363–367.
Mata, R., Schooler, L. J., & Rieskamp, J. (2007). The aging decision maker: Cognitive aging and the adaptive selection of decision strategies. Psychology and Aging, 22, 796–810.
Medin, D. L., & Schaffer, M. M. (1978). Context theory of classification learning. Psychological Review, 85, 207–238.
Meehl, P. E. (1954). Clinical versus statistical prediction: A theoretical analysis and a review of the evidence. Minneapolis, MN: University of Minnesota Press.
Mellers, B. A. (1980). Configurality in multiple-cue probability learning. American Journal of Psychology, 93, 429–443.
Namy, L. L., & Gentner, D. (2002). Making a silk purse out of two sows' ears: Young children's use of comparison in category learning. Journal of Experimental Psychology: General, 131, 5–15.
Newell, B. R., & Shanks, D. R. (2003). Take the best or look at the rest? Factors influencing "one-reason" decision making. Journal of Experimental Psychology: Learning, Memory, and Cognition, 29, 53–65.
Newell, B. R., Weston, N. J., & Shanks, D. R. (2003). Empirical tests of a fast-and-frugal heuristic: Not everyone "takes-the-best". Organizational Behavior and Human Decision Processes, 91, 82–96.
Nosofsky, R. M. (1984). Choice, similarity, and the context theory of classification. Journal of Experimental Psychology: Learning, Memory, and Cognition, 10, 104–114.
Nosofsky, R. M. (1986). Attention, similarity, and the identification–categorization relationship. Journal of Experimental Psychology: General, 115, 39–57.
Nosofsky, R. M., & Bergert, F. B. (2007). Limitations of exemplar models for multi-attribute probabilistic inference. Journal of Experimental Psychology: Learning, Memory, and Cognition, 33, 999–1019.
Nosofsky, R. M., & Johansen, M. K. (2000). Exemplar-based accounts of "multiple-system" phenomena in perceptual categorization. Psychonomic Bulletin and Review, 7, 375–402.
Nosofsky, R. M., Palmeri, T. J., & McKinley, S. C. (1994). Rule-plus-exception model of classification learning. Psychological Review, 101, 53–79.
Oakes, L. M., Kovack-Lesh, K. A., & Horst, J. S. (2009). Two are better than one: Comparison influences infants' visual recognition memory. Journal of Experimental Child Psychology, 104, 124–131.
Oakes, L. M., & Ribar, R. J. (2005). A comparison of infants' categorization in paired and successive presentation familiarization tasks. Infancy, 7, 85–98.
Olsson, A.-C., Enkvist, T., & Juslin, P. (2006). Go with the flow: How to master a nonlinear multiple-cue judgment task. Journal of Experimental Psychology: Learning, Memory, and Cognition, 32, 1371–1384.
Pachur, T. (2010). Recognition-based inference: When is less more in the real world? Psychonomic Bulletin and Review, 17, 589–598.
Pachur, T. (2011). The limited value of precise tests of the recognition heuristic. Judgment and Decision Making, 6, 413–422.
Pachur, T., & Hass, A. (2012). Boundary conditions of unbounded rationality. Paper presented at the 54th Tagung experimentell arbeitender Psychologen, Mannheim, Germany.
Payne, J. W., Bettman, J. R., & Johnson, E. J. (1988). Adaptive strategy selection in decision making. Journal of Experimental Psychology: Learning, Memory, and Cognition, 14, 534–552.
Persson, M., & Rieskamp, J. (2009). Inferences from memory: Strategy- and exemplar-based judgment models compared. Acta Psychologica, 130, 25–37.
Rakow, T., Hinvest, N., Jackson, E., & Palmer, M. (2004). Simple heuristics from the adaptive toolbox: Can we perform the requisite learning? Thinking and Reasoning, 10, 1–29.
Rieskamp, J., & Otto, P. E. (2006). SSL: A theory of how people learn to select strategies. Journal of Experimental Psychology: General, 135, 207–236.
Rouder, J. N., & Ratcliff, R. (2004). Comparing categorization models. Journal of Experimental Psychology: General, 133, 63–82.
Schwarz, G. (1978). Estimating the dimension of a model. The Annals of Statistics, 6, 461–464.
Sieck, W. R., & Yates, J. F. (2001). Overconfidence effects in category learning: A comparison of connectionist and exemplar memory models. Journal of Experimental Psychology: Learning, Memory, and Cognition, 27, 1003–1021.
Sloman, S. A. (1996). The empirical case for two systems of reasoning. Psychological Bulletin, 119, 3–22.
Slovic, P. (1995). The construction of preference. American Psychologist, 50, 364–371.
Stewart, N., Chater, N., & Brown, G. D. A. (2006). Decision by sampling. Cognitive Psychology, 53, 1–26.
Thurstone, L. L. (1927). A law of comparative judgment. Psychological Review, 34, 273–286.
Tversky, A. (1972). Elimination by aspects: A theory of choice. Psychological Review, 79, 281–299.
Yamauchi, T., Love, B. C., & Markman, A. B. (2002). Learning nonlinearly separable categories by inference and classification. Journal of Experimental Psychology: Learning, Memory, and Cognition, 28, 585–593.
Yamauchi, T., & Markman, A. B. (1998). Category learning by inference and classification. Journal of Memory and Language, 39, 124–148.
Zacks, R. T., & Hasher, L. (2002). Frequency processing: A twenty-five year perspective. In P. Sedlmeier & T. Betsch (Eds.), Etc.: Frequency processing and cognition (pp. 21–36). New York: Oxford University Press.