
1 Introduction to Reward processing

Krishna Prasad Miyapuram


Ph.D. Thesis Chapter, University of Cambridge
1.1 Functions of rewards
Reward-seeking behaviour depends to a large extent on successfully extracting reward information
from a large variety of environmental stimuli and events. Learning to reliably predict the occurrence
of rewards such as food enables an organism to prepare behavioural reactions and improve the
choices that it makes in the future. Learning can be defined as a change in behaviour. Various
sensory cues from the environment such as sounds, sights and smells that are associated with a
reward guide the animal to return to the previously experienced reward (Wise, 2002). Thus, one of
the main functions of rewards is to induce learning, as subjects will come back for more when they
encounter a reward. Another function of rewards is to induce approach and consummatory
behaviour for acquiring the rewarding object. This is essential for decision making and goal-directed behaviour, as the animal learns to decide the appropriate actions to be executed with
rewards as goals. The third function of rewards is to induce subjective feelings of pleasure and
hedonia (positive emotions). This common perception associates rewards primarily with happiness.
Thus rewards have very basic functions in the life of individuals and are necessary for survival and
reproduction (survival of genes) (Schultz, 2000, 2004, 2006).
1.1.1 Learning by conditioning
Reward-directed learning can occur by associating a stimulus with a reward (Pavlovian or classical
conditioning) or by associating an action with a reward (instrumental or operant conditioning).
These forms of learning fall under the category of associative learning. More than a century ago,
Thorndike (1898) argued that learning consists of the formation of connections between stimuli and
responses and that these connections are formed whenever a response is followed by a reward. This
kind of learning is called instrumental (or operant) conditioning as the delivery of the reward is
contingent on the response made by the animal. Pavlov (1929) delivered the reward to his subjects
independently of the animal's behaviour. Thus, learning in Pavlovian conditioning consisted of
pairing between a stimulus and a reward. In both kinds of learning an arbitrary, previously neutral
stimulus (Conditioned Stimulus, CS) acquires the function of a rewarding stimulus after being
repeatedly associated in time with a rewarding object (Unconditioned Stimulus, US).
The early definitions of conditioning have emphasised that the temporal contiguity of the CS
and the US is essential for learning. Modern views of conditioning, however, suggest that the
pairing or contiguity of two events is neither necessary nor sufficient for learning to occur (see
Rescorla, 1988 for review). Rather, conditioning depends on the information that the CS provides about the US.

Figure 1-1 Learning by classical conditioning
(a) Contiguity requirement. The US needs to follow the CS in a temporally contiguous manner. (b) If the US is delayed after the offset of the CS, the procedure is called trace conditioning. (c) Contingency requirement. For excitatory conditioning, the US should have a higher probability of occurring in the presence of the CS than in its absence. (d) If the CS predicts the omission of a US, this is called conditioned inhibition. (e) Prediction error. Unexpected delivery of reward gives a positive prediction error, while omission of a predicted reward gives a negative prediction error. (f) Higher-order conditioning occurs when a second stimulus predicts the occurrence of the CS.

More specifically, the US needs to occur more frequently in the presence of the CS as
compared with its absence. Further, a negative relation between a CS and US can be learned if the
occurrence of the CS predicts the omission of the US (conditioned inhibition, Rescorla, 1969). This
suggests that contingency of the US upon occurrence of the CS is crucial for Pavlovian conditioning
(Dickinson, 1980). When a US is fully predicted by a CS, it does not contribute to any further learning, even if the contiguity and contingency requirements are fulfilled. This phenomenon is
illustrated by the blocking effect (Kamin, 1969), in which a previously formed association prevents
or blocks the formation of a new association. Kamin (1969) proposed that the surprise or error in
prediction of the US contributes to learning. Thus, three key factors govern learning by conditioning: contiguity, contingency, and prediction error (Tobler, 2003; Schultz, 2006).

Box 1 Models of conditioning: Role of prediction error
Prediction error has been fundamental to many models of conditioning. Rescorla and Wagner
(1972) proposed that repeated pairing of a CS (stimulus A) and a US will result in a gradual
increase in the strength of association ($V_A$) between them. According to their model, the change in associative strength is

$$\Delta V_A = \alpha\beta(\lambda - V_T)$$

where the value of $\lambda$ is set by the magnitude of the US and represents the maximum strength that the CS-US association can achieve, and $V_T$ represents the sum of the associative strengths of all stimuli present on the trial. Therefore, the term $\lambda - V_T$ represents the prediction error, the discrepancy between the maximum associative strength and the current prediction. The two learning-rate parameters $\alpha$ and $\beta$, with values between 0 and 1, are determined by the salience of the CS (stimulus A) and the US respectively, and are fixed during conditioning. The Rescorla-Wagner (R-W) model can explain the contingency requirement for conditioning by allowing the experimental context to be associated with the US like any other CS. Hence, if the probability p(US|CS) of the US occurring in the presence of the CS is lower than the probability p(US|no CS) of the US occurring in the absence of the CS, the associative strength for predicting the US would be greater for the experimental context than for the CS (conditioned inhibition). The blocking effect can also be explained, as the R-W model computes the prediction error from the total associative strength $V_T$ of all stimuli present on a given trial. So a fully predicted US does not generate any prediction error and hence blocks any further learning by a second stimulus.
Despite the limitations of the R-W model in explaining phenomena like latent inhibition (in which pre-exposure of a CS retards later conditioning of that CS with a US), the prediction error principle remains central to a number of contemporary models of conditioning (see Pearce and Bouton, 2001).
Attentional theories of conditioning have suggested that, in addition to the processing of the US proposed by the Rescorla-Wagner model, the processing of the CS is integral to conditioning (Mackintosh, 1975; Pearce and Hall, 1980). According to Mackintosh (1975), stimuli that generate the least absolute value of prediction error are good predictors of the US and command maximum attention. The change in associability of a stimulus A is positive if $|\lambda - V_A| < |\lambda - V_X|$ and is negative otherwise. Here, $V_X$ is the sum of the associative strengths of all stimuli except A. The change in associative strength is given by

$$\Delta V_A = \alpha_A(\lambda - V_A)$$

Thus, the Mackintosh model proposes a separable error term, so that the associative change undergone by a CS is influenced by the discrepancy between its own associative strength ($V_A$) and the outcome ($\lambda$).
Pearce and Hall (1980) proposed that the associability $\alpha_A$ of a stimulus A on trial $n$ is determined by the absolute value of the prediction error on the previous occasion on which stimulus A was presented:

$$\alpha_A^{n} = \left|\lambda - V_T\right|^{n-1}$$

The change in associative strength is determined by

$$\Delta V_A = \alpha_A S_A \lambda$$

where $S_A$ denotes the salience of the CS.
The Pearce-Hall model suggests, contrary to the Mackintosh model, that maximum attention (processing of the CS) is commanded by stimuli that generated a prediction error of the US on the previous trial. Nevertheless, the attentional theories of conditioning agree that attention to the CS is crucial for learning and that changes in attentional processing result from absolute prediction errors (see Pearce and Bouton, 2001 for a review).
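The role of associability can be made concrete with a minimal sketch of the Pearce-Hall scheme; the salience, US intensity and initial associability below are hypothetical values.

```python
# Sketch of Pearce-Hall learning: associability alpha_A on trial n equals
# the absolute prediction error |lambda - V_T| from trial n-1.
# Salience, US intensity and the initial associability are hypothetical.

S_A = 0.5          # fixed salience of the CS
lambda_us = 1.0    # US intensity
V_A = 0.0          # associative strength (only one CS, so V_T = V_A)
alpha_A = 1.0      # initial associability

for trial in range(10):
    V_A += alpha_A * S_A * lambda_us        # Pearce-Hall learning rule
    alpha_A = abs(lambda_us - V_A)          # associability for the next trial
    print(trial, round(V_A, 3), round(alpha_A, 3))

# As V_A approaches lambda, the prediction error (and hence CS processing)
# declines, so a well-predicted CS receives little further attention.
```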
The models of conditioning can be summarised as essentially including two terms that are combined multiplicatively: CS processing (eligibility) and US processing (reinforcement). While the Rescorla-Wagner model proposed that learning is driven entirely by changes in US processing in terms of prediction error, the Mackintosh and Pearce-Hall models have emphasised the role of CS processing (attention) in terms of changes in associability. Le Pelley (2004) has suggested a
hybrid model integrating these previous models of associative learning. The hybrid model distinguishes between the attentional associability of the Mackintosh model and the salience associability of the Pearce-Hall model, and combines them multiplicatively along with both a separable error term (e.g. $|\lambda - V_A|$) and the summed error term of the Rescorla-Wagner model.
A real-time extension of the Rescorla-Wagner model is the temporal difference (TD) model
developed by Sutton and Barto (1981; Sutton, 1988; see Sutton and Barto, 1990 for a review with
reference to animal learning theories). The advantage of real-time models is that the temporal
relationship between stimuli within a trial can be captured. An important illustration is the delay
conditioning procedure. In this procedure, the CS has an onset much earlier than the US and the
onset of the US is at the offset of the CS or slightly earlier. A further delay between the offset of the
CS and the onset of the US is referred to as the trace conditioning procedure. The time between the onset of the CS and the onset of the US is called the Inter-Stimulus-Interval (ISI). The effectiveness of conditioning is known to decline for long ISIs (see Sutton and Barto, 1990). This can be explained by assuming that the internal representation of the CS as perceived by the subject diminishes during the ISI. This can be modelled by dividing the trial into several time-bins, such that the CS predicts a temporally discounted sum of all future rewards within the trial, with nearer time-bins having greater weight. Thus, a US occurring at a longer ISI is discounted more and hence is less effective in conditioning.
conditioning. For example, using an exponential discounting function, with as discount factor, the
reward predicted V
t
at time t is given by
V
t

t+1
+
t+2
+
2

t+3
+
3

t+4
+
The following recursive relationship allows estimation of the current prediction and avoids the
necessity to wait until all future rewards are received in that trial.
V
t

t+1
+ V
t+1

We can now define the temporal difference error that must approach zero with learning as

t
=
t+1
+ V
t+1
- V
t

and the learning is governed by
V
A
=
A
(
t+1
+ V
t+1
- V
t
)
where,
t+1
+ V
t+1
takes the role of (asymptotic value of US) in Rescorla-Wagner model.
Another important illustration of the use of real-time models such as TD is that they can explain higher-order conditioning, in which conditioned stimuli acquire predictive power not only when associated with a US, but also when associated with another conditioned stimulus that has previously been associated with a US. The prediction of reward at various time-points within a trial, as proposed by the TD model, explains the ability of the organism to predict the US based on the earliest available CS.
1.1.2 Approach behaviour and decision making
Rewards act as positive reinforcers by increasing the frequency and intensity of the behaviour that
leads to the acquisition of goal objects (Schultz, 2000). Reinforcers are those objects that increase
the frequency of behaviour. Rewards also act as goals in their own right and can therefore elicit
approach and consummatory behaviour. Omission of reward leads to extinction of behaviour.
Punishment has the opposite motivational valence to reward and decreases the frequency of behaviour. Avoidance and escape behaviours are negatively reinforced (strengthened) because they prevent or terminate a punishment, respectively. These findings have been formalised as the law of effect (Thorndike, 1911), which states that learning occurs only if there is reinforcement. Approach behaviour has been central to the operational definition of rewards as those objects which subjects will work to acquire through allocation of time, energy, or effort (McClure, 2003); in other words, rewards make subjects come back for more.
In Pavlovian conditioning, the conditioned stimuli elicit responses that help prepare the
animal for the consumption of reward. Konorski (1967) distinguished between preparatory and
consummatory conditioned responses. Preparatory responses (e.g. excitement, approach) depend on
the general motivational attributes of, or emotional responses to, a reinforcer and hence reflect the
general affective value of the reinforcer. Consummatory responses (e.g. pecking, salivation) depend
on the specific sensory attributes of the reinforcer (Mackintosh, 1983). In most experiments, both
preparatory and consummatory conditioning will occur. Therefore, a CS will be associated with both the affective and sensory attributes of the US.
In instrumental conditioning, the actions that lead to reward are reinforced. In the real world, an animal often faces more than one possible action. The animal is then confronted with a decision-making situation and would choose those actions that have maximum value. Reinforcement learning models and their implementations, such as the actor-critic architecture, provide an account of choice behaviour. An agent (organism) learns to achieve a goal (maximise reward) by navigating through the space of states (making decisions - the actor) using the reinforcement signal (updating the value function - the critic). In the temporal difference (TD) model, the TD error
guides the updating of the value function $V(S_t)$ when transitioning from state $S_t$ to state $S_{t+1}$. Q-learning and its variants estimate value functions over state-action pairs, so that in a given state $s$, the organism chooses the action $a$ that maximises the value $Q(s,a)$. The value function $Q$ is updated similarly to the TD model.
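As an illustrative sketch (the single-state two-action task, reward probabilities, learning rate and the epsilon-greedy choice rule are all hypothetical choices), the Q-learning update below selects the higher-valued action and updates its value from the prediction error:

```python
import random

# Sketch of Q-learning for a single-state, two-action choice task.
# Reward probabilities, learning rate and epsilon are hypothetical; with a
# single state, the successor-value term of the full update drops out.

actions = ["left", "right"]
p_reward = {"left": 0.6, "right": 0.3}     # instrumental contingencies
Q = {a: 0.0 for a in actions}
lr, epsilon = 0.1, 0.1

for trial in range(1000):
    # Epsilon-greedy choice: mostly take the action with maximum value.
    if random.random() < epsilon:
        a = random.choice(actions)
    else:
        a = max(Q, key=Q.get)
    reward = 1.0 if random.random() < p_reward[a] else 0.0
    Q[a] += lr * (reward - Q[a])           # update from the prediction error

print(Q)  # each Q value approximates the reward probability of its action
```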
Box 2 Basic reward parameters: Microeconomic concepts
The influence of rewards on decision-making can be assessed by the basic reward parameters such
as magnitude, probability and delay. Given a choice between different magnitudes or probabilities
of reward, an organism would choose those options with higher magnitude and probability. Smaller
delays to obtain reward are preferred to longer delays. In models of conditioning, reward value is expressed as the associative strength that drives learning.
The occurrence of rewards is uncertain in the dynamic world, in which both the environment
and the behaviour of other agents render the rewards partly unpredictable. Uncertainty can be in the
expected magnitude of the reward (characterised by the variance) or the probability (p) of the
reward (maximum uncertainty at p = 50%) or the time of delivery of the reward. The uncertainty of
rewards can generate attention that determines learning according to associability learning rules
(Mackintosh, 1975; Pearce and Hall, 1980).
As early as 1650, Pascal conjectured that human choice behaviour could be understood through expected value (the product of probability and magnitude of the reward). Bernoulli (1738/1954) suggested that the actual value, or utility, that people assign to an outcome depends on the wealth of the assigning person and grows more slowly than its magnitude. Bernoulli proposed that an increase in magnitude is always accompanied by an increase in utility, but that utility follows a concave (more specifically, a logarithmic) function of magnitude. Hence, individuals behave so as to maximise expected utility rather than expected value. Prospect theory (Kahneman and Tversky, 1979) suggests that not only the perception of magnitude but also the perception of probability is subjective to an individual.
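A short worked example contrasts the two quantities; the gamble, wealth level and the logarithmic utility function are illustrative assumptions in the spirit of Bernoulli's proposal.

```python
import math

# Worked example: expected value vs. Bernoulli's expected (log) utility.
# The gamble and the wealth level are illustrative assumptions.

wealth = 100.0
outcomes = [(0.5, +50.0), (0.5, -50.0)]    # a fair gamble: win or lose 50

expected_value = sum(p * x for p, x in outcomes)               # = 0.0

# Utility is a concave (logarithmic) function of total wealth, so the
# possible loss hurts more than the equal-sized gain helps.
eu_gamble = sum(p * math.log(wealth + x) for p, x in outcomes)
eu_reject = math.log(wealth)

print(expected_value)          # 0.0: the gamble is fair in value terms
print(eu_gamble < eu_reject)   # True: a log-utility agent declines it
```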
1.1.3 Subjective feelings of pleasure
The common perception of reward associates positive feelings of pleasure and hedonia as one of the
main functions of reward. Pleasure is a subjective feeling as it depends on the motivation of the
organism (wealth, satiety etc) and other available options (contextual effects). Rewards induce
positive emotions (affect). Recent theories (Berridge and Robinson, 2003) have suggested that the motivational and emotional functions of rewards are dissociable as 'wanting' and 'liking' respectively. Both the motivational and emotional functions can occur either consciously or unconsciously. Indeed, wanting can occur without pleasurable liking, as accumulated wealth or satiation can diminish liking.
1.2 Classical Reward structures: Neurophysiology
Dopamine neurons of the ventral tegmental area (VTA) and substantia nigra have long been
identified with the processing of rewarding stimuli. Romo and Schultz (1990) have shown that
phasic dopamine responses appeared to be related to the appetitive properties of the object being
touched rather than the object itself. Phasic bursts of dopamine neurons occurred when the monkey's
hand touched a morsel of food but not when the animal's hand touched a wire or other non-food
objects. Dopamine neurons in the substantia nigra pars compacta form part of the nigrostriatal pathway, which projects mainly to the caudate and putamen and is strongly identified with motor function. More medially, the ventral tegmental area (VTA) projects strongly to the nucleus accumbens and also to the amygdala and hippocampus (mesolimbic pathway). The mesocortical pathway from the medial VTA projects to a number of brain structures, including the dorsal and ventral prefrontal cortex. The mesocorticolimbic structures are known to be involved in processing reward information.
1.2.1 Dopamine responses related to animal learning theory
Dopamine neurons respond to the sight of primary food reward and to the conditioned stimulus
associated with reward (Ljungberg et al., 1992). However, dopamine responses were not observed to a light that was not associated with task performance, suggesting that dopamine responses are specific to behaviourally significant, reward-related events. When a stimulus predicting reward is itself preceded by another stimulus, the phasic activation of dopamine neurons transfers back to this earlier stimulus (Schultz et al., 1993). Thus, dopamine neurons might respond to the earliest reward-predicting stimulus.
Mirenowicz and Schultz (1994) found that dopamine neurons showed a short-latency, phasic response to unpredicted liquid rewards and to rewards during conditioning. After learning, the neuronal
responses occurred at the onset of the conditioned stimulus. When a predicted reward is omitted,
dopamine neurons are depressed time-locked to the usual occurrence of the reward. It is suggested
that the phasic dopamine response might encode the discrepancy between the predicted and the
actual occurrence of the reward (for a review see Schultz et al., 1997). More recently, Bayer and Glimcher (2005) used a regression model whose results were consistent with a temporal difference model, demonstrating a role of dopamine neurons in coding positive reward prediction errors. Hollerman and Schultz (1998) showed that dopamine neurons were activated by rewards
during early trials and the activity progressively reduced as the rewards became more predictable.
Further, these neurons were activated when rewards occurred at unpredicted times and were
depressed when rewards were omitted at predicted times. Thus dopamine neurons encode errors in
prediction of both the occurrence and the time of rewards.
Waelti et al. (2001) used a blocking procedure to show that the responses of dopamine neurons to conditioned stimuli were governed by the occurrence of reward prediction errors rather than by stimulus-reward associations alone. Tobler et al. (2003) used a conditioned inhibition paradigm and showed that, of 69 dopamine neurons that responded strongly to reward-predicting stimuli, 48 showed considerable depressions to conditioned inhibitors, with minor activations in the remaining neurons. To discriminate successfully between reward-predicting and non-reward-predicting stimuli, attention must be paid to conditioned inhibitors as well as to conditioned excitors. This indicates differential neural coding of reward prediction and attention.

Figure 1-2 Primary target regions of dopamine
The dopamine neurons, named after the neurotransmitter they release with nerve impulses in their projection territories, are located in the midbrain structures substantia nigra (pars compacta) and the medially adjoining ventral tegmental area (VTA). The axons of dopamine neurons project to the striatum (caudate nucleus, putamen and ventral striatum including nucleus accumbens), the dorsal and ventral prefrontal cortex, and a number of other structures.
These findings indicate that dopamine responses comply with basic tenets of animal learning theory and indicate a role for dopamine in reward-based learning, in particular in representing reward prediction errors. Learning rules such as that proposed by Rescorla and Wagner (1972) also predict greater associative strength for increasing magnitudes of reward. Further, as learning is driven by prediction error, increasing the probability of reward should result in smaller responses to the reward and correspondingly greater responses to the reward-predicting cue. These basic parameters of reward processing, namely magnitude, probability, expected value and uncertainty, have also been fundamental concepts of microeconomics.
Two reports by Schultz and colleagues (Fiorillo et al., 2003; Tobler et al., 2005) have shown dopamine responses to the magnitude and probability of reward. Fiorillo et al. (2003) found that the
phasic activation of dopamine neurons varied monotonically across the full range of probabilities,
supporting past claims that this response codes the discrepancy between predicted and actual
reward. In addition, a gradual increase in activity until the potential time of reward was observed
that was related to the uncertainty of obtaining a reward. Tobler et al. (2005) found that the phasic
activation of midbrain dopamine neurons showed similar sensitivity to both the magnitude and
probability of reward, and appeared to increase monotonically with expected reward value. Further,
a second form of adaptation was observed: a change in the sensitivity or gain of neural activity that appeared to depend on the range of likely reward magnitudes.

Figure 1-3 Dopamine responses to basic reward parameters (adapted from Tobler, 2003)
1.2.2 Reward signals in the striatum and orbitofrontal cortex
Hikosaka et al. (1989) showed reward expectation and reward delivery related activation in caudate
neurons. The activations were non-selective for how the monkey obtained the reward, i.e., by visual
fixation only, by a saccade, or by a hand movement. Apicella et al. (1991) found ventral and dorsal striatal responses to primary liquid rewards that could be distinguished from movement-related activations in the posterior putamen. Neurons that detect rewards are more common in the ventral
striatum than in the caudate nucleus and putamen. Schultz et al. (1992) showed reward-expectation
and reward-delivery related responses in the ventral striatum. Changes in the appetitive value of the
reward liquid modified the magnitude of activations, suggesting a possible relationship to the
hedonic properties of the expected event.
Thorpe et al. (1983) showed that neurons in orbitofrontal cortex responded selectively to
particular foods or aversive stimuli that could not be explained by simple sensory features of the
stimulus. Orbitofrontal neurons tracked whether particular visual stimuli continued to be associated with reinforcement, and their responses reversed when the stimulus contingencies were interchanged. Critchley and Rolls (1996) found that neuronal responses in orbitofrontal cortex to rewards and reward-predicting stimuli are reduced with satiation and hence are related to the motivational value rather than the sensory properties of reward objects. Tremblay and Schultz (2000) showed that orbitofrontal neurons are activated by the expectation of reward and also detect reward delivery at the end of the trial. These activations also preceded expected drops of liquid delivered outside the task.
The number of possible reward values and stimuli has no absolute limit. However, the number of neurons and their possible spike outputs are limited. If neuronal outputs were evenly allocated across all possible reward values, there would be little discrimination between rewards. Neurons in the
orbitofrontal cortex of the monkey discriminate between different rewards on the basis of their
relative preferences (Tremblay and Schultz, 1999). For example, consider a neuron that is active
when a more preferred reward (such as a piece of apple) is expected rather than a less preferred
reward (such as cereal). The same neuron shows higher activity, in a different trial, when an even
more preferred reward (such as raisin) is expected rather than the previously preferred reward of
apple. Thus, rewards may influence each other, and the value of a reward can depend on other
available rewards. Cromwell and Schultz (2003) have shown that single neurons within the anterior
striatum distinguish between minute differences in reward magnitude (<0.1 ml). These findings
parallel with the adaptive coding of dopaminergic midbrain activity found in the Tobler et al. (2005)
study. Cromwell et al. (2005) suggested that the shift in reward processing due to the different preferences of the animal may reflect the adaptation of responses to the current reward distribution. For linear, monotonic responses, this can be expressed as $y = a + b(x - p)$, where $b$ represents reward sensitivity, $p$ represents the shift of the current distribution, and $a$ is a constant. It is possible that immediate past experience sets up a prediction about the mean and range of future rewards. Such a prediction would allow the brain to use its full coding potential, thus optimising its response, within this distribution.
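A small sketch of this linear adaptive-coding scheme may help; the reward distributions and the constants below are hypothetical, chosen only to show how the same reward can evoke different responses in different contexts.

```python
# Sketch of adaptive linear coding of reward magnitude: y = a + b * (x - p).
# The response re-centres (p) and re-scales (b) to the current reward
# distribution; the distributions and constants are hypothetical.

def adapted_response(x, reward_range, a=0.5):
    lo, hi = reward_range
    p = (lo + hi) / 2.0      # shift: mean of the current distribution
    b = 1.0 / (hi - lo)      # sensitivity: gain set by the range
    return a + b * (x - p)

# The same rewards evoke different responses depending on the context:
print(adapted_response(0.5, (0.0, 1.0)))   # mid-range reward        -> 0.5
print(adapted_response(0.6, (0.0, 1.0)))   # slightly above the mean -> 0.6
print(adapted_response(0.6, (0.4, 0.6)))   # top of a narrow range   -> 1.0
```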
1.3 FMRI studies of reward processing
Although animal studies provide an unprecedented approach to studying neural mechanisms at the cellular level, the limited communication and cognitive capabilities of animals restrict the investigation of reward functions. Early neuroimaging studies have replicated the animal results in human
monetary or liquid rewards and stimuli predicting such rewards activates reward structures
previously characterised in neurophysiological experiments, notably the striatum, orbitofrontal
cortex, amygdala and dopaminergic midbrain. As human blood oxygen level dependent (BOLD)
responses most likely reflect presynaptic inputs to neurons (Logothetis et al. 2001), some of these
activations may be due to the known dopaminergic inputs to these structures.
In a Positron Emission Tomography (PET) study, Thut et al. (1997) found activation of left frontal cortex, thalamus and midbrain in a go/no-go task using monetary rewards. Arana et al. (2003), in a PET study, used a restaurant task in which subjects considered or chose items from a menu tailored to the subjects' preferences. The amygdala and medial orbitofrontal cortex were activated when considering the appetitive incentive values of foods. Activation in the amygdala correlated with the subjects' incentive ratings, and activation in medial orbitofrontal cortex correlated with the difficulty of the choice being made, suggesting its role in goal selection.
Kirsch et al. (2003) used a differential conditioning paradigm and asked participants to
perform a reaction time task. Participants were rewarded (or not rewarded) with a monetary or
verbal feedback (fast or slow). Activity related to anticipation of reward in substantia nigra and
nucleus accumbens was stronger with highly motivating stimuli (monetary reward) compared to
less motivating ones (verbal feedback).
Gottfried et al. (2003) trained subjects with picture-odour associations while performing a
visuospatial discrimination task. After training, subjects received the same contingencies in two
further sessions. Subjects were fed to satiety selectively on one of the two food-based olfactory
rewards in between the two sessions. Activity in the amygdala and OFC declined for the CS
predicting the devalued odour, while the activity in the ventral striatum, insula and cingulate cortex
not only showed decreased responses to the CS predicting the devalued odour, but also increased
responses to the CS predicting non-devalued odour. Their results suggest that amygdala and OFC
encode the current value of the reward representations accessible to the predictive cues.
Ramnani et al. (2004) trained participants with a Pavlovian conditioning paradigm in which two conditioned stimuli predicted the occurrence of a 1 pound reward or no reward, respectively. Participants were then scanned while a few of the trials had their cue-outcome contingencies reversed. Unexpected rewards evoked activation in the orbitofrontal cortex, frontal pole, parahippocampal cortex and cerebellum. Unexpected reward failure evoked activity in the frontal pole and the temporal cortex.
Cox et al. (2005) used a simple card game (guessing whether the number on the back of a
card shown face up was higher or lower than the value shown) to mask awareness of a conditioning
task in which discriminable visual patterns were associated with monetary reward and loss. The
patterns were then presented one at a time without reward or negative feedback. Subjects indicated
their preference when two patterns were presented simultaneously. This procedure allowed the
authors to test the brain activations to conditioned stimuli in the absence of explicit reward
anticipation. Activity was observed in ventral striatum and OFC when reward was compared with
negative feedback. When passively viewing the conditioned stimuli, activation was observed in the
OFC. Thus OFC is involved in representing both rewarding and conditioned stimuli that have
acquired reward value.
O'Doherty et al. (2006) used Pavlovian conditioning and determined subjects' preferences for five different food flavours that were associated with five fractals. Subjects performed a keypress indicating the spatial location (left or right) of the fractal. Using a temporal difference model to learn the value signal, they found that the ventral midbrain region showed a linear response to preferences. However, the ventral striatum showed bivalent responses, with maximal responses to the most and least preferred foods, possibly consistent with suggestions that the ventral striatum might be involved in both appetitive and aversive learning (Jensen et al., 2003; Knutson et al., 2001; Seymour et al., 2004). Given that no aversive stimuli were used by O'Doherty et al. (2006), a further possibility is that the ventral striatum codes a relative value of the stimuli, rather than their objective value independent of the context in which the stimuli are presented (Cromwell et al., 2005).
Recently, Bray and O'Doherty (2007) used a classical conditioning procedure in which subjects performed a simple spatial identification task to indicate the side (left or right) on which a fractal was presented. Participants received reinforcement on 50% of trials with attractive or unattractive faces. They found significant prediction-error-related activity in the ventral striatum for attractive compared with unattractive faces. In contrast, the amygdala showed positive correlations with the prediction error signals of both attractive and unattractive faces.
1.3.1 Motivational Valence
A number of neuroimaging studies have found distinct neural systems processing reward and
punishment information. Delgado et al. (2000) asked participants to guess whether the value of the
card was higher or lower than 5. Participants received monetary reward ($1.00), punishment ($0.50)
or neutral feedback. They found that bilateral caudate in the dorsal striatum showed differential
activation based on the valence of the feedback. A sharp decrease of response below baseline was
observed after a punishment, while activations were sustained following a reward. Delgado et al. (2004) found that activity in the caudate nucleus was more robust in early phases of learning and decreased for the reward-feedback signal as learning progressed for well-predicted cues. They suggest a role for the caudate in the initial acquisition of contingencies by trial-and-error learning, with its activity modulated as a function of learning and predictability.
Knutson et al. (2001) showed that anticipation of reward significantly increased activation in the nucleus accumbens, whereas activation in the medial caudate increased in anticipation of both rewards and punishments. Nucleus accumbens activity was also correlated with self-reported happiness. Cues signalled the potential reward ($0.20, $1.00, or $5.00), punishment ($0.20, $1.00, or $5.00) or no monetary outcome. Subjects performed a button-press task in response to a target to win or avoid losing money, and task difficulty was adjusted so that subjects would succeed on ~66% of target responses.
The lateral area of orbitofrontal cortex (OFC) is activated following a punishing outcome and the medial OFC is activated following a rewarding outcome (see Elliott et al., 2000 for a review). O'Doherty et al. (2001) used a visual reversal-learning task in which the choice of a correct stimulus led to a probabilistically determined monetary reward and the choice of an incorrect stimulus led to a monetary loss. They found a medial-lateral distinction for rewarding and
punishing outcomes, respectively. O'Doherty et al. (2003) used a reversal task in which selection of a correct stimulus led to a 70% probability of receiving monetary reward and a 30% probability of monetary punishment. The incorrect stimulus had the reverse contingency. The reversal occurred on a random trial after a criterion of five selections of the correct stimulus had been reached. They found that ventromedial and orbital PFC are not only involved in representing the valence of outcomes, but also signal subsequent behavioural choice. The anterior insula / caudolateral OFC was related to behavioural choice and was active in trials that required a switch in stimulus choice on subsequent trials.
Jensen et al. (2003) found ventral striatum activation in anticipation of aversive stimuli
(unpleasant cutaneous electrical stimulation) that was not a consequence of relief after the aversive
event. Further, the ventral striatum was active regardless of whether there was an opportunity to
avoid the stimulus or not. Nitschke et al. (2006) used a passive viewing task of aversive and neutral
pictures. They found activation in dorsal amygdala, anterior insula, dorsal ACC, right DLPFC, and
posterior OFC during both anticipation and viewing of aversive pictures. Further, rostral ACC,
superior sector of right DLPFC and medial sectors of OFC were more responsive to anticipation of
aversive pictures than in response to them.
The relief obtained by avoidance of an aversive stimulus can itself be a reward. Kim et al.
(2006) used a monetary instrumental task in which participants chose between a pair of fractals that marked the onset of four trial types predicting reward, loss avoidance, neutral feedback, or no feedback, with 60% or 30% probability. They found that medial OFC activity increased after
receiving reward or avoiding loss, and decreased after failing to obtain a reward or receiving an
aversive outcome. These responses cannot be explained as a prediction error, because the activity
does not decrease over the course of learning. They also found signed reward prediction error signals in the ventral striatum on reward trials but not on avoidance trials, possibly indicating that monetary loss, as a secondary reinforcer, might be processed differently in the ventral striatum than primary reinforcers such as aversive flavours or pain.
1.3.2 Reward prediction errors
The reward responses comply with formalisms of learning theory such as the reward prediction
error hypothesis. Berns et al. (2001) delivered fruit juice and water to subjects in a temporally predictable or unpredictable manner. Unpredictability of rewards resulted in significant activity in the nucleus accumbens and medial orbitofrontal cortex, while predictability resulted in activation
predominantly in the superior temporal gyrus. Unlike classical conditioning, the source of prediction in Berns et al. (2001) was the sequence of stimuli. Moreover, the subjects' preference for juice or water was reflected in activity in sensorimotor cortex, but not in reward regions. Pagnoni et al. (2002) demonstrated that activity in the ventral striatum is time-locked to reward prediction errors when a juice reward expected 4 seconds after a cue-initiated button press was delayed. This finding was not replicated when the juice was replaced by a visual stimulus, indicating that the ventral striatum selectively encodes rewarding events rather than any salient stimulus in general. McClure et al. (2003) used a classical conditioning paradigm to test for temporal prediction errors: a juice reward expected at a 6-second delay after a light cue was delivered only after a further delay of 4 seconds. Thus a negative prediction error would occur for the absence of juice, while a positive prediction error would occur for the unexpected delivery of juice at a later time. They found that both these prediction errors correlated with activity in the left putamen.
The real-time extension of the Rescorla-Wagner learning rule, the TD model, has been successfully used to explain brain activity in tasks involving prediction error. O'Doherty et al. (2003) used appetitive conditioning with taste reward. Three fractals were associated with glucose, a neutral taste, or no taste. Reward was omitted or unexpectedly delivered in some of the trials. Regression analysis with a temporal difference model revealed significant correlation of activity in the ventral striatum and OFC with the error signal, suggesting their role in reward-related learning. Seymour et al. (2004) used a second-order pain learning task in which two visual cues preceded delivery of high or low pain. While the second cue fully predicted the strength of the subsequently experienced pain, the first cue allowed only a probabilistic prediction. They demonstrated that activity in the ventral striatum and the anterior insula displays a marked correspondence to the signals predicted by temporal difference models.
Seymour et al. (2007) used a probabilistic Pavlovian task to compare winning or losing money in two conditions, in which the alternative outcome was respectively winning/losing nothing, or losing/winning money. A positive reward prediction error can be obtained by contrasting a bivalent £1.00 win with a univalent £1.00 outcome. Similarly, a positive loss prediction error is obtained by contrasting a bivalent £1.00 loss with a univalent £1.00 loss. The opposite contrasts reveal negative prediction errors. They found that striatal activation reflected positively signed prediction errors in an anterior region for rewards and more posteriorly for losses.
1.3.3 Neuroimaging of basic reward parameters
Animal learning theory and microeconomic theory have suggested a number of basic reward parameters, such as magnitude, probability and delay, that are involved in processing reward information. In a parametric study, Elliott et al. (2003) found non-linear responses in
orbitofrontal cortex with increasing magnitudes of financial reward. They parametrically varied the
monetary reward value (10, 20, 50 pence and 1 pound) while subjects performed a simple target
detection task. They found that amygdala, striatum and dopaminergic midbrain responded
regardless of the reward value, while medial and lateral OFC responded non-linearly showing
maximum response for the lowest and highest values. Galvan et al. (2005) used a delayed-response spatial-choice task in which subjects were presented with small, medium or large amounts of coins. The exact value of each reward was not disclosed, to prevent subjects from counting their total money after each trial. They found reward-magnitude-related responses in the nucleus accumbens, thalamus and orbitofrontal cortex. Interestingly, only the nucleus accumbens showed a shift in activity from the reward to the predicting cue during later stages of learning. A frontostriatal shift in activity can be suggested (Pasupathy and Miller, 2005), as the OFC responses contrasted with the accumbens activity: responses in OFC increased to the rewarded response rather than to the predictive cue.
Breiter et al. (2001) showed subjects prospects consisting of a set of three outcomes, one of which was awarded after a delay. Subjects could win or lose money in these prospects.
Three kinds of prospects (good: $10, $2.50, $0, intermediate: $2.50, $0, -$1.50 and bad: $0, -$1.50,
-$6) were used. Subjects could either win, lose or retain their initial endowment of $50. In the good
prospect, subjects could win additional money or retain their earnings, in the bad prospect subjects
could retain or lose money and in the intermediate prospect, subjects could win/retain/lose money.
Haemodynamic responses in the amygdala and orbital gyrus tracked the expected values of the
prospects. Sustained outcome phase responses in nucleus accumbens, amygdala, and hypothalamus
were ordered as a function of monetary payoff on the good prospect. They found a large overlap
between the neural activations in the prospect and outcome phase and little evidence for anatomical
segregation between the prospect and outcome phases. According to decision affect theory (Mellers
et al., 1997), responses to a given outcome depend on counterfactual comparisons. Thus $0 on a
good prospect will be experienced as a loss and the same outcome in a bad prospect would be
experienced as a win. Partial evidence for this was observed in the time courses of the nucleus accumbens and amygdala for the good and bad prospects, but not for the intermediate prospect.
McClure et al. (2004) examined neural correlates of time discounting while subjects made a series of choices between monetary reward options that varied in delay to delivery (from the same day to 6 weeks later). They found that ventral striatum, medial OFC, medial PFC, posterior cingulate and left
posterior hippocampus were related to choice of immediate rewards. In contrast, regions of lateral
prefrontal cortex and posterior parietal cortex are engaged uniformly by inter-temporal choices
irrespective of delay. Recently, McClure et al. (2007) used primary rewards (fruit juice or water) with time delays of minutes instead of weeks and found activation patterns similar to those in their previous study. When the delivery of all rewards was offset by 10 min, there was no longer differential activity in limbic reward-related areas, suggesting that time discounting is not a relative concept.
Dreher et al. (2006) used three slot machines with two different reward values ($10, $20)
and reward probabilities (0.25, 0.5), so that one pair of slot machines had the expected value
matched. To avoid counterfactual comparison, a common outcome of no reward with a probability
of 1 served as a fourth slot machine. They found that midbrain region responded transiently to
higher reward probability at the time of the cue and to lower reward probability at the time of the
reward outcome and in a sustained fashion to reward uncertainty during the delay period. These
results parallel those found in electrophysiological studies of primates (Fiorillo et al., 2003). The midbrain activations could not be explained by an increase in expected value alone: when comparing the two conditions with equal expected values, the midbrain was robustly activated in anticipation of an uncertain reward (50% probability) of low magnitude ($10) compared with a reward of lower probability (25%) but higher magnitude ($20). A frontal
network covaried with the reward prediction error signal both at the time of the cue and at the time
of the outcome. The ventral striatum (putamen) showed sustained activation that covaried with
maximum reward uncertainty during reward anticipation. Their results suggest distinct functional
networks encoding statistical properties of reward information.
Recently, Liu et al. (2007) used a monetary decision-making task in which participants chose whether to bank or bet a certain number of chips. A decision to bank, or losing the bet, made them start over from one chip, while the wager doubled if they won the bet. However, participants witnessed the outcome even after they banked. They contrasted three reward processes: reward anticipation (bet vs. bank), outcome monitoring (win vs. loss) and choice evaluation (right vs. wrong). They found that the striatum and medial/middle orbitofrontal cortex were activated by positive reward anticipation, winning outcomes and the evaluation of right choices, whereas lateral orbitofrontal cortex, anterior insula, superior temporal pole and dorsomedial frontal cortex were activated by negative reward anticipation, losing outcomes and the evaluation of wrong choices. These results suggest that the valence of reward information and counterfactual comparison (right vs. wrong choices) result in a functional dissociation between frontal and striatal regions and are crucial in reward-seeking behaviour.
1.3.4 Reward processing and Instrumental conditioning
It is suggested that the ventral and dorsal striatum are important in stimulus-reward and stimulus-response-reward associations, corresponding to Pavlovian and instrumental conditioning, respectively. O'Doherty et al. (2004) dissociated the actor and critic components in an instrumental conditioning paradigm using yoked Pavlovian conditioning (involving the same value predictions, but not requiring any action) as a control condition. Participants chose between two stimuli predicting high and low probabilities (60% or 30%) of juice reward or an affectively neutral solution. They found the nucleus accumbens and ventral putamen to be active during instrumental as well as Pavlovian conditioning. In contrast, activity in the left caudate nucleus was significantly greater during instrumental conditioning. These results suggest dissociable contributions of the ventral and dorsal striatum as the critic and actor, respectively, during instrumental conditioning.
Instrumental conditioning involves at least two distinct learning systems: a goal-directed system that learns the associations between responses and the incentive value of the outcomes (stimulus-response-outcome), and a habit system that involves stimulus-response learning. Valentin et al. (2007) used an outcome-devaluation paradigm in which participants were trained to perform different instrumental actions for liquid-food rewards (tomato juice, chocolate milk) delivered with 40% probability. The participants were then fed to satiety on one of the foods and tested again in extinction. To maintain some degree of responding during extinction, orange juice was used with equal probability (30%) for both actions before as well as after the devaluation sessions. This paradigm allowed them to separate brain regions associated with goal-directed action from those involved in habit learning (regions not influenced by outcome devaluation). They found that medial and lateral OFC showed a strong modulation in activity during selection of a devalued compared with a non-devalued action, suggesting their role in goal-directed learning.
There have been proposals that the striatum is involved in coding stimulus saliency rather
than having an exclusive role in reward processing per se. Zink et al. (2004) investigated the effect
of saliency, which was maximal when receipt of monetary rewards was contingent on subjects' performance and minimal when receipt of money was unrelated to the task. They found that behaviourally salient monetary rewards activate the human striatum, suggesting its role in the saliency of rewards rather than in value or hedonic feelings. Tricomi et al. (2004) reported that the
caudate nucleus was robustly activated when the subjects thought that whether they won or lost
money was contingent on the button press (i.e. action). Elliott et al. (2004) investigated whether the neural responses to financial reward depend on instrumental action, using a 2x2 factorial design with movement and reward as the two factors. Subjects performed a simple target detection
task. The trial types were indicated by coloured squares and hence rewards were fully predictable
and reward expectation remained fixed. Significant enhancement of the reward-related response
under the movement condition was seen in the dopaminergic midbrain, dorsal striatum and the
amygdala.
1.4 Rationale
The recently developed functional Magnetic Resonance Imaging (fMRI) methods provide a unique
opportunity to extend reward work to humans, first by replicating, and thus referencing, the reward
work done in monkeys, and then by investigating typical 'human' tasks that are difficult to approach
in animals.
As mentioned earlier, rewards have schematically three functions: they induce learning,
approach behaviour, and positive emotions. The first of the reward functions (learning) can be well
investigated in animals, for example using classical (Pavlovian) and instrumental (operant)
conditioning. The second reward function (approach behaviour) can also be investigated in animals,
but the work is limited due to their limited communication and cognitive abilities. The third reward
function (subjective feelings of pleasure) is very difficult to investigate in animals, and humans
appear to be the subjects of choice.
Monetary rewards are uniquely human. The importance of money in everyday life makes it a strong reinforcer. Neurophysiological studies in animals have provided the primary basis for speculations about the brain areas that might process reward information in the human brain. An initial neuroimaging study in humans using Positron Emission Tomography revealed that alphanumerically presented monetary reward was more reinforcing than positive reinforcement with the word "OK", producing stronger activation in the dorsolateral and orbital frontal cortex, midbrain and thalamus (Thut et al., 1997).
An early success of fMRI in studying reward processing in humans was obtaining measurable BOLD signal changes in the orbitofrontal cortex (OFC), amygdala, and ventral striatum/nucleus accumbens (see McClure et al., 2004 for a review), regions that had previously been implicated in reward processing in non-human primates. A wide range of rewarding stimuli, including primary rewards (liquids, smells, sexual stimuli), abstract rewards (money, positive reinforcement) and social rewards (beautiful faces, pleasant touch), activate the same network of brain areas. The findings
from numerous animal and human studies have led researchers to suggest the roles that different brain areas might play in processing reward information. The midbrain and ventral striatum might be involved in coding reward prediction errors, while the orbitofrontal cortex might be involved in evaluating rewards and in the relative processing of rewards. The amygdala, though traditionally believed to process aversive and fear-inducing stimuli, is now generally believed to be involved in processing reinforcer intensity, both appetitive and aversive.
Bibliography
Adcock, R., Thangavel, A., Whitfield-Gabrieli, S.,
Knutson, B., and Gabrieli, J. (2006). Reward-
motivated learning: mesolimbic activation precedes
memory formation. Neuron, 50:507-517.
Anderson, A., Christoff, K., Stappen, I., Panitz, D.,
Ghahremani, D., Glover, G., Gabrieli, J., and Sobel,
N. (2003). Dissociated neural representations of
intensity and valence in human olfaction. Nat.
Neurosci., 6:196-202.
Apicella, P., Ljungberg, T., Scarnati, E., and Schultz,
W. (1991). Responses to reward in monkey dorsal
and ventral striatum. Exp Brain Res, 85(3):491-500.
Arana, F., Parkinson, J., Hinton, E., Holland, A.,
Owen, A., and Roberts, A. (2003). Dissociable
contributions of the human amygdala and
orbitofrontal cortex to incentive motivation and
goal selection. J. Neurosci., 23:9632-9638.
Bandettini, P.A. (1994). Magnetic resonance imaging
of human brain activation using endogenous
susceptibility contrast, PhD Thesis, Medical
College of Wisconsin.
Bandettini, P., Wong, E., Jesmanowicz, A., Hinks, R.,
and Hyde, J. (1994). Spin-echo and gradient-echo
EPI of human brain activation using BOLD
contrast: a comparative study at 1.5 T. NMR
Biomed, 7:12-20.
Bayer, H. and Glimcher, P. (2005). Midbrain dopamine
neurons encode a quantitative reward prediction
error signal. Neuron, 47:129-141.
Beaver, JD, Lawrence, AD, van Ditzhuijzen, J, Davis,
MH, Woods, A, Calder, AJ (2006). Individual
differences in reward drive predict neural responses
to images of food. J. Neurosci., 26, 19:5160-6.
Bensafi, M, Sobel, N, Khan, RM (2007). Hedonic-
specific activity in piriform cortex during odor
imagery mimics that during odor perception. J.
Neurophysiol., 98, 6:3254-62.
Berger, TW, Alger, B, Thompson, RF (1976).
Neuronal substrate of classical conditioning in the
hippocampus. Science, 192, 4238:483-5.
Bernoulli, D. (1738/1954). Exposition of a new theory
on the measurement of risk. Econometrica, 22:23-36 (translated from Latin).
Berns, G. (1999). Functional neuroimaging. Life Sci.,
65:2531-2540.
Berns, G., McClure, S., Pagnoni, G., and Montague, P.
(2001). Predictability modulates human brain
response to reward. J. Neurosci., 21:2793-2798.
Berridge, K. and Robinson, T. (1998). What is the role
of dopamine in reward: hedonic impact, reward
learning, or incentive salience? Brain Res. Brain
Res. Rev., 28:309-369.
Berridge, K. and Robinson, T. (2003). Parsing reward.
Trends Neurosci., 26:507-513.
Boser, B.E., Guyon, I., and Vapnik, V. (1992). A
training algorithm for optimal margin classifiers. In
Proceedings of the Fifth Annual Workshop on
Computational Learning Theory, (ACM Press) pp.
144-152.
Bowman, C. and Turnbull, O. (2003). Real versus
facsimile reinforcers on the Iowa Gambling Task.
Brain Cogn, 53:207-210.
Boynton, G., Engel, S., Glover, G., and Heeger, D.
(1996). Linear systems analysis of functional
magnetic resonance imaging in human V1. J.
Neurosci., 16:4207-4221.
Bray, S. and O'Doherty, J. (2007). Neural coding of
reward-prediction error signals during classical
conditioning with attractive faces. J. Neurophysiol.,
97:3036-3045.
Bray, S., Shimojo, S., and O'Doherty, J. (2007). Direct
instrumental conditioning of neural activity using
functional magnetic resonance imaging-derived
reward feedback. J. Neurosci., 27:7498-7507.
Breiter, H., Aharon, I., Kahneman, D., Dale, A., and
Shizgal, P. (2001). Functional imaging of neural
responses to expectancy and experience of
monetary gains and losses. Neuron, 30:619-639.
Brett, M., Leff, A., Rorden, C., and Ashburner, J.
(2001). Spatial normalization of brain images with
focal lesions using cost function masking.
Neuroimage, 14:486-500.
Bunzeck, N. and Duzel, E. (2006). Absolute coding of
stimulus novelty in the human substantia
nigra/VTA. Neuron, 51:369-379.
Buxton, R., Wong, E., and Frank, L. (1998). Dynamics
of blood flow and oxygenation changes during
brain activation: the balloon model. Magn Reson
Med, 39:855864.
Camerer, C.F., Hogarth, R.M. (1999). The Effects of
Financial Incentives in Experiments: A Review and
Capital-Labor-Production Framework. Journal of
Risk and Uncertainty, 19:742.
Carlson, T. A., Schrater, P., and He, S. (2003). Patterns of activity in the categorical representations of objects. J Cogn Neurosci, 15(5):704–717.
Chein, J. M. and Schneider, W. (2003). Designing effective fMRI experiments. In Grafman, J. and Robertson, I., editors, Handbook of Neuropsychology. Elsevier Science B.V., Amsterdam.
Childress, A. R., Franklin, T., Listerud, J., Acton, P. D., and O'Brien, C. P. (2002). Neuroimaging of cocaine craving states: cessation, stimulant administration, and drug cue paradigms. In Davis, K. L., Charney, D., Coyle, J. T., and Nemeroff, C., editors, Neuropsychopharmacology: The Fifth Generation of Progress, pp. 1575–1590.
Cox, D. D. and Savoy, R. L. (2003). Functional magnetic resonance imaging (fMRI) "brain reading": detecting and classifying distributed patterns of fMRI activity in human visual cortex. Neuroimage, 19(2 Pt 1):261–270.
Cohen, M. and Bookheimer, S. (1994). Localization of brain function using magnetic resonance imaging. Trends Neurosci., 17:268–277.
Constable, R. (1995). Functional MR imaging using gradient-echo echo-planar imaging in the presence of large static field inhomogeneities. J Magn Reson Imaging, 5:746–752.
Cox, S., Andrade, A., and Johnsrude, I. (2005). Learning to like: a role for human orbitofrontal cortex in conditioned reward. J. Neurosci., 25:2733–2740.
Critchley, H. and Rolls, E. (1996). Hunger and satiety modify the responses of olfactory and visual neurons in the primate orbitofrontal cortex. J. Neurophysiol., 75:1673–1686.
Cromwell, H. C., Hassani, O. K., and Schultz, W. (2005). Relative reward processing in primate striatum. Exp Brain Res, 162(4):520–525.
Cromwell, H. C. and Schultz, W. (2003). Effects of expectations for different reward magnitudes on neuronal activity in primate striatum. J Neurophysiol, 89(5):2823–2838.
Cusack, R., Russell, B., Cox, S., De Panfilis, C., Schwarzbauer, C., and Ansorge, R. (2005). An evaluation of the use of passive shimming to improve frontal sensitivity in fMRI. Neuroimage, 24:82–91.
Dadds, M., Bovbjerg, D., Redd, W., and Cutmore, T. (1997). Imagery in human classical conditioning. Psychol Bull, 122:89–103.
Dale, A. (1999). Optimal experimental design for event-related fMRI. Hum Brain Mapp, 8:109–114.
D'Ardenne, K., McClure, S., Nystrom, L., and Cohen, J. (2008). BOLD responses reflecting dopaminergic signals in the human ventral tegmental area. Science, 319:1264–1267.
Davatzikos, C., Ruparel, K., Fan, Y., Shen, D. G., Acharyya, M., Loughead, J. W., Gur, R. C., and Langleben, D. D. (2005). Classifying spatial patterns of brain activity with machine learning methods: application to lie detection. Neuroimage, 28(3):663–668.
De Houwer, J., Thomas, S., and Baeyens, F. (2001). Associative learning of likes and dislikes: a review of 25 years of research on human evaluative conditioning. Psychological Bulletin, 127:853–869.
Deichmann, R., Gottfried, J., Hutton, C., and Turner, R. (2003). Optimized EPI for fMRI studies of the orbitofrontal cortex. Neuroimage, 19:430–441.
Delgado, M., Miller, M., Inati, S., and Phelps, E. (2005). An fMRI study of reward-related probability learning. Neuroimage, 24:862–873.
Delgado, M., Nystrom, L., Fissell, C., Noll, D., and Fiez, J. (2000). Tracking the hemodynamic responses to reward and punishment in the striatum. J. Neurophysiol., 84:3072–3077.
Delgado, M. R., Stenger, V. A., and Fiez, J. A. (2004). Motivation-dependent responses in the human caudate nucleus. Cereb. Cortex, 14(9):1022–1030.
Demattè, M., Osterbauer, R., and Spence, C. (2007). Olfactory cues modulate facial attractiveness. Chem. Senses, 32:603–610.
Dickinson, A. (1980). Contemporary Animal Learning Theory. Cambridge: Cambridge University Press.
Djordjevic, J., Zatorre, R., Petrides, M., Boyle, J., and Jones-Gotman, M. (2005). Functional neuroimaging of odor imagery. Neuroimage, 24:791–801.
Dreher, J., Kohn, P., and Berman, K. (2006). Neural coding of distinct statistical properties of reward information in humans. Cereb. Cortex, 16:561–573.
Elliott, R., Dolan, R., and Frith, C. (2000). Dissociable functions in the medial and lateral orbitofrontal cortex: evidence from human neuroimaging studies. Cereb. Cortex, 10:308–317.
Elliott, R., Newman, J., Longe, O., and Deakin, J. (2003). Differential response patterns in the striatum and orbitofrontal cortex to financial reward in humans: a parametric functional magnetic resonance imaging study. J. Neurosci., 23:303–307.
Elliott, R., Newman, J., Longe, O., and Deakin, J. (2004). Instrumental responding for rewards is associated with enhanced neuronal response in subcortical reward systems. Neuroimage, 21:984–990.
Fawcett, T. (2006). An introduction to ROC analysis. Pattern Recognition Letters, 27:861–874.
Fernie, G. and Tunney, R. (2006). Some decks are better than others: the effect of reinforcer type and task instructions on learning in the Iowa Gambling Task. Brain Cogn, 60:94–102.
Fiorillo, C., Tobler, P., and Schultz, W. (2003). Discrete coding of reward probability and uncertainty by dopamine neurons. Science, 299:1898–1902.
Friston, K. (1997). Testing for anatomically specified regional effects. Human Brain Mapping, 5:133–136.
Friston, K. J., Ashburner, J., Frith, C. D., Poline, J. B., Heather, J. D., and Frackowiak, R. S. J. (1995a). Spatial registration and normalisation of images. Human Brain Mapping, 2:165–189.
Friston, K. J., Holmes, A. P., Worsley, K. J., Poline, J. B., Frith, C. D., and Frackowiak, R. S. J. (1995b). Statistical parametric maps in functional imaging: a general linear approach. Human Brain Mapping, 2:189–210.
Friston, K., Price, C., Fletcher, P., Moore, C., Frackowiak, R., and Dolan, R. (1996). The trouble with cognitive subtraction. Neuroimage, 4:97–104.
Gallistel, C. (1990). Representations in animal cognition: an introduction. Cognition, 37:1–22.
Galvan, A., Hare, T., Davidson, M., Spicer, J., Glover, G., and Casey, B. (2005). The role of ventral frontostriatal circuitry in reward-based learning in humans. J. Neurosci., 25:8650–8656.
Genovese, C., Lazar, N., and Nichols, T. (2002). Thresholding of statistical maps in functional neuroimaging using the false discovery rate. NeuroImage, 15:870–878.
Gneezy, U. and Rustichini, A. (2000). Pay enough or don't pay at all. The Quarterly Journal of Economics, 115:791–810.
Gottfried, J., Deichmann, R., Winston, J., and Dolan, R. (2002a). Functional heterogeneity in human olfactory cortex: an event-related functional magnetic resonance imaging study. J. Neurosci., 22:10819–10828.
Gottfried, J., O'Doherty, J., and Dolan, R. (2002b). Appetitive and aversive olfactory learning in humans studied using event-related functional magnetic resonance imaging. J. Neurosci., 22:10829–10837.
Gottfried, J., O'Doherty, J., and Dolan, R. (2003). Encoding predictive reward value in human amygdala and orbitofrontal cortex. Science, 301:1104–1107.
Gusnard, D. and Raichle, M. (2001). Searching for a baseline: functional imaging and the resting human brain. Nat. Rev. Neurosci., 2:685–694.
Hampton, A., Adolphs, R., Tyszka, M., and O'Doherty, J. (2007). Contributions of the amygdala to reward expectancy and choice signals in human prefrontal cortex. Neuron, 55:545–555.
Hassabis, D., Kumaran, D., Vann, S., and Maguire, E. (2007). Patients with hippocampal amnesia cannot imagine new experiences. Proc. Natl. Acad. Sci. U.S.A., 104:1726–1731.
Haxby, J. V., Gobbini, M. I., Furey, M. L., Ishai, A., Schouten, J. L., and Pietrini, P. (2001). Distributed and overlapping representations of faces and objects in ventral temporal cortex. Science, 293(5539):2425–2430.
Haynes, J. D. and Rees, G. (2006). Decoding mental states from brain activity in humans. Nat. Rev. Neurosci., 7(7):523–534.
Heeger, D. and Ress, D. (2002). What does fMRI tell us about neuronal activity? Nat. Rev. Neurosci., 3:142–151.
Henson, R., Rugg, M., and Friston, K. (2001). The choice of basis functions in event-related fMRI. NeuroImage, 13(6):127 (Supplement 1).
Hikosaka, O., Sakamoto, M., and Usui, S. (1989). Functional properties of monkey caudate neurons. III. Activities related to expectation of target and reward. J. Neurophysiol., 61:814–832.
Holland, P. (1990). Event representation in Pavlovian conditioning: image and action. Cognition, 37:105–131.
Hollerman, J. R. and Schultz, W. (1998). Dopamine neurons report an error in the temporal prediction of reward during learning. Nat Neurosci, 1(4):304–309.
Holmes, A. and Friston, K. (1998). Generalisability, random effects and population inference. NeuroImage, 7:S754.
Holt, C. A. and Laury, S. K. (2002). Risk aversion and incentive effects. The American Economic Review, 92:1644–1655.
Horvitz, J. (2000). Mesolimbocortical and nigrostriatal dopamine responses to salient non-reward events. Neuroscience, 96:651–656.
Hulvershorn, J., Bloy, L., Gualtieri, E., Leigh, J., and Elliott, M. (2005). Spatial sensitivity and temporal response of spin echo and gradient echo BOLD contrast at 3 T using peak hemodynamic activation time. Neuroimage, 24:216–223.
Ishai, A., Ungerleider, L., and Haxby, J. (2000). Distributed neural systems for the generation of visual images. Neuron, 28:979–990.
Jensen, J., McIntosh, A., Crawley, A., Mikulis, D., Remington, G., and Kapur, S. (2003). Direct activation of the ventral striatum in anticipation of aversive stimuli. Neuron, 40:1251–1257.
Johnson, M. and Bickel, W. (2002). Within-subject comparison of real and hypothetical money rewards in delay discounting. J Exp Anal Behav, 77:129–146.
Kahneman, D. and Tversky, A. (1979). Prospect theory: an analysis of decision under risk. Econometrica, 47:263–291.
Kamin, L. J. (1969). Predictability, surprise, attention and conditioning. In Campbell, B. A. and Church, R. M., editors, Punishment and Aversive Behavior, pp. 279–296. New York: Appleton-Century-Crofts.
Kamitani, Y. and Tong, F. (2005). Decoding the visual and subjective contents of the human brain. Nat. Neurosci., 8(5):679–685.
Kim, H., Shimojo, S., and O'Doherty, J. (2006). Is avoiding an aversive outcome rewarding? Neural substrates of avoidance learning in the human brain. PLoS Biol., 4:e233.
King, D. (1973). An image theory of classical conditioning. Psychol Rep, 33:403–411.
King, D. (1974). An image theory of instrumental conditioning. Psychol Rep, 35:1115–1122.
Kirsch, P., Schienle, A., Stark, R., Sammer, G., Blecker, C., Walter, B., Ott, U., Burkart, J., and Vaitl, D. (2003). Anticipation of reward in a nonaversive differential conditioning paradigm and the brain reward system: an event-related fMRI study. Neuroimage, 20:1086–1095.
Knutson, B., Adams, C., Fong, G., and Hommer, D. (2001). Anticipation of increasing monetary reward selectively recruits nucleus accumbens. J. Neurosci., 21:RC159.
Knutson, B. and Cooper, J. (2005). Functional magnetic resonance imaging of reward prediction. Curr. Opin. Neurol., 18:411–417.
Knutson, B., Taylor, J., Kaufman, M., Peterson, R., and Glover, G. (2005). Distributed neural representation of expected value. J. Neurosci., 25:4806–4812.
Kobayashi, M., Takeda, M., Hattori, N., Fukunaga, M., Sasabe, T., Inoue, N., Nagai, Y., Sawada, T., Sadato, N., and Watanabe, Y. (2004). Functional imaging of gustatory perception and imagery: top-down processing of gustatory signals. Neuroimage, 23:1271–1282.
Konorski, J. (1967). Integrative Action of the Brain. Chicago: University of Chicago Press.
Kosslyn, S. (1988). Aspects of a cognitive neuroscience of mental imagery. Science, 240:1621–1626.
Kosslyn, S., Ganis, G., and Thompson, W. (2001). Neural foundations of imagery. Nat. Rev. Neurosci., 2:635–642.
Kosslyn, S., Shin, L., Thompson, W., McNally, R., Rauch, S., Pitman, R., and Alpert, N. (1996). Neural effects of visualizing and perceiving aversive stimuli: a PET investigation. Neuroreport, 7:1569–1576.
Kringelbach, M. (2005). The human orbitofrontal cortex: linking reward to hedonic experience. Nat. Rev. Neurosci., 6:691–702.
Kringelbach, M., O'Doherty, J., Rolls, E., and Andrews, C. (2003). Activation of the human orbitofrontal cortex to a liquid food stimulus is correlated with its subjective pleasantness. Cereb. Cortex, 13:1064–1071.
Kringelbach, M. and Rolls, E. (2004). The functional neuroanatomy of the human orbitofrontal cortex: evidence from neuroimaging and neuropsychology. Prog. Neurobiol., 72:341–372.
LaConte, S., Strother, S., Cherkassky, V., Anderson, J., and Hu, X. (2005). Support vector machines for temporal classification of block design fMRI data. Neuroimage, 26(2):317–329.
Lancaster, J., Woldorff, M., Parsons, L., Liotti, M., Freitas, C., Rainey, L., Kochunov, P., Nickerson, D., Mikiten, S., and Fox, P. (2000). Automated Talairach atlas labels for functional brain mapping. Hum Brain Mapp, 10:120–131.
Lauterbur, P. C. (1973). Image formation by induced local interactions. Examples employing nuclear magnetic resonance. Nature, 242:190–191.
Le Pelley, M. (2004). The role of associative history in models of associative learning: a selective review and a hybrid model. Q J Exp Psychol B, 57:193–243.
Lin, H.-T., Lin, C.-J., and Weng, R. C. (2007). A note on Platt's probabilistic outputs for support vector machines. Machine Learning, 68(3):267–276.
Lisman, J. and Grace, A. (2005). The hippocampal-VTA loop: controlling the entry of information into long-term memory. Neuron, 46:703–713.
Liu, X., Powell, D., Wang, H., Gold, B., Corbly, C., and Joseph, J. (2007). Functional dissociation in frontal and striatal areas for processing of positive and negative reward information. J. Neurosci., 27:4587–4597.
Ljungberg, T., Apicella, P., and Schultz, W. (1992). Responses of monkey dopamine neurons during learning of behavioral reactions. J Neurophysiol, 67(1):145–163.
Logothetis, N. (2002). The neural basis of the blood-oxygen-level-dependent functional magnetic resonance imaging signal. Philos. Trans. R. Soc. Lond. B Biol. Sci., 357:1003–1037.
Logothetis, N., Pauls, J., Augath, M., Trinath, T., and Oeltermann, A. (2001). Neurophysiological investigation of the basis of the fMRI signal. Nature, 412:150–157.
Mackintosh, N. J. (1975). A theory of attention: variations in the associability of stimuli with reinforcement. Psychological Review, 82:276–298.
Mackintosh, N. J. (1983). Conditioning and Associative Learning. Oxford: Oxford University Press.
Maldjian, J., Laurienti, P., Kraft, R., and Burdette, J. (2003). An automated method for neuroanatomic and cytoarchitectonic atlas-based interrogation of fMRI data sets. Neuroimage, 19:1233–1239.
Mansfield, P. (1977). Multi-planar image formation using NMR spin echoes. J. Phys. C, 10:L55–L58.
McClure, S. M. (2003). Reward prediction errors in human brain. PhD Thesis, Baylor College of Medicine.
McClure, S., Berns, G., and Montague, P. (2003). Temporal prediction errors in a passive learning task activate human striatum. Neuron, 38:339–346.
McClure, S., Ericson, K., Laibson, D., Loewenstein, G., and Cohen, J. (2007). Time discounting for primary rewards. J. Neurosci., 27:5796–5804.
McClure, S., Laibson, D., Loewenstein, G., and Cohen, J. (2004a). Separate neural systems value immediate and delayed monetary rewards. Science, 306:503–507.
McClure, S., York, M., and Montague, P. (2004b). The neural substrates of reward processing in humans: the modern role of fMRI. Neuroscientist, 10:260–268.
Mechelli, A., Price, C., Friston, K., and Ishai, A. (2004). Where bottom-up meets top-down: neuronal interactions during perception and imagery. Cereb. Cortex, 14:1256–1265.
Mellers, B. A., Schwartz, A., Ho, K., and Ritov, I. (1997). Decision affect theory: emotional reactions to the outcomes of risky options. Psychological Science, 8(6):423–429.
Miller, G. (2007). Neurobiology. A surprising connection between memory and imagination. Science, 315:312.
Mirenowicz, J. and Schultz, W. (1994). Importance of unpredictability for reward responses in primate dopamine neurons. J Neurophysiol, 72(2):1024–1027.
Mirenowicz, J. and Schultz, W. (1996). Preferential activation of midbrain dopamine neurons by appetitive rather than aversive stimuli. Nature, 379(6564):449–451.
Mourão-Miranda, J., Bokde, A. L., Born, C., Hampel, H., and Stetter, M. (2005). Classifying brain states and determining the discriminating activation patterns: Support Vector Machine on functional MRI data. Neuroimage, 28(4):980–995.
Murray, E. (2007). The amygdala, reward and emotion. Trends Cogn. Sci., 11:489–497.
Nichols, T., Brett, M., Andersson, J., Wager, T., and Poline, J. (2005). Valid conjunction inference with the minimum statistic. Neuroimage, 25:653–660.
Nitschke, J., Sarinopoulos, I., Mackiewicz, K., Schaefer, H., and Davidson, R. (2006). Functional neuroanatomy of aversion and its anticipation. Neuroimage, 29:106–116.
Norris, D., Zysset, S., Mildner, T., and Wiggins, C. (2002). An investigation of the value of spin-echo-based fMRI using a Stroop color-word matching task and EPI at 3 T. Neuroimage, 15:719–726.
O'Craven, K. and Kanwisher, N. (2000). Mental imagery of faces and places activates corresponding stimulus-specific brain regions. J Cogn Neurosci, 12:1013–1023.
O'Doherty, J. (2004). Reward representations and reward-related learning in the human brain: insights from neuroimaging. Curr. Opin. Neurobiol., 14:769–776.
O'Doherty, J. (2007). Lights, camembert, action! The role of human orbitofrontal cortex in encoding stimuli, rewards, and choices. Ann. N. Y. Acad. Sci., 1121:254–272.
O'Doherty, J., Buchanan, T., Seymour, B., and Dolan, R. (2006). Predictive neural coding of reward preference involves dissociable responses in human ventral midbrain and ventral striatum. Neuron, 49:157–166.
O'Doherty, J., Critchley, H., Deichmann, R., and Dolan, R. (2003a). Dissociating valence of outcome from behavioral control in human orbital and ventral prefrontal cortices. J. Neurosci., 23:7931–7939.
O'Doherty, J., Dayan, P., Friston, K., Critchley, H., and Dolan, R. (2003b). Temporal difference models and reward-related learning in the human brain. Neuron, 38:329–337.
O'Doherty, J., Dayan, P., Schultz, J., Deichmann, R., Friston, K., and Dolan, R. (2004). Dissociable roles of ventral and dorsal striatum in instrumental conditioning. Science, 304:452–454.
O'Doherty, J., Kringelbach, M., Rolls, E., Hornak, J., and Andrews, C. (2001). Abstract reward and punishment representations in the human orbitofrontal cortex. Nat. Neurosci., 4:95–102.
Ogawa, S., Lee, T., Kay, A., and Tank, D. (1990). Brain magnetic resonance imaging with contrast dependent on blood oxygenation. Proc. Natl. Acad. Sci. U.S.A., 87:9868–9872.
Ogawa, S., Menon, R., Tank, D., Kim, S., Merkle, H., Ellermann, J., and Ugurbil, K. (1993). Functional brain mapping by blood oxygenation level-dependent contrast magnetic resonance imaging. A comparison of signal characteristics with a biophysical model. Biophys. J., 64:803–812.
Ogawa, S., Tank, D., Menon, R., Ellermann, J., Kim, S., Merkle, H., and Ugurbil, K. (1992). Intrinsic signal changes accompanying sensory stimulation: functional brain mapping with magnetic resonance imaging. Proc. Natl. Acad. Sci. U.S.A., 89:5951–5955.
Ojemann, J., Akbudak, E., Snyder, A., McKinstry, R., Raichle, M., and Conturo, T. (1997). Anatomic localization and quantitative analysis of gradient refocused echo-planar fMRI susceptibility artifacts. Neuroimage, 6:156–167.
Osterbauer, R., Matthews, P., Jenkinson, M., Beckmann, C., Hansen, P., and Calvert, G. (2005). Color of scents: chromatic stimuli modulate odor responses in the human brain. J. Neurophysiol., 93:3434–3441.
Osterbauer, R., Wilson, J., Calvert, G., and Jezzard, P. (2006). Physical and physiological consequences of passive intra-oral shimming. Neuroimage, 29:245–253.
Pagnoni, G., Zink, C., Montague, P., and Berns, G. (2002). Activity in human ventral striatum locked to errors of reward prediction. Nat. Neurosci., 5:97–98.
Parkes, L., Schwarzbach, J., Bouts, A., Deckers, R., Pullens, P., Kerskens, C., and Norris, D. (2005). Quantifying the spatial resolution of the gradient echo and spin echo BOLD response at 3 Tesla. Magn Reson Med, 54:1465–1472.
Pasupathy, A. and Miller, E. (2005). Different time courses of learning-related activity in the prefrontal cortex and striatum. Nature, 433:873–876.
Pavlov, I. P. (1927/1960). Conditioned Reflexes. New York: Dover Publications (the 1960 edition is an unaltered republication of the 1927 translation by Oxford University Press).
Pearce, J. and Bouton, M. (2001). Theories of associative learning in animals. Annu Rev Psychol, 52:111–139.
Pearce, J. and Hall, G. (1980). A model for Pavlovian learning: variations in the effectiveness of conditioned but not of unconditioned stimuli. Psychol Rev, 87:532–552.
Pelchat, M., Johnson, A., Chan, R., Valdez, J., and Ragland, J. (2004). Images of desire: food-craving activation during fMRI. Neuroimage, 23:1486–1493.
Penny, W. D., Holmes, A. P., and Friston, K. J. (2003). Random effects analysis. In Frackowiak, R. S. J., Friston, K. J., Frith, C., Dolan, R., Price, C. J., Zeki, S., Ashburner, J., and Penny, W. D., editors, Human Brain Function. Academic Press, 2nd edition.
Pessiglione, M., Seymour, B., Flandin, G., Dolan, R., and Frith, C. (2006). Dopamine-dependent prediction errors underpin reward-seeking behaviour in humans. Nature, 442:1042–1045.
Price, C. and Friston, K. (1997). Cognitive conjunction: a new approach to brain activation experiments. Neuroimage, 5:261–270.
Price, C. J., Moore, C. J., and Friston, K. J. (1997). Subtractions, conjunctions, and interactions in experimental design of activation studies. Human Brain Mapping, 5(4):264–272.
Pylyshyn, Z. (2002). Mental imagery: in search of a theory. Behav Brain Sci, 25:157–182.
Ramnani, N., Elliott, R., Athwal, B., and Passingham, R. (2004). Prediction error for free monetary reward in the human prefrontal cortex. Neuroimage, 23:777–786.
Redgrave, P., Prescott, T., and Gurney, K. (1999). Is the short-latency dopamine response too short to signal reward error? Trends Neurosci., 22:146–151.
Rescorla, R. A. (1969). Pavlovian conditioned inhibition. Psychological Bulletin, 72:77–94.
Rescorla, R. (1988). Pavlovian conditioning: it's not what you think it is. Am Psychol, 43:151–160.
Rescorla, R. A. and Wagner, A. R. (1972). A theory of Pavlovian conditioning: variations in the effectiveness of reinforcement and nonreinforcement. In Black, A. and Prokasy, W. F., editors, Classical Conditioning II: Current Research and Theory, pp. 64–99. New York: Appleton-Century-Crofts.
Romo, R. and Schultz, W. (1990). Dopamine neurons of the monkey midbrain: contingencies of responses to active touch during self-initiated arm movements. J. Neurophysiol., 63:592–606.
Rorden, C. and Brett, M. (2000). Stereotaxic display of brain lesions. Behav Neurol, 12(4):191–200.
Schmidt, C., Boesiger, P., and Ishai, A. (2005). Comparison of fMRI activation as measured with gradient- and spin-echo EPI during visual perception. Neuroimage, 26:852–859.
Schoenbaum, G., Setlow, B., Saddoris, M. P., and Gallagher, M. (2003). Encoding predicted outcome and acquired value in orbitofrontal cortex during cue sampling depends upon input from basolateral amygdala. Neuron, 39:855–867.
Schott, B., Sellner, D., Lauer, C., Habib, R., Frey, J., Guderian, S., Heinze, H., and Duzel, E. (2004). Activation of midbrain structures by associative novelty and the formation of explicit memory in humans. Learn. Mem., 11:383–387.
Schultz, W. (1997). Dopamine neurons and their role in reward mechanisms. Curr Opin Neurobiol, 7(2):191–197.
Schultz, W. (1998). Predictive reward signal of dopamine neurons. J. Neurophysiol., 80:1–27.
Schultz, W. (2000). Multiple reward signals in the brain. Nat. Rev. Neurosci., 1:199–207.
Schultz, W. (2002). Getting formal with dopamine and reward. Neuron, 36(2):241–263.
Schultz, W. (2004). Neural coding of basic reward terms of animal learning theory, game theory, microeconomics and behavioural ecology. Curr Opin Neurobiol, 14(2):139–147.
Schultz, W. (2006). Behavioral theories and the neurophysiology of reward. Annu Rev Psychol, 57:87–115.
Schultz, W. (2007). Multiple dopamine functions at different time courses. Annu Rev Neurosci, 30:259–288.
Schultz, W., Apicella, P., Ljungberg, T., Romo, R., and Scarnati, E. (1993b). Reward-related activity in the monkey striatum and substantia nigra. Prog Brain Res, 99:227–235.
Schultz, W., Dayan, P., and Montague, P. R. (1997). A neural substrate of prediction and reward. Science, 275(5306):1593–1599.
Schwarzbauer, C., Raposo, A., and Tyler, L. K. (2005). Spin-echo fMRI overcomes susceptibility-induced signal losses in the inferior temporal lobes. NeuroImage, 26(S1):802.
Schwarzbauer, C., Mildner, T., Heinke, W., Zysset, S., Deichmann, R., Brett, M., and Davis, M. H. (2006). Spin-echo EPI: the method of choice for fMRI of brain regions affected by magnetic field inhomogeneities? Human Brain Mapping, Abstract No. 1049.
Seymour, B., Daw, N., Dayan, P., Singer, T., and Dolan, R. (2007a). Differential encoding of losses and gains in the human striatum. J. Neurosci., 27:4826–4831.
Seymour, B., O'Doherty, J., Dayan, P., Koltzenburg, M., Jones, A., Dolan, R., Friston, K., and Frackowiak, R. (2004). Temporal difference models describe higher-order learning in humans. Nature, 429:664–667.
Seymour, B., Singer, T., and Dolan, R. (2007b). The neurobiology of punishment. Nat. Rev. Neurosci., 8:300–311.
Shafir, E., Diamond, P. A., and Tversky, A. (1997). On money illusion. Quarterly Journal of Economics, 112:341–374.
Simmons, W., Martin, A., and Barsalou, L. (2005). Pictures of appetizing foods activate gustatory cortices for taste and reward. Cereb. Cortex, 15:1602–1608.
Stark, C. E. and Squire, L. R. (2001). When zero is not zero: the problem of ambiguous baseline conditions in fMRI. Proc. Natl. Acad. Sci. U.S.A., 98(22):12760–12766.
Sutton, R. S. (1988). Learning to predict by the methods of temporal differences. Machine Learning, 3:9–44.
Sutton, R. and Barto, A. (1981). Toward a modern theory of adaptive networks: expectation and prediction. Psychol Rev, 88:135–170.
Sutton, R. S. and Barto, A. G. (1990). Time-derivative models of Pavlovian reinforcement. In Gabriel, M. and Moore, J., editors, Learning and Computational Neuroscience: Foundations of Adaptive Networks, pp. 497–537. Boston: MIT Press.
Talairach, J. and Tournoux, P. (1988). Co-planar Stereotaxic Atlas of the Human Brain. Thieme, New York.
Talmi, D., Seymour, B., Dayan, P., and Dolan, R. (2008). Human Pavlovian-instrumental transfer. J. Neurosci., 28:360–368.
Thorndike, E. L. (1911). Animal Intelligence: Experimental Studies. New York: Macmillan.
Thorpe, S., Rolls, E., and Maddison, S. (1983). The orbitofrontal cortex: neuronal activity in the behaving monkey. Exp Brain Res, 49:93–115.
Thut, G., Schultz, W., Roelcke, U., Nienhusmeier, M., Missimer, J., Maguire, R., and Leenders, K. (1997). Activation of the human brain by monetary reward. Neuroreport, 8:1225–1228.
Tiggemann, M. and Kemps, E. (2005). The phenomenology of food cravings: the role of mental imagery. Appetite, 45(3):305–313.
Tobler, P. (2003). Coding of basic reward parameters by dopamine neurons. PhD Thesis, University of Cambridge.
Tobler, P., Dickinson, A., and Schultz, W. (2003). Coding of predicted reward omission by dopamine neurons in a conditioned inhibition paradigm. J. Neurosci., 23:10402–10410.
Tobler, P., Fiorillo, C., and Schultz, W. (2005). Adaptive coding of reward value by dopamine neurons. Science, 307:1642–1645.
Tobler, P., Fletcher, P., Bullmore, E., and Schultz, W. (2007a). Learning-related human brain activations reflecting individual finances. Neuron, 54:167–175.
Tobler, P., O'Doherty, J., Dolan, R., and Schultz, W. (2006). Human neural learning depends on reward prediction errors in the blocking paradigm. J. Neurophysiol., 95:301–310.
Tobler, P., O'Doherty, J., Dolan, R., and Schultz, W. (2007b). Reward value coding distinct from risk attitude-related uncertainty coding in human reward systems. J. Neurophysiol., 97:1621–1632.
Tremblay, L., Hollerman, J. R., and Schultz, W. (1998). Modifications of reward expectation-related neuronal activity during learning in primate striatum. J Neurophysiol, 80(2):964–977.
Tremblay, L. and Schultz, W. (1999). Relative reward preference in primate orbitofrontal cortex. Nature, 398(6729):704–708.
Tremblay, L. and Schultz, W. (2000a). Modifications of reward expectation-related neuronal activity during learning in primate orbitofrontal cortex. J Neurophysiol, 83(4):1877–1885.
Tremblay, L. and Schultz, W. (2000b). Reward-related neuronal activity during go-nogo task performance in primate orbitofrontal cortex. J Neurophysiol, 83(4):1864–1876.
Tricomi, E., Delgado, M., and Fiez, J. (2004). Modulation of caudate activity by action contingency. Neuron, 41:281–292.
Tzourio-Mazoyer, N., Landeau, B., Papathanassiou, D., Crivello, F., Etard, O., Delcroix, N., Mazoyer, B., and Joliot, M. (2002). Automated anatomical labeling of activations in SPM using a macroscopic anatomical parcellation of the MNI MRI single-subject brain. Neuroimage, 15(1):273–289.
Valentin, V., Dickinson, A., and O'Doherty, J. (2007). Determining the neural substrates of goal-directed learning in the human brain. J. Neurosci., 27:4019–4026.
Vohs, K., Mead, N., and Goode, M. (2006). The psychological consequences of money. Science, 314:1154–1156.
Waelti, P., Dickinson, A., and Schultz, W. (2001). Dopamine responses comply with basic assumptions of formal learning theory. Nature, 412(6842):43–48.
Winston, J., Gottfried, J., Kilner, J., and Dolan, R. (2005). Integrated neural representations of odor intensity and affective valence in human amygdala. J. Neurosci., 25:8903–8907.
Wise, R. (2002). Brain reward circuitry: insights from unsensed incentives. Neuron, 36:229–240.
Wise, R. (2004). Dopamine, learning and motivation. Nat. Rev. Neurosci., 5:483–494.
Wittmann, B., Schott, B., Guderian, S., Frey, J., Heinze, H., and Duzel, E. (2005). Reward-related fMRI activation of dopaminergic midbrain is associated with enhanced hippocampus-dependent long-term memory formation. Neuron, 45:459–467.
Worsley, K., Marrett, S., Neelin, P., Vandal, A. C., Friston, K., and Evans, A. C. (1996). A unified statistical approach for determining significant voxels in images of cerebral activation. Human Brain Mapping, 4:58–73.
Yoo, S., Freeman, D., McCarthy, J., and Jolesz, F. (2003). Neural substrates of tactile imagery: a functional MRI study. Neuroreport, 14:581–585.
Zink, C., Pagnoni, G., Chappelow, J., Martin-Skurski, M., and Berns, G. (2006). Human striatal activation reflects degree of stimulus saliency. Neuroimage, 29:977–983.
Zink, C., Pagnoni, G., Martin, M., Dhamala, M., and Berns, G. (2003). Human striatal response to salient nonrewarding stimuli. J. Neurosci., 23:8092–8097.
Zink, C., Pagnoni, G., Martin-Skurski, M., Chappelow, J., and Berns, G. (2004). Human striatal responses to monetary reward depend on saliency. Neuron, 42:509–517.