Author’s preprint, which has not yet been copy edited. Citation:
Barr, D. J. (in press). Perspective taking and its impostors: Four patterns of deception.
In T. Holtgraves (Ed.), The Oxford Handbook of Language and Social Psychology.
New York: Oxford University Press.
But, of all my troubles, this was the chief: I was every day and every hour
assailed with accusations of deeds of which I was wholly ignorant; of acts of
cruelty, injustice, defamation, and deceit; of pieces of business which I could
not be made to comprehend. . .
James Hogg, The Private Memoirs and Confessions of a Justified Sinner (1824/1997)
Intersubjectivity is one of the enduring mysteries in the study of language use: how
can different people with different beliefs use language to arrive at a common understanding?
Despite pervasive ambiguity, people seem to communicate their intentions effectively;
moreover, they seem to do so with relative ease. These observations suggest the existence
of powerful constraints on language processing. What are these constraints, and how are
they incorporated into the production and interpretation of utterances?
One influential proposal has been that language users reduce ambiguity by producing
and interpreting language against their common ground—the set of information interlocu-
tors believe to be shared (Clark & Marshall, 1981). Common ground is distinct from
shared knowledge inasmuch as it is a special kind of meta-knowledge: a given piece of
shared information is not common ground unless interlocutors are mutually aware that
it is shared. For example, you might know some salacious gossip about your friend, who
knows that people have been spreading gossip about her, and knows of its nature, but who
doesn’t know whether you know it. Though shared, this gossip is not yet part of your
common ground. Typically, language users can infer that some piece of information is
part of their common ground on the basis of certain copresence heuristics—on the basis of
sharing perceptual experiences (being located in the same environment and thus perceiving
PERSPECTIVE TAKING AND ITS IMPOSTORS 2
the same objects and hearing the same speech) or on joint membership in certain social
communities (Clark & Marshall, 1981).
The proposal that people speak and interpret language against the background of their
common ground has been highly influential, as it offers a possible solution to the problem
of pervasive ambiguity in language use. Speakers can tailor their utterances to the common
ground, in a process known as audience design (Clark & Murphy, 1982). Likewise, lis-
teners can use common ground to constrain the set of possible meanings they might assign
to an utterance (Clark & Carlson, 1981). However, achieving this reduction in ambiguity
through common ground requires language users to pay the cost of continually accessing
and using information about their interlocutor’s perspective (Lin, Keysar, & Epley, 2010;
Rossnagel, 2000).
The strategy of processing language against common ground has obvious appeal, as it
would seem to promote accurate communication. However, the psycholinguistic literature
suggests that language users do not always follow it. In many cases, speakers or listeners
make use of privileged information—information that is known not to be shared with the
interlocutor (e.g., Brown & Dell, 1987; Ferreira & Dell, 2000; Keysar, Barr, Balin,
& Brauner, 2000; for review, see Barr & Keysar, 2006). In general, what the literature
suggests is that language users can make use of common ground in certain circumstances,
but are not always able (or willing) to pay the cost of perspective taking. In fact, it is not
always necessary for language users to pay this cost. At least in some cases, speakers and
listeners can reduce ambiguity using other, less cognitively demanding tactics (Barr, 2004;
Barr & Keysar, 2002; Brown & Dell, 1987; Ferreira & Dell, 2000; Horton & Gerrig,
2005; Pickering & Garrod, 2004).
The recognition that there are multiple strategies for ambiguity resolution beyond
perspective taking has greatly enriched the theoretical landscape in the study of language
use. At the same time, it has made investigation of language processing more challeng-
ing, because a language user can produce or interpret utterances in ways that are consistent
with common ground, but without any perspective taking ever having taken place. In other
words, there can be multiple explanations for the same behavior, and as a result it is often
difficult to identify or create the conditions necessary to distinguish these alternative explanations
from true perspective taking. My goal in this chapter is to characterize these theoretical
and empirical challenges by identifying four “patterns of deception”: four patterns
of behavior that are distinct from, but easily confused with, genuine perspective taking.
By genuine perspective taking, I mean behavior satisfying the following criteria: (1)
it arises when there is a disparity in beliefs between a perspective-taking agent (henceforth
agent) and a target person’s perspective (henceforth target); (2) the agent adapts its pro-
cessing or behavior because of this disparity; (3) the target’s beliefs are accessible and
accurately represented by the agent; (4) the agent produces adaptations that are appropriate
to the target’s perspective; and (5) these adaptations are undertaken spontaneously. The
first criterion is needed because in cases where there is no disparity, it is generally not pos-
sible to distinguish perspective taking from behavior that is merely egocentric; the second,
because true perspective taking must be produced in response to a perceived change in the
common ground, as opposed to changes in the perspective of the agent (Keysar, 1997).
The third criterion rules out various uninteresting cases in which perspective taking fails
due to ignorance or inaccurate beliefs about the target. The fourth criterion indicates that
we should only accept adaptations that are appropriate to the target’s needs. Finally, the
fifth criterion states that we should restrict our definition to cases in which language users
recognize the need for perspective taking and adapt their behavior spontaneously (i.e., with-
out prompting by an experimenter or through feedback received from their conversational
partner).
The Double
This strategy of substituting one’s own knowledge will be effective whenever there
is extensive overlap between the perspectives of agent and target. It is possible, then, that
language users will be more likely to adjust away from the “egocentric” anchor when the
target is perceived to be dissimilar from the self. This yields an interesting and counterintu-
itive prediction: people should be more egocentric when talking to friends. This prediction
was confirmed in a study by Savitsky, Keysar, Epley, Carter, and Swanson (2011), who
found that speakers were more prone to overestimate how well they conveyed their mean-
ings when speaking to friends than when speaking to strangers. In addition, they found
that listeners experienced greater interference from privileged information when they in-
terpreted speech from friends. In short, talking to someone familiar causes people to drop
their “cognitive guard” against their native tendency toward egocentrism.
The Charlatan
Much of human cognition can be regarded as heuristic in nature: rather than map-
ping inputs to outputs using some computationally demanding or uncertain target function,
it approximates the mapping using some alternative function that sacrifices accuracy for
efficiency, a characteristic known as “satisficing” (Simon, 1996). In decision making, sub-
stituting an easier problem for a harder problem is known as attribute substitution (Kahne-
man & Frederick, 2002). As Kahneman and Frederick put it: “judgment is mediated by
a heuristic when an individual assesses a specified target attribute of a judgment object by
substituting another property of that object—the heuristic attribute—which more readily
comes to mind.” (p. 53). For example, when assessing whether it is safer to travel from
A to B by car or by plane, one should attempt to compare two ratios: the ratio of unsafely
completed car journeys to all car journeys, compared to the ratio of unsafely completed air
journeys to all air journeys. However, one may answer this question instead by considering
how readily unsafe journeys by either mode come to mind. If there have recently been
significant air disasters in the news, one may be more likely to consider car travel as safer.
In such an example, the heuristic attribute of “ease of recall” is substituted for the target
attribute of relative risk.
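The contrast between the target attribute and the heuristic attribute in this example can be made concrete with a minimal sketch. All counts below are hypothetical placeholders chosen only for illustration (they are not real accident statistics), and the function names are assumptions introduced here.

```python
# Illustrative sketch of attribute substitution; every number is hypothetical.

def relative_risk(unsafe, total):
    """Target attribute: proportion of journeys that ended unsafely."""
    return unsafe / total

def ease_of_recall(recalled_incidents):
    """Heuristic attribute: how many unsafe journeys readily come to mind."""
    return recalled_incidents

# Hypothetical journey counts and recalled incidents (not real statistics).
journeys = {
    "car":   {"unsafe": 50, "total": 100_000, "recalled": 1},
    "plane": {"unsafe": 1,  "total": 100_000, "recalled": 3},
}

def safer_by_target(data):
    # Compare the two ratios, as the normative analysis requires.
    return min(data, key=lambda m: relative_risk(data[m]["unsafe"], data[m]["total"]))

def safer_by_heuristic(data):
    # Substitute ease of recall for the ratio comparison.
    return min(data, key=lambda m: ease_of_recall(data[m]["recalled"]))

print(safer_by_target(journeys))     # mode with the lower unsafe ratio
print(safer_by_heuristic(journeys))  # mode with fewer recalled incidents
```

With these made-up inputs the two functions disagree, which is the signature of attribute substitution: the answer tracks what is easy to recall rather than the ratios themselves.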
The Charlatan uses such attribute substitution to produce behavior that appears to be
shaped by the target attribute of the target person’s perspective, but that in reality is based on
heuristic attributes available to the self.[1] Often, this reliance on heuristic attributes will be
sufficient for successful communication, because the heuristic attributes are strongly cor-
related with the target attributes. However, the correlated nature of these attributes means
that often quite clever experimentation is required to distinguish attribute substitution from
perspective taking.
[1] In a sense, the Double might be seen as one of the many manifestations of the Charlatan, inasmuch as it
substitutes its own knowledge for that of the interlocutor.
Consider, for example, decisions about how to articulate a word. A variety of evidence
suggests that the less contextually predictable a word is, the more clearly speakers
will tend to articulate it (Bell, Brenier, Gregory, Girand, & Jurafsky, 2009; Lieberman,
1963); indeed, listeners seem to infer whether a referent is given or new in the discourse
based on the intelligibility of the referring expression (Fowler & Housum, 1987). It would
therefore seem that phonological planning in language production is closely calibrated to
the comprehender’s informational needs. The available evidence generally suggests, how-
ever, that articulatory clarity is driven by availability of information to the speaker rather
than by the speaker’s model of the listener. Bard et al. (2000) found that speakers produced
less intelligible tokens of a word when an expression was repeated, regardless of whether
or not the expression was new to the addressee. In a more recent study by Arnold, Kahn,
and Pancani (2012), the speaker’s beliefs about the availability of a referent to an addressee
were directly manipulated by the addressee’s nonverbal behavior. There was no evidence
that speakers spoke less intelligibly when the referent was available to the addressee. How-
ever, Galati and Brennan (2010) found some evidence for audience sensitivity: while the
measured duration of words seemed only to reflect the speaker’s knowledge, a set of par-
ticipant raters who rated the intelligibility of the words out of context gave higher ratings
to words spoken to less-knowledgeable addressees.
Moving up the ladder of language production, studies of syntactic processing have,
on the whole, discredited the idea that speakers choose syntactic constructions to avoid am-
biguity or to make their speech easier for an addressee to parse (Arnold, Wasow, Asudeh,
& Alrenga, 2004; Ferreira & Dell, 2000). There is also little evidence that when using
a syntactically ambiguous construction, speakers use speech prosody to disambiguate it
(Kraljic & Brennan, 2005). Instead, these choices seem to be driven by what is easier to
produce or more salient to the speaker. For example, much of a speaker’s decision regard-
ing whether to use the optional complementizer that in sentence complement structures
(e.g., the coach knew (that) a good player would be hard to find) depends on cognitive
availability of the material in the complement clause (Ferreira & Dell, 2000), with higher
availability resulting in fewer instances of that. Even when speakers are made explicitly
aware that a construction will be syntactically ambiguous to the listener, they are drasti-
cally overconfident in their ability to use prosody to signal an intended meaning (Keysar &
Henly, 2002).
At the next-higher level of production, speakers have to make choices about how to
refer to a referent. Unlike syntactic or prosodic choices, such choices would seem to be
more accessible to consciousness and therefore more subject to beliefs about the interlocutor’s
informational needs. When referring to an entity, speakers have a large variety of
referential forms at their disposal. Philosophers and language scientists have long assumed
that it is accessibility of information for the addressee that is the driving factor in speakers’
referential choices (Ariel, 1988; Chafe, 1976; Gundel, Hedberg, & Zacharski, 1993).
However, here is another case where careful research has unmasked the Charlatan’s clever
tricks.
Speakers produce full noun-phrases when a referent is new, and pronouns when a
referent is the salient focus of discussion. But do they do this based on what is accessible
in the addressee’s discourse model, or based on what is accessible in their own discourse
models? To distinguish these possibilities, Fukumura and van Gompel (2012) had speakers
describe a critical action performed by one of two characters (of different genders) to an
addressee who had to recreate the action using toys. Just before the speaker described the
critical action, Fukumura and van Gompel (2012) auditorily presented a context sentence,
manipulating two factors: (1) whether the character mentioned in the context sentence was
the same character later mentioned in the critical sentence (the “critical” character) or a
different character; and (2) whether the addressee, or only the speaker, heard the context
sentence. The question was whether the speaker, in describing the critical action, used a
pronoun (e.g., he stood up) or a full noun phrase (The admiral stood up). If the choice
is based on the accessibility of referents for the addressee, then speakers should only use
pronouns when the context sentence was about the critical character and the addressee had
heard it. Instead, Fukumura and van Gompel’s speakers seemed influenced only by whether
or not the context sentence mentioned the critical character, using pronouns at a dramati-
cally higher rate when it did. Intriguingly, speakers did not seem altogether indifferent to
whether or not addressees heard the context sentence; there was a (not fully significant)
trend toward speakers using fewer pronouns when the addressee did not hear the context
sentence. However, this was true even when the competitor was mentioned. In short, it
seems that speakers largely base their choices of referential form on accessibility within
their own discourse model. They may also be sensitive to the possibility that addressees
may not be knowledgeable about a topic, but this causes them to be more explicit overall,
rather than only in the one case where it actually matters to the addressee (for similar effects,
see Ferreira & Dell, 2000).
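The logic of this 2 x 2 design can be sketched as follows. The two functions are idealized, categorical renderings of the competing accounts (the reported results are rates of pronoun use, not all-or-none choices), and all names are assumptions introduced for illustration.

```python
from itertools import product

def addressee_based(mentions_critical, addressee_heard):
    # Pronoun only if the referent is accessible in the addressee's discourse model.
    return "pronoun" if (mentions_critical and addressee_heard) else "full NP"

def speaker_based(mentions_critical, addressee_heard):
    # Pronoun whenever the referent is accessible in the speaker's own model;
    # what the addressee heard plays no role.
    return "pronoun" if mentions_critical else "full NP"

# The four design cells: (context mentions critical character, addressee heard context).
cells = list(product([True, False], repeat=2))

# Cells in which the two accounts make different predictions.
diagnostic = [c for c in cells if addressee_based(*c) != speaker_based(*c)]

for mentions_critical, addressee_heard in cells:
    print(mentions_critical, addressee_heard,
          addressee_based(mentions_critical, addressee_heard),
          speaker_based(mentions_critical, addressee_heard))
```

Under these idealizations the accounts diverge in only one cell: the context sentence mentions the critical character but the addressee did not hear it. That is why the manipulation of what the addressee heard is essential to the design.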
Choosing the form of a referring expression is just one type of high-level planning
decision that speakers have to undertake when generating spoken utterances. Another type
of planning decision concerns how speakers package information along different visual and
auditory channels. In many conversational settings, speakers have visual contact with their
conversational partner, such that they can use gestures and facial expressions to convey
their meanings alongside speech. It is critical for speakers to consider what information
channels are available to their interlocutor when packaging information; for example, it
is likely to be ineffective to attempt to gesturally demonstrate an idea when speaking to a
visually-impaired person. A number of studies support the idea that speakers are sensitive
to addressees’ access to visual information channels; for example, speakers gesture more
frequently and vividly when they have visual contact (Alibali, Heath, & Myers, 2001;
Bavelas, Gerwing, Sutton, & Prevost, 2008; Cohen & Harrison, 1973).
In most situations, there is parity across production and comprehension modalities in
terms of the availability of information channels. In other words, it is unusual for one party
in a communication to have access to a modality that is not also accessible to the other
party.[2] When you can see and hear me, it is generally safe to assume that I can also see and
hear you. But this parity opens up the opportunity for attribute substitution: specifically,
speakers can substitute the heuristic attribute of can I see and hear my interlocutor? for the
target attribute of can my interlocutor see and hear me? Given this, the finding that speakers
gesture more under visibility conditions may not truly reflect adaptation to the fact of being
seen, but rather, a learned response to the visible availability of the interlocutor.
[2] Arguably, disparity in access to channels has recently become somewhat more common than it has been
traditionally, with the rise of videoconferencing software like Skype, in combination with the variation in the
availability of a front-facing webcam across mobile devices.
It was only recently that researchers had the idea of deconfounding the effects of
seeing from the effects of being seen on gesture production. Mol, Krahmer, Maes, and
Swerts (2011) assessed the rate of representational gesture[3] production among speakers
retelling a cartoon under various communication conditions, while independently varying the
factors of seeing and being seen. In one of the two experiments, they used a special webcam
setup that allowed for simulated eye contact between interlocutors. In this experiment, they
found that the rate of gesture production was influenced only by being seen and not by
seeing. This supports the idea that the distribution of information across channels reflects
genuine perspective taking (though see the section on The Freeloader for possible caveats).
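Why seeing and being seen must be varied independently can be illustrated with a minimal sketch. The two decision rules below are simplified stand-ins for the target and heuristic attributes, and their names are assumptions made for this illustration only.

```python
def gesture_if_seen(i_see_them, they_see_me):
    # Target attribute: adapt to whether the interlocutor can see me.
    return they_see_me

def gesture_if_seeing(i_see_them, they_see_me):
    # Heuristic attribute: respond to whether I can see the interlocutor.
    return i_see_them

# All combinations of (speaker sees addressee, addressee sees speaker).
conditions = [(sees, seen) for sees in (True, False) for seen in (True, False)]

# Everyday settings show parity: seeing and being seen go together.
parity = [c for c in conditions if c[0] == c[1]]
# Deconfounded cells: visibility in one direction only.
one_way = [c for c in conditions if c[0] != c[1]]

same_under_parity = all(gesture_if_seen(*c) == gesture_if_seeing(*c) for c in parity)
differ_one_way = all(gesture_if_seen(*c) != gesture_if_seeing(*c) for c in one_way)
print(same_under_parity, differ_one_way)
```

The two rules are indistinguishable under parity and come apart only in the one-way visibility cells, which is where a design of this kind gets its diagnostic power.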
Although so far the discussion of attribute substitution has focused on language pro-
duction, this stratagem is also available to language comprehension. Consider how people
determine who is being referred to when they hear a person’s initial name such as Kevin.
Initial names are highly ambiguous due to conservatism in how parents name their children;
for example, about half of the approximately two million baby boys born in the United
States in 2011 received one of just 140 names (Barr, Jackson, & Phillips, 2014). When
your friend asks, did you hear about Kevin?, how do you know which Kevin she is referring
to? One possibility is that you access the common ground you share with your friend, nar-
rowing down the set of Kevins to those people named Kevin that are contextually relevant,
that both of you know, and that you know that you know. However, another possibility
is that you have learned to associate hearing the word Kevin in your friend’s voice with the
target Kevin. Upon hearing the compound cue of the name and your friend’s voice, you
access the referent immediately, bypassing common ground.
To distinguish these possibilities, Barr et al. (2014) took advantage of the naturally-
occurring common ground existing between university students in the form of social net-
works. Pairs of friends from a given year of study were recruited to play a communication
game along with another student (the assistant), who was an outsider to the friends’ so-
cial circle. One of the friends (the addressee) heard the name of a mutually-known target
person from the same year (e.g., the next person will be Kevin) and had to select the cor-
responding photo from a computerized display while his or her eyes were tracked. On any
given trial, addressees were led to believe that the name for the target had been chosen
either by the friend or by the outsider, and that this “designer” agent had then written down
his or her choice on a slip of paper. Critically, the name on the slip of paper could be read
aloud by either the friend or the assistant; thus, in addition to friends reading messages they
themselves had written, they sometimes read messages that the assistant had written, and
vice-versa.
An analysis of eye data from the first 1.5 seconds after name onset revealed that addressees
gazed more quickly and more reliably at the target when the name was spoken by
the friend, regardless of whether the name was chosen by the friend or by the outsider. This
supports the idea that listeners identify persons from spoken names through episodic associations
rather than through common ground. However, it was not the case that addressees
simply ignored the identity of the designer, since effects of the designer were observed on
response time. This case provides support for the general idea that the ordinary operations
of memory can serve as a proxy for computations of common ground (Horton & Gerrig,
2005).
[3] Representational (iconic) gestures are gestures that iconically depict their referents, in contrast to beat
gestures, used for emphasis, and deictic gestures, which direct a listener’s attention (see McNeill, 1996).
The Conspirator
One of the more nuanced questions that has arisen in recent years concerns not
whether an agent used information about a target perspective at all during processing, but
how such information was used. Specifically, a key question is the extent to which agents
are able to use information about a target perspective to constrain the operations at various
levels of language processing. To date, these more nuanced questions have been asked
mostly about language comprehension rather than language production, and have focused
on the extent to which perspective information constrains the lexical level of processing
(recognizing words in context and accessing their meanings).
The comprehension of words is an incremental process that does not wait to resolve
ambiguities until a linguistic constituent has been filled; rather, the comprehension system
generates hypotheses on-the-fly, moment-by-moment, as information accrues (Allopenna,
Magnuson, & Tanenhaus, 1998; Marslen-Wilson, 1987). For example, by the time one
has heard the first syllable buh of the word bucket, the comprehension system will have
already reduced the set of possible lexical candidates from a large set to a much smaller
set of words consistent with the evidence, such as buckle and bucket. But it is only in
psycholinguistic labs that listeners hear single words in isolation without having access to
any prior information that makes certain words more or less likely. In general, listeners
have access to a wide array of prior information that can constrain their expectations from
a wide variety of sources. A long-standing debate in spoken language comprehension
concerns how effectively this information constrains the lexical hypotheses generated about
the input.
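The incremental narrowing of lexical candidates described above can be sketched with a toy example. The lexicon is a hypothetical placeholder, and spelling is used as a stand-in for phonological form, which is a simplifying assumption rather than a claim about how the comprehension system represents speech.

```python
# Toy lexicon; spelling stands in for phonological form (a simplifying assumption).
LEXICON = ["bucket", "buckle", "butter", "basket", "candle", "stepladder"]

def cohort(heard_so_far, lexicon=LEXICON):
    """Return the candidates still consistent with the input heard so far."""
    return [word for word in lexicon if word.startswith(heard_so_far)]

# Simulate the input for "bucket" arriving one segment at a time.
target = "bucket"
for i in range(1, len(target) + 1):
    prefix = target[:i]
    print(f"{prefix!r:10} -> {cohort(prefix)}")
```

In this sketch the candidate set shrinks as input accrues, and bucket and buckle remain in competition until the input disambiguates them. Prior context is deliberately left out; how such context should enter the computation is the question at issue in the debate.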
Some of the earliest claims about the use of common ground in language processing
favored an “egocentric first” model in which comprehenders initially interpret utterances
from their own perspectives and only later consider information about speakers’ perspec-
tives as part of a secondary, optional, resource-demanding stage of processing (Keysar et
al., 2000; Keysar, Barr, Balin, & Paek, 1998). However, this view has been largely
disconfirmed, with visual-world eyetracking studies showing that listeners pay less atten-
tion to privileged information than to information in common ground. Indeed, effects of
common ground generally are present during the earliest moments of lexical processing
(Brown-Schmidt et al., 2008; Hanna et al., 2003; Nadig & Sedivy, 2002). Moreover,
listeners seem to spontaneously access information about the speaker’s perspective, doing
so even when considering the speaker’s perspective is unnecessary to resolve ambiguity
(Barr, 2008b; Heller et al., 2008).
Taken together, these findings would seem to suggest that common ground modu-
lates lexical processing from the earliest moments of comprehension. As such, they have
been marshalled in support of probabilistic constraint-based models of language compre-
hension, in which various structures and interpretations are assigned probabilities through
a weighted combination of cues (Hanna et al., 2003; Nadig & Sedivy, 2002). Under
this interpretation, whether or not a speaker knows about a particular referential alternative
is seen as just one of many cues that are used in parallel to resolve references. Juraf-
sky (2003) notes that inasmuch as constraint-based models assume weighted cue combi-
nation and probabilistic outputs, they can be considered a notational variant of Bayesian
models. These models have a critical requirement of cognitive penetrability—that is, that
“top-down” information available to higher-level subsystems, such as information about a
speaker’s perspective, must also be available to lower-level systems engaged in “bottom-up”
processing, such as those involved in lexical processing.
Although it might be tempting to see the evidence from comprehension as settling
matters in favor of Bayesian-type integration accounts, the problem is that it is possible to
have concurrent effects of top-down perspective taking and bottom-up lexical processing
in a system lacking in cognitive penetrability—that is, even when the lexical processing
subsystem is not influenced by the top-down information. When visual world researchers
observe concurrent effects of a contextual constraint and of linguistic processing, they all
too readily interpret this pattern as evidence for interactive integration (Barr, 2008a). But
as any good Bayesian would tell you, observing only the output of the belief updating
process (the posterior) cannot possibly provide any information about whether or not the
new information (linguistic evidence) was combined with the prior information (context) in
an optimal way. Participants in perspective-taking experiments, before hearing any refer-
ring expression, already know which referential alternatives are privileged and which are
shared. This creates a visual bias favoring shared over privileged referents. Effects of com-
mon ground during the expression could just be a continuation of these prior effects. To
be sure, the fact that this bias exists shows that listeners are sensitive to information about
the speaker’s perspective; however, from observing that this bias continues during process-
ing of the referring expression, it is not valid to conclude that the information modulated
lexical processing. The relevant evidence for information integration is how the incoming
linguistic information changes the probabilities of gazing at potential referents; and partic-
ularly, whether the boost in probability is larger for shared than for privileged alternatives,
as Bayesian (or probabilistic constraint-based) models would predict.
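The prediction that the boost should be larger for shared alternatives follows directly from Bayes' rule, as the following minimal sketch shows. The prior weights and likelihood values are hypothetical numbers chosen for illustration; this is not the model fitted in any of the studies discussed, and all names are assumptions.

```python
def posterior(prior_weights, likelihoods):
    """Bayes' rule over a discrete set of referents: posterior ∝ prior × likelihood."""
    total = sum(prior_weights.values())
    prior = {r: w / total for r, w in prior_weights.items()}
    unnormalized = {r: prior[r] * likelihoods[r] for r in prior}
    z = sum(unnormalized.values())
    return prior, {r: u / z for r, u in unnormalized.items()}

# Hypothetical likelihood of hearing the onset "buh" given each intended referent.
likelihoods = {"bucket": 1.0, "buckle": 1.0, "filler1": 0.05, "filler2": 0.05}

def competitor_boost(competitor_weight):
    # Shared referents get prior weight 3.0; the competitor's weight is varied.
    weights = {"bucket": 3.0, "buckle": competitor_weight,
               "filler1": 3.0, "filler2": 3.0}
    prior, post = posterior(weights, likelihoods)
    # Boost: how much the linguistic evidence raises the competitor's probability.
    return post["buckle"] - prior["buckle"]

boost_shared = competitor_boost(3.0)      # competitor in common ground
boost_privileged = competitor_boost(1.0)  # competitor privileged (lower prior)
print(round(boost_shared, 3), round(boost_privileged, 3))
```

Because the same likelihood multiplies a smaller prior, the privileged competitor gains less from the incoming speech than the shared one. A difference in the overall level of gaze between shared and privileged referents, by contrast, is already present in the prior and is therefore not diagnostic of integration.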
Barr (2008b) tested the predictions of information integration by examining lexical
competition within a visual-world paradigm. Participants viewed computerized displays
depicting four referential alternatives and heard a speaker telling them to click on a target
picture (e.g., “Click on the bucket.”). One of the three remaining alternatives, the critical
picture, was either a phonological competitor (i.e., shared a phonological onset with the
target, e.g., a buckle) or a baseline picture (i.e., a phonologically unrelated picture, e.g., a
stepladder). Importantly, this critical object was either privileged or shared. The question
was whether the boost in activation for competitors (measured against baseline) was larger
when the critical object was shared than when it was privileged.
[Figure 1 plot omitted. Y-axis: Lexical Activation (vs. baseline), .00 to .25; x-axis: Time from Word Onset (ms), 350 to 1000.]
Figure 1. Predicted and observed posterior probability of gazing at a privileged competitor
plotted against the observed posterior for a shared competitor (uniform prior).
Over all three experiments, there was clear evidence that comprehenders anticipated
references to shared referential alternatives: typically, by the onset of the referring expres-
sion, they were three to four times more likely to be gazing at a shared referent than a
privileged referent. In other words, they had a strong prior expectation that the speaker
would refer to one of the shared alternatives. However, none of the experiments provided
any evidence that listeners were able to integrate this information with incoming speech.
Data from one such experiment (Experiment 2) appears in Figure 1, showing the observed
competition effect compared to the predictions of a Bayesian model. The competition from
privileged alternatives was far greater than a Bayesian information-integration account
would predict; indeed, in each experiment, there was no evidence that the competition
from privileged alternatives was any less than that induced by shared alternatives (the case
of a uniform prior).
What this implies is that comprehenders represent speakers’ beliefs and sponta-
neously activate them in time to constrain processing, but that the neurocognitive system
responsible for mapping incoming speech to referential alternatives has no access to this
information. This is particularly surprising when considered against a further finding. In
one experiment, the top-down effects of perspective were contrasted with top-down effects
of a linguistic nature (specifically, top-down effect from verb semantics). While the ef-
fects of perspective showed the anticipation-without-integration pattern, the effects of the
linguistic constraint showed a pattern that was nearly Bayes-optimal (Barr, 2008b).
These findings suggest the intriguing possibility that at least some of the difficulties
people experience in using knowledge about others may lie not in a failure to represent
what others know in an accurate or timely fashion, but in constraints imposed
by neurocognitive architecture on the flow of information between autonomous (or semi-
autonomous) processing modules. The independent, parallel operations of these seemingly
autonomous subsystems conspire to give an illusion of information integration.
The Freeloader
The Freeloader, like the Conspirator, is unduly credited with more sophisticated pro-
cessing than actually undertaken; however, in this case, the processing in question is not
divided across separate processing modules within the same individual, but rather, across
separately acting individuals. When you interact with another person, the behaviors you
and your partner produce are not independent: you will reciprocally influence one another’s
behavior (Pickering & Garrod, 2004; Pickering & Garrod, this volume). For example, ad-
dressees play an active role in determining an utterance’s informational content through
cues they provide to the speaker (Bavelas, Coates, & Johnson, 2000; Clark & Wilkes-
Gibbs, 1986; Gann & Barr, 2012; Kraut, Lewis, & Swezey, 1982; Schober & Clark,
1989), and speakers monitor these cues as they speak (Clark & Henetz, this volume; Clark
& Krych, 2004; Gann & Barr, 2012). The same might be true for comprehenders—to
the extent different interpretations of an utterance lead to different overt actions, and to the
extent comprehenders have access to online feedback from a speaker who is monitoring
their behavior, they could modify their interpretations based on this feedback (though to
my knowledge this possibility has not been investigated). In close-knit interactive situa-
tions, certain acts of speaking and understanding might be better viewed as collaborative
achievements rather than individual acts.
The Freeloader takes advantage of interactive situations to reduce his own
perspective-taking burden (Barr & Keysar, 2005; Fussell & Krauss, 1992; Gann &
Barr, 2012). Modeling the listener’s knowledge is an effortful activity that is fraught
with uncertainty. Interactive situations enable interlocutors to reduce uncertainty about one
another’s perspectives by acting on uncertain suppositions, rather than requiring them to
resolve it more effortfully through cognitive computation; viewed in this way, acts of speaking
and understanding in dialogue can constitute a form of epistemic action (Kirsh & Maglio,
1994). A speaker who is uncertain about whether an addressee is familiar with an expres-
sion (does she know that “NHS” stands for the National Health Service in Britain?) can
piece together the likelihood that she knows it by considering other things she knows about
the individual (I think she once visited the UK, and her aunt married an Englishman).
When there is no potential for interaction (the speaker is crafting an email), the speaker
would have only her own knowledge to rely on to decide what form is appropriate. But
in interaction, the speaker can simply take a gamble by emitting the uncertain material (“I
can’t imagine something like the NHS ever existing in the US”) and observing her part-
ner’s reaction (she looks confused; now she is asking me what the NHS is). Given that
the best possible model of the addressee’s knowledge is the addressee herself, this strat-
egy is necessarily more accurate than any estimation strategy that a speaker could possibly
undertake.
The fact that speakers’ and listeners’ conversational acts are intertwined in interactive
dialogue means that studying socially-immersed behavior can lead to erroneously credit-
ing individual language users with feats of perspective-taking that are more appropriately
characterized as interactively emergent. To illustrate, an influential study by Brennan and
Clark (1996) sought to test the argument for partner-specificity in referring to everyday ob-
jects: that speakers choose referring expressions based on their history of interaction with
specific partners. They found that if you have been calling a particular shoe “the loafer”
to distinguish it from another shoe when talking to me, you would be likely to continue
calling it thus even in contexts where the term is overspecific (because it is the only shoe);
more importantly, they found that you would use overspecific terms at a higher rate when
you continued talking to me as compared to when you spoke to another person. But why?
Is it because in designing your utterances you are considering my perspective, or because
a new partner gave you feedback that led you to stop using the overly specific term?
In a follow-up study, Gann and Barr (2012) recruited triads of participants and got
speakers to entrain on very unusual ways of describing very typical objects. For instance,
by presenting a very typical candle in the context of a half-melted candle, they got speakers
to frequently call the typical candle “the unmelted candle.” The question was whether
speakers would continue using these unusual expressions when the contrasting object (the
melted candle) was removed from the context. Like Brennan and Clark’s speakers, they
continued using the overly specific terms; in this case, about 70% of the time. Unlike
Brennan and Clark (1996), however, speakers were only tested on each entrained term a
single time with a single addressee, giving them no opportunity to change their individual
descriptions in response to feedback. Under these circumstances, there was no evidence
that speakers were any less likely to use unusual entrained terms such as unmelted candle
with a new addressee than with the old partner.
The danger of erroneously attributing interactively-emergent effects to individual
processing lurks most obviously in the practice of aggregating over multiple referring
episodes, and treating the result as if it were a picture of what takes place within a single
episode. However, interactive effects can also emerge within a single referring episode,
where their operation can often be quite subtle. In the same study by Gann and Barr (2012),
speakers described abstract figures to addressees, much in the manner of the well-known
study by Clark and Wilkes-Gibbs (1986). This time, however, the images were presented
on computer monitors, and addressees selected images using a computer mouse. Speakers
could observe the addressee’s movement of the mouse cursor, and could use this as an in-
dex of their emerging understanding. Indeed, there was evidence that speakers did this: the
longer the lag between the onset of the speaker’s expression and the onset of movement of
the addressee’s mouse cursor, the more words speakers used to describe the target.
In sum, in close-knit, interactive dialogue, many of the adjustments that speakers
make for their addressees—and possibly, that addressees make for their speakers—may
emerge out of interactive processes, and as such do not reflect spontaneous adjustments
to a perceived change in the common ground. It can be argued that these adjustments do
reflect a kind of perspective taking; after all, when a speaker notes that an addressee is not
understanding, they need to determine how to adapt their speech to improve the situation.
But as we have seen above, careful research is needed to know whether adaptations to
signals of mis-coordination reflect true consideration of the addressee’s perspective, or the
thoughtless deployment of heuristics that have been learned through experience, and which
may or may not be advantageous for the situation at hand.
Conclusion
In summary, we have considered four explanations for partner-adapted behavior in
language use, personified in four impostor types: The Double, who substitutes a minimally-
adapted version of its own knowledge for that of the target; The Charlatan, who employs
attribute substitution; The Conspirator, in which the parallel but independent operation of
autonomous subsystems conspires to produce an illusion of information integration; and
The Freeloader, who gets solitary credit for interactively-emergent processes.
To the extent that we are interested in mechanistic explanations of speaking and
understanding, the existence of these four patterns of deception calls for greater caution
in how evidence from language use in dialogue informs psycholinguistic theories. Re-
searchers currently seem to be well aware of the pattern of The Double, but less aware of
the three other personality types. The Charlatan is often overlooked, and can be particularly
tricky to unmask, as this requires identifying heuristic attributes that might be available in
a given situation and varying them independently of target attributes. The Conspirator is
generally ignored, even though studies claimed to support interactive effects of perspective
on lexical processing generally show the same anticipation-without-integration pattern (see
Barr, 2014 for discussion).
It has recently been argued that studies of perspective taking in non-interactive con-
texts (Brown-Schmidt & Hanna, 2011) or that use confederates (Kuhlen & Brennan, 2013)
have limited ecological validity. But the pattern of The Freeloader highlights the invalid-
ity of drawing conclusions about individual psycholinguistic processing from close-knit
interactive situations. To be sure, studying language processing in dialogue is critical for
understanding the interactive strategies that language users bring to bear in typical face-to-
face settings. But it must be kept in mind that the behaviors produced in such circumstances
are not independent, and claims about individual processing should be appropriately tem-
pered. This is currently not the case. For example, although Brennan and Clark’s own data
suggest that lexical choices emerged through interaction, researchers still heavily cite this
as a key study supporting perspective taking in production, e.g.: “Extensive evidence has
shown that speakers select linguistic forms on the basis of the knowledge, intentions, or
goals of the addressee (e.g., Brennan & Clark, 1996).” (Arnold et al., 2012, p. 505). To
distinguish competing psycholinguistic explanations, there can be no substitute for the con-
trolled experiment; and there is no escaping the tradeoff between ecological validity and
experimental control. It should also be kept in mind that settings of language use vary in
their interactivity, with even some face-to-face interactions showing relatively low interac-
tivity due to social status differences or sociocultural norms regarding nonverbal behavior
and interruption (e.g., behavior in a job interview versus in the pub). They also vary in their
multimodality, from rarefied exchanges in text messaging to full-blown face-to-face interaction.
To the extent that people take advantage of interactivity to lower their computational
burden, this means that studying language in dialogue might cause us to underestimate
rather than overestimate language users’ perspective-taking abilities.
The search for mechanisms underlying perspective taking in language use has led
to the identification of a variety of strategies that work as a proxy for perspective taking.
If language use rests on an almost ad hoc collection of strategies, how do language users
know which strategies are appropriate in which situation? Perhaps this approach is merely
shifting the explanatory burden rather than resolving it. It is indeed the case that this type of
approach leaves much to be explained. However, it at least points a finger in the direction
where such explanations could be profitably sought: namely, in processes of social develop-
ment and acculturation. Our communicative skills are honed over a lifetime of experience
with various communicative partners in various social settings. One well-known hallmark
of expert skill is that it involves the appropriate structuring of a database of knowledge that
enables experts to see underlying patterns that are hidden from the novice, and to access
relevant information in a timely manner. Experts are thus better at recognition of particular
classes of situations and have developed heuristics that enable processing that is efficient
and generally accurate, but subject to systematic biases. When an expert categorizes a situ-
ation as a new instance of an old problem, this results in the obligatory retrieval of solutions
and processing strategies that have proven effective in the past (Logan, 1988). If the tricks
that we learn are good enough for understanding one another—or at least, to comfort us
with the illusion of doing so—then we should not fault ourselves for using them.
References
Alibali, M. W., Heath, D. C., & Myers, H. J. (2001). Effects of Visibility between Speaker and
Listener on Gesture Production: Some Gestures Are Meant to Be Seen. Journal of Memory
and Language, 44(2), 169–188. Retrieved from http://dx.doi.org/10.1006/jmla.2000.2752
doi: 10.1006/jmla.2000.2752
Allopenna, P. D., Magnuson, J. S., & Tanenhaus, M. K. (1998). Tracking the time-course of spoken
word recognition using eye movements: Evidence for continuous mapping models. Journal
of Memory and Language, 38, 419–439.
Apperly, I. A., Carroll, D. J., Samson, D., Humphreys, G. W., Qureshi, A., & Moffitt, G. (2010).
Why are there limits on theory of mind use? Evidence from adults’ ability to follow instructions
from an ignorant speaker. The Quarterly Journal of Experimental Psychology, 63,
1201–1217. doi: 10.1080/17470210903281582
Ariel, M. (1988). Referring and accessibility. Journal of Linguistics, 24, 65–87.
Arnold, J. E., Kahn, J., & Pancani, G. (2012). Audience design affects acoustic reduction via
production facilitation. Psychonomic Bulletin & Review, 19, 505–512. doi: 10.3758/s13423-012-0233-y
Arnold, J. E., Wasow, T., Asudeh, A., & Alrenga, P. (2004). Avoiding attachment ambiguities: The
role of constituent ordering. Journal of Memory and Language, 51, 55–70.
Bard, E. G., Anderson, A. H., Sotillo, C., Aylett, M., Doherty-Sneddon, G., & Newlands, A. (2000).
Controlling the intelligibility of referring expressions in dialogue. Journal of Memory and
Language, 42, 1–22.
Barr, D. J. (2004). Establishing conventional communication systems: Is common knowledge
necessary? Cognitive Science, 28, 937–962. doi: 10.1016/j.cogsci.2004.07.002
Barr, D. J. (2008a). Analyzing ’visual world’ eyetracking data using multilevel logistic regression.
Journal of Memory and Language, 59, 457–474. doi: 10.1016/j.jml.2007.09.002
Barr, D. J. (2008b). Pragmatic expectations and linguistic evidence: Listeners anticipate but do not
integrate common ground. Cognition, 109, 18–40. doi: 10.1016/j.cognition.2008.07.005
Barr, D. J. (2014). Visual world studies of conversational perspective taking: Similar findings,
diverging interpretations. In P. Pyykkönen-Klauck & M. W. Crocker (Eds.), Visually situated
language comprehension. Manuscript in press.
Barr, D. J., Jackson, L., & Phillips, I. (2014). Using a voice to put a name to a face: The psycholin-
guistics of proper name comprehension. Journal of Experimental Psychology: General, 143,
404–413.
Barr, D. J., & Keysar, B. (2002). Anchoring comprehension in linguistic precedents. Journal of
Memory and Language, 46, 391–418. doi: 10.1006/jmla.2001.2815
Barr, D. J., & Keysar, B. (2005). Making sense of how we make sense: The paradox of egocentrism
in language use. In H. L. Colston & A. N. Katz (Eds.), Figurative language comprehension:
Social and cultural influences (pp. 21–41). Mahwah, N. J.: Erlbaum.
Barr, D. J., & Keysar, B. (2006). Perspective taking and the coordination of meaning in language
use. In M. J. Traxler & M. A. Gernsbacher (Eds.), Handbook of psycholinguistics (2nd ed.)
(pp. 901–938). Amsterdam, Netherlands: Elsevier.
Bavelas, J. B., Coates, L., & Johnson, T. (2000). Listeners as co-narrators. Journal of Personality
and Social Psychology, 79, 941–952.
Bavelas, J. B., Gerwing, J., Sutton, C., & Prevost, D. (2008). Gesturing on the telephone: Inde-
pendent effects of dialogue and visibility. Journal of Memory and Language, 58(2), 495–
520. Retrieved from http://dx.doi.org/10.1016/j.jml.2007.02.004 doi:
10.1016/j.jml.2007.02.004
Begeer, S., Malle, B. F., Nieuwland, M. S., & Keysar, B. (2010). Using theory of mind to represent
and take part in social interactions: Comparing individuals with high-functioning autism and
typically developing controls. European Journal of Developmental Psychology, 7, 104–122.
doi: 10.1080/17405620903024263
Bell, A., Brenier, J. M., Gregory, M., Girand, C., & Jurafsky, D. (2009). Predictability effects on
durations of content and function words in conversational English. Journal of Memory and
Language, 60, 92–111. doi: 10.1016/j.jml.2008.06.003
Brennan, S. E., & Clark, H. H. (1996). Conceptual pacts and lexical choice in conversation. Journal
of Experimental Psychology: Learning, Memory, & Cognition, 22, 1482–1493.
Brown, P. M., & Dell, G. S. (1987). Adapting production to comprehension: The explicit mention
of instruments. Cognitive Psychology, 19, 441–472.
Brown-Schmidt, S., Gunlogson, C., & Tanenhaus, M. K. (2008). Addressees distinguish shared
from private information when interpreting questions during conversation. Cognition.
Manuscript in press.
Brown-Schmidt, S., & Hanna, J. E. (2011). Talking in another’s shoes: Incremental perspective-
taking in language processing. Dialogue and Discourse, 2, 11–33.
Chafe, W. L. (1976). Givenness, contrastiveness, definiteness, subjects, topics, and point of view.
In C. N. Li (Ed.), Subject and topic. Academic Press.
Clark, H. H., & Carlson, T. B. (1981). Context for comprehension. In J. Long & A. Baddeley
(Eds.), Attention and performance IX (pp. 313–330). Hillsdale, N. J.: Erlbaum.
Clark, H. H., & Krych, M. A. (2004). Speaking while monitoring addressees for understanding.
Journal of Memory and Language, 50, 62–81.
Clark, H. H., & Marshall, C. R. (1981). Definite reference and mutual knowledge. In A. K. Joshi,
B. L. Webber, & I. A. Sag (Eds.), Elements of discourse understanding (pp. 10–61). Cam-
bridge: Cambridge University Press.
Clark, H. H., & Murphy, G. L. (1982). Audience design in meaning and reference. In J. Le Ny &
W. Kintsch (Eds.), Language and comprehension (pp. 287–299). Amsterdam: North Holland
Publishing.
Clark, H. H., & Wilkes-Gibbs, D. (1986). Referring as a collaborative process. Cognition, 22,
1–39.
Cohen, A. A., & Harrison, R. P. (1973). Intentionality in the use of hand illustrators in face-to-face
communication situations. Journal of Personality and Social Psychology, 28(2), 276–279.
Retrieved from http://dx.doi.org/10.1037/h0035792 doi: 10.1037/h0035792
Epley, N., Keysar, B., Van Boven, L., & Gilovich, T. (2004, September). Perspective Taking as
Egocentric Anchoring and Adjustment. Journal of Personality and Social Psychology, 87(3),
327–339. Retrieved from http://dx.doi.org/10.1037/0022-3514.87.3.327
doi: 10.1037/0022-3514.87.3.327
Epley, N., Morewedge, C. K., & Keysar, B. (2004). Perspective taking in children and adults: Equiv-
alent egocentrism but differential correction. Journal of Experimental Social Psychology, 40,
760–768.
Ferreira, V. S., & Dell, G. S. (2000). Effect of ambiguity and lexical availability on syntactic and
lexical production. Cognitive Psychology, 40, 296–340.
Fowler, C. A., & Housum, J. (1987). Talkers’ signaling of "new" and "old" words in speech and
listeners’ perception and use of the distinction. Journal of Memory and Language, 26, 484–
504.
Fukumura, K., & van Gompel, R. P. G. (2012). Producing Pronouns and Definite Noun
Phrases: Do Speakers Use the Addressee’s Discourse Model? Cognitive Science, 36(7),
1289–1311. Retrieved from http://dx.doi.org/10.1111/j.1551-6709.2012.01255.x
doi: 10.1111/j.1551-6709.2012.01255.x
Fussell, S. R., & Krauss, R. M. (1992). Coordination of knowledge in communication: Effects of
speakers’ assumptions about others’ knowledge. Journal of Personality and Social Psychol-
ogy, 62, 378–391.
Galati, A., & Brennan, S. E. (2010). Attenuating information in spoken communication: For
the speaker, or for the addressee? Journal of Memory and Language, 62, 35–51. doi:
10.1016/j.jml.2009.09.002
Gann, T. M., & Barr, D. J. (2012). Speaking from experience: Audience design as expert perfor-
mance. Language and Cognitive Processes. Manuscript in press.
Gundel, J. K., Hedberg, N., & Zacharski, R. (1993). Cognitive status and the form of referring
expressions in discourse. Language, 69, 274–307.
Hanna, J. E., Tanenhaus, M. K., & Trueswell, J. C. (2003). The effects of common ground and
perspective on domains of referential interpretation. Journal of Memory and Language, 49,
43–61. doi: 10.1016/s0749-596x(03)00022-6
Heller, D., Grodner, D., & Tanenhaus, M. K. (2008). The role of perspective in identifying domains
of reference. Cognition, 108, 831–836.
Horton, W. S., & Gerrig, R. J. (2005). The impact of memory demands on audience design during
language production. Cognition, 96, 127–142.
Horton, W. S., & Keysar, B. (1996). When do speakers take into account common ground? Cogni-
tion, 59, 91–117.
Jurafsky, D. (2003). Probabilistic modeling in psycholinguistics: Linguistic comprehension and
production. In R. Bod, J. Hay, & S. Jannedy (Eds.), Probabilistic linguistics (pp. 39–95).