
Frequency vs. iconicity in explaining grammatical asymmetries
MARTIN HASPELMATH*
Abstract
This paper argues that three widely accepted motivating factors subsumed
under the broad heading of iconicity, namely iconicity of quantity, iconicity
of complexity and iconicity of cohesion, in fact have no role in explaining
grammatical asymmetries and should be discarded. The iconicity accounts
of the relevant phenomena have been proposed by authorities like Jakobson,
Haiman and Givón, but I argue that these linguists did not sufficiently
consider alternative usage-based explanations in terms of frequency of use. A
closer look shows that the well-known Zipfian effects of frequency of use
(leading to shortness and fusion) can be made responsible for all of the
alleged iconicity effects, and initial corpus data for a range of phenomena
confirm the correctness of the approach.
Keywords: frequency; iconicity; markedness; economic motivation
1. Introduction
The notion of iconicity has become very popular in the last 25 years
among functional and cognitive linguists. In Croft's (2003: 102) words,
“the intuition behind iconicity is that the structure of language reflects in
some way the structure of experience.” Iconicity is thus a very broad
notion, and it has been understood and applied in a great variety of ways
(see Newmeyer 1992: 23 for an attempt at a survey). In this paper,
I will examine just the three sub-types of (diagrammatic)¹ iconicity in
(1)–(3), which have played an important role in discussions of
grammatical asymmetries. I will argue that in fact none of these is relevant
for explaining grammatical asymmetries, and that the phenomena in
question should instead be explained by asymmetries of frequency of
occurrence.
Cognitive Linguistics 19–1 (2008), 1–33
DOI 10.1515/COG.2008.001
0936–5907/08/0019–0001
© Walter de Gruyter
(1) Iconicity of quantity
Greater quantities in meaning are expressed by greater quantities
of form.
Example: In Latin adjective inflection, the comparative and superlative
denote increasingly higher degrees and are coded by increasingly
longer suffixes (e.g., long(-us) ‘long’, long-ior ‘longer’,
long-issim(-us) ‘longest’).
(2) Iconicity of complexity
More complex meanings are expressed by more complex forms.
Example: Causatives are more complex semantically than the corresponding
non-causatives, so they are coded by more complex forms,
e.g., Turkish düş(-mek) ‘fall’, causative düş-ür(-mek) ‘make fall,
drop’.
(3) Iconicity of cohesion
Meanings that belong together more closely semantically are ex-
pressed by more cohesive forms.
Example: In possessive noun phrases with body-part terms, the
possessum and the possessor are conceptually inseparable. This is
mirrored in greater cohesion of coding in many languages, e.g.,
Maltese id ‘hand’, id-i ‘my hand’, contrasting with siġġu ‘chair’, is-siġġu
tiegħ-i [the-chair of-me] ‘my chair’ (*siġġ(u)-i).
While iconicity of quantity is mentioned rarely, iconicity of complexity
and iconicity of cohesion are often invoked in the functional and cognitive
literature (and recently to some extent also in the generative literature;
see §4.5). Both have been applied to a wide range of grammatical
phenomena by many different authors.
I argue in this paper that these three types of iconicity play no role
in explaining grammatical asymmetries of the type long(-us)/long-ior,
düş(-mek)/düş-ür(-mek), id-i/siġġu tiegħ-i. Instead, such formal
asymmetries can and should be explained by frequency asymmetries: In all
these cases, the shorter and more cohesive expression types occur
significantly more frequently than the longer and less cohesive expression types,
and this suffices to explain their formal properties. No appeal to iconicity
is necessary. Worse, iconicity often makes wrong predictions, whereas
frequency consistently makes the correct predictions.
I want to emphasize that I make no claims about other types of
iconicity, such as
- iconicity of paradigmatic isomorphism (one form, one meaning in the
  system, i.e., synonymy and homonymy are avoided; Haiman 1980; Croft
  1990a: 165, 2003: 105);
- iconicity of syntagmatic isomorphism (one form, one meaning in the
  string, i.e., empty, zero and portmanteau morphs are avoided; Croft
  1990a: 165, 2003: 103);²
- iconicity of sequence (sequence of forms matches sequence of
  experiences; e.g., Greenberg 1963 [1966: 103]);
- iconicity of contiguity (forms that belong together semantically occur
  next to each other; this is similar to iconicity of cohesion, but different in
  crucial ways, cf. §5);
- iconicity of repetition (repeated forms signal repetition in experience,
  as when reduplication expresses plurality or distribution).
For most of these iconicity types, frequency is clearly not a relevant
factor, and I have no reason to doubt the conventional view that the
relevant phenomena are motivated by functional factors that can be con-
veniently subsumed under the label iconicity. Whether these functional
factors can be reduced to a general preference for iconic over noniconic
patterns is a separate question that I will not pursue here.
I also need to emphasize that I am interested in explanation of gram-
matical structures, perhaps more so than many other authors that have
discussed iconicity. That is, I want to know why language structure is
the way it is, whereas some authors seem to be content with observing
that language structure is sometimes iconic:
The traditional view of language is that most relationships between linguistic units
and the corresponding meanings are arbitrary . . . But the cognitive claim is that
the degree of iconicity in language is much higher than has traditionally been
thought to be the case. (Lee 2001: 77)
As long as one merely observes that cases like long(-us)/long-ior and
düş(-mek)/düş-ür(-mek) can be regarded as iconic in some way, I have
no problem. What I am denying is that iconicity plays a motivating role
and should be invoked in explaining why the patterns are the way they are.
What I observed while reading the literature on iconicity is that a
number of authors (e.g., Hockett 1958: 577–578; Givón 1985, 1991) seem to
use the term iconicity as a kind of antonym of arbitrariness, so that
almost anything about language structure that is not arbitrary falls under
iconicity. I am in broad sympathy with Givón's general account of the
relation between arbitrariness and non-arbitrariness in language, but I
would insist on the need to identify the relevant factors as precisely as
possible and to make testable predictions. It is quite possible that the dis-
agreements about the role of frequency vs. iconicity will eventually turn
out to be less severe than it may seem at the beginning, but in any event
this paper should help to clarify the issues.
Iconicity and the frequency asymmetries discussed here are universal
explanatory factors, so their effects should be universal. This means that
in principle confirming data could come from any language, and ideally
the data should come from a large representative sample of languages.
Such data are still not very widely available, so this paper will continue
the practice of Haiman (1983) (and much other work) of making claims
about universal asymmetries that are not fully backed up by confirming
data, but that nevertheless seem very plausible because of the apparent
absence of counterevidence. Likewise, disconfirming data could come
from any language, but of course isolated counterexamples are not
sufficient to show that no systematic coding asymmetry exists. Many of the
generalizations cited here are known to be merely strong tendencies, not
absolute universals.
The remainder of this paper is organized as follows: §2 discusses
iconicity of quantity, §§3–4 discuss iconicity of complexity, and §§5–6 discuss
iconicity of cohesion. For each subtype of iconicity, I will first cite
authors who have advocated it and mention examples of phenomena that
are allegedly motivated by iconicity, before presenting my arguments for
a frequency-based explanation of the phenomena. The final §7 presents
the conclusions.
2. Iconicity of quantity
2.1. Advocates and examples
Iconicity of quantity was defined in §1 as follows:
(4) Greater quantities in meaning are expressed by greater quantities of
form.
It seems that the first author to mention this motivating principle was
Jakobson (1965[1971: 352]) and (1971). Jakobson cited three examples:
(i) In many languages, “the positive, comparative and superlative
degrees of adjectives show a gradual increase in the number of phonemes,
e.g., high-higher-highest, [Latin] altus, altior, altissimus. In this way, the
signantia reflect the gradation gamut of the signata” (1965[1971: 352]).
The higher the degree, the longer the adjective.
(ii) “The signans of the plural tends to echo the meaning of a numeral
increment by an increased length of the form” (1965[1971: 352]). The
more referents, the more phonemes (e.g., singular book, plural books,
French singular je finis ‘I finish’, plural nous finissons ‘we finish’).
(iii) In Russian, the perfective aspect expresses a limitation in the
extent of the narrated event, and it is expressed by a more limited (i.e.,
a smaller) number of phonemes (e.g., perfective zamoroz-it', imperfective
zamoraž-ivat' ‘freeze’) (Jakobson 1971).
Iconicity of quantity is mentioned approvingly in Plank (1979: 123),
Haiman (1980: 528–529, 1985: 5), Anttila (1989: 17), in Taylor's (2002:
46) Cognitive Grammar textbook, and in Itkonen (2004: 28); see also
Lakoff and Johnson (1980: 127).
2.2. Frequency-based explanation
Any efficient sign system in which costs correlate with signal length will
follow the following economy principle:³
(5) The more predictable a sign is, the shorter it is.
Since frequency implies predictability, we also get the following
prediction for efficient sign systems:
(6) The more frequent a sign is, the shorter it is.
These principles have been well known at least since Horn's (1921) and
Zipf's (1935) work, but somehow under the influence of the structuralist
movements many linguists lost sight of them for a few decades. However,
more recently cognitively oriented linguists have begun to appreciate the
importance of frequency again (e.g., Bybee and Hopper 2001, among
many others). I do not claim to have original insights about the way in
which frequency influences grammatical structures, but I want to argue
that iconicity turns out to be less important as an explanatory concept if
one gives frequency the explanatory role that it deserves.
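The link between principles (5) and (6) can be illustrated with a small numerical sketch. It assumes an information-theoretic reading of "efficient sign system" that the text itself does not spell out: the ideal length of a sign is taken to be its information content, -log2(p), where p is its relative frequency. The sign labels and counts are hypothetical and serve only to show the direction of the effect.

```python
import math

def ideal_code_lengths(counts):
    """Return the information content, -log2(p), of each sign in bits.

    Assumption for illustration: an efficient system gives each sign a
    length proportional to this value, so frequent signs come out short.
    """
    total = sum(counts.values())
    return {sign: -math.log2(n / total) for sign, n in counts.items()}

# Hypothetical usage counts for four competing signs (not corpus data).
toy_counts = {"A": 8, "B": 4, "C": 2, "D": 2}

lengths = ideal_code_lengths(toy_counts)
for sign, bits in sorted(lengths.items(), key=lambda kv: kv[1]):
    print(f"{sign}: count={toy_counts[sign]}, ideal length={bits:.1f} bits")
```

Under these assumptions the most frequent sign receives the shortest ideal length and the rarest signs the longest, which is the pattern stated in (6).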
Principle (6) straightforwardly explains Jakobson's observations about
adjectival degree marking and singular/plural asymmetries, because
universally comparative and superlative forms are significantly rarer than
positive forms of adjectives, and plural forms are significantly rarer than
singular forms (see Greenberg 1966: 34–37, 40–41). It is not possible to
make such a universal statement about perfective and imperfective aspect,
and the frequency of these aspectual categories depends much more on
the lexical meaning of the individual verb. But for Russian, Fenk-Oczlon
(1990) has shown that there is a strong correlation between length and
frequency of a verb form: in general, the more frequent member of a Rus-
sian aspectual pair is also shorter.
This frequency-based explanation is not only sufficient to account
for the phenomena cited by Jakobson, but also necessary, because the
principle of iconicity of quantity makes many wrong predictions (as
was also observed by Haiman 2000: 287). For example, it predicts that
plurals should generally be longer than duals, that augmentatives should
generally be longer than diminutives, that words for ‘ten’ should be
longer than words for ‘seven’, or even that words for ‘long’ should
be longer than words for ‘short’, or that words for ‘elephant’ should be
longer than words for ‘mouse’. None of these predictions are generally
correct (except perhaps for the last prediction, but note that mouse is
about twice as frequent as elephant in English).⁴
Iconicity of quantity has never been considered particularly important,
and its refutation here is only a prelude to the refutation of the other two
kinds of iconicity in §§3–6.
3. Iconicity of complexity: Advocates and examples
Iconicity of complexity was defined in §1 as follows:
(7) More complex meanings are expressed by more complex forms.
Here are some quotations from the literature that describe this principle
and refer to it as isomorphic or iconic.
- Lehmann (1974: 111): “Je komplexer die semantische Repräsentation
  eines Zeichens, desto komplexer seine phonologische Repräsentation.”
  (The more complex the semantic representation of a sign is,
  the more complex is its phonological representation.)
- Mayerthaler (1981: 25): “Was semantisch ‘mehr’ ist, sollte auch
  konstruktionell ‘mehr’ sein.” (What is ‘more’ semantically should also
  be ‘more’ constructionally.)
- Givón (1991: §2.2): “A larger chunk of information will be given a
  larger chunk of code.”
- Haiman (2000: 283): “The more abstract the concept, the more
  reduced its morphological expression will tend to be. Morphological
  bulk corresponds directly and iconically to conceptual intension.”
- Langacker (2000: 77): “[I]t is worth noting an iconicity between of's
  phonological value and the meaning ascribed to it (cf. Haiman 1983).
  Of all the English prepositions, of is phonologically the weakest by
  any reasonable criterion. . . . Now as one facet of its iconicity, of is
  arguably the most tenuous of the English prepositions from the
  semantic standpoint as well . . .”
In Lehmann's (1974) approach, semantic complexity is measured by
counting the number of features needed to describe the meaning of an
expression. A contrast between presence and absence of a semantic feature
is often called semantic markedness, and very often iconicity of
complexity is described as a kind of iconicity of markedness matching:
(8) Marked meanings are expressed by marked forms.
This principle was already formulated by Jakobson (1963[1966: 270]),
and repeated many times in the later literature, e.g.,
- Plank (1979: 139): “Die formale Markiertheitsopposition bildet die
  konzeptuell-semantische Markiertheitsopposition d[iagrammatisch]-ikonisch
  ab.” (The formal markedness opposition mirrors the
  conceptual-semantic markedness opposition in a diagrammatically
  iconic way.)
- Haiman (1980: 528): “Categories that are marked morphologically
  and syntactically are also marked semantically.”
- Mayerthaler (1987: 489): “If (and only if) a semantically more
  marked category C_j is encoded as more featured [= formally complex]
  than a less marked category C_i, the encoding of C_j is said to be
  iconic.”
- Givón (1991: 106, 1995: 58): “The meta-iconic markedness principle:
  Categories that are cognitively marked – i.e., complex – tend also to
  be structurally marked.”
- Aissen (2003: 449): “Iconicity favors the morphological marking of
  syntactically marked configurations.”
For similar statements, see also Zwicky (1978: 137), Matthews (1991:
236), Newmeyer (1992: 763), and Levinson (2000: 136–137).
By “formally marked”, these authors generally mean “expressed
overtly”. Typical examples of such markedness matching are given in
(9).
(9)             less marked/unmarked            (more) marked
    number      singular (tree-Ø)               plural (tree-s)
    case        subject (Latin homo-Ø)          object (homin-em)
    tense       present (play-Ø)                past (play-ed)
    person      third (Spanish canta-Ø)⁵        second (canta-s)
    gender      masculine (petit-Ø)             feminine (petit-e)
    causation   non-causative                   causative
                (Turkish düş-Ø-mek ‘fall’)      (düş-ür-mek ‘fell, drop’)
    object      inanimate                       animate
                (Spanish Veo la casa            Veo a la niña.
                ‘I see the house’)              ‘I see the girl.’)
That there are universal formal asymmetries in these (and many other)
categories has been known since Greenberg (1966), and Jakobson
(1963[1966]) and (1965[1971]) explicitly refers to Greenberg's
cross-linguistic work. However, Greenberg did not invoke iconicity to explain
the formal asymmetries of the kind illustrated in (9). He had good rea-
sons, as we will see in the next section.
4. Iconicity of complexity: frequency-based explanation
4.1. Complex/marked expressions are rarer
Greenberg's (1966) explanation was in terms of the frequency
asymmetries in the use of the grammatical forms. He noted that less marked
forms are more frequent, and more marked forms are less frequent
across languages. Thus, the economy principles in (5)–(6) are sufficient
to explain the asymmetries in (9) (see also Croft 2003: 110–117). The
English preposition of is not only the most semantically tenuous
(Langacker 2000: 77), but also the most frequent of all the English
prepositions. Singulars are more frequent than plurals, nominatives are more
frequent than accusatives, the present tense is more frequent than the
past tense, the third person is more frequent than other persons, and
the masculine is more frequent than the feminine. All of this was
documented by Greenberg (1966) for a few selected languages, and the
hypothesis that it holds universally has not been challenged. That
causatives are generally less frequent than the corresponding non-causatives
is also clear; I discuss this case in more detail below (§4.4). And among
objects, inanimate referents are much more frequent than animate
referents (§4.5).
This frequency-based explanation is not only sufficient to account for
the relevant phenomena, but also necessary, because iconicity of
complexity makes some wrong predictions. In (10), I list cases that go in the
opposite direction of the patterns in (9).
(10)            less marked/unmarked            (more) marked
     number     plural                          singular
                Welsh plu ‘feathers’            plu-en ‘feather’
     case       object case                     subject case
                Godoberi mak'i ‘child’          mak'i-di (ergative)
     person     second p. imperative            third p. imperative
                Latin canta-Ø ‘sing!’           canta-to ‘let her sing’
     gender     female                          male
                English widow-Ø                 widow-er
     causation  causative                       noncausative
                German öffnen                   sich öffnen
In all these cases, frequency makes the right predictions. Plurals like
Welsh plu ‘feathers’ are more frequent than singulars (Tiersma 1982), in
the imperative mood the second person is more frequent than the third
person, the word widow is more frequent than the word widower, and
with verbs like ‘open’, the causative is more frequent than the
noncausative (see §4.4).
These exceptions have long been known in the literature, but linguists
have often described them in terms of markedness reversal. The idea is
that markedness values can be different in different contexts, so that, for
example, third person is not absolutely unmarked with respect to second
person, but in certain contexts second person can be unmarked and first
person can be marked (e.g., Waugh 1982; Tiersma 1982; Witkowski and
Brown 1983; Haiman 1985: 148–149; Croft 1990a: 66). But in order to
reconcile the cases in (10) with iconicity of complexity, one would have
to show that not only the formal coding, but also the semantic/functional
markedness value has changed. This is much more difficult, and it has
not been shown that it is generally true that in cases of markedness
reversal, the formally unmarked term of the opposition is also semantically or
functionally unmarked. For example, Tiersma's (1982) main additional
evidence that locally unmarked plurals like Welsh plu ‘feathers’ are
generally unmarked (i.e., do not merely show reversed formal coding) is
that in analogical leveling, the plural survives. But analogical leveling is
of course just another symptom of frequency of occurrence (cf. Bybee
1985: Ch. 3).
To make matters even more complex, some authors seem to mean
frequency when they say (functional) unmarkedness: “Marked” means rare,
and “unmarked” means frequent. For example, in a discussion of
unmarked plurals, Haiman writes:
. . . what is fundamentally at issue is markedness. Where plurality is the norm, it
is the plural which is unmarked, and a derived marked singulative is employed
to signal oneness: thus, essentially, wheat vs. grain of wheat. (Haiman 2000:
287)
The norm is of course the same as the more frequent situation, so what
is fundamentally at issue is frequency. Linguists are of course free to
define their terms in whatever way they wish, but claiming not only that
formally marked elements tend to be functionally marked (in the sense
of being less frequent), but also that this is a surprising instance of
markedness matching (or iconicity), is not helpful. The much simpler
observation is that formally marked elements tend to be less frequent, and
this observation is straightforwardly explained by the economy
principles in (5)–(6). Neither iconicity nor markedness are relevant
concepts in stating and explaining these facts (see Haspelmath 2006 for
detailed argumentation that a notion of markedness is superfluous in
linguistics).
The contrasts in (9) show zero expression vs. overt expression, but
some authors such as Lehmann (1974) and Haiman (2000) also talk
about length differences between different types of morphemes. In
particular, both authors note that grammatical morphemes are universally
shorter than lexical morphemes, and they claim that this iconically
mirrors their more abstract or less complex meaning. But again frequency
and economy account for the same facts. Iconicity makes the wrong
prediction that lexical items with highly abstract or simple meanings
should be consistently shorter than items with more concrete or complex
meanings (as noted by Ronneberger-Sibold 1980: 239). It predicts, for
example, that entity should be shorter than thing or action, that animal
should be shorter than cat, that perceive should be shorter than see, and
so on.⁶
4.2. Relative frequency and absolute frequency
It is important to recognize that the relevant type of frequency for the
purposes of this paper is relative frequency, not absolute frequency (cf.
Corbett et al. 2001 for some discussion of this contrast). That is, what I
am looking at here is the relation between the frequency of one category
and the frequency of another category (within a class of lexemes or a
construction): e.g., the relation between the frequency of singulars and
the frequency of plurals (in nouns), the relation between the frequency of
positive forms and the frequency of comparative forms (in adjectives), the
relation between the frequency of inanimate objects and the frequency of
animate objects (in transitive verb phrases), and so on.
I am not looking at the absolute frequencies of individual lexemes with
a particular category. The absolute frequency of English books, the plural
of book, is 131 (occurrences per million words, Leech et al. 2001), while
the singular of notebook occurs only 8 times. But the singular and the
plural should not be compared across different lexemes. The relative
frequencies are as expected: book 243, books 131, notebook 8, notebooks 3.
Likewise for positives and comparatives: the comparative lower occurs
111 times, and the positive bright occurs only 54 times. But the propor-
tions (i.e., relative frequencies) are as expected: low 158, lower 111, bright
54, brighter 5.
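The contrast between absolute and relative frequency can be made concrete with the figures just cited (occurrences per million words, Leech et al. 2001). The following is a minimal sketch; the function name `marked_share` and the data layout are illustrative choices, not part of the original analysis.

```python
# Each entry: (basic form, its count, marked form, its count),
# using the per-million figures cited in the text.
pairs = [
    ("book", 243, "books", 131),
    ("notebook", 8, "notebooks", 3),
    ("low", 158, "lower", 111),
    ("bright", 54, "brighter", 5),
]

def marked_share(basic_count, marked_count):
    """Share of the marked form among the two paradigmatic alternatives."""
    return marked_count / (basic_count + marked_count)

shares = {}
for basic, n_basic, marked, n_marked in pairs:
    shares[marked] = marked_share(n_basic, n_marked)
    print(f"{basic}/{marked}: marked form accounts for {shares[marked]:.0%}")

# Comparing absolute counts across lexemes is misleading:
# plural "books" (131) outnumbers singular "notebook" (8).
print(131 > 8)
```

Within each pair the plural or comparative accounts for less than half of the occurrences, even though an absolute count such as that of books exceeds the count of the singular notebook.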
What is crucial is that the items whose frequency and formal expression
are compared are paradigmatic alternatives, i.e., that in some sense they
must occur in the same slot. It is in such slots that expectations arise, so
that more frequent items can make do with shorter coding because of
their greater predictability. If two items are not paradigmatically related,
it does not make so much sense to compare their frequency.
Another question is how big the frequency difference should be to be
reflected in grammar. The answer is: significant. Perhaps one would see
bigger differences in form where the frequency differences are bigger, but
this is an issue that I do not pursue in this paper.
4.3. Adjectives and abstract nouns: Resolving an iconicity paradox
Croft and Cruse (2004: 175) observe a curious iconicity paradox in
connection with adjectives such as those in (11) and the corresponding
abstract nouns:
(11) long leng-th
deep dep-th
high heigh-t
thick thick-ness
They note that definitions of such adjectives presuppose a scale of length,
depth, height, or thickness that is expressed by an abstract noun. Thus,
long means something like ‘noteworthy in terms of length’ (cf. also
Mel'čuk 1967). This abstract noun is thus conceptually simpler than the
adjective, and yet it tends to be morphologically more complex across
languages. The situation in (11) thus appears to run counter to the
principle that morphological complexity mirrors cognitive complexity (Croft
and Cruse 2004: 175).
Croft and Cruse try to solve the paradox, but do not seem to be very
confident in their solution:
One possible explanation is that, in applying the iconic principle, we should
distinguish between structural complexity (in terms of the number of elementary
components and their interconnections) and processing complexity (in terms of
the cognitive effort involved). Perhaps they are acquired first of all in an
unanalyzed, primitive, Gestalt sense, which is basically relative. Maybe in order to
develop the full adult system, analysis and restructuring are necessary. Some of
the results of the analysis may well be conceptually simpler in some sense than
the analysand, but the extra effort that has gone into them is mirrored by the
morphological complexity. (Croft and Cruse 2004: 175)
But in fact, no solution to the paradox is required, because it is a
pseudo-paradox: There is no principle that morphological complexity
mirrors cognitive complexity. As we saw, morphological complexity
(in the sense of length) mirrors rarity of use. It is easy to determine that
adjectives are significantly more frequent than the corresponding abstract
nouns. In (12), frequency figures from Leech et al. 2001 are given (the
figures again indicate occurrences per million words). The example of
beautiful/beauty shows that isolated exceptions to the coding regularity
are possible.⁷
(12) long 392 leng-th 85
deep 97 dep-th 41
high 547 heigh-t 47
thick 51 thick-ness <10
beautiful 87 beauty 44
4.4. The inchoative-causative alternation: Economy instead of iconicity
In §3 and §4.1, we saw that pairs of noncausative (inchoative) and
causative verbs are not uniformly coded: Sometimes the causative is coded
overtly, based on the inchoative (e.g., Turkish düş-Ø-mek ‘fall’,
düş-ür-mek ‘fell, drop’), and sometimes the inchoative is coded overtly, based
on the semantically causative verb. Such cases are called anticausatives
(e.g., German öffnen ‘open (tr.)’, sich öffnen ‘open (intr.)’; Russian
otkryvat' ‘open (tr.)’, otkryvat'-sja ‘open (intr.)’).
On the natural assumption that causatives have an additional meaning
element (i.e. Russian otkryvat'sja means ‘become open’, and otkryvat'
means ‘cause to become open’), anticausative coding would be
counter-iconic (as was observed by Mel'čuk 1967). This was seen as a problem
by Haspelmath (1993), who assumed the iconicity-of-complexity principle
(as well as markedness matching). However, Haspelmath found in a
cross-linguistic study that different verb pairs tend to behave differently
with respect to which member of the pair (the inchoative or the causative)
tends to be coded overtly (cf. also Croft 1990b). Some verb meanings
(which for convenience will be called automatic) tend to be coded as
causatives (e.g., ‘freeze’, ‘dry’, ‘sink’, ‘go out’, ‘melt’), whereas others (which
for convenience will be called costly) are preferably coded as
anticausatives (e.g., ‘split’, ‘break’, ‘close’, ‘open’, ‘gather’). The idea behind the
terms automatic and costly is that the automatic events do not often
require input from an agent to occur, whereas the costly events tend not
to occur spontaneously but must be instigated by an agent. While the
automatic events conform to iconicity, it is especially the costly events that
do not. Haspelmath tried to save the iconicity hypothesis by suggesting
that in some way the frequency of occurrence of a particular event
description is reflected in the way its meaning is treated by speakers:
Iconicity in language is based [not on objective meaning but] on conceptual
meaning . . . Events that are more likely to occur spontaneously will be associated
with a conceptual stereotype (or prototype) of a spontaneous event, and this will
be expressed in a structurally unmarked way. (Haspelmath 1993: 106–107)
This move is reminiscent of Lehmann's suggestion that rarity results in
a high informational value and therefore somehow in high semantic
complexity (cf. note 6), and of the desperate attempt by Croft and Cruse
to solve their iconicity paradox.
Fortunately, a much simpler explanation is available in which iconicity
of complexity plays no role, and the coding preferences are explained
in terms of economy: Automatic verb meanings tend to occur more fre-
quently as inchoatives than costly verb meanings, which tend to occur
more frequently as causatives. Due to economic motivation, the rarer
elements tend to be overtly coded. Wright (2001: 127–128) presents some
preliminary corpus evidence from English, as shown in Table 1:
Thus, inchoatives and causatives behave in much the same way as singu-
lars and plurals: Whichever member of the pair occurs more frequently
tends to be zero-coded, while the rarer (and hence less expected) member
tends to be overtly coded. Language-particular differences often obscure
this picture (e.g., languages that never have overtly coded singulars, or
languages lacking overtly coded causatives), which emerges fully only
once a typological perspective is adopted.
4.5. Differential object marking: Economy instead of iconicity
It has long been observed (e.g., Blansitt 1973; Comrie 1989; Bossong
1985, 1998) that the overt coding of a direct object often depends on
its animacy, and that such variation in object-marking can be subsumed
under a general rule:
(13) The higher a (direct) object is on the animacy scale, the more likely
it is to be overtly coded (i.e., accusative-marked).
According to Comrie, this is because animate objects are not as natural
as inanimate objects:
. . . the most natural kind of transitive construction is one where the A[gent] is
high in animacy and definiteness and the P[atient] is lower in animacy and
definiteness; and any deviation from this pattern leads to a more marked
construction. (Comrie 1989: 128)

Table 1. Percentage of transitive (= causative) occurrences of some English
inchoative-causative verb pairs

verb pair    % transitive
freeze       62%             (more causatives)
dry          61%
melt         72%
burn         76%
open         80%
break        90%             (more anticausatives)

In an interesting paper that tries to integrate insights from the
functional-typological literature into an Optimality Theory (OT)
framework, Aissen (2003: §3) proposes an account that appeals to a fixed
constraint subhierarchy involving local conjunction of a markedness
hierarchy of relation/animacy constraints (cf. 14) with a constraint against
non-coding (*Ø_Case):
(14) markedness subhierarchy:
     *Obj/Hum ≫ *Obj/Anim ≫ *Obj/Inan
The resulting fixed constraint subhierarchy is shown in (15). Roughly this
can be read as follows: Structures with zero-coded human objects are
worse than structures with zero-coded animate objects, and these in turn
are worse than structures with zero-coded inanimate objects.
(15) *Obj/Hum & *Ø_Case ≫ *Obj/Anim & *Ø_Case ≫ *Obj/Inan & *Ø_Case
Aissen motivates these constraints by appealing to markedness matching
and iconicity:
The effect of local conjunction here is to link markedness of content (expressed by
the markedness subhierarchy) to markedness of expression (expressed by *Ø).
That content and expression are linked in this way is a fundamental idea of
markedness theory (Jakobson 1939; Greenberg 1966). In the domain of Differential
Object Marking, this is expressed formally through the constraints [in (15)]. Thus
they are iconicity constraints: they favor morphological marks for marked
configurations. (Aissen 2003: 449)
Combined with economy constraints (*Struc), these constraints allow
Aissen to describe all and only the attested language types in her
framework.
However, a much more straightforward explanation of the Differential
Object Marking universal is available: Inanimate NPs occur more frequently
as objects, whereas animate NPs occur more frequently as subjects.
Due to economic motivation, the rarer elements tend to be overtly
coded. This explanation has in fact long been known (Filimonova 2005
cites antecedents in the 19th century), though actual frequency evidence
has been cited only more recently (see Jäger 2004).⁸
Thus, no appeal to markedness matching or iconicity is needed, nor is
Aissen's elaborate machinery of OT constraints needed to explain
Differential Object Marking.
5. Iconicity of cohesion: Advocates and examples
Iconicity of cohesion was defined in §1 as follows:
(16) Meanings that belong together more closely are expressed by more
cohesive forms.
Iconicity of cohesion is discussed in detail by Haiman (1983) under the
label "iconic expression of conceptual distance" ("The linguistic distance
between expressions corresponds to the conceptual distance between
them", Haiman 1983: 782).⁹ What he means by linguistic distance is
made clear by the scale in (17), where (a)–(d) show diminishing linguistic
distance (in my terms, increasing cohesion).
(17) Haiman's (1983: 782) cohesion scale
a. X word Y (function-word expression)
b. X Y (juxtaposition)
c. X-Y (bound expression)
d. Z (portmanteau expression)
I prefer the term cohesion to distance for this scale, because (b) and (c) do
not literally differ in distance, and distance is not really applicable to (d).
Moreover, I want to distinguish strictly between cohesion and contigu-
ity. That there is a functionally motivated preference for contiguity, i.e.,
for elements that belong together semantically to occur next to each
other in speech, is beyond question (see also Hawkins 2004: Ch. 5).
Newmeyer's (1992: 761–762) discussion of "iconicity of distance" (and
similarly Givón's (1985: 202, 1991: 89) "proximity principle") conflate
cohesion and contiguity. I only argue against an iconicity-based
explanation of phenomena related to cohesion.
The following four examples of iconicity of cohesion are the most
important cases cited in the literature:
(i) Possessive constructions: Inalienable possession shows at least the
same degree of cohesion as alienable possession, because in inalienable
possession (i.e., possession of kinship and body part terms) the possessor
and the possessum belong together more closely semantically (Haiman
1983: 793–795, 1985: 130–136; see also Koptjevskaja-Tamm 1996). An
example:
(18) Abun (West Papuan; Berry and Berry 1999: 77–82)
a. ji bi nggwe
   I of garden
   'my garden'
b. ji syim
   I arm
   'my arm'
(ii) Causative constructions: Causative constructions showing a greater
degree of cohesion tend to express direct causation (where cause and
result belong together more closely), whereas causative constructions
showing less cohesion tend to express indirect causation (Haiman 1983:
783–787; cf. also Comrie 1989: 172–173; Dixon 2000: 74–78). The
following example is cited by Dixon (2000: 69):
(19) Buru (Austronesian; Indonesia; Grimes 1991: 211)
a. Da puna ringe gosa.
   3sg.A cause 3sg.O be.good
   'He (did something which, indirectly,) made her well.'
b. Da pe-gosa ringe.
   3sg.A caus-be.good 3sg.O
   'He healed her (directly, with spiritual power).'
A similar Japanese example is provided by Horie (1993: 26):
(20) a. John-wa Mary-ni huku-o ki-se-ta.
        John-top Mary-dat clothes-acc wear-caus-past
        'John put clothes on Mary.'
     b. John-wa Mary-ni huku-o ki sase-ta.
        John-top Mary-dat clothes-acc wear cause-past
        'John made Mary wear clothes.'
The much-discussed English distinction between kill and cause to die is of
course also an instance of this contrast (e.g., Lakoff and Johnson 1980:
131).
(iii) Coordinating constructions: Many languages distinguish between
loose coordination and tight coordination (i.e., less vs. more cohesive
patterns), where the first expresses greater conceptual distance and the latter
expresses less conceptual distance (Haiman 1983: 788–790, 1985: 111–124).
Haiman discusses coordination of clauses and cites the two examples
in (21) and (22), where the greater cohesion is manifested by the
absence of a coordinator. In (21a), the greater conceptual distance lies in
the temporal non-connectedness, while in (22a), the greater conceptual
distance lies in the lack of subject identity.
(21) Fe'fe' (Bantoid; Cameroon; Hyman 1971: 43)
a. à kà gen ntee n njwen lwà'
   he past go market and buy yams
   'He went to the market and also (at some later date) bought yams.'
b. à kà gen ntee njwen lwà'
   he past go market buy yams
   'He went to the market and bought yams (there).'
(22) Aghem (Bantoid; Cameroon; Anderson 1979: 114)
a. Ò nam kb gha y a z
   she cook fufu we.excl and eat
   'She cooked fufu and we ate it.'
b. Ò m m mam kb
   she past sing cook fufu
   'She sang and cooked fufu.'
Wälchli (2005: Ch. 3) also discusses noun phrase coordination and cites
contrasts such as (23). He calls the semantic distinction between them
"accidental coordination" vs. "natural coordination", and claims that
the formal contrast between loose coordination in (23a) and tight
coordination in (23b) iconically reflects this semantic contrast (2005: 13).
(23) Georgian
a. gveli da k'ac'i
   snake and man
   'the snake and the man'
b. da-dzma
   sister-brother
   'brother and sister'
(iv) Complement clause constructions: Haiman (1985: 124–130) also
discusses complement-clause constructions in terms of iconicity of cohesion
mirroring conceptual closeness. He observes that in the contrast in
(24), the reduced or contracted version signals conceptual closeness
(same subject), while a non-reduced version signals conceptual distance
(different subject) (1985: 126).
(24) a. Who do you wanna succeed? (who = patient; same subject)
     b. Who do you want to succeed? (who = agent possible; diff.
        subject possible)
But much better known is Givón's work on iconic form-function
correspondences in complement clauses (1980, 1990: Ch. 13, 2001: Ch. 12;
see also 1985: 199–202, 1991: 95–96), which posits a scale of "event
integration" (called "binding hierarchy" in earlier versions) that
corresponds to a scale of formal integration. In the most recent version of
this, Givón posits an iconic principle of event integration and clause
union:
The stronger is the semantic bond between the two events, the more extensive will
be the syntactic integration of the two clauses into a single though complex clause
(Givón 2001: 40)
Among his examples are contrasts such as the following, where in each
case the first example exhibits greater event integration and greater
syntactic integration (non-finiteness and/or absence of a complementizer):
(25) a. John made Mary quit her job. (2001: 45)
b. John caused Mary to quit her job.
(26) a. She wanted him to leave. (2001: 47)
b. She wished that he would leave.
(27) a. She told him to leave. (2001: 48)
b. She insisted that he must leave.
(28) a. She saw him coming out of the theatre. (2001: 50)
b. She saw that he came out of the theatre.
6. Iconicity of cohesion: frequency-based explanation
My claim here is that Haiman's cohesion scale in (17) does not reflect one
single underlying cause. It should be taken apart into three different
distinctions: (i) overt coding vs. lack of coding (X word Y vs. X Y), (ii)
juxtaposition vs. bound expression (X Y vs. X-Y), and (iii) portmanteau
expression (Z). All three are related to frequency, but not in the same
way. This is clearest in the case of portmanteau expression (or
suppletion), which only occurs when the combination of the two elements has a
high absolute frequency. For instance, in the domain of causative
constructions, English has the bound causatives sadd-en 'make sad', wid-en
'make wide', hard-en 'make hard', but it is only for high-frequency
adjectives like good and small that it has suppletive causatives (improve 'make
good', reduce 'make small'). Similarly, a few cases of suppletion in
possessive constructions are attested, but these all come from high-frequency
nouns such as 'mother' (e.g., Ju|'hoan taqè 'mother', a a 'my mother',
Dickens 2005: 35). The reason why high absolute frequency favours
suppletion (and irregularity more generally) has long been known:
High-frequency elements are easy to store and retrieve from memory, so
there is little need for regularity (cf. Osthoff 1899, Ronneberger-Sibold
1988).
However, the overt-covert contrast (X word Y vs. X Y) and the free-bound
contrast (X Y vs. X-Y) are due to frequency-induced predictability,
as seen earlier for contrasts that others have explained by iconicity of
quantity (§2) and by iconicity of complexity (§§3–4). Predictability leads to
shortness of coding by economy, and shortness of coding itself leads
to bound expression, because short (and unstressed) elements do not
have enough bulk to stand on their own. The phenomena that Haiman
explains through iconicity of cohesion actually all instantiate only the
overt-covert contrast and/or the free-bound contrast, so what matters
for them is again relative frequency.
Let us now examine the four main construction types with alleged ef-
fects of iconicity of cohesion to see how their properties can be explained
in terms of relative frequency.
6.1. Possessive constructions
With inalienably possessed nouns, possessive constructions are of course
much more frequent than with alienably possessed nouns (cf. Nichols
1988: 579). This can be easily demonstrated with corpus figures. Table 2
shows frequencies of three (hopefully representative) sets of nouns in
spoken English and spoken Spanish.
Table 2. Frequencies of selected kinship terms, body part terms and alienable nouns
English        kinship terms (a)    body part terms (b)    alienable nouns (c)
total          16235   100%         11038   100%           24991   100%
possessed       7797    48%          4940    45%            2967    12%
nonpossessed    8434    52%          6098    55%           22024    88%
Source: British National Corpus, spoken part
a mother, father, brother(s), sister(s), wife, husband, son(s), daughter(s), mum, dad,
grandfather, grandmother, aunt, uncle
b head, hand(s), face, finger(s), knee(s), ear(s), leg(s), wrist, hair, nose, neck, belly,
skin, elbow, chest
c car, dinner, health, tree, knife, bed, community, meat, money, bike, suitcase, tools,
book(s), room, bedroom, kitchen
Spanish        kinship terms (d)    body part terms (e)    alienable nouns (f)
total          18391   100%          8863   100%           10913   100%
possessed       7362    40%          1297    15%             776     7%
nonpossessed   11029    60%          7566    85%           10137    93%
Source: Corpus del Español, spoken part
d madre, padre(s), hermano(s), hermana(s), esposa, marido, hijo(s), hija(s), mamá, papá,
abuelo(s), abuela, tía, tío
e cabeza, mano(s), cara, dedo(s), rodilla(s), oído(s), pierna(s), muñeca, pelo, nariz,
cuello, vientre, piel, codo, pecho, hombro(s)
f coche, cena, salud, árbol, cuchillo, cama, comunidad, pueblo, carne, dinero, bicicleta,
maleta, herramientas, libro(s), habitación, dormitorio, cocina
We see that alienable nouns occur as possessed nouns in a possessive
construction only relatively rarely (12% and 7% of the time, respectively),
while it is very common for kinship terms and body part terms to occur
as possessed nouns. (The fact that the figure for Spanish body part terms
is relatively low here is due to the omissibility of overt possessors in
body-part constructions like levanta la mano 'raise your hand'; strictly
speaking, all notional possessors would have to be counted, but this is
impossible to do automatically.)
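The percentages in Table 2 follow directly from the raw counts. As a minimal illustration (the names `TABLE_2`, `possessed_share` and `summarize` are hypothetical and not part of the original study), the computation can be sketched in Python as follows:

```python
# Counts copied from Table 2: (possessed tokens, total tokens) per noun class.
TABLE_2 = {
    "English": {
        "kinship terms": (7797, 16235),
        "body part terms": (4940, 11038),
        "alienable nouns": (2967, 24991),
    },
    "Spanish": {
        "kinship terms": (7362, 18391),
        "body part terms": (1297, 8863),
        "alienable nouns": (776, 10913),
    },
}


def possessed_share(possessed: int, total: int) -> float:
    """Proportion of a noun class's tokens that occur in a possessive construction."""
    return possessed / total


def summarize(table):
    """Return the possessed share of each noun class as a whole-number percentage."""
    return {
        language: {
            noun_class: round(100 * possessed_share(possessed, total))
            for noun_class, (possessed, total) in classes.items()
        }
        for language, classes in table.items()
    }


if __name__ == "__main__":
    for language, classes in summarize(TABLE_2).items():
        for noun_class, percent in classes.items():
            print(f"{language:8} {noun_class:16} {percent}% possessed")
```

The point of the sketch is only that the relevant quantity is a proportion within each noun class, not a raw token count.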
As we saw in §4.2, what counts is relative frequencies, not absolute
frequencies. Since frequent alienable nouns like house or show are much
more frequent than rare inalienable nouns like kidney or great niece in
most cultural contexts, the alienable nouns may well occur in a possessive
construction more often than the inalienable nouns. However, the
percentage of possessed occurrences of inalienable nouns will always be
significantly higher than the corresponding percentage of alienable nouns.
Thus, upon encountering an inalienable noun, it will be much easier to
predict that it occurs in a possessive construction, and the possessive
marking is therefore relatively redundant. Since languages are efficient
systems, they tend to show less overt coding with inalienable nouns.
Moreover, since pronominal possessors are more predictable, they show
a greater tendency to become affixed, thus accounting for the contrast
between juxtaposition and bound expression.
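One way to make "easier to predict" concrete is to express the possessed shares from Table 2 as surprisal values. This is an illustrative assumption rather than a measure used in the paper: the function name `surprisal_bits` is hypothetical, and the sketch treats the corpus proportion as the probability that a noun of a given class is possessed.

```python
import math


def surprisal_bits(probability: float) -> float:
    """Surprisal (in bits) of an event with the given probability."""
    return -math.log2(probability)


# English counts from Table 2: (possessed tokens, total tokens).
KINSHIP = (7797, 16235)
ALIENABLE = (2967, 24991)

p_kinship = KINSHIP[0] / KINSHIP[1]
p_alienable = ALIENABLE[0] / ALIENABLE[1]

if __name__ == "__main__":
    # A lower value means that possession is more expected for that noun class,
    # so overt possessive marking adds less information.
    print(f"kinship terms:   {surprisal_bits(p_kinship):.2f} bits")
    print(f"alienable nouns: {surprisal_bits(p_alienable):.2f} bits")
```

Under this assumption, possession is considerably less surprising with kinship terms than with alienable nouns, which is the asymmetry the economy account relies on.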
Crucially, the economy account given here makes somewhat different
predictions from Haiman's (1983) iconicity account. The facts show that
the predictions of the economy account are the correct ones.
First, the iconicity account is compatible with a hypothetical situation
in which the pronominal possessor in the inalienable construction is
actually longer than the corresponding form in the alienable possession.
However, economy additionally predicts that the form of the inalienable
pronominal possessor not only tends to be bound, but also tends to be
shorter than the alienable possessor. This is in general borne out, and I
know of no counterexamples. Some examples are given in (29).
(29)                          alienable             inalienable
                              construction          construction
a. Nakanai                    luma taku             lima-gu
   (Johnston 1981: 217)       house I               hand-1sg
                              'my house'            'my hand'
b. Hua                        dgai fu               d-za
   (Haiman 1983: 793)         I pig                 1sg-arm
                              'my pig'              'my arm'
c. Ndjebbana                  budmanda ngayabba     nga-ngardabbamba
   (McKay 1996: 302–6)        suitcase I            1sg-liver
                              'my suitcase'         'my liver'
d. Kpelle                     a pri                 m-polu
   (Welmers 1973: 279)        I house               1sg-back
                              'my house'            'my back'
e. Ju|'hoan                   m tjù                 m ba
   (Dickens 2005: 35)         1sg house             1sg father
                              'my house'            'my father'
Second, Haiman's account in terms of distance matching predicts that
the additional element in alienable constructions should occur in the
middle between the possessor and the possessum, as seen in the canonical
examples from Maltese (is-siġġu tiegħ-i [the-chair of-me] 'my chair', see
§1) and from Abun (ji bi nggwe [I of garden] 'my garden', see (18)).
However, the extra element may also occur to the left or right of both
the possessor and the possessum, as seen in (30).
(30)                               alienable            inalienable
                                   construction         construction
a. Puluwat                         nay-iy hamwol        pay-iy
   (Elbert 1974: 55, 61)           poss-1sg chief       hand-1sg
                                   'my chief'           'my hand'
b. 'O'odham                        ñ-mi:stol-ga         ñ-je'e
   (Zepeda 1983: 74–81)            1sg-cat-possd        1sg-mother
                                   'my cat'             'my mother'
c. Koyukon                         se-tel-e'            se-tlee'
   (Thompson 1996: 654, 667)       1sg-socks-possd      1sg-head
                                   'my socks'           'my head'
d. Achagua                         nu-caarru-ni         nu-wíta
   (Wilson 1992)                   1sg-car-possd        1sg-head
                                   'my car'             'my head'
My economy account only predicts that the coding of inalienable
constructions should tend to be shorter, but it says nothing about the
position of the extra coding element in alienable constructions, so cases like
(30a–d) are counterevidence to Haiman's iconicity account, but
compatible with my economy account. Haiman (1983: 795) himself cites the
Puluwat example, recognizes that it is a problem for him, and
acknowledges the need to reformulate his initial generalization. But he
does not seem to recognize that the facts no longer support any role of
iconicity.
Finally, some languages show overt coding of inalienable nouns as
well, but only when they are not possessed. An example comes from
Koyukon (Athabaskan; Thompson 1996: 654, 656, 667):
(31) Koyukon       unpossessed      possessed
     alienable     teł              se-tel-e'
                   socks            1sg-socks-possd
                   'socks'          'my socks'
     inalienable   k'e-tlee'        se-tlee'
                   unsp-head        1sg-head
                   'head'           'my head'
Haiman's iconicity does not make any predictions about unpossessed
constructions, but the economy account predicts just what we see:
Alienable nouns tend to have overt coding in the possessed construction,
whereas inalienable nouns tend to have overt coding in the unpossessed
construction.
Thus, the iconicity account is both too weak (in that it does not predict
the shortness of inalienable possessive pronouns, seen in (29)) and too
strong (in that it wrongly predicts that the patterns in (30) should not be
possible). Economy, by contrast, makes just the right predictions.
6.2. Causative constructions
Again I claim that direct causatives are significantly more frequent than
indirect causatives and that this explains why they exhibit more cohesive
coding than indirect causatives. No appeal to iconicity is necessary.
In order to show that this is true, ideally one would examine a corpus
of a language with a regular grammatical contrast between direct and
indirect causation, as illustrated in (19) for Buru and in (20) for Japanese. I
hope that this paper will inspire such research, and I expect that the direct
causatives are much more frequent than the indirect causatives. In the
literature on English, the contrasts between the different types of
periphrastic causatives have received some attention. According to Gilquin (2006:
7), the frequencies in the British National Corpus of the four causative
verbs that combine with an infinitive are as in (32):
(32)                                  spoken   written   total
     make (I made him go)             898      258       1,156
     get (I got him to go)            350       52         402
     cause (I caused him to go)        15      207         222
     have (I had him go)               48       29          77
Since the make and get causatives are usually regarded as expressing a
more direct type of causation, while the cause and have causatives express
a more indirect type of causation, this is just what we would expect.
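To see how strongly the figures in (32) are skewed, one can compute the share of the more direct causatives among all four verbs. The following Python sketch is purely illustrative: the names `CAUSATIVE_COUNTS`, `DIRECT`, `totals` and `direct_share` are hypothetical, and the grouping of make and get as "more direct" simply follows the classification stated in the text.

```python
# (spoken, written) frequencies from (32), after Gilquin (2006: 7).
CAUSATIVE_COUNTS = {
    "make": (898, 258),
    "get": (350, 52),
    "cause": (15, 207),
    "have": (48, 29),
}

# Verbs treated as expressing a more direct type of causation (as in the text).
DIRECT = {"make", "get"}


def totals(counts):
    """Total frequency (spoken + written) per causative verb."""
    return {verb: spoken + written for verb, (spoken, written) in counts.items()}


def direct_share(counts, direct):
    """Share of all causative tokens that belong to the 'direct' verbs."""
    per_verb = totals(counts)
    direct_total = sum(n for verb, n in per_verb.items() if verb in direct)
    return direct_total / sum(per_verb.values())


if __name__ == "__main__":
    print(totals(CAUSATIVE_COUNTS))
    print(f"direct share: {direct_share(CAUSATIVE_COUNTS, DIRECT):.0%}")
```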
It is also possible to compare lexical causative verbs with the
corresponding periphrastic cause causatives (this is also what Haiman 1983
mostly does for the semantic aspects). Some figures from the British
National Corpus are given in (33) (these are only the forms with a pronoun
object, i.e., kill me, cause him to die, etc.).
(33) stop          3267    cause to stop         6
     kill          2400    cause to die          2
     raise          466    cause to rise         3
     bring down     269    cause to come down    0
     drown           80    cause to drown        0
These comparisons are more problematic than those in (32) in that the
length of the two types of causatives differs sharply, so one might suspect
that the lexical direct causatives are more frequent simply because they
are shorter. In general, such effects do not seem to be particularly strong,
if they exist at all (see Haspelmath 2008: §6.5 for further discussion), but
still in the ideal case we would like to perform our corpus study on a
language where all causatives are expressed grammatically (i.e., even 'kill'
and 'raise' are expressed as 'die-caus' and 'rise-caus'). But since many
direct causatives are highly frequent (in an absolute sense) in all languages,
we normally find a lot of portmanteau expression of causatives, which
limits our options for corpus counts. Nevertheless, the figures in (32) and
(33) should be sufficient to make a good initial case for the claim that
direct causatives are generally more frequent than indirect causatives.
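The asymmetry in (33) can likewise be expressed as the share of lexical causatives within each pair. This is a minimal sketch with hypothetical names (`PAIRS`, `lexical_share`); it uses a share rather than a ratio because two of the periphrastic counts are zero.

```python
# (lexical causative, periphrastic "cause to ..." causative) counts from (33).
PAIRS = {
    "stop": (3267, 6),
    "kill": (2400, 2),
    "raise": (466, 3),
    "bring down": (269, 0),
    "drown": (80, 0),
}


def lexical_share(lexical: int, periphrastic: int) -> float:
    """Share of the lexical causative among all tokens of the pair."""
    return lexical / (lexical + periphrastic)


if __name__ == "__main__":
    for verb, (lexical, periphrastic) in PAIRS.items():
        print(f"{verb:10} {lexical_share(lexical, periphrastic):.1%} lexical")
```

As the text notes, these figures are confounded by the length difference between the two types, so the sketch only restates the counts and does not control for that factor.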
If this is true, then the economy account makes a further prediction:
that markers of indirect causation should not only be less cohesive, but
also tend to be longer. And indeed a number of languages have two
causatives differing primarily in length, not in cohesion (cf. Dixon 2000:
74–78).
(34)                               indirect causative    direct causative
a. Amharic                         as-balla              a-balla
   (Haiman 1983: 786,              caus-eat              caus-eat
   Amberber 2000: 317–320)         'force to eat'        'feed'
b. Hindi                           ban-vaa-              ban-aa-
   (Dixon 2000: 67,                be.built-caus         be.built-caus
   Saksena 1982)                   'have sth. built'     'build'
c. Jinghpaw                        -shangun              sha-
   (Maran and Clifton 1976)
d. Creek                           -ipeyc                -ic
   (Martin 2000: 394–399)
Although Haiman (1983: 786) cites the example from Amharic as an
instance of an iconicity contrast, it does not actually fit his iconicity
explanation. The two causatives of Amharic and the other languages in (34)
do not differ in cohesion, but only in length, so the contrast is predicted
only by the economy account.¹⁰
6.3. Coordinating constructions
While Haiman's discussion of examples like (21–22) above only mentions
the semantic contrast between greater and less conceptual distance,
Wälchli's terminology (accidental vs. natural coordination) already points to
the real motivating factor: Natural coordination (as in 21b, 22b and 23b)
is natural, i.e., frequent and expected for the pair of expressions, while
accidental coordination is infrequent and hence unexpected. Thus, it is
economical to use more explicit and less cohesive coding in accidental
coordination, and less explicit and more cohesive coding in natural
coordination.
Doing the frequency counts for clause coordination is fairly trivial. For
example, in the German version of The wolf and the seven little kids (one
of Grimm's fairy tales), there are 47 und-coordinations, and 41 of them
show subject identity, while only 6 have different subjects. All 47 cases
exhibit temporal closeness.
For noun phrase conjunction of the type da-dzma 'brother-and-sister'
(23b), the frequency counts are less straightforward, because the
definition of accidental and natural coordination is quite vague: Wälchli (2005:
5) describes natural coordination as "coordination of items which are
expected to cooccur, which are closely related in meaning, and which form
conceptual units". This is not specific enough to test the claim directly,
but it seems plausible that for noun phrases, too, it will be possible to
show that coordinations of the type brother and sister will turn out to
be more frequent than coordinations of the type the man and the snake.
6.4. Complement-clause constructions
For many of the examples given by Haiman and Givón, the frequency
explanation is completely straightforward. With 'want' verbs (cf. 24a–b),
the same-subject use is of course overwhelmingly more frequent than the
different-subject use, for well-understood reasons (our desires naturally
concern first of all our own actions), and this is often reflected in shorter
coding (cf. Haspelmath 1999). This explains the contrast between English
wanna and want to, and also a similar contrast between gotta and got to (I
gotta go home now vs. I got to go to Hawaii last winter) that was already
pointed out and correctly explained by Bolinger (1961: 27) ("condensation
is tied to familiarity", cited approvingly by Haiman 1985: 126).
There are also obvious frequency asymmetries between the pairs make/
cause (cf. 25), want/wish (cf. 26), and tell/insist (cf. 27) which suffice to
explain the shorter coding of the first member of each pair.¹¹ Givón is
right that in each case there is also a semantic contrast, but in order
to show that the semantic contrast is indeed responsible for the formal
contrast, he should provide contrasting examples of constructions with
roughly equal frequency.
In contrasts such as (28a–b) (She saw him coming out of the theatre vs.
She saw that he came out of the theatre), which do not exhibit a striking
frequency asymmetry, another factor is clearly highly relevant: In (28a),
the complement event necessarily occurs simultaneously with the main
event, in contrast to (28b), where the complement event could take place
at some other time (She saw that he would come out only two hours later/
that he had come out two hours earlier). In Cristofaro's (2003: §5.3.2)
terms, (28a) shows predetermination of the tense value of the complement
clause, and Cristofaro rightly explains the lack of finiteness (i.e., the
lack of tense) in (28a) as due to syntagmatic economy: Information
that can be readily inferred from the context can be left out. (See also
Horie 1993: 203–212 for related discussion.)
This factor of predetermination is of course not unrelated to the
broader notion of semantic closeness. If a complement-taking verb
predetermines the tense value and other semantic properties of its complement,
this can be seen as one facet of conceptual closeness or event integration.
However, such cases do not provide evidence for iconicity of cohesion,
because the higher syntagmatic cohesion of She saw him coming out
of the theatre would be expected anyway for reasons of economy.¹²
7. Conclusion
I conclude that for most of the core phenomena for which iconicity of
quantity, complexity and cohesion have been claimed to be responsible,
there are very good reasons to think that they are in fact explained by
frequency asymmetries and the economy principle. The final result may look
iconic to the linguist in some cases, but iconicity is not the decisive causal
factor.
Linguists have rarely discussed the mechanism by which iconicity could
come to have a causal role in shaping grammars. However, Givón claims
that iconic structures are easier to process than noniconic structures:
The iconicity meta-principle: All other things being equal, a coded experience is
easier to store, retrieve, and communicate if the code is maximally isomorphic to
the experience. (Givón 1985: 189)
And similarly, Dressler et al. (1987: 18) say that the more iconic a sign is,
the more natural it is, i.e., the easier speakers find using it.
If these claims were correct also for iconicity of quantity, complexity
and cohesion, it would indeed be predicted that such iconic structures
should be preferred by speakers, and we should see a significant effect of
iconicity in language structures. But in fact we do not see such an effect.
We see effects of frequency and predictability, i.e. of the economy
principle, which (as everyone agrees) is independently needed. What we can
conclude from this is that the above claims are wrong, i.e., that iconic
structures are apparently not necessarily preferred in processing.
The respective role of iconicity and economy was discussed already
in the 1980s. Haiman (1983: 802) recognized that formal complexity/
simplicity is very often economically motivated, and he rejected the sub-
sumption of economic motivation under iconicity, even though one might
argue that the correspondence between a linguistic dimension (full vs. re-
duced form) and a conceptual dimension (unpredictable vs. predictable) is
itself iconic. As an example of economic motivation, he cites the tendency
for predictable referents to be coded with little material (short pronouns
or zero), while less predictable or unpredictable referents are coded with
more material (longer pronouns or full NPs) (as documented in Givón
(ed.) 1983).
However, Givón (1985: 197) sees the correlation between
unpredictability and amount of coding material as primarily iconic (see also Givón
1991: 87–89), and he objects to Haiman's economy account:
. . . the principle of economy has not been working here by itself, since the end
result of such a situation would have been the exclusive use of zero anaphora for all
topic identification in discourse. (Givón 1991: 87–89)
But that economy (favoring the speaker's needs) is not the only relevant
factor in communication should be clear from the beginning: if there
were no opposite principle of distinctiveness (favoring the hearer), we
would have no linguistic forms at all. Another argument that Givón
makes is the following:
It may well be that Zipf-like economy considerations were indeed involved in the
diachronic . . . shaping of the quantity-scale . . . But the end result is nonetheless
an iconic – isomorphic – relation between code and coded. And such a relation
surely carries its own meta-motivation, i.e., [the iconicity meta-principle, cited at
the beginning of this section]. (Givón 1991: 87–89)
This last sentence simply does not follow. If the end result is iconic in the
eyes of the analyst, this does not mean that it is iconically motivated, i.e.,
that iconicity is a relevant causal factor. The empirical evidence from
frequency distributions and cross-linguistic coding types that was cited in
this paper shows that iconicity may well be irrelevant for an explanation
of the grammatical asymmetries considered here. That is, in the debate
between Haiman and Givón, Haiman was right to favor economy over
iconicity in explaining the quantity scale for referent expressions.
However, as I hope to have shown here, Haiman's economy explanation
should be extended also to many other cases that he and others explained
in terms of iconicity.
Received 04 December 2006
Revision received 05 April 2007
Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany
Notes
* Versions of this paper were presented at the University of Jena, Tohoku University,
Seoul National University, and the Scuola Normale Superiore in Pisa. I am grateful
for all comments that I received on these occasions and on other occasions. I also
thank Brian Kessler for help with the corpus counts. Contact address: Max Planck
Institute for Evolutionary Anthropology, Deutscher Platz 6, 04103 Leipzig, Germany;
author's e-mail: <haspelmath@eva.mpg.de>.
1. In C. S. Peirce's received typology of signs, there are three types of icons: diagrams,
images, and metaphors (see, e.g., Dressler 1995 for discussion). Nowadays metaphor
is not generally discussed under the heading of iconicity, and imagic iconicity is
relevant primarily for onomatopoeia. This paper is exclusively concerned with possible
iconicity effects in grammar, so only diagrammatic iconicity will be considered here.
The relevance of Peirce's semiotic concepts to the study of grammar was first brought
to linguists' attention by Jakobson (1965).
2. The idea that (syntagmatic and paradigmatic) isomorphism can be considered an
instance of Peircean iconicity was apparently first proposed by Anttila (1989, originally
published in 1972). A number of authors have noted that this represents a fairly
extreme extension of Peirce's original concept, and Itkonen (2004) flatly rejects the
subsumption of isomorphism under iconicity.
3. Haiman (1985: 194–195) recognizes that the motivation for the reduction is also
partly economic: "one gives less expression to that which is familiar or
predictable", but he does not consider the possibility that the motivation may be entirely
economic.
4. Lakoff and Johnson (1980: 127) apply their principle "more of form is more of
content" (which they call a metaphor, not relating it to iconicity) to cases of iteration (She
ran and ran and ran and ran) and lengthening (He is bi-i-i-i-ig!). Such extragrammatical
phenomena may well be motivated by a kind of iconicity of quantity. However, their
attempt to extend the principle to grammatical reduplication fails: While many cases of
reduplication signal "more of content" (e.g., plurality, continuative aspect), this is by
no means always the case (Moravcsik 1978 also mentions a widespread sense of
diminution and attenuation, and more specific senses such as indifference and pretending).
Grammatical reduplication is apparently just like affixation in that the reduplicated
form is always the rarer one.
5. The third vs. first/second person contrast has also been interpreted as a kind of
iconicity of absence (closely related to iconicity of quantity as seen in §2): Haiman (1985:
45), citing Benveniste (1946), claims that the third person, as a non-speech act
participant, can be seen as an absent person, a non-person that is iconically represented
by a non-desinence (i.e., zero). But neither Benveniste nor Haiman mentions
imperatives, where the hearer is present, but a second-person desinence is typically absent
(see (10) below). (See also the discussion in Helmbrecht 2004: 228–229.)
6. Lehmann (1974: 113) notes that length correlates with rarity, but instead of following
Zipf in explaining length with reference to frequency/rarity, he suggests that rarity can
also be seen as equivalent to improbability or informational value. He then assumes
that informational value correlates with semantic complexity and infers that rare items
tend to be semantically complex. But evidently informational value in the statistical
sense is very different from semantic complexity. Talking about animals or perceiving
is perhaps in some technical sense of high informational value (even though it is not
very informative), but it is hard to argue that animal and perceive are semantically
more complex than cat or see.
7. A reviewer observes that the English pair widow/widower in (10) is also an isolated
exception and asks how it is different from beautiful/beauty. The answer is that the
widow/widower contrast is not isolated from a cross-linguistic point of view: There is
a general tendency for this pair to show overt coding on the male member (e.g., Ger-
man Witwe/Witw-er, Russian vdova/vdov-ec), whereas beautiful/beauty is isolated not
only within English, but also cross-linguistically.
8. Also much of the earlier functionalist literature is insufficiently explicit with regard to
the causal factor. For example, Comrie (1989: 128) only invokes the naturalness of
certain associations between role and animacy, a relatively vague notion compared to
frequency.
9. Cf. also Lakoff and Johnson's (1980: 128–132) principle "closeness is strength of
effect", which is, however, not related to iconicity by them, but is regarded as a
metaphor. The frequency-based perspective here suggests that Lakoff and Johnson's
metaphor-based account is not necessary.
10. A further observation is that direct vs. indirect causation is not the only semantic
parameter by which competing causatives differ. Dixon (2000: 76) lists the following
parameters and observes that they all tend to correlate with the degree of compactness
of the causative marker (i.e., its shortness).
    longer marker              shorter marker
    action                     state
    transitive                 intransitive
    causee having control      causee lacking control
    causee unwilling           causee willing
    causee fully affected      causee partially affected
    accidental                 intentional
    with effort                naturally
Not all of these can be subsumed under "less conceptual distance", but they can be
plausibly related to frequency asymmetries. This is a matter for future research.
11. Leech et al. (2001) give the following figures for the verbal lexemes, which can be taken
as representative for the complement-clause constructions as well: want 945, wish 30;
tell 775, insist 67; make 2165, cause 206.
12. Cristofaro (2003: Ch. 9), while pointing to the importance of the factor of predetermi-
nation, still wants to retain semantic integration and iconicity as explanatory factors
for complement-clause constructions. But like Haiman and Givón, she does not even
consider the potential explanatory value of frequency-based economy.
References
Aissen, Judith
2003 Differential object marking: Iconicity vs. economy. Natural Language and
Linguistic Theory 21(3), 435–483.
Amberber, Mengistu
2000 Valency-changing and valency-encoding devices in Amharic. In R. M. W.
Dixon, and Alexandra Y. Aikhenvald (eds.), Changing Valency. Cambridge:
Cambridge University Press, 312–332.
Anderson, Stephen C.
1979 Verb structure. In Larry Hyman (ed.), Aghem Grammatical Structure
(Southern California Occasional Papers in Linguistics 7). Los Angeles:
University of Southern California, 73–136.
Anttila, Raimo
1989 Historical and Comparative Linguistics. 2nd ed. Amsterdam: Benjamins.
Benveniste, Émile
1946 Relations de personne dans le verbe. Bulletin de la Société de Linguistique de
Paris 43, 1–12.
Berry, Keith and Christine Berry
1999 A description of Abun: A West Papuan language of Irian Jaya. (Pacific
Linguistics, B-115.) Canberra: Australian National University.
Blansitt, Edward L.
1973 Bitransitive clauses. Working Papers in Language Universals (Stanford) 13,
1–26.
Bolinger, Dwight
1961 Generality, Gradience, and the all-or-none. The Hague: Mouton.
Bossong, Georg
1985 Differenzielle Objektmarkierung in den neuiranischen Sprachen. Tübingen:
Narr.
1998 Le marquage différentiel de l'objet dans les langues d'Europe. In Jack Feuillet
(ed.), Actance et valence dans les langues de l'Europe. Berlin: Mouton de
Gruyter, 193–258.
Bybee, Joan L.
1985 Morphology: A Study of the Relation between Meaning and Form. Amster-
dam: Benjamins.
Bybee, Joan L. and Paul Hopper (eds.)
2001 Frequency and the emergence of linguistic structure. Amsterdam: Benjamins.
Comrie, Bernard
1989 Language Universals and Linguistic Typology. 2nd ed. Oxford: Blackwell.
Corbett, Greville, Andrew Hippisley, Dunstan Brown, and Paul Marriott
2001 Frequency, regularity and the paradigm: A perspective from Russian on a
complex relation. In Bybee, Joan and Paul Hopper (eds), Frequency and the
Emergence of Linguistic Structure. Amsterdam: John Benjamins, 201–226.
Cristofaro, Sonia
2003 Subordination. Oxford: Oxford University Press.
Croft, William
1990a Typology and Universals. Cambridge: Cambridge University Press.
1990b Possible verbs and the structure of events. In Tsohatzidis, S. L. (ed.),
Meanings and Prototypes: Studies in Linguistic Categorization. London:
Routledge, 48–73.
2003 Typology and Universals. 2nd ed. Cambridge: Cambridge University Press.
Croft, William and Alan Cruse
2004 Cognitive Linguistics. Cambridge: Cambridge University Press.
Dickens, Patrick J.
2005 A Concise Grammar of Ju|'hoan. Cologne: Köppe.
Dixon, R. M. W.
2000 A typology of causatives: Form, syntax and meaning. In Dixon, R. M. W.
and Alexandra Y. Aikhenvald (eds.), Changing Valency. Cambridge:
Cambridge University Press, 30–83.
Dressler, Wolfgang U.
1995 Interactions between iconicity and other semiotic parameters in language. In
Raffaele Simone (ed.), Iconicity in Language. Amsterdam: Benjamins, 21–37.
Dressler, Wolfgang U., Willi Mayerthaler, Oswald Panagl and Wolfgang U. Wurzel
1987 Leitmotifs in Natural Morphology. (Studies in Language Companion Series
10). Amsterdam: Benjamins.
Elbert, Samuel
1974 Puluwat Grammar. (Pacific Linguistics, B-29.) Canberra: Australian
National University.
Fenk-Oczlon, Gertraud
1990 Ikonismus versus Ökonomieprinzip: Am Beispiel russischer Aspekt- und
Kasusbildungen. Papiere zur Linguistik 42(1), 49–69.
1991 Frequenz und Kognition – Frequenz und Markiertheit. Folia Linguistica
25(3–4), 361–394.
Filimonova, Elena
2005 The noun phrase hierarchy and relational marking: Problems and
counterevidence. Linguistic Typology 9(1), 77–113.
Gilquin, Gaëtanelle
2006 The verb slot in causative constructions: Finding the best fit. Constructions
SV1-3. (www.constructions-online.de)
Givón, Talmy
1980 The Binding Hierarchy and the typology of complements. Studies in
Language 4, 333–377.
1985 Iconicity, isomorphism and non-arbitrary coding in syntax. In John Haiman
(ed.), Iconicity in Syntax. Amsterdam: Benjamins, 187–219.
1990 Syntax: A Functional-Typological Introduction. Vol. II. Amsterdam:
Benjamins.
1991 Isomorphism in the grammatical code: Cognitive and biological
considerations. Studies in Language 15(1), 85–114.
1995 Markedness as meta-iconicity: Distributional and cognitive correlates of
syntactic structure. In T. Givón, Functionalism and Grammar. Amsterdam:
Benjamins, 25–69.
2001 Syntax: An Introduction. Volume II. Amsterdam: Benjamins.
Givón, T. (ed.)
1983 Topic Continuity in Discourse: A Quantitative Cross-Language Study.
Amsterdam: Benjamins.
Greenberg, Joseph H.
1963[1966] Some universals of grammar with particular reference to the order of
meaningful elements. In Joseph H. Greenberg (ed.), Universals of Language. 2nd
ed. 1966. Cambridge, Mass.: MIT Press, 73–113.
1966 Language Universals, with Special Reference to Feature Hierarchies. (Janua
Linguarum, Series Minor 59). The Hague: Mouton.
Haiman, John
1980 The iconicity of grammar. Language 56, 515–540.
1983 Iconic and economic motivation. Language 59, 781–819.
1985 Natural Syntax. Cambridge: Cambridge University Press.
2000 Iconicity. In Geert Booij, Joachim Mugdan and Christian Lehmann (eds.),
Morphology: An International Handbook. Vol. I. Berlin: de Gruyter,
281–288.
Haspelmath, Martin
1993 More on the typology of inchoative/causative verb alternations. In Bernard
Comrie and Maria Polinsky (eds.), Causatives and Transitivity (Studies in
Language Companion Series 23). Amsterdam: Benjamins, 87–120.
1999 On the cross-linguistic distribution of same-subject and different-subject
complement clauses: Economic vs. iconic motivation. Paper presented at the
ICLC, Stockholm, July 1999. (Handout available from author's website.)
2006 Against markedness (and what to replace it with). Journal of Linguistics
42(1), 1–46.
2008 Creating economical morphosyntactic patterns in language change. To
appear in Good, Jeff (ed.), Language Universals and Language Change.
Oxford: Oxford University Press.
Hawkins, John A.
2004 Efficiency and Complexity in Grammars. Oxford: Oxford University Press.
Helmbrecht, Johannes
2004 Ikonizität in Personalpronomina. Zeitschrift für Sprachwissenschaft 23,
211–244.
Horie, Kaoru
1993 A cross-linguistic study of perception and cognition verb complements:
A cognitive perspective. PhD dissertation, University of Southern Cali-
fornia.
Hockett, Charles F.
1958 A Course in Modern Linguistics. New York: MacMillan.
Horn, Wilhelm
1921 Sprachkörper und Sprachfunktion. Berlin: Mayer and Müller.
Hyman, Larry
1971 Consecutivization in Fe'fe'. Journal of African Languages 10(2), 29–43.
Itkonen, Esa
2004 Typological explanation and iconicity. Logos and Language 5(1), 21–33.
Jäger, Gerhard
2004 Learning constraint sub-hierarchies: The bidirectional gradual learning
algorithm. In R. Blutner and H. Zeevat (eds.), Pragmatics in OT. Palgrave
MacMillan, 251–287.
Jakobson, Roman
1963[1966] Implications of language universals for linguistics. In Joseph H. Greenberg
(ed.), Universals of Language. 2nd ed. 1966. Cambridge, MA: MIT Press,
263–278. (First edition 1963.)
1965[1971] Quest for the essence of language. In Selected Writings, vol. II. The Hague:
Mouton, 345–359. (Originally published in Diogenes 51, 1965.)
1971 Relationship between Russian stem suffixes and verbal aspects. In Selected
Writings, vol. II. The Hague: Mouton, 198–202.
Johnston, Ray
1981 Conceptualizing in Nakanai and English. In Franklin, Karl (ed.), Syntax
and Semantics in Papua New Guinea Languages. Ukarumpa, Papua New
Guinea: SIL, 212–224.
Koptjevskaja-Tamm, Maria
1996 Possessive noun phrases in Maltese: Alienability, iconicity and
grammaticalization. Rivista di Linguistica 8(1), 245–274.
Lakoff, George and Mark Johnson
1980 Metaphors We Live By. Chicago: University of Chicago Press.
Langacker, Ronald
2000 The meaning of "of". In Grammar and Conceptualization. Berlin: Mouton de
Gruyter, 73–90.
Lee, David
2001 Cognitive Linguistics: An Introduction. Melbourne: Oxford University Press.
Leech, Geoffrey, Paul Rayson, and Andrew Wilson
2001 Word Frequencies in Written and Spoken English Based on the British Na-
tional Corpus. Harlow, England: Pearson Education.
Lehmann, Christian
1974 Isomorphismus im sprachlichen Zeichen. In Seiler, Hansjakob (ed.),
Linguistic workshop II: Arbeiten des Kölner Universalienprojekts 1973/4
(Structura 8). München: Fink, 98–123.
Levinson, Stephen C.
2000 Presumptive Meanings: The Theory of Generalized Conversational
Implicature. Cambridge, MA: MIT Press.
Maran, L. R. and J. R. Clifton
1976 The causative mechanism in Jinghpaw. In Shibatani, Masayoshi (ed.), The
Grammar of Causative Constructions. New York: Academic Press,
443–458.
Martin, Jack B.
2000 Creek voice: Beyond valency. In Dixon, R. M. W. and Alexandra Y.
Aikhenvald (eds.), Changing Valency. Cambridge: Cambridge University
Press, 375–403.
Matthews, Peter
1991 Morphology. 2nd ed. Cambridge: Cambridge University Press.
Mayerthaler, Willi
1981 Morphologische Natürlichkeit. Wiesbaden: Athenaion.
1987 System-independent morphological naturalness. In Dressler et al., Leitmotifs
in Natural Morphology. (Studies in Language Companion Series 10).
Amsterdam: Benjamins, 25–58.
McKay, Graham R.
1996 Body parts, possession marking and nominal classes in Ndjebbana. In Chap-
pell, Hilary and William McGregor (eds.), The Grammar of Inalienability.
Berlin: Mouton de Gruyter, 293–326.
Mel'čuk, Igor A.
1967 K ponjatiju slovoobrazovanija. Izvestija Akademii Nauk SSSR, serija
literatury i jazyka 26(4), 352–362.
Moravcsik, Edith A.
1978 Reduplicative constructions. In Greenberg, Joseph H. (ed.), Universals of
Human Language. Vol. 3. Word Structure. Stanford: Stanford University
Press, 297–334.
Newmeyer, Frederick
1992 Iconicity and generative grammar. Language 68, 756–796.
Nichols, Johanna
1988 On alienable and inalienable possession. In Shipley, William (ed.), In Honor
of Mary Haas. Berlin: Mouton de Gruyter, 557–609.
Osthoff, Hermann
1899 Vom Suppletivwesen der indogermanischen Sprachen. Heidelberg: Hörning.
Plank, Frans
1979 Ikonisierung und De-Ikonisierung als Prinzipien des Sprachwandels.
Sprachwissenschaft 4, 121–158.
Ronneberger-Sibold, Elke
1980 Sprachverwendung – Sprachsystem: Ökonomie und Wandel (Linguistische
Arbeiten 87). Tübingen: Niemeyer.
1988 Entstehung von Suppletion und Natürliche Morphologie. Zeitschrift für
Phonetik, Sprachwissenschaft und Kommunikationsforschung 41, 453–462.
Saksena, Anuradha
1982 Contact in causation. Language 58, 820–831.
Taylor, John R.
2002 Cognitive Grammar. Oxford: Oxford University Press.
Thompson, Chad
1996 On the grammar of body parts in Koyukon Athabaskan. In Chappell,
Hilary and William McGregor (eds.), The Grammar of Inalienability. Berlin:
Mouton de Gruyter, 651–676.
Tiersma, Peter
1982 Local and general markedness. Language 58, 832–849.
Wälchli, Bernhard
2005 Co-compounds and Natural Coordination. Oxford: Oxford University Press.
Waugh, Linda R.
1982 Marked and unmarked: A choice between unequals. Semiotica 38, 299–318.
Welmers, Wm. E.
1973 African Language Structures. Berkeley: University of California Press.
Wilson, Peter J.
1992 Una descripción preliminar de la gramática del Achagua (Arawak). Bogotá:
Asociación Instituto Lingüístico de Verano.
Witkowski, S. R. and Cecil H. Brown
1983 Marking reversal and cultural importance. Language 59, 569–582.
Wright, Saundra Kimberly
2001 Internally caused and externally caused change of state verbs. Ph.D. disser-
tation, Northwestern University, Evanston, IL.
Zepeda, Ofelia
1983 A Papago Grammar. Tucson: University of Arizona Press.
Zipf, George K.
1935 The Psycho-Biology of Language: An Introduction to Dynamic Philology.
Boston: Houghton Mifflin.
Zwicky, Arnold
1978 On markedness in morphology. Die Sprache 24, 129–143.
In defence of iconicity
JOHN HAIMAN*
Abstract
A number of iconically motivated grammatical distinctions, among them
that between alienable and inalienable possession in Japanese and Korean,
are graded. Haspelmath's Zipfian frequency hypothesis may be able to
accommodate these facts (lowest bulk is most frequent, middle bulk is less
frequent, and maximal bulk is maximally infrequent), but until more data
are forthcoming, iconicity alone makes the correct predictions in those
cases, and (crucially) in others where bulk is simply not the grammatical
variable at issue in signaling markedness (as, for example, in the distinction
between nominative/absolutive and ergative/accusative in Kurdish). The
productivity (not just the fortuitous correctness) of an iconically motivated
"more form implies more meaning" principle is attested in: (a) the
(pre)history of the development of nominalizations in Romanian and
Khmer, (b) the frequent operation of Watkins' Law, whereby 3sg.
forms are interpreted as if they were zero-marked, even when they are not,
and (c) grammaticality judgments about the differences between anaphoric
epithets and structurally identical non-anaphoric noun phrases like "the pig"
in English. Like reduced form, elaborated form may have a number
of motivations, not only iconic and economic (both cognitive), but also
esthetic. It is probably misconceived to look for only one motivating factor
to account for most observed grammatical facts, although the motivating
factors are more easily identified when they operate alone.
Keywords: iconicity; frequency; productivity.
Cognitive Linguistics 19–1 (2008), 35–48
DOI 10.1515/COG.2008.002
0936–5907/08/0019–0035
© Walter de Gruyter
1. Introduction
Martin Haspelmath's article is a stimulating and thought-provoking critique
of the notion of iconic motivation which deals with a broad range
of data and demands careful scrutiny. Not surprisingly, I am not equally
convinced by all of his arguments.
Haspelmath's fundamental argument is a version of Occam's razor:
certain phenomena of reduced expression which seem to be iconic are
equally motivated by Zipfian reduction, which is "necessary anyway".
He thus proposes that for a variety of phenomena which seem to manifest
diagrammatic iconicity, only frequency (in fact, no cognitive explanation
whatsoever) is necessary. It is important to recognize this aspect of
his argument. To say that frequency itself is motivated by some conceptual
considerations would be to beg the question: which one? But since
he never denies that iconicity is also "necessary anyway" in other areas
of grammar, Occam's razor won't work for him. Moreover, he has not
yet done all his homework. The iconicity hypothesis is compatible with
graded phenomena. For example, Sohn (1994) and Tsunoda (1995) have
argued that possession (in Korean and Japanese) may be graded so
that there may be three- or even four-way contrasts in conceptual closeness
that are mirrored in grammatical performance and grammaticality
judgments. To claim that frequency counts also reflect this gradation,
Haspelmath would need to produce frequency comparisons of not two,
but three or more forms. Until such evidence is available, paired frequency
counts alone will not be able to compete with iconicity.
Let us, however, assume for now that there are a variety of phenomena
for which both a frequency and an iconicity explanation are equally plausible.
When a structure is equally motivated by two constraints, however,
credit should be given
a) to the one that is applicable to a broader range of phenomena, not
just the one which seems to have one or two fewer exceptions. (I am
not impressed by Haspelmath's claims that here and there the
iconicity principle makes the wrong prediction, while
frequency does not. Frequency also has its exceptions. For example,
as Orwell (1957: 150) and others have pointed out, it is not necessarily
always true that the shorter of two forms is the more frequent:
in frequency, verbal sludge like "the American people", at least in
political discourses, swamps out homely expressions like "Americans",
probably by orders of magnitude. In the same way, the occasional
counterexample to a generalization about iconicity is not convincing.)
b) to the one that is shown to be productive, that is, responsible for the
creation of novel forms. Productivity is the real test for psychological
reality.
2. Broader range of phenomena arguments
2.1. Alienable vs. inalienable possession
One of the best apparent pieces of evidence for diagrammatic iconicity is
the contrast between alienable and inalienable possession. Typically,
though not always, the expression of alienable possession is more complex,
with greater linguistic distance between possessor and possessum,
than that of inalienable possession, and this seems to reflect conceptualization
iconically (inalienable possession, at least of body parts, is conceptual
closeness to the point of identity: and you can't get closer than
that). Haspelmath argues, with convincing statistics, that inalienable possession
is more frequently expressed, and that the different degrees of
bulkiness of "my arm" versus "my house" in languages which make an explicit
distinction between the two are nothing but a Zipfian consequence of
this difference in their relative frequency of occurrence. Indeed, Haspelmath
makes much of the fact that in some languages like Puluwat, his
frequency test makes the right predictions about morphological bulk,
while iconicity does not. (I first noted this as a problem myself. I would
now hazard the guess that Puluwat, like other Oceanic languages, first
allowed the inversion of alienable possession structures like
    Possessor_1 X # Possessum_2 → 2 1
as an occasional stylistic inversion, as is still the case in Tinrin (Osumi
1995: 437–438) or Paamese (Crowley 1995: 384, 386). Iconicity is not
eternal.)
But morphological bulk is not the only means whereby the conceptual
contrast between alienable and inalienable possession can be expressed.
As William James (1890) pointed out:
it is clear that between what a man calls "me" and what he calls "mine", the line is
difficult to draw. In its widest possible sense, a man's self is the sum total of all
that he can call his, not only his body and his psychic powers, but his clothes
and his house, his wife and children, his ancestors and his friends. (James 1890:
291–292)
That is, the contrast between what one is and what one merely has is an
infinitely gradable one, and languages sometimes reflect this gradation in a variety
of (iconic) ways.
2.1.1. Possessor ascension. The phenomenon of possessor ascension or
external possession exists in English as well as in many other languages
(cf. Bally 1926; Hyman 1977; Durie 1987; Clark 1995; Tsunoda 1995;
Payne and Barshi 1999). The contrast is illustrated by pairs like:
She patted his cheek/head/knee. (no possessor ascension)
She patted him on the cheek/head/knee. (possessor ascension)
In English possessor ascension is possible with all (real or imagined) body
parts, and with clothes one is actually wearing at the time the action
occurred, but not with clothes in one's closet, with pets, or with one's
productions or other possessions.
She tapped him on the shoe. (OK when worn, not OK when not)
*She tapped him on the gerbil/wallet/article he had just written/car.
The operative criterion is not exactly inalienable possession but one very
closely related to it. Possessor ascension may occur when the possessor
can be identified with the possessum. It may seem like arrant chutzpah to
invoke possessor ascension in defence of the idea of conceptual closeness,
since it is the relatively inalienable possessor which can be separated from
the possessum in this construction. But note that ascension is a natural
consequence of identity: Who pats my shoulder is ipso facto patting me
(cf. Hyman 1977: 107; Durie 1987: 388; Tsunoda 1995: 590, among
many). Thus possessor ascension is iconic of conceptual closeness,
although the means for expressing this closeness are different than in cases
like Hua d-zorgeva 'my hair' versus d-gai zu 'my house'. Iconicity can
provide a common explanation (or at least a common characterization)
of these facts, and frequency does not.
2.1.2. Honorific agreement. The phenomenon of honorific agreement is
the tendency for honorifics to appear not only on NPs denoting respected
persons, but on NPs denoting their possessions, or on predications that are
made concerning these possessed NPs.
Sohn (1994) and Tsunoda (1995) provide careful examinations of
honorific agreement in Korean and Japanese, whereby a verb may
mark the respect that the speaker accords to its subject or object. However,
when that subject or object is an NP consisting of a possessor (modifier)
and a possessum (head), and the one respected is the possessor,
as in "the emperor's X", there is a cline of subtle and widely shared
grammaticality judgments depending on where the possessum is on the
hierarchy:
Body part > inherent attribute > clothing worn > (kin) > pet >
production > other
The higher the possessum on the hierarchy, the more likely that possessor
respect agreement as marked on the verb (either by a special verb form or
by a respect suffix) will be acceptable (Tsunoda 1995: 576). Accordingly,
"the emperor's hand" is accorded respect; "the emperor's glasses" less; "the
emperor's horse" still less; "the emperor's book that he wrote" less still;
and "the emperor's car/villa" none at all. Below, the same range of facts
is illustrated from Korean (Sohn 1994: 176):
(1) a. sensayng-nim-uy phali khu-sey-yo
       teacher hon.gen. arm big hon.pol.
       'The teacher's arms are big.' (arms are inalienably part of the
       teacher)
    b. sensayng-nim-uy ankyengi khu-(sey)-yo
       teacher hon.gen. glasses big hon.pol.
       'The teacher's glasses are big.'
       (glasses are less likely to bask in the teacher's reflected honor
       and glory)
    c. sensayng-nim-uy namwuka khu-(?sey)-yo
       teacher hon.gen. trees big hon.pol.
       'The teacher's trees are big.'
       (trees even less than glasses)
    d. sensayng-nim-uy kaytuli khu-(sey)-yo
       teacher hon.gen. dogs big hon.pol.
       'The teacher's dogs are big.'
       (dogs the teacher owns are less likely to share in his honor than
       trees he has planted, perhaps because they have a will of their
       own)
The iconic principle behind these judgments is this:
The more we tend to identify the possessum with the possessor, the more
we ... show our respect for it, in accordance with our respect for the possessor.
(Tsunoda 1995: 584)
This is exactly in accordance with the conceptual closeness of possessor
and inalienable possessum as marked in physical closeness. The iconicity
hypothesis suggests a common conceptual basis for these facts. The frequency
hypothesis proposes none.
2.2. Markedness in general
Haspelmath argues (I think largely convincingly) that local markedness
(Tiersma 1982) or markedness reversal (Andersen 1972) phenomena demonstrate
that markedness is not so much an icon of the unexpected as
a consequence of the relative infrequency of the unexpected. There is
nothing inherently marked even about singular or plural, which is why
the unmarked form of "stars" may be (in some languages) the plural. But
relative markedness is reflected not only in relative bulk (the Zipfian
correlation) but in other ways as well.
Consider one elaboration of markedness reversal, Silverstein's well-known
hierarchy of animacy (1976), and its ability to explain a number
of nominative/ergative case-marking splits. The ergative is marked
relative to the nominative and marks unexpected/infrequent subjects
(typically inanimate nouns, and typically transitive subjects in the past
tense). Conversely, the accusative is marked relative to the nominative
and marks unexpected/infrequent objects (typically animate human
nouns, and objects in the present tense).
Sorani Kurdish happens to be a language in which the accusative
and ergative are marked in exactly the same way, a triumphant demonstration
of Silverstein's hypothesis that markedness alone is at issue in
both nominative/accusative and nominative/ergative oppositions. But in
Kurdish the marked/unmarked distinction (called the oblique/direct distinction
in Western accounts, cf. McCarus 1958) is instantiated not by
greater versus lesser bulk, but by the contrast between agreement suffixes
on the verb (for the unmarked S and O) versus mobile pronominal clitics
which land (roughly) after the first immediate constituent of the VP (for
the marked A and O), cf. Haiman (forthcoming c). There is no difference
in bulk between the agreement suffixes on the one hand and the pronominal
clitics on the other. It is only in their syntactic behaviour that they
differ systematically. To claim with Haspelmath that the unmarked is
simply the most reduced is to miss the obvious generalization that in
Kurdish, as in other languages with split ergativity, it is the nominative
which is the unmarked grammatical relation.
3. Productivity arguments
Haspelmath correctly notes that investigators have had little to say on the
genesis of iconicity: it is merely something that is already there, to be
(alternately) oohed and aahed over or dismissed as epiphenomenal. This
section examines some evidence for the productivity of iconic motivations
for morphological asymmetries. Such evidence is relatively hard, but not
impossible, to find.
It is worth emphasizing before going on that token frequency can make
no predictions about productivity. It can only account for changes that
have already happened. When an utterance is about to be made for the
first time, there is nothing for frequency to work on.
3.1. takete/maluma thought experiments
I suggested that periphrastic causatives like "cause to rise" tend to evoke
some image of magic or telekinesis as opposed to "raise". Haspelmath argues
that there is no need to account for such judgments since the shorter
form is simply the most frequent. One typically raises objects through direct
contact rather than by waving a wand. But consider now a new form
like the verb "disappear" as a transitive verb, which first made its appearance
in English (at least for me) in Joseph Heller's Catch-22:
"I just heard them say they were going to disappear Dunbar."
"Why are they going to disappear him?"
"I don't know."
"It doesn't make sense. It isn't even good grammar. What the hell
does it mean when they disappear someone?" (Heller 1972 [1955]:
376)
It has now become widespread, but I still recall the image it conjured up
when I first read this book in the sixties. Contrasted with "make disappear",
it included as at least part of its meaning the notion of directly killing, as
opposed to "make disappear", which would have suggested some bureaucratic
mediation. Speakers who make judgments like this are basing their
images on a contrast between patterns and performing their computations
without reference to frequency, since the frequency of a new form when it
is first introduced is zero.
3.2. length revisited
As most cognitive linguists maintain, and as Haspelmath (1993: 106–107)
has also contended, human imagery and conceptualization tend to be
based on concrete experience, and are not always the same as what is
viewed by the elite of the scientific community as objective physical
fact. (For example, consider notions of "up" and "down": the sun still "rises"
and "sets", even though we accept Copernicus.) In his discussion here,
Haspelmath seems to retreat from this sensible position: "an entity" is
trisyllabic, although "things" are monosyllabic. Hence length does not
correspond to conceptual complexity. To Haspelmath (as to Leibniz), an
entity may have seemed conceptually prior to things and people (and
how fortunate at least for Leibniz that ens is monosyllabic in Latin). Humans
in general simply do not seem to operate in this manner: before we
make abstractions about entities, we are at home with things and people
(cf. Wierzbicka 1972, who boldly disregarded both the thought of Leibniz
and the morphology of English in insisting that "someone" and "something"
are mutually independent semantic primitives).
In the same way, it is, if not ridiculous, at least highly unlikely, that human
beings begin their thinking with a priori dimensions of absolute
space, time, colour, and morality (among them length), which they then
populate with judgments like "long" versus "short", "good" versus
"bad", "green" versus "red" and so forth. Rather, the conceptual dimensions
like "length", "width", "time" and "morality" and personifications
like "life", "death", "beauty" and "justice" come into being (if they do
so at all) only after scores of these judgments are made and people have
reflected on them. (It is satisfying, if ultimately irrelevant, that modern
physics now seems to agree with this folklore to the extent that space,
rather than preexisting, is thought to be created by the objects within it.)
This is why in every language I have ever heard of, nominalizations like
"length" are systematically more complex than judgments like "long" from
which they derive (again, we can and should take occasional exceptions
like "beauty" and (German) Tod 'death' in stride). And that is also why
there are languages (like Hua) in which words like "death", "justice"
and "beauty" do not exist at all, and therefore have a frequency of zero.
We can observe the generation of the verb/nominalization distinction
in the recorded history of one language (Romanian) and in the tentatively
reconstructible prehistory of another (Khmer). In both languages,
inherited phonetic material was exapted (essentially from the careful
pronunciation of a verb form) to create a novel and productive derivational
nominalizing suffix (Romanian -re from the inherited infinitive) or infix
(Khmer awm(n)- from an inherited unstable anacrusic syllable in
sesquisyllabic roots) to form new words (Haiman 2003). Considerations of
frequency will not explain why this recycled material was assigned the novel
task of marking nominalization in both languages (and also marking
causativity in Khmer). The iconic principle that 'more form is more
meaning' can do so naturally.
3.3. Watkins' Law
Not only is it true that the 3sg. form in the indicative or the 2sg. in the
imperative are typically zero, facts which may or may not be accountable
through Zipfian reduction. It is more interesting to observe that in
paradigmatic restructurings, 3sg. is often treated as if it were zero, even
when it isn't (Watkins 1962: 16; Haiman and Benincà 1992: 89; Bybee 1985:
55). So we are faced not with actually reduced forms but the
reinterpretation of non-reduced forms. Zipf may account for the actual
erosion of a frequently occurring form, but not for the perception of a
non-reduced form as if it were reduced.
3.4. Full nouns and anaphoric expressions
On the relative abbreviation of anaphoric forms, I would have been
tempted to accept Haspelmath's position, but now I am not so sure.
Consider the well-known e-mail joke variously told about various political
leaders:
George Bush and his chauffeur are out for a drive in the countryside. Suddenly a
pig darts across the road in front of the car and is killed. Bush sends his driver to
the farmhouse to apologize and make amends (insert a more plausible politician if
you wish) and settles down to wait. After more than an hour, the chauffeur
re-emerges from the farmhouse. In his left hand he holds a Havana cigar; in his right,
a bottle of champagne. His shirt is undone and covered with lipstick.
"What happened?"
"Well, I got the cigar from the farmer, and the champagne from his wife, and
for the last hour, their daughter has been making passionate love to me."
"What did you tell them?"
"Just that I'm George Bush's driver, and I've killed the pig."
The linguistic judgment that needs to be accounted for is that this joke
can only be written, and not told. The reason, as everyone seems to agree,
is that what the chauffeur said can only be pronounced
. . . I've killed the PIG
while what the farmer's family responded to could only be
. . . I've KILLED the pig.
The contrast, of course, is between the uses of the expression the pig as
a full noun phrase (the chauffeur's speech) and as an epithet (as in the
farmer's understanding). Epithets need not be monosyllabic. Other
possibilities include the cocksucking bastard, that idiotic asshole, the cross-eyed
son of a bitch, or virtually anything else one might want to use to
characterize George Bush or anyone else. Thus there need be no contrast in
morphological bulk between an epithet and a full NP. What is common
to all epithets, however, is that in addition to incorporating any amount
of information or speaker's attitude about a referent, they also function
as anaphoric expressions, and refer back to some antecedent, wherein
that referent is first named. The totally iconic intuition that underlies the
grammaticality judgment above is that epithets, as anaphors, are copies of
an original referring expression, and like copies everywhere, paler than
their originals. This pallor is indirectly reflected in the locus of sentence
stress on the verb rather than its object. Whether relative pallor
via destressing is the same fact as the relative abbreviation of anaphoric
pronouns in general is perhaps open to debate, but here Occam is on the
side of iconicity.
4. Conclusions
Both complexity and elaboration may arise more or less accidentally:
Frequency may lead to erosion (thus Zipf 1935), and appendix-like
quirky vestigial residues may result in unnecessary elaboration in
languages as well as in biological organisms (Mayr 2000, Dahl 2005, Kuteva
to appear). But there are also multiple non-accidental motivations for
both compact and elaborated expression. Among the functionally
motivated bases for compactness are brutality (e.g., 'four letter words') and
esthetic power (e.g., haiku). Among other motivations for elaboration are
a) various mechanisms of phonetic bulking which prevent total loss of
both lexical and grammatical categories (Bloomfield 1933: 395–396;
Bolinger 1975: 438; Matisoff 1982: 74–76; Heath 1998),
b) high register and/or politeness (Geertz 1955; Aoki and Okamoto
1988),
c) disambiguation via diacritics (Haiman 1985: 60–67),
d) the iconic representation of conceptual symmetry (Haiman 1988) and
e) esthetic appeal (ritual elaboration, Haiman to appear a,b).
Doubtless there are also others. In approaching human language, the
rational exuberance of Chappell and Thompson 1992 (who uncover no
fewer than seven different motivations for the absence of qualifying de in
Chinese possessive constructions), or of Hugo Schuchardt 1885: 23 ('I
perceive here the motley interplay of innumerable drives') seems to do
more empirical justice to its subject than Haspelmath's reductionism.
Each of these motivations is most clearly attested ceteris paribus, that
is, when they operate unopposed. Haspelmath makes a strong case for
frequency as the sole possible motivation for differing expressions of one
conceptual dimension (transitive versus intransitive): frequently or
typically spontaneous events like 'freezing' will typically occur as root
intransitives, and form their marked transitive congeners via an extra
causative morpheme. Conversely, events which are typically seen to be
brought about by external agents, like 'breaking', will typically occur as
transitive verbs, and form their intransitive congeners via an extra
mediopassive or reflexive morpheme. (Note that here again, human
conceptualization is not the same as objective physical fact. To a physicist,
freezing, melting and boiling are brought about by external agency no less
than breaking). This case is exactly analogous to the contrast between typically
'introverted' and typically 'extroverted' transitive actions discussed at
considerable length in Haiman 1985: actions typically performed upon
oneself occur in the typical case with unexpressed or reduced objects
(e.g., I shave: middle voice) while actions more typically performed on
others occur when they are reflexive with a separate object noun phrase
(e.g., I kicked myself: reflexive voice). Armed with this clear example
of economic motivation, Haspelmath attempts to eliminate iconicity in
other cases as well, with less success.
I think we must acknowledge that iconicity is clearly one possible mo-
tivation for the asymmetric realization of referential asymmetry. I have
tried to argue here that it is preferable to frequency when it accounts
for a broader range of related phenomena, and when it seems to be pro-
ductive in generating new forms. Moreover, iconicity is the only possible
motivation for the even more wide-spread if not universal manifestations
of referential symmetry (in distributivity, comparison, reciprocity, coordi-
nate conjunction and so on) that have been discussed elsewhere in the
literature (Lakoff and Peters 1969; Haiman 1980, 1985, 1988).
I believe that this is puzzling: Iconicity seems at least at present to offer
no proven cognitive benefits (here I am reluctantly in disagreement with
Givón 1985: 189; cf. Bellugi and Klima 1976; Bonvillian et al. 1997;
Tomasello et al. 1999). If we grant this, it is unexplained why it should
occur at all. Given the fact that it disappears so rapidly under
conventionalization (Bloom 1979), moreover, it cannot possibly be regarded
as a vestigial feature, whether from proto-language, or from Old English, or
even from last week. In other words, for iconicity to appear in language at all,
it has to be productive. Now it seems that neither of the traditional
motivations for linguistic form (economy of effort for the benefit of the
speaker versus clarity for the benefit of the hearer) can account for it.
I very tentatively propose that iconicity is generated over and over not
only for purely cognitive reasons, but because speakers take a purely
esthetic pleasure in making the form fit the sense. Some purely creative
drive is necessary to account for the ultimate genesis of linguistic material
(which sound change, analogy and grammaticalization merely erode and
tidy up), but it is frequently overwhelmed by the other two (and perhaps
others). Yet a creative esthetic drive compounded of imitation and
ambition is well attested in human behavior generally, and even in language
it is not only inferable on a priori grounds. Indeed it is responsible for the
creation of non-referential non-iconic symmetry, which may include not
only twin forms like flimflam (Pott 1862; Marchand 1960), but even
'nuts and bolts' phenomena like grammatical agreement (Ferguson and
Barlow 1988: 17; Haiman to appear a,b). A creative esthetic drive may
even be responsible for the spontaneous creation of expressive
morphemes like ideophones, as noted long ago by Hermann Paul 1880,
Ch. 9. But that is a subject that deserves another treatment.
Received 28 February 2007 Macalester College, USA
Notes
* Author's e-mail address: <Haiman@Macalester.edu>. Contact address: Linguistics
Search, Macalester College, 1600 Grand Avenue, St. Paul, MN 55105, USA.
References
Andersen, Henning
1972 Diphthongization. Language 48, 11–50.
Aoki, Haruo, and Okamoto, S.
1988 Rules for Conversational Rituals in Japanese. Tokyo: Taishukan.
Bally, Charles
1926 [1995] The expression of the concepts of personal domain and indivisibility in
Indo-European languages. In Chappell, H., and W. McGregor (eds.),
31–61.
Bellugi, Ursula, and Klima, Edward
1976 Two faces of the sign: Iconic and abstract. In Harnad, S. et al. (eds.), The
Origins and Evolution of Language and Speech. New York: N.Y. Academy
of Sciences, 514–538.
Bloom, H.
1979 Language Creation in the Manual Modality. Honors Thesis, University of
Chicago.
Bloomfield, Leonard
1933 Language. New York: Holt.
Bolinger, Dwight
1975 Aspects of Language, 2nd ed. New York: Harcourt: Brace.
Bonvillian, John, A. M. Garber, and S. B. Dell
1997 Language origin accounts. First Language 17, 219–239.
Bybee, Joan
1985 Morphology. Amsterdam: Benjamins.
Chappell, Hilary, and McGregor, William (eds.)
1995 The Grammar of Inalienability. Berlin/New York: Mouton de Gruyter.
Chappell, H., and Thompson, S.
1992 Semantics and pragmatics of associative de in Chinese discourse. Cahiers de
Linguistique Asie Orientale 21, 199–229.
Clark, M.
1995 Where do you feel? Stative verbs and body-part terms in Mainland
Southeast Asia. In Chappell, H., and W. McGregor (eds.), 529–563.
Crowley, Terry
1995 Inalienable possession in Paamese. In Chappell, H., and W. McGregor
(eds.), 383–432.
Dahl, Östen
2005 Origins and Maintenance of Linguistic Complexity. Amsterdam: Benjamins.
Durie, Mark
1987 Grammatical relations in Acehnese. Studies in Language 11, 365–399.
Ferguson, Charles, and Barlow, Michael
1988 Introduction to Agreement in Natural Language. Stanford: CSLI.
Geertz, Clifford
1955 Linguistic etiquette. The Religion of Java. Glencoe: The Free Press.
Givón, Talmy
1985 Iconicity, isomorphism, and non-arbitrary coding in syntax. In Haiman,
John (ed.), Iconicity in Syntax. Amsterdam: Benjamins, 187–219.
Haiman, John
1980 The iconicity of grammar: Isomorphism and motivation. Language 56,
515–540.
1985 Natural Syntax. Cambridge: CUP.
1988 Incorporation, parallelism, and focus. In Hammond, Michael, Edith
Moravcsik, Jessica Wirth (eds.), Studies in Syntactic Typology. Amsterdam:
Benjamins, 303–320.
2003 Explaining infixation. In Polinsky, M., and J. Moore (eds.), The Nature of
Explanation in Linguistic Theory. Stanford: CSLI, 105–120.
forthc. a Competing motivations. In Song, J. (ed.), Handbook of Typology. Oxford:
OUP.
forthc. b Decorative imagery in ritual elaboration. In Noonan, M. et al. (eds.),
Formulaic Language. Amsterdam: Benjamins.
forthc. c Ergativity in Kurdish.
Haiman, John, and Benincà, P.
1992 The Rhaeto-Romance Languages. London: Routledge.
Haspelmath, Martin
1993 More on the typology of inchoative/causative verb alternations. In Comrie,
Bernard, and Maria Polinsky (eds.), Causatives and Transitivity.
Amsterdam: Benjamins, 87–120.
Heath, Jeffrey
1998 Hermit crabs. Language 74, 728–759.
Heller, Joseph
1972 [1955] Catch-22. New York: Dell.
Hinton, Leanne
1982 How to cause in Mixtec. BLS 8, 354–363.
Hyman, Larry
1977 On the syntax of body parts in Haya. In Haya Grammatical Structure
(Southern California Occasional Papers in Linguistics 6). Los Angeles:
University of Southern California, 99–117.
James, William
1890 Principles of Psychology. NY: Dover reprints.
Kuteva, Tania
forthc. On the frills in language. Unpublished manuscript.
Lakoff, George, and Peters, S.
1969 Phrasal conjunction and symmetric predicates. In Reibel, D., and S. Schane
(eds.), Modern Studies in English. NJ: Englewood Cliffs, 113–142.
Marchand, Hans
1960 Categories and Types in English Word Formation. Heidelberg: Carl Winter.
Matisoff, James
1982 The Grammar of Lahu. Berkeley: University of California Press.
Mayr, Ernst
2000 What Evolution Is. New York: Basic Books.
McCarus, Ernest
1958 A Kurdish Grammar: Descriptive Analysis of the Kurdish of Sulaimaniya,
Iraq. New York: American Council of Learned Societies.
Orwell, George
1957 Politics and the English language. In Inside the Whale and Other Essays.
Harmondsworth: Penguin, 143–157.
Osumi, M.
1995 Body parts in Tinrin. In Chappell, H., and W. McGregor (eds.),
433–462.
Paul, Hermann
1880 Prinzipien der Sprachgeschichte. Tübingen: Max Niemeyer.
Payne, Doris, and Barshi, Emmanuel
1999 External Possession. Amsterdam: Benjamins.
Pott, August
1862 Die Doppelung. Lemgo: Detmold.
Schuchardt, Hugo
1885 Gegen die Junggrammatiker. Berlin: Robert Oppenheimer.
Silverstein, Michael
1976 Hierarchy of features and ergativity. In Dixon, R. (ed.), Grammatical
Categories in Australian Languages. Canberra: Australian Institute of
Aboriginal Studies, 112–171.
Sohn, Ho-Min
1994 Korean. London: Routledge.
Tiersma, Peter
1982 Local and general markedness. Language 58, 832–849.
Tomasello, Michael, Tricia Striano, and Philippe Rochat
1999 Do young children use objects as symbols? British Journal of Developmental
Psychology 17, 563–584.
Tsunoda, Tasaku
1995 The possession cline in Japanese and other languages. In Chappell, H., and
W. McGregor (eds.), 565–630.
Watkins, Calvert
1962 Indo-European Origins of the Celtic Verb. Dublin: Institute for Advanced
Studies.
Wierzbicka, Anna
1972 Semantic Primitives. Frankfurt: Athenäum.
Zipf, George Kingsley
1935 The Psychobiology of Language. Boston: Houghton-Mifflin.
On iconicity of distance
WILLIAM CROFT*
Abstract
Haspelmath argues that certain universal asymmetries in linguistic distance
previously analyzed as examples of iconicity of distance are better analyzed
as the result of frequency. It is argued here that Haspelmath's arguments
can be countered by an advocate of iconicity of distance as an explanatory
factor. Iconicity of distance is not different in kind from iconicity of
contiguity, which Haspelmath endorses. Haspelmath's argument works only if
one takes relative frequency instead of absolute frequency; yet it is
generally accepted that economy effects are the result of absolute frequency.
The empirical frequency data that Haspelmath presents is inconclusive.
However, Haspelmath presents data that suggest that an iconicity of
distance analysis, at least for possession constructions, must be revised as
iconicity of length. Finally, criteria are offered to differentiate the effects of
economy, iconicity of distance/length, and iconicity of independence.
Keywords: frequency; iconicity; economy; distance.
Haspelmath's article challenges explanations based on iconic motivation
for three categories of linguistic phenomena, quantity, complexity and
cohesion (distance). For all three of these phenomena, Haspelmath argues
that an explanation in terms of economic motivation, that is, based on
differences in frequency, is superior to the iconicity explanation that
has been offered in the literature. Haspelmath does not deny that
iconicity plays a major role in determining linguistic structure; his critique
does not touch the most important manifestations of iconicity in
language, namely paradigmatic isomorphism, syntagmatic isomorphism,
and contiguity.
Cognitive Linguistics 19–1 (2008), 49–57
DOI 10.1515/COG.2008.003
0936–5907/08/0019–0049
© Walter de Gruyter
I believe that Haspelmath is correct in his arguments that economic
motivation is a superior explanation for the quantity and complexity
phenomena he discusses. I did not consider either of these to be examples
of iconic motivation in Typology and Universals (Croft 2003). Instead,
length and complexity are reflexes of typological markedness (Greenberg
1966). Typological markedness, at least the formal asymmetries in
expression that Haspelmath discusses, is economically motivated, as
Greenberg argues (Haspelmath cites Greenberg's frequency-based explanations
in both cases).
Iconicity of cohesion is another matter. An explanation in terms of
iconic motivation can be largely defended, and an explanation in terms
of economic motivation appears to be unsatisfactory. Nevertheless,
Haspelmath's article helps us to tease apart the relationship between iconicity
and economy in motivating linguistic universals.
Haspelmath divides Haiman's iconicity of distance into two, contiguity
(see above) and cohesion. Haspelmath distinguishes contiguity from the
scale of iconicity of distance for the grammatical relationship of X to Y
in (1) (from Haiman 1983: 783):
(1) a. X A Y (an additional word is used to express the relationship
between X and Y)
b. X Y (no additional word is used to express the relationship be-
tween X and Y)
c. X-Y (X and Y are morphologically bound)
d. Z (a portmanteau expression of the concepts denoted by X and
Y)
Haspelmath argues that the scale in (1) does not correspond to
distance, because (b) and (c) do not literally differ in distance, and distance
is not really applicable to (d) (§5). But it is not clear to me that the notion
of distance is inappropriate for the distinctions in (1). The presence vs.
absence of a third morpheme can be fairly straightforwardly interpreted
in terms of linguistic distance (but see below). The contrast between
morphological freedom (juxtaposition) and boundedness is intended to
represent both prosodic and segmental differences in behavior that do
represent phenomena that can reasonably be called distance, even in a
strict temporal sense. Prosodically, morphologically free elements may
occur in different intonation units, and be interrupted by pause.
Segmentally, the articulatory gestures for the forms X and Y may overlap (in
assimilation and other segmental effects), which represents a certain
temporal overlap of the formal expression of X and Y. Finally, portmanteau
expression represents complete temporal overlap of the formal expression
of X and Y: all of Z expresses both X and Y. Thus, it is not unreasonable
to consider the scale in (1) to be an extension of iconicity of contiguity,
which Haspelmath accepts as genuinely iconically motivated.
The more important question, however, is whether the phenomena that
Haspelmath discusses really are better explained in terms of economic
rather than iconic motivation. Haspelmath discusses four examples:
attributive possession constructions, causative constructions, coordinating
constructions, and complement clause constructions. In the case of
possessive constructions, he gives frequency data as evidence for a
frequency-based explanation, and offers other grammatical arguments to support a
frequency-based explanation over an iconic explanation. In the case of
the other three constructions, however, he offers little or no frequency
data and few other arguments. It is only for possessives that Haspelmath
has a well developed argument against the iconicity explanation and in
favor of the frequency explanation. I will therefore focus on
Haspelmath's arguments regarding possessive constructions.
The relevant typological universal for possessives is that if there is a
difference in linguistic distance (cohesion) between the alienable and
inalienable constructions, the inalienable construction will always be lower
on the distance scale in (1) than the corresponding alienable construction.
Haiman explains this universal by iconicity of distance.
Haspelmath's frequency explanation is based on the relative frequency
of the possessed to the unpossessed form of a noun.¹ In text counts from
English and Spanish, Haspelmath demonstrates that the relative
frequency of body part terms and kinship terms in the possessed form
compared to the unpossessed form is greater than the relative frequency
of alienable nouns in the possessed form compared to the unpossessed
form. Haspelmath notes that inalienable nouns in the unpossessed
construction are crosslinguistically sometimes overtly coded (see his
Koyukon examples), and that this fact can be explained in terms of frequency.
In fact, Haspelmath's text counts actually indicate that even kinship
terms and body part terms occur more frequently in the unpossessed
construction.
Thus, an economy explanation only works if one uses relative
frequency of unpossessed vs. possessed inalienable nouns compared to the
relative frequency of unpossessed vs. possessed alienable nouns. But all
other examples of typological markedness (frequency-based differences
in the structural expression of concepts) are of absolute frequency, not
relative frequency. Many such examples are given in Greenberg (1966)
and Bybee (1985); see also Croft (2003: 151, 154). In the one study
that compares relative and absolute frequency with respect to phenomena
attributed to economy, namely morphological irregularity in Russian
nominal paradigms (Corbett et al. 2001), absolute frequency was a
strongly significant factor, but relative frequency was only weakly
significant (see Croft 2003: 206–207).
It is not an accident that absolute frequency has been found to be the
causal factor for economically motivated linguistic patterns. The
theoretical explanation for economy (e.g., Bybee 1985) requires absolute
frequency. Economy effects are due to degree of entrenchment of linguistic
forms (morphological forms or constructions such as the possessive) in
the mental representation of linguistic knowledge. Entrenchment leads
to routinization of the production of the form by a speaker, which in
turn brings about reduction of that form. But entrenchment is a result
of exposure to the number of tokens of the linguistic form; that is,
entrenchment is a function of the absolute frequencies of forms, not
relative frequencies.
Haspelmath also appeals to predictability to account for economy
(§2.2). Unfortunately, predictability is a vague concept: any
mathematical relationship can be construed as 'predictable'. But the most natural
psychological interpretation of predictability, as what the speaker would
be expected to produce, also relies on absolute frequency. If the
possession construction is reduced for inalienable nouns compared to
alienable nouns because inalienable nouns are 'more predictable' in the
possession construction, this means that one would expect the absolute
frequency of inalienable nouns in the possession construction to be
greater than the absolute frequency of alienable nouns in the possession
construction (or perhaps greater than the absolute frequency of
inalienable nouns in the unpossessed construction). In other words, relative
frequency would not be expected to lead to economy effects such as
reduction.²
Furthermore, the iconicity of distance hypothesis is not about the rela-
tionship of the possessed construction to the unpossessed construction.
The iconicity of distance hypothesis compares the relationship of two pos-
sessed constructions, the inalienable construction and the alienable con-
struction. The iconicity of distance hypothesis makes no claim about the
unpossessed construction, or about the relationship of the unpossessed
construction to the possessed construction. This evidence is irrelevant to
the iconicity account. A genuine comparison of an iconicity account and
an economy account for distance/cohesion should compare the absolute
frequency of the inalienable possession construction to that of the alien-
able possession construction.
Haspelmath's figures appear to suggest that comparing these two
absolute frequencies does support an economic explanation. In English, there
are 12737 tokens of body part and kinship terms in the possessed
construction, and only 2967 tokens of alienable nouns in the possessed
construction. Unfortunately, the data that Haspelmath presents represents
only a subset of both inalienable and alienable nouns. We cannot be
certain if the frequency difference will remain the same once the vast number
of inalienable nouns is included. Thus, Haspelmath's frequency data is
inconclusive. Also, an economy explanation would make a different
prediction for those languages in which kinship terms are not found in the
inalienable possession construction (e.g., Kosraean [Kusaiean]). In such
a language, the alienable possession construction tokens would probably
outnumber the inalienable ones (this is what is implied by the token
frequency data for English and Spanish offered by Haspelmath). In that
case, an economy account would predict that the alienable possession
construction would be the more cohesive one. This would be an incorrect
empirical prediction.
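The way the two measures can come apart in such a language may be made concrete with a small sketch. The Python fragment below is an illustration only: the class name NounClassCounts and all token counts are hypothetical, and are not drawn from Haspelmath's text counts or from any Kosraean corpus. It shows a noun class that has the higher relative frequency of possessed tokens but the lower absolute frequency.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NounClassCounts:
    """Token counts for one noun class (hypothetical, for illustration only)."""
    possessed: int
    unpossessed: int

    @property
    def absolute_possessed(self) -> int:
        # Absolute frequency: the raw number of possessed tokens.
        return self.possessed

    @property
    def relative_possessed(self) -> float:
        # Relative frequency: share of the class's tokens that are possessed.
        return self.possessed / (self.possessed + self.unpossessed)


# Invented counts: a small inalienable class and a large alienable class.
inalienable = NounClassCounts(possessed=400, unpossessed=600)
alienable = NounClassCounts(possessed=900, unpossessed=8100)

for name, counts in (("inalienable", inalienable), ("alienable", alienable)):
    print(name, counts.absolute_possessed, round(counts.relative_possessed, 2))
```

With these invented counts, relative frequency ranks the inalienable class higher while absolute frequency ranks the alienable class higher, so an economy account would predict opposite patterns of reduction depending on which measure it adopts.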
For the possessive construction, Haspelmath argues that it is the
(relative) frequency of the construction that matters, not the individual words:
'. . . the [frequent] alienable nouns may well occur in a possessive
construction more often than the inalienable nouns. However, the percentage
of possessed occurrences of inalienable nouns will always be
significantly higher than the corresponding percentage of [infrequent]
inalienable nouns.' But in every case of reduced alternative constructions that
has been investigated, what determines the reduction of the linguistic
expression is the token frequency of specific words in the construction, not
the construction itself (e.g., Bybee and Slobin 1982; Bybee and Thompson
1997; Bybee and Scheibman 1999; Bybee 2001). If an individual word has
a low token frequency, it tends to be regularized. This latter phenomenon
cannot be explained by an economy account based on construction
frequency. The fact that low token frequency inalienable nouns still take
the more cohesive possessive construction must be due to some other
factor. That other factor is iconicity of distance.
Haspelmath proposes that a frequency account can explain the fact
that the inalienable pronominal possessor expression is phonologically
shorter than the alienable pronominal possessor expression, and an icon-
icity of distance account cannot. The greater phonological fusion and
reduction of the inalienable pronominal possessor is almost certainly due
to the fact that the inalienable possessive construction is more highly
grammaticalized than the alienable possession construction. Higher fre-
quency plays a major role in grammaticalization (Bybee 2003). Once a
linguistic expression is extended to a grammatical function, it increases
in frequency, and the increase in frequency leads to erosion of the gram-
maticalizing morphemes in the expression.
However, the relevant frequency contrast in grammaticalization is be-
tween the grammaticalizing construction in a grammatical function and
its historical antecedent in its non-grammatical function. This is not
what the iconicity of distance account intends to explain. What we observe
in possessive constructions is that a less grammaticalized pronominal con-
struction, which is presumably newer, has entered the possessive domain,
and is competing with the more grammaticalized possessive construction.
The result of this competition in many languages is a semantic division of
labor, a common result of competing variants (Croft 2000: 176–177).
The semantic division is always such that the more cohesive construction
is used for more inalienable possession relations. The frequency effects of
grammaticalization cannot explain this fact. Iconicity of distance explains
this fact.³
The remaining argument that Haspelmath raises against the iconicity
account is the only one that is a serious challenge to iconicity of distance
in possessive constructions. Haspelmath argues that iconicity of distance
requires the extra morpheme in alienable possessive constructions to oc-
cur between the possessor and possessum in the construction, yet it some-
times does not do so. But the iconicity explanation can be reformulated to
accommodate this phenomenon.
What is required is a reformulation of iconicity of distance with a
different measure of linguistic distance than simple temporal distance.
Haiman himself allows a nontemporal measure of linguistic distance
in his analysis of causative constructions: he argues that there is greater
linguistic distance when the causee is expressed as an indirect object
than as a direct object (Haiman 1983: 792). Haiman discusses the
Puluwat example (as Haspelmath notes) and proposes an alternative
formulation (which Haspelmath does not discuss) in terms of
phonological bulk (Haiman 1983: 795). Haiman's reformulation is that a
conceptually more distant relation is encoded by a linguistically bulkier
expression. This formulation changes the iconic mapping from distance
between X and Y to length of the linguistic form used to code the
relation between X and Y. This reformulation would allow us to distinguish
between iconicity of distance proper, based on temporal distance of forms
in the linguistic signal, and an iconicity of temporal length of the
relational expression. An iconicity of length account for possessive
constructions is superior to both the original iconicity of distance hypothesis and
the economy hypothesis.
Haspelmath's arguments against iconicity of distance or length do not
hold up, at least for possessive constructions. The frequency data that
Haspelmath invokes in favor of an economy account are inconclusive or
irrelevant. Haspelmath appeals to relative frequency of constructions,
whereas economy is due to absolute token frequencies of lexical forms (in
either morphological paradigms or grammatical constructions). Where
Haspelmath offers grammatical phenomena that really are economically
motivated (grammaticalization), they are different phenomena from the
ones that are hypothesized to be iconically motivated by Haiman. Where
frequency and semantics conflict (low frequency forms, languages with
smaller inalienable classes), or frequency makes no prediction (division of
labor in grammaticalization), semantics explains what is observed, and the
semantic differences are iconically motivated.
Nevertheless, some grammatical phenomena are undoubtedly econom-
ically motivated. Nor should we rule out the possibility of multiple moti-
vations, which Haspelmath appears to do. For example, Haspelmath
suggests (though with little data) that certain patterns in complex sen-
tence cohesion that have been taken to be conceptually close are also
higher in frequency. Haspelmath concludes that iconicity therefore has
no role to play in explaining linguistic differences in complex sentence
constructions. But there is no a priori reason to assume that only one
functional motivation applies for every linguistic construction (Croft
2003: 64–69). For example, natural coordination (such as brother and
sister; see Haspelmath, §6.3) may be more frequent than man and snake,
but it is also conceptually more of a unit (Wierzbicka 1980). What is re-
ally necessary is a careful examination of what motivations appear to be
operating to explain each typological universal, and the linguistic predic-
tions each makes.
For instance, in the analysis of complex sentences, one must distinguish
between iconicity of distance and iconicity of independence: concepts
having less conceptual independence will be linguistically no more
independent than concepts having greater conceptual independence (Cristofaro
2003; Croft 2003: 213–219). In discussing Cristofaro's (2003) typological
universals of subordination, I suggest that deranking of clauses
(Stassen 1985) is economically motivated, because deranking involves
asymmetries in overt coding and reduced behavioral potential as well
as differences in token frequency. However, differences in linguistic
integration of subordinate clauses are iconically motivated by conceptual
independence, because semantic integration and temporal dependence of the
subordinate clause (the factors that determine degree of linguistic
independence) are symptoms of conceptual independence, not conceptual
distance.
Table 1 indicates salient properties of different types of functional
explanations for linguistic cohesion.
Some properties are found under more than one explanation: the same
phenomenon may have alternative explanations. For example, coding
length can be explained either by economy or iconicity of distance/
length. Haspelmath appears to assume that coding length is only explainable
in terms of economy (§6), but this is not necessarily the case. One
must examine all of the properties in Table 1 in order to identify which
motivation is operating. If one finds the union of properties from more
than one motivation, then it is likely that multiple motivation is the best
explanation for the linguistic phenomenon.
Received 30 April 2007 University of New Mexico,
Albuquerque, USA
Notes
1. Haspelmath incorrectly describes the distinction between absolute and relative frequency
in §4.2. There he describes relative frequency as a comparison of the absolute
frequencies of paradigmatic alternatives such as singular book and plural books.
But this is simply comparison of absolute frequencies. Relative frequency is a
proportional frequency,
measured by comparing percentages of one form relative to another form. That is,
relative frequency is a second-order comparison of sets of absolute frequencies (see also
Corbett et al. 2001: 202–203).
2. An anonymous referee points out that some recent work in cognitive linguistics (e.g.,
Gries et al. 2005 and works cited therein) makes use of relative frequency. However,
this work uses relative frequency to tease apart subtle semantic distinctions between con-
structions and to better identify individual words semantically most closely associated
with a construction's meaning. It does not claim to motivate economy effects such as
phonological reduction.
3. Haspelmath also cites the relative length of direct vs. indirect causation markers as
parallel to the relative length of inalienable vs. alienable pronominal possessors. The same
counterargument applies to the causatives as well. It is also possible that in causation,
the difference between direct and indirect causation reflects the conceptual distance
between causer and causee, i.e., the arguments of the causative construction. In that case,
the arguments are X and Y and the causative marker is A in Haiman's formula for
linguistic distance in (1a), and the difference in length of direct vs. indirect causation
markers is exactly predicted by iconicity of distance.

Table 1. Major properties of different types of functional motivation

Economy                          Iconicity of Distance or         Iconicity of Independence
                                 Length

Asymmetry in coding length       Asymmetry in coding length

                                 Asymmetry in morphological       Asymmetry in morphological
                                 boundedness (e.g., [X Y] vs.     boundedness
                                 [X-Y] or [X A Y] vs. [X A-Y])

Asymmetry in behavioral                                           Asymmetry in syntactic
potential (Croft 2003: 95–99)                                     potential

Independent phenomenon           Involves coding of relation      Involves coding of relation
from rest of construction        between X and Y in               between X and Y in
(e.g., contrasts between         construction                     construction
[X A] and [X] regardless of
presence or absence of Y)

Motivated by absolute            Motivated by conceptual          Motivated by conceptual
lexical token frequency          distance                         independence (which also
                                                                  involves conceptual distance)
References
Bybee, Joan L.
1985 Morphology: A Study into the Relation between Meaning and Form. Amster-
dam: John Benjamins.
2001 Frequency effects on French liaison. In Bybee, Joan L., and Paul Hopper
(eds.), Frequency and the Emergence of Linguistic Structure. Amsterdam:
John Benjamins, 337–359.
2003 Mechanisms of change in grammaticalization: the role of frequency. In
Joseph, Brian, and Richard Janda (eds.), Handbook of Historical Linguistics.
Oxford: Blackwell, 602–623.
Bybee, Joan L. and Joanne Scheibman
1999 The effect of usage on degrees of constituency: the reduction of don't in
English. Linguistics 37, 575–596.
Bybee, Joan L. and Dan I. Slobin
1982 Rules and schemas in the development and use of the English past tense.
Language 58, 265–289.
Bybee, Joan L. and Sandra A. Thompson
1997 Three frequency effects in syntax. In Juge, Matthew L., and Jeri Moxley
(eds.), Proceedings of the 23rd Annual Meeting of the Berkeley Linguistics
Society. Berkeley: Berkeley Linguistics Society, 378–388.
Corbett, Greville G., Andrew Hippisley, Dunstan Brown and Paul Marriott
2001 Frequency, regularity and the paradigm: a perspective from Russian on a
complex relation. In Bybee, Joan L., and Paul Hopper (eds.), Frequency
and the Emergence of Linguistic Structure. Amsterdam: John Benjamins,
201–226.
Cristofaro, Sonia
2003 Subordination. Oxford: Oxford University Press.
Croft, William
2000 Explaining Language Change: An Evolutionary Approach. Harlow, Essex:
Longman.
2003 Typology and Universals, 2nd ed. Cambridge: Cambridge University Press.
Greenberg, Joseph H.
1966 Language Universals, with Special Reference to Feature Hierarchies. (Janua
Linguarum, Series Minor 59.) The Hague: Mouton.
Gries, Stefan Th., Beate Hampe and Doris Schönefeld
2005 Converging evidence: bringing together experimental and corpus data on the
association of verbs and constructions. Cognitive Linguistics 16, 635–677.
Haiman, John
1983 Iconic and economic motivation. Language 59, 781–819.
Stassen, Leon
1985 Comparison and Universal Grammar. Oxford: Basil Blackwell.
Wierzbicka, Anna
1980 Coordination: the semantics of syntactic constructions. Lingua Mentalis:
The Semantics of Natural Language. New York: Academic Press, 223–285.
Reply to Haiman and Croft
MARTIN HASPELMATH*
I am grateful to John Haiman and William Croft for their penetrating cri-
tiques of my claims and for the interesting challenges that they provide
for them. This offers me a chance to clarify and elaborate on some of the
central points of my article. This is an important debate, because iconicity
and frequency are central explanatory concepts in functional and cognitive
linguistics. Even if we do not succeed in resolving the issues, our
understanding will be enhanced by this discussion.
1. How frequency explains grammatical asymmetries
A key presupposition of my paper is that frequency of use implies short
coding because frequent items are more predictable. Croft assumes a
rather different account of the frequency-shortness connection. He claims
that
[e]conomy effects are due to degree of entrenchment of linguistic forms . . .
Entrenchment leads to routinization of the production of the form by a speaker,
which in turn brings about reduction of that form.
This echoes similar remarks in Joan Bybees work (e.g., Bybee 2001,
2003), but I do not see how such a view can be reconciled with some basic
facts. To be sure, routinization often co-occurs with reduction of form,
because forms that are routinized for the speaker are often also predictable
for the hearer. But in such cases the cause of the reduction is not the
routinization, but the speaker's tendency to save energy when part of
the message is predictable. When a routinized form is not predictable
(e.g., when I dictate my phone number to someone), no reduction occurs.
George Kingsley Zipf saw this correctly from the beginning of his
writings:
In listening to spoken language, we notice that, among other things, the speaker
invariably emphasizes these two: first, what is new or unexpected to the hearer;
second, what the hearer desires [for the speaker] to make especially clear . . . But
that which is unexpected, unusual, or unfamiliar to the hearer is, by definition, the
seldom. (Zipf 1929: 5)
Cognitive Linguistics 19–1 (2008), 59–66
DOI 10.1515/COG.2008.004
0936–5907/08/0019–0059
© Walter de Gruyter
Thus, frequency-induced reduction is to a large extent a hearer-based
phenomenon and is not due to routinization, but to predictability. It
should also be noted that predictability need not be due to linguistic fre-
quency. Stereotypical situations allow massive reduction, simply because
the context makes the utterance content easy to predict. In grammar, too,
some reductions (e.g., lack of stress on anaphoric epithets, discussed by
Haiman in his §2.4) are due to referential predictability from the context,
not to high frequency of use.
A related issue is productivity, which, as Haiman rightly observes, is
the real test for psychological reality. However, an explanatory factor
like frequency of use is not meant to be psychologically real in the way
in which cognitive schemas or generative rules are sometimes said to be
psychologically real. Frequency effects in processing (cf. Ellis 2002) affect
language structure through speakers' innovations that ultimately lead to
language change (cf. Bybee 2007). This type of explanation is thus akin
to adaptive explanation in biology (cf. Haspelmath 1999; Croft 2000; Blevins
2004), and I take this to be the standard approach to explaining
universals in current functional linguistics (cf. Bybee 1988; Kirby 1999;
Newmeyer 2005). Even one of Haiman's productivity arguments (underanalysis
of 3rd person markers, or Watkins' Law, in his §2.3) is clearly
of the diachronic sort. The two real examples of productivity of a claimed
iconicity effect, transitive disappear (Haiman's §2.1) and unstressed epithets
(I've KILLED the pig, §2.4), evidently illustrate the productivity of
conventional regularities of English, not of iconicity itself. In German,
for instance, the verb verschwinden 'disappear' could not possibly develop
transitive uses, because there is no productive ambitransitive alternation
in the language. Of course, to the extent that these regularities of English
reflect universal tendencies, these might be due to iconicity (or frequency
or some other explanatory factor), but in that case the explanation is
again mediated by diachrony.
2. How to compare frequencies
As Croft's comments show, there may be a question about which frequencies
to compare with which other frequencies. My claim is that
alleged iconicity-of-cohesion effects such as alienability contrasts in
possessive constructions are due to the same kinds of frequency asymmetries
that give rise to the classical effects of typological markedness
(Greenberg 1966, Croft 2003: Ch. 4). Croft disputes this and claims that
my explanation is based on relative frequency, whereas typological mark-
edness is based on absolute frequency. I believe that this reects a misun-
derstanding, so let me clarify the way I see the parallels.
We consider two forms (A and B) that are paradigmatic alternatives
(Croft 2003: 90). In the case of markedness reversal, there are two
classes of lexemes (I and II) that behave differently, both in terms of
frequency and (consequently) in terms of coding. Let us take singular and
plural marking again, where quite a few languages have a class of nouns
(let us call them plural-prominent) which often occur in the plural and
hence have a longer singular form (singulative, cf. Croft 2003: 189–190).
Even English could be said to have a few such nouns, e.g., datu-m/
data, criterio-n/criteria. Table 1 shows the frequencies of two selected
English nouns, a singular-prominent and a plural-prominent noun. For
each noun, the first column gives the absolute frequency, and the second
column gives the relative frequency in percentages.
The standard frequency-based and least-effort-based explanation of
the coding contrasts is that they are due to within-class, across-form
differences in frequency: In each case, the overtly coded form is significantly
rarer than the other form. As long as we only look at individual nouns,
it does not matter whether we compare the absolute or the relative
frequencies. But when we compare different classes, it is important to compare
relative rather than absolute frequencies, because in absolute terms,
houses is much more frequent than criteria. Clearly, across-class, within-
form comparisons are not meaningful in the present context.
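The within-class comparison just described can be made concrete with a short calculation. The following Python sketch is illustrative only: the counts are those reported in Table 1 (British National Corpus, spoken), while the function name relative_frequencies and the dictionary layout are assumptions introduced here, not part of the article.

```python
# Token counts copied from Table 1 (British National Corpus, spoken).
counts = {
    "house": {"singular": 4811, "plural": 1020},      # class I
    "criterion": {"singular": 137, "plural": 365},    # class II
}

def relative_frequencies(forms):
    """Return each form's share of all tokens of the same lexeme."""
    total = sum(forms.values())
    return {form: n / total for form, n in forms.items()}

for noun, forms in counts.items():
    shares = relative_frequencies(forms)
    # The rarer form within the lexeme is the one expected to be overtly coded.
    rarer = min(shares, key=shares.get)
    pretty = {form: f"{share:.0%}" for form, share in shares.items()}
    print(noun, pretty, "rarer form:", rarer)
```

In absolute terms the plural of house (1020 tokens) outnumbers the plural of criterion (365 tokens), yet within its own paradigm it is the minority form; only the within-class shares line up with the coding contrast.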
The picture for alienability contrasts is completely analogous. The two
forms are the possessed and the unpossessed form, and the two classes are
alienable nouns and inalienable (= kinship and body-part term) nouns.
Table 2 shows the frequencies of two selected English nouns.
Again, I claim that the coding contrasts (which this time cannot be
illustrated from English, because English treats all possessed nouns alike)
are due to (vertical) within-class, across-form differences in frequency.
Croft, by contrast, suggests that one should compare (horizontal)
across-class, within-form differences, but these are as irrelevant for the form
differences as in Table 1. Some alienable nouns (such as house) are very
frequent, others (such as palace) are rarer, and some inalienable nouns
(such as head or hand) are very frequent, whereas others (such as nose or
kidney) are rarer. What unites inalienable nouns is that they have a high
proportion of possessed occurrences, i.e., a high relative frequency of
form B. All this is just as in the singular/plural contrasts seen earlier.

Table 1. Frequencies of house and criterion (singular/plural) in the British National Corpus
(spoken)

                     class I (singular-prominent)    class II (plural-prominent)
form A (singular)    house      4811    83%          criterio-n    137    27%
form B (plural)      house-s    1020    17%          criteria      365    73%
                                5831   100%                        502   100%
In Table 2, the proportion of form B in class II is more than 50%, as is
the proportion of form B (plural) in class II (plural-prominent) in Table
1. Likewise, the proportion of form B in class I is below 50% in both
tables. However, this is not actually necessary in order to explain the
form contrast between class I and class II. All that matters is that the
proportion of form B is significantly higher in class II than in class I. A
higher proportion of form B means that form B is more predictable than
in class I, which means that it is more likely to be expressed in a short
way. Thus, while the figures in Table 3 are not as overwhelmingly
significant as those in Tables 1 and 2, they are still significant and sufficient to
explain the fact that in some languages, paired body parts have longer
singulars than plurals.
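The claim that the proportion of form B is significantly higher in class II than in class I can be checked with a simple test. The sketch below is an illustration under stated assumptions, not the procedure used in the article, which does not name a test: it applies a pooled two-proportion z-test to the plural counts for nose and foot from Table 3, and the function name two_proportion_z is hypothetical.

```python
import math

def two_proportion_z(k1, n1, k2, n2):
    """z statistic for the null hypothesis that k1/n1 and k2/n2
    come from one shared underlying proportion (pooled estimate)."""
    p1, p2 = k1 / n1, k2 / n2
    pooled = (k1 + k2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p2 - p1) / se

# Counts from Table 3 (BNC, spoken): plural tokens out of all tokens.
nose_plural, nose_total = 32, 404       # class I (singular-prominent)
foot_plural, foot_total = 877, 1763     # class II (plural-prominent)

z = two_proportion_z(nose_plural, nose_total, foot_plural, foot_total)
print(f"plural share: nose {nose_plural / nose_total:.2f}, "
      f"foot {foot_plural / foot_total:.2f}, z = {z:.1f}")
```

A large positive z indicates that the plural share of foot exceeds that of nose by far more than sampling variation would predict, even though that share does not exceed 50%.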
The figures given in my article for body-part and kinship terms in English
and Spanish are more like the figures in Table 3 than the figures in
Table 1, and this may strike some observers (such as Croft) as less than
fully convincing. However, requiring that the frequency should be higher
than 50% in a within-class comparison would not be reasonable, because
quite generally, implicational universals make only relative predictions.
Some languages never mark number or possession, and some languages
always do. But when number or possession marking (or any other kind
of marking) is different for different lexeme classes, the general prediction
is that the higher the frequency of a form, the less marking it receives.
This prediction is fully borne out by the available data on possessive
constructions.

Table 2. Frequencies of house and nose (unpossessed/possessed) in the British National
Corpus (spoken)

                        class I (alienable)                  class II (inalienable)
form A (unpossessed)    house                3614    75%     nose                134    36%
form B (possessed)      (someone's) house    1197    25%     (someone's) nose    238    64%
                                             4811   100%                         372   100%

Table 3. Frequencies of nose and foot (singular/plural) in the British National Corpus
(spoken)

                     class I (singular-prominent)    class II (plural-prominent)
form A (singular)    nose      372    92%            foot     886    51%
form B (plural)      nose-s     32     8%            feet     877    49%
                               404   100%                    1763   100%
Croft also mentions Corbett et al.'s (2001) discussion of relative and
absolute frequency, and their result that relative frequency is much less
important than absolute frequency. However, Corbett et al. are interested
in morphological irregularities, not in coding asymmetries. As I note at
the beginning of §6 of my paper, high absolute frequency favours suppletion
(and irregularity more generally), because irregularity is due to
memorizability and has nothing to do with predictability.
Thus, coding asymmetries that correlate with frequency asymmetries
are due to differential predictability, which can be measured by relative
frequencies. Absolute frequencies explain irregularity. Haiman's cohesion
scale has two completely different explanations.
3. Kinds of iconicity
In his comments, Croft insists that Haiman's cohesion scale can be interpreted
in terms of temporal distance (not just cohesion, as I had argued),
and thus as an extension of iconicity of contiguity (an iconicity type that I
do not question). I find this a legitimate view, and indeed the similarities
between iconicity of cohesion/distance and iconicity of contiguity are too
obvious to be overlooked. But sometimes appearances are deceptive, and
I have claimed that the phenomena explained by iconicity of cohesion are
not uniform and must be explained in two different ways. This makes
it less surprising that I also claim that contiguity phenomena have yet
another explanation. Thus, instead of Croft's uniform explanation in
terms of iconicity of contiguity and distance (Croft 2003: §7.2.1), I have
three separate explanations for three separate kinds of phenomena: icon-
icity of contiguity (for constituency), frequency-induced predictability
(for coding asymmetries), and frequency-induced memorizability (for
suppletion).
This flies in the face of Haiman's principle that an explanation should
be preferred if it is applicable to a broader range of phenomena. But I
do not think that this is a useful heuristic for developing explanatory ac-
counts of highly complex phenomena such as language. Nobody doubts
that language structure is influenced by a variety of factors, so we should
put all our energy into identifying the precise roles of these factors and
refining our predictions, rather than reducing everything to a few big
principles and sweeping the details under the rug.¹ Thus, I happily admit that
my frequency account cannot be extended to the external possession and
possessor agreement constructions mentioned by Haiman.
Croft accepts the frequency explanation for iconicity of complexity and
only argues for iconicity of distance/cohesion, and Haiman (1983) had
also argued only for iconicity of cohesion. So it is surprising to see Haiman
defend iconicity of complexity, in his §2.2, with regard to contrasts
such as entity/thing and leng-th/long. He apparently wants to claim that
words expressing basic and concrete meanings such as thing and long
tend to be short, whereas words expressing derived and abstract meanings
such as entity and length tend to be long. This could be evaluated
properly once the claim is made more precise, but let me note here that
many highly abstract concepts are expressed in a very short way (e.g.,
in, on, to, have), and that many concrete concepts are expressed in
a very long way (e.g., caterpillar, rhododendron, game console,
thunderstorm). Maybe these latter examples would not count as basic,
and of course children first acquire shorter words (e.g., car, dog,
rain), but this is because they tend to be more frequent. Haiman
cites my (1993) paper in support of this view, so I should emphasize
again here that I now believe that the relevant part of that paper was
mistaken.
Croft recognizes that the cases where the possessive marker is in a peripheral
rather than in a medial position (examples (30a–d) of my article)
present a serious challenge to iconicity of distance, and he suggests that it
should be replaced by iconicity of length. However, he does not explain
in what sense the relationship between form and meaning would still be
iconic. According to Haiman (1983: 782–783), short linguistic distance
iconically corresponds to short conceptual distance (= conceptual closeness).
Iconicity of length would presumably consist in an iconic correspondence
between linguistic length and conceptual length, but it is unclear
what the latter could mean. Perhaps it would mean the same as
conceptual complexity, but then iconicity of length would be indistin-
guishable from iconicity of complexity, which Croft rejects just as I do.
4. Multiple motivations
Both Croft and Haiman emphasize that we should allow for the possibility
of multiple motivations (Schuchardt's "motley interplay of innumerable
drives"), and I fully agree with this. I also agree with Haiman
that each of these motivations is most clearly attested ceteris paribus,
that is, when they operate unopposed, and with Croft that one must
examine all the properties [of a phenomenon] in order to identify which
motivation is operating. Discovering the motivation(s) of a universal
tendency of language structure is not at all straightforward, and overall
our understanding is still very limited. Both John Haiman and William
Croft have made substantial contributions to this enterprise, but I still
believe that the modification of their views that I proposed in my article
is hard to avoid. But to make further progress in this area, we need more
empirical work and more debates like the current one.
Received 1 June 2007 Max-Planck Institute for Evolutionary
Anthropology, Leipzig, Germany
Notes
* Contact address: Max Planck Institute for Evolutionary Anthropology, Deutscher Platz
6, 04103 Leipzig, Germany; author's e-mail: <haspelmath@eva.mpg.de>.
1. In his conclusions, Haiman himself emphasizes the diversity of factors, and (ironically)
accuses me of reductionism. Evidently we need both some reductionism (such as the
principle that identical effects should be derived from identical causes) and close
attention to the details, and the only question is how to achieve the right balance. It seems to
me that I am more on the splitters' side, whereas Haiman is more (and a bit too much)
of a lumper.
References
Blevins, Juliette
2004 Evolutionary Phonology. Cambridge: Cambridge University Press.
Bybee, Joan L.
1988 The diachronic dimension in explanation. In: John A. Hawkins (ed.),
Explaining Language Universals. Oxford: Blackwell, 350–379.
2001 Phonology and Language Use. Cambridge: Cambridge University Press.
2003 Mechanisms of change in grammaticalization: the role of frequency. In:
Janda, Richard and Joseph, Brian (eds.), Handbook of Historical Linguistics.
Oxford: Blackwell, 602–623.
2007 Frequency of Use and the Organization of Language. Oxford: Oxford
University Press.
Corbett, Greville, Andrew Hippisley, Dunstan Brown, and Paul Marriott
2001 Frequency, regularity and the paradigm: A perspective from Russian on a
complex relation. In Joan Bybee and Paul Hopper (eds.), Frequency and the
Emergence of Linguistic Structure. Amsterdam: John Benjamins, 201–226.
Croft, William
2000 Explaining Language Change: An Evolutionary Approach. London:
Longman.
2003 Typology and Universals. 2nd ed. Cambridge: Cambridge University Press.
Ellis, Nick C.
2002 Frequency effects in language acquisition: A review with implications
for theories of implicit and explicit language acquisition. Studies in Second
Language Acquisition 24(2), 143–188.
Greenberg, Joseph H.
1966 Language Universals, with Special Reference to Feature Hierarchies. (Janua
Linguarum, Series Minor 59.) The Hague: Mouton.
Haiman, John
1983 Iconic and economic motivation. Language 59: 781–819.
Haspelmath, Martin
1993 More on the typology of inchoative/causative verb alternations. In Comrie,
Bernard, and Maria Polinsky (eds.), Causatives and Transitivity.
Amsterdam: Benjamins, 87–120.
1999 Optimality and diachronic adaptation. Zeitschrift für Sprachwissenschaft
18(2), 180–205.
Kirby, Simon
1999 Function, Selection, and Innateness: The Emergence of Language Universals.
Oxford: Oxford University Press.
Newmeyer, Frederick J.
2005 Possible and Probable Languages. Oxford: Oxford University Press.
Schuchardt, Hugo
Gegen die Junggrammatiker. Berlin: Robert Oppenheimer.
Zipf, George Kingsley
1929 Relative frequency as a determinant of phonetic change. Harvard Studies in
Classical Philology 40: 1–95.
Linguistic and metalinguistic categories
in second language learning
KAREN ROEHR*
Abstract
This paper discusses proposed characteristics of implicit linguistic and
explicit metalinguistic knowledge representations as well as the properties of
implicit and explicit processes believed to operate on these representations.
In accordance with assumptions made in the usage-based approach to
language and language acquisition, it is assumed that implicit linguistic
knowledge is represented in terms of flexible and context-dependent categories
which are subject to similarity-based processing. It is suggested that, by
contrast, explicit metalinguistic knowledge is characterized by stable and
discrete Aristotelian categories which subserve conscious, rule-based
processing. The consequences of these differences in category structure and
processing mechanisms for the usefulness or otherwise of metalinguistic
knowledge in second language learning and performance are explored.
Reference is made to existing empirical and theoretical research about the role
of metalinguistic knowledge in second language acquisition, and specific
empirical predictions arising out of the line of argument adopted in the
current paper are put forward.
Keywords: categorization; explicit and implicit knowledge; metalinguistic
knowledge; second language learning; usage-based model.
1. Introduction
This article is concerned with the role of metalinguistic knowledge, or ex-
plicit knowledge about language, in the area of second language acquisi-
tion (SLA). It is situated within a cognitive-functional approach to lan-
guage and language learning, in the belief that our understanding of an
essentially pedagogical notion, metalinguistic knowledge, may be enhanced
if we consider this notion in terms of a specific linguistic theory,
that is, the usage-based model of language. In this way, light can be shed
on a concept which is of interest to second language (L2) teachers, adult
language learners themselves, and last but certainly not least, applied
linguists of all theoretical persuasions, including cognitive linguists with a
pedagogical outlook (e.g., Achard and Niemeier 2004; Boers and
Lindstromberg 2006).
Cognitive Linguistics 19–1 (2008), 67–106
DOI 10.1515/COG.2008.005
0936–5907/08/0019–0067
© Walter de Gruyter
In this paper, I argue that while implicit linguistic knowledge is charac-
terized by exemplar-based categories, explicit metalinguistic knowledge
relies on Aristotelian categories. Exemplar-based categories are flexible,
highly contextualized, and subject to prototype effects, whereas Aristotelian
categories are stable, discrete, and clearly delineated. These characteristics
can be illustrated briefly with the help of the following examples
(from Taylor 2003): (1) The Pope is a bachelor. (2) Her husband is an
unrepentant bachelor.¹ If the construction bachelor is considered in terms of
Aristotelian category structure, i.e., if it is defined by means of primitive
binary features such as adult, male, married, etc., sentence (1) would
be judged semantically acceptable, while sentence (2) would have to be
regarded as semantically anomalous. Conversely, if the construction
bachelor is considered in terms of exemplar-based category structure, cat-
egorization by means of primitive binary features no longer applies. In-
stead, specic attributes associated with the category [bachelor] can be
perspectivized in accordance with the linguistic and cultural context pro-
vided by the sentences in which the construction appears, whereas other
attributes may be filtered out. Thus, sentence (1) seems somewhat odd,
since bachelorhood is taken for granted in a pope. Sentence (2), by con-
trast, is no longer anomalous, since certain behavioural attributes associ-
ated with the (idealized) prototype of an unmarried man are highlighted;
at the same time, the attribute associated with the marital status of a pro-
totypical bachelor is temporarily ignored.
In addition to positing qualitatively distinct category structures, I as-
sume that the processing mechanisms operating on implicit linguistic and
explicit metalinguistic knowledge representations are qualitatively different.
While implicit linguistic knowledge is stored in and retrieved from
an associative network during parallel distributed, similarity-based pro-
cessing, explicit metalinguistic knowledge is processed sequentially with
the help of rule-based algorithms. I suggest that these distinctions between
linguistic and metalinguistic knowledge representations and processes
affect the way in which the two types of knowledge can be used in
L2 learning and performance.
Indeed, it appears that the proposed conceptualization of linguistic and
metalinguistic knowledge in terms of different category structures and
associated differences in processing mechanisms can help explain available
findings from the area of SLA which are indicative of both facilitative
potential and apparent limitations of metalinguistic knowledge in L2 learning
and performance. Moreover, if read in conjunction with existing
research, the proposed conceptualization allows for the formulation of
specific predictions about the use of metalinguistic knowledge in L2 learning,
both at a general level and for particular types of language learners.
The article is organized as follows: Section 2 provides definitions of the
main constructs under discussion, that is, explicit and implicit knowledge,
explicit and implicit learning, pedagogical grammar, and metalinguistic
knowledge. In Section 3, assumptions about the nature of implicit linguis-
tic knowledge commonly made by researchers working in a usage-based
paradigm are outlined. Section 4 contains a summary and evaluation of
key empirical and theoretical research in relation to the role of explicit
knowledge in language acquisition, with a strong emphasis on L2 learn-
ing. Section 5 puts forward the proposal which is at the core of the
current paper, with the argument focusing on the contrasting category
structures of implicit linguistic knowledge and explicit metalinguistic
knowledge as well as differences in processing mechanisms associated
with these. Section 6 details empirical predictions that emerge from the
argument put forward in the current paper. Section 7 offers a brief
conclusion.
2. Construct denitions
Explicit knowledge is dened as declarative knowledge that can be
brought into awareness and that is potentially available for verbal report,
while implicit knowledge is defined as knowledge that cannot be brought
into awareness and cannot be articulated (Anderson 2005; Hulstijn 2005).
Accordingly, explicit learning refers to situations when the learner has
online awareness, formulating and testing conscious hypotheses in the
course of learning. Conversely, implicit learning describes when learn-
ing takes place without these processes; it is an unconscious process of
induction resulting in intuitive knowledge that exceeds what can be
expressed by learners (N. Ellis 1994: 38–39; see also N. Ellis 1996;
Hulstijn 2005).
It is assumed that focused attention is a necessary requirement for
bringing representations or processes into conscious awareness, i.e., for
knowledge or learning to be explicit. In accordance with existing research,
three separable but associated attentional sub-processes are assumed, that
is, alertness, orientation, and detection (Schmidt 2001; Tomlin and Villa
1994). In this conceptualization of attention, alertness refers to an
individual's general readiness to deal with incoming stimuli; orientation
concerns the allocation of resources based on expectations about the
particular class of incoming information; during detection, attention
focuses on specific details. Detection is thought to require more attentional
resources than alertness and orientation, and to enable higher-level pro-
cessing (Robinson 1995). Stimulus detection may occur with or without
awareness. If coupled with awareness, stimulus detection is equivalent
to noticing, which is defined as awareness in the sense of (momentary)
subjective experience (Schmidt 1990, 1993, 2001). Proponents of the so-
called noticing hypothesis argue that noticing, or attention at the level of
awareness, is required for L2 learning to take place.
It is worth noting that the concepts of attention, noticing, and aware-
ness, as well as their application in SLA, remain controversial (for critical
reviews, see, for instance, Robinson 2003; Simard and Wong 2001).
Nevertheless, a working definition is needed to allow for a clear discussion.
Thus, for the purpose of the present article, it is assumed that the fine
line between focused attention in the sense of stimulus detection and
focused attention in the sense of noticing can be regarded as the threshold
cused attention in the sense of noticing can be regarded as the threshold
of conscious awareness, that is, the point of interface between implicit
and explicit processes and representations.
First and foremost, the present paper is concerned with the notion of
metalinguistic knowledge. Metalinguistic knowledge is a specific type of
explicit knowledge, that is, an individual's explicit knowledge about
language. Accordingly, L2 metalinguistic knowledge is an individual's
knowledge about the L2 they are attempting to learn. The term metalin-
guistic knowledge tends to be used in applied linguistics research concen-
trating on L2 learning and teaching (e.g., Alderson et al. 1997; Bialystok
1979; Elder and Manwaring 2004), and it is closely related to applied
linguists' conceptualization of pedagogical grammar (e.g., McDonough
2002; Saporta 1973; Towell 2002). Pedagogical grammar has been de-
scribed as a cover term for any learner- or teacher-oriented description
or presentation of foreign language rule complexes with the aim of pro-
moting and guiding learning processes in the acquisition of that language
(Chalker 1994: 34, quoting Dirven 1990). It is worth noting that, in dis-
cussions of pedagogical grammar, the term grammar is used in a broad
sense as referring to any aspect of language that can be described system-
atically; it is therefore not restricted to morphosyntactic phenomena.
In sum, the notion of metalinguistic knowledge is concerned with a
learner's explicit mental representations, while the notion of pedagogical
grammar is concerned with explicit written or oral descriptions of lin-
guistic systematicities which can be presented to a learner as a source of
information about the L2. Accordingly, a learner's metalinguistic
knowledge may arise from encounters with pedagogical grammar, e.g., through
textbooks and/or through exposure to rule-based or other types of form-
focused instruction (R. Ellis 2001; Sanz and Morgan-Short 2005). By the
same token, pedagogical grammar has arisen from the metalinguistic
knowledge of applied linguists, L2 teachers, and materials designers.
Thus, while the labels of metalinguistic knowledge and pedagogical
grammar are used to denote, respectively, an individual's mental
representations and written or oral instructional aids, the two notions are similar
to the extent that they are both explicit by definition and that the latter
can give rise to the former as well as vice versa.
As the argument presented in what follows is concerned with
differences in category structure between explicit and implicit knowledge, the
question of whether a learner's explicit knowledge has been derived
bottom-up through a process of analysis of the linguistic input or whether
it has been acquired top-down through formal study of grammar text-
books is not of immediate relevance. In other words, for the purpose of
the current discussion, it does not matter whether explicit knowledge has
arisen from implicit knowledge, e.g., when an L2 learner, perhaps after
prolonged experience with the L2, discovers certain systematicities and
arrives at a pedagogical grammar rule of their own, which is represented
as metalinguistic knowledge and can be articulated, or whether explicit
knowledge is assimilated from the environment, e.g., when an L2 learner
listens to a teacher's explanation drawing on a pedagogical grammar rule
and memorizes this information as metalinguistic knowledge. In either
scenario, the defining characteristics, including the internal category
structure, of the metalinguistic knowledge held by the learner remain the
same, as will become apparent in Section 5 below.
It is acknowledged that there may be pedagogically relevant differences
between internally induced metalinguistic knowledge and metalinguistic
knowledge gleaned from externally presented pedagogical grammar that
are of practical interest to teachers and learners in the L2 classroom. I
am not aware of any empirical research pertaining to this specic issue,
but one could hypothesize, for instance, that pedagogical grammar rules
presented to the learner are more accurate than metalinguistic knowledge
induced bottom-up by the learner him/herself, since the cumulative
knowledge of the applied linguistics community is based on more exten-
sive language experience than the average individual learner has been
able to gather. Alternatively, one could hypothesize that metalinguistic
knowledge derived by the learner him/herself is more relevant to the
individual's L2 learning situation than one-size-fits-all pedagogical grammar
rules acquired from a commercially produced textbook. These questions,
though clearly interesting in themselves, do not impact on the theoretical
argument put forward here, however.
Finally, it is worth noting that rule-based or other types of form-
focused instruction occur not only in the L2 classroom, but also in the
context of laboratory studies. Reports of such empirical studies as well
as theoretical papers with a psycholinguistic orientation (e.g., DeKeyser
2003; N. Ellis 1993; Robinson 1997) tend not to use the terms form-
focused instruction, pedagogical grammar, or metalinguistic knowledge;
instead, they refer more generally to explicit learning conditions and
learners' explicit knowledge. However, explicit learning conditions
drawing on learners' explicit knowledge typically require knowledge about the
L2, i.e., metalinguistic knowledge. Hence, the notion of metalinguistic
knowledge is of relevance to L2 learning and L2 teaching, as well as to
psycholinguistically oriented and applied SLA research.
In the context of the present article, metalinguistic knowledge is defined
as a learner's explicit or declarative knowledge about the syntactic,
morphological, lexical, pragmatic, and phonological features of the L2.
Metalinguistic knowledge includes explicit knowledge about categories as well
as explicit knowledge about relations between categories (R. Ellis 2004;
Hu 2002; Roehr 2007). Metalinguistic knowledge can vary in terms of
specificity and complexity, but it minimally involves either a schematic
category or a relation between two categories, specific or schematic.
Metalinguistic knowledge relies on Aristotelian categories, i.e., categories that
are stable and discrete. These categories subserve sequential, rule-based
processing.
In the following sections, these proposed characteristics of
metalinguistic knowledge will be explained and exemplified. I will begin by comparing
and contrasting the characteristics of explicit metalinguistic knowledge
with the characteristics of implicit linguistic knowledge as conceptualized
in the usage-based model of language.
3. Implicit linguistic knowledge in the usage-based model
Within the framework of cognitive-functional linguistics, the usage-based
model makes several fundamental assumptions about the nature of lan-
guage: First, interpersonal communication is seen as the main purpose of
language. Second, language is believed to be shaped by our experience
with the real world. Third, language ability is regarded as an integral
part of general cognition. Fourth, all linguistic phenomena are explained
by a unitary account, including morphology, syntax, semantics, and prag-
matics. Hence, at the most general level, the usage-based model charac-
terizes language as a quintessentially functional, input-driven phenome-
non (e.g., Bybee and McClelland 2005; Goldberg 2003; Tomasello 1998).
Two specific theoretical consequences arising from these general premises
are particularly relevant to the current discussion, namely, first, the
process of categorization and the sensitivity of knowledge representations to
context and prototype effects, and second, the notion of linguistic
constructions as conventionalized form-meaning pairings varying along the
parameters of specificity and complexity.
In the usage-based model, the representation and processing of lan-
guage is understood in terms of general psychological mechanisms such
as categorization and entrenchment, with the former underlying the lat-
ter. Entrenchment refers to the strengthening of memory traces through
repeated activation. Categorization can be dened as a comparison be-
tween an established structural unit functioning as a standard and an ini-
tially novel target structure (Langacker 1999, 2000). In view of well-
established empirical evidence from the area of cognitive psychology
(Rosch and Lloyd 1978; Rosch and Mervis 1975), it is accepted that
cognitive categories are subject to prototype effects, which are assumed to
apply in equal measure to conceptual and linguistic knowledge (Dirven
and Verspoor 2004; Taylor 2003; Tomasello 2003). A prototype can be
defined as the best example of a category, i.e., prototypical members of
cognitive categories have the largest number of attributes in common
with other members of the category and the smallest number of attributes
which also occur with members of neighbouring categories. In terms of
attributes, prototypical members are thus maximally distinct from the
prototypical members of other categories. To illustrate by means of a
well-known example, robin or magpie are prototypical members of the
category [bird] for (British) speakers of English, while penguin consti-
tutes a marginal category member (Ungerer and Schmid 1996).
Categorization is influenced by the frequency of exemplars in the input
as well as by the recency and context of encounters with specific
exemplars (N. Ellis 2002a, 2002b). As the parameters of frequency, recency,
and context interact, specific memory traces may be more or less
entrenched and hence more or less salient and accessible for retrieval
(Murphy 2004). In addition, exemplars encountered in the input may be more
or less similar to exemplars encountered previously. Accordingly, cate-
gory membership is often a matter of degree and cannot normally be un-
derstood as a clear-cut yes/no distinction. It follows from this that cate-
gory boundaries may be fuzzy, and that categories may merge into one
another (Langacker 1999, 2000).
Two theoretical approaches to categorization are compatible with the
usage-based assumptions outlined in the previous paragraphs, that is,
the prototype view and the exemplar view (Murphy 2004). In its pure
form, the prototype view holds that concepts are represented by schemas,
i.e., structured representations of cognitive categories. Schemas contain
information about both attributes and relations between attributes that
characterize a certain category. Conversely, the exemplar view, in its pure
form, posits that our mental representations never encompass an entire
concept. Instead, an individual's concept of a category is the set of
specific category members they can remember, and there is no summary
representation. In this view, categorization is determined not only by the
number of exemplars a person remembers, but also by the similarity of a
new exemplar to exemplars already held in memory.
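The contrast between the two pure views can be made concrete with a minimal computational sketch. It is an illustration only and is not drawn from the literature cited here: the feature dimensions, their values, and the exponential similarity function are all assumptions chosen for simplicity. Under the prototype view, a new item is compared with a single summary representation (here, the mean of the stored members); under the exemplar view, it is compared with every stored member, so that both the number of remembered exemplars and their similarity to the new item contribute.

```python
from math import dist, exp

# Hypothetical feature vectors (can_fly, sings, body_size); all values are invented.
BIRD = {"robin": (1.0, 1.0, 0.2), "magpie": (1.0, 0.6, 0.4), "penguin": (0.0, 0.0, 0.8)}
FISH = {"trout": (0.0, 0.0, 0.5), "herring": (0.0, 0.0, 0.3)}
CATEGORIES = {"bird": BIRD, "fish": FISH}


def similarity(a, b):
    """Assumed similarity measure: decays exponentially with Euclidean distance."""
    return exp(-dist(a, b))


def prototype(members):
    """Summary representation: the mean feature vector of the stored members."""
    vectors = list(members.values())
    return tuple(sum(axis) / len(vectors) for axis in zip(*vectors))


def prototype_score(item, members):
    """Prototype view: similarity to one summary representation."""
    return similarity(item, prototype(members))


def exemplar_score(item, members):
    """Exemplar view: summed similarity to every remembered member, no summary."""
    return sum(similarity(item, stored) for stored in members.values())


def categorize(item, score):
    """Assign the item to the category with the highest score."""
    return max(CATEGORIES, key=lambda name: score(item, CATEGORIES[name]))


robin_like = (1.0, 0.9, 0.3)
print(categorize(robin_like, prototype_score), categorize(robin_like, exemplar_score))
```

For a clearly typical item the two scoring functions agree; they can diverge for marginal items, which is where the choice between summary and exemplar representations matters in such a toy model.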
While the prototype and exemplar views may be incompatible in their
pure forms, they share a sufficiently large number of characteristics to
allow for a hybrid model to be formulated which includes both schema-
based and exemplar-based representations (Abbot-Smith and Tomasello
2006; Langacker 2000). As a hybrid model is not only compatible with
usage-based assumptions, but also particularly informative for accounts
of language learning and use, it is adopted in the current paper.
According to the hybrid model, all learning is initially exemplar-based.
As experience with the input grows and as repeated encounters with
known exemplars gradually change our mental representations of these
exemplars, it is believed that, ultimately, abstractions over instances are
derived (Kemmer and Barlow 2000; Taylor 2002). These abstractions
are in fact schemas. Schema formation can be defined as the emergence
of a structure through reinforcement of the commonality inherent in
multiple experiences, while, at the same time, experiential facets which do
not recur are filtered out. Correspondingly, a schema is the commonality
that emerges from distinct structures when one abstracts away from their
points of difference by portraying them with lesser precision and
specificity (Langacker 2000: 4).
To illustrate with the help of a linguistic example, a large number of
encounters with specic utterances such as I sent my mother a birthday
card and Harry is sending his friend a parcel lead to entrenchment, i.e.,
the strengthening of memory traces for the form-meaning associations
constituting these constructions. Gradually, constructional subschemas
such as send-[np]-[np] and finally the wholly general ditransitive schema
[v]-[np]-[np] are abstracted. Entrenched constructions, both general and
specific, are described as conventional units. Accordingly, a speaker's
linguistic knowledge can be defined as a structured inventory of
conventional linguistic units (Langacker 2000: 8).
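As a rough illustration of this abstraction process, the following sketch counts how often a specific utterance, its constructional subschema, and the general schema are encountered. It rests on strong simplifying assumptions that are not part of the usage-based model itself: the utterances are hand-tagged, subjects are omitted, the tag format (e.g., v:send) is hypothetical, and entrenchment is reduced to a raw frequency count.

```python
from collections import Counter

# Hypothetical, hand-tagged usage events: (wording, category) pairs.
# A verb tag carries its lemma after the colon, e.g. "v:send".
UTTERANCES = [
    [("sent", "v:send"), ("my mother", "np"), ("a birthday card", "np")],
    [("is sending", "v:send"), ("his friend", "np"), ("a parcel", "np")],
    [("gave", "v:give"), ("the child", "np"), ("a book", "np")],
]


def subschema(utterance):
    """Keep the verb lemma, abstract away from the specific noun phrases."""
    return "-".join(
        tag.split(":")[1] if tag.startswith("v:") else f"[{tag}]"
        for _, tag in utterance
    )


def schema(utterance):
    """Abstract away from all lexical material, keeping categories only."""
    return "-".join(f"[{tag.split(':')[0]}]" for _, tag in utterance)


def entrench(utterances):
    """Count encounters at three levels of abstraction (a crude proxy for entrenchment)."""
    counts = Counter()
    for utterance in utterances:
        counts[" ".join(word for word, _ in utterance)] += 1  # specific exemplar
        counts[subschema(utterance)] += 1                     # e.g. send-[np]-[np]
        counts[schema(utterance)] += 1                        # [v]-[np]-[np]
    return counts


for unit, frequency in entrench(UTTERANCES).most_common():
    print(frequency, unit)
```

Because the specific wording, the subschema, and the general schema are all stored in the same inventory, the sketch keeps each pattern at several levels of abstraction at once, with the more schematic units accumulating higher counts.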
Crucially, the hybrid view argues that representations of specific
exemplars can be retained alongside more general schemas subsuming these
exemplars. Put differently, specific instantiations of constructions and
constructional schemas at varying levels of abstraction exist alongside
each other, so that the same linguistic patterns are potentially represented
in multiple ways. Thus, linguistic knowledge is represented in a vast, re-
dundantly organized, hierarchically structured network of form-meaning
associations.
Conventional linguistic units, or constructions, are viewed as inherently
symbolic (Kemmer and Barlow 2000; Taylor 2002), so that constructions
at all levels of abstraction are pairings of form and meaning (Goldberg
2003: 219). Hence, even though a constructional schema at the highest
level of abstraction such as the English ditransitive [v]-[np]-[np] no longer
contains any specific lexical items, it is still endowed with constructional
meaning. Accordingly, a construction is always more than the sum of its
parts; beyond symbolizing the meanings and relations of its constituents,
it has its own semantic profile (Langacker 1991, 2000). For instance, at
the most general level, the semantics of the English ditransitive schema
[v]-[np]-[np] are captured by the notions of transfer and motion (Gold-
berg 1995, 1999, 2003).
To reiterate, the unitary approach to language which characterizes the
usage-based model is applied both at the level of cognition and at the
level of linguistic structure itself. Hence, syntax, morphology, and the lex-
icon are all accounted for by the same system (Bates and Goodman 2001;
Langacker 1991, 2000; Tomasello 1998); they are regarded as differing in
degree rather than as differing in kind. Syntax, morphology, and the
lexicon are conceptualized as a graded continuum of conventional linguistic
units, or constructions, varying along the parameters of specificity and
complexity, as shown in Figure 1.
2
As Figure 1 indicates, schematic and complex constructions such as the
ditransitive [v]-[np]-[np] occupy the area traditionally referred to as syn-
tax. Words such as send or above are both minimal and specic and oc-
cupy the area traditionally labelled lexicon. Morphemes such as English
plural -s or regular past tense -ed are situated at the centre of the two
clines, since instances of morphology are neither entirely specific nor
entirely schematic; by the same token, they are neither truly minimal nor
truly complex, but they are always bound. Lexical categories like [noun],
[verb], and [adjective] are minimal but schematic, while idioms such as
kick the bucket tend to be both complex and specific in that they allow
for little variation. The example kick the bucket only permits verb
inflection for person and tense, for instance, and thus ranges high on the
specificity scale. At the same time, the construction kick the bucket can be
considered as more complex than the constructions send or above because the
latter cannot be broken down any further.
To summarize, the usage-based model assumes that categorization is a
key mechanism in language representation, learning, and use. As linguis-
tic knowledge is regarded as an integral part of cognition, it is accepted
that both conceptual and linguistic categories are subject to context and
prototype effects. Linguistic knowledge is conceptualized in terms of
constructions, i.e., conventionalized form-meaning units varying along the
parameters of specificity and complexity. Crucially, these assumptions
underlie the usage-based account of implicit phenomena of language rep-
resentation, acquisition, and use. The role of explicit phenomena, in
particular as studied in the field of SLA, is the focus of the next section.
4. Explicit knowledge in language learning
The notion of explicit knowledge has consistently attracted the interest of
researchers in the areas of SLA and applied linguistics more generally.
Over the past two decades in particular, this interest has generated an im-
pressive amount of both empirical and theoretical research. Depending
on whether researchers take a primarily educational or a primarily psy-
cholinguistic perspective, empirical studies have drawn on a variety of
correlational and experimental research designs, investigating the
relationship between L2 learners' linguistic proficiency and their
metalinguistic knowledge, the role of explicit knowledge in instructed L2 learning,
and the effects of implicit versus explicit learning conditions on the
acquisition of selected L2 constructions.
Figure 1. Linguistic constructions in the specificity/complexity continuum
The most uncontroversial cumulative finding resulting from this body
of research has borne out the prediction that attention (in the sense of
stimulus detection) is a necessary condition for the learning of novel input
(Doughty 2003; N. Ellis 2001, 2003; MacWhinney 1997). Moreover, it
has been found that form-focused instructional intervention is more
effective than mere exposure to L2 input (Doughty 2003; R. Ellis 2001, 2002;
Norris and Ortega 2001). As it is the intended purpose of all types of
form-focused instruction to direct learners' attention to relevant form-meaning
associations in the linguistic input, this is not a surprising outcome.
Beyond the well-substantiated claim that attention in the sense of stim-
ulus detection is a necessary requirement for input to become intake, the
picture is much less clear. In other words, findings regarding the role of
explicit knowledge, i.e., knowledge above the threshold of awareness,
yield a more complex and sometimes even apparently contradictory pat-
tern of evidence. As it is beyond the scope of this paper to present an ex-
haustive review of the large body of research that has been carried out in
the preceding decades, the following summary is deliberately brief and fo-
cused exclusively on representative studies that are directly relevant to the
current discussion (for more comprehensive recent reviews of the
literature, see DeKeyser 2003; R. Ellis 2004). In particular, work which
illustrates the sometimes contrasting nature of findings and conclusions as
well as work which emphasizes the complex interplay of variables in lan-
guage learning processes has been selected.
Empirical research concerned with metalinguistic knowledge in SLA
has led to at least two results that highlight the potential benefits of
explicit knowledge and learning. First, learners' metalinguistic knowledge
and their L2 linguistic proficiency have been found to correlate positively
and significantly, even though the strength of the relationship varies
between studies, ranging from a moderate 0.3 to 0.5 (e.g., Alderson et al.
1997; Elder et al. 1999) to between 0.6 and 0.7 (Elder and Manwaring
2004), and, reported most recently, up to 0.8 (Roehr 2007). Thus, there
is evidence for an overall association between higher levels of learner
awareness, use of metalinguistic knowledge, and successful L2
performance (Leow 1997; Nagata and Swisher 1995; Rosa and O'Neill 1999).
Second, learners' use of metalinguistic knowledge when resolving
form-focused L2 tasks has been found to be associated with consistent and
systematic performance (Roehr 2006; Swain 1998).
While these findings are indicative of a generally facilitative role for
explicit knowledge about the L2, empirical evidence likewise demonstrates
that use of metalinguistic knowledge by no means guarantees successful
L2 performance. For instance, Doughty (1991) found equal gains in per-
formance across two experimental groups comprising 20 university-level
learners of L2 English from various L1 backgrounds. Focusing on restric-
tive relative clauses (e.g., I know the people who you talked with), learners
receiving meaning-oriented instruction with enhanced input and learners
exposed to rule-oriented instruction with explicit explanation of the
targeted L2 construction showed equal gains in performance, a finding
which suggests that metalinguistic explanations may be unnecessary.
By the same token, Sanz and Morgan-Short (2004) found support for
the null hypothesis that providing learners with explicit information
about the targeted L2 construction either before or during exposure to
input-based practice would not affect their ability to interpret and
produce L2 sentences containing the targeted L2 construction, as long as
learners received structured input aimed at focusing their attention appro-
priately. The study was carried out with 69 L1 English learners of L2
Spanish and concentrated on preverbal direct object pronouns. The re-
searchers concluded that structured input practice which made linking
form and meaning task-essential, as proposed in processing instruction
(VanPatten 1996, 2004), appeared to be sufficient for successful learning.
Additional explicit information about the targeted L2 construction did
not enhance participants' performance any further.
The ambivalent relationship between use of metalinguistic knowledge
and successful L2 performance was likewise underlined by Green and
Hecht (1992), Camps (2003), and Roehr (2006). Green and Hecht (1992)
report a study with 300 L1 German learners of L2 English which targeted
the use of various morphosyntactic features such as tense and word order.
While successful metalinguistic rule formulation typically co-occurred
with the successful correction of errors instantiating the rules in question,
it was also found that successful error correction could be associated with
the formulation of incorrect rules, or no rule knowledge at all.
In a study involving 74 L1 English learners of L2 Spanish focusing on
third-person direct object pronouns, Camps (2003) collected both concur-
rent and retrospective verbal protocol data. He found that references to
the targeted L2 construction co-occurred with accurate performance in
92 percent of cases; yet, no reference to the targeted L2 construction still
co-occurred with accurate performance in 69 percent of cases. Thus, de-
spite providing additional benets in some cases, use of explicit knowl-
edge appears to have been far from necessary.
Roehr (2006) studied retrospective verbal reports from ten L1 English
learners of L2 German, which were obtained immediately after the
completion of form-focused tasks targeting adjectival inflection. She found
that although reported use of metalinguistic knowledge co-occurred
more frequently with successful than with unsuccessful item resolution
overall, fully correct use of metalinguistic knowledge still co-occurred
with unsuccessful item resolution in 22 percent of cases. Along similar
lines, anecdotal evidence from the L2 classroom suggests that, on occa-
sion, learners may use their metalinguistic knowledge to override more
appropriate intuitive responses based on implicit linguistic knowledge
(Gabrielatos 2004).
Theoretically oriented work concerned with metalinguistic knowledge
has mainly sought to identify the defining characteristics of the concept
of explicit knowledge as well as the facilitative potential of such
knowledge in SLA. The most substantial contribution to establishing the
defining characteristics of metalinguistic knowledge has arguably been made
by R. Ellis (2004, 2005, 2006), according to whom explicit L2 knowledge
is represented declaratively, characterized by conscious awareness, and
verbalizable, as mentioned in the construct definition presented in Section
2 above. Moreover, explicit L2 knowledge is said to be learnable at any
age, given sufficient cognitive maturity. As explicit knowledge is
employed during controlled processing, it tends to be used when the learner
is not under time pressure. Finally, it has been hypothesized that learners'
explicit L2 knowledge may be more imprecise and more inaccurate than
their implicit knowledge.
Research with a primarily theoretical outlook has further considered
metalinguistic knowledge in terms of the categories and relations between
categories that are represented explicitly, as well as the nature of the L2
constructions described by explicit categories and relations between cate-
gories. Typically, such research has conceptualized metalinguistic knowl-
edge as knowledge of pedagogical grammar rules consisting of explicit de-
scriptions of linguistic phenomena. It has been argued that metalinguistic
descriptions may vary along several parameters, including complexity,
scope, and reliability (DeKeyser 1994; Hulstijn and de Graaff 1994).
For instance, metalinguistic descriptions may refer to either
prototypical or peripheral uses of a particular L2 construction (Hu 2002).
Moreover, the L2 construction described may itself vary in terms of
complexity, perceptual salience, or communicative redundancy (Hulstijn and de
Graaff 1994). In view of this multifaceted interaction between the type of
explicit description and the type of L2 construction described, it is
notoriously difficult to predict which kind of metalinguistic description is likely
to be helpful to the L2 learner. Accordingly, positions have shifted
somewhat over the years, with earlier work advocating fairly categorically
either the teaching of more complex metalinguistic descriptions (Hulstijn
and de Graaff 1994), or the teaching of simpler rules (DeKeyser 1994;
Green and Hecht 1992).
In recent years, researchers have adopted a more sophisticated line of
argument. DeKeyser (2003) has highlighted the fact that the difficulty
(and hence the potential usefulness) of metalinguistic descriptions is a
complex function of a number of variables, including the characteristics
of the description itself, the characteristics of the L2 construction being
described (see also DeKeyser 2005), and individual learner differences in
aptitude.
Indeed, the fact that the relative usefulness of metalinguistic
descriptions in L2 learning and performance is affected by a range of variables
is to be expected, since language is necessarily learned and used by
specific individuals in specific contexts. First and foremost, the role of
metalinguistic knowledge in SLA is at least partially dependent upon a
learner's current level of L2 proficiency (Butler 2002; Camps 2003; Sorace
1985). Second, a learner's use of metalinguistic knowledge is likely to be
subject to situation-specific variation, since both the targeted L2
construction(s) and the task requirements at hand play a part in determining
whether and how metalinguistic knowledge is employed (R. Ellis 2005;
Hu 2002; Klapper and Rees 2003; Renou 2000). Hence, timed tasks in
general and oral task modalities in particular may prevent a learner
from allocating sufficient attentional resources to controlled processing
involving metalinguistic knowledge, whereas untimed tasks in general
and written task modalities in particular may have the opposite effect,
possibly encouraging the use of metalinguistic knowledge.
Third, the L1-L2 combination under investigation, paired with the
relative typological distance between L1 and L2, may have a part to play
(Elder and Manwaring 2004). Fourth, length of prior exposure to L2
instruction and the type of instruction experienced have been shown to
impact on a learner's level and use of metalinguistic knowledge (Elder et al.
1999; Roehr 2007). Finally, individual differences in cognitive and
learning style, strategic preferences, and aptitude may influence a learner's use
of metalinguistic knowledge (Collentine 2000; DeKeyser 2003; Roehr
2005).
Most recently, existing work concerned with the role of explicit knowl-
edge in SLA has been complemented by hypotheses about the nature of
the representations and processes involved in the use of metalinguistic
knowledge. Crucial to the current paper, both empirical findings and
theoretical research suggest that explicit and implicit knowledge are
separable constructs which are nonetheless engaged in interplay (N. Ellis 1993,
2005; R. Ellis 2005; Segalowitz 2003). In other words, the so-called weak-
interface position
3
allows for the possibility of explicit metalinguistic
knowledge contributing indirectly to the acquisition of implicit linguistic
knowledge, and vice versa. It has been argued that the two types of
knowledge come together during conscious processing (for particularly
readable reviews of the complex subject matter of consciousness, see
Baddeley 1997; Cattell 2006). Moreover, when explicit knowledge is
brought to bear on implicit knowledge and vice versa, enduring learning
effects may result (N. Ellis and Larsen-Freeman 2006).
The mechanism which is thought to enable conscious processing is
called binding. During binding, a number of implicit representations in
different modalities are activated simultaneously and integrated into a
unified explicit representation that is held in a multimodal code in
working memory (Bayne and Chalmers 2003; Dienes and Perner 2003; N. Ellis
2005). We consciously experience this unified representation as a coherent
episode. Put differently, the mechanism of binding, explained through the
temporally synchronized firing of a number of neurons in different brain
regions (Engel 2003), accounts for how implicit representations subserve
explicit representations.
With regard to explicit metalinguistic and implicit linguistic processing,
it has been proposed that "implicit learning of language occurs during
fluent comprehension and production. Explicit learning of language occurs
in our conscious efforts to negotiate meaning and construct
communication" (N. Ellis 2005: 306). Thus, during fluent language use, the implicit
system automatically processes input and produces output, with the
individual's conscious self focused on the meaning rather than the form of the
utterance. When comprehension or production difficulties arise, however,
explicit processes take over. We focus our attention on linguistic form,
and we notice patterns; moreover, we become aware of these patterns as
unified, coherent representations. Such explicit representations can then
be used as pattern recognition units for new stimuli in future usage
events. In this way, conscious processing helps consolidate new bindings,
which are fed back to the brain regions responsible for implicit processing
(N. Ellis 2005).
Steered by the focus of our conscious processing, the repeated simulta-
neous activation of a range of implicit representations helps consolidate
form-meaning associations, often to the extent that implicit learning on
subsequent occasions of use becomes possible. Thus, as the various ele-
ments constituting a coherent form-meaning association are activated si-
multaneously during processing, they are bound together more tightly
(N. Ellis 2005). Crucially, however, it is not a question of the explicit
representation turning into an implicit representation. According to the
weak-interface position, it is not the metalinguistic knowledge, e.g., in the
form of an explicit description of a linguistic phenomenon, that becomes
implicit, but its instantiation, i.e., the sequences of language that the
description is used to comprehend or to construct (R. Ellis 2004: 238). [4]
The locus of conscious processing, metaphorically speaking, is working
memory. Put differently, explicit knowledge is conceptualized as
information that is selectively attended to, stored, and processed in
working memory. Working memory refers to "the system or mechanism
underlying the maintenance of task-relevant information during the
performance of a cognitive task" (Shah and Miyake 1999: 1). Thus, working
memory allows for the temporary storage and manipulation of informa-
tion which is being used during online cognitive operations such as lan-
guage comprehension, learning, and reasoning (Baddeley 2000; Baddeley
and Logie 1999). The so-called episodic buffer, a component of working
memory, is capable of binding information from a variety of sources and
holding such information in a multimodal code. Importantly, working
memory is limited in capacity (Just and Carpenter 1992; Miyake and
Friedman 1998), i.e., we can only attend to and hence be aware of so
much information at any one time.
Clearly, the fact that limited working memory resources constrain
explicit processing of language affects L2 and L1 in equal measure. It is
well-established that individuals differ in the maximum amount of
activation available to them, i.e., that individuals differ in terms of their
working memory capacity (e.g., Daneman and Carpenter 1980; Just and
Carpenter 1992; Miyake and Shah 1999). Moreover, young children generally
have smaller working memory capacity than cognitively mature
adolescents and adults. In other words, beyond the issue of individual
differences, working memory capacity increases in the course of an individual's
development.
In L1 acquisition and use, the emergence of metalinguistic ability is
closely associated with the development of literacy skills, that is, another
dimension of linguistic competence which requires selective attention to
language form (Birdsong 1989; Gombert 1992). As both metalinguistic
ability and literacy skills rely on conscious processing drawing on
working memory resources, a certain level of cognitive maturity which
guarantees sufficient working memory capacity is required; hence, these
abilities do not tend to develop until a child is between six and eight years of
age.
Metalinguistic processes, whether concerned with L1 or L2, are
analogous to other higher-level mental operations that draw on working
memory resources and thus require a certain level of cognitive maturity.
Hence, the application of metalinguistic knowledge and the process of
analytic reasoning as applied during general problem-solving appear to
rely on the same basic mechanisms. Put differently, use of metalinguistic
knowledge in language learning and performance can be regarded as an-
alytic reasoning applied to the problem space of language; metalinguistic
processing is problem-solving in the linguistic domain (Anderson 1995,
1996; Butler 2002; Hu 2002).
In L1, a child may raise questions about form-meaning associations
("Why are there two names, orange and tangerine?"), comment on
non-target-like utterances they have overheard (e.g., if another child
mispronounces certain words), or objectify language ("Is the a word?"), thus not
only demonstrating their ability to monitor language use, but also
showing the first signs of what will eventually result in the ability to reason
about language (examples adapted from Birdsong 1989: 17; Karmiloff
and Karmiloff-Smith 2002: 80). In L2, use of metalinguistic knowledge
can likewise be understood in terms of monitoring and reasoning based
on hypothesis-testing operations (N. Ellis 2005; Roehr 2005), which are
characteristic of a problem-solving approach. Thus, the cognitively
mature L2 learner may deliberately analyze input in an attempt to
comprehend an utterance ("What is the subject and what is the object in this
sentence?"), or creatively construct output that is monitored for formal
accuracy ("If I use a compound tense in this German clause, the first
verb needs to be in second position and the second verb in final position.").
To summarize this section, available empirical evidence about the role
of explicit knowledge in language learning and use bears out the theoreti-
cally motivated expectation that metalinguistic knowledge can have both
benefits and limitations. Whilst the facilitative effect of focused attention
in the sense of stimulus detection is all but undisputed, determining the
impact of higher levels of learner awareness and more explicit types of
learner knowledge which go beyond focused attention in the sense of
stimulus detection is less straightforward. On the one hand, L2
proficiency and metalinguistic knowledge have been found to correlate
positively and significantly. Moreover, use of metalinguistic knowledge is
typically associated with performance patterns characterized by consistency
and systematicity. On the other hand, use of metalinguistic knowledge is
by no means a guarantee of successful performance, and higher levels of
learner awareness that reach beyond noticing may be unnecessary or pos-
sibly even unhelpful in certain situations.
In the area of theory, a recent position includes the proposal that ex-
plicit and implicit knowledge are separate and distinct, but can interact.
Hence, explicit knowledge about language may contribute indirectly to
the development of implicit knowledge of language, and vice versa. As
explicit and implicit knowledge interface during conscious processing,
and as such processing is subject to working memory constraints, use of
metalinguistic knowledge in language learning and performance is likely
to have not only benefits, but also certain limitations. On the one hand,
conscious processing involving the higher-level mental faculty of analytic
reasoning allows the cognitively mature individual to apply a problem-
solving approach to language learning. On the other hand, conscious
processing is constrained by limited working memory capacity and thus
only permits the consideration of a restricted amount of information at
any one time.
Finally, existing research acknowledges that the relative usefulness of
metalinguistic knowledge can be expected to depend on a range of
learner-internal and learner-external variables, including task modalities,
the learner's level of L2 proficiency, their language learning experience,
their cognitive abilities, and their stylistic orientation.
Whilst it is important to bear in mind that all these factors will
differentially affect the role of metalinguistic knowledge in language learning
and performance (see Section 6 below), it is argued here that, ceteris
paribus and over and above these factors, another, more fundamental
variable which goes beyond specific usage situations and individual learner
differences is worthy of consideration: The contrasting category structures
of implicit linguistic knowledge representations on the one hand and
explicit metalinguistic knowledge representations on the other hand as well
as the different modes of implicit, associative processing and explicit,
rule-based processing constitute the basic cognitive conditions in which
language learning and performance take place. If taken into account, these
phenomena not only help explain existing findings about the apparently
ambivalent role of metalinguistic knowledge in L2 learning and use, but
also permit us to formulate specific empirical predictions that can guide
future research.
5. The representation and processing of implicit linguistic knowledge and
explicit metalinguistic knowledge
As linguistic and metalinguistic knowledge pertain to the same cognitive
domain, namely language, they can be expected to share certain characteristics.
Specifically, it appears that linguistic constructions and metalinguistic
descriptions vary along the same parameters, namely, specificity and
complexity. The usage-based model assumes that linguistic constructions can
be more or less specific as well as more or less complex (see Figure 1
above). By the same token, empirical evidence suggests that L2 learners'
metalinguistic knowledge can be more or less specific and more or less
complex (e.g., Roehr 2005, 2006; Rosa and O'Neill 1999).
For the purpose of illustration, one might imagine the case of an edu-
cated L1 English-speaking adult learner of L2 German and consider their
metalinguistic knowledge which has mostly been derived from encounters
with pedagogical grammar in the classroom and in textbooks. [5] Thus, a
metalinguistic description which this learner is aware of can refer to
specific instances, e.g., "German hin expresses movement away from the
speaker, while her expresses movement towards the speaker". Alternatively,
it can be entirely schematic and therefore involve no specific
exemplars at all, e.g., "a subordinating conjunction sends the finite verb to the
end of the clause". Both of these examples are additionally complex, i.e.,
they state relations between categories, and they can be broken down into
their constituent parts and therefore require several mental manipulations
during processing (DeKeyser 2003; Stankov 2003). However, a
metalinguistic description can also be minimal, e.g., "noun". Various
combinations of different levels of specificity and complexity seem possible, with
the exception of both minimal and specific.
In fact, the joint characteristics of minimal and specific appear to be
unique to lexical items, that is, linguistic constructions. By contrast, even
entirely specific metalinguistic descriptions containing no schematic
categories such as "German ei is pronounced like English i" or "English desk
means Schreibtisch in German" involve a relation between two specific
instances and can therefore still be broken down into their constituent
parts. By the same token, a minimal metalinguistic description such as
"noun", which cannot be broken down any further, is schematic rather
than specific. Put differently, as soon as implicit linguistic knowledge is
made explicit, i.e., when a metalinguistic knowledge representation is
created (no matter by whom, whether an L2 learner, an applied linguist, or
any other language user), it seems to take the form of either a schematic
description ("noun"), or a proposition involving at least two categories
and a relation between them.
It should be pointed out that this circumstance does not exclude state-
ments about the lexicon from the realm of metalinguistic description and
representation; quite to the contrary, semantic knowledge is perhaps the
most obvious area of explicit knowledge about language, since it typically
encompasses not only L2 metalinguistic knowledge, but also L1 metalin-
guistic knowledge. Indeed, we can glean metalinguistic knowledge about
lexical items from any monolingual or bilingual dictionary. However, it is
crucial to note that, when made explicit, semantic knowledge
incorporates at least two categories and a relation between them, as exemplified
by dictionary definitions of any description. Even the briefest listing of a
synonym without further explanatory comment amounts to stating a
relation between two categories ("X means Y"). Hence, one can argue that
implicit knowledge of the meaning, function, and appropriate usage
contexts of minimal and specific linguistic constructions such as lexical items
is distinguishable from explicit knowledge about the meaning, function,
and appropriate usage contexts of these constructions. This claim applies
not only to implicit knowledge of and explicit knowledge about the lexi-
con, but also to all other areas of language.
Whilst metalinguistic knowledge is comparable with linguistic
constructions in terms of the parameters of complexity and specificity,
explicit metalinguistic knowledge differs qualitatively from implicit
linguistic knowledge in the crucial respect of categorization, that is, one of the
key cognitive phenomena underlying conceptual as well as linguistic
representation and processing. As outlined in Section 3 above, the
usage-based model assumes that cognitive categories, whether conceptual or
linguistic, are flexible and context-dependent, sensitive to prototype effects,
and have fuzzy boundaries.
By contrast, metalinguistic knowledge appears to be characterized by
stable, discrete, and context-independent categories with clear-cut
boundaries. Put differently, metalinguistic knowledge relies on what has
alternately been labelled Aristotelian, categorical, classical, or scientific
categorization (Anderson 2005; Bod et al. 2003; Taylor 2003; Ungerer and
Schmid 1996). For instance, the metalinguistic category "subordinating
conjunction" is stable and clearly defined; in the case of German, it is
instantiated by a certain number of exemplars, such as weil ("because"), da
("as"), wenn ("if, when"), etc. Although some instantiations occur more
frequently than others, there are no better or worse category members; all
subordinating conjunctions have equal status and are equally valid exem-
plars, regardless of context.
By the same token, the linguistic construction [noun] and the
metalinguistic description "noun" can be contrasted. As all linguistic
constructions are form-meaning pairings, the linguistic construction [noun] is not
devoid of semantic content. Even though it has no specific phonological
instantiation, it has been abstracted over a large number of exemplars
occurring in actual usage events (as exemplified in more detail for the
English ditransitive construction in Section 3 above); accordingly, the
linguistic construction [noun] is strongly associated with the semantics of
its most frequent instantiations, such as lexical items denoting entities in
the real world. Consequently, in the average user of English, the highly
frequent and prototypical constructions man, woman and house can be
expected to be more strongly associated with the schema [noun] than the
relatively rare constructions rumination and oxymoron, or the dual-class
words brush and kiss, for instance. Likewise, in the average user of
German, Fühlen ("the sensing/feeling") is likely to be a relatively marginal
instantiation of the category [noun], compared with the more common
instantiation Gefühl ("sensation/feeling"). The more marginal status of
Fühlen can be attributed to the relative rarity of its nominal usage as
well as its homophone fühlen ("sense/feel"), a prototypical verb. Thus,
by dint of its association with various instantiations, their respective
conceptual referents, and their usage contexts, the linguistic schema
[noun] exhibits a category structure which is characterized by
flexibility and context-dependency, and which takes into account prototype
effects.
The metalinguistic description "noun", on the other hand, relies on
Aristotelian categorization. It may be defined by means of a discrete
statement, e.g., as "a word (...) which can be used with an article" (Swan
1995: xxv) or "a content word that can be used to refer to a person, place,
thing, quality, or action". [6] Metalinguistic categorization is based on clear
yes/no distinctions; frequency distributions or contextual information are
not taken into account, and prototype effects are filtered out. Thus, in
metalinguistic terms, the constructions man, woman, house, rumination,
oxymoron, brush, kiss, Fühlen, and Gefühl all have equal status as
members of the Aristotelian category "noun".
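The contrast between the two category structures can be illustrated with a minimal sketch. The usage counts below are invented purely for illustration and are not corpus data, and the function names are hypothetical. Graded membership is modelled here simply as relative frequency of nominal use, which is only one possible simplification of prototype effects.

```python
# Hypothetical counts of nominal usage, invented for illustration only.
NOMINAL_USES = {
    "man": 900, "woman": 850, "house": 800,
    "Gefühl": 300, "Fühlen": 10,
    "rumination": 5, "oxymoron": 4,
}

def aristotelian_is_noun(item: str) -> bool:
    """Yes/no membership: frequency and context play no role."""
    return item in NOMINAL_USES

def graded_noun_strength(item: str) -> float:
    """Graded membership: strength scales with the (invented) usage count."""
    top = max(NOMINAL_USES.values())
    return NOMINAL_USES.get(item, 0) / top

for word in ("man", "Fühlen", "oxymoron"):
    print(word, aristotelian_is_noun(word), round(graded_noun_strength(word), 3))
```

Under the yes/no function all listed items are equally valid nouns, whereas the graded function ranks man well above Fühlen or oxymoron.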
Of course, use of Aristotelian categorization does not mean that we as
language users are unaware of the potential shortcomings of such an
approach. This awareness is also acknowledged in L2 instruction which
draws on metalinguistic descriptions. Most L2 learners will be able to
think of examples of pedagogical grammar rules that are qualified by
frequency adverbs such as usually, in general, etc. Most L2 learners will
likewise be familiar with statements about specific usage contexts as well as
lists of exceptions to a rule that apparently have to be learned by rote.
Finally, the realm of metalinguistic descriptions is not immune to prototype
effects. For instance, descriptions of prototypical functions of a certain
L2 form will occur more often than descriptions of less prototypical
functions of the same form and will thus be more familiar to learners
(Hu 2002). However, it is argued here that these prototype effects only
concern the presentation and/or our perception of metalinguistic de-
scriptions; they do not seem to have any bearing on the internal cate-
gory structure of explicit knowledge representations or the processing
mechanisms operating on these representations, as explicated in the
following.
As a matter of fact, in order to be of use, metalinguistic knowledge re-
quires conditions of stability and discreteness; otherwise, it would be of
little practical value (see also Swan 1994). For metalinguistic knowledge
to be informative, the user needs to decide categorically whether a specific
linguistic construction is to be classified as a noun or not; otherwise, a
metalinguistic description such as "the verb needs to agree in number
with the preceding noun or pronoun" cannot be implemented. By the
same token, the user needs to decide categorically whether a linguistic
construction is a subordinating conjunction or not; otherwise, a
metalinguistic description such as "in German, the finite verb appears at the end of a
subordinate clause" cannot be employed.
To exemplify further, the metalinguistic description "in English
reported speech, the main verb of the sentence changes to the past tense
when it is in the present tense in direct speech" applies in equal measure
to all English utterances, unless it is qualified by further statements about
specific contexts, e.g., "if something that is still true at the time of
speaking is being reported, the main verb may remain in the present tense".
Further propositions are required to make explicit the formal and
functional criteria of introducing reported speech by means of different verbs
such as say and tell, to describe the formal and functional aspects of re-
ported questions, and so forth (example adapted from Murphy 1994).
No matter how many statements are formulated, though, the user needs
to be able to clearly assign category membership in each case in order to
be able to apply the metalinguistic description, represented as metalin-
guistic knowledge, to a concrete linguistic construction. If we cannot de-
cide categorically if something is a main verb, if something is direct
speech, etc., we cannot bring to bear our explicit knowledge.
As a final example, consider a general, dictionary-style metalinguistic
description pertaining to the constructions desk and Schreibtisch ("desk"),
which is again necessarily stable and discrete. The statement that "English
desk means Schreibtisch in German" is posited as a context-independent
proposition which does not take into account prototypicality or usage
situations. In order to achieve a finer descriptive grain, additional
propositions need to be formulated, e.g., "in the context of English check-in desk,
the word Check-in-Schalter needs to be used in German". Conversely, the
implicit linguistic knowledge of a proficient user of both English and
German would accurately reflect the frequency distributions of the
constructions desk, Schreibtisch, and Schalter in connection with the relevant
referential meanings and suitable pragmatic contexts in which these con-
structions tend to appear.
The same principle applies to the internal structure of all metalinguistic
categories and propositions about relations between categories that make
up metalinguistic descriptions, regardless of whether these refer to
lexico-semantic, morphosyntactic, phonological, or pragmatic phenomena:
Aristotelian categories are needed to allow for the effective deployment of
metalinguistic knowledge. To reiterate, if we cannot take clear-cut deci-
sions about category membership, our metalinguistic knowledge is of lit-
tle practical value in concrete usage situations.
The contrasting category structures of implicit linguistic and explicit
metalinguistic representations can be expected to affect the processing
mechanisms which operate on these representations during language
learning and use. Indeed, implicit and explicit mental operations involv-
ing natural language appear to be analogous with what is respectively
termed similarity-based and rule-based processing in the field of cognitive
psychology.
Similarity-based and rule-based processing have been studied in
relation to categorization, reasoning, and artificial language learning, and
experimental evidence for a qualitative distinction between the two
processes is quite robust, though not uncontroversial. In accordance with
the weak-interface position adopted in the current paper (see Section 4
above), I am in agreement with researchers who not only regard rule-
based and similarity-based processing as separable and distinct, but also
argue that the defining property of rule-based processing is its conscious
nature (Cleeremans and Destrebecqz 2005; Hampton 2005; Smith 2005).
As mentioned previously, conscious awareness occurs in working mem-
ory, a limited-capacity resource; as rule-based processes require executive
attention and effort, they may exceed an individual's working memory
capacity (Ashby and Casale 2005; Bailey 2005; Reber 2005).
Empirical evidence indicates that rule-based processing is characterized
by compositionality, productivity, systematicity, commitment, and a drive
for consistency (Diesendruck 2005; Pothos 2005; Sloman 2005). A set of
operations is compositional when more complex representations can be
built out of simpler components without a change in the meaning of the
components. Productivity means that, in principle, there is no limit to the
number of such new representations. An operation is systematic when it
applies in the same way to a whole class of objects (Pothos 2005). Rule-
based processing entails commitment to specic kinds of information,
while contextual variations are neglected (Diesendruck 2005). The reason
for this is that rule-based operations involve only a small subset of an
object's properties which are selected for processing, while all other object
dimensions are suppressed (Markman et al. 2005; Pothos 2005). A strict
match between an object's properties and the properties specified in the
rule has to be achieved for rule-based processing to apply. Because of
this, rule-based judgements are more consistent and more stable than
similarity-based judgements (Diesendruck 2005; Pothos 2005). It should
be immediately apparent that all these properties of rule-based processing
are in keeping with the characteristics of Aristotelian category structure
detailed and exemplified above in relation to metalinguistic knowledge,
i.e., stability, discreteness, lack of flexibility, as well as selective and
categorical decision-making.
The characteristics of rule-based processing can be contrasted with the
characteristics of similarity-based processing. The latter involves a large
number of an object's properties, which only need to be partially matched
with the properties of existing representations to allow for successful
categorization (Pothos 2005). Moreover, and contrary to rule-based
processing, similarity-based processing is flexible, dynamic, open, and
susceptible to contextual variation (Diesendruck 2005; Markman et al.
2005). Again, it should be apparent that the attributes of similarity-based
processing identified in the field of cognitive psychology are fully
consonant with the characteristics of implicit linguistic categories assumed in
the usage-based model.
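The difference between the two processing modes can be sketched in a few lines. This is only an illustration under stated assumptions: the feature bundles, the function names, and the 0.6 similarity threshold are all hypothetical and are not taken from the studies cited above.

```python
def rule_based_match(obj: dict, rule: dict) -> bool:
    """Strict match on the few properties named in the rule; all others are ignored."""
    return all(obj.get(key) == value for key, value in rule.items())

def similarity_based_match(obj: dict, exemplar: dict, threshold: float = 0.6) -> bool:
    """Partial match across all properties of a stored exemplar (threshold is assumed)."""
    shared = sum(1 for key, value in exemplar.items() if obj.get(key) == value)
    return shared / len(exemplar) >= threshold

# Hypothetical feature bundles, invented for illustration.
noun_rule = {"takes_article": True}
prototypical_noun = {
    "takes_article": True, "denotes_entity": True,
    "frequent": True, "concrete": True, "pluralizes": True,
}
rumination = {
    "takes_article": True, "denotes_entity": False,
    "frequent": False, "concrete": False, "pluralizes": True,
}

print(rule_based_match(rumination, noun_rule))                # strict match on one property
print(similarity_based_match(rumination, prototypical_noun))  # 2 of 5 properties shared
```

In this sketch the rule classifies rumination as a noun without reservation, while the similarity comparison treats it as a poor match to the prototypical exemplar.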
It is now possible to consider the empirical findings about the role of
metalinguistic knowledge in language learning (see Section 4 above) in
light of the proposed conceptualization of explicit metalinguistic repre-
sentations and processes as opposed to implicit linguistic representations
and processes. First, I have argued that linguistic and metalinguistic
knowledge pertain to the same cognitive domain (language) and vary
along the same parameters (specificity and complexity). These
circumstances are consistent with the empirical finding that the two types of
knowledge are positively correlated in L2 learners. At the same time, it
is of course necessary to bear in mind that, considered on their own,
correlations do not allow for direct conclusions to be drawn about
cause-effect relationships, or indeed the directionality of such relationships.
Second, I have suggested that linguistic and metalinguistic knowledge
differ qualitatively in terms of their internal category structure, with
implicitly represented categories characterized by flexibility, fuzziness, and
context-dependency, and explicitly represented categories showing the
contrasting attributes of Aristotelian structure. This proposal is compati-
ble with the existing claim that the two types of knowledge are separate
and distinguishable constructs.
Third, research in cognitive psychology has revealed that rule-based
processes, i.e., processes which operate on explicit knowledge represen-
tations, are characterized by compositionality, productivity, systematic-
ity, commitment, and a drive for consistency. These characteristics are
consonant with the empirical finding that use of metalinguistic
knowledge is associated with consistent, systematic, and often successful L2
performance.
Fourth, rule-based processes are associated with stability and definite
commitment to selected information, while flexibility and attention to
contextual variation are absent. Furthermore, as rule-based processes
require both attentional resources and effort, they are constrained by an
individual's working memory capacity. These circumstances are in
keeping with the empirical finding that use of metalinguistic knowledge does
not guarantee successful L2 performance and may even be unhelpful
in certain situations. Put differently, rule-based processes operating on
Aristotelian categories may not only exceed an individual's working
memory resources in a given situation, but may also fail to capture the
intricacies of certain linguistic constructions in the first place, as
exemplified below. [7]
In sum, it appears that the proposed conceptualization of explicit meta-
linguistic representations and rule-based processes can account for the
benefits as well as the limitations of knowledge based on Aristotelian
category structure. Such knowledge is at its best when it pertains to highly
frequent and entirely systematic patterns whose usage is largely indepen-
dent of context and may be described in terms of one or a few relations
between categories. "In English, an -s needs to be added to present tense
verbs in the third person" is an example of a metalinguistic description
instantiating metalinguistic knowledge of this kind. Conversely, metalin-
guistic knowledge is less useful, or perhaps even useless, when less
frequent, more item-based constructions exhibiting complicated form-
meaning relations need to be captured, since the required number of cat-
egories and propositions specifying relations between categories grows
rapidly with every specic usage context that diverges from the regular
pattern.
To exemplify, our implicit representations of the linguistic
constructions desk and Schreibtisch ("desk") include a wealth of information about
appropriate pragmatic usage contexts of the linguistic forms based on
cultural models relating to the meanings they symbolize. Accordingly, the
implicit linguistic representations of a proficient user of English and
German would include information about the suitability of the construction
desk to describe an item of furniture commonly found in an office, as
well as the place where you check in at an airport or see a bank clerk to
open an account. Furthermore, the proficient user would hold
information about the suitability of the construction Schreibtisch in the former
scenario but not in the latter.
At the implicit level, this probabilistic information is represented in a
vast network of associations subject to parallel distributed processing,
i.e., non-conscious operations that are unaffected by the constraints of
working memory and the cumbersome propositional nature of explicit
knowledge representations and processes. By contrast, the Aristotelian
categories and relations of the relevant metalinguistic description require
the formulation of a set of independent propositions that specify different
usage situations, such as "English desk is Schreibtisch in German.
However, if you want to say English desk in German and if the expression is
used in the context of an airport or a bank, Schalter needs to be used",
and so forth.
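The set of independent propositions described above can be written out as an ordered list of explicit rules. The sketch below is only an illustration: the data structure and the function name are hypothetical, and the contexts are limited to those mentioned in the text.

```python
# Each entry is one explicit proposition: (usage context, German form).
# A context of None marks the default statement "English desk is Schreibtisch in German".
PROPOSITIONS = [
    ("airport", "Schalter"),
    ("bank", "Schalter"),
    (None, "Schreibtisch"),
]

def translate_desk(context=None):
    """Return the German form given by the first proposition that applies."""
    for ctx, form in PROPOSITIONS:
        if ctx is None or ctx == context:
            return form

print(translate_desk("office"))   # falls through to the default proposition
print(translate_desk("airport"))  # matched by a context-specific proposition
print(len(PROPOSITIONS))          # one more proposition per diverging context
```

Every further usage context that diverges from the default requires another entry in the list, which is the growth in propositions that the explicit description entails.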
At the level of more schematic categories, the implicit linguistic
knowledge of a proficient user of English and German would include not only
the schema [co-ordinating conjunction], but likewise instantiations of
this schema, all of which are associated with a wealth of linguistic and
conceptual context information. Accordingly, the fact that the German
constructions aber, jedoch, allein and sondern may all be translated as
English "but" would be complemented not only by information about the high
frequency of aber, but also by knowledge of the specic syntactic proper-
ties of jedoch, the literary or archaic connotations of allein, the tendency
of sondern to be used in contradicting a preceding negative, etc. However,
the metalinguistic descriptions formulated in the previous sentence clearly
show that, when made explicit, this information needs to be stated in
terms of additional independent propositions based on stable and discrete
categories.
This potentially explosive growth of propositions that would be required
to make explicit representations applicable in different contexts
has two detrimental consequences. First, it increases working memory
load and thus renders metalinguistic knowledge proportionally more burdensome
to process; and, second, it renders such knowledge less widely applicable. These
potential drawbacks of explicit, rule-based processes apply in equal measure
to the use of metalinguistic knowledge, i.e., reasoning about language,
and reasoning in other cognitive domains: 'If there is white-grey
smoke coming out of the kitchen oven where I have had fish cooking for
the last three hours, then there is a fire' (example adapted from Pothos
2005: 8) is obviously both harder to process and less useful than 'if there
is smoke, then there is fire'. Unfortunately, the complexity, flexibility, and
context-dependency of natural language mean that general (and truthful)
metalinguistic descriptions equivalent to the latter statement are inevitably
rather rare.
6. Empirical predictions
In the preceding section, I have argued that the distinct category structures
and processes which characterize explicit and implicit knowledge
are consonant with existing findings in the area of SLA. Naturally, a retrospective
explanatory account can only take us so far. However, the theoretical
proposals I have put forward offer us further and arguably more
important insights: They allow for the formulation of empirically testable
predictions with regard to the role of metalinguistic knowledge in L2
learning. In what follows, five specific hypotheses which are intended to
inform future research are presented.
(1) Linguistic constructions which are captured relatively easily by Aris-
totelian categories and relations between such categories will be easier to
acquire explicitly than linguistic constructions which are not captured
easily by Aristotelian categories and relations between such categories.
Specifically, linguistic constructions which show comparatively systematic,
stable, and context-independent usage patterns should be more amenable
to explicit teaching and learning than linguistic constructions which
do not show these usage patterns.
There is as yet very little existing research which has investigated the
potential amenability of specific linguistic constructions to explicit L2
instruction drawing on metalinguistic descriptions, even though theoretically
motivated predictions about the potential difficulties of simple
versus complex metalinguistic rules were put forward more than a decade
ago (e.g., DeKeyser 1994; Hulstijn and de Graaff 1994). Recent empirical
findings suggest that L2 form-function mappings which can be described
metalinguistically in conceptually simple terms and which refer to systematic
usage patterns appear to pose the least explicit learning difficulty (R.
Ellis 2006; Roehr and Ganem 2007) and may therefore be particularly
suitable for explicit teaching and learning. By contrast, L2 form-function
mappings with less systematic usage patterns which require conceptually
complex metalinguistic descriptions should pose greater explicit learning
difficulty. In view of the small number of studies that have been conducted
so far, further investigation of Hypothesis 1 is clearly required.
(2) Use of metalinguistic knowledge will differentially affect the fluency,
accuracy, and complexity of L2 performance. Specifically, fluency may
decrease, while accuracy and complexity may increase.
Existing research has shown that L2 learners' metalinguistic knowledge
correlates positively with L2 proficiency, provided that the latter is operationalized
by means of written rather than oral measures (e.g., Alderson
et al. 1997; Elder et al. 1999; Renou 2000). Given that the use of explicit
knowledge requires controlled processing which is by definition slow and
effortful compared with automatic, implicit operations, this finding is
perfectly compatible with previous theoretical argumentation. However,
whilst L2 proficiency has typically been operationalized via discrete-item
tests of structural and lexical competence and/or via the four skills of
reading, writing, speaking, and listening, no study to date has investigated
learners' use of metalinguistic knowledge in relation to the SLA-specific
developmental measures of fluency, accuracy, and complexity (R. Ellis
and Barkhuizen 2005; Larsen-Freeman 2006; Skehan 1998) which cut
across both oral and written performance.
In view of the fact that explicit, rule-based processing drawing on representations
with Aristotelian category structure is subject to working
memory constraints and thus relies on the selective allocation of attentional
resources, one would expect that increased accuracy, for instance,
can only be achieved at the expense of decreased complexity and fluency.
Likewise, increased complexity can only be achieved at the expense of
decreased accuracy and fluency, whereas increased fluency is unlikely to be
achieved at all in association with high use of metalinguistic knowledge.
Averaged across a group of learners, these predicted patterns should
hold for both oral and written performance, although trade-off effects
can be expected to be stronger in the case of oral performance, since the
time pressures of online processing inevitably place even higher demands
on working memory. To my knowledge, none of the performance pat-
terns hypothesized here have been subjected to empirical enquiry yet.
(3) Use of metalinguistic knowledge will be related to cognitively based
individual learner differences. Specifically, a learner's cognitive and learning
style, language learning aptitude, and working memory capacity are
likely to differentially affect their use of metalinguistic knowledge in L2
performance.
I have argued that metalinguistic knowledge representations exhibit
Aristotelian category structure and that rule-based processing mecha-
nisms operate on these representations. As mentioned previously, rule-
based processing mechanisms are characteristic of analytic reasoning
more generally, so that use of metalinguistic knowledge can be regarded
as problem-solving in the linguistic domain. Accordingly, individuals
with an analytic stylistic orientation and large working memory capacity
should be particularly adept at using metalinguistic knowledge.
While existing research has occasionally speculated on some of these
issues (e.g., Collentine 2000; DeKeyser 2003), no study to date has
probed the relationship between L2 learners' metalinguistic knowledge
and their stylistic preferences (for recent work on cognitive and learning
style in SLA more generally, see, for instance, Ehrman and Leaver 2003;
Reid 1998). As far as I am aware, only one study to date has directly investigated
the interplay of L2 learners' metalinguistic knowledge, their
language learning aptitude, and their working memory capacity (Roehr
and Ganem 2007). Results indicate that learners' level of metalinguistic
knowledge and their working memory capacity are unrelated, but that
analytic components of language learning aptitude, i.e., components
whose operationalization incorporates no purely memory-based or purely
auditory elements, are positively correlated with learners' level of metalinguistic
knowledge (r = 0.42). In view of the shortage of available evidence,
further research into the relationship between metalinguistic
knowledge and cognitively based individual difference variables is needed.
(4) Use of metalinguistic knowledge and cognitively based individual
differences will be related to learners' affective responses. Specifically, individuals
with an analytic disposition who are likely to benefit from explicit
learning and teaching drawing on metalinguistic knowledge will
experience feelings of greater self-efficacy and will thus develop positive
attitudes towards their L2 learning situation. By contrast, individuals with
a non-analytic disposition who are likely to benefit less from explicit learning
and teaching drawing on metalinguistic knowledge will experience
greater anxiety and will thus develop negative attitudes towards their L2
learning situation.
To my knowledge, there is as yet no published research that has put
this prediction to the test (but see Roehr 2005 for some preliminary
analyses based on a small number of cases; for work on the interaction
of affect and cognition more generally, see, for instance, Schumann 1998,
2004; Stevick 1999). In view of Hypothesis 1 above, it is plausible to hypothesize
that metalinguistic descriptions which pertain to linguistic constructions
characterized by systematic and relatively context-independent
usage patterns may be facilitative for any L2 learner, regardless of cognitively
based individual differences. Such metalinguistic descriptions may
focus a learner's attention on aspects of the L2 input that might otherwise
be ignored, thus leading to noticing, i.e., conscious processing just above
the threshold of awareness, and all its associated benets.
If, on the other hand, metalinguistic descriptions pertaining to linguistic
constructions that pose more substantial explicit learning difficulty according
to Hypothesis 1 are used, cognitively based individual learner differences
should begin to matter. An analytically oriented individual may
continue to benefit by moving beyond noticing towards understanding,
thus relying on conscious processing at a high level of awareness (Schmidt
1990, 1993, 2001). The achievement of understanding is likely to result in
positive affective responses such as feelings of greater self-efficacy and enhanced
self-confidence. A positive attitude towards the L2 learning situation
may result, which would in turn encourage the learner to deliberately
seek further exposure to the L2. In a learner with a different stylistic orientation,
however, this upward dynamic could well be replaced by a
downward spiral of failure to understand, feelings of anxiety and loss of
control, a negative attitude towards the L2 learning situation, and, in the
worst-case scenario, the eventual abandonment of L2 study. This hypothesized
interaction of cognitive and affective variables can and should
be put to the test.
(5) Use of metalinguistic knowledge in L2 learning will be related to L1
metalinguistic ability. Specifically, individuals who show strong metalinguistic
ability and literacy skills in L1 development are likely to exhibit
high levels of metalinguistic knowledge in L2.
With regard to metalinguistic knowledge in adult learners, the link be-
tween L1 and L2 skills has not been widely explored. Some studies have
incorporated measures of L1 metalinguistic knowledge alongside tests of
L2 metalinguistic knowledge (e.g., Alderson et al. 1997), or acknowledged
the association between metalinguistic and literacy skills (e.g., Kemp
2001). Furthermore, existing research has emphasized the link between
L1 ability and aptitude for L2 learning (e.g., Sparks and Ganschow
2001), or highlighted the fact that multilingual individuals generally
show greater metalinguistic awareness (e.g., Jessner 1999, 2006). Yet, I
am not aware of any published study of cognitively mature learners
which has directly focused on the relationship between L1 and L2 compe-
tence on the one hand and L1 and L2 metalinguistic knowledge on the
other hand. If Hypotheses 3 and 4 are borne out, the patterns of interplay
between individual difference variables and metalinguistic knowledge can
be expected to be similar in both L1 and L2.
7. Conclusion
In this paper, I have put forward a theoretically motivated and empirically
grounded conceptualization of the construct of metalinguistic knowledge,
or explicit knowledge about language, with specific reference to L2 learning.
I have argued that explicit metalinguistic and implicit linguistic
knowledge vary along the same parameters, specificity and complexity,
but that they differ qualitatively in terms of their internal category structure
and, accordingly, the processing mechanisms that operate on their
representation in the human mind. In consonance with assumptions made
in the usage-based approach to language, implicit knowledge is characterized
by flexible and context-dependent categories with fuzzy boundaries.
By contrast, explicit knowledge is represented in terms of Aristotelian cat-
egories with a stable, discrete, and context-independent structure.
In accordance with research in cognitive psychology, implicit knowledge
is subject to similarity-based processing which is characterized by
dynamicity, flexibility, and context-dependency. Conversely, explicit
knowledge is subject to rule-based processing which is both conscious
and controlled. Such processing is constrained by the capacity limits of
working memory; it requires effort, selective attention, and commitment.
Rule-based processing is further characterized by stability and
consistency, properties that are achieved at the cost of flexibility and
consideration of contextual and frequency information. Rule-based processing
underlies analytic reasoning, whether in the linguistic or any other
cognitive domain. Hence, use of metalinguistic knowledge can be under-
stood as problem-solving applied to language.
The proposed attributes of implicit linguistic and explicit metalinguistic
category structures and processes have been considered in relation to
available research in the field of SLA, and a post-hoc account that is
consistent with both the benefits and the limitations of metalinguistic
knowledge as identified in existing research has been provided. Arising
from the theoretical proposals put forward in the present paper, I have
further formulated five specific predictions which, if confirmed, would
identify the conditions under which metalinguistic knowledge is likely to
be useful to the L2 learner. These predictions constitute empirically test-
able hypotheses which, it is hoped, will be addressed in future research.
Received 7 August 2006 University of Essex, UK
Revision received 16 May 2007
Notes
* I would like to thank Martin Atkinson, Bob Borsley, Ewa Dąbrowska, and two anonymous
reviewers for their helpful and constructive comments. I am also grateful to Sonja
Eisenbeiss, Roger Hawkins, and Max Roberts for reading an earlier version of this
paper. Address for correspondence: Karen Roehr, Department of Language & Lin-
guistics, University of Essex, Wivenhoe Park, Colchester CO4 3SQ, UK; email:
kroehr@essex.ac.uk.
1. The following notation conventions are used: Schematic categories are shown in small
capitals with square brackets, e.g., [bird]. Exemplars of conceptual categories are shown
in small capitals, e.g., robin. Specific linguistic constructions are shown in italics, e.g.,
bachelor, unrepentant, etc. Metalinguistic descriptions are shown in single inverted commas,
e.g., 'da sends the finite verb to the end of the clause'.
2. Langacker's (1991) terminology is employed throughout this article. Croft (2001) uses
the terms 'atomic' and 'substantive' instead of 'minimal' and 'specific', respectively.
3. The weak-interface position can be contrasted with the non-interface position and the
strong-interface position. The non-interface position contends not only that explicit and
implicit knowledge are separate and distinct constructs, but also that they cannot engage
in interplay (Krashen 1981, 1985; Paradis 2004). The strong-interface position maintains
that explicit and implicit knowledge interact directly, and that explicit knowledge may
be converted into implicit knowledge, e.g., through prolonged practice (DeKeyser
1994; Johnson 1996; McLaughlin 1995). A review of these various positions can be
found in R. Ellis (2005).
4. Current research into the interface between explicit and implicit knowledge does not yet
offer any highly precise descriptions of the links between the level of the mind and the
level of the brain. Likewise, researchers' understanding of the notion of consciousness
is still incomplete. Therefore, what I present here are hypotheses that are compatible
with existing empirical findings. While recognizing that further research is required, I regard
these hypotheses both as sufficiently plausible to be given serious consideration and
as sufficiently detailed to be incorporated into a coherent line of argument.
5. As mentioned previously, for the current discussion it does not matter whether an individual's
metalinguistic knowledge has been derived internally or assimilated from external
sources.
6. URL: http://wordnet.princeton.edu/perl/webwn, retrieved 16 April 2007, based on a
keyword search for noun.
7. This circumstance is consistent with the proposal that explicit knowledge about lan-
guage may be more inaccurate and more imprecise than implicit knowledge (R. Ellis
2004, 2005, 2006). While, at first glance, this hypothesis seems to be incompatible with
the attributes of rule-based processing, it fits into the picture if the limitations of metalinguistic
knowledge based on representations with Aristotelian category structure are
taken into consideration.
References
Abbot-Smith, Kirsten and Michael Tomasello
2006 Exemplar-learning and schematization in a usage-based account of syntactic
acquisition. The Linguistic Review 23 (3), 275–290.
Achard, Michel, and Susanne Niemeier (eds.)
2004 Cognitive Linguistics, Second Language Acquisition, and Foreign Language
Teaching. Berlin: Mouton de Gruyter.
Alderson, J. Charles, Caroline Clapham, and David Steel
1997 Metalinguistic knowledge, language aptitude and language proficiency. Language
Teaching Research 1, 93–121.
Anderson, John R.
1995 Learning and Memory: An Integrated Approach. New York, NY: John Wiley
and Sons.
1996 The Architecture of Cognition. Mahwah, NJ: Erlbaum.
2005 Cognitive Psychology and its Implications (6th ed.). New York, NY: Worth
Publishers.
Ashby, F. Gregory and Michael B. Casale
2005 Empirical dissociations between rule-based and similarity-based categorization.
Behavioral and Brain Sciences 28 (1), 15–16.
Baddeley, Alan D.
1997 Human Memory: Theory and Practice. Hove: Psychology Press.
2000 The episodic buffer: A new component of working memory? Trends in Cognitive
Sciences 4 (11), 417–423.
Baddeley, Alan D. and Robert H. Logie
1999 Working memory: The multiple-component model. In Miyake, Akira and
Priti Shah (eds.), Models of Working Memory: Mechanisms of Active Main-
tenance and Executive Control. Cambridge: Cambridge University Press,
28–61.
Bailey, Todd M.
2005 Rules work on one representation; similarity compares two representations.
Behavioral and Brain Sciences 28 (1), 16.
Bates, Elizabeth A. and Judith C. Goodman
2001 On the inseparability of grammar and the lexicon: Evidence from acquisition.
In Tomasello, Michael and Elizabeth A. Bates (eds.), Language Development.
Malden, MA: Blackwell, 134–162.
Bayne, Tim and David J. Chalmers
2003 What is the unity of consciousness? In Cleeremans, Axel (ed.), The Unity of
Consciousness: Binding, Integration, and Dissociation. Oxford: Oxford University
Press, 23–58.
Bialystok, Ellen
1979 Explicit and implicit judgements of L2 grammaticality. Language Learning
29 (1), 81–103.
Birdsong, David
1989 Metalinguistic Performance and Interlinguistic Competence. Berlin: Springer.
Bod, Rens, Jennifer Hay, and Stefanie Jannedy
2003 Introduction. In Bod, Rens, Jennifer Hay, and Stefanie Jannedy (eds.),
Probabilistic Linguistics. Cambridge, MA: MIT Press, 1–10.
Boers, Frank and Seth Lindstromberg
2006 Cognitive linguistic applications in second and foreign language instruction:
Rationale, proposals, and evaluation. In Kristiansen, Gitte, Michel Achard,
René Dirven, and Francisco J. Ruiz de Mendoza Ibáñez (eds.), Cognitive
Linguistics: Current Applications and Future Perspectives. Berlin: Mouton
de Gruyter, 303–355.
Butler, Yuko Goto
2002 Second language learners' theories on the use of English articles: An analysis of
the metalinguistic knowledge used by Japanese students in acquiring the English
article system. Studies in Second Language Acquisition 24 (3), 451–480.
Bybee, Joan L. and James L. McClelland
2005 Alternatives to the combinatorial paradigm of linguistic theory based on domain
general principles of human cognition. The Linguistic Review 22 (2–4),
381–410.
Camps, Joaquim
2003 Concurrent and retrospective verbal reports as tools to better understand the
role of attention in second language tasks. International Journal of Applied
Linguistics 13 (2), 201–221.
Cattell, Ray
2006 An Introduction to Mind, Consciousness and Language. London: Continuum.
Chalker, Sylvia
1994 Pedagogical grammar: Principles and problems. In Bygate, Martin, Alan
Tonkyn, and Eddie Williams (eds.), Grammar and the Language Teacher.
New York, NY: Prentice Hall, 31–44.
Cleeremans, Axel and Arnaud Destrebecqz
2005 Real rules are conscious. Behavioral and Brain Sciences 28 (1), 19–20.
Collentine, Joseph
2000 Insights into the construction of grammatical knowledge provided by user-
behavior tracking technologies. Language Learning and Technology 3 (2),
44–57.
Croft, William
2001 Radical Construction Grammar: Syntactic Theory in Typological Perspective.
Oxford: Oxford University Press.
Daneman, Meredyth and Patricia A. Carpenter
1980 Individual differences in working memory and reading. Journal of Verbal
Learning and Verbal Behavior 19, 450–466.
DeKeyser, Robert M.
1994 How implicit can adult second language learning be? AILA Review 11, 83–96.
2003 Implicit and explicit learning. In Doughty, Catherine J. and Michael H.
Long (eds.), The Handbook of Second Language Acquisition. Malden, MA:
Blackwell, 313–348.
2005 What makes learning second-language grammar difficult? A review of issues.
Language Learning 55 (s1), 1–25.
Dienes, Zoltan and Josef Perner
2003 Unifying consciousness with explicit knowledge. In Cleeremans, Axel (ed.),
The Unity of Consciousness: Binding, Integration, and Dissociation. Oxford:
Oxford University Press, 214–232.
Diesendruck, Gil
2005 Commitment distinguishes between rules and similarity: A developmental
perspective. Behavioral and Brain Sciences 28 (1), 21–22.
Dirven, René
1990 Pedagogical grammar. Language Teaching 23 (1), 1–18.
Dirven, René and Marjolijn Verspoor
2004 Cognitive Exploration of Language and Linguistics (2nd ed.). Amsterdam:
John Benjamins.
Doughty, Catherine J.
1991 Second language instruction does make a difference: Evidence from an empirical
study of SL relativization. Studies in Second Language Acquisition 13,
431–469.
2003 Instructed SLA: Constraints, compensation, and enhancement. In Doughty,
Catherine J. and Michael H. Long (eds.), The Handbook of Second Language
Acquisition. Malden, MA: Blackwell, 256–310.
Ehrman, Madeline E. and Betty Lou Leaver
2003 Cognitive style in the service of language learning. System 31, 393–415.
Elder, Catherine and Diane Manwaring
2004 The relationship between metalinguistic knowledge and learning outcomes
among undergraduate students of Chinese. Language Awareness 13 (3),
145–162.
Elder, Catherine, Jane Warren, John Hajek, Diane Manwaring, and Alan Davies
1999 Metalinguistic knowledge: How important is it in studying a language at
university? Australian Review of Applied Linguistics 22 (1), 81–95.
Ellis, Nick C.
1993 Rules and instances in foreign language learning: Interactions of explicit and
implicit knowledge. European Journal of Cognitive Psychology 5 (3), 289–318.
1994 Consciousness in second language learning: Psychological perspectives on
the role of conscious processes in vocabulary acquisition. AILA Review 11,
37–56.
1996 Sequencing in SLA: Phonological memory, chunking, and points of order.
Studies in Second Language Acquisition 18, 91–126.
2001 Memory for language. In Robinson, Peter (ed.), Cognition and Second Language
Instruction. Cambridge: Cambridge University Press, 33–68.
2002a Frequency effects in language processing: A review with implications for
theories of implicit and explicit language acquisition. Studies in Second Language
Acquisition 24 (2), 143–188.
2002b Reflections on frequency effects in language processing. Studies in Second
Language Acquisition 24 (2), 297–340.
2003 Constructions, chunking, and connectionism: The emergence of second language
structure. In Doughty, Catherine J. and Michael H. Long (eds.), The
Handbook of Second Language Acquisition. Malden, MA: Blackwell, 63–103.
2005 At the interface: Dynamic interactions of explicit and implicit language
knowledge. Studies in Second Language Acquisition 27 (2), 305–352.
Ellis, Nick C. and Diane Larsen-Freeman
2006 Language emergence: Implications for applied linguistics. Applied Linguistics
27 (4), 558–589.
Ellis, Rod
2001 Introduction: Investigating form-focused instruction. Language Learning 51
(1), 1–46.
2002 Does form-focused instruction affect the acquisition of implicit knowledge?
Studies in Second Language Acquisition 24 (2), 223–236.
2004 The definition and measurement of L2 explicit knowledge. Language Learning
54 (2), 227–275.
2005 Measuring implicit and explicit knowledge of a second language: A psychometric
study. Studies in Second Language Acquisition 27 (2), 141–172.
2006 Modelling learning difficulty and second language proficiency: The differential
contributions of implicit and explicit knowledge. Applied Linguistics 27
(3), 431–463.
Ellis, Rod and Gary Barkhuizen
2005 Analysing Learner Language. Oxford: Oxford University Press.
Engel, Andreas K.
2003 Temporal binding and the neural correlates of consciousness. In Cleeremans,
Axel (ed.), The Unity of Consciousness: Binding, Integration, and Dissociation.
Oxford: Oxford University Press, 132–152.
Gabrielatos, Costas
2004 If-conditionals in ELT materials and the BNC: Corpus-based evaluation of
pedagogical materials. Paper presented at the Corpus Linguistics Research
Group meeting on 26 April 2004, Lancaster University.
Goldberg, Adele E.
1995 Constructions: A Construction Grammar Approach to Argument Structure.
Chicago: University of Chicago Press.
1999 The emergence of the semantics of argument structure constructions. In
MacWhinney, Brian (ed.), The Emergence of Language. Mahwah, NJ: Erlbaum,
197–212.
2003 Constructions: A new theoretical approach to language. Trends in Cognitive
Sciences 7 (5), 219–224.
Gombert, Jean Emile
1992 Metalinguistic Development. Hemel Hempstead: Harvester.
Green, Peter S. and Karlheinz Hecht
1992 Implicit and explicit grammar: An empirical study. Applied Linguistics 13
(2), 168–184.
Hampton, James A.
2005 Rules and similarity – a false dichotomy. Behavioral and Brain Sciences 28
(1), 26.
Hu, Guangwei
2002 Psychological constraints on the utility of metalinguistic knowledge in sec-
ond language production. Studies in Second Language Acquisition 24 (3),
347–386.
Hulstijn, Jan H.
2005 Theoretical and empirical issues in the study of implicit and explicit second-
language learning: Introduction. Studies in Second Language Acquisition 27
(2), 129–140.
Hulstijn, Jan H. and Rick de Graaff
1994 Under what conditions does explicit knowledge of a second language facilitate
the acquisition of implicit knowledge? A research proposal. AILA Review
11, 97–112.
Jessner, Ulrike
1999 Metalinguistic awareness in multilinguals: Cognitive aspects of third language
learning. Language Awareness 8 (3–4), 201–209.
2006 Linguistic Awareness in Multilinguals: English as a Third Language. Edin-
burgh: Edinburgh University Press.
Johnson, Keith
1996 Language Teaching and Skill Learning. Oxford: Blackwell.
Just, Marcel Adam and Patricia A. Carpenter
1992 A capacity theory of comprehension: Individual differences in working
memory. Psychological Review 99 (1), 122–149.
Karmiloff, Kyra and Annette Karmiloff-Smith
2002 Pathways to Language: From Fetus to Adolescent. Cambridge, MA: Harvard
University Press.
Kemmer, Suzanne and Michael Barlow
2000 Introduction: A usage-based conception of language. In Barlow, Michael
and Suzanne Kemmer (eds.), Usage-Based Models of Language. Stanford,
CA: CSLI, vii–xxviii.
Kemp, Charlotte
2001 Metalinguistic awareness in multilinguals: Implicit and explicit grammatical
awareness and its relationship with language experience and language at-
tainment. Unpublished doctoral dissertation, University of Edinburgh.
Klapper, John and Jonathan Rees
2003 Reviewing the case for explicit grammar instruction in the university foreign
language learning context. Language Teaching Research 7 (3), 285–314.
Krashen, Stephen D.
1981 Second Language Acquisition and Second Language Learning. Oxford:
Pergamon.
1985 The Input Hypothesis: Issues and Implications. London: Longman.
Langacker, Ronald W.
1991 Concept, Image, and Symbol: The Cognitive Basis of Grammar. Berlin: Mou-
ton de Gruyter.
1999 Grammar and Conceptualization. Berlin: Mouton de Gruyter.
2000 A dynamic usage-based model. In Barlow, Michael and Suzanne Kemmer
(eds.), Usage-Based Models of Language. Stanford, CA: CSLI, 1–64.
Larsen-Freeman, Diane
2006 The emergence of complexity, fluency, and accuracy in the oral and written
production of five Chinese learners of English. Applied Linguistics 27 (4),
590–619.
Leow, Ronald P.
1997 Attention, awareness, and foreign language behavior. Language Learning
47, 467–505.
MacWhinney, Brian
1997 Implicit and explicit processes: Commentary. Studies in Second Language
Acquisition 19, 277–281.
Markman, Arthur B., Sergey Blok, Kyungil Kim, Levi Larkey, Lisa R. Narvaez, C. Hunt
Stilwell, and Eric Taylor
2005 Digging beneath rules and similarity. Behavioral and Brain Sciences 28 (1),
29–30.
McDonough, Steven
2002 Applied Linguistics in Language Education. London: Arnold.
McLaughlin, Barry
1995 Aptitude from an information-processing perspective. Language Testing 12,
370–387.
Miyake, Akira and Naomi P. Friedman
1998 Individual differences in second language proficiency: Working memory as
language aptitude. In Healy, Alice F. and Lyle E. Bourne (eds.), Foreign
Language Learning: Psycholinguistic Studies on Training and Retention.
Mahwah, NJ: Erlbaum, 339–364.
Miyake, Akira and Priti Shah
1999 Toward unified theories of working memory: Emerging general consensus,
unresolved theoretical issues, and future research directions. In Miyake,
Akira and Priti Shah (eds.), Models of Working Memory: Mechanisms of
Active Maintenance and Executive Control. Cambridge: Cambridge University
Press, 442–481.
Murphy, Gregory L.
2004 The Big Book of Concepts. Cambridge, MA: MIT Press.
Murphy, Raymond
1994 English Grammar in Use (2nd ed.). Cambridge: Cambridge University Press.
Nagata, Noriko and Virginia M. Swisher
1995 A study of consciousness-raising by computer: The effect of metalinguistic
feedback on second language learning. Foreign Language Annals 28 (3),
337–347.
Norris, John M. and Lourdes Ortega
2001 Does type of instruction make a difference? Substantive findings from a
meta-analytic review. Language Learning 51 (1), 157–213.
Paradis, Michel
2004 A Neurolinguistic Theory of Bilingualism. Amsterdam: John Benjamins.
Pothos, Emmanuel M.
2005 The rules versus similarity distinction. Behavioral and Brain Sciences 28 (1),
1–49.
Reber, Rolf
2005 Rule versus similarity: Different in processing mode, not in representations.
Behavioral and Brain Sciences 28 (1), 31–32.
Reid, Joy M. (ed.)
1998 Understanding Learning Styles in the Second Language Classroom. Upper
Saddle River, NJ: Prentice Hall Regents.
Renou, Janet M.
2000 Learner accuracy and learner performance: The quest for a link. Foreign
Language Annals 33 (2), 168–180.
Robinson, Peter
1995 Review article: Attention, memory, and the noticing hypothesis.
Language Learning 45 (2), 283–331.
1997 Generalizability and automaticity of second language learning under im-
plicit, incidental, enhanced, and instructed conditions. Studies in Second
Language Acquisition 19, 223–247.
2003 Attention and memory during SLA. In Doughty, Catherine J. and Michael
H. Long (eds.), The Handbook of Second Language Acquisition. Malden,
MA: Blackwell, 631–678.
Roehr, Karen
2005 Metalinguistic knowledge in second language learning: An emergentist per-
spective. Unpublished doctoral dissertation, Lancaster University.
2006 Metalinguistic knowledge in L2 task performance: A verbal protocol analysis.
Language Awareness 15 (3), 180–198.
2007 Metalinguistic knowledge and language ability in university-level L2 learners.
Applied Linguistics. doi: 10.1093/applin/amm037. URL: <http://applij.oxfordjournals.org/cgi/content/full/amm037?ijkey=1xNurNzW63Rt3Um&keytype=ref>.
Roehr, Karen and Adela Ganem
2007 Metalinguistic knowledge in L2 learning: An individual difference variable.
Paper presented at Euro SLA on 13 September 2007, Newcastle University.
Rosa, Elena and Michael D. O'Neill
1999 Explicitness, intake, and the issue of awareness. Studies in Second Language
Acquisition 21, 511–556.
Rosch, Eleanor, and Barbara B. Lloyd (eds.)
1978 Cognition and Categorization. Hillsdale, NJ: Erlbaum.
Rosch, Eleanor and Carolyn B. Mervis
1975 Family resemblances: Studies in the internal structure of categories.
Cognitive Psychology 7, 573–605.
Sanz, Cristina and Kara Morgan-Short
2004 Positive evidence versus explicit rule presentation and explicit negative
feedback: A computer-assisted study. Language Learning 54 (1), 35–78.
2005 Explicitness in pedagogical interventions: Input, practice, and feedback. In
Sanz, Cristina (ed.), Mind and Context in Adult Second Language Acquisition:
Methods, Theory, and Practice. Washington, DC: Georgetown University
Press, 234–263.
Saporta, Sol
1973 Scientific grammars and pedagogical grammars. In Allen, J. P. B. and Pit
Corder (eds.), The Edinburgh Course in Applied Linguistics. London: Oxford
University Press, 265–274.
Schmidt, Richard W.
1990 The role of consciousness in SLA learning. Applied Linguistics 11, 129–158.
1993 Awareness and second language acquisition. Annual Review of Applied Linguistics
13, 206–226.
2001 Attention. In Robinson, Peter (ed.), Cognition and Second Language Instruction.
Cambridge: Cambridge University Press, 3–32.
Schumann, John H.
1998 The neurobiology of affect in language. Language Learning 48 (s1), xi–341.
2004 The neurobiology of aptitude. In Schumann, John H., Sheila E. Crowell,
Nancy E. Jones, Namhee Lee, Sara Ann Schuchert, and Lee A. Wood
(eds.), The Neurobiology of Learning: Perspectives from Second Language
Acquisition. Mahwah, NJ: Erlbaum, 7–22.
Segalowitz, Norman
2003 Automaticity and second languages. In Doughty, Catherine J. and Michael
H. Long (eds.), The Handbook of Second Language Acquisition. Malden,
MA: Blackwell, 382–408.
Shah, Priti and Akira Miyake
1999 Models of working memory: An introduction. In Miyake, Akira and Priti
Shah (eds.), Models of Working Memory: Mechanisms of Active Maintenance
and Executive Control. Cambridge: Cambridge University Press, 1–27.
Simard, Daphnee and Wynne Wong
2001 Alertness, orientation, and detection: The conceptualization of attentional
functions in SLA. Studies in Second Language Acquisition 23, 103–124.
Skehan, Peter
1998 A Cognitive Approach to Language Learning. Oxford: Oxford University
Press.
Sloman, Steven
2005 Avoiding foolish consistency. Behavioral and Brain Sciences 28 (1), 33–34.
Smith, Edward
2005 Rule and similarity as prototype concepts. Behavioral and Brain Sciences 28
(1), 34–35.
Sorace, Antonella
1985 Metalinguistic knowledge and language use in acquisition-poor environments.
Applied Linguistics 6 (3), 239–254.
Sparks, Richard and Leonore Ganschow
2001 Aptitude for learning a foreign language. Annual Review of Applied Linguistics
21, 90–111.
Stankov, Lazar
2003 Complexity in human intelligence. In Sternberg, Robert J., Jacques
Lautrey, and Todd I. Lubart (eds.), Models of Intelligence: International
Perspectives. Washington, DC: American Psychological Association, 27–42.
Stevick, Earl W.
1999 Affect in learning and memory: From alchemy to chemistry. In Arnold, Jane
(ed.), Affect in Language Learning. Cambridge: Cambridge University Press,
43–57.
Swain, Merrill
1998 Focus on form through conscious reflection. In Doughty, Catherine
J. and Jessica Williams (eds.), Focus on Form in Classroom Second
Language Acquisition. Cambridge: Cambridge University Press, 64–81.
Swan, Michael
1994 Design criteria for pedagogic language rules. In Bygate, Martin, Alan Tonkyn,
and Eddie Williams (eds.), Grammar and the Language Teacher. New
York, NY: Prentice Hall, 45–55.
1995 Practical English Usage (2nd ed.). Oxford: Oxford University Press.
Taylor, John R.
2002 Cognitive Grammar. Oxford: Oxford University Press.
2003 Linguistic Categorization (3rd ed.). Oxford: Oxford University Press.
Tomasello, Michael
1998 Introduction: A cognitive-functional perspective on language structure. In
Tomasello, Michael (ed.), The New Psychology of Language: Cognitive and
Functional Approaches to Language Structure (Vol. 1). Mahwah, NJ: Erlbaum,
vii–xxiii.
2003 Constructing a Language: A Usage-Based Theory of Language Acquisition.
Cambridge, MA: Harvard University Press.
Tomlin, Russell and Victor Villa
1994 Attention in cognitive science and second language acquisition. Studies in
Second Language Acquisition 16, 183–203.
Towell, Richard
2002 Design of a pedagogical grammar. URL: <http://www.lang.ltsn.ac.uk/resources/goodpractice.aspx?resourceid=410>.
Ungerer, Friedrich and Hans-Jörg Schmid
1996 An Introduction to Cognitive Linguistics. London: Longman.
VanPatten, Bill
1996 Input Processing and Grammar Instruction in Second Language Acquisition.
Norwood, NJ: Ablex.
VanPatten, Bill (ed.)
2004 Processing Instruction: Theory, Research, and Commentary. Mahwah, NJ:
Erlbaum.
Explaining intersubjectivity. A comment on
Arie Verhagen, Constructions of
Intersubjectivity
WOLFRAM HINZEN and MICHIEL VAN LAMBALGEN
1. Overview
Constructions of Intersubjectivity (CoI) is an important addition to the
growing body of work on cognitive and construction-based grammars,
which CoI links to evolutionary issues in interesting ways. CoI also
touches upon a number of fundamental (indeed philosophical) issues in
the study of linguistic communication, meaning, and human cognition; it
should be applauded for the explicitness with which it does so, using "language
as a window on the mind" (p. 210). A concrete vision of the evolution
of language is endorsed, arising against the background of analyses
of a number of seemingly disparate and scattered linguistic data. The
book thus forms an excellent starting point to engage with foundational
assumptions entering into the theoretical framework adopted. We will
here equally embed our comments within a theoretical discussion at the
level of frameworks.
The book begins by isolating a number of seemingly unrelated "small"
grammatical puzzles, which later gain theoretical significance for certain
"big" theoretical issues. The "small" grammatical puzzles concern negation
(in particular the lack of functional equivalence in the use in discourse of
"not impossible" and "possible"); whether finite sentential complements in copular
constructions like "The danger is that depleted uranium is poisonous"
are subjects or predicates; and discourse connectives (e.g., concessive conjunctions
like "although"). These three construction types form the topics of
Chapters 2, 3, and 4, respectively. Chapter 5 concludes the book. We here
reverse the order of "small" and "big" and begin "big", with some claims of
linguistic anthropology.
Cognitive Linguistics 19–1 (2008), 107–123
DOI 10.1515/COG.2008.006
0936–5907/08/0019–0107
© Walter de Gruyter
2. Anthropological and evolutionary issues
Following Verhagen, using human language is essentially a manipulative
activity: language is fundamentally a matter of regulating and assessing
others (9). Its use is never just informative, but always argumentative
(9–10); like animal communication systems geared at getting conspecifics
to act in ways beneficial to the communicator (8), language is about getting
things done rather than the disinterested representation of the world.
Any such similarity between human language and non-human communication
systems would be a welcome result, as it reduces the apparent gap
separating human and animal language. That said, while human language
can be used to manipulate others and get them to behave as one desires,
ever so often it is not so used, highlighting a crucial dissimilarity between
human language and animal communication systems: we may as
well use language to freely express our thoughts or ponder and assert the
truth of something, without necessarily expecting particular functional
benefits ensuing from that. Unlike in non-human communicating species,
there is no apparent cause or functional pressure for our deliberate decisions
to assert what we do, and much functional pressure is needed to
prevent them. Nor are we restricted in what we choose to refer to, assert,
or communicate. Stuck in the immediate here and now, by contrast, as
non-human animals by and large are, they only have a small number of
non-voluntary vocalizations at their disposal, all intrinsically linked to an
immediate adaptive purpose. No doubt human language use will seem
somewhat pathological if all we say serves some instrumental purpose
and is intrinsically linked to a certain response we wish to achieve. Interestingly,
the descriptive and assertoric aspect of language is inescapable
even where language is used manipulatively, as in making compliments to
a lady, where unavoidably we are making a descriptive claim too ("what a
beautiful perfume!").
The denial that human language exhibits the very features that romanticists
like Schlegel, Herder, or Humboldt claimed to be so distinctive for
it (its use for the free and creative expression of thought) also has an intellectual
heritage we should be aware of. Assimilating human language
to non-human animal communication systems was part and parcel of
the vision of language of B. F. Skinner (1957), who flatly denied that language
is used for purposes of reference, representation, or the assertion of truth,
arguing instead that it is an instrument serving purposes of the control of
behavior. In CoI, too, we read that language evolved as a mechanism
producing pressure favoring long-term predictability of behavior (14).
CoI does not support a Skinnerian psychology, to be sure; nor does it
claim that all language use is a function of strategic interaction. Yet it is
not entirely clear how far removed its foundational claims about language
are from Skinnerian views of language as an instrument of control.
We think it is an obvious fact that language is used as an instrument of
control. Our point is merely that (i) the opposite is equally true, and (ii) not
so using it is actually a hallmark of human language that should be central
in any account of its evolution.
We suggest that, more generally, the assimilation of human to
non-human language on the basis of ascriptions of an evolutionary function
to language, such as communication, will not lead to much insight into
linguistic structure and its special character. To begin with, function ascriptions
to whole, complex systems such as language don't typically
transfer to the parts from which such systems are assembled: these will
typically have independent evolutionary trajectories, unrelated to the
function for which they are later employed when entering the system
of language. To whatever extent cognitive mechanisms entering language
are used in non-humans, and have non-communicative functions there,
language will not be rationalizable by looking at it as a communication
system. Nor will the study of non-linguistic animal communication unlock
the secret of what makes language special. If there is anything special
to the human communication system, it is that it is a linguistic one, which
means that its being a communication system cannot possibly be what, as
such, explains its special features. The study of communication systems
(Hauser 1996) does not tell us much about the special properties of
human language, such as its structural and computational aspects, or the
fact of its intentional and creative use.
Again, none of this means that the study of the communicative use of
language will not let us see many interesting facts about language. CoI
succeeds rather remarkably in unearthing such facts. This book's fundamental
theoretical commitment, however, is deeper: that social and cultural
cognition alone is the key to the understanding of language. The
most basic explanatory notion in Verhagen's framework, used extensively
throughout the book, is taken from Tomasello (e.g., 1999, 2003): the
human ability to take others' perspectives (2), understand what they attend
to, and share their intentions. On Verhagen's view this complex of
mental reasoning abilities is the prime biological factor distinguishing us
from other primates. Let a primate interacting manipulatively with others
understand itself as an intentional agent, and have him ascribe intentional
life to other agents as well; have him want to share beliefs and identify
with the intentional mental life of others; then culture becomes possible,
with its own special mechanisms of inheritance, since humans can now
learn from others as opposed to merely from their own interactions with
a non-human environment. With this, language is on its way, if not given,
Verhagen suggests. For language simply is a system of conventions (of
symbols and ways of using them) that solve a cognitive coordination
problem. It is culturally transferred (3); and thus there is no biological
adaptation specific to language needed. In sum, starting from the one
basic notion of taking another's perspective, language evolves, for the
coordination and managing of multiple such perspectives in discourse.
3. Testing a hypothesis
What would be evidence for the correctness of such a view? What we
need is independent empirical evidence that human language is optimized
to some significant extent for the coordination task envisaged. A good degree
of optimization is what testing any functionalist hypothesis in biology
requires. In short, the hypothesis should be a particularly good
source for predictions of mechanisms that we can then empirically attest.
But note that even if this proves possible, the functional rationale of the
mechanisms in question will not be their cause or origin. An independent
story about the mechanisms will have to be told, as a mere hypothesis
about functions will leave the question of origin (proximate causes) open.
Recognizing the need for validation above, Verhagen asserts that we
must be able to see repercussions for the content that is systematically
coded in linguistic symbols of the capacity of understanding others as
like oneself, in short read off the semantics of basic linguistic units from
their ways of handling perspectives:
[I]f coordinating cognitively with others is so basic a component of human practices,
then we should see it reflected in more than one area of grammar [ . . . ] connecting,
differentiating and tailoring the contents of points of view with respect
to each other (rather than organizing a connection to the world) is essential for
understanding their semantics [ . . . ] (p. 4)
Here we note a potentially wrong opposition, to which we will return several
times: even granted that, generally, coordinating cognitively with
others is basic to human cognition, and this general principle of cognition
is also instantiated in grammar, we don't see that there somehow exists
an opposition between coordinating cognitively and organizing a
connection to the world, which entails that semantics cannot be understood
as serving both functions simultaneously. We contend, in line with
our anthropological claims above, that it can and does.¹
We also note that there are potentially two different aspects of
language that we might want to explain by appeal to their discourse
function: sentence-internal organization on the one hand, and discourse
phenomena transcending the sentence-boundary on the other. By
sentence-internal organization we mean the structure of the clause, the or-
ganization of phrases and their dependents, and syntactic mechanisms
like complementation. Sentential connectives and discourse conjunctions
fall into the class of discourse phenomena. The complementation construction
in (1) illustrates the first one; (2) is an example of an inter-sentential
or discourse phenomenon:
(1) George saw/knew/said that his opponent was closing in.
(2) Max fell. John pushed him.
Clearly, it is discourse phenomena that we expect a discourse-based per-
spective to elucidate best. It is much less clear that such a perspective
would illuminate sentence-internal and syntactic organization. Verhagen's
striking claim is that a perspective departing from discourse and
cognitive coordination shows us that both central semantic and syntactic
analyses of particular linguistic constructions have been mistaken.
Now, in (2), the two sentences are obviously semantically connected
(even though they are not parts of one another, in a phrase-structural
sense, as in the construction (1)). To understand (2), it has to be inferred
that the event order is the reverse of the sentence order, and it is only by
applying causal knowledge (pushing can be a cause of falling) and a gen-
eral inference principle (no other possible cause is mentioned, whence one
must assume pushing is the only operant cause) that the listener can con-
struct the corresponding event structure. The speaker need not supply ex-
plicit information about the intended event order since he knows that the
listener is able to compute this herself. This is in fact a general fact about
discourse production and understanding: it is both impossible and unde-
sirable to supply all relevant information in linguistic form and both
speaker and listener therefore appeal to general principles in computing
that information from the linguistic material given. So the principles driving
the understanding in cases like (2) are not specifically linguistic ones:
they are more generally cognitive, logical, or inferential ones (examples
will be seen below). Again, we expect this to be different in (1), where we
meet a hypotactic construction missing in (2), in which, at least on a stan-
dard syntactic analysis, that his opponent was closing in is the internal ar-
gument of saw (see below for more on this structural claim). It is therefore
more plausible that cognitive coordination in discourse could potentially
tell us much about (2), but little about (1).
4. Negation and discourse connections
Let us see whether this is so and begin with the observation that clearly the
discourse in (2) is about the world, and involves a large amount of cognitive
coordination, exemplifying language's potential, insisted on above, to
serve both of these functions simultaneously. The difference between the
general fact about discourse understanding just noted and Verhagen's
claims is that he argues for the existence of specific grammatical constructions
whose purpose would lie precisely in cognitive coordination, and
whose semantics would not be explainable otherwise (or on more traditional
semantic assumptions). Verhagen considers that negation is an instance
of such a construction, and this is the topic of Chapter 2, to which
we now turn.
We will give a slightly more formal treatment of Verhagen's examples
in CoI, to see whether negation can indeed be used in building a case
against semantics as organizing a connection to the world. We first summarise
Verhagen's take on negation, with page numbers to where Verhagen
states his views:
a. the primary function of negation is intersubjective cognitive coordination
(42, bottom of page)
b. the relation between language and the world is only secondary (42,
bottom of page)²
c. negation is concerned with the relation between distinct mental
spaces of participants in discourse (57)
d. more specifically, the speaker uses negation to instruct the addressee
to entertain two distinct mental spaces, one of which has to be rejected
(42, bottom of page)
e. these mental spaces may incorporate topoi, collections of culturally
determined default rules (58).
Now consider the following three example discourses:
(3) A. Do you think our son will pass his courses this term?
B. Well, he passed them in the autumn term.
(4a) A. Do you think our son will pass his courses this term?
B-a. Well, he did not pass his first statistics course.
(4b) A. Do you think our son will pass his courses this term?
B-b. Well, he barely passed his first statistics course.
The general principle behind understanding such exchanges is that, instead
of giving a direct answer, B invites addressee A to activate a defeasible
rule in her semantic memory (cf. the topoi mentioned under e.
above) and to perform an inference based on the rule and the information
supplied by B. Thus in example (3), A must retrieve a defeasible rule of
the type "normally, if a student passes his exams in term n, then also in
term n + 1", and apply modus ponens using B's observation about the autumn
term. Things get really interesting in example (4a). Here B invites A
to activate a defeasible rule like "normally, if a student passes his first statistics
course, he can pass other courses as well" and apply an inference
using his utterance B-a. That the rule is defeasible can be seen from the
possible continuation of (4a) in (4a*):
(4a*) A. Do you think our son will pass his courses this term?
B-a. Well, he did not pass his first statistics course.
A. But he got a very good grade for the astrophysics course!
A more formal analysis of these examples goes as follows.³ A defeasible
rule is an implication of the form "if P and nothing exceptional is the case,
then Q". Here P can be the proposition "a student passes his first statistics
course", and Q the proposition "he can pass other courses as well". Using
this representation, one may disentangle the coordinating and world-relating
functions of "not". First off, sentence B-a has a dual function: it
states a fact and it triggers an inference process that allows A to deduce
B's opinion on the relevant issue. Sentence B-a can have this dual function
because the inference process that it triggers has certain universal features
which are common knowledge of A and B. Namely, the inference is
a form of closed world reasoning, a form of logical reasoning which is different
from classical logic but which is applied all the time in discourse
understanding (see van Lambalgen and Hamm 2004). The logical principle
invoked here is: "assume all propositions are false which you have no
reason to assume to be true". One can make sense of (4a) by invoking this
principle twice. First, the defeasible rule, written fully as "if a student
passes his first statistics course and nothing exceptional is the case, he
can pass other courses as well", is reduced to "if a student passes his first
statistics course, he can pass other courses as well", because no information
about exceptions is supplied in the discourse. Secondly, no other sufficient
conditions for passing the other courses are given, so that the rule
is actually an equivalence, and utterance B-a can be used to derive the intended
conclusion "he will not pass all his courses this term". Note that
without invoking closed world reasoning, the inference that B implicitly
appeals to in (4a) is the classically invalid denial of the antecedent. In
(4b) the suggestion is that if a student barely passes a statistics course,
then one actually has an exceptional circumstance. Therefore the previous
reduction of the defeasible rule no longer applies, and the inference using
utterance B-a fails.
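The two-step closed world inference just described can be sketched in a few lines of Python. This is only an illustrative sketch, not the formal system of van Lambalgen and Hamm (2004): the atom names (pass_stats, pass_others, barely, ab), the helper closed_world, and the extra rule "barely implies ab" are all assumptions introduced here, and the evaluator assumes that negated atoms such as ab are settled by facts or by negation-free rules before any rule that negates them is applied.

```python
from typing import Iterable, List, Set, Tuple

Literal = Tuple[str, bool]        # (atom, True) = atom, (atom, False) = not atom
Rule = Tuple[List[Literal], str]  # (body, head)


def closed_world(facts: Iterable[str], rules: List[Rule]) -> Set[str]:
    """Return the atoms that hold; every atom not returned counts as false."""
    true_atoms: Set[str] = set(facts)
    # Stratum 0: rules without negation (these may derive abnormalities).
    positive = [r for r in rules if all(pol for _, pol in r[0])]
    # Stratum 1: rules with a negated literal, assumed to negate only atoms
    # already settled in stratum 0 (hypothetical simplification).
    negative = [r for r in rules if not all(pol for _, pol in r[0])]
    for stratum in (positive, negative):
        changed = True
        while changed:
            changed = False
            for body, head in stratum:
                fires = all((atom in true_atoms) == pol for atom, pol in body)
                if fires and head not in true_atoms:
                    true_atoms.add(head)
                    changed = True
    return true_atoms


# "If P and nothing exceptional is the case, then Q."
RULE: Rule = ([("pass_stats", True), ("ab", False)], "pass_others")
# Assumed reading of (4b): barely passing counts as an exceptional circumstance.
BARELY: Rule = ([("barely", True)], "ab")

# (4a): B-a denies P, so pass_stats is absent; Q has no other support.
model_4a = closed_world([], [RULE])
# Control case: P holds and no exception is mentioned, so Q is derived.
model_ok = closed_world(["pass_stats"], [RULE])
# (4b): P holds, but the abnormality blocks the rule.
model_4b = closed_world(["pass_stats", "barely"], [RULE, BARELY])

print("pass_others" in model_4a)  # False
print("pass_others" in model_ok)  # True
print("pass_others" in model_4b)  # False
```

Treating every underivable atom as false is what turns the rule into an equivalence: in the (4a) model, the absence of pass_stats leaves pass_others without support, which corresponds to the conclusion that he will not pass.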
The defeasible character of the inferences involved is brought home
further by the discourse (4a*), where the function of the utterance "But
he got a very good grade for the astrophysics course!" is precisely to highlight
a second defeasible rule: "if a student passes an astrophysics course
and nothing exceptional is the case, he can pass other courses as well". In
this case the second application of closed world reasoning fails, thus rendering
invalid the conclusion previously drawn. The circumstance that
conclusions from logical arguments may have to be withdrawn when new
information comes in may have reinforced the impression that organizing
the connection to the world is of minor importance in language use.
But in actual fact, these discourses are all about one's best guesses about
the state of the world. We conclude, then, that at least with respect to sentential
negation, the general framework of non-monotonic logic elegantly
captures the data in question, and the uniqueness implied in Verhagen's
claims about the need for a functional explanation is without support.
Note that non-monotonic logic is not intrinsically a framework for reasoning
in an intersubjective context at all: we find the same principles of
reasoning in other cognitive domains such as planning, hence their rationale
is not purely in cognitive coordination, leading to further doubts
about the foundational assumptions used.
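The way the continuation in (4a*) withdraws the earlier conclusion can be written out step by step. The following is a sketch under assumed notation, none of which appears in the original: P stands for passing the first statistics course, P' for passing the astrophysics course, Q for being able to pass other courses, and ab, ab' for the abnormalities attached to the two rules.

```latex
% Assumed labels: P, P', Q, ab, ab' as explained in the lead-in.
\begin{align*}
\text{(4a), rule:} \quad & P \land \lnot ab \rightarrow Q \\
\text{closed world on } ab\text{:} \quad & \lnot ab, \ \text{hence } P \rightarrow Q \\
\text{closed world on } Q\text{:} \quad & Q \leftrightarrow P \\
\text{utterance B-a:} \quad & \lnot P, \ \text{hence } \lnot Q \\[4pt]
\text{(4a*), second rule:} \quad & P' \land \lnot ab' \rightarrow Q \\
\text{closed world on } Q\text{:} \quad & Q \leftrightarrow (P \land \lnot ab) \lor (P' \land \lnot ab') \\
\text{with } \lnot P,\ P',\ \lnot ab'\text{:} \quad & \lnot Q \ \text{no longer follows; } Q \ \text{is derivable}
\end{align*}
```

On this reading, the second rule supplies a further sufficient condition for Q, so the equivalence between Q and P that licensed the step from not-P to not-Q is no longer available.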
Another example in the same vein is taken from Chapter 4, on discourse
connections. Consider Verhagen's discussion of "although" and "but"
on pp. 167–174. He mentions the following general explication of the
meaning of "although" (167): "p although q" means: (a) truth conditions: p
& q; (b) presupposition: q implies not-p. Here "presupposition" means that
if "q implies not-p" is not yet present in the discourse, it must be introduced
(presupposition accommodation). Verhagen correctly notes that if
"q implies not-p" is formalised as the material implication of classical
logic, (a) and (b) are in immediate contradiction, and then after some discussion
draws the following moral: "What is especially important to avoid
the derivation of contradictions, even if the defeasibility of generalizations
is recognized, is that a background mental space, distinct from that
of the speaker/writer, is invoked in which the shared topos is construed
as a basis for a causal inference" (168).
A formalisation in non-monotonic logic again shows that we can remain
agnostic about the necessity (and precise form) of mental space representations.
We shall provide representations for "although" and "but"
using the defeasible conditionals introduced above. These feature a conjunct
"nothing exceptional is the case", which we shall formalize here as
not-ab (where ab is a proposition letter indicating some abnormality):
"p although q" means: (a) truth conditions: p & q; (b) presupposition: q
& not-ab implies not-p.
"p but q" means: (a) truth conditions: p & q; (b) presupposition: p & not-ab
implies not-q.
In both cases (a) and (b) are consistent, and jointly entail the derivation
of an abnormality. Thus, if someone utters "p although q", he contributes
a variable for an abnormality to the discourse, which can be unified with
a concrete circumstance. E.g., "He failed his exam, although he worked
very hard. He was sick on the day of the exam." The second sentence is
read as an instantiation of the abnormality pointed at by the first sentence.
No special machinery for mental spaces needs to be adopted; it suffices
to appeal to general principles for discourse coherence such as the introduction
of variables to be unified with linguistic material.
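The consistency claim above can be checked mechanically with a truth table. The following Python sketch is only an illustration: the helper names implies and models are assumptions, and the conditionals are read as material implications over the three proposition letters p, q and ab.

```python
from itertools import product


def implies(a: bool, b: bool) -> bool:
    """Material implication of classical logic."""
    return (not a) or b


def models(constraint):
    """All valuations (p, q, ab) that satisfy the given constraint."""
    return [v for v in product([False, True], repeat=3) if constraint(*v)]


# Classical reading: (a) p & q together with (b) q implies not-p.
classical = models(lambda p, q, ab: p and q and implies(q, not p))

# Defeasible "although": (a) p & q with (b) q & not-ab implies not-p.
although = models(lambda p, q, ab: p and q and implies(q and not ab, not p))

# Defeasible "but": (a) p & q with (b) p & not-ab implies not-q.
but = models(lambda p, q, ab: p and q and implies(p and not ab, not q))

print(classical)  # []
print(although)   # [(True, True, True)]
print(but)        # [(True, True, True)]
```

The classical reading has no satisfying valuation, which is the contradiction Verhagen notes. Each defeasible reading has exactly one, and in it ab is true: (a) and (b) are jointly consistent and entail an abnormality.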
5. Cognitive significance
Before we continue with our discussion of linguistic matters and return to
the issue of sentential complementation in the next section, there is a
methodological point we want to raise: the use of formal representations
in cognitive linguistics, especially Verhagen's use of Fauconnier's theory
of mental spaces in explaining the function of negation. We presented a
formal analysis of Verhagen's examples involving negation in non-monotonic
logic, without first explaining Verhagen's own mental space
analysis. We did so because we have severe doubts as to the adequacy of
such analyses in a cognitive context. We fully agree that the most productive
way to do linguistics is to relate it to human cognition as a whole.
But what makes a particular piece of linguistic analysis also cognitive?
Let us pause to consider this important question in some detail. At the
outset of modern linguistics in the 1950s a demand was imposed on
theories of linguistic competence according to which such theories should
be explicit. That is, they should not rely on badly understood and
question-begging notions such as "understanding", "intending", or "grasping
the meaning". In practice, explicitness meant giving such psychological
processes a computational or algorithmic description.⁴ Adopting
this methodological decision, a given semantic analysis of a natural language
should employ representations that have well-defined formation
rules, and the mapping between syntactic and semantic representations
should be computationally transparent.
Note that a purely semantic analysis of a linguistic phenomenon can as
such be considered to be successful if it gets the truth conditions of sentences
and entailments between sentences in context right. Here, one
does not put any demands upon the semantic representations used except
that one can meaningfully speak of entailments between them. Although
this demand is by no means trivial, it does not yet suffice for explanatory
significance in the context of a study of human cognition. We do not wish
to imply that only pointing at a neural substrate suffices for a demonstration
of cognitive reality. Clearly, a given linguistic analysis can stand on its
own feet and does not need validation from neuroscience.⁵ Yet, the concepts
and entities used in abstract syntactic and semantic representations
must at least not be in conflict with known constraints on the processing
of these structures or their storage in long-term and working memory, for
example.⁶
The simple point we want to make here is that this integration of fields
of inquiry operating at different levels of abstraction (i.e., linguistic and
neurological) depends on the explicitness of the computational descriptions
involved. In particular, semantic representations need to be mathematically
definite enough to be used in algorithms. We have strong doubts
that this desideratum is met by Fauconnier's theory. The analysis of negation
presented above in terms of non-monotonic logic goes some way toward
fulfilling these desiderata, since, as is shown in Stenning and van
Lambalgen (2008), the proposed system has considerable cognitive significance,
including an appealing neural implementation.
6. The complementation construction
Let us now return to sentential complementation constructions such as
(1). Verhagens suggestion (Chapter 3) is that sentential complementation
is a special purpose construction that, again, intrinsically serves a coordi-
nation aim. Verhagen claims that (1), repeated here as (5), is fundamen-
tally dierent in structure from a construction like (6):
(5) George knew/saw/said that his opponent was closing in.
(6) George knew/saw/said something.
That is, it is wrong to construe (5) as a transitive construction on the
basis of a mere analogy with (6). In particular, he argues that the em-
bedded clause in (5) is not a syntactic constituent or verbal argument (p.
83). Rather, (5) is a construction in its own right, a holistic template
with irreducible sound and meaning properties (p. 79) that doesn't follow
from any general phrase-structural rules.
However, no structural analysis of the sentences in question is actually
provided in this chapter, and no definition of what it would be for the
that-clause to be a constituent is provided. Clearly, a structural analysis
is not ipso facto provided once certain functional claims are made: the
mechanisms underlying certain functions are a logically independent
issue. But standard tests for constituency suggest that we can question
the that-clause, as in (7), or elide it, as in (8):
(7) George saw/knew/said what?
(8) George saw/knew/said that his opponent was closing in, and Bill
saw/knew/said so too.
Verhagen's conclusion by contrast is rather exclusively derived from
claims about differences in discourse functions of (7) and (8), which we
claim is a logical error and fails to provide any independent evidence for
the functions used to an explanatory purpose.
In addition, a wrong opposition arises again. Let it be true that (5) in-
dicates a perspective in the matrix clause, and that a thought is being
perspectivized in the embedded one, as Verhagen argues. This observa-
tion appears fully consistent with the that-clause in (5) being a constituent
that is the complement of the matrix verb. To the extent that there is a
difference between (5) and (6) in the functional respects just noted
(although that difference is not obvious to us), it can follow
compositionally from the difference in the two complements of the matrix verb,
which after all differ in syntactic category and Case. Again, independent
evidence is needed for a difference in structure between (5) and (6),
evidence not simply predicated on the functionalist hypothesis made.
Contrary to the claims made in this chapter, a standard generative
constituent structure analysis of (5) would not proceed merely from an
intuited analogy or relatedness between (5) and (6) (as stated on p.
87). It would also not proceed by a top-down analysis (p. 82). On the
contrary, it would build such a structure from the bottom upwards, be-
ginning with the minimal assumption that saw and the CP in question
must be somehow merged with one another, giving rise to a structure of
the general form [X Y]. Assuming in addition to that minimal require-
ment that in human language, phrases are headed, one of X and Y will
have to be the head, H, which thus projects, with Y becoming its com-
plement or internal argument. The result is then as a whole predicated of
an external argument, Z (i.e., George). In this way we derive that the
common underlying structure of (5) and (6) is indeed [Z [X [Y]]], an anal-
ysis making the rather minimal assumptions that:
(i) human language is combinatorial (there is a recursive operation
merging constituents),
(ii) the organization of expressions is hierarchical (it contains phrases
over and above lexical items),
(iii) phrases are headed (Merge(X,Y) is of type X or else type Y), and
(iv) branching is binary (Merge takes two arguments).
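Assumptions (i)-(iv) can be made concrete in a few lines of code. The Python sketch below is only an illustration under stated assumptions, not a rendering of any published formalism: the names `Phrase`, `merge`, `label` and `bracket`, the placeholder labels "CP" and "DP", and the `head_on_left` flag (used here purely to mark which daughter projects) are all hypothetical and introduced for exposition.

```python
from dataclasses import dataclass
from typing import Union

# A syntactic object is a lexical item (a plain string) or a phrase.
# All names in this sketch are illustrative assumptions.
SyntacticObject = Union[str, "Phrase"]


@dataclass(frozen=True)
class Phrase:
    """Output of one merge step: exactly two daughters, one of them the head."""
    left: SyntacticObject
    right: SyntacticObject
    head_on_left: bool


def merge(x: SyntacticObject, y: SyntacticObject, head_on_left: bool = True) -> Phrase:
    """Binary combination; the result can itself be merged again (recursion)."""
    return Phrase(x, y, head_on_left)


def label(obj: SyntacticObject) -> str:
    """A phrase is of the type of its head: follow heads down to a lexical item."""
    if isinstance(obj, str):
        return obj
    return label(obj.left if obj.head_on_left else obj.right)


def bracket(obj: SyntacticObject) -> str:
    """Render the hierarchy in bracket notation."""
    if isinstance(obj, str):
        return obj
    return f"[{bracket(obj.left)} {bracket(obj.right)}]"


# (5): the verb merges with a clausal complement, then with the external argument.
clausal = merge("George", merge("saw", "CP"), head_on_left=False)
# (6): the same two steps with a nominal complement.
nominal = merge("George", merge("saw", "DP"), head_on_left=False)

print(bracket(clausal))  # [George [saw CP]]
print(bracket(nominal))  # [George [saw DP]]
print(label(clausal), label(nominal))  # saw saw
```

In this toy version `merge` never inspects the category of what it combines, so the clausal and the nominal case come out with the same shape and differ only in the complement. That is the sense in which the skeleton is shared between (5) and (6).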
This analysis moreover does not automatically assume the possibility of
generalizing over clausal and nominal structures: it does not refer to any
such constructions, which are not even visible for a minimal analysis that
appeals to abstract notions such as head, complement, internal argument,
and external argument, alone. So it also does not predict that in all
contexts nominal arguments can be inserted where the putative clausal
arguments can be, which is the prediction that Verhagen (pp. 83–85)
provides evidence against.
It is neither clear to us why double object constructions like "They
warned us that the profit would turn out lower" would support Verhagen's
viewpoint (see p. 86), nor why inversely linked predications of the type
in (7) and (8) do:
(7) [The danger] is [that the middle class feels alienated].
(8) [That the middle class feels alienated] is [the danger].
We here briefly discuss only the latter case. The problem posed by
Verhagen is that more than a hundred years of analysis could not settle whether
the that-clause in (7)–(8) is a subject or predicate. But perhaps this is a
wrong dilemma. It may precisely be a feature of these constructions that
they are organized around a symmetrical predicational relation between
two XPs in a Small Clause (SC) as in (9), in a way that either of them
can raise to a sentence-subject position in front of the auxiliary, resulting
in either (10) or (11) (see Moro 2000):
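(9) [SC [CP that . . . alienated] [DP the danger]]
(10) SUBJECT [BE [Small Clause [The danger] [that the middle class feels alienated]]]
     [diagram: arrow marking raising of one XP to SUBJECT]
(11) SUBJECT [BE [Small Clause [The danger] [that the middle class feels alienated]]]
     [diagram: arrow marking raising of one XP to SUBJECT]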
Neither the CP nor the DP in (9) is the head in the Small Clause (or
projects), which explains their symmetry, and potentially the fact that
either of them can raise out of the Small Clause.
7. Constructions as such
Above we appealed to a minimal computational machinery in terms of bi-
nary Merge, which led us to the scheme [Z [X [Y]]]. An argument for using
a minimal phrase structural analysis generated by a recursive operation
Merge is that we need some account of the recursive machinery of lan-
guage (unless recursivity is denied, which it is not in the present volume).
If one assumes a minimal conception to account for recursive structure
building (Merge on its current minimalist construal is such a candidate,
see Hinzen 2006), the question whether there is a complementation
construction and whether or not it is identical to a direct object
construction (p. 86) cannot even be formulated. Merge is too primitive to be
sensitive to such categorial distinctions, giving us a much simpler vision of
the linguistic system's basic computations. The question is whether this is
a bad or a good result.
The claimed achievement of the Principles and Parameters framework,
incorporated into Minimalism, was that constructions as we can perceive
them in languages at a descriptive level can be shown to follow from
more abstract generative principles which are neither language-specific
nor construction-specific. Thus, what we called the complementation
construction above is simply the overt consequence of Merge plus the
fact that some heads subcategorize for an object that is semantically a
proposition. This, if feasible, is a desirable view, we contend, because the
abstract generative principles in question, if indeed minimal, have to be
part of anyone's account; and because having constructions as merely
the overt result of deeper, fewer, and more abstract structure-building
operations is both explanatorily beneficial and in no conflict with the fact
that they take up distinctive discourse functions when used. From an
evolutionary viewpoint, too, a minimal and construction-free grammar (that
remains descriptively adequate) should be welcomed: it allows us to
accomplish more (a great variety of linguistic constructions) with less (minimal
structuring principles cutting across constructions), which is arguably in
line with general principles of economy and conservativity in biological
evolution.
8. Perspective-taking
As noted, Verhagen doesn't deny recursion, but places it outside language,
in perspective-taking, which as such, he argues, is inherently recursive
(p. 98). That sentential complementation constructions are paradigmati-
cally recursive is on his view only a sign for the fact that they are the
grammaticalization of this basic human cognitive capacity. The problem
with this account however is that to our knowledge there is no evidence
for recursive perspective-taking outside human language; ipso facto we
cannot invoke perspective-taking to explain language, and the direction
of explanation might precisely have to be reversed, unless there is a com-
mon cause of both. Furthermore, taking a perspective on something
can, although it need not, involve what philosophers traditionally have
called a propositional attitude. It need not, since it has been observed in
false belief tasks that while a child may take the wrong perspective
(namely its own) in propositional terms, it takes the right perspective in
behavioural terms, e.g., by looking at the right spot (Clements and Perner
1994). That is, the notion of perspective as such is consistent with both
propositional and non-propositional mental representations; it doesn't
explain why it should be the case that we take propositional perspectives or
why such forms of thought exist. Although there are some claims for
propositionality in non-humans (Seyfarth 2005), there are also strong
ones against it (Terrace 2005), and the notion of propositionality invoked
in the former claims is too broad to illuminate the specifics of human
clause structure and the propositional meanings that sentential construc-
tions have. There is also evidence that the understanding of sentential
complementation is actually itself an instrumental causal factor in the
genesis of mind-reading and how the child forms explicit propositional
representations of false beliefs, a task that is not mastered before senten-
tial complementation itself is (De Villiers 2005). All of this indicates that
Verhagen's bold attempt to explain language from social cognition may
well, at least partially, have the cart before the horse.⁷
9. Meaning
We close with a general observation on the philosophy of meaning as-
sumed in CoI. If the meaning of linguistic expressions is inherently and
necessarily linked to their discourse purpose, we face consequences such
as that an assertion of "There are seats in this room" implies a
presupposition having to do with the seats being comfortable, as Verhagen asserts
(15). But obviously, there can be assertions about seats in rooms where
these seats fail to be comfortable. Hence, the implicature is a mere con-
textual one, and ipso facto not an inherent (non-contextual) aspect of the
expression in question. Is the claim the radical one that there are no such
inherent aspects of the meaning of an expression at all? If it isn't, a
non-contextual notion of linguistic meaning as determined by linguistic form
needs to be preserved on which the compositional process of meaning
determination would be based. If it is, that would entail giving up the
compositionality of meaning, which depends on the availability of a
context-independent notion of meaning that is determined by the syntac-
tic part-whole structure of the expression in question (see Fodor and
Lepore 2002). We may be wary of giving up this widely endorsed con-
straint, as it seems needed to explain the forms of recursivity that lan-
guage exhibits. Note that to whatever extent we endorse compositionality
as a principle for the generation of meaning, meaning will not be conven-
tional: meaning will follow by necessity from algebraic laws of phrasal
composition, in much the way that 5 follows from composing 2 and 3 by
means of the operation +.
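The arithmetic analogy can be spelled out as a minimal sketch. The Python fragment below is only an illustration, assuming a toy expression format (nested tuples) and hypothetical names (`meaning`, `OPERATIONS`) that are not taken from the works discussed: the value of a complex expression is computed solely from the values of its parts and the operation that combines them, with no reference to context or use.

```python
import operator
from typing import Tuple, Union

# Toy syntax (an assumption for illustration): an expression is an integer
# or a triple (operation symbol, left part, right part).
Expr = Union[int, Tuple[str, "Expr", "Expr"]]

# Each operation symbol is paired once and for all with a function.
OPERATIONS = {"+": operator.add, "*": operator.mul}


def meaning(expr: Expr) -> int:
    """Value of the whole, determined by the values of the parts
    and the way they are combined."""
    if isinstance(expr, int):
        return expr
    symbol, left, right = expr
    return OPERATIONS[symbol](meaning(left), meaning(right))


print(meaning(("+", 2, 3)))            # 5
print(meaning(("*", ("+", 2, 3), 4)))  # 20
```

Once the values of 2, 3 and the combining operation are fixed, the value of the whole is fixed as well, and the same holds at every level of embedding.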
Note, also, that if the meaning of a sentence is spelled out by appeal to
its argumentative consequences, it will be the case that there is nothing to
rationally explain why we endorse the inferences we do. If we want to jus-
tify moving from A&B to A, say (or claim classical validity for this
move), part of what we will appeal to is the meaning of & (and our
grasp of that meaning). We couldn't justify the classical rule of
conjunction elimination, say, by the existence of a causal mechanism carrying us
from premise to conclusion, or the desirability of the result, or the force
of a drug that we take. By consequence, an independent notion of mean-
ing is needed, even if an argumentation-oriented perspective is adopted,
and meaning can't consist in argumentative consequences alone.
10. Conclusions
Summarizing our main claims, we believe that while the data that CoI un-
earths are rich and certainly need explanation, they have an explanation
in more traditional formal semantic or syntactic frameworks which are
implicitly rejected in CoI. In short, the data do not support either the
analyses provided or the foundational assumptions about language
endorsed. Again, we see no conflict between older representational or
disinterested perspectives on the use of language, and observations on the
discursive functions that linguistic expressions may serve. We also see a
danger in one-sided perspectives on language that leave out some of its
distinctive features. Coordination in discourse and manipulative commu-
nication are very clearly vital functions of language, and taking this as
our starting point many important phenomena of language may come to
the surface: we fully concur with Verhagen on this issue. But their expla-
nation will be another question.
Received 31 January 2007 Durham University
Revision received 21 March 2007 University of Amsterdam
Notes
1. Figure 1.2 on p. 7, as one referee notes, may suggest that Verhagen recognizes both
factors. But the claim made is that special foundational significance attaches to the former
function and that negation and complementation illustrate this, and we dispute this.
2. Since the first two points are important in what follows, it is worthwhile to quote
Verhagen directly: "[T]he linguistically most relevant properties of negation, the ones that it
shares with other elements in the same paradigmatic class, are purely cognitive
operations" (p. 57).
3. Here we follow the analysis of defeasible conditionals given in Stenning and van Lam-
balgen (2006). The interested reader is referred to this paper for a fully formal treatment
of phenomena related to the ones discussed here.
4. Algorithmic is taken in a wide sense here, and also includes computations in neural
networks.
5. Empirical linguistic arguments for a universal argument-adjunct distinction, for
example, are not empirically invalid if we can't link or translate the primitives used in the
analysis to primitives of a neurobiological description.
6. Together with constraints flowing in this particular direction (Dąbrowska 2004), it is an
equally reasonable proposal at this point that linguistics may and should impose
constraints on neuroscience. That is, explicit linguistic proposals for computational
processes underlying language should be the basis for evaluations of (and predictions for)
neuroscientific experimentation (see, e.g., Stockall and Marantz 2006 and Poeppel and
Embick 2005 for such a perspective for the case of syntax, and Baggio and van Lambalgen
2007 for the case of semantics).
7. One referee claims that it is no objection to Verhagen that there is no evidence for
recursive perspective-taking outside human language, since Verhagen precisely claims
that perspective taking is what makes humans differ from other animals. The point
however is whether it explains language, and recursion therein. For this it needs to
have the relevant formal properties (propositionality, recursivity) independently of
language.
References
Baggio, Giosuè and Michiel van Lambalgen
2007 The processing consequences of the imperfective paradox. Journal of
Semantics 24, 307–330.
Clements, W. A., and Josef Perner
1994 Implicit understanding of belief. Cognitive Development 9, 377–395.
Dąbrowska, Ewa
2004 Language, Mind and Brain. Some Psychological and Neurological
Constraints on Theories of Grammar. Edinburgh University Press.
De Villiers, Jill
2005 Can language acquisition give children a point of view? In Astington, J. W.
and J. A. Baird (eds.): Why Language Matters for Theory of Mind, Oxford
University Press, 186–219.
Fodor, Jerry, and Ernie Lepore
2002 The Compositionality Papers. Oxford: Oxford University Press.
Hauser, Mark D.
1996 The Evolution of Communication, Cambridge, MA: MIT Press.
Hinzen, Wolfram
2006 Mind Design and Minimal Syntax, Oxford: Oxford University Press.
Moro, Andrea
2000 Dynamic Antisymmetry, Cambridge, MA: MIT Press.
Poeppel, David and David Embick
2005 The relation between linguistics and neuroscience. In Cutler, A. (ed.),
Twenty-First Century Psycholinguistics: Four Cornerstones. Lawrence
Erlbaum.
Seyfarth, Robert
2005 Primate social cognition and the origins of language, Trends in Cognitive
Sciences 9, 264–266.
Skinner, B. F.
1957 Verbal Behavior. New York: Appleton-Century-Crofts.
Stenning, Keith and Michiel van Lambalgen
2006 Semantic interpretation as computation in nonmonotonic logic, Cognitive
Science 29 (2006), 919–960.
2008 Human Reasoning and Cognitive Science. Cambridge: MIT Press.
Stockall, Linnea, and Alec Marantz
2006 A single route, full decomposition model of morphological complexity:
MEG evidence, The Mental Lexicon 1:1.
Terrace, Herbert
2005 Metacognition and the evolution of language, in Terrace, H. and Metcalfe
(eds.), The Missing Link in Cognition. Oxford University Press, 84–115.
Tomasello, Michael
1999 The Cultural Origins of Human Cognition. Cambridge: Harvard University
Press.
2003 Constructing a Language: A Usage-Based Theory of Language Acquisition.
Cambridge: Harvard University Press.
van Lambalgen, Michiel, and Fritz Hamm
2004 The Proper Treatment of Events, Blackwell.
Verhagen, Arie
2005 Constructions of Intersubjectivity. Oxford: Oxford University Press.
Intersubjectivity and explanation
in linguistics: A reply to Hinzen
and van Lambalgen
ARIE VERHAGEN*
1. Introduction
Let me start by saying that I very much appreciate both the effort that
Hinzen and Van Lambalgen (hereafter, H&L) have put into commenting
on Constructions of Intersubjectivity (hereafter, CoI), and their comments
as such. It is important for all cognitive disciplines studying language that
representatives from different schools of thought try to address each
other's work, in terms of both results and foundations. We may not reach
agreement as a result of a discussion, but it will still be helpful in
clarifying matters for ourselves and for other interested scholars, and thus for
the future development of our common field of study. This is true even
if the divide is deep, which is the case here in a number of respects, as
H&L indicate themselves.
Another important preliminary remark concerns the nature and scope
of our differences. Philosophically they are certainly far reaching, but
from an empirical point of view it is useful to notice that H&L do not
present counterexamples to the actual linguistic analyses presented in
CoI. Rather, their main point is that such analyses can also be provided
in other frameworks, which they label "more traditional" than cognitive
linguistics, and which should in their view be preferred for other than
empirical reasons, having more to do with general ideas about concepts such
as "meaning", "communication", "grammar", etc., and the way these
relate to even more comprehensive concepts such as "evolution" or
"language". Below, I will actually dispute that H&L's comments show that
the alternative, non-cognitive, frameworks provide these explanations
(and suggest that they are not forthcoming either), but it is good to note
at the start that their own comments do not concern the empirical claims
of CoI. In fact, in my own view, our main difference concerns the question
what may count as an explanation in the analysis of linguistic phenomena.
Finally, as to the organization of this reply, I will not follow H&L's
comments step by step, as this would lead me to repeat myself too much.
Cognitive Linguistics 19–1 (2008), 125–143
DOI 10.1515/COG.2008.007
0936–5907/08/0019–0125
© Walter de Gruyter
Instead, I will first concentrate on the notion of meaning, addressing
mainly sections 2, 3, and 9 of H&L (section 2 below); then I will look at
the grammar of negation and argue that the alternative analysis H&L
suggest is linguistically unmotivated, which is partly due to them leaving
out some pieces that constitute important components of the argumentation
in CoI; it is at this point that the difference in what should be allowed
to count as an explanation in linguistic analysis becomes most concrete.
In this section (3), I will also deal with H&L's remarks about mental
spaces, cognitive significance (their section 5), and formalization.
Section 4 concerns H&L's sections 6–8, dealing with complementation,
recursion, and some basic assumptions about grammatical structure.
Section 5 concludes this reply.
2. What do we mean by meaning?
Perhaps the most baffling passage for me to read in H&L's comments was
in the second paragraph of their Section 3. They first summarize the
general programme of CoI: to demonstrate that the specific human ability to
manage perspectives is systematically reflected in the meanings of several
grammatical constructions, in the sense that these meanings are often
related to the management of such perspectives (what I call intersubjective
cognitive coordination) rather than to describing the world (specifying
an object of conceptualization in some way). What baffled me was
that they immediately add to this: "which entails that semantics cannot
be understood as serving both functions simultaneously" (and then they
set out to argue that this is a bad idea). How could it be that they see
this as a core idea of CoI, while evidence against it is abundantly present
in the book? Specifically, the first section (pp. 210–212) of the Concluding
Remarks is entitled "Not everything is intersubjectivity (although
intersubjectivity is widespread)", and it refers back to parts of the book where
the meaning of different items was claimed to involve both the objective
and the intersubjective level of conceptualization (cf. also CoI section
1.3, esp. p. 18). Moreover: why would it be an entailment? There must
be something that I missed, and I assume it is to be found in what H&L
conceive of as meaning, and hence as semantics.
H&L devote a separate section to meaning, but the points they make
there are closely related to some they make at the beginning. In section
9, they contest the proposal that evoking inferences is part of the meaning
of linguistic expressions, and defend a context-independent notion of
meaning; in section 2, they oppose an argumentative view of language
use (their picture of this view is a bit of a straw man; see the end of this
section) to the romanticist view that language is used for the free and
creative expression of thought (construed as reference, representation
or the assertion of truth), claiming that the latter function, unlike the
former, is crucial for understanding what makes language differ from
animal communication systems. We can safely equate these two
oppositions, since argumentative in the Ducrot-sense adopted in CoI means
evoking inferences (through associated topoi, or defeasible rules),
and the context-independent meaning, as explicated by H&L, consists in
the contribution of a linguistic (or logical) symbol to the reference or the
truth conditions of an expression containing the symbol.
Just how closely these two oppositions are connected also comes out in
H&L's discussion of Ducrot's example of the use of seats, used in CoI to
elucidate and specify the idea of argumentativity: saying "There are
seats in this room" invites the addressee to (i.a.) ascribe a certain positive
degree of comfort to the room under discussion. H&L write: "But
obviously, there can be assertions about seats in rooms where these seats fail
to be comfortable. Hence [my italics], the implicature is a mere contextual
one, and ipso facto not an inherent [italics original] (non-contextual)
aspect of the expression." The implicit premise, necessary to complete this
line of reasoning, can only be: "If an aspect of the interpretation of an
expression is not truth-conditional (does not have to represent something
in the world of which the expression is predicated), then this aspect is not
an inherent aspect of the meaning of the expression, but a contextual
one." First of all, this begs the question, the point of dispute precisely
being how linguistic meaning should be construed: as (strictly)
truth-conditional or as (at least also) argumentative. So in principle, we could
stop the debate here, as this basic point of H&L contains a fatal fallacy.
However, I find it even more important to note that H&L overlook the
fact that their observation has actually been used as an argument for the
argumentative view (cf. CoI 11, and the Ducrot reference cited there).
The point is that the utterance "There are seats in this room" has its
argumentative value regardless of the actual degree of comfort, or lack
thereof, of the seats in the room under discussion (the only condition is that
the language users mutually share the idea that rooms with seats are
normally more comfortable than rooms without). This is precisely the point
that explains why the statement that the seats are uncomfortable can only
be connected to this utterance by means of an adversative connective,
e.g., "but", and that something like "and moreover" is incongruent. Assuming,
for the sake of the argument, that it is somehow established as true that
the seats in a certain room are not exactly comfortable, this still does not
make the text "There are seats in this room, and moreover they are
uncomfortable" a coherent one. If we want to express, i.e., represent
linguistically, both the presence of seats and their lack of comfort, then we have
to mark this as contrastive, and that is what makes a linguist, whose job
is to account for the use and distribution of linguistic expressions and
their constituent parts, conclude that the argumentative character is in-
herent in the linguistic elements involved.
H&L do say that semantics should account for both inherent and
contextual aspects of linguistic expressions. But they equate these two
notions with truth and argumentativity, respectively, and then also
with the sentence and discourse levels (their Section 3). So according to
H&L, the following 1-to-1 relationships hold:
a) Inherent meaning : descriptive : sentence level (and presumably
below)
b) Contextual meaning : argumentative/inferential : discourse level
It seems to be this relatively implicit (but contestable and contested¹)
view of meaning and the organization of semantic description that makes
H&L conclude that the CoI-view of linguistic meaning as including
aspects of discourse and argumentation gives up the possibility to account
for relationships between language and the world. Not only do they rst
implicitly identify inherent meaning with descriptive meaning, thus
begging the question, they moreover connect descriptive meaning espe-
cially to the sentence level. Since sentence semantics presumably in their
view precedes discourse and inferential semantics (sentences being taken
as the building blocks of discourse), it follows from considering some in-
ferential and discourse meaning as inherent that there is no possibility
to account for correspondences between language and the world. In any
case, this is the only way in which I can make any sense at all of their
statement.
But, of course, nothing of this kind actually follows from the basic as-
sumptions of CoI, or cognitive linguistics in general. It is knowledge of
shared (i.e., cultural) cognitive models that is directly evoked by linguistic
elements, not information about the world; but some of the inferences
that knowing these models allows us to make, do involve the world.
Thus, the primary meaning of beautiful is to express a positive evaluation,
not to give a description of some sort (consider the task of specifying the
truth conditions for H&L's example of beautiful perfume . . .); knowing
the culture, and especially having some relevant experience, allows many
language users to make some inferences about actual properties of the
perfume involved. But it is not necessary to make such descriptive
inferences, and a person not (capable of) making them can still understand the
utterance.
Another important point about conceptions of meaning relates to the
role of convention. Section 9 of H&L contains many clauses of which
the noun meaning is a part, but it is not at all clear that it can be used in
the same sense in all these statements; in other words, H&L do not seem
to be aware of, or at least they do not at all worry about, a possible
polysemy of the term meaning, which might affect the contents and
consequences of their statements. They object to the philosophy of meaning
they think they find in CoI, but do not explicate what specific sense of
meaning they mean. They state one point of their own position, in
relation to compositionality, as: "meaning will not be conventional: meaning
will follow by necessity from algebraic laws", etc. But in a context
like this (leaving aside the issue whether compositionality is indeed to be
viewed as an algebraic phenomenon, independent of a particular cognitive
system), meaning does not have the same sense as in, for example,
"The meaning of the word banana is: a category of fruit with (prototypically)
characteristics X, Y, Z". The latter involves a relation between a
sound and a concept that is conventional; banana means what it does
because speakers of English mutually share knowledge of the rules for
the proper use of the word. So in all larger expressions, the meaning of
the whole is partly conventional, because of the words; moreover, the
balance between conventionality and compositionality is not fixed (consider
banana republic), and there are even complete sentences with a meaning
that is mostly a matter of convention (An apple never falls far from the
tree). This is very elementary linguistics, of course. The basic hypothesis
of CoI, about intersubjectivity being a prominent aspect of meaning, is
explicitly stated in terms of the meanings of linguistic symbols (words
and constructions) (p. 4), i.e., conventional signs. The claim is that
intersubjectivity is so important that several linguistic elements, especially a
number of grammatical ones, are conventional instruments for
intersubjective management (and that they have not sufficiently been recognized
as such in the past). Nothing in the argumentation for this point hinges
on a view of compositionality, which is an important, but independent
issue. But H&L keep talking about (philosophy of) meaning as if it were
a unitary concept, and then present compositionality of meaning as an
argument against conventionality of meaning. It will be clear that this
is simply completely beside the point. Moreover, if all the senses of
meaning are to be subsumed under one philosophy of meaning, this
philosophy is never going to be anywhere near coherent, so of little explanatory
value.²
As a final remark on meaning, a word on H&L's terminology in
relation to very general scientific and philosophical commitments. In their
attempt to challenge the idea that argumentation and intersubjectivity
are inherent aspects of linguistic meaning, H&L use "manipulation" and
"control" (alluding to behaviourism) as terms for the function of
language use as viewed in CoI, while CoI itself uses "management and
assessment" and "cognitive coordination". Manipulation normally goes
against the interests of the receiver, and especially: without the receiver
recognizing the intentions of the sender (usually, it involves deceit). The
point of the use of argumentation in CoI is precisely to simultaneously
express similarity to animal communication (it is an attempt to influence),
and a difference: it is an attempt to convince, i.e., to influence the
receiver's decision making process, by (i.a.) displaying one's communicative
intention.³ Of course, it is true that we can and do sometimes use
language, and our brains, to ponder the truth of something, just as it is true
that we can and do sometimes use our legs to run for fun or in an athletics
competition, that we can and do use our brains to play chess and
watch the stars, etc. But focussing on these kinds of uses is not going to
get us very far in understanding how the features involved (legs, brains,
language, etc.) fit into the natural world, that is: in explaining them. The
challenge is precisely to develop hypotheses, maximally constrained by
what we know about evolution and communication in general, about the
way the human communication system also got to be usable for some
functions, such as reference and description, for which it was not, in all
probability, originally an adaptation (cf. Verhagen forthcoming a).
3. Negation and connectives: Interaction between grammatical items
and its explanation
For negation, the point H&L try to make is that a more traditional
non-monotonic logical approach, enriched with clauses that introduce
the possibility of exceptions, can account for the same observations and
generalizations as CoI without introducing the notion of mental spaces;
if that were true, then their analysis would be simpler (using at least one
theoretical construct less than mine). Moreover, they have objections
against this construct as they have doubts about its cognitive and formal
status (see the end of section 3.2 for some remarks on this last point).
3.1. Exception clauses and topoi
As H&L notice, their use of exception clauses runs parallel to the use
in CoI of Ducrot's concept of topoi. The general template of the latter
is "If P, then normally Q"; the template for H&L's defeasible rules is "If
P and nothing exceptional is the case, then Q". Indeed, for the cases they
discuss, their analysis produces the same account of inferences associated
with negative sentences as the one in CoI; the descriptive adequacy
of their account is thus not better than CoI's, so the approaches might
be considered notational variants. But one important question is: How
about cases they do not discuss, but which are part of the account in
CoI ? Does their analysis generalize to these cases? This amounts to a
question of explanatory power; it will be taken up in section 3.2. Another
question, also an issue of explanation, is: Does their analysis classify ele-
ments into categories that make sense linguistically? In other words: Does
their characterization of the semantics fit the distribution of the linguistic
elements involved? This is the issue for the remainder of this section.
In H&L's analysis, barely indicates the existence of an exception; for
example, in "He barely passed his first statistics course", barely indicates
that the passing was abnormal, so the clause "nothing exceptional is the
case" is not satisfied, and therefore the subsequent derivation of a relevant
inference Q (e.g., "he can pass other courses as well") is blocked.⁴
The same result is produced in CoI by the assumption that barely invalidates
the applicability of topoi associated with the content of the sentence
(the performance was so minimal that one cannot draw conclusions that
one would otherwise draw from the fact that he passed). First, it seems
to me that there may be a serious conceptual problem with the
exception-approach. "Exception" does not seem to be a primitive notion; it
presupposes the notion of a rule, whereas the reverse does not hold. Rules
(including those about what is normally, not necessarily always, the case)
can be experientially based generalizations (e.g., in terms of frequency:
"What happens most of the time?"), but not the other way around. Indeed,
exceptions must be defined in terms of rules (negatively), as they
are not themselves generalizations (what makes something an exception
is the background rule). Thus it seems to me that H&L's use of "ab" as a
proposition letter indicating some abnormality, as if it were something
unanalysable, may mask the possibility that their analysis ultimately
reduces to mine.
Second, it is clear that the exception-approach to barely implies that it
belongs to a different class of linguistic elements than not: the first belongs
to the "abnormality indicators", the second does not. Here we reach a
fundamental difference between the logical approach of H&L and the
linguistic one of CoI. The initial reason for reconsidering the semantics of
barely in argumentative terms was that both not and barely license the
let alone construction, i.e., their distribution is similar in a linguistically
important way (grammatical behaviour). What the analysis of CoI shows
is that this grammatical behaviour parallels the inferential (and discourse
connecting) properties of both elements (not the real-world relations), and
can thus be explained by assuming that the grammatical properties are
determined by argumentative rather than "real-world" aspects of meaning.
By putting not and barely in semantically different categories of
elements, H&L simply give up this explanatory power. On the basis of
their account, if the grammatical behaviour of elements reflects their
meaning, then one should expect not and barely to be grammatically
very different, but in fact they are not; taking the programmatic idea of
"language as a window on the mind" seriously should precisely lead one
to take the intersubjective analysis seriously, I maintain. As I said, I
suspect the ultimate source of H&L overlooking this point is that their
basic concerns are logical, rather than linguistic.⁵
3.2. Explanatory scope
The last comments in the previous section already indicate that H&L do
not always take into account that an important part of the argumentation
in CoI involves connections between different parts of the linguistic
system, and that they focus their semantic analysis only on certain words
and constructions in isolation. In fact, this is a more general tendency
that severely undermines the power of their criticism and their alternative.
For one thing, they do not discuss how their analysis of the defeasibility
of the argumentative implications of not and barely can be
applied/extended to almost, while this is, again, an integral part of the
argumentation in CoI. The importance of the point can be demonstrated
with example (1) (H&L's 4a*), showing the defeasibility of (at least some
of the) argumentative inferences associated with a negated sentence:
(1) a A. Do you think our son will pass his courses this term?
b B-a. Well, he did not pass his first statistics course.
c A. But he got a very good grade for the astrophysics course!
Just as I invoked a wide-spread cultural model ("Statistics is a hard
subject"), H&L invoke another one in the form of astrophysics to demonstrate
that in a next move in the discourse, the initial suggestion "He is
not going to pass" can be reversed again ("If he is smart enough to get a
good grade for astrophysics, he may still pass"). Now the point is that the
same reversal can also be established by certain sentences that contain the
operator almost:
(2) (1)a–b
c A. But he almost passed the astrophysics course!
In this case, A's utterance entails the negation of "He passed astrophysics":
in actual fact, the student in question passed neither statistics nor
astrophysics. But by means of almost, speaker A construes the latter as an
argument for the conclusion that he might still pass the term, i.e., in the same
way as the strongly positive statement in (1)c. This is straightforwardly
accounted for in CoI (just like barely is a relatively weak negative opera-
tor, almost is a relatively weak positive operator on the argumentative
orientation of an utterance), which also explains why almost does not li-
cense the let alone construction (the argument from linguistic distribution
again), despite the entailment of a negation.
How should this be accounted for in an exception-approach? First of
all, H&L do not themselves indicate what such a generalization would
look like. Perhaps we should say that almost also marks the event described
in the sentence as an exception, so that otherwise licensed inferences
cannot be derived? That would clearly not suffice, as it would then
be said to have the same meaning as barely. In fact, we can now see that
the characterization of barely as an exception-indicator is insufficient:
minimally, the direction, i.e., negative, of the inferences involved should
be included in this characterization. So suppose we characterize "almost P"
as "not-P, and something abnormal is the case". Even though this might
seem better, I don't think it is. The problem is to derive positive inferences
from the negative statement. Recall that the general form of the defeasible
rule, according to H&L, is "If (P and nothing abnormal), then
Q". When we now have, due to the presence of almost, "not-P" as a minor
premise, then it seems to me that nothing can be derived anymore.
The conjunction of the rule (a) "If (P & nothing is abnormal), then Q"
with (b) "something abnormal is the case" can lead to the derivation of
(c) "not-Q", even if P is the case (with closed world reasoning); but the
conjunction of the same rule (a) with (b) "not-P and something abnormal
is the case" cannot produce the derivation of (c) "Q", as "not-P" by itself
contradicts the antecedent clause of the rule (P & nothing is abnormal).
In fact, it seems to me that in this case, too, "not-Q" would have to be
derived, given the falsity of the antecedent clause. Thus, I conclude that
there are good grounds for claiming that the exception-approach does
not generalize to almost, and in that sense is also low in explanatory
power, while almost fits naturally into the argumentative framework of
CoI, as a weak positive argumentative operator, complementary to the
negative barely.
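The derivation problem described above can be illustrated with a minimal sketch. It assumes, as a simplification that is not taken from H&L's formal system, that closed world reasoning amounts to treating the single defeasible rule as the only way of deriving Q, so that Q holds exactly when the antecedent "P and nothing abnormal" holds. The function name derive_q and the Boolean encoding of the three cases are hypothetical and serve illustration only.

```python
def derive_q(p: bool, abnormal: bool) -> bool:
    """Closed-world reading of 'If (P and nothing abnormal), then Q'.

    Assumption for this sketch: the rule is the only source of Q,
    so Q is derived exactly when its antecedent is satisfied.
    """
    return p and not abnormal


# Hypothetical encodings of the three cases discussed in the text:
# each label maps to a pair (P holds?, something abnormal is the case?).
cases = {
    "plain P": (True, False),
    "barely P": (True, True),    # P, but something abnormal is the case
    "almost P": (False, True),   # not-P, and something abnormal is the case
}

results = {label: derive_q(p, ab) for label, (p, ab) in cases.items()}

for label, q in results.items():
    print(f"{label}: {'Q' if q else 'not-Q'}")
```

Under these assumptions, "almost P" patterns with "barely P" in yielding not-Q, which is the point made above: the positive inference that almost licenses in discourse cannot be obtained from the rule alone.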
A similar conclusion holds for H&L's discussion of the concessive
connective although. While their reanalysis in terms of the
exception-approach can provide an adequate semantic characterization of sentences
of the type "p although q" in isolation from the rest of the linguistic system,
they do not show that it accounts for interactions with other elements, in
particular negation. Precisely this interaction is the key part in the
argumentation in CoI: although cannot occur in the scope of negation, i.e.,
"not p although q" must be understood as "(not p) although q"; it cannot
be interpreted as "not (p although q)", while its positive (causal)
counterpart because can occur in the scope of negation: "not p because q"
can in principle be read both as "(not p) because q" and as "not (p because
q)". In this case, I will refrain from elaborating H&L's approach
myself to see how it might work, and simply observe that this is what
they actually should have done in order to make their point, but they
haven't.
The analysis in CoI of these phenomena crucially rests on the assump-
tion that sentential negation introduces a separate representation (men-
tal space) of the viewpoint that the speaker of the present sentence op-
poses. This point is also not mentioned by H&L, who simply dismiss
mental spaces as if they were only used in the analysis of negative sen-
tences as such. On the contrary, chapter 4 of CoI shows that a mental
space analysis of negation provides an explanation not only of the combi-
natorial restrictions between negation and although, but also of a number
of such restrictions between negation and causal connectives, some of
which exhibit scope restrictions similar to although. Moreover, the mental
space analysis of negation is motivated in chapter 2 (as it is in the mental
space literature in general) independently of the argumentative analysis of
negation, viz. in terms of the interpretation of discourse anaphors follow-
ing negative sentences, and the connective On the contrary. The greatest
explanatory power of the mental space approach, according to CoI, lies
in the possibility of this single idea to unify the analysis of the linguistic
distribution of a number of phenomena. While it may be possible to con-
struct an analysis of simple although-sentences without special machin-
ery for mental spaces, this analysis again does not naturally generalize
to cases of interaction with other phenomena, as manifested in distribu-
tional and interpretive restrictions, a fundamental concern for a linguist
with the ambition to provide explanations. But again, H&L's concerns
seem to be located more in the dimension of logical rather than linguistic
analysis.
It is in this context that H&L dedicate a separate section, with the title
"Cognitive significance", to the status of the theoretical construct of
mental spaces, which in their view is rather dubious. To many cognitive
scientists, this may appear somewhat puzzling, because the basic
idea of mental spaces seems to be just a specific formulation of the
fundamental human capacity of perspective taking and perspective shifting: to
entertain the same object or idea in different ways, from different angles,
etc., i.e., to combine difference and sameness, by means of partitioned
representations (Dinsmore 1991). As it turns out, however, what
H&L mean is that it is not (to their knowledge and/or standards)
sufficiently formalized. In their view, and invoking Chomsky's earliest work,
the most important condition for a linguistic analysis to be called a
cognitive one, is to be explicit, which they immediately identify with "to
be given a computational or algorithmic description". Firstly, notice
that they move, very quickly, from what might be a necessary condition
to a necessary-and-sufficient one. Secondly, it is quite strange, in view of
the history of science (including recent cognitive science), to read that the
integration of fields of inquiry should depend on the explicitness of
computational descriptions. In actual fact, the possibility of
operationalising generalizations obtained by one kind of research method in terms
of another seems at least as important, and to my mind much more common
("If your distributional analysis says that A and B are basically the
same/different, and if this is psychologically real, then the results of my
reaction time/fMRI-measurements/etc. should look like this: ..."). But
in the mainstream of the generative enterprise, the focus has been on
developing formalisms rather than on deriving such predictions from the
theory and testing them. The confrontation with evidence, however, is the
hallmark of empirical science; the generative preoccupation with formalisms
at the expense of maximising evidence is thus indicative of the fact
that linguistics is seen more like philosophy or mathematics than like
science. It seems to me that H&L's point of view is only a recent carry-over
of the unfortunate identification, in the 1950s indeed, of "language"
with "formal language" (in the sense of the set of well-formed strings of
elements taken from some finite alphabet) that has hindered the
understanding of human languages as historical and psychological phenomena
that cannot be so defined, but that are still quite real (just like, to take a
well known example, a biological species).
4. Complementation and recursion
4.1. A minimalist account?
There is a curious sort of complementarity in H&L's response to CoI.
Their discussion of negation and connectives contains an alternative se-
mantic analysis, but does not really pay attention to combinatorial and
distributional (i.e., syntactic) aspects as sources of evidence for the seman-
tics. Their treatment of complementation constructions exhibits the re-
verse pattern: it focuses almost completely on the issue of the proper
syntactic analysis, and contains no more than two sentences about the
semantics; in fact, for the sake of the argument they go along with the
CoI-analysis (in brief: matrix clauses are perspectival operators, rather
than event descriptions with other events as parts),⁶ so here they ignore
the possibility that the semantics may provide a constraint on the
syntactic analysis (which, to be sure, is not to say that it would ipso facto
provide such an analysis). Be that as it may, their comments essentially
come down to an argument against a constructionist approach to syntax,
from the point of view of Chomsky's minimalist program,⁷ and I will
accordingly also only comment on issues of syntactic analysis stricto sensu.
They do make some comments on perspective taking, and I will also have
a bit to say about that, but they are unrelated to the syntactic analysis.
H&L suggest that standard tests for constituency provide evidence in
favour of the idea that clausal complements should be analysed as verbal
arguments, i.e., as bearing the same syntactic relation to the verb as a
nominal complement. The problem is that these tests of constituency are
never conclusive. They give the examples "George saw/knew/said what?"
and "George saw/knew/said that X, and Bill saw/knew/said so too", to suggest
the generalization that complement clauses in general can be replaced
by what and so. But that is simply not true, witness "*George
warned/was afraid what?" and "*... and Bill warned/was afraid so too",
while "George warned/was afraid that his opponent would raise taxes" is
fine. Thus, although the distribution of what and of so partly overlaps
with that of complement clauses, there are also discrepancies. This makes
"allowing replacement by what/so" basically worthless as a test, as it
sometimes produces the answer "no" and sometimes "yes" to the question
"Does this complement clause bear the same syntactic relationship to
the matrix verb as a noun phrase or a pronoun?". What H&L do, deciding
that the yes-answer is the decisive one, is a clear case of the
"methodological opportunism" in much syntactic argumentation exposed by
Croft (2001: Ch. 1). As we saw previously, H&L have a tendency to overlook
one of the most basic concerns of a (cognitive) linguist: to account
for the distribution of linguistic elements, and to take the patterns in this
distribution as the most reliable indicators for the precise way in which
language provides a window on the mind.
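The distributional argument in this paragraph can be sketched as a small comparison of judgement tables. The acceptability values below simply encode the examples cited above (H&L's saw/knew/said cases and the starred warned/was afraid cases); the dictionary layout and the helper name mismatches are illustrative assumptions, not part of either analysis.

```python
# Acceptability judgements as cited in the text (True = acceptable).
# Keys and structure are hypothetical, chosen for this sketch only.
judgements = {
    "saw":        {"that_clause": True, "what": True,  "so": True},
    "knew":       {"that_clause": True, "what": True,  "so": True},
    "said":       {"that_clause": True, "what": True,  "so": True},
    "warned":     {"that_clause": True, "what": False, "so": False},
    "was afraid": {"that_clause": True, "what": False, "so": False},
}


def mismatches(test):
    """Predicates for which a replacement test disagrees with the
    acceptability of a that-complement clause."""
    return sorted(
        verb
        for verb, row in judgements.items()
        if row["that_clause"] != row[test]
    )


for test in ("what", "so"):
    print(test, "disagrees with the that-clause pattern for:", mismatches(test))
```

On these data, both replacement tests diverge from the distribution of complement clauses for warned and was afraid, which is why a "yes" answer from the test cannot be treated as decisive on its own.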
H&L mention part of the more complete discussion of this issue in CoI,
admitting that it is not really clear to them what the problem is, and then
go on to provide an analysis of one type of complementation construc-
tion in terms of the minimalist program. They start with formulating a
number of what they call "rather minimal assumptions". Leaving aside
whether they are really minimal in the sense of virtually a conceptual
necessity (I don't think so), I will restrict myself to the question whether
this approach accomplishes what H&L claim it does. They state that
these assumptions allow one to describe the similarities between nominal
objects and clausal complements, and of course they do: any sufficiently
abstract analysis does. They then say that because this analysis does not
refer to syntactic categories (i.e., it abstracts from the differences between
nominal and clausal phrases), it does not predict that nominal
and clausal phrases have the same distribution. But of course, without
additional (presumably not so minimal) stipulations, that is precisely
what the analysis does predict. If the claim is (and that is how H&L pres-
ent it) that the minimalist approach can explain the occurrence of both
nominal and clausal complements (and not only describe what, however
minute and abstract, is similar to them), then the system as they describe
it must predict the same distribution for the two (and more?) types of
phrases that are instantiations of the fully general category label X
(again, without additional stipulations). It seems to me that there is more
of a logical error here than in CoI (cf. footnote 6).
Somewhat more mildly, one could say that while it may be true that
H&L's minimalist analysis does not strictly predict the same distribution
for nominal and clausal phrases, it does not predict the differences either.
Then the CoI-analysis would still have to be viewed as superior, since
it does predict the possibility of clausal complements with warn and be
afraid despite the fact that these predicates do not take nominal objects:
they are both perspective markers (as a verb of communication, and a
mental state predicate, respectively) and hence fully compatible with the
hypothesized meaning of the complementation construction (notice that
the analysis of form and function crucially meet here). But in any case,
the minimalist analysis as provided definitely does not account for the ob-
served distribution of clausal complements as only partially overlapping
with that of nominal ones.
Surprisingly, the most detailed actual syntactic analysis in H&L's paper
ultimately results in a full contradiction. For the rather sketchy analysis
of standard object complementation (the "George saw/knew/warned/..."
examples above), H&L invoke the general principle that "phrases are
headed". They then attempt to give a minimalist account of copular
complementation constructions of the type "The danger is that the middle
class feels alienated", which fit straightforwardly into CoI since being a
danger is not an observable property in the world, but rather a subjective
assessment; hence the matrix clause evokes a (perhaps unidentified)
perspective, and thus satisfies the conditions for combination with a
complement clause. In the minimalist account, the predication must count as
symmetrical: only in this way is it possible for either element of the
"small clause" allegedly underlying such sentences to surface as the
subject of the sentence (cf. "That the middle class feels alienated is the
danger"). H&L state explicitly: "Neither the CP [= clause] nor the DP
[= nominal phrase] are the head". This directly contradicts their minimalist
claim (iii) ("phrases are headed"), which was moreover necessary in
the description of object complements. I conclude that their account is
inherently inconsistent, and hence that it again does not accomplish what
it is claimed to accomplish, not for copular matrix clauses of complements
either. And then we have not even touched upon all the theoretical
machinery invoked, such as movement and empty structural positions,
for which Occam's razor would require independent evidence; but that
is a much more general issue than need concern us here.
4.2. Perspective taking, recursion, and understanding false beliefs
In CoI, it is observed that the hypothesis of complementation expressing
perspective taking immediately accounts for the fact that complementa-
tion is a prototype of recursion in language (the possibility for a structure
of type X to be embedded in another structure of type X), since concep-
tual perspective taking itself inherently allows for recursion. Thus, the
source of this case of recursion in language is in a sense placed outside
language, but that is different from placing recursion outside language,
as H&L construe it. More importantly, they contest the CoI ex-
planation on the basis of the argument that this explanation would only
work if this conceptual recursivity is propositional, for which they
claim there is no evidence. They do not state very precisely what they
mean by propositionality, but they refer to experiments involving
what is known in theory of mind research as false belief tasks; these
are tasks in which subjects must be able to entertain another person's belief
about the world and predict how s/he would act on that basis, while
knowing simultaneously that this belief is false (hence the term), so that
the subject's own response to the situation would be different. If recursion
were restricted to this kind of management of incompatible beliefs,
then H&L would have a point, because (understandably) having a system
of secondary representation for beliefs, i.e. a system on top of the primary
sensori-motor system, seems to be a necessary condition for performing
false belief tasks adequately.
However, managing false beliefs is simply not the same as perspective
taking; it is one of its most abstract and complex forms. Human children
develop several skills of social cognition before language, such as
recognizing intentionality (distinguishing intentional acts from accidental
events), sharing attention (e.g., in gaze following), and directing attention
(pointing, showing). These basic skills all involve perspective taking, and
their development is a necessary condition (given the arbitrariness of
connections between sound and meaning as children encounter them in the
world) for the development of linguistic symbolic communication
(Tomasello 1999): it is only through recognition of an adult's intention that a
child can start to make guesses about the meaning of some sound. Moreover,
these skills clearly already exhibit the potential for recursion; e.g., a
child can manage other people's attention to get them to show something
to the child. More recently, it has been shown that very young children
(and, to a limited extent, young chimpanzees) can also recognize other
people's goals and desires, as evidenced by their propensity to provide
help (Warneken and Tomasello 2006). These are complex social cognitive
skills, and even in cases where our closest relatives can be argued to have
similar abilities, humans are usually much better at them, also at a young
age. Still, they are less complex than understanding beliefs, which are
relatively permanent mental states not directly caused by the outside world
(such as perceptions) but by other mental states or events, nor directly
causing actions, but only indirectly so through guiding plans and
intentions (cf. D'Andrade 1987). And they all involve simple alignment of the
self with the other, not alignment plus dissociation, which, as mentioned
above, requires a system of secondary representation.
Thus, understanding other people as having intentions and desires like
oneself is simpler and more basic, also developmentally, than understanding
beliefs, and especially false beliefs. Now language, being a system of
symbolic communication, has the (fortunate) automatic side effect of
also providing humans with a system of secondary representation (cf.
Keller 1998: 127–128), so that it may, itself being based on capacities for
social cognition, provide the scaffolding to enhance these capacities to
a level like that of managing false beliefs. Thus, the acquisition of false
belief understanding may very well be dependent on the acquisition of a
representation system for perspectivization, such as complementation. As
a matter of fact, Tomasello has been one of the scholars contributing
some of the most compelling evidence for this view so far (Lohmann
and Tomasello 2003). So it is not at all a matter of "putting the cart before
the horse", but rather a matter of treating "perspective taking",
"theory of mind", etc. not as monolithic concepts, but as configurations
of features that constitute a family of related perspectivization capabilities
of different degrees of complexity.⁸
5. Conclusion
H&L attempt to show that data adduced in CoI as support for an inter-
subjective, argumentative view of meaning in grammar, have an explana-
tion in other approaches, which they consider more traditional and
that they would in principle consider superior; but in general these at-
tempts fail. The reasons for this failure are various, including misunder-
standings and misconstruals, but the most important one is the fact that
they ignore the precise character of the task of explanation in linguistics,
which involves taking the distribution of linguistic elements seriously:
if several linguistic forms behave similarly with respect to one or more
environments (grammatical ones and/or discourse ones), then an analysis
should account for this (with a minimum of assumptions, of course),
to be acceptable as an explanation (in some cases, H&L's overlooking of
this crucial point even makes them leave out certain crucial parts of
analyses in CoI from their own discussion). Indeed, it is only by taking the
distribution of linguistic elements seriously in this sense that the study of
language provides an independent "window on the mind", such that certain
conceptions of the nature of meaning and mind are not already
built into the foundational concepts of a purported explanation.
It is thus ironic, in my view, that H&L turn "explanation" into their
major point in their conclusions: I couldn't agree more.
Received 28 September 2007 Leiden University, The Netherlands
Revision received 19 November 2007
Notes
* Author's email address: <arie@arieverhagen.nl>.
1. The usefulness of drawing boundaries and connections in this way has been disputed by
cognitive and functional linguists for decades now, and these alternatives have produced
several important insights. In an important sense, Fauconnier's (1985) theory of Mental
Spaces was motivated by the discovery of the inferential character of meaning at the
level of the sentence, and even below; metaphors were demonstrated to be both inherent
(in sentences and in words) and argumentative in Lakoff and Johnson (1980), etc.; an
approach showing quite directly that the distinctions invoked by H&L must be called
into question, is Levinson's (2000) theory of presumptive meanings. So given the state
of the art in cognitive linguistics, and in cognitive science in general, some more argu-
mentation to still maintain the traditional view is highly desirable, to say the least.
Even though, admittedly, one cannot cover everything in a relatively short commentary
article, references to other work do not constitute such an argumentation (cf. Bierwisch's
2006 response to Hamm, Kamp and Van Lambalgen 2006).
2. The situation may even be worse. In English (unlike some other languages), the term
meaning may also be used for a contextually derived, i.e., person and time bound, inter-
pretation of a linguistic element or a piece of discourse, and even for what a speaker/
writer means (i.e., intends to convey) with an utterance, and it seems to me that
H&L include these senses in their notion of meaning too. Needless to say, all the
senses are related, but they are certainly not identical. One important difference is that
conventional meaning is, by definition, a social phenomenon, while speaker meaning is,
also by definition, an individual phenomenon. Therefore, a theory of speaker-meaning
and a theory of conventional meaning can never be the same, as a matter of principle
(although they can and should inform and constrain each other).
3. H&L widen the gap between animal and human communication, not only by making
humans very different from animals, but also by underestimating the cognitive and com-
municative capabilities of animals, especially when they say that these are [s]tuck in the
immediate here and now (see Emery and Clayton 2004 on certain food caching birds),
only have a small number of vocalizations (see Kroodsma 2004: 122 on some song-
birds having repertoires of thousands of songs), which are all intrinsically linked to an
immediate [ . . . ] purpose (cf. Pepperberg 2004 on Grey parrots).
4. H&L also notice that the exception-approach entails the derivation of a variable for an
abnormality. Rather than an advantage, I consider this somewhat of a problem, as I
have no trouble understanding "I barely passed" and "I failed although I worked hard"
without being committed to even the existence of a particular abnormality such as being
sick on the day of the exam; so the occurrence of some abnormality does not seem to be a
necessary condition for the occurrence of an exception. Rather, the inference of the
possible existence of an abnormality seems to be a defeasible inference itself (which it is
not in H&L's approach, as far as I can see).
5. In their footnote 2, they actually cite one of a number of passages from CoI stating this,
but they seem to simply have missed the point. In a way, their analysis consists of a
return to the position of Fillmore, Kay and O'Connor (1988), the problems of which
precisely motivated the alternative analysis in CoI.
6. In one of these two sentences, they insert the proviso that the functional difference
between their examples (5) and (6) is not obvious to them. But one of the (repeated)
methodological points in CoI is that such differences are often not obvious when one
looks at sentences in isolation, and only become visible when one looks at what are
and are not coherent ways of fitting a sentence into a piece of discourse; it is that kind
of evidence that is adduced in CoI to make the point. H&L do not recognize the validity
of this kind of evidence, claiming it contains a logical error, but it is clear that this
opinion is entirely based on the fallacy of assuming a 1-to-1 relationship between
sentence and inherent meaning as discussed in section 2, so not a matter of logic but of
assumptions about the subject matter.
7. In fact, they devote a separate section to a very general discussion of this topic. The
issue has been discussed in many other places in a more adequate way than I could do
here, so I will restrict myself to two remarks. First, H&L call the idea that constructions
can be reduced to deeper principles that are not construction-specific, a claimed
achievement of Chomsky's two most recent research programmes. However, if there
is one thing that work in constructional approaches over the last 10 years or so has
established, then it is tons of evidence that the claimed result is not at all achieved, and in
fact unachievable, for all practical and theoretical purposes. The alleged reduction of
raising constructions and passive constructions to a single non-specific rule "Move NP"
or even "Move α", a few other principles, plus some construction-like stipulations (cf.
H&L's idea of some syntactic heads subcategorizing for specific semantic categories)
to take care of the details, turned out not to generalize to many other constructions, and
meanwhile one construction after the other was found that has demonstrably unique,
so irreducible, and yet productive features. Second, for conceptual reasons to doubt
the general desirability of the Minimalist approach, I would like to point here to work
within the generative tradition that H&L adhere to (though not the two research
programmes mentioned above), viz. Jackendoff and Pinker (2005) and Culicover and
Jackendoff (2005), esp. chapter 1.
8. A possible misunderstanding I have encountered in discussions of this point is that
perspective taking would be the only source of recursion in language. It is true that CoI
does not mention other possible sources, i.e., in other conceptual domains. As a matter
of fact, I think that there are other such sources, independent of perspective taking (e.g.,
the specification of locations or referents, as manifested in embedding of prepositional
phrases and relative clauses). But these still do not create an overall potential for
recursion of all kinds of phrases; rather, recursion is restricted to its own functional
niches (see also Verhagen forthcoming.b).
References
Bierwisch, Manfred
2006 Comments on: Fritz Hamm, Hans Kamp, Michiel van Lambalgen, "There is
no opposition between Formal and Cognitive Semantics". Theoretical
Linguistics 32: 41–45.
Croft, William
2001 Radical Construction Grammar. Syntactic Theory in Typological Perspective.
Oxford: Oxford University Press.
Culicover, Peter W. and Ray Jackendoff
2005 Simpler Syntax. Oxford: Oxford University Press.
D'Andrade, Roy G.
1987 A folk model of the mind. In: Dorothy Holland and Naomi Quinn (eds.),
Cultural Models in Language and Thought. Cambridge: Cambridge University
Press, 112–148.
Dinsmore, John
1991 Partitioned Representations: A Study in Mental Representation, Language
Understanding, and Linguistic Structure. Dordrecht: Kluwer.
Ducrot, Oswald
1996 Slovenian Lectures/Conférences Slovènes. Argumentative Semantics/Sémantique
argumentative. Igor Ž. Žagar (ed.). Ljubljana: ISH Inštitut za humanistične
študije Ljubljana.
Emery, Nathan J. and Nicola S. Clayton
2004 The mentality of crows: Convergent evolution of intelligence in corvids and
apes. Science 306: 1903–1907.
Fauconnier, Gilles
1985 Mental Spaces. Aspects of Meaning Construction in Natural Language.
Cambridge, MA: The MIT Press. [Reprinted 1994, Cambridge: Cambridge
University Press.]
Fillmore, Charles J., Paul Kay and Mary Catherine O'Connor
1988 Regularity and idiomaticity in grammatical constructions: the case of let
alone. Language 64: 501–538.
Hamm, Fritz, Hans Kamp and Michiel van Lambalgen
2006 There is no opposition between Formal and Cognitive Semantics. Theoretical
Linguistics 32: 1–40.
Jackendoff, Ray and Steven Pinker
2005 The nature of the language faculty and its implications for evolution
of language (Reply to Fitch, Hauser, and Chomsky). Cognition 97: 211–225.
Keller, Rudi
1998 A Theory of Linguistic Signs. Oxford: Oxford University Press.
Kroodsma, Don
2004 The diversity and plasticity of birdsong. In: Peter Marler and Hans
Slabbekoorn (eds.), Nature's Music. The Science of Birdsong. Amsterdam: Elsevier
Academic Press, 108–131.
Lakoff, George and Mark Johnson
1980 Metaphors We Live By. Chicago/London: The University of Chicago Press.
Levinson, Stephen C.
2000 Presumptive Meanings. The Theory of Generalized Conversational
Implicature. Cambridge, MA: The MIT Press.
Lohmann, Heidemarie and Michael Tomasello
2003 The role of language in the development of false belief understanding: A
training study. Child Development 74: 1130–1144.
Pepperberg, Irene M.
2004 Grey parrots: learning and using speech. In: Peter Marler and Hans
Slabbekoorn (eds.), Nature's Music. The Science of Birdsong. Amsterdam: Elsevier
Academic Press, 363–373.
Tomasello, Michael
1999 The Cultural Origins of Human Cognition. Cambridge, MA: Harvard
University Press.
Verhagen, Arie
forthc.a Intersubjectivity and the architecture of the language system. In: Jordan
Zlatev, Timothy P. Racine, Chris Sinha, Esa Itkonen (eds.), The Shared Mind:
Perspectives on Intersubjectivity. Amsterdam/Philadelphia: John Benjamins
Publishing Company.
forthc.b What do you think is the proper place of recursion? Conceptual and
empirical issues. The Linguistic Review.
Warneken, Felix and Michael Tomasello
2006 Altruistic helping in human infants and young chimpanzees. Science 311:
1301–1303.
Tense and cognitive space:
On the organization of tense/aspect systems
in Bantu languages and beyond
ROBERT BOTNE AND TIFFANY L. KERSHNER*
Abstract
Bantu languages are well-known for their complex tense systems encoding
multiple degrees of remoteness. Two assumptions underlie most approaches
to analysis of such systems: (1) that linguistic time is optimally construed
as a unidimensional expanse, whereby multi-tense systems carve up the
timeline in regular progressive intervals away from the speech event; and
(2) that tense markers quintessentially exhibit no overlap in denoting
reference along this expanse. In this paper, the authors propose a different
approach to understanding Bantu tense systems which treats linguistic
time, from the perspective of Ego (the conceptualizer), as a
multi-dimensional array comprising cognitively dissociated temporal worlds, or
domains, temporally linked and grounded in the deictic dichotomy between
events construed as occurring in a contemporal world of the present versus
those situated in cognitively dissociated domains. That is, tense markers
function to situate events in one of two distinct conceptual types of domain
that correlate with different construals of time: Ego-moving or moving-time.
Support comes from a variety of curious facts found in Bantu languages.
A key element of this approach is that it provides an explanation
for why temporal overlap of tenses does, indeed, occur, and advances the
position that there are conceptually different pasts and futures.
Keywords: Bantu; cognitive domains; dissociation; semantics; tense.
Cognitive Linguistics 19–2 (2008), 145–218
DOI 10.1515/COG.2008.008
0936–5907/08/0019–0145
© Walter de Gruyter
1. Introduction
In his influential work Tense, Comrie (1985: 50) alludes to a possible
universal of tense systems: "in a tense system, the time reference of each
tense is a continuity." By this, he seems to imply (1) that linguistic time
is optimally construed as a unidimensional expanse and (2) that tense
systems exhibit no gaps in denoting reference along this expanse, a
position reiterated in, for example, Givón (2001) and Frawley (1992).
However, Comrie points out a possible exception to that hypothesis in Burera,
an Australian aboriginal language.¹ Burera has a formal opposition
between two verbal suffixes, -nga and -de, each of which has two temporal
interpretations, present time reference ('be V-ing') and recent past ('V-ed
in last few days') with -nga, hodiernal past ('V-ed earlier today') and
remote past ('V-ed more than a few days ago') with -de. Both morphemes
appear to denote discontinuous time reference and, hence, constitute
counter-examples to the purported universal. Bybee et al. (1994: 104,
citing data from Merrifield 1968) point out a similar case in Palantla
Chinantec. In this language, ka¹ denotes an action just completed or
completed on another day, in opposition to na², which refers to an event
occurring earlier on the same day. Significantly, Comrie suggests
marginalizing this kind of situation in coming to understand tense systems:
This kind of tense opposition does not fit well within most current conceptions of
tense, although its existence must be acknowledged; at best, one could appeal to
its rarity as an excuse for according it marginal status within the overall theory.
(p. 89)
We believe, contrary to Comrie's view, that such seemingly idiosyncratic
distinctions constitute keys to understanding how tense systems
are organized. In particular, we believe that they provide evidence for a
multi-dimensional conceptualization of time and cognitive space. In this
paper, we set out various kinds of evidence from Bantu and similar
Bantoid languages that support this view.
Tense systems in Bantu languages are typically rich and complex, with
multiple past and/or future tense markings. Thus, a common set of past
tenses may include a distinct form for immediate past, another for a past
earlier in the day, a third for yesterday or a few days ago, and a fourth for
a more distant past. Generally, Bantuists conceive of these different past
forms as denoting linear temporal reference at farther and farther remove
from the speech event. Nurse (2003: 99), for example, states, ". . . different
languages divide the timeline up differently, resulting in a different
number of tenses. In principle, the timeline can be cut at many points." If this
view were correct, we should expect to find that the only difference
semantically would be in the time referred to or, morpho-syntactically, in
the form of the tense marker. However, as we will show in the cases
examined here, other differences in the semantics and morpho-syntax arise
that cannot be explained, or are unsatisfactorily explained, in terms of a
simple linear timeline.
146 R. Botne and T. L. Kershner
2. Tense and time
Tense, painted in rather broad strokes, has commonly been defined as
that grammatical category that marks the location in time of some event²
with respect to some conventionally recognized reference locus (see, for
example, Chung and Timberlake 1985 among others). That is, temporal
relations can purportedly be conveyed in terms of four basic concepts:
an anchoring reference locus, a situated event, a direction or temporal
location vis-à-vis the reference locus, and, in some cases, the degree of
remoteness from the reference locus. The typical deictic reference locus
in natural language is the time of the speech event itself, with events
construed as situated temporally before, after, or simultaneous with it.
Consonant with this perception of tense is the common view that tense is best
understood and represented in terms of a one-dimensional linear timeline
anchored by the speech event. Indeed, Frawley (1992: 337–338) explicitly
states that "[t]he stereotypical, ideal timeline is an entirely adequate
model of linguistic time." Likewise, Givón (2001: 285) asserts that "[t]he
category tense involves the systematic coding of the relationship between
two points along the ordered linear dimension of time." As we intend to
show, data from a variety of Bantu languages demonstrate that this is too
simple a mental model of tense systems and that there is not such a simple
linguistic correspondence between time and tense; the common correlation
of tense marking solely with the traditional unidimensional timeline
fails to account adequately for the range and differences in usage one finds.
As suggested in the statement from Frawley cited above, linguists in
general and Bantuists in particular have persisted in correlating tenses
with a simple timeline. However, Comrie (1985: 2), though subscribing
to the simple linear view that such a diagrammatic representation of
time is adequate for an account of tense in human language, does
observe that the timeline does not directly represent the flow of time, i.e.,
whether the present moment is viewed as moving along a stationary
timeline, or whether time is viewed as flowing past a stationary present
reference time point. (p. 3) Nevertheless, he demurs in stating that these
different perspectives on the flow of time do not seem to play any role
in the characterisation of grammatical oppositions cross-linguistically.
(p. 3) Binnick (1991: 56) and, later, Lakoff and Johnson (1999) also
call attention to these alternative perspectives of time but, again, do not
correlate them formally with tense or tense systems. We believe it is
necessary to integrate the contrasting perspectives directly into any semantic
analysis of tense/aspect systems as a cogent organizational principle.
In adopting this position, we do not claim that there are different
timelines but, rather, different construals of time: time as path vs time as
stream (Figure 1). In the former, time is construed as a stationary
timeline along which Ego, the conceptualizer, moves, as diagrammatically
presented in Figure 2a; in the latter, time itself is perceived as moving
(Figures 2b, c). Additionally, either Ego or Event may be perceived as
moving with respect to the other. Metaphorically, one could visualize the
former as a person on a raft (moving-Ego) floating past a gathering
(stationary-Event) on the bank of a stream (Fig. 2b), the latter as a
person standing on a bridge over the stream (stationary-Ego) observing
various items floating by beneath the bridge (moving Event(s)) (Fig. 2c). In
the latter case, one can imagine the observer either to be observing items
floating toward her (coming from the future) or, on the other side of
the bridge, to be observing items as they pass by going downstream
(moving off into the past). [N.B. abbreviations can be found in the
Appendix.]
Figure 1. Alternative construals of time
Figure 2. Ego (speaking at (S)) and Event (E) construed in relation to time(line) [The
figure (I) represents Ego, a diamond (◊) the location of an event on a timeline.]
In Figure 2b, Ego (at S) conceptualizes herself as moving in time with
respect to a stationary event (E); in Figure 2c, she conceptualizes herself as
stationary while E moves in time toward her. In each case, the temporal
relation of Ego and Event is constant in time; what varies is the cognitive
orientation the individual conceptualizing the situation chooses to adopt.
Our claim is that a language may correlate these different orientations
with different formal linguistic features, providing the means for a speaker
to adopt either a path or stream construal at the time of speaking.
In order to reduce the number of schemas used in the paper and to
facilitate comparison of formal marking in each construal, we combine the
path and stream orientations illustrated in Figure 2 into one diagrammatic
representation, as in Figure 3. Furthermore, we will, henceforth,
for ease of exposition, refer to each line as a timeline, even though
conceptually they represent alternative perspectives on one timeline.
Figure 3. Linguistic construals of time(line) combined: (a) Ego-moving; (b) moving-Ego
or moving-Event [i.e., either Past to Future or Future to Past (as shown), respectively]
This contrast in perspectives of time was noted and expressed as early
as the work of Gustave Guillaume (1929, 1937, 1945 cited in Hewson
et al. 2000) and later in the work of Benveniste (1965), Traugott (1978),
Fleischman (1982), Emanatian (1992), Hewson et al. (2000) and, most
recently, Evans (2005). Hewson et al. (2000: 38–40), following Guillaume,
correlate moving time with aspect, moving ego with tense; Traugott
(1978) and Fleischman (1982) discuss the differences with respect to 'come'
and 'go' as grammaticized temporal markers, but do not utilize the
contrast further in developing a model of the organization of tense systems.
Emanatian (1992), though espousing a solely moving-ego analysis of
'come' and 'go' temporal use in Chaga (E.62)³, grants the possibility
that the moving-ego analysis and the moving-event analysis may describe
different routes for 'come' verbs to become future markers. Evans
(2003) provides the most detailed discussion and analysis of these
cognitive models of time, arguing that they represent complex, and not
primary, metaphors. All of these views consider there to be a binary contrast
in perspectives. We believe there to be a ternary distinction that has
subtle consequences for temporal marking systems. We will return to this
issue in Section 4, but first we consider tense in relation to mental worlds.
3. Tense and mental worlds
Tense systems constitute the overt manifestation of the linguistic
organization of time. Although the multiple conceptual perspectives of time
noted in the preceding section constitute a key organizing principle, just
as important is the concept of mental worlds or, as we shall refer to
them, cognitive temporal domains. These domains are grounded in the
fundamental dichotomy that exists between basic and dissociated deictic
views of realis, space, and time. As background to this discussion, we
begin with a brief overview of reference time, or reference locus, found in
the linguistic literature. Two linear models of tense, one, Reichenbach's
(1947) model, that has been and continues to be particularly influential
(cf. Smith 2004; Helland 1995; Hornstein 1990, for example), the
other, Bull's (1960) model, much less so, merit a brief discussion.
Although Reichenbach's work can be considered anti-mentalist to a
certain extent and ignored aspectual relations, the influence of his model
of tense relations nevertheless merits a brief review. In breaking with the
Jespersenian model of primitive absolute tenses (cf. Binnick 1991:
110–112), Reichenbach (1947) defined tenses in terms of the relations holding
among three times: the time of the speech event (S), the time of the event
(E), and a reference time (R), inclusion of the abstract reference time
perhaps the most significant feature of his model. The relative order of each
of these with respect to the two others determined temporal reference.
Although this model provided a solution to problems inherent in
differentiating the preterite and the perfect, there nevertheless were problems with
the approach. As Comrie (1981: 25) and later Declerck (1991: 236) point
out, the model advanced the idea that specification and strict ordering of
all three times in a linear manner along a timeline were both necessary
and sufficient conditions for the proper specification of any tense.
However, for some tenses, such specification is unnecessary and infelicitous.
For example, in the future perfect in French or English, as in il aura
chanté 'he will have sung', the strict ordering between all three times for
the future perfect (S–E–R, S,E–R, E–S–R) is unwarranted; the
relation of E to S is not part of the meaning of the form, but simply
determined pragmatically from context. That is, aura chanté 'will have sung'
simply indicates that the event 'sing' preceded the reference time (R),
which itself is posterior to S; it does not indicate the temporal relationship
of E with respect to S. Rather, as Comrie (1981) and Dinsmore (1982)
propose, and Binnick (1991) reiterates, one can specify two pairwise
relations: E with respect to R, and R with respect to S. As Binnick (1991:
115) rightly points out with respect to Reichenbach's approach, "[t]ense
is a matter of how R relates to S." In Reichenbach's terms, this would
be R–S (past), R,S (present), S–R (future).
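The pairwise decomposition just described can be made concrete with a minimal sketch. The Python below is an illustration only, under assumptions that are not part of the source: time points are reduced to the integers 0 to 2, the names `Rel`, `rel`, `describe`, and `compatible_orders` are hypothetical, and an ASCII hyphen stands in for the dash of the Reichenbach-style notation. It enumerates every ordering of S, E, and R that is consistent with a given E-to-R and R-to-S relation.

```python
from enum import Enum
from itertools import product


class Rel(Enum):
    """Temporal relation of one time point to another (hypothetical helper)."""
    BEFORE = "before"
    SAME = "simultaneous"
    AFTER = "after"


def rel(a: int, b: int) -> Rel:
    """Relation of time a to time b on a toy integer scale."""
    if a < b:
        return Rel.BEFORE
    if a > b:
        return Rel.AFTER
    return Rel.SAME


def describe(s: int, e: int, r: int) -> str:
    """Render an ordering in Reichenbach-style notation, e.g. 'S,E-R'."""
    groups = {}
    for name, t in (("S", s), ("E", e), ("R", r)):
        groups.setdefault(t, []).append(name)
    # Simultaneous points are joined by a comma, successive ones by a hyphen.
    return "-".join(",".join(groups[t]) for t in sorted(groups))


def compatible_orders(e_to_r: Rel, r_to_s: Rel) -> set:
    """All S/E/R orderings consistent with the two pairwise relations."""
    return {
        describe(s, e, r)
        for s, e, r in product(range(3), repeat=3)
        if rel(e, r) is e_to_r and rel(r, s) is r_to_s
    }


# Future perfect: E before R, R after S. The order of E and S is left open.
print(sorted(compatible_orders(Rel.BEFORE, Rel.AFTER)))
# Past perfect: E before R, R before S. Only one ordering is compatible.
print(sorted(compatible_orders(Rel.BEFORE, Rel.BEFORE)))
```

On this toy encoding the future perfect is compatible with three orderings, which is why fixing a single strict order of all three times over-specifies the form, whereas the past perfect is compatible with only one.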
Although Reichenbach's introduction of the concept of reference time
was, perhaps, his most controversial and key contribution in thinking
about temporal relations, it is not simply a question of reference point.
As Klein (1992: 533) indicates, the possibility of explaining the different
behavior of the past and perfect, for example, hinges on what is
understood by R. Thus, he proposes a more explicit and precise definition of
reference time, which he labels topic time (TT), as the time span to
which the claim made on a given occasion is constrained (p. 535). While
we believe this to be a very useful and salutary proposal, what
Reichenbach (and subsequent adherents to his model) as well as Klein have not
addressed, and what seems to be vaguely hinted at, for example, in
Comrie's and Declerck's critiques, is that reference time needs to be further
decomposed into two separate concepts, reference anchor and reference
world, and that linguistic time is conceptualized cognitively. The
reference anchor constitutes a locus of orientation with respect to which an
event may be temporally related, as in the English past perfect, for
example 'she had sung', in which the singing occurred prior to some other time
or event which itself preceded the moment of speaking (in the pairwise
modification of Reichenbach's terms, E–R:R–S). On the other hand,
reference worlds, or, as we will label them since we are speaking of mental
activity, cognitive domains, constitute temporal time spans within which
events are asserted to occur. This kind of distinction began to emerge in
the work of Bull (1960).
Bull's model, similar in certain respects to that of Reichenbach, has had
significant influence on the development of some of the ideas presented in
this paper. In his work, Bull proposes a model in which there are multiple
axes of orientation, each anchored by a different reference point.
Although the number of axes is in principle infinite, Bull (1960: 22) suggests
that the maximum number grammatically encoded in any language is not
likely to exceed four. He labels these axes in the following manner: PP
(present point equivalent typically to the time of speaking S) for reference
time at the speech event, RP (retrospective point) for a reference time in
the past, AP (anticipated point) for a reference point in the future, and
RAP (retrospective anticipated point) for a reference point posterior to
another reference point in the past. Events can be temporally related to
each of these reference anchors in three ways, which Bull labels vectors:
anterior to it, simultaneous with it, posterior to it, as shown in Figure 4.
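As a rough illustration of the combinatorics of this schema, the following Python sketch pairs each of the four axes with each of the three vectors. The names `AXES`, `VECTORS`, and `event_slots` are hypothetical, and the printed labels are an assumption made for display; they are not Bull's own notation.

```python
from itertools import product

# The four axes of orientation, each anchored by its own reference point.
AXES = {
    "PP": "present point (typically the time of speaking, S)",
    "RP": "retrospective point (a reference time in the past)",
    "AP": "anticipated point (a reference point in the future)",
    "RAP": "retrospective anticipated point (posterior to a past reference point)",
}

# The three vectors relating an event to a reference anchor.
VECTORS = {
    "anterior": "anterior to",
    "simultaneous": "simultaneous with",
    "posterior": "posterior to",
}


def event_slots() -> list:
    """Every (axis, vector) pairing: 4 axes x 3 vectors."""
    return list(product(AXES, VECTORS))


for axis, vector in event_slots():
    print(f"event {VECTORS[vector]} {axis}: {AXES[axis]}")
```

The enumeration yields twelve slots. Whether a given language encodes all of them grammatically is an empirical question that the sketch does not address.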
Since Bull's system is based on the notion of relativity, events can only
be projected and construed in relation to one reference point at a time.
Hence, reference points other than PP, the present speech event, may be
potentially encoded in grammatical forms that indicate the temporal
relation of the particular axis (or reference point) to the speech event.
Though providing a rich model for analyzing tense relations by implicitly
differentiating reference anchor (his primary points, i.e., PP, AP, etc.)
and reference world (his axes of orientation), Bull nevertheless
envisioned temporal relations as features of a single timeline (pp. 22, 24).
Furthermore, his model fails to separate the different kinds of temporal
relations encoded in the different verb forms, in particular, the difference
between have forms and -ed forms. This singular view of the timeline, as
we have stated, is insufficient to account for the distinctions one
encounters in many tense systems; rather, a dual perspective is necessary.
Figure 4. Bull's (1960) tense schema
We incorporate and develop further some of the insights from these
various models into one conceptual framework, but diverge in significant
ways from the simple linear approaches. Tense, in our view, denotes that
relation that holds between S (the locus of the speech event) and a
cognitive temporal domain (comparable, but not identical, to Bull's notion of
axis and Klein's topic time), a relation that is best construed in terms
of clusivity: inclusivity, i.e., the deictic center (anchored at S) occurs
within the time span of the cognitive world, versus exclusivity, or
dissociation, i.e., the deictic center at S is external to, or dissociated
from, the cognitive world. In the privileged case of inclusion, i.e., when
the cognitive world includes S, we label that world the P-domain,
denoting a primary, prevailing experiential past and future perspective. For
relations of non-inclusion, or dissociation, we refer to that cognitive world
as a D-domain. For expository convenience, we represent these different
temporal domains (i.e., the different cognitive worlds) as bounded
quadrangular planes, as in Figure 5, correlated with two perspectives of
time: (i) ego projecting movement over the temporal landscape from one
cognitive domain to the next, and (ii) either moving-ego or moving-event
(dotted arrows) passing through the P-domain. That is, Ego construes
herself as moving across the temporal landscape from one cognitive
domain, or world, to another. Within a given cognitive world, Ego
construes time as moving, either carrying Ego along into the future, or
carrying events toward Ego from the future. In our analyses of the various
languages, we endeavor to determine which perspective of Time-moving is
relevant. However, limited data in some languages has precluded making
a definitive determination. This lacuna, though unfortunate, does not
impede analysis of the organization of temporal systems within the domain
model.
Figure 5. Correlation of cognitive worlds with three perspectives on time
To illustrate this model, we can consider the binary tense distinction in
English represented morphologically in the contrast opposing -ED and
Ø-marked verb forms. The Ø-marked English verb forms situate the event
in the P-domain. Although labeled a present tense, the Ø-form does
not necessarily denote coincidence with the time of speaking (however,
see, for example, Langacker 2001 for a vigorous argument that it does⁴).
Rather, the event may be construed in a number of ways other than
present within the domain. It may, for example, be construed as future or
as past (better known under the rubric historical present). Consider the
phrase "we're dining out", progressive aspect in the Ø-tense form. One can
use this in any of the following:
(1) a. We're dining out. (response to a query via cell phone while at a
       restaurant)
    b. Tomorrow, we're dining out.
    c. Yesterday, we're dining out, completely enjoying our evening,
       when her phone buzzes.
    d. We're dining out every night this month.
    e. When we're dining out, I always have red wine.
Context and/or use of adverbials of time situate the event with respect to
the speech event. Certainly, there are specific constraints in English on
when the simple Ø-form can be used felicitously, for example, its
non-use for an on-going action at S (such as "I work now"), a Modern English
development (cf. Middle English al dares for drede 'all are cowering for
fear', Burrow and Turville-Petre 1992: 45). Our claim is simply that the
Ø-form situates the event somewhere in the P-domain, as illustrated by
the positions of the small diamonds (◊) in Figure 6a, in opposition to -ED.
The -ED form, in our view, indicates that the cognitive domain does
not include the deictic locus at S and, furthermore, is specifically past
(Fig. 6b). Note that we are not claiming that the only semantic function
of -ED in English is to mark past tense; that is only one of its functions. It
clearly has others, for example, marking irrealis (e.g., If I knew the
answer, I would tell you) or marking social distance or politeness (I wanted
to ask you about that picture). In its role as tense marker, however, it
has only past meaning. Even so, our model presents a framework
inviting a unified approach to both temporal and non-temporal uses of
morphology.
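The clusivity relation underlying the English contrast can be sketched in a few lines of Python. This is a minimal illustration under simplifying assumptions that are ours and not part of the analysis: time is flattened to numbers on one scale with S at 0, a cognitive domain is treated as a closed interval, and the names `Domain`, `classify`, and `english_marker` are hypothetical. The sketch covers only the tense-marking role of -ED, not its irrealis or politeness uses.

```python
from dataclasses import dataclass

S = 0.0  # time of the speech event (deictic center), placed at 0 by assumption


@dataclass(frozen=True)
class Domain:
    """A cognitive temporal domain, simplified here to a closed interval."""
    start: float
    end: float

    def includes(self, t: float) -> bool:
        return self.start <= t <= self.end


def classify(domain: Domain, s: float = S) -> str:
    """Return 'P' if the domain includes S (inclusion), else 'D' (dissociation)."""
    return "P" if domain.includes(s) else "D"


def english_marker(domain: Domain, s: float = S) -> str:
    """Toy mapping: Ø for the P-domain, -ED for a past D-domain."""
    if classify(domain, s) == "P":
        return "Ø"
    if domain.end < s:
        return "-ED"
    # A dissociated future domain is outside the binary contrast sketched here.
    raise ValueError("future D-domain is not covered by this sketch")


contemporal = Domain(-5.0, 5.0)     # includes S, so events before or after S fit
remote_past = Domain(-50.0, -10.0)  # lies wholly before S
print(english_marker(contemporal))  # Ø
print(english_marker(remote_past))  # -ED
```

Because the P-domain interval extends on both sides of S, an event marked with Ø may be located before, at, or after the speech event, which mirrors the range of uses in (1a) to (1e).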
4. Cognitive Grammar and mental space models of tense and aspect
Having briefly outlined our model in the preceding section, we turn here
to a synoptic consideration of two other cognitive approaches that also
address tense and aspect. In his Cognitive Grammar approach,
Langacker (2000: 23) considers tense to be the primary grounding element
of a finite clause, which profiles a grounded instance of a [process] type
[i.e., a lexical verb]. Hence, tense is one kind of grounding
predication, whose function is to locate the clausal profile, i.e., the process
denoted by the verb in the finite clause, in relation to the ground (the time
of the speech event, the participants, and any immediate circumstances)
(Ibid. 220), which constitutes the locus of conception and viewpoint, and
is evoked implicitly as a point of reference. A verb profiles a complex
relation, a process, in which its evolution through time is salient (Ibid. 222).
As an example, for English, Langacker proposes two aspectual classes,
perfective and imperfective, construed essentially as bounded, as in
Figure 7a, or unbounded (Fig. 7b) within the immediate temporal scope
(IS), respectively.
Figure 6. Temporal worlds and S: English Past (-ED) vs Contemporal (Ø):
(a) P-domain (contemporal) construals of Ø-marked forms [◊ potential positions of
events]; (b) D-domain (past) indicating dissociation of cognitive space from S, marked
by -ED
Figure 7. Perfective and imperfective processes in time (Langacker 2000: 224)
The processual profile, i.e., the grounded process, in English is specified
for location in time by either of two markers, Ø or -D, which denote that
the process is either proximal or distal to the ground. An example of
such a relationship, the past of a perfective verb, is schematized in Figure 8,
where the squiggly lines denote the time of the speech event.
This approach to tense and aspect, although adopting a conceptual
cognitive framework, differs little from those that we discussed
previously. Tense is construed and represented in terms of relations along a
unidimensional expanse of time. As we have stressed, such a simple view
of tense relations is inadequate to capture the multi-layered systems of
Bantu languages.
A richer cognitive model addressing tense and aspect is that originally
developed and propounded in Fauconnier (1985, 1997) and further ex-
panded in Cutrer (1994). According to Mental Spaces Theory (MST),
tense and mood provide the means for keeping track of the time and reality
status ("epistemic distance" in MST terms) of a configuration of mental
spaces built up in discourse. Essentially, then, they constitute a discourse
management tool. Mental spaces, in the theory, are partial and temporary
conceptual domains constructed during the process of discourse
(Fauconnier 1997; Evans and Green 2006). There are four different
kinds of space (Cutrer 1994: 71–73): (1) a Base space, which is always
in the present and contains the initial viewpoint from which events are
construed; (2) a Viewpoint space, essentially equivalent to the notion of
reference or vantage point, that space from which deictic relations are
determined; (3) a Focus space, which is where meaning is actively being
constructed (that space which an utterance is about); and (4) an Event
space, the temporal space in which the event encoded by the verb takes
place. A worked illustration based on the following brief narrative will make
these concepts clearer.
(2) Anna is moving to Kansas. She has lived in Indiana for five years.
Yesterday, she rented a U-haul truck.
An MST representation of the passage in (2) begins with a Base (B)
space, which is also the initial Viewpoint (V), Focus (F), and Event (E)
space, as in Figure 9. This space is interpreted by default as in the present,
as denoted by the verb form "is moving". Note that there is much more to
the structure of these spaces in MST; we have limited the information to
that which is relevant to tense and aspect representation.
Figure 8. Past of a perfective process (Langacker 2000: 225)
In the second sentence, according to the principles of MST, the present
perfect functions to keep the Base space in focus while adding new infor-
mation relevant to the meaning structure being built in the Base. That is,
the event "lived" represents an event that is complete with respect to the
Base space and, hence, is accorded a new space (Fig. 10). Focus, however,
remains in the Base (indicated by present "has"). Current relevance of
the perfect arises from the divergence of Event and Focus spaces, indicat-
ing that knowledge of the former has some relevance in the latter.
In the third sentence, the adverbial "yesterday" is considered to be a space
builder; hence, this sentence establishes a new space which is marked for
past (-D) with respect to the Viewpoint space, which is in the Base (Fig.
11). This new space is now also the Event space. What differentiates past
from present perfect is that the new space is also the Focus space.
Figure 9. Representation of "Anna is moving to Kansas."
Figure 10. Representation of "She has lived in Indiana for five years."
Figure 11. Representation of "Yesterday, she rented a U-haul truck."
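The space assignments traced through the narrative in (2) can be restated as a small bookkeeping sketch. The Python fragment below is only an illustration of the role assignments described for Figures 9 to 11; the class name `Discourse`, its methods, and the space labels `M1`, `M2` are hypothetical names introduced here, not part of MST or of Cutrer's formalism, and the perfect is simplified to the case where Viewpoint is in the Base.

```python
from dataclasses import dataclass, field


@dataclass
class Discourse:
    """Tracks which mental space currently fills each MST role (illustrative only)."""
    spaces: list = field(default_factory=lambda: ["B"])  # "B" is the Base space
    viewpoint: str = "B"
    focus: str = "B"
    event: str = "B"

    def _new_space(self) -> str:
        # Hypothetical naming scheme for spaces added during discourse.
        name = f"M{len(self.spaces)}"
        self.spaces.append(name)
        return name

    def present(self) -> None:
        # Present: Focus and Event coincide with the Viewpoint space.
        self.focus = self.viewpoint
        self.event = self.viewpoint

    def present_perfect(self) -> None:
        # Perfect: the event is accorded a new space, Focus stays with Viewpoint.
        self.event = self._new_space()
        self.focus = self.viewpoint

    def past(self) -> None:
        # Past: the new space is both the Event space and the Focus space.
        self.event = self._new_space()
        self.focus = self.event

    def roles(self) -> dict:
        return {"V": self.viewpoint, "F": self.focus, "E": self.event}


d = Discourse()
d.present()           # Anna is moving to Kansas.
s1 = d.roles()
d.present_perfect()   # She has lived in Indiana for five years.
s2 = d.roles()
d.past()              # Yesterday, she rented a U-haul truck.
s3 = d.roles()
print(s1, s2, s3)
```

The only difference between `present_perfect` and `past` in this sketch is whether Focus moves to the new space, which is the contrast the text draws between the two forms.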
This short sample of MST representation of temporal relations in a text
illustrates the basic principle behind the theory: Base, Viewpoint, Focus,
and Event serve as general discourse organizers. Tense and aspect provide
information on the distribution, location, and configuration of these organizing
mental spaces. Tense comprises three categories (Past, Present, and
Future) that either denote an already existing space or create a new one.
They function, therefore, as discourse links, connecting various spaces
cognitively. Aspect (Perfect, Progressive, Imperfective, Perfective)
provides information about the arrangement of Viewpoint and Focus
(Cutrer 1994: 100), but, unlike tense markers, does not put a space in Focus.
Crucially, with respect to our model,
[t]he tense-aspect categories characterized here are not representations of semantic
form, nor are they intended as language specific grammatical categories. But
rather, they are characterizations of conceptual discourse links, which operate at
the cognitive construction level, and which in the strongest possible claim, are
universal. Each tense-aspect category is a universal type of local link between
spaces, a local relationship which may be established between spaces as part of
the underlying cognitive structure. These discourse links are conceptual notions
which are separate from language, but which may be encoded by the grammatical
conventions of individual languages. (Cutrer 1994: 94)
We believe our model complements this approach. As Fauconnier (1997:
82) notes, "Languages differ . . . in the type of coding they adopt and
what they code." Thus, whereas MST focuses on tense-aspect in terms of
its conceptual linking of events in discourse, our focus is on the organiz-
ing principles of the tense-aspect system itself, what distinctions are made
within a system and how they relate to each other, in short, what gets
encoded. Consequently, we feel the two approaches may be combined
fruitfully to provide a global picture of how tense-aspect systems are or-
ganized and how they are used to manage discourse.
Concluding this brief excursus into other cognitive approaches, we re-
turn to a consideration of the issue of deixis in our approach.
5. Tense and other verbal deixis
Our view of temporal deixis in terms of dissociation is commensurate
with two other possible deictic verbal categories, realis and spatial
position, which denote whether the situated event is treated as real or
not, or as occurring in the immediate vicinity of the speech event or not.
In each case we can identify two domains (real vs. not real, here vs. not
here), one coincident, one dissociated, a distinction comparable, perhaps,
to Traugott's (1978) proximal-distal relation.
The speech event, then, can be considered to be grounded in the real,
the here, and the contemporal.⁵ We believe that this deictic dichotomy
between the extant "real, here and contemporal" and the displaced "not
real, not here, or not contemporal" constitutes a second significant facet
of the organization of tense distinctions in cognitive space. For this
reason, we propose that cognitive space is divided into two distinct conceptual
domains for each of four contrasting deictic components: realis,
temporality (opposing a contemporal domain with a non-contemporal
(past) domain, on one hand, and a non-contemporal (future) one, on
the other), and spatial location, as in Table 1.⁶ In each case we can
contrast inclusion of the deictic center within the prevailing cognitive
world, for which we use the label P-domain, and dissociation, for which
we employ the label D-domain. More simply stated, we are proposing
that there are paired conceptual worlds, suggested by the oppositions
set out in Table 1, that natural human languages may choose to mark
grammatically.
A language may choose to mark none of these oppositions grammatically,
one, or more. Comparison of several disparate languages in Table
2, for example, shows that Norwegian morphologically marks not contemporal
(past) with -ET; Slave [Athapaskan, Hare dialect] (Rice
2000) marks not real and not contemporal (future) with -O-; and Tunen
[Bantu, Cameroon] (Dugast 1971) marks not contemporal (Past) with ls,
not contemporal (Future) with jo, and not here with ka.⁷
Table 1. Verbal deixis
                    inclusive      dissociative
reality:            real           not real
temporality:        contemporal    not contemporal P (i.e., Cog domain prior to S)
                                   not contemporal F (i.e., Cog domain later than S)
spatial position:   here           not here
Table 2. Dissociative marking (morphological) in Norwegian, Slave, and Tunen
                                          Norwegian   Slave   Tunen
reality:       not real                   n.m.        -O-     n.m.
temporality:   not contemporal P(ast)     -ET         n.m.    ls
               not contemporal F(uture)   n.m.        -O-     jo
space:         not here (away)            n.m.        n.m.    ka
[N.M. = not marked; indicates no overt morphological marking]
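The comparison in Table 2 lends itself to a simple lookup structure. The sketch below merely transcribes the table (forms as printed there, with `None` standing for "n.m.") and adds two hypothetical helper functions, `marked_contrasts` and `shared_markers`, introduced here for illustration; they are not part of the analysis itself.

```python
# Dissociative markers per language, transcribed from Table 2.
# None stands for "n.m." (not marked); forms are reproduced as printed.
DISSOCIATIVE_MARKING = {
    "Norwegian": {
        "not real": None,
        "not contemporal (past)": "-ET",
        "not contemporal (future)": None,
        "not here": None,
    },
    "Slave": {
        "not real": "-O-",
        "not contemporal (past)": None,
        "not contemporal (future)": "-O-",
        "not here": None,
    },
    "Tunen": {
        "not real": None,
        "not contemporal (past)": "ls",
        "not contemporal (future)": "jo",
        "not here": "ka",
    },
}


def marked_contrasts(language: str) -> list:
    """List the dissociative contrasts a language marks morphologically."""
    table = DISSOCIATIVE_MARKING[language]
    return [contrast for contrast, marker in table.items() if marker is not None]


def shared_markers(language: str) -> dict:
    """Map each marker that serves more than one contrast to those contrasts."""
    by_marker = {}
    for contrast, marker in DISSOCIATIVE_MARKING[language].items():
        if marker is not None:
            by_marker.setdefault(marker, []).append(contrast)
    return {m: cs for m, cs in by_marker.items() if len(cs) > 1}


print(marked_contrasts("Norwegian"))
print(shared_markers("Slave"))
```

A marker that turns up under more than one contrast, as Slave -O- does here, is the kind of overlap between deictic relations that the dissociation account is meant to capture.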
What is relevant and significant here is that markers for one deictic relation
may come to be used for one of the other relations. For example,
Botne (2003a: 396–97) shows that in Chindali (Bantu, Malawi) an
itive marker -ka-, indicating an action occurring at a distance from
the deictic center, developed an additional role as a future tense marker.
Parallel to this, a remote past marker, also -ka-, added the function of irrealis
marker. Consequently, Chindali marks all of the deictic contrasts
with the same morpheme -ka-. This expansion of functions from one
deictic function to another is also found in English -ED, which began
as a marker of past tense, but came to be used as well for irrealis.
The inclusive vs. dissociative distinction, therefore, constitutes an important
cognitive opposition that unifies these and, perhaps, other such
contrasts.
This concept of dissociation is not a new idea. Seiler (1971) appears to
have been the first to use the concept as a feature in his analysis of the
preterit in Greek. Steele (1975) adopts it in her analysis of irrealis and
past in the reconstruction of Proto-Uto-Aztecan, while Traugott (1978)
implies it in her proximal-distal distinction of tense relations. However,
James (1982) and Fleischman (1989) argue against Steele's use, preferring
instead to retain the temporal meaning "past" as a basic, fundamental
notion from which an irrealis reading is derived. More recently, Cutrer
(1994: 184) notes that ". . . temporal distance extends to express
non-actuality or non probability." Similarly, Taylor (2002: 395) states that
". . . the past tense presents a situation as located distant from the
ground, whether this be distance in time or distance in reality." We do
not dispute that irrealis use may derive from past use. What we are
proposing here, however, and what differs from previous proposals, is that
dissociation involves potentially all deictic phenomena related to linguis-
tic specication of event occurrence and that it constitutes a fundamental
organizing principle not only of tense phenomena, but also of related
verbal deictic phenomena. Furthermore, temporally it is not simply a
separation of past from present. As our data and analyses will demon-
strate, there are potentially several kinds of past or future reference that
can be differentiated: one or more that fall within the P-domain and one
or more which are dissociative and, hence, fall within a (past or future)
D-domain.
As further exemplification of the essential concepts we have set forth
here, consider briefly the case of Nugunu (A.62), a Bantu language
spoken in Cameroon, whose basic TAM system provides a concrete illustration
of these relationships in a complex system. There are eight verb
conjugations in Nugunu, as shown in (3) (Gerhardt 1989). In addition
to the pre-verbal temporal marker, the verb in some cases acquires a
H(igh) tone on non-initial syllables (for example, go fyga "réveiller"
[wake up] > fyga in the remote past (P3)).
(3) Nugunu verb conjugations (Gerhardt 1989; Orwig 1991 [ex. in (e)])
a. P3 matoa ma mba mo bombana gala
   voiture elle P3 le cognerH avant-hier
   "la voiture l'a cogné avant-hier" [the car struck him the day before yesterday]
b. P2 a a ds maa⁸ nts ms yshs iyo
   il P2 défricherH champs son hier
   "il a défriché son champs hier" [he cleared his field yesterday]
c. P1 a baa fyga tulubu
   il P1 réveiller tôt
   "il s'est réveillé tôt (ce matin)" [he woke up early (this morning)]
d. RSL go a g l ok d ba tsbsn
   tu RSL prendreH femme de ton_frère
   "tu as pris la femme de ton frère (et tu l'as encore)"
   [you've taken the wife of your brother (and you still have her)]
   go a fy ga
   tu RSL réveillerH
   "tu t'es réveillé" [you've awakened]
e. Pr a d mba
   3S leave
   "he's leaving/about to leave"
   IMPF a duenene (<due "sell")
   3S sell.IMPF
   "he is selling/will sell"
f. F1 ds gaa miee noni yssys
   nous F1 enterrerH aujourd'hui ceci
   "nous allons l'enterrer aujourd'hui" [we are going to bury her/him today]
g. F2 a na bola
   il F2 arriver
   "il arrivera [demain/dans quelques jours]" [he will arrive (tomorrow/in a few days)]
h. F3 a nga foaga nya ja heeni
   il F3 construireH maison là-bas
   "il construira une maison là-bas" [he will build a house over there]
Time reference:
P3    before yesterday
P2    preceding relevant time unit (e.g., yesterday, last month, etc.)
P1    earlier today
RSL   resultative
F1    today or tomorrow [but with adverb can be used for more distant time]
F2    1 or 2 days after tomorrow [later if certain]
F3    >2 days
The remote tenses, P3 and F3, comprise an initial nasal segment, hence,
m.ba and n.ga. The near tenses, P1 and F1, are decomposable as ba-a and
ga-a, respectively. The initial CV element can be observed alone in certain
relative clauses, where only the initial morpheme appears, as illustrated
by the P1 example in (4a), followed by what Orwig (1991: 151) terms a
dependent marker (DEP), -na-. Note that the P2 and F2 tense markers
do not change in form, as shown for P2 in (4b).
(4) Nugunu P1 and P2 in relative clauses (Orwig 1991)
a. P1 a baa ja a baa go gue, gscamsna gssgs m ba na bola
   he P1 do he P1 INF die time which I P1 DEP arrive
   "he had already died when I arrived"
b. P2 aja msss ma a na hume, m bss sda naa nyony
   when mass it P2 DEP let_out I NAR go to market
   "when mass let out, I went to the market"
The time denoted by P2 varies according to context, but always de-
notes the relevant time unit preceding the temporal locus, for example,
yesterday (if the locus is today), last month, last year. Consequently, the
temporal denotation overlaps that of P3 in time. In the model we are proposing,
this is readily accounted for: the two denote different perspectives
of the timeline; P3 situates an event in a D-domain, P2 in an anterior time
unit of the P-domain. The temporal markers of Nugunu are summarized
in Table 3 below.
A salient feature of this set is the parallel nature of the morphemes for
past and non-past, morphologically similar in both segmental form and
tonal marking. The remote tenses are both N.CV and low-toned, the
P2/F2 tenses mono-morphemic and high-toned, the P1/F1 tenses CV-a
with reversed H and L pattern. Given the separability of the final -a and
the predictability of the tones, we can analyze the -a as the same element.
The identity in form (i.e., a) of P2 and RSL makes it tempting to analyze
them as the same morpheme, as Gerhardt (1989: 321) does. Historically,
the P2 use undoubtedly arose from the resultative (RSL) use, which de-
notes a post-Nucleus resultant state of an event (Fig. 12) that has oc-
curred at some time prior to the moment of speaking; this state continues
to exist at the speech locus S, as illustrated in Figure 13a. Because the
state is denoted as current at S, adverbials denoting the time of the event
E cannot be used, as Gerhardt notes.
P2, on the other hand (illustrated in Figure 13b), denotes the temporal
relation of the event proper (E) with respect to S. In this case, it situates E
in that time unit immediately anterior to the current time unit. The relevant
temporal unit may be a natural time unit such as yesterday (or
last month, last year), or a societal time unit, such as that of a ruler's
reign, as in (5). In this use, a temporal adverbial such as iyo "yesterday" is
appropriate.
Table 3. Nugunu tense markers
         Tense marking
P3    m-ba     n-ga     F3
P2    a        na       F2
P1    ba-a     ga-a     F1
RSL   a        (-an)    Pr (Impf)
(5) ofuje yunu yo indenyee gsdj nyoma ss d (Gerhardt 1989: 321)
    chef ce-là 3S P2 diriger.P2 village an dix
    "ce chef-là dirigea le village dix ans
    (sous-entendu, c'est le prédécesseur de celui qui règne maintenant)"
    [that chief ruled the village for ten years
    (understood that it is the predecessor of the one who rules now)]
The question we pose here is why there should be such regularity in the
patterning of form and meaning. We propose that our domain model
provides a principled and motivated answer. The schema in Figure 14 il-
lustrates the Nugunu tenses in the model we have laid out. The tense
markers can be sub-divided into two sets based on their formal and se-
mantic characteristics, one that patterns along the moving-Ego timeline
through the P-domain, one along the Ego-moving timeline across the
temporal landscape connecting domains. We have analyzed each tense
marker as comprised of two elements; the P2 and F2 have Ø marking
where the P1 and F1 have final -a. Those markers that correlate with the
moving-Event timeline (and, hence, the P-domain) have a final -a when
marking the current time unit (i.e., within the bold quadrangle), zero-marking
when specifying the adjacent time units. Those correlated with
the Ego-moving timeline have an initial nasal element. Close observation
shows that the tones also pattern regularly.
Figure 12. Event structure and extension (post-Nucleus result phase)
Figure 13. Resultative (RSL) and Past P2 interpretations of a
a. a as Resultative (RSL) marker
b. a as Past P2 marking Anterior time unit
[Anterior (AnTU) and Current (CTU) time units]
What evidence is there in Nugunu to support the claim that the perceived
flow of time through the P-domain is toward the future, i.e., a
moving-Ego conceptualization? Both the resultative (RSL, see (3d)) and
the present imperfective (Pr Impf, see (3e)) foster this interpretation. The
RSL does not denote a retrospective view (]E X) of the event it marks,
but rather a continuous, on-going view of the result state at the time Ego
speaks (]E---Xd) (Figure 15). This is supported by the fact that use of this
form does not permit a past time adverbial that would situate the time of
the event itself.
The present imperfective (marked by -an- or -anan- or a phonological
variant) denotes an unbounded temporal interval which is construed
either as time contained within the event (an internal view of the event
E) (Fig. 16a) or as time containing the event (an external view that
invites a future interpretation) (Fig. 16b). In this case, Ego's perspective
is toward the endpoint (]E) of the event; hence, Ego is construed as
embedded in the time matrix moving forward through the interval into
the future.
Figure 14. Organization of tense markers in Nugunu
Figure 15. Resultative (RSL) ]E---Xd
Further justification for this analysis of the organization of tense
markers comes from two sources. First, the F2 and F3 futures do not
reflect simply a difference in remoteness. They differ in the degree of certainty
associated with each. The F2 future typically denotes a time tomorrow
or the day after. However, it may also denote a more distant time if
the speaker is certain. The F3 future typically denotes a time a few days
away, but may also be used to indicate uncertainty on the speaker's part.
This epistemic difference is captured nicely in the dissociation of the future
D-domain, and unifies temporal and epistemic meaning.
Second, certain types of dependent (DEP) clause marking differ according
to domain (see Orwig 1991). The examples in (4) above illustrate
this for P1 and P2, where the marker is na. In fact, it is na for all of the
P-domain markers, except for Pr and F2, which require a suffix -mo on the
verb. (These -mo forms appear to be an alternative to use of na in order
to avoid a present tense that would look identical in form to F2 and a na
na sequence that would result in F2.) On the other hand, the two remote
tenses, P3 and F3, both attach a final -a, as shown in (6) (Orwig 1991:
158). Thus, the two D-domains mark dependency differently from the
P-domain.
(6) a. P3 a mba ja a baa go due fsa sshs , gecamsna gssgs
       he P3 do he P1 INF sell avocados her time which
       m mba-a bola
       I P3-DEP arrive
       "he had already sold her avocados when I arrived"
    b. F3 nobola no jga ja no baa go naaa, gecamsna gssgs o
       rain it F3 do it P1 INF fall time which you
       jga-a gulu
       F3-DEP return
       "the rain will already have fallen when you return"
Figure 16. Pr Impf Xd]E
a. Internal imperfective view of time in E (be V-ing)
b. External imperfective view of time containing E (will V)
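The distribution of dependent-clause marking across the two kinds of domain can be summarized as a small decision procedure. The following Python sketch restates only what the prose says; the function name `dependent_marking` and the set names are hypothetical, and the inclusion of RSL among the na-taking forms is an assumption based on the statement that na is used for all P-domain markers other than Pr and F2.

```python
# Tense/tenor labels follow the text; the groupings restate the prose only.
D_DOMAIN = {"P3", "F3"}                   # remote (dissociated) tenses
P_DOMAIN_MO = {"Pr", "F2"}                # P-domain forms that take -mo on the verb
P_DOMAIN_NA = {"P2", "P1", "RSL", "F1"}   # assumption: RSL patterns with the na forms


def dependent_marking(tense: str) -> str:
    """Return the dependent-clause marking described for a tense label."""
    if tense in D_DOMAIN:
        return "-a"    # final -a attached to the tense marker, e.g. mba-a
    if tense in P_DOMAIN_MO:
        return "-mo"   # suffix on the verb
    if tense in P_DOMAIN_NA:
        return "na"    # dependent marker following the tense marker
    raise ValueError(f"unknown tense label: {tense}")


for label in ("P3", "P2", "P1", "Pr", "F2", "F3"):
    print(label, dependent_marking(label))
```

The first branch is the relevant one for the argument: both D-domain tenses share a single marking strategy that no P-domain form uses.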
The sentences in (6) also show that the P1 marker is not an absolute
tense, but a relative one: specifically, it denotes a past event considered as
a completed whole occurring within the temporal domain of the particular
reference time. The futures behave in the same way, as shown for F1 in (7).
(7) Kunuu a mba ls i yimene, gojaa a gss sda naa Makoa
    Tortoise he P3 be he know that he F1 go to Makoa
    "Tortoise knew that he would go to Makoa (later that day)" [Orwig 1991: 160]
The Nugunu tense system, then, can be analyzed as having the catego-
ries shown in Table 4, lines indicating which forms may cooccur.
The morphemes |ba| and |ga| invariably denote past and future, respec-
tively, whether situated in the P- or D-domains, which particular domain
being dependent on whether they co-occur with N- or not.
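This compositional claim can be illustrated with a short sketch that decomposes the segmented forms of Table 3. The code is a minimal illustration under stated assumptions: the function `analyze`, the `Analysis` tuple, and the hyphen-based segmentation are conventions adopted here for the example, and only the ba/ga-based markers are covered (P2 a and F2 na are deliberately rejected).

```python
from typing import NamedTuple


class Analysis(NamedTuple):
    domain: str   # "D" (dissociated) or "P"
    time: str     # "past" or "future"


# Segmented forms as given in Table 3 (tones not represented).
MARKERS = {"P3": "m-ba", "F3": "n-ga", "P1": "ba-a", "F1": "ga-a"}


def analyze(marker: str) -> Analysis:
    """Read domain and past/future off a hyphen-segmented ba/ga marker."""
    parts = marker.split("-")
    has_nasal = parts[0] in {"m", "n", "N"}          # initial nasal element N-
    root = parts[1] if has_nasal and len(parts) > 1 else parts[0]
    if root not in {"ba", "ga"}:
        raise ValueError(f"not a ba/ga-based marker: {marker}")
    domain = "D" if has_nasal else "P"               # N- signals a D-domain
    time = "past" if root == "ba" else "future"      # ba = past, ga = future
    return Analysis(domain, time)


for label, form in MARKERS.items():
    print(label, form, analyze(form))
```

On this reading, P3 and P1 differ only in the presence of N-, as do F3 and F1, which is the regularity the domain model is intended to motivate.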
6. Resultatives and perfects: differences between Nugunu and English
What we have labeled the Resultative (RSL) in Nugunu (marked by a)
has a translation in the English Present Perfect, as noted above in (3d).
We have also observed that P1 (marked by baa) may be translated by
the English Past Perfect (4a and 6a). Clearly, there is not a straightforward
correspondence between the Nugunu and English forms. How can
this apparent lack of correspondence be reconciled within the framework of
our model?
As we showed in Figure 12, Resultative a denotes a continuing post-Nucleus
state brought about by some event E that occurred in the past.
That is, it denotes the extension of the results of E into the present. Because
of this phasal focus, we can consider the Resultative to be a kind
of Aktionsart, marked only in the P-domain. From this developed the
P2 anterior time unit sense. In order to differentiate this latter kind of
domain-internal relation from the cross-domain relations of tense proper,
we adopt the term "tenor" to refer to the different temporal relations (distinct
from aspect and Aktionsart) marked in the P-domain.⁹ Consonant
with this distinction, henceforth, the terms "past" and "future" will be used
to refer to tense relations (i.e., cross-domain temporal relations), "Mpast"
and "Mfuture" for tenor (domain-internal) relations along the Time-moving
dimension.
Table 4. Nugunu verbal categories
[Columns: Domain, Past/Future, Dependency. Entries: N., a, ba, ga, na, a, na, (V)-mo; lines indicate which forms may co-occur.]
How, then, do the Nugunu Resultative and the English Perfect compare?
Both exhibit a moving-Ego time dimension, while neither permits
co-occurrence with a time adverbial denoting a past time. In both the
event E may have just occurred, though this is not directly denoted by
the construction itself, but determined in context. They differ in that the
Nugunu Resultative denotes a post-Nucleus phase to the event, while the
English Perfect denotes a temporal template (or overlay) imposed on
the event schema (Fig. 17). The imposed overlay establishes a point of assessment
(PoA) that acts as a new reference point (R2) that is, itself, situated
temporally with respect to S, while the event proper is interpreted
temporally with respect to R2. It is this point of assessment from which
the duration of time elapsed from the onset phase of the event (E[) (as in
Figures 17b–d) is determined. In the present perfect, R2 is coincident
with S as a default reading. The Nugunu Resultative lacks not only the
temporal overlay, but also the point-of-assessment from R2 of the English
Perfect. Consequently, it does not permit specification of duration
of time since the event occurred; rather, it only denotes a position in the
extension (the result phase) of the event.
Although many linguists consider the perfect to be an aspect, we
follow Bybee (1985: 160) here in considering it to have primarily
features of tense (tenor, in our model); it appears aspectual because it
introduces a point-of-assessment (R2) that may be situated inside the
event boundaries, as in (Fig. 17b). It is tense-like in that it relates the
time of the event to the time of assessment, which itself is related to
the time of reference. Current relevance at S is an invited interpretation
from the extension of the temporal overlay to a point of coincidence (of
R2) with S.
Consider now the "past perfect" use of Nugunu baa. In simple usage,
P1 baa denotes a hodiernal past, as shown in (8). However, it may be
combined in a complex construction with auxiliary ja "do", which may
occur with any of the tense/tenor markers, for example, as with P1 baa
(9a) and P3 mba (9b) (Orwig 1991: 158).¹⁰ Although these examples
have been translated with the past perfect, they denote a past-in-the-past
or ante-past reading rather than a perfect reading. The English
Perfect may have either a past perfect or an ante-past interpretation,
as in (10).
(8) P1 a baa bola na gsys ns
    he P1 arrive with morning
    "he arrived this morning"
(9) a. P1-P1 a baa ja a baa go gue, gscamsna gssgs m ba na bola
       he P1 do he P1 INF die time which I P1 DEP arrive
       "he had already died when I arrived"
    b. P3-P1 a mba ja a baa go due fsa ss hs, gecamsna gssgs m mba-a bola
       he P3 do he P1 INF sell avocados her time which I P3-DEP arrive
       "he had already sold her avocados when I arrived"
(10) a. The Corps had built a bridge across the river the previous year, [past perfect]
        but Spring flooding destroyed it last year. [past]
     b. The Corps had built a bridge across the river the previous year, [antepast perfect]
        but Spring flooding had destroyed it before the road opened. [antepast]
Figure 17. Interpretations of the English Perfect (moving-Ego time matrix)
a. Temporal overlay (bold) on Event structure.
b. Dave has lectured for 40 minutes; he'll wrap up shortly.
c. He has lectured at IU since 1985.
d. He has lectured at IU. [experiential]
Thus, P1 denotes a past tenor relation (i.e., within the same domain) with
respect to some specified reference locus. When that reference locus is not
S, its temporal location is indexed by temporal marking on the auxiliary
verb ja "do", either P1 or P3; a schema of the latter is provided in Figure 18.
The difference between Nugunu and English lies in the nature of the
temporal marking within the domains. In the P-domain, Nugunu morphologically
marks either tenor, i.e., proximity (current or anterior time
unit), or result state of a past event. In contrast, English does not mark
domain-internal temporal relations morphologically. Zero-marked verb
forms do not specifically situate the event with respect to S, though the
default interpretation is time of speaking (S). Rather, these non-past
forms can be used for any time, but only when temporal position is specified
by the appropriate temporal adverb. Thus, either the historical
present "yesterday he tells the chief he's taking the day off, and the chief
fires him" or the future "tomorrow he tells the chief he's quitting" is acceptable.
The periphrastic present perfect imposes over the event structure a
temporal overlay whose salient endpoint establishes a point of assessment
and a new point of orientation from some contextually identifiable locus
(the default being S), from which magnitude (= duration) and/or past existence
may be assessed; hence, one may indicate duration of the interval
or the time at which the event began (e.g., since 2 o'clock). Thus, while in
Nugunu one can indicate the time period (this morning, yesterday, etc.),
in English the perfect permits only adverbials of duration or point of
origination.
Figure 18. Past-in-the-past (or ante-past) in Nugunu
R2 indexed by ja; temporally situated by mba (P3)
E (sell) temporally situated by baa (P1)
Figure 19. Perfect and Past-in-the-past (or ante-past) in English
a. Perfect in the past
b. Perfect in the antepast
R2 indexed by have; temporally situated by -D
E (built) temporally situated by -N (i.e., the past participial form)
In this paper, we present a general picture of tense, tenor, and dissociation,
adducing from a variety of Bantu languages seemingly curious
evidence of different kinds that doesn't find a satisfactory analysis in a
simple one-dimensional linear approach. These cases, we propose, support
the view of a bipartite mental model of deictic relations. Before turning
to consideration of "curiosities" in individual languages, we add a
brief note on aspect in our approach.
7. Aspect and tenor in the domain approach
Aspect denotes the particular temporal view of time in the narrated event.
More precisely, a specific aspect denotes a particular temporal phase of
the narrated event as the focal frame for viewing the event. This focal
frame depicts the status of the event in relation to the vantage point determined
by Ego, by default typically the moment of speaking. Tenor, on
the other hand, situates the event at some location in time in relation to
a reference point. We have already addressed differences between resultatives
and perfects in the last section. Here we present a brief sketch of
some aspect and tenor marking in Kilega (D.25), a Bantu language spoken
in eastern Democratic Republic of Congo, in order to illustrate other
types of aspect and tenor in the P-domain.
Kilega, like most Bantu languages, exhibits a complex set of tense, tenor,
and aspectual marking. Data here are drawn from Botne (2003b and
field notes).
From the vantage point of the moment of speaking, Ego may adopt
any of three aspectual views of the event: inceptive, continuative, culminative,
marked by the prefixes -sa-, -ku-, and -a-, respectively. Because aspect
interacts differently with the different inherent lexical aspect of verbs,
we illustrate below three cases, with the activity verb -kangula "clear", the
inchoative transitional achievement verb -zombama "be(come) hot", and
the inceptive achievement verb -boboka "soften; become soft".
With the activity verb -kangula, the aspects denote views of three different
phases of the event, the beginning (or Onset), the core (or Nucleus), or
the culmination (or Coda), as exemplified by the sentences in (11) and the
diagrammatic representation in Figure 20.
(11) a. a-sa-kangula i swa¹¹ "he has started clearing the field"
b. a-ku-kangula i swa "he is clearing the field"
c. a-(a-)kangula i swa "he has just cleared the field"
Not all events are activities. Achievements differ in that they have a
punctual core (Nucleus), and may have different configurations of onset
and coda phases. Two such verbs are illustrated below. The inchoative
transitional achievement verb -shika "be(come) hot", as exemplified in
(12), comprises both an onset coming-to-be phase as well as a stative
coda phase. As with the activity verb, the inceptive and culminative aspects
indicate a focal frame just before or just following the nucleus of
the event, as in Figure 21. The continuative -ku-, however, denotes time
in the stative coda phase, rather than in the nucleus phase. That happens
for two reasons: (1) the nucleus is a point, so it cannot be continuative; (2)
the stative coda constitutes the semantic essence of the event. Hence, we
can say that continuative -ku- depicts an interval in the semantic core of
the event, which may or may not be the nucleus.
(12) a. i dya li-sa-sh ka "the food is beginning to get hot"
b. i dya ly-a-sh ka "the food has just become hot"
c. i dya li-ku-sh ka "the food is very hot"
The inceptive achievement verb -boboka "soften" (13) differs from -shika
in not encoding a coda phase; rather, it encodes an onset and punctual
nucleus (Fig. 22). Again, the three aspects may apply, with the inceptive
and culminative aspects denoting, as expected, inception and culmination
with respect to the nucleus of "softening". However, the semantic essence
in this verb is the onset phase, which is denoted by continuative -ku-.
(13) a. by na bi-ku-boboka "the fish [are showing signs that they] will
soften [as they cook]"
b. by na bi-sa-boboka "the fish are almost [beginning to be] soft"
c. by na by-a-boboka "the fish have softened [and are ready to
eat]"
Figure 20. Focal phases of -kangula denoted by aspect marking
Figure 21. Focal phases of -shika 'be(come) hot' by aspect marking
The three aspectual markers constitute a set depicting different views
of the event at the moment of speaking S, schematized for the different
views of an activity verb in Figure 23.
These aspects may have as their vantage point some time other than S.
In such cases, a complex construction (a form of the auxiliary verb 'be'
plus the aspect-marked form of the verb) is used. 'Be' indexes the new
vantage point, determined according to tenor marking in the P-domain,
as shown in Figure 24, or by tense marking. Note that the completive
and continuous aspect constructions behave as tenor markers in the
current time unit in opposition to those that mark the anterior and
posterior time units.
(14) a. tw-a-bez-ag-ile tu-ku-kangula i swa
        1P-P2-be-IMPF-P2 1P-CONT-clear field
        'we had been clearing the field [at that time]' (yesterday)
Figure 22. Focal phases of -boboka 'soften' by aspect marking
Figure 23. Three views of an activity verb at S (P-domain)
Figure 24. Tenor marking in the Kilega P-domain
     b. tw-a-b-e tu-sa-kangula i swa
        1P-F2-b-F2 1P-INCEP-clear field
        'we will have started clearing the field [at that time]' (tomorrow)
     c. tw-a-bez-ag-a tw-a-kangula i swa
        1P-P1-be-IMPF-P1 1P-COMP-clear field
        'we had cleared the field [at that time]' (earlier today)
As the Kilega data suggest, certain forms may be perceived as more or
less denoting aspect or tenor within a TMA system. Data from other
languages in our exposition will illustrate further differences between
the two.
Having fleshed out our model first, we turn now to our goal in this
paper: to illustrate various kinds of evidence from Bantu and Bantoid
languages that support this concept of separate cognitive domains. The
kinds of evidence that we focus on here come from unexpected and
inadequately explained "curiosities" in the data.
8. Curiosity #1: Use of the remote past in Basaa (A.43; Cameroon)
The common approach to Bantu language tenses, as for most languages,
is to map tense markings to appropriate intervals of a timeline (cf., for
example, Nurse and Muzale 1999 for Ruhaya and other lacustrine Bantu
languages; Maganga and Schadeberg 1992 for Kinyamwezi; Taylor 1985
for Runyankore/Rukiga, among others). We have done this for some
simple tense markings in Basaa, labeling for convenience each form P1,
P2 and so forth, as it seemingly situates events farther and farther from
the tense locus, here the speech event (S) (see Fig. 25 and examples in
(15)). While this approach seems intuitively sensible at first glance, it
nevertheless fails to account for the data in any satisfying way or
provide any insight into how Basaa speakers organize and conceptualize
event space.
(15) a. P1 a n-sebel juu 'she called last night'
     b. P2 a b -sebel snd ntagbs 'she called last week'
     c. P3 a -w gwet b 14 'she died in the war of (19)14 (WWI)'
           a -pam aj lsn 'she went out ages ago today'
                  then today
Figure 25. Tenses in Basaa (Mbom 1996; Hyman 2003)
     d. Pr/F1 a n-temb kokoa 'she is returning this evening'
     e. F2 a ga-masak jw i nl 'she will dance next year'
     f. F3 nsajgw a-a jkj s ksl yada 'there will be peace in the
           world one day'
           a a-ks ha lsn 'she will leave later today'
                there today
Time reference (general):
P3 remote past
P2 yesterday or earlier
P1 earlier today
Pr/F1 present or future today
F2 tomorrow or later
F3 remote future
The particular curiosity (indicated by ) that we focus on here is the
use of the remote past P3 and remote future F3 with the adverbial len
'today', which, apparently, cannot be done with either P2 or F2.
Naturally, one has to wonder why this should be the case. We propose
that the P3 and F3 markers, unlike the P2 or F2 markers, situate the
event in a D-domain, as schematized in Figure 26. In Basaa, this
represents a subjective sense of distance or separation of the event
with respect to the speech event; hence, not only can it be used to
refer to temporally distant events, but also to temporally proximate
ones, which are subjectively construed as remote, or "dissociated" in
our terms. The other verbal morphemes denote temporal divisions within
the P-domain, as illustrated. The N-
Figure 26. P- and D-domains in Basaa
morpheme denotes an event within the current time unit (CTU), with tone
marking determining whether it falls before or after S (for example,
'today', 'this month', etc.); P2 b - and F2 ga- denote adjacent time
units (anterior or posterior to the CTU), for example, 'yesterday',
'last month', or 'tomorrow', 'next year', respectively. P2 and F2
cannot be used with len 'today' because they encode a time period that
is NOT the current time unit.
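The compatibility facts just described can be restated as a small lookup. The following Python sketch is offered only as an illustration of the reasoning, not as part of the analysis: the class, field, and function names are hypothetical, and the assignment of each tense label to a domain and time unit simply follows the discussion above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TenseMarker:
    label: str                # tense label as used in (15)
    domain: str               # "P" = contemporal, "D" = dissociated
    time_unit: Optional[str]  # "current" or "contiguous"; None in a D-domain


# Classification assumed from the text: N- markers sit in the current time
# unit, P2/F2 in a contiguous time unit, P3/F3 in a D-domain.
BASAA = {
    m.label: m
    for m in [
        TenseMarker("P1", "P", "current"),
        TenseMarker("Pr/F1", "P", "current"),
        TenseMarker("P2", "P", "contiguous"),
        TenseMarker("F2", "P", "contiguous"),
        TenseMarker("P3", "D", None),
        TenseMarker("F3", "D", None),
    ]
}


def allows_today(label: str) -> bool:
    """Return True if the marker can co-occur with the adverbial 'today'."""
    marker = BASAA[label]
    if marker.domain == "D":
        # Remoteness is subjective here, so no time unit is excluded.
        return True
    # Within the P-domain, only current-time-unit markers fit 'today'.
    return marker.time_unit == "current"
```

Under these assumptions, `allows_today` is false only for P2 and F2, which is the restriction the D-domain analysis is meant to explain.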
Time in the P-domain is represented tentatively as moving-Ego, although
there is little evidence available. Most grammatical descriptions
appear to describe the Mbene variety of the language. Schürle (1912: 74)
notes, however, that the Bakoko variety uses a form that incorporates the
verb ks 'go' in the near future, as in mi n-ks l 'I will come' [lit. 'I
F1-go come']. Following the analyses with motion verbs set out in Botne
(2006b), we believe this use to be indicative of a moving-Ego timeline.
In a more conventional manner, we can represent the semantic organization
of the Basaa tense system as in Table 5. The (extended) contemporal
dimension comprises two cross-cutting concepts: (1) direction, earlier
(<S) or later (>S), with respect to the deictic anchor, i.e., the speech
event S, and (2) location with respect to the deictic anchor, either
situated within the same time unit or in the comparable
adjacent/contiguous time unit.
This analysis of Basaa implies that there are, then, different kinds of
remoteness possible. In the P-domain, we find a "measured" remoteness in
terms of temporal proximity to the deictic center, within or outside of
the relevant current time unit. Projection of an event into a D-domain,
on the other hand, connotes a subjective separation and distance; the
event is "in another world". This distinction, we feel, provides the
basis for a more nuanced analysis of the concept of remoteness. This
issue will appear again in several of the following cases.
9. Curiosity #2: Negation patterns in Tunen (A.44; Cameroon)
A second type of evidence comes from Tunen. Like Basaa, Tunen exhibits
multiple past and future tenses. Unlike Basaa, however, it is the
negative forms that are of direct interest, as they vary across the
tenses. Hence, examples of both affirmative and negative tenses are
illustrated in the sentences in (16)–(18).
Table 5. Organization of Basaa temporal markers
      Contemporal [P-domain]         Not contemporal
      Current TU    Contiguous TU
<S    n-            b -              [Past D-domain]
>S    n-            ga-              a- [Future D-domain]
(16) a. P4 msk ls wam a mon n [Dugast 1971: 182]
        leopard P4 my child kill
        'the leopard killed my child'
     a′. wam a mon ata mts a ls ls na [Ibid.]
        my child NEG one 1S NEG P4 be_sick
        'not one of my children was ever sick'
     b. P3 ba ka nekaka b lihani m`ss malsndolonum [Ibid. 180]
        3P P3 meeting fix days seven
        'they set the meeting in seven days'
     b′. hiseli sa siana metana ta buss bomts [Ibid. 181]
        antelope NEG lie_down hunger NEG day one
        'Antelope did not sleep hungry, not even for one day'
     c. P2 ms na nifu sambs o buana numwa [Ibid. 178]
        1S P2 package put bed under
        'I put the package under the bed'
     c′. o sa miajo sin [Ibid. 179]
        2S NEG me see
        'you did not see me'
     d. P1 ms no mokolo nk [Ibid. 176]
        1S P1 foot break
        'I broke my foot (just a moment ago)'
     d′. same negative form as P2 [Ibid. 194]
(17) a. Aorist a miajo mona bwansn [Ibid. 172]
        3S me child carry.for
        'he carries the child for me'
     a′. msss ls bslabsnia b bsndo ns [Ibid. 173]
        chimpanzees NEG food of humans eat
        'chimpanzees don't eat human food'
     b. Present ba ndo efs nys [Ibid. 176]
        3P Pr maize_porridge
        'they are eating maize-porridge'
     b′. a ls ndo buoli nyo [Ibid.]
        3S NEG Pr work work
        'he is not working'
(18) a. F1 ms ndo bua buh na sabon-ak [Ibid. 183]
        1S F1 your debt pay-F1
        'I will pay your debt [today]'
     a′. nij a miajoa, ms sa noye
        save me 1S NEG in_that_manner
        kia ton [Ibid. 185]
        do anymore
        'save me, I won't act like that anymore'
     b. F2 o na sabon imw nyi na many kul emts [Ibid.]
        2S F2 pay goat and medicines time one
        'you will pay (for) the goat and medicines at the same time'
     b′. o kal o ss? yam mila sa ta [Ibid. 186]
        2S explain 2S say my palm_nuts NEG produce_much
        'you said that my palm nuts would not produce much [oil]'
     c. F3 ms jo ndasa bulila [Ibid. 187]
        1S F3 VEN.come tomorrow
        'I will come tomorrow'
     c′. ms so jo ajo mima f`alabi [Ibid. 188]
        1S F3 NEG 2S hut build.CAUS
        'I will not have a hut built for you'
These sentences illustrate the general use of the different tense
markers. As one might expect, the markers change form from one tense to
another, with the sole exception of F1, which is the Present plus the
suffix -Vk (whose vowel harmonizes with the root vowel). The general
time reference of each tense and both its affirmative and negative
markers are listed in Table 6. The curious facts in Tunen are (1) why
some tenses form their negatives with a form of sa while others do so
with ls or so, and (2) why the Present and F1, which have the same
affirmative marker ndo, have different negative marking. Moreover, note
that the ls and so negatives are added to the affirmative tense marker,
while sa negatives replace the corresponding tense marker.
Table 6. Affirmative and negative tense markers in Tunen (Dugast 1971)
          Time reference                Affirmative T   Negative T
P4        distant, time not precise     ls              ls ls
P3        pre-hodiernal                 ka              sa
P2        earlier today; stories        na              sa
P1        immediate past                no              [same as P2]
Aorist    general present                               ls
Present   in midst of E at S            ndo             ls ndo
F1        hodiernal                     ndo -Vk         sa
F2        time not precise, certain     na              sa
F3        tomorrow or later, certain    jo              so jo
At first glance, the forms and distribution of the negative markers seem
almost arbitrary; sometimes a variant of sa, sometimes ls, and in one
case so, with no apparent motivation for the distribution. For example,
why should P4 and Pr tenses have ls, but P3 and P2 forms of sa? The
distribution assumes a definite pattern when we consider the
organization of these markers in terms of P- and D-domains. Our proposed
analysis of the organization and distribution of the affirmative tense
markers, based on their form and semantics, is shown in Figure 27. The
timeline in the P-domain is presented as moving-Event, based on the
premise that P3 ka was derived from the itive (movement away) marker ka.
It is temporally sub-divided by multiple verb markers; the P4 and F3
markers, however, situate events in different D-domains, either "not
contemporal (past)" or "not contemporal (future)", respectively.
It is the organization and distribution of negative markers, however,
that is of primary interest here. First, we find that all of the sa
negatives are situated along the moving-Event timeline (i.e., within the
P-domain), differing only in tone: pre-S "before today" a low tone,
pre-S "today" a high tone, post-S a rising tone (Fig. 28a). Mous (2003:
294–95) treats each sa form as a distinct temporal morpheme "because
nothing can be gained in terms of economy of description by extracting a
common negative element sa . . .". We disagree; there are two pieces of
semantic information encoded in each unit: sa indicates negation, tone
the time negated. Hence, we conceive of each as a combination of two
elements, segmental sa plus a tone. Furthermore, these sa forms
completely replace the affirmative tense markers and are, consequently,
the only indicators of time.
Figure 27. Distribution of affirmative tense marking in Tunen
Second, the "odd" negatives ls and so lie along the Ego-moving timeline,
ls for non-future, so for the future D-domains. Unlike the sa forms,
these co-occur with the corresponding affirmative tense marker.
There are two present constructions, the Aorist (called Present indefini
by Dugast) and the Present (called the Present ponctuel by Dugast). In
both cases, ls is inserted before any tense marking. Since the Aorist
form is simply the bare verb, it has simply ls plus the verb in the
negative. In the Present, ls occurs before the tense marking, hence, ls
ndo (the high tone on ls being contributed by ndo). The Aorist expresses
a general fact, habit, or situation "sans consideration . . . de
position dans le temps . . ." ['without consideration . . . of position
in time . . .'] (Dugast 1971: 172). That is, it does not denote an event
on-going at S. Thus, the semantics of this form naturally correlate with
the Ego-moving timeline, which depicts time as static and unbounded. It
is not surprising, then, that the negative patterns the way it does. On
the other
a. Negatives along the moving-event timeline (P-domain)
b. Negatives along the Ego-moving timeline
Figure 28. Distribution of negative tense marking in Tunen
hand, the Present does denote an event on-going at S and so presents a
different situation.
From the data examined so far, we can see that Pr and F1, both having
the tenor marker ndo, behave differently with the negative: F1 patterns
in the negative (i.e., manifests a form of sa) with other tenor forms
in the P-domain; Pr, like the Aorist, patterns with ls negatives outside
the P-domain. Since we naturally anticipate an on-going present to
pattern within the P-domain, why might the negative Pr pattern as it
does in Tunen? We propose the following. In the affirmative, the F1 ndo
Root-Vk construction clearly derives from the Pr, hence, both F1 and Pr
in the affirmative can be assumed to assert a fact about an event
occurring in the P-domain, one soon after S (F1), one on-going at S.
Negating F1 denotes that in the world that Ego perceives to exist at S
(i.e., the P-domain), it is not a possibility that E will occur in the
near future. Negating Pr, on the other hand, denotes that E is not real
at the moment of speaking S. Tunen speakers have, as we have observed,
correlated this fact with the use of ls, denoting "not real at S". That
is, the present affirmative asserts the reality of E at S, the negative
the non-reality (or, to put it another way, X may be doing something,
but E isn't it, i.e., the reality).
Support for this comes from the observation that the morpheme ls
is linked to the verbal deictic D-domain "not real", in that it is found
in the negative of irrealis constructions (what Dugast labels "la forme
subjective" (1971: 189–90), Mous (2003: 297) "optative"), which in the
affirmative have the form sp- -root, in the negative sp-ls-root. This
subjective form is used in expressing wishes, desires, intentions,
questions. Tunen speakers have, then, negated Pr with the same negative
form associated with dissociated domains, past and, specifically,
irrealis.
The analysis of negative forms in Tunen suggests an organization such
as that depicted in Table 7. What is clear from these data is that the dis-
tribution of negatives in Tunen is not arbitrary, but motivated, we claim,
by the contrast in perspectives on time coupled with the cognitive division
between P- and D-domains.
Table 7. Organization of negative marking in Tunen
      Contemporal       Not Contemporal          Not real
      [P-domain]        [Temporal D-domains]     [Irrealis D-domain]
<S    sa (p3)           ls T (p4)
      sa (p2)
S                                                ls T (pr)
>S    sa (f1 and f2)    so T (f3)
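The two generalizations about Tunen negation (sa replaces the tense marker, whereas ls and so are added before it) can be checked mechanically against Table 6. The Python sketch below is an illustration under simplifying assumptions: the dictionary and function names are hypothetical, markers are spelled as they appear in Table 6, and tone is ignored, so the three tonal variants of sa are not distinguished.

```python
# Preverbal affirmative markers, following Table 6 (tone omitted).
# The F1 suffix -Vk attaches to the verb and is left out here.
AFFIRMATIVE = {
    "P4": "ls",
    "P3": "ka",
    "P2": "na",
    "P1": "no",
    "Aorist": "",  # bare verb: no tense marker
    "Present": "ndo",
    "F1": "ndo",
    "F2": "na",
    "F3": "jo",
}

# Negative element selected by each tense, following Table 6.
NEGATOR = {
    "P4": "ls",
    "P3": "sa",
    "P2": "sa",
    "P1": "sa",  # Table 6: same negative form as P2
    "Aorist": "ls",
    "Present": "ls",
    "F1": "sa",
    "F2": "sa",
    "F3": "so",
}


def negative_marking(tense: str) -> str:
    """Build the preverbal negative sequence for a tense."""
    neg = NEGATOR[tense]
    if neg == "sa":
        # sa replaces the affirmative tense marker.
        return neg
    # ls and so precede the affirmative tense marker (if there is one).
    return f"{neg} {AFFIRMATIVE[tense]}".strip()
```

With these inputs the function reproduces the Negative T column of Table 6, for example `ls ls` for P4, `ls ndo` for the Present, and `so jo` for F3.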
10. Curiosity #3: Differential implications with tense markers: Lusaamia
(J.34, Kenya) and Ekoti (P.30, Mozambique)
The simple sequential correlation of tense markers with a single timeline
would suggest that the only difference between, for example, one future
marker and another would lie in the temporal distance from the time
of speaking (although see Janssen 1994 for a different view in Dutch).
However, data from Lusaamia and Ekoti demonstrate that this is not
necessarily the case. Rather, there is a difference in implications or
in restrictions on use. Consider first a pair of examples from Lusaamia,
one marked with the near future prefix -na-, the other with the remote
future prefix -axa- and final -e, as shown in (19).
(19) Lusaamia (data from Botne field notes)
     a. xusuub ra mbwee a-na-meny-a
        1P.hope.FV that 3S-F2-live-FV
        'we hope that it [child] will live'
        [implies child exists, i.e., is living]
     b. xusuub ra mbwee y-axa-meny-e
        1P.hope.FV that 3S-F2-live-F2
        'we hope that it [child] will live'
        [implies child has not been born]
Note that in (19a) the sentence is only felicitous when the child spoken
of is actually alive at the moment of speaking. Use of the remote future
construction, in contrast, is felicitous if the child has not yet been
born, regardless of whether, say, the mother is currently pregnant or
not. This difference in implication falls out naturally in the model we
are proposing. As illustrated by the schema in Figure 29, the -na- future
situates the event within the P-domain, that is, within the contemporal
world in which the speaker perceives herself to be. In contrast, the
-axa-ROOT-e future construction situates the event in a dissociated
future "world" in which the child does not yet exist; that world we have
labeled a D-domain. (See Botne 2006a for greater detail and discussion
of Lusaamia TMA forms.) The dotted timeline indicates that we do not have
enough evidence at this time to select between a moving-Ego or
moving-Event analysis.
A similar contrast is found in Ekoti. There are two simple past con-
structions, a recent past formed with -a-. . .-a (20) and a remote past
formed with -aa-. . .-iy-e (21).
(20) Recent Past (P1)
     a. taana n-a-c-a fooxi [Schadeberg and Mucanheia 2000: 172]
        yesterday 1P-P1-eat-P1 together
        'yesterday we ate together'
     b. mwanakhwaawe a-(a-)n-x c-el-a wuuluvala [Ibid. 170]
        chicken.her 3S-P1-it-slaughter-APPL-P1 become_old
        'her chicken, she slaughtered it because of old age'
     c. k(i)-a-n-s kan-a ari wi kuri [Ibid. 135]
        1S-P1-3S-meet-P1 3S.be LOC.Inguri
        'I met him (while) he was in Inguri'
     d. mvuka w-(a-)uum-a ncuwa [Ibid. 164]
        rice it-P1-dry-P1 LOC.sun
        'the rice (has) dried in the sun'
(21) Remote Past (P2)
     a. Hayaathi ti-ye y-aa-m-par z-iye hooma taana [Ibid. 151]
        Hayathi COP-3S 9-P2-3S-heat-P2 9.fever yesterday
        'Hayathi got/had a fever yesterday'
     b. (a-)aa-lum-ach- (y)-w-e [Ibid. 89]
        1S-P2-bite-INT-P2-PASS-P2
        'he was very badly bitten'
According to Schadeberg and Mucanheia (2000: 112), the P1 tense de-
notes a completed action in the recent past, perfective in nature. It also
may be used when there is a sense of present result, as in (20d). Although
translatable by the English Perfect, it is neither a perfect as in English nor
fully resultative, indicating a present state. Nevertheless, it invites a sense
of current relevance.
The remote past denotes an event that happened in a distant past.
However, as the example in (21a) illustrates, this may be as recent as
Figure 29. Lusaamia domain organization
yesterday. We posit that the difference between the uses of the two
tenses lies in the dissociative nature of the remote past. That is, the
so-called remote past situates an event in the D-domain, the recent past
in the P-domain (Fig. 30). Current relevance arises from an event's
being situated in the P-domain in opposition to the D-domain.
The timeline through the P-domain denotes moving-Ego. Evidence
comes from grammaticalization of the verb -eetta 'go' (>-tta) in the
present progressive/immediate future construction. This construction, as
shown by the examples in (22), consists of the present tense marker
-n(i)- prefixed to the phonologically-reduced form of 'go' followed by
the infinitival form of the main verb. The present marker alone on the
main verb, as in (22b), indicates a generic fact, and can be considered
to mark present along the Ego-moving timeline.
(22) a. ki-n-tta o-lawa [Schadeberg and Mucanheia 2000: 142]
        1S-Pr-GO INF-leave
        'I am leaving' or 'I am about to leave'
     b. akot a-n-l ma maxapa m-pamela [Ibid. 109]
        Koti_people 3S-Pr-cultivate farm LOC-interior
        'the Koti people farm in the interior'
What is curious in Ekoti is that the distinction between the two past
forms has consequences for the syntax. In the examples in (23), we find
that the same proposition, 'Portuguese build fortress', requires
different tense marking depending on whose "world", the Portuguese or
the fortress, is perceived as salient.¹² In (23a), the Portuguese are
the salient agents who built in the remote past, hence, the remote tense
marker is
Figure 30. Ekoti pasts
used. However, in the passive (23b), when the fortress becomes salient as
the subject, the near past marker is used.
(23) Ekoti [Schadeberg and Mucanheia 2000: 116]
     a. azuku (a-)aa-cek- ye fortaleeza
        Portuguese 3P-P2-build-P2 fortress
        'the Portuguese built the fortress'
     b. fortaleeza y-a-cek- w-a naazuku
        fortress 3S-P1-build-PASS-FV by-Portuguese
        'the fortress was built by the Portuguese'
As with the case in Lusaamia, this difference in use and implications
falls out naturally from the model we are proposing. The action of the
Portuguese, the subject and topic of the active sentence (23a), occurred
in a remote and dissociated past and, hence, is marked with the
D-domain tense marker. However, the fortress, subject and topic of the
passive sentence (23b), still exists in the contemporal world of the
speaker and is marked with the past appropriate to the P-domain.
A reviewer has suggested that use of the recent past for (23b) may be a
conventionalization of the fact that the resulting state is nearer the
present than is the agent of the action. The same reviewer also suggested
that the Lusaamia examples could be accounted for by the lexicalization
or idiomatization of an invited inference. Specifically, the remote
future use emanates from the invited inference that the child's not yet
existing is necessarily more remote than if the child has already been
born. True, these may be possible ways to account for the observed data.
However, our model provides a principled and motivated framework within
which these, as well as other, disparate facts can be accounted for in a
unified manner. Another of these disparate curiosities occurs in the
next language we examine.
11. Curiosity #4: Differential temporal implications of lexical items in
Chisukwa (M.20, Malawi)
Another kind of implicational evidence can be found in the senses of
particular lexical items as they are used with different tenses.
Consider, for example, the Chisukwa verb -fwa 'die' (24) (data from
Kershner field notes). Apart from its common use referring to humans
(24a), it can also be used metaphorically with respect to the closing of
a store (24b and c).
(24) Inchoative verb uku-fwa 'to die'
     a. PR a-ku-fw-a. 's/he is dying/will die.'
           3S-Pr-die-FV
     b. P2 isitoolo y-aa-fw- ile
           9.store 9-P2-die-CMPL
           'the store closed' [implication: temporary]
     c. P3 isitoolo i-ka-fw-a
           9-P3-die-FV
           'the store closed' [implication: out of business]
In a strictly linear analysis, we would expect the only difference
between P2 and P3 to be an earlier or later time reference, such as
'yesterday' and 'before yesterday', respectively. However, what we find
is that there is an implication of temporariness for P2, but of
permanency for P3. Although anomalous in a simple linear model, this
distinction is motivated in our domain model. The P3 marker -ka-
situates the event in the D-domain where the event is interpreted as
permanent, i.e., no longer active, as opposed to P2, which indicates it
is past in the P-domain but still active, hence, interpreted as a
potentially temporary state of affairs.
A second, and similar, piece of lexical evidence is found in the
difference in interpretation of the aspectualizer -leka 'cease' when it
occurs with -ka- P3 forms in contrast with tenses marked with -aa-, e.g.,
P1 and P2, as shown in (25). Similar to the -fwa case in (24), there is
an implication of permanency with -ka- and temporariness without -ka-;
this is reflected in the interpretation equivalent the verb receives,
that is, 'quit' (for P3) versus 'stop' (for P1).
(25) Aspectualizing verb uku-leka 'cease'
     a. P3 tu-ka-lek-a pakuseenga boo aafulala
           1P-P3-cease LOC.INF.build after 3S.P1.become_injured
           'we quit building after he became injured'
        P1 tw-aa-lek-a pakuseenga boo aafulala
           1P-P1-cease LOC.INF.build after 3S.P1.become_injured
           'we stopped (temporarily) building after he became injured'
     b. P3 ba-ka-leka pakugobola if loombe looli bakubyaala
           3P-P3-cease LOC.INF.harvest maize but 3P.Pr.plant
           nukugobola amalesi
           and.harvest millet
           'they quit harvesting maize and, instead, are planting and
           harvesting millet'
        P2 b-aa-lek-ite pakugobola if loombe
           3P-P2-cease-CMPL LOC.INF.harvest maize
           'they stopped (temporarily) harvesting the maize'
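The pattern shared by (24) and (25) can be summarized as a two-step lookup: the tense marker selects a domain, and the domain selects the implication. The Python sketch below is a hedged illustration only; the names are hypothetical, the -fwa readings are those given for the 'store' examples in (24b and c), and the P1 reading for -fwa is an extrapolation from the statement that tenses marked with -aa- pattern together.

```python
# Domain assumed for each Chisukwa past tense: -aa- forms (P1, P2) are in
# the P-domain, the -ka- form (P3) is in the D-domain.
DOMAIN = {"P1": "P", "P2": "P", "P3": "D"}

# Implication associated with each domain in the analysis above.
IMPLICATION = {"P": "temporary", "D": "permanent"}

# English equivalents reported in (24) and (25).
READING = {
    ("-fwa", "temporary"): "closed (temporarily)",
    ("-fwa", "permanent"): "closed (out of business)",
    ("-leka", "temporary"): "stop",
    ("-leka", "permanent"): "quit",
}


def reading(verb: str, tense: str) -> str:
    """Return the reading of a verb under a given past tense."""
    return READING[(verb, IMPLICATION[DOMAIN[tense]])]
```

The point of routing the lookup through `DOMAIN` is that the temporary versus permanent contrast is stated once, for the domains, rather than separately for each verb.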
As in the preceding cases of interpretation in Lusaamia and Ekoti, one
might say that the difference in interpretations is due to an invited
inference that permanent cessation would naturally extend further into
the past, while temporary cessation is more likely with a recent past.
Again
we point out that, while such an account is possible, it seems to us to
be rather ad hoc. In our analysis, the various and disparate curiosities
we have enumerated can all be accounted for in a unified manner.
12. Curiosity #5: Parallel constructs in Lucazi (K.13, Angola)
With Lucazi, we consider six non-future verbal constructs that
illustrate the inter-connection of tense, tenor, and aspect. These
constructs do not constitute all the non-future forms in the language
(there are also gnomic, habitual, and progressive forms), but the ones we
examine here are sufficient to illustrate a particularly interesting
case of pattern congruity within the dissociative model.
The six constructs each consist of a prefix (two prefixes in one
instance) and a suffix circumscribing a verb stem, as shown in Table 8,
labeled as in Fleisch (2000). As the reader will note, there are two
prefixes, -na- and -a-, that recur, and which Fleisch has associated
with the concepts "anterior" and "perfective", respectively. Our
analysis diverges from Fleisch's in significant ways; consequently, the
reader should note carefully that we do not employ the terminology as he
does.
We begin our analysis by considering first the two formally parallel
patterns with cross-cutting co-occurrence of the prefixes -na- and -a-
with the suffixes -V and -ile, as set out in Table 9. The suffix -V
represents a vowel that typically harmonizes with that of the verb stem.
Consider the examples in (26) of the -a-. . .-ile construct, Fleisch's
Simple Past.
(26) -a-. . .-ile construct
     a. v-a-h t-ile mu-musenge [Fleisch 2000: 164]
        3P-PST-pass-PFV LOC-bush
        'they crossed the wilderness'
Table 8. Lucazi non-future constructs (Fleisch 2000)
Anterior         -na-. . .-V         -a-. . .-V      Present Perfective
Past Anterior    -na-. . .-ile       -a-. . .-ile    Simple Past (Perfective)
Hesternal Past   -na-ka-. . .-ile    -a-. . .-a      Perfective
Table 9. Parallel prefix-suffix pairs in Lucazi
affixes   -V             -ile
-na-      -na-. . .-V    -na-. . .-ile
-a-       -a-. . .-V     -a-. . .-ile
     b. ka-tali u-a-mu-sum-in-ine¹³ [Ibid. 165]
        dog 3S-PST-3S-bite-APPL-PFV
        'the dog bit him'
     c. kasumbi u-a-y-ile ku-alenga ndonga [Ibid. 142]
        chicken 3S-PST-go-PFV LOC-border river
        'one day, Chicken walked along the river'
Fleisch (2000: 166) notes the following characteristics associated with
use of the Simple Past:
• it denotes absolute temporal reference;
• it is associated with a notion of remoteness, though not in a strictly
  metrical sense (i.e., the sense of remoteness arises from something
  other than simply temporal linear distance);
• it provides the temporal reference point as a background for other
  states of affairs;
• it does not express or imply later consequences resulting from
  occurrence of the event.
These semantic attributes lead us to conclude that the -a-. . .-ile
construct denotes a complete (i.e., whole) and completed event situated
at an undivided moment in the past. In our model, it situates an event in
the past D-domain. Hence, we label it the D-Past. How does this D-past
differ from the similar -na-. . .-ile construct, which also denotes a
past event?
Fleisch (2000: 168) notes first that "the -na-. . .-ile construct . . .
may refer to markedly remote states of affairs, but at the same time it
is in some respects less remote than the simple past [i.e., what we are
calling the D-Past]". Second, it has potentially a past perfect
interpretation, as in (27c–e), meaning it can be interpreted with respect
to some reference point other than S; hence, it is relative and not
absolute.
(27) -na-. . .-ile construct
     a. tu-na-h luk-ile [Fleisch 2000: 168]
        1P-PANT-return-PFV
        'we (had) returned'
     b. kaha tusitu vose va-na-l -kungulu-ile [Ibid. 344]
        then animals all 3P-PANT-REFL-gather-PFV
        'then all the animals gathered'
     c. kaha vangazi vavene va-na-handek-ele ngu-avo: . . . [Ibid.]
        then judges themselves 3P-PANT-speak-PFV QUOT-3P
        'then the judges themselves spoke, saying . . .'
     d. kasumbi ngu-eni mu-nji-n(a)-amb-ile [Ibid. 169]
        chicken QUOT-3S 18-1S-PANT-say-PFV
        'Chicken said, "[that is] why I had said . . ."'
     e. amba, nge-oco mu-na-han-a mulonga ku-li [Ibid. 349]
        that like-that 18-PANT-give-CMPL judgment 17-COP
        ou kasumbi, kasumbi omu njila na-handek-ele
        DEM chicken chicken DEM way 3S.PANT-speak-PFV
        'so, like that, a judgment was found; Chicken had told the
        truth'
The -na-. . .-ile construct, then, is like the D-past in that it denotes
completion of the event. Unlike the D-past, though, it typically
implicates relevance to some posterior situation. In our model, we
propose that it situates an event in the past of that domain in which
the reference locus is situated, that is, it is a past internal to a
domain, the default interpretation being the P-domain. Hence, though
denoting past, that past may be felt to be "less remote" than the D-past
or "markedly remote" when interpreted as anterior to some other
situation.
We account for the similarity and differences between these two
constructs in the following manner. The -ile suffix denotes a perfective
aspect. That is, it denotes a perspective on the event following the
terminative coda phase of the event, as shown for the -na-. . .-ile
construct in Figure 31. If the point perspective is interpreted with
respect to the event itself (Fig. 31a), an aspectual role, then the
interpretation is a completed event. However, the point perspective can
also be interpreted as a reference point, as in (Fig. 31b), in which
case it functions to index a new reference locus (R2). The event is,
then, interpreted as occurring prior to that event and contributing a
sense of relevance at that point.
The -a-. . .-ile and -na-. . .-ile pasts differ, then, in two important
ways. First, the former situates a completed event in a dissociated
D-domain, i.e., that domain external to the reference locus (S); the
latter situates the
a. Simple perfective in past reading
b. Perfect reading (had V-ed prior to some reference event)
Figure 31. Dual interpretations of the -na-. . .-ile perfective aspect
event internal to a domain (by default in the P-domain). Hence, the -a-
form situates the event along the Ego-moving timeline, thereby marking
a tense relation, the -na- form along the Time-moving timeline, thereby
marking a tenor relation. Second, the -a- form treats the event as a com-
pleted whole, the -na- form as a completed endpoint.
Let us turn now to the -na-. . .-V construct. Instead of the perfective
suffix -ile, it has a final, typically harmonizing, vowel, transcribed
-V. This ending is semantically similar to, yet subtly different from,
the perfective -ile. Although, like the perfective, it expresses
completion according to Fleisch, it also expresses either immediacy of
the event ("[w]ith action type verbs the anterior predicates express a
verbal action which immediately preceded the reference moment and is
still relevant to the latter" (pp. 157–8)), as in (28a and b), or it
expresses relevance of the event to the time of reference (by default
the speech event), as in (28c and d).
(28) -na-. . .-V construct
a. tu-na-fum-u mu-Vunonge [Fleisch 2000: 160]
1P-PFV-come_from-CMPL LOC-Menongue
we have just come from Menongue
b. tu-na-sokulul-a lipito mu-va-na-het-e [Ibid. 207]
1P-PANT-open-CMPL door 18-3P-PANT-reach-CMPL
we have opened the door (because) they have [just] arrived
c. vi-ka u-na-kuatilil-a? [Ibid. 124]
8-which 2S-PANT-seize-CMPL
what [lit which thing] have you seized [and still hold]?
d. hee, mu-na-man-e ku-teta laza mulonga? [Ibid. 346]
INTJ 2P-PANT-finish-CMPL INF-cut already 3-judgment
ooh, have you already reached a judgment?
yii, u-na-hu-u
yes 3-PANT-become-CMPL
yes, it has come to be
Figure 32. Temporal analysis of -a-. . .-ile and -na-. . .-ile in the dissociative model
The difference between this construct and the perfective ones lies in the
location of the aspectual viewpoint. For the perfectives, we saw that the
viewpoint was situated post-Coda; in this construct, however, it is situated
post-Nucleus. That is, in both instances something has been completed:
in the former it is the whole event, in the latter the nucleus phase.
That it is the nucleus only that is construed as completed is evident from
the difference in interpretation of activity and inchoative achievement
verbs. The former have an immediate past interpretation, as in (28a
and b) above, the latter either an immediate past or a resultative state
interpretation, as in (29). We will label this the Completive aspect, in
opposition to the Perfective aspect discussed previously.
(29) a. cimbanda na-y-i [Fleisch 2000: 280]
healer 3S.PANT-go-CMPL
the healer has/is gone
b. muangana na-tsi [Ibid. 157]
king 3S.PANT-die.CMPL
the king has died/is dead
The reason for this is illustrated by the schemas in Figure 33. Inchoative
achievement verbs, unlike activity verbs, encode a stative coda phase.
The completive aspect situates Ego in the post-N coda phase of the
event (Fig. 33b), which may be interpreted as immediately post-N (hence,
a 'have just V-ed' interpretation), or as some unspecified location in
the coda (hence, an 'is V-ed' interpretation). A comparable distinction
has been proposed for similar suffixes in Zulu (see Botne and Kershner
2000).
Figure 33. Completive -V: (a) with activity verbs; (b) with inchoative achievement verbs (N = point of transition)
The meaning and use of the -a-. . .-V construct follow from the analyses
we have just presented. The suffix -V denotes a post-Nucleus perspective,
the prefix -a- that the event is situated at an undivided (i.e., punctive)
moment of time along the Ego-moving timeline. This, in fact, is what we
find. With some verbs, there is a performative sense with the utterance, as
in (30a), which might best be translated as 'have just this instant named'.
With others, there may be a sense of 'have come to act in this way at this
very moment', as in (30c). In all cases, the event is immediate to S.
(30) Present Completive -a-. . .-V
a. nji-a-mu-luk-u ou mu-ana li-zina
1S-PST-name-CMPL this 1-child 5-name
li-a-eni Joao [Fleisch 2000: 163]
5-POSS-3S John
I (have) named this child John
b. vi-ze vi-nzunda vi-mu-a-mon-o va-a-li-kungulul-ile [Ibid.]
8-DEM 8-frog 8-2P-PST-see-CMPL 3P-PST-REFL-gather-PFV
these frogs that you (pl) have just seen [just mentioned in
story] gathered (again)
c. m-bambi u-a-hev-e [Ibid. 303]
9-duiker 3S-PST-be_foolish-CMPL
how foolish the duiker is or the duiker is being foolish
Before considering the final two constructs, we pause here briefly to
summarize the findings so far. We have differentiated the four constructs
in terms of the aspectual meaning encoded in the suffix and in terms of
the temporal character encoded in the prefix. The suffix denotes either
a completive (post-Nucleus) perspective or a perfective (post-Coda)
perspective on the event. The prefixes -na- and -a- distinguish relations
within a domain (Time-moving timeline) from relations across domains
(Ego-moving timeline), respectively. Table 10 sets out the distinctions.
Table 10. Lucazi constructs in a dissociative model analysis
                    Aspect
Tempus              Completive (post-Nucleus)   Perfective (post-Coda)
Anterior (tenor)    -na-. . .-V                 -na-. . .-ile
Past (tense)        -a-. . .-V                  -a-. . .-ile
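The two-by-two organization summarized in Table 10 can be restated as a small lookup. The following is only a minimal sketch in Python of that cross-classification; the names PREFIX, SUFFIX, and describe are hypothetical labels introduced for illustration and are not part of the analysis itself.

```python
# Illustrative restatement of Table 10. All names here are hypothetical
# labels chosen for exposition, not terms from the source analysis.

# Prefix: tempus, type of relation, and the timeline it is construed on.
PREFIX = {
    "-na-": ("Anterior", "tenor", "Time-moving timeline, within a domain"),
    "-a-": ("Past", "tense", "Ego-moving timeline, across domains"),
}

# Suffix: aspect and the location of the aspectual viewpoint.
SUFFIX = {
    "-V": ("Completive", "post-Nucleus"),
    "-ile": ("Perfective", "post-Coda"),
}


def describe(prefix: str, suffix: str) -> str:
    """Compose the label of a construct from its prefix and suffix."""
    tempus, relation, timeline = PREFIX[prefix]
    aspect, viewpoint = SUFFIX[suffix]
    return (
        f"{prefix}...{suffix}: {tempus} {aspect} "
        f"({relation}; {timeline}; viewpoint {viewpoint})"
    )


if __name__ == "__main__":
    # Enumerate the four cells of the table.
    for p in PREFIX:
        for s in SUFFIX:
            print(describe(p, s))
```

The point of the sketch is simply that each of the four constructs is fully determined by two independent choices, one of prefix and one of suffix.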
The -na-ka-. . .-ile construct (Fleisch's 'Hesternal Past') is the same as the
Anterior Perfective tenor in form, with addition of the prefix -ka- (note 14).
It is perfective and past, situating an event in that time unit adjacent to the
current relevant time unit, for example, 'yesterday' (vs. 'today') (31a). It is
relative in the sense that it is anchored to a moment of speaking whether
current or otherwise, as the example in (31b) illustrates. It differs from the
Anterior Perfective in that it apparently has only a past perfective
interpretation, and not a past perfect interpretation as well.
(31) Contiguous Anterior Perfective -na-ka-. . .-ile
a. tu-na-ka-y-ile ku-Venduka ngoco
1P-ANT-IT-go-PST 17-Windhoek and
tu-na-ka-h luk-ile zau [Fleisch 2000: 169]
1P-ANT-IT-return-PST yesterday
we went to Windhoek and returned yesterday
b. muan-etu na-handek-a zau ngu-eni mema
brother-1P 3S.PANT-speak-CMPL yesterday QUOT-3S 6-water
a-na-ka-tontol-ele muakama zaulize [Ibid. 170]
6-PANT-AnTU-be_cold-PFV very_much day_before_yesterday
my brother said yesterday that the water had been cold the
day before
The final construct to consider is the -a-. . .-a construct. Two related
facets of its three uses are exhibited in its experiential (32) and resultative
(33) readings. The experiential interpretation denotes not only an occurrence
of the event at some indefinite time in the past, but typically that
the event denotes an important attribute or characteristic of the subject
at the reference time. Hence, according to Fleisch, (32c) indicates not
only that 'we have come from Menongue', but that our coming from
Menongue is an integral attribute, either because we were born there or
because we lived there for a lengthy period of time.
(32) Experiential -a-. . .-a
a. nj-a-mon-a ngandu [Fleisch 2000: 160]
1S-PF-see-FV crocodile
I have seen a crocodile
b. na-ngandu na eni u-a-fum-a mu-liyaki [Ibid. 161]
and-crocodile and 3S 3S-PF-come_from-FV LOC-egg
and Crocodile, he too, has come from inside an egg
c. tu-a-fum-a mu-Vunonge [Ibid. 160]
1P-PF-come_from-FV LOC-Menongue
we come from Menongue
[implies birth there or living there for some time]
The resultative use is very similar to the experiential in that the event
has occurred at some indefinite time in the past, and a characteristic state
exists at the time of reference. This use is found with inchoative achievement
verbs that express a resultant state, as in (33a). In order to indicate
that the state existed at a particular time in the past, it is necessary to
combine the -a-. . .-a construct with the auxiliary verb -pu- 'be(come)',
itself marked with the past perfective construct, as in (33b). In this complex
construction, the auxiliary verb functions to index a point of reference
with respect to which the resultant state is understood.
(33) Resultative -a-. . .-a
a. kasumbi u-a-l -zind-a na-ngandu [Fleisch 306; 341]
chicken 3S-PF-REFL-hate-FV COM-crocodile
Chicken and Crocodile hate one another
b. ci-tapalo c(i)-a-pu-ile c(i)-a-sungam-a [Ibid. 307]
7-street 7-PST-be-PFV 7-PST-be_straight-FV
the street was straight
In neither case does the -a-. . .-a construct co-occur with temporal adverbials,
either of location or duration in time. On the other hand, as the
reader can see, both interpretations denote a property characteristic of the
subject at the reference time. We propose that the -a-. . .-a construct imposes
a temporal frame (in bold in Figure 34) over event structure along
the Ego-moving timeline. The difference between the experiential and
resultative readings is determined solely by whether the event coded by the
verb expresses a stative coda phase or not. If so, then we have the resultative
reading (Fig. 34b), in which the viewpoint (the right edge or boundary)
of this frame is located in the coda phase of the event; if not, we find
an experiential reading, in which the right edge is outside the event, but
the event is within the experiential frame (Fig. 34a).
The temporal frame overlies the timeline, precluding identification of
specific time units on the timeline. Consequently, the time at which the
nucleus (N) of the event occurred is not statable with this verb construct.
Moreover, the frame depicts the state of the event as existing over the
duration of the interval, interpreted as denoting an experience or attribute
true of the subject over that interval.
There is a third use of the -a-. . .-a construct, one which does not follow
from either of the complex temporal models (Ego-moving or Time-moving)
we have outlined so far. This is its use in narrative to express
consecutive occurrence of events, as in (34), although Fleisch (2000: 263)
notes that this is not as common, or as stylistically appropriate, as use of
the D-past perfective in many contexts.
(34) Narrative sequencing with -a- . . . -a
setting: kasumbi u-a-y-ile kualenga
chicken 3S-PST-go-PFV along
ndonga [Fleisch 2000: 342]
river
one day Chicken walked along the river
event 1: kaha u-a-mon-a nguvi
then 3S-PST-see-FV hippopotamus
then he saw Hippopotamus
event 2: kunahu u-a-mu-sik-a ngu-eni: . . .
then 3S-PST-3S-call-FV QUOT-3S
then he called him, saying: . . .
(35) event 3a: kaha nguvu u-a-y-ile
then hippo 3S-PST-go-PFV
u-a-ka-lek-ile ngandu [Ibid.]
3S-PST-SEQ-tell-PFV croc
then Hippopotamus went and told Crocodile
event 3b: kaha nguvu u-a-y-a
3S-PST-go-FV
u-a-ka-lek-a ngandu [Ibid. 162]
3S-PST-SEQ-tell-FV
then Hippopotamus went and told Crocodile
Figure 34. Interpretations of the -a-. . .-a construct: (a) experiential, e.g., 'see a crocodile'; (b) resultative (of change-of-state verb), e.g., 'hate'
The storyteller apparently gave two renditions of the same story, one in
which he reverted to the D-past perfective when the scene changed (35a),
one in which he continued with use of the -a-. . .-a construct in sequencing
events (35b). In both cases, the itive marker -ka- adds emphasis to a
closely tied sequence.
The use of -a-. . .-a in the narrative follows not from the Ego- vs Time-
moving models that we have discussed throughout the paper, but rather
from a complex temporal sequencing model (see Evans 2005: 229-234
for a detailed discussion), in which sequences of events are conceptualized
as discrete entities with respect to some event in question other than the
time of speaking (S). In Lucazi story narrative, this event is typically
marked with the D-past perfective construct -a-. . .-ile, as in (34) above.
Subsequent events may be marked either with the same form or with the
-a-. . .-a construct. As Evans (2005: 230) indicates, a consequence of inte-
gration into this model is the imposition of an in-tandem alignment on
the events (Fig. 35).
These rather curious parallel sets of constructs find a motivated analysis
in the model we are proposing. As illustrated in Figure 36, the similarity,
both morphological and semantic, between the Anterior and Past sets
results from their patterning in similar fashion along a timeline; their
differences arise from their patterning along different perspectives of the
timeline, the Anteriors along the moving-Event perspective, the Pasts
along the Ego-moving perspective. The Perfect indicates that an event
has been completed, but also falls within the experiential domain of the
reference locus, i.e., it is relative. Hence, we can conclude that it is found
in both the P-domain and the past D-domain. As noted above, the Past
Anterior does not co-occur with the D-Past, but does co-occur with the
Perfective. This falls out from the occurrence of both in the P-domain, in
contrast to the D-Past, which situates an event in a different domain.
Figure 35. Complex sequencing model of narrative sequencing with -a-. . .-a
13. Curiosity #6: Multiple futures in Chisukwa (M.20, Malawi)
To this point we have focused primarily on past tenses. Here, we will
discuss the occurrence of multiple future tenses in Chisukwa that seem to
overlap at least to some extent in temporal reference. There are four such
forms, as shown in Figure 37; data in (36) are from Kershner (2002).
They differ in marking: ti vs tiise, and Ø vs ka. Furthermore, speakers of
Chisukwa consider tiise . . .-ka-. . .-e marked events to be 'deeply
remote'. What needs to be clarified is the relation these configurations
have to one another semantically and what the individual morphemes
contribute to the meanings.
Figure 36. Organizational schema of Lucazi Anteriors and Pasts
Figure 37. Future tenses in Chisukwa
Figure 38. Simple futures in Chisukwa
(36) F1 ti a-mu-busy-e
F 3S-3S-tell-F
s/he will tell him/her (sometime soon)
F2 tiise a-mu-busy-e pala abasikali biisa
FCont 3S-3S-tell-F if police 3P.P1.come
s/he may tell him/her if/when the police (have) come
F3 ti tu-ka-byaal-e amalima
F 1P-F3-plant-F beans
we will plant beans (at some point)
F4 tiise tu-ka-byaal-e amalima
FCont 1P-F3-plant-F beans
we might plant beans (e.g., at some point if we have money)
[Note: FCont = contingent future]
The configuration ti . . .-e (F1) denotes the highly probable occurrence
of an event in relatively close proximity to the speech event (S). That is,
the speaker is confident of the event happening; hence, it is perceived as in
the here-and-now (Fig. 38). In contrast, the similar configuration
ti . . .-ka-. . .-e (F3) suggests less certainty, both in speaker confidence of
the event taking place and in the time at which it might occur. We propose
that the -ka- marks dissociation, situating the event in a future D-domain.
Hence, from the speaker's perspective, the event is subjectively more
remote. The configurations with tiise (F2 and F4) contrast with each other
in the same way as the ti configurations (F1 and F3). The question, then,
is how the tiise forms differ in meaning and use from the ti forms.
The tiise configuration indicates that the occurrence of the reported
event depends on the fulfillment of prior information or on the occurrence
of some other event, a second reference locus R2. The element tiise is
appropriately analyzed as bi-morphemic: ti + -:se. The morpheme -:se
induces vowel length, but does not affect vowel quality. Evidence for this
comes from the negative forms: ta (< ti + a) and taase (< ti + a + -:se)
(Kershner 2002: 193). It is the morpheme -:se that establishes the new
R2. Moreover, this -:se derives from the verb 'come', -iisa.
Consider, then, the sentence with F4 in (36) above and its schematic
representation in Figure 39 below. The marker -ka- situates the event in
the future D-domain. tiise indicates that there is another condition or event
(in this case, 'have money') that must be fulfilled before the reported event
'plant beans' will occur. Hence, tiise establishes a new reference locus, an
R2, effectively indexing some discourse-established event as the substantive
time of R2, thereby creating a new locus of orientation and, hence, a new
perspective on time. The reported event is future with respect to R2 within
the future D-domain. It is for this reason that tiise . . .-ka-. . .-e events are
considered by Sukwa speakers to be 'deeply remote'.
A simple linear analysis does not account for these differences in a
descriptively accurate or explanatory manner; our domain analysis does. As
the schema in Figure 40 shows, the ti and tiise configurations comprise
parallel sets in the P- and D-domains. Events marked with ti are simple
futures, either in the P-domain or D-domain. On the other hand, events
marked with tiise are indexed to an antecedent R2. This, in essence, con-
stitutes a kind of mediated remoteness, in contrast to the more direct
remoteness of the D-domain.
Figure 39. Representation of F4
Figure 40. Combined schema of Chisukwa futures
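The parallel organization of the four Chisukwa futures can be made explicit by treating each form as the product of two binary choices: ti vs tiise (whether the event is indexed to a second reference locus R2) and absence vs presence of -ka- (P-domain vs future D-domain). The following Python fragment is a minimal sketch of that decomposition only; the class Future, the function analyze, and the field names are hypothetical and introduced purely for illustration.

```python
# Hypothetical feature decomposition of the four Chisukwa futures in (36).
# The names Future and analyze are illustrative, not from the source.
from typing import NamedTuple


class Future(NamedTuple):
    label: str        # F1-F4, as numbered in (36)
    domain: str       # "P" (here-and-now) or "D" (dissociated future domain)
    contingent: bool  # True if indexed to a second reference locus R2


def analyze(particle: str, has_ka: bool) -> Future:
    """Classify a future form by its particle and by the presence of -ka-."""
    if particle not in ("ti", "tiise"):
        raise ValueError(f"unknown particle: {particle!r}")
    contingent = particle == "tiise"   # tiise establishes a new R2
    domain = "D" if has_ka else "P"    # -ka- marks dissociation
    labels = {
        (False, False): "F1",
        (True, False): "F2",
        (False, True): "F3",
        (True, True): "F4",
    }
    return Future(labels[(contingent, has_ka)], domain, contingent)


if __name__ == "__main__":
    for particle in ("ti", "tiise"):
        for has_ka in (False, True):
            print(particle, has_ka, analyze(particle, has_ka))
```

On this view, F4 is the only form that is both dissociated and contingent, which is consistent with speakers' judgment that such events are 'deeply remote'.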
These data demonstrate, once again, that the same markers may be
used to express temporal relations in both P- and D-domains. Typically,
though not necessarily, there will be a marker such as Chisukwa -ka- that
indicates which of the two domains the tense marking is referencing.
We have seen in this analysis of Chisukwa that the futures involve
more than simple temporal meaning; they involve as well semantic
dimensions of greater or lesser epistemic certainty and contingency. Our
model, we believe, lends itself well to integrating both temporal and
non-temporal dimensions of meaning in one unified account. Although we are
not able to extend the analysis here to other non-temporal uses, conditional
contexts, for example, we feel that the account we have provided
holds much promise for future investigation.
14. Curiosity #7: Discontinuity and pattern in Kom tense markers
(Grassfields Bantu, Cameroon)
Kom presents a challenging case, both for Comrie's hypothesis of continuity
in the time reference of tense marking and for our thesis that a
dissociated domain approach provides a motivated, explanatory analysis
of the data. There are four past tenses, a present/future, and three futures,
in addition to two aspects, imperfective and completive. Data are
from Chia (1976).
(37) a. Amadu nun l fel
A. P4 work
Amadu worked [remote]
b. ivu t na su su? a
rain P3 IMPF INCEP fall
rain was beginning to fall
c. Peter l nyij busi-busi
P. P2 ran in_morning
Peter ran in the morning
d. wu ni yem njaj j taka wa gwi
3S P1 sing song before 2S come
she sang a song before you came
(38) es nun kf asaj uwe afein
1P Pr harvest corn week this
we are harvesting corn this week
(39) a. Pauline ni yem njja a-l kf a
P. F1 sing song LOC-evening
Pauline will sing a song in the evening
b. Peter l fel-a al bus
P. F2 work-F tomorrow
Peter will work tomorrow
c. wu nun l yem njaj ulva wu lema kwen
3S F3 sing song when 3S grow reach
she will sing a song when she matures
As one can readily see from these examples and the list of tense
markers alone in Table 11, Kom tense marking is replete with segmental
forms recurring in different tense constructions and, hence, having different
time reference. For example, the forms nun and l (with different
tones) occur in tenses referring to three different times, while ni is employed
in two.
The P4 and F4 tenses stand out as curiosities in two ways: they are
composed of two elements, unlike other markers, and they incorporate
the same morpheme (nun) as does the present/future (Pr/F1) tense. Although
one could certainly describe the system in terms of a simple linear
model with different degrees of remoteness incorporated into it, it would
be difficult to see any coherent motivation for the semantic and morphological
patterning in such an analysis. Although the overall distribution of
tense markers suggests a gradual temporal demarcation outwards from
the present (and even that pattern is disrupted by P3), that doesn't provide
any explanation for the composition of P4 and F4 from Pr/F1 nun
and P2/F3 l. However, it does become understandable and motivated
in the domain approach we are advocating. First, nun functions to denote
a kind of present along both timelines. Not only does it indicate an ongoing
event at S, it is found in generic sentences as well. In (40) a simple
present generic is formed with the present tense. Furthermore, a past or
future generic is also possible, as Chia points out (1976: 125), but only
with the P4 or F3 tenses, respectively, as illustrated in (41). No other
tenses can be used this way. That is, only nun tenses can be used to denote
generic events, i.e., genericness is associated with the static perspective of
time along the Ego-moving timeline. At any point along this line, the
speaker's perspective is from within a particular cognitive domain for
which that situation named by a verb is true (note 15).
Table 11. Kom tense markers
Tense    Marker   Temporal reference
P4       nun l    long ago
P3       t        sometime before today (e.g., yesterday, last year)
P2       l        early in the day
P1       ni       a little while ago (aP3 hours)
Pr/F1    nun      now or in a little while (aP3 hours)
F2       ni       later in the day
F3       l        tomorrow or specific known time in the future
F4       nun l    some time in the future (after tomorrow)
(40) Kom nun fel akun
Kom_people Pr cultivate rice
the Kom cultivate rice
(41) a. ngum su nun l na kful nfu?
locusts P4 IMPF eat grass
locusts ate grass
b. ngum su nun l na kful nfu?
locusts F3 IMPF eat grass
locusts will eat grass
Thus, nun (in its generic or gnomic use) situates events along the
Ego-moving timeline (see Fig. 41); l marks either of the dissociated
D-domains (41). Consequently, location of events in either D-domain is
marked by nun l, past or future being differentiated by reversed tone
patterns on the tense markers.
Figure 41. Distribution of nun forms
In contrast, all the tenor markers situate events along the moving-event
timeline, that is, within the P-domain, as illustrated in Figure 42. Note in
particular that nun occurs as a marker of present along both timelines.
Note, too, that l here marks time units removed from S, i.e.,
not contiguous with S; hence, use of l appears to be consistent over
each timeline (note 16). More specifically, the tenor forms in the P-domain
are organized in form, meaning, and distribution by two interacting
principles. First, the time frame is sub-divided into approximately equivalent
intervals grounded in the concept of 'today' (see Fig. 43a), creating a
bilateral symmetry. Two relatively short intervals (approximately ±3 hrs)
form the central core, the more future interval of which contains the
time of the speech event. One time unit removed are the comparable
intervals earlier and later in the day. All four of these 'today' intervals
receive a low (L) tone on the tense marker. Parallel intervals outside of
'today', the pre-hodiernal and post-hodiernal intervals, both receive high
(H) tone on the tense marker. Second, the tense forms refer to abstract
metrical units in relation to S (Fig. 43b). Thus, ni refers to those time
units abutting S, l to those time units twice removed from S. Those two
that are temporally past receive L tone, those that are future, H tone.
Note that the tones in the upper and lower schemas match, with the sole
exception of ni 'later in the day', where we observe a HL sequence. This
pattern, which initially appeared out of place, is to be expected from the
correlation of hodiernal temporal divisions with grammatically abstract
temporal units denoted by the tense markers, as illustrated here.
Figure 42. Tenor marking in the P-domain
Figure 43. Organizational principles in the Kom P-domain
The Kom case demonstrates that continuity of time reference is a prin-
ciple that holds within a region of one domain, for example, over the past
in the P-domain. Thus, l occurs in the past of the P-domain, but also in
the future, a separate region. It also marks each of the D-domains. This
symmetrical pattern is not just a coincidental fact about Kom; the
same type of correspondence in form between remote past and future do-
main marking can be found in numerous Bantu languages, as the exam-
ples in Table 12 attest.
Segmentally, the dissociative (remote) marker in each of the languages
is identical; tonally, they are identical in two languages, different in three,
as well as in Kom. This fact supports the view that Bantu languages often
mark the two dissociative D-domains in the same way, typically using
tone to separate past from future.
In sum, we can retain Comrie's continuity hypothesis as a universal
principle, but with the stipulation that the same marker may be used in
different domains or in comparable intervals (past/future) of the
P-domain.
Table 12. Some Bantu languages with comparable marking for D-domains
Tones identical
Sibende F12        P2 -a-ka-            cf. P1 -a-            Yuko Abe p.c.
                   F2 -loo-ka-          F1 -loo-
Kimatuumbi P13     P2 -a--. . .-ite     cf. P1 --. . .-ite    Odden 1996
                   F2 -a-luwa-. . .-a   F2 -luwa-. . .-a
Tones differ
Ewondo A72a        P3 -ngaH                                   Redden 1979
                   F3 -ngaMid
Gimbala H41        P3 -ga-                                    Ndolo 1972
                   F2 -ga-
Ciila M63          P2 -a-ka-            cf. P1 -a-            Yukuwa 1987
                   F2 -la-ka-           F1 -la-
15. Limits on domains: Bamileke-Dschang (Grassfields Bantu;
Cameroon)
The discussion to this point has addressed languages that mark at most a
past and a future D-domain in addition to the P-domain. One may well
ask whether more than one past or future D-domain can be marked and,
in fact, whether there are any limits on the number of domains marked.
In making a first pass at addressing these questions, we can consider the
case of Bamileke-Dschang, whose tense system represents perhaps the
fullest development possible, having five pasts and five futures, illustrated
by the examples in (42)-(43).
(42) Pasts in Bamileke-Dschang [Hyman 1980: 227]
a. P1 aa !taj             he bargained [just a moment ago]
      3S.P1 bargain
b. P2 a aa ntaj           he bargained [earlier today]
      3S P2 bargain
c. P3 a ke taj !j         he bargained [yesterday]
      3S P3 bargain
d. P4 a le taj !j         he bargained [before yesterday]
      3S P4 bargain
e. P5 a le la n !taj      he bargained [long ago]
      3S P4 P5 bargain
(43) Futures in Bamileke-Dschang [Ibid. 228]
a. F1 a !a !taj           he is about to bargain
      3S.F1 bargain
b. F2 aa !pij !j taj      he will bargain [later today]
      3S.F1 F2 bargain
c. F3 aa !sue taj         he will bargain [tomorrow] (note 17)
      3S.F1 F3 bargain
d. F4 a !a lae !taj       he will bargain [after tomorrow; some days hence]
      3S.F1 F4 bargain
e. F5 a !a fu !taj        he will bargain [a long time from now]
      3S.F1 F5 bargain
Following analysis of these constructions, Hyman (1980: 228) extracts
the tense formatives as listed in Table 13. What is of particular interest
and relevance for our discussion here are the two pasts, P4 and P5, and
the F4 future.
Table 13. Bamileke-Dschang tenses
Time reference      Past          Future
Proximate           P1 à          F1 !a
Same day            P2 aa         F2 a !pi
±1 day              P3 ke         F3 a !s$
±2 days             P4 le         F4 !a la$
±1 year or more     P5 le la$     F5 !a fu
Before discussing in more detail those tenses, it is important to note
that the Bamileke system, according to Hyman, represents relative tense
marking rather than absolute. This can be observed, for example, in the
crastinal ('tomorrow') future F3, derived from the verb le-su 'to come'.
A sentence such as that in (44) may be interpreted in absolute terms with
respect to the time of speaking, as illustrated in the schema in Figure 44.
(44) a ke !le jgs oo !sue zu !m [Hyman 1980: 229]
he P3 say that you.F1 F3 see child
he said that you will see the child [tomorrow]
However, the same sentence can be interpreted relatively, i.e., with respect
to some other reference locus identified in the discourse context, in
this case that denoted by 'said', as in (45). The F3 tense marker still situates
the event in the adjacent posterior time unit, but that unit is now
understood to be 'today' because the reference locus R is situated in the
time unit 'yesterday' (Fig. 45). It is thus significant for our model that it
is not restricted to absolute tense, S being simply a special locus of orientation.
At any orienting locus, there will be two perspectives on time, and
the model will apply accordingly.
(45) a ke !le jgs oo !sue zu !m [Hyman 1980: 229]
he P3 say that you.F1 F3 see child
he said that you will see the child [today] (note 18)
The Bamileke-Dschang system, because it is more complex than most,
lends itself well to a domain analysis rather than a simple linear analysis.
Figure 44. Absolute interpretation of a su
Consider that four of the tenses denote times within the current time
unit of 'today', while two others denote times one time unit away, either
'yesterday' or 'tomorrow'. We propose that these tenors situate events
along the moving-Event timeline in the P-domain (Fig. 46). One piece of
evidence supporting this analysis comes from the crastinal ('tomorrow')
future noted above; the verb 'come' (44-45) denotes events moving toward
and past the reference locus. Its counterpart, ke, seems likely to
have derived from an obsolete verb for 'go' or an itive marker (compare
the case of Lucazi, note 14 and see Botne 2006b).
Consider now the more remote past tenses, P4 and P5. We can analyze
le (P4) as denoting a remote past, situating an event in a D-domain. But
what about P5? As the time reference indicated in (42) and Table 13
shows, this is a very remote past; in fact, it is constructed in part with the
P4 past marker le. Hence, we propose that it establishes a second
D-domain in a more remote past than P4 (Fig. 47), that is, the la element
instills an earlier time sense.
Figure 45. Relative interpretation of a su
Figure 46. Bamileke-Dschang P-domain
We see also in Figure 47 and in (43d) and Table 13 that la appears as a
marker of a future D-domain (F4), but that it is not the most remote fu-
ture. Assuming these two tenses derived from the same morpheme la, this
is an odd situation. There is, however, a unifying analysis behind this cu-
riosity. Hyman (1980: 234) points out that la, in conjunction with P2 and
P3 (i.e., the P-domain pasts) is also used as a pluperfect, as shown by the
examples in (46). He concludes from this that the pluperfect and far re-
mote past use (P5) are comparable anteriors and can be factored out of
the tense system, but that the future la (F4) is not comparable and must
be treated differently. However, within the framework of our model, we
can see that la functions consistently and coherently in the different time
dimensions. In the tenor dimension of the P-domain, i.e., along the
moving-Event timeline, it combines with P2 and P3 to indicate anteriority
to an indexed orienting locus; in the tense dimension across domains,
i.e., along the Ego-moving timeline, it denotes a domain prior to another
D-domain, i.e., P5 prior to P4, or F4 prior to F5. Thus, the key concept
underlying la in both time dimensions is temporal anteriority, construed
in a manner commensurate with the particular conceptualization of the
timeline. This division of functional roles lends further support to the
analysis proposed in Figure 47, as the hesternal kè and crastinal su both
behave in the same way with respect to la.
Figure 47. Bamileke-Dschang temporal domains
(46) a. a aa nda? ntaj    he had already bargained (earlier today) [Hyman 1980: 234]
        he P2 la bargain
     b. a ke nda? ntaj    he had already bargained (yesterday)
        he P3 la bargain
Assuming this analysis to be appropriate, we propose further that
it also represents the limits of a tense and tenor system. That is, once
a language goes beyond one D-domain in either the past or the future,
it will systematize only one more domain in that time sphere (whether
past or future) and that marking of that domain will be built upon or
with respect to the marking of the other D-domain. One might appropriately
consider and compare this situation with the surcomposé past
tenses of French, built upon the passé composé, or the use of the English
pluperfect for a past-before-past interpretation, as noted in §5 (Fig.
19b).
16. A brief re-examination of Burera and Palantla Chinantec
We return here briefly to the cases of Burera and Palantla Chinantec,
whose tense systems, as we indicated in the introduction, posed problems
for the idea of a linear, continuous marking of tenses. Following from our
analyses in terms of the approach we have advocated here, there is no issue
of continuity; seemingly discontinuous tenses are continuous within
their domains. Where Burera differs from Bantu languages is that it only
marks tenor relations within domains and not tense relations between
domains, as shown in Figure 48. Note that we have no data to determine the
direction of the Time-moving timeline; for expository purposes, we have
represented it by a dotted line moving toward the past. The same holds
for the case of Palantla Chinantec below (Fig. 49).
Figure 48. Burera tenses in the domain approach
As an interesting aside, Glasgow (1964) suggests (following a sugges-
tion by Richard Pittman) interpreting the Burera tenses as occurring in
two frames of reference. Although it is unlikely that either of them had
our approach in mind, the spirit seems similar.
The case of Palantla Chinantec (Merrield 1968) is similar to that of
Burera but illustrates a simpler system than that of Burera in that it does
not make a remoteness distinction within the D-domain. Like Burera,
however, it only marks tenor relations within domains and not tense rela-
tions between them.
17. Conclusion
Our goal in this paper has been to present various curious kinds of data
that find no satisfactory motivation or analysis in a traditional
one-dimensional linear model of temporal relations. The different kinds of
curiosities investigated, we propose, provide evidence for a
multi-dimensional model, organized by and grounded in two cogent factors:
(1) the relevance of three perspectives on time, Ego-moving vs
Time-moving (moving event vs moving ego), which motivate differences in
the organization of tense markers, and (2) the differentiation of temporal
marking into cognitively distinct domains. This differentiation into
separate cognitive domains is not unlike the distinction proposed by Chafe
(1994: 198–99) between displaced conscious experience and immediate
conscious experience, and perhaps relates in some way to the distinction
Givón (1984: 405) makes between active and permanent files in a
mentally projected world.
Figure 49. Palantla Chinantec
In the model we propose, tense systems may grammatically mark continuity
of time relations along a timeline in each domain; that is, a tense
marker may denote the temporal relation of an event with respect to a
reference locus within a domain, the tenor of the event. We can thus
salvage Comrie's proposed universal of continuity by stating that in a
tense system, the time reference of each tense (i.e., tenor) is a continuity
within a specific domain.
Second, tense systems may mark discontinuity of relations reflecting
deictic dissociation, i.e., contemporal vs not contemporal for either
past or future times, that is, the relation of a reference frame to S. Thus,
a tense marker may function to project an event into a separate cognitive
domain. This also has implications for Comrie's universal: along the
Ego-moving timeline, i.e., across domains, the time reference of each tense
marker will have the same general meaning (recall the Kom case of l as
remote marking).
Third, tense systems may mark different kinds of remoteness, three of
which we have identified here. These include metrical remoteness in
either the P- or D-domains, dissociative remoteness imbued by projecting
an event into a D-domain, and mediated remoteness, a consequence
of a reported event being directly related to a second tense locus
intervening between it and the speech event.
Given these findings, we believe this model establishes a functionally
versatile cognitive framework that accommodates the diversity and range
of tense/aspect systems we encounter in Bantu languages and beyond,
and provides a dynamic means for analyzing and comparing deictic
phenomena in the verb in a finer-grained manner than has previously been
the case.
Received 5 September 2006
Revision received 28 August 2007
Indiana University, USA
Kansas State University, USA
Appendix
Abbreviations
AnTU   anterior time unit
APPL   applicative
C      coda
CMPL   completive
COM    comitative
CONT   contingent
COP    copula
CTU    current time unit
CV     consonant-vowel
DEM    demonstrative
DEP    dependent
E      event
F      future
FV     verb-final vowel
H      high tone
HOD    hodiernal
IMPF   imperfective
INCEP  inceptive
INF    infinitive
INT    intensive
INTJ   interjection
IS     immediate scope
IT     itive
L      low tone
LOC    locative
N      nucleus
NAR    narrative
NEG    negative
O      onset
PANT   past anterior
PASS   passive
PF     perfect
PFV    perfective
PoA    point of assessment
POSS   possessive
Pr     present
PST    past
QUOT   quotative
R2     2nd reference locus
REFL   reflexive
RSL    resultative
S      speech time
SEQ    sequentive
t      time
T      tense
TAM    tense, aspect, mood
V      vowel
Note: 1S, 2S, 3S = 1st person singular, etc.; 1P, 2P, 3P = 1st person
plural, etc.; P1, P2, P3, etc. = nearer past, more distant past, etc.; F1, F2,
F3, etc. = nearer future, more distant future, etc. Numbers that appear
as glosses with nouns indicate noun classes; the same number on the verb
or modifier denotes agreement with that noun. A diamond (◊) marks the
location of an event on a timeline. In Reichenbach, PP = point present,
RP = recalled point, AP = anticipated point, RAP = recalled anticipated
point.
Notes
* This paper has gone through many revisions, profiting from comments and discussion
following presentations at the 98th Annual Meeting of the American Anthropological
Society (1999), the 32nd Annual Conference on African Linguistics (2001), the 4th
World Conference on African Linguistics (2003), an IULC colloquium (2004), and the
International Conference on Bantu Grammar (2006). We wish to thank Phil Lesourd,
John Hewson, Brian Joseph, and Ewa Dabrowska, as well as a coterie of reviewers
for their valuable comments and suggestions. We also thank Carol Orwig and Keith
Patman for assistance with obtaining and clarifying data on Nugunu. Any errors or
misinterpretations are the sole responsibility of the authors: Robert Botne, Indiana
University (botner@indiana.edu) and Tiffany L. Kershner, Kansas State University
(tlkershn@ksu.edu).
1. Dixon (2002) refers to it as Burarra.
2. We use the term "event" throughout the paper in a very general and generic sense to
refer to any type of basic predication. It is equivalent here to what "situation" would
be in a more technical discussion of verb types.
3. Numbering such as E.62 refers to the referential classification of individual Bantu
languages followed by Bantuists (based on Guthrie 1948, 1971), the letter identifying one
of 16 zones, the number a particular language within a zone.
4. There are at least two problems with Langacker's arguments. First, with respect to
future use, although the simple -Ø form often applies to scheduled events, this does not
necessarily appear to be the case. If I am exasperated with the meals I am getting at
home, I could say "Tomorrow we eat out," or "Tomorrow we will eat out," or
"Tomorrow we are going to eat out." The first seems no more scheduled than the other
two. A similar remark can be made about past use as a historical present. Langacker
considers this use to constitute a virtual occurrence of the event(s) at the time of
speaking. However, there's no reason to believe this past use is any more virtual than
that marked by -ED; neither is actually occurring at the time of speaking.
5. There is no term that captures exactly the essence of the cognitive domain coincident
with S, hence we use the term "contemporal" here for lack of a more precise term.
"Contemporal" is, naturally, a relative notion whose range is determined by each
language but whose meaning suggests prevailing effect or relevance at S. It replaces the
use of "(extended) now" in Botne (2003a).
6. A possible additional contrast seems to exist in the case of oral vs written language use.
Consider, for example, the passé composé vs preterit distinction in French, with the
latter having narrowed its use to marking specifically written language, again indicative of
a dissociative function. Perhaps others are possible as well: narrative vs non-narrative, for
example.
7. Allomorphic variation occurs in most morphemes: Norwegian (-et, -te, -de, -dde), Slave
(-o-, -u-, -wo-).
8. The double letters at the end of the verb appear to be an orthographic convention for
writing a falling or rising tone on a short vowel. For temporal markers, double letters
indicate length.
9. We do not believe this to be the same thing as taxis, Jakobson's (1971) term often
equated with relative tense, although Güldemann's (2002) definition, "the time
relation between the communicated state of affairs and another state of affairs which is
encoded or implied in the discourse context and which serves as the temporal reference
point", is very similar. The difference is to be found in the use of S or some specified
time (other than that related to some event) as the reference anchor in our case.
10. It may also be used in the future as, for example, in (6b) above.
11. Kilega has a 7-vowel system, apparently involving ATR distinctions in the high vowels
(Botne 2003b).
ATR

High i i u u
Non-high e a o
12. We thank Thilo Schadeberg for bringing this example to our attention.
13. -ine is an allomorph of -ile, occurring following a nasal in the verb stem.
14. Note that the itive (i.e., motion away) marker -ka- denotes the movement out of the
current time period of "today" into the preceding time interval "yesterday". This accords
with the moving event perspective of time, as shown in the schema.
15. In our model, simple or progressive, habitual, and generic presents can be
differentiated by the timeline and domain that are relevant. A simple or progressive present
indicates an event situated at S along the Time-moving timeline in the P-domain; a
habitual is true of the whole domain. Generics or gnomics, on the other hand, denote
the relation of an event to the static timeline from the perspective of Ego-moving
across a static temporal landscape.
16. The morpheme l appears to have derived from the verb lli 'wake up' (Chia 1976:
95–96), grammaticized as a marker referring to events earlier in the day. While this may
have been motivation for its grammaticization as an early today past, it doesn't
appear to provide motivation for its other uses.
17. There is an alternative construction for F3 with !lu !lu instead of !su !e. Since, according
to Hyman (1980), they are comparable semantically, we will only consider the latter
form here.
18. One reviewer noted that this sentence might be ambiguous between seeing the child
later today (i.e., after S) and seeing the child earlier today (i.e., before S). Although an
interesting question, we do not have an answer, as Hyman (1980) does not give any
more information than the two readings we have cited here.
References
Benveniste, Emile
1965 Language and human experience. Diogenes 51, 1–12.
Binnick, Robert I.
1991 Time and the Verb: A Guide to Tense and Aspect. Oxford/New York:
Oxford University Press.
Botne, Robert
2003a Dissociation in tense, realis, and location in Chindali verbs. Anthropological
Linguistics 45, 390–412.
2003b Lega (Beya dialect) (D25). In Nurse, Derek and Gérard Philippson (eds.),
The Bantu Languages. London/New York: Routledge, 422–449.
2006a A Grammatical Sketch of the Lusaamia Verb. Köln: Rüdiger Köppe Verlag.
2006b Motion, time, and tense: Grammaticization of come and go futures in
Bantu. Studies in African Linguistics 35, 127–188.
Botne, Robert and Tiffany L. Kershner
2000 Time, tense, and the perfect in Zulu. Afrika und Übersee 83, 161–180.
Bull, William E.
1960 Time, Tense, and the Verb. Berkeley: University of California Press.
Burrow, J. A. and Thorlac Turville-Petre.
1992 A Book of Middle English. Oxford/Cambridge, MA: Blackwell.
Bybee, Joan
1985 Morphology: A study of the relation between meaning and form. Amsterdam/
Philadelphia: John Benjamins Publishing Company.
Bybee, Joan, Revere Perkins, and William Pagliuca
1994 The Evolution of Grammar: Tense, Aspect, and Modality in the Languages of
the World. Chicago/London: The University of Chicago Press.
Chafe, Wallace
1994 Discourse, Consciousness, and Time: The Flow and Displacement of Con-
scious Experience in Speaking and Writing. Chicago: The University of Chi-
cago Press.
Chia, Emmanuel Nges
1976 Kom tenses and aspects. Doctoral dissertation, Georgetown University.
Chung, Sandra, and Alan Timberlake
1985 Tense, aspect, and mood. In Shopen, Timothy (ed.), Language Typology and
Syntactic Description: Grammatical Categories and the Lexicon. Cambridge:
Cambridge University Press, 202–258.
Comrie, Bernard
1981 On Reichenbach's approach to tense. In Hendrick, Roberta A., Carrie S.
Masek, and Mary Frances Miller (eds.), Papers from the Seventeenth Regional
Meeting, Chicago Linguistic Society, 24–30.
1985 Tense. Cambridge: Cambridge University Press.
Cutrer, L. Michelle
1994 Time and Tense in Narratives and Everyday Language. Doctoral disserta-
tion, University of California, San Diego.
Declerck, Renaat
1991 Tense in English. London/New York: Routledge.
Dinsmore, John
1982 The semantic nature of Reichenbach's tense system. Glossa 16, 216–239.
Dixon, Robert M. W.
2002 Australian Languages: Their Nature and Development. Cambridge/New
York: Cambridge University Press.
Dugast, Idelette
1971 Grammaire du tunen. Paris: Éditions Klincksieck.
Emanatian, Michele
1992 Chagga come and go: Metaphor and the development of tense-aspect.
Studies in Language 16, 1–33.
Evans, Vyvyan
2005 The Structure of Time. Amsterdam/Philadelphia: John Benjamins.
Evans, Vyvyan and Melanie Green
2006 Cognitive Linguistics: An Introduction. Mahwah, NJ/London: Lawrence
Erlbaum Associates.
Fauconnier, Gilles
1985 Mental Spaces. Cambridge, MA: MIT Press.
1997 Mappings in Thought and Language. Cambridge/New York/Melbourne:
Cambridge University Press.
Fleisch, Axel
2000 Lucazi Grammar. Köln: Rüdiger Köppe Verlag.
Fleischman, Suzanne
1982 The past and the future: Are they coming or going? Proceedings of the Berkeley
Linguistics Society 8, 322–334.
1989 Temporal distance: A basic linguistic metaphor. Studies in Language 13,
1–50.
Frawley, William
1992 Linguistic Semantics. Hillsdale, NJ/London: Lawrence Erlbaum Associates.
Gerhardt, Phyllis
1989 Les temps en nugunu. In Barreteau, Daniel and Robert Hedinger (eds.),
Descriptions de Langues Camerounaises. Paris: Agence de Coopération Culturelle
et Technique and ORSTOM, 315–331.
Givón, Talmy
1984 Syntax: A Functional-Typological Introduction. Amsterdam/Philadelphia:
John Benjamins Publishing.
2001 Syntax: An Introduction (rev. ed.). Amsterdam/Philadelphia: John Benjamins
Publishing.
Glasgow, Kathleen
1964 Frame of reference for two Burera tenses. In Pittman, Richard and Harland
Kerr (eds.), Papers on the Languages of the Australian Aborigines. Occasional
Papers in Aboriginal Studies 3. Canberra: Australian Institute of Aboriginal
Studies, 1–18.
Guillaume, Gustave
1929 Temps et verbe. Paris: Champion.
1937 Thèmes de présent et système des temps français. Journal de psychologie.
Reprinted in Guillaume, Gustave, Langage et science du langage (1964).
Québec: Presses de l'Université Laval, 59–72.
1945 Architectonique du temps dans les langues classiques. Copenhagen: Munksgaard.
Güldemann, Tom
2002 The relation between imperfective and simultaneous taxis in Bantu: Late
stages of grammaticalization. In Fiedler, Ines, Catherine Griefenow-Mewis,
and Brigitte Reineke (eds.), Afrikanische Sprachen im Brennpunkt der Forschung.
Köln: Rüdiger Köppe Verlag, 157–177.
Guthrie, Malcolm
1948 The Classification of the Bantu Languages. London: Oxford University Press
for the International African Institute (IAI).
1971 Comparative Bantu: An introduction to the comparative linguistics and prehistory
of the Bantu languages, vol. 2. London: Gregg International.
Helland, Hans Petter
1995 A compositional analysis of the French tense system. In Thieroff, Rolf (ed.),
Tense Systems in European Languages II. Tübingen: Max Niemeyer Verlag,
69–94.
Hewson, John, Derek Nurse, and Henry Muzale
2000 Chronogenetic staging of tense in Ruhaya. Studies in African Linguistics 29,
33–56.
Hornstein, Norbert
1990 As Time Goes By: Tense and universal grammar. Cambridge, MA: MIT Press.
Hyman, Larry M.
1980 Relative time reference in the Bamileke tense system. Studies in African
Linguistics 11, 227–237.
2003 Basaa (A43). In Nurse, Derek and Gérard Philippson (eds.), The Bantu
Languages. London/New York: Routledge, 257–282.
Jakobson, Roman
1971 (1956) Shifters, verbal categories, and the Russian verb. In Jakobson, R., Selected
Writings. The Hague: Mouton, 130–47.
James, Deborah
1982 Past tense and the hypothetical: A cross-linguistic study. Studies in Language
6, 375–403.
Janssen, Theo A. J. M.
1994 Tense in Dutch: Eight tenses or two tenses? In Thieroff, Rolf, and Joachim
Ballweg (eds.), Tense Systems in European Languages. Tübingen: Max Niemeyer
Verlag, 93–118.
Kershner, Tiffany L.
2002 The verb in Chisukwa: Aspect, tense, and time. Doctoral dissertation, Indiana
University, Bloomington.
Klein, Wolfgang
1992 The present perfect puzzle. Language 68, 525–552.
Lakoff, George and Mark Johnson
1999 Philosophy in the Flesh. New York: Basic Books.
Langacker, Ronald W.
2000 Grammar and Conceptualization. Berlin/New York: Mouton de Gruyter.
2001 The English present tense. English Language and Linguistics 5, 251–272.
Maganga, Clement and Thilo C. Schadeberg
1992 Kinyamwezi: grammar, texts, vocabulary. Köln: Rüdiger Köppe Verlag.
Mbom, Bertrade
1996 The parameter of remoteness distinction in temporal organization: The
case of Basaa. Paper presented at the 27th Annual conference of African
Linguistics. Gainesville: University of Florida.
Merrifield, William R.
1968 Palantla Chinantec Grammar. Mexico City: Instituto Nacional de Antropología
e Historia de México.
Mous, Maarten
2003 Nen (A44). In Nurse, Derek and Gérard Philippson (eds.), The Bantu
Languages. London/New York: Routledge, 283–306.
Ndolo, Pius
1972 Essai sur la tonalité et la flexion verbale du Gimbala. Tervuren: Musée Royal
de l'Afrique Centrale.
Nurse, Derek
2003 Aspect and tense in Bantu languages. In Nurse, Derek and Gérard Philippson
(eds.), The Bantu Languages. London/New York: Routledge, 90–102.
Nurse, Derek and Henry Muzale
1999 Tense and aspect in Great Lakes Bantu languages. In Hombert, J. M. and
L. M. Hyman (eds.), Recent Advances in Bantu Historical Linguistics. Stanford:
CSLI, 517–544.
Odden, David
1996 The Phonology and Morphology of Kimatuumbi. Oxford: Clarendon Press.
Orwig, Carol
1991 Relative time reference in Nugunu. In Anderson, Stephen and Bernard
Comrie (eds.), Tense and Aspect in Eight Languages of Cameroon. Dallas:
SIL, 147–62.
Redden, James E.
1979 A Descriptive Grammar of Ewondo (Occasional Papers on Linguistics 4).
Carbondale, Illinois: Department of Linguistics, Southern Illinois University.
Reichenbach, Hans
1947 Elements of Symbolic Logic. New York: The Macmillan Company.
Rice, Keren
2000 Morpheme Order and Semantic Scope: Word Formation in the Athapaskan
Verb. Cambridge/New York: Cambridge University Press.
Schadeberg, Thilo C. and Francisco Ussene Mucanheia
2000 Ekoti: The Maka or Swahili language of Angoche. Köln: Rüdiger Köppe
Verlag.
Schürle, Georg
1912 Die Sprache der Basa in Kamerun: Grammatik und Wörterbuch. Hamburg:
L. Friederichsen and Co.
Seiler, Hansjakob
1971 Abstract structures for moods in Greek. Language 47, 79–89.
Smith, Carlota S.
2004 The domain of tense. In Guéron, Jacqueline and Jacqueline Lecarme (eds.),
The Syntax of Time. Cambridge, Massachusetts/London: The MIT Press,
597–619.
Steele, Susan
1975 Past and irrealis: just what does it all mean? International Journal of American
Linguistics 41, 200–17.
Taylor, Charles
1985 Nkore-Kiga. London: Croom Helm.
Taylor, John R.
2002 Cognitive Grammar. Oxford: Oxford University Press.
Traugott, Elizabeth Closs
1978 On the expression of spatio-temporal relations in language. In Greenberg,
Joseph H. (ed.), Universals of Human Language: Word Structure (Vol. 3).
Stanford: Stanford University Press, 370–400.
Three types of conditionals and their verb
forms in English and Portuguese
GILBERTO GOMES*
Abstract
An examination of conditionals in different languages leads to a distinction
of three types of conditionals instead of the usual two (indicative and
subjunctive). The three types can be explained by the degree of acceptance or
as-if acceptance of the truth of the antecedent. The labels "subjunctive" and
"indicative" are shown to be inadequate. So-called indicative conditionals
comprise two classes, the very frequent uncertain-fact conditionals and the
quite rare accepted-fact conditionals. Uncertain-fact conditionals may have
a time shift in contemporary English and the future subjunctive in
Portuguese (though not all of them do). Moreover, paraphrases of "if" with "in
case" or "supposing" are usually possible with approximately the same
meaning. Accepted-fact conditionals never have these features.
Keywords: conditionals; indicative; subjunctive; counterfactuals.
1. Indicative and subjunctive
Conditionals are often classified into two types: subjunctives (or
counterfactuals) and indicatives (Edgington 1995; Dancygier 1998; Bennett
2003). Here is an example of a subjunctive conditional:
(1) If he were here today, he would certainly help her.
The verb form used in the antecedent of this conditional is traditionally
called the past subjunctive. The verb "to be" is at present the only verb in
English that has a distinctive form for the past subjunctive (in the first
and third persons singular: "were"). It should be noted that the past
subjunctive refers to the present time. The use of the subjunctive implicates
that the condition expressed by the antecedent is not real, but only
imaginary. The main verb in the consequent ("help") is preceded by the
modal verb "would", and this verb phrase corresponds to the conditional
mood of other languages. It usually expresses an unreal, imaginary situation
that would be the consequence of the condition expressed by the
antecedent.
Cognitive Linguistics 19–2 (2008), 219–240
DOI 10.1515/COG.2008.009
0936–5907/08/0019–0219
© Walter de Gruyter
Thus, subjunctive conditionals typically involve unreal, imaginary
situations. That is why they are often called counterfactual conditionals. It
is usually agreed, however, that the falsity of the antecedent in
counterfactuals is conversationally implicated rather than asserted (Anderson
1951; Stalnaker 1975; Iatridou 2000). This is because a subsequent sentence
may assert it without redundancy or cancel it without contradiction.
The term "counterfactual" is somewhat too strong, since the antecedent
is not always really deemed contrary to fact. Sometimes this type of
conditional is used when the speaker thinks that the antecedent is only
probably (and not certainly) false. For example:
(2) If she were at home, we might visit her now.
Counterfactuals may be used even when the speaker considers the
antecedent probable, but wants to avoid the conditional being interpreted as
too direct a suggestion. For example, Jean may say to Charles
(3) If you took a taxi, you would arrive on time.
believing that Charles will probably accept the implicit suggestion. But in
saying so she is distancing herself from this suggestion by speaking as if
she believed that he was not (or probably not) going to take a taxi;
otherwise she would have simply said "If you take a taxi, you will arrive on time."
The subjunctive verb form "were" is certainly related to the indicative
form "were" used for the past, although the latter is not used for the first
and third persons singular. "Would" may also be the past of "will", but here
it merely indicates an imaginary present or future. According to Iatridou
(2000), past tense morphology as a component of counterfactual morphology
is found not only throughout Indo-European languages but also
in other totally unrelated languages. Imagining a situation that is not
occurring now seems to be cognitively related to remembering a past
situation which is similarly not occurring now. As Langacker (1991: Ch. 6)
observes, both involve an epistemic distance between the designated
process and the speaker. According to him, instead of present vs. past we
can speak more generally of a proximal/distal contrast in the epistemic
sphere (Langacker 1991: 245). As this contrast is usually referred to a
time-line mental model, the predication of immediate reality is commonly
interpreted as one of present time and that of non-immediate reality as
one of past time (Langacker 1991: 246). In counterfactuals, by contrast,
the distal morpheme is interpreted as one of unreal circumstances.
We should bear in mind that the verb forms described above are those
of the English language. The same counterfactual conditional structure
may be expressed in other languages with the aid of verb forms that do
not have the same properties as those used in English. For instance, in
German, the same verb form (Konjunktiv II) is used for both the antecedent
and the consequent. What is important, however, is that there are
verb forms for conditionals involving imaginary and unreal conditions
that are different from those used in conditionals involving possibly real
conditions, such as the following one:
(4) If he was here yesterday, he certainly helped her.
Here there is no "would" in the consequent, and the indicative is used in
both the antecedent and the consequent. Conditionals of this sort are
called indicative conditionals. Instead of "he were", as in (1), we have
"he was". It should be noted, however, that in contemporary English the
meaning of (1) may also be expressed by:
(5) If he was here today, he would certainly help her.
In the past this was considered incorrect, and some still consider it so,
but it is part of the spoken and written language in many dialects of English.
Many would say that the verb in the antecedent of (5) is in the indicative
mood. Yet the fact that a verb form normally used for simple statements
about the past is here used for the present time (a past/present time
shift) may at least be considered as an equivalent of the past subjunctive.
Fowler's Modern English Usage (quoted in Edgington 1995: 240) gives the
following examples:
(6) If he heard, he gave no sign.
(7) If he heard, how angry he would be!
The first "heard" refers to the past, the second to the present. According
to Fowler's, the first "heard" is indicative, the second subjunctive. Others
would consider both as simple past indicative. It would be harder to
maintain that "I were" and "he/she/it were" also belong to the simple past
indicative.
2. The present subjunctive in English indicative conditionals
Consider now the following two examples, which are not counterfactual,
since they involve possibly real conditions:
(8) If he is here tomorrow, he will certainly help her.
(9) If he be here tomorrow, he will certainly help her.
These say in relation to the future what (4) says in relation to the past.
However, while in (4) there is no time shift and no subjunctive, in (8) we
have a present/future time shift and in (9) a subjunctive.¹ In (8), "is", which
in simple statements is normally used in relation to the present, refers to
the future. (9) follows the regular form for this kind of conditional in
16th- and 17th-century English. For example:
(10) If he be not in love with some woman, there is no believing old signs.
(Shakespeare, Much Ado about Nothing, Act III, Scene II)
(11) A commander of an army in chief, if he be not popular, shall not
be beloved, nor feared as he ought to be by his army (. . .) (Hobbes,
Leviathan, Ch. XXX)
Here we have what is called the present subjunctive, by contrast with
the past subjunctive that we have seen in counterfactual conditionals.
This archaic form is still sometimes found in recent times:
(12) If it be your will [ . . . ], I will speak no more. (Song by Leonard
Cohen, 1984)
(13) I will be fine with you if you be good to me. (Song by Rick Astley,
1988)
(14) . . . in general, this has a negligible effect on the correlogram, but if
the grouping be very drastic, it is possible to introduce corrections
analogous to Sheppard's corrections . . . (L. B. C. Cunningham and
W. R. B. Hynd, 1946)
(15) But right now those considerations, if we be at war, are secondary
to victory. (Victor Davis Hanson, in National Review Online, 23
October 2001)
(16) And if we be robbers, how can we expect anything different from
our children? (Sermon by Rabbi Barry H. Block, 17 February
2006)
(17) "It would make it more important if that be the case," he [Ralph
Nader] said yesterday. (New York Daily News, 5 February 2007)
This use of the present subjunctive in English conditionals has usually
been overlooked. Although rare now, it clearly contradicts, for example, the
following statement by Bennett (2003: 11): "The conditionals that are
called 'indicative' under this proposal are indeed all in the indicative
mood (. . .)."
The fact that indicative conditionals such as (9)–(17) use the subjunctive
mood (though this use is now archaic) may be enough reason
to question the adequacy of the traditional terms "subjunctive" and
"indicative" for distinguishing these two classes of conditionals, even in
English. The fact that the subjunctive mood is also used in many indicative
conditionals in Portuguese and in classical Spanish (see below) is an
additional argument against this label.
The adequacy of classifying conditionals as indicative or subjunctive
has previously been questioned for the opposite reason. Thus Dudman
(1988) maintains that English counterfactuals use the indicative, not the
subjunctive mood, in spite of "If I/he/she/it were". Bennett (2003: 11) also
states that "most and perhaps all of [subjunctive conditionals] are in the
indicative mood also". To my mind, at least those with "If I/he/she/it
were" are undeniably in the subjunctive mood. In addition, the subjunctive
is also the rule in counterfactuals in other languages, such as German and
Spanish. An example in Spanish:
(18) Si el jefe estuviese/estuviera aquí no sucedería eso.
     If the boss were here not would-happen this.
     'If the boss were here, this would not happen.'
     (Estuviese/estuviera are alternative forms of the past
     subjunctive (pretérito imperfecto de subjuntivo).)
My point against this nomenclature is not that most subjunctives in
English use the indicative, but rather that indicatives may have the
present subjunctive in English ("If it be", etc.), even if this is exceptional
in current English, and the future subjunctive in Portuguese and also in
classical Spanish. An example in classical Spanish:
(19) Si fuere a México, visitaré las pirámides.
     If go-1sg-fut sbj to Mexico, visit-1sg-fut ind the pyramids.
     'If I go to Mexico, I'll visit the pyramids.'²
3. Three syntactical forms for conditionals in Portuguese
Let us now examine conditionals in Portuguese. (I will present the discus-
sion in a way that can be followed by those who have no knowledge of
Portuguese.)
(20) (I know that she is not Italian.)
     Se ela fosse italiana, ela seria europeia.
     If she were Italian, she would be European.
(21) (I do not know whether she is Italian or not.)
     Se ela for italiana, ela é europeia.
     If she be-3sg-fut sbj Italian, she is European.
(22) (I know that she is Italian.)
     Se ela é italiana, ela é europeia.
     If she is Italian, she is European.
In Portuguese, there are three different forms of the verb in the
antecedent in these three cases: fosse, for, é. (20) has the Portuguese
imperfect subjunctive (corresponding to the past subjunctive in English) in the
antecedent: fosse ("were"). ("If she were [fosse] Italian, she would be
European.") (21) has the Portuguese so-called future subjunctive in the
antecedent: for. ("If she is [for] Italian [which is not certain], she is
European.") (22) has the present indicative: é ("is"). ("If she is [é] Italian [as we
know she is], she is European.")
The use of the future subjunctive always implicates doubt. For in-
stance, if X tells Y that Maria has studied a lot, Y may respond:
(23) Se ela estiver cansada, é melhor parar.
If she be-3sg-fut sbj tired, is better to stop.
If she is tired, she had better stop.
This implicates that, although she has studied a lot, she may be tired or
not. It also implicates that, if she is not tired, perhaps the best thing to do
is to go on studying (for example, because of her test tomorrow).
Now let us imagine a second situation, in which X told Y that Maria is
tired, because she has studied a lot. Y may respond:
(24) Se ela está cansada, é melhor parar.
If she is tired, is better to stop.
If she is tired, she had better stop.
Y could never use (23) in this situation. If he already knows that she is
tired, he would never use estiver, which implicates doubt. He must use the
present indicative está. In the first situation, by contrast, some dialects of
Portuguese would use (24), but others would not (unless the speaker had
already concluded that she is tired, from the fact that she has studied a
lot).
Thus, the Portuguese language has three grammatical forms for the
conditional, not just two. The one using the future subjunctive (or future
perfect subjunctive) in the antecedent, which is absent in English, French,
German and other languages, is usually a clear sign of doubt and is
not used when the antecedent is treated as certain. In English (among
other languages), the noncounterfactual conditional construction is usu-
ally used in situations involving uncertain conditions, but it can also be
used in those involving conditions accepted as facts, like (22).³ The three
grammatical forms present in Portuguese and the differences in their use
suggest a distinction among three types of conditional sentences.
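The correspondence between the speaker's stance and the antecedent verb form in examples (20)-(22) can be summarized in a small lookup table. The following is only an illustrative sketch: the names Stance, TYPICAL_ANTECEDENT_FORM and typical_form are hypothetical, and the mapping is a simplification, since (as noted above) some dialects also allow the present indicative when the antecedent is uncertain.

```python
from enum import Enum


class Stance(Enum):
    """Speaker's attitude toward the antecedent (hypothetical labels)."""
    FALSE = "known to be false"
    UNCERTAIN = "not known to be true or false"
    TRUE = "known to be true"


# Forms of the verb "ser" taken from the antecedents of examples (20)-(22).
# Simplification: this records only the typical form for each stance; it
# does not model dialects that use the indicative for uncertain antecedents.
TYPICAL_ANTECEDENT_FORM = {
    Stance.FALSE: ("imperfect subjunctive", "fosse"),   # example (20)
    Stance.UNCERTAIN: ("future subjunctive", "for"),    # example (21)
    Stance.TRUE: ("present indicative", "é"),           # example (22)
}


def typical_form(stance: Stance) -> str:
    """Return the typical antecedent form and its mood for a given stance."""
    mood, form = TYPICAL_ANTECEDENT_FORM[stance]
    return f"{form} ({mood})"


for stance in Stance:
    print(f"{stance.value}: {typical_form(stance)}")
```

A table of this kind captures only the three-way formal contrast; it says nothing about the pragmatic conditions under which each form is chosen.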
4. Three types of conditional according to acceptance or as-if acceptance
of the antecedent
What should we call these three types of conditional? Those such as (1)–(3),
(5), (7) and (20), in which the speaker accepts or speaks as if she accepted
that the antecedent is false or probably false, but imagines a situation
in which it would be true, are often called counterfactual conditionals,
a traditional name that may be kept.⁴ I propose to call those such as (4),
(6), (8)–(17), (19), (21) and (23), in which the speaker is or pretends to be
or speaks as if she were uncertain about the truth of the antecedent,
uncertain-fact conditionals. For those such as (22) and (24), in which the
speaker accepts or speaks as if she accepted that the antecedent is true, I
suggest the name accepted-fact conditionals.⁵
Thus, I suggest that we should prefer "counterfactual" to "subjunctive"
to refer to the first class, and that so-called "indicative" conditionals
should be divided into two classes: uncertain-fact conditionals and
accepted-fact conditionals. This classification of conditionals based on
the acceptance or as-if acceptance of the truth of the antecedent needs to
be defended against objections that may be raised following two influential
traditions in the philosophy of conditionals. First, several philosophers
have noted that counterfactuals are sometimes used in cases in
which the speaker believes the antecedent to be true. Second, it has been
argued that the difference between counterfactual and indicative conditionals
is deeper than and not explained by the belief in or acceptance of
the truth of the antecedent. The first objection is discussed in section 8
and the second in section 9.
5. The distinction between accepted-fact and uncertain-fact conditionals
Further examples of uncertain-fact and accepted-fact conditionals are
given below. Suppose Johnny is trying to solve the following problem:
"What is the value of x if x + y = 27 and x − y = 9?" He is a clever boy,
but he has never studied algebra. He thinks: "27 may be the result of adding
several pairs of numbers. Let's try one."
(25) If x is equal to 20, then y is equal to 7.
(26) And if x is equal to 20 and y is equal to 7, then x minus y is equal to
13.
But x − y = 9. So x is not equal to 20. After trying another pair of
numbers that add up to 27 and failing again, he decides to ask his older
sister for help. Then she teaches him:
(27) Look: x − y is equal to 9. And if x − y is equal to 9, then x is equal
to 9 + y.
(28) Now, if x is equal to 9 + y and x + y is equal to 27, then 9 + y + y
is equal to 27.
From there she finds the solution.
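The sister's reasoning in (27) and (28) can be written out as a short derivation. The first two lines restate her two conditionals; the last two lines, which the text leaves implicit, follow by ordinary arithmetic.

```latex
\begin{align*}
x - y &= 9  &&\Rightarrow\quad x = 9 + y \\
x + y &= 27 &&\Rightarrow\quad (9 + y) + y = 27 \\
2y &= 18    &&\Rightarrow\quad y = 9 \\
x &= 9 + y = 18
\end{align*}
```

Each step takes an already established equation as its premise, which is why the corresponding conditionals count as accepted-fact rather than uncertain-fact.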
The verb forms used in all four of these conditionals in English are: is–is.
If Johnny were thinking in Portuguese, (25) and (26) would typically have
the verb forms for–será (future subjunctive–future indicative).⁶ This
would show that Johnny is just trying out numbers that may or may not
be the right ones. By contrast, his sister would use the verb forms é–é
(present indicative–present indicative) in (27) and (28), because she is
dealing with certainties. In (28), for example, she is certain that x is equal
to 9 + y, because she deduced this (in (27)) from the second equation
of the problem. In (27) and (28) we have accepted-fact conditionals,
with is–is in English and é–é in Portuguese. In (25) and (26) we have
uncertain-fact conditionals, with is–is in English and typically for–será
in Portuguese.
We can see that the verb form used in the antecedent does not in
general allow one to make the distinction between accepted-fact and
uncertain-fact conditionals in English. In Portuguese, the use of the future
subjunctive (or future perfect subjunctive) indicates an uncertain-fact con-
ditional, but indicative forms may be used in both types.
The question then arises whether the conventional meaning of the conditional
construction is different or the same in accepted-fact conditionals
as compared to what it is in uncertain-fact conditionals. Let us consider
English conditionals without "would" in the consequent. One could argue
that the default interpretation of the antecedent of such conditionals
is that it refers to an uncertain fact and that, in certain cases, additional
information may override this default interpretation, so that their antecedent
is understood as referring to an accepted fact. Alternatively, one
could argue that the meaning of the conditional construction does not
include anything about the antecedent referring to an accepted fact or
to an uncertain fact. In other words, one may ask whether the conditional
construction in these cases is ambiguous or vague as regards the
uncertain-fact/accepted-fact contrast.⁷
This is a difficult question, but there is an argument that favours the
ambiguity thesis. This is the fact that "if" can usually be paraphrased with
"in case" or "supposing" in uncertain-fact conditionals (but not in accepted-fact
conditionals) and with "since" or "given that" in accepted-fact conditionals
(but not in uncertain-fact conditionals). This points to a difference in the
meaning of "if" in each type of conditional. In an accepted-fact conditional,
the meaning of "if" is similar to the meaning of "since" or "given that", while in
uncertain-fact conditionals it is similar to the meaning of "in case" or
"supposing". (This may be compared to the two meanings of "while", a word
that may either mean "whereas" or "during the time that".)
Note that I am not claiming that "if", as used in uncertain-fact and in
accepted-fact conditionals, is synonymous with "in case" (or "supposing")
and with "since" (or "given that"), respectively, but only that their meanings
are usually similar enough to allow the respective paraphrases. However,
this differential possibility of paraphrasing accepted-fact and uncertain-fact
conditionals is a linguistic fact that indicates a difference in the meaning
of the conditional construction in these two types.
For example,
(29) If you don't want me here, (then) I'll leave.
may either mean something similar to
(30) In case you don't want me here, (then) I'll leave.
or something similar to
(31) Since you don't want me here, (then) I'll leave.
Example (29) could be used either by someone who is considering the
hypothesis that she is unwanted there (just as (30)) or by someone
who has had clear evidence that she is really unwanted there (just
as (31)). It will be an uncertain-fact conditional in the first case and an
accepted-fact conditional in the second.
Suppose the following isolated sentence is overheard in an airport:
(32) If your flight is late, you'll miss your connection.
Two interpretations are possible: (1) There is a possibility of your flight
being late and, in that case, you'll miss your connection; (2) Your flight
is late and consequently you'll miss your connection. Excluding any influence
of special intonation or facial expression, the conditional construction
itself might favour the first interpretation. However, special circumstances
might favour the second. Suppose that this takes place in a small
airport with only one scheduled departure in the next three hours and
that the person who hears the sentence knows that this departure is delayed.
She may then think that the addressee is taking this flight and that
the speaker is referring to the known fact that it is late. My point is that
the hearer cannot fail to interpret the sentence one way or the other (or
even consider both alternatives). According to the first interpretation, the
sentence could be paraphrased as "In case your flight is late, you'll miss
your connection" or "Supposing your flight is late, you'll miss your
connection". According to the second, it could be paraphrased as "Since
your flight is late, you'll miss your connection" or "Given that your flight
is late, you'll miss your connection".
Many conditionals in Portuguese are also ambiguous as concerns the
uncertain-fact/accepted-fact distinction, as in the following example:
(33) Se ele foi contratado, vamos primeiro ver o trabalho dele para depois criticar.
If he was hired, go-1pl imp first see the work of him for after criticize.
"If he was hired, let's first see his work and then criticize it."
The sentence could be used either by one who thinks that the man was
hired or by one who is merely considering the hypothesis that he was.⁸ As
in English, however, different paraphrases for se [if] would be possible in
each case. If (33) is meant as an accepted-fact conditional, se could be
paraphrased with já que [since] or dado que [given that], but not with
caso [in case] or supondo que [supposing]. If it is meant as an
uncertain-fact conditional, se could be paraphrased with caso or supondo que (in
which case the verb tense would have to be changed to the past perfect
subjunctive: "Caso ele tenha sido contratado, . . ." or "Supondo que ele
tenha sido contratado, . . .") but not with já que or dado que.
6. Comparison with other proposed distinctions
My distinction has nothing to do with the thesis of Dudman (1984, 1989)
according to which indicatives should be divided into two classes according
to the presence or absence of a time shift (and that those presenting
a time shift should be classified in the same group as counterfactuals).
To my mind, the presence of a present/future time shift is undoubtedly
significant, since it is a sure sign of an uncertain-fact conditional. (No
accepted-fact conditional has a time shift.) However, there are many
uncertain-fact conditionals that do not have a time shift. For example,
when the antecedent refers to the past, as in (4), there is no time shift.
Thomason and Gupta (1980: 299) give an example in which the present
tense in the antecedent may refer to the present, thus without a time shift:
"If he loves her, he will marry her."
Haegeman (2003) proposed a distinction between two types of indicative
conditionals that is also different from that between uncertain-fact
and accepted-fact conditionals: the distinction between premise-conditionals
and event-conditionals. According to her, the conditional
clause in event-conditionals "structures the event: it expresses an event
which will lead to the main clause event". In premise-conditionals, by
contrast, the conditional clause "structures the discourse: it expresses a
premise leading to the matrix clause" (Haegeman 2003: 318–19).
As it happens, almost all of her examples of premise-conditionals are
accepted-fact conditionals or may be interpreted as such. Here is one:
(34) John won't finish on time, if there's (already) such a lot of pressure
on him now. (Haegeman 2003: 322)
The speaker here clearly accepts that there is a lot of pressure on
John. However, the following example, also classified by the author as a
premise-conditional, is an uncertain-fact conditional:
(35) If his children aren't in the garden, John will already have left home
(. . .). (Haegeman 2003: 325)
The speaker now seems uncertain about whether John's children are still
in the garden or not. So we see that Haegeman's distinction does not
coincide with mine.
In fact, I do not find the distinction between event- and premise-
conditionals very clear. In (34), classified as a premise-conditional, we
could also say that the event expressed by the conditional clause will
lead to the main clause event, which is how Haegeman characterizes
event-conditionals.
Edgington (2003) also found difficulties with Haegeman's distinction.
She stresses the following two characteristics of event-conditionals as
discussed by Haegeman: a causal relation between the conditional clause
and the main clause, and "tense oddity" (what I have called a present/
future time shift). And she concludes:
Given that there can be tense oddity and no causation running from conditional
to main clause, and vice versa, I am left somewhat uncertain about where to draw
the line between event-conditionals and the rest (Edgington 2003: 396).
Haegeman states that event-conditionals may be clefted and premise-
conditionals may not. (A conditional of the form "A only if B" is said to
be clefted when it is transformed to one of the form "It is only if B that
A".) For example, we cannot say:
(36) *It is only if there is already such a lot of pressure on him now, that
John will finish the book. (Haegeman 2003: 323)
Edgington remarks that without the word "such" this example would be
in order. She notes that the role of "such" here is to suggest that the
speaker already knows that there is all this pressure on John now. She
considers that conditionals in which the premise is really accepted by the
speaker are "marginal and untypical" and notes that "while this is not
part of Haegeman's official doctrine of premise-conditionals ( . . . ) quite
a few of her examples are of this kind" (Edgington 2003: 397). Such
conditionals are precisely my accepted-fact conditionals.
Other authors have also proposed distinctions between types of indicative
conditionals that do not coincide with the one I am arguing for. Eve
Sweetser, for example, makes a distinction between content conditionals,
in which "the realization of the event or state of affairs described in the
protasis is a sufficient condition for the realization of the event or state
of affairs described in the apodosis" (Sweetser 1990: 114), and epistemic
conditionals, in which "knowledge of the truth of the hypothetical premise
expressed in the protasis would be a sufficient condition for concluding
the truth of the proposition expressed in the apodosis" (Sweetser 1990:
116). Both may either be uncertain-fact conditionals or accepted-fact
conditionals. Incidentally, it may be noted that an example such as (4)
("If he was here yesterday, he certainly helped her") fits both of Sweetser's
categories.⁹
7. Features and uses of accepted-fact conditionals
Accepted-fact conditionals are no doubt much rarer than those of the
two other types. In chapters 1–8 (part 1) of Hobbes's Leviathan, I found
only one accepted-fact conditional against 41 uncertain-fact and 11
counterfactual conditionals. In chapters 1–6 of Portrait of a Lady, by
Henry James, I also found only one accepted-fact conditional against 17
uncertain-fact and 9 counterfactual conditionals. In Portuguese, a search
in Contos Fluminenses by Machado de Assis revealed 6 accepted-fact
conditionals against 31 uncertain-fact and 16 counterfactual conditionals.
(Atypical conditionals as defined elsewhere (Gomes 2007) and discussed
in section 10 were excluded from these counts. The search involved only
conditionals with "if" in English or "se" in Portuguese.)
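To make the relative rarity concrete, the counts reported above can be turned into proportions. The following is a minimal sketch; the dictionary keys and the function name accepted_fact_share are illustrative labels, and only the nine counts come from the text.

```python
# Counts reported in the text, in the order
# (accepted-fact, uncertain-fact, counterfactual).
counts = {
    "Leviathan": (1, 41, 11),
    "Portrait of a Lady": (1, 17, 9),
    "Contos Fluminenses": (6, 31, 16),
}


def accepted_fact_share(row):
    """Return the fraction of accepted-fact conditionals in one sample."""
    accepted, uncertain, counterfactual = row
    return accepted / (accepted + uncertain + counterfactual)


shares = {title: accepted_fact_share(row) for title, row in counts.items()}

for title, share in shares.items():
    print(f"{title}: {share:.1%}")
```

On these counts, accepted-fact conditionals make up roughly 2%, 4% and 11% of the typical conditionals in the three samples, respectively. The samples are small, so the figures indicate only an order of magnitude.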
One might ask why people would use a conditional if they are certain
about the antecedent. They may do so to draw a conclusion from a
known fact or an accepted premise. Examples are Johnny's sister's
sentences (27) and (28). Another example is the following (in a context in
which the speaker had a life-threatening illness):
(37) If I'm alive, (it's because) my doctors did a good job.
Dudman (1986) quotes two other good examples of what I call
accepted-fact conditionals:
(38) If it had not been possible to stop, or even delay, the Japanese up
country with the help of prepared defences and relatively fresh
troops, it was improbable that they would be stopped now at the
gates of the city (J. G. Farrell 1978).
(39) If they weren't my doing, and they weren't, then I couldn't control
their appearance or disappearance (Donald E. Westlake 1974).
In accepted-fact conditionals (as noted earlier), "if" (or "if . . . then") may
often be paraphrased with "since" or "given that" with little change in
meaning, as for example in (38). This may lead one to question whether
accepted-fact conditionals are in fact conditionals (see Bennett 2003: 5).
I will argue that they are, for four reasons. First (most obviously), they
share the same overall linguistic structure with other conditionals. They
use the same conjunctions ("if"; "if . . . then"), the same pattern for building
the compound sentence and the same or similar intonation and prosody
in speech. They may have different verb forms, but counterfactuals also
do and this does not prevent us from considering them as conditionals.
From a grammatical point of view, there is no reason not to consider
them as conditionals.
Second, they usually share many basic logical and cognitive properties
with the other two types of conditionals. All three types are often used to
make inferences. They may be used to draw a conclusion, based on regu-
larity or on logical necessity, or to indicate this regularity or logical neces-
sity itself. They may all be used to make a prediction, dependent on some
condition. They may also be used to indicate the subject's intention to do
something in the future, conditional on a certain circumstance.
Third, though in accepted-fact conditionals "since" can often be used to
paraphrase "if", this does not show that their subclause is merely a reason
clause. This is shown by the fact that many since-clauses cannot be
paraphrased with if-clauses. For example: "Since she was not there, I went
away." The subclause here is not meant as conditional and consequently
we cannot say: *"If she was not there, I went away." Thus, the subclause in
accepted-fact conditionals is not merely an adverbial clause of reason (or
cause), as might be thought from the possibility of paraphrasing "if" with
"since", but a real conditional adverbial clause.
Fourth, accepted-fact conditionals may in many cases supply an ade-
quate contrapositive for counterfactual conditionals. For example:
(40) If she were Italian, she would be European.
(41) If she isn't European, she isn't Italian.
Within a context that gives reason to state (40), (41) is an accepted-fact
conditional, since in fact we know that she is neither European nor
Italian. If we did not, we would not assert the counterfactual (40). Other
examples:
(42) If it had rained, the road would be wet.
(43) If (as is indeed the case) the road isn't wet, it hasn't rained.
(44) If she were very ill, she would be in bed.
(45) If (as is indeed the case) she is not in bed, she is not very ill.
The phrase "as is indeed the case" was included in parentheses in (43) and
(45) to make clear that these are intended as accepted-fact conditionals.
It could be omitted in a suitable context. In many dialects of Portuguese,
we would not need to include the corresponding phrase, since the verb
form (present indicative) would already implicate that. (If we had been
in doubt, we would have used the future subjunctive.)
Although quite rare, accepted-fact conditionals should be recognized
and distinguished from other indicative conditionals. They are the con-
ditionals that are really indicative, since they involve conditions that the
speaker considers (or acts as if she considered) to be real. The others deal
with uncertain conditions, and in some cases this is reflected in the use of
a time shift in English (and other languages) and of the future subjunctive
in Portuguese and classic Spanish.
8. Acceptance and as-if acceptance
In rare cases, a counterfactual is employed even though the speaker does
not really accept the antecedent as false. Anderson (1951) gives the fol-
lowing example:
(46) If he had taken arsenic, he would have shown just these symptoms
[those which he in fact shows].
Note, however, that this example could have been used as a usual
counterfactual, in a situation where the speaker believes the antecedent
to be false. Suppose that there is another medical condition that presents
the same symptoms as arsenic poisoning and that the result of a special
test has shown that the patient has that medical condition. The sentence
would then be just a comment on the similarity of symptoms. Alterna-
tively, the counterfactual could have been used to convey that the speaker
finds it highly improbable that the man has taken arsenic, and that he is
perplexed by the similarity between his symptoms and those of arsenic
poisoning.
If the sentence is used in a situation where the speaker believes the
antecedent to be true (the possibility that the example is intended to show),
we should first ask why the speaker would have chosen to use it, instead
of saying something simpler, such as: "He shows symptoms of arsenic
poisoning." It seems that the latter would be a clear suggestion that the
man has taken arsenic, and that making such a direct suggestion is
precisely what the speaker is trying to avoid in (46). Here is where an as-if
acceptance of the falsity of the antecedent can be identified. The speaker
acts as if she were making a default assumption that the man has not
taken arsenic, but remarks that, had he done so, he would have shown
just the symptoms he in fact shows. It is a euphemistic way of suggesting
that he has indeed taken arsenic.
An uncertain-fact conditional could have been used to make the same
point in a simpler (though not as euphemistic) way:
(47) If one takes arsenic, one shows just these symptoms [which he
shows].
Edgington (1995: 240) gives another example:
(48) People in line are picking up their bags and inching forward – and
that's what they would be doing if a bus were coming.
It would seemingly be more natural to say: "and that's what they usually
do if a bus is coming". The counterfactual here seems to be a more elaborate
way of saying the same thing. It is as if the speaker were saying
something like: "First let's assume that no bus is coming, since we cannot
see one from here. Then let's imagine a situation that we'll treat as unreal
in which a bus is coming. What would people do in this situation? They
would pick up their bags and inch forward. Now, what are they doing
now? They are picking up their bags and inching forward. So let's revise
our initial assumption and conclude that a bus is probably coming."
Again, the speaker seems to provisionally act as if she accepted that the
situation described in the antecedent is unreal. It is a way of avoiding
commitment to the hypothesis that a bus is coming.
As noted earlier, the falsity of the antecedent in counterfactuals is usu-
ally considered to be conversationally implicated rather than asserted
(Anderson 1951; Stalnaker 1975; Iatridou 2000), since a subsequent sen-
tence may assert it without redundancy or cancel it without contradiction.
The same applies to the truth of the antecedent in accepted-fact condi-
tionals, as shown in the following example by Sweetser (1990: 128):
(49) Well, if (as you say) he had lasagne for lunch, he won't want
spaghetti for dinner. But I don't believe he had lasagne for lunch.
Declerck and Reed (2001: 45) have also shown that there are cases in
which the antecedent is accepted only to be challenged by a question in
the consequent.
It is thus clear that in special cases an accepted-fact conditional may be
used even though the antecedent is not in fact accepted as true. In such
cases, however, an as-if acceptance is always the reason for using this
type of conditional. Suppose someone believes that the other person is
lying and this is why he is nervous. She says:
(50) If you are not lying, there is no reason to be nervous.
This may be seen as an ironic (or cautious) equivalent of:
(51) If you were not lying, there would be no reason to be nervous.
Pretended belief or a provisional strategic acceptance of the antecedent
is again the explanation. In (50) the speaker acts as if she accepted as a
fact that he is not lying, when in fact she believes he is. The utterance
seems to function as a reductio ad absurdum. If the addressee is not lying,
there is no reason to be nervous and a person does not get nervous when
there is no reason to be nervous. But the addressee is nervous, so it is not
true that he is not lying. The feigned belief in the truth of the antecedent
(achieved by giving it the form of an accepted-fact conditional) is pre-
cisely what makes the sentence ironic, since the speaker is suggesting
something (the fact that the addressee is lying) which is the opposite of
the natural implicature of the sentence (which could be accepted if in
fact the addressee were not nervous).
The antecedents of some accepted-fact conditionals are said to be
echoic, since they repeat something that has previously been stated by
the interlocutor. It has been noted (Sperber and Wilson 1986; Dancygier
1998) that in such cases the speaker does not necessarily share the belief
in the assumption echoed. However, she certainly acts as if she shared
that belief. She manifests at least a provisional acceptance (which may
be ironic or not) of the content of the antecedent.
An uncertain-fact conditional may also be used instead of a counterfac-
tual for irony. Instead of saying that since he is not Superman he will not
be able to do it, one might say:
(52) If he is Superman, he will be able to do it.
In saying this, one acts as if one considered his being Superman as an un-
certain fact, while in fact one believes it to be false.
9. Degree of acceptance or as-if acceptance of the antecedent as a basis
for distinguishing the three types of conditionals
I will now argue that the speaker's degree of acceptance or as-if acceptance
of the reality or probability of the condition described in the
antecedent is sufficient for explaining the difference between the three
types of conditionals. Consider a situation in which three people saw a
man kill John. X is uncertain whether this man was Oswald or not and
says:
(53) If Oswald wasn't the one who killed John, then someone else was.
Y is sure that the man was not Oswald and says:
(54) If Oswald wasn't the one who killed John (as in fact he wasn't), then
someone else was.
Z is sure that the man was Oswald and says:
(55) If Oswald had not been the one who killed John, then someone else
would have been the one who killed him.
Though these three sentences sound unnatural, they are grammatical
and make sense. They could certainly be replaced by simpler ones, but
they were chosen on purpose to have a parallel formulation in the three
cases and at the same time avoid different contextual assumptions that
would be induced by a simpler wording (see Fogelin 1998).
The only difference between the three is the belief that the speaker has
concerning the truth of the antecedent (and that of the consequent, as a
result). Y believes it is true, Z believes it is false and X is uncertain about
it.¹⁰ If they did not have these respective beliefs, at least they would be
implicating acceptance of, non-acceptance of and uncertainty about the
truth of the antecedent, respectively.
We have a different situation in the following famous pair of examples
(from Lewis 1973: 3, based on Adams 1970):
(56) If Oswald didn't kill Kennedy, then someone else did.
(57) If Oswald hadn't killed Kennedy, then someone else would have.
The person asserting (56) implicates that she is uncertain and the one
asserting (57) implicates that she is certain about Oswald having killed
Kennedy. As Fogelin (1998) has shown, however, in addition to the different
degree of acceptance concerning the truth of the antecedent, each
conditional involves different contextual assumptions. Thus, they are
interpreted differently by the listener and they would be asserted by people
wanting to communicate different thoughts. One believes that Kennedy
was bound to be killed; the other is merely concerned with the identity
of the killer.
Pairs of examples such as this (first suggested by Adams 1970) have
been considered by Lewis (1973) and many others after him as evidence
that the difference between indicative and subjunctive conditionals cannot
be explained by the speaker's opinion about or acceptance of the truth of
the antecedent. However, I am in complete agreement with Fogelin
(1998) in attributing any further difference to the contextual setting. He
shows that the disparity in the reasons for believing each conditional
simply disappears when the relevant contextual features are held constant.
This is obtained by changing the wording of the sentences, as in (53) and
(55).¹¹ (I have merely added (54) to complete the picture of the three
types.)
Counterfactuals are thus used when the speaker accepts or speaks as
if she somehow accepted that the antecedent is false or highly improba-
ble; uncertain-fact conditionals are used when the speaker accepts or
speaks as if she somehow accepted that the antecedent is uncertain; and
accepted-fact conditionals are those used when the speaker accepts or
speaks as if she somehow accepted that the antecedent is true or highly
probable.
10. Atypical conditionals
I have distinguished three types of conditionals. This is not to say that every
conditional should fall into one of these types. There are also some
deviant ones, which I call atypical conditionals. (I have elsewhere proposed
a definition and an explanation of atypical conditionals (Gomes
2007).) For instance (from Edgington 1995: 240):
(58) If he took arsenic, he's showing no signs.
The person who says so probably believes the antecedent is false and
could have said:
(59) If he had taken arsenic, he would be showing signs of arsenic
poisoning – but he isn't.
At least she is uncertain about it and could have said:
(60) If he took arsenic, signs of arsenic poisoning are expected – but he's
showing no such signs.
Example (59) includes a typical counterfactual and (60) a typical
uncertain-fact conditional, and they also include a comment with "but"
after these conditionals, to convey the meaning of the atypical (58).
11. Conclusion
An examination of conditionals in English and Portuguese has thus led
us to distinguish three types of conditionals instead of the usual two
(indicative and subjunctive). The labels "indicative" and "subjunctive"
were found inadequate, since subjunctive verb forms may be found in
indicative conditionals (in the archaic use of the present subjunctive in
English and of the future subjunctive in classical Spanish, and in the current
use of the future subjunctive in Portuguese). Moreover, so-called indica-
tive conditionals comprise two classes, the very frequent uncertain-fact
conditionals and the quite rare accepted-fact conditionals.
Uncertain-fact conditionals may have a time shift in contemporary
English and the future subjunctive in Portuguese (though not all of
them do). Accepted-fact conditionals never have these features. Al-
though accepted-fact conditionals are rare, I have argued that they are
genuine conditionals, which have the theoretically important function
of providing a contrapositive for many counterfactuals (when a
contrapositive is valid). When the verb forms used do not permit the
identification of an accepted-fact conditional, it may be recognized by the
possibility of adding '(as is indeed the case)', '(as you say)' or '(as X
says)' after 'if', or by the possibility of paraphrasing 'if' with 'since'
or 'given that'.
I have argued that the degree of real or as-if acceptance by the
speaker of the truth of the proposition expressed by the antecedent is
sufficient to explain the differential use of these three types (and that
further differences are accidental and due to contextual features). The
task of establishing common or different truth conditions for them may
be considered as a subsequent one, which is outside the scope of this
paper.
Received 14 May 2007
Revision received 15 December 2007
Universidade Estadual do Norte Fluminense, Brasil
Notes
* Laboratory of Cognition and Language, Universidade Estadual do Norte Fluminense,
Campos, RJ, Brazil. E-mail: <ggomes@uenf.br>.
1. Interestingly, Gibbard (1980) considers conditionals in which there is a present/future
time shift as grammatically subjunctive.
2. The following abbreviations are used in the glosses: 1, 3 = first, third
person; sg = singular; fut = future; sbj = subjunctive; ind = indicative;
imp = imperative; perf = perfect.
3. Although I have emphasized in section 2 that there is an archaic use of the present sub-
junctive in indicative conditionals in English (which questions the adequacy of this
label), I am not claiming that this use is preferentially associated with a type of condi-
tional, as the future subjunctive is in Portuguese.
4. Against the term 'counterfactual', Bennett (2003: 12) remarks that it may be
considered as based on a feature that has nothing to do with the antecedent's being
contrary-to-fact, but only with the speaker's thinking that it is so. However, I do
not think that this is really a problem. The label's reference to the speaker's opinion
may easily be considered as implicit: a conditional will be called 'counterfactual' when
the speaker accepts or speaks as if she accepted that the antecedent is (or probably is)
contrary-to-fact.
5. Following Auwera (1986), Comrie (1986) and Bhatt and Pancheva (2006), among
others, one might call such conditionals 'factual conditionals'. However, the term has
already been used in relation to uncertain-fact conditionals that express habitual or
general facts. Moreover, 'accepted-fact' shows that the speaker may merely be
treating the antecedent as true, without in fact committing herself to its truth.
6. Though they might also have e e (present indicative–present indicative).
7. I am indebted to the Editor for this observation.
8. However, the use of the indicative seems to favour the accepted-fact interpretation.
Using the future perfect subjunctive, this could be framed unambiguously as an
uncertain-fact conditional:
Se ele tiver sido contratado, vamos primeiro ver o
If he be-1sg-fut perf sbj hired, go-1pl-imp first see the
trabalho dele para depois criticar.
work of him for after criticize.
'If he was hired, let's first see his work and then criticize it.'
9. This accords with the following observation by Dancygier and Sweetser (2005: 17):
'Since reasoning from cause to likely effect is just as possible as reasoning from effect
to likely cause, epistemic conditionals can also follow the direction of content causal
contingency.'
10. In Portuguese, a different verb form could have been used in each: (56) Se não tiver sido
. . . (57) Se não foi . . . (58) Se não tivesse sido . . .
11. The context of (57) is fixed by changing the pair to: 'If Oswald did not kill Kennedy, then
someone else stepped in and did. If Oswald had not killed Kennedy, then someone else
would have stepped in and killed him' (Fogelin 1998). The first sentence might have
been used by a conspirator who was unsure whether Oswald had succeeded in killing
Kennedy. That (56) might be used in a context similar to that of (57) had already
been pointed out by Bennett (1995: 334–5). The same conspirator, having the same
beliefs concerning the presence of someone prepared to step in if Oswald failed, might
utter (56) before knowing that Oswald had succeeded and (57) after knowing that he
had. By contrast, the context of (56) is fixed by using wordings similar to those of (53)
and (55) (Fogelin 1998).
References
Adams, Ernest W.
1970 Subjunctive and indicative conditionals. Foundations of Language 6, 89–94.
Anderson, Alan Ross
1951 A note on subjunctive and counterfactual conditionals. Analysis 12, 35–38.
Auwera, Johan van der
1986 Conditionals and speech acts. In Traugott, E. C., Meulen, A. t., Snitzer-
Reilly, J. and Ferguson, C. A. (eds.) On Conditionals. Cambridge: Cam-
bridge University Press.
Bennett, Jonathan
1995 Classifying conditionals: The traditional way is right. Mind 104, 331–354.
2003 A Philosophical Guide to Conditionals. Oxford: Oxford University Press.
Bhatt, Rajesh and Roumyana Pancheva
2006 Conditionals. In Everaert, M. and Riemsdijk, H. C. van (eds.) Blackwell
Companion to Syntax, vol. 1. Oxford: Blackwell.
Comrie, Bernard
1986 Conditionals: A typology. In Traugott, E. C., Meulen, A. t., Snitzer-Reilly,
J. and Ferguson, C. A. (eds.) On Conditionals. Cambridge: Cambridge
University Press.
Dancygier, Barbara
1998 Conditionals and Prediction. Cambridge: Cambridge University Press.
Dancygier, Barbara and Eve Sweetser
2005 Mental Spaces in Grammar: Conditional Constructions. Cambridge: Cam-
bridge University Press.
Declerck, Renaat and Susan Reed
2001 Conditionals: A Comprehensive Empirical Analysis. Berlin: Mouton de
Gruyter.
Dudman, Victor H.
1984 Parsing 'if'-sentences. Analysis 44, 145–153.
1986 Antecedents and consequents. Theoria 52, 168–199.
1988 Indicative and subjunctive. Analysis 48, 113–122.
1989 Vive la révolution! Mind 98, 591–603.
Edgington, Dorothy
1995 On conditionals. Mind 104, 235–329.
2003 What if? Questions about conditionals. Mind and Language 18 (4), 380–401.
Fogelin, Robert J.
1998 David Lewis on indicative and counterfactual conditionals. Analysis 58 (4),
286–289.
Gibbard, Allan
1980 Two recent theories of conditionals. In Harper, W. L., Stalnaker, R. and
Pearce, G. (eds.) Ifs. Dordrecht: Reidel.
Gomes, Gilberto
2007 Truth in natural language conditionals. Unpublished paper. Laboratory of
Cognition and Language. Universidade Estadual do Norte Fluminense,
Campos, RJ, Brazil.
Haegeman, Liliane
2003 Conditional clauses: External and internal syntax. Mind and Language
18 (4), 317–339.
Iatridou, Sabine
2000 The grammatical ingredients of counterfactuality. Linguistic Inquiry 31 (2),
231–270.
Langacker, Ronald W.
1991 Foundations of Cognitive Grammar, vol. II: Descriptive Application. Stan-
ford, CA: Stanford University Press.
Lewis, David K.
1973 Counterfactuals. Cambridge, MA: Harvard University Press.
Sperber, Dan and Deirdre Wilson
1986 Relevance: Communication and Cognition. Oxford: Blackwell.
Stalnaker, Robert C.
1975 Indicative conditionals. Philosophia 5, 269–289.
Sweetser, Eve
1990 From Etymology to Pragmatics: Metaphorical and Cultural Aspects of
Semantic Structure. Cambridge: Cambridge University Press.
Thomason, Richmond and Anil Gupta
1980 A theory of conditionals in the context of branching time. In Harper, W. L.,
Stalnaker, R. and Pearce, G. (eds.) Ifs. Dordrecht: Reidel.
Much mouth much tongue:
Chinese metonymies and metaphors
of verbal behaviour
ZHUO JING-SCHMIDT
Abstract
This paper explores metonymical and metaphorical expressions of verbal
behaviour in Chinese. While metonymy features prominently in some of
these expressions and metaphor in others, the entire dataset can be best
viewed as spanning the metonymy-metaphor-continuum. That is, we observe
a gradation of conceptual distance between the source and target which
corresponds to the gradation of figurativity. Specifically, roughly half of the
expressions we encounter are based on the ORGAN OF SPEECH ARTICULATION
FOR SPEECH metonymy and can be considered as clustering around the met-
onymic pole. The other half can be seen as tending towards the metaphoric
pole, as they are largely motivated by conceptual metaphors: (a) VERBAL
BEHAVIOUR IS PHYSICAL ACTION, (b) SPEECH IS CONTAINER, (c) ARGUMENT
IS WAR (or WORDS ARE WEAPONS) and (d) WORDS ARE FOOD. The interac-
tion between metonymy and metaphor is an important cognitive strategy in
the conceptualisation of verbal behaviour. The findings (i) evidence the
gradient predictability of idiom meanings based on semantic compositionality,
(ii) confirm the hypothesis of a bodily and experiential basis of cognition,
(iii) suggest the existence of culture-specific models in the utilization of
basic experiences, and (iv) point to the role of emotion in the metaphorisation
of verbal behaviour as a socio-emotional domain.
Keywords: Chinese; verbal behaviour; metaphor; metonymy; embodi-
ment; emotion.
Cognitive Linguistics 19–2 (2008), 241–282
DOI 10.1515/COG.2008.010
0936–5907/08/0019–0241
© Walter de Gruyter
1. Introduction
Since Lakoff and Johnson published Metaphors We Live By (1980), many
cognitive linguistic studies have been conducted on conceptual metaphor
and metonymy as evidence of the embodiment of human cognition (e.g.,
Lakoff and Johnson 1999; Lakoff 1987; Johnson 1987; Kövecses 2005,
inter alios). This line of research has corrected the long-held misconception
of metaphor and metonymy as mere rhetorical devices. They are, as we
now know, the fundamental components of our cognitive behaviour as
well as an integral part of our socio-cultural practice (Kövecses 2005:
89).
Within the framework of Cognitive Linguistics, metaphor is understood
as the conceptualisation of an abstract or, to use Kövecses' word,
'intangible', domain in terms of a basic, usually physical and tangible,
domain (Lakoff and Johnson 1980; Kövecses 2005; Langacker 1987).
The former is known as the target domain and the latter the source.
Metaphor is not only functionally expressive and interactional, it is also
conceptually constitutive, as Kövecses (1999) has argued. For example,
according to Lakoff and Johnson, underlying the utterance 'I'm on top of
the world!' is the HAPPINESS IS UP metaphor. This metaphor not only
enables us to understand and express the emotion of happiness in terms of
the spatial relationship encoded in 'up'. More fundamentally, it enables us
to understand and to express how it feels to be happy at all.
Metonymy, on the other hand, is understood as the process whereby a
certain aspect of a given domain provides mental access to another aspect
of the same domain or, as Croft (2002) points out, a subdomain is
mapped into another subdomain within the same domain matrix. For
example, the question 'Have you read Goethe?' makes little sense unless the
name of the writer is taken to refer to, and, more importantly, to provide
conceptual access to, the literary works produced by the writer. Thus,
metonymy is functionally a conceptual access mechanism (Kövecses and
Radden 1998).
Given the distinct functions of metaphor and metonymy, it might ap-
pear that the two processes would be two distinct mental strategies in
their respective prototypical instantiations. In real linguistic
conceptualisations, however, metaphor and metonymy are hard to separate.
Goossens (2002) describes a number of different forms in which metaphor and
metonymy interact in British English expressions of verbal behaviour. He
coined the term 'metaphtonymy' to refer to the intertwinement of the
two processes. Barcelona (2000) argues that metaphor and metonymy
are inseparable not only at the level of combined uses, but, more funda-
mentally, at the conceptual level. He points out that metonymy enables
metaphorical mapping by recognising the abstract structural similarity
between the source domain and the target domain. Because of the intimate
relationship between the two processes, metaphor and metonymy are
increasingly being regarded as constituting a continuum rather than a bi-
nary distinction. Dirven (2002) contends that the metaphor-metonymy
continuum can be understood as a gradation between conceptual closeness
and conceptual distance, which explains the varying degrees of
figurativity as seen in metaphor and different types of metonymy.
Having outlined the basic tenets of Cognitive Linguistics regarding
conceptual metaphor and metonymy, a brief reference to Conceptual
Blending is in order because of its immediate relevance to the Theory of
Conceptual Metaphor in terms of Lakoff and Johnson. It should be noted
that Lakoff and Johnson (1980: 147–148) have explicitly argued that the
metaphorical mapping creates similarities between the source domain and
the target, similarities that do not exist independently of the metaphor.
The recognition of a creation of similarities alone, however, is
insufficient for the construction of a distinct novel meaning, at least with respect
to certain metaphors. Critically, the novelty of the constructed meaning
seems to resist explanation based on a two-domain mapping. This point
has been stressed by a number of cognitive linguists in view of the
strengths of the four-space model known as mental integration or blend-
ing in the sense of Turner and Fauconnier (1995) and Fauconnier and
Turner (2002). Grady et al. (1999), for instance, argue that blending de-
velops emergent content as a result of experiential incongruity. Such
incongruity gives rise to, and thus accounts for, connotations that are
otherwise not inferable from the input. A famous example is the SURGEON
AS BUTCHER metaphor. The sense of incompetence behind this metaphor,
Grady et al. (1999: 103–106) argue, results from the contradiction
between helping and healing as the surgeon's presumable goal, and
butchery as the means being named. Croft and Cruse (2004: 203–204),
Kövecses (2005: 268), and Evans and Green (2006: 403–404), among other
scholars, also acknowledge the relative mental complexity and conceptual
richness made explicit by the blending model. In the present paper, the
reader will also encounter particular cases that call for the notion of
blending as an adequate complement to the main model being adopted
here, namely the metaphor-metonymy-interaction model. Accordingly,
applicability of the blending model will be pointed out in the analysis of
such cases.
Cognitively oriented studies of figuration in the Chinese language have
made significant contributions to our awareness and appreciation of
culture-specific as well as universal patterns of conceptualisation. For
example, Kornacki (2001) shows that metaphors and metonymies are among
the driving mechanisms of Chinese concepts of anger. Ye (2001) presents
metaphors and metonymies in the conception of sadness in Chinese. Most
conspicuously, Yu's numerous analyses of metaphors and metonymies
demonstrate how different body-part terms are employed for the
conceptualisation of various abstract experiences including emotion, social
dignity, control, and thought (Yu 2000, 2001, 2002, 2003a, 2003b). The
present study aims to continue the cognitive linguistic effort to
explicate the conceptual mechanisms by which humans make sense of complex
and abstract experiences. Specifically, I focus on Chinese lexical
compounds and idiomatic expressions that metonymically and/or
metaphorically conceptualise verbal behaviour. In short, I study the
figuration in the Chinese language about language.¹
Verbal behaviour generally refers to the use of language for social pur-
poses. In this general sense, it includes instantaneous linguistic actions
as well as stable dispositions that characterise persons in relation to the
use of language. To the extent that the physical production of language
(speech) is based on species-specic physiology, verbal behaviour has its
universal biological basis and is subject to physiological constraints.
Consequently, body-parts that conspicuously participate in the articula-
tion of speech sounds constitute an important source from which the con-
ception of verbal behaviour derives by way of metonymy. In Chinese,
about 50 compounds and idioms describing verbal behaviour involve one
or more salient speech organs including the mouth (zui or kou), the tongue
(she), the lips (chun) and the teeth (chi or ya). To illustrate the role played
by body-part related metonymies in the conception of verbal behaviour,
consider the compounds (1a, b), the idiom (1c), and their uses in (2):²
(1) a. zui-ying (mouth-hard) 'verbally stubborn, unwilling to admit an
obvious mistake'
b. zui-tian (mouth-sweet) 'marked by a readiness to utter flattering
words'
c. duo-zui-duo-she (much-mouth-much-tongue) 'marked by the
annoying tendency to make unsolicited remarks or general verbal
indiscretion'
(2) a. women zuo-de bu hao, dei chengren, buyao zui-ying.
1PL do-RES not-good, must admit, not-want mouth-hard
'We are not doing well, we have to admit it, and shouldn't be
too stubborn to admit it.'
b. zhe ren suiran benshi bu da, danshi zui-tian.
this person though ability not big, but mouth-sweet
'Although this person doesn't have great abilities, he's good at
flattering.'
c. dajia dou bu yanyu, pian ni duo-zui-duo-she, you ni shenme shi a?
everyone all not speak, just 2SG much-mouth-much-tongue,
have 2SG what matter Q
'All were silent, only you couldn't spare your mouth and
tongue. It was none of your business!'
Here, by means of a metonymical mapping, the body-parts zui 'mouth'
and she 'tongue' define the conceptual space in which to understand the
meanings of the respective expressions as the space of verbal behaviour.
However, structurally simple as they are, the items in (1) cannot be ana-
lysed as simply metonymic. Rather, the juxtaposition of the body-parts
with the adjectives describing palpable properties indicates the interaction
between metonymy and metaphor. To be specific, ying 'hard', tian 'sweet'
and duo 'much' are metaphorical because literally they describe texture
and taste in the sensory domain and quantity in the physical domain,
respectively. Thus, the three expressions represent an embedding of
metaphor in metonymy in the conception of verbal stubbornness, verbal
docility, and verbal indiscretion, respectively. It is crucial to note that the
properties described by ying, tian and duo are in no way objective and
inherent to the entities being described. On the contrary, they express how
people feel about certain verbal behaviours, thus reflecting people's
interaction with their environment including both physical objects and
abstract phenomena. That is to say, they are 'interactional' in the sense of
Lakoff (1987: 51).
The idea of interactional properties of reality, however, has long been
well-known and widely acknowledged in cognitive psychology. Church
(1961: xii), for instance, points out that human knowledge has an
inevitable component of ambiguity, since we repeatedly discover that
properties found in reality are in fact reflections of ourselves – projections.
Hebb (1972: 234–245), drawing on the fact that the same sensory
stimulation can give rise to completely distinct perceptions, and different
stimulations can give rise to the same perception, argues for the necessity to
distinguish perception from sensation. Although Hebb does not explicitly
claim that the properties of reality we perceive are interactional in nature,
the experimental evidence of the complexity of perception he provides
suggests this idea. Section 2.1 will address the interactional properties
described by metaphors embedded in body-part metonymies in detail.
The metonymy involving the body-parts of speech articulation is not
uniquely Chinese. Expressions that operate by the same metonymic
principle abound in languages throughout the world. For example, English
speakers are familiar with mouthpiece, give mouth to one's feelings,
badmouth someone, give someone a mouthful, the gift of tongues, have a
sharp tongue, lip service, loose lips, etc. The Japanese use kuchi ga karui
(mouth-light) 'annoyingly talkative', warukuchi (bad-mouth) 'slander',
kuchisaki dake (mouth-first-merely) 'mere words, lip service'. The
Germans say böse Zungen (evil-tongue) 'verbally vicious people', jemandem
die Zunge lösen (somebody-tongue-release) 'cause somebody to talk',
mundfaul (mouth-lazy) 'unwilling to speak', in aller Munde (in-all-mouths)
'well-known', sich den Mund verbrennen (self-mouth-burn) 'do harm to
oneself by speaking mindlessly', to name just a few. In Goossens' (2002:
359) data on English expressions, 49 out of 109 items based on body-parts
contain a body-part that is instrumental to speech. The Duden Universal
Dictionary (1996) lists roughly 30 phrasal idioms and compounds
containing the mouth and about 20 items containing the tongue as referring
to verbal behaviour in German. The crosslinguistic observance of the
metonymic conception of verbal behaviour in terms of the relevant oral
structure suggests that the universal physiological reality of speech
articulation has a powerful and predictable effect on the conceptualisation of
verbal behaviour.
Universality or near universality is also readily observed in the large
repertoire of expressions motivated by conceptual metaphors in Chinese:
(a) VERBAL BEHAVIOUR IS PHYSICAL ACTION, (b) SPEECH IS CONTAINER, (c)
ARGUMENT IS WAR and WORDS ARE WEAPONS, and (d) WORDS ARE FOOD.
These conceptual metaphors are not arbitrary conventions of the lan-
guage, but are rooted in basic human experience relevant to existence
and survival. Because of the widely shared experiential, and mostly phys-
ical, basis, the meanings of the idioms are by and large recoverable on
account of the meanings of the components, though to a varying extent.
On the other hand, because the use of language for social purposes
is largely learned cultural behaviour, cultural conventions of perceiving
and discoursing on such behaviour are likely to give rise to variations in
the conceptualisation of verbal behaviour. Variation may occur either
in the source domain or in the target. That is to say, on one side, the
ideas associated with a source domain agreed upon by a community of
speakers to define a certain abstract experience may be culture-specific
(Kövecses 2005: 12). These culture-specific ideas are congruent with
what Bruner (1990: 40) calls 'folk psychology', which embodies the
interpretive principles elaborated by a culture. For example, sweetness in the
physical domain of taste is a crosslinguistically common source of
metaphor. Yet the affective connotation of the target onto which sweetness is
mapped may vary from language to language. While sweet taste, e.g., in
(1b) zui-tian 'sweet-mouthed, apt to flatter', is mapped onto a slightly
contemptible verbal tendency in Chinese, it is usually associated with
affection and related positive emotions in English, e.g., sweetheart,
sweetie.³ On the other side, the same target domain may be approached
through different sources across cultures. For example, the quality of
garrulity is approached via the physical domain of (the lack of) weight
in Japanese, e.g., kuchi ga karui (mouth-light). In Chinese, by contrast,
it is understood in terms of physical disintegration, e.g., zui-sui
(mouth-shattered). Thus, given the role of culture in the conceptualisation of
abstract matters such as verbal behaviour, it might not be an exaggera-
tion to state, as does Bruner, that culture is constitutive of mind.
The way the present paper is organized reflects the metonymy-metaphor
continuum with the conceptual metonymy ORGAN OF SPEECH
ARTICULATION STANDS FOR SPEECH on the one pole, and the four
conceptual metaphors on the other. Section 2 focuses on the metonymically
based expressions that fall into three major subtypes depending on the
specific event types encoded in the particular form of interaction between
the speech organ metonymy and particular metaphors. Section 3 deals
with facts and principles of the major conceptual metaphors of verbal
behaviour. The relationships between conventionality and semantic
predictability, between figurativity and emotionality, and between universality
and culture-specificity are addressed throughout the analysis. Section 4
addresses the role of emotion in the metaphorisation of verbal behaviour.
Quantitative data extracted from a questionnaire survey are employed
to show the existence of a negativity bias in the affective valence of the
figurative lexicon of verbal behaviour. Section 5 concludes the article by
stating the theoretical implications and setting forth possible tasks for fur-
ther research.
The data are extracted from four different and functionally
complementary sources: (a) Yao (2000), a standard Mandarin Chinese
dictionary, (b) Zhu (2002), a standard Mandarin Chinese dictionary of idioms
and prefabricated expressions, (c) the Chinese language internet search
engine www.baidu.com, and (d) the author's own native lexical
repertoire. The dictionaries are the principal sources with regard to the
semantics of the expressions being studied. The on-line data are useful insofar as
they provide clues into the lexical status of certain colloquial expressions
that are not included in the dictionaries. The author's native knowledge
of the lexicon has been helpful in determining the direction of search for
relevant expressions both in the dictionaries and in the on-line resource
specied above. The present dataset consists of a total of 122 items. De-
spite the attempt to include as many examples as possible, the dataset is
not intended to be exhaustive and remains to be supplemented by future
explorations.
The transcription adopted here for the examples is based on pinyin,
the standard romanisation system used in mainland China. Tonal
markers are omitted. Word-for-word literal glosses are provided in the
parentheses following each example, which precede a translation of
conceptual proximity. The Chinese originals of the data employed in the
study are provided in Appendix A at the end of this paper, numbered in
correspondence to the numbering of the examples in the text. Details of
the extraction of data for the analysis in section 4 are provided in the
beginning of that section. The original questionnaire is provided in Ap-
pendix B.
2. Conceptual metonymy of verbal behaviour
The systematic conceptual metonymy of verbal behaviour can be
schematised as ORGAN OF SPEECH ARTICULATION STANDS FOR SPEECH. For the sake
of simplicity, I call it the speech organ metonymy. This basic metonymy
has a variety of representations in Chinese depending on the scenes or
event types being encoded. It takes the interaction between the speech
organ metonymy and a supporting metaphor to describe a particular
type of verbal behaviour. Generally, expressions based on the interaction
between the speech organ metonymy and a supporting metaphor fall
into three subtypes: (I) PROPERTY OF SPEECH ORGAN STANDS FOR PROPERTY
OF VERBAL BEHAVIOUR, (II) ACTION AFFECTING SPEECH ORGAN STANDS FOR
VERBAL ACTION, and (III) EFFECT OF A SPEECH ORGAN STANDS FOR EFFECT
OF VERBAL BEHAVIOUR. In what follows, we shall consider the subtypes in
turn.
2.1. Property of speech organ as source
To consider speech organs, a look at the mouth is fundamental. So I shall
begin by looking at expressions with the mouth (zui or kou) as a metony-
mic vehicle providing mental access to speech. Consider the compounds
involving zui in (3):
(3) a. zui-ying (mouth-hard) 'verbally stubborn, unwilling to admit an
obvious mistake'
b. zui-tian (mouth-sweet) 'marked by a readiness to utter flattering
words'
c. zui-sui (mouth-shattered) 'annoyingly talkative, garrulous, apt
to nag'
d. zui-jin (mouth-tight) 'unlikely to spread news, able to keep
secrets'
Like the first two expressions, (3a) and (3b), which I already discussed in
the introductory section as (1a) and (1b), (3c) is not simply metonymic or
metaphorical, but reflects the intertwinement of both conceptual
processes. More concretely, a certain interactional property of a speech
organ metonymically refers to a property of verbal behaviour that the
organ helps to produce. The description of the interactional property is
metaphorical because sui 'shattered, broken in pieces' is taken from the
familiar domain of physical objects in which it depicts the shattered state
of breakable things. In the context set up by the speech organ metonymy,
the loss of physical integrity described by sui signifies the loss of form
and coherence, the resistance to collection, the incessantness and
repetitiveness that typify the verbal activity of nagging and the quality of
garrulity. Note, however, that sui, unlike English broken, which profiles a
damaged state or defect through breaking, does not profile the rather
abstract concept of damage or defect, but merely depicts the physical scene
that something is fragmented as a result of breaking. In light of the
language-specific semantic profiling that sets sui apart from broken, the idea
of damage and defect is absent in the imagery underlying zui-sui. This
absence inhibits the inference that, since the mouth enables one to
talk, someone with a 'broken' mouth will be unable to talk, an inference
that would have been justified if sui were a true semantic equivalent of
broken.⁴
The same principle of metonymy-metaphor interaction applies to (3d).
Here, the quality of tightness describes physical objects such as the lid of
a container that prevents leaking or, alternatively, a door or a gate that
can be shut tightly, securely. Metaphorically, it expresses the idea that a
person is trustworthy because he or she doesn't spread words. Thus, the
combination of the mouth metonymy and the metaphor constituted by
jin 'tight' suggests an image of the mouth as a tightly shut physical object
in the context of verbal behaviour. This combination might be thought of
as a conceptual integration or blend. That is, the verbal function of the
mouth as selectively derived from input 1, zui 'mouth', and the idea of
prohibited leaking related to an object being tightly shut as selectively
derived from input 2, jin 'tight', are projected into a blended space to
yield the distinct meaning of 'discreet in verbal behaviour and thus
trustworthy'.⁵
In view of these examples, it is crucial to point out that the physical
domains of texture, taste, configuration, etc. are systems entirely
independent of the domain constituted by the concept of a speech organ. Thus, it
appears plausible that certain conventional knowledge (Kövecses 2002)
is required to make sense of the idiosyncratic collocation, say, of the
concept of sweet taste and the system of verbal behaviour constituted by the
mouth. It is likely that this conventional knowledge consists of both
knowledge of the physical world and cultural knowledge.
The functional salience of the mouth as a speech organ is also evident
in expressions where the mouth is paired with another body-part. In what
follows, I explore double metonymies in which the property of the mouth
is contrasted with that of the heart (or the bowel) in the conceptualisation
of a particular language-mind relationship. The body-parts xin 'heart'
Chinese metonymies and metaphors 249
and fu bowel refer metonymically to the mind or intention, as the heart
and the bowel are considered the loci of feelings and thoughts in Chinese
folk psychology. Let us look at the idioms in (4):
(4) a. ku-kou-po-xin (bitter-mouth-grandma-heart) 'lovingly intended
advice put in unpleasant words'
b. fo-kou-sheng-xin (Buddha-mouth-saint-heart) 'compassionate in
words and intentions'
c. fo-kou-she-xin (Buddha-mouth-snake-heart) 'nice talk, vicious
intention'
d. daozi-zui, doufu-xin (knife-mouth, tofu-heart) 'marked by a
tendency to utter biting words in spite of a sympathetic disposition'
e. kou-shi-xin-fei (mouth-true-heart-false) 'verbally agree, but
think the opposite'
f. xin-zhi-kou-kuai (heart-straight-mouth-quick) 'frank and
straightforward'
g. zui-tian-xin-ku (mouth-sweet-heart-bitter) 'say good words while
holding malignant intentions'
h. kou-mi-fu-jian (mouth-honey-bowel-sword) 'say good words
while holding malignant intentions'
Expressions (4a), (4b) and (4c) in this category share the form of a
juxtaposition of two NPs in which the mouth (N₁) and the heart (N₂) are
each preceded by a modifier (X), thus X₁N₁X₂N₂. In (4a), 'bitter mouth' is
contrasted with 'grandma heart' to refer metonymically to the contrast
between unpleasant speech and loving intention. The modifier 'bitter'
is used metaphorically because it prototypically describes taste in the
sensual domain. On the other hand, 'grandma' is used metonymically in that
a kinship term whose referent is typically associated with loving
intentions refers to loving intentions. The modifiers fo 'Buddha',
sheng 'saint', and she 'snake' in (4b) and (4c) are all metonymies in the
sense that their referents are the respective prototypes of mercy, virtue,
and malignancy. As such they provide mental access to the respective
abstract qualities. In (4d), a sharp utensil, daozi 'knife', stands
metonymically for sharpness and a culture-specific food of a soft texture,
doufu 'tofu', refers to softness. This softness, in turn, is a metaphor for
sympathy, because touch in the sensual domain is mapped into the domain of
social emotion.
Expressions (4e), (4f), (4g) and (4h) take the structure N₁P₁N₂P₂,
where the two body-part nouns are each followed by a predicate (P) to
describe the perceived relationship between verbal behaviour and thought.
(4e) can be seen as a pair of straightforward metonymies where the
truthfulness of the mouth refers to the truthfulness of verbal behaviour
and the falseness of the heart stands for the falseness of intention. By
contrast, both (4f) and (4g) contain a metaphorical mapping that is
embedded in the metonymy based on the properties of a speech organ. The
adjectival predications zhi 'straight' and kuai 'quick' in (4f) originate in
the respective domains of geometric properties and speed; tian 'sweet' and
ku 'bitter' in (4g) belong to the sensual domain of taste. They are mapped
into the domain of verbal behaviour to refer to frankness in (4f), and to
deceptive verbal friendliness and malignant intention in (4g). In (4h), a
prototype of sweet and enjoyable food, mi 'honey', is reduced to
enjoyability; a cultural prototype of aggressive weapon, jian 'sword', is
reduced to aggressiveness by way of metonymy. This metonymical mapping is
embedded in the main metonymy involving a property of a speech organ as the
source. That is, the honey-like quality associated with the speech organ
stands for the enjoyability of speech produced by that organ. The
sword-like quality associated with the bowel stands for the aggressiveness
harboured in the bowel as the locus of emotion. On the other hand, the
transfer of enjoyability and aggressiveness available in the physical
domain of food and weapon into the behavioural domain of language and
intention is distinctly metaphorical.
Further examples of the interactional properties of the speech organ as
source are found in the following expressions involving the mouth, the
tongue, the lips and the teeth or various combinations of these oral
structures. Consider (5):
(5) a. duo-zui-duo-she (much-mouth-much-tongue) 'marked by the
annoying tendency to make unsolicited remarks or general verbal
indiscretion'
b. you-zui-hua-she (oil-mouth-lubricant-tongue) 'speak in an
insincere manner'
c. she-jian-kou-kuai (tongue-sharp-mouth-quick) 'verbally
aggressive'
d. chi-kou-bai-she (red-mouth-white-tongue) 'talking groundlessly,
irresponsibly'
e. chang-she-fu (long-tongue-woman) 'gossipy woman'
f. du-she (poison-tongue) 'ability to make hurtful remarks'
g. san-cun-bu-lan-she, liang-hang-ling-li-chi
(three-inch-not-rotten-tongue, two-row-dexterous-teeth) 'eloquence'
h. ling-ya-li-chi (nimble-incisor-dexterous-molar) 'marked by
verbal skill'
i. gou-zui-tu-bu-chu-xiang-ya
(dog-mouth-spit-not-out-elephant-teeth) 'A verbally mean person is
unlikely to utter good remarks.'
In (5a), as has been mentioned with regard to (1c), the interactional
quantitative property of the speech organ, duo 'much', refers to the
excessiveness and indiscretion of verbal behaviour. In (5b), the sense of
insincerity is recoverable from the idiosyncratic metaphorical use of
you 'oil' and hua 'lubricant'. These two materials are apt to make things
smooth and slippery at once. When it comes to persons, one who is slippery
is insincere, tricky and undependable. The association of slipperiness with
trickiness is not unfamiliar to speakers of English. In (5c), sharpness as
an interactional property of the tongue and the mouth stands for
aggressiveness of verbal behaviour produced with the help of these speech
organs. Here again, the use of jian 'sharp, pointed' and kuai 'quick' is
metaphorical because the proper understanding of these words depends on a
cross-domain transfer from the physical to the verbal.
Compared to (5a), (5b) and (5c), in (5d), the metaphorical meanings
expressed by the interactional properties associated with chi 'red' and
bai 'white' are less predictable. Two metaphors are indicated here. On the
one hand, the colour red describes bareness and emptiness. This metaphor
relies on an image: red is the usual colour of a newborn baby who comes
into the world naked. Thus, chi 'red' acquires a polysemous extension,
namely 'naked', by way of the birth scene. This extension is metonymic in
nature. As the scene-specific metonymic link between the two senses is
removed from the original physical context of birth, chi becomes a general
representation of abstract bareness and emptiness. This is a metaphoric
process. On the other hand, the white colour, being perceived as the most
colourless colour, is taken as the source for the understanding of
plainness, blankness, emptiness and similar abstract qualities. Together,
in the verbal context defined by the mouth and tongue metonymy, red and
white characterise the verbal behaviour of talking irresponsibly, or
accusing someone groundlessly. This example is a perfect illustration of
the deep experiential grounding of seemingly unmotivated expressions.
While the first four examples in (5) represent the metonymy MOUTH
AND TONGUE AS SPEECH, (5e) and (5f) feature the TONGUE AS SPEECH
metonymy. (5e) is a pejorative name given to a gossipy female. The improper
size of the tongue as an interactional property stands for the excessiveness
of speech that the tongue helps to produce. In (5f), the venomous
property of the tongue gives access to the malignant nature of speech meant
to hurt. In the sense that the hazard of poison is partially mapped into the
verbal domain to describe the power of words to hurt, we are dealing
with a metaphor embedded in the metonymy PROPERTY OF SPEECH ORGAN
FOR PROPERTY OF VERBAL BEHAVIOUR.
A frequently used idiom in the oral tradition of urban narratives, (5g)
describes eloquence or verbal persuasiveness in terms of the skilfulness
of the tongue and the dexterity of the teeth by metonymy. Likewise, the
dexterity of the teeth stands for good verbal skill in (5h). As usual,
the properties of nimbleness and dexterity assigned to the tongue and the
teeth are subjective and interactional.
(5i) is conceptually more complex in that it contains two metonymies
based on interactional properties of a speech organ as source. Here,
gou-zui 'dog-mouth' is contrasted with xiang-ya 'elephant teeth', whereby
zui 'mouth' and ya 'teeth' both refer to verbal behaviour by metonymy. The
respective interactional properties signalled by the modifiers gou 'dog'
and xiang 'elephant' are also metonymically derived, as the two animals
are taken as the respective prototypes of meanness and dignity. However,
in the sense that a scenario in the animal domain is used for the
understanding of a phenomenon in human verbal behaviour, it is certain that
metaphor, too, is at work here.
Closely related to the expressions based on the speech organ metonymy
are items containing the words sheng 'voice', qi 'air', qiang 'accent',
and diao 'tone' or their combinations. Although these words are not
speech organ terms from the perspective of physiology, their referents
are important components of speech articulation from the perspective of
phonetics. Examples in (6), below, exhibit a similar conceptual process,
namely that the interactional property of the physical manner of speech
articulation refers to the property of verbal behaviour.
(6) a. di-sheng-xia-qi (low-voice-down-air) 'humble-toned in deference
to a superior'
b. yin-yang-guai-qi (yin-yang-anomalous-air) 'verbally elusive,
ambiguous and marked by a dubious intention'
c. kou-qi-da (mouth-air-big) 'boastful'
d. li-zhi-qi-zhuang (reason-straight-air-strong) 'talk assertively on
the ground of a strong argument'
e. you-qiang-hua-diao (oil-accent-lubricant-tone) 'speak in an
insincere manner'
Clearly, as (6a) shows, the humbleness of tone that characterises the
submissive verbal behaviour of an inferior is made accessible by the
metonymical depiction of the properties of low voice and weak air as
components of the articulation of speech. Embedded in this metonymy is a
spatial metaphor of social status. Specifically, the notions of di 'low' and
xia 'down' are used metaphorically because their understanding as
signalling inferiority in the present context requires a two-domain mapping
from spatial perception to social relationship.
Expression (6b) provides a description of a culture-specific experience
of sarcastic verbal behaviour. The interactional properties described by
yin and yang, the two primordial principles in opposition that govern all
things, are obviously unique to the Taoist cosmology. The ambiguity in
the waxing and waning of yin and yang in the air involved in speech
articulation signifies the anomalously mixed tone that marks elusive and
ambiguous verbal behaviour. Here, the culture-specificity of the metaphor
does not seem to forbid a straightforward interpretation of the idiom on
account of the probable semantics of the components, though basic
cultural knowledge is necessary. (6c), too, involves the air in the mouth
as a physical component of speech articulation. In this case, the great
physical force as an interactional property associated with the release of
air in speech articulation stands for verbal exaggeration. Meanwhile,
verbal assertiveness is conceived of as strong air in (6d). (6e) exhibits
the same imagery of slipperiness as (5b) except that the mouth and the
tongue are here replaced by the accent and the tone.
By now, the discerning reader may have noticed that the expressions
based on the PROPERTY OF SPEECH ORGAN STANDS FOR PROPERTY OF VERBAL
BEHAVIOUR metonymy exhibit a remarkable syntactic regularity. That is,
they instantiate two major form-meaning pairs, or constructions. One is
the nominal construction X-N, containing a modifier (X) and a modified
entity (N), e.g., you-qiang-hua-diao (oil-accent-lubricant-tone); the other
is the subject-predicate construction N-P, containing a subject noun (N)
and a predicate (P), e.g., zui-ying (mouth-hard) and kou-mi-fu-jian
(mouth-honey-bowel-sword). The speech organ as the metonymic vehicle
is the N in both constructions. The modifier (X) and the predicate (P) are
semantically metaphorical in some cases, specifying the interactional
properties being communicated. They are metonymic in other cases,
providing a prototype that typifies the interactional properties to be
conveyed. This compositionality-based constructional regularity corresponds
to the semantic recoverability of the expressions, although the individual
lexical collocations are largely conventional. Thus, clearly, as Nunberg
et al. (1994) argue, conventionality should not be confused with
non-compositionality.
2.2. Action upon speech organ as source of metonymy
Expressions in this subcategory construe verbal behaviour in terms of a
transitive event in which the speech organ is the object being affected by
the transitive action. Thus, we have the metonymy ACTION AFFECTING
SPEECH ORGAN STANDS FOR VERBAL ACTION. Inherent in this metonymy is
the metaphorical mapping from a physical action into a verbal action
serving particular social purposes. There are two typical forms. The simple
VO is usually used in everyday contexts and the V₁O₁V₂O₂ construction is
used more in literary contexts. Let us first consider the expressions in (7):
(7) a. ding-zui (upward push-mouth) 'retort to the explicit criticism or
charge made by a superior'
b. dou-zui (fight-mouth) 'argue, quarrel'
c. du-zui (stuff-mouth) 'prevent someone from speaking up by bribing
them'
d. cha-zui (stick (vt.)-mouth) 'chip in, interrupt'
In these expressions, the word zui 'mouth' acquires its meaningfulness
only through a metonymical highlighting of the mouth as a speech organ.
The verbal actions of arguing with a superior, quarrelling, and silencing
someone by bribing are invariably accessed via physical actions upon the
mouth as speech organ. The transitive action of ding 'upward pushing' in
(7a) contains the implicit spatial element 'upward' which metaphorically
alludes to a social hierarchy in which a superior is considered 'up'. This
metaphor enables a further metaphor, namely that an upward bodily
movement is social defiance which, by way of the mouth metonymy, is
understood as defiance in verbal communication. Similarly, dou 'fight' in
(7b) constitutes the metaphor ARGUMENT IS WAR, whereby the physical
action of fighting stands for verbal fight.
In (7c), the strategic practice of bribing someone is understood as the
physical action of stuffing someone's mouth, that is, feeding someone
with a bribe. Furthermore, the stuffing of a person's mouth in the literal
sense metonymically entails the immediate result of the action of stuffing:
the person with a stuffed mouth is physically prohibited from speaking.
This metonymy, however, acquires its relevancy only in the verbal
context set up by the speech organ metonymy. The complex construction
of meaning in this case, it seems, involves the creation of a novel space,
e.g., the inference of manipulating someone's verbal behaviour by giving
them a bribe, which is not available in du and zui taken separately as the
input. In this sense, it may be suitable to invoke the notion of conceptual
blending as an alternative to metaphor-metonymy interaction. In (7d), the
verbal behaviour of interrupting someone's speech by taking an unjustified
turn of speech is referred to as the physical action of sticking one's
mouth, as if it were a manually manipulable object, in between someone's
talk.
Clearly, as these examples show, the mouth as a speech organ invariably
acts as the metonymic vehicle via which Chinese speakers understand
verbal behaviour. In addition, verbal actions are described as if they were
physical actions. Thus, the metonymy ACTION AFFECTING SPEECH ORGAN
FOR VERBAL ACTION is not purely metonymic, but relies on a metaphorical
mapping also. The idiom in (8) works by the same principle:
(8) ma-bu-huan-kou, da-bu-huan-shou (scolded-not-return-mouth,
beaten-not-return-hand) 'entirely obedient, non-resistant'
Here, the first half of the idiom describes one's failure to verbally defend
oneself in terms of returning the mouth as a speech organ. The second
half describes one's failure to physically defend oneself against physical
assault in terms of returning the hand as the most useful bodily
instrument for action. No doubt the mouth and the hand are taken
metonymically to refer to speech and action, respectively. On the other
hand, the choice of the verb huan 'return' suggests the metaphor SOCIAL
INTERACTION IS EXCHANGE OF OBJECTS. Thus, the verb-object collocations
represent a cardinal metaphor-metonymy interaction. The metaphorical
understanding of speaking as performing physical actions is also
illustrated by the items in (9) below. These expressions, again, invariably
contain at least one speech organ that serves as the metonymic vehicle. The
verbs preceding the nouns of speech organ, however, are used differently,
varying between the literal, as in (9a), the metonymical, as in (9b), and
the metaphorical sense, as in (9f).
(9) a. xue-she (imitate-tongue) 'imitate, repeat what others say'
b. nan-yi-qi-chi (difficult-to-open-teeth) 'having difficulty in talking
about something'
c. jiao-shetou (chew-tongue) 'gossip'
d. zhang-kou-jie-she (stretch-mouth-knot-tongue) 'speechless as in
shock'
e. yao-chun-gu-she (sway-lip-puff-tongue) 'attempt to verbally
instigate'
f. gao-chun-shi-she (balm-lip-wipe-tongue) 'attempt to persuade by
nice talk'
The action of repeating what others say is understood as the action of
imitating other people's tongue, as in (9a). Opening one's teeth refers to
the start of a speech, as in (9b). While these two items are essentially
metonymic, the items in (9c–f) seem to lean towards the metaphoric pole
in that they rely more heavily on a two-domain mapping. Consequently,
the levels of figurativity and affectivity are significantly higher in these
items. The verbal activity of gossiping is made accessible by the
depiction of the physical scene of tongue chewing in (9c). The link is
enabled by the perception that gossiping and chewing share certain oral
movements involving the activities of the teeth and the tongue. Presumably,
the scene of chewing the tongue is impressionistic and interactional rather
than objective, reflecting a negative attitude towards gossip. The negative
sense probably arises from the incongruity characterising the image of
a gossiping mouth. That is, the teeth are busy doing the wrong thing:
what gets chewed is the tongue instead of food, which would have been
sensible.
Fictive dynamic actions can be observed in the conception of verbal
actions in the three idioms (9d–f) involving the V₁O₁V₂O₂ construction.
Specifically, the imaginary knotting of the tongue indicates the inability
to speak in (9d), the elaborate performance of swaying the lips and
puffing the tongue metaphorically and hyperbolically describes the verbal
effort to instigate in (9e), and the physical actions of beautifying the
lips and the tongue as speech organs signify the verbal attempt to beautify
one's speech for the purpose of persuasion in (9f). Thus, what we
encounter in the conception of verbal behaviour is an insistent imagination
of dramatic physical scenes involving speech organs. This dramaticity gives
rise to a sense of irony with regard to the verbal behaviours being
described.
The expressions in (10), below, show that verbal effort can be
conceived of in monetary terms.
(10) a. fei kou-she (spend-mouth-tongue) 'talk in vain'
b. fei zuipizi (spend-lip/mouth skin) 'talk in vain'
c. fei-she-lao-chun (spend-tongue-labour-lip) 'make a verbal effort
to convince'
Here, the effort to talk to convince someone is understood in terms of
spending one's organs of speech articulation. Thus the metaphor VERBAL
EFFORT IS MONEY, or simply SPEAKING IS SPENDING, is embedded in the
metonymy ACTION AFFECTING SPEECH ORGAN FOR VERBAL ACTION.
The expressions discussed in this section (with (9a) as an exception,
where the verb xue 'imitate, learn' is more literal than metaphorical)
exhibit an embedment of a metaphor in the metonymy ACTION AFFECTING
SPEECH ORGAN FOR VERBAL ACTION. The metaphor can be based on a
physical action that involves the dynamic transaction of bodily energy.
Alternatively, the metaphor may be based on the familiar experience of
financial transactions. In any case, however, the source action does not
occur in reality when a person engages in the target verbal action. It is
merely an impression, or imagination, based on our subjective experience of
the target action in the verbal domain. In this sense, it is important to
recognise the interactional properties of the action being depicted. Here
again, a syntactic regularity (V-O) accompanies the semantic consistency of
the expressions. This syntactic regularity reflects the cognitive
grammatical principle that basic constructions encode humanly relevant
events (Lakoff 1987; Goldberg 1995).
2.3. Effect of speech organ as source of metonymy
In this subcategory, we encounter idioms that conceptualise the effect
of verbal behaviour in terms of what the organs of speech articulation
accomplish in a physical sense. Thus we have the metonymy EFFECT OF
SPEECH ORGAN FOR EFFECT OF VERBAL BEHAVIOUR. Inevitably, the
metonymy contains a metaphorical mapping from the physical domain into the
socio-verbal domain. Let us consider the items in (11):
(11) a. zhong-kou-shuo-jin (multitude-mouth-melt-gold) 'collective
criticism is destructive'
b. chi-she-shao-cheng (red-tongue-burn-town) 'blatant words are
destructive'
c. yi-kou-yao-ding (one-mouth-bite-firm) 'make a vociferous
accusation'
d. xue-kou-pen-ren (blood-mouth-spray-person) 'ruthlessly attack
someone with false accusations'
e. she-zhan-qun-ru (tongue-combat-multitude-scholar) 'verbally
combat many scholars at once'
f. chi-ya-wei-huo (molar-incisor-do-harm) 'words cause harm'
g. kou-kou-sheng-sheng (mouth-mouth-voice-voice) 'repeatedly
declare'
In (11a), 'mouths of the multitude' metonymically refers to collective
criticism and 'melt gold' metaphorically expresses the idea of a highly
destructive power. A similar idea is expressed in (11b) where the
destructive potentials of the red tongue metonymically stand for the
destructive potentials of blatant words. The sense of blatancy is derived
from chi 'red, naked', as has been explained with regard to (5d) in
section 2.1. The physical scene of a town being burned down, however, is a
metaphorical description of abstract destruction.
(11c) and (11d) are expressions of the highly specific verbal practice of
making a false accusation. By means of the metaphorical use of yao-ding
'bite-firm', (11c) emphasizes the violent and determined manner in which
the accuser makes the charge. By employing the metaphorical senses of
xue 'bloody' and pen 'spray', (11d) emphasizes the aggressiveness and
ruthlessness of the accuser. In both cases the metaphorical imageries are
framed in the metonymy that understands the effect of speech organ as
the effect of verbal behaviour. (11e) is the war metaphor of argument
embedded in the metonymy based on the performance or effect of a
speech organ. In (11f), the negative consequence of speaking is referred
to as the potential harm done by the teeth as body-parts of speech
articulation. (11g) consists of the mouth word, kou, and the voice word,
sheng. Both are metonymic, referring to speech. The reduplication of the
two words for the purpose of expressing repetition seems to work
metaphorically in the sense of MORE OF FORM IS MORE OF CONTENT (Lakoff and
Johnson 1980: 127).
In this section, I have shown that the highly schematic conceptual
metonymy ORGAN OF SPEECH ARTICULATION FOR SPEECH is a central
mechanism underlying the Chinese conceptualisation of verbal behaviour.
Referentiality and figurativity (Dirven 2002: 102–105) are evident in the
speech organ metonymy: the speech organ being named cannot be taken
literally. Rather, it is invariably figurative and provides mental access to
speech-related behaviour. This metonymy is realized in three specific
subtypes each of which emphasizes a particular aspect of verbal behaviour.
The particularities of the subtypes derive from the particular metaphors
embedded in the basic speech organ metonymy.
It might be worthwhile to observe that this central metonymy interacts
with the embedded metaphors in nontrivial fashions. More specifically,
the successful metaphorical mapping into the abstract domain of verbal
behaviour as target presupposes a metonymical mapping from speech
organ onto speech physically produced by the speech organ within the
domain matrix of language. Put otherwise, the speech organ metonymy,
being referential in nature, puts a relevancy constraint on the
metaphorical mapping such that the target is restricted to the domain of
verbal behaviour. On the other hand, the metaphorical mapping, by virtue of
its interactional and expressive force, is responsible for the conceptual
richness that arises from the imagic particulars of the source concepts.
In summary, our analysis of the relevant expressions amounts to two
generalisations. First, the conceptualisation of verbal behaviour is
embodied. Second, the meanings of the idiosyncratic expressions are not
entirely opaque, but recoverable and even predictable from the metonymic
and metaphoric senses conveyed by the components. The predictability,
however, is not an all-or-nothing matter but relative and gradient, e.g.,
(4h) kou-mi-fu-jian (mouth-honey-bowel-sword), (5f) du-she
(poison-tongue) and (9a) xue-she (imitate-tongue) may be more readily
recovered than (6b) yin-yang-guai-qi (yin-yang-anomalous-air), (7a) ding-zui
(upward push-mouth) and (9f) gao-chun-shi-she (balm-lip-wipe-tongue).
3. Major conceptual metaphors of verbal behaviour
As well as the conceptual metonymy based on organs of speech
articulation, Chinese is rich in conceptual metaphors of verbal behaviour.
Depending on the concepts or schemata that constitute the source domain,
four conceptual metaphors of verbal behaviour are observed. They are:
(i) VERBAL BEHAVIOUR IS PHYSICAL ACTION, (ii) SPEECH IS CONTAINER,
(iii) ARGUMENT IS WAR, and (iv) WORDS ARE FOOD. Because these metaphors
arise from basic human experience, they are likely to be universal.
However, as we shall see in the forthcoming paragraphs, these conceptual
metaphors may at times allow culture-specific image-schemata associated
with special affective connotations. In what follows, I shall discuss the
four conceptual metaphors underlying a large number of Chinese
expressions of verbal behaviour.
3.1. VERBAL BEHAVIOUR IS PHYSICAL ACTION
The conceptual process underlying this metaphor pertains to the mapping
from physical actions onto social actions of using words in interpersonal
interaction. Importantly, this metaphor is closely related to the MEANINGS
(or WORDS) ARE OBJECTS metaphor as part of what Reddy (1979: 290) refers
to as the conduit metaphor. The expressions in (12) illustrate this process:
(12) a. gua-zai-zui-bian (hang-on-mouth-side) 'habitually or
repeatedly say something'
b. bu-zu-gua-chi (not-enough-hang-tooth) 'not worth mentioning'
c. zhi-di-you-sheng (throw-ground-have-sound) 'making remarks
that produce resonance'
d. yi-tu-wei-kuai (one-spit-make-pleasure) 'speak up uninhibitedly
to feel good'
e. zhan-ding-jie-tie (chop-nail-cut-iron) 'speak resolutely'
f. zi-zhen-ju-zhuo (word-pour with measure-sentence-pour with
measure)
In (12a), the verbal behaviour of habitually or repeatedly saying
something is conceived of as a physical event of hanging a
three-dimensional object on the side of one's mouth. In this physical
domain of a transitive action, the action of hanging has the effect that the
object being hung becomes attached to the mouth and remains in that state
of attachment. It is this physical attachment as part of the imagery in the
source domain that gets mapped into the target domain of verbal behaviour
to describe the incessantness with which a certain unit of language is
uttered. A similar process is at play in (12b) where the teeth instead of
the mouth participate in the physical scene. In (12c) a resonant verbal
expression is described as a physical object of, presumably, substantial
volume, weight and a particular texture such that it is able to produce a
loud sound when thrown on the ground. Clearly, this mapping entails that
words are conceived of as having an existence independent of people and
context (Lakoff and Johnson 1980: 11) and are capable of generating
physical effects. The need to talk about something is conceived of as a
bodily urgency to spit something out in (12d): something that one feels a
great desire to communicate is described as a disturbing object in the
mouth. One must spit it out for the sake of one's well-being. (12e)
describes the resolute manner in which something is said. Determination and
resolution are conceived of as the physical potency to cut and break
something as hard and unbreakable as nails and iron. (12f) describes the
discreet manner of a circumspect speaker as a careful physical action of
pouring out tea or wine with exact measure, whereby words and utterances
are imaged as the liquids being measured and dispensed from a container.
The more schematic metaphor VERBAL BEHAVIOUR IS PHYSICAL ACTION
INVOLVING A PHYSICAL OBJECT has a specific instantiation in the metaphor
VERBAL BEHAVIOUR IS MANIPULATION OF A MUSICAL INSTRUMENT, illustrated
by the idioms in (13):
(13) a. dui-niu-tan-qin (towards-bovine-play-musical instrument) 'say
things that are beyond the hearer's ability to understand' (cf.
'pearls before swine')
b. da-bian-gu (beat-side-drum) 'to help someone inconspicuously
by saying things in the background'
c. zi-chui-zi-lei (self-blow trumpet-self-beat drum) 'blow your own
trumpet, boast about oneself'
d. lao-diao-chong-tan (old-tone-renewed-play) 'to talk about
something that is already old hat'
It is noteworthy that, in (13a), the lack of intelligence is metonymically
inferable from niu 'bovine animal', a typically dull creature. By contrast,
a string instrument metonymically gives rise to the inference of
sophistication and good taste. Thus, even in this apparently metaphorical
expression in which talking is understood as playing a string instrument,
metonymy is at work in tandem with metaphor.
Apart from the idioms I have discussed, there are a number of lexical
compounds in the colloquial language that encode verbal behaviours
metaphorically by describing a transitive physical action involving a
physical object. Because of lexical entrenchment due to frequent use, it is
possible that the physical scenes behind these prefabricated compounds have
become washed out, if not entirely obscure, to native speakers of Chinese.
Consider (14):
(14) a. da-cha (beat-road branch) 'chip in, interrupt'
b. che-pi (pull-skin) 'chat, talk rubbish'
c. wa-ku (dig-bitterness) 'speak sarcastically'
d. chui-niu (blow-bull) 'boast'
e. pai-ma-pi (pat-horse-ass) 'toady'
f. po-leng-shui (slosh-cold-water) 'verbally discourage'
g. huo-xi-ni (mix-thin-mud) 'say neutral things to dilute the
intensity of a conflict between other people'
h. fa-lao-sao (let out-prison-urine stench) 'complain, grumble'
i. kai-men-jian-shan (open-door-see-mountain) 'speak directly'
j. dakai-tian-chuang-shuo-liang-hua
(open-sky-window-speak-light-words) 'talk openly'
k. xin-kou-kai-he (let-mouth-open-river) 'speak heedlessly'
Syntactically, all of these compounds can be schematised as the transitive
construction, i.e., the VO construction. However, some of them describe
semantically intransitive events, for example (14b), (14c), (14d),
(14h), (14i) and (14j). This apparent mismatch between the syntactic
form and the meaning may at first glance suggest that the conventionalised
metaphors resist an interpretation on account of the respective meanings
of the verb and its argument, much in the same way as 'kick the
bucket', which must be treated as an unanalysed whole. However, a closer
examination of these items will correct this initial impression. The
verb-noun collocations are analysable and can be shown to have cognitive
motivations despite their high idiosyncrasy. In each case, the semantic
motivation resides in the dynamic scene conjured by the transitive
construction. Some scenes are more dramatic and outrageous than others.
In (14a), beating a road branch is a figuration of interruption exactly
because of the semantic contribution made by 'road branch' in its
figurative sense of deviating from the main road to a side road. In (14h),
expressing grief and discontent is understood via the image that someone
opens a window in the prison to let out the chronic, accumulated odour of
urine. While this imagery might be the height of idiosyncrasy, the
constitutive elements of grief, namely prolonged confinement and intense
poignancy, are made palpable via lao 'prison' and sao 'urine stench'. These
two things may be considered the respective archetypes of confinement and
poignancy. The same semantic recoverability based on constituent input
characterises all the other items in this list, though the degree of
semantic transparency varies from expression to expression.
Compared to the expressions in (14), the compounds in (15) are seman-
tically more transparent because the objects of the transitive actions ex-
plicitly name concepts in the domain of verbal behaviour instead of
three-dimensional things. Nevertheless, the transitive verbs per se describe
physical actions such that the verbal behaviours in question are accessed
via a metaphorical mapping.
(15) a. sa-huang (throw-lie) 'tell a lie'
b. che-huang (pull-lie) 'tell a lie'
c. zao-yao (manufacture-rumour) 'start a rumour'
Another type of the conceptual metaphor VERBAL BEHAVIOUR IS PHYSICAL
ACTION involves a transitive action the object of which is the heart.
Consider (16):
(16) a. tao-xin (pull out-heart) 'candidly communicate'
b. jiao-xin (exchange-heart) 'communicate'
c. tui-xin-zhi-fu (push-heart-put in-bowel) 'communicate with
mutual trust'
These expressions are based on the interaction between the physical
action metaphor and a metonymy that employs the heart as a body-
part to refer to thoughts. Thus, the act of communication is conceived of
as physical actions whereby the heart is treated as a manipulable object
that can be displaced or transferred between persons.
A further subtype of the physical action metaphor describes a verbal
act that is conducted upon an implicit human object. The figurativity
arises from the fact that the verb used to describe such an act belongs to
the physical domain, where it affects a physical object. Yet regular
speakers of Chinese may not be aware of the underlying metaphorical
mapping and simply learn these expressions as part of the lexicon.
Consider (17):
(17) a. ding-zhuang (push upward-hit) 'verbally insult (a superior)'
b. pang-qiao-ce-ji (side-knock-side-strike) 'indirectly suggest'
c. hu-you (sway-swing) 'flatter, toady to'
d. chui-peng (blow-lift) 'lavishly praise'
e. kai-dao (open-guide) 'instruct, help to understand'
f. wai-qu (crooked-bend) 'distort'
g. da-duan (beat-broken) 'interrupt'
h. jie-lu (pull-bare) 'reveal, uncover'
The physical and mostly kinetic sense of the actions is largely latent and
may not be accessed by the average language user. Nevertheless, the
lexical semantics of these items is not arbitrary but motivated, and
cannot be explained without reference to the conceptual process of a
metaphorical mapping from physical actions to verbal behaviours. For
example, in (17a), verbally insulting someone is understood in terms of
the physical action of pushing upward and hitting someone. Here again, the
spatial element implicit in ding 'push upward' signifies a relationship in
a social hierarchy. In (17b), talking is described as knocking and
striking, and the spatial terms pang 'side' and ce 'side' indicate the
abstract indirectness in verbal interaction. Specifically, SPATIAL
PERIPHERY IS VERBAL INDIRECTNESS. The rest of (17) works by the same
mechanism of mapping a bodily action onto a verbal action.
3.2. SPEECH IS CONTAINER
According to Reddy's (1979: 290) observation, the majority of English
expressions about language instantiate the conduit metaphor of which
the container metaphor is a significant part. The same conceptual
metaphor is operative in the Chinese conceptualisation of linguistic
expressions and verbal behaviour. The expressions in (18) illustrate this:
(18) a. yan-wai-zhi-yi (word-outside-of-meaning) 'unsaid but inferable
message'
b. hua-li-you-hua (speech-inside-have-speech) 'There's a (hidden)
message in the utterance.'
c. hua-li-hua-wai (speech-inside-speech-outside) 'direct and
indirect message'
d. dakai-hua-xiazi (open-word-box) 'start to talk excessively'
e. shi-hua (filled-word) 'honest words'
f. kong-hua (empty-word) 'empty or pretentious words'
g. xian-wai-zhi-yin (string-outside-of-sound) 'unsaid or hidden
message'
The spatial concepts of li 'inside' and wai 'outside' in (18a–c) signal
that words or meanings are understood as a container that has an inside
and an outside that can be physically defined. Example (18d) takes the
container metaphor a step further by indicating that words may be imaged
more concretely as a box that can be opened or shut. As containers of
meanings, words may be filled (18e) or empty (18f). (18g) is related to
the metaphor of containment in a more complex manner. On the one hand, we
are dealing with the metaphor TALKING IS PLAYING AN INSTRUMENT, whereby
xian 'string' (as of a musical instrument) stands in a part-whole
metonymic relation to a stringed instrument. On the other hand, the
spatial concept of wai 'outside' points to the container metaphor.
3.3. ARGUMENT IS WAR
This conceptual metaphor, too, is not unique to Chinese, and its near
universality is grounded in the fundamental human experience of conflict
of varying scopes. Conflicts and battles among individuals and groups have
always accompanied the history of evolution and are no doubt one of the
most entrenched experiences in human existence. This experiential basis
accounts for the fact that this metaphor is conventionalised in a multitude
of idioms and compounds in Chinese. As we shall observe in (19), inherent
in the war metaphor is the more specific metaphor WORDS ARE WEAPONS.
(19) a. chun-qiang-she-jian (lip-spear-tongue-sword) 'disputatious
verbal exchange'
b. dan-dao-zhi-ru (single-knife-straight-enter) 'engage in a direct
verbal attack'
c. maotou-zhi-xiang (spearhead-point-to) 'aim a verbal attack at'
d. hua-feng-yi-zhuan (speech-sharp point of a weapon-one-turn)
'change the direction of verbal attack'
e. yi-yu-zhong-di (one-utterance-hit-target) 'hit the spot verbally'
f. chu-kou-shang-ren (exit-mouth-hurt-person) 'verbally insult'
g. e-yu-shang-ren (evil-language-hurt-person) 'verbally insult'
h. ren-yan-ke-wei (people-word-worth-fear) 'words are worth
fearing'
i. ti-wu-wan-fu (body-without-complete-skin) 'completely affected
by harsh verbal attacks'
j. yan-ci-ji-lie (word-rhetoric-fierce-intense) 'speak polemically'
The expressions (19a–d) explicitly name specific weapons as the source of
the metaphor. (19a) exhibits a complex interaction between the speech
organ metonymy and the weapon metaphor. On the one hand, chun 'lip' and
she 'tongue' refer metonymically to speech; on the other hand, qiang
'spear' and jian 'sword' describe the aggressive potency of speech
metaphorically by mapping weapons in the domain of war and battle onto the
domain of argument. While (19b–e) encode language in terms of a weapon
used to attack an opponent in the linguistic battle, (19f–g) state plainly
that words have the potential to physically wound a victim and are thus
feared (19h). (19i) conceives of the devastating effect of verbal attacks
in terms of the extent to which the body is physically wounded. (19j), on
the other hand, adopts the description ji-lie 'fierce and intense', which
is usually used to describe battles.
Similarly, the compounds in (20) describe verbal aggression in terms of
physical aggression that typies war and battle.
(20) a. ci-er (pierce/stab-ear) 'biting (remarks)'
b. feng-ci (sarcasm-stab) 'sarcasm'
c. mo-sha (wipe-kill) 'deny'
d. peng-ji (blow-strike) 'vehemently criticise'
e. zhong-shang (hit-wound) 'verbally defame'
All of these expressions contain at least one morpheme describing a
physical act of aggression, e.g., ci 'stab' in (20a–b), sha 'kill' in
(20c), peng 'hit, blow' and ji 'strike' in (20d) and shang 'wound' in
(20e). Thus, a hostile remark is described as a sharp object (weapon) that
pierces the ear in (20a). The verbal behaviour known as sarcasm is
associated with the aggressive physical act of stabbing, as in (20b). To
verbally deny a fact is to wipe out and even kill that fact, as in (20c).
The issuance of criticism is conceived of as a physical act of striking in
(20d) and to defame someone verbally is to wound them physically, as in
(20e).
3.4. LINGUISTIC EXPRESSIONS ARE FOOD
As the most immediate survival necessity for all living organisms
including humans, food is fundamental to our existence and, as Kass (1999)
and Rozin (1999) show, influences our behaviour in the most profound ways
possible. Thus, it is likely that food constitutes a natural universal
conceptual domain that serves as the source of conceptual metaphors.
Lakoff and Johnson (1980: 152) and Kövecses (2002) have discussed the
IDEAS ARE FOOD metaphor. Kövecses (2002) has talked about the SEXUAL
DESIRE IS APPETITE metaphor, the source of which is related to the
experientially basic domain of food. In Chinese, we observe the conceptual
potential of food in the following expressions:
(21) a. tian-yan-mi-yu (sweet-words-honey-speech) 'exceedingly nice
talk intended to flatter or deceive'
b. hua-bu-dui-wei (language-not-right-flavour) 'words with some
peculiar hidden message'
c. tun-tun-tu-tu (swallow-swallow-spit-spit) 'speak with reluctance
and dishonesty'
d. yao-wen-jiao-zi (bite-text-chew-word) 'write or speak
verbosely'
e. ye-ren (choke-person) 'aggressive (remarks)'
f. sheng-se (raw-puckery) 'obscure and difficult to understand'
g. tian-you-jia-cu (add-oil-add-vinegar) '(as a third party)
intensify a conflict by saying things conducive to an escalation'
The taste of food is mapped onto the agreeability of words in (21a) and
(21b). We further observe mappings that focus on food as physical objects
that can be swallowed, spat out, bitten or chewed, as in (21c) and (21d),
that are too hard or too coarse to swallow and thus capable of choking the
eater, as in (21e), or too unripe (as of fruit) to be enjoyable or even
digestible, as in (21f). Related to the metaphor LINGUISTIC EXPRESSIONS
ARE FOOD is (21g), in which the culinary practice of adding seasonings to
enhance the flavour of food is employed to describe a particular verbal
behaviour that is intended to intensify a conflict.
3.5. Culture-specific metaphors
Apart from the major conceptual metaphors discussed in the foregoing
paragraphs, we encounter several other metaphors that may not immedi-
ately arise from universal source domains. The three expressions in (22)
illustrate the collocation of disagreeable physical temperatures, both cold
and heat, with speech in the conceptualisation of unfriendly or sarcastic
attitude associated with certain hostile verbal acts.
(22) a. leng-yan-leng-yu (cold-speech-cold-language) 'unfriendly
speech'
b. leng-chao-re-feng (cold-irony-hot-satire) 'speak with biting
sarcasm'
c. shuo-feng-liang-hua (say-wind-cold-words) 'speak ironically'
Against the background of the metaphor HOSTILE SPEECH IS ADVERSE
TEMPERATURE, verbal behaviour marked by a lack of affection, as (22a)
shows, does seem to reflect a more universal metaphor, namely AFFECTION IS
WARMTH, or LACK OF AFFECTION IS COLD. This points to the universal
figurative potential of warmth in emotion conceptualisation.
The expressions in (23) utilize the natural weather phenomena of wind
and rain to describe the mobility of unreliable verbal information. The
circulatory character of the wind and the disseminating character of the
rain seem to be the imageries being mapped onto spreading rumours in
the verbal domain in (23a) and (23b). (23c), by contrast, focuses on the
intentional aspect of the verbal behaviour by construing it as a volitional
transitive action of blowing wind. In addition, zhen-bian 'pillow side' is
metonymic in that the typical location of nuptial communication stands
for nuptial communication.
(23) a. feng-yan-feng-yu (wind-speech-wind-talk) 'gossip, rumours'
b. man-cheng-feng-yu (full-town-wind-rain) 'a rumour being
spread widely'
c. chui-zhen-bian-feng (blow-pillow-side-wind) 'engage in pillow
talk in order to influence the spouse's decision'
The items in (24) have as their common source domain the action of sing-
ing, which apparently gives rise to a negative connotation, though to a
varying degree of negativity:
(24) a. yi-chang-yi-he (one-sing-one-echo) 'offer mutual sympathetic
verbal response'
b. ci-chang-bi-he (here-sing-there-echo) 'offer mutual sympathetic
verbal response'
c. gao-chang-ru-yun (high-sing-into-clouds) 'propagandise a cause
or doctrine'
d. chang-gao-diao (sing-high-tone) 'carry on propaganda'
e. shuode-bi-changde-hao-ting (speaking-compare-singing-good-to
hear) 'nice talk that is unaccountable (waffling, empty promises,
or blatant flattery, etc.)'
The collaborative performative act of singing and echoing in (24a) and
(24b) emphasizes the elaborated mutual responsiveness characterising a
conspicuous display of harmony. The underlying metaphor may be
schematised as COLLABORATIVE VERBAL PERFORMANCE IS COLLABORATIVE
MUSICAL PERFORMANCE. (24c) and (24d) both employ the complex metaphor
PROPAGANDA IS SINGING IN HIGH PITCH to convey the deliberate effort
involved in propagandising a cause. (24e) is an explicit comparison
between talking and singing whereby talking is said to surpass singing in
sensual agreeability. However, as common sense has it, the opposite is
usually true. That is to say, singing is perceived (or at least intended)
to be more pleasant to the ear than speaking. It is precisely this paradox
in the comparative claim that invites the inference that talk that sounds
nicer than singing is unaccountable and suspicious. This comparison is
metaphorical because auditory agreeability in the sensual domain is
mapped onto semantic and interpersonal agreeability in the verbal
domain.
As these examples show, the metaphors discussed in this subsection
display a higher degree of culture-specificity than those analysed in the
previous subsections, both with regard to the sources being mapped and in
terms of the emotional connotations behind the metaphors. Yet it is clear
that culture-specificity does not contradict the idea of an experiential
basis for the more local metaphors, but reflects a 'differential
experiential focus' underlying them, to use Kövecses' (2005: 246) terms.
To summarize the section on metaphors of verbal behaviour, let us
state the basic propositions that arise from the analysis. Generally, it
appears that the concepts that constitute the source domains of all the
metaphors of verbal behaviour pertain to familiar human experiences that
can be envisaged as physical scenarios, as Semino (2005) points out in
her discussion of English metaphors of speech activity. Some of these
experiences are more fundamental to our existence and survival than others.
Concretely speaking, physical actions, food, and self-defence are probably
the most fundamental aspects of human life: we inevitably and routinely
conduct physical actions in everyday life; we depend on food and are
familiar with its properties; we are adaptively tuned to the survival
significance of conflicts and battles. Because these experiences determine
our fitness and vitality as living beings, their effects on
conceptualisation are powerful and predictable. Consequently, we draw on
these experiences in understanding the more complex and less palpable
social experience of using words in interaction with our relevant others.
The sense that our body is a container, too, is an irreducible physical
experience that influences the way we think of language. Thus, just as many
cognitive linguists (e.g., Lakoff and Johnson 1980; Sweetser 1990; Gibbs
1994, 2006; Grady 1999) have argued, the conventionalisation of conceptual
metaphors is not arbitrary, but experientially motivated.
Consequently, it is important to note, as Nunberg and his colleagues
have argued in their study of idioms, that conventionality does not equal
arbitrariness and non-compositionality. Rather, there is substantial
semantic recoverability on account of the lexical input that conjures a
rich gestalt of experience. This recoverability is due to the fact that the
coining of the metaphors 'draw[s] on the full richness of our encyclopaedic
knowledge of our bodily and cultural experience', as Croft and Cruse (2004:
204) put it. Kövecses (2002: 207–208) speaks of 'conventional knowledge'
and considers it an important conceptual factor that contributes to the
understanding of idioms. Such knowledge enables the association between
the source and the target such that the image-schematic similarity between
the two domains can be established and novel senses can be created.
From a social psychological perspective, metaphors are expressions of
emotion. The expression of emotion, as most psychologists seem to agree,
is both a communicative and a social strategy. As has been pointed out in
section 2 with regard to metaphors embedded in the speech organ
metonymy, metaphors do not name truth-conditional properties of verbal
behaviour. On the contrary, they give voice to feelings and beliefs about
the perceived particularities of various socially significant verbal
behaviours. In the sense that feelings and beliefs about verbal behaviour,
universal or culture-specific, are encoded and transmitted through
metaphors, the expressive, interactional and constitutive function of
metaphors can be specified as representing emotions with regard to verbal
behaviour. This view will be elaborated in the following section.
4. Emotion and the negative metaphor/metonymy
Definitions of emotion are various and controversial. As a working
definition, I follow Arnold (1960: 182) in considering emotion as 'the felt
tendency toward anything intuitively appraised as good (beneficial), or
away from anything intuitively appraised as bad (harmful)'. Accordingly,
the felt tendency associated with a positive appraisal may be called a
positive emotion and that which is associated with a negative appraisal may
be called a negative emotion. Positive and negative emotions divide the
affective space (Russell 1979). The positivity vs. negativity of an emotion
is known as affective valence.
To measure the overall tendency of the present dataset in terms of
affective valence, a questionnaire survey was conducted. The questionnaire
contained the entire dataset adopted in this paper. All 122 items were
presented in isolation, and their order was randomised. Fifty informants
were instructed to rate each item as positive, negative, or neutral
[note 6]. Four informants each left one item unevaluated, giving rise to
four missing values. The total frequencies of positive, negative and
neutral ratings are presented in Table 1.
The figures in Table 1 show that the negative rating has the highest total
frequency and the positive rating the lowest. Furthermore, the total
frequency of the negative rating is significantly higher than expected. By
contrast, the total frequency of the positive rating is significantly lower
than expected. The total frequency of the neutral rating is not
significantly higher than expected. On the whole, the mismatch between the
observed and the expected frequency is significant for the positive and the
negative rating. That is, items in the set were rated negative
significantly more often than positive [note 7]. The strong asymmetry in
the rating points to a negativity bias.
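
The figures reported in Table 1 can be checked with a few lines of
arithmetic. The sketch below is not the analysis script used for this
paper. It assumes a chi-square goodness-of-fit test against equal expected
frequencies for the three ratings (an assumption, although one that
reproduces the reported statistic), and it uses Pearson residuals with a
threshold of 1.96 as one illustrative way of judging each rating
separately. The names goodness_of_fit, observed and residuals are
hypothetical.

```python
import math

# Observed rating frequencies, taken from Table 1.
observed = {"positive": 990, "negative": 3005, "neutral": 2101}

# 122 items rated by 50 informants, minus the four missing values.
total = 122 * 50 - 4
assert total == sum(observed.values())


def goodness_of_fit(counts):
    """Chi-square goodness-of-fit test against equal expected frequencies.

    Assumption: the null hypothesis is a uniform split over the categories.
    Returns (chi-square statistic, degrees of freedom, Pearson residuals).
    """
    n = sum(counts.values())
    expected = n / len(counts)
    residuals = {
        label: (count - expected) / math.sqrt(expected)
        for label, count in counts.items()
    }
    # The statistic is the sum of the squared Pearson residuals.
    chi2 = sum(r * r for r in residuals.values())
    return chi2, len(counts) - 1, residuals


chi2, df, residuals = goodness_of_fit(observed)

# For df = 2 the chi-square survival function has the closed form exp(-x / 2).
p_value = math.exp(-chi2 / 2)

print(f"chi-square = {chi2:.3f}, df = {df}, p < 0.001: {p_value < 0.001}")
for rating, r in residuals.items():
    share = 100 * observed[rating] / total
    flag = "significant" if abs(r) > 1.96 else "not significant"
    print(f"{rating:8s} {share:5.1f}%  residual {r:+6.2f}  ({flag})")
```

Under these assumptions, the residuals for the positive and negative
ratings fall well outside the threshold while the residual for the neutral
rating stays inside it, which is consistent with the description above.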
This bias is not an isolated finding from a crosslinguistic perspective. A
similar tendency has been reported by Simon-Vandenbergen (1995) in her
study of metaphors of linguistic actions in British English. White (1994:
226) points to a preponderance of negative terms in the A'ara lexicon of
emotion. Although these are referentially different from metaphors of
verbal behaviour, the converging trend is remarkable, especially given the
attitudinal and emotional nature of such metaphors. Simon-Vandenbergen,
however, speaks of 'value judgments' instead of emotions. I prefer the
notion of emotion over value judgement in the current context for two
important reasons. First, from an evolutionary psychological perspective,
emotion is a superordinate orchestrating program that directs the
activities and interactions of various subprograms including value
judgement (Cosmides and Tooby 2000: 93). To use Johnson-Laird and Oatley's
(2000: 459) words, 'emotions guide our lives'. And since making value
judgments is part of our lives, it follows that emotions guide value
judgments. Secondly, metaphor does not pertain to the objective conceptual
representation of the external world, but to the attitudinal and affective
evaluation of percepts via recurrent basic bodily experience. Thus,
metaphor assumes an affective basis and is immediately relevant to
emotion.
Table 1. Frequencies of positive, negative and neutral ratings

Rating      Frequency   Percentage
Positive          990       16.2%
Negative         3005       49.3%
Neutral          2101       34.5%
Total            6096      100%

Chi-square = 1002.586; df = 2; p < 0.001

Now what do we make of the negativity bias observed in the
metaphorisation of verbal behaviour? This question is nontrivial, for such
a negativity bias must be considered peculiar or marked in the face of
the well-known Pollyanna Hypothesis, which claims that positive words
universally outnumber negative words (Boucher and Osgood 1969). To
explain the marked tendency, Simon-Vandenbergen (1995: 112) contends
that linguistic actions that are perceived as being out of the ordinary,
extreme in one way or another, i.e., too much or too little of something,
call for metaphorisation. This statement implies that metaphorisation is a
selective process. Apparently, metaphor is not used to conceptualise any
and all verbal behaviours, but primarily those that are perceived as
inadequate or negative. Moreover, it appears that metaphor is not used
merely to provide conceptual access to verbal behaviour. Instead, it seems
to construe subjective experience of verbal behaviour. However, the
recognition of the selective character and the construal function of
metaphor does not explain why there are more negative metaphors of verbal
behaviour than positive and neutral ones.
I will now propose that the predominance of negative metaphors of
verbal behaviour can be explained and predicted (a) on account of the
socio-emotional nature of verbal behaviour and (b) on account of the
basic cognitive-affective principle underlying its conceptualisation via
metaphor as a process of affective information processing. As has been
stated previously, verbal behaviour pertains to the use of language for
the purpose of communication in social interaction. The interpersonal
nature of verbal behaviour determines its socio-emotional significance.
Therefore, the conceptualisation of verbal behaviour via metaphorical
representation pertains to the processing of socio-emotional information.
To be more accurate, the metaphorisation of verbal behaviour can be
viewed as an instantiation of socio-emotional information processing
and is consequently subject to the principled patterns thereof. Bearing
this in mind, let us consider the default pattern of affective information
processing.
In cognitive psychology, it is widely accepted that people do not pay
equal attention to all the information in their surroundings. Rather, our
attention allocation is a limited-capacity process and as such highly
selective (Nosofsky 1986). Hebb (1972: 88) defines attention as sensory
selectivity and considers it the distinguishing mark of the higher animal.
More recently, converging evidence has confirmed that negative social
information has a stronger impact on people than positive and neutral
information and that the processing of socio-emotional information is
automatically biased towards negative events. It has been argued that such
a bias is in keeping with our general adaptive behaviour that emphasizes
vigilance and self-defence (Baumeister et al. 2001; Rozin and Royzman
2001; Jing-Schmidt 2007) [note 8]. In light of the socio-emotional
character of
verbal behaviour and in light of the negativity bias as the default
processing pattern with regard to socio-emotional information, the pre-
dominance of negative metaphors of verbal behaviour as reported by
Simon-Vandenbergen and in the present study is no longer surprising but
seems to have a psychological grounding.
On a sociolinguistic note, the predominance of negatively valenced
metaphors also requires an analytic model that reflects principles of
interaction. Here I shall suggest some foundational ideas underlying such a
model. On the one hand, conventional metaphors are prefabricated and
as such convenient and reassuring. As Matisoff (1979: 110) puts it, '[t]he
security of knowing the right thing to say in a given situation is a
precious commodity' (italics in the original). Such security, I shall
argue, is strategically appreciable especially in situations where a
negative message such as dismay, contempt, disgust, indignation, anger,
etc. needs to be conveyed. Metaphorical prefabs fulfil our communicative
need to express negative emotions not in our own name, but in the name of
received wisdom, i.e., conventionalised collective emotions. This
possibility is part of what Goffman (1981: 34) calls the 'embedding
capacity' that gives us 'dramatic liberties'. Thus, to be able to use
negative prefabs is not only reassuring, but also empowering in that the
utterances of the speaking individual assume a collective frame of
emotionality. On the other hand, White's idea that the articulation of
emotion serves a moral regulatory function in interaction is highly
relevant. Specifically, negative emotions communicate moral discontent,
which may serve as a motivation to change undesirable situations and
improve the social environment.
5. Conclusions
I conclude this paper by emphasizing four points. First, the cognitive
processes underlying the Chinese conceptualisations of verbal behaviour are
metonymy and metaphor. Metonymy and metaphor form a continuum of
figurativity. At the metonymic pole, we encounter expressions that contain
one or more salient bodily components of speech articulation which
refer to speech. At the metaphorical pole, we observe expressions in
which recurrent concrete physical actions, activities and experiences are
mapped onto abstract social behaviours involving the use of language.
Thus, common to both cognitive processes is the embodiment of the
conceptualisation of verbal behaviour.
Secondly, the metonymy-metaphor continuum is one of figurativity.
Correlating with the continuum of figurativity that extends from metonymy
to metaphor is the continuum of semantic predictability. The speech
organ metonymy is highly schematic and semantically predictable. Com-
pared to this metonymy, the metaphors show an increasing degree of
conventionality and variously lowered degrees of semantic predictability
depending on what aspect of the source domain participates in the map-
ping. However, based on semantic compositionality, the meanings of the
metaphors are more or less recoverable.
Thirdly, many of the Chinese expressions of verbal behaviour can be
categorized in terms of universal conceptual metaphors because of the
experientially fundamental nature of their source domains. This said,
however, the particular affective valence inherent in the semantics of an
expression does suggest the reality of a culture-specific experiential
focus. For this reason, a proper interpretation of the Chinese metaphors
requires a cultural model that accounts for the culture-specific affective
valence.
Finally, the overall distribution of affective valence, characterised
crosslinguistically by a negativity bias, may be attributable to the
socio-emotional nature of verbal behaviour and the cognitive-affective
patterns underlying its perception. On this view, the metaphorisation of
verbal behaviour is not only a cognitive phenomenon, but, more accurately,
a cognitive-affective process whereby emotion plays a crucial part in the
conventionalisation of metaphors. By making reference to the larger
context of human cognitive-affective behaviour and especially emotion, the
current approach seems to have provided an adequate perspective from
which to deal with the phenomenon at hand. The intellectual significance
of this perspective is that it raises important questions for linguists
with regard to the relationship between language, cognition and emotion.
Concretely, it will be a task for future studies to determine to what
extent metaphors concerned with other target domains are related to
emotion. It is hoped that the answers will not only shed new light on our
knowledge of metaphorical language as a cognitive phenomenon, but will also
carry our understanding of emotion and language a step further.
Received 14 May 2007 University of Cologne, Germany
Revision received 26 October 2007
Appendix A: Examples in Chinese original
[Chinese-character forms of examples (1)–(24); the characters are not
recoverable from the extracted text.]
Appendix B: Questionnaire
[Questionnaire form; the Chinese text is not recoverable from the
extracted text. The form lists 122 numbered items, each followed by three
response boxes.]
Notes
* I thank Ewa Dąbrowska and the two anonymous expert reviewers for their comments
and suggestions, which contributed to the improvement of this paper. I wish to express
my gratitude to Jing Ting of Harbin Normal University, China, who administered the
questionnaire survey and helped me collect the data employed in section 4. I thank her
and Stefan Th. Gries of UCSB for the help they generously offered me in dealing with
the statistics on which the quantitative analysis in section 4 rests. My thanks also go to
my friend Debra Grant, who patiently studied the manuscript and improved my English.
Of course, all remaining errors are my own. Author's contact address: University of
Cologne, Department of General Linguistics, Albertus-Magnus-Platz, 50923 Köln.
Author's e-mail address: zjingsc0@uni-koeln.de
1. Throughout this paper, the term Chinese refers to the standard official language used
in mainland China and Taiwan, known as Mandarin.
2. The following abbreviations are adopted in the relevant glosses in this paper: 1Pl 'first
person plural', 2SG 'second person singular', RES 'resultative', Q 'question'.
3. An exception has been pointed out to me, by Debra Grant, in the expression "Don't try
to sweet talk your way out of it". In general, however, culture-specific food preferences
and culinary experiences may account for the contrastive semantic coloration of
sweetness. The European appreciation of confectionery is not only evident in the delight of
the dessert as the culinary highlight. It is also linguistically evident in the idiomatic
expression "have a sweet tooth", which encodes the favouring of sweetness. More importantly,
the emotional significance of confectionery is such that sweets are powerful symbols of
love and their withdrawal can serve as punishment, with enormous psychological and
pedagogical consequences. The Chinese are a people known to place great
value on the entrée, which by virtue of its variety and elaboration inevitably pre-empts
the dessert, which is at most an afterthought. Within this cultural frame, confectionery
is marginalized, and even despised or considered destructive to a culinary event if
overindulged. In light of this, the word sweet carries very different connotations in the
two languages.
4. The meaning of broken may be rendered variously as sui 'shattered', po 'damaged but
unshattered', duan '(oblong object) broken in two or more sections' or huai 'defect,
damaged' in Chinese. While the first three senses focus on the perceptual features of the
damaged object, the last one emphasizes functional damage. See Chen (2007) for an
in-depth study of the conceptualisation of cutting and breaking events in Chinese.
5. For our example to be considered an instance of conceptual blending, it is essential to
argue against the availability of the quality of being discreet and trustworthy in either
input. It seems to me, however, such availability is not an all-or-nothing matter, but
one of degree, depending largely on how far one wishes to stretch the association with
the input meanings. In our case, something that is tight is unlikely to leak, which allows
the association that it is safe, metaphorically so when it comes to the organ of speech.
6. All the informants are undergraduate students of the Department of Education, Harbin
Normal University, China. Mandarin is the only native language of the informants. The
questionnaire was presented at the beginning of the Fall/Winter semester of 2007.
7. The fact that the total frequency of the neutral rating is slightly higher than expected can
be explained if we take into account the potential weakness of the current questionnaire
design. Because the items are presented out of context, the rating heavily depends on the
informants' abstract lexical knowledge. This is problematic especially because many
negative emotions are usually associated with behaviour in a specific situation. Thus, in
the absence of context, informants might find it difficult to make definite judgments on
the affective valence of a certain item, which may have contributed to the relatively high
frequency of the neutral rating. This applies particularly to items that are too
infrequently used in everyday life for informants to know what they mean at all,
especially in isolation. This methodological weakness is acknowledged here and should be
overcome in follow-up research.
8. Details regarding the vast literature on negativity bias are beyond the scope of this
paper. The interested reader may consult Peeters and Czapinski (1989), Skowronski and
Carlston (1989), Pratto and John (1991), Taylor (1991), Cacioppo and Berntson (1994),
and Cacioppo et al. (1997, 1999), in addition to the three references in the parentheses.
References
Arnold, Magda B.
1960 Emotion and Personality, vol. 1, Psychological Aspects. New York: Colum-
bia University Press.
Barcelona, Antonio
2000 On the plausibility of claiming a metonymic motivation for conceptual
metaphor. In A. Barcelona (ed.), Metaphor and Metonymy at the Crossroads,
31–58. Berlin/New York: Mouton de Gruyter.
Baumeister, Roy F., Ellen Bratslavsky, Catrin Finkenauer, and Kathleen D. Vohs
2001 Bad Is Stronger Than Good. Review of General Psychology 5(4), 323–370.
Boucher, Jerry and Charles E. Osgood
1969 The Pollyanna Hypothesis. Journal of Verbal Learning and Verbal Behaviour
8, 1–8.
Bruner, Jerome
1990 Acts of Meaning. Cambridge, MA./London: Harvard University Press.
Cacioppo, John T. and Gary G. Berntson
1994 Relationship between attitudes and evaluative space: A critical review, with
emphasis on the separability of positive and negative substrates.
Psychological Bulletin 115, 401–423.
Cacioppo, John T., Wendi L. Gardner, and Gary G. Berntson
1997 Beyond bipolar conceptualizations and measures: The case of attitudes and
evaluative space. Personality and Social Psychology Review 1, 3–25.
Cacioppo, John T., Wendi L. Gardner, and Gary G. Berntson
1999 The affect system: Form follows function. Journal of Personality and Social
Psychology 76, 839–855.
Chen, Jidong
2007 He cut-break the rope: Encoding and categorizing cutting and breaking
events in Mandarin. Cognitive Linguistics 18(2), 273–286.
Church, Joseph
1961 Language and the Discovery of Reality. New York: Vintage Books.
Cosmides, Leda and John Tooby
2000 Evolutionary Psychology and the Emotions. In Michael Lewis and Jeannette
M. Haviland-Jones (eds.), Handbook of Emotions, 2nd edition. New
York/London: The Guilford Press, 91–115.
Croft, William
2002 The role of domains in the interpretation of metaphors and metonymies.
In René Dirven and Ralf Pörings (eds.), Metaphor and Metonymy in
Comparison and Contrast. Berlin/New York: Mouton de Gruyter, 161–206.
Croft, William and Alan Cruise
2004 Cognitive Linguistics. Cambridge: Cambridge University Press.
Dirven, René
2002 Metonymy and Metaphor: Different mental strategies of conceptualisation.
In René Dirven and Ralf Pörings (eds.), Metaphor and Metonymy in
Comparison and Contrast. Berlin/New York: Mouton de Gruyter, 75–111.
DUDEN
1996 Deutsches Universal-Wörterbuch. Mannheim/Leipzig: Dudenverlag.
Evans, Vyvyan and Melanie Green
2006 Cognitive Linguistics: An Introduction. Edinburgh: Edinburgh University
Press.
Fauconnier, Gilles, and Mark Turner
2002 The Way We Think: Conceptual Blending and the Mind's Hidden
Complexities. New York: Basic Books.
Gibbs, Raymond W. Jr.
1994 The Poetics of Mind: Figurative Thought, Language, and Understanding.
Cambridge: Cambridge University Press.
1996 Why many concepts are metaphorical. Cognition 61, 309–319.
2006 Embodiment and Cognitive Science. Cambridge: Cambridge University Press.
Goffman, Erving
1981 Forms of Talk. Philadelphia, PA.: University of Pennsylvania Press.
Goldberg, Adele E.
1995 Constructions: A Construction Grammar Approach to Argument Structure.
Chicago/London: The University of Chicago Press.
Goossens, Louis
2002 Metaphtonymy: The interaction of metaphor and metonymy in expressions
for linguistic action. In René Dirven and Ralf Pörings (eds.), Metaphor and
Metonymy in Comparison and Contrast. Berlin/New York: Mouton de
Gruyter, 349–378.
Grady, Joseph
1999 A Typology of Motivation for Conceptual Metaphor: Correlation vs.
Resemblance. In Raymond W. Gibbs and Gerard J. Steen (eds.), Metaphor
in Cognitive Linguistics. Amsterdam/Philadelphia: Benjamins, 79–100.
Grady, Joseph, Todd Oakley, and Seana Coulson
1999 Blending and Metaphor. In Raymond W. Gibbs and Gerard J. Steen (eds.),
Metaphor in Cognitive Linguistics. Amsterdam/Philadelphia: Benjamins,
101–124.
Hebb, D. O.
1972 Textbook of Psychology. Philadelphia/London: W.B. Saunders Company.
Jing-Schmidt, Zhuo
2007 Negativity bias in language: A cognitive-affective model of emotive
intensifiers. Cognitive Linguistics 18 (3), 417–443.
Johnson, Mark
1987 The body in the mind: The bodily basis of meaning, imagination, and reason.
Chicago: University of Chicago Press.
Johnson-Laird, P. N. and Keith Oatley
2000 Cognitive and Social Construction in Emotions. In Michael Lewis and
Jeannette M. Haviland-Jones (eds.), Handbook of Emotions, 2nd edition. New
York/London: The Guilford Press, 458–475.
Kass, Leon R.
1999 The Hungry Soul. Chicago: The University of Chicago Press.
Kornacki, Paweł
2001 Concepts of anger in Chinese. In Jean Harkins and Anna Wierzbicka (eds.),
Emotions in Crosslinguistic Perspective. Berlin/New York: Mouton de
Gruyter, 255–290.
Kövecses, Zoltán
1999 Metaphor: Does it constitute or reflect cultural models? In Raymond W.
Gibbs and Gerard J. Steen (eds.), Metaphor in Cognitive Linguistics.
Amsterdam/Philadelphia: Benjamins, 167–188.
2002 Metaphor: A Practical Introduction. Oxford/New York: Oxford University
Press.
2005 Metaphor in Culture. Cambridge: Cambridge University Press.
Kövecses, Zoltán and Günter Radden
1998 Metonymy: developing a cognitive linguistic view. Cognitive Linguistics 9
(1), 37–77.
Lakoff, George
1987 Women, Fire, and Dangerous Things: What Categories Reveal about the
Mind. Chicago: The University of Chicago Press.
Lakoff, George and Mark Johnson
1980 Metaphors We Live By. Chicago: The University of Chicago Press.
1999 Philosophy in the Flesh. New York: Cambridge University Press.
Langacker, Ronald W.
1987 Foundations of Cognitive Grammar, vol. I. Stanford, CA.: Stanford Univer-
sity Press.
Matisoff, James A.
1979 Blessings, Curses, Hopes, and Fears: Psycho-Ostensive Expressions in
Yiddish. Stanford: Stanford University Press.
Nosofsky, R. M.
1986 Attention, similarity, and the identification-categorization relationship.
Journal of Experimental Psychology: General 115, 39–57.
Nunberg, Geoffrey, Ivan A. Sag and Thomas Wasow
1994 Idioms. Language 70, 491–538.
Peeters, Guido and Janusz Czapinski
1989 Positive-negative asymmetry in evaluations: The distinction between
affective and informational negativity effects. In W. Stroebe and M. Hewstone
(eds.), European Review of Social Psychology, vol. 1. Chichester, UK: Wiley,
33–60.
Pratto, Felicia and Oliver P. John
1991 Automatic vigilance: the attention-grabbing power of negative social
information. Journal of Personality and Social Psychology 61, 380–391.
Reddy, Michael J.
1979 The conduit metaphor: A case of frame conflict in our language about
language. In Andrew Ortony (ed.), Metaphor and Thought. Cambridge:
Cambridge University Press, 164–201.
Rozin, Paul
1999 Food is fundamental, fun, frightening, and far-reaching. Social Research 66,
9–30.
Rozin, Paul and Edward B. Royzman
2001 Negativity Bias, Negativity Dominance, and Contagion. Personality and
Social Psychology Review 5(4), 296–320.
Russell, James A.
1979 Affective space is bipolar. Journal of Personality and Social Psychology 37,
345–356.
Semino, Elena
2005 The metaphorical construction of complex domains: The case of speech
activity in English. Metaphor and Symbol 20 (1), 35–70.
Simon-Vandenbergen, Anne-Marie
1995 Assessing Linguistic Behaviour: A Study of Value Judgements. In Louis
Goossens, Paul Pauwels, Brygida Rudzka-Ostyn, Anne-Marie
Simon-Vandenbergen, and Johan Vanparys (eds.), By Word of Mouth: Metaphor,
Metonymy, and Linguistic Action in Cognitive Perspective.
Amsterdam/Philadelphia: Benjamins, 71–124.
Skowronski, John J. and Donal E. Carlston
1989 Negativity and extremity biases in impression formation: A review of
explanations. Psychological Bulletin 105, 131–142.
Sweetser, Eve
1990 From Etymology to Pragmatics. Cambridge: Cambridge University Press.
Taylor, Shelley E.
1991 Asymmetrical effects of positive and negative events: the
mobilization-minimization hypothesis. Psychological Bulletin 110 (1), 67–85.
Turner, Mark and Gilles Fauconnier
1995 Conceptual integration and formal expression. Metaphor and Symbolic
Activity 10, 183–203.
White, Geoffrey M.
1994 Affecting culture: emotion and morality in everyday life. In S. Kitayama and
H. R. Markus (eds.), Emotion and Culture. Washington D.C.: American
Psychological Association.
Yao, Naiqiang (ed.)
2000 Xinhua Zidian (Xinhua Chinese Dictionary). Beijing: Shangwu Yinshuguan.
Ye, Zhengdao
2001 An inquiry into sadness in Chinese. In Jean Harkins and Anna Wierzbicka
(eds.), Emotions in Crosslinguistic Perspective. Berlin/New York:
Mouton de Gruyter, 359–404.
Yu, Ning
2000 Figurative uses of finger and palm in Chinese and English. Metaphor and
Symbol 15, 159–175.
2001 What does our face mean to us? Pragmatics and Cognition 9, 1–36.
2002 Body and emotion: body parts in Chinese expressions of emotion. In
Enfield, N. and Anna Wierzbicka (eds.), The Body in Description of Emotion:
Cross-linguistic Studies, Pragmatics and Cognition (special issue) 10, 341–367.
2003a Metaphor, body, and culture: The Chinese understanding of gallbladder and
courage. Metaphor and Symbol 18, 13–31.
2003b The bodily dimension of meaning in Chinese: what do we do and mean
with hands? In Eugene H. Casad and Gary B. Palmer (eds.), Cognitive
Linguistics and Non-Indo-European Languages. Berlin/New York: Mouton
de Gruyter, 337–362.
Zhu, Zuyan (ed.)
2002 Hanyu Chengyu Dacidian (Chinese Dictionary of Idioms). Beijing:
Zhonghua Shuju.
Subjects in the hands of speakers:
An experimental study of syntactic subject
and speech-gesture integration
FEY PARRILL*
Abstract
Work by Russell Tomlin has shown that there is a close relationship
between the syntactic subject of an utterance and the entity the speaker's
attention is focused on while the utterance is being formulated, for
descriptions of a simple event (Tomlin 1985, 1995, 1997). The experiment
presented in this paper demonstrates that the same effect can be obtained for
a more complex event, and that attention also impacts the spontaneous
hand gestures produced along with speech. The paper shows that both
syntactic subject and the information contained in gesture can be manipulated
by changing which entity a speaker is focused on during utterance
formulation. This pattern suggests that changes in conceptualization give rise to
changes in both speech and gesture.
Keywords: co-speech gesture, attention, syntactic subject.
1. Introduction
Language researchers must deal with the following very fundamental and
very troublesome question: how do speakers choose between the different
syntactic structures available in their language when encoding a mental
representation? For instance, why does a person say Mark was scared by
the raccoon rather than the raccoon scared Mark? Within traditional
approaches, these two sentences are assumed to be semantically equivalent,
so why choose one over the other? Researchers tend to agree that such a
choice is driven by a difference in the speaker's underlying construal of
the situation. (Perhaps the raccoon is more central to the conversation in
the latter case.) There is wide disagreement, however, about the formal
apparatus necessary for describing the relationship between speaker
construal and grammatical structure. Some approaches require a series of
translations from one part of the language system to another (from
pragmatics, to semantics, to syntax, for example). Other approaches
suggest that syntactic structures can directly reflect the outcome of cognitive
processes. This paper advocates the latter kind of approach.
Specifically, the paper offers further support for a hypothesized link between
sentence structure and attention. A number of authors have suggested
that attention plays a major role in determining how language users
employ grammatical constructions (MacWhinney 1977; Talmy 1996, 2007,
forthcoming; Tomlin 1985, 1995, 1997). In addition, using a simple
experimental paradigm (discussed in more detail below), Russell Tomlin has
shown that the element a person's attention is focused on while she
formulates an utterance is likely to be encoded as the subject of that
utterance (Tomlin 1985, 1997). The choice between an active and a passive
sentence (as in the examples above) thus reflects a difference in how the
speaker's attention is deployed.
Cognitive Linguistics 19–2 (2008), 283–299
DOI 10.1515/COG.2008.011
0936–5907/08/0019–0283
© Walter de Gruyter
While Tomlin shows that this pattern obtains in a number of lan-
guages, this work is open to certain criticisms. First, Tomlin focuses on a
very simple (transitive) event. Second, psycholinguistic experiments are
particularly vulnerable to the claim that a pattern observed in the
data arises from what participants think they should do in an experi-
ment (sometimes referred to as demand characteristics: Intons-Peterson
1983), rather than from the way that language works under normal
circumstances.
The experiment described in this paper provides responses to these two
criticisms. First, a modification of Tomlin's paradigm will be used to
explore the role of attention in predicting syntactic subject in descriptions of
a very complex (caused motion) event. Second, an additional source of
information will be exploited to support the claim that syntax and
conceptualization are linked: the hand gestures that people produce while
they are talking. Because these gestures are packed with meaning that is
directly connected to the meaning of the accompanying speech, they are
extremely informative about the relation between language and
conceptualization. Because they are not consciously monitored, however, they can
offer a more direct path to the speaker's conceptualization than does
speech.
There are two goals for this paper. The first is to demonstrate a
connection between conceptual structure and grammatical form, along the lines
of Tomlin's proposal. The paper will show that the effect of attention on
speech that Tomlin obtains can be observed for gesture as well. That is, both
the syntactic subject of an utterance and the information encoded in
gesture can be manipulated by changing which entity a speaker is focused on
while planning her utterance. Because gesture can provide information
about imagistic aspects of a speaker's representations (Beattie and
Shovelton 2002; Goldin-Meadow 2003; Kita 2000; Kita and Özyürek 2003;
McNeill 1992; Parrill and Sweetser 2004; Sweetser 1998), we can make
more forceful claims about how those representations might be expressed
linguistically. The second goal is to show that speech-gesture patterning
can be manipulated. Previous work on the integration of speech and
gesture takes advantage of grammatical differences across languages that are
thought to correlate with differences in the information encoded in
gesture (Kita and Özyürek 2003; McNeill and Duncan 2000). This study,
on the other hand, attempts to evoke different patterns of
conceptualization for the purposes of linguistic expression (often referred to as thinking
for speaking: Slobin 1987, 1996) within a single language.
In what follows, extremely basic information about the kinds of
gestures that occur with speech, and their connection to that speech, will be
presented.¹ The experimental paradigm Tomlin employs will then be
reviewed, and the modifications necessary to permit an analysis of gesture
will be explained. An experiment replicating Tomlin's basic findings for
syntactic subject and extending the findings to the domain of gesture will
then be presented.
2. Coordination of speech and gesture
When people talk they typically also produce hand and arm motions, or
gestures. These gestures link spaces in front of the speaker's body to
topics in the discourse, point to elements that are not physically present,
or depict aspects of scenes the speaker is describing. In all such cases,
the gestures produced are very closely connected to the accompanying
speech. Because of this tight connection, many researchers have come to
believe that gesture is part of the language system: that language is not
just speech (or sign), but speech (or sign) plus gesture (Goldin-Meadow
2003; Liddell 2003; McNeill 1992, 2000, 2005; Núñez and Sweetser 2006;
Parrill and Sweetser 2004; Sweetser 1998).
Once language has been broadened to include gesture, discovering
exactly how the two modalities are coordinated during production presents
a major theoretical problem. In the past few decades, there have been
many significant discoveries on this front (for reviews, see
Goldin-Meadow 2003; Kendon 2004; McNeill 2005). Much of the work that
explicitly examines the encoding of semantic information in the two
channels has focused on iconic gestures (gestures that iconically depict some
aspect of an event or scene) produced when the speaker is talking about
motion. There are two reasons for this focus. First, gestures tend to be
very frequent and complex when we speak about motion. Second, motion
is a relatively well-understood semantic domain. The experiment
presented here also involves motion event descriptions, so some of this work
will be briefly reviewed.
2.1. Speech and gesture in descriptions of motion
Motion events can be studied by breaking an event up into a set of
components, as Talmy (1985) has done. These components (conventionally
placed in small caps) include PATH (the trajectory of motion), MANNER
(internal structure of the motion), FIGURE (the moving object), and others.
Talmy has examined the ways in which different languages express these
components, and has sorted languages into two groups based on how
PATH is encoded. If PATH is encoded in a satellite (e.g., a prepositional
phrase), the language is referred to as a satellite-framed language. This is
true of English, which typically uses prepositional phrases for PATH, and
conflates MANNER and activity in a motion verb (e.g., the ball rolled down
the hill). Verb-framed languages, on the other hand, have many verbs
encoding PATH or direction, while MANNER is encoded in a separate phrase
(e.g., Spanish fue a Ibiza nadando, literally 'he went to Ibiza swimming').
Interestingly, studies of gesture production in motion event descriptions
reveal that speakers also gesture differently depending on whether their
language is satellite-framed or verb-framed (Kita and Özyürek 2003;
McNeill and Duncan 2000). Speakers of satellite-framed languages tend to
accompany utterances containing MANNER verbs with PATH-only gestures,
unless there is particular focus on MANNER in the description (McNeill
and Duncan 2000). Figure 1 is an example of an English motion event
description exhibiting this typical pattern. The verb (roll) encodes MANNER,
while the prepositional phrase (down the street) encodes PATH. The
gesture also encodes PATH. The studies presented here ask whether this
typical pattern can be manipulated by shifting the speaker's attention.
Speech: then he's like [rolling down the street]. Gesture: speaker's right
hand moves downward from left to right (PATH) while left hand holds.
The gesture occurs during the bracketed speech. The peak prosodic
emphasis is in bold. These transcription conventions, which are used
throughout the paper, are borrowed from McNeill (1992).
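As an illustration of the bracket convention just described, the following Python sketch separates the speech that co-occurs with a gesture from the rest of a transcribed utterance. It is not part of the original study: the function and field names are hypothetical, and the sketch assumes at most one bracketed stretch per line and ignores prosodic emphasis.

```python
import re
from typing import NamedTuple, Optional


class Transcript(NamedTuple):
    speech: str                   # the utterance with the brackets removed
    gesture_span: Optional[str]   # the words that co-occur with the gesture


def parse_transcript(line: str) -> Transcript:
    """Split a transcript line of the form 'words [gestured words] words'."""
    # Find the first stretch of text enclosed in square brackets, if any.
    match = re.search(r"\[([^\]]*)\]", line)
    span = match.group(1).strip() if match else None
    # Remove the bracket characters and normalize whitespace.
    plain = " ".join(re.sub(r"[\[\]]", "", line).split())
    return Transcript(speech=plain, gesture_span=span)


example = parse_transcript("then he's like [rolling down the street]")
print(example.gesture_span)
```

A line without brackets yields `gesture_span = None`, which could stand for an utterance produced without an accompanying gesture.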
3. Linking attention and syntactic subject
People are very likely to gesture when describing a motion event. The
paradigm Tomlin has used to explore the link between attention and
syntactic subject involves asking participants to describe an event. For this
reason, Tomlin's paradigm can be adapted for analysis of speech-gesture
coordination.
In Tomlin's study, participants describe an event as they are watching
it unfold. In this event, two cartoon fish swim towards each other. When
they reach the center of the screen, one fish opens its mouth and swallows
the other. At two points during the scene, an arrow appears above one of
the fish, directing participants' attention to that fish, as pictured in Figure
2. The second appearance of the arrow occurs right before the swallowing
event. Participants are instructed to fix their eyes on the element to which
the arrow points.
If attention is directed to the agent fish (the fish doing the eating; in
this case, the grey fish), participants produce an active sentence to
describe the event, such as the grey fish eats the white fish. If attention is
directed to the patient fish (the fish being eaten, as is the case in the
figure), speakers produce a passive sentence, such as the white fish gets
eaten by the grey fish. In other words, the choice between these two
grammatical patterns, the active and the passive, is regulated by
attention. (It should be noted that while English is the focus here, Tomlin has
used this paradigm with a variety of languages: see Tomlin 1997.)
Figure 1. PATH gesture
Figure 2. Stimulus for Tomlin's experiments (redrawn from Tomlin 1997)
3.1. Adaptation of Tomlin's paradigm
For the experiment described here, Tomlin's paradigm is used with a
cartoon motion event. Participants watch a video clip involving two
elements and describe it while it is unfolding. Their attention is directed to
one of the two elements using a visual cue (an arrow). The event used
for Experiment 1 is a short segment from a cartoon (Pierce and Freleng
1950). Because this manipulation requires the event to have very specific
properties, the stimulus will be described in a fair amount of detail.
(A complete scene-by-scene description of the cartoon from which the
stimulus comes, as well as a very detailed description of this scene, can
be found in McNeill 1992, pp. 366–374.)
The cartoon depicts the antics of a cat and a bird. The cat is attempting
to catch the bird. The bird is on a building window ledge, while the cat
is below on the street. The cat climbs up towards the bird through the
interior of a drainpipe affixed to the side of the building. The bird sees
him coming and puts a bowling ball into the mouth of the drainpipe.
The viewer sees the impact within the pipe of the cat and ball colliding.
The cat then emerges from the bottom of the pipe with the ball inside his
body; one infers that he has swallowed the ball. Because the street is on
an incline, the ball begins to roll inside the cat, propelling him down the
street. The cat's legs flail in a circular motion as he rolls, though they do
not touch the ground (thus are not the cause of motion). The event of
specific interest for the experiment is the final portion of the scene, where the
cat moves down the street with the ball inside his body, shown in Figure
3. This will be referred to as the target event.
Figure 3. Target event for experiment
This event was used because it meets certain requirements imposed by
the nature of Tomlin's attention manipulation. First, there are two
entities participating in one event (the ball and the cat). Second, the motion
event components associated with the two entities in the event are
separable. While the cat's motion involves MANNER, it also involves a simple
PATH. The ball's motion involves a very distinctive rotating MANNER. A
narrator can potentially encode one, both, or neither of these components
in gesture. PATH alone would look like Figure 1. Such a gesture encodes
trajectory, but no explicit manner of motion.² Examples of very similar
speech accompanied by MANNER alone and PATH plus MANNER gestures
are shown in Figures 4 and 5, respectively. (It should be noted that
participants also sometimes represent the flailing motion of the cat's legs in
gesture; such gestures are discussed below in the analysis section.)
Speech: so he's [rolling down the street]. Gesture: speaker's right hand
traces a circular path repeatedly.
Speech: then it is [rolling down the hill]. Gesture: speaker's left hand
traces a circular path moving downward.
This paradigm also requires the language being spoken (English) to
have a syntactic alternation that allows either of the two elements to
appear as sentence subject. This is possible with the target event.
Participants can use a causative (the ball rolls the cat down the hill) or an
intransitive motion construction (the cat rolls down the hill) to describe the
event. Finally, it is possible to predict a difference in how participants
will gesture as a function of which element they encode as the subject of
their utterance. This is because this event has been used for a number of
experiments exploring speech-gesture patterning in motion event
descriptions (Duncan 1996; Kita 2000; Kita and Özyürek 2003; McNeill 1992,
2000, 2005; McNeill and Duncan 2000). This body of research has shown
that the typical way for an English speaker to describe the target event is
to say the cat rolls down the street, and to accompany the verb phrase
with a gesture depicting PATH (the trajectory followed by the cat), but
not rotating MANNER (showing the motion of the ball). This typical
pattern is shown in Figure 1. Knowledge of this typical pattern allowed
attention to be manipulated and predictions to be made about the ways
in which gesture would change.
Figure 4. MANNER alone gesture
4. Experimental manipulation of speech-gesture integration
Before discussing the experiment and predictions, it is important to note
that this stimulus differs from Tomlin's in a number of significant ways.
First, the two fish in Tomlin's experiments are identical except in color. This
is not true of the bowling ball and the cat, which are different sizes and
shapes. Second, the visual cue that directs attention to one of the two
fish in Tomlin's stimulus is extremely carefully timed. It takes about 150
milliseconds for a person to shift attention from one target to another
(Posner and Petersen 1990). In Tomlin's stimulus, the arrow appears
over the fish 75 milliseconds before the eating event, and the eating event
is very brief (220 ms). This timing ensures that participants will not be
able to shift attention away from the entity the arrow is indicating before
beginning to describe the event. Again, this is not the case with the target
event in Experiment 1. The target event lasts 8700 ms. As a result,
although the arrow remains on the screen during the entire target event,
participants in this study can shift their attention during their descriptions
of the action. Third, both the agent and the patient in Tomlin's stimulus
are animate, whereas in the target event, one of the elements is inanimate
(the bowling ball). Animate entities are thought to make better subjects
(Bock 1986; Chafe 1976; Tomlin 1997). Finally, the event in Tomlin's
experiment lacks any kind of narrative complexity. In the target event, there
is a fair amount of visual and narrative complexity. One is required to
make inferences about what has happened (e.g., that the cat and ball
have collided, that the cat has swallowed the ball), and each new event is
relatively unpredictable. In summary, much of the careful control built
into Tomlin's paradigm is lacking here.
Figure 5. PATH + MANNER gesture
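The timing contrast can be made concrete with a rough calculation. The Python sketch below is only an illustration: it uses the durations reported above and assumes, as an idealization that is not claimed in the literature cited, that attention shifts could follow one another back-to-back at about 150 ms each. The function names are hypothetical.

```python
ATTENTION_SHIFT_MS = 150  # approximate time for one attention shift (Posner and Petersen 1990)


def cue_window_ms(cue_lead_ms: int, event_ms: int) -> int:
    """Time from cue onset to the end of the cued event."""
    return cue_lead_ms + event_ms


def max_complete_shifts(window_ms: int, shift_ms: int = ATTENTION_SHIFT_MS) -> int:
    """Upper bound on back-to-back attention shifts that fit in the window."""
    return window_ms // shift_ms


# Tomlin's stimulus: arrow appears 75 ms before a 220 ms eating event.
tomlin_window = cue_window_ms(cue_lead_ms=75, event_ms=220)
tomlin_shifts = max_complete_shifts(tomlin_window)

# Target event in the present experiment: the arrow is visible for all 8700 ms.
target_shifts = max_complete_shifts(8700)

print(tomlin_window, tomlin_shifts, target_shifts)
```

On these assumptions, only a single complete shift fits into the window in Tomlin's stimulus (the shift toward the cued fish), whereas the much longer target event leaves room for dozens of shifts.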
4.1. Methods
Thirty University of Chicago students participated in the experiment for
payment. All were native speakers of English. Each arrived at the experi-
ment room with a friend who served as a listener during the participants
narration. (Participants produce more naturalistic narrations when not
speaking to an experimenter who they may assume has already seen
the stimulus: Parrill, forthcoming.) Each participant watched three one-
minute cartoon clips and described what was happening as the clip un-
folded. At the beginning of the experiment participants were given the
following verbal instructions:
Toward the end of the video, you will see a red arrow on the screen pointing to an
element in the video. When you see the arrow, keep your eyes on the element the
arrow is pointing to. Do not mention the arrow in your description, though. Just
keep describing what's happening in the clip.
Before watching each clip, participants were reminded of these instructions.
The first two clips were practice clips and will not be discussed further.
The third (experimental) clip was the bowling ball scene described
above. The stimuli were presented on a laptop, which was placed between
the two participants so that only the narrator could see the screen. The
experimental set-up can be seen in Figures 1, 4 and 5. During the target
event (the cat's transit down the street), a red arrow flashed either above
or below the cat, as shown in Figures 6a and 6b. These conditions will be
referred to as the "cat arrow" and "ball arrow" conditions, respectively.
Fifteen participants were in the cat arrow condition and fifteen were in the
ball arrow condition. To ensure that the arrow was actually directing
attention to the cat or the ball (not the cat's head and the cat's bottom),
participants were explicitly asked during debriefing what they thought the
arrow pointed to, and always responded appropriately ("cat" in the cat
arrow condition, "ball" in the ball arrow condition).
4.2. Predictions
English speakers typically describe this event by saying "the cat rolls down
the street", and produce a gesture that depicts path. Participants in the
cat arrow condition are expected to produce just this pattern. That is, be-
cause their attention is directed to the cat, no change from the default
pattern is anticipated. In the ball arrow condition, however, participants
are expected to produce more utterances with the ball as the subject. This
is because their attention has been directed to the ball. They are also ex-
pected to produce more rotating manner gestures (with or without a path
component) because the ball exhibits this kind of manner. When there is
particular focus on manner, it tends to appear in gesture (McNeill and
Duncan 2000).
4.3. Analysis
The speech and gesture each participant produced for the target event
was transcribed. Each utterance describing the target event was coded as
having either the cat or the ball as the syntactic subject. Participants'
gestures were sorted into the following categories: gestures depicting rotating
manner (an example of which can be seen in Figure 4), rotating manner
combined with path (shown in Figure 5), path alone (shown in Figure
1), and gestures depicting the manner in which the cat's legs move as he
rolls ("leg manner"). The latter gesture always involves two hands, moving
in flapping motions, shown in Figure 7, and is very distinct from the
rotation gestures discussed above. No gestures were produced for the target
event that did not contain motion event information.
Figures 6a and 6b. Cat and ball arrow conditions
Speech: "His feet are sort of like [not touching the ground]." Gesture:
speaker's two hands move up and down.
4.4. Results
Results describing the effect of the manipulation on speech will be presented
first. Interestingly, while directing participants' attention to the
ball did make them more likely to produce utterances in which the ball
was the subject, the manipulation also made participants in the ball arrow
condition more likely to produce multiple utterances describing the target
event. Table 1 shows the number of ball-subject utterances and cat-
subject utterances produced in the ball arrow and cat arrow conditions.
Participants in the cat arrow condition universally produced one utter-
ance per participant, and the cat was the syntactic subject of all of these
utterances. Participants in the ball arrow condition, on the other hand,
produced an average of 2.4 utterances. For half the utterances produced
in this condition the ball was the syntactic subject.
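
The per-participant averages reported here follow directly from the utterance counts in Table 1. As a minimal sketch (the variable and function names are illustrative and not part of the original analysis), the arithmetic can be checked in Python, assuming fifteen participants per condition as stated in Section 4.1:

```python
# Utterance counts by condition and syntactic subject, taken from Table 1.
table1 = {
    "ball arrow": {"ball": 18, "cat": 18},
    "cat arrow": {"ball": 0, "cat": 15},
}

# Assumption from Section 4.1: fifteen participants in each condition.
PARTICIPANTS_PER_CONDITION = 15


def summarize(counts, n_participants):
    """Return total utterances, mean per participant, and ball-subject share."""
    total = sum(counts.values())
    return {
        "total": total,
        "mean_per_participant": total / n_participants,
        "ball_subject_share": counts["ball"] / total,
    }


summary = {
    condition: summarize(counts, PARTICIPANTS_PER_CONDITION)
    for condition, counts in table1.items()
}

for condition, stats in summary.items():
    print(condition, stats)
```

Under these assumptions the ball arrow condition yields 36 utterances, or 2.4 per participant, with half of them having the ball as subject; the cat arrow condition yields one utterance per participant, none with the ball as subject.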
Figure 7. Gesture depicting MANNER of motion of cat's legs

Table 1. Number and syntactic subject of utterances

              Syntactic subject of utterance
Condition     Ball    Cat    Total
Ball arrow    18      18     36
Cat arrow     0       15     15
Total         18      33
The two groups differed in the number of utterances produced (Mann-Whitney
U = 108, p = 0.001). In order to explore the relationship between
syntactic subject and condition, a chi-square test for independence
was used. A chi-square statistic was calculated based on the first utterance
produced by each participant. The rationale for this analysis is as follows.
If the attention manipulation is successful, participants in the ball arrow
condition should first produce a ball-subject utterance, even if the cat is
the syntactic subject of subsequent utterances. The frequencies of
ball-subject and cat-subject initial utterances are shown in Table 2. A
chi-square test showed that condition had a significant effect on the syntactic
subject of the first utterance produced (χ²(1, N = 30) = 15, p < 0.001).
The majority of participants in the ball arrow condition first produced
a ball-subject utterance. Interestingly, the five participants who did not
begin their description of the target event with a ball-subject utterance
never produced one. In other words, the manipulation appeared to have
no effect on those participants.
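
The chi-square statistic for the first utterances can be recomputed from the counts in Table 2. The following is a minimal sketch rather than the author's analysis script: it assumes an uncorrected Pearson chi-square (no continuity correction), and the function names are hypothetical. The p-value uses the closed form for one degree of freedom, p = erfc(sqrt(chi-square / 2)).

```python
import math

# First-utterance counts from Table 2.
# Rows: ball arrow, cat arrow. Columns: ball subject, cat subject.
observed = [
    [10, 5],
    [0, 15],
]


def pearson_chi_square(table):
    """Uncorrected Pearson chi-square for a contingency table.

    Returns (statistic, degrees of freedom, sample size).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            # Expected count under independence of condition and subject.
            expected = row_totals[i] * col_totals[j] / n
            statistic += (obs - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df, n


def p_value_one_df(statistic):
    """Upper-tail p-value of a chi-square statistic with exactly 1 df."""
    return math.erfc(math.sqrt(statistic / 2.0))


chi2, df, n = pearson_chi_square(observed)
p = p_value_one_df(chi2)
print(f"chi-square({df}, N = {n}) = {chi2:.2f}, p = {p:.6f}")
```

Under these assumptions the statistic comes out at 15 with one degree of freedom and N = 30, and the p-value falls below 0.001.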
To explore the effect of the experimental manipulation on the gestures
participants produced, gestures were sorted into those having a rotation
component (rotating manner alone: 9 produced; rotating manner with
path: 2 produced) and those not having a rotation component (path
gestures: 16 produced; gestures depicting the manner of motion of the cat's
legs: 2 produced), as shown in Table 3.
Participants in the cat arrow condition never produced gestures in
which rotation was present, while participants in the ball arrow condition
produced an average of .73 rotating manner gestures. The difference
across the two groups approaches significance (Mann-Whitney U = 150,
p = .06). Of the 11 rotating manner gestures, 63 percent accompanied
ball-subject utterances. While these numbers are too small to permit a
meaningful statistical analysis, they suggest a link between syntactic
subject and rotating manner.

Table 2. First utterance produced by each participant

              Syntactic subject of utterance
Condition     Ball    Cat    Total
Ball arrow    10      5      15
Cat arrow     0       15     15
Total         10      20

Table 3. Hand gestures

              Rotation    No rotation    Total
Ball arrow    11          13             24
Cat arrow     0           5              5
Total         11          18
In this study, gestures involving rotation were assumed to reflect thinking
about the ball's motion, not the motion of the cat's legs. This is because
gestures accompanying speech about the cat's legs look quite distinct, as
noted above. Further, if rotating manner were associated with the cat,
the results above might be surprising. While the exact significance of a
gesture can only be inferred, the pattern above offers some support for
the starting assumptions of this study.
4.5. Discussion
The attention manipulation influenced which entity participants encoded
as the subject of utterances describing the target event. When attention
was directed to the cat, participants uniformly encoded the cat as the sub-
ject of their utterances, following the typical English pattern. When atten-
tion was directed to the ball, however, participants tended to rst produce
an utterance with the ball as the syntactic subject, but then to revert to
the typical pattern. An impulse to return to the typical English pattern is
one explanation for the larger number of utterances produced in the ball
arrow condition. That is, participants strongly prefer to describe the event
with utterances in which the cat is the syntactic subject, and continue to
talk until they have done so.
Why might the cat be the preferred syntactic subject? First, as noted
above, the cat is animate and animate entities make for better syntactic
subjects (Chafe 1994; Tomlin 1995, 1997). Second, the cat is more central
to the overall narrative than is the ball. The cat is a well-known character
and appears in other scenes (making it discourse-old), whereas the ball
appears only briefly. Such factors also make an entity a better candidate
for syntactic subject (Chafe 1994; MacWhinney 1977; Tomlin 1995,
1997).
A second explanation for the larger number of utterances in the ball
arrow condition has less to do with the properties of the cat, and more
to do with the complexity of the motion event. Choosing the cat as the
subject may allow speakers to avoid encoding some of the complexity of
the event. That is, the cat's trajectory can be described without giving any
detail about the causal source of that trajectory (the ball's rotation).
While space does not permit a full discussion of the speech produced in
this study, it is noteworthy that thirteen of the participants in the cat
arrow condition described the event by saying "he's rolling down the street".
The other two used the same construction but with the verbs scoot and
run. Such descriptions give very little detail about the real manner of
motion, and might even create a misleading impression of what transpired.
Participants in the ball arrow condition, on the other hand, are cued to
focus attention on the motion of the ball. As a result, they describe the
event in more detail. When participants in this condition selected the ball
as the syntactic subject, they produced a variety of constructions. While
"the ball's rolling down the street" was most frequent (57%), caused motion
constructions were also used ("the ball's pushing/dragging/rolling him
down the street": 25%), as were descriptions of the ball's location ("the
ball's in his stomach": 18%).
Variation in the number of utterances produced when describing an
event has previously been noted for speakers of different languages (Kita
et al. 2005). Such differences are assumed to be a product of the different
semantic and syntactic resources provided by the language. The current
project, however, shows that the number of utterances produced is
influenced by a speaker's decisions about what is most important for the
narrative at the moment of utterance formulation, not just by how her
language packages information. That is, these data may inform us about
"thinking for speaking" within a single language.
The patterns observed also indicate that speech and gesture change in
coordinated ways. Differences in the selection of syntactic subject as a
function of the attention manipulation were associated with differences
in the motion-event components appearing in gesture. Participants
produced rotating manner gestures only in the ball arrow condition, and
tended to produce them with ball-subject utterances. While there is a
large body of work exploring how speech and gesture pattern differently
as a function of the grammatical features of the language spoken, this is
the first systematic manipulation of speech-gesture integration within a
single language.
5. Conclusion
The entity on which a speaker's attention is focused while she is planning
her utterance shapes both what she says and how she gestures. This holds
true even when the event being described is very complex, thus extending
Russell Tomlin's work on a simple transitive event. The patterns observed
also indicate that while attention is a powerful force in language
production, factors such as animacy and discourse relevance also play a central
role in determining how a person will speak and gesture. These results
have implications for our understanding of the linguistic system as well
as for our understanding of language as a multimodal behavior. They
suggest that changes in conceptualization give rise to changes in both
speech and gesture. It therefore seems prudent to include gesture in our
accounts of the human language system.
Received 16 January 2007
Revisions received 24 September 2007
Case Western Reserve University, USA
Notes
* I am grateful for extensive and thoughtful comments from the editor of Cognitive
Linguistics and from three anonymous reviewers. Author's affiliation: Department
of Cognitive Science, Case Western Reserve University. Author's e-mail address:
fey.parrill@case.edu.
1. Gestures are also produced in the absence of speech. This paper focuses on a rela-
tively narrow subset of gesture, co-speech gestures. More general discussions of gesture
can be found in Kendon 1981, 2004; Ekman and Friesen 1972; Müller and Posner
2004.
2. One reviewer suggests that sliding manner of motion and trajectory alone (path) will be
indistinguishable, thus the gesture depicted in Figure 1 could potentially encode manner
as well, just of a different type (sliding rather than rotating). This distinction happens
not to be important for the study, however.
References
Beattie, Geoffrey and Heather Shovelton
2002 An experimental investigation of the role of different types of iconic gesture
in communication: A semantic feature approach. Gesture 1(25), 129–149.
Bock, J. Kathryn
1986 Syntactic persistence in language production. Cognitive Psychology 18(3),
355–387.
Chafe, Wallace
1976 Givenness, contrastiveness, definiteness, subjects, topics, and point of view.
In Li, Charles N. (ed.), Subject and Topic. New York: Academic Press,
27–55.
1994 Discourse, Consciousness, and Time: The Flow and Displacement of Con-
scious Experience in Speaking and Writing. Chicago: Chicago University
Press.
Duncan, Susan D.
1996 Grammatical form and thinking-for-speaking in Mandarin Chinese and
English: an analysis based on speech-accompanying gestures. Unpublished
doctoral dissertation, University of Chicago, Chicago IL.
Ekman, Paul and Wallace Friesen
1972 Hand movements. The Journal of Communication 22, 353–374.
Goldin-Meadow, Susan
2003 Hearing Gesture: How Our Hands Help Us Think. Cambridge: Belknap
Press of Harvard University Press.
Intons-Peterson, Margaret J.
1983 Imagery paradigms: How vulnerable are they to experimenters' expectations?
Journal of Experimental Psychology: Human Perception and Performance
9, 394–412.
Kendon, Adam (ed.)
1981 Nonverbal Communication, Interaction and Gesture: Selections from Semio-
tica. The Hague: Mouton.
2004 Gesture: Visible Action as Utterance. Cambridge: Cambridge University
Press.
Kita, Sotaro
2000 How representational gestures help speaking. In McNeill, David (ed.),
Language and Gesture. Cambridge: Cambridge University Press, 162–185.
Kita, Sotaro and Aslı Özyürek
2003 What does cross-linguistic variation in semantic coordination of speech and
gesture reveal? Evidence for an interface representation of spatial thinking
and speaking. Journal of Memory and Language 48(1), 16–32.
Kita, Sotaro, Aslı Özyürek, Shanley Allen, Reyhan Furman and Amanda Brown
2005 How does linguistic framing of events influence co-speech gestures? Insights
from cross-linguistic variations and similarities. Gesture 5(1/2), 219–240.
Liddell, Scott K.
2003 Grammar, Gesture, and Meaning in American Sign Language. Cambridge:
Cambridge University Press.
MacWhinney, Brian
1977 Starting points. Language 53, 152–187.
McNeill, David
1992 Hand and Mind: What Gestures Reveal about Thought. Chicago: University
of Chicago Press.
2005 Gesture and Thought. Chicago: University of Chicago Press.
McNeill, David (ed.)
2000 Language and Gesture. Cambridge: Cambridge University Press.
McNeill, David and Susan D. Duncan
2000 Growth Points in thinking-for-speaking. In McNeill, David (ed.), Language
and Gesture. Cambridge: Cambridge University Press, 141–161.
Müller, Cornelia and Roland Posner (eds.)
2004 The Semantics and Pragmatics of Everyday Gestures: Proceedings of the
Berlin Conference April 1998. Berlin: Weidler.
Núñez, Rafael and Eve Sweetser
2006 With the future behind them: Convergent evidence from Aymara language
and gesture in the crosslinguistic comparison of spatial construals of time.
Cognitive Science 30(5), 401–450.
Parrill, Fey and Eve Sweetser
2004 What we mean by meaning: Conceptual integration in gesture analysis and
transcription. Gesture 4(2), 197–219.
Parrill, Fey
forthc The hands are part of the package: Gesture, common ground, and informa-
tion packaging. In Rice, Sally and John Newman (eds.), Empirical and
Experimental Methods in Cognitive/Functional Research. Stanford: CSLI
Publications.
Pierce, Tedd (writer), and Friz Freleng (director)
1950 Canary Row [Television series episode]. Los Angeles: Warner Brothers.
Posner, Michael I. and Steven E. Petersen
1990 The attention system of the human brain. Annual Review of Neuroscience 13,
24–42.
Slobin, Dan I.
1987 Thinking for speaking. In Aske, John, Natasha Beery, Laura Michaelis and
Hana Filip (eds.), Proceedings of the 13th Annual Meeting of the Berkeley
Linguistic Society. Berkeley: Berkeley Linguistic Society, 435–445.
1996 From thought and language to thinking for speaking. In Gumperz,
John J. and Stephen C. Levinson (eds.), Rethinking Linguistic Relativity.
Cambridge: Cambridge University Press, 70–96.
Sweetser, Eve
1998 Regular metaphoricity in gesture: Bodily-based models of speech interaction.
Actes du 16e Congrès International des Linguistes.
Talmy, Leonard
1985 Lexicalization patterns: semantic structure in lexical forms. In Shopen,
Timothy (ed.), Language Typology and Syntactic Description. Cambridge:
Cambridge University Press, 5